NFSv4                                                     T. Haynes, Ed.
Internet-Draft                                              Primary Data
Obsoletes: 3530 (if approved)                             D. Noveck, Ed.
Intended status: Standards Track                                    Dell
Expires: June 7, 2015                                  December 04, 2014

              Network File System (NFS) Version 4 Protocol
                   draft-ietf-nfsv4-rfc3530bis-35.txt

Abstract

   The Network File System (NFS) version 4 is a distributed file system
   protocol that builds on the heritage of NFS protocol version 2, RFC
   1094, and version 3, RFC 1813. Unlike earlier versions, the NFS
   version 4 protocol supports traditional file access while integrating
   support for file locking and the mount protocol. In addition,
   support for strong security (and its negotiation), compound
   operations, client caching, and internationalization has been added.
   Of course, attention has been applied to making NFS version 4 operate
   well in an Internet environment.

   This document, together with the companion XDR description document,
   RFCNFSv4XDR, obsoletes RFC 3530 as the definition of the NFS version
   4 protocol.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119]
   except where "REQUIRED" and "RECOMMENDED" are used as qualifiers to
   distinguish classes of attributes as described in Section 1.3.3.2 and
   Section 5.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 7, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1. Introduction
     1.1. NFS Version 4 Goals
     1.2. Definitions in the companion document NFS Version 4
          Protocol are Authoritative
     1.3. Overview of NFSv4 Features
       1.3.1. RPC and Security
       1.3.2. Procedure and Operation Structure
       1.3.3. Filesystem Model
       1.3.4. OPEN and CLOSE
       1.3.5. File Locking
       1.3.6. Client Caching and Delegation
     1.4. General Definitions
     1.5. Changes since RFC 3530
     1.6. Changes between RFC 3010 and RFC3530
   2. Protocol Data Types
     2.1. Basic Data Types
     2.2. Structured Data Types
   3. RPC and Security Flavor
     3.1. Ports and Transports
       3.1.1. Client Retransmission Behavior
     3.2. Security Flavors
       3.2.1. Security mechanisms for NFSv4
     3.3. Security Negotiation
       3.3.1. SECINFO
       3.3.2. Security Error
       3.3.3. Callback RPC Authentication
   4. Filehandles
     4.1. Obtaining the First Filehandle
       4.1.1. Root Filehandle
       4.1.2. Public Filehandle
     4.2. Filehandle Types
       4.2.1. General Properties of a Filehandle
       4.2.2. Persistent Filehandle
       4.2.3. Volatile Filehandle
       4.2.4. One Method of Constructing a Volatile Filehandle
     4.3. Client Recovery from Filehandle Expiration
   5. Attributes
     5.1. REQUIRED Attributes
     5.2. RECOMMENDED Attributes
     5.3. Named Attributes
     5.4. Classification of Attributes
     5.5. Set-Only and Get-Only Attributes
     5.6. REQUIRED Attributes - List and Definition References
     5.7. RECOMMENDED Attributes - List and Definition References
     5.8. Attribute Definitions
       5.8.1. Definitions of REQUIRED Attributes
       5.8.2. Definitions of Uncategorized RECOMMENDED Attributes
     5.9. Interpreting owner and owner_group
     5.10. Character Case Attributes
   6. Access Control Attributes
     6.1. Goals
     6.2. File Attributes Discussion
       6.2.1. Attribute 12: acl
       6.2.2. Attribute 33: mode
     6.3. Common Methods
       6.3.1. Interpreting an ACL
       6.3.2. Computing a Mode Attribute from an ACL
     6.4. Requirements
       6.4.1. Setting the mode and/or ACL Attributes
       6.4.2. Retrieving the mode and/or ACL Attributes
       6.4.3. Creating New Objects
   7. NFS Server Name Space
     7.1. Server Exports
     7.2. Browsing Exports
     7.3. Server Pseudo Filesystem
     7.4. Multiple Roots
     7.5. Filehandle Volatility
     7.6. Exported Root
     7.7. Mount Point Crossing
     7.8. Security Policy and Name Space Presentation
   8. Multi-Server Namespace
     8.1. Location Attributes
     8.2. File System Presence or Absence
     8.3. Getting Attributes for an Absent File System
       8.3.1. GETATTR Within an Absent File System
       8.3.2. READDIR and Absent File Systems
     8.4. Uses of Location Information
       8.4.1. File System Replication
       8.4.2. File System Migration
       8.4.3. Referrals
     8.5. Location Entries and Server Identity
     8.6. Additional Client-Side Considerations
     8.7. Effecting File System Referrals
       8.7.1. Referral Example (LOOKUP)
       8.7.2. Referral Example (READDIR)
     8.8. The Attribute fs_locations
   9. File Locking and Share Reservations
     9.1. Opens and Byte-Range Locks
       9.1.1. Client ID
       9.1.2. Server Release of Client ID
       9.1.3. Use of Seqids
       9.1.4. Stateid Definition
       9.1.5. lock-owner
       9.1.6. Use of the Stateid and Locking
       9.1.7. Sequencing of Lock Requests
       9.1.8. Recovery from Replayed Requests
       9.1.9. Interactions of multiple sequence values
       9.1.10. Releasing state-owner State
       9.1.11. Use of Open Confirmation
     9.2. Lock Ranges
     9.3. Upgrading and Downgrading Locks
     9.4. Blocking Locks
     9.5. Lease Renewal
     9.6. Crash Recovery
       9.6.1. Client Failure and Recovery
       9.6.2. Server Failure and Recovery
       9.6.3. Network Partitions and Recovery
     9.7. Recovery from a Lock Request Timeout or Abort
     9.8. Server Revocation of Locks
     9.9. Share Reservations
     9.10. OPEN/CLOSE Operations
       9.10.1. Close and Retention of State Information
     9.11. Open Upgrade and Downgrade
     9.12. Short and Long Leases
     9.13. Clocks, Propagation Delay, and Calculating Lease
           Expiration
     9.14. Migration, Replication and State
       9.14.1. Migration and State
       9.14.2. Replication and State
       9.14.3. Notification of Migrated Lease
       9.14.4. Migration and the lease_time Attribute
   10. Client-Side Caching
     10.1. Performance Challenges for Client-Side Caching
     10.2. Delegation and Callbacks
       10.2.1. Delegation Recovery
     10.3. Data Caching
       10.3.1. Data Caching and OPENs
       10.3.2. Data Caching and File Locking
       10.3.3. Data Caching and Mandatory File Locking
       10.3.4. Data Caching and File Identity
     10.4. Open Delegation
       10.4.1. Open Delegation and Data Caching
       10.4.2. Open Delegation and File Locks
       10.4.3. Handling of CB_GETATTR
       10.4.4. Recall of Open Delegation
       10.4.5. OPEN Delegation Race with CB_RECALL
       10.4.6. Clients that Fail to Honor Delegation Recalls
       10.4.7. Delegation Revocation
     10.5. Data Caching and Revocation
       10.5.1. Revocation Recovery for Write Open Delegation
     10.6. Attribute Caching
     10.7. Data and Metadata Caching and Memory Mapped Files
     10.8. Name Caching
     10.9. Directory Caching
   11. Minor Versioning
   12. Internationalization
     12.1. Introduction
     12.2. Limitations on internationalization-related processing
           in the NFSv4 context
     12.3. Summary of Server Behavior Types
     12.4. String Encoding
     12.5. Normalization
     12.6. Types with Processing Defined by Other Internet Areas
     12.7. UTF-8 Related Errors
     12.8. Servers that accept file component names that are not
           valid UTF-8 strings
   13. Error Values
     13.1. Error Definitions
       13.1.1. General Errors
       13.1.2. Filehandle Errors
       13.1.3. Compound Structure Errors
       13.1.4. File System Errors
       13.1.5. State Management Errors
       13.1.6. Security Errors
       13.1.7. Name Errors
       13.1.8. Locking Errors
       13.1.9. Reclaim Errors
       13.1.10. Client Management Errors
       13.1.11. Attribute Handling Errors
       13.1.12. Miscellaneous Errors
     13.2. Operations and their valid errors
     13.3. Callback operations and their valid errors
     13.4. Errors and the operations that use them
   14. NFSv4 Requests
     14.1. Compound Procedure
     14.2. Evaluation of a Compound Request
     14.3. Synchronous Modifying Operations
     14.4. Operation Values
   15. NFSv4 Procedures
     15.1. Procedure 0: NULL - No Operation
     15.2. Procedure 1: COMPOUND - Compound Operations
     15.3. Operation 3: ACCESS - Check Access Rights
     15.4. Operation 4: CLOSE - Close File
     15.5. Operation 5: COMMIT - Commit Cached Data
     15.6. Operation 6: CREATE - Create a Non-Regular File Object
     15.7. Operation 7: DELEGPURGE - Purge Delegations Awaiting
           Recovery
     15.8. Operation 8: DELEGRETURN - Return Delegation
     15.9. Operation 9: GETATTR - Get Attributes
     15.10. Operation 10: GETFH - Get Current Filehandle
     15.11. Operation 11: LINK - Create Link to a File
     15.12. Operation 12: LOCK - Create Lock
     15.13. Operation 13: LOCKT - Test For Lock
     15.14. Operation 14: LOCKU - Unlock File
     15.15. Operation 15: LOOKUP - Lookup Filename
     15.16. Operation 16: LOOKUPP - Lookup Parent Directory
     15.17. Operation 17: NVERIFY - Verify Difference in Attributes
     15.18. Operation 18: OPEN - Open a Regular File
     15.19. Operation 19: OPENATTR - Open Named Attribute Directory
     15.20. Operation 20: OPEN_CONFIRM - Confirm Open
     15.21. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access
     15.22. Operation 22: PUTFH - Set Current Filehandle
     15.23. Operation 23: PUTPUBFH - Set Public Filehandle
     15.24. Operation 24: PUTROOTFH - Set Root Filehandle
     15.25. Operation 25: READ - Read from File
     15.26. Operation 26: READDIR - Read Directory
     15.27. Operation 27: READLINK - Read Symbolic Link
     15.28. Operation 28: REMOVE - Remove Filesystem Object
     15.29. Operation 29: RENAME - Rename Directory Entry
     15.30. Operation 30: RENEW - Renew a Lease
     15.31. Operation 31: RESTOREFH - Restore Saved Filehandle
     15.32. Operation 32: SAVEFH - Save Current Filehandle
     15.33. Operation 33: SECINFO - Obtain Available Security
     15.34. Operation 34: SETATTR - Set Attributes
     15.35. Operation 35: SETCLIENTID - Negotiate Client ID
     15.36. Operation 36: SETCLIENTID_CONFIRM - Confirm Client ID
     15.37. Operation 37: VERIFY - Verify Same Attributes
     15.38. Operation 38: WRITE - Write to File
     15.39. Operation 39: RELEASE_LOCKOWNER - Release Lockowner
            State
     15.40. Operation 10044: ILLEGAL - Illegal operation
   16. NFSv4 Callback Procedures
     16.1. Procedure 0: CB_NULL - No Operation
     16.2. Procedure 1: CB_COMPOUND - Compound Operations
       16.2.6. Operation 3: CB_GETATTR - Get Attributes
       16.2.7. Operation 4: CB_RECALL - Recall an Open Delegation
       16.2.8. Operation 10044: CB_ILLEGAL - Illegal Callback
               Operation
   17. Security Considerations
   18. IANA Considerations
     18.1. Named Attribute Definitions
       18.1.1. Initial Registry
       18.1.2. Updating Registrations
   19. References
     19.1. Normative References
     19.2. Informative References
   Appendix A. Acknowledgments
   Appendix B. RFC Editor Notes
   Authors' Addresses

1. Introduction

1.1. NFS Version 4 Goals

   The Network File System version 4 (NFSv4) protocol is a further
   revision of the NFS protocol defined already by versions 2 [RFC1094]
   and 3 [RFC1813]. It retains the essential characteristics of
   previous versions: design for easy recovery; independence of
   transport protocols, operating systems, and file systems;
   simplicity; and good performance. The NFSv4 revision has the
   following goals:

   o  Improved access and good performance on the Internet.

      The protocol is designed to transit firewalls easily, perform well
      where latency is high and bandwidth is low, and scale to very
      large numbers of clients per server.

   o  Strong security with negotiation built into the protocol.

      The protocol builds on the work of the Open Network Computing
      (ONC) Remote Procedure Call (RPC) working group in supporting the
      RPCSEC_GSS protocol (see both [RFC2203] and [RFC5403]).
      Additionally, the NFS version 4 protocol provides a mechanism for
      clients and servers to negotiate security, and it requires
      clients and servers to support a minimal set of security schemes.

   o  Good cross-platform interoperability.

      The protocol features a file system model that provides a useful,
      common set of features that does not unduly favor one file system
      or operating system over another.

   o  Designed for protocol extensions.

      The protocol is designed to accept standard extensions that do not
      compromise backward compatibility.

   This document, together with the companion XDR description document
   [RFCNFSv4XDR], obsoletes [RFC3530] as the authoritative document
   describing NFSv4. It does not introduce any over-the-wire protocol
   changes, in the sense that previously valid requests remain valid.

1.2. Definitions in the companion document NFS Version 4 Protocol are
     Authoritative

   [RFCNFSv4XDR], "Network File System (NFS) Version 4 External Data
   Representation Standard (XDR) Description", contains the definitions
   in XDR description language of the constructs used by the protocol.
   Inside this document, several of the constructs are reproduced for
   purposes of explanation. The reader is warned of the possibility of
   errors in the reproduced constructs outside of [RFCNFSv4XDR]. For
   any part of the document that is inconsistent with [RFCNFSv4XDR],
   [RFCNFSv4XDR] is to be considered authoritative.

1.3. Overview of NFSv4 Features

   To provide context for both the reader who is familiar with the
   previous versions of the NFS protocol and the reader who is new to
   the NFS protocols, the major features of the NFSv4 protocol are
   reviewed in brief. For the reader new to the NFS protocols,
   some fundamental knowledge is still expected. The reader should be
   familiar with the XDR and RPC protocols as described in [RFC4506] and
   [RFC5531]. A basic knowledge of file systems and distributed file
   systems is expected as well.

1.3.1. RPC and Security

   As with previous versions of NFS, the External Data Representation
   (XDR) and RPC mechanisms used for the NFSv4 protocol are those
   defined in [RFC4506] and [RFC5531]. To meet end-to-end security
   requirements, the RPCSEC_GSS framework (both version 1 in [RFC2203]
   and version 2 in [RFC5403]) will be used to extend the basic RPC
   security. With the use of RPCSEC_GSS, various mechanisms can be
   provided to offer authentication, integrity, and privacy to the NFS
   version 4 protocol. Kerberos V5 will be used as described in
   [RFC4121] to provide one security framework.
   With the use of RPCSEC_GSS, other mechanisms may also be specified
   and used for NFS version 4 security.

   To enable in-band security negotiation, the NFSv4 protocol has added
   a new operation which provides the client with a method of querying
   the server about its policies regarding which security mechanisms
   must be used for access to the server's file system resources. With
   this, the client can securely match the security mechanism that meets
   the policies specified at both the client and server.

1.3.2. Procedure and Operation Structure

   A significant departure from the previous versions of the NFS
   protocol is the introduction of the COMPOUND procedure. For the
   NFSv4 protocol, there are two RPC procedures, NULL and COMPOUND. The
   COMPOUND procedure is defined in terms of operations and these
   operations correspond more closely to the traditional NFS procedures.

   With the use of the COMPOUND procedure, the client is able to build
   simple or complex requests. These COMPOUND requests allow for a
   reduction in the number of RPCs needed for logical file system
   operations. For example, without previous contact with a server a
   client will be able to read data from a file in one request by
   combining LOOKUP, OPEN, and READ operations in a single COMPOUND RPC.
   With previous versions of the NFS protocol, this type of single
   request was not possible.

   The model used for COMPOUND is very simple. There is no logical
   ORing or ANDing of operations. The operations combined within a
   COMPOUND request are evaluated in order by the server. Once an
   operation returns a failing result, the evaluation ends and the
   results of all evaluated operations are returned to the client.

   The NFSv4 protocol continues to have the client refer to a file or
   directory at the server by a "filehandle".
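   The COMPOUND evaluation model of Section 1.3.2 (operations evaluated
   in order, with evaluation ending at the first failing result) can be
   illustrated with a minimal Python sketch. It is not taken from the
   protocol definition: the Operation pair, the evaluate_compound
   function, and the lambda handlers are hypothetical names chosen for
   illustration, and real operations carry arguments and results beyond
   a status code. NFS4_OK and NFS4ERR_NOENT use the protocol's status
   values.

```python
from typing import Callable, List, Tuple

NFS4_OK = 0          # protocol status value for success
NFS4ERR_NOENT = 2    # protocol status value for "no such file or directory"

# Hypothetical representation: an operation is a (name, handler) pair,
# and a handler returns an NFSv4 status code when invoked.
Operation = Tuple[str, Callable[[], int]]


def evaluate_compound(ops: List[Operation]) -> Tuple[int, List[Tuple[str, int]]]:
    """Evaluate operations in order and stop at the first failing result.

    Returns the final status and the results of every operation that was
    evaluated, including the failing one.
    """
    results: List[Tuple[str, int]] = []
    status = NFS4_OK
    for name, handler in ops:
        status = handler()
        results.append((name, status))
        if status != NFS4_OK:
            break  # later operations are not evaluated
    return status, results


# Illustration only: LOOKUP fails, so READ is never evaluated.
demo_ops: List[Operation] = [
    ("PUTROOTFH", lambda: NFS4_OK),
    ("LOOKUP", lambda: NFS4ERR_NOENT),
    ("READ", lambda: NFS4_OK),
]
demo_status, demo_results = evaluate_compound(demo_ops)
```

   In this sketch the reply for the demo request carries two results,
   one for PUTROOTFH and one for the failing LOOKUP, which mirrors the
   rule that only the evaluated operations are reported.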
   The COMPOUND procedure
   has a method of passing a filehandle from one operation to another
   within the sequence of operations. There is a concept of a "current
   filehandle" and "saved filehandle". Most operations use the "current
   filehandle" as the file system object to operate upon. The "saved
   filehandle" is used as temporary filehandle storage within a COMPOUND
   procedure as well as an additional operand for certain operations.

1.3.3. Filesystem Model

   The general file system model used for the NFSv4 protocol is the same
   as in previous versions. The server file system is hierarchical with
   the regular files contained within being treated as opaque byte
   streams. In a slight departure, file and directory names are encoded
   with UTF-8 to deal with the basics of internationalization.

   The NFSv4 protocol does not require a separate protocol to provide
   for the initial mapping between path name and filehandle. Instead of
   using the older MOUNT protocol for this mapping, the server provides
   a ROOT filehandle that represents the logical root or top of the file
   system tree provided by the server. The server provides multiple
   file systems by gluing them together with pseudo file systems. These
   pseudo file systems provide for potential gaps in the path names
   between real file systems.

1.3.3.1. Filehandle Types

   In previous versions of the NFS protocol, the filehandle provided by
   the server was guaranteed to be valid or persistent for the lifetime
   of the file system object to which it referred. For some server
   implementations, this persistence requirement has been difficult to
   meet. For the NFSv4 protocol, this requirement has been relaxed by
   introducing another type of filehandle, volatile. With persistent
   and volatile filehandle types, the server implementation can match
   the abilities of the file system at the server along with the
   operating environment.
   The client will have knowledge of the type of
   filehandle being provided by the server and can be prepared to deal
   with the semantics of each.

1.3.3.2. Attribute Types

   The NFSv4 protocol has a rich and extensible file object attribute
   structure, which is divided into REQUIRED, RECOMMENDED, and named
   attributes (see Section 5).

   Several (but not all) of the REQUIRED attributes are derived from the
   attributes of NFSv3 (see definition of the fattr3 data type in
   [RFC1813]). An example of a REQUIRED attribute is the file object's
   type (Section 5.8.1.2) so that regular files can be distinguished
   from directories (also known as folders in some operating
   environments) and other types of objects. REQUIRED attributes are
   discussed in Section 5.1.

   An example of a RECOMMENDED attribute is acl (Section 6.2.1).
   This attribute defines an Access Control List (ACL) on a file object.
   An ACL provides file access control beyond the model used in NFSv3.
   The ACL definition allows for the specification of specific sets of
   permissions for individual users and groups. In addition, ACL
   inheritance allows propagation of access permissions and restrictions
   down a directory tree as file system objects are created.
   RECOMMENDED attributes are discussed in Section 5.2.

   A named attribute is an opaque byte stream that is associated with a
   directory or file and referred to by a string name. Named attributes
   are meant to be used by client applications as a method to associate
   application-specific data with a regular file or directory. NFSv4.1
   modifies named attributes relative to NFSv4.0 by tightening the
   allowed operations in order to prevent the development of
   non-interoperable implementations. Named attributes are discussed in
   Section 5.3.

1.3.3.3. Multi-server Namespace

   A single-server namespace is the file system hierarchy that the
   server presents for remote access.
It is a proper subset of all the 510 file systems available locally. NFSv4 contains a number of features 511 to allow implementation of namespaces that cross server boundaries 512 and that allow and facilitate a non-disruptive transfer of support 513 for individual file systems between servers. They are all based upon 514 attributes that allow one file system to specify alternative or new 515 locations for that file system. I.e., just as a client might 516 traverse across local file systems on a single server, it can now 517 traverse to a remote file system on a different server. 519 These attributes may be used together with the concept of absent file 520 systems, which provide specifications for additional locations but no 521 actual file system content. This allows a number of important 522 facilities: 524 o Location attributes may be used with absent file systems to 525 implement referrals whereby one server may direct the client to a 526 file system provided by another server. This allows extensive 527 multi-server namespaces to be constructed. 529 o Location attributes may be provided for present file systems to 530 provide the locations of alternative file system instances or 531 replicas to be used in the event that the current file system 532 instance becomes unavailable. 534 o Location attributes may be provided when a previously present file 535 system becomes absent. This allows non-disruptive migration of 536 file systems to alternative servers. 538 1.3.4. OPEN and CLOSE 540 The NFSv4 protocol introduces OPEN and CLOSE operations. The OPEN 541 operation provides a single point where file lookup, creation, and 542 share semantics (see Section 9.9) can be combined. The CLOSE 543 operation also provides for the release of state accumulated by OPEN. 545 1.3.5. File Locking 547 With the NFSv4 protocol, the support for byte range file locking is 548 part of the NFS protocol. 
The file locking support is structured so 549 that an RPC callback mechanism is not required. This is a departure 550 from the previous versions of the NFS file locking protocol, Network 551 Lock Manager (NLM) [RFC1813]. The state associated with file locks 552 is maintained at the server under a lease-based model. The server 553 defines a single lease period for all state held by an NFS client. If 554 the client does not renew its lease within the defined period, all 555 state associated with the client's lease may be released by the 556 server. The client may renew its lease by use of the RENEW 557 operation or implicitly by use of other operations (primarily READ). 559 1.3.6. Client Caching and Delegation 561 The file, attribute, and directory caching for the NFSv4 protocol is 562 similar to previous versions. Attributes and directory information 563 are cached for a duration determined by the client. At the end of a 564 predefined timeout, the client will query the server to see if the 565 related file system object has been updated. 567 For file data, the client checks its cache validity when the file is 568 opened. A query is sent to the server to determine if the file has 569 been changed. Based on this information, the client determines if 570 the data cache for the file should be kept or released. Also, when the 571 file is closed, any modified data is written to the server. 573 If an application wants to serialize access to file data, file 574 locking of the file data ranges in question should be used. 576 The major addition to NFSv4 in the area of caching is the ability of 577 the server to delegate certain responsibilities to the client. When 578 the server grants a delegation for a file to a client, the client is 579 guaranteed certain semantics with respect to the sharing of that file 580 with other clients.
At OPEN, the server may provide the client 581 either a read (OPEN_DELEGATE_READ) or a write (OPEN_DELEGATE_WRITE) 582 delegation for the file (see Section 10.4). If the client is granted 583 an OPEN_DELEGATE_READ delegation, it is assured that no other client 584 has the ability to write to the file for the duration of the 585 delegation. If the client is granted an OPEN_DELEGATE_WRITE 586 delegation, the client is assured that no other client has read or 587 write access to the file. 589 Delegations can be recalled by the server. If another client 590 requests access to the file in such a way that the access conflicts 591 with the granted delegation, the server is able to notify the initial 592 client and recall the delegation. This requires that a callback path 593 exist between the server and client. If this callback path does not 594 exist, then delegations cannot be granted. The essence of a 595 delegation is that it allows the client to locally service operations 596 such as OPEN, CLOSE, LOCK, LOCKU, READ, or WRITE without immediate 597 interaction with the server. 599 1.4. General Definitions 601 The following definitions are provided to give the reader 602 an appropriate context. 604 Anonymous Stateid: Special locking object defined in 605 Section 9.1.4.3. 607 Absent File System: A file system is "absent" when a namespace 608 component does not have a backing file system. 610 Byte: In this document, a byte is an octet, i.e., a datum exactly 8 611 bits in length. 613 Client: The client is the entity that accesses the NFS server's 614 resources. The client may be an application that contains the 615 logic to access the NFS server directly. The client may also be 616 the traditional operating system client that provides remote file 617 system services for a set of applications. 619 With reference to byte-range locking, the client is also the 620 entity that maintains a set of locks on behalf of one or more 621 applications.
This client is responsible for crash or failure 622 recovery for those locks it manages. 624 Note that multiple clients may share the same transport and 625 connection and multiple clients may exist on the same network 626 node. 628 Client ID: A 64-bit quantity used as a unique, short-hand reference 629 to a client supplied Verifier and ID. The server is responsible 630 for supplying the Client ID. 632 File System: The file system is the collection of objects on a 633 server that share the same fsid attribute (see Section 5.8.1.9). 635 Lease: An interval of time defined by the server for which the 636 client is irrevocably granted a lock. At the end of a lease 637 period the lock may be revoked if the lease has not been extended. 638 The lock must be revoked if a conflicting lock has been granted 639 after the lease interval. 641 All leases granted by a server have the same fixed duration. Note 642 that the fixed interval duration was chosen to alleviate the 643 expense a server would have in maintaining state about variable 644 length leases across server failures. 646 Lock: The term "lock" is used to refer to both record (byte-range) 647 locks as well as share reservations unless specifically stated 648 otherwise. 650 Lock-Owner: Each byte-range lock is associated with a specific lock- 651 owner and an open-owner. The lock-owner consists of a Client ID 652 and an opaque owner string. The client presents this to the 653 server to establish the ownership of the byte-range lock as 654 needed. 656 Open-Owner: Each open file is associated with a specific open-owner, 657 which consists of a Client ID and an opaque owner string. The 658 client presents this to the server to establish the ownership of 659 the open as needed. 661 READ Bypass Stateid: Special locking object defined in 662 Section 9.1.4.3. 664 Server: The "Server" is the entity responsible for coordinating 665 client access to a set of file systems. 
667 Stable Storage: NFSv4 servers must be able to recover without data 668 loss from multiple power failures (including cascading power 669 failures, that is, several power failures in quick succession), 670 operating system failures, and hardware failure of components 671 other than the storage medium itself (for example, disk, 672 nonvolatile RAM). 674 Some examples of stable storage that are allowable for an NFS 675 server include: 677 (1) Media commit of data, that is, the modified data has been 678 successfully written to the disk media, for example, the disk 679 platter. 681 (2) An immediate reply disk drive with battery-backed on-drive 682 intermediate storage or uninterruptible power system (UPS). 684 (3) Server commit of data with battery-backed intermediate 685 storage and recovery software. 687 (4) Cache commit with uninterruptible power system (UPS) and 688 recovery software. 690 Stateid: A stateid is a 128-bit quantity returned by a server that 691 uniquely identifies the open and locking states provided by the 692 server for a specific open-owner or lock-owner/open-owner pair for 693 a specific file and type of lock. 695 Verifier: A 64-bit quantity generated by the client that the server 696 can use to determine if the client has restarted and lost all 697 previous lock state. 699 1.5. Changes since RFC 3530 701 The main changes from RFC 3530 [RFC3530] are: 703 o The XDR definition has been moved to a companion document 704 [RFCNFSv4XDR]. 706 o The IETF intellectual property statements were updated to the 707 latest version. 709 o There is a restructured and more complete explanation of multi- 710 server namespace features. 712 o The handling of domain names was updated to reflect 713 Internationalized Domain Names in Applications (IDNA) [RFC5891]. 715 o The previously required LIPKEY and SPKM-3 security mechanisms have 716 been removed.
718 o Some clarification was provided regarding a client re-establishing 719 callback information to the new server if state has been migrated. 721 o A third edge case was added for courtesy locks and network 722 partitions. 724 o The definition of stateid was strengthened. 726 1.6. Changes between RFC 3010 and RFC 3530 728 The definition of the NFSv4 protocol in [RFC3530] replaced and 729 obsoleted the definition present in [RFC3010]. While portions of the 730 two documents remained the same, there were substantive changes in 731 others. The changes made between [RFC3010] and [RFC3530] reflect 732 implementation experience and further review of the protocol. 734 The following list is not inclusive of all changes but presents 735 some of the most notable changes or additions made: 737 o The state model has added an open_owner4 identifier. This was 738 done to accommodate POSIX-based clients and the model they use for 739 file locking. For POSIX clients, an open_owner4 would correspond 740 to a file descriptor potentially shared amongst a set of processes 741 and the lock_owner4 identifier would correspond to a process that 742 is locking a file. 744 o Clarifications and error conditions were added for the handling of 745 the owner and group attributes. Since these attributes are string 746 based (as opposed to the numeric uid/gid of previous versions of 747 NFS), translations may not be available and hence the changes 748 made. 750 o Clarifications for the ACL and mode attributes to address 751 evaluation and partial support. 753 o For identifiers that are defined as XDR opaque, limits were set on 754 their size. 756 o Added the mounted_on_fileid attribute to allow POSIX clients to 757 correctly construct local mounts. 759 o Modified the SETCLIENTID/SETCLIENTID_CONFIRM operations to deal 760 correctly with confirmation details along with adding the ability 761 to specify new client callback information. Also added 762 clarification of the callback information itself.
764 o Added a new operation RELEASE_LOCKOWNER to enable notifying the 765 server that a lock_owner4 will no longer be used by the client. 767 o RENEW operation changes to identify the client correctly and allow 768 for additional error returns. 770 o Verify error return possibilities for all operations. 772 o Remove use of the pathname4 data type from LOOKUP and OPEN in 773 favor of having the client construct a sequence of LOOKUP 774 operations to achieve the same effect. 776 2. Protocol Data Types 778 The syntax and semantics to describe the data types of the NFS 779 version 4 protocol are defined in the XDR [RFC4506] and RPC [RFC5531] 780 documents. The next sections build upon the XDR data types to define 781 types and structures specific to this protocol. As a reminder, the 782 size constants and definitive definitions can be found in 783 [RFCNFSv4XDR]. 785 2.1. Basic Data Types 787 These are the base NFSv4 data types. 789 +-----------------+-------------------------------------------------+ 790 | Data Type | Definition | 791 +-----------------+-------------------------------------------------+ 792 | int32_t | typedef int int32_t; | 793 | uint32_t | typedef unsigned int uint32_t; | 794 | int64_t | typedef hyper int64_t; | 795 | uint64_t | typedef unsigned hyper uint64_t; | 796 | attrlist4 | typedef opaque attrlist4<>; | 797 | | Used for file/directory attributes. | 798 | bitmap4 | typedef uint32_t bitmap4<>; | 799 | | Used in attribute array encoding. | 800 | changeid4 | typedef uint64_t changeid4; | 801 | | Used in the definition of change_info4. | 802 | clientid4 | typedef uint64_t clientid4; | 803 | | Shorthand reference to client identification. | 804 | count4 | typedef uint32_t count4; | 805 | | Various count parameters (READ, WRITE, COMMIT). | 806 | length4 | typedef uint64_t length4; | 807 | | Describes LOCK lengths. | 808 | mode4 | typedef uint32_t mode4; | 809 | | Mode attribute data type. 
| 810 | nfs_cookie4 | typedef uint64_t nfs_cookie4; | 811 | | Opaque cookie value for READDIR. | 812 | nfs_fh4 | typedef opaque nfs_fh4<NFS4_FHSIZE>; | 813 | | Filehandle definition. | 814 | nfs_ftype4 | enum nfs_ftype4; | 815 | | Various defined file types. | 816 | nfsstat4 | enum nfsstat4; | 817 | | Return value for operations. | 818 | nfs_lease4 | typedef uint32_t nfs_lease4; | 819 | | Duration of a lease in seconds. | 820 | offset4 | typedef uint64_t offset4; | 821 | | Various offset designations (READ, WRITE, LOCK, | 822 | | COMMIT). | 823 | qop4 | typedef uint32_t qop4; | 824 | | Quality of protection designation in SECINFO. | 825 | sec_oid4 | typedef opaque sec_oid4<>; | 826 | | Security Object Identifier. The sec_oid4 data | 827 | | type is not really opaque. Instead it contains | 828 | | an ASN.1 OBJECT IDENTIFIER as used by GSS-API | 829 | | in the mech_type argument to | 830 | | GSS_Init_sec_context. See [RFC2743] for | 831 | | details. | 832 | seqid4 | typedef uint32_t seqid4; | 833 | | Sequence identifier used for file locking. | 834 | utf8string | typedef opaque utf8string<>; | 835 | | UTF-8 encoding for strings. | 836 | utf8str_cis | typedef utf8string utf8str_cis; | 837 | | Case insensitive UTF-8 string. | 838 | utf8str_cs | typedef utf8string utf8str_cs; | 839 | | Case sensitive UTF-8 string. | 840 | utf8str_mixed | typedef utf8string utf8str_mixed; | 841 | | UTF-8 strings with a case sensitive prefix and | 842 | | a case insensitive suffix. | 843 | component4 | typedef utf8str_cs component4; | 844 | | Represents pathname components. | 845 | linktext4 | typedef opaque linktext4<>; | 846 | | Symbolic link contents ("symbolic link" is | 847 | | defined in an Open Group [openg_symlink] | 848 | | standard). | 849 | ascii_REQUIRED4 | typedef utf8string ascii_REQUIRED4; | 850 | | String is sent as ASCII and thus is | 851 | | automatically UTF-8. | 852 | pathname4 | typedef component4 pathname4<>; | 853 | | Represents path name for fs_locations.
| 854 | nfs_lockid4 | typedef uint64_t nfs_lockid4; | 855 | verifier4 | typedef opaque verifier4[NFS4_VERIFIER_SIZE]; | 856 | | Verifier used for various operations (COMMIT, | 857 | | CREATE, OPEN, READDIR, WRITE) | 858 | | NFS4_VERIFIER_SIZE is defined as 8. | 859 +-----------------+-------------------------------------------------+ 861 End of Base Data Types 862 Table 1 864 2.2. Structured Data Types 866 2.2.1. nfstime4 868 struct nfstime4 { 869 int64_t seconds; 870 uint32_t nseconds; 871 }; 873 The nfstime4 structure gives the number of seconds and nanoseconds 874 since midnight or 0 hour January 1, 1970 Coordinated Universal Time 875 (UTC). Values greater than zero for the seconds field denote dates 876 after the 0 hour January 1, 1970. Values less than zero for the 877 seconds field denote dates before the 0 hour January 1, 1970. In 878 both cases, the nseconds field is to be added to the seconds field 879 for the final time representation. For example, if the time to be 880 represented is one-half second before 0 hour January 1, 1970, the 881 seconds field would have a value of negative one (-1) and the 882 nseconds fields would have a value of one-half second (500000000). 883 Values greater than 999,999,999 for nseconds are considered invalid. 885 This data type is used to pass time and date information. A server 886 converts to and from its local representation of time when processing 887 time values, preserving as much accuracy as possible. If the 888 precision of timestamps stored for a file system object is less than 889 defined, loss of precision can occur. An adjunct time maintenance 890 protocol is recommended to reduce client and server time skew. 892 2.2.2. time_how4 894 enum time_how4 { 895 SET_TO_SERVER_TIME4 = 0, 896 SET_TO_CLIENT_TIME4 = 1 897 }; 899 2.2.3. 
settime4 901 union settime4 switch (time_how4 set_it) { 902 case SET_TO_CLIENT_TIME4: 903 nfstime4 time; 904 default: 905 void; 906 }; 907 The above definitions are used as the attribute definitions to set 908 time values. If set_it is SET_TO_SERVER_TIME4, then the server uses 909 its local representation of time for the time value. 911 2.2.4. specdata4 913 struct specdata4 { 914 uint32_t specdata1; /* major device number */ 915 uint32_t specdata2; /* minor device number */ 916 }; 918 This data type represents additional information for the device file 919 types NF4CHR and NF4BLK. 921 2.2.5. fsid4 923 struct fsid4 { 924 uint64_t major; 925 uint64_t minor; 926 }; 928 This type is the file system identifier that is used as a REQUIRED 929 attribute. 931 2.2.6. fs_location4 933 struct fs_location4 { 934 utf8str_cis server<>; 935 pathname4 rootpath; 936 }; 938 2.2.7. fs_locations4 940 struct fs_locations4 { 941 pathname4 fs_root; 942 fs_location4 locations<>; 943 }; 945 The fs_location4 and fs_locations4 data types are used for the 946 fs_locations RECOMMENDED attribute which is used for migration and 947 replication support. 949 2.2.8. fattr4 951 struct fattr4 { 952 bitmap4 attrmask; 953 attrlist4 attr_vals; 954 }; 955 The fattr4 structure is used to represent file and directory 956 attributes. 958 The bitmap is a counted array of 32 bit integers used to contain bit 959 values. The position of the integer in the array that contains bit n 960 can be computed from the expression (n / 32) and its bit within that 961 integer is (n mod 32). 963 0 1 964 +-----------+-----------+-----------+-- 965 | count | 31 .. 0 | 63 .. 32 | 966 +-----------+-----------+-----------+-- 968 2.2.9. 
change_info4 970 struct change_info4 { 971 bool atomic; 972 changeid4 before; 973 changeid4 after; 974 }; 976 This structure is used with the CREATE, LINK, REMOVE, and RENAME 977 operations to let the client know the value of the change attribute 978 for the directory in which the target file system object resides. 980 2.2.10. clientaddr4 982 struct clientaddr4 { 983 /* see struct rpcb in RFC 1833 */ 984 string r_netid<>; /* network id */ 985 string r_addr<>; /* universal address */ 986 }; 988 The clientaddr4 structure is used as part of the SETCLIENTID 989 operation either to specify the address of the client that is using a 990 client ID or as part of the callback registration. The r_netid and 991 r_addr fields respectively contain a network id and universal 992 address. The network id and universal address concepts together with 993 formats for TCP over IPv4 and TCP over IPv6 are defined in [RFC5665], 994 specifically Tables 2 and 3 and Sections 5.2.3.3 and 5.2.3.4. 996 2.2.11. cb_client4 998 struct cb_client4 { 999 unsigned int cb_program; 1000 clientaddr4 cb_location; 1001 }; 1002 This structure is used by the client to inform the server of its 1003 callback address; it includes the program number and client address. 1005 2.2.12. nfs_client_id4 1007 struct nfs_client_id4 { 1008 verifier4 verifier; 1009 opaque id<NFS4_OPAQUE_LIMIT>; 1010 }; 1012 This structure is part of the arguments to the SETCLIENTID operation. 1014 2.2.13. open_owner4 1016 struct open_owner4 { 1017 clientid4 clientid; 1018 opaque owner<NFS4_OPAQUE_LIMIT>; 1019 }; 1021 This structure is used to identify the owner of open state. 1023 2.2.14. lock_owner4 1025 struct lock_owner4 { 1026 clientid4 clientid; 1027 opaque owner<NFS4_OPAQUE_LIMIT>; 1028 }; 1030 This structure is used to identify the owner of file locking state. 1032 2.2.15.
open_to_lock_owner4 1034 struct open_to_lock_owner4 { 1035 seqid4 open_seqid; 1036 stateid4 open_stateid; 1037 seqid4 lock_seqid; 1038 lock_owner4 lock_owner; 1039 }; 1041 This structure is used for the first LOCK operation done for an 1042 open_owner4. It provides both the open_stateid and lock_owner such 1043 that the transition is made from a valid open_stateid sequence to 1044 that of the new lock_stateid sequence. Using this mechanism avoids 1045 the confirmation of the lock_owner/lock_seqid pair since it is tied 1046 to established state in the form of the open_stateid/open_seqid. 1048 2.2.16. stateid4 1050 struct stateid4 { 1051 uint32_t seqid; 1052 opaque other[NFS4_OTHER_SIZE]; 1053 }; 1055 This structure is used for the various state sharing mechanisms 1056 between the client and server. For the client, this data structure 1057 is read-only. The server is required to increment the seqid field 1058 monotonically at each transition of the stateid. This is important 1059 since the client will inspect the seqid in OPEN stateids to determine 1060 the order of OPEN processing done by the server. 1062 3. RPC and Security Flavor 1064 The NFSv4 protocol is an RPC application that uses RPC version 2 and 1065 the XDR as defined in [RFC5531] and [RFC4506]. The RPCSEC_GSS 1066 security flavors as defined in version 1 ([RFC2203]) and version 2 1067 ([RFC5403]) MUST be implemented as the mechanism to deliver stronger 1068 security for the NFSv4 protocol. However, deployment of RPCSEC_GSS 1069 is optional. 1071 3.1. Ports and Transports 1073 Historically, NFSv2 and NFSv3 servers have resided on port 2049. The 1074 registered port 2049 [RFC3232] for the NFS protocol SHOULD be the 1075 default configuration. Using the registered port for NFS services 1076 means the NFS client will not need to use the RPC binding protocols 1077 as described in [RFC1833]; this will allow NFS to transit firewalls.
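As an illustration of the paragraph above, the following Python sketch shows a client that assumes the registered port and therefore never queries the RPC binding service. It is a minimal sketch, not part of the protocol definition: the names NFS_REGISTERED_PORT, nfs_server_address, and connect_nfs are hypothetical, and TCP is assumed as the transport.

```python
import socket

# Registered port for the NFS protocol, as stated in the text above.
NFS_REGISTERED_PORT = 2049


def nfs_server_address(host, port=None):
    """Return the (host, port) pair a client would connect to.

    When no port is configured, the registered port is assumed, so the
    client has no need to ask an RPC binding service for the port.
    """
    return (host, NFS_REGISTERED_PORT if port is None else port)


def connect_nfs(host, port=None, timeout=5.0):
    """Open a TCP connection to the server (hypothetical helper)."""
    return socket.create_connection(nfs_server_address(host, port),
                                    timeout=timeout)
```

An administrator-configured port still overrides the default, which is consistent with the SHOULD in the text: nfs_server_address("server.example.com", 12049) returns the configured port unchanged.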
1079 Where an NFSv4 implementation supports operation over the IP network 1080 protocol, the supported transport layer between NFS and IP MUST be an 1081 IETF standardized transport protocol that is specified to avoid 1082 network congestion; such transports include TCP and Stream Control 1083 Transmission Protocol (SCTP). To enhance the possibilities for 1084 interoperability, an NFSv4 implementation MUST support operation over 1085 the TCP transport protocol. 1087 If TCP is used as the transport, the client and server SHOULD use 1088 persistent connections. This will prevent the weakening of TCP's 1089 congestion control via short lived connections and will improve 1090 performance for the Wide Area Network (WAN) environment by 1091 eliminating the need for SYN handshakes. 1093 As noted in Section 17, the authentication model for NFSv4 has moved 1094 from machine-based to principal-based. However, this modification of 1095 the authentication model does not imply a technical requirement to 1096 move the TCP connection management model from whole machine-based to 1097 one based on a per user model. In particular, NFS over TCP client 1098 implementations have traditionally multiplexed traffic for multiple 1099 users over a common TCP connection between an NFS client and server. 1100 This has been true, regardless of whether the NFS client is using 1101 AUTH_SYS, AUTH_DH, RPCSEC_GSS or any other flavor. Similarly, NFS 1102 over TCP server implementations have assumed such a model and thus 1103 scale the implementation of TCP connection management in proportion 1104 to the number of expected client machines. It is intended that NFSv4 1105 will not modify this connection management model. NFSv4 clients that 1106 violate this assumption can expect scaling issues on the server and 1107 hence reduced service. 1109 3.1.1. 
Client Retransmission Behavior 1111 When processing an NFSv4 request received over a reliable transport 1112 such as TCP, the NFSv4 server MUST NOT silently drop the request, 1113 except if the established transport connection has been broken. 1114 Given such a contract between NFSv4 clients and servers, clients MUST 1115 NOT retry a request unless one or both of the following are true: 1117 o The transport connection has been broken 1119 o The procedure being retried is the NULL procedure 1121 Since reliable transports, such as TCP, do not always synchronously 1122 inform a peer when the other peer has broken the connection (for 1123 example, when an NFS server reboots), the NFSv4 client may want to 1124 actively "probe" the connection to see if it has been broken. Use of 1125 the NULL procedure is one recommended way to do so. So, when a 1126 client experiences a remote procedure call timeout (of some arbitrary 1127 implementation-specific amount), rather than retrying the remote 1128 procedure call, it could instead issue a NULL procedure call to the 1129 server. If the server has died, the transport connection break will 1130 eventually be indicated to the NFSv4 client. The client can then 1131 reconnect, and then retry the original request. If the NULL 1132 procedure call gets a response, the connection has not broken. The 1133 client can decide to wait longer for the original request's response, 1134 or it can break the transport connection and reconnect before re- 1135 sending the original request. 1137 For callbacks from the server to the client, the same rules apply, 1138 but the server doing the callback becomes the client, and the client 1139 receiving the callback becomes the server. 1141 3.2. Security Flavors 1143 Traditional RPC implementations have included AUTH_NONE, AUTH_SYS, 1144 AUTH_DH, and AUTH_KRB4 as security flavors.
With [RFC2203], an 1145 additional security flavor of RPCSEC_GSS has been introduced, which 1146 uses the functionality of GSS-API [RFC2743]. This allows for the use 1147 of various security mechanisms by the RPC layer without the 1148 additional implementation overhead of adding RPC security flavors. 1149 For NFSv4, the RPCSEC_GSS security flavor MUST be used to enable the 1150 mandatory-to-implement security mechanism. Other flavors, such as 1151 AUTH_NONE, AUTH_SYS, and AUTH_DH, MAY be implemented as well. 1153 3.2.1. Security Mechanisms for NFSv4 1155 RPCSEC_GSS, via GSS-API, supports multiple mechanisms that provide 1156 security services. For interoperability, NFSv4 clients and servers 1157 MUST support the Kerberos V5 security mechanism. 1159 The use of RPCSEC_GSS requires selection of mechanism, quality of 1160 protection (QOP), and service (authentication, integrity, privacy). 1161 For the mandated security mechanisms, NFSv4 specifies that a QOP of 1162 zero is used, leaving it up to the mechanism or the mechanism's 1163 configuration to map QOP zero to an appropriate level of protection. 1164 Each mandated mechanism specifies a minimum set of cryptographic 1165 algorithms for implementing integrity and privacy. NFSv4 clients and 1166 servers MUST be implemented on operating environments that comply 1167 with the required cryptographic algorithms of each required 1168 mechanism. 1170 3.2.1.1. Kerberos V5 as a Security Triple 1172 The Kerberos V5 GSS-API mechanism as described in [RFC4121] MUST be 1173 implemented with the RPCSEC_GSS services as specified in Table 2. 1174 Both client and server MUST support each of the pseudo flavors.
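The combination of mechanism, QOP, and service can be pictured as a small record keyed by pseudo flavor number. The Python sketch below is illustrative only: the SecurityTriple class, the PSEUDO_FLAVORS dictionary, and the triple_for_pseudo_flavor function are hypothetical names, while the numbers, names, OID, and services are copied from Table 2, and the QOP of zero follows Section 3.2.1.

```python
from dataclasses import dataclass

# Kerberos V5 mechanism OID, as listed in Table 2.
KRB5_OID = "1.2.840.113554.1.2.2"


@dataclass(frozen=True)
class SecurityTriple:
    """Hypothetical record for a (mechanism, QOP, service) triple."""
    mechanism_oid: str
    qop: int
    service: str


# Pseudo flavor number -> (name, triple). QOP is zero for the mandated
# mechanism, leaving the level of protection to the mechanism itself.
PSEUDO_FLAVORS = {
    390003: ("krb5", SecurityTriple(KRB5_OID, 0, "rpc_gss_svc_none")),
    390004: ("krb5i", SecurityTriple(KRB5_OID, 0, "rpc_gss_svc_integrity")),
    390005: ("krb5p", SecurityTriple(KRB5_OID, 0, "rpc_gss_svc_privacy")),
}


def triple_for_pseudo_flavor(number):
    """Look up the security triple for a pseudo flavor number."""
    _name, triple = PSEUDO_FLAVORS[number]
    return triple
```

All three entries share one mechanism OID and differ only in the RPCSEC_GSS service, which is why the pseudo flavor number serves only as a mapping aid.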
1176 Mapping pseudo flavor to service 1178 +--------+-------+----------------------+-----------------------+ 1179 | Number | Name | Mechanism's OID | RPCSEC_GSS service | 1180 +--------+-------+----------------------+-----------------------+ 1181 | 390003 | krb5 | 1.2.840.113554.1.2.2 | rpc_gss_svc_none | 1182 | 390004 | krb5i | 1.2.840.113554.1.2.2 | rpc_gss_svc_integrity | 1183 | 390005 | krb5p | 1.2.840.113554.1.2.2 | rpc_gss_svc_privacy | 1184 +--------+-------+----------------------+-----------------------+ 1186 Table 2 1188 Note that the pseudo flavor is presented here as a mapping aid to the 1189 implementer. Because this NFS protocol includes a method to 1190 negotiate security and it understands the GSS-API mechanism, the 1191 pseudo flavor is not needed. The pseudo flavor is needed for NFSv3 1192 since the security negotiation is done via the MOUNT protocol as 1193 described in [RFC2623]. 1195 At the time this document was specified, the Advanced Encryption 1196 Standard (AES) with HMAC-SHA1 was a required algorithm set for 1197 Kerberos V5. In contrast, when NFSv4.0 was first specified in 1198 [RFC3530], weaker algorithm sets were REQUIRED for Kerberos V5, and 1199 were REQUIRED in the NFSv4.0 specification, because the Kerberos V5 1200 specification at the time did not specify stronger algorithms. The 1201 NFSv4 specification does not specify required algorithms for Kerberos 1202 V5, and instead, the implementer is expected to track the evolution 1203 of the Kerberos V5 standard if and when stronger algorithms are 1204 specified. 1206 3.2.1.1.1. Security Considerations for Cryptographic Algorithms in 1207 Kerberos V5 1209 When deploying NFSv4, the strength of the security achieved depends 1210 on the existing Kerberos V5 infrastructure. 
The algorithms of 1211 Kerberos V5 are not directly exposed to or selectable by the client 1212 or server, so there is some due diligence required by the user of 1213 NFSv4 to ensure that security is acceptable where needed. Guidance 1214 is provided in [RFC6649] as to why weak algorithms should be disabled 1215 by default. 1217 3.3. Security Negotiation 1219 With the NFSv4 server potentially offering multiple security 1220 mechanisms, the client needs a method to determine or negotiate which 1221 mechanism is to be used for its communication with the server. The 1222 NFS server can have multiple points within its file system name space 1223 that are available for use by NFS clients. In turn the NFS server 1224 can be configured such that each of these entry points can have 1225 different or multiple security mechanisms in use. 1227 The security negotiation between client and server SHOULD be done 1228 with a secure channel to eliminate the possibility of a third party 1229 intercepting the negotiation sequence and forcing the client and 1230 server to choose a lower level of security than required or desired. 1231 See Section 17 for further discussion. 1233 3.3.1. SECINFO 1235 The SECINFO operation will allow the client to determine, on a per 1236 filehandle basis, what security triple (see [RFC2743]) is to be used 1237 for server access. In general, the client will not have to use the 1238 SECINFO operation except during initial communication with the server 1239 or when the client encounters a new security policy as the client 1240 navigates the name space. Either condition will force the client to 1241 negotiate a new security triple. 1243 3.3.2. Security Error 1245 Based on the assumption that each NFSv4 client and server MUST 1246 support a minimum set of security (i.e., Kerberos-V5 under 1247 RPCSEC_GSS), the NFS client will start its communication with the 1248 server with one of the minimal security triples. 
During 1249 communication with the server, the client can receive an NFS error of 1250 NFS4ERR_WRONGSEC. This error allows the server to notify the client 1251 that the security triple currently being used is not appropriate for 1252 access to the server's file system resources. The client is then 1253 responsible for determining what security triples are available at 1254 the server and choosing one that is appropriate for the client. See 1255 Section 15.33 for further discussion of how the client will respond 1256 to the NFS4ERR_WRONGSEC error and use SECINFO. 1258 3.3.3. Callback RPC Authentication 1260 Except as noted elsewhere in this section, the callback RPC 1261 (described later) MUST mutually authenticate the NFS server to the 1262 principal that acquired the client ID (also described later), using 1263 the security flavor that the original SETCLIENTID operation used. 1265 For AUTH_NONE, there are no principals, so this is a non-issue. 1267 AUTH_SYS has no notions of mutual authentication or a server 1268 principal, so the callback from the server simply uses the AUTH_SYS 1269 credential that the user used when setting up the delegation. 1271 For AUTH_DH, one commonly used convention is that the server uses the 1272 credential corresponding to this AUTH_DH principal: 1274 unix.host@domain 1276 where host and domain are variables corresponding to the name of 1277 the server host and the directory services domain in which it lives, such as a 1278 Network Information System domain or a DNS domain. 1280 Regardless of what security mechanism under RPCSEC_GSS is being used, 1281 the NFS server MUST identify itself in GSS-API via a 1282 GSS_C_NT_HOSTBASED_SERVICE name type. GSS_C_NT_HOSTBASED_SERVICE 1283 names are of the form: 1285 service@hostname 1287 For NFS, the "service" element is 1289 nfs 1291 Implementations of security mechanisms will convert nfs@hostname to 1292 various forms.
   For Kerberos V5, the following form is RECOMMENDED:

      nfs/hostname

   For Kerberos V5, nfs/hostname would be a server principal in the
   Kerberos Key Distribution Center database. This is the same
   principal the client acquired a GSS-API context for when it issued
   the SETCLIENTID operation; therefore, the realm name for the server
   principal must be the same for the callback as it was for the
   SETCLIENTID.

4.  Filehandles

   The filehandle in the NFS protocol is a per server unique identifier
   for a file system object. The contents of the filehandle are opaque
   to the client. Therefore, the server is responsible for translating
   the filehandle to an internal representation of the file system
   object.

4.1.  Obtaining the First Filehandle

   The operations of the NFS protocol are defined in terms of one or
   more filehandles. Therefore, the client needs a filehandle to
   initiate communication with the server. With the NFSv2 protocol
   [RFC1094] and the NFSv3 protocol [RFC1813], there exists an ancillary
   protocol to obtain this first filehandle. The MOUNT protocol, RPC
   program number 100005, provides the mechanism of translating a string
   based file system path name to a filehandle which can then be used by
   the NFS protocols.

   The MOUNT protocol has deficiencies in the area of security and use
   via firewalls. This is one reason that the use of the public
   filehandle was introduced in [RFC2054] and [RFC2055]. With the use
   of the public filehandle in combination with the LOOKUP operation in
   the NFSv2 and NFSv3 protocols, it has been demonstrated that the
   MOUNT protocol is unnecessary for viable interaction between NFS
   client and server.

   Therefore, the NFSv4 protocol will not use an ancillary protocol for
   translation from string based path names to a filehandle.
   Two special filehandles will be used as starting points for the NFS
   client.

4.1.1.  Root Filehandle

   The first of the special filehandles is the ROOT filehandle. The
   ROOT filehandle is the "conceptual" root of the file system name
   space at the NFS server. The client uses or starts with the ROOT
   filehandle by employing the PUTROOTFH operation. The PUTROOTFH
   operation instructs the server to set the "current" filehandle to the
   ROOT of the server's file tree. Once this PUTROOTFH operation is
   used, the client can then traverse the entirety of the server's file
   tree with the LOOKUP operation. A complete discussion of the server
   name space is in Section 7.

4.1.2.  Public Filehandle

   The second special filehandle is the PUBLIC filehandle. Unlike the
   ROOT filehandle, the PUBLIC filehandle may be bound to or represent
   an arbitrary file system object at the server. The server is
   responsible for this binding. It may be that the PUBLIC filehandle
   and the ROOT filehandle refer to the same file system object.
   However, it is up to the administrative software at the server and
   the policies of the server administrator to define the binding of the
   PUBLIC filehandle and server file system object. The client may not
   make any assumptions about this binding. The client uses the PUBLIC
   filehandle via the PUTPUBFH operation.

4.2.  Filehandle Types

   In the NFSv2 and NFSv3 protocols, there was one type of filehandle
   with a single set of semantics, of which the primary one was that it
   was persistent across a server reboot. As such, this type of
   filehandle is termed "persistent" in NFS Version 4. The semantics of
   a persistent filehandle remain the same as before. A new type of
   filehandle introduced in NFS Version 4 is the "volatile" filehandle,
   which attempts to accommodate certain server environments.
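
   As a rough illustration of the starting points described in
   Sections 4.1.1 and 4.1.2, the following Python sketch assembles the
   list of operations a client might place in a single COMPOUND request
   to walk from a special filehandle to a named object and fetch its
   filehandle. The tuple representation and the helper name
   build_first_fh_ops are assumptions made for illustration only; they
   are not the encoding defined by the protocol.

```python
def build_first_fh_ops(path, use_public=False):
    """Return (operation, argument) tuples for one COMPOUND request.

    Hypothetical helper: the tuple layout is illustrative and is not
    the wire format.
    """
    # Start from one of the two special filehandles.
    first = "PUTPUBFH" if use_public else "PUTROOTFH"
    ops = [(first, None)]
    # One LOOKUP per path component; empty components are skipped.
    for component in path.split("/"):
        if component:
            ops.append(("LOOKUP", component))
    # GETFH hands the resulting current filehandle back to the client.
    ops.append(("GETFH", None))
    return ops
```

   For a path such as "/a/b", the sketch yields PUTROOTFH, two LOOKUP
   operations, and a final GETFH, so no ancillary protocol is involved
   in obtaining the first filehandle.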

   The volatile filehandle type was introduced to address server
   functionality or implementation issues which make correct
   implementation of a persistent filehandle infeasible. Some server
   environments do not provide a file system level invariant that can be
   used to construct a persistent filehandle. The underlying server
   file system may not provide the invariant or the server's file system
   programming interfaces may not provide access to the needed
   invariant. Volatile filehandles may ease the implementation of
   server functionality such as hierarchical storage management or file
   system reorganization or migration. However, the volatile filehandle
   increases the implementation burden for the client.

   Since the client will need to handle persistent and volatile
   filehandles differently, a file attribute is defined which may be
   used by the client to determine the filehandle types being returned
   by the server.

4.2.1.  General Properties of a Filehandle

   The filehandle contains all the information the server needs to
   distinguish an individual file. To the client, the filehandle is
   opaque. The client stores filehandles for use in a later request and
   can compare two filehandles from the same server for equality by
   doing a byte-by-byte comparison. However, the client MUST NOT
   otherwise interpret the contents of filehandles. If two filehandles
   from the same server are equal, they MUST refer to the same file.
   However, it is not required that two different filehandles refer to
   different file system objects. Servers SHOULD try to maintain a
   one-to-one correspondence between filehandles and file system objects
   but there may be situations in which the mapping is not one-to-one.
   Clients MUST use filehandle comparisons only to improve performance,
   not for correct behavior.
   All clients need to be prepared for situations in which it cannot be
   determined whether two different filehandles denote the same object
   and in such cases, avoid assuming that objects denoted are different,
   as this might cause incorrect behavior. Further discussion of
   filehandle and attribute comparison in the context of data caching is
   presented in Section 10.3.4.

   As an example, in the case that two different path names when
   traversed at the server terminate at the same file system object, the
   server SHOULD return the same filehandle for each path. This can
   occur if a hard link is used to create two file names which refer to
   the same underlying file object and associated data. For example, if
   paths /a/b/c and /a/d/c refer to the same file, the server SHOULD
   return the same filehandle for both path name traversals.

4.2.2.  Persistent Filehandle

   A persistent filehandle is defined as having a fixed value for the
   lifetime of the file system object to which it refers. Once the
   server creates the filehandle for a file system object, the server
   MUST accept the same filehandle for the object for the lifetime of
   the object. If the server restarts or reboots, the NFS server must
   honor the same filehandle value as it did in the server's previous
   instantiation. Similarly, if the file system is migrated, the new
   NFS server must honor the same filehandle as the old NFS server.

   The persistent filehandle will become stale or invalid when the
   file system object is removed. When the server is presented with a
   persistent filehandle that refers to a deleted object, it MUST return
   an error of NFS4ERR_STALE. A filehandle may become stale when the
   file system containing the object is no longer available.
   The file system may become unavailable if it exists on removable
   media and the media is no longer available at the server or the file
   system in whole has been destroyed or the file system has simply been
   removed from the server's name space (i.e., unmounted in a UNIX
   environment).

4.2.3.  Volatile Filehandle

   A volatile filehandle does not share the same longevity
   characteristics as a persistent filehandle. The server may determine
   that a volatile filehandle is no longer valid at many different
   points in time. If the server can definitively determine that a
   volatile filehandle refers to an object that has been removed, the
   server should return NFS4ERR_STALE to the client (as is the case for
   persistent filehandles). In all other cases where the server
   determines that a volatile filehandle can no longer be used, it
   should return an error of NFS4ERR_FHEXPIRED.

   The REQUIRED attribute "fh_expire_type" is used by the client to
   determine what type of filehandle the server is providing for a
   particular file system. This attribute is a bitmask with the
   following values:

   FH4_PERSISTENT:  The value of FH4_PERSISTENT is used to indicate a
      persistent filehandle, which is valid until the object is removed
      from the file system. The server will not return
      NFS4ERR_FHEXPIRED for this filehandle. FH4_PERSISTENT is defined
      as a value in which none of the bits specified below are set.

   FH4_VOLATILE_ANY:  The filehandle may expire at any time, except as
      specifically excluded (i.e., FH4_NOEXPIRE_WITH_OPEN).

   FH4_NOEXPIRE_WITH_OPEN:  May only be set when FH4_VOLATILE_ANY is
      set. If this bit is set, then the meaning of FH4_VOLATILE_ANY is
      qualified to exclude any expiration of the filehandle when it is
      open.

   FH4_VOL_MIGRATION:  The filehandle will expire as a result of
      migration. If FH4_VOLATILE_ANY is set, FH4_VOL_MIGRATION is
      redundant.

   FH4_VOL_RENAME:  The filehandle will expire during rename. This
      includes a rename by the requesting client or a rename by any
      other client. If FH4_VOLATILE_ANY is set, FH4_VOL_RENAME is
      redundant.

   Servers which provide volatile filehandles that may expire while open
   (i.e., if FH4_VOL_MIGRATION or FH4_VOL_RENAME is set or if
   FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN is not set) should
   deny a RENAME or REMOVE that would affect an OPEN file or any of the
   components leading to the OPEN file. In addition, the server SHOULD
   deny all RENAME or REMOVE requests during the grace period upon
   server restart.

   Note that the bits FH4_VOL_MIGRATION and FH4_VOL_RENAME allow the
   client to determine that expiration has occurred whenever a specific
   event occurs, without an explicit filehandle expiration error from
   the server. FH4_VOLATILE_ANY does not provide this form of
   information. In situations where the server will expire many, but
   not all filehandles upon migration (e.g., all but those that are
   open), FH4_VOLATILE_ANY (in this case with FH4_NOEXPIRE_WITH_OPEN) is
   a better choice since the client may not assume that all filehandles
   will expire when migration occurs, and it is likely that additional
   expirations will occur (as a result of file CLOSE) that are separated
   in time from the migration event itself.

4.2.4.  One Method of Constructing a Volatile Filehandle

   A volatile filehandle, while opaque to the client, could contain:

      [volatile bit = 1 | server boot time | slot | generation number]

   o  slot is an index in the server volatile filehandle table

   o  generation number is the generation number for the table
      entry/slot

   When the client presents a volatile filehandle, the server makes the
   following checks, which assume that the check for the volatile bit
   has passed.
   If the server boot time is less than the current server boot time,
   return NFS4ERR_FHEXPIRED. If slot is out of range, return
   NFS4ERR_BADHANDLE. If the generation number does not match, return
   NFS4ERR_FHEXPIRED.

   When the server reboots, the table is gone (it is volatile).

   If the volatile bit is 0, then it is a persistent filehandle with a
   different structure following it.

4.3.  Client Recovery from Filehandle Expiration

   If possible, the client should recover from the receipt of an
   NFS4ERR_FHEXPIRED error. The client must take on additional
   responsibility so that it may prepare itself to recover from the
   expiration of a volatile filehandle. If the server returns
   persistent filehandles, the client does not need these additional
   steps.

   For volatile filehandles, most commonly the client will need to store
   the component names leading up to and including the file system
   object in question. With these names, the client should be able to
   recover by finding a filehandle in the name space that is still
   available or by starting at the root of the server's file system name
   space.

   If the expired filehandle refers to an object that has been removed
   from the file system, obviously the client will not be able to
   recover from the expired filehandle.

   It is also possible that the expired filehandle refers to a file that
   has been renamed. If the file was renamed by another client, again
   it is possible that the original client will not be able to recover.
   However, in the case that the client itself is renaming the file and
   the file is open, it is possible that the client may be able to
   recover. The client can determine the new path name based on the
   processing of the rename request. The client can then regenerate the
   new filehandle based on the new path name.
   The client could also use the compound operation mechanism to
   construct a set of operations like:

      RENAME A B
      LOOKUP B
      GETFH

   Note that the COMPOUND procedure does not provide atomicity. This
   example only reduces the overhead of recovering from an expired
   filehandle.

5.  Attributes

   To meet the requirements of extensibility and increased
   interoperability with non-UNIX platforms, attributes need to be
   handled in a flexible manner. The NFSv3 fattr3 structure contains a
   fixed list of attributes that not all clients and servers are able to
   support or care about. The fattr3 structure cannot be extended as
   new needs arise and it provides no way to indicate non-support. With
   the NFSv4.0 protocol, the client is able to query what attributes the
   server supports and construct requests with only those supported
   attributes (or a subset thereof).

   To this end, attributes are divided into three groups: REQUIRED,
   RECOMMENDED, and named. Both REQUIRED and RECOMMENDED attributes are
   supported in the NFSv4.0 protocol by a specific and well-defined
   encoding and are identified by number. They are requested by setting
   a bit in the bit vector sent in the GETATTR request; the server
   response includes a bit vector to list what attributes were returned
   in the response. New REQUIRED or RECOMMENDED attributes may be added
   to the NFSv4 protocol as part of a new minor version by publishing a
   Standards Track RFC which allocates a new attribute number value and
   defines the encoding for the attribute. See Section 11 for further
   discussion.

   Named attributes are accessed by the OPENATTR operation, which
   accesses a hidden directory of attributes associated with a file
   system object. OPENATTR takes a filehandle for the object and
   returns the filehandle for the attribute hierarchy.
   The filehandle for the named attributes is a directory object
   accessible by LOOKUP or READDIR and contains files whose names
   represent the named attributes and whose data bytes are the value of
   the attribute. For example:

   +----------+-----------+---------------------------------+
   | LOOKUP   | "foo"     | ; look up file                  |
   | GETATTR  | attrbits  |                                 |
   | OPENATTR |           | ; access foo's named attributes |
   | LOOKUP   | "x11icon" | ; look up specific attribute    |
   | READ     | 0,4096    | ; read stream of bytes          |
   +----------+-----------+---------------------------------+

   Named attributes are intended for data needed by applications rather
   than by an NFS client implementation. NFS implementers are strongly
   encouraged to define their new attributes as RECOMMENDED attributes
   by bringing them to the IETF Standards Track process.

   The set of attributes that are classified as REQUIRED is deliberately
   small since servers need to do whatever it takes to support them. A
   server should support as many of the RECOMMENDED attributes as
   possible but, by their definition, the server is not required to
   support all of them. Attributes are deemed REQUIRED if the data is
   both needed by a large number of clients and is not otherwise
   reasonably computable by the client when support is not provided on
   the server.

   Note that the hidden directory returned by OPENATTR is a convenience
   for protocol processing. The client should not make any assumptions
   about the server's implementation of named attributes and whether or
   not the underlying file system at the server has a named attribute
   directory. Therefore, operations such as SETATTR and GETATTR on the
   named attribute directory are undefined.

5.1.  REQUIRED Attributes

   These MUST be supported by every NFSv4.0 client and server in order
   to ensure a minimum level of interoperability.
   The server MUST store and return these attributes, and the client
   MUST be able to function with an attribute set limited to these
   attributes. With just the REQUIRED attributes, some client
   functionality can be impaired or limited in some ways. A client can
   ask for any of these attributes to be returned by setting a bit in
   the GETATTR request. For each such bit set, the server MUST return
   the corresponding attribute value.

5.2.  RECOMMENDED Attributes

   These attributes are understood well enough to warrant support in the
   NFSv4.0 protocol. However, they may not be supported on all clients
   and servers. A client MAY ask for any of these attributes to be
   returned by setting a bit in the GETATTR request but MUST handle the
   case where the server does not return them. A client MAY ask for the
   set of attributes the server supports and SHOULD NOT request
   attributes the server does not support. A server should be tolerant
   of requests for unsupported attributes and simply not return them
   rather than considering the request an error. It is expected that
   servers will support all attributes they comfortably can and only
   fail to support attributes that are difficult to support in their
   operating environments. A server should provide attributes whenever
   it does not have to "tell lies" to the client. For example, a file
   modification time should be either an accurate time or should not be
   supported by the server. At times this will be difficult for
   clients, but a client is better positioned to decide whether and how
   to fabricate or construct an attribute or whether to do without the
   attribute.

5.3.  Named Attributes

   These attributes are not supported by direct encoding in the NFSv4
   protocol but are accessed by string names rather than numbers and
   correspond to an uninterpreted stream of bytes that are stored with
   the file system object.
   The name space for these attributes may be accessed by using the
   OPENATTR operation. The OPENATTR operation returns a filehandle for
   a virtual "named attribute directory", and further perusal and
   modification of the name space may be done using operations that work
   on more typical directories. In particular, READDIR may be used to
   get a list of such named attributes, and LOOKUP and OPEN may select a
   particular attribute. Creation of a new named attribute may be the
   result of an OPEN specifying file creation.

   Once an OPEN is done, named attributes may be examined and changed by
   normal READ and WRITE operations using the filehandles and stateids
   returned by OPEN.

   Named attributes and the named attribute directory may have their own
   (non-named) attributes. Each of these objects must have all of the
   REQUIRED attributes and may have additional RECOMMENDED attributes.
   However, the set of attributes for named attributes and the named
   attribute directory need not be, and typically will not be, as large
   as that for other objects in that file system.

   Named attributes might be the target of delegations. However, since
   granting of delegations is at the server's discretion, a server need
   not support delegations on named attributes.

   It is RECOMMENDED that servers support arbitrary named attributes. A
   client should not depend on the ability to store any named attributes
   in the server's file system. If a server does support named
   attributes, a client that is also able to handle them should be able
   to copy a file's data and metadata with complete transparency from
   one location to another; this would imply that names allowed for
   regular directory entries are valid for named attribute names as
   well.
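
   The access pattern described in this section can be sketched in
   Python as follows. The tuple representation of operations and the
   helper names are assumptions made for illustration; the sketch
   mirrors the LOOKUP/READ example given in Section 5 and deliberately
   omits the stateid handling that an OPEN-based sequence would involve.

```python
def named_attr_list_ops():
    """Operations to enumerate the named attributes of the current object."""
    return [
        ("OPENATTR", None),  # current filehandle -> named attribute directory
        ("READDIR", None),   # list the attribute names as directory entries
    ]


def named_attr_read_ops(attr_name, offset=0, count=4096):
    """Operations to read one named attribute of the current object."""
    return [
        ("OPENATTR", None),         # switch to the named attribute directory
        ("LOOKUP", attr_name),      # select the attribute by its string name
        ("READ", (offset, count)),  # read the attribute's byte stream
    ]
```

   Because the named attribute directory is handled like an ordinary
   directory, the same operation vocabulary serves for both listing and
   reading attributes.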

   In NFSv4.0, the structure of named attribute directories is
   restricted in a number of ways, in order to prevent the development
   of non-interoperable implementations in which some servers support a
   fully general hierarchical directory structure for named attributes
   while others support a limited but adequate structure for named
   attributes. In such an environment, clients or applications might
   come to depend on non-portable extensions. The restrictions are:

   o  CREATE is not allowed in a named attribute directory. Thus, such
      objects as symbolic links and special files are not allowed to be
      named attributes. Further, directories may not be created in a
      named attribute directory, so no hierarchical structure of named
      attributes for a single object is allowed.

   o  If OPENATTR is done on a named attribute directory or on a named
      attribute, the server MUST return an error.

   o  Doing a RENAME of a named attribute to a different named attribute
      directory or to an ordinary (i.e., non-named-attribute) directory
      is not allowed.

   o  Creating hard links between named attribute directories or between
      named attribute directories and ordinary directories is not
      allowed.

   Names of attributes will not be controlled by this document or other
   IETF Standards Track documents. See Section 18 for further
   discussion.

5.4.  Classification of Attributes

   Each of the attributes accessed using SETATTR and GETATTR (i.e.,
   REQUIRED and RECOMMENDED attributes) can be classified in one of
   three categories:

   1.  per server attributes for which the value of the attribute will
       be the same for all file objects that share the same server.

   2.  per file system attributes for which the value of the attribute
       will be the same for some or all file objects that share the same
       server and fsid attribute (Section 5.8.1.9).
       See below for details regarding when such sharing is in effect.

   3.  per file system object attributes

   The handling of per file system attributes depends on the particular
   attribute and the setting of the homogeneous (Section 5.8.2.12)
   attribute. The following rules apply:

   1.  The values of the attributes supported_attrs, fsid, homogeneous,
       link_support, and symlink_support are always common to all
       objects within the given file system.

   2.  For other attributes, different values may be returned for
       different file system objects if the attribute homogeneous is
       supported within the file system in question and has the value
       false.

   The classification of attributes is as follows. Note that the
   attributes time_access_set and time_modify_set are not listed in this
   section because they are write-only attributes corresponding to
   time_access and time_modify, and are used in a special instance of
   SETATTR.

   o  The per-server attribute is:

         lease_time

   o  The per-file system attributes are:

         supported_attrs, fh_expire_type, link_support, symlink_support,
         unique_handles, aclsupport, cansettime, case_insensitive,
         case_preserving, chown_restricted, files_avail, files_free,
         files_total, fs_locations, homogeneous, maxfilesize, maxname,
         maxread, maxwrite, no_trunc, space_avail, space_free,
         space_total, time_delta

   o  The per-file system object attributes are:

         type, change, size, named_attr, fsid, rdattr_error, filehandle,
         acl, archive, fileid, hidden, maxlink, mimetype, mode,
         numlinks, owner, owner_group, rawdev, space_used, system,
         time_access, time_backup, time_create, time_metadata,
         time_modify, mounted_on_fileid

   For quota_avail_hard, quota_avail_soft, and quota_used, see their
   definitions below for the appropriate classification.

5.5.  Set-Only and Get-Only Attributes

   Some REQUIRED and RECOMMENDED attributes are set-only; i.e., they can
   be set via SETATTR but not retrieved via GETATTR. Similarly, some
   REQUIRED and RECOMMENDED attributes are get-only; i.e., they can be
   retrieved via GETATTR but not set via SETATTR. If a client attempts
   to set a get-only attribute or get a set-only attribute, the server
   MUST return NFS4ERR_INVAL.

5.6.  REQUIRED Attributes - List and Definition References

   The list of REQUIRED attributes appears in Table 3. The meanings of
   the columns of the table are:

   o  Name: The name of the attribute.

   o  Id: The number assigned to the attribute. In the event of
      conflicts between the assigned number and [RFCNFSv4XDR], the
      latter is authoritative, but in such an event, it should be
      resolved with Errata to this document and/or [RFCNFSv4XDR]. See
      [IESG_ERRATA] for the Errata process.

   o  Data Type: The XDR data type of the attribute.

   o  Acc: Access allowed to the attribute. R means read-only (GETATTR
      may retrieve, SETATTR may not set). W means write-only (SETATTR
      may set, GETATTR may not retrieve). R W means read/write (GETATTR
      may retrieve, SETATTR may set).

   o  Defined in: The section of this specification that describes the
      attribute.
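
   The rule in Section 5.5, combined with the Acc column described
   above, can be sketched in Python as follows. The dictionary holds
   only three sample entries copied from Tables 3 and 4, and the
   function name and the use of strings for status codes are
   assumptions made for illustration, not part of the protocol
   definition.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_INVAL = "NFS4ERR_INVAL"

# Sample "Acc" values taken from Tables 3 and 4 (not the full list).
ATTR_ACCESS = {
    "type": "R",             # get-only
    "size": "R W",           # read/write
    "time_access_set": "W",  # set-only
}


def check_attr_access(name, operation):
    """Return the status a server would give for GETATTR or SETATTR."""
    if operation not in ("GETATTR", "SETATTR"):
        raise ValueError("operation must be GETATTR or SETATTR")
    needed = "R" if operation == "GETATTR" else "W"
    allowed = ATTR_ACCESS[name].split()
    # Getting a set-only attribute or setting a get-only one is invalid.
    return NFS4_OK if needed in allowed else NFS4ERR_INVAL
```

   For instance, a SETATTR naming the type attribute and a GETATTR
   naming time_access_set both map to NFS4ERR_INVAL in this sketch,
   while either operation on size is accepted.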

                          REQUIRED attributes

   +-----------------+----+------------+-----+-------------------+
   | Name            | Id | Data Type  | Acc | Defined in:       |
   +-----------------+----+------------+-----+-------------------+
   | supported_attrs | 0  | bitmap4    | R   | Section 5.8.1.1   |
   | type            | 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
   | change          | 3  | changeid4  | R   | Section 5.8.1.4   |
   | size            | 4  | uint64_t   | R W | Section 5.8.1.5   |
   | link_support    | 5  | bool       | R   | Section 5.8.1.6   |
   | symlink_support | 6  | bool       | R   | Section 5.8.1.7   |
   | named_attr      | 7  | bool       | R   | Section 5.8.1.8   |
   | fsid            | 8  | fsid4      | R   | Section 5.8.1.9   |
   | unique_handles  | 9  | bool       | R   | Section 5.8.1.10  |
   | lease_time      | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
   | rdattr_error    | 11 | nfsstat4   | R   | Section 5.8.1.12  |
   | filehandle      | 19 | nfs_fh4    | R   | Section 5.8.1.13  |
   +-----------------+----+------------+-----+-------------------+

                                Table 3

5.7.  RECOMMENDED Attributes - List and Definition References

   The RECOMMENDED attributes are defined in Table 4. The meanings of
   the column headers are the same as Table 3; see Section 5.6 for the
   meanings.
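
   The Id column of Tables 3 and 4 determines which bit a client sets in
   the bit vector of a GETATTR request. The Python sketch below assumes
   the bitmap4 layout described elsewhere in the NFSv4 specification, in
   which attribute n occupies bit (n mod 32) of 32-bit word (n / 32);
   the function names are hypothetical and the sketch does not perform
   XDR encoding.

```python
def attrs_to_bitmap(attr_ids):
    """Build a list of 32-bit words with one bit set per attribute Id."""
    words = []
    for n in attr_ids:
        word, bit = divmod(n, 32)
        # Grow the word array until it can hold this attribute's bit.
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def bitmap_to_attrs(words):
    """Recover the sorted attribute Ids whose bits are set."""
    return [
        index * 32 + bit
        for index, value in enumerate(words)
        for bit in range(32)
        if (value >> bit) & 1
    ]
```

   Requesting type (1) and size (4) fits in a single word, whereas
   mounted_on_fileid (55) falls in the second word, which is why the
   bit vector is an array rather than a single integer.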

                         RECOMMENDED attributes

   +-------------------+----+-----------------+-----+------------------+
   | Name              | Id | Data Type       | Acc | Defined in:      |
   +-------------------+----+-----------------+-----+------------------+
   | acl               | 12 | nfsace4<>       | R W | Section 6.2.1    |
   | aclsupport        | 13 | uint32_t        | R   | Section 6.2.1.2  |
   | archive           | 14 | bool            | R W | Section 5.8.2.1  |
   | cansettime        | 15 | bool            | R   | Section 5.8.2.2  |
   | case_insensitive  | 16 | bool            | R   | Section 5.8.2.3  |
   | case_preserving   | 17 | bool            | R   | Section 5.8.2.4  |
   | chown_restricted  | 18 | bool            | R   | Section 5.8.2.5  |
   | fileid            | 20 | uint64_t        | R   | Section 5.8.2.6  |
   | files_avail       | 21 | uint64_t        | R   | Section 5.8.2.7  |
   | files_free        | 22 | uint64_t        | R   | Section 5.8.2.8  |
   | files_total       | 23 | uint64_t        | R   | Section 5.8.2.9  |
   | fs_locations      | 24 | fs_locations4   | R   | Section 5.8.2.10 |
   | hidden            | 25 | bool            | R W | Section 5.8.2.11 |
   | homogeneous       | 26 | bool            | R   | Section 5.8.2.12 |
   | maxfilesize       | 27 | uint64_t        | R   | Section 5.8.2.13 |
   | maxlink           | 28 | uint32_t        | R   | Section 5.8.2.14 |
   | maxname           | 29 | uint32_t        | R   | Section 5.8.2.15 |
   | maxread           | 30 | uint64_t        | R   | Section 5.8.2.16 |
   | maxwrite          | 31 | uint64_t        | R   | Section 5.8.2.17 |
   | mimetype          | 32 | ascii_          | R W | Section 5.8.2.18 |
   |                   |    | REQUIRED4<>     |     |                  |
   | mode              | 33 | mode4           | R W | Section 6.2.2    |
   | mounted_on_fileid | 55 | uint64_t        | R   | Section 5.8.2.19 |
   | no_trunc          | 34 | bool            | R   | Section 5.8.2.20 |
   | numlinks          | 35 | uint32_t        | R   | Section 5.8.2.21 |
   | owner             | 36 | utf8str_mixed   | R W | Section 5.8.2.22 |
   | owner_group       | 37 | utf8str_mixed   | R W | Section 5.8.2.23 |
   | quota_avail_hard  | 38 | uint64_t        | R   | Section 5.8.2.24 |
   | quota_avail_soft  | 39 | uint64_t        | R   | Section 5.8.2.25 |
   | quota_used        | 40 | uint64_t        | R   | Section 5.8.2.26 |
   | rawdev            | 41 | specdata4       | R   | Section 5.8.2.27 |
   | space_avail       | 42 | uint64_t        | R   | Section 5.8.2.28 |
   | space_free        | 43 | uint64_t        | R   | Section 5.8.2.29 |
   | space_total       | 44 | uint64_t        | R   | Section 5.8.2.30 |
   | space_used        | 45 | uint64_t        | R   | Section 5.8.2.31 |
   | system            | 46 | bool            | R W | Section 5.8.2.32 |
   | time_access       | 47 | nfstime4        | R   | Section 5.8.2.33 |
   | time_access_set   | 48 | settime4        |   W | Section 5.8.2.34 |
   | time_backup       | 49 | nfstime4        | R W | Section 5.8.2.35 |
   | time_create       | 50 | nfstime4        | R W | Section 5.8.2.36 |
   | time_delta        | 51 | nfstime4        | R   | Section 5.8.2.37 |
   | time_metadata     | 52 | nfstime4        | R   | Section 5.8.2.38 |
   | time_modify       | 53 | nfstime4        | R   | Section 5.8.2.39 |
   | time_modify_set   | 54 | settime4        |   W | Section 5.8.2.40 |
   +-------------------+----+-----------------+-----+------------------+

                                Table 4

5.8.  Attribute Definitions

5.8.1.  Definitions of REQUIRED Attributes

5.8.1.1.  Attribute 0: supported_attrs

   The bit vector that would retrieve all REQUIRED and RECOMMENDED
   attributes that are supported for this object. The scope of this
   attribute applies to all objects with a matching fsid.

5.8.1.2.  Attribute 1: type

   Designates the type of an object in terms of one of a number of
   special constants:

   o  NF4REG designates a regular file.

   o  NF4DIR designates a directory.

   o  NF4BLK designates a block device special file.

   o  NF4CHR designates a character device special file.

   o  NF4LNK designates a symbolic link.

   o  NF4SOCK designates a named socket special file.

   o  NF4FIFO designates a fifo special file.

   o  NF4ATTRDIR designates a named attribute directory.

   o  NF4NAMEDATTR designates a named attribute.

   Within the explanatory text and operation descriptions, the following
   phrases will be used with the meanings given below:

   o  The phrase "is a directory" means that the object's type attribute
      is NF4DIR or NF4ATTRDIR.

   o  The phrase "is a special file" means that the object's type
      attribute is NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO.

   o  The phrase "is a regular file" means that the object's type
      attribute is NF4REG or NF4NAMEDATTR.

   o  The phrase "is a symbolic link" means that the object's type
      attribute is NF4LNK.

5.8.1.3.  Attribute 2: fh_expire_type

   The server uses this to specify filehandle expiration behavior to the
   client. See Section 4 for additional description.

5.8.1.4.  Attribute 3: change

   A value created by the server that the client can use to determine if
   file data, directory contents, or attributes of the object have been
   modified. The server MAY return the object's time_metadata attribute
   for this attribute's value but only if the file system object cannot
   be updated more frequently than the resolution of time_metadata.

5.8.1.5.  Attribute 4: size

   The size of the object in bytes.

5.8.1.6.  Attribute 5: link_support

   TRUE, if the object's file system supports hard links.

5.8.1.7.  Attribute 6: symlink_support

   TRUE, if the object's file system supports symbolic links.

5.8.1.8.  Attribute 7: named_attr

   TRUE, if this object has named attributes. In other words, the
   object has a non-empty named attribute directory.

5.8.1.9.  Attribute 8: fsid

   Unique file system identifier for the file system holding this
   object. The fsid attribute has major and minor components, each of
   which is of data type uint64_t.

5.8.1.10.  Attribute 9: unique_handles

   TRUE, if two distinct filehandles are guaranteed to refer to two
   different file system objects.

5.8.1.11.  Attribute 10: lease_time

   Duration of the lease at server in seconds.

5.8.1.12.  Attribute 11: rdattr_error

   Error returned from an attempt to retrieve attributes during a
   READDIR operation.

5.8.1.13.  Attribute 19: filehandle

   The filehandle of this object (primarily for READDIR requests).

5.8.2.  Definitions of Uncategorized RECOMMENDED Attributes

   The definitions of most of the RECOMMENDED attributes follow.
   Collections that share a common category are defined in other
   sections.

5.8.2.1.  Attribute 14: archive

   TRUE, if this file has been archived since the time of last
   modification (deprecated in favor of time_backup).

5.8.2.2.  Attribute 15: cansettime

   TRUE, if the server is able to change the times for a file system
   object as specified in a SETATTR operation.

5.8.2.3.  Attribute 16: case_insensitive

   TRUE, if file name comparisons on this file system are case
   insensitive. This refers only to comparisons, and not to the case in
   which file names are stored.

5.8.2.4.  Attribute 17: case_preserving

   TRUE, if file name case on this file system is preserved. This
   refers only to how file names are stored, and not to how they are
   compared. File names stored in mixed case might be compared using
   either case-insensitive or case-sensitive comparisons.

5.8.2.5.  Attribute 18: chown_restricted

   If TRUE, the server will reject any request to change either the
   owner or the group associated with a file if the caller is not a
   privileged user (for example, "root" in UNIX operating environments
   or, in Windows 2000, the "Take Ownership" privilege).

5.8.2.6.  Attribute 20: fileid

   A number uniquely identifying the file within the file system.

5.8.2.7.  Attribute 21: files_avail

   File slots available to this user on the file system containing this
   object -- this should be the smallest relevant limit.

5.8.2.8.  Attribute 22: files_free

   Free file slots on the file system containing this object -- this
   should be the smallest relevant limit.

5.8.2.9.  Attribute 23: files_total

   Total file slots on the file system containing this object.

5.8.2.10.  Attribute 24: fs_locations

   Locations where this file system may be found. If the server returns
   NFS4ERR_MOVED as an error, this attribute MUST be supported.

   The server specifies the root path for a given server by returning a
   path consisting of zero path components.

5.8.2.11.  Attribute 25: hidden

   TRUE, if the file is considered hidden with respect to the Windows
   API.

5.8.2.12.  Attribute 26: homogeneous

   TRUE, if this object's file system is homogeneous, i.e., all objects
   in the file system (all objects on the server with the same fsid)
   have common values for all per-file-system attributes.

5.8.2.13.  Attribute 27: maxfilesize

   Maximum supported file size for the file system of this object.

5.8.2.14.  Attribute 28: maxlink

   Maximum number of hard links for this object.

5.8.2.15.  Attribute 29: maxname

   Maximum file name size supported for this object.

5.8.2.16.  Attribute 30: maxread

   Maximum amount of data the READ operation will return for this
   object.

5.8.2.17.  Attribute 31: maxwrite

   Maximum amount of data the WRITE operation will accept for this
   object. This attribute SHOULD be supported if the file is writable.
   Lack of this attribute can lead to the client either wasting
   bandwidth or not receiving the best performance.

5.8.2.18.  Attribute 32: mimetype

   MIME media type/subtype of this object.

5.8.2.19.  Attribute 55: mounted_on_fileid

   Like fileid, but if the target filehandle is the root of a file
   system, this attribute represents the fileid of the underlying
   directory.

   UNIX-based operating environments connect a file system into the
   namespace by connecting (mounting) the file system onto the existing
   file object (the mount point, usually a directory) of an existing
   file system.
When the mount point's parent directory is read via an 2107 API like readdir(), the return results are directory entries, each 2108 with a component name and a fileid. The fileid of the mount point's 2109 directory entry will be different from the fileid that the stat() 2110 system call returns. The stat() system call is returning the fileid 2111 of the root of the mounted file system, whereas readdir() is 2112 returning the fileid that stat() would have returned before any file 2113 systems were mounted on the mount point. 2115 Unlike NFSv3, NFSv4.0 allows a client's LOOKUP request to cross other 2116 file systems. The client detects the file system crossing whenever 2117 the filehandle argument of LOOKUP has an fsid attribute different 2118 from that of the filehandle returned by LOOKUP. A UNIX-based client 2119 will consider this a "mount point crossing". UNIX has a legacy 2120 scheme for allowing a process to determine its current working 2121 directory. This relies on readdir() of a mount point's parent and 2122 stat() of the mount point returning fileids as previously described. 2123 The mounted_on_fileid attribute corresponds to the fileid that 2124 readdir() would have returned as described previously. 2126 While the NFSv4.0 client could simply fabricate a fileid 2127 corresponding to what mounted_on_fileid provides (and if the server 2128 does not support mounted_on_fileid, the client has no choice), there 2129 is a risk that the client will generate a fileid that conflicts with 2130 one that is already assigned to another object in the file system. 2131 Instead, if the server can provide the mounted_on_fileid, the 2132 potential for client operational problems in this area is eliminated. 2134 If the server detects that there is no mounted point at the target 2135 file object, then the value for mounted_on_fileid that it returns is 2136 the same as that of the fileid attribute. 
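As a non-normative illustration of the preceding paragraphs, the following sketch (written in Python, which this document does not otherwise use) models an fsid as a (major, minor) pair, reports a file system crossing when the fsid of the directory passed to LOOKUP differs from the fsid of the object returned, and falls back to the fileid attribute when nothing is mounted on the object. The names Fsid, crosses_file_system, select_mounted_on_fileid, and covered_fileid are hypothetical and exist only for this example; they are not defined by the protocol.

```python
from typing import NamedTuple, Optional


class Fsid(NamedTuple):
    """fsid attribute: major and minor components, each a uint64_t."""
    major: int
    minor: int


def crosses_file_system(dir_fsid: Fsid, child_fsid: Fsid) -> bool:
    """Client-side check: LOOKUP crossed into another file system when
    the fsid of the directory differs from the fsid of the result."""
    return dir_fsid != child_fsid


def select_mounted_on_fileid(fileid: int,
                             covered_fileid: Optional[int]) -> int:
    """Server-side choice of the mounted_on_fileid value.

    covered_fileid is the fileid of the underlying directory when the
    object is the root of a mounted file system, or None when nothing
    is mounted there (hypothetical parameter for this sketch).
    """
    if covered_fileid is None:
        return fileid          # no mount point: same as fileid
    return covered_fileid      # what readdir() of the parent reports
```

Under these assumptions, a client that sees crosses_file_system() return True would ask for mounted_on_fileid rather than fabricate a fileid of its own.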
2138 The mounted_on_fileid attribute is RECOMMENDED, so the server SHOULD 2139 provide it if possible, and for a UNIX-based server, this is 2140 straightforward. Usually, mounted_on_fileid will be requested during 2141 a READDIR operation, in which case it is trivial (at least for UNIX- 2142 based servers) to return mounted_on_fileid since it is equal to the 2143 fileid of a directory entry returned by readdir(). If 2144 mounted_on_fileid is requested in a GETATTR operation, the server 2145 should obey an invariant that has it returning a value that is equal 2146 to the file object's entry in the object's parent directory, i.e., 2147 what readdir() would have returned. Some operating environments 2148 allow a series of two or more file systems to be mounted onto a 2149 single mount point. In this case, for the server to obey the 2150 aforementioned invariant, it will need to find the base mount point, 2151 and not the intermediate mount points. 2153 5.8.2.20. Attribute 34: no_trunc 2155 If this attribute is TRUE, then if the client uses a file name longer 2156 than name_max, an error will be returned instead of the name being 2157 truncated. 2159 5.8.2.21. Attribute 35: numlinks 2161 Number of hard links to this object. 2163 5.8.2.22. Attribute 36: owner 2165 The string name of the owner of this object. 2167 5.8.2.23. Attribute 37: owner_group 2169 The string name of the group ownership of this object. 2171 5.8.2.24. Attribute 38: quota_avail_hard 2173 The value in bytes that represents the amount of additional disk 2174 space beyond the current allocation that can be allocated to this 2175 file or directory before further allocations will be refused. It is 2176 understood that this space may be consumed by allocations to other 2177 files or directories. 2179 5.8.2.25. 
Attribute 39: quota_avail_soft 2181 The value in bytes that represents the amount of additional disk 2182 space that can be allocated to this file or directory before the user 2183 may reasonably be warned. It is understood that this space may be 2184 consumed by allocations to other files or directories though there 2185 may exist server side rules as to which other files or directories. 2187 5.8.2.26. Attribute 40: quota_used 2189 The value in bytes that represents the amount of disk space used by 2190 this file or directory and possibly a number of other similar files 2191 or directories, where the set of "similar" meets at least the 2192 criterion that allocating space to any file or directory in the set 2193 will reduce the "quota_avail_hard" of every other file or directory 2194 in the set. 2196 Note that there may be a number of distinct but overlapping sets of 2197 files or directories for which a quota_used value is maintained, 2198 e.g., "all files with a given owner", "all files with a given group 2199 owner", etc. The server is at liberty to choose any of those sets 2200 when providing the content of the quota_used attribute, but should do 2201 so in a repeatable way. The rule may be configured per file system 2202 or may be "choose the set with the smallest quota". 2204 5.8.2.27. Attribute 41: rawdev 2206 Raw device number of file of type NF4BLK or NF4CHR. The device 2207 number is split into major and minor numbers. If the file's type 2208 attribute is not NF4BLK or NF4CHR, this attribute SHOULD NOT be 2209 returned, and any value returned SHOULD NOT be considered useful. 2211 5.8.2.28. Attribute 42: space_avail 2213 Disk space in bytes available to this user on the file system 2214 containing this object -- this should be the smallest relevant limit. 2216 5.8.2.29. Attribute 43: space_free 2218 Free disk space in bytes on the file system containing this object -- 2219 this should be the smallest relevant limit. 2221 5.8.2.30. 
Attribute 44: space_total 2223 Total disk space in bytes on the file system containing this object. 2225 5.8.2.31. Attribute 45: space_used 2227 Number of file system bytes allocated to this object. 2229 5.8.2.32. Attribute 46: system 2231 This attribute is TRUE if this file is a "system" file with respect 2232 to the Windows operating environment. 2234 5.8.2.33. Attribute 47: time_access 2236 The time_access attribute represents the time of last access to the 2237 object by a READ operation sent to the server. The notion of what is 2238 an "access" depends on the server's operating environment and/or the 2239 server's file system semantics. For example, for servers obeying 2240 Portable Operating System Interface (POSIX) semantics, time_access 2241 would be updated only by the READ and READDIR operations and not any 2242 of the operations that modify the content of the object [16], [17], 2243 [read_api], [readdir_api], [write_api]. Of course, setting the 2244 corresponding time_access_set attribute is another way to modify the 2245 time_access attribute. 2247 Whenever the file object resides on a writable file system, the 2248 server should make its best efforts to record time_access into stable 2249 storage. However, to mitigate the performance effects of doing so, 2250 and most especially whenever the server is satisfying the read of the 2251 object's content from its cache, the server MAY cache access time 2252 updates and lazily write them to stable storage. It is also 2253 acceptable to give administrators of the server the option to disable 2254 time_access updates. 2256 5.8.2.34. Attribute 48: time_access_set 2258 Sets the time of last access to the object. SETATTR use only. 2260 5.8.2.35. Attribute 49: time_backup 2262 The time of last backup of the object. 2264 5.8.2.36. Attribute 50: time_create 2266 The time of creation of the object. This attribute does not have any 2267 relation to the traditional UNIX file attribute "ctime" or "change 2268 time". 
2270 5.8.2.37. Attribute 51: time_delta 2272 Smallest useful server time granularity. 2274 5.8.2.38. Attribute 52: time_metadata 2276 The time of last metadata modification of the object. 2278 5.8.2.39. Attribute 53: time_modify 2280 The time of last modification to the object. 2282 5.8.2.40. Attribute 54: time_modify_set 2284 Sets the time of last modification to the object. SETATTR use only. 2286 5.9. Interpreting owner and owner_group 2288 The RECOMMENDED attributes "owner" and "owner_group" (and also users 2289 and groups used as values of the "who" field within nfsace4 2290 structures used in the acl attribute) are represented in the form of 2291 UTF-8 strings. This format avoids use of a representation that is 2292 tied to a particular underlying implementation at the client or 2293 server. Note that Section 6.1 of [RFC2624] provides additional 2294 rationale. It is expected that the client and server will have their 2295 own local representation of owners and groups that is used for local 2296 storage or presentation to the application via APIs that expect such 2297 a representation. Therefore, the protocol requires that when these 2298 attributes are transferred between the client and server, the local 2299 representation is translated to a string of the form 2300 "identifier@dns_domain". This allows clients and servers that do not 2301 use the same local representation to effectively interoperate since 2302 they both use a common syntax that can be interpreted by both. 2304 Similarly, security principals may be represented in different ways 2305 by different security mechanisms. Servers normally translate these 2306 representations into a common format, generally that used by local 2307 storage, to serve as a means of identifying the users corresponding 2308 to these security principals.
When these local identifiers are 2309 translated to the form of the owner attribute, associated with files 2310 created by such principals, they identify, in a common format, the 2311 users associated with each corresponding set of security principals. 2313 The translation used to interpret owner and group strings is not 2314 specified as part of the protocol. This allows various solutions to 2315 be employed. For example, a local translation table may be consulted 2316 that maps a numeric identifier to the user@dns_domain syntax. A name 2317 service may also be used to accomplish the translation. A server may 2318 provide a more general service, not limited by any particular 2319 translation (which would only translate a limited set of possible 2320 strings) by storing the owner and owner_group attributes in local 2321 storage without any translation, or it may augment a translation 2322 method by storing the entire string for attributes for which no 2323 translation is available while using the local representation for 2324 those cases in which a translation is available. 2326 Servers that do not provide support for all possible values of user 2327 and group strings SHOULD return an error (NFS4ERR_BADOWNER) when a 2328 string is presented that has no translation, as the value to be set 2329 for a SETATTR of the owner or owner_group attributes or as part of 2330 the value of the acl attribute. When a server does accept a user or 2331 group string as valid on a SETATTR, it is promising to return that 2332 same string (for which see below) when a corresponding GETATTR is 2333 done, as long as there has been no further change in the 2334 corresponding attribute before the GETATTR. For some 2335 internationalization-related exceptions where this is not possible, 2336 see below.
Configuration changes (including changes from the mapping 2337 of the string to the local representation) and ill-constructed name 2338 translations (those that contain aliasing) may make that promise 2339 impossible to honor. Servers should make appropriate efforts to 2340 avoid a situation in which these attributes have their values changed 2341 when no real change to either ownership or acls has occurred. 2343 The "dns_domain" portion of the owner string is meant to be a DNS 2344 domain name. For example, "user@example.org". Servers should accept 2345 as valid a set of users for at least one domain. A server may treat 2346 other domains as having no valid translations. A more general 2347 service is provided when a server is capable of accepting users for 2348 multiple domains, or for all domains, subject to security 2349 constraints. 2351 As an implementation guide, both clients and servers may provide a 2352 means to configure the "dns_domain" portion of the owner string. For 2353 example, the DNS domain name of the host running the NFS server might 2354 be "lab.example.org", but the user names are defined in 2355 "example.org". In the absence of such a configuration, or as a 2356 default, the current DNS domain name of the server should be the 2357 value used for the "dns_domain". 2359 As mentioned above, it is desirable that a server when accepting a 2360 string of the form "user@domain" or "group@domain" in an attribute, 2361 return this same string when that corresponding attribute is fetched. 2362 Internationalization issues make this impossible under certain 2363 circumstances and the client needs to take note of these. See 2364 Section 12 for a detailed discussion of these issues. 2366 In the case where there is no translation available to the client or 2367 server, the attribute value will be constructed without the "@". 
2368 Therefore, the absence of the "@" from the owner or owner_group 2369 attribute signifies that no translation was available at the sender 2370 and that the receiver of the attribute should not use that string as 2371 a basis for translation into its own internal format. Even though 2372 the attribute value cannot be translated, it may still be useful. In 2373 the case of a client, the attribute string may be used for local 2374 display of ownership. 2376 To provide a greater degree of compatibility with NFSv3, which 2377 identified users and groups by 32-bit unsigned user identifiers and 2378 group identifiers, owner and group strings that consist of ASCII- 2379 encoded decimal numeric values with no leading zeros can be given a 2380 special interpretation by clients and servers that choose to provide 2381 such support. The receiver may treat such a user or group string as 2382 representing the same user as would be represented by an NFSv3 uid or 2383 gid having the corresponding numeric value. 2385 A server SHOULD reject such a numeric value if the security mechanism 2386 is using Kerberos. I.e., in such a scenario, the client will already 2387 need to form "user@domain" strings. For any other security 2388 mechanism, the server SHOULD accept such numeric values. As an 2389 implementation note, the server could make such an acceptance be 2390 configurable. If the server does not support numeric values or if it 2391 is configured off, then it MUST return an NFS4ERR_BADOWNER error. If 2392 the security mechanism is using Kerberos and the client attempts to 2393 use the special form, then the server SHOULD return an 2394 NFS4ERR_BADOWNER error when there is a valid translation for the user 2395 or owner designated in this way. In that case, the client must use 2396 the appropriate user@domain string and not the special form for 2397 compatibility. 2399 The client MUST always accept numeric values if the security 2400 mechanism is not RPCSEC_GSS. 
A client can determine if a server 2401 supports numeric identifiers by first attempting to provide a numeric 2402 identifier. If this attempt is rejected with an NFS4ERR_BADOWNER error, 2403 then the client should only use named identifiers of the form 2404 "user@dns_domain". 2406 The owner string "nobody" may be used to designate an anonymous user, 2407 which will be associated with a file created by a security principal 2408 that cannot be mapped through normal means to the owner attribute. 2410 5.10. Character Case Attributes 2412 With respect to the case_insensitive and case_preserving attributes, 2413 case-insensitive comparisons of Unicode characters SHOULD use Unicode 2414 Default Case Folding as defined in Chapter 3 of the Unicode Standard 2415 [UNICODE], and MAY override that behavior for specific selected 2416 characters with the case folding defined in the SpecialCasing.txt 2417 [SPECIALCASING] file in Section 3.13 of the Unicode Standard. 2419 The SpecialCasing.txt file replaces the Default Case Folding with 2420 locale and context-dependent case folding for specific situations. 2421 An example of locale and context-dependent case folding is that LATIN 2422 CAPITAL LETTER I ("I", U+0049) is default case folded to LATIN SMALL 2423 LETTER I ("i", U+0069); however, several languages (e.g., Turkish) 2424 treat an "I" character with a dot as a different letter than an "I" 2425 character without a dot; therefore, in such languages, unless an I is 2426 before a dot_above, the "I" (U+0049) character should be case folded 2427 to a different character, LATIN SMALL LETTER DOTLESS I (U+0131). 2429 The [UNICODE] and [SPECIALCASING] references in this RFC are for 2430 version 6.3.0 of the Unicode standard, as that was the latest version 2431 of Unicode when this RFC was published. Implementations SHOULD 2432 always use the latest version of Unicode (http://www.unicode.org/ 2433 versions/latest/).
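As a non-normative illustration of a comparison based on Unicode Default Case Folding, the sketch below uses Python's built-in str.casefold(). Two things are assumptions of this example rather than statements of this document: the helper name names_match_case_insensitive is hypothetical, and the Unicode version applied is whatever the Python interpreter ships with, which need not be 6.3.0. The sketch applies neither the locale-dependent overrides discussed above nor any normalization.

```python
def names_match_case_insensitive(a: str, b: str) -> bool:
    """Compare two file names using Unicode default (full) case folding.

    No Turkic or other locale-specific override is applied, so "I"
    folds to "i" (U+0069) and never to dotless i (U+0131).
    """
    return a.casefold() == b.casefold()
```

A server that sets case_insensitive to TRUE and uses only the default folding would, under this sketch, treat "README.TXT" and "readme.txt" as the same name, while "I" and U+0131 remain distinct.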
2435 [RFC Editor: please check that 6.3.0 is the latest version before 2436 publication of this document as an RFC.] 2438 6. Access Control Attributes 2440 Access Control Lists (ACLs) are file attributes that specify fine- 2441 grained access control. This chapter covers the "acl", "aclsupport", 2442 and "mode" file attributes and their interactions. Note that file 2443 attributes may apply to any file system object. 2445 6.1. Goals 2447 ACLs and modes represent two well-established models for specifying 2448 permissions. This chapter specifies requirements that attempt to 2449 meet the following goals: 2451 o If a server supports the mode attribute, it should provide 2452 reasonable semantics to clients that only set and retrieve the 2453 mode attribute. 2455 o If a server supports ACL attributes, it should provide reasonable 2456 semantics to clients that only set and retrieve those attributes. 2458 o On servers that support the mode attribute, if ACL attributes have 2459 never been set on an object, via inheritance or explicitly, the 2460 behavior should be traditional UNIX-like behavior. 2462 o On servers that support the mode attribute, if the ACL attributes 2463 have been previously set on an object, either explicitly or via 2464 inheritance: 2466 * Setting only the mode attribute should effectively control the 2467 traditional UNIX-like permissions of read, write, and execute 2468 on owner, owner_group, and other. 2470 * Setting only the mode attribute should provide reasonable 2471 security. For example, setting a mode of 000 should be enough 2472 to ensure that future opens for read or write by any principal 2473 fail, regardless of a previously existing or inherited ACL. 2475 o When a mode attribute is set on an object, the ACL attributes may 2476 need to be modified so as to not conflict with the new mode. In 2477 such cases, it is desirable that the ACL keep as much information 2478 as possible.
This includes information about inheritance, AUDIT 2479 and ALARM ACEs, and permissions granted and denied that do not 2480 conflict with the new mode. 2482 6.2. File Attributes Discussion 2484 Support for each of the ACL attributes is RECOMMENDED and not 2485 required, since file systems accessed using NFSv4 might not support 2486 ACLs. 2488 6.2.1. Attribute 12: acl 2490 The NFSv4.0 ACL attribute contains an array of access control entries 2491 (ACEs) that are associated with the file system object. Although the 2492 client can read and write the acl attribute, the server is 2493 responsible for using the ACL to perform access control. The client 2494 can use the OPEN or ACCESS operations to check access without 2495 modifying or reading data or metadata. 2497 The NFS ACE structure is defined as follows:
2499 typedef uint32_t acetype4;
2501 typedef uint32_t aceflag4;
2503 typedef uint32_t acemask4;
2505 struct nfsace4 {
2506         acetype4        type;
2507         aceflag4        flag;
2508         acemask4        access_mask;
2509         utf8str_mixed   who;
2510 };
2512 To determine if a request succeeds, the server processes each nfsace4 2513 entry in order. Only ACEs that have a "who" that matches the 2514 requester are considered. Each ACE is processed until all of the 2515 bits of the requester's access have been ALLOWED. Once a bit (see 2516 below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer 2517 considered in the processing of later ACEs. If an ACCESS_DENIED_ACE 2518 is encountered where the requester's access still has unALLOWED bits 2519 in common with the "access_mask" of the ACE, the request is denied. 2520 When the ACL is fully processed, if there are bits in the requester's 2521 mask that have not been ALLOWED or DENIED, access is denied. 2523 Unlike the ALLOW and DENY ACE types, the ALARM and AUDIT ACE types do 2524 not affect a requester's access, and instead are for triggering 2525 events as a result of a requester's access attempt.
Therefore, AUDIT 2526 and ALARM ACEs are processed only after processing ALLOW and DENY 2527 ACEs. 2529 The NFSv4.0 ACL model is quite rich. Some server platforms may 2530 provide access control functionality that goes beyond the UNIX-style 2531 mode attribute, but which is not as rich as the NFS ACL model. So 2532 that users can take advantage of this more limited functionality, the 2533 server may support the acl attributes by mapping between its ACL 2534 model and the NFSv4.0 ACL model. Servers must ensure that the ACL 2535 they actually store or enforce is at least as strict as the NFSv4 ACL 2536 that was set. It is tempting to accomplish this by rejecting any ACL 2537 that falls outside the small set that can be represented accurately. 2538 However, such an approach can render ACLs unusable without special 2539 client-side knowledge of the server's mapping, which defeats the 2540 purpose of having a common NFSv4 ACL protocol. Therefore servers 2541 should accept every ACL that they can without compromising security. 2542 To help accomplish this, servers may make a special exception, in the 2543 case of unsupported permission bits, to the rule that bits not 2544 ALLOWED or DENIED by an ACL must be denied. For example, a UNIX- 2545 style server might choose to silently allow read attribute 2546 permissions even though an ACL does not explicitly allow those 2547 permissions. (An ACL that explicitly denies permission to read 2548 attributes should still result in a denial.) 2550 The situation is complicated by the fact that a server may have 2551 multiple modules that enforce ACLs. For example, the enforcement for 2552 NFSv4.0 access may be different from, but not weaker than, the 2553 enforcement for local access, and both may be different from the 2554 enforcement for access through other protocols such as Server Message 2555 Block (SMB) [MS-SMB]. So it may be useful for a server to accept an 2556 ACL even if not all of its modules are able to support it. 
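The ordered ACE processing described earlier in this section can be sketched as follows. This is a non-normative illustration under stated assumptions, not a definitive implementation: it is written in Python rather than XDR, the Ace class and the access_permitted function are hypothetical names, the "who" match is reduced to an exact string comparison, the flag field is omitted, and none of the server-specific exceptions for unsupported permission bits are modeled.

```python
from dataclasses import dataclass

# ACE type values, as defined for acetype4 in this document.
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001
ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002

# A few access mask bits, as defined for acemask4 in this document.
ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002


@dataclass
class Ace:
    """Simplified stand-in for nfsace4 (the flag field is omitted)."""
    type: int
    access_mask: int
    who: str


def access_permitted(acl, requester, requested_mask):
    """Return True only if every requested bit is ALLOWED before any
    matching DENY ACE names a bit that is still unALLOWED."""
    remaining = requested_mask
    for ace in acl:
        if ace.who != requester:  # assumption: exact-match "who"
            continue
        if ace.type == ACE4_ACCESS_ALLOWED_ACE_TYPE:
            remaining &= ~ace.access_mask  # these bits are now ALLOWED
        elif ace.type == ACE4_ACCESS_DENIED_ACE_TYPE:
            if remaining & ace.access_mask:
                return False  # a still-needed bit is DENIED
        # AUDIT and ALARM ACEs do not affect the access decision.
        if remaining == 0:
            return True
    # Bits neither ALLOWED nor DENIED result in denial.
    return remaining == 0
```

With an ACL that lists a DENY of ACE4_WRITE_DATA followed by an ALLOW of ACE4_READ_DATA and ACE4_WRITE_DATA for the same requester, this sketch permits a read-only request and refuses any request that includes write access, because the DENY ACE is reached first.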
2558 The guiding principle with regard to NFSv4 access is that the server 2559 must not accept ACLs that give an appearance of more restricted 2560 access to a file than what is actually enforced. 2562 6.2.1.1. ACE Type 2564 The constants used for the type field (acetype4) are as follows:
2566 const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000;
2567 const ACE4_ACCESS_DENIED_ACE_TYPE  = 0x00000001;
2568 const ACE4_SYSTEM_AUDIT_ACE_TYPE   = 0x00000002;
2569 const ACE4_SYSTEM_ALARM_ACE_TYPE   = 0x00000003;
2571 All four bit types are permitted in the acl attribute.
2573 +------------------------------+--------------+---------------------+
2574 | Value                        | Abbreviation | Description         |
2575 +------------------------------+--------------+---------------------+
2576 | ACE4_ACCESS_ALLOWED_ACE_TYPE | ALLOW        | Explicitly grants   |
2577 |                              |              | the access defined  |
2578 |                              |              | in acemask4 to the  |
2579 |                              |              | file or directory.  |
2580 | ACE4_ACCESS_DENIED_ACE_TYPE  | DENY         | Explicitly denies   |
2581 |                              |              | the access defined  |
2582 |                              |              | in acemask4 to the  |
2583 |                              |              | file or directory.  |
2584 | ACE4_SYSTEM_AUDIT_ACE_TYPE   | AUDIT        | LOG (in a system    |
2585 |                              |              | dependent way) any  |
2586 |                              |              | access attempt to a |
2587 |                              |              | file or directory   |
2588 |                              |              | which uses any of   |
2589 |                              |              | the access methods  |
2590 |                              |              | specified in        |
2591 |                              |              | acemask4.           |
2592 | ACE4_SYSTEM_ALARM_ACE_TYPE   | ALARM        | Generate a system   |
2593 |                              |              | ALARM (system       |
2594 |                              |              | dependent) when any |
2595 |                              |              | access attempt is   |
2596 |                              |              | made to a file or   |
2597 |                              |              | directory for the   |
2598 |                              |              | access methods      |
2599 |                              |              | specified in        |
2600 |                              |              | acemask4.           |
2601 +------------------------------+--------------+---------------------+
2603 The "Abbreviation" column denotes how the types will be referred to 2604 throughout the rest of this chapter. 2606 6.2.1.2. Attribute 13: aclsupport 2608 A server need not support all of the above ACE types.
This attribute 2609 indicates which ACE types are supported for the current file system. 2610 The bitmask constants used to represent the above definitions within 2611 the aclsupport attribute are as follows:
2613 const ACL4_SUPPORT_ALLOW_ACL = 0x00000001;
2614 const ACL4_SUPPORT_DENY_ACL  = 0x00000002;
2615 const ACL4_SUPPORT_AUDIT_ACL = 0x00000004;
2616 const ACL4_SUPPORT_ALARM_ACL = 0x00000008;
2618 Servers which support either the ALLOW or DENY ACE type SHOULD 2619 support both ALLOW and DENY ACE types. 2621 Clients should not attempt to set an ACE unless the server claims 2622 support for that ACE type. If the server receives a request to set 2623 an ACE that it cannot store, it MUST reject the request with 2624 NFS4ERR_ATTRNOTSUPP. If the server receives a request to set an ACE 2625 that it can store but cannot enforce, the server SHOULD reject the 2626 request with NFS4ERR_ATTRNOTSUPP. 2628 6.2.1.3. ACE Access Mask 2630 The bitmask constants used for the access mask field are as follows:
2632 const ACE4_READ_DATA         = 0x00000001;
2633 const ACE4_LIST_DIRECTORY    = 0x00000001;
2634 const ACE4_WRITE_DATA        = 0x00000002;
2635 const ACE4_ADD_FILE          = 0x00000002;
2636 const ACE4_APPEND_DATA       = 0x00000004;
2637 const ACE4_ADD_SUBDIRECTORY  = 0x00000004;
2638 const ACE4_READ_NAMED_ATTRS  = 0x00000008;
2639 const ACE4_WRITE_NAMED_ATTRS = 0x00000010;
2640 const ACE4_EXECUTE           = 0x00000020;
2641 const ACE4_DELETE_CHILD      = 0x00000040;
2642 const ACE4_READ_ATTRIBUTES   = 0x00000080;
2643 const ACE4_WRITE_ATTRIBUTES  = 0x00000100;

2645 const ACE4_DELETE            = 0x00010000;
2646 const ACE4_READ_ACL          = 0x00020000;
2647 const ACE4_WRITE_ACL         = 0x00040000;
2648 const ACE4_WRITE_OWNER       = 0x00080000;
2649 const ACE4_SYNCHRONIZE       = 0x00100000;
2651 Note that some masks have coincident values, for example, 2652 ACE4_READ_DATA and ACE4_LIST_DIRECTORY.
The mask entries 2653 ACE4_LIST_DIRECTORY, ACE4_ADD_FILE, and ACE4_ADD_SUBDIRECTORY are 2654 intended to be used with directory objects, while ACE4_READ_DATA, 2655 ACE4_WRITE_DATA, and ACE4_APPEND_DATA are intended to be used with 2656 non-directory objects. 2658 6.2.1.3.1. Discussion of Mask Attributes 2660 ACE4_READ_DATA 2662 Operation(s) affected: 2664 READ 2666 OPEN 2668 Discussion: 2670 Permission to read the data of the file. 2672 Servers SHOULD allow a user the ability to read the data of the 2673 file when only the ACE4_EXECUTE access mask bit is set. 2675 ACE4_LIST_DIRECTORY 2677 Operation(s) affected: 2679 READDIR 2681 Discussion: 2683 Permission to list the contents of a directory. 2685 ACE4_WRITE_DATA 2687 Operation(s) affected: 2689 WRITE 2691 OPEN 2693 SETATTR of size 2695 Discussion: 2697 Permission to modify a file's data. 2699 ACE4_ADD_FILE 2701 Operation(s) affected: 2703 CREATE 2705 LINK 2707 OPEN 2709 RENAME 2711 Discussion: 2713 Permission to add a new file in a directory. The CREATE 2714 operation is affected when nfs_ftype4 is NF4LNK, NF4BLK, 2715 NF4CHR, NF4SOCK, or NF4FIFO. (NF4DIR is not listed because it 2716 is covered by ACE4_ADD_SUBDIRECTORY.) OPEN is affected when 2717 used to create a regular file. LINK and RENAME are always 2718 affected. 2720 ACE4_APPEND_DATA 2722 Operation(s) affected: 2724 WRITE 2726 OPEN 2728 SETATTR of size 2730 Discussion: 2732 The ability to modify a file's data, but only starting at EOF. 2733 This allows for the notion of append-only files, by allowing 2734 ACE4_APPEND_DATA and denying ACE4_WRITE_DATA to the same user 2735 or group. If a file has an ACL such as the one described above 2736 and a WRITE request is made for somewhere other than EOF, the 2737 server SHOULD return NFS4ERR_ACCESS. 2739 ACE4_ADD_SUBDIRECTORY 2741 Operation(s) affected: 2743 CREATE 2745 RENAME 2747 Discussion: 2749 Permission to create a subdirectory in a directory. The CREATE 2750 operation is affected when nfs_ftype4 is NF4DIR. 
         The RENAME operation is always affected.

   ACE4_READ_NAMED_ATTRS

      Operation(s) affected:

         OPENATTR

      Discussion:

         Permission to read the named attributes of a file or to look up
         the named attribute directory. OPENATTR is affected when it
         is not used to create a named attribute directory. This is
         the case when (1) createdir is TRUE, but a named attribute
         directory already exists, or (2) createdir is FALSE.

   ACE4_WRITE_NAMED_ATTRS

      Operation(s) affected:

         OPENATTR

      Discussion:

         Permission to write the named attributes of a file or to create
         a named attribute directory. OPENATTR is affected when it is
         used to create a named attribute directory. This is the case
         when createdir is TRUE and no named attribute directory exists.
         The ability to check whether or not a named attribute directory
         exists depends on the ability to look it up; therefore, users
         also need the ACE4_READ_NAMED_ATTRS permission in order to
         create a named attribute directory.

   ACE4_EXECUTE

      Operation(s) affected:

         READ

      Discussion:

         Permission to execute a file.

         Servers SHOULD allow a user the ability to read the data of the
         file when only the ACE4_EXECUTE access mask bit is set. This
         is because there is no way to execute a file without reading
         the contents. Though a server may treat ACE4_EXECUTE and
         ACE4_READ_DATA bits identically when deciding to permit a READ
         operation, it SHOULD still allow the two bits to be set
         independently in ACLs, and MUST distinguish between them when
         replying to ACCESS operations. In particular, servers SHOULD
         NOT silently turn on one of the two bits when the other is set,
         as that would make it impossible for the client to correctly
         enforce the distinction between read and execute permissions.
         As an example, following a SETATTR of the following ACL:

            nfsuser:ACE4_EXECUTE:ALLOW

         A subsequent GETATTR of ACL for that file SHOULD return:

            nfsuser:ACE4_EXECUTE:ALLOW

         Rather than:

            nfsuser:ACE4_EXECUTE/ACE4_READ_DATA:ALLOW

   ACE4_EXECUTE

      Operation(s) affected:

         LOOKUP
         OPEN
         REMOVE
         RENAME
         LINK
         CREATE

      Discussion:

         Permission to traverse/search a directory.

   ACE4_DELETE_CHILD

      Operation(s) affected:

         REMOVE
         RENAME

      Discussion:

         Permission to delete a file or directory within a directory.
         See Section 6.2.1.3.2 for information on how ACE4_DELETE and
         ACE4_DELETE_CHILD interact.

   ACE4_READ_ATTRIBUTES

      Operation(s) affected:

         GETATTR of file system object attributes
         VERIFY
         NVERIFY
         READDIR

      Discussion:

         The ability to read basic attributes (non-ACLs) of a file. On
         a UNIX system, basic attributes can be thought of as the stat
         level attributes. Allowing this access mask bit would mean the
         entity can execute "ls -l" and stat. If a READDIR operation
         requests attributes, this mask must be allowed for the READDIR
         to succeed.

   ACE4_WRITE_ATTRIBUTES

      Operation(s) affected:

         SETATTR of time_access_set, time_backup, time_create,
         time_modify_set, mimetype, hidden, system

      Discussion:

         Permission to change the times associated with a file or
         directory to an arbitrary value. Also permission to change the
         mimetype, hidden, and system attributes. A user having
         ACE4_WRITE_DATA or ACE4_WRITE_ATTRIBUTES will be allowed to set
         the times associated with a file to the current server time.

   ACE4_DELETE

      Operation(s) affected:

         REMOVE

      Discussion:

         Permission to delete the file or directory. See
         Section 6.2.1.3.2 for information on how ACE4_DELETE and
         ACE4_DELETE_CHILD interact.
   ACE4_READ_ACL

      Operation(s) affected:

         GETATTR of acl
         NVERIFY
         VERIFY

      Discussion:

         Permission to read the ACL.

   ACE4_WRITE_ACL

      Operation(s) affected:

         SETATTR of acl and mode

      Discussion:

         Permission to write the acl and mode attributes.

   ACE4_WRITE_OWNER

      Operation(s) affected:

         SETATTR of owner and owner_group

      Discussion:

         Permission to write the owner and owner_group attributes. On
         UNIX systems, this is the ability to execute chown() and
         chgrp().

   ACE4_SYNCHRONIZE

      Operation(s) affected:

         NONE

      Discussion:

         Permission to use the file object as a synchronization
         primitive for interprocess communication. This permission is
         not enforced or interpreted by the NFSv4.0 server on behalf of
         the client.

         Typically, the ACE4_SYNCHRONIZE permission is only meaningful
         on local file systems, i.e., file systems not accessed via
         NFSv4.0. The reason that the permission bit exists is that
         some operating environments, such as Windows, use
         ACE4_SYNCHRONIZE.

         For example, if a client copies a file that has
         ACE4_SYNCHRONIZE set from a local file system to an NFSv4.0
         server, and then later copies the file from the NFSv4.0 server
         to a local file system, it is likely that if ACE4_SYNCHRONIZE
         was set in the original file, the client will want it set in
         the second copy. The first copy will not have the permission
         set unless the NFSv4.0 server has the means to set the
         ACE4_SYNCHRONIZE bit. The second copy will not have the
         permission set unless the NFSv4.0 server has the means to
         retrieve the ACE4_SYNCHRONIZE bit.

   Server implementations need not provide the granularity of control
   that is implied by this list of masks.
   For example, POSIX-based
   systems might not distinguish ACE4_APPEND_DATA (the ability to append
   to a file) from ACE4_WRITE_DATA (the ability to modify existing
   contents); both masks would be tied to a single "write" permission.
   When such a server returns attributes to the client, it would show
   both ACE4_APPEND_DATA and ACE4_WRITE_DATA if and only if the write
   permission is enabled.

   If a server receives a SETATTR request that it cannot accurately
   implement, it should err in the direction of more restricted access,
   except in the previously discussed cases of execute and read. For
   example, suppose a server cannot distinguish overwriting data from
   appending new data, as described in the previous paragraph. If a
   client submits an ALLOW ACE where ACE4_APPEND_DATA is set but
   ACE4_WRITE_DATA is not (or vice versa), the server should either turn
   off ACE4_APPEND_DATA or reject the request with NFS4ERR_ATTRNOTSUPP.

6.2.1.3.2. ACE4_DELETE vs. ACE4_DELETE_CHILD

   Two access mask bits govern the ability to delete a directory entry:
   ACE4_DELETE on the object itself (the "target"), and
   ACE4_DELETE_CHILD on the containing directory (the "parent").

   Many systems also take the "sticky bit" (MODE4_SVTX) on a directory
   to allow unlink only to a user that owns either the target or the
   parent; on some such systems the decision also depends on whether the
   target is writable.

   Servers SHOULD allow unlink if either ACE4_DELETE is permitted on the
   target, or ACE4_DELETE_CHILD is permitted on the parent. (Note that
   this is true even if the parent or target explicitly denies the other
   of these permissions.)

   If the ACLs in question neither explicitly ALLOW nor DENY either of
   the above, and if MODE4_SVTX is not set on the parent, then the
   server SHOULD allow the removal if and only if ACE4_ADD_FILE is
   permitted.
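   The unlink rules above can be sketched as a small decision function.
   This is an illustration only, under stated assumptions: the names
   Perm and may_unlink are hypothetical, MODE4_SVTX is taken to be clear
   on the parent, ACE4_ADD_FILE is read as a permission on the parent
   directory, and the case the text leaves open (one permission
   explicitly denied, the other unspecified) is treated as a denial.

```python
from enum import Enum


class Perm(Enum):
    """Outcome of ACL evaluation for one bit (hypothetical three-state model)."""
    ALLOW = "allow"
    DENY = "deny"
    UNSPECIFIED = "unspecified"


def may_unlink(delete_on_target, delete_child_on_parent, add_file_on_parent):
    """Decide whether an unlink is allowed, assuming MODE4_SVTX is not set.

    delete_on_target:        Perm for ACE4_DELETE on the target
    delete_child_on_parent:  Perm for ACE4_DELETE_CHILD on the parent
    add_file_on_parent:      bool, whether ACE4_ADD_FILE is permitted
    """
    # Either permission being granted is sufficient, even if the other
    # one is explicitly denied.
    if Perm.ALLOW in (delete_on_target, delete_child_on_parent):
        return True
    # Neither ACL says anything: fall back to ACE4_ADD_FILE.
    if (delete_on_target is Perm.UNSPECIFIED
            and delete_child_on_parent is Perm.UNSPECIFIED):
        return bool(add_file_on_parent)
    # Assumption: an explicit DENY with no ALLOW anywhere refuses the unlink.
    return False
```

   A server that honors the sticky bit would add its ownership or
   writability checks on top of this result.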
   In the case where MODE4_SVTX is set, the server may also
   require the remover to own either the parent or the target, or may
   require the target to be writable.

   This allows servers to support something close to traditional
   UNIX-like semantics, with ACE4_ADD_FILE taking the place of the write
   bit.

6.2.1.4. ACE flag

   The bitmask constants used for the flag field are as follows:

   const ACE4_FILE_INHERIT_ACE             = 0x00000001;
   const ACE4_DIRECTORY_INHERIT_ACE        = 0x00000002;
   const ACE4_NO_PROPAGATE_INHERIT_ACE     = 0x00000004;
   const ACE4_INHERIT_ONLY_ACE             = 0x00000008;
   const ACE4_SUCCESSFUL_ACCESS_ACE_FLAG   = 0x00000010;
   const ACE4_FAILED_ACCESS_ACE_FLAG       = 0x00000020;
   const ACE4_IDENTIFIER_GROUP             = 0x00000040;

   A server need not support any of these flags. If the server supports
   flags that are similar to, but not exactly the same as, these flags,
   the implementation may define a mapping between the protocol-defined
   flags and the implementation-defined flags.

   For example, suppose a client tries to set an ACE with
   ACE4_FILE_INHERIT_ACE set but not ACE4_DIRECTORY_INHERIT_ACE. If the
   server does not support any form of ACL inheritance, the server
   should reject the request with NFS4ERR_ATTRNOTSUPP. If the server
   supports a single "inherit ACE" flag that applies to both files and
   directories, the server may reject the request (i.e., requiring the
   client to set both the file and directory inheritance flags). The
   server may also accept the request and silently turn on the
   ACE4_DIRECTORY_INHERIT_ACE flag.

6.2.1.4.1. Discussion of Flag Bits

   ACE4_FILE_INHERIT_ACE
      Any non-directory file in any sub-directory will get this ACE
      inherited.

   ACE4_DIRECTORY_INHERIT_ACE
      Can be placed on a directory and indicates that this ACE should be
      added to each new directory created.
      If this flag is set in an ACE in an ACL attribute to be set on a
      non-directory file system object, the operation attempting to set
      the ACL SHOULD fail with NFS4ERR_ATTRNOTSUPP.

   ACE4_INHERIT_ONLY_ACE
      Can be placed on a directory but does not apply to the directory;
      ALLOW and DENY ACEs with this bit set do not affect access to the
      directory, and AUDIT and ALARM ACEs with this bit set do not
      trigger log or alarm events. Such ACEs only take effect once they
      are applied (with this bit cleared) to newly created files and
      directories as specified by the above two flags.
      If this flag is present on an ACE, but neither
      ACE4_DIRECTORY_INHERIT_ACE nor ACE4_FILE_INHERIT_ACE is present,
      then an operation attempting to set such an attribute SHOULD fail
      with NFS4ERR_ATTRNOTSUPP.

   ACE4_NO_PROPAGATE_INHERIT_ACE
      Can be placed on a directory. This flag tells the server that
      inheritance of this ACE should stop at newly created child
      directories.

   ACE4_SUCCESSFUL_ACCESS_ACE_FLAG

   ACE4_FAILED_ACCESS_ACE_FLAG
      The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and
      ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits may be set only on
      ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE
      (ALARM) ACE types. If during the processing of the file's ACL,
      the server encounters an AUDIT or ALARM ACE that matches the
      principal attempting the OPEN, the server notes that fact, and the
      presence, if any, of the SUCCESS and FAILED flags encountered in
      the AUDIT or ALARM ACE. Once the server completes the ACL
      processing, it then notes if the operation succeeded or failed.
      If the operation succeeded, and if the SUCCESS flag was set for a
      matching AUDIT or ALARM ACE, then the appropriate AUDIT or ALARM
      event occurs.
      If the operation failed, and if the FAILED flag was
      set for the matching AUDIT or ALARM ACE, then the appropriate
      AUDIT or ALARM event occurs. Either or both of the SUCCESS and
      FAILED flags can be set, but if neither is set, the AUDIT or
      ALARM ACE is not useful.

      The previously described processing applies to ACCESS operations
      even when they return NFS4_OK. For the purposes of AUDIT and
      ALARM, we consider an ACCESS operation to be a "failure" if it
      fails to return a bit that was requested and supported.

   ACE4_IDENTIFIER_GROUP
      Indicates that the "who" refers to a GROUP as defined under UNIX
      or a GROUP ACCOUNT as defined under Windows. Clients and servers
      MUST ignore the ACE4_IDENTIFIER_GROUP flag on ACEs with a who
      value equal to one of the special identifiers outlined in
      Section 6.2.1.5.

6.2.1.5. ACE Who

   The "who" field of an ACE is an identifier that specifies the
   principal or principals to whom the ACE applies. It may refer to a
   user or a group, with the flag bit ACE4_IDENTIFIER_GROUP specifying
   which.

   There are several special identifiers which need to be understood
   universally, rather than in the context of a particular DNS domain.
   Some of these identifiers cannot be understood when an NFS client
   accesses the server, but have meaning when a local process accesses
   the file. The ability to display and modify these permissions is
   permitted over NFS, even if none of the access methods on the server
   understands the identifiers.

   +---------------+---------------------------------------------------+
   | Who           | Description                                       |
   +---------------+---------------------------------------------------+
   | OWNER         | The owner of the file.                            |
   | GROUP         | The group associated with the file.               |
   | EVERYONE      | The world, including the owner and owning group.  |
   | INTERACTIVE   | Accessed from an interactive terminal.            |
   | NETWORK       | Accessed via the network.                         |
   | DIALUP        | Accessed as a dialup user to the server.          |
   | BATCH         | Accessed from a batch job.                        |
   | ANONYMOUS     | Accessed without any authentication.              |
   | AUTHENTICATED | Any authenticated user (opposite of ANONYMOUS).   |
   | SERVICE       | Access from a system service.                     |
   +---------------+---------------------------------------------------+

                                  Table 5

   To avoid conflict, these special identifiers are distinguished by an
   appended "@" and should appear in the form "xxxx@" (with no domain
   name after the "@"). For example: ANONYMOUS@.

   The ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries with these
   special identifiers. When encoding entries with these special
   identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to zero.

6.2.1.5.1. Discussion of EVERYONE@

   It is important to note that "EVERYONE@" is not equivalent to the
   UNIX "other" entity. This is because, by definition, UNIX "other"
   does not include the owner or owning group of a file. "EVERYONE@"
   means literally everyone, including the owner or owning group.

6.2.2. Attribute 33: mode

   The NFSv4.0 mode attribute is based on the UNIX mode bits.
   The following bits are defined:

   const MODE4_SUID = 0x800;  /* set user id on execution */
   const MODE4_SGID = 0x400;  /* set group id on execution */
   const MODE4_SVTX = 0x200;  /* save text even after use */
   const MODE4_RUSR = 0x100;  /* read permission: owner */
   const MODE4_WUSR = 0x080;  /* write permission: owner */
   const MODE4_XUSR = 0x040;  /* execute permission: owner */
   const MODE4_RGRP = 0x020;  /* read permission: group */
   const MODE4_WGRP = 0x010;  /* write permission: group */
   const MODE4_XGRP = 0x008;  /* execute permission: group */
   const MODE4_ROTH = 0x004;  /* read permission: other */
   const MODE4_WOTH = 0x002;  /* write permission: other */
   const MODE4_XOTH = 0x001;  /* execute permission: other */

   Bits MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR apply to the principal
   identified in the owner attribute. Bits MODE4_RGRP, MODE4_WGRP, and
   MODE4_XGRP apply to principals identified in the owner_group
   attribute but who are not identified in the owner attribute. Bits
   MODE4_ROTH, MODE4_WOTH, and MODE4_XOTH apply to any principal that
   does not match that in the owner attribute, and does not have a group
   matching that of the owner_group attribute.

   Bits within the mode other than those specified above are not defined
   by this protocol. A server MUST NOT return bits other than those
   defined above in a GETATTR or READDIR operation, and it MUST return
   NFS4ERR_INVAL if bits other than those defined above are set in a
   SETATTR, CREATE, OPEN, VERIFY, or NVERIFY operation.

6.3. Common Methods

   The requirements in this section will be referred to in future
   sections, especially Section 6.4.

6.3.1. Interpreting an ACL

6.3.1.1. Server Considerations

   The server uses the algorithm described in Section 6.2.1 to determine
   whether an ACL allows access to an object. However, the ACL may not
   be the sole determiner of access.
   For example:

   o  In the case of a file system exported as read-only, the server may
      deny write permissions even though an object's ACL grants them.

   o  Server implementations MAY grant ACE4_WRITE_ACL and ACE4_READ_ACL
      permissions to prevent a situation from arising in which there is
      no valid way to ever modify the ACL.

   o  All servers will allow a user the ability to read the data of the
      file when only the execute permission is granted (i.e., if the ACL
      denies the user the ACE4_READ_DATA access and allows the user
      ACE4_EXECUTE, the server will allow the user to read the data of
      the file).

   o  Many servers have the notion of owner-override in which the owner
      of the object is allowed to override accesses that are denied by
      the ACL. This may be helpful, for example, to allow users
      continued access to open files on which the permissions have
      changed.

   o  Many servers have the notion of a "superuser" that has privileges
      beyond an ordinary user. The superuser may be able to read or
      write data or metadata in ways that would not be permitted by the
      ACL.

6.3.1.2. Client Considerations

   Clients SHOULD NOT do their own access checks based on their
   interpretation of the ACL, but rather use the OPEN and ACCESS
   operations to do access checks. This allows the client to act on the
   results of having the server determine whether or not access should
   be granted based on its interpretation of the ACL.

   Clients must be aware of situations in which an object's ACL will
   define a certain access even though the server will not have adequate
   information to enforce it. For example, the server has no way of
   determining whether a particular OPEN reflects a user's open for read
   access, or is done as part of executing the file in question. In
   such situations, the client needs to do its part in the enforcement
   of access as defined by the ACL.
   To do this, the client will send
   the appropriate ACCESS operation (or use a cached previous
   determination) prior to servicing the request of the user or
   application in order to determine whether the user or application
   should be granted the access requested. For examples in which the
   ACL may define accesses that the server does not enforce, see
   Section 6.3.1.1.

6.3.2. Computing a Mode Attribute from an ACL

   The following method can be used to calculate the MODE4_R*, MODE4_W*,
   and MODE4_X* bits of a mode attribute, based upon an ACL.

   First, for each of the special identifiers OWNER@, GROUP@, and
   EVERYONE@, evaluate the ACL in order, considering only ALLOW and DENY
   ACEs for the identifier EVERYONE@ and for the identifier under
   consideration. The result of the evaluation will be an NFSv4 ACL
   mask showing exactly which bits are permitted to that identifier.

   Then translate the calculated mask for OWNER@, GROUP@, and EVERYONE@
   into mode bits for, respectively, the user, group, and other, as
   follows:

   1. Set the read bit (MODE4_RUSR, MODE4_RGRP, or MODE4_ROTH) if and
      only if ACE4_READ_DATA is set in the corresponding mask.

   2. Set the write bit (MODE4_WUSR, MODE4_WGRP, or MODE4_WOTH) if and
      only if ACE4_WRITE_DATA and ACE4_APPEND_DATA are both set in the
      corresponding mask.

   3. Set the execute bit (MODE4_XUSR, MODE4_XGRP, or MODE4_XOTH), if
      and only if ACE4_EXECUTE is set in the corresponding mask.

6.3.2.1. Discussion

   Some server implementations also add bits permitted to named users
   and groups to the group bits (MODE4_RGRP, MODE4_WGRP, and
   MODE4_XGRP).

   Implementations are discouraged from doing this, because it has been
   found to cause confusion for users who see members of a file's group
   denied access that the mode bits appear to allow.
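   The method of Section 6.3.2 can be sketched as follows. This is an
   illustration under stated assumptions rather than a reference
   implementation: an ACE is modeled as a hypothetical (type, who, mask)
   tuple with no flags field, so inherit-only ACEs are not modeled, and
   "evaluate the ACL in order" is read as a per-bit first match, where
   the first applicable ACE that mentions a bit decides it. The function
   names permitted_mask and mode_from_acl are invented for the example.

```python
ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004
ACE4_EXECUTE = 0x00000020

# (who, read bit, write bit, execute bit) using the MODE4_* values.
MODE_BITS = (
    ("OWNER@", 0x100, 0x080, 0x040),
    ("GROUP@", 0x020, 0x010, 0x008),
    ("EVERYONE@", 0x004, 0x002, 0x001),
)


def permitted_mask(acl, who):
    """Bits permitted to `who`, considering only ALLOW/DENY ACEs for
    `who` and EVERYONE@ (assumed per-bit first-match evaluation)."""
    allowed = 0
    decided = 0
    for ace_type, ace_who, mask in acl:
        if ace_type not in ("ALLOW", "DENY"):
            continue  # AUDIT and ALARM ACEs never affect the mode
        if ace_who not in (who, "EVERYONE@"):
            continue
        if ace_type == "ALLOW":
            allowed |= mask & ~decided
        decided |= mask
    return allowed


def mode_from_acl(acl):
    """Nine low-order mode bits derived from the ACL."""
    write_bits = ACE4_WRITE_DATA | ACE4_APPEND_DATA
    mode = 0
    for who, r_bit, w_bit, x_bit in MODE_BITS:
        mask = permitted_mask(acl, who)
        if mask & ACE4_READ_DATA:
            mode |= r_bit
        if (mask & write_bits) == write_bits:  # both bits are required
            mode |= w_bit
        if mask & ACE4_EXECUTE:
            mode |= x_bit
    return mode
```

   With an ACL that allows OWNER@ read, write, and append, denies GROUP@
   write and append, and allows EVERYONE@ read and execute, this sketch
   yields the mode 0755 (octal).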
   (The presence of
   DENY ACEs may also lead to such behavior, but DENY ACEs are expected
   to be more rarely used.)

   The same user confusion seen when fetching the mode also results if
   setting the mode does not effectively control permissions for the
   owner, group, and other users; this motivates some of the
   requirements that follow.

6.4. Requirements

   The server that supports both mode and ACL must take care to
   synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the
   ACEs which have respective who fields of "OWNER@", "GROUP@", and
   "EVERYONE@" so that the client can see semantically equivalent access
   permissions exist whether the client asks for owner, owner_group, and
   mode attributes, or for just the ACL.

   Many requirements refer to Section 6.3.2, but note that the methods
   have behaviors specified with "SHOULD". This is intentional, to
   avoid invalidating existing implementations that compute the mode
   according to the withdrawn POSIX ACL draft ([P1003.1e]), rather than
   by actual permissions on owner, group, and other.

6.4.1. Setting the mode and/or ACL Attributes

6.4.1.1. Setting mode and not ACL

   When any of the nine low-order mode bits are changed because the mode
   attribute was set, and no ACL attribute is explicitly set, the acl
   attribute must be modified in accordance with the updated value of
   those bits. This must happen even if the value of the low-order bits
   is the same after the mode is set as before.

   Note that any AUDIT or ALARM ACEs are unaffected by changes to the
   mode.

   In cases in which the permissions bits are subject to change, the acl
   attribute MUST be modified such that the mode computed via the method
   in Section 6.3.2 yields the low-order nine bits (MODE4_R*, MODE4_W*,
   MODE4_X*) of the mode attribute as modified by the attribute change.
   The ACL attributes SHOULD also be modified such that:

   1. If MODE4_RGRP is not set, entities explicitly listed in the ACL
      other than OWNER@ and EVERYONE@ SHOULD NOT be granted
      ACE4_READ_DATA.

   2. If MODE4_WGRP is not set, entities explicitly listed in the ACL
      other than OWNER@ and EVERYONE@ SHOULD NOT be granted
      ACE4_WRITE_DATA or ACE4_APPEND_DATA.

   3. If MODE4_XGRP is not set, entities explicitly listed in the ACL
      other than OWNER@ and EVERYONE@ SHOULD NOT be granted
      ACE4_EXECUTE.

   Access mask bits other than those listed above, appearing in ALLOW
   ACEs, MAY also be disabled.

   Note that ACEs with the flag ACE4_INHERIT_ONLY_ACE set do not affect
   the permissions of the ACL itself, nor do ACEs of the type AUDIT and
   ALARM. As such, it is desirable to leave these ACEs unmodified when
   modifying the ACL attributes.

   Also note that the requirement may be met by discarding the acl in
   favor of an ACL that represents the mode and only the mode. This is
   permitted, but it is preferable for a server to preserve as much of
   the ACL as possible without violating the above requirements.

   Discarding the ACL makes it effectively impossible for a file created
   with a mode attribute to inherit an ACL (see Section 6.4.3).

6.4.1.2. Setting ACL and not mode

   When setting the acl and not setting the mode attribute, the
   permission bits of the mode need to be derived from the ACL. In this
   case, the ACL attribute SHOULD be set as given. The nine low-order
   bits of the mode attribute (MODE4_R*, MODE4_W*, MODE4_X*) MUST be
   modified to match the result of the method in Section 6.3.2. The
   three high-order bits of the mode (MODE4_SUID, MODE4_SGID,
   MODE4_SVTX) SHOULD remain unchanged.

6.4.1.3. Setting both ACL and mode

   When setting both the mode and the acl attribute in the same
   operation, the attributes MUST be applied in this order: mode, then
   ACL. The mode-related attribute is set as given, then the ACL
   attribute is set as given, possibly changing the final mode, as
   described above in Section 6.4.1.2.

6.4.2. Retrieving the mode and/or ACL Attributes

   This section applies only to servers that support both the mode and
   ACL attributes.

   Some server implementations may have a concept of "objects without
   ACLs", meaning that all permissions are granted and denied according
   to the mode attribute, and that no ACL attribute is stored for that
   object. If an ACL attribute is requested of such a server, the
   server SHOULD return an ACL that does not conflict with the mode;
   that is to say, the ACL returned SHOULD represent the nine low-order
   bits of the mode attribute (MODE4_R*, MODE4_W*, MODE4_X*) as
   described in Section 6.3.2.

   For other server implementations, the ACL attribute is always present
   for every object. Such servers SHOULD store at least the three
   high-order bits of the mode attribute (MODE4_SUID, MODE4_SGID,
   MODE4_SVTX). The server SHOULD return a mode attribute if one is
   requested, and the low-order nine bits of the mode (MODE4_R*,
   MODE4_W*, MODE4_X*) MUST match the result of applying the method in
   Section 6.3.2 to the ACL attribute.

6.4.3. Creating New Objects

   If a server supports any ACL attributes, it may use the ACL
   attributes on the parent directory to compute an initial ACL
   attribute for a newly created object. This will be referred to as
   the inherited ACL within this section. The act of adding one or more
   ACEs to the inherited ACL that are based upon ACEs in the parent
   directory's ACL will be referred to as inheriting an ACE within this
   section.
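   For the "objects without ACLs" case in Section 6.4.2, one possible
   way to synthesize an ACL from the mode is sketched below. The
   construction is an assumption chosen for illustration, not one the
   protocol prescribes: each ACE is a hypothetical (type, who, mask)
   tuple, and OWNER@ and GROUP@ receive an explicit DENY for the bits
   their mode triplet lacks so that a later EVERYONE@ ALLOW cannot grant
   them when the ACL is evaluated per Section 6.3.2.

```python
ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004
ACE4_EXECUTE = 0x00000020
ALL_BITS = (ACE4_READ_DATA | ACE4_WRITE_DATA
            | ACE4_APPEND_DATA | ACE4_EXECUTE)

# (who, read bit, write bit, execute bit) using the MODE4_* values.
MODE_BITS = (
    ("OWNER@", 0x100, 0x080, 0x040),
    ("GROUP@", 0x020, 0x010, 0x008),
    ("EVERYONE@", 0x004, 0x002, 0x001),
)


def acl_from_mode(mode):
    """Hypothetical helper: build ACEs representing the nine low-order bits."""
    acl = []
    for who, r_bit, w_bit, x_bit in MODE_BITS:
        allow = 0
        if mode & r_bit:
            allow |= ACE4_READ_DATA
        if mode & w_bit:
            # The write bit maps to both write and append.
            allow |= ACE4_WRITE_DATA | ACE4_APPEND_DATA
        if mode & x_bit:
            allow |= ACE4_EXECUTE
        if allow:
            acl.append(("ALLOW", who, allow))
        deny = ALL_BITS & ~allow
        # EVERYONE@ comes last, so its missing bits need no explicit DENY.
        if deny and who != "EVERYONE@":
            acl.append(("DENY", who, deny))
    return acl
```

   For mode 0640 (octal) the sketch produces an ALLOW and a DENY ACE for
   each of OWNER@ and GROUP@ and no ACE for EVERYONE@. A real server
   would also need to decide how the remaining access mask bits are
   represented.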
   In the presence or absence of the mode and ACL attributes, the
   behavior of CREATE and OPEN SHOULD be:

   1. If just the mode is given in the call:

      In this case, inheritance SHOULD take place, but the mode MUST be
      applied to the inherited ACL as described in Section 6.4.1.1,
      thereby modifying the ACL.

   2. If just the ACL is given in the call:

      In this case, inheritance SHOULD NOT take place, and the ACL as
      defined in the CREATE or OPEN will be set without modification,
      and the mode modified as in Section 6.4.1.2.

   3. If both mode and ACL are given in the call:

      In this case, inheritance SHOULD NOT take place, and both
      attributes will be set as described in Section 6.4.1.3.

   4. If neither mode nor ACL is given in the call:

      In the case where an object is being created without any initial
      attributes at all, e.g., an OPEN operation with an opentype4 of
      OPEN4_CREATE and a createmode4 of EXCLUSIVE4, inheritance SHOULD
      NOT take place. Instead, the server SHOULD set permissions to
      deny all access to the newly created object. It is expected that
      the appropriate client will set the desired attributes in a
      subsequent SETATTR operation, and the server SHOULD allow that
      operation to succeed, regardless of what permissions the object
      is created with. For example, an empty ACL denies all
      permissions, but the server should allow the owner's SETATTR to
      succeed even though WRITE_ACL is implicitly denied.

      In other cases, inheritance SHOULD take place, and no
      modifications to the ACL will happen. The mode attribute, if
      supported, MUST be as computed in Section 6.3.2, with the
      MODE4_SUID, MODE4_SGID, and MODE4_SVTX bits clear. If no
      inheritable ACEs exist on the parent directory, the rules for
      creating acl attributes are implementation defined.

6.4.3.1. The Inherited ACL

   If the object being created is not a directory, the inherited ACL
   SHOULD NOT inherit ACEs from the parent directory ACL unless the
   ACE4_FILE_INHERIT_ACE flag is set.

   If the object being created is a directory, the inherited ACL should
   inherit all inheritable ACEs from the parent directory, that is,
   those that have the ACE4_FILE_INHERIT_ACE or
   ACE4_DIRECTORY_INHERIT_ACE flag set. If the inheritable ACE has
   ACE4_FILE_INHERIT_ACE set, but ACE4_DIRECTORY_INHERIT_ACE is clear,
   the inherited ACE on the newly created directory MUST have the
   ACE4_INHERIT_ONLY_ACE flag set to prevent the directory from being
   affected by ACEs meant for non-directories.

   When a new directory is created, the server MAY split any inherited
   ACE which is both inheritable and effective (in other words, which
   has neither ACE4_INHERIT_ONLY_ACE nor ACE4_NO_PROPAGATE_INHERIT_ACE
   set), into two ACEs, one with no inheritance flags, and one with
   ACE4_INHERIT_ONLY_ACE set. This makes it simpler to modify the
   effective permissions on the directory without modifying the ACE
   which is to be inherited to the new directory's children.

7. NFS Server Name Space

7.1. Server Exports

   On a UNIX server the name space describes all the files reachable by
   pathnames under the root directory or "/". On a Windows server the
   name space constitutes all the files on disks named by mapped disk
   letters. NFS server administrators rarely make the entire server's
   file system name space available to NFS clients. More often,
   portions of the name space are made available via an "export"
   feature. In previous versions of the NFS protocol, the root
   filehandle for each export is obtained through the MOUNT protocol;
   the client sends a string that identifies an object in the exported
   name space and the server returns the root filehandle for it.
   The MOUNT protocol
   supports an EXPORTS procedure that will enumerate the server's
   exports.

7.2. Browsing Exports

   The NFSv4 protocol provides a root filehandle that clients can use to
   obtain filehandles for these exports via a multi-component LOOKUP. A
   common user experience is to use a graphical user interface (perhaps
   a file "Open" dialog window) to find a file via progressive browsing
   through a directory tree. The client must be able to move from one
   export to another export via single-component, progressive LOOKUP
   operations.

   This style of browsing is not well supported by the NFSv2 and NFSv3
   protocols. The client expects all LOOKUP operations to remain within
   a single server file system. For example, the device attribute will
   not change. This prevents a client from taking name space paths that
   span exports.

   An automounter on the client can obtain a snapshot of the server's
   name space using the EXPORTS procedure of the MOUNT protocol. If it
   understands the server's pathname syntax, it can create an image of
   the server's name space on the client. The parts of the name space
   that are not exported by the server are filled in with a "pseudo file
   system" that allows the user to browse from one mounted file system
   to another. There is a drawback to this representation of the
   server's name space on the client: it is static. If the server
   administrator adds a new export, the client will be unaware of it.

7.3. Server Pseudo Filesystem

   NFSv4 servers avoid this name space inconsistency by presenting all
   the exports within the framework of a single server name space. An
   NFSv4 client uses LOOKUP and READDIR operations to browse seamlessly
   from one export to another. Portions of the server name space that
   are not exported are bridged via a "pseudo file system" that provides
   a view of exported directories only.
A pseudo file system has a unique fsid and behaves like a normal, read-only file system.

Based on the construction of the server's name space, it is possible that multiple pseudo file systems may exist. For example,

   /a         pseudo file system
   /a/b       real file system
   /a/b/c     pseudo file system
   /a/b/c/d   real file system

Each of the pseudo file systems is considered a separate entity and therefore has a unique fsid.

7.4. Multiple Roots

The DOS and Windows operating environments are sometimes described as having "multiple roots". File systems are commonly represented as disk letters. MacOS represents file systems as top-level names. NFSv4 servers for these platforms can construct a pseudo file system above these root names so that disk letters or volume names are simply directory names in the pseudo root.

7.5. Filehandle Volatility

The nature of the server's pseudo file system is that it is a logical representation of file system(s) available from the server. Therefore, the pseudo file system is most likely constructed dynamically when the server is first instantiated. It is expected that the pseudo file system may not have an on-disk counterpart from which persistent filehandles could be constructed. Even though it is preferable that the server provide persistent filehandles for the pseudo file system, the NFS client should expect that pseudo file system filehandles are volatile. This can be confirmed by checking the associated "fh_expire_type" attribute for the filehandles in question. If the filehandles are volatile, the NFS client must be prepared to recover a filehandle value (e.g., with a multi-component LOOKUP) when receiving an error of NFS4ERR_FHEXPIRED.

7.6. Exported Root

If the server's root file system is exported, one might conclude that a pseudo file system is not needed. This would be wrong.
Assume the following file systems on a server:

   /      disk1  (exported)
   /a     disk2  (not exported)
   /a/b   disk3  (exported)

Because disk2 is not exported, disk3 cannot be reached with simple LOOKUPs. The server must bridge the gap with a pseudo file system.

7.7. Mount Point Crossing

The server file system environment may be constructed in such a way that one file system contains a directory which is 'covered' or mounted upon by a second file system. For example:

   /a/b       (file system 1)
   /a/b/c/d   (file system 2)

The pseudo file system for this server may be constructed to look like:

   /          (place holder/not exported)
   /a/b       (file system 1)
   /a/b/c/d   (file system 2)

It is the server's responsibility to present a pseudo file system that is complete to the client. If the client sends a lookup request for the path "/a/b/c/d", the server's response is the filehandle of the file system "/a/b/c/d". In previous versions of the NFS protocol, the server would respond with the filehandle of directory "/a/b/c/d" within the file system "/a/b".

The NFS client will be able to determine if it crosses a server mount point by a change in the value of the "fsid" attribute.

7.8. Security Policy and Name Space Presentation

Because NFSv4 clients can use SECINFO to determine which security mechanisms are allowed and then change the mechanism in use, the server SHOULD NOT present a different view of the namespace based on the security mechanism being used by a client. Instead, it should present a consistent view and return NFS4ERR_WRONGSEC if an attempt is made to access data with an inappropriate security mechanism.
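
The server-side behavior described in the preceding paragraph can be illustrated with a minimal sketch. It is not part of the protocol definition: the policy table, the flavor labels ("sys", "krb5", "krb5i"), and the function names check_access and secinfo are assumptions made for illustration only. The point shown is that every path stays visible, and an unacceptable security flavor yields NFS4ERR_WRONGSEC rather than a different view of the namespace.

```python
from enum import Enum


class Status(Enum):
    NFS_OK = "NFS_OK"
    NFS4ERR_WRONGSEC = "NFS4ERR_WRONGSEC"


# Hypothetical policy table: path -> acceptable security flavors.
# The paths reuse the example from Section 7.7; the flavor labels are
# illustrative names, not protocol constants.
POLICY = {
    "/a/b": {"sys", "krb5"},
    "/a/b/c/d": {"krb5i"},
}


def check_access(path, flavor, policy=POLICY):
    """Return the status for accessing `path` with `flavor`.

    The path is never hidden; a wrong flavor only changes the status.
    """
    if flavor in policy[path]:
        return Status.NFS_OK
    return Status.NFS4ERR_WRONGSEC


def secinfo(path, policy=POLICY):
    """Model of a SECINFO-style query: list the flavors acceptable for `path`."""
    return sorted(policy[path])


if __name__ == "__main__":
    print(check_access("/a/b/c/d", "sys"))    # wrong flavor for this path
    print(secinfo("/a/b/c/d"))                # client learns what to use
    print(check_access("/a/b/c/d", "krb5i"))  # retry with an allowed flavor
```

A client following this pattern would react to NFS4ERR_WRONGSEC by querying the acceptable flavors and retrying with one of them.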
If security considerations make it necessary to hide the existence of a particular file system, as opposed to all of the data within it, the server can apply the security policy of a shared resource in the server's namespace to components of the resource's ancestors. For example:

   /                      (place holder/not exported)
   /a/b                   (file system 1)
   /a/b/MySecretProject   (file system 2)

The /a/b/MySecretProject directory is a real file system and is the shared resource. Suppose the security policy for /a/b/MySecretProject is Kerberos with integrity and it is desired to limit knowledge of the existence of this file system. In this case, the server should apply the same security policy to /a/b. This allows for knowledge of the existence of a file system to be secured when desirable.

When the server's resources use multiple, disjoint security mechanisms, applying that sort of policy would result in the higher-level file system not being accessible using any security flavor. Therefore, that sort of configuration is not compatible with hiding the existence (as opposed to the contents) from clients using multiple disjoint sets of security flavors.

In other circumstances, a desirable policy is for the security of a particular object in the server's namespace to include the union of all security mechanisms of all direct descendants. A common and convenient practice, unless strong security requirements dictate otherwise, is to make the entire pseudo file system accessible by all of the valid security mechanisms.

Where there is concern about the security of data on the network, clients should use strong security mechanisms to access the pseudo file system in order to prevent man-in-the-middle attacks.

8.
Multi-Server Namespace 3628 NFSv4 supports attributes that allow a namespace to extend beyond the 3629 boundaries of a single server. It is RECOMMENDED that clients and 3630 servers support construction of such multi-server namespaces. Use of 3631 such multi-server namespaces is optional, however, and for many 3632 purposes, single-server namespaces are perfectly acceptable. Use of 3633 multi-server namespaces can provide many advantages, however, by 3634 separating a file system's logical position in a namespace from the 3635 (possibly changing) logistical and administrative considerations that 3636 result in particular file systems being located on particular 3637 servers. 3639 8.1. Location Attributes 3641 NFSv4 contains RECOMMENDED attributes that allow file systems on one 3642 server to be associated with one or more instances of that file 3643 system on other servers. These attributes specify such file system 3644 instances by specifying a server address target (either as a DNS name 3645 representing one or more IP addresses or as a literal IP address) 3646 together with the path of that file system within the associated 3647 single-server namespace. 3649 The fs_locations RECOMMENDED attribute allows specification of the 3650 file system locations where the data corresponding to a given file 3651 system may be found. 3653 8.2. File System Presence or Absence 3655 A given location in an NFSv4 namespace (typically but not necessarily 3656 a multi-server namespace) can have a number of file system instance 3657 locations associated with it via the fs_locations attribute. There 3658 may also be an actual current file system at that location, 3659 accessible via normal namespace operations (e.g., LOOKUP). 
In this 3660 case, the file system is said to be "present" at that position in the 3661 namespace, and clients will typically use it, reserving use of 3662 additional locations specified via the location-related attributes to 3663 situations in which the principal location is no longer available. 3665 When there is no actual file system at the namespace location in 3666 question, the file system is said to be "absent". An absent file 3667 system contains no files or directories other than the root. Any 3668 reference to it, except to access a small set of attributes useful in 3669 determining alternative locations, will result in an error, 3670 NFS4ERR_MOVED. Note that if the server ever returns the error 3671 NFS4ERR_MOVED, it MUST support the fs_locations attribute. 3673 While the error name suggests that we have a case of a file system 3674 that once was present, and has only become absent later, this is only 3675 one possibility. A position in the namespace may be permanently 3676 absent with the set of file system(s) designated by the location 3677 attributes being the only realization. The name NFS4ERR_MOVED 3678 reflects an earlier, more limited conception of its function, but 3679 this error will be returned whenever the referenced file system is 3680 absent, whether it has moved or simply never existed. 3682 Except in the case of GETATTR-type operations (to be discussed 3683 later), when the current filehandle at the start of an operation is 3684 within an absent file system, that operation is not performed and the 3685 error NFS4ERR_MOVED is returned, to indicate that the file system is 3686 absent on the current server. 3688 Because a GETFH cannot succeed if the current filehandle is within an 3689 absent file system, filehandles within an absent file system cannot 3690 be transferred to the client. 
When a client does have filehandles 3691 within an absent file system, it is the result of obtaining them when 3692 the file system was present, and having the file system become absent 3693 subsequently. 3695 It should be noted that because the check for the current filehandle 3696 being within an absent file system happens at the start of every 3697 operation, operations that change the current filehandle so that it 3698 is within an absent file system will not result in an error. This 3699 allows such combinations as PUTFH-GETATTR and LOOKUP-GETATTR to be 3700 used to get attribute information, particularly location attribute 3701 information, as discussed below. 3703 8.3. Getting Attributes for an Absent File System 3705 When a file system is absent, most attributes are not available, but 3706 it is necessary to allow the client access to the small set of 3707 attributes that are available, and most particularly that which gives 3708 information about the correct current locations for this file system, 3709 fs_locations. 3711 8.3.1. GETATTR Within an Absent File System 3713 As mentioned above, an exception is made for GETATTR in that 3714 attributes may be obtained for a filehandle within an absent file 3715 system. This exception only applies if the attribute mask contains 3716 at least the fs_locations attribute bit, which indicates the client 3717 is interested in a result regarding an absent file system. If it is 3718 not requested, GETATTR will result in an NFS4ERR_MOVED error. 3720 When a GETATTR is done on an absent file system, the set of supported 3721 attributes is very limited. Many attributes, including those that 3722 are normally REQUIRED, will not be available on an absent file 3723 system. In addition to the fs_locations attribute, the following 3724 attributes SHOULD be available on absent file systems. 
In the case 3725 of RECOMMENDED attributes, they should be available at least to the 3726 same degree that they are available on present file systems. 3728 fsid: This attribute should be provided so that the client can 3729 determine file system boundaries, including, in particular, the 3730 boundary between present and absent file systems. This value must 3731 be different from any other fsid on the current server and need 3732 have no particular relationship to fsids on any particular 3733 destination to which the client might be directed. 3735 mounted_on_fileid: For objects at the top of an absent file system, 3736 this attribute needs to be available. Since the fileid is within 3737 the present parent file system, there should be no need to 3738 reference the absent file system to provide this information. 3740 Other attributes SHOULD NOT be made available for absent file 3741 systems, even when it is possible to provide them. The server should 3742 not assume that more information is always better and should avoid 3743 gratuitously providing additional information. 3745 When a GETATTR operation includes a bit mask for the attribute 3746 fs_locations, but where the bit mask includes attributes that are not 3747 supported, GETATTR will not return an error, but will return the mask 3748 of the actual attributes supported with the results. 3750 Handling of VERIFY/NVERIFY is similar to GETATTR in that if the 3751 attribute mask does not include fs_locations the error NFS4ERR_MOVED 3752 will result. It differs in that any appearance in the attribute mask 3753 of an attribute not supported for an absent file system (and note 3754 that this will include some normally REQUIRED attributes) will also 3755 cause an NFS4ERR_MOVED result. 3757 8.3.2. 
READDIR and Absent File Systems 3759 A READDIR performed when the current filehandle is within an absent 3760 file system will result in an NFS4ERR_MOVED error, since, unlike the 3761 case of GETATTR, no such exception is made for READDIR. 3763 Attributes for an absent file system may be fetched via a READDIR for 3764 a directory in a present file system, when that directory contains 3765 the root directories of one or more absent file systems. In this 3766 case, the handling is as follows: 3768 o If the attribute set requested includes fs_locations, then 3769 fetching of attributes proceeds normally and no NFS4ERR_MOVED 3770 indication is returned, even when the rdattr_error attribute is 3771 requested. 3773 o If the attribute set requested does not include fs_locations, then 3774 if the rdattr_error attribute is requested, each directory entry 3775 for the root of an absent file system will report NFS4ERR_MOVED as 3776 the value of the rdattr_error attribute. 3778 o If the attribute set requested does not include either of the 3779 attributes fs_locations or rdattr_error then the occurrence of the 3780 root of an absent file system within the directory will result in 3781 the READDIR failing with an NFS4ERR_MOVED error. 3783 o The unavailability of an attribute because of a file system's 3784 absence, even one that is ordinarily REQUIRED, does not result in 3785 any error indication. The set of attributes returned for the root 3786 directory of the absent file system in that case is simply 3787 restricted to those actually available. 3789 8.4. Uses of Location Information 3791 The location-bearing attribute of fs_locations provides, together 3792 with the possibility of absent file systems, a number of important 3793 facilities in providing reliable, manageable, and scalable data 3794 access. 
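
The READDIR cases listed in Section 8.3.2 for a directory entry that is the root of an absent file system can be summarized in a short decision function. This is a sketch under stated assumptions, not server code: the function name, the string status values, and the tuple return shape are hypothetical. The set ABSENT_FS_ATTRS reflects the attributes that Section 8.3.1 says SHOULD be available on absent file systems.

```python
# Attributes that Section 8.3.1 says SHOULD be available on an absent
# file system. Attribute names are used here as plain strings.
ABSENT_FS_ATTRS = frozenset({"fs_locations", "fsid", "mounted_on_fileid"})


def readdir_absent_root(requested):
    """Decide how READDIR treats an entry that roots an absent file system.

    Returns (readdir_status, rdattr_error_value):
      - fs_locations requested: READDIR proceeds and no NFS4ERR_MOVED
        indication is given, even if rdattr_error was requested.
      - otherwise, rdattr_error requested: READDIR proceeds and the entry
        reports NFS4ERR_MOVED as its rdattr_error value.
      - neither requested: the whole READDIR fails with NFS4ERR_MOVED.
    """
    requested = frozenset(requested)
    if "fs_locations" in requested:
        return ("NFS_OK", None)
    if "rdattr_error" in requested:
        return ("NFS_OK", "NFS4ERR_MOVED")
    return ("NFS4ERR_MOVED", None)


def attrs_returned(requested):
    """When fs_locations is requested, unavailable attributes are dropped
    without any error; only those actually available are returned."""
    return sorted(frozenset(requested) & ABSENT_FS_ATTRS)


if __name__ == "__main__":
    print(readdir_absent_root({"fsid", "size", "time_modify"}))
    print(readdir_absent_root({"rdattr_error", "fsid", "size"}))
    print(attrs_returned({"fs_locations", "fsid", "size", "time_modify"}))
```

The third case is the reason a client that may encounter referrals would normally request rdattr_error, fs_locations, or both when reading a directory.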
When a file system is present, these attributes can provide alternative locations, to be used to access the same data, in the event of server failures, communications problems, or other difficulties that make continued access to the current file system impossible or otherwise impractical. Under some circumstances, multiple alternative locations may be used simultaneously to provide higher-performance access to the file system in question. Provision of such alternative locations is referred to as "replication" although there are cases in which replicated sets of data are not in fact present, and the replicas are instead different paths to the same data.

When a file system is present and subsequently becomes absent, clients can be given the opportunity to have continued access to their data, at an alternative location. Transfer of the file system contents to the new location is referred to as "migration". See Section 8.4.2 for details.

Alternative locations may be physical replicas of the file system data or alternative communication paths to the same server or, in the case of various forms of server clustering, another server providing access to the same physical file system. The client's responsibilities in dealing with this transition depend on the specific nature of the new access path as well as how and whether data was in fact migrated. These issues will be discussed in detail below.

Where a file system was not previously present, specification of file system location provides a means by which file systems located on one server can be associated with a namespace defined by another server, thus allowing a general multi-server namespace facility. A designation of such a location, in place of an absent file system, is called a "referral".
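
The three uses of location information described above (replication, migration, and referral) differ, from the client's point of view, in whether the file system is currently present and whether the client has accessed it before. The following sketch makes that distinction explicit. The function name, parameters, and returned labels are hypothetical and are chosen only for illustration.

```python
def classify_location_use(present, accessed_before, has_locations):
    """Name the role that location information plays for a file system.

    present         -- the file system is present on the current server
    accessed_before -- the client already referenced this file system
    has_locations   -- fs_locations lists at least one location
    """
    if present:
        # Alternatives to a file system that is still present.
        return "replication" if has_locations else "none"
    if accessed_before:
        # Was present, has become absent: continue at a new location.
        return "migration"
    # Absent when first encountered: nothing to transition.
    return "referral"


if __name__ == "__main__":
    print(classify_location_use(True, True, True))
    print(classify_location_use(False, True, True))
    print(classify_location_use(False, False, True))
```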
3830 Because client support for location-related attributes is OPTIONAL, a 3831 server may (but is not required to) take action to hide migration and 3832 referral events from such clients, by acting as a proxy, for example. 3834 8.4.1. File System Replication 3836 The fs_locations attribute provides alternative locations, to be used 3837 to access data in place of or in addition to the current file system 3838 instance. On first access to a file system, the client should obtain 3839 the value of the set of alternative locations by interrogating the 3840 fs_locations attribute. 3842 In the event that server failures, communications problems, or other 3843 difficulties make continued access to the current file system 3844 impossible or otherwise impractical, the client can use the 3845 alternative locations as a way to get continued access to its data. 3846 Multiple locations may be used simultaneously, to provide higher 3847 performance through the exploitation of multiple paths between client 3848 and target file system. 3850 Multiple server addresses, whether they are derived from a single 3851 entry with a DNS name representing a set of IP addresses or from 3852 multiple entries each with its own server address, may correspond to 3853 the same actual server. 3855 8.4.2. File System Migration 3857 When a file system is present and becomes absent, clients can be 3858 given the opportunity to have continued access to their data, at an 3859 alternative location, as specified by the fs_locations attribute. 3860 Typically, a client will be accessing the file system in question, 3861 get an NFS4ERR_MOVED error, and then use the fs_locations attribute 3862 to determine the new location of the data. 3864 Such migration can be helpful in providing load balancing or general 3865 resource reallocation. The protocol does not specify how the file 3866 system will be moved between servers. 
It is anticipated that a 3867 number of different server-to-server transfer mechanisms might be 3868 used with the choice left to the server implementer. The NFSv4 3869 protocol specifies the method used to communicate the migration event 3870 between client and server. 3872 When an alternative location is designated as the target for 3873 migration, it must designate the same data. Where file systems are 3874 writable, a change made on the original file system must be visible 3875 on all migration targets. Where a file system is not writable but 3876 represents a read-only copy (possibly periodically updated) of a 3877 writable file system, similar requirements apply to the propagation 3878 of updates. Any change visible in the original file system must 3879 already be effected on all migration targets, to avoid any 3880 possibility that a client, in effecting a transition to the migration 3881 target, will see any reversion in file system state. 3883 8.4.3. Referrals 3885 Referrals provide a way of placing a file system in a location within 3886 the namespace essentially without respect to its physical location on 3887 a given server. This allows a single server or a set of servers to 3888 present a multi-server namespace that encompasses file systems 3889 located on multiple servers. Some likely uses of this include 3890 establishment of site-wide or organization-wide namespaces, or even 3891 knitting such together into a truly global namespace. 3893 Referrals occur when a client determines, upon first referencing a 3894 position in the current namespace, that it is part of a new file 3895 system and that the file system is absent. When this occurs, 3896 typically by receiving the error NFS4ERR_MOVED, the actual location 3897 or locations of the file system can be determined by fetching the 3898 fs_locations attribute. 
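
As an illustration of how a client might turn a fetched fs_locations value into a list of candidate targets after NFS4ERR_MOVED, consider the sketch below. It is a simplified model, not a wire-format decoder: the class and function names are hypothetical, the field names (fs_root, locations, server, rootpath) are modeled on the fs_locations4 structure and should be treated as an assumption here, and the host names and paths are invented examples.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FsLocation:
    server: List[str]    # DNS names or literal IP addresses
    rootpath: List[str]  # path components of the file system on those servers


@dataclass
class FsLocations:
    fs_root: List[str]           # path of the file system root on this server
    locations: List[FsLocation]


def referral_targets(fs_locations: FsLocations) -> List[Tuple[str, str]]:
    """Flatten fs_locations into (server, path) candidates, in listed order."""
    targets = []
    for loc in fs_locations.locations:
        path = "/" + "/".join(loc.rootpath)
        for server in loc.server:
            targets.append((server, path))
    return targets


if __name__ == "__main__":
    # Hypothetical value a client might receive for /this/is/the/path.
    value = FsLocations(
        fs_root=["this", "is", "the", "path"],
        locations=[
            FsLocation(server=["nfs1.example.com", "nfs2.example.com"],
                       rootpath=["export", "data"]),
            FsLocation(server=["192.0.2.10"], rootpath=["vol", "data"]),
        ],
    )
    for server, path in referral_targets(value):
        print(server, path)
```

How the client selects among the candidates is left open by this sketch; it depends on the needs of the client.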
3900 The locations-related attribute may designate a single file system 3901 location or multiple file system locations, to be selected based on 3902 the needs of the client. 3904 Use of multi-server namespaces is enabled by NFSv4 but is not 3905 required. The use of multi-server namespaces and their scope will 3906 depend on the applications used and system administration 3907 preferences. 3909 Multi-server namespaces can be established by a single server 3910 providing a large set of referrals to all of the included file 3911 systems. Alternatively, a single multi-server namespace may be 3912 administratively segmented with separate referral file systems (on 3913 separate servers) for each separately administered portion of the 3914 namespace. The top-level referral file system or any segment may use 3915 replicated referral file systems for higher availability. 3917 Generally, multi-server namespaces are for the most part uniform, in 3918 that the same data made available to one client at a given location 3919 in the namespace is made available to all clients at that location. 3921 8.5. Location Entries and Server Identity 3923 As mentioned above, a single location entry may have a server address 3924 target in the form of a DNS name that may represent multiple IP 3925 addresses, while multiple location entries may have their own server 3926 address targets that reference the same server. 3928 When multiple addresses for the same server exist, the client may 3929 assume that for each file system in the namespace of a given server 3930 network address, there exist file systems at corresponding namespace 3931 locations for each of the other server network addresses. It may do 3932 this even in the absence of explicit listing in fs_locations. Such 3933 corresponding file system locations can be used as alternative 3934 locations, just as those explicitly specified via the fs_locations 3935 attribute. 
If a single location entry designates multiple server IP addresses, the client should choose a single one to use. When two server addresses are designated by a single location entry and they correspond to different servers, this normally indicates some sort of misconfiguration, and so the client should avoid using such location entries when alternatives are available. When they are not, clients should pick one of the IP addresses and use it, without using others that are not directed to the same server.

8.6. Additional Client-Side Considerations

When clients make use of servers that implement referrals, replication, and migration, care should be taken that a user who mounts a given file system that includes a referral or a relocated file system continues to see a coherent picture of that user-side file system despite the fact that it contains a number of server-side file systems that may be on different servers.

One important issue is upward navigation from the root of a server-side file system to its parent (specified as ".." in UNIX), in the case in which it transitions to that file system as a result of referral, migration, or a transition as a result of replication. When the client is at such a point, and it needs to ascend to the parent, it must go back to the parent as seen within the multi-server namespace rather than sending a LOOKUPP operation to the server, which would result in the parent within that server's single-server namespace. In order to do this, the client needs to remember the filehandles that represent such file system roots and use these instead of issuing a LOOKUPP operation to the current server. This will allow the client to present to applications a consistent namespace, where upward navigation and downward navigation are consistent.

Another issue concerns refresh of referral locations.
When referrals are used extensively, they may change as server configurations change. It is expected that clients will cache information related to traversing referrals so that future client-side requests are resolved locally without server communication. This is usually rooted in client-side name lookup caching. Clients should periodically purge this data for referral points in order to detect changes in location information.

A potential problem exists if a client were to allow an open-owner to have state on multiple file systems on a server, in that it is unclear how the sequence numbers associated with open-owners are to be dealt with, in the event of transparent state migration. A client can avoid such a situation, if it ensures that any use of an open-owner is confined to a single file system.

A server MAY decline to migrate state associated with open-owners that span multiple file systems. In cases in which the server chooses not to migrate such state, the server MUST return NFS4ERR_BAD_STATEID when the client uses those stateids on the new server.

The server MUST return NFS4ERR_STALE_STATEID when the client uses those stateids on the old server, regardless of whether migration has occurred or not.

8.7. Effecting File System Referrals

Referrals are effected when an absent file system is encountered, and one or more alternative locations are made available by the fs_locations attribute. The client will typically get an NFS4ERR_MOVED error, fetch the appropriate location information, and proceed to access the file system on a different server, even though it retains its logical position within the original namespace. Referrals differ from migration events in that they happen only when the client has not previously referenced the file system in question (so there is nothing to transition).
Referrals can only come into effect when an absent file system is encountered at its root.

The examples given in the sections below are somewhat artificial in that an actual client will not typically do a multi-component lookup, but will have cached information regarding the upper levels of the name hierarchy. However, these examples are chosen to make the required behavior clear and easy to put within the scope of a small number of requests, without getting unduly into details of how specific clients might choose to cache things.

8.7.1. Referral Example (LOOKUP)

Let us suppose that the following COMPOUND is sent in an environment in which /this/is/the/path is absent from the target server. This may be for a number of reasons. It may be the case that the file system has moved, or it may be the case that the target server is functioning mainly, or solely, to refer clients to the servers on which various file systems are located.

o PUTROOTFH

o LOOKUP "this"

o LOOKUP "is"

o LOOKUP "the"

o LOOKUP "path"

o GETFH

o GETATTR(fsid,fileid,size,time_modify)

Under the given circumstances, the following will be the result.

o PUTROOTFH --> NFS_OK. The current fh is now the root of the pseudo-fs.

o LOOKUP "this" --> NFS_OK. The current fh is for /this and is within the pseudo-fs.

o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is within the pseudo-fs.

o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and is within the pseudo-fs.

o LOOKUP "path" --> NFS_OK. The current fh is for /this/is/the/path and is within a new, absent file system, but ... the client will never see the value of that fh.

o GETFH --> NFS4ERR_MOVED. Fails because current fh is in an absent file system at the start of the operation, and the specification makes no exception for GETFH.
o GETATTR(fsid,fileid,size,time_modify) --> Not executed, because the failure of the GETFH stops processing of the COMPOUND.

Given the failure of the GETFH, the client has the job of determining the root of the absent file system and where to find that file system, i.e., the server and path relative to that server's root fh. Note here that in this example, the client did not obtain filehandles and attribute information (e.g., fsid) for the intermediate directories, so that it would not be sure where the absent file system starts. It could be the case, for example, that /this/is/the is the root of the moved file system and that the reason that the lookup of "path" succeeded is that the file system was not absent on that operation but was moved between the last LOOKUP and the GETFH (since COMPOUND is not atomic). Even if we had the fsids for all of the intermediate directories, we would have no way of knowing that /this/is/the/path was the root of a new file system, since we don't yet have its fsid.

In order to get the necessary information, let us re-send the chain of LOOKUPs with GETFHs and GETATTRs to at least get the fsids so we can be sure where the appropriate file system boundaries are. The client could choose to get fs_locations at the same time, but in most cases the client will have a good guess as to where file system boundaries are (because of where NFS4ERR_MOVED was, and was not, received), making fetching of fs_locations unnecessary.

OP01: PUTROOTFH --> NFS_OK

   - Current fh is root of pseudo-fs.

OP02: GETATTR(fsid) --> NFS_OK

   - Just for completeness. Normally, clients will know the fsid of the pseudo-fs as soon as they establish communication with a server.

OP03: LOOKUP "this" --> NFS_OK

OP04: GETATTR(fsid) --> NFS_OK

   - Get current fsid to see where file system boundaries are.
The 4102 fsid will be that for the pseudo-fs in this example, so no 4103 boundary. 4105 OP05: GETFH --> NFS_OK 4107 - Current fh is for /this and is within pseudo-fs. 4109 OP06: LOOKUP "is" --> NFS_OK 4111 - Current fh is for /this/is and is within pseudo-fs. 4113 OP07: GETATTR(fsid) --> NFS_OK 4115 - Get current fsid to see where file system boundaries are. The 4116 fsid will be that for the pseudo-fs in this example, so no 4117 boundary. 4119 OP08: GETFH --> NFS_OK 4121 - Current fh is for /this/is and is within pseudo-fs. 4123 OP09: LOOKUP "the" --> NFS_OK 4125 - Current fh is for /this/is/the and is within pseudo-fs. 4127 OP10: GETATTR(fsid) --> NFS_OK 4129 - Get current fsid to see where file system boundaries are. The 4130 fsid will be that for the pseudo-fs in this example, so no 4131 boundary. 4133 OP11: GETFH --> NFS_OK 4134 - Current fh is for /this/is/the and is within pseudo-fs. 4136 OP12: LOOKUP "path" --> NFS_OK 4138 - Current fh is for /this/is/the/path and is within a new, absent 4139 file system, but ... 4141 - The client will never see the value of that fh. 4143 OP13: GETATTR(fsid, fs_locations) --> NFS_OK 4145 - We are getting the fsid to know where the file system boundaries 4146 are. In this operation, the fsid will be different than that of 4147 the parent directory (which in turn was retrieved in OP10). Note 4148 that the fsid we are given will not necessarily be preserved at 4149 the new location. That fsid might be different, and in fact the 4150 fsid we have for this file system might be a valid fsid of a 4151 different file system on that new server. 4153 - In this particular case, we are pretty sure anyway that what has 4154 moved is /this/is/the/path rather than /this/is/the since we have 4155 the fsid of the latter and it is that of the pseudo-fs, which 4156 presumably cannot move. 
However, in other examples, we might not 4157 have this kind of information to rely on (e.g., /this/is/the might 4158 be a non-pseudo file system separate from /this/is/the/path), so 4159 we need to have another reliable source of information on the boundary 4160 of the file system that is moved. If, for example, the file 4161 system /this/is had moved, we would have a case of migration 4162 rather than referral, and once the boundaries of the migrated file 4163 system were clear, we could fetch fs_locations. 4165 - We are fetching fs_locations because the fact that we got an 4166 NFS4ERR_MOVED at this point means that it is most likely that this 4167 is a referral and we need the destination. Even if it is the case 4168 that /this/is/the is a file system that has migrated, we will 4169 still need the location information for that file system. 4171 OP14: GETFH --> NFS4ERR_MOVED 4173 - Fails because current fh is in an absent file system at the start 4174 of the operation, and the specification makes no exception for 4175 GETFH. Note that this means the server will never send the client 4176 a filehandle from within an absent file system. 4178 Given the above, the client knows where the root of the absent file 4179 system is (/this/is/the/path) by noting where the change of fsid 4180 occurred (between "the" and "path"). The fs_locations attribute also 4181 gives the client the actual location of the absent file system, so 4182 that the referral can proceed. The server gives the client the bare 4183 minimum of information about the absent file system so that there 4184 will be very little scope for problems of conflict between 4185 information sent by the referring server and information of the file 4186 system's home. No filehandles and very few attributes are present on 4187 the referring server, and the client can treat those it receives as 4188 transient information with the function of enabling the referral. 4190 8.7.2.
Referral Example (READDIR) 4192 Another context in which a client may encounter referrals is when it 4193 does a READDIR on a directory in which some of the sub-directories 4194 are the roots of absent file systems. 4196 Suppose such a directory is read as follows: 4198 o PUTROOTFH 4200 o LOOKUP "this" 4202 o LOOKUP "is" 4204 o LOOKUP "the" 4206 o READDIR (fsid, size, time_modify, mounted_on_fileid) 4208 In this case, because rdattr_error is not requested, fs_locations is 4209 not requested, and some of the attributes cannot be provided, the 4210 result will be an NFS4ERR_MOVED error on the READDIR, with the 4211 detailed results as follows: 4213 o PUTROOTFH --> NFS_OK. The current fh is at the root of the 4214 pseudo-fs. 4216 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 4217 within the pseudo-fs. 4219 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 4220 within the pseudo-fs. 4222 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 4223 is within the pseudo-fs. 4225 o READDIR (fsid, size, time_modify, mounted_on_fileid) --> 4226 NFS4ERR_MOVED. Note that the same error would have been returned 4227 if /this/is/the had migrated, but in this case it is returned because the 4228 directory contains the root of an absent file system. 4230 So now suppose that we re-send with rdattr_error: 4232 o PUTROOTFH 4234 o LOOKUP "this" 4236 o LOOKUP "is" 4238 o LOOKUP "the" 4240 o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) 4242 The results will be: 4244 o PUTROOTFH --> NFS_OK. The current fh is at the root of the 4245 pseudo-fs. 4247 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 4248 within the pseudo-fs. 4250 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 4251 within the pseudo-fs. 4253 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 4254 is within the pseudo-fs. 4256 o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) 4257 --> NFS_OK.
The attributes for directory entry with the component 4258 named "path" will only contain rdattr_error with the value 4259 NFS4ERR_MOVED, together with an fsid value and a value for 4260 mounted_on_fileid. 4262 So suppose we do another READDIR to get fs_locations (although we 4263 could have used a GETATTR directly, as in Section 8.7.1). 4265 o PUTROOTFH 4267 o LOOKUP "this" 4269 o LOOKUP "is" 4271 o LOOKUP "the" 4273 o READDIR (rdattr_error, fs_locations, mounted_on_fileid, fsid, 4274 size, time_modify) 4276 The results would be: 4278 o PUTROOTFH --> NFS_OK. The current fh is at the root of the 4279 pseudo-fs. 4281 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 4282 within the pseudo-fs. 4284 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 4285 within the pseudo-fs. 4287 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 4288 is within the pseudo-fs. 4290 o READDIR (rdattr_error, fs_locations, mounted_on_fileid, fsid, 4291 size, time_modify) --> NFS_OK. The attributes will be as shown 4292 below. 4294 The attributes for the directory entry with the component named 4295 "path" will only contain: 4297 o rdattr_error (value: NFS_OK) 4299 o fs_locations 4301 o mounted_on_fileid (value: unique fileid within referring file 4302 system) 4304 o fsid (value: unique value within referring server) 4306 The attributes for entry "path" will not contain size or time_modify 4307 because these attributes are not available within an absent file 4308 system. 4310 8.8. The Attribute fs_locations 4312 The fs_locations attribute is defined by both fs_location4 4313 (Section 2.2.6) and fs_locations4 (Section 2.2.7). It is used to 4314 represent the location of a file system by providing a server name 4315 and the path to the root of the file system within that server's 4316 namespace. When a set of servers have corresponding file systems at 4317 the same path within their namespaces, an array of server names may 4318 be provided. 
An entry in the server array is a UTF-8 string and 4319 represents one of a traditional DNS host name, IPv4 address, IPv6 4320 address, or a zero-length string. A zero-length string SHOULD be 4321 used to indicate the current address being used for the RPC call. It 4322 is not a requirement that all servers that share the same rootpath be 4323 listed in one fs_location4 instance. The array of server names is 4324 provided for convenience. Servers that share the same rootpath may 4325 also be listed in separate fs_location4 entries in the fs_locations 4326 attribute. 4328 The fs_locations4 data type and fs_locations attribute contain an 4329 array of such locations. Since the namespace of each server may be 4330 constructed differently, the "fs_root" field is provided. The path 4331 represented by fs_root represents the location of the file system in 4332 the current server's namespace, i.e., that of the server from which 4333 the fs_locations attribute was obtained. The fs_root path is meant 4334 to aid the client by clearly referencing the root of the file system 4335 whose locations are being reported, no matter what object within the 4336 current file system the current filehandle designates. The fs_root 4337 is simply the pathname the client used to reach the object on the 4338 current server (i.e., the object to which the fs_locations attribute 4339 applies). 4341 When the fs_locations attribute is interrogated and there are no 4342 alternative file system locations, the server SHOULD return a zero- 4343 length array of fs_location4 structures, together with a valid 4344 fs_root. 4346 As an example, suppose there is a replicated file system located at 4347 two servers (servA and servB). At servA, the file system is located 4348 at path /a/b/c. At servB, the file system is located at path /x/y/z.
4349 If the client were to obtain the fs_locations value for the directory 4350 at /a/b/c/d, it might not necessarily know that the file system's 4351 root is located in servA's namespace at /a/b/c. When the client 4352 switches to servB, it will need to determine that the directory it 4353 first referenced at servA is now represented by the path /x/y/z/d on 4354 servB. To facilitate this, the fs_locations attribute provided by 4355 servA would have an fs_root value of /a/b/c and two entries in 4356 fs_locations. One entry in fs_locations will be for itself (servA) 4357 and the other will be for servB with a path of /x/y/z. With this 4358 information, the client is able to substitute /x/y/z for the /a/b/c 4359 at the beginning of its access path and construct /x/y/z/d to use for 4360 the new server. 4362 Note that: there is no requirement that the number of components in 4363 each rootpath be the same; there is no relation between the number of 4364 components in a rootpath and in fs_root; and none of the components in each 4365 rootpath and fs_root have to be the same. In the above example, we 4366 could have had a third element in the locations array, with server 4367 equal to "servC", and rootpath equal to "/I/II", and a fourth element 4368 in locations with server equal to "servD" and rootpath equal to "/ 4369 aleph/beth/gimel/daleth/he". 4371 The relationship between fs_root and a rootpath is that the client 4372 replaces the pathname indicated in fs_root for the current server with 4373 the substitute indicated in rootpath for the new server. 4375 For an example of a referred or migrated file system, suppose there 4376 is a file system located at serv1. At serv1, the file system is 4377 located at /az/buky/vedi/glagoli. The client finds that the object at 4378 glagoli has migrated (or is a referral).
The client gets the 4379 fs_locations attribute, which contains an fs_root of /az/buky/vedi/ 4380 glagoli, and one element in the locations array, with server equal to 4381 serv2, and rootpath equal to /izhitsa/fita. The client replaces /az/ 4382 buky/vedi/glagoli with /izhitsa/fita, and uses the latter pathname on 4383 serv2. 4385 Thus, the server MUST return an fs_root that is equal to the path the 4386 client used to reach the object to which the fs_locations attribute 4387 applies. Otherwise, the client cannot determine the new path to use 4388 on the new server. 4390 9. File Locking and Share Reservations 4392 Integrating locking into the NFS protocol necessarily causes it to be 4393 stateful. With the inclusion of share reservations the protocol 4394 becomes substantially more dependent on state than the traditional 4395 combination of NFS and NLM (Network Lock Manager) [xnfs]. There are 4396 three components to making this state manageable: 4398 o clear division between client and server 4400 o ability to reliably detect inconsistency in state between client 4401 and server 4403 o simple and robust recovery mechanisms 4405 In this model, the server owns the state information. The client 4406 requests changes in locks and the server responds with the changes 4407 made. Non-client-initiated changes in locking state are infrequent. 4408 The client receives prompt notification of such changes and can 4409 adjust its view of the locking state to reflect the server's changes. 4411 Individual pieces of state created by the server and passed to the 4412 client at its request are represented by 128-bit stateids. These 4413 stateids may represent a particular open file, a set of byte-range 4414 locks held by a particular owner, or a recallable delegation of 4415 privileges to access a file in particular ways or at a particular 4416 location. 
4418 In all cases, there is a transition from the most general information 4419 that represents a client as a whole to the eventual lightweight 4420 stateid used for most client and server locking interactions. The 4421 details of this transition will vary with the type of object but it 4422 always starts with a client ID. 4424 To support Win32 share reservations it is necessary to atomically 4425 OPEN or CREATE files and apply the appropriate locks in the same 4426 operation. Having a separate share/unshare operation would not allow 4427 correct implementation of the Win32 OpenFile API. In order to 4428 correctly implement share semantics, the previous NFS protocol 4429 mechanisms used when a file is opened or created (LOOKUP, CREATE, 4430 ACCESS) need to be replaced. The NFSv4 protocol has an OPEN 4431 operation that subsumes the NFSv3 methodology of LOOKUP, CREATE, and 4432 ACCESS. However, because many operations require a filehandle, the 4433 traditional LOOKUP is preserved to map a file name to filehandle 4434 without establishing state on the server. The policy of granting 4435 access or modifying files is managed by the server based on the 4436 client's state. These mechanisms can implement policy ranging from 4437 advisory-only locking to full mandatory locking. 4439 9.1. Opens and Byte-Range Locks 4441 It is assumed that manipulating a byte-range lock is rare when 4442 compared to READ and WRITE operations. It is also assumed that 4443 server restarts and network partitions are relatively rare. 4444 Therefore it is important that the READ and WRITE operations have a 4445 lightweight mechanism to indicate if they possess a held lock. A 4446 byte-range lock request contains the heavyweight information required 4447 to establish a lock and uniquely define the owner of the lock. 4449 The following sections describe the transition from the heavyweight 4450 information to the eventual stateid used for most client and server 4451 locking and lease interactions.
4453 9.1.1. Client ID 4455 For each LOCK request, the client must identify itself to the server. 4456 This is done in such a way as to allow for correct lock 4457 identification and crash recovery. A sequence of a SETCLIENTID 4458 operation followed by a SETCLIENTID_CONFIRM operation is required to 4459 establish the identification onto the server. Establishment of 4460 identification by a new incarnation of the client also has the effect 4461 of immediately breaking any leased state that a previous incarnation 4462 of the client might have had on the server, as opposed to forcing the 4463 new client incarnation to wait for the leases to expire. Breaking 4464 the lease state amounts to the server removing all lock, share 4465 reservation, and, where the server is not supporting the 4466 CLAIM_DELEGATE_PREV claim type, all delegation state associated with 4467 the same client with the same identity. For discussion of delegation 4468 state recovery, see Section 10.2.1. 4470 Owners of opens and owners of byte-range locks are separate entities 4471 and remain separate even if the same opaque arrays are used to 4472 designate owners of each. The protocol distinguishes between open- 4473 owners (represented by open_owner4 structures) and lock-owners 4474 (represented by lock_owner4 structures). 4476 Both sorts of owners consist of a clientid and an opaque owner 4477 string. For each client, the set of distinct owner values used with 4478 that client constitutes the set of owners of that type, for the given 4479 client. 4481 Each open is associated with a specific open-owner while each byte- 4482 range lock is associated with a lock-owner and an open-owner, the 4483 latter being the open-owner associated with the open file under which 4484 the LOCK operation was done.
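As an informal illustration of the owner model just described, the following Python sketch shows one way an implementation might keep open-owners and lock-owners in separate namespaces even when both carry the same clientid and opaque bytes. The class names OpenOwner, LockOwner, and ByteRangeLock are hypothetical and chosen only for this example; they are not the protocol's open_owner4 and lock_owner4 XDR types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenOwner:
    """Hypothetical in-memory stand-in for an open-owner."""
    clientid: int
    owner: bytes


@dataclass(frozen=True)
class LockOwner:
    """Hypothetical in-memory stand-in for a lock-owner."""
    clientid: int
    owner: bytes


@dataclass(frozen=True)
class ByteRangeLock:
    """A byte-range lock names both its lock-owner and the open-owner
    of the open file under which the LOCK operation was done."""
    lock_owner: LockOwner
    open_owner: OpenOwner
    offset: int
    length: int


# The same clientid and opaque bytes still designate two distinct owners,
# because the owner type is part of the identity.
oo = OpenOwner(clientid=7, owner=b"proc-42")
lo = LockOwner(clientid=7, owner=b"proc-42")
lock = ByteRangeLock(lock_owner=lo, open_owner=oo, offset=0, length=4096)
```

Using distinct types, rather than a shared (clientid, owner) tuple, is one simple way to keep the two sets of owners from colliding in a lookup table.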
4486 Client identification is encapsulated in the following structure: 4488 struct nfs_client_id4 { 4489 verifier4 verifier; 4490 opaque id<NFS4_OPAQUE_LIMIT>; 4491 }; 4493 The first field, verifier, is a client incarnation verifier that is 4494 used to detect client reboots. Only if the verifier is different 4495 from that which the server has previously recorded for the client (as 4496 identified by the second field of the structure, id) does the server 4497 start the process of canceling the client's leased state. 4499 The second field, id, is a variable length string that uniquely 4500 defines the client. 4502 There are several considerations for how the client generates the id 4503 string: 4505 o The string should be unique so that multiple clients do not 4506 present the same string. The consequences of two clients 4507 presenting the same string range from one client getting an error 4508 to one client having its leased state abruptly and unexpectedly 4509 canceled. 4511 o The string should be selected so that subsequent incarnations 4512 (e.g., reboots) of the same client cause the client to present the 4513 same string. The implementer is cautioned against an approach 4514 that requires the string to be recorded in a local file because 4515 this precludes the use of the implementation in an environment 4516 where there is no local disk and all file access is from an NFSv4 4517 server. 4519 o The string should be different for each server network address 4520 that the client accesses, rather than common to all server network 4521 addresses. The reason is that it may not be possible for the 4522 client to tell if the same server is listening on multiple network 4523 addresses. If the client issues SETCLIENTID with the same id 4524 string to each network address of such a server, the server will 4525 think it is the same client, and each successive SETCLIENTID will 4526 cause the server to begin the process of removing the client's 4527 previous leased state.
4529 o The algorithm for generating the string should not assume that the 4530 client's network address won't change. This includes changes 4531 between client incarnations and even changes while the client is 4532 still running in its current incarnation. This means that if 4533 the client includes just the client's and server's network address 4534 in the id string, there is a real risk, after the client gives up 4535 the network address, that another client, using a similar 4536 algorithm for generating the id string, will generate a 4537 conflicting id string. 4539 Given the above considerations, an example of a well generated id 4540 string is one that includes: 4542 o The server's network address. 4544 o The client's network address. 4546 o For a user level NFSv4 client, it should contain additional 4547 information to distinguish the client from other user level 4548 clients running on the same host, such as a universally unique 4549 identifier (UUID). 4551 o Additional information that tends to be unique, such as one or 4552 more of: 4554 * The client machine's serial number (for privacy reasons, it is 4555 best to perform some one way function on the serial number). 4557 * A MAC address (for privacy reasons, it is best to perform some 4558 one way function on the MAC address). 4560 * The timestamp of when the NFSv4 software was first installed on 4561 the client (though this is subject to the previously mentioned 4562 caution about using information that is stored in a file, 4563 because the file might only be accessible over NFSv4). 4565 * A true random number. However since this number ought to be 4566 the same between client incarnations, this shares the same 4567 problem as that of using the timestamp of the software 4568 installation. 4570 As a security measure, the server MUST NOT cancel a client's leased 4571 state if the principal that established the state for a given id 4572 string is not the same as the principal issuing the SETCLIENTID.
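The considerations above for generating the id string can be sketched informally as follows. This is a minimal illustration, not a prescribed algorithm: the function names one_way and build_client_id_string, the "srv=", "cli=", "hw=", and "uuid=" labels, the "/" separator, and the choice of SHA-256 as the one way function are all assumptions made for this example.

```python
import hashlib
import uuid
from typing import Optional


def one_way(value: str) -> str:
    """Apply a one way function (SHA-256 here, by assumption) so the raw
    serial number or MAC address is not exposed in the id string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def build_client_id_string(server_addr: str,
                           client_addr: str,
                           hw_identifier: str,
                           user_level_uuid: Optional[uuid.UUID] = None) -> bytes:
    """Build an id string for the id field of nfs_client_id4.

    The result differs per server network address, is stable across
    client incarnations as long as the inputs are stable, and does not
    rely on the client's network address alone for uniqueness.
    """
    parts = [
        "srv=" + server_addr,
        "cli=" + client_addr,
        "hw=" + one_way(hw_identifier),
    ]
    # A user level client adds a UUID to distinguish itself from other
    # user level clients running on the same host.
    if user_level_uuid is not None:
        parts.append("uuid=" + str(user_level_uuid))
    return "/".join(parts).encode("utf-8")


example_id = build_client_id_string("192.0.2.1", "198.51.100.7", "SN-12345")
```

Because the hardware identifier is hashed and none of the inputs are read from a local file, the sketch also works for a diskless client, provided the hardware identifier can be read without file access.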
4574 Note that SETCLIENTID (Section 15.35) and SETCLIENTID_CONFIRM 4575 (Section 15.36) have a secondary purpose of establishing the 4576 information the server needs to make callbacks to the client for the 4577 purpose of supporting delegations. It is permitted to change this 4578 information via SETCLIENTID and SETCLIENTID_CONFIRM within the same 4579 incarnation of the client without removing the client's leased state. 4581 Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has successfully 4582 completed, the client uses the shorthand client identifier, of type 4583 clientid4, instead of the longer and less compact nfs_client_id4 4584 structure. This shorthand client identifier (a client ID) is 4585 assigned by the server and should be chosen so that it will not 4586 conflict with a client ID previously assigned by the server. This 4587 applies across server restarts or reboots. When a client ID is 4588 presented to a server and that client ID is not recognized, as would 4589 happen after a server reboot, the server will reject the request with 4590 the error NFS4ERR_STALE_CLIENTID. When this happens, the client must 4591 obtain a new client ID by use of the SETCLIENTID operation and then 4592 proceed to any other necessary recovery for the server reboot case 4593 (see Section 9.6.2). 4595 The client must also employ the SETCLIENTID operation when it 4596 receives an NFS4ERR_STALE_STATEID error using a stateid derived from 4597 its current client ID, since this also indicates a server reboot 4598 which has invalidated the existing client ID (see Section 9.6.2 for 4599 details). 4601 See the detailed descriptions of SETCLIENTID (Section 15.35.4) and 4602 SETCLIENTID_CONFIRM (Section 15.36.4) for a complete specification of 4603 the operations. 4605 9.1.2. Server Release of Client ID 4607 If the server determines that the client holds no associated state 4608 for its client ID, the server may choose to release the client ID.
4609 The server may make this choice for an inactive client so that 4610 resources are not consumed by those intermittently active clients. 4611 If the client contacts the server after this release, the server must 4612 ensure the client receives the appropriate error so that it will use 4613 the SETCLIENTID/SETCLIENTID_CONFIRM sequence to establish a new 4614 identity. It should be clear that the server must be very hesitant 4615 to release a client ID since the resulting work on the client to 4616 recover from such an event will be the same burden as if the server 4617 had failed and restarted. Typically a server would not release a 4618 client ID unless there had been no activity from that client for many 4619 minutes. 4621 Note that if the id string in a SETCLIENTID request is properly 4622 constructed, and if the client takes care to use the same principal 4623 for each successive use of SETCLIENTID, then, barring an active 4624 denial of service attack, NFS4ERR_CLID_INUSE should never be 4625 returned. 4627 However, client bugs, server bugs, or perhaps a deliberate change of 4628 the principal owner of the id string (such as the case of a client 4629 that changes security flavors, and under the new flavor, there is no 4630 mapping to the previous owner) will in rare cases result in 4631 NFS4ERR_CLID_INUSE. 4633 In that event, when the server gets a SETCLIENTID for a client ID 4634 that currently has no state, or it has state, but the lease has 4635 expired, rather than returning NFS4ERR_CLID_INUSE, the server MUST 4636 allow the SETCLIENTID, and confirm the new client ID if followed by 4637 the appropriate SETCLIENTID_CONFIRM. 4639 9.1.3. Use of Seqids 4641 In several contexts, 32-bit sequence values, called "seqids" are used 4642 as part of managing locking state. Such values are used: 4644 o To provide an ordering of locking-related operations associated 4645 with a particular lock-owner or open-owner. See Section 9.1.7 for 4646 a detailed explanation. 
4648 o To define an ordered set of instances of a set of locks sharing a 4649 particular set of ownership characteristics. See Section 9.1.4.2 4650 for a detailed explanation. 4652 Successive seqid values for the same object are normally arrived at 4653 by incrementing the current value by one. This pattern continues 4654 until the seqid is incremented past NFS4_UINT32_MAX, in which case 4655 one (rather than zero) is to be the next seqid value. 4657 When two seqid values are to be compared to determine which of the 4658 two is later, the possibility of wraparound needs to be considered. 4659 In many cases, the values are such that simple numeric comparisons 4660 can be used. For example, if the seqid values to be compared are 4661 both less than one million, the higher value can be considered the 4662 later. On the other hand, if one of the values is at or near 4663 NFS4_UINT32_MAX and the other is less than one million, then 4664 implementations can reasonably decide that the lower value has had 4665 one more wraparound and is thus, while numerically lower, actually 4666 later. 4668 Implementations can compare seqids in the presence of potential 4669 wraparound by adopting the reasonable assumption that the chain of 4670 increments from one to the other is shorter than 2**31. So, if the 4671 difference between the two seqids is less than 2**31, then the lower 4672 seqid is to be treated as earlier. If, however, the difference 4673 between the two seqids is greater than or equal to 2**31, then it can 4674 be assumed that the lower seqid has encountered one more wraparound 4675 and can be treated as later. 4677 9.1.4. Stateid Definition 4679 When the server grants a lock of any type (including opens, byte- 4680 range locks, and delegations), it responds with a unique stateid that 4681 represents a set of locks (often a single lock) for the same file, of 4682 the same type, and sharing the same ownership characteristics.
Thus, 4683 opens of the same file by different open-owners each have an 4684 identifying stateid. Similarly, each set of byte-range locks on a 4685 file owned by a specific lock-owner has its own identifying stateid. 4686 Delegations also have associated stateids by which they may be 4687 referenced. The stateid is used as a shorthand reference to a lock 4688 or set of locks, and given a stateid, the server can determine the 4689 associated state-owner or state-owners (in the case of an open-owner/ 4690 lock-owner pair) and the associated filehandle. When stateids are 4691 used, the current filehandle must be the one associated with that 4692 stateid. 4694 All stateids associated with a given client ID are associated with a 4695 common lease that represents the claim of those stateids and the 4696 objects they represent to be maintained by the server. See 4697 Section 9.5 for a discussion of the lease. 4699 Each stateid must be unique to the server. Many operations take a 4700 stateid as an argument but not a clientid, so the server must be able 4701 to infer the client from the stateid. 4703 9.1.4.1. Stateid Types 4705 With the exception of special stateids (see Section 9.1.4.3), each 4706 stateid represents locking objects of one of a set of types defined 4707 by the NFSv4 protocol. Note that in all these cases, where we speak 4708 of a guarantee, it is understood there are situations such as a 4709 client restart, or lock revocation, that allow the guarantee to be 4710 voided. 4712 o Stateids may represent opens of files. 4714 Each stateid in this case represents the OPEN state for a given 4715 client ID/open-owner/filehandle triple. Such stateids are subject 4716 to change (with consequent incrementing of the stateid's seqid) in 4717 response to OPENs that result in upgrade and OPEN_DOWNGRADE 4718 operations. 4720 o Stateids may represent sets of byte-range locks. 
4722 All locks held on a particular file by a particular owner and all 4723 gotten under the aegis of a particular open file are associated 4724 with a single stateid with the seqid being incremented whenever 4725 LOCK and LOCKU operations affect that set of locks. 4727 o Stateids may represent file delegations, which are recallable 4728 guarantees by the server to the client that other clients will not 4729 reference, or will not modify, a particular file until the 4730 delegation is returned. 4732 A stateid represents a single delegation held by a client for a 4733 particular filehandle. 4735 9.1.4.2. Stateid Structure 4737 Stateids are divided into two fields, a 96-bit "other" field 4738 identifying the specific set of locks and a 32-bit "seqid" sequence 4739 value. Except in the case of special stateids (see Section 9.1.4.3), 4740 a particular value of the "other" field denotes a set of locks of the 4741 same type (for example, byte-range locks, opens, or delegations), for 4742 a specific file or directory, and sharing the same ownership 4743 characteristics. The seqid designates a specific instance of such a 4744 set of locks, and is incremented to indicate changes in such a set of 4745 locks, either by the addition or deletion of locks from the set, a 4746 change in the byte-range they apply to, or an upgrade or downgrade in 4747 the type of one or more locks. 4749 When such a set of locks is first created, the server returns a 4750 stateid with seqid value of one. On subsequent operations that 4751 modify the set of locks, the server is required to advance the 4752 "seqid" field by one whenever it returns a stateid for the same 4753 state-owner/file/type combination and the operation is one that might 4754 make some change in the set of locks actually designated. In this 4755 case, the server will return a stateid with an "other" field the same 4756 as previously used for that state-owner/file/type combination, with 4757 an incremented "seqid" field. 
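The stateid layout and seqid handling described above, together with the wraparound-aware comparison of Section 9.1.3, can be sketched informally as follows. This is an illustration under stated assumptions rather than a definitive implementation: the class name Stateid and the helper names next_seqid, seqid_is_later, and advanced are hypothetical and exist only for this example.

```python
from dataclasses import dataclass

NFS4_UINT32_MAX = 0xFFFFFFFF


def next_seqid(seqid: int) -> int:
    """Advance a seqid by one; past NFS4_UINT32_MAX the next value is
    one (rather than zero)."""
    return 1 if seqid == NFS4_UINT32_MAX else seqid + 1


def seqid_is_later(a: int, b: int) -> bool:
    """Return True if seqid a is later than seqid b, assuming the chain
    of increments between them is shorter than 2**31."""
    if a == b:
        return False
    lower, higher = min(a, b), max(a, b)
    if higher - lower < 2**31:
        # No wraparound assumed: the numerically higher value is later.
        return a == higher
    # The lower value is assumed to have wrapped once more, so it is later.
    return a == lower


@dataclass(frozen=True)
class Stateid:
    """Hypothetical in-memory form of a stateid: a 96-bit "other" field
    plus a 32-bit "seqid"."""
    other: bytes
    seqid: int = 1  # a newly created set of locks starts at seqid one

    def __post_init__(self) -> None:
        if len(self.other) != 12:
            raise ValueError('"other" must be exactly 96 bits (12 bytes)')

    def advanced(self) -> "Stateid":
        """Same "other" field, seqid advanced by one, as returned after
        an operation that might change the set of locks."""
        return Stateid(self.other, next_seqid(self.seqid))


first = Stateid(other=b"\x01" * 12)
second = first.advanced()
```

In this sketch, seqid_is_later(3, NFS4_UINT32_MAX - 2) is true because the difference is at least 2**31, so the numerically lower value is taken to have wrapped around once more.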
4759 Seqids will be compared by both the client and the server. The 4760 client uses such comparisons to determine the order of operations 4761 while the server uses them to determine whether the 4762 NFS4ERR_OLD_STATEID error is to be returned. In all cases, the 4763 possibility of seqid wraparound needs to be taken into account, as 4764 discussed in Section 9.1.3. 4766 9.1.4.3. Special Stateids 4768 Stateid values whose "other" field is either all zeros or all ones 4769 are reserved. They MUST NOT be assigned by the server but have 4770 special meanings defined by the protocol. The particular meaning 4771 depends on whether the "other" field is all zeros or all ones and the 4772 specific value of the "seqid" field. 4774 The following combinations of "other" and "seqid" are defined in 4775 NFSv4: 4777 Anonymous Stateid: When "other" and "seqid" are both zero, the 4778 stateid is treated as a special anonymous stateid, which can be 4779 used in READ, WRITE, and SETATTR requests to indicate the absence 4780 of any open state associated with the request. When an anonymous 4781 stateid value is used, and an existing open denies the form of 4782 access requested, then access will be denied to the request. 4784 READ Bypass Stateid: When "other" and "seqid" are both all ones, the 4785 stateid is a special READ bypass stateid. When this value is used 4786 in WRITE or SETATTR, it is treated like the anonymous value. When 4787 used in READ, the server MAY grant access, even if access would 4788 normally be denied to READ requests. 4790 If a stateid value is used which has all zero or all ones in the 4791 "other" field, but does not match one of the cases above, the server 4792 MUST return the error NFS4ERR_BAD_STATEID. 4794 Special stateids, unlike other stateids, are not associated with 4795 individual client IDs or filehandles and can be used with all valid 4796 client IDs and filehandles. 4798 9.1.4.4.
Stateid Lifetime and Validation 4800 Stateids must remain valid until either a client restart or a server 4801 restart or until the client returns all of the locks associated with 4802 the stateid by means of an operation such as CLOSE or DELEGRETURN. 4803 If the locks are lost due to revocation, then as long as the client ID is 4804 valid, the stateid remains a valid designation of that revoked state. 4805 Stateids associated with byte-range locks are an exception. They 4806 remain valid even if a LOCKU frees all remaining locks, so long as 4807 the open file with which they are associated remains open. 4809 It should be noted that there are situations in which the client's 4810 locks become invalid, without the client requesting they be returned. 4811 These include lease expiration and a number of forms of lock 4812 revocation within the lease period. It is important to note that in 4813 these situations, the stateid remains valid and the client can use it 4814 to determine the disposition of the associated lost locks. 4816 An "other" value must never be reused for a different purpose (i.e., 4817 different filehandle, owner, or type of locks) within the context of 4818 a single client ID. A server may retain the "other" value for the 4819 same purpose beyond the point where it may otherwise be freed but if 4820 it does so, it must maintain "seqid" continuity with previous values. 4822 One mechanism that may be used to satisfy the requirement that the 4823 server recognize invalid and out-of-date stateids is for the server 4824 to divide the "other" field of the stateid into two fields. 4826 o An index into a table of locking-state structures. 4828 o A generation number which is incremented on each allocation of a 4829 table entry for a particular use. 4831 And then store in each table entry, 4833 o The client ID with which the stateid is associated. 4835 o The current generation number for the (at most one) valid stateid 4836 sharing this index value.
4838 o The filehandle of the file on which the locks are taken. 4840 o An indication of the type of stateid (open, byte-range lock, file 4841 delegation). 4843 o The last "seqid" value returned corresponding to the current 4844 "other" value. 4846 o An indication of the current status of the locks associated with 4847 this stateid. In particular, whether these have been revoked and 4848 if so, for what reason. 4850 With this information, an incoming stateid can be validated and the 4851 appropriate error returned when necessary. Special and non-special 4852 stateids are handled separately. (See Section 9.1.4.3 for a 4853 discussion of special stateids.) 4855 When a stateid is being tested, and the "other" field is all zeros or 4856 all ones, a check that the "other" and "seqid" fields match a defined 4857 combination for a special stateid is done and the results determined 4858 as follows: 4860 o If the "other" and "seqid" fields do not match a defined 4861 combination associated with a special stateid, the error 4862 NFS4ERR_BAD_STATEID is returned. 4864 o If the combination is valid in general but is not appropriate to 4865 the context in which the stateid is used (e.g., an all-zero 4866 stateid is used when an open stateid is required in a LOCK 4867 operation), the error NFS4ERR_BAD_STATEID is also returned. 4869 o Otherwise, the check is completed and the special stateid is 4870 accepted as valid. 4872 When a stateid is being tested, and the "other" field is neither all 4873 zeros or all ones, the following procedure could be used to validate 4874 an incoming stateid and return an appropriate error, when necessary, 4875 assuming that the "other" field would be divided into a table index 4876 and an entry generation. Note that the terms "earlier" and "later" 4877 used in connection with seqid comparison are to be understood as 4878 explained in Section 9.1.3. 
   o  If the table index field is outside the range of the associated
      table, return NFS4ERR_BAD_STATEID.

   o  If the selected table entry is of a different generation than that
      specified in the incoming stateid, return NFS4ERR_BAD_STATEID.

   o  If the selected table entry does not match the current filehandle,
      return NFS4ERR_BAD_STATEID.

   o  If the stateid represents revoked state or state lost as a result
      of lease expiration, then return NFS4ERR_EXPIRED,
      NFS4ERR_BAD_STATEID, or NFS4ERR_ADMIN_REVOKED, as appropriate.

   o  If the stateid type is not valid for the context in which the
      stateid appears, return NFS4ERR_BAD_STATEID. Note that a stateid
      may be valid in general, but be invalid for a particular
      operation, as, for example, when a stateid that doesn't represent
      byte-range locks is passed to the non-from_open case of LOCK or to
      LOCKU, or when a stateid that does not represent an open is
      passed to CLOSE or OPEN_DOWNGRADE. In such cases, the server MUST
      return NFS4ERR_BAD_STATEID.

   o  If the "seqid" field is not zero, and it is later than the current
      sequence value corresponding to the current "other" field, return
      NFS4ERR_BAD_STATEID.

   o  If the "seqid" field is earlier than the current sequence value
      corresponding to the current "other" field, return
      NFS4ERR_OLD_STATEID.

   o  Otherwise, the stateid is valid and the table entry should contain
      any additional information about the type of stateid and
      information associated with that particular type of stateid, such
      as the associated set of locks (e.g., open-owner and lock-owner
      information), as well as information on the specific locks, such
      as open modes and byte ranges.

9.1.4.5.  Stateid Use for I/O Operations

   Clients performing Input/Output (I/O) operations need to select an
   appropriate stateid based on the locks (including opens and
   delegations) held by the client and the various types of state-owners
   sending the I/O requests. SETATTR operations that change the file
   size are treated like I/O operations in this regard.

   The following rules, applied in order of decreasing priority, govern
   the selection of the appropriate stateid. In following these rules,
   the client will only consider locks of which it has actually received
   notification by an appropriate operation response or callback.

   o  If the client holds a delegation for the file in question, the
      delegation stateid SHOULD be used.

   o  Otherwise, if the entity corresponding to the lock-owner (e.g., a
      process) sending the I/O has a byte-range lock stateid for the
      associated open file, then the byte-range lock stateid for that
      lock-owner and open file SHOULD be used.

   o  If there is no byte-range lock stateid, then the OPEN stateid for
      the current open-owner, i.e., the OPEN stateid for the open file
      in question, SHOULD be used.

   o  Finally, if none of the above apply, then a special stateid SHOULD
      be used.

   Ignoring these rules may result in situations in which the server
   does not have information necessary to properly process the request.
   For example, when mandatory byte-range locks are in effect, if the
   stateid does not indicate the proper lock-owner, via a lock stateid,
   a request might be avoidably rejected.

   The server, however, should not try to enforce these ordering rules
   and should use whatever information is available to properly process
   I/O requests. In particular, when a client has a delegation for a
   given file, the server SHOULD take note of this fact in processing a
   request, even if it is sent with a special stateid.

9.1.4.6.  Stateid Use for SETATTR Operations

   In the case of SETATTR operations, a stateid is present. In cases
   other than those that set the file size, the client may send either a
   special stateid or, when a delegation is held for the file in
   question, a delegation stateid. While the server SHOULD validate the
   stateid and may use the stateid to optimize the determination as to
   whether a delegation is held, it SHOULD note the presence of a
   delegation even when a special stateid is sent, and MUST accept a
   valid delegation stateid when sent.

9.1.5.  lock-owner

   When requesting a lock, the client must present to the server the
   client ID and an identifier for the owner of the requested lock.
   These two fields are referred to as the lock-owner, and the
   definitions of those fields are:

   o  A client ID returned by the server as part of the client's use of
      the SETCLIENTID operation.

   o  A variable-length opaque array used to uniquely define the owner
      of a lock managed by the client.

      This may be a thread id, process id, or other unique value.

   When the server grants the lock, it responds with a unique stateid.
   The stateid is used as a shorthand reference to the lock-owner, since
   the server will be maintaining the correspondence between them.

9.1.6.  Use of the Stateid and Locking

   All READ, WRITE, and SETATTR operations contain a stateid. For the
   purposes of this section, SETATTR operations that change the size
   attribute of a file are treated as if they are writing the area
   between the old and new size (i.e., the range truncated or added to
   the file by means of the SETATTR), even where SETATTR is not
   explicitly mentioned in the text. The stateid passed to one of these
   operations must be one that represents an OPEN (e.g., via the
   open-owner), a set of byte-range locks, or a delegation, or it may be
   a special stateid representing anonymous access or the READ bypass
   stateid.

   If the state-owner performs a READ or WRITE in a situation in which
   it has established a lock or share reservation on the server (any
   OPEN constitutes a share reservation), the stateid (previously
   returned by the server) must be used to indicate what locks,
   including both byte-range locks and share reservations, are held by
   the state-owner. If no state is established by the client, either
   byte-range lock or share reservation, the anonymous stateid is used.
   Regardless of whether an anonymous stateid or a stateid returned by
   the server is used, if there is a conflicting share reservation or
   mandatory byte-range lock held on the file, the server MUST refuse to
   service the READ or WRITE operation.

   Share reservations are established by OPEN operations and by their
   nature are mandatory in that when the OPEN denies READ or WRITE
   operations, that denial results in such operations being rejected
   with error NFS4ERR_LOCKED. Byte-range locks may be implemented by
   the server as either mandatory or advisory, or the choice of
   mandatory or advisory behavior may be determined by the server on the
   basis of the file being accessed (for example, some UNIX-based
   servers support a "mandatory lock bit" on the mode attribute such
   that if set, byte-range locks are required on the file before I/O is
   possible). When byte-range locks are advisory, they only prevent the
   granting of conflicting lock requests and have no effect on READs or
   WRITEs. Mandatory byte-range locks, however, prevent conflicting I/O
   operations. When such operations are attempted, they are rejected
   with NFS4ERR_LOCKED. When the client gets NFS4ERR_LOCKED on a file it
   knows it has the proper share reservation for, it will need to issue
   a LOCK request on the region of the file that includes the region the
   I/O was to be performed on, with an appropriate locktype (i.e.,
   READ*_LT for a READ operation, WRITE*_LT for a WRITE operation).

   With NFSv3, there was no notion of a stateid, so there was no way to
   tell if the application process of the client sending the READ or
   WRITE operation had also acquired the appropriate byte-range lock on
   the file. Thus, there was no way to implement mandatory locking.
   With the stateid construct, this barrier has been removed.

   Note that for UNIX environments that support mandatory file locking,
   the distinction between advisory and mandatory locking is subtle. In
   fact, advisory and mandatory byte-range locks are exactly the same
   insofar as the APIs and requirements on implementation are concerned.
   If the mandatory lock attribute is set on the file, the server checks
   to see if the lock-owner has an appropriate shared (read) or
   exclusive (write) byte-range lock on the region it wishes to read or
   write to. If there is no appropriate lock, the server checks if
   there is a conflicting lock (which can be done by attempting to
   acquire the conflicting lock on behalf of the lock-owner and, if
   successful, releasing the lock after the READ or WRITE is done), and
   if there is, the server returns NFS4ERR_LOCKED.

   For Windows environments, there are no advisory byte-range locks, so
   the server always checks for byte-range locks during I/O requests.

   Thus, the NFSv4 LOCK operation does not need to distinguish between
   advisory and mandatory byte-range locks. It is the NFSv4 server's
   processing of the READ and WRITE operations that introduces the
   distinction.
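
   The way the server's READ and WRITE processing introduces the
   advisory/mandatory distinction can be illustrated with a small
   sketch. It is not part of the protocol: the ByteRangeLock record,
   the check_io helper, and the string error codes are hypothetical
   names chosen for this example, and share reservations are
   deliberately left out.

```python
from dataclasses import dataclass

NFS4_OK = "NFS4_OK"
NFS4ERR_LOCKED = "NFS4ERR_LOCKED"


@dataclass(frozen=True)
class ByteRangeLock:
    """Hypothetical record of one granted byte-range lock."""
    owner: str        # stands in for the lock-owner
    offset: int
    length: int
    exclusive: bool   # True for a write lock, False for a read lock


def overlaps(lock, offset, length):
    """True if the lock's range intersects [offset, offset + length)."""
    return lock.offset < offset + length and offset < lock.offset + lock.length


def covers(lock, offset, length):
    """True if the lock's range contains all of [offset, offset + length)."""
    return lock.offset <= offset and offset + length <= lock.offset + lock.length


def check_io(locks, owner, offset, length, is_write, mandatory):
    """Decide whether a READ or WRITE may proceed with respect to byte-range locks."""
    if not mandatory:
        # Advisory locks have no effect on READs or WRITEs.
        return NFS4_OK
    # First check: the requesting lock-owner already holds an appropriate lock.
    for lock in locks:
        if lock.owner == owner and covers(lock, offset, length) \
                and (lock.exclusive or not is_write):
            return NFS4_OK
    # Otherwise look for a conflicting lock held by another owner.
    for lock in locks:
        if lock.owner != owner and overlaps(lock, offset, length) \
                and (is_write or lock.exclusive):
            return NFS4ERR_LOCKED
    return NFS4_OK
```

   With mandatory set to False the function never rejects I/O, which is
   the advisory behavior; the same lock table yields NFS4ERR_LOCKED
   for a conflicting request once mandatory is True.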

   Every stateid other than the special stateid values noted in this
   section, whether returned by an OPEN-type operation (i.e., OPEN,
   OPEN_DOWNGRADE), or by a LOCK-type operation (i.e., LOCK or LOCKU),
   defines an access mode for the file (i.e., READ, WRITE, or
   READ-WRITE) as established by the original OPEN that began the
   stateid sequence, and as modified by subsequent OPENs and
   OPEN_DOWNGRADEs within that stateid sequence. When a READ, WRITE, or
   SETATTR that specifies the size attribute is done, the operation is
   subject to checking against the access mode to verify that the
   operation is appropriate given the OPEN with which the operation is
   associated.

   In the case of WRITE-type operations (i.e., WRITEs and SETATTRs that
   set size), the server must verify that the access mode allows writing
   and return an NFS4ERR_OPENMODE error if it does not. In the case of
   READ, the server may perform the corresponding check on the access
   mode, or it may choose to allow READ on opens for WRITE only, to
   accommodate clients whose write implementation may unavoidably do
   reads (e.g., due to buffer cache constraints). However, even if
   READs are allowed in these circumstances, the server MUST still check
   for locks that conflict with the READ (e.g., another open specifying
   denial of READs). Note that a server that does enforce the access
   mode check on READs need not explicitly check for conflicting share
   reservations since the existence of OPEN for read access guarantees
   that no conflicting share reservation can exist.

   A READ bypass stateid MAY allow READ operations to bypass locking
   checks at the server. However, WRITE operations with a READ bypass
   stateid MUST NOT bypass locking checks and are treated exactly the
   same as if an anonymous stateid were used.
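
   The access-mode check described above for WRITE-type operations and
   READ can be sketched as follows. This is a minimal illustration
   only: check_access_mode, the lenient_reads flag, the "SETATTR_SIZE"
   label, and the two bit flags are hypothetical names, and the lock
   conflict checks that the server MUST still perform are not
   modelled.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_OPENMODE = "NFS4ERR_OPENMODE"

# Illustrative bit flags for the access mode carried by a stateid.
ACCESS_READ = 1
ACCESS_WRITE = 2


def check_access_mode(op, access, lenient_reads=False):
    """Check an I/O operation against the access mode of its stateid.

    op is "READ", "WRITE", or "SETATTR_SIZE" (a SETATTR that sets size).
    lenient_reads models a server that allows READ on opens for WRITE only.
    """
    if op in ("WRITE", "SETATTR_SIZE"):
        # WRITE-type operations always require write access.
        return NFS4_OK if access & ACCESS_WRITE else NFS4ERR_OPENMODE
    if op == "READ":
        if access & ACCESS_READ or lenient_reads:
            return NFS4_OK
        return NFS4ERR_OPENMODE
    raise ValueError("not an I/O operation: " + op)
```

   A server built this way with lenient_reads enabled would still have
   to check for conflicting share reservations on READ, since the open
   no longer guarantees their absence.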

   A lock may not be granted while a READ or WRITE operation using one
   of the special stateids is being performed and the range of the lock
   request conflicts with the range of the READ or WRITE operation. For
   the purposes of this paragraph, a conflict occurs when a shared lock
   is requested and a WRITE operation is being performed, or an
   exclusive lock is requested and either a READ or a WRITE operation is
   being performed. A SETATTR that sets size is treated similarly to a
   WRITE as discussed above.

9.1.7.  Sequencing of Lock Requests

   Locking is different from most NFS operations as it requires
   "at-most-one" semantics that are not provided by ONC RPC. ONC RPC
   over a reliable transport is not sufficient because a sequence of
   locking requests may span multiple TCP connections. In the face of
   retransmission or reordering, lock or unlock requests must have a
   well-defined and consistent behavior. To accomplish this, each lock
   request contains a sequence number that is a consecutively increasing
   integer. Different state-owners have different sequences. The
   server maintains the last sequence number (L) received and the
   response that was returned. The server SHOULD assign a seqid value
   of one for the first request issued for any given state-owner.
   Subsequent values are arrived at by incrementing the seqid value,
   subject to wraparound as described in Section 9.1.3.

   Note that for requests that contain a sequence number, for each
   state-owner, there should be no more than one outstanding request.

   When a request is received, its sequence number (r) is compared to
   that of the last one received (L). Only if it has the correct next
   sequence, normally L + 1, is the request processed beyond the point
   of seqid checking. Given a properly-functioning client, the response
   to (r) must have been received before the last request (L) was sent.
   If a duplicate of the last request (r == L) is received, the stored
   response is returned. If the sequence value received is any other
   value, it is rejected with the return of error NFS4ERR_BAD_SEQID.
   Sequence history is reinitialized whenever the SETCLIENTID/
   SETCLIENTID_CONFIRM sequence changes the client verifier.

   It is critical that the server maintain the last response sent to the
   client to provide a more reliable cache of duplicate non-idempotent
   requests than that of the traditional cache described in [Chet]. The
   traditional duplicate request cache uses a least recently used
   algorithm for removing unneeded requests. However, the last lock
   request and response on a given state-owner must be cached as long as
   the lock state exists on the server.

   The client MUST advance the sequence number for the CLOSE, LOCK,
   LOCKU, OPEN, OPEN_CONFIRM, and OPEN_DOWNGRADE operations. This is
   true even in the event that the previous operation that used the
   sequence number received an error. The only exception to this rule
   is if the previous operation received one of the following errors:
   NFS4ERR_STALE_CLIENTID, NFS4ERR_STALE_STATEID, NFS4ERR_BAD_STATEID,
   NFS4ERR_BAD_SEQID, NFS4ERR_BADXDR, NFS4ERR_RESOURCE,
   NFS4ERR_NOFILEHANDLE, or NFS4ERR_MOVED.

9.1.8.  Recovery from Replayed Requests

   As described above, the sequence number is per state-owner. As long
   as the server maintains the last sequence number received and follows
   the methods described above, there are no risks of a Byzantine router
   re-sending old requests. The server need only maintain the
   (state-owner, sequence number) state as long as there are open files
   or closed files with locks outstanding.
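
   The per-state-owner seqid check of Section 9.1.7, on which the
   replay protection above relies, can be sketched as follows. The
   StateOwner record and the process helper are hypothetical, and the
   wraparound rule shown (a 32-bit counter that skips zero) is an
   assumption made for this example; Section 9.1.3 is the authoritative
   description.

```python
from dataclasses import dataclass
from typing import Any, Callable

NFS4ERR_BAD_SEQID = "NFS4ERR_BAD_SEQID"
SEQID_MAX = 0xFFFFFFFF  # assumption: seqids are unsigned 32-bit values


def next_seqid(seqid):
    """Increment a seqid; assumed here to wrap to one, skipping zero."""
    return 1 if seqid == SEQID_MAX else seqid + 1


@dataclass
class StateOwner:
    """Hypothetical per-state-owner record kept by the server."""
    last_seqid: int            # L: last sequence number received
    last_response: Any = None  # response returned for request L


def process(owner: StateOwner, seqid: int, execute: Callable[[], Any]):
    """Apply the seqid rules to a request carrying sequence number r."""
    if seqid == next_seqid(owner.last_seqid):
        # Correct next sequence: process the request and cache the response.
        response = execute()
        owner.last_seqid = seqid
        owner.last_response = response
        return response
    if seqid == owner.last_seqid:
        # Duplicate of the last request: return the stored response unexecuted.
        return owner.last_response
    return NFS4ERR_BAD_SEQID
```

   Because the cached response lives in the state-owner record rather
   than in a least-recently-used cache, it remains available for as
   long as the record itself exists.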

   LOCK, LOCKU, OPEN, OPEN_DOWNGRADE, and CLOSE each contain a sequence
   number, and therefore the risk of the replay of these operations
   resulting in undesired effects is non-existent while the server
   maintains the state-owner state.

9.1.9.  Interactions of Multiple Sequence Values

   Some operations may have multiple sources of data for request
   sequence checking and retransmission determination. Some operations
   have multiple sequence values associated with multiple types of
   state-owners. In addition, such operations may also have a stateid
   with its own seqid value that will be checked for validity.

   As noted above, there may be multiple sequence values to check. The
   following rules should be followed by the server in processing these
   multiple sequence values within a single operation.

   o  When a sequence value associated with a state-owner is unavailable
      for checking because the state-owner is unknown to the server, it
      takes no part in the comparison.

   o  When any of the state-owner sequence values are invalid,
      NFS4ERR_BAD_SEQID is returned. When a stateid sequence is
      checked, NFS4ERR_BAD_STATEID or NFS4ERR_OLD_STATEID is returned
      as appropriate, but NFS4ERR_BAD_SEQID has priority.

   o  When any one of the sequence values matches a previous request,
      for a state-owner, it is treated as a retransmission and not
      re-executed. When the type of the operation does not match that
      originally used, NFS4ERR_BAD_SEQID is returned. When the server
      can determine that the request differs from the original, it may
      return NFS4ERR_BAD_SEQID.

   o  When multiple sequence values match previous operations, but the
      operations are not the same, NFS4ERR_BAD_SEQID is returned.

   o  When there are no sequence values available for comparison and
      the operation is an OPEN, the server indicates to the client that
      an OPEN_CONFIRM is required, unless it can conclusively determine
      that confirmation is not required (e.g., by knowing that no
      open-owner state has ever been released for the current
      clientid).

9.1.10.  Releasing state-owner State

   When a particular state-owner no longer holds open or file locking
   state at the server, the server may choose to release the sequence
   number state associated with the state-owner. The server may make
   this choice based on lease expiration, for the reclamation of server
   memory, or other implementation-specific details. Note that when
   this is done, a retransmitted request, normally identified by a
   matching state-owner sequence, may not be correctly recognized, so
   that the client will not receive the original response that it would
   have if the state-owner state had not been released.

   If the server were able to be sure that a given state-owner would
   never again be used by a client, such an issue could not arise. Even
   when the state-owner state is released and the client subsequently
   uses that state-owner, retransmitted requests will be detected as
   invalid and the request not executed, although the client may have a
   recovery path that is more complicated than simply getting the
   original response back transparently.

   In any event, the server is able to safely release state-owner state
   (in the sense that retransmitted requests will not be erroneously
   acted upon) when the state-owner is not currently being utilized by
   the client (i.e., there are no open files associated with an
   open-owner and no lock stateids associated with a lock-owner). The
   server may choose to hold the state-owner state in order to simplify
   the recovery path, in the case in which retransmissions of currently
   active requests are received. However, the period for which it
   chooses to hold this state is implementation-specific.

   In the case that a LOCK, LOCKU, OPEN_DOWNGRADE, or CLOSE is
   retransmitted after the server has previously released the
   state-owner state, the server will find that the state-owner has no
   files open and an error will be returned to the client. If the
   state-owner does have a file open, the stateid will not match and
   again an error is returned to the client.

9.1.11.  Use of Open Confirmation

   In the case that an OPEN is retransmitted and the open-owner is being
   used for the first time or the open-owner state has been previously
   released by the server, the use of the OPEN_CONFIRM operation will
   prevent incorrect behavior. When the server observes the use of the
   open-owner for the first time, it will direct the client to perform
   the OPEN_CONFIRM for the corresponding OPEN. This sequence
   establishes the use of an open-owner and associated sequence number.
   Since the OPEN_CONFIRM sequence connects a new open-owner on the
   server with an existing open-owner on a client, the sequence number
   may have any valid (i.e., non-zero) value. The OPEN_CONFIRM step
   assures the server that the value received is the correct one. (See
   Section 15.20 for further details.)

   There are a number of situations in which the requirement to confirm
   an OPEN would pose difficulties for the client and server, in that
   they would be prevented from acting in a timely fashion on
   information received, because that information would be provisional,
   subject to deletion upon non-confirmation. Fortunately, these are
   situations in which the server can avoid the need for confirmation
   when responding to open requests. The two constraints are:

   o  The server must not bestow a delegation for any open that would
      require confirmation.

   o  The server MUST NOT require confirmation on a reclaim-type open
      (i.e., one specifying claim type CLAIM_PREVIOUS or
      CLAIM_DELEGATE_PREV).

   These constraints are related in that reclaim-type opens are the only
   ones in which the server may be required to send a delegation. For
   CLAIM_NULL, sending the delegation is optional, while for
   CLAIM_DELEGATE_CUR, no delegation is sent.

   Delegations being sent with an open requiring confirmation are
   troublesome because recovering from non-confirmation adds undue
   complexity to the protocol, while requiring confirmation on
   reclaim-type opens poses difficulties in that the inability to
   resolve the status of the reclaim until lease expiration may make it
   difficult to have timely determination of the set of locks being
   reclaimed (since the grace period may expire).

   Requiring open confirmation on reclaim-type opens is avoidable
   because of the nature of the environments in which such opens are
   done. For CLAIM_PREVIOUS opens, this is immediately after server
   reboot, so there should be no time for open-owners to be created,
   found to be unused, and recycled. For CLAIM_DELEGATE_PREV opens, we
   are dealing with either a client reboot situation or a network
   partition resulting in deletion of lease state (and returning
   NFS4ERR_EXPIRED). A server that supports delegations can be sure
   that no open-owners for that client have been recycled since client
   initialization or deletion of lease state and thus can be confident
   that confirmation will not be required.

9.2.  Lock Ranges

   The protocol allows a lock-owner to request a lock with a byte range
   and then either upgrade or unlock a sub-range of the initial lock.
   It is expected that this will be an uncommon type of request. In any
   case, servers or server file systems may not be able to support
   sub-range lock semantics. In the event that a server receives a
   locking request that represents a sub-range of current locking state
   for the lock-owner, the server is allowed to return the error
   NFS4ERR_LOCK_RANGE to signify that it does not support sub-range lock
   operations. Therefore, the client should be prepared to receive this
   error and, if appropriate, report the error to the requesting
   application.

   The client is discouraged from combining multiple independent locking
   ranges that happen to be adjacent into a single request, since the
   server may not support sub-range requests and for reasons related to
   the recovery of file locking state in the event of server failure.
   As discussed in Section 9.6.2 below, the server may employ certain
   optimizations during recovery that work effectively only when the
   client's behavior during lock recovery is similar to the client's
   locking behavior prior to server failure.

9.3.  Upgrading and Downgrading Locks

   If a client has a write lock on a record, it can request an atomic
   downgrade of the lock to a read lock via the LOCK request, by setting
   the type to READ_LT. If the server supports atomic downgrade, the
   request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP.
   The client should be prepared to receive this error and, if
   appropriate, report the error to the requesting application.

   If a client has a read lock on a record, it can request an atomic
   upgrade of the lock to a write lock via the LOCK request by setting
   the type to WRITE_LT or WRITEW_LT. If the server does not support
   atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade
   can be achieved without an existing conflict, the request will
   succeed. Otherwise, the server will return either NFS4ERR_DENIED or
   NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the
   client issued the LOCK request with the type set to WRITEW_LT and the
   server has detected a deadlock. The client should be prepared to
   receive such errors and, if appropriate, report the error to the
   requesting application.

9.4.  Blocking Locks

   Some clients require the support of blocking locks. The NFSv4
   protocol must not rely on a callback mechanism and therefore is
   unable to notify a client when a previously denied lock has been
   granted. Clients have no choice but to continually poll for the
   lock. This presents a fairness problem. Two new lock types are
   added, READW and WRITEW, and are used to indicate to the server that
   the client is requesting a blocking lock. The server should maintain
   an ordered list of pending blocking locks. When the conflicting lock
   is released, the server may wait the lease period for the first
   waiting client to re-request the lock. After the lease period
   expires, the next waiting client request is allowed the lock.
   Clients are required to poll at an interval sufficiently small that
   they are likely to acquire the lock in a timely manner. The server
   is not required to maintain a list of pending blocked locks, as it is
   not used to provide correct operation but only to increase fairness.
   Because of the unordered nature of crash recovery, storing of lock
   state to stable storage would be required to guarantee ordered
   granting of blocking locks.
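
   One possible reading of the ordered list of pending blocking locks
   is sketched below. It is an interpretation, not a required
   implementation: the BlockingLockQueue class and its method names are
   hypothetical, a single contended lock is assumed, and each waiter at
   the head of the list is given one lease period to re-request the
   lock before the next waiter becomes eligible.

```python
from collections import deque


class BlockingLockQueue:
    """Hypothetical fairness queue for one contended byte-range lock."""

    def __init__(self, lease_period):
        self.lease_period = lease_period
        self.waiters = deque()    # owners in order of their first blocking request
        self.window_start = None  # when the current head's waiting window began

    def note_denied(self, owner):
        """Record a denied blocking (READW/WRITEW) request."""
        if owner not in self.waiters:
            self.waiters.append(owner)

    def note_released(self, now):
        """Record that the conflicting lock was released at time `now`."""
        self.window_start = now

    def may_grant(self, owner, now):
        """Decide whether `owner` may take the lock, which is free at `now`."""
        # Skip waiters that did not re-request within their lease period.
        while self.waiters and now - self.window_start >= self.lease_period:
            self.waiters.popleft()
            self.window_start += self.lease_period
        if self.waiters and self.waiters[0] != owner:
            return False  # an earlier waiter still has priority
        if self.waiters:
            self.waiters.popleft()
        return True
```

   Since the list only improves fairness, a server could discard it at
   any time without affecting correctness.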
5354 Servers may also note the lock types and delay returning denial of 5355 the request to allow extra time for a conflicting lock to be 5356 released, allowing a successful return. In this way, clients can 5357 avoid the burden of needlessly frequent polling for blocking locks. 5358 The server should take care in the length of delay in the event the 5359 client retransmits the request. 5361 If a server receives a blocking lock request, denies it, and then 5362 later receives a nonblocking request for the same lock, which is also 5363 denied, then it should remove the lock in question from its list of 5364 pending blocking locks. Clients should use such a nonblocking 5365 request to indicate to the server that this is the last time they 5366 intend to poll for the lock, as may happen when the process 5367 requesting the lock is interrupted. This is a courtesy to the 5368 server, to prevent it from unnecessarily waiting a lease period 5369 before granting other lock requests. However, clients are not 5370 required to perform this courtesy, and servers must not depend on 5371 them doing so. Also, clients must be prepared for the possibility 5372 that this final locking request will be accepted. 5374 9.5. Lease Renewal 5376 The purpose of a lease is to allow a server to remove stale locks 5377 that are held by a client that has crashed or is otherwise 5378 unreachable. It is not a mechanism for cache consistency and lease 5379 renewals may not be denied if the lease interval has not expired. 5381 The client can implicitly provide a positive indication that it is 5382 still active and that the associated state held at the server, for 5383 the client, is still valid. 
Any operation made with a valid clientid 5384 (DELEGPURGE, LOCK, LOCKT, OPEN, RELEASE_LOCKOWNER, or RENEW) or a 5385 valid stateid (CLOSE, DELEGRETURN, LOCK, LOCKU, OPEN, OPEN_CONFIRM, 5386 OPEN_DOWNGRADE, READ, SETATTR, or WRITE) informs the server to renew 5387 all of the leases for that client (i.e., all those sharing a given 5388 client ID). In the latter case, the stateid must not be one of the 5389 special stateids (anonymous stateid or READ bypass stateid). 5391 Note that if the client had restarted or rebooted, the client would 5392 not be making these requests without issuing the SETCLIENTID/ 5393 SETCLIENTID_CONFIRM sequence. The use of the SETCLIENTID/ 5394 SETCLIENTID_CONFIRM sequence (one that changes the client verifier) 5395 notifies the server to drop the locking state associated with the 5396 client. SETCLIENTID/SETCLIENTID_CONFIRM never renews a lease. 5398 If the server has rebooted, the stateids (NFS4ERR_STALE_STATEID 5399 error) or the client ID (NFS4ERR_STALE_CLIENTID error) will not be 5400 valid hence preventing spurious renewals. 5402 This approach allows for low overhead lease renewal which scales 5403 well. In the typical case no extra RPC calls are required for lease 5404 renewal and in the worst case one RPC is required every lease period 5405 (i.e., a RENEW operation). The number of locks held by the client is 5406 not a factor since all state for the client is involved with the 5407 lease renewal action. 5409 Since all operations that create a new lease also renew existing 5410 leases, the server must maintain a common lease expiration time for 5411 all valid leases for a given client. This lease time can then be 5412 easily updated upon implicit lease renewal actions. 5414 9.6. Crash Recovery 5416 The important requirement in crash recovery is that both the client 5417 and the server know when the other has failed. Additionally, it is 5418 required that a client sees a consistent view of data across server 5419 restarts or reboots. 
All READ and WRITE operations that may have 5420 been queued within the client or network buffers must wait until the 5421 client has successfully recovered the locks protecting the READ and 5422 WRITE operations. 5424 9.6.1. Client Failure and Recovery 5426 In the event that a client fails, the server may recover the client's 5427 locks when the associated leases have expired. Conflicting locks 5428 from another client may only be granted after this lease expiration. 5429 If the client is able to restart or reinitialize within the lease 5430 period the client may be forced to wait the remainder of the lease 5431 period before obtaining new locks. 5433 To minimize client delay upon restart, open and lock requests are 5434 associated with an instance of the client by a client supplied 5435 verifier. This verifier is part of the initial SETCLIENTID call made 5436 by the client. The server returns a client ID as a result of the 5437 SETCLIENTID operation. The client then confirms the use of the 5438 client ID with SETCLIENTID_CONFIRM. The client ID in combination 5439 with an opaque owner field is then used by the client to identify the 5440 open-owner for OPEN. This chain of associations is then used to 5441 identify all locks for a particular client. 5443 Since the verifier will be changed by the client upon each 5444 initialization, the server can compare a new verifier to the verifier 5445 associated with currently held locks and determine that they do not 5446 match. This signifies the client's new instantiation and subsequent 5447 loss of locking state. As a result, the server is free to release 5448 all locks held which are associated with the old client ID which was 5449 derived from the old verifier. 5451 Note that the verifier must have the same uniqueness properties of 5452 the verifier for the COMMIT operation. 5454 9.6.2. 
Server Failure and Recovery

If the server loses locking state (usually as a result of a restart or reboot), it must allow clients time to discover this fact and re-establish the lost locking state. The client must be able to re-establish the locking state without having the server deny valid requests because the server has granted conflicting access to another client. Likewise, if there is the possibility that clients have not yet re-established their locking state for a file, the server must disallow READ and WRITE operations for that file. The duration of this recovery period is equal to the duration of the lease period.

A client can determine that server failure (and thus loss of locking state) has occurred when it receives one of two errors. The NFS4ERR_STALE_STATEID error indicates a stateid invalidated by a reboot or restart. The NFS4ERR_STALE_CLIENTID error indicates a client ID invalidated by reboot or restart. When either of these is received, the client must establish a new client ID (see Section 9.1.1) and re-establish the locking state as discussed below.

The period of special handling of locking and READs and WRITEs, equal in duration to the lease period, is referred to as the "grace period". During the grace period, clients recover locks and the associated state by reclaim-type locking requests (i.e., LOCK requests with reclaim set to true and OPEN operations with a claim type of either CLAIM_PREVIOUS or CLAIM_DELEGATE_PREV). During the grace period, the server must reject READ and WRITE operations and non-reclaim locking requests (i.e., other LOCK and OPEN operations) with an error of NFS4ERR_GRACE.

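
As an illustration only, the grace-period rule just described can be sketched as a simple gate applied before normal request processing. The names Request, grace_gate, restart_time, and grace_period are hypothetical and do not appear in this specification; error codes are represented as strings rather than protocol values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    op: str                # "READ", "WRITE", "LOCK", or "OPEN"
    reclaim: bool = False  # True for LOCK with reclaim set, or OPEN with
                           # CLAIM_PREVIOUS / CLAIM_DELEGATE_PREV


def grace_gate(req: Request, now: float, restart_time: float,
               grace_period: float) -> Optional[str]:
    """Return "NFS4ERR_GRACE" if req must be rejected, else None.

    Assumption: this models the simplest server, one that rejects every
    READ, WRITE, and non-reclaim locking request during the grace period.
    """
    in_grace = restart_time <= now < restart_time + grace_period
    if in_grace and not req.reclaim:
        return "NFS4ERR_GRACE"
    return None  # the request continues to normal processing
```

A return value of None only means that the grace-period gate does not reject the request; the usual lock and access checks still apply afterwards.
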
If the server can reliably determine that granting a non-reclaim request will not conflict with reclamation of locks by other clients, the NFS4ERR_GRACE error does not have to be returned and the non-reclaim client request can be serviced. For the server to be able to service READ and WRITE operations during the grace period, it must again be able to guarantee that no possible conflict could arise between an impending reclaim locking request and the READ or WRITE operation. If the server is unable to offer that guarantee, the NFS4ERR_GRACE error must be returned to the client.

For a server to provide simple, valid handling during the grace period, the easiest method is to simply reject all non-reclaim locking requests and READ and WRITE operations by returning the NFS4ERR_GRACE error. However, a server may keep information about granted locks in stable storage. With this information, the server could determine if a regular lock or READ or WRITE operation can be safely processed.

For example, if a count of locks on a given file is available in stable storage, the server can track reclaimed locks for the file and when all reclaims have been processed, non-reclaim locking requests may be processed. This way the server can ensure that non-reclaim locking requests will not conflict with potential reclaim requests. With respect to I/O requests, if the server is able to determine that there are no outstanding reclaim requests for a file by information from stable storage or another similar mechanism, the processing of I/O requests could proceed normally for the file.

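
The lock-count example above might be sketched as follows. This is a minimal illustration, not a prescribed implementation: the class ReclaimTracker and its method names are hypothetical, and it assumes that the per-file counts read from stable storage are complete and accurate.

```python
class ReclaimTracker:
    """Tracks, per file, how many pre-restart locks remain unreclaimed."""

    def __init__(self, persisted_lock_counts):
        # file id -> number of locks recorded in stable storage
        # before the restart (assumed complete and accurate)
        self.outstanding = dict(persisted_lock_counts)

    def note_reclaim(self, file_id):
        """Record one successful reclaim for file_id."""
        remaining = self.outstanding.get(file_id, 0)
        if remaining > 0:
            self.outstanding[file_id] = remaining - 1

    def non_reclaim_allowed(self, file_id):
        """True once every recorded lock on file_id has been reclaimed."""
        return self.outstanding.get(file_id, 0) == 0
```

With such a tracker, a server would return NFS4ERR_GRACE for a non-reclaim request on a file only while non_reclaim_allowed() is false for that file.
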
To reiterate, for a server that allows non-reclaim lock and I/O requests to be processed during the grace period, it MUST determine that no lock subsequently reclaimed will be rejected and that no lock subsequently reclaimed would have prevented any I/O operation processed during the grace period.

Clients should be prepared for the return of NFS4ERR_GRACE errors for non-reclaim lock and I/O requests. In this case, the client should employ a retry mechanism for the request. A delay (on the order of several seconds) between retries should be used to avoid overwhelming the server. Further discussion of the general issue is included in [Floyd]. The client must account for the server that is able to perform I/O and non-reclaim locking requests within the grace period as well as those that cannot do so.

A reclaim-type locking request outside the server's grace period can only succeed if the server can guarantee that no conflicting lock or I/O request has been granted since reboot or restart.

A server may, upon restart, establish a new value for the lease period. Therefore, clients should, once a new client ID is established, refetch the lease_time attribute and use it as the basis for lease renewal for the lease associated with that server. However, the server must establish, for this restart event, a grace period at least as long as the lease period for the previous server instantiation. This allows the client state obtained during the previous server instance to be reliably re-established.

9.6.3.  Network Partitions and Recovery

If the duration of a network partition is greater than the lease period provided by the server, the server will not have received a lease renewal from the client. If this occurs, the server may cancel the lease and free all locks held for the client.
As a result, all stateids held by the client will become invalid or stale. Once the client is able to reach the server after such a network partition, all I/O submitted by the client with the now invalid stateids will fail with the server returning the error NFS4ERR_EXPIRED. Once this error is received, the client will suitably notify the application that held the lock.

9.6.3.1.  Courtesy Locks

As a courtesy to the client or as an optimization, the server may continue to hold locks, including delegations, on behalf of a client for which recent communication has extended beyond the lease period, delaying the cancellation of the lease. If the server receives a lock or I/O request that conflicts with one of these courtesy locks or if it runs out of resources, the server MAY cause lease cancellation to occur at that time and henceforth return NFS4ERR_EXPIRED when any of the stateids associated with the freed locks is used. If lease cancellation has not occurred and the server receives a lock or I/O request that conflicts with one of the courtesy locks, the requirements are as follows:

o  In the case of a courtesy lock which is not a delegation, it MUST free the courtesy lock and grant the new request.

o  In the case of a lock or I/O request which conflicts with a delegation which is being held as a courtesy lock, the server MAY delay resolution of the request but MUST NOT reject the request and MUST free the delegation and grant the new request eventually.

o  In the case of a request for a delegation which conflicts with a delegation which is being held as a courtesy lock, the server MAY grant the new request or not as it chooses, but if it grants the conflicting request, the delegation held as a courtesy lock MUST be freed.

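
The three requirements above can be read as a small decision table. The following sketch encodes that table only; the Action enumeration and the function name are hypothetical, and the code makes no attempt to model how a server actually frees state or delays a request.

```python
from enum import Enum


class Action(Enum):
    FREE_AND_GRANT = "free the courtesy lock and grant the request"
    FREE_AND_GRANT_EVENTUALLY = ("may delay, must not reject; free the "
                                 "delegation and grant eventually")
    SERVER_CHOICE = ("may grant or not; if granted, the courtesy "
                     "delegation must be freed")


def courtesy_conflict_action(courtesy_is_delegation: bool,
                             request_is_delegation: bool) -> Action:
    """Map a conflict with a courtesy lock to the required handling."""
    if not courtesy_is_delegation:
        # First bullet: non-delegation courtesy lock.
        return Action.FREE_AND_GRANT
    if not request_is_delegation:
        # Second bullet: lock or I/O request vs. courtesy delegation.
        return Action.FREE_AND_GRANT_EVENTUALLY
    # Third bullet: delegation request vs. courtesy delegation.
    return Action.SERVER_CHOICE
```
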
If the server does not reboot or cancel the lease before the network partition is healed, when the original client tries to access a courtesy lock which was freed, the server SHOULD send back an NFS4ERR_BAD_STATEID to the client. If the client tries to access a courtesy lock which was not freed, then the server SHOULD mark all of the courtesy locks as implicitly being renewed.

9.6.3.2.  Lease Cancellation

As a result of lease expiration, leases may be canceled, either immediately upon expiration or subsequently, depending on the occurrence of a conflicting lock or extension of the period of partition beyond what the server will tolerate.

When a lease is canceled, all locking state associated with it is freed and use of any of the associated stateids will result in NFS4ERR_EXPIRED being returned. Similarly, use of the associated clientid will result in NFS4ERR_EXPIRED being returned.

The client should recover from this situation by using SETCLIENTID followed by SETCLIENTID_CONFIRM, in order to establish a new clientid. Once a lock is obtained using this clientid, a lease will be established.

9.6.3.3.  Client's Reaction to a Freed Lock

There is no way for a client to predetermine how a given server is going to behave during a network partition. When the partition heals, either the client still has all of its locks, it has some of its locks, or it has none of them. The client will be able to examine the various error return values to determine its response.

NFS4ERR_EXPIRED:

   All locks have been freed as a result of a lease cancellation which occurred during the partition. The client should use a SETCLIENTID to recover.

NFS4ERR_ADMIN_REVOKED:

   The current lock has been revoked before, during, or after the partition. The client SHOULD handle this error as it normally would.

NFS4ERR_BAD_STATEID:

   The current lock has been revoked/released during the partition and the server did not reboot. Other locks MAY still be renewed. The client need not do a SETCLIENTID and instead SHOULD probe via a RENEW call.

NFS4ERR_RECLAIM_BAD:

   The current lock has been revoked during the partition and the server rebooted. The server might have no information on the other locks. They may still be renewable.

NFS4ERR_NO_GRACE:

   The client's locks have been revoked during the partition and the server rebooted. None of the client's locks will be renewable.

NFS4ERR_OLD_STATEID:

   The server has not rebooted. The client SHOULD handle this error as it normally would.

9.6.3.4.  Edge Conditions

When a network partition is combined with a server reboot, then both the server and client have responsibilities to ensure that the client does not reclaim a lock which it should no longer be able to access. Briefly, those are:

o  Client's responsibility: A client MUST NOT attempt to reclaim any locks which it did not hold at the end of its most recent successfully established client lease.

o  Server's responsibility: A server MUST NOT allow a client to reclaim a lock unless it knows that it could not have since granted a conflicting lock. However, in deciding whether a conflicting lock could have been granted, it is permitted to assume its clients are responsible, as above.

A server may consider a client's lease "successfully established" once it has received an open operation from that client.

The above are directed to CLAIM_PREVIOUS reclaims and not to CLAIM_DELEGATE_PREV reclaims, which generally do not involve a server reboot.
However, when a server persistently stores delegation information to support CLAIM_DELEGATE_PREV across a period in which both client and server are down at the same time, similar strictures apply.

The next sections give examples showing what can go wrong if these responsibilities are neglected, and provide examples of server implementation strategies that could meet a server's responsibilities.

9.6.3.4.1.  First Server Edge Condition

The first edge condition has the following scenario:

1.  Client A acquires a lock.

2.  Client A and server experience mutual network partition, such that client A is unable to renew its lease.

3.  Client A's lease expires, so server releases lock.

4.  Client B acquires a lock that would have conflicted with that of Client A.

5.  Client B releases the lock.

6.  Server reboots.

7.  Network partition between client A and server heals.

8.  Client A issues a RENEW operation, and gets back an NFS4ERR_STALE_CLIENTID.

9.  Client A reclaims its lock within the server's grace period.

Thus, at the final step, the server has erroneously granted client A's lock reclaim. If client B modified the object the lock was protecting, client A will experience object corruption.

9.6.3.4.2.  Second Server Edge Condition

The second known edge condition follows:

1.  Client A acquires a lock.

2.  Server reboots.

3.  Client A and server experience mutual network partition, such that client A is unable to reclaim its lock within the grace period.

4.  Server's reclaim grace period ends. Client A has no locks recorded on server.

5.  Client B acquires a lock that would have conflicted with that of Client A.

6.  Client B releases the lock.

7.  Server reboots a second time.

8.  Network partition between client A and server heals.

9.
Client A issues a RENEW operation, and gets back an NFS4ERR_STALE_CLIENTID.

10. Client A reclaims its lock within the server's grace period.

As with the first edge condition, the final step of the scenario of the second edge condition has the server erroneously granting client A's lock reclaim.

9.6.3.4.3.  Handling Server Edge Conditions

In both of the above examples, the client attempts reclaim of a lock that it held at the end of its most recent successfully established lease; thus, it has fulfilled its responsibility.

The server, however, has failed, by granting a reclaim, despite having granted a conflicting lock since the reclaimed lock was last held.

Solving these edge conditions requires that the server either assume after it reboots that such an edge condition has occurred, and thus return NFS4ERR_NO_GRACE for all reclaim attempts, or that the server record some information in stable storage. The amount of information the server records in stable storage is in inverse proportion to how harsh the server wants to be whenever the edge conditions occur. The server that is completely tolerant of all edge conditions will record in stable storage every lock that is acquired, removing the lock record from stable storage only when the lock is unlocked by the client and the lock's owner advances the sequence number such that the lock release is not the last stateful event for the owner's sequence. For the two aforementioned edge conditions, the harshest a server can be, and still support a grace period for reclaims, requires that the server record in stable storage some minimal information.
For example, a server implementation could, for each client, save in stable storage a record containing:

o  the client's id string

o  a boolean that indicates if the client's lease expired or if there was administrative intervention (see Section 9.8) to revoke a byte-range lock, share reservation, or delegation

o  a timestamp that is updated the first time after a server boot or reboot the client acquires byte-range locking, share reservation, or delegation state on the server. The timestamp need not be updated on subsequent lock requests until the server reboots.

The server implementation would also record in the stable storage the timestamps from the two most recent server reboots.

Assuming the above record keeping, for the first edge condition, after the server reboots, the record that client A's lease expired means that another client could have acquired a conflicting byte-range lock, share reservation, or delegation. Hence the server must reject a reclaim from client A with the error NFS4ERR_NO_GRACE or NFS4ERR_RECLAIM_BAD.

For the second edge condition, after the server reboots for a second time, the record that the client had an unexpired byte-range lock, share reservation, or delegation established before the server's previous incarnation means that the server must reject a reclaim from client A with the error NFS4ERR_NO_GRACE or NFS4ERR_RECLAIM_BAD.

Regardless of the level and approach to record keeping, the server MUST implement one of the following strategies (which apply to reclaims of share reservations, byte-range locks, and delegations):

1.  Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely harsh, but necessary if the server does not want to record lock state in stable storage.

2.  Record sufficient state in stable storage to meet its responsibilities.
If in doubt, the server should err on the side of being harsh.

    In the event that, after a server reboot, the server determines that there is unrecoverable damage or corruption to the stable storage, then for all clients and/or locks affected, the server MUST return NFS4ERR_NO_GRACE.

9.6.3.4.4.  Client Edge Condition

A third edge condition affects the client and not the server. If the server reboots in the middle of the client reclaiming some locks and then a network partition is established, the client might be in the situation of having reclaimed some, but not all locks. In that case, a conservative client would assume that the non-reclaimed locks were revoked.

The third known edge condition follows:

1.  Client A acquires lock 1.

2.  Client A acquires lock 2.

3.  Server reboots.

4.  Client A issues a RENEW operation, and gets back an NFS4ERR_STALE_CLIENTID.

5.  Client A reclaims its lock 1 within the server's grace period.

6.  Client A and server experience mutual network partition, such that client A is unable to reclaim its remaining locks within the grace period.

7.  Server's reclaim grace period ends.

8.  Client B acquires a lock that would have conflicted with Client A's lock 2.

9.  Client B releases the lock.

10. Server reboots a second time.

11. Network partition between client A and server heals.

12. Client A issues a RENEW operation, and gets back an NFS4ERR_STALE_CLIENTID.

13. Client A reclaims both lock 1 and lock 2 within the server's grace period.

At the last step, the client reclaims lock 2 as if it had held that lock continuously, when in fact a conflicting lock was granted to client B.

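
One way a client might keep the bookkeeping needed to avoid this scenario is sketched below. It is an illustration under stated assumptions, not a required design: the class ClientLockState and its methods are hypothetical, and locks are represented simply as integers.

```python
class ClientLockState:
    """Tracks which locks the client holds under the current lease."""

    def __init__(self):
        self.held = set()

    def acquire(self, lock):
        """Record a lock granted by the current server instance."""
        self.held.add(lock)

    def server_restarted(self):
        """Called on NFS4ERR_STALE_CLIENTID.

        Returns the locks eligible for reclaim (those held at the end of
        the most recent lease) and starts with an empty set for the new
        server instance.
        """
        candidates = self.held
        self.held = set()
        return candidates

    def reclaimed(self, lock):
        """Record a lock successfully reclaimed from the new instance."""
        self.held.add(lock)


# Replaying the scenario above:
state = ClientLockState()
state.acquire(1)
state.acquire(2)
first = state.server_restarted()    # both locks are reclaim candidates
state.reclaimed(1)                  # the partition prevents reclaiming lock 2
second = state.server_restarted()   # only lock 1 remains a candidate
```

Because the set of held locks is reset on every server restart and repopulated only by successful reclaims, lock 2 is no longer a reclaim candidate after the second reboot.
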
This occurs because the client failed its responsibility, by attempting to reclaim lock 2 even though it had not held that lock at the end of the lease that was established by the SETCLIENTID after the first server reboot. (The client did hold lock 2 on a previous lease. But it is only the most recent lease that matters.)

A server could avoid this situation by rejecting the reclaim of lock 2. However, to do so accurately it would have to ensure that additional information about individual locks held survives reboot. Server implementations are not required to do that, so the client must not assume that the server will.

Instead, a client MUST reclaim only those locks which it successfully acquired from the previous server instance, omitting any that it failed to reclaim before a new reboot. Thus, in the last step above, client A should reclaim only lock 1.

9.6.3.4.5.  Client's Handling of Reclaim Errors

A mandate for the client's handling of the NFS4ERR_NO_GRACE and NFS4ERR_RECLAIM_BAD errors is outside the scope of this specification, since the strategies for such handling are very dependent on the client's operating environment. However, one potential approach is described below.

When the client's reclaim fails, it could examine the change attribute of the objects the client is trying to reclaim state for, and use that to determine whether to re-establish the state via normal OPEN or LOCK requests. This is acceptable provided the client's operating environment allows it. In other words, the client implementer is advised to document this behavior for users. The client could also inform the application that its byte-range lock or share reservations (whether they were delegated or not) have been lost, such as via a UNIX signal, a GUI pop-up window, etc.
See Section 10.5 for a discussion of what the client should do for dealing with unreclaimed delegations on client state.

For further discussion of revocation of locks see Section 9.8.

9.7.  Recovery from a Lock Request Timeout or Abort

In the event a lock request times out, a client may decide to not retry the request. The client may also abort the request when the process for which it was issued is terminated (e.g., in UNIX due to a signal). It is possible though that the server received the request and acted upon it. This would change the state on the server without the client being aware of the change. It is paramount that the client re-synchronize state with the server before it attempts any other operation that takes a seqid and/or a stateid with the same state-owner. This is straightforward to do without a special re-synchronize operation.

Since the server maintains the last lock request and response received on the state-owner, for each state-owner, the client should cache the last lock request it sent for which it did not receive a response. From this, the next time the client does a lock operation for the state-owner, it can send the cached request, if there is one, and if the request was one that established state (e.g., a LOCK or OPEN operation), the server will return the cached result or, if it never saw the request, perform it. The client can follow up with a request to remove the state (e.g., a LOCKU or CLOSE operation). With this approach, the sequencing and stateid information on the client and server for the given state-owner will re-synchronize and in turn the lock state will re-synchronize.

9.8.  Server Revocation of Locks

At any point, the server can revoke locks held by a client and the client must be prepared for this event.
When the client detects that its locks have been or may have been revoked, the client is responsible for validating the state information between itself and the server. Validating locking state for the client means that it must verify or reclaim state for each lock currently held.

The first instance of lock revocation is upon server reboot or re-initialization. In this instance the client will receive an error (NFS4ERR_STALE_STATEID or NFS4ERR_STALE_CLIENTID) and the client will proceed with normal crash recovery as described in the previous section.

The second lock revocation event is the inability to renew the lease before expiration. While this is considered a rare or unusual event, the client must be prepared to recover. Both the server and client will be able to detect the failure to renew the lease and are capable of recovering without data corruption. The server tracks the last renewal event serviced for the client and knows when the lease will expire. Similarly, the client must track operations which will renew the lease period. Using the time that each such request was sent and the time that the corresponding reply was received, the client should bound the time that the corresponding renewal could have occurred on the server and thus determine if it is possible that a lease period expiration could have occurred.

The third lock revocation event can occur as a result of administrative intervention within the lease period. While this is considered a rare event, it is possible that the server's administrator has decided to release or revoke a particular lock held by the client. As a result of revocation, the client will receive an error of NFS4ERR_ADMIN_REVOKED. In this instance the client may assume that only the state-owner's locks have been lost. The client notifies the lock holder appropriately.
The client cannot assume the lease period has been renewed as a result of a failed operation.

When the client determines the lease period may have expired, the client must mark all locks held for the associated lease as "unvalidated". This means the client has been unable to re-establish or confirm the appropriate lock state with the server. As described in Section 9.6, there are scenarios in which the server may grant conflicting locks after the lease period has expired for a client. When it is possible that the lease period has expired, the client must validate each lock currently held to ensure that a conflicting lock has not been granted. The client may accomplish this task by issuing an I/O request; if there is no relevant I/O pending, a zero-length read specifying the stateid associated with the lock in question can be synthesized to trigger the renewal. If the response to the request is success, the client has validated all of the locks governed by that stateid and re-established the appropriate state between itself and the server.

If the I/O request is not successful, then one or more of the locks associated with the stateid was revoked by the server and the client must notify the owner.

9.9.  Share Reservations

A share reservation is a mechanism to control access to a file. It is a separate and independent mechanism from byte-range locking. When a client opens a file, it issues an OPEN operation to the server specifying the type of access required (READ, WRITE, or BOTH) and the type of access to deny others (OPEN4_SHARE_DENY_NONE, OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, or OPEN4_SHARE_DENY_BOTH). If the OPEN fails, the client will fail the application's open request.

Pseudo-code definition of the semantics:

      if (request.access == 0)
              return (NFS4ERR_INVAL)
      else if ((request.access & file_state.deny) ||
               (request.deny & file_state.access))
              return (NFS4ERR_DENIED)

This checking of share reservations on OPEN is done with no exception for an existing OPEN for the same open-owner.

The constants used for the OPEN and OPEN_DOWNGRADE operations for the access and deny fields are as follows:

      const OPEN4_SHARE_ACCESS_READ   = 0x00000001;
      const OPEN4_SHARE_ACCESS_WRITE  = 0x00000002;
      const OPEN4_SHARE_ACCESS_BOTH   = 0x00000003;

      const OPEN4_SHARE_DENY_NONE     = 0x00000000;
      const OPEN4_SHARE_DENY_READ     = 0x00000001;
      const OPEN4_SHARE_DENY_WRITE    = 0x00000002;
      const OPEN4_SHARE_DENY_BOTH     = 0x00000003;

9.10.  OPEN/CLOSE Operations

To provide correct share semantics, a client MUST use the OPEN operation to obtain the initial filehandle and indicate the desired access and what access, if any, to deny. Even if the client intends to use one of the special stateids (anonymous stateid or READ bypass stateid), it must still obtain the filehandle for the regular file with the OPEN operation so the appropriate share semantics can be applied. Clients that do not have a deny mode built into their programming interfaces for opening a file should request a deny mode of OPEN4_SHARE_DENY_NONE.

The OPEN operation with the CREATE flag also subsumes the CREATE operation for regular files as used in previous versions of the NFS protocol. This allows a create with a share to be done atomically.

The CLOSE operation removes all share reservations held by the open-owner on that file. If byte-range locks are held, the client SHOULD release all locks before issuing a CLOSE.
The server MAY free all outstanding locks on CLOSE but some servers may not support the CLOSE of a file that still has byte-range locks held. The server MUST return failure, NFS4ERR_LOCKS_HELD, if any locks would exist after the CLOSE.

The LOOKUP operation will return a filehandle without establishing any lock state on the server. Without a valid stateid, the server will assume the client has the least access. For example, if one client opened a file with OPEN4_SHARE_DENY_BOTH and another client accesses the file via a filehandle obtained through LOOKUP, the second client could only read the file using the special READ bypass stateid. The second client could not WRITE the file at all because it would not have a valid stateid from OPEN and the special anonymous stateid would not be allowed access.

9.10.1.  Close and Retention of State Information

Since a CLOSE operation requests deallocation of a stateid, dealing with retransmission of the CLOSE may pose special difficulties, since the state information, which normally would be used to determine the state of the open file being designated, might be deallocated, resulting in an NFS4ERR_BAD_STATEID error.

Servers may deal with this problem in a number of ways. To provide the greatest degree of assurance that the protocol is being used properly, a server should, rather than deallocate the stateid, mark it as close-pending, and retain the stateid with this status, until later deallocation. In this way, a retransmitted CLOSE can be recognized since the stateid points to state information with this distinctive status, so that it can be handled without error.

When adopting this strategy, a server should retain the state information until the earliest of:

o  Another validly sequenced request for the same open-owner, that is not a retransmission.

o  The time that an open-owner is freed by the server due to a period with no activity.

o  All locks for the client are freed as a result of a SETCLIENTID.

Servers may avoid this complexity, at the cost of less complete protocol error checking, by simply responding NFS4_OK in the event of a CLOSE for a deallocated stateid, on the assumption that this case must be caused by a retransmitted close. When adopting this approach, it is desirable to at least log an error when returning a no-error indication in this situation. If the server maintains a reply-cache mechanism, it can verify the CLOSE is indeed a retransmission and avoid error logging in most cases.

9.11.  Open Upgrade and Downgrade

When an OPEN is done for a file and the open-owner for which the open is being done already has the file open, the result is to upgrade the open file status maintained on the server to include the access and deny bits specified by the new OPEN as well as those for the existing OPEN. The result is that there is one open file, as far as the protocol is concerned, and it includes the union of the access and deny bits for all of the OPEN requests completed. Only a single CLOSE will be done to reset the effects of both OPENs. Note that the client, when issuing the OPEN, may not know that the same file is in fact being opened. The above only applies if both OPENs result in the OPENed object being designated by the same filehandle.

When the server chooses to export multiple filehandles corresponding to the same file object and returns different filehandles on two different OPENs of the same file object, the server MUST NOT "OR" together the access and deny bits and coalesce the two open files. Instead the server must maintain separate OPENs with separate stateids and will require separate CLOSEs to free them.

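
The upgrade rule can be illustrated with a small sketch that keys open state by (open-owner, filehandle), so that OPENs returning different filehandles are never coalesced. The table layout and the function name record_open are hypothetical; the share reservation conflict check against other open-owners is deliberately omitted.

```python
OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_DENY_NONE = 0x00000000
OPEN4_SHARE_DENY_WRITE = 0x00000002


def record_open(table, open_owner, filehandle, access, deny):
    """Record a successful OPEN, upgrading an existing open if present.

    table maps (open_owner, filehandle) -> [access_bits, deny_bits].
    A second OPEN by the same open-owner on the same filehandle ORs
    its bits into the existing entry; any other combination gets a
    separate entry (and would need its own stateid and CLOSE).
    """
    key = (open_owner, filehandle)
    state = table.get(key)
    if state is None:
        table[key] = [access, deny]
    else:
        state[0] |= access
        state[1] |= deny
    return table[key]


opens = {}
record_open(opens, "owner-1", "fh-a",
            OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_DENY_NONE)
record_open(opens, "owner-1", "fh-a",
            OPEN4_SHARE_ACCESS_WRITE, OPEN4_SHARE_DENY_WRITE)
# Same file object reached through a different filehandle: not merged.
record_open(opens, "owner-1", "fh-b",
            OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_DENY_NONE)
```

After these calls, the entry for ("owner-1", "fh-a") carries the union of both OPENs, while ("owner-1", "fh-b") remains a separate open.
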
When multiple open files on the client are merged into a single open file object on the server, the close of one of the open files (on the client) may necessitate change of the access and deny status of the open file on the server. This is because the union of the access and deny bits for the remaining opens may be smaller (i.e., a proper subset) than previously. The OPEN_DOWNGRADE operation is used to make the necessary change and the client should use it to update the server so that share reservation requests by other clients are handled properly. The stateid returned has the same "other" field as that passed to the server. The "seqid" value in the returned stateid MUST be incremented (Section 9.1.4), even in situations in which there has been no change to the access and deny bits for the file.

9.12.  Short and Long Leases

When determining the time period for the server lease, the usual lease tradeoffs apply. Short leases are good for fast server recovery at a cost of increased RENEW or READ (with zero length) requests. Longer leases are certainly kinder and gentler to servers trying to handle very large numbers of clients. The number of RENEW requests drops in proportion to the lease time. The disadvantages of long leases are slower recovery after server failure (the server must wait for the leases to expire and the grace period to elapse before granting new lock requests) and increased file contention (if a client fails to transmit an unlock request, then the server must wait for lease expiration before granting new locks).

Long leases are usable if the server is able to store lease state in non-volatile memory. Upon recovery, the server can reconstruct the lease state from its non-volatile memory and continue operation with its clients and therefore long leases would not be an issue.

9.13.
Clocks, Propagation Delay, and Calculating Lease Expiration 6141 To avoid the need for synchronized clocks, lease times are granted by 6142 the server as a time delta. However, there is a requirement that the 6143 client and server clocks do not drift excessively over the duration 6144 of the lock. There is also the issue of propagation delay across the 6145 network which could easily be several hundred milliseconds as well as 6146 the possibility that requests will be lost and need to be 6147 retransmitted. 6149 To take propagation delay into account, the client should subtract it 6150 from lease times (e.g., if the client estimates the one-way 6151 propagation delay as 200 msec, then it can assume that the lease is 6152 already 200 msec old when it gets it). In addition, it will take 6153 another 200 msec to get a response back to the server. So the client 6154 must send a lock renewal or write data back to the server 400 msec 6155 before the lease would expire. 6157 The server's lease period configuration should take into account the 6158 network distance of the clients that will be accessing the server's 6159 resources. It is expected that the lease period will take into 6160 account the network propagation delays and other network delay 6161 factors for the client population. Since the protocol does not allow 6162 for an automatic method to determine an appropriate lease period, the 6163 server's administrator may have to tune the lease period. 6165 9.14. Migration, Replication and State 6167 When responsibility for handling a given file system is transferred 6168 to a new server (migration) or the client chooses to use an 6169 alternative server (e.g., in response to server unresponsiveness) in 6170 the context of file system replication, the appropriate handling of 6171 state shared between the client and server (i.e., locks, leases, 6172 stateids, and client IDs) is as described below. The handling 6173 differs between migration and replication. 
For related discussion of 6174 file server state and recovery of such, see the sections under 6175 Section 9.6. 6177 If a server replica or a server immigrating a file system agrees to, 6178 or is expected to, accept opaque values from the client that 6179 originated from another server, then servers SHOULD encode the 6180 "opaque" values in network byte order. This way, servers acting as 6181 replicas or immigrating file systems will be able to parse values 6182 like stateids, directory cookies, filehandles, etc. even if their 6183 native byte order is different from other servers cooperating in the 6184 replication and migration of the file system. 6186 9.14.1. Migration and State 6188 In the case of migration, the servers involved in the migration of a 6189 file system SHOULD transfer all server state from the original to the 6190 new server. This must be done in a way that is transparent to the 6191 client. This state transfer will ease the client's transition when a 6192 file system migration occurs. If the servers are successful in 6193 transferring all state, the client will continue to use stateids 6194 assigned by the original server. Therefore the new server must 6195 recognize these stateids as valid. This holds true for the client ID 6196 as well. Since responsibility for an entire file system is 6197 transferred with a migration event, there is no possibility that 6198 conflicts will arise on the new server as a result of the transfer of 6199 locks. 6201 As part of the transfer of information between servers, leases would 6202 be transferred as well. The leases being transferred to the new 6203 server will typically have a different expiration time from those for 6204 the same client, previously on the old server. To maintain the 6205 property that all leases on a given server for a given client expire 6206 at the same time, the server should advance the expiration time to 6207 the later of the leases being transferred or the leases already 6208 present.
This allows the client to maintain lease renewal of both 6209 classes without special effort. 6211 The servers may choose not to transfer the state information upon 6212 migration. However, this choice is discouraged. In this case, when 6213 the client presents state information from the original server (e.g., 6214 in a RENEW op or a READ op of zero length), the client must be 6215 prepared to receive either NFS4ERR_STALE_CLIENTID or 6216 NFS4ERR_STALE_STATEID from the new server. The client should then 6217 recover its state information as it normally would in response to a 6218 server failure. The new server must take care to allow for the 6219 recovery of state information as it would in the event of server 6220 restart. 6222 A client SHOULD re-establish new callback information with the new 6223 server as soon as possible, according to sequences described in 6224 Section 15.35 and Section 15.36. This ensures that server operations 6225 are not blocked by the inability to recall delegations. 6227 9.14.2. Replication and State 6229 Since client switch-over in the case of replication is not under 6230 server control, the handling of state is different. In this case, 6231 leases, stateids and client IDs do not have validity across a 6232 transition from one server to another. The client must re-establish 6233 its locks on the new server. This can be compared to the re- 6234 establishment of locks by means of reclaim-type requests after a 6235 server reboot. The difference is that the server has no provision to 6236 distinguish requests reclaiming locks from those obtaining new locks 6237 or to defer the latter. Thus, a client re-establishing a lock on the 6238 new server (by means of a LOCK or OPEN request), may have the 6239 requests denied due to a conflicting lock. Since replication is 6240 intended for read-only use of file systems, such denial of locks 6241 should not pose large difficulties in practice. 
When an attempt to 6242 re-establish a lock on a new server is denied, the client should 6243 treat the situation as if its original lock had been revoked. 6245 9.14.3. Notification of Migrated Lease 6247 In the case of lease renewal, the client may not be submitting 6248 requests for a file system that has been migrated to another server. 6249 This can occur because of the implicit lease renewal mechanism. The 6250 client renews leases for all file systems when submitting a request 6251 to any one file system at the server. 6253 In order for the client to schedule renewal of leases that may have 6254 been relocated to the new server, the client must find out about 6255 lease relocation before those leases expire. To accomplish this, all 6256 operations which implicitly renew leases for a client (such as OPEN, 6257 CLOSE, READ, WRITE, RENEW, LOCK, and others), will return the error 6258 NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be 6259 renewed has been transferred to a new server. This condition will 6260 continue until the client receives an NFS4ERR_MOVED error and the 6261 server receives the subsequent GETATTR(fs_locations) for an access to 6262 each file system for which a lease has been moved to a new server. 6263 By convention, the compound including the GETATTR(fs_locations) 6264 SHOULD append a RENEW operation to permit the server to identify the 6265 client doing the access. 6267 Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports 6268 file system migration MUST probe all file systems from that server on 6269 which it holds open state. Once the client has successfully probed 6270 all those file systems which are migrated, the server MUST resume 6271 normal handling of stateful requests from that client.
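One way a server might track the NFS4ERR_LEASE_MOVED condition described above is sketched below. This is an illustration under stated assumptions rather than a prescribed design: the LeaseMovedTracker class, its method names, and the use of strings for client IDs, fsids, and status codes are all hypothetical.

```python
class LeaseMovedTracker:
    """Hypothetical per-client record of migrated file systems not yet probed."""

    def __init__(self):
        # client_id -> set of migrated fsids the client has not yet probed
        self.pending = {}

    def migrate(self, fsid, clients_with_state):
        # Record that every client holding state on fsid has a moved lease.
        for client_id in clients_with_state:
            self.pending.setdefault(client_id, set()).add(fsid)

    def lease_renewal_status(self, client_id):
        # Status for an operation that implicitly renews the client's lease.
        if self.pending.get(client_id):
            return "NFS4ERR_LEASE_MOVED"
        return "NFS4_OK"

    def probed(self, client_id, fsid):
        # Called once the client has received NFS4ERR_MOVED for fsid and the
        # server has received the subsequent GETATTR(fs_locations).
        self.pending.get(client_id, set()).discard(fsid)
```

In this sketch the error condition clears for a client only when its set of unprobed file systems becomes empty, which mirrors the requirement that the server resume normal handling once all migrated file systems have been probed.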
6273 In order to support legacy clients that do not handle the 6274 NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after 6275 a wait of at least two lease periods, at which time it will resume 6276 normal handling of stateful requests from all clients. If a client 6277 attempts to access the migrated files, the server MUST reply 6278 NFS4ERR_MOVED. 6280 When the client receives an NFS4ERR_MOVED error, the client can 6281 follow the normal process to obtain the new server information 6282 (through the fs_locations attribute) and perform renewal of those 6283 leases on the new server. If the server has not had state 6284 transferred to it transparently, the client will receive either 6285 NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server, 6286 as described above. The client can then recover state information as 6287 it does in the event of server failure. 6289 9.14.4. Migration and the lease_time Attribute 6291 In order that the client may appropriately manage its leases in the 6292 case of migration, the destination server must establish proper 6293 values for the lease_time attribute. 6295 When state is transferred transparently, that state should include 6296 the correct value of the lease_time attribute. The lease_time 6297 attribute on the destination server must never be less than that on 6298 the source since this would result in premature expiration of leases 6299 granted by the source server. Upon migration in which state is 6300 transferred transparently, the client is under no obligation to re- 6301 fetch the lease_time attribute and may continue to use the value 6302 previously fetched (on the source server). 6304 If state has not been transferred transparently (i.e., the client 6305 sees a real or simulated server reboot), the client should fetch the 6306 value of lease_time on the new (i.e., destination) server, and use it 6307 for subsequent locking requests. 
However the server must respect a 6308 grace period at least as long as the lease_time on the source server, 6309 in order to ensure that clients have ample time to reclaim their 6310 locks before potentially conflicting non-reclaimed locks are granted. 6311 The means by which the new server obtains the value of lease_time on 6312 the old server is left to the server implementations. It is not 6313 specified by the NFS version 4 protocol. 6315 10. Client-Side Caching 6317 Client-side caching of data, of file attributes, and of file names is 6318 essential to providing good performance with the NFS protocol. 6319 Providing distributed cache coherence is a difficult problem and 6320 previous versions of the NFS protocol have not attempted it. 6321 Instead, several NFS client implementation techniques have been used 6322 to reduce the problems that a lack of coherence poses for users. 6323 These techniques have not been clearly defined by earlier protocol 6324 specifications and it is often unclear what is valid or invalid 6325 client behavior. 6327 The NFSv4 protocol uses many techniques similar to those that have 6328 been used in previous protocol versions. The NFSv4 protocol does not 6329 provide distributed cache coherence. However, it defines a more 6330 limited set of caching guarantees to allow locks and share 6331 reservations to be used without destructive interference from client 6332 side caching. 6334 In addition, the NFSv4 protocol introduces a delegation mechanism 6335 which allows many decisions normally made by the server to be made 6336 locally by clients. This mechanism provides efficient support of the 6337 common cases where sharing is infrequent or where sharing is read- 6338 only. 6340 10.1. Performance Challenges for Client-Side Caching 6342 Caching techniques used in previous versions of the NFS protocol have 6343 been successful in providing good performance. 
However, several 6344 scalability challenges can arise when those techniques are used with 6345 very large numbers of clients. This is particularly true when 6346 clients are geographically distributed which classically increases 6347 the latency for cache re-validation requests. 6349 The previous versions of the NFS protocol repeat their file data 6350 cache validation requests at the time the file is opened. This 6351 behavior can have serious performance drawbacks. A common case is 6352 one in which a file is only accessed by a single client. Therefore, 6353 sharing is infrequent. 6355 In this case, repeated reference to the server to find that no 6356 conflicts exist is expensive. A better option with regards to 6357 performance is to allow a client that repeatedly opens a file to do 6358 so without reference to the server. This is done until potentially 6359 conflicting operations from another client actually occur. 6361 A similar situation arises in connection with file locking. Sending 6362 file lock and unlock requests to the server as well as the read and 6363 write requests necessary to make data caching consistent with the 6364 locking semantics (see Section 10.3.2) can severely limit 6365 performance. When locking is used to provide protection against 6366 infrequent conflicts, a large penalty is incurred. This penalty may 6367 discourage the use of file locking by applications. 6369 The NFSv4 protocol provides more aggressive caching strategies with 6370 the following design goals: 6372 o Compatibility with a large range of server semantics. 6374 o Provide the same caching benefits as previous versions of the NFS 6375 protocol when unable to provide the more aggressive model. 6377 o Requirements for aggressive caching are organized so that a large 6378 portion of the benefit can be obtained even when not all of the 6379 requirements can be met. 
6381 The appropriate requirements for the server are discussed in later 6382 sections in which specific forms of caching are covered (see 6383 Section 10.4). 6385 10.2. Delegation and Callbacks 6387 Recallable delegation of server responsibilities for a file to a 6388 client improves performance by avoiding repeated requests to the 6389 server in the absence of inter-client conflict. With the use of a 6390 "callback" RPC from server to client, a server recalls delegated 6391 responsibilities when another client engages in sharing of a 6392 delegated file. 6394 A delegation is passed from the server to the client, specifying the 6395 object of the delegation and the type of delegation. There are 6396 different types of delegations but each type contains a stateid to be 6397 used to represent the delegation when performing operations that 6398 depend on the delegation. This stateid is similar to those 6399 associated with locks and share reservations but differs in that the 6400 stateid for a delegation is associated with a client ID and may be 6401 used on behalf of all the open-owners for the given client. A 6402 delegation is made to the client as a whole and not to any specific 6403 process or thread of control within it. 6405 Because callback RPCs may not work in all environments (due to 6406 firewalls, for example), correct protocol operation does not depend 6407 on them. Preliminary testing of callback functionality by means of a 6408 CB_NULL procedure determines whether callbacks can be supported. The 6409 CB_NULL procedure checks the continuity of the callback path. A 6410 server makes a preliminary assessment of callback availability to a 6411 given client and avoids delegating responsibilities until it has 6412 determined that callbacks are supported. 
Because the granting of a 6413 delegation is always conditional upon the absence of conflicting 6414 access, clients must not assume that a delegation will be granted and 6415 they must always be prepared for OPENs to be processed without any 6416 delegations being granted. 6418 Once granted, a delegation behaves in most ways like a lock. There 6419 is an associated lease that is subject to renewal together with all 6420 of the other leases held by that client. 6422 Unlike locks, an operation by a second client to a delegated file 6423 will cause the server to recall a delegation through a callback. 6425 On recall, the client holding the delegation must flush modified 6426 state (such as modified data) to the server and return the 6427 delegation. The conflicting request will not be acted on until the 6428 recall is complete. The recall is considered complete when the 6429 client returns the delegation or the server times out its wait for 6430 the delegation to be returned and revokes the delegation as a result 6431 of the timeout. In the interim, the server will either delay 6432 responding to conflicting requests or respond to them with 6433 NFS4ERR_DELAY. Following the resolution of the recall, the server 6434 has the information necessary to grant or deny the second client's 6435 request. 6437 At the time the client receives a delegation recall, it may have 6438 substantial state that needs to be flushed to the server. Therefore, 6439 the server should allow sufficient time for the delegation to be 6440 returned since it may involve numerous RPCs to the server. If the 6441 server is able to determine that the client is diligently flushing 6442 state to the server as a result of the recall, the server MAY extend 6443 the usual time allowed for a recall. However, the time allowed for 6444 recall completion should not be unbounded. 6446 An example of this is when responsibility to mediate opens on a given 6447 file is delegated to a client (see Section 10.4). 
The server will 6448 not know what opens are in effect on the client. Without this 6449 knowledge the server will be unable to determine if the access and 6450 deny state for the file allows any particular open until the 6451 delegation for the file has been returned. 6453 A client failure or a network partition can result in failure to 6454 respond to a recall callback. In this case, the server will revoke 6455 the delegation which in turn will render useless any modified state 6456 still on the client. 6458 Clients need to be aware that server implementers may enforce 6459 practical limitations on the number of delegations issued. Further, 6460 as there is no way to determine which delegations to revoke, the 6461 server is allowed to revoke any. If the server is implemented to 6462 revoke another delegation held by that client, then the client may be 6463 able to determine that a limit has been reached because each new 6464 delegation request results in a revoke. The client could then 6465 determine which delegations it may not need and preemptively release 6466 them. 6468 10.2.1. Delegation Recovery 6470 There are three situations that delegation recovery must deal with: 6472 o Client reboot or restart 6474 o Server reboot or restart (see Section 9.6.3.1) 6476 o Network partition (full or callback-only) 6478 In the event the client reboots or restarts, the confirmation of a 6479 SETCLIENTID done with an nfs_client_id4 with a new verifier4 value 6480 will result in the release of byte-range locks and share 6481 reservations. Delegations, however, may be treated a bit 6482 differently. 6484 There will be situations in which delegations will need to be 6485 reestablished after a client reboots or restarts. The reason for 6486 this is the client may have file data stored locally and this data 6487 was associated with the previously held delegations. The client will 6488 need to reestablish the appropriate file state on the server. 
6490 To allow for this type of client recovery, the server MAY allow 6491 delegations to be retained after other sorts of locks are released. 6492 This implies that requests from other clients that conflict with 6493 these delegations will need to wait. Because the normal recall 6494 process may require significant time for the client to flush changed 6495 state to the server, other clients need to be prepared for delays 6496 that occur because of a conflicting delegation. In order to give 6497 clients a chance to get through the reboot process during which 6498 leases will not be renewed, the server MAY extend the period for 6499 delegation recovery beyond the typical lease expiration period. For 6500 open delegations, such delegations that are not released are 6501 reclaimed using OPEN with a claim type of CLAIM_DELEGATE_PREV. (See 6502 Section 10.5 and Section 15.18 for discussion of open delegation and 6503 the details of OPEN respectively). 6505 A server MAY support a claim type of CLAIM_DELEGATE_PREV, but if it 6506 does, it MUST NOT remove delegations upon SETCLIENTID_CONFIRM and 6507 instead MUST make them available for client reclaim using 6508 CLAIM_DELEGATE_PREV. The server MUST NOT remove the delegations 6509 until either the client does a DELEGPURGE, or one lease period has 6510 elapsed from the later of the SETCLIENTID_CONFIRM or the 6511 last successful CLAIM_DELEGATE_PREV reclaim. 6513 Note that the requirement stated above is not meant to imply that 6514 when the server is no longer obliged, as required above, to retain 6515 delegation information, that it should necessarily dispose of it. 6516 Some specific cases are: 6518 o When the period is terminated by the occurrence of DELEGPURGE, 6519 deletion of unreclaimed delegations is appropriate and desirable.
6521 o When the period is terminated by a lease period elapsing without a 6522 successful CLAIM_DELEGATE_PREV reclaim, and that situation appears 6523 to be the result of a network partition (i.e., lease expiration 6524 has occurred), a server's lease expiration approach, possibly 6525 including the use of courtesy locks, would normally provide for the 6526 retention of unreclaimed delegations. Even in the event that 6527 lease cancellation occurs, such delegation should be reclaimed 6528 using CLAIM_DELEGATE_PREV as part of network partition recovery. 6530 o When the period of non-communication is followed by a client 6531 reboot, unreclaimed delegations should also be reclaimable by use 6532 of CLAIM_DELEGATE_PREV as part of client reboot recovery. 6534 o When the period is terminated by a lease period elapsing without a 6535 successful CLAIM_DELEGATE_PREV reclaim, and lease renewal is 6536 occurring, the server may well conclude that unreclaimed 6537 delegations have been abandoned, and consider the situation as one 6538 in which an implied DELEGPURGE should be assumed. 6540 A server that supports a claim type of CLAIM_DELEGATE_PREV MUST 6541 support the DELEGPURGE operation, and similarly a server that 6542 supports DELEGPURGE MUST support CLAIM_DELEGATE_PREV. A server which 6543 does not support CLAIM_DELEGATE_PREV MUST return NFS4ERR_NOTSUPP if 6544 the client attempts to use that feature or performs a DELEGPURGE 6545 operation. 6547 Support for a claim type of CLAIM_DELEGATE_PREV is often referred to 6548 as providing for "client-persistent delegations" in that they allow 6549 use of client persistent storage on the client to store data written 6550 by the client, even across a client restart. It should be noted 6551 that, with the optional exception noted below, this feature requires 6552 persistent storage to be used on the client and does not add to 6553 persistent storage requirements on the server.
6555 One good way to think about client-persistent delegations is that for 6556 the most part, they function like "courtesy locks", with special 6557 semantic adjustments to allow them to be retained across a client 6558 restart, which causes all other sorts of locks to be freed. Such 6559 locks are generally not retained across a server restart. The one 6560 exception is the case of simultaneous failure of the client and 6561 server and is discussed below. 6563 When the server indicates support of CLAIM_DELEGATE_PREV (implicitly) 6564 by returning NFS4_OK to DELEGPURGE, a client with a write delegation 6565 can use write-back caching for data to be written to the server, 6566 deferring the write-back until such time as the delegation is 6567 recalled, possibly after intervening client restarts. Similarly, 6568 when the server indicates support of CLAIM_DELEGATE_PREV, a client 6569 with a read delegation and an open-for-write subordinate to that 6570 delegation may be sure of the integrity of its persistently cached 6571 copy of the file after a client restart without specific verification 6572 of the change attribute. 6574 When the server reboots or restarts, delegations are reclaimed (using 6575 the OPEN operation with CLAIM_PREVIOUS) in a similar fashion to byte- 6576 range locks and share reservations. However, there is a slight 6577 semantic difference. In the normal case, if the server decides that 6578 a delegation should not be granted, it performs the requested action 6579 (e.g., OPEN) without granting any delegation. For reclaim, the 6580 server grants the delegation but a special designation is applied so 6581 that the client treats the delegation as having been granted but 6582 recalled by the server. Because of this, the client has the duty to 6583 write all modified state to the server and then return the 6584 delegation.
This process of handling delegation reclaim reconciles 6585 three principles of the NFSv4 protocol: 6587 o Upon reclaim, a client claiming resources assigned to it by an 6588 earlier server instance must be granted those resources. 6590 o The server has unquestionable authority to determine whether 6591 delegations are to be granted and, once granted, whether they are 6592 to be continued. 6594 o The use of callbacks is not to be depended upon until the client 6595 has proven its ability to receive them. 6597 When a client has more than a single open associated with a 6598 delegation, state for those additional opens can be established using 6599 OPEN operations of type CLAIM_DELEGATE_CUR. When these are used to 6600 establish opens associated with reclaimed delegations, the server 6601 MUST allow them when made within the grace period. 6603 Situations in which there is a series of client and server restarts 6604 where there is no restart of both at the same time, are dealt with 6605 via a combination of CLAIM_DELEGATE_PREV and CLAIM_PREVIOUS reclaim 6606 cycles. Persistent storage is needed only on the client. For each 6607 server failure, a CLAIM_PREVIOUS reclaim cycle is done, while for 6608 each client restart, a CLAIM_DELEGATE_PREV reclaim cycle is done. 6610 To deal with the possibility of simultaneous failure of client and 6611 server (e.g., a data center power outage), the server MAY 6612 persistently store delegation information so that it can respond to a 6613 CLAIM_DELEGATE_PREV reclaim request which it receives from a 6614 restarting client. This is the one case in which persistent 6615 delegation state can be retained across a server restart. A server 6616 is not required to store this information, but if it does do so, it 6617 should do so for write delegations and for read delegations, during 6618 the pendency of which (across multiple client and/or server 6619 instances), some open-for-write was done as part of delegation. 
When 6620 the space to persistently record such information is limited, the 6621 server should recall delegations in this class in preference to 6622 keeping them active without persistent storage recording. 6624 When a network partition occurs, delegations are subject to freeing 6625 by the server when the lease renewal period expires. This is similar 6626 to the behavior for locks and share reservations, and, as for locks 6627 and share reservations, it may be modified by support for "courtesy 6628 locks" in which locks are not freed in the absence of a conflicting 6629 lock request. Whereas, for locks and share reservations, freeing of 6630 locks will occur immediately upon the appearance of a conflicting 6631 request, for delegations, the server MAY institute a period during 6632 which conflicting requests are held off. Eventually the occurrence 6633 of a conflicting request from another client will cause revocation of 6634 the delegation. 6636 A loss of the callback path (e.g., by later network configuration 6637 change) will have a similar effect in that it can also result in 6638 revocation of a delegation. A recall request will fail and revocation 6639 of the delegation will result. 6641 A client normally finds out about revocation of a delegation when it 6642 uses a stateid associated with a delegation and receives one of the 6643 errors NFS4ERR_EXPIRED, NFS4ERR_BAD_STATEID, or NFS4ERR_ADMIN_REVOKED 6644 (NFS4ERR_EXPIRED indicates that all lock state associated with the 6645 client has been lost). It also may find out about delegation 6646 revocation after a client reboot when it attempts to reclaim a 6647 delegation and receives NFS4ERR_EXPIRED. Note that in the case of a 6648 revoked OPEN_DELEGATE_WRITE delegation, there are issues because data 6649 may have been modified by the client whose delegation is revoked and 6650 separately by other clients. See Section 10.5.1 for a discussion of 6651 such issues.
Note also that when delegations are revoked, 6652 information about the revoked delegation will be written by the 6653 server to stable storage (as described in Section 9.6). This is done 6654 to deal with the case in which a server reboots after revoking a 6655 delegation but before the client holding the revoked delegation is 6656 notified about the revocation. 6658 Note that when there is a loss of a delegation, due to a network 6659 partition in which all locks associated with the lease are lost, the 6660 client will also receive the error NFS4ERR_EXPIRED. This case can be 6661 distinguished from other situations in which delegations are revoked 6662 by seeing that the associated clientid becomes invalid so that 6663 NFS4ERR_STALE_CLIENTID is returned when it is used. 6665 When NFS4ERR_EXPIRED is returned, the server MAY retain information 6666 about the delegations held by the client, deleting those that are 6667 invalidated by a conflicting request. Retaining such information 6668 will allow the client to recover all non-invalidated delegations 6669 using the claim type CLAIM_DELEGATE_PREV, once the 6670 SETCLIENTID_CONFIRM is done to recover. Attempted recovery of a 6671 delegation that the client has no record of, typically because they 6672 were invalidated by conflicting requests, will get the error 6673 NFS4ERR_BAD_RECLAIM. Once a reclaim is attempted for all delegations 6674 that the client held, it SHOULD do a DELEGPURGE to allow any 6675 remaining server delegation information to be freed. 6677 10.3. Data Caching 6679 When applications share access to a set of files, they need to be 6680 implemented so as to take account of the possibility of conflicting 6681 access by another application. This is true whether the applications 6682 in question execute on different clients or reside on the same 6683 client. 
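The client-side recovery sequence just described (reclaim each delegation with CLAIM_DELEGATE_PREV, then issue DELEGPURGE) can be sketched as follows. The function and its two callable parameters are hypothetical stand-ins for whatever RPC layer a client uses, status codes are represented as plain strings, and the sketch assumes the SETCLIENTID_CONFIRM has already completed.

```python
def recover_delegations(held, reclaim, delegpurge):
    """Reclaim each previously held delegation, then purge what remains.

    held: iterable of filehandles for delegations the client held.
    reclaim: hypothetical callable that issues OPEN with a claim type of
        CLAIM_DELEGATE_PREV and returns an NFSv4 status name as a string.
    delegpurge: hypothetical callable that issues DELEGPURGE.
    """
    recovered, lost = [], []
    for fh in held:
        status = reclaim(fh)
        if status == "NFS4_OK":
            recovered.append(fh)
        else:
            # For example NFS4ERR_BAD_RECLAIM: treat the delegation as gone.
            lost.append(fh)
    # Every held delegation has now been attempted, so let the server free
    # any delegation information it is still retaining for this client.
    delegpurge()
    return recovered, lost
```

A caller would typically discard or specially handle cached data tied to the filehandles in the second returned list, since those delegations can no longer be relied upon.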
6685 Share reservations and byte-range locks are the facilities the NFS 6686 version 4 protocol provides to allow applications to coordinate 6687 access by providing mutual exclusion facilities. The NFSv4 6688 protocol's data caching must be implemented such that it does not 6689 invalidate the assumptions that those using these facilities depend 6690 upon. 6692 10.3.1. Data Caching and OPENs 6694 In order to avoid invalidating the sharing assumptions that 6695 applications rely on, NFSv4 clients should not provide cached data to 6696 applications or modify it on behalf of an application when it would 6697 not be valid to obtain or modify that same data via a READ or WRITE 6698 operation. 6700 Furthermore, in the absence of open delegation (see Section 10.4) two 6701 additional rules apply. Note that these rules are obeyed in practice 6702 by many NFSv2 and NFSv3 clients. 6704 o First, cached data present on a client must be revalidated after 6705 doing an OPEN. Revalidating means that the client fetches the 6706 change attribute from the server, compares it with the cached 6707 change attribute, and if different, declares the cached data (as 6708 well as the cached attributes) as invalid. This is to ensure that 6709 the data for the OPENed file is still correctly reflected in the 6710 client's cache. This validation must be done at least when the 6711 client's OPEN operation includes DENY=WRITE or BOTH thus 6712 terminating a period in which other clients may have had the 6713 opportunity to open the file with WRITE access. Clients may 6714 choose to do the revalidation more often (such as at OPENs 6715 specifying DENY=NONE) to parallel the NFSv3 protocol's practice 6716 for the benefit of users assuming this degree of cache 6717 revalidation. 

      Since the change attribute is updated for data and metadata
      modifications, some client implementers may be tempted to use the
      time_modify attribute and not the change attribute to validate
      cached data, so that metadata changes do not spuriously invalidate
      clean data. The implementer is cautioned against this approach.
      The change attribute is guaranteed to change for each update to
      the file, whereas time_modify is guaranteed to change only at the
      granularity of the time_delta attribute. Use by the client's data
      cache validation logic of time_modify and not the change attribute
      runs the risk of the client incorrectly marking stale data as
      valid.

   o  Second, modified data must be flushed to the server before closing
      a file OPENed for write. This is complementary to the first rule.
      If the data is not flushed at CLOSE, the revalidation done after
      the client OPENs a file is unable to achieve its purpose. The
      other aspect to flushing the data before close is that the data
      must be committed to stable storage, at the server, before the
      CLOSE operation is requested by the client. In the case of a
      server reboot or restart and a CLOSEd file, it may not be possible
      to retransmit the data to be written to the file. Hence, this
      requirement.

10.3.2.  Data Caching and File Locking

   For those applications that choose to use file locking instead of
   share reservations to exclude inconsistent file access, there is an
   analogous set of constraints that apply to client side data caching.
   These rules are effective only if the file locking is used in a way
   that matches in an equivalent way the actual READ and WRITE
   operations executed. This is as opposed to file locking that is
   based on pure convention.
   For example, it is possible to manipulate
   a two-megabyte file by dividing the file into two one-megabyte
   regions and protecting access to the two regions by file locks on
   bytes zero and one. A lock for write on byte zero of the file would
   represent the right to do READ and WRITE operations on the first
   region. A lock for write on byte one of the file would represent the
   right to do READ and WRITE operations on the second region. As long
   as all applications manipulating the file obey this convention, they
   will work on a local file system. However, they may not work with
   the NFSv4 protocol unless clients refrain from data caching.

   The rules for data caching in the file locking environment are:

   o  First, when a client obtains a file lock for a particular region,
      the data cache corresponding to that region (if any cached data
      exists) must be revalidated. If the change attribute indicates
      that the file may have been updated since the cached data was
      obtained, the client must flush or invalidate the cached data for
      the newly locked region. A client might choose to invalidate all
      of the non-modified cached data that it has for the file but the
      only requirement for correct operation is to invalidate all of the
      data in the newly locked region.

   o  Second, before releasing a write lock for a region, all modified
      data for that region must be flushed to the server. The modified
      data must also be written to stable storage.

   Note that flushing data to the server and the invalidation of cached
   data must reflect the actual byte ranges locked or unlocked.
   Rounding these up or down to reflect client cache block boundaries
   will cause problems if not carefully done. For example, writing a
   modified block when only half of that block is within an area being
   unlocked may cause invalid modification to the region outside the
   unlocked area.
   This, in turn, may be part of a region locked by
   another client. Clients can avoid this situation by synchronously
   performing portions of write operations that overlap that portion
   (initial or final) that is not a full block. Similarly, invalidating
   a locked area which is not an integral number of full buffer blocks
   would require the client to read one or two partial blocks from the
   server if the revalidation procedure shows that the data which the
   client possesses may not be valid.

   The data that is written to the server as a prerequisite to the
   unlocking of a region must be written, at the server, to stable
   storage. The client may accomplish this either with synchronous
   writes or by following asynchronous writes with a COMMIT operation.
   This is required because retransmission of the modified data after a
   server reboot might conflict with a lock held by another client.

   A client implementation may choose to accommodate applications which
   use byte-range locking in non-standard ways (e.g., using a byte-range
   lock as a global semaphore) by flushing to the server more data upon
   a LOCKU than is covered by the locked range. This may include
   modified data within files other than the one for which the unlocks
   are being done. In such cases, the client must not interfere with
   applications whose READs and WRITEs are being done only within the
   bounds of record locks which the application holds. For example, an
   application locks a single byte of a file and proceeds to write that
   single byte. A client that chose to handle a LOCKU by flushing all
   modified data to the server could validly write that single byte in
   response to an unrelated unlock. However, it would not be valid to
   write the entire block in which that single written byte was located
   since it includes an area that is not locked and might be locked by
   another client.
   Client implementations can avoid this problem by
   dividing files with modified data into those for which all
   modifications are done to areas covered by an appropriate byte-range
   lock and those for which there are modifications not covered by a
   byte-range lock. Any writes done for the former class of files must
   not include areas not locked and thus not modified on the client.

10.3.3.  Data Caching and Mandatory File Locking

   Client side data caching needs to respect mandatory file locking when
   it is in effect. The presence of mandatory file locking for a given
   file is indicated when the client gets back NFS4ERR_LOCKED from a
   READ or WRITE on a file it has an appropriate share reservation for.
   When mandatory locking is in effect for a file, the client must check
   for an appropriate file lock for data being read or written. If a
   lock exists for the range being read or written, the client may
   satisfy the request using the client's validated cache. If an
   appropriate file lock is not held for the range of the read or write,
   the read or write request must not be satisfied by the client's cache
   and the request must be sent to the server for processing. When a
   read or write request partially overlaps a locked region, the request
   should be subdivided into multiple pieces with each region (locked or
   not) treated appropriately.

10.3.4.  Data Caching and File Identity

   When clients cache data, the file data needs to be organized
   according to the file system object to which the data belongs. For
   NFSv3 clients, the typical practice has been to assume for the
   purpose of caching that distinct filehandles represent distinct file
   system objects. The client then has the choice to organize and
   maintain the data cache on this basis.

   In the NFSv4 protocol, there is now the possibility to have
   significant deviations from a "one filehandle per object" model
   because a filehandle may be constructed on the basis of the object's
   pathname. Therefore, clients need a reliable method to determine if
   two filehandles designate the same file system object. If clients
   were simply to assume that all distinct filehandles denote distinct
   objects and proceed to do data caching on this basis, caching
   inconsistencies would arise between the distinct client side objects
   which mapped to the same server side object.

   By providing a method to differentiate filehandles, the NFSv4
   protocol alleviates a potential functional regression in comparison
   with the NFSv3 protocol. Without this method, caching
   inconsistencies within the same client could occur and this has not
   been present in previous versions of the NFS protocol. Note that it
   is possible to have such inconsistencies with applications executing
   on multiple clients but that is not the issue being addressed here.

   For the purposes of data caching, the following steps allow an NFSv4
   client to determine whether two distinct filehandles denote the same
   server side object:

   o  If GETATTR directed to two filehandles returns different values of
      the fsid attribute, then the filehandles represent distinct
      objects.

   o  If GETATTR for any file with an fsid that matches the fsid of the
      two filehandles in question returns a unique_handles attribute
      with a value of TRUE, then the two objects are distinct.

   o  If GETATTR directed to the two filehandles does not return the
      fileid attribute for both of the handles, then it cannot be
      determined whether the two objects are the same. Therefore,
      operations which depend on that knowledge (e.g., client side data
      caching) cannot be done reliably.
      Note that if GETATTR does not
      return the fileid attribute for both filehandles, it will return
      it for neither of the filehandles, since the fsid for both
      filehandles is the same.

   o  If GETATTR directed to the two filehandles returns different
      values for the fileid attribute, then they are distinct objects.

   o  Otherwise they are the same object.

10.4.  Open Delegation

   When a file is being OPENed, the server may delegate further handling
   of opens and closes for that file to the opening client. Any such
   delegation is recallable, since the circumstances that allowed for
   the delegation are subject to change. In particular, if the server
   receives a conflicting OPEN from another client, the server must
   recall the delegation before deciding whether the OPEN from the other
   client may be granted. Making a delegation is up to the server and
   clients should not assume that any particular OPEN either will or
   will not result in an open delegation. The following is a typical
   set of conditions that servers might use in deciding whether OPEN
   should be delegated:

   o  The client must be able to respond to the server's callback
      requests. The server will use the CB_NULL procedure for a test of
      callback ability.

   o  The client must have responded properly to previous recalls.

   o  There must be no current open conflicting with the requested
      delegation.

   o  There should be no current delegation that conflicts with the
      delegation being requested.

   o  The probability of future conflicting open requests should be low
      based on the recent history of the file.

   o  There should be no server-specific semantics of OPEN/CLOSE that
      would make the required handling incompatible with the prescribed
      handling that the delegated client would apply (see below).

   There are two types of open delegations, OPEN_DELEGATE_READ and
   OPEN_DELEGATE_WRITE.
   An OPEN_DELEGATE_READ delegation allows a client
   to handle, on its own, requests to open a file for reading that do
   not deny read access to others. It MUST, however, continue to send
   all requests to open a file for writing to the server. Multiple
   OPEN_DELEGATE_READ delegations may be outstanding simultaneously and
   do not conflict. An OPEN_DELEGATE_WRITE delegation allows the client
   to handle, on its own, all opens. Only one OPEN_DELEGATE_WRITE
   delegation may exist for a given file at a given time and it is
   inconsistent with any OPEN_DELEGATE_READ delegations.

   When a single client holds an OPEN_DELEGATE_READ delegation, it is
   assured that no other client may modify the contents or attributes of
   the file. If more than one client holds an OPEN_DELEGATE_READ
   delegation, then the contents and attributes of that file are not
   allowed to change. When a client has an OPEN_DELEGATE_WRITE
   delegation, it may modify the file data since no other client will be
   accessing the file's data. The client holding an OPEN_DELEGATE_WRITE
   delegation may only affect file attributes which are intimately
   connected with the file data: size, time_modify, change.

   When a client has an open delegation, it does not send OPENs or
   CLOSEs to the server but updates the appropriate status internally.
   For an OPEN_DELEGATE_READ delegation, opens that cannot be handled
   locally (opens for write or that deny read access) must be sent to
   the server.
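
   As an illustration only, the delegation-type rule described above
   might be sketched as follows. The numeric constants mirror the
   share access, share deny, and delegation type values of the
   protocol's XDR description; the function name
   can_handle_open_locally is hypothetical and is not part of the
   protocol. The sketch covers only the choice between local handling
   and sending the OPEN to the server; it does not model the access and
   deny conflict checks or the permission checks a client must also
   apply.

```python
# Share access / deny bits and delegation types, as in the NFSv4 XDR.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2
OPEN4_SHARE_ACCESS_BOTH = 0x3
OPEN4_SHARE_DENY_NONE = 0x0
OPEN4_SHARE_DENY_READ = 0x1
OPEN4_SHARE_DENY_WRITE = 0x2
OPEN4_SHARE_DENY_BOTH = 0x3

OPEN_DELEGATE_NONE = 0
OPEN_DELEGATE_READ = 1
OPEN_DELEGATE_WRITE = 2


def can_handle_open_locally(delegation_type, share_access, share_deny):
    """Hypothetical helper: may the client satisfy this open itself?

    Returns True when the open can be handled without contacting the
    server, False when the OPEN must be sent to the server.
    """
    if delegation_type == OPEN_DELEGATE_WRITE:
        # A write delegation lets the client handle all opens itself.
        return True
    if delegation_type == OPEN_DELEGATE_READ:
        # Only opens for reading that do not deny read access to
        # others may be handled locally under a read delegation.
        wants_write = bool(share_access & OPEN4_SHARE_ACCESS_WRITE)
        denies_read = bool(share_deny & OPEN4_SHARE_DENY_READ)
        return not wants_write and not denies_read
    # Without a delegation every OPEN goes to the server.
    return False
```

   For example, under a read delegation an open for read with
   DENY=WRITE can be handled locally, whereas an open with ACCESS=BOTH
   cannot.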

   When an open delegation is made, the response to the OPEN contains an
   open delegation structure which specifies the following:

   o  the type of delegation (read or write)

   o  space limitation information to control flushing of data on close
      (OPEN_DELEGATE_WRITE delegation only, see Section 10.4.1)

   o  an nfsace4 specifying read and write permissions

   o  a stateid to represent the delegation for READ and WRITE

   The delegation stateid is separate and distinct from the stateid for
   the OPEN proper. The standard stateid, unlike the delegation
   stateid, is associated with a particular open-owner and will continue
   to be valid after the delegation is recalled and the file remains
   open.

   When a request internal to the client is made to open a file and open
   delegation is in effect, it will be accepted or rejected solely on
   the basis of the following conditions. Any requirement for other
   checks to be made by the delegate should result in open delegation
   being denied so that the checks can be made by the server itself.

   o  The access and deny bits for the request and the file as described
      in Section 9.9.

   o  The read and write permissions as determined below.

   The nfsace4 passed with delegation can be used to avoid frequent
   ACCESS calls. The permission check should be as follows:

   o  If the nfsace4 indicates that the open may be done, then it should
      be granted without reference to the server.

   o  If the nfsace4 indicates that the open may not be done, then an
      ACCESS request must be sent to the server to obtain the definitive
      answer.

   The server may return an nfsace4 that is more restrictive than the
   actual ACL of the file. This includes an nfsace4 that specifies
   denial of all access.
   Note that some common practices such as
   mapping the traditional user "root" to the user "nobody" may make it
   incorrect to return the actual ACL of the file in the delegation
   response.

   The use of delegation together with various other forms of caching
   creates the possibility that no server authentication will ever be
   performed for a given user since all of the user's requests might be
   satisfied locally. Where the client is depending on the server for
   authentication, the client should be sure authentication occurs for
   each user by use of the ACCESS operation. This should be the case
   even if an ACCESS operation would not be required otherwise. As
   mentioned before, the server may enforce frequent authentication by
   returning an nfsace4 denying all access with every open delegation.

10.4.1.  Open Delegation and Data Caching

   OPEN delegation allows much of the message overhead associated with
   the opening and closing of files to be eliminated. An open when an
   open delegation is in effect does not require that a validation
   message be sent to the server unless there exists a potential for
   conflict with the requested share mode. The continued endurance of
   the OPEN_DELEGATE_READ delegation provides a guarantee that no OPEN
   for write and thus no write has occurred that did not originate from
   this client. Similarly, when closing a file opened for write, if an
   OPEN_DELEGATE_WRITE delegation is in effect, the data written does
   not have to be flushed to the server until the open delegation is
   recalled. The continued endurance of the open delegation provides a
   guarantee that no open and thus no read or write has been done by
   another client.

   For the purposes of open delegation, READs and WRITEs done without an
   OPEN (anonymous and READ bypass stateids) are treated as the
   functional equivalents of a corresponding type of OPEN.
   READs and
   WRITEs done with an anonymous stateid by another client will
   force the server to recall an OPEN_DELEGATE_WRITE delegation. A WRITE
   with an anonymous stateid done by another client will force a recall
   of OPEN_DELEGATE_READ delegations. The handling of a READ bypass
   stateid is identical except that a READ done with a READ bypass
   stateid will not force a recall of an OPEN_DELEGATE_READ delegation.

   With delegations, a client is able to avoid writing data to the
   server when the CLOSE of a file is serviced. The file close system
   call is the usual point at which the client is notified of a lack of
   stable storage for the modified file data generated by the
   application. At the close, file data is written to the server and
   through normal accounting the server is able to determine if the
   available file system space for the data has been exceeded (i.e.,
   server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting
   includes quotas. The introduction of delegations requires that an
   alternative method be in place for the same type of communication to
   occur between client and server.

   In the delegation response, the server provides either the limit of
   the size of the file or the number of modified blocks and associated
   block size. The server must ensure that the client will be able to
   flush data to the server of a size equal to that provided in the
   original delegation. The server must make this assurance for all
   outstanding delegations. Therefore, the server must be careful in
   its management of available space for new or modified data taking
   into account available file system space and any applicable quotas.
   The server can recall delegations as a result of managing the
   available file system space. The client should abide by the server's
   stated space limits for delegations.
   If the client exceeds the stated
   limits for the delegation, the server's behavior is undefined.

   Based on server conditions, quotas or available file system space,
   the server may grant OPEN_DELEGATE_WRITE delegations with very
   restrictive space limitations. The limitations may be defined in a
   way that will always force modified data to be flushed to the server
   on close.

   With respect to authentication, flushing modified data to the server
   after a CLOSE has occurred may be problematic. For example, the user
   of the application may have logged off the client and unexpired
   authentication credentials may not be present. In this case, the
   client may need to take special care to ensure that local unexpired
   credentials will in fact be available. One way that this may be
   accomplished is by tracking the expiration time of credentials and
   flushing data well in advance of their expiration.

10.4.2.  Open Delegation and File Locks

   When a client holds an OPEN_DELEGATE_WRITE delegation, lock
   operations may be performed locally. This includes those required
   for mandatory file locking. This can be done since the delegation
   implies that there can be no conflicting locks. Similarly, all of
   the revalidations that would normally be associated with obtaining
   locks and the flushing of data associated with the releasing of locks
   need not be done.

   When a client holds an OPEN_DELEGATE_READ delegation, lock operations
   are not performed locally. All lock operations, including those
   requesting non-exclusive locks, are sent to the server for
   resolution.

10.4.3.  Handling of CB_GETATTR

   The server needs to employ special handling for a GETATTR where the
   target is a file that has an OPEN_DELEGATE_WRITE delegation in
   effect.
   The reason for this is that the client holding the
   OPEN_DELEGATE_WRITE delegation may have modified the data and the
   server needs to reflect this change to the second client that
   submitted the GETATTR. Therefore, the client holding the
   OPEN_DELEGATE_WRITE delegation needs to be interrogated. The server
   will use the CB_GETATTR operation. The only attributes that the
   server can reliably query via CB_GETATTR are size and change.

   Since CB_GETATTR is being used to satisfy another client's GETATTR
   request, the server only needs to know if the client holding the
   delegation has a modified version of the file. If the client's copy
   of the delegated file is not modified (data or size), the server can
   satisfy the second client's GETATTR request from the attributes
   stored locally at the server. If the file is modified, the server
   only needs to know about this modified state. If the server
   determines that the file is currently modified, it will respond to
   the second client's GETATTR as if the file had been modified locally
   at the server.

   Since the form of the change attribute is determined by the server
   and is opaque to the client, the client and server need to agree on a
   method of communicating the modified state of the file. For the size
   attribute, the client will report its current view of the file size.
   For the change attribute, the handling is more involved.

   For the client, the following steps will be taken when receiving an
   OPEN_DELEGATE_WRITE delegation:

   o  The value of the change attribute will be obtained from the server
      and cached. Let this value be represented by c.

   o  The client will create a value greater than c that will be used
      for communicating that modified data is held at the client. Let
      this value be represented by d.

   o  When the client is queried via CB_GETATTR for the change
      attribute, it checks to see if it holds modified data. If the
      file is modified, the value d is returned for the change attribute
      value. If this file is not currently modified, the client returns
      the value c for the change attribute.

   For simplicity of implementation, the client MAY for each CB_GETATTR
   return the same value d. This is true even if, between successive
   CB_GETATTR operations, the client again modifies the file's data
   or metadata in its cache. The client can return the same value
   because the only requirement is that the client be able to indicate
   to the server that the client holds modified data. Therefore, the
   value of d may always be c + 1.

   While the change attribute is opaque to the client in the sense that
   it has no idea what units of time, if any, the server is counting
   change with, it is not opaque in that the client has to treat it as
   an unsigned integer, and the server has to be able to see the results
   of the client's changes to that integer. Therefore, the server MUST
   encode the change attribute in network order when sending it to the
   client. The client MUST decode it from network order to its native
   order when receiving it and the client MUST encode it in network
   order when sending it to the server. For this reason, the change
   attribute is defined as an unsigned integer rather than an opaque
   array of bytes.

   For the server, the following steps will be taken when providing an
   OPEN_DELEGATE_WRITE delegation:

   o  Upon providing an OPEN_DELEGATE_WRITE delegation, the server will
      cache a copy of the change attribute in the data structure it uses
      to record the delegation. Let this value be represented by sc.

   o  When a second client sends a GETATTR operation on the same file to
      the server, the server obtains the change attribute from the first
      client. Let this value be cc.

   o  If the value cc is equal to sc, the file is not modified and the
      server returns the current values for change, time_metadata, and
      time_modify (for example) to the second client.

   o  If the value cc is NOT equal to sc, the file is currently modified
      at the first client and most likely will be modified at the server
      at a future time. The server then uses its current time to
      construct attribute values for time_metadata and time_modify. A
      new value of sc, which we will call nsc, is computed by the
      server, such that nsc >= sc + 1. The server then returns the
      constructed time_metadata, time_modify, and nsc values to the
      requester. The server replaces sc in the delegation record with
      nsc. To prevent the possibility of time_modify, time_metadata,
      and change from appearing to go backward (which would happen if
      the client holding the delegation fails to write its modified data
      to the server before the delegation is revoked or returned), the
      server SHOULD update the file's metadata record with the
      constructed attribute values. For reasons of reasonable
      performance, committing the constructed attribute values to stable
      storage is OPTIONAL.

   As discussed earlier in this section, the client MAY return the same
   cc value on subsequent CB_GETATTR calls, even if the file was
   modified in the client's cache yet again between successive
   CB_GETATTR calls. Therefore, the server must assume that the file
   has been modified yet again, and MUST take care to ensure that the
   new nsc it constructs and returns is greater than the previous nsc it
   returned.
   An example implementation's delegation record would
   satisfy this mandate by including a boolean field (let us call it
   "modified") that is set to FALSE when the delegation is granted, and
   an sc value set at the time of grant to the change attribute value.
   The modified field would be set to TRUE the first time cc != sc, and
   would stay TRUE until the delegation is returned or revoked. The
   processing for constructing nsc, time_modify, and time_metadata would
   use this pseudo code:

       if (!modified) {
           /* Not yet known to be modified: ask for change and size. */
           do CB_GETATTR for change and size;

           if (cc != sc)
               modified = TRUE;
       } else {
           /* Already known to be modified: only size is needed. */
           do CB_GETATTR for size;
       }

       if (modified) {
           /* Each reply must carry a change value above the last. */
           sc = sc + 1;
           time_modify = time_metadata = current_time;
           update sc, time_modify, time_metadata into file's metadata;
       }

   This would return to the client (that sent GETATTR) the attributes it
   requested, but make sure size comes from what CB_GETATTR returned.
   The server would not update the file's metadata with the client's
   modified size.

   In the case that the file attribute size is different than the
   server's current value, the server treats this as a modification
   regardless of the value of the change attribute retrieved via
   CB_GETATTR and responds to the second client as in the last step.

   This methodology resolves issues of clock differences between client
   and server and other scenarios where the use of CB_GETATTR breaks
   down.

   It should be noted that the server is under no obligation to use
   CB_GETATTR and therefore the server MAY simply recall the delegation
   to avoid its use.

10.4.4.  Recall of Open Delegation

   The following events necessitate recall of an open delegation:

   o  Potentially conflicting OPEN request (or READ/WRITE done with
      "special" stateid)

   o  SETATTR issued by another client

   o  REMOVE request for the file

   o  RENAME request for the file as either source or target of the
      RENAME

   Whether a RENAME of a directory in the path leading to the file
   results in recall of an open delegation depends on the semantics of
   the server file system. If that file system denies such RENAMEs when
   a file is open, the recall must be performed to determine whether the
   file in question is, in fact, open.

   In addition to the situations above, the server may choose to recall
   open delegations at any time if resource constraints make it
   advisable to do so. Clients should always be prepared for the
   possibility of recall.

   When a client receives a recall for an open delegation, it needs to
   update state on the server before returning the delegation. These
   same updates must be done whenever a client chooses to return a
   delegation voluntarily. The following items of state need to be
   dealt with:

   o  If the file associated with the delegation is no longer open and
      no previous CLOSE operation has been sent to the server, a CLOSE
      operation must be sent to the server.

   o  If a file has other open references at the client, then OPEN
      operations must be sent to the server. The appropriate stateids
      will be provided by the server for subsequent use by the client
      since the delegation stateid will no longer be valid. These OPEN
      requests are done with the claim type of CLAIM_DELEGATE_CUR. This
      will allow the presentation of the delegation stateid so that the
      client can establish the appropriate rights to perform the OPEN.
      (See Section 15.18 for details.)

   o  If there are granted file locks, the corresponding LOCK operations
      need to be performed. This applies to the OPEN_DELEGATE_WRITE
      delegation case only.

   o  For an OPEN_DELEGATE_WRITE delegation, if at the time of recall
      the file is not open for write, all modified data for the file
      must be flushed to the server. If the delegation had not existed,
      the client would have done this data flush before the CLOSE
      operation.

   o  For an OPEN_DELEGATE_WRITE delegation when a file is still open at
      the time of recall, any modified data for the file needs to be
      flushed to the server.

   o  With the OPEN_DELEGATE_WRITE delegation in place, it is possible
      that the file was truncated during the duration of the delegation.
      For example, the truncation could have occurred as a result of an
      OPEN UNCHECKED4 with a size attribute value of zero. Therefore,
      if a truncation of the file has occurred and this operation has
      not been propagated to the server, the truncation must occur
      before any modified data is written to the server.

   In the case of OPEN_DELEGATE_WRITE delegation, file locking imposes
   some additional requirements. To precisely maintain the associated
   invariant, it is required to flush any modified data in any region
   for which a write lock was released while the OPEN_DELEGATE_WRITE
   delegation was in effect. However, because the OPEN_DELEGATE_WRITE
   delegation implies no other locking by other clients, a simpler
   implementation is to flush all modified data for the file (as
   described just above) if any write lock has been released while the
   OPEN_DELEGATE_WRITE delegation was in effect.

   An implementation need not wait until delegation recall (or deciding
   to voluntarily return a delegation) to perform any of the above
   actions, if implementation considerations (e.g., resource
   availability constraints) make that desirable.
   Generally, however,
   the fact that the actual open state of the file may continue to
   change makes it not worthwhile to send information about opens and
   closes to the server, except as part of delegation return. Only in
   the case of closing the open that resulted in obtaining the
   delegation would clients be likely to do this early, since, in that
   case, the close once done will not be undone. Regardless of the
   client's choices on scheduling these actions, all must be performed
   before the delegation is returned, including (when applicable) the
   close that corresponds to the open that resulted in the delegation.
   These actions can be performed either in previous requests or in
   previous operations in the same COMPOUND request.

10.4.5.  OPEN Delegation Race with CB_RECALL

   The server informs the client of recall via a CB_RECALL. A race case
   which may develop is when the delegation is immediately recalled
   before the COMPOUND which established the delegation is returned to
   the client. As the CB_RECALL provides both a stateid and a
   filehandle for which the client has no mapping, it cannot honor the
   recall attempt. At this point, the client has two choices: either
   not respond or respond with NFS4ERR_BADHANDLE. If it does not
   respond, then it runs the risk of the server deciding to not grant it
   further delegations.

   If instead it does reply with NFS4ERR_BADHANDLE, then both the client
   and the server might be able to detect that a race condition is
   occurring. The client can keep a list of pending delegations. When
   it receives a CB_RECALL for an unknown delegation, it can cache the
   stateid and filehandle on a list of pending recalls. When it is
   provided with a delegation, it would only use it if it was not on the
   pending recall list. Upon the next CB_RECALL, it could immediately
   return the delegation.
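
   The client-side bookkeeping described above might be sketched as
   follows. This is a minimal illustration under stated assumptions,
   not a prescribed implementation: the class DelegationTracker and its
   method names are hypothetical, stateids and filehandles are modeled
   as opaque byte strings, and "returning" a delegation is reduced to
   recording it in a list rather than performing the state updates and
   DELEGRETURN a real client would need.

```python
NFS4_OK = 0
NFS4ERR_BADHANDLE = 10001


class DelegationTracker:
    """Hypothetical client-side tracker for the CB_RECALL race."""

    def __init__(self):
        self.held = {}                # (stateid, fh) -> usable flag
        self.pending_recalls = set()  # recalls seen before the grant
        self.returned = []            # delegations handed back

    def on_delegation_granted(self, stateid, filehandle):
        """Record a delegation from an OPEN reply.

        Returns True if the delegation may be used, False if a recall
        for it already arrived and it should only be held until it
        can be returned.
        """
        key = (stateid, filehandle)
        usable = key not in self.pending_recalls
        self.held[key] = usable
        return usable

    def on_cb_recall(self, stateid, filehandle):
        """Handle a CB_RECALL and return the status to reply with."""
        key = (stateid, filehandle)
        if key not in self.held:
            # Unknown delegation: remember the recall and tell the
            # server that the handle is not (yet) known.
            self.pending_recalls.add(key)
            return NFS4ERR_BADHANDLE
        # Known delegation: return it right away.
        del self.held[key]
        self.pending_recalls.discard(key)
        self.returned.append(key)
        return NFS4_OK
```

   In this sketch, a recall that arrives before the grant is answered
   with NFS4ERR_BADHANDLE, the later grant is marked unusable, and the
   server's next CB_RECALL causes the delegation to be handed back
   immediately.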

   In turn, the server can keep track of when it issues a delegation and assume that if a client responds to the CB_RECALL with an NFS4ERR_BADHANDLE, then the client has yet to receive the delegation. The server SHOULD give the client a reasonable time both to get this delegation and to return it before revoking the delegation. Unlike a failed callback path, the server should periodically probe the client with CB_RECALL to see if it has received the delegation and is ready to return it.

   When the server finally determines that enough time has elapsed, it SHOULD revoke the delegation and it SHOULD NOT revoke the lease. During this extended recall process, the server SHOULD be renewing the client lease. The intent here is that the client not pay too onerous a burden for a condition caused by the server.

10.4.6.  Clients that Fail to Honor Delegation Recalls

   A client may fail to respond to a recall for various reasons, such as a failure of the callback path from server to the client. The client may be unaware of a failure in the callback path. This lack of awareness could result in the client finding out long after the failure that its delegation has been revoked, and another client has modified the data for which the client had a delegation. This is especially a problem for the client that held an OPEN_DELEGATE_WRITE delegation.

   The server also has a dilemma in that the client that fails to respond to the recall might also be sending other NFS requests, including those that renew the lease before the lease expires. Without returning an error for those lease renewing operations, the server leads the client to believe that the delegation it has is in force.

   This difficulty is solved by the following rules:

   o  When the callback path is down, the server MUST NOT revoke the delegation if one of the following occurs:

      *  The client has issued a RENEW operation and the server has returned an NFS4ERR_CB_PATH_DOWN error. The server MUST renew the lease for any byte-range locks and share reservations the client has that the server has known about (as opposed to those locks and share reservations the client has established but not yet sent to the server, due to the delegation). The server SHOULD give the client a reasonable time to return its delegations to the server before revoking the client's delegations.

      *  The client has not issued a RENEW operation for some period of time after the server attempted to recall the delegation. This period of time MUST NOT be less than the value of the lease_time attribute.

   o  When the client holds a delegation, it cannot rely on operations, except for RENEW, that take a stateid, to renew delegation leases across callback path failures. The client that wants to keep delegations in force across callback path failures must use RENEW to do so.

10.4.7.  Delegation Revocation

   At the point a delegation is revoked, if there are associated opens on the client, the applications holding these opens need to be notified. This notification usually occurs by returning errors for READ/WRITE operations or when a close is attempted for the open file.

   If no opens exist for the file at the point the delegation is revoked, then notification of the revocation is unnecessary. However, if there is modified data present at the client for the file, the user of the application should be notified. Unfortunately, it may not be possible to notify the user since active applications may not be present at the client. See Section 10.5.1 for additional details.

10.5.  Data Caching and Revocation

   When locks and delegations are revoked, the assumptions upon which successful caching depend are no longer guaranteed. For any locks or share reservations that have been revoked, the corresponding owner needs to be notified. This notification includes applications with a file open that has a corresponding delegation which has been revoked. Cached data associated with the revocation must be removed from the client. In the case of modified data existing in the client's cache, that data must be removed from the client without it being written to the server. As mentioned, the assumptions made by the client are no longer valid at the point when a lock or delegation has been revoked. For example, another client may have been granted a conflicting lock after the revocation of the lock at the first client. Therefore, the data within the lock range may have been modified by the other client. Obviously, the first client is unable to guarantee to the application what has occurred to the file in the case of revocation.

   Notification to a lock-owner will in many cases consist of simply returning an error on the next and all subsequent READs/WRITEs to the open file or on the close. Where the methods available to a client make such notification impossible because errors for certain operations may not be returned, more drastic action such as signals or process termination may be appropriate. The justification for this is that an invariant on which an application depends may be violated. Depending on how errors are typically treated for the client operating environment, further levels of notification including logging, console messages, and GUI pop-ups may be appropriate.

10.5.1.  Revocation Recovery for Write Open Delegation

   Revocation recovery for an OPEN_DELEGATE_WRITE delegation poses the special issue of modified data in the client cache while the file is not open. In this situation, any client which does not flush modified data to the server on each close must ensure that the user receives appropriate notification of the failure as a result of the revocation. Since such situations may require human action to correct problems, notification schemes in which the appropriate user or administrator is notified may be necessary. Logging and console messages are typical examples.

   If there is modified data on the client, it must not be flushed normally to the server. A client may attempt to provide a copy of the file data as modified during the delegation under a different name in the file system name space to ease recovery. Note that when the client can determine that the file has not been modified by any other client, or when the client has a complete cached copy of the file in question, such a saved copy of the client's view of the file may be of particular value for recovery. In other cases, recovery using a copy of the file based partially on the client's cached data and partially on the server copy as modified by other clients will be anything but straightforward, so clients may avoid saving file contents in these situations or mark the results specially to warn users of possible problems.

   Saving of such modified data in delegation revocation situations may be limited to files of a certain size or might be used only when sufficient disk space is available within the target file system. Such saving may also be restricted to situations when the client has sufficient buffering resources to keep the cached copy available until it is properly stored to the target file system.

10.6.  Attribute Caching

   The attributes discussed in this section do not include named attributes. Individual named attributes are analogous to files and caching of the data for these needs to be handled just as data caching is for regular files. Similarly, LOOKUP results from an OPENATTR directory are to be cached on the same basis as any other pathnames and similarly for directory contents.

   Clients may cache file attributes obtained from the server and use them to avoid subsequent GETATTR requests. Such caching is write through in that modification to file attributes is always done by means of requests to the server and should not be done locally and cached. The exceptions to this are modifications to attributes that are intimately connected with data caching. Therefore, extending a file by writing data to the local data cache is reflected immediately in the size as seen on the client without this change being immediately reflected on the server. Normally such changes are not propagated directly to the server but when the modified data is flushed to the server, analogous attribute changes are made on the server. When open delegation is in effect, the modified attributes may be returned to the server in the response to a CB_GETATTR call.

   The result of local caching of attributes is that the attribute caches maintained on individual clients will not be coherent. Changes made in one order on the server may be seen in a different order on one client and in a third order on a different client.

   The typical file system application programming interfaces do not provide means to atomically modify or interrogate attributes for multiple files at the same time. The following rules provide an environment where the potential incoherency mentioned above can be reasonably managed.
   These rules are derived from the practice of previous NFS protocols.

   o  All attributes for a given file (per-fsid attributes excepted) are cached as a unit at the client so that no non-serializability can arise within the context of a single file.

   o  An upper time boundary is maintained on how long a client cache entry can be kept without being refreshed from the server.

   o  When operations are performed that modify attributes at the server, the updated attribute set is requested as part of the containing RPC. This includes directory operations that update attributes indirectly. This is accomplished by following the modifying operation with a GETATTR operation and then using the results of the GETATTR to update the client's cached attributes.

   Note that if the full set of attributes to be cached is requested by READDIR, the results can be cached by the client on the same basis as attributes obtained via GETATTR.

   A client may validate its cached version of attributes for a file by fetching both the change and time_access attributes and assuming that if the change attribute has the same value as it did when the attributes were cached, then no attributes other than time_access have changed. The time_access attribute is also fetched because many servers operate in environments where the operation that updates change does not update time_access. For example, POSIX file semantics do not update access time when a file is modified by the write system call. Therefore, the client that wants a current time_access value should fetch it with change during the attribute cache validation processing and update its cached time_access.

   The client may maintain a cache of modified attributes for those attributes intimately connected with data of modified regular files (size, time_modify, and change).
   Other than those three attributes, the client MUST NOT maintain a cache of modified attributes. Instead, attribute changes are immediately sent to the server.

   In some operating environments, the equivalent to time_access is expected to be implicitly updated by each read of the content of the file object. If an NFS client is caching the content of a file object, whether it is a regular file, directory, or symbolic link, the client SHOULD NOT update the time_access attribute (via SETATTR or a small READ or READDIR request) on the server with each read that is satisfied from cache. The reason is that this can defeat the performance benefits of caching content, especially since an explicit SETATTR of time_access may alter the change attribute on the server. If the change attribute changes, clients that are caching the content will think the content has changed, and will re-read unmodified data from the server. Nor is the client encouraged to maintain a modified version of time_access in its cache, since this would mean that the client will either eventually have to write the access time to the server with bad performance effects, or it would never update the server's time_access, thereby resulting in a situation where an application that caches access time between a close and open of the same file observes the access time oscillating between the past and present. The time_access attribute always means the time of last access to a file by a read that was satisfied by the server. This way clients will tend to see only time_access changes that go forward in time.

10.7.  Data and Metadata Caching and Memory Mapped Files

   Some operating environments include the capability for an application to map a file's content into the application's address space.
   Each time the application accesses a memory location that corresponds to a block that has not been loaded into the address space, a page fault occurs and the file is read (or if the block does not exist in the file, the block is allocated and then instantiated in the application's address space).

   As long as each memory mapped access to the file requires a page fault, the relevant attributes of the file that are used to detect access and modification (time_access, time_metadata, time_modify, and change) will be updated. However, in many operating environments, when page faults are not required these attributes will not be updated on reads or updates to the file via memory access (regardless of whether the file is a local file or is being accessed remotely). A client or server MAY fail to update attributes of a file that is being accessed via memory mapped I/O. This has several implications:

   o  If there is an application on the server that has memory mapped a file that a client is also accessing, the client may not be able to get a consistent value of the change attribute to determine whether its cache is stale or not. A server that knows that the file is memory mapped could always pessimistically return updated values for change so as to force the application to always get the most up to date data and metadata for the file. However, due to the negative performance implications of this, such behavior is OPTIONAL.

   o  If the memory mapped file is not being modified on the server, and instead is just being read by an application via the memory mapped interface, the client will not see an updated time_access attribute. However, in many operating environments, neither will any process running on the server. Thus NFS clients are at no disadvantage with respect to local processes.

   o  If there is another client that is memory mapping the file, and if that client is holding an OPEN_DELEGATE_WRITE delegation, the same set of issues as discussed in the previous two bullet items apply. So, when a server does a CB_GETATTR to a file that the client has modified in its cache, the response from CB_GETATTR will not necessarily be accurate. As discussed earlier, the client's obligation is to report that the file has been modified since the delegation was granted, not whether it has been modified again between successive CB_GETATTR calls, and the server MUST assume that any file the client has modified in cache has been modified again between successive CB_GETATTR calls. Depending on the nature of the client's memory management system, this weak obligation may not be possible. A client MAY return stale information in CB_GETATTR whenever the file is memory mapped.

   o  The mixture of memory mapping and file locking on the same file is problematic. Consider the following scenario, where the page size on each client is 8192 bytes.

      *  Client A memory maps first page (8192 bytes) of file X

      *  Client B memory maps first page (8192 bytes) of file X

      *  Client A write locks first 4096 bytes

      *  Client B write locks second 4096 bytes

      *  Client A, via a STORE instruction, modifies part of its locked region.

      *  Simultaneous to client A, client B issues a STORE on part of its locked region.

   Here the challenge is for each client to resynchronize to get a correct view of the first page. In many operating environments, the virtual memory management systems on each client only know a page is modified, not that a subset of the page corresponding to the respective lock regions has been modified.
   So it is not possible for each client to do the right thing, which is to only write to the server that portion of the page that is locked. For example, if client A simply writes out the page, and then client B writes out the page, client A's data is lost.

   Moreover, if mandatory locking is enabled on the file, then we have a different problem. When clients A and B issue the STORE instructions, the resulting page faults require a byte-range lock on the entire page. Each client then tries to extend their locked range to the entire page, which results in a deadlock.

   Communicating the NFS4ERR_DEADLOCK error to a STORE instruction is difficult at best.

   If a client is locking the entire memory mapped file, there is no problem with advisory or mandatory byte-range locking, at least until the client unlocks a region in the middle of the file.

   Given the above issues the following are permitted:

   o  Clients and servers MAY deny memory mapping a file they know there are byte-range locks for.

   o  Clients and servers MAY deny a byte-range lock on a file they know is memory mapped.

   o  A client MAY deny memory mapping a file that it knows requires mandatory locking for I/O. If mandatory locking is enabled after the file is opened and mapped, the client MAY deny the application further access to its mapped file.

10.8.  Name Caching

   The results of LOOKUP and READDIR operations may be cached to avoid the cost of subsequent LOOKUP operations. Just as in the case of attribute caching, inconsistencies may arise among the various client caches.
   To mitigate the effects of these inconsistencies and given the context of typical file system APIs, an upper time boundary is maintained on how long a client name cache entry can be kept without verifying that the entry has not been made invalid by a directory change operation performed by another client.

   When a client is not making changes to a directory for which there exist name cache entries, the client needs to periodically fetch attributes for that directory to ensure that it is not being modified. After determining that no modification has occurred, the expiration time for the associated name cache entries may be updated to be the current time plus the name cache staleness bound.

   When a client is making changes to a given directory, it needs to determine whether there have been changes made to the directory by other clients. It does this by using the change attribute as reported before and after the directory operation in the associated change_info4 value returned for the operation. The server is able to communicate to the client whether the change_info4 data is provided atomically with respect to the directory operation. If the change values are provided atomically, the client is then able to compare the pre-operation change value with the change value in the client's name cache. If the comparison indicates that the directory was updated by another client, the name cache associated with the modified directory is purged from the client. If the comparison indicates no modification, the name cache can be updated on the client to reflect the directory operation and the associated timeout extended. The post-operation change value needs to be saved as the basis for future change_info4 comparisons.
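
   The change_info4 comparison described above can be illustrated with a minimal sketch. The atomic, before, and after fields mirror the change_info4 structure; everything else (the DirNameCache class, the apply_create method, and the choice to purge whenever the values are not atomic) is a hypothetical simplification, and timeout handling is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeInfo4:
    """Mirror of the change_info4 result returned by a directory operation."""
    atomic: bool   # True if before/after were obtained atomically with the operation
    before: int    # directory change attribute before the operation
    after: int     # directory change attribute after the operation


@dataclass
class DirNameCache:
    """Hypothetical per-directory name cache on the client."""
    change: int                                 # change value the cache is based on
    names: dict = field(default_factory=dict)   # component name -> filehandle

    def apply_create(self, name: str, filehandle: bytes, cinfo: ChangeInfo4) -> bool:
        """Fold the client's own directory update into the cache.

        Returns True if the existing entries were kept, False if purged.
        """
        if cinfo.atomic and cinfo.before == self.change:
            # No other client modified the directory: reflect our own change.
            self.names[name] = filehandle
            kept = True
        else:
            # Another client changed the directory, or the values cannot be
            # trusted: purge the name cache for this directory.
            self.names.clear()
            kept = False
        # Save the post-operation value as the basis for future comparisons.
        self.change = cinfo.after
        return kept
```

   Treating non-atomic change_info4 values the same as a detected conflict is the conservative choice in this sketch; it follows from the rule that the client should not assume the directory is unchanged when the values were not reported atomically.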

   As demonstrated by the scenario above, name caching requires that the client revalidate name cache data by inspecting the change attribute of a directory at the point when the name cache item was cached. This requires that the server update the change attribute for directories when the contents of the corresponding directory are modified. For a client to use the change_info4 information appropriately and correctly, the server must report the pre- and post-operation change attribute values atomically. When the server is unable to report the before and after values atomically with respect to the directory operation, the server must indicate that fact in the change_info4 return value. When the information is not atomically reported, the client should not assume that other clients have not changed the directory.

10.9.  Directory Caching

   The results of READDIR operations may be used to avoid subsequent READDIR operations. Just as in the cases of attribute and name caching, inconsistencies may arise among the various client caches.

   To mitigate the effects of these inconsistencies, and given the context of typical file system APIs, the following rules should be followed:

   o  Cached READDIR information for a directory which is not obtained in a single READDIR operation must always be a consistent snapshot of directory contents. This is determined by using a GETATTR before the first READDIR and after the last READDIR that contributes to the cache.

   o  An upper time boundary is maintained to indicate the length of time a directory cache entry is considered valid before the client must revalidate the cached information.

   The revalidation technique parallels that discussed in the case of name caching.
   When the client is not changing the directory in question, checking the change attribute of the directory with GETATTR is adequate. The lifetime of the cache entry can be extended at these checkpoints. When a client is modifying the directory, the client needs to use the change_info4 data to determine whether there are other clients modifying the directory. If it is determined that no other client modifications are occurring, the client may update its directory cache to reflect its own changes.

   As demonstrated previously, directory caching requires that the client revalidate directory cache data by inspecting the change attribute of a directory at the point when the directory was cached. This requires that the server update the change attribute for directories when the contents of the corresponding directory are modified. For a client to use the change_info4 information appropriately and correctly, the server must report the pre- and post-operation change attribute values atomically. When the server is unable to report the before and after values atomically with respect to the directory operation, the server must indicate that fact in the change_info4 return value. When the information is not atomically reported, the client should not assume that other clients have not changed the directory.

11.  Minor Versioning

   To address the requirement of an NFS protocol that can evolve as the need arises, the NFSv4 protocol contains the rules and framework to allow for future minor changes or versioning.

   The base assumption with respect to minor versioning is that any future accepted minor version must follow the IETF process and be documented in a standards track RFC. Therefore, each minor version number will correspond to an RFC. Minor version 0 of the NFS version 4 protocol is represented by this RFC.
   The COMPOUND and CB_COMPOUND procedures support the encoding of the minor version being requested by the client.

   Future minor versions will extend, rather than replace, the XDR for the preceding minor version, as had been done in moving from NFSv2 to NFSv3 and from NFSv3 to NFSv4.0.

   Specification of detailed rules for the construction of minor versions will be addressed in documents defining early minor versions or, more desirably, in an RFC establishing a versioning framework for NFSv4 as a whole.

12.  Internationalization

12.1.  Introduction

   Internationalization is a complex topic with its own set of terminology (see [RFC6365]). The topic is made more complex in NFSv4.0 by the tangled history and state of NFS implementations. This section describes what we might call "NFSv4.0 internationalization" (i.e., internationalization as implemented by existing clients and servers) as the basis upon which NFSv4.0 clients may implement internationalization support.

   This section is based on the behavior of existing implementations. Note that the behaviors described are each demonstrated by a combination of an NFSv4 server implementation proper and a server-side physical file system. It is common for servers and physical file systems to be configurable as to the behavior shown. In the discussion below, each configuration that shows different behavior is considered separately.

   Note that in this section, the keywords "MUST", "SHOULD", and "MAY" retain their normal meanings. However, in deriving this specification from implementation patterns, we document below how the normative terms used derive from the behavior of existing implementations, in those situations in which existing implementation behavior patterns can be determined.

   o  Behavior implemented by all existing clients or servers is described using "MUST", since new implementations need to follow existing ones to be assured of interoperability. While it is possible that different behavior might be workable, we have found no case where this seems reasonable.

   o  Behavior implemented by no existing clients or servers is described using "MUST NOT", if such behavior poses interoperability problems.

   o  Behavior implemented by most existing clients or servers, where that behavior is more desirable than any alternative, is described using "SHOULD", since new implementations need to follow that existing practice unless there are strong reasons to do otherwise.

      The converse holds for "SHOULD NOT".

   o  Behavior implemented by some, but not all, existing clients or servers is described using "MAY", indicating that new implementations have a choice as to whether they will behave in that way. Thus, new implementations will have the same flexibility that existing ones do.

   o  Behavior implemented by all existing clients or servers, so far as is known, but where there remains some uncertainty as to details, is described using "should". Such cases primarily concern details of error returns. New implementations should follow existing practice even though such situations generally do not affect interoperability.

   There are also cases in which certain server behaviors, while not known to exist, cannot be reliably determined not to exist. In part, this is a consequence of the long period of time that has elapsed since the publication of [RFC3530], resulting in a situation in which those involved in the implementation may no longer be involved in or aware of working group activities.

   In the case of possible server behavior that is neither known to exist nor known not to exist, we use "SHOULD NOT" and "MUST NOT" as follows, and similarly for "SHOULD" and "MUST".

   o  In some cases, the potential behavior is not known to exist but is of such a nature that, if it were in fact implemented, interoperability difficulties would be expected and reported, giving us cause to conclude that the potential behavior is not implemented. For such behavior, we use "MUST NOT". Similarly, we use "MUST" to apply to the contrary behavior.

   o  In other cases, potential behavior is not known to exist but the behavior, while undesirable, is not of such a nature that we are able to draw any conclusions about its potential existence. In such cases, we use "SHOULD NOT". Similarly, we use "SHOULD" to apply to the contrary behavior.

   In the case of a MAY, SHOULD, or SHOULD NOT that applies to servers, clients need to be aware that there are servers which may or may not take the specified action, and they need to be prepared for either eventuality.

12.2.  Limitations on internationalization-related processing in the NFSv4 context

   There are a number of noteworthy circumstances that limit the degree to which internationalization-related processing can be made universal with regard to NFSv4 clients and servers:

   o  The NFSv4 client is part of an extensive set of client-side software components whose design and internal interfaces are not within the IETF's purview, limiting the degree to which a particular character encoding may be made standard.

   o  Server-side handling of file component names is typically implemented within a server-side physical file system, whose handling of character encoding and normalization is not specifiable by the IETF.

   o  Typical implementation patterns in Unix systems result in the NFSv4 client having no knowledge of the character encoding being used, which may even vary between processes on the same client system.

   o  Users may need access to files stored previously with non-UTF-8 encodings, or with UTF-8 encodings that do not match any particular normalization form.

12.3.  Summary of Server Behavior Types

   As mentioned in Section 12.6, servers MAY reject component name strings that are not valid UTF-8. This leads to a number of types of valid server behavior as outlined below. When these are combined with the valid normalization-related behaviors as described in Section 12.4, this leads to the combined behaviors outlined below.

   o  Servers which limit file component names to UTF-8 strings exist with normalization-related handling described in Section 12.4. These are best described as "UTF-8-only servers".

   o  Servers which do not limit file component names to UTF-8 strings are very common and are necessary to deal with clients/applications not oriented to the use of UTF-8. Such servers ignore normalization-related issues and there is no way for them to implement either normalization or representation-independent lookups. These are best described as "UTF-8-unaware servers" since they treat file component names as uninterpreted strings of bytes and have no knowledge of the characters represented. See Section 12.7 for details.

   o  It is possible for a server to allow component names which are not valid UTF-8, while still being aware of the structure of UTF-8 strings. Such servers could implement either normalization or representation-independent lookups, but apply those techniques only to valid UTF-8 strings. Such servers are not common, but it is possible to configure at least one known server to have this behavior.
This behavior SHOULD NOT be used due to the possibility 7937 that a filename using one character set may, by coincidence, have 7938 the appearance of a UTF-8 filename; the results of UTF-8 7939 normalization or representation-independent lookups are unlikely 7940 to be correct in all cases with respect to the other character 7941 set. 7943 12.4. String Encoding 7945 Strings that potentially contain characters outside the ASCII range 7946 [RFC20] are generally represented in NFSv4 using the UTF-8 encoding 7947 [RFC3629] of Unicode [UNICODE]. See [RFC3629] for precise encoding 7948 and decoding rules. 7950 Some details of the protocol treatment depend on the type of string: 7952 o For strings which are component names, the preferred encoding for 7953 any non-ASCII characters is the UTF-8 representation of Unicode. 7955 In many cases, clients have no knowledge of the encoding being 7956 used, with the encoding done at user-level under control of a per- 7957 process locale specification. As a result, it may be impossible 7958 for the NFSv4 client to enforce use of UTF-8. Use of non-UTF-8 7959 encodings can be problematic since it may interfere with access to 7960 files stored using other forms of name encoding. Also, 7961 normalization-related processing (see Section 12.5) of a string 7962 not encoded in UTF-8 could result in inappropriate name 7963 modification or aliasing. In cases in which one has a non-UTF-8 7964 encoded name that accidentally conforms to UTF-8 rules, 7965 substitution of canonically equivalent strings can change the non- 7966 UTF-8 encoded name drastically. 7968 The kinds of modification and aliasing mentioned here can lead to 7969 both false negatives and false positives depending on the strings 7970 in question, which can result in security issues such as elevation 7971 of privilege and denial of service (see [RFC6943] for further 7972 discussion).
7974 o For strings based on domain names, non-ASCII characters MUST be 7975 represented using the UTF-8 encoding of Unicode, and additional 7976 string format restrictions apply. See Section 12.6 for details. 7978 o The contents of symbolic links (of type linktext4 in the XDR) MUST 7979 be treated as opaque data by NFSv4 servers. Although UTF-8 7980 encoding is often used, it need not be. In this respect, the 7981 contents of symbolic links are like the contents of regular files 7982 in that their encoding is not within the scope of this 7983 specification. 7985 o For other sorts of strings, any non-ASCII characters SHOULD be 7986 represented using the UTF-8 encoding of Unicode. 7988 12.5. Normalization 7990 The client and server operating environments may differ in their 7991 policies and operational methods with respect to character 7992 normalization (See [UNICODE] for a discussion of normalization 7993 forms). This difference may also exist between applications on the 7994 same client. This adds to the difficulty of providing a single 7995 normalization policy for the protocol that allows for maximal 7996 interoperability. This issue is similar to the character case issues 7997 where the server may or may not support case insensitive file name 7998 matching and may or may not preserve the character case when storing 7999 file names. The protocol does not mandate a particular behavior but 8000 allows for a range of useful behaviors. 8002 The NFS version 4 protocol does not mandate the use of a particular 8003 normalization form at this time. A subsequent minor version of the 8004 NFSv4 protocol might specify a particular normalization form. 8005 Therefore, the server and client can expect that they may receive 8006 unnormalized characters within protocol requests and responses. 
If 8007 the operating environment requires normalization, then the 8008 implementation will need to normalize the various UTF-8 encoded 8009 strings within the protocol before presenting the information to an 8010 application (at the client) or local file system (at the server). 8012 Server implementations MAY normalize file names to conform to a 8013 particular normalization form before using the resulting string when 8014 looking up or creating a file. Servers MAY also perform 8015 normalization-insensitive string comparisons without modifying the 8016 names to match a particular normalization form. Except in cases in 8017 which component names are excluded from normalization-related 8018 handling because they are not valid UTF-8 strings, a server MUST make 8019 the same choice (as to whether to normalize or not, the target form 8020 of normalization and whether to do normalization-insensitive string 8021 comparisons) in the same way for all accesses to a particular file 8022 system. Servers SHOULD NOT reject a file name because it does not 8023 conform to a particular normalization form as this may deny access to 8024 clients that use a different normalization form. 8026 12.6. Types with Processing Defined by Other Internet Areas 8028 There are two types of strings that NFSv4 deals with that are based 8029 on domain names. Processing of such strings is defined by other 8030 Internet standards, and hence the processing behavior for such 8031 strings should be consistent across all server operating systems and 8032 server file systems. 8034 These are as follows: 8036 o Server names as they appear in the fs_locations attribute. Note 8037 that for most purposes, such server names will only be sent by the 8038 server to the client. The exception is use of the fs_locations 8039 attribute in a VERIFY or NVERIFY operation. 8041 o Principal suffixes which are used to denote sets of users and 8042 groups, and are in the form of domain names. 
8044 The general rules for handling all of these domain-related strings 8045 are similar and independent of the role of the sender or receiver as 8046 client or server although the consequences of failure to obey these 8047 rules may be different for client or server. The server can report 8048 errors when it is sent invalid strings, whereas the client will 8049 simply ignore invalid strings or use a default value in their place. 8051 The string sent SHOULD be in the form of one or more U-labels as 8052 defined by [RFC5890]. If that is impractical, it can instead be in 8053 the form of one or more LDH labels [RFC5890] or a UTF-8 domain name 8054 that contains labels that are not properly formatted U-labels. The 8055 receiver needs to be able to accept domain and server names in any of 8056 the formats allowed. The server MUST reject, using the error 8057 NFS4ERR_INVAL, a string that is not valid UTF-8, or that contains an 8058 ASCII label that is not a valid LDH label, or that contains an XN- 8059 label (begins with "xn--") for which the characters after "xn--" are 8060 not valid output of the Punycode algorithm [RFC3492]. 8062 When a domain string is part of id@domain or group@domain, there are 8063 two possible approaches: 8065 1. The server treats the domain string as a series of U-labels. In 8066 cases where the domain string is a series of A-labels or NR-LDH 8067 labels, it converts them to U-labels using the Punycode algorithm 8068 [RFC3492]. In cases where the domain string is a series of other 8069 sorts of LDH labels, the server can use the ToUnicode function 8070 defined in [RFC3490] to convert the string to a series of labels 8071 that generally conform to the U-label syntax.
In cases where the 8072 domain string is a UTF-8 string that contains non-U-labels, the 8073 server can attempt to use the ToASCII function defined in 8074 [RFC3490] and then the ToUnicode function on the string to 8075 convert it to a series of labels that generally conform to the 8076 U-label syntax. As a result, the domain string returned within a 8077 userid on a GETATTR may not match that sent when the userid is 8078 set using SETATTR, although when this happens, the domain will be 8079 in a form that generally conforms to the U-label syntax. 8081 2. The server does not attempt to treat the domain string as a 8082 series of U-labels; specifically, it does not map a domain string 8083 which is not a U-label into a U-label using the methods described 8084 above. As a result, the domain string returned on a GETATTR of 8085 the userid MUST be the same as that used when setting the userid 8086 by the SETATTR. 8088 A server SHOULD use the first method. 8090 For VERIFY and NVERIFY, additional string processing requirements 8091 apply to verification of the owner and owner_group attributes, see 8092 Section 5.9. 8094 12.7. UTF-8 Related Errors 8096 Where the client sends an invalid UTF-8 string, the server MAY return 8097 an NFS4ERR_INVAL error. This includes cases in which inappropriate 8098 prefixes are detected and where the count includes trailing bytes 8099 that do not constitute a full UCS character. 8101 Requirements for server handling of component names which are not 8102 valid UTF-8, when a server does not return NFS4ERR_INVAL in response 8103 to receiving them, are described in Section 12.8. 8105 Where the client supplied string is not rejected with NFS4ERR_INVAL 8106 but contains characters that are not supported by the server as a 8107 value for that string (e.g., names containing slashes, or characters 8108 that do not fit into 16 bits when converted from UTF-8 to a Unicode 8109 codepoint), the server should return an NFS4ERR_BADCHAR error.
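The two checks described so far in this section can be sketched as follows. This is only an illustration under stated assumptions, not a definitive implementation: the function name check_component_name, its parameters, and the default set of unsupported characters are hypothetical and do not appear in the protocol. The error numbers are those listed in Table 6. Python's strict UTF-8 decoder is used as a stand-in for whatever validity check a real server applies.

```python
# Hypothetical sketch of the Section 12.7 checks; names and defaults are
# illustrative assumptions, not part of the NFSv4 protocol.
NFS4_OK = 0
NFS4ERR_INVAL = 22
NFS4ERR_BADCHAR = 10040


def check_component_name(name, utf8_only=True,
                         unsupported=frozenset("/\x00"), bmp_only=False):
    """Return the status a server of the assumed type might give for name.

    name        -- component name as received on the wire (bytes)
    utf8_only   -- True models a server that rejects invalid UTF-8
    unsupported -- characters this (assumed) server does not allow
    bmp_only    -- True models a server limited to 16-bit code points
    """
    try:
        # Strict decoding fails on bad prefixes and truncated sequences.
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # A server MAY reject; one that does not treats the name as
        # uninterpreted bytes (see Section 12.8).
        return NFS4ERR_INVAL if utf8_only else NFS4_OK

    for ch in text:
        if ch in unsupported:
            return NFS4ERR_BADCHAR
        if bmp_only and ord(ch) > 0xFFFF:
            return NFS4ERR_BADCHAR
    return NFS4_OK


if __name__ == "__main__":
    for sample in (b"notes.txt", b"caf\xc3", b"a/b"):
        print(sample, check_component_name(sample))
```

The utf8_only flag exists because rejection of invalid UTF-8 is a MAY: the same byte string can be refused by one server and stored unchanged by another.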
8111 Where a UTF-8 string is used as a file name, and the file system, 8112 while supporting all of the characters within the name, does not 8113 allow that particular name to be used, the server should return the 8114 error NFS4ERR_BADNAME. This includes such situations as file system 8115 prohibitions of "." and ".." as file names for certain operations, 8116 and similar constraints. 8118 12.8. Servers that accept file component names that are not valid UTF-8 8119 strings 8121 As stated previously, servers MAY accept, on all or on some subset of 8122 the physical file systems exported, component names that are not 8123 valid UTF-8 strings. A typical pattern is for a server to use 8124 UTF-8-unaware physical file systems that treat component names as 8125 uninterpreted strings of bytes, rather than having any awareness of 8126 the character set being used. 8128 Such servers SHOULD NOT change the stored representation of component 8129 names from those received on the wire, and SHOULD use an octet-by- 8130 octet comparison of component name strings to determine equivalence 8131 (as opposed to any broader notion of string comparison). This is 8132 because the server has no knowledge of the character encoding being 8133 used. 8135 Nonetheless, when such a server uses a broader notion of string 8136 equivalence than recommended in the preceding paragraph, the following 8137 considerations apply: 8139 o Outside of 7-bit ASCII, string processing that changes string 8140 contents is usually specific to a character set and hence is 8141 generally unsafe when the character set is unknown. This 8142 processing could change the filename in an unexpected fashion, 8143 rendering the file inaccessible to the application or client that 8144 created or renamed the file and to others expecting the original 8145 filename. Hence, such processing should not be performed because 8146 doing so is likely to result in incorrect string modification or 8147 aliasing.
8149 o Unicode normalization is particularly dangerous, as such 8150 processing assumes that the string is UTF-8. When that assumption 8151 is false because a different character set was used to create the 8152 filename, normalization may corrupt the filename with respect to 8153 that character set, rendering the file inaccessible to the 8154 application that created it and others expecting the original 8155 filename. Hence, Unicode normalization SHOULD NOT be performed, 8156 because it may cause incorrect string modification or aliasing. 8158 When the above recommendations are not followed, the resulting string 8159 modification and aliasing can lead to both false negatives and false 8160 positives depending on the strings in question, which can result in 8161 security issues such as elevation of privilege and denial of service 8162 (see [RFC6943] for further discussion). 8164 13. Error Values 8166 NFS error numbers are assigned to failed operations within a Compound 8167 (COMPOUND or CB_COMPOUND) request. A Compound request contains a 8168 number of NFS operations that have their results encoded in sequence 8169 in a Compound reply. The results of successful operations will 8170 consist of an NFS4_OK status followed by the encoded results of the 8171 operation. If an NFS operation fails, an error status will be 8172 entered in the reply and the Compound request will be terminated. 8174 13.1. 
Error Definitions

8176 Protocol Error Definitions

8178 +-----------------------------+--------+-------------------+
8179 | Error                       | Number | Description       |
8180 +-----------------------------+--------+-------------------+
8181 | NFS4_OK                     | 0      | Section 13.1.3.1  |
8182 | NFS4ERR_ACCESS              | 13     | Section 13.1.6.1  |
8183 | NFS4ERR_ADMIN_REVOKED       | 10047  | Section 13.1.5.1  |
8184 | NFS4ERR_ATTRNOTSUPP         | 10032  | Section 13.1.11.1 |
8185 | NFS4ERR_BADCHAR             | 10040  | Section 13.1.7.1  |
8186 | NFS4ERR_BADHANDLE           | 10001  | Section 13.1.2.1  |
8187 | NFS4ERR_BADNAME             | 10041  | Section 13.1.7.2  |
8188 | NFS4ERR_BADOWNER            | 10039  | Section 13.1.11.2 |
8189 | NFS4ERR_BADTYPE             | 10007  | Section 13.1.4.1  |
8190 | NFS4ERR_BADXDR              | 10036  | Section 13.1.1.1  |
8191 | NFS4ERR_BAD_COOKIE          | 10003  | Section 13.1.1.2  |
8192 | NFS4ERR_BAD_RANGE           | 10042  | Section 13.1.8.1  |
8193 | NFS4ERR_BAD_SEQID           | 10026  | Section 13.1.8.2  |
8194 | NFS4ERR_BAD_STATEID         | 10025  | Section 13.1.5.2  |
8195 | NFS4ERR_CB_PATH_DOWN        | 10048  | Section 13.1.12.1 |
8196 | NFS4ERR_CLID_INUSE          | 10017  | Section 13.1.10.1 |
8197 | NFS4ERR_DEADLOCK            | 10045  | Section 13.1.8.3  |
8198 | NFS4ERR_DELAY               | 10008  | Section 13.1.1.3  |
8199 | NFS4ERR_DENIED              | 10010  | Section 13.1.8.4  |
8200 | NFS4ERR_DQUOT               | 69     | Section 13.1.4.2  |
8201 | NFS4ERR_EXIST               | 17     | Section 13.1.4.3  |
8202 | NFS4ERR_EXPIRED             | 10011  | Section 13.1.5.3  |
8203 | NFS4ERR_FBIG                | 27     | Section 13.1.4.4  |
8204 | NFS4ERR_FHEXPIRED           | 10014  | Section 13.1.2.2  |
8205 | NFS4ERR_FILE_OPEN           | 10046  | Section 13.1.4.5  |
8206 | NFS4ERR_GRACE               | 10013  | Section 13.1.9.1  |
8207 | NFS4ERR_INVAL               | 22     | Section 13.1.1.4  |
8208 | NFS4ERR_IO                  | 5      | Section 13.1.4.6  |
8209 | NFS4ERR_ISDIR               | 21     | Section 13.1.2.3  |
8210 | NFS4ERR_LEASE_MOVED         | 10031  | Section 13.1.5.4  |
8211 | NFS4ERR_LOCKED              | 10012  | Section 13.1.8.5  |
8212 | NFS4ERR_LOCKS_HELD          | 10037  | Section 13.1.8.6  |
8213 | NFS4ERR_LOCK_NOTSUPP        | 10043  | Section 13.1.8.7  |
8214 | NFS4ERR_LOCK_RANGE          | 10028  | Section 13.1.8.8  |
8215 | NFS4ERR_MINOR_VERS_MISMATCH | 10021  | Section 13.1.3.2  |
8216 | NFS4ERR_MLINK               | 31     | Section 13.1.4.7  |
8217 | NFS4ERR_MOVED               | 10019  | Section 13.1.2.4  |
8218 | NFS4ERR_NAMETOOLONG         | 63     | Section 13.1.7.3  |
8219 | NFS4ERR_NOENT               | 2      | Section 13.1.4.8  |
8220 | NFS4ERR_NOFILEHANDLE        | 10020  | Section 13.1.2.5  |
8221 | NFS4ERR_NOSPC               | 28     | Section 13.1.4.9  |
8222 | NFS4ERR_NOTDIR              | 20     | Section 13.1.2.6  |
8223 | NFS4ERR_NOTEMPTY            | 66     | Section 13.1.4.10 |
8224 | NFS4ERR_NOTSUPP             | 10004  | Section 13.1.1.5  |
8225 | NFS4ERR_NOT_SAME            | 10027  | Section 13.1.11.3 |
8226 | NFS4ERR_NO_GRACE            | 10033  | Section 13.1.9.2  |
8227 | NFS4ERR_NXIO                | 6      | Section 13.1.4.11 |
8228 | NFS4ERR_OLD_STATEID         | 10024  | Section 13.1.5.5  |
8229 | NFS4ERR_OPENMODE            | 10038  | Section 13.1.8.9  |
8230 | NFS4ERR_OP_ILLEGAL          | 10044  | Section 13.1.3.3  |
8231 | NFS4ERR_PERM                | 1      | Section 13.1.6.2  |
8232 | NFS4ERR_RECLAIM_BAD         | 10034  | Section 13.1.9.3  |
8233 | NFS4ERR_RECLAIM_CONFLICT    | 10035  | Section 13.1.9.4  |
8234 | NFS4ERR_RESOURCE            | 10018  | Section 13.1.3.4  |
8235 | NFS4ERR_RESTOREFH           | 10030  | Section 13.1.4.12 |
8236 | NFS4ERR_ROFS                | 30     | Section 13.1.4.13 |
8237 | NFS4ERR_SAME                | 10009  | Section 13.1.11.4 |
8238 | NFS4ERR_SERVERFAULT         | 10006  | Section 13.1.1.6  |
8239 | NFS4ERR_SHARE_DENIED        | 10015  | Section 13.1.8.10 |
8240 | NFS4ERR_STALE               | 70     | Section 13.1.2.7  |
8241 | NFS4ERR_STALE_CLIENTID      | 10022  | Section 13.1.10.2 |
8242 | NFS4ERR_STALE_STATEID       | 10023  | Section 13.1.5.6  |
8243 | NFS4ERR_SYMLINK             | 10029  | Section 13.1.2.8  |
8244 | NFS4ERR_TOOSMALL            | 10005  | Section 13.1.1.7  |
8245 | NFS4ERR_WRONGSEC            | 10016  | Section 13.1.6.3  |
8246 | NFS4ERR_XDEV                | 18     | Section 13.1.4.14 |
8247 +-----------------------------+--------+-------------------+

8249 Table 6

8251 13.1.1. General Errors 8253 This section deals with errors that are applicable to a broad set of 8254 different purposes. 8256 13.1.1.1.
NFS4ERR_BADXDR (Error Code 10036) 8258 The arguments for this operation do not match those specified in the 8259 XDR definition. This includes situations in which the request ends 8260 before all the arguments have been seen. Note that this error 8261 applies when fixed enumerations (these include booleans) have a value 8262 within the input stream which is not valid for the enum. A replier 8263 may pre-parse all operations for a Compound procedure before doing 8264 any operation execution and return RPC-level XDR errors in that case. 8266 13.1.1.2. NFS4ERR_BAD_COOKIE (Error Code 10003) 8268 Used for operations that provide a set of information indexed by some 8269 quantity provided by the client or cookie sent by the server for an 8270 earlier invocation. Where the value cannot be used for its intended 8271 purpose, this error results. 8273 13.1.1.3. NFS4ERR_DELAY (Error Code 10008) 8275 For any of a number of reasons, the replier could not process this 8276 operation in what was deemed a reasonable time. The client should 8277 wait and then try the request with a new RPC transaction ID. 8279 Some examples of situations that might lead to this error: 8281 o A server that supports hierarchical storage receives a request to 8282 process a file that had been migrated. 8284 o An operation requires a delegation recall to proceed and waiting 8285 for this delegation recall makes processing this request in a 8286 timely fashion impossible. 8288 13.1.1.4. NFS4ERR_INVAL (Error Code 22) 8290 The arguments for this operation are not valid for some reason, even 8291 though they do match those specified in the XDR definition for the 8292 request. 8294 13.1.1.5. NFS4ERR_NOTSUPP (Error Code 10004) 8296 Operation not supported, either because the operation is an OPTIONAL 8297 one and is not supported by this server or because the operation MUST 8298 NOT be implemented in the current minor version. 8300 13.1.1.6.
NFS4ERR_SERVERFAULT (Error Code 10006) 8302 An error occurred on the server which does not map to any of the 8303 specific legal NFSv4 protocol error values. The client should 8304 translate this into an appropriate error. UNIX clients may choose to 8305 translate this to EIO. 8307 13.1.1.7. NFS4ERR_TOOSMALL (Error Code 10005) 8309 Used where an operation returns a variable amount of data, with a 8310 limit specified by the client. Where the data returned cannot be 8311 fitted within the limit specified by the client, this error results. 8313 13.1.2. Filehandle Errors 8315 These errors deal with the situation in which the current or saved 8316 filehandle, or the filehandle passed to PUTFH intended to become the 8317 current filehandle, is invalid in some way. This includes situations 8318 in which the filehandle is a valid filehandle in general but is not 8319 of the appropriate object type for the current operation. 8321 Where the error description indicates a problem with the current or 8322 saved filehandle, it is to be understood that filehandles are only 8323 checked for the condition if they are implicit arguments of the 8324 operation in question. 8326 13.1.2.1. NFS4ERR_BADHANDLE (Error Code 10001) 8328 Illegal NFS filehandle for the current server. The current 8329 filehandle failed internal consistency checks. Once accepted as 8330 valid (by PUTFH), no subsequent status change can cause the 8331 filehandle to generate this error. 8333 13.1.2.2. NFS4ERR_FHEXPIRED (Error Code 10014) 8335 A current or saved filehandle which is an argument to the current 8336 operation is volatile and has expired at the server. 8338 13.1.2.3. NFS4ERR_ISDIR (Error Code 21) 8340 The current or saved filehandle designates a directory when the 8341 current operation does not allow a directory to be accepted as the 8342 target of this operation. 8344 13.1.2.4. 
NFS4ERR_MOVED (Error Code 10019) 8346 The file system which contains the current filehandle object is not 8347 present at the server. It may have been relocated, migrated to 8348 another server or may have never been present. The client may obtain 8349 the new file system location by obtaining the "fs_locations" 8350 attribute for the current filehandle. For further discussion, refer 8351 to Section 8. 8353 13.1.2.5. NFS4ERR_NOFILEHANDLE (Error Code 10020) 8355 The logical current or saved filehandle value is required by the 8356 current operation and is not set. This may be a result of a 8357 malformed COMPOUND operation (i.e., no PUTFH or PUTROOTFH before an 8358 operation that requires the current filehandle be set). 8360 13.1.2.6. NFS4ERR_NOTDIR (Error Code 20) 8362 The current (or saved) filehandle designates an object which is not a 8363 directory for an operation in which a directory is required. 8365 13.1.2.7. NFS4ERR_STALE (Error Code 70) 8367 The current or saved filehandle value designating an argument to the 8368 current operation is invalid. The file system object referred to by 8369 that filehandle no longer exists or access to it has been revoked. 8371 13.1.2.8. NFS4ERR_SYMLINK (Error Code 10029) 8373 The current filehandle designates a symbolic link when the current 8374 operation does not allow a symbolic link as the target. 8376 13.1.3. Compound Structure Errors 8378 This section deals with errors that relate to overall structure of a 8379 Compound request (by which we mean to include both COMPOUND and 8380 CB_COMPOUND), rather than to particular operations. 8382 There are a number of basic constraints on the operations that may 8383 appear in a Compound request. 8385 13.1.3.1. NFS4_OK (Error Code 0) 8387 Indicates the operation completed successfully, in that all of the 8388 constituent operations completed without error. 8390 13.1.3.2.
NFS4ERR_MINOR_VERS_MISMATCH (Error Code 10021) 8392 The minor version specified is not one that the current listener 8393 supports. This value is returned in the overall status for the 8394 Compound but is not associated with a specific operation since the 8395 results must specify a result count of zero. 8397 13.1.3.3. NFS4ERR_OP_ILLEGAL (Error Code 10044) 8399 The operation code is not a valid one for the current Compound 8400 procedure. The opcode in the result stream matched with this error 8401 is the ILLEGAL value, although the value that appears in the request 8402 stream may be different. Where an illegal value appears and the 8403 replier pre-parses all operations for a Compound procedure before 8404 doing any operation execution, an RPC-level XDR error may be returned 8405 in this case. 8407 13.1.3.4. NFS4ERR_RESOURCE (Error Code 10018) 8409 For the processing of the Compound procedure, the server may exhaust 8410 available resources and cannot continue processing operations within 8411 the Compound procedure. This error will be returned from the server 8412 in those instances of resource exhaustion related to the processing 8413 of the Compound procedure. 8415 13.1.4. File System Errors 8417 These errors describe situations which occurred in the underlying 8418 file system implementation rather than in the protocol or any NFSv4.x 8419 feature. 8421 13.1.4.1. NFS4ERR_BADTYPE (Error Code 10007) 8423 An attempt was made to create an object with an inappropriate type 8424 specified to CREATE. This may be because the type is undefined, 8425 because it is a type not supported by the server, or because it is a 8426 type for which create is not intended such as a regular file or named 8427 attribute, for which OPEN is used to do the file creation. 8429 13.1.4.2. NFS4ERR_DQUOT (Error Code 69) 8431 Resource (quota) hard limit exceeded. The user's resource limit on 8432 the server has been exceeded. 8434 13.1.4.3.
NFS4ERR_EXIST (Error Code 17) 8436 A file system object of the specified target name (when creating, 8437 renaming or linking) already exists. 8439 13.1.4.4. NFS4ERR_FBIG (Error Code 27) 8441 File system object too large. The operation would have caused a file 8442 system object to grow beyond the server's limit. 8444 13.1.4.5. NFS4ERR_FILE_OPEN (Error Code 10046) 8446 The operation is not allowed because a file system object involved in 8447 the operation is currently open. Servers may, but are not required 8448 to, disallow linking-to, removing, or renaming open file system 8449 objects. 8451 13.1.4.6. NFS4ERR_IO (Error Code 5) 8453 Indicates that an I/O error occurred for which the file system was 8454 unable to provide recovery. 8456 13.1.4.7. NFS4ERR_MLINK (Error Code 31) 8458 The request would have caused the server's limit for the number of 8459 hard links a file system object may have to be exceeded. 8461 13.1.4.8. NFS4ERR_NOENT (Error Code 2) 8463 Indicates no such file or directory. The file system object 8464 referenced by the name specified does not exist. 8466 13.1.4.9. NFS4ERR_NOSPC (Error Code 28) 8468 Indicates no space left on device. The operation would have caused 8469 the server's file system to exceed its limit. 8471 13.1.4.10. NFS4ERR_NOTEMPTY (Error Code 66) 8473 An attempt was made to remove a directory that was not empty. 8475 13.1.4.11. NFS4ERR_NXIO (Error Code 6) 8477 I/O error. No such device or address. 8479 13.1.4.12. NFS4ERR_RESTOREFH (Error Code 10030) 8481 The RESTOREFH operation does not have a saved filehandle (identified 8482 by SAVEFH) to operate upon. 8484 13.1.4.13. NFS4ERR_ROFS (Error Code 30) 8486 Indicates a read-only file system. A modifying operation was 8487 attempted on a read-only file system. 8489 13.1.4.14. NFS4ERR_XDEV (Error Code 18) 8491 Indicates an attempt to do an operation, such as linking, that 8492 inappropriately crosses a boundary.
This may be due to such 8493 boundaries as: 8495 o That between file systems (where the fsids are different). 8497 o That between different named attribute directories or between a 8498 named attribute directory and an ordinary directory. 8500 o That between regions of a file system that the file system 8501 implementation treats as separate (for example for space 8502 accounting purposes), and where cross-connections between the 8503 regions are not allowed. 8505 13.1.5. State Management Errors 8507 These errors indicate problems with the stateid (or one of the 8508 stateids) passed to a given operation. This includes situations in 8509 which the stateid is invalid as well as situations in which the 8510 stateid is valid but designates revoked locking state. Depending on 8511 the operation, the stateid when valid may designate opens, byte-range 8512 locks, or file delegations. 8514 13.1.5.1. NFS4ERR_ADMIN_REVOKED (Error Code 10047) 8516 A stateid designates locking state of any type that has been revoked 8517 due to administrative interaction, possibly while the lease is valid, 8518 or because a delegation was revoked because of failure to return it, 8519 while the lease was valid. 8521 13.1.5.2. NFS4ERR_BAD_STATEID (Error Code 10025) 8523 A stateid generated by the current server instance was used which 8524 either: 8526 o Does not designate any locking state (either current or 8527 superseded) for a current (state-owner, file) pair. 8529 o Designates locking state that was freed after lease expiration but 8530 without any lease cancellation, as may happen in the handling of 8531 "courtesy locks". 8533 13.1.5.3. NFS4ERR_EXPIRED (Error Code 10011) 8535 A stateid or clientid designates locking state of any type that has 8536 been revoked or released due to cancellation of the client's lease, 8537 either immediately upon lease expiration, or following a later 8538 request for a conflicting lock. 8540 13.1.5.4.
NFS4ERR_LEASE_MOVED (Error Code 10031) 8542 A lease being renewed is associated with a file system that has been 8543 migrated to a new server. 8545 13.1.5.5. NFS4ERR_OLD_STATEID (Error Code 10024) 8547 A stateid is provided with a seqid value that is not the most 8548 current. 8550 13.1.5.6. NFS4ERR_STALE_STATEID (Error Code 10023) 8552 A stateid generated by an earlier server instance was used. 8554 13.1.6. Security Errors 8556 These are the various permission-related errors in NFSv4. 8558 13.1.6.1. NFS4ERR_ACCESS (Error Code 13) 8560 Indicates permission denied. The caller does not have the correct 8561 permission to perform the requested operation. Contrast this with 8562 NFS4ERR_PERM (Section 13.1.6.2), which restricts itself to owner or 8563 privileged user permission failures. 8565 13.1.6.2. NFS4ERR_PERM (Error Code 1) 8567 Indicates requester is not the owner. The operation was not allowed 8568 because the caller is neither a privileged user (root) nor the owner 8569 of the target of the operation. 8571 13.1.6.3. NFS4ERR_WRONGSEC (Error Code 10016) 8573 Indicates that the security mechanism being used by the client for 8574 the operation does not match the server's security policy. The 8575 client should change the security mechanism being used and re-send 8576 the operation. SECINFO can be used to determine the appropriate 8577 mechanism. 8579 13.1.7. Name Errors 8581 Names in NFSv4 are UTF-8 strings. When the strings are of length 8582 zero, the error NFS4ERR_INVAL results. When they are not valid UTF-8 8583 the error NFS4ERR_INVAL also results, but servers may accommodate 8584 file systems with different character formats and not return this 8585 error. Besides this, there are a number of other errors to indicate 8586 specific problems with names. 8588 13.1.7.1. NFS4ERR_BADCHAR (Error Code 10040) 8590 A UTF-8 string contains a character which is not supported by the 8591 server in the context in which it is being used. 8593 13.1.7.2.
NFS4ERR_BADNAME (Error Code 10041) 8595 A name string in a request consisted of valid UTF-8 characters 8596 supported by the server but the name is not supported by the server 8597 as a valid name for the current operation. An example might be creating 8598 a file or directory named ".." on a server whose file system uses 8599 that name for links to parent directories. 8601 This error should not be returned due to a normalization issue in a 8602 string. When a file system keeps names in a particular normalization 8603 form, it is the server's responsibility to do the appropriate 8604 normalization, rather than rejecting the name. 8606 13.1.7.3. NFS4ERR_NAMETOOLONG (Error Code 63) 8608 Returned when the filename in an operation exceeds the server's 8609 implementation limit. 8611 13.1.8. Locking Errors 8613 This section deals with errors related to locking, both as to share 8614 reservations and byte-range locking. It does not deal with errors 8615 specific to the process of reclaiming locks. Those are dealt with in 8616 the next section. 8618 13.1.8.1. NFS4ERR_BAD_RANGE (Error Code 10042) 8620 The range for a LOCK, LOCKT, or LOCKU operation is not appropriate to 8621 the allowable range of offsets for the server. E.g., this error 8622 results when a server which only supports 32-bit ranges receives a 8623 range that cannot be handled by that server. (See Section 15.12.4). 8625 13.1.8.2. NFS4ERR_BAD_SEQID (Error Code 10026) 8627 The sequence number (seqid) in a locking request is neither the next 8628 expected number nor the last number processed. 8630 13.1.8.3. NFS4ERR_DEADLOCK (Error Code 10045) 8632 The server has been able to determine a file locking deadlock 8633 condition for a blocking lock request. 8635 13.1.8.4. NFS4ERR_DENIED (Error Code 10010) 8637 An attempt to lock a file is denied. Since this may be a temporary 8638 condition, the client is encouraged to re-send the lock request until 8639 the lock is accepted.
See Section 9.4 for a discussion of the re- 8640 send. 8642 13.1.8.5. NFS4ERR_LOCKED (Error Code 10012) 8644 A read or write operation was attempted on a file where there was a 8645 conflict between the I/O and an existing lock: 8647 o There is a share reservation inconsistent with the I/O being done. 8649 o The range to be read or written intersects an existing mandatory 8650 byte range lock. 8652 13.1.8.6. NFS4ERR_LOCKS_HELD (Error Code 10037) 8654 An operation was prevented by the unexpected presence of locks. 8656 13.1.8.7. NFS4ERR_LOCK_NOTSUPP (Error Code 10043) 8658 A locking request was attempted which would require the upgrade or 8659 downgrade of a lock range already held by the owner when the server 8660 does not support atomic upgrade or downgrade of locks. 8662 13.1.8.8. NFS4ERR_LOCK_RANGE (Error Code 10028) 8664 A lock request is operating on a range that overlaps in part a 8665 currently held lock for the current lock-owner and does not precisely 8666 match a single such lock where the server does not support this type 8667 of request, and thus does not implement POSIX locking semantics 8668 [fcntl]. See Section 15.12.5, Section 15.13.5, and Section 15.14.5 8669 for a discussion of how this applies to LOCK, LOCKT, and LOCKU 8670 respectively. 8672 13.1.8.9. NFS4ERR_OPENMODE (Error Code 10038) 8674 The client attempted a READ, WRITE, LOCK or other operation not 8675 sanctioned by the stateid passed (e.g., writing to a file opened only 8676 for read). 8678 13.1.8.10. NFS4ERR_SHARE_DENIED (Error Code 10015) 8680 An attempt to OPEN a file with a share reservation has failed because 8681 of a share conflict. 8683 13.1.9. Reclaim Errors 8685 These errors relate to the process of reclaiming locks after a server 8686 restart. 8688 13.1.9.1. NFS4ERR_GRACE (Error Code 10013) 8690 The server is in its recovery or grace period which should at least 8691 match the lease period of the server. 
A locking request other than a 8692 reclaim could not be granted during that period. 8694 13.1.9.2. NFS4ERR_NO_GRACE (Error Code 10033) 8696 The server cannot guarantee that it has not granted state to another 8697 client that may conflict with this client's state. No further 8698 reclaims from this client will succeed. 8700 13.1.9.3. NFS4ERR_RECLAIM_BAD (Error Code 10034) 8702 The server cannot guarantee that it has not granted state to another 8703 client that may conflict with the requested state. However, this 8704 applies only to the state requested in this call; further reclaims 8705 may succeed. 8707 Unlike NFS4ERR_RECLAIM_CONFLICT, this can occur between correctly 8708 functioning clients and servers: the "edge condition" scenarios 8709 described in Section 9.6.3.1 leave only the server knowing whether 8710 the client's locks are still valid, and NFS4ERR_RECLAIM_BAD is the 8711 server's way of informing the client that they are not. 8713 13.1.9.4. NFS4ERR_RECLAIM_CONFLICT (Error Code 10035) 8715 The reclaim attempted by the client conflicts with a lock already 8716 held by another client. Unlike NFS4ERR_RECLAIM_BAD, this can only 8717 occur if one of the clients misbehaved. 8719 13.1.10. Client Management Errors 8721 This section deals with errors associated with requests used to 8722 create and manage client IDs. 8724 13.1.10.1. NFS4ERR_CLID_INUSE (Error Code 10017) 8726 The SETCLIENTID operation has found that a client ID is already in use 8727 by another client. 8729 13.1.10.2. NFS4ERR_STALE_CLIENTID (Error Code 10022) 8731 A client ID not recognized by the server was used in a locking or 8732 SETCLIENTID_CONFIRM request. 8734 13.1.11. Attribute Handling Errors 8736 This section deals with errors specific to attribute handling within 8737 NFSv4. 8739 13.1.11.1. NFS4ERR_ATTRNOTSUPP (Error Code 10032) 8741 An attribute specified is not supported by the server. This error 8742 MUST NOT be returned by the GETATTR operation. 8744 13.1.11.2.
NFS4ERR_BADOWNER (Error Code 10039) 8746 Returned when an owner or owner_group attribute value or the who 8747 field of an ace within an ACL attribute value cannot be translated to 8748 a local representation. 8750 13.1.11.3. NFS4ERR_NOT_SAME (Error Code 10027) 8752 This error is returned by the VERIFY operation to signify that the 8753 attributes compared were not the same as those provided in the 8754 client's request. 8756 13.1.11.4. NFS4ERR_SAME (Error Code 10009) 8758 This error is returned by the NVERIFY operation to signify that the 8759 attributes compared were the same as those provided in the client's 8760 request. 8762 13.1.12. Miscellaneous Errors 8764 13.1.12.1. NFS4ERR_CB_PATH_DOWN (Error Code 10048) 8766 There is a problem contacting the client via the callback path. 8768 13.2. Operations and their valid errors 8770 This section contains a table which gives the valid error returns for 8771 each protocol operation. The error code NFS4_OK (indicating no 8772 error) is not listed but should be understood to be returnable by all 8773 operations except ILLEGAL. 
8775 Valid error returns for each protocol operation 8777 +---------------------+---------------------------------------------+ 8778 | Operation | Errors | 8779 +---------------------+---------------------------------------------+ 8780 | ACCESS | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 8781 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 8782 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 8783 | | NFS4ERR_IO, NFS4ERR_MOVED, | 8784 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_RESOURCE, | 8785 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 8786 | | | 8787 | CLOSE | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADHANDLE, | 8788 | | NFS4ERR_BAD_SEQID, NFS4ERR_BAD_STATEID, | 8789 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 8790 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 8791 | | NFS4ERR_INVAL, NFS4ERR_ISDIR, | 8792 | | NFS4ERR_LEASE_MOVED, NFS4ERR_LOCKS_HELD, | 8793 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 8794 | | NFS4ERR_OLD_STATEID, NFS4ERR_RESOURCE, | 8795 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 8796 | | NFS4ERR_STALE_STATEID | 8797 | | | 8798 | COMMIT | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 8799 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 8800 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 8801 | | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_MOVED, | 8802 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_RESOURCE, | 8803 | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | 8804 | | NFS4ERR_STALE, NFS4ERR_SYMLINK | 8805 | | | 8806 | CREATE | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP, | 8807 | | NFS4ERR_BADCHAR, NFS4ERR_BADHANDLE, | 8808 | | NFS4ERR_BADNAME, NFS4ERR_BADOWNER, | 8809 | | NFS4ERR_BADTYPE, NFS4ERR_BADXDR, | 8810 | | NFS4ERR_DELAY, NFS4ERR_DQUOT, | 8811 | | NFS4ERR_EXIST, NFS4ERR_FHEXPIRED, | 8812 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 8813 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOFILEHANDLE, | 8814 | | NFS4ERR_NOSPC, NFS4ERR_NOTDIR, | 8815 | | NFS4ERR_PERM, NFS4ERR_RESOURCE, | 8816 | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | 8817 | | NFS4ERR_STALE | 8818 | | | 8819 | DELEGPURGE | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 8820 | | NFS4ERR_NOTSUPP, NFS4ERR_LEASE_MOVED, | 8821 
| | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT, | 8822 | | NFS4ERR_STALE_CLIENTID | 8823 | | | 8824 | DELEGRETURN | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BAD_STATEID, | 8825 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 8826 | | NFS4ERR_EXPIRED, NFS4ERR_INVAL, | 8827 | | NFS4ERR_LEASE_MOVED, NFS4ERR_MOVED, | 8828 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | 8829 | | NFS4ERR_OLD_STATEID, NFS4ERR_RESOURCE, | 8830 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 8831 | | NFS4ERR_STALE_STATEID | 8832 | | | 8833 | GETATTR | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 8834 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 8835 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 8836 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 8837 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_RESOURCE, | 8838 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 8839 | | | 8840 | GETFH | NFS4ERR_BADHANDLE, NFS4ERR_FHEXPIRED, | 8841 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 8842 | | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT, | 8843 | | NFS4ERR_STALE | 8844 | | | 8845 | ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL | 8846 | | | 8847 | LINK | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 8848 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 8849 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 8850 | | NFS4ERR_DQUOT, NFS4ERR_EXIST, | 8851 | | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN, | 8852 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, | 8853 | | NFS4ERR_MLINK, NFS4ERR_MOVED, | 8854 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 8855 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 8856 | | NFS4ERR_NOTDIR, NFS4ERR_NOTSUPP, | 8857 | | NFS4ERR_RESOURCE, NFS4ERR_ROFS, | 8858 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 8859 | | NFS4ERR_WRONGSEC, NFS4ERR_XDEV | 8860 | | | 8861 | LOCK | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 8862 | | NFS4ERR_BADHANDLE, NFS4ERR_BAD_RANGE, | 8863 | | NFS4ERR_BAD_SEQID, NFS4ERR_BAD_STATEID, | 8864 | | NFS4ERR_BADXDR, NFS4ERR_DEADLOCK, | 8865 | | NFS4ERR_DELAY, NFS4ERR_DENIED, | 8866 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 8867 | | NFS4ERR_GRACE, NFS4ERR_INVAL, | 8868 | | NFS4ERR_ISDIR, 
NFS4ERR_LEASE_MOVED, | 8869 | | NFS4ERR_LOCK_NOTSUPP, NFS4ERR_LOCK_RANGE, | 8870 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 8871 | | NFS4ERR_NO_GRACE, NFS4ERR_OLD_STATEID, | 8872 | | NFS4ERR_OPENMODE, NFS4ERR_RECLAIM_BAD, | 8873 | | NFS4ERR_RECLAIM_CONFLICT, NFS4ERR_RESOURCE, | 8874 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 8875 | | NFS4ERR_STALE_CLIENTID, | 8876 | | NFS4ERR_STALE_STATEID | 8877 | | | 8878 | LOCKT | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 8879 | | NFS4ERR_BAD_RANGE, NFS4ERR_BADXDR, | 8880 | | NFS4ERR_DELAY, NFS4ERR_DENIED, | 8881 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 8882 | | NFS4ERR_GRACE, NFS4ERR_INVAL, | 8883 | | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED, | 8884 | | NFS4ERR_LOCK_RANGE, NFS4ERR_MOVED, | 8885 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_RESOURCE, | 8886 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 8887 | | NFS4ERR_STALE_CLIENTID | 8888 | | | 8889 | LOCKU | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 8890 | | NFS4ERR_BADHANDLE, NFS4ERR_BAD_RANGE, | 8891 | | NFS4ERR_BAD_SEQID, NFS4ERR_BAD_STATEID, | 8892 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 8893 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 8894 | | NFS4ERR_GRACE, NFS4ERR_INVAL, | 8895 | | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED, | 8896 | | NFS4ERR_LOCK_RANGE, NFS4ERR_MOVED, | 8897 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_OLD_STATEID, | 8898 | | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT, | 8899 | | NFS4ERR_STALE, NFS4ERR_STALE_STATEID | 8900 | | | 8901 | LOOKUP | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 8902 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 8903 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 8904 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 8905 | | NFS4ERR_IO, NFS4ERR_MOVED, | 8906 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 8907 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 8908 | | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT, | 8909 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 8910 | | NFS4ERR_WRONGSEC | 8911 | | | 8912 | LOOKUPP | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 8913 | | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED, | 8914 | | NFS4ERR_IO, NFS4ERR_MOVED, 
NFS4ERR_NOENT, | 8915 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 8916 | | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT, | 8917 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 8918 | | NFS4ERR_WRONGSEC | 8919 | | | 8920 | NVERIFY | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP, | 8921 | | NFS4ERR_BADCHAR, NFS4ERR_BADHANDLE, | 8922 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 8923 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 8924 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 8925 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_SAME, | 8926 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 8927 | | | 8928 | OPEN | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 8929 | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADCHAR, | 8930 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 8931 | | NFS4ERR_BADOWNER, NFS4ERR_BADXDR, | 8932 | | NFS4ERR_BAD_SEQID, NFS4ERR_BAD_STATEID, | 8933 | | NFS4ERR_DELAY, NFS4ERR_DQUOT, | 8934 | | NFS4ERR_EXIST, NFS4ERR_EXPIRED, | 8935 | | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, | 8936 | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, | 8937 | | NFS4ERR_ISDIR, NFS4ERR_MOVED, | 8938 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 8939 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 8940 | | NFS4ERR_NOTDIR, NFS4ERR_NOTSUPP, | 8941 | | NFS4ERR_NO_GRACE, NFS4ERR_OLD_STATEID, | 8942 | | NFS4ERR_PERM, NFS4ERR_RECLAIM_BAD, | 8943 | | NFS4ERR_RECLAIM_CONFLICT, NFS4ERR_RESOURCE, | 8944 | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | 8945 | | NFS4ERR_SHARE_DENIED, NFS4ERR_STALE, | 8946 | | NFS4ERR_STALE_CLIENTID, NFS4ERR_SYMLINK, | 8947 | | NFS4ERR_WRONGSEC | 8948 | | | 8949 | OPENATTR | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 8950 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 8951 | | NFS4ERR_DQUOT, NFS4ERR_FHEXPIRED, | 8952 | | NFS4ERR_IO, NFS4ERR_MOVED, NFS4ERR_NOENT, | 8953 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 8954 | | NFS4ERR_NOTSUPP, NFS4ERR_RESOURCE, | 8955 | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | 8956 | | NFS4ERR_STALE | 8957 | | | 8958 | OPEN_CONFIRM | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADHANDLE, | 8959 | | NFS4ERR_BAD_SEQID, NFS4ERR_BAD_STATEID, | 8960 | |
NFS4ERR_BADXDR, NFS4ERR_EXPIRED, | 8961 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 8962 | | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED, | 8963 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 8964 | | NFS4ERR_OLD_STATEID, NFS4ERR_RESOURCE, | 8965 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 8966 | | NFS4ERR_STALE_STATEID | 8967 | | | 8968 | OPEN_DOWNGRADE | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADHANDLE, | 8969 | | NFS4ERR_BADXDR, NFS4ERR_BAD_SEQID, | 8970 | | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY, | 8971 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 8972 | | NFS4ERR_INVAL, NFS4ERR_LEASE_MOVED, | 8973 | | NFS4ERR_LOCKS_HELD, NFS4ERR_MOVED, | 8974 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_OLD_STATEID, | 8975 | | NFS4ERR_RESOURCE, NFS4ERR_ROFS, | 8976 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 8977 | | NFS4ERR_STALE_STATEID | 8978 | | | 8979 | PUTFH | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | 8980 | | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED, | 8981 | | NFS4ERR_MOVED, NFS4ERR_SERVERFAULT, | 8982 | | NFS4ERR_STALE, NFS4ERR_WRONGSEC | 8983 | | | 8984 | PUTPUBFH | NFS4ERR_DELAY, NFS4ERR_SERVERFAULT, | 8985 | | NFS4ERR_WRONGSEC | 8986 | | | 8987 | PUTROOTFH | NFS4ERR_DELAY, NFS4ERR_SERVERFAULT, | 8988 | | NFS4ERR_WRONGSEC | 8989 | | | 8990 | READ | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 8991 | | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | 8992 | | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY, | 8993 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 8994 | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, | 8995 | | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED, | 8996 | | NFS4ERR_LOCKED, NFS4ERR_MOVED, | 8997 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_OLD_STATEID, | 8998 | | NFS4ERR_OPENMODE, NFS4ERR_RESOURCE, | 8999 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 9000 | | NFS4ERR_STALE_STATEID, NFS4ERR_SYMLINK | 9001 | | | 9002 | READDIR | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 9003 | | NFS4ERR_BADXDR, NFS4ERR_BAD_COOKIE, | 9004 | | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED, | 9005 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 9006 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 9007 | | 
NFS4ERR_NOT_SAME, NFS4ERR_RESOURCE, | 9008 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 9009 | | NFS4ERR_TOOSMALL | 9010 | | | 9011 | READLINK | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE, | 9012 | | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED, | 9013 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, | 9014 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 9015 | | NFS4ERR_NOTSUPP, NFS4ERR_RESOURCE, | 9016 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 9017 | | | 9018 | RELEASE_LOCKOWNER | NFS4ERR_BADXDR, NFS4ERR_EXPIRED, | 9019 | | NFS4ERR_LEASE_MOVED, NFS4ERR_LOCKS_HELD, | 9020 | | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT, | 9021 | | NFS4ERR_STALE_CLIENTID | 9022 | | | 9023 | REMOVE | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 9024 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 9025 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 9026 | | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN, | 9027 | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, | 9028 | | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG, | 9029 | | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE, | 9030 | | NFS4ERR_NOTDIR, NFS4ERR_NOTEMPTY, | 9031 | | NFS4ERR_RESOURCE, NFS4ERR_ROFS, | 9032 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 9033 | | | 9034 | RENAME | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 9035 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 9036 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 9037 | | NFS4ERR_DQUOT, NFS4ERR_EXIST, | 9038 | | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN, | 9039 | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, | 9040 | | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG, | 9041 | | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE, | 9042 | | NFS4ERR_NOSPC, NFS4ERR_NOTDIR, | 9043 | | NFS4ERR_NOTEMPTY, NFS4ERR_RESOURCE, | 9044 | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | 9045 | | NFS4ERR_STALE, NFS4ERR_WRONGSEC, | 9046 | | NFS4ERR_XDEV | 9047 | | | 9048 | RENEW | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | 9049 | | NFS4ERR_CB_PATH_DOWN, NFS4ERR_EXPIRED, | 9050 | | NFS4ERR_LEASE_MOVED, NFS4ERR_RESOURCE, | 9051 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE_CLIENTID | 9052 | | | 9053 | RESTOREFH | NFS4ERR_BADHANDLE, NFS4ERR_FHEXPIRED, | 9054 | |
NFS4ERR_MOVED, NFS4ERR_RESOURCE, | 9055 | | NFS4ERR_RESTOREFH, NFS4ERR_SERVERFAULT, | 9056 | | NFS4ERR_STALE, NFS4ERR_WRONGSEC | 9057 | | | 9058 | SAVEFH | NFS4ERR_BADHANDLE, NFS4ERR_FHEXPIRED, | 9059 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 9060 | | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT, | 9061 | | NFS4ERR_STALE | 9062 | | | 9063 | SECINFO | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 9064 | | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME, | 9065 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 9066 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 9067 | | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG, | 9068 | | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE, | 9069 | | NFS4ERR_NOTDIR, NFS4ERR_RESOURCE, | 9070 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE | 9071 | | | 9072 | SETATTR | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 9073 | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADCHAR, | 9074 | | NFS4ERR_BADHANDLE, NFS4ERR_BADOWNER, | 9075 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 9076 | | NFS4ERR_DELAY, NFS4ERR_DQUOT, | 9077 | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | 9078 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 9079 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, | 9080 | | NFS4ERR_LEASE_MOVED, NFS4ERR_LOCKED, | 9081 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 9082 | | NFS4ERR_NOSPC, NFS4ERR_OLD_STATEID, | 9083 | | NFS4ERR_OPENMODE, NFS4ERR_PERM, | 9084 | | NFS4ERR_RESOURCE, NFS4ERR_ROFS, | 9085 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 9086 | | NFS4ERR_STALE_STATEID | 9087 | | | 9088 | SETCLIENTID | NFS4ERR_BADXDR, NFS4ERR_CLID_INUSE, | 9089 | | NFS4ERR_DELAY, NFS4ERR_INVAL, | 9090 | | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT | 9091 | | | 9092 | SETCLIENTID_CONFIRM | NFS4ERR_BADXDR, NFS4ERR_CLID_INUSE, | 9093 | | NFS4ERR_DELAY, NFS4ERR_RESOURCE, | 9094 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE_CLIENTID | 9095 | | | 9096 | VERIFY | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP, | 9097 | | NFS4ERR_BADCHAR, NFS4ERR_BADHANDLE, | 9098 | | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 9099 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 9100 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 
9101 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOT_SAME, | 9102 | | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT, | 9103 | | NFS4ERR_STALE | 9104 | | | 9105 | WRITE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 9106 | | NFS4ERR_BADXDR, NFS4ERR_BADHANDLE, | 9107 | | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY, | 9108 | | NFS4ERR_DQUOT, NFS4ERR_EXPIRED, | 9109 | | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, | 9110 | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, | 9111 | | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED, | 9112 | | NFS4ERR_LOCKED, NFS4ERR_MOVED, | 9113 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 9114 | | NFS4ERR_NXIO, NFS4ERR_OLD_STATEID, | 9115 | | NFS4ERR_OPENMODE, NFS4ERR_RESOURCE, | 9116 | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | 9117 | | NFS4ERR_STALE, NFS4ERR_STALE_STATEID, | 9118 | | NFS4ERR_SYMLINK | 9119 | | | 9120 +---------------------+---------------------------------------------+ 9122 Table 7 9124 13.3. Callback operations and their valid errors 9126 This section contains a table which gives the valid error returns for 9127 each callback operation. The error code NFS4_OK (indicating no 9128 error) is not listed but should be understood to be returnable by all 9129 callback operations with the exception of CB_ILLEGAL. 9131 Valid error returns for each protocol callback operation 9133 +-------------+-----------------------------------------------------+ 9134 | Callback | Errors | 9135 | Operation | | 9136 +-------------+-----------------------------------------------------+ 9137 | CB_GETATTR | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, NFS4ERR_DELAY, | 9138 | | NFS4ERR_INVAL, NFS4ERR_SERVERFAULT | 9139 | | | 9140 | CB_ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL | 9141 | | | 9142 | CB_RECALL | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | 9143 | | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY, | 9144 | | NFS4ERR_SERVERFAULT | 9145 | | | 9146 +-------------+-----------------------------------------------------+ 9148 Table 8 9150 13.4. 
Errors and the operations that use them 9152 Errors and the operations that use them 9154 +--------------------------+----------------------------------------+ 9155 | Error | Operations | 9156 +--------------------------+----------------------------------------+ 9157 | NFS4ERR_ACCESS | ACCESS, COMMIT, CREATE, GETATTR, LINK, | 9158 | | LOCK, LOCKT, LOCKU, LOOKUP, LOOKUPP, | 9159 | | NVERIFY, OPEN, OPENATTR, READ, | 9160 | | READDIR, READLINK, REMOVE, RENAME, | 9161 | | RENEW, SECINFO, SETATTR, VERIFY, WRITE | 9162 | | | 9163 | NFS4ERR_ADMIN_REVOKED | CLOSE, DELEGRETURN, LOCK, LOCKU, OPEN, | 9164 | | OPEN_CONFIRM, OPEN_DOWNGRADE, READ, | 9165 | | SETATTR, WRITE | 9166 | | | 9167 | NFS4ERR_ATTRNOTSUPP | CREATE, NVERIFY, OPEN, SETATTR, VERIFY | 9168 | | | 9169 | NFS4ERR_BADCHAR | CREATE, LINK, LOOKUP, NVERIFY, OPEN, | 9170 | | REMOVE, RENAME, SECINFO, SETATTR, | 9171 | | VERIFY | 9172 | | | 9173 | NFS4ERR_BADHANDLE | ACCESS, CB_GETATTR, CB_RECALL, CLOSE, | 9174 | | COMMIT, CREATE, GETATTR, GETFH, LINK, | 9175 | | LOCK, LOCKT, LOCKU, LOOKUP, LOOKUPP, | 9176 | | NVERIFY, OPEN, OPENATTR, OPEN_CONFIRM, | 9177 | | OPEN_DOWNGRADE, PUTFH, READ, READDIR, | 9178 | | READLINK, REMOVE, RENAME, RESTOREFH, | 9179 | | SAVEFH, SECINFO, SETATTR, VERIFY, | 9180 | | WRITE | 9181 | | | 9182 | NFS4ERR_BADNAME | CREATE, LINK, LOOKUP, OPEN, REMOVE, | 9183 | | RENAME, SECINFO | 9184 | | | 9185 | NFS4ERR_BADOWNER | CREATE, OPEN, SETATTR | 9186 | | | 9187 | NFS4ERR_BADTYPE | CREATE | 9188 | | | 9189 | NFS4ERR_BADXDR | ACCESS, CB_GETATTR, CB_ILLEGAL, | 9190 | | CB_RECALL, CLOSE, COMMIT, CREATE, | 9191 | | DELEGPURGE, DELEGRETURN, GETATTR, | 9192 | | ILLEGAL, LINK, LOCK, LOCKT, LOCKU, | 9193 | | LOOKUP, NVERIFY, OPEN, OPENATTR, | 9194 | | OPEN_CONFIRM, OPEN_DOWNGRADE, PUTFH, | 9195 | | READ, READDIR, RELEASE_LOCKOWNER, | 9196 | | REMOVE, RENAME, RENEW, SECINFO, | 9197 | | SETATTR, SETCLIENTID, | 9198 | | SETCLIENTID_CONFIRM, VERIFY, WRITE | 9199 | | | 9200 | NFS4ERR_BAD_COOKIE | READDIR | 9201 | 
| | 9202 | NFS4ERR_BAD_RANGE | LOCK, LOCKT, LOCKU | 9203 | | | 9204 | NFS4ERR_BAD_SEQID | CLOSE, LOCK, LOCKU, OPEN, | 9205 | | OPEN_CONFIRM, OPEN_DOWNGRADE | 9206 | | | 9207 | NFS4ERR_BAD_STATEID | CB_RECALL, CLOSE, DELEGRETURN, LOCK, | 9208 | | LOCKU, OPEN, OPEN_CONFIRM, | 9209 | | OPEN_DOWNGRADE, READ, SETATTR, WRITE | 9210 | | | 9211 | NFS4ERR_CB_PATH_DOWN | RENEW | 9212 | | | 9213 | NFS4ERR_CLID_INUSE | SETCLIENTID, SETCLIENTID_CONFIRM | 9214 | | | 9215 | NFS4ERR_DEADLOCK | LOCK | 9216 | | | 9217 | NFS4ERR_DELAY | ACCESS, CB_GETATTR, CB_RECALL, CLOSE, | 9218 | | COMMIT, CREATE, DELEGPURGE, | 9219 | | DELEGRETURN, GETATTR, LINK, LOCK, | 9220 | | LOCKT, LOCKU, LOOKUP, LOOKUPP, | 9221 | | NVERIFY, OPEN, OPENATTR, | 9222 | | OPEN_DOWNGRADE, PUTFH, PUTPUBFH, | 9223 | | PUTROOTFH, READ, READDIR, READLINK, | 9224 | | REMOVE, RENAME, SECINFO, SETATTR, | 9225 | | SETCLIENTID, SETCLIENTID_CONFIRM, | 9226 | | VERIFY, WRITE | 9227 | | | 9228 | NFS4ERR_DENIED | LOCK, LOCKT | 9229 | | | 9230 | NFS4ERR_DQUOT | CREATE, LINK, OPEN, OPENATTR, RENAME, | 9231 | | SETATTR, WRITE | 9232 | | | 9233 | NFS4ERR_EXIST | CREATE, LINK, OPEN, RENAME | 9234 | | | 9235 | NFS4ERR_EXPIRED | CLOSE, DELEGRETURN, LOCK, LOCKT, | 9236 | | LOCKU, OPEN, OPEN_CONFIRM, | 9237 | | OPEN_DOWNGRADE, READ, | 9238 | | RELEASE_LOCKOWNER, RENEW, SETATTR, | 9239 | | WRITE | 9240 | | | 9241 | NFS4ERR_FBIG | OPEN, SETATTR, WRITE | 9242 | | | 9243 | NFS4ERR_FHEXPIRED | ACCESS, CLOSE, COMMIT, CREATE, | 9244 | | GETATTR, GETFH, LINK, LOCK, LOCKT, | 9245 | | LOCKU, LOOKUP, LOOKUPP, NVERIFY, OPEN, | 9246 | | OPENATTR, OPEN_CONFIRM, | 9247 | | OPEN_DOWNGRADE, PUTFH, READ, READDIR, | 9248 | | READLINK, REMOVE, RENAME, RESTOREFH, | 9249 | | SAVEFH, SECINFO, SETATTR, VERIFY, | 9250 | | WRITE | 9251 | | | 9252 | NFS4ERR_FILE_OPEN | LINK, REMOVE, RENAME | 9253 | | | 9254 | NFS4ERR_GRACE | GETATTR, LOCK, LOCKT, LOCKU, NVERIFY, | 9255 | | OPEN, READ, REMOVE, RENAME, SETATTR, | 9256 | | VERIFY, WRITE | 9257 | | | 9258 | 
NFS4ERR_INVAL | ACCESS, CB_GETATTR, CLOSE, COMMIT, | 9259 | | CREATE, DELEGRETURN, GETATTR, LINK, | 9260 | | LOCK, LOCKT, LOCKU, LOOKUP, NVERIFY, | 9261 | | OPEN, OPEN_CONFIRM, OPEN_DOWNGRADE, | 9262 | | READ, READDIR, READLINK, REMOVE, | 9263 | | RENAME, SECINFO, SETATTR, SETCLIENTID, | 9264 | | VERIFY, WRITE | 9265 | | | 9266 | NFS4ERR_IO | ACCESS, COMMIT, CREATE, GETATTR, LINK, | 9267 | | LOOKUP, LOOKUPP, NVERIFY, OPEN, | 9268 | | OPENATTR, READ, READDIR, READLINK, | 9269 | | REMOVE, RENAME, SETATTR, VERIFY, WRITE | 9270 | | | 9271 | NFS4ERR_ISDIR | CLOSE, COMMIT, LINK, LOCK, LOCKT, | 9272 | | LOCKU, OPEN, OPEN_CONFIRM, READ, | 9273 | | READLINK, SETATTR, WRITE | 9274 | | | 9275 | NFS4ERR_LEASE_MOVED | CLOSE, DELEGPURGE, DELEGRETURN, LOCK, | 9276 | | LOCKT, LOCKU, OPEN_CONFIRM, | 9277 | | OPEN_DOWNGRADE, READ, | 9278 | | RELEASE_LOCKOWNER, RENEW, SETATTR, | 9279 | | WRITE | 9280 | | | 9281 | NFS4ERR_LOCKED | READ, SETATTR, WRITE | 9282 | | | 9283 | NFS4ERR_LOCKS_HELD | CLOSE, OPEN_DOWNGRADE, | 9284 | | RELEASE_LOCKOWNER | 9285 | | | 9286 | NFS4ERR_LOCK_NOTSUPP | LOCK | 9287 | | | 9288 | NFS4ERR_LOCK_RANGE | LOCK, LOCKT, LOCKU | 9289 | | | 9290 | NFS4ERR_MLINK | LINK | 9291 | | | 9292 | NFS4ERR_MOVED | ACCESS, CLOSE, COMMIT, CREATE, | 9293 | | DELEGRETURN, GETATTR, GETFH, LINK, | 9294 | | LOCK, LOCKT, LOCKU, LOOKUP, LOOKUPP, | 9295 | | NVERIFY, OPEN, OPENATTR, OPEN_CONFIRM, | 9296 | | OPEN_DOWNGRADE, PUTFH, READ, READDIR, | 9297 | | READLINK, REMOVE, RENAME, RESTOREFH, | 9298 | | SAVEFH, SECINFO, SETATTR, VERIFY, | 9299 | | WRITE | 9300 | | | 9301 | NFS4ERR_NAMETOOLONG | CREATE, LINK, LOOKUP, OPEN, REMOVE, | 9302 | | RENAME, SECINFO | 9303 | | | 9304 | NFS4ERR_NOENT | LINK, LOOKUP, LOOKUPP, OPEN, OPENATTR, | 9305 | | REMOVE, RENAME, SECINFO | 9306 | | | 9307 | NFS4ERR_NOFILEHANDLE | ACCESS, CLOSE, COMMIT, CREATE, | 9308 | | DELEGRETURN, GETATTR, GETFH, LINK, | 9309 | | LOCK, LOCKT, LOCKU, LOOKUP, LOOKUPP, | 9310 | | NVERIFY, OPEN, OPENATTR, OPEN_CONFIRM, | 9311 | 
| OPEN_DOWNGRADE, READ, READDIR, | 9312 | | READLINK, REMOVE, RENAME, SAVEFH, | 9313 | | SECINFO, SETATTR, VERIFY, WRITE | 9314 | | | 9315 | NFS4ERR_NOSPC | CREATE, LINK, OPEN, OPENATTR, RENAME, | 9316 | | SETATTR, WRITE | 9317 | | | 9318 | NFS4ERR_NOTDIR | CREATE, LINK, LOOKUP, LOOKUPP, OPEN, | 9319 | | READDIR, REMOVE, RENAME, SECINFO | 9320 | | | 9321 | NFS4ERR_NOTEMPTY | REMOVE, RENAME | 9322 | | | 9323 | NFS4ERR_NOTSUPP | DELEGPURGE, DELEGRETURN, LINK, OPEN, | 9324 | | OPENATTR, READLINK | 9327 | | | 9328 | NFS4ERR_NOT_SAME | READDIR, VERIFY | 9329 | | | 9330 | NFS4ERR_NO_GRACE | LOCK, OPEN | 9331 | | | 9332 | NFS4ERR_NXIO | WRITE | 9333 | | | 9334 | NFS4ERR_OLD_STATEID | CLOSE, DELEGRETURN, LOCK, LOCKU, OPEN, | 9335 | | OPEN_CONFIRM, OPEN_DOWNGRADE, READ, | 9336 | | SETATTR, WRITE | 9337 | | | 9338 | NFS4ERR_OPENMODE | LOCK, READ, SETATTR, WRITE | 9339 | | | 9340 | NFS4ERR_OP_ILLEGAL | CB_ILLEGAL, ILLEGAL | 9341 | | | 9342 | NFS4ERR_PERM | CREATE, OPEN, SETATTR | 9343 | | | 9344 | NFS4ERR_RECLAIM_BAD | LOCK, OPEN | 9345 | | | 9346 | NFS4ERR_RECLAIM_CONFLICT | LOCK, OPEN | 9347 | | | 9348 | NFS4ERR_RESOURCE | ACCESS, CLOSE, COMMIT, CREATE, | 9349 | | DELEGPURGE, DELEGRETURN, GETATTR, | 9350 | | GETFH, LINK, LOCK, LOCKT, LOCKU, | 9351 | | LOOKUP, LOOKUPP, OPEN, OPENATTR, | 9352 | | OPEN_CONFIRM, OPEN_DOWNGRADE, READ, | 9353 | | READDIR, READLINK, RELEASE_LOCKOWNER, | 9354 | | REMOVE, RENAME, RENEW, RESTOREFH, | 9355 | | SAVEFH, SECINFO, SETATTR, SETCLIENTID, | 9356 | | SETCLIENTID_CONFIRM, VERIFY, WRITE | 9357 | | | 9358 | NFS4ERR_RESTOREFH | RESTOREFH | 9359 | | | 9360 | NFS4ERR_ROFS | COMMIT, CREATE, LINK, OPEN, OPENATTR, | 9361 | | OPEN_DOWNGRADE, REMOVE, RENAME, | 9362 | | SETATTR, WRITE | 9363 | | | 9364 | NFS4ERR_SAME | NVERIFY | 9365 | | | 9366 | NFS4ERR_SERVERFAULT | ACCESS, CB_GETATTR, CB_RECALL, CLOSE, | 9367 | | COMMIT, CREATE, DELEGPURGE, | 9368 | | DELEGRETURN, GETATTR, GETFH, LINK, | 9369 | | LOCK, LOCKT, LOCKU,
LOOKUP, LOOKUPP, | 9370 | | NVERIFY, OPEN, OPENATTR, OPEN_CONFIRM, | 9371 | | OPEN_DOWNGRADE, PUTFH, PUTPUBFH, | 9372 | | PUTROOTFH, READ, READDIR, READLINK, | 9373 | | RELEASE_LOCKOWNER, REMOVE, RENAME, | 9374 | | RENEW, RESTOREFH, SAVEFH, SECINFO, | 9375 | | SETATTR, SETCLIENTID, | 9376 | | SETCLIENTID_CONFIRM, VERIFY, WRITE | 9377 | | | 9378 | NFS4ERR_SHARE_DENIED | OPEN | 9379 | | | 9380 | NFS4ERR_STALE | ACCESS, CLOSE, COMMIT, CREATE, | 9381 | | DELEGRETURN, GETATTR, GETFH, LINK, | 9382 | | LOCK, LOCKT, LOCKU, LOOKUP, LOOKUPP, | 9383 | | NVERIFY, OPEN, OPENATTR, OPEN_CONFIRM, | 9384 | | OPEN_DOWNGRADE, PUTFH, READ, READDIR, | 9385 | | READLINK, REMOVE, RENAME, RESTOREFH, | 9386 | | SAVEFH, SECINFO, SETATTR, VERIFY, | 9387 | | WRITE | 9388 | | | 9389 | NFS4ERR_STALE_CLIENTID | DELEGPURGE, LOCK, LOCKT, OPEN, | 9390 | | RELEASE_LOCKOWNER, RENEW, | 9391 | | SETCLIENTID_CONFIRM | 9392 | | | 9393 | NFS4ERR_STALE_STATEID | CLOSE, DELEGRETURN, LOCK, LOCKU, | 9394 | | OPEN_CONFIRM, OPEN_DOWNGRADE, READ, | 9395 | | SETATTR, WRITE | 9396 | | | 9397 | NFS4ERR_SYMLINK | COMMIT, LOOKUP, LOOKUPP, OPEN, READ, | 9398 | | WRITE | 9399 | | | 9400 | NFS4ERR_TOOSMALL | READDIR | 9401 | | | 9402 | NFS4ERR_WRONGSEC | LINK, LOOKUP, LOOKUPP, OPEN, PUTFH, | 9403 | | PUTPUBFH, PUTROOTFH, RENAME, RESTOREFH | 9404 | | | 9405 | NFS4ERR_XDEV | LINK, RENAME | 9406 | | | 9407 +--------------------------+----------------------------------------+ 9409 Table 9 9411 14. NFSv4 Requests 9413 For the NFSv4 RPC program, there are two traditional RPC procedures: 9414 NULL and COMPOUND. All other functionality is defined as a set of 9415 operations and these operations are defined in normal XDR/RPC syntax 9416 and semantics. However, these operations are encapsulated within the 9417 COMPOUND procedure. This requires that the client combine one or 9418 more of the NFSv4 operations into a single request. 
9420 The NFS4_CALLBACK program is used to provide server-to-client 9421 signaling and is constructed in a fashion similar to the NFSv4 9422 program. The procedures CB_NULL and CB_COMPOUND are defined in the 9423 same way as NULL and COMPOUND are within the NFS program. The 9424 CB_COMPOUND request also encapsulates the remaining operations of the 9425 NFS4_CALLBACK program. There is no predefined RPC program number for 9426 the NFS4_CALLBACK program. It is up to the client to specify a 9427 program number in the "transient" program range. The program and 9428 port number of the NFS4_CALLBACK program are provided by the client 9429 as part of the SETCLIENTID/SETCLIENTID_CONFIRM sequence. The program 9430 and port can be changed by another SETCLIENTID/SETCLIENTID_CONFIRM 9431 sequence, and it is possible to use the sequence to change them 9432 within a client incarnation without removing relevant leased client 9433 state. 9435 14.1. Compound Procedure 9437 The COMPOUND procedure provides the opportunity for better 9438 performance within high-latency networks. The client can avoid 9439 the cumulative latency of multiple RPCs by combining multiple dependent 9440 operations into a single COMPOUND procedure. A compound operation 9441 may provide for protocol simplification by allowing the client to 9442 combine basic procedures into a single request that is customized for 9443 the client's environment. 9445 The CB_COMPOUND procedure precisely parallels the features of 9446 COMPOUND as described above.
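The latency argument above can be illustrated with simple arithmetic. The following is a minimal sketch under an assumed cost model that is not part of this specification: each RPC costs one network round trip, and each operation costs a fixed amount of server processing. The function name total_latency_ms and its parameters are hypothetical.

```python
def total_latency_ms(num_ops: int, rtt_ms: float, per_op_ms: float,
                     use_compound: bool) -> float:
    """Estimate elapsed time for a chain of dependent operations.

    Assumed model (illustrative only): every RPC costs one network
    round trip (rtt_ms) and every operation costs per_op_ms of
    server processing.
    """
    if num_ops < 0:
        raise ValueError("num_ops must be non-negative")
    if num_ops == 0:
        return 0.0
    # One COMPOUND carries all operations in a single round trip;
    # otherwise each operation travels in its own request.
    round_trips = 1 if use_compound else num_ops
    return round_trips * rtt_ms + num_ops * per_op_ms


# Four dependent operations, e.g. PUTFH, LOOKUP, GETFH, GETATTR,
# sent as four single-operation requests versus one COMPOUND.
separate = total_latency_ms(4, rtt_ms=50.0, per_op_ms=1.0, use_compound=False)
combined = total_latency_ms(4, rtt_ms=50.0, per_op_ms=1.0, use_compound=True)
```

Under these assumed numbers the saving is three round trips; the benefit grows with the round-trip time and with the number of dependent operations combined.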

   The basic structure of the COMPOUND procedure is:

   +-----+--------------+--------+-----------+-----------+-----------+--
   | tag | minorversion | numops | op + args | op + args | op + args |
   +-----+--------------+--------+-----------+-----------+-----------+--

   and the reply's structure is:

   +------------+-----+--------+-----------------------+--
   |last status | tag | numres | status + op + results |
   +------------+-----+--------+-----------------------+--

   The numops and numres fields, used in the depiction above, represent
   the count for the counted array encoding used to signify the number
   of arguments or results encoded in the request and response. As per
   the XDR encoding, these counts must match exactly the number of
   operation arguments or results encoded.

14.2.  Evaluation of a Compound Request

   The server will process the COMPOUND procedure by evaluating each of
   the operations within the COMPOUND procedure in order. Each
   component operation consists of a 32-bit operation code, followed by
   the argument of length determined by the type of operation. The
   results of each operation are encoded in sequence into a reply
   buffer. The results of each operation are preceded by the opcode and
   a status code (normally zero). If an operation results in a non-zero
   status code, the status will be encoded and evaluation of the
   compound sequence will halt and the reply will be returned. Note
   that evaluation stops even in the event of "non error" conditions
   such as NFS4ERR_SAME.

   There are no atomicity requirements for the operations contained
   within the COMPOUND procedure. The operations being evaluated as
   part of a COMPOUND request may be evaluated simultaneously with other
   COMPOUND requests that the server receives.
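
   The evaluation rule described in this section (evaluate operations
   in order, record each result, and halt at the first non-zero status)
   can be sketched as follows. This is a minimal illustration in
   Python, not a server implementation: the handler table, the
   dictionary standing in for a namespace, and the string filehandles
   are all hypothetical. Only the numeric values of NFS4_OK and
   NFS4ERR_NOENT are taken from the protocol's status definitions.

```python
NFS4_OK = 0
NFS4ERR_NOENT = 2

# Hypothetical namespace: (directory filehandle, name) -> child filehandle.
NAMESPACE = {("root", "compA"): "fhA", ("fhA", "compB"): "fhB"}


def op_putrootfh(ctx, _args):
    # Sets the current filehandle to the (toy) root.
    ctx["current_fh"] = "root"
    return NFS4_OK, None


def op_lookup(ctx, name):
    # Replaces the current filehandle with the named child, if present.
    child = NAMESPACE.get((ctx["current_fh"], name))
    if child is None:
        return NFS4ERR_NOENT, None
    ctx["current_fh"] = child
    return NFS4_OK, None


def op_getfh(ctx, _args):
    # Returns the current filehandle without changing it.
    return NFS4_OK, ctx["current_fh"]


HANDLERS = {"PUTROOTFH": op_putrootfh, "LOOKUP": op_lookup, "GETFH": op_getfh}


def evaluate_compound(ops):
    """Evaluate (opname, args) pairs in order; stop at the first failure.

    Returns (last_status, results), where results holds one
    (opname, status, result) entry per operation actually evaluated.
    """
    ctx = {"current_fh": None, "saved_fh": None}
    results = []
    status = NFS4_OK
    for opname, args in ops:
        status, result = HANDLERS[opname](ctx, args)
        results.append((opname, status, result))
        if status != NFS4_OK:
            break  # later operations are never evaluated
    return status, results
```

   In this sketch, a request of PUTROOTFH, LOOKUP "compA", LOOKUP
   "missing", GETFH yields three results and an overall status of
   NFS4ERR_NOENT; the trailing GETFH is not evaluated. Nothing is
   rolled back, which reflects the absence of atomicity requirements
   noted above.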

   A COMPOUND is not a transaction, and the client is responsible for
   recovering from any partially completed COMPOUND procedure. Partial
   completion may occur at any point due to errors such as
   NFS4ERR_RESOURCE and NFS4ERR_DELAY. Note that these errors can occur
   in an otherwise valid operation string. Further, a server reboot
   which occurs in the middle of processing a COMPOUND procedure may
   leave the client with the difficult task of determining how far
   COMPOUND processing has proceeded. Therefore, the client should
   avoid overly complex COMPOUND procedures, so that recovery remains
   manageable if an operation within the procedure fails.

   Each operation assumes a "current" and a "saved" filehandle that are
   available as part of the execution context of the compound request.
   Operations may set, change, or return the current filehandle. The
   "saved" filehandle is used for temporary storage of a filehandle
   value and as an operand for the RENAME and LINK operations.

14.3.  Synchronous Modifying Operations

   NFSv4 operations that modify the file system are synchronous. When
   an operation is successfully completed at the server, the client can
   rely on any data associated with the request being on stable storage
   (the one exception is the file data in a WRITE operation with the
   UNSTABLE4 option specified).

   This implies that any previous operations within the same compound
   request are also reflected in stable storage. This behavior enables
   the client to recover from a partially executed compound request
   which may have resulted from the failure of the server.
   For example, if a compound request contains operations A and B and
   the server is unable to send a response to the client, depending on
   the progress the server made in servicing the request the result of
   both operations may be reflected in stable storage or just operation
   A may be reflected. The server must not have just the results of
   operation B in stable storage.

14.4.  Operation Values

   The operations encoded in the COMPOUND procedure are identified by
   operation values. To avoid overlap with the RPC procedure numbers,
   operations 0 (zero) and 1 are not defined. Operation 2 is not
   defined but reserved for future use with minor versioning.

15.  NFSv4 Procedures

   [RFC Editor: prior to publishing this document as an RFC, please have
   every Section that has a title of "Procedure X:" or "Operation Y:"
   start at the top of a new page.]

15.1.  Procedure 0: NULL - No Operation

15.1.1.  SYNOPSIS

15.1.2.  ARGUMENT

     void;

15.1.3.  RESULT

     void;

15.1.4.  DESCRIPTION

   Standard NULL procedure. Void argument, void response. This
   procedure has no functionality associated with it. Because of this
   it is sometimes used to measure the overhead of processing a service
   request. Therefore, the server should ensure that no unnecessary
   work is done in servicing this procedure.

15.2.  Procedure 1: COMPOUND - Compound Operations

15.2.1.  SYNOPSIS

     compoundargs -> compoundres

15.2.2.  ARGUMENT

   union nfs_argop4 switch (nfs_opnum4 argop) {
           case : ;
           ...
   };

   struct COMPOUND4args {
           utf8str_cs      tag;
           uint32_t        minorversion;
           nfs_argop4      argarray<>;
   };

15.2.3.  RESULT

   union nfs_resop4 switch (nfs_opnum4 resop) {
           case : ;
           ...
   };

   struct COMPOUND4res {
           nfsstat4        status;
           utf8str_cs      tag;
           nfs_resop4      resarray<>;
   };

15.2.4.  DESCRIPTION

   The COMPOUND procedure is used to combine one or more of the NFS
   operations into a single RPC request. The main NFS RPC program has
   two main procedures: NULL and COMPOUND. All other operations use the
   COMPOUND procedure as a wrapper.

   The COMPOUND procedure is used to combine individual operations into
   a single RPC request. The server interprets each of the operations
   in turn. If an operation is executed by the server and the status of
   that operation is NFS4_OK, then the next operation in the COMPOUND
   procedure is executed. The server continues this process until there
   are no more operations to be executed or one of the operations has a
   status value other than NFS4_OK.

   In the processing of the COMPOUND procedure, the server may find that
   it does not have the available resources to execute any or all of the
   operations within the COMPOUND sequence. In this case, the error
   NFS4ERR_RESOURCE will be returned for the particular operation within
   the COMPOUND procedure where the resource exhaustion occurred. This
   assumes that all previous operations within the COMPOUND sequence
   have been evaluated successfully. The results for all of the
   evaluated operations must be returned to the client.

   The server will generally choose between two methods of decoding the
   client's request. The first would be the traditional one-pass XDR
   decode, in which decoding of the entire COMPOUND precedes execution
   of any operation within it. If there is an XDR decoding error in
   this case, an RPC XDR decode error would be returned. The second
   method would be to make an initial pass to decode the basic COMPOUND
   request and then to XDR decode each of the individual operations, as
   the server is ready to execute it.
   In this case, the server may encounter an XDR decode error during
   such an operation decode, after previous operations within the
   COMPOUND have been executed. In this case, the server would return
   the error NFS4ERR_BADXDR to signify the decode error.

   The COMPOUND arguments contain a "minorversion" field. The initial
   and default value for this field is 0 (zero). This field will be
   used by future minor versions such that the client can communicate to
   the server what minor version is being requested. If the server
   receives a COMPOUND procedure with a minorversion field value that it
   does not support, the server MUST return an error of
   NFS4ERR_MINOR_VERS_MISMATCH and a zero length resultdata array.

   Contained within the COMPOUND results is a "status" field. If the
   results array length is non-zero, this status must be equivalent to
   the status of the last operation that was executed within the
   COMPOUND procedure. Therefore, if an operation incurred an error
   then the "status" value will be the same error value as is being
   returned for the operation that failed.

   Note that operations 0 (zero), 1 (one), and 2 (two) are not defined
   for the COMPOUND procedure. It is possible that the server receives
   a request that contains an operation that is less than the first
   legal operation (OP_ACCESS) or greater than the last legal operation
   (OP_RELEASE_LOCKOWNER). In this case, the server's response will
   encode the opcode OP_ILLEGAL rather than the illegal opcode of the
   request. The status field in the ILLEGAL return results will be set
   to NFS4ERR_OP_ILLEGAL. The COMPOUND procedure's return results will
   also be NFS4ERR_OP_ILLEGAL.

   The definition of the "tag" in the request is left to the
   implementer. It may be used to summarize the content of the compound
   request for the benefit of packet sniffers and engineers debugging
   implementations.
   However, the value of "tag" in the response SHOULD be the same value
   as provided in the request. This applies to the tag field of the
   CB_COMPOUND procedure as well.

15.2.4.1.  Current Filehandle

   The current and saved filehandle are used throughout the protocol.
   Most operations implicitly use the current filehandle as an argument
   and many set the current filehandle as part of the results. The
   combination of client specified sequences of operations and current
   and saved filehandle arguments and results allows for greater
   protocol flexibility. The best or easiest example of current
   filehandle usage is a sequence like the following:

       PUTFH fh1              {fh1}
       LOOKUP "compA"         {fh2}
       GETATTR                {fh2}
       LOOKUP "compB"         {fh3}
       GETATTR                {fh3}
       LOOKUP "compC"         {fh4}
       GETATTR                {fh4}
       GETFH

                                 Figure 1

   In this example, the PUTFH (Section 15.22) operation explicitly sets
   the current filehandle value while the result of each LOOKUP
   operation sets the current filehandle value to the resultant file
   system object. Also, the client is able to insert GETATTR operations
   using the current filehandle as an argument.

   The PUTROOTFH (Section 15.24) and PUTPUBFH (Section 15.23) operations
   also set the current filehandle. In the above example, "PUTFH fh1"
   could be replaced with PUTROOTFH or PUTPUBFH, with no filehandle
   argument, to achieve the same effect (on the assumption that "compA"
   is directly below the root of the namespace).

   Along with the current filehandle, there is a saved filehandle.
   While the current filehandle is set as the result of operations like
   LOOKUP, the saved filehandle must be set directly with the use of the
   SAVEFH operation. The SAVEFH operation copies the current
   filehandle value to the saved value. The saved filehandle value is
   used in combination with the current filehandle value for the LINK
   and RENAME operations.
   The RESTOREFH operation will copy the saved filehandle value to the
   current filehandle value; as a result, the saved filehandle value may
   be used as a sort of "scratch" area for the client's series of
   operations.

15.2.5.  IMPLEMENTATION

   Since an error of any type may occur after only a portion of the
   operations have been evaluated, the client must be prepared to
   recover from any failure. If the source of an NFS4ERR_RESOURCE error
   was a complex or lengthy set of operations, it is likely that if the
   number of operations were reduced the server would be able to
   evaluate them successfully. Therefore, the client is responsible for
   dealing with this type of complexity in recovery.

   A single compound should not contain multiple operations that have
   different values for the clientid field used in OPEN, LOCK, and
   RENEW. This can cause confusion in cases in which operations that do
   not contain clientids have potential interactions with operations
   that do. When only a single clientid has been used, it is clear what
   client is being referenced. For a particular example involving the
   interaction of OPEN and GETATTR, see Section 15.18.6.

15.3.  Operation 3: ACCESS - Check Access Rights

15.3.1.  SYNOPSIS

     (cfh), accessreq -> supported, accessrights

15.3.2.  ARGUMENT

   const ACCESS4_READ      = 0x00000001;
   const ACCESS4_LOOKUP    = 0x00000002;
   const ACCESS4_MODIFY    = 0x00000004;
   const ACCESS4_EXTEND    = 0x00000008;
   const ACCESS4_DELETE    = 0x00000010;
   const ACCESS4_EXECUTE   = 0x00000020;

   struct ACCESS4args {
           /* CURRENT_FH: object */
           uint32_t        access;
   };

15.3.3.  RESULT

   struct ACCESS4resok {
           uint32_t        supported;
           uint32_t        access;
   };

   union ACCESS4res switch (nfsstat4 status) {
    case NFS4_OK:
            ACCESS4resok   resok4;
    default:
            void;
   };

15.3.4.  DESCRIPTION

   ACCESS determines the access rights that a user, as identified by the
   credentials in the RPC request, has with respect to the file system
   object specified by the current filehandle. The client encodes the
   set of access rights that are to be checked in the bit mask "access".
   The server checks the permissions encoded in the bit mask. If a
   status of NFS4_OK is returned, two bit masks are included in the
   response. The first, "supported", represents the access rights that
   the server can verify reliably. The second, "access", represents the
   access rights available to the user for the filehandle provided.

   Note that the supported field will contain only as many values as
   were originally sent in the arguments. For example, if the client
   sends an ACCESS operation with only the ACCESS4_READ value set and
   the server supports this value, the server will return only
   ACCESS4_READ even if it could have reliably checked other values.

   The results of this operation are necessarily advisory in nature. A
   return status of NFS4_OK and the appropriate bit set in the bit mask
   does not imply that such access will be allowed to the file system
   object in the future. This is because access rights can be revoked
   by the server at any time.

   The following access permissions may be requested:

   ACCESS4_READ:  Read data from file or read a directory.

   ACCESS4_LOOKUP:  Look up a name in a directory (no meaning for
      non-directory objects).

   ACCESS4_MODIFY:  Rewrite existing file data or modify existing
      directory entries.

   ACCESS4_EXTEND:  Write new data or add directory entries.

   ACCESS4_DELETE:  Delete an existing directory entry.

   ACCESS4_EXECUTE:  Execute file (no meaning for a directory).

   On success, the current filehandle retains its value.

15.3.5.  IMPLEMENTATION

   In general, it is not sufficient for the client to attempt to deduce
   access permissions by inspecting the uid, gid, and mode fields in the
   file attributes or by attempting to interpret the contents of the ACL
   attribute. This is because the server may perform uid or gid mapping
   or enforce additional access control restrictions. It is also
   possible that the server may not be in the same ID space as the
   client. In these cases (and perhaps others), the client cannot
   reliably perform an access check with only current file attributes.

   In the NFSv2 protocol, the only reliable way to determine whether an
   operation was allowed was to try it and see if it succeeded or
   failed. Using the ACCESS operation in the NFSv4 protocol, the client
   can ask the server to indicate whether or not one or more classes of
   operations are permitted. The ACCESS operation is provided to allow
   clients to check before doing a series of operations which might
   result in an access failure. The OPEN operation provides a point
   where the server can verify access to the file object and a method to
   return that information to the client. The ACCESS operation is still
   useful for directory operations or for use in the case where the UNIX
   API "access" is used on the client.

   The information returned by the server in response to an ACCESS call
   is not permanent. It was correct at the exact time that the server
   performed the checks, but not necessarily afterward. The server can
   revoke access permission at any time.

   The client should use the effective credentials of the user to build
   the authentication information in the ACCESS request used to
   determine access rights. It is the effective user and group
   credentials that are used in subsequent read and write operations.

   Many implementations do not directly support the ACCESS4_DELETE
   permission.
   Operating systems like UNIX will ignore the ACCESS4_DELETE bit if set
   on an access request on a non-directory object. In these systems,
   delete permission on a file is determined by the access permissions
   on the directory in which the file resides, instead of being
   determined by the permissions of the file itself. Therefore, the
   mask returned enumerating which access rights can be supported will
   have the ACCESS4_DELETE value set to 0. This indicates to the client
   that the server was unable to check that particular access right.
   The ACCESS4_DELETE bit in the access mask returned will then be
   ignored by the client.

15.4.  Operation 4: CLOSE - Close File

15.4.1.  SYNOPSIS

     (cfh), seqid, open_stateid -> open_stateid

15.4.2.  ARGUMENT

   struct CLOSE4args {
           /* CURRENT_FH: object */
           seqid4          seqid;
           stateid4        open_stateid;
   };

15.4.3.  RESULT

   union CLOSE4res switch (nfsstat4 status) {
    case NFS4_OK:
            stateid4       open_stateid;
    default:
            void;
   };

15.4.4.  DESCRIPTION

   The CLOSE operation releases share reservations for the regular or
   named attribute file as specified by the current filehandle. The
   share reservations and other state information released at the server
   as a result of this CLOSE are associated only with the supplied
   stateid. The sequence id provides for the correct ordering. State
   associated with other OPENs is not affected.

   If byte-range locks are held, the client SHOULD release all locks
   before issuing a CLOSE. The server MAY free all outstanding locks on
   CLOSE but some servers may not support the CLOSE of a file that still
   has byte-range locks held. The server MUST return failure if any
   locks would exist after the CLOSE.

   On success, the current filehandle retains its value.

15.4.5.  IMPLEMENTATION

   Even though CLOSE returns a stateid, this stateid is not useful to
   the client and should be treated as deprecated. CLOSE "shuts down"
   the state associated with all OPENs for the file by a single
   open-owner. As noted above, CLOSE will either release all file
   locking state or return an error. Therefore, the stateid returned by
   CLOSE is not useful for operations that follow.

15.5.  Operation 5: COMMIT - Commit Cached Data

15.5.1.  SYNOPSIS

     (cfh), offset, count -> verifier

15.5.2.  ARGUMENT

   struct COMMIT4args {
           /* CURRENT_FH: file */
           offset4         offset;
           count4          count;
   };

15.5.3.  RESULT

   struct COMMIT4resok {
           verifier4       writeverf;
   };

   union COMMIT4res switch (nfsstat4 status) {
    case NFS4_OK:
            COMMIT4resok   resok4;
    default:
            void;
   };

15.5.4.  DESCRIPTION

   The COMMIT operation forces or flushes data to stable storage for the
   file specified by the current filehandle. The flushed data is that
   which was previously written with a WRITE operation which had the
   stable field set to UNSTABLE4.

   The offset specifies the position within the file where the flush is
   to begin. An offset value of 0 (zero) means to flush data starting
   at the beginning of the file. The count specifies the number of
   bytes of data to flush. If count is 0 (zero), a flush from offset to
   the end of the file is done.

   The server returns a write verifier upon successful completion of the
   COMMIT. The write verifier is used by the client to determine if the
   server has restarted or rebooted between the initial WRITE(s) and the
   COMMIT. The client does this by comparing the write verifier
   returned from the initial writes and the verifier returned by the
   COMMIT operation. The server must vary the value of the write
   verifier at each server event or instantiation that may lead to a
   loss of uncommitted data.
   Most commonly this occurs when the server is rebooted; however, other
   events at the server may result in uncommitted data loss as well.

   On success, the current filehandle retains its value.

15.5.5.  IMPLEMENTATION

   The COMMIT operation is similar in operation and semantics to the
   POSIX fsync() [fsync] system call that synchronizes a file's state
   with the disk (file data and metadata are flushed to disk or stable
   storage). COMMIT performs the same operation for a client, flushing
   any unsynchronized data and metadata on the server to the server's
   disk or stable storage for the specified file. Like fsync(), it may
   be that there is some modified data or no modified data to
   synchronize. The data may have been synchronized by the server's
   normal periodic buffer synchronization activity. COMMIT should
   return NFS4_OK, unless there has been an unexpected error.

   COMMIT differs from fsync() in that it is possible for the client to
   flush a range of the file (most likely triggered by a
   buffer-reclamation scheme on the client before the file has been
   completely written).

   The server implementation of COMMIT is reasonably simple. If the
   server receives a full file COMMIT request, that is starting at
   offset 0 and count 0, it should do the equivalent of fsync()'ing the
   file. Otherwise, it should arrange to have the cached data in the
   range specified by offset and count flushed to stable storage.
   In both cases, any metadata associated with the file must be flushed
   to stable storage before returning. It is not an error for there to
   be nothing to flush on the server. This means that the data and
   metadata that needed to be flushed have already been flushed or lost
   during the last server failure.

   The client implementation of COMMIT is a little more complex. There
   are two reasons for wanting to commit a client buffer to stable
   storage.
   The first is that the client wants to reuse a buffer. In this case,
   the offset and count of the buffer are sent to the server in the
   COMMIT request. The server then flushes any cached data based on the
   offset and count, and flushes any metadata associated with the file.
   It then returns the status of the flush and the write verifier. The
   other reason for the client to generate a COMMIT is for a full file
   flush, such as may be done at close. In this case, the client would
   gather all of the buffers for this file that contain uncommitted
   data, do the COMMIT operation with an offset of 0 and count of 0, and
   then free all of those buffers. Any other dirty buffers would be
   sent to the server in the normal fashion.

   After a buffer is written by the client with the stable parameter set
   to UNSTABLE4, the buffer must be considered as modified by the client
   until the buffer has either been flushed via a COMMIT operation or
   written via a WRITE operation with stable parameter set to FILE_SYNC4
   or DATA_SYNC4. This is done to prevent the buffer from being freed
   and reused before the data can be flushed to stable storage on the
   server.

   When a response is returned from either a WRITE or a COMMIT operation
   and it contains a write verifier that is different from that
   previously returned by the server, the client will need to retransmit
   all of the buffers containing uncommitted cached data to the server.
   How this is to be done is up to the implementer. If there is only
   one buffer of interest, then it should probably be sent back over in
   a WRITE request with the appropriate stable parameter. If there is
   more than one buffer, it might be worthwhile retransmitting all of
   the buffers in WRITE requests with the stable parameter set to
   UNSTABLE4 and then retransmitting the COMMIT operation to flush all
   of the data on the server to stable storage.
   The timing of these retransmissions is left to the implementer.

   The above description applies to page-cache-based systems as well as
   buffer-cache-based systems. In page-cache-based systems, the virtual
   memory system will need to be modified instead of the buffer cache.

15.6.  Operation 6: CREATE - Create a Non-Regular File Object

15.6.1.  SYNOPSIS

     (cfh), name, type, attrs -> (cfh), cinfo, attrset

15.6.2.  ARGUMENT

   union createtype4 switch (nfs_ftype4 type) {
    case NF4LNK:
            linktext4 linkdata;
    case NF4BLK:
    case NF4CHR:
            specdata4 devdata;
    case NF4SOCK:
    case NF4FIFO:
    case NF4DIR:
            void;
    default:
            void;  /* server should return NFS4ERR_BADTYPE */
   };

   struct CREATE4args {
           /* CURRENT_FH: directory for creation */
           createtype4     objtype;
           component4      objname;
           fattr4          createattrs;
   };

15.6.3.  RESULT

   struct CREATE4resok {
           change_info4    cinfo;
           bitmap4         attrset;        /* attributes set */
   };

   union CREATE4res switch (nfsstat4 status) {
    case NFS4_OK:
            CREATE4resok   resok4;
    default:
            void;
   };

15.6.4.  DESCRIPTION

   The CREATE operation creates a non-regular file object in a directory
   with a given name. The OPEN operation is used to create a regular
   file.

   The objname specifies the name for the new object. The objtype
   determines the type of object to be created: directory, symlink, etc.

   If an object of the same name already exists in the directory, the
   server will return the error NFS4ERR_EXIST.

   For the directory where the new file object was created, the server
   returns change_info4 information in cinfo. With the atomic field of
   the change_info4 struct, the server will indicate if the before and
   after change attributes were obtained atomically with respect to the
   file object creation.
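
   As an illustration of how a client might consume cinfo, the sketch
   below updates a cached view of the parent directory after a
   successful CREATE. The policy shown is an assumption made for the
   example, not something this section mandates: the cache is updated
   in place only when atomic is true and the "before" value matches the
   change value the client had cached, and is invalidated otherwise.
   The ChangeInfo4 and DirCache classes and the apply_create_cinfo
   helper are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeInfo4:
    # Mirrors the three fields of change_info4 (atomic, before, after).
    atomic: bool
    before: int
    after: int


@dataclass
class DirCache:
    # Hypothetical client-side cache of one directory.
    change: int = 0
    entries: set = field(default_factory=set)
    valid: bool = True


def apply_create_cinfo(cache, cinfo, new_name):
    """Update or invalidate a cached directory after a CREATE reply."""
    if cache.valid and cinfo.atomic and cache.change == cinfo.before:
        # No other change slipped in: the only difference is our create.
        cache.entries.add(new_name)
        cache.change = cinfo.after
    else:
        # Another change may have occurred; discard the cached contents.
        cache.entries.clear()
        cache.valid = False
```

   With a cached change value of 5, a reply carrying atomic = true,
   before = 5, and after = 6 lets the sketch add the new name and
   advance the cached value to 6; any other combination drops the
   cached entries so that they are refetched.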

   If the objname is of zero length, NFS4ERR_INVAL will be returned.
   The objname is also subject to the normal UTF-8, character support,
   and name checks. See Section 12.7 for further discussion.

   The current filehandle is replaced by that of the new object.

   The createattrs specifies the initial set of attributes for the
   object. The set of attributes may include any writable attribute
   valid for the object type. When the operation is successful, the
   server will return to the client an attribute mask signifying which
   attributes were successfully set for the object.

   If createattrs includes neither the owner attribute nor an ACL with
   an ACE for the owner, and if the server's file system both supports
   and requires an owner attribute (or an owner ACE) then the server
   MUST derive the owner (or the owner ACE). This would typically be
   from the principal indicated in the RPC credentials of the call, but
   the server's operating environment or file system semantics may
   dictate other methods of derivation. Similarly, if createattrs
   includes neither the group attribute nor a group ACE, and if the
   server's file system both supports and requires the notion of a group
   attribute (or group ACE), the server MUST derive the group attribute
   (or the corresponding owner ACE) for the file. This could be from
   the RPC call's credentials, such as the group principal if the
   credentials include it (such as with AUTH_SYS), from the group
   identifier associated with the principal in the credentials (e.g.,
   POSIX systems have a user database [getpwnam] that has the group
   identifier for every user identifier), inherited from the directory
   in which the object is created, or whatever else the server's
   operating environment or file system semantics dictate. This applies
   to the OPEN operation too.

   Conversely, it is possible the client will specify in createattrs an
   owner attribute or group attribute or ACL that the principal
   indicated in the RPC call's credentials does not have permissions to
   create files for. The error to be returned in this instance is
   NFS4ERR_PERM. This applies to the OPEN operation too.

15.6.5.  IMPLEMENTATION

   If the client desires to set attribute values after the create, a
   SETATTR operation can be added to the COMPOUND request so that the
   appropriate attributes will be set.

15.7.  Operation 7: DELEGPURGE - Purge Delegations Awaiting Recovery

15.7.1.  SYNOPSIS

     clientid ->

15.7.2.  ARGUMENT

   struct DELEGPURGE4args {
           clientid4       clientid;
   };

15.7.3.  RESULT

   struct DELEGPURGE4res {
           nfsstat4        status;
   };

15.7.4.  DESCRIPTION

   Purges all of the delegations awaiting recovery for a given client.
   This is useful for clients which do not commit delegation information
   to stable storage to indicate that conflicting requests need not be
   delayed by the server awaiting recovery of delegation information.

   This operation is provided to support clients that record delegation
   information on stable storage on the client. In this case,
   DELEGPURGE should be issued immediately after doing delegation
   recovery (using CLAIM_DELEGATE_PREV) on all delegations known to the
   client. Doing so will notify the server that no additional
   delegations for the client will be recovered, allowing it to free
   resources and avoid delaying other clients that make requests that
   conflict with the unrecovered delegations. All clients SHOULD use
   DELEGPURGE as part of recovery once it is known that no further
   CLAIM_DELEGATE_PREV recovery will be done.
   This includes clients that do not record delegation information on
   stable storage, who would then do a DELEGPURGE immediately after
   SETCLIENTID_CONFIRM.

   The set of delegations known to the server and the client may be
   different. The reasons for this include:

   o  A client may fail after making a request which resulted in
      delegation but before it received the results and committed them
      to the client's stable storage.

   o  A client may fail after deleting its indication that a delegation
      exists but before the delegation return is fully processed by the
      server.

   o  In the case in which the server and the client restart, the server
      may have limited persistent recording of delegation to a subset of
      those in existence.

   o  A client may have only persistently recorded information about a
      subset of delegations.

   The server MAY support DELEGPURGE, but its support or non-support
   should match that of CLAIM_DELEGATE_PREV:

   o  A server may support both DELEGPURGE and CLAIM_DELEGATE_PREV.

   o  A server may support neither DELEGPURGE nor CLAIM_DELEGATE_PREV.

   This fact allows a client starting up to determine if the server is
   prepared to support persistent storage of delegation information and
   thus whether it may use write-back caching to local persistent
   storage, relying on CLAIM_DELEGATE_PREV recovery to allow such
   changed data to be flushed safely to the server in the event of
   client restart.

15.8.  Operation 8: DELEGRETURN - Return Delegation

15.8.1.  SYNOPSIS

     (cfh), stateid ->

15.8.2.  ARGUMENT

   struct DELEGRETURN4args {
           /* CURRENT_FH: delegated file */
           stateid4        deleg_stateid;
   };

15.8.3.  RESULT

   struct DELEGRETURN4res {
           nfsstat4        status;
   };

15.8.4.  DESCRIPTION

   Returns the delegation represented by the current filehandle and
   stateid.

   Delegations may be returned when recalled or voluntarily (i.e.,
   before the server has recalled them). In either case the client must
   properly propagate state changed under the context of the delegation
   to the server before returning the delegation.

15.9. Operation 9: GETATTR - Get Attributes

15.9.1. SYNOPSIS

     (cfh), attrbits -> attrbits, attrvals

15.9.2. ARGUMENT

   struct GETATTR4args {
           /* CURRENT_FH: directory or file */
           bitmap4         attr_request;
   };

15.9.3. RESULT

   struct GETATTR4resok {
           fattr4          obj_attributes;
   };

   union GETATTR4res switch (nfsstat4 status) {
    case NFS4_OK:
            GETATTR4resok  resok4;
    default:
            void;
   };

15.9.4. DESCRIPTION

   The GETATTR operation will obtain attributes for the file system
   object specified by the current filehandle. The client sets a bit in
   the bitmap argument for each attribute value that it would like the
   server to return. The server returns an attribute bitmap that
   indicates the attribute values for which it was able to return
   values, followed by the attribute values ordered lowest attribute
   number first.

   The server MUST return a value for each attribute that the client
   requests if the attribute is supported by the server. If the server
   does not support an attribute or cannot approximate a useful value
   then it MUST NOT return the attribute value and MUST NOT set the
   attribute bit in the result bitmap. The server MUST return an error
   if it supports an attribute on the target but cannot obtain its
   value. In that case no attribute values will be returned.

   File systems which are absent should be treated as having support for
   a very small set of attributes as described in GETATTR Within an
   Absent File System (Section 8.3.1), even if previously, when the file
   system was present, more attributes were supported.
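
   The bitmap handling described above for GETATTR can be illustrated
   with a small model. This is only a sketch under stated assumptions:
   it assumes the usual bitmap4 layout (attribute n is bit n mod 32 of
   32-bit word n / 32), the attribute numbers and the helper names
   (bitmap_from_attrs, attrs_from_bitmap, getattr_reply) are chosen for
   illustration, and it does not model the error case in which a
   supported attribute's value cannot be obtained.

```python
# Illustrative attribute numbers; treat them as assumptions of this sketch.
ATTR_TYPE = 1
ATTR_CHANGE = 3
ATTR_SIZE = 4
ATTR_NOT_SUPPORTED_HERE = 40  # stands in for an attribute the toy server lacks


def bitmap_from_attrs(attr_numbers):
    """Build a bitmap4-style list of 32-bit words from attribute numbers."""
    words = []
    for n in attr_numbers:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def attrs_from_bitmap(words):
    """Return the attribute numbers set in a bitmap, lowest number first."""
    return [w * 32 + b
            for w, word in enumerate(words)
            for b in range(32)
            if word & (1 << b)]


def getattr_reply(request_bitmap, supported_values):
    """Model a GETATTR reply from a toy server.

    supported_values maps attribute number -> value. Requested attributes
    the server does not support are left out of both the result bitmap
    and the value list; the values follow ascending attribute number.
    """
    returned = [n for n in attrs_from_bitmap(request_bitmap)
                if n in supported_values]
    values = [supported_values[n] for n in returned]
    return bitmap_from_attrs(returned), values
```

   For example, a request for size, type, and an unsupported attribute
   yields a result bitmap with only the type and size bits set, and the
   type value precedes the size value regardless of the order in which
   the client thought of them.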

   All servers MUST support the REQUIRED attributes as specified in the
   section File Attributes (Section 5), for all file systems, with the
   exception of absent file systems.

   On success, the current filehandle retains its value.

15.9.5. IMPLEMENTATION

   Suppose there is an OPEN_DELEGATE_WRITE delegation held by another
   client for the file in question and size and/or change are among the
   set of attributes being interrogated. The server has two choices.
   First, the server can obtain the actual current value of these
   attributes from the client holding the delegation by using the
   CB_GETATTR callback. Second, the server, particularly when the
   delegated client is unresponsive, can recall the delegation in
   question. The GETATTR MUST NOT proceed until one of the following
   occurs:

   o  The requested attribute values are returned in the response to
      CB_GETATTR.

   o  The OPEN_DELEGATE_WRITE delegation is returned.

   o  The OPEN_DELEGATE_WRITE delegation is revoked.

   Unless one of the above happens very quickly, one or more
   NFS4ERR_DELAY errors will be returned while a delegation is
   outstanding.

15.10. Operation 10: GETFH - Get Current Filehandle

15.10.1. SYNOPSIS

     (cfh) -> filehandle

15.10.2. ARGUMENT

     /* CURRENT_FH: */
     void;

15.10.3. RESULT

   struct GETFH4resok {
           nfs_fh4         object;
   };

   union GETFH4res switch (nfsstat4 status) {
    case NFS4_OK:
            GETFH4resok     resok4;
    default:
            void;
   };

15.10.4. DESCRIPTION

   This operation returns the current filehandle value.

   On success, the current filehandle retains its value.

15.10.5. IMPLEMENTATION

   Operations that change the current filehandle like LOOKUP or CREATE
   do not automatically return the new filehandle as a result.
   For instance, if a client needs to look up a directory entry and
   obtain its filehandle, then the following request is needed:

         PUTFH  (directory filehandle)
         LOOKUP (entry name)
         GETFH

15.11. Operation 11: LINK - Create Link to a File

15.11.1. SYNOPSIS

     (sfh), (cfh), newname -> (cfh), cinfo

15.11.2. ARGUMENT

   struct LINK4args {
           /* SAVED_FH: source object */
           /* CURRENT_FH: target directory */
           component4      newname;
   };

15.11.3. RESULT

   struct LINK4resok {
           change_info4    cinfo;
   };

   union LINK4res switch (nfsstat4 status) {
    case NFS4_OK:
            LINK4resok resok4;
    default:
            void;
   };

15.11.4. DESCRIPTION

   The LINK operation creates an additional newname for the file
   represented by the saved filehandle, as set by the SAVEFH operation,
   in the directory represented by the current filehandle. The existing
   file and the target directory must reside within the same file system
   on the server. On success, the current filehandle will continue to
   be the target directory. If an object exists in the target directory
   with the same name as newname, the server must return NFS4ERR_EXIST.

   For the target directory, the server returns change_info4 information
   in cinfo. With the atomic field of the change_info4 struct, the
   server will indicate if the before and after change attributes were
   obtained atomically with respect to the link creation.

   If the newname has a length of 0 (zero), or if newname does not obey
   the UTF-8 definition, the error NFS4ERR_INVAL will be returned.

15.11.5. IMPLEMENTATION

   Changes to any property of the "hard" linked files are reflected in
   all of the linked files. When a link is made to a file, the
   attributes for the file should have a value for numlinks that is one
   greater than the value before the LINK operation.
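
   The basic LINK outcomes can be modeled with a toy in-memory object
   store. This is a minimal sketch, not a server implementation: the
   Obj class, its fields, and the link() helper are hypothetical, the
   order in which the checks are made is an assumption, and UTF-8
   validation, named attribute directories, and change_info4 are left
   out. The comparison of fsid values stands in for the "same file
   system" requirement, whose failure is reported as NFS4ERR_XDEV.

```python
class Obj:
    """A toy file system object; fields are illustrative, not protocol types."""

    def __init__(self, fsid, is_dir=False):
        self.fsid = fsid                        # file system identifier
        self.numlinks = 1                       # link count of the object
        self.entries = {} if is_dir else None   # name -> Obj for directories


def link(saved, target_dir, newname):
    """Return an NFSv4 status name for a LINK of `saved` into `target_dir`."""
    if len(newname) == 0:
        return "NFS4ERR_INVAL"      # zero-length newname
    if saved.fsid != target_dir.fsid:
        return "NFS4ERR_XDEV"       # objects on different file systems
    if newname in target_dir.entries:
        return "NFS4ERR_EXIST"      # name already present in the directory
    target_dir.entries[newname] = saved
    saved.numlinks += 1             # one greater than before the LINK
    return "NFS4_OK"
```

   In this model, a failed LINK leaves numlinks unchanged, and only a
   successful one increments it by exactly one.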

   The statement "file and the target directory must reside within the
   same file system on the server" means that the fsid fields in the
   attributes for the objects are the same. If they reside on different
   file systems, the error NFS4ERR_XDEV is returned. This error may be
   returned by some servers when there is an internal partitioning of a
   file system that the LINK operation would violate.

   On some servers, "." and ".." are illegal values for newname and the
   error NFS4ERR_BADNAME will be returned if they are specified.

   When the current filehandle designates a named attribute directory
   and the object to be linked (the saved filehandle) is not a named
   attribute for the same object, the error NFS4ERR_XDEV MUST be
   returned. When the saved filehandle designates a named attribute and
   the current filehandle is not the appropriate named attribute
   directory, the error NFS4ERR_XDEV MUST also be returned.

   When the current filehandle designates a named attribute directory
   and the object to be linked (the saved filehandle) is a named
   attribute within that directory, the server MAY return the error
   NFS4ERR_NOTSUPP.

   In the case that newname is already linked to the file represented by
   the saved filehandle, the server will return NFS4ERR_EXIST.

   Note that symbolic links are created with the CREATE operation.

15.12. Operation 12: LOCK - Create Lock

15.12.1. SYNOPSIS

     (cfh) locktype, reclaim, offset, length, locker -> stateid

15.12.2.
ARGUMENT

   enum nfs_lock_type4 {
           READ_LT         = 1,
           WRITE_LT        = 2,
           READW_LT        = 3,    /* blocking read */
           WRITEW_LT       = 4     /* blocking write */
   };

   /*
    * For LOCK, transition from open_owner to new lock_owner
    */
   struct open_to_lock_owner4 {
           seqid4          open_seqid;
           stateid4        open_stateid;
           seqid4          lock_seqid;
           lock_owner4     lock_owner;
   };

   /*
    * For LOCK, existing lock_owner continues to request file locks
    */
   struct exist_lock_owner4 {
           stateid4        lock_stateid;
           seqid4          lock_seqid;
   };

   union locker4 switch (bool new_lock_owner) {
    case TRUE:
           open_to_lock_owner4     open_owner;
    case FALSE:
           exist_lock_owner4       lock_owner;
   };

   /*
    * LOCK/LOCKT/LOCKU: Record lock management
    */
   struct LOCK4args {
           /* CURRENT_FH: file */
           nfs_lock_type4  locktype;
           bool            reclaim;
           offset4         offset;
           length4         length;
           locker4         locker;
   };

15.12.3. RESULT

   struct LOCK4denied {
           offset4         offset;
           length4         length;
           nfs_lock_type4  locktype;
           lock_owner4     owner;
   };

   struct LOCK4resok {
           stateid4        lock_stateid;
   };

   union LOCK4res switch (nfsstat4 status) {
    case NFS4_OK:
            LOCK4resok     resok4;
    case NFS4ERR_DENIED:
            LOCK4denied    denied;
    default:
            void;
   };

15.12.4. DESCRIPTION

   The LOCK operation requests a byte-range lock for the byte range
   specified by the offset and length parameters. The lock type is also
   specified to be one of the nfs_lock_type4s. If this is a reclaim
   request, the reclaim parameter will be TRUE.

   Bytes in a file may be locked even if those bytes are not currently
   allocated to the file. To lock the file from a specific offset
   through the end-of-file (no matter how long the file actually is) use
   a length field with all bits set to 1 (one).
   If the length is zero,
   or if a length that is not all bits set to one is specified and the
   length, when added to the offset, exceeds the maximum 64-bit unsigned
   integer value, the error NFS4ERR_INVAL will result.

   Some servers may only support locking for byte offsets that fit
   within 32 bits. If the client specifies a range that includes a byte
   beyond the last byte offset of the 32-bit range, but does not include
   the last byte offset of the 32-bit range and all of the byte offsets
   beyond it, up to the end of the valid 64-bit range, such a 32-bit
   server MUST return the error NFS4ERR_BAD_RANGE.

   In the case that the lock is denied, the owner, offset, and length of
   a conflicting lock are returned.

   On success, the current filehandle retains its value.

15.12.5. IMPLEMENTATION

   If the server is unable to determine the exact offset and length of
   the conflicting lock, the same offset and length that were provided
   in the arguments should be returned in the denied results. Section 9
   contains a full description of this and the other file locking
   operations.

   LOCK operations are subject to permission checks and to checks
   against the access type of the associated file. However, the
   specific rights and modes required for various types of locks reflect
   the semantics of the server-exported file system, and are not
   specified by the protocol. For example, Windows 2000 allows a write
   lock of a file open for READ, while a POSIX-compliant system does
   not.
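
   The range rules given in the DESCRIPTION of LOCK can be expressed as
   a small validation routine. The following is a sketch that follows
   the wording of those rules literally; the function name
   check_lock_range and the server_is_32bit flag are hypothetical, and
   status values are returned as names rather than numeric codes.

```python
UINT64_MAX = 0xFFFFFFFFFFFFFFFF   # length with all bits set: "to end-of-file"
UINT32_MAX = 0xFFFFFFFF           # last byte offset of the 32-bit range


def check_lock_range(offset, length, server_is_32bit=False):
    """Return the status name a server would give for this offset/length."""
    if length == 0:
        return "NFS4ERR_INVAL"

    to_eof = (length == UINT64_MAX)

    # Python integers do not wrap, so the sum can be compared directly.
    if not to_eof and offset + length > UINT64_MAX:
        return "NFS4ERR_INVAL"

    if server_is_32bit:
        last = UINT64_MAX if to_eof else offset + length - 1
        if last > UINT32_MAX:
            # A range reaching past 32 bits is acceptable only if it also
            # covers the last 32-bit offset and everything beyond it.
            covers_tail = (offset <= UINT32_MAX and last == UINT64_MAX)
            if not covers_tail:
                return "NFS4ERR_BAD_RANGE"

    return "NFS4_OK"
```

   Under this reading, a 32-bit server accepts a range that crosses the
   32-bit boundary only when the request uses the all-ones length and
   starts at or below offset 0xFFFFFFFF.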

   When the client makes a lock request that corresponds to a range that
   the lock-owner has locked already (with the same or different lock
   type), or to a sub-region of such a range, or to a region which
   includes multiple locks already granted to that lock-owner, in whole
   or in part, and the server does not support such locking operations
   (i.e., does not support POSIX locking semantics), the server will
   return the error NFS4ERR_LOCK_RANGE. In that case, the client may
   return an error, or it may emulate the required operations, using
   only LOCK for ranges that do not include any bytes already locked by
   that lock-owner and LOCKU of locks held by that lock-owner
   (specifying an exactly-matching range and type). Similarly, when the
   client makes a lock request that amounts to upgrading (changing from
   a read lock to a write lock) or downgrading (changing from write lock
   to a read lock) an existing record lock, and the server does not
   support such a lock, the server will return NFS4ERR_LOCK_NOTSUPP.
   Such operations may not perfectly reflect the required semantics in
   the face of conflicting lock requests from other clients.

   When a client holds an OPEN_DELEGATE_WRITE delegation, the client
   holding that delegation is assured that there are no opens by other
   clients. Thus, there can be no conflicting LOCK operations from such
   clients. Therefore, the client may be handling locking requests
   locally, without doing LOCK operations on the server. If it does
   that, it must be prepared to update the lock status on the server, by
   sending appropriate LOCK and LOCKU operations before returning the
   delegation.

   When one or more clients hold OPEN_DELEGATE_READ delegations, any
   LOCK operation where the server is implementing mandatory locking
   semantics MUST result in the recall of all such delegations.
   The LOCK operation may not be granted until all such delegations are
   returned or revoked. Except where this happens very quickly, one or
   more NFS4ERR_DELAY errors will be returned to requests made while the
   delegation remains outstanding.

   The locker argument specifies the lock-owner that is associated with
   the LOCK request. The locker4 structure is a switched union that
   indicates whether the client has already created byte-range locking
   state associated with the current open file and lock-owner. There
   are multiple cases to be considered, corresponding to possible
   combinations of whether locking state has been created for the
   current open file and lock-owner, and whether the boolean
   new_lock_owner is set. In all of the cases, there is a lock_seqid
   specified, whether the lock-owner is specified explicitly or
   implicitly. This seqid value is used for checking lock-owner
   sequencing/replay issues. When the given lock-owner is not known to
   the server, this establishes an initial sequence value for the new
   lock-owner.

   o  In the case in which the state has been created and the boolean is
      false, the only part of the argument other than lock_seqid is just
      a stateid representing the set of locks associated with that open
      file and lock-owner.

   o  In the case in which the state has been created and the boolean is
      true, the server rejects the request with the error
      NFS4ERR_BAD_SEQID. The only exception is where there is a
      retransmission of a previous request in which the boolean was
      true. In this case, the lock_seqid will match the original
      request and the response will reflect the final case, below.

   o  In the case where no byte-range locking state has been established
      and the boolean is true, the argument contains an
      open_to_lock_owner structure which specifies the stateid of the
      open file and the lock-owner to be used for the lock. Note that
      although the open-owner is not given explicitly, the open_seqid
      associated with it is used to check for open-owner sequencing
      issues. This case provides a method to use the established state
      of the open_stateid to transition to the use of a lock stateid.

15.13. Operation 13: LOCKT - Test For Lock

15.13.1. SYNOPSIS

     (cfh) locktype, offset, length, owner -> {void, NFS4ERR_DENIED ->
     owner}

15.13.2. ARGUMENT

   struct LOCKT4args {
           /* CURRENT_FH: file */
           nfs_lock_type4  locktype;
           offset4         offset;
           length4         length;
           lock_owner4     owner;
   };

15.13.3. RESULT

   union LOCKT4res switch (nfsstat4 status) {
    case NFS4ERR_DENIED:
            LOCK4denied    denied;
    case NFS4_OK:
            void;
    default:
            void;
   };

15.13.4. DESCRIPTION

   The LOCKT operation tests the lock as specified in the arguments. If
   a conflicting lock exists, the owner, offset, length, and type of the
   conflicting lock are returned; if no lock is held, nothing other than
   NFS4_OK is returned. Lock types READ_LT and READW_LT are processed
   in the same way in that a conflicting lock test is done without
   regard to blocking or non-blocking. The same is true for WRITE_LT
   and WRITEW_LT.

   The ranges are specified as for LOCK. The NFS4ERR_INVAL and
   NFS4ERR_BAD_RANGE errors are returned under the same circumstances as
   for LOCK.

   On success, the current filehandle retains its value.

15.13.5.
IMPLEMENTATION

   If the server is unable to determine the exact offset and length of
   the conflicting lock, the same offset and length that were provided
   in the arguments should be returned in the denied results. Section 9
   contains further discussion of the file locking mechanisms.

   LOCKT uses a lock_owner4 rather than a stateid4, as is used in LOCK,
   to identify the owner. This is because the client does not have to
   open the file to test for the existence of a lock, so a stateid may
   not be available.

   The test for conflicting locks SHOULD exclude locks for the current
   lock-owner. Note that since such locks are not examined the possible
   existence of overlapping ranges may not affect the results of LOCKT.
   If the server does examine locks that match the lock-owner for the
   purpose of range checking, NFS4ERR_LOCK_RANGE may be returned. In
   the event that it returns NFS4_OK, clients may do a LOCK and receive
   NFS4ERR_LOCK_RANGE on the LOCK request because of the flexibility
   provided to the server.

   When a client holds an OPEN_DELEGATE_WRITE delegation, it may choose
   (see Section 15.12.5) to handle LOCK requests locally. In such a
   case, LOCKT requests will similarly be handled locally.

15.14. Operation 14: LOCKU - Unlock File

15.14.1. SYNOPSIS

     (cfh) type, seqid, stateid, offset, length -> stateid

15.14.2. ARGUMENT

   struct LOCKU4args {
           /* CURRENT_FH: file */
           nfs_lock_type4  locktype;
           seqid4          seqid;
           stateid4        lock_stateid;
           offset4         offset;
           length4         length;
   };

15.14.3. RESULT

   union LOCKU4res switch (nfsstat4 status) {
    case NFS4_OK:
            stateid4       lock_stateid;
    default:
            void;
   };

15.14.4. DESCRIPTION

   The LOCKU operation unlocks the byte-range lock specified by the
   parameters.
   The client may set the locktype field to any value that
   is legal for the nfs_lock_type4 enumerated type, and the server MUST
   accept any legal value for locktype. Any legal value for locktype
   has no effect on the success or failure of the LOCKU operation.

   The ranges are specified as for LOCK. The NFS4ERR_INVAL and
   NFS4ERR_BAD_RANGE errors are returned under the same circumstances as
   for LOCK.

   On success, the current filehandle retains its value.

15.14.5. IMPLEMENTATION

   If the area to be unlocked does not correspond exactly to a lock
   actually held by the lock-owner, the server may return the error
   NFS4ERR_LOCK_RANGE. This includes the case in which the area is not
   locked, where the area is a sub-range of the area locked, where it
   overlaps the area locked without matching exactly, or where the area
   specified includes multiple locks held by the lock-owner. In all of
   these cases, allowed by POSIX locking [fcntl] semantics, a client
   receiving this error should, if it desires support for such
   operations, simulate the operation using LOCKU on ranges
   corresponding to locks it actually holds, possibly followed by LOCK
   requests for the sub-ranges not being unlocked.

   When a client holds an OPEN_DELEGATE_WRITE delegation, it may choose
   (see Section 15.12.5) to handle LOCK requests locally. In such a
   case, LOCKU requests will similarly be handled locally.

15.15. Operation 15: LOOKUP - Lookup Filename

15.15.1. SYNOPSIS

     (cfh), component -> (cfh)

15.15.2. ARGUMENT

   struct LOOKUP4args {
           /* CURRENT_FH: directory */
           component4      objname;
   };

15.15.3. RESULT

   struct LOOKUP4res {
           /* CURRENT_FH: object */
           nfsstat4        status;
   };

15.15.4. DESCRIPTION

   This operation LOOKUPs or finds a file system object using the
   directory specified by the current filehandle.
   LOOKUP evaluates the
   component and if the object exists the current filehandle is replaced
   with the component's filehandle.

   If the component cannot be evaluated either because it does not exist
   or because the client does not have permission to evaluate the
   component, then an error will be returned and the current filehandle
   will be unchanged.

   If the component is of zero length, NFS4ERR_INVAL will be returned.
   The component is also subject to the normal UTF-8, character support,
   and name checks. See Section 12.7 for further discussion.

15.15.5. IMPLEMENTATION

   If the client wants to achieve the effect of a multi-component
   lookup, it may construct a COMPOUND request such as (and obtain each
   filehandle):

         PUTFH  (directory filehandle)
         LOOKUP "pub"
         GETFH
         LOOKUP "foo"
         GETFH
         LOOKUP "bar"
         GETFH

   NFSv4 servers depart from the semantics of previous NFS versions in
   allowing LOOKUP requests to cross mount points on the server. The
   client can detect a mount point crossing by comparing the fsid
   attribute of the directory with the fsid attribute of the directory
   looked up. If the fsids are different then the new directory is a
   server mount point. UNIX clients that detect a mount point crossing
   will need to mount the server's file system. This needs to be done
   to maintain the file object identity checking mechanisms common to
   UNIX clients.

   Servers that limit NFS access to "shares" or "exported" file systems
   should provide a pseudo-file system into which the exported file
   systems can be integrated, so that clients can browse the server's
   name space. The clients' view of a pseudo file system will be
   limited to paths that lead to exported file systems.

   Note: previous versions of the protocol assigned special semantics to
   the names "." and "..".
   NFSv4 assigns no special semantics to these
   names. The LOOKUPP operator must be used to look up a parent
   directory.

   Note that this operation does not follow symbolic links. The client
   is responsible for all parsing of filenames including filenames that
   are modified by symbolic links encountered during the lookup process.

   If the current filehandle supplied is not a directory but a symbolic
   link, the error NFS4ERR_SYMLINK is returned. For all other
   non-directory file types, the error NFS4ERR_NOTDIR is returned.

15.16. Operation 16: LOOKUPP - Lookup Parent Directory

15.16.1. SYNOPSIS

     (cfh) -> (cfh)

15.16.2. ARGUMENT

     /* CURRENT_FH: object */
     void;

15.16.3. RESULT

   struct LOOKUPP4res {
           /* CURRENT_FH: directory */
           nfsstat4        status;
   };

15.16.4. DESCRIPTION

   The current filehandle is assumed to refer to a regular directory or
   a named attribute directory. LOOKUPP assigns the filehandle for its
   parent directory to be the current filehandle. If there is no parent
   directory an NFS4ERR_NOENT error must be returned. Therefore,
   NFS4ERR_NOENT will be returned by the server when the current
   filehandle is at the root or top of the server's file tree.

15.16.5. IMPLEMENTATION

   As for LOOKUP, LOOKUPP will also cross mount points.

   If the current filehandle is not a directory or named attribute
   directory, the error NFS4ERR_NOTDIR is returned.

   If the current filehandle is a named attribute directory that is
   associated with a file system object via OPENATTR (i.e., not a
   sub-directory of a named attribute directory), LOOKUPP SHOULD return
   the filehandle of the associated file system object.

15.17. Operation 17: NVERIFY - Verify Difference in Attributes

15.17.1. SYNOPSIS

     (cfh), fattr -> -

15.17.2.
ARGUMENT

   struct NVERIFY4args {
           /* CURRENT_FH: object */
           fattr4          obj_attributes;
   };

15.17.3. RESULT

   struct NVERIFY4res {
           nfsstat4        status;
   };

15.17.4. DESCRIPTION

   This operation is used to prefix a sequence of operations to be
   performed if one or more attributes have changed on some file system
   object. If all the attributes match then the error NFS4ERR_SAME must
   be returned.

   On success, the current filehandle retains its value.

15.17.5. IMPLEMENTATION

   This operation is useful as a cache validation operator. If the
   object to which the attributes belong has changed then the following
   operations may obtain new data associated with that object. For
   instance, to check if a file has been changed and obtain new data if
   it has:

         PUTFH  (public)
         LOOKUP "foobar"
         NVERIFY attrbits attrs
         READ 0 32767

   In the case that a RECOMMENDED attribute is specified in the NVERIFY
   operation and the server does not support that attribute for the file
   system object, the error NFS4ERR_ATTRNOTSUPP is returned to the
   client.

   When the attribute rdattr_error or any write-only attribute (e.g.,
   time_modify_set) is specified, the error NFS4ERR_INVAL is returned to
   the client.

15.18. Operation 18: OPEN - Open a Regular File

15.18.1. SYNOPSIS

     (cfh), seqid, share_access, share_deny, owner, openhow, claim ->
     (cfh), stateid, cinfo, rflags, attrset, delegation

15.18.2.
ARGUMENT

   /*
    * Various definitions for OPEN
    */
   enum createmode4 {
           UNCHECKED4      = 0,
           GUARDED4        = 1,
           EXCLUSIVE4      = 2
   };

   union createhow4 switch (createmode4 mode) {
    case UNCHECKED4:
    case GUARDED4:
            fattr4         createattrs;
    case EXCLUSIVE4:
            verifier4      createverf;
   };

   enum opentype4 {
           OPEN4_NOCREATE  = 0,
           OPEN4_CREATE    = 1
   };

   union openflag4 switch (opentype4 opentype) {
    case OPEN4_CREATE:
            createhow4     how;
    default:
            void;
   };

   /* Next definitions used for OPEN delegation */
   enum limit_by4 {
           NFS_LIMIT_SIZE          = 1,
           NFS_LIMIT_BLOCKS        = 2
           /* others as needed */
   };

   struct nfs_modified_limit4 {
           uint32_t        num_blocks;
           uint32_t        bytes_per_block;
   };

   union nfs_space_limit4 switch (limit_by4 limitby) {
    /* limit specified as file size */
    case NFS_LIMIT_SIZE:
            uint64_t               filesize;
    /* limit specified by number of blocks */
    case NFS_LIMIT_BLOCKS:
            nfs_modified_limit4    mod_blocks;
   };

   enum open_delegation_type4 {
           OPEN_DELEGATE_NONE      = 0,
           OPEN_DELEGATE_READ      = 1,
           OPEN_DELEGATE_WRITE     = 2
   };

   enum open_claim_type4 {
           CLAIM_NULL              = 0,
           CLAIM_PREVIOUS          = 1,
           CLAIM_DELEGATE_CUR      = 2,
           CLAIM_DELEGATE_PREV     = 3
   };

   struct open_claim_delegate_cur4 {
           stateid4        delegate_stateid;
           component4      file;
   };

   union open_claim4 switch (open_claim_type4 claim) {
    /*
     * No special rights to file.
     * Ordinary OPEN of the specified file.
     */
    case CLAIM_NULL:
           /* CURRENT_FH: directory */
           component4      file;
    /*
     * Right to the file established by an
     * open previous to server reboot. File
     * identified by filehandle obtained at
     * that time rather than by name.
     */
    case CLAIM_PREVIOUS:
           /* CURRENT_FH: file being reclaimed */
           open_delegation_type4   delegate_type;

    /*
     * Right to file based on a delegation
     * granted by the server. File is
     * specified by name.
     */
    case CLAIM_DELEGATE_CUR:
           /* CURRENT_FH: directory */
           open_claim_delegate_cur4        delegate_cur_info;

    /*
     * Right to file based on a delegation
     * granted to a previous boot instance
     * of the client. File is specified by name.
     */
    case CLAIM_DELEGATE_PREV:
           /* CURRENT_FH: directory */
           component4      file_delegate_prev;
   };

   /*
    * OPEN: Open a file, potentially receiving an open delegation
    */
   struct OPEN4args {
           seqid4          seqid;
           uint32_t        share_access;
           uint32_t        share_deny;
           open_owner4     owner;
           openflag4       openhow;
           open_claim4     claim;
   };

15.18.3. RESULT

   struct open_read_delegation4 {
    stateid4 stateid;    /* Stateid for delegation */
    bool     recall;     /* Pre-recalled flag for
                            delegations obtained
                            by reclaim (CLAIM_PREVIOUS) */

    nfsace4 permissions; /* Defines users who don't
                            need an ACCESS call to
                            open for read */
   };

   struct open_write_delegation4 {
    stateid4 stateid;      /* Stateid for delegation */
    bool     recall;       /* Pre-recalled flag for
                              delegations obtained
                              by reclaim
                              (CLAIM_PREVIOUS) */

    nfs_space_limit4
              space_limit; /* Defines condition that
                              the client must check to
                              determine whether the
                              file needs to be flushed
                              to the server on close. */

    nfsace4   permissions; /* Defines users who don't
                              need an ACCESS call as
                              part of a delegated
                              open.
                            */
   };

   union open_delegation4
   switch (open_delegation_type4 delegation_type) {
           case OPEN_DELEGATE_NONE:
                   void;
           case OPEN_DELEGATE_READ:
                   open_read_delegation4 read;
           case OPEN_DELEGATE_WRITE:
                   open_write_delegation4 write;
   };

   /*
    * Result flags
    */

   /* Client must confirm open */
   const OPEN4_RESULT_CONFIRM      = 0x00000002;
   /* Type of file locking behavior at the server */
   const OPEN4_RESULT_LOCKTYPE_POSIX = 0x00000004;

   struct OPEN4resok {
    stateid4       stateid;      /* Stateid for open */
    change_info4   cinfo;        /* Directory Change Info */
    uint32_t       rflags;       /* Result flags */
    bitmap4        attrset;      /* attribute set for create */
    open_delegation4 delegation; /* Info on any open
                                    delegation */
   };

   union OPEN4res switch (nfsstat4 status) {
    case NFS4_OK:
           /* CURRENT_FH: opened file */
           OPEN4resok      resok4;
    default:
           void;
   };

15.18.4. Warning to Client Implementers

   OPEN resembles LOOKUP in that it generates a filehandle for the
   client to use. Unlike LOOKUP though, OPEN creates server state on
   the filehandle. In normal circumstances, the client can only release
   this state with a CLOSE operation. CLOSE uses the current filehandle
   to determine which file to close. Therefore, the client MUST follow
   every OPEN operation with a GETFH operation in the same COMPOUND
   procedure. This will supply the client with the filehandle such that
   CLOSE can be used appropriately.

   Simply waiting for the lease on the file to expire is insufficient
   because the server may maintain the state indefinitely as long as
   another client does not attempt to make a conflicting access to the
   same file.

15.18.5. DESCRIPTION

   The OPEN operation creates and/or opens a regular file in a directory
   with the provided name.
   If the file does not exist at the server and
   creation is desired, specification of the method of creation is
   provided by the openhow parameter. The client has the choice of
   three creation methods: UNCHECKED4, GUARDED4, or EXCLUSIVE4.

   If the current filehandle is a named attribute directory, OPEN will
   then create or open a named attribute file. Note that exclusive
   create of a named attribute is not supported. If the createmode is
   EXCLUSIVE4 and the current filehandle is a named attribute directory,
   the server will return NFS4ERR_INVAL.

   UNCHECKED4 means that the file should be created if a file of that
   name does not exist and encountering an existing regular file of that
   name is not an error. For this type of create, createattrs specifies
   the initial set of attributes for the file. The set of attributes
   may include any writable attribute valid for regular files. When an
   UNCHECKED4 create encounters an existing file, the attributes
   specified by createattrs are not used, except that when a size of
   zero is specified, the existing file is truncated. If GUARDED4 is
   specified, the server checks for the presence of a duplicate object
   by name before performing the create. If a duplicate exists, an
   error of NFS4ERR_EXIST is returned as the status. If the object does
   not exist, the request is performed as described for UNCHECKED4. For
   each of these cases (UNCHECKED4 and GUARDED4) where the operation is
   successful, the server will return to the client an attribute mask
   signifying which attributes were successfully set for the object.

   EXCLUSIVE4 specifies that the server is to follow exclusive creation
   semantics, using the verifier to ensure exclusive creation of the
   target. The server should check for the presence of a duplicate
   object by name.
If the object does not exist, the server creates the 11132 object and stores the verifier with the object. If the object does 11133 exist and the stored verifier matches the client provided verifier, 11134 the server uses the existing object as the newly created object. If 11135 the stored verifier does not match, then an error of NFS4ERR_EXIST is 11136 returned. No attributes may be provided in this case, since the 11137 server may use an attribute of the target object to store the 11138 verifier. If the server uses an attribute to store the exclusive 11139 create verifier, it will signify which attribute by setting the 11140 appropriate bit in the attribute mask that is returned in the 11141 results. 11143 For the target directory, the server returns change_info4 information 11144 in cinfo. With the atomic field of the change_info4 struct, the 11145 server will indicate if the before and after change attributes were 11146 obtained atomically with respect to the link creation. 11148 Upon successful creation, the current filehandle is replaced by that 11149 of the new object. 11151 The OPEN operation provides for Windows share reservation capability 11152 with the use of the share_access and share_deny fields of the OPEN 11153 arguments. The client specifies at OPEN the required share_access 11154 and share_deny modes. For clients that do not directly support 11155 SHAREs (i.e., UNIX), the expected deny value is DENY_NONE. In the 11156 case that there is an existing SHARE reservation that conflicts with 11157 the OPEN request, the server returns the error NFS4ERR_SHARE_DENIED. 11158 For a complete SHARE request, the client must provide values for the 11159 owner and seqid fields for the OPEN argument. For additional 11160 discussion of SHARE semantics see Section 9.9.
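The share reservation conflict test mentioned above can be illustrated with a minimal sketch. It is not a normative algorithm. The bit values follow the protocol's share_access and share_deny constants, while the function name check_share and the list-of-pairs representation of reservations already in effect are assumptions made only for this illustration.

```python
# Share reservation bits used in the OPEN arguments.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2
OPEN4_SHARE_ACCESS_BOTH = 0x3
OPEN4_SHARE_DENY_NONE = 0x0
OPEN4_SHARE_DENY_READ = 0x1
OPEN4_SHARE_DENY_WRITE = 0x2
OPEN4_SHARE_DENY_BOTH = 0x3

NFS4_OK = 0
NFS4ERR_SHARE_DENIED = 10015


def check_share(existing, share_access, share_deny):
    """Check a new OPEN against reservations already in effect.

    `existing` is a hypothetical list of (access, deny) pairs for the
    opens currently held on the file.  A conflict exists when the new
    access intersects a held deny mode, or the new deny mode
    intersects a held access mode.
    """
    for held_access, held_deny in existing:
        if (share_access & held_deny) or (share_deny & held_access):
            return NFS4ERR_SHARE_DENIED
    return NFS4_OK
```

A UNIX-style client that always sends OPEN4_SHARE_DENY_NONE can only conflict with deny modes set by other opens, which is why DENY_NONE is the expected value for such clients.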
11162 In the case that the client is recovering state from a server 11163 failure, the claim field of the OPEN argument is used to signify that 11164 the request is meant to reclaim state previously held. 11166 The "claim" field of the OPEN argument is used to specify the file to 11167 be opened and the state information which the client claims to 11168 possess. There are four basic claim types which cover the various 11169 situations for an OPEN. They are as follows: 11171 CLAIM_NULL: For the client, this is a new OPEN request and there is 11172 no previous state associated with the file for the client. 11174 CLAIM_PREVIOUS: The client is claiming basic OPEN state for a file 11175 that was held previous to a server reboot. Generally used when a 11176 server is returning persistent filehandles; the client may not 11177 have the file name to reclaim the OPEN. 11179 CLAIM_DELEGATE_CUR: The client is claiming a delegation for OPEN as 11180 granted by the server. Generally this is done as part of 11181 recalling a delegation. 11183 CLAIM_DELEGATE_PREV: The client is claiming a delegation granted to 11184 a previous client instance. This claim type is for use after a 11185 SETCLIENTID_CONFIRM and before the corresponding DELEGPURGE in two 11186 situations: after a client reboot and after a lease expiration 11187 that resulted in loss of all lock state. The server MAY support 11188 CLAIM_DELEGATE_PREV. If it does support CLAIM_DELEGATE_PREV, 11189 SETCLIENTID_CONFIRM MUST NOT remove the client's delegation state, 11190 and the server MUST support the DELEGPURGE operation. 11192 The following errors apply to use of the CLAIM_DELEGATE_PREV claim 11193 type: 11195 o NFS4ERR_NOTSUPP is returned if the server does not support this 11196 claim type. 11198 o NFS4ERR_INVAL is returned if the reclaim is done at an 11199 inappropriate time, e.g., after DELEGPURGE has been done.
11201 o NFS4ERR_BAD_RECLAIM is returned if the other error conditions do 11202 not apply and the server has no record of the delegation whose 11203 reclaim is being attempted. 11205 For OPEN requests whose claim type is other than CLAIM_PREVIOUS 11206 (i.e., requests other than those devoted to reclaiming opens after a 11207 server reboot) that reach the server during its grace or lease 11208 expiration period, the server returns an error of NFS4ERR_GRACE. 11210 For any OPEN request, the server may return an open delegation, which 11211 allows further opens and closes to be handled locally on the client 11212 as described in Section 10.4. Note that delegation is up to the 11213 server to decide. The client should never assume that delegation 11214 will or will not be granted in a particular instance. It should 11215 always be prepared for either case. A partial exception is the 11216 reclaim (CLAIM_PREVIOUS) case, in which a delegation type is claimed. 11217 In this case, delegation will always be granted, although the server 11218 may specify an immediate recall in the delegation structure. 11220 The rflags returned by a successful OPEN allow the server to return 11221 information governing how the open file is to be handled. 11223 OPEN4_RESULT_CONFIRM indicates that the client MUST execute an 11224 OPEN_CONFIRM operation before using the open file. 11225 OPEN4_RESULT_LOCKTYPE_POSIX indicates the server's file locking 11226 behavior supports the complete set of POSIX locking techniques 11227 [fcntl]. From this the client can choose to manage file locking 11228 state in a way to handle a mismatch of file locking management. 11230 If the component is of zero length, NFS4ERR_INVAL will be returned. 11231 The component is also subject to the normal UTF-8, character support, 11232 and name checks. See Section 12.7 for further discussion.
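As an illustration of the rflags handling described above, a client might decode the result flags as in the following minimal sketch. The constant values are those given in the RESULT definition for OPEN. The function name decode_rflags and the dictionary it returns are hypothetical and exist only for this example.

```python
# Result flag values from the OPEN RESULT definition.
OPEN4_RESULT_CONFIRM = 0x00000002
OPEN4_RESULT_LOCKTYPE_POSIX = 0x00000004


def decode_rflags(rflags):
    """Decode the rflags word of a successful OPEN (illustrative only).

    must_confirm:  the client MUST send OPEN_CONFIRM before using
                   the open file.
    posix_locking: the server's locking behavior supports the
                   complete set of POSIX locking techniques.
    """
    return {
        "must_confirm": bool(rflags & OPEN4_RESULT_CONFIRM),
        "posix_locking": bool(rflags & OPEN4_RESULT_LOCKTYPE_POSIX),
    }
```

The two flags are independent bits, so a reply may carry either, both, or neither.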
11234 When an OPEN is done and the specified open-owner already has the 11235 resulting filehandle open, the result is to "OR" together the new 11236 share and deny status together with the existing status. In this 11237 case, only a single CLOSE need be done, even though multiple OPENs 11238 were completed. When such an OPEN is done, checking of share 11239 reservations for the new OPEN proceeds normally, with no exception 11240 for the existing OPEN held by the same owner. In this case, the 11241 stateid returned has an "other" field that matches that of the 11242 previous open while the "seqid" field is incremented to reflect the 11243 changed status due to the new open (Section 9.1.4). 11245 If the underlying file system at the server is only accessible in a 11246 read-only mode and the OPEN request has specified 11247 OPEN4_SHARE_ACCESS_WRITE or OPEN4_SHARE_ACCESS_BOTH, the server will 11248 return NFS4ERR_ROFS to indicate a read-only file system. 11250 As with the CREATE operation, the server MUST derive the owner, owner 11251 ACE, group, or group ACE if any of the four attributes are required 11252 and supported by the server's file system. For an OPEN with the 11253 EXCLUSIVE4 createmode, the server has no choice, since such OPEN 11254 calls do not include the createattrs field. Conversely, if 11255 createattrs is specified, and includes owner or group (or 11256 corresponding ACEs) that the principal in the RPC call's credentials 11257 does not have authorization to create files for, then the server may 11258 return NFS4ERR_PERM. 11260 In the case of an OPEN that specifies a size of zero (e.g., 11261 truncation) and the file has named attributes, the named attributes 11262 are left as is. They are not removed. 11264 15.18.6. IMPLEMENTATION 11266 The OPEN operation contains support for EXCLUSIVE4 create. The 11267 mechanism is similar to the support in NFSv3 [RFC1813]. As in NFSv3, 11268 this mechanism provides reliable exclusive creation.
Exclusive 11269 create is invoked when the how parameter is EXCLUSIVE4. In this 11270 case, the client provides a verifier that can reasonably be expected 11271 to be unique. A combination of a client identifier, perhaps the 11272 client network address, and a unique number generated by the client, 11273 perhaps the RPC transaction identifier, may be appropriate. 11275 If the object does not exist, the server creates the object and 11276 stores the verifier in stable storage. For file systems that do not 11277 provide a mechanism for the storage of arbitrary file attributes, the 11278 server may use one or more elements of the object meta-data to store 11279 the verifier. The verifier must be stored in stable storage to 11280 prevent erroneous failure on retransmission of the request. It is 11281 assumed that an exclusive create is being performed because exclusive 11282 semantics are critical to the application. Because of the expected 11283 usage, exclusive CREATE does not rely solely on the normally volatile 11284 duplicate request cache for storage of the verifier. The duplicate 11285 request cache in volatile storage does not survive a crash and may 11286 actually flush on a long network partition, opening failure windows. 11287 In the UNIX local file system environment, the expected storage 11288 location for the verifier on creation is the meta-data (time stamps) 11289 of the object. For this reason, an exclusive object create may not 11290 include initial attributes because the server would have nowhere to 11291 store the verifier. 11293 If the server cannot support these exclusive create semantics, 11294 possibly because of the requirement to commit the verifier to stable 11295 storage, it should fail the OPEN request with the error, 11296 NFS4ERR_NOTSUPP. 11298 During an exclusive CREATE request, if the object already exists, the 11299 server reconstructs the object's verifier and compares it with the 11300 verifier in the request. 
If they match, the server treats the 11301 request as a success. The request is presumed to be a duplicate of 11302 an earlier, successful request for which the reply was lost and that 11303 the server duplicate request cache mechanism did not detect. If the 11304 verifiers do not match, the request is rejected with the status, 11305 NFS4ERR_EXIST. 11307 Once the client has performed a successful exclusive create, it must 11308 issue a SETATTR to set the correct object attributes. Until it does 11309 so, it should not rely upon any of the object attributes, since the 11310 server implementation may need to overload object meta-data to store 11311 the verifier. The subsequent SETATTR must not occur in the same 11312 COMPOUND request as the OPEN. This separation will guarantee that 11313 the exclusive create mechanism will continue to function properly in 11314 the face of retransmission of the request. 11316 Use of the GUARDED4 attribute does not provide exactly-once 11317 semantics. In particular, if a reply is lost and the server does not 11318 detect the retransmission of the request, the operation can fail with 11319 NFS4ERR_EXIST, even though the create was performed successfully. 11320 The client would use this behavior in the case that the application 11321 has not requested an exclusive create but has asked to have the file 11322 truncated when the file is opened. In the case of the client timing 11323 out and retransmitting the create request, the client can use 11324 GUARDED4 to prevent against a sequence like: create, write, create 11325 (retransmitted) from occurring. 11327 For SHARE reservations (see Section 9.9), the client must specify a 11328 value for share_access that is one of OPEN4_SHARE_ACCESS_READ, 11329 OPEN4_SHARE_ACCESS_WRITE, or OPEN4_SHARE_ACCESS_BOTH. For 11330 share_deny, the client must specify one of OPEN4_SHARE_DENY_NONE, 11331 OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, or 11332 OPEN4_SHARE_DENY_BOTH. 
If the client fails to do this, the server 11333 must return NFS4ERR_INVAL. 11335 Based on the share_access value (OPEN4_SHARE_ACCESS_READ, 11336 OPEN4_SHARE_ACCESS_WRITE, or OPEN4_SHARE_ACCESS_BOTH) the client 11337 should check that the requester has the proper access rights to 11338 perform the specified operation. This would generally be the results 11339 of applying the ACL access rules to the file for the current 11340 requester. However, just as with the ACCESS operation, the client 11341 should not attempt to second-guess the server's decisions, as access 11342 rights may change and may be subject to server administrative 11343 controls outside the ACL framework. If the requester is not 11344 authorized to READ or WRITE (depending on the share_access value), 11345 the server must return NFS4ERR_ACCESS. Note that since the NFS 11346 version 4 protocol does not impose any requirement that READs and 11347 WRITEs issued for an open file have the same credentials as the OPEN 11348 itself, the server still must do appropriate access checking on the 11349 READs and WRITEs themselves. 11351 If the component provided to OPEN resolves to something other than a 11352 regular file (or a named attribute), an error will be returned to the 11353 client. If it is a directory, NFS4ERR_ISDIR is returned; otherwise, 11354 NFS4ERR_SYMLINK is returned. Note that NFS4ERR_SYMLINK is returned 11355 for both symlinks and for special files of other types; NFS4ERR_INVAL 11356 would be inappropriate, since the arguments provided by the client 11357 were correct, and the client cannot necessarily know at the time it 11358 sent the OPEN that the component would resolve to a non-regular file. 11360 If the current filehandle is not a directory, the error 11361 NFS4ERR_NOTDIR will be returned. 
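The object-type checks in the two preceding paragraphs can be summarized by the following minimal sketch for opens by name. The nfs_ftype4 and status values are those of the protocol. The function name open_type_check is hypothetical, and treating a named attribute directory as an acceptable current filehandle is an assumption drawn from the DESCRIPTION of OPEN rather than a statement of any implementation's behavior.

```python
# nfs_ftype4 values.
NF4REG, NF4DIR, NF4BLK, NF4CHR, NF4LNK = 1, 2, 3, 4, 5
NF4SOCK, NF4FIFO, NF4ATTRDIR, NF4NAMEDATTR = 6, 7, 8, 9

# nfsstat4 values used below.
NFS4_OK = 0
NFS4ERR_NOTDIR = 20
NFS4ERR_ISDIR = 21
NFS4ERR_SYMLINK = 10029


def open_type_check(cfh_type, target_type):
    """Map object types to the OPEN status described in the text.

    cfh_type is the type of the current filehandle and target_type
    is the type of the object the component resolves to.
    """
    # Assumption: a named attribute directory is accepted here, since
    # OPEN on it creates or opens a named attribute file.
    if cfh_type not in (NF4DIR, NF4ATTRDIR):
        return NFS4ERR_NOTDIR
    if target_type in (NF4REG, NF4NAMEDATTR):
        return NFS4_OK
    if target_type == NF4DIR:
        return NFS4ERR_ISDIR
    # Symlinks and all other special files share one error code.
    return NFS4ERR_SYMLINK
```

Note that the sketch never returns NFS4ERR_INVAL for a non-regular target, matching the rationale that the client's arguments were correct.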
11363 If a COMPOUND contains an OPEN which establishes an 11364 OPEN_DELEGATE_WRITE delegation, then normally subsequent GETATTRs 11365 result in a CB_GETATTR being sent to the client holding the 11366 delegation. However, in the case in which the OPEN and GETATTR are 11367 part of the same COMPOUND, the server SHOULD understand that the 11368 operations are for the same client ID and avoid querying the client, 11369 which will not be able to respond. This sequence of OPEN, GETATTR 11370 SHOULD be understood as retrieving the size and change attributes 11371 at the time of OPEN. Further, as explained in Section 15.2.5, the 11372 client should not construct a COMPOUND which mixes operations for 11373 different client IDs. 11375 15.19. Operation 19: OPENATTR - Open Named Attribute Directory 11377 15.19.1. SYNOPSIS 11379 (cfh) createdir -> (cfh) 11381 15.19.2. ARGUMENT 11383 struct OPENATTR4args { 11384 /* CURRENT_FH: object */ 11385 bool createdir; 11386 }; 11388 15.19.3. RESULT 11390 struct OPENATTR4res { 11391 /* CURRENT_FH: named attr directory */ 11392 nfsstat4 status; 11393 }; 11395 15.19.4. DESCRIPTION 11397 The OPENATTR operation is used to obtain the filehandle of the named 11398 attribute directory associated with the current filehandle. The 11399 result of the OPENATTR will be a filehandle to an object of type 11400 NF4ATTRDIR. From this filehandle, READDIR and LOOKUP operations can 11401 be used to obtain filehandles for the various named attributes 11402 associated with the original file system object. Filehandles 11403 returned within the named attribute directory will have a type of 11404 NF4NAMEDATTR. 11406 The createdir argument allows the client to signify if a named 11407 attribute directory should be created as a result of the OPENATTR 11408 operation. Some clients may use the OPENATTR operation with a value 11409 of FALSE for createdir to determine if any named attributes exist for 11410 the object.
If none exist, then NFS4ERR_NOENT will be returned. If 11411 createdir has a value of TRUE and no named attribute directory 11412 exists, one is created. The creation of a named attribute directory 11413 assumes that the server has implemented named attribute support in 11414 this fashion and is not required to do so by this definition. 11416 15.19.5. IMPLEMENTATION 11418 If the server does not support named attributes for the current 11419 filehandle, an error of NFS4ERR_NOTSUPP will be returned to the 11420 client. 11422 15.20. Operation 20: OPEN_CONFIRM - Confirm Open 11424 15.20.1. SYNOPSIS 11426 (cfh), seqid, stateid -> stateid 11428 15.20.2. ARGUMENT 11430 struct OPEN_CONFIRM4args { 11431 /* CURRENT_FH: opened file */ 11432 stateid4 open_stateid; 11433 seqid4 seqid; 11434 }; 11436 15.20.3. RESULT 11437 struct OPEN_CONFIRM4resok { 11438 stateid4 open_stateid; 11439 }; 11441 union OPEN_CONFIRM4res switch (nfsstat4 status) { 11442 case NFS4_OK: 11443 OPEN_CONFIRM4resok resok4; 11444 default: 11445 void; 11446 }; 11448 15.20.4. DESCRIPTION 11450 This operation is used to confirm the sequence id usage for the first 11451 time that an open-owner is used by a client. The stateid returned 11452 from the OPEN operation is used as the argument for this operation 11453 along with the next sequence id for the open-owner. The sequence id 11454 passed to the OPEN_CONFIRM must be 1 (one) greater than the seqid 11455 passed to the OPEN operation (Section 9.1.4). If the server receives 11456 an unexpected sequence id with respect to the original open, then the 11457 server assumes that the client will not confirm the original OPEN and 11458 all state associated with the original OPEN is released by the 11459 server. 11461 On success, the current filehandle retains its value. 11463 15.20.5. IMPLEMENTATION 11465 A given client might generate many open_owner4 data structures for a 11466 given client ID.
The client will periodically either dispose of its 11467 open_owner4s or stop using them for indefinite periods of time. The 11468 latter situation is why the NFSv4 protocol does not have an explicit 11469 operation to exit an open_owner4: such an operation is of no use in 11470 that situation. Instead, to avoid unbounded memory use, the server 11471 needs to implement a strategy for disposing of open_owner4s that have 11472 no current open state for any files and have not been used recently. 11473 The time period used to determine when to dispose of open_owner4s is 11474 an implementation choice. The time period should certainly be no 11475 less than the lease time plus any grace period the server wishes to 11476 implement beyond a lease time. The OPEN_CONFIRM operation allows the 11477 server to safely dispose of unused open_owner4 data structures. 11479 In the case that a client issues an OPEN operation and the server no 11480 longer has a record of the open_owner4, the server needs to ensure 11481 that this is a new OPEN and not a replay or retransmission. 11483 Servers MUST NOT require confirmation on OPENs that grant delegations 11484 or are doing reclaim operations. See Section 9.1.11 for details. 11485 The server can easily avoid this by noting whether it has disposed of 11486 one open_owner4 for the given client ID. If the server does not 11487 support delegation, it might simply maintain a single bit that notes 11488 whether any open_owner4 (for any client) has been disposed of. 11490 The server must hold unconfirmed OPEN state until one of three events 11491 occur. First, the client sends an OPEN_CONFIRM request with the 11492 appropriate sequence id and stateid within the lease period. In this 11493 case, the OPEN state on the server goes to confirmed, and the 11494 open_owner4 on the server is fully established. 11496 Second, the client sends another OPEN request with a sequence id that 11497 is incorrect for the open_owner4 (out of sequence). 
In this case, 11498 the server assumes the second OPEN request is valid and the first one 11499 is a replay. The server cancels the OPEN state of the first OPEN 11500 request, establishes an unconfirmed OPEN state for the second OPEN 11501 request, and responds to the second OPEN request with an indication 11502 that an OPEN_CONFIRM is needed. The process then repeats itself. 11503 While there is a potential for a denial of service attack on the 11504 client, it is mitigated if the client and server require the use of a 11505 security flavor based on Kerberos V5 or some other flavor that uses 11506 cryptography. 11508 What if the server is in the unconfirmed OPEN state for a given 11509 open_owner4, and it receives an operation on the open_owner4 that has 11510 a stateid but the operation is not OPEN, or it is OPEN_CONFIRM but 11511 with the wrong stateid? Then, even if the seqid is correct, the 11512 server returns NFS4ERR_BAD_STATEID, because the server assumes the 11513 operation is a replay: if the server has no established OPEN state, 11514 then there is no way, for example, a LOCK operation could be valid. 11516 Third, neither of the two aforementioned events occur for the 11517 open_owner4 within the lease period. In this case, the OPEN state is 11518 canceled and disposal of the open_owner4 can occur. 11520 15.21. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access 11522 15.21.1. SYNOPSIS 11524 (cfh), stateid, seqid, access, deny -> stateid 11526 15.21.2. ARGUMENT 11527 struct OPEN_DOWNGRADE4args { 11528 /* CURRENT_FH: opened file */ 11529 stateid4 open_stateid; 11530 seqid4 seqid; 11531 uint32_t share_access; 11532 uint32_t share_deny; 11533 }; 11535 15.21.3. RESULT 11537 struct OPEN_DOWNGRADE4resok { 11538 stateid4 open_stateid; 11539 }; 11541 union OPEN_DOWNGRADE4res switch(nfsstat4 status) { 11542 case NFS4_OK: 11543 OPEN_DOWNGRADE4resok resok4; 11544 default: 11545 void; 11546 }; 11548 15.21.4. 
DESCRIPTION 11550 This operation is used to adjust the share_access and share_deny bits 11551 for a given open. This is necessary when a given open-owner opens 11552 the same file multiple times with different share_access and 11553 share_deny flags. In this situation, a close of one of the opens may 11554 change the appropriate share_access and share_deny flags to remove 11555 bits associated with opens no longer in effect. 11557 The share_access and share_deny bits specified in this operation 11558 replace the current ones for the specified open file. The 11559 share_access and share_deny bits specified must be exactly equal to 11560 the union of the share_access and share_deny bits specified for some 11561 subset of the OPENs in effect for current open-owner on the current 11562 file. If that constraint is not respected, the error NFS4ERR_INVAL 11563 should be returned. Since share_access and share_deny bits are 11564 subsets of those already granted, it is not possible for this request 11565 to be denied because of conflicting share reservations. 11567 As the OPEN_DOWNGRADE may change a file to be not-open-for-write and 11568 a write byte-range lock might be held, the server may have to reject 11569 the OPEN_DOWNGRADE with a NFS4ERR_LOCKS_HELD. 11571 On success, the current filehandle retains its value. 11573 15.22. Operation 22: PUTFH - Set Current Filehandle 11575 15.22.1. SYNOPSIS 11577 filehandle -> (cfh) 11579 15.22.2. ARGUMENT 11581 struct PUTFH4args { 11582 nfs_fh4 object; 11583 }; 11585 15.22.3. RESULT 11587 struct PUTFH4res { 11588 /* CURRENT_FH: */ 11589 nfsstat4 status; 11590 }; 11592 15.22.4. DESCRIPTION 11594 Replaces the current filehandle with the filehandle provided as an 11595 argument. 11597 If the security mechanism used by the requester does not meet the 11598 requirements of the filehandle provided to this operation, the server 11599 MUST return NFS4ERR_WRONGSEC. 11601 See Section 15.2.4.1 for more details on the current filehandle. 
11603 15.22.5. IMPLEMENTATION 11605 Commonly used as the first operator in an NFS request to set the 11606 context for following operations. 11608 15.23. Operation 23: PUTPUBFH - Set Public Filehandle 11610 15.23.1. SYNOPSIS 11612 - -> (cfh) 11614 15.23.2. ARGUMENT 11616 void; 11618 15.23.3. RESULT 11620 struct PUTPUBFH4res { 11621 /* CURRENT_FH: public fh */ 11622 nfsstat4 status; 11623 }; 11625 15.23.4. DESCRIPTION 11627 Replaces the current filehandle with the filehandle that represents 11628 the public filehandle of the server's name space. This filehandle 11629 may be different from the "root" filehandle which may be associated 11630 with some other directory on the server. 11632 The public filehandle concept was introduced in [RFC2054], [RFC2055], 11633 [RFC2224]. The intent for NFSv4 is that the public filehandle 11634 (represented by the PUTPUBFH operation) be used as a method of 11635 providing compatibility with the WebNFS server of NFSv2 and NFSv3. 11637 The public filehandle and the root filehandle (represented by the 11638 PUTROOTFH operation) should be equivalent. If the public and root 11639 filehandles are not equivalent, then the public filehandle MUST be a 11640 descendant of the root filehandle. 11642 15.23.5. IMPLEMENTATION 11644 Used as the first operator in an NFS request to set the context for 11645 following operations. 11647 With the NFSv2 and 3 public filehandle, the client is able to specify 11648 whether the path name provided in the LOOKUP should be evaluated as 11649 either an absolute path relative to the server's root or relative to 11650 the public filehandle. [RFC2224] contains further discussion of the 11651 functionality. With NFSv4, that type of specification is not 11652 directly available in the LOOKUP operation. The reason for this is 11653 because the component separators needed to specify absolute vs. 11654 relative are not allowed in NFSv4. 
Therefore, the client is 11655 responsible for constructing its request such that either 11656 PUTROOTFH or PUTPUBFH is used to signify absolute or relative 11657 evaluation of an NFS URL, respectively. 11659 Note that there are warnings mentioned in [RFC2224] with respect to 11660 the use of absolute evaluation and the restrictions the server may 11661 place on that evaluation with respect to how much of its namespace 11662 has been made available. These same warnings apply to NFSv4. It is 11663 likely, therefore, that because of server implementation details, an 11664 NFSv3 absolute public filehandle lookup may behave differently than 11665 an NFSv4 absolute resolution. 11667 There is a form of security negotiation as described in [RFC2755] 11668 that uses the public filehandle as a method of employing Simple and 11669 Protected GSSAPI Negotiation Mechanism (SPNEGO) [RFC4178]. This 11670 method is not available with NFSv4 as filehandles are not overloaded 11671 with special meaning and therefore do not provide the same framework 11672 as NFSv2 and NFSv3. Clients should therefore use the security 11673 negotiation mechanisms described in this RFC. 11675 15.24. Operation 24: PUTROOTFH - Set Root Filehandle 11677 15.24.1. SYNOPSIS 11679 - -> (cfh) 11681 15.24.2. ARGUMENT 11683 void; 11685 15.24.3. RESULT 11687 struct PUTROOTFH4res { 11688 /* CURRENT_FH: root fh */ 11689 nfsstat4 status; 11690 }; 11692 15.24.4. DESCRIPTION 11694 Replaces the current filehandle with the filehandle that represents 11695 the root of the server's name space. From this filehandle a LOOKUP 11696 operation can locate any other filehandle on the server. This 11697 filehandle may be different from the "public" filehandle which may be 11698 associated with some other directory on the server. 11700 See Section 15.2.4.1 for more details on the current filehandle. 11702 15.24.5.
IMPLEMENTATION 11704 Commonly used as the first operator in an NFS request to set the 11705 context for following operations. 11707 15.25. Operation 25: READ - Read from File 11709 15.25.1. SYNOPSIS 11711 (cfh), stateid, offset, count -> eof, data 11713 15.25.2. ARGUMENT 11715 struct READ4args { 11716 /* CURRENT_FH: file */ 11717 stateid4 stateid; 11718 offset4 offset; 11719 count4 count; 11720 }; 11722 15.25.3. RESULT 11724 struct READ4resok { 11725 bool eof; 11726 opaque data<>; 11727 }; 11729 union READ4res switch (nfsstat4 status) { 11730 case NFS4_OK: 11731 READ4resok resok4; 11732 default: 11733 void; 11734 }; 11736 15.25.4. DESCRIPTION 11738 The READ operation reads data from the regular file identified by the 11739 current filehandle. 11741 The client provides an offset of where the READ is to start and a 11742 count of how many bytes are to be read. An offset of 0 (zero) means 11743 to read data starting at the beginning of the file. If offset is 11744 greater than or equal to the size of the file, the status, NFS4_OK, 11745 is returned with a data length set to 0 (zero) and eof is set to 11746 TRUE. The READ is subject to access permissions checking. 11748 If the client specifies a count value of 0 (zero), the READ succeeds 11749 and returns 0 (zero) bytes of data again subject to access 11750 permissions checking. The server may choose to return fewer bytes 11751 than specified by the client. The client needs to check for this 11752 condition and handle the condition appropriately. 11754 The stateid value for a READ request represents a value returned from 11755 a previous byte-range lock or share reservation request or the 11756 stateid associated with a delegation. The stateid is used by the 11757 server to verify that the associated share reservation and any byte- 11758 range locks are still valid and to update lease timeouts for the 11759 client. 
11761 If the read ended at the end-of-file (formally, in a correctly formed 11762 READ request, if offset + count is equal to the size of the file), or 11763 the read request extends beyond the size of the file (if offset + 11764 count is greater than the size of the file), eof is returned as TRUE; 11765 otherwise it is FALSE. A successful READ of an empty file will 11766 always return eof as TRUE. 11768 If the current filehandle is not a regular file, an error will be 11769 returned to the client. In the case the current filehandle 11770 represents a directory, NFS4ERR_ISDIR is returned; otherwise, 11771 NFS4ERR_INVAL is returned. 11773 For a READ using the special anonymous stateid, the server MAY allow 11774 the READ to be serviced subject to mandatory file locks or the 11775 current share deny modes for the file. For a READ using the special 11776 READ bypass stateid, the server MAY allow READ operations to bypass 11777 locking checks at the server. 11779 On success, the current filehandle retains its value. 11781 15.25.5. IMPLEMENTATION 11783 If the server returns a "short read" (i.e., fewer data than requested 11784 and eof is set to FALSE), the client should send another READ to get 11785 the remaining data. A server may return less data than requested 11786 under several circumstances. The file may have been truncated by 11787 another client or perhaps on the server itself, changing the file 11788 size from what the requesting client believes to be the case. This 11789 would reduce the actual amount of data available to the client. It 11790 is possible that the server reduces the transfer size and so returns 11791 a short read result. Server resource exhaustion may also result in a 11792 short read. 
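The client handling of short reads described above can be sketched as a simple loop. This is a minimal illustration, not a definitive client implementation. The helper make_fake_read is a hypothetical in-memory stand-in for the server's READ that caps each reply at max_transfer bytes, and read_fully is a hypothetical client routine. Both names are assumptions made for this example.

```python
def make_fake_read(contents, max_transfer):
    """Hypothetical stand-in for a server READ, returning (eof, data).

    Each reply carries at most max_transfer bytes, so replies may be
    short.  eof is TRUE only when the reply reaches the end of file.
    """
    def read(offset, count):
        data = contents[offset:offset + min(count, max_transfer)]
        eof = offset + len(data) >= len(contents)
        return eof, data
    return read


def read_fully(read, offset, count):
    """Issue READs until count bytes are gathered or eof is TRUE."""
    chunks = []
    remaining = count
    while remaining > 0:
        eof, data = read(offset, remaining)
        chunks.append(data)
        offset += len(data)
        remaining -= len(data)
        if eof:
            break
        if not data:
            # Guard added for this sketch: no progress and no eof.
            raise IOError("READ returned no data with eof FALSE")
    return b"".join(chunks)
```

With a 10-byte file and a 3-byte transfer limit, read_fully sends four READs to gather the whole file, stopping when eof is returned as TRUE.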
11794 If mandatory byte-range locking is in effect for the file, and if the 11795 byte-range corresponding to the data to be read from the file is 11796 WRITE_LT locked by an owner not associated with the stateid, the 11797 server will return the NFS4ERR_LOCKED error. The client should try 11798 to get the appropriate READ_LT via the LOCK operation before 11799 reattempting the READ. When the READ completes, the client should 11800 release the byte-range lock via LOCKU. 11802 If another client has an OPEN_DELEGATE_WRITE delegation for the file 11803 being read, the delegation must be recalled, and the operation cannot 11804 proceed until that delegation is returned or revoked. Except where 11805 this happens very quickly, one or more NFS4ERR_DELAY errors will be 11806 returned to requests made while the delegation remains outstanding. 11807 Normally, delegations will not be recalled as a result of a READ 11808 operation since the recall will occur as a result of an earlier OPEN. 11809 However, since it is possible for a READ to be done with a special 11810 stateid, the server needs to check for this case even though the 11811 client should have done an OPEN previously. 11813 15.26. Operation 26: READDIR - Read Directory 11815 15.26.1. SYNOPSIS 11817 (cfh), cookie, cookieverf, dircount, maxcount, attr_request -> 11818 cookieverf { cookie, name, attrs } 11820 15.26.2. ARGUMENT 11822 struct READDIR4args { 11823 /* CURRENT_FH: directory */ 11824 nfs_cookie4 cookie; 11825 verifier4 cookieverf; 11826 count4 dircount; 11827 count4 maxcount; 11828 bitmap4 attr_request; 11829 }; 11831 15.26.3. 
RESULT 11832 struct entry4 { 11833 nfs_cookie4 cookie; 11834 component4 name; 11835 fattr4 attrs; 11836 entry4 *nextentry; 11837 }; 11839 struct dirlist4 { 11840 entry4 *entries; 11841 bool eof; 11842 }; 11844 struct READDIR4resok { 11845 verifier4 cookieverf; 11846 dirlist4 reply; 11847 }; 11849 union READDIR4res switch (nfsstat4 status) { 11850 case NFS4_OK: 11851 READDIR4resok resok4; 11852 default: 11853 void; 11854 }; 11856 15.26.4. DESCRIPTION 11858 The READDIR operation retrieves a variable number of entries from a 11859 file system directory and returns client requested attributes for 11860 each entry along with information to allow the client to request 11861 additional directory entries in a subsequent READDIR. 11863 The arguments contain a cookie value that represents where the 11864 READDIR should start within the directory. A value of 0 (zero) for 11865 the cookie is used to start reading at the beginning of the 11866 directory. For subsequent READDIR requests, the client specifies a 11867 cookie value that is provided by the server on a previous READDIR 11868 request. 11870 The cookieverf value should be set to 0 (zero) when the cookie value 11871 is 0 (zero) (first directory read). On subsequent requests, it 11872 should be a cookieverf as returned by the server. The cookieverf 11873 must match that returned by the READDIR in which the cookie was 11874 acquired. If the server determines that the cookieverf is no longer 11875 valid for the directory, the error NFS4ERR_NOT_SAME must be returned. 11877 The dircount portion of the argument is a hint of the maximum number 11878 of bytes of directory information that should be returned. This 11879 value represents the length of the names of the directory entries and 11880 the cookie value for these entries. This length represents the XDR 11881 encoding of the data (names and cookies) and not the length in the 11882 native format of the server. 
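As a rough illustration of what the dircount hint measures, the sketch below estimates the XDR-encoded size of the cookies and names for a set of entries. It assumes that nfs_cookie4 is a 64-bit unsigned integer and that a name is encoded as an XDR variable-length opaque (a 4-byte length followed by the data padded to a multiple of 4 bytes). Whether a given server counts exactly these bytes is an assumption; the function names are hypothetical.

```python
def xdr_opaque_len(nbytes):
    """Bytes used by an XDR variable-length opaque of `nbytes` data bytes:
    a 4-byte length word plus the data padded to a multiple of 4."""
    return 4 + ((nbytes + 3) // 4) * 4


def dircount_estimate(names):
    """Estimate the XDR bytes taken by the cookie and name of each entry.

    `names` is an iterable of str; each name is measured as UTF-8.
    """
    cookie_len = 8  # assumption: nfs_cookie4 is a 64-bit unsigned integer
    return sum(cookie_len + xdr_opaque_len(len(name.encode("utf-8")))
               for name in names)
```

A one-character name and a four-character name both cost 16 bytes under this estimate, since the name data is padded to a 4-byte boundary.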
11884 The maxcount value of the argument is the maximum number of bytes for 11885 the result. This maximum size represents all of the data being 11886 returned within the READDIR4resok structure and includes the XDR 11887 overhead. The server may return less data. If the server is unable 11888 to return a single directory entry within the maxcount limit, the 11889 error NFS4ERR_TOOSMALL will be returned to the client. 11891 Finally, attr_request represents the list of attributes to be 11892 returned for each directory entry supplied by the server. 11894 On successful return, the server's response will provide a list of 11895 directory entries. Each of these entries contains the name of the 11896 directory entry, a cookie value for that entry, and the associated 11897 attributes as requested. The "eof" flag has a value of TRUE if there 11898 are no more entries in the directory. 11900 The cookie value is only meaningful to the server and is used as a 11901 "bookmark" for the directory entry. As mentioned, this cookie is 11902 used by the client for subsequent READDIR operations so that it may 11903 continue reading a directory. The cookie is similar in concept to a 11904 READ offset but should not be interpreted as such by the client. The 11905 server SHOULD try to accept cookie values issued with READDIR 11906 responses even if the directory has been modified between the READDIR 11907 calls but MAY return NFS4ERR_NOT_VALID if this is not possible as 11908 might be the case if the server has rebooted in the interim. 11910 In some cases, the server may encounter an error while obtaining the 11911 attributes for a directory entry. Instead of returning an error for 11912 the entire READDIR operation, the server can instead return the 11913 attribute 'fattr4_rdattr_error'. With this, the server is able to 11914 communicate the failure to the client and not fail the entire 11915 operation in the instance of what might be a transient failure. 
11916 Obviously, the client must request the fattr4_rdattr_error attribute 11917 for this method to work properly. If the client does not request the 11918 attribute, the server has no choice but to return failure for the 11919 entire READDIR operation. 11921 For some file system environments, the directory entries "." and ".." 11922 have special meaning; in other environments, they may not. If the 11923 server supports these special entries within a directory, they should 11924 not be returned to the client as part of the READDIR response. To 11925 enable some client environments, the cookie values of 0, 1, and 2 are 11926 to be considered reserved. Note that the UNIX client will use these 11927 values when combining the server's response and local representations 11928 to enable a fully formed UNIX directory presentation to the 11929 application. 11931 For READDIR arguments, cookie values of 1 and 2 SHOULD NOT be used, 11932 and for READDIR results, cookie values of 0, 1, and 2 MUST NOT be 11933 returned. 11935 On success, the current filehandle retains its value. 11937 15.26.5. IMPLEMENTATION 11939 The server's file system directory representations can differ 11940 greatly. A client's programming interfaces may also be bound to the 11941 local operating environment in a way that does not translate well 11942 into the NFS protocol. Therefore, the dircount and 11943 maxcount fields are provided to allow the client to 11944 provide guidelines to the server. If the client is aggressive about 11945 attribute collection during a READDIR, the server has an idea of how 11946 to limit the encoded response. The dircount field provides a hint on 11947 the number of entries based solely on the names of the directory 11948 entries. Since it is a hint, it may be possible that a dircount 11949 value is zero. In this case, the server is free to ignore the 11950 dircount value and return directory information based on the 11951 specified maxcount value.
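The cookie and cookieverf handling described for READDIR can be sketched as a client-side loop. This is an illustrative sketch under stated assumptions: readdir_op is a hypothetical callable standing in for one READDIR operation, make_fake_readdir is a hypothetical in-memory server, the verifier is assumed to be 8 opaque bytes, and the default dircount and maxcount values are arbitrary.

```python
ZERO_VERF = b"\x00" * 8  # assumption: verifier4 is 8 opaque bytes


def list_directory(readdir_op, dircount=4096, maxcount=32768):
    """Collect all names by repeating READDIR until eof is TRUE.

    `readdir_op(cookie, cookieverf, dircount, maxcount)` is hypothetical and
    returns (cookieverf, entries, eof); entries is a list of (cookie, name).
    """
    cookie, verf = 0, ZERO_VERF  # cookie 0 with a zero verifier: first read
    names = []
    while True:
        verf, entries, eof = readdir_op(cookie, verf, dircount, maxcount)
        for entry_cookie, name in entries:
            names.append(name)
            cookie = entry_cookie  # resume after the last entry seen
        if eof:
            return names
        if not entries:
            # Assumption: an empty reply without eof cannot make progress.
            raise IOError("READDIR returned no entries with eof FALSE")


def make_fake_readdir(all_names, per_reply, verf=b"verf0001"):
    """Hypothetical server: entry i carries cookie i + 3, so the reserved
    values 0, 1, and 2 are never returned in results."""
    def readdir_op(cookie, cookieverf, dircount, maxcount):
        if cookie != 0 and cookieverf != verf:
            raise ValueError("NFS4ERR_NOT_SAME")
        start = 0 if cookie == 0 else cookie - 2
        page = all_names[start:start + per_reply]
        entries = [(start + i + 3, name) for i, name in enumerate(page)]
        return verf, entries, start + len(page) >= len(all_names)
    return readdir_op
```

Basing the fake server's cookies on the entry index mirrors the idea of a cookie as a resumable position that needs no per-cookie server state.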
11953 As there is no way for the client to indicate that a cookie value, 11954 once received, will not be subsequently used, server implementations 11955 should avoid schemes that allocate memory corresponding to a returned 11956 cookie. Such allocation can be avoided if the server bases cookie 11957 values on a value such as the offset within the directory where the 11958 scan is to be resumed. 11960 Cookies generated by such techniques should be designed to remain 11961 valid despite modification of the associated directory. If a server 11962 were to invalidate a cookie because of a directory modification, 11963 READDIRs of large directories might never finish. 11965 If a directory is deleted after the client has carried out one or 11966 more READDIR operations on the directory, the cookies returned will 11967 become invalid, but the server does not need to be concerned, as the 11968 directory filehandle used previously would have become stale and 11969 would be reported as such on subsequent READDIR operations. The 11970 server would not need to check the cookie verifier in this case. 11972 However, certain re-organization operations on a directory (including 11973 directory compaction) may invalidate READDIR cookies previously 11974 given out. When such a situation occurs, the server should modify 11975 the cookie verifier so as to disallow use of cookies which would 11976 otherwise no longer be valid. 11978 The cookieverf may be used by the server to help manage cookie values 11979 that may become stale. It should be a rare occurrence that a server 11980 is unable to continue properly reading a directory with the provided 11981 cookie/cookieverf pair. The server should make every effort to avoid 11982 this condition since the application at the client may not be able to 11983 properly handle this type of failure. 11985 The use of the cookieverf will also protect the client from using 11986 READDIR cookie values that may be stale.
For example, if the file 11987 system has been migrated, the server may or may not be able to use 11988 the same cookie values to service READDIR as the previous server 11989 used. With the client providing the cookieverf, the server is able 11990 to provide the appropriate response to the client. This prevents the 11991 case where the server may accept a cookie value but the underlying 11992 directory has changed and the response is invalid from the client's 11993 context of its previous READDIR. 11995 Since some servers will not be returning "." and ".." entries as has 11996 been done with previous versions of the NFS protocol, the client that 11997 requires these entries be present in READDIR responses must fabricate 11998 them. 12000 15.27. Operation 27: READLINK - Read Symbolic Link 12002 15.27.1. SYNOPSIS 12004 (cfh) -> linktext 12006 15.27.2. ARGUMENT 12008 /* CURRENT_FH: symlink */ 12009 void; 12011 15.27.3. RESULT 12012 struct READLINK4resok { 12013 linktext4 link; 12014 }; 12016 union READLINK4res switch (nfsstat4 status) { 12017 case NFS4_OK: 12018 READLINK4resok resok4; 12019 default: 12020 void; 12021 }; 12023 15.27.4. DESCRIPTION 12025 READLINK reads the data associated with a symbolic link. The data is 12026 a UTF-8 string that is opaque to the server. That is, whether 12027 created by an NFS client or created locally on the server, the data 12028 in a symbolic link is not interpreted when created, but is simply 12029 stored. 12031 On success, the current filehandle retains its value. 12033 15.27.5. IMPLEMENTATION 12035 A symbolic link is nominally a pointer to another file. The data is 12036 not necessarily interpreted by the server, just stored in the file. 12037 It is possible for a client implementation to store a path name that 12038 is not meaningful to the server operating system in a symbolic link. 12039 A READLINK operation returns the data to the client for 12040 interpretation. 
If different implementations want to share access to 12041 symbolic links, then they must agree on the interpretation of the 12042 data in the symbolic link. 12044 The READLINK operation is only allowed on objects of type NF4LNK. 12045 The server should return the error NFS4ERR_INVAL if the object is 12046 not of type NF4LNK. 12048 15.28. Operation 28: REMOVE - Remove Filesystem Object 12050 15.28.1. SYNOPSIS 12052 (cfh), filename -> change_info 12054 15.28.2. ARGUMENT 12055 struct REMOVE4args { 12056 /* CURRENT_FH: directory */ 12057 component4 target; 12058 }; 12060 15.28.3. RESULT 12062 struct REMOVE4resok { 12063 change_info4 cinfo; 12064 }; 12066 union REMOVE4res switch (nfsstat4 status) { 12067 case NFS4_OK: 12068 REMOVE4resok resok4; 12069 default: 12070 void; 12071 }; 12073 15.28.4. DESCRIPTION 12075 The REMOVE operation removes (deletes) a directory entry named by 12076 filename from the directory corresponding to the current filehandle. 12077 If the entry in the directory was the last reference to the 12078 corresponding file system object, the object may be destroyed. 12080 For the directory where the filename was removed, the server returns 12081 change_info4 information in cinfo. With the atomic field of the 12082 change_info4 struct, the server will indicate if the before and after 12083 change attributes were obtained atomically with respect to the 12084 removal. 12086 If the target is of zero length, NFS4ERR_INVAL will be returned. The 12087 target is also subject to the normal UTF-8, character support, and 12088 name checks. See Section 12.7 for further discussion. 12090 On success, the current filehandle retains its value. 12092 15.28.5. IMPLEMENTATION 12094 NFSv3 required a different operator, RMDIR, for directory removal and 12095 REMOVE for non-directory removal.
This allowed clients to skip 12096 checking the file type when being passed a non-directory delete 12097 system call (e.g., unlink() [unlink] in POSIX) to remove a directory, 12098 as well as the converse (e.g., a rmdir() on a non-directory) because 12099 they knew the server would check the file type. NFSv4 REMOVE can be 12100 used to delete any directory entry independent of its file type. The 12101 implementer of an NFSv4 client's entry points from the unlink() and 12102 rmdir() system calls should first check the file type against the 12103 types the system call is allowed to remove before issuing a REMOVE. 12104 Alternatively, the implementer can produce a COMPOUND call that 12105 includes a LOOKUP/VERIFY sequence to verify the file type before a 12106 REMOVE operation in the same COMPOUND call. 12108 The concept of last reference is server specific. However, if the 12109 numlinks field in the previous attributes of the object had the value 12110 1, the client should not rely on referring to the object via a 12111 filehandle. Likewise, the client should not rely on the resources 12112 (disk space, directory entry, and so on) formerly associated with the 12113 object becoming immediately available. Thus, if a client needs to be 12114 able to continue to access a file after using REMOVE to remove it, 12115 the client should take steps to make sure that the file will still be 12116 accessible. The usual mechanism used is to RENAME the file from its 12117 old name to a new hidden name. 12119 If the server finds that the file is still open when the REMOVE 12120 arrives: 12122 o The server SHOULD NOT delete the file's directory entry if the 12123 file was opened with OPEN4_SHARE_DENY_WRITE or 12124 OPEN4_SHARE_DENY_BOTH. 12126 o If the file was not opened with OPEN4_SHARE_DENY_WRITE or 12127 OPEN4_SHARE_DENY_BOTH, the server SHOULD delete the file's 12128 directory entry. 
However, until last CLOSE of the file, the 12129 server MAY continue to allow access to the file via its 12130 filehandle. 12132 15.29. Operation 29: RENAME - Rename Directory Entry 12134 15.29.1. SYNOPSIS 12136 (sfh), oldname, (cfh), newname -> source_cinfo, target_cinfo 12138 15.29.2. ARGUMENT 12140 struct RENAME4args { 12141 /* SAVED_FH: source directory */ 12142 component4 oldname; 12143 /* CURRENT_FH: target directory */ 12144 component4 newname; 12145 }; 12147 15.29.3. RESULT 12149 struct RENAME4resok { 12150 change_info4 source_cinfo; 12151 change_info4 target_cinfo; 12152 }; 12154 union RENAME4res switch (nfsstat4 status) { 12155 case NFS4_OK: 12156 RENAME4resok resok4; 12157 default: 12158 void; 12159 }; 12161 15.29.4. DESCRIPTION 12163 The RENAME operation renames the object identified by oldname in the 12164 source directory corresponding to the saved filehandle, as set by the 12165 SAVEFH operation, to newname in the target directory corresponding to 12166 the current filehandle. The operation is required to be atomic to 12167 the client. Source and target directories must reside on the same 12168 file system on the server. On success, the current filehandle will 12169 continue to be the target directory. 12171 If the target directory already contains an entry with the name, 12172 newname, the source object must be compatible with the target: either 12173 both are non-directories or both are directories and the target must 12174 be empty. If compatible, the existing target is removed before the 12175 rename occurs (See Section 15.28 for client and server actions 12176 whenever a target is removed). If they are not compatible or if the 12177 target is a directory but not empty, the server will return the 12178 error, NFS4ERR_EXIST. 12180 If oldname and newname both refer to the same file (they might be 12181 hard links of each other), then RENAME should perform no action and 12182 return success. 
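The rules above for a RENAME whose newname already exists in the target directory can be summarized as a small decision function. This is a sketch for illustration only: the function name, its boolean parameters, and the "NOOP" and "REPLACE" labels are hypothetical, and it covers only the case in which a target entry exists.

```python
def rename_over_existing(source_is_dir, target_is_dir,
                         target_is_empty=True, same_object=False):
    """Decide how a RENAME onto an existing target entry is handled.

    Returns "NOOP" (no action, success), "REPLACE" (remove the existing
    target, then rename), or "NFS4ERR_EXIST".
    """
    if same_object:
        # oldname and newname refer to the same file (e.g., hard links).
        return "NOOP"
    if source_is_dir != target_is_dir:
        # A directory and a non-directory are not compatible.
        return "NFS4ERR_EXIST"
    if target_is_dir and not target_is_empty:
        # A directory target must be empty to be replaced.
        return "NFS4ERR_EXIST"
    return "REPLACE"
```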
12184 For both directories involved in the RENAME, the server returns 12185 change_info4 information. With the atomic field of the change_info4 12186 struct, the server will indicate if the before and after change 12187 attributes were obtained atomically with respect to the rename. 12189 If the oldname refers to a named attribute and the saved and current 12190 filehandles refer to the named attribute directories of different 12191 file system objects, the server will return NFS4ERR_XDEV just as if 12192 the saved and current filehandles represented directories on 12193 different file systems. 12195 If the oldname or newname is of zero length, NFS4ERR_INVAL will be 12196 returned. The oldname and newname are also subject to the normal 12197 UTF-8, character support, and name checks. See Section 12.7 for 12198 further discussion. 12200 15.29.5. IMPLEMENTATION 12202 The RENAME operation must be atomic to the client. The statement 12203 "source and target directories must reside on the same file system on 12204 the server" means that the fsid fields in the attributes for the 12205 directories are the same. If they reside on different file systems, 12206 the error, NFS4ERR_XDEV, is returned. 12208 Based on the value of the fh_expire_type attribute for the object, 12209 the filehandle may or may not expire on a RENAME. However, server 12210 implementers are strongly encouraged to attempt to keep filehandles 12211 from expiring in this fashion. 12213 On some servers, the file names "." and ".." are illegal as either 12214 oldname or newname, and will result in the error NFS4ERR_BADNAME. In 12215 addition, on many servers the case of oldname or newname being an 12216 alias for the source directory will be checked for. Such servers 12217 will return the error NFS4ERR_INVAL in these cases. 12219 If either of the source or target filehandles are not directories, 12220 the server will return NFS4ERR_NOTDIR. 12222 15.30. Operation 30: RENEW - Renew a Lease 12224 15.30.1. 
SYNOPSIS 12226 clientid -> () 12228 15.30.2. ARGUMENT 12230 struct RENEW4args { 12231 clientid4 clientid; 12232 }; 12234 15.30.3. RESULT 12236 struct RENEW4res { 12237 nfsstat4 status; 12238 }; 12240 15.30.4. DESCRIPTION 12242 The RENEW operation is used by the client to renew leases which it 12243 currently holds at a server. In processing the RENEW request, the 12244 server renews all leases associated with the client. The associated 12245 leases are determined by the clientid provided via the SETCLIENTID 12246 operation. 12248 15.30.5. IMPLEMENTATION 12250 When the client holds delegations, it needs to use RENEW to detect 12251 when the server has determined that the callback path is down. When 12252 the server has made such a determination, only the RENEW operation 12253 will renew the lease on delegations. If the server determines the 12254 callback path is down, it returns NFS4ERR_CB_PATH_DOWN. Even though 12255 it returns NFS4ERR_CB_PATH_DOWN, the server MUST renew the lease on 12256 the byte-range locks and share reservations that the client has 12257 established on the server. If for some reason the lock and share 12258 reservation lease cannot be renewed, then the server MUST return an 12259 error other than NFS4ERR_CB_PATH_DOWN, even if the callback path is 12260 also down. In the event that the server has conditions such that it 12261 could return either NFS4ERR_CB_PATH_DOWN or NFS4ERR_LEASE_MOVED, 12262 NFS4ERR_LEASE_MOVED MUST be handled first. 12264 The client that issues RENEW MUST choose the principal, RPC security 12265 flavor, and if applicable, GSS-API mechanism and service via one of 12266 the following algorithms: 12268 o The client uses the same principal, RPC security flavor -- and if 12269 the flavor was RPCSEC_GSS -- the same mechanism and service that 12270 was used when the client ID was established via 12271 SETCLIENTID_CONFIRM. 
12273 o The client uses any principal, RPC security flavor, mechanism, and 12274 service combination that currently has an OPEN file on the server. 12275 I.e., the same principal had a successful OPEN operation, the file 12276 is still open by that principal, and the flavor, mechanism, and 12277 service of RENEW match that of the previous OPEN. 12279 The server MUST reject a RENEW that does not use one of the 12280 aforementioned algorithms, with the error NFS4ERR_ACCESS. 12282 15.31. Operation 31: RESTOREFH - Restore Saved Filehandle 12283 15.31.1. SYNOPSIS 12285 (sfh) -> (cfh) 12287 15.31.2. ARGUMENT 12289 /* SAVED_FH: */ 12290 void; 12292 15.31.3. RESULT 12294 struct RESTOREFH4res { 12295 /* CURRENT_FH: value of saved fh */ 12296 nfsstat4 status; 12297 }; 12299 15.31.4. DESCRIPTION 12301 Set the current filehandle to the value in the saved filehandle. If 12302 there is no saved filehandle, then return the error NFS4ERR_RESTOREFH. 12304 15.31.5. IMPLEMENTATION 12306 Operations like OPEN and LOOKUP use the current filehandle to 12307 represent a directory and replace it with a new filehandle. Assuming 12308 the previous filehandle was saved with a SAVEFH operator, the 12309 previous filehandle can be restored as the current filehandle. This 12310 is commonly used to obtain post-operation attributes for the 12311 directory, e.g., 12313 PUTFH (directory filehandle) 12314 SAVEFH 12315 GETATTR attrbits (pre-op dir attrs) 12316 CREATE optbits "foo" attrs 12317 GETATTR attrbits (file attributes) 12318 RESTOREFH 12319 GETATTR attrbits (post-op dir attrs) 12321 15.32. Operation 32: SAVEFH - Save Current Filehandle 12323 15.32.1. SYNOPSIS 12325 (cfh) -> (sfh) 12327 15.32.2. ARGUMENT 12329 /* CURRENT_FH: */ 12330 void; 12332 15.32.3. RESULT 12334 struct SAVEFH4res { 12335 /* SAVED_FH: value of current fh */ 12336 nfsstat4 status; 12337 }; 12339 15.32.4. DESCRIPTION 12341 Save the current filehandle. If a previous filehandle was saved, then 12342 it is no longer accessible.
The saved filehandle can be restored as 12343 the current filehandle with the RESTOREFH operator. 12345 On success, the current filehandle retains its value. 12347 15.32.5. IMPLEMENTATION 12349 15.33. Operation 33: SECINFO - Obtain Available Security 12351 15.33.1. SYNOPSIS 12353 (cfh), name -> { secinfo } 12355 15.33.2. ARGUMENT 12357 struct SECINFO4args { 12358 /* CURRENT_FH: directory */ 12359 component4 name; 12360 }; 12362 15.33.3. RESULT 12363 /* 12364 * From RFC 2203 12365 */ 12366 enum rpc_gss_svc_t { 12367 RPC_GSS_SVC_NONE = 1, 12368 RPC_GSS_SVC_INTEGRITY = 2, 12369 RPC_GSS_SVC_PRIVACY = 3 12370 }; 12372 struct rpcsec_gss_info { 12373 sec_oid4 oid; 12374 qop4 qop; 12375 rpc_gss_svc_t service; 12376 }; 12378 /* RPCSEC_GSS has a value of '6' - See RFC 2203 */ 12379 union secinfo4 switch (uint32_t flavor) { 12380 case RPCSEC_GSS: 12381 rpcsec_gss_info flavor_info; 12382 default: 12383 void; 12384 }; 12386 typedef secinfo4 SECINFO4resok<>; 12388 union SECINFO4res switch (nfsstat4 status) { 12389 case NFS4_OK: 12390 SECINFO4resok resok4; 12391 default: 12392 void; 12393 }; 12395 15.33.4. DESCRIPTION 12397 The SECINFO operation is used by the client to obtain a list of valid 12398 RPC authentication flavors for a specific directory filehandle, file 12399 name pair. SECINFO should apply the same access methodology used for 12400 LOOKUP when evaluating the name. Therefore, if the requester does 12401 not have the appropriate access to LOOKUP the name then SECINFO must 12402 behave the same way and return NFS4ERR_ACCESS. 12404 The result will contain an array which represents the security 12405 mechanisms available, with an order corresponding to server's 12406 preferences, the most preferred being first in the array. The client 12407 is free to pick whatever security mechanism it both desires and 12408 supports, or to pick in the server's preference order the first one 12409 it supports. The array entries are represented by the secinfo4 12410 structure. 
The field 'flavor' will contain a value of AUTH_NONE, 12411 AUTH_SYS (as defined in [RFC5531]), or RPCSEC_GSS (as defined in 12412 [RFC2203]). 12414 For the flavors AUTH_NONE and AUTH_SYS, no additional security 12415 information is returned. For a return value of RPCSEC_GSS, a 12416 security triple is returned that contains the mechanism object id (as 12417 defined in [RFC2743]), the quality of protection (as defined in 12418 [RFC2743]) and the service type (as defined in [RFC2203]). It is 12419 possible for SECINFO to return multiple entries with flavor equal to 12420 RPCSEC_GSS with different security triple values. 12422 On success, the current filehandle retains its value. 12424 If the name has a length of 0 (zero), or if name does not obey the 12425 UTF-8 definition, the error NFS4ERR_INVAL will be returned. 12427 15.33.5. IMPLEMENTATION 12429 The SECINFO operation is expected to be used by the NFS client when 12430 the error value of NFS4ERR_WRONGSEC is returned from another NFS 12431 operation. This signifies to the client that the server's security 12432 policy is different from what the client is currently using. At this 12433 point, the client is expected to obtain a list of possible security 12434 flavors and choose what best suits its policies. 12436 As mentioned, the server's security policies will determine when a 12437 client request receives NFS4ERR_WRONGSEC. The operations which may 12438 receive this error are: LINK, LOOKUP, LOOKUPP, OPEN, PUTFH, PUTPUBFH, 12439 PUTROOTFH, RENAME, RESTOREFH, and indirectly READDIR. LINK and 12440 RENAME will only receive this error if the security used for the 12441 operation is inappropriate for the saved filehandle. With the exception 12442 of READDIR, these operations represent the point at which the client 12443 can instantiate a filehandle into the "current filehandle" at the 12444 server.
The filehandle is either provided by the client (PUTFH, 12445 PUTPUBFH, PUTROOTFH) or generated as a result of a name-to-filehandle 12446 translation (LOOKUP and OPEN). RESTOREFH is different because the 12447 filehandle is a result of a previous SAVEFH. Even though the 12448 filehandle, for RESTOREFH, might have previously passed the server's 12449 inspection for a security match, the server will check it again on 12450 RESTOREFH to ensure that the security policy has not changed. 12452 If the client wants to resolve an error return of NFS4ERR_WRONGSEC, 12453 the following will occur: 12455 o For LOOKUP and OPEN, the client will use SECINFO with the same 12456 current filehandle and name as provided in the original LOOKUP or 12457 OPEN to enumerate the available security triples. 12459 o For LINK, PUTFH, RENAME, and RESTOREFH, the client will use 12460 SECINFO and provide the parent directory filehandle and object 12461 name which correspond to the filehandle originally provided by 12462 the PUTFH or RESTOREFH, or, for LINK and RENAME, the SAVEFH. 12464 o For LOOKUPP, PUTROOTFH, and PUTPUBFH, the client will be unable to 12465 use the SECINFO operation since SECINFO requires a current 12466 filehandle and none exists for these three operations. Therefore, 12467 the client must iterate through the security triples available at 12468 the client and reattempt the PUTROOTFH or PUTPUBFH operation. In 12469 the unfortunate event none of the MANDATORY security triples are 12470 supported by the client and server, the client SHOULD try using 12471 others that support integrity. Failing that, the client can try 12472 using AUTH_NONE, but because such forms lack integrity checks, 12473 this puts the client at risk. Nonetheless, the server SHOULD 12474 allow the client to use whatever security form the client requests 12475 and the server supports, since the risks of doing so are on the 12476 client.
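The iteration described in the last item above, where SECINFO cannot be used, can be sketched as follows. This is a hedged illustration, not a prescribed algorithm: attempt is a hypothetical callable that reissues the operation (for example, PUTROOTFH) under one security triple and returns a status string, and the order of triples is left to the caller, who would normally place the MANDATORY triples first and AUTH_NONE last.

```python
def retry_with_triples(attempt, triples):
    """Reattempt an operation under each security triple in turn.

    `attempt(triple)` is hypothetical; it returns "NFS4_OK",
    "NFS4ERR_WRONGSEC", or some other status string.
    Returns the first triple that succeeds, or None if all are refused.
    """
    for triple in triples:
        status = attempt(triple)
        if status == "NFS4_OK":
            return triple
        if status != "NFS4ERR_WRONGSEC":
            # Any other error is unrelated to security negotiation.
            raise RuntimeError(status)
    return None
```

For example, with a server that accepts only a Kerberos integrity triple, a caller that lists that triple second will succeed on the second attempt.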
12478 The READDIR operation will not directly return the NFS4ERR_WRONGSEC 12479 error. However, if the READDIR request included a request for 12480 attributes, it is possible that the READDIR request's security triple 12481 does not match that of a directory entry. If this is the case and 12482 the client has requested the rdattr_error attribute, the server will 12483 return the NFS4ERR_WRONGSEC error in rdattr_error for the entry. 12485 Note that a server MAY use the AUTH_NONE flavor to signify that the 12486 client is allowed to attempt to use authentication flavors that are 12487 not explicitly listed in the SECINFO results. Instead of using a 12488 listed flavor, the client might then, for instance, opt to use an 12489 otherwise unlisted RPCSEC_GSS mechanism instead of AUTH_NONE. It may 12490 wish to do so in order to meet an application requirement for data 12491 integrity or privacy. In choosing to use an unlisted flavor, the 12492 client SHOULD always be prepared to handle a failure by falling back 12493 to using AUTH_NONE or another listed flavor. It cannot assume that 12494 identity mapping is supported, and should be prepared for the fact 12495 that its identity is squashed. 12497 See Section 17 for a discussion on the recommendations for the security 12498 flavor used by SECINFO. 12500 15.34. Operation 34: SETATTR - Set Attributes 12502 15.34.1. SYNOPSIS 12504 (cfh), stateid, attrmask, attr_vals -> attrsset 12506 15.34.2. ARGUMENT 12508 struct SETATTR4args { 12509 /* CURRENT_FH: target object */ 12510 stateid4 stateid; 12511 fattr4 obj_attributes; 12512 }; 12514 15.34.3. RESULT 12516 struct SETATTR4res { 12517 nfsstat4 status; 12518 bitmap4 attrsset; 12519 }; 12521 15.34.4. DESCRIPTION 12523 The SETATTR operation changes one or more of the attributes of a file 12524 system object. The new attributes are specified with a bitmap and 12525 the attributes that follow the bitmap in bit order.
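The "bitmap followed by attributes in bit order" layout can be illustrated with a short sketch. It assumes that bitmap4 is an array of 32-bit words in which attribute number n occupies bit (n mod 32) of word (n div 32); the attribute numbers used in the example (4 for size, 33 for mode) come from the attribute definitions elsewhere in this specification. The function names are hypothetical, and the attribute values are shown unencoded.

```python
def attrs_to_bitmap4(attr_numbers):
    """Build the list of 32-bit bitmap words for the given attribute numbers."""
    words = []
    for n in attr_numbers:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)  # grow the bitmap to reach the needed word
        words[word] |= 1 << bit
    return words


def ordered_attr_values(attrs):
    """Return attribute values in ascending attribute-number (bit) order.

    `attrs` maps attribute number -> value (not yet XDR-encoded).
    """
    return [attrs[n] for n in sorted(attrs)]
```

Setting mode and size in one request, for instance, yields a two-word bitmap, and the size value precedes the mode value regardless of the order in which the caller supplied them.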
12527 The stateid argument for SETATTR is used to provide byte-range 12528 locking context that is necessary for SETATTR requests that set the 12529 size attribute. Since setting the size attribute modifies the file's 12530 data, it has the same locking requirements as a corresponding WRITE. 12531 Any SETATTR that sets the size attribute is incompatible with a share 12532 reservation that specifies OPEN4_SHARE_DENY_WRITE. The area between 12533 the old end-of-file and the new end-of-file is considered to be 12534 modified just as would have been the case had the area in question 12535 been specified as the target of WRITE, for the purpose of checking 12536 conflicts with byte-range locks, for those cases in which a server is 12537 implementing mandatory byte-range locking behavior. A valid stateid 12538 SHOULD always be specified. When the file size attribute is not set, 12539 the special anonymous stateid MAY be passed. 12541 On either success or failure of the operation, the server will return 12542 the attrsset bitmask to represent what (if any) attributes were 12543 successfully set. The attrsset in the response is a subset of the 12544 bitmap4 that is part of the obj_attributes in the argument. 12546 On success, the current filehandle retains its value. 12548 15.34.5. IMPLEMENTATION 12550 If the request specifies the owner attribute to be set, the server 12551 SHOULD allow the operation to succeed if the current owner of the 12552 object matches the value specified in the request. Some servers may 12553 be implemented in a way as to prohibit the setting of the owner 12554 attribute unless the requester has privilege to do so. If the server 12555 is lenient in this one case of matching owner values, the client 12556 implementation may be simplified in cases of creation of an object 12557 (e.g., an exclusive create via OPEN) followed by a SETATTR. 12559 The file size attribute is used to request changes to the size of a 12560 file. 
A value of zero causes the file to be truncated, a value less 12561 than the current size of the file causes data from new size to the 12562 end of the file to be discarded, and a size greater than the current 12563 size of the file causes logically zeroed data bytes to be added to 12564 the end of the file. Servers are free to implement this using holes 12565 or actual zero data bytes. Clients should not make any assumptions 12566 regarding a server's implementation of this feature, beyond that the 12567 bytes returned will be zeroed. Servers MUST support extending the 12568 file size via SETATTR. 12570 SETATTR is not guaranteed atomic. A failed SETATTR may partially 12571 change a file's attributes, hence the reason why the reply always 12572 includes the status and the list of attributes that were set. 12574 If the object whose attributes are being changed has a file 12575 delegation that is held by a client other than the one doing the 12576 SETATTR, the delegation(s) must be recalled, and the operation cannot 12577 proceed to actually change an attribute until each such delegation is 12578 returned or revoked. In all cases in which delegations are recalled, 12579 the server is likely to return one or more NFS4ERR_DELAY errors while 12580 the delegation(s) remains outstanding, although it might not do that 12581 if the delegations are returned quickly. 12583 Changing the size of a file with SETATTR indirectly changes the 12584 time_modify and change attributes. A client must account for this as 12585 size changes can result in data deletion. 12587 The attributes time_access_set and time_modify_set are write-only 12588 attributes constructed as a switched union so the client can direct 12589 the server in setting the time values. If the switched union 12590 specifies SET_TO_CLIENT_TIME4, the client has provided an nfstime4 to 12591 be used for the operation. 
If the switched union does not specify 12592 SET_TO_CLIENT_TIME4, the server is to use its current time for the 12593 SETATTR operation. 12595 If server and client times differ, programs that compare client time 12596 to file times can break. A time maintenance protocol should be used 12597 to limit client/server time skew. 12599 Use of a COMPOUND containing a VERIFY operation specifying only the 12600 change attribute, immediately followed by a SETATTR, provides a means 12601 whereby a client may specify a request that emulates the 12602 functionality of the SETATTR guard mechanism of NFSv3. Since the 12603 function of the guard mechanism is to avoid changes to the file 12604 attributes based on stale information, delays between checking of the 12605 guard condition and the setting of the attributes have the potential 12606 to compromise this function, as would the corresponding delay in the 12607 NFSv4 emulation. Therefore, NFSv4 servers should take care to avoid 12608 such delays, to the degree possible, when executing such a request. 12610 If the server does not support an attribute as requested by the 12611 client, the server should return NFS4ERR_ATTRNOTSUPP. 12613 A mask of the attributes actually set is returned by SETATTR in all 12614 cases. That mask MUST NOT include attribute bits not requested to be 12615 set by the client. If the attribute masks in the request and reply 12616 are equal, the status field in the reply MUST be NFS4_OK. 12618 15.35. Operation 35: SETCLIENTID - Negotiate Client ID 12620 15.35.1. SYNOPSIS 12622 client, callback, callback_ident -> clientid, setclientid_confirm 12624 15.35.2. ARGUMENT 12626 struct SETCLIENTID4args { 12627 nfs_client_id4 client; 12628 cb_client4 callback; 12629 uint32_t callback_ident; 12630 }; 12632 15.35.3.
RESULT 12633 struct SETCLIENTID4resok { 12634 clientid4 clientid; 12635 verifier4 setclientid_confirm; 12636 }; 12638 union SETCLIENTID4res switch (nfsstat4 status) { 12639 case NFS4_OK: 12640 SETCLIENTID4resok resok4; 12641 case NFS4ERR_CLID_INUSE: 12642 clientaddr4 client_using; 12643 default: 12644 void; 12645 }; 12647 15.35.4. DESCRIPTION 12649 The client uses the SETCLIENTID operation to notify the server of its 12650 intention to use a particular client identifier, callback, and 12651 callback_ident for subsequent requests that entail creating lock, 12652 share reservation, and delegation state on the server. Upon 12653 successful completion the server will return a shorthand client ID 12654 which, if confirmed via a separate step, will be used in subsequent 12655 file locking and file open requests. Confirmation of the client ID 12656 must be done via the SETCLIENTID_CONFIRM operation to return the 12657 client ID and setclientid_confirm values, as verifiers, to the 12658 server. The reason why two verifiers are necessary is that it is 12659 possible to use SETCLIENTID and SETCLIENTID_CONFIRM to modify the 12660 callback and callback_ident information but not the shorthand client 12661 ID. In that event, the setclientid_confirm value is effectively the 12662 only verifier. 12664 The callback information provided in this operation will be used if 12665 the client is provided an open delegation at a future point. 12666 Therefore, the client must correctly reflect the program and port 12667 numbers for the callback program at the time SETCLIENTID is used. 12669 The callback_ident value is used by the server on the callback. The 12670 client can leverage the callback_ident to eliminate the need for more 12671 than one callback RPC program number, while still being able to 12672 determine which server is initiating the callback. 12674 15.35.5. IMPLEMENTATION 12676 To understand how to implement SETCLIENTID, make the following 12677 notations. 
Let: 12679 x be the value of the client.id subfield of the SETCLIENTID4args 12680 structure. 12682 v be the value of the client.verifier subfield of the 12683 SETCLIENTID4args structure. 12685 c be the value of the client ID field returned in the 12686 SETCLIENTID4resok structure. 12688 k represent the value combination of the callback and 12689 callback_ident fields of the SETCLIENTID4args structure. 12691 s be the setclientid_confirm value returned in the SETCLIENTID4resok 12692 structure. 12694 { v, x, c, k, s } be a quintuple for a client record. A client 12695 record is confirmed if there has been a SETCLIENTID_CONFIRM 12696 operation to confirm it. Otherwise, it is unconfirmed. An 12697 unconfirmed record is established by a SETCLIENTID call. 12699 Since SETCLIENTID is a non-idempotent operation, let us assume that 12700 the server is implementing the duplicate request cache (DRC). 12702 When the server gets a SETCLIENTID { v, x, k } request, it processes 12703 it in the following manner. 12705 o It first looks up the request in the DRC. If there is a hit, it 12706 returns the result cached in the DRC. The server does NOT remove 12707 client state (locks, shares, delegations) nor does it modify any 12708 recorded callback and callback_ident information for client { x }. 12710 For any DRC miss, the server takes the client ID string x, and 12711 searches for client records for x that the server may have 12712 recorded from previous SETCLIENTID calls. For any confirmed 12713 record with the same id string x, if the recorded principal does 12714 not match that of the SETCLIENTID call, then the server returns an 12715 NFS4ERR_CLID_INUSE error. 12717 For brevity of discussion, the remaining description of the 12718 processing assumes that there was a DRC miss, and that where the 12719 server has previously recorded a confirmed record for client x, 12720 the aforementioned principal check has successfully passed.
12722 o The server checks if it has recorded a confirmed record for { v, 12723 x, c, l, s }, where l may or may not equal k. If so, and since the 12724 id verifier v of the request matches that which is confirmed and 12725 recorded, the server treats this as a probable callback 12726 information update and records an unconfirmed { v, x, c, k, t } 12727 and leaves the confirmed { v, x, c, l, s } in place, such that t 12728 != s. It does not matter if k equals l or not. Any pre-existing 12729 unconfirmed { v, x, c, *, * } is removed. 12731 The server returns { c, t }. It is indeed returning the old 12732 clientid4 value c, because the client apparently only wants to 12733 update the callback value from l to k. It's possible this request is 12734 one from a Byzantine router that has stale callback information, 12735 but this is not a problem. The callback information update is 12736 only confirmed if followed up by a SETCLIENTID_CONFIRM { c, t }. 12738 The server awaits confirmation of k via SETCLIENTID_CONFIRM { c, t 12739 }. 12741 The server does NOT remove client (lock/share/delegation) state 12742 for x. 12744 o The server has previously recorded a confirmed { u, x, c, l, s } 12745 record such that v != u, l may or may not equal k, and has not 12746 recorded any unconfirmed { *, x, *, *, * } record for x. The 12747 server records an unconfirmed { v, x, d, k, t } (d != c, t != s). 12749 The server returns { d, t }. 12751 The server awaits confirmation of { d, k } via SETCLIENTID_CONFIRM 12752 { d, t }. 12754 The server does NOT remove client (lock/share/delegation) state 12755 for x. 12757 o The server has previously recorded a confirmed { u, x, c, l, s } 12758 record such that v != u, l may or may not equal k, and recorded an 12759 unconfirmed { w, x, d, m, t } record such that c != d, t != s, m 12760 may or may not equal k, m may or may not equal l, and k may or may 12761 not equal l. Whether w == v or w != v makes no difference.
The 12762 server simply removes the unconfirmed { w, x, d, m, t } record and 12763 replaces it with an unconfirmed { v, x, e, k, r } record, such 12764 that e != d, e != c, r != t, r != s. 12766 The server returns { e, r }. 12768 The server awaits confirmation of { e, k } via SETCLIENTID_CONFIRM 12769 { e, r }. 12771 The server does NOT remove client (lock/share/delegation) state 12772 for x. 12774 o The server has no confirmed { *, x, *, *, * } for x. It may or may 12775 not have recorded an unconfirmed { u, x, c, l, s }, where l may or 12776 may not equal k, and u may or may not equal v. Any unconfirmed 12777 record { u, x, c, l, * }, regardless of whether u == v or l == k, is 12778 replaced with an unconfirmed record { v, x, d, k, t } where d != 12779 c, t != s. 12781 The server returns { d, t }. 12783 The server awaits confirmation of { d, k } via SETCLIENTID_CONFIRM 12784 { d, t }. The server does NOT remove client (lock/share/ 12785 delegation) state for x. 12787 The server generates the clientid and setclientid_confirm values and 12788 must take care to ensure that these values are extremely unlikely to 12789 ever be regenerated. 12791 15.36. Operation 36: SETCLIENTID_CONFIRM - Confirm Client ID 12793 15.36.1. SYNOPSIS 12795 clientid, setclientid_confirm -> - 12797 15.36.2. ARGUMENT 12799 struct SETCLIENTID_CONFIRM4args { 12800 clientid4 clientid; 12801 verifier4 setclientid_confirm; 12802 }; 12804 15.36.3. RESULT 12806 struct SETCLIENTID_CONFIRM4res { 12807 nfsstat4 status; 12808 }; 12810 15.36.4. DESCRIPTION 12812 This operation is used by the client to confirm the results from a 12813 previous call to SETCLIENTID. The client provides the server 12814 supplied (from a SETCLIENTID response) client ID. The server 12815 responds with a simple status of success or failure. 12817 15.36.5.
IMPLEMENTATION 12819 The client must use the SETCLIENTID_CONFIRM operation to confirm the 12820 following two distinct cases: 12822 o The client's use of a new shorthand client identifier (as returned 12823 from the server in the response to SETCLIENTID), a new callback 12824 value (as specified in the arguments to SETCLIENTID), and a new 12825 callback_ident (as specified in the arguments to SETCLIENTID) 12826 value. The client's use of SETCLIENTID_CONFIRM in this case also 12827 confirms the removal of any of the client's previous relevant 12828 leased state. Relevant leased client state includes byte-range 12829 locks, share reservations, and, where the server does not support 12830 the CLAIM_DELEGATE_PREV claim type, delegations. If the server 12831 supports CLAIM_DELEGATE_PREV, then SETCLIENTID_CONFIRM MUST NOT 12832 remove delegations for this client; relevant leased client state 12833 would then just include byte-range locks and share reservations. 12835 o The client's re-use of an old, previously confirmed, shorthand 12836 client identifier, a new callback value, and a new callback_ident 12837 value. The client's use of SETCLIENTID_CONFIRM in this case MUST 12838 NOT result in the removal of any previous leased state (locks, 12839 share reservations, and delegations). 12841 We use the same notation and definitions for v, x, c, k, s, and 12842 unconfirmed and confirmed client records as introduced in the 12843 description of the SETCLIENTID operation. The arguments to 12844 SETCLIENTID_CONFIRM are indicated by the notation { c, s }, where c 12845 is a value of type clientid4, and s is a value of type verifier4 12846 corresponding to the setclientid_confirm field. 12848 As with SETCLIENTID, SETCLIENTID_CONFIRM is a non-idempotent 12849 operation, and we assume that the server is implementing the 12850 duplicate request cache (DRC). 12852 When the server gets a SETCLIENTID_CONFIRM { c, s } request, it 12853 processes it in the following manner.
12855 o It first looks up the request in the DRC. If there is a hit, it 12856 returns the result cached in the DRC. The server does not remove 12857 any relevant leased client state nor does it modify any recorded 12858 callback and callback_ident information for client { x } as 12859 represented by the shorthand value c. 12861 For a DRC miss, the server checks for client records that match the 12862 shorthand value c. The processing cases are as follows: 12864 o The server has recorded an unconfirmed { v, x, c, k, s } record 12865 and a confirmed { v, x, c, l, t } record, such that s != t. If 12866 the principals of the records do not match that of the 12867 SETCLIENTID_CONFIRM, the server returns NFS4ERR_CLID_INUSE, and no 12868 relevant leased client state is removed and no recorded callback 12869 and callback_ident information for client { x } is changed. 12870 Otherwise, the confirmed { v, x, c, l, t } record is removed and 12871 the unconfirmed { v, x, c, k, s } is marked as confirmed, thereby 12872 modifying recorded and confirmed callback and callback_ident 12873 information for client { x }. 12875 The server does not remove any relevant leased client state. 12877 The server returns NFS4_OK. 12879 o The server has not recorded an unconfirmed { v, x, c, *, * } and 12880 has recorded a confirmed { v, x, c, *, s }. If the principals of 12881 the record and of SETCLIENTID_CONFIRM do not match, the server 12882 returns NFS4ERR_CLID_INUSE without removing any relevant leased 12883 client state and without changing recorded callback and 12884 callback_ident values for client { x }. 12886 If the principals match, then what has likely happened is that the 12887 client never got the response from the SETCLIENTID_CONFIRM, and 12888 the DRC entry has been purged. 
Whatever the scenario, since the 12889 principals match and { c, s } matches a confirmed record, 12890 the server leaves client x's relevant leased client state intact, 12891 leaves its callback and callback_ident values unmodified, and 12892 returns NFS4_OK. 12894 o The server has not recorded a confirmed { *, *, c, *, * }, and has 12895 recorded an unconfirmed { *, x, c, k, s }. Even if this is a 12896 retry from the client, the client's first 12897 SETCLIENTID_CONFIRM attempt was not received by the server. Retry 12898 or not, the server doesn't know, but it processes the request as if it were a 12899 first try. If the principal of the unconfirmed { *, x, c, k, s } 12900 record does not match that of the SETCLIENTID_CONFIRM request, the 12901 server returns NFS4ERR_CLID_INUSE without removing any relevant 12902 leased client state. 12904 Otherwise, the server records a confirmed { *, x, c, k, s }. If 12905 there is also a confirmed { *, x, d, *, t }, the server MUST 12906 remove client x's relevant leased client state, and overwrite 12907 the callback state with k. The confirmed record { *, x, d, *, t } 12908 is removed. 12910 The server returns NFS4_OK. 12912 o The server has no record of a confirmed or unconfirmed { *, *, c, 12913 *, s }. The server returns NFS4ERR_STALE_CLIENTID. The server 12914 does not remove any relevant leased client state, nor does it 12915 modify any recorded callback and callback_ident information for 12916 any client. 12918 The server needs to cache unconfirmed { v, x, c, k, s } client 12919 records and await their confirmation for some time. As should be 12920 clear from the record processing discussions for SETCLIENTID and 12921 SETCLIENTID_CONFIRM, there are cases where the server does not 12922 deterministically remove unconfirmed client records. To avoid 12923 running out of resources, the server is not required to hold 12924 unconfirmed records indefinitely.
One strategy the server might use 12925 is to set a limit on how many unconfirmed client records it will 12926 maintain, and then when the limit would be exceeded, remove the 12927 oldest record. Another strategy might be to remove an unconfirmed 12928 record when some amount of time has elapsed. The choice of the 12929 amount of time is fairly arbitrary, but it is surely no longer than 12930 the server's lease time period. Consider that leases need to be 12931 renewed before the lease time expires via an operation from the 12932 client. If the client cannot issue a SETCLIENTID_CONFIRM after a 12933 SETCLIENTID before a period of time equal to that of a lease expires, 12934 then the client is unlikely to be able to maintain state on the server 12935 during steady state operation. 12937 If the client does send a SETCLIENTID_CONFIRM for an unconfirmed 12938 record that the server has already deleted, the client will get 12939 NFS4ERR_STALE_CLIENTID back. If so, the client should then start 12940 over, and send SETCLIENTID to reestablish an unconfirmed client 12941 record and get back an unconfirmed client ID and setclientid_confirm 12942 verifier. The client should then send the SETCLIENTID_CONFIRM to 12943 confirm the client ID. 12945 SETCLIENTID_CONFIRM does not establish or renew a lease. However, if 12946 SETCLIENTID_CONFIRM removes relevant leased client state, and that 12947 state does not include existing delegations, the server MUST allow 12948 the client a period of time no less than the value of the lease_time 12949 attribute to reclaim (via the CLAIM_DELEGATE_PREV claim type of the 12950 OPEN operation) its delegations before removing unreclaimed 12951 delegations. 12953 15.37. Operation 37: VERIFY - Verify Same Attributes 12955 15.37.1. SYNOPSIS 12957 (cfh), fattr -> - 12959 15.37.2. ARGUMENT 12961 struct VERIFY4args { 12962 /* CURRENT_FH: object */ 12963 fattr4 obj_attributes; 12964 }; 12966 15.37.3.
RESULT 12968 struct VERIFY4res { 12969 nfsstat4 status; 12970 }; 12972 15.37.4. DESCRIPTION 12974 The VERIFY operation is used to verify that attributes have a value 12975 assumed by the client before proceeding with following operations in 12976 the compound request. If any of the attributes do not match then the 12977 error NFS4ERR_NOT_SAME must be returned. The current filehandle 12978 retains its value after successful completion of the operation. 12980 15.37.5. IMPLEMENTATION 12982 One possible use of the VERIFY operation is the following compound 12983 sequence. With this the client is attempting to verify that the file 12984 being removed will match what the client expects to be removed. This 12985 sequence can help prevent the unintended deletion of a file. 12987 PUTFH (directory filehandle) 12988 LOOKUP (file name) 12989 VERIFY (filehandle == fh) 12990 PUTFH (directory filehandle) 12991 REMOVE (file name) 12993 This sequence does not prevent a second client from removing and 12994 creating a new file in the middle of this sequence but it does help 12995 avoid the unintended result. 12997 In the case that a RECOMMENDED attribute is specified in the VERIFY 12998 operation and the server does not support that attribute for the file 12999 system object, the error NFS4ERR_ATTRNOTSUPP is returned to the 13000 client. 13002 When the attribute rdattr_error or any write-only attribute (e.g., 13003 time_modify_set) is specified, the error NFS4ERR_INVAL is returned to 13004 the client. 13006 15.38. Operation 38: WRITE - Write to File 13008 15.38.1. SYNOPSIS 13010 (cfh), stateid, offset, stable, data -> count, committed, writeverf 13012 15.38.2. ARGUMENT 13014 enum stable_how4 { 13015 UNSTABLE4 = 0, 13016 DATA_SYNC4 = 1, 13017 FILE_SYNC4 = 2 13018 }; 13020 struct WRITE4args { 13021 /* CURRENT_FH: file */ 13022 stateid4 stateid; 13023 offset4 offset; 13024 stable_how4 stable; 13025 opaque data<>; 13026 }; 13028 15.38.3. 
RESULT 13030 struct WRITE4resok { 13031 count4 count; 13032 stable_how4 committed; 13033 verifier4 writeverf; 13034 }; 13036 union WRITE4res switch (nfsstat4 status) { 13037 case NFS4_OK: 13038 WRITE4resok resok4; 13039 default: 13040 void; 13041 }; 13043 15.38.4. DESCRIPTION 13045 The WRITE operation is used to write data to a regular file. The 13046 target file is specified by the current filehandle. The offset 13047 specifies the offset where the data should be written. An offset of 13048 0 (zero) specifies that the write should start at the beginning of 13049 the file. The count, as encoded as part of the opaque data 13050 parameter, represents the number of bytes of data that are to be 13051 written. If the count is 0 (zero), the WRITE will succeed and return 13052 a count of 0 (zero) subject to permissions checking. The server may 13053 choose to write fewer bytes than requested by the client. 13055 Part of the write request is a specification of how the write is to 13056 be performed. The client specifies with the stable parameter the 13057 method of how the data is to be processed by the server. If stable 13058 is FILE_SYNC4, the server must commit the data written plus all file 13059 system metadata to stable storage before returning results. This 13060 corresponds to the NFS version 2 protocol semantics. Any other 13061 behavior constitutes a protocol violation. If stable is DATA_SYNC4, 13062 then the server must commit all of the data to stable storage and 13063 enough of the metadata to retrieve the data before returning. The 13064 server implementer is free to implement DATA_SYNC4 in the same 13065 fashion as FILE_SYNC4, but with a possible performance drop. If 13066 stable is UNSTABLE4, the server is free to commit any part of the 13067 data and the metadata to stable storage, including all or none, 13068 before returning a reply to the client. 
There is no guarantee 13069 whether or when any uncommitted data will subsequently be committed 13070 to stable storage. The only guarantees made by the server are that 13071 it will not destroy any data without changing the value of writeverf and 13072 that it will not commit the data and metadata at a level less than 13073 that requested by the client. 13075 The stateid value for a WRITE request represents a value returned 13076 from a previous byte-range lock or share reservation request or the 13077 stateid associated with a delegation. The stateid is used by the 13078 server to verify that the associated share reservation and any byte- 13079 range locks are still valid and to update lease timeouts for the 13080 client. 13082 Upon successful completion, the following results are returned. The 13083 count result is the number of bytes of data written to the file. The 13084 server may write fewer bytes than requested. If so, the actual 13085 number of bytes written starting at location offset is returned. 13087 The server also returns an indication of the level of commitment of 13088 the data and metadata via committed. If the server committed all 13089 data and metadata to stable storage, committed should be set to 13090 FILE_SYNC4. If the level of commitment was at least as strong as 13091 DATA_SYNC4, then committed should be set to DATA_SYNC4. Otherwise, 13092 committed must be returned as UNSTABLE4. If stable was FILE_SYNC4, 13093 then committed must also be FILE_SYNC4: anything else constitutes a 13094 protocol violation. If stable was DATA_SYNC4, then committed may be 13095 FILE_SYNC4 or DATA_SYNC4: anything else constitutes a protocol 13096 violation. If stable was UNSTABLE4, then committed may be any of 13097 FILE_SYNC4, DATA_SYNC4, or UNSTABLE4. 13099 The final portion of the result is the write verifier.
The write 13100 verifier is a cookie that the client can use to determine whether the 13101 server has changed instance (boot) state between a call to WRITE and 13102 a subsequent call to either WRITE or COMMIT. This cookie must be 13103 consistent during a single instance of the NFSv4 protocol service and 13104 must be unique between instances of the NFSv4 protocol server, where 13105 uncommitted data may be lost. 13107 If a client writes data to the server with the stable argument set to 13108 UNSTABLE4 and the reply yields a committed response of DATA_SYNC4 or 13109 UNSTABLE4, the client will follow up some time in the future with a 13110 COMMIT operation to synchronize outstanding asynchronous data and 13111 metadata with the server's stable storage, barring client error. It 13112 is possible that, due to a client crash or other error, a subsequent 13113 COMMIT will not be received by the server. 13115 For a WRITE using the special anonymous stateid, the server MAY allow 13116 the WRITE to be serviced subject to mandatory file locks or the 13117 current share deny modes for the file. For a WRITE using the special 13118 READ bypass stateid, the server MUST NOT allow the WRITE operation to 13119 bypass locking checks at the server; the WRITE is treated exactly the same 13120 as if the anonymous stateid were used. 13122 On success, the current filehandle retains its value. 13124 15.38.5. IMPLEMENTATION 13126 It is possible for the server to write fewer bytes of data than 13127 requested by the client. In this case, the server should not return 13128 an error unless no data was written at all. If the server writes 13129 fewer than the number of bytes specified, the client should issue 13130 another WRITE to write the remaining data. 13132 It is assumed that the act of writing data to a file will cause the 13133 time_modified of the file to be updated. However, the time_modified 13134 of the file should not be changed unless the contents of the file are 13135 changed.
Thus, a WRITE request with count set to 0 should not cause 13136 the time_modified of the file to be updated. 13138 The definition of stable storage has historically been a point of 13139 contention. The following expected properties of stable storage may 13140 help in resolving design issues in the implementation. Stable 13141 storage is persistent storage that survives: 13143 1. Repeated power failures. 13145 2. Hardware failures (of any board, power supply, etc.). 13147 3. Repeated software crashes, including the reboot cycle. 13149 This definition does not address failure of the stable storage module 13150 itself. 13152 The verifier is defined to allow a client to detect different 13153 instances of an NFSv4 protocol server over which cached, uncommitted 13154 data may be lost. In the most likely case, the verifier allows the 13155 client to detect server reboots. This information is required so 13156 that the client can safely determine whether the server could have 13157 lost cached data. If the server fails unexpectedly and the client 13158 has uncommitted data from previous WRITE requests (done with the 13159 stable argument set to UNSTABLE4 and in which the result committed 13160 was returned as UNSTABLE4 as well), the server may not have flushed cached 13161 data to stable storage. The burden of recovery is on the client, and 13162 the client will need to retransmit the data to the server. 13164 A suggested verifier would be to use the time that the server was 13165 booted or the time the server was last started (if restarting the 13166 server without a reboot results in lost buffers). 13168 The committed field in the results allows the client to do more 13169 effective caching. If the server is committing all WRITE requests to 13170 stable storage, then it should return with committed set to 13171 FILE_SYNC4, regardless of the value of the stable field in the 13172 arguments. A server that uses an NVRAM accelerator may choose to 13173 implement this policy.
The client can use this to increase the 13174 effectiveness of the cache by discarding cached data that has already 13175 been committed on the server. 13177 Some implementations may return NFS4ERR_NOSPC instead of 13178 NFS4ERR_DQUOT when a user's quota is exceeded. In the case that the 13179 current filehandle is a directory, the server will return 13180 NFS4ERR_ISDIR. If the current filehandle is not a regular file or a 13181 directory, the server will return NFS4ERR_INVAL. 13183 If mandatory file locking is on for the file, and the byte range of 13184 the file corresponding to the data to be written is read or write locked by an 13185 owner that is not associated with the stateid, the server will return 13186 NFS4ERR_LOCKED. If so, the client must check if the owner 13187 corresponding to the stateid used with the WRITE operation has a 13188 conflicting read lock that overlaps with the region that was to be 13189 written. If the stateid's owner has no conflicting read lock, then 13190 the client should try to get the appropriate write byte-range lock 13191 via the LOCK operation before re-attempting the WRITE. When the 13192 WRITE completes, the client should release the byte-range lock via 13193 LOCKU. 13195 If the stateid's owner had a conflicting read lock, then the client 13196 has no choice but to return an error to the application that 13197 attempted the WRITE. The reason is that, since the stateid's owner 13198 had a read lock, the server either attempted to temporarily 13199 upgrade this read lock to a write lock, or the server has 13200 no upgrade capability. If the server attempted to upgrade the read 13201 lock and failed, it is pointless for the client to re-attempt the 13202 upgrade via the LOCK operation, because there might be another client 13203 also trying to upgrade. If two clients are blocked trying to upgrade 13204 the same lock, the clients deadlock.
If the server has no upgrade 13205 capability, then it is pointless to try a LOCK operation to upgrade. 13207 15.39. Operation 39: RELEASE_LOCKOWNER - Release Lockowner State 13209 15.39.1. SYNOPSIS 13211 lock-owner -> () 13213 15.39.2. ARGUMENT 13215 struct RELEASE_LOCKOWNER4args { 13216 lock_owner4 lock_owner; 13217 }; 13219 15.39.3. RESULT 13221 struct RELEASE_LOCKOWNER4res { 13222 nfsstat4 status; 13223 }; 13225 15.39.4. DESCRIPTION 13227 This operation is used to notify the server that the lock_owner is no 13228 longer in use by the client and that future client requests will not 13229 reference this lock_owner. This allows the server to release cached 13230 state related to the specified lock_owner. If file locks associated 13231 with the lock_owner are held at the server, the error 13232 NFS4ERR_LOCKS_HELD will be returned and no further action will be 13233 taken. 13235 15.39.5. IMPLEMENTATION 13237 The client may choose to use this operation to reduce the amount of 13238 server state that is held. Information that can be released when a 13239 RELEASE_LOCKOWNER is done includes the specified lock-owner string, 13240 the seqid associated with the lock-owner, any saved reply for the 13241 lock-owner, and any lock stateids associated with that lock-owner. 13243 Depending on the behavior of applications at the client, it may be 13244 important for the client to use this operation since the server has 13245 certain obligations with respect to holding a reference to lock- 13246 owner-associated state as long as an associated file is open. 13247 Therefore, if the client knows for certain that the lock_owner will 13248 no longer be used, either to reference existing lock stateids 13249 associated with the lock-owner or to create new ones, it should use 13250 RELEASE_LOCKOWNER. 13252 15.40. Operation 10044: ILLEGAL - Illegal operation 13254 15.40.1. SYNOPSIS 13256 <null> -> () 13258 15.40.2. ARGUMENT 13260 void; 13262 15.40.3.
RESULT 13264 struct ILLEGAL4res { 13265 nfsstat4 status; 13266 }; 13268 15.40.4. DESCRIPTION 13270 This operation is a placeholder for encoding a result to handle the 13271 case of the client sending an operation code within COMPOUND that is 13272 not supported. See Section 15.2.4 for more details. 13274 The status field of ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. 13276 15.40.5. IMPLEMENTATION 13278 A client will probably not send an operation with code OP_ILLEGAL, but 13279 if it does, the response will be ILLEGAL4res just as it would be with 13280 any other invalid operation code. Note that if the server gets an 13281 illegal operation code that is not OP_ILLEGAL, and if the server 13282 checks for legal operation codes during the XDR decode phase, then 13283 the ILLEGAL4res would not be returned. 13285 16. NFSv4 Callback Procedures 13287 The procedures used for callbacks are defined in the following 13288 sections. In the interest of clarity, the terms "client" and 13289 "server" refer to NFS clients and servers, despite the fact that for 13290 an individual callback RPC, the sense of these terms would be 13291 precisely the opposite. 13293 [RFC Editor: prior to publishing this document as an RFC, please have 13294 every Section that has a title of "Procedure X:" or "Operation Y:" 13295 start at the top of a new page.] 13297 16.1. Procedure 0: CB_NULL - No Operation 13299 16.1.1. SYNOPSIS 13301 <null> 13303 16.1.2. ARGUMENT 13305 void; 13307 16.1.3. RESULT 13309 void; 13311 16.1.4. DESCRIPTION 13313 Standard NULL procedure. Void argument, void response. Even though 13314 there is no direct functionality associated with this procedure, the 13315 server will use CB_NULL to confirm the existence of a path for RPCs 13316 from server to client. 13318 16.2. Procedure 1: CB_COMPOUND - Compound Operations 13320 16.2.1. SYNOPSIS 13322 compoundargs -> compoundres 13324 16.2.2.
ARGUMENT 13326 enum nfs_cb_opnum4 { 13327 OP_CB_GETATTR = 3, 13328 OP_CB_RECALL = 4, 13329 OP_CB_ILLEGAL = 10044 13330 }; 13331 union nfs_cb_argop4 switch (unsigned argop) { 13332 case OP_CB_GETATTR: 13333 CB_GETATTR4args opcbgetattr; 13334 case OP_CB_RECALL: 13335 CB_RECALL4args opcbrecall; 13336 case OP_CB_ILLEGAL: void; 13337 }; 13339 struct CB_COMPOUND4args { 13340 utf8str_cs tag; 13341 uint32_t minorversion; 13342 uint32_t callback_ident; 13343 nfs_cb_argop4 argarray<>; 13344 }; 13346 16.2.3. RESULT 13348 union nfs_cb_resop4 switch (unsigned resop) { 13349 case OP_CB_GETATTR: CB_GETATTR4res opcbgetattr; 13350 case OP_CB_RECALL: CB_RECALL4res opcbrecall; 13351 case OP_CB_ILLEGAL: CB_ILLEGAL4res opcbillegal; 13352 }; 13354 struct CB_COMPOUND4res { 13355 nfsstat4 status; 13356 utf8str_cs tag; 13357 nfs_cb_resop4 resarray<>; 13358 }; 13360 16.2.4. DESCRIPTION 13362 The CB_COMPOUND procedure is used to combine one or more of the 13363 callback procedures into a single RPC request. The main callback RPC 13364 program has two main procedures: CB_NULL and CB_COMPOUND. All other 13365 operations use the CB_COMPOUND procedure as a wrapper. 13367 In the processing of the CB_COMPOUND procedure, the client may find 13368 that it does not have the available resources to execute any or all 13369 of the operations within the CB_COMPOUND sequence. In this case, the 13370 error NFS4ERR_RESOURCE will be returned for the particular operation 13371 within the CB_COMPOUND procedure where the resource exhaustion 13372 occurred. This assumes that all previous operations within the 13373 CB_COMPOUND sequence have been evaluated successfully. 13375 Contained within the CB_COMPOUND results is a 'status' field. This 13376 status must be equivalent to the status of the last operation that 13377 was executed within the CB_COMPOUND procedure. 
Therefore, if an 13378 operation incurred an error then the 'status' value will be the same 13379 error value as is being returned for the operation that failed. 13381 For the definition of the "tag" field, see Section 15.2. 13383 The value of callback_ident is supplied by the client during 13384 SETCLIENTID. The server must use the client supplied callback_ident 13385 during the CB_COMPOUND to allow the client to properly identify the 13386 server. 13388 Illegal operation codes are handled in the same way as they are 13389 handled for the COMPOUND procedure. 13391 16.2.5. IMPLEMENTATION 13393 The CB_COMPOUND procedure is used to combine individual operations 13394 into a single RPC request. The client interprets each of the 13395 operations in turn. If an operation is executed by the client and 13396 the status of that operation is NFS4_OK, then the next operation in 13397 the CB_COMPOUND procedure is executed. The client continues this 13398 process until there are no more operations to be executed or one of 13399 the operations has a status value other than NFS4_OK. 13401 16.2.6. Operation 3: CB_GETATTR - Get Attributes 13403 16.2.6.1. SYNOPSIS 13405 fh, attr_request -> attrmask, attr_vals 13407 16.2.6.2. ARGUMENT 13409 struct CB_GETATTR4args { 13410 nfs_fh4 fh; 13411 bitmap4 attr_request; 13412 }; 13414 16.2.6.3. RESULT 13415 struct CB_GETATTR4resok { 13416 fattr4 obj_attributes; 13417 }; 13419 union CB_GETATTR4res switch (nfsstat4 status) { 13420 case NFS4_OK: 13421 CB_GETATTR4resok resok4; 13422 default: 13423 void; 13424 }; 13426 16.2.6.4. DESCRIPTION 13428 The CB_GETATTR operation is used by the server to obtain the current 13429 modified state of a file that has been OPEN_DELEGATE_WRITE delegated. 13430 The attributes size and change are the only ones guaranteed to be 13431 serviced by the client. See Section 10.4.3 for a full description of 13432 how the client and server are to interact with the use of CB_GETATTR. 
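As an illustration of the attribute guarantee above, the following Python sketch shows one way a client might narrow the server's attr_request bitmap to the two attributes it is guaranteed to service. It is not part of the protocol definition. The helper names (bitmap_has, bitmap_from, cb_getattr_attrmask) are hypothetical, and the attribute numbers 3 (change) and 4 (size) are assumed from the attribute definitions elsewhere in this specification. A bitmap4 is modeled as a list of 32-bit words in which attribute n occupies bit (n mod 32) of word (n / 32).

```python
# Assumed attribute numbers (change = 3, size = 4); verify against the
# attribute definitions before relying on them.
FATTR4_CHANGE = 3
FATTR4_SIZE = 4

# The only attributes the text above guarantees a client will service.
GUARANTEED_ATTRS = (FATTR4_CHANGE, FATTR4_SIZE)


def bitmap_has(bitmap, attr):
    """Return True if attribute number `attr` is set in a bitmap4 word list."""
    word, bit = divmod(attr, 32)
    return word < len(bitmap) and bool(bitmap[word] & (1 << bit))


def bitmap_from(attrs):
    """Build a bitmap4 (list of 32-bit words) from attribute numbers."""
    words = []
    for attr in attrs:
        word, bit = divmod(attr, 32)
        while len(words) <= word:
            words.append(0)  # grow the array up to the word that is needed
        words[word] |= 1 << bit
    return words


def cb_getattr_attrmask(attr_request):
    """Hypothetical helper: intersect the request with the guaranteed set."""
    return bitmap_from(
        a for a in GUARANTEED_ATTRS if bitmap_has(attr_request, a)
    )
```

Under this sketch, a request that names change, size, and some third attribute yields an attrmask containing only change and size. A client that chooses to service further attributes would extend GUARANTEED_ATTRS accordingly.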
13434 If the filehandle specified is not one for which the client holds an 13435 OPEN_DELEGATE_WRITE delegation, an NFS4ERR_BADHANDLE error is 13436 returned. 13438 16.2.6.5. IMPLEMENTATION 13440 The client returns attrmask bits and the associated attribute values 13441 only for the change attribute and for attributes that it may change 13442 (time_modify and size). 13444 16.2.7. Operation 4: CB_RECALL - Recall an Open Delegation 13446 16.2.7.1. SYNOPSIS 13448 stateid, truncate, fh -> () 13450 16.2.7.2. ARGUMENT 13452 struct CB_RECALL4args { 13453 stateid4 stateid; 13454 bool truncate; 13455 nfs_fh4 fh; 13456 }; 13458 16.2.7.3. RESULT 13460 struct CB_RECALL4res { 13461 nfsstat4 status; 13462 }; 13464 16.2.7.4. DESCRIPTION 13466 The CB_RECALL operation is used to begin the process of recalling an 13467 open delegation and returning it to the server. 13469 The truncate flag is used to optimize recall for a file which is 13470 about to be truncated to zero. When it is set, the client is freed 13471 of the obligation to propagate modified data for the file to the server, 13472 since this data is irrelevant. 13474 If the filehandle specified is not one for which the client holds an open 13475 delegation, an NFS4ERR_BADHANDLE error is returned. 13477 If the stateid specified is not one corresponding to an open 13478 delegation for the file specified by the filehandle, an 13479 NFS4ERR_BAD_STATEID error is returned. 13481 16.2.7.5. IMPLEMENTATION 13483 The client should reply to the callback immediately. Replying does 13484 not complete the recall except when an error was returned. The 13485 recall is not complete until the delegation is returned using a 13486 DELEGRETURN. 13488 16.2.8. Operation 10044: CB_ILLEGAL - Illegal Callback Operation 13490 16.2.8.1. SYNOPSIS 13492 -> () 13494 16.2.8.2. ARGUMENT 13496 void; 13498 16.2.8.3.
RESULT 13499 /* 13500 * CB_ILLEGAL: Response for illegal operation numbers 13501 */ 13502 struct CB_ILLEGAL4res { 13503 nfsstat4 status; 13504 }; 13506 16.2.8.4. DESCRIPTION 13508 This operation is a place-holder for encoding a result to handle the 13509 case of the server sending an operation code within CB_COMPOUND that is 13510 not supported. See Section 15.2.4 for more details. 13512 The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. 13514 16.2.8.5. IMPLEMENTATION 13516 A server will probably not send an operation with code OP_CB_ILLEGAL, 13517 but if it does, the response will be CB_ILLEGAL4res just as it would 13518 be with any other invalid operation code. Note that if the client 13519 gets an illegal operation code that is not OP_CB_ILLEGAL, and if the 13520 client checks for legal operation codes during the XDR decode phase, 13521 then the CB_ILLEGAL4res would not be returned. 13523 17. Security Considerations 13525 NFS has historically used a model where, from an authentication 13526 perspective, the client was the entire machine, or at least the 13527 source IP address of the machine. The NFS server relied on the NFS 13528 client to authenticate the end-user properly. The NFS 13529 server in turn shared its files only with specific clients, as 13530 identified by the client's source IP address. Given this model, the 13531 AUTH_SYS RPC security flavor simply identified the end-user using the 13532 client to the NFS server. When processing NFS responses, the client 13533 ensured that the responses came from the same IP address and port 13534 number that the request was sent to. While such a model is easy to 13535 implement and simple to deploy and use, it is certainly not a safe 13536 model.
Thus, NFSv4 mandates that implementations support a security 13537 model that uses end-to-end authentication, where an end-user on a 13538 client mutually authenticates (via cryptographic schemes that do not 13539 expose passwords or keys in the clear on the network) to a principal 13540 on an NFS server. Consideration should also be given to the 13541 integrity and privacy of NFS requests and responses. The issues of 13542 end-to-end mutual authentication, integrity, and privacy are 13543 discussed as part of Section 3. 13545 When an NFSv4-mandated security model is used and a security 13546 principal or an NFSv4 name in user@dns_domain form needs to be 13547 translated to or from a local representation as described in 13548 Section 5.9, the translation SHOULD be done in a secure manner that 13549 preserves the integrity of the translation. For communication with a 13550 name service such as LDAP ([RFC4511]), this means employing a 13551 security service that uses authentication and data integrity. 13552 Kerberos and Transport Layer Security (TLS) ([RFC5246]) are examples 13553 of such a security service. 13555 Note that being REQUIRED to implement does not mean REQUIRED to use; 13556 AUTH_SYS can be used by NFSv4 clients and servers. However, AUTH_SYS 13557 is merely an OPTIONAL security flavor in NFSv4, and so 13558 interoperability via AUTH_SYS is not assured. 13560 For reasons of reduced administration overhead, better performance, 13561 and/or reduction of CPU utilization, users of NFSv4 implementations 13562 may choose not to use security mechanisms that enable integrity 13563 protection on each remote procedure call and response. The use of 13564 mechanisms without integrity protection leaves the user vulnerable to an 13565 attacker between the NFS client and server who modifies the RPC 13566 request and/or the response.
While implementations are free to 13567 provide the option to use weaker security mechanisms, there are two 13568 operations in particular that warrant the implementation overriding 13569 user choices. 13571 The first such operation is SECINFO. It is recommended that the 13572 client issue the SECINFO call such that it is protected with a 13573 security flavor that has integrity protection, such as RPCSEC_GSS 13574 with a security triple that uses either the rpc_gss_svc_integrity or 13575 the rpc_gss_svc_privacy service (rpc_gss_svc_privacy includes integrity 13576 protection). Without integrity protection encapsulating 13577 SECINFO and therefore its results, an attacker in the middle could 13578 modify results such that the client might select a weaker algorithm 13579 in the set allowed by the server, making the client and/or server 13580 vulnerable to further attacks. 13582 The second operation that SHOULD use integrity protection is any 13583 GETATTR for the fs_locations attribute. The attack this guards against has two steps. 13584 First, the attacker modifies the unprotected results of some operation 13585 to return NFS4ERR_MOVED. Second, when the client follows up with a 13586 GETATTR for the fs_locations attribute, the attacker modifies the 13587 results to cause the client to migrate its traffic to a server 13588 controlled by the attacker. 13590 Because the operations SETCLIENTID/SETCLIENTID_CONFIRM are 13591 responsible for the release of client state, it is imperative that 13592 the principal used for these operations be checked against and match 13593 the previous use of these operations. See Section 9.1.1 for 13594 further discussion. 13596 Unicode in the form of UTF-8 is used for file component names (i.e., 13597 both directory and file components), as well as the owner and 13598 owner_group attributes; other character sets may also be allowed for 13599 file component names.
String processing (e.g., Unicode 13600 normalization) raises security concerns for string comparison; see 13601 Sections 5.9 and 12 for further discussion and see [RFC6943] for 13602 related identifier comparison security considerations. File 13603 component names are identifiers with respect to the identifier 13604 comparison discussion in [RFC6943] because they are used to identify 13605 the objects to which ACLs are applied; see Section 6. 13607 18. IANA Considerations 13609 This section uses terms that are defined in [RFC5226]. 13611 18.1. Named Attribute Definitions 13613 IANA has created a registry called the "NFSv4 Named Attribute 13614 Definitions Registry" for [RFC3530] and [RFC5661]. This section 13615 introduces no changes, but it does recap the intent. 13617 The NFSv4 protocol supports the association of a file with zero or 13618 more named attributes. The name space identifiers for these 13619 attributes are defined as string names. The protocol does not define 13620 the specific assignment of the name space for these file attributes. 13621 The IANA registry promotes interoperability where common interests 13622 exist. While application developers are allowed to define and use 13623 attributes as needed, they are encouraged to register the attributes 13624 with IANA. 13626 Such registered named attributes are presumed to apply to all minor 13627 versions of NFSv4, including those defined subsequently to the 13628 registration. Where a named attribute is intended to be limited 13629 to certain minor versions, the 13630 assignment in the registry will clearly state the applicable limits. 13632 The registry is to be maintained using the Specification Required 13633 policy as defined in Section 4.1 of [RFC5226].
13635 Under the NFSv4 specification, the name of a named attribute can in 13636 theory be up to 2^32 - 1 bytes in length, but in practice NFSv4 13637 clients and servers will be unable to handle a string that long. 13638 IANA should reject any assignment request whose named attribute name 13639 exceeds 128 UTF-8 characters. To give the IESG the flexibility to set up 13640 bases of assignment of Experimental Use and Standards Action, the 13641 prefixes of "EXPE" and "STDS" are Reserved. The zero-length named 13642 attribute name is Reserved. 13644 The prefix "PRIV" is allocated for Private Use. A site that wants to 13645 make use of unregistered named attributes without risk of conflicting 13646 with an assignment in IANA's registry should use the prefix "PRIV" in 13647 all of its named attributes. 13649 Because some NFSv4 clients and servers have case-insensitive 13650 semantics, the fifteen additional lowercase and mixed-case 13651 permutations of each of "EXPE", "PRIV", and "STDS" are Reserved 13652 (e.g., "expe", "expE", and "exPe"). Similarly, IANA 13653 must not allow two assignments that would conflict if both named 13654 attributes were converted to a common case. 13656 The registry of named attributes is a list of assignments, each 13657 containing three fields. 13659 1. A US-ASCII string name that is the actual name of the attribute. 13660 This name must be unique. This string name can be 1 to 128 UTF-8 13661 characters long. 13663 2. A reference to the specification of the named attribute. The 13664 reference can consume up to 256 bytes (or more if IANA permits). 13666 3. The point of contact of the registrant. The point of contact can 13667 consume up to 256 bytes (or more if IANA permits). 13669 18.1.1. Initial Registry 13671 There is no initial registry. 13673 18.1.2. Updating Registrations 13675 The registrant is always permitted to update the point of contact 13676 field.
Any other change will require Expert Review or IESG 13677 Approval. 13679 19. References 13681 19.1. Normative References 13683 [RFC20] Cerf, V., "ASCII format for network interchange", RFC 20, 13684 October 1969. 13686 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 13687 Requirement Levels", BCP 14, RFC 2119, March 1997. 13689 [RFC2203] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol 13690 Specification", RFC 2203, September 1997. 13692 [RFC2743] Linn, J., "Generic Security Service Application Program 13693 Interface Version 2, Update 1", RFC 2743, January 2000. 13695 [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, 13696 "Internationalizing Domain Names in Applications (IDNA)", 13697 RFC 3490, March 2003. 13699 [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode 13700 for Internationalized Domain Names in Applications 13701 (IDNA)", RFC 3492, March 2003. 13703 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 13704 10646", STD 63, RFC 3629, November 2003. 13706 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an 13707 IANA Considerations Section in RFCs", BCP 26, RFC 5226, 13708 May 2008. 13710 [RFC5403] Eisler, M., "RPCSEC_GSS Version 2", RFC 5403, February 13711 2009. 13713 [RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol 13714 Specification Version 2", RFC 5531, May 2009. 13716 [RFC5665] Eisler, M., Ed., "IANA Considerations for Remote Procedure 13717 Call (RPC) Network Identifiers and Universal Address 13718 Formats", RFC 5665, January 2010. 13720 [RFC5890] Klensin, J., "Internationalized Domain Names in 13721 Applications (IDNA): Definitions and Document Framework", 13722 RFC 5890, August 2010. 13724 [RFC5891] Klensin, J., "Internationalized Domain Names in 13725 Applications (IDNA): Protocol", RFC 5891, August 2010. 13727 [RFC6649] Astrand, L. and T. Yu, "Deprecate DES, RC4-HMAC-EXP, and 13728 Other Weak Cryptographic Algorithms in Kerberos", RFC 13729 6649, July 2012.
13731 [RFCNFSv4XDR] 13732 Haynes, T. and D. Noveck, "NFSv4 Version 0 XDR 13733 Description", draft-ietf-nfsv4-rfc3530bis-dot-x-23 (work 13734 in progress), December 2014. 13736 [SPECIALCASING] 13737 The Unicode Consortium, "SpecialCasing-6.3.0.txt", Unicode 13738 Character Database, September 2013, 13739 . 13742 [UNICODE] The Unicode Consortium, "The Unicode Standard, Version 13743 6.3.0", September 2013, 13744 . 13746 [openg_symlink] 13747 The Open Group, "Section 3.372 of Chapter 3 of Base 13748 Definitions of The Open Group Base Specifications Issue 6, 13749 IEEE Std 1003.1, 2004 Edition, HTML Version 13750 (www.opengroup.org), ISBN 1931624232", 2004. 13752 19.2. Informative References 13754 [Chet] Juszczak, C., "Improving the Performance and Correctness 13755 of an NFS Server", USENIX Conference Proceedings, June 13756 1990. 13758 [Floyd] Floyd, S. and V. Jacobson, "The Synchronization of 13759 Periodic Routing Messages", IEEE/ACM Transactions on 13760 Networking 2(2), pp. 122-136, April 1994. 13762 [IESG_ERRATA] 13763 IESG, "IESG Processing of RFC Errata for the IETF Stream", 13764 July 2008. 13766 [MS-SMB] Microsoft Corporation, "Server Message Block (SMB) 13767 Protocol Specification", MS-SMB 17.0, November 2009. 13769 [P1003.1e] 13770 Institute of Electrical and Electronics Engineers, Inc., 13771 "IEEE Draft P1003.1e", 1997. 13773 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 13774 793, September 1981. 13776 [RFC1094] Nowicki, B., "NFS: Network File System Protocol 13777 specification", RFC 1094, March 1989. 13779 [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS 13780 Version 3 Protocol Specification", RFC 1813, June 1995. 13782 [RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", 13783 RFC 1833, August 1995. 13785 [RFC2054] Callaghan, B., "WebNFS Client Specification", RFC 2054, 13786 October 1996. 13788 [RFC2055] Callaghan, B., "WebNFS Server Specification", RFC 2055, 13789 October 1996.
13791 [RFC2224] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997. 13793 [RFC2623] Eisler, M., "NFS Version 2 and Version 3 Security Issues 13794 and the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5", 13795 RFC 2623, June 1999. 13797 [RFC2624] Shepler, S., "NFS Version 4 Design Considerations", RFC 13798 2624, June 1999. 13800 [RFC2755] Chiu, A., Eisler, M., and B. Callaghan, "Security 13801 Negotiation for WebNFS", RFC 2755, January 2000. 13803 [RFC3010] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., 13804 Beame, C., Eisler, M., and D. Noveck, "Network File System 13805 (NFS) version 4 Protocol", RFC 3010, December 2000. 13807 [RFC3232] Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by 13808 an On-line Database", RFC 3232, January 2002. 13810 [RFC3530] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., 13811 Beame, C., Eisler, M., and D. Noveck, "Network File System 13812 (NFS) version 4 Protocol", RFC 3530, April 2003. 13814 [RFC4121] Zhu, L., Jaganathan, K., and S. Hartman, "The Kerberos 13815 Version 5 Generic Security Service Application Program 13816 Interface (GSS-API) Mechanism: Version 2", RFC 4121, July 13817 2005. 13819 [RFC4178] Zhu, L., Leach, P., Jaganathan, K., and W. Ingersoll, "The 13820 Simple and Protected Generic Security Service Application 13821 Program Interface (GSS-API) Negotiation Mechanism", RFC 13822 4178, October 2005. 13824 [RFC4506] Eisler, M., "XDR: External Data Representation Standard", 13825 RFC 4506, May 2006. 13827 [RFC4511] Sermersheim, J., "Lightweight Directory Access Protocol 13828 (LDAP): The Protocol", RFC 4511, June 2006. 13830 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security 13831 (TLS) Protocol Version 1.2", RFC 5246, August 2008. 13833 [RFC5661] Shepler, S., Eisler, M., and D. Noveck, "Network File 13834 System (NFS) Version 4 Minor Version 1 Protocol", RFC 13835 5661, January 2010. 13837 [RFC6365] Hoffman, P. and J. 
Klensin, "Terminology Used in 13838 Internationalization in the IETF", BCP 166, RFC 6365, 13839 September 2011. 13841 [RFC6943] Thaler, D., "Issues in Identifier Comparison for Security 13842 Purposes", RFC 6943, May 2013. 13844 [fcntl] The Open Group, "Section 'fcntl()' of System Interfaces of 13845 The Open Group Base Specifications Issue 6 IEEE Std 13846 1003.1, 2004 Edition, HTML Version (www.opengroup.org), 13847 ISBN 1931624232", 2004. 13849 [fsync] The Open Group, "Section 'fsync()' of System Interfaces of 13850 The Open Group Base Specifications Issue 6 IEEE Std 13851 1003.1, 2004 Edition, HTML Version (www.opengroup.org), 13852 ISBN 1931624232", 2004. 13854 [getpwnam] 13855 The Open Group, "Section 'getpwnam()' of System Interfaces 13856 of The Open Group Base Specifications Issue 6 IEEE Std 13857 1003.1, 2004 Edition, HTML Version (www.opengroup.org), 13858 ISBN 1931624232", 2004. 13860 [read_api] 13861 The Open Group, "Section 'read()' of System Interfaces of 13862 The Open Group Base Specifications Issue 6, IEEE Std 13863 1003.1, 2004 Edition", 2004. 13865 [readdir_api] 13866 The Open Group, "Section 'readdir()' of System Interfaces 13867 of The Open Group Base Specifications Issue 6, IEEE Std 13868 1003.1, 2004 Edition", 2004. 13870 [unlink] The Open Group, "Section 'unlink()' of System Interfaces 13871 of The Open Group Base Specifications Issue 6 IEEE Std 13872 1003.1, 2004 Edition, HTML Version (www.opengroup.org), 13873 ISBN 1931624232", 2004. 13875 [write_api] 13876 The Open Group, "Section 'write()' of System Interfaces of 13877 The Open Group Base Specifications Issue 6, IEEE Std 13878 1003.1, 2004 Edition", 2004. 13880 [xnfs] The Open Group, "Protocols for Interworking: XNFS, Version 13881 3W, ISBN 1-85912-184-5", February 1998. 13883 Appendix A. Acknowledgments 13885 A bis is certainly built on the shoulders of the first attempt. 
13886 Spencer Shepler, Brent Callaghan, David Robinson, Robert Thurlow, 13887 Carl Beame, Mike Eisler, and David Noveck are responsible for a great 13888 deal of the effort in this work. 13890 Tom Haynes would like to thank NetApp, Inc. for its funding of his 13891 time on this project. 13893 Rob Thurlow clarified how a client should contact a new server if a 13894 migration has occurred. 13896 David Black, Nico Williams, Mike Eisler, Trond Myklebust, James 13897 Lentini, and Mike Kupfer read many drafts of Section 12 and 13898 contributed numerous useful suggestions, without which the necessary 13899 revision of that section for this document would not have been 13900 possible. 13902 Peter Staubach read almost all of the drafts of Section 12 leading to 13903 the published result and his numerous comments were always useful and 13904 contributed substantially to improving the quality of the final 13905 result. 13907 Peter Saint-Andre was gracious enough to read the last draft of 13908 Section 12 and provided some key insight as to the concerns of the 13909 Internationalization community. 13911 James Lentini graciously read the rewrite of Section 8 and his 13912 comments were vital in improving the quality of that effort. 13914 Rob Thurlow, Sorin Faibish, James Lentini, Bruce Fields, and Trond 13915 Myklebust were faithful attendants of the biweekly triage meeting and 13916 accepted many an action item. 13918 Bruce Fields was a good sounding board for both the Third Edge 13919 Condition and Courtesy Locks in general. He was also the leading 13920 advocate of stamping out backport issues from [RFC5661]. 13922 Marcel Telka was a champion of straightening out the difference 13923 between a lock-owner and an open-owner. He has also been diligent in 13924 reviewing the final document. 13926 Benjamin Kaduk reminded us that DES is dead and Nico Williams helped 13927 us close the lid on the coffin. 
13929 Elwyn Davies provided a very thorough and engaging Gen-ART review, 13930 thanks! 13932 Appendix B. RFC Editor Notes 13934 [RFC Editor: please remove this section prior to publishing this 13935 document as an RFC] 13937 [RFC Editor: prior to publishing this document as an RFC, please 13938 replace all occurrences of RFCNFSv4XDR with RFCxxxx where xxxx is the 13939 RFC number assigned to the XDR document.] 13941 [RFC Editor: Please note that there is also a reference entry that 13942 needs to be modified for the companion document.] 13944 [RFC Editor: prior to publishing this document as an RFC, please have 13945 every top level subsection of both Section 15 and Section 16 that has 13946 a title of "Procedure X:" or "Operation Y:" start at the top of a new 13947 page.] 13949 Authors' Addresses 13951 Thomas Haynes (editor) 13952 Primary Data, Inc. 13953 4300 El Camino Real Ste 100 13954 Los Altos, CA 94022 13955 USA 13957 Phone: +1 408 215 1519 13958 Email: thomas.haynes@primarydata.com 13960 David Noveck (editor) 13961 Dell 13962 300 Innovative Way 13963 Nashua, NH 03062 13964 US 13966 Phone: +1 781 572 8038 13967 Email: dave_noveck@dell.com