NFSv4                                                     T. Haynes, Ed.
Internet-Draft                                                    NetApp
Intended status: Standards Track                          D. Noveck, Ed.
Expires: September 14, 2011                                          EMC
                                                          March 13, 2011


              Network File System (NFS) Version 4 Protocol
                   draft-ietf-nfsv4-rfc3530bis-09.txt

Abstract

   The Network File System (NFS) version 4 is a distributed filesystem
   protocol which owes heritage to NFS protocol version 2, RFC 1094, and
   version 3, RFC 1813.
   Unlike earlier versions, the NFS version 4
   protocol supports traditional file access while integrating support
   for file locking and the mount protocol. In addition, support for
   strong security (and its negotiation), compound operations, client
   caching, and internationalization has been added. Of course,
   attention has been applied to making NFS version 4 operate well in an
   Internet environment.

   This document, together with the companion XDR description document,
   replaces RFC 3530 as the definition of the NFS version 4 protocol.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 14, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
     1.1.  Changes since RFC 3530
     1.2.  Changes since RFC 3010
     1.3.  NFS Version 4 Goals
     1.4.  Inconsistencies of this Document with the companion
           document NFS Version 4 Protocol
     1.5.  Overview of NFSv4 Features
       1.5.1.  RPC and Security
       1.5.2.  Procedure and Operation Structure
       1.5.3.  Filesystem Model
       1.5.4.  OPEN and CLOSE
       1.5.5.  File Locking
       1.5.6.  Client Caching and Delegation
     1.6.  General Definitions
   2.  Protocol Data Types
     2.1.  Basic Data Types
     2.2.  Structured Data Types
   3.  RPC and Security Flavor
     3.1.  Ports and Transports
       3.1.1.  Client Retransmission Behavior
     3.2.  Security Flavors
       3.2.1.  Security mechanisms for NFSv4
     3.3.  Security Negotiation
       3.3.1.  SECINFO
       3.3.2.  Security Error
       3.3.3.  Callback RPC Authentication
   4.  Filehandles
     4.1.  Obtaining the First Filehandle
       4.1.1.  Root Filehandle
       4.1.2.  Public Filehandle
     4.2.  Filehandle Types
       4.2.1.  General Properties of a Filehandle
       4.2.2.  Persistent Filehandle
       4.2.3.  Volatile Filehandle
       4.2.4.  One Method of Constructing a Volatile Filehandle
     4.3.  Client Recovery from Filehandle Expiration
   5.  File Attributes
     5.1.  REQUIRED Attributes
     5.2.  RECOMMENDED Attributes
     5.3.  Named Attributes
     5.4.  Classification of Attributes
     5.5.  Set-Only and Get-Only Attributes
     5.6.  REQUIRED Attributes - List and Definition References
     5.7.  RECOMMENDED Attributes - List and Definition References
     5.8.  Attribute Definitions
       5.8.1.  Definitions of REQUIRED Attributes
       5.8.2.  Definitions of Uncategorized RECOMMENDED Attributes
     5.9.  Interpreting owner and owner_group
     5.10.  Character Case Attributes
   6.  Access Control Attributes
     6.1.  Goals
     6.2.  File Attributes Discussion
       6.2.1.  Attribute 12: acl
       6.2.2.  Attribute 33: mode
     6.3.  Common Methods
       6.3.1.  Interpreting an ACL
       6.3.2.  Computing a Mode Attribute from an ACL
     6.4.  Requirements
       6.4.1.  Setting the mode and/or ACL Attributes
       6.4.2.  Retrieving the mode and/or ACL Attributes
       6.4.3.  Creating New Objects
   7.  Multi-Server Namespace
     7.1.  Location Attributes
     7.2.  File System Presence or Absence
     7.3.  Getting Attributes for an Absent File System
       7.3.1.  GETATTR Within an Absent File System
       7.3.2.  READDIR and Absent File Systems
     7.4.  Uses of Location Information
       7.4.1.  File System Replication
       7.4.2.  File System Migration
       7.4.3.  Referrals
     7.5.  Location Entries and Server Identity
     7.6.  Additional Client-Side Considerations
     7.7.  Effecting File System Transitions
       7.7.1.  File System Transitions and Simultaneous Access
       7.7.2.  Filehandles and File System Transitions
       7.7.3.  Fileids and File System Transitions
       7.7.4.  Fsids and File System Transitions
       7.7.5.  The Change Attribute and File System Transitions
       7.7.6.  Lock State and File System Transitions
       7.7.7.  Write Verifiers and File System Transitions
       7.7.8.  Readdir Cookies and Verifiers and File System Transitions
       7.7.9.  File System Data and File System Transitions
     7.8.  Effecting File System Referrals
       7.8.1.  Referral Example (LOOKUP)
       7.8.2.  Referral Example (READDIR)
     7.9.  The Attribute fs_locations
       7.9.1.  Inferring Transition Modes
   8.  NFS Server Name Space
     8.1.  Server Exports
     8.2.  Browsing Exports
     8.3.  Server Pseudo Filesystem
     8.4.  Multiple Roots
     8.5.  Filehandle Volatility
     8.6.  Exported Root
     8.7.  Mount Point Crossing
     8.8.  Security Policy and Name Space Presentation
   9.  File Locking and Share Reservations
     9.1.  Opens and Byte-Range Locks
       9.1.1.  Client ID
       9.1.2.  Server Release of Client ID
       9.1.3.  Stateid Definition
       9.1.4.  lock_owner
       9.1.5.  Use of the Stateid and Locking
       9.1.6.  Sequencing of Lock Requests
       9.1.7.  Recovery from Replayed Requests
       9.1.8.  Releasing lock_owner State
       9.1.9.  Use of Open Confirmation
     9.2.  Lock Ranges
     9.3.  Upgrading and Downgrading Locks
     9.4.  Blocking Locks
     9.5.  Lease Renewal
     9.6.  Crash Recovery
       9.6.1.  Client Failure and Recovery
       9.6.2.  Server Failure and Recovery
       9.6.3.  Network Partitions and Recovery
     9.7.  Recovery from a Lock Request Timeout or Abort
     9.8.  Server Revocation of Locks
     9.9.  Share Reservations
     9.10.  OPEN/CLOSE Operations
       9.10.1.  Close and Retention of State Information
     9.11.  Open Upgrade and Downgrade
     9.12.  Short and Long Leases
     9.13.  Clocks, Propagation Delay, and Calculating Lease Expiration
     9.14.  Migration, Replication and State
       9.14.1.  Migration and State
       9.14.2.  Replication and State
       9.14.3.  Notification of Migrated Lease
       9.14.4.  Migration and the Lease_time Attribute
   10.  Client-Side Caching
     10.1.  Performance Challenges for Client-Side Caching
     10.2.  Delegation and Callbacks
       10.2.1.  Delegation Recovery
     10.3.  Data Caching
       10.3.1.  Data Caching and OPENs
       10.3.2.  Data Caching and File Locking
       10.3.3.  Data Caching and Mandatory File Locking
       10.3.4.  Data Caching and File Identity
     10.4.  Open Delegation
       10.4.1.  Open Delegation and Data Caching
       10.4.2.  Open Delegation and File Locks
       10.4.3.  Handling of CB_GETATTR
       10.4.4.  Recall of Open Delegation
       10.4.5.  OPEN Delegation Race with CB_RECALL
       10.4.6.  Clients that Fail to Honor Delegation Recalls
       10.4.7.  Delegation Revocation
     10.5.  Data Caching and Revocation
       10.5.1.  Revocation Recovery for Write Open Delegation
     10.6.  Attribute Caching
     10.7.  Data and Metadata Caching and Memory Mapped Files
     10.8.  Name Caching
     10.9.  Directory Caching
   11.  Minor Versioning
   12.  Internationalization
     12.1.  Use of UTF-8
       12.1.1.  Relation to Stringprep
       12.1.2.  Normalization, Equivalence, and Confusability
     12.2.  String Type Overview
       12.2.1.  Overall String Class Divisions
       12.2.2.  Divisions by Typedef Parent types
       12.2.3.  Individual Types and Their Handling
     12.3.  Errors Related to Strings
     12.4.  Types with Pre-processing to Resolve Mixture Issues
       12.4.1.  Processing of Principal Strings
       12.4.2.  Processing of Server Id Strings
     12.5.  String Types without Internationalization Processing
     12.6.  Types with Processing Defined by Other Internet Areas
     12.7.  String Types with NFS-specific Processing
       12.7.1.  Handling of File Name Components
       12.7.2.  Processing of Link Text
       12.7.3.  Processing of Principal Prefixes
   13.  Error Values
     13.1.  Error Definitions
       13.1.1.  General Errors
       13.1.2.  Filehandle Errors
       13.1.3.  Compound Structure Errors
       13.1.4.  File System Errors
       13.1.5.  State Management Errors
       13.1.6.  Security Errors
       13.1.7.  Name Errors
       13.1.8.  Locking Errors
       13.1.9.  Reclaim Errors
       13.1.10.  Client Management Errors
       13.1.11.  Attribute Handling Errors
     13.2.  Operations and their valid errors
     13.3.  Callback operations and their valid errors
     13.4.  Errors and the operations that use them
   14.  NFSv4 Requests
     14.1.  Compound Procedure
     14.2.  Evaluation of a Compound Request
     14.3.  Synchronous Modifying Operations
     14.4.  Operation Values
   15.  NFSv4 Procedures
     15.1.  Procedure 0: NULL - No Operation
     15.2.  Procedure 1: COMPOUND - Compound Operations
     15.3.  Operation 3: ACCESS - Check Access Rights
     15.4.  Operation 4: CLOSE - Close File
     15.5.  Operation 5: COMMIT - Commit Cached Data
     15.6.  Operation 6: CREATE - Create a Non-Regular File Object
     15.7.  Operation 7: DELEGPURGE - Purge Delegations Awaiting Recovery
     15.8.  Operation 8: DELEGRETURN - Return Delegation
     15.9.  Operation 9: GETATTR - Get Attributes
     15.10.  Operation 10: GETFH - Get Current Filehandle
     15.11.  Operation 11: LINK - Create Link to a File
     15.12.  Operation 12: LOCK - Create Lock
     15.13.  Operation 13: LOCKT - Test For Lock
     15.14.  Operation 14: LOCKU - Unlock File
     15.15.  Operation 15: LOOKUP - Lookup Filename
     15.16.  Operation 16: LOOKUPP - Lookup Parent Directory
     15.17.  Operation 17: NVERIFY - Verify Difference in Attributes
     15.18.  Operation 18: OPEN - Open a Regular File
     15.19.  Operation 19: OPENATTR - Open Named Attribute Directory
     15.20.  Operation 20: OPEN_CONFIRM - Confirm Open
     15.21.  Operation 21: OPEN_DOWNGRADE - Reduce Open File Access
     15.22.  Operation 22: PUTFH - Set Current Filehandle
     15.23.  Operation 23: PUTPUBFH - Set Public Filehandle
     15.24.  Operation 24: PUTROOTFH - Set Root Filehandle
     15.25.  Operation 25: READ - Read from File
     15.26.  Operation 26: READDIR - Read Directory
     15.27.  Operation 27: READLINK - Read Symbolic Link
     15.28.  Operation 28: REMOVE - Remove Filesystem Object
     15.29.  Operation 29: RENAME - Rename Directory Entry
     15.30.  Operation 30: RENEW - Renew a Lease
     15.31.  Operation 31: RESTOREFH - Restore Saved Filehandle
     15.32.  Operation 32: SAVEFH - Save Current Filehandle
     15.33.  Operation 33: SECINFO - Obtain Available Security
     15.34.  Operation 34: SETATTR - Set Attributes
     15.35.  Operation 35: SETCLIENTID - Negotiate Client ID
     15.36.  Operation 36: SETCLIENTID_CONFIRM - Confirm Client ID
     15.37.  Operation 37: VERIFY - Verify Same Attributes
     15.38.  Operation 38: WRITE - Write to File
     15.39.  Operation 39: RELEASE_LOCKOWNER - Release Lockowner State
     15.40.  Operation 10044: ILLEGAL - Illegal operation
   16.  NFSv4 Callback Procedures
     16.1.  Procedure 0: CB_NULL - No Operation
     16.2.  Procedure 1: CB_COMPOUND - Compound Operations
       16.2.6.  Operation 3: CB_GETATTR - Get Attributes
       16.2.7.  Operation 4: CB_RECALL - Recall an Open Delegation
       16.2.8.  Operation 10044: CB_ILLEGAL - Illegal Callback Operation
   17.  Security Considerations
   18.  IANA Considerations
     18.1.  Named Attribute Definitions
       18.1.1.  Initial Registry
       18.1.2.  Updating Registrations
     18.2.  ONC RPC Network Identifiers (netids)
       18.2.1.  Initial Registry
       18.2.2.  Updating Registrations
   19.  References
     19.1.  Normative References
     19.2.  Informative References
   Appendix A.  Acknowledgments
   Appendix B.  RFC Editor Notes
   Authors' Addresses

1.  Introduction

1.1.  Changes since RFC 3530

   This document, together with the companion XDR description document
   [2], obsoletes RFC 3530 [11] as the authoritative document describing
   NFSv4. It does not introduce any over-the-wire protocol changes, in
   the sense that previously valid requests remain valid.
   However, some requests previously defined as invalid, although not
   generally rejected, are now explicitly allowed, in that
   internationalization handling has been generalized and liberalized.
   The main changes from RFC 3530 are:

   o  The XDR definition has been moved to a companion document [2].

   o  Updates for the latest IETF intellectual property statements.

   o  There is a restructured and more complete explanation of
      multi-server namespace features. In particular, this explanation
      explicitly describes handling of inter-server referrals, even
      where neither migration nor replication is involved.

   o  More liberal handling of internationalization for file names and
      user and group names, with the elimination of restrictions imposed
      by stringprep, with the recognition that rules for the forms of
      these names are the province of the receiving entity.

   o  Updating handling of domain names to reflect IDNA.

   o  Restructuring of string types to more appropriately reflect the
      reality of required string processing.

   o  LIPKEY SPKM/3 has been moved from being REQUIRED to OPTIONAL.

   o  Some clarification on a client re-establishing callback
      information to the new server if state has been migrated.

   o  A third edge case was added for Courtesy locks and network
      partitions.

   o  The definition of stateid was strengthened, which had the side
      effect of introducing a semantic change in a COMPOUND structure
      having a current stateid and a saved stateid.

1.2.  Changes since RFC 3010

   This definition of the NFSv4 protocol replaces or obsoletes the
   definition present in [12]. While portions of the two documents have
   remained the same, there have been substantive changes in others.
   The changes made between [12] and this document reflect
   implementation experience and further review of the protocol. While
   some modifications were made for ease of implementation or
   clarification, most updates address errors or situations where the
   [12] definition was untenable.
403  The following list is not exhaustive but presents
404  some of the most notable changes or additions made:

406  o  The state model has added an open_owner4 identifier. This was
407     done to accommodate Posix based clients and the model they use for
408     file locking. For Posix clients, an open_owner4 would correspond
409     to a file descriptor potentially shared amongst a set of processes
410     and the lock_owner4 identifier would correspond to a process that
411     is locking a file.

413  o  Clarifications and error conditions were added for the handling of
414     the owner and group attributes. Since these attributes are string
415     based (as opposed to the numeric uid/gid of previous versions of
416     NFS), translations may not be available and hence the changes
417     made.

419  o  Clarifications for the ACL and mode attributes to address
420     evaluation and partial support.

422  o  For identifiers that are defined as XDR opaque, limits were set on
423     their size.

425  o  Added the mounted_on_fileid attribute to allow Posix clients to
426     correctly construct local mounts.

428  o  Modified the SETCLIENTID/SETCLIENTID_CONFIRM operations to deal
429     correctly with confirmation details along with adding the ability
430     to specify new client callback information. Also added
431     clarification of the callback information itself.

433  o  Added a new operation RELEASE_LOCKOWNER to enable notifying the
434     server that a lock_owner4 will no longer be used by the client.

436  o  RENEW operation changes to identify the client correctly and allow
437     for additional error returns.

439  o  Verify error return possibilities for all operations.

441  o  Remove use of the pathname4 data type from LOOKUP and OPEN in
442     favor of having the client construct a sequence of LOOKUP
443     operations to achieve the same effect.

445  o  Clarification of the internationalization issues and adoption of
446     the new stringprep profile framework.

448  1.3. NFS Version 4 Goals

450  The NFSv4 protocol is a further revision of the NFS protocol defined
451  already by versions 2 [13] and 3 [14]. It retains the essential
452  characteristics of previous versions: design for easy recovery,
453  independent of transport protocols, operating systems and
454  filesystems, simplicity, and good performance. The NFSv4 revision
455  has the following goals:

457  o  Improved access and good performance on the Internet.

459     The protocol is designed to transit firewalls easily, perform well
460     where latency is high and bandwidth is low, and scale to very
461     large numbers of clients per server.

463  o  Strong security with negotiation built into the protocol.

465     The protocol builds on the work of the ONCRPC working group in
466     supporting the RPCSEC_GSS protocol. Additionally, the NFS version
467     4 protocol provides a mechanism to allow clients and servers the
468     ability to negotiate security and require clients and servers to
469     support a minimal set of security schemes.

471  o  Good cross-platform interoperability.

473     The protocol features a filesystem model that provides a useful,
474     common set of features that does not unduly favor one filesystem
475     or operating system over another.

477  o  Designed for protocol extensions.

479     The protocol is designed to accept standard extensions that do not
480     compromise backward compatibility.

482  1.4. Inconsistencies of this Document with the companion document NFS
483       Version 4 Protocol

485  [2], NFS Version 4 Protocol, contains the definitions in XDR
486  description language of the constructs used by the protocol. Inside
487  this document, several of the constructs are reproduced for purposes
488  of explanation. The reader is warned of the possibility of errors in
489  the reproduced constructs outside of [2]. For any part of the
490  document that is inconsistent with [2], [2] is to be considered
491  authoritative.

493  1.5. Overview of NFSv4 Features

495  To provide a reasonable context for the reader, the major features of
496  the NFSv4 protocol will be reviewed in brief. This will be done to
497  provide an appropriate context for both the reader who is familiar
498  with the previous versions of the NFS protocol and the reader who is
499  new to the NFS protocols. For the reader new to the NFS protocols,
500  there is still a fundamental knowledge that is expected. The reader
501  should be familiar with the XDR and RPC protocols as described in [3]
502  and [15]. A basic knowledge of filesystems and distributed
503  filesystems is expected as well.

505  1.5.1. RPC and Security

507  As with previous versions of NFS, the External Data Representation
508  (XDR) and Remote Procedure Call (RPC) mechanisms used for the NFSv4
509  protocol are those defined in [3] and [15]. To meet end to end
510  security requirements, the RPCSEC_GSS framework [4] will be used to
511  extend the basic RPC security. With the use of RPCSEC_GSS, various
512  mechanisms can be provided to offer authentication, integrity, and
513  privacy to the NFS version 4 protocol. Kerberos V5 will be used as
514  described in [16] to provide one security framework. The LIPKEY GSS-
515  API mechanism described in [5] will be used to provide for the use of
516  user password and server public key by the NFSv4 protocol. With the
517  use of RPCSEC_GSS, other mechanisms may also be specified and used
518  for NFS version 4 security.

520  To enable in-band security negotiation, the NFSv4 protocol has added
521  a new operation which provides the client a method of querying the
522  server about its policies regarding which security mechanisms must be
523  used for access to the server's filesystem resources. With this, the
524  client can securely match the security mechanism that meets the
525  policies specified at both the client and server.

527  1.5.2. Procedure and Operation Structure

529  A significant departure from the previous versions of the NFS
530  protocol is the introduction of the COMPOUND procedure. For the
531  NFSv4 protocol, there are two RPC procedures, NULL and COMPOUND. The
532  COMPOUND procedure is defined in terms of operations and these
533  operations correspond more closely to the traditional NFS procedures.

535  With the use of the COMPOUND procedure, the client is able to build
536  simple or complex requests. These COMPOUND requests allow for a
537  reduction in the number of RPCs needed for logical filesystem
538  operations. For example, without previous contact with a server a
539  client will be able to read data from a file in one request by
540  combining LOOKUP, OPEN, and READ operations in a single COMPOUND RPC.
541  With previous versions of the NFS protocol, this type of single
542  request was not possible.

544  The model used for COMPOUND is very simple. There is no logical OR
545  or ANDing of operations. The operations combined within a COMPOUND
546  request are evaluated in order by the server. Once an operation
547  returns a failing result, the evaluation ends and the results of all
548  evaluated operations are returned to the client.

550  The NFSv4 protocol continues to have the client refer to a file or
551  directory at the server by a "filehandle". The COMPOUND procedure
552  has a method of passing a filehandle from one operation to another
553  within the sequence of operations. There is a concept of a "current
554  filehandle" and "saved filehandle". Most operations use the "current
555  filehandle" as the filesystem object to operate upon. The "saved
556  filehandle" is used as temporary filehandle storage within a COMPOUND
557  procedure as well as an additional operand for certain operations.

559  1.5.3. Filesystem Model

561  The general filesystem model used for the NFSv4 protocol is the same
562  as previous versions. The server filesystem is hierarchical with the
563  regular files contained within being treated as opaque byte streams.
564  In a slight departure, file and directory names are encoded with
565  UTF-8 to deal with the basics of internationalization.

567  The NFSv4 protocol does not require a separate protocol to provide
568  for the initial mapping between path name and filehandle. Instead of
569  using the older MOUNT protocol for this mapping, the server provides
570  a ROOT filehandle that represents the logical root or top of the
571  filesystem tree provided by the server. The server provides multiple
572  filesystems by gluing them together with pseudo filesystems. These
573  pseudo filesystems provide for potential gaps in the path names
574  between real filesystems.

576  1.5.3.1. Filehandle Types

578  In previous versions of the NFS protocol, the filehandle provided by
579  the server was guaranteed to be valid or persistent for the lifetime
580  of the filesystem object to which it referred. For some server
581  implementations, this persistence requirement has been difficult to
582  meet. For the NFSv4 protocol, this requirement has been relaxed by
583  introducing another type of filehandle, volatile. With persistent
584  and volatile filehandle types, the server implementation can match
585  the abilities of the filesystem at the server along with the
586  operating environment. The client will have knowledge of the type of
587  filehandle being provided by the server and can be prepared to deal
588  with the semantics of each.

590  1.5.3.2. Attribute Types

592  The NFSv4 protocol has a rich and extensible file object attribute
593  structure, which is divided into REQUIRED, RECOMMENDED, and named
594  attributes (see Section 5).

596  Several (but not all) of the REQUIRED attributes are derived from the
597  attributes of NFSv3 (see definition of the fattr3 data type in [14]).
598  An example of a REQUIRED attribute is the file object's type
599  (Section 5.8.1.2) so that regular files can be distinguished from
600  directories (also known as folders in some operating environments)
601  and other types of objects. REQUIRED attributes are discussed in
602  Section 5.1.

604  Three examples of RECOMMENDED attributes are acl, sacl, and dacl.
605  These attributes define an Access Control List (ACL) on a file object
606  (Section 6). An ACL provides directory and file access control
607  beyond the model used in NFSv3. The ACL definition allows for
608  specification of specific sets of permissions for individual users
609  and groups. In addition, ACL inheritance allows propagation of
610  access permissions and restrictions down a directory tree as file
611  system objects are created. RECOMMENDED attributes are discussed in
612  Section 5.2.

614  A named attribute is an opaque byte stream that is associated with a
615  directory or file and referred to by a string name. Named attributes
616  are meant to be used by client applications as a method to associate
617  application-specific data with a regular file or directory. NFSv4.1
618  modifies named attributes relative to NFSv4.0 by tightening the
619  allowed operations in order to prevent the development of non-
620  interoperable implementations. Named attributes are discussed in
621  Section 5.3.

623  1.5.3.3. Multi-server Namespace

625  NFSv4 contains a number of features to allow implementation of
626  namespaces that cross server boundaries and that allow and facilitate
627  a non-disruptive transfer of support for individual file systems
628  between servers. They are all based upon attributes that allow one
629  file system to specify alternate or new locations for that file
630  system.

632  These attributes may be used together with the concept of absent file
633  systems, which provide specifications for additional locations but no
634  actual file system content. This allows a number of important
635  facilities:

637  o  Location attributes may be used with absent file systems to
638     implement referrals whereby one server may direct the client to a
639     file system provided by another server. This allows extensive
640     multi-server namespaces to be constructed.

642  o  Location attributes may be provided for present file systems to
643     provide the locations of alternate file system instances or
644     replicas to be used in the event that the current file system
645     instance becomes unavailable.

647  o  Location attributes may be provided when a previously present file
648     system becomes absent. This allows non-disruptive migration of
649     file systems to alternate servers.

651  1.5.4. OPEN and CLOSE

653  The NFSv4 protocol introduces OPEN and CLOSE operations. The OPEN
654  operation provides a single point where file lookup, creation, and
655  share semantics can be combined. The CLOSE operation also provides
656  for the release of state accumulated by OPEN.

658  1.5.5. File Locking

660  With the NFSv4 protocol, the support for byte range file locking is
661  part of the NFS protocol. The file locking support is structured so
662  that an RPC callback mechanism is not required. This is a departure
663  from the previous versions of the NFS file locking protocol, Network
664  Lock Manager (NLM). The state associated with file locks is
665  maintained at the server under a lease-based model. The server
666  defines a single lease period for all state held by an NFS client. If
667  the client does not renew its lease within the defined period, all
668  state associated with the client's lease may be released by the
669  server. The client may renew its lease with use of the RENEW
670  operation or implicitly by use of other operations (primarily READ).

672  1.5.6. Client Caching and Delegation

674  The file, attribute, and directory caching for the NFSv4 protocol is
675  similar to previous versions. Attributes and directory information
676  are cached for a duration determined by the client. At the end of a
677  predefined timeout, the client will query the server to see if the
678  related filesystem object has been updated.

680  For file data, the client checks its cache validity when the file is
681  opened. A query is sent to the server to determine if the file has
682  been changed. Based on this information, the client determines if
683  the data cache for the file should be kept or released. Also, when the
684  file is closed, any modified data is written to the server.

686  If an application wants to serialize access to file data, file
687  locking of the file data ranges in question should be used.

689  The major addition to NFSv4 in the area of caching is the ability of
690  the server to delegate certain responsibilities to the client. When
691  the server grants a delegation for a file to a client, the client is
692  guaranteed certain semantics with respect to the sharing of that file
693  with other clients. At OPEN, the server may provide the client
694  either an OPEN_DELEGATE_READ or OPEN_DELEGATE_WRITE delegation for the
695  file. If the client is granted an OPEN_DELEGATE_READ delegation, it
696  is assured that no other client has the ability to write to the file
697  for the duration of the delegation. If the client is granted an
698  OPEN_DELEGATE_WRITE delegation, the client is assured that no other
699  client has read or write access to the file.

701  Delegations can be recalled by the server. If another client
702  requests access to the file in such a way that the access conflicts
703  with the granted delegation, the server is able to notify the initial
704  client and recall the delegation. This requires that a callback path
705  exist between the server and client. If this callback path does not
706  exist, then delegations cannot be granted. The essence of a
707  delegation is that it allows the client to locally service operations
708  such as OPEN, CLOSE, LOCK, LOCKU, READ, or WRITE without immediate
709  interaction with the server.

711  1.6. General Definitions

713  The following definitions are provided for the purpose of providing
714  an appropriate context for the reader.

716  Byte: In this document, a byte is an octet, i.e., a datum exactly 8
717     bits in length.

719  Client: The client is the entity that accesses the NFS server's
720     resources. The client may be an application that contains the
721     logic to access the NFS server directly. The client may also be
722     the traditional operating system client that provides remote
723     filesystem services for a set of applications.

725     With reference to byte-range locking, the client is also the
726     entity that maintains a set of locks on behalf of one or more
727     applications. This client is responsible for crash or failure
728     recovery for those locks it manages.

730     Note that multiple clients may share the same transport and
731     connection and multiple clients may exist on the same network
732     node.

734  Client ID: A 64-bit quantity used as a unique, short-hand reference
735     to a client supplied Verifier and ID. The server is responsible
736     for supplying the Client ID.

738  File System: The file system is the collection of objects on a
739     server that share the same fsid attribute (see Section 5.8.1.9).

741  Lease: An interval of time defined by the server for which the
742     client is irrevocably granted a lock. At the end of a lease
743     period the lock may be revoked if the lease has not been extended.
744     The lock must be revoked if a conflicting lock has been granted
745     after the lease interval.

747     All leases granted by a server have the same fixed interval. Note
748     that the fixed interval was chosen to alleviate the expense a
749     server would have in maintaining state about variable length
750     leases across server failures.
752  Lock: The term "lock" is used to refer to both record (byte-range)
753     locks as well as share reservations unless specifically stated
754     otherwise.

756  Server: The "Server" is the entity responsible for coordinating
757     client access to a set of filesystems.

759  Stable Storage: NFSv4 servers must be able to recover without data
760     loss from multiple power failures (including cascading power
761     failures, that is, several power failures in quick succession),
762     operating system failures, and hardware failure of components
763     other than the storage medium itself (for example, disk,
764     nonvolatile RAM).

766     Some examples of stable storage that are allowable for an NFS
767     server include:

769     (1) Media commit of data, that is, the modified data has been
770         successfully written to the disk media, for example, the disk
771         platter.

773     (2) An immediate reply disk drive with battery-backed on-drive
774         intermediate storage or uninterruptible power system (UPS).

776     (3) Server commit of data with battery-backed intermediate
777         storage and recovery software.

779     (4) Cache commit with uninterruptible power system (UPS) and
780         recovery software.

782  Stateid: A stateid is a 128-bit quantity returned by a server that
783     uniquely defines the open and locking states provided by the
784     server for a specific open-owner or lock-owner/open-owner pair for
785     a specific file and type of lock.

787  Verifier: A 64-bit quantity generated by the client that the server
788     can use to determine if the client has restarted and lost all
789     previous lock state.

791  2. Protocol Data Types

793  The syntax and semantics to describe the data types of the NFS
794  version 4 protocol are defined in the XDR [15] and RPC [3] documents.
795  The next sections build upon the XDR data types to define types and
796  structures specific to this protocol.

798  2.1. Basic Data Types

800  These are the base NFSv4 data types.
802  +----------------+--------------------------------------------------+
803  | Data Type      | Definition                                       |
804  +----------------+--------------------------------------------------+
805  | int32_t        | typedef int int32_t;                             |
806  | uint32_t       | typedef unsigned int uint32_t;                   |
807  | int64_t        | typedef hyper int64_t;                           |
808  | uint64_t       | typedef unsigned hyper uint64_t;                 |
809  | attrlist4      | typedef opaque attrlist4<>;                      |
810  |                | Used for file/directory attributes.              |
811  | bitmap4        | typedef uint32_t bitmap4<>;                      |
812  |                | Used in attribute array encoding.                |
813  | changeid4      | typedef uint64_t changeid4;                      |
814  |                | Used in the definition of change_info4.          |
815  | clientid4      | typedef uint64_t clientid4;                      |
816  |                | Shorthand reference to client identification.    |
817  | count4         | typedef uint32_t count4;                         |
818  |                | Various count parameters (READ, WRITE, COMMIT).  |
819  | length4        | typedef uint64_t length4;                        |
820  |                | Describes LOCK lengths.                          |
821  | mode4          | typedef uint32_t mode4;                          |
822  |                | Mode attribute data type.                        |
823  | nfs_cookie4    | typedef uint64_t nfs_cookie4;                    |
824  |                | Opaque cookie value for READDIR.                 |
825  | nfs_fh4        | typedef opaque nfs_fh4<NFS4_FHSIZE>;             |
826  |                | Filehandle definition.                           |
827  | nfs_ftype4     | enum nfs_ftype4;                                 |
828  |                | Various defined file types.                      |
829  | nfsstat4       | enum nfsstat4;                                   |
830  |                | Return value for operations.                     |
831  | offset4        | typedef uint64_t offset4;                        |
832  |                | Various offset designations (READ, WRITE, LOCK,  |
833  |                | COMMIT).                                         |
834  | qop4           | typedef uint32_t qop4;                           |
835  |                | Quality of protection designation in SECINFO.    |
836  | sec_oid4       | typedef opaque sec_oid4<>;                       |
837  |                | Security Object Identifier. The sec_oid4 data    |
838  |                | type is not really opaque. Instead it contains   |
839  |                | an ASN.1 OBJECT IDENTIFIER as used by GSS-API in |
840  |                | the mech_type argument to GSS_Init_sec_context.  |
841  |                | See [6] for details.                             |
842  | seqid4         | typedef uint32_t seqid4;                         |
843  |                | Sequence identifier used for file locking.       |
844  | utf8string     | typedef opaque utf8string<>;                     |
845  |                | UTF-8 encoding for strings.                      |
846  | utf8_should    | typedef utf8string utf8_should;                  |
847  |                | String expected to be UTF8 but no validation     |
848  | utf8val_should | typedef utf8string utf8val_should;               |
849  |                | String SHOULD be sent UTF8 and SHOULD be         |
850  |                | validated                                        |
851  | utf8val_must   | typedef utf8string utf8val_must;                 |
852  |                | String MUST be sent UTF8 and MUST be validated   |
853  | ascii_must     | typedef utf8string ascii_must;                   |
854  |                | String MUST be sent as ASCII and thus is         |
855  |                | automatically UTF8                               |
856  | comptag4       | typedef utf8_should comptag4;                    |
857  |                | Tag should be UTF8 but is not checked            |
858  | component4     | typedef utf8val_should component4;               |
859  |                | Represents path name components.                 |
860  | linktext4      | typedef utf8val_should linktext4;                |
861  |                | Symbolic link contents.                          |
862  | pathname4      | typedef component4 pathname4<>;                  |
863  |                | Represents path name for fs_locations.           |
864  | nfs_lockid4    | typedef uint64_t nfs_lockid4;                    |
865  | verifier4      | typedef opaque verifier4[NFS4_VERIFIER_SIZE];    |
866  |                | Verifier used for various operations (COMMIT,    |
867  |                | CREATE, EXCHANGE_ID, OPEN, READDIR, WRITE)       |
868  |                | NFS4_VERIFIER_SIZE is defined as 8.              |
869  +----------------+--------------------------------------------------+

871                        End of Base Data Types

873                                Table 1

875  2.2. Structured Data Types

877  2.2.1. nfstime4

879  struct nfstime4 {
880          int64_t         seconds;
881          uint32_t        nseconds;
882  };

884  The nfstime4 structure gives the number of seconds and nanoseconds
885  since midnight or 0 hour January 1, 1970 Coordinated Universal Time
886  (UTC). Values greater than zero for the seconds field denote dates
887  after the 0 hour January 1, 1970. Values less than zero for the
888  seconds field denote dates before the 0 hour January 1, 1970. In
889  both cases, the nseconds field is to be added to the seconds field
890  for the final time representation.
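The rule that nseconds is always added to seconds, even for times before the epoch, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the helper names to_nfstime4 and from_nfstime4 are hypothetical and not part of the protocol, the input is taken to be an exact rational number of seconds relative to the epoch, and anything finer than a nanosecond is truncated.

```python
import math
from fractions import Fraction

NSEC_PER_SEC = 1_000_000_000


def to_nfstime4(offset):
    """Split an offset from the epoch, in seconds, into (seconds, nseconds).

    Hypothetical helper: seconds is rounded toward negative infinity so
    that nseconds is always a non-negative amount added to it.
    """
    offset = Fraction(offset)
    seconds = math.floor(offset)
    # The remainder lies in [0, 1); sub-nanosecond precision is dropped.
    nseconds = int((offset - seconds) * NSEC_PER_SEC)
    return seconds, nseconds


def from_nfstime4(seconds, nseconds):
    """Recombine the two fields, rejecting an out-of-range nseconds."""
    if not 0 <= nseconds < NSEC_PER_SEC:
        raise ValueError("nseconds must be between 0 and 999999999")
    return Fraction(seconds) + Fraction(nseconds, NSEC_PER_SEC)


# One and a quarter seconds before the epoch.
fields = to_nfstime4(Fraction(-5, 4))
print(fields, from_nfstime4(*fields))
```

Rounding seconds down rather than toward zero is what keeps nseconds within 0 to 999,999,999 for negative times.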
For example, if the time to be
891  represented is one-half second before 0 hour January 1, 1970, the
892  seconds field would have a value of negative one (-1) and the
893  nseconds field would have a value of one-half second (500000000).
894  Values greater than 999,999,999 for nseconds are considered invalid.

896  This data type is used to pass time and date information. A server
897  converts to and from its local representation of time when processing
898  time values, preserving as much accuracy as possible. If the
899  precision of timestamps stored for a filesystem object is less than
900  defined, loss of precision can occur. An adjunct time maintenance
901  protocol is recommended to reduce client and server time skew.

903  2.2.2. time_how4

905  enum time_how4 {
906          SET_TO_SERVER_TIME4 = 0,
907          SET_TO_CLIENT_TIME4 = 1
908  };

910  2.2.3. settime4

912  union settime4 switch (time_how4 set_it) {
913   case SET_TO_CLIENT_TIME4:
914           nfstime4       time;
915   default:
916           void;
917  };

919  The above definitions are used as the attribute definitions to set
920  time values. If set_it is SET_TO_SERVER_TIME4, then the server uses
921  its local representation of time for the time value.

923  2.2.4. specdata4

925  struct specdata4 {
926          uint32_t specdata1; /* major device number */
927          uint32_t specdata2; /* minor device number */
928  };

930  This data type represents additional information for the device file
931  types NF4CHR and NF4BLK.

933  2.2.5. fsid4

935  struct fsid4 {
936          uint64_t        major;
937          uint64_t        minor;
938  };

940  This type is the filesystem identifier that is used as a mandatory
941  attribute.

943  2.2.6. fs_location4

945  struct fs_location4 {
946          utf8must        server<>;
947          pathname4       rootpath;
948  };

950  2.2.7. fs_locations4

952  struct fs_locations4 {
953          pathname4       fs_root;
954          fs_location4    locations<>;
955  };

957  The fs_location4 and fs_locations4 data types are used for the
958  fs_locations recommended attribute which is used for migration and
959  replication support.

961  2.2.8. fattr4

963  struct fattr4 {
964          bitmap4         attrmask;
965          attrlist4       attr_vals;
966  };

968  The fattr4 structure is used to represent file and directory
969  attributes.

971  The bitmap is a counted array of 32 bit integers used to contain bit
972  values. The position of the integer in the array that contains bit n
973  can be computed from the expression (n / 32) and its bit within that
974  integer is (n mod 32).

976                  0            1
977    +-----------+-----------+-----------+--
978    |  count    | 31  ..  0 | 63  .. 32 |
979    +-----------+-----------+-----------+--

981  2.2.9. change_info4

983  struct change_info4 {
984          bool            atomic;
985          changeid4       before;
986          changeid4       after;
987  };

989  This structure is used with the CREATE, LINK, REMOVE, and RENAME
990  operations to let the client know the value of the change attribute
991  for the directory in which the target filesystem object resides.

993  2.2.10. clientaddr4

995  struct clientaddr4 {
996          /* see struct rpcb in RFC 1833 */
997          string r_netid<>;    /* network id */
998          string r_addr<>;     /* universal address */
999  };

1001  The clientaddr4 structure is used as part of the SETCLIENTID
1002  operation to either specify the address of the client that is using a
1003  client ID or as part of the callback registration. The r_netid and
1004  r_addr fields are specified in [17], but they are underspecified in
1006  [17] as far as what they should look like for specific protocols.

1008  For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the
1009  US-ASCII string:

1011     h1.h2.h3.h4.p1.p2

1013  The prefix, "h1.h2.h3.h4", is the standard textual form for
1014  representing an IPv4 address, which is always four octets long.
1015  Assuming big-endian ordering, h1, h2, h3, and h4 are, respectively,
1016  the first through fourth octets each converted to ASCII-decimal.
1017  Assuming big-endian ordering, p1 and p2 are, respectively, the first
1018  and second octets of the port, each converted to ASCII-decimal.
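The IPv4 universal address format described above can be sketched in a few lines of Python. This is an illustrative sketch, not a normative encoder: the function names ipv4_universal_address and parse_ipv4_universal_address are hypothetical, and the sample address and port in the usage line are arbitrary values chosen for the illustration.

```python
import ipaddress


def ipv4_universal_address(host, port):
    """Build the r_addr string "h1.h2.h3.h4.p1.p2" for TCP or UDP over IPv4."""
    octets = ipaddress.IPv4Address(host).packed  # four octets, network order
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    p1, p2 = port >> 8, port & 0xFF  # high-order octet first
    return ".".join(str(value) for value in (*octets, p1, p2))


def parse_ipv4_universal_address(r_addr):
    """Split an IPv4 universal address back into (host, port)."""
    fields = [int(text) for text in r_addr.split(".")]
    if len(fields) != 6 or any(not 0 <= value <= 255 for value in fields):
        raise ValueError("expected six decimal octets separated by dots")
    host = str(ipaddress.IPv4Address(bytes(fields[:4])))
    return host, (fields[4] << 8) | fields[5]


r_addr = ipv4_universal_address("192.0.2.1", 2049)
print(r_addr, parse_ipv4_universal_address(r_addr))
```

The port is split with a shift and a mask rather than formatted as a single decimal number, which is what distinguishes a universal address from the usual "host:port" notation.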
For example, if a
1019  host, in big-endian order, has an address of 0x0A010307 and there is
1020  a service listening on, in big endian order, port 0x020F (decimal
1021  527), then the complete universal address is "10.1.3.7.2.15".

1023  For TCP over IPv4 the value of r_netid is the string "tcp". For UDP
1024  over IPv4 the value of r_netid is the string "udp".

1026  For TCP over IPv6 and for UDP over IPv6, the format of r_addr is the
1027  US-ASCII string:

1029     x1:x2:x3:x4:x5:x6:x7:x8.p1.p2

1031  The suffix "p1.p2" is the service port, and is computed the same way
1032  as with universal addresses for TCP and UDP over IPv4. The prefix,
1033  "x1:x2:x3:x4:x5:x6:x7:x8", is the standard textual form for
1034  representing an IPv6 address as defined in Section 2.2 of [18].
1035  Additionally, the two alternative forms specified in Section 2.2 of
1036  [18] are also acceptable.

1038  For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP
1039  over IPv6 the value of r_netid is the string "udp6".

1041  2.2.11. cb_client4

1043  struct cb_client4 {
1044          unsigned int    cb_program;
1045          clientaddr4     cb_location;
1046  };

1048  This structure is used by the client to inform the server of its
1049  callback address; it includes the program number and client address.

1051  2.2.12. nfs_client_id4

1053  struct nfs_client_id4 {
1054          verifier4       verifier;
1055          opaque          id<NFS4_OPAQUE_LIMIT>;
1056  };

1058  This structure is part of the arguments to the SETCLIENTID operation.
1059  NFS4_OPAQUE_LIMIT is defined as 1024.

1061  2.2.13. open_owner4

1063  struct open_owner4 {
1064          clientid4       clientid;
1065          opaque          owner<NFS4_OPAQUE_LIMIT>;
1066  };

1068  This structure is used to identify the owner of open state.
1069  NFS4_OPAQUE_LIMIT is defined as 1024.

1071  2.2.14. lock_owner4

1073  struct lock_owner4 {
1074          clientid4       clientid;
1075          opaque          owner<NFS4_OPAQUE_LIMIT>;
1076  };

1078  This structure is used to identify the owner of file locking state.
1079  NFS4_OPAQUE_LIMIT is defined as 1024.

1081  2.2.15. open_to_lock_owner4

1083  struct open_to_lock_owner4 {
1084          seqid4          open_seqid;
1085          stateid4        open_stateid;
1086          seqid4          lock_seqid;
1087          lock_owner4     lock_owner;
1088  };

1090  This structure is used for the first LOCK operation done for an
1091  open_owner4. It provides both the open_stateid and lock_owner such
1092  that the transition is made from a valid open_stateid sequence to
1093  that of the new lock_stateid sequence. Using this mechanism avoids
1094  the confirmation of the lock_owner/lock_seqid pair since it is tied
1095  to established state in the form of the open_stateid/open_seqid.

1097  2.2.16. stateid4

1099  struct stateid4 {
1100          uint32_t        seqid;
1101          opaque          other[12];
1102  };

1104  This structure is used for the various state sharing mechanisms
1105  between the client and server. For the client, this data structure
1106  is read-only. The starting value of the seqid field is undefined.
1107  The server is required to increment the seqid field monotonically at
1108  each transition of the stateid. This is important since the client
1109  will inspect the seqid in OPEN stateids to determine the order of
1110  OPEN processing done by the server.

1112  3. RPC and Security Flavor

1114  The NFSv4 protocol is a Remote Procedure Call (RPC) application that
1115  uses RPC version 2 and the corresponding eXternal Data Representation
1116  (XDR) as defined in [3] and [15]. The RPCSEC_GSS security flavor as
1117  defined in [4] MUST be used as the mechanism to deliver stronger
1118  security for the NFSv4 protocol.

1120  3.1. Ports and Transports

1122  Historically, NFSv2 and NFSv3 servers have resided on port 2049. The
1123  registered port 2049 [19] for the NFS protocol SHOULD be the default
1124  configuration. Using the registered port for NFS services means the
1125  NFS client will not need to use the RPC binding protocols as
1126  described in [17]; this will allow NFS to transit firewalls.
1128  Where an NFSv4 implementation supports operation over the IP network
1129  protocol, the supported transports between NFS and IP MUST be among
1130  the IETF-approved congestion control transport protocols, which
1131  include TCP and SCTP. To enhance the possibilities for
1132  interoperability, an NFSv4 implementation MUST support operation over
1133  the TCP transport protocol, at least until such time as a standards
1134  track RFC revises this requirement to use a different IETF-approved
1135  congestion control transport protocol.

1137  If TCP is used as the transport, the client and server SHOULD use
1138  persistent connections. This will prevent the weakening of TCP's
1139  congestion control via short lived connections and will improve
1140  performance for the WAN environment by eliminating the need for SYN
1141  handshakes.

1143  As noted in Section 17, the authentication model for NFSv4 has moved
1144  from machine-based to principal-based. However, this modification of
1145  the authentication model does not imply a technical requirement to
1146  move the TCP connection management model from whole machine-based to
1147  one based on a per user model. In particular, NFS over TCP client
1148  implementations have traditionally multiplexed traffic for multiple
1149  users over a common TCP connection between an NFS client and server.
1150  This has been true regardless of whether the NFS client is using
1151  AUTH_SYS, AUTH_DH, RPCSEC_GSS or any other flavor. Similarly, NFS
1152  over TCP server implementations have assumed such a model and thus
1153  scale the implementation of TCP connection management in proportion
1154  to the number of expected client machines. It is intended that NFSv4
1155  will not modify this connection management model. NFSv4 clients that
1156  violate this assumption can expect scaling issues on the server and
1157  hence reduced service.
1159 Note that for various timers, the client and server should avoid
1160 inadvertent synchronization of those timers. For further discussion
1161 of the general issue refer to [20].

1163 3.1.1. Client Retransmission Behavior

1165 When processing a request received over a reliable transport such as
1166 TCP, the NFSv4 server MUST NOT silently drop the request, except if
1167 the transport connection has been broken. Given such a contract
1168 between NFSv4 clients and servers, clients MUST NOT retry a request
1169 unless one or both of the following are true:

1171    o The transport connection has been broken

1173    o The procedure being retried is the NULL procedure

1175 Since reliable transports, such as TCP, do not always synchronously
1176 inform a peer when the other peer has broken the connection (for
1177 example, when an NFS server reboots), the NFSv4 client may want to
1178 actively "probe" the connection to see if it has been broken. Use of
1179 the NULL procedure is one recommended way to do so. So, when a
1180 client experiences a remote procedure call timeout (of some arbitrary
1181 implementation-specific amount), rather than retrying the remote
1182 procedure call, it could instead issue a NULL procedure call to the
1183 server. If the server has died, the transport connection break will
1184 eventually be indicated to the NFSv4 client. The client can then
1185 reconnect, and then retry the original request. If the NULL
1186 procedure call gets a response, the connection has not broken. The
1187 client can decide to wait longer for the original request's response,
1188 or it can break the transport connection and reconnect before re-
1189 sending the original request.

1191 For callbacks from the server to the client, the same rules apply,
1192 but the server doing the callback becomes the client, and the client
1193 receiving the callback becomes the server.

1195 3.2. Security Flavors

1197 Traditional RPC implementations have included AUTH_NONE, AUTH_SYS,
1198 AUTH_DH, and AUTH_KRB4 as security flavors. With [4] an additional
1199 security flavor of RPCSEC_GSS has been introduced which uses the
1200 functionality of GSS-API [6]. This allows for the use of various
1201 security mechanisms by the RPC layer without the additional
1202 implementation overhead of adding RPC security flavors. For NFSv4,
1203 the RPCSEC_GSS security flavor MUST be used to enable the mandatory
1204 security mechanism. Other flavors, such as AUTH_NONE, AUTH_SYS, and
1205 AUTH_DH, MAY be implemented as well.

1207 3.2.1. Security mechanisms for NFSv4

1209 The use of RPCSEC_GSS requires selection of: mechanism, quality of
1210 protection, and service (authentication, integrity, privacy). The
1211 remainder of this document will refer to these three parameters of
1212 the RPCSEC_GSS security as the security triple.

1214 3.2.1.1. Kerberos V5 as a security triple

1216 The Kerberos V5 GSS-API mechanism as described in [16] MUST be
1217 implemented and provide the following security triples.

1219 column descriptions:

1221    1 == number of pseudo flavor
1222    2 == name of pseudo flavor
1223    3 == mechanism's OID
1224    4 == mechanism's algorithm(s)
1225    5 == RPCSEC_GSS service

1227 1      2     3                    4           5
1228 --------------------------------------------------------------------
1229 390003 krb5  1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_none
1230 390004 krb5i 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_integrity
1231 390005 krb5p 1.2.840.113554.1.2.2 DES MAC MD5 rpc_gss_svc_privacy
1232                                   for integrity,
1233                                   and 56 bit DES
1234                                   for privacy.

1236 Note that the pseudo flavor is presented here as a mapping aid to the
1237 implementor. Because this NFS protocol includes a method to
1238 negotiate security and it understands the GSS-API mechanism, the
1239 pseudo flavor is not needed. The pseudo flavor is needed for NFSv3
1240 since the security negotiation is done via the MOUNT protocol.

1242 For a discussion of NFS' use of RPCSEC_GSS and Kerberos V5, please
1243 see [21].

1245 Users and implementors are warned that 56 bit DES is no longer
1246 considered state of the art in terms of resistance to brute force
1247 attacks. Once a revision to [16] is available that adds support for
1248 AES, implementors are urged to incorporate AES into their NFSv4 over
1249 Kerberos V5 protocol stacks, and users are similarly urged to migrate
1250 to the use of AES.

1252 3.2.1.2. LIPKEY as a security triple

1254 The LIPKEY GSS-API mechanism as described in [5] MAY be implemented
1255 and provide the following security triples. The definition of the
1256 columns matches those in Section 3.2.1.1.

1258 1      2        3             4          5
1259 --------------------------------------------------------------------
1260 390006 lipkey   1.3.6.1.5.5.9 negotiated rpc_gss_svc_none
1261 390007 lipkey-i 1.3.6.1.5.5.9 negotiated rpc_gss_svc_integrity
1262 390008 lipkey-p 1.3.6.1.5.5.9 negotiated rpc_gss_svc_privacy

1264 The mechanism algorithm is listed as "negotiated". This is because
1265 LIPKEY is layered on SPKM-3 and in SPKM-3 [5] the confidentiality and
1266 integrity algorithms are negotiated. Since SPKM-3 specifies HMAC-MD5
1267 for integrity as MANDATORY, 128 bit cast5CBC for confidentiality for
1268 privacy as MANDATORY, and further specifies that HMAC-MD5 and
1269 cast5CBC MUST be listed first before weaker algorithms, specifying
1270 "negotiated" in column 4 does not impair interoperability. In the
1271 event an SPKM-3 peer does not support the mandatory algorithms, the
1272 other peer is free to accept or reject the GSS-API context creation.

1274 Because SPKM-3 negotiates the algorithms, subsequent calls to
1275 LIPKEY's GSS_Wrap() and GSS_GetMIC() by RPCSEC_GSS will use a quality
1276 of protection value of 0 (zero). See section 5.2 of [22] for an
1277 explanation.
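As an illustration only, the Kerberos V5 and LIPKEY triples tabulated
in Sections 3.2.1.1 and 3.2.1.2 could be held in a small lookup table
keyed by pseudo flavor number. The Python sketch below is not part of
the protocol; the names SecurityTriple, TRIPLES, and lookup_triple are
hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecurityTriple:
    """One table row; field names are illustrative, not protocol-defined."""
    pseudo_flavor: int   # column 1
    name: str            # column 2
    mechanism_oid: str   # column 3
    algorithms: str      # column 4
    service: str         # column 5


KRB5_OID = "1.2.840.113554.1.2.2"
LIPKEY_OID = "1.3.6.1.5.5.9"

# Rows copied from the tables in Sections 3.2.1.1 and 3.2.1.2.
_ROWS = (
    SecurityTriple(390003, "krb5", KRB5_OID, "DES MAC MD5",
                   "rpc_gss_svc_none"),
    SecurityTriple(390004, "krb5i", KRB5_OID, "DES MAC MD5",
                   "rpc_gss_svc_integrity"),
    SecurityTriple(390005, "krb5p", KRB5_OID,
                   "DES MAC MD5 for integrity, and 56 bit DES for privacy",
                   "rpc_gss_svc_privacy"),
    SecurityTriple(390006, "lipkey", LIPKEY_OID, "negotiated",
                   "rpc_gss_svc_none"),
    SecurityTriple(390007, "lipkey-i", LIPKEY_OID, "negotiated",
                   "rpc_gss_svc_integrity"),
    SecurityTriple(390008, "lipkey-p", LIPKEY_OID, "negotiated",
                   "rpc_gss_svc_privacy"),
)

TRIPLES = {row.pseudo_flavor: row for row in _ROWS}


def lookup_triple(pseudo_flavor: int) -> Optional[SecurityTriple]:
    """Return the triple for a pseudo flavor number, or None if unknown."""
    return TRIPLES.get(pseudo_flavor)
```

Since NFSv4 negotiates by mechanism OID and service rather than by
pseudo flavor, such a table would serve only as a mapping aid, as the
note in Section 3.2.1.1 says.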
1279 LIPKEY uses SPKM-3 to create a secure channel in which to pass a user
1280 name and password from the client to the server. Once the user name
1281 and password have been accepted by the server, calls to the LIPKEY
1282 context are redirected to the SPKM-3 context. See [5] for more
1283 details.

1285 3.2.1.3. SPKM-3 as a security triple

1287 The SPKM-3 GSS-API mechanism as described in [5] MAY be implemented
1288 and provide the following security triples. The definition of the
1289 columns matches those in Section 3.2.1.1.

1291 1      2      3               4          5
1292 --------------------------------------------------------------------
1293 390009 spkm3  1.3.6.1.5.5.1.3 negotiated rpc_gss_svc_none
1294 390010 spkm3i 1.3.6.1.5.5.1.3 negotiated rpc_gss_svc_integrity
1295 390011 spkm3p 1.3.6.1.5.5.1.3 negotiated rpc_gss_svc_privacy

1297 For a discussion as to why the mechanism algorithm is listed as
1298 "negotiated", see Section 3.2.1.2.

1300 Because SPKM-3 negotiates the algorithms, subsequent calls to
1301 SPKM-3's GSS_Wrap() and GSS_GetMIC() by RPCSEC_GSS will use a quality
1302 of protection value of 0 (zero). See section 5.2 of [22] for an
1303 explanation.

1305 Even though LIPKEY is layered over SPKM-3, SPKM-3 is specified as a
1306 mandatory set of triples to handle the situations where the initiator
1307 (the client) is anonymous or where the initiator has its own
1308 certificate. If the initiator is anonymous, there will not be a user
1309 name and password to send to the target (the server). If the
1310 initiator has its own certificate, then using passwords is
1311 superfluous.

1313 3.3. Security Negotiation

1315 With the NFSv4 server potentially offering multiple security
1316 mechanisms, the client needs a method to determine or negotiate which
1317 mechanism is to be used for its communication with the server. The
1318 NFS server may have multiple points within its filesystem name space
1319 that are available for use by NFS clients. In turn, the NFS server
1320 may be configured such that each of these entry points may have
1321 different or multiple security mechanisms in use.

1323 The security negotiation between client and server SHOULD be done
1324 with a secure channel to eliminate the possibility of a third party
1325 intercepting the negotiation sequence and forcing the client and
1326 server to choose a lower level of security than required or desired.
1327 See Section 17 for further discussion.

1329 3.3.1. SECINFO

1331 The new SECINFO operation will allow the client to determine, on a
1332 per filehandle basis, what security triple is to be used for server
1333 access. In general, the client will not have to use the SECINFO
1334 operation except during initial communication with the server or when
1335 the client crosses policy boundaries at the server. It is possible
1336 that the server's policies change during the client's interaction,
1337 therefore forcing the client to negotiate a new security triple.

1339 3.3.2. Security Error

1341 Based on the assumption that each NFSv4 client and server MUST
1342 support a minimum set of security (i.e., LIPKEY, SPKM-3, and
1343 Kerberos-V5 all under RPCSEC_GSS), the NFS client will start its
1344 communication with the server with one of the minimal security
1345 triples. During communication with the server, the client may
1346 receive an NFS error of NFS4ERR_WRONGSEC. This error allows the
1347 server to notify the client that the security triple currently being
1348 used is not appropriate for access to the server's filesystem
1349 resources. The client is then responsible for determining what
1350 security triples are available at the server and choosing one that is
1351 appropriate for the client. See Section 15.33 for further discussion
1352 of how the client will respond to the NFS4ERR_WRONGSEC error and use
1353 SECINFO.

1355 3.3.3. Callback RPC Authentication

1357 Except as noted elsewhere in this section, the callback RPC
1358 (described later) MUST mutually authenticate the NFS server to the
1359 principal that acquired the client ID (also described later), using
1360 the security flavor the original SETCLIENTID operation used.

1362 For AUTH_NONE, there are no principals, so this is a non-issue.

1364 AUTH_SYS has no notions of mutual authentication or a server
1365 principal, so the callback from the server simply uses the AUTH_SYS
1366 credential that the user used when he set up the delegation.

1368 For AUTH_DH, one commonly used convention is that the server uses the
1369 credential corresponding to this AUTH_DH principal:

1371    unix.host@domain

1373 where host and domain are variables corresponding to the name of the
1374 server host and the directory services domain in which it lives, such
1375 as a Network Information System domain or a DNS domain.

1377 Because LIPKEY is layered over SPKM-3, it is permissible for the
1378 server to use SPKM-3 and not LIPKEY for the callback even if the
1379 client used LIPKEY for SETCLIENTID.

1381 Regardless of what security mechanism under RPCSEC_GSS is being used,
1382 the NFS server MUST identify itself in GSS-API via a
1383 GSS_C_NT_HOSTBASED_SERVICE name type. GSS_C_NT_HOSTBASED_SERVICE
1384 names are of the form:

1386    service@hostname

1388 For NFS, the "service" element is

1390    nfs

1392 Implementations of security mechanisms will convert nfs@hostname to
1393 various different forms. For Kerberos V5 and LIPKEY, the following
1394 form is RECOMMENDED:

1396    nfs/hostname

1398 For Kerberos V5, nfs/hostname would be a server principal in the
1399 Kerberos Key Distribution Center database. This is the same
1400 principal the client acquired a GSS-API context for when it issued
1401 the SETCLIENTID operation; therefore, the realm name for the server
1402 principal must be the same for the callback as it was for the
1403 SETCLIENTID.
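The conversion from the GSS_C_NT_HOSTBASED_SERVICE form to the
RECOMMENDED nfs/hostname form can be sketched as a simple string
rewrite. This is a minimal Python illustration under stated
assumptions: the function name hostbased_to_principal and the host
name used in the usage line are hypothetical, and real GSS-API
mechanism implementations perform this mapping internally (for
Kerberos V5, also attaching a realm, which is omitted here).

```python
def hostbased_to_principal(name: str) -> str:
    """Rewrite 'service@hostname' as 'service/hostname'.

    Illustrative only: realm handling and host name canonicalization
    are deliberately left out.
    """
    service, sep, hostname = name.partition("@")
    if not sep or not service or not hostname:
        raise ValueError("expected a name of the form service@hostname")
    return f"{service}/{hostname}"


# Usage (the host name is an assumed example value):
principal = hostbased_to_principal("nfs@server.example.com")
```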
1405 For LIPKEY, nfs/hostname would be the username passed to the target
1406 (the NFS version 4 client that receives the callback).

1408 It should be noted that LIPKEY may not work for callbacks, since the
1409 LIPKEY client uses a user id/password. If the NFS client receiving
1410 the callback can authenticate the NFS server's user name/password
1411 pair, and if the user that the NFS server is authenticating to has a
1412 public key certificate, then it works.

1414 In situations where the NFS client uses LIPKEY and uses a per-host
1415 principal for the SETCLIENTID operation, instead of using LIPKEY for
1416 SETCLIENTID, it is RECOMMENDED that SPKM-3 with mutual authentication
1417 be used. This effectively means that the client will use a
1418 certificate to authenticate and identify the initiator to the target
1419 on the NFS server. Using SPKM-3 and not LIPKEY has the following
1420 advantages:

1422    o When the server does a callback, it must authenticate to the
1423      principal used in the SETCLIENTID. Even if LIPKEY is used,
1424      because LIPKEY is layered over SPKM-3, the NFS client will need to
1425      have a certificate that corresponds to the principal used in the
1426      SETCLIENTID operation. From an administrative perspective, having
1427      a user name, password, and certificate for both the client and
1428      server is redundant.

1430    o LIPKEY was intended to minimize additional infrastructure
1431      requirements beyond a certificate for the target, and the
1432      expectation is that existing password infrastructure can be
1433      leveraged for the initiator. In some environments, a per-host
1434      password does not exist yet. If certificates are used for any
1435      per-host principals, then additional password infrastructure is
1436      not needed.

1438    o In cases when a host is both an NFS client and server, it can
1439      share the same per-host certificate.

1441 4. Filehandles

1443 The filehandle in the NFS protocol is a per-server unique identifier
1444 for a filesystem object. The contents of the filehandle are opaque
1445 to the client. Therefore, the server is responsible for translating
1446 the filehandle to an internal representation of the filesystem
1447 object.

1449 4.1. Obtaining the First Filehandle

1451 The operations of the NFS protocol are defined in terms of one or
1452 more filehandles. Therefore, the client needs a filehandle to
1453 initiate communication with the server. With the NFSv2 protocol [13]
1454 and the NFSv3 protocol [14], there exists an ancillary protocol to
1455 obtain this first filehandle. The MOUNT protocol, RPC program number
1456 100005, provides the mechanism of translating a string-based
1457 filesystem path name to a filehandle which can then be used by the
1458 NFS protocols.

1460 The MOUNT protocol has deficiencies in the area of security and use
1461 via firewalls. This is one reason that the use of the public
1462 filehandle was introduced in [23] and [24]. With the use of the
1463 public filehandle in combination with the LOOKUP operation in the
1464 NFSv2 and NFSv3 protocols, it has been demonstrated that the MOUNT
1465 protocol is unnecessary for viable interaction between NFS client and
1466 server.

1468 Therefore, the NFSv4 protocol will not use an ancillary protocol for
1469 translation from string-based path names to a filehandle. Two
1470 special filehandles will be used as starting points for the NFS
1471 client.

1473 4.1.1. Root Filehandle

1475 The first of the special filehandles is the ROOT filehandle. The
1476 ROOT filehandle is the "conceptual" root of the filesystem name space
1477 at the NFS server. The client uses or starts with the ROOT
1478 filehandle by employing the PUTROOTFH operation. The PUTROOTFH
1479 operation instructs the server to set the "current" filehandle to the
1480 ROOT of the server's file tree. Once this PUTROOTFH operation is
1481 used, the client can then traverse the entirety of the server's file
1482 tree with the LOOKUP operation. A complete discussion of the server
1483 name space is in Section 8.

1485 4.1.2. Public Filehandle

1487 The second special filehandle is the PUBLIC filehandle. Unlike the
1488 ROOT filehandle, the PUBLIC filehandle may be bound to or represent an
1489 arbitrary filesystem object at the server. The server is responsible
1490 for this binding. It may be that the PUBLIC filehandle and the ROOT
1491 filehandle refer to the same filesystem object. However, it is up to
1492 the administrative software at the server and the policies of the
1493 server administrator to define the binding of the PUBLIC filehandle
1494 and server filesystem object. The client may not make any
1495 assumptions about this binding. The client uses the PUBLIC
1496 filehandle via the PUTPUBFH operation.

1498 4.2. Filehandle Types

1500 In the NFSv2 and NFSv3 protocols, there was one type of filehandle
1501 with a single set of semantics. This type of filehandle is termed
1502 "persistent" in NFS Version 4. The semantics of a persistent
1503 filehandle remain the same as before. A new type of filehandle
1504 introduced in NFS Version 4 is the "volatile" filehandle, which
1505 attempts to accommodate certain server environments.

1507 The volatile filehandle type was introduced to address server
1508 functionality or implementation issues which make correct
1509 implementation of a persistent filehandle infeasible. Some server
1510 environments do not provide a filesystem level invariant that can be
1511 used to construct a persistent filehandle. The underlying server
1512 filesystem may not provide the invariant or the server's filesystem
1513 programming interfaces may not provide access to the needed
1514 invariant. Volatile filehandles may ease the implementation of
1515 server functionality such as hierarchical storage management or
1516 filesystem reorganization or migration. However, the volatile
1517 filehandle increases the implementation burden for the client.
1519 Since the client will need to handle persistent and volatile
1520 filehandles differently, a file attribute is defined which may be
1521 used by the client to determine the filehandle types being returned
1522 by the server.

1524 4.2.1. General Properties of a Filehandle

1526 The filehandle contains all the information the server needs to
1527 distinguish an individual file. To the client, the filehandle is
1528 opaque. The client stores filehandles for use in a later request and
1529 can compare two filehandles from the same server for equality by
1530 doing a byte-by-byte comparison. However, the client MUST NOT
1531 otherwise interpret the contents of filehandles. If two filehandles
1532 from the same server are equal, they MUST refer to the same file.
1533 Servers SHOULD try to maintain a one-to-one correspondence between
1534 filehandles and files but this is not required. Clients MUST use
1535 filehandle comparisons only to improve performance, not for correct
1536 behavior. All clients need to be prepared for situations in which it
1537 cannot be determined whether two filehandles denote the same object
1538 and in such cases, avoid making invalid assumptions which might cause
1539 incorrect behavior. Further discussion of filehandle and attribute
1540 comparison in the context of data caching is presented in
1541 Section 10.3.4.

1543 As an example, in the case that two different path names when
1544 traversed at the server terminate at the same filesystem object, the
1545 server SHOULD return the same filehandle for each path. This can
1546 occur if a hard link is used to create two file names which refer to
1547 the same underlying file object and associated data. For example, if
1548 paths /a/b/c and /a/d/c refer to the same file, the server SHOULD
1549 return the same filehandle for both path name traversals.

1551 4.2.2. Persistent Filehandle

1553 A persistent filehandle is defined as having a fixed value for the
1554 lifetime of the filesystem object to which it refers. Once the
1555 server creates the filehandle for a filesystem object, the server
1556 MUST accept the same filehandle for the object for the lifetime of
1557 the object. If the server restarts or reboots, the NFS server must
1558 honor the same filehandle value as it did in the server's previous
1559 instantiation. Similarly, if the filesystem is migrated, the new NFS
1560 server must honor the same filehandle as the old NFS server.

1562 The persistent filehandle will become stale or invalid when the
1563 filesystem object is removed. When the server is presented with a
1564 persistent filehandle that refers to a deleted object, it MUST return
1565 an error of NFS4ERR_STALE. A filehandle may become stale when the
1566 filesystem containing the object is no longer available. The file
1567 system may become unavailable if it exists on removable media and the
1568 media is no longer available at the server or the filesystem in whole
1569 has been destroyed or the filesystem has simply been removed from the
1570 server's name space (i.e., unmounted in a UNIX environment).

1572 4.2.3. Volatile Filehandle

1574 A volatile filehandle does not share the same longevity
1575 characteristics of a persistent filehandle. The server may determine
1576 that a volatile filehandle is no longer valid at many different
1577 points in time. If the server can definitively determine that a
1578 volatile filehandle refers to an object that has been removed, the
1579 server should return NFS4ERR_STALE to the client (as is the case for
1580 persistent filehandles). In all other cases where the server
1581 determines that a volatile filehandle can no longer be used, it
1582 should return an error of NFS4ERR_FHEXPIRED.
1584 The mandatory attribute "fh_expire_type" is used by the client to
1585 determine what type of filehandle the server is providing for a
1586 particular filesystem. This attribute is a bitmask with the
1587 following values:

1589    FH4_PERSISTENT: The value of FH4_PERSISTENT is used to indicate a
1590       persistent filehandle, which is valid until the object is removed
1591       from the filesystem. The server will not return NFS4ERR_FHEXPIRED
1592       for this filehandle. FH4_PERSISTENT is defined as a value in
1593       which none of the bits specified below are set.

1595    FH4_VOLATILE_ANY: The filehandle may expire at any time, except as
1596       specifically excluded (i.e., FH4_NOEXPIRE_WITH_OPEN).

1598    FH4_NOEXPIRE_WITH_OPEN: May only be set when FH4_VOLATILE_ANY is
1599       set. If this bit is set, then the meaning of FH4_VOLATILE_ANY is
1600       qualified to exclude any expiration of the filehandle when it is
1601       open.

1603    FH4_VOL_MIGRATION: The filehandle will expire as a result of
1604       migration. If FH4_VOLATILE_ANY is set, FH4_VOL_MIGRATION is
1605       redundant.

1607    FH4_VOL_RENAME: The filehandle will expire during rename. This
1608       includes a rename by the requesting client or a rename by any
1609       other client. If FH4_VOLATILE_ANY is set, FH4_VOL_RENAME is
1610       redundant.

1612 Servers which provide volatile filehandles that may expire while open
1613 (i.e., if FH4_VOL_MIGRATION or FH4_VOL_RENAME is set or if
1614 FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN is not set) should
1615 deny a RENAME or REMOVE that would affect an OPEN file or any of the
1616 components leading to the OPEN file. In addition, the server should
1617 deny all RENAME or REMOVE requests during the grace period upon
1618 server restart.

1620 Note that the bits FH4_VOL_MIGRATION and FH4_VOL_RENAME allow the
1621 client to determine that expiration has occurred whenever a specific
1622 event occurs, without an explicit filehandle expiration error from
1623 the server. FH4_VOLATILE_ANY does not provide this form of
1624 information. In situations where the server will expire many, but
1625 not all filehandles upon migration (e.g., all but those that are
1626 open), FH4_VOLATILE_ANY (in this case with FH4_NOEXPIRE_WITH_OPEN) is
1627 a better choice since the client may not assume that all filehandles
1628 will expire when migration occurs, and it is likely that additional
1629 expirations will occur (as a result of file CLOSE) that are separated
1630 in time from the migration event itself.

1632 4.2.4. One Method of Constructing a Volatile Filehandle

1634 A volatile filehandle, while opaque to the client, could contain:

1636    [volatile bit = 1 | server boot time | slot | generation number]

1638    o slot is an index in the server volatile filehandle table

1640    o generation number is the generation number for the table entry/
1641      slot

1643 When the client presents a volatile filehandle, the server makes the
1644 following checks, which assume that the check for the volatile bit
1645 has passed. If the server boot time is less than the current server
1646 boot time, return NFS4ERR_FHEXPIRED. If slot is out of range, return
1647 NFS4ERR_BADHANDLE. If the generation number does not match, return
1648 NFS4ERR_FHEXPIRED.

1650 When the server reboots, the table is gone (it is volatile).

1652 If the volatile bit is 0, then it is a persistent filehandle with a
1653 different structure following it.

1655 4.3. Client Recovery from Filehandle Expiration

1657 If possible, the client SHOULD recover from the receipt of an
1658 NFS4ERR_FHEXPIRED error. The client must take on additional
1659 responsibility so that it may prepare itself to recover from the
1660 expiration of a volatile filehandle. If the server returns
1661 persistent filehandles, the client does not need these additional
1662 steps.
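For reference, the server-side checks listed in Section 4.2.4 can be
sketched as follows. This is a hedged Python illustration, not a
definitive implementation: the VolatileFh class, the representation of
the slot table as a list of generation numbers, and the use of strings
for status names are all assumptions made for this example, and the
check for the volatile bit is assumed to have passed already.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VolatileFh:
    """Decoded fields of a volatile filehandle (hypothetical layout)."""
    boot_time: int    # server boot time recorded in the filehandle
    slot: int         # index into the server's volatile filehandle table
    generation: int   # generation number expected for that slot


def check_volatile_fh(fh: VolatileFh, current_boot_time: int,
                      table: List[int]) -> str:
    """Apply the three checks in the order the text gives them.

    `table` holds the current generation number for each slot.
    """
    if fh.boot_time < current_boot_time:
        # Issued before the last reboot; the old table is gone.
        return "NFS4ERR_FHEXPIRED"
    if not 0 <= fh.slot < len(table):
        return "NFS4ERR_BADHANDLE"
    if table[fh.slot] != fh.generation:
        # The slot has been reused since this filehandle was issued.
        return "NFS4ERR_FHEXPIRED"
    return "NFS4_OK"


# Usage with assumed example values:
table = [3, 7]
status = check_volatile_fh(VolatileFh(100, 1, 7), 100, table)
```

A client that receives NFS4ERR_FHEXPIRED from such a check would then
attempt the recovery described in Section 4.3.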
1664 For volatile filehandles, most commonly the client will need to store
1665 the component names leading up to and including the filesystem object
1666 in question. With these names, the client should be able to recover
1667 by finding a filehandle in the name space that is still available or
1668 by starting at the root of the server's filesystem name space.

1670 If the expired filehandle refers to an object that has been removed
1671 from the filesystem, obviously the client will not be able to recover
1672 from the expired filehandle.

1674 It is also possible that the expired filehandle refers to a file that
1675 has been renamed. If the file was renamed by another client, again
1676 it is possible that the original client will not be able to recover.
1677 However, in the case that the client itself is renaming the file and
1678 the file is open, it is possible that the client may be able to
1679 recover. The client can determine the new path name based on the
1680 processing of the rename request. The client can then regenerate the
1681 new filehandle based on the new path name. The client could also use
1682 the compound operation mechanism to construct a set of operations
1683 like:

1685    RENAME A B
1686    LOOKUP B
1687    GETFH

1689 Note that the COMPOUND procedure does not provide atomicity. This
1690 example only reduces the overhead of recovering from an expired
1691 filehandle.

1693 5. File Attributes

1695 To meet the requirements of extensibility and increased
1696 interoperability with non-UNIX platforms, attributes need to be
1697 handled in a flexible manner. The NFSv3 fattr3 structure contains a
1698 fixed list of attributes that not all clients and servers are able to
1699 support or care about. The fattr3 structure cannot be extended as
1700 new needs arise and it provides no way to indicate non-support. With
1701 the NFSv4.0 protocol, the client is able to query what attributes the
1702 server supports and construct requests with only those supported
1703 attributes (or a subset thereof).

1705 To this end, attributes are divided into three groups: REQUIRED,
1706 RECOMMENDED, and named. Both REQUIRED and RECOMMENDED attributes are
1707 supported in the NFSv4.0 protocol by a specific and well-defined
1708 encoding and are identified by number. They are requested by setting
1709 a bit in the bit vector sent in the GETATTR request; the server
1710 response includes a bit vector to list what attributes were returned
1711 in the response. New REQUIRED or RECOMMENDED attributes may be added
1712 to the NFSv4 protocol as part of a new minor version by publishing a
1713 Standards Track RFC which allocates a new attribute number value and
1714 defines the encoding for the attribute. See Section 11 for further
1715 discussion.

1717 Named attributes are accessed by the new OPENATTR operation, which
1718 accesses a hidden directory of attributes associated with a file
1719 system object. OPENATTR takes a filehandle for the object and
1720 returns the filehandle for the attribute hierarchy. The filehandle
1721 for the named attributes is a directory object accessible by LOOKUP
1722 or READDIR and contains files whose names represent the named
1723 attributes and whose data bytes are the value of the attribute. For
1724 example:

1726 +----------+-----------+---------------------------------+
1727 | LOOKUP   | "foo"     | ; look up file                  |
1728 | GETATTR  | attrbits  |                                 |
1729 | OPENATTR |           | ; access foo's named attributes |
1730 | LOOKUP   | "x11icon" | ; look up specific attribute    |
1731 | READ     | 0,4096    | ; read stream of bytes          |
1732 +----------+-----------+---------------------------------+

1734 Named attributes are intended for data needed by applications rather
1735 than by an NFS client implementation. NFS implementors are strongly
1736 encouraged to define their new attributes as RECOMMENDED attributes
1737 by bringing them to the IETF Standards Track process.

1739 The set of attributes that are classified as REQUIRED is deliberately
1740 small since servers need to do whatever it takes to support them. A
1741 server should support as many of the RECOMMENDED attributes as
1742 possible but, by their definition, the server is not required to
1743 support all of them. Attributes are deemed REQUIRED if the data is
1744 both needed by a large number of clients and is not otherwise
1745 reasonably computable by the client when support is not provided on
1746 the server.

1748 Note that the hidden directory returned by OPENATTR is a convenience
1749 for protocol processing. The client should not make any assumptions
1750 about the server's implementation of named attributes and whether or
1751 not the underlying file system at the server has a named attribute
1752 directory. Therefore, operations such as SETATTR and GETATTR on the
1753 named attribute directory are undefined.

1755 5.1. REQUIRED Attributes

1757 These MUST be supported by every NFSv4.0 client and server in order
1758 to ensure a minimum level of interoperability. The server MUST store
1759 and return these attributes, and the client MUST be able to function
1760 with an attribute set limited to these attributes. With just the
1761 REQUIRED attributes some client functionality may be impaired or
1762 limited in some ways. A client may ask for any of these attributes
1763 to be returned by setting a bit in the GETATTR request, and the
1764 server must return their value.

1766 5.2. RECOMMENDED Attributes

1768 These attributes are understood well enough to warrant support in the
1769 NFSv4.0 protocol. However, they may not be supported on all clients
1770 and servers. A client MAY ask for any of these attributes to be
1771 returned by setting a bit in the GETATTR request but must handle the
1772 case where the server does not return them. A client MAY ask for the
1773 set of attributes the server supports and SHOULD NOT request
1774 attributes the server does not support. A server should be tolerant
1775 of requests for unsupported attributes and simply not return them
1776 rather than considering the request an error. It is expected that
1777 servers will support all attributes they comfortably can and only
1778 fail to support attributes that are difficult to support in their
1779 operating environments. A server should provide attributes whenever
1780 it does not have to "tell lies" to the client. For example, a file
1781 modification time should be either an accurate time or should not be
1782 supported by the server. At times this will be difficult for
1783 clients, but a client is better positioned to decide whether and how
1784 to fabricate or construct an attribute or whether to do without the
1785 attribute.

1787 5.3. Named Attributes

1789 These attributes are not supported by direct encoding in the NFSv4
1790 protocol but are accessed by string names rather than numbers and
1791 correspond to an uninterpreted stream of bytes that are stored with
1792 the file system object. The name space for these attributes may be
1793 accessed by using the OPENATTR operation. The OPENATTR operation
1794 returns a filehandle for a virtual "named attribute directory", and
1795 further perusal and modification of the name space may be done using
1796 operations that work on more typical directories. In particular,
1797 READDIR may be used to get a list of such named attributes, and
1798 LOOKUP and OPEN may select a particular attribute. Creation of a new
1799 named attribute may be the result of an OPEN specifying file
1800 creation.
1802 Once an OPEN is done, named attributes may be examined and changed by 1803 normal READ and WRITE operations using the filehandles and stateids 1804 returned by OPEN. 1806 Named attributes and the named attribute directory may have their own 1807 (non-named) attributes. Each of these objects must have all of the 1808 REQUIRED attributes and may have additional RECOMMENDED attributes. 1809 However, the set of attributes for named attributes and the named 1810 attribute directory need not be, and typically will not be, as large 1811 as that for other objects in that file system. 1813 Named attributes and the named attribute directory might be the 1814 target of delegations (in the case of the named attribute directory 1815 these will be directory delegations). However, since granting of 1816 delegations is at the server's discretion, a server need not support 1817 delegations on named attributes or the named attribute directory. 1819 It is RECOMMENDED that servers support arbitrary named attributes. A 1820 client should not depend on the ability to store any named attributes 1821 in the server's file system. If a server does support named 1822 attributes, a client that is also able to handle them should be able 1823 to copy a file's data and metadata with complete transparency from 1824 one location to another; this would imply that names allowed for 1825 regular directory entries are valid for named attribute names as 1826 well. 1828 In NFSv4.0, the structure of named attribute directories is 1829 restricted in a number of ways, in order to prevent the development 1830 of non-interoperable implementations in which some servers support a 1831 fully general hierarchical directory structure for named attributes 1832 while others support a limited but adequate structure for named 1833 attributes. In such an environment, clients or applications might 1834 come to depend on non-portable extensions. 
The restrictions are: 1836 o CREATE is not allowed in a named attribute directory. Thus, such 1837 objects as symbolic links and special files are not allowed to be 1838 named attributes. Further, directories may not be created in a 1839 named attribute directory, so no hierarchical structure of named 1840 attributes for a single object is allowed. 1842 o If OPENATTR is done on a named attribute directory or on a named 1843 attribute, the server MUST return NFS4ERR_WRONG_TYPE. 1845 o Doing a RENAME of a named attribute to a different named attribute 1846 directory or to an ordinary (i.e., non-named-attribute) directory 1847 is not allowed. 1849 o Creating hard links between named attribute directories or between 1850 named attribute directories and ordinary directories is not 1851 allowed. 1853 Names of attributes will not be controlled by this document or other 1854 IETF Standards Track documents. See Section 18 for further 1855 discussion. 1857 5.4. Classification of Attributes 1859 Each of the REQUIRED and RECOMMENDED attributes can be classified in 1860 one of three categories: per server (i.e., the value of the attribute 1861 will be the same for all file objects that share the same server), 1862 per file system (i.e., the value of the attribute will be the same 1863 for some or all file objects that share the same fsid attribute 1864 (Section 5.8.1.9) and server owner), or per file system object. Note 1865 that it is possible that some per file system attributes may vary 1866 within the file system, depending on the 1868 value of the "homogeneous" (Section 5.8.2.12) attribute. Note that 1869 the attributes time_access_set and time_modify_set are not listed in 1870 this section because they are write-only attributes corresponding to 1871 time_access and time_modify, and are used in a special instance of 1872 SETATTR.
1874 o The per-server attribute is: 1876 lease_time 1878 o The per-file system attributes are: 1880 supported_attrs, fh_expire_type, link_support, symlink_support, 1881 unique_handles, aclsupport, cansettime, case_insensitive, 1882 case_preserving, chown_restricted, files_avail, files_free, 1883 files_total, fs_locations, homogeneous, maxfilesize, maxname, 1884 maxread, maxwrite, no_trunc, space_avail, space_free, 1885 space_total, time_delta 1887 o The per-file system object attributes are: 1889 type, change, size, named_attr, fsid, rdattr_error, filehandle, 1890 acl, archive, fileid, hidden, maxlink, mimetype, mode, 1891 numlinks, owner, owner_group, rawdev, space_used, system, 1892 time_access, time_backup, time_create, time_metadata, 1893 time_modify, mounted_on_fileid 1895 For quota_avail_hard, quota_avail_soft, and quota_used, see their 1896 definitions below for the appropriate classification. 1898 5.5. Set-Only and Get-Only Attributes 1900 Some REQUIRED and RECOMMENDED attributes are set-only; i.e., they can 1901 be set via SETATTR but not retrieved via GETATTR. Similarly, some 1902 REQUIRED and RECOMMENDED attributes are get-only; i.e., they can be 1903 retrieved via GETATTR but not set via SETATTR. If a client attempts 1904 to set a get-only attribute or get a set-only attribute, the server 1905 MUST return NFS4ERR_INVAL. 1907 5.6. REQUIRED Attributes - List and Definition References 1909 The list of REQUIRED attributes appears in Table 2. The meanings of 1910 the columns of the table are: 1912 o Name: The name of the attribute. 1914 o Id: The number assigned to the attribute. In the event of 1915 conflicts between the assigned number and [2], the latter is 1916 likely authoritative, but the conflict should be resolved with Errata to this 1917 document and/or [2]. See [25] for the Errata process. 1919 o Data Type: The XDR data type of the attribute. 1921 o Acc: Access allowed to the attribute. R means read-only (GETATTR 1922 may retrieve, SETATTR may not set).
W means write-only (SETATTR 1923 may set, GETATTR may not retrieve). R W means read/write (GETATTR 1924 may retrieve, SETATTR may set). 1926 o Defined in: The section of this specification that describes the 1927 attribute.

1929 +-----------------+----+------------+-----+------------------+
1930 | Name            | Id | Data Type  | Acc | Defined in:      |
1931 +-----------------+----+------------+-----+------------------+
1932 | supported_attrs | 0  | bitmap4    | R   | Section 5.8.1.1  |
1933 | type            | 1  | nfs_ftype4 | R   | Section 5.8.1.2  |
1934 | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3  |
1935 | change          | 3  | uint64_t   | R   | Section 5.8.1.4  |
1936 | size            | 4  | uint64_t   | R W | Section 5.8.1.5  |
1937 | link_support    | 5  | bool       | R   | Section 5.8.1.6  |
1938 | symlink_support | 6  | bool       | R   | Section 5.8.1.7  |
1939 | named_attr      | 7  | bool       | R   | Section 5.8.1.8  |
1940 | fsid            | 8  | fsid4      | R   | Section 5.8.1.9  |
1941 | unique_handles  | 9  | bool       | R   | Section 5.8.1.10 |
1942 | lease_time      | 10 | nfs_lease4 | R   | Section 5.8.1.11 |
1943 | rdattr_error    | 11 | enum       | R   | Section 5.8.1.12 |
1944 | filehandle      | 19 | nfs_fh4    | R   | Section 5.8.1.13 |
1945 +-----------------+----+------------+-----+------------------+

1947 Table 2

1949 5.7. RECOMMENDED Attributes - List and Definition References

1951 The RECOMMENDED attributes are defined in Table 3. The meanings of 1952 the column headers are the same as Table 2; see Section 5.6 for the 1953 meanings.
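The Id column of these tables is also the bit position a client uses when it asks for an attribute in GETATTR. The sketch below illustrates one way to build and decode such a request mask, assuming the protocol's bitmap4 layout in which attribute n is bit (n mod 32) of 32-bit word (n / 32). The function names attrs_to_bitmap and bitmap_to_attrs are hypothetical helpers chosen for this illustration and are not part of the protocol.

```python
from typing import Iterable, List


def attrs_to_bitmap(attr_ids: Iterable[int]) -> List[int]:
    """Build a bitmap4-style list of 32-bit words from attribute Ids.

    Hypothetical helper: attribute n is placed in word n // 32 at
    bit n % 32.
    """
    words: List[int] = []
    for n in attr_ids:
        index, bit = divmod(n, 32)
        # Grow the word array on demand so that high Ids (e.g., 55) fit.
        while len(words) <= index:
            words.append(0)
        words[index] |= 1 << bit
    return words


def bitmap_to_attrs(words: List[int]) -> List[int]:
    """Decode a bitmap4-style word list back into sorted attribute Ids."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if word & (1 << bit)
    ]


# Request type (1), size (4), and mounted_on_fileid (55).
request = attrs_to_bitmap([1, 4, 55])
```

A client following the advice in this section would compare the decoded reply bitmap against the request and treat any missing Ids as attributes the server chose not to return.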
1955 +-------------------+----+--------------+-----+------------------+
1956 | Name              | Id | Data Type    | Acc | Defined in:      |
1957 +-------------------+----+--------------+-----+------------------+
1958 | acl               | 12 | nfsace4<>    | R W | Section 6.2.1    |
1959 | aclsupport        | 13 | uint32_t     | R   | Section 6.2.1.2  |
1960 | archive           | 14 | bool         | R W | Section 5.8.2.1  |
1961 | cansettime        | 15 | bool         | R   | Section 5.8.2.2  |
1962 | case_insensitive  | 16 | bool         | R   | Section 5.8.2.3  |
1963 | case_preserving   | 17 | bool         | R   | Section 5.8.2.4  |
1964 | chown_restricted  | 18 | bool         | R   | Section 5.8.2.5  |
1965 | fileid            | 20 | uint64_t     | R   | Section 5.8.2.6  |
1966 | files_avail       | 21 | uint64_t     | R   | Section 5.8.2.7  |
1967 | files_free        | 22 | uint64_t     | R   | Section 5.8.2.8  |
1968 | files_total       | 23 | uint64_t     | R   | Section 5.8.2.9  |
1969 | fs_locations      | 24 | fs_locations | R   | Section 5.8.2.10 |
1970 | hidden            | 25 | bool         | R W | Section 5.8.2.11 |
1971 | homogeneous       | 26 | bool         | R   | Section 5.8.2.12 |
1972 | maxfilesize       | 27 | uint64_t     | R   | Section 5.8.2.13 |
1973 | maxlink           | 28 | uint32_t     | R   | Section 5.8.2.14 |
1974 | maxname           | 29 | uint32_t     | R   | Section 5.8.2.15 |
1975 | maxread           | 30 | uint64_t     | R   | Section 5.8.2.16 |
1976 | maxwrite          | 31 | uint64_t     | R   | Section 5.8.2.17 |
1977 | mimetype          | 32 | utf8<>       | R W | Section 5.8.2.18 |
1978 | mode              | 33 | mode4        | R W | Section 6.2.2    |
1979 | mounted_on_fileid | 55 | uint64_t     | R   | Section 5.8.2.19 |
1980 | no_trunc          | 34 | bool         | R   | Section 5.8.2.20 |
1981 | numlinks          | 35 | uint32_t     | R   | Section 5.8.2.21 |
1982 | owner             | 36 | utf8<>       | R W | Section 5.8.2.22 |
1983 | owner_group       | 37 | utf8<>       | R W | Section 5.8.2.23 |
1984 | quota_avail_hard  | 38 | uint64_t     | R   | Section 5.8.2.24 |
1985 | quota_avail_soft  | 39 | uint64_t     | R   | Section 5.8.2.25 |
1986 | quota_used        | 40 | uint64_t     | R   | Section 5.8.2.26 |
1987 | rawdev            | 41 | specdata4    | R   | Section 5.8.2.27 |
1988 | space_avail       | 42 | uint64_t     | R   | Section 5.8.2.28 |
1989 | space_free        | 43 | uint64_t     | R   | Section 5.8.2.29 |
1990 | space_total       | 44 | uint64_t     | R   | Section 5.8.2.30 |
1991 | space_used        | 45 | uint64_t     | R   | Section 5.8.2.31 |
1992 | system            | 46 | bool         | R W | Section 5.8.2.32 |
1993 | time_access       | 47 | nfstime4     | R   | Section 5.8.2.33 |
1994 | time_access_set   | 48 | settime4     | W   | Section 5.8.2.34 |
1995 | time_backup       | 49 | nfstime4     | R W | Section 5.8.2.35 |
1996 | time_create       | 50 | nfstime4     | R W | Section 5.8.2.36 |
1997 | time_delta        | 51 | nfstime4     | R   | Section 5.8.2.37 |
1998 | time_metadata     | 52 | nfstime4     | R   | Section 5.8.2.38 |
1999 | time_modify       | 53 | nfstime4     | R   | Section 5.8.2.39 |
2000 | time_modify_set   | 54 | settime4     | W   | Section 5.8.2.40 |
2001 +-------------------+----+--------------+-----+------------------+

2003 Table 3

2005 5.8. Attribute Definitions 2007 5.8.1. Definitions of REQUIRED Attributes 2009 5.8.1.1. Attribute 0: supported_attrs 2011 The bit vector that would retrieve all REQUIRED and RECOMMENDED 2012 attributes that are supported for this object. The scope of this 2013 attribute applies to all objects with a matching fsid. 2015 5.8.1.2. Attribute 1: type 2017 Designates the type of an object in terms of one of a number of 2018 special constants: 2020 o NF4REG designates a regular file. 2022 o NF4DIR designates a directory. 2024 o NF4BLK designates a block device special file. 2026 o NF4CHR designates a character device special file. 2028 o NF4LNK designates a symbolic link. 2030 o NF4SOCK designates a named socket special file. 2032 o NF4FIFO designates a fifo special file. 2034 o NF4ATTRDIR designates a named attribute directory. 2036 o NF4NAMEDATTR designates a named attribute. 2038 Within the explanatory text and operation descriptions, the following 2039 phrases will be used with the meanings given below: 2041 o The phrase "is a directory" means that the object's type attribute 2042 is NF4DIR or NF4ATTRDIR.
2044 o The phrase "is a special file" means that the object's type 2045 attribute is NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO. 2047 o The phrase "is an ordinary file" means that the object's type 2048 attribute is NF4REG or NF4NAMEDATTR. 2050 5.8.1.3. Attribute 2: fh_expire_type 2052 Server uses this to specify filehandle expiration behavior to the 2053 client. See Section 4 for additional description. 2055 5.8.1.4. Attribute 3: change 2057 A value created by the server that the client can use to determine if 2058 file data, directory contents, or attributes of the object have been 2059 modified. The server may return the object's time_metadata attribute 2060 for this attribute's value but only if the file system object cannot 2061 be updated more frequently than the resolution of time_metadata. 2063 5.8.1.5. Attribute 4: size 2065 The size of the object in bytes. 2067 5.8.1.6. Attribute 5: link_support 2069 TRUE, if the object's file system supports hard links. 2071 5.8.1.7. Attribute 6: symlink_support 2073 TRUE, if the object's file system supports symbolic links. 2075 5.8.1.8. Attribute 7: named_attr 2077 TRUE, if this object has named attributes. In other words, object 2078 has a non-empty named attribute directory. 2080 5.8.1.9. Attribute 8: fsid 2082 Unique file system identifier for the file system holding this 2083 object. The fsid attribute has major and minor components, each of 2084 which are of data type uint64_t. 2086 5.8.1.10. Attribute 9: unique_handles 2088 TRUE, if two distinct filehandles are guaranteed to refer to two 2089 different file system objects. 2091 5.8.1.11. Attribute 10: lease_time 2093 Duration of the lease at server in seconds. 2095 5.8.1.12. Attribute 11: rdattr_error 2097 Error returned from an attempt to retrieve attributes during a 2098 READDIR operation. 2100 5.8.1.13. Attribute 19: filehandle 2102 The filehandle of this object (primarily for READDIR requests). 2104 5.8.2. 
Definitions of Uncategorized RECOMMENDED Attributes 2106 The definitions of most of the RECOMMENDED attributes follow. 2107 Collections that share a common category are defined in other 2108 sections. 2110 5.8.2.1. Attribute 14: archive 2112 TRUE, if this file has been archived since the time of last 2113 modification (deprecated in favor of time_backup). 2115 5.8.2.2. Attribute 15: cansettime 2117 TRUE, if the server is able to change the times for a file system 2118 object as specified in a SETATTR operation. 2120 5.8.2.3. Attribute 16: case_insensitive 2122 TRUE, if file name comparisons on this file system are case 2123 insensitive. 2125 5.8.2.4. Attribute 17: case_preserving 2127 TRUE, if file name case on this file system is preserved. 2129 5.8.2.5. Attribute 18: chown_restricted 2131 If TRUE, the server will reject any request to change either the 2132 owner or the group associated with a file if the caller is not a 2133 privileged user (for example, "root" in UNIX operating environments 2134 or, in Windows 2000, a user holding the "Take Ownership" privilege). 2136 5.8.2.6. Attribute 20: fileid 2138 A number uniquely identifying the file within the file system. 2140 5.8.2.7. Attribute 21: files_avail 2142 File slots available to this user on the file system containing this 2143 object -- this should be the smallest relevant limit. 2145 5.8.2.8. Attribute 22: files_free 2147 Free file slots on the file system containing this object -- this 2148 should be the smallest relevant limit. 2150 5.8.2.9. Attribute 23: files_total 2152 Total file slots on the file system containing this object. 2154 5.8.2.10. Attribute 24: fs_locations 2156 Locations where this file system may be found. If the server returns 2157 NFS4ERR_MOVED as an error, this attribute MUST be supported. 2159 The server can specify a root path by setting an array of zero path 2160 components. Other than this special case, the server MUST NOT 2161 present empty path components to the client. 2163 5.8.2.11.
Attribute 25: hidden 2165 TRUE, if the file is considered hidden with respect to the Windows 2166 API. 2168 5.8.2.12. Attribute 26: homogeneous 2170 TRUE, if this object's file system is homogeneous, i.e., all objects 2171 in the file system (all objects on the server with the same fsid) 2172 have common values for all per-file-system attributes. 2174 5.8.2.13. Attribute 27: maxfilesize 2176 Maximum supported file size for the file system of this object. 2178 5.8.2.14. Attribute 28: maxlink 2180 Maximum number of links for this object. 2182 5.8.2.15. Attribute 29: maxname 2184 Maximum file name size supported for this object. 2186 5.8.2.16. Attribute 30: maxread 2188 Maximum amount of data the READ operation will return for this 2189 object. 2191 5.8.2.17. Attribute 31: maxwrite 2193 Maximum amount of data the WRITE operation will accept for this 2194 object. This attribute SHOULD be supported if the file is writable. 2195 Lack of this attribute can lead to the client either wasting 2196 bandwidth or not receiving the best performance. 2198 5.8.2.18. Attribute 32: mimetype 2200 MIME body type/subtype of this object. 2202 5.8.2.19. Attribute 55: mounted_on_fileid 2204 Like fileid, but if the target filehandle is the root of a file 2205 system, this attribute represents the fileid of the underlying 2206 directory. 2208 UNIX-based operating environments connect a file system into the 2209 namespace by connecting (mounting) the file system onto the existing 2210 file object (the mount point, usually a directory) of an existing 2211 file system. When the mount point's parent directory is read via an 2212 API like readdir(), the return results are directory entries, each 2213 with a component name and a fileid. The fileid of the mount point's 2214 directory entry will be different from the fileid that the stat() 2215 system call returns. 
The stat() system call is returning the fileid 2216 of the root of the mounted file system, whereas readdir() is 2217 returning the fileid that stat() would have returned before any file 2218 systems were mounted on the mount point. 2220 Unlike NFSv3, NFSv4.0 allows a client's LOOKUP request to cross other 2221 file systems. The client detects the file system crossing whenever 2222 the filehandle argument of LOOKUP has an fsid attribute different 2223 from that of the filehandle returned by LOOKUP. A UNIX-based client 2224 will consider this a "mount point crossing". UNIX has a legacy 2225 scheme for allowing a process to determine its current working 2226 directory. This relies on readdir() of a mount point's parent and 2227 stat() of the mount point returning fileids as previously described. 2228 The mounted_on_fileid attribute corresponds to the fileid that 2229 readdir() would have returned as described previously. 2231 While the NFSv4.0 client could simply fabricate a fileid 2232 corresponding to what mounted_on_fileid provides (and if the server 2233 does not support mounted_on_fileid, the client has no choice), there 2234 is a risk that the client will generate a fileid that conflicts with 2235 one that is already assigned to another object in the file system. 2236 Instead, if the server can provide the mounted_on_fileid, the 2237 potential for client operational problems in this area is eliminated. 2239 If the server detects that there is no mounted point at the target 2240 file object, then the value for mounted_on_fileid that it returns is 2241 the same as that of the fileid attribute. 2243 The mounted_on_fileid attribute is RECOMMENDED, so the server SHOULD 2244 provide it if possible, and for a UNIX-based server, this is 2245 straightforward. 
Usually, mounted_on_fileid will be requested during 2246 a READDIR operation, in which case it is trivial (at least for UNIX- 2247 based servers) to return mounted_on_fileid since it is equal to the 2248 fileid of a directory entry returned by readdir(). If 2249 mounted_on_fileid is requested in a GETATTR operation, the server 2250 should obey an invariant that has it returning a value that is equal 2251 to the file object's entry in the object's parent directory, i.e., 2252 what readdir() would have returned. Some operating environments 2253 allow a series of two or more file systems to be mounted onto a 2254 single mount point. In this case, for the server to obey the 2255 aforementioned invariant, it will need to find the base mount point, 2256 and not the intermediate mount points. 2258 5.8.2.20. Attribute 34: no_trunc 2260 If this attribute is TRUE, then if the client uses a file name longer 2261 than name_max, an error will be returned instead of the name being 2262 truncated. 2264 5.8.2.21. Attribute 35: numlinks 2266 Number of hard links to this object. 2268 5.8.2.22. Attribute 36: owner 2270 The string name of the owner of this object. 2272 5.8.2.23. Attribute 37: owner_group 2274 The string name of the group ownership of this object. 2276 5.8.2.24. Attribute 38: quota_avail_hard 2278 The value in bytes that represents the amount of additional disk 2279 space beyond the current allocation that can be allocated to this 2280 file or directory before further allocations will be refused. It is 2281 understood that this space may be consumed by allocations to other 2282 files or directories. 2284 5.8.2.25. Attribute 39: quota_avail_soft 2286 The value in bytes that represents the amount of additional disk 2287 space that can be allocated to this file or directory before the user 2288 may reasonably be warned. 
It is understood that this space may be 2289 consumed by allocations to other files or directories though there is 2290 a rule as to which other files or directories. 2292 5.8.2.26. Attribute 40: quota_used 2294 The value in bytes that represents the amount of disk space used by 2295 this file or directory and possibly a number of other similar files 2296 or directories, where the set of "similar" meets at least the 2297 criterion that allocating space to any file or directory in the set 2298 will reduce the "quota_avail_hard" of every other file or directory 2299 in the set. 2301 Note that there may be a number of distinct but overlapping sets of 2302 files or directories for which a quota_used value is maintained, e.g., 2303 "all files with a given owner", "all files with a given group owner", 2304 etc. The server is at liberty to choose any of those sets when 2305 providing the content of the quota_used attribute, but should do so 2306 in a repeatable way. The rule may be configured per file system or 2307 may be "choose the set with the smallest quota". 2309 5.8.2.27. Attribute 41: rawdev 2311 Raw device number of a file of type NF4BLK or NF4CHR. The device 2312 number is split into major and minor numbers. If the file's type 2313 attribute is not NF4BLK or NF4CHR, the value returned SHOULD NOT be 2314 considered useful. 2316 5.8.2.28. Attribute 42: space_avail 2318 Disk space in bytes available to this user on the file system 2319 containing this object -- this should be the smallest relevant limit. 2321 5.8.2.29. Attribute 43: space_free 2323 Free disk space in bytes on the file system containing this object -- 2324 this should be the smallest relevant limit. 2326 5.8.2.30. Attribute 44: space_total 2328 Total disk space in bytes on the file system containing this object. 2330 5.8.2.31. Attribute 45: space_used 2332 Number of file system bytes allocated to this object. 2334 5.8.2.32.
Attribute 46: system 2336 This attribute is TRUE if this file is a "system" file with respect 2337 to the Windows operating environment. 2339 5.8.2.33. Attribute 47: time_access 2341 The time_access attribute represents the time of last access to the 2342 object by a READ operation sent to the server. The notion of what is 2343 an "access" depends on the server's operating environment and/or the 2344 server's file system semantics. For example, for servers obeying 2345 Portable Operating System Interface (POSIX) semantics, time_access 2346 would be updated only by the READ and READDIR operations and not any 2347 of the operations that modify the content of the object [16], [17], 2348 [26], [27], [28]. Of course, setting the corresponding 2349 time_access_set attribute is another way to modify the time_access 2350 attribute. 2352 Whenever the file object resides on a writable file system, the 2353 server should make its best efforts to record time_access into stable 2354 storage. However, to mitigate the performance effects of doing so, 2355 and most especially whenever the server is satisfying the read of the 2356 object's content from its cache, the server MAY cache access time 2357 updates and lazily write them to stable storage. It is also 2358 acceptable to give administrators of the server the option to disable 2359 time_access updates. 2361 5.8.2.34. Attribute 48: time_access_set 2363 Sets the time of last access to the object. SETATTR use only. 2365 5.8.2.35. Attribute 49: time_backup 2367 The time of last backup of the object. 2369 5.8.2.36. Attribute 50: time_create 2371 The time of creation of the object. This attribute does not have any 2372 relation to the traditional UNIX file attribute "ctime" or "change 2373 time". 2375 5.8.2.37. Attribute 51: time_delta 2377 Smallest useful server time granularity. 2379 5.8.2.38. Attribute 52: time_metadata 2381 The time of last metadata modification of the object. 2383 5.8.2.39. 
Attribute 53: time_modify 2385 The time of last modification to the object. 2387 5.8.2.40. Attribute 54: time_modify_set 2389 Sets the time of last modification to the object. SETATTR use only. 2391 5.9. Interpreting owner and owner_group 2393 The RECOMMENDED attributes "owner" and "owner_group" (and also users 2394 and groups within the "acl" attribute) are represented in terms of a 2395 UTF-8 string. To avoid a representation that is tied to a particular 2396 underlying implementation at the client or server, the use of the 2397 UTF-8 string has been chosen. Note that section 6.1 of RFC 2624 [29] 2398 provides additional rationale. It is expected that the client and 2399 server will have their own local representation of owner and 2400 owner_group that is used for local storage or presentation to the end 2401 user. Therefore, it is expected that when these attributes are 2402 transferred between the client and server, the local representation 2403 is translated to a syntax of the form "user@dns_domain". This will 2404 allow for a client and server that do not use the same local 2405 representation the ability to translate to a common syntax that can 2406 be interpreted by both. 2408 Similarly, security principals may be represented in different ways 2409 by different security mechanisms. Servers normally translate these 2410 representations into a common format, generally that used by local 2411 storage, to serve as a means of identifying the users corresponding 2412 to these security principals. When these local identifiers are 2413 translated to the form of the owner attribute, associated with files 2414 created by such principals, they identify, in a common format, the 2415 users associated with each corresponding set of security principals. 2417 The translation used to interpret owner and group strings is not 2418 specified as part of the protocol. This allows various solutions to 2419 be employed. 
For example, a local translation table may be consulted 2420 that maps a numeric identifier to the user@dns_domain syntax. A name 2421 service may also be used to accomplish the translation. A server may 2422 provide a more general service, not limited by any particular 2423 translation (which would only translate a limited set of possible 2424 strings) by storing the owner and owner_group attributes in local 2425 storage without any translation or it may augment a translation 2426 method by storing the entire string for attributes for which no 2427 translation is available while using the local representation for 2428 those cases in which a translation is available. 2430 Servers that do not provide support for all possible values of the 2431 owner and owner_group attributes SHOULD return an error 2432 (NFS4ERR_BADOWNER) when a string is presented that has no 2433 translation, as the value to be set for a SETATTR of the owner, 2434 owner_group, or acl attributes. When a server does accept an owner 2435 or owner_group value as valid on a SETATTR (and similarly for the 2436 owner and group strings in an acl), it is promising to return that 2437 same string when a corresponding GETATTR is 2438 done. For some internationalization-related exceptions where this is 2439 not possible, see below. Configuration changes (including changes 2440 to the mapping of the string to the local representation) and 2441 ill-constructed name translations (those that contain aliasing) may make 2442 that promise impossible to honor. Servers should make appropriate 2443 efforts to avoid a situation in which these attributes have their 2444 values changed when no real change to ownership has occurred. 2446 The "dns_domain" portion of the owner string is meant to be a DNS 2447 domain name. For example, user@example.org. Servers should accept 2448 as valid a set of users for at least one domain. A server may treat 2449 other domains as having no valid translations.
A more general 2450 service is provided when a server is capable of accepting users for 2451 multiple domains, or for all domains, subject to security 2452 constraints. 2454 As an implementation guide, both clients and servers may provide a 2455 means to configure the "dns_domain" portion of the owner string. For 2456 example, the DNS domain name might be "lab.example.org", but the user 2457 names are defined in "example.org". In the absence of such a 2458 configuration, or as a default, the current DNS domain name should be 2459 the value used for the "dns_domain". 2461 As mentioned above, it is desirable that a server, when accepting a 2462 string of the form user@domain or group@domain in an attribute, 2463 return this same string when that corresponding attribute is fetched. 2464 Internationalization issues (for a general discussion of which see 2465 Section 12) can make this impossible, and the client needs to take note of 2466 the following situations: 2468 o The string representing the domain may be converted to an equivalent 2469 U-label, if presented using a form other than a U-label. See 2470 Section 12.6 for details. 2472 o The user or group may be returned in a different form, due to 2473 normalization issues, although it will always be a canonically 2474 equivalent string. See Section 12.7.3 for details. 2476 In the case where there is no translation available to the client or 2477 server, the attribute value will be constructed without the "@". 2478 Therefore, the absence of the "@" from the owner or owner_group 2479 attribute signifies that no translation was available at the sender 2480 and that the receiver of the attribute should not use that string as 2481 a basis for translation into its own internal format. Even though 2482 the attribute value cannot be translated, it may still be useful. In 2483 the case of a client, the attribute string may be used for local 2484 display of ownership.
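As an informal illustration of the "@" rule above, the following sketch shows how a receiver might separate an owner or owner_group string into its name and domain parts, and how it might recognize a value that the sender could not translate. The function name split_owner is hypothetical, and splitting on the last "@" is an assumption of this sketch rather than a requirement stated in this document.

```python
from typing import Optional, Tuple


def split_owner(value: str) -> Optional[Tuple[str, str]]:
    """Split "user@dns_domain" into (user, dns_domain).

    Returns None when the string has no "@", which signifies that no
    translation was available at the sender; such a string is suitable
    for display only and not as a basis for local translation.
    """
    if "@" not in value:
        return None
    # Assumption of this sketch: the domain follows the last "@".
    name, _, domain = value.rpartition("@")
    return name, domain


translated = split_owner("user@example.org")
untranslated = split_owner("jdoe")
```

Here translated holds the name and domain parts, while untranslated is None, so a client would fall back to showing the raw string.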
2486 To provide a greater degree of compatibility with NFSv3, which 2487 identified users and groups by 32-bit unsigned user identifiers and 2488 group identifiers, owner and group strings that consist of decimal 2489 numeric values with no leading zeros can be given a special 2490 interpretation by clients and servers that choose to provide such 2491 support. The receiver may treat such a user or group string as 2492 representing the same user or group as would be represented by an NFSv3 uid or 2493 gid having the corresponding numeric value. 2495 A server SHOULD reject such a numeric value if the security mechanism 2496 is kerberized. I.e., in such a scenario, the client will already 2497 need to form "user@domain" strings. For any other security 2498 mechanism, the server SHOULD accept such numeric values. As an 2499 implementation note, the server could make such acceptance 2500 configurable. If the server does not support numeric values or if such 2501 support is configured off, then it MUST return an NFS4ERR_BADOWNER error. If 2502 the security mechanism is kerberized and the client attempts to use 2503 the special form, then the server SHOULD return an NFS4ERR_BADOWNER 2504 error when there is a valid translation for the user or group 2505 designated in this way. In that case, the client must use the 2506 appropriate user@domain string and not the special form for 2507 compatibility. 2509 The client MUST always accept numeric values if the security 2510 mechanism is not kerberized. A client can determine if a server 2511 supports numeric values by first attempting to provide a numeric 2512 value and, only if it is rejected with an NFS4ERR_BADOWNER error, then 2513 providing a name value. After the first detection of such an error, 2514 the client should use only the name form and no longer the special form.
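The special numeric form can be recognized with a simple check. The sketch below is one possible reading of "decimal numeric values with no leading zeros"; the helper name numeric_owner_id is hypothetical, and both the 32-bit upper bound (taken from the NFSv3 identifier width mentioned above) and the acceptance of the single digit "0" are assumptions of this sketch.

```python
from typing import Optional

MAX_NFSV3_ID = 0xFFFFFFFF  # largest 32-bit unsigned uid or gid


def numeric_owner_id(value: str) -> Optional[int]:
    """Return the uid/gid encoded by a numeric owner string, else None.

    Accepts only ASCII decimal digits with no leading zeros. Anything
    else, including user@domain strings, yields None so the caller can
    fall back to name translation.
    """
    if not value.isascii() or not value.isdigit():
        return None
    if len(value) > 1 and value[0] == "0":
        return None  # leading zeros are not part of the special form
    number = int(value)
    return number if number <= MAX_NFSV3_ID else None


uid = numeric_owner_id("1000")
```

A receiver that does not support the special form would skip this check entirely and treat every string as a name.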
2516 The owner string "nobody" may be used to designate an anonymous user, 2517 which will be associated with a file created by a security principal 2518 that cannot be mapped through normal means to the owner attribute. 2520 5.10. Character Case Attributes 2522 With respect to the case_insensitive and case_preserving attributes, 2523 each UCS-4 character (which UTF-8 encodes) has a "long descriptive 2524 name" RFC1345 [30] which may or may not include the word "CAPITAL" or 2525 "SMALL". The presence of SMALL or CAPITAL allows an NFS server to 2526 implement unambiguous and efficient table-driven mappings for 2527 case-insensitive comparisons, and non-case-preserving storage, although 2528 there are variations that occur when additional characters with a name 2529 including "SMALL" or "CAPITAL" are added in a subsequent version of 2530 Unicode. 2532 For general character handling and internationalization issues, see 2533 Section 12. For details regarding case mapping, see the section 2534 Case-based Mapping Used for Component4 Strings. 2536 6. Access Control Attributes 2538 Access Control Lists (ACLs) are file attributes that specify 2539 fine-grained access control. This chapter covers the "acl", "aclsupport", 2540 and "mode" file attributes, and their interactions. Note that file 2541 attributes may apply to any file system object. 2543 6.1. Goals 2545 ACLs and modes represent two well-established models for specifying 2546 permissions. This chapter specifies requirements that attempt to 2547 meet the following goals: 2549 o If a server supports the mode attribute, it should provide 2550 reasonable semantics to clients that only set and retrieve the 2551 mode attribute. 2553 o If a server supports ACL attributes, it should provide reasonable 2554 semantics to clients that only set and retrieve those attributes.
2556 o On servers that support the mode attribute, if ACL attributes have 2557 never been set on an object, via inheritance or explicitly, the 2558 behavior should be traditional UNIX-like behavior. 2560 o On servers that support the mode attribute, if the ACL attributes 2561 have been previously set on an object, either explicitly or via 2562 inheritance: 2564 * Setting only the mode attribute should effectively control the 2565 traditional UNIX-like permissions of read, write, and execute 2566 on owner, owner_group, and other. 2568 * Setting only the mode attribute should provide reasonable 2569 security. For example, setting a mode of 000 should be enough 2570 to ensure that future opens for read or write by any principal 2571 fail, regardless of a previously existing or inherited ACL. 2573 o When a mode attribute is set on an object, the ACL attributes may 2574 need to be modified so as to not conflict with the new mode. In 2575 such cases, it is desirable that the ACL keep as much information 2576 as possible. This includes information about inheritance, AUDIT 2577 and ALARM ACEs, and permissions granted and denied that do not 2578 conflict with the new mode. 2580 6.2. File Attributes Discussion 2582 6.2.1. Attribute 12: acl 2584 The NFSv4.0 ACL attribute contains an array of access control entries 2585 (ACEs) that are associated with the file system object. Although the 2586 client can read and write the acl attribute, the server is 2587 responsible for using the ACL to perform access control. The client 2588 can use the OPEN or ACCESS operations to check access without 2589 modifying or reading data or metadata. 
2591 The NFS ACE structure is defined as follows: 2593 typedef uint32_t acetype4; 2595 typedef uint32_t aceflag4; 2597 typedef uint32_t acemask4; 2599 struct nfsace4 { 2600 acetype4 type; 2601 aceflag4 flag; 2602 acemask4 access_mask; 2603 utf8_must who; 2604 }; 2605 To determine if a request succeeds, the server processes each nfsace4 2606 entry in order. Only ACEs which have a "who" that matches the 2607 requester are considered. Each ACE is processed until all of the 2608 bits of the requester's access have been ALLOWED. Once a bit (see 2609 below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer 2610 considered in the processing of later ACEs. If an ACCESS_DENIED_ACE 2611 is encountered where the requester's access still has unALLOWED bits 2612 in common with the "access_mask" of the ACE, the request is denied. 2613 When the ACL is fully processed, if there are bits in the requester's 2614 mask that have not been ALLOWED or DENIED, access is denied. 2616 Unlike the ALLOW and DENY ACE types, the ALARM and AUDIT ACE types do 2617 not affect a requester's access, and instead are for triggering 2618 events as a result of a requester's access attempt. Therefore, AUDIT 2619 and ALARM ACEs are processed only after processing ALLOW and DENY 2620 ACEs. 2622 The NFSv4.0 ACL model is quite rich. Some server platforms may 2623 provide access control functionality that goes beyond the UNIX-style 2624 mode attribute, but which is not as rich as the NFS ACL model. So 2625 that users can take advantage of this more limited functionality, the 2626 server may support the acl attributes by mapping between its ACL 2627 model and the NFSv4.0 ACL model. Servers must ensure that the ACL 2628 they actually store or enforce is at least as strict as the NFSv4 ACL 2629 that was set. It is tempting to accomplish this by rejecting any ACL 2630 that falls outside the small set that can be represented accurately. 
2631 However, such an approach can render ACLs unusable without special 2632 client-side knowledge of the server's mapping, which defeats the 2633 purpose of having a common NFSv4 ACL protocol. Therefore servers 2634 should accept every ACL that they can without compromising security. 2635 To help accomplish this, servers may make a special exception, in the 2636 case of unsupported permission bits, to the rule that bits not 2637 ALLOWED or DENIED by an ACL must be denied. For example, a UNIX- 2638 style server might choose to silently allow read attribute 2639 permissions even though an ACL does not explicitly allow those 2640 permissions. (An ACL that explicitly denies permission to read 2641 attributes should still be rejected.) 2643 The situation is complicated by the fact that a server may have 2644 multiple modules that enforce ACLs. For example, the enforcement for 2645 NFSv4.0 access may be different from, but not weaker than, the 2646 enforcement for local access, and both may be different from the 2647 enforcement for access through other protocols such as SMB. So it 2648 may be useful for a server to accept an ACL even if not all of its 2649 modules are able to support it. 2651 The guiding principle with regard to NFSv4 access is that the server 2652 must not accept ACLs that appear to make access to the file more 2653 restrictive than it really is. 2655 6.2.1.1. ACE Type 2657 The constants used for the type field (acetype4) are as follows: 2659 const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000; 2660 const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001; 2661 const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002; 2662 const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003; 2664 All four types are permitted in the acl attribute.
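The ACE processing order described in Section 6.2.1 can be sketched as below. This is an illustrative sketch under stated assumptions, not a normative algorithm: the function acl_permits, the tuple layout (type, flag, access_mask, who), and the who_matches callback are hypothetical names introduced here. The constants are copied from the definitions in this chapter. The sketch ignores the server-specific exceptions discussed above (such as silently allowed attribute reads).

```python
# Constants copied from this chapter's XDR definitions.
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001
ACE4_INHERIT_ONLY_ACE = 0x00000008
ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002
ACE4_EXECUTE = 0x00000020


def acl_permits(acl, requested, who_matches):
    """Evaluate an ACL for one request.

    acl         -- list of (type, flag, access_mask, who) tuples, in order
    requested   -- bitmask of access bits the requester needs
    who_matches -- hypothetical callback: who_matches(who, flag) -> bool
    """
    remaining = requested  # bits not yet ALLOWED
    for ace_type, flag, access_mask, who in acl:
        if remaining == 0:
            break  # every requested bit has been ALLOWED
        if flag & ACE4_INHERIT_ONLY_ACE:
            continue  # inherit-only ACEs do not affect access
        if ace_type not in (ACE4_ACCESS_ALLOWED_ACE_TYPE,
                            ACE4_ACCESS_DENIED_ACE_TYPE):
            continue  # AUDIT and ALARM ACEs do not affect access
        if not who_matches(who, flag):
            continue
        if ace_type == ACE4_ACCESS_DENIED_ACE_TYPE:
            if remaining & access_mask:
                return False  # a still-unALLOWED bit is denied
        else:
            remaining &= ~access_mask  # these bits are now ALLOWED
    # Bits neither ALLOWED nor DENIED result in denial.
    return remaining == 0


example_acl = [
    (ACE4_ACCESS_DENIED_ACE_TYPE, 0, ACE4_WRITE_DATA, "bob@example.org"),
    (ACE4_ACCESS_ALLOWED_ACE_TYPE, 0,
     ACE4_READ_DATA | ACE4_WRITE_DATA, "EVERYONE@"),
]


def is_bob(who, flag):
    """Matcher for a requester named bob@example.org (illustrative)."""
    return who in ("bob@example.org", "EVERYONE@")
```

With example_acl, a read request from bob@example.org is permitted while a write request is denied, because the DENY ACE is encountered before the ALLOW ACE. Reversing the order of the two ACEs would permit the write, which shows why ACE order matters.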
2666 +------------------------------+--------------+---------------------+ 2667 | Value | Abbreviation | Description | 2668 +------------------------------+--------------+---------------------+ 2669 | ACE4_ACCESS_ALLOWED_ACE_TYPE | ALLOW | Explicitly grants | 2670 | | | the access defined | 2671 | | | in acemask4 to the | 2672 | | | file or directory. | 2673 | ACE4_ACCESS_DENIED_ACE_TYPE | DENY | Explicitly denies | 2674 | | | the access defined | 2675 | | | in acemask4 to the | 2676 | | | file or directory. | 2677 | ACE4_SYSTEM_AUDIT_ACE_TYPE | AUDIT | LOG (in a system | 2678 | | | dependent way) any | 2679 | | | access attempt to a | 2680 | | | file or directory | 2681 | | | which uses any of | 2682 | | | the access methods | 2683 | | | specified in | 2684 | | | acemask4. | 2685 | ACE4_SYSTEM_ALARM_ACE_TYPE | ALARM | Generate a system | 2686 | | | ALARM (system | 2687 | | | dependent) when any | 2688 | | | access attempt is | 2689 | | | made to a file or | 2690 | | | directory for the | 2691 | | | access methods | 2692 | | | specified in | 2693 | | | acemask4. | 2694 +------------------------------+--------------+---------------------+ 2696 The "Abbreviation" column denotes how the types will be referred to 2697 throughout the rest of this chapter. 2699 6.2.1.2. Attribute 13: aclsupport 2701 A server need not support all of the above ACE types. This attribute 2702 indicates which ACE types are supported for the current file system. 2703 The bitmask constants used to represent the above definitions within 2704 the aclsupport attribute are as follows: 2706 const ACL4_SUPPORT_ALLOW_ACL = 0x00000001; 2707 const ACL4_SUPPORT_DENY_ACL = 0x00000002; 2708 const ACL4_SUPPORT_AUDIT_ACL = 0x00000004; 2709 const ACL4_SUPPORT_ALARM_ACL = 0x00000008; 2711 Servers which support either the ALLOW or DENY ACE type SHOULD 2712 support both ALLOW and DENY ACE types. 2714 Clients should not attempt to set an ACE unless the server claims 2715 support for that ACE type. 
If the server receives a request to set 2716 an ACE that it cannot store, it MUST reject the request with 2717 NFS4ERR_ATTRNOTSUPP. If the server receives a request to set an ACE 2718 that it can store but cannot enforce, the server SHOULD reject the 2719 request with NFS4ERR_ATTRNOTSUPP. 2721 Support for any of the ACL attributes is optional (albeit, 2722 RECOMMENDED). 2724 6.2.1.3. ACE Access Mask 2726 The bitmask constants used for the access mask field are as follows: 2728 const ACE4_READ_DATA = 0x00000001; 2729 const ACE4_LIST_DIRECTORY = 0x00000001; 2730 const ACE4_WRITE_DATA = 0x00000002; 2731 const ACE4_ADD_FILE = 0x00000002; 2732 const ACE4_APPEND_DATA = 0x00000004; 2733 const ACE4_ADD_SUBDIRECTORY = 0x00000004; 2734 const ACE4_READ_NAMED_ATTRS = 0x00000008; 2735 const ACE4_WRITE_NAMED_ATTRS = 0x00000010; 2736 const ACE4_EXECUTE = 0x00000020; 2737 const ACE4_DELETE_CHILD = 0x00000040; 2738 const ACE4_READ_ATTRIBUTES = 0x00000080; 2739 const ACE4_WRITE_ATTRIBUTES = 0x00000100; 2741 const ACE4_DELETE = 0x00010000; 2742 const ACE4_READ_ACL = 0x00020000; 2743 const ACE4_WRITE_ACL = 0x00040000; 2744 const ACE4_WRITE_OWNER = 0x00080000; 2745 const ACE4_SYNCHRONIZE = 0x00100000; 2746 Note that some masks have coincident values, for example, 2747 ACE4_READ_DATA and ACE4_LIST_DIRECTORY. The mask entries 2748 ACE4_LIST_DIRECTORY, ACE4_ADD_FILE, and ACE4_ADD_SUBDIRECTORY are 2749 intended to be used with directory objects, while ACE4_READ_DATA, 2750 ACE4_WRITE_DATA, and ACE4_APPEND_DATA are intended to be used with 2751 non-directory objects. 2753 6.2.1.3.1. Discussion of Mask Attributes 2755 ACE4_READ_DATA 2757 Operation(s) affected: 2759 READ 2761 OPEN 2763 Discussion: 2765 Permission to read the data of the file. 2767 Servers SHOULD allow a user the ability to read the data of the 2768 file when only the ACE4_EXECUTE access mask bit is allowed. 
2770 ACE4_LIST_DIRECTORY 2772 Operation(s) affected: 2774 READDIR 2776 Discussion: 2778 Permission to list the contents of a directory. 2780 ACE4_WRITE_DATA 2782 Operation(s) affected: 2784 WRITE 2786 OPEN 2788 SETATTR of size 2790 Discussion: 2792 Permission to modify a file's data. 2794 ACE4_ADD_FILE 2796 Operation(s) affected: 2798 CREATE 2800 LINK 2802 OPEN 2804 RENAME 2806 Discussion: 2808 Permission to add a new file in a directory. The CREATE 2809 operation is affected when nfs_ftype4 is NF4LNK, NF4BLK, 2810 NF4CHR, NF4SOCK, or NF4FIFO. (NF4DIR is not listed because it 2811 is covered by ACE4_ADD_SUBDIRECTORY.) OPEN is affected when 2812 used to create a regular file. LINK and RENAME are always 2813 affected. 2815 ACE4_APPEND_DATA 2817 Operation(s) affected: 2819 WRITE 2821 OPEN 2823 SETATTR of size 2825 Discussion: 2827 The ability to modify a file's data, but only starting at EOF. 2828 This allows for the notion of append-only files, by allowing 2829 ACE4_APPEND_DATA and denying ACE4_WRITE_DATA to the same user 2830 or group. If a file has an ACL such as the one described above 2831 and a WRITE request is made for somewhere other than EOF, the 2832 server SHOULD return NFS4ERR_ACCESS. 2834 ACE4_ADD_SUBDIRECTORY 2836 Operation(s) affected: 2838 CREATE 2840 RENAME 2842 Discussion: 2844 Permission to create a subdirectory in a directory. The CREATE 2845 operation is affected when nfs_ftype4 is NF4DIR. The RENAME 2846 operation is always affected. 2848 ACE4_READ_NAMED_ATTRS 2850 Operation(s) affected: 2852 OPENATTR 2854 Discussion: 2856 Permission to read the named attributes of a file or to lookup 2857 the named attributes directory. OPENATTR is affected when it 2858 is not used to create a named attribute directory. This is 2859 when 1.) createdir is TRUE, but a named attribute directory 2860 already exists, or 2.) createdir is FALSE. 
2862 ACE4_WRITE_NAMED_ATTRS 2864 Operation(s) affected: 2866 OPENATTR 2868 Discussion: 2870 Permission to write the named attributes of a file or to create 2871 a named attribute directory. OPENATTR is affected when it is 2872 used to create a named attribute directory. This is when 2873 createdir is TRUE and no named attribute directory exists. The 2874 ability to check whether or not a named attribute directory 2875 exists depends on the ability to look it up, therefore, users 2876 also need the ACE4_READ_NAMED_ATTRS permission in order to 2877 create a named attribute directory. 2879 ACE4_EXECUTE 2881 Operation(s) affected: 2883 READ 2885 OPEN 2887 REMOVE 2889 RENAME 2891 LINK 2893 CREATE 2895 Discussion: 2897 Permission to execute a file. 2899 Servers SHOULD allow a user the ability to read the data of the 2900 file when only the ACE4_EXECUTE access mask bit is allowed. 2901 This is because there is no way to execute a file without 2902 reading the contents. Though a server may treat ACE4_EXECUTE 2903 and ACE4_READ_DATA bits identically when deciding to permit a 2904 READ operation, it SHOULD still allow the two bits to be set 2905 independently in ACLs, and MUST distinguish between them when 2906 replying to ACCESS operations. In particular, servers SHOULD 2907 NOT silently turn on one of the two bits when the other is set, 2908 as that would make it impossible for the client to correctly 2909 enforce the distinction between read and execute permissions. 2911 As an example, following a SETATTR of the following ACL: 2913 nfsuser:ACE4_EXECUTE:ALLOW 2915 A subsequent GETATTR of ACL for that file SHOULD return: 2917 nfsuser:ACE4_EXECUTE:ALLOW 2919 Rather than: 2921 nfsuser:ACE4_EXECUTE/ACE4_READ_DATA:ALLOW 2923 ACE4_EXECUTE 2925 Operation(s) affected: 2927 LOOKUP 2929 Discussion: 2931 Permission to traverse/search a directory. 
2933 ACE4_DELETE_CHILD 2935 Operation(s) affected: 2937 REMOVE 2939 RENAME 2941 Discussion: 2943 Permission to delete a file or directory within a directory. 2944 See Section 6.2.1.3.2 for information on how ACE4_DELETE and 2945 ACE4_DELETE_CHILD interact. 2947 ACE4_READ_ATTRIBUTES 2949 Operation(s) affected: 2951 GETATTR of file system object attributes 2953 VERIFY 2955 NVERIFY 2957 READDIR 2959 Discussion: 2961 The ability to read basic attributes (non-ACLs) of a file. On 2962 a UNIX system, basic attributes can be thought of as the stat 2963 level attributes. Allowing this access mask bit would mean the 2964 entity can execute "ls -l" and stat. If a READDIR operation 2965 requests attributes, this mask must be allowed for the READDIR 2966 to succeed. 2968 ACE4_WRITE_ATTRIBUTES 2970 Operation(s) affected: 2972 SETATTR of time_access_set, time_backup, 2974 time_create, time_modify_set, mimetype, hidden, system 2976 Discussion: 2978 Permission to change the times associated with a file or 2979 directory to an arbitrary value. Also permission to change the 2980 mimetype, hidden and system attributes. A user having 2981 ACE4_WRITE_DATA or ACE4_WRITE_ATTRIBUTES will be allowed to set 2982 the times associated with a file to the current server time. 2984 ACE4_DELETE 2986 Operation(s) affected: 2988 REMOVE 2990 Discussion: 2992 Permission to delete the file or directory. See 2993 Section 6.2.1.3.2 for information on how ACE4_DELETE and 2994 ACE4_DELETE_CHILD interact. 2996 ACE4_READ_ACL 2998 Operation(s) affected: 3000 GETATTR of acl 3002 NVERIFY 3004 VERIFY 3006 Discussion: 3008 Permission to read the ACL. 3010 ACE4_WRITE_ACL 3012 Operation(s) affected: 3014 SETATTR of acl and mode 3016 Discussion: 3018 Permission to write the acl and mode attributes. 3020 ACE4_WRITE_OWNER 3022 Operation(s) affected: 3024 SETATTR of owner and owner_group 3026 Discussion: 3028 Permission to write the owner and owner_group attributes.
On 3029 UNIX systems, this is the ability to execute chown() and 3030 chgrp(). 3032 ACE4_SYNCHRONIZE 3034 Operation(s) affected: 3036 NONE 3038 Discussion: 3040 Permission to access file locally at the server with 3041 synchronized reads and writes. 3043 Server implementations need not provide the granularity of control 3044 that is implied by this list of masks. For example, POSIX-based 3045 systems might not distinguish ACE4_APPEND_DATA (the ability to append 3046 to a file) from ACE4_WRITE_DATA (the ability to modify existing 3047 contents); both masks would be tied to a single "write" permission. 3048 When such a server returns attributes to the client, it would show 3049 both ACE4_APPEND_DATA and ACE4_WRITE_DATA if and only if the write 3050 permission is enabled. 3052 If a server receives a SETATTR request that it cannot accurately 3053 implement, it should err in the direction of more restricted access, 3054 except in the previously discussed cases of execute and read. For 3055 example, suppose a server cannot distinguish overwriting data from 3056 appending new data, as described in the previous paragraph. If a 3057 client submits an ALLOW ACE where ACE4_APPEND_DATA is set but 3058 ACE4_WRITE_DATA is not (or vice versa), the server should either turn 3059 off ACE4_APPEND_DATA or reject the request with NFS4ERR_ATTRNOTSUPP. 3061 6.2.1.3.2. ACE4_DELETE vs. ACE4_DELETE_CHILD 3063 Two access mask bits govern the ability to delete a directory entry: 3064 ACE4_DELETE on the object itself (the "target"), and 3065 ACE4_DELETE_CHILD on the containing directory (the "parent"). 3067 Many systems also take the "sticky bit" (MODE4_SVTX) on a directory 3068 to allow unlink only to a user that owns either the target or the 3069 parent; on some such systems the decision also depends on whether the 3070 target is writable. 3072 Servers SHOULD allow unlink if either ACE4_DELETE is permitted on the 3073 target, or ACE4_DELETE_CHILD is permitted on the parent. 
(Note that 3074 this is true even if the parent or target explicitly denies one of 3075 these permissions.) 3077 If the ACLs in question neither explicitly ALLOW nor DENY either of 3078 the above, and if MODE4_SVTX is not set on the parent, then the 3079 server SHOULD allow the removal if and only if ACE4_ADD_FILE is 3080 permitted. In the case where MODE4_SVTX is set, the server may also 3081 require the remover to own either the parent or the target, or may 3082 require the target to be writable. 3084 This allows servers to support something close to traditional UNIX- 3085 like semantics, with ACE4_ADD_FILE taking the place of the write bit. 3087 6.2.1.4. ACE flag 3089 The bitmask constants used for the flag field are as follows: 3091 const ACE4_FILE_INHERIT_ACE = 0x00000001; 3092 const ACE4_DIRECTORY_INHERIT_ACE = 0x00000002; 3093 const ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004; 3094 const ACE4_INHERIT_ONLY_ACE = 0x00000008; 3095 const ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010; 3096 const ACE4_FAILED_ACCESS_ACE_FLAG = 0x00000020; 3097 const ACE4_IDENTIFIER_GROUP = 0x00000040; 3099 A server need not support any of these flags. If the server supports 3100 flags that are similar to, but not exactly the same as, these flags, 3101 the implementation may define a mapping between the protocol-defined 3102 flags and the implementation-defined flags. 3104 For example, suppose a client tries to set an ACE with 3105 ACE4_FILE_INHERIT_ACE set but not ACE4_DIRECTORY_INHERIT_ACE. If the 3106 server does not support any form of ACL inheritance, the server 3107 should reject the request with NFS4ERR_ATTRNOTSUPP. If the server 3108 supports a single "inherit ACE" flag that applies to both files and 3109 directories, the server may reject the request (i.e., requiring the 3110 client to set both the file and directory inheritance flags). The 3111 server may also accept the request and silently turn on the 3112 ACE4_DIRECTORY_INHERIT_ACE flag. 3114 6.2.1.4.1. 
Discussion of Flag Bits 3116 ACE4_FILE_INHERIT_ACE 3117 Any non-directory file in any sub-directory will get this ACE 3118 inherited. 3120 ACE4_DIRECTORY_INHERIT_ACE 3121 Can be placed on a directory and indicates that this ACE should be 3122 added to each new directory created. 3123 If this flag is set in an ACE in an ACL attribute to be set on a 3124 non-directory file system object, the operation attempting to set 3125 the ACL SHOULD fail with NFS4ERR_ATTRNOTSUPP. 3127 ACE4_INHERIT_ONLY_ACE 3128 Can be placed on a directory but does not apply to the directory; 3129 ALLOW and DENY ACEs with this bit set do not affect access to the 3130 directory, and AUDIT and ALARM ACEs with this bit set do not 3131 trigger log or alarm events. Such ACEs only take effect once they 3132 are applied (with this bit cleared) to newly created files and 3133 directories as specified by the above two flags. 3134 If this flag is present on an ACE, but neither 3135 ACE4_DIRECTORY_INHERIT_ACE nor ACE4_FILE_INHERIT_ACE is present, 3136 then an operation attempting to set such an attribute SHOULD fail 3137 with NFS4ERR_ATTRNOTSUPP. 3139 ACE4_NO_PROPAGATE_INHERIT_ACE 3140 Can be placed on a directory. This flag tells the server that 3141 inheritance of this ACE should stop at newly created child 3142 directories. 3144 ACE4_SUCCESSFUL_ACCESS_ACE_FLAG 3146 ACE4_FAILED_ACCESS_ACE_FLAG 3147 The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and 3148 ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits may be set only on 3149 ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE 3150 (ALARM) ACE types. If during the processing of the file's ACL, 3151 the server encounters an AUDIT or ALARM ACE that matches the 3152 principal attempting the OPEN, the server notes that fact, and the 3153 presence, if any, of the SUCCESS and FAILED flags encountered in 3154 the AUDIT or ALARM ACE. Once the server completes the ACL 3155 processing, it then notes if the operation succeeded or failed. 
3156 If the operation succeeded, and if the SUCCESS flag was set for a 3157 matching AUDIT or ALARM ACE, then the appropriate AUDIT or ALARM 3158 event occurs. If the operation failed, and if the FAILED flag was 3159 set for the matching AUDIT or ALARM ACE, then the appropriate 3160 AUDIT or ALARM event occurs. Either or both of the SUCCESS or 3161 FAILED can be set, but if neither is set, the AUDIT or ALARM ACE 3162 is not useful. 3164 The previously described processing applies to ACCESS operations 3165 even when they return NFS4_OK. For the purposes of AUDIT and 3166 ALARM, we consider an ACCESS operation to be a "failure" if it 3167 fails to return a bit that was requested and supported. 3169 ACE4_IDENTIFIER_GROUP 3170 Indicates that the "who" refers to a GROUP as defined under UNIX 3171 or a GROUP ACCOUNT as defined under Windows. Clients and servers 3172 MUST ignore the ACE4_IDENTIFIER_GROUP flag on ACEs with a who 3173 value equal to one of the special identifiers outlined in 3174 Section 6.2.1.5. 3176 6.2.1.5. ACE Who 3178 The "who" field of an ACE is an identifier that specifies the 3179 principal or principals to whom the ACE applies. It may refer to a 3180 user or a group, with the flag bit ACE4_IDENTIFIER_GROUP specifying 3181 which. 3183 There are several special identifiers which need to be understood 3184 universally, rather than in the context of a particular DNS domain. 3185 Some of these identifiers cannot be understood when an NFS client 3186 accesses the server, but have meaning when a local process accesses 3187 the file. The ability to display and modify these permissions is 3188 permitted over NFS, even if none of the access methods on the server 3189 understands the identifiers. 3191 +---------------+--------------------------------------------------+ 3192 | Who | Description | 3193 +---------------+--------------------------------------------------+ 3194 | OWNER | The owner of the file | 3195 | GROUP | The group associated with the file. 
| 3196 | EVERYONE | The world, including the owner and owning group. | 3197 | INTERACTIVE | Accessed from an interactive terminal. | 3198 | NETWORK | Accessed via the network. | 3199 | DIALUP | Accessed as a dialup user to the server. | 3200 | BATCH | Accessed from a batch job. | 3201 | ANONYMOUS | Accessed without any authentication. | 3202 | AUTHENTICATED | Any authenticated user (opposite of ANONYMOUS) | 3203 | SERVICE | Access from a system service. | 3204 +---------------+--------------------------------------------------+ 3206 Table 4 3208 To avoid conflict, these special identifiers are distinguished by an 3209 appended "@" and should appear in the form "xxxx@" (with no domain 3210 name after the "@"). For example: ANONYMOUS@. 3212 The ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries with these 3213 special identifiers. When encoding entries with these special 3214 identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to zero. 3216 6.2.1.5.1. Discussion of EVERYONE@ 3218 It is important to note that "EVERYONE@" is not equivalent to the 3219 UNIX "other" entity. This is because, by definition, UNIX "other" 3220 does not include the owner or owning group of a file. "EVERYONE@" 3221 means literally everyone, including the owner or owning group. 3223 6.2.2. Attribute 33: mode 3225 The NFSv4.0 mode attribute is based on the UNIX mode bits. 
The 3226 following bits are defined: 3228 const MODE4_SUID = 0x800; /* set user id on execution */ 3229 const MODE4_SGID = 0x400; /* set group id on execution */ 3230 const MODE4_SVTX = 0x200; /* save text even after use */ 3231 const MODE4_RUSR = 0x100; /* read permission: owner */ 3232 const MODE4_WUSR = 0x080; /* write permission: owner */ 3233 const MODE4_XUSR = 0x040; /* execute permission: owner */ 3234 const MODE4_RGRP = 0x020; /* read permission: group */ 3235 const MODE4_WGRP = 0x010; /* write permission: group */ 3236 const MODE4_XGRP = 0x008; /* execute permission: group */ 3237 const MODE4_ROTH = 0x004; /* read permission: other */ 3238 const MODE4_WOTH = 0x002; /* write permission: other */ 3239 const MODE4_XOTH = 0x001; /* execute permission: other */ 3241 Bits MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR apply to the principal 3242 identified in the owner attribute. Bits MODE4_RGRP, MODE4_WGRP, and 3243 MODE4_XGRP apply to principals identified in the owner_group 3244 attribute but who are not identified in the owner attribute. Bits 3245 MODE4_ROTH, MODE4_WOTH, MODE4_XOTH apply to any principal that does 3246 not match that in the owner attribute, and does not have a group 3247 matching that of the owner_group attribute. 3249 Bits within the mode other than those specified above are not defined 3250 by this protocol. A server MUST NOT return bits other than those 3251 defined above in a GETATTR or READDIR operation, and it MUST return 3252 NFS4ERR_INVAL if bits other than those defined above are set in a 3253 SETATTR, CREATE, OPEN, VERIFY or NVERIFY operation. 3255 6.3. Common Methods 3257 The requirements in this section will be referred to in future 3258 sections, especially Section 6.4. 3260 6.3.1. Interpreting an ACL 3262 6.3.1.1. Server Considerations 3264 The server uses the algorithm described in Section 6.2.1 to determine 3265 whether an ACL allows access to an object. However, the ACL may not 3266 be the sole determiner of access. 
For example: 3268 o In the case of a file system exported as read-only, the server may 3269 deny write permissions even though an object's ACL grants them. 3271 o Server implementations MAY grant ACE4_WRITE_ACL and ACE4_READ_ACL 3272 permissions to prevent a situation from arising in which there is 3273 no valid way to ever modify the ACL. 3275 o All servers will allow a user the ability to read the data of the 3276 file when only the execute permission is granted (i.e., if the ACL 3277 denies the user the ACE4_READ_DATA access and allows the user 3278 ACE4_EXECUTE, the server will allow the user to read the data of 3279 the file). 3281 o Many servers have the notion of owner-override in which the owner 3282 of the object is allowed to override accesses that are denied by 3283 the ACL. This may be helpful, for example, to allow users 3284 continued access to open files on which the permissions have 3285 changed. 3287 o Many servers have the notion of a "superuser" that has privileges 3288 beyond an ordinary user. The superuser may be able to read or 3289 write data or metadata in ways that would not be permitted by the 3290 ACL. 3292 6.3.1.2. Client Considerations 3294 Clients SHOULD NOT do their own access checks based on their 3295 interpretation of the ACL, but rather use the OPEN and ACCESS operations 3296 to do access checks. This allows the client to act on the results of 3297 having the server determine whether or not access should be granted 3298 based on its interpretation of the ACL. 3300 Clients must be aware of situations in which an object's ACL will 3301 define a certain access even though the server will not enforce it. 3302 In general, but especially in these situations, the client needs to 3303 do its part in the enforcement of access as defined by the ACL.
To 3304 do this, the client MAY send the appropriate ACCESS operation prior 3305 to servicing the request of the user or application in order to 3306 determine whether the user or application should be granted the 3307 access requested. For examples in which the ACL may define accesses 3308 that the server doesn't enforce see Section 6.3.1.1. 3310 6.3.2. Computing a Mode Attribute from an ACL 3312 The following method can be used to calculate the MODE4_R*, MODE4_W* 3313 and MODE4_X* bits of a mode attribute, based upon an ACL. 3315 First, for each of the special identifiers OWNER@, GROUP@, and 3316 EVERYONE@, evaluate the ACL in order, considering only ALLOW and DENY 3317 ACEs for the identifier EVERYONE@ and for the identifier under 3318 consideration. The result of the evaluation will be an NFSv4 ACL 3319 mask showing exactly which bits are permitted to that identifier. 3321 Then translate the calculated mask for OWNER@, GROUP@, and EVERYONE@ 3322 into mode bits for, respectively, the user, group, and other, as 3323 follows: 3325 1. Set the read bit (MODE4_RUSR, MODE4_RGRP, or MODE4_ROTH) if and 3326 only if ACE4_READ_DATA is set in the corresponding mask. 3328 2. Set the write bit (MODE4_WUSR, MODE4_WGRP, or MODE4_WOTH) if and 3329 only if ACE4_WRITE_DATA and ACE4_APPEND_DATA are both set in the 3330 corresponding mask. 3332 3. Set the execute bit (MODE4_XUSR, MODE4_XGRP, or MODE4_XOTH), if 3333 and only if ACE4_EXECUTE is set in the corresponding mask. 3335 6.3.2.1. Discussion 3337 Some server implementations also add bits permitted to named users 3338 and groups to the group bits (MODE4_RGRP, MODE4_WGRP, and 3339 MODE4_XGRP). 3341 Implementations are discouraged from doing this, because it has been 3342 found to cause confusion for users who see members of a file's group 3343 denied access that the mode bits appear to allow. (The presence of 3344 DENY ACEs may also lead to such behavior, but DENY ACEs are expected 3345 to be more rarely used.) 
3347 The same user confusion seen when fetching the mode also results if 3348 setting the mode does not effectively control permissions for the 3349 owner, group, and other users; this motivates some of the 3350 requirements that follow. 3352 6.4. Requirements 3354 The server that supports both mode and ACL must take care to 3355 synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the 3356 ACEs which have respective who fields of "OWNER@", "GROUP@", and 3357 "EVERYONE@" so that the client can see semantically equivalent access 3358 permissions exist whether the client asks for owner, owner_group and 3359 mode attributes, or for just the ACL. 3361 In this section, much is made of the methods in Section 6.3.2. Many 3362 requirements refer to this section. But note that the methods have 3363 behaviors specified with "SHOULD". This is intentional, to avoid 3364 invalidating existing implementations that compute the mode according 3365 to the withdrawn POSIX ACL draft (1003.1e draft 17), rather than by 3366 actual permissions on owner, group, and other. 3368 6.4.1. Setting the mode and/or ACL Attributes 3370 6.4.1.1. Setting mode and not ACL 3372 When any of the nine low-order mode bits are subject to change, 3373 either because the mode attribute was set or because the 3374 mode_set_masked attribute was set and the mask included one or more 3375 bits from the nine low-order mode bits, and no ACL attribute is 3376 explicitly set, the acl attribute must be modified in accordance with 3377 the updated value of those bits. This must happen even if the value 3378 of the low-order bits is the same after the mode is set as before. 3380 Note that any AUDIT or ALARM ACEs are unaffected by changes to the 3381 mode. 
3383 In cases in which the permissions bits are subject to change, the acl 3384 attribute MUST be modified such that the mode computed via the method 3385 in Section 6.3.2 yields the low-order nine bits (MODE4_R*, MODE4_W*, 3386 MODE4_X*) of the mode attribute as modified by the attribute change. 3387 The ACL attributes SHOULD also be modified such that: 3389 1. If MODE4_RGRP is not set, entities explicitly listed in the ACL 3390 other than OWNER@ and EVERYONE@ SHOULD NOT be granted 3391 ACE4_READ_DATA. 3393 2. If MODE4_WGRP is not set, entities explicitly listed in the ACL 3394 other than OWNER@ and EVERYONE@ SHOULD NOT be granted 3395 ACE4_WRITE_DATA or ACE4_APPEND_DATA. 3397 3. If MODE4_XGRP is not set, entities explicitly listed in the ACL 3398 other than OWNER@ and EVERYONE@ SHOULD NOT be granted 3399 ACE4_EXECUTE. 3401 Access mask bits other than those listed above, appearing in ALLOW ACEs, 3402 MAY also be disabled. 3404 Note that ACEs with the flag ACE4_INHERIT_ONLY_ACE set do not affect 3405 the permissions of the ACL itself, nor do ACEs of the type AUDIT and 3406 ALARM. As such, it is desirable to leave these ACEs unmodified when 3407 modifying the ACL attributes. 3409 Also note that the requirement may be met by discarding the acl in 3410 favor of an ACL that represents the mode and only the mode. This is 3411 permitted, but it is preferable for a server to preserve as much of 3412 the ACL as possible without violating the above requirements. 3413 Discarding the ACL makes it effectively impossible for a file created 3414 with a mode attribute to inherit an ACL (see Section 6.4.3). 3416 6.4.1.2. Setting ACL and not mode 3418 When setting the acl and not setting the mode or mode_set_masked 3419 attributes, the permission bits of the mode need to be derived from 3420 the ACL. In this case, the ACL attribute SHOULD be set as given.
3421 The nine low-order bits of the mode attribute (MODE4_R*, MODE4_W*, 3422 MODE4_X*) MUST be modified to match the result of the method in 3423 Section 6.3.2. The three high-order bits of the mode (MODE4_SUID, 3424 MODE4_SGID, MODE4_SVTX) SHOULD remain unchanged. 3426 6.4.1.3. Setting both ACL and mode 3428 When setting both the mode (which includes use of either the mode attribute 3429 or the mode_set_masked attribute) and the acl attribute in the same 3430 operation, the attributes MUST be applied in this order: mode (or 3431 mode_set_masked), then ACL. The mode-related attribute is set as 3432 given, then the ACL attribute is set as given, possibly changing the 3433 final mode, as described above in Section 6.4.1.2. 3435 6.4.2. Retrieving the mode and/or ACL Attributes 3437 This section applies only to servers that support both the mode and 3438 ACL attributes. 3440 Some server implementations may have a concept of "objects without 3441 ACLs", meaning that all permissions are granted and denied according 3442 to the mode attribute, and that no ACL attribute is stored for that 3443 object. If an ACL attribute is requested of such a server, the 3444 server SHOULD return an ACL that does not conflict with the mode; 3445 that is to say, the ACL returned SHOULD represent the nine low-order 3446 bits of the mode attribute (MODE4_R*, MODE4_W*, MODE4_X*) as 3447 described in Section 6.3.2. 3449 For other server implementations, the ACL attribute is always present 3450 for every object. Such servers SHOULD store at least the three high- 3451 order bits of the mode attribute (MODE4_SUID, MODE4_SGID, 3452 MODE4_SVTX). The server SHOULD return a mode attribute if one is 3453 requested, and the low-order nine bits of the mode (MODE4_R*, 3454 MODE4_W*, MODE4_X*) MUST match the result of applying the method in 3455 Section 6.3.2 to the ACL attribute. 3457 6.4.3.
Creating New Objects 3459 If a server supports any ACL attributes, it may use the ACL 3460 attributes on the parent directory to compute an initial ACL 3461 attribute for a newly created object. This will be referred to as 3462 the inherited ACL within this section. The act of adding one or more 3463 ACEs to the inherited ACL that are based upon ACEs in the parent 3464 directory's ACL will be referred to as inheriting an ACE within this 3465 section. 3467 Implementors should standardize on what the behavior of CREATE and 3468 OPEN must be depending on the presence or absence of the mode and ACL 3469 attributes. 3471 1. If just the mode is given in the call: 3473 In this case, inheritance SHOULD take place, but the mode MUST be 3474 applied to the inherited ACL as described in Section 6.4.1.1, 3475 thereby modifying the ACL. 3477 2. If just the ACL is given in the call: 3479 In this case, inheritance SHOULD NOT take place, and the ACL as 3480 defined in the CREATE or OPEN will be set without modification, 3481 and the mode modified as in Section 6.4.1.2. 3483 3. If both mode and ACL are given in the call: 3485 In this case, inheritance SHOULD NOT take place, and both 3486 attributes will be set as described in Section 6.4.1.3. 3488 4. If neither mode nor ACL is given in the call: 3490 In the case where an object is being created without any initial 3491 attributes at all, e.g., an OPEN operation with an opentype4 of 3492 OPEN4_CREATE and a createmode4 of EXCLUSIVE4, inheritance SHOULD 3493 NOT take place. Instead, the server SHOULD set permissions to 3494 deny all access to the newly created object. It is expected that 3495 the appropriate client will set the desired attributes in a 3496 subsequent SETATTR operation, and the server SHOULD allow that 3497 operation to succeed, regardless of what permissions the object 3498 is created with.
For example, an empty ACL denies all 3499 permissions, but the server should allow the owner's SETATTR to 3500 succeed even though WRITE_ACL is implicitly denied. 3502 In other cases, inheritance SHOULD take place, and no 3503 modifications to the ACL will happen. The mode attribute, if 3504 supported, MUST be as computed in Section 6.3.2, with the 3505 MODE4_SUID, MODE4_SGID and MODE4_SVTX bits clear. If no 3506 inheritable ACEs exist on the parent directory, the rules for 3507 creating acl attributes are implementation defined. 3509 6.4.3.1. The Inherited ACL 3511 If the object being created is not a directory, the inherited ACL 3512 SHOULD NOT inherit ACEs from the parent directory ACL unless the 3513 ACE4_FILE_INHERIT_ACE flag is set. 3515 If the object being created is a directory, the inherited ACL should 3516 inherit all inheritable ACEs from the parent directory, that is, those that 3517 have the ACE4_FILE_INHERIT_ACE or ACE4_DIRECTORY_INHERIT_ACE flag set. 3518 If the inheritable ACE has ACE4_FILE_INHERIT_ACE set, but 3519 ACE4_DIRECTORY_INHERIT_ACE is clear, the inherited ACE on the newly 3520 created directory MUST have the ACE4_INHERIT_ONLY_ACE flag set to 3521 prevent the directory from being affected by ACEs meant for non- 3522 directories. 3524 When a new directory is created, the server MAY split any inherited 3525 ACE which is both inheritable and effective (in other words, which 3526 has neither ACE4_INHERIT_ONLY_ACE nor ACE4_NO_PROPAGATE_INHERIT_ACE 3527 set), into two ACEs, one with no inheritance flags, and one with 3528 ACE4_INHERIT_ONLY_ACE set. This makes it simpler to modify the 3529 effective permissions on the directory without modifying the ACE 3530 which is to be inherited to the new directory's children. 3532 7. Multi-Server Namespace 3534 NFSv4 supports attributes that allow a namespace to extend beyond the 3535 boundaries of a single server. It is RECOMMENDED that clients and 3536 servers support construction of such multi-server namespaces.
Use of 3537 such multi-server namespaces is OPTIONAL, however, and for many 3538 purposes, single-server namespaces are perfectly acceptable. Use of 3539 multi-server namespaces can provide many advantages, however, by 3540 separating a file system's logical position in a namespace from the 3541 (possibly changing) logistical and administrative considerations that 3542 result in particular file systems being located on particular 3543 servers. 3545 7.1. Location Attributes 3547 NFSv4 contains RECOMMENDED attributes that allow file systems on one 3548 server to be associated with one or more instances of that file 3549 system on other servers. These attributes specify such file system 3550 instances by specifying a server address target (either as a DNS name 3551 representing one or more IP addresses or as a literal IP address) 3552 together with the path of that file system within the associated 3553 single-server namespace. 3555 The fs_locations RECOMMENDED attribute allows specification of the 3556 file system locations where the data corresponding to a given file 3557 system may be found. 3559 7.2. File System Presence or Absence 3561 A given location in an NFSv4 namespace (typically but not necessarily 3562 a multi-server namespace) can have a number of file system instance 3563 locations associated with it via the fs_locations attribute. There 3564 may also be an actual current file system at that location, 3565 accessible via normal namespace operations (e.g., LOOKUP). In this 3566 case, the file system is said to be "present" at that position in the 3567 namespace, and clients will typically use it, reserving use of 3568 additional locations specified via the location-related attributes to 3569 situations in which the principal location is no longer available. 3571 When there is no actual file system at the namespace location in 3572 question, the file system is said to be "absent". 
An absent file 3573 system contains no files or directories other than the root. Any 3574 reference to it, except to access a small set of attributes useful in 3575 determining alternate locations, will result in an error, 3576 NFS4ERR_MOVED. Note that if the server ever returns the error 3577 NFS4ERR_MOVED, it MUST support the fs_locations attribute. 3579 While the error name suggests that we have a case of a file system 3580 that once was present, and has only become absent later, this is only 3581 one possibility. A position in the namespace may be permanently 3582 absent with the set of file system(s) designated by the location 3583 attributes being the only realization. The name NFS4ERR_MOVED 3584 reflects an earlier, more limited conception of its function, but 3585 this error will be returned whenever the referenced file system is 3586 absent, whether it has moved or not. 3588 Except in the case of GETATTR-type operations (to be discussed 3589 later), when the current filehandle at the start of an operation is 3590 within an absent file system, that operation is not performed and the 3591 error NFS4ERR_MOVED is returned, to indicate that the file system is 3592 absent on the current server. 3594 Because a GETFH cannot succeed if the current filehandle is within an 3595 absent file system, filehandles within an absent file system cannot 3596 be transferred to the client. When a client does have filehandles 3597 within an absent file system, it is the result of obtaining them when 3598 the file system was present, and having the file system become absent 3599 subsequently. 3601 It should be noted that because the check for the current filehandle 3602 being within an absent file system happens at the start of every 3603 operation, operations that change the current filehandle so that it 3604 is within an absent file system will not result in an error. 
This 3605 allows such combinations as PUTFH-GETATTR and LOOKUP-GETATTR to be 3606 used to get attribute information, particularly location attribute 3607 information, as discussed below. 3609 7.3. Getting Attributes for an Absent File System 3611 When a file system is absent, most attributes are not available, but 3612 it is necessary to allow the client access to the small set of 3613 attributes that are available, and most particularly that which gives 3614 information about the correct current locations for this file system, 3615 fs_locations. 3617 7.3.1. GETATTR Within an Absent File System 3619 As mentioned above, an exception is made for GETATTR in that 3620 attributes may be obtained for a filehandle within an absent file 3621 system. This exception only applies if the attribute mask contains 3622 at least the fs_locations attribute bit, which indicates the client 3623 is interested in a result regarding an absent file system. If it is 3624 not requested, GETATTR will result in an NFS4ERR_MOVED error. 3626 When a GETATTR is done on an absent file system, the set of supported 3627 attributes is very limited. Many attributes, including those that 3628 are normally REQUIRED, will not be available on an absent file 3629 system. In addition to the fs_locations attribute, the following 3630 attributes SHOULD be available on absent file systems. In the case 3631 of RECOMMENDED attributes, they should be available at least to the 3632 same degree that they are available on present file systems. 3634 fsid: This attribute should be provided so that the client can 3635 determine file system boundaries, including, in particular, the 3636 boundary between present and absent file systems. This value must 3637 be different from any other fsid on the current server and need 3638 have no particular relationship to fsids on any particular 3639 destination to which the client might be directed. 
3641 mounted_on_fileid: For objects at the top of an absent file system, 3642 this attribute needs to be available. Since the fileid is within 3643 the present parent file system, there should be no need to 3644 reference the absent file system to provide this information. 3646 Other attributes SHOULD NOT be made available for absent file 3647 systems, even when it is possible to provide them. The server should 3648 not assume that more information is always better and should avoid 3649 gratuitously providing additional information. 3651 When a GETATTR operation includes a bit mask for the attribute 3652 fs_locations, but where the bit mask includes attributes that are not 3653 supported, GETATTR will not return an error, but will return the mask 3654 of the actual attributes supported with the results. 3656 Handling of VERIFY/NVERIFY is similar to GETATTR in that if the 3657 attribute mask does not include fs_locations the error NFS4ERR_MOVED 3658 will result. It differs in that any appearance in the attribute mask 3659 of an attribute not supported for an absent file system (and note 3660 that this will include some normally REQUIRED attributes) will also 3661 cause an NFS4ERR_MOVED result. 3663 7.3.2. READDIR and Absent File Systems 3665 A READDIR performed when the current filehandle is within an absent 3666 file system will result in an NFS4ERR_MOVED error, since, unlike the 3667 case of GETATTR, no such exception is made for READDIR. 3669 Attributes for an absent file system may be fetched via a READDIR for 3670 a directory in a present file system, when that directory contains 3671 the root directories of one or more absent file systems. In this 3672 case, the handling is as follows: 3674 o If the attribute set requested includes fs_locations, then 3675 fetching of attributes proceeds normally and no NFS4ERR_MOVED 3676 indication is returned, even when the rdattr_error attribute is 3677 requested. 
3679 o If the attribute set requested does not include fs_locations, then 3680 if the rdattr_error attribute is requested, each directory entry 3681 for the root of an absent file system will report NFS4ERR_MOVED as 3682 the value of the rdattr_error attribute. 3684 o If the attribute set requested does not include either of the 3685 attributes fs_locations or rdattr_error then the occurrence of the 3686 root of an absent file system within the directory will result in 3687 the READDIR failing with an NFS4ERR_MOVED error. 3689 o The unavailability of an attribute because of a file system's 3690 absence, even one that is ordinarily REQUIRED, does not result in 3691 any error indication. The set of attributes returned for the root 3692 directory of the absent file system in that case is simply 3693 restricted to those actually available. 3695 7.4. Uses of Location Information 3697 The location-bearing attribute of fs_locations provides, together 3698 with the possibility of absent file systems, a number of important 3699 facilities in providing reliable, manageable, and scalable data 3700 access. 3702 When a file system is present, these attributes can provide 3703 alternative locations, to be used to access the same data, in the 3704 event of server failures, communications problems, or other 3705 difficulties that make continued access to the current file system 3706 impossible or otherwise impractical. Under some circumstances, 3707 multiple alternative locations may be used simultaneously to provide 3708 higher-performance access to the file system in question. Provision 3709 of such alternate locations is referred to as "replication" although 3710 there are cases in which replicated sets of data are not in fact 3711 present, and the replicas are instead different paths to the same 3712 data. 3714 When a file system is present and becomes absent, clients can be 3715 given the opportunity to have continued access to their data, at an 3716 alternate location. 
In this case, a continued attempt to use the 3717 data in the now-absent file system will result in an NFS4ERR_MOVED 3718 error and, at that point, the successor locations (typically only one 3719 although multiple choices are possible) can be fetched and used to 3720 continue access. Transfer of the file system contents to the new 3721 location is referred to as "migration", but it should be kept in mind 3722 that there are cases in which this term can be used, like 3723 "replication", when there is no actual data migration per se. 3725 Where a file system was not previously present, specification of file 3726 system location provides a means by which file systems located on one 3727 server can be associated with a namespace defined by another server, 3728 thus allowing a general multi-server namespace facility. A 3729 designation of such a location, in place of an absent file system, is 3730 called a "referral". 3732 Because client support for location-related attributes is OPTIONAL, a 3733 server may (but is not required to) take action to hide migration and 3734 referral events from such clients, by acting as a proxy, for example. 3736 7.4.1. File System Replication 3738 The fs_locations attribute provides alternative locations, to be used 3739 to access data in place of or in addition to the current file system 3740 instance. On first access to a file system, the client should obtain 3741 the value of the set of alternate locations by interrogating the 3742 fs_locations attribute. 3744 In the event that server failures, communications problems, or other 3745 difficulties make continued access to the current file system 3746 impossible or otherwise impractical, the client can use the alternate 3747 locations as a way to get continued access to its data. Multiple 3748 locations may be used simultaneously, to provide higher performance 3749 through the exploitation of multiple paths between client and target 3750 file system. 
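The use of alternate locations described in Section 7.4.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not a client design: the function names, the is_reachable callback, and the representation of fs_locations as a list of (server list, root path) pairs are all hypothetical, and the root path is shown as a plain string for brevity.

```python
# Hypothetical sketch; names and data layout are illustrative only.

def candidate_targets(fs_locations):
    """Flatten location entries into (server, rootpath) pairs, in order."""
    return [(server, rootpath)
            for servers, rootpath in fs_locations
            for server in servers]


def select_alternate(fs_locations, is_reachable, current=None):
    """Return the first reachable target other than the current one, or None.

    is_reachable stands in for whatever check a client uses to decide
    that a location can be accessed.
    """
    for target in candidate_targets(fs_locations):
        if target != current and is_reachable(target):
            return target
    return None
```

A client might call select_alternate with the value of fs_locations it cached on first access, passing the instance it is currently using as current.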
3752 The alternate locations may be physical replicas of the (typically 3753 read-only) file system data, or they may reflect alternate paths to 3754 the same server or provide for the use of various forms of server 3755 clustering in which multiple servers provide alternate ways of 3756 accessing the same physical file system. How these different modes 3757 of file system transition are represented within the fs_locations 3758 attribute and how the client deals with file system transition issues 3759 will be discussed in detail below. 3761 Multiple server addresses, whether they are derived from a single 3762 entry with a DNS name representing a set of IP addresses or from 3763 multiple entries each with its own server address, may correspond to 3764 the same actual server. 3766 7.4.2. File System Migration 3768 When a file system is present and becomes absent, clients can be 3769 given the opportunity to have continued access to their data, at an 3770 alternate location, as specified by the fs_locations attribute. 3771 Typically, a client will be accessing the file system in question, 3772 get an NFS4ERR_MOVED error, and then use the fs_locations attribute 3773 to determine the new location of the data. 3775 Such migration can be helpful in providing load balancing or general 3776 resource reallocation. The protocol does not specify how the file 3777 system will be moved between servers. It is anticipated that a 3778 number of different server-to-server transfer mechanisms might be 3779 used with the choice left to the server implementor. The NFSv4 3780 protocol specifies the method used to communicate the migration event 3781 between client and server. 3783 The new location may be an alternate communication path to the same 3784 server or, in the case of various forms of server clustering, another 3785 server providing access to the same physical file system. 
The 3786 client's responsibilities in dealing with this transition depend on 3787 the specific nature of the new access path as well as how and whether 3788 data was in fact migrated. These issues will be discussed in detail 3789 below. 3791 When an alternate location is designated as the target for migration, 3792 it must designate the same data. Where file systems are writable, a 3793 change made on the original file system must be visible on all 3794 migration targets. Where a file system is not writable but 3795 represents a read-only copy (possibly periodically updated) of a 3796 writable file system, similar requirements apply to the propagation 3797 of updates. Any change visible in the original file system must 3798 already be effected on all migration targets, to avoid any 3799 possibility that a client, in effecting a transition to the migration 3800 target, will see any reversion in file system state. 3802 7.4.3. Referrals 3804 Referrals provide a way of placing a file system in a location within 3805 the namespace essentially without respect to its physical location on 3806 a given server. This allows a single server or a set of servers to 3807 present a multi-server namespace that encompasses file systems 3808 located on multiple servers. Some likely uses of this include 3809 establishment of site-wide or organization-wide namespaces, or even 3810 knitting such together into a truly global namespace. 3812 Referrals occur when a client determines, upon first referencing a 3813 position in the current namespace, that it is part of a new file 3814 system and that the file system is absent. When this occurs, 3815 typically by receiving the error NFS4ERR_MOVED, the actual location 3816 or locations of the file system can be determined by fetching the 3817 fs_locations attribute. 3819 The locations-related attribute may designate a single file system 3820 location or multiple file system locations, to be selected based on 3821 the needs of the client. 
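The referral sequence described above (an access fails with NFS4ERR_MOVED, the client fetches fs_locations, and then selects a location) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the access, fetch_fs_locations, and choose callbacks are hypothetical, and the hop limit is an assumption added here to keep the loop bounded, not something the protocol specifies.

```python
# Hypothetical sketch; callbacks and the hop limit are illustrative only.
NFS4_OK = 0
NFS4ERR_MOVED = 10019


def access_with_referrals(target, access, fetch_fs_locations, choose, max_hops=4):
    """Try target; on NFS4ERR_MOVED, fetch locations and retry elsewhere.

    Returns the (target, status) pair of the last attempt.
    """
    for _ in range(max_hops):
        status = access(target)
        if status != NFS4ERR_MOVED:
            return target, status
        locations = fetch_fs_locations(target)
        if not locations:
            # Nothing to follow; report the absence to the caller.
            return target, status
        target = choose(locations)
    return target, NFS4ERR_MOVED
```

The choose callback reflects the statement that a location is selected based on the needs of the client; a simple client might always take the first entry.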
3823 Use of multi-server namespaces is enabled by NFSv4 but is not 3824 required. The use of multi-server namespaces and their scope will 3825 depend on the applications used and system administration 3826 preferences. 3828 Multi-server namespaces can be established by a single server 3829 providing a large set of referrals to all of the included file 3830 systems. Alternatively, a single multi-server namespace may be 3831 administratively segmented with separate referral file systems (on 3832 separate servers) for each separately administered portion of the 3833 namespace. The top-level referral file system or any segment may use 3834 replicated referral file systems for higher availability. 3836 Generally, multi-server namespaces are for the most part uniform, in 3837 that the same data made available to one client at a given location 3838 in the namespace is made available to all clients at that location. 3840 7.5. Location Entries and Server Identity 3842 As mentioned above, a single location entry may have a server address 3843 target in the form of a DNS name that may represent multiple IP 3844 addresses, while multiple location entries may have their own server 3845 address targets that reference the same server. 3847 When multiple addresses for the same server exist, the client may 3848 assume that for each file system in the namespace of a given server 3849 network address, there exist file systems at corresponding namespace 3850 locations for each of the other server network addresses. It may do 3851 this even in the absence of explicit listing in fs_locations. Such 3852 corresponding file system locations can be used as alternate 3853 locations, just as those explicitly specified via the fs_locations 3854 attribute. 3856 If a single location entry designates multiple server IP addresses, 3857 the client cannot assume that these addresses are multiple paths to 3858 the same server. 
In most cases, they will be, but the client MUST 3859 verify that before acting on that assumption. When two server 3860 addresses are designated by a single location entry and they 3861 correspond to different servers, this normally indicates some sort of 3862 misconfiguration, and so the client should avoid using such location 3863 entries when alternatives are available. When they are not, clients 3864 should pick one of the IP addresses and use it, without using others that 3865 are not directed to the same server. 3867 7.6. Additional Client-Side Considerations 3869 When clients make use of servers that implement referrals, 3870 replication, and migration, care should be taken that a user who 3871 mounts a given file system that includes a referral or a relocated 3872 file system continues to see a coherent picture of that user-side 3873 file system despite the fact that it contains a number of server-side 3874 file systems that may be on different servers. 3876 One important issue is upward navigation from the root of a server- 3877 side file system to its parent (specified as ".." in UNIX), in the 3878 case in which it transitions to that file system as a result of 3879 referral, migration, or a transition as a result of replication. 3880 When the client is at such a point, and it needs to ascend to the 3881 parent, it must go back to the parent as seen within the multi-server 3882 namespace rather than sending a LOOKUPP operation to the server, 3883 which would result in the parent within that server's single-server 3884 namespace. In order to do this, the client needs to remember the 3885 filehandles that represent such file system roots and use these 3886 instead of issuing a LOOKUPP operation to the current server. This 3887 will allow the client to present to applications a consistent 3888 namespace, where upward navigation and downward navigation are 3889 consistent. 3891 Another issue concerns refresh of referral locations.
When referrals 3892 are used extensively, they may change as server configurations 3893 change. It is expected that clients will cache information related 3894 to traversing referrals so that future client-side requests are 3895 resolved locally without server communication. This is usually 3896 rooted in client-side name lookup caching. Clients should 3897 periodically purge this data for referral points in order to detect 3898 changes in location information. 3900 A problem exists if a client allows an open owner to have state on 3901 multiple file systems on a server. If one of those file systems is 3902 migrated, what happens to the sequence numbers? A client can avoid 3903 such a situation with the stipulation that any client which supports 3904 migration MUST ensure that any open owner is confined to a single 3905 file system. If the server finds itself migrating open owners that 3906 span multiple file systems, then it MUST NOT migrate the state for the 3907 conflicting open owners on the non-migrated file systems; instead it 3908 MUST return NFS4ERR_STALE_STATEID if the client tries to use those 3909 stateids. 3911 7.7. Effecting File System Transitions 3913 Transitions between file system instances, whether due to switching 3914 between replicas upon server unavailability or to server-initiated 3915 migration events, are best dealt with together. This is so even 3916 though, for the server, pragmatic considerations will normally force 3917 different implementation strategies for planned and unplanned 3918 transitions. Even though the prototypical use cases of replication 3919 and migration contain distinctive sets of features, when all 3920 possibilities for these operations are considered, there is an 3921 underlying unity of these operations, from the client's point of 3922 view, that makes treating them together desirable.
3924 A number of methods are possible for servers to replicate data and to 3925 track client state in order to allow clients to transition between 3926 file system instances with a minimum of disruption. Such methods 3927 vary between those that use inter-server clustering techniques to 3928 limit the changes seen by the client, to those that are less 3929 aggressive, use more standard methods of replicating data, and impose 3930 a greater burden on the client to adapt to the transition. 3932 The NFSv4 protocol does not impose choices on clients and servers 3933 with regard to that spectrum of transition methods. In fact, there 3934 are many valid choices, depending on client and application 3935 requirements and their interaction with server implementation 3936 choices. The NFSv4.0 protocol does not provide the servers a means 3937 of communicating the transition methods. In the NFSv4.1 protocol 3938 [31], an additional attribute "fs_locations_info" is presented, which 3939 will define the specific choices that can be made, how these choices 3940 are communicated to the client, and how the client is to deal with 3941 any discontinuities. 3943 In the sections below, references will be made to various possible 3944 server implementation choices as a way of illustrating the transition 3945 scenarios that clients may deal with. The intent here is not to 3946 define or limit server implementations but rather to illustrate the 3947 range of issues that clients may face. Again, as the NFSv4.0 3948 protocol does not have an explicit means of communicating these 3949 issues to the client, the intent is to document the problems that can 3950 be faced in a multi-server namespace and allow the client to use the 3951 inferred transitions available via fs_locations and other attributes 3952 (see Section 7.9.1).
3954 In the discussion below, references will be made to a file system 3955 having a particular property or to two file systems (typically the 3956 source and destination) belonging to a common class of any of several 3957 types. Two file systems that belong to such a class share some 3958 important aspects of file system behavior that clients may depend 3959 upon when present, to easily effect a seamless transition between 3960 file system instances. Conversely, where the file systems do not 3961 belong to such a common class, the client has to deal with various 3962 sorts of implementation discontinuities that may cause performance or 3963 other issues in effecting a transition. 3965 While fs_locations is available, default assumptions with regard to 3966 such classifications have to be inferred (see Section 7.9.1 for 3967 details). 3969 In cases in which one server is expected to accept opaque values from 3970 the client that originated from another server, the servers SHOULD 3971 encode the "opaque" values in big-endian byte order. If this is 3972 done, servers acting as replicas or immigrating file systems will be 3973 able to parse values like stateids, directory cookies, filehandles, 3974 etc., even if their native byte order is different from that of other 3975 servers cooperating in the replication and migration of the file 3976 system. 3978 7.7.1. File System Transitions and Simultaneous Access 3980 When a single file system may be accessed at multiple locations 3981 because of an indication of file system identity as reported 3982 by the fs_locations attribute, the client will, depending on specific 3983 circumstances as discussed below, either: 3985 o Access multiple instances simultaneously, each of which represents 3986 an alternate path to the same data and metadata.
3988 o Access one instance (or set of instances) and then transition to 3989 an alternative instance (or set of instances) as a result of 3990 network issues, server unresponsiveness, or server-directed 3991 migration. 3993 7.7.2. Filehandles and File System Transitions 3995 There are a number of ways in which filehandles can be handled across 3996 a file system transition. These can be divided into two broad 3997 classes depending upon whether the two file systems across which the 3998 transition happens share sufficient state to effect some sort of 3999 continuity of file system handling. 4001 When there is no such cooperation in filehandle assignment, the two 4002 file systems are reported as being in different handle classes. In 4003 this case, all filehandles are assumed to expire as part of the file 4004 system transition. Note that this behavior does not depend on the 4005 fh_expire_type attribute and depends on the specification of the 4006 FH4_VOL_MIGRATION bit. 4008 When there is cooperation in filehandle assignment, the two file 4009 systems are reported as being in the same handle class. In this 4010 case, persistent filehandles remain valid after the file system 4011 transition, while volatile filehandles (excluding those that are only 4012 volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration 4013 on the target server. 4015 7.7.3. Fileids and File System Transitions 4017 The issue of continuity of fileids in the event of a file system 4018 transition needs to be addressed. The general expectation is that in 4019 situations in which the two file system instances are created by a 4020 single vendor using some sort of file system image copy, fileids will 4021 be consistent across the transition, while in the analogous multi- 4022 vendor transitions they will not. This poses difficulties, 4023 especially for the client without special knowledge of the transition 4024 mechanisms adopted by the server.
Note that although fileid is not a 4025 REQUIRED attribute, many servers support fileids and many clients 4026 provide APIs that depend on fileids. 4028 It is important to note that while clients themselves may have no 4029 trouble with a fileid changing as a result of a file system 4030 transition event, applications do typically have access to the fileid 4031 (e.g., via stat). The result is that an application may work 4032 perfectly well if there is no file system instance transition or if 4033 any such transition is among instances created by a single vendor, 4034 yet be unable to deal with the situation in which a multi-vendor 4035 transition occurs at the wrong time. 4037 Providing the same fileids in a multi-vendor (multiple server 4038 vendors) environment has generally been held to be quite difficult. 4039 While there is work to be done, it needs to be pointed out that this 4040 difficulty is partly self-imposed. Servers have typically identified 4041 fileid with inode number, i.e., with a quantity used to find the file 4042 in question. This identification poses special difficulties for 4043 migration of a file system between vendors where assigning the same 4044 index to a given file may not be possible. Note here that a fileid 4045 is not required to be useful to find the file in question, only that 4046 it is unique within the given file system. Servers prepared to 4047 accept a fileid as a single piece of metadata and store it apart from 4048 the value used to index the file information can relatively easily 4049 maintain a fileid value across a migration event, allowing a truly 4050 transparent migration event. 4052 In any case, where servers can provide continuity of fileids, they 4053 should, and the client should be able to find out that such 4054 continuity is available and take appropriate action. 
Information 4055 about the continuity (or lack thereof) of fileids across a file 4056 system transition is represented by specifying whether the file 4057 systems in question are of the same fileid class. 4059 Note that when consistent fileids do not exist across a transition 4060 (either because there is no continuity of fileids or because fileid 4061 is not a supported attribute on one of the instances involved), and there 4062 are no reliable filehandles across a transition event (either because 4063 there is no filehandle continuity or because the filehandles are 4064 volatile), the client is in a position where it cannot verify that 4065 files it was accessing before the transition are the same objects. 4066 It is forced to assume that no object has been renamed, and, unless 4067 there are guarantees that provide this (e.g., the file system is 4068 read-only), problems for applications may occur. Therefore, use of 4069 such configurations should be limited to situations where the 4070 problems that this may cause can be tolerated. 4072 7.7.4. Fsids and File System Transitions 4074 Since fsids are generally only unique on a per-server basis, it 4075 is likely that they will change during a file system transition. 4076 Clients should not make the fsids received from the server visible to 4077 applications since they may not be globally unique, and because they 4078 may change during a file system transition event. Applications are 4079 best served if they are isolated from such transitions to the extent 4080 possible. 4082 7.7.5. The Change Attribute and File System Transitions 4084 Since the change attribute is defined as a server-specific one, 4085 change attributes fetched from one server are normally presumed to be 4086 invalid on another server. Such a presumption is troublesome since 4087 it would invalidate all cached change attributes, requiring 4088 refetching.
Even more disruptive, the absence of any assured 4089 continuity for the change attribute means that even if the same value 4090 is retrieved on refetch, no conclusions can be drawn as to whether 4091 the object in question has changed. The identical change attribute 4092 could be merely an artifact of a modified file with a different 4093 change attribute construction algorithm, with that new algorithm just 4094 happening to result in an identical change value. 4096 When the two file systems have consistent change attribute formats, 4097 and we say that they are in the same change class, the client may 4098 assume a continuity of change attribute construction and handle this 4099 situation just as it would be handled without any file system 4100 transition. 4102 7.7.6. Lock State and File System Transitions 4104 In a file system transition, the client needs to handle cases in 4105 which the two servers have cooperated in state management and in 4106 which they have not. Cooperation by two servers in state management 4107 requires coordination of client IDs. Before the client attempts to 4108 use a client ID associated with one server in a request to the server 4109 of the other file system, it must eliminate the possibility that two 4110 non-cooperating servers have assigned the same client ID by accident. 4112 In the case of migration, the servers involved in the migration of a 4113 file system SHOULD transfer all server state from the original to the 4114 new server. When this is done, it must be done in a way that is 4115 transparent to the client. With replication, such a degree of common 4116 state is typically not the case. 4118 This state transfer will reduce disruption to the client when a file 4119 system transition occurs. If the servers are successful in 4120 transferring all state, the client can attempt to establish sessions 4121 associated with the client ID used for the source file system 4122 instance. 
If the server accepts that as a valid client ID, then the 4123 client may use the existing stateids associated with that client ID 4124 for the old file system instance in connection with that same client 4125 ID in connection with the transitioned file system instance. 4127 File systems cooperating in state management may actually share state 4128 or simply divide the identifier space so as to recognize (and reject 4129 as stale) each other's stateids and client IDs. Servers that do 4130 share state may not do so under all conditions or at all times. If 4131 the server cannot be sure when accepting a client ID that it reflects 4132 the locks the client was given, the server must treat all associated 4133 state as stale and report it as such to the client. 4135 The client must establish a new client ID on the destination, if it 4136 does not have one already, and reclaim locks if allowed by the 4137 server. In this case, old stateids and client IDs should not be 4138 presented to the new server since there is no assurance that they 4139 will not conflict with IDs valid on that server. 4141 When actual locks are not known to be maintained, the destination 4142 server may establish a grace period specific to the given file 4143 system, with non-reclaim locks being rejected for that file system, 4144 even though normal locks are being granted for other file systems. 4146 Clients should not infer the absence of a grace period for file 4147 systems being transitioned to a server from responses to requests for 4148 other file systems. 4150 In the case of lock reclamation for a given file system after a file 4151 system transition, edge conditions can arise similar to those for 4152 reclaim after server restart (although in the case of the planned 4153 state transfer associated with migration, these can be avoided by 4154 securely recording lock state as part of state migration). 
Unless 4155 the destination server can guarantee that locks will not be 4156 incorrectly granted, the destination server should not allow lock 4157 reclaims and should avoid establishing a grace period. (See 4158 Section 9.14 for further details.) 4160 Information about client identity may be propagated between servers 4161 in the form of client_owner4 and associated verifiers, under the 4162 assumption that the client presents the same values to all the 4163 servers with which it deals. 4165 Servers are encouraged to provide facilities to allow locks to be 4166 reclaimed on the new server after a file system transition. Often 4167 such facilities may not be available and the client should be prepared to 4168 re-obtain locks, even though it is possible that the client may have 4169 its LOCK or OPEN request denied due to a conflicting lock. 4171 The consequences of having no facilities available to reclaim locks 4172 on the new server will depend on the type of environment. In some 4173 environments, such as the transition between read-only file systems, 4174 such denial of locks should not pose large difficulties in practice. 4175 When an attempt to re-establish a lock on a new server is denied, the 4176 client should treat the situation as if its original lock had been 4177 revoked. Note that when the lock is granted, the client cannot 4178 assume that no conflicting lock could have been granted in the 4179 interim. Where change attribute continuity is present, the client 4180 may examine the change attribute to check for unwanted file 4181 modifications. Where even this is not available, and the file system 4182 is not read-only, a client may reasonably treat all pending locks as 4183 having been revoked. 4185 7.7.6.1. Transitions and the Lease_time Attribute 4187 In order that the client may appropriately manage its lease in the 4188 case of a file system transition, the destination server must 4189 establish proper values for the lease_time attribute.
4191 When state is transferred transparently, that state should include 4192 the correct value of the lease_time attribute. The lease_time 4193 attribute on the destination server must never be less than that on 4194 the source, since this would result in premature expiration of a 4195 lease granted by the source server. Upon transitions in which state 4196 is transferred transparently, the client is under no obligation to 4197 refetch the lease_time attribute and may continue to use the value 4198 previously fetched (on the source server). 4200 If state has not been transferred transparently because the client ID 4201 is rejected when presented to the new server, the client should fetch 4202 the value of lease_time on the new (i.e., destination) server, and 4203 use it for subsequent locking requests. However, the server must 4204 respect a grace period of at least as long as the lease_time on the 4205 source server, in order to ensure that clients have ample time to 4206 reclaim their lock before potentially conflicting non-reclaimed locks 4207 are granted. 4209 7.7.7. Write Verifiers and File System Transitions 4211 In a file system transition, the two file systems may be clustered in 4212 the handling of unstably written data. When this is the case, and 4213 the two file systems belong to the same write-verifier class, write 4214 verifiers returned from one system may be compared to those returned 4215 by the other and superfluous writes avoided. 4217 When two file systems belong to different write-verifier classes, any 4218 verifier generated by one must not be compared to one provided by the 4219 other. Instead, it should be treated as not equal even when the 4220 values are identical. 4222 7.7.8. Readdir Cookies and Verifiers and File System Transitions 4224 In a file system transition, the two file systems may be consistent 4225 in their handling of READDIR cookies and verifiers. 
When this is the 4226 case, and the two file systems belong to the same readdir class, 4227 READDIR cookies and verifiers from one system may be recognized by 4228 the other and READDIR operations started on one server may be validly 4229 continued on the other, simply by presenting the cookie and verifier 4230 returned by a READDIR operation done on the first file system to the 4231 second. 4233 When two file systems belong to different readdir classes, any 4234 READDIR cookie and verifier generated by one is not valid on the 4235 second, and must not be presented to that server by the client. The 4236 client should act as if the verifier was rejected. 4238 7.7.9. File System Data and File System Transitions 4240 When multiple replicas exist and are used simultaneously or in 4241 succession by a client, applications using them will normally expect 4242 that they contain either the same data or data that is consistent 4243 with the normal sorts of changes that are made by other clients 4244 updating the data of the file system (with metadata being the same to 4245 the degree inferred by the fs_locations attribute). However, when 4246 multiple file systems are presented as replicas of one another, the 4247 precise relationship between the data of one and the data of another 4248 is not, as a general matter, specified by the NFSv4 protocol. It is 4249 quite possible to present as replicas file systems where the data of 4250 those file systems is sufficiently different that some applications 4251 have problems dealing with the transition between replicas. The 4252 namespace will typically be constructed so that applications can 4253 choose an appropriate level of support, so that in one position in 4254 the namespace a varied set of replicas will be listed, while in 4255 another only those that are up-to-date may be considered replicas. 
4256 The protocol does define four special cases of the relationship among 4257 replicas to be specified by the server and relied upon by clients: 4259 o When multiple server addresses correspond to the same actual 4260 server, the client may depend on the fact that changes to data, 4261 metadata, or locks made on one file system are immediately 4262 reflected on others. 4264 o When multiple replicas exist and are used simultaneously by a 4265 client, they must designate the same data. Where file systems are 4266 writable, a change made on one instance must be visible on all 4267 instances, immediately upon the earlier of the return of the 4268 modifying requester or the visibility of that change on any of the 4269 associated replicas. This allows a client to use these replicas 4270 simultaneously without any special adaptation to the fact that 4271 there are multiple replicas. In this case, locks (whether share 4272 reservations or byte-range locks) and delegations obtained on one 4273 replica are immediately reflected on all replicas, even though 4274 these locks will be managed under a set of client IDs. 4276 o When one replica is designated as the successor instance to 4277 another existing instance after the return of NFS4ERR_MOVED (i.e., the 4278 case of migration), the client may depend on the fact that all 4279 changes written to stable storage on the original instance are 4280 written to stable storage of the successor (uncommitted writes are 4281 dealt with in Section 7.7.7). 4283 o Where a file system is not writable but represents a read-only 4284 copy (possibly periodically updated) of a writable file system, 4285 clients have similar requirements with regard to the propagation 4286 of updates.
They may need a guarantee that any change visible on 4287 the original file system instance must be immediately visible on 4288 any replica before the client transitions access to that replica, 4289 in order to avoid any possibility that a client, in effecting a 4290 transition to a replica, will see any reversion in file system 4291 state. Since these file systems are presumed to be unsuitable for 4292 simultaneous use, there is no specification of how locking is 4293 handled; in general, locks obtained on one file system will be 4294 separate from those on others. Since these are going to be read- 4295 only file systems, this is not expected to pose an issue for 4296 clients or applications. 4298 7.8. Effecting File System Referrals 4300 Referrals are effected when an absent file system is encountered, and 4301 one or more alternate locations are made available by the 4302 fs_locations attribute. The client will typically get an 4303 NFS4ERR_MOVED error, fetch the appropriate location information, and 4304 proceed to access the file system on a different server, even though 4305 it retains its logical position within the original namespace. 4306 Referrals differ from migration events in that they happen only when 4307 the client has not previously referenced the file system in question 4308 (so there is nothing to transition). Referrals can only come into 4309 effect when an absent file system is encountered at its root. 4311 The examples given in the sections below are somewhat artificial in 4312 that an actual client will not typically do a multi-component look 4313 up, but will have cached information regarding the upper levels of 4314 the name hierarchy. However, these examples are chosen to make the 4315 required behavior clear and easy to put within the scope of a small 4316 number of requests, without getting unduly into details of how 4317 specific clients might choose to cache things. 4319 7.8.1.
Referral Example (LOOKUP) 4321 Let us suppose that the following COMPOUND is sent in an environment 4322 in which /this/is/the/path is absent from the target server. This 4323 may be for a number of reasons. It may be the case that the file 4324 system has moved, or it may be the case that the target server is 4325 functioning mainly, or solely, to refer clients to the servers on 4326 which various file systems are located. 4328 o PUTROOTFH 4330 o LOOKUP "this" 4332 o LOOKUP "is" 4334 o LOOKUP "the" 4336 o LOOKUP "path" 4337 o GETFH 4339 o GETATTR(fsid,fileid,size,time_modify) 4341 Under the given circumstances, the following will be the result. 4343 o PUTROOTFH --> NFS_OK. The current fh is now the root of the 4344 pseudo-fs. 4346 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 4347 within the pseudo-fs. 4349 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 4350 within the pseudo-fs. 4352 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 4353 is within the pseudo-fs. 4355 o LOOKUP "path" --> NFS_OK. The current fh is for /this/is/the/path 4356 and is within a new, absent file system, but ... the client will 4357 never see the value of that fh. 4359 o GETFH --> NFS4ERR_MOVED. Fails because current fh is in an absent 4360 file system at the start of the operation, and the specification 4361 makes no exception for GETFH. 4363 o GETATTR(fsid,fileid,size,time_modify) Not executed because the 4364 failure of the GETFH stops processing of the COMPOUND. 4366 Given the failure of the GETFH, the client has the job of determining 4367 the root of the absent file system and where to find that file 4368 system, i.e., the server and path relative to that server's root fh. 4369 Note here that in this example, the client did not obtain filehandles 4370 and attribute information (e.g., fsid) for the intermediate 4371 directories, so that it would not be sure where the absent file 4372 system starts. 
It could be the case, for example, that /this/is/the 4373 is the root of the moved file system and that the reason that the 4374 look up of "path" succeeded is that the file system was not absent on 4375 that operation but was moved between the last LOOKUP and the GETFH 4376 (since COMPOUND is not atomic). Even if we had the fsids for all of 4377 the intermediate directories, we could have no way of knowing that 4378 /this/is/the/path was the root of a new file system, since we don't 4379 yet have its fsid. 4381 In order to get the necessary information, let us re-send the chain 4382 of LOOKUPs with GETFHs and GETATTRs to at least get the fsids so we 4383 can be sure where the appropriate file system boundaries are. The 4384 client could choose to get fs_locations at the same time but in most 4385 cases the client will have a good guess as to where file system 4386 boundaries are (because of where NFS4ERR_MOVED was, and was not, 4387 received) making fetching of fs_locations unnecessary. 4389 OP01: PUTROOTFH --> NFS_OK 4391 - Current fh is root of pseudo-fs. 4393 OP02: GETATTR(fsid) --> NFS_OK 4395 - Just for completeness. Normally, clients will know the fsid of 4396 the pseudo-fs as soon as they establish communication with a 4397 server. 4399 OP03: LOOKUP "this" --> NFS_OK 4401 OP04: GETATTR(fsid) --> NFS_OK 4403 - Get current fsid to see where file system boundaries are. The 4404 fsid will be that for the pseudo-fs in this example, so no 4405 boundary. 4407 OP05: GETFH --> NFS_OK 4409 - Current fh is for /this and is within pseudo-fs. 4411 OP06: LOOKUP "is" --> NFS_OK 4413 - Current fh is for /this/is and is within pseudo-fs. 4415 OP07: GETATTR(fsid) --> NFS_OK 4417 - Get current fsid to see where file system boundaries are. The 4418 fsid will be that for the pseudo-fs in this example, so no 4419 boundary. 4421 OP08: GETFH --> NFS_OK 4423 - Current fh is for /this/is and is within pseudo-fs. 
4425 OP09: LOOKUP "the" --> NFS_OK 4427 - Current fh is for /this/is/the and is within pseudo-fs. 4429 OP10: GETATTR(fsid) --> NFS_OK 4430 - Get current fsid to see where file system boundaries are. The 4431 fsid will be that for the pseudo-fs in this example, so no 4432 boundary. 4434 OP11: GETFH --> NFS_OK 4436 - Current fh is for /this/is/the and is within pseudo-fs. 4438 OP12: LOOKUP "path" --> NFS_OK 4440 - Current fh is for /this/is/the/path and is within a new, absent 4441 file system, but ... 4443 - The client will never see the value of that fh. 4445 OP13: GETATTR(fsid, fs_locations) --> NFS_OK 4447 - We are getting the fsid to know where the file system boundaries 4448 are. In this operation, the fsid will be different from that of 4449 the parent directory (which in turn was retrieved in OP10). Note 4450 that the fsid we are given will not necessarily be preserved at 4451 the new location. That fsid might be different, and in fact the 4452 fsid we have for this file system might be a valid fsid of a 4453 different file system on that new server. 4455 - In this particular case, we are pretty sure anyway that what has 4456 moved is /this/is/the/path rather than /this/is/the since we have 4457 the fsid of the latter and it is that of the pseudo-fs, which 4458 presumably cannot move. However, in other examples, we might not 4459 have this kind of information to rely on (e.g., /this/is/the might 4460 be a non-pseudo file system separate from /this/is/the/path), so 4461 we need to have other reliable source information on the boundary 4462 of the file system that is moved. If, for example, the file 4463 system /this/is had moved, we would have a case of migration 4464 rather than referral, and once the boundaries of the migrated file 4465 system were clear we could fetch fs_locations.
4467 - We are fetching fs_locations because the fact that we got an 4468 NFS4ERR_MOVED at this point means that it is most likely that this 4469 is a referral and we need the destination. Even if it is the case 4470 that /this/is/the is a file system that has migrated, we will 4471 still need the location information for that file system. 4473 OP14: GETFH --> NFS4ERR_MOVED 4474 - Fails because current fh is in an absent file system at the start 4475 of the operation, and the specification makes no exception for 4476 GETFH. Note that this means the server will never send the client 4477 a filehandle from within an absent file system. 4479 Given the above, the client knows where the root of the absent file 4480 system is (/this/is/the/path) by noting where the change of fsid 4481 occurred (between "the" and "path"). The fs_locations attribute also 4482 gives the client the actual location of the absent file system, so 4483 that the referral can proceed. The server gives the client the bare 4484 minimum of information about the absent file system so that there 4485 will be very little scope for problems of conflict between 4486 information sent by the referring server and information of the file 4487 system's home. No filehandles and very few attributes are present on 4488 the referring server, and the client can treat those it receives as 4489 transient information with the function of enabling the referral. 4491 7.8.2. Referral Example (READDIR) 4493 Another context in which a client may encounter referrals is when it 4494 does a READDIR on a directory in which some of the sub-directories 4495 are the roots of absent file systems. 
4497 Suppose such a directory is read as follows: 4499 o PUTROOTFH 4501 o LOOKUP "this" 4503 o LOOKUP "is" 4505 o LOOKUP "the" 4507 o READDIR (fsid, size, time_modify, mounted_on_fileid) 4509 In this case, because rdattr_error is not requested, fs_locations is 4510 not requested, and some of the attributes cannot be provided, the 4511 result will be an NFS4ERR_MOVED error on the READDIR, with the 4512 detailed results as follows: 4514 o PUTROOTFH --> NFS_OK. The current fh is at the root of the 4515 pseudo-fs. 4517 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 4518 within the pseudo-fs. 4520 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 4521 within the pseudo-fs. 4523 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 4524 is within the pseudo-fs. 4526 o READDIR (fsid, size, time_modify, mounted_on_fileid) --> 4527 NFS4ERR_MOVED. Note that the same error would have been returned 4528 if /this/is/the had migrated, but it is returned because the 4529 directory contains the root of an absent file system. 4531 So now suppose that we re-send with rdattr_error: 4533 o PUTROOTFH 4535 o LOOKUP "this" 4537 o LOOKUP "is" 4539 o LOOKUP "the" 4541 o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) 4543 The results will be: 4545 o PUTROOTFH --> NFS_OK. The current fh is at the root of the 4546 pseudo-fs. 4548 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 4549 within the pseudo-fs. 4551 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 4552 within the pseudo-fs. 4554 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 4555 is within the pseudo-fs. 4557 o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) 4558 --> NFS_OK. The attributes for directory entry with the component 4559 named "path" will only contain rdattr_error with the value 4560 NFS4ERR_MOVED, together with an fsid value and a value for 4561 mounted_on_fileid. 
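The way a client might act on a READDIR reply of this kind can be sketched as follows. This is only an illustration under stated assumptions, not part of the protocol: the DirEntry type and the find_referrals helper are hypothetical names, and each entry is modeled as a plain dictionary holding just the attributes the server actually returned. The only protocol values used are the status codes NFS4_OK (0) and NFS4ERR_MOVED (10019).

```python
from dataclasses import dataclass, field

NFS4_OK = 0
NFS4ERR_MOVED = 10019  # NFSv4 status code for an absent (moved) file system


@dataclass
class DirEntry:
    """Hypothetical, simplified model of one READDIR entry."""
    name: str
    attrs: dict = field(default_factory=dict)  # only attributes the server returned


def find_referrals(entries):
    """Return the names of entries that are roots of absent file systems.

    An entry is treated as a referral point when its rdattr_error
    attribute carries NFS4ERR_MOVED.
    """
    return [e.name for e in entries
            if e.attrs.get("rdattr_error") == NFS4ERR_MOVED]


entries = [
    # An ordinary entry (illustrative values): all requested attributes present.
    DirEntry("local", {"rdattr_error": NFS4_OK, "fsid": (1, 0), "size": 4096,
                       "time_modify": 1000, "mounted_on_fileid": 17}),
    # The root of an absent file system: only rdattr_error, fsid, and
    # mounted_on_fileid are present; size and time_modify are unavailable.
    DirEntry("path", {"rdattr_error": NFS4ERR_MOVED, "fsid": (9, 0),
                      "mounted_on_fileid": 42}),
]
referrals = find_referrals(entries)
print(referrals)
```

For each name found this way, the client still needs the fs_locations attribute before it can follow the referral.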
4563 So suppose we do another READDIR to get fs_locations (although we 4564 could have used a GETATTR directly, as in Section 7.8.1). 4566 o PUTROOTFH 4568 o LOOKUP "this" 4569 o LOOKUP "is" 4571 o LOOKUP "the" 4573 o READDIR (rdattr_error, fs_locations, mounted_on_fileid, fsid, 4574 size, time_modify) 4576 The results would be: 4578 o PUTROOTFH --> NFS_OK. The current fh is at the root of the 4579 pseudo-fs. 4581 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 4582 within the pseudo-fs. 4584 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 4585 within the pseudo-fs. 4587 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 4588 is within the pseudo-fs. 4590 o READDIR (rdattr_error, fs_locations, mounted_on_fileid, fsid, 4591 size, time_modify) --> NFS_OK. The attributes will be as shown 4592 below. 4594 The attributes for the directory entry with the component named 4595 "path" will only contain: 4597 o rdattr_error (value: NFS_OK) 4599 o fs_locations 4601 o mounted_on_fileid (value: unique fileid within referring file 4602 system) 4604 o fsid (value: unique value within referring server) 4606 The attributes for entry "path" will not contain size or time_modify 4607 because these attributes are not available within an absent file 4608 system. 4610 7.9. The Attribute fs_locations 4612 The fs_locations attribute is structured in the following way: 4614 struct fs_location4 { 4615 utf8must server<>; 4616 pathname4 rootpath; 4617 }; 4619 struct fs_locations4 { 4620 pathname4 fs_root; 4621 fs_location4 locations<>; 4622 }; 4624 The fs_location4 data type is used to represent the location of a 4625 file system by providing a server name and the path to the root of 4626 the file system within that server's namespace. When a set of 4627 servers have corresponding file systems at the same path within their 4628 namespaces, an array of server names may be provided. 
An entry in 4629 the server array is a UTF-8 string and represents one of a 4630 traditional DNS host name, IPv4 address, IPv6 address, or a zero- 4631 length string. A zero-length string SHOULD be used to indicate the 4632 current address being used for the RPC call. It is not a requirement 4633 that all servers that share the same rootpath be listed in one 4634 fs_location4 instance. The array of server names is provided for 4635 convenience. Servers that share the same rootpath may also be listed 4636 in separate fs_location4 entries in the fs_locations attribute. 4638 The fs_locations4 data type and fs_locations attribute contain an 4639 array of such locations. Since the namespace of each server may be 4640 constructed differently, the "fs_root" field is provided. The path 4641 given by fs_root represents the location of the file system in 4642 the current server's namespace, i.e., that of the server from which 4643 the fs_locations attribute was obtained. The fs_root path is meant 4644 to aid the client by clearly referencing the root of the file system 4645 whose locations are being reported, no matter what object within the 4646 current file system the current filehandle designates. The fs_root 4647 is simply the pathname the client used to reach the object on the 4648 current server (i.e., the object to which the fs_locations attribute 4649 applies). 4651 When the fs_locations attribute is interrogated and there are no 4652 alternate file system locations, the server SHOULD return a zero- 4653 length array of fs_location4 structures, together with a valid 4654 fs_root. 4656 As an example, suppose there is a replicated file system located at 4657 two servers (servA and servB). At servA, the file system is located 4658 at path /a/b/c. At servB, the file system is located at path /x/y/z.
4659 If the client were to obtain the fs_locations value for the directory 4660 at /a/b/c/d, it might not necessarily know that the file system's 4661 root is located in servA's namespace at /a/b/c. When the client 4662 switches to servB, it will need to determine that the directory it 4663 first referenced at servA is now represented by the path /x/y/z/d on 4664 servB. To facilitate this, the fs_locations attribute provided by 4665 servA would have an fs_root value of /a/b/c and two entries in 4666 fs_locations. One entry in fs_locations will be for itself (servA) 4667 and the other will be for servB with a path of /x/y/z. With this 4668 information, the client is able to substitute /x/y/z for the /a/b/c 4669 at the beginning of its access path and construct /x/y/z/d to use for 4670 the new server. 4672 Note that: there is no requirement that the number of components in 4673 each rootpath be the same; there is no relation between the number of 4674 components in rootpath and fs_root; and none of the components in each 4675 rootpath and fs_root have to be the same. In the above example, we 4676 could have had a third element in the locations array, with server 4677 equal to "servC", and rootpath equal to "/I/II", and a fourth element 4678 in locations with server equal to "servD" and rootpath equal to 4679 "/aleph/beth/gimel/daleth/he". 4681 The relationship between fs_root and a rootpath is that the client 4682 replaces the pathname indicated in fs_root for the current server with 4683 the substitute indicated in rootpath for the new server. 4685 For an example of a referred or migrated file system, suppose there 4686 is a file system located at serv1. At serv1, the file system is 4687 located at /az/buky/vedi/glagoli. The client finds that the object at 4688 glagoli has migrated (or is a referral).
The client gets the 4689 fs_locations attribute, which contains an fs_root of /az/buky/vedi/ 4690 glagoli, and one element in the locations array, with server equal to 4691 serv2, and rootpath equal to /izhitsa/fita. The client replaces /az/ 4692 buky/vedi/glagoli with /izhitsa/fita, and uses the latter pathname on 4693 serv2. 4695 Thus, the server MUST return an fs_root that is equal to the path the 4696 client used to reach the object to which the fs_locations attribute 4697 applies. Otherwise, the client cannot determine the new path to use 4698 on the new server. 4700 7.9.1. Inferring Transition Modes 4702 When fs_locations is used, information about the specific locations 4703 should be assumed based on the following rules. 4705 The following rules are general and apply irrespective of the 4706 context. 4708 o All listed file system instances should be considered as of the 4709 same handle class if and only if the current fh_expire_type 4710 attribute does not include the FH4_VOL_MIGRATION bit. Note that 4711 in the case of referral, filehandle issues do not apply since 4712 there can be no filehandles known within the current file system 4713 nor is there any access to the fh_expire_type attribute on the 4714 referring (absent) file system. 4716 o All listed file system instances should be considered as of the 4717 same fileid class if and only if the fh_expire_type attribute 4718 indicates persistent filehandles and does not include the 4719 FH4_VOL_MIGRATION bit. Note that in the case of referral, fileid 4720 issues do not apply since there can be no fileids known within the 4721 referring (absent) file system nor is there any access to the 4722 fh_expire_type attribute. 4724 o All file system instances' servers should be considered as of 4725 different change classes. 4727 o All file system instances' servers should be considered as of 4728 different readdir classes.
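The four general rules above can be sketched as follows. This is a minimal illustration in Python and not part of the protocol. The function name infer_general_classes and the keys of the returned dictionary are hypothetical, and reading "indicates persistent filehandles" as fh_expire_type being equal to FH4_PERSISTENT is an assumption. The referral case, in which the fh_expire_type attribute is not accessible, is not modeled.

```python
# fh_expire_type bit values as defined by the NFSv4 protocol.
FH4_PERSISTENT = 0x00000000
FH4_NOEXPIRE_WITH_OPEN = 0x00000001
FH4_VOLATILE_ANY = 0x00000002
FH4_VOL_MIGRATION = 0x00000004
FH4_VOL_RENAME = 0x00000008


def infer_general_classes(fh_expire_type):
    """Hypothetical helper: apply the general rules to one fh_expire_type."""
    no_vol_migration = (fh_expire_type & FH4_VOL_MIGRATION) == 0
    # Assumption: "indicates persistent filehandles" is taken to mean that
    # the attribute value is exactly FH4_PERSISTENT.
    persistent = fh_expire_type == FH4_PERSISTENT
    return {
        "same_handle_class": no_vol_migration,
        "same_fileid_class": persistent and no_vol_migration,
        # Change and readdir classes are always assumed to differ.
        "same_change_class": False,
        "same_readdir_class": False,
    }
```

A client following this sketch would, for example, treat instances as sharing a handle class but not a fileid class when fh_expire_type is FH4_VOLATILE_ANY.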
4730 For other class assignments, handling of file system transitions 4731 depends on the reasons for the transition: 4733 o When the transition is due to migration, that is, the client was 4734 directed to a new file system after receiving an NFS4ERR_MOVED 4735 error, the target should be treated as being of the same write- 4736 verifier class as the source. 4738 o When the transition is due to failover to another replica, that 4739 is, the client selected another replica without receiving an 4740 NFS4ERR_MOVED error, the target should be treated as being of a 4741 different write-verifier class from the source. 4743 The specific choices reflect typical implementation patterns for 4744 controlled migration and failover, respectively. 4746 See Section 17 for a discussion on the recommendations for the 4747 security flavor to be used by any GETATTR operation that requests the 4748 "fs_locations" attribute. 4750 8. NFS Server Name Space 4751 8.1. Server Exports 4753 On a UNIX server the name space describes all the files reachable by 4754 pathnames under the root directory or "/". On a Windows NT server 4755 the name space constitutes all the files on disks named by mapped 4756 disk letters. NFS server administrators rarely make the entire 4757 server's filesystem name space available to NFS clients. More often 4758 portions of the name space are made available via an "export" 4759 feature. In previous versions of the NFS protocol, the root 4760 filehandle for each export is obtained through the MOUNT protocol; 4761 the client sends a string that identifies the export of name space 4762 and the server returns the root filehandle for it. The MOUNT 4763 protocol supports an EXPORTS procedure that will enumerate the 4764 server's exports. 4766 8.2. Browsing Exports 4768 The NFSv4 protocol provides a root filehandle that clients can use to 4769 obtain filehandles for these exports via a multi-component LOOKUP.
A 4770 common user experience is to use a graphical user interface (perhaps 4771 a file "Open" dialog window) to find a file via progressive browsing 4772 through a directory tree. The client must be able to move from one 4773 export to another export via single-component, progressive LOOKUP 4774 operations. 4776 This style of browsing is not well supported by the NFSv2 and NFSv3 4777 protocols. The client expects all LOOKUP operations to remain within 4778 a single server filesystem. For example, the device attribute will 4779 not change. This prevents a client from taking name space paths that 4780 span exports. 4782 An automounter on the client can obtain a snapshot of the server's 4783 name space using the EXPORTS procedure of the MOUNT protocol. If it 4784 understands the server's pathname syntax, it can create an image of 4785 the server's name space on the client. The parts of the name space 4786 that are not exported by the server are filled in with a "pseudo 4787 filesystem" that allows the user to browse from one mounted 4788 filesystem to another. There is a drawback to this representation of 4789 the server's name space on the client: it is static. If the server 4790 administrator adds a new export the client will be unaware of it. 4792 8.3. Server Pseudo Filesystem 4794 NFSv4 servers avoid this name space inconsistency by presenting all 4795 the exports within the framework of a single server name space. An 4796 NFSv4 client uses LOOKUP and READDIR operations to browse seamlessly 4797 from one export to another. Portions of the server name space that 4798 are not exported are bridged via a "pseudo filesystem" that provides 4799 a view of exported directories only. A pseudo filesystem has a 4800 unique fsid and behaves like a normal, read only filesystem. 4802 Based on the construction of the server's name space, it is possible 4803 that multiple pseudo filesystems may exist. 
For example, 4805 /a pseudo filesystem 4806 /a/b real filesystem 4807 /a/b/c pseudo filesystem 4808 /a/b/c/d real filesystem 4810 Each of the pseudo filesystems is considered a separate entity and 4811 therefore will have its own unique fsid. 4813 8.4. Multiple Roots 4815 The DOS and Windows operating environments are sometimes described as 4816 having "multiple roots". Filesystems are commonly represented as 4817 disk letters. MacOS represents filesystems as top level names. 4818 NFSv4 servers for these platforms can construct a pseudo file system 4819 above these root names so that disk letters or volume names are 4820 simply directory names in the pseudo root. 4822 8.5. Filehandle Volatility 4824 The nature of the server's pseudo filesystem is that it is a logical 4825 representation of filesystem(s) available from the server. 4826 Therefore, the pseudo filesystem is most likely constructed 4827 dynamically when the server is first instantiated. It is expected 4828 that the pseudo filesystem may not have an on-disk counterpart from 4829 which persistent filehandles could be constructed. Even though it is 4830 preferable that the server provide persistent filehandles for the 4831 pseudo filesystem, the NFS client should expect that pseudo file 4832 system filehandles are volatile. This can be confirmed by checking 4833 the associated "fh_expire_type" attribute for the filehandles in 4834 question. If the filehandles are volatile, the NFS client must be 4835 prepared to recover a filehandle value (e.g., with a multi-component 4836 LOOKUP) when receiving an error of NFS4ERR_FHEXPIRED. 4838 8.6. Exported Root 4840 If the server's root filesystem is exported, one might conclude that 4841 a pseudo-filesystem is not needed. This would be wrong. Assume the 4842 following filesystems on a server: 4844 / disk1 (exported) 4845 /a disk2 (not exported) 4846 /a/b disk3 (exported) 4847 Because disk2 is not exported, disk3 cannot be reached with simple 4848 LOOKUPs.
The server must bridge the gap with a pseudo-filesystem. 4850 8.7. Mount Point Crossing 4852 The server filesystem environment may be constructed in such a way 4853 that one filesystem contains a directory which is 'covered' or 4854 mounted upon by a second filesystem. For example: 4856 /a/b (filesystem 1) 4857 /a/b/c/d (filesystem 2) 4859 The pseudo filesystem for this server may be constructed to look 4860 like: 4862 / (place holder/not exported) 4863 /a/b (filesystem 1) 4864 /a/b/c/d (filesystem 2) 4866 It is the server's responsibility to present to the client a pseudo 4867 filesystem that is complete. If the client sends a lookup request 4868 for the path "/a/b/c/d", the server's response is the filehandle of 4869 the filesystem "/a/b/c/d". In previous versions of the NFS protocol, 4870 the server would respond with the filehandle of directory "/a/b/c/d" 4871 within the filesystem "/a/b". 4873 The NFS client will be able to determine if it crosses a server mount 4874 point by a change in the value of the "fsid" attribute. 4876 8.8. Security Policy and Name Space Presentation 4878 The application of the server's security policy needs to be carefully 4879 considered by the implementor. One may choose to limit the 4880 viewability of portions of the pseudo filesystem based on the 4881 server's perception of the client's ability to authenticate itself 4882 properly. However, with the support of multiple security mechanisms 4883 and the ability to negotiate the appropriate use of these mechanisms, 4884 the server is unable to properly determine if a client will be able 4885 to authenticate itself. If, based on its policies, the server 4886 chooses to limit the contents of the pseudo filesystem, the server 4887 may effectively hide filesystems from a client that may otherwise 4888 have legitimate access.
4890 As suggested practice, the server should apply the security policy of 4891 a shared resource in the server's namespace to the components of the 4892 resource's ancestors. For example: 4894 / 4895 /a/b 4896 /a/b/c 4898 The /a/b/c directory is a real filesystem and is the shared resource. 4899 The security policy for /a/b/c is Kerberos with integrity. The 4900 server should apply the same security policy to /, /a, and /a/b. 4901 This allows for the extension of the protection of the server's 4902 namespace to the ancestors of the real shared resource. 4904 For the case of the use of multiple, disjoint security mechanisms in 4905 the server's resources, the security for a particular object in the 4906 server's namespace should be the union of all security mechanisms of 4907 all direct descendants. 4909 9. File Locking and Share Reservations 4911 Integrating locking into the NFS protocol necessarily causes it to be 4912 stateful. With the inclusion of share reservations the protocol 4913 becomes substantially more dependent on state than the traditional 4914 combination of NFS and NLM (Network Lock Manager) [32]. There are 4915 three components to making this state manageable: 4917 o clear division between client and server 4919 o ability to reliably detect inconsistency in state between client 4920 and server 4922 o simple and robust recovery mechanisms 4924 In this model, the server owns the state information. The client 4925 requests changes in locks and the server responds with the changes 4926 made. Non-client-initiated changes in locking state are infrequent. 4927 The client receives prompt notification of such changes and can 4928 adjust its view of the locking state to reflect the server's changes. 4930 Individual pieces of state created by the server and passed to the 4931 client at its request are represented by 128-bit stateids. 
These 4932 stateids may represent a particular open file, a set of byte-range 4933 locks held by a particular owner, or a recallable delegation of 4934 privileges to access a file in particular ways or at a particular 4935 location. 4937 In all cases, there is a transition from the most general information 4938 that represents a client as a whole to the eventual lightweight 4939 stateid used for most client and server locking interactions. The 4940 details of this transition will vary with the type of object but it 4941 always starts with a client ID. 4943 To support Win32 share reservations it is necessary to atomically 4944 OPEN or CREATE files. Having a separate share/unshare operation 4945 would not allow correct implementation of the Win32 OpenFile API. In 4946 order to correctly implement share semantics, the previous NFS 4947 protocol mechanisms used when a file is opened or created (LOOKUP, 4948 CREATE, ACCESS) need to be replaced. The NFSv4 protocol has an OPEN 4949 operation that subsumes the NFSv3 methodology of LOOKUP, CREATE, and 4950 ACCESS. However, because many operations require a filehandle, the 4951 traditional LOOKUP is preserved to map a file name to filehandle 4952 without establishing state on the server. The policy of granting 4953 access or modifying files is managed by the server based on the 4954 client's state. These mechanisms can implement policy ranging from 4955 advisory only locking to full mandatory locking. 4957 9.1. Opens and Byte-Range Locks 4959 It is assumed that manipulating a byte-range lock is rare when 4960 compared to READ and WRITE operations. It is also assumed that 4961 server restarts and network partitions are relatively rare. 4962 Therefore it is important that the READ and WRITE operations have a 4963 lightweight mechanism to indicate if they possess a held lock. A 4964 byte-range lock request contains the heavyweight information required 4965 to establish a lock and uniquely define the owner of the lock. 
4967 The following sections describe the transition from the heavyweight 4968 information to the eventual stateid used for most client and server 4969 locking and lease interactions. 4971 9.1.1. Client ID 4973 For each LOCK request, the client must identify itself to the server. 4974 This is done in such a way as to allow for correct lock 4975 identification and crash recovery. A sequence of a SETCLIENTID 4976 operation followed by a SETCLIENTID_CONFIRM operation is required to 4977 establish the identification on the server. Establishment of 4978 identification by a new incarnation of the client also has the effect 4979 of immediately breaking any leased state that a previous incarnation 4980 of the client might have had on the server, as opposed to forcing the 4981 new client incarnation to wait for the leases to expire. Breaking 4982 the lease state amounts to the server removing all lock, share 4983 reservation, and, where the server is not supporting the 4984 CLAIM_DELEGATE_PREV claim type, all delegation state associated with 4985 the same client with the same identity. For discussion of delegation 4986 state recovery, see Section 10.2.1. 4988 Owners of opens and owners of byte-range locks are separate entities 4989 and remain separate even if the same opaque arrays are used to 4990 designate owners of each. The protocol distinguishes between open- 4991 owners (represented by open_owner4 structures) and lock-owners 4992 (represented by lock_owner4 structures). 4994 Each open is associated with a specific open-owner while each byte- 4995 range lock is associated with a lock-owner and an open-owner, the 4996 latter being the open-owner associated with the open file under which 4997 the LOCK operation was done. 4999 Unlike the text in NFSv4.1 [31], this text treats "lock_owner" as 5000 meaning both an open_owner4 and a lock_owner4. Also, a "lock" can 5001 refer to both a byte-range lock and a share lock.
5003 Client identification is encapsulated in the following structure: 5005 struct nfs_client_id4 { 5006 verifier4 verifier; 5007 opaque id<NFS4_OPAQUE_LIMIT>; 5008 }; 5010 The first field, verifier, is a client incarnation verifier that is 5011 used to detect client reboots. Only if the verifier is different 5012 from that which the server has previously recorded for the client (as 5013 identified by the second field of the structure, id) does the server 5014 start the process of canceling the client's leased state. 5016 The second field, id, is a variable-length string that uniquely 5017 defines the client. 5019 There are several considerations for how the client generates the id 5020 string: 5022 o The string should be unique so that multiple clients do not 5023 present the same string. The consequences of two clients 5024 presenting the same string range from one client getting an error 5025 to one client having its leased state abruptly and unexpectedly 5026 canceled. 5028 o The string should be selected so that subsequent incarnations 5029 (e.g., reboots) of the same client cause the client to present the 5030 same string. The implementor is cautioned against an approach 5031 that requires the string to be recorded in a local file because 5032 this precludes the use of the implementation in an environment 5033 where there is no local disk and all file access is from an NFSv4 5034 server. 5036 o The string should be different for each server network address 5037 that the client accesses, rather than common to all server network 5038 addresses. The reason is that it may not be possible for the 5039 client to tell if the same server is listening on multiple network 5040 addresses. If the client issues SETCLIENTID with the same id 5041 string to each network address of such a server, the server will 5042 think it is the same client, and each successive SETCLIENTID will 5043 cause the server to begin the process of removing the client's 5044 previous leased state.
5046 o The algorithm for generating the string should not assume that the 5047 client's network address won't change. This includes changes 5048 between client incarnations and even changes while the client is 5049 still running in its current incarnation. This means that if 5050 the client includes just the client's and server's network addresses 5051 in the id string, there is a real risk, after the client gives up 5052 the network address, that another client, using a similar 5053 algorithm for generating the id string, will generate a 5054 conflicting id string. 5056 Given the above considerations, an example of a well-generated id 5057 string is one that includes: 5059 o The server's network address. 5061 o The client's network address. 5063 o For a user-level NFSv4 client, additional 5064 information to distinguish the client from other user-level 5065 clients running on the same host, such as a universally unique 5066 identifier (UUID). 5068 o Additional information that tends to be unique, such as one or 5069 more of: 5071 * The client machine's serial number (for privacy reasons, it is 5072 best to perform some one-way function on the serial number). 5074 * A MAC address. 5076 * The timestamp of when the NFSv4 software was first installed on 5077 the client (though this is subject to the previously mentioned 5078 caution about using information that is stored in a file, 5079 because the file might only be accessible over NFSv4). 5081 * A true random number. However, since this number ought to be 5082 the same between client incarnations, this shares the same 5083 problem as that of using the timestamp of the software 5084 installation. 5086 As a security measure, the server MUST NOT cancel a client's leased 5087 state if the principal that established the state for a given id 5088 string is not the same as the principal issuing the SETCLIENTID.
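The considerations above for a well-generated id string can be sketched as follows. This is a minimal illustration in Python, not a recommended or normative construction. The function name build_client_id_string, its parameters, the "/" separator, and the choice of SHA-256 as the one-way function are all assumptions made for the example; the protocol only requires that the result be an opaque string with the properties listed above.

```python
import hashlib
import uuid


def build_client_id_string(server_addr, client_addr, serial_number,
                           instance_uuid=None):
    """Hypothetical helper: assemble an id string for nfs_client_id4."""
    # Apply a one-way function to the serial number for privacy reasons.
    # SHA-256 is an assumption; any one-way function would serve.
    serial_digest = hashlib.sha256(serial_number.encode("utf-8")).hexdigest()
    # Including the server address makes the string differ per server
    # network address, as the considerations above require.
    parts = [server_addr, client_addr, serial_digest]
    # A user-level client adds a UUID to distinguish itself from other
    # user-level clients on the same host.
    if instance_uuid is not None:
        parts.append(str(instance_uuid))
    return "/".join(parts).encode("utf-8")
```

Because every input is stable across reboots, the same client presents the same string in each incarnation, while a different server address yields a different string. A user-level client would pass a fixed value such as uuid.UUID("12345678-1234-5678-1234-567812345678") as instance_uuid, subject to the earlier caution about where such a value is stored.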
5090 Note that SETCLIENTID and SETCLIENTID_CONFIRM have a secondary purpose 5091 of establishing the information the server needs to make callbacks to 5092 the client for the purpose of supporting delegations. It is permitted to 5093 change this information via SETCLIENTID and SETCLIENTID_CONFIRM 5094 within the same incarnation of the client without removing the 5095 client's leased state. 5097 Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has successfully 5098 completed, the client uses the shorthand client identifier, of type 5099 clientid4, instead of the longer and less compact nfs_client_id4 5100 structure. This shorthand client identifier (a client ID) is 5101 assigned by the server and should be chosen so that it will not 5102 conflict with a client ID previously assigned by the server. This 5103 applies across server restarts or reboots. When a client ID is 5104 presented to a server and that client ID is not recognized, as would 5105 happen after a server reboot, the server will reject the request with 5106 the error NFS4ERR_STALE_CLIENTID. When this happens, the client must 5107 obtain a new client ID by use of the SETCLIENTID operation and then 5108 proceed to any other necessary recovery for the server reboot case 5109 (see Section 9.6.2). 5111 The client must also employ the SETCLIENTID operation when it 5112 receives an NFS4ERR_STALE_STATEID error using a stateid derived from 5113 its current client ID, since this also indicates a server reboot 5114 which has invalidated the existing client ID (see Section 9.1.4 for 5115 details). 5117 See the detailed descriptions of SETCLIENTID and SETCLIENTID_CONFIRM 5118 for a complete specification of the operations. 5120 9.1.2. Server Release of Client ID 5122 If the server determines that the client holds no associated state 5123 for its client ID, the server may choose to release the client ID.
5124 The server may make this choice for an inactive client so that 5125 resources are not consumed by those intermittently active clients. 5126 If the client contacts the server after this release, the server must 5127 ensure the client receives the appropriate error so that it will use 5128 the SETCLIENTID/SETCLIENTID_CONFIRM sequence to establish a new 5129 identity. It should be clear that the server must be very hesitant 5130 to release a client ID since the resulting work on the client to 5131 recover from such an event will be the same burden as if the server 5132 had failed and restarted. Typically a server would not release a 5133 client ID unless there had been no activity from that client for many 5134 minutes. 5136 Note that if the id string in a SETCLIENTID request is properly 5137 constructed, and if the client takes care to use the same principal 5138 for each successive use of SETCLIENTID, then, barring an active 5139 denial of service attack, NFS4ERR_CLID_INUSE should never be 5140 returned. 5142 However, client bugs, server bugs, or perhaps a deliberate change of 5143 the principal owner of the id string (such as the case of a client 5144 that changes security flavors, and under the new flavor, there is no 5145 mapping to the previous owner) will in rare cases result in 5146 NFS4ERR_CLID_INUSE. 5148 In that event, when the server gets a SETCLIENTID for a client ID 5149 that currently has no state, or it has state, but the lease has 5150 expired, rather than returning NFS4ERR_CLID_INUSE, the server MUST 5151 allow the SETCLIENTID, and confirm the new client ID if followed by 5152 the appropriate SETCLIENTID_CONFIRM. 5154 9.1.3. Stateid Definition 5156 When the server grants a lock of any type (including opens, byte- 5157 range locks, and delegations), it responds with a unique stateid that 5158 represents a set of locks (often a single lock) for the same file, of 5159 the same type, and sharing the same ownership characteristics. 
Thus, 5160 opens of the same file by different open-owners each have an 5161 identifying stateid. Similarly, each set of byte-range locks on a 5162 file owned by a specific lock-owner has its own identifying stateid. 5163 Delegations also have associated stateids by which they may be 5164 referenced. The stateid is used as a shorthand reference to a lock 5165 or set of locks, and given a stateid, the server can determine the 5166 associated state-owner or state-owners (in the case of an open-owner/ 5167 lock-owner pair) and the associated filehandle. When a stateid is 5168 used, the current filehandle must be the one associated with that 5169 stateid. 5171 All stateids associated with a given client ID are associated with a 5172 common lease that represents the claim of those stateids and the 5173 objects they represent to be maintained by the server. See 5174 Section 9.5 for a discussion of the lease. 5176 The server may assign stateids independently for different clients. 5177 A stateid with the same bit pattern for one client may designate an 5178 entirely different set of locks for a different client. The stateid 5179 is always interpreted with respect to the client ID associated with 5180 the current session. 5182 9.1.3.1. Stateid Types 5184 With the exception of special stateids (see Section 9.1.3.3), each 5185 stateid represents locking objects of one of a set of types defined 5186 by the NFSv4 protocol. Note that in all these cases, where we speak 5187 of a guarantee, it is understood that there are situations such as a client 5188 restart or lock revocation that allow the guarantee to be voided. 5190 o Stateids may represent opens of files. 5192 Each stateid in this case represents the OPEN state for a given 5193 client ID/open-owner/filehandle triple. Such stateids are subject 5194 to change (with consequent incrementing of the stateid's seqid) in 5195 response to OPENs that result in upgrade and OPEN_DOWNGRADE 5196 operations.
5198 o Stateids may represent sets of byte-range locks. 5200 All locks held on a particular file by a particular owner and all 5201 gotten under the aegis of a particular open file are associated 5202 with a single stateid with the seqid being incremented whenever 5203 LOCK and LOCKU operations affect that set of locks. 5205 o Stateids may represent file delegations, which are recallable 5206 guarantees by the server to the client, that other clients will 5207 not reference, or will not modify a particular file, until the 5208 delegation is returned. 5210 A stateid represents a single delegation held by a client for a 5211 particular filehandle. 5213 9.1.3.2. Stateid Structure 5215 Stateids are divided into two fields, a 96-bit "other" field 5216 identifying the specific set of locks and a 32-bit "seqid" sequence 5217 value. Except in the case of special stateids (see Section 9.1.3.3), 5218 a particular value of the "other" field denotes a set of locks of the 5219 same type (for example, byte-range locks, opens, delegations, or 5220 layouts), for a specific file or directory, and sharing the same 5221 ownership characteristics. The seqid designates a specific instance 5222 of such a set of locks, and is incremented to indicate changes in 5223 such a set of locks, either by the addition or deletion of locks from 5224 the set, a change in the byte-range they apply to, or an upgrade or 5225 downgrade in the type of one or more locks. 5227 When such a set of locks is first created, the server returns a 5228 stateid with seqid value of one. On subsequent operations that 5229 modify the set of locks, the server is required to increment the 5230 "seqid" field by one whenever it returns a stateid for the same 5231 state-owner/file/type combination and there is some change in the set 5232 of locks actually designated. 
In this case, the server will return a 5233 stateid with an "other" field the same as previously used for that 5234 state-owner/file/type combination, with an incremented "seqid" field. 5235 This pattern continues until the seqid is incremented past 5236 NFS4_UINT32_MAX, and one (not zero) is the next seqid value. The 5237 purpose of the incrementing of the seqid is to allow the server to 5238 communicate to the client the order in which operations that modified 5239 locking state associated with a stateid have been processed and to 5240 make it possible for the client to send requests that are conditional 5241 on the set of locks not having changed since the stateid in question 5242 was returned. 5244 When a client sends a stateid to the server, it has two choices with 5245 regard to the seqid sent. It may set the seqid to zero to indicate 5246 to the server that it wishes the most up-to-date seqid for that 5247 stateid's "other" field to be used. This would be the common choice 5248 in the case of a stateid sent with a READ or WRITE operation. It 5249 also may set a non-zero value, in which case the server checks if 5250 that seqid is the correct one. In that case, the server is required 5251 to return NFS4ERR_OLD_STATEID if the seqid is lower than the most 5252 current value and NFS4ERR_BAD_STATEID if the seqid is greater than 5253 the most current value. This would be the common choice in the case 5254 of stateids sent with a CLOSE or OPEN_DOWNGRADE. Because OPENs may 5255 be sent in parallel for the same owner, a client might close a file 5256 without knowing that an OPEN upgrade had been done by the server, 5257 changing the lock in question. If CLOSE were sent with a zero seqid, 5258 the OPEN upgrade would be canceled before the client even received an 5259 indication that an upgrade had happened. 
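The seqid handling described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the function names next_seqid and check_seqid are hypothetical, error codes are represented as plain strings, and the comparison is purely numeric, so it does not account for a seqid that has passed NFS4_UINT32_MAX and started again at one.

```python
NFS4_UINT32_MAX = 0xFFFFFFFF


def next_seqid(seqid):
    """Server side: advance a seqid, skipping zero after the maximum."""
    # Zero is reserved to mean "use the most up-to-date seqid", so the
    # value that follows NFS4_UINT32_MAX is one, not zero.
    return 1 if seqid == NFS4_UINT32_MAX else seqid + 1


def check_seqid(client_seqid, current_seqid):
    """Server side: check the seqid of an incoming non-special stateid."""
    # A zero seqid asks the server to use the most current seqid.
    if client_seqid == 0 or client_seqid == current_seqid:
        return "NFS4_OK"
    # Assumption: plain numeric comparison, ignoring wraparound.
    if client_seqid < current_seqid:
        return "NFS4ERR_OLD_STATEID"
    return "NFS4ERR_BAD_STATEID"
```

A READ or WRITE would typically be sent with a seqid of zero, whereas a CLOSE or OPEN_DOWNGRADE would carry the last seqid the client received, so that an OPEN upgrade the client has not yet seen causes NFS4ERR_OLD_STATEID rather than being silently undone.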
5261 When a stateid is sent by the server to the client as part of a 5262 callback operation, it is not subject to checking for a current seqid 5263 and returning NFS4ERR_OLD_STATEID. This is because the client is not 5264 in a position to know the most up-to-date seqid and thus cannot 5265 verify it. Unless specially noted, the seqid value for a stateid 5266 sent by the server to the client as part of a callback is required to 5267 be zero with NFS4ERR_BAD_STATEID returned if it is not. 5269 In making comparisons between seqids, both by the client in 5270 determining the order of operations and by the server in determining 5271 whether NFS4ERR_OLD_STATEID is to be returned, the possibility of 5272 the seqid being wrapped around past the NFS4_UINT32_MAX value needs 5273 to be taken into account. When two seqid values are being compared, 5274 the total count of slots for all sessions associated with the current 5275 client is used to do this. When one seqid value is less than this 5276 total slot count and another seqid value is greater than 5277 NFS4_UINT32_MAX minus the total slot count, the latter is to be 5278 treated as lower than the former, despite the fact that it is 5279 numerically greater. 5281 9.1.3.3. Special Stateids 5283 Stateid values whose "other" field is either all zeros or all ones 5284 are reserved. They may not be assigned by the server but have 5285 special meanings defined by the protocol. The particular meaning 5286 depends on whether the "other" field is all zeros or all ones and the 5287 specific value of the "seqid" field. 5289 The following combinations of "other" and "seqid" are defined in 5290 NFSv4: 5292 o When "other" and "seqid" are both zero, the stateid is treated as 5293 a special anonymous stateid, which can be used in READ, WRITE, and 5294 SETATTR requests to indicate the absence of any open state 5295 associated with the request.
When an anonymous stateid value is 5296 used, and an existing open denies the form of access requested, 5297 then access will be denied to the request. 5299 o When "other" and "seqid" are both all ones, the stateid is a 5300 special READ bypass stateid. When this value is used in WRITE or 5301 SETATTR, it is treated like the anonymous value. When used in 5302 READ, the server MAY grant access, even if access would normally 5303 be denied to READ requests. 5305 o When "other" is zero and "seqid" is one, the stateid represents 5306 the current stateid, which is whatever value is the last stateid 5307 returned by an operation within the COMPOUND. In the case of an 5308 OPEN, the stateid returned for the open file, and not the 5309 delegation is used. The stateid passed to the operation in place 5310 of the special value has its "seqid" value set to zero, except 5311 when the current stateid is used by the operation CLOSE or 5312 OPEN_DOWNGRADE. If there is no operation in the COMPOUND which 5313 has returned a stateid value, the server MUST return the error 5314 NFS4ERR_BAD_STATEID. As illustrated in Figure 5, if the value of 5315 a current stateid is a special stateid, and the stateid of an 5316 operation's arguments has "other" set to zero, and "seqid" set to 5317 one, then the server MUST return the error NFS4ERR_BAD_STATEID. 5319 o When "other" is zero and "seqid" is NFS4_UINT32_MAX, the stateid 5320 represents a reserved stateid value defined to be invalid. When 5321 this stateid is used, the server MUST return the error 5322 NFS4ERR_BAD_STATEID. 5324 If a stateid value is used which has all zero or all ones in the 5325 "other" field, but does not match one of the cases above, the server 5326 MUST return the error NFS4ERR_BAD_STATEID. 5328 Special stateids, unlike other stateids, are not associated with 5329 individual client IDs or filehandles and can be used with all valid 5330 client IDs and filehandles. 
In the case of a special stateid 5331 designating the current stateid, the current stateid value 5332 substituted for the special stateid is associated with a particular 5333 client ID and filehandle, and so, if it is used where current 5334 filehandle does not match that associated with the current stateid, 5335 the operation to which the stateid is passed will return 5336 NFS4ERR_BAD_STATEID. 5338 9.1.3.4. Stateid Lifetime and Validation 5340 Stateids must remain valid until either a client restart or a server 5341 restart or until the client returns all of the locks associated with 5342 the stateid by means of an operation such as CLOSE or DELEGRETURN. 5343 If the locks are lost due to revocation as long as the client ID is 5344 valid, the stateid remains a valid designation of that revoked state. 5345 Stateids associated with byte-range locks are an exception. They 5346 remain valid even if a LOCKU frees all remaining locks, so long as 5347 the open file with which they are associated remains open. 5349 It should be noted that there are situations in which the client's 5350 locks become invalid, without the client requesting they be returned. 5351 These include lease expiration and a number of forms of lock 5352 revocation within the lease period. It is important to note that in 5353 these situations, the stateid remains valid and the client can use it 5354 to determine the disposition of the associated lost locks. 5356 An "other" value must never be reused for a different purpose (i.e. 5357 different filehandle, owner, or type of locks) within the context of 5358 a single client ID. A server may retain the "other" value for the 5359 same purpose beyond the point where it may otherwise be freed but if 5360 it does so, it must maintain "seqid" continuity with previous values. 
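The special stateid combinations listed in Section 9.1.3.3 can be sketched as a small classifier. This is only an illustration and not part of the protocol definition: the function name, the returned labels, and the assumption that the "other" field is a 12-byte opaque value are choices made for the example.

```python
NFS4_UINT32_MAX = 0xFFFFFFFF
OTHER_SIZE = 12  # assumed size of the "other" field, in bytes

ALL_ZEROS = bytes(OTHER_SIZE)
ALL_ONES = b"\xff" * OTHER_SIZE


def classify_stateid(seqid, other):
    """Return a label for a stateid (hypothetical helper).

    Labels: "anonymous", "read_bypass", "current", "ordinary",
    or the error name "NFS4ERR_BAD_STATEID".
    """
    if other not in (ALL_ZEROS, ALL_ONES):
        # Not a reserved "other" value; normal validation applies.
        return "ordinary"
    if other == ALL_ZEROS and seqid == 0:
        return "anonymous"
    if other == ALL_ONES and seqid == NFS4_UINT32_MAX:
        return "read_bypass"
    if other == ALL_ZEROS and seqid == 1:
        return "current"
    # Includes the reserved invalid value (other zero, seqid
    # NFS4_UINT32_MAX) and any undefined combination.
    return "NFS4ERR_BAD_STATEID"
```

A server would still need to check that a stateid accepted here is appropriate to the operation in which it appears; the sketch covers only the mapping from field values to meaning.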
5362 One mechanism that may be used to satisfy the requirement that the 5363 server recognize invalid and out-of-date stateids is for the server 5364 to divide the "other" field of the stateid into two fields. 5366 o An index into a table of locking-state structures. 5368 o A generation number which is incremented on each allocation of a 5369 table entry for a particular use. 5371 And then store in each table entry, 5372 o The client ID with which the stateid is associated. 5374 o The current generation number for the (at most one) valid stateid 5375 sharing this index value. 5377 o The filehandle of the file on which the locks are taken. 5379 o An indication of the type of stateid (open, byte-range lock, file 5380 delegation). 5382 o The last "seqid" value returned corresponding to the current 5383 "other" value. 5385 o An indication of the current status of the locks associated with 5386 this stateid. In particular, whether these have been revoked and 5387 if so, for what reason. 5389 With this information, an incoming stateid can be validated and the 5390 appropriate error returned when necessary. Special and non-special 5391 stateids are handled separately. (See Section 9.1.3.3 for a 5392 discussion of special stateids.) 5394 When a stateid is being tested, and the "other" field is all zeros or 5395 all ones, a check that the "other" and "seqid" fields match a defined 5396 combination for a special stateid is done and the results determined 5397 as follows: 5399 o If the "other" and "seqid" fields do not match a defined 5400 combination associated with a special stateid, the error 5401 NFS4ERR_BAD_STATEID is returned. 5403 o If the special stateid is one designating the current stateid, and 5404 there is a current stateid, then the current stateid is 5405 substituted for the special stateid and the checks appropriate to 5406 non-special stateids are performed.
5408 o If the combination is valid in general but is not appropriate to 5409 the context in which the stateid is used (e.g., an all-zero stateid 5410 is used when an open stateid is required in a LOCK operation), the 5411 error NFS4ERR_BAD_STATEID is also returned. 5413 o Otherwise, the check is completed and the special stateid is 5414 accepted as valid. 5416 When a stateid is being tested, and the "other" field is neither all 5417 zeros nor all ones, the following procedure could be used to validate 5418 an incoming stateid and return an appropriate error, when necessary, 5419 assuming that the "other" field would be divided into a table index 5420 and an entry generation. 5422 o If the table index field is outside the range of the associated 5423 table, return NFS4ERR_BAD_STATEID. 5425 o If the selected table entry is of a different generation than that 5426 specified in the incoming stateid, return NFS4ERR_BAD_STATEID. 5428 o If the selected table entry does not match the current filehandle, 5429 return NFS4ERR_BAD_STATEID. 5431 o If the client ID in the table entry does not match the client ID 5432 associated with the current session, return NFS4ERR_BAD_STATEID. 5434 o If the stateid represents revoked state, then return 5435 NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or NFS4ERR_DELEG_REVOKED, 5436 as appropriate. 5438 o If the stateid type is not valid for the context in which the 5439 stateid appears, return NFS4ERR_BAD_STATEID. Note that a stateid 5440 may be valid in general, but be invalid for a particular 5441 operation, as, for example, when a stateid which doesn't represent 5442 byte-range locks is passed to the non-from_open case of LOCK or to 5443 LOCKU, or when a stateid which does not represent an open is 5444 passed to CLOSE or OPEN_DOWNGRADE. In such cases, the server MUST 5445 return NFS4ERR_BAD_STATEID.
5447 o If the "seqid" field is not zero, and it is greater than the 5448 current sequence value corresponding to the current "other" field, 5449 return NFS4ERR_BAD_STATEID. 5451 o If the "seqid" field is not zero, and it is less than the current 5452 sequence value corresponding to the current "other" field, return 5453 NFS4ERR_OLD_STATEID. 5455 o Otherwise, the stateid is valid and the table entry should contain 5456 any additional information about the type of stateid and 5457 information associated with that particular type of stateid, such 5458 as the associated set of locks (including open-owner and lock-owner 5459 information) and information on the specific locks, such as 5460 open modes and byte ranges. 5462 9.1.3.5. Stateid Use for I/O Operations 5464 Clients performing I/O operations need to select an appropriate 5465 stateid based on the locks (including opens and delegations) held by 5466 the client and the various types of state-owners sending the I/O 5467 requests. SETATTR operations that change the file size are treated 5468 like I/O operations in this regard. 5470 The following rules, applied in order of decreasing priority, govern 5471 the selection of the appropriate stateid. In following these rules, 5472 the client will only consider locks of which it has actually received 5473 notification by an appropriate operation response or callback. 5475 o If the client holds a delegation for the file in question, the 5476 delegation stateid SHOULD be used. 5478 o Otherwise, if the entity corresponding to the lock-owner (e.g., a 5479 process) sending the I/O has a byte-range lock stateid for the 5480 associated open file, then the byte-range lock stateid for that 5481 lock-owner and open file SHOULD be used. 5483 o If there is no byte-range lock stateid, then the OPEN stateid for 5484 the open file in question SHOULD be used. 5486 o Finally, if none of the above apply, then a special stateid SHOULD 5487 be used.
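The priority order above can be sketched from the client's point of view. This is a minimal sketch: the function name, the argument names, and the representation of a stateid as a (seqid, other) tuple are assumptions made for the example, and the anonymous stateid is chosen here as the special stateid of last resort.

```python
# Anonymous special stateid: "seqid" zero and "other" all zeros.
# The 12-byte size of "other" is an assumption of this sketch.
ANONYMOUS_STATEID = (0, bytes(12))


def select_io_stateid(delegation=None, lock_stateid=None, open_stateid=None):
    """Pick the stateid for a READ, WRITE, or size-changing SETATTR.

    Each argument is the stateid the client has actually been
    notified of for this file and lock-owner, or None if the client
    holds no such state.
    """
    # Decreasing priority: delegation, byte-range lock, open.
    for candidate in (delegation, lock_stateid, open_stateid):
        if candidate is not None:
            return candidate
    # None of the above apply: fall back to a special stateid.
    return ANONYMOUS_STATEID
```

For instance, a process that holds both an open and a byte-range lock on a file, with no delegation, would send the byte-range lock stateid.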
5489 Ignoring these rules may result in situations in which the server 5490 does not have information necessary to properly process the request. 5491 For example, when mandatory byte-range locks are in effect, if the 5492 stateid does not indicate the proper lock-owner, via a lock stateid, 5493 a request might be avoidably rejected. 5495 The server, however, should not try to enforce these ordering rules and 5496 should use whatever information is available to properly process I/O 5497 requests. In particular, when a client has a delegation for a given 5498 file, the server SHOULD take note of this fact in processing a request, even 5499 if the request is sent with a special stateid. 5501 9.1.3.6. Stateid Use for SETATTR Operations 5503 In the case of SETATTR operations, a stateid is present. In cases 5504 other than those that set the file size, the client may send either a 5505 special stateid or, when a delegation is held for the file in 5506 question, a delegation stateid. While the server SHOULD validate the 5507 stateid and may use the stateid to optimize the determination as to 5508 whether a delegation is held, it SHOULD note the presence of a 5509 delegation even when a special stateid is sent, and MUST accept a 5510 valid delegation stateid when sent. 5512 9.1.4. lock_owner 5514 When requesting a lock, the client must present to the server the 5515 client ID and an identifier for the owner of the requested lock. 5517 These two fields are referred to as the lock_owner and the definitions 5518 of those fields are: 5520 o A client ID returned by the server as part of the client's use of 5521 the SETCLIENTID operation. 5523 o A variable length opaque array used to uniquely define the owner 5524 of a lock managed by the client. 5526 This may be a thread id, process id, or other unique value. 5528 When the server grants the lock, it responds with a unique stateid.
5529 The stateid is used as a shorthand reference to the lock_owner, since 5530 the server will be maintaining the correspondence between them. 5532 9.1.5. Use of the Stateid and Locking 5534 All READ, WRITE and SETATTR operations contain a stateid. For the 5535 purposes of this section, SETATTR operations which change the size 5536 attribute of a file are treated as if they are writing the area 5537 between the old and new size (i.e., the range truncated or added to 5538 the file by means of the SETATTR), even where SETATTR is not 5539 explicitly mentioned in the text. The stateid passed to one of these 5540 operations must be one that represents an OPEN, a set of byte-range 5541 locks, or a delegation, or it may be a special stateid representing 5542 anonymous access or the special bypass stateid. 5544 If the lock_owner performs a READ or WRITE in a situation in which it 5545 has established a lock or share reservation on the server (any OPEN 5546 constitutes a share reservation) the stateid (previously returned by 5547 the server) must be used to indicate what locks, including both byte- 5548 range locks and share reservations, are held by the lockowner. If no 5549 state is established by the client, either byte-range lock or share 5550 reservation, a stateid of all bits 0 is used. Regardless of whether a 5551 stateid of all bits 0, or a stateid returned by the server is used, 5552 if there is a conflicting share reservation or mandatory byte-range 5553 lock held on the file, the server MUST refuse to service the READ or 5554 WRITE operation. 5556 Share reservations are established by OPEN operations and by their 5557 nature are mandatory in that when the OPEN denies READ or WRITE 5558 operations, that denial results in such operations being rejected 5559 with error NFS4ERR_LOCKED.
Byte-range locks may be implemented by 5560 the server as either mandatory or advisory, or the choice of 5561 mandatory or advisory behavior may be determined by the server on the 5562 basis of the file being accessed (for example, some UNIX-based 5563 servers support a "mandatory lock bit" on the mode attribute such 5564 that if set, byte-range locks are required on the file before I/O is 5565 possible). When byte-range locks are advisory, they only prevent the 5566 granting of conflicting lock requests and have no effect on READs or 5567 WRITEs. Mandatory byte-range locks, however, prevent conflicting I/O 5568 operations. When they are attempted, they are rejected with 5569 NFS4ERR_LOCKED. When the client gets NFS4ERR_LOCKED on a file it 5570 knows it has the proper share reservation for, it will need to issue 5571 a LOCK request on the region of the file that includes the region the 5572 I/O was to be performed on, with an appropriate locktype (i.e., 5573 READ*_LT for a READ operation, WRITE*_LT for a WRITE operation). 5575 With NFSv3, there was no notion of a stateid so there was no way to 5576 tell if the application process of the client sending the READ or 5577 WRITE operation had also acquired the appropriate byte-range lock on 5578 the file. Thus there was no way to implement mandatory locking. 5579 With the stateid construct, this barrier has been removed. 5581 Note that for UNIX environments that support mandatory file locking, 5582 the distinction between advisory and mandatory locking is subtle. In 5583 fact, advisory and mandatory byte-range locks are exactly the same in 5584 so far as the APIs and requirements on implementation. If the 5585 mandatory lock attribute is set on the file, the server checks to see 5586 if the lockowner has an appropriate shared (read) or exclusive 5587 (write) byte-range lock on the region it wishes to read or write to. 
5588 If there is no appropriate lock, the server checks if there is a 5589 conflicting lock (which can be done by attempting to acquire the 5590 conflicting lock on the behalf of the lockowner, and if successful, 5591 release the lock after the READ or WRITE is done), and if there is, 5592 the server returns NFS4ERR_LOCKED. 5594 For Windows environments, there are no advisory byte-range locks, so 5595 the server always checks for byte-range locks during I/O requests. 5597 Thus, the NFSv4 LOCK operation does not need to distinguish between 5598 advisory and mandatory byte-range locks. It is the NFS version 4 5599 server's processing of the READ and WRITE operations that introduces 5600 the distinction. 5602 Every stateid other than the special stateid values noted in this 5603 section, whether returned by an OPEN-type operation (i.e., OPEN, 5604 OPEN_DOWNGRADE), or by a LOCK-type operation (i.e., LOCK or LOCKU), 5605 defines an access mode for the file (i.e., READ, WRITE, or READ- 5606 WRITE) as established by the original OPEN which began the stateid 5607 sequence, and as modified by subsequent OPENs and OPEN_DOWNGRADEs 5608 within that stateid sequence. When a READ, WRITE, or SETATTR which 5609 specifies the size attribute, is done, the operation is subject to 5610 checking against the access mode to verify that the operation is 5611 appropriate given the OPEN with which the operation is associated. 5613 In the case of WRITE-type operations (i.e., WRITEs and SETATTRs which 5614 set size), the server must verify that the access mode allows writing 5615 and return an NFS4ERR_OPENMODE error if it does not. In the case of 5616 READ, the server may perform the corresponding check on the access 5617 mode, or it may choose to allow READ on opens for WRITE only, to 5618 accommodate clients whose write implementation may unavoidably do 5619 reads (e.g., due to buffer cache constraints).
However, even if 5620 READs are allowed in these circumstances, the server MUST still check 5621 for locks that conflict with the READ (e.g., another open specifying 5622 denial of READs). Note that a server which does enforce the access 5623 mode check on READs need not explicitly check for conflicting share 5624 reservations since the existence of OPEN for read access guarantees 5625 that no conflicting share reservation can exist. 5627 A stateid of all bits 1 (one) MAY allow READ operations to bypass 5628 locking checks at the server. However, WRITE operations with a 5629 stateid with bits all 1 (one) MUST NOT bypass locking checks and are 5630 treated exactly the same as if a stateid of all bits 0 were used. 5632 A lock may not be granted while a READ or WRITE operation using one 5633 of the special stateids is being performed and the range of the lock 5634 request conflicts with the range of the READ or WRITE operation. For 5635 the purposes of this paragraph, a conflict occurs when a shared lock 5636 is requested and a WRITE operation is being performed, or an 5637 exclusive lock is requested and either a READ or a WRITE operation is 5638 being performed. A SETATTR that sets size is treated similarly to a 5639 WRITE as discussed above. 5641 9.1.6. Sequencing of Lock Requests 5643 Locking is different than most NFS operations as it requires "at- 5644 most-one" semantics that are not provided by ONCRPC. ONCRPC over a 5645 reliable transport is not sufficient because a sequence of locking 5646 requests may span multiple TCP connections. In the face of 5647 retransmission or reordering, lock or unlock requests must have a 5648 well defined and consistent behavior. To accomplish this, each lock 5649 request contains a sequence number that is a consecutively increasing 5650 integer. Different lock_owners have different sequences. The server 5651 maintains the last sequence number (L) received and the response that 5652 was returned.
The server is free to assign any value for the first 5653 request issued for any given lock_owner. 5655 Note that for requests that contain a sequence number, for each 5656 lock_owner, there should be no more than one outstanding request. 5658 If a request (r) with a previous sequence number (r < L) is received, 5659 it is rejected with the return of error NFS4ERR_BAD_SEQID. Given a 5660 properly-functioning client, the response to (r) must have been 5661 received before the last request (L) was sent. If a duplicate of 5662 last request (r == L) is received, the stored response is returned. 5663 If a request beyond the next sequence (r == L + 2) is received, it is 5664 rejected with the return of error NFS4ERR_BAD_SEQID. Sequence 5665 history is reinitialized whenever the SETCLIENTID/SETCLIENTID_CONFIRM 5666 sequence changes the client verifier. 5668 Since the sequence number is represented with an unsigned 32-bit 5669 integer, the arithmetic involved with the sequence number is mod 5670 2^32. For an example of modulo arithmetic involving sequence numbers 5671 see [33]. 5673 It is critical the server maintain the last response sent to the 5674 client to provide a more reliable cache of duplicate non-idempotent 5675 requests than that of the traditional cache described in [34]. The 5676 traditional duplicate request cache uses a least recently used 5677 algorithm for removing unneeded requests. However, the last lock 5678 request and response on a given lock_owner must be cached as long as 5679 the lock state exists on the server. 5681 The client MUST monotonically increment the sequence number for the 5682 CLOSE, LOCK, LOCKU, OPEN, OPEN_CONFIRM, and OPEN_DOWNGRADE 5683 operations. This is true even in the event that the previous 5684 operation that used the sequence number received an error. 
The only 5685 exception to this rule is if the previous operation received one of 5686 the following errors: NFS4ERR_STALE_CLIENTID, NFS4ERR_STALE_STATEID, 5687 NFS4ERR_BAD_STATEID, NFS4ERR_BAD_SEQID, NFS4ERR_BADXDR, 5688 NFS4ERR_RESOURCE, NFS4ERR_NOFILEHANDLE, or NFS4ERR_MOVED. 5690 9.1.7. Recovery from Replayed Requests 5692 As described above, the sequence number is per lock_owner. As long 5693 as the server maintains the last sequence number received and follows 5694 the methods described above, there are no risks of a Byzantine router 5695 re-sending old requests. The server need only maintain the 5696 (lock_owner, sequence number) state as long as there are open files 5697 or closed files with locks outstanding. 5699 LOCK, LOCKU, OPEN, OPEN_DOWNGRADE, and CLOSE each contain a sequence 5700 number and therefore the risk of the replay of these operations 5701 resulting in undesired effects is non-existent while the server 5702 maintains the lock_owner state. 5704 9.1.8. Releasing lock_owner State 5706 When a particular lock_owner no longer holds open or file locking 5707 state at the server, the server may choose to release the sequence 5708 number state associated with the lock_owner. The server may make 5709 this choice based on lease expiration, for the reclamation of server 5710 memory, or other implementation specific details. In any event, the 5711 server is able to do this safely only when the lock_owner no longer 5712 is being utilized by the client. The server may choose to hold the 5713 lock_owner state in the event that retransmitted requests are 5714 received. However, the period to hold this state is implementation 5715 specific. 5717 In the case that a LOCK, LOCKU, OPEN_DOWNGRADE, or CLOSE is 5718 retransmitted after the server has previously released the lock_owner 5719 state, the server will find that the lock_owner has no files open and 5720 an error will be returned to the client. 
If the lock_owner does have 5721 a file open, the stateid will not match and again an error is 5722 returned to the client. 5724 9.1.9. Use of Open Confirmation 5726 In the case that an OPEN is retransmitted and the lock_owner is being 5727 used for the first time or the lock_owner state has been previously 5728 released by the server, the use of the OPEN_CONFIRM operation will 5729 prevent incorrect behavior. When the server observes the use of the 5730 lock_owner for the first time, it will direct the client to perform 5731 the OPEN_CONFIRM for the corresponding OPEN. This sequence 5732 establishes the use of a lock_owner and associated sequence number. 5733 Since the OPEN_CONFIRM sequence connects a new open_owner on the 5734 server with an existing open_owner on a client, the sequence number 5735 may have any value. The OPEN_CONFIRM step assures the server that 5736 the value received is the correct one. (see Section 15.20 for further 5737 details.) 5739 There are a number of situations in which the requirement to confirm 5740 an OPEN would pose difficulties for the client and server, in that 5741 they would be prevented from acting in a timely fashion on 5742 information received, because that information would be provisional, 5743 subject to deletion upon non-confirmation. Fortunately, these are 5744 situations in which the server can avoid the need for confirmation 5745 when responding to open requests. The two constraints are: 5747 o The server must not bestow a delegation for any open which would 5748 require confirmation. 5750 o The server MUST NOT require confirmation on a reclaim-type open 5751 (i.e., one specifying claim type CLAIM_PREVIOUS or 5752 CLAIM_DELEGATE_PREV). 5754 These constraints are related in that reclaim-type opens are the only 5755 ones in which the server may be required to send a delegation. For 5756 CLAIM_NULL, sending the delegation is optional while for 5757 CLAIM_DELEGATE_CUR, no delegation is sent. 
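The interaction between the sequence numbers of Section 9.1.6 and the confirmation constraints above might be sketched as follows. This is only an illustration under stated assumptions: the function names, the string labels, and the dictionary keys are hypothetical, and the sketch treats any sequence number other than L or L + 1 (mod 2^32) as out of sequence.

```python
NFS4_UINT32_MAX = 0xFFFFFFFF
RECLAIM_CLAIMS = {"CLAIM_PREVIOUS", "CLAIM_DELEGATE_PREV"}


def classify_seqid(last_seqid, received):
    """Compare a received sequence number with the last one seen (L)."""
    if received == last_seqid:
        return "replay"  # return the stored response
    if received == (last_seqid + 1) & NFS4_UINT32_MAX:
        return "next"  # process the request normally
    return "NFS4ERR_BAD_SEQID"


def open_policy(owner_is_new, claim_type):
    """Decide whether an OPEN needs OPEN_CONFIRM.

    "may_delegate" reflects only the confirmation constraint; whether
    a delegation is actually sent depends on the claim type and on
    server policy.
    """
    confirm = owner_is_new and claim_type not in RECLAIM_CLAIMS
    return {"confirm_required": confirm, "may_delegate": not confirm}
```

Under these assumptions, a first OPEN with CLAIM_NULL from an unknown owner requires confirmation and carries no delegation, while a CLAIM_PREVIOUS reclaim is never made to wait on OPEN_CONFIRM.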
5759 Delegations being sent with an open requiring confirmation are 5760 troublesome because recovering from non-confirmation adds undue 5761 complexity to the protocol while requiring confirmation on reclaim- 5762 type opens poses difficulties in that the inability to resolve the 5763 status of the reclaim until lease expiration may make it difficult to 5764 have timely determination of the set of locks being reclaimed (since 5765 the grace period may expire). 5767 Requiring open confirmation on reclaim-type opens is avoidable 5768 because of the nature of the environments in which such opens are 5769 done. For CLAIM_PREVIOUS opens, this is immediately after server 5770 reboot, so there should be no time for lockowners to be created, 5771 found to be unused, and recycled. For CLAIM_DELEGATE_PREV opens, we 5772 are dealing with a client reboot situation. A server which supports 5773 delegation can be sure that no lockowners for that client have been 5774 recycled since client initialization and thus can ensure that 5775 confirmation will not be required. 5777 9.2. Lock Ranges 5779 The protocol allows a lock owner to request a lock with a byte range 5780 and then either upgrade or unlock a sub-range of the initial lock. 5781 It is expected that this will be an uncommon type of request. In any 5782 case, servers or server filesystems may not be able to support sub- 5783 range lock semantics. In the event that a server receives a locking 5784 request that represents a sub-range of current locking state for the 5785 lock owner, the server is allowed to return the error 5786 NFS4ERR_LOCK_RANGE to signify that it does not support sub-range lock 5787 operations. Therefore, the client should be prepared to receive this 5788 error and, if appropriate, report the error to the requesting 5789 application. 
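A server that does not support sub-range operations needs some way to recognize them. The following is a minimal sketch of one possible check; the helper names are hypothetical, ranges are modeled as (offset, length) pairs with a finite length, and the result "PROCEED" simply means that the request goes on to normal lock processing.

```python
def is_proper_subrange(held, request):
    """True if request lies within held but is not the same range."""
    h_start, h_len = held
    r_start, r_len = request
    h_end = h_start + h_len
    r_end = r_start + r_len
    contained = h_start <= r_start and r_end <= h_end
    return contained and (r_start, r_end) != (h_start, h_end)


def check_lock_request(owner_locks, request, supports_subrange=False):
    """Reject sub-range requests when the server cannot honor them.

    owner_locks is the list of ranges already held by the same lock
    owner on the file.
    """
    if not supports_subrange:
        for held in owner_locks:
            if is_proper_subrange(held, request):
                return "NFS4ERR_LOCK_RANGE"
    return "PROCEED"
```

With a held lock of (0, 100), a request for (10, 20) would be rejected with NFS4ERR_LOCK_RANGE, while a request for exactly (0, 100) would proceed.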
5791 The client is discouraged from combining multiple independent locking 5792 ranges that happen to be adjacent into a single request since the 5793 server may not support sub-range requests and for reasons related to 5794 the recovery of file locking state in the event of server failure. 5795 As discussed in the Section 9.6.2 below, the server may employ 5796 certain optimizations during recovery that work effectively only when 5797 the client's behavior during lock recovery is similar to the client's 5798 locking behavior prior to server failure. 5800 9.3. Upgrading and Downgrading Locks 5802 If a client has a write lock on a record, it can request an atomic 5803 downgrade of the lock to a read lock via the LOCK request, by setting 5804 the type to READ_LT. If the server supports atomic downgrade, the 5805 request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP. 5806 The client should be prepared to receive this error, and if 5807 appropriate, report the error to the requesting application. 5809 If a client has a read lock on a record, it can request an atomic 5810 upgrade of the lock to a write lock via the LOCK request by setting 5811 the type to WRITE_LT or WRITEW_LT. If the server does not support 5812 atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade 5813 can be achieved without an existing conflict, the request will 5814 succeed. Otherwise, the server will return either NFS4ERR_DENIED or 5815 NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the 5816 client issued the LOCK request with the type set to WRITEW_LT and the 5817 server has detected a deadlock. The client should be prepared to 5818 receive such errors and if appropriate, report the error to the 5819 requesting application. 5821 9.4. Blocking Locks 5823 Some clients require the support of blocking locks. 
The NFS version 5824 4 protocol must not rely on a callback mechanism and therefore is 5825 unable to notify a client when a previously denied lock has been 5826 granted. Clients have no choice but to continually poll for the 5827 lock. This presents a fairness problem. Two new lock types are 5828 added, READW and WRITEW, and are used to indicate to the server that 5829 the client is requesting a blocking lock. The server should maintain 5830 an ordered list of pending blocking locks. When the conflicting lock 5831 is released, the server may wait the lease period for the first 5832 waiting client to re-request the lock. After the lease period 5833 expires the next waiting client request is allowed the lock. Clients 5834 are required to poll at an interval sufficiently small that it is 5835 likely to acquire the lock in a timely manner. The server is not 5836 required to maintain a list of pending blocked locks as it is used to 5837 increase fairness and not correct operation. Because of the 5838 unordered nature of crash recovery, storing of lock state to stable 5839 storage would be required to guarantee ordered granting of blocking 5840 locks. 5842 Servers may also note the lock types and delay returning denial of 5843 the request to allow extra time for a conflicting lock to be 5844 released, allowing a successful return. In this way, clients can 5845 avoid the burden of needlessly frequent polling for blocking locks. 5846 The server should take care in the length of delay in the event the 5847 client retransmits the request. 5849 If a server receives a blocking lock request, denies it, and then 5850 later receives a nonblocking request for the same lock, which is also 5851 denied, then it should remove the lock in question from its list of 5852 pending blocking locks. 
Clients should use such a nonblocking 5853 request to indicate to the server that this is the last time they 5854 intend to poll for the lock, as may happen when the process 5855 requesting the lock is interrupted. This is a courtesy to the 5856 server, to prevent it from unnecessarily waiting a lease period 5857 before granting other lock requests. However, clients are not 5858 required to perform this courtesy, and servers must not depend on 5859 them doing so. Also, clients must be prepared for the possibility 5860 that this final locking request will be accepted. 5862 9.5. Lease Renewal 5864 The purpose of a lease is to allow a server to remove stale locks 5865 that are held by a client that has crashed or is otherwise 5866 unreachable. It is not a mechanism for cache consistency and lease 5867 renewals may not be denied if the lease interval has not expired. 5869 The following events cause implicit renewal of all of the leases for 5870 a given client (i.e., all those sharing a given client ID). Each of 5871 these is a positive indication that the client is still active and 5872 that the associated state held at the server, for the client, is 5873 still valid. 5875 o An OPEN with a valid client ID. 5877 o Any operation made with a valid stateid (CLOSE, DELEGPURGE, 5878 DELEGRETURN, LOCK, LOCKU, OPEN, OPEN_CONFIRM, OPEN_DOWNGRADE, 5879 READ, RENEW, SETATTR, or WRITE). This does not include the 5880 special stateids of all bits 0 or all bits 1. 5882 Note that if the client had restarted or rebooted, the client 5883 would not be making these requests without issuing the 5884 SETCLIENTID/SETCLIENTID_CONFIRM sequence. The use of the 5885 SETCLIENTID/SETCLIENTID_CONFIRM sequence (one that changes the 5886 client verifier) notifies the server to drop the locking state 5887 associated with the client. SETCLIENTID/SETCLIENTID_CONFIRM never 5888 renews a lease. 
5890 If the server has rebooted, the stateids (NFS4ERR_STALE_STATEID 5891 error) or the client ID (NFS4ERR_STALE_CLIENTID error) will not be 5892 valid hence preventing spurious renewals. 5894 This approach allows for low overhead lease renewal which scales 5895 well. In the typical case no extra RPC calls are required for lease 5896 renewal and in the worst case one RPC is required every lease period 5897 (i.e., a RENEW operation). The number of locks held by the client is 5898 not a factor since all state for the client is involved with the 5899 lease renewal action. 5901 Since all operations that create a new lease also renew existing 5902 leases, the server must maintain a common lease expiration time for 5903 all valid leases for a given client. This lease time can then be 5904 easily updated upon implicit lease renewal actions. 5906 9.6. Crash Recovery 5908 The important requirement in crash recovery is that both the client 5909 and the server know when the other has failed. Additionally, it is 5910 required that a client sees a consistent view of data across server 5911 restarts or reboots. All READ and WRITE operations that may have 5912 been queued within the client or network buffers must wait until the 5913 client has successfully recovered the locks protecting the READ and 5914 WRITE operations. 5916 9.6.1. Client Failure and Recovery 5918 In the event that a client fails, the server may recover the client's 5919 locks when the associated leases have expired. Conflicting locks 5920 from another client may only be granted after this lease expiration. 5921 If the client is able to restart or reinitialize within the lease 5922 period the client may be forced to wait the remainder of the lease 5923 period before obtaining new locks. 5925 To minimize client delay upon restart, lock requests are associated 5926 with an instance of the client by a client supplied verifier. This 5927 verifier is part of the initial SETCLIENTID call made by the client. 
The server returns a client ID as a result of the SETCLIENTID operation. The client then confirms the use of the client ID with SETCLIENTID_CONFIRM. The client ID in combination with an opaque owner field is then used by the client to identify the lock owner for OPEN. This chain of associations is then used to identify all locks for a particular client.

Since the verifier will be changed by the client upon each initialization, the server can compare a new verifier to the verifier associated with currently held locks and determine that they do not match. This signifies the client's new instantiation and subsequent loss of locking state. As a result, the server is free to release all locks held which are associated with the old client ID which was derived from the old verifier.

Note that the verifier must have the same uniqueness properties as the verifier for the COMMIT operation.

9.6.2. Server Failure and Recovery

If the server loses locking state (usually as a result of a restart or reboot), it must allow clients time to discover this fact and re-establish the lost locking state. The client must be able to re-establish the locking state without having the server deny valid requests because the server has granted conflicting access to another client. Likewise, if there is the possibility that clients have not yet re-established their locking state for a file, the server must disallow READ and WRITE operations for that file. The duration of this recovery period is equal to the duration of the lease period.

A client can determine that server failure (and thus loss of locking state) has occurred, when it receives one of two errors. The NFS4ERR_STALE_STATEID error indicates a stateid invalidated by a reboot or restart. The NFS4ERR_STALE_CLIENTID error indicates a client ID invalidated by reboot or restart.
When either of these is received, the client must establish a new client ID (see Section 9.1.1) and re-establish the locking state as discussed below.

The period of special handling of locking and READs and WRITEs, equal in duration to the lease period, is referred to as the "grace period". During the grace period, clients recover locks and the associated state by reclaim-type locking requests (i.e., LOCK requests with reclaim set to true and OPEN operations with a claim type of CLAIM_PREVIOUS). During the grace period, the server must reject READ and WRITE operations and non-reclaim locking requests (i.e., other LOCK and OPEN operations) with an error of NFS4ERR_GRACE.

If the server can reliably determine that granting a non-reclaim request will not conflict with reclamation of locks by other clients, the NFS4ERR_GRACE error does not have to be returned and the non-reclaim client request can be serviced. For the server to be able to service READ and WRITE operations during the grace period, it must again be able to guarantee that no possible conflict could arise between an impending reclaim locking request and the READ or WRITE operation. If the server is unable to offer that guarantee, the NFS4ERR_GRACE error must be returned to the client.

For a server to provide simple, valid handling during the grace period, the easiest method is to simply reject all non-reclaim locking requests and READ and WRITE operations by returning the NFS4ERR_GRACE error. However, a server may keep information about granted locks in stable storage. With this information, the server could determine if a regular lock or READ or WRITE operation can be safely processed.
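The two strategies just described (reject everything that is not a reclaim, or consult information kept in stable storage) might be sketched as follows. This is only an illustrative Python sketch under stated assumptions: the class GraceState, the method check_request, and the pending_reclaims map are hypothetical names that do not appear in the protocol, status codes are modeled as plain strings, and reclaims arriving after the grace period are conservatively refused with NFS4ERR_NO_GRACE.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_GRACE = "NFS4ERR_GRACE"
NFS4ERR_NO_GRACE = "NFS4ERR_NO_GRACE"


class GraceState:
    """Hypothetical server-side view of the grace period."""

    def __init__(self, boot_time, grace_seconds, pending_reclaims=None):
        self.grace_end = boot_time + grace_seconds
        # Optional map (file id -> number of locks still expected to be
        # reclaimed), as might be rebuilt from stable storage. None means
        # the server kept no such information.
        self.pending_reclaims = pending_reclaims

    def check_request(self, file_id, is_reclaim, now):
        in_grace = now < self.grace_end
        if is_reclaim:
            if not in_grace:
                # Conservative choice made by this sketch.
                return NFS4ERR_NO_GRACE
            if self.pending_reclaims and self.pending_reclaims.get(file_id, 0) > 0:
                self.pending_reclaims[file_id] -= 1
            return NFS4_OK
        if not in_grace:
            return NFS4_OK
        if self.pending_reclaims is None:
            # Simple strategy: reject every non-reclaim request in grace.
            return NFS4ERR_GRACE
        if self.pending_reclaims.get(file_id, 0) == 0:
            # No reclaim can still arrive for this file, so no conflict.
            return NFS4_OK
        return NFS4ERR_GRACE
```

A server without any stable-storage information would construct GraceState with pending_reclaims left as None and so fall back to the simple strategy.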
For example, if a count of locks on a given file is available in stable storage, the server can track reclaimed locks for the file and when all reclaims have been processed, non-reclaim locking requests may be processed. This way the server can ensure that non-reclaim locking requests will not conflict with potential reclaim requests. With respect to I/O requests, if the server is able to determine that there are no outstanding reclaim requests for a file by information from stable storage or another similar mechanism, the processing of I/O requests could proceed normally for the file.

To reiterate, for a server that allows non-reclaim lock and I/O requests to be processed during the grace period, it MUST determine that no lock subsequently reclaimed will be rejected and that no lock subsequently reclaimed would have prevented any I/O operation processed during the grace period.

Clients should be prepared for the return of NFS4ERR_GRACE errors for non-reclaim lock and I/O requests. In this case the client should employ a retry mechanism for the request. A delay (on the order of several seconds) between retries should be used to avoid overwhelming the server. Further discussion of the general issue is included in [20]. The client must account for the server that is able to perform I/O and non-reclaim locking requests within the grace period as well as those that cannot do so.

A reclaim-type locking request outside the server's grace period can only succeed if the server can guarantee that no conflicting lock or I/O request has been granted since reboot or restart.

A server may, upon restart, establish a new value for the lease period. Therefore, clients should, once a new client ID is established, refetch the lease_time attribute and use it as the basis for lease renewal for the lease associated with that server.
However, the server must establish, for this restart event, a grace period at least as long as the lease period for the previous server instantiation. This allows the client state obtained during the previous server instance to be reliably re-established.

9.6.3. Network Partitions and Recovery

If the duration of a network partition is greater than the lease period provided by the server, the server will not have received a lease renewal from the client. If this occurs, the server may free all locks held for the client. As a result, all stateids held by the client will become invalid or stale. Once the client is able to reach the server after such a network partition, all I/O submitted by the client with the now invalid stateids will fail with the server returning the error NFS4ERR_EXPIRED. Once this error is received, the client will suitably notify the application that held the lock.

9.6.3.1. Courtesy Locks

As a courtesy to the client or as an optimization, the server may continue to hold locks on behalf of a client for which recent communication has extended beyond the lease period. If the server receives a lock or I/O request that conflicts with one of these courtesy locks, the server MUST free the courtesy lock and grant the new request. If the server runs out of resources, it MAY free all courtesy locks. That is, the client MUST NOT make an assumption that the server has issued courtesy locks.

If the server does not reboot before the network partition is healed, when the original client tries to access a courtesy lock which was freed, the server SHOULD send back an NFS4ERR_BAD_STATEID to the client. If the client tries to access a courtesy lock which was not freed, then the server SHOULD mark all of the courtesy locks as implicitly being renewed.
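The courtesy-lock behavior described above can be illustrated with a small sketch. It is not a server implementation: the class CourtesyServer and its methods are hypothetical, each resource carries at most one exclusive lock, status codes are plain strings, and "implicit renewal" is modeled simply by pushing the client's lease expiration forward.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_DENIED = "NFS4ERR_DENIED"
NFS4ERR_BAD_STATEID = "NFS4ERR_BAD_STATEID"


class CourtesyServer:
    """Hypothetical model: one exclusive lock per resource."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.lease_expiry = {}   # client -> time at which its lease lapses
        self.locks = {}          # resource -> (client, stateid)
        self.next_stateid = 1

    def _lapsed(self, client, now):
        return now >= self.lease_expiry.get(client, 0)

    def acquire(self, client, resource, now):
        holder = self.locks.get(resource)
        if holder is not None and holder[0] != client:
            if not self._lapsed(holder[0], now):
                return NFS4ERR_DENIED, None
            # The holder's lease has lapsed, so this is a courtesy lock:
            # it is freed and the conflicting request is granted.
            del self.locks[resource]
        stateid = self.next_stateid
        self.next_stateid += 1
        self.locks[resource] = (client, stateid)
        self.lease_expiry[client] = now + self.lease_seconds
        return NFS4_OK, stateid

    def use(self, client, resource, stateid, now):
        if self.locks.get(resource) != (client, stateid):
            # The courtesy lock was freed while the client was away.
            return NFS4ERR_BAD_STATEID
        # The lock survived: treat all of the client's locks as renewed.
        self.lease_expiry[client] = now + self.lease_seconds
        return NFS4_OK
```

In this model a returning client learns lock by lock which state survived, which matches the point that it cannot assume courtesy locks were issued at all.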
When a network partition is combined with a server reboot, then both the server and client have responsibilities to ensure that the client does not reclaim a lock which it should no longer be able to access. The next sections illustrate examples of these edge conditions and the steps necessary to be undertaken to ensure proper lock semantics.

9.6.3.1.1. First Server Edge Condition

The first edge condition has the following scenario:

1.  Client A acquires a lock.

2.  Client A and server experience mutual network partition, such that client A is unable to renew its lease.

3.  Client A's lease expires, so server releases lock.

4.  Client B acquires a lock that would have conflicted with that of Client A.

5.  Client B releases the lock.

6.  Server reboots.

7.  Network partition between client A and server heals.

8.  Client A issues a RENEW operation, and gets back an NFS4ERR_STALE_CLIENTID.

9.  Client A reclaims its lock within the server's grace period.

Thus, at the final step, the server has erroneously granted client A's lock reclaim. If client B modified the object the lock was protecting, client A will experience object corruption.

9.6.3.1.2. Second Server Edge Condition

The second known edge condition follows:

1.  Client A acquires a lock.

2.  Server reboots.

3.  Client A and server experience mutual network partition, such that client A is unable to reclaim its lock within the grace period.

4.  Server's reclaim grace period ends. Client A has no locks recorded on server.

5.  Client B acquires a lock that would have conflicted with that of Client A.

6.  Client B releases the lock.

7.  Server reboots a second time.

8.  Network partition between client A and server heals.

9.  Client A issues a RENEW operation, and gets back an NFS4ERR_STALE_CLIENTID.

10.
Client A reclaims its lock within the server's grace period.

As with the first edge condition, the final step of the scenario of the second edge condition has the server erroneously granting client A's lock reclaim.

9.6.3.1.3. Handling Server Edge Conditions

Solving these edge conditions requires that the server either assume after it reboots that an edge condition has occurred, and thus return NFS4ERR_NO_GRACE for all reclaim attempts, or that the server record some information in stable storage. The amount of information the server records in stable storage is in inverse proportion to how harsh the server wants to be whenever the edge conditions occur. The server that is completely tolerant of all edge conditions will record in stable storage every lock that is acquired, removing the lock record from stable storage only when the lock is unlocked by the client and the lock's lockowner advances the sequence number such that the lock release is not the last stateful event for the lockowner's sequence. For the two aforementioned edge conditions, the harshest a server can be, and still support a grace period for reclaims, requires that the server record some minimal information in stable storage. For example, a server implementation could, for each client, save in stable storage a record containing:

o  the client's id string

o  a boolean that indicates if the client's lease expired or if there was administrative intervention (see Section 9.8) to revoke a byte-range lock, share reservation, or delegation

o  a timestamp that is updated the first time after a server boot or reboot the client acquires byte-range locking, share reservation, or delegation state on the server. The timestamp need not be updated on subsequent lock requests until the server reboots.
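One possible shape for such a record, together with the two most recent reboot timestamps that the text goes on to mention, is sketched below. The names ClientRecord, StableStore, and their methods are hypothetical, an in-memory dictionary stands in for stable storage, and the reclaim test reflects one reading of how the record addresses the two edge conditions rather than a required algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

NFS4_OK = "NFS4_OK"
NFS4ERR_NO_GRACE = "NFS4ERR_NO_GRACE"


@dataclass
class ClientRecord:
    """Hypothetical per-client record kept in stable storage."""
    id_string: bytes
    expired_or_revoked: bool = False
    # Set the first time the client acquires state after a server boot.
    first_state_time: Optional[float] = None


class StableStore:
    def __init__(self) -> None:
        self.clients: Dict[bytes, ClientRecord] = {}
        self.boot_times: List[float] = []  # the two most recent boots

    def record_boot(self, now: float) -> None:
        self.boot_times = (self.boot_times + [now])[-2:]

    def note_state_acquired(self, id_string: bytes, now: float) -> None:
        # record_boot() must have been called at least once before this.
        rec = self.clients.setdefault(id_string, ClientRecord(id_string))
        current_boot = self.boot_times[-1]
        # Only the first acquisition since the current boot updates it.
        if rec.first_state_time is None or rec.first_state_time < current_boot:
            rec.first_state_time = now
        # Assumption: freshly granted state clears an earlier expiry mark.
        rec.expired_or_revoked = False

    def note_expired_or_revoked(self, id_string: bytes) -> None:
        if id_string in self.clients:
            self.clients[id_string].expired_or_revoked = True

    def may_reclaim(self, id_string: bytes) -> str:
        rec = self.clients.get(id_string)
        if rec is None or rec.first_state_time is None:
            return NFS4ERR_NO_GRACE
        if rec.expired_or_revoked:
            return NFS4ERR_NO_GRACE  # first edge condition
        if len(self.boot_times) == 2 and rec.first_state_time < self.boot_times[0]:
            # State predates the previous incarnation and was never
            # re-established during it: second edge condition.
            return NFS4ERR_NO_GRACE
        return NFS4_OK
```

False positives are possible with this test (a reclaim may be refused although no conflicting lock was ever granted), which is the trade-off a minimal record accepts.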
The server implementation would also record in the stable storage the timestamps from the two most recent server reboots.

Assuming the above record keeping, for the first edge condition, after the server reboots, the record that client A's lease expired means that another client could have acquired a conflicting record lock, share reservation, or delegation. Hence the server must reject a reclaim from client A with the error NFS4ERR_NO_GRACE.

For the second edge condition, after the server reboots for a second time, the record that the client had an unexpired record lock, share reservation, or delegation established before the server's previous incarnation means that the server must reject a reclaim from client A with the error NFS4ERR_NO_GRACE.

Regardless of the level and approach to record keeping, the server MUST implement one of the following strategies (which apply to reclaims of share reservations, byte-range locks, and delegations):

1.  Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely harsh, but necessary if the server does not want to record lock state in stable storage.

2.  Record sufficient state in stable storage such that all known edge conditions involving server reboot, including the two noted in this section, are detected. False positives are acceptable.

    Note that at this time, it is not known if there are other edge conditions. In the event that, after a server reboot, the server determines that there is unrecoverable damage or corruption to the stable storage, then for all clients and/or locks affected, the server MUST return NFS4ERR_NO_GRACE.

9.6.3.1.4. Client Edge Condition

A third edge condition affects the client and not the server.
If the server reboots in the middle of the client reclaiming some locks and then a network partition is established, the client might be in the situation of having reclaimed some, but not all locks. In that case, a conservative client would assume that the non-reclaimed locks were revoked.

The third known edge condition follows:

1.  Client A acquires a lock 1.

2.  Client A acquires a lock 2.

3.  Server reboots.

4.  Client A issues a RENEW operation, and gets back an NFS4ERR_STALE_CLIENTID.

5.  Client A reclaims its lock 1 within the server's grace period.

6.  Client A and server experience mutual network partition, such that client A is unable to reclaim its remaining locks within the grace period.

7.  Server's reclaim grace period ends.

8.  Client B acquires a lock that would have conflicted with Client A's lock 2.

9.  Client B releases the lock.

10. Server reboots a second time.

11. Network partition between client A and server heals.

12. Client A issues a RENEW operation, and gets back an NFS4ERR_STALE_CLIENTID.

13. Client A reclaims both lock 1 and lock 2 within the server's grace period.

At the last step, the client reclaims lock 2 as if it had held that lock continuously, when in fact a conflicting lock was granted to client B.

A server could avoid this situation by rejecting the reclaim of lock 2. However, to do so accurately it would have to ensure that additional information about individual locks held survives reboot. Server implementations are not required to do that, so the client must not assume that the server will.

Instead, a client MUST reclaim only those locks which it successfully acquired from the previous server instance, omitting any that it failed to reclaim before a new reboot. Thus, in the last step above, client A should reclaim only lock 1.

9.6.3.1.5.
Client's Handling of NFS4ERR_NO_GRACE

A mandate for the client's handling of the NFS4ERR_NO_GRACE error is outside the scope of this specification, since the strategies for such handling are very dependent on the client's operating environment. However, one potential approach is described below.

When the client receives NFS4ERR_NO_GRACE, it could examine the change attribute of the objects the client is trying to reclaim state for, and use that to determine whether to re-establish the state via normal OPEN or LOCK requests. This is acceptable provided the client's operating environment allows it. In other words, the client implementor is advised to document this behavior for users. The client could also inform the application that its byte-range lock or share reservations (whether they were delegated or not) have been lost, such as via a UNIX signal, a GUI pop-up window, etc. See Section 10.5 for a discussion of what the client should do when dealing with unreclaimed delegations on client state.

For further discussion of revocation of locks see Section 9.8.

9.6.3.2. Client's Reaction to a Freed Lock

There is no way for a client to predetermine how a given server is going to behave during a network partition. When the partition heals, either the client still has all of its locks, it has some of its locks, or it has none of them. The client will be able to examine the various error return values to determine its response.

NFS4ERR_EXPIRED:

   All locks have been revoked during the partition. The client should use a SETCLIENTID to recover.

NFS4ERR_ADMIN_REVOKED:

   The current lock has been revoked during the partition and there is no clue as to whether the server rebooted.

NFS4ERR_BAD_STATEID:

   The current lock has been revoked during the partition and the server did not reboot. Other locks MAY still be renewed.
   The client might not want to do a SETCLIENTID and instead SHOULD probe via a RENEW call.

NFS4ERR_NO_GRACE:

   The current lock has been revoked during the partition and the server rebooted. The server might have no information on the other locks. They may still be renewable.

NFS4ERR_OLD_STATEID:

   The server has not rebooted. The client SHOULD handle this error as it normally would.

9.7. Recovery from a Lock Request Timeout or Abort

In the event a lock request times out, a client may decide to not retry the request. The client may also abort the request when the process for which it was issued is terminated (e.g., in UNIX due to a signal). It is possible though that the server received the request and acted upon it. This would change the state on the server without the client being aware of the change. It is paramount that the client re-synchronize state with the server before it attempts any other operation that takes a seqid and/or a stateid with the same lock_owner. This is straightforward to do without a special re-synchronize operation.

Since the server maintains the last lock request and response received on the lock_owner, for each lock_owner, the client should cache the last lock request it sent that did not receive a response. From this, the next time the client does a lock operation for the lock_owner, it can send the cached request, if there is one, and if the request was one that established state (e.g., a LOCK or OPEN operation), the server will return the cached result or, if it never saw the request, perform it. The client can follow up with a request to remove the state (e.g., a LOCKU or CLOSE operation). With this approach, the sequencing and stateid information on the client and server for the given lock_owner will re-synchronize and in turn the lock state will re-synchronize.
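The re-synchronization approach above can be illustrated with a toy model of both sides. Everything in it is an assumption made for illustration: the classes Server and Client, the fate argument that simulates a lost request or a lost reply, and the use of a single seqid counter per lock_owner with replies modeled as tuples.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_BAD_SEQID = "NFS4ERR_BAD_SEQID"


class Server:
    """Keeps the last seqid and reply per lock_owner (hypothetical model)."""

    def __init__(self):
        self.last = {}       # owner -> (seqid, reply)
        self.executed = []   # operations actually performed, in order

    def handle(self, owner, seqid, op):
        last_seqid, last_reply = self.last.get(owner, (0, None))
        if seqid == last_seqid and last_reply is not None:
            return last_reply            # retransmission: cached result
        if seqid != last_seqid + 1:
            return (NFS4ERR_BAD_SEQID, op)
        self.executed.append(op)         # first sighting: perform it
        reply = (NFS4_OK, op)
        self.last[owner] = (seqid, reply)
        return reply


class Client:
    def __init__(self, server, owner):
        self.server, self.owner = server, owner
        self.next_seqid = 1
        self.unanswered = None           # cached (seqid, op) lacking a reply

    def send(self, op, fate="ok"):
        if self.unanswered is not None:
            # Re-synchronize first by resending the cached request.
            seqid, old_op = self.unanswered
            self.server.handle(self.owner, seqid, old_op)
            self.unanswered = None
            self.next_seqid = seqid + 1
        seqid = self.next_seqid
        self.unanswered = (seqid, op)
        if fate == "request_lost":
            return None                  # the server never saw the request
        reply = self.server.handle(self.owner, seqid, op)
        if fate == "reply_lost":
            return None                  # the server acted; client timed out
        self.unanswered = None
        self.next_seqid = seqid + 1
        return reply
```

Whether the request or only the reply was lost, resending the cached request leaves both sides agreeing on the sequence number, and each operation is performed exactly once.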
9.8. Server Revocation of Locks

At any point, the server can revoke locks held by a client and the client must be prepared for this event. When the client detects that its locks have been or may have been revoked, the client is responsible for validating the state information between itself and the server. Validating locking state for the client means that it must verify or reclaim state for each lock currently held.

The first instance of lock revocation is upon server reboot or re-initialization. In this instance the client will receive an error (NFS4ERR_STALE_STATEID or NFS4ERR_STALE_CLIENTID) and the client will proceed with normal crash recovery as described in the previous section.

The second lock revocation event is the inability to renew the lease before expiration. While this is considered a rare or unusual event, the client must be prepared to recover. Both the server and client will be able to detect the failure to renew the lease and are capable of recovering without data corruption. The server tracks the last renewal event serviced for the client and knows when the lease will expire. Similarly, the client must track operations which will renew the lease period. Using the time that each such request was sent and the time that the corresponding reply was received, the client should bound the time that the corresponding renewal could have occurred on the server and thus determine if it is possible that a lease period expiration could have occurred.

The third lock revocation event can occur as a result of administrative intervention within the lease period. While this is considered a rare event, it is possible that the server's administrator has decided to release or revoke a particular lock held by the client. As a result of revocation, the client will receive an error of NFS4ERR_ADMIN_REVOKED.
In this instance the client may assume that only the lock_owner's locks have been lost. The client notifies the lock holder appropriately. The client may not assume the lease period has been renewed as a result of a failed operation.

When the client determines the lease period may have expired, the client must mark all locks held for the associated lease as "unvalidated". This means the client has been unable to re-establish or confirm the appropriate lock state with the server. As described in Section 9.6, there are scenarios in which the server may grant conflicting locks after the lease period has expired for a client. When it is possible that the lease period has expired, the client must validate each lock currently held to ensure that a conflicting lock has not been granted. The client may accomplish this task by issuing an I/O request, either a pending I/O or a zero-length read, specifying the stateid associated with the lock in question. If the response to the request is success, the client has validated all of the locks governed by that stateid and re-established the appropriate state between itself and the server.

If the I/O request is not successful, then one or more of the locks associated with the stateid was revoked by the server and the client must notify the owner.

9.9. Share Reservations

A share reservation is a mechanism to control access to a file. It is a separate and independent mechanism from byte-range locking. When a client opens a file, it issues an OPEN operation to the server specifying the type of access required (READ, WRITE, or BOTH) and the type of access to deny others (OPEN4_SHARE_DENY_NONE, OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, or OPEN4_SHARE_DENY_BOTH). If the OPEN fails, the client will fail the application's open request.
Pseudo-code definition of the semantics:

   if (request.access == 0)
       return (NFS4ERR_INVAL)
   else if ((request.access & file_state.deny) ||
            (request.deny & file_state.access))
       return (NFS4ERR_DENIED)

This checking of share reservations on OPEN is done with no exception for an existing OPEN for the same open_owner.

The constants used for the OPEN and OPEN_DOWNGRADE operations for the access and deny fields are as follows:

   const OPEN4_SHARE_ACCESS_READ   = 0x00000001;
   const OPEN4_SHARE_ACCESS_WRITE  = 0x00000002;
   const OPEN4_SHARE_ACCESS_BOTH   = 0x00000003;

   const OPEN4_SHARE_DENY_NONE     = 0x00000000;
   const OPEN4_SHARE_DENY_READ     = 0x00000001;
   const OPEN4_SHARE_DENY_WRITE    = 0x00000002;
   const OPEN4_SHARE_DENY_BOTH     = 0x00000003;

9.10. OPEN/CLOSE Operations

To provide correct share semantics, a client MUST use the OPEN operation to obtain the initial filehandle and indicate the desired access and what access, if any, to deny. Even if the client intends to use a stateid of all 0's or all 1's, it must still obtain the filehandle for the regular file with the OPEN operation so the appropriate share semantics can be applied. Clients that do not have a deny mode built into their programming interfaces for opening a file should request a deny mode of OPEN4_SHARE_DENY_NONE.

The OPEN operation with the CREATE flag also subsumes the CREATE operation for regular files as used in previous versions of the NFS protocol. This allows a create with a share to be done atomically.

The CLOSE operation removes all share reservations held by the lock_owner on that file. If byte-range locks are held, the client SHOULD release all locks before issuing a CLOSE. The server MAY free all outstanding locks on CLOSE but some servers may not support the CLOSE of a file that still has byte-range locks held.
The server MUST return failure, NFS4ERR_LOCKS_HELD, if any locks would exist after the CLOSE.

The LOOKUP operation will return a filehandle without establishing any lock state on the server. Without a valid stateid, the server will assume the client has the least access. For example, if one client opened a file with OPEN4_SHARE_DENY_BOTH and another client accesses the file via a filehandle obtained through LOOKUP, the second client could only read the file using the special read bypass stateid. The second client could not WRITE the file at all because it would not have a valid stateid from OPEN and the special anonymous stateid would not be allowed access.

9.10.1. Close and Retention of State Information

Since a CLOSE operation requests deallocation of a stateid, dealing with retransmission of the CLOSE may pose special difficulties, since the state information, which normally would be used to determine the state of the open file being designated, might be deallocated, resulting in an NFS4ERR_BAD_STATEID error.

Servers may deal with this problem in a number of ways. To provide the greatest degree of assurance that the protocol is being used properly, a server should, rather than deallocate the stateid, mark it as close-pending, and retain the stateid with this status, until later deallocation. In this way, a retransmitted CLOSE can be recognized since the stateid points to state information with this distinctive status, so that it can be handled without error.

When adopting this strategy, a server should retain the state information until the earliest of:

o  Another validly sequenced request for the same lockowner, that is not a retransmission.

o  The time that a lockowner is freed by the server due to a period with no activity.

o  All locks for the client are freed as a result of a SETCLIENTID.
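The close-pending strategy and its three release conditions can be sketched as below. The class OpenStateTable and its method names are hypothetical, stateids are plain integers, and sequencing checks are assumed to have been done by the caller before new_request is invoked.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_BAD_STATEID = "NFS4ERR_BAD_STATEID"

OPEN, CLOSE_PENDING = "open", "close-pending"


class OpenStateTable:
    """Hypothetical table of open state kept by a server."""

    def __init__(self):
        self.states = {}  # stateid -> [lockowner, status]

    def add_open(self, stateid, owner):
        self.states[stateid] = [owner, OPEN]

    def close(self, stateid):
        entry = self.states.get(stateid)
        if entry is None:
            return NFS4ERR_BAD_STATEID
        # First CLOSE or a retransmission: keep the entry, marked
        # close-pending, so that a repeat is handled without error.
        entry[1] = CLOSE_PENDING
        return NFS4_OK

    def new_request(self, owner):
        """Another validly sequenced, non-retransmitted request arrived."""
        self.states = {sid: e for sid, e in self.states.items()
                       if not (e[0] == owner and e[1] == CLOSE_PENDING)}

    def owner_freed(self, owner):
        """The lockowner was freed after a period with no activity."""
        self.states = {sid: e for sid, e in self.states.items()
                       if e[0] != owner}

    def client_reset(self):
        """All state for the client was freed by a SETCLIENTID."""
        self.states.clear()
```

The cost of this strategy is the extra bookkeeping; the alternative described next in the text trades that bookkeeping for weaker error checking.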
Servers may avoid this complexity, at the cost of less complete protocol error checking, by simply responding NFS4_OK in the event of a CLOSE for a deallocated stateid, on the assumption that this case must be caused by a retransmitted close. When adopting this approach, it is desirable to at least log an error when returning a no-error indication in this situation. If the server maintains a reply-cache mechanism, it can verify the CLOSE is indeed a retransmission and avoid error logging in most cases.

9.11. Open Upgrade and Downgrade

When an OPEN is done for a file and the lockowner for which the open is being done already has the file open, the result is to upgrade the open file status maintained on the server to include the access and deny bits specified by the new OPEN as well as those for the existing OPEN. The result is that there is one open file, as far as the protocol is concerned, and it includes the union of the access and deny bits for all of the OPEN requests completed. Only a single CLOSE will be done to reset the effects of both OPENs. Note that the client, when issuing the OPEN, may not know that the same file is in fact being opened. The above only applies if both OPENs result in the OPENed object being designated by the same filehandle.

When the server chooses to export multiple filehandles corresponding to the same file object and returns different filehandles on two different OPENs of the same file object, the server MUST NOT "OR" together the access and deny bits and coalesce the two open files. Instead the server must maintain separate OPENs with separate stateids and will require separate CLOSEs to free them.
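A compact sketch of open upgrade follows, combining it with the share reservation check from Section 9.9. The class OpenTable, its methods, and the file_id argument (standing for the underlying file object) are hypothetical; the constants carry the values given in Section 9.9, and status codes are plain strings.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_INVAL = "NFS4ERR_INVAL"
NFS4ERR_DENIED = "NFS4ERR_DENIED"

OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003
OPEN4_SHARE_DENY_NONE = 0x00000000
OPEN4_SHARE_DENY_WRITE = 0x00000002


class OpenTable:
    """Hypothetical server table of open files."""

    def __init__(self):
        # (owner, filehandle) -> (file_id, access bits, deny bits)
        self.opens = {}

    def open(self, owner, file_id, fh, access, deny):
        if access == 0:
            return NFS4ERR_INVAL
        # Share check against every existing open of the file object,
        # with no exception for opens by the same owner.
        for other_file, acc, den in self.opens.values():
            if other_file == file_id and ((access & den) or (deny & acc)):
                return NFS4ERR_DENIED
        # Upgrade: bits are OR-ed only when owner and filehandle match.
        _, old_acc, old_den = self.opens.get((owner, fh), (file_id, 0, 0))
        self.opens[(owner, fh)] = (file_id, old_acc | access, old_den | deny)
        return NFS4_OK

    def close(self, owner, fh):
        # A single CLOSE undoes every OPEN merged into this entry.
        return self.opens.pop((owner, fh), None) is not None
```

Keying the table on the (owner, filehandle) pair is what keeps two filehandles for the same file object from being coalesced: each gets its own entry and needs its own CLOSE.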
When multiple open files on the client are merged into a single open file object on the server, the close of one of the open files (on the client) may necessitate a change of the access and deny status of the open file on the server. This is because the union of the access and deny bits for the remaining opens may be smaller (i.e., a proper subset) than previously. The OPEN_DOWNGRADE operation is used to make the necessary change and the client should use it to update the server so that share reservation requests by other clients are handled properly. The stateid returned has the same "other" field as that passed to the server. The "seqid" value in the returned stateid MUST be incremented, even in situations in which there is no change to the access and deny bits for the file.

9.12. Short and Long Leases

When determining the time period for the server lease, the usual lease tradeoffs apply. Short leases are good for fast server recovery at a cost of increased RENEW or READ (with zero length) requests. Longer leases are certainly kinder and gentler to servers trying to handle very large numbers of clients. The number of RENEW requests drops as the lease time increases. The disadvantages of long leases are slower recovery after server failure (the server must wait for the leases to expire and the grace period to elapse before granting new lock requests) and increased file contention (if a client fails to transmit an unlock request then the server must wait for lease expiration before granting new locks).

Long leases are usable if the server is able to store lease state in non-volatile memory. Upon recovery, the server can reconstruct the lease state from its non-volatile memory and continue operation with its clients and therefore long leases would not be an issue.

9.13.
Clocks, Propagation Delay, and Calculating Lease Expiration

To avoid the need for synchronized clocks, lease times are granted by the server as a time delta. However, there is a requirement that the client and server clocks do not drift excessively over the duration of the lock. There is also the issue of propagation delay across the network which could easily be several hundred milliseconds as well as the possibility that requests will be lost and need to be retransmitted.

To take propagation delay into account, the client should subtract it from lease times (e.g., if the client estimates the one-way propagation delay as 200 msec, then it can assume that the lease is already 200 msec old when it gets it). In addition, it will take another 200 msec to get a response back to the server. So the client must send a lock renewal or write data back to the server 400 msec before the lease would expire.

The server's lease period configuration should take into account the network distance of the clients that will be accessing the server's resources. It is expected that the lease period will take into account the network propagation delays and other network delay factors for the client population. Since the protocol does not allow for an automatic method to determine an appropriate lease period, the server's administrator may have to tune the lease period.

9.14. Migration, Replication and State

When responsibility for handling a given file system is transferred to a new server (migration) or the client chooses to use an alternate server (e.g., in response to server unresponsiveness) in the context of file system replication, the appropriate handling of state shared between the client and server (i.e., locks, leases, stateids, and client IDs) is as described below. The handling differs between migration and replication.
For related discussion of file server 6583 state and recover of such see the sections under Section 9.6. 6585 If a server replica or a server immigrating a filesystem agrees to, 6586 or is expected to, accept opaque values from the client that 6587 originated from another server, then it is a wise implementation 6588 practice for the servers to encode the "opaque" values in network 6589 byte order. This way, servers acting as replicas or immigrating 6590 filesystems will be able to parse values like stateids, directory 6591 cookies, filehandles, etc. even if their native byte order is 6592 different from other servers cooperating in the replication and 6593 migration of the filesystem. 6595 9.14.1. Migration and State 6597 In the case of migration, the servers involved in the migration of a 6598 filesystem SHOULD transfer all server state from the original to the 6599 new server. This must be done in a way that is transparent to the 6600 client. This state transfer will ease the client's transition when a 6601 filesystem migration occurs. If the servers are successful in 6602 transferring all state, the client will continue to use stateids 6603 assigned by the original server. Therefore the new server must 6604 recognize these stateids as valid. This holds true for the client ID 6605 as well. Since responsibility for an entire filesystem is 6606 transferred with a migration event, there is no possibility that 6607 conflicts will arise on the new server as a result of the transfer of 6608 locks. 6610 As part of the transfer of information between servers, leases would 6611 be transferred as well. The leases being transferred to the new 6612 server will typically have a different expiration time from those for 6613 the same client, previously on the old server. 
To maintain the 6614 property that all leases on a given server for a given client expire 6615 at the same time, the server should advance the expiration time to 6616 the later of the leases being transferred or the leases already 6617 present. This allows the client to maintain lease renewal of both 6618 classes without special effort. 6620 The servers may choose not to transfer the state information upon 6621 migration. However, this choice is discouraged. In this case, when 6622 the client presents state information from the original server (e.g., 6623 in a RENEW op or a READ op of zero length), the client must be 6624 prepared to receive either NFS4ERR_STALE_CLIENTID or 6625 NFS4ERR_STALE_STATEID from the new server. The client should then 6626 recover its state information as it normally would in response to a 6627 server failure. The new server must take care to allow for the 6628 recovery of state information as it would in the event of server 6629 restart. 6631 A client SHOULD re-establish new callback information with the new 6632 server as soon as possible, according to sequences described in 6633 Section 15.35 and Section 15.36. This ensures that server operations 6634 are not blocked by the inability to recall delegations. 6636 9.14.2. Replication and State 6638 Since client switch-over in the case of replication is not under 6639 server control, the handling of state is different. In this case, 6640 leases, stateids and client IDs do not have validity across a 6641 transition from one server to another. The client must re-establish 6642 its locks on the new server. This can be compared to the re- 6643 establishment of locks by means of reclaim-type requests after a 6644 server reboot. The difference is that the server has no provision to 6645 distinguish requests reclaiming locks from those obtaining new locks 6646 or to defer the latter. 
Thus, a client re-establishing a lock on the 6647 new server (by means of a LOCK or OPEN request), may have the 6648 requests denied due to a conflicting lock. Since replication is 6649 intended for read-only use of filesystems, such denial of locks 6650 should not pose large difficulties in practice. When an attempt to 6651 re-establish a lock on a new server is denied, the client should 6652 treat the situation as if his original lock had been revoked. 6654 9.14.3. Notification of Migrated Lease 6656 In the case of lease renewal, the client may not be submitting 6657 requests for a filesystem that has been migrated to another server. 6658 This can occur because of the implicit lease renewal mechanism. The 6659 client renews leases for all filesystems when submitting a request to 6660 any one filesystem at the server. 6662 In order for the client to schedule renewal of leases that may have 6663 been relocated to the new server, the client must find out about 6664 lease relocation before those leases expire. To accomplish this, all 6665 operations which implicitly renew leases for a client (such as OPEN, 6666 CLOSE, READ, WRITE, RENEW, LOCK, and others), will return the error 6667 NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be 6668 renewed has been transferred to a new server. This condition will 6669 continue until the client receives an NFS4ERR_MOVED error and the 6670 server receives the subsequent GETATTR(fs_locations) for an access to 6671 each filesystem for which a lease has been moved to a new server. By 6672 convention, the compound including the GETATTR(fs_locations) SHOULD 6673 append a RENEW operation to permit the server to identify the client 6674 doing the access. 6676 Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports 6677 filesystem migration MUST probe all filesystems from that server on 6678 which it holds open state. 
Once the client has successfully probed 6679 all those filesystems which are migrated, the server MUST resume 6680 normal handling of stateful requests from that client. 6682 In order to support legacy clients that do not handle the 6683 NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after 6684 a wait of at least two lease periods, at which time it will resume 6685 normal handling of stateful requests from all clients. If a client 6686 attempts to access the migrated files, the server MUST reply 6687 NFS4ERR_MOVED. 6689 When the client receives an NFS4ERR_MOVED error, the client can 6690 follow the normal process to obtain the new server information 6691 (through the fs_locations attribute) and perform renewal of those 6692 leases on the new server. If the server has not had state 6693 transferred to it transparently, the client will receive either 6694 NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server, 6695 as described above. The client can then recover state information as 6696 it does in the event of server failure. 6698 9.14.4. Migration and the Lease_time Attribute 6700 In order that the client may appropriately manage its leases in the 6701 case of migration, the destination server must establish proper 6702 values for the lease_time attribute. 6704 When state is transferred transparently, that state should include 6705 the correct value of the lease_time attribute. The lease_time 6706 attribute on the destination server must never be less than that on 6707 the source since this would result in premature expiration of leases 6708 granted by the source server. Upon migration in which state is 6709 transferred transparently, the client is under no obligation to re- 6710 fetch the lease_time attribute and may continue to use the value 6711 previously fetched (on the source server). 
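The lease arithmetic described in Section 9.13, Section 9.14.1, and the paragraph above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the use of integer milliseconds and seconds, and the idea that the client records the time at which it received the lease are all hypothetical and are not defined by the protocol.

```python
def latest_safe_renew_time(received_at_ms: int, lease_time_ms: int,
                           one_way_delay_ms: int) -> int:
    """Latest client-clock time (msec) at which a renewal can still be sent.

    Assumption: the lease is already one_way_delay_ms old when the client
    receives it, and the renewal needs another one_way_delay_ms to reach
    the server, so two one-way delays are subtracted (Section 9.13).
    """
    return received_at_ms + lease_time_ms - 2 * one_way_delay_ms


def merged_expiration(transferred_expiry: int, existing_expiry: int) -> int:
    """Section 9.14.1: after migration, all leases of a client expire at
    the later of the transferred and the already-present expiration."""
    return max(transferred_expiry, existing_expiry)


def destination_lease_time(source_lease_time: int,
                           configured_lease_time: int) -> int:
    """Section 9.14.4: the destination never uses a lease_time smaller
    than the source server's, whatever its own configuration says."""
    return max(source_lease_time, configured_lease_time)
```

With a 90-second lease and the 200 msec one-way delay used in the example of Section 9.13, `latest_safe_renew_time(0, 90_000, 200)` gives 89,600 msec, which is 400 msec before the nominal expiration.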
6713 If state has not been transferred transparently (i.e., the client
6714 sees a real or simulated server reboot), the client should fetch the
6715 value of lease_time on the new (i.e., destination) server, and use it
6716 for subsequent locking requests. However, the server must respect a
6717 grace period at least as long as the lease_time on the source server,
6718 in order to ensure that clients have ample time to reclaim their
6719 locks before potentially conflicting non-reclaimed locks are granted.
6720 The means by which the new server obtains the value of lease_time on
6721 the old server is left to the server implementations. It is not
6722 specified by the NFS version 4 protocol.

6724 10. Client-Side Caching

6726 Client-side caching of data, of file attributes, and of file names is
6727 essential to providing good performance with the NFS protocol.
6728 Providing distributed cache coherence is a difficult problem and
6729 previous versions of the NFS protocol have not attempted it.
6730 Instead, several NFS client implementation techniques have been used
6731 to reduce the problems that a lack of coherence poses for users.
6732 These techniques have not been clearly defined by earlier protocol
6733 specifications and it is often unclear what is valid or invalid
6734 client behavior.

6736 The NFSv4 protocol uses many techniques similar to those that have
6737 been used in previous protocol versions. The NFSv4 protocol does not
6738 provide distributed cache coherence. However, it defines a more
6739 limited set of caching guarantees to allow locks and share
6740 reservations to be used without destructive interference from client
6741 side caching.

6743 In addition, the NFSv4 protocol introduces a delegation mechanism
6744 which allows many decisions normally made by the server to be made
6745 locally by clients. This mechanism provides efficient support of the
6746 common cases where sharing is infrequent or where sharing is read-
6747 only.

6749 10.1. Performance Challenges for Client-Side Caching

6751 Caching techniques used in previous versions of the NFS protocol have
6752 been successful in providing good performance. However, several
6753 scalability challenges can arise when those techniques are used with
6754 very large numbers of clients. This is particularly true when
6755 clients are geographically distributed, which classically increases
6756 the latency for cache re-validation requests.

6758 The previous versions of the NFS protocol repeat their file data
6759 cache validation requests at the time the file is opened. This
6760 behavior can have serious performance drawbacks. A common case is
6761 one in which a file is only accessed by a single client. Therefore,
6762 sharing is infrequent.

6764 In this case, repeated reference to the server to find that no
6765 conflicts exist is expensive. A better option with regard to
6766 performance is to allow a client that repeatedly opens a file to do
6767 so without reference to the server. This is done until potentially
6768 conflicting operations from another client actually occur.

6770 A similar situation arises in connection with file locking. Sending
6771 file lock and unlock requests to the server as well as the read and
6772 write requests necessary to make data caching consistent with the
6773 locking semantics (see Section 10.3.2) can severely limit
6774 performance. When locking is used to provide protection against
6775 infrequent conflicts, a large penalty is incurred. This penalty may
6776 discourage the use of file locking by applications.

6778 The NFSv4 protocol provides more aggressive caching strategies with
6779 the following design goals:

6781 o Compatibility with a large range of server semantics.

6783 o Provide the same caching benefits as previous versions of the NFS
6784 protocol when unable to provide the more aggressive model.

6786 o Requirements for aggressive caching are organized so that a large
6787 portion of the benefit can be obtained even when not all of the
6788 requirements can be met.

6790 The appropriate requirements for the server are discussed in later
6791 sections in which specific forms of caching are covered (see
6792 Section 10.4).

6794 10.2. Delegation and Callbacks

6796 Recallable delegation of server responsibilities for a file to a
6797 client improves performance by avoiding repeated requests to the
6798 server in the absence of inter-client conflict. With the use of a
6799 "callback" RPC from server to client, a server recalls delegated
6800 responsibilities when another client engages in sharing of a
6801 delegated file.

6803 A delegation is passed from the server to the client, specifying the
6804 object of the delegation and the type of delegation. There are
6805 different types of delegations but each type contains a stateid to be
6806 used to represent the delegation when performing operations that
6807 depend on the delegation. This stateid is similar to those
6808 associated with locks and share reservations but differs in that the
6809 stateid for a delegation is associated with a client ID and may be
6810 used on behalf of all the open_owners for the given client. A
6811 delegation is made to the client as a whole and not to any specific
6812 process or thread of control within it.

6814 Because callback RPCs may not work in all environments (due to
6815 firewalls, for example), correct protocol operation does not depend
6816 on them. Preliminary testing of callback functionality by means of a
6817 CB_NULL procedure determines whether callbacks can be supported. The
6818 CB_NULL procedure checks the continuity of the callback path. A
6819 server makes a preliminary assessment of callback availability to a
6820 given client and avoids delegating responsibilities until it has
6821 determined that callbacks are supported. Because the granting of a
6822 delegation is always conditional upon the absence of conflicting
6823 access, clients must not assume that a delegation will be granted and
6824 they must always be prepared for OPENs to be processed without any
6825 delegations being granted.

6827 Once granted, a delegation behaves in most ways like a lock. There
6828 is an associated lease that is subject to renewal together with all
6829 of the other leases held by that client.

6831 Unlike locks, an operation by a second client to a delegated file
6832 will cause the server to recall a delegation through a callback.

6834 On recall, the client holding the delegation must flush modified
6835 state (such as modified data) to the server and return the
6836 delegation. The conflicting request will not be acted on until the
6837 recall is complete. The recall is considered complete when the
6838 client returns the delegation or the server times out its wait for the
6839 delegation to be returned and revokes the delegation as a result of
6840 the timeout. In the interim, the server will either delay responding
6841 to conflicting requests or respond to them with NFS4ERR_DELAY.
6842 Following the resolution of the recall, the server has the
6843 information necessary to grant or deny the second client's request.

6845 At the time the client receives a delegation recall, it may have
6846 substantial state that needs to be flushed to the server. Therefore,
6847 the server should allow sufficient time for the delegation to be
6848 returned since it may involve numerous RPCs to the server. If the
6849 server is able to determine that the client is diligently flushing
6850 state to the server as a result of the recall, the server may extend
6851 the usual time allowed for a recall. However, the time allowed for
6852 recall completion should not be unbounded.

6854 An example of this is when responsibility to mediate opens on a given
6855 file is delegated to a client (see Section 10.4). The server will
6856 not know what opens are in effect on the client. Without this
6857 knowledge the server will be unable to determine if the access and
6858 deny state for the file allows any particular open until the
6859 delegation for the file has been returned.

6861 A client failure or a network partition can result in failure to
6862 respond to a recall callback. In this case, the server will revoke
6863 the delegation which in turn will render useless any modified state
6864 still on the client.

6866 Clients need to be aware that server implementors may enforce
6867 practical limitations on the number of delegations issued. Further,
6868 as there is no way to determine which delegations to revoke, the
6869 server is allowed to revoke any. If the server is implemented to
6870 revoke another delegation held by that client, then the client may be
6871 able to determine that a limit has been reached because each new
6872 delegation request results in a revoke. The client could then
6873 determine which delegations it may not need and preemptively release
6874 them.

6876 10.2.1. Delegation Recovery

6878 There are three situations that delegation recovery must deal with:

6880 o Client reboot or restart

6882 o Server reboot or restart

6884 o Network partition (full or callback-only)

6886 In the event the client reboots or restarts, the failure to renew
6887 leases will result in the revocation of byte-range locks and share
6888 reservations. Delegations, however, may be treated a bit
6889 differently.

6891 There will be situations in which delegations will need to be
6892 reestablished after a client reboots or restarts. The reason for
6893 this is the client may have file data stored locally and this data
6894 was associated with the previously held delegations. The client will
6895 need to reestablish the appropriate file state on the server.
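For a client to reestablish delegations after a restart, it has to remember across the restart which delegations were backing locally stored data. The protocol does not say how a client does this. The following is a minimal sketch of one possible bookkeeping scheme; the record layout, the JSON file, and every name in it (`DelegationRecord`, `save_records`, `load_records`, `delegations_to_reestablish`) are hypothetical, as is the assumption that only delegations with associated local data need to be reestablished.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DelegationRecord:
    """Hypothetical client-side record of one held delegation."""
    filehandle_hex: str    # filehandle the delegation applies to
    delegation_type: str   # "OPEN_DELEGATE_READ" or "OPEN_DELEGATE_WRITE"
    has_local_data: bool   # locally stored data depends on this delegation


def save_records(path: str, records: List[DelegationRecord]) -> None:
    """Write the records to stable storage, replacing the file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump([asdict(r) for r in records], f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_records(path: str) -> List[DelegationRecord]:
    """Read the records back; an absent file means nothing was held."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [DelegationRecord(**d) for d in json.load(f)]


def delegations_to_reestablish(path: str) -> List[DelegationRecord]:
    """After a restart, pick the delegations that back local file data."""
    return [r for r in load_records(path) if r.has_local_data]
```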
6897 To allow for this type of client recovery, the server MAY extend the
6898 period for delegation recovery beyond the typical lease expiration
6899 period. This implies that requests from other clients that conflict
6900 with these delegations will need to wait. Because the normal recall
6901 process may require significant time for the client to flush changed
6902 state to the server, other clients need to be prepared for delays that
6903 occur because of a conflicting delegation. This longer interval
6904 would increase the window for clients to reboot and consult stable
6905 storage so that the delegations can be reclaimed. For open
6906 delegations, such delegations are reclaimed using OPEN with a claim
6907 type of CLAIM_DELEGATE_PREV. (See Section 10.5 and Section 15.18 for
6908 discussion of open delegation and the details of OPEN, respectively.)

6910 A server MAY support a claim type of CLAIM_DELEGATE_PREV, but if it
6911 does, it MUST NOT remove delegations upon SETCLIENTID_CONFIRM, and
6912 instead MUST, for a period of time no less than that of the value of
6913 the lease_time attribute, maintain the client's delegations to allow
6914 time for the client to issue CLAIM_DELEGATE_PREV requests. The
6915 server that supports CLAIM_DELEGATE_PREV MUST support the DELEGPURGE
6916 operation.

6918 When the server reboots or restarts, delegations are reclaimed (using
6919 the OPEN operation with CLAIM_PREVIOUS) in a similar fashion to byte-
6920 range locks and share reservations. However, there is a slight
6921 semantic difference. In the normal case if the server decides that a
6922 delegation should not be granted, it performs the requested action
6923 (e.g., OPEN) without granting any delegation. For reclaim, the
6924 server grants the delegation but a special designation is applied so
6925 that the client treats the delegation as having been granted but
6926 recalled by the server. Because of this, the client has the duty to
6927 write all modified state to the server and then return the
6928 delegation. This process of handling delegation reclaim reconciles
6929 three principles of the NFSv4 protocol:

6931 o Upon reclaim, a client reporting resources assigned to it by an
6932 earlier server instance must be granted those resources.

6934 o The server has unquestionable authority to determine whether
6935 delegations are to be granted and, once granted, whether they are
6936 to be continued.

6938 o The use of callbacks is not to be depended upon until the client
6939 has proven its ability to receive them.

6941 When a client has more than a single open associated with a
6942 delegation, state for those additional opens can be established using
6943 OPEN operations of type CLAIM_DELEGATE_CUR. When these are used to
6944 establish opens associated with reclaimed delegations, the server
6945 MUST allow them when made within the grace period.

6947 When a network partition occurs, delegations are subject to freeing
6948 by the server when the lease renewal period expires. This is similar
6949 to the behavior for locks and share reservations. For delegations,
6950 however, the server may extend the period in which conflicting
6951 requests are held off. Eventually the occurrence of a conflicting
6952 request from another client will cause revocation of the delegation.
6953 A loss of the callback path (e.g., by later network configuration
6954 change) will have the same effect. A recall request will fail and
6955 revocation of the delegation will result.

6957 A client normally finds out about revocation of a delegation when it
6958 uses a stateid associated with a delegation and receives the error
6959 NFS4ERR_EXPIRED. It also may find out about delegation revocation
6960 after a client reboot when it attempts to reclaim a delegation and
6961 receives that same error. Note that in the case of a revoked
6962 OPEN_DELEGATE_WRITE delegation, there are issues because data may
6963 have been modified by the client whose delegation is revoked and
6964 separately by other clients. See Section 10.5.1 for a discussion of
6965 such issues. Note also that when delegations are revoked,
6966 information about the revoked delegation will be written by the
6967 server to stable storage (as described in Section 9.6). This is done
6968 to deal with the case in which a server reboots after revoking a
6969 delegation but before the client holding the revoked delegation is
6970 notified about the revocation.

6972 10.3. Data Caching

6974 When applications share access to a set of files, they need to be
6975 implemented so as to take account of the possibility of conflicting
6976 access by another application. This is true whether the applications
6977 in question execute on different clients or reside on the same
6978 client.

6980 Share reservations and byte-range locks are the facilities the NFS
6981 version 4 protocol provides to allow applications to coordinate
6982 access by providing mutual exclusion facilities. The NFSv4
6983 protocol's data caching must be implemented such that it does not
6984 invalidate the assumptions that those using these facilities depend
6985 upon.

6987 10.3.1. Data Caching and OPENs

6989 In order to avoid invalidating the sharing assumptions that
6990 applications rely on, NFSv4 clients should not provide cached data to
6991 applications or modify it on behalf of an application when it would
6992 not be valid to obtain or modify that same data via a READ or WRITE
6993 operation.

6995 Furthermore, in the absence of open delegation (see Section 10.4) two
6996 additional rules apply. Note that these rules are obeyed in practice
6997 by many NFSv2 and NFSv3 clients.

6999 o First, cached data present on a client must be revalidated after
7000 doing an OPEN. Revalidating means that the client fetches the
7001 change attribute from the server, compares it with the cached
7002 change attribute, and if different, declares the cached data (as
7003 well as the cached attributes) as invalid. This is to ensure that
7004 the data for the OPENed file is still correctly reflected in the
7005 client's cache. This validation must be done at least when the
7006 client's OPEN operation includes DENY=WRITE or BOTH, thus
7007 terminating a period in which other clients may have had the
7008 opportunity to open the file with WRITE access. Clients may
7009 choose to do the revalidation more often (i.e., at OPENs
7010 specifying DENY=NONE) to parallel the NFSv3 protocol's practice
7011 for the benefit of users assuming this degree of cache
7012 revalidation. Since the change attribute is updated for data and
7013 metadata modifications, some client implementors may be tempted to
7014 use the time_modify attribute and not change to validate cached
7015 data, so that metadata changes do not spuriously invalidate clean
7016 data. The implementor is cautioned against this approach. The change
7017 attribute is guaranteed to change for each update to the file,
7018 whereas time_modify is guaranteed to change only at the
7019 granularity of the time_delta attribute. Use by the client's data
7020 cache validation logic of time_modify and not change runs the risk
7021 of the client incorrectly marking stale data as valid.

7023 o Second, modified data must be flushed to the server before closing
7024 a file OPENed for write. This is complementary to the first rule.
7025 If the data is not flushed at CLOSE, the revalidation done after
7026 the client OPENs a file is unable to achieve its purpose. The other
7027 aspect to flushing the data before close is that the data must be
7028 committed to stable storage, at the server, before the CLOSE
7029 operation is requested by the client. In the case of a server
7030 reboot or restart and a CLOSEd file, it may not be possible to
7031 retransmit the data to be written to the file. Hence, this
7032 requirement.

7034 10.3.2. Data Caching and File Locking

7036 For those applications that choose to use file locking instead of
7037 share reservations to exclude inconsistent file access, there is an
7038 analogous set of constraints that apply to client side data caching.
7039 These rules are effective only if the file locking is used in a way
7040 that matches in an equivalent way the actual READ and WRITE
7041 operations executed. This is as opposed to file locking that is
7042 based on pure convention. For example, it is possible to manipulate
7043 a two-megabyte file by dividing the file into two one-megabyte
7044 regions and protecting access to the two regions by file locks on
7045 bytes zero and one. A lock for write on byte zero of the file would
7046 represent the right to do READ and WRITE operations on the first
7047 region. A lock for write on byte one of the file would represent the
7048 right to do READ and WRITE operations on the second region. As long
7049 as all applications manipulating the file obey this convention, they
7050 will work on a local filesystem. However, they may not work with the
7051 NFSv4 protocol unless clients refrain from data caching.

7053 The rules for data caching in the file locking environment are:

7055 o First, when a client obtains a file lock for a particular region,
7056 the data cache corresponding to that region (if any cached data
7057 exists) must be revalidated. If the change attribute indicates
7058 that the file may have been updated since the cached data was
7059 obtained, the client must flush or invalidate the cached data for
7060 the newly locked region. A client might choose to invalidate all
7061 of the non-modified cached data that it has for the file but the only
7062 requirement for correct operation is to invalidate all of the data
7063 in the newly locked region.

7065 o Second, before releasing a write lock for a region, all modified
7066 data for that region must be flushed to the server. The modified
7067 data must also be written to stable storage.

7069 Note that flushing data to the server and the invalidation of cached
7070 data must reflect the actual byte ranges locked or unlocked.
7071 Rounding these up or down to reflect client cache block boundaries
7072 will cause problems if not carefully done. For example, writing a
7073 modified block when only half of that block is within an area being
7074 unlocked may cause invalid modification to the region outside the
7075 unlocked area. This, in turn, may be part of a region locked by
7076 another client. Clients can avoid this situation by synchronously
7077 performing portions of write operations that overlap that portion
7078 (initial or final) that is not a full block. Similarly, invalidating
7079 a locked area which is not an integral number of full buffer blocks
7080 would require the client to read one or two partial blocks from the
7081 server if the revalidation procedure shows that the data which the
7082 client possesses may not be valid.

7084 The data that is written to the server as a prerequisite to the
7085 unlocking of a region must be written, at the server, to stable
7086 storage. The client may accomplish this either with synchronous
7087 writes or by following asynchronous writes with a COMMIT operation.
7088 This is required because retransmission of the modified data after a
7089 server reboot might conflict with a lock held by another client.
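The requirement that flushing reflect the actual byte ranges being unlocked, and not cache block boundaries, can be illustrated with a small range computation. This is a sketch only: the half-open `(start, end)` representation, the function names, and the 4096-byte block size in the usage note are assumptions made for the example, not part of the protocol.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


def ranges_to_flush(dirty: List[Range], unlock: Range) -> List[Range]:
    """Return the modified byte ranges that lie inside the unlocked range.

    Each result is clipped to the exact lock boundaries, so no byte
    outside the range being unlocked is ever written.
    """
    u_start, u_end = unlock
    result = []
    for start, end in sorted(dirty):
        lo, hi = max(start, u_start), min(end, u_end)
        if lo < hi:
            result.append((lo, hi))
    return result


def block_rounded(r: Range, block_size: int) -> Range:
    """The range a naive block-granular writer would touch (for contrast)."""
    start, end = r
    rounded_start = (start // block_size) * block_size
    rounded_end = -(-end // block_size) * block_size  # ceiling division
    return (rounded_start, rounded_end)
```

For an unlock of bytes 1000 through 5999 with two fully modified 4096-byte cache blocks, `ranges_to_flush([(0, 8192)], (1000, 6000))` yields `[(1000, 6000)]`, whereas `block_rounded((1000, 6000), 4096)` yields `(0, 8192)`, which covers bytes that may be locked by another client. The data in the returned ranges would still have to reach stable storage, either through synchronous writes or through a COMMIT following asynchronous writes.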
7091 A client implementation may choose to accommodate applications which 7092 use byte-range locking in non-standard ways (e.g., using a byte-range 7093 lock as a global semaphore) by flushing to the server more data upon 7094 a LOCKU than is covered by the locked range. This may include 7095 modified data within files other than the one for which the unlocks 7096 are being done. In such cases, the client must not interfere with 7097 applications whose READs and WRITEs are being done only within the 7098 bounds of record locks which the application holds. For example, an 7099 application locks a single byte of a file and proceeds to write that 7100 single byte. A client that chose to handle a LOCKU by flushing all 7101 modified data to the server could validly write that single byte in 7102 response to an unrelated unlock. However, it would not be valid to 7103 write the entire block in which that single written byte was located 7104 since it includes an area that is not locked and might be locked by 7105 another client. Client implementations can avoid this problem by 7106 dividing files with modified data into those for which all 7107 modifications are done to areas covered by an appropriate byte-range 7108 lock and those for which there are modifications not covered by a 7109 byte-range lock. Any writes done for the former class of files must 7110 not include areas not locked and thus not modified on the client. 7112 10.3.3. Data Caching and Mandatory File Locking 7114 Client side data caching needs to respect mandatory file locking when 7115 it is in effect. The presence of mandatory file locking for a given 7116 file is indicated when the client gets back NFS4ERR_LOCKED from a 7117 READ or WRITE on a file it has an appropriate share reservation for. 7118 When mandatory locking is in effect for a file, the client must check 7119 for an appropriate file lock for data being read or written. 
If a 7120 lock exists for the range being read or written, the client may 7121 satisfy the request using the client's validated cache. If an 7122 appropriate file lock is not held for the range of the read or write, 7123 the read or write request must not be satisfied by the client's cache 7124 and the request must be sent to the server for processing. When a 7125 read or write request partially overlaps a locked region, the request 7126 should be subdivided into multiple pieces with each region (locked or 7127 not) treated appropriately. 7129 10.3.4. Data Caching and File Identity 7131 When clients cache data, the file data needs to be organized 7132 according to the filesystem object to which the data belongs. For 7133 NFSv3 clients, the typical practice has been to assume for the 7134 purpose of caching that distinct filehandles represent distinct 7135 filesystem objects. The client then has the choice to organize and 7136 maintain the data cache on this basis. 7138 In the NFSv4 protocol, there is now the possibility to have 7139 significant deviations from a "one filehandle per object" model 7140 because a filehandle may be constructed on the basis of the object's 7141 pathname. Therefore, clients need a reliable method to determine if 7142 two filehandles designate the same filesystem object. If clients 7143 were simply to assume that all distinct filehandles denote distinct 7144 objects and proceed to do data caching on this basis, caching 7145 inconsistencies would arise between the distinct client side objects 7146 which mapped to the same server side object. 7148 By providing a method to differentiate filehandles, the NFSv4 7149 protocol alleviates a potential functional regression in comparison 7150 with the NFSv3 protocol. Without this method, caching 7151 inconsistencies within the same client could occur and this has not 7152 been present in previous versions of the NFS protocol. 
   Note that it is possible to have such inconsistencies with
   applications executing on multiple clients but that is not the issue
   being addressed here.

   For the purposes of data caching, the following steps allow an NFSv4
   client to determine whether two distinct filehandles denote the same
   server side object:

   o  If GETATTR directed to two filehandles returns different values of
      the fsid attribute, then the filehandles represent distinct
      objects.

   o  If GETATTR for any file with an fsid that matches the fsid of the
      two filehandles in question returns a unique_handles attribute
      with a value of TRUE, then the two objects are distinct.

   o  If GETATTR directed to the two filehandles does not return the
      fileid attribute for both of the handles, then it cannot be
      determined whether the two objects are the same. Therefore,
      operations which depend on that knowledge (e.g., client side data
      caching) cannot be done reliably. Note that if GETATTR does not
      return the fileid attribute for both filehandles, it will return
      it for neither of the filehandles, since the fsid for both
      filehandles is the same.

   o  If GETATTR directed to the two filehandles returns different
      values for the fileid attribute, then they are distinct objects.

   o  Otherwise they are the same object.

10.4. Open Delegation

   When a file is being OPENed, the server may delegate further handling
   of opens and closes for that file to the opening client. Any such
   delegation is recallable, since the circumstances that allowed for
   the delegation are subject to change. In particular, if the server
   receives a conflicting OPEN from another client, the server must
   recall the delegation before deciding whether the OPEN from the other
   client may be granted.
   Making a delegation is up to the server and clients should not assume
   that any particular OPEN either will or will not result in an open
   delegation. The following is a typical set of conditions that
   servers might use in deciding whether OPEN should be delegated:

   o  The client must be able to respond to the server's callback
      requests. The server will use the CB_NULL procedure for a test of
      callback ability.

   o  The client must have responded properly to previous recalls.

   o  There must be no current open conflicting with the requested
      delegation.

   o  There should be no current delegation that conflicts with the
      delegation being requested.

   o  The probability of future conflicting open requests should be low
      based on the recent history of the file.

   o  There must be no server-specific semantics of OPEN/CLOSE that
      would make the required handling incompatible with the prescribed
      handling that the delegated client would apply (see below).

   There are two types of open delegations, OPEN_DELEGATE_READ and
   OPEN_DELEGATE_WRITE. An OPEN_DELEGATE_READ delegation allows a client
   to handle, on its own, requests to open a file for reading that do
   not deny read access to others. Multiple OPEN_DELEGATE_READ
   delegations may be outstanding simultaneously and do not conflict. An
   OPEN_DELEGATE_WRITE delegation allows the client to handle, on its
   own, all opens. Only one OPEN_DELEGATE_WRITE delegation may exist
   for a given file at a given time and it is inconsistent with any
   OPEN_DELEGATE_READ delegations.

   When a client has an OPEN_DELEGATE_READ delegation, it may not make
   any changes to the contents or attributes of the file but it is
   assured that no other client may do so. When a client has an
   OPEN_DELEGATE_WRITE delegation, it may modify the file data since no
   other client will be accessing the file's data.
   The client holding an OPEN_DELEGATE_WRITE delegation may only affect
   file attributes which are intimately connected with the file data:
   size, time_modify, change.

   When a client has an open delegation, it does not send OPENs or
   CLOSEs to the server but updates the appropriate status internally.
   For an OPEN_DELEGATE_READ delegation, opens that cannot be handled
   locally (opens for write or that deny read access) must be sent to
   the server.

   When an open delegation is made, the response to the OPEN contains an
   open delegation structure which specifies the following:

   o  the type of delegation (read or write)

   o  space limitation information to control flushing of data on close
      (OPEN_DELEGATE_WRITE delegation only, see Section 10.4.1)

   o  an nfsace4 specifying read and write permissions

   o  a stateid to represent the delegation for READ and WRITE

   The delegation stateid is separate and distinct from the stateid for
   the OPEN proper. The standard stateid, unlike the delegation
   stateid, is associated with a particular lock_owner and will continue
   to be valid after the delegation is recalled and the file remains
   open.

   When a request internal to the client is made to open a file and open
   delegation is in effect, it will be accepted or rejected solely on
   the basis of the following conditions. Any requirement for other
   checks to be made by the delegate should result in open delegation
   being denied so that the checks can be made by the server itself.

   o  The access and deny bits for the request and the file as described
      in Section 9.9.

   o  The read and write permissions as determined below.

   The nfsace4 passed with delegation can be used to avoid frequent
   ACCESS calls. The permission check should be as follows:

   o  If the nfsace4 indicates that the open may be done, then it should
      be granted without reference to the server.

   o  If the nfsace4 indicates that the open may not be done, then an
      ACCESS request must be sent to the server to obtain the definitive
      answer.

   The server may return an nfsace4 that is more restrictive than the
   actual ACL of the file. This includes an nfsace4 that specifies
   denial of all access. Note that some common practices such as
   mapping the traditional user "root" to the user "nobody" may make it
   incorrect to return the actual ACL of the file in the delegation
   response.

   The use of delegation together with various other forms of caching
   creates the possibility that no server authentication will ever be
   performed for a given user since all of the user's requests might be
   satisfied locally. Where the client is depending on the server for
   authentication, the client should be sure authentication occurs for
   each user by use of the ACCESS operation. This should be the case
   even if an ACCESS operation would not be required otherwise. As
   mentioned before, the server may enforce frequent authentication by
   returning an nfsace4 denying all access with every open delegation.

10.4.1. Open Delegation and Data Caching

   OPEN delegation allows much of the message overhead associated with
   opening and closing files to be eliminated. An open when an open
   delegation is in effect does not require that a validation message be
   sent to the server. The continued endurance of the
   "OPEN_DELEGATE_READ delegation" provides a guarantee that no OPEN for
   write and thus no write has occurred. Similarly, when closing a file
   opened for write, if an OPEN_DELEGATE_WRITE delegation is in effect,
   the data written does not have to be flushed to the server until the
   open delegation is recalled. The continued endurance of the open
   delegation provides a guarantee that no open and thus no read or
   write has been done by another client.
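
   As a rough illustration of when an open can proceed without a message
   to the server, the following C sketch encodes only the delegation-type
   test from Section 10.4: an OPEN_DELEGATE_READ delegation covers opens
   for reading that do not deny read access, and an OPEN_DELEGATE_WRITE
   delegation covers all opens. The constant values follow the
   protocol's XDR definitions; the helper name open_is_local() is
   hypothetical and not part of the protocol.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the NFSv4 XDR description. */
enum open_delegation_type4 {
    OPEN_DELEGATE_NONE  = 0,
    OPEN_DELEGATE_READ  = 1,
    OPEN_DELEGATE_WRITE = 2
};

#define OPEN4_SHARE_ACCESS_READ  0x00000001u
#define OPEN4_SHARE_ACCESS_WRITE 0x00000002u
#define OPEN4_SHARE_DENY_READ    0x00000001u
#define OPEN4_SHARE_DENY_WRITE   0x00000002u

/*
 * Hypothetical helper: returns true if the delegation type alone allows
 * the client to handle this open without sending an OPEN to the server.
 */
static bool open_is_local(enum open_delegation_type4 deleg,
                          uint32_t share_access, uint32_t share_deny)
{
    switch (deleg) {
    case OPEN_DELEGATE_WRITE:
        /* A write delegation lets the client handle all opens. */
        return true;
    case OPEN_DELEGATE_READ:
        /* Only opens for reading that do not deny read access. */
        return (share_access & OPEN4_SHARE_ACCESS_WRITE) == 0 &&
               (share_deny & OPEN4_SHARE_DENY_READ) == 0;
    default:
        /* No delegation held: the OPEN goes to the server. */
        return false;
    }
}
```

   A real client would apply this test in addition to, not instead of,
   the access/deny check against its other local opens and the nfsace4
   permission check described in Section 10.4.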

   For the purposes of open delegation, READs and WRITEs done without an
   OPEN are treated as the functional equivalents of a corresponding
   type of OPEN. This refers to the READs and WRITEs that use the
   special stateids consisting of all zero bits or all one bits.
   Therefore, READs or WRITEs with a special stateid done by another
   client will force the server to recall an OPEN_DELEGATE_WRITE
   delegation. A WRITE with a special stateid done by another client
   will force a recall of OPEN_DELEGATE_READ delegations.

   With delegations, a client is able to avoid writing data to the
   server when the CLOSE of a file is serviced. The file close system
   call is the usual point at which the client is notified of a lack of
   stable storage for the modified file data generated by the
   application. At the close, file data is written to the server and
   through normal accounting the server is able to determine if the
   available filesystem space for the data has been exceeded (i.e., the
   server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting
   includes quotas. The introduction of delegations requires that an
   alternative method be in place for the same type of communication to
   occur between client and server.

   In the delegation response, the server provides either the limit of
   the size of the file or the number of modified blocks and associated
   block size. The server must ensure that the client will be able to
   flush data to the server of a size equal to that provided in the
   original delegation. The server must make this assurance for all
   outstanding delegations. Therefore, the server must be careful in
   its management of available space for new or modified data taking
   into account available filesystem space and any applicable quotas.
   The server can recall delegations as a result of managing the
   available filesystem space.
   The client should abide by the server's state space limits for
   delegations. If the client exceeds the stated limits for the
   delegation, the server's behavior is undefined.

   Based on server conditions, quotas or available filesystem space, the
   server may grant OPEN_DELEGATE_WRITE delegations with very
   restrictive space limitations. The limitations may be defined in a
   way that will always force modified data to be flushed to the server
   on close.

   With respect to authentication, flushing modified data to the server
   after a CLOSE has occurred may be problematic. For example, the user
   of the application may have logged off the client and unexpired
   authentication credentials may not be present. In this case, the
   client may need to take special care to ensure that local unexpired
   credentials will in fact be available. This may be accomplished by
   tracking the expiration time of credentials and flushing data well in
   advance of their expiration or by making private copies of
   credentials to assure their availability when needed.

10.4.2. Open Delegation and File Locks

   When a client holds an OPEN_DELEGATE_WRITE delegation, lock operations
   may be performed locally. This includes those required for mandatory
   file locking. This can be done since the delegation implies that
   there can be no conflicting locks. Similarly, all of the
   revalidations that would normally be associated with obtaining locks
   and the flushing of data associated with the releasing of locks need
   not be done.

   When a client holds an OPEN_DELEGATE_READ delegation, lock operations
   are not performed locally. All lock operations, including those
   requesting non-exclusive locks, are sent to the server for
   resolution.

10.4.3. Handling of CB_GETATTR

   The server needs to employ special handling for a GETATTR where the
   target is a file that has an OPEN_DELEGATE_WRITE delegation in effect.
   The reason for this is that the client holding the
   OPEN_DELEGATE_WRITE delegation may have modified the data and the
   server needs to reflect this change to the second client that
   submitted the GETATTR. Therefore, the client holding the
   OPEN_DELEGATE_WRITE delegation needs to be interrogated. The server
   will use the CB_GETATTR operation. The only attributes that the
   server can reliably query via CB_GETATTR are size and change.

   Since CB_GETATTR is being used to satisfy another client's GETATTR
   request, the server only needs to know if the client holding the
   delegation has a modified version of the file. If the client's copy
   of the delegated file is not modified (data or size), the server can
   satisfy the second client's GETATTR request from the attributes
   stored locally at the server. If the file is modified, the server
   only needs to know about this modified state. If the server
   determines that the file is currently modified, it will respond to
   the second client's GETATTR as if the file had been modified locally
   at the server.

   Since the form of the change attribute is determined by the server
   and is opaque to the client, the client and server need to agree on a
   method of communicating the modified state of the file. For the size
   attribute, the client will report its current view of the file size.
   For the change attribute, the handling is more involved.

   For the client, the following steps will be taken when receiving an
   OPEN_DELEGATE_WRITE delegation:

   o  The value of the change attribute will be obtained from the server
      and cached. Let this value be represented by c.

   o  The client will create a value greater than c that will be used
      for communicating that modified data is held at the client. Let
      this value be represented by d.

   o  When the client is queried via CB_GETATTR for the change
      attribute, it checks to see if it holds modified data. If the
      file is modified, the value d is returned for the change attribute
      value. If this file is not currently modified, the client returns
      the value c for the change attribute.

   For simplicity of implementation, the client MAY for each CB_GETATTR
   return the same value d. This is true even if, between successive
   CB_GETATTR operations, the client again modifies the file's data
   or metadata in its cache. The client can return the same value
   because the only requirement is that the client be able to indicate
   to the server that the client holds modified data. Therefore, the
   value of d may always be c + 1.

   While the change attribute is opaque to the client in the sense that
   it has no idea what units of time, if any, the server is counting
   change with, it is not opaque in that the client has to treat it as
   an unsigned integer, and the server has to be able to see the results
   of the client's changes to that integer. Therefore, the server MUST
   encode the change attribute in network order when sending it to the
   client. The client MUST decode it from network order to its native
   order when receiving it and the client MUST encode it in network
   order when sending it to the server. For this reason, change is
   defined as an unsigned integer rather than an opaque array of bytes.

   For the server, the following steps will be taken when providing an
   OPEN_DELEGATE_WRITE delegation:

   o  Upon providing an OPEN_DELEGATE_WRITE delegation, the server will
      cache a copy of the change attribute in the data structure it uses
      to record the delegation. Let this value be represented by sc.

   o  When a second client sends a GETATTR operation on the same file to
      the server, the server obtains the change attribute from the first
      client. Let this value be cc.

   o  If the value cc is equal to sc, the file is not modified and the
      server returns the current values for change, time_metadata, and
      time_modify (for example) to the second client.

   o  If the value cc is NOT equal to sc, the file is currently modified
      at the first client and most likely will be modified at the server
      at a future time. The server then uses its current time to
      construct attribute values for time_metadata and time_modify. A
      new value of sc, which we will call nsc, is computed by the
      server, such that nsc >= sc + 1. The server then returns the
      constructed time_metadata, time_modify, and nsc values to the
      requester. The server replaces sc in the delegation record with
      nsc. To prevent the possibility of time_modify, time_metadata,
      and change from appearing to go backward (which would happen if
      the client holding the delegation fails to write its modified data
      to the server before the delegation is revoked or returned), the
      server SHOULD update the file's metadata record with the
      constructed attribute values. For reasons of performance,
      committing the constructed attribute values to stable storage is
      OPTIONAL.

   As discussed earlier in this section, the client MAY return the same
   cc value on subsequent CB_GETATTR calls, even if the file was
   modified in the client's cache yet again between successive
   CB_GETATTR calls. Therefore, the server must assume that the file
   has been modified yet again, and MUST take care to ensure that the
   new nsc it constructs and returns is greater than the previous nsc it
   returned.
   An example implementation's delegation record would satisfy this
   mandate by including a boolean field (let us call it "modified") that
   is set to FALSE when the delegation is granted, and an sc value set
   at the time of grant to the change attribute value. The modified
   field would be set to TRUE the first time cc != sc, and would stay
   TRUE until the delegation is returned or revoked. The processing for
   constructing nsc, time_modify, and time_metadata would use this
   pseudo code:

       if (!modified) {
           do CB_GETATTR for change and size;

           if (cc != sc)
               modified = TRUE;
       } else {
           do CB_GETATTR for size;
       }

       if (modified) {
           sc = sc + 1;
           time_modify = time_metadata = current_time;
           update sc, time_modify, time_metadata into file's metadata;
       }

   This would return to the client (that sent GETATTR) the attributes it
   requested, but make sure size comes from what CB_GETATTR returned.
   The server would not update the file's metadata with the client's
   modified size.

   In the case that the file attribute size is different than the
   server's current value, the server treats this as a modification
   regardless of the value of the change attribute retrieved via
   CB_GETATTR and responds to the second client as in the last step.

   This methodology resolves issues of clock differences between client
   and server and other scenarios where the use of CB_GETATTR breaks
   down.

   It should be noted that the server is under no obligation to use
   CB_GETATTR and therefore the server MAY simply recall the delegation
   to avoid its use.

10.4.4. Recall of Open Delegation

   The following events necessitate recall of an open delegation:

   o  Potentially conflicting OPEN request (or READ/WRITE done with
      "special" stateid)

   o  SETATTR issued by another client

   o  REMOVE request for the file

   o  RENAME request for the file as either source or target of the
      RENAME

   Whether a RENAME of a directory in the path leading to the file
   results in recall of an open delegation depends on the semantics of
   the server filesystem. If that filesystem denies such RENAMEs when a
   file is open, the recall must be performed to determine whether the
   file in question is, in fact, open.

   In addition to the situations above, the server may choose to recall
   open delegations at any time if resource constraints make it
   advisable to do so. Clients should always be prepared for the
   possibility of recall.

   When a client receives a recall for an open delegation, it needs to
   update state on the server before returning the delegation. These
   same updates must be done whenever a client chooses to return a
   delegation voluntarily. The following items of state need to be
   dealt with:

   o  If the file associated with the delegation is no longer open and
      no previous CLOSE operation has been sent to the server, a CLOSE
      operation must be sent to the server.

   o  If a file has other open references at the client, then OPEN
      operations must be sent to the server. The appropriate stateids
      will be provided by the server for subsequent use by the client
      since the delegation stateid will no longer be valid. These OPEN
      requests are done with the claim type of CLAIM_DELEGATE_CUR. This
      will allow the presentation of the delegation stateid so that the
      client can establish the appropriate rights to perform the OPEN.
      (See Section 15.18 for details.)

   o  If there are granted file locks, the corresponding LOCK operations
      need to be performed. This applies to the OPEN_DELEGATE_WRITE
      delegation case only.

   o  For an OPEN_DELEGATE_WRITE delegation, if at the time of recall the
      file is not open for write, all modified data for the file must be
      flushed to the server. If the delegation had not existed, the
      client would have done this data flush before the CLOSE operation.

   o  For an OPEN_DELEGATE_WRITE delegation when a file is still open at
      the time of recall, any modified data for the file needs to be
      flushed to the server.

   o  With the OPEN_DELEGATE_WRITE delegation in place, it is possible
      that the file was truncated during the lifetime of the delegation.
      For example, the truncation could have occurred as a result of an
      OPEN UNCHECKED4 with a size attribute value of zero. Therefore,
      if a truncation of the file has occurred and this operation has
      not been propagated to the server, the truncation must occur
      before any modified data is written to the server.

   In the case of OPEN_DELEGATE_WRITE delegation, file locking imposes
   some additional requirements. To precisely maintain the associated
   invariant, it is required to flush any modified data in any region
   for which a write lock was released while the OPEN_DELEGATE_WRITE
   delegation was in effect. However, because the OPEN_DELEGATE_WRITE
   delegation implies no other locking by other clients, a simpler
   implementation is to flush all modified data for the file (as
   described just above) if any write lock has been released while the
   OPEN_DELEGATE_WRITE delegation was in effect.

   An implementation need not wait until delegation recall (or deciding
   to voluntarily return a delegation) to perform any of the above
   actions, if implementation considerations (e.g., resource
   availability constraints) make that desirable.
   Generally, however, the fact that the actual open state of the file
   may continue to change makes it not worthwhile to send information
   about opens and closes to the server, except as part of delegation
   return. Only in the case of closing the open that resulted in
   obtaining the delegation would clients be likely to do this early,
   since, in that case, the close once done will not be undone.
   Regardless of the client's choices on scheduling these actions, all
   must be performed before the delegation is returned, including (when
   applicable) the close that corresponds to the open that resulted in
   the delegation. These actions can be performed either in previous
   requests or in previous operations in the same COMPOUND request.

10.4.5. OPEN Delegation Race with CB_RECALL

   The server informs the client of recall via a CB_RECALL. A race case
   which may develop is when the delegation is immediately recalled
   before the COMPOUND which established the delegation is returned to
   the client. As the CB_RECALL provides both a stateid and a
   filehandle for which the client has no mapping, it cannot honor the
   recall attempt. At this point, the client has two choices, either do
   not respond or respond with NFS4ERR_BADHANDLE. If it does not
   respond, then it runs the risk of the server deciding to not grant it
   further delegations.

   If instead it does reply with NFS4ERR_BADHANDLE, then both the client
   and the server might be able to detect that a race condition is
   occurring. The client can keep a list of pending delegations. When
   it receives a CB_RECALL for an unknown delegation, it can cache the
   stateid and filehandle on a list of pending recalls. When it is
   provided with a delegation, it would only use it if it was not on the
   pending recall list. Upon the next CB_RECALL, it could immediately
   return the delegation.
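
   The client-side bookkeeping described above might be sketched in C as
   follows. This is only an illustration under stated assumptions: the
   fixed-size table, MAX_PENDING, and the function names are
   hypothetical, and matching a recall on both the seqid and the "other"
   field of the stateid is a choice of this sketch, not a requirement
   stated by the protocol.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NFS4_FHSIZE     128  /* maximum NFSv4 filehandle size */
#define NFS4_OTHER_SIZE 12   /* size of the stateid "other" field */
#define MAX_PENDING     16   /* hypothetical table size */

struct stateid4 {
    uint32_t seqid;
    uint8_t  other[NFS4_OTHER_SIZE];
};

struct pending_recall {
    bool            in_use;
    struct stateid4 stateid;
    uint32_t        fh_len;
    uint8_t         fh[NFS4_FHSIZE];
};

/* Zero-initialized, so every slot starts out unused. */
static struct pending_recall pending[MAX_PENDING];

static bool recall_matches(const struct pending_recall *p,
                           const struct stateid4 *sid,
                           const uint8_t *fh, uint32_t fh_len)
{
    return p->in_use &&
           p->stateid.seqid == sid->seqid &&
           memcmp(p->stateid.other, sid->other, NFS4_OTHER_SIZE) == 0 &&
           p->fh_len == fh_len &&
           memcmp(p->fh, fh, fh_len) == 0;
}

/* Record a CB_RECALL that named a delegation the client does not yet
 * know about. Returns false if the entry cannot be stored. */
static bool pending_recall_add(const struct stateid4 *sid,
                               const uint8_t *fh, uint32_t fh_len)
{
    if (fh_len > NFS4_FHSIZE)
        return false;
    for (int i = 0; i < MAX_PENDING; i++)
        if (recall_matches(&pending[i], sid, fh, fh_len))
            return true;                 /* already recorded */
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i].in_use  = true;
            pending[i].stateid = *sid;
            pending[i].fh_len  = fh_len;
            memcpy(pending[i].fh, fh, fh_len);
            return true;
        }
    }
    return false;                        /* table full */
}

/* Called when an OPEN reply delivers a delegation. Returns true if a
 * recall for it was already seen, in which case the entry is dropped
 * and the caller should not use the delegation. */
static bool pending_recall_take(const struct stateid4 *sid,
                                const uint8_t *fh, uint32_t fh_len)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (recall_matches(&pending[i], sid, fh, fh_len)) {
            pending[i].in_use = false;
            return true;
        }
    }
    return false;
}
```

   In this sketch the callback handler would call pending_recall_add()
   before replying with NFS4ERR_BADHANDLE, and the OPEN reply path would
   call pending_recall_take() to decide whether a newly granted
   delegation may be used or should simply be handed back.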

   In turn, the server can keep track of when it issues a delegation and
   assume that if a client responds to the CB_RECALL with
   NFS4ERR_BADHANDLE, then the client has yet to receive the delegation.
   The server SHOULD give the client a reasonable time both to get this
   delegation and to return it before revoking the delegation. Unlike
   the case of a failed callback path, the server should periodically
   probe the client with CB_RECALL to see if it has received the
   delegation and is ready to return it.

   When the server finally determines that enough time has elapsed, it
   SHOULD revoke the delegation and it SHOULD NOT revoke the lease.
   During this extended recall process, the server SHOULD be renewing
   the client lease. The intent here is that the client not pay too
   onerous a burden for a condition caused by the server.

10.4.6. Clients that Fail to Honor Delegation Recalls

   A client may fail to respond to a recall for various reasons, such as
   a failure of the callback path from server to the client. The client
   may be unaware of a failure in the callback path. This lack of
   awareness could result in the client finding out long after the
   failure that its delegation has been revoked, and another client has
   modified the data for which the client had a delegation. This is
   especially a problem for the client that held an OPEN_DELEGATE_WRITE
   delegation.

   The server also has a dilemma in that the client that fails to
   respond to the recall might also be sending other NFS requests,
   including those that renew the lease before the lease expires.
   Without returning an error for those lease renewing operations, the
   server leads the client to believe that the delegation it has is in
   force.

   This difficulty is solved by the following rules:

   o  When the callback path is down, the server MUST NOT revoke the
      delegation if one of the following occurs:

      *  The client has issued a RENEW operation and the server has
         returned an NFS4ERR_CB_PATH_DOWN error. The server MUST renew
         the lease for any byte-range locks and share reservations the
         client has that the server has known about (as opposed to those
         locks and share reservations the client has established but not
         yet sent to the server, due to the delegation). The server
         SHOULD give the client a reasonable time to return its
         delegations to the server before revoking the client's
         delegations.

      *  The client has not issued a RENEW operation for some period of
         time after the server attempted to recall the delegation. This
         period of time MUST NOT be less than the value of the
         lease_time attribute.

   o  When the client holds a delegation, it cannot rely on operations,
      except for RENEW, that take a stateid, to renew delegation leases
      across callback path failures. The client that wants to keep
      delegations in force across callback path failures must use RENEW
      to do so.

10.4.7. Delegation Revocation

   At the point a delegation is revoked, if there are associated opens
   on the client, the applications holding these opens need to be
   notified. This notification usually occurs by returning errors for
   READ/WRITE operations or when a close is attempted for the open file.

   If no opens exist for the file at the point the delegation is
   revoked, then notification of the revocation is unnecessary.
   However, if there is modified data present at the client for the
   file, the user of the application should be notified. Unfortunately,
   it may not be possible to notify the user since active applications
   may not be present at the client. See Section 10.5.1 for additional
   details.

10.5. Data Caching and Revocation

   When locks and delegations are revoked, the assumptions upon which
   successful caching depends are no longer guaranteed. For any locks or
   share reservations that have been revoked, the corresponding owner
   needs to be notified. This notification includes applications with a
   file open that has a corresponding delegation which has been revoked.
   Cached data associated with the revocation must be removed from the
   client. In the case of modified data existing in the client's cache,
   that data must be removed from the client without it being written to
   the server. As mentioned, the assumptions made by the client are no
   longer valid at the point when a lock or delegation has been revoked.
   For example, another client may have been granted a conflicting lock
   after the revocation of the lock at the first client. Therefore, the
   data within the lock range may have been modified by the other
   client. Obviously, the first client is unable to guarantee to the
   application what has occurred to the file in the case of revocation.

   Notification to a lock owner will in many cases consist of simply
   returning an error on the next and all subsequent READs/WRITEs to the
   open file or on the close. Where the methods available to a client
   make such notification impossible because errors for certain
   operations may not be returned, more drastic action such as signals
   or process termination may be appropriate. The justification for
   this is that an invariant on which an application depends may be
   violated. Depending on how errors are typically treated for the
   client operating environment, further levels of notification
   including logging, console messages, and GUI pop-ups may be
   appropriate.

10.5.1. Revocation Recovery for Write Open Delegation

   Revocation recovery for an OPEN_DELEGATE_WRITE delegation poses the
   special issue of modified data in the client cache while the file is
   not open. In this situation, any client which does not flush
   modified data to the server on each close must ensure that the user
   receives appropriate notification of the failure as a result of the
   revocation. Since such situations may require human action to
   correct problems, notification schemes in which the appropriate user
   or administrator is notified may be necessary. Logging and console
   messages are typical examples.

   If there is modified data on the client, it must not be flushed
   normally to the server. A client may attempt to provide a copy of
   the file data as modified during the delegation under a different
   name in the filesystem name space to ease recovery. Note that when
   the client can determine that the file has not been modified by any
   other client, or when the client has a complete cached copy of the
   file in question, such a saved copy of the client's view of the file
   may be of particular value for recovery. In other cases, recovery
   using a copy of the file based partially on the client's cached data
   and partially on the server copy as modified by other clients will be
   anything but straightforward, so clients may avoid saving file
   contents in these situations or mark the results specially to warn
   users of possible problems.

   Saving of such modified data in delegation revocation situations may
   be limited to files of a certain size or might be used only when
   sufficient disk space is available within the target filesystem.
   Such saving may also be restricted to situations when the client has
   sufficient buffering resources to keep the cached copy available
   until it is properly stored to the target filesystem.

10.6. Attribute Caching

   The attributes discussed in this section do not include named
   attributes. Individual named attributes are analogous to files and
   caching of the data for these needs to be handled just as data
   caching is for ordinary files. Similarly, LOOKUP results from an
   OPENATTR directory are to be cached on the same basis as any other
   pathnames and similarly for directory contents.

   Clients may cache file attributes obtained from the server and use
   them to avoid subsequent GETATTR requests. Such caching is
   write-through in that modification to file attributes is always done
   by means of requests to the server and should not be done locally and
   cached. The exceptions to this are modifications to attributes that
   are intimately connected with data caching. Therefore, extending a
   file by writing data to the local data cache is reflected immediately
   in the size as seen on the client without this change being
   immediately reflected on the server. Normally such changes are not
   propagated directly to the server but when the modified data is
   flushed to the server, analogous attribute changes are made on the
   server. When open delegation is in effect, the modified attributes
   may be returned to the server in the response to a CB_RECALL call.

   The result of local caching of attributes is that the attribute
   caches maintained on individual clients will not be coherent.
   Changes made in one order on the server may be seen in a different
   order on one client and in a third order on a different client.

   The typical filesystem application programming interfaces do not
   provide means to atomically modify or interrogate attributes for
   multiple files at the same time. The following rules provide an
   environment where the potential incoherency mentioned above can be
   reasonably managed.
These rules are derived from the practice of 7801 previous NFS protocols. 7803 o All attributes for a given file (per-fsid attributes excepted) are 7804 cached as a unit at the client so that no non-serializability can 7805 arise within the context of a single file. 7807 o An upper time boundary is maintained on how long a client cache 7808 entry can be kept without being refreshed from the server. 7810 o When operations are performed that change attributes at the 7811 server, the updated attribute set is requested as part of the 7812 containing RPC. This includes directory operations that update 7813 attributes indirectly. This is accomplished by following the 7814 modifying operation with a GETATTR operation and then using the 7815 results of the GETATTR to update the client's cached attributes. 7817 Note that if the full set of attributes to be cached is requested by 7818 READDIR, the results can be cached by the client on the same basis as 7819 attributes obtained via GETATTR. 7821 A client may validate its cached version of attributes for a file by 7822 fetching both the change and time_access attributes and assuming 7823 that if the change attribute has the same value as it did when the 7824 attributes were cached, then no attributes other than time_access 7825 have changed. The reason time_access is also fetched is that 7826 many servers operate in environments where the operation that updates 7827 change does not update time_access. For example, POSIX file 7828 semantics do not update access time when a file is modified by the 7829 write system call. Therefore, the client that wants a current 7830 time_access value should fetch it with change during the attribute 7831 cache validation processing and update its cached time_access. 7833 The client may maintain a cache of modified attributes for those 7834 attributes intimately connected with data of modified regular files 7835 (size, time_modify, and change).
Other than those three attributes, 7836 the client MUST NOT maintain a cache of modified attributes. 7837 Instead, attribute changes are immediately sent to the server. 7839 In some operating environments, the equivalent to time_access is 7840 expected to be implicitly updated by each read of the content of the 7841 file object. If an NFS client is caching the content of a file 7842 object, whether it is a regular file, directory, or symbolic link, 7843 the client SHOULD NOT update the time_access attribute (via SETATTR 7844 or a small READ or READDIR request) on the server with each read that 7845 is satisfied from cache. The reason is that this can defeat the 7846 performance benefits of caching content, especially since an explicit 7847 SETATTR of time_access may alter the change attribute on the server. 7848 If the change attribute changes, clients that are caching the content 7849 will think the content has changed, and will re-read unmodified data 7850 from the server. Nor is the client encouraged to maintain a modified 7851 version of time_access in its cache, since this would mean that the 7852 client will either eventually have to write the access time to the 7853 server with bad performance effects, or it would never update the 7854 server's time_access, thereby resulting in a situation where an 7855 application that caches access time between a close and open of the 7856 same file observes the access time oscillating between the past and 7857 present. The time_access attribute always means the time of last 7858 access to a file by a read that was satisfied by the server. This 7859 way clients will tend to see only time_access changes that go forward 7860 in time. 7862 10.7. Data and Metadata Caching and Memory Mapped Files 7864 Some operating environments include the capability for an application 7865 to map a file's content into the application's address space. 
Each 7866 time the application accesses a memory location that corresponds to a 7867 block that has not been loaded into the address space, a page fault 7868 occurs and the file is read (or if the block does not exist in the 7869 file, the block is allocated and then instantiated in the 7870 application's address space). 7872 As long as each memory mapped access to the file requires a page 7873 fault, the relevant attributes of the file that are used to detect 7874 access and modification (time_access, time_metadata, time_modify, and 7875 change) will be updated. However, in many operating environments, 7876 when page faults are not required these attributes will not be 7877 updated on reads or updates to the file via memory access (regardless of 7878 whether the file is a local file or is being accessed remotely). A 7879 client or server MAY fail to update attributes of a file that is 7880 being accessed via memory mapped I/O. This has several implications: 7882 o If there is an application on the server that has memory mapped a 7883 file that a client is also accessing, the client may not be able 7884 to get a consistent value of the change attribute to determine 7885 whether its cache is stale or not. A server that knows that the 7886 file is memory mapped could always pessimistically return updated 7887 values for change so as to force the application to always get the 7888 most up to date data and metadata for the file. However, due to 7889 the negative performance implications of this, such behavior is 7890 OPTIONAL. 7892 o If the memory mapped file is not being modified on the server, and 7893 instead is just being read by an application via the memory mapped 7894 interface, the client will not see an updated time_access 7895 attribute. However, in many operating environments, neither will 7896 any process running on the server. Thus NFS clients are at no 7897 disadvantage with respect to local processes.
7899 o If there is another client that is memory mapping the file, and if 7900 that client is holding an OPEN_DELEGATE_WRITE delegation, the same 7901 set of issues as discussed in the previous two bullet items apply. 7902 So, when a server does a CB_GETATTR to a file that the client has 7903 modified in its cache, the response from CB_GETATTR will not 7904 necessarily be accurate. As discussed earlier, the client's 7905 obligation is to report that the file has been modified since the 7906 delegation was granted, not whether it has been modified again 7907 between successive CB_GETATTR calls, and the server MUST assume 7908 that any file the client has modified in cache has been modified 7909 again between successive CB_GETATTR calls. Depending on the 7910 nature of the client's memory management system, this weak 7911 obligation may not be possible. A client MAY return stale 7912 information in CB_GETATTR whenever the file is memory mapped. 7914 o The mixture of memory mapping and file locking on the same file is 7915 problematic. Consider the following scenario, where the page size 7916 on each client is 8192 bytes. 7918 * Client A memory maps first page (8192 bytes) of file X 7920 * Client B memory maps first page (8192 bytes) of file X 7922 * Client A write locks first 4096 bytes 7924 * Client B write locks second 4096 bytes 7926 * Client A, via a STORE instruction, modifies part of its locked 7927 region. 7929 * Simultaneous to client A, client B issues a STORE on part of 7930 its locked region. 7932 Here the challenge is for each client to resynchronize to get a 7933 correct view of the first page. In many operating environments, the 7934 virtual memory management systems on each client only know a page is 7935 modified, not that a subset of the page corresponding to the 7936 respective lock regions has been modified.
So it is not possible for 7937 each client to do the right thing, which is to only write to the 7938 server that portion of the page that is locked. For example, if 7939 client A simply writes out the page, and then client B writes out the 7940 page, client A's data is lost. 7942 Moreover, if mandatory locking is enabled on the file, then we have a 7943 different problem. When clients A and B issue the STORE 7944 instructions, the resulting page faults require a byte-range lock on 7945 the entire page. Each client then tries to extend their locked range 7946 to the entire page, which results in a deadlock. 7948 Communicating the NFS4ERR_DEADLOCK error to a STORE instruction is 7949 difficult at best. 7951 If a client is locking the entire memory mapped file, there is no 7952 problem with advisory or mandatory byte-range locking, at least until 7953 the client unlocks a region in the middle of the file. 7955 Given the above issues the following are permitted: 7957 o Clients and servers MAY deny memory mapping a file they know there 7958 are byte-range locks for. 7960 o Clients and servers MAY deny a byte-range lock on a file they know 7961 is memory mapped. 7963 o A client MAY deny memory mapping a file that it knows requires 7964 mandatory locking for I/O. If mandatory locking is enabled after 7965 the file is opened and mapped, the client MAY deny the application 7966 further access to its mapped file. 7968 10.8. Name Caching 7970 The results of LOOKUP and READDIR operations may be cached to avoid 7971 the cost of subsequent LOOKUP operations. Just as in the case of 7972 attribute caching, inconsistencies may arise among the various client 7973 caches. 
To mitigate the effects of these inconsistencies and given 7974 the context of typical filesystem APIs, an upper time boundary is 7975 maintained on how long a client name cache entry can be kept without 7976 verifying that the entry has not been made invalid by a directory 7977 change operation performed by another client. 7979 When a client is not making changes to a directory for which there 7980 exist name cache entries, the client needs to periodically fetch 7981 attributes for that directory to ensure that it is not being 7982 modified. After determining that no modification has occurred, the 7983 expiration time for the associated name cache entries may be updated 7984 to be the current time plus the name cache staleness bound. 7986 When a client is making changes to a given directory, it needs to 7987 determine whether there have been changes made to the directory by 7988 other clients. It does this by using the change attribute as 7989 reported before and after the directory operation in the associated 7990 change_info4 value returned for the operation. The server is able to 7991 communicate to the client whether the change_info4 data is provided 7992 atomically with respect to the directory operation. If the change 7993 values are provided atomically, the client is then able to compare 7994 the pre-operation change value with the change value in the client's 7995 name cache. If the comparison indicates that the directory was 7996 updated by another client, the name cache associated with the 7997 modified directory is purged from the client. If the comparison 7998 indicates no modification, the name cache can be updated on the 7999 client to reflect the directory operation and the associated timeout 8000 extended. The post-operation change value needs to be saved as the 8001 basis for future change_info4 comparisons. 
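The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ChangeInfo and DirNameCache classes and the apply_create method are hypothetical names, not protocol elements. ChangeInfo mirrors the atomic flag and the before and after change values carried in change_info4, filehandles are treated as opaque bytes, and timeout handling is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ChangeInfo:
    """Hypothetical stand-in for change_info4: atomic flag, before, after."""
    atomic: bool
    before: int
    after: int


@dataclass
class DirNameCache:
    """Hypothetical per-directory name cache kept by a client."""
    change: int                                   # change value the entries were validated against
    names: Dict[str, bytes] = field(default_factory=dict)

    def apply_create(self, name: str, fh: bytes, cinfo: ChangeInfo) -> bool:
        """Fold the client's own directory update into the cache.

        Returns True if the existing entries were kept, False if purged.
        """
        # The cache may only be trusted when the values were reported
        # atomically and the pre-operation value matches what is cached.
        unchanged = cinfo.atomic and cinfo.before == self.change
        if unchanged:
            self.names[name] = fh
        else:
            # Another client may have modified the directory: purge.
            self.names.clear()
        # Save the post-operation value as the basis for future comparisons.
        self.change = cinfo.after
        return unchanged
```

A caller would extend the entry timeout only when apply_create returns True. When the values are not reported atomically, the sketch purges the cache, since the client cannot rule out changes by other clients.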
8003 As demonstrated by the scenario above, name caching requires that the 8004 client revalidate name cache data by inspecting the change attribute 8005 of a directory at the point when the name cache item was cached. 8006 This requires that the server update the change attribute for 8007 directories when the contents of the corresponding directory is 8008 modified. For a client to use the change_info4 information 8009 appropriately and correctly, the server must report the pre and post 8010 operation change attribute values atomically. When the server is 8011 unable to report the before and after values atomically with respect 8012 to the directory operation, the server must indicate that fact in the 8013 change_info4 return value. When the information is not atomically 8014 reported, the client should not assume that other clients have not 8015 changed the directory. 8017 10.9. Directory Caching 8019 The results of READDIR operations may be used to avoid subsequent 8020 READDIR operations. Just as in the cases of attribute and name 8021 caching, inconsistencies may arise among the various client caches. 8022 To mitigate the effects of these inconsistencies, and given the 8023 context of typical filesystem APIs, the following rules should be 8024 followed: 8026 o Cached READDIR information for a directory which is not obtained 8027 in a single READDIR operation must always be a consistent snapshot 8028 of directory contents. This is determined by using a GETATTR 8029 before the first READDIR and after the last READDIR that 8030 contributes to the cache. 8032 o An upper time boundary is maintained to indicate the length of 8033 time a directory cache entry is considered valid before the client 8034 must revalidate the cached information. 8036 The revalidation technique parallels that discussed in the case of 8037 name caching.
When the client is not changing the directory in 8038 question, checking the change attribute of the directory with GETATTR 8039 is adequate. The lifetime of the cache entry can be extended at 8040 these checkpoints. When a client is modifying the directory, the 8041 client needs to use the change_info4 data to determine whether there 8042 are other clients modifying the directory. If it is determined that 8043 no other client modifications are occurring, the client may update 8044 its directory cache to reflect its own changes. 8046 As demonstrated previously, directory caching requires that the 8047 client revalidate directory cache data by inspecting the change 8048 attribute of a directory at the point when the directory was cached. 8049 This requires that the server update the change attribute for 8050 directories when the contents of the corresponding directory is 8051 modified. For a client to use the change_info4 information 8052 appropriately and correctly, the server must report the pre and post 8053 operation change attribute values atomically. When the server is 8054 unable to report the before and after values atomically with respect 8055 to the directory operation, the server must indicate that fact in the 8056 change_info4 return value. When the information is not atomically 8057 reported, the client should not assume that other clients have not 8058 changed the directory. 8060 11. Minor Versioning 8062 To address the requirement of an NFS protocol that can evolve as the 8063 need arises, the NFSv4 protocol contains the rules and framework to 8064 allow for future minor changes or versioning. 8066 The base assumption with respect to minor versioning is that any 8067 future accepted minor version must follow the IETF process and be 8068 documented in a standards track RFC. Therefore, each minor version 8069 number will correspond to an RFC. Minor version 0 of the NFS version 8070 4 protocol is represented by this RFC. 
The COMPOUND and CB_COMPOUND 8071 procedures support the encoding of the minor version being requested 8072 by the client. 8074 The following items represent the basic rules for the development of 8075 minor versions. Note that a future minor version may decide to 8076 modify or add to the following rules as part of the minor version 8077 definition. 8079 1. Procedures are not added or deleted 8081 To maintain the general RPC model, NFSv4 minor versions will not 8082 add to or delete procedures from the NFS program. 8084 2. Minor versions may add operations to the COMPOUND and 8085 CB_COMPOUND procedures. 8087 The addition of operations to the COMPOUND and CB_COMPOUND 8088 procedures does not affect the RPC model. 8090 1. Minor versions may append attributes to the bitmap4 that 8091 represents sets of attributes and to the fattr4 that 8092 represents sets of attribute values. 8094 This allows for the expansion of the attribute model to 8095 allow for future growth or adaptation. 8097 2. Minor version X must append any new attributes after the 8098 last documented attribute. 8100 Since attribute results are specified as an opaque array of 8101 per-attribute XDR encoded results, the complexity of adding 8102 new attributes in the midst of the current definitions would 8103 be too burdensome. 8105 3. Minor versions must not modify the structure of an existing 8106 operation's arguments or results. 8108 Again, the complexity of handling multiple structure definitions 8109 for a single operation is too burdensome. New operations should 8110 be added instead of modifying existing structures for a minor 8111 version. 8113 This rule does not preclude the following adaptations in a minor 8114 version. 
8116 * adding bits to flag fields, such as new attributes to 8117 GETATTR's bitmap4 data type, and providing corresponding 8118 variants of opaque arrays, such as a notify4 used together 8119 with such bitmaps 8121 * adding bits to existing attributes like ACLs that have flag 8122 words 8124 * extending enumerated types (including NFS4ERR_*) with new 8125 values 8127 4. Minor versions must not modify the structure of existing 8128 attributes. 8130 5. Minor versions must not delete operations. 8132 This prevents the potential reuse of a particular operation 8133 "slot" in a future minor version. 8135 6. Minor versions must not delete attributes. 8137 7. Minor versions must not delete flag bits or enumeration values. 8139 8. Minor versions may declare an operation MUST NOT be implemented. 8141 Specifying that an operation MUST NOT be implemented is 8142 equivalent to obsoleting an operation. For the client, it means 8143 that the operation MUST NOT be sent to the server. For the 8144 server, an NFS error can be returned as opposed to "dropping" 8145 the request as an XDR decode error. This approach allows for 8146 the obsolescence of an operation while maintaining its structure 8147 so that a future minor version can reintroduce the operation. 8149 1. Minor versions may declare that an attribute MUST NOT be 8150 implemented. 8152 2. Minor versions may declare that a flag bit or enumeration 8153 value MUST NOT be implemented. 8155 9. Minor versions may downgrade features from REQUIRED to 8156 RECOMMENDED, or RECOMMENDED to OPTIONAL. 8158 10. Minor versions may upgrade features from OPTIONAL to RECOMMENDED 8159 or RECOMMENDED to REQUIRED. 8161 11. A client and server that support minor version X SHOULD support 8162 minor versions 0 through X-1 as well. 8164 12. Except for infrastructural changes, no new features may be 8165 introduced as REQUIRED in a minor version.
8167 This rule allows for the introduction of new functionality and 8168 forces the use of implementation experience before designating a 8169 feature as REQUIRED. On the other hand, some classes of 8170 features are infrastructural and have broad effects. Allowing 8171 infrastructural features to be RECOMMENDED or OPTIONAL 8172 complicates implementation of the minor version. 8174 13. A client MUST NOT attempt to use a stateid, filehandle, or 8175 similar returned object from the COMPOUND procedure with minor 8176 version X for another COMPOUND procedure with minor version Y, 8177 where X != Y. 8179 12. Internationalization 8181 This chapter describes the string-handling aspects of the NFSv4 8182 protocol, and how they address issues related to 8183 internationalization, including issues related to UTF-8, 8184 normalization, string preparation, case folding, and handling of 8185 internationalization issues related to domains. 8187 The NFSv4 protocol needs to deal with internationalization, or I18N, 8188 with respect to file names and other strings as used within the 8189 protocol. The choice of string representation must allow for 8190 reasonable name/string access to clients, applications, and users 8191 which use various languages. The UTF-8 encoding of the UCS as 8192 defined by [7] allows for this type of access and follows the policy 8193 described in "IETF Policy on Character Sets and Languages", [8]. 8195 In implementing such policies, it is important to understand and 8196 respect the nature of NFSv4 as a means by which client 8197 implementations may invoke operations on remote file systems. Server 8198 implementations act as a conduit to a range of file system 8199 implementations that the NFSv4 server typically invokes through a 8200 virtual-file-system interface. 
8202 Keeping this context in mind, one needs to understand that the file 8203 systems with which clients will be interacting will generally not be 8204 devoted solely to access using NFS version 4. Local access and its 8205 requirements will generally be important and often access over other 8206 remote file access protocols will be as well. It is generally a 8207 functional requirement in practice for the users of the NFSv4 8208 protocol (although it may be formally out of scope for this document) 8209 for the implementation to allow files created by other protocols and 8210 by local operations on the file system to be accessed using NFS 8211 version 4 as well. 8213 It also needs to be understood that a considerable portion of file 8214 name processing will occur within the implementation of the file 8215 system rather than within the limits of the NFSv4 server 8216 implementation per se. As a result, certain aspects of name 8217 processing may change as the locus of processing moves from file 8218 system to file system. As a result of these factors, the protocol 8219 cannot enforce uniformity of name-related processing upon NFSv4 8220 server requests on the server as a whole. Because the server 8221 interacts with existing file system implementations, the same server 8222 handling will produce different behavior when interacting with 8223 different file system implementations. To attempt to require uniform 8224 behavior, and treat the protocol server and the file system as a 8225 unified application, would considerably limit the usefulness of the 8226 protocol. 8228 12.1. Use of UTF-8 8230 As mentioned above, UTF-8 is used as a convenient way to encode 8231 Unicode which allows clients that have no internationalization 8232 requirements to avoid these issues since the mapping of ASCII names 8233 to UTF-8 is the identity. 8235 12.1.1.
Relation to Stringprep 8237 RFC 3454 [9], otherwise known as "stringprep", documents a framework 8238 for using Unicode/UTF-8 in networking protocols, intended "to 8239 increase the likelihood that string input and string comparison work 8240 in ways that make sense for typical users throughout the world." A 8241 protocol conforming to this framework must define a profile of 8242 stringprep "in order to fully specify the processing options." 8243 While NFSv4 does make normative references to stringprep and uses 8244 elements of that framework, it does not, for reasons that are 8245 explained below, conform to that framework for all of the strings 8246 that are used within it. 8248 In addition to some specific issues which have caused stringprep to 8249 add confusion in handling certain characters for certain languages, 8250 there are a number of general reasons why stringprep profiles are not 8251 suitable for describing NFSv4. 8253 o Restricting the character repertoire to Unicode 3.2, as required 8254 by stringprep, is unduly constricting. 8256 o Many of the character tables in stringprep are inappropriate 8257 because of this limited character repertoire, so that normative 8258 reference to stringprep is not desirable in many cases and instead, 8259 we allow more flexibility in the definition of case mapping 8260 tables. 8262 o Because of the presence of different file systems, the specifics 8263 of processing are not fully defined and some aspects that are defined are 8264 RECOMMENDED, rather than REQUIRED. 8266 Despite these issues, in many cases the general structure of 8267 stringprep profiles, consisting of sections which deal with the 8268 applicability of the description, the character repertoire, character 8269 mapping, normalization, prohibited characters, and issues of the 8270 handling (i.e., possible prohibition) of bidirectional strings, is a 8271 convenient way to describe the string handling which is needed and 8272 will be used where appropriate.
8274 12.1.2. Normalization, Equivalence, and Confusability 8276 Unicode has defined several equivalence relationships among the set 8277 of possible strings. Understanding the nature and purpose of these 8278 equivalence relations is important to understand the handling of 8279 Unicode strings within NFSv4. 8281 Some string pairs are thought of as differing only in the way accents 8282 and other diacritics are encoded, as illustrated in the examples 8283 below. Such string pairs are called "canonically equivalent". 8285 Such equivalence can occur when there are precomposed characters, 8286 as an alternative to encoding a base character in addition to a 8287 combining accent. For example, the character LATIN SMALL LETTER E 8288 WITH ACUTE (U+00E9) is defined as canonically equivalent to the 8289 string consisting of LATIN SMALL LETTER E followed by COMBINING 8290 ACUTE ACCENT (U+0065, U+0301). 8292 When multiple combining diacritics are present, differences in the 8293 ordering are not reflected in resulting display and the strings 8294 are defined as canonically equivalent. For example, the string 8295 consisting of LATIN SMALL LETTER Q, COMBINING ACUTE ACCENT, 8296 COMBINING GRAVE ACCENT (U+0071, U+0301, U+0300) is canonically 8297 equivalent to the string consisting of LATIN SMALL LETTER Q, 8298 COMBINING GRAVE ACCENT, COMBINING ACUTE ACCENT (U+0071, U+0300, 8299 U+0301). 8301 When both situations are present, the number of canonically 8302 equivalent strings can be greater.
Thus, the following strings 8303 are all canonically equivalent: 8305 LATIN SMALL LETTER E, COMBINING MACRON, COMBINING ACUTE 8306 ACCENT (U+0065, U+0304, U+0301) 8308 LATIN SMALL LETTER E, COMBINING ACUTE ACCENT, COMBINING MACRON 8309 (U+0065, U+0301, U+0304) 8311 LATIN SMALL LETTER E WITH MACRON, COMBINING ACUTE ACCENT 8312 (U+0113, U+0301) 8314 LATIN SMALL LETTER E WITH ACUTE, COMBINING MACRON (U+00E9, 8315 U+0304) 8317 LATIN SMALL LETTER E WITH MACRON AND ACUTE (U+1E17) 8319 Additionally there is an equivalence relation of "compatibility 8320 equivalence". Two canonically equivalent strings are necessarily 8321 compatibility equivalent, although not the converse. An example of 8322 compatibility equivalent strings which are not canonically equivalent 8323 is GREEK CAPITAL LETTER OMEGA (U+03A9) and OHM SIGN (U+2126). These 8324 are identical in appearance while other compatibility equivalent 8325 strings are not. Another example would be "x2" and the two character 8326 string denoting x-squared which are clearly different in appearance 8327 although compatibility equivalent and not canonically equivalent. 8328 These have Unicode encodings LATIN SMALL LETTER X, DIGIT TWO (U+0078, 8329 U+0032) and LATIN SMALL LETTER X, SUPERSCRIPT TWO (U+0078, U+00B2). 8331 One way to deal with these equivalence relations is via 8332 normalization. A normalization form maps all strings to a 8333 corresponding normalized string in such a fashion that all strings 8334 that are equivalent (canonically or compatibly, depending on the 8335 form) are mapped to the same value. Thus the image of the mapping is 8336 a subset of Unicode strings conceived as the representatives of the 8337 equivalence classes defined by the chosen equivalence relation.
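The relationship between normalization and the two equivalence relations can be illustrated with a short Python sketch. It uses the standard unicodedata module, whose tables depend on the Unicode version shipped with the interpreter. The two helper functions are illustrative names and are not part of NFSv4. Two strings are treated as equivalent when they map to the same normalized string: NFD is used for canonical equivalence and NFKD for compatibility equivalence.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b normalize to the same string under NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a: str, b: str) -> bool:
    """True if a and b normalize to the same string under NFKD."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


# Precomposed versus base character plus combining accent.
precomposed = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE
decomposed = "\u0065\u0301"   # LATIN SMALL LETTER E, COMBINING ACUTE ACCENT

# "x2" versus x followed by SUPERSCRIPT TWO.
x_two = "\u0078\u0032"
x_squared = "\u0078\u00b2"

print(canonically_equivalent(precomposed, decomposed))
print(canonically_equivalent(x_two, x_squared))
print(compatibility_equivalent(x_two, x_squared))
```

The first pair is canonically equivalent, while the second pair is only compatibility equivalent, so a file system that normalizes names with a canonical form would still treat "x2" and x-squared as distinct names.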
8339 In the NFSv4 protocol, handling of issues related to 8340 internationalization with regard to normalization follows one of two 8341 basic patterns: 8343 o For strings whose function is related to other internet standards, 8344 such as server and domain naming, the normalization form defined 8345 by the appropriate internet standards is used. For server and 8346 domain naming, this involves normalization form NFKC as specified 8347 in [10]. 8349 o For other strings, particularly those passed by the server to file 8350 system implementations, normalization requirements are the 8351 province of the file system and the job of this specification is 8352 not to specify a particular form but to make sure that 8353 interoperability is maximized, even when clients and server-based 8354 file systems have different preferences. 8356 A related but distinct issue concerns string confusability. This can 8357 occur when two strings (including single-character strings) have a 8358 similar appearance. There have been attempts to define uniform 8359 processing in an attempt to avoid such confusion (see stringprep [9]) 8360 but the results have often added confusion. 8362 Some examples of possible confusions and proposed processing intended 8363 to reduce/avoid confusions: 8365 o Deletion of characters believed to be invisible and appropriately 8366 ignored, justifying their deletion, including the WORD JOINER 8367 (U+2060) and the ZERO WIDTH SPACE (U+200B). 8369 o Deletion of characters supposed to not bear semantics and only 8370 affect glyph choice, including the ZERO WIDTH NON-JOINER (U+200C) 8371 and the ZERO WIDTH JOINER (U+200D), where the deletion turns out 8372 to be a problem for Farsi speakers. 8374 o Prohibition of space characters such as the EM SPACE (U+2003), the 8375 EN SPACE (U+2002), and the THIN SPACE (U+2009). 8377 In addition, there are character pairs which appear very similar and could and 8378 often do result in confusion.
In addition to what Unicode defines as 8379 "compatibility equivalence", there are a considerable number of 8380 additional character pairs that could cause confusion. This includes 8381 characters such as LATIN CAPITAL LETTER O (U+004F) and DIGIT ZERO 8382 (U+0030), and CYRILLIC SMALL LETTER ER (U+0440) and LATIN SMALL LETTER P 8383 (U+0070) (also with MATHEMATICAL BOLD SMALL P (U+1D429) and GREEK 8384 SMALL LETTER RHO (U+03C1), for good measure). 8386 NFSv4, as it does with normalization, takes a two-part approach to 8387 this issue: 8389 o For strings whose function is related to other internet standards, 8390 such as server and domain naming, any string processing to address 8391 the confusability issue is defined by the appropriate internet 8392 standards. For server and domain naming, this is the 8393 responsibility of IDNA as described in [10]. 8395 o For other strings, particularly those passed by the server to file 8396 system implementations, any such preparation requirements 8397 including the choice of how, or whether to address the 8398 confusability issue, are the responsibility of the file system to 8399 define, and for this specification to try to add its own set would 8400 add unacceptably to complexity, and make many files accessible 8401 locally and by other remote file access protocols, inaccessible by 8402 NFSv4. This specification defines how the protocol maximizes 8403 interoperability in the face of different file system 8404 implementations. NFSv4 does allow file systems to map and to 8405 reject characters, including those likely to result in confusion, 8406 since file systems may choose to do such things. It defines what 8407 the client will see in such cases, in order to limit problems that 8408 can arise when a file name is created and it appears to have a 8409 different name from the one it is assigned when the name is 8410 created. 8412 12.2. String Type Overview 8414 12.2.1.
Overall String Class Divisions

8416   NFSv4 has to deal with a large set of different types of strings and
8417   because of the different role of each, internationalization issues
8418   will be different for each:

8420   o  For some types of strings, the fundamental internationalization-
8421      related decisions are the province of the file system or the
8422      security-handling functions of the server and the protocol's job
8423      is to establish the rules under which file systems and servers are
8424      allowed to exercise this freedom, to avoid adding to confusion.

8426   o  In other cases, the fundamental internationalization issues are
8427      the responsibility of other IETF groups and our job is simply to
8428      reference those and perhaps make a few choices as to how they are
8429      to be used (e.g., U-labels vs. A-labels).

8431   o  There are also cases in which a string has a small amount of NFSv4
8432      processing which results in one or more strings being referred to
8433      one of the other categories.

8435   We will divide strings to be dealt with into the following classes:

8437   MIX:  indicating that there is a small amount of preparatory processing
8438      that either picks an internationalization handling mode or divides
8439      the string into a set of (two) strings with a different mode of
8440      internationalization handling for each. The details are discussed
8441      in the section "Types with Pre-processing to Resolve Mixture
8442      Issues".

8444   NIP:  indicating that, for various reasons, there is no need for
8445      internationalization-specific processing to be performed. The
8446      specifics of the various string types handled in this way are
8447      described in the section "String Types without
8448      Internationalization Processing".

8450   INET:  indicating that the string needs to be processed in a fashion
8451      governed by non-NFS-specific internet specifications. The details
8452      are discussed in the section "Types with Processing Defined by
8453      Other Internet Areas".
8455   NFS:  indicating that the string needs to be processed in a fashion
8456      governed by NFSv4-specific considerations. The primary focus is
8457      on enabling flexibility for the various file systems to be
8458      accessed and is described in the section "String Types with NFS-
8459      specific Processing".

8461   12.2.2. Divisions by Typedef Parent Types

8463   There are a number of different string types within NFSv4 and
8464   internationalization handling will be different for different types
8465   of strings. Each of the types will be in one of four groups based on
8466   the parent type that specifies the nature of its relationship to UTF-8
8467   and ASCII.

8469   utf8_should/USHOULD:  indicating that strings of this type SHOULD be
8470      UTF-8 but clients and servers will not check for valid UTF-8
8471      encoding.

8473   utf8val_should/UVSHOULD:  indicating that strings of this type SHOULD
8474      be and generally will be in the form of the UTF-8 encoding of
8475      Unicode. Strings in most cases will be checked by the server for
8476      valid UTF-8 but for certain file systems, such checking may be
8477      inhibited.

8479   utf8val_must/UVMUST:  indicating that strings of this type MUST be in
8480      the form of the UTF-8 encoding of Unicode. Strings will be
8481      checked by the server for valid UTF-8 and the server SHOULD ensure
8482      that when sent to the client, they are valid UTF-8.

8484   ascii_must/ASCII:  indicating that strings of this type MUST be pure
8485      ASCII, and thus automatically UTF-8. The processing of these
8486      strings must ensure that they have only ASCII characters but
8487      this need not be a separate step if any normally required check
8488      for validity inherently assures that only ASCII characters are
8489      present.

8491   In those cases where UTF-8 is not required (USHOULD and UVSHOULD) and
8492   strings that are not valid UTF-8 are received and accepted, the
8493   receiver MUST NOT modify the strings.
       For example, setting
8494   particular bits such as the high-order bit to zero MUST NOT be done.

8496   12.2.3. Individual Types and Their Handling

8498   The first table outlines the handling for the primary string types,
8499   i.e., those not derived as a prefix or a suffix from a mixture type.

8501   +-----------------+----------+-------+------------------------------+
8502   | Type            | Parent   | Class | Explanation                  |
8503   +-----------------+----------+-------+------------------------------+
8504   | comptag4        | USHOULD  | NIP   | Should be utf8 but no        |
8505   |                 |          |       | validation by server or      |
8506   |                 |          |       | client is to be done.        |
8507   | component4      | UVSHOULD | NFS   | Should be utf8 but clients   |
8508   |                 |          |       | may need to access file      |
8509   |                 |          |       | systems with a different     |
8510   |                 |          |       | name structure, such as file |
8511   |                 |          |       | systems that have non-utf8   |
8512   |                 |          |       | names.                       |
8513   | linktext4       | UVSHOULD | NFS   | Should be utf8 since text    |
8514   |                 |          |       | may include name components. |
8515   |                 |          |       | Because of the need to       |
8516   |                 |          |       | access existing file         |
8517   |                 |          |       | systems, this check may be   |
8518   |                 |          |       | inhibited.                   |
8519   | fattr4_mimetype | ASCII    | NIP   | All mime types are ascii so  |
8520   |                 |          |       | no specific utf8 processing  |
8521   |                 |          |       | is required, given that you  |
8522   |                 |          |       | are comparing to that list.  |
8523   +-----------------+----------+-------+------------------------------+

8525                                  Table 5

8527   There are a number of string types that are subject to preliminary
8528   processing. This processing may take the form either of selecting
8529   one of two possible forms based on the string contents or it may
8530   consist of dividing the string into multiple conjoined strings each
8531   with different utf8-related processing.
8533   +---------+--------+-------+----------------------------------------+
8534   | Type    | Parent | Class | Explanation                            |
8535   +---------+--------+-------+----------------------------------------+
8536   | prin4   | UVMUST | MIX   | Consists of two parts separated by an  |
8537   |         |        |       | at-sign, a prinpfx4 and a prinsfx4.    |
8538   |         |        |       | These are described in the next table. |
8539   | server4 | UVMUST | MIX   | Is either an IP address (svraddr4)     |
8540   |         |        |       | which has to be pure ascii or a server |
8541   |         |        |       | name svrname4, which is described      |
8542   |         |        |       | immediately below.                     |
8543   +---------+--------+-------+----------------------------------------+

8545                                  Table 6

8547   The last table describes the components of the compound types
8548   described above.

8550   +----------+--------+-------+----------------------------------------+
8551   | Type     | Parent | Class | Explanation                            |
8552   +----------+--------+-------+----------------------------------------+
8553   | svraddr4 | ASCII  | NIP   | Server as IP address, whether IPv4 or  |
8554   |          |        |       | IPv6.                                  |
8555   | svrname4 | UVMUST | INET  | Server name as returned by server.     |
8556   |          |        |       | Not sent by client, except in          |
8557   |          |        |       | VERIFY/NVERIFY.                        |
8558   | prinsfx4 | UVMUST | INET  | Suffix part of principal, in the form  |
8559   |          |        |       | of a domain name.                      |
8560   | prinpfx4 | UVMUST | NFS   | Must match one of a list of valid      |
8561   |          |        |       | users or groups for that particular    |
8562   |          |        |       | domain.                                |
8563   +----------+--------+-------+----------------------------------------+

8565                                  Table 7

8567   12.3. Errors Related to Strings

8569   When the client sends an invalid UTF-8 string in a context in which
8570   UTF-8 is REQUIRED, the server MUST return an NFS4ERR_INVAL error.
8571   Within the framework of the previous section, this applies to strings
8572   whose type is defined as utf8val_must or ascii_must.
       When the client
8573   sends an invalid UTF-8 string in a context in which UTF-8 is
8574   RECOMMENDED and the server should test for UTF-8, the server SHOULD
8575   return an NFS4ERR_INVAL error. Within the framework of the previous
8576   section, this applies to strings whose type is defined as
8577   utf8val_should. These situations apply to cases in which
8578   inappropriate prefixes are detected and where the count includes
8579   trailing bytes that do not constitute a full UCS character.

8581   Where the client-supplied string is valid UTF-8 but contains
8582   characters that are not supported by the server file system as a
8583   value for that string (e.g., names containing characters that have
8584   more than two octets on a file system that supports UCS-2 characters
8585   only, or file name components containing slashes on file systems that
8586   do not allow them in file name components), the server MUST return an
8587   NFS4ERR_BADCHAR error.

8589   Where a UTF-8 string is used as a file name component, and the file
8590   system, while supporting all of the characters within the name, does
8591   not allow that particular name to be used, the server should return
8592   the error NFS4ERR_BADNAME. This includes file system prohibitions of
8593   "." and ".." as file names for certain operations, and other such
8594   similar constraints. It does not include use of strings with non-
8595   preferred normalization modes.

8597   Where a UTF-8 string is used as a file name component, the file
8598   system implementation MUST NOT return NFS4ERR_BADNAME simply due to
8599   a normalization mismatch. In such cases the implementation SHOULD
8600   convert the string to its own preferred normalization mode before
8601   performing the operation. As a result, a client cannot assume that a
8602   file created with a name it specifies will have that name when the
8603   directory is read. It may instead have the name converted to the
8604   file system's preferred normalization form.
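
   The choice among these errors for a file name component can be
   sketched as follows. This is an illustration only, not part of the
   protocol: the function name check_component, the assumption that the
   file system prohibits slashes and the names "." and "..", and the
   assumed preference for NFC are all hypothetical.

```python
import unicodedata

def check_component(raw: bytes, preferred_form: str = "NFC"):
    """Return (status, stored_name) for a candidate file name component.

    Hypothetical file system: requires UTF-8, prohibits "/" as a
    character and "." / ".." as names, and prefers one normalization form.
    """
    try:
        # Strict decoding rejects bad prefixes and truncated sequences.
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ("NFS4ERR_INVAL", None)
    if "/" in name:
        # Valid UTF-8, but a character this file system does not support.
        return ("NFS4ERR_BADCHAR", None)
    if name in (".", ".."):
        # Every character is supported, but the name itself is not allowed.
        return ("NFS4ERR_BADNAME", None)
    # A normalization mismatch is not an error: convert and proceed.
    return ("NFS4_OK", unicodedata.normalize(preferred_form, name))
```

   In this sketch a name sent as "e" followed by COMBINING ACUTE ACCENT
   (U+0301) is accepted and stored as the single character U+00E9, which
   is why the client may later see a different name when the directory
   is read.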
8606   Where a UTF-8 string is used other than as a file name component (or
8607   as symbolic link text) and the string does not meet the normalization
8608   requirements specified for it, the error NFS4ERR_INVAL is returned.

8610   12.4. Types with Pre-processing to Resolve Mixture Issues

8612   12.4.1. Processing of Principal Strings

8614   Strings denoting principals (users or groups) MUST be UTF-8 but since
8615   they consist of a principal prefix, an at-sign, and a domain, all
8616   three of which either are checked for being UTF-8, or inherently are
8617   UTF-8, checking the string as a whole for being UTF-8 is not
8618   required. Although a server implementation may choose to make this
8619   check on the string as a whole, for example in converting it to
8620   Unicode, the description within this document will reflect a
8621   processing model in which such checking happens after the division
8622   into a principal prefix and suffix, the latter being in the form of a
8623   domain name.

8625   The string should be scanned for at-signs. If there is more than one
8626   at-sign, the string is considered invalid. For cases in which there
8627   are no at-signs or the at-sign appears at the start or end of the
8628   string, see "Interpreting owner and owner_group". Otherwise, the
8629   portion before the at-sign is dealt with as a prinpfx4 and the
8630   portion after is dealt with as a prinsfx4.

8632   12.4.2. Processing of Server Id Strings

8634   Server id strings typically appear in responses (as attribute values)
8635   and only appear in requests as an attribute value presented to VERIFY
8636   and NVERIFY. With that exception, they are not subject to server
8637   validation and possible rejection. It is not expected that clients
8638   will typically do such validation on receipt of responses but they
8639   may as a way to check for proper server behavior. The responsibility
8640   for sending correct UTF-8 strings is with the server.

8642   Servers are identified by either server names or IP addresses.
       Once
8643   an id has been identified as an IP address, then there is no
8644   processing specific to internationalization to be done, since such an
8645   address must be ASCII to be valid.

8647   12.5. String Types without Internationalization Processing

8649   There are a number of types of strings which, for a number of
8650   different reasons, do not require any internationalization-specific
8651   handling, such as validation of UTF-8, normalization, or character
8652   mapping or checking. This does not necessarily mean that the strings
8653   need not be UTF-8. In some cases, other checking on the strings
8654   ensures that they are valid UTF-8, without doing any checking
8655   specific to internationalization.

8657   The following are the specific types:

8659   comptag4:  strings are an aid to debugging and the sender should
8660      avoid confusion by not using anything but valid UTF-8. But any
8661      work validating the string or modifying it would only add
8662      complication to a mechanism whose basic function is best supported
8663      by making it not subject to any checking and having data maximally
8664      available to be looked at in a network trace.

8666   fattr4_mimetype:  strings need to be validated by matching against a
8667      list of valid mime types. Since these are all ASCII, no
8668      processing specific to internationalization is required since
8669      anything that does not match is invalid and anything which does
8670      not obey the rules of UTF-8 will not be ASCII and consequently
8671      will not match, and will be invalid.

8673   svraddr4:  strings, in order to be valid, need to be ASCII, but if
8674      you check them for validity, you have inherently checked that
8675      they are ASCII and thus UTF-8.

8677   12.6.
Types with Processing Defined by Other Internet Areas

8679   There are two types of strings which NFSv4 deals with whose
8680   processing is defined by other Internet standards, and where issues
8681   related to different handling choices by server operating systems or
8682   server file systems do not apply.

8684   These are as follows:

8686   o  Server names as they appear in the fs_locations attribute. Note
8687      that for most purposes, such server names will only be sent by the
8688      server to the client. The exception is use of the fs_locations
8689      attribute in a VERIFY or NVERIFY operation.

8691   o  Principal suffixes which are used to denote sets of users and
8692      groups, and are in the form of domain names.

8694   The general rules for handling all of these domain-related strings
8695   are similar and independent of the role of the sender or receiver as
8696   client or server although the consequences of failure to obey these
8697   rules may be different for client or server. The server can report
8698   errors when it is sent invalid strings, whereas the client will
8699   simply ignore invalid strings or use a default value in their place.

8701   The string sent SHOULD be in the form of a U-label although it MAY be
8702   in the form of an A-label or a UTF-8 string that would not map to
8703   itself when canonicalized by applying ToUnicode(ToASCII(...)). The
8704   receiver needs to be able to accept domain and server names in any of
8705   the formats allowed. The server MUST reject, using the error
8706   NFS4ERR_INVAL, a string which is not valid UTF-8 or which begins with
8707   "xn--" and violates the rules for a valid A-label.

8709   When a domain string is part of id@domain or group@domain, the server
8710   SHOULD map domain strings which are A-labels or are UTF-8 domain
8711   names which are not U-labels, to the corresponding U-label, using
8712   ToUnicode(domain) or ToUnicode(ToASCII(domain)).
       As a result, the
8713   domain name returned within a userid on a GETATTR may not match that
8714   sent when the userid is set using SETATTR, although when this
8715   happens, the domain will be in the form of a U-label. When the
8716   server does not map domain strings which are not U-labels into a
8717   U-label, which it MAY do, it MUST NOT modify the domain and the
8718   domain returned on a GETATTR of the userid MUST be the same as that
8719   used when setting the userid by the SETATTR.

8721   The server MAY implement VERIFY and NVERIFY without translating
8722   internal state to a string form, so that, for example, a user
8723   principal which represents a specific numeric user id, will match a
8724   different principal string which represents the same numeric user id.

8726   12.7. String Types with NFS-specific Processing

8728   For a number of data types within NFSv4, the primary responsibility
8729   for internationalization-related handling is that of some entity
8730   other than the server itself (see below for details). In these
8731   situations, the primary responsibility of NFSv4 is to provide a
8732   framework in which that other entity (file system and server
8733   operating system principal naming framework) implements its own
8734   decisions while establishing rules to limit interoperability issues.

8736   This pattern applies to the following data types:

8738   o  In the case of name components (strings of type component4), the
8739      server-side file system implementation (of which there may be more
8740      than one for a particular server) deals with internationalization
8741      issues, in a fashion that is appropriate to NFSv4, other remote
8742      file access protocols, and local file access methods. See
8743      "Handling of File Name Components" for the detailed treatment.
8745   o  In the case of link text strings (strings of type linktext4), the
8746      issues are similar, but file systems are restricted in the set of
8747      acceptable internationalization-related processing that they may
8748      do, principally because symbolic links may contain name components
8749      that, when used, are presented to other file systems and/or other
8750      servers. See "Processing of Link Text" for the detailed
8751      treatment.

8753   o  In the case of principal prefix strings, any decisions regarding
8754      internationalization are the responsibility of the server
8755      operating system, which may make its own rules regarding user and
8756      group name encoding. See "Processing of Principal Prefixes" for
8757      the detailed treatment.

8759   12.7.1. Handling of File Name Components

8761   There are a number of places within client and server where file name
8762   components are processed:

8764   o  On the client, file names may be processed as part of forming
8765      NFSv4 requests. Any such processing will reflect specific needs
8766      of the client's environment and will be treated as out-of-scope
8767      from the viewpoint of this specification.

8769   o  On the server, file names are processed as part of processing
8770      NFSv4 requests. In practice, parts of the processing will be
8771      implemented within the NFS version 4 server while other parts will
8772      be implemented within the file system. This processing is
8773      described in the sections below. These sections are organized in
8774      a fashion parallel to a stringprep profile. The same sorts of
8775      topics are dealt with but they differ in that there is a wider
8776      range of possible processing choices.

8778   o  On the server, file name components might potentially be subject
8779      to processing as part of generating NFS version 4 responses. This
8780      specification assumes that this processing will be empty and that
8781      file name components will be copied verbatim at this point.
          The
8782      file name components may be modified as they appear in responses,
8783      relative to the values used in the request but this is only
8784      treated as reflecting changes made as part of request processing.
8785      For example, a change to a file name component made in processing
8786      a CREATE operation will be reflected in the READDIR since the
8787      files created will have names that reflect CREATE-time processing.

8789   o  On the client, responses will need to be properly dealt with and
8790      the relevant issues will be discussed in the sections below.
8791      Primarily, this will involve dealing with the fact that file name
8792      components received in responses may need to be processed to meet
8793      the requirements of the client's internal environment. This will
8794      mainly involve dealing with changes in name components possibly
8795      made by server processing. It also addresses other sorts of
8796      expected behavior that do not involve a returned component4, such
8797      as whether a LOOKUP finds a given component4 or whether a CREATE
8798      or OPEN finds that a specified name already exists.

8800   12.7.1.1. Nature of Server Processing of Name Components in Request

8802   The component4 type defines a potentially case sensitive string,
8803   typically of UTF-8 characters. Its use in NFS version 4 is for
8804   representing file name components. Since file systems can implement
8805   case insensitive file name handling, it can be used for both case
8806   sensitive and case insensitive file name handling, based on the
8807   attributes of the file system.

8809   It may be the case that two valid distinct UTF-8 strings will be the
8810   same after the processing described below.
       In such a case, a server
8811   may either

8813   o  disallow the creation of a second name if its post-processed form
8814      collides with that of an existing name, or

8816   o  allow the creation of the second name, but arrange so that after
8817      post processing, the second name is different than the post-
8818      processed form of the first name.

8820   12.7.1.2. Character Repertoire for the Component4 Type

8822   The RECOMMENDED character repertoire for file name components is a
8823   recent/current version of Unicode, as encoded via UTF-8. There are a
8824   number of alternate character repertoires which may be chosen by the
8825   server based on implementation constraints including the requirements
8826   of the file system being accessed.

8828   Two important alternative repertoires are:

8830   o  One alternate character repertoire is to represent file name
8831      components as strings of bytes with no protocol-defined encoding
8832      of multi-byte characters. Most typically, implementations that
8833      support this single-byte alternative will make it available as an
8834      option set by an administrator for all file systems within a
8835      server or for some particular file systems. If a server accepts
8836      non-UTF-8 strings anywhere within a specific file system, then it
8837      MUST do so throughout the entire file system.

8839   o  Another alternate character repertoire is the set of codepoints
8840      representable by the file system, most typically UCS-4.

8842   Individual file system implementations may have more restricted
8843   character repertoires, as for example file systems that are only
8844   capable of storing names consisting of UCS-2 characters. When this
8845   is the case, and the character repertoire is not restricted to
8846   single-byte characters, characters not within that repertoire are
8847   treated as prohibited and the error NFS4ERR_BADCHAR is returned by
8848   the server when that character is encountered.
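
   The repertoire alternatives above can be pictured as a single mode
   chosen per file system. The sketch below is illustrative only: the
   Repertoire enumeration and the admit function are hypothetical names,
   and the UCS-2 case is approximated as "code points no higher than
   U+FFFF".

```python
from enum import Enum

class Repertoire(Enum):
    UNICODE = "unicode"  # recommended: Unicode encoded via UTF-8
    BYTES = "bytes"      # names are opaque byte strings, UTF-8 not enforced
    UCS2 = "ucs2"        # restricted: only UCS-2 characters can be stored

def admit(raw: bytes, mode: Repertoire) -> str:
    """Return the status a server might give for a name under one mode.

    The mode belongs to the file system as a whole, so non-UTF-8 names
    are either accepted everywhere in it or nowhere.
    """
    if mode is Repertoire.BYTES:
        return "NFS4_OK"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "NFS4ERR_INVAL"
    if mode is Repertoire.UCS2 and any(ord(ch) > 0xFFFF for ch in text):
        # Valid UTF-8, but outside what this file system can store.
        return "NFS4ERR_BADCHAR"
    return "NFS4_OK"
```

   Keeping the mode as one value per file system, rather than deciding
   name by name, is one simple way to honor the requirement that
   acceptance of non-UTF-8 strings be uniform throughout a file system.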
8850   Strings are intended to be in UTF-8 format and servers SHOULD return
8851   NFS4ERR_INVAL, as discussed above, when the characters sent are not
8852   valid UTF-8. When the character repertoire consists of single-byte
8853   characters, UTF-8 is not enforced. Such situations should be
8854   restricted to those where use is within a restricted environment
8855   where a single character mapping locale can be administratively
8856   enforced, allowing a file name to be treated as a string of bytes,
8857   rather than as a string of characters. Such an arrangement might be
8858   necessary when NFSv4 access to a file system containing names which
8859   are not valid UTF-8 needs to be provided.

8861   However, in any of the following situations, file names have to be
8862   treated as strings of Unicode characters and servers MUST return
8863   NFS4ERR_INVAL when file names are not in UTF-8 format:

8865   o  Case-insensitive comparisons are specified by the file system and
8866      any characters sent contain non-ASCII byte codes.

8868   o  Any normalization constraints are enforced by the server or file
8869      system implementation.

8871   o  The server accepts a given name when creating a file and reports a
8872      different one when the directory is being examined.

8874   Much of the discussion below regarding normalization and silent
8875   deletion of characters within component4 strings is not applicable
8876   when the server does not enforce UTF-8 component4 strings and treats
8877   them as strings of bytes. A client may determine that a given
8878   filesystem is operating in this mode by performing a LOOKUP using a
8879   non-UTF-8 string; if NFS4ERR_INVAL is not returned, then name
8880   components will be treated as opaque and those sorts of modifications
8881   will not be seen.

8883   12.7.1.3. Case-based Mapping Used for Component4 Strings

8885   Case-based mapping is not always a required part of server processing
8886   of name components.
       However, if the NFSv4 file server supports the
8887   case_insensitive file system attribute, and if the case_insensitive
8888   attribute is true for a given file system, the NFS version 4 server
8889   MUST use the Unicode case mapping tables for the version of Unicode
8890   corresponding to the character repertoire. In the case where the
8891   character repertoire is UCS-2 or UCS-4, the case mapping tables from
8892   the latest available version of Unicode SHOULD be used.

8894   If the case_preserving attribute is present and set to false, then
8895   the NFSv4 server MUST use the corresponding Unicode case mapping
8896   table to map case when processing component4 strings. Whether the
8897   server maps from lower to upper case or from upper to lower case is a
8898   matter for implementation choice.

8900   Stringprep Table B.2 should not be used for these purposes since it is
8901   limited to Unicode version 3.2 and also because it erroneously maps
8902   the German ligature eszett to the string "ss", whereas later versions
8903   of Unicode contain both lower-case and upper-case versions of Eszett
8904   (SMALL LETTER SHARP S and CAPITAL LETTER SHARP S).

8906   Clients should be aware that servers may have mapped SMALL LETTER
8907   SHARP S to the string "ss" when case-insensitive mapping is in
8908   effect, with the result that a file whose name contains SMALL LETTER
8909   SHARP S may have that character replaced by "ss" or "SS".

8911   12.7.1.4. Other Mapping Used for Component4 Strings

8913   Other than for issues of case mapping, an NFSv4 server SHOULD limit
8914   visible mappings (i.e., those that change the name of the file to
8915   reflect those mappings) to those from a subset of the stringprep Table B.1.

8917   Note particularly, the mappings from U+200C and U+200D to the empty
8918   string should be avoided, due to their undesirable effect on some
8919   strings in Farsi.

8921   Table B.1 may be used but it should be used only if required by the
8922   local file system implementation.
       For example, if the file system in
8923   question accepts file names containing the MONGOLIAN TODO SOFT HYPHEN
8924   character (U+1806) and they are distinct from the corresponding file
8925   names with this character removed, then using Table B.1 will cause
8926   functional problems when clients attempt to interact with that file
8927   system. The NFSv4 server implementation including the filesystem
8928   MUST NOT silently remove characters not within Table B.1.

8930   If an implementation wishes to eliminate other characters because it
8931   is believed that allowing component names that are otherwise the same
8932   but differ in whether they include the character will
8933   contribute to confusion, it has two options:

8935   o  Treat the characters as prohibited and return NFS4ERR_BADCHAR.

8937   o  Eliminate the character as part of the name matching processing,
8938      while retaining it when a file is created. This would be
8939      analogous to file systems that are both case-insensitive and case-
8940      preserving, as discussed above, or those which are both
8941      normalization-insensitive and normalization-preserving, as
8942      discussed below. The handling will be insensitive to the presence
8943      of the chosen characters while preserving the presence or absence
8944      of such characters within names.

8946   Note that the second of these choices is a desirable way to handle
8947   characters within Table B.1, again with the exception of U+200C and
8948   U+200D, which can cause issues for Farsi.

8950   In addition to modification due to normalization, discussed below,
8951   clients have to be able to deal with name modifications and other
8952   consequences of character mapping on the server, as discussed above.

8954   12.7.1.5. Normalization Issues for Component Strings

8956   The issues are best discussed separately for the server and the
8957   client.
       It is important to note that the server and client may have
8958   different approaches to this area, and that the server choice may not
8959   match the client operating environment. The issue of mismatches and
8960   how they may be best dealt with by the client is discussed in a later
8961   section.

8963   12.7.1.5.1. Server Normalization Issues for Component Strings

8965   NFSv4 does not specify required use of a particular normalization
8966   form for component4 strings. Therefore, the server may receive
8967   unnormalized strings or strings that reflect either normalization
8968   form within protocol requests and responses. If the file system
8969   requires normalization, then the server implementation must normalize
8970   component4 strings within the protocol server before presenting the
8971   information to the local file system.

8973   With regard to normalization, servers have the following choices,
8974   with the possibility that different choices may be selected for
8975   different file systems.

8977   o  Implement a particular normalization form, either NFC or NFD, in
8978      which case file names received from a client are converted to that
8979      normalization form and as a consequence, the client will always
8980      receive names in that normalization form. If this option is
8981      chosen, then it is impossible to create two files in the same
8982      directory that have different names which map to the same name
8983      when normalized.

8985   o  Implement handling which is both normalization-insensitive and
8986      normalization-preserving. This makes it impossible to create two
8987      files in the same directory that have two different canonically
8988      equivalent names, i.e., names which map to the same name when
8989      normalized. However, unlike the previous option, clients will not
8990      have the names that they present modified to meet the server's
8991      normalization constraints.
8993   o  Implement normalization-sensitive handling without enforcing a
8994      normalization form constraint on file names. This exposes the
8995      client to the possibility that two files can be created in the
8996      same directory which have different names which map to the same
8997      name when normalized. This may be a significant issue when
8998      clients which use different normalization forms are used on the
8999      same file system, but this issue needs to be set against the
9000      difficulty of providing other sorts of normalization handling for
9001      some existing file systems.

9003   12.7.1.5.2. Client Normalization Issues for Component Strings

9005   The client, in processing name components, needs to deal with the
9006   fact that the server may impose normalization on file name components
9007   presented to it. As a result, a file can be created within a
9008   directory and that name be different from that sent by the client due
9009   to normalization at the server.

9011   Client operating environments differ in their handling of canonically
9012   equivalent names. Some environments treat canonically equivalent
9013   strings as essentially equal and we will call these environments
9014   normalization-aware. Others, because of the pattern of their
9015   development with regard to these issues, treat different strings as
9016   different, even if they are canonically equivalent. We call these
9017   normalization-unaware.

9019   We discuss below issues that may arise when each of these types of
9020   environments interact with the various types of file systems, with
9021   regard to normalization handling. Note that complexity for the
9022   client is increased given that there are no file system attributes to
9023   determine the normalization handling present for that file system.
9024   Where the client has the ability to create files (file system not
9025   read-only and security allows it), attempting to create multiple
9026   files with canonically equivalent names and looking at success
9027   patterns and the names assigned by the server to these files can
9028   serve as a way to determine the relevant information.

9030   Normalization-aware environments interoperate most normally with
9031   servers that either impose a given normalization form or those that
9032   implement name handling which is both normalization-insensitive and
9033   normalization-preserving. However, clients need to be
9034   prepared to interoperate with servers that have normalization-
9035   sensitive file naming. In this situation, the client needs to be
9036   prepared for the fact that a directory may contain multiple names
9037   that it considers equivalent.

9039   The following suggestions may be helpful in handling interoperability
9040   issues for normalization-aware client environments, when they
9041   interact with normalization-sensitive file systems.

9043      When READDIR is done, the names returned may include names that do
9044      not match the client's normalization form, but instead are other
9045      names canonically equivalent to the normalized name.

9047      When it can be determined that a normalization-insensitive server
9048      file system is not involved, the client can simply normalize
9049      filename components strings to its preferred normalization form.

9051      When it cannot be determined that a normalization-insensitive
9052      server file system is not involved, the client is generally best
9053      advised to process incoming name components so as to allow all
9054      name components in a canonical equivalence class to be together.
9055      When only a single member of a class exists, it should generally be
9056      mapped directly to the preferred normalization form, whether the
9057      name was of that form or not.
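
   The grouping of name components by canonical equivalence class
   suggested above can be sketched as follows. The function names are
   hypothetical and NFC is assumed as the client's preferred form;
   classes with more than one member are left unchanged here, since
   their handling is a separate matter.

```python
import unicodedata
from collections import defaultdict

def group_by_equivalence(names, form="NFC"):
    """Group names so that canonically equivalent ones share a key."""
    classes = defaultdict(list)
    for name in names:
        classes[unicodedata.normalize(form, name)].append(name)
    return dict(classes)

def present_names(names, form="NFC"):
    """Map each server-supplied name to the name shown by the client."""
    shown = {}
    for key, members in group_by_equivalence(names, form).items():
        if len(members) == 1:
            # A lone member is mapped to the preferred normalization form.
            shown[members[0]] = key
        else:
            # Several equivalent names exist; keep them distinct as received.
            for member in members:
                shown[member] = member
    return shown
```

   For a directory holding a decomposed name with no canonically
   equivalent sibling, present_names reports the composed form, while
   two equivalent spellings of the same name are each reported as
   received.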
9059 When the client sees multiple names that are canonically 9060 equivalent, it is clear that the file system is 9061 normalization-sensitive. Clients should generally replace each 9062 canonically equivalent name with one that appends some 9063 distinguishing suffix, usually including a number. The numbers 9064 should be assigned so that each distinct possible name within the 9065 set of canonically equivalent names has an assigned numeric value. 9066 Note that for some cases in which there are multiple instances of 9067 strings that might be composed or decomposed and/or situations 9068 with multiple diacritics to be applied to the same character, the 9069 class might be large. 9071 When interacting with a normalization-sensitive filesystem, it may 9072 be that the environment contains clients or implementations local 9073 to the OS in which the file system is embedded, which use a 9074 different normalization form. In such situations, a LOOKUP may 9075 well fail, even though the directory contains a name canonically 9076 equivalent to the name sought. One solution to this problem is to 9077 re-do the LOOKUP in that situation with the name converted to the 9078 alternate normalization form. 9080 In the case in which normalization-unaware clients are involved in 9081 the mix, LOOKUP can fail and then the second LOOKUP, described 9082 above, can also fail, even though there may well be a canonically 9083 equivalent name in the directory. One possible approach in that 9084 case is to use a READDIR to find the equivalent name and perform a LOOKUP on 9085 that, although this can greatly add to client implementation 9086 complexity. 9088 When interacting with a normalization-sensitive filesystem, the 9089 situation where the environment contains clients or 9090 implementations local to the OS in which the file system is 9091 embedded, which use a different normalization form, can also cause 9092 issues when a file (or symlink or directory, etc.) is being 9093 created.
In such cases, the client may be able to create an object of 9094 the specified name even though the directory contains a 9095 canonically equivalent name. Similar issues can occur with LINK 9096 and RENAME. The client can't really do much about such 9097 situations, except be aware that they may occur. That's one of 9098 the reasons normalization-sensitive server file system 9099 implementations can be problematic to use when 9100 internationalization issues are important. 9102 Normalization-unaware environments interoperate most normally with 9103 servers that implement normalization-sensitive file naming. However, 9104 clients need to be prepared to interoperate with servers that impose 9105 a given normalization form or that implement name handling which is 9106 both normalization-insensitive and normalization-preserving. In the 9107 former case, a file created with a given name may have that name changed to 9108 a different (although related) name. In both cases, the client will 9109 have to deal with the fact that it is unable to create two names 9110 within a directory that are canonically equivalent. 9112 Note that although the client implementation itself and the kernel 9113 implementation may be normalization-unaware, treating name components 9114 as strings not subject to normalization, the environment as a whole 9115 may be normalization-aware if commonly used libraries result in an 9116 application environment where a single normalization form is used 9117 throughout. Because of this, normalization-unaware environments may 9118 be relatively rare. 9120 The following suggestions may be helpful in handling interoperability 9121 issues for truly normalization-unaware client environments, when they 9122 interact with file systems other than those which are normalization- 9123 sensitive. The issues tend to be the inverse of those for 9124 normalization-aware environments.
The implementer should be careful 9125 not to erroneously treat the environment as normalization-unaware, 9126 based solely on the details of the kernel implementation. 9128 Unless the file system is normalization-preserving, when files (or 9129 other objects) are created, the object name as reported by a 9130 READDIR of the associated directory may show a name different than 9131 the one used to create the object. This behavior is something 9132 that the client has to accept. Since it has no preferred 9133 normalization form, it has no way of converting the name to a 9134 preferred form. 9136 In situations where there is an attempt to create multiple objects 9137 in the same directory which have canonically equivalent names, 9138 these file systems will either report that an object of that name 9139 already exists or simply open a file of that other name. 9141 If it is desired to have those two objects in the same directory, the 9142 names must be made not canonically equivalent. It is possible to 9143 append some distinguishing character to the name of the second 9144 object, but in clients having a typical file API (such as POSIX), 9145 the fact that the name change occurred cannot be propagated back 9146 to the requester. 9148 In cases where a client is application-specific, it may be 9149 possible for it to deal with such a collision by modifying the 9150 name and taking note of the changed name. 9152 12.7.1.6. Prohibited Characters for Component Names 9154 The NFSv4 protocol does not specify particular characters that may 9155 not appear in component names. File systems may have their own set 9156 of prohibited characters for which the error NFS4ERR_BADCHAR should 9157 be returned by the server. Clients need to be prepared for this 9158 error to occur whenever file name components are presented to the 9159 server.
9161 Clients whose character repertoire for acceptable characters in file 9162 name components is smaller than the entire scope of UCS-4 may need to 9163 deal with names returned by the server that contain characters 9164 outside that repertoire. It is up to the client whether it simply 9165 ignores these files or modifies the name to meet its own rules for 9166 acceptable names. 9168 Clients may encounter names that do not consist of valid UTF-8, if 9169 they interact with servers configured to allow this option. They are 9170 not required to deal with this case and may treat the server as not 9171 functioning correctly, or they may handle this as normal. Clients 9172 will normally make this a configuration option. As discussed above, 9173 a client can determine whether a particular file system is being 9174 supported by the server in this mode by issuing a LOOKUP specifying a 9175 name which is not valid UTF-8 and seeing if NFS4ERR_INVAL is 9176 returned. 9178 12.7.1.7. Bidirectional String Checking for Component Names 9180 The NFSv4 protocol does not require processing of component names to 9181 check for and reject bidirectional strings. Such processing may be a 9182 part of the file system implementation but if so, its particular form 9183 will be defined by the file system implementation. When strings are 9184 rejected on this basis, the error NFS4ERR_BADNAME would be returned. 9186 Clients need to be prepared for the fact that the server may reject a 9187 file name component if it consists of a bidirectional string, 9188 returning NFS4ERR_BADNAME. 9190 Clients may encounter names with bidirectional strings returned in 9191 responses from the server. If clients treat such strings as not 9192 valid file name components, it is up to the client whether it simply 9193 ignores these files or modifies the name component to meet its own 9194 rules for acceptable name component strings. 9196 12.7.2. 
Processing of Link Text 9198 Symbolic link text is defined as utf8val_should and therefore the 9199 server SHOULD validate link text on a CREATE and return NFS4ERR_INVAL 9200 if it is not valid UTF-8. Note that file systems which treat 9201 names as strings of bytes are an exception for which such validation 9202 need not be done. One other situation in which an NFSv4 server might choose 9203 (or be configured) not to make such a check is when links within one file 9204 system reference names in another which is configured to treat names 9205 as strings of bytes. 9207 On the other hand, UTF-8 validation of symbolic link text need not be 9208 done on the data resulting from a READLINK. Such data might have 9209 been stored by an NFS Version 4 server configured to allow non-UTF-8 9210 link text or it might have resulted from symbolic link text stored 9211 via local file system access or access via another remote file access 9212 protocol. 9214 Note that because of the role of the symbolic link, as data stored 9215 and read by the user, other sorts of validations or modifications 9216 should not be done. Note that when component names within the symbolic 9217 link text are used, such checks and modifications will be done at 9218 that time. In particular, 9220 o Limitation of the character repertoire MUST NOT be done. This 9221 includes limitations to reflect a particular version of Unicode, 9222 or the inability of any particular file system to store 9223 characters beyond UCS-2. 9225 o Name mapping, whether for case folding or otherwise, MUST NOT be 9226 done. 9228 o Checks for a type of normalization or normalization to a 9229 particular form MUST NOT be done. 9231 o Checks for specific characters excluded by the server or file 9232 system MUST NOT be done. 9234 o Checks for bidirectional strings MUST NOT be done. 9236 12.7.3. Processing of Principal Prefixes 9238 As mentioned above, users and groups are designated as a particular 9239 string at a specified domain.
Servers will recognize a set of valid 9240 principals for one or more domains. With regard to the handling of 9241 these strings, the following rules MUST be followed: 9243 o The string MUST be checked by the server for valid UTF-8 and the 9244 error NFS4ERR_INVAL returned if it is not valid. 9246 o The character repertoire for the principal prefix string should be 9247 limited to the version of Unicode that is current when the server is 9248 implemented. However, the client cannot be assured that all 9249 characters it receives as part of a user or group attribute are 9250 those that are defined in the Unicode version it expects to work 9251 with. 9253 o No character mapping is to be done, such as that of table B.1 in 9254 stringprep, and no case mapping is to be done. The user and group 9255 names are to be treated as case-sensitive. 9257 o Strings must not be rejected based on their normalization. 9258 Servers should do normalization-insensitive matching in converting 9259 a user or group to an internal id. The client cannot assume that 9260 the server preserves normalization, so a user set to one string 9261 value may be returned as a string which differs in normalization, 9262 and the client must be prepared to deal with that by, for 9263 example, normalizing the string to the client's preferred form. 9265 o There are no checks for specific invalid characters but servers 9266 may limit the characters, with the result that any principal 9267 presented by the client which has such a character is treated as 9268 invalid. 9270 o Specific checks for bidirectional strings are not done but servers 9271 may limit the principal prefix strings to those which are 9272 unidirectional or are of a certain direction, with the result that 9273 any principal presented by the client which does not meet that 9274 criterion will be treated as invalid. 9276 13. Error Values 9278 NFS error numbers are assigned to failed operations within a Compound 9279 (COMPOUND or CB_COMPOUND) request.
A Compound request contains a 9280 number of NFS operations that have their results encoded in sequence 9281 in a Compound reply. The results of successful operations will 9282 consist of an NFS4_OK status followed by the encoded results of the 9283 operation. If an NFS operation fails, an error status will be 9284 entered in the reply and the Compound request will be terminated. 9286 13.1. Error Definitions 9288 Protocol Error Definitions 9290 +-----------------------------+--------+-------------------+ 9291 | Error | Number | Description | 9292 +-----------------------------+--------+-------------------+ 9293 | NFS4_OK | 0 | Section 13.1.3.1 | 9294 | NFS4ERR_ACCESS | 13 | Section 13.1.6.1 | 9295 | NFS4ERR_ATTRNOTSUPP | 10032 | Section 13.1.11.1 | 9296 | NFS4ERR_ADMIN_REVOKED | 10047 | Section 13.1.5.1 | 9297 | NFS4ERR_BADCHAR | 10040 | Section 13.1.7.1 | 9298 | NFS4ERR_BADHANDLE | 10001 | Section 13.1.2.1 | 9299 | NFS4ERR_BADNAME | 10041 | Section 13.1.7.2 | 9300 | NFS4ERR_BADOWNER | 10039 | Section 13.1.11.2 | 9301 | NFS4ERR_BADTYPE | 10007 | Section 13.1.4.1 | 9302 | NFS4ERR_BADXDR | 10036 | Section 13.1.1.1 | 9303 | NFS4ERR_BAD_COOKIE | 10003 | Section 13.1.1.2 | 9304 | NFS4ERR_BAD_RANGE | 10042 | Section 13.1.8.1 | 9305 | NFS4ERR_BAD_SEQID | 10026 | Section 13.1.8.2 | 9306 | NFS4ERR_BAD_STATEID | 10025 | Section 13.1.5.2 | 9307 | NFS4ERR_CLID_INUSE | 10017 | Section 13.1.10.1 | 9308 | NFS4ERR_DEADLOCK | 10045 | Section 13.1.8.3 | 9309 | NFS4ERR_DELAY | 10008 | Section 13.1.1.3 | 9310 | NFS4ERR_DENIED | 10010 | Section 13.1.8.4 | 9311 | NFS4ERR_DQUOT | 69 | Section 13.1.4.2 | 9312 | NFS4ERR_EXIST | 17 | Section 13.1.4.3 | 9313 | NFS4ERR_EXPIRED | 10011 | Section 13.1.5.3 | 9314 | NFS4ERR_FBIG | 27 | Section 13.1.4.4 | 9315 | NFS4ERR_FHEXPIRED | 10014 | Section 13.1.2.2 | 9316 | NFS4ERR_FILE_OPEN | 10046 | Section 13.1.4.5 | 9317 | NFS4ERR_GRACE | 10013 | Section 13.1.9.1 | 9318 | NFS4ERR_INVAL | 22 | Section 13.1.1.4 | 9319 | NFS4ERR_IO | 5 | Section 13.1.4.6 
| 9320 | NFS4ERR_ISDIR | 21 | Section 13.1.2.3 | 9321 | NFS4ERR_LEASE_MOVED | 10031 | Section 13.1.5.4 | 9322 | NFS4ERR_LOCKED | 10012 | Section 13.1.8.5 | 9323 | NFS4ERR_LOCKS_HELD | 10037 | Section 13.1.8.6 | 9324 | NFS4ERR_LOCK_NOTSUPP | 10043 | Section 13.1.8.7 | 9325 | NFS4ERR_LOCK_RANGE | 10028 | Section 13.1.8.8 | 9326 | NFS4ERR_MINOR_VERS_MISMATCH | 10021 | Section 13.1.3.2 | 9327 | NFS4ERR_MLINK | 31 | Section 13.1.4.7 | 9328 | NFS4ERR_MOVED | 10019 | Section 13.1.2.4 | 9329 | NFS4ERR_NAMETOOLONG | 63 | Section 13.1.7.3 | 9330 | NFS4ERR_NOENT | 2 | Section 13.1.4.8 | 9331 | NFS4ERR_NOFILEHANDLE | 10020 | Section 13.1.2.5 | 9332 | NFS4ERR_NOSPC | 28 | Section 13.1.4.9 | 9333 | NFS4ERR_NOTDIR | 20 | Section 13.1.2.6 | 9334 | NFS4ERR_NOTEMPTY | 66 | Section 13.1.4.10 | 9335 | NFS4ERR_NOTSUPP | 10004 | Section 13.1.1.5 | 9336 | NFS4ERR_NOT_SAME | 10027 | Section 13.1.11.3 | 9337 | NFS4ERR_NO_GRACE | 10033 | Section 13.1.9.2 | 9338 | NFS4ERR_NXIO | 6 | Section 13.1.4.11 | 9339 | NFS4ERR_OLD_STATEID | 10024 | Section 13.1.5.5 | 9340 | NFS4ERR_OPENMODE | 10038 | Section 13.1.8.9 | 9341 | NFS4ERR_OP_ILLEGAL | 10044 | Section 13.1.3.3 | 9342 | NFS4ERR_PERM | 1 | Section 13.1.6.2 | 9343 | NFS4ERR_RECLAIM_BAD | 10034 | Section 13.1.9.3 | 9344 | NFS4ERR_RECLAIM_CONFLICT | 10035 | Section 13.1.9.4 | 9345 | NFS4ERR_RESOURCE | 10018 | Section 13.1.3.4 | 9346 | NFS4ERR_RESTOREFH | 10030 | Section 13.1.4.12 | 9347 | NFS4ERR_ROFS | 30 | Section 13.1.4.13 | 9348 | NFS4ERR_SAME | 10009 | Section 13.1.11.4 | 9349 | NFS4ERR_SERVERFAULT | 10006 | Section 13.1.1.6 | 9350 | NFS4ERR_STALE | 70 | Section 13.1.2.7 | 9351 | NFS4ERR_STALE_CLIENTID | 10022 | Section 13.1.10.2 | 9352 | NFS4ERR_STALE_STATEID | 10023 | Section 13.1.5.6 | 9353 | NFS4ERR_SYMLINK | 10029 | Section 13.1.2.8 | 9354 | NFS4ERR_TOOSMALL | 10005 | Section 13.1.1.7 | 9355 | NFS4ERR_WRONGSEC | 10016 | Section 13.1.6.3 | 9356 | NFS4ERR_XDEV | 18 | Section 13.1.4.14 | 9357 
+-----------------------------+--------+-------------------+ 9359 Table 8 9361 13.1.1. General Errors 9363 This section deals with errors that are applicable to a broad set of 9364 different purposes. 9366 13.1.1.1. NFS4ERR_BADXDR (Error Code 10036) 9368 The arguments for this operation do not match those specified in the 9369 XDR definition. This includes situations in which the request ends 9370 before all the arguments have been seen. Note that this error 9371 applies when fixed enumerations (these include booleans) have a value 9372 within the input stream which is not valid for the enum. A replier 9373 may pre-parse all operations for a Compound procedure before doing 9374 any operation execution and return RPC-level XDR errors in that case. 9376 13.1.1.2. NFS4ERR_BAD_COOKIE (Error Code 10003) 9378 Used for operations that provide a set of information indexed by some 9379 quantity provided by the client or cookie sent by the server for an 9380 earlier invocation. Where the value cannot be used for its intended 9381 purpose, this error results. 9383 13.1.1.3. NFS4ERR_DELAY (Error Code 10008) 9385 For any of a number of reasons, the replier could not process this 9386 operation in what was deemed a reasonable time. The client should 9387 wait and then try the request with a new RPC transaction ID. 9389 Some examples of situations that might lead to this error: 9391 o A server that supports hierarchical storage receives a request to 9392 process a file that had been migrated. 9394 o An operation requires a delegation recall to proceed and waiting 9395 for this delegation recall makes processing this request in a 9396 timely fashion impossible. 9398 13.1.1.4. NFS4ERR_INVAL (Error Code 22) 9400 The arguments for this operation are not valid for some reason, even 9401 though they do match those specified in the XDR definition for the 9402 request. 9404 13.1.1.5.
NFS4ERR_NOTSUPP (Error Code 10004) 9406 Operation not supported, either because the operation is an OPTIONAL 9407 one and is not supported by this server or because the operation MUST 9408 NOT be implemented in the current minor version. 9410 13.1.1.6. NFS4ERR_SERVERFAULT (Error Code 10006) 9412 An error occurred on the server which does not map to any of the 9413 specific legal NFSv4 protocol error values. The client should 9414 translate this into an appropriate error. UNIX clients may choose to 9415 translate this to EIO. 9417 13.1.1.7. NFS4ERR_TOOSMALL (Error Code 10005) 9419 Used where an operation returns a variable amount of data, with a 9420 limit specified by the client. Where the data returned cannot fit 9421 within the limit specified by the client, this error results. 9423 13.1.2. Filehandle Errors 9425 These errors deal with the situation in which the current or saved 9426 filehandle, or the filehandle passed to PUTFH intended to become the 9427 current filehandle, is invalid in some way. This includes situations 9428 in which the filehandle is a valid filehandle in general but is not 9429 of the appropriate object type for the current operation. 9431 Where the error description indicates a problem with the current or 9432 saved filehandle, it is to be understood that filehandles are only 9433 checked for the condition if they are implicit arguments of the 9434 operation in question. 9436 13.1.2.1. NFS4ERR_BADHANDLE (Error Code 10001) 9438 Illegal NFS filehandle for the current server. The current 9439 filehandle failed internal consistency checks. Once accepted as valid 9440 (by PUTFH), no subsequent status change can cause the filehandle to 9441 generate this error. 9443 13.1.2.2. NFS4ERR_FHEXPIRED (Error Code 10014) 9445 A current or saved filehandle which is an argument to the current 9446 operation is volatile and has expired at the server. 9448 13.1.2.3.
NFS4ERR_ISDIR (Error Code 21) 9450 The current or saved filehandle designates a directory when the 9451 current operation does not allow a directory to be accepted as the 9452 target of this operation. 9454 13.1.2.4. NFS4ERR_MOVED (Error Code 10019) 9456 The file system which contains the current filehandle object is not 9457 present at the server. It may have been relocated, migrated to 9458 another server or may have never been present. The client may obtain 9459 the new file system location by obtaining the "fs_locations" 9460 attribute for the current filehandle. For further discussion, refer 9461 to Section 7. 9463 13.1.2.5. NFS4ERR_NOFILEHANDLE (Error Code 10020) 9465 The logical current or saved filehandle value is required by the 9466 current operation and is not set. This may be a result of a 9467 malformed COMPOUND operation (i.e., no PUTFH or PUTROOTFH before an 9468 operation that requires the current filehandle be set). 9470 13.1.2.6. NFS4ERR_NOTDIR (Error Code 20) 9472 The current (or saved) filehandle designates an object which is not a 9473 directory for an operation in which a directory is required. 9475 13.1.2.7. NFS4ERR_STALE (Error Code 70) 9477 The current or saved filehandle value designating an argument to the 9478 current operation is invalid. The file referred to by that filehandle 9479 no longer exists or access to it has been revoked. 9481 13.1.2.8. NFS4ERR_SYMLINK (Error Code 10029) 9483 The current filehandle designates a symbolic link when the current 9484 operation does not allow a symbolic link as the target. 9486 13.1.3. Compound Structure Errors 9488 This section deals with errors that relate to the overall structure of a 9489 Compound request (by which we mean to include both COMPOUND and 9490 CB_COMPOUND), rather than to particular operations. 9492 There are a number of basic constraints on the operations that may 9493 appear in a Compound request. 9495 13.1.3.1.
NFS4_OK (Error Code 0) 9497 Indicates the operation completed successfully, in that all of the 9498 constituent operations completed without error. 9500 13.1.3.2. NFS4ERR_MINOR_VERS_MISMATCH (Error Code 10021) 9502 The minor version specified is not one that the current listener 9503 supports. This value is returned in the overall status for the 9504 Compound but is not associated with a specific operation since the 9505 results must specify a result count of zero. 9507 13.1.3.3. NFS4ERR_OP_ILLEGAL (Error Code 10044) 9509 The operation code is not a valid one for the current Compound 9510 procedure. The opcode in the result stream matched with this error 9511 is the ILLEGAL value, although the value that appears in the request 9512 stream may be different. Where an illegal value appears and the 9513 replier pre-parses all operations for a Compound procedure before 9514 doing any operation execution, an RPC-level XDR error may be returned 9515 in this case. 9517 13.1.3.4. NFS4ERR_RESOURCE (Error Code 10018) 9519 For the processing of the Compound procedure, the server may exhaust 9520 available resources and be unable to continue processing operations within 9521 the Compound procedure. This error will be returned from the server 9522 in those instances of resource exhaustion related to the processing 9523 of the Compound procedure. 9525 13.1.4. File System Errors 9527 These errors describe situations which occurred in the underlying 9528 file system implementation rather than in the protocol or any NFSv4.x 9529 feature. 9531 13.1.4.1. NFS4ERR_BADTYPE (Error Code 10007) 9533 An attempt was made to create an object with an inappropriate type 9534 specified to CREATE. This may be because the type is undefined, 9535 because it is a type not supported by the server, or because it is a 9536 type for which create is not intended, such as a regular file or named 9537 attribute, for which OPEN is used to do the file creation. 9539 13.1.4.2.
NFS4ERR_DQUOT (Error Code 69) 9541 Resource (quota) hard limit exceeded. The user's resource limit on 9542 the server has been exceeded. 9544 13.1.4.3. NFS4ERR_EXIST (Error Code 17) 9546 A file of the specified target name (when creating, renaming or 9547 linking) already exists. 9549 13.1.4.4. NFS4ERR_FBIG (Error Code 27) 9551 File too large. The operation would have caused a file to grow 9552 beyond the server's limit. 9554 13.1.4.5. NFS4ERR_FILE_OPEN (Error Code 10046) 9556 The operation is not allowed because a file involved in the operation 9557 is currently open. Servers may, but are not required to, disallow 9558 linking-to, removing, or renaming open files. 9560 13.1.4.6. NFS4ERR_IO (Error Code 5) 9562 Indicates that an I/O error occurred for which the file system was 9563 unable to provide recovery. 9565 13.1.4.7. NFS4ERR_MLINK (Error Code 31) 9567 The request would have caused the server's limit for the number of 9568 hard links a file may have to be exceeded. 9570 13.1.4.8. NFS4ERR_NOENT (Error Code 2) 9572 Indicates no such file or directory. The file or directory name 9573 specified does not exist. 9575 13.1.4.9. NFS4ERR_NOSPC (Error Code 28) 9577 Indicates no space left on device. The operation would have caused 9578 the server's file system to exceed its limit. 9580 13.1.4.10. NFS4ERR_NOTEMPTY (Error Code 66) 9582 An attempt was made to remove a directory that was not empty. 9584 13.1.4.11. NFS4ERR_NXIO (Error Code 6) 9586 I/O error. No such device or address. 9588 13.1.4.12. NFS4ERR_RESTOREFH (Error Code 10030) 9590 The RESTOREFH operation does not have a saved filehandle (identified 9591 by SAVEFH) to operate upon. 9593 13.1.4.13. NFS4ERR_ROFS (Error Code 30) 9595 Indicates a read-only file system. A modifying operation was 9596 attempted on a read-only file system. 9598 13.1.4.14. NFS4ERR_XDEV (Error Code 18) 9600 Indicates an attempt to do an operation, such as linking, that 9601 inappropriately crosses a boundary.
This may be due to such 9602 boundaries as: 9604 o That between file systems (where the fsids are different). 9606 o That between different named attribute directories or between a 9607 named attribute directory and an ordinary directory. 9609 o That between regions of a file system that the file system 9610 implementation treats as separate (for example, for space 9611 accounting purposes), and where cross-connection between the 9612 regions is not allowed. 9614 13.1.5. State Management Errors 9616 These errors indicate problems with the stateid (or one of the 9617 stateids) passed to a given operation. This includes situations in 9618 which the stateid is invalid as well as situations in which the 9619 stateid is valid but designates revoked locking state. Depending on 9620 the operation, the stateid when valid may designate opens, byte-range 9621 locks, file or directory delegations, layouts, or device maps. 9623 13.1.5.1. NFS4ERR_ADMIN_REVOKED (Error Code 10047) 9625 A stateid designates locking state of any type that has been revoked 9626 due to administrative interaction, possibly while the lease is valid. 9628 13.1.5.2. NFS4ERR_BAD_STATEID (Error Code 10025) 9630 A stateid generated by the current server instance, but which does 9631 not designate any locking state (either current or superseded) for a 9632 current lockowner-file pair, was used. 9634 13.1.5.3. NFS4ERR_EXPIRED (Error Code 10011) 9636 A stateid designates locking state of any type that has been revoked 9637 due to expiration of the client's lease, either immediately upon 9638 lease expiration, or following a later request for a conflicting 9639 lock. 9641 13.1.5.4. NFS4ERR_LEASE_MOVED (Error Code 10031) 9643 A lease being renewed is associated with a file system that has been 9644 migrated to a new server. 9646 13.1.5.5. NFS4ERR_OLD_STATEID (Error Code 10024) 9648 A stateid with a non-zero seqid value does not match the current seqid 9649 for the state designated by the user. 9651 13.1.5.6.
NFS4ERR_STALE_STATEID (Error Code 10023) 9653 A stateid generated by an earlier server instance was used. 9655 13.1.6. Security Errors 9657 These are the various permission-related errors in NFSv4. 9659 13.1.6.1. NFS4ERR_ACCESS (Error Code 13) 9661 Indicates permission denied. The caller does not have the correct 9662 permission to perform the requested operation. Contrast this with 9663 NFS4ERR_PERM (Section 13.1.6.2), which restricts itself to owner or 9664 privileged user permission failures. 9666 13.1.6.2. NFS4ERR_PERM (Error Code 1) 9668 Indicates requester is not the owner. The operation was not allowed 9669 because the caller is neither a privileged user (root) nor the owner 9670 of the target of the operation. 9672 13.1.6.3. NFS4ERR_WRONGSEC (Error Code 10016) 9674 Indicates that the security mechanism being used by the client for 9675 the operation does not match the server's security policy. The 9676 client should change the security mechanism being used and re-send 9677 the operation. SECINFO can be used to determine the appropriate 9678 mechanism. 9680 13.1.7. Name Errors 9682 Names in NFSv4 are UTF-8 strings. When the strings are of 9683 length zero, the error NFS4ERR_INVAL results. When they are not 9684 valid UTF-8, the error NFS4ERR_INVAL also results, but servers may 9685 accommodate file systems with different character formats and not 9686 return this error. Besides this, there are a number of other errors 9687 to indicate specific problems with names. 9689 13.1.7.1. NFS4ERR_BADCHAR (Error Code 10040) 9691 A UTF-8 string contains a character which is not supported by the 9692 server in the context in which it is being used. 9694 13.1.7.2. NFS4ERR_BADNAME (Error Code 10041) 9696 A name string in a request consisted of valid UTF-8 characters 9697 supported by the server but the name is not supported by the server 9698 as a valid name for the current operation. An example might be creating 9699 a file or directory named ".."
on a server whose file system uses 9700 that name for links to parent directories. 9702 This error should not be returned due to a normalization issue in a 9703 string. When a file system keeps names in a particular normalization 9704 form, it is the server's responsibility to do the appropriate 9705 normalization, rather than rejecting the name. 9707 13.1.7.3. NFS4ERR_NAMETOOLONG (Error Code 63) 9709 Returned when the filename in an operation exceeds the server's 9710 implementation limit. 9712 13.1.8. Locking Errors 9714 This section deals with errors related to locking, covering both share 9715 reservations and byte-range locking. It does not deal with errors 9716 specific to the process of reclaiming locks. Those are dealt with in 9717 the next section. 9719 13.1.8.1. NFS4ERR_BAD_RANGE (Error Code 10042) 9721 The range for a LOCK, LOCKT, or LOCKU operation is not appropriate to 9722 the allowable range of offsets for the server. E.g., this error 9723 results when a server which only supports 32-bit ranges receives a 9724 range that cannot be handled by that server. (See Section 15.12.4.) 9726 13.1.8.2. NFS4ERR_BAD_SEQID (Error Code 10026) 9728 The sequence number (seqid) in a locking request is neither the next 9729 expected number nor the last number processed. 9731 13.1.8.3. NFS4ERR_DEADLOCK (Error Code 10045) 9733 The server has been able to determine a file locking deadlock 9734 condition for a blocking lock request. 9736 13.1.8.4. NFS4ERR_DENIED (Error Code 10010) 9738 An attempt to lock a file is denied. Since this may be a temporary 9739 condition, the client is encouraged to re-send the lock request until 9740 the lock is accepted. See Section 9.4 for a discussion of the re- 9741 send. 9743 13.1.8.5. NFS4ERR_LOCKED (Error Code 10012) 9745 A read or write operation was attempted on a file where there was a 9746 conflict between the I/O and an existing lock: 9748 o There is a share reservation inconsistent with the I/O being done.
9750 o The range to be read or written intersects an existing mandatory 9751 byte-range lock. 9753 13.1.8.6. NFS4ERR_LOCKS_HELD (Error Code 10037) 9755 An operation was prevented by the unexpected presence of locks. 9757 13.1.8.7. NFS4ERR_LOCK_NOTSUPP (Error Code 10043) 9759 A locking request was attempted which would require the upgrade or 9760 downgrade of a lock range already held by the owner when the server 9761 does not support atomic upgrade or downgrade of locks. 9763 13.1.8.8. NFS4ERR_LOCK_RANGE (Error Code 10028) 9765 A lock request is operating on a range that overlaps in part a 9766 currently held lock for the current lock owner and does not precisely 9767 match a single such lock where the server does not support this type 9768 of request, and thus does not implement POSIX locking semantics [35]. 9769 See Section 15.12.5, Section 15.13.5, and Section 15.14.5 for a 9770 discussion of how this applies to LOCK, LOCKT, and LOCKU, 9771 respectively. 9773 13.1.8.9. NFS4ERR_OPENMODE (Error Code 10038) 9775 The client attempted a READ, WRITE, LOCK or other operation not 9776 sanctioned by the stateid passed (e.g., writing to a file opened only 9777 for read). 9779 13.1.9. Reclaim Errors 9781 These errors relate to the process of reclaiming locks after a server 9782 restart. 9784 13.1.9.1. NFS4ERR_GRACE (Error Code 10013) 9786 The server is in its recovery or grace period, which should at least 9787 match the lease period of the server. A locking request other than a 9788 reclaim could not be granted during that period. 9790 13.1.9.2. NFS4ERR_NO_GRACE (Error Code 10033) 9792 A reclaim of client state was attempted in circumstances in which the 9793 server cannot guarantee that conflicting state has not been provided 9794 to another client. 9797 13.1.9.3.
NFS4ERR_RECLAIM_BAD (Error Code 10034) 9799 A reclaim attempted by the client does not match the server's state 9800 consistency checks and has been rejected therefore as invalid. 9802 13.1.9.4. NFS4ERR_RECLAIM_CONFLICT (Error Code 10035) 9804 The reclaim attempted by the client has encountered a conflict and 9805 cannot be satisfied. Potentially indicates a misbehaving client, 9806 although not necessarily the one receiving the error. The 9807 misbehavior might be on the part of the client that established the 9808 lock with which this client conflicted. 9810 13.1.10. Client Management Errors 9812 This sections deals with errors associated with requests used to 9813 create and manage client IDs. 9815 13.1.10.1. NFS4ERR_CLID_INUSE (Error Code 10017) 9817 The SETCLIENTID operation has found that a client id is already in 9818 use by another client. 9820 13.1.10.2. NFS4ERR_STALE_CLIENTID (Error Code 10022) 9822 A client ID not recognized by the server was used in a locking or 9823 SETCLIENTID_CONFIRM request. 9825 13.1.11. Attribute Handling Errors 9827 This section deals with errors specific to attribute handling within 9828 NFSv4. 9830 13.1.11.1. NFS4ERR_ATTRNOTSUPP (Error Code 10032) 9832 An attribute specified is not supported by the server. This error 9833 MUST NOT be returned by the GETATTR operation. 9835 13.1.11.2. NFS4ERR_BADOWNER (Error Code 10039) 9837 Returned when an owner or owner_group attribute value or the who 9838 field of an ace within an ACL attribute value cannot be translated to 9839 a local representation. 9841 13.1.11.3. NFS4ERR_NOT_SAME (Error Code 10027) 9843 This error is returned by the VERIFY operation to signify that the 9844 attributes compared were not the same as those provided in the 9845 client's request. 9847 13.1.11.4. NFS4ERR_SAME (Error Code 10009) 9849 This error is returned by the NVERIFY operation to signify that the 9850 attributes compared were the same as those provided in the client's 9851 request. 9853 13.2. 
Operations and their valid errors

   This section contains a table which gives the valid error returns for
   each protocol operation. The error code NFS4_OK (indicating no
   error) is not listed but should be understood to be returnable by all
   operations except ILLEGAL.

           Valid error returns for each protocol operation

   +---------------------+---------------------------------------------+
   | Operation           | Errors                                      |
   +---------------------+---------------------------------------------+
   | ACCESS              | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE,          |
   |                     | NFS4ERR_BADXDR, NFS4ERR_DELAY,              |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL,           |
   |                     | NFS4ERR_IO, NFS4ERR_MOVED,                  |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_RESOURCE,     |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE          |
   | CLOSE               | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADHANDLE,   |
   |                     | NFS4ERR_BAD_SEQID, NFS4ERR_BAD_STATEID,     |
   |                     | NFS4ERR_BADXDR, NFS4ERR_DELAY,              |
   |                     | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED,         |
   |                     | NFS4ERR_INVAL, NFS4ERR_ISDIR,               |
   |                     | NFS4ERR_LEASE_MOVED, NFS4ERR_LOCKS_HELD,    |
   |                     | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_OLD_STATEID, NFS4ERR_RESOURCE,      |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,         |
   |                     | NFS4ERR_STALE_STATEID                       |
   | COMMIT              | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE,          |
   |                     | NFS4ERR_BADXDR, NFS4ERR_FHEXPIRED,          |
   |                     | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR,   |
   |                     | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_ROFS,             |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,         |
   |                     | NFS4ERR_SYMLINK                             |
   | CREATE              | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP,        |
   |                     | NFS4ERR_BADCHAR, NFS4ERR_BADHANDLE,         |
   |                     | NFS4ERR_BADNAME, NFS4ERR_BADOWNER,          |
   |                     | NFS4ERR_BADTYPE, NFS4ERR_BADXDR,            |
   |                     | NFS4ERR_DELAY, NFS4ERR_DQUOT,               |
   |                     | NFS4ERR_EXIST, NFS4ERR_FHEXPIRED,           |
   |                     | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED,   |
   |                     | NFS4ERR_NAMETOOLONG, NFS4ERR_NOFILEHANDLE,  |
   |                     | NFS4ERR_NOSPC, NFS4ERR_NOTDIR,              |
   |                     | NFS4ERR_PERM, NFS4ERR_RESOURCE,             |
   |                     | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT,          |
   |                     | NFS4ERR_STALE                               |
   | DELEGPURGE          | NFS4ERR_BADXDR, NFS4ERR_NOTSUPP,            |
   |                     | NFS4ERR_LEASE_MOVED, NFS4ERR_RESOURCE,      |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE_CLIENTID |
   | DELEGRETURN         | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BAD_STATEID, |
   |                     | NFS4ERR_BADXDR, NFS4ERR_EXPIRED,            |
   |                     | NFS4ERR_INVAL, NFS4ERR_LEASE_MOVED,         |
   |                     | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID,       |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT,      |
   |                     | NFS4ERR_STALE, NFS4ERR_STALE_STATEID        |
   | GETATTR             | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE,          |
   |                     | NFS4ERR_BADXDR, NFS4ERR_DELAY,              |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE,           |
   |                     | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED,   |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_RESOURCE,     |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE          |
   | GETFH               | NFS4ERR_BADHANDLE, NFS4ERR_FHEXPIRED,       |
   |                     | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT,      |
   |                     | NFS4ERR_STALE                               |
   | ILLEGAL             | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL          |
   | LINK                | NFS4ERR_ACCESS, NFS4ERR_BADCHAR,            |
   |                     | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME,         |
   |                     | NFS4ERR_BADXDR, NFS4ERR_DELAY,              |
   |                     | NFS4ERR_DQUOT, NFS4ERR_EXIST,               |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN,       |
   |                     | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR,   |
   |                     | NFS4ERR_MLINK, NFS4ERR_MOVED,               |
   |                     | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT,         |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC,        |
   |                     | NFS4ERR_NOTDIR, NFS4ERR_NOTSUPP,            |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_ROFS,             |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,         |
   |                     | NFS4ERR_WRONGSEC, NFS4ERR_XDEV              |
   | LOCK                | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,      |
   |                     | NFS4ERR_BADHANDLE, NFS4ERR_BAD_RANGE,       |
   |                     | NFS4ERR_BAD_SEQID, NFS4ERR_BAD_STATEID,     |
   |                     | NFS4ERR_BADXDR, NFS4ERR_DEADLOCK,           |
   |                     | NFS4ERR_DELAY, NFS4ERR_DENIED,              |
   |                     | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED,         |
   |                     | NFS4ERR_GRACE, NFS4ERR_INVAL,               |
   |                     | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED,         |
   |                     | NFS4ERR_LOCK_NOTSUPP, NFS4ERR_LOCK_RANGE,   |
   |                     | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_NO_GRACE, NFS4ERR_OLD_STATEID,      |
   |                     | NFS4ERR_OPENMODE, NFS4ERR_RECLAIM_BAD,      |
   |                     | NFS4ERR_RECLAIM_CONFLICT, NFS4ERR_RESOURCE, |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,         |
   |                     | NFS4ERR_STALE_CLIENTID,                     |
   |                     | NFS4ERR_STALE_STATEID                       |
   | LOCKT               | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE,          |
   |                     | NFS4ERR_BAD_RANGE, NFS4ERR_BADXDR,          |
   |                     | NFS4ERR_DELAY, NFS4ERR_DENIED,              |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE,           |
   |                     | NFS4ERR_INVAL, NFS4ERR_ISDIR,               |
   |                     | NFS4ERR_LEASE_MOVED, NFS4ERR_LOCK_RANGE,    |
   |                     | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT,      |
   |                     | NFS4ERR_STALE, NFS4ERR_STALE_CLIENTID       |
   | LOCKU               | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,      |
   |                     | NFS4ERR_BADHANDLE, NFS4ERR_BAD_RANGE,       |
   |                     | NFS4ERR_BAD_SEQID, NFS4ERR_BAD_STATEID,     |
   |                     | NFS4ERR_BADXDR, NFS4ERR_EXPIRED,            |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE,           |
   |                     | NFS4ERR_INVAL, NFS4ERR_ISDIR,               |
   |                     | NFS4ERR_LEASE_MOVED, NFS4ERR_LOCK_RANGE,    |
   |                     | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_OLD_STATEID, NFS4ERR_RESOURCE,      |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,         |
   |                     | NFS4ERR_STALE_STATEID                       |
   | LOOKUP              | NFS4ERR_ACCESS, NFS4ERR_BADCHAR,            |
   |                     | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME,         |
   |                     | NFS4ERR_BADXDR, NFS4ERR_FHEXPIRED,          |
   |                     | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED,   |
   |                     | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT,         |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR,       |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT,      |
   |                     | NFS4ERR_STALE, NFS4ERR_SYMLINK,             |
   |                     | NFS4ERR_WRONGSEC                            |
   | LOOKUPP             | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE,          |
   |                     | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED,           |
   |                     | NFS4ERR_IO, NFS4ERR_MOVED, NFS4ERR_NOENT,   |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR,       |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT,      |
   |                     | NFS4ERR_STALE, NFS4ERR_SYMLINK,             |
   |                     | NFS4ERR_WRONGSEC                            |
   | NVERIFY             | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP,        |
   |                     | NFS4ERR_BADCHAR, NFS4ERR_BADHANDLE,         |
   |                     | NFS4ERR_BADXDR, NFS4ERR_DELAY,              |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE,           |
   |                     | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED,   |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_SAME,         |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE          |
   | OPEN                | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,      |
   |                     | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADCHAR,       |
   |                     | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME,         |
   |                     | NFS4ERR_BADOWNER, NFS4ERR_BADXDR,           |
   |                     | NFS4ERR_BAD_SEQID, NFS4ERR_BAD_STATEID,     |
   |                     | NFS4ERR_DELAY, NFS4ERR_DQUOT,               |
   |                     | NFS4ERR_EXIST, NFS4ERR_EXPIRED,             |
   |                     | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED,            |
   |                     | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO,   |
   |                     | NFS4ERR_ISDIR, NFS4ERR_MOVED,               |
   |                     | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT,         |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC,        |
   |                     | NFS4ERR_NOTDIR, NFS4ERR_NOTSUP,             |
   |                     | NFS4ERR_NO_GRACE, NFS4ERR_OLD_STATEID,      |
   |                     | NFS4ERR_PERM, NFS4ERR_RECLAIM_BAD,          |
   |                     | NFS4ERR_RECLAIM_CONFLICT, NFS4ERR_RESOURCE, |
   |                     | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT,          |
   |                     | NFS4ERR_SHARE_DENIED, NFS4ERR_STALE,        |
   |                     | NFS4ERR_STALE_CLIENTID, NFS4ERR_SYMLINK,    |
   |                     | NFS4ERR_WRONGSEC                            |
   | OPENATTR            | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE,          |
   |                     | NFS4ERR_BADXDR, NFS4ERR_DELAY,              |
   |                     | NFS4ERR_DQUOT, NFS4ERR_FHEXPIRED,           |
   |                     | NFS4ERR_IO, NFS4ERR_MOVED, NFS4ERR_NOENT,   |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC,        |
   |                     | NFS4ERR_NOTSUPP, NFS4ERR_RESOURCE,          |
   |                     | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT,          |
   |                     | NFS4ERR_STALE                               |
   | OPEN_CONFIRM        | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADHANDLE,   |
   |                     | NFS4ERR_BAD_SEQID, NFS4ERR_BAD_STATEID,     |
   |                     | NFS4ERR_BADXDR, NFS4ERR_EXPIRED,            |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL,           |
   |                     | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED,         |
   |                     | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_OLD_STATEID, NFS4ERR_RESOURCE,      |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,         |
   |                     | NFS4ERR_STALE_STATEID                       |
   | OPEN_DOWNGRADE      | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADHANDLE,   |
   |                     | NFS4ERR_BADXDR, NFS4ERR_BAD_SEQID,          |
   |                     | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY,         |
   |                     | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED,         |
   |                     | NFS4ERR_INVAL, NFS4ERR_LEASE_MOVED,         |
   |                     | NFS4ERR_LOCKS_HELD, NFS4ERR_MOVED,          |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_OLD_STATEID,  |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_ROFS,             |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,         |
   |                     | NFS4ERR_STALE_STATEID                       |
   | PUTFH               | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR,          |
   |                     | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED,           |
   |                     | NFS4ERR_MOVED, NFS4ERR_SERVERFAULT,         |
   |                     | NFS4ERR_STALE, NFS4ERR_WRONGSEC             |
   | PUTPUBFH            | NFS4ERR_DELAY, NFS4ERR_SERVERFAULT,         |
   |                     | NFS4ERR_WRONGSEC                            |
   | PUTROOTFH           | NFS4ERR_DELAY, NFS4ERR_SERVERFAULT,         |
   |                     | NFS4ERR_WRONGSEC                            |
   | READ                | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,      |
   |                     | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR,          |
   |                     | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY,         |
   |                     | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED,         |
   |                     | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO,   |
   |                     | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED,         |
   |                     | NFS4ERR_LOCKED, NFS4ERR_MOVED,              |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_OLD_STATEID,  |
   |                     | NFS4ERR_OPENMODE, NFS4ERR_RESOURCE,         |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,         |
   |                     | NFS4ERR_STALE_STATEID, NFS4ERR_SYMLINK      |
   | READDIR             | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE,          |
   |                     | NFS4ERR_BADXDR, NFS4ERR_BAD_COOKIE,         |
   |                     | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED,           |
   |                     | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED,   |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR,       |
   |                     | NFS4ERR_NOT_SAME, NFS4ERR_RESOURCE,         |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,         |
   |                     | NFS4ERR_TOOSMALL                            |
   | READLINK            | NFS4ERR_ACCESS, NFS4ERR_BADHANDLE,          |
   |                     | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED,           |
   |                     | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR,   |
   |                     | NFS4ERR_MOVED, NFS4ERR_NOTSUP,              |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_NOFILEHANDLE,     |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE          |
   | RELEASE_LOCKOWNER   | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR,      |
   |                     | NFS4ERR_EXPIRED, NFS4ERR_LEASE_MOVED,       |
   |                     | NFS4ERR_LOCKS_HELD, NFS4ERR_RESOURCE,       |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE_CLIENTID |
   | REMOVE              | NFS4ERR_ACCESS, NFS4ERR_BADCHAR,            |
   |                     | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME,         |
   |                     | NFS4ERR_BADXDR, NFS4ERR_DELAY,              |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN,       |
   |                     | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO,   |
   |                     | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG,         |
   |                     | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_NOTDIR, NFS4ERR_NOTEMPTY,           |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_ROFS,             |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE          |
   | RENAME              | NFS4ERR_ACCESS, NFS4ERR_BADCHAR,            |
   |                     | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME,         |
   |                     | NFS4ERR_BADXDR, NFS4ERR_DELAY,              |
   |                     | NFS4ERR_DQUOT, NFS4ERR_EXIST,               |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN,       |
   |                     | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO,   |
   |                     | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG,         |
   |                     | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_NOSPC, NFS4ERR_NOTDIR,              |
   |                     | NFS4ERR_NOTEMPTY, NFS4ERR_RESOURCE,         |
   |                     | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT,          |
   |                     | NFS4ERR_STALE, NFS4ERR_WRONGSEC,            |
   |                     | NFS4ERR_XDEV                                |
   | RENEW               | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,      |
   |                     | NFS4ERR_BADXDR, NFS4ERR_CB_PATH_DOWN,       |
   |                     | NFS4ERR_EXPIRED, NFS4ERR_LEASE_MOVED,       |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT,      |
   |                     | NFS4ERR_STALE_CLIENTID                      |
   | RESTOREFH           | NFS4ERR_BADHANDLE, NFS4ERR_FHEXPIRED,       |
   |                     | NFS4ERR_MOVED, NFS4ERR_RESOURCE,            |
   |                     | NFS4ERR_RESTOREFH, NFS4ERR_SERVERFAULT,     |
   |                     | NFS4ERR_STALE, NFS4ERR_WRONGSEC             |
   | SAVEFH              | NFS4ERR_BADHANDLE, NFS4ERR_FHEXPIRED,       |
   |                     | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT,      |
   |                     | NFS4ERR_STALE                               |
   | SECINFO             | NFS4ERR_ACCESS, NFS4ERR_BADCHAR,            |
   |                     | NFS4ERR_BADHANDLE, NFS4ERR_BADNAME,         |
   |                     | NFS4ERR_BADXDR, NFS4ERR_DELAY,              |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL,           |
   |                     | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG,         |
   |                     | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_NOTDIR, NFS4ERR_RESOURCE,           |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE          |
   | SETATTR             | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,      |
   |                     | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADCHAR,       |
   |                     | NFS4ERR_BADHANDLE, NFS4ERR_BADOWNER,        |
   |                     | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID,        |
   |                     | NFS4ERR_DELAY, NFS4ERR_DQUOT,               |
   |                     | NFS4ERR_EXPIRED, NFS4ERR_FBIG,              |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE,           |
   |                     | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR,   |
   |                     | NFS4ERR_LEASE_MOVED, NFS4ERR_LOCKED,        |
   |                     | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE,        |
   |                     | NFS4ERR_NOSPC, NFS4ERR_OLD_STATEID,         |
   |                     | NFS4ERR_OPENMODE, NFS4ERR_PERM,             |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_ROFS,             |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE,         |
   |                     | NFS4ERR_STALE_STATEID                       |
   | SETCLIENTID         | NFS4ERR_BADXDR, NFS4ERR_CLID_INUSE,         |
   |                     | NFS4ERR_DELAY, NFS4ERR_INVAL,               |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT       |
   | SETCLIENTID_CONFIRM | NFS4ERR_BADXDR, NFS4ERR_CLID_INUSE,         |
   |                     | NFS4ERR_DELAY, NFS4ERR_RESOURCE,            |
   |                     | NFS4ERR_SERVERFAULT, NFS4ERR_STALE_CLIENTID |
   | VERIFY              | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP,        |
   |                     | NFS4ERR_BADCHAR, NFS4ERR_BADHANDLE,         |
   |                     | NFS4ERR_BADXDR, NFS4ERR_DELAY,              |
   |                     | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE,           |
   |                     | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED,   |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOT_SAME,     |
   |                     | NFS4ERR_RESOURCE, NFS4ERR_SERVERFAULT,      |
   |                     | NFS4ERR_STALE                               |
   | WRITE               | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED,      |
   |                     | NFS4ERR_BADXDR, NFS4ERR_BADHANDLE,          |
   |                     | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY,         |
   |                     | NFS4ERR_DQUOT, NFS4ERR_EXPIRED,             |
   |                     | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED,            |
   |                     | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO,   |
   |                     | NFS4ERR_ISDIR, NFS4ERR_LEASE_MOVED,         |
   |                     | NFS4ERR_LOCKED, NFS4ERR_MOVED,              |
   |                     | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC,        |
   |                     | NFS4ERR_NXIO, NFS4ERR_OLD_STATEID,          |
   |                     | NFS4ERR_OPENMODE, NFS4ERR_RESOURCE,         |
   |                     | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT,          |
   |                     | NFS4ERR_STALE, NFS4ERR_STALE_STATEID,       |
   |                     | NFS4ERR_SYMLINK                             |
   +---------------------+---------------------------------------------+

                                Table 9

13.3. Callback operations and their valid errors

   This section contains a table which gives the valid error returns for
   each callback operation. The error code NFS4_OK (indicating no
   error) is not listed but should be understood to be returnable by all
   callback operations with the exception of CB_ILLEGAL.

        Valid error returns for each protocol callback operation

   +-------------+-----------------------------------------------------+
   | Callback    | Errors                                              |
   | Operation   |                                                     |
   +-------------+-----------------------------------------------------+
   | CB_GETATTR  | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, NFS4ERR_DELAY,   |
   |             | NFS4ERR_INVAL, NFS4ERR_SERVERFAULT                  |
   | CB_ILLEGAL  | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL                  |
   | CB_RECALL   | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR,                  |
   |             | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY,                 |
   |             | NFS4ERR_SERVERFAULT                                 |
   +-------------+-----------------------------------------------------+

                                Table 10

13.4.
Errors and the operations that use them

   +--------------------------+----------------------------------------+
   | Error                    | Operations                             |
   +--------------------------+----------------------------------------+
   | NFS4ERR_ACCESS           | ACCESS, COMMIT, CREATE, GETATTR, LINK, |
   |                          | LOCK, LOCKT, LOCKU, LOOKUP, LOOKUPP,   |
   |                          | NVERIFY, OPEN, OPENATTR, READ,         |
   |                          | READDIR, READLINK, REMOVE, RENAME,     |
   |                          | RENEW, SECINFO, SETATTR, VERIFY, WRITE |
   | NFS4ERR_ADMIN_REVOKED    | CLOSE, DELEGRETURN, LOCK, LOCKU, OPEN, |
   |                          | OPEN_CONFIRM, OPEN_DOWNGRADE, READ,    |
   |                          | RELEASE_LOCKOWNER, RENEW, SETATTR,     |
   |                          | WRITE                                  |
   | NFS4ERR_ATTRNOTSUPP      | CREATE, NVERIFY, OPEN, SETATTR, VERIFY |
   | NFS4ERR_BADCHAR          | CREATE, LINK, LOOKUP, NVERIFY, OPEN,   |
   |                          | REMOVE, RENAME, SECINFO, SETATTR,      |
   |                          | VERIFY                                 |
   | NFS4ERR_BADHANDLE        | ACCESS, CB_GETATTR, CB_RECALL, CLOSE,  |
   |                          | COMMIT, CREATE, GETATTR, GETFH, LINK,  |
   |                          | LOCK, LOCKT, LOCKU, LOOKUP, LOOKUPP,   |
   |                          | NVERIFY, OPEN, OPENATTR, OPEN_CONFIRM, |
   |                          | OPEN_DOWNGRADE, PUTFH, READ, READDIR,  |
   |                          | READLINK, REMOVE, RENAME, RESTOREFH,   |
   |                          | SAVEFH, SECINFO, SETATTR, VERIFY,      |
   |                          | WRITE                                  |
   | NFS4ERR_BADNAME          | CREATE, LINK, LOOKUP, OPEN, REMOVE,    |
   |                          | RENAME, SECINFO                        |
   | NFS4ERR_BADOWNER         | CREATE, OPEN, SETATTR                  |
   | NFS4ERR_BADTYPE          | CREATE                                 |
   | NFS4ERR_BADXDR           | ACCESS, CB_GETATTR, CB_ILLEGAL,        |
   |                          | CB_RECALL, CLOSE, COMMIT, CREATE,      |
   |                          | DELEGPURGE, DELEGRETURN, GETATTR,      |
   |                          | ILLEGAL, LINK, LOCK, LOCKT, LOCKU,     |
   |                          | LOOKUP, NVERIFY, OPEN, OPENATTR,       |
   |                          | OPEN_CONFIRM, OPEN_DOWNGRADE, PUTFH,   |
   |                          | READ, READDIR, RELEASE_LOCKOWNER,      |
   |                          | REMOVE, RENAME, RENEW, SECINFO,        |
   |                          | SETATTR, SETCLIENTID,                  |
   |                          | SETCLIENTID_CONFIRM, VERIFY, WRITE     |
   | NFS4ERR_BAD_COOKIE       | READDIR                                |
   | NFS4ERR_BAD_RANGE        | LOCK, LOCKT, LOCKU                     |
   | NFS4ERR_BAD_SEQID        | CLOSE, LOCK, LOCKU, OPEN,              |
   |                          | OPEN_CONFIRM, OPEN_DOWNGRADE           |
   | NFS4ERR_BAD_STATEID      | CB_RECALL, CLOSE, DELEGRETURN, LOCK,   |
   |                          | LOCKU, OPEN, OPEN_CONFIRM,             |
   |                          | OPEN_DOWNGRADE, READ, SETATTR, WRITE   |
   | NFS4ERR_CB_PATH_DOWN     | RENEW                                  |
   | NFS4ERR_CLID_INUSE       | SETCLIENTID, SETCLIENTID_CONFIRM       |
   | NFS4ERR_DEADLOCK         | LOCK                                   |
   | NFS4ERR_DELAY            | ACCESS, CB_GETATTR, CB_RECALL, CLOSE,  |
   |                          | CREATE, GETATTR, LINK, LOCK, LOCKT,    |
   |                          | LOOKUPP, NVERIFY, OPEN, OPENATTR,      |
   |                          | OPEN_DOWNGRADE, PUTFH, PUTPUBFH,       |
   |                          | PUTROOTFH, READ, READDIR, READLINK,    |
   |                          | REMOVE, RENAME, SECINFO, SETATTR,      |
   |                          | SETCLIENTID, SETCLIENTID_CONFIRM,      |
   |                          | VERIFY, WRITE                          |
   | NFS4ERR_DENIED           | LOCK, LOCKT                            |
   | NFS4ERR_DQUOT            | CREATE, LINK, OPEN, OPENATTR, RENAME,  |
   |                          | SETATTR, WRITE                         |
   | NFS4ERR_EXIST            | CREATE, LINK, OPEN, RENAME             |
   | NFS4ERR_EXPIRED          | CLOSE, DELEGRETURN, LOCK, LOCKU, OPEN, |
   |                          | OPEN_CONFIRM, OPEN_DOWNGRADE, READ,    |
   |                          | RELEASE_LOCKOWNER, RENEW, SETATTR,     |
   |                          | WRITE                                  |
   | NFS4ERR_FBIG             | OPEN, SETATTR, WRITE                   |
   | NFS4ERR_FHEXPIRED        | ACCESS, CLOSE, COMMIT, CREATE,         |
   |                          | GETATTR, GETFH, LINK, LOCK, LOCKT,     |
   |                          | LOCKU, LOOKUP, LOOKUPP, NVERIFY, OPEN, |
   |                          | OPENATTR, OPEN_CONFIRM,                |
   |                          | OPEN_DOWNGRADE, PUTFH, READ, READDIR,  |
   |                          | READLINK, REMOVE, RENAME, RESTOREFH,   |
   |                          | SAVEFH, SECINFO, SETATTR, VERIFY,      |
   |                          | WRITE                                  |
   | NFS4ERR_FILE_OPEN        | LINK, REMOVE, RENAME                   |
   | NFS4ERR_GRACE            | GETATTR, LOCK, LOCKT, LOCKU, NVERIFY,  |
   |                          | OPEN, READ, REMOVE, RENAME, SETATTR,   |
   |                          | VERIFY, WRITE                          |
   | NFS4ERR_INVAL            | ACCESS, CB_GETATTR, CLOSE, COMMIT,     |
   |                          | CREATE, DELEGRETURN, GETATTR, LINK,    |
   |                          | LOCK, LOCKT, LOCKU, LOOKUP, NVERIFY,   |
   |                          | OPEN, OPEN_CONFIRM, OPEN_DOWNGRADE,    |
   |                          | READ, READDIR, READLINK, REMOVE,       |
   |                          | RENAME, SECINFO, SETATTR, SETCLIENTID, |
   |                          | VERIFY, WRITE                          |
   | NFS4ERR_IO               | ACCESS, COMMIT, CREATE, GETATTR, LINK, |
   |                          | LOOKUP, LOOKUPP, NVERIFY, OPEN,        |
   |                          | OPENATTR, READ, READDIR, READLINK,     |
   |                          | REMOVE, RENAME, SETATTR, VERIFY, WRITE |
   | NFS4ERR_ISDIR            | CLOSE, COMMIT, LINK, LOCK, LOCKT,      |
   |                          | LOCKU, OPEN, OPEN_CONFIRM, READ,       |
   |                          | READLINK, SETATTR, WRITE               |
   | NFS4ERR_LEASE_MOVED      | CLOSE, DELEGPURGE, DELEGRETURN, LOCK,  |
   |                          | LOCKT, LOCKU, OPEN_CONFIRM,            |
   |                          | OPEN_DOWNGRADE, READ,                  |
   |                          | RELEASE_LOCKOWNER, RENEW, SETATTR,     |
   |                          | WRITE                                  |
   | NFS4ERR_LOCKED           | READ, SETATTR, WRITE                   |
   | NFS4ERR_LOCKS_HELD       | CLOSE, OPEN_DOWNGRADE,                 |
   |                          | RELEASE_LOCKOWNER                      |
   | NFS4ERR_LOCK_NOTSUPP     | LOCK                                   |
   | NFS4ERR_LOCK_RANGE       | LOCK, LOCKT, LOCKU                     |
   | NFS4ERR_MLINK            | LINK                                   |
   | NFS4ERR_MOVED            | ACCESS, CLOSE, COMMIT, CREATE,         |
   |                          | DELEGRETURN, GETATTR, GETFH, LINK,     |
   |                          | LOCK, LOCKT, LOCKU, LOOKUP, LOOKUPP,   |
   |                          | NVERIFY, OPEN, OPENATTR, OPEN_CONFIRM, |
   |                          | OPEN_DOWNGRADE, PUTFH, READ, READDIR,  |
   |                          | READLINK, REMOVE, RENAME, RESTOREFH,   |
   |                          | SAVEFH, SECINFO, SETATTR, VERIFY,      |
   |                          | WRITE                                  |
   | NFS4ERR_NAMETOOLONG      | CREATE, LINK, LOOKUP, OPEN, REMOVE,    |
   |                          | RENAME, SECINFO                        |
   | NFS4ERR_NOENT            | LINK, LOOKUP, LOOKUPP, OPEN, OPENATTR, |
   |                          | REMOVE, RENAME, SECINFO                |
   | NFS4ERR_NOFILEHANDLE     | ACCESS, CLOSE, COMMIT, CREATE,         |
   |                          | DELEGRETURN, GETATTR, GETFH, LINK,     |
   |                          | LOCK, LOCKT, LOCKU, LOOKUP, LOOKUPP,   |
   |                          | NVERIFY, OPEN, OPENATTR, OPEN_CONFIRM, |
   |                          | OPEN_DOWNGRADE, READ, READDIR,         |
   |                          | READLINK, REMOVE, RENAME, SAVEFH,      |
   |                          | SECINFO, SETATTR, VERIFY, WRITE        |
   | NFS4ERR_NOSPC            | CREATE, LINK, OPEN, OPENATTR, RENAME,  |
   |                          | SETATTR, WRITE                         |
   | NFS4ERR_NOTDIR           | CREATE, LINK, LOOKUP, LOOKUPP, OPEN,   |
   |                          | READDIR, REMOVE, RENAME, SECINFO       |
   | NFS4ERR_NOTEMPTY         | REMOVE, RENAME                         |
   | NFS4ERR_NOTSUP           | OPEN, READLINK                         |
   | NFS4ERR_NOTSUPP          | DELEGPURGE, DELEGRETURN, LINK,         |
   |                          | OPENATTR                               |
   | NFS4ERR_NOT_SAME         | READDIR, VERIFY                        |
   | NFS4ERR_NO_GRACE         | LOCK, OPEN                             |
   | NFS4ERR_NXIO             | WRITE                                  |
   | NFS4ERR_OLD_STATEID      | CLOSE, DELEGRETURN, LOCK, LOCKU, OPEN, |
   |                          | OPEN_CONFIRM, OPEN_DOWNGRADE, READ,    |
   |                          | SETATTR, WRITE                         |
   | NFS4ERR_OPENMODE         | LOCK, READ, SETATTR, WRITE             |
   | NFS4ERR_OP_ILLEGAL       | CB_ILLEGAL, ILLEGAL                    |
   | NFS4ERR_PERM             | CREATE, OPEN, SETATTR                  |
   | NFS4ERR_RECLAIM_BAD      | LOCK, OPEN                             |
   | NFS4ERR_RECLAIM_CONFLICT | LOCK, OPEN                             |
   | NFS4ERR_RESOURCE         | ACCESS, CLOSE, COMMIT, CREATE,         |
   |                          | DELEGPURGE, DELEGRETURN, GETATTR,      |
   |                          | GETFH, LINK, LOCK, LOCKT, LOCKU,       |
   |                          | LOOKUP, LOOKUPP, OPEN, OPENATTR,       |
   |                          | OPEN_CONFIRM, OPEN_DOWNGRADE, READ,    |
   |                          | READDIR, READLINK, RELEASE_LOCKOWNER,  |
   |                          | REMOVE, RENAME, RENEW, RESTOREFH,      |
   |                          | SAVEFH, SECINFO, SETATTR, SETCLIENTID, |
   |                          | SETCLIENTID_CONFIRM, VERIFY, WRITE     |
   | NFS4ERR_RESTOREFH        | RESTOREFH                              |
   | NFS4ERR_ROFS             | COMMIT, CREATE, LINK, OPEN, OPENATTR,  |
   |                          | OPEN_DOWNGRADE, REMOVE, RENAME,        |
   |                          | SETATTR, WRITE                         |
   | NFS4ERR_SAME             | NVERIFY                                |
   | NFS4ERR_SERVERFAULT      | ACCESS, CB_GETATTR, CB_RECALL, CLOSE,  |
   |                          | COMMIT, CREATE, DELEGPURGE,            |
   |                          | DELEGRETURN, GETATTR, GETFH, LINK,     |
   |                          | LOCK, LOCKT, LOCKU, LOOKUP, LOOKUPP,   |
   |                          | NVERIFY, OPEN, OPENATTR, OPEN_CONFIRM, |
   |                          | OPEN_DOWNGRADE, PUTFH, PUTPUBFH,       |
   |                          | PUTROOTFH, READ, READDIR, READLINK,    |
   |                          | RELEASE_LOCKOWNER, REMOVE, RENAME,     |
   |                          | RENEW, RESTOREFH, SAVEFH, SECINFO,     |
   |                          | SETATTR, SETCLIENTID,                  |
   |                          | SETCLIENTID_CONFIRM, VERIFY, WRITE     |
   | NFS4ERR_SHARE_DENIED     | OPEN                                   |
   | NFS4ERR_STALE            | ACCESS, CLOSE, COMMIT, CREATE,         |
   |                          | DELEGRETURN, GETATTR, GETFH, LINK,     |
   |                          | LOCK, LOCKT, LOCKU, LOOKUP, LOOKUPP,   |
   |                          | NVERIFY, OPEN, OPENATTR, OPEN_CONFIRM, |
   |                          | OPEN_DOWNGRADE, PUTFH, READ, READDIR,  |
   |                          | READLINK, REMOVE, RENAME, RESTOREFH,   |
   |                          | SAVEFH, SECINFO, SETATTR, VERIFY,      |
   |                          | WRITE                                  |
   | NFS4ERR_STALE_CLIENTID   | DELEGPURGE, LOCK, LOCKT, OPEN,         |
   |                          | RELEASE_LOCKOWNER, RENEW,              |
   |                          | SETCLIENTID_CONFIRM                    |
   | NFS4ERR_STALE_STATEID    | CLOSE, DELEGRETURN, LOCK, LOCKU,       |
   |                          | OPEN_CONFIRM, OPEN_DOWNGRADE, READ,    |
   |                          | SETATTR, WRITE                         |
   | NFS4ERR_SYMLINK          | COMMIT, LOOKUP, LOOKUPP, OPEN, READ,   |
   |                          | WRITE                                  |
   | NFS4ERR_TOOSMALL         | READDIR                                |
   | NFS4ERR_WRONGSEC         | LINK, LOOKUP, LOOKUPP, OPEN, PUTFH,    |
   |                          | PUTPUBFH, PUTROOTFH, RENAME, RESTOREFH |
   | NFS4ERR_XDEV             | LINK, RENAME                           |
   +--------------------------+----------------------------------------+

                                Table 11

14. NFSv4 Requests

   For the NFSv4 RPC program, there are two traditional RPC procedures:
   NULL and COMPOUND. All other functionality is defined as a set of
   operations and these operations are defined in normal XDR/RPC syntax
   and semantics. However, these operations are encapsulated within the
   COMPOUND procedure. This requires that the client combine one or
   more of the NFSv4 operations into a single request.

   The NFS4_CALLBACK program is used to provide server-to-client
   signaling and is constructed in a fashion similar to the NFSv4
   program. The procedures CB_NULL and CB_COMPOUND are defined in the
   same way as NULL and COMPOUND are within the NFS program. The
   CB_COMPOUND request also encapsulates the remaining operations of the
   NFS4_CALLBACK program. There is no predefined RPC program number for
   the NFS4_CALLBACK program. It is up to the client to specify a
   program number in the "transient" program range.
   The program and port number of the NFS4_CALLBACK program are provided
   by the client as part of the SETCLIENTID/SETCLIENTID_CONFIRM
   sequence. The program and port can be changed by another
   SETCLIENTID/SETCLIENTID_CONFIRM sequence, and it is possible to use
   the sequence to change them within a client incarnation without
   removing relevant leased client state.

14.1. Compound Procedure

   The COMPOUND procedure provides the opportunity for better
   performance within high-latency networks. The client can avoid the
   cumulative latency of multiple RPCs by combining multiple dependent
   operations into a single COMPOUND procedure. A compound operation
   may provide for protocol simplification by allowing the client to
   combine basic procedures into a single request that is customized for
   the client's environment.

   The CB_COMPOUND procedure precisely parallels the features of
   COMPOUND as described above.

   The basic structure of the COMPOUND procedure is:

   +-----+--------------+--------+-----------+-----------+-----------+--
   | tag | minorversion | numops | op + args | op + args | op + args |
   +-----+--------------+--------+-----------+-----------+-----------+--

   and the reply's structure is:

   +------------+-----+--------+-----------------------+--
   |last status | tag | numres | status + op + results |
   +------------+-----+--------+-----------------------+--

   The numops and numres fields, used in the depiction above, represent
   the count for the counted array encoding used to signify the number
   of arguments or results encoded in the request and response. As per
   the XDR encoding, these counts must match exactly the number of
   operation arguments or results encoded.

14.2.
Evaluation of a Compound Request

   The server will process the COMPOUND procedure by evaluating each of
   the operations within the COMPOUND procedure in order. Each
   component operation consists of a 32-bit operation code, followed by
   the argument of length determined by the type of operation. The
   results of each operation are encoded in sequence into a reply
   buffer. The results of each operation are preceded by the opcode and
   a status code (normally zero). If an operation results in a non-zero
   status code, the status will be encoded and evaluation of the
   compound sequence will halt and the reply will be returned. Note
   that evaluation stops even in the event of "non error" conditions
   such as NFS4ERR_SAME.

   There are no atomicity requirements for the operations contained
   within the COMPOUND procedure. The operations being evaluated as
   part of a COMPOUND request may be evaluated simultaneously with other
   COMPOUND requests that the server receives.

   The client is responsible for recovering from any partially
   completed COMPOUND procedure. Partially completed COMPOUND
   procedures may occur at any point due to errors such as
   NFS4ERR_RESOURCE and NFS4ERR_DELAY. This may occur even given an
   otherwise valid operation string. Further, a server reboot which
   occurs in the middle of processing a COMPOUND procedure may leave the
   client with the difficult task of determining how far COMPOUND
   processing has proceeded. Therefore, the client should avoid overly
   complex COMPOUND procedures, in case an operation within the
   procedure fails.

   Each operation assumes a "current" and a "saved" filehandle that are
   available as part of the execution context of the compound request.
   Operations may set, change, or return the current filehandle. The
   "saved" filehandle is used for temporary storage of a filehandle
   value and as an operand for the RENAME and LINK operations.

14.3. Synchronous Modifying Operations

   NFSv4 operations that modify the filesystem are synchronous. When an
   operation is successfully completed at the server, the client can
   depend on any data associated with the request being on stable
   storage (the one exception is in the case of the file data in a WRITE
   operation with the UNSTABLE option specified).

   This implies that any previous operations within the same compound
   request are also reflected in stable storage. This behavior enables
   the client to recover from a partially executed compound request
   that may have resulted from the failure of the server. For example,
   if a compound request contains operations A and B and the server is
   unable to send a response to the client, depending on the progress
   the server made in servicing the request, the result of both
   operations may be reflected in stable storage or just operation A may
   be reflected. The server must not have just the results of operation
   B in stable storage.

14.4. Operation Values

   The operations encoded in the COMPOUND procedure are identified by
   operation values. To avoid overlap with the RPC procedure numbers,
   operations 0 (zero) and 1 are not defined. Operation 2 is not
   defined but reserved for future use with minor versioning.

15. NFSv4 Procedures

15.1. Procedure 0: NULL - No Operation

15.1.1. SYNOPSIS

   <null>

15.1.2. ARGUMENT

   void;

15.1.3. RESULT

   void;

15.1.4. DESCRIPTION

   Standard NULL procedure. Void argument, void response. This
   procedure has no functionality associated with it. Because of this
   it is sometimes used to measure the overhead of processing a service
   request.
   Therefore, the server should ensure that no unnecessary work is done
   in servicing this procedure.

15.2.  Procedure 1: COMPOUND - Compound Operations

15.2.1.  SYNOPSIS

      compoundargs -> compoundres

15.2.2.  ARGUMENT

      union nfs_argop4 switch (nfs_opnum4 argop) {
              case : ;
              ...
      };

      struct COMPOUND4args {
              comptag4        tag;
              uint32_t        minorversion;
              nfs_argop4      argarray<>;
      };

15.2.3.  RESULT

      union nfs_resop4 switch (nfs_opnum4 resop) {
              case : ;
              ...
      };

      struct COMPOUND4res {
              nfsstat4        status;
              comptag4        tag;
              nfs_resop4      resarray<>;
      };

15.2.4.  DESCRIPTION

   The COMPOUND procedure is used to combine one or more of the NFS
   operations into a single RPC request.  The main NFS RPC program has
   two main procedures: NULL and COMPOUND.  All other operations use the
   COMPOUND procedure as a wrapper.

   The COMPOUND procedure is used to combine individual operations into
   a single RPC request.  The server interprets each of the operations
   in turn.  If an operation is executed by the server and the status of
   that operation is NFS4_OK, then the next operation in the COMPOUND
   procedure is executed.  The server continues this process until there
   are no more operations to be executed or one of the operations has a
   status value other than NFS4_OK.

   In the processing of the COMPOUND procedure, the server may find that
   it does not have the available resources to execute any or all of the
   operations within the COMPOUND sequence.  In this case, the error
   NFS4ERR_RESOURCE will be returned for the particular operation within
   the COMPOUND procedure where the resource exhaustion occurred.  This
   assumes that all previous operations within the COMPOUND sequence
   have been evaluated successfully.  The results for all of the
   evaluated operations must be returned to the client.
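
   The evaluation rule described above (operations are processed in
   order, evaluation halts at the first status other than NFS4_OK, and
   the results of every evaluated operation are returned) can be
   sketched as follows.  This is only an illustration under stated
   assumptions, not part of the protocol definition: the function name
   evaluate_compound, the handler table, the max_ops stand-in for a
   server resource limit, and the use of status names as strings are
   all hypothetical.

```python
# Hypothetical sketch of COMPOUND evaluation; all names are illustrative.
NFS4_OK = "NFS4_OK"
NFS4ERR_RESOURCE = "NFS4ERR_RESOURCE"


def evaluate_compound(ops, handlers, max_ops=None):
    """Evaluate operations in order and return (status, resarray).

    ops      -- list of (opname, args) tuples taken from the request
    handlers -- dict mapping opname to a callable that returns a status
    max_ops  -- assumed stand-in for a server resource limit
    """
    status = NFS4_OK
    resarray = []
    for index, (opname, args) in enumerate(ops):
        if max_ops is not None and index >= max_ops:
            # Resource exhaustion is reported on the operation where it
            # occurred; results of earlier operations are still returned.
            status = NFS4ERR_RESOURCE
        else:
            status = handlers[opname](*args)
        resarray.append((opname, status))
        if status != NFS4_OK:
            break  # evaluation halts at the first non-OK status
    # The overall status is that of the last operation evaluated.
    return status, resarray
```

   In this sketch, a request whose second operation fails yields a
   two-entry result array, and the overall status equals the status of
   that second operation.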

   The server will generally choose between two methods of decoding the
   client's request.  The first would be the traditional one-pass XDR
   decode, in which decoding of the entire COMPOUND precedes execution
   of any operation within it.  If there is an XDR decoding error in
   this case, an RPC XDR decode error would be returned.  The second
   method would be to make an initial pass to decode the basic COMPOUND
   request and then to XDR decode each of the individual operations as
   the server is ready to execute it.  In this case, the server may
   encounter an XDR decode error during such an operation decode, after
   previous operations within the COMPOUND have been executed.  The
   server would then return the error NFS4ERR_BADXDR to signify the
   decode error.

   The COMPOUND arguments contain a "minorversion" field.  The initial
   and default value for this field is 0 (zero).  This field will be
   used by future minor versions such that the client can communicate to
   the server what minor version is being requested.  If the server
   receives a COMPOUND procedure with a minorversion field value that it
   does not support, the server MUST return an error of
   NFS4ERR_MINOR_VERS_MISMATCH and a zero-length resultdata array.

   Contained within the COMPOUND results is a "status" field.  If the
   results array length is non-zero, this status must be equivalent to
   the status of the last operation that was executed within the
   COMPOUND procedure.  Therefore, if an operation incurred an error
   then the "status" value will be the same error value as is being
   returned for the operation that failed.

   Note that operations 0 (zero) and 1 (one) are not defined for the
   COMPOUND procedure.  Operation 2 is not defined but reserved for
   future definition and use with minor versioning.
   If the server receives an operation array that contains operation 2
   and the minorversion field has a value of 0 (zero), an error of
   NFS4ERR_OP_ILLEGAL, as described in the next paragraph, is returned
   to the client.  If an operation array contains an operation 2 and the
   minorversion field is non-zero and the server does not support the
   minor version, the server returns an error of
   NFS4ERR_MINOR_VERS_MISMATCH.  Therefore, the
   NFS4ERR_MINOR_VERS_MISMATCH error takes precedence over all other
   errors.

   It is possible that the server receives a request that contains an
   operation that is less than the first legal operation (OP_ACCESS) or
   greater than the last legal operation (OP_RELEASE_LOCKOWNER).  In
   this case, the server's response will encode the opcode OP_ILLEGAL
   rather than the illegal opcode of the request.  The status field in
   the ILLEGAL return results will be set to NFS4ERR_OP_ILLEGAL.  The
   COMPOUND procedure's return results will also be NFS4ERR_OP_ILLEGAL.

   The definition of the "tag" in the request is left to the
   implementor.  It may be used to summarize the content of the compound
   request for the benefit of packet sniffers and engineers debugging
   implementations.  However, the value of "tag" in the response SHOULD
   be the same value as provided in the request.  This applies to the
   tag field of the CB_COMPOUND procedure as well.

15.2.4.1.  Current Filehandle

   The current and saved filehandle are used throughout the protocol.
   Most operations implicitly use the current filehandle as an argument
   and many set the current filehandle as part of the results.  The
   combination of client-specified sequences of operations and current
   and saved filehandle arguments and results allows for greater
   protocol flexibility.
   The best or easiest example of current filehandle usage is a
   sequence like the following:

      PUTFH fh1              {fh1}
      LOOKUP "compA"         {fh2}
      GETATTR                {fh2}
      LOOKUP "compB"         {fh3}
      GETATTR                {fh3}
      LOOKUP "compC"         {fh4}
      GETATTR                {fh4}
      GETFH

                                 Figure 1

   In this example, the PUTFH (Section 15.22) operation explicitly sets
   the current filehandle value while the result of each LOOKUP
   operation sets the current filehandle value to the resultant file
   system object.  Also, the client is able to insert GETATTR operations
   using the current filehandle as an argument.

   The PUTROOTFH (Section 15.24) and PUTPUBFH (Section 15.23) operations
   also set the current filehandle.  The above example could replace
   "PUTFH fh1" with PUTROOTFH or PUTPUBFH with no filehandle argument in
   order to achieve the same effect (on the assumption that "compA" is
   directly below the root of the namespace).

   Along with the current filehandle, there is a saved filehandle.
   While the current filehandle is set as the result of operations like
   LOOKUP, the saved filehandle must be set directly with the use of the
   SAVEFH operation.  The SAVEFH operation copies the current
   filehandle value to the saved value.  The saved filehandle value is
   used in combination with the current filehandle value for the LINK
   and RENAME operations.  The RESTOREFH operation will copy the saved
   filehandle value to the current filehandle value; as a result, the
   saved filehandle value may be used as a sort of "scratch" area for
   the client's series of operations.

15.2.4.2.  Current Stateid

   The COMPOUND processing environment also has a current stateid and a
   saved stateid, which allow for the passing of stateids between
   operations.

   A "current stateid" is the stateid that is associated with the
   current filehandle.
   The current stateid may only be changed by an operation that
   modifies the current filehandle or returns a stateid.  If an
   operation returns a stateid, it MUST set the current stateid to the
   returned value.  If an operation sets the current filehandle but
   does not return a stateid, the current stateid MUST be set to the
   all-zeros special stateid, i.e., (seqid, other) = (0, 0).  If an
   operation uses a stateid as an argument but does not return a
   stateid, the current stateid MUST NOT be changed.  For example,
   PUTFH, PUTROOTFH, and PUTPUBFH will change the current server state
   from {ocfh, (osid)} to {cfh, (0, 0)}, while LOCK will change the
   current state from {cfh, (osid)} to {cfh, (nsid)}.  Operations like
   LOOKUP that transform a current filehandle and component name into a
   new current filehandle will also change the current stateid to
   (0, 0).  The SAVEFH and RESTOREFH operations will save and restore
   both the current filehandle and the current stateid as a set.

   The following example is the common case of a simple READ operation
   with a supplied stateid showing that the PUTFH initializes the
   current stateid to (0, 0).  The subsequent READ with stateid (sid1)
   leaves the current stateid unchanged, but does evaluate the
   operation.

      PUTFH fh1                   - -> {fh1, (0, 0)}
      READ (sid1), 0, 1024        {fh1, (0, 0)} -> {fh1, (0, 0)}

                                 Figure 2

   This next example performs an OPEN with the root filehandle and as a
   result generates stateid (sid1).  The next operation specifies the
   READ with the argument stateid set such that (seqid, other) are equal
   to (1, 0), but the current stateid set by the previous operation is
   actually used when the operation is evaluated.  This allows correct
   interaction with any existing, potentially conflicting, locks.

      PUTROOTFH                   - -> {fh1, (0, 0)}
      OPEN "compA"                {fh1, (0, 0)} -> {fh2, (sid1)}
      READ (1, 0), 0, 1024        {fh2, (sid1)} -> {fh2, (sid1)}
      CLOSE (1, 0)                {fh2, (sid1)} -> {fh2, (sid2)}

                                 Figure 3

   This next example is similar to the second in how it passes the
   stateid sid2 generated by the LOCK operation to the next READ
   operation.  This allows the client to explicitly surround a single
   I/O operation with a lock and its appropriate stateid to guarantee
   correctness with other client locks.  The example also shows how
   SAVEFH and RESTOREFH can save and later re-use a filehandle and
   stateid, passing them as the current filehandle and stateid to a READ
   operation.

      PUTFH fh1                   - -> {fh1, (0, 0)}
      LOCK 0, 1024, (sid1)        {fh1, (sid1)} -> {fh1, (sid2)}
      READ (1, 0), 0, 1024        {fh1, (sid2)} -> {fh1, (sid2)}
      LOCKU 0, 1024, (1, 0)       {fh1, (sid2)} -> {fh1, (sid3)}
      SAVEFH                      {fh1, (sid3)} -> {fh1, (sid3)}

      PUTFH fh2                   {fh1, (sid3)} -> {fh2, (0, 0)}
      WRITE (1, 0), 0, 1024       {fh2, (0, 0)} -> {fh2, (0, 0)}

      RESTOREFH                   {fh2, (0, 0)} -> {fh1, (sid3)}
      READ (1, 0), 1024, 1024     {fh1, (sid3)} -> {fh1, (sid3)}

                                 Figure 4

   The final example shows a disallowed use of the current stateid.  The
   client is attempting to implicitly pass the anonymous special
   stateid, (0, 0), to the READ operation.  The server MUST return
   NFS4ERR_BAD_STATEID in the reply to the READ operation.

      PUTFH fh1                   - -> {fh1, (0, 0)}
      READ (1, 0), 0, 1024        {fh1, (0, 0)} -> NFS4ERR_BAD_STATEID

                                 Figure 5

15.2.5.  IMPLEMENTATION

   Since an error of any type may occur after only a portion of the
   operations have been evaluated, the client must be prepared to
   recover from any failure.
   If the source of an NFS4ERR_RESOURCE error was a complex or lengthy
   set of operations, it is likely that if the number of operations
   were reduced the server would be able to evaluate them successfully.
   Therefore, the client is responsible for dealing with this type of
   complexity in recovery.

   The client SHOULD NOT construct a COMPOUND that mixes operations for
   different client IDs.

15.3.  Operation 3: ACCESS - Check Access Rights

15.3.1.  SYNOPSIS

      (cfh), accessreq -> supported, accessrights

15.3.2.  ARGUMENT

      const ACCESS4_READ      = 0x00000001;
      const ACCESS4_LOOKUP    = 0x00000002;
      const ACCESS4_MODIFY    = 0x00000004;
      const ACCESS4_EXTEND    = 0x00000008;
      const ACCESS4_DELETE    = 0x00000010;
      const ACCESS4_EXECUTE   = 0x00000020;

      struct ACCESS4args {
              /* CURRENT_FH: object */
              uint32_t        access;
      };

15.3.3.  RESULT

      struct ACCESS4resok {
              uint32_t        supported;
              uint32_t        access;
      };

      union ACCESS4res switch (nfsstat4 status) {
       case NFS4_OK:
               ACCESS4resok   resok4;
       default:
               void;
      };

15.3.4.  DESCRIPTION

   ACCESS determines the access rights that a user, as identified by the
   credentials in the RPC request, has with respect to the file system
   object specified by the current filehandle.  The client encodes the
   set of access rights that are to be checked in the bit mask "access".
   The server checks the permissions encoded in the bit mask.  If a
   status of NFS4_OK is returned, two bit masks are included in the
   response.  The first, "supported", represents the access rights that
   the server can verify reliably.  The second, "access", represents
   the access rights available to the user for the filehandle provided.
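
   As an illustration of how a client might read the two returned bit
   masks together, the sketch below classifies each requested right.
   The constants are those of the ARGUMENT section; the function name
   interpret_access and the labels "allowed", "denied", and "unknown"
   are assumptions made for this example only.  As with any ACCESS
   result, the classification is advisory.

```python
# Constants from the ACCESS ARGUMENT definition.
ACCESS4_READ = 0x00000001
ACCESS4_LOOKUP = 0x00000002
ACCESS4_MODIFY = 0x00000004
ACCESS4_EXTEND = 0x00000008
ACCESS4_DELETE = 0x00000010
ACCESS4_EXECUTE = 0x00000020

ACCESS_NAMES = {
    ACCESS4_READ: "ACCESS4_READ",
    ACCESS4_LOOKUP: "ACCESS4_LOOKUP",
    ACCESS4_MODIFY: "ACCESS4_MODIFY",
    ACCESS4_EXTEND: "ACCESS4_EXTEND",
    ACCESS4_DELETE: "ACCESS4_DELETE",
    ACCESS4_EXECUTE: "ACCESS4_EXECUTE",
}


def interpret_access(requested, supported, access):
    """Hypothetical helper: classify every requested right.

    requested -- bit mask sent in ACCESS4args.access
    supported -- ACCESS4resok.supported from the reply
    access    -- ACCESS4resok.access from the reply
    """
    result = {}
    for bit, name in ACCESS_NAMES.items():
        if not requested & bit:
            continue  # the client did not ask about this right
        if not supported & bit:
            result[name] = "unknown"  # server could not verify it
        elif access & bit:
            result[name] = "allowed"
        else:
            result[name] = "denied"
    return result
```

   A right that is requested but absent from "supported" is reported as
   "unknown" rather than "denied", since the server made no statement
   about it.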

   Note that the supported field will contain only as many values as
   were originally sent in the arguments.  For example, if the client
   sends an ACCESS operation with only the ACCESS4_READ value set and
   the server supports this value, the server will return only
   ACCESS4_READ even if it could have reliably checked other values.

   The results of this operation are necessarily advisory in nature.  A
   return status of NFS4_OK and the appropriate bit set in the bit mask
   does not imply that such access will be allowed to the file system
   object in the future.  This is because access rights can be revoked
   by the server at any time.

   The following access permissions may be requested:

   ACCESS4_READ:  Read data from file or read a directory.

   ACCESS4_LOOKUP:  Look up a name in a directory (no meaning for
      non-directory objects).

   ACCESS4_MODIFY:  Rewrite existing file data or modify existing
      directory entries.

   ACCESS4_EXTEND:  Write new data or add directory entries.

   ACCESS4_DELETE:  Delete an existing directory entry.

   ACCESS4_EXECUTE:  Execute file (no meaning for a directory).

   On success, the current filehandle retains its value.

15.3.5.  IMPLEMENTATION

   In general, it is not sufficient for the client to attempt to deduce
   access permissions by inspecting the uid, gid, and mode fields in the
   file attributes or by attempting to interpret the contents of the ACL
   attribute.  This is because the server may perform uid or gid mapping
   or enforce additional access control restrictions.  It is also
   possible that the server may not be in the same ID space as the
   client.  In these cases (and perhaps others), the client cannot
   reliably perform an access check with only current file attributes.

   In the NFSv2 protocol, the only reliable way to determine whether an
   operation was allowed was to try it and see if it succeeded or
   failed.  Using the ACCESS operation in the NFSv4 protocol, the client
   can ask the server to indicate whether or not one or more classes of
   operations are permitted.  The ACCESS operation is provided to allow
   clients to check before doing a series of operations that would
   result in an access failure.  The OPEN operation provides a point
   where the server can verify access to the file object and a method
   to return that information to the client.  The ACCESS operation is
   still useful for directory operations or when the UNIX "access" API
   is used on the client.

   The information returned by the server in response to an ACCESS call
   is not permanent.  It was correct at the exact time that the server
   performed the checks, but not necessarily afterward.  The server can
   revoke access permission at any time.

   The client should use the effective credentials of the user to build
   the authentication information in the ACCESS request used to
   determine access rights.  It is the effective user and group
   credentials that are used in subsequent read and write operations.

   Many implementations do not directly support the ACCESS4_DELETE
   permission.  Operating systems like UNIX will ignore the
   ACCESS4_DELETE bit if set on an access request on a non-directory
   object.  In these systems, delete permission on a file is determined
   by the access permissions on the directory in which the file resides,
   instead of being determined by the permissions of the file itself.
   Therefore, the mask returned enumerating which access rights can be
   determined will have the ACCESS4_DELETE value set to 0.  This
   indicates to the client that the server was unable to check that
   particular access right.
   The ACCESS4_DELETE bit in the access mask returned will then be
   ignored by the client.

15.4.  Operation 4: CLOSE - Close File

15.4.1.  SYNOPSIS

      (cfh), seqid, open_stateid -> open_stateid

15.4.2.  ARGUMENT

      struct CLOSE4args {
              /* CURRENT_FH: object */
              seqid4          seqid;
              stateid4        open_stateid;
      };

15.4.3.  RESULT

      union CLOSE4res switch (nfsstat4 status) {
       case NFS4_OK:
               stateid4       open_stateid;
       default:
               void;
      };

15.4.4.  DESCRIPTION

   The CLOSE operation releases share reservations for the regular or
   named attribute file as specified by the current filehandle.  The
   share reservations and other state information released at the server
   as a result of this CLOSE are only those associated with the supplied
   stateid.  The sequence id provides for the correct ordering.  State
   associated with other OPENs is not affected.

   If byte-range locks are held, the client SHOULD release all locks
   before issuing a CLOSE.  The server MAY free all outstanding locks on
   CLOSE, but some servers may not support the CLOSE of a file that
   still has byte-range locks held.  The server MUST return failure if
   any locks would exist after the CLOSE.

   On success, the current filehandle retains its value.

15.4.5.  IMPLEMENTATION

   Even though CLOSE returns a stateid, this stateid is not useful to
   the client and should be treated as deprecated.  CLOSE "shuts down"
   the state associated with all OPENs for the file by a single
   open_owner.  As noted above, CLOSE will either release all file
   locking state or return an error.  Therefore, the stateid returned by
   CLOSE is not useful for operations that follow.

15.5.  Operation 5: COMMIT - Commit Cached Data

15.5.1.  SYNOPSIS

      (cfh), offset, count -> verifier

15.5.2.  ARGUMENT

      struct COMMIT4args {
              /* CURRENT_FH: file */
              offset4         offset;
              count4          count;
      };

15.5.3.  RESULT

      struct COMMIT4resok {
              verifier4       writeverf;
      };

      union COMMIT4res switch (nfsstat4 status) {
       case NFS4_OK:
               COMMIT4resok   resok4;
       default:
               void;
      };

15.5.4.  DESCRIPTION

   The COMMIT operation forces or flushes data to stable storage for the
   file specified by the current filehandle.  The flushed data is that
   which was previously written with a WRITE operation that had the
   stable field set to UNSTABLE4.

   The offset specifies the position within the file where the flush is
   to begin.  An offset value of 0 (zero) means to flush data starting
   at the beginning of the file.  The count specifies the number of
   bytes of data to flush.  If count is 0 (zero), a flush from offset to
   the end of the file is done.

   The server returns a write verifier upon successful completion of the
   COMMIT.  The write verifier is used by the client to determine if the
   server has restarted or rebooted between the initial WRITE(s) and the
   COMMIT.  The client does this by comparing the write verifier
   returned from the initial writes and the verifier returned by the
   COMMIT operation.  The server must vary the value of the write
   verifier at each server event or instantiation that may lead to a
   loss of uncommitted data.  Most commonly this occurs when the server
   is rebooted; however, other events at the server may result in
   uncommitted data loss as well.

   On success, the current filehandle retains its value.

15.5.5.  IMPLEMENTATION

   The COMMIT operation is similar in operation and semantics to the
   POSIX fsync() [36] system call that synchronizes a file's state with
   the disk (file data and metadata are flushed to disk or stable
   storage).
   COMMIT performs the same operation for a client, flushing any
   unsynchronized data and metadata on the server to the server's disk
   or stable storage for the specified file.  Like fsync(), it may be
   that there is some modified data or no modified data to synchronize.
   The data may have been synchronized by the server's normal periodic
   buffer synchronization activity.  COMMIT should return NFS4_OK,
   unless there has been an unexpected error.

   COMMIT differs from fsync() in that it is possible for the client to
   flush a range of the file (most likely triggered by a
   buffer-reclamation scheme on the client before the file has been
   completely written).

   The server implementation of COMMIT is reasonably simple.  If the
   server receives a full file COMMIT request, that is, one starting at
   offset 0 with count 0, it should do the equivalent of fsync()'ing
   the file.  Otherwise, it should arrange to have the cached data in
   the range specified by offset and count flushed to stable storage.
   In both cases, any metadata associated with the file must be flushed
   to stable storage before returning.  It is not an error for there to
   be nothing to flush on the server.  This means that the data and
   metadata that needed to be flushed have already been flushed or lost
   during the last server failure.

   The client implementation of COMMIT is a little more complex.  There
   are two reasons for wanting to commit a client buffer to stable
   storage.  The first is that the client wants to reuse a buffer.  In
   this case, the offset and count of the buffer are sent to the server
   in the COMMIT request.  The server then flushes any cached data based
   on the offset and count, and flushes any metadata associated with the
   file.  It then returns the status of the flush and the write
   verifier.
   The other reason for the client to generate a COMMIT is for a full
   file flush, such as may be done at close.  In this case, the client
   would gather all of the buffers for this file that contain
   uncommitted data, do the COMMIT operation with an offset of 0 and
   count of 0, and then free all of those buffers.  Any other dirty
   buffers would be sent to the server in the normal fashion.

   After a buffer is written by the client with the stable parameter set
   to UNSTABLE4, the buffer must be considered as modified by the client
   until the buffer has either been flushed via a COMMIT operation or
   written via a WRITE operation with the stable parameter set to
   FILE_SYNC4 or DATA_SYNC4.  This is done to prevent the buffer from
   being freed and reused before the data can be flushed to stable
   storage on the server.

   When a response is returned from either a WRITE or a COMMIT operation
   and it contains a write verifier that is different from the one
   previously returned by the server, the client will need to
   retransmit all of the buffers containing uncommitted cached data to
   the server.  How this is to be done is up to the implementor.  If
   there is only one buffer of interest, then it should probably be
   sent back over in a WRITE request with the appropriate stable
   parameter.  If there is more than one buffer, it might be worthwhile
   retransmitting all of the buffers in WRITE requests with the stable
   parameter set to UNSTABLE4 and then retransmitting the COMMIT
   operation to flush all of the data on the server to stable storage.
   The timing of these retransmissions is left to the implementor.

   The above description applies to page-cache-based systems as well as
   buffer-cache-based systems.  In those systems, the virtual memory
   system will need to be modified instead of the buffer cache.

15.6.  Operation 6: CREATE - Create a Non-Regular File Object

15.6.1.  SYNOPSIS

      (cfh), name, type, attrs -> (cfh), change_info, attrs_set

15.6.2.  ARGUMENT

      union createtype4 switch (nfs_ftype4 type) {
       case NF4LNK:
               linktext4      linkdata;
       case NF4BLK:
       case NF4CHR:
               specdata4      devdata;
       case NF4SOCK:
       case NF4FIFO:
       case NF4DIR:
               void;
       default:
               void;          /* server should return NFS4ERR_BADTYPE */
      };

      struct CREATE4args {
              /* CURRENT_FH: directory for creation */
              createtype4     objtype;
              component4      objname;
              fattr4          createattrs;
      };

15.6.3.  RESULT

      struct CREATE4resok {
              change_info4    cinfo;
              bitmap4         attrset;        /* attributes set */
      };

      union CREATE4res switch (nfsstat4 status) {
       case NFS4_OK:
               CREATE4resok   resok4;
       default:
               void;
      };

15.6.4.  DESCRIPTION

   The CREATE operation creates a non-regular file object in a directory
   with a given name.  The OPEN operation MUST be used to create a
   regular file.

   The objname specifies the name for the new object.  The objtype
   determines the type of object to be created: directory, symlink, etc.

   If an object of the same name already exists in the directory, the
   server will return the error NFS4ERR_EXIST.

   For the directory where the new file object was created, the server
   returns change_info4 information in cinfo.  With the atomic field of
   the change_info4 struct, the server will indicate if the before and
   after change attributes were obtained atomically with respect to the
   file object creation.

   If the objname is of zero length, NFS4ERR_INVAL will be returned.
   The objname is also subject to the normal UTF-8, character support,
   and name checks.  See Section 12.3 for further discussion.
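
   The name-related error cases of CREATE (zero-length name, a name
   that is not valid UTF-8, and a name that already exists in the
   directory) can be sketched as below.  This is a simplified
   illustration under stated assumptions: the function name
   check_create_name, the order in which the checks run, and the
   representation of a directory as a set of names are hypothetical,
   and the additional character support and name checks of Section 12.3
   are not modeled.

```python
# Status names are used as plain strings for readability in this sketch.
NFS4_OK = "NFS4_OK"
NFS4ERR_INVAL = "NFS4ERR_INVAL"
NFS4ERR_EXIST = "NFS4ERR_EXIST"


def check_create_name(objname, existing_names):
    """Hypothetical helper returning the status for a CREATE objname.

    objname        -- the requested component name, as bytes
    existing_names -- set of names (bytes) already in the directory
    """
    if len(objname) == 0:
        return NFS4ERR_INVAL  # zero-length names are rejected
    try:
        objname.decode("utf-8")
    except UnicodeDecodeError:
        return NFS4ERR_INVAL  # name does not obey the UTF-8 definition
    if objname in existing_names:
        return NFS4ERR_EXIST  # an object of that name already exists
    return NFS4_OK
```

   For example, check_create_name(b"", set()) yields NFS4ERR_INVAL,
   while a well-formed name not yet present in the directory yields
   NFS4_OK.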

   If the objname has a length of 0 (zero), or if objname does not obey
   the UTF-8 definition, the error NFS4ERR_INVAL will be returned.

   The current filehandle is replaced by that of the new object.

   The createattrs specifies the initial set of attributes for the
   object.  The set of attributes may include any writable attribute
   valid for the object type.  When the operation is successful, the
   server will return to the client an attribute mask signifying which
   attributes were successfully set for the object.

   If createattrs includes neither the owner attribute nor an ACL with
   an ACE for the owner, and if the server's filesystem both supports
   and requires an owner attribute (or an owner ACE), then the server
   MUST derive the owner (or the owner ACE).  This would typically be
   from the principal indicated in the RPC credentials of the call, but
   the server's operating environment or filesystem semantics may
   dictate other methods of derivation.  Similarly, if createattrs
   includes neither the group attribute nor a group ACE, and if the
   server's filesystem both supports and requires the notion of a group
   attribute (or group ACE), the server MUST derive the group attribute
   (or the corresponding group ACE) for the file.  This could be from
   the RPC call's credentials, such as the group principal if the
   credentials include it (such as with AUTH_SYS), from the group
   identifier associated with the principal in the credentials (e.g.,
   POSIX systems have a user database [37] that has the group identifier
   for every user identifier), inherited from the directory in which
   the object is created, or whatever else the server's operating
   environment or filesystem semantics dictate.  This applies to the
   OPEN operation too.
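
   One possible way for a server to derive a missing owner and group is
   sketched below.  The text above lists candidate sources without
   ranking them, so the fallback order shown here is an assumption, as
   are the function name derive_owner_and_group, the dictionary keys
   "owner" and "group", and the modeling of the user database as a
   plain mapping.  ACEs are not modeled.

```python
def derive_owner_and_group(createattrs, rpc_principal, rpc_group=None,
                           user_db=None, parent_group=None):
    """Hypothetical helper returning (owner, group) for a new object.

    createattrs   -- dict of attributes supplied by the client
    rpc_principal -- principal from the RPC credentials
    rpc_group     -- group carried in the credentials, if any
    user_db       -- mapping of user identifier to group identifier
    parent_group  -- group of the directory the object is created in
    """
    # Owner: use the supplied value, else the RPC principal.
    owner = createattrs.get("owner", rpc_principal)

    # Group: use the supplied value, else try each source in an
    # assumed order of preference.
    if "group" in createattrs:
        group = createattrs["group"]
    elif rpc_group is not None:
        group = rpc_group
    elif user_db is not None and rpc_principal in user_db:
        group = user_db[rpc_principal]
    else:
        group = parent_group
    return owner, group
```

   A server whose filesystem semantics call for directory inheritance
   first would simply reorder the fallbacks.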

   Conversely, it is possible the client will specify in createattrs an
   owner attribute or group attribute or ACL that the principal
   indicated in the RPC call's credentials does not have permissions to
   create files for.  The error to be returned in this instance is
   NFS4ERR_PERM.  This applies to the OPEN operation too.

15.6.5.  IMPLEMENTATION

   If the client desires to set attribute values after the create, a
   SETATTR operation can be added to the COMPOUND request so that the
   appropriate attributes will be set.

15.7.  Operation 7: DELEGPURGE - Purge Delegations Awaiting Recovery

15.7.1.  SYNOPSIS

      clientid ->

15.7.2.  ARGUMENT

      struct DELEGPURGE4args {
              clientid4       clientid;
      };

15.7.3.  RESULT

      struct DELEGPURGE4res {
              nfsstat4        status;
      };

15.7.4.  DESCRIPTION

   Purges all of the delegations awaiting recovery for a given client.
   This is useful for clients that do not commit delegation information
   to stable storage, to indicate that conflicting requests need not be
   delayed by the server awaiting recovery of delegation information.

   This operation should be used by clients that record delegation
   information on stable storage on the client.  In this case,
   DELEGPURGE should be issued immediately after doing delegation
   recovery on all delegations known to the client.  Doing so will
   notify the server that no additional delegations for the client will
   be recovered, allowing it to free resources and avoid delaying other
   clients who make requests that conflict with the unrecovered
   delegations.  The set of delegations known to the server and the
   client may be different.  The reason for this is that a client may
   fail after making a request that resulted in a delegation but before
   it received the results and committed them to the client's stable
   storage.

   The server MAY support DELEGPURGE, but if it does not, it MUST NOT
   support CLAIM_DELEGATE_PREV.

15.8.  Operation 8: DELEGRETURN - Return Delegation

15.8.1.  SYNOPSIS

      (cfh), stateid ->

15.8.2.  ARGUMENT

      struct DELEGRETURN4args {
              /* CURRENT_FH: delegated file */
              stateid4        deleg_stateid;
      };

15.8.3.  RESULT

      struct DELEGRETURN4res {
              nfsstat4        status;
      };

15.8.4.  DESCRIPTION

   Returns the delegation represented by the current filehandle and
   stateid.

   Delegations may be returned when recalled or voluntarily (i.e.,
   before the server has recalled them).  In either case, the client
   must properly propagate state changed under the context of the
   delegation to the server before returning the delegation.

15.9.  Operation 9: GETATTR - Get Attributes

15.9.1.  SYNOPSIS

      (cfh), attrbits -> attrbits, attrvals

15.9.2.  ARGUMENT

      struct GETATTR4args {
              /* CURRENT_FH: directory or file */
              bitmap4         attr_request;
      };

15.9.3.  RESULT

      struct GETATTR4resok {
              fattr4          obj_attributes;
      };

      union GETATTR4res switch (nfsstat4 status) {
       case NFS4_OK:
               GETATTR4resok  resok4;
       default:
               void;
      };

15.9.4.  DESCRIPTION

   The GETATTR operation will obtain attributes for the filesystem
   object specified by the current filehandle.  The client sets a bit in
   the bitmap argument for each attribute value that it would like the
   server to return.  The server returns an attribute bitmap that
   indicates the attribute values that it was able to return, followed
   by the attribute values ordered lowest attribute number first.

   The server MUST return a value for each attribute that the client
   requests if the attribute is supported by the server.
If the server 11281 does not support an attribute or cannot approximate a useful value 11282 then it MUST NOT return the attribute value and MUST NOT set the 11283 attribute bit in the result bitmap. The server MUST return an error 11284 if it supports an attribute on the target but cannot obtain its 11285 value. In that case no attribute values will be returned. 11287 File systems which are absent should be treated as having support for 11288 a very small set of attributes as described in GETATTR Within an 11289 Absent File System (Section 7.3.1), even if previously, when the file 11290 system was present, more attributes were supported. 11292 All servers MUST support the REQUIRED attributes as specified in the 11293 section File Attributes (Section 5), for all file systems, with the 11294 exception of absent file systems. 11296 On success, the current filehandle retains its value. 11298 15.9.5. IMPLEMENTATION 11300 Suppose there is an OPEN_DELEGATE_WRITE delegation held by another 11301 client for the file in question and size and/or change are among the set 11302 of attributes being interrogated. The server has two choices. 11304 First, the server can obtain the actual current value of these 11305 attributes from the client holding the delegation by using the 11306 CB_GETATTR callback. Second, the server, particularly when the 11307 delegated client is unresponsive, can recall the delegation in 11308 question. The GETATTR MUST NOT proceed until one of the following 11309 occurs: 11311 o The requested attribute values are returned in the response to 11312 CB_GETATTR. 11314 o The OPEN_DELEGATE_WRITE delegation is returned. 11316 o The OPEN_DELEGATE_WRITE delegation is revoked. 11318 Unless one of the above happens very quickly, one or more 11319 NFS4ERR_DELAY errors will be returned while the delegation is 11320 outstanding. 11322 15.10. Operation 10: GETFH - Get Current Filehandle 11324 15.10.1. SYNOPSIS 11326 (cfh) -> filehandle 11328 15.10.2.
ARGUMENT 11330 /* CURRENT_FH: */ 11331 void; 11333 15.10.3. RESULT 11335 struct GETFH4resok { 11336 nfs_fh4 object; 11337 }; 11339 union GETFH4res switch (nfsstat4 status) { 11340 case NFS4_OK: 11341 GETFH4resok resok4; 11342 default: 11343 void; 11344 }; 11346 15.10.4. DESCRIPTION 11348 This operation returns the current filehandle value. 11350 On success, the current filehandle retains its value. 11352 15.10.5. IMPLEMENTATION 11354 Operations that change the current filehandle like LOOKUP or CREATE 11355 do not automatically return the new filehandle as a result. For 11356 instance, if a client needs to lookup a directory entry and obtain 11357 its filehandle then the following request is needed. 11359 PUTFH (directory filehandle) 11360 LOOKUP (entry name) 11361 GETFH 11363 15.11. Operation 11: LINK - Create Link to a File 11365 15.11.1. SYNOPSIS 11367 (sfh), (cfh), newname -> (cfh), change_info 11369 15.11.2. ARGUMENT 11371 struct LINK4args { 11372 /* SAVED_FH: source object */ 11373 /* CURRENT_FH: target directory */ 11374 component4 newname; 11375 }; 11377 15.11.3. RESULT 11379 struct LINK4resok { 11380 change_info4 cinfo; 11381 }; 11383 union LINK4res switch (nfsstat4 status) { 11384 case NFS4_OK: 11385 LINK4resok resok4; 11386 default: 11387 void; 11388 }; 11390 15.11.4. DESCRIPTION 11392 The LINK operation creates an additional newname for the file 11393 represented by the saved filehandle, as set by the SAVEFH operation, 11394 in the directory represented by the current filehandle. The existing 11395 file and the target directory must reside within the same filesystem 11396 on the server. On success, the current filehandle will continue to 11397 be the target directory. If an object exists in the target directory 11398 with the same name as newname, the server must return NFS4ERR_EXIST. 11400 For the target directory, the server returns change_info4 information 11401 in cinfo. 
With the atomic field of the change_info4 struct, the 11402 server will indicate if the before and after change attributes were 11403 obtained atomically with respect to the link creation. 11405 If the newname has a length of 0 (zero), or if newname does not obey 11406 the UTF-8 definition, the error NFS4ERR_INVAL will be returned. 11408 15.11.5. IMPLEMENTATION 11410 Changes to any property of the "hard" linked files are reflected in 11411 all of the linked files. When a link is made to a file, the 11412 attributes for the file should have a value for numlinks that is one 11413 greater than the value before the LINK operation. 11415 The statement "file and the target directory must reside within the 11416 same filesystem on the server" means that the fsid fields in the 11417 attributes for the objects are the same. If they reside on different 11418 filesystems, the error, NFS4ERR_XDEV, is returned. On some servers, 11419 the filenames, "." and "..", are illegal as newname. 11421 In the case that newname is already linked to the file represented by 11422 the saved filehandle, the server will return NFS4ERR_EXIST. 11424 Note that symbolic links are created with the CREATE operation. 11426 15.12. Operation 12: LOCK - Create Lock 11428 15.12.1. SYNOPSIS 11430 (cfh) locktype, reclaim, offset, length, locker -> stateid 11432 15.12.2. 
ARGUMENT 11434 enum nfs_lock_type4 { 11435 READ_LT = 1, 11436 WRITE_LT = 2, 11437 READW_LT = 3, /* blocking read */ 11438 WRITEW_LT = 4 /* blocking write */ 11439 }; 11440 /* 11441 * For LOCK, transition from open_owner to new lock_owner 11442 */ 11443 struct open_to_lock_owner4 { 11444 seqid4 open_seqid; 11445 stateid4 open_stateid; 11446 seqid4 lock_seqid; 11447 lock_owner4 lock_owner; 11448 }; 11450 /* 11451 * For LOCK, existing lock_owner continues to request file locks 11452 */ 11453 struct exist_lock_owner4 { 11454 stateid4 lock_stateid; 11455 seqid4 lock_seqid; 11456 }; 11458 union locker4 switch (bool new_lock_owner) { 11459 case TRUE: 11460 open_to_lock_owner4 open_owner; 11461 case FALSE: 11462 exist_lock_owner4 lock_owner; 11463 }; 11465 /* 11466 * LOCK/LOCKT/LOCKU: Record lock management 11467 */ 11468 struct LOCK4args { 11469 /* CURRENT_FH: file */ 11470 nfs_lock_type4 locktype; 11471 bool reclaim; 11472 offset4 offset; 11473 length4 length; 11474 locker4 locker; 11475 }; 11477 15.12.3. RESULT 11479 struct LOCK4denied { 11480 offset4 offset; 11481 length4 length; 11482 nfs_lock_type4 locktype; 11483 lock_owner4 owner; 11484 }; 11486 struct LOCK4resok { 11487 stateid4 lock_stateid; 11488 }; 11490 union LOCK4res switch (nfsstat4 status) { 11491 case NFS4_OK: 11492 LOCK4resok resok4; 11493 case NFS4ERR_DENIED: 11494 LOCK4denied denied; 11495 default: 11496 void; 11497 }; 11499 15.12.4. DESCRIPTION 11501 The LOCK operation requests a byte-range lock for the byte range 11502 specified by the offset and length parameters. The lock type is also 11503 specified to be one of the nfs_lock_type4s. If this is a reclaim 11504 request, the reclaim parameter will be TRUE; 11506 Bytes in a file may be locked even if those bytes are not currently 11507 allocated to the file. To lock the file from a specific offset 11508 through the end-of-file (no matter how long the file actually is) use 11509 a length field with all bits set to 1 (one). 
If the length is zero, 11510 or if a length which is not all bits set to one is specified, and 11511 length when added to the offset exceeds the maximum 64-bit unsigned 11512 integer value, the error NFS4ERR_INVAL will result. 11514 Some servers may only support locking for byte offsets that fit 11515 within 32 bits. If the client specifies a range that includes a byte 11516 beyond the last byte offset of the 32-bit range, but does not include 11517 the last byte offset of the 32-bit range and all of the byte offsets beyond 11518 it, up to the end of the valid 64-bit range, such a 32-bit server 11519 MUST return the error NFS4ERR_BAD_RANGE. 11521 In the case that the lock is denied, the owner, offset, and length of 11522 a conflicting lock are returned. 11524 On success, the current filehandle retains its value. 11526 15.12.5. IMPLEMENTATION 11528 If the server is unable to determine the exact offset and length of 11529 the conflicting lock, the same offset and length that were provided 11530 in the arguments should be returned in the denied results. Section 9 11531 contains a full description of this and the other file locking 11532 operations. 11534 LOCK operations are subject to permission checks and to checks 11535 against the access type of the associated file. However, the 11536 specific rights and modes required for various types of locks reflect 11537 the semantics of the server-exported filesystem, and are not 11538 specified by the protocol. For example, Windows 2000 allows a write 11539 lock of a file open for READ, while a POSIX-compliant system does 11540 not.
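The NFS4ERR_INVAL rule for LOCK ranges given in the DESCRIPTION above can be sketched as follows. This is a minimal illustration, not a server implementation: the function name check_lock_range and the use of strings for status codes are assumptions made for the example, and the NFS4ERR_BAD_RANGE case for 32-bit servers is deliberately not modeled.

```python
# Illustrative sketch only. check_lock_range is a hypothetical helper;
# status codes are represented by their names rather than numeric values.

UINT64_MAX = 2**64 - 1  # a length with all bits set means "through end-of-file"


def check_lock_range(offset, length):
    """Return the status implied by the LOCK range rules for (offset, length)."""
    if length == 0:
        # A zero-length range is never valid.
        return "NFS4ERR_INVAL"
    if length != UINT64_MAX and offset + length > UINT64_MAX:
        # An explicit length must not carry the range past the
        # maximum 64-bit unsigned value.
        return "NFS4ERR_INVAL"
    return "NFS4_OK"
```

A length of all ones is accepted at any offset because it denotes "to end-of-file" rather than a byte count, which is why it is exempted from the overflow test.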
11542 When the client makes a lock request that corresponds to a range that 11543 the lockowner has locked already (with the same or different lock 11544 type), or to a sub-region of such a range, or to a region which 11545 includes multiple locks already granted to that lockowner, in whole 11546 or in part, and the server does not support such locking operations 11547 (i.e., does not support POSIX locking semantics), the server will 11548 return the error NFS4ERR_LOCK_RANGE. In that case, the client may 11549 return an error, or it may emulate the required operations, using 11550 only LOCK for ranges that do not include any bytes already locked by 11551 that lock_owner and LOCKU of locks held by that lock_owner 11552 (specifying an exactly-matching range and type). Similarly, when the 11553 client makes a lock request that amounts to upgrading (changing from 11554 a read lock to a write lock) or downgrading (changing from write lock 11555 to a read lock) an existing record lock, and the server does not 11556 support such a lock, the server will return NFS4ERR_LOCK_NOTSUPP. 11557 Such operations may not perfectly reflect the required semantics in 11558 the face of conflicting lock requests from other clients. 11560 When a client holds an OPEN_DELEGATE_WRITE delegation, the client 11561 holding that delegation is assured that there are no opens by other 11562 clients. Thus, there can be no conflicting LOCK operations from such 11563 clients. Therefore, the client may be handling locking requests 11564 locally, without doing LOCK operations on the server. If it does 11565 that, it must be prepared to update the lock status on the server, by 11566 sending appropriate LOCK and LOCKU operations before returning the 11567 delegation. 11569 When one or more clients hold OPEN_DELEGATE_READ delegations, any 11570 LOCK operation where the server is implementing mandatory locking 11571 semantics MUST result in the recall of all such delegations. 
The 11572 LOCK operation may not be granted until all such delegations are 11573 returned or revoked. Except where this happens very quickly, one or 11574 more NFS4ERR_DELAY errors will be returned to requests made while the 11575 delegation remains outstanding. 11577 The locker argument specifies the lock-owner that is associated with 11578 the LOCK request. The locker4 structure is a switched union that 11579 indicates whether the client has already created byte-range locking 11580 state associated with the current open file and lock-owner. In the 11581 case in which it has, the argument is just a stateid representing the 11582 set of locks associated with that open file and lock-owner, together 11583 with a lock_seqid value that MAY be any value and MUST be ignored by 11584 the server. In the case where no byte-range locking state has been 11585 established, or the client does not have the stateid available, the 11586 argument contains the stateid of the open file with which this lock 11587 is to be associated, together with the lock-owner with which the lock 11588 is to be associated. The open_to_lock_owner case covers the very 11589 first lock done by a lock-owner for a given open file and offers a 11590 method to use the established state of the open_stateid to transition 11591 to the use of a lock stateid. 11593 15.13. Operation 13: LOCKT - Test For Lock 11595 15.13.1. SYNOPSIS 11597 (cfh) locktype, offset, length, owner -> {void, NFS4ERR_DENIED -> 11598 owner} 11600 15.13.2. ARGUMENT 11602 struct LOCKT4args { 11603 /* CURRENT_FH: file */ 11604 nfs_lock_type4 locktype; 11605 offset4 offset; 11606 length4 length; 11607 lock_owner4 owner; 11608 }; 11610 15.13.3. RESULT 11612 union LOCKT4res switch (nfsstat4 status) { 11613 case NFS4ERR_DENIED: 11614 LOCK4denied denied; 11615 case NFS4_OK: 11616 void; 11617 default: 11618 void; 11619 }; 11621 15.13.4. DESCRIPTION 11623 The LOCKT operation tests the lock as specified in the arguments. 
If 11624 a conflicting lock exists, the owner, offset, length, and type of the 11625 conflicting lock are returned; if no lock is held, nothing other than 11626 NFS4_OK is returned. Lock types READ_LT and READW_LT are processed 11627 in the same way in that a conflicting lock test is done without 11628 regard to blocking or non-blocking. The same is true for WRITE_LT 11629 and WRITEW_LT. 11631 The ranges are specified as for LOCK. The NFS4ERR_INVAL and 11632 NFS4ERR_BAD_RANGE errors are returned under the same circumstances as 11633 for LOCK. 11635 On success, the current filehandle retains its value. 11637 15.13.5. IMPLEMENTATION 11639 If the server is unable to determine the exact offset and length of 11640 the conflicting lock, the same offset and length that were provided 11641 in the arguments should be returned in the denied results. Section 9 11642 contains further discussion of the file locking mechanisms. 11644 LOCKT uses a lock_owner4 rather than a stateid4, as is used in LOCK, to 11645 identify the owner. This is because the client does not have to open 11646 the file to test for the existence of a lock, so a stateid may not be 11647 available. 11649 The test for conflicting locks SHOULD exclude locks for the current 11650 lockowner. Note that since such locks are not examined, the possible 11651 existence of overlapping ranges may not affect the results of LOCKT. 11652 If the server does examine locks that match the lockowner for the 11653 purpose of range checking, NFS4ERR_LOCK_RANGE may be returned. In 11654 the event that it returns NFS4_OK, clients may do a LOCK and receive 11655 NFS4ERR_LOCK_RANGE on the LOCK request because of the flexibility 11656 provided to the server. 11658 When a client holds an OPEN_DELEGATE_WRITE delegation, it may choose 11659 (see Section 15.12.5) to handle LOCK requests locally. In such a 11660 case, LOCKT requests will similarly be handled locally. 11662 15.14. Operation 14: LOCKU - Unlock File 11664 15.14.1.
SYNOPSIS 11666 (cfh) type, seqid, stateid, offset, length -> stateid 11668 15.14.2. ARGUMENT 11670 struct LOCKU4args { 11671 /* CURRENT_FH: file */ 11672 nfs_lock_type4 locktype; 11673 seqid4 seqid; 11674 stateid4 lock_stateid; 11675 offset4 offset; 11676 length4 length; 11677 }; 11679 15.14.3. RESULT 11681 union LOCKU4res switch (nfsstat4 status) { 11682 case NFS4_OK: 11683 stateid4 lock_stateid; 11684 default: 11685 void; 11686 }; 11688 15.14.4. DESCRIPTION 11690 The LOCKU operation unlocks the byte-range lock specified by the 11691 parameters. The client may set the locktype field to any value that 11692 is legal for the nfs_lock_type4 enumerated type, and the server MUST 11693 accept any legal value for locktype. Any legal value for locktype 11694 has no effect on the success or failure of the LOCKU operation. 11696 The ranges are specified as for LOCK. The NFS4ERR_INVAL and 11697 NFS4ERR_BAD_RANGE errors are returned under the same circumstances as 11698 for LOCK. 11700 On success, the current filehandle retains its value. 11702 15.14.5. IMPLEMENTATION 11704 If the area to be unlocked does not correspond exactly to a lock 11705 actually held by the lockowner, the server may return the error 11706 NFS4ERR_LOCK_RANGE. This includes the case in which the area is not 11707 locked, where the area is a sub-range of the area locked, where it 11708 overlaps the area locked without matching exactly, or where the area 11709 specified includes multiple locks held by the lockowner. In all of 11710 these cases, allowed by POSIX locking [35] semantics, a client 11711 receiving this error should, if it desires support for such 11712 operations, simulate the operation using LOCKU on ranges 11713 corresponding to locks it actually holds, possibly followed by LOCK 11714 requests for the sub-ranges not being unlocked. 11716 When a client holds an OPEN_DELEGATE_WRITE delegation, it may choose 11717 (see Section 15.12.5) to handle LOCK requests locally.
In such a 11718 case, LOCKU requests will similarly be handled locally. 11720 15.15. Operation 15: LOOKUP - Lookup Filename 11722 15.15.1. SYNOPSIS 11724 (cfh), component -> (cfh) 11726 15.15.2. ARGUMENT 11728 struct LOOKUP4args { 11729 /* CURRENT_FH: directory */ 11730 component4 objname; 11731 }; 11733 15.15.3. RESULT 11735 struct LOOKUP4res { 11736 /* CURRENT_FH: object */ 11737 nfsstat4 status; 11738 }; 11740 15.15.4. DESCRIPTION 11742 This operation LOOKUPs or finds a filesystem object using the 11743 directory specified by the current filehandle. LOOKUP evaluates the 11744 component and if the object exists the current filehandle is replaced 11745 with the component's filehandle. 11747 If the component cannot be evaluated either because it does not exist 11748 or because the client does not have permission to evaluate the 11749 component, then an error will be returned and the current filehandle 11750 will be unchanged. 11752 If the component is of zero length, NFS4ERR_INVAL will be returned. 11753 The component is also subject to the normal UTF-8, character support, 11754 and name checks. See Section 12.3 for further discussion. 11756 15.15.5. IMPLEMENTATION 11758 If the client wants to achieve the effect of a multi-component 11759 lookup, it may construct a COMPOUND request such as (and obtain each 11760 filehandle): 11762 PUTFH (directory filehandle) 11763 LOOKUP "pub" 11764 GETFH 11765 LOOKUP "foo" 11766 GETFH 11767 LOOKUP "bar" 11768 GETFH 11770 NFSv4 servers depart from the semantics of previous NFS versions in 11771 allowing LOOKUP requests to cross mountpoints on the server. The 11772 client can detect a mountpoint crossing by comparing the fsid 11773 attribute of the directory with the fsid attribute of the directory 11774 looked up. If the fsids are different then the new directory is a 11775 server mountpoint. UNIX clients that detect a mountpoint crossing 11776 will need to mount the server's filesystem. 
This needs to be done to 11777 maintain the file object identity checking mechanisms common to UNIX 11778 clients. 11780 Servers that limit NFS access to "shares" or "exported" filesystems 11781 should provide a pseudo-filesystem into which the exported 11782 filesystems can be integrated, so that clients can browse the 11783 server's name space. The clients' view of a pseudo filesystem will 11784 be limited to paths that lead to exported filesystems. 11786 Note: previous versions of the protocol assigned special semantics to 11787 the names "." and "..". NFSv4 assigns no special semantics to these 11788 names. The LOOKUPP operator must be used to lookup a parent 11789 directory. 11791 Note that this operation does not follow symbolic links. The client 11792 is responsible for all parsing of filenames including filenames that 11793 are modified by symbolic links encountered during the lookup process. 11795 If the current filehandle supplied is not a directory but a symbolic 11796 link, the error NFS4ERR_SYMLINK is returned as the error. For all 11797 other non-directory file types, the error NFS4ERR_NOTDIR is returned. 11799 15.16. Operation 16: LOOKUPP - Lookup Parent Directory 11801 15.16.1. SYNOPSIS 11803 (cfh) -> (cfh) 11805 15.16.2. ARGUMENT 11807 /* CURRENT_FH: object */ 11808 void; 11810 15.16.3. RESULT 11812 struct LOOKUPP4res { 11813 /* CURRENT_FH: directory */ 11814 nfsstat4 status; 11815 }; 11817 15.16.4. DESCRIPTION 11819 The current filehandle is assumed to refer to a regular directory or 11820 a named attribute directory. LOOKUPP assigns the filehandle for its 11821 parent directory to be the current filehandle. If there is no parent 11822 directory an NFS4ERR_NOENT error must be returned. Therefore, 11823 NFS4ERR_NOENT will be returned by the server when the current 11824 filehandle is at the root or top of the server's file tree. 11826 15.16.5. IMPLEMENTATION 11828 As for LOOKUP, LOOKUPP will also cross mountpoints. 
11830 If the current filehandle is not a directory or named attribute 11831 directory, the error NFS4ERR_NOTDIR is returned. 11833 15.17. Operation 17: NVERIFY - Verify Difference in Attributes 11835 15.17.1. SYNOPSIS 11837 (cfh), fattr -> - 11839 15.17.2. ARGUMENT 11841 struct NVERIFY4args { 11842 /* CURRENT_FH: object */ 11843 fattr4 obj_attributes; 11844 }; 11846 15.17.3. RESULT 11848 struct NVERIFY4res { 11849 nfsstat4 status; 11850 }; 11852 15.17.4. DESCRIPTION 11854 This operation is used to prefix a sequence of operations to be 11855 performed if one or more attributes have changed on some filesystem 11856 object. If all the attributes match then the error NFS4ERR_SAME must 11857 be returned. 11859 On success, the current filehandle retains its value. 11861 15.17.5. IMPLEMENTATION 11863 This operation is useful as a cache validation operator. If the 11864 object to which the attributes belong has changed then the following 11865 operations may obtain new data associated with that object. For 11866 instance, to check if a file has been changed and obtain new data if 11867 it has: 11869 PUTFH (public) 11870 LOOKUP "foobar" 11871 NVERIFY attrbits attrs 11872 READ 0 32767 11874 In the case that a recommended attribute is specified in the NVERIFY 11875 operation and the server does not support that attribute for the 11876 filesystem object, the error NFS4ERR_ATTRNOTSUPP is returned to the 11877 client. 11879 When the attribute rdattr_error or any write-only attribute (e.g., 11880 time_modify_set) is specified, the error NFS4ERR_INVAL is returned to 11881 the client. 11883 15.18. Operation 18: OPEN - Open a Regular File 11885 15.18.1. SYNOPSIS 11887 (cfh), seqid, share_access, share_deny, owner, openhow, claim -> 11888 (cfh), stateid, cinfo, rflags, attrset, delegation 11890 15.18.2. 
ARGUMENT 11892 /* 11893 * Various definitions for OPEN 11894 */ 11895 enum createmode4 { 11896 UNCHECKED4 = 0, 11897 GUARDED4 = 1, 11898 EXCLUSIVE4 = 2 11899 }; 11901 union createhow4 switch (createmode4 mode) { 11902 case UNCHECKED4: 11903 case GUARDED4: 11904 fattr4 createattrs; 11905 case EXCLUSIVE4: 11906 verifier4 createverf; 11907 }; 11909 enum opentype4 { 11910 OPEN4_NOCREATE = 0, 11911 OPEN4_CREATE = 1 11912 }; 11914 union openflag4 switch (opentype4 opentype) { 11915 case OPEN4_CREATE: 11916 createhow4 how; 11917 default: 11918 void; 11919 }; 11921 /* Next definitions used for OPEN delegation */ 11922 enum limit_by4 { 11923 NFS_LIMIT_SIZE = 1, 11924 NFS_LIMIT_BLOCKS = 2 11925 /* others as needed */ 11926 }; 11928 struct nfs_modified_limit4 { 11929 uint32_t num_blocks; 11930 uint32_t bytes_per_block; 11932 }; 11934 union nfs_space_limit4 switch (limit_by4 limitby) { 11935 /* limit specified as file size */ 11936 case NFS_LIMIT_SIZE: 11937 uint64_t filesize; 11938 /* limit specified by number of blocks */ 11939 case NFS_LIMIT_BLOCKS: 11940 nfs_modified_limit4 mod_blocks; 11941 } ; 11943 enum open_delegation_type4 { 11944 OPEN_DELEGATE_NONE = 0, 11945 OPEN_DELEGATE_READ = 1, 11946 OPEN_DELEGATE_WRITE = 2 11947 }; 11949 enum open_claim_type4 { 11950 CLAIM_NULL = 0, 11951 CLAIM_PREVIOUS = 1, 11952 CLAIM_DELEGATE_CUR = 2, 11953 CLAIM_DELEGATE_PREV = 3 11954 }; 11956 struct open_claim_delegate_cur4 { 11957 stateid4 delegate_stateid; 11958 component4 file; 11959 }; 11961 union open_claim4 switch (open_claim_type4 claim) { 11962 /* 11963 * No special rights to file. 11964 * Ordinary OPEN of the specified file. 11965 */ 11966 case CLAIM_NULL: 11967 /* CURRENT_FH: directory */ 11968 component4 file; 11969 /* 11970 * Right to the file established by an 11971 * open previous to server reboot. File 11972 * identified by filehandle obtained at 11973 * that time rather than by name. 
11974 */ 11975 case CLAIM_PREVIOUS: 11976 /* CURRENT_FH: file being reclaimed */ 11977 open_delegation_type4 delegate_type; 11979 /* 11980 * Right to file based on a delegation 11981 * granted by the server. File is 11982 * specified by name. 11983 */ 11984 case CLAIM_DELEGATE_CUR: 11985 /* CURRENT_FH: directory */ 11986 open_claim_delegate_cur4 delegate_cur_info; 11988 /* 11989 * Right to file based on a delegation 11990 * granted to a previous boot instance 11991 * of the client. File is specified by name. 11992 */ 11993 case CLAIM_DELEGATE_PREV: 11994 /* CURRENT_FH: directory */ 11995 component4 file_delegate_prev; 11996 }; 11998 /* 11999 * OPEN: Open a file, potentially receiving an open delegation 12000 */ 12001 struct OPEN4args { 12002 seqid4 seqid; 12003 uint32_t share_access; 12004 uint32_t share_deny; 12005 open_owner4 owner; 12006 openflag4 openhow; 12007 open_claim4 claim; 12008 }; 12010 15.18.3. RESULT 12012 struct open_read_delegation4 { 12013 stateid4 stateid; /* Stateid for delegation*/ 12014 bool recall; /* Pre-recalled flag for 12015 delegations obtained 12016 by reclaim (CLAIM_PREVIOUS) */ 12018 nfsace4 permissions; /* Defines users who don't 12019 need an ACCESS call to 12020 open for read */ 12021 }; 12023 struct open_write_delegation4 { 12024 stateid4 stateid; /* Stateid for delegation */ 12025 bool recall; /* Pre-recalled flag for 12026 delegations obtained 12027 by reclaim 12028 (CLAIM_PREVIOUS) */ 12030 nfs_space_limit4 12031 space_limit; /* Defines condition that 12032 the client must check to 12033 determine whether the 12034 file needs to be flushed 12035 to the server on close. */ 12037 nfsace4 permissions; /* Defines users who don't 12038 need an ACCESS call as 12039 part of a delegated 12040 open. 
*/ 12041 }; 12043 union open_delegation4 12044 switch (open_delegation_type4 delegation_type) { 12045 case OPEN_DELEGATE_NONE: 12046 void; 12047 case OPEN_DELEGATE_READ: 12048 open_read_delegation4 read; 12049 case OPEN_DELEGATE_WRITE: 12050 open_write_delegation4 write; 12051 }; 12053 /* 12054 * Result flags 12055 */ 12057 /* Client must confirm open */ 12058 const OPEN4_RESULT_CONFIRM = 0x00000002; 12059 /* Type of file locking behavior at the server */ 12060 const OPEN4_RESULT_LOCKTYPE_POSIX = 0x00000004; 12062 struct OPEN4resok { 12063 stateid4 stateid; /* Stateid for open */ 12064 change_info4 cinfo; /* Directory Change Info */ 12065 uint32_t rflags; /* Result flags */ 12066 bitmap4 attrset; /* attribute set for create*/ 12067 open_delegation4 delegation; /* Info on any open 12068 delegation */ 12069 }; 12071 union OPEN4res switch (nfsstat4 status) { 12072 case NFS4_OK: 12073 /* CURRENT_FH: opened file */ 12074 OPEN4resok resok4; 12076 default: 12077 void; 12078 }; 12080 15.18.4. WARNING TO CLIENT IMPLEMENTORS 12082 OPEN resembles LOOKUP in that it generates a filehandle for the 12083 client to use. Unlike LOOKUP though, OPEN creates server state on 12084 the filehandle. In normal circumstances, the client can only release 12085 this state with a CLOSE operation. CLOSE uses the current filehandle 12086 to determine which file to close. Therefore the client MUST follow 12087 every OPEN operation with a GETFH operation in the same COMPOUND 12088 procedure. This will supply the client with the filehandle such that 12089 CLOSE can be used appropriately. 12091 Simply waiting for the lease on the file to expire is insufficient 12092 because the server may maintain the state indefinitely as long as 12093 another client does not attempt to make a conflicting access to the 12094 same file. 12096 15.18.5. DESCRIPTION 12098 The OPEN operation creates and/or opens a regular file in a directory 12099 with the provided name. 
If the file does not exist at the server and 12100 creation is desired, specification of the method of creation is 12101 provided by the openhow parameter. The client has the choice of 12102 three creation methods: UNCHECKED4, GUARDED4, or EXCLUSIVE4. 12104 If the current filehandle is a named attribute directory, OPEN will 12105 then create or open a named attribute file. Note that exclusive 12106 create of a named attribute is not supported. If the createmode is 12107 EXCLUSIVE4 and the current filehandle is a named attribute directory, 12108 the server will return NFS4ERR_INVAL. 12110 UNCHECKED4 means that the file should be created if a file of that 12111 name does not exist and that encountering an existing regular file of that 12112 name is not an error. For this type of create, createattrs specifies 12113 the initial set of attributes for the file. The set of attributes 12114 may include any writable attribute valid for regular files. When an 12115 UNCHECKED4 create encounters an existing file, the attributes 12116 specified by createattrs are not used, except that when a size of 12117 zero is specified, the existing file is truncated. If GUARDED4 is 12118 specified, the server checks for the presence of a duplicate object 12119 by name before performing the create. If a duplicate exists, an 12120 error of NFS4ERR_EXIST is returned as the status. If the object does 12121 not exist, the request is performed as described for UNCHECKED4. For 12122 each of these cases (UNCHECKED4 and GUARDED4) where the operation is 12123 successful, the server will return to the client an attribute mask 12124 signifying which attributes were successfully set for the object. 12126 EXCLUSIVE4 specifies that the server is to follow exclusive creation 12127 semantics, using the verifier to ensure exclusive creation of the 12128 target. The server should check for the presence of a duplicate 12129 object by name.
If the object does not exist, the server creates the 12130 object and stores the verifier with the object. If the object does 12131 exist and the stored verifier matches the client-provided verifier, 12132 the server uses the existing object as the newly created object. If 12133 the stored verifier does not match, then an error of NFS4ERR_EXIST is 12134 returned. No attributes may be provided in this case, since the 12135 server may use an attribute of the target object to store the 12136 verifier. If the server uses an attribute to store the exclusive 12137 create verifier, it will signify which attribute by setting the 12138 appropriate bit in the attribute mask that is returned in the 12139 results. 12141 For the target directory, the server returns change_info4 information 12142 in cinfo. With the atomic field of the change_info4 struct, the 12143 server will indicate if the before and after change attributes were 12144 obtained atomically with respect to the link creation. 12146 Upon successful creation, the current filehandle is replaced by that 12147 of the new object. 12149 The OPEN operation provides for Windows share reservation capability 12150 with the use of the share_access and share_deny fields of the OPEN 12151 arguments. The client specifies at OPEN the required share_access 12152 and share_deny modes. For clients that do not directly support 12153 SHAREs (e.g., UNIX), the expected deny value is DENY_NONE. In the 12154 case that there is an existing SHARE reservation that conflicts with 12155 the OPEN request, the server returns the error NFS4ERR_SHARE_DENIED. 12156 For a complete SHARE request, the client must provide values for the 12157 owner and seqid fields for the OPEN argument. For additional 12158 discussion of SHARE semantics, see Section 9.9.
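The share reservation check described above might be sketched as follows. This is an illustration under stated assumptions: the bit values mirror the protocol's OPEN4_SHARE_ACCESS_* and OPEN4_SHARE_DENY_* constants, the conflict rule (a new access mode hitting an existing deny mode, or a new deny mode hitting an existing access mode) is taken from the share reservation discussion referenced in Section 9.9, and the function name open_share_status and the list-of-tuples representation are hypothetical.

```python
# Illustrative sketch only. open_share_status and the (access, deny)
# tuple representation are hypothetical; status codes are given by name.

ACCESS_READ, ACCESS_WRITE, ACCESS_BOTH = 1, 2, 3
DENY_NONE, DENY_READ, DENY_WRITE, DENY_BOTH = 0, 1, 2, 3


def open_share_status(existing, share_access, share_deny):
    """Check a new OPEN against reservations held by other owners.

    existing is a list of (share_access, share_deny) pairs already granted.
    """
    for held_access, held_deny in existing:
        # The request conflicts if it wants access that is denied, or
        # wants to deny access that is already held.
        if (share_access & held_deny) or (share_deny & held_access):
            return "NFS4ERR_SHARE_DENIED"
    return "NFS4_OK"
```

A UNIX-style client that always passes DENY_NONE can therefore only be refused because of a deny mode set by some other opener, never because of its own request.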
12160 In the case that the client is recovering state from a server 12161 failure, the claim field of the OPEN argument is used to signify that 12162 the request is meant to reclaim state previously held. 12164 The "claim" field of the OPEN argument is used to specify the file to 12165 be opened and the state information which the client claims to 12166 possess. There are four basic claim types which cover the various 12167 situations for an OPEN. They are as follows: 12169 CLAIM_NULL: For the client, this is a new OPEN request and there is 12170 no previous state associated with the file for the client. 12172 CLAIM_PREVIOUS: The client is claiming basic OPEN state for a file 12173 that was held prior to a server reboot. Generally used when a 12174 server is returning persistent filehandles; the client may not 12175 have the file name to reclaim the OPEN. 12177 CLAIM_DELEGATE_CUR: The client is claiming a delegation for OPEN as 12178 granted by the server. Generally this is done as part of 12179 recalling a delegation. 12181 CLAIM_DELEGATE_PREV: The client is claiming a delegation granted to 12182 a previous client instance; used after the client reboots. The 12183 server MAY support CLAIM_DELEGATE_PREV. If it does support 12184 CLAIM_DELEGATE_PREV, SETCLIENTID_CONFIRM MUST NOT remove the 12185 client's delegation state, and the server MUST support the 12186 DELEGPURGE operation. 12188 For OPEN requests whose claim type is other than CLAIM_PREVIOUS 12189 (i.e., requests other than those devoted to reclaiming opens after a 12190 server reboot) that reach the server during its grace or lease 12191 expiration period, the server returns an error of NFS4ERR_GRACE. 12193 For any OPEN request, the server may return an open delegation, which 12194 allows further opens and closes to be handled locally on the client 12195 as described in Section 10.4. Note that delegation is up to the 12196 server to decide.
   The client should never assume that delegation will or will not be
   granted in a particular instance. It should always be prepared for
   either case. A partial exception is the reclaim (CLAIM_PREVIOUS)
   case, in which a delegation type is claimed. In this case,
   delegation will always be granted, although the server may specify
   an immediate recall in the delegation structure.

   The rflags returned by a successful OPEN allow the server to return
   information governing how the open file is to be handled.

   OPEN4_RESULT_CONFIRM indicates that the client MUST execute an
   OPEN_CONFIRM operation before using the open file.
   OPEN4_RESULT_LOCKTYPE_POSIX indicates the server's file locking
   behavior supports the complete set of POSIX locking techniques [35].
   From this the client can choose to manage file locking state in a way
   to handle a mismatch of file locking management.

   If the component is of zero length, NFS4ERR_INVAL will be returned.
   The component is also subject to the normal UTF-8, character support,
   and name checks. See Section 12.3 for further discussion.

   When an OPEN is done and the specified open_owner already has the
   resulting filehandle open, the result is to "OR" together the new
   share and deny status with the existing status. In this case, only
   a single CLOSE need be done, even though multiple OPENs were
   completed. When such an OPEN is done, checking of share reservations
   for the new OPEN proceeds normally, with no exception for the
   existing OPEN held by the same owner. In this case, the stateid
   returned has an "other" field that matches that of the previous open
   while the "seqid" field is incremented to reflect the changed status
   due to the new open.
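
   The handling of a repeated OPEN by the same open_owner can be
   sketched as follows. This is a minimal sketch under stated
   assumptions: the OpenState record and the upgrade_open function are
   hypothetical names, the stateid is modeled as a (seqid, other) pair,
   and both the share reservation check against other owners and seqid
   wraparound are left out.

```python
from dataclasses import dataclass


@dataclass
class OpenState:
    """Hypothetical per-(open_owner, file) record kept by a server."""
    other: bytes       # opaque part of the stateid, fixed for this open
    seqid: int         # stateid seqid, incremented on each state change
    share_access: int
    share_deny: int


def upgrade_open(state, share_access, share_deny):
    """Fold a further OPEN by the same open_owner into existing state."""
    # The new share and deny bits are ORed into the existing ones.
    state.share_access |= share_access
    state.share_deny |= share_deny
    # Same "other" field, incremented "seqid" in the returned stateid.
    state.seqid += 1
    return (state.seqid, state.other)
```

   Because the bits accumulate in a single record, one CLOSE releases
   the whole open in this model, which matches the single-CLOSE
   behavior described above.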

   If the underlying filesystem at the server is only accessible in a
   read-only mode and the OPEN request has specified ACCESS_WRITE or
   ACCESS_BOTH, the server will return NFS4ERR_ROFS to indicate a
   read-only filesystem.

   As with the CREATE operation, the server MUST derive the owner, owner
   ACE, group, or group ACE if any of the four attributes are required
   and supported by the server's filesystem. For an OPEN with the
   EXCLUSIVE4 createmode, the server has no choice, since such OPEN
   calls do not include the createattrs field. Conversely, if
   createattrs is specified, and includes owner or group (or
   corresponding ACEs) that the principal in the RPC call's credentials
   does not have authorization to create files for, then the server may
   return NFS4ERR_PERM.

   In the case of an OPEN which specifies a size of zero (e.g.,
   truncation) and the file has named attributes, the named attributes
   are left as is. They are not removed.

15.18.6. IMPLEMENTATION

   The OPEN operation contains support for EXCLUSIVE4 create. The
   mechanism is similar to the support in NFSv3 [14]. As in NFSv3, this
   mechanism provides reliable exclusive creation. Exclusive create is
   invoked when the how parameter is EXCLUSIVE4. In this case, the
   client provides a verifier that can reasonably be expected to be
   unique. A combination of a client identifier, perhaps the client
   network address, and a unique number generated by the client, perhaps
   the RPC transaction identifier, may be appropriate.

   If the object does not exist, the server creates the object and
   stores the verifier in stable storage. For filesystems that do not
   provide a mechanism for the storage of arbitrary file attributes, the
   server may use one or more elements of the object meta-data to store
   the verifier.
   The verifier must be stored in stable storage to prevent erroneous
   failure on retransmission of the request. It is assumed that an
   exclusive create is being performed because exclusive semantics are
   critical to the application. Because of the expected usage,
   exclusive CREATE does not rely solely on the normally volatile
   duplicate request cache for storage of the verifier. The duplicate
   request cache in volatile storage does not survive a crash and may
   actually flush on a long network partition, opening failure windows.
   In the UNIX local filesystem environment, the expected storage
   location for the verifier on creation is the meta-data (time stamps)
   of the object. For this reason, an exclusive object create may not
   include initial attributes because the server would have nowhere to
   store the verifier.

   If the server cannot support these exclusive create semantics,
   possibly because of the requirement to commit the verifier to stable
   storage, it should fail the OPEN request with the error
   NFS4ERR_NOTSUPP.

   During an exclusive CREATE request, if the object already exists, the
   server reconstructs the object's verifier and compares it with the
   verifier in the request. If they match, the server treats the
   request as a success. The request is presumed to be a duplicate of
   an earlier, successful request for which the reply was lost and that
   the server duplicate request cache mechanism did not detect. If the
   verifiers do not match, the request is rejected with the status
   NFS4ERR_EXIST.

   Once the client has performed a successful exclusive create, it must
   issue a SETATTR to set the correct object attributes. Until it does
   so, it should not rely upon any of the object attributes, since the
   server implementation may need to overload object meta-data to store
   the verifier.
   The subsequent SETATTR must not occur in the same COMPOUND request
   as the OPEN. This separation will guarantee that the exclusive
   create mechanism will continue to function properly in the face of
   retransmission of the request.

   Use of the GUARDED4 attribute does not provide exactly-once
   semantics. In particular, if a reply is lost and the server does not
   detect the retransmission of the request, the operation can fail with
   NFS4ERR_EXIST, even though the create was performed successfully.
   The client would use this behavior in the case that the application
   has not requested an exclusive create but has asked to have the file
   truncated when the file is opened. In the case of the client timing
   out and retransmitting the create request, the client can use
   GUARDED4 to prevent a sequence like create, write, create
   (retransmitted) from occurring.

   For SHARE reservations, the client must specify a value for
   share_access that is one of READ, WRITE, or BOTH. For share_deny,
   the client must specify one of NONE, READ, WRITE, or BOTH. If the
   client fails to do this, the server must return NFS4ERR_INVAL.

   Based on the share_access value (READ, WRITE, or BOTH), the client
   should check that the requester has the proper access rights to
   perform the specified operation. This would generally be the result
   of applying the ACL access rules to the file for the current
   requester. However, just as with the ACCESS operation, the client
   should not attempt to second-guess the server's decisions, as access
   rights may change and may be subject to server administrative
   controls outside the ACL framework. If the requester is not
   authorized to READ or WRITE (depending on the share_access value),
   the server must return NFS4ERR_ACCESS.
   Note that since the NFS version 4 protocol does not impose any
   requirement that READs and WRITEs issued for an open file have the
   same credentials as the OPEN itself, the server still must do
   appropriate access checking on the READs and WRITEs themselves.

   If the component provided to OPEN is a symbolic link, the error
   NFS4ERR_SYMLINK will be returned to the client. If the current
   filehandle is not a directory, the error NFS4ERR_NOTDIR will be
   returned.

   If a COMPOUND contains an OPEN which establishes an
   OPEN_DELEGATE_WRITE delegation, then a subsequent GETATTR inside that
   COMPOUND SHOULD NOT result in a CB_GETATTR to the client. The server
   SHOULD understand the GETATTR to be for the same client ID and avoid
   querying the client, which will not be able to respond. This
   sequence of OPEN, GETATTR SHOULD be understood as an atomic retrieval
   of the initial size and change attribute. Further, the client SHOULD
   NOT construct a COMPOUND which mixes operations for different client
   IDs.

15.18.7. Warning to Client Implementors

   OPEN resembles LOOKUP in that it generates a filehandle for the
   client to use. Unlike LOOKUP though, OPEN creates server state on
   the filehandle. In normal circumstances, the client can only release
   this state with a CLOSE operation. CLOSE uses the current filehandle
   to determine which file to close. Therefore, the client MUST follow
   every OPEN operation with a GETFH operation in the same COMPOUND
   procedure. This will supply the client with the filehandle such that
   CLOSE can be used appropriately.

   Simply waiting for the lease on the file to expire is insufficient
   because the server may maintain the state indefinitely as long as
   another client does not attempt to make a conflicting access to the
   same file.

15.19. Operation 19: OPENATTR - Open Named Attribute Directory

15.19.1. SYNOPSIS

   (cfh) createdir -> (cfh)

15.19.2. ARGUMENT

   struct OPENATTR4args {
           /* CURRENT_FH: object */
           bool            createdir;
   };

15.19.3. RESULT

   struct OPENATTR4res {
           /* CURRENT_FH: named attr directory */
           nfsstat4        status;
   };

15.19.4. DESCRIPTION

   The OPENATTR operation is used to obtain the filehandle of the named
   attribute directory associated with the current filehandle. The
   result of the OPENATTR will be a filehandle to an object of type
   NF4ATTRDIR. From this filehandle, READDIR and LOOKUP operations can
   be used to obtain filehandles for the various named attributes
   associated with the original filesystem object. Filehandles returned
   within the named attribute directory will have a type of
   NF4NAMEDATTR.

   The createdir argument allows the client to signify if a named
   attribute directory should be created as a result of the OPENATTR
   operation. Some clients may use the OPENATTR operation with a value
   of FALSE for createdir to determine if any named attributes exist for
   the object. If none exist, then NFS4ERR_NOENT will be returned. If
   createdir has a value of TRUE and no named attribute directory
   exists, one is created. The creation of a named attribute directory
   assumes that the server has implemented named attribute support in
   this fashion and is not required to do so by this definition.

15.19.5. IMPLEMENTATION

   If the server does not support named attributes for the current
   filehandle, an error of NFS4ERR_NOTSUPP will be returned to the
   client.

15.20. Operation 20: OPEN_CONFIRM - Confirm Open

15.20.1. SYNOPSIS

   (cfh), seqid, stateid -> stateid

15.20.2. ARGUMENT

   struct OPEN_CONFIRM4args {
           /* CURRENT_FH: opened file */
           stateid4        open_stateid;
           seqid4          seqid;
   };

15.20.3. RESULT

   struct OPEN_CONFIRM4resok {
           stateid4        open_stateid;
   };

   union OPEN_CONFIRM4res switch (nfsstat4 status) {
    case NFS4_OK:
           OPEN_CONFIRM4resok      resok4;
    default:
           void;
   };

15.20.4. DESCRIPTION

   This operation is used to confirm the sequence id usage for the first
   time that an open_owner is used by a client. The stateid returned
   from the OPEN operation is used as the argument for this operation
   along with the next sequence id for the open_owner. The sequence id
   passed to the OPEN_CONFIRM must be 1 (one) greater than the seqid
   passed to the OPEN operation. If the server receives an unexpected
   sequence id with respect to the original open, then the server
   assumes that the client will not confirm the original OPEN and all
   state associated with the original OPEN is released by the server.

   On success, the current filehandle retains its value.

15.20.5. IMPLEMENTATION

   A given client might generate many open_owner4 data structures for a
   given client ID. The client will periodically either dispose of its
   open_owner4s or stop using them for indefinite periods of time. The
   latter situation is why the NFSv4 protocol does not have an explicit
   operation to exit an open_owner4: such an operation is of no use in
   that situation. Instead, to avoid unbounded memory use, the server
   needs to implement a strategy for disposing of open_owner4s that have
   no current open state for any files and have not been used recently.
   The time period used to determine when to dispose of open_owner4s is
   an implementation choice. The time period should certainly be no
   less than the lease time plus any grace period the server wishes to
   implement beyond a lease time.
   The OPEN_CONFIRM operation allows the server to safely dispose of
   unused open_owner4 data structures.

   In the case that a client issues an OPEN operation and the server no
   longer has a record of the open_owner4, the server needs to ensure
   that this is a new OPEN and not a replay or retransmission.

   Servers must not require confirmation on OPENs that grant delegations
   or are doing reclaim operations. See Section 9.1.9 for details. The
   server can easily avoid this by noting whether it has disposed of one
   open_owner4 for the given client ID. If the server does not support
   delegation, it might simply maintain a single bit that notes whether
   any open_owner4 (for any client) has been disposed of.

   The server must hold unconfirmed OPEN state until one of three events
   occurs. First, the client sends an OPEN_CONFIRM request with the
   appropriate sequence id and stateid within the lease period. In this
   case, the OPEN state on the server goes to confirmed, and the
   open_owner4 on the server is fully established.

   Second, the client sends another OPEN request with a sequence id that
   is incorrect for the open_owner4 (out of sequence). In this case,
   the server assumes the second OPEN request is valid and the first one
   is a replay. The server cancels the OPEN state of the first OPEN
   request, establishes an unconfirmed OPEN state for the second OPEN
   request, and responds to the second OPEN request with an indication
   that an OPEN_CONFIRM is needed. The process then repeats itself.
   While there is a potential for a denial of service attack on the
   client, it is mitigated if the client and server require the use of a
   security flavor based on Kerberos V5, LIPKEY, or some other flavor
   that uses cryptography.
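
   The lifetime of unconfirmed OPEN state can be sketched as a small
   state holder. This is a minimal sketch under stated assumptions: the
   class UnconfirmedOwner and its methods are hypothetical, stateids
   are modeled as plain strings, seqid wraparound is ignored, and every
   OPEN that arrives before confirmation is simply treated as
   superseding the previous one. Returning NFS4ERR_BAD_SEQID for an
   unexpected sequence id is also an assumption; the text here only
   states that the state is released.

```python
class UnconfirmedOwner:
    """Hypothetical server-side record for one unconfirmed open_owner4."""

    def __init__(self):
        self.confirmed = False
        self.pending = None  # (seqid of the OPEN, stateid it returned)

    def on_open(self, seqid, stateid):
        # A further OPEN cancels the earlier unconfirmed state, which is
        # presumed to be a replay, and again asks for confirmation.
        self.pending = (seqid, stateid)
        return "OPEN4_RESULT_CONFIRM"

    def on_open_confirm(self, seqid, stateid):
        if self.pending is None or stateid != self.pending[1]:
            return "NFS4ERR_BAD_STATEID"
        if seqid != self.pending[0] + 1:
            # Unexpected sequence id: release the unconfirmed state.
            self.pending = None
            return "NFS4ERR_BAD_SEQID"  # assumed error code
        self.confirmed = True
        return "NFS4_OK"

    def on_lease_expired(self):
        # No confirmation within the lease period: cancel the state.
        if not self.confirmed:
            self.pending = None
```

   In this model the only path to confirmed state is an OPEN_CONFIRM
   whose seqid is exactly one greater than that of the most recent
   OPEN and whose stateid is the one that OPEN returned.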

   What if the server is in the unconfirmed OPEN state for a given
   open_owner4, and it receives an operation on the open_owner4 that has
   a stateid but the operation is not OPEN, or it is OPEN_CONFIRM but
   with the wrong stateid? Then, even if the seqid is correct, the
   server returns NFS4ERR_BAD_STATEID, because the server assumes the
   operation is a replay: if the server has no established OPEN state,
   then there is no way, for example, a LOCK operation could be valid.

   Third, neither of the two aforementioned events occurs for the
   open_owner4 within the lease period. In this case, the OPEN state is
   canceled and disposal of the open_owner4 can occur.

15.21. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access

15.21.1. SYNOPSIS

   (cfh), stateid, seqid, access, deny -> stateid

15.21.2. ARGUMENT

   struct OPEN_DOWNGRADE4args {
           /* CURRENT_FH: opened file */
           stateid4        open_stateid;
           seqid4          seqid;
           uint32_t        share_access;
           uint32_t        share_deny;
   };

15.21.3. RESULT

   struct OPEN_DOWNGRADE4resok {
           stateid4        open_stateid;
   };

   union OPEN_DOWNGRADE4res switch(nfsstat4 status) {
    case NFS4_OK:
           OPEN_DOWNGRADE4resok    resok4;
    default:
           void;
   };

15.21.4. DESCRIPTION

   This operation is used to adjust the share_access and share_deny bits
   for a given open. This is necessary when a given openowner opens the
   same file multiple times with different share_access and share_deny
   flags. In this situation, a close of one of the opens may change the
   appropriate share_access and share_deny flags to remove bits
   associated with opens no longer in effect.

   The share_access and share_deny bits specified in this operation
   replace the current ones for the specified open file.
   The share_access and share_deny bits specified must be exactly equal
   to the union of the share_access and share_deny bits specified for
   some subset of the OPENs in effect for the current openowner on the
   current file. If that constraint is not respected, the error
   NFS4ERR_INVAL should be returned. Since share_access and share_deny
   bits are subsets of those already granted, it is not possible for
   this request to be denied because of conflicting share reservations.

   As the OPEN_DOWNGRADE may change a file to be not-open-for-write and
   a write byte-range lock might be held, the server may have to reject
   the OPEN_DOWNGRADE with NFS4ERR_LOCKS_HELD.

   On success, the current filehandle retains its value.

15.22. Operation 22: PUTFH - Set Current Filehandle

15.22.1. SYNOPSIS

   filehandle -> (cfh)

15.22.2. ARGUMENT

   struct PUTFH4args {
           nfs_fh4         object;
   };

15.22.3. RESULT

   struct PUTFH4res {
           /* CURRENT_FH: */
           nfsstat4        status;
   };

15.22.4. DESCRIPTION

   Replaces the current filehandle with the filehandle provided as an
   argument. Clears the current stateid.

   If the security mechanism used by the requester does not meet the
   requirements of the filehandle provided to this operation, the server
   MUST return NFS4ERR_WRONGSEC.

   See Section 15.2.4.1 for more details on the current filehandle.

   See Section 15.2.4.2 for more details on the current stateid.

15.22.5. IMPLEMENTATION

   Commonly used as the first operator in an NFS request to set the
   context for following operations.

15.23. Operation 23: PUTPUBFH - Set Public Filehandle

15.23.1. SYNOPSIS

   - -> (cfh)

15.23.2. ARGUMENT

   void;

15.23.3. RESULT

   struct PUTPUBFH4res {
           /* CURRENT_FH: public fh */
           nfsstat4        status;
   };

15.23.4. DESCRIPTION

   Replaces the current filehandle with the filehandle that represents
   the public filehandle of the server's name space. This filehandle
   may be different from the "root" filehandle which may be associated
   with some other directory on the server.

   The public filehandle represents the concepts embodied in [23], [24],
   and [38]. The intent for NFSv4 is that the public filehandle
   (represented by the PUTPUBFH operation) be used as a method of
   providing WebNFS server compatibility with NFSv2 and NFSv3.

   The public filehandle and the root filehandle (represented by the
   PUTROOTFH operation) should be equivalent. If the public and root
   filehandles are not equivalent, then the public filehandle MUST be a
   descendant of the root filehandle.

15.23.5. IMPLEMENTATION

   Used as the first operator in an NFS request to set the context for
   following operations.

   With the NFSv2 and 3 public filehandle, the client is able to specify
   whether the path name provided in the LOOKUP should be evaluated as
   either an absolute path relative to the server's root or relative to
   the public filehandle. [38] contains further discussion of the
   functionality. With NFSv4, that type of specification is not
   directly available in the LOOKUP operation. This is because the
   component separators needed to specify absolute vs. relative are not
   allowed in NFSv4. Therefore, the client is responsible for
   constructing its request such that either PUTROOTFH or PUTPUBFH is
   used to signify absolute or relative evaluation of an NFS URL,
   respectively.

   Note that there are warnings mentioned in [38] with respect to the
   use of absolute evaluation and the restrictions the server may place
   on that evaluation with respect to how much of its namespace has been
   made available. These same warnings apply to NFSv4.
   It is likely, therefore, that because of server implementation
   details, an NFSv3 absolute public filehandle lookup may behave
   differently than an NFSv4 absolute resolution.

   There is a form of security negotiation as described in [39] that
   uses the public filehandle as a method of employing SNEGO. This
   method is not available with NFSv4 as filehandles are not overloaded
   with special meaning and therefore do not provide the same framework
   as NFSv2 and NFSv3. Clients should therefore use the security
   negotiation mechanisms described in this RFC.

15.24. Operation 24: PUTROOTFH - Set Root Filehandle

15.24.1. SYNOPSIS

   - -> (cfh)

15.24.2. ARGUMENT

   void;

15.24.3. RESULT

   struct PUTROOTFH4res {
           /* CURRENT_FH: root fh */
           nfsstat4        status;
   };

15.24.4. DESCRIPTION

   Replaces the current filehandle with the filehandle that represents
   the root of the server's name space. From this filehandle a LOOKUP
   operation can locate any other filehandle on the server. This
   filehandle may be different from the "public" filehandle which may be
   associated with some other directory on the server.

   PUTROOTFH also clears the current stateid.

   See Section 15.2.4.1 for more details on the current filehandle.

   See Section 15.2.4.2 for more details on the current stateid.

15.24.5. IMPLEMENTATION

   Commonly used as the first operator in an NFS request to set the
   context for following operations.

15.25. Operation 25: READ - Read from File

15.25.1. SYNOPSIS

   (cfh), stateid, offset, count -> eof, data

15.25.2. ARGUMENT

   struct READ4args {
           /* CURRENT_FH: file */
           stateid4        stateid;
           offset4         offset;
           count4          count;
   };

15.25.3. RESULT

   struct READ4resok {
           bool            eof;
           opaque          data<>;
   };

   union READ4res switch (nfsstat4 status) {
    case NFS4_OK:
           READ4resok      resok4;
    default:
           void;
   };

15.25.4. DESCRIPTION

   The READ operation reads data from the regular file identified by the
   current filehandle.

   The client provides an offset of where the READ is to start and a
   count of how many bytes are to be read. An offset of 0 (zero) means
   to read data starting at the beginning of the file. If offset is
   greater than or equal to the size of the file, the status NFS4_OK
   is returned with a data length set to 0 (zero) and eof is set to
   TRUE. The READ is subject to access permissions checking.

   If the client specifies a count value of 0 (zero), the READ succeeds
   and returns 0 (zero) bytes of data, again subject to access
   permissions checking. The server may choose to return fewer bytes
   than specified by the client. The client needs to check for this
   condition and handle the condition appropriately.

   The stateid value for a READ request represents a value returned from
   a previous byte-range lock or share reservation request or the
   stateid associated with a delegation. The stateid is used by the
   server to verify that the associated share reservation and any
   byte-range locks are still valid and to update lease timeouts for
   the client.

   If the read ended at the end-of-file (formally, in a correctly formed
   READ request, if offset + count is equal to the size of the file), or
   the read request extends beyond the size of the file (if offset +
   count is greater than the size of the file), eof is returned as TRUE;
   otherwise it is FALSE. A successful READ of an empty file will
   always return eof as TRUE.

   If the current filehandle is not a regular file, an error will be
   returned to the client.
   In the case the current filehandle represents a directory,
   NFS4ERR_ISDIR is returned; otherwise, NFS4ERR_INVAL is returned.

   For a READ with a stateid value of all bits 0, the server MAY allow
   the READ to be serviced subject to mandatory file locks or the
   current share deny modes for the file. For a READ with a stateid
   value of all bits 1, the server MAY allow READ operations to bypass
   locking checks at the server.

   On success, the current filehandle retains its value.

15.25.5. IMPLEMENTATION

   If the server returns a "short read" (i.e., less data than requested
   and eof is set to FALSE), the client should send another READ to get
   the remaining data. A server may return less data than requested
   under several circumstances. The file may have been truncated by
   another client or perhaps on the server itself, changing the file
   size from what the requesting client believes to be the case. This
   would reduce the actual amount of data available to the client. It
   is possible that the server reduces the transfer size and so returns
   a short read result. Server resource exhaustion may also result in
   a short read.

   If mandatory byte-range locking is in effect for the file, and if the
   byte-range corresponding to the data to be read from the file is
   WRITE_LT locked by an owner not associated with the stateid, the
   server will return the NFS4ERR_LOCKED error. The client should try
   to get the appropriate READ_LT via the LOCK operation before
   reattempting the READ. When the READ completes, the client should
   release the byte-range lock via LOCKU.

   If another client has an OPEN_DELEGATE_WRITE delegation for the file
   being read, the delegation must be recalled, and the operation cannot
   proceed until that delegation is returned or revoked.
   Except where this happens very quickly, one or more NFS4ERR_DELAY
   errors will be returned to requests made while the delegation
   remains outstanding. Normally, delegations will not be recalled as a
   result of a READ operation since the recall will occur as a result of
   an earlier OPEN. However, since it is possible for a READ to be done
   with a special stateid, the server needs to check for this case even
   though the client should have done an OPEN previously.

15.26. Operation 26: READDIR - Read Directory

15.26.1. SYNOPSIS

   (cfh), cookie, cookieverf, dircount, maxcount, attr_request ->
   cookieverf { cookie, name, attrs }

15.26.2. ARGUMENT

   struct READDIR4args {
           /* CURRENT_FH: directory */
           nfs_cookie4     cookie;
           verifier4       cookieverf;
           count4          dircount;
           count4          maxcount;
           bitmap4         attr_request;
   };

15.26.3. RESULT

   struct entry4 {
           nfs_cookie4     cookie;
           component4      name;
           fattr4          attrs;
           entry4          *nextentry;
   };

   struct dirlist4 {
           entry4          *entries;
           bool            eof;
   };

   struct READDIR4resok {
           verifier4       cookieverf;
           dirlist4        reply;
   };

   union READDIR4res switch (nfsstat4 status) {
    case NFS4_OK:
           READDIR4resok   resok4;
    default:
           void;
   };

15.26.4. DESCRIPTION

   The READDIR operation retrieves a variable number of entries from a
   filesystem directory and returns client requested attributes for each
   entry along with information to allow the client to request
   additional directory entries in a subsequent READDIR.

   The arguments contain a cookie value that represents where the
   READDIR should start within the directory. A value of 0 (zero) for
   the cookie is used to start reading at the beginning of the
   directory.
   For subsequent READDIR requests, the client specifies a cookie value
   that is provided by the server on a previous READDIR request.

   The cookieverf value should be set to 0 (zero) when the cookie value
   is 0 (zero) (first directory read). On subsequent requests, it
   should be a cookieverf as returned by the server. The cookieverf
   must match that returned by the READDIR in which the cookie was
   acquired. If the server determines that the cookieverf is no longer
   valid for the directory, the error NFS4ERR_NOT_SAME must be returned.

   The dircount portion of the argument is a hint of the maximum number
   of bytes of directory information that should be returned. This
   value represents the length of the names of the directory entries and
   the cookie value for these entries. This length represents the XDR
   encoding of the data (names and cookies) and not the length in the
   native format of the server.

   The maxcount value of the argument is the maximum number of bytes for
   the result. This maximum size represents all of the data being
   returned within the READDIR4resok structure and includes the XDR
   overhead. The server may return less data. If the server is unable
   to return a single directory entry within the maxcount limit, the
   error NFS4ERR_TOOSMALL will be returned to the client.

   Finally, attr_request represents the list of attributes to be
   returned for each directory entry supplied by the server.

   On successful return, the server's response will provide a list of
   directory entries. Each of these entries contains the name of the
   directory entry, a cookie value for that entry, and the associated
   attributes as requested. The "eof" flag has a value of TRUE if there
   are no more entries in the directory.

   The cookie value is only meaningful to the server and is used as a
   "bookmark" for the directory entry.
   As mentioned, this cookie is used by the client for subsequent
   READDIR operations so that it may continue reading a directory. The
   cookie is similar in concept to a READ offset but should not be
   interpreted as such by the client. Ideally, the cookie value should
   not change if the directory is modified since the client may be
   caching these values.

   In some cases, the server may encounter an error while obtaining the
   attributes for a directory entry. Instead of returning an error for
   the entire READDIR operation, the server can instead return the
   attribute 'fattr4_rdattr_error'. With this, the server is able to
   communicate the failure to the client and not fail the entire
   operation in the instance of what might be a transient failure.
   Obviously, the client must request the fattr4_rdattr_error attribute
   for this method to work properly. If the client does not request the
   attribute, the server has no choice but to return failure for the
   entire READDIR operation.

   For some filesystem environments, the directory entries "." and ".."
   have special meaning and in other environments, they may not. If the
   server supports these special entries within a directory, they should
   not be returned to the client as part of the READDIR response. To
   enable some client environments, the cookie values of 0, 1, and 2 are
   to be considered reserved. Note that the UNIX client will use these
   values when combining the server's response and local representations
   to enable a fully formed UNIX directory presentation to the
   application.

   For READDIR arguments, cookie values of 1 and 2 SHOULD NOT be used
   and for READDIR results cookie values of 0, 1, and 2 MUST NOT be
   returned.

   On success, the current filehandle retains its value.

15.26.5. IMPLEMENTATION

   The server's filesystem directory representations can differ greatly.
12931 A client's programming interfaces may also be bound to the local 12932 operating environment in a way that does not translate well into the 12933 NFS protocol. Therefore the dircount and maxcount fields 12934 are provided to allow the client to give guidelines to 12935 the server. If the client is aggressive about attribute collection 12936 during a READDIR, the server has an idea of how to limit the encoded 12937 response. The dircount field provides a hint on the number of 12938 entries based solely on the names of the directory entries. Since it 12939 is a hint, a dircount value may be zero. In this 12940 case, the server is free to ignore the dircount value and return 12941 directory information based on the specified maxcount value. 12943 The cookieverf may be used by the server to help manage cookie values 12944 that may become stale. It should be a rare occurrence that a server 12945 is unable to continue properly reading a directory with the provided 12946 cookie/cookieverf pair. The server should make every effort to avoid 12947 this condition since the application at the client may not be able to 12948 properly handle this type of failure. 12950 The use of the cookieverf will also protect the client from using 12951 READDIR cookie values that may be stale. For example, if the file 12952 system has been migrated, the server may or may not be able to use 12953 the same cookie values to service READDIR as the previous server 12954 used. With the client providing the cookieverf, the server is able 12955 to provide the appropriate response to the client. This prevents the 12956 case where the server may accept a cookie value but the underlying 12957 directory has changed and the response is invalid from the client's 12958 context of its previous READDIR. 12960 Since some servers will not be returning "." and ".."
entries as has 12961 been done with previous versions of the NFS protocol, the client that 12962 requires these entries be present in READDIR responses must fabricate 12963 them. 12965 15.27. Operation 27: READLINK - Read Symbolic Link 12967 15.27.1. SYNOPSIS 12969 (cfh) -> linktext 12971 15.27.2. ARGUMENT 12973 /* CURRENT_FH: symlink */ 12974 void; 12976 15.27.3. RESULT 12978 struct READLINK4resok { 12979 linktext4 link; 12980 }; 12982 union READLINK4res switch (nfsstat4 status) { 12983 case NFS4_OK: 12984 READLINK4resok resok4; 12985 default: 12986 void; 12987 }; 12989 15.27.4. DESCRIPTION 12991 READLINK reads the data associated with a symbolic link. The data is 12992 a UTF-8 string that is opaque to the server. That is, whether 12993 created by an NFS client or created locally on the server, the data 12994 in a symbolic link is not interpreted when created, but is simply 12995 stored. 12997 On success, the current filehandle retains its value. 12999 15.27.5. IMPLEMENTATION 13001 A symbolic link is nominally a pointer to another file. The data is 13002 not necessarily interpreted by the server, just stored in the file. 13003 It is possible for a client implementation to store a path name that 13004 is not meaningful to the server operating system in a symbolic link. 13005 A READLINK operation returns the data to the client for 13006 interpretation. If different implementations want to share access to 13007 symbolic links, then they must agree on the interpretation of the 13008 data in the symbolic link. 13010 The READLINK operation is only allowed on objects of type NF4LNK. 13011 The server should return the error, NFS4ERR_INVAL, if the object is 13012 not of type, NF4LNK. 13014 15.28. Operation 28: REMOVE - Remove Filesystem Object 13016 15.28.1. SYNOPSIS 13018 (cfh), filename -> change_info 13020 15.28.2. ARGUMENT 13022 struct REMOVE4args { 13023 /* CURRENT_FH: directory */ 13024 component4 target; 13025 }; 13027 15.28.3. 
RESULT 13029 struct REMOVE4resok { 13030 change_info4 cinfo; 13031 }; 13033 union REMOVE4res switch (nfsstat4 status) { 13034 case NFS4_OK: 13035 REMOVE4resok resok4; 13036 default: 13037 void; 13038 }; 13040 15.28.4. DESCRIPTION 13042 The REMOVE operation removes (deletes) a directory entry named by 13043 filename from the directory corresponding to the current filehandle. 13044 If the entry in the directory was the last reference to the 13045 corresponding filesystem object, the object may be destroyed. 13047 For the directory where the filename was removed, the server returns 13048 change_info4 information in cinfo. With the atomic field of the 13049 change_info4 struct, the server will indicate if the before and after 13050 change attributes were obtained atomically with respect to the 13051 removal. 13053 If the target is of zero length, NFS4ERR_INVAL will be returned. The 13054 target is also subject to the normal UTF-8, character support, and 13055 name checks. See Section 12.3 for further discussion. 13057 On success, the current filehandle retains its value. 13059 15.28.5. IMPLEMENTATION 13061 NFSv3 required a separate operation, RMDIR, for directory removal and 13062 REMOVE for non-directory removal. This allowed clients to skip 13063 checking the file type when being passed a non-directory delete 13064 system call (e.g., unlink() [40] in POSIX) to remove a directory, as 13065 well as the converse (e.g., a rmdir() on a non-directory) because 13066 they knew the server would check the file type. NFSv4 REMOVE can be 13067 used to delete any directory entry independent of its file type. The 13068 implementor of an NFSv4 client's entry points from the unlink() and 13069 rmdir() system calls should first check the file type against the 13070 types the system call is allowed to remove before issuing a REMOVE.
13071 Alternatively, the implementor can produce a COMPOUND call that 13072 includes a LOOKUP/VERIFY sequence to verify the file type before a 13073 REMOVE operation in the same COMPOUND call. 13075 The concept of last reference is server specific. However, if the 13076 numlinks field in the previous attributes of the object had the value 13077 1, the client should not rely on referring to the object via a 13078 filehandle. Likewise, the client should not rely on the resources 13079 (disk space, directory entry, and so on) formerly associated with the 13080 object becoming immediately available. Thus, if a client needs to be 13081 able to continue to access a file after using REMOVE to remove it, 13082 the client should take steps to make sure that the file will still be 13083 accessible. The usual mechanism used is to RENAME the file from its 13084 old name to a new hidden name. 13086 If the server finds that the file is still open when the REMOVE 13087 arrives: 13089 o The server SHOULD NOT delete the file's directory entry if the 13090 file was opened with OPEN4_SHARE_DENY_WRITE or 13091 OPEN4_SHARE_DENY_BOTH. 13093 o If the file was not opened with OPEN4_SHARE_DENY_WRITE or 13094 OPEN4_SHARE_DENY_BOTH, the server SHOULD delete the file's 13095 directory entry. However, until last CLOSE of the file, the 13096 server MAY continue to allow access to the file via its 13097 filehandle. 13099 15.29. Operation 29: RENAME - Rename Directory Entry 13101 15.29.1. SYNOPSIS 13103 (sfh), oldname, (cfh), newname -> source_change_info, 13104 target_change_info 13106 15.29.2. ARGUMENT 13108 struct RENAME4args { 13109 /* SAVED_FH: source directory */ 13110 component4 oldname; 13111 /* CURRENT_FH: target directory */ 13112 component4 newname; 13113 }; 13115 15.29.3. 
RESULT 13117 struct RENAME4resok { 13118 change_info4 source_cinfo; 13119 change_info4 target_cinfo; 13120 }; 13122 union RENAME4res switch (nfsstat4 status) { 13123 case NFS4_OK: 13124 RENAME4resok resok4; 13125 default: 13126 void; 13127 }; 13129 15.29.4. DESCRIPTION 13131 The RENAME operation renames the object identified by oldname in the 13132 source directory corresponding to the saved filehandle, as set by the 13133 SAVEFH operation, to newname in the target directory corresponding to 13134 the current filehandle. The operation is required to be atomic to 13135 the client. Source and target directories must reside on the same 13136 filesystem on the server. On success, the current filehandle will 13137 continue to be the target directory. 13139 If the target directory already contains an entry with the name, 13140 newname, the source object must be compatible with the target: either 13141 both are non-directories or both are directories and the target must 13142 be empty. If compatible, the existing target is removed before the 13143 rename occurs (See Section 15.28 for client and server actions 13144 whenever a target is removed). If they are not compatible or if the 13145 target is a directory but not empty, the server will return the 13146 error, NFS4ERR_EXIST. 13148 If oldname and newname both refer to the same file (they might be 13149 hard links of each other), then RENAME should perform no action and 13150 return success. 13152 For both directories involved in the RENAME, the server returns 13153 change_info4 information. With the atomic field of the change_info4 13154 struct, the server will indicate if the before and after change 13155 attributes were obtained atomically with respect to the rename. 
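The rule for a RENAME whose newname already exists in the target directory can be sketched as follows. This is a minimal illustration, not part of the protocol definition: the ObjType enum, the string status values, and the function name rename_target_check are hypothetical, and a real server would make this decision inside its own atomic rename path.

```python
from enum import Enum


class ObjType(Enum):
    # Hypothetical two-way classification; the text only distinguishes
    # directories from non-directories for this check.
    DIRECTORY = "directory"
    NON_DIRECTORY = "non-directory"


# Status names as used in the text; plain strings keep the sketch simple.
NFS4_OK = "NFS4_OK"
NFS4ERR_EXIST = "NFS4ERR_EXIST"


def rename_target_check(source_type, target_type, target_is_empty):
    """Decide whether an existing target entry may be replaced.

    Compatible cases: both objects are non-directories, or both are
    directories and the target directory is empty. Anything else is
    reported as NFS4ERR_EXIST.
    """
    if (source_type is ObjType.NON_DIRECTORY
            and target_type is ObjType.NON_DIRECTORY):
        return NFS4_OK
    if (source_type is ObjType.DIRECTORY
            and target_type is ObjType.DIRECTORY
            and target_is_empty):
        return NFS4_OK
    return NFS4ERR_EXIST
```

For non-directories the target_is_empty argument is ignored. The separate case in which oldname and newname refer to the same file (no action, success) is not modeled here.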
13157 If the oldname refers to a named attribute and the saved and current 13158 filehandles refer to different filesystem objects, the server will 13159 return NFS4ERR_XDEV just as if the saved and current filehandles 13160 represented directories on different filesystems. 13162 If the oldname or newname is of zero length, NFS4ERR_INVAL will be 13163 returned. The oldname and newname are also subject to the normal 13164 UTF-8, character support, and name checks. See Section 12.3 for 13165 further discussion. 13167 15.29.5. IMPLEMENTATION 13169 The RENAME operation must be atomic to the client. The statement 13170 "source and target directories must reside on the same filesystem on 13171 the server" means that the fsid fields in the attributes for the 13172 directories are the same. If they reside on different filesystems, 13173 the error, NFS4ERR_XDEV, is returned. 13175 Based on the value of the fh_expire_type attribute for the object, 13176 the filehandle may or may not expire on a RENAME. However, server 13177 implementors are strongly encouraged to attempt to keep filehandles 13178 from expiring in this fashion. 13180 On some servers, the file names "." and ".." are illegal as either 13181 oldname or newname, and will result in the error NFS4ERR_BADNAME. In 13182 addition, on many servers the case of oldname or newname being an 13183 alias for the source directory will be checked for. Such servers 13184 will return the error NFS4ERR_INVAL in these cases. 13186 If either of the source or target filehandles are not directories, 13187 the server will return NFS4ERR_NOTDIR. 13189 15.30. Operation 30: RENEW - Renew a Lease 13191 15.30.1. SYNOPSIS 13193 clientid -> () 13195 15.30.2. ARGUMENT 13197 struct RENEW4args { 13198 clientid4 clientid; 13199 }; 13201 15.30.3. RESULT 13203 struct RENEW4res { 13204 nfsstat4 status; 13205 }; 13207 15.30.4. DESCRIPTION 13209 The RENEW operation is used by the client to renew leases which it 13210 currently holds at a server. 
In processing the RENEW request, the 13211 server renews all leases associated with the client. The associated 13212 leases are determined by the clientid provided via the SETCLIENTID 13213 operation. 13215 15.30.5. IMPLEMENTATION 13217 When the client holds delegations, it needs to use RENEW to detect 13218 when the server has determined that the callback path is down. When 13219 the server has made such a determination, only the RENEW operation 13220 will renew the lease on delegations. If the server determines the 13221 callback path is down, it returns NFS4ERR_CB_PATH_DOWN. Even though 13222 it returns NFS4ERR_CB_PATH_DOWN, the server MUST renew the lease on 13223 the byte-range locks and share reservations that the client has 13224 established on the server. If for some reason the lock and share 13225 reservation lease cannot be renewed, then the server MUST return an 13226 error other than NFS4ERR_CB_PATH_DOWN, even if the callback path is 13227 also down. In the event that the server has conditions such that it 13228 could return either NFS4ERR_CB_PATH_DOWN or NFS4ERR_LEASE_MOVED, 13229 NFS4ERR_LEASE_MOVED MUST be handled first. 13231 The client that issues RENEW MUST choose the principal, RPC security 13232 flavor, and if applicable, GSS-API mechanism and service via one of 13233 the following algorithms: 13235 o The client uses the same principal, RPC security flavor -- and if 13236 the flavor was RPCSEC_GSS -- the same mechanism and service that 13237 was used when the client id was established via 13238 SETCLIENTID_CONFIRM. 13240 o The client uses any principal, RPC security flavor, mechanism, and 13241 service combination that currently has an OPEN file on the server. 13242 I.e., the same principal had a successful OPEN operation, the file 13243 is still open by that principal, and the flavor, mechanism, and 13244 service of RENEW match that of the previous OPEN.
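The two credential-selection algorithms above can be sketched from the server's side as a simple membership test. This is a hedged illustration only: the Cred tuple, its field names, and the function renew_status are hypothetical, and the sketch assumes the server has already recorded the credential used at SETCLIENTID_CONFIRM and the credentials of the client's currently open files. NFS4ERR_ACCESS is the error this section specifies for a RENEW that follows neither algorithm.

```python
from typing import NamedTuple, Optional


class Cred(NamedTuple):
    # Hypothetical record of how a request was authenticated.
    principal: str
    flavor: str                      # e.g. "AUTH_SYS" or "RPCSEC_GSS"
    mechanism: Optional[str] = None  # only meaningful for RPCSEC_GSS
    service: Optional[str] = None    # only meaningful for RPCSEC_GSS


def renew_status(renew_cred, confirm_cred, open_creds):
    """Return the status for a RENEW issued with renew_cred.

    confirm_cred: credential used when the client ID was confirmed.
    open_creds:   credentials that currently have a file open.
    """
    # First algorithm: same principal, flavor, mechanism, and service
    # as the SETCLIENTID_CONFIRM that established the client ID.
    if renew_cred == confirm_cred:
        return "NFS4_OK"
    # Second algorithm: any combination that still has an open file.
    if any(renew_cred == cred for cred in open_creds):
        return "NFS4_OK"
    return "NFS4ERR_ACCESS"
```

Tuple equality compares all four fields, so a credential with the right principal but a different service does not match. Lease-renewal errors such as NFS4ERR_CB_PATH_DOWN are outside the scope of this sketch.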
13246 The server MUST reject a RENEW that does not use one of the 13247 aforementioned algorithms, with the error NFS4ERR_ACCESS. 13249 15.31. Operation 31: RESTOREFH - Restore Saved Filehandle 13251 15.31.1. SYNOPSIS 13253 (sfh) -> (cfh) 13255 15.31.2. ARGUMENT 13257 /* SAVED_FH: */ 13258 void; 13260 15.31.3. RESULT 13262 struct RESTOREFH4res { 13263 /* CURRENT_FH: value of saved fh */ 13264 nfsstat4 status; 13265 }; 13267 15.31.4. DESCRIPTION 13269 Set the current filehandle to the value in the saved filehandle. If 13270 there is no saved filehandle then return the error NFS4ERR_RESTOREFH. 13272 15.31.5. IMPLEMENTATION 13274 Operations like OPEN and LOOKUP use the current filehandle to 13275 represent a directory and replace it with a new filehandle. Assuming 13276 the previous filehandle was saved with a SAVEFH operator, the 13277 previous filehandle can be restored as the current filehandle. This 13278 is commonly used to obtain post-operation attributes for the 13279 directory, e.g., 13280 PUTFH (directory filehandle) 13281 SAVEFH 13282 GETATTR attrbits (pre-op dir attrs) 13283 CREATE optbits "foo" attrs 13284 GETATTR attrbits (file attributes) 13285 RESTOREFH 13286 GETATTR attrbits (post-op dir attrs) 13288 15.32. Operation 32: SAVEFH - Save Current Filehandle 13290 15.32.1. SYNOPSIS 13292 (cfh) -> (sfh) 13294 15.32.2. ARGUMENT 13296 /* CURRENT_FH: */ 13297 void; 13299 15.32.3. RESULT 13301 struct SAVEFH4res { 13302 /* SAVED_FH: value of current fh */ 13303 nfsstat4 status; 13304 }; 13306 15.32.4. DESCRIPTION 13308 Save the current filehandle. If a previous filehandle was saved then 13309 it is no longer accessible. The saved filehandle can be restored as 13310 the current filehandle with the RESTOREFH operator. 13312 On success, the current filehandle retains its value. 13314 15.32.5. IMPLEMENTATION 13316 15.33. Operation 33: SECINFO - Obtain Available Security 13318 15.33.1. SYNOPSIS 13320 (cfh), name -> { secinfo } 13322 15.33.2.
ARGUMENT 13324 struct SECINFO4args { 13325 /* CURRENT_FH: directory */ 13326 component4 name; 13327 }; 13329 15.33.3. RESULT 13331 /* 13332 * From RFC 2203 13333 */ 13334 enum rpc_gss_svc_t { 13335 RPC_GSS_SVC_NONE = 1, 13336 RPC_GSS_SVC_INTEGRITY = 2, 13337 RPC_GSS_SVC_PRIVACY = 3 13338 }; 13340 struct rpcsec_gss_info { 13341 sec_oid4 oid; 13342 qop4 qop; 13343 rpc_gss_svc_t service; 13344 }; 13346 /* RPCSEC_GSS has a value of '6' - See RFC 2203 */ 13347 union secinfo4 switch (uint32_t flavor) { 13348 case RPCSEC_GSS: 13349 rpcsec_gss_info flavor_info; 13350 default: 13351 void; 13352 }; 13354 typedef secinfo4 SECINFO4resok<>; 13356 union SECINFO4res switch (nfsstat4 status) { 13357 case NFS4_OK: 13358 SECINFO4resok resok4; 13359 default: 13360 void; 13361 }; 13363 15.33.4. DESCRIPTION 13365 The SECINFO operation is used by the client to obtain a list of valid 13366 RPC authentication flavors for a specific directory filehandle, file 13367 name pair. SECINFO should apply the same access methodology used for 13368 LOOKUP when evaluating the name. Therefore, if the requester does 13369 not have the appropriate access to LOOKUP the name then SECINFO must 13370 behave the same way and return NFS4ERR_ACCESS. 13372 The result will contain an array which represents the security 13373 mechanisms available, with an order corresponding to server's 13374 preferences, the most preferred being first in the array. The client 13375 is free to pick whatever security mechanism it both desires and 13376 supports, or to pick in the server's preference order the first one 13377 it supports. The array entries are represented by the secinfo4 13378 structure. The field 'flavor' will contain a value of AUTH_NONE, 13379 AUTH_SYS (as defined in [3]), or RPCSEC_GSS (as defined in [4]). 13381 For the flavors AUTH_NONE and AUTH_SYS, no additional security 13382 information is returned. 
For a return value of RPCSEC_GSS, a 13383 security triple is returned that contains the mechanism object id (as 13384 defined in [6]), the quality of protection (as defined in [6]) and 13385 the service type (as defined in [4]). It is possible for SECINFO to 13386 return multiple entries with flavor equal to RPCSEC_GSS with 13387 different security triple values. 13389 On success, the current filehandle retains its value. 13391 If the name has a length of 0 (zero), or if name does not obey the 13392 UTF-8 definition, the error NFS4ERR_INVAL will be returned. 13394 15.33.5. IMPLEMENTATION 13396 The SECINFO operation is expected to be used by the NFS client when 13397 the error value of NFS4ERR_WRONGSEC is returned from another NFS 13398 operation. This signifies to the client that the server's security 13399 policy is different from what the client is currently using. At this 13400 point, the client is expected to obtain a list of possible security 13401 flavors and choose what best suits its policies. 13403 As mentioned, the server's security policies will determine when a 13404 client request receives NFS4ERR_WRONGSEC. The operations which may 13405 receive this error are: LINK, LOOKUP, LOOKUPP, OPEN, PUTFH, PUTPUBFH, 13406 PUTROOTFH, RENAME, RESTOREFH, and indirectly READDIR. LINK and 13407 RENAME will only receive this error if the security used for the 13408 operation is inappropriate for the saved filehandle. With the exception 13409 of READDIR, these operations represent the point at which the client 13410 can instantiate a filehandle into the "current filehandle" at the 13411 server. The filehandle is either provided by the client (PUTFH, 13412 PUTPUBFH, PUTROOTFH) or generated as a result of a name to filehandle 13413 translation (LOOKUP and OPEN). RESTOREFH is different because the 13414 filehandle is a result of a previous SAVEFH.
Even though the 13415 filehandle, for RESTOREFH, might have previously passed the server's 13416 inspection for a security match, the server will check it again on 13417 RESTOREFH to ensure that the security policy has not changed. 13419 If the client wants to resolve an error return of NFS4ERR_WRONGSEC, 13420 the following will occur: 13422 o For LOOKUP and OPEN, the client will use SECINFO with the same 13423 current filehandle and name as provided in the original LOOKUP or 13424 OPEN to enumerate the available security triples. 13426 o For LINK, PUTFH, RENAME, and RESTOREFH, the client will use 13427 SECINFO and provide the parent directory filehandle and object 13428 name which corresponds to the filehandle originally provided by 13429 the PUTFH, RESTOREFH, or for LINK and RENAME, the SAVEFH. 13431 o For LOOKUPP, PUTROOTFH and PUTPUBFH, the client will be unable to 13432 use the SECINFO operation since SECINFO requires a current 13433 filehandle and none exists for these three operations. Therefore, 13434 the client must iterate through the security triples available at 13435 the client and reattempt the PUTROOTFH or PUTPUBFH operation. In 13436 the unfortunate event none of the MANDATORY security triples are 13437 supported by the client and server, the client SHOULD try using 13438 others that support integrity. Failing that, the client can try 13439 using AUTH_NONE, but because such forms lack integrity checks, 13440 this puts the client at risk. Nonetheless, the server SHOULD 13441 allow the client to use whatever security form the client requests 13442 and the server supports, since the risks of doing so are on the 13443 client. 13445 The READDIR operation will not directly return the NFS4ERR_WRONGSEC 13446 error. However, if the READDIR request included a request for 13447 attributes, it is possible that the READDIR request's security triple 13448 does not match that of a directory entry.
If this is the case and 13449 the client has requested the rdattr_error attribute, the server will 13450 return the NFS4ERR_WRONGSEC error in rdattr_error for the entry. 13452 See Section 17 for a discussion on the recommendations for security 13453 flavor used by SECINFO. 13455 15.34. Operation 34: SETATTR - Set Attributes 13457 15.34.1. SYNOPSIS 13459 (cfh), stateid, attrmask, attr_vals -> attrsset 13461 15.34.2. ARGUMENT 13463 struct SETATTR4args { 13464 /* CURRENT_FH: target object */ 13465 stateid4 stateid; 13466 fattr4 obj_attributes; 13467 }; 13469 15.34.3. RESULT 13471 struct SETATTR4res { 13472 nfsstat4 status; 13473 bitmap4 attrsset; 13474 }; 13476 15.34.4. DESCRIPTION 13478 The SETATTR operation changes one or more of the attributes of a 13479 filesystem object. The new attributes are specified with a bitmap 13480 and the attributes that follow the bitmap in bit order. 13482 The stateid argument for SETATTR is used to provide byte-range 13483 locking context that is necessary for SETATTR requests that set the 13484 size attribute. Since setting the size attribute modifies the file's 13485 data, it has the same locking requirements as a corresponding WRITE. 13486 Any SETATTR that sets the size attribute is incompatible with a share 13487 reservation that specifies OPEN4_SHARE_DENY_WRITE. The area between 13488 the old end-of-file and the new end-of-file is considered to be 13489 modified just as would have been the case had the area in question 13490 been specified as the target of WRITE, for the purpose of checking 13491 conflicts with byte-range locks, for those cases in which a server is 13492 implementing mandatory byte-range locking behavior. A valid stateid 13493 SHOULD always be specified. When the file size attribute is not set, 13494 the special stateid consisting of all bits zero MAY be passed. 
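The byte-range treatment of a size change described above can be sketched as a range computation followed by an overlap test. This is an illustrative sketch under stated assumptions: the function names are hypothetical, lock ranges are given as an offset and a positive length, and special length encodings (such as a lock extending to end-of-file) are not modeled.

```python
def size_change_range(old_size, new_size):
    """Return (offset, length) of the area between the old and new
    end-of-file, which is treated as if it were the target of a WRITE."""
    start = min(old_size, new_size)
    return start, abs(new_size - old_size)


def size_change_conflicts(lock_offset, lock_length, old_size, new_size):
    """Return True if a byte-range lock overlaps the area affected by
    changing the file size from old_size to new_size.

    Assumes lock_length > 0 (hypothetical simplification).
    """
    offset, length = size_change_range(old_size, new_size)
    if length == 0:
        # Size is unchanged, so no bytes are considered modified.
        return False
    # Standard half-open interval overlap test.
    return lock_offset < offset + length and offset < lock_offset + lock_length
```

For example, truncating a 100-byte file to 40 bytes affects bytes 40 through 99, so a lock on bytes 50 through 59 conflicts while a lock on bytes 0 through 9 does not.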
13496 On either success or failure of the operation, the server will return 13497 the attrsset bitmask to represent what (if any) attributes were 13498 successfully set. The attrsset in the response is a subset of the 13499 bitmap4 that is part of the obj_attributes in the argument. 13501 On success, the current filehandle retains its value. 13503 15.34.5. IMPLEMENTATION 13505 If the request specifies the owner attribute to be set, the server 13506 SHOULD allow the operation to succeed if the current owner of the 13507 object matches the value specified in the request. Some servers may 13508 be implemented in a way as to prohibit the setting of the owner 13509 attribute unless the requester has privilege to do so. If the server 13510 is lenient in this one case of matching owner values, the client 13511 implementation may be simplified in cases of creation of an object 13512 (e.g., an exclusive create via OPEN) followed by a SETATTR. 13514 The file size attribute is used to request changes to the size of a 13515 file. A value of zero causes the file to be truncated, a value less 13516 than the current size of the file causes data from new size to the 13517 end of the file to be discarded, and a size greater than the current 13518 size of the file causes logically zeroed data bytes to be added to 13519 the end of the file. Servers are free to implement this using holes 13520 or actual zero data bytes. Clients should not make any assumptions 13521 regarding a server's implementation of this feature, beyond that the 13522 bytes returned will be zeroed. Servers MUST support extending the 13523 file size via SETATTR. 13525 SETATTR is not guaranteed atomic. A failed SETATTR may partially 13526 change a file's attributes, hence the reason why the reply always 13527 includes the status and the list of attributes that were set. 
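Because a failed SETATTR may have changed some attributes, a client can compare the attrsset bitmap in the reply with the bitmap it sent to learn which attributes were not set. The following is a minimal sketch, assuming a bitmap4 is represented as a list of 32-bit words in which attribute number n corresponds to bit (n mod 32) of word (n div 32); the function names are hypothetical.

```python
def bitmap_to_attrs(bitmap):
    """Expand a bitmap4 (list of 32-bit words) into a set of
    attribute numbers."""
    return {
        word_index * 32 + bit
        for word_index, word in enumerate(bitmap)
        for bit in range(32)
        if word & (1 << bit)
    }


def attrs_not_set(requested, attrsset):
    """Return the attribute numbers that were requested but are not
    reported in attrsset.

    Raises ValueError if attrsset names an attribute that was never
    requested, since the reply is expected to be a subset of the
    request.
    """
    wanted = bitmap_to_attrs(requested)
    done = bitmap_to_attrs(attrsset)
    if not done <= wanted:
        raise ValueError("attrsset is not a subset of the requested bitmap")
    return wanted - done
```

An empty result means every requested attribute was set; a non-empty result identifies what a client might retry or report after a partial failure.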
13529 If the object whose attributes are being changed has a file 13530 delegation that is held by a client other than the one doing the 13531 SETATTR, the delegation(s) must be recalled, and the operation cannot 13532 proceed to actually change an attribute until each such delegation is 13533 returned or revoked. In all cases in which delegations are recalled, 13534 the server is likely to return one or more NFS4ERR_DELAY errors while 13535 the delegation(s) remains outstanding, although it might not do that 13536 if the delegations are returned quickly. 13538 Changing the size of a file with SETATTR indirectly changes the 13539 time_modify and change attributes. A client must account for this as 13540 size changes can result in data deletion. 13542 The attributes time_access_set and time_modify_set are write-only 13543 attributes constructed as a switched union so the client can direct 13544 the server in setting the time values. If the switched union 13545 specifies SET_TO_CLIENT_TIME4, the client has provided an nfstime4 to 13546 be used for the operation. If the switch union does not specify 13547 SET_TO_CLIENT_TIME4, the server is to use its current time for the 13548 SETATTR operation. 13550 If server and client times differ, programs that compare client time 13551 to file times can break. A time maintenance protocol should be used 13552 to limit client/server time skew. 13554 Use of a COMPOUND containing a VERIFY operation specifying only the 13555 change attribute, immediately followed by a SETATTR, provides a means 13556 whereby a client may specify a request that emulates the 13557 functionality of the SETATTR guard mechanism of NFSv3. 
Since the 13558 function of the guard mechanism is to avoid changes to the file 13559 attributes based on stale information, delays between checking of the 13560 guard condition and the setting of the attributes have the potential 13561 to compromise this function, as would the corresponding delay in the 13562 NFSv4 emulation. Therefore, NFSv4 servers should take care to avoid 13563 such delays, to the degree possible, when executing such a request. 13565 If the server does not support an attribute as requested by the 13566 client, the server should return NFS4ERR_ATTRNOTSUPP. 13568 A mask of the attributes actually set is returned by SETATTR in all 13569 cases. That mask MUST NOT include attribute bits not requested to be 13570 set by the client. If the attribute masks in the request and reply 13571 are equal, the status field in the reply MUST be NFS4_OK. 13573 15.35. Operation 35: SETCLIENTID - Negotiate Client ID 13575 15.35.1. SYNOPSIS 13577 client, callback, callback_ident -> clientid, setclientid_confirm 13579 15.35.2. ARGUMENT 13581 struct SETCLIENTID4args { 13582 nfs_client_id4 client; 13583 cb_client4 callback; 13584 uint32_t callback_ident; 13585 }; 13587 15.35.3. RESULT 13589 struct SETCLIENTID4resok { 13590 clientid4 clientid; 13591 verifier4 setclientid_confirm; 13592 }; 13594 union SETCLIENTID4res switch (nfsstat4 status) { 13595 case NFS4_OK: 13596 SETCLIENTID4resok resok4; 13597 case NFS4ERR_CLID_INUSE: 13598 clientaddr4 client_using; 13599 default: 13600 void; 13601 }; 13603 15.35.4. DESCRIPTION 13605 The client uses the SETCLIENTID operation to notify the server of its 13606 intention to use a particular client identifier, callback, and 13607 callback_ident for subsequent requests that entail creating lock, 13608 share reservation, and delegation state on the server. 
Upon 13609 successful completion the server will return a shorthand client ID 13610 which, if confirmed via a separate step, will be used in subsequent 13611 file locking and file open requests. Confirmation of the client ID 13612 must be done via the SETCLIENTID_CONFIRM operation to return the 13613 client ID and setclientid_confirm values, as verifiers, to the 13614 server. The reason why two verifiers are necessary is that it is 13615 possible to use SETCLIENTID and SETCLIENTID_CONFIRM to modify the 13616 callback and callback_ident information but not the shorthand client 13617 ID. In that event, the setclientid_confirm value is effectively the 13618 only verifier. 13620 The callback information provided in this operation will be used if 13621 the client is provided an open delegation at a future point. 13622 Therefore, the client must correctly reflect the program and port 13623 numbers for the callback program at the time SETCLIENTID is used. 13625 The callback_ident value is used by the server on the callback. The 13626 client can leverage the callback_ident to eliminate the need for more 13627 than one callback RPC program number, while still being able to 13628 determine which server is initiating the callback. 13630 15.35.5. IMPLEMENTATION 13632 To understand how to implement SETCLIENTID, make the following 13633 notations. Let: 13635 x be the value of the client.id subfield of the SETCLIENTID4args 13636 structure. 13638 v be the value of the client.verifier subfield of the 13639 SETCLIENTID4args structure. 13641 c be the value of the client ID field returned in the 13642 SETCLIENTID4resok structure. 13644 k represent the value combination of the fields callback and 13645 callback_ident fields of the SETCLIENTID4args structure. 13647 s be the setclientid_confirm value returned in the SETCLIENTID4resok 13648 structure. 13650 { v, x, c, k, s } be a quintuple for a client record. 
A client 13651 record is confirmed if there has been a SETCLIENTID_CONFIRM 13652 operation to confirm it. Otherwise it is unconfirmed. An 13653 unconfirmed record is established by a SETCLIENTID call. 13655 Since SETCLIENTID is a non-idempotent operation, let us assume that 13656 the server is implementing the duplicate request cache (DRC). 13658 When the server gets a SETCLIENTID { v, x, k } request, it processes 13659 it in the following manner. 13661 o It first looks up the request in the DRC. If there is a hit, it 13662 returns the result cached in the DRC. The server does NOT remove 13663 client state (locks, shares, delegations) nor does it modify any 13664 recorded callback and callback_ident information for client { x }. 13666 For any DRC miss, the server takes the client id string x, and 13667 searches for client records for x that the server may have 13668 recorded from previous SETCLIENTID calls. For any confirmed 13669 record with the same id string x, if the recorded principal does 13670 not match that of SETCLIENTID call, then the server returns a 13671 NFS4ERR_CLID_INUSE error. 13673 For brevity of discussion, the remaining description of the 13674 processing assumes that there was a DRC miss, and that where the 13675 server has previously recorded a confirmed record for client x, 13676 the aforementioned principal check has successfully passed. 13678 o The server checks if it has recorded a confirmed record for { v, 13679 x, c, l, s }, where l may or may not equal k. If so, and since 13680 the id verifier v of the request matches that which is confirmed 13681 and recorded, the server treats this as a probable callback 13682 information update and records an unconfirmed { v, x, c, k, t } 13683 and leaves the confirmed { v, x, c, l, s } in place, such that t 13684 != s. It does not matter if k equals l or not. Any pre-existing 13685 unconfirmed { v, x, c, *, * } is removed. 13687 The server returns { c, t }. 
It is indeed returning the old 13688 clientid4 value c, because the client apparently only wants to 13689 update callback value k to value l. It's possible this request is 13690 one from the Byzantine router that has stale callback information, 13691 but this is not a problem. The callback information update is 13692 only confirmed if followed up by a SETCLIENTID_CONFIRM { c, t }. 13694 The server awaits confirmation of k via SETCLIENTID_CONFIRM { c, t 13695 }. 13697 The server does NOT remove client (lock/share/delegation) state 13698 for x. 13700 o The server has previously recorded a confirmed { u, x, c, l, s } 13701 record such that v != u, l may or may not equal k, and has not 13702 recorded any unconfirmed { *, x, *, *, * } record for x. The 13703 server records an unconfirmed { v, x, d, k, t } (d != c, t != s). 13705 The server returns { d, t }. 13707 The server awaits confirmation of { d, k } via SETCLIENTID_CONFIRM 13708 { d, t }. 13710 The server does NOT remove client (lock/share/delegation) state 13711 for x. 13713 o The server has previously recorded a confirmed { u, x, c, l, s } 13714 record such that v != u, l may or may not equal k, and recorded an 13715 unconfirmed { w, x, d, m, t } record such that c != d, t != s, m 13716 may or may not equal k, m may or may not equal l, and k may or may 13717 not equal l. Whether w == v or w != v makes no difference. The 13718 server simply removes the unconfirmed { w, x, d, m, t } record and 13719 replaces it with an unconfirmed { v, x, e, k, r } record, such 13720 that e != d, e != c, r != t, r != s. 13722 The server returns { e, r }. 13724 The server awaits confirmation of { e, k } via SETCLIENTID_CONFIRM 13725 { e, r }. 13727 The server does NOT remove client (lock/share/delegation) state 13728 for x. 13730 o The server has no confirmed { *, x, *, *, * } for x. It may or 13731 may not have recorded an unconfirmed { u, x, c, l, s }, where l 13732 may or may not equal k, and u may or may not equal v. 
Any 13733 unconfirmed record { u, x, c, l, * }, regardless of whether u == v or 13734 l == k, is replaced with an unconfirmed record { v, x, d, k, t } 13735 where d != c, t != s. 13737 The server returns { d, t }. 13739 The server awaits confirmation of { d, k } via SETCLIENTID_CONFIRM 13740 { d, t }. The server does NOT remove client (lock/share/ 13741 delegation) state for x. 13743 The server generates the clientid and setclientid_confirm values and 13744 must take care to ensure that these values are extremely unlikely to 13745 ever be regenerated. 13747 15.36. Operation 36: SETCLIENTID_CONFIRM - Confirm Client ID 13749 15.36.1. SYNOPSIS 13751 clientid, verifier -> - 13753 15.36.2. ARGUMENT 13755 struct SETCLIENTID_CONFIRM4args { 13756 clientid4 clientid; 13757 verifier4 setclientid_confirm; 13758 }; 13760 15.36.3. RESULT 13762 struct SETCLIENTID_CONFIRM4res { 13763 nfsstat4 status; 13764 }; 13766 15.36.4. DESCRIPTION 13768 This operation is used by the client to confirm the results from a 13769 previous call to SETCLIENTID. The client provides the server 13770 supplied (from a SETCLIENTID response) client ID. The server 13771 responds with a simple status of success or failure. 13773 15.36.5. IMPLEMENTATION 13775 The client must use the SETCLIENTID_CONFIRM operation to confirm the 13776 following two distinct cases: 13778 o The client's use of a new shorthand client identifier (as returned 13779 from the server in the response to SETCLIENTID), a new callback 13780 value (as specified in the arguments to SETCLIENTID) and a new 13781 callback_ident (as specified in the arguments to SETCLIENTID) 13782 value. The client's use of SETCLIENTID_CONFIRM in this case also 13783 confirms the removal of any of the client's previous relevant 13784 leased state. Relevant leased client state includes byte-range 13785 locks, share reservations, and, where the server does not support 13786 the CLAIM_DELEGATE_PREV claim type, delegations.
If the server 13787 supports CLAIM_DELEGATE_PREV, then SETCLIENTID_CONFIRM MUST NOT 13788 remove delegations for this client; relevant leased client state 13789 would then just include byte-range locks and share reservations. 13791 o The client's re-use of an old, previously confirmed, shorthand 13792 client identifier, a new callback value, and a new callback_ident 13793 value. The client's use of SETCLIENTID_CONFIRM in this case MUST 13794 NOT result in the removal of any previous leased state (locks, 13795 share reservations, and delegations). 13797 We use the same notation and definitions for v, x, c, k, s, and 13798 unconfirmed and confirmed client records as introduced in the 13799 description of the SETCLIENTID operation. The arguments to 13800 SETCLIENTID_CONFIRM are indicated by the notation { c, s }, where c 13801 is a value of type clientid4, and s is a value of type verifier4 13802 corresponding to the setclientid_confirm field. 13804 As with SETCLIENTID, SETCLIENTID_CONFIRM is a non-idempotent 13805 operation, and we assume that the server is implementing a 13806 duplicate request cache (DRC). 13808 When the server gets a SETCLIENTID_CONFIRM { c, s } request, it 13809 processes it in the following manner. 13811 o It first looks up the request in the DRC. If there is a hit, it 13812 returns the result cached in the DRC. The server does not remove 13813 any relevant leased client state nor does it modify any recorded 13814 callback and callback_ident information for client { x } as 13815 represented by the shorthand value c. 13817 For a DRC miss, the server checks for client records that match the 13818 shorthand value c. The processing cases are as follows: 13820 o The server has recorded an unconfirmed { v, x, c, k, s } record 13821 and a confirmed { v, x, c, l, t } record, such that s != t.
If 13822 the principals of the records do not match that of the 13823 SETCLIENTID_CONFIRM, the server returns NFS4ERR_CLID_INUSE, and no 13824 relevant leased client state is removed and no recorded callback 13825 and callback_ident information for client { x } is changed. 13826 Otherwise, the confirmed { v, x, c, l, t } record is removed and 13827 the unconfirmed { v, x, c, k, s } is marked as confirmed, thereby 13828 modifying recorded and confirmed callback and callback_ident 13829 information for client { x }. 13831 The server does not remove any relevant leased client state. 13833 The server returns NFS4_OK. 13835 o The server has not recorded an unconfirmed { v, x, c, *, * } and 13836 has recorded a confirmed { v, x, c, *, s }. If the principals of 13837 the record and of SETCLIENTID_CONFIRM do not match, the server 13838 returns NFS4ERR_CLID_INUSE without removing any relevant leased 13839 client state and without changing recorded callback and 13840 callback_ident values for client { x }. 13842 If the principals match, then what has likely happened is that the 13843 client never got the response from the SETCLIENTID_CONFIRM, and 13844 the DRC entry has been purged. Whatever the scenario, since the 13845 principals match, as well as { c, s } matching a confirmed record, 13846 the server leaves client x's relevant leased client state intact, 13847 leaves its callback and callback_ident values unmodified, and 13848 returns NFS4_OK. 13850 o The server has not recorded a confirmed { *, *, c, *, * }, and has 13851 recorded an unconfirmed { *, x, c, k, s }. Even if this is a 13852 retry from the client, nonetheless the client's first 13853 SETCLIENTID_CONFIRM attempt was not received by the server. Retry 13854 or not, the server doesn't know, but it processes it as if it were a 13855 first try.
If the principal of the unconfirmed { *, x, c, k, s } 13856 record does not match that of the SETCLIENTID_CONFIRM request, the 13857 server returns NFS4ERR_CLID_INUSE without removing any relevant 13858 leased client state. 13860 Otherwise, the server records a confirmed { *, x, c, k, s }. If 13861 there is also a confirmed { *, x, d, *, t }, the server MUST 13862 remove client x's relevant leased client state, and overwrite 13863 the callback state with k. The confirmed record { *, x, d, *, t } 13864 is removed. 13866 The server returns NFS4_OK. 13868 o The server has no record of a confirmed or unconfirmed { *, *, c, 13869 *, s }. The server returns NFS4ERR_STALE_CLIENTID. The server 13870 does not remove any relevant leased client state, nor does it 13871 modify any recorded callback and callback_ident information for 13872 any client. 13874 The server needs to cache unconfirmed { v, x, c, k, s } client 13875 records and await their confirmation for some time. As should be 13876 clear from the record processing discussions for SETCLIENTID and 13877 SETCLIENTID_CONFIRM, there are cases where the server does not 13878 deterministically remove unconfirmed client records. To avoid 13879 running out of resources, the server is not required to hold 13880 unconfirmed records indefinitely. One strategy the server might use 13881 is to set a limit on how many unconfirmed client records it will 13882 maintain, and then when the limit would be exceeded, remove the 13883 oldest record. Another strategy might be to remove an unconfirmed 13884 record when some amount of time has elapsed. The choice of the 13885 amount of time is fairly arbitrary but it is surely no higher than 13886 the server's lease time period. Consider that leases need to be 13887 renewed before the lease time expires via an operation from the 13888 client.
If the client cannot issue a SETCLIENTID_CONFIRM after a 13889 SETCLIENTID before a period of time equal to that of a lease expires, 13890 then the client is unlikely to be able to maintain state on the server 13891 during steady state operation. 13893 If the client does send a SETCLIENTID_CONFIRM for an unconfirmed 13894 record that the server has already deleted, the client will get 13895 NFS4ERR_STALE_CLIENTID back. If so, the client should then start 13896 over, and send SETCLIENTID to reestablish an unconfirmed client 13897 record and get back an unconfirmed client ID and setclientid_confirm 13898 verifier. The client should then send the SETCLIENTID_CONFIRM to 13899 confirm the client ID. 13901 SETCLIENTID_CONFIRM does not establish or renew a lease. However, if 13902 SETCLIENTID_CONFIRM removes relevant leased client state, and that 13903 state does not include existing delegations, the server MUST allow 13904 the client a period of time no less than the value of the lease_time 13905 attribute to reclaim (via the CLAIM_DELEGATE_PREV claim type of the 13906 OPEN operation) its delegations before removing unreclaimed 13907 delegations. 13909 15.37. Operation 37: VERIFY - Verify Same Attributes 13911 15.37.1. SYNOPSIS 13913 (cfh), fattr -> - 13915 15.37.2. ARGUMENT 13917 struct VERIFY4args { 13918 /* CURRENT_FH: object */ 13919 fattr4 obj_attributes; 13920 }; 13922 15.37.3. RESULT 13924 struct VERIFY4res { 13925 nfsstat4 status; 13926 }; 13928 15.37.4. DESCRIPTION 13930 The VERIFY operation is used to verify that attributes have a value 13931 assumed by the client before proceeding with following operations in 13932 the compound request. If any of the attributes do not match then the 13933 error NFS4ERR_NOT_SAME must be returned. The current filehandle 13934 retains its value after successful completion of the operation. 13936 15.37.5. IMPLEMENTATION 13938 One possible use of the VERIFY operation is the following compound 13939 sequence.
With this the client is attempting to verify that the file 13940 being removed will match what the client expects to be removed. This 13941 sequence can help prevent the unintended deletion of a file. 13943 PUTFH (directory filehandle) 13944 LOOKUP (file name) 13945 VERIFY (filehandle == fh) 13946 PUTFH (directory filehandle) 13947 REMOVE (file name) 13949 This sequence does not prevent a second client from removing and 13950 creating a new file in the middle of this sequence but it does help 13951 avoid the unintended result. 13953 In the case that a recommended attribute is specified in the VERIFY 13954 operation and the server does not support that attribute for the 13955 filesystem object, the error NFS4ERR_ATTRNOTSUPP is returned to the 13956 client. 13958 When the attribute rdattr_error or any write-only attribute (e.g., 13959 time_modify_set) is specified, the error NFS4ERR_INVAL is returned to 13960 the client. 13962 15.38. Operation 38: WRITE - Write to File 13964 15.38.1. SYNOPSIS 13966 (cfh), stateid, offset, stable, data -> count, committed, writeverf 13968 15.38.2. ARGUMENT 13970 enum stable_how4 { 13971 UNSTABLE4 = 0, 13972 DATA_SYNC4 = 1, 13973 FILE_SYNC4 = 2 13974 }; 13976 struct WRITE4args { 13977 /* CURRENT_FH: file */ 13978 stateid4 stateid; 13979 offset4 offset; 13980 stable_how4 stable; 13981 opaque data<>; 13982 }; 13984 15.38.3. RESULT 13986 struct WRITE4resok { 13987 count4 count; 13988 stable_how4 committed; 13989 verifier4 writeverf; 13990 }; 13992 union WRITE4res switch (nfsstat4 status) { 13993 case NFS4_OK: 13994 WRITE4resok resok4; 13995 default: 13996 void; 13997 }; 13999 15.38.4. DESCRIPTION 14001 The WRITE operation is used to write data to a regular file. The 14002 target file is specified by the current filehandle. The offset 14003 specifies the offset where the data should be written. An offset of 14004 0 (zero) specifies that the write should start at the beginning of 14005 the file. 
The count, as encoded as part of the opaque data 14006 parameter, represents the number of bytes of data that are to be 14007 written. If the count is 0 (zero), the WRITE will succeed and return 14008 a count of 0 (zero) subject to permissions checking. The server may 14009 choose to write fewer bytes than requested by the client. 14011 Part of the write request is a specification of how the write is to 14012 be performed. The client specifies with the stable parameter the 14013 method of how the data is to be processed by the server. If stable 14014 is FILE_SYNC4, the server must commit the data written plus all 14015 filesystem metadata to stable storage before returning results. This 14016 corresponds to the NFS version 2 protocol semantics. Any other 14017 behavior constitutes a protocol violation. If stable is DATA_SYNC4, 14018 then the server must commit all of the data to stable storage and 14019 enough of the metadata to retrieve the data before returning. The 14020 server implementor is free to implement DATA_SYNC4 in the same 14021 fashion as FILE_SYNC4, but with a possible performance drop. If 14022 stable is UNSTABLE4, the server is free to commit any part of the 14023 data and the metadata to stable storage, including all or none, 14024 before returning a reply to the client. There is no guarantee 14025 whether or when any uncommitted data will subsequently be committed 14026 to stable storage. The only guarantees made by the server are that 14027 it will not destroy any data without changing the value of verf and 14028 that it will not commit the data and metadata at a level less than 14029 that requested by the client. 14031 The stateid value for a WRITE request represents a value returned 14032 from a previous byte-range lock or share reservation request or the 14033 stateid associated with a delegation. 
The stateid is used by the 14034 server to verify that the associated share reservation and any byte- 14035 range locks are still valid and to update lease timeouts for the 14036 client. 14038 Upon successful completion, the following results are returned. The 14039 count result is the number of bytes of data written to the file. The 14040 server may write fewer bytes than requested. If so, the actual 14041 number of bytes written, starting at location offset, is returned. 14043 The server also returns an indication of the level of commitment of 14044 the data and metadata via committed. If the server committed all 14045 data and metadata to stable storage, committed should be set to 14046 FILE_SYNC4. If the level of commitment was at least as strong as 14047 DATA_SYNC4, then committed should be set to DATA_SYNC4. Otherwise, 14048 committed must be returned as UNSTABLE4. If stable was FILE_SYNC4, 14049 then committed must also be FILE_SYNC4: anything else constitutes a 14050 protocol violation. If stable was DATA_SYNC4, then committed may be 14051 FILE_SYNC4 or DATA_SYNC4: anything else constitutes a protocol 14052 violation. If stable was UNSTABLE4, then committed may be any of 14053 FILE_SYNC4, DATA_SYNC4, or UNSTABLE4. 14055 The final portion of the result is the write verifier. The write 14056 verifier is a cookie that the client can use to determine whether the 14057 server has changed instance (boot) state between a call to WRITE and 14058 a subsequent call to either WRITE or COMMIT. This cookie must be 14059 consistent during a single instance of the NFSv4 protocol service and 14060 must be unique between instances of the NFSv4 protocol server, where 14061 uncommitted data may be lost.
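As an illustration only, the rules above for the committed result and the write verifier can be sketched in Python. The function names committed_is_valid and server_instance_changed are hypothetical and are not part of the protocol; the enum values are those of stable_how4. The sketch relies on the observation that the numeric order of the stable_how4 values happens to match the order of commitment strength.

```python
from enum import IntEnum


class StableHow(IntEnum):
    """Values of stable_how4 from the WRITE argument definition."""
    UNSTABLE4 = 0
    DATA_SYNC4 = 1
    FILE_SYNC4 = 2


def committed_is_valid(stable: StableHow, committed: StableHow) -> bool:
    """Return True if 'committed' is a permitted reply for 'stable'.

    The server may commit at a stronger level than requested but never
    at a weaker one; a weaker level is a protocol violation.
    """
    return committed >= stable


def server_instance_changed(verf_seen: bytes, verf_now: bytes) -> bool:
    """Hypothetical client-side helper: compare the write verifier saved
    from an earlier WRITE with the one in a later WRITE or COMMIT reply.
    A difference means uncommitted data may have been lost."""
    return verf_seen != verf_now
```

A client that sees server_instance_changed return True for data written with committed set to UNSTABLE4 would need to send that data again.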
14063 If a client writes data to the server with the stable argument set to 14064 UNSTABLE4 and the reply yields a committed response of DATA_SYNC4 or 14065 UNSTABLE4, the client will follow up some time in the future with a 14066 COMMIT operation to synchronize outstanding asynchronous data and 14067 metadata with the server's stable storage, barring client error. It 14068 is possible that, due to client crash or other error, a subsequent 14069 COMMIT will not be received by the server. 14071 For a WRITE with a stateid value of all bits 0, the server MAY allow 14072 the WRITE to be serviced subject to mandatory file locks or the 14073 current share deny modes for the file. For a WRITE with a stateid 14074 value of all bits 1, the server MUST NOT allow the WRITE operation to 14075 bypass locking checks at the server; such a WRITE is treated exactly the same 14076 as if a stateid of all bits 0 were used. 14078 On success, the current filehandle retains its value. 14080 15.38.5. IMPLEMENTATION 14082 It is possible for the server to write fewer bytes of data than 14083 requested by the client. In this case, the server should not return 14084 an error unless no data was written at all. If the server writes 14085 fewer than the number of bytes specified, the client should issue 14086 another WRITE to write the remaining data. 14088 It is assumed that the act of writing data to a file will cause the 14089 time_modify of the file to be updated. However, the time_modify 14090 of the file should not be changed unless the contents of the file are 14091 changed. Thus, a WRITE request with count set to 0 should not cause 14092 the time_modify of the file to be updated. 14094 The definition of stable storage has been historically a point of 14095 contention. The following expected properties of stable storage may 14096 help in resolving design issues in the implementation. Stable 14097 storage is persistent storage that survives: 14099 1. Repeated power failures. 14101 2.
Hardware failures (of any board, power supply, etc.). 14103 3. Repeated software crashes, including reboot cycle. 14105 This definition does not address failure of the stable storage module 14106 itself. 14108 The verifier is defined to allow a client to detect different 14109 instances of an NFSv4 protocol server over which cached, uncommitted 14110 data may be lost. In the most likely case, the verifier allows the 14111 client to detect server reboots. This information is required so 14112 that the client can safely determine whether the server could have 14113 lost cached data. If the server fails unexpectedly and the client 14114 has uncommitted data from previous WRITE requests (done with the 14115 stable argument set to UNSTABLE4 and in which the result committed 14116 was returned as UNSTABLE4 as well), the server may not have flushed cached 14117 data to stable storage. The burden of recovery is on the client and 14118 the client will need to retransmit the data to the server. 14120 A suggested verifier would be to use the time that the server was 14121 booted or the time the server was last started (if restarting the 14122 server without a reboot results in lost buffers). 14124 The committed field in the results allows the client to do more 14125 effective caching. If the server is committing all WRITE requests to 14126 stable storage, then it should return with committed set to 14127 FILE_SYNC4, regardless of the value of the stable field in the 14128 arguments. A server that uses an NVRAM accelerator may choose to 14129 implement this policy. The client can use this to increase the 14130 effectiveness of the cache by discarding cached data that has already 14131 been committed on the server. 14133 Some implementations may return NFS4ERR_NOSPC instead of 14134 NFS4ERR_DQUOT when a user's quota is exceeded. In the case that the 14135 current filehandle is a directory, the server will return 14136 NFS4ERR_ISDIR.
If the current filehandle is not a regular file or a 14137 directory, the server will return NFS4ERR_INVAL. 14139 If mandatory file locking is on for the file, and the corresponding 14140 record of the data to be written to the file is read or write locked by an 14141 owner that is not associated with the stateid, the server will return 14142 NFS4ERR_LOCKED. If so, the client must check if the owner 14143 corresponding to the stateid used with the WRITE operation has a 14144 conflicting read lock that overlaps with the region that was to be 14145 written. If the stateid's owner has no conflicting read lock, then 14146 the client should try to get the appropriate write byte-range lock 14147 via the LOCK operation before re-attempting the WRITE. When the 14148 WRITE completes, the client should release the byte-range lock via 14149 LOCKU. 14151 If the stateid's owner had a conflicting read lock, then the client 14152 has no choice but to return an error to the application that 14153 attempted the WRITE. The reason is that since the stateid's owner 14154 had a read lock, the server either attempted to temporarily 14155 upgrade this read lock to a write lock, or the server has 14156 no upgrade capability. If the server attempted to upgrade the read 14157 lock and failed, it is pointless for the client to re-attempt the 14158 upgrade via the LOCK operation, because there might be another client 14159 also trying to upgrade. If two clients are blocked trying to upgrade 14160 the same lock, the clients deadlock. If the server has no upgrade 14161 capability, then it is pointless to try a LOCK operation to upgrade. 14163 15.39. Operation 39: RELEASE_LOCKOWNER - Release Lockowner State 14165 15.39.1. SYNOPSIS 14167 lockowner -> () 14169 15.39.2. ARGUMENT 14171 struct RELEASE_LOCKOWNER4args { 14172 lock_owner4 lock_owner; 14173 }; 14175 15.39.3. RESULT 14177 struct RELEASE_LOCKOWNER4res { 14178 nfsstat4 status; 14179 }; 14181 15.39.4.
DESCRIPTION 14183 This operation is used to notify the server that the lock_owner is no 14184 longer in use by the client. This allows the server to release 14185 cached state related to the specified lock_owner. If file locks, 14186 associated with the lock_owner, are held at the server, the error 14187 NFS4ERR_LOCKS_HELD will be returned and no further action will be 14188 taken. 14190 15.39.5. IMPLEMENTATION 14192 The client may choose to use this operation to ease the amount of 14193 server state that is held. Depending on behavior of applications at 14194 the client, it may be important for the client to use this operation 14195 since the server has certain obligations with respect to holding a 14196 reference to a lock_owner as long as the associated file is open. 14198 Therefore, if the client knows for certain that the lock_owner will 14199 no longer be used under the context of the associated open_owner4, it 14200 should use RELEASE_LOCKOWNER. 14202 15.40. Operation 10044: ILLEGAL - Illegal operation 14204 15.40.1. SYNOPSIS 14206 -> () 14208 15.40.2. ARGUMENT 14210 void; 14212 15.40.3. RESULT 14214 struct ILLEGAL4res { 14215 nfsstat4 status; 14216 }; 14218 15.40.4. DESCRIPTION 14220 This operation is a placeholder for encoding a result to handle the 14221 case of the client sending an operation code within COMPOUND that is 14222 not supported. See Section 15.2.4 for more details. 14224 The status field of ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. 14226 15.40.5. IMPLEMENTATION 14228 A client will probably not send an operation with code OP_ILLEGAL but 14229 if it does, the response will be ILLEGAL4res just as it would be with 14230 any other invalid operation code. Note that if the server gets an 14231 illegal operation code that is not OP_ILLEGAL, and if the server 14232 checks for legal operation codes during the XDR decode phase, then 14233 the ILLEGAL4res would not be returned. 14235 16. 
NFSv4 Callback Procedures 14237 The procedures used for callbacks are defined in the following 14238 sections. In the interest of clarity, the terms "client" and 14239 "server" refer to NFS clients and servers, despite the fact that for 14240 an individual callback RPC, the sense of these terms would be 14241 precisely the opposite. 14243 16.1. Procedure 0: CB_NULL - No Operation 14245 16.1.1. SYNOPSIS 14247 14249 16.1.2. ARGUMENT 14251 void; 14253 16.1.3. RESULT 14255 void; 14257 16.1.4. DESCRIPTION 14259 Standard NULL procedure. Void argument, void response. Even though 14260 there is no direct functionality associated with this procedure, the 14261 server will use CB_NULL to confirm the existence of a path for RPCs 14262 from server to client. 14264 16.2. Procedure 1: CB_COMPOUND - Compound Operations 14266 16.2.1. SYNOPSIS 14268 compoundargs -> compoundres 14270 16.2.2. ARGUMENT 14272 enum nfs_cb_opnum4 { 14273 OP_CB_GETATTR = 3, 14274 OP_CB_RECALL = 4, 14275 OP_CB_ILLEGAL = 10044 14276 }; 14278 union nfs_cb_argop4 switch (unsigned argop) { 14279 case OP_CB_GETATTR: 14280 CB_GETATTR4args opcbgetattr; 14281 case OP_CB_RECALL: 14282 CB_RECALL4args opcbrecall; 14283 case OP_CB_ILLEGAL: void; 14284 }; 14285 struct CB_COMPOUND4args { 14286 comptag4 tag; 14287 uint32_t minorversion; 14288 uint32_t callback_ident; 14289 nfs_cb_argop4 argarray<>; 14290 }; 14292 16.2.3. RESULT 14294 union nfs_cb_resop4 switch (unsigned resop) { 14295 case OP_CB_GETATTR: CB_GETATTR4res opcbgetattr; 14296 case OP_CB_RECALL: CB_RECALL4res opcbrecall; 14297 case OP_CB_ILLEGAL: CB_ILLEGAL4res opcbillegal; 14298 }; 14300 struct CB_COMPOUND4res { 14301 nfsstat4 status; 14302 comptag4 tag; 14303 nfs_cb_resop4 resarray<>; 14304 }; 14306 16.2.4. DESCRIPTION 14308 The CB_COMPOUND procedure is used to combine one or more of the 14309 callback procedures into a single RPC request. The main callback RPC 14310 program has two main procedures: CB_NULL and CB_COMPOUND. 
All other 14311 operations use the CB_COMPOUND procedure as a wrapper. 14313 In the processing of the CB_COMPOUND procedure, the client may find 14314 that it does not have the available resources to execute any or all 14315 of the operations within the CB_COMPOUND sequence. In this case, the 14316 error NFS4ERR_RESOURCE will be returned for the particular operation 14317 within the CB_COMPOUND procedure where the resource exhaustion 14318 occurred. This assumes that all previous operations within the 14319 CB_COMPOUND sequence have been evaluated successfully. 14321 Contained within the CB_COMPOUND results is a 'status' field. This 14322 status must be equivalent to the status of the last operation that 14323 was executed within the CB_COMPOUND procedure. Therefore, if an 14324 operation incurred an error then the 'status' value will be the same 14325 error value as is being returned for the operation that failed. 14327 For the definition of the "tag" field, see Section 15.2. 14329 The value of callback_ident is supplied by the client during 14330 SETCLIENTID. The server must use the client supplied callback_ident 14331 during the CB_COMPOUND to allow the client to properly identify the 14332 server. 14334 Illegal operation codes are handled in the same way as they are 14335 handled for the COMPOUND procedure. 14337 16.2.5. IMPLEMENTATION 14339 The CB_COMPOUND procedure is used to combine individual operations 14340 into a single RPC request. The client interprets each of the 14341 operations in turn. If an operation is executed by the client and 14342 the status of that operation is NFS4_OK, then the next operation in 14343 the CB_COMPOUND procedure is executed. The client continues this 14344 process until there are no more operations to be executed or one of 14345 the operations has a status value other than NFS4_OK. 14347 16.2.6. Operation 3: CB_GETATTR - Get Attributes 14349 16.2.6.1. SYNOPSIS 14351 fh, attr_request -> attrmask, attr_vals 14353 16.2.6.2. 
ARGUMENT 14355 struct CB_GETATTR4args { 14356 nfs_fh4 fh; 14357 bitmap4 attr_request; 14358 }; 14360 16.2.6.3. RESULT 14362 struct CB_GETATTR4resok { 14363 fattr4 obj_attributes; 14364 }; 14366 union CB_GETATTR4res switch (nfsstat4 status) { 14367 case NFS4_OK: 14368 CB_GETATTR4resok resok4; 14369 default: 14370 void; 14371 }; 14373 16.2.6.4. DESCRIPTION 14375 The CB_GETATTR operation is used by the server to obtain the current 14376 modified state of a file that has been OPEN_DELEGATE_WRITE delegated. 14377 The attributes size and change are the only ones guaranteed to be 14378 serviced by the client. See Section 10.4.3 for a full description of 14379 how the client and server are to interact with the use of CB_GETATTR. 14381 If the filehandle specified is not one for which the client holds an 14382 OPEN_DELEGATE_WRITE delegation, an NFS4ERR_BADHANDLE error is 14383 returned. 14385 16.2.6.5. IMPLEMENTATION 14387 The client returns attrmask bits and the associated attribute values 14388 only for the change attribute and for attributes that it may change 14389 (time_modify and size). 14391 16.2.7. Operation 4: CB_RECALL - Recall an Open Delegation 14393 16.2.7.1. SYNOPSIS 14395 stateid, truncate, fh -> () 14397 16.2.7.2. ARGUMENT 14399 struct CB_RECALL4args { 14400 stateid4 stateid; 14401 bool truncate; 14402 nfs_fh4 fh; 14403 }; 14405 16.2.7.3. RESULT 14407 struct CB_RECALL4res { 14408 nfsstat4 status; 14409 }; 14411 16.2.7.4. DESCRIPTION 14413 The CB_RECALL operation is used to begin the process of recalling an 14414 open delegation and returning it to the server. 14416 The truncate flag is used to optimize recall for a file which is 14417 about to be truncated to zero. When it is set, the client is freed 14418 of the obligation to propagate modified data for the file to the server, 14419 since this data is irrelevant. 14421 If the handle specified is not one for which the client holds an open 14422 delegation, an NFS4ERR_BADHANDLE error is returned.
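A minimal sketch, in Python, of how a client might dispatch the CB_RECALL checks given in this section. Everything here is an assumption made for illustration: the Delegation record, the table keyed by filehandle, the handle_cb_recall function, and the use of strings in place of nfsstat4 values. It is not taken from any implementation.

```python
from dataclasses import dataclass

# Status names stand in for nfsstat4 values (illustrative only).
NFS4_OK = "NFS4_OK"
NFS4ERR_BADHANDLE = "NFS4ERR_BADHANDLE"
NFS4ERR_BAD_STATEID = "NFS4ERR_BAD_STATEID"


@dataclass
class Delegation:
    """Hypothetical client-side record of one open delegation."""
    stateid: bytes
    dirty: bool = False           # modified data not yet sent to the server
    recall_pending: bool = False  # a recall has been accepted
    skip_flush: bool = False      # truncate flag seen: dirty data is moot


def handle_cb_recall(delegations: dict, stateid: bytes,
                     truncate: bool, fh: bytes) -> str:
    """Validate a CB_RECALL and note that the recall has begun."""
    deleg = delegations.get(fh)
    if deleg is None:
        # No open delegation is held for this filehandle.
        return NFS4ERR_BADHANDLE
    if deleg.stateid != stateid:
        # The stateid does not correspond to the delegation for this file.
        return NFS4ERR_BAD_STATEID
    deleg.recall_pending = True
    # With truncate set, the file is about to be truncated to zero, so the
    # client need not propagate its modified data.
    deleg.skip_flush = truncate
    return NFS4_OK
```

The function only records that a recall has started; returning the delegation itself is a separate step.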
14424 If the stateid specified is not one corresponding to an open 14425 delegation for the file specified by the filehandle, an 14426 NFS4ERR_BAD_STATEID error is returned. 14428 16.2.7.5. IMPLEMENTATION 14430 The client should reply to the callback immediately. Replying does 14431 not complete the recall except when an error was returned. The 14432 recall is not complete until the delegation is returned using a 14433 DELEGRETURN. 14435 16.2.8. Operation 10044: CB_ILLEGAL - Illegal Callback Operation 14437 16.2.8.1. SYNOPSIS 14439 -> () 14441 16.2.8.2. ARGUMENT 14443 void; 14445 16.2.8.3. RESULT 14447 /* 14448 * CB_ILLEGAL: Response for illegal operation numbers 14449 */ 14450 struct CB_ILLEGAL4res { 14451 nfsstat4 status; 14452 }; 14454 16.2.8.4. DESCRIPTION 14456 This operation is a placeholder for encoding a result to handle the 14457 case of the server sending an operation code within CB_COMPOUND that is 14458 not supported. See Section 15.2.4 for more details. 14460 The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. 14462 16.2.8.5. IMPLEMENTATION 14464 A server will probably not send an operation with code OP_CB_ILLEGAL 14465 but if it does, the response will be CB_ILLEGAL4res just as it would 14466 be with any other invalid operation code. Note that if the client 14467 gets an illegal operation code that is not OP_CB_ILLEGAL, and if the 14468 client checks for legal operation codes during the XDR decode phase, 14469 then the CB_ILLEGAL4res would not be returned. 14471 17. Security Considerations 14473 NFS has historically used a model where, from an authentication 14474 perspective, the client was the entire machine, or at least the 14475 source IP address of the machine. The NFS server relied on the NFS 14476 client to make the proper authentication of the end-user. The NFS 14477 server in turn shared its files only to specific clients, as 14478 identified by the client's source IP address.
Given this model, the 14479 AUTH_SYS RPC security flavor simply identified the end-user using the 14480 client to the NFS server. When processing NFS responses, the client 14481 ensured that the responses came from the same IP address and port 14482 number that the request was sent to. While such a model is easy to 14483 implement and simple to deploy and use, it is certainly not a safe 14484 model. Thus, NFSv4 mandates that implementations support a security 14485 model that uses end to end authentication, where an end-user on a 14486 client mutually authenticates (via cryptographic schemes that do not 14487 expose passwords or keys in the clear on the network) to a principal 14488 on an NFS server. Consideration should also be given to the 14489 integrity and privacy of NFS requests and responses. The issues of 14490 end to end mutual authentication, integrity, and privacy are 14491 discussed as part of Section 3. 14493 Note that while NFSv4 mandates an end to end mutual authentication 14494 model, the "classic" model of machine authentication via IP address 14495 checking and AUTH_SYS identification can still be supported with the 14496 caveat that the AUTH_SYS flavor is neither MANDATORY nor RECOMMENDED 14497 by this specification, and so interoperability via AUTH_SYS is not 14498 assured. 14500 For reasons of reduced administration overhead, better performance 14501 and/or reduction of CPU utilization, users of NFSv4 implementations 14502 may choose to not use security mechanisms that enable integrity 14503 protection on each remote procedure call and response. The use of 14504 mechanisms without integrity leaves the customer vulnerable to an 14505 attacker in between the NFS client and server that modifies the RPC 14506 request and/or the response. While implementations are free to 14507 provide the option to use weaker security mechanisms, there are two 14508 operations in particular that warrant the implementation overriding 14509 user choices. 
14511 The first such operation is SECINFO. It is recommended that the 14512 client issue the SECINFO call such that it is protected with a 14513 security flavor that has integrity protection, such as RPCSEC_GSS 14514 with a security triple that uses either rpc_gss_svc_integrity or 14515 rpc_gss_svc_privacy (rpc_gss_svc_privacy includes integrity 14516 protection) service. Without integrity protection encapsulating 14517 SECINFO and therefore its results, an attacker in the middle could 14518 modify results such that the client might select a weaker algorithm 14519 in the set allowed by the server, making the client and/or server 14520 vulnerable to further attacks. 14522 The second operation that should definitely use integrity protection 14523 is any GETATTR for the fs_locations attribute. The attack has two 14524 steps. First, the attacker modifies the unprotected results of some 14525 operation to return NFS4ERR_MOVED. Second, when the client follows 14526 up with a GETATTR for the fs_locations attribute, the attacker 14527 modifies the results to cause the client to migrate its traffic to a 14528 server controlled by the attacker. 14530 Because the operations SETCLIENTID/SETCLIENTID_CONFIRM are 14531 responsible for the release of client state, it is imperative that 14532 the principal used for these operations be checked against and match 14533 the previous use of these operations. See Section 9.1.1 for further 14534 discussion. 14536 18. IANA Considerations 14538 This section uses terms that are defined in [41]. 14540 18.1. Named Attribute Definitions 14542 IANA will create a registry called the "NFSv4 Named Attribute 14543 Definitions Registry". 14545 The NFSv4 protocol supports the association of a file with zero or 14546 more named attributes. The name space identifiers for these 14547 attributes are defined as string names. The protocol does not define 14548 the specific assignment of the name space for these file attributes.
14549 An IANA registry will promote interoperability where common interests 14550 exist. While application developers are allowed to define and use 14551 attributes as needed, they are encouraged to register the attributes 14552 with IANA. 14554 Such registered named attributes are presumed to apply to all minor 14555 versions of NFSv4, including those defined subsequently to the 14556 registration. Where the named attribute is intended to be limited 14557 with regard to the minor versions for which it is not to be used, the 14558 assignment in the registry will clearly state the applicable limits. 14560 All assignments to the registry are made on a First Come First Served 14561 basis, per section 4.1 of [41]. The policy for each assignment is 14562 Specification Required, per section 4.1 of [41]. 14564 Under the NFSv4 specification, the name of a named attribute can in 14565 theory be up to 2^32 - 1 bytes in length, but in practice NFSv4 14566 clients and servers will be unable to handle a string that long. 14567 IANA should reject any assignment request with a named attribute that 14568 exceeds 128 UTF-8 characters. To give IESG the flexibility to set up 14569 bases of assignment of Experimental Use and Standards Action, the 14570 prefixes of "EXPE" and "STDS" are Reserved. The zero length named 14571 attribute name is Reserved. 14573 The prefix "PRIV" is allocated for Private Use. A site that wants to 14574 make use of unregistered named attributes without risk of conflicting 14575 with an assignment in IANA's registry should use the prefix "PRIV" in 14576 all of its named attributes. 14578 Because some NFSv4 clients and servers have case insensitive 14579 semantics, the fifteen additional lower case and mixed case 14580 permutations of each of "EXPE", "PRIV", and "STDS", are Reserved 14581 (e.g. "expe", "expE", "exPe", etc. are Reserved).
Similarly, IANA 14582 must not allow two assignments that would conflict if both named 14583 attributes were converted to a common case. 14585 The registry of named attributes is a list of assignments, each 14586 containing three fields. 14588 1. A US-ASCII string name that is the actual name of the attribute. 14589 This name must be unique. This string name can be 1 to 128 UTF-8 14590 characters long. 14592 2. A reference to the specification of the named attribute. The 14593 reference can consume up to 256 bytes (or more if IANA permits). 14595 3. The point of contact of the registrant. The point of contact can 14596 consume up to 256 bytes (or more if IANA permits). 14598 18.1.1. Initial Registry 14600 There is no initial registry. 14602 18.1.2. Updating Registrations 14604 The registrant is always permitted to update the point of contact 14605 field. To make any other change will require Expert Review or IESG 14606 Approval. 14608 18.2. ONC RPC Network Identifiers (netids) 14610 Section 2.2 discussed the r_netid field and the corresponding r_addr 14611 field of a clientaddr4 structure. The NFSv4 protocol depends on the 14612 syntax and semantics of these fields to effectively communicate 14613 callback information between client and server. Therefore, an IANA 14614 registry has been created to include the values defined in this 14615 document and to allow for future expansion based on transport usage/ 14616 availability. Additions to this ONC RPC Network Identifier registry 14617 must be done with the publication of an RFC. 14619 The initial values for this registry are as follows (some of this 14620 text is replicated from section 2.2 for clarity): 14622 The Network Identifier (or r_netid for short) is used to specify a 14623 transport protocol and associated universal address (or r_addr for 14624 short). The syntax of the Network Identifier is a US-ASCII string.
14625 The initial definitions for r_netid are: 14627 "tcp": TCP over IP version 4 14629 "udp": UDP over IP version 4 14631 "tcp6": TCP over IP version 6 14633 "udp6": UDP over IP version 6 14635 Note: the '"' marks are used for delimiting the strings for this 14636 document and are not part of the Network Identifier string. 14638 For the "tcp" and "udp" Network Identifiers the Universal Address or 14639 r_addr (for IPv4) is a US-ASCII string and is of the form: 14641 h1.h2.h3.h4.p1.p2 14643 The prefix, "h1.h2.h3.h4", is the standard textual form for 14644 representing an IPv4 address, which is always four octets long. 14645 Assuming big-endian ordering, h1, h2, h3, and h4 are, respectively, 14646 the first through fourth octets each converted to ASCII-decimal. 14647 Assuming big-endian ordering, p1 and p2 are, respectively, the first 14648 and second octets each converted to ASCII-decimal. For example, if a 14649 host, in big-endian order, has an address of 0x0A010307 and there is 14650 a service listening on, in big-endian order, port 0x020F (decimal 14651 527), then the complete universal address is "10.1.3.7.2.15". 14653 For the "tcp6" and "udp6" Network Identifiers the Universal Address 14654 or r_addr (for IPv6) is a US-ASCII string and is of the form: 14656 x1:x2:x3:x4:x5:x6:x7:x8.p1.p2 14658 The suffix "p1.p2" is the service port, and is computed the same way 14659 as with universal addresses for "tcp" and "udp". The prefix, "x1:x2: 14660 x3:x4:x5:x6:x7:x8", is the standard textual form for representing an 14661 IPv6 address as defined in Section 2.2 of [18]. Additionally, the 14662 two alternative forms specified in Section 2.2 of [18] are also 14663 acceptable. 14665 18.2.1. Initial Registry 14667 There is no initial registry. 14669 18.2.2. Updating Registrations 14671 The registrant is always permitted to update the point of contact 14672 field. To make any other change will require Expert Review or IESG 14673 Approval. 14675 19. References 14677 19.1.
Normative References 14679 [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement 14680 Levels", March 1997. 14682 [2] Haynes, T. and D. Noveck, "NFSv4 Version 0 XDR Description", 14683 draft-ietf-nfsv4-rfc3530bis-dot-x-02 (work in progress), 14684 Feb 2011. 14686 [3] Thurlow, R., "RPC: Remote Procedure Call Protocol Specification 14687 Version 2", RFC 5531, May 2009. 14689 [4] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol 14690 Specification", RFC 2203, September 1997. 14692 [5] Eisler, M., "LIPKEY - A Low Infrastructure Public Key Mechanism 14693 Using SPKM", RFC 2847, June 2000. 14695 [6] Linn, J., "Generic Security Service Application Program 14696 Interface Version 2, Update 1", RFC 2743, January 2000. 14698 [7] International Organization for Standardization, "Information 14699 Technology - Universal Multiple-octet coded Character Set (UCS) 14700 - Part 1: Architecture and Basic Multilingual Plane", 14701 ISO Standard 10646-1, May 1993. 14703 [8] Alvestrand, H., "IETF Policy on Character Sets and Languages", 14704 BCP 18, RFC 2277, January 1998. 14706 [9] Hoffman, P. and M. Blanchet, "Preparation of Internationalized 14707 Strings ("stringprep")", RFC 3454, December 2002. 14709 [10] Klensin, J., "Internationalized Domain Names in Applications 14710 (IDNA): Protocol", draft-ietf-idnabis-protocol-18 (work in 14711 progress), January 2010. 14713 19.2. Informative References 14715 [11] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, 14716 C., Eisler, M., and D. Noveck, "Network File System (NFS) 14717 version 4 Protocol", RFC 3530, April 2003. 14719 [12] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, 14720 C., Eisler, M., and D. Noveck, "Network File System (NFS) 14721 version 4 Protocol", RFC 3010, December 2000. 14723 [13] Nowicki, B., "NFS: Network File System Protocol specification", 14724 RFC 1094, March 1989. 14726 [14] Callaghan, B., Pawlowski, B., and P. 
Staubach, "NFS Version 3 14727 Protocol Specification", RFC 1813, June 1995. 14729 [15] Eisler, M., "XDR: External Data Representation Standard", 14730 RFC 4506, May 2006. 14732 [16] Linn, J., "The Kerberos Version 5 GSS-API Mechanism", RFC 1964, 14733 June 1996. 14735 [17] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", 14736 RFC 1833, August 1995. 14738 [18] Hinden, R. and S. Deering, "IP Version 6 Addressing 14739 Architecture", RFC 2373, July 1998. 14741 [19] Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by an On- 14742 line Database", RFC 3232, January 2002. 14744 [20] Floyd, S. and V. Jacobson, "The Synchronization of Periodic 14745 Routing Messages", IEEE/ACM Transactions on Networking 2(2), 14746 pp. 122-136, April 1994. 14748 [21] Eisler, M., "NFS Version 2 and Version 3 Security Issues and 14749 the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5", 14750 RFC 2623, June 1999. 14752 [22] Adams, C., "The Simple Public-Key GSS-API Mechanism (SPKM)", 14753 RFC 2025, October 1996. 14755 [23] Callaghan, B., "WebNFS Client Specification", RFC 2054, 14756 October 1996. 14758 [24] Callaghan, B., "WebNFS Server Specification", RFC 2055, 14759 October 1996. 14761 [25] IESG, "IESG Processing of RFC Errata for the IETF Stream", 14762 July 2008. 14764 [26] The Open Group, "Section 'read()' of System Interfaces of The 14765 Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 14766 Edition", 2004. 14768 [27] The Open Group, "Section 'readdir()' of System Interfaces of 14769 The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 14770 2004 Edition", 2004. 14772 [28] The Open Group, "Section 'write()' of System Interfaces of The 14773 Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 14774 Edition", 2004. 14776 [29] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, 14777 June 1999. 14779 [30] Simonsen, K., "Character Mnemonics and Character Sets", 14780 RFC 1345, June 1992. 
14782 [31] Shepler, S., Eisler, M., and D. Noveck, "Network File System 14783 (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, 14784 January 2010. 14786 [32] The Open Group, "Protocols for Interworking: XNFS, Version 3W, 14787 ISBN 1-85912-184-5", February 1998. 14789 [33] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, 14790 September 1981. 14792 [34] Juszczak, C., "Improving the Performance and Correctness of an 14793 NFS Server", USENIX Conference Proceedings , June 1990. 14795 [35] The Open Group, "Section 'fcntl()' of System Interfaces of The 14796 Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 14797 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 14798 2004. 14800 [36] The Open Group, "Section 'fsync()' of System Interfaces of The 14801 Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 14802 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 14803 2004. 14805 [37] The Open Group, "Section 'getpwnam()' of System Interfaces of 14806 The Open Group Base Specifications Issue 6 IEEE Std 1003.1, 14807 2004 Edition, HTML Version (www.opengroup.org), ISBN 14808 1931624232", 2004. 14810 [38] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997. 14812 [39] Chiu, A., Eisler, M., and B. Callaghan, "Security Negotiation 14813 for WebNFS", RFC 2755, January 2000. 14815 [40] The Open Group, "Section 'unlink()' of System Interfaces of The 14816 Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 14817 Edition, HTML Version (www.opengroup.org), ISBN 1931624232", 14818 2004. 14820 [41] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA 14821 Considerations Section in RFCs", BCP 26, RFC 5226, May 2008. 14823 Appendix A. Acknowledgments 14825 A bis is certainly built on the shoulders of the first attempt. 14826 Spencer Shepler, Brent Callaghan, David Robinson, Robert Thurlow, 14827 Carl Beame, Mike Eisler, and David Noveck are responsible for a great 14828 deal of the effort in this work. 
14830 Rob Thurlow clarified how a client should contact a new server if a 14831 migration has occurred. 14833 David Black, Nico Williams, Mike Eisler, Trond Myklebust, and James 14834 Lentini read many drafts of Section 12 and contributed numerous 14835 useful suggestions, without which the necessary revision of that 14836 section for this document would not have been possible. 14838 Peter Staubach read almost all of the drafts of Section 12 leading to 14839 the published result and his numerous comments were always useful and 14840 contributed substantially to improving the quality of the final 14841 result. 14843 James Lentini graciously read the rewrite of Section 7 and his 14844 comments were vital in improving the quality of that effort. 14846 Rob Thurlow, Sorin Faibish, James Lentini, Bruce Fields, and Trond 14847 Myklebust were faithful attendants of the biweekly triage meeting and 14848 accepted many an action item. 14850 Bruce Fields was a good sounding board for both the Third Edge 14851 Condition and Courtesy Locks in general. 14853 Appendix B. RFC Editor Notes 14855 [RFC Editor: please remove this section prior to publishing this 14856 document as an RFC] 14858 [RFC Editor: prior to publishing this document as an RFC, please 14859 replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the 14860 RFC number of this document] 14862 Authors' Addresses 14864 Thomas Haynes (editor) 14865 NetApp 14866 9110 E 66th St 14867 Tulsa, OK 74133 14868 USA 14870 Phone: +1 918 307 1415 14871 Email: thomas@netapp.com 14872 URI: http://www.tulsalabs.com 14874 David Noveck (editor) 14875 EMC Corporation 14876 32 Coslin Drive 14877 Southborough, MA 01772 14878 US 14880 Phone: +1 508 305 8404 14881 Email: novecd@emc.com
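Illustrative note (not part of the draft): the universal address (r_addr) encoding described in Section 18.2 can be sketched as follows. This is a minimal sketch, not a normative implementation. The function names format_uaddr and parse_uaddr are hypothetical and do not appear in the specification. It assumes the port is carried as the last two dot-separated decimal octets, high octet first, as the text describes for "tcp", "udp", "tcp6", and "udp6". It relies on Python's standard ipaddress module to validate the address prefix.

```python
import ipaddress
from typing import Tuple


def format_uaddr(host: str, port: int) -> str:
    """Build an r_addr string: textual IP address followed by .p1.p2.

    Hypothetical helper for illustration only.
    """
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in two octets")
    # Validate the address; str() gives dotted-quad for IPv4 and the
    # compressed colon-hex form for IPv6.
    addr = ipaddress.ip_address(host)
    p1, p2 = port >> 8, port & 0xFF  # high octet first (big-endian)
    return f"{addr}.{p1}.{p2}"


def _octet(text: str) -> int:
    """Convert one ASCII-decimal port octet, rejecting out-of-range values."""
    if not (text.isascii() and text.isdigit()) or int(text) > 255:
        raise ValueError(f"invalid port octet: {text!r}")
    return int(text)


def parse_uaddr(uaddr: str) -> Tuple[str, int]:
    """Split an r_addr string into (textual address, port number).

    Hypothetical helper for illustration only.
    """
    # The port is always the last two dot-separated fields, so splitting
    # from the right works for both IPv4 and IPv6 prefixes.
    parts = uaddr.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError("universal address must end in .p1.p2")
    host, p1, p2 = parts
    addr = ipaddress.ip_address(host)  # raises ValueError if malformed
    return str(addr), (_octet(p1) << 8) | _octet(p2)
```

With the example from Section 18.2, format_uaddr("10.1.3.7", 527) yields "10.1.3.7.2.15", and parse_uaddr reverses it. Splitting from the right keeps the sketch independent of how many dots or colons the address prefix contains.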