NFSv4                                                        S. Shepler
Internet-Draft                                                M. Eisler
Intended status: Standards Track                             D. Noveck
Expires: November 13, 2008                                      Editors
                                                            May 12, 2008

                     NFS Version 4 Minor Version 1
                 draft-ietf-nfsv4-minorversion1-23.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 13, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This Internet-Draft describes NFS version 4 minor version one,
   including features retained from the base protocol and protocol
   extensions made subsequently.  Major extensions introduced in NFS
   version 4 minor version one include: Sessions, Directory
   Delegations, and parallel NFS (pNFS).

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Table of Contents

   56 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 11 57 1.1. The NFS Version 4 Minor Version 1 Protocol . . . . . . . 11 58 1.2. Scope of this Document . . . . . . . . . . . . . . . . . 11 59 1.3. NFSv4 Goals . . . . . . . . . . . . . . . . . . . . . . 11 60 1.4. NFSv4.1 Goals . . . . . . . . . . . . . . . . . . . . . 12 61 1.5. General Definitions . . . . . . . . . . . . . . . . . . 12 62 1.6. Overview of NFSv4.1 Features . . . . . . . . . . . . . . 15 63 1.6.1. RPC and Security . . . . . . . . . . . . . . . . . . 15 64 1.6.2. Protocol Structure . . . . . . . . . . . . . . . . . 15 65 1.6.3. File System Model . . . . . . . . . . . . . . . . . 16 66 1.6.4. Locking Facilities . . . . . . . . . . . . . . . . . 18 67 1.7. Differences from NFSv4.0 . . . . . . . . .
. . . 18 68 2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 19 69 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 20 70 2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 20 71 2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 20 72 2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 23 73 2.4. Client Identifiers and Client Owners . . . . . . . . . . 24 74 2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 27 75 2.4.2. Server Release of Client ID . . . . . . . . . . . . 28 76 2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 28 77 2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 29 78 2.6. Security Service Negotiation . . . . . . . . . . . . . . 30 79 2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 30 80 2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 30 81 2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 31 82 2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 35 83 2.8. Non-RPC-based Security Services . . . . . . . . . . . . 38 84 2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 38 85 2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 38 86 2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 38 87 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 38 88 2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 38 89 2.9.2. Client and Server Transport Behavior . . . . . . . . 39 90 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 41 91 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 41 92 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 41 93 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 42 94 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 44 95 2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 45 96 2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 48 97 2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 61 98 2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 63 99 2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 69 100 2.10.9. Session Mechanics - Steady State . . . . . . . . . . 73 101 2.10.10. Session Inactivity Timer . . . . . . . . . . . . . . 75 102 2.10.11. Session Mechanics - Recovery . . . . . . . . . . . . 75 103 2.10.12. Parallel NFS and Sessions . . . . . . . . . . . . . 78 104 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 78 105 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 79 106 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 79 107 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 81 108 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 90 109 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 90 110 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 91 111 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 91 112 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 91 113 4.2.1. General Properties of a Filehandle . . . . . . . . . 92 114 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 93 115 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 93 116 4.3. One Method of Constructing a Volatile Filehandle . . . . 94 117 4.4. Client Recovery from Filehandle Expiration . . . . . . . 95 118 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 96 119 5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 97 120 5.2. RECOMMENDED Attributes . . . . . . . 
. . . . . . . . . . 97 121 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 98 122 5.4. Classification of Attributes . . . . . . . . . . . . . . 99 123 5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 100 124 5.6. REQUIRED Attributes - List and Definition References . . 100 125 5.7. RECOMMENDED Attributes - List and Definition 126 References . . . . . . . . . . . . . . . . . . . . . . . 101 127 5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 103 128 5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 103 129 5.8.2. Definitions of Uncategorized RECOMMENDED 130 Attributes . . . . . . . . . . . . . . . . . . . . . 105 131 5.9. Interpreting owner and owner_group . . . . . . . . . . . 112 132 5.10. Character Case Attributes . . . . . . . . . . . . . . . 114 133 5.11. Directory Notification Attributes . . . . . . . . . . . 114 134 5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 114 135 5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 116 136 6. Access Control Attributes . . . . . . . . . . . . . . . . . . 119 137 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 119 138 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 120 139 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 120 140 6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 135 141 6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 135 142 6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 135 143 6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 136 144 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 137 145 6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 137 146 6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 138 147 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 139 148 6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 139 149 6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 141 150 6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 141 151 7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 145 152 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 145 153 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 146 154 7.3. Server Pseudo File System . . . . . . . . . . . . . . . 146 155 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 147 156 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 147 157 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 147 158 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 148 159 7.8. Security Policy and Namespace Presentation . . . . . . . 148 160 8. State Management . . . . . . . . . . . . . . . . . . . . . . 149 161 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 150 162 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 150 163 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 151 164 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 152 165 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 154 166 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 155 167 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 158 168 8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 159 169 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 159 170 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 161 171 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 162 172 8.4.2. Server Failure and Recovery . . . . . . . 
. . . . . 163 173 8.4.3. Network Partitions and Recovery . . . . . . . . . . 166 174 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 171 175 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 172 176 8.7. Clocks, Propagation Delay, and Calculating Lease 177 Expiration . . . . . . . . . . . . . . . . . . . . . . . 172 178 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 173 179 9. File Locking and Share Reservations . . . . . . . . . . . . . 174 180 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 174 181 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 174 182 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 175 183 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 178 184 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 178 185 9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . 179 186 9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 179 187 9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 180 188 9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 181 189 9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 181 190 9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 182 191 9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 183 192 9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 184 193 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 184 194 10.1. Performance Challenges for Client-Side Caching . . . . . 185 195 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 186 196 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 188 197 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 190 198 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 190 199 10.3.2. Data Caching and File Locking . . . . . . . . . . . 191 200 10.3.3. Data Caching and Mandatory File Locking . . . . . . 193 201 10.3.4. Data Caching and File Identity . . . . . . . . . . . 193 202 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 195 203 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 197 204 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 198 205 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 199 206 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 202 207 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 204 208 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 204 209 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 205 210 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 206 211 10.5.1. Revocation Recovery for Write Open Delegation . . . 206 212 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 207 213 10.7. Data and Metadata Caching and Memory Mapped Files . . . 209 214 10.8. Name and Directory Caching without Directory 215 Delegations . . . . . . . . . . . . . . . . . . . . . . 211 216 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 211 217 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 213 218 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 214 219 10.9.1. Introduction to Directory Delegations . . . . . . . 214 220 10.9.2. Directory Delegation Design . . . . . . . . . . . . 215 221 10.9.3. Attributes in Support of Directory Notifications . . 216 222 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 216 223 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 217 224 11. Multi-Server Namespace . . . . . 
. . . . . . . . . . . . . . 217 225 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 217 226 11.2. File System Presence or Absence . . . . . . . . . . . . 218 227 11.3. Getting Attributes for an Absent File System . . . . . . 219 228 11.3.1. GETATTR Within an Absent File System . . . . . . . . 219 229 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 220 230 11.4. Uses of Location Information . . . . . . . . . . . . . . 221 231 11.4.1. File System Replication . . . . . . . . . . . . . . 222 232 11.4.2. File System Migration . . . . . . . . . . . . . . . 222 233 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 224 234 11.5. Location Entries and Server Identity . . . . . . . . . . 225 235 11.6. Additional Client-side Considerations . . . . . . . . . 226 236 11.7. Effecting File System Transitions . . . . . . . . . . . 226 237 11.7.1. File System Transitions and Simultaneous Access . . 228 238 11.7.2. Simultaneous Use and Transparent Transitions . . . . 228 239 11.7.3. Filehandles and File System Transitions . . . . . . 231 240 11.7.4. Fileids and File System Transitions . . . . . . . . 231 241 11.7.5. Fsids and File System Transitions . . . . . . . . . 233 242 11.7.6. The Change Attribute and File System Transitions . . 233 243 11.7.7. Lock State and File System Transitions . . . . . . . 234 244 11.7.8. Write Verifiers and File System Transitions . . . . 238 245 11.7.9. Readdir Cookies and Verifiers and File System 246 Transitions . . . . . . . . . . . . . . . . . . . . 238 247 11.7.10. File System Data and File System Transitions . . . . 238 248 11.8. Effecting File System Referrals . . . . . . . . . . . . 240 249 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 240 250 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 244 251 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 246 252 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 249 253 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 253 254 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 258 255 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 259 256 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 261 257 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 265 258 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 265 259 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 266 260 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 267 261 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 267 262 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 267 263 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 267 264 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 268 265 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 268 266 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 268 267 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 269 268 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269 269 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270 270 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271 271 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272 272 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272 273 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272 274 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273 275 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 
274 276 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 276 277 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 279 278 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287 279 12.5.7. Metadata Server Write Propagation . . . . . . . . . 287 280 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287 281 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289 282 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289 283 12.7.2. Dealing with Lease Expiration on the Client . . . . 290 284 12.7.3. Dealing with Loss of Layout State on the Metadata 285 Server . . . . . . . . . . . . . . . . . . . . . . . 291 286 12.7.4. Recovery from Metadata Server Restart . . . . . . . 291 287 12.7.5. Operations During Metadata Server Grace Period . . . 293 288 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 294 289 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 294 290 12.9. Security Considerations for pNFS . . . . . . . . . . . . 294 291 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 295 292 13.1. Client ID and Session Considerations . . . . . . . . . . 296 293 13.1.1. Sessions Considerations for Data Servers . . . . . . 298 294 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 298 295 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 299 296 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 303 297 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 303 298 13.4.2. Interpreting the File Layout Using Sparse Packing . 303 299 13.4.3. Interpreting the File Layout Using Dense Packing . . 306 300 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 308 301 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 310 302 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 311 303 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 313 304 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 315 305 13.9. Metadata and Data Server State Coordination . . . . . . 315 306 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 315 307 13.9.2. Data Server State Propagation . . . . . . . . . . . 316 308 13.10. Data Server Component File Size . . . . . . . . . . . . 318 309 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 319 310 13.12. Security Considerations for the File Layout Type . . . . 319 311 14. Internationalization . . . . . . . . . . . . . . . . . . . . 320 312 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 321 313 14.2. Stringprep profile for the utf8str_cis type . . . . . . 323 314 14.3. Stringprep profile for the utf8str_mixed type . . . . . 324 315 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 326 316 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 326 317 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 327 318 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 327 319 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 329 320 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 331 321 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 332 322 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 334 323 15.1.5. State Management Errors . . . . . . . . . . . . . . 336 324 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 337 325 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 337 326 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 
338 327 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 339 328 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 340 329 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 341 330 15.1.12. Session Management Errors . . . . . . . . . . . . . 343 331 15.1.13. Client Management Errors . . . . . . . . . . . . . . 343 332 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 344 333 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 344 334 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 345 336 15.2. Operations and their valid errors . . . . . . . . . . . 346 337 15.3. Callback operations and their valid errors . . . . . . . 362 338 15.4. Errors and the operations that use them . . . . . . . . 364 339 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 378 340 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 378 341 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 379 342 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 390 343 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 393 344 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 393 345 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 399 346 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 400 347 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 403 348 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 349 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 406 350 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 407 351 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 407 352 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 409 353 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 410 354 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 413 355 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 417 356 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 418 357 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 420 358 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 421 359 18.15. Operation 17: NVERIFY - Verify Difference in 360 Attributes . . . . . . . . . . . . . . . . . . . . . . . 423 361 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 424 362 18.17. Operation 19: OPENATTR - Open Named Attribute 363 Directory . . . . . . . . . . . . . . . . . . . . . . . 443 364 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 444 365 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 446 366 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 446 367 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 448 368 18.22. Operation 25: READ - Read from File . . . . . . . . . . 449 369 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 451 370 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 455 371 18.25. Operation 28: REMOVE - Remove File System Object . . . . 456 372 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 458 373 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 462 374 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 463 375 18.29. Operation 33: SECINFO - Obtain Available Security . . . 464 376 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 468 377 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 471 378 18.32. Operation 38: WRITE - Write to File . . . . . . 
. . . . 472 379 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 476 380 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 478 381 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 481 382 18.36. Operation 43: CREATE_SESSION - Create New Session and 383 Confirm Client ID . . . . . . . . . . . . . . . . . . . 498 385 18.37. Operation 44: DESTROY_SESSION - Destroy existing 386 session . . . . . . . . . . . . . . . . . . . . . . . . 508 387 18.38. Operation 45: FREE_STATEID - Free stateid with no 388 locks . . . . . . . . . . . . . . . . . . . . . . . . . 509 389 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 390 delegation . . . . . . . . . . . . . . . . . . . . . . . 510 391 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 514 392 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 393 for a File System . . . . . . . . . . . . . . . . . . . 516 394 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 395 a layout . . . . . . . . . . . . . . . . . . . . . . . . 518 396 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 521 397 18.44. Operation 51: LAYOUTRETURN - Release Layout 398 Information . . . . . . . . . . . . . . . . . . . . . . 531 399 18.45. Operation 52: SECINFO_NO_NAME - Get Security on 400 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 535 401 18.46. Operation 53: SEQUENCE - Supply per-procedure 402 sequencing and control . . . . . . . . . . . . . . . . . 537 403 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 542 404 18.48. Operation 55: TEST_STATEID - Test stateids for 405 validity . . . . . . . . . . . . . . . . . . . . . . . . 544 406 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 546 407 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 408 client ID . . . . . . . . . . . . . . . . . . . . . . . 550 409 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 410 Finished . . . . . . . . . . . . . . . . . . . . . . . . 550 411 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 553 412 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 553 413 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 554 414 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 554 415 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 558 416 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 558 417 20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 559 418 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from 419 Client . . . . . . . . . . . . . . . . . . . . . . . . . 560 420 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 564 421 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to 422 Client . . . . . . . . . . . . . . . . . . . . . . . . . 568 423 20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable 424 objects . . . . . . . . . . . . . . . . . . . . . . . . 569 425 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 426 Resources for Recallable Objects . . . . . . . . . . . . 572 427 20.8. Operation 10: CB_RECALL_SLOT - change flow control 428 limits . . . . . . . . . . . . . . . . . . . . . . . . . 573 429 20.9. Operation 11: CB_SEQUENCE - Supply backchannel 430 sequencing and control . . . . . . . . . . . . . . . . . 574 431 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 432 Delegation Wants . . . . . . . . . . . . . . . . . . . . 576 434 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 435 lock availability . . . . . 
. . . . . . . . . . . . . . 577 436 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID 437 changes . . . . . . . . . . . . . . . . . . . . . . . . 579 438 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 439 Operation . . . . . . . . . . . . . . . . . . . . . . . 581 440 21. Security Considerations . . . . . . . . . . . . . . . . . . . 581 441 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 583 442 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 583 443 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 583 444 22.3. Defining New Notifications . . . . . . . . . . . . . . . 584 445 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 584 446 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 586 447 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 586 448 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 586 449 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 586 450 23.1. Normative References . . . . . . . . . . . . . . . . . . 586 451 23.2. Informative References . . . . . . . . . . . . . . . . . 588 452 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 590 453 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 592 454 Intellectual Property and Copyright Statements . . . . . . . . . 593

1.  Introduction

1.1.  The NFS Version 4 Minor Version 1 Protocol

   The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
   minor version of the NFS version 4 (NFSv4) protocol.  The first
   minor version, NFSv4.0, is described in [21].  NFSv4.1 generally
   follows the guidelines for the minor versioning model listed in
   Section 10 of RFC 3530.  However, it diverges from guidelines 11 ("a
   client and server that supports minor version X must support minor
   versions 0 through X-1") and 12 ("no features may be introduced as
   mandatory in a minor version").  These divergences are due to the
   introduction of the sessions model for managing non-idempotent
   operations and the RECLAIM_COMPLETE operation.  These two new
   features are infrastructural in nature and simplify implementation
   of existing and other new features.  Making them anything but
   REQUIRED would add undue complexity to protocol definition and
   implementation.  NFSv4.1 accordingly updates the Minor Versioning
   guidelines (Section 2.7).

   As a minor version, NFSv4.1 is consistent with the overall goals for
   NFSv4, but extends the protocol so as to better meet those goals,
   based on experiences with NFSv4.0.  In addition, NFSv4.1 has adopted
   some additional goals, which motivate some of the major extensions
   in NFSv4.1.

1.2.  Scope of this Document

   This document describes the NFSv4.1 protocol.  With respect to
   NFSv4.0, this document does not:

   o  describe the NFSv4.0 protocol, except where needed to contrast
      with NFSv4.1.

   o  modify the specification of the NFSv4.0 protocol.

   o  clarify the NFSv4.0 protocol.

1.3.  NFSv4 Goals

   The NFSv4 protocol is a further revision of the NFS protocol defined
   already by NFSv3 [22].  It retains the essential characteristics of
   previous versions: easy recovery; independence of transport
   protocols, operating systems, and file systems; simplicity; and good
   performance.  NFSv4 has the following goals:

   o  Improved access and good performance on the Internet.

      The protocol is designed to transit firewalls easily, perform
      well where latency is high and bandwidth is low, and scale to
      very large numbers of clients per server.

   o  Strong security with negotiation built into the protocol.

      The protocol builds on the work of the ONCRPC working group in
      supporting the RPCSEC_GSS protocol.  Additionally, the NFSv4.1
      protocol provides a mechanism that allows clients and servers to
      negotiate security, and requires clients and servers to support a
      minimal set of security schemes.

   o  Good cross-platform interoperability.

      The protocol features a file system model that provides a useful,
      common set of features that does not unduly favor one file system
      or operating system over another.

   o  Designed for protocol extensions.

      The protocol is designed to accept standard extensions within a
      framework that enables and encourages backward compatibility.

1.4.  NFSv4.1 Goals

   NFSv4.1 has the following goals, within the framework established by
   the overall NFSv4 goals.

   o  To correct significant structural weaknesses and oversights
      discovered in the base protocol.

   o  To add clarity and specificity to areas left unaddressed or not
      addressed in sufficient detail in the base protocol.  However, as
      stated in Section 1.2, it is not a goal to clarify the NFSv4.0
      protocol in the NFSv4.1 specification.

   o  To add specific features based on experience with the existing
      protocol and recent industry developments.

   o  To provide protocol support to take advantage of clustered server
      deployments, including the ability to provide scalable parallel
      access to files distributed among multiple servers.

1.5.  General Definitions

   The following definitions provide an appropriate context for the
   reader.

   Byte  This document defines a byte as an octet, i.e., a datum
      exactly 8 bits in length.

   Client  The "client" is the entity that accesses the NFS server's
      resources.  The client may be an application that contains the
      logic to access the NFS server directly.  The client may also be
      the traditional operating system client that provides remote file
      system services for a set of applications.

      A client is uniquely identified by a Client Owner.

      With reference to file locking, the client is also the entity
      that maintains a set of locks on behalf of one or more
      applications.  This client is responsible for crash or failure
      recovery for those locks it manages.

      Note that multiple clients may share the same transport and
      connection, and multiple clients may exist on the same network
      node.

   Client ID  A 64-bit quantity used as a unique, short-hand reference
      to a client-supplied verifier and client owner.  The server is
      responsible for supplying the client ID.

   Client Owner  The client owner is a unique string, opaque to the
      server, which identifies a client.  Multiple network connections
      and source network addresses originating from those connections
      may share a client owner.  The server is expected to treat
      requests from connections with the same client owner as coming
      from the same client.

   File System  The collection of objects on a server (as identified by
      the major identifier of a Server Owner, which is defined later in
      this section) that share the same fsid attribute (see
      Section 5.8.1.9).

   Lease  An interval of time defined by the server for which the
      client is irrevocably granted a lock.  At the end of a lease
      period, the lock may be revoked if the lease has not been
      extended.  The lock must be revoked if a conflicting lock has
      been granted after the lease interval.

      All leases granted by a server have the same fixed interval.
      Note that the fixed interval was chosen to alleviate the expense
      a server would have in maintaining state about variable-length
      leases across server failures.

   Lock  The term "lock" is used to refer to byte-range (in UNIX
      environments, also known as record) locks, share reservations,
      delegations, or layouts unless specifically stated otherwise.

   Server  The "Server" is the entity responsible for coordinating
      client access to a set of file systems and is identified by a
      Server Owner.  A server can span multiple network addresses.

   Server Owner  The "Server Owner" identifies the server to the
      client.  The server owner consists of a major and minor
      identifier.  When the client has two connections, each to a peer
      with the same major identifier, the client assumes that both
      peers are the same server (the server namespace is the same via
      each connection), and assumes that lock state is sharable across
      both connections.  When each peer has both the same major and
      minor identifier, the client assumes that each connection might
      be associated with the same session.

   Stable Storage  NFSv4.1 servers must be able to recover without data
      loss from multiple power failures (including cascading power
      failures, that is, several power failures in quick succession),
      operating system failures, and hardware failure of components
      other than the storage medium itself (for example, disk,
      nonvolatile RAM).

      Some examples of stable storage that are allowable for an NFS
      server include:

      1.  Media commit of data, that is, the modified data has been
          successfully written to the disk media, for example, the disk
          platter.

      2.  An immediate reply disk drive with battery-backed on-drive
          intermediate storage or uninterruptible power system (UPS).

      3.  Server commit of data with battery-backed intermediate
          storage and recovery software.

      4.  Cache commit with uninterruptible power system (UPS) and
          recovery software.

   Stateid  A 128-bit quantity returned by a server that uniquely
      defines the open and locking state provided by the server for a
      specific open-owner or lock-owner/open-owner pair for a specific
      file and type of lock.

   Verifier  A 64-bit quantity generated by the client that the server
      can use to determine if the client has restarted and lost all
      previous lock state.

1.6.  Overview of NFSv4.1 Features

   The major features of the NFSv4.1 protocol are reviewed briefly here
   to provide context both for the reader who is familiar with previous
   versions of the NFS protocol and for the reader who is new to the
   NFS protocols.  A reader new to the NFS protocols is still expected
   to have some fundamental knowledge: familiarity with the XDR and RPC
   protocols as described in [2] and [3], and a basic knowledge of file
   systems and distributed file systems.

   In general, this specification of NFSv4.1 will not distinguish
   features added in minor version one from those present in the base
   protocol, but will treat NFSv4.1 as a unified whole.  See
   Section 1.7 for a summary of the differences between NFSv4.0 and
   NFSv4.1.

1.6.1.  RPC and Security

   As with previous versions of NFS, the External Data Representation
   (XDR) and Remote Procedure Call (RPC) mechanisms used for the
   NFSv4.1 protocol are those defined in [2] and [3].  To meet end-to-
   end security requirements, the RPCSEC_GSS framework [4] will be used
   to extend the basic RPC security.  With the use of RPCSEC_GSS,
   various mechanisms can be provided to offer authentication,
   integrity, and privacy to the NFSv4 protocol.  Kerberos V5 will be
   used as described in [5] to provide one security framework.  The
   LIPKEY and SPKM-3 GSS-API mechanisms described in [6] will be used
   to provide for the use of user password and client/server public key
   certificates by the NFSv4 protocol.  With the use of RPCSEC_GSS,
   other mechanisms may also be specified and used for NFSv4.1
   security.

   To enable in-band security negotiation, the NFSv4.1 protocol has
   operations that provide the client with a method of querying the
   server about its policies regarding which security mechanisms must
   be used for access to the server's file system resources.  With
   this, the client can securely match the security mechanism that
   meets the policies specified at both the client and server.

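   The sketch below illustrates one way a client might drive this
   negotiation.  It is a minimal, non-normative Python sketch: the
   client object, its methods, and the security tuple names are
   hypothetical placeholders assumed for illustration; only the SECINFO
   operation and the NFS4ERR_WRONGSEC error are actual protocol
   elements.

      # Non-normative sketch: "send_compound", "secinfo", and the
      # security tuple strings are hypothetical placeholders, not
      # protocol definitions.

      PREFERRED_ORDER = ["krb5p", "krb5i", "krb5"]   # client's local policy

      def open_with_negotiation(client, dir_fh, name):
          # First try the client's currently preferred security tuple.
          reply = client.send_compound([("PUTFH", dir_fh), ("OPEN", name)],
                                       security=client.current_security)
          if reply.status != "NFS4ERR_WRONGSEC":
              return reply

          # The server's policy differs; ask which tuples it accepts for
          # this name, then retry with one the client also supports.
          server_tuples = client.secinfo(dir_fh, name)
          for choice in PREFERRED_ORDER:
              if choice in server_tuples:
                  client.current_security = choice
                  return client.send_compound(
                      [("PUTFH", dir_fh), ("OPEN", name)], security=choice)
          raise PermissionError("no mutually supported security tuple")
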
1.6.2.  Protocol Structure

1.6.2.1.  Core Protocol

   Unlike NFSv3, which used a series of ancillary protocols (e.g., NLM,
   NSM, and MOUNT), within all minor versions of NFSv4 a single RPC
   protocol is used to make requests to the server.  Facilities that
   had been separate protocols, such as locking, are now integrated
   within a single unified protocol.

1.6.2.2.  Parallel Access

   Minor version one supports high-performance data access to a
   clustered server implementation by enabling a separation of metadata
   access and data access, with the latter done to multiple servers in
   parallel.

   Such parallel data access is controlled by recallable objects known
   as "layouts", which are integrated into the protocol locking model.
   Clients direct requests for data access to a set of data servers
   specified by the layout via a data storage protocol which may be
   NFSv4.1 or may be another protocol.

1.6.3.  File System Model

   The general file system model used for the NFSv4.1 protocol is the
   same as in previous versions.  The server file system is
   hierarchical, with the regular files contained within being treated
   as opaque byte streams.  In a slight departure, file and directory
   names are encoded with UTF-8 to deal with the basics of
   internationalization.

   The NFSv4.1 protocol does not require a separate protocol to provide
   for the initial mapping between path name and filehandle.  All file
   systems exported by a server are presented as a tree so that all
   file systems are reachable from a special per-server global root
   filehandle.  This allows LOOKUP operations to be used to perform
   functions previously provided by the MOUNT protocol.  The server
   provides any necessary pseudo file systems to bridge any gaps that
   arise due to unexported portions of the namespace between exported
   file systems.

1.6.3.1.  Filehandles

   As in previous versions of the NFS protocol, opaque filehandles are
   used to identify individual files and directories.  Lookup-type and
   create operations translate file and directory names to filehandles,
   which are then used to identify objects in subsequent operations.

   The NFSv4.1 protocol provides support for persistent filehandles,
   guaranteed to be valid for the lifetime of the file system object
   designated.  In addition, it allows servers to provide filehandles
   with more limited validity guarantees, called volatile filehandles.

1.6.3.2.  File Attributes

   The NFSv4.1 protocol has a rich and extensible attribute structure,
   which is divided into REQUIRED, RECOMMENDED, and named attributes.

   The acl, sacl, and dacl attributes compose a set of RECOMMENDED file
   attributes that make up the Access Control List (ACL) of a file
   (Section 6).  These attributes provide for directory and file access
   control beyond the model used in NFSv3.  The ACL definition allows
   for specification of specific sets of permissions for individual
   users and groups.  In addition, ACL inheritance allows propagation
   of access permissions and restrictions down a directory tree as file
   system objects are created.

   A named attribute is an opaque byte stream that is associated with a
   directory or file and referred to by a string name.  Named
   attributes are meant to be used by client applications as a method
   to associate application-specific data with a regular file or
   directory.  NFSv4.1 modifies named attributes relative to NFSv4.0 by
   tightening the allowed operations in order to prevent the
   development of non-interoperable implementations.  See Section 5.3
   for details.

1.6.3.3.  Multi-server Namespace

   NFSv4.1 contains a number of features to allow implementation of
   namespaces that cross server boundaries and that allow and
   facilitate a non-disruptive transfer of support for individual file
   systems between servers.  They are all based upon attributes that
   allow one file system to specify alternate or new locations for that
   file system.

   These attributes may be used together with the concept of absent
   file systems, which provide specifications for additional locations
   but no actual file system content.  This allows a number of
   important facilities:

   o  Location attributes may be used with absent file systems to
      implement referrals whereby one server may direct the client to a
      file system provided by another server.  This allows extensive
      multi-server namespaces to be constructed.

   o  Location attributes may be provided for present file systems to
      provide the locations of alternate file system instances or
      replicas to be used in the event that the current file system
      instance becomes unavailable.

   o  Location attributes may be provided when a previously present
      file system becomes absent.  This allows non-disruptive migration
      of file systems to alternate servers.

1.6.4.  Locking Facilities

   As mentioned previously, NFSv4.1 is a single protocol that includes
   locking facilities.  These locking facilities include support for
   many types of locks, including several kinds of recallable locks.
   Recallable locks such as delegations allow the client to be assured
   that certain events will not occur so long as that lock is held.
   When circumstances change, the lock is recalled via a callback
   request.
   The assurances provided by delegations allow more extensive caching
   to be done safely when circumstances allow it.

   The types of locks are:

   o  Share reservations as established by OPEN operations.

   o  Byte-range locks.

   o  File delegations, which are recallable locks that assure the
      holder that inconsistent opens and file changes cannot occur so
      long as the delegation is held.

   o  Directory delegations, which are recallable locks that assure the
      holder that inconsistent directory modifications cannot occur so
      long as the delegation is held.

   o  Layouts, which are recallable objects that assure the holder that
      access to the file data may be performed directly by the client,
      and that no change to the data's location inconsistent with that
      access may be made so long as the layout is held.

   All locks for a given client are tied together under a single
   client-wide lease.  All requests made on sessions associated with
   the client renew that lease.  When leases are not promptly renewed,
   locks are subject to revocation.  In the event of server restart,
   clients have the opportunity to safely reclaim their locks within a
   special grace period.

1.7.  Differences from NFSv4.0

   The following summarizes the major differences between minor version
   one and the base protocol:

   o  Implementation of the sessions model (Section 2.10).

   o  Parallel access to data (Section 12).

   o  Addition of the RECLAIM_COMPLETE operation to better structure
      the lock reclamation process (Section 18.51).

   o  Enhanced delegation support, as follows:

      *  Delegations on directories and other file types in addition to
         regular files (Section 18.39, Section 18.49).

      *  Operations to optimize acquisition of recalled or denied
         delegations (Section 18.49, Section 20.5, Section 20.7).

      *  Notifications of changes to files and directories
         (Section 18.39, Section 20.4).

      *  A method to allow a server to indicate it is recalling one or
         more delegations for resource management reasons, and thus a
         method to allow the client to pick which delegations to return
         (Section 20.6).

   o  Attributes can be set atomically during exclusive file create via
      the OPEN operation (see the new EXCLUSIVE4_1 creation method in
      Section 18.16).

   o  Open files can be preserved if removed and the hard link count
      goes to zero, thus obviating the need for clients to rename
      deleted files to partially hidden names -- colloquially called
      "silly rename" (see the new OPEN4_RESULT_PRESERVE_UNLINKED reply
      flag in Section 18.16).

   o  Improved compatibility with Microsoft Windows for Access Control
      Lists (Section 6.2.3, Section 6.2.2, Section 6.4.3.2).

   o  Data retention (Section 5.13).

   o  Identification of the implementation of the NFS client and server
      (Section 18.35).

   o  Support for notification of the availability of byte-range locks
      (see the new OPEN4_RESULT_MAY_NOTIFY_LOCK reply flag in
      Section 18.16 and see Section 20.11).

2.  Core Infrastructure

2.1.  Introduction

   NFSv4.1 relies on core infrastructure common to nearly every
   operation.  This core infrastructure is described in the remainder
   of this section.

2.2.  RPC and XDR

   The NFSv4.1 protocol is a Remote Procedure Call (RPC) application
   that uses RPC version 2 and the corresponding eXternal Data
   Representation (XDR) as defined in [3] and [2].

2.2.1.  RPC-based Security

   Previous NFS versions have been thought of as having a host-based
   authentication model, where the NFS server authenticates the NFS
   client and trusts the client to authenticate all users.  Actually,
   NFS has always depended on RPC for authentication.  One of the first
   forms of RPC authentication, AUTH_SYS, had no strong authentication
   and required a host-based authentication approach.  NFSv4.1 also
   depends on RPC for basic security services, and mandates RPC support
   for a user-based authentication model.  The user-based
   authentication model has user principals authenticated by a server,
   and in turn the server authenticated by user principals.  RPC
   provides some basic security services that are used by NFSv4.1.

2.2.1.1.  RPC Security Flavors

   As described in Section 7.2 ("Authentication") of [3], RPC security
   is encapsulated in the RPC header, via a security or authentication
   flavor, and information specific to the specified security flavor.
   Every RPC header conveys information used to identify and
   authenticate a client and server.  As discussed in
   Section 2.2.1.1.1, some security flavors provide additional security
   services.

   NFSv4.1 clients and servers MUST implement RPCSEC_GSS.  (This
   requirement to implement is not a requirement to use.)  Other
   flavors, such as AUTH_NONE and AUTH_SYS, MAY be implemented as well.

2.2.1.1.1.  RPCSEC_GSS and Security Services

   RPCSEC_GSS ([4]) uses the functionality of GSS-API [7].  This allows
   for the use of various security mechanisms by the RPC layer without
   the additional implementation overhead of adding RPC security
   flavors.

2.2.1.1.1.1.  Identification, Authentication, Integrity, Privacy

   Via the GSS-API, RPCSEC_GSS can be used to identify and authenticate
   users on clients to servers, and servers to users.  It can also
   perform integrity checking on the entire RPC message, including the
   RPC header and the arguments or results.  Finally, privacy, usually
   via encryption, is a service available with RPCSEC_GSS.  Privacy is
   performed on the arguments and results.  Note that if privacy is
   selected, integrity, authentication, and identification are enabled.
   If privacy is not selected, but integrity is selected,
   authentication and identification are enabled.  If integrity and
   privacy are not selected, but authentication is enabled,
   identification is enabled.  RPCSEC_GSS does not provide
   identification as a separate service.

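   The cumulative relationship among these services can be summarized
   with the following non-normative Python sketch; the rpc_gss_svc_*
   strings name the RPCSEC_GSS services, but the helper function itself
   is purely illustrative and not part of the protocol.

      def implied_protections(service):
          """Protections implied by an RPCSEC_GSS service selection.

          Any selection implies authentication and identification;
          integrity adds integrity; privacy adds integrity and privacy.
          """
          implied = {"identification", "authentication"}
          if service == "rpc_gss_svc_integrity":
              implied.add("integrity")
          elif service == "rpc_gss_svc_privacy":
              implied.update({"integrity", "privacy"})
          return implied

   For example, implied_protections("rpc_gss_svc_privacy") yields all
   four protections, while implied_protections("rpc_gss_svc_none")
   yields only authentication and identification.
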
   Although GSS-API has an authentication service distinct from its
   privacy and integrity services, GSS-API's authentication service is
   not used for RPCSEC_GSS's authentication service.  Instead, each RPC
   request and response header is integrity protected with the GSS-API
   integrity service, and this allows RPCSEC_GSS to offer per-RPC
   authentication and identity.  See [4] for more information.

   NFSv4.1 clients and servers MUST support RPCSEC_GSS's integrity and
   authentication service.  NFSv4.1 servers MUST support RPCSEC_GSS's
   privacy service.

2.2.1.1.1.2.  Security mechanisms for NFSv4.1

   RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that
   provide security services.  Therefore, NFSv4.1 clients and servers
   MUST support three security mechanisms: Kerberos V5, SPKM-3, and
   LIPKEY.

   The use of RPCSEC_GSS requires selection of mechanism, quality of
   protection (QOP), and service (authentication, integrity, privacy).
   For the mandated security mechanisms, NFSv4.1 specifies that a QOP
   of zero (0) is used, leaving it up to the mechanism or the
   mechanism's configuration to use an appropriate level of protection
   that QOP zero maps to.  Each mandated mechanism specifies a minimum
   set of cryptographic algorithms for implementing integrity and
   privacy.  NFSv4.1 clients and servers MUST be implemented on
   operating environments that comply with the REQUIRED cryptographic
   algorithms of each REQUIRED mechanism.

2.2.1.1.1.2.1.  Kerberos V5

   The Kerberos V5 GSS-API mechanism as described in [5] MUST be
   implemented with the RPCSEC_GSS services as specified in the
   following table:

   column descriptions:
   1 == number of pseudo flavor
   2 == name of pseudo flavor
   3 == mechanism's OID
   4 == RPCSEC_GSS service
   5 == NFSv4.1 clients MUST support
   6 == NFSv4.1 servers MUST support

   1      2        3                     4                      5   6
   ------------------------------------------------------------------
   390003 krb5     1.2.840.113554.1.2.2  rpc_gss_svc_none       yes yes
   390004 krb5i    1.2.840.113554.1.2.2  rpc_gss_svc_integrity  yes yes
   390005 krb5p    1.2.840.113554.1.2.2  rpc_gss_svc_privacy    no  yes

   Note that the number and name of the pseudo flavor are presented
   here as a mapping aid to the implementor.  Because the NFSv4.1
   protocol includes a method to negotiate security and it understands
   the GSS-API mechanism, the pseudo flavor is not needed.  The pseudo
   flavor is needed for NFSv3, since its security negotiation is done
   via the MOUNT protocol as described in [23].

2.2.1.1.1.2.2.  LIPKEY

   The LIPKEY GSS-API mechanism as described in [6] MUST be implemented
   with the RPCSEC_GSS services as specified in the following table:

   1      2         3              4                      5   6
   ------------------------------------------------------------------
   390006 lipkey    1.3.6.1.5.5.9  rpc_gss_svc_none       yes yes
   390007 lipkey-i  1.3.6.1.5.5.9  rpc_gss_svc_integrity  yes yes
   390008 lipkey-p  1.3.6.1.5.5.9  rpc_gss_svc_privacy    no  yes

2.2.1.1.1.2.3.  SPKM-3 as a security triple

   The SPKM-3 GSS-API mechanism as described in [6] MUST be implemented
   with the RPCSEC_GSS services as specified in the following table:

   1      2         3                4                      5   6
   ------------------------------------------------------------------
   390009 spkm3     1.3.6.1.5.5.1.3  rpc_gss_svc_none       yes yes
   390010 spkm3i    1.3.6.1.5.5.1.3  rpc_gss_svc_integrity  yes yes
   390011 spkm3p    1.3.6.1.5.5.1.3  rpc_gss_svc_privacy    no  yes

2.2.1.1.1.3.  GSS Server Principal

   Regardless of what security mechanism under RPCSEC_GSS is being
   used, the NFS server MUST identify itself in GSS-API via a
   GSS_C_NT_HOSTBASED_SERVICE name type.  GSS_C_NT_HOSTBASED_SERVICE
   names are of the form:

      service@hostname

   For NFS, the "service" element is

      nfs

   Implementations of security mechanisms will convert nfs@hostname to
   various different forms.  For Kerberos V5, LIPKEY, and SPKM-3, the
   following form is RECOMMENDED:

      nfs/hostname

2.3.  COMPOUND and CB_COMPOUND

   A significant departure from the versions of the NFS protocol before
   NFSv4 is the introduction of the COMPOUND procedure.  For the NFSv4
   protocol, in all minor versions, there are exactly two RPC
   procedures, NULL and COMPOUND.  The COMPOUND procedure is defined as
   a series of individual operations, and these operations perform the
   sorts of functions performed by traditional NFS procedures.

   The operations combined within a COMPOUND request are evaluated in
   order by the server, without any atomicity guarantees.  A limited
   set of facilities exists to pass results from one operation to
   another.  Once an operation returns a failing result, the evaluation
   ends and the results of all evaluated operations are returned to the
   client.

   With the use of the COMPOUND procedure, the client is able to build
   simple or complex requests.  These COMPOUND requests allow for a
   reduction in the number of RPCs needed for logical file system
   operations.  For example, multi-component lookup requests can be
   constructed by combining multiple LOOKUP operations.  Those can be
   further combined with operations such as GETATTR, READDIR, or OPEN
   plus READ to do more complicated sets of operations without
   incurring additional latency.

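   As an illustration, a client opening and reading /a/b/file.txt could
   issue a single COMPOUND whose operation list resembles the following
   sketch.  This is non-normative Python used only to show the shape of
   the request: the operation names are real NFSv4.1 operations, but
   the list-of-tuples representation and the literal argument values
   are illustrative assumptions, not the XDR encoding.

      # One round trip replaces what would otherwise be several RPCs.
      # In NFSv4.1, every COMPOUND (other than a few used to create a
      # session) begins with a SEQUENCE operation naming the session.
      compound_request = [
          ("SEQUENCE",  {"sessionid": b"\x00" * 16,
                         "slotid": 0, "sequenceid": 1}),
          ("PUTROOTFH", {}),                 # start at the per-server root
          ("LOOKUP",    {"objname": "a"}),   # descend one component at a time
          ("LOOKUP",    {"objname": "b"}),
          ("OPEN",      {"claim": "CLAIM_NULL", "file": "file.txt"}),
          ("GETFH",     {}),                 # return the opened filehandle
          ("READ",      {"offset": 0, "count": 4096}),
          # A real READ argument also carries a stateid, omitted here.
      ]
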
1042 The operations combined within a COMPOUND request are evaluated in 1043 order by the server, without any atomicity guarantees. A limited set 1044 of facilities exist to pass results from one operation to another. 1045 Once an operation returns a failing result, the evaluation ends and 1046 the results of all evaluated operations are returned to the client. 1048 With the use of the COMPOUND procedure, the client is able to build 1049 simple or complex requests. These COMPOUND requests allow for a 1050 reduction in the number of RPCs needed for logical file system 1051 operations. For example, multi-component lookup requests can be 1052 constructed by combining multiple LOOKUP operations. Those can be 1053 further combined with operations such as GETATTR, READDIR, or OPEN 1054 plus READ to do more complicated sets of operation without incurring 1055 additional latency. 1057 NFSv4.1 also contains a considerable set of callback operations in 1058 which the server makes an RPC directed at the client. Callback RPC's 1059 have a similar structure to that of the normal server requests. In 1060 all minor versions of the NFSv4 protocol there are two callback RPC 1061 procedures, CB_NULL and CB_COMPOUND. The CB_COMPOUND procedure is 1062 defined in an analogous fashion to that of COMPOUND with its own set 1063 of callback operations. 1065 The addition of new server and callback operations within the 1066 COMPOUND and CB_COMPOUND request framework provides a means of 1067 extending the protocol in subsequent minor versions. 1069 Except for a small number of operations needed for session creation, 1070 server requests and callback requests are performed within the 1071 context of a session. Sessions provide a client context for every 1072 request and support robust reply protection for non-idempotent 1073 requests. 1075 2.4. Client Identifiers and Client Owners 1077 For each operation that obtains or depends on locking state, the 1078 specific client must be identifiable by the server. 1080 Each distinct client instance is represented by a client ID. A 1081 client ID is a 64-bit identifier representing a specific client at a 1082 given time. The client ID is changed whenever the client re- 1083 initializes, and may change when the server re-initializes. Client 1084 IDs are used to support lock identification and crash recovery. 1086 During steady state operation, the client ID associated with each 1087 operation is derived from the session (see Section 2.10) on which the 1088 operation is sent. A session is associated with a client ID when the 1089 session is created. 1091 Unlike NFSv4.0, the only NFSv4.1 operations possible before a client 1092 ID is established are those needed to establish the client ID. 1094 A sequence of an EXCHANGE_ID operation followed by a CREATE_SESSION 1095 operation using that client ID (eir_clientid as returned from 1096 EXCHANGE_ID) is required to establish and confirm the client ID on 1097 the server. Establishment of identification by a new incarnation of 1098 the client also has the effect of immediately releasing any locking 1099 state that a previous incarnation of that same client might have had 1100 on the server. Such released state would include all lock, share 1101 reservation, layout state, and where the server is not supporting the 1102 CLAIM_DELEGATE_PREV claim type, all delegation state associated with 1103 the same client with the same identity. For discussion of delegation 1104 state recovery, see Section 10.2.1. 
For discussion of layout state
1105 recovery, see Section 12.7.1.

1107 Releasing such state requires that the server be able to determine
1108 that one client instance is the successor of another. Where this
1109 cannot be done, for any of a number of reasons, the locking state
1110 will remain for a time subject to lease expiration (see Section 8.3)
1111 and the new client will need to wait for such state to be removed, if
1112 it makes conflicting lock requests.

1114 Client identification is encapsulated in the following Client Owner
1115 data type:

1117 struct client_owner4 {
1118   verifier4  co_verifier;
1119   opaque     co_ownerid<NFS4_OPAQUE_LIMIT>;
1120 };

1122 The first field, co_verifier, is a client incarnation verifier. The
1123 server will start the process of canceling the client's leased state
1124 if co_verifier is different from what the server has previously
1125 recorded for the identified client (as specified in the co_ownerid
1126 field).

1128 The second field, co_ownerid, is a variable-length string that
1129 uniquely defines the client so that subsequent instances of the same
1130 client bear the same co_ownerid with a different verifier.

1132 There are several considerations for how the client generates the
1133 co_ownerid string:

1135 o The string should be unique so that multiple clients do not
1136   present the same string. The consequences of two clients
1137   presenting the same string range from one client getting an error
1138   to one client having its leased state abruptly and unexpectedly
1139   canceled.

1141 o The string should be selected so that subsequent incarnations
1142   (e.g. restarts) of the same client cause the client to present the
1143   same string. The implementor is cautioned against an approach that
1144   requires the string to be recorded in a local file because this
1145   precludes the use of the implementation in an environment where
1146   there is no local disk and all file access is from an NFSv4.1
1147   server.

1149 o The string should be the same for each server network address that
1150   the client accesses. This way, if a server has multiple
1151   interfaces, the client can trunk traffic over multiple network
1152   paths as described in Section 2.10.4. (Note: the precise opposite
1153   was advised in the NFSv4.0 specification [21].)

1155 o The algorithm for generating the string should not assume that the
1156   client's network address will not change, unless the client
1157   implementation knows it is using statically assigned network
1158   addresses. This includes changes between client incarnations and
1159   even changes while the client is still running in its current
1160   incarnation. Thus with dynamic address assignment, if the client
1161   includes just the client's network address in the co_ownerid
1162   string, there is a real risk that after the client gives up the
1163   network address, another client, using a similar algorithm for
1164   generating the co_ownerid string, would generate a conflicting
1165   co_ownerid string.

1167 Given the above considerations, an example of a well-generated
1168 co_ownerid string is one that includes:

1170 o If applicable, the client's statically assigned network address.

1172 o Additional information that tends to be unique, such as one or
1173   more of:

1175   * The client machine's serial number (for privacy reasons, it is
1176     best to perform some one-way function on the serial number).

1178   * A MAC address (again, a one-way function should be performed).
1180 * The timestamp of when the NFSv4.1 software was first installed 1181 on the client (though this is subject to the previously 1182 mentioned caution about using information that is stored in a 1183 file, because the file might only be accessible over NFSv4.1). 1185 * A true random number. However since this number ought to be 1186 the same between client incarnations, this shares the same 1187 problem as that of using the timestamp of the software 1188 installation. 1190 o For a user level NFSv4.1 client, it should contain additional 1191 information to distinguish the client from other user level 1192 clients running on the same host, such as a process identifier or 1193 other unique sequence. 1195 The client ID is assigned by the server (the eir_clientid result from 1196 EXCHANGE_ID) and should be chosen so that it will not conflict with a 1197 client ID previously assigned by the server. This applies across 1198 server restarts. 1200 In the event of a server restart, a client may find out that its 1201 current client ID is no longer valid when it receives an 1202 NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on 1203 the characteristics of the sessions involved, specifically whether 1204 the session is persistent (see Section 2.10.5.5), but in each case 1205 the client will receive this error when it attempts to establish a 1206 new session with the existing client ID and receives the error 1207 NFS4ERR_STALE_CLIENTID, indicating that a new client ID must be 1208 obtained via EXCHANGE_ID and the new session established with that 1209 client ID. 1211 When a session is not persistent, the client will find out that it 1212 needs to create a new session as a result of getting an 1213 NFS4ERR_BADSESSION, since the session in question was lost as part of 1214 a server restart. When the existing client ID is presented to a 1215 server as part of creating a session and that client ID is not 1216 recognized, as would happen after a server restart, the server will 1217 reject the request with the error NFS4ERR_STALE_CLIENTID. 1219 In the case of the session being persistent, the client will re- 1220 establish communication using the existing session after the restart. 1221 This session will be associated with the existing client ID but may 1222 only be used to retransmit operations that the client previously 1223 transmitted and did not see replies to. Replies to operations that 1224 the server previously performed will come from the reply cache, 1225 otherwise NFS4ERR_DEADSESSION will be returned. Hence, such a 1226 session is referred to as "dead". In this situation, in order to 1227 perform new operations, the client must establish a new session. If 1228 an attempt is made to establish this new session with the existing 1229 client ID, the server will reject the request with 1230 NFS4ERR_STALE_CLIENTID. 1232 When NFS4ERR_STALE_CLIENTID is received in either of these 1233 situations, the client must obtain a new client ID by use of the 1234 EXCHANGE_ID operation, then use that client ID as the basis of a new 1235 session, and then proceed to any other necessary recovery for the 1236 server restart case (See Section 8.4.2). 1238 See the descriptions of EXCHANGE_ID (Section 18.35) and 1239 CREATE_SESSION (Section 18.36) for a complete specification of these 1240 operations. 1242 2.4.1. 
Upgrade from NFSv4.0 to NFSv4.1

1244 To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a
1245 client_owner4 in an EXCHANGE_ID with an nfs_client_id4 established
1246 using the SETCLIENTID operation of NFSv4.0. A server that does so
1247 will allow an upgraded client to avoid waiting until the lease (i.e.,
1248 the lease established by the NFSv4.0 client instance) expires. This
1249 requires that the client_owner4 be constructed the same way as the
1250 nfs_client_id4. If the latter's contents included the server's
1251 network address (per the recommendations of the NFSv4.0 specification
1252 [21]), and the NFSv4.1 client does not wish to use a client ID that
1253 prevents trunking, it should send two EXCHANGE_ID operations. The
1254 first EXCHANGE_ID will have a client_owner4 equal to the
1255 nfs_client_id4. This will clear the state created by the NFSv4.0
1256 client. The second EXCHANGE_ID will not have the server's network
1257 address. The state created for the second EXCHANGE_ID will not have
1258 to wait for lease expiration, because there will be no state to
1259 expire.

1261 2.4.2. Server Release of Client ID

1263 NFSv4.1 introduces a new operation called DESTROY_CLIENTID
1264 (Section 18.50) which the client SHOULD use to destroy a client ID it
1265 no longer needs. This permits graceful, bilateral release of a
1266 client ID. The operation cannot be used if there are sessions
1267 associated with the client ID, or state with an unexpired lease.

1269 If the server determines that the client holds no associated state
1270 for its client ID (including sessions, opens, locks, delegations,
1271 layouts, and wants), the server may choose to unilaterally release
1272 the client ID in order to conserve resources. If the client contacts
1273 the server after this release, the server must ensure the client
1274 receives the appropriate error so that it will use the EXCHANGE_ID/
1275 CREATE_SESSION sequence to establish a new client ID. The server
1276 ought to be very hesitant to release a client ID since the resulting
1277 work on the client to recover from such an event will be the same
1278 burden as if the server had failed and restarted. Typically, a server
1279 would not release a client ID unless there had been no activity from
1280 that client for many minutes. As long as there are sessions, opens,
1281 locks, delegations, layouts, or wants, the server MUST NOT release
1282 the client ID. See Section 2.10.11.1.4 for discussion on releasing
1283 inactive sessions.

1285 2.4.3. Resolving Client Owner Conflicts

1287 When the server gets an EXCHANGE_ID for a client owner that currently
1288 has no state, or that has state, but the lease has expired, the
1289 server MUST allow the EXCHANGE_ID, and confirm the new client ID if
1290 followed by the appropriate CREATE_SESSION.

1292 When the server gets an EXCHANGE_ID for a new incarnation of a client
1293 owner that currently has an old incarnation with state and an
1294 unexpired lease, the server is allowed to dispose of the state of the
1295 previous incarnation of the client owner if one of the following is
1296 true:

1298 o The principal that created the client ID for the client owner is
1299   the same as the principal that is issuing the EXCHANGE_ID.
Note
1300   that if the client ID was created with SP4_MACH_CRED state
1301   protection (Section 18.35), the principal MUST be based on
1302   RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be
1303   integrity or privacy, and the same GSS mechanism and principal
1304   MUST be used as that used when the client ID was created.

1306 o The client ID was established with SP4_SSV protection
1307   (Section 18.35, Section 2.10.7.3) and the client sends the
1308   EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the
1309   GSS SSV mechanism (Section 2.10.8).

1311 o The client ID was established with SP4_SSV protection, and under
1312   the conditions described herein, the EXCHANGE_ID was sent with
1313   SP4_MACH_CRED state protection. Because the SSV might not persist
1314   across client and server restart, and because the first time a
1315   client sends EXCHANGE_ID to a server it does not have an SSV, the
1316   client MAY send the subsequent EXCHANGE_ID without an SSV
1317   RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the
1318   principal MUST be based on RPCSEC_GSS authentication, the
1319   RPCSEC_GSS service used MUST be integrity or privacy, and the same
1320   GSS mechanism and principal MUST be used as that used when the
1321   client ID was created.

1323 If none of the above situations apply, the server MUST return
1324 NFS4ERR_CLID_INUSE.

1326 If the server accepts the principal and co_ownerid as matching those
1327 that created the client ID, and the co_verifier in the EXCHANGE_ID
1328 differs from the co_verifier used when the client ID was created,
1329 then after the server receives a CREATE_SESSION that confirms the
1330 client ID, the server deletes state. If the co_verifier values are
1331 the same (e.g., the client is either updating properties of the
1332 client ID (Section 18.35) or the client is attempting trunking
1333 (Section 2.10.4)), the server MUST NOT delete state.

1335 2.5. Server Owners

1337 The Server Owner is similar to a Client Owner (Section 2.4), but
1338 unlike the Client Owner, there is no shorthand server ID. The Server
1339 Owner is defined in the following data type:

1341 struct server_owner4 {
1342   uint64_t so_minor_id;
1343   opaque   so_major_id<NFS4_OPAQUE_LIMIT>;
1344 };

1346 The Server Owner is returned from EXCHANGE_ID. When the so_major_id
1347 fields are the same in two EXCHANGE_ID results, the connections that
1348 each EXCHANGE_ID was sent over can be assumed to address the same Server
1349 (as defined in Section 1.5). If the so_minor_id fields are also the
1350 same, then not only do both connections connect to the same server,
1351 but the session can be shared across both connections. The reader is
1352 cautioned that multiple servers may deliberately or accidentally
1353 claim to have the same so_major_id or so_major_id/so_minor_id; the
1354 reader should examine Section 2.10.4 and Section 18.35 in order to
1355 avoid acting on falsely matching Server Owner values.

1357 The considerations for generating a so_major_id are similar to those
1358 for generating a co_ownerid string (see Section 2.4). The
1359 consequences of two servers generating conflicting so_major_id values
1360 are less dire than they are for co_ownerid conflicts because the
1361 client can use RPCSEC_GSS to compare the authenticity of each server
1362 (see Section 2.10.4).

1364 2.6. Security Service Negotiation
1366 With the NFSv4.1 server potentially offering multiple security
1367 mechanisms, the client needs a method to determine or negotiate which
1368 mechanism is to be used for its communication with the server. The
1369 NFS server may have multiple points within its file system namespace
1370 that are available for use by NFS clients. These points can be
1371 considered security policy boundaries, and in some NFS
1372 implementations are tied to NFS export points. In turn, the NFS
1373 server may be configured such that each of these security policy
1374 boundaries may have different or multiple security mechanisms in use.

1376 The security negotiation between client and server must be done with
1377 a secure channel to eliminate the possibility of a third party
1378 intercepting the negotiation sequence and forcing the client and
1379 server to choose a lower level of security than required or desired.
1380 See Section 21 for further discussion.

1382 2.6.1. NFSv4.1 Security Tuples

1384 An NFS server can assign one or more "security tuples" to each
1385 security policy boundary in its namespace. Each security tuple
1386 consists of a security flavor (see Section 2.2.1.1), and if the
1387 flavor is RPCSEC_GSS, a GSS-API mechanism OID, a GSS-API quality of
1388 protection, and an RPCSEC_GSS service.

1390 2.6.2. SECINFO and SECINFO_NO_NAME

1392 The SECINFO and SECINFO_NO_NAME operations allow the client to
1393 determine, on a per filehandle basis, what security tuple is to be
1394 used for server access. In general, the client will not have to use
1395 either operation except during initial communication with the server
1396 or when the client crosses security policy boundaries at the server.
1397 However, the server's policies may also change at any time and force
1398 the client to negotiate a new security tuple.

1400 Where the use of different security tuples would affect the type of
1401 access that would be allowed if a request was sent over the same
1402 connection used for the SECINFO or SECINFO_NO_NAME operation (e.g.
1403 read-only vs. read-write), security tuples that allow greater
1404 access should be presented first. Where the general level of access
1405 is the same and different security flavors limit the range of
1406 principals whose privileges are recognized (e.g. allowing or
1407 disallowing root access), flavors supporting the greatest range of
1408 principals should be listed first.

1410 2.6.3. Security Error

1412 Based on the assumption that each NFSv4.1 client and server must
1413 support a minimum set of security mechanisms (i.e., LIPKEY, SPKM-3, and
1414 Kerberos V5, all under RPCSEC_GSS), the NFS client will initiate file
1415 access to the server with one of the minimal security tuples. During
1416 communication with the server, the client may receive an NFS error of
1417 NFS4ERR_WRONGSEC. This error allows the server to notify the client
1418 that the security tuple currently being used contravenes the server's
1419 security policy. The client is then responsible for determining (see
1420 Section 2.6.3.1) what security tuples are available at the server and
1421 choosing one that is appropriate for the client.

1423 2.6.3.1. Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME

1425 This section explains the mechanics of NFSv4.1 security
1426 negotiation.

1428 2.6.3.1.1. Put Filehandle Operations

1430 The term "put filehandle operation" refers to PUTROOTFH, PUTPUBFH,
1431 PUTFH, and RESTOREFH.
Each of the subsections herein describes how 1432 the server handles a subseries of operations that starts with a put 1433 filehandle operation. 1435 2.6.3.1.1.1. Put Filehandle Operation + SAVEFH 1437 The client is saving a filehandle for a future RESTOREFH, LINK, or 1438 RENAME. SAVEFH MUST NOT return NFS4ERR_WRONGSEC. To determine 1439 whether the put filehandle operation returns NFS4ERR_WRONGSEC or not, 1440 the server implementation pretends SAVEFH is not in the series of 1441 operations and examines which of the situations described in the 1442 other subsections of Section 2.6.3.1.1 apply. 1444 2.6.3.1.1.2. Two or More Put Filehandle Operations 1446 For a series of N put filehandle operations, the server MUST NOT 1447 return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations. 1448 The N'th put filehandle operation is handled as if it is the first in 1449 a subseries of operations. For example if the server received PUTFH, 1450 PUTROOTFH, LOOKUP, then the PUTFH is ignored for NFS4ERR_WRONGSEC 1451 purposes, and the PUTROOTFH, LOOKUP subseries is processed as 1452 according to Section 2.6.3.1.1.3. 1454 2.6.3.1.1.3. Put Filehandle Operation + LOOKUP (or OPEN of an Existing 1455 Name) 1457 This situation also applies to a put filehandle operation followed by 1458 a LOOKUP or an OPEN operation that specifies an existing component 1459 name. 1461 In this situation, the client is potentially crossing a security 1462 policy boundary, and the set of security tuples the parent directory 1463 supports may differ from those of the child. The server 1464 implementation may decide whether to impose any restrictions on 1465 security policy administration. There are at least three approaches 1466 (sec_policy_child is the tuple set of the child export, 1467 sec_policy_parent is that of the parent). 1469 a) sec_policy_child <= sec_policy_parent (<= for subset). This 1470 means that the set of security tuples specified on the security 1471 policy of a child directory is always a subset of that of its 1472 parent directory. 1474 b) sec_policy_child ^ sec_policy_parent != {} (^ for intersection, 1475 {} for the empty set). This means that the security tuples 1476 specified on the security policy of a child directory always has a 1477 non empty intersection with that of the parent. 1479 c) sec_policy_child ^ sec_policy_parent == {}. This means that 1480 the set of tuples specified on the security policy of a child 1481 directory may not intersect with that of the parent. In other 1482 words, there are no restrictions on how the system administrator 1483 may set up these tuples. 1485 In order for a server to support approaches (b) (for the case when a 1486 client chooses a flavor that is not a member of sec_policy_parent) 1487 and (c), the put filehandle operation cannot return NFS4ERR_WRONGSEC 1488 when there is a security tuple mismatch. Instead, it should be 1489 returned from the LOOKUP (or OPEN by existing component name) that 1490 follows. 1492 Since the above guideline does not contradict approach (a), it should 1493 be followed in general. Even if approach (a) is implemented, it is 1494 possible for the security tuple used to be acceptable for the target 1495 of LOOKUP but not for the filehandles used in the put filehandle 1496 operation. The put filehandle operation could be a PUTROOTFH or 1497 PUTPUBFH, where the client cannot know the security tuples for the 1498 root or public filehandle. 
Or the security policy for the filehandle
1499 used by the put filehandle operation could have changed since the
1500 time the filehandle was obtained.

1502 Therefore, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in
1503 response to the put filehandle operation if the operation is
1504 immediately followed by a LOOKUP or an OPEN by component name.

1506 2.6.3.1.1.4. Put Filehandle Operation + LOOKUPP

1508 Since SECINFO only works its way down, there is no way LOOKUPP can
1509 return NFS4ERR_WRONGSEC without SECINFO_NO_NAME. SECINFO_NO_NAME
1510 solves this issue via style SECINFO_STYLE4_PARENT, which works in the
1511 opposite direction from SECINFO. As with Section 2.6.3.1.1.3, a put
1512 filehandle operation that is followed by a LOOKUPP MUST NOT return
1513 NFS4ERR_WRONGSEC. If the server does not support SECINFO_NO_NAME,
1514 the client's only recourse is to send the put filehandle operation,
1515 LOOKUPP, GETFH sequence of operations with every security tuple it
1516 supports.

1518 Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server
1519 MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle
1520 operation if the operation is immediately followed by a LOOKUPP.

1522 2.6.3.1.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME

1524 A security-sensitive client is allowed to choose a strong security
1525 tuple when querying a server to determine a file object's permitted
1526 security tuples. The security tuple chosen by the client does not
1527 have to be included in the tuple list of the security policy of
1528 either the parent directory indicated in the put filehandle operation, or
1529 the child file object indicated in SECINFO (or any parent directory
1530 indicated in SECINFO_NO_NAME). Of course, the server has to be
1531 configured for whatever security tuple the client selects; otherwise,
1532 the request will fail at the RPC layer with an appropriate authentication
1533 error.

1535 In theory, there is no connection between the security flavor used by
1536 SECINFO or SECINFO_NO_NAME and those supported by the security
1537 policy. But in practice, the client may start looking for strong
1538 flavors from those supported by the security policy, followed by
1539 those in the REQUIRED set.

1541 The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to a put
1542 filehandle operation that is immediately followed by SECINFO or
1543 SECINFO_NO_NAME. The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC
1544 from SECINFO or SECINFO_NO_NAME.

1546 2.6.3.1.1.6. Put Filehandle Operation + Nothing

1548 The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC.

1550 2.6.3.1.1.7. Put Filehandle Operation + Anything Else

1552 "Anything Else" includes OPEN by filehandle.

1554 The security policy enforcement applies to the filehandle specified
1555 in the put filehandle operation. Therefore, the put filehandle
1556 operation must return NFS4ERR_WRONGSEC when there is a security tuple
1557 mismatch. This avoids the complexity of adding NFS4ERR_WRONGSEC as an
1558 allowable error to every other operation.

1560 A COMPOUND containing the series put filehandle operation +
1561 SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way
1562 for the client to recover from NFS4ERR_WRONGSEC.

1564 The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation
1565 other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by
1566 component name).

1568 2.6.3.1.1.8. Operations after SECINFO and SECINFO_NO_NAME
1570 Suppose a client sends a COMPOUND procedure containing the series
1571 SEQUENCE, PUTFH, SECINFO_NO_NAME, READ, and suppose the security tuple
1572 used does not match that required for the target file. By rule (see
1573 Section 2.6.3.1.1.5), neither PUTFH nor SECINFO_NO_NAME can return
1574 NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.1.7), READ cannot
1575 return NFS4ERR_WRONGSEC. The issue is resolved by the fact that
1576 SECINFO and SECINFO_NO_NAME consume the current filehandle (note that
1577 this is a change from NFSv4.0). This leaves no current filehandle
1578 for READ to use, and READ returns NFS4ERR_NOFILEHANDLE.

1580 2.6.3.1.2. LINK and RENAME

1582 The LINK and RENAME operations use both the current and saved
1583 filehandles. When the current filehandle is injected into a series
1584 of operations via a put filehandle operation, the server MUST return
1585 NFS4ERR_WRONGSEC, per Section 2.6.3.1.1. LINK and RENAME MAY return
1586 NFS4ERR_WRONGSEC if the security policy of the saved filehandle
1587 rejects the security flavor used in the COMPOUND request's
1588 credentials. If the server does so, then when there is no intersection
1589 between the security policies of the saved and current filehandles,
1590 it will be impossible for the client to perform the intended LINK
1591 or RENAME operation.

1593 For example, suppose the client sends this COMPOUND request:
1594 SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH, RENAME "c" "d", where
1595 filehandles bFH and aFH refer to different directories. Suppose no
1596 common security tuple exists between the security policies of aFH and
1597 bFH. If the client sends the request using credentials acceptable to
1598 bFH's security policy but not aFH's policy, then the PUTFH aFH
1599 operation will fail with NFS4ERR_WRONGSEC. After a SECINFO_NO_NAME
1600 request, the client sends SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH,
1601 RENAME "c" "d", using credentials acceptable to aFH's security
1602 policy, but not bFH's policy. The server returns NFS4ERR_WRONGSEC on
1603 the RENAME operation.

1605 To prevent a client from being caught in an endless sequence of a request
1606 containing LINK or RENAME, followed by a request containing SECINFO_NO_NAME,
1607 the server MUST detect when the security policies of the current and
1608 saved filehandles have no mutually acceptable security tuple, and
1609 MUST NOT return NFS4ERR_WRONGSEC in that situation. Instead, the server MUST
1610 return NFS4ERR_XDEV.

1612 Thus, while a server MAY return NFS4ERR_WRONGSEC from LINK and RENAME,
1613 the server implementor may reasonably decide the consequences are not
1614 worth the security benefits, and so allow the security policy of the
1615 current filehandle to override that of the saved filehandle.

1617 2.7. Minor Versioning

1619 To address the requirement of an NFS protocol that can evolve as the
1620 need arises, the NFSv4.1 protocol contains the rules and framework to
1621 allow for future minor changes or versioning.

1623 The base assumption with respect to minor versioning is that any
1624 future accepted minor version must follow the IETF process and be
1625 documented in a standards track RFC. Therefore, each minor version
1626 number will correspond to an RFC. Minor version zero of the NFSv4
1627 protocol is represented by [21], and minor version one is represented
1628 by this document [[Comment.1: RFC Editor: change "document" to "RFC"
1629 when we publish]].
The COMPOUND and CB_COMPOUND procedures support 1630 the encoding of the minor version being requested by the client. 1632 The following items represent the basic rules for the development of 1633 minor versions. Note that a future minor version may decide to 1634 modify or add to the following rules as part of the minor version 1635 definition. 1637 1. Procedures are not added or deleted 1639 To maintain the general RPC model, NFSv4 minor versions will not 1640 add to or delete procedures from the NFS program. 1642 2. Minor versions may add operations to the COMPOUND and 1643 CB_COMPOUND procedures. 1645 The addition of operations to the COMPOUND and CB_COMPOUND 1646 procedures does not affect the RPC model. 1648 * Minor versions may append attributes to the bitmap4 that 1649 represents sets of attributes and the fattr4 that represents 1650 sets of attribute values. 1652 This allows for the expansion of the attribute model to allow 1653 for future growth or adaptation. 1655 * Minor version X must append any new attributes after the last 1656 documented attribute. 1658 Since attribute results are specified as an opaque array of 1659 per-attribute XDR encoded results, the complexity of adding 1660 new attributes in the midst of the current definitions would 1661 be too burdensome. 1663 3. Minor versions must not modify the structure of an existing 1664 operation's arguments or results. 1666 Again the complexity of handling multiple structure definitions 1667 for a single operation is too burdensome. New operations should 1668 be added instead of modifying existing structures for a minor 1669 version. 1671 This rule does not preclude the following adaptations in a minor 1672 version. 1674 * adding bits to flag fields such as new attributes to 1675 GETATTR's bitmap4 data type and providing corresponding 1676 variants of opaque arrays, such as a notify4 used together 1677 with such bitmaps. 1679 * adding bits to existing attributes like ACLs that have flag 1680 words 1682 * extending enumerated types (including NFS4ERR_*) with new 1683 values 1685 * adding cases to a switched union 1687 4. Minor versions may not modify the structure of existing 1688 attributes. 1690 5. Minor versions may not delete operations. 1692 This prevents the potential reuse of a particular operation 1693 "slot" in a future minor version. 1695 6. Minor versions may not delete attributes. 1697 7. Minor versions may not delete flag bits or enumeration values. 1699 8. Minor versions may declare an operation MUST NOT be implemented. 1701 Specifying an operation MUST NOT be implemented is equivalent to 1702 obsoleting an operation. For the client, it means that the 1703 operation should not be sent to the server. For the server, an 1704 NFS error can be returned as opposed to "dropping" the request 1705 as an XDR decode error. This approach allows for the 1706 obsolescence of an operation while maintaining its structure so 1707 that a future minor version can reintroduce the operation. 1709 1. Minor versions may declare an attribute MUST NOT be 1710 implemented. 1712 2. Minor versions may declare a flag bit or enumeration value 1713 MUST NOT be implemented. 1715 9. Minor versions may downgrade features from REQUIRED to 1716 RECOMMENDED, or RECOMMENDED to OPTIONAL. 1718 10. Minor versions may upgrade features from OPTIONAL to RECOMMENDED 1719 or RECOMMENDED to REQUIRED. 1721 11. A client and server that supports minor version X should support 1722 minor versions 0 (zero) through X-1 as well. 1724 12. 
Except for infrastructural changes, no new features may be 1725 introduced as REQUIRED in a minor version. 1727 This rule allows for the introduction of new functionality and 1728 forces the use of implementation experience before designating a 1729 feature as REQUIRED. On the other hand, some classes of 1730 features are infrastructural and have broad effects. Allowing 1731 such features to not be REQUIRED complicates implementation of 1732 the minor version. 1734 13. A client MUST NOT attempt to use a stateid, filehandle, or 1735 similar returned object from the COMPOUND procedure with minor 1736 version X for another COMPOUND procedure with minor version Y, 1737 where X != Y. 1739 2.8. Non-RPC-based Security Services 1741 As described in Section 2.2.1.1.1.1, NFSv4.1 relies on RPC for 1742 identification, authentication, integrity, and privacy. NFSv4.1 1743 itself provides or enables additional security services as described 1744 in the next several subsections. 1746 2.8.1. Authorization 1748 Authorization to access a file object via an NFSv4.1 operation is 1749 ultimately determined by the NFSv4.1 server. A client can 1750 predetermine its access to a file object via the OPEN (Section 18.16) 1751 and the ACCESS (Section 18.1) operations. 1753 Principals with appropriate access rights can modify the 1754 authorization on a file object via the SETATTR (Section 18.30) 1755 operation. Attributes that affect access rights include: mode, 1756 owner, owner_group, acl, dacl, and sacl. See Section 5. 1758 2.8.2. Auditing 1760 NFSv4.1 provides auditing on a per file object basis, via the acl and 1761 sacl attributes as described in Section 6. It is outside the scope 1762 of this specification to specify audit log formats or management 1763 policies. 1765 2.8.3. Intrusion Detection 1767 NFSv4.1 provides alarm control on a per file object basis, via the 1768 acl and sacl attributes as described in Section 6. Alarms may serve 1769 as the basis for intrusion detection. It is outside the scope of 1770 this specification to specify heuristics for detecting intrusion via 1771 alarms. 1773 2.9. Transport Layers 1775 2.9.1. REQUIRED and RECOMMENDED Properties of Transports 1777 NFSv4.1 works over RDMA and non-RDMA_based transports with the 1778 following attributes: 1780 o The transport supports reliable delivery of data, which NFSv4.1 1781 requires but neither NFSv4.1 nor RPC has facilities for ensuring. 1783 [24] 1785 o The transport delivers data in the order it was sent. Ordered 1786 delivery simplifies detection of transmit errors, and simplifies 1787 the sending of arbitrary sized requests and responses, via the 1788 record marking protocol [3]. 1790 Where an NFSv4.1 implementation supports operation over the IP 1791 network protocol, any transport used between NFS and IP MUST be among 1792 the IETF-approved congestion control transport protocols. At the 1793 time this document was written, the only two transports that had the 1794 above attributes were TCP and SCTP. To enhance the possibilities for 1795 interoperability, an NFSv4.1 implementation MUST support operation 1796 over the TCP transport protocol. 1798 Even if NFSv4.1 is used over a non-IP network protocol, it is 1799 RECOMMENDED that the transport support congestion control. 1801 It is permissible for a connectionless transport to be used under 1802 NFSv4.1, however reliable and in-order delivery of data by the 1803 connectionless transport is REQUIRED. 
NFSv4.1 assumes that a client 1804 transport address and server transport address used to send data over 1805 a transport together constitute a connection, even if the underlying 1806 transport eschews the concept of a connection. 1808 2.9.2. Client and Server Transport Behavior 1810 If a connection-oriented transport (e.g. TCP) is used, the client 1811 and server SHOULD use long lived connections for at least three 1812 reasons: 1814 1. This will prevent the weakening of the transport's congestion 1815 control mechanisms via short lived connections. 1817 2. This will improve performance for the WAN environment by 1818 eliminating the need for connection setup handshakes. 1820 3. The NFSv4.1 callback model differs from NFSv4.0, and requires the 1821 client and server to maintain a client-created backchannel (see 1822 Section 2.10.3.1) for the server to use. 1824 In order to reduce congestion, if a connection-oriented transport is 1825 used, and the request is not the NULL procedure, 1827 o A requester MUST NOT retry a request unless the connection the 1828 request was sent over was lost before the reply was received. 1830 o A replier MUST NOT silently drop a request, even if the request is 1831 a retry. (The silent drop behavior of RPCSEC_GSS [4] does not 1832 apply because this behavior happens at the RPCSEC_GSS layer, a 1833 lower layer in the request processing). Instead, the replier 1834 SHOULD return an appropriate error (see Section 2.10.5.1) or it 1835 MAY disconnect the connection. 1837 When sending a reply, the replier MUST send the reply to the same 1838 full network address (e.g. if using an IP-based transport, the source 1839 port of the requester is part of the full network address) that the 1840 requester sent the request from. If using a connection-oriented 1841 transport, replies MUST be sent on the same connection the request 1842 was received from. 1844 If a connection is dropped after the replier receives the request but 1845 before the replier sends the reply, the replier might have an pending 1846 reply. If a connection is established with the same source and 1847 destination full network address as the dropped connection, then the 1848 replier MUST NOT send the reply until the client retries the request. 1849 The reason for this prohibition is that the client MAY retry a 1850 request over a different connection than is associated with the 1851 session. 1853 When using RDMA transports there are other reasons for not tolerating 1854 retries over the same connection: 1856 o RDMA transports use "credits" to enforce flow control, where a 1857 credit is a right to a peer to transmit a message. If one peer 1858 were to retransmit a request (or reply), it would consume an 1859 additional credit. If the replier retransmitted a reply, it would 1860 certainly result in an RDMA connection loss, since the requester 1861 would typically only post a single receive buffer for each 1862 request. If the requester retransmitted a request, the additional 1863 credit consumed on the server might lead to RDMA connection 1864 failure unless the client accounted for it and decreased its 1865 available credit, leading to wasted resources. 1867 o RDMA credits present a new issue to the reply cache in NFSv4.1. 1868 The reply cache may be used when a connection within a session is 1869 lost, such as after the client reconnects. Credit information is 1870 a dynamic property of the RDMA connection, and stale values must 1871 not be replayed from the cache. 
This implies that the reply cache 1872 contents must not be blindly used when replies are sent from it, 1873 and credit information appropriate to the channel must be 1874 refreshed by the RPC layer. 1876 In addition, as described in Section 2.10.5.2, while a session is 1877 active, the NFSv4.1 requester MUST NOT stop waiting for a reply. 1879 2.9.3. Ports 1881 Historically, NFSv3 servers have listened over TCP port 2049. The 1882 registered port 2049 [25] for the NFS protocol should be the default 1883 configuration. NFSv4.1 clients SHOULD NOT use the RPC binding 1884 protocols as described in [26]. 1886 2.10. Session 1888 2.10.1. Motivation and Overview 1890 Previous versions and minor versions of NFS have suffered from the 1891 following: 1893 o Lack of support for Exactly Once Semantics (EOS). This includes 1894 lack of support for EOS through server failure and recovery. 1896 o Limited callback support, including no support for sending 1897 callbacks through firewalls, and races between replies to normal 1898 requests and callbacks. 1900 o Limited trunking over multiple network paths. 1902 o Requiring machine credentials for fully secure operation. 1904 Through the introduction of a session, NFSv4.1 addresses the above 1905 shortfalls with practical solutions: 1907 o EOS is enabled by a reply cache with a bounded size, making it 1908 feasible to keep the cache in persistent storage and enable EOS 1909 through server failure and recovery. One reason that previous 1910 revisions of NFS did not support EOS was because some EOS 1911 approaches often limited parallelism. As will be explained in 1912 Section 2.10.5, NFSv4.1 supports both EOS and unlimited 1913 parallelism. 1915 o The NFSv4.1 client (defined in Section 1.5, Paragraph 2) creates 1916 transport connections and provides them to the server to use for 1917 sending callback requests, thus solving the firewall issue 1918 (Section 18.34). Races between responses from client requests, 1919 and callbacks caused by the requests are detected via the 1920 session's sequencing properties which are a consequence of EOS 1921 (Section 2.10.5.3). 1923 o The NFSv4.1 client can add an arbitrary number of connections to 1924 the session, and thus provide trunking (Section 2.10.4). 1926 o The NFSv4.1 client and server produces a session key independent 1927 of client and server machine credentials which can be used to 1928 compute a digest for protecting critical session management 1929 operations (Section 2.10.7.3). 1931 o The NFSv4.1 client can also create secure RPCSEC_GSS contexts for 1932 use by the session's backchannel that do not require the server to 1933 authenticate to a client machine principal (Section 2.10.7.2). 1935 A session is a dynamically created, long-lived server object created 1936 by a client, used over time from one or more transport connections. 1937 Its function is to maintain the server's state relative to the 1938 connection(s) belonging to a client instance. This state is entirely 1939 independent of the connection itself, and indeed the state exists 1940 whether the connection exists or not. A client may have one or more 1941 sessions associated with it so that client-associated state may be 1942 accessed using any of the sessions associated with that client's 1943 client ID, when connections are associated with those sessions. When 1944 no connections are associated with any of a client ID's sessions for 1945 an extended time, such objects as locks, opens, delegations, layouts, 1946 etc. are subject to expiration. 
The session serves as an object 1947 representing a means of access by a client to the associated client 1948 state on the server, independent of the physical means of access to 1949 that state. 1951 A single client may create multiple sessions. A single session MUST 1952 NOT serve multiple clients. 1954 2.10.2. NFSv4 Integration 1956 Sessions are part of NFSv4.1 and not NFSv4.0. Normally, a major 1957 infrastructure change such as sessions would require a new major 1958 version number to an ONC RPC program like NFS. However, because 1959 NFSv4 encapsulates its functionality in a single procedure, COMPOUND, 1960 and because COMPOUND can support an arbitrary number of operations, 1961 sessions have been added to NFSv4.1 with little difficulty. COMPOUND 1962 includes a minor version number field, and for NFSv4.1 this minor 1963 version is set to 1. When the NFSv4 server processes a COMPOUND with 1964 the minor version set to 1, it expects a different set of operations 1965 than it does for NFSv4.0. NFSv4.1 defines the SEQUENCE operation, 1966 which is required for every COMPOUND that operates over an 1967 established session, with the exception of some session 1968 administration operations, such as DESTROY_SESSION (Section 18.37). 1970 2.10.2.1. SEQUENCE and CB_SEQUENCE 1972 In NFSv4.1, when the SEQUENCE operation is present, it MUST be the 1973 first operation in the COMPOUND procedure. The primary purpose of 1974 SEQUENCE is to carry the session identifier. The session identifier 1975 associates all other operations in the COMPOUND procedure with a 1976 particular session. SEQUENCE also contains required information for 1977 maintaining EOS (see Section 2.10.5). Session-enabled NFSv4.1 1978 COMPOUND requests thus have the form: 1980 +-----+--------------+-----------+------------+-----------+---- 1981 | tag | minorversion | numops |SEQUENCE op | op + args | ... 1982 | | (== 1) | (limited) | + args | | 1983 +-----+--------------+-----------+------------+-----------+---- 1985 and the reply's structure is: 1987 +------------+-----+--------+-------------------------------+--// 1988 |last status | tag | numres |status + SEQUENCE op + results | // 1989 +------------+-----+--------+-------------------------------+--// 1990 //-----------------------+---- 1991 // status + op + results | ... 1992 //-----------------------+---- 1994 A CB_COMPOUND procedure request and reply has a similar form to 1995 COMPOUND, but instead of a SEQUENCE operation, there is a CB_SEQUENCE 1996 operation. CB_COMPOUND also has an additional field called 1997 "callback_ident", which is superfluous in NFSv4.1 and MUST be ignored 1998 by the client. CB_SEQUENCE has the same information as SEQUENCE, and 1999 also includes other information needed to resolve callback races 2000 (Section 2.10.5.3). 2002 2.10.2.2. Client ID and Session Association 2004 Each client ID (Section 2.4) can have zero or more active sessions. 2005 A client ID and associated session are required to perform file 2006 access in NFSv4.1. Each time a session is used (whether by a client 2007 sending a request to the server, or the client replying to a callback 2008 request from the server), the state leased to its associated client 2009 ID is automatically renewed. 2011 State such as share reservations, locks, delegations, and layouts 2012 (Section 1.6.4) is tied to the client ID. Client state is not tied 2013 to any individual session. 
Successive state-changing operations from
2014 a given state owner MAY go over different sessions, provided the
2015 session is associated with the same client ID. A callback MAY arrive
2016 over a different session than the session that originally
2017 acquired the state pertaining to the callback. For example, if
2018 session A is used to acquire a delegation, a request to recall the
2019 delegation MAY arrive over session B if both sessions are associated
2020 with the same client ID. Section 2.10.7.1 and Section 2.10.7.2
2021 discuss the security considerations around callbacks.

2023 2.10.3. Channels

2025 A channel is not a connection. A channel represents the direction in
2026 which ONC RPC requests are sent.

2028 Each session has one or two channels: the fore channel and the
2029 backchannel. Because there are at most two channels per session, and
2030 because each channel has a distinct purpose, channels are not
2031 assigned identifiers.

2033 The fore channel is used for ordinary requests from the client to the
2034 server, and carries COMPOUND requests and responses. A session
2035 always has a fore channel.

2037 The backchannel is used for callback requests from server to client, and
2038 carries CB_COMPOUND requests and responses. Whether there is a
2039 backchannel or not is a decision made by the client; however, many features
2040 of NFSv4.1 require a backchannel. NFSv4.1 servers MUST support
2041 backchannels.

2043 Each session has resources for each channel, including separate reply
2044 caches (see Section 2.10.5.1). Note that even the backchannel
2045 requires a reply cache because some callback operations are
2046 nonidempotent.

2048 2.10.3.1. Association of Connections, Channels, and Sessions

2050 Each channel is associated with zero or more transport connections.
2051 A connection can be associated with one channel or both channels of a
2052 session; the client and server negotiate whether a connection will
2053 carry traffic for one channel or both channels via the CREATE_SESSION
2054 (Section 18.36) and the BIND_CONN_TO_SESSION (Section 18.34)
2055 operations. When a session is created via CREATE_SESSION, the
2056 connection that transported the CREATE_SESSION request is
2057 automatically associated with the fore channel, and optionally the
2058 backchannel. If the client specifies no state protection
2059 (Section 18.35) when the session is created, then when SEQUENCE is
2060 transmitted on a different connection, the connection is
2061 automatically associated with the fore channel of the session
2062 specified in the SEQUENCE operation.

2064 A connection's association with a session is not exclusive. A
2065 connection associated with the channel(s) of one session may be
2066 simultaneously associated with the channel(s) of other sessions,
2067 including sessions associated with other client IDs.

2069 It is permissible for connections of multiple transport types to be
2070 associated with the same channel. For example, both a TCP and an RDMA
2071 connection can be associated with the fore channel. In the event an
2072 RDMA and a non-RDMA connection are associated with the same channel,
2073 the maximum number of slots SHOULD be at least one more than the
2074 total number of RDMA credits (Section 2.10.5.1). This way, if all RDMA
2075 credits are used, the non-RDMA connection can have at least one
2076 outstanding request. If a server supports multiple transport types,
2077 it MUST allow a client to associate connections from each transport
2078 to a channel.
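The association rules described in this subsection can be summarized by the
following non-normative Python sketch; the class and attribute names are
illustrative only and do not correspond to protocol elements.

   # Non-normative sketch of session/channel/connection associations.

   class Channel:
       """A fore or back channel: its own reply-cache slots plus the
       set of transport connections currently associated with it."""
       def __init__(self, slot_count):
           self.slot_count = slot_count   # separate reply cache per channel
           self.connections = set()       # zero or more connections

   class Session:
       """A session serves exactly one client ID and always has a fore
       channel; the backchannel is optional."""
       def __init__(self, client_id, fore_slots, back_slots=0):
           self.client_id = client_id
           self.fore = Channel(fore_slots)
           self.back = Channel(back_slots) if back_slots else None

   def associate(connection, session, fore=True, back=False):
       """Associate a connection with one or both channels of a session.
       The association is not exclusive: the same connection may also be
       associated with channels of other sessions and client IDs."""
       if fore:
           session.fore.connections.add(connection)
       if back and session.back is not None:
           session.back.connections.add(connection)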
2080 It is permissible for a connection of one type of transport to be 2081 associated with the fore channel, and a connection of a different 2082 type to be associated with the backchannel. 2084 2.10.4. Trunking 2086 Trunking is the use of multiple connections between a client and 2087 server in order to increase the speed of data transfer. NFSv4.1 2088 supports two types of trunking: session trunking and client ID 2089 trunking. NFSv4.1 servers MUST support trunking. 2091 Session trunking is essentially the association of multiple 2092 connections, each with potentially different target and/or source 2093 network addresses, to the same session. 2095 Client ID trunking is the association of multiple sessions to the 2096 same client ID, major server owner ID (Section 2.5), and server scope 2097 (Section 11.7.7). When two servers return the same major server 2098 owner and server scope it means the two servers are cooperating on 2099 locking state management which is a prerequisite for client ID 2100 trunking. 2102 Understanding and distinguishing session and client ID trunking 2103 requires understanding how the results of the EXCHANGE_ID 2104 (Section 18.35) operation identify a server. Suppose a client sends 2105 EXCHANGE_ID over two different connections each with a possibly 2106 different target network address but each EXCHANGE_ID with the same 2107 value in the eia_clientowner field. If the same NFSv4.1 server is 2108 listening over each connection, then each EXCHANGE_ID result MUST 2109 return the same values of eir_clientid, eir_server_owner.so_major_id 2110 and eir_server_scope. The client can then treat each connection as 2111 referring to the same server (subject to verification, see 2112 Paragraph 5 later in this section), and it can use each connection to 2113 trunk requests and replies. The question is whether session trunking 2114 and/or client ID trunking applies. 2116 Session Trunking If the eia_clientowner argument is the same in two 2117 different EXCHANGE_ID requests, and the eir_clientid, 2118 eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and 2119 eir_server_scope results match in both EXCHANGE_ID results, then 2120 the client is permitted to perform session trunking. If the 2121 client has no session mapping to the tuple of eir_clientid, 2122 eir_server_owner.so_major_id, eir_server_scope, 2123 eir_server_owner.so_minor_id, then it creates the session via a 2124 CREATE_SESSION operation over one of the connections, which 2125 associates the connection to the session. If there is a session 2126 for the tuple, the client can send BIND_CONN_TO_SESSION to 2127 associate the connection to the session. (Of course, if the 2128 client does not want to use session trunking, it can invoke 2129 CREATE_SESSION on the connection. This will result in client ID 2130 trunking as described below.) 2132 Client ID Trunking If the eia_clientowner argument is the same in 2133 two different EXCHANGE_ID requests, and the eir_clientid, 2134 eir_server_owner.so_major_id, and eir_server_scope results match 2135 in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id 2136 results do not match then the client is permitted to perform 2137 client ID trunking. The client can associate each connection with 2138 different sessions, where each session is associated with the same 2139 server. 2141 Of course, even if the eir_server_owner.so_minor_id fields do 2142 match, the client is free to employ client ID trunking instead of 2143 session trunking. 
2145 The client completes the act of client ID trunking by invoking 2146 CREATE_SESSION on each connection, using the same client ID that 2147 was returned in eir_clientid. These invocations create two 2148 sessions and also associate each connection with each session. 2150 When doing client ID trunking, locking state is shared across 2151 sessions associated with the same client ID. This requires the 2152 server to coordinate state across sessions. 2154 When two servers over two connections claim matching or partially 2155 matching eir_server_owner, eir_server_scope, and eir_clientid values, 2156 the client does not have to trust the servers' claims. The client 2157 may verify these claims before trunking traffic in the following 2158 ways: 2160 o For session trunking, clients SHOULD reliably verify if 2161 connections between different network paths are in fact associated 2162 with the same NFSv4.1 server and usable on the same session, and 2163 servers MUST allow clients to perform reliable verification. When 2164 a client ID is created, the client SHOULD specify that 2165 BIND_CONN_TO_SESSION is to be verified according to the SP4_SSV or 2166 SP4_MACH_CRED (Section 18.35) state protection options. For 2167 SP4_SSV, reliable verification depends on a shared secret (the 2168 SSV) that is established via the SET_SSV (Section 18.47) 2169 operation. 2171 When a new connection is associated with the session (via the 2172 BIND_CONN_TO_SESSION operation, see Section 18.34), if the client 2173 specified SP4_SSV state protection for the BIND_CONN_TO_SESSION 2174 operation, the client MUST send the BIND_CONN_TO_SESSION with 2175 RPCSEC_GSS protection, using integrity or privacy, and an 2176 RPCSEC_GSS handle created with the GSS SSV mechanism 2177 (Section 2.10.8). 2179 If the client mistakenly tries to associate a connection to a 2180 session of a wrong server, the server will either reject the 2181 attempt because it is not aware of the session identifier of the 2182 BIND_CONN_TO_SESSION arguments, or it will reject the attempt 2183 because the RPCSEC_GSS authentication fails. Even if the server 2184 mistakenly or maliciously accepts the connection association 2185 attempt, the RPCSEC_GSS verifier it computes in the response will 2186 not be verified by the client, so the client will know it cannot 2187 use the connection for trunking the specified session. 2189 If the client specified SP4_MACH_CRED state protection, the 2190 BIND_CONN_TO_SESSION operation will use RPCSEC_GSS integrity or 2191 privacy, using the same credential that was used when the client 2192 ID was created. Mutual authentication via RPCSEC_GSS assures the 2193 client that the connection is associated with the correct session 2194 of the correct server. 2196 o For client ID trunking, the client has at least two options for 2197 verifying that the same client ID obtained from two different 2198 EXCHANGE_ID operations came from the same server. The first 2199 option is to use RPCSEC_GSS authentication when issuing each 2200 EXCHANGE_ID. Each time an EXCHANGE_ID is sent with RPCSEC_GSS 2201 authentication, the client notes the principal name of the GSS 2202 target. If the EXCHANGE_ID results indicate client ID trunking is 2203 possible, and the GSS targets' principal names are the same, the 2204 servers are the same and client ID trunking is allowed. 2206 The second option for verification is to use SP4_SSV protection. 2208 When the client sends EXCHANGE_ID it specifies SP4_SSV protection. 
2209 The first EXCHANGE_ID the client sends always has to be confirmed 2210 by a CREATE_SESSION call. The client then sends SET_SSV. Later 2211 the client sends EXCHANGE_ID to a second destination network 2212 address, different from the one the first EXCHANGE_ID was sent to. The client 2213 checks that each EXCHANGE_ID reply has the same eir_clientid, 2214 eir_server_owner.so_major_id, and eir_server_scope. If so, the 2215 client verifies the claim by issuing a CREATE_SESSION to the 2216 second destination address, protected with RPCSEC_GSS integrity 2217 using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If 2218 the server accepts the CREATE_SESSION request, and if the client 2219 verifies the RPCSEC_GSS verifier and integrity codes, then the 2220 client has proof the second server knows the SSV, and thus the two 2221 servers are the same for the purposes of client ID trunking.
2223 2.10.5. Exactly Once Semantics
2225 Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for 2226 requests sent over a channel. EOS is supported on both the fore and 2227 back channels.
2229 Each COMPOUND or CB_COMPOUND request that is sent with a leading 2230 SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver 2231 exactly once. This requirement holds regardless of whether the 2232 request is sent with reply caching specified (see 2233 Section 2.10.5.1.3). The requirement holds even if the requester is 2234 issuing the request over a session created between a pNFS data client 2235 and pNFS data server. To understand the rationale for this 2236 requirement, divide the requests into three classifications:
2238 o Nonidempotent requests.
2240 o Idempotent modifying requests.
2242 o Idempotent non-modifying requests.
2244 An example of a non-idempotent request is RENAME. It is obvious that 2245 if a replier executes the same RENAME request twice, and the first 2246 execution succeeds, the re-execution will fail. If the replier 2247 returns the result from the re-execution, this result is incorrect. 2248 Therefore, EOS is required for nonidempotent requests.
2250 An example of an idempotent modifying request is a COMPOUND request 2251 containing a WRITE operation. Repeated execution of the same WRITE 2252 has the same effect as execution of that write a single time. 2253 Nevertheless, enforcing EOS for WRITEs and other idempotent modifying 2254 requests is necessary to avoid data corruption.
2256 Suppose a client sends WRITE A to a noncompliant server that does not 2257 enforce EOS, and receives no response, perhaps due to a network 2258 partition. The client reconnects to the server and re-sends WRITE A. 2259 Now the server has two instances of WRITE A outstanding. The server can be 2260 in a situation in which it executes and replies to the retry of A, 2261 while the first A is still waiting in the server's internal I/O 2262 system for some resource. Upon receiving the reply to the second 2263 attempt of WRITE A, the client believes its write is done, so it is 2264 free to send WRITE B, which overlaps the range of A. When the original 2265 A is dispatched from the server's I/O system, and executed (thus the 2266 second time A will have been written), then what has been written by 2267 B can be overwritten and thus corrupted.
2269 An example of an idempotent non-modifying request is a COMPOUND 2270 containing SEQUENCE, PUTFH, READLINK and nothing else. The re- 2271 execution of such a request will not cause data corruption, or 2272 produce an incorrect result.
Nonetheless, to keep the implementation 2273 simple, the replier MUST enforce EOS for all requests whether 2274 idempotent and non-modifying or not. 2276 Note that true and complete EOS is not possible unless the server 2277 persists the reply cache in stable storage, unless the server is 2278 somehow implemented to never require a restart (indeed if such a 2279 server exists, the distinction between a reply cache kept in stable 2280 storage versus one that is not is one without meaning). See 2281 Section 2.10.5.5 for a discussion of persistence in the reply cache. 2282 Regardless, even if the server does not persist the reply cache, EOS 2283 improves robustness and correctness over previous versions of NFS 2284 because the legacy duplicate request/reply caches were based on the 2285 ONC RPC transaction identifier (XID). Section 2.10.5.1 explains the 2286 shortcomings of the XID as a basis for a reply cache and describes 2287 how NFSv4.1 sessions improve upon the XID. 2289 2.10.5.1. Slot Identifiers and Reply Cache 2291 The RPC layer provides a transaction ID (XID), which, while required 2292 to be unique, is not convenient for tracking requests for two 2293 reasons. First, the XID is only meaningful to the requester; it 2294 cannot be interpreted by the replier except to test for equality with 2295 previously sent requests. When consulting an RPC-based duplicate 2296 request cache, the opaqueness of the XID requires a computationally 2297 expensive lookup (often via a hash that includes XID and source 2298 address). NFSv4.1 requests use a non-opaque slot ID which is an 2299 index into a slot table, which is far more efficient. Second, 2300 because RPC requests can be executed by the replier in any order, 2301 there is no bound on the number of requests that may be outstanding 2302 at any time. To achieve perfect EOS using ONC RPC would require 2303 storing all replies in the reply cache. XIDs are 32 bits; storing 2304 over four billion (2^32) replies in the reply cache is not practical. 2305 In practice, previous versions of NFS have chosen to store a fixed 2306 number of replies in the cache, and use a least recently used (LRU) 2307 approach to replacing cache entries with new entries when the cache 2308 is full. In NFSv4.1, the number of outstanding requests is bounded 2309 by the size of the slot table, and a sequence ID per slot is used to 2310 tell the replier when it is safe to delete a cached reply. 2312 In the NFSv4.1 reply cache, when the requester sends a new request, 2313 it selects a slot ID in the range 0..N, where N is the replier's 2314 current maximum slot ID granted to the requester on the session over 2315 which the request is to be sent. The value of N starts out as equal 2316 to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the 2317 response to SEQUENCE or CB_SEQUENCE as described later in this 2318 section. The slot ID must be unused by any of the requests which the 2319 requester has already active on the session. "Unused" here means the 2320 requester has no outstanding request for that slot ID. 2322 A slot contains a sequence ID and the cached reply corresponding to 2323 the request sent with that sequence ID. The sequence ID is a 32 bit 2324 unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 - 2325 1). The first time a slot is used, the requester MUST specify a 2326 sequence ID of one (1) (Section 18.36). 
Each time a slot is reused, 2327 the request MUST specify a sequence ID that is one greater than that 2328 of the previous request on the slot. If the previous sequence ID was 2329 0xFFFFFFFF, then the next request for the slot MUST have the sequence 2330 ID set to zero (i.e. (2^32 - 1) + 1 mod 2^32).
2332 The sequence ID accompanies the slot ID in each request. It is for 2333 the critical check at the server: it is used to efficiently determine 2334 whether a request using a certain slot ID is a retransmit or a new, 2335 never-before-seen request. It is not feasible for the client to 2336 implement this check by asserting that it is retransmitting, because for any 2337 given request the client cannot know whether the server has seen it 2338 unless the server actually replies. Of course, if the client has 2339 seen the server's reply, the client would not retransmit.
2341 The replier compares each received request's sequence ID with the 2342 last one previously received for that slot ID, to see if the new 2343 request is:
2345 o A new request, in which the sequence ID is one greater than that 2346 previously seen in the slot (accounting for sequence wraparound). 2347 The replier proceeds to execute the new request, and the replier 2348 MUST increase the slot's sequence ID by one.
2350 o A retransmitted request, in which the sequence ID is equal to that 2351 currently recorded in the slot. If the original request has 2352 executed to completion, the replier returns the cached reply. See 2353 Section 2.10.5.2 for direction on how the replier deals with 2354 retries of requests that are still in progress.
2356 o A misordered retry, in which the sequence ID is less than 2357 (accounting for sequence wraparound) that previously seen in the 2358 slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the 2359 result from SEQUENCE or CB_SEQUENCE).
2361 o A misordered new request, in which the sequence ID is two or more 2362 greater than (accounting for sequence wraparound) that previously 2363 seen in the slot. Note that because the sequence ID must 2364 wrap around to zero (0) once it reaches 0xFFFFFFFF, a misordered 2365 new request and a misordered retry cannot be distinguished. Thus, 2366 the replier MUST return NFS4ERR_SEQ_MISORDERED (as the result from 2367 SEQUENCE or CB_SEQUENCE).
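The comparison in the list above reduces to unsigned arithmetic on the 32-bit sequence ID. The following C fragment is an illustrative sketch only, not a normative algorithm; slot lookup, reply cache storage, and the handling of still-in-progress originals (Section 2.10.5.2) are assumed to exist elsewhere.

   #include <stdint.h>

   enum seq_disposition { NEW_REQUEST, RETRY, MISORDERED };

   /* cached_seqid:  the sequence ID currently recorded in the slot.
    * request_seqid: the sequence ID carried in the incoming request.
    * Unsigned 32-bit subtraction handles the wraparound from
    * 0xFFFFFFFF back to zero automatically. */
   static enum seq_disposition check_slot_seqid(uint32_t cached_seqid,
                                                uint32_t request_seqid)
   {
           uint32_t delta = request_seqid - cached_seqid;

           if (delta == 1)
                   return NEW_REQUEST;  /* execute; record request_seqid in the slot */
           if (delta == 0)
                   return RETRY;        /* return the cached reply */
           return MISORDERED;           /* NFS4ERR_SEQ_MISORDERED from (CB_)SEQUENCE */
   }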
2369 Unlike the XID, the slot ID is always within a specific range; this 2370 has two implications. The first implication is that for a given 2371 session, the replier need only cache the results of a limited number 2372 of COMPOUND requests. The second implication derives from the 2373 first, which is that unlike XID-indexed reply caches (also known as 2374 duplicate request caches - DRCs), the slot ID-based reply cache 2375 cannot be overflowed. Through use of the sequence ID to identify 2376 retransmitted requests, the replier does not need to actually cache 2377 the request itself, reducing the storage requirements of the reply 2378 cache further. These facilities make it practical to maintain all 2379 the required entries for an effective reply cache.
2381 The slot ID, sequence ID, and session ID therefore take over the 2382 traditional role of the XID and source network address in the 2383 replier's reply cache implementation. This approach is considerably 2384 more portable and completely robust - it is not subject to the 2385 reassignment of ports as clients reconnect over IP networks. In 2386 addition, the RPC XID is not used in the reply cache, enhancing 2387 robustness of the cache in the face of any rapid reuse of XIDs by the 2388 requester. While the replier does not care about the XID for the 2389 purposes of reply cache management (but the replier MUST return the 2390 same XID that was in the request), nonetheless there are 2391 considerations for the XID in NFSv4.1 that are the same as in all 2392 previous versions of NFS. The RPC XID remains in each message and 2393 must be formulated in NFSv4.1 requests as in any other ONC RPC 2394 request. The reasons include:
2396 o The RPC layer retains its existing semantics and implementation.
2398 o The requester and replier must be able to interoperate at the RPC 2399 layer, prior to the NFSv4.1 decoding of the SEQUENCE or 2400 CB_SEQUENCE operation.
2402 o If an operation is being used that does not start with SEQUENCE or 2403 CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is 2404 needed for correct operation to match the reply to the request.
2406 o The SEQUENCE or CB_SEQUENCE operation may generate an error. If 2407 so, the embedded slot ID, sequence ID, and session ID (if present) 2408 in the request will not be in the reply, and the requester has 2409 only the XID to match the reply to the request.
2411 Given that well-formulated XIDs continue to be required, this raises 2412 the question of why SEQUENCE and CB_SEQUENCE replies have a session ID, 2413 slot ID, and sequence ID. Having the session ID in the reply means 2414 the requester does not have to use the XID to look up the session ID, 2415 which would be necessary if the connection were associated with 2416 multiple sessions. Having the slot ID and sequence ID in the reply 2417 means the requester does not have to use the XID to look up the slot ID 2418 and sequence ID. Furthermore, since the XID is only 32 bits, it is 2419 too small to guarantee the re-association of a reply with its request 2420 ([27]); having session ID, slot ID, and sequence ID in the reply 2421 allows the client to validate that the reply in fact belongs to the 2422 matched request.
2424 The SEQUENCE (and CB_SEQUENCE) operation also carries a 2425 "highest_slotid" value, which carries additional requester slot usage 2426 information. The requester must always indicate the slot ID 2427 representing the outstanding request with the highest-numbered slot 2428 value. The requester should in all cases provide the most 2429 conservative value possible, although it can be increased somewhat 2430 above the actual instantaneous usage to maintain some minimum or 2431 optimal level. This provides a way for the requester to yield unused 2432 request slots back to the replier, which in turn can use the 2433 information to reallocate resources.
2435 The replier responds with both a new target highest_slotid and an 2436 enforced highest_slotid, described as follows:
2438 o The target highest_slotid is an indication to the requester of the 2439 highest_slotid the replier wishes the requester to be using. This 2440 permits the replier to withdraw (or add) resources from a 2441 requester that has been found not to be using them, in order to 2442 more fairly share resources among a varying level of demand from 2443 other requesters. The requester must always comply with the 2444 replier's value updates, since they indicate newly established 2445 hard limits on the requester's access to session resources.
2446 However, because of request pipelining, the requester may have 2447 active requests in flight reflecting prior values; therefore, the 2448 replier must not immediately require the requester to comply.
2450 o The enforced highest_slotid indicates the highest slot ID the 2451 requester is permitted to use on a subsequent SEQUENCE or 2452 CB_SEQUENCE operation. The replier's enforced highest_slotid 2453 SHOULD be no less than the highest_slotid the requester indicated 2454 in the SEQUENCE or CB_SEQUENCE arguments.
2456 If a replier detects the client is being intransigent, i.e. in a 2457 series of requests it fails to honor the target highest_slotid 2458 even though the replier knows there are no outstanding requests at 2459 higher slot IDs, it MAY take more forceful action. When faced 2460 with intransigence, the replier MAY reply with a new enforced 2461 highest_slotid that is less than its previous enforced 2462 highest_slotid. Thereafter, if the requester continues to send 2463 requests with a highest_slotid that is greater than the replier's 2464 new enforced highest_slotid, the server MAY return 2465 NFS4ERR_BAD_HIGHSLOT, unless the slot ID in the request is greater 2466 than the new enforced highest_slotid, and the request is a retry.
2468 The replier SHOULD retain the slots it wants to retire until the 2469 requester sends a request with a highest_slotid less than or equal 2470 to the replier's new enforced highest_slotid. Also, if a request 2471 is received with a slot that is higher than the new enforced 2472 highest_slotid, and the sequence ID is one higher than what is in 2473 the slot's reply cache, then the server can both retire the slot 2474 and return NFS4ERR_BADSLOT (however the server MUST NOT do one and 2475 not the other). (The reason it is safe to retire the slot is 2476 that by using the next sequence ID, the client is 2477 indicating it has received the previous reply for the slot.) Once 2478 the replier has forcibly lowered the enforced highest_slotid, the 2479 requester is only allowed to send retries on the to-be-retired 2480 slots.
2482 o The requester SHOULD use the lowest available slot when issuing a 2483 new request. This way, the replier may be able to retire slot 2484 entries faster. However, where the replier is actively adjusting 2485 its granted highest_slotid, it will not be able to use only the 2486 receipt of the slot ID and highest_slotid in the request. Neither 2487 the slot ID nor the highest_slotid used in a request may reflect 2488 the replier's current idea of the requester's session limit, 2489 because the request may have been sent from the requester before 2490 the update was received. Therefore, in the downward adjustment 2491 case, the replier may have to retain a number of reply cache 2492 entries at least as large as the old value of maximum requests 2493 outstanding, until it can infer that the requester has seen a 2494 reply containing the new granted highest_slotid. The replier can 2495 infer that the requester has seen such a reply when it receives a new 2496 request with the same slot ID as the request replied to and the 2497 next higher sequence ID.
2499 2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies
2501 When a SEQUENCE or CB_SEQUENCE operation is successfully executed, 2502 its reply MUST always be cached. Specifically, session ID, sequence 2503 ID, and slot ID MUST be cached in the reply cache. The reply from 2504 SEQUENCE also includes the highest slot ID, target highest slot ID, 2505 and status flags.
Instead of caching these values, the server MAY 2506 re-compute the values from the current state of the fore channel, 2507 session and/or client ID as appropriate. Similarly, the reply from 2508 CB_SEQUENCE includes a highest slot ID and target highest slot ID. 2509 The client MAY re-compute the values from the current state of the 2510 session as appropriate.
2512 Regardless of whether a replier is re-computing highest slot ID, 2513 target slot ID, and status on replies to retries or not, the 2514 requester MUST NOT assume the values are being re-computed whenever 2515 it receives a reply after a retry is sent, since it has no way of 2516 knowing whether the reply it has received was sent by the server in 2517 response to the retry, or is a delayed response to the original 2518 request. Therefore, the highest slot ID, target 2519 slot ID, or status bits may reflect the state of affairs when the 2520 request was first executed. Although acting based on such delayed 2521 information is valid, it may cause the receiver to do unneeded work. 2522 Requesters MAY choose to send additional requests to get the current 2523 state of affairs or use the state of affairs reported by subsequent 2524 requests, in preference to acting immediately on data which may be 2525 out of date.
2527 2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE
2529 Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence ID of 2530 the slot MUST NOT change. The replier MUST NOT modify the reply 2531 cache entry for the slot whenever an error is returned from SEQUENCE 2532 or CB_SEQUENCE.
2534 2.10.5.1.3. Optional Reply Caching
2536 On a per-request basis, the requester can choose to direct the replier 2537 to cache the reply to all operations after the first operation 2538 (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis 2539 fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it 2540 would not direct the replier to cache the entire reply is that the 2541 request is composed of all idempotent operations [24]. Caching the 2542 reply may offer little benefit. If the reply is too large (see 2543 Section 2.10.5.4), it may not be cacheable anyway. Even if the reply 2544 to an idempotent request is small enough to cache, unnecessarily caching 2545 the reply slows down the server and increases RPC latency.
2547 Whether the requester requests the reply to be cached or not has no 2548 effect on the slot processing. If the results of SEQUENCE or 2549 CB_SEQUENCE are NFS4_OK, then the slot's sequence ID MUST be 2550 incremented by one. If a requester does not direct the replier to 2551 cache the reply, the replier MUST do one of the following:
2553 o The replier can cache the entire original reply. Even though 2554 sa_cachethis or csa_cachethis are FALSE, the replier is always 2555 free to cache. It may choose this approach in order to simplify 2556 implementation.
2558 o The replier enters into its reply cache a reply consisting of the 2559 original results to the SEQUENCE or CB_SEQUENCE operation, and 2560 with the next operation in COMPOUND or CB_COMPOUND having the 2561 error NFS4ERR_RETRY_UNCACHED_REP. Thus, if the requester later 2562 retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP.
2564 2.10.5.2. Retry and Replay of Reply
2566 A requester MUST NOT retry a request unless the connection it used 2567 to send the request disconnects.
The requester can then reconnect 2568 and re-send the request, or it can re-send the request over a 2569 different connection that is associated with the same session.
2571 If the requester is a server wanting to re-send a callback operation 2572 over the backchannel of a session, the requester of course cannot 2573 reconnect because only the client can associate connections with the 2574 backchannel. The server can re-send the request over another 2575 connection that is associated with the same session's backchannel. If there 2576 is no such connection, the server MUST indicate that the session has 2577 no backchannel by setting the SEQ4_STATUS_CB_PATH_DOWN_SESSION flag 2578 bit in the response to the next SEQUENCE operation from the client. 2579 The client MUST then associate a connection with the session (or 2580 destroy the session).
2582 Note that it is not fatal for a client to retry without a disconnect 2583 between the request and retry. However, the retry does consume 2584 resources, especially with RDMA, where each request, retry or not, 2585 consumes a credit. Retries for no reason, especially retries sent 2586 shortly after the previous attempt, are a poor use of network 2587 bandwidth and defeat the purpose of a transport's inherent congestion 2588 control system.
2590 A requester MUST wait for a reply to a request before using the slot 2591 for another request. If it does not wait for a reply, then the 2592 requester does not know what sequence ID to use for the slot on its 2593 next request. For example, suppose a requester sends a request with 2594 sequence ID 1, and does not wait for the response. The next time it 2595 uses the slot, it sends the new request with sequence ID 2. If the 2596 replier has not seen the request with sequence ID 1, then the replier 2597 is not expecting sequence ID 2, and rejects the requester's new 2598 request with NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or 2599 CB_SEQUENCE).
2601 RDMA fabrics do not guarantee that the memory handles (Steering Tags) 2602 within each RPC/RDMA "chunk" ([8]) are valid on a scope outside that 2603 of a single connection. Therefore, handles used by the direct 2604 operations become invalid after connection loss. The server must 2605 ensure that any RDMA operations which must be replayed from the reply 2606 cache use the newly provided handle(s) from the most recent request.
2608 A retry might be sent while the original request is still in progress 2609 on the replier. The replier SHOULD deal with the issue by returning 2610 NFS4ERR_DELAY as the reply to the SEQUENCE or CB_SEQUENCE operation, but 2611 implementations MAY return NFS4ERR_SEQ_MISORDERED. Since errors from 2612 SEQUENCE and CB_SEQUENCE are never recorded in the reply cache, this 2613 approach allows the results of the execution of the original request 2614 to be properly recorded in the reply cache (assuming the requester 2615 specified the reply to be cached).
2617 2.10.5.3. Resolving Server Callback Races
2619 It is possible for server callbacks to arrive at the client before 2620 the reply from related fore channel operations. For example, a 2621 client may have been granted a delegation to a file it has opened, 2622 but the reply to the OPEN (informing the client of the granting of 2623 the delegation) may be delayed in the network.
If a conflicting 2624 operation arrives at the server, it will recall the delegation using 2625 the backchannel, which may be on a different transport connection, 2626 perhaps even a different network, or even a different session 2627 associated with the same client ID.
2629 The presence of a session between client and server alleviates this 2630 issue. When a session is in place, each client request is uniquely 2631 identified by its { session ID, slot ID, sequence ID } triple. By 2632 the rules under which slot entries (reply cache entries) are retired, 2633 the server knows whether the client has "seen" each of the 2634 server's replies. The server can therefore provide sufficient 2635 information to the client to allow it to distinguish a callback that has 2636 raced a reply from an erroneous or conflicting callback.
2638 For each client operation which might result in some sort of server 2639 callback, the server SHOULD "remember" the { session ID, slot ID, 2640 sequence ID } triple of the client request until the slot ID 2641 retirement rules allow the server to determine that the client has, 2642 in fact, seen the server's reply. Until the time the { session ID, 2643 slot ID, sequence ID } request triple can be retired, any recalls of 2644 the associated object MUST carry an array of these referring 2645 identifiers (in the CB_SEQUENCE operation's arguments), for the 2646 benefit of the client. After this time, it is not necessary for the 2647 server to provide this information in related callbacks, since it is 2648 certain that a race condition can no longer occur.
2650 The CB_SEQUENCE operation which begins each server callback carries a 2651 list of "referring" { session ID, slot ID, sequence ID } triples. If 2652 the client finds the request corresponding to the referring session 2653 ID, slot ID and sequence ID to be currently outstanding (i.e. the 2654 server's reply has not been seen by the client), it can determine 2655 that the callback has raced the reply, and act accordingly. If the 2656 client does not find the request corresponding to the referring triple 2657 to be outstanding (including the case of a session ID referring to a 2658 destroyed session), then there is no race with respect to this 2659 triple. The server SHOULD limit the referring triples to just 2660 those that apply to the objects referred to in the 2661 CB_COMPOUND procedure.
2663 The client must not simply wait forever for the expected server reply 2664 to arrive before responding to the CB_COMPOUND that won the race, 2665 because it is possible that it will be delayed indefinitely. The 2666 client should assume the likely case that the reply will arrive 2667 within the average round trip time for COMPOUND requests to the 2668 server, and wait that period of time. If that period of time expires, 2669 it can respond to the CB_COMPOUND with NFS4ERR_DELAY.
2671 There are other scenarios under which callbacks may race replies. 2672 Among them are pNFS layout recalls as described in Section 12.5.5.2.
2674 2.10.5.4. COMPOUND and CB_COMPOUND Construction Issues
2676 Very large requests and replies may pose both buffer management 2677 issues (especially with RDMA) and reply cache issues.
When the 2678 session is created (Section 18.36), for each channel (fore and 2679 back), the client and server negotiate the maximum sized request they 2680 will send or process (ca_maxrequestsize), the maximum sized reply 2681 they will return or process (ca_maxresponsesize), and the maximum 2682 sized reply they will store in the reply cache 2683 (ca_maxresponsesize_cached).
2685 If a request exceeds ca_maxrequestsize, the reply will have the 2686 status NFS4ERR_REQ_TOO_BIG. A replier MAY return NFS4ERR_REQ_TOO_BIG 2687 as the status for the first operation (SEQUENCE or CB_SEQUENCE) in the 2688 request (which means no operations in the request executed, and the 2689 state of the slot in the reply cache is unchanged), or it MAY opt to 2690 return it on a subsequent operation in the same COMPOUND or 2691 CB_COMPOUND request (which means at least one operation did execute 2692 and the state of the slot in the reply cache does change). The replier 2693 SHOULD set NFS4ERR_REQ_TOO_BIG on the operation that exceeds 2694 ca_maxrequestsize.
2696 If a reply exceeds ca_maxresponsesize, the reply will have the status 2697 NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the 2698 status for the first operation (SEQUENCE or CB_SEQUENCE) in the request, 2699 or it MAY opt to return it on a subsequent operation (in the same 2700 COMPOUND or CB_COMPOUND reply). A replier MAY return 2701 NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if 2702 the response would still exceed ca_maxresponsesize.
2704 If sa_cachethis or csa_cachethis are TRUE, then the replier MUST 2705 cache a reply except if an error is returned by the SEQUENCE or 2706 CB_SEQUENCE operation (see Section 2.10.5.1.2). If the reply exceeds 2707 ca_maxresponsesize_cached (and sa_cachethis or csa_cachethis are 2708 TRUE), then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even 2709 if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) 2710 is returned on an operation other than the first operation (SEQUENCE or 2711 CB_SEQUENCE), then the reply MUST be cached if sa_cachethis or 2712 csa_cachethis are TRUE. For example, if a COMPOUND has eleven 2713 operations, including SEQUENCE, the fifth operation is a RENAME, and 2714 the tenth operation is a READ for one million bytes, the server may 2715 return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation. Since 2716 the server executed several operations, especially the non-idempotent 2717 RENAME, the client's request to cache the reply needs to be honored 2718 in order for correct operation of exactly once semantics. If the 2719 client retries the request, the server will have cached a reply that 2720 contains results for ten of the eleven requested operations, with the 2721 tenth operation having a status of NFS4ERR_REP_TOO_BIG_TO_CACHE.
2723 A client needs to take care, when sending operations that change 2724 the current filehandle (except for PUTFH, PUTPUBFH, PUTROOTFH, and 2725 RESTOREFH), that it not exceed the maximum reply buffer before the 2726 GETFH operation. Otherwise, the client will have to retry the 2727 operation that changed the current filehandle, in order to obtain the 2728 desired filehandle. For the OPEN operation (see Section 18.16), 2729 retry is not always available as an option. The following guidelines 2730 for the handling of filehandle changing operations are advised:
2732 o Within the same COMPOUND procedure, a client SHOULD send GETFH 2733 immediately after a current filehandle changing operation.
A 2734 client MUST send GETFH after a current filehandle changing 2735 operation that is also non-idempotent (for example, the OPEN 2736 operation), unless the operation is RESTOREFH. RESTOREFH is an 2737 exception, because even though it is non-idempotent, the 2738 filehandle that RESTOREFH produces originated from an operation that is 2739 either idempotent (e.g. PUTFH, LOOKUP) or non-idempotent (e.g. 2740 OPEN, CREATE). If the origin is non-idempotent, then because the 2741 client MUST send GETFH after the origin operation, the client can 2742 recover if RESTOREFH returns an error.
2744 o A server MAY return NFS4ERR_REP_TOO_BIG or 2745 NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a 2746 filehandle changing operation if the reply would be too large on 2747 the next operation.
2749 o A server SHOULD return NFS4ERR_REP_TOO_BIG or 2750 NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a 2751 filehandle changing non-idempotent operation if the reply would be 2752 too large on the next operation, especially if the operation is 2753 OPEN.
2755 o A server MAY return NFS4ERR_UNSAFE_COMPOUND to a non-idempotent 2756 current filehandle changing operation, if it looks at the next 2757 operation (in the same COMPOUND procedure) and finds it is not 2758 GETFH. The server SHOULD do this if it is unable to determine in 2759 advance whether the total response size would exceed 2760 ca_maxresponsesize_cached or ca_maxresponsesize.
2762 2.10.5.5. Persistence
2764 Since the reply cache is bounded, it is practical for the reply cache 2765 to persist across server restarts. The replier MUST persist the 2766 following information if it agreed to persist the session (when the 2767 session was created; see Section 18.36):
2769 o The session ID.
2771 o The slot table including the sequence ID and cached reply for each 2772 slot.
2774 The above are sufficient for a replier to provide EOS semantics for 2775 any requests that were sent and executed before the server restarted. 2776 If the replier is a client, then there is no need for it to persist 2777 any more information, unless the client will be persisting all other 2778 state across client restart, in which case the server will never 2779 see any NFSv4.1-level protocol manifestation of a client restart. If 2780 the replier is a server, with just the slot table and session ID 2781 persisting, any requests the client retries after the server restart 2782 will return the results that are cached in the reply cache, and any new 2783 requests (i.e. the sequence ID is one (1) greater than the slot's 2784 sequence ID) MUST be rejected with NFS4ERR_DEADSESSION (returned by 2785 SEQUENCE). Such a session is considered dead. A server MAY 2786 re-animate a session after a server restart so that the session will 2787 accept new requests as well as retries. To re-animate a session, the 2788 server needs to persist additional information through server 2789 restart:
2791 o The client ID. This is a prerequisite to let the client create 2792 more sessions associated with the same client ID.
2794 o The client ID's sequence ID that is used for creating sessions 2795 (see Section 18.35 and Section 18.36). This is a prerequisite to 2796 let the client create more sessions.
2798 o The principal that created the client ID. This allows the server 2799 to authenticate the client when it sends EXCHANGE_ID.
2801 o The SSV, if SP4_SSV state protection was specified when the client 2802 ID was created (see Section 18.35).
This lets the client create 2803 new sessions, and associate connections with the new and existing 2804 sessions.
2806 o The properties of the client ID as defined in Section 18.35.
2808 A persistent reply cache places certain demands on the server. The 2809 execution of the sequence of operations (starting with SEQUENCE) and 2810 placement of its results in the persistent cache MUST be atomic. If 2811 a client retries a sequence of operations that was previously 2812 executed on the server, the only acceptable outcomes are either the 2813 original cached reply or an indication that the client ID or session has 2814 been lost (indicating a catastrophic loss of the reply cache or a 2815 session that has been deleted because the client failed to use the 2816 session for an extended period of time).
2818 A server could fail and restart in the middle of a COMPOUND procedure 2819 that contains one or more non-idempotent or idempotent-but-modifying 2820 operations. This creates an even greater challenge for atomic 2821 execution and placement of results in the reply cache. One way to 2822 view the problem is as a single transaction consisting of each 2823 operation in the COMPOUND followed by storing the result in 2824 persistent storage, then finally a transaction commit. If there is a 2825 failure before the transaction is committed, then the server rolls 2826 back the transaction. If the server itself fails, then when it restarts, 2827 its recovery logic could roll back the transaction before starting 2828 the NFSv4.1 server.
2830 While the description of the implementation for atomic execution of 2831 the request and caching of the reply is beyond the scope of this 2832 document, an example implementation for NFSv2 [28] is described in 2833 [29].
2835 2.10.6. RDMA Considerations
2837 A complete discussion of the operation of RPC-based protocols over 2838 RDMA transports is in [8]. A discussion of the operation of NFSv4, 2839 including NFSv4.1, over RDMA is in [9]. Where RDMA is considered, 2840 this specification assumes the use of such a layering; it addresses 2841 only the upper layer issues relevant to making best use of RPC/RDMA.
2843 2.10.6.1. RDMA Connection Resources
2845 RDMA requires its consumers to register memory and post buffers of a 2846 specific size and number for receive operations.
2848 Registration of memory can be a relatively high-overhead operation, 2849 since it requires pinning of buffers, assignment of attributes (e.g. 2850 readable/writable), and initialization of hardware translation. 2851 Preregistration is desirable to reduce overhead. These registrations 2852 are specific to hardware interfaces and even to RDMA connection 2853 endpoints; therefore, negotiation of their limits is desirable to 2854 manage resources effectively.
2856 Following basic registration, these buffers must be posted by the RPC 2857 layer to handle receives. These buffers remain in use by the 2858 RPC/NFSv4.1 implementation; the size and number of them must be known to 2859 the remote peer in order to avoid RDMA errors which would cause a 2860 fatal error on the RDMA connection.
2862 NFSv4.1 manages slots as resources on a per session basis (see 2863 Section 2.10), while RDMA connections manage credits on a per 2864 connection basis. This means that in order for a peer to send data 2865 over RDMA to a remote buffer, it has to have both an NFSv4.1 slot 2866 and an RDMA credit.
If multiple RDMA connections are associated with 2867 a session, and the total number of credits across all RDMA 2868 connections associated with the session is X, while the number of slots in 2869 the session is Y, then the maximum number of outstanding requests is 2870 the lesser of X and Y.
2872 2.10.6.2. Flow Control
2874 Previous versions of NFS do not provide flow control; instead they 2875 rely on the windowing provided by transports like TCP to throttle 2876 requests. This does not work with RDMA, which provides no operation 2877 flow control and will terminate a connection in error when limits are 2878 exceeded. Limits such as maximum number of requests outstanding are 2879 therefore negotiated when a session is created (see the 2880 ca_maxrequests field in Section 18.36). These limits then provide 2881 the maxima within which each connection associated with the session's 2882 channel(s) must remain. RDMA connections are managed within 2883 these limits as described in section 3.3 ("Flow Control"[[Comment.2: 2884 RFC Editor: please verify section and title of the RPCRDMA 2885 document]]) of [8]; if there are multiple RDMA connections, then the 2886 maximum number of requests for a channel will be divided among the 2887 RDMA connections. Put a different way, the onus is on the replier to 2888 ensure that the total number of RDMA credits across all connections 2889 associated with the replier's channel does not exceed the channel's 2890 maximum number of outstanding requests.
2892 The limits may also be modified dynamically at the replier's choosing 2893 by manipulating certain parameters present in each NFSv4.1 reply. In 2894 addition, the CB_RECALL_SLOT callback operation (see Section 20.8) 2895 can be sent by a server to a client to return RDMA credits to the 2896 server, thereby lowering the maximum number of requests a client can 2897 have outstanding to the server.
2899 2.10.6.3. Padding
2901 Header padding is requested by each peer at session initiation (see 2902 the ca_headerpadsize argument to CREATE_SESSION in Section 18.36), 2903 and subsequently used by the RPC RDMA layer, as described in [8]. 2904 Zero padding is permitted.
2906 Padding leverages the useful property that RDMA transfers preserve alignment of 2907 data, even when the data are placed into anonymous (untagged) buffers. 2908 If requested, client inline writes will insert appropriate pad bytes 2909 within the request header to align the data payload on the specified 2910 boundary. The client is encouraged to add sufficient padding (up to 2911 the negotiated size) so that the "data" field of the NFSv4.1 WRITE 2912 operation is aligned. Most servers can make good use of such 2913 padding, which allows them to chain receive buffers in such a way 2914 that any data carried by client requests will be placed into 2915 appropriate buffers at the server, ready for file system processing. 2916 The receiver's RPC layer encounters no overhead from skipping over 2917 pad bytes, and the RDMA layer's high performance makes the insertion 2918 and transmission of padding on the sender a significant optimization. 2919 In this way, the need for servers to perform RDMA Read to satisfy all 2920 but the largest client writes is obviated. An added benefit is the 2921 reduction of message round trips on the network - a potentially good 2922 trade, where latency is present.
2924 The value to choose for padding is subject to a number of criteria.
2925 A primary source of variable-length data in the RPC header is the 2926 authentication information, the form of which is client-determined, 2927 possibly in response to server specification. The contents of 2928 COMPOUNDs, sizes of strings such as those passed to RENAME, etc. all 2929 go into the determination of a maximal NFSv4.1 request size and 2930 therefore minimal buffer size. The client must select its offered 2931 value carefully, so as not to overburden the server, and vice versa. 2932 The payoff of an appropriate padding value is higher performance. 2933 [[Comment.3: RFC editor please keep this diagram on one page.]]
2935 Sender gather: 2936 |RPC Request|Pad bytes|Length| -> |User data...| 2937 \------+----------------------/ \ 2938 \ \ 2939 \ Receiver scatter: \-----------+- ... 2940 /-----+----------------\ \ \ 2941 |RPC Request|Pad|Length| -> |FS buffer|->|FS buffer|->...
2943 In the above case, the server may recycle buffers unused by the actual 2944 received request to the next posted receive, or may pass 2945 the now-complete buffers by reference for normal write processing. 2946 For a server which can make use of it, this removes any need for 2947 copies of incoming data, without resorting to complicated end-to-end 2948 buffer advertisement and management. This includes most kernel-based 2949 and integrated server designs, among many others. The client may 2950 perform similar optimizations, if desired.
2952 2.10.6.4. Dual RDMA and Non-RDMA Transports
2954 Some RDMA transports (for example, [10]) permit a "streaming" (non- 2955 RDMA) phase, where ordinary traffic might flow before "stepping up" 2956 to RDMA mode, commencing RDMA traffic. Some RDMA transports always start 2957 connections in RDMA mode. NFSv4.1 allows, but does not 2958 assume, a streaming phase before RDMA mode. When a connection is 2959 associated with a session, the client and server negotiate whether 2960 the connection is used in RDMA or non-RDMA mode (see Section 18.36 2961 and Section 18.34).
2963 2.10.7. Sessions Security
2965 2.10.7.1. Session Callback Security
2967 Via session/connection association, NFSv4.1 improves security over 2968 that provided by NFSv4.0 for the backchannel. The connection is 2969 client-initiated (see Section 18.34), and subject to the same 2970 firewall and routing checks as the fore channel. The connection 2971 cannot be hijacked by an attacker who connects to the client port 2972 prior to the intended server, as is possible with NFSv4.0. At the 2973 client's option (see Section 18.35), connection association is fully 2974 authenticated before being activated (see Section 18.34). Traffic 2975 from the server over the backchannel is authenticated exactly as the 2976 client specifies (see Section 2.10.7.2).
2978 2.10.7.2. Backchannel RPC Security
2980 When the NFSv4.1 client establishes the backchannel, it informs the 2981 server of the security flavors and principals to use when sending 2982 requests. If the security flavor is RPCSEC_GSS, the client expresses 2983 the principal in the form of an established RPCSEC_GSS context. The 2984 server is free to use any of the flavor/principal combinations the 2985 client offers, but it MUST NOT use unoffered combinations. This way, 2986 the client need not provide a target GSS principal for the 2987 backchannel as it did with NFSv4.0, nor does the server have to implement 2988 an RPCSEC_GSS initiator as it did with NFSv4.0 [21].
2990 The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL 2991 (Section 18.33) operations allow the client to specify flavor/ 2992 principal combinations. 2994 Also note that the SP4_SSV state protection mode (see Section 18.35 2995 and Section 2.10.7.3) has the side benefit of providing SSV-derived 2996 RPCSEC_GSS contexts (Section 2.10.8). 2998 2.10.7.3. Protection from Unauthorized State Changes 3000 As described to this point in the specification, the state model of 3001 NFSv4.1 is vulnerable to an attacker that sends a SEQUENCE operation 3002 with a forged session ID and with a slot ID that it expects the 3003 legitimate client to use next. When the legitimate client uses the 3004 slot ID with the same sequence number, the server returns the 3005 attacker's result from the reply cache which disrupts the legitimate 3006 client and thus denies service to it. Similarly an attacker could 3007 send a CREATE_SESSION with a forged client ID to create a new session 3008 associated with the client ID. The attacker could send requests 3009 using the new session that change locking state, such as LOCKU 3010 operations to release locks the legitimate client has acquired. 3011 Setting a security policy on the file which requires RPCSEC_GSS 3012 credentials when manipulating the file's state is one potential work 3013 around, but has the disadvantage of preventing a legitimate client 3014 from releasing state when RPCSEC_GSS is required to do so, but a GSS 3015 context cannot be obtained (possibly because the user has logged off 3016 the client). 3018 NFSv4.1 provides three options to a client for state protection which 3019 are specified when a client creates a client ID via EXCHANGE_ID 3020 (Section 18.35). 3022 The first (SP4_NONE) is to simply waive state protection. 3024 The other two options (SP4_MACH_CRED and SP4_SSV) share several 3025 traits: 3027 o An RPCSEC_GSS-based credential is used to authenticate client ID 3028 and session maintenance operations, including creating and 3029 destroying a session, associating a connection with the session, 3030 and destroying the client ID. 3032 o Because RPCSEC_GSS is used to authenticate client ID and session 3033 maintenance, the attacker cannot associate a rogue connection with 3034 a legitimate session, or associate a rogue session with a 3035 legitimate client ID in order to maliciously alter the client ID's 3036 lock state via CLOSE, LOCKU, DELEGRETURN, LAYOUTRETURN, etc. 3038 o In cases where the server's security policies on a portion of its 3039 namespace require RPCSEC_GSS authentication, a client may have to 3040 use an RPCSEC_GSS credential to remove per-file state (for example 3041 LOCKU, CLOSE, etc.). The server may require that the principal 3042 that removes the state match certain criteria (for example, the 3043 principal might have to be the same as the one that acquired the 3044 state). However, the client might not have an RPCSEC_GSS context 3045 for such a principal, and might not be able to create such a 3046 context (perhaps because the user has logged off). When the 3047 client establishes SP4_MACH_CRED or SP4_SSV protection, it can 3048 specify a list of operations that the server MUST allow using the 3049 machine credential (if SP4_MACH_CRED is used) or the SSV 3050 credential (if SP4_SSV is used). 3052 The SP4_MACH_CRED state protection option uses a machine credential 3053 where the principal that creates the client ID, must also be the 3054 principal that performs client ID and session maintenance operations. 
3055 The security of the machine credential state protection approach 3056 depends entirely on safe guarding the per-machine credential. 3057 Assuming a proper safe guard, using the per-machine credential for 3058 operations like CREATE_SESSION, BIND_CONN_TO_SESSION, 3059 DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker from 3060 associating a rogue connection with a session, or associating a rogue 3061 session with a client ID. 3063 There are at least three scenarios for the SP4_MACH_CRED option: 3065 1. That the system administrator configures a unique, permanent per- 3066 machine credential for one of the mandated GSS mechanisms (for 3067 example, if Kerberos V5 is used, a "keytab" containing a 3068 principal named after client host name could be used). 3070 2. The client is used by a single user, and so the client ID and its 3071 sessions are used by just that user. If the user's credential 3072 expires, then session and client ID maintenance cannot occur, but 3073 since the client has a single user, only that user is 3074 inconvenienced. 3076 3. The physical client has multiple users, but the client 3077 implementation has a unique client ID for each user. This is 3078 effectively the same as the second scenario, but a disadvantage 3079 is that each user must be allocated at least one session each, so 3080 the approach suffers from lack of economy. 3082 The SP4_SSV protection option uses a Secret State Verifier (SSV) 3083 which is shared between a client and server. The SSV serves as the 3084 secret key for an internal (that is, internal to NFSv4.1) GSS 3085 mechanism that uses the secret key for Message Integrity Code (MIC) 3086 and Wrap tokens (Section 2.10.8). The SP4_SSV protection option is 3087 intended for the client that has multiple users, and the system 3088 administrator does not wish to configure a permanent machine 3089 credential for each client. The SSV is established on the server via 3090 SET_SSV (see Section 18.47). To prevent eavesdropping, a client 3091 SHOULD send SET_SSV via RPCSEC_GSS with the privacy service. Several 3092 aspects of the SSV make it intractable for an attacker to guess the 3093 SSV, and thus associate rogue connections with a session, and rogue 3094 sessions with a client ID: 3096 o The arguments to and results of SET_SSV include digests of the old 3097 and new SSV, respectively. 3099 o Because the initial value of the SSV is zero, therefore known, the 3100 client that opts for SP4_SSV protection and opts to apply SP4_SSV 3101 protection to BIND_CONN_TO_SESSION and CREATE_SESSION MUST send at 3102 least one SET_SSV operation before the first BIND_CONN_TO_SESSION 3103 operation or before the second CREATE_SESSION operation on a 3104 client ID. If it does not, the SSV mechanism will not generate 3105 tokens (Section 2.10.8). A client SHOULD send SET_SSV as soon as 3106 a session is created. 3108 o A SET_SSV does not replace the SSV with the argument to SET_SSV. 3109 Instead, the current SSV on the server is logically exclusive ORed 3110 (XORed) with the argument to SET_SSV. Each time a new principal 3111 uses a client ID for the first time, the client SHOULD send a 3112 SET_SSV with that principal's RPCSEC_GSS credentials, with 3113 RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY. 3115 Here are the types of attacks that can be attempted by an attacker 3116 named Eve on a victim named Bob, and how SP4_SSV protection foils 3117 each attack: 3119 o Suppose Eve is the first user to log into a legitimate client. 
3120 Eve's use of an NFSv4.1 file system will cause the legitimate 3121 client to create a client ID with SP4_SSV protection, specifying 3122 that the BIND_CONN_TO_SESSION operation MUST use the SSV 3123 credential. Eve's use of the file system also causes an SSV to be 3124 created. The SET_SSV operation that creates the SSV will be 3125 protected by the RPCSEC_GSS context created by the legitimate 3126 client which uses Eve's GSS principal and credentials. Eve can 3127 eavesdrop on the network while her RPCSEC_GSS context is created, 3128 and the SET_SSV using her context is sent. Even if the legitimate 3129 client sends the SET_SSV with RPC_GSS_SVC_PRIVACY, because Eve 3130 knows her own credentials, she can decrypt the SSV. Eve can 3131 compute an RPCSEC_GSS credential that BIND_CONN_TO_SESSION will 3132 accept, and so associate a new connection with the legitimate 3133 session. Eve can change the slot ID and sequence state of a 3134 legitimate session, and/or the SSV state, in such a way that when 3135 Bob accesses the server via the same legitimate client, the 3136 legitimate client will be unable to use the session. 3138 The client's only recourse is to create a new client ID for Bob to 3139 use, and establish a new SSV for the client ID. The client will 3140 be unable to delete the old client ID, and will let the lease on 3141 the old client ID expire. 3143 Once the legitimate client establishes an SSV over the new session 3144 using Bob's RPCSEC_GSS context, Eve can use the new session via 3145 the legitimate client, but she cannot disrupt Bob. Moreover, 3146 because the client SHOULD have modified the SSV due to Eve using 3147 the new session, Bob cannot get revenge on Eve by associating a 3148 rogue connection with the session. 3150 The question is how did the legitimate client detect that Eve has 3151 hijacked the old session? When the client detects that a new 3152 principal, Bob, wants to use the session, it SHOULD have sent a 3153 SET_SSV, which leads to following sub-scenarios: 3155 * Let us suppose that from the rogue connection, Eve sent a 3156 SET_SSV with the same slot ID and sequence ID that the 3157 legitimate client later uses. The server will assume the 3158 SET_SSV sent with Bob's credentials is a retry, and return to 3159 the legitimate client the reply it sent Eve. However, unless 3160 Eve can correctly guess the SSV the legitimate client will use, 3161 the digest verification checks in the SET_SSV response will 3162 fail. That is an indication to the client that the session has 3163 apparently been hijacked. 3165 * Alternatively, Eve sent a SET_SSV with a different slot ID than 3166 the legitimate client uses for its SET_SSV. Then the digest 3167 verification of the SET_SSV sent with Bob's credentials fails 3168 on the server, and the error returned to the client makes it 3169 apparent that the session has been hijacked. 3171 * Alternatively, Eve sent an operation other than SET_SSV, but 3172 with the same slot ID and sequence that the legitimate client 3173 uses for its SET_SSV. The server returns to the legitimate 3174 client the response it sent Eve. The client sees that the 3175 response is not at all what it expects. The client assumes 3176 either session hijacking or a server bug, and either way 3177 destroys the old session. 3179 o Eve associates a rogue connection with the session as above, and 3180 then destroys the session. Again, Bob goes to use the server from 3181 the legitimate client, which sends a SET_SSV using Bob's 3182 credentials. 
The client receives an error that indicates the 3183 session does not exist. When the client tries to create a new 3184 session, this will fail because the SSV it has does not match that 3185 the server has, and now the client knows the session was hijacked. 3186 The legitimate client establishes a new client ID. 3188 o If Eve creates a connection before the legitimate client 3189 establishes an SSV, because the initial value of the SSV is zero 3190 and therefore known, Eve can send a SET_SSV that will pass the 3191 digest verification check. However because the new connection has 3192 not been associated with the session, the SET_SSV is rejected for 3193 that reason. 3195 In summary, an attacker's disruption of state when SP4_SSV protection 3196 is in use is limited to the formative period of a client ID, its 3197 first session, and the establishment of the SSV. Once a non- 3198 malicious user uses the client ID, the client quickly detects any 3199 hijack and rectifies the situation. Once a non-malicious user 3200 successfully modifies the SSV, the attacker cannot use NFSv4.1 3201 operations to disrupt the non-malicious user. 3203 Note that neither the SP4_MACH_CRED nor SP4_SSV protection approaches 3204 prevent hijacking of a transport connection that has previously been 3205 associated with a session. If the goal of a counter threat strategy 3206 is to prevent connection hijacking, the use of IPsec is RECOMMENDED. 3208 If a connection hijack occurs, the hijacker could in theory change 3209 locking state and negatively impact the service to legitimate 3210 clients. However if the server is configured to require the use of 3211 RPCSEC_GSS with integrity or privacy on the affected file objects, 3212 and if EXCHGID4_FLAG_BIND_PRINC_STATEID capability (Section 18.35), 3213 is in force, this will thwart unauthorized attempts to change locking 3214 state. 3216 2.10.8. The SSV GSS Mechanism 3218 The SSV provides the secret key for a mechanism that NFSv4.1 uses for 3219 state protection. Contexts for this mechanism are not established 3220 via the RPCSEC_GSS protocol. Instead, the contexts are automatically 3221 created when EXCHANGE_ID specifies SP4_SSV protection. The only 3222 tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the 3223 SealedMessage token (emitted by GSS_Wrap). 3225 The mechanism OID for the SSV mechanism is: 3226 iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech 3227 (1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define any 3228 initial context tokens, the OID can be used to let servers indicate 3229 that the SSV mechanism is acceptable whenever the client sends a 3230 SECINFO or SECINFO_NO_NAME operation (see Section 2.6). 3232 The SSV mechanism defines four subkeys derived from the SSV value. 3233 Each time SET_SSV is invoked the subkeys are recalculated by the 3234 client and server. The calculation of each of the four subkeys 3235 depends on each of the four respective ssv_subkey4 enumerated values. 3236 The calculation uses the HMAC [11], algorithm, using the current SSV 3237 as the key, the one way hash algorithm as negotiated by EXCHANGE_ID, 3238 and the input text as represented by the XDR encoded enumeration of 3239 type ssv_subkey4. 
3241 /* Input for computing subkeys */
3242 enum ssv_subkey4 {
3243     SSV4_SUBKEY_MIC_I2T  = 1,
3244     SSV4_SUBKEY_MIC_T2I  = 2,
3245     SSV4_SUBKEY_SEAL_I2T = 3,
3246     SSV4_SUBKEY_SEAL_T2I = 4
3247 };

3249 The subkey derived from SSV4_SUBKEY_MIC_I2T is used for calculating 3250 message integrity codes (MICs) that originate from the NFSv4.1 3251 client, whether as part of a request over the fore channel, or a 3252 response over the backchannel. The subkey derived from 3253 SSV4_SUBKEY_MIC_T2I is used for MICs originating from the NFSv4.1 3254 server. The subkey derived from SSV4_SUBKEY_SEAL_I2T is used for encrypting text 3255 originating from the NFSv4.1 client, and the subkey derived from 3256 SSV4_SUBKEY_SEAL_T2I is used for encrypting text originating from the 3257 NFSv4.1 server.

3259 The PerMsgToken description is based on an XDR definition:

3261 /* Input for computing smt_hmac */
3262 struct ssv_mic_plain_tkn4 {
3263     uint32_t smpt_ssv_seq;
3264     opaque   smpt_orig_plain<>;
3265 };

3267 /* SSV GSS PerMsgToken token */
3268 struct ssv_mic_tkn4 {
3269     uint32_t smt_ssv_seq;
3270     opaque   smt_hmac<>;
3271 };

3273 The field smt_hmac is an HMAC calculated by using the subkey derived 3274 from SSV4_SUBKEY_MIC_I2T or SSV4_SUBKEY_MIC_T2I as the key, the one 3275 way hash algorithm as negotiated by EXCHANGE_ID, and the input text 3276 as represented by data of type ssv_mic_plain_tkn4. The field 3277 smpt_ssv_seq is the same as smt_ssv_seq. The field smpt_orig_plain 3278 is the "message" input passed to GSS_GetMIC() (see Section 2.3.1 of 3279 [7]). The caller of GSS_GetMIC() provides a pointer to a buffer 3280 containing the plain text. The SSV mechanism's entry point for 3281 GSS_GetMIC() encodes this into an opaque array, and the encoding will 3282 include an initial four byte length, plus any necessary padding. 3283 Prepended to this will be the XDR encoded value of smpt_ssv_seq, thus 3284 making up an XDR encoding of a value of data type ssv_mic_plain_tkn4, 3285 which in turn is the input into the HMAC.

3287 The token emitted by GSS_GetMIC() is XDR encoded and of XDR data type 3288 ssv_mic_tkn4. The field smt_ssv_seq comes from the SSV sequence 3289 number, which is equal to 1 after SET_SSV (Section 18.47) is called 3290 the first time on a client ID. Thereafter, it is incremented on each 3291 SET_SSV. Thus smt_ssv_seq represents the version of the SSV at the 3292 time GSS_GetMIC() was called. As noted in Section 18.35, the client 3293 and server can maintain multiple concurrent versions of the SSV. 3294 This allows the SSV to be changed without serializing all RPC calls 3295 that use the SSV mechanism with SET_SSV operations. Once the HMAC is 3296 calculated, it is XDR encoded into smt_hmac, which will include an 3297 initial four byte length, and any necessary padding. Prepended to 3298 this will be the XDR encoded value of smt_ssv_seq.

3300 The SealedMessage description is based on an XDR definition:

3302 /* Input for computing ssct_encr_data and ssct_hmac */
3303 struct ssv_seal_plain_tkn4 {
3304     opaque   sspt_confounder<>;
3305     uint32_t sspt_ssv_seq;
3306     opaque   sspt_orig_plain<>;
3307     opaque   sspt_pad<>;
3308 };

3310 /* SSV GSS SealedMessage token */
3311 struct ssv_seal_cipher_tkn4 {
3312     uint32_t ssct_ssv_seq;
3313     opaque   ssct_iv<>;
3314     opaque   ssct_encr_data<>;
3315     opaque   ssct_hmac<>;
3316 };

3318 The token emitted by GSS_Wrap() is XDR encoded and of XDR data type 3319 ssv_seal_cipher_tkn4.

3321 The ssct_ssv_seq field has the same meaning as smt_ssv_seq.
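The subkey derivation and PerMsgToken construction described above can be illustrated with a minimal sketch in Python. The one way hash algorithm is whatever EXCHANGE_ID negotiated; SHA-256 below is only an illustrative stand-in, and the helper names are not protocol elements:

   import hmac
   import hashlib
   import struct

   def xdr_uint32(n):
       # XDR unsigned int: four bytes, big-endian
       return struct.pack('>I', n)

   def xdr_opaque(data):
       # XDR variable-length opaque: four byte length, the data, then
       # zero padding up to a multiple of four bytes
       return xdr_uint32(len(data)) + data + b'\x00' * ((4 - len(data) % 4) % 4)

   SSV4_SUBKEY_MIC_I2T = 1        # from the ssv_subkey4 enumeration above

   def derive_subkey(ssv, subkey, digest=hashlib.sha256):
       # subkey = HMAC keyed with the current SSV over the XDR encoded
       # ssv_subkey4 value (digest is the negotiated hash; SHA-256 here
       # is an assumption for illustration only)
       return hmac.new(ssv, xdr_uint32(subkey), digest).digest()

   def per_msg_token(ssv, ssv_seq, message, digest=hashlib.sha256):
       # Encode ssv_mic_plain_tkn4 (smpt_ssv_seq followed by the opaque
       # smpt_orig_plain), HMAC it with the client-originated MIC subkey,
       # and return the XDR encoded ssv_mic_tkn4 token.
       key = derive_subkey(ssv, SSV4_SUBKEY_MIC_I2T, digest)
       plain = xdr_uint32(ssv_seq) + xdr_opaque(message)
       smt_hmac = hmac.new(key, plain, digest).digest()
       return xdr_uint32(ssv_seq) + xdr_opaque(smt_hmac)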
3323 The ssct_encr_data field is the result of encrypting a value of the 3324 XDR encoded data type ssv_seal_plain_tkn4. The encryption key is the 3325 subkey derived from SSV4_SUBKEY_SEAL_I2T or SSV4_SUBKEY_SEAL_T2I, and 3326 the encryption algorithm is that negotiated by EXCHANGE_ID.

3328 The ssct_iv field is the initialization vector (IV) for the 3329 encryption algorithm (if applicable) and is sent in clear text. The 3330 content and size of the IV MUST comply with the specification of the 3331 encryption algorithm. For example, the id-aes256-CBC algorithm MUST 3332 use a 16 byte initialization vector (IV), which MUST be unpredictable 3333 for each instance of a value of type ssv_seal_plain_tkn4 that is 3334 encrypted with a particular SSV key.

3336 The ssct_hmac field is the result of computing an HMAC using the value of 3337 the XDR encoded data type ssv_seal_plain_tkn4 as the input text. The 3338 key is the subkey derived from SSV4_SUBKEY_MIC_I2T or 3339 SSV4_SUBKEY_MIC_T2I, and the one way hash algorithm is that 3340 negotiated by EXCHANGE_ID.

3342 The sspt_confounder field is a random value.

3344 The sspt_ssv_seq field is the same as ssct_ssv_seq.

3346 The sspt_orig_plain field is the original plaintext and is the 3347 "input_message" input passed to GSS_Wrap() (see Section 2.3.3 of 3348 [7]). As with the handling of the plaintext by the SSV mechanism's 3349 GSS_GetMIC() entry point, the entry point for GSS_Wrap() expects a 3350 pointer to the plaintext, and will XDR encode an opaque array into 3351 sspt_orig_plain representing the plain text, along with the other 3352 fields of an instance of data type ssv_seal_plain_tkn4.

3354 The sspt_pad field is present to support encryption algorithms that 3355 require inputs to be in fixed sized blocks. The content of sspt_pad 3356 is zero filled except for the length. Beware that the XDR encoding 3357 of ssv_seal_plain_tkn4 contains three variable length arrays, and so 3358 each array consumes four bytes for an array length, and each array 3359 that follows the length is always padded to a multiple of four bytes 3360 per the XDR standard.

3362 For example, suppose the encryption algorithm uses 16 byte blocks, and 3363 the sspt_confounder is three bytes long, and the sspt_orig_plain 3364 field is 15 bytes long. The XDR encoding of sspt_confounder uses 3365 eight bytes (4 + 3 + 1 byte pad), the XDR encoding of sspt_ssv_seq 3366 uses four bytes, the XDR encoding of sspt_orig_plain uses 20 bytes (4 3367 + 15 + 1 byte pad), and the smallest XDR encoding of the sspt_pad 3368 field is four bytes. This totals 36 bytes. The next multiple of 16 3369 is 48; thus, the length field of sspt_pad needs to be set to 12 bytes, 3370 for a total encoding of 16 bytes. The total number of XDR encoded 3371 bytes is thus 8 + 4 + 20 + 16 = 48.

3373 GSS_Wrap() emits a token that is an XDR encoding of a value of data 3374 type ssv_seal_cipher_tkn4. Note that regardless of whether the caller 3375 of GSS_Wrap() requests confidentiality or not, the token always has 3376 confidentiality. This is because the SSV mechanism is for 3377 RPCSEC_GSS, and RPCSEC_GSS never produces GSS_Wrap() tokens without 3378 confidentiality.

3380 There is one SSV per client ID. Effectively there is a single GSS 3381 context for a client ID / SSV pair. All SSV mechanism RPCSEC_GSS 3382 handles of a client ID / SSV pair share the same GSS context. SSV 3383 GSS contexts do not expire except when the SSV is destroyed (causes 3384 would include the client ID being destroyed or a server restart).
3385 Since one purpose of context expiration is to replace keys that have 3386 been in use for "too long", and hence are vulnerable to compromise by brute 3387 force or accident, the client can replace the SSV key by sending 3388 periodic SET_SSV operations, cycling through different users' 3389 RPCSEC_GSS credentials. This way, the SSV is replaced without 3390 destroying the SSV's GSS contexts.

3392 SSV RPCSEC_GSS handles can be expired or deleted by the server at any 3393 time, and the EXCHANGE_ID operation can be used to create more SSV 3394 RPCSEC_GSS handles. Expiration of SSV RPCSEC_GSS handles does not 3395 imply that the SSV or its GSS context has expired.

3397 The client MUST establish an SSV via SET_SSV before the SSV GSS 3398 context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC(). 3399 If SET_SSV has not been successfully called, attempts to emit tokens 3400 MUST fail.

3402 The SSV mechanism does not support replay detection and sequencing in 3403 its tokens because RPCSEC_GSS does not use those features (see 3404 Section 5.2.2 "Context Creation Requests" in [4]).

3406 2.10.9. Session Mechanics - Steady State

3408 2.10.9.1. Obligations of the Server

3410 The server has the primary obligation to monitor the state of 3411 backchannel resources that the client has created for the server 3412 (RPCSEC_GSS contexts and backchannel connections). If these 3413 resources vanish, the server takes action as specified in 3414 Section 2.10.11.2.

3416 2.10.9.2. Obligations of the Client

3418 The client SHOULD honor the following obligations in order to utilize 3419 the session:

3421 o Keep a necessary session from going idle on the server. A client 3422 that requires a session but nonetheless is not sending operations 3423 risks having the session be destroyed by the server. This is 3424 because sessions consume resources, and resource limitations may 3425 force the server to cull an inactive session. A server MAY 3426 consider a session to be inactive if the client has not used the 3427 session before the session inactivity timer (Section 2.10.10) has 3428 expired.

3430 o Destroy the session when not needed. If a client has multiple 3431 sessions, one of which has no requests waiting for replies, and 3432 has been idle for some period of time, it SHOULD destroy the 3433 session.

3435 o Maintain GSS contexts for the backchannel. If the client requires 3436 the server to use the RPCSEC_GSS security flavor for callbacks, 3437 then it needs to be sure the contexts handed to the server via 3438 BACKCHANNEL_CTL are unexpired.

3440 o Preserve a connection for a backchannel. The server requires a 3441 backchannel in order to gracefully recall recallable state, or 3442 notify the client of certain events. Note that if the connection 3443 is not being used for the fore channel, there is no way for the 3444 client to tell if the connection is still alive (e.g., the server 3445 restarted without sending a disconnect). The onus is on the 3446 server, not the client, to determine if the backchannel's 3447 connection is alive, and to indicate in the response to a SEQUENCE 3448 operation when the last connection associated with a session's 3449 backchannel has disconnected.

3451 2.10.9.3. Steps the Client Takes To Establish a Session

3453 If the client does not have a client ID, the client sends EXCHANGE_ID 3454 to establish a client ID.
If it opts for SP4_MACH_CRED or SP4_SSV 3455 protection, in the spo_must_enforce list of operations, it SHOULD at a 3456 minimum specify: CREATE_SESSION, DESTROY_SESSION, 3457 BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts 3458 for SP4_SSV protection, the client needs to ask for SSV-based 3459 RPCSEC_GSS handles.

3461 The client uses the client ID to send a CREATE_SESSION on a 3462 connection to the server. The results of CREATE_SESSION indicate 3463 whether the server will persist the session reply cache through a 3464 server restart or not, and the client notes this for future 3465 reference.

3467 If the client specified SP4_SSV state protection when the client ID 3468 was created, then it SHOULD send SET_SSV in the first COMPOUND after 3469 the session is created. Each time a new principal goes to use the 3470 client ID, it SHOULD send a SET_SSV again.

3472 If the client wants to use delegations, layouts, directory 3473 notifications, or any other state that requires a backchannel, then 3474 it must add a connection to the backchannel if CREATE_SESSION did not 3475 already do so. The client creates a connection, and calls 3476 BIND_CONN_TO_SESSION to associate the connection with the session and 3477 the session's backchannel. If CREATE_SESSION did not already do so, 3478 the client MUST tell the server what security is required in order 3479 for the client to accept callbacks. The client does this via 3480 BACKCHANNEL_CTL. If the client selected SP4_MACH_CRED or SP4_SSV 3481 protection when it called EXCHANGE_ID, then the client SHOULD specify 3482 that the backchannel use RPCSEC_GSS contexts for security.

3484 If the client wants to use additional connections for the 3485 backchannel, then it must call BIND_CONN_TO_SESSION on each 3486 connection it wants to use with the session. If the client wants to 3487 use additional connections for the fore channel, then it must call 3488 BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED state 3489 protection when the client ID was created.

3491 At this point, the session has reached steady state.

3493 2.10.10. Session Inactivity Timer

3495 The server MAY maintain a session inactivity timer for each session. 3496 If the session inactivity timer expires, then the server MAY destroy 3497 the session. To avoid losing a session due to inactivity, the client 3498 MUST renew the session inactivity timer. The length of the session 3499 inactivity timer MUST NOT be less than the lease_time attribute 3500 (Section 5.8.1.11). As with lease renewal (Section 8.3), when the 3501 server receives a SEQUENCE operation, it resets the session 3502 inactivity timer, and MUST NOT allow the timer to expire while the 3503 rest of the operations in the COMPOUND procedure's request are still 3504 executing. Once the last operation has finished, the server MUST set 3505 the session inactivity timer to expire no sooner than the sum of the 3506 current time and the value of the lease_time attribute.

3508 2.10.11. Session Mechanics - Recovery

3510 2.10.11.1. Events Requiring Client Action

3512 The following events require client action to recover.

3514 2.10.11.1.1. RPCSEC_GSS Context Loss by Callback Path

3516 If all RPCSEC_GSS contexts granted by the client to the server for 3517 callback use have expired, the client MUST establish a new context 3518 via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE 3519 results indicates when callback contexts are nearly expired, or fully 3520 expired (see Section 18.46.3).

3522 2.10.11.1.2.
Connection Loss

3524 If the client loses the last connection of the session, and if it wants 3525 to retain the session, then it must create a new connection, and if, 3526 when the client ID was created, BIND_CONN_TO_SESSION was specified in 3527 the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION 3528 to associate the connection with the session.

3530 If there was a request outstanding at the time of the connection 3531 loss, then if the client wants to continue to use the session, it MUST 3532 retry the request, as described in Section 2.10.5.2. Note that it is 3533 not necessary to retry requests over a connection with the same 3534 source network address or the same destination network address as the 3535 lost connection. As long as the session ID, slot ID, and sequence ID 3536 in the retry match that of the original request, the server will 3537 recognize the request as a retry if it executed the request prior to 3538 disconnect.

3540 If the connection that was lost was the last one associated with the 3541 backchannel, and the client wants to retain the backchannel and/or 3542 not put recallable state subject to revocation, the client must 3543 reconnect, and if it does, it MUST associate the connection to the 3544 session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD 3545 indicate when it has no callback connection via the sr_status_flags 3546 result from SEQUENCE.

3548 2.10.11.1.3. Backchannel GSS Context Loss

3550 Via the sr_status_flags result of the SEQUENCE operation or other 3551 means, the client will learn if some or all of the RPCSEC_GSS 3552 contexts it assigned to the backchannel have been lost. If the 3553 client wants to retain the backchannel and/or not put recallable 3554 state subject to revocation, the client must use BACKCHANNEL_CTL 3555 to assign new contexts.

3557 2.10.11.1.4. Loss of Session

3559 The replier might lose a record of the session. Causes include:

3561 o Replier failure and restart

3563 o A catastrophe that causes the reply cache to be corrupted or lost 3564 on the media it was stored on. This applies even if the replier 3565 indicated in the CREATE_SESSION results that it would persist the 3566 cache.

3568 o The server purges the session of a client that has been inactive 3569 for a very extended period of time.

3571 Loss of reply cache is equivalent to loss of session. The replier 3572 indicates loss of session to the requester by returning 3573 NFS4ERR_BADSESSION on the next operation that uses the session ID 3574 that refers to the lost session.

3576 After an event like a server restart, the client may have lost its 3577 connections. The client assumes for the moment that the session has 3578 not been lost. It reconnects, and if it specified connection 3579 association enforcement when the session was created, it invokes 3580 BIND_CONN_TO_SESSION using the session ID. Otherwise, it invokes 3581 SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns 3582 NFS4ERR_BADSESSION, the client knows the session was lost. If the 3583 connection survives session loss, then the next SEQUENCE operation 3584 the client sends over the connection will get back 3585 NFS4ERR_BADSESSION. The client again knows the session was lost.

3587 When the client detects session loss, it must call CREATE_SESSION to 3588 recover. Any non-idempotent operations that were in progress may 3589 have been performed on the server at the time of session loss. The 3590 client has no general way to recover from this.
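The detection and recovery sequence just described can be summarized with an illustrative sketch. The methods on clnt are hypothetical stand-ins for the client's RPC machinery rather than protocol elements, and the sketch includes the NFS4ERR_STALE_CLIENTID fallback discussed in the paragraphs that follow:

   # Illustrative only; clnt and its methods are hypothetical stand-ins.
   NFS4ERR_BADSESSION = 'NFS4ERR_BADSESSION'
   NFS4ERR_STALE_CLIENTID = 'NFS4ERR_STALE_CLIENTID'

   def recover_after_reconnect(clnt):
       conn = clnt.reconnect()
       if clnt.connection_binding_enforced:
           # BIND_CONN_TO_SESSION was in the spo_must_enforce list
           status = clnt.bind_conn_to_session(conn)
       else:
           status = clnt.sequence(conn)
       if status != NFS4ERR_BADSESSION:
           return                        # the session survived
       # The session (or its reply cache) was lost; create a new one.
       status = clnt.create_session(conn)
       if status == NFS4ERR_STALE_CLIENTID:
           # The server did not preserve client ID state (e.g., it
           # restarted), so re-establish the client ID first and retry.
           clnt.exchange_id(conn)
           clnt.create_session(conn)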
3592 Note that loss of session does not imply loss of lock, open, 3593 delegation, or layout state because locks, opens, delegations, and 3594 layouts are tied to the client ID and depend on the client ID, not 3595 the session. Nor does loss of lock, open, delegation, or layout 3596 state imply loss of session state, because the session depends on the 3597 client ID; loss of client ID, however, does imply loss of session, 3598 lock, open, delegation, and layout state. See Section 8.4.2. A 3599 session can survive a server restart, but lock recovery may still be 3600 needed.

3602 It is possible that CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID 3603 (for example, the server restarts and does not preserve client ID 3604 state). If so, the client needs to call EXCHANGE_ID, followed by 3605 CREATE_SESSION.

3607 2.10.11.2. Events Requiring Server Action

3609 The following events require server action to recover.

3611 2.10.11.2.1. Client Crash and Restart

3613 As described in Section 18.35, a restarted client sends EXCHANGE_ID 3614 in such a way that it causes the server to delete any sessions it had.

3616 2.10.11.2.2. Client Crash with No Restart

3618 If a client crashes and never comes back, it will never send 3619 EXCHANGE_ID with its old client owner. Thus the server has session 3620 state that will never be used again. After an extended period of 3621 time, and if the server has resource constraints, it MAY destroy the 3622 old session as well as locking state.

3624 2.10.11.2.3. Extended Network Partition

3626 To the server, the extended network partition may be no different 3627 from a client crash with no restart (see Section 2.10.11.2.2). 3628 Unless the server can discern that there is a network partition, it 3629 is free to treat the situation as if the client has crashed 3630 permanently.

3632 2.10.11.2.4. Backchannel Connection Loss

3634 If there were callback requests outstanding at the time of a 3635 connection loss, then the server MUST retry the requests, as described 3636 in Section 2.10.5.2. Note that it is not necessary to retry requests 3637 over a connection with the same source network address or the same 3638 destination network address as the lost connection. As long as the 3639 session ID, slot ID, and sequence ID in the retry match that of the 3640 original request, the callback target will recognize the request as a 3641 retry even if it did see the request prior to disconnect.

3643 If the connection lost is the last one associated with the 3644 backchannel, then the server MUST indicate that in the 3645 sr_status_flags field of every SEQUENCE reply until the backchannel 3646 is reestablished. There are two situations, each of which uses 3647 different status flags: no connectivity for the session's 3648 backchannel, and no connectivity for any session backchannel of the 3649 client. See Section 18.46 for a description of the appropriate flags 3650 in sr_status_flags.

3652 2.10.11.2.5. GSS Context Loss

3654 The server SHOULD monitor when the number of RPCSEC_GSS contexts 3655 assigned to the backchannel reaches one, and when that one context is 3656 near expiry (i.e., between one and two periods of lease time), and 3657 indicate so in the sr_status_flags field of all SEQUENCE replies. 3658 The server MUST indicate when all of the backchannel's assigned 3659 RPCSEC_GSS contexts have expired in the sr_status_flags field of all 3660 SEQUENCE replies.

3662 2.10.12.
Parallel NFS and Sessions 3664 A client and server can potentially be a non-pNFS implementation, a 3665 metadata server implementation, a data server implementation, or two 3666 or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, 3667 EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not 3668 mutually exclusive) are passed in the EXCHANGE_ID arguments and 3669 results to allow the client to indicate how it wants to use sessions 3670 created under the client ID, and to allow the server to indicate how 3671 it will allow the sessions to be used. See Section 13.1 for pNFS 3672 sessions considerations. 3674 3. Protocol Constants and Data Types 3676 The syntax and semantics to describe the data types of the NFSv4.1 3677 protocol are defined in the XDR RFC4506 [2] and RPC RFC1831 [3] 3678 documents. The next sections build upon the XDR data types to define 3679 constants, types and structures specific to this protocol. The full 3680 list of XDR data types is in [12]. 3682 3.1. Basic Constants 3684 const NFS4_FHSIZE = 128; 3685 const NFS4_VERIFIER_SIZE = 8; 3686 const NFS4_OPAQUE_LIMIT = 1024; 3687 const NFS4_SESSIONID_SIZE = 16; 3689 const NFS4_INT64_MAX = 0x7fffffffffffffff; 3690 const NFS4_UINT64_MAX = 0xffffffffffffffff; 3691 const NFS4_INT32_MAX = 0x7fffffff; 3692 const NFS4_UINT32_MAX = 0xffffffff; 3694 const NFS4_MAXFILELEN = 0xffffffffffffffff; 3695 const NFS4_MAXFILEOFF = 0xfffffffffffffffe; 3697 Except where noted, all these constants are defined in bytes. 3699 o NFS4_FHSIZE is the maximum size of a filehandle. 3701 o NFS4_VERIFIER_SIZE is the fixed size of a verifier. 3703 o NFS4_OPAQUE_LIMIT is the maximum size of certain opaque 3704 information. 3706 o NFS4_SESSIONID_SIZE is the fixed size of a session identifier. 3708 o NFS4_INT64_MAX is the maximum value of a signed 64 bit integer. 3710 o NFS4_UINT64_MAX is the maximum value of an unsigned 64 bit 3711 integer. 3713 o NFS4_INT32_MAX is the maximum value of a signed 32 bit integer. 3715 o NFS4_UINT32_MAX is the maximum value of an unsigned 32 bit 3716 integer. 3718 o NFS4_MAXFILELEN is the maximum length of a regular file. 3720 o NFS4_MAXFILEOFF is the maximum offset into a regular file. 3722 3.2. Basic Data Types 3724 These are the base NFSv4.1 data types. 3726 +---------------+---------------------------------------------------+ 3727 | Data Type | Definition | 3728 +---------------+---------------------------------------------------+ 3729 | int32_t | typedef int int32_t; | 3730 | uint32_t | typedef unsigned int uint32_t; | 3731 | int64_t | typedef hyper int64_t; | 3732 | uint64_t | typedef unsigned hyper uint64_t; | 3733 | attrlist4 | typedef opaque attrlist4<>; | 3734 | | Used for file/directory attributes. | 3735 | bitmap4 | typedef uint32_t bitmap4<>; | 3736 | | Used in attribute array encoding. | 3737 | changeid4 | typedef uint64_t changeid4; | 3738 | | Used in the definition of change_info4. | 3739 | clientid4 | typedef uint64_t clientid4; | 3740 | | Shorthand reference to client identification. | 3741 | count4 | typedef uint32_t count4; | 3742 | | Various count parameters (READ, WRITE, COMMIT). | 3743 | length4 | typedef uint64_t length4; | 3744 | | Describes LOCK lengths. | 3745 | mode4 | typedef uint32_t mode4; | 3746 | | Mode attribute data type. | 3747 | nfs_cookie4 | typedef uint64_t nfs_cookie4; | 3748 | | Opaque cookie value for READDIR. | 3749 | nfs_fh4 | typedef opaque nfs_fh4; | 3750 | | Filehandle definition. | 3751 | nfs_ftype4 | enum nfs_ftype4; | 3752 | | Various defined file types. 
| 3753 | nfsstat4 | enum nfsstat4; | 3754 | | Return value for operations. | 3755 | offset4 | typedef uint64_t offset4; | 3756 | | Various offset designations (READ, WRITE, LOCK, | 3757 | | COMMIT). | 3758 | qop4 | typedef uint32_t qop4; | 3759 | | Quality of protection designation in SECINFO. | 3760 | sec_oid4 | typedef opaque sec_oid4<>; | 3761 | | Security Object Identifier. The sec_oid4 data | 3762 | | type is not really opaque. Instead it contains an | 3763 | | ASN.1 OBJECT IDENTIFIER as used by GSS-API in the | 3764 | | mech_type argument to GSS_Init_sec_context. See | 3765 | | [7] for details. | 3766 | sequenceid4 | typedef uint32_t sequenceid4; | 3767 | | Sequence number used for various session | 3768 | | operations (EXCHANGE_ID, CREATE_SESSION, | 3769 | | SEQUENCE, CB_SEQUENCE). | 3770 | seqid4 | typedef uint32_t seqid4; | 3771 | | Sequence identifier used for file locking. | 3772 | sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; | 3773 | | Session identifier. | 3774 | slotid4 | typedef uint32_t slotid4; | 3775 | | Sequencing artifact for various session | 3776 | | operations (SEQUENCE, CB_SEQUENCE). | 3777 | utf8string | typedef opaque utf8string<>; | 3778 | | UTF-8 encoding for strings. | 3779 | utf8str_cis | typedef utf8string utf8str_cis; | 3780 | | Case-insensitive UTF-8 string. | 3781 | utf8str_cs | typedef utf8string utf8str_cs; | 3782 | | Case-sensitive UTF-8 string. | 3783 | utf8str_mixed | typedef utf8string utf8str_mixed; | 3784 | | UTF-8 strings with a case sensitive prefix and a | 3785 | | case insensitive suffix. | 3786 | component4 | typedef utf8str_cs component4; | 3787 | | Represents path name components. | 3788 | linktext4 | typedef utf8str_cs linktext4; | 3789 | | Symbolic link contents. | 3790 | pathname4 | typedef component4 pathname4<>; | 3791 | | Represents path name for fs_locations. | 3792 | verifier4 | typedef opaque verifier4[NFS4_VERIFIER_SIZE]; | 3793 | | Verifier used for various operations (COMMIT, | 3794 | | CREATE, EXCHANGE_ID, OPEN, READDIR, WRITE) | 3795 | | NFS4_VERIFIER_SIZE is defined as 8. | 3796 +---------------+---------------------------------------------------+ 3798 End of Base Data Types 3800 Table 1 3802 3.3. Structured Data Types 3804 3.3.1. nfstime4 3806 struct nfstime4 { 3807 int64_t seconds; 3808 uint32_t nseconds; 3809 }; 3811 The nfstime4 data type gives the number of seconds and nanoseconds 3812 since midnight or 0 hour January 1, 1970 Coordinated Universal Time 3813 (UTC). Values greater than zero for the seconds field denote dates 3814 after the 0 hour January 1, 1970. Values less than zero for the 3815 seconds field denote dates before the 0 hour January 1, 1970. In 3816 both cases, the nseconds field is to be added to the seconds field 3817 for the final time representation. For example, if the time to be 3818 represented is one-half second before 0 hour January 1, 1970, the 3819 seconds field would have a value of negative one (-1) and the 3820 nseconds fields would have a value of one-half second (500000000). 3821 Values greater than 999,999,999 for nseconds are invalid. 3823 This data type is used to pass time and date information. A server 3824 converts to and from its local representation of time when processing 3825 time values, preserving as much accuracy as possible. If the 3826 precision of timestamps stored for a file system object is less than 3827 defined, loss of precision can occur. An adjunct time maintenance 3828 protocol is RECOMMENDED to reduce client and server time skew. 3830 3.3.2. 
time_how4 3832 enum time_how4 { 3833 SET_TO_SERVER_TIME4 = 0, 3834 SET_TO_CLIENT_TIME4 = 1 3835 }; 3837 3.3.3. settime4 3839 union settime4 switch (time_how4 set_it) { 3840 case SET_TO_CLIENT_TIME4: 3841 nfstime4 time; 3842 default: 3843 void; 3844 }; 3846 The time_how4 and settime4 data types are used for setting timestamps 3847 in file object attributes. If set_it is SET_TO_SERVER_TIME4, then 3848 the server uses its local representation of time for the time value. 3850 3.3.4. specdata4 3852 struct specdata4 { 3853 uint32_t specdata1; /* major device number */ 3854 uint32_t specdata2; /* minor device number */ 3855 }; 3857 This data type represents the device numbers for the device file 3858 types NF4CHR and NF4BLK. 3860 3.3.5. fsid4 3862 struct fsid4 { 3863 uint64_t major; 3864 uint64_t minor; 3865 }; 3867 3.3.6. chg_policy4 3869 struct change_policy4 { 3870 uint64_t cp_major; 3871 uint64_t cp_minor; 3872 }; 3874 The chg_policy4 data type is used for the change_policy RECOMMENDED 3875 attribute. It provides change sequencing indication analogous to the 3876 change attribute. To enable the server to present a value valid 3877 across server re-initialization without requiring persistent storage, 3878 two 64-bit quantities are used, allowing one to be a server instance 3879 ID and the second to be incremented non-persistently, within a given 3880 server instance. 3882 3.3.7. fattr4 3884 struct fattr4 { 3885 bitmap4 attrmask; 3886 attrlist4 attr_vals; 3887 }; 3889 The fattr4 data type is used to represent file and directory 3890 attributes. 3892 The bitmap is a counted array of 32 bit integers used to contain bit 3893 values. The position of the integer in the array that contains bit n 3894 can be computed from the expression (n / 32) and its bit within that 3895 integer is (n mod 32). 3897 0 1 3898 +-----------+-----------+-----------+-- 3899 | count | 31 .. 0 | 63 .. 32 | 3900 +-----------+-----------+-----------+-- 3902 3.3.8. change_info4 3904 struct change_info4 { 3905 bool atomic; 3906 changeid4 before; 3907 changeid4 after; 3908 }; 3910 This data type is used with the CREATE, LINK, OPEN, REMOVE, and 3911 RENAME operations to let the client know the value of the change 3912 attribute for the directory in which the target file system object 3913 resides. 3915 3.3.9. netaddr4 3917 struct netaddr4 { 3918 /* see struct rpcb in RFC 1833 */ 3919 string na_r_netid<>; /* network id */ 3920 string na_r_addr<>; /* universal address */ 3921 }; 3923 The netaddr4 data type is used to identify TCP/IP based endpoints. 3924 The r_netid and r_addr fields are specified in RFC1833 [26], but they 3925 are underspecified in RFC1833 [26] as far as what they should look 3926 like for specific protocols. The next section clarifies this. 3928 3.3.9.1. Format of netaddr4 for TCP and UDP over IPv4 3930 For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the 3931 US-ASCII string: 3933 h1.h2.h3.h4.p1.p2 3935 The prefix, "h1.h2.h3.h4", is the standard textual form for 3936 representing an IPv4 address, which is always four bytes long. 3937 Assuming big-endian ordering, h1, h2, h3, and h4, are respectively, 3938 the first through fourth bytes each converted to ASCII-decimal. The 3939 suffix, "p1.p2", is a textual form for representing a TCP and UDP 3940 service port. Assuming big-endian ordering, p1 and p2 are, 3941 respectively, the first and second bytes each converted to ASCII- 3942 decimal. 
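This encoding can be sketched in a few lines of Python (the function name is illustrative, not a protocol element); the worked example that follows can be used to check it:

   import socket
   import struct

   def ipv4_universal_address(dotted_ip, port):
       # Format an IPv4 address and TCP/UDP port as the r_addr universal
       # address string "h1.h2.h3.h4.p1.p2".
       h1, h2, h3, h4 = socket.inet_aton(dotted_ip)   # the four address bytes
       p1, p2 = struct.pack('>H', port)               # the two port bytes, big-endian
       return '.'.join(str(b) for b in (h1, h2, h3, h4, p1, p2))

   # ipv4_universal_address('10.1.3.7', 527) returns '10.1.3.7.2.15'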
For example, if a host, in big-endian order, has an address 3943 of 0x0A010307 and there is a service listening on, in big-endian 3944 order, port 0x020F (decimal 527), then the complete universal address 3945 is "10.1.3.7.2.15".

3947 For TCP over IPv4 the value of r_netid is the string "tcp". For UDP 3948 over IPv4 the value of r_netid is the string "udp". That this 3949 document specifies the universal address and netid for UDP/IPv4 does 3950 not imply that UDP/IPv4 is a legal transport for NFSv4.1 (see 3951 Section 2.9).

3953 3.3.9.2. Format of netaddr4 for TCP and UDP over IPv6

3955 For TCP over IPv6 and for UDP over IPv6, the format of r_addr is the 3956 US-ASCII string:

3958    x1:x2:x3:x4:x5:x6:x7:x8.p1.p2

3960 The suffix "p1.p2" is the service port, and is computed the same way 3961 as with universal addresses for TCP and UDP over IPv4. The prefix, 3962 "x1:x2:x3:x4:x5:x6:x7:x8", is the preferred textual form for 3963 representing an IPv6 address as defined in Section 2.2 of RFC3513 3964 [13]. Additionally, the two alternative forms specified in Section 3965 2.2 of RFC3513 are also acceptable.

3967 For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP 3968 over IPv6 the value of r_netid is the string "udp6". That this 3969 document specifies the universal address and netid for UDP/IPv6 does 3970 not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see 3971 Section 2.9).

3973 3.3.10. state_owner4

3975 struct state_owner4 {
3976     clientid4 clientid;
3977     opaque    owner<NFS4_OPAQUE_LIMIT>;
3978 };

3980 typedef state_owner4 open_owner4;
3981 typedef state_owner4 lock_owner4;

3983 The state_owner4 data type is the base type for the open_owner4 3984 (Section 3.3.10.1) and lock_owner4 (Section 3.3.10.2) data types.

3986 3.3.10.1. open_owner4

3988 This data type is used to identify the owner of open state.

3990 3.3.10.2. lock_owner4

3992 This data type is used to identify the owner of byte-range locking 3993 state.

3995 3.3.11. open_to_lock_owner4

3997 struct open_to_lock_owner4 {
3998     seqid4      open_seqid;
3999     stateid4    open_stateid;
4000     seqid4      lock_seqid;
4001     lock_owner4 lock_owner;
4002 };

4004 This data type is used for the first LOCK operation done for an 4005 open_owner4. It provides both the open_stateid and lock_owner such 4006 that the transition is made from a valid open_stateid sequence to 4007 that of the new lock_stateid sequence. Using this mechanism avoids 4008 the confirmation of the lock_owner/lock_seqid pair since it is tied 4009 to established state in the form of the open_stateid/open_seqid.

4011 3.3.12. stateid4

4013 struct stateid4 {
4014     uint32_t seqid;
4015     opaque   other[12];
4016 };

4018 This data type is used for the various state sharing mechanisms 4019 between the client and server. The client never modifies a value of 4020 data type stateid4. The starting value of the seqid field is 4021 undefined. The server is required to increment the seqid field by 4022 one (1) at each transition of the stateid. This is important since 4023 the client will inspect the seqid in OPEN stateids to determine the 4024 order of OPEN processing done by the server.

4026 3.3.13. layouttype4

4028 enum layouttype4 {
4029     LAYOUT4_NFSV4_1_FILES = 0x1,
4030     LAYOUT4_OSD2_OBJECTS  = 0x2,
4031     LAYOUT4_BLOCK_VOLUME  = 0x3
4032 };

4034 This data type indicates what type of layout is being used. The file 4035 server advertises the layout types it supports through the 4036 fs_layout_type file system attribute (Section 5.12.1).
A client asks 4037 for layouts of a particular type in LAYOUTGET, and processes those 4038 layouts in its layout-type-specific logic.

4040 The layouttype4 data type is 32 bits in length. The range 4041 represented by the layout type is split into three parts. Type 0x0 4042 is reserved. Types within the range 0x00000001-0x7FFFFFFF are 4043 globally unique and are assigned according to the description in 4044 Section 22.4; they are maintained by IANA. Types within the range 4045 0x80000000-0xFFFFFFFF are site specific and for private use only.

4047 The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file 4048 layout type, as defined in Section 13, is to be used. The 4049 LAYOUT4_OSD2_OBJECTS enumeration specifies that the object layout, as 4050 defined in [30], is to be used. Similarly, the LAYOUT4_BLOCK_VOLUME 4051 enumeration specifies that the block/volume layout, as defined in 4052 [31], is to be used.

4054 3.3.14. deviceid4

4056 const NFS4_DEVICEID4_SIZE = 16;

4058 typedef opaque deviceid4[NFS4_DEVICEID4_SIZE];

4059 Layout information includes device IDs that specify a storage device 4060 through a compact handle. Addressing and type information is 4061 obtained with the GETDEVICEINFO operation. Device IDs are not 4062 guaranteed to be valid across metadata server restarts. A device ID 4063 is unique per client ID and layout type. See Section 12.2.10 for 4064 more details.

4066 3.3.15. device_addr4

4068 struct device_addr4 {
4069     layouttype4 da_layout_type;
4070     opaque      da_addr_body<>;
4071 };

4073 The device address is used to set up a communication channel with the 4074 storage device. Different layout types will require different data 4075 types to define how they communicate with storage devices. The 4076 opaque da_addr_body field must be interpreted based on the specified 4077 da_layout_type field.

4079 This document defines the device address for the NFSv4.1 file layout 4080 (see Section 13.3), which identifies a storage device by network IP 4081 address and port number. This is sufficient for the clients to 4082 communicate with the NFSv4.1 storage devices, and may be sufficient 4083 for other layout types as well. Device types for object storage 4084 devices and block storage devices (e.g., SCSI volume labels) will be 4085 defined by their respective layout specifications.

4087 3.3.16. layout_content4

4089 struct layout_content4 {
4090     layouttype4 loc_type;
4091     opaque      loc_body<>;
4092 };

4094 The loc_body field must be interpreted based on the layout type 4095 (loc_type). This document defines the loc_body for the NFSv4.1 file 4096 layout type; see Section 13.3 for its definition.

4098 3.3.17. layout4

4100 struct layout4 {
4101     offset4         lo_offset;
4102     length4         lo_length;
4103     layoutiomode4   lo_iomode;
4104     layout_content4 lo_content;
4105 };

4106 The layout4 data type defines a layout for a file. The layout type 4107 specific data is opaque within lo_content. Since layouts are sub- 4108 dividable, the offset and length together with the file's filehandle, 4109 the client ID, iomode, and layout type, identify the layout.

4111 3.3.18. layoutupdate4

4113 struct layoutupdate4 {
4114     layouttype4 lou_type;
4115     opaque      lou_body<>;
4116 };

4118 The layoutupdate4 data type is used by the client to return updated 4119 layout information to the metadata server via the LAYOUTCOMMIT 4120 (Section 18.42) operation. This data type provides a channel to pass 4121 layout type specific information (in field lou_body) back to the 4122 metadata server.
E.g., for the block/volume layout type this could 4123 include the list of reserved blocks that were written. The contents 4124 of the opaque lou_body argument are determined by the layout type. 4125 The NFSv4.1 file-based layout does not use this data type; if 4126 lou_type is LAYOUT4_NFSV4_1_FILES, the lou_body field MUST have a 4127 zero length. 4129 3.3.19. layouthint4 4131 struct layouthint4 { 4132 layouttype4 loh_type; 4133 opaque loh_body<>; 4134 }; 4136 The layouthint4 data type is used by the client to pass in a hint 4137 about the type of layout it would like created for a particular file. 4138 It is the data type specified by the layout_hint attribute described 4139 in Section 5.12.4. The metadata server may ignore the hint, or may 4140 selectively ignore fields within the hint. This hint should be 4141 provided at create time as part of the initial attributes within 4142 OPEN. The loh_body field is specific to the type of layout 4143 (loh_type). The NFSv4.1 file-based layout uses the 4144 nfsv4_1_file_layouthint4 data type as defined in Section 13.3. 4146 3.3.20. layoutiomode4 4148 enum layoutiomode4 { 4149 LAYOUTIOMODE4_READ = 1, 4150 LAYOUTIOMODE4_RW = 2, 4151 LAYOUTIOMODE4_ANY = 3 4152 }; 4153 The iomode specifies whether the client intends to just read or both 4154 read and write the data represented by the layout. While the 4155 LAYOUTIOMODE4_ANY iomode MUST NOT be used in the arguments to the 4156 LAYOUTGET operation, it MAY be used in the arguments to the 4157 LAYOUTRETURN and CB_LAYOUTRECALL operations. The LAYOUTIOMODE4_ANY 4158 iomode specifies that layouts pertaining to both LAYOUTIOMODE4_READ 4159 and LAYOUTIOMODE4_RW iomodes are being returned or recalled, 4160 respectively. The metadata server's use of the iomode may depend on 4161 the layout type being used. The storage devices MAY validate I/O 4162 accesses against the iomode and reject invalid accesses. 4164 3.3.21. nfs_impl_id4 4166 struct nfs_impl_id4 { 4167 utf8str_cis nii_domain; 4168 utf8str_cs nii_name; 4169 nfstime4 nii_date; 4170 }; 4172 This data type is used to identify client and server implementation 4173 details. The nii_domain field is the DNS domain name that the 4174 implementer is associated with. The nii_name field is the product 4175 name of the implementation and is completely free form. It is 4176 RECOMMENDED that the nii_name be used to distinguish machine 4177 architecture, machine platforms, revisions, versions, and patch 4178 levels. The nii_date field is the timestamp of when the software 4179 instance was published or built. 4181 3.3.22. threshold_item4 4183 struct threshold_item4 { 4184 layouttype4 thi_layout_type; 4185 bitmap4 thi_hintset; 4186 opaque thi_hintlist<>; 4187 }; 4189 This data type contains a list of hints specific to a layout type for 4190 helping the client determine when it should send I/O directly through 4191 the metadata server versus the storage devices. The data type 4192 consists of the layout type (thi_layout_type), a bitmap (thi_hintset) 4193 describing the set of hints supported by the server (they may differ 4194 based on the layout type), and a list of hints (thi_hintlist), whose 4195 content is determined by the hintset bitmap. See the mdsthreshold 4196 attribute for more details. 
4198 The thi_hintset field is a bitmap of the following values: 4200 +-------------------------+---+---------+---------------------------+ 4201 | name | # | Data | Description | 4202 | | | Type | | 4203 +-------------------------+---+---------+---------------------------+ 4204 | threshold4_read_size | 0 | length4 | The file size below which | 4205 | | | | it is RECOMMENDED to read | 4206 | | | | data through the MDS. | 4207 | threshold4_write_size | 1 | length4 | The file size below which | 4208 | | | | it is RECOMMENDED to | 4209 | | | | write data through the | 4210 | | | | MDS. | 4211 | threshold4_read_iosize | 2 | length4 | For read I/O sizes below | 4212 | | | | this threshold it is | 4213 | | | | RECOMMENDED to read data | 4214 | | | | through the MDS | 4215 | threshold4_write_iosize | 3 | length4 | For write I/O sizes below | 4216 | | | | this threshold it is | 4217 | | | | RECOMMENDED to write data | 4218 | | | | through the MDS | 4219 +-------------------------+---+---------+---------------------------+ 4221 3.3.23. mdsthreshold4 4223 struct mdsthreshold4 { 4224 threshold_item4 mth_hints<>; 4225 }; 4227 This data type holds an array of elements of data type 4228 threshold_item4, each of which is valid for a particular layout type. 4229 An array is necessary because a server can support multiple layout 4230 types for a single file. 4232 4. Filehandles 4234 The filehandle in the NFS protocol is a per server unique identifier 4235 for a file system object. The contents of the filehandle are opaque 4236 to the client. Therefore, the server is responsible for translating 4237 the filehandle to an internal representation of the file system 4238 object. 4240 4.1. Obtaining the First Filehandle 4242 The operations of the NFS protocol are defined in terms of one or 4243 more filehandles. Therefore, the client needs a filehandle to 4244 initiate communication with the server. With the NFSv3 protocol 4245 RFC1813 [22], there exists an ancillary protocol to obtain this first 4246 filehandle. The MOUNT protocol, RPC program number 100005, provides 4247 the mechanism of translating a string based file system path name to 4248 a filehandle which can then be used by the NFS protocols. 4250 The MOUNT protocol has deficiencies in the area of security and use 4251 via firewalls. This is one reason that the use of the public 4252 filehandle was introduced in RFC2054 [32] and RFC2055 [33]. With the 4253 use of the public filehandle in combination with the LOOKUP operation 4254 in the NFSv3 protocol, it has been demonstrated that the MOUNT 4255 protocol is unnecessary for viable interaction between NFS client and 4256 server. 4258 Therefore, the NFSv4.1 protocol will not use an ancillary protocol 4259 for translation from string based path names to a filehandle. Two 4260 special filehandles will be used as starting points for the NFS 4261 client. 4263 4.1.1. Root Filehandle 4265 The first of the special filehandles is the ROOT filehandle. The 4266 ROOT filehandle is the "conceptual" root of the file system name 4267 space at the NFS server. The client uses or starts with the ROOT 4268 filehandle by employing the PUTROOTFH operation. The PUTROOTFH 4269 operation instructs the server to set the "current" filehandle to the 4270 ROOT of the server's file tree. Once this PUTROOTFH operation is 4271 used, the client can then traverse the entirety of the server's file 4272 tree with the LOOKUP operation. A complete discussion of the server 4273 name space is in the Section 7. 4275 4.1.2. 
Public Filehandle 4277 The second special filehandle is the PUBLIC filehandle. Unlike the 4278 ROOT filehandle, the PUBLIC filehandle may be bound or represent an 4279 arbitrary file system object at the server. The server is 4280 responsible for this binding. It may be that the PUBLIC filehandle 4281 and the ROOT filehandle refer to the same file system object. 4282 However, it is up to the administrative software at the server and 4283 the policies of the server administrator to define the binding of the 4284 PUBLIC filehandle and server file system object. The client may not 4285 make any assumptions about this binding. The client uses the PUBLIC 4286 filehandle via the PUTPUBFH operation. 4288 4.2. Filehandle Types 4290 In the NFSv3 protocol, there was one type of filehandle with a single 4291 set of semantics. This type of filehandle is termed "persistent" in 4292 NFSv4.1. The semantics of a persistent filehandle remain the same as 4293 before. A new type of filehandle introduced in NFSv4.1 is the 4294 "volatile" filehandle, which attempts to accommodate certain server 4295 environments. 4297 The volatile filehandle type was introduced to address server 4298 functionality or implementation issues which make correct 4299 implementation of a persistent filehandle infeasible. Some server 4300 environments do not provide a file system level invariant that can be 4301 used to construct a persistent filehandle. The underlying server 4302 file system may not provide the invariant or the server's file system 4303 programming interfaces may not provide access to the needed 4304 invariant. Volatile filehandles may ease the implementation of 4305 server functionality such as hierarchical storage management or file 4306 system reorganization or migration. However, the volatile filehandle 4307 increases the implementation burden for the client. 4309 Since the client will need to handle persistent and volatile 4310 filehandles differently, a file attribute is defined which may be 4311 used by the client to determine the filehandle types being returned 4312 by the server. 4314 4.2.1. General Properties of a Filehandle 4316 The filehandle contains all the information the server needs to 4317 distinguish an individual file. To the client, the filehandle is 4318 opaque. The client stores filehandles for use in a later request and 4319 can compare two filehandles from the same server for equality by 4320 doing a byte-by-byte comparison. However, the client MUST NOT 4321 otherwise interpret the contents of filehandles. If two filehandles 4322 from the same server are equal, they MUST refer to the same file. 4323 Servers SHOULD try to maintain a one-to-one correspondence between 4324 filehandles and files but this is not required. Clients MUST use 4325 filehandle comparisons only to improve performance, not for correct 4326 behavior. All clients need to be prepared for situations in which it 4327 cannot be determined whether two filehandles denote the same object 4328 and in such cases, avoid making invalid assumptions which might cause 4329 incorrect behavior. Further discussion of filehandle and attribute 4330 comparison in the context of data caching is presented in the 4331 Section 10.3.4. 4333 As an example, in the case that two different path names when 4334 traversed at the server terminate at the same file system object, the 4335 server SHOULD return the same filehandle for each path. 
This can 4336 occur if a hard link is used to create two file names which refer to 4337 the same underlying file object and associated data. For example, if 4338 paths /a/b/c and /a/d/c refer to the same file, the server SHOULD 4339 return the same filehandle for both path name traversals.

4341 4.2.2. Persistent Filehandle

4343 A persistent filehandle is defined as having a fixed value for the 4344 lifetime of the file system object to which it refers. Once the 4345 server creates the filehandle for a file system object, the server 4346 MUST accept the same filehandle for the object for the lifetime of 4347 the object. If the server restarts, the NFS server must honor the 4348 same filehandle value as it did in the server's previous 4349 instantiation. Similarly, if the file system is migrated, the new 4350 NFS server must honor the same filehandle as the old NFS server.

4352 The persistent filehandle will become stale or invalid when the 4353 file system object is removed. When the server is presented with a 4354 persistent filehandle that refers to a deleted object, it MUST return 4355 an error of NFS4ERR_STALE. A filehandle may become stale when the 4356 file system containing the object is no longer available. The file 4357 system may become unavailable if it exists on removable media and the 4358 media is no longer available at the server, or the file system in 4359 whole has been destroyed, or the file system has simply been removed 4360 from the server's name space (i.e. unmounted in a UNIX environment).

4362 4.2.3. Volatile Filehandle

4364 A volatile filehandle does not share the same longevity 4365 characteristics of a persistent filehandle. The server may determine 4366 that a volatile filehandle is no longer valid at many different 4367 points in time. If the server can definitively determine that a 4368 volatile filehandle refers to an object that has been removed, the 4369 server should return NFS4ERR_STALE to the client (as is the case for 4370 persistent filehandles). In all other cases where the server 4371 determines that a volatile filehandle can no longer be used, it 4372 should return an error of NFS4ERR_FHEXPIRED.

4374 The REQUIRED attribute "fh_expire_type" is used by the client to 4375 determine what type of filehandle the server is providing for a 4376 particular file system. This attribute is a bitmask with the 4377 following values:

4379 FH4_PERSISTENT The value of FH4_PERSISTENT is used to indicate a 4380 persistent filehandle, which is valid until the object is removed 4381 from the file system. The server will not return 4382 NFS4ERR_FHEXPIRED for this filehandle. FH4_PERSISTENT is defined 4383 as a value in which none of the bits specified below are set.

4385 FH4_VOLATILE_ANY The filehandle may expire at any time, except as 4386 specifically excluded (i.e. FH4_NOEXPIRE_WITH_OPEN).

4388 FH4_NOEXPIRE_WITH_OPEN May only be set when FH4_VOLATILE_ANY is set. 4389 If this bit is set, then the meaning of FH4_VOLATILE_ANY is 4390 qualified to exclude any expiration of the filehandle when it is 4391 open.

4393 FH4_VOL_MIGRATION The filehandle will expire as a result of a file 4394 system transition (migration or replication), in those cases in 4395 which the continuity of filehandle use is not specified by 4396 _handle_ class information within the fs_locations_info attribute. 4397 When this bit is set, clients without access to fs_locations_info 4398 information should assume filehandles will expire on file system 4399 transitions.
4401 FH4_VOL_RENAME The filehandle will expire during rename. This 4402 includes a rename by the requesting client or a rename by any 4403 other client. If FH4_VOLATILE_ANY is set, FH4_VOL_RENAME is redundant.

4413 Servers that provide volatile filehandles that may expire while open 4414 require special care as regards the handling of RENAMEs and REMOVEs. 4415 This situation can arise if FH4_VOL_MIGRATION or FH4_VOL_RENAME is 4416 set, if FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN not set, 4417 or if a non-readonly file system has a transition target in a 4418 different _handle_ class. In these cases, the server should deny a 4419 RENAME or REMOVE that would affect an OPEN file of any of the 4420 components leading to the OPEN file. In addition, the server should 4421 deny all RENAME or REMOVE requests during the grace period upon server restart, in order 4422 to make sure that reclaims of files where filehandles may have 4423 expired do not do a reclaim for the wrong file.

4425 Volatile filehandles are especially suitable for implementation of 4426 the pseudo file systems used to bridge exports. See Section 7.5 for 4427 a discussion of this.

4429 4.3. One Method of Constructing a Volatile Filehandle

4431 A volatile filehandle, while opaque to the client, could contain:

4433 [volatile bit = 1 | server boot time | slot | generation number]

4434 o slot is an index in the server volatile filehandle table

4436 o generation number is the generation number for the table entry/ 4437 slot

4439 When the client presents a volatile filehandle, the server makes the 4440 following checks, which assume that the check for the volatile bit 4441 has passed. If the server boot time is less than the current server 4442 boot time, return NFS4ERR_FHEXPIRED. If slot is out of range, return 4443 NFS4ERR_BADHANDLE. If the generation number does not match, return 4444 NFS4ERR_FHEXPIRED.

4446 When the server restarts, the table is gone (it is volatile).

4448 If the volatile bit is 0, then it is a persistent filehandle with a 4449 different structure following it.

4451 4.4. Client Recovery from Filehandle Expiration

4453 If possible, the client SHOULD recover from the receipt of an 4454 NFS4ERR_FHEXPIRED error. The client must take on additional 4455 responsibility so that it may prepare itself to recover from the 4456 expiration of a volatile filehandle. If the server returns 4457 persistent filehandles, the client does not need these additional 4458 steps.

4460 For volatile filehandles, most commonly the client will need to store 4461 the component names leading up to and including the file system 4462 object in question. With these names, the client should be able to 4463 recover by finding a filehandle in the name space that is still 4464 available or by starting at the root of the server's file system name 4465 space.

4467 If the expired filehandle refers to an object that has been removed 4468 from the file system, obviously the client will not be able to 4469 recover from the expired filehandle.

4471 It is also possible that the expired filehandle refers to a file that 4472 has been renamed.
If the file was renamed by another client, again 4473 it is possible that the original client will not be able to recover. 4474 However, in the case that the client itself is renaming the file and 4475 the file is open, it is possible that the client may be able to 4476 recover. The client can determine the new path name based on the 4477 processing of the rename request. The client can then regenerate the 4478 new filehandle based on the new path name. The client could also use 4479 the compound operation mechanism to construct a set of operations 4480 like:

4482    RENAME A B
4483    LOOKUP B
4484    GETFH

4486 Note that the COMPOUND procedure does not provide atomicity. This 4487 example only reduces the overhead of recovering from an expired 4488 filehandle.

4490 5. File Attributes

4492 To meet the requirements of extensibility and increased 4493 interoperability with non-UNIX platforms, attributes must be handled 4494 in a flexible manner. The NFSv3 fattr3 structure contains a fixed 4495 list of attributes that not all clients and servers are able to 4496 support or care about. The fattr3 structure cannot be extended as 4497 new needs arise, and it provides no way to indicate non-support. With 4498 the NFSv4.1 protocol, the client is able to query what attributes the 4499 server supports and construct requests with only those supported 4500 attributes (or a subset thereof).

4502 To this end, attributes are divided into three groups: REQUIRED, 4503 RECOMMENDED, and named. Both REQUIRED and RECOMMENDED attributes are 4504 supported in the NFSv4.1 protocol by a specific and well-defined 4505 encoding and are identified by number. They are requested by setting 4506 a bit in the bit vector sent in the GETATTR request; the server 4507 response includes a bit vector to list what attributes were returned 4508 in the response. New REQUIRED or RECOMMENDED attributes may be added 4509 to the NFSv4 protocol as part of a new minor version by publishing a 4510 standards-track RFC which allocates a new attribute number value and 4511 defines the encoding for the attribute. See Section 2.7 for further 4512 discussion.

4514 Named attributes are accessed by the new OPENATTR operation, which 4515 accesses a hidden directory of attributes associated with a file 4516 system object. OPENATTR takes a filehandle for the object and 4517 returns the filehandle for the attribute hierarchy. The filehandle 4518 for the named attributes is a directory object accessible by LOOKUP 4519 or READDIR and contains files whose names represent the named 4520 attributes and whose data bytes are the value of the attribute. For 4521 example:

4523 +----------+-----------+---------------------------------+
4524 | LOOKUP   | "foo"     | ; look up file                  |
4525 | GETATTR  | attrbits  |                                 |
4526 | OPENATTR |           | ; access foo's named attributes |
4527 | LOOKUP   | "x11icon" | ; look up specific attribute    |
4528 | READ     | 0,4096    | ; read stream of bytes          |
4529 +----------+-----------+---------------------------------+

4531 Named attributes are intended for data needed by applications rather 4532 than by an NFS client implementation. NFS implementors are strongly 4533 encouraged to define their new attributes as RECOMMENDED attributes 4534 by bringing them to the IETF standards-track process.

4536 The set of attributes which are classified as REQUIRED is 4537 deliberately small since servers must do whatever it takes to support 4538 them.
A server should support as many of the RECOMMENDED attributes as
possible but, by their definition, the server is not required to
support all of them.  Attributes are deemed REQUIRED if the data is
both needed by a large number of clients and is not otherwise
reasonably computable by the client when support is not provided on
the server.

Note that the hidden directory returned by OPENATTR is a convenience
for protocol processing.  The client should not make any assumptions
about the server's implementation of named attributes and whether the
underlying file system at the server has a named attribute directory
or not.  Therefore, operations such as SETATTR and GETATTR on the
named attribute directory are undefined.

5.1.  REQUIRED Attributes

These MUST be supported by every NFSv4.1 client and server in order
to ensure a minimum level of interoperability.  The server MUST store
and return these attributes, and the client MUST be able to function
with an attribute set limited to these attributes.  With just the
REQUIRED attributes some client functionality may be impaired or
limited in some ways.  A client may ask for any of these attributes
to be returned by setting a bit in the GETATTR request, and the
server must return their value.

5.2.  RECOMMENDED Attributes

These attributes are understood well enough to warrant support in the
NFSv4.1 protocol.  However, they may not be supported on all clients
and servers.  A client may ask for any of these attributes to be
returned by setting a bit in the GETATTR request but must handle the
case where the server does not return them.  A client may ask for the
set of attributes the server supports and SHOULD NOT request
attributes the server does not support.  A server should be tolerant
of requests for unsupported attributes and simply not return them
rather than considering the request an error.  It is expected that
servers will support all attributes they comfortably can and only
fail to support attributes which are difficult to support in their
operating environments.  A server should provide attributes whenever
it does not have to "tell lies" to the client.  For example, a file
modification time should be either an accurate time or should not be
supported by the server.  This will not always be comfortable to
clients, but the client is better positioned to decide whether and
how to fabricate or construct an attribute or whether to do without
the attribute.

5.3.  Named Attributes

These attributes are not supported by direct encoding in the NFSv4
protocol but are accessed by string names rather than numbers and
correspond to an uninterpreted stream of bytes which are stored with
the file system object.  The name space for these attributes may be
accessed by using the OPENATTR operation.  The OPENATTR operation
returns a filehandle for a virtual "named attribute directory", and
further perusal and modification of the name space may be done using
operations that work on more typical directories.  In particular,
READDIR may be used to get a list of such named attributes, and
LOOKUP and OPEN may select a particular attribute.  Creation of a new
named attribute may be the result of an OPEN specifying file
creation.
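For example (an illustrative sketch only; the attribute name
"backup_info" is hypothetical), a client wishing to create a named
attribute and set its value could send a sequence along these lines:

   +----------+------------------+--------------------------------+
   | LOOKUP   | "foo"            | ; look up file                 |
   | OPENATTR | createdir = TRUE | ; create/open attribute dir    |
   | OPEN     | "backup_info"    | ; create the named attribute   |
   | WRITE    | 0,4096           | ; write the attribute's value  |
   +----------+------------------+--------------------------------+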
4598 Once an OPEN is done, named attributes may be examined and changed by 4599 normal READ and WRITE operations using the filehandles and stateids 4600 returned by OPEN. 4602 Named attributes and the named attribute directory may have their own 4603 (non-named) attributes. Each of objects must have all of the 4604 REQUIRED attributes and may have additional RECOMMENDED attributes. 4605 However, the set of attributes for named attributes and the named 4606 attribute directory need not be as large as, and typically will not 4607 be as large as that for other objects in that file system. 4609 Named attributes and the named attribute directory may be the target 4610 of delegations (in the case of the named attribute directory these 4611 will be directory delegations). However, since granting of 4612 delegations or not is within the server's discretion, a server need 4613 not support delegations on named attributes or the named attribute 4614 directory. 4616 It is RECOMMENDED that servers support arbitrary named attributes. A 4617 client should not depend on the ability to store any named attributes 4618 in the server's file system. If a server does support named 4619 attributes, a client which is also able to handle them should be able 4620 to copy a file's data and metadata with complete transparency from 4621 one location to another; this would imply that names allowed for 4622 regular directory entries are valid for named attribute names as 4623 well. 4625 In NFSv4.1, the structure of named attribute directories is 4626 restricted in a number of ways, in order to prevent the development 4627 of non-interoperable implementations in which some servers support a 4628 fully general hierarchical directory structure for named attributes 4629 while others support a limited set, but fully adequate to the 4630 feature's goals. In such an environment, clients or applications 4631 might come to depend on non-portable extensions. The restrictions 4632 are: 4634 o CREATE is not allowed in a named attribute directory. Thus, such 4635 objects as symbolic links and special files are not allowed to be 4636 named attributes. Further, directories may not be created in a 4637 named attribute directory so no hierarchical structure of named 4638 attributes for a single object is allowed. 4640 o If OPENATTR is done on a named attribute directory or on a named 4641 attribute, the server MUST return NFS4ERR_WRONG_TYPE. 4643 o Doing a RENAME of a named attribute to a different named attribute 4644 directory or to an ordinary (i.e. non-named-attribute) directory 4645 is not allowed. 4647 o Creating hard links between named attribute directories or between 4648 named attribute directories and ordinary directories is not 4649 allowed. 4651 Names of attributes will not be controlled by this document or other 4652 IETF standards track documents. See Section 22.1 for further 4653 discussion. 4655 5.4. Classification of Attributes 4657 Each of the REQUIRED and RECOMMENDED attributes can be classified in 4658 one of three categories: per server, per file system, or per file 4659 system object. Note that it is possible that some per file system 4660 attributes may vary within the file system. See the "homogeneous" 4661 attribute for its definition. Note that the attributes 4662 time_access_set and time_modify_set are not listed in this section 4663 because they are write-only attributes corresponding to time_access 4664 and time_modify, and are used in a special instance of SETATTR. 
o  The per server attribute is:

      lease_time

o  The per file system attributes are:

      supported_attrs, suppattr_exclcreat, fh_expire_type,
      link_support, symlink_support, unique_handles, aclsupport,
      cansettime, case_insensitive, case_preserving,
      chown_restricted, files_avail, files_free, files_total,
      fs_locations, homogeneous, maxfilesize, maxname, maxread,
      maxwrite, no_trunc, space_avail, space_free, space_total,
      time_delta, change_policy, fs_status, fs_layout_type,
      fs_locations_info, fs_charset_cap

o  The per file system object attributes are:

      type, change, size, named_attr, fsid, rdattr_error, filehandle,
      acl, archive, fileid, hidden, maxlink, mimetype, mode,
      numlinks, owner, owner_group, rawdev, space_used, system,
      time_access, time_backup, time_create, time_metadata,
      time_modify, mounted_on_fileid, dir_notif_delay,
      dirent_notif_delay, dacl, sacl, layout_type, layout_hint,
      layout_blksize, layout_alignment, mdsthreshold, retention_get,
      retention_set, retentevt_get, retentevt_set, retention_hold,
      mode_set_masked

For quota_avail_hard, quota_avail_soft, and quota_used, see their
definitions below for the appropriate classification.

5.5.  Set-Only and Get-Only Attributes

Some REQUIRED and RECOMMENDED attributes are set-only, i.e. they can
be set via SETATTR but not retrieved via GETATTR.  Similarly, some
REQUIRED and RECOMMENDED attributes are get-only, i.e. they can be
retrieved via GETATTR but not set via SETATTR.  If a client attempts
to set a get-only attribute or get a set-only attribute, the server
MUST return NFS4ERR_INVAL.

5.6.  REQUIRED Attributes - List and Definition References

The list of REQUIRED attributes appears in Table 4.  The meanings of
the columns of the table are:

o  Name: the name of the attribute.

o  Id: the number assigned to the attribute.  In the event of
   conflicts between the assigned number and [12], the latter is
   authoritative.

o  Data Type: The XDR data type of the attribute.

o  Acc: Access allowed to the attribute.  R means read-only (GETATTR
   may retrieve, SETATTR may not set).  W means write-only (SETATTR
   may set, GETATTR may not retrieve).  R W means read/write (GETATTR
   may retrieve, SETATTR may set).

o  Defined in: the section of this specification that describes the
   attribute.

   +--------------------+----+------------+-----+------------------+
   | Name               | Id | Data Type  | Acc | Defined in:      |
   +--------------------+----+------------+-----+------------------+
   | supported_attrs    | 0  | bitmap4    | R   | Section 5.8.1.1  |
   | type               | 1  | nfs_ftype4 | R   | Section 5.8.1.2  |
   | fh_expire_type     | 2  | uint32_t   | R   | Section 5.8.1.3  |
   | change             | 3  | uint64_t   | R   | Section 5.8.1.4  |
   | size               | 4  | uint64_t   | R W | Section 5.8.1.5  |
   | link_support       | 5  | bool       | R   | Section 5.8.1.6  |
   | symlink_support    | 6  | bool       | R   | Section 5.8.1.7  |
   | named_attr         | 7  | bool       | R   | Section 5.8.1.8  |
   | fsid               | 8  | fsid4      | R   | Section 5.8.1.9  |
   | unique_handles     | 9  | bool       | R   | Section 5.8.1.10 |
   | lease_time         | 10 | nfs_lease4 | R   | Section 5.8.1.11 |
   | rdattr_error       | 11 | enum       | R   | Section 5.8.1.12 |
   | filehandle         | 19 | nfs_fh4    | R   | Section 5.8.1.13 |
   | suppattr_exclcreat | 75 | bitmap4    | R   | Section 5.8.1.14 |
   +--------------------+----+------------+-----+------------------+

                                Table 4

5.7.
RECOMMENDED Attributes - List and Definition References 4749 The RECOMMENDED attributes are defined in Table 5. The meanings of 4750 the column headers are the same as Table 4; see Section 5.6 for the 4751 meanings. 4753 +--------------------+----+----------------+-----+------------------+ 4754 | Name | Id | Data Type | Acc | Defined in: | 4755 +--------------------+----+----------------+-----+------------------+ 4756 | acl | 12 | nfsace4<> | R W | Section 6.2.1 | 4757 | aclsupport | 13 | uint32_t | R | Section 6.2.1.2 | 4758 | archive | 14 | bool | R W | Section 5.8.2.1 | 4759 | cansettime | 15 | bool | R | Section 5.8.2.2 | 4760 | case_insensitive | 16 | bool | R | Section 5.8.2.3 | 4761 | case_preserving | 17 | bool | R | Section 5.8.2.4 | 4762 | change_policy | 60 | chg_policy4 | R | Section 5.8.2.5 | 4763 | chown_restricted | 18 | bool | R | Section 5.8.2.6 | 4764 | dacl | 58 | nfsacl41 | R W | Section 6.2.2 | 4765 | dir_notif_delay | 56 | nfstime4 | R | Section 5.11.1 | 4766 | dirent_notif_delay | 57 | nfstime4 | R | Section 5.11.2 | 4767 | fileid | 20 | uint64_t | R | Section 5.8.2.7 | 4768 | files_avail | 21 | uint64_t | R | Section 5.8.2.8 | 4769 | files_free | 22 | uint64_t | R | Section 5.8.2.9 | 4770 | files_total | 23 | uint64_t | R | Section 5.8.2.10 | 4771 | fs_charset_cap | 76 | uint32_t | R | Section 5.8.2.11 | 4772 | fs_layout_type | 62 | layouttype4<> | R | Section 5.12.1 | 4773 | fs_locations | 24 | fs_locations | R | Section 5.8.2.12 | 4774 | fs_locations_info | 67 | * | R | Section 5.8.2.13 | 4775 | fs_status | 61 | fs4_status | R | Section 5.8.2.14 | 4776 | hidden | 25 | bool | R W | Section 5.8.2.15 | 4777 | homogeneous | 26 | bool | R | Section 5.8.2.16 | 4778 | layout_alignment | 66 | uint32_t | R | Section 5.12.2 | 4779 | layout_blksize | 65 | uint32_t | R | Section 5.12.3 | 4780 | layout_hint | 63 | layouthint4 | W | Section 5.12.4 | 4781 | layout_type | 64 | layouttype4<> | R | Section 5.12.5 | 4782 | maxfilesize | 27 | uint64_t | R | Section 5.8.2.17 | 4783 | maxlink | 28 | uint32_t | R | Section 5.8.2.18 | 4784 | maxname | 29 | uint32_t | R | Section 5.8.2.19 | 4785 | maxread | 30 | uint64_t | R | Section 5.8.2.20 | 4786 | maxwrite | 31 | uint64_t | R | Section 5.8.2.21 | 4787 | mdsthreshold | 68 | mdsthreshold4 | R | Section 5.12.6 | 4788 | mimetype | 32 | utf8<> | R W | Section 5.8.2.22 | 4789 | mode | 33 | mode4 | R W | Section 6.2.4 | 4790 | mode_set_masked | 74 | mode_masked4 | W | Section 6.2.5 | 4791 | mounted_on_fileid | 55 | uint64_t | R | Section 5.8.2.23 | 4792 | no_trunc | 34 | bool | R | Section 5.8.2.24 | 4793 | numlinks | 35 | uint32_t | R | Section 5.8.2.25 | 4794 | owner | 36 | utf8<> | R W | Section 5.8.2.26 | 4795 | owner_group | 37 | utf8<> | R W | Section 5.8.2.27 | 4796 | quota_avail_hard | 38 | uint64_t | R | Section 5.8.2.28 | 4797 | quota_avail_soft | 39 | uint64_t | R | Section 5.8.2.29 | 4798 | quota_used | 40 | uint64_t | R | Section 5.8.2.30 | 4799 | rawdev | 41 | specdata4 | R | Section 5.8.2.31 | 4800 | retentevt_get | 71 | retention_get4 | R | Section 5.13.3 | 4801 | retentevt_set | 72 | retention_set4 | W | Section 5.13.4 | 4802 | retention_get | 69 | retention_get4 | R | Section 5.13.1 | 4803 | retention_hold | 73 | uint64_t | R W | Section 5.13.5 | 4804 | retention_set | 70 | retention_set4 | W | Section 5.13.2 | 4805 | sacl | 59 | nfsacl41 | R W | Section 6.2.3 | 4806 | space_avail | 42 | uint64_t | R | Section 5.8.2.32 | 4807 | space_free | 43 | uint64_t | R | Section 5.8.2.33 | 4808 | space_total | 44 | uint64_t | R | 
Section 5.8.2.34 | 4809 | space_used | 45 | uint64_t | R | Section 5.8.2.35 | 4810 | system | 46 | bool | R W | Section 5.8.2.36 | 4811 | time_access | 47 | nfstime4 | R | Section 5.8.2.37 | 4812 | time_access_set | 48 | settime4 | W | Section 5.8.2.38 | 4813 | time_backup | 49 | nfstime4 | R W | Section 5.8.2.39 | 4814 | time_create | 50 | nfstime4 | R W | Section 5.8.2.40 | 4815 | time_delta | 51 | nfstime4 | R | Section 5.8.2.41 | 4816 | time_metadata | 52 | nfstime4 | R | Section 5.8.2.42 | 4817 | time_modify | 53 | nfstime4 | R | Section 5.8.2.43 | 4818 | time_modify_set | 54 | settime4 | W | Section 5.8.2.44 | 4819 +--------------------+----+----------------+-----+------------------+ 4821 Table 5 4823 * fs_locations_info4 4825 5.8. Attribute Definitions 4827 5.8.1. Definitions of REQUIRED Attributes 4829 5.8.1.1. Attribute 0: supported_attrs 4831 The bit vector which would retrieve all REQUIRED and RECOMMENDED 4832 attributes that are supported for this object. The scope of this 4833 attribute applies to all objects with a matching fsid. 4835 5.8.1.2. Attribute 1: type 4837 Designates the type of an object in terms of one of a number of 4838 special constants: 4840 o NF4REG designates a regular file. 4842 o NF4DIR designates a directory. 4844 o NF4BLK designates a block device special file. 4846 o NF4CHR designates a character device special file. 4848 o NF4LNK designates a symbolic link. 4850 o NF4SOCK designates a named socket special file. 4852 o NF4FIFO designates a fifo special file. 4854 o NF4ATTRDIR designates a named attribute directory. 4856 o NF4NAMEDATTR designates a named attribute. 4858 Within the explanatory text and operation descriptions, the following 4859 phrases will be used with the meanings given below: 4861 o The phrase "is a directory" means that the object is of type 4862 NF4DIR or of type NF4ATTRDIR. 4864 o The phrase "is a special file" means that the object is of one of 4865 the types NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO. 4867 o The phrase "is an ordinary file" means that the object is of type 4868 NF4REG or of type NF4NAMEDATTR. 4870 5.8.1.3. Attribute 2: fh_expire_type 4872 Server uses this to specify filehandle expiration behavior to the 4873 client. See Section 4 for additional description. 4875 5.8.1.4. Attribute 3: change 4877 A value created by the server that the client can use to determine if 4878 file data, directory contents or attributes of the object have been 4879 modified. The server may return the object's time_metadata attribute 4880 for this attribute's value but only if the file system object can not 4881 be updated more frequently than the resolution of time_metadata. 4883 5.8.1.5. Attribute 4: size 4885 The size of the object in bytes. 4887 5.8.1.6. Attribute 5: link_support 4889 True, if the object's file system supports hard links. 4891 5.8.1.7. Attribute 6: symlink_support 4893 True, if the object's file system supports symbolic links. 4895 5.8.1.8. Attribute 7: named_attr 4897 True, if this object has named attributes. In other words, object 4898 has a non-empty named attribute directory. 4900 5.8.1.9. Attribute 8: fsid 4902 Unique file system identifier for the file system holding this 4903 object. fsid contains major and minor components each of which are of 4904 data type uint64_t. 4906 5.8.1.10. Attribute 9: unique_handles 4908 True, if two distinct filehandles guaranteed to refer to two 4909 different file system objects. 4911 5.8.1.11. Attribute 10: lease_time 4913 Duration of leases at server in seconds. 4915 5.8.1.12. 
Attribute 11: rdattr_error

Error returned from getattr during readdir.

5.8.1.13.  Attribute 19: filehandle

The filehandle of this object (primarily for readdir requests).

5.8.1.14.  Attribute 75: suppattr_exclcreat

The bit vector which would set all REQUIRED and RECOMMENDED
attributes that are supported by the EXCLUSIVE4_1 method of file
creation via the OPEN operation.  The scope of this attribute applies
to all objects with a matching fsid.

5.8.2.  Definitions of Uncategorized RECOMMENDED Attributes

The definitions of most of the RECOMMENDED attributes follow.
Collections that share a common category are defined in other
sections.

5.8.2.1.  Attribute 14: archive

True, if this file has been archived since the time of last
modification (deprecated in favor of time_backup).

5.8.2.2.  Attribute 15: cansettime

True, if the server is able to change the times for a file system
object as specified in a SETATTR operation.

5.8.2.3.  Attribute 16: case_insensitive

True, if file name comparisons on this file system are case
insensitive.

5.8.2.4.  Attribute 17: case_preserving

True, if file name case on this file system is preserved.

5.8.2.5.  Attribute 60: change_policy

A value created by the server that the client can use to determine if
some server policy related to the current file system has been
subject to change.  If the value remains the same, then the client
can be sure that the values of the attributes related to fs location
and the fss_type field of the fs_status attribute have not changed.
On the other hand, a change in this value does not necessarily imply
a change in policy.  It is up to the client to interrogate the server
to determine if some policy relevant to it has changed.  See
Section 3.3.6 for details.

This attribute MUST change when the value returned by the
fs_locations or fs_locations_info attribute changes, when a file
system goes from read-only to writable or vice versa, or when the
allowable set of security flavors for the file system or any part
thereof is changed.

5.8.2.6.  Attribute 18: chown_restricted

If TRUE, the server will reject any request to change either the
owner or the group associated with a file if the caller is not a
privileged user (for example, "root" in UNIX operating environments
or, in Windows 2000, the "Take Ownership" privilege).

5.8.2.7.  Attribute 20: fileid

A number uniquely identifying the file within the file system.

5.8.2.8.  Attribute 21: files_avail

File slots available to this user on the file system containing this
object - this should be the smallest relevant limit.

5.8.2.9.  Attribute 22: files_free

Free file slots on the file system containing this object - this
should be the smallest relevant limit.

5.8.2.10.  Attribute 23: files_total

Total file slots on the file system containing this object.

5.8.2.11.  Attribute 76: fs_charset_cap

Character set capabilities for this file system.  See Section 14.4.

5.8.2.12.  Attribute 24: fs_locations

Locations where this file system may be found.  If the server returns
NFS4ERR_MOVED as an error, this attribute MUST be supported.

5.8.2.13.  Attribute 67: fs_locations_info

Full function file system location.

5.8.2.14.  Attribute 61: fs_status

Generic file system type information.

5.8.2.15.
Attribute 25: hidden 5017 True, if the file is considered hidden with respect to the Windows 5018 API. 5020 5.8.2.16. Attribute 26: homogeneous 5022 True, if this object's file system is homogeneous, i.e. are per file 5023 system attributes the same for all file system's objects. 5025 5.8.2.17. Attribute 27: maxfilesize 5027 Maximum supported file size for the file system of this object. 5029 5.8.2.18. Attribute 28: maxlink 5031 Maximum number of links for this object. 5033 5.8.2.19. Attribute 29: maxname 5035 Maximum file name size supported for this object. 5037 5.8.2.20. Attribute 30: maxread 5039 Maximum read size supported for this object. 5041 5.8.2.21. Attribute 31: maxwrite 5043 Maximum write size supported for this object. This attribute SHOULD 5044 be supported if the file is writable. Lack of this attribute can 5045 lead to the client either wasting bandwidth or not receiving the best 5046 performance. 5048 5.8.2.22. Attribute 32: mimetype 5050 MIME body type/subtype of this object. 5052 5.8.2.23. Attribute 55: mounted_on_fileid 5054 Like fileid, but if the target filehandle is the root of a file 5055 system, this attribute represents the fileid of the underlying 5056 directory. 5058 UNIX-based operating environments connect a file system into the 5059 namespace by connecting (mounting) the file system onto the existing 5060 file object (the mount point, usually a directory) of an existing 5061 file system. When the mount point's parent directory is read via an 5062 API like readdir(), the return results are directory entries, each 5063 with a component name and a fileid. The fileid of the mount point's 5064 directory entry will be different from the fileid that the stat() 5065 system call returns. The stat() system call is returning the fileid 5066 of the root of the mounted file system, whereas readdir() is 5067 returning the fileid stat() would have returned before any file 5068 systems were mounted on the mount point. 5070 Unlike NFSv3, NFSv4.1 allows a client's LOOKUP request to cross other 5071 file systems. The client detects the file system crossing whenever 5072 the filehandle argument of LOOKUP has an fsid attribute different 5073 from that of the filehandle returned by LOOKUP. A UNIX-based client 5074 will consider this a "mount point crossing". UNIX has a legacy 5075 scheme for allowing a process to determine its current working 5076 directory. This relies on readdir() of a mount point's parent and 5077 stat() of the mount point returning fileids as previously described. 5078 The mounted_on_fileid attribute corresponds to the fileid that 5079 readdir() would have returned as described previously. 5081 While the NFSv4.1 client could simply fabricate a fileid 5082 corresponding to what mounted_on_fileid provides (and if the server 5083 does not support mounted_on_fileid, the client has no choice), there 5084 is a risk that the client will generate a fileid that conflicts with 5085 one that is already assigned to another object in the file system. 5086 Instead, if the server can provide the mounted_on_fileid, the 5087 potential for client operational problems in this area is eliminated. 5089 If the server detects that there is no mounted point at the target 5090 file object, then the value for mounted_on_fileid that it returns is 5091 the same as that of the fileid attribute. 5093 The mounted_on_fileid attribute is RECOMMENDED, so the server SHOULD 5094 provide it if possible, and for a UNIX-based server, this is 5095 straightforward. 
Usually, mounted_on_fileid will be requested during 5096 a READDIR operation, in which case it is trivial (at least for UNIX- 5097 based servers) to return mounted_on_fileid since it is equal to the 5098 fileid of a directory entry returned by readdir(). If 5099 mounted_on_fileid is requested in a GETATTR operation, the server 5100 should obey an invariant that has it returning a value that is equal 5101 to the file object's entry in the object's parent directory, i.e. 5102 what readdir() would have returned. Some operating environments 5103 allow a series of two or more file systems to be mounted onto a 5104 single mount point. In this case, for the server to obey the 5105 aforementioned invariant, it will need to find the base mount point, 5106 and not the intermediate mount points. 5108 5.8.2.24. Attribute 34: no_trunc 5110 If this attribute is TRUE, then if the client uses a file name longer 5111 than name_max, an error will be returned instead of the name being 5112 truncated. 5114 5.8.2.25. Attribute 35: numlinks 5116 Number of hard links to this object. 5118 5.8.2.26. Attribute 36: owner 5120 The string name of the owner of this object. 5122 5.8.2.27. Attribute 37: owner_group 5124 The string name of the group ownership of this object. 5126 5.8.2.28. Attribute 38: quota_avail_hard 5128 The value in bytes which represents the amount of additional disk 5129 space beyond the current allocation that can be allocated to this 5130 file or directory before further allocations will be refused. It is 5131 understood that this space may be consumed by allocations to other 5132 files or directories. 5134 5.8.2.29. Attribute 39: quota_avail_soft 5136 The value in bytes which represents the amount of additional disk 5137 space that can be allocated to this file or directory before the user 5138 may reasonably be warned. It is understood that this space may be 5139 consumed by allocations to other files or directories though there is 5140 a rule as to which other files or directories. 5142 5.8.2.30. Attribute 40: quota_used 5144 The value in bytes which represent the amount of disc space used by 5145 this file or directory and possibly a number of other similar files 5146 or directories, where the set of "similar" meets at least the 5147 criterion that allocating space to any file or directory in the set 5148 will reduce the "quota_avail_hard" of every other file or directory 5149 in the set. 5151 Note that there may be a number of distinct but overlapping sets of 5152 files or directories for which a quota_used value is maintained. 5153 E.g. "all files with a given owner", "all files with a given group 5154 owner". etc. 5156 The server is at liberty to choose any of those sets but should do so 5157 in a repeatable way. The rule may be configured per file system or 5158 may be "choose the set with the smallest quota". 5160 5.8.2.31. Attribute 41: rawdev 5162 Raw device identifier; the UNIX device major/minor node information. 5163 If the value of type is not NF4BLK or NF4CHR, the value returned 5164 SHOULD NOT be considered useful. 5166 5.8.2.32. Attribute 42: space_avail 5168 Disk space in bytes available to this user on the file system 5169 containing this object - this should be the smallest relevant limit. 5171 5.8.2.33. Attribute 43: space_free 5173 Free disk space in bytes on the file system containing this object - 5174 this should be the smallest relevant limit. 5176 5.8.2.34. Attribute 44: space_total 5178 Total disk space in bytes on the file system containing this object. 
5180 5.8.2.35. Attribute 45: space_used 5182 Number of file system bytes allocated to this object. 5184 5.8.2.36. Attribute 46: system 5186 This attribute is TRUE if this file is a "system" file with respect 5187 to the Windows operating environment. 5189 5.8.2.37. Attribute 47: time_access 5191 The time_access attribute represents the time of last access to the 5192 object by a read that was satisfied by the server. The notion of 5193 what is an "access" depends on server's operating environment and/or 5194 the server's file system semantics. For example, for servers obeying 5195 POSIX semantics, time_access would be updated only by the READLINK, 5196 READ, and READDIR operations and not any of the operations that 5197 modify the content of the object. Of course, setting the 5198 corresponding time_access_set attribute is another way to modify the 5199 time_access attribute. 5201 Whenever the file object resides on a writable file system, the 5202 server should make best efforts to record time_access into stable 5203 storage. However, to mitigate the performance effects of doing so, 5204 and most especially whenever the server is satisfying the read of the 5205 object's content from its cache, the server MAY cache access time 5206 updates and lazily write them to stable storage. It is also 5207 acceptable to give administrators of the server the option to disable 5208 time_access updates. 5210 5.8.2.38. Attribute 48: time_access_set 5212 Set the time of last access to the object. SETATTR use only. 5214 5.8.2.39. Attribute 49: time_backup 5216 The time of last backup of the object. 5218 5.8.2.40. Attribute 50: time_create 5220 The time of creation of the object. This attribute does not have any 5221 relation to the traditional UNIX file attribute "ctime" or "change 5222 time". 5224 5.8.2.41. Attribute 51: time_delta 5226 Smallest useful server time granularity. 5228 5.8.2.42. Attribute 52: time_metadata 5230 The time of last metadata modification of the object. 5232 5.8.2.43. Attribute 53: time_modify 5234 The time of last modification to the object. 5236 5.8.2.44. Attribute 54: time_modify_set 5238 Set the time of last modification to the object. SETATTR use only. 5240 5.9. Interpreting owner and owner_group 5242 The RECOMMENDED attributes "owner" and "owner_group" (and also users 5243 and groups within the "acl" attribute) are represented in terms of a 5244 UTF-8 string. To avoid a representation that is tied to a particular 5245 underlying implementation at the client or server, the use of the 5246 UTF-8 string has been chosen. Note that section 6.1 of RFC2624 [34] 5247 provides additional rationale. It is expected that the client and 5248 server will have their own local representation of owner and 5249 owner_group that is used for local storage or presentation to the end 5250 user. Therefore, it is expected that when these attributes are 5251 transferred between the client and server that the local 5252 representation is translated to a syntax of the form "user@ 5253 dns_domain". This will allow for a client and server that do not use 5254 the same local representation the ability to translate to a common 5255 syntax that can be interpreted by both. 5257 Similarly, security principals may be represented in different ways 5258 by different security mechanisms. Servers normally translate these 5259 representations into a common format, generally that used by local 5260 storage, to serve as a means of identifying the users corresponding 5261 to these security principals. 
When these local identifiers are 5262 translated to the form of the owner attribute, associated with files 5263 created by such principals they identify, in a common format, the 5264 users associated with each corresponding set of security principals. 5266 The translation used to interpret owner and group strings is not 5267 specified as part of the protocol. This allows various solutions to 5268 be employed. For example, a local translation table may be consulted 5269 that maps between a numeric identifier to the user@dns_domain syntax. 5270 A name service may also be used to accomplish the translation. A 5271 server may provide a more general service, not limited by any 5272 particular translation (which would only translate a limited set of 5273 possible strings) by storing the owner and owner_group attributes in 5274 local storage without any translation or it may augment a translation 5275 method by storing the entire string for attributes for which no 5276 translation is available while using the local representation for 5277 those cases in which a translation is available. 5279 Servers that do not provide support for all possible values of the 5280 owner and owner_group attributes, SHOULD return an error 5281 (NFS4ERR_BADOWNER) when a string is presented that has no 5282 translation, as the value to be set for a SETATTR of the owner, 5283 owner_group, or acl attributes. When a server does accept an owner 5284 or owner_group value as valid on a SETATTR (and similarly for the 5285 owner and group strings in an acl), it is promising to return that 5286 same string when a corresponding GETATTR is done. Configuration 5287 changes (including changes from the mapping of the string to the 5288 local representation) and ill-constructed name translations (those 5289 that contain aliasing) may make that promise impossible to honor. 5290 Servers should make appropriate efforts to avoid a situation in which 5291 these attributes have their values changed when no real change to 5292 ownership has occurred. 5294 The "dns_domain" portion of the owner string is meant to be a DNS 5295 domain name. For example, user@ietf.org. Servers should accept as 5296 valid a set of users for at least one domain. A server may treat 5297 other domains as having no valid translations. A more general 5298 service is provided when a server is capable of accepting users for 5299 multiple domains, or for all domains, subject to security 5300 constraints. 5302 In the case where there is no translation available to the client or 5303 server, the attribute value must be constructed without the "@". 5304 Therefore, the absence of the @ from the owner or owner_group 5305 attribute signifies that no translation was available at the sender 5306 and that the receiver of the attribute should not use that string as 5307 a basis for translation into its own internal format. Even though 5308 the attribute value can not be translated, it may still be useful. 5309 In the case of a client, the attribute string may be used for local 5310 display of ownership. 5312 To provide a greater degree of compatibility with NFSv3, which 5313 identified users and groups by 32-bit unsigned user identifiers and 5314 group identifiers, owner and group strings that consist of decimal 5315 numeric values with no leading zeros can be given a special 5316 interpretation by clients and servers which choose to provide such 5317 support. 
The receiver may treat such a user or group string as 5318 representing the same user as would be represented by an NFSv3 uid or 5319 gid having the corresponding numeric value. A server is not 5320 obligated to accept such a string, but may return an NFS4ERR_BADOWNER 5321 instead. To avoid this mechanism being used to subvert user and 5322 group translation, so that a client might pass all of the owners and 5323 groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER 5324 error when there is a valid translation for the user or owner 5325 designated in this way. In that case, the client must use the 5326 appropriate name@domain string and not the special form for 5327 compatibility. 5329 The owner string "nobody" may be used to designate an anonymous user, 5330 which will be associated with a file created by a security principal 5331 that cannot be mapped through normal means to the owner attribute. 5333 5.10. Character Case Attributes 5335 With respect to the case_insensitive and case_preserving attributes, 5336 each UCS-4 character (which UTF-8 encodes) has a "long descriptive 5337 name" RFC1345 [35] which may or may not include the word "CAPITAL" or 5338 "SMALL". The presence of SMALL or CAPITAL allows an NFS server to 5339 implement unambiguous and efficient table driven mappings for case 5340 insensitive comparisons, and non-case-preserving storage. For 5341 general character handling and internationalization issues, see 5342 Section 14. 5344 5.11. Directory Notification Attributes 5346 As described in Section 18.39, the client can request a minimum delay 5347 for notifications of changes to attributes, but the server is free to 5348 ignore what the client requests. The client can determine in advance 5349 what notification delays the server will accept by issuing a GETATTR 5350 for either or both of two directory notification attributes. When 5351 the client calls the GET_DIR_DELEGATION operation and asks for 5352 attribute change notifications, it should request notification delays 5353 that are no less than the values in the server-provided attributes. 5355 5.11.1. Attribute 56: dir_notif_delay 5357 The dir_notif_delay attribute is the minimum number of seconds the 5358 server will delay before notifying the client of a change to the 5359 directory's attributes. 5361 5.11.2. Attribute 57: dirent_notif_delay 5363 The dirent_notif_delay attribute is the minimum number of seconds the 5364 server will delay before notifying the client of a change to a file 5365 object that has an entry in the directory. 5367 5.12. pNFS Attribute Definitions 5369 5.12.1. Attribute 62: fs_layout_type 5371 The fs_layout_type attribute (see Section 3.3.13) applies to a file 5372 system and indicates what layout types are supported by the file 5373 system. When the client encounters a new fsid, the client SHOULD 5374 obtain the value for the fs_layout_type attribute associated with the 5375 new file system. This attribute is used by the client to determine 5376 if the layout types supported by the server match any of the client's 5377 supported layout types. 5379 5.12.2. Attribute 66: layout_alignment 5381 When a client holds layouts on files of a file system, the 5382 layout_alignment attribute indicates the preferred alignment for I/O 5383 to files on that file system. Where possible, the client should send 5384 READ and WRITE operations with offsets that are whole multiples of 5385 the layout_alignment attribute. 5387 5.12.3. 
Attribute 65: layout_blksize 5389 When a client holds layouts on files of a file system, the 5390 layout_blksize attribute indicates the preferred block size for I/O 5391 to files on that file system. Where possible, the client should send 5392 READ operations with a count argument that is a whole multiple of 5393 layout_blksize, and WRITE operations with a data argument of size 5394 that is a whole multiple of layout_blksize. 5396 5.12.4. Attribute 63: layout_hint 5398 The layout_hint attribute (see Section 3.3.19) may be set on newly 5399 created files to influence the metadata server's choice for the 5400 file's layout. If possible, this attribute is one of those set in 5401 the initial attributes within the OPEN operation. The metadata 5402 server may choose to ignore this attribute. The layout_hint 5403 attribute is a sub-set of the layout structure returned by LAYOUTGET. 5404 For example, instead of specifying particular devices, this would be 5405 used to suggest the stripe width of a file. The server 5406 implementation determines which fields within the layout will be 5407 used. 5409 5.12.5. Attribute 64: layout_type 5411 This attribute lists the layout type(s) available for a file. The 5412 value returned by the server is for informational purposes only. The 5413 client will use the LAYOUTGET operation to obtain the information 5414 needed in order to perform I/O. For example, the specific device 5415 information for the file and its layout. 5417 5.12.6. Attribute 68: mdsthreshold 5419 This attribute is a server provided hint used to communicate to the 5420 client when it is more efficient to send READ and WRITE operations to 5421 the metadata server or the data server. The two types of thresholds 5422 described are file size thresholds and I/O size thresholds. If a 5423 file's size is smaller than the file size threshold, data accesses 5424 SHOULD be sent to the metadata server. If an I/O request has a 5425 length that is below the I/O size threshold, the I/O SHOULD be sent 5426 to the metadata server. Each threshold type is specified separately 5427 for READ and WRITE. 5429 The server MAY provide both types of thresholds for a file. If both 5430 file size and I/O size are provided, the client SHOULD reach or 5431 exceed both thresholds before issuing its READ or WRITE requests to 5432 the data server. Alternatively, if only one of the specified 5433 thresholds are reached or exceeded, the I/O requests are sent to the 5434 metadata server. 5436 For each threshold type, a value of 0 indicates no READ or WRITE 5437 should be sent to the metadata server, while a value of all 1s 5438 indicates all READS or WRITES should be sent to the metadata server. 5440 The attribute is available on a per filehandle basis. If the current 5441 filehandle refers to a non-pNFS file or directory, the metadata 5442 server should return an attribute that is representative of the 5443 filehandle's file system. It is suggested that this attribute is 5444 queried as part of the OPEN operation. Due to dynamic system 5445 changes, the client should not assume that the attribute will remain 5446 constant for any specific time period, thus it should be periodically 5447 refreshed. 5449 5.13. Retention Attributes 5451 Retention is a concept whereby a file object can be placed in an 5452 immutable, undeletable, unrenamable state for a fixed or infinite 5453 duration of time. Once in this "retained" state, the file cannot be 5454 moved out of the state until the duration of retention has been 5455 reached. 
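The subsections below define the individual retention attributes.  As
a non-normative illustration of how they combine, the following C
sketch (with hypothetical structure and helper names; the duration
field is modeled as the value in effect when retention was enabled)
shows the kind of test a server might apply before permitting an
operation that would delete, rename, or modify a file:

   /*
    * Non-normative sketch.  A file is currently "retained" if normal
    * or event-based retention is enabled and its duration has not
    * yet elapsed, or if any administrative retention hold bit is set
    * (see the retention_get, retentevt_get, and retention_hold
    * attributes defined below).
    */
   #include <stdbool.h>
   #include <stdint.h>

   #define RET4_DURATION_INFINITE 0xffffffffffffffffULL

   struct retention_state {
           bool     enabled;     /* begin time has been established */
           uint64_t begin_time;  /* seconds, valid when enabled     */
           uint64_t duration;    /* seconds, as set when enabled    */
   };

   static bool
   still_retained(const struct retention_state *r, uint64_t now)
   {
           if (!r->enabled)
                   return false;
           if (r->duration == RET4_DURATION_INFINITE)
                   return true;
           return now < r->begin_time + r->duration;
   }

   /* A retained file must not be deleted, renamed, or modified. */
   bool
   file_is_retained(const struct retention_state *normal,
                    const struct retention_state *event,
                    uint64_t retention_hold, uint64_t now)
   {
           return retention_hold != 0 ||
                  still_retained(normal, now) ||
                  still_retained(event, now);
   }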
5457 When retention is enabled, retention MUST extend to the data of the 5458 file, and the name of file. The server MAY extend retention to any 5459 other property of the file, including any subset of REQUIRED, 5460 RECOMMENDED, and named attributes, with the exceptions noted in this 5461 section. 5463 Servers MAY support or not support retention on any file object type. 5465 The five retention attributes are explained in the next subsections. 5467 5.13.1. Attribute 69: retention_get 5469 If retention is enabled for the associated file, this attribute's 5470 value represents the retention begin time of the file object. This 5471 attribute's value is only readable with the GETATTR operation and 5472 MUST NOT be modified by the SETATTR operation (Section 5.5). The 5473 value of the attribute consists of: 5475 const RET4_DURATION_INFINITE = 0xffffffffffffffff; 5476 struct retention_get4 { 5477 uint64_t rg_duration; 5478 nfstime4 rg_begin_time<1>; 5479 }; 5481 The field rg_duration is the duration in seconds indicating how long 5482 the file will be retained once retention is enabled. The field 5483 rg_begin_time is an array of up to one absolute time value. If the 5484 array is zero length, no beginning retention time has been 5485 established, and retention is not enabled. If rg_duration is equal 5486 to RET4_DURATION_INFINITE, the file, once retention is enabled, will 5487 be retained for an infinite duration. 5489 If (as soon as) rg_duration is zero, then rg_begin_time will be of 5490 zero length, and again, retention is not (no longer) enabled. 5492 5.13.2. Attribute 70: retention_set 5494 This attribute is used to set the retention duration and optionally 5495 enable retention for the associated file object. This attribute is 5496 only modifiable via the SETATTR operation and MUST NOT be retrieved 5497 by the GETATTR operation (Section 5.5). This attribute corresponds 5498 to retention_get. The value of the attribute consists of: 5500 struct retention_set4 { 5501 bool rs_enable; 5502 uint64_t rs_duration<1>; 5503 }; 5505 If the client sets rs_enable to TRUE, then it is enabling retention 5506 on the file object with the begin time of retention starting from the 5507 server's current time and date. The duration of the retention can 5508 also be provided if the rs_duration array is of length one. The 5509 duration is the time in seconds from the begin time of retention, and 5510 if set to RET4_DURATION_INFINITE, the file is to be retained forever. 5511 If retention is enabled, with no duration specified in either this 5512 SETATTR or a previous SETATTR, the duration defaults to zero seconds. 5513 The server MAY restrict the enabling of retention or the duration of 5514 retention on the basis of the ACE4_WRITE_RETENTION ACL permission. 5515 The enabling of retention MUST NOT prevent the enabling of event- 5516 based retention nor the modification of the retention_hold attribute. 5518 The following rules apply to both the retention_set and retentevt_set 5519 attributes. 5521 o As long as retention is not enabled, the client is permitted to 5522 decrease the duration. 5524 o The duration can always be set to an equal or higher value, even 5525 if retention is enabled. Note that once retention is enabled, the 5526 actual duration (as returned by the retention_get or retentevt_get 5527 attributes, see Section 5.13.1 or Section 5.13.3), is constantly 5528 counting down to zero (one unit per second), unless the duration 5529 was set to RET4_DURATION_INFINITE. 
Thus it will not be possible 5530 for the client to precisely extend the duration on a file that has 5531 retention enabled. 5533 o While retention is enabled, attempts to disable retention or 5534 decrease the retention's duration MUST fail with the error 5535 NFS4ERR_INVAL. 5537 o If the principal attempting to change retention_set or 5538 retentevt_set does not have ACE4_WRITE_RETENTION permissions, the 5539 attempt MUST fail with NFS4ERR_ACCESS. 5541 5.13.3. Attribute 71: retentevt_get 5543 Get the event-based retention duration, and if enabled, the event- 5544 based retention begin time of the file object. This attribute is 5545 like retention_get but refers to event-based retention. The event 5546 that triggers event-based retention is not defined by the NFSv4.1 5547 specification. 5549 5.13.4. Attribute 72: retentevt_set 5551 Set the event-based retention duration, and optionally enable event- 5552 based retention on the file object. This attribute corresponds to 5553 retentevt_get, is like retention_set, but refers to event-based 5554 retention. When event based retention is set, the file MUST be 5555 retained even if non-event-based retention has been set, and the 5556 duration of non-event-based retention has been reached. Conversely, 5557 when non-event-based retention has been set, the file MUST be 5558 retained even if event-based retention has been set, and the duration 5559 of event-based retention has been reached. The server MAY restrict 5560 the enabling of event-based retention or the duration of event-based 5561 retention on the basis of the ACE4_WRITE_RETENTION ACL permission. 5562 The enabling of event-based retention MUST NOT prevent the enabling 5563 of non-event-based retention nor the modification of the 5564 retention_hold attribute. 5566 5.13.5. Attribute 73: retention_hold 5568 Get or set administrative retention holds, one hold per bit position. 5570 This attribute allows one to 64 administrative holds, one hold per 5571 bit on the attribute. If retention_hold is not zero, then the file 5572 MUST NOT be deleted, renamed, or modified, even if the duration on 5573 enabled event or non-event-based retention has been reached. The 5574 server MAY restrict the modification of retention_hold on the basis 5575 of the ACE4_WRITE_RETENTION_HOLD ACL permission. The enabling of 5576 administration retention holds does not prevent the enabling of 5577 event-based or non-event-based retention. 5579 If the principal attempting to change retention_hold does not have 5580 ACE4_WRITE_RETENTION_HOLD permissions, the attempt MUST fail with 5581 NFS4ERR_ACCESS. 5583 6. Access Control Attributes 5585 Access Control Lists (ACLs) are file attributes that specify fine 5586 grained access control. This chapter covers the "acl", "dacl", 5587 "sacl", "aclsupport", "mode", "mode_set_masked" file attributes, and 5588 their interactions. Note that file attributes may apply to any file 5589 system object. 5591 6.1. Goals 5593 ACLs and modes represent two well established models for specifying 5594 permissions. This chapter specifies requirements that attempt to 5595 meet the following goals: 5597 o If a server supports the mode attribute, it should provide 5598 reasonable semantics to clients that only set and retrieve the 5599 mode attribute. 5601 o If a server supports ACL attributes, it should provide reasonable 5602 semantics to clients that only set and retrieve those attributes. 
5604 o On servers that support the mode attribute, if ACL attributes have 5605 never been set on an object, via inheritance or explicitly, the 5606 behavior should be traditional UNIX-like behavior. 5608 o On servers that support the mode attribute, if the ACL attributes 5609 have been previously set on an object, either explicitly or via 5610 inheritance: 5612 * Setting only the mode attribute should effectively control the 5613 traditional UNIX-like permissions of read, write, and execute 5614 on owner, owner_group, and other. 5616 * Setting only the mode attribute should provide reasonable 5617 security. For example, setting a mode of 000 should be enough 5618 to ensure that future opens for read or write by any principal 5619 fail, regardless of a previously existing or inherited ACL. 5621 o NFSv4.1 may introduce different semantics relating to the mode and 5622 ACL attributes, but it does not render invalid any previously 5623 existing implementations. Additionally, this chapter provides 5624 clarifications based on previous implementations and discussions 5625 around them. 5627 o On servers that support both the mode and the acl or dacl 5628 attributes, the server must keep the two consistent with each 5629 other. The value of the mode attribute (with the exception of the 5630 three high order bits described in Section 6.2.4), must be 5631 determined entirely by the value of the ACL, so that use of the 5632 mode is never required for anything other than setting the three 5633 high order bits. See Section 6.4.1 for exact requirements. 5635 o When a mode attribute is set on an object, the ACL attributes may 5636 need to be modified so as to not conflict with the new mode. In 5637 such cases, it is desirable that the ACL keep as much information 5638 as possible. This includes information about inheritance, AUDIT 5639 and ALARM ACEs, and permissions granted and denied that do not 5640 conflict with the new mode. 5642 6.2. File Attributes Discussion 5644 6.2.1. Attribute 12: acl 5646 The NFSv4.1 ACL attribute contains an array of access control entries 5647 (ACEs) that are associated with the file system object. Although the 5648 client can read and write the acl attribute, the server is 5649 responsible for using the ACL to perform access control. The client 5650 can use the OPEN or ACCESS operations to check access without 5651 modifying or reading data or metadata. 5653 The NFS ACE structure is defined as follows: 5655 typedef uint32_t acetype4; 5657 typedef uint32_t aceflag4; 5659 typedef uint32_t acemask4; 5660 struct nfsace4 { 5661 acetype4 type; 5662 aceflag4 flag; 5663 acemask4 access_mask; 5664 utf8str_mixed who; 5665 }; 5667 To determine if a request succeeds, the server processes each nfsace4 5668 entry in order. Only ACEs which have a "who" that matches the 5669 requester are considered. Each ACE is processed until all of the 5670 bits of the requester's access have been ALLOWED. Once a bit (see 5671 below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer 5672 considered in the processing of later ACEs. If an ACCESS_DENIED_ACE 5673 is encountered where the requester's access still has unALLOWED bits 5674 in common with the "access_mask" of the ACE, the request is denied. 5675 When the ACL is fully processed, if there are bits in the requester's 5676 mask that have not been ALLOWED or DENIED, access is denied. 
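The following C fragment is a non-normative sketch of the processing
order just described.  The helper who_matches() is greatly
simplified; a real server must also handle group membership and the
special "who" values (such as OWNER@ and EVERYONE@):

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   typedef uint32_t acetype4;
   typedef uint32_t aceflag4;
   typedef uint32_t acemask4;

   #define ACE4_ACCESS_ALLOWED_ACE_TYPE 0x00000000
   #define ACE4_ACCESS_DENIED_ACE_TYPE  0x00000001

   struct nfsace4 {
           acetype4    type;
           aceflag4    flag;
           acemask4    access_mask;
           const char *who;    /* utf8str_mixed in the XDR */
   };

   /* Greatly simplified "who" match, for illustration only. */
   static bool
   who_matches(const struct nfsace4 *ace, const char *requester)
   {
           return strcmp(ace->who, requester) == 0;
   }

   /* Returns true if all requested bits are granted. */
   bool
   acl_check(const struct nfsace4 *acl, size_t nace,
             const char *requester, acemask4 requested)
   {
           acemask4 remaining = requested;  /* bits not yet ALLOWED */

           for (size_t i = 0; i < nace && remaining != 0; i++) {
                   if (!who_matches(&acl[i], requester))
                           continue;        /* ACE does not apply */

                   if (acl[i].type == ACE4_ACCESS_ALLOWED_ACE_TYPE) {
                           /* Bits ALLOWED here are no longer
                              considered by later ACEs. */
                           remaining &= ~acl[i].access_mask;
                   } else if (acl[i].type ==
                              ACE4_ACCESS_DENIED_ACE_TYPE) {
                           /* A DENY ACE covering any still-unALLOWED
                              requested bit denies the request. */
                           if (remaining & acl[i].access_mask)
                                   return false;
                   }
                   /* AUDIT and ALARM ACEs do not affect access. */
           }

           /* Bits neither ALLOWED nor DENIED result in denial. */
           return remaining == 0;
   }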
5678 Unlike the ALLOW and DENY ACE types, the ALARM and AUDIT ACE types do 5679 not affect a requester's access, and instead are for triggering 5680 events as a result of a requester's access attempt. Therefore, AUDIT 5681 and ALARM ACEs are processed only after processing ALLOW and DENY 5682 ACEs. 5684 The NFSv4.1 ACL model is quite rich. Some server platforms may 5685 provide access control functionality that goes beyond the UNIX-style 5686 mode attribute, but which is not as rich as the NFS ACL model. So 5687 that users can take advantage of this more limited functionality, the 5688 server may support the acl attributes by mapping between its ACL 5689 model and the NFSv4.1 ACL model. Servers must ensure that the ACL 5690 they actually store or enforce is at least as strict as the NFSv4 ACL 5691 that was set. It is tempting to accomplish this by rejecting any ACL 5692 that falls outside the small set that can be represented accurately. 5693 However, such an approach can render ACLs unusable without special 5694 client-side knowledge of the server's mapping, which defeats the 5695 purpose of having a common NFSv4 ACL protocol. Therefore servers 5696 should accept every ACL that they can without compromising security. 5697 To help accomplish this, servers may make a special exception, in the 5698 case of unsupported permission bits, to the rule that bits not 5699 ALLOWED or DENIED by an ACL must be denied. For example, a UNIX- 5700 style server might choose to silently allow read attribute 5701 permissions even though an ACL does not explicitly allow those 5702 permissions. (An ACL that explicitly denies permission to read 5703 attributes should still be rejected.) 5705 The situation is complicated by the fact that a server may have 5706 multiple modules that enforce ACLs. For example, the enforcement for 5707 NFSv4.1 access may be different from, but not weaker than, the 5708 enforcement for local access, and both may be different from the 5709 enforcement for access through other protocols such as SMB. So it 5710 may be useful for a server to accept an ACL even if not all of its 5711 modules are able to support it. 5713 The guiding principle with regard to NFSv4 access is that the server 5714 must not accept ACLs that appear to make access to the file more 5715 restrictive than it really is. 5717 6.2.1.1. ACE Type 5719 The constants used for the type field (acetype4) are as follows: 5721 const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000; 5722 const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001; 5723 const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002; 5724 const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003; 5726 Only the ALLOWED and DENIED bits types may be used in the dacl 5727 attribute, and only the AUDIT and ALARM bits may be used in the sacl 5728 attribute. All four are permitted in the acl attribute. 5730 +------------------------------+--------------+---------------------+ 5731 | Value | Abbreviation | Description | 5732 +------------------------------+--------------+---------------------+ 5733 | ACE4_ACCESS_ALLOWED_ACE_TYPE | ALLOW | Explicitly grants | 5734 | | | the access defined | 5735 | | | in acemask4 to the | 5736 | | | file or directory. | 5737 | ACE4_ACCESS_DENIED_ACE_TYPE | DENY | Explicitly denies | 5738 | | | the access defined | 5739 | | | in acemask4 to the | 5740 | | | file or directory. 
| 5741 | ACE4_SYSTEM_AUDIT_ACE_TYPE | AUDIT | LOG (in a system | 5742 | | | dependent way) any | 5743 | | | access attempt to a | 5744 | | | file or directory | 5745 | | | which uses any of | 5746 | | | the access methods | 5747 | | | specified in | 5748 | | | acemask4. | 5749 | ACE4_SYSTEM_ALARM_ACE_TYPE | ALARM | Generate a system | 5750 | | | ALARM (system | 5751 | | | dependent) when any | 5752 | | | access attempt is | 5753 | | | made to a file or | 5754 | | | directory for the | 5755 | | | access methods | 5756 | | | specified in | 5757 | | | acemask4. | 5758 +------------------------------+--------------+---------------------+ 5760 The "Abbreviation" column denotes how the types will be referred to 5761 throughout the rest of this chapter. 5763 6.2.1.2. Attribute 13: aclsupport 5765 A server need not support all of the above ACE types. This attribute 5766 indicates which ACE types are supported for the current file system. 5767 The bitmask constants used to represent the above definitions within 5768 the aclsupport attribute are as follows: 5770 const ACL4_SUPPORT_ALLOW_ACL = 0x00000001; 5771 const ACL4_SUPPORT_DENY_ACL = 0x00000002; 5772 const ACL4_SUPPORT_AUDIT_ACL = 0x00000004; 5773 const ACL4_SUPPORT_ALARM_ACL = 0x00000008; 5775 Servers which support either the ALLOW or DENY ACE type SHOULD 5776 support both ALLOW and DENY ACE types. 5778 Clients should not attempt to set an ACE unless the server claims 5779 support for that ACE type. If the server receives a request to set 5780 an ACE that it cannot store, it MUST reject the request with 5781 NFS4ERR_ATTRNOTSUPP. If the server receives a request to set an ACE 5782 that it can store but cannot enforce, the server SHOULD reject the 5783 request with NFS4ERR_ATTRNOTSUPP. 5785 Support for any of the ACL attributes is optional (albeit, 5786 RECOMMENDED). However, a server that supports either of the new ACL 5787 attributes (dacl or sacl) MUST allow use of the new ACL attributes to 5788 access all of the ACE types which it supports. In other words, if 5789 such a server supports ALLOW or DENY ACEs, then it MUST support the 5790 dacl attribute, and if it supports AUDIT or ALARM ACEs, then it MUST 5791 support the sacl attribute. 5793 6.2.1.3. ACE Access Mask 5795 The bitmask constants used for the access mask field are as follows: 5797 const ACE4_READ_DATA = 0x00000001; 5798 const ACE4_LIST_DIRECTORY = 0x00000001; 5799 const ACE4_WRITE_DATA = 0x00000002; 5800 const ACE4_ADD_FILE = 0x00000002; 5801 const ACE4_APPEND_DATA = 0x00000004; 5802 const ACE4_ADD_SUBDIRECTORY = 0x00000004; 5803 const ACE4_READ_NAMED_ATTRS = 0x00000008; 5804 const ACE4_WRITE_NAMED_ATTRS = 0x00000010; 5805 const ACE4_EXECUTE = 0x00000020; 5806 const ACE4_DELETE_CHILD = 0x00000040; 5807 const ACE4_READ_ATTRIBUTES = 0x00000080; 5808 const ACE4_WRITE_ATTRIBUTES = 0x00000100; 5809 const ACE4_WRITE_RETENTION = 0x00000200; 5810 const ACE4_WRITE_RETENTION_HOLD = 0x00000400; 5812 const ACE4_DELETE = 0x00010000; 5813 const ACE4_READ_ACL = 0x00020000; 5814 const ACE4_WRITE_ACL = 0x00040000; 5815 const ACE4_WRITE_OWNER = 0x00080000; 5816 const ACE4_SYNCHRONIZE = 0x00100000; 5818 Note that some masks have coincident values, for example, 5819 ACE4_READ_DATA and ACE4_LIST_DIRECTORY. The mask entries 5820 ACE4_LIST_DIRECTORY, ACE4_ADD_FILE, and ACE4_ADD_SUBDIRECTORY are 5821 intended to be used with directory objects, while ACE4_READ_DATA, 5822 ACE4_WRITE_DATA, and ACE4_APPEND_DATA are intended to be used with 5823 non-directory objects. 5825 6.2.1.3.1. 
Discussion of Mask Attributes 5827 ACE4_READ_DATA 5829 Operation(s) affected: 5831 READ 5833 OPEN 5835 Discussion: 5837 Permission to read the data of the file. 5839 Servers SHOULD allow a user the ability to read the data of the 5840 file when only the ACE4_EXECUTE access mask bit is allowed. 5842 ACE4_LIST_DIRECTORY 5844 Operation(s) affected: 5846 READDIR 5848 Discussion: 5850 Permission to list the contents of a directory. 5852 ACE4_WRITE_DATA 5854 Operation(s) affected: 5856 WRITE 5858 OPEN 5860 SETATTR of size 5862 Discussion: 5864 Permission to modify a file's data. 5866 ACE4_ADD_FILE 5868 Operation(s) affected: 5870 CREATE 5872 LINK 5874 OPEN 5876 RENAME 5878 Discussion: 5880 Permission to add a new file in a directory. The CREATE 5881 operation is affected when nfs_ftype4 is NF4LNK, NF4BLK, 5882 NF4CHR, NF4SOCK, or NF4FIFO. (NF4DIR is not listed because it 5883 is covered by ACE4_ADD_SUBDIRECTORY.) OPEN is affected when 5884 used to create a regular file. LINK and RENAME are always 5885 affected. 5887 ACE4_APPEND_DATA 5889 Operation(s) affected: 5891 WRITE 5893 OPEN 5895 SETATTR of size 5897 Discussion: 5899 The ability to modify a file's data, but only starting at EOF. 5900 This allows for the notion of append-only files, by allowing 5901 ACE4_APPEND_DATA and denying ACE4_WRITE_DATA to the same user 5902 or group. If a file has an ACL such as the one described above 5903 and a WRITE request is made for somewhere other than EOF, the 5904 server SHOULD return NFS4ERR_ACCESS. 5906 ACE4_ADD_SUBDIRECTORY 5908 Operation(s) affected: 5910 CREATE 5912 RENAME 5914 Discussion: 5916 Permission to create a subdirectory in a directory. The CREATE 5917 operation is affected when nfs_ftype4 is NF4DIR. The RENAME 5918 operation is always affected. 5920 ACE4_READ_NAMED_ATTRS 5922 Operation(s) affected: 5924 OPENATTR 5926 Discussion: 5928 Permission to read the named attributes of a file or to lookup 5929 the named attributes directory. OPENATTR is affected when it 5930 is not used to create a named attribute directory. This is 5931 when 1.) createdir is TRUE, but a named attribute directory 5932 already exists, or 2.) createdir is FALSE. 5934 ACE4_WRITE_NAMED_ATTRS 5936 Operation(s) affected: 5938 OPENATTR 5940 Discussion: 5942 Permission to write the named attributes of a file or to create 5943 a named attribute directory. OPENATTR is affected when it is 5944 used to create a named attribute directory. This is when 5945 createdir is TRUE and no named attribute directory exists. The 5946 ability to check whether or not a named attribute directory 5947 exists depends on the ability to look it up, therefore, users 5948 also need the ACE4_READ_NAMED_ATTRS permission in order to 5949 create a named attribute directory. 5951 ACE4_EXECUTE 5953 Operation(s) affected: 5955 READ 5957 OPEN 5959 REMOVE 5961 RENAME 5963 LINK 5965 CREATE 5967 Discussion: 5969 Permission to execute a file. 5971 Servers SHOULD allow a user the ability to read the data of the 5972 file when only the ACE4_EXECUTE access mask bit is allowed. 5973 This is because there is no way to execute a file without 5974 reading the contents. Though a server may treat ACE4_EXECUTE 5975 and ACE4_READ_DATA bits identically when deciding to permit a 5976 READ operation, it SHOULD still allow the two bits to be set 5977 independently in ACLs, and MUST distinguish between them when 5978 replying to ACCESS operations. 
In particular, servers SHOULD 5979 NOT silently turn on one of the two bits when the other is set, 5980 as that would make it impossible for the client to correctly 5981 enforce the distinction between read and execute permissions. 5983 As an example, following a SETATTR of the following ACL: 5985 nfsuser:ACE4_EXECUTE:ALLOW 5987 A subsequent GETATTR of ACL for that file SHOULD return: 5989 nfsuser:ACE4_EXECUTE:ALLOW 5991 Rather than: 5993 nfsuser:ACE4_EXECUTE/ACE4_READ_DATA:ALLOW 5995 ACE4_EXECUTE 5997 Operation(s) affected: 5999 LOOKUP 6001 Discussion: 6003 Permission to traverse/search a directory. 6005 ACE4_DELETE_CHILD 6007 Operation(s) affected: 6009 REMOVE 6011 RENAME 6013 Discussion: 6015 Permission to delete a file or directory within a directory. 6016 See Section 6.2.1.3.2 for information on how ACE4_DELETE and 6017 ACE4_DELETE_CHILD interact. 6019 ACE4_READ_ATTRIBUTES 6021 Operation(s) affected: 6023 GETATTR of file system object attributes 6024 VERIFY 6026 NVERIFY 6028 READDIR 6030 Discussion: 6032 The ability to read basic attributes (non-ACLs) of a file. On 6033 a UNIX system, basic attributes can be thought of as the stat 6034 level attributes. Allowing this access mask bit would mean the 6035 entity can execute "ls -l" and stat. If a READDIR operation 6036 requests attributes, this mask must be allowed for the READDIR 6037 to succeed. 6039 ACE4_WRITE_ATTRIBUTES 6041 Operation(s) affected: 6043 SETATTR of time_access_set, time_backup, 6045 time_create, time_modify_set, mimetype, hidden, system 6047 Discussion: 6049 Permission to change the times associated with a file or 6050 directory to an arbitrary value. Also permission to change the 6051 mimetype, hidden, and system attributes. A user having 6052 ACE4_WRITE_DATA or ACE4_WRITE_ATTRIBUTES will be allowed to set 6053 the times associated with a file to the current server time. 6055 ACE4_WRITE_RETENTION 6057 Operation(s) affected: 6059 SETATTR of retention_set, retentevt_set. 6061 Discussion: 6063 Permission to modify the durations of event and non-event-based 6064 retention. Also permission to enable event and non-event-based 6065 retention. A server MAY behave such that setting 6066 ACE4_WRITE_ATTRIBUTES allows ACE4_WRITE_RETENTION. 6068 ACE4_WRITE_RETENTION_HOLD 6070 Operation(s) affected: 6072 SETATTR of retention_hold. 6074 Discussion: 6076 Permission to modify the administration retention holds. A 6077 server MAY map ACE4_WRITE_ATTRIBUTES to 6078 ACE4_WRITE_RETENTION_HOLD. 6080 ACE4_DELETE 6082 Operation(s) affected: 6084 REMOVE 6086 Discussion: 6088 Permission to delete the file or directory. See 6089 Section 6.2.1.3.2 for information on how ACE4_DELETE and 6090 ACE4_DELETE_CHILD interact. 6092 ACE4_READ_ACL 6094 Operation(s) affected: 6096 GETATTR of acl, dacl, or sacl 6098 NVERIFY 6100 VERIFY 6102 Discussion: 6104 Permission to read the ACL. 6106 ACE4_WRITE_ACL 6108 Operation(s) affected: 6110 SETATTR of acl and mode 6112 Discussion: 6114 Permission to write the acl and mode attributes. 6116 ACE4_WRITE_OWNER 6118 Operation(s) affected: 6120 SETATTR of owner and owner_group 6122 Discussion: 6124 Permission to write the owner and owner_group attributes. On 6125 UNIX systems, this is the ability to execute chown() and 6126 chgrp(). 6128 ACE4_SYNCHRONIZE 6130 Operation(s) affected: 6132 NONE 6134 Discussion: 6136 Permission to access the file locally at the server with 6137 synchronized reads and writes. 6139 Server implementations need not provide the granularity of control 6140 that is implied by this list of masks.
For example, POSIX-based 6141 systems might not distinguish ACE4_APPEND_DATA (the ability to append 6142 to a file) from ACE4_WRITE_DATA (the ability to modify existing 6143 contents); both masks would be tied to a single "write" permission. 6144 When such a server returns attributes to the client, it would show 6145 both ACE4_APPEND_DATA and ACE4_WRITE_DATA if and only if the write 6146 permission is enabled. 6148 If a server receives a SETATTR request that it cannot accurately 6149 implement, it should err in the direction of more restricted access, 6150 except in the previously discussed cases of execute and read. For 6151 example, suppose a server cannot distinguish overwriting data from 6152 appending new data, as described in the previous paragraph. If a 6153 client submits an ALLOW ACE where ACE4_APPEND_DATA is set but 6154 ACE4_WRITE_DATA is not (or vice versa), the server should either turn 6155 off ACE4_APPEND_DATA or reject the request with NFS4ERR_ATTRNOTSUPP. 6157 6.2.1.3.2. ACE4_DELETE vs. ACE4_DELETE_CHILD 6159 Two access mask bits govern the ability to delete a directory entry: 6160 ACE4_DELETE on the object itself (the "target"), and 6161 ACE4_DELETE_CHILD on the containing directory (the "parent"). 6163 Many systems also take the "sticky bit" (MODE4_SVTX) on a directory 6164 to allow unlink only to a user that owns either the target or the 6165 parent; on some such systems the decision also depends on whether the 6166 target is writable. 6168 Servers SHOULD allow unlink if either ACE4_DELETE is permitted on the 6169 target, or ACE4_DELETE_CHILD is permitted on the parent. (Note that 6170 this is true even if the parent or target explicitly denies one of 6171 these permissions.) 6173 If the ACLs in question neither explicitly ALLOW nor DENY either of 6174 the above, and if MODE4_SVTX is not set on the parent, then the 6175 server SHOULD allow the removal if and only if ACE4_ADD_FILE is 6176 permitted. In the case where MODE4_SVTX is set, the server may also 6177 require the remover to own either the parent or the target, or may 6178 require the target to be writable. 6180 This allows servers to support something close to traditional unix- 6181 like semantics, with ACE4_ADD_FILE taking the place of the write bit. 6183 6.2.1.4. ACE flag 6185 The bitmask constants used for the flag field are as follows: 6187 const ACE4_FILE_INHERIT_ACE = 0x00000001; 6188 const ACE4_DIRECTORY_INHERIT_ACE = 0x00000002; 6189 const ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004; 6190 const ACE4_INHERIT_ONLY_ACE = 0x00000008; 6191 const ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010; 6192 const ACE4_FAILED_ACCESS_ACE_FLAG = 0x00000020; 6193 const ACE4_IDENTIFIER_GROUP = 0x00000040; 6194 const ACE4_INHERITED_ACE = 0x00000080; 6196 A server need not support any of these flags. If the server supports 6197 flags that are similar to, but not exactly the same as, these flags, 6198 the implementation may define a mapping between the protocol-defined 6199 flags and the implementation-defined flags. 6201 For example, suppose a client tries to set an ACE with 6202 ACE4_FILE_INHERIT_ACE set but not ACE4_DIRECTORY_INHERIT_ACE. If the 6203 server does not support any form of ACL inheritance, the server 6204 should reject the request with NFS4ERR_ATTRNOTSUPP. If the server 6205 supports a single "inherit ACE" flag that applies to both files and 6206 directories, the server may reject the request (i.e., requiring the 6207 client to set both the file and directory inheritance flags). 
The 6208 server may also accept the request and silently turn on the 6209 ACE4_DIRECTORY_INHERIT_ACE flag. 6211 6.2.1.4.1. Discussion of Flag Bits 6213 ACE4_FILE_INHERIT_ACE 6214 Any non-directory file in any sub-directory will get this ACE 6215 inherited. 6217 ACE4_DIRECTORY_INHERIT_ACE 6218 Can be placed on a directory and indicates that this ACE should be 6219 added to each new directory created. 6220 If this flag is set in an ACE in an ACL attribute to be set on a 6221 non-directory file system object, the operation attempting to set 6222 the ACL SHOULD fail with NFS4ERR_ATTRNOTSUPP. 6224 ACE4_INHERIT_ONLY_ACE 6225 Can be placed on a directory but does not apply to the directory; 6226 ALLOW and DENY ACEs with this bit set do not affect access to the 6227 directory, and AUDIT and ALARM ACEs with this bit set do not 6228 trigger log or alarm events. Such ACEs only take effect once they 6229 are applied (with this bit cleared) to newly created files and 6230 directories as specified by the above two flags. 6231 If this flag is present on an ACE, but neither 6232 ACE4_DIRECTORY_INHERIT_ACE nor ACE4_FILE_INHERIT_ACE is present, 6233 then an operation attempting to set such an attribute SHOULD fail 6234 with NFS4ERR_ATTRNOTSUPP. 6236 ACE4_NO_PROPAGATE_INHERIT_ACE 6237 Can be placed on a directory. This flag tells the server that 6238 inheritance of this ACE should stop at newly created child 6239 directories. 6241 ACE4_INHERITED_ACE 6242 Indicates that this ACE is inherited from a parent directory. A 6243 server that supports automatic inheritance will place this flag on 6244 any ACEs inherited from the parent directory when creating a new 6245 object. Client applications will use this to perform automatic 6246 inheritance. Clients and servers MUST clear this bit in the acl 6247 attribute; it may only be used in the dacl and sacl attributes. 6249 ACE4_SUCCESSFUL_ACCESS_ACE_FLAG 6250 ACE4_FAILED_ACCESS_ACE_FLAG 6251 The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and 6252 ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits may be set only on 6253 ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE 6254 (ALARM) ACE types. If during the processing of the file's ACL, 6255 the server encounters an AUDIT or ALARM ACE that matches the 6256 principal attempting the OPEN, the server notes that fact, and the 6257 presence, if any, of the SUCCESS and FAILED flags encountered in 6258 the AUDIT or ALARM ACE. Once the server completes the ACL 6259 processing, it then notes if the operation succeeded or failed. 6260 If the operation succeeded, and if the SUCCESS flag was set for a 6261 matching AUDIT or ALARM ACE, then the appropriate AUDIT or ALARM 6262 event occurs. If the operation failed, and if the FAILED flag was 6263 set for the matching AUDIT or ALARM ACE, then the appropriate 6264 AUDIT or ALARM event occurs. Either or both of the SUCCESS or 6265 FAILED can be set, but if neither is set, the AUDIT or ALARM ACE 6266 is not useful. 6268 The previously described processing applies to ACCESS operations 6269 even when they return NFS4_OK. For the purposes of AUDIT and 6270 ALARM, we consider an ACCESS operation to be a "failure" if it 6271 fails to return a bit that was requested and supported. 6273 ACE4_IDENTIFIER_GROUP 6274 Indicates that the "who" refers to a GROUP as defined under UNIX 6275 or a GROUP ACCOUNT as defined under Windows. 
Clients and servers 6276 MUST ignore the ACE4_IDENTIFIER_GROUP flag on ACEs with a who 6277 value equal to one of the special identifiers outlined in 6278 Section 6.2.1.5. 6280 6.2.1.5. ACE Who 6282 The "who" field of an ACE is an identifier that specifies the 6283 principal or principals to whom the ACE applies. It may refer to a 6284 user or a group, with the flag bit ACE4_IDENTIFIER_GROUP specifying 6285 which. 6287 There are several special identifiers which need to be understood 6288 universally, rather than in the context of a particular DNS domain. 6289 Some of these identifiers cannot be understood when an NFS client 6290 accesses the server, but have meaning when a local process accesses 6291 the file. The ability to display and modify these permissions is 6292 permitted over NFS, even if none of the access methods on the server 6293 understands the identifiers. 6295 +---------------+--------------------------------------------------+ 6296 | Who | Description | 6297 +---------------+--------------------------------------------------+ 6298 | OWNER | The owner of the file | 6299 | GROUP | The group associated with the file. | 6300 | EVERYONE | The world, including the owner and owning group. | 6301 | INTERACTIVE | Accessed from an interactive terminal. | 6302 | NETWORK | Accessed via the network. | 6303 | DIALUP | Accessed as a dialup user to the server. | 6304 | BATCH | Accessed from a batch job. | 6305 | ANONYMOUS | Accessed without any authentication. | 6306 | AUTHENTICATED | Any authenticated user (opposite of ANONYMOUS) | 6307 | SERVICE | Access from a system service. | 6308 +---------------+--------------------------------------------------+ 6310 Table 7 6312 To avoid conflict, these special identifiers are distinguished by an 6313 appended "@" and should appear in the form "xxxx@" (with no domain 6314 name after the "@"). For example: ANONYMOUS@. 6316 The ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries with these 6317 special identifiers. When encoding entries with these special 6318 identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to zero. 6320 6.2.1.5.1. Discussion of EVERYONE@ 6322 It is important to note that "EVERYONE@" is not equivalent to the 6323 UNIX "other" entity. This is because, by definition, UNIX "other" 6324 does not include the owner or owning group of a file. "EVERYONE@" 6325 means literally everyone, including the owner or owning group. 6327 6.2.2. Attribute 58: dacl 6329 The dacl attribute is like the acl attribute, but dacl allows just 6330 ALLOW and DENY ACEs. The dacl attribute supports automatic 6331 inheritance (see Section 6.4.3.2). 6333 6.2.3. Attribute 59: sacl 6335 The sacl attribute is like the acl attribute, but sacl allows just 6336 AUDIT and ALARM ACEs. The sacl attribute supports automatic 6337 inheritance (see Section 6.4.3.2). 6339 6.2.4. Attribute 33: mode 6341 The NFSv4.1 mode attribute is based on the UNIX mode bits. 
The 6342 following bits are defined: 6344 const MODE4_SUID = 0x800; /* set user id on execution */ 6345 const MODE4_SGID = 0x400; /* set group id on execution */ 6346 const MODE4_SVTX = 0x200; /* save text even after use */ 6347 const MODE4_RUSR = 0x100; /* read permission: owner */ 6348 const MODE4_WUSR = 0x080; /* write permission: owner */ 6349 const MODE4_XUSR = 0x040; /* execute permission: owner */ 6350 const MODE4_RGRP = 0x020; /* read permission: group */ 6351 const MODE4_WGRP = 0x010; /* write permission: group */ 6352 const MODE4_XGRP = 0x008; /* execute permission: group */ 6353 const MODE4_ROTH = 0x004; /* read permission: other */ 6354 const MODE4_WOTH = 0x002; /* write permission: other */ 6355 const MODE4_XOTH = 0x001; /* execute permission: other */ 6357 Bits MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR apply to the principal 6358 identified in the owner attribute. Bits MODE4_RGRP, MODE4_WGRP, and 6359 MODE4_XGRP apply to principals identified in the owner_group 6360 attribute but who are not identified in the owner attribute. Bits 6361 MODE4_ROTH, MODE4_WOTH, MODE4_XOTH apply to any principal that does 6362 not match that in the owner attribute, and does not have a group 6363 matching that of the owner_group attribute. 6365 Bits within the mode other than those specified above are not defined 6366 by this protocol. A server MUST NOT return bits other than those 6367 defined above in a GETATTR or READDIR operation, and it MUST return 6368 NFS4ERR_INVAL if bits other than those defined above are set in a 6369 SETATTR, CREATE, OPEN, VERIFY or NVERIFY operation. 6371 6.2.5. Attribute 74: mode_set_masked 6373 The mode_set_masked attribute is a write-only attribute that allows 6374 individual bits in the mode attribute to be set or reset, without 6375 changing others. It allows, for example, the bits MODE4_SUID, 6376 MODE4_SGID, and MODE4_SVTX to be modified while leaving unmodified 6377 any of the nine low-order mode bits devoted to permissions. 6379 In such instances that the nine low-order bits are left unmodified, 6380 then neither the acl nor the dacl attribute should be automatically 6381 modified as discussed in Section 6.4.1. 6383 The mode_set_masked attribute consists of two words each in the form 6384 of a mode4. The first consists of the value to be applied to the 6385 current mode value and the second is a mask. Only bits set to one in 6386 the mask word are changed (set or reset) in the file's mode. All 6387 other bits in the mode remain unchanged. Bits in the first word that 6388 correspond to bits which are zero in the mask are ignored, except 6389 that undefined bits are checked for validity and can result in 6390 NFS4ERR_INVAL as described below. 6392 The mode_set_masked attribute is only valid in a SETATTR operation. 6393 If it is used in a CREATE or OPEN operation, the server MUST return 6394 NFS4ERR_INVAL. 6396 Bits not defined as valid in the mode attribute are not valid in 6397 either word of the mode_set_masked attribute. The server MUST return 6398 NFS4ERR_INVAL if any of those are on in a SETATTR. If the mode and 6399 mode_set_masked attributes are both specified in the same SETATTR, 6400 the server MUST also return NFS4ERR_INVAL. 6402 6.3. Common Methods 6404 The requirements in this section will be referred to in future 6405 sections, especially Section 6.4. 6407 6.3.1. Interpreting an ACL 6409 6.3.1.1. Server Considerations 6411 The server uses the algorithm described in Section 6.2.1 to determine 6412 whether an ACL allows access to an object. 
However, the ACL may not 6413 be the sole determiner of access. For example: 6415 o In the case of a file system exported as read-only, the server may 6416 deny write permissions even though an object's ACL grants it. 6418 o Server implementations MAY grant ACE4_WRITE_ACL and ACE4_READ_ACL 6419 permissions to prevent a situation from arising in which there is 6420 no valid way to ever modify the ACL. 6422 o All servers will allow a user the ability to read the data of the 6423 file when only the execute permission is granted (i.e. if the ACL 6424 denies the user the ACE4_READ_DATA access and allows the user 6425 ACE4_EXECUTE, the server will allow the user to read the data of 6426 the file). 6428 o Many servers have the notion of owner-override in which the owner 6429 of the object is allowed to override accesses that are denied by 6430 the ACL. This may be helpful, for example, to allow users 6431 continued access to open files on which the permissions have 6432 changed. 6434 o Many servers have the notion of a "superuser" that has privileges 6435 beyond an ordinary user. The superuser may be able to read or 6436 write data or metadata in ways that would not be permitted by the 6437 ACL. 6439 6.3.1.2. Client Considerations 6441 Clients SHOULD NOT do their own access checks based on their 6442 interpretation of the ACL, but rather use the OPEN and ACCESS operations 6443 to do access checks. This allows the client to act on the results of 6444 having the server determine whether or not access should be granted 6445 based on its interpretation of the ACL. 6447 Clients must be aware of situations in which an object's ACL will 6448 define a certain access even though the server will not enforce it. 6449 In general, but especially in these situations, the client needs to 6450 do its part in the enforcement of access as defined by the ACL. To 6451 do this, the client MAY send the appropriate ACCESS operation prior 6452 to servicing the request of the user or application in order to 6453 determine whether the user or application should be granted the 6454 access requested. For examples in which the ACL may define accesses 6455 that the server doesn't enforce, see Section 6.3.1.1. 6457 6.3.2. Computing a Mode Attribute from an ACL 6459 The following method can be used to calculate the MODE4_R*, MODE4_W* 6460 and MODE4_X* bits of a mode attribute, based upon an ACL. 6462 First, for each of the special identifiers OWNER@, GROUP@, and 6463 EVERYONE@, evaluate the ACL in order, considering only ALLOW and DENY 6464 ACEs for the identifier EVERYONE@ and for the identifier under 6465 consideration. The result of the evaluation will be an NFSv4 ACL 6466 mask showing exactly which bits are permitted to that identifier. 6468 Then translate the calculated mask for OWNER@, GROUP@, and EVERYONE@ 6469 into mode bits for, respectively, the user, group, and other, as 6470 follows: 6472 1. Set the read bit (MODE4_RUSR, MODE4_RGRP, or MODE4_ROTH) if and 6473 only if ACE4_READ_DATA is set in the corresponding mask. 6475 2. Set the write bit (MODE4_WUSR, MODE4_WGRP, or MODE4_WOTH) if and 6476 only if ACE4_WRITE_DATA and ACE4_APPEND_DATA are both set in the 6477 corresponding mask. 6479 3. Set the execute bit (MODE4_XUSR, MODE4_XGRP, or MODE4_XOTH), if 6480 and only if ACE4_EXECUTE is set in the corresponding mask. 6482 6.3.2.1. Discussion 6484 Some server implementations also add bits permitted to named users 6485 and groups to the group bits (MODE4_RGRP, MODE4_WGRP, and 6486 MODE4_XGRP).
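
   The method of Section 6.3.2 can be illustrated by the following
   non-normative C sketch.  The ACE representation is simplified, the
   who-matching ignores the ACE4_IDENTIFIER_GROUP flag, inherit-only
   ACEs, and named users and groups, and nothing in the sketch is part
   of the protocol.

      #include <stdint.h>
      #include <string.h>

      #define ACE4_ACCESS_ALLOWED_ACE_TYPE 0x00000000
      #define ACE4_ACCESS_DENIED_ACE_TYPE  0x00000001
      #define ACE4_READ_DATA               0x00000001
      #define ACE4_WRITE_DATA              0x00000002
      #define ACE4_APPEND_DATA             0x00000004
      #define ACE4_EXECUTE                 0x00000020

      typedef struct {               /* simplified stand-in for nfsace4 */
          uint32_t    type;
          uint32_t    access_mask;
          const char *who;           /* "OWNER@", "GROUP@", "EVERYONE@" */
      } simple_ace;

      /* Bits permitted to "who": walk the ACL in order; the first ALLOW
       * or DENY ACE naming either "who" or EVERYONE@ decides each bit. */
      static uint32_t permitted_mask(const simple_ace *aces, int n,
                                     const char *who)
      {
          uint32_t allowed = 0, decided = 0;

          for (int i = 0; i < n; i++) {
              uint32_t fresh;

              if (strcmp(aces[i].who, who) != 0 &&
                  strcmp(aces[i].who, "EVERYONE@") != 0)
                  continue;              /* ACE not under consideration */
              fresh = aces[i].access_mask & ~decided;
              if (aces[i].type == ACE4_ACCESS_ALLOWED_ACE_TYPE)
                  allowed |= fresh;
              else if (aces[i].type != ACE4_ACCESS_DENIED_ACE_TYPE)
                  continue;              /* AUDIT/ALARM: no effect here */
              decided |= fresh;          /* DENY leaves bits cleared    */
          }
          return allowed;
      }

      /* Map one permitted mask to an rwx triple (octal 0..7). */
      static uint32_t mask_to_rwx(uint32_t m)
      {
          uint32_t rwx = 0;

          if (m & ACE4_READ_DATA)
              rwx |= 04;                                   /* MODE4_R* */
          if ((m & (ACE4_WRITE_DATA | ACE4_APPEND_DATA)) ==
                   (ACE4_WRITE_DATA | ACE4_APPEND_DATA))
              rwx |= 02;                                   /* MODE4_W* */
          if (m & ACE4_EXECUTE)
              rwx |= 01;                                   /* MODE4_X* */
          return rwx;
      }

      /* Low-order nine mode bits: owner, group, other. */
      static uint32_t mode_from_acl(const simple_ace *aces, int n)
      {
          return (mask_to_rwx(permitted_mask(aces, n, "OWNER@"))  << 6) |
                 (mask_to_rwx(permitted_mask(aces, n, "GROUP@"))  << 3) |
                  mask_to_rwx(permitted_mask(aces, n, "EVERYONE@"));
      }
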
6488 Implementations are discouraged from adding bits permitted to named 6489 users and groups to the group bits in this way, because it has been 6490 found to cause confusion for users who see members of a file's group 6491 denied access that the mode bits appear to allow. (The presence of 6492 DENY ACEs may also lead to such behavior, but DENY ACEs are expected 6493 to be more rarely used.) 6494 The same user confusion seen when fetching the mode also results if 6495 setting the mode does not effectively control permissions for the 6496 owner, group, and other users; this motivates some of the 6497 requirements that follow. 6499 6.4. Requirements 6501 A server that supports both mode and ACL must take care to 6502 synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the 6503 ACEs which have respective who fields of "OWNER@", "GROUP@", and 6504 "EVERYONE@" so that the client can see semantically equivalent access 6505 permissions exist whether the client asks for owner, owner_group, and 6506 mode attributes, or for just the ACL. 6508 In this section, much is made of the methods in Section 6.3.2. Many 6509 of the requirements below refer to that section. But note that the 6510 methods have behaviors specified with "SHOULD". This is intentional, 6511 to avoid invalidating existing implementations that compute the mode 6512 according to the withdrawn POSIX ACL draft (1003.1e draft 17), rather 6513 than by actual permissions on owner, group, and other. 6515 6.4.1. Setting the mode and/or ACL Attributes 6517 In the case where a server supports the sacl or dacl attribute, in 6518 addition to the acl attribute, the server MUST fail a request to set 6519 the acl attribute simultaneously with a dacl or sacl attribute. The 6520 error to be given is NFS4ERR_ATTRNOTSUPP. 6522 6.4.1.1. Setting mode and not ACL 6524 When any of the nine low-order mode bits are subject to change, 6525 either because the mode attribute was set or because the 6526 mode_set_masked attribute was set and the mask included one or more 6527 bits from the nine low-order mode bits, and no ACL attribute is 6528 explicitly set, the acl and dacl attributes must be modified in 6529 accordance with the updated value of those bits. This must happen 6530 even if the value of the low-order bits is the same after the mode is 6531 set as before. 6533 Note that any AUDIT or ALARM ACEs (hence any ACEs in the sacl 6534 attribute) are unaffected by changes to the mode. 6536 In cases in which the permission bits are subject to change, the acl 6537 and dacl attributes MUST be modified such that the mode computed via 6538 the method in Section 6.3.2 yields the low-order nine bits (MODE4_R*, 6539 MODE4_W*, MODE4_X*) of the mode attribute as modified by the 6540 attribute change. The ACL attributes SHOULD also be modified such 6541 that: 6543 1. If MODE4_RGRP is not set, entities explicitly listed in the ACL 6544 other than OWNER@ and EVERYONE@ SHOULD NOT be granted 6545 ACE4_READ_DATA. 6547 2. If MODE4_WGRP is not set, entities explicitly listed in the ACL 6548 other than OWNER@ and EVERYONE@ SHOULD NOT be granted 6549 ACE4_WRITE_DATA or ACE4_APPEND_DATA. 6551 3. If MODE4_XGRP is not set, entities explicitly listed in the ACL 6552 other than OWNER@ and EVERYONE@ SHOULD NOT be granted 6553 ACE4_EXECUTE. 6555 Access mask bits other than those listed above, appearing in ALLOW 6556 ACEs, MAY also be disabled. 6558 Note that ACEs with the flag ACE4_INHERIT_ONLY_ACE set do not affect 6559 the permissions of the ACL itself, nor do ACEs of the type AUDIT and 6560 ALARM.
As such, it is desirable to leave these ACEs unmodified when 6561 modifying the ACL attributes. 6563 Also note that the requirement may be met by discarding the acl and 6564 dacl, in favor of an ACL that represents the mode and only the mode. 6565 This is permitted, but it is preferable for a server to preserve as 6566 much of the ACL as possible without violating the above requirements. 6567 Discarding the ACL makes it effectively impossible for a file created 6568 with a mode attribute to inherit an ACL (see Section 6.4.3). 6570 6.4.1.2. Setting ACL and not mode 6572 When setting the acl or dacl and not setting the mode or 6573 mode_set_masked attributes, the permission bits of the mode need to 6574 be derived from the ACL. In this case, the ACL attribute SHOULD be 6575 set as given. The nine low-order bits of the mode attribute 6576 (MODE4_R*, MODE4_W*, MODE4_X*) MUST be modified to match the result 6577 of the method Section 6.3.2. The three high-order bits of the mode 6578 (MODE4_SUID, MODE4_SGID, MODE4_SVTX) SHOULD remain unchanged. 6580 6.4.1.3. Setting both ACL and mode 6582 When setting both the mode (includes use of either the mode attribute 6583 or the mode_set_masked attribute) and the acl or dacl attributes in 6584 the same operation, the attributes MUST be applied in this order: 6585 mode (or mode_set_masked), then ACL. The mode-related attribute is 6586 set as given, then the ACL attribute is set as given, possibly 6587 changing the final mode, as described above in Section 6.4.1.2. 6589 6.4.2. Retrieving the mode and/or ACL Attributes 6591 This section applies only to servers that support both the mode and 6592 ACL attributes. 6594 Some server implementations may have a concept of "objects without 6595 ACLs", meaning that all permissions are granted and denied according 6596 to the mode attribute, and that no ACL attribute is stored for that 6597 object. If an ACL attribute is requested of such a server, the 6598 server SHOULD return an ACL that does not conflict with the mode; 6599 that is to say, the ACL returned SHOULD represent the nine low-order 6600 bits of the mode attribute (MODE4_R*, MODE4_W*, MODE4_X*) as 6601 described in Section 6.3.2. 6603 For other server implementations, the ACL attribute is always present 6604 for every object. Such servers SHOULD store at least the three high- 6605 order bits of the mode attribute (MODE4_SUID, MODE4_SGID, 6606 MODE4_SVTX). The server SHOULD return a mode attribute if one is 6607 requested, and the low-order nine bits of the mode (MODE4_R*, 6608 MODE4_W*, MODE4_X*) MUST match the result of applying the method in 6609 Section 6.3.2 to the ACL attribute. 6611 6.4.3. Creating New Objects 6613 If a server supports any ACL attributes, it may use the ACL 6614 attributes on the parent directory to compute an initial ACL 6615 attribute for a newly created object. This will be referred to as 6616 the inherited ACL within this section. The act of adding one or more 6617 ACEs to the inherited ACL that are based upon ACEs in the parent 6618 directory's ACL will be referred to as inheriting an ACE within this 6619 section. 6621 Implementors should standardize on what the behavior of CREATE and 6622 OPEN must be depending on the presence or absence of the mode and ACL 6623 attributes. 6625 1. If just the mode is given in the call: 6627 In this case, inheritance SHOULD take place, but the mode MUST be 6628 applied to the inherited ACL as described in Section 6.4.1.1, 6629 thereby modifying the ACL. 6631 2. 
If just the ACL is given in the call: 6633 In this case, inheritance SHOULD NOT take place, and the ACL as 6634 defined in the CREATE or OPEN will be set without modification, 6635 and the mode modified as in Section 6.4.1.2. 6637 3. If both mode and ACL are given in the call: 6639 In this case, inheritance SHOULD NOT take place, and both 6640 attributes will be set as described in Section 6.4.1.3. 6642 4. If neither mode nor ACL is given in the call: 6644 In the case where an object is being created without any initial 6645 attributes at all, e.g. an OPEN operation with an opentype4 of 6646 OPEN4_CREATE and a createmode4 of EXCLUSIVE4, inheritance SHOULD 6647 NOT take place (note that EXCLUSIVE4_1 is a better choice of 6648 createmode4, since it does permit initial attributes). Instead, 6649 the server SHOULD set permissions to deny all access to the newly 6650 created object. It is expected that the appropriate client will 6651 set the desired attributes in a subsequent SETATTR operation, and 6652 the server SHOULD allow that operation to succeed, regardless of 6653 what permissions the object is created with. For example, an 6654 empty ACL denies all permissions, but the server should allow the 6655 owner's SETATTR to succeed even though WRITE_ACL is implicitly 6656 denied. 6658 In other cases, inheritance SHOULD take place, and no 6659 modifications to the ACL will happen. The mode attribute, if 6660 supported, MUST be as computed in Section 6.3.2, with the 6661 MODE4_SUID, MODE4_SGID and MODE4_SVTX bits clear. If no 6662 inheritable ACEs exist on the parent directory, the rules for 6663 creating acl, dacl or sacl attributes are implementation-defined. 6664 If either the dacl or sacl attribute is supported, then the 6665 ACL4_DEFAULTED flag SHOULD be set on the newly created 6666 attributes. 6668 6.4.3.1. The Inherited ACL 6670 If the object being created is not a directory, the inherited ACL 6671 SHOULD NOT inherit ACEs from the parent directory ACL unless the 6672 ACE4_FILE_INHERIT_ACE flag is set. 6674 If the object being created is a directory, the inherited ACL should 6675 inherit all inheritable ACEs from the parent directory, i.e., those 6676 that have the ACE4_FILE_INHERIT_ACE or ACE4_DIRECTORY_INHERIT_ACE flag set. 6677 If the inheritable ACE has ACE4_FILE_INHERIT_ACE set, but 6678 ACE4_DIRECTORY_INHERIT_ACE is clear, the inherited ACE on the newly 6679 created directory MUST have the ACE4_INHERIT_ONLY_ACE flag set to 6680 prevent the directory from being affected by ACEs meant for 6681 non-directories. 6683 When a new directory is created, the server MAY split any inherited 6684 ACE which is both inheritable and effective (in other words, which 6685 has neither ACE4_INHERIT_ONLY_ACE nor ACE4_NO_PROPAGATE_INHERIT_ACE 6686 set), into two ACEs, one with no inheritance flags, and one with 6687 ACE4_INHERIT_ONLY_ACE set. (In the case of a dacl or sacl attribute, 6688 both of those ACEs SHOULD also have the ACE4_INHERITED_ACE flag set.) 6689 This makes it simpler to modify the effective permissions on the 6690 directory without modifying the ACE which is to be inherited by the 6691 new directory's children. 6693 6.4.3.2. Automatic Inheritance 6695 The acl attribute consists only of an array of ACEs, but the sacl 6696 (Section 6.2.3) and dacl (Section 6.2.2) attributes also include an 6697 additional flag field.
6699 struct nfsacl41 { 6700 aclflag4 na41_flag; 6701 nfsace4 na41_aces<>; 6702 }; 6704 The flag field applies to the entire sacl or dacl; three flag values 6705 are defined: 6707 const ACL4_AUTO_INHERIT = 0x00000001; 6708 const ACL4_PROTECTED = 0x00000002; 6709 const ACL4_DEFAULTED = 0x00000004; 6711 and all other bits must be cleared. The ACE4_INHERITED_ACE flag may 6712 be set in the ACEs of the sacl or dacl (whereas it must always be 6713 cleared in the acl). 6715 Together these features allow a server to support automatic 6716 inheritance, which we now explain in more detail. 6718 Inheritable ACEs are normally inherited by child objects only at the 6719 time that the child objects are created; later modifications to 6720 inheritable ACEs do not result in modifications to inherited ACEs on 6721 descendents. 6723 However, the dacl and sacl provide an OPTIONAL mechanism which allows 6724 a client application to propagate changes to inheritable ACEs to an 6725 entire directory hierarchy. 6727 A server that supports this performs inheritance at object creation 6728 time in the normal way, and SHOULD set the ACE4_INHERITED_ACE flag on 6729 any inherited ACEs as they are added to the new object. 6731 A client application such as an ACL editor may then propagate changes 6732 to inheritable ACEs on a directory by recursively traversing that 6733 directory's descendants and modifying each ACL encountered to remove 6734 any ACEs with the ACE4_INHERITED_ACE flag and to replace them by the 6735 new inheritable ACEs (also with the ACE4_INHERITED_ACE flag set). It 6736 uses the existing ACE inheritance flags in the obvious way to decide 6737 which ACEs to propagate. (Note that it may encounter further 6738 inheritable ACEs when descending the directory hierarchy, and that 6739 those will also need to be taken into account when propagating 6740 inheritable ACEs to further descendants.) 6742 The reach of this propagation may be limited in two ways: first, 6743 automatic inheritance is not performed from any directory ACL that 6744 has the ACL4_AUTO_INHERIT flag cleared; and second, automatic 6745 inheritance stops wherever an ACL with the ACL4_PROTECTED flag is 6746 set, preventing modification of that ACL and also (if the ACL is set 6747 on a directory) of the ACL on any of the object's descendants. 6749 This propagation is performed independently for the sacl and the dacl 6750 attributes; thus the ACL4_AUTO_INHERIT and ACL4_PROTECTED flags may 6751 be independently set for the sacl and the dacl, and propagation of 6752 one type of acl may continue down a hierarchy even where propagation 6753 of the other acl has stopped. 6755 New objects should be created with a dacl and a sacl that both have 6756 the ACL4_PROTECTED flag cleared and the ACL4_AUTO_INHERIT flag set to 6757 the same value as that on, respectively, the sacl or dacl of the 6758 parent object. 6760 Both the dacl and sacl attributes are RECOMMENDED, and a server may 6761 support one without supporting the other. 6763 A server that supports both the old acl attribute and one or both of 6764 the new dacl or sacl attributes must do so in such a way as to keep 6765 all three attributes consistent with each other. Thus the ACEs 6766 reported in the acl attribute should be the union of the ACEs 6767 reported in the dacl and sacl attributes, except that the 6768 ACE4_INHERITED_ACE flag must be cleared from the ACEs in the acl. 
6769 And of course a client that queries only the acl will be unable to 6770 determine the values of the sacl or dacl flag fields. 6772 When a client performs a SETATTR for the acl attribute, the server 6773 SHOULD set the ACL4_PROTECTED flag to true on both the sacl and the 6774 dacl. By using the acl attribute, as opposed to the dacl or sacl 6775 attributes, the client signals that it may not understand automatic 6776 inheritance, and thus cannot be trusted to set an ACL for which 6777 automatic inheritance would make sense. 6779 When a client application queries an ACL, modifies it, and sets it 6780 again, it should leave any ACEs marked with ACE4_INHERITED_ACE 6781 unchanged, in their original order, at the end of the ACL. If the 6782 application is unable to do this, it should set the ACL4_PROTECTED 6783 flag. This behavior is not enforced by servers, but violations of 6784 this rule may lead to unexpected results when applications perform 6785 automatic inheritance. 6787 If a server also supports the mode attribute, it SHOULD set the mode 6788 in such a way that leaves inherited ACEs unchanged, in their original 6789 order, at the end of the ACL. If it is unable to do so, it SHOULD 6790 set the ACL4_PROTECTED flag on the file's dacl. 6792 Finally, in the case where the request that creates a new file or 6793 directory does not also set permissions for that file or directory, 6794 and there are also no ACEs to inherit from the parent's directory, 6795 then the server's choice of ACL for the new object is implementation- 6796 dependent. In this case, the server SHOULD set the ACL4_DEFAULTED 6797 flag on the ACL it chooses for the new object. An application 6798 performing automatic inheritance takes the ACL4_DEFAULTED flag as a 6799 sign that the ACL should be completely replaced by one generated 6800 using the automatic inheritance rules. 6802 7. Single-server Namespace 6804 This chapter describes the NFSv4 single-server namespace. Single- 6805 server namespaces may be presented directly to clients, or they may 6806 be used as a basis to form larger multi-server namespaces (e.g. site- 6807 wide or organization-wide) to be presented to clients, as described 6808 in Section 11. 6810 7.1. Server Exports 6812 On a UNIX server, the namespace describes all the files reachable by 6813 pathnames under the root directory or "/". On a Windows server the 6814 namespace constitutes all the files on disks named by mapped disk 6815 letters. NFS server administrators rarely make the entire server's 6816 file system namespace available to NFS clients. More often portions 6817 of the namespace are made available via an "export" feature. In 6818 previous versions of the NFS protocol, the root filehandle for each 6819 export is obtained through the MOUNT protocol; the client sent a 6820 string that identified the export name within the namespace and the 6821 server returned the root filehandle for that export. The MOUNT 6822 protocol also provided an EXPORTS procedure that enumerated server's 6823 exports. 6825 7.2. Browsing Exports 6827 The NFSv4.1 protocol provides a root filehandle that clients can use 6828 to obtain filehandles for the exports of a particular server, via a 6829 series of LOOKUP operations within a COMPOUND, to traverse a path. A 6830 common user experience is to use a graphical user interface (perhaps 6831 a file "Open" dialog window) to find a file via progressive browsing 6832 through a directory tree. 
The client must be able to move from one 6833 export to another export via single-component, progressive LOOKUP 6834 operations. 6836 This style of browsing is not well supported by the NFSv3 protocol. 6837 In NFSv3, the client expects all LOOKUP operations to remain within a 6838 single server file system. For example, the device attribute will 6839 not change. This prevents a client from taking namespace paths that 6840 span exports. 6842 In the case of NFSv3, an automounter on the client can obtain a 6843 snapshot of the server's namespace using the EXPORTS procedure of the 6844 MOUNT protocol. If it understands the server's pathname syntax, it 6845 can create an image of the server's namespace on the client. The 6846 parts of the namespace that are not exported by the server are filled 6847 in with directories that might be constructed similarly to an NFSv4.1 6848 "pseudo file system" (see Section 7.3) that allows the user to browse 6849 from one mounted file system to another. There is a drawback to this 6850 representation of the server's namespace on the client: it is static. 6851 If the server administrator adds a new export the client will be 6852 unaware of it. 6854 7.3. Server Pseudo File System 6856 NFSv4.1 servers avoid this namespace inconsistency by presenting all 6857 the exports for a given server within the framework of a single 6858 namespace, for that server. An NFSv4.1 client uses LOOKUP and 6859 READDIR operations to browse seamlessly from one export to another. 6861 Where there are portions of the server namespace that are not 6862 exported, clients require some way of traversing those portions to 6863 reach actual exported file systems. A technique that servers may use 6864 to provide for this is to bridge unexported portion of the namespace 6865 via a "pseudo file system" that provides a view of exported 6866 directories only. A pseudo file system has a unique fsid and behaves 6867 like a normal, read-only file system. 6869 Based on the construction of the server's namespace, it is possible 6870 that multiple pseudo file systems may exist. For example, 6872 /a pseudo file system 6873 /a/b real file system 6874 /a/b/c pseudo file system 6875 /a/b/c/d real file system 6877 Each of the pseudo file systems is considered a separate entity and 6878 therefore MUST have its own fsid, unique among all the fsids for that 6879 server. 6881 7.4. Multiple Roots 6883 Certain operating environments are sometimes described as having 6884 "multiple roots". In such environments individual file systems are 6885 commonly represented by disk or volume names. NFSv4 servers for 6886 these platforms can construct a pseudo file system above these root 6887 names so that disk letters or volume names are simply directory names 6888 in the pseudo root. 6890 7.5. Filehandle Volatility 6892 The nature of the server's pseudo file system is that it is a logical 6893 representation of file system(s) available from the server. 6894 Therefore, the pseudo file system is most likely constructed 6895 dynamically when the server is first instantiated. It is expected 6896 that the pseudo file system may not have an on disk counterpart from 6897 which persistent filehandles could be constructed. Even though it is 6898 preferable that the server provide persistent filehandles for the 6899 pseudo file system, the NFS client should expect that pseudo file 6900 system filehandles are volatile. 
This can be confirmed by checking 6901 the associated "fh_expire_type" attribute for those filehandles in 6902 question. If the filehandles are volatile, the NFS client must be 6903 prepared to recover a filehandle value (e.g. with a series of LOOKUP 6904 operations) when receiving an error of NFS4ERR_FHEXPIRED. 6906 Because it is quite likely that servers will implement pseudo file 6907 systems using volatile filehandles, clients need to be prepared for 6908 them, rather than assuming that all filehandles will be persistent. 6910 7.6. Exported Root 6912 If the server's root file system is exported, one might conclude that 6913 a pseudo file system is unneeded. This is not necessarily so. Assume 6914 the following file systems on a server: 6916 / fs1 (exported) 6917 /a fs2 (not exported) 6918 /a/b fs3 (exported) 6920 Because fs2 is not exported, fs3 cannot be reached with simple 6921 LOOKUPs. The server must bridge the gap with a pseudo file system. 6923 7.7. Mount Point Crossing 6925 The server file system environment may be constructed in such a way 6926 that one file system contains a directory which is 'covered' or 6927 mounted upon by a second file system. For example: 6929 /a/b (file system 1) 6930 /a/b/c/d (file system 2) 6932 The pseudo file system for this server may be constructed to look 6933 like: 6935 / (place holder/not exported) 6936 /a/b (file system 1) 6937 /a/b/c/d (file system 2) 6939 It is the server's responsibility to present a pseudo file system 6940 that is complete to the client. If the client sends a lookup request 6941 for the path "/a/b/c/d", the server's response is the filehandle of 6942 the root of the file system "/a/b/c/d". In previous versions of the 6943 NFS protocol, the server would respond with the filehandle of 6944 directory "/a/b/c/d" within the file system "/a/b". 6946 The NFS client will be able to determine if it crosses a server mount 6947 point by a change in the value of the "fsid" attribute. 6949 7.8. Security Policy and Namespace Presentation 6951 Because NFSv4 clients possess the ability to change the security 6952 mechanisms used, after determining what is allowed, by using SECINFO 6953 and SECINFO_NONAME, the server SHOULD NOT present a different view of 6954 the namespace based on the security mechanism being used by a client. 6955 Instead, it should present a consistent view and return 6956 NFS4ERR_WRONGSEC if an attempt is made to access data with an 6957 inappropriate security mechanism. 6959 If security considerations make it necessary to hide the existence of 6960 a particular file system, as opposed to all of the data within it, 6961 the server can apply the security policy of a shared resource in the 6962 server's namespace to components of the resource's ancestors. For 6963 example: 6965 / (place holder/not exported) 6966 /a/b (file system 1) 6967 /a/b/MySecretProject (file system 2) 6969 The /a/b/MySecretProject directory is a real file system and is the 6970 shared resource. Suppose the security policy for 6971 /a/b/MySecretProject is Kerberos with integrity, and it is desired to 6972 limit knowledge of the existence of this file system. In this case, the 6973 server should apply the same security policy to /a/b. This allows 6974 for knowledge of the existence of a file system to be secured when 6975 desirable.
6977 For the case of the use of multiple, disjoint security mechanisms in 6978 the server's resources, applying that sort of policy would result in 6979 the higher-level file system not being accessible using any security 6980 flavor, which would make that higher-level file system 6981 inaccessible. Therefore, that sort of configuration is not 6982 compatible with hiding the existence (as opposed to the contents) of 6983 such a file system from clients using multiple disjoint sets of 6984 security flavors. 6985 In other circumstances, a desirable policy is for the security of a 6986 particular object in the server's namespace to include the union 6987 of all security mechanisms of all direct descendants. A common and 6988 convenient practice, unless strong security requirements dictate 6989 otherwise, is to make all of the pseudo file system accessible by all 6990 of the valid security mechanisms. 6992 Where there is concern about the security of data on the network, 6993 clients should use strong security mechanisms to access the pseudo 6994 file system in order to prevent man-in-the-middle attacks. 6996 8. State Management 6998 Integrating locking into the NFS protocol necessarily causes it to be 6999 stateful. With the inclusion of such features as share reservations, 7000 file and directory delegations, recallable layouts, and support for 7001 mandatory byte-range locking, the protocol becomes substantially more 7002 dependent on proper management of state than the traditional 7003 combination of NFS and NLM [36]. These features include expanded 7004 locking facilities, which provide some measure of interclient 7005 exclusion, but the state also offers features not readily providable 7006 using a stateless model. There are three components to making this 7007 state manageable: 7009 o Clear division between client and server 7010 o Ability to reliably detect inconsistency in state between client 7011 and server 7013 o Simple and robust recovery mechanisms 7015 In this model, the server owns the state information. The client 7016 requests changes in locks and the server responds with the changes 7017 made. Non-client-initiated changes in locking state are infrequent. 7018 The client receives prompt notification of such changes and can 7019 adjust its view of the locking state to reflect the server's changes. 7021 Individual pieces of state created by the server and passed to the 7022 client at its request are represented by 128-bit stateids. These 7023 stateids may represent a particular open file, a set of byte-range 7024 locks held by a particular owner, or a recallable delegation of 7025 privileges to access a file in particular ways, or at a particular 7026 location. 7028 In all cases, there is a transition from the most general information, 7029 which represents a client as a whole, to the eventual lightweight 7030 stateid used for most client and server locking interactions. The 7031 details of this transition will vary with the type of object but it 7032 always starts with a client ID. 7034 8.1. Client and Session ID 7036 A client must establish a client ID (see Section 2.4) and then one or 7037 more session IDs (see Section 2.10) before performing any operations 7038 to open, lock, delegate, or obtain a layout for a file object. Each 7039 session ID is associated with a specific client ID, and thus serves 7040 as a shorthand reference to an NFSv4.1 client.
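
   As a non-normative illustration of this relationship, a server-side
   data model might be sketched as follows.  The structure and field
   names are assumptions of the sketch; clientid4, sessionid4,
   EXCHANGE_ID, and CREATE_SESSION are the protocol's own types and
   operations.

      #include <stdint.h>

      #define NFS4_SESSIONID_SIZE 16

      struct nfs41_session;                  /* forward declaration     */

      struct nfs41_client {
          uint64_t              clientid;    /* clientid4, established  */
                                             /* via EXCHANGE_ID         */
          struct nfs41_session *sessions;    /* one or more sessions    */
          /* the lease, state owners, and stateids are likewise
           * associated with the client ID */
      };

      struct nfs41_session {
          unsigned char         sessionid[NFS4_SESSIONID_SIZE];
                                             /* sessionid4, created via */
                                             /* CREATE_SESSION          */
          struct nfs41_client  *client;      /* the owning client ID    */
          struct nfs41_session *next;        /* other sessions of the   */
                                             /* same client ID          */
      };

   Because each session record points back to exactly one client
   record, a session ID carried in a request suffices to identify the
   NFSv4.1 client, which is what makes it usable as the shorthand
   reference described above.
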
7042 For some types of locking interactions, the client will represent 7043 some number of internal locking entities called "owners", which 7044 normally correspond to processes internal to the client. For other 7045 types of locking-related objects, such as delegations and layouts, no 7046 such intermediate entities are provided for, and the locking-related 7047 objects are considered to be transferred directly between the server 7048 and a unitary client. 7050 8.2. Stateid Definition 7052 When the server grants a lock of any type (including opens, byte- 7053 range locks, delegations, and layouts) it responds with a unique 7054 stateid, that represents a set of locks (often a single lock) for the 7055 same file, of the same type, and sharing the same ownership 7056 characteristics. Thus opens of the same file by different open- 7057 owners each have an identifying stateid. Similarly, each set of 7058 byte-range locks on a file owned by a specific lock-owner has its own 7059 identifying stateid. Delegations and layouts also have associated 7060 stateids by which they may be referenced. The stateid is used as a 7061 shorthand reference to a lock or set of locks and given a stateid the 7062 server can determine the associated state-owner or state-owners (in 7063 the case of an open-owner/lock-owner pair) and the associated 7064 filehandle. When stateids are used, the current filehandle must be 7065 the one associated with that stateid. 7067 All stateids associated with a given client ID are associated with a 7068 common lease which represents the claim of those stateids and the 7069 objects they represent to be maintained by the server. See 7070 Section 8.3 for a discussion of leases. 7072 The server may assign stateids independently for different clients. 7073 A stateid with the same bit pattern for one client may designate an 7074 entirely different set of locks for a different client. The stateid 7075 is always interpreted with respect to the client ID associated with 7076 the current session. Stateids apply to all sessions associated with 7077 the given client ID and the client may use a stateid obtained from 7078 one session on another session associated with the same client ID. 7080 8.2.1. Stateid Types 7082 With the exception of special stateids (see Section 8.2.3), each 7083 stateid represents locking objects of one of a set of types defined 7084 by the NFSv4.1 protocol. Note that in all these cases, where we 7085 speak of guarantee, it is understood there are situations such as a 7086 client restart, or lock revocation, that allow the guarantee to be 7087 voided. 7089 o Stateids may represent opens of files. 7091 Each stateid in this case represents the open state for a given 7092 client ID/open-owner/filehandle triple. Such stateids are subject 7093 to change (with consequent incrementing of the stateid's seqid) in 7094 response to OPENs that result in upgrade and OPEN_DOWNGRADE 7095 operations. 7097 o Stateids may represent sets of byte-range locks. 7099 All locks held on a particular file by a particular owner and all 7100 gotten under the aegis of a particular open file are associated 7101 with a single stateid with the seqid being incremented whenever 7102 LOCK and LOCKU operations affect that set of locks. 7104 o Stateids may represent file delegations, which are recallable 7105 guarantees by the server to the client, that other clients will 7106 not reference, or will not modify a particular file, until the 7107 delegation is returned. 
In NFSv4.1, file delegations may be 7108 obtained on both regular and non-regular files. 7110 A stateid represents a single delegation held by a client for a 7111 particular filehandle. 7113 o Stateids may represent directory delegations, which are recallable 7114 guarantees by the server to the client, that other clients will 7115 not modify the directory, until the delegation is returned. 7117 A stateid represents a single delegation held by a client for a 7118 particular directory filehandle. 7120 o Stateids may represent layouts, which are recallable guarantees by 7121 the server to the client, that particular files may be accessed 7122 via an alternate data access protocol at specific locations. Such 7123 access is limited to particular sets of byte ranges and may 7124 proceed until those byte ranges are reduced or the layout is 7125 returned. 7127 A stateid represents the set of all layouts held by a particular 7128 client for a particular filehandle with a given layout type. The 7129 seqid is updated as the layouts of that set changes with layout 7130 stateid changing operations such as LAYOUTGET and LAYOUTRETURN. 7132 8.2.2. Stateid Structure 7134 Stateids are divided into two fields, a 96-bit "other" field 7135 identifying the specific set of locks and a 32-bit "seqid" sequence 7136 value. Except in the case of special stateids (see Section 8.2.3), a 7137 particular value of the "other" field denotes a set of locks of the 7138 same type (for example byte-range locks, opens, delegations, or 7139 layouts), for a specific file or directory, and sharing the same 7140 ownership characteristics. The seqid designates a specific instance 7141 of such a set of locks, and is incremented to indicate changes in 7142 such a set of locks, either by the addition or deletion of locks from 7143 the set, a change in the byte-range they apply to, or an upgrade or 7144 downgrade in the type of one or more locks. 7146 When such a set of locks is first created the server returns a 7147 stateid with seqid value of one. On subsequent operations which 7148 modify the set of locks the server is required to increment the seqid 7149 field by one (1) whenever it returns a stateid for the same state- 7150 owner/file/type combination and there is some change in the set of 7151 locks actually designated. In this case the server will return a 7152 stateid with an other field the same as previously used for that 7153 state-owner/file/type combination, with an incremented seqid field. 7155 This pattern continues until the seqid is incremented past 7156 NFS4_UINT32_MAX, and one (not zero) is the next seqid value. 7158 The purpose of the incrementing of the seqid is to allow the server 7159 to communicate to the client the order in which operations that 7160 modified locking state associated with a stateid have been processed 7161 and to make it possible for the client to send requests that are 7162 conditional on the set of locks not having changed since the stateid 7163 in question was returned. 7165 Except for layout stateids (Section 12.5.3) when a client sends a 7166 stateid to the server, it has two choices with regard to the seqid 7167 sent. It may set the seqid to zero to indicate to the server that it 7168 wishes the most up-to-date seqid for that stateid's "other" field to 7169 be used. This would be the common choice in the case of a stateid 7170 sent with a READ or WRITE operation. It also may set a non-zero 7171 value in which case the server checks if that seqid is the correct 7172 one. 
In that case, the server is required to return NFS4ERR_OLD_STATEID if the seqid is lower than the most current value and NFS4ERR_BAD_STATEID if the seqid is greater than the most current value.  This would be the common choice in the case of stateids sent with a CLOSE or OPEN_DOWNGRADE.  Because OPENs may be sent in parallel for the same owner, a client might close a file without knowing that an OPEN upgrade had been done by the server, changing the lock in question.  If CLOSE were sent with a zero seqid, the OPEN upgrade would be canceled before the client even received an indication that an upgrade had happened.

When a stateid is sent by the server to the client as part of a callback operation, it is not subject to checking for a current seqid and returning NFS4ERR_OLD_STATEID.  This is because the client is not in a position to know the most up-to-date seqid and thus cannot verify it.  Unless specially noted, the seqid value for a stateid sent by the server to the client as part of a callback is required to be zero, with NFS4ERR_BAD_STATEID returned if it is not.

In making comparisons between seqids, both by the client in determining the order of operations and by the server in determining whether NFS4ERR_OLD_STATEID is to be returned, the possibility of the seqid having wrapped around past the NFS4_UINT32_MAX value needs to be taken into account.  When two seqid values are being compared, the total count of slots for all sessions associated with the current client is used to do this.  When one seqid value is less than this total slot count and another seqid value is greater than NFS4_UINT32_MAX minus the total slot count, the latter is to be treated as lower than the former, despite the fact that it is numerically greater.

8.2.3.  Special Stateids

Stateid values whose "other" field is either all zeros or all ones are reserved.  They may not be assigned by the server but have special meanings defined by the protocol.  The particular meaning depends on whether the "other" field is all zeros or all ones and the specific value of the "seqid" field.

The following combinations of "other" and "seqid" are defined in NFSv4.1:

o  When "other" and "seqid" are both zero, the stateid is treated as a special anonymous stateid, which can be used in READ, WRITE, and SETATTR requests to indicate the absence of any open state associated with the request.  When an anonymous stateid value is used, and an existing open denies the form of access requested, then access will be denied to the request.  This stateid MUST NOT be used on operations to data servers (Section 13.6).

o  When "other" and "seqid" are both all ones, the stateid is a special read bypass stateid.  When this value is used in WRITE or SETATTR, it is treated like the anonymous value.  When used in READ, the server MAY grant access, even if access would normally be denied to READ requests.  This stateid MUST NOT be used on operations to data servers.

o  When "other" is zero and "seqid" is one, the stateid represents the current stateid, which is whatever value is the last stateid returned by an operation within the COMPOUND.  In the case of an OPEN, the stateid returned for the open file, and not the delegation, is used.
The stateid passed to the operation in place 7234 of the special value has its "seqid" value set to zero, except 7235 when the current stateid is used by the operation CLOSE or 7236 OPEN_DOWNGRADE. If there is no operation in the COMPOUND which 7237 has returned a stateid value, the server MUST return the error 7238 NFS4ERR_BAD_STATEID. As illustrated in Figure 89, if the value of 7239 a current stateid is a special stateid, and the stateid of an 7240 operation's arguments has "other" set to zero, and "seqid" set to 7241 one, then the server MUST return the error NFS4ERR_BAD_STATEID. 7243 o When "other" is zero and "seqid" is NFS4_UINT32_MAX, the stateid 7244 represents a reserved stateid value defined to be invalid. When 7245 this stateid is used, the server MUST return the error 7246 NFS4ERR_BAD_STATEID. 7248 If a stateid value is used which has all zero or all ones in the 7249 "other" field, but does not match one of the cases above, the server 7250 MUST return the error NFS4ERR_BAD_STATEID. 7252 Special stateids, unlike other stateids, are not associated with 7253 individual client IDs or filehandles and can be used with all valid 7254 client IDs and filehandles. In the case of a special stateid 7255 designating the current stateid, the current stateid value 7256 substituted for the special stateid is associated with a particular 7257 client ID and filehandle, and so, if it is used where current 7258 filehandle does not match that associated with the current stateid, 7259 the operation to which the stateid is passed will return 7260 NFS4ERR_BAD_STATEID. 7262 8.2.4. Stateid Lifetime and Validation 7264 Stateids must remain valid until either a client restart or a server 7265 restart or until the client returns all of the locks associated with 7266 the stateid by means of an operation such as CLOSE or DELEGRETURN. 7267 If the locks are lost due to revocation the stateid remains a valid 7268 designation of that revoked state until the client frees it by using 7269 FREE_STATEID. Stateids associated with byte-range locks are an 7270 exception. They remain valid even if a LOCKU frees all remaining 7271 locks, so long as the open file with which they are associated 7272 remains open, unless the client does a FREE_STATEID to cause the 7273 stateid to be freed. 7275 It should be noted that there are situations in which the client's 7276 locks become invalid, without the client requesting they be returned. 7277 These include lease expiration and a number of forms of lock 7278 revocation within the lease period. It is important to note that in 7279 these situations, the stateid remains valid and the client can use it 7280 to determine the disposition of the associated lost locks. 7282 An "other" value must never be reused for a different purpose (i.e. 7283 different filehandle, owner, or type of locks) within the context of 7284 a single client ID. A server may retain the "other" value for the 7285 same purpose beyond the point where it may otherwise be freed but if 7286 it does so, it must maintain "seqid" continuity with previous values. 7288 One mechanism that may be used to satisfy the requirement that the 7289 server recognize invalid and out-of-date stateids is for the server 7290 to divide the "other" field of the stateid into two fields. 7292 o An index into a table of locking-state structures. 7294 o A generation number which is incremented on each allocation of a 7295 table entry for a particular use. 
And then store in each table entry:

o  The client ID with which the stateid is associated.

o  The current generation number for the (at most one) valid stateid sharing this index value.

o  The filehandle of the file on which the locks are taken.

o  An indication of the type of stateid (open, byte-range lock, file delegation, directory delegation, layout).

o  The last "seqid" value returned corresponding to the current "other" value.

o  An indication of the current status of the locks associated with this stateid.  In particular, whether these have been revoked and if so, for what reason.

With this information, an incoming stateid can be validated and the appropriate error returned when necessary.  Special and non-special stateids are handled separately.  (See Section 8.2.3 for a discussion of special stateids.)

Note that stateids are implicitly qualified by the current client ID, as derived from the client ID associated with the current session.  Note, however, that the semantics of the session will prevent stateids associated with a previous client or server instance from being analyzed by this procedure.

If a server restart has resulted in an invalid client ID or an invalid session ID, SEQUENCE will return an error and the operation that takes a stateid as an argument will never be processed.

If there has been a server restart where there is a persistent session, and all leased state has been lost, then the session in question will, although valid, be marked as dead, and any operation not satisfied by means of the reply cache will receive the error NFS4ERR_DEADSESSION, and thus will not be processed as described below.

When a stateid is being tested, and the "other" field is all zeros or all ones, a check that the "other" and "seqid" fields match a defined combination for a special stateid is done and the results are determined as follows:

o  If the "other" and "seqid" fields do not match a defined combination associated with a special stateid, the error NFS4ERR_BAD_STATEID is returned.

o  If the special stateid is one designating the current stateid, and there is a current stateid, then the current stateid is substituted for the special stateid and the checks appropriate to non-special stateids are performed.

o  If the combination is valid in general but is not appropriate to the context in which the stateid is used (e.g. an all-zero stateid is used when an open stateid is required in a LOCK operation), the error NFS4ERR_BAD_STATEID is also returned.

o  Otherwise, the check is completed and the special stateid is accepted as valid.

When a stateid is being tested, and the "other" field is neither all zeros nor all ones, the following procedure could be used to validate an incoming stateid and return an appropriate error, when necessary, assuming that the "other" field would be divided into a table index and an entry generation.

o  If the table index field is outside the range of the associated table, return NFS4ERR_BAD_STATEID.

o  If the selected table entry is of a different generation than that specified in the incoming stateid, return NFS4ERR_BAD_STATEID.

o  If the selected table entry does not match the current filehandle, return NFS4ERR_BAD_STATEID.
o  If the client ID in the table entry does not match the client ID associated with the current session, return NFS4ERR_BAD_STATEID.

o  If the stateid represents revoked state, then return NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or NFS4ERR_DELEG_REVOKED, as appropriate.

o  If the stateid type is not valid for the context in which the stateid appears, return NFS4ERR_BAD_STATEID.  Note that a stateid may be valid in general, as would be reported by the TEST_STATEID operation, but be invalid for a particular operation, as, for example, when a stateid which doesn't represent byte-range locks is passed to the non-from_open case of LOCK or to LOCKU, or when a stateid which does not represent an open is passed to CLOSE or OPEN_DOWNGRADE.  In such cases, the server MUST return NFS4ERR_BAD_STATEID.

o  If the "seqid" field is not zero, and it is greater than the current sequence value corresponding to the current "other" field, return NFS4ERR_BAD_STATEID.

o  If the "seqid" field is not zero, and it is less than the current sequence value corresponding to the current "other" field, return NFS4ERR_OLD_STATEID.

o  Otherwise, the stateid is valid and the table entry should contain any additional information about the type of stateid and information associated with that particular type of stateid, such as the associated set of locks (e.g. open-owner and lock-owner information), as well as information on the specific locks themselves (e.g. open modes and byte ranges).

8.2.5.  Stateid Use for I/O Operations

Clients performing I/O operations need to select an appropriate stateid based on the locks (including opens and delegations) held by the client and the various types of state-owners issuing the I/O requests.  SETATTR operations which change the file size are treated like I/O operations in this regard.

The following rules, applied in order of decreasing priority, govern the selection of the appropriate stateid.  In following these rules, the client will only consider locks of which it has actually received notification by an appropriate operation response or callback.  Note that the rules are slightly different in the case of I/O to data servers when file layouts are being used (see Section 13.9.1).

o  If the client holds a delegation for the file in question, the delegation stateid SHOULD be used.

o  Otherwise, if the entity corresponding to the lock-owner (e.g. a process) issuing the I/O has a lock stateid for the associated open file, then the lock stateid for that lock-owner and open file SHOULD be used.

o  If there is no lock stateid, then the open stateid for the open file in question SHOULD be used.

o  Finally, if none of the above apply, then a special stateid SHOULD be used.

Ignoring these rules may result in situations in which the server does not have information necessary to properly process the request.  For example, when mandatory byte-range locks are in effect, if the stateid does not indicate the proper lock-owner, via a lock stateid, a request might be avoidably rejected.

The server, however, should not try to enforce these ordering rules and should use whatever information is available to properly process I/O requests.
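The following fragment is a non-normative sketch of the client-side selection logic embodied in the rules above.  The io_context type and its fields are illustrative client bookkeeping assumed for this example only; they are not protocol-defined objects, and only the priority ordering is significant.

   /* Non-normative sketch of the stateid selection priority above.
    * The io_context type and its fields are illustrative client
    * bookkeeping, not protocol-defined objects. */

   #include <stdbool.h>
   #include <stdint.h>

   typedef struct {
       uint32_t      seqid;
       unsigned char other[12];      /* 96-bit "other" field */
   } stateid4;

   struct io_context {
       bool     have_delegation;     /* delegation held for the file */
       bool     have_lock_stateid;   /* lock stateid for this owner  */
       bool     have_open_stateid;   /* open stateid for this file   */
       stateid4 delegation_sid;
       stateid4 lock_sid;
       stateid4 open_sid;
   };

   static const stateid4 anonymous_sid;  /* all-zero special stateid */

   /* Apply the rules above in order of decreasing priority. */
   static stateid4
   select_io_stateid(const struct io_context *ctx)
   {
       if (ctx->have_delegation)
           return ctx->delegation_sid;
       if (ctx->have_lock_stateid)
           return ctx->lock_sid;
       if (ctx->have_open_stateid)
           return ctx->open_sid;
       return anonymous_sid;
   }

When the chosen stateid is sent with a READ or WRITE, the client would normally set its "seqid" field to zero, as discussed in Section 8.2.2.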
In particular, when a client has a delegation for a given 7443 file, it SHOULD take note of this fact in processing a request, even 7444 if it is sent with a special stateid. 7446 8.2.6. Stateid Use for SETATTR Operations 7448 Because each operation is associated with a session ID and from that 7449 the clientid can be determined, operations do not need to include a 7450 stateid for the server to be able to determine whether they should 7451 cause a delegation to be recalled or are to be treated as done within 7452 the scope of the delegation. 7454 In the case of SETATTR operations, a stateid is present. In cases 7455 other than those which set the file size, the client may send either 7456 a special stateid or, when a delegation is held for the file in 7457 question, a delegation stateid. While the server SHOULD validate the 7458 stateid and may use the stateid to optimize the determination as to 7459 whether a delegation is held, it SHOULD note the presence of a 7460 delegation even when a special stateid is sent, and MUST accept a 7461 valid delegation stateid when sent. 7463 8.3. Lease Renewal 7465 The purpose of a lease is to allow the client to indicate to the 7466 server, in a low-overhead way, that it is active, and thus that the 7467 server is to retain the client's locks. This arrangement allows the 7468 server to remove stale locking-related objects that are held by a 7469 client that has crashed or is otherwise unreachable, once the 7470 relevant lease expires. This in turn allows other clients to obtain 7471 conflicting locks without being delayed indefinitely by inactive or 7472 unreachable clients. It is not a mechanism for cache consistency and 7473 lease renewals may not be denied if the lease interval has not 7474 expired. 7476 Since each session is associated with a specific client (identified 7477 by the client's client ID), any operation sent on that session is an 7478 indication that the associated client is reachable. When a request 7479 is sent for a given session, successful execution of a SEQUENCE 7480 operation (or successful retrieval of the result of SEQUENCE from the 7481 reply cache) on an unexpired lease will result in the lease being 7482 implicitly renewed, for the standard renewal period (equal to the 7483 lease_time attribute). 7485 If the client ID's lease has not expired when the server receives a 7486 SEQUENCE operation, then the server MUST renew the lease. If the 7487 client ID's lease has expired when the server receives a SEQUENCE 7488 operation, the server MAY renew the lease; this depends on whether 7489 any state was revoked as a result of the client's failure to renew 7490 the lease before expiration. 7492 Absent other activity that would renew the lease, a COMPOUND 7493 consisting of a single SEQUENCE operation will suffice. The client 7494 should also take communication-related delays into account and take 7495 steps to ensure that the renewal messages actually reach the server 7496 in good time. For example: 7498 o When trunking is in effect, the client should consider issuing 7499 multiple requests on different connections, in order to ensure 7500 that renewal occurs, even in the event of blockage in the path 7501 used for one of those connections. 7503 o Transport retransmission delays might become so large as to 7504 approach or exceed the length of the lease period. This may be 7505 particularly likely when the server is unresponsive due to a 7506 restart; see Section 8.4.2.1. 
If the client implementation is not 7507 careful, transport retransmission delays can result in the client 7508 failing to detect a server restart before the grace period ends. 7509 The scenario is that the client is using a transport with 7510 exponential back off, such that the maximum retransmission timeout 7511 exceeds the both the grace period and the lease_time attribute. A 7512 network partition causes the client's connection's retransmission 7513 interval to back off, and even after the partition heals, the next 7514 transport-level retransmission is sent after the server has 7515 restarted and its grace period ends. 7517 The client MUST either recover from the ensuing NFS4ERR_NOGRACE 7518 errors, or it MUST ensure that despite transport level 7519 retransmission intervals that exceed the lease_time, nonetheless a 7520 SEQUENCE operation is sent that renews the lease before 7521 expiration. The client can achieve this by associating a new 7522 connection with the session, and sending a SEQUENCE operation on 7523 it. However, if the attempt to establish a new connection is 7524 delayed for some reason (e.g. exponential backoff of the 7525 connection establishment packets), the client will have to abort 7526 the connection establishment attempt before the lease expires, and 7527 attempt to re-connect. 7529 If the server renews the lease upon receiving a SEQUENCE operation, 7530 the server MUST NOT allow the lease to expire while the rest of the 7531 operations in the COMPOUND procedure's request are still executing. 7532 Once the last operation has finished, and the response to COMPOUND 7533 has been sent, the server MUST set the lease to expire no sooner than 7534 the sum of current time and the value of the lease_time attribute. 7536 A client ID's lease can expire when it has been at least the lease 7537 interval (lease_time) since the last lease-renewing SEQUENCE 7538 operation was sent on any of the client ID's sessions and there are 7539 no active COMPOUND operations on any such sessions. 7541 Because the SEQUENCE operation is the basic mechanism to renew a 7542 lease, and because if must be done at least once for each lease 7543 period, it is the natural mechanism whereby the server will inform 7544 the client of changes in the lease status that the client needs to be 7545 informed of. The client should inspect the status flags 7546 (sr_status_flags) returned by sequence and take the appropriate 7547 action (see Section 18.46.3 for details). 7549 o The status bits SEQ4_STATUS_CB_PATH_DOWN and 7550 SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the 7551 backchannel which the client may need to address in order to 7552 receive callback requests. 7554 o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and 7555 SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate problems with GSS 7556 contexts for the backchannel which the client may have to address 7557 to allow callback requests to be sent to it. 7559 o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, 7560 SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, 7561 SEQ4_STATUS_ADMIN_STATE_REVOKED, and 7562 SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock 7563 revocation events. When these bits are set, the client should use 7564 TEST_STATEID to find what stateids have been revoked and use 7565 FREE_STATEID to acknowledge loss of the associated state. 7567 o The status bit SEQ4_STATUS_LEASE_MOVE indicates that 7568 responsibility for lease renewal has been transferred to one or 7569 more new servers. 
7571 o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that 7572 due to server restart the client must reclaim locking state. 7574 o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates the server 7575 has encountered an unrecoverable fault with the backchannel (e.g. 7576 it has lost track of a sequence ID for a slot in the backchannel). 7578 8.4. Crash Recovery 7580 A critical requirement in crash recovery is that both the client and 7581 the server know when the other has failed. Additionally, it is 7582 required that a client sees a consistent view of data across server 7583 restarts. All READ and WRITE operations that may have been queued 7584 within the client or network buffers must wait until the client has 7585 successfully recovered the locks protecting the READ and WRITE 7586 operations. Any that reach the server before the server can safely 7587 determine that the client has recovered enough locking state to be 7588 sure that such operations can be safely processed must be rejected. 7589 This will happen because either: 7591 o The state presented is no longer valid since it is associated with 7592 a now invalid client ID. In this case the client will receive 7593 either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any 7594 attempt to attach a new session to the existing client ID will 7595 result in an NFS4ERR_STALE_CLIENTID error. 7597 o Subsequent recovery of locks may make execution of the operation 7598 inappropriate (NFS4ERR_GRACE). 7600 8.4.1. Client Failure and Recovery 7602 In the event that a client fails, the server may release the client's 7603 locks when the associated lease has expired. Conflicting locks from 7604 another client may only be granted after this lease expiration. As 7605 discussed in Section 8.3, when a client has not failed and re- 7606 establishes its lease before expiration occurs, requests for 7607 conflicting locks will not be granted. 7609 To minimize client delay upon restart, lock requests are associated 7610 with an instance of the client by a client-supplied verifier. This 7611 verifier is part of the client_owner4 sent in the initial EXCHANGE_ID 7612 call made by the client. The server returns a client ID as a result 7613 of the EXCHANGE_ID operation. The client then confirms the use of 7614 the client ID by establishing a session associated with that client 7615 ID (see Section 18.36.3 for a description how this is done). All 7616 locks, including opens, byte-range locks, delegations, and layouts 7617 obtained by sessions using that client ID are associated with that 7618 client ID. 7620 Since the verifier will be changed by the client upon each 7621 initialization, the server can compare a new verifier to the verifier 7622 associated with currently held locks and determine that they do not 7623 match. This signifies the client's new instantiation and subsequent 7624 loss (upon confirmation of the new client ID) of locking state. As a 7625 result, the server is free to release all locks held which are 7626 associated with the old client ID which was derived from the old 7627 verifier. At this point conflicting locks from other clients, kept 7628 waiting while the lease had not yet expired, can be granted. In 7629 addition, all stateids associated with the old client ID can also be 7630 freed, as they are no longer reference-able. 7632 Note that the verifier must have the same uniqueness properties as 7633 the verifier for the COMMIT operation. 7635 8.4.2. 
Server Failure and Recovery 7637 If the server loses locking state (usually as a result of a restart), 7638 it must allow clients time to discover this fact and re-establish the 7639 lost locking state. The client must be able to re-establish the 7640 locking state without having the server deny valid requests because 7641 the server has granted conflicting access to another client. 7642 Likewise, if there is a possibility that clients have not yet re- 7643 established their locking state for a file, and that such locking 7644 state might make it invalid to perform READ or WRITE operations, for 7645 example through the establishment of mandatory locks, the server must 7646 disallow READ and WRITE operations for that file. 7648 A client can determine that loss of locking state has occurred via 7649 several methods. 7651 1. When a SEQUENCE (most common) or other operation returns 7652 NFS4ERR_BADSESSION, this may mean the session has been destroyed, 7653 but the client ID is still valid. The client sends a 7654 CREATE_SESSION request with the client ID to re-establish the 7655 session. If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID, 7656 the client must establish a new client ID (see Section 8.1) and 7657 re-establish its lock state with the new client ID, after the 7658 CREATE_SESSION operation succeeds (see Section 8.4.2.1). 7660 2. When a SEQUENCE (most common) or other operation on a persistent 7661 session returns NFS4ERR_DEADSESSION, this indicates that a 7662 session is no longer usable for new, i.e. not satisfied from the 7663 reply cache, operations. Once all pending operations are 7664 determined to be either performed before the retry or not 7665 performed, the client sends a CREATE_SESSION request with the 7666 client ID to re-establish the session. If CREATE_SESSION fails 7667 with NFS4ERR_STALE_CLIENTID, the client must establish a new 7668 client ID (see Section 8.1) and re-establish its lock state after 7669 the CREATE_SESSION, with the new client ID, succeeds, 7670 (Section 8.4.2.1). 7672 3. When a operation, neither SEQUENCE nor preceded by SEQUENCE (for 7673 example, CREATE_SESSION, DESTROY_SESSION) returns 7674 NFS4ERR_STALE_CLIENTID. The client MUST establish a new client 7675 ID (Section 8.1) and re-establish its lock state 7676 (Section 8.4.2.1). 7678 8.4.2.1. State Reclaim 7680 When state information and the associated locks are lost as a result 7681 of a server restart, the protocol must provide a way to cause that 7682 state to be re-established. The approach used is to define, for most 7683 types of locking state (layouts are an exception), a request whose 7684 function is to allow the client to re-establish on the server a lock 7685 first obtained from a previous instance. Generally these requests 7686 are variants of the requests normally used to create locks of that 7687 type and are referred to as "reclaim-type" requests and the process 7688 of re-establishing such locks is referred to as "reclaiming" them. 7690 Because each client must have an opportunity to reclaim all of the 7691 locks that it has without the possibility that some other client will 7692 be granted a conflicting lock, a special period called the "grace 7693 period" is devoted to the reclaim process. During this period, 7694 requests creating client IDs and sessions are handled normally, but 7695 locking requests are subject to special restrictions. 
Only reclaim-type locking requests are allowed, unless the server can reliably determine (through state persistently maintained across restart instances) that granting any such lock cannot possibly conflict with a subsequent reclaim.  When a request is made to obtain a new lock (i.e. not a reclaim-type request) during the grace period and such a determination cannot be made, the server must return the error NFS4ERR_GRACE.

Once a session is established using the new client ID, the client will use reclaim-type locking requests (e.g. LOCK requests with reclaim set to TRUE and OPEN operations with a claim type of CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state.  Once this is done, or if there is no such locking state to reclaim, the client sends a global RECLAIM_COMPLETE operation, i.e. one with the rca_one_fs argument set to FALSE, to indicate that it has reclaimed all of the locking state that it will reclaim.  Once a client sends such a RECLAIM_COMPLETE operation, it may attempt non-reclaim locking operations, although it may get NFS4ERR_GRACE errors on those operations until the period of special handling is over.  See Section 11.7.7 for a discussion of the analogous handling of lock reclamation in the case of file systems transitioning from server to server.

During the grace period, the server must reject READ and WRITE operations and non-reclaim locking requests (i.e. other LOCK and OPEN operations) with an error of NFS4ERR_GRACE, unless it can guarantee that these may be done safely, as described below.

The grace period may last until all clients which are known to possibly have had locks have done a global RECLAIM_COMPLETE operation, indicating that they have finished reclaiming the locks they held before the server restart.  This means that a client which has done a RECLAIM_COMPLETE must be prepared to receive an NFS4ERR_GRACE when attempting to acquire new locks.  In order for the server to know that all clients with possible prior lock state have done a RECLAIM_COMPLETE, the server must maintain in stable storage a list of clients which may have such locks.  The server may also terminate the grace period before all clients have done a global RECLAIM_COMPLETE.  The server SHOULD NOT terminate the grace period before a time equal to the lease period in order to give clients an opportunity to find out about the server restart, as a result of issuing requests on associated sessions with a frequency governed by the lease time.  Note that when a client does not issue such requests (or they are issued by the client but not received by the server), it is possible for the grace period to expire before the client finds out that the server restart has occurred.

Some additional time, in order to allow a client to establish a new client ID and session and to effect lock reclaims, may be added to the lease time.  Note that analogous rules apply to file system-specific grace periods discussed in Section 11.7.7.
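To make the above concrete, the following is a minimal, non-normative sketch of how a server might gate incoming locking and I/O requests during its grace period.  The grace_state structure and the no_conflict_possible() predicate are assumptions of this example; the predicate stands in for whatever persistently maintained information allows the server to prove that granting the request cannot conflict with a subsequent reclaim.

   /* Non-normative sketch of a server-side grace-period gate.
    * The error values are those assigned by the protocol; the
    * remaining names are illustrative assumptions. */

   #include <stdbool.h>

   #define NFS4_OK            0
   #define NFS4ERR_GRACE  10013

   enum req_kind { REQ_RECLAIM, REQ_NEW_LOCK, REQ_IO };

   struct grace_state {
       bool in_grace;                       /* grace period active  */
       bool (*no_conflict_possible)(void);  /* from stable storage  */
   };

   static int
   grace_check(const struct grace_state *gs, enum req_kind kind)
   {
       if (!gs->in_grace)
           return NFS4_OK;         /* normal processing applies     */
       if (kind == REQ_RECLAIM)
           return NFS4_OK;         /* reclaim-type requests proceed */
       /* New locking requests and READ/WRITE are refused unless the
        * server can prove no conflict with a later reclaim. */
       return gs->no_conflict_possible() ? NFS4_OK : NFS4ERR_GRACE;
   }

A server that keeps no such persistent information would simply have no_conflict_possible() return false, rejecting all non-reclaim requests during the grace period, which corresponds to the simple approach described below.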
7748 If the server can reliably determine that granting a non-reclaim 7749 request will not conflict with reclamation of locks by other clients, 7750 the NFS4ERR_GRACE error does not have to be returned even within the 7751 grace period, although NFS4ERR_GRACE must always be returned to 7752 clients attempting a non-reclaim lock request before doing their own 7753 global RECLAIM_COMPLETE. For the server to be able to service READ 7754 and WRITE operations during the grace period, it must again be able 7755 to guarantee that no possible conflict could arise between a 7756 potential reclaim locking request and the READ or WRITE operation. 7757 If the server is unable to offer that guarantee, the NFS4ERR_GRACE 7758 error must be returned to the client. 7760 For a server to provide simple, valid handling during the grace 7761 period, the easiest method is to simply reject all non-reclaim 7762 locking requests and READ and WRITE operations by returning the 7763 NFS4ERR_GRACE error. However, a server may keep information about 7764 granted locks in stable storage. With this information, the server 7765 could determine if a regular lock or READ or WRITE operation can be 7766 safely processed. 7768 For example, if the server maintained on stable storage summary 7769 information on whether mandatory locks exist, either mandatory byte- 7770 range locks, or share reservations specifying deny modes, many 7771 requests could be allowed during the grace period. If it is known 7772 that no such share reservations exist, OPEN request that do not 7773 specify deny modes may be safely granted. If, in addition, it is 7774 known that no mandatory byte-range locks exist, either through 7775 information stored on stable storage or simply because the server 7776 does not support such locks, READ and WRITE requests may be safely 7777 processed during the grace period. Another important case is where 7778 it is known that no mandatory byte-range locks exist, either because 7779 the server does not provide support for them, or because their 7780 absence is known from persistently recorded data. In this case, READ 7781 and WRITE operations specifying stateids derived from reclaim-type 7782 operation may be validly processed during the grace period because 7783 the fact of the valid reclaim ensures that no lock subsequently 7784 granted can prevent the I/O. 7786 To reiterate, for a server that allows non-reclaim lock and I/O 7787 requests to be processed during the grace period, it MUST determine 7788 that no lock subsequently reclaimed will be rejected and that no lock 7789 subsequently reclaimed would have prevented any I/O operation 7790 processed during the grace period. 7792 Clients should be prepared for the return of NFS4ERR_GRACE errors for 7793 non-reclaim lock and I/O requests. In this case the client should 7794 employ a retry mechanism for the request. A delay (on the order of 7795 several seconds) between retries should be used to avoid overwhelming 7796 the server. Further discussion of the general issue is included in 7797 [37]. The client must account for the server that can perform I/O 7798 and non-reclaim locking requests within the grace period as well as 7799 those that cannot do so. 7801 A reclaim-type locking request outside the server's grace period can 7802 only succeed if the server can guarantee that no conflicting lock or 7803 I/O request has been granted since restart. 7805 A server may, upon restart, establish a new value for the lease 7806 period. 
Therefore, clients should, once a new client ID is established, refetch the lease_time attribute and use it as the basis for lease renewal for the lease associated with that server.  However, the server must establish, for this restart event, a grace period at least as long as the lease period for the previous server instantiation.  This allows the client state obtained during the previous server instance to be reliably re-established.

8.4.3.  Network Partitions and Recovery

If the duration of a network partition is greater than the lease period provided by the server, the server will not have received a lease renewal from the client.  If this occurs, the server may free all locks held for the client, or it may allow the lock state to remain for a considerable period, subject to the constraint that if a request for a conflicting lock is made, locks associated with an expired lease do not prevent such a conflicting lock from being granted but MUST be revoked as necessary so as not to interfere with such conflicting requests.

If the server chooses to delay freeing of lock state until there is a conflict, it may either free all of the client's locks once there is a conflict, or it may only revoke the minimum set of locks necessary to allow conflicting requests.  When it adopts the finer-grained approach, it must revoke all locks associated with a given stateid, even if the conflict is with only a subset of locks.

When the server chooses to free all of a client's lock state, either immediately upon lease expiration or as a result of the first attempt to obtain a conflicting lock, the server may report the loss of lock state in a number of ways.

The server may choose to invalidate the session and the associated client ID.  In this case, once the client can communicate with the server, it will receive an NFS4ERR_BADSESSION error.  Upon attempting to create a new session, it would get an NFS4ERR_STALE_CLIENTID.  Upon creating the new client ID and new session, it would attempt to reclaim locks but would not be allowed to do so by the server.

Another possibility is for the server to maintain the session and client ID but for all stateids held by the client to become invalid or stale.  Once the client can reach the server after such a network partition, the status returned by the SEQUENCE operation will indicate a loss of locking state, i.e. the flag SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in sr_status_flags.  In addition, all I/O submitted by the client with the now invalid stateids will fail with the server returning the error NFS4ERR_EXPIRED.  Once the client learns of the loss of locking state, it will suitably notify the applications that held the invalidated locks.  The client should then take action to free invalidated stateids, either by establishing a new client ID using a new verifier or by doing a FREE_STATEID operation to release each of the invalidated stateids.

When the server adopts a finer-grained approach to revocation of locks when leases have expired, only a subset of stateids will normally become invalid during a network partition.
When the client 7863 can communicate with the server after such a network partition heals, 7864 the status returned by the SEQUENCE operation will indicate a partial 7865 loss of locking state (SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In 7866 addition, operations, including I/O submitted by the client, with the 7867 now invalid stateids will fail with the server returning the error 7868 NFS4ERR_EXPIRED. Once the client learns of the loss of locking 7869 state, it will use the TEST_STATEID operation on all of its stateids 7870 to determine which locks have been lost and then suitably notify the 7871 applications that held the invalidated locks. The client can then 7872 release the invalidated locking state and acknowledge the revocation 7873 of the associated locks by doing a FREE_STATEID operation on each of 7874 the invalidated stateids. 7876 When a network partition is combined with a server restart, there are 7877 edge conditions that place requirements on the server in order to 7878 avoid silent data corruption following the server restart. Two of 7879 these edge conditions are known, and are discussed below. 7881 The first edge condition arises as a result of the scenarios such as 7882 the following: 7884 1. Client A acquires a lock. 7886 2. Client A and server experience mutual network partition, such 7887 that client A is unable to renew its lease. 7889 3. Client A's lease expires, and the server releases the lock. 7891 4. Client B acquires a lock that would have conflicted with that of 7892 Client A. 7894 5. Client B releases its lock. 7896 6. Server restarts. 7898 7. Network partition between client A and server heals. 7900 8. Client A connects to new server instance and finds out about 7901 server restart. 7903 9. Client A reclaims its lock within the server's grace period. 7905 Thus, at the final step, the server has erroneously granted client 7906 A's lock reclaim. If client B modified the object the lock was 7907 protecting, client A will experience object corruption. 7909 The second known edge condition arises in situations such as the 7910 following: 7912 1. Client A acquires one or more locks. 7914 2. Server restarts. 7916 3. Client A and server experience mutual network partition, such 7917 that client A is unable to reclaim all of its locks within the 7918 grace period. 7920 4. Server's reclaim grace period ends. Client A has either no 7921 locks or an incomplete set of locks known to the server. 7923 5. Client B acquires a lock that would have conflicted with a lock 7924 of client A that was not reclaimed. 7926 6. Client B releases the lock. 7928 7. Server restarts a second time. 7930 8. Network partition between client A and server heals. 7932 9. Client A connects to new server instance and finds out about 7933 server restart. 7935 10. Client A reclaims its lock within the server's grace period. 7937 As with the first edge condition, the final step of the scenario of 7938 the second edge condition has the server erroneously granting client 7939 A's lock reclaim. 7941 Solving the first and second edge conditions requires that the server 7942 either always assumes after it restarts that some edge condition 7943 occurs, and thus return NFS4ERR_NO_GRACE for all reclaim attempts, or 7944 that the server record some information in stable storage. The 7945 amount of information the server records in stable storage is in 7946 inverse proportion to how harsh the server intends to be whenever 7947 edge conditions arise. 
The server that is completely tolerant of all edge conditions will record in stable storage every lock that is acquired, removing the lock record from stable storage only when the lock is released.  For the two edge conditions discussed above, the harshest a server can be, and still support a grace period for reclaims, requires that the server record some minimal information in stable storage.  For example, a server implementation could, for each client, save in stable storage a record containing:

o  the co_ownerid field from the client_owner4 presented in the EXCHANGE_ID operation.

o  a boolean that indicates if the client's lease expired or if there was administrative intervention (see Section 8.5) to revoke a byte-range lock, share reservation, or delegation and there has been no acknowledgement, via FREE_STATEID, of such revocation.

o  a boolean that indicates whether the client may have locks that it believes to be reclaimable in situations in which the grace period was terminated, making the server's view of lock reclaimability suspect.  The server will set this for any client record in stable storage where the client has not done a suitable RECLAIM_COMPLETE (global or file system-specific depending on the target of the lock request) before it grants any new (i.e. not reclaimed) lock to any client.

Assuming the above record keeping, for the first edge condition, after the server restarts, the record that client A's lease expired means that another client could have acquired a conflicting byte-range lock, share reservation, or delegation.  Hence the server must reject a reclaim from client A with the error NFS4ERR_NO_GRACE.

For the second edge condition, after the server restarts for a second time, the indication that the client had not completed its reclaims at the time at which the grace period ended means that the server must reject a reclaim from client A with the error NFS4ERR_NO_GRACE.

When either edge condition occurs, the client's attempt to reclaim locks will result in the error NFS4ERR_NO_GRACE.  When this is received, or after the client restarts with no lock state, the client will send a global RECLAIM_COMPLETE.  When the RECLAIM_COMPLETE is received, the server and client are again in agreement regarding reclaimable locks and both booleans in persistent storage can be reset, to be set again only when there is a subsequent event that causes lock reclaim operations to be questionable.

Regardless of the level and approach to record keeping, the server MUST implement one of the following strategies (which apply to reclaims of share reservations, byte-range locks, and delegations):

1.  Reject all reclaims with NFS4ERR_NO_GRACE.  This is extremely unforgiving, but necessary if the server does not record lock state in stable storage.

2.  Record sufficient state in stable storage such that all known edge conditions involving server restart, including the two noted in this section, are detected.  It is acceptable to erroneously recognize an edge condition and not allow a reclaim, when, with sufficient knowledge, it would be allowed.  The error the server would return in this case is NFS4ERR_NO_GRACE.  Note that it is not known whether there are other edge conditions.
8010 In the event that, after a server restart, the server determines 8011 that there is unrecoverable damage or corruption to the 8012 information in stable storage, then for all clients and/or locks 8013 which may be affected, the server MUST return NFS4ERR_NO_GRACE. 8015 A mandate for the client's handling of the NFS4ERR_NO_GRACE error is 8016 outside the scope of this specification, since the strategies for 8017 such handling are very dependent on the client's operating 8018 environment. However, one potential approach is described below. 8020 When the client receives NFS4ERR_NO_GRACE, it could examine the 8021 change attribute of the objects the client is trying to reclaim state 8022 for, and use that to determine whether to re-establish the state via 8023 normal OPEN or LOCK requests. This is acceptable provided the 8024 client's operating environment allows it. In other words, the client 8025 implementor is advised to document for his users the behavior. The 8026 client could also inform the application that its byte-range lock or 8027 share reservations (whether they were delegated or not) have been 8028 lost, such as via a UNIX signal, a GUI pop-up window, etc. See 8029 Section 10.5 for a discussion of what the client should do for 8030 dealing with unreclaimed delegations on client state. 8032 For further discussion of revocation of locks see Section 8.5. 8034 8.5. Server Revocation of Locks 8036 At any point, the server can revoke locks held by a client and the 8037 client must be prepared for this event. When the client detects that 8038 its locks have been or may have been revoked, the client is 8039 responsible for validating the state information between itself and 8040 the server. Validating locking state for the client means that it 8041 must verify or reclaim state for each lock currently held. 8043 The first occasion of lock revocation is upon server restart. Note 8044 that this includes situations in which sessions are persistent and 8045 locking state is lost. In this class of instances, the client will 8046 receive an error (NFS4ERR_STALE_CLIENTID) on an operation that takes 8047 client ID, usually as part of recovery in response to a problem with 8048 the current session) and the client will proceed with normal crash 8049 recovery as described in the Section 8.4.2.1. 8051 The second occasion of lock revocation is the inability to renew the 8052 lease before expiration, as discussed in Section 8.4.3. While this 8053 is considered a rare or unusual event, the client must be prepared to 8054 recover. The server is responsible for determining the precise 8055 consequences of the lease expiration, informing the client of the 8056 scope of the lock revocation decided upon. The client then uses the 8057 status information provided by the server in the SEQUENCE results 8058 (field sr_status_flags, see Section 18.46.3) to synchronize its 8059 locking state with that of the server, in order to recover. 8061 The third occasion of lock revocation can occur as a result of 8062 revocation of locks within the lease period, either because of 8063 administrative intervention, or because a recallable lock (a 8064 delegation or layout) was not returned within the lease period after 8065 having been recalled. While these are considered rare events, they 8066 are possible and the client must be prepared to deal with them. When 8067 either of these events occur, the client finds out about the 8068 situation through the status returned by the SEQUENCE operation. 
Any 8069 use of stateids associated with locks revoked during the lease period 8070 will receive the error NFS4ERR_ADMIN_REVOKED or 8071 NFS4ERR_DELEG_REVOKED, as appropriate. 8073 In all situations in which a subset of locking state may have been 8074 revoked, which include all cases in which locking state is revoked 8075 within the lease period, it is up to the client to determine which 8076 locks have been revoked and which have not. It does this by using 8077 the TEST_STATEID operation on the appropriate set of stateids. Once 8078 the set of revoked locks has been determined, the applications can be 8079 notified, and the invalidated stateids can be freed and lock 8080 revocation acknowledged by using FREE_STATEID. 8082 8.6. Short and Long Leases 8084 When determining the time period for the server lease, the usual 8085 lease tradeoffs apply. Short leases are good for fast server 8086 recovery at a cost of increased operations to effect lease renewal 8087 (when there are no other operations during the period to effect lease 8088 renewal as a side-effect). Long leases are certainly kinder and 8089 gentler to servers trying to handle very large numbers of clients. 8090 The number of extra requests to effect lock renewal drops in inverse 8091 proportion to the lease time. The disadvantages of long leases 8092 include the possibility of slower recovery after certain failures. 8093 After server failure, a longer grace period may be required when some 8094 clients do not promptly reclaim their locks and do a global 8095 RECLAIM_COMPLETE. In the event of client failure, there can be a 8096 longer period for leases to expire thus forcing conflicting requests 8097 to wait. 8099 Long leases are practical if the server can store lease state in non- 8100 volatile memory. Upon recovery, the server can reconstruct the lease 8101 state from its non-volatile memory and continue operation with its 8102 clients and therefore long leases would not be an issue. 8104 8.7. Clocks, Propagation Delay, and Calculating Lease Expiration 8106 To avoid the need for synchronized clocks, lease times are granted by 8107 the server as a time delta. However, there is a requirement that the 8108 client and server clocks do not drift excessively over the duration 8109 of the lease. There is also the issue of propagation delay across 8110 the network which could easily be several hundred milliseconds as 8111 well as the possibility that requests will be lost and need to be 8112 retransmitted. 8114 To take propagation delay into account, the client should subtract it 8115 from lease times (e.g. if the client estimates the one-way 8116 propagation delay as 200 milliseconds, then it can assume that the 8117 lease is already 200 milliseconds old when it gets it). In addition, 8118 it will take another 200 milliseconds to get a response back to the 8119 server. So the client must send a lease renewal or write data back 8120 to the server at least 400 milliseconds before the lease would 8121 expire. 8123 The server's lease period configuration should take into account the 8124 network distance of the clients that will be accessing the server's 8125 resources. It is expected that the lease period will take into 8126 account the network propagation delays and other network delay 8127 factors for the client population. Since the protocol does not allow 8128 for an automatic method to determine an appropriate lease period, the 8129 server's administrator may have to tune the lease period. 8131 8.8. 
Obsolete Locking Infrastructure From NFSv4.0 8133 There are a number of operations and fields within existing 8134 operations that no longer have a function in NFSv4.1. In one way or 8135 another, these changes are all due to the implementation of sessions 8136 which provides client context and exactly once semantics as a base 8137 feature of the protocol, separate from locking itself. 8139 The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1. 8140 The server MUST return NFS4ERR_NOTSUPP if these operations are found 8141 in an NFSv4.1 COMPOUND. 8143 o SETCLIENTID since its function has been replaced by EXCHANGE_ID. 8145 o SETCLIENTID_CONFIRM since client ID confirmation now happens by 8146 means of CREATE_SESSION. 8148 o OPEN_CONFIRM because state-owner-based seqids have been replaced 8149 by the sequence ID in the SEQUENCE operation. 8151 o RELEASE_LOCKOWNER because lock-owners with no associated locks do 8152 not have any sequence-related state and so can be deleted by the 8153 server at will. 8155 o RENEW because every SEQUENCE operation for a session causes lease 8156 renewal, making a separate operation superfluous. 8158 Also, there are a number of fields, present in existing operations 8159 related to locking that have no use in minor version one. They were 8160 used in minor version zero to perform functions now provided in a 8161 different fashion. 8163 o Sequence ids used to sequence requests for a given state-owner and 8164 to provide retry protection, now provided via sessions. 8166 o Client IDs used to identify the client associated with a given 8167 request. Client identification is now available using the client 8168 ID associated with the current session, without needing an 8169 explicit client ID field. 8171 Such vestigial fields in existing operations have no function in 8172 NFSv4.1 and are ignored by the server. Note that client IDs in 8173 operations new to NFSv4.1 (such as CREATE_SESSION and 8174 DESTROY_CLIENTID) are not ignored. 8176 9. File Locking and Share Reservations 8178 To support Win32 share reservations it is necessary to provide 8179 operations which atomically open or create files. Having a separate 8180 share/unshare operation would not allow correct implementation of the 8181 Win32 OpenFile API. In order to correctly implement share semantics, 8182 the previous NFS protocol mechanisms used when a file is opened or 8183 created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFSv4.1 8184 protocol defines an OPEN operation which is capable of atomically 8185 looking up, creating, and locking a file on the server. 8187 9.1. Opens and Byte-Range Locks 8189 It is assumed that manipulating a byte-range lock is rare when 8190 compared to READ and WRITE operations. It is also assumed that 8191 server restarts and network partitions are relatively rare. 8192 Therefore it is important that the READ and WRITE operations have a 8193 lightweight mechanism to indicate if they possess a held lock. A 8194 byte-range lock request contains the heavyweight information required 8195 to establish a lock and uniquely define the owner of the lock. 8197 9.1.1. State-owner Definition 8199 When opening a file or requesting a byte-range lock, the client must 8200 specify an identifier which represents the owner of the requested 8201 lock. 
This identifier is in the form of a state-owner, represented 8202 in the protocol by a state_owner4, a variable-length opaque array 8203 which, when concatenated with the current client ID uniquely defines 8204 the owner of lock managed by the client. This may be a thread ID, 8205 process ID, or other unique value. 8207 Owners of opens and owners of byte-range locks are separate entities 8208 and remain separate even if the same opaque arrays are used to 8209 designate owners of each. The protocol distinguishes between open- 8210 owners (represented by open_owner4 structures) and lock-owners 8211 (represented by lock_owner4 structures). 8213 Each open is associated with a specific open-owner while each byte- 8214 range lock is associated with a lock-owner and an open-owner, the 8215 latter being the open-owner associated with the open file under which 8216 the LOCK operation was done. Delegations and layouts, on the other 8217 hand, are not associated with a specific owner but are associated 8218 with the client as a whole (identified by a client ID). 8220 9.1.2. Use of the Stateid and Locking 8222 All READ, WRITE and SETATTR operations contain a stateid. For the 8223 purposes of this section, SETATTR operations which change the size 8224 attribute of a file are treated as if they are writing the area 8225 between the old and new size (i.e. the range truncated or added to 8226 the file by means of the SETATTR), even where SETATTR is not 8227 explicitly mentioned in the text. The stateid passed to one of these 8228 operations must be one that represents an open, a set of byte-range 8229 locks, or a delegation, or it may be a special stateid representing 8230 anonymous access or the special bypass stateid. 8232 If the state-owner performs a READ or WRITE in a situation in which 8233 it has established a byte-range lock or share reservation on the 8234 server (any OPEN constitutes a share reservation) the stateid 8235 (previously returned by the server) must be used to indicate what 8236 locks, including both byte-range locks and share reservations, are 8237 held by the state-owner. If no state is established by the client, 8238 either byte-range lock or share reservation, a special stateid for 8239 anonymous state (zero as "other" and "seqid") is used. (See 8240 Section 8.2.3 for a description of 'special' stateids in general.) 8241 Regardless whether a stateid for anonymous state or a stateid 8242 returned by the server is used, if there is a conflicting share 8243 reservation or mandatory byte-range lock held on the file, the server 8244 MUST refuse to service the READ or WRITE operation. 8246 Share reservations are established by OPEN operations and by their 8247 nature are mandatory in that when the OPEN denies READ or WRITE 8248 operations, that denial results in such operations being rejected 8249 with error NFS4ERR_LOCKED. Byte-range locks may be implemented by 8250 the server as either mandatory or advisory, or the choice of 8251 mandatory or advisory behavior may be determined by the server on the 8252 basis of the file being accessed (for example, some UNIX-based 8253 servers support a "mandatory lock bit" on the mode attribute such 8254 that if set, byte-range locks are required on the file before I/O is 8255 possible). When byte-range locks are advisory, they only prevent the 8256 granting of conflicting lock requests and have no effect on READs or 8257 WRITEs. Mandatory byte-range locks, however, prevent conflicting I/O 8258 operations. 
When they are attempted, they are rejected with NFS4ERR_LOCKED.  When the client gets NFS4ERR_LOCKED on a file it knows it has the proper share reservation for, it will need to send a LOCK request on the region of the file that includes the region the I/O was to be performed on, with an appropriate locktype (i.e. READ*_LT for a READ operation, WRITE*_LT for a WRITE operation).

Note that for UNIX environments that support mandatory file locking, the distinction between advisory and mandatory locking is subtle.  In fact, advisory and mandatory byte-range locks are exactly the same as far as the APIs and requirements on implementation are concerned.  If the mandatory lock attribute is set on the file, the server checks to see if the lock-owner has an appropriate shared (read) or exclusive (write) byte-range lock on the region it wishes to read or write to.  If there is no appropriate lock, the server checks if there is a conflicting lock (which can be done by attempting to acquire the conflicting lock on behalf of the lock-owner, and, if successful, releasing the lock after the READ or WRITE is done), and if there is, the server returns NFS4ERR_LOCKED.

For Windows environments, byte-range locks are always mandatory, so the server always checks for byte-range locks during I/O requests.

Thus, the NFSv4.1 LOCK operation does not need to distinguish between advisory and mandatory byte-range locks.  It is the NFSv4.1 server's processing of the READ and WRITE operations that introduces the distinction.

Every stateid which is validly passed to READ, WRITE, or SETATTR, with the exception of special stateid values, defines an access mode for the file (i.e. READ, WRITE, or READ-WRITE):

o  For stateids associated with opens, this is the mode defined by the original OPEN which caused the allocation of the open stateid and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the same open-owner/file pair.

o  For stateids returned by byte-range lock requests, the appropriate mode is the access mode for the open stateid associated with the lock set represented by the stateid.

o  For delegation stateids, the access mode is based on the type of delegation.

When a READ, WRITE, or SETATTR (which specifies the size attribute) is done, the operation is subject to checking against the access mode to verify that the operation is appropriate given the stateid with which the operation is associated.

In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which set size), the server MUST verify that the access mode allows writing and MUST return an NFS4ERR_OPENMODE error if it does not.  In the case of READ, the server may perform the corresponding check on the access mode, or it may choose to allow READ on opens for WRITE only, to accommodate clients whose write implementation may unavoidably do reads (e.g. due to buffer cache constraints).  However, even if READs are allowed in these circumstances, the server MUST still check for locks that conflict with the READ (e.g. another OPEN specifying denial of READs).  Note that a server which does enforce the access mode check on READs need not explicitly check for conflicting share reservations since the existence of an OPEN for read access guarantees that no conflicting share reservation can exist.
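The access-mode checking described above can be illustrated with a short, non-normative sketch.  The helper name access_mode_permits() and the MODE_* values are hypothetical, server-internal names and not part of the protocol; the sketch shows a server that enforces the access mode for WRITE-type operations while electing the lenient treatment of READs permitted above.

   #include <stdbool.h>

   /* Access bits recorded for a stateid; illustrative names only. */
   #define MODE_READ   0x1
   #define MODE_WRITE  0x2

   /*
    * Decide whether the access mode associated with a stateid permits
    * the operation.  A "false" result for a WRITE-type operation
    * corresponds to returning NFS4ERR_OPENMODE.  Conflicting
    * byte-range locks and share reservations are checked separately,
    * even when this check succeeds.
    */
   static bool
   access_mode_permits(bool write_type_op, unsigned stateid_mode)
   {
       if (write_type_op)          /* WRITE, or SETATTR setting size */
           return (stateid_mode & MODE_WRITE) != 0;

       /*
        * READ: this sketch elects the lenient behavior described
        * above, allowing READs even on write-only opens.  A stricter
        * server would instead return (stateid_mode & MODE_READ) != 0.
        */
       return true;
   }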
The read bypass special stateid (all bits of "other" and "seqid" set to one) indicates a desire to bypass locking checks.  The server MAY allow READ operations to bypass locking checks at the server, when this special stateid is used.  However, WRITE operations with this special stateid value MUST NOT bypass locking checks and are treated exactly the same as if a special stateid for anonymous state were used.

A lock may not be granted while a READ or WRITE operation using one of the special stateids is being performed if the scope of the lock to be granted would conflict with that READ or WRITE operation.  This can occur when:

o  A mandatory byte-range lock is requested with a range that conflicts with the range of the READ or WRITE operation.  For the purposes of this paragraph, a conflict occurs when a shared lock is requested and a WRITE operation is being performed, or an exclusive lock is requested and either a READ or a WRITE operation is being performed.

o  A share reservation is requested which denies reading and/or writing and the corresponding operation is being performed.

o  A delegation is to be granted and the delegation type would prevent the I/O operation, i.e. READ and WRITE conflict with a write delegation and WRITE conflicts with a read delegation.

When a client holds a delegation, it needs to ensure that the stateid sent conveys the association of the operation with the delegation, to avoid having the delegation needlessly recalled.  When the delegation stateid, an open stateid associated with that delegation, or a stateid representing byte-range locks derived from such an open is used, the server knows that the READ, WRITE, or SETATTR does not conflict with the delegation, but is sent under the aegis of the delegation.  Even though it is possible for the server to determine from the client ID (via the session ID) that the client does in fact have a delegation, the server is not obliged to check this, so using a special stateid can result in avoidable recall of the delegation.

9.2.  Lock Ranges

The protocol allows a lock-owner to request a lock with a byte range and then either upgrade, downgrade, or unlock a sub-range of the initial lock, or a range that overlaps, fully or partially, either that initial lock or a combination of existing locks for the same lock-owner.  It is expected that this will be an uncommon type of request.  In any case, servers or server file systems may not be able to support sub-range lock semantics.  In the event that a server receives a locking request that represents a sub-range of current locking state for the lock-owner, the server is allowed to return the error NFS4ERR_LOCK_RANGE to signify that it does not support sub-range lock operations.  Therefore, the client should be prepared to receive this error and, if appropriate, report the error to the requesting application.

The client is discouraged from combining multiple independent locking ranges that happen to be adjacent into a single request since the server may not support sub-range requests and for reasons related to the recovery of file locking state in the event of server failure.
8381 As discussed in Section 8.4.2, the server may employ certain 8382 optimizations during recovery that work effectively only when the 8383 client's behavior during lock recovery is similar to the client's 8384 locking behavior prior to server failure. 8386 9.3. Upgrading and Downgrading Locks 8388 If a client has a write lock on a byte-range, it can request an 8389 atomic downgrade of the lock to a read lock via the LOCK request, by 8390 setting the type to READ_LT. If the server supports atomic 8391 downgrade, the request will succeed. If not, it will return 8392 NFS4ERR_LOCK_NOTSUPP. The client should be prepared to receive this 8393 error, and if appropriate, report the error to the requesting 8394 application. 8396 If a client has a read lock on a byte-range, it can request an atomic 8397 upgrade of the lock to a write lock via the LOCK request by setting 8398 the type to WRITE_LT or WRITEW_LT. If the server does not support 8399 atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade 8400 can be achieved without an existing conflict, the request will 8401 succeed. Otherwise, the server will return either NFS4ERR_DENIED or 8402 NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the 8403 client sent the LOCK request with the type set to WRITEW_LT and the 8404 server has detected a deadlock. The client should be prepared to 8405 receive such errors and if appropriate, report the error to the 8406 requesting application. 8408 9.4. Stateid Seqid Values and Byte-Range Locks 8410 When a lock or unlock request is done, passing a stateid, the stateid 8411 returned has the same "other" value and a "seqid" value that is 8412 incremented to reflect the occurrence of the lock or unlock request. 8413 The server MUST increment the value of the "seqid" field whenever 8414 there is any change to the locking status of any byte offset as 8415 described by any of locks covered by the stateid. A change in 8416 locking status includes a change from locked to unlocked or the 8417 reverse or a change from being locked for read to being locked for 8418 write or the reverse. 8420 When there is no such change, as, for example when a range already 8421 locked for write is locked again for write, the server MAY increment 8422 the "seqid" value. 8424 9.5. Issues with Multiple Open-Owners 8426 When the same file is opened by multiple open-owners, a client will 8427 have multiple open stateids for that file, each associated with a 8428 different open-owner. In that case, there can be multiple LOCK and 8429 LOCKU requests for the same lock-owner issued using the different 8430 open stateids, and so a situation may arise in which there are 8431 multiple stateids, each representing byte-range locks on the same 8432 file and held by the same lock-owner but each associated with a 8433 different open-owner. 8435 In such a situation, the locking status of each byte (i.e. whether it 8436 is locked, the read or write mode of the lock and the lock-owner 8437 holding the lock) MUST reflect the last LOCK or LOCKU operation done 8438 for the lock-owner in question, independent of the stateid through 8439 which the request was issued. 8441 When a byte is locked by the lock-owner in question, the open-owner 8442 to which that lock is assigned SHOULD be that of the open-owner 8443 associated with the stateid through which the last LOCK of that byte 8444 was done. 
When there is a change in the open-owner associated with 8445 locks for the stateid through which a LOCK or LOCKU was done, the 8446 "seqid" field of the stateid MUST be incremented, even if the 8447 locking, in terms of lock-owners has not changed. When there is a 8448 change to the set of locked bytes associated with a different stateid 8449 for the same lock-owner, i.e. associated with a different open-owner, 8450 the "seqid" value for that stateid MUST NOT be incremented. 8452 9.6. Blocking Locks 8454 Some clients require the support of blocking locks. While NFSv4.1 8455 provides a callback when a previously unavailable lock becomes 8456 available, this is an OPTIONAL feature and clients cannot depend on 8457 its presence. Clients need to be prepared to continually poll for 8458 the lock. This presents a fairness problem. Two of the lock types, 8459 READW and WRITEW, are used to indicate to the server that the client 8460 is requesting a blocking lock. When the callback is not used, the 8461 server should maintain an ordered list of pending blocking locks. 8462 When the conflicting lock is released, the server may wait for the 8463 period of time equal to lease_time for the first waiting client to 8464 re-request the lock. After the lease period expires, the next 8465 waiting client request is allowed the lock. Clients are required to 8466 poll at an interval sufficiently small that it is likely to acquire 8467 the lock in a timely manner. The server is not required to maintain 8468 a list of pending blocked locks as it is used to increase fairness 8469 and not correct operation. Because of the unordered nature of crash 8470 recovery, storing of lock state to stable storage would be required 8471 to guarantee ordered granting of blocking locks. 8473 Servers may also note the lock types and delay returning denial of 8474 the request to allow extra time for a conflicting lock to be 8475 released, allowing a successful return. In this way, clients can 8476 avoid the burden of needlessly frequent polling for blocking locks. 8477 The server should take care in the length of delay in the event the 8478 client retransmits the request. 8480 If a server receives a blocking lock request, denies it, and then 8481 later receives a nonblocking request for the same lock, which is also 8482 denied, then it should remove the lock in question from its list of 8483 pending blocking locks. Clients should use such a nonblocking 8484 request to indicate to the server that this is the last time they 8485 intend to poll for the lock, as may happen when the process 8486 requesting the lock is interrupted. This is a courtesy to the 8487 server, to prevent it from unnecessarily waiting a lease period 8488 before granting other lock requests. However, clients are not 8489 required to perform this courtesy, and servers must not depend on 8490 them doing so. Also, clients must be prepared for the possibility 8491 that this final locking request will be accepted. 8493 When server indicates, via the flag OPEN4_RESULT_MAY_NOTIFY_LOCK, 8494 that CB_NOTIFY_LOCK callbacks will be done for the current open file, 8495 the client should take notice of this, but, since this is a hint, 8496 cannot rely on a CB_NOTIFY_LOCK always being done. A client may 8497 reasonably reduce the frequency with which it polls for a denied 8498 lock, since the greater latency that might occur is likely to be 8499 eliminated given a prompt callback, but it still needs to poll. 
When it receives a CB_NOTIFY_LOCK, it should promptly try to obtain the lock, but it should be aware that other clients may be polling and that the server is under no obligation to reserve the lock for that particular client.

9.7.  Share Reservations

A share reservation is a mechanism to control access to a file.  It is a separate and independent mechanism from byte-range locking.  When a client opens a file, it sends an OPEN operation to the server specifying the type of access required (READ, WRITE, or BOTH) and the type of access to deny others (deny NONE, READ, WRITE, or BOTH).  If the OPEN fails, the client will fail the application's open request.

Pseudo-code definition of the semantics:

   if (request.access == 0) {
       return (NFS4ERR_INVAL);
   } else {
       if ((request.access & file_state.deny) ||
           (request.deny & file_state.access)) {
           return (NFS4ERR_DENIED);
       }
   }
   return (NFS4ERR_OK);

When doing this checking of share reservations on OPEN, the current file_state used in the algorithm includes bits that reflect all current opens, including those for the open-owner making the new OPEN request.

The constants used for the OPEN and OPEN_DOWNGRADE operations for the access and deny fields are as follows:

   const OPEN4_SHARE_ACCESS_READ   = 0x00000001;
   const OPEN4_SHARE_ACCESS_WRITE  = 0x00000002;
   const OPEN4_SHARE_ACCESS_BOTH   = 0x00000003;

   const OPEN4_SHARE_DENY_NONE     = 0x00000000;
   const OPEN4_SHARE_DENY_READ     = 0x00000001;
   const OPEN4_SHARE_DENY_WRITE    = 0x00000002;
   const OPEN4_SHARE_DENY_BOTH     = 0x00000003;

9.8.  OPEN/CLOSE Operations

To provide correct share semantics, a client MUST use the OPEN operation to obtain the initial filehandle and indicate the desired access and what access, if any, to deny.  Even if the client intends to use a special stateid for anonymous state or read bypass, it must still obtain the filehandle for the regular file with the OPEN operation so the appropriate share semantics can be applied.  For clients that do not have a deny mode built into their open programming interfaces, deny equal to NONE should be used (see the illustrative sketch at the end of this section).

The OPEN operation with the CREATE flag also subsumes the CREATE operation for regular files as used in previous versions of the NFS protocol.  This allows a create with a share to be done atomically.

The CLOSE operation removes all share reservations held by the open-owner on that file.  If byte-range locks are held, the client SHOULD release all locks before issuing a CLOSE.  The server MAY free all outstanding locks on CLOSE, but some servers may not support the CLOSE of a file that still has byte-range locks held.  The server MUST return failure, NFS4ERR_LOCKS_HELD, if any locks would exist after the CLOSE.

The LOOKUP operation will return a filehandle without establishing any lock state on the server.  Without a valid stateid, the server will assume the client has the least access.  For example, a file opened with deny READ/WRITE using a filehandle obtained through LOOKUP could only be read using the special read bypass stateid and could not be written at all because it would not have a valid stateid and the special anonymous stateid would not be allowed access.
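As an illustration of the mapping described above, a client whose local open interface has no notion of deny modes might fill in the OPEN share_access and share_deny arguments as sketched below.  The sketch is non-normative; the APP_WANT_* flags and the helper name are hypothetical, while the OPEN4_SHARE_* constants are those defined in Section 9.7.

   #include <stdint.h>

   /* Share access/deny constants from Section 9.7. */
   #define OPEN4_SHARE_ACCESS_READ   0x00000001
   #define OPEN4_SHARE_ACCESS_WRITE  0x00000002
   #define OPEN4_SHARE_DENY_NONE     0x00000000

   /* Illustrative application-level access flags (not NFSv4.1). */
   #define APP_WANT_READ   0x1
   #define APP_WANT_WRITE  0x2

   /* Fill in the share_access and share_deny arguments of OPEN. */
   static void
   map_open_args(unsigned app_flags, uint32_t *share_access,
                 uint32_t *share_deny)
   {
       *share_access = 0;
       if (app_flags & APP_WANT_READ)
           *share_access |= OPEN4_SHARE_ACCESS_READ;
       if (app_flags & APP_WANT_WRITE)
           *share_access |= OPEN4_SHARE_ACCESS_WRITE;

       /* No deny mode in the application interface: deny nothing. */
       *share_deny = OPEN4_SHARE_DENY_NONE;
   }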
9.9.  Open Upgrade and Downgrade

When an OPEN is done for a file and the open-owner for which the open is being done already has the file open, the result is to upgrade the open file status maintained on the server to include the access and deny bits specified by the new OPEN as well as those for the existing OPEN.  The result is that there is one open file, as far as the protocol is concerned, and it includes the union of the access and deny bits for all of the OPEN requests completed.  The open is represented by a single stateid whose "other" value matches that of the original open, and whose "seqid" value is incremented to reflect the occurrence of the upgrade.  The increment is required even in cases in which the "upgrade" results in no change to the open mode (e.g. an OPEN is done for read when the existing open file is opened for read-write).  Only a single CLOSE will be done to reset the effects of both OPENs.  The client may use either the stateid returned by the OPEN effecting the upgrade or a stateid sharing the same "other" field and a seqid of zero, although care needs to be taken as far as upgrades which happen while the CLOSE is pending.  Note that the client, when issuing the OPEN, may not know that the same file is in fact being opened.  The above only applies if both OPENs result in the OPENed object being designated by the same filehandle.

When the server chooses to export multiple filehandles corresponding to the same file object and returns different filehandles on two different OPENs of the same file object, the server MUST NOT "OR" together the access and deny bits and coalesce the two open files.  Instead, the server must maintain separate OPENs with separate stateids and will require separate CLOSEs to free them.

When multiple open files on the client are merged into a single open file object on the server, the close of one of the open files (on the client) may necessitate a change of the access and deny status of the open file on the server.  This is because the union of the access and deny bits for the remaining opens may be smaller (i.e. a proper subset) than previously.  The OPEN_DOWNGRADE operation is used to make the necessary change and the client should use it to update the server so that share reservation requests by other clients are handled properly.  The stateid returned has the same "other" field as that passed to the server.  The "seqid" value in the returned stateid MUST be incremented, even in situations in which there is no change to the access and deny bits for the file.

9.10.  Parallel OPENs

Unlike the case of NFSv4.0, in which OPEN operations for the same open-owner are inherently serialized because of the owner-based seqid, multiple OPENs for the same open-owner may be done in parallel.  When clients do this, they may encounter situations in which, because of the existence of hard links, two OPEN operations may turn out to open the same file, with a later OPEN performed being an upgrade of the first, with this fact only visible to the client once the operations complete.

In this situation, clients may determine the order in which the OPENs were performed by examining the stateids returned by the OPENs.  Stateids that share a common value of the "other" field can be recognized as having opened the same file, with the order of the operations determinable from the order of the "seqid" fields, modulo any possible wraparound of the 32-bit field.
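A client comparing two such stateids might order them as in the following non-normative sketch, which treats the 32-bit "seqid" as a wrapping counter.  The helper name seqid_newer() is hypothetical, the sketch ignores any special treatment of a "seqid" value of zero, and the comparison is only meaningful for stateids carrying the same "other" value.

   #include <stdbool.h>
   #include <stdint.h>

   /*
    * Return true if seqid "a" reflects a later state change than "b",
    * allowing for wraparound of the 32-bit counter.
    */
   static bool
   seqid_newer(uint32_t a, uint32_t b)
   {
       return (int32_t)(a - b) > 0;
   }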
When the possibility exists that the client will send multiple OPENs for the same open-owner in parallel, an open upgrade may happen without the client knowing beforehand that it will occur.  Because of this possibility, CLOSEs and OPEN_DOWNGRADEs should generally be sent with a non-zero seqid in the stateid, to avoid the possibility that the status change associated with an open upgrade is inadvertently lost.

9.11.  Reclaim of Open and Byte-Range Locks

Special forms of the LOCK and OPEN operations are provided when it is necessary to re-establish byte-range locks or opens after a server failure.

o  To reclaim existing opens, an OPEN operation is performed using a claim type of CLAIM_PREVIOUS.  Because the client, in this type of situation, will have already opened the file and have the filehandle of the target file, this operation requires that the current filehandle be that of the target file, rather than a directory, and no file name is specified.

o  To reclaim byte-range locks, a LOCK operation with the reclaim parameter set to true is used.

Reclaims of opens associated with delegations are discussed in Section 10.2.1.

10.  Client-Side Caching

Client-side caching of data, of file attributes, and of file names is essential to providing good performance with the NFS protocol.  Providing distributed cache coherence is a difficult problem and previous versions of the NFS protocol have not attempted it.  Instead, several NFS client implementation techniques have been used to reduce the problems that a lack of coherence poses for users.  These techniques have not been clearly defined by earlier protocol specifications and it is often unclear what is valid or invalid client behavior.

The NFSv4.1 protocol uses many techniques similar to those that have been used in previous protocol versions.  The NFSv4.1 protocol does not provide distributed cache coherence.  However, it defines a more limited set of caching guarantees to allow locks and share reservations to be used without destructive interference from client-side caching.

In addition, the NFSv4.1 protocol introduces a delegation mechanism which allows many decisions normally made by the server to be made locally by clients.  This mechanism provides efficient support of the common cases where sharing is infrequent or where sharing is read-only.

10.1.  Performance Challenges for Client-Side Caching

Caching techniques used in previous versions of the NFS protocol have been successful in providing good performance.  However, several scalability challenges can arise when those techniques are used with very large numbers of clients.  This is particularly true when clients are geographically distributed, which classically increases the latency for cache revalidation requests.

The previous versions of the NFS protocol repeat their file data cache validation requests at the time the file is opened.  This behavior can have serious performance drawbacks.  A common case is one in which a file is only accessed by a single client.
Therefore, sharing is infrequent.

In this case, repeated reference to the server to find that no conflicts exist is expensive.  A better option with regard to performance is to allow a client that repeatedly opens a file to do so without reference to the server.  This is done until potentially conflicting operations from another client actually occur.

A similar situation arises in connection with file locking.  Sending file lock and unlock requests to the server as well as the read and write requests necessary to make data caching consistent with the locking semantics (see Section 10.3.2) can severely limit performance.  When locking is used to provide protection against infrequent conflicts, a large penalty is incurred.  This penalty may discourage the use of file locking by applications.

The NFSv4.1 protocol provides more aggressive caching strategies with the following design goals:

o  Compatibility with a large range of server semantics.

o  Providing the same caching benefits as previous versions of the NFS protocol when unable to support the more aggressive model.

o  Requirements for aggressive caching are organized so that a large portion of the benefit can be obtained even when not all of the requirements can be met.

The appropriate requirements for the server are discussed in later sections in which specific forms of caching are covered (see Section 10.4).

10.2.  Delegation and Callbacks

Recallable delegation of server responsibilities for a file to a client improves performance by avoiding repeated requests to the server in the absence of inter-client conflict.  With the use of a "callback" RPC from server to client, a server recalls delegated responsibilities when another client engages in sharing of a delegated file.

A delegation is passed from the server to the client, specifying the object of the delegation and the type of delegation.  There are different types of delegations but each type contains a stateid to be used to represent the delegation when performing operations that depend on the delegation.  This stateid is similar to those associated with locks and share reservations but differs in that the stateid for a delegation is associated with a client ID and may be used on behalf of all the open-owners for the given client.  A delegation is made to the client as a whole and not to any specific process or thread of control within it.

The backchannel is established by CREATE_SESSION and BIND_CONN_TO_SESSION, and the client is required to maintain it.  Because the backchannel may be down, even temporarily, correct protocol operation does not depend on it.  Preliminary testing of backchannel functionality by means of a CB_COMPOUND procedure with a single operation, CB_SEQUENCE, can be used to check the continuity of the backchannel.  A server avoids delegating responsibilities until it has determined that the backchannel exists.  Because the granting of a delegation is always conditional upon the absence of conflicting access, clients must not assume that a delegation will be granted and they must always be prepared for OPENs, WANT_DELEGATIONs, and GET_DIR_DELEGATIONs to be processed without any delegations being granted.

Once granted, a delegation behaves in many ways like a lock.
There is an associated lease that is subject to renewal together with all of the other leases held by that client.

Unlike locks, an operation by a second client to a delegated file will cause the server to recall a delegation through a callback.  For individual operations, we will describe, under IMPLEMENTATION, when such operations are required to effect a recall.  A number of points should be noted, however.

o  The server is free to recall a delegation whenever it feels it is desirable and may do so even if no operations requiring recall are being done.

o  Operations done outside the NFSv4 protocol, due to, for example, access by other protocols, or by local access, also need to result in delegation recall when they make analogous changes to file system data.  What is crucial is whether the change would invalidate the guarantees provided by the delegation.  When this is possible, the delegation needs to be recalled and must be returned or revoked before allowing the operation to proceed.

o  The semantics of the file system are crucial in defining when delegation recall is required.  If a particular change within a specific implementation causes change to a file attribute, then delegation recall is required, whether or not that operation has been specifically listed as requiring delegation recall.  Again, what is critical is whether the guarantees provided by the delegation are being invalidated.

Despite those caveats, the implementation sections for a number of operations describe situations in which delegation recall would be required under some common circumstances:

o  For GETATTR, see Section 18.7.4.

o  For OPEN, see Section 18.16.4.

o  For READ, see Section 18.22.4.

o  For REMOVE, see Section 18.25.4.

o  For RENAME, see Section 18.26.4.

o  For SETATTR, see Section 18.30.4.

o  For WRITE, see Section 18.32.4.

On recall, the client holding the delegation must flush modified state (such as modified data) to the server and return the delegation.  The conflicting request will not be acted on until the recall is complete.  The recall is considered complete when the client returns the delegation or the server times out its wait for the delegation to be returned and revokes the delegation as a result of the timeout.  In the interim, the server will either delay responding to conflicting requests or respond to them with NFS4ERR_DELAY.  Following the resolution of the recall, the server has the information necessary to grant or deny the second client's request.

At the time the client receives a delegation recall, it may have substantial state that needs to be flushed to the server.  Therefore, the server should allow sufficient time for the delegation to be returned since it may involve numerous RPCs to the server.  If the server is able to determine that the client is diligently flushing state to the server as a result of the recall, the server may extend the usual time allowed for a recall.  However, the time allowed for recall completion should not be unbounded.

An example of this is when responsibility to mediate opens on a given file is delegated to a client (see Section 10.4).  The server will not know what opens are in effect on the client.
Without this 8836 knowledge the server will be unable to determine if the access and 8837 deny state for the file allows any particular open until the 8838 delegation for the file has been returned. 8840 A client failure or a network partition can result in failure to 8841 respond to a recall callback. In this case, the server will revoke 8842 the delegation which in turn will render useless any modified state 8843 still on the client. 8845 10.2.1. Delegation Recovery 8847 There are three situations that delegation recovery must deal with: 8849 o Client restart 8851 o Server restart 8853 o Network partition (full or backchannel-only) 8855 In the event the client restarts, the failure to renew the lease will 8856 result in the revocation of byte-range locks and share reservations. 8857 Delegations, however, may be treated a bit differently. 8859 There will be situations in which delegations will need to be 8860 reestablished after a client restarts. The reason for this is the 8861 client may have file data stored locally and this data was associated 8862 with the previously held delegations. The client will need to 8863 reestablish the appropriate file state on the server. 8865 To allow for this type of client recovery, the server MAY extend the 8866 period for delegation recovery beyond the typical lease expiration 8867 period. This implies that requests from other clients that conflict 8868 with these delegations will need to wait. Because the normal recall 8869 process may require significant time for the client to flush changed 8870 state to the server, other clients need be prepared for delays that 8871 occur because of a conflicting delegation. This longer interval 8872 would increase the window for clients to restart and consult stable 8873 storage so that the delegations can be reclaimed. For open 8874 delegations, such delegations are reclaimed using OPEN with a claim 8875 type of CLAIM_DELEGATE_PREV or CLAIM_DELEG_PREV_FH (See Section 10.5 8876 and Section 18.16 for discussion of open delegation and the details 8877 of OPEN respectively). 8879 A server MAY support claim types of CLAIM_DELEGATE_PREV and 8880 CLAIM_DELEG_PREV_FH, and if it does, it MUST NOT remove delegations 8881 upon a CREATE_SESSION that confirms a client ID created by 8882 EXCHANGE_ID, and instead MUST, for a period of time no less than that 8883 of the value of the lease_time attribute, maintain the client's 8884 delegations to allow time for the client to send CLAIM_DELEGATE_PREV 8885 requests. The server that supports CLAIM_DELEGATE_PREV and/or 8886 CLAIM_DELEG_PREV_FH MUST support the DELEGPURGE operation. 8888 When the server restarts, delegations are reclaimed (using the OPEN 8889 operation with CLAIM_PREVIOUS) in a similar fashion to byte-range 8890 locks and share reservations. However, there is a slight semantic 8891 difference. In the normal case if the server decides that a 8892 delegation should not be granted, it performs the requested action 8893 (e.g. OPEN) without granting any delegation. For reclaim, the 8894 server grants the delegation but a special designation is applied so 8895 that the client treats the delegation as having been granted but 8896 recalled by the server. Because of this, the client has the duty to 8897 write all modified state to the server and then return the 8898 delegation. 
This process of handling delegation reclaim reconciles three principles of the NFSv4.1 protocol:

o  Upon reclaim, a client reporting resources assigned to it by an earlier server instance must be granted those resources.

o  The server has unquestionable authority to determine whether delegations are to be granted and, once granted, whether they are to be continued.

o  The use of callbacks is not to be depended upon until the client has proven its ability to receive them.

When a client needs to reclaim a delegation and there is no associated open, the client may use the CLAIM_PREVIOUS variant of the WANT_DELEGATION operation.  However, since the server is not required to support this operation, an alternative is to reclaim via a dummy open together with the delegation using an OPEN of type CLAIM_PREVIOUS.  The dummy open file can be released using a CLOSE to re-establish the original state to be reclaimed, a delegation without an associated open.

When a client has more than a single open associated with a delegation, state for those additional opens can be established using OPEN operations of type CLAIM_DELEGATE_CUR.  When these are used to establish opens associated with reclaimed delegations, the server MUST allow them when made within the grace period.

When a network partition occurs, delegations are subject to freeing by the server when the lease renewal period expires.  This is similar to the behavior for locks and share reservations.  For delegations, however, the server may extend the period in which conflicting requests are held off.  Eventually, the occurrence of a conflicting request from another client will cause revocation of the delegation.  A loss of the backchannel (e.g. by later network configuration change) will have the same effect.  A recall request will fail and revocation of the delegation will result.

A client normally finds out about revocation of a delegation when it uses a stateid associated with a delegation and receives one of the errors NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or NFS4ERR_DELEG_REVOKED.  It also may find out about delegation revocation after a client restart when it attempts to reclaim a delegation and receives that same error.  Note that in the case of a revoked write open delegation, there are issues because data may have been modified by the client whose delegation is revoked and separately by other clients.  See Section 10.5.1 for a discussion of such issues.  Note also that when delegations are revoked, information about the revoked delegation will be written by the server to stable storage (as described in Section 8.4.3).  This is done to deal with the case in which a server restarts after revoking a delegation but before the client holding the revoked delegation is notified about the revocation.
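A client might centralize its discovery of delegation revocation along the lines of the following non-normative sketch.  The enumeration and helper name are hypothetical stand-ins; the protocol's actual status codes and their numeric assignments appear elsewhere in this document.

   #include <stdbool.h>

   /* Stand-ins for the protocol status codes of interest. */
   enum deleg_status {
       ST_NFS4ERR_EXPIRED,
       ST_NFS4ERR_ADMIN_REVOKED,
       ST_NFS4ERR_DELEG_REVOKED,
       ST_OTHER
   };

   /*
    * Returns true if an error received on an operation that used a
    * delegation stateid indicates that the delegation has been
    * revoked, in which case locally cached modified state guarded by
    * the delegation must be dealt with (see Section 10.5.1).
    */
   static bool
   delegation_was_revoked(enum deleg_status s)
   {
       return s == ST_NFS4ERR_EXPIRED ||
              s == ST_NFS4ERR_ADMIN_REVOKED ||
              s == ST_NFS4ERR_DELEG_REVOKED;
   }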
10.3.  Data Caching

When applications share access to a set of files, they need to be implemented so as to take account of the possibility of conflicting access by another application.  This is true whether the applications in question execute on different clients or reside on the same client.

Share reservations and byte-range locks are the facilities the NFSv4.1 protocol provides to allow applications to coordinate access by using mutual exclusion facilities.  The NFSv4.1 protocol's data caching must be implemented such that it does not invalidate the assumptions that those using these facilities depend upon.

10.3.1.  Data Caching and OPENs

In order to avoid invalidating the sharing assumptions that applications rely on, NFSv4.1 clients should not provide cached data to applications or modify it on behalf of an application when it would not be valid to obtain or modify that same data via a READ or WRITE operation.

Furthermore, in the absence of open delegation (see Section 10.4), two additional rules apply.  Note that these rules are obeyed in practice by many NFSv3 clients.

o  First, cached data present on a client must be revalidated after doing an OPEN.  Revalidating means that the client fetches the change attribute from the server, compares it with the cached change attribute, and if different, declares the cached data (as well as the cached attributes) as invalid.  This is to ensure that the data for the OPENed file is still correctly reflected in the client's cache.  This validation must be done at least when the client's OPEN operation includes DENY=WRITE or BOTH, thus terminating a period in which other clients may have had the opportunity to open the file with WRITE access.  Clients may choose to do the revalidation more often (i.e. at OPENs specifying DENY=NONE) to parallel the NFSv3 protocol's practice for the benefit of users assuming this degree of cache revalidation.

   Since the change attribute is updated for data and metadata modifications, some client implementors may be tempted to use the time_modify attribute and not the change attribute to validate cached data, so that metadata changes do not spuriously invalidate clean data.  The implementor is cautioned against this approach.  The change attribute is guaranteed to change for each update to the file, whereas time_modify is guaranteed to change only at the granularity of the time_delta attribute.  Use of time_modify rather than change by the client's data cache validation logic runs the risk of the client incorrectly marking stale data as valid.

o  Second, modified data must be flushed to the server before closing a file OPENed for write.  This is complementary to the first rule.  If the data is not flushed at CLOSE, the revalidation done after the client OPENs a file is unable to achieve its purpose.  The other aspect to flushing the data before close is that the data must be committed to stable storage, at the server, before the CLOSE operation is requested by the client.  In the case of a server restart and a CLOSEd file, it may not be possible to retransmit the data to be written to the file.  Hence, this requirement.

10.3.2.  Data Caching and File Locking

For those applications that choose to use file locking instead of share reservations to exclude inconsistent file access, there is an analogous set of constraints that apply to client-side data caching.  These rules are effective only if the file locking is used in a way that matches in an equivalent way the actual READ and WRITE operations executed.  This is as opposed to file locking that is based on pure convention.
For example, it is possible to manipulate 9022 a two-megabyte file by dividing the file into two one-megabyte 9023 regions and protecting access to the two regions by file locks on 9024 bytes zero and one. A lock for write on byte zero of the file would 9025 represent the right to do READ and WRITE operations on the first 9026 region. A lock for write on byte one of the file would represent the 9027 right to do READ and WRITE operations on the second region. As long 9028 as all applications manipulating the file obey this convention, they 9029 will work on a local file system. However, they may not work with 9030 the NFSv4.1 protocol unless clients refrain from data caching. 9032 The rules for data caching in the file locking environment are: 9034 o First, when a client obtains a file lock for a particular region, 9035 the data cache corresponding to that region (if any cache data 9036 exists) must be revalidated. If the change attribute indicates 9037 that the file may have been updated since the cached data was 9038 obtained, the client must flush or invalidate the cached data for 9039 the newly locked region. A client might choose to invalidate all 9040 of non-modified cached data that it has for the file but the only 9041 requirement for correct operation is to invalidate all of the data 9042 in the newly locked region. 9044 o Second, before releasing a write lock for a region, all modified 9045 data for that region must be flushed to the server. The modified 9046 data must also be written to stable storage. 9048 Note that flushing data to the server and the invalidation of cached 9049 data must reflect the actual byte ranges locked or unlocked. 9050 Rounding these up or down to reflect client cache block boundaries 9051 will cause problems if not carefully done. For example, writing a 9052 modified block when only half of that block is within an area being 9053 unlocked may cause invalid modification to the region outside the 9054 unlocked area. This, in turn, may be part of a region locked by 9055 another client. Clients can avoid this situation by synchronously 9056 performing portions of write operations that overlap that portion 9057 (initial or final) that is not a full block. Similarly, invalidating 9058 a locked area which is not an integral number of full buffer blocks 9059 would require the client to read one or two partial blocks from the 9060 server if the revalidation procedure shows that the data which the 9061 client possesses may not be valid. 9063 The data that is written to the server as a prerequisite to the 9064 unlocking of a region must be written, at the server, to stable 9065 storage. The client may accomplish this either with synchronous 9066 writes or by following asynchronous writes with a COMMIT operation. 9068 This is required because retransmission of the modified data after a 9069 server restart might conflict with a lock held by another client. 9071 A client implementation may choose to accommodate applications which 9072 use byte-range locking in non-standard ways (e.g. using a byte-range 9073 lock as a global semaphore) by flushing to the server more data upon 9074 an LOCKU than is covered by the locked range. This may include 9075 modified data within files other than the one for which the unlocks 9076 are being done. In such cases, the client must not interfere with 9077 applications whose READs and WRITEs are being done only within the 9078 bounds of byte-range locks which the application holds. 
For example, 9079 an application locks a single byte of a file and proceeds to write 9080 that single byte. A client that chose to handle a LOCKU by flushing 9081 all modified data to the server could validly write that single byte 9082 in response to an unrelated unlock. However, it would not be valid 9083 to write the entire block in which that single written byte was 9084 located since it includes an area that is not locked and might be 9085 locked by another client. Client implementations can avoid this 9086 problem by dividing files with modified data into those for which all 9087 modifications are done to areas covered by an appropriate byte-range 9088 lock and those for which there are modifications not covered by a 9089 byte-range lock. Any writes done for the former class of files must 9090 not include areas not locked and thus not modified on the client. 9092 10.3.3. Data Caching and Mandatory File Locking 9094 Client side data caching needs to respect mandatory file locking when 9095 it is in effect. The presence of mandatory file locking for a given 9096 file is indicated when the client gets back NFS4ERR_LOCKED from a 9097 READ or WRITE on a file it has an appropriate share reservation for. 9098 When mandatory locking is in effect for a file, the client must check 9099 for an appropriate file lock for data being read or written. If a 9100 lock exists for the range being read or written, the client may 9101 satisfy the request using the client's validated cache. If an 9102 appropriate file lock is not held for the range of the read or write, 9103 the read or write request must not be satisfied by the client's cache 9104 and the request must be sent to the server for processing. When a 9105 read or write request partially overlaps a locked region, the request 9106 should be subdivided into multiple pieces with each region (locked or 9107 not) treated appropriately. 9109 10.3.4. Data Caching and File Identity 9111 When clients cache data, the file data needs to be organized 9112 according to the file system object to which the data belongs. For 9113 NFSv3 clients, the typical practice has been to assume for the 9114 purpose of caching that distinct filehandles represent distinct file 9115 system objects. The client then has the choice to organize and 9116 maintain the data cache on this basis. 9118 In the NFSv4.1 protocol, there is now the possibility to have 9119 significant deviations from a "one filehandle per object" model 9120 because a filehandle may be constructed on the basis of the object's 9121 pathname. Therefore, clients need a reliable method to determine if 9122 two filehandles designate the same file system object. If clients 9123 were simply to assume that all distinct filehandles denote distinct 9124 objects and proceed to do data caching on this basis, caching 9125 inconsistencies would arise between the distinct client side objects 9126 which mapped to the same server side object. 9128 By providing a method to differentiate filehandles, the NFSv4.1 9129 protocol alleviates a potential functional regression in comparison 9130 with the NFSv3 protocol. Without this method, caching 9131 inconsistencies within the same client could occur and this has not 9132 been present in previous versions of the NFS protocol. Note that it 9133 is possible to have such inconsistencies with applications executing 9134 on multiple clients but that is not the issue being addressed here. 
For the purposes of data caching, the following steps allow an NFSv4.1 client to determine whether two distinct filehandles denote the same server-side object:

o  If GETATTR directed to two filehandles returns different values of the fsid attribute, then the filehandles represent distinct objects.

o  If GETATTR for any file with an fsid that matches the fsid of the two filehandles in question returns a unique_handles attribute with a value of TRUE, then the two objects are distinct.

o  If GETATTR directed to the two filehandles does not return the fileid attribute for both of the handles, then it cannot be determined whether the two objects are the same.  Therefore, operations which depend on that knowledge (e.g. client-side data caching) cannot be done reliably.  Note that if GETATTR does not return the fileid attribute for both filehandles, it will return it for neither of the filehandles, since the fsid for both filehandles is the same.

o  If GETATTR directed to the two filehandles returns different values for the fileid attribute, then they are distinct objects.

o  Otherwise they are the same object.

10.4.  Open Delegation

When a file is being OPENed, the server may delegate further handling of opens and closes for that file to the opening client.  Any such delegation is recallable, since the circumstances that allowed for the delegation are subject to change.  In particular, if the server receives a conflicting OPEN from another client, it must recall the delegation before deciding whether the OPEN from the other client may be granted.  Making a delegation is up to the server, and clients should not assume that any particular OPEN either will or will not result in an open delegation.  The following is a typical set of conditions that servers might use in deciding whether OPEN should be delegated:

o  The client must be able to respond to the server's callback requests.  If a backchannel has been established, the server will send a CB_COMPOUND request, containing a single operation, CB_SEQUENCE, for a test of backchannel availability.

o  The client must have responded properly to previous recalls.

o  There must be no current open conflicting with the requested delegation.

o  There should be no current delegation that conflicts with the delegation being requested.

o  The probability of future conflicting open requests should be low based on the recent history of the file.

o  The existence of any server-specific semantics of OPEN/CLOSE that would make the required handling incompatible with the prescribed handling that the delegated client would apply (see below).

There are two types of open delegations, read and write.  A read open delegation allows a client to handle, on its own, requests to open a file for reading that do not deny read access to others.  Multiple read open delegations may be outstanding simultaneously and do not conflict.  A write open delegation allows the client to handle, on its own, all opens.  Only one write open delegation may exist for a given file at a given time, and it is inconsistent with any read open delegations.
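For example, a client holding a read open delegation might decide whether a new application open can be handled without contacting the server along the lines of the following non-normative sketch.  The helper name is hypothetical; the OPEN4_SHARE_* constants are those defined in Section 9.7.

   #include <stdbool.h>
   #include <stdint.h>

   /* Share access/deny constants from Section 9.7. */
   #define OPEN4_SHARE_ACCESS_WRITE  0x00000002
   #define OPEN4_SHARE_DENY_READ     0x00000001

   /*
    * With a read open delegation, only opens for read that do not
    * deny read access to others may be handled locally; all other
    * opens must be sent to the server.  A write open delegation
    * would allow all opens to be handled locally, subject to the
    * permission check described later in this section.
    */
   static bool
   read_deleg_allows_local_open(uint32_t share_access,
                                uint32_t share_deny)
   {
       if (share_access & OPEN4_SHARE_ACCESS_WRITE)
           return false;
       if (share_deny & OPEN4_SHARE_DENY_READ)
           return false;
       return true;
   }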
9205 When a client has a read open delegation, it is assured that neither 9206 the contents, the attributes (with the exception of time_access), nor 9207 the names of any links to the file will change without its knowledge, 9208 so long as the delegation is held. When a client has a write open 9209 delegation, it may modify the file data locally since no other client 9210 will be accessing the file's data. The client holding a write 9211 delegation may only locally affect file attributes which are 9212 intimately connected with the file data: size, change, time_access, 9213 time_metadata, and time_modify. All other attributes must be 9214 reflected on the server. 9216 When a client has an open delegation, it does not need to send OPENs 9217 or CLOSEs to the server. Instead the client may update the 9218 appropriate status internally. For a read open delegation, opens 9219 that cannot be handled locally (opens for write or that deny read 9220 access) must be sent to the server. 9222 When an open delegation is made, the reply to the OPEN contains an 9223 open delegation structure which specifies the following: 9225 o the type of delegation (read or write). 9227 o space limitation information to control flushing of data on close 9228 (write open delegation only, see Section 10.4.1). 9230 o an nfsace4 specifying read and write permissions. 9232 o a stateid to represent the delegation for READ and WRITE. 9234 The delegation stateid is separate and distinct from the stateid for 9235 the OPEN proper. The standard stateid, unlike the delegation 9236 stateid, is associated with a particular lock-owner and will continue 9237 to be valid after the delegation is recalled and the file remains 9238 open. 9240 When a request internal to the client is made to open a file and an 9241 open delegation is in effect, it will be accepted or rejected solely 9242 on the basis of the following conditions. Any requirement for other 9243 checks to be made by the delegate should result in open delegation 9244 being denied so that the checks can be made by the server itself. 9246 o The access and deny bits for the request and the file as described 9247 in Section 9.7. 9249 o The read and write permissions as determined below. 9251 The nfsace4 passed with delegation can be used to avoid frequent 9252 ACCESS calls. The permission check should be as follows: 9254 o If the nfsace4 indicates that the open may be done, then it should 9255 be granted without reference to the server. 9257 o If the nfsace4 indicates that the open may not be done, then an 9258 ACCESS request must be sent to the server to obtain the definitive 9259 answer. 9261 The server may return an nfsace4 that is more restrictive than the 9262 actual ACL of the file. This includes an nfsace4 that specifies 9263 denial of all access. Note that some common practices such as 9264 mapping the traditional user "root" to the user "nobody" may make it 9265 incorrect to return the actual ACL of the file in the delegation 9266 response. 9268 The use of a delegation together with various other forms of caching 9269 creates the possibility that no server authentication and 9270 authorization will ever be performed for a given user since all of 9271 the user's requests might be satisfied locally. Where the client is 9272 depending on the server for authentication and authorization, the 9273 client should be sure authentication and authorization occurs for 9274 each user by use of the ACCESS operation. 
This should be the case even if an ACCESS operation would not be required otherwise.  As mentioned before, the server may enforce frequent authentication by returning an nfsace4 denying all access with every open delegation.

10.4.1.  Open Delegation and Data Caching

An OPEN delegation allows much of the message overhead associated with the opening and closing of files to be eliminated.  An open when an open delegation is in effect does not require that a validation message be sent to the server.  The continued endurance of the "read open delegation" provides a guarantee that no OPEN for write and thus no write has occurred.  Similarly, when closing a file opened for write, if a write open delegation is in effect, the data written does not have to be written to the server until the open delegation is recalled.  The continued endurance of the open delegation provides a guarantee that no open and thus no read or write has been done by another client.

For the purposes of open delegation, READs and WRITEs done without an OPEN are treated as the functional equivalents of a corresponding type of OPEN.  Although clients SHOULD NOT use special stateids when an open exists, delegation handling on the server can use the client ID associated with the current session to determine if the operation has been done by the holder of the delegation, in which case no recall is necessary, or by another client, in which case the delegation must be recalled and the I/O must not proceed until the delegation is returned or revoked.

With delegations, a client is able to avoid writing data to the server when the CLOSE of a file is serviced.  The file close system call is the usual point at which the client is notified of a lack of stable storage for the modified file data generated by the application.  At the close, file data is written to the server and through normal accounting the server is able to determine if the available file system space for the data has been exceeded (i.e. the server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT).  This accounting includes quotas.  The introduction of delegations requires that an alternative method be in place for the same type of communication to occur between client and server.

In the delegation response, the server provides either the limit of the size of the file or the number of modified blocks and associated block size.  The server must ensure that the client will be able to write modified data to the server of a size equal to that provided in the original delegation.  The server must make this assurance for all outstanding delegations.  Therefore, the server must be careful in its management of available space for new or modified data, taking into account available file system space and any applicable quotas.  The server can recall delegations as a result of managing the available file system space.  The client should abide by the server's stated space limits for delegations.  If the client exceeds the stated limits for the delegation, the server's behavior is undefined.

Based on server conditions, quotas, or available file system space, the server may grant write open delegations with very restrictive space limitations.  The limitations may be defined in a way that will always force modified data to be flushed to the server on close.
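The space limitation conveyed with a write open delegation might be applied by the client as in the following non-normative sketch.  The structure shown is a simplified stand-in for the protocol's space-limit encoding, and the field and function names are illustrative only.

   #include <stdbool.h>
   #include <stdint.h>

   /* Simplified stand-in for the delegation's space limitation. */
   struct deleg_space_limit {
       bool     limit_by_size;    /* true: limit is a file size     */
       uint64_t filesize;         /* valid when limit_by_size       */
       uint32_t num_blocks;       /* valid when !limit_by_size      */
       uint32_t bytes_per_block;  /* valid when !limit_by_size      */
   };

   /*
    * Return true if the client may continue to buffer modified data
    * locally; false means the data should be flushed (or the write
    * sent synchronously), since exceeding the stated limit leaves
    * the server's behavior undefined.
    */
   static bool
   within_deleg_limit(const struct deleg_space_limit *lim,
                      uint64_t projected_size, uint32_t dirty_blocks)
   {
       if (lim->limit_by_size)
           return projected_size <= lim->filesize;

       /* dirty_blocks counts modified blocks of bytes_per_block. */
       return dirty_blocks <= lim->num_blocks;
   }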
With respect to authentication, flushing modified data to the server
after a CLOSE has occurred may be problematic.  For example, the user
of the application may have logged off the client, and unexpired
authentication credentials may no longer be present.  In this case,
the client may need to take special care to ensure that local
unexpired credentials will in fact be available.  This may be
accomplished by tracking the expiration time of credentials and
flushing data well in advance of their expiration or by making
private copies of credentials to assure their availability when
needed.

10.4.2.  Open Delegation and File Locks

When a client holds a write open delegation, lock operations are
performed locally.  This includes those required for mandatory file
locking.  This can be done since the delegation implies that there
can be no conflicting locks.  Similarly, all of the revalidations
that would normally be associated with obtaining locks and the
flushing of data associated with the releasing of locks need not be
done.

When a client holds a read open delegation, lock operations are not
performed locally.  All lock operations, including those requesting
non-exclusive locks, are sent to the server for resolution.

10.4.3.  Handling of CB_GETATTR

The server needs to employ special handling for a GETATTR where the
target is a file that has a write open delegation in effect.  The
reason for this is that the client holding the write delegation may
have modified the data, and the server needs to reflect this change
to the second client that submitted the GETATTR.  Therefore, the
client holding the write delegation needs to be interrogated.  The
server will use the CB_GETATTR operation.  The only attributes that
the server can reliably query via CB_GETATTR are size and change.

Since CB_GETATTR is being used to satisfy another client's GETATTR
request, the server only needs to know if the client holding the
delegation has a modified version of the file.  If the client's copy
of the delegated file is not modified (data or size), the server can
satisfy the second client's GETATTR request from the attributes
stored locally at the server.  If the file is modified, the server
only needs to know about this modified state.  If the server
determines that the file is currently modified, it will respond to
the second client's GETATTR as if the file had been modified locally
at the server.

Since the form of the change attribute is determined by the server
and is opaque to the client, the client and server need to agree on a
method of communicating the modified state of the file.  For the size
attribute, the client will report its current view of the file size.
For the change attribute, the handling is more involved.

For the client, the following steps will be taken when receiving a
write delegation (a sketch of the resulting CB_GETATTR handling
follows the list):

o  The value of the change attribute will be obtained from the server
   and cached.  Let this value be represented by c.

o  The client will create a value greater than c that will be used
   for communicating that modified data is held at the client.  Let
   this value be represented by d.

o  When the client is queried via CB_GETATTR for the change
   attribute, it checks to see if it holds modified data.  If the
   file is modified, the value d is returned for the change attribute
   value.  If the file is not currently modified, the client returns
   the value c for the change attribute.
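A minimal sketch of the client side of this scheme follows.  The
structure and function names are invented for the illustration and
are not part of the protocol; the only substantive point is that the
client returns d (here c + 1, one valid choice of a value greater
than c) while it holds modified data, and c otherwise.

   #include <stdbool.h>
   #include <stdint.h>

   /* Per-delegation state kept by the client (illustrative). */
   struct wdeleg_state {
       uint64_t c;        /* change attribute cached at grant time    */
       bool     modified; /* true if dirty data or metadata is cached */
   };

   /* Value reported for the change attribute in a CB_GETATTR reply. */
   uint64_t cb_getattr_change(const struct wdeleg_state *ws)
   {
       /* d need only be some value greater than c; c + 1 suffices and
        * may be returned for every CB_GETATTR while data is dirty. */
       return ws->modified ? ws->c + 1 : ws->c;
   }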
For simplicity of implementation, the client MAY for each CB_GETATTR
return the same value d.  This is true even if, between successive
CB_GETATTR operations, the client again modifies the file's data or
metadata in its cache.  The client can return the same value because
the only requirement is that the client be able to indicate to the
server that the client holds modified data.  Therefore, the value of
d may always be c + 1.

While the change attribute is opaque to the client in the sense that
it has no idea what units of time, if any, the server is counting
change with, it is not opaque in that the client has to treat it as
an unsigned integer, and the server has to be able to see the results
of the client's changes to that integer.  Therefore, the server MUST
encode the change attribute in network order when sending it to the
client.  The client MUST decode it from network order to its native
order when receiving it, and the client MUST encode it in network
order when sending it to the server.  For this reason, change is
defined as an unsigned integer rather than an opaque array of bytes.

For the server, the following steps will be taken when providing a
write delegation:

o  Upon providing a write delegation, the server will cache a copy of
   the change attribute in the data structure it uses to record the
   delegation.  Let this value be represented by sc.

o  When a second client sends a GETATTR operation on the same file to
   the server, the server obtains the change attribute from the first
   client.  Let this value be cc.

o  If the value cc is equal to sc, the file is not modified and the
   server returns the current values for change, time_metadata, and
   time_modify (for example) to the second client.

o  If the value cc is NOT equal to sc, the file is currently modified
   at the first client and most likely will be modified at the server
   at a future time.  The server then uses its current time to
   construct attribute values for time_metadata and time_modify.  A
   new value of sc, which we will call nsc, is computed by the
   server, such that nsc >= sc + 1.  The server then returns the
   constructed time_metadata, time_modify, and nsc values to the
   requester.  The server replaces sc in the delegation record with
   nsc.  To prevent time_modify, time_metadata, and change from
   appearing to go backward (which would happen if the client holding
   the delegation fails to write its modified data to the server
   before the delegation is revoked or returned), the server SHOULD
   update the file's metadata record with the constructed attribute
   values.  For performance reasons, committing the constructed
   attribute values to stable storage is OPTIONAL.

As discussed earlier in this section, the client MAY return the same
cc value on subsequent CB_GETATTR calls, even if the file was
modified in the client's cache yet again between successive
CB_GETATTR calls.  Therefore, the server must assume that the file
has been modified yet again, and MUST take care to ensure that the
new nsc it constructs and returns is greater than the previous nsc it
returned.
An example implementation's delegation record would satisfy this
mandate by including a boolean field (let us call it "modified") that
is set to FALSE when the delegation is granted, and an sc value set
at the time of grant to the change attribute value.  The modified
field would be set to TRUE the first time cc != sc, and would stay
TRUE until the delegation is returned or revoked.  The processing for
constructing nsc, time_modify, and time_metadata would use this
pseudo code:

   if (!modified) {
       do CB_GETATTR for change and size;

       if (cc != sc)
           modified = TRUE;
   } else {
       do CB_GETATTR for size;
   }

   if (modified) {
       sc = sc + 1;
       time_modify = time_metadata = current_time;
       update sc, time_modify, time_metadata into file's metadata;
   }

The server would then return to the client (that sent GETATTR) the
attributes it requested, making sure that the size comes from what
CB_GETATTR returned.  The server would not update the file's metadata
with the client's modified size.

If the file attribute size is different from the server's current
value, the server treats this as a modification regardless of the
value of the change attribute retrieved via CB_GETATTR and responds
to the second client as in the last step.

This methodology resolves issues of clock differences between client
and server and other scenarios where the use of CB_GETATTR breaks
down.

It should be noted that the server is under no obligation to use
CB_GETATTR, and therefore the server MAY simply recall the delegation
to avoid its use.

10.4.4.  Recall of Open Delegation

The following events necessitate recall of an open delegation:

o  Potentially conflicting OPEN request (or READ/WRITE done with
   "special" stateid)

o  SETATTR sent by another client

o  REMOVE request for the file

o  RENAME request for the file as either source or target of the
   RENAME

Whether a RENAME of a directory in the path leading to the file
results in recall of an open delegation depends on the semantics of
the server's file system.  If that file system denies such RENAMEs
when a file is open, the recall must be performed to determine
whether the file in question is, in fact, open.

In addition to the situations above, the server may choose to recall
open delegations at any time if resource constraints make it
advisable to do so.  Clients should always be prepared for the
possibility of recall.

When a client receives a recall for an open delegation, it needs to
update state on the server before returning the delegation.  These
same updates must be done whenever a client chooses to return a
delegation voluntarily.  The following items of state need to be
dealt with:

o  If the file associated with the delegation is no longer open and
   no previous CLOSE operation has been sent to the server, a CLOSE
   operation must be sent to the server.

o  If a file has other open references at the client, then OPEN
   operations must be sent to the server.  The appropriate stateids
   will be provided by the server for subsequent use by the client
   since the delegation stateid will no longer be valid.  These OPEN
   requests are done with the claim type of CLAIM_DELEGATE_CUR.
This 9540 will allow the presentation of the delegation stateid so that the 9541 client can establish the appropriate rights to perform the OPEN. 9542 (see the Section 18.16 which describes the OPEN" operation for 9543 details.) 9545 o If there are granted file locks, the corresponding LOCK operations 9546 need to be performed. This applies to the write open delegation 9547 case only. 9549 o For a write open delegation, if at the time of recall the file is 9550 not open for write, all modified data for the file must be flushed 9551 to the server. If the delegation had not existed, the client 9552 would have done this data flush before the CLOSE operation. 9554 o For a write open delegation when a file is still open at the time 9555 of recall, any modified data for the file needs to be flushed to 9556 the server. 9558 o With the write open delegation in place, it is possible that the 9559 file was truncated during the duration of the delegation. For 9560 example, the truncation could have occurred as a result of an OPEN 9561 UNCHECKED with a size attribute value of zero. Therefore, if a 9562 truncation of the file has occurred and this operation has not 9563 been propagated to the server, the truncation must occur before 9564 any modified data is written to the server. 9566 In the case of write open delegation, file locking imposes some 9567 additional requirements. To precisely maintain the associated 9568 invariant, it is required to flush any modified data in any region 9569 for which a write lock was released while the write delegation was in 9570 effect. However, because the write open delegation implies no other 9571 locking by other clients, a simpler implementation is to flush all 9572 modified data for the file (as described just above) if any write 9573 lock has been released while the write open delegation was in effect. 9575 An implementation need not wait until delegation recall (or deciding 9576 to voluntarily return a delegation) to perform any of the above 9577 actions, if implementation considerations (e.g. resource availability 9578 constraints) make that desirable. Generally, however, the fact that 9579 the actual open state of the file may continue to change makes it not 9580 worthwhile to send information about opens and closes to the server, 9581 except as part of delegation return. Only in the case of closing the 9582 open that resulted in obtaining the delegation would clients be 9583 likely to do this early, since, in that case, the close once done 9584 will not be undone. Regardless of the client's choices on scheduling 9585 these actions, all must be performed before the delegation is 9586 returned, including (when applicable) the close that corresponds to 9587 the open that resulted in the delegation. These actions can be 9588 performed either in previous requests or in previous operations in 9589 the same COMPOUND request. 9591 10.4.5. Clients that Fail to Honor Delegation Recalls 9593 A client may fail to respond to a recall for various reasons, such as 9594 a failure of the backchannel from server to the client. The client 9595 may be unaware of a failure in the backchannel. This lack of 9596 awareness could result in the client finding out long after the 9597 failure that its delegation has been revoked, and another client has 9598 modified the data for which the client had a delegation. This is 9599 especially a problem for the client that held a write delegation. 
10.4.6.  Delegation Revocation

At the point a delegation is revoked, if there are associated opens
on the client, these opens may or may not be revoked.  If no lock or
open is granted that is inconsistent with the existing open, the
stateid for the open may remain valid, and be disconnected from the
revoked delegation, just as would be the case if the delegation were
returned.

For example, if an OPEN for read-write with DENY=NONE is associated
with the delegation, granting of another such OPEN to a different
client will revoke the delegation but need not revoke the OPEN, since
no lock inconsistent with that OPEN has been granted.  On the other
hand, if an OPEN denying write is granted, then the existing open
must be revoked.

When opens and/or locks are revoked, the applications holding these
opens or locks need to be notified.  This notification usually occurs
by returning errors for READ/WRITE operations or when a close is
attempted for the open file.

If no opens exist for the file at the point the delegation is
revoked, then notification of the revocation is unnecessary.
However, if there is modified data present at the client for the
file, the user of the application should be notified.  Unfortunately,
it may not be possible to notify the user since active applications
may not be present at the client.  See Section 10.5.1 for additional
details.

10.4.7.  Delegations via WANT_DELEGATION

In addition to providing delegations as part of the reply to OPEN
operations, servers MAY provide delegations separately from OPEN, via
the OPTIONAL WANT_DELEGATION operation.  This allows delegations to
be obtained in advance of an OPEN that might benefit from them, for
objects that are not a valid target of OPEN, or to deal with cases in
which a delegation has been recalled and the client wants to make an
attempt to re-establish it if the absence of use by other clients
allows that.

The WANT_DELEGATION operation may be performed on any type of file
object other than a directory.
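As an illustration of the kind of client policy involved (nothing
here is mandated by the protocol, and the types and helper are
invented for the sketch), a client might decide when to issue
WANT_DELEGATION along the following lines, retrying a delegation that
was recalled earlier once it believes the conflicting use by other
clients has ended.

   #include <stdbool.h>
   #include <time.h>

   /* Illustrative per-object record kept by a client. */
   struct file_interest {
       bool   expect_heavy_reuse; /* anticipates repeated access      */
       bool   deleg_recalled;     /* a delegation was recalled before */
       time_t recall_time;        /* when that recall happened        */
   };

   /* Quiet period before trying again; purely a local policy choice. */
   #define REESTABLISH_DELAY 30 /* seconds */

   /* Decide whether to send WANT_DELEGATION for this object now. */
   bool should_want_delegation(const struct file_interest *fi,
                               time_t now)
   {
       if (!fi->expect_heavy_reuse)
           return false;
       if (fi->deleg_recalled)
           /* Re-establish only after a quiet period, in the hope that
            * the conflicting use that caused the recall has ended. */
           return now - fi->recall_time >= REESTABLISH_DELAY;
       /* Otherwise ask in advance of OPEN, or for objects (other than
        * directories) that are not valid targets of OPEN. */
       return true;
   }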
When a delegation is obtained using WANT_DELEGATION, any open files
for the same filehandle held by that client are to be treated as
subordinate to the delegation, just as if they had been created using
an OPEN of type CLAIM_DELEGATE_CUR.  They are otherwise unchanged as
to seqid, access and deny modes, and the relationship with byte-range
locks.  Similarly, existing byte-range locks subordinate to an open
that becomes subordinate to a delegation become indirectly
subordinate to that new delegation.

The WANT_DELEGATION operation provides for delivery of delegations
via callbacks, when the delegations are not immediately available.
When a requested delegation is available, it is delivered to the
client via a CB_PUSH_DELEG operation.  When this happens, open files
for the same filehandle become subordinate to the new delegation at
the point at which the delegation is delivered, just as if they had
been created using an OPEN of type CLAIM_DELEGATE_CUR.  Similarly,
existing byte-range locks subordinate to such an open become
indirectly subordinate to the new delegation.

10.5.  Data Caching and Revocation

When locks and delegations are revoked, the assumptions upon which
successful caching depend are no longer guaranteed.  For any locks or
share reservations that have been revoked, the corresponding state-
owner needs to be notified.  This notification includes applications
with a file open that has a corresponding delegation which has been
revoked.  Cached data associated with the revocation must be removed
from the client.  In the case of modified data existing in the
client's cache, that data must be removed from the client without it
being written to the server.  As mentioned, the assumptions made by
the client are no longer valid at the point when a lock or delegation
has been revoked.  For example, another client may have been granted
a conflicting lock after the revocation of the lock at the first
client.  Therefore, the data within the lock range may have been
modified by the other client.  Obviously, the first client is unable
to guarantee to the application what has occurred to the file in the
case of revocation.

Notification to a state-owner will in many cases consist of simply
returning an error on the next and all subsequent READs/WRITEs to the
open file or on the close.  Where the methods available to a client
make such notification impossible because errors for certain
operations may not be returned, more drastic action such as signals
or process termination may be appropriate.  The justification for
this is that an invariant on which an application depends may be
violated.  Depending on how errors are typically treated for the
client operating environment, further levels of notification
including logging, console messages, and GUI pop-ups may be
appropriate.

10.5.1.  Revocation Recovery for Write Open Delegation

Revocation recovery for a write open delegation poses the special
issue of modified data in the client cache while the file is not
open.  In this situation, any client that does not flush modified
data to the server on each close must ensure that the user receives
appropriate notification of the failure as a result of the
revocation.  Since such situations may require human action to
correct problems, notification schemes in which the appropriate user
or administrator is notified may be necessary.  Logging and console
messages are typical examples.
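Combining the handling described in this section and the previous
one, a client's reaction on learning of a revocation might be
sketched as follows; the types and helper are illustrative only.

   #include <stdbool.h>
   #include <stdio.h>

   /* Illustrative client-side cache record for one file. */
   struct cached_file {
       const char *name;
       bool        has_dirty_data;
       bool        io_error_pending; /* error on next READ/WRITE/CLOSE */
   };

   /* Called when the client learns that a lock or delegation covering
    * this file has been revoked. */
   void handle_revocation(struct cached_file *cf)
   {
       if (cf->has_dirty_data) {
           /* Modified data must not be flushed normally to the
            * server; it is dropped, and the user is warned. */
           cf->has_dirty_data = false;
           fprintf(stderr,
                   "nfs: state for %s revoked; cached writes discarded\n",
                   cf->name);
       }
       /* Cached data is purged, and subsequent READs, WRITEs, or the
        * eventual CLOSE report an error to the state-owner. */
       cf->io_error_pending = true;
   }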
If there is modified data on the client, it must not be flushed
normally to the server.  A client may attempt to provide a copy of
the file data as modified during the delegation under a different
name in the file system name space to ease recovery.  Note that when
the client can determine that the file has not been modified by any
other client, or when the client has a complete cached copy of the
file in question, such a saved copy of the client's view of the file
may be of particular value for recovery.  In other cases, recovery
using a copy of the file based partially on the client's cached data
and partially on the server's copy as modified by other clients will
be anything but straightforward, so clients may avoid saving file
contents in these situations or mark the results specially to warn
users of possible problems.

Saving of such modified data in delegation revocation situations may
be limited to files of a certain size or might be used only when
sufficient disk space is available within the target file system.
Such saving may also be restricted to situations in which the client
has sufficient buffering resources to keep the cached copy available
until it is properly stored to the target file system.

10.6.  Attribute Caching

This section pertains to the caching of a file's attributes on a
client when that client does not hold a delegation on the file.

The attributes discussed in this section do not include named
attributes.  Individual named attributes are analogous to files, and
caching of the data for these needs to be handled just as data
caching is for ordinary files.  Similarly, LOOKUP results from an
OPENATTR directory are to be cached on the same basis as any other
pathnames, and similarly for directory contents.

Clients may cache file attributes obtained from the server and use
them to avoid subsequent GETATTR requests.  Such caching is write-
through in that modifications to file attributes are always done by
means of requests to the server and should not be done locally and
cached.  The exceptions to this are modifications to attributes that
are intimately connected with data caching.  Therefore, extending a
file by writing data to the local data cache is reflected immediately
in the size as seen on the client without this change being
immediately reflected on the server.  Normally, such changes are not
propagated directly to the server, but when the modified data is
flushed to the server, analogous attribute changes are made on the
server.  When open delegation is in effect, the modified attributes
may be returned to the server in reaction to a CB_RECALL call.

The result of local caching of attributes is that the attribute
caches maintained on individual clients will not be coherent.
Changes made in one order on the server may be seen in a different
order on one client and in a third order on a different client.

The typical file system application programming interfaces do not
provide means to atomically modify or interrogate attributes for
multiple files at the same time.
The following rules provide an environment where the potential
incoherencies mentioned above can be reasonably managed.  These rules
are derived from the practice of previous NFS protocols.

o  All attributes for a given file (per-fsid attributes excepted) are
   cached as a unit at the client so that no non-serializability can
   arise within the context of a single file.

o  An upper time boundary is maintained on how long a client cache
   entry can be kept without being refreshed from the server.

o  When operations are performed that change attributes at the
   server, the updated attribute set is requested as part of the
   containing RPC.  This includes directory operations that update
   attributes indirectly.  This is accomplished by following the
   modifying operation with a GETATTR operation and then using the
   results of the GETATTR to update the client's cached attributes.

Note that if the full set of attributes to be cached is requested by
READDIR, the results can be cached by the client on the same basis as
attributes obtained via GETATTR.

A client may validate its cached version of attributes for a file by
fetching just the change and time_access attributes and assuming that
if the change attribute has the same value as it did when the
attributes were cached, then no attributes other than time_access
have changed.  The reason time_access is also fetched is that many
servers operate in environments where the operation that updates
change does not update time_access.  For example, POSIX file
semantics do not update access time when a file is modified by the
write system call.  Therefore, the client that wants a current
time_access value should fetch it with change during the attribute
cache validation processing and update its cached time_access.

The client may maintain a cache of modified attributes for those
attributes intimately connected with data of modified regular files
(size, time_modify, and change).  Other than those three attributes,
the client MUST NOT maintain a cache of modified attributes.
Instead, attribute changes are immediately sent to the server.

In some operating environments, the equivalent to time_access is
expected to be implicitly updated by each read of the content of the
file object.  If an NFS client is caching the content of a file
object, whether it is a regular file, directory, or symbolic link,
the client SHOULD NOT update the time_access attribute (via SETATTR
or a small READ or READDIR request) on the server with each read that
is satisfied from cache.  The reason is that this can defeat the
performance benefits of caching content, especially since an explicit
SETATTR of time_access may alter the change attribute on the server.
If the change attribute changes, clients that are caching the content
will think the content has changed, and will re-read unmodified data
from the server.  Nor is the client encouraged to maintain a modified
version of time_access in its cache, since this would mean that the
client would either eventually have to write the access time to the
server, with bad performance effects, or never update the server's
time_access, resulting in a situation where an application that
caches access time between a close and open of the same file observes
the access time oscillating between the past and present.  The
time_access attribute always means the time of last access to a file
by a read that was satisfied by the server.  This way clients will
tend to see only time_access changes that go forward in time.
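The validation scheme described above might be sketched as follows.
The types are simplified, and the GETATTR round trip is hidden behind
a hypothetical helper function.

   #include <stdbool.h>
   #include <stdint.h>

   /* Illustrative cached attribute state for one file. */
   struct attr_cache {
       uint64_t change;      /* change attribute at time of caching */
       uint64_t time_access; /* cached time_access (seconds)        */
       bool     valid;
   };

   /* Hypothetical helper: GETATTR for only change and time_access. */
   extern int fetch_change_and_atime(const void *fh, uint64_t *change,
                                     uint64_t *time_access);

   /*
    * Revalidate cached attributes: if the change attribute is
    * unchanged, only time_access may have moved, so it is refreshed
    * and the other cached attributes are kept; otherwise the cache
    * entry is dropped and will be refetched later.
    */
   bool revalidate_attrs(const void *fh, struct attr_cache *ac)
   {
       uint64_t change, atime;

       if (fetch_change_and_atime(fh, &change, &atime) != 0)
           return false;
       if (change == ac->change) {
           ac->time_access = atime; /* change may not track atime */
           return true;
       }
       ac->valid = false;
       return false;
   }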
10.7.  Data and Metadata Caching and Memory Mapped Files

Some operating environments include the capability for an application
to map a file's content into the application's address space.  Each
time the application accesses a memory location that corresponds to a
block that has not been loaded into the address space, a page fault
occurs and the file is read (or if the block does not exist in the
file, the block is allocated and then instantiated in the
application's address space).

As long as each memory mapped access to the file requires a page
fault, the relevant attributes of the file that are used to detect
access and modification (time_access, time_metadata, time_modify, and
change) will be updated.  However, in many operating environments,
when page faults are not required, these attributes will not be
updated on reads or updates to the file via memory access (regardless
of whether the file is a local file or is being accessed remotely).
A client or server MAY fail to update attributes of a file that is
being accessed via memory mapped I/O.  This has several implications:

o  If there is an application on the server that has memory mapped a
   file that a client is also accessing, the client may not be able
   to get a consistent value of the change attribute to determine
   whether its cache is stale or not.  A server that knows that the
   file is memory mapped could always pessimistically return updated
   values for change so as to force the application to always get the
   most up-to-date data and metadata for the file.  However, due to
   the negative performance implications of this, such behavior is
   OPTIONAL.

o  If the memory mapped file is not being modified on the server, and
   instead is just being read by an application via the memory mapped
   interface, the client will not see an updated time_access
   attribute.  However, in many operating environments, neither will
   any process running on the server.  Thus, NFS clients are at no
   disadvantage with respect to local processes.

o  If there is another client that is memory mapping the file, and if
   that client is holding a write delegation, the same set of issues
   as discussed in the previous two bullet items apply.  So, when a
   server does a CB_GETATTR to a file that the client has modified in
   its cache, the reply from CB_GETATTR will not necessarily be
   accurate.
   As discussed earlier, the client's obligation is to report that
   the file has been modified since the delegation was granted, not
   whether it has been modified again between successive CB_GETATTR
   calls, and the server MUST assume that any file the client has
   modified in cache has been modified again between successive
   CB_GETATTR calls.  Depending on the nature of the client's memory
   management system, it may not be possible to fulfill even this
   weak obligation.  A client MAY return stale information in
   CB_GETATTR whenever the file is memory mapped.

o  The mixture of memory mapping and file locking on the same file is
   problematic.  Consider the following scenario, where the page size
   on each client is 8192 bytes.

   *  Client A memory maps first page (8192 bytes) of file X

   *  Client B memory maps first page (8192 bytes) of file X

   *  Client A write locks first 4096 bytes

   *  Client B write locks second 4096 bytes

   *  Client A, via a STORE instruction, modifies part of its locked
      region.

   *  Simultaneously with client A, client B executes a STORE on part
      of its locked region.

Here the challenge is for each client to resynchronize to get a
correct view of the first page.  In many operating environments, the
virtual memory management systems on each client only know that a
page is modified, not that a subset of the page corresponding to the
respective lock regions has been modified.  So it is not possible for
each client to do the right thing, which is to write to the server
only that portion of the page that is locked.  For example, if client
A simply writes out the page, and then client B writes out the page,
client A's data is lost.

Moreover, if mandatory locking is enabled on the file, then we have a
different problem.  When clients A and B execute the STORE
instructions, the resulting page faults require a byte-range lock on
the entire page.  Each client then tries to extend its locked range
to the entire page, which results in a deadlock.  Communicating the
NFS4ERR_DEADLOCK error to a STORE instruction is difficult at best.

If a client is locking the entire memory mapped file, there is no
problem with advisory or mandatory byte-range locking, at least until
the client unlocks a region in the middle of the file.

Given the above issues, the following are permitted:

o  Clients and servers MAY deny memory mapping a file for which they
   know byte-range locks are held.

o  Clients and servers MAY deny a byte-range lock on a file they know
   is memory mapped.

o  A client MAY deny memory mapping a file that it knows requires
   mandatory locking for I/O.  If mandatory locking is enabled after
   the file is opened and mapped, the client MAY deny the application
   further access to its mapped file.

10.8.  Name and Directory Caching without Directory Delegations

The NFSv4.1 directory delegation facility (described in Section 10.9
below) is OPTIONAL for servers to implement.  Even where it is
implemented, it may not always be functional because of resource
availability issues or other constraints.  Thus, it is important to
understand how name and directory caching are done in the absence of
directory delegations.  Those topics are discussed in the next two
subsections, Section 10.8.1 and Section 10.8.2.

10.8.1.  Name Caching
The results of LOOKUP and READDIR operations may be cached to avoid
the cost of subsequent LOOKUP operations.  Just as in the case of
attribute caching, inconsistencies may arise among the various client
caches.  To mitigate the effects of these inconsistencies and given
the context of typical file system APIs, an upper time boundary is
maintained on how long a client name cache entry can be kept without
verifying that the entry has not been made invalid by a directory
change operation performed by another client.

When a client is not making changes to a directory for which there
exist name cache entries, the client needs to periodically fetch
attributes for that directory to ensure that it is not being
modified.  After determining that no modification has occurred, the
expiration time for the associated name cache entries may be updated
to be the current time plus the name cache staleness bound.

When a client is making changes to a given directory, it needs to
determine whether there have been changes made to the directory by
other clients.  It does this by using the change attribute as
reported before and after the directory operation in the associated
change_info4 value returned for the operation.  The server is able to
communicate to the client whether the change_info4 data is provided
atomically with respect to the directory operation.  If the change
values are provided atomically, the client has a basis for
determining, given proper care, whether other clients are modifying
the directory in question.

The simplest way to enable the client to make this determination is
for the client to serialize all changes made to a specific directory.
When this is done, and the server provides before and after values of
the change attribute atomically, the client can simply compare the
after value of the change attribute from one operation on a directory
with the before value on the next subsequent operation modifying that
directory.  When these are equal, the client is assured that no other
client is modifying the directory in question.

When such serialization is not used, and there may be multiple
simultaneous outstanding operations modifying a single directory sent
from a single client, making this sort of determination can be more
complicated, since two such operations that are recognized as
complete in a different order than the one in which they were
actually performed might give an appearance consistent with
modification being made by another client.  Where this appears to
happen, the client needs to await the completion of all such
modifications that were started previously, to see if the outstanding
before and after change numbers can be sorted into a chain such that
the before value of one change number matches the after value of a
previous one, in a chain consistent with this client being the only
one modifying the directory.

In either of these cases, the client is able to determine whether the
directory is being modified by another client.  If the comparison
indicates that the directory was updated by another client, the name
cache associated with the modified directory is purged from the
client.  If the comparison indicates no modification, the name cache
can be updated on the client to reflect the directory operation and
the associated timeout extended.  The post-operation change value
needs to be saved as the basis for future change_info4 comparisons.
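For the serialized case described above, the comparison can be as
simple as the following sketch, in which the change_info4-like
structure is a simplified stand-in for the protocol's type and the
cache record is invented for the illustration.

   #include <stdbool.h>
   #include <stdint.h>

   /* Simplified stand-in for the protocol's change_info4. */
   struct change_info {
       bool     atomic;  /* before/after captured atomically */
       uint64_t before;
       uint64_t after;
   };

   /* Per-directory record of the last change value this client saw. */
   struct dir_cache {
       uint64_t last_after; /* "after" value of the previous operation */
       bool     have_last;
   };

   /*
    * Returns true if the name cache for the directory may be kept
    * (and updated for this operation); false if another client
    * appears to have modified the directory, so the cache must be
    * purged.
    */
   bool check_serialized_change(struct dir_cache *dc,
                                const struct change_info *ci)
   {
       if (!ci->atomic)
           return false;  /* cannot rule out other modifiers */
       if (dc->have_last && ci->before != dc->last_after)
           return false;  /* someone else changed the directory */
       dc->last_after = ci->after;  /* basis for the next comparison */
       dc->have_last = true;
       return true;
   }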
As demonstrated by the scenario above, name caching requires that the
client revalidate name cache data by inspecting the change attribute
of a directory at the point when the name cache item was cached.
This requires that the server update the change attribute for
directories when the contents of the corresponding directory are
modified.  For a client to use the change_info4 information
appropriately and correctly, the server must report the pre-operation
and post-operation change attribute values atomically.  When the
server is unable to report the before and after values atomically
with respect to the directory operation, the server must indicate
that fact in the change_info4 return value.  When the information is
not atomically reported, the client should not assume that other
clients have not changed the directory.

10.8.2.  Directory Caching

The results of READDIR operations may be used to avoid subsequent
READDIR operations.  Just as in the cases of attribute and name
caching, inconsistencies may arise among the various client caches.
To mitigate the effects of these inconsistencies, and given the
context of typical file system APIs, the following rules should be
followed:

o  Cached READDIR information for a directory that is not obtained in
   a single READDIR operation must always be a consistent snapshot of
   directory contents.  This is determined by using a GETATTR before
   the first READDIR and after the last READDIR that contributes to
   the cache.

o  An upper time boundary is maintained to indicate the length of
   time a directory cache entry is considered valid before the client
   must revalidate the cached information.

The revalidation technique parallels that discussed in the case of
name caching.  When the client is not changing the directory in
question, checking the change attribute of the directory with GETATTR
is adequate.  The lifetime of the cache entry can be extended at
these checkpoints.  When a client is modifying the directory, the
client needs to use the change_info4 data to determine whether there
are other clients modifying the directory.  If it is determined that
no other client modifications are occurring, the client may update
its directory cache to reflect its own changes.

As demonstrated previously, directory caching requires that the
client revalidate directory cache data by inspecting the change
attribute of a directory at the point when the directory was cached.
This requires that the server update the change attribute for
directories when the contents of the corresponding directory are
modified.  For a client to use the change_info4 information
appropriately and correctly, the server must report the pre-operation
and post-operation change attribute values atomically.  When the
server is unable to report the before and after values atomically
with respect to the directory operation, the server must indicate
that fact in the change_info4 return value.  When the information is
not atomically reported, the client should not assume that other
clients have not changed the directory.
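The snapshot rule for caching the results of multiple READDIR
operations might be sketched as follows, with the GETATTR and READDIR
round trips hidden behind hypothetical helpers.

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical helpers standing in for GETATTR and READDIR. */
   extern int get_dir_change(const void *dir_fh, uint64_t *change);
   extern int readdir_fill_cache(const void *dir_fh); /* many RPCs */

   /*
    * Populate a directory cache from multiple READDIR operations,
    * keeping it only if the directory's change attribute is the same
    * before the first READDIR and after the last one, i.e., the
    * result is a consistent snapshot of the directory's contents.
    */
   bool load_directory_snapshot(const void *dir_fh)
   {
       uint64_t before, after;

       if (get_dir_change(dir_fh, &before) != 0)
           return false;
       if (readdir_fill_cache(dir_fh) != 0)
           return false;
       if (get_dir_change(dir_fh, &after) != 0)
           return false;
       return before == after; /* otherwise discard and retry later */
   }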
10.9.  Directory Delegations

10.9.1.  Introduction to Directory Delegations

Directory caching for the NFSv4.1 protocol, as previously described,
is similar to file caching in previous versions.  Clients typically
cache directory information for a duration determined by the client.
At the end of a predefined timeout, the client will query the server
to see if the directory has been updated.  By caching attributes,
clients reduce the number of GETATTR calls made to the server to
validate attributes.  Furthermore, frequently accessed files and
directories, such as the current working directory, have their
attributes cached on the client so that some NFS operations can be
performed without having to make an RPC call.  By caching name and
inode information about most recently looked up entries in a
Directory Name Lookup Cache (DNLC), clients do not need to send
LOOKUP calls to the server every time these files are accessed.

This caching approach works reasonably well at reducing network
traffic in many environments.  However, it does not address
environments where there are numerous queries for files that do not
exist.  In these cases of "misses", the client sends requests to the
server in order to provide reasonable application semantics and
promptly detect the creation of new directory entries.  An example of
high miss activity is compilation in software development
environments.  The current behavior of NFS limits its potential
scalability and wide-area sharing effectiveness in these types of
environments.  Other distributed stateful file system architectures
such as AFS and DFS have proven that adding state around directory
contents can greatly reduce network traffic in high-miss
environments.

Delegation of directory contents is an OPTIONAL feature of NFSv4.1.
Directory delegations provide traffic reduction benefits similar to
those of file delegations.  By allowing clients to cache directory
contents (in a read-only fashion) while being notified of changes,
the client can avoid making frequent requests to interrogate the
contents of slowly-changing directories, reducing network traffic and
improving client performance.  It can also simplify the task of
determining whether other clients are making changes to the directory
when the client itself is making many changes to the directory and
changes are not serialized.

Directory delegations allow improved namespace cache consistency to
be achieved through delegations and synchronous recalls, in the
absence of notifications.  In addition, if time-based consistency is
sufficient, asynchronous notifications can provide performance
benefits for the client, and possibly the server, under some common
operating conditions such as slowly-changing and/or very large
directories.

10.9.2.  Directory Delegation Design

NFSv4.1 introduces the GET_DIR_DELEGATION (Section 18.39) operation
to allow the client to ask for a directory delegation.  The
delegation covers directory attributes and all entries in the
directory.  If either of these change, the delegation will be
recalled synchronously.  The operation causing the recall will have
to wait until the recall is complete.  Changes to directory entry
attributes will not cause the delegation to be recalled.
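The recall rule just described can be summarized by the following
server-side sketch.  The types and the recall helper are invented for
the illustration, and the notification mechanism introduced below is
ignored here.

   /* Kinds of change a directory-modifying operation can make. */
   enum dir_change {
       DIR_ATTRS_CHANGED,   /* attributes of the directory itself      */
       DIR_ENTRY_ADDED,
       DIR_ENTRY_REMOVED,
       DIR_ENTRY_RENAMED,
       ENTRY_ATTRS_CHANGED  /* attributes of an entry in the directory */
   };

   /* Hypothetical helper that recalls any directory delegations held
    * by other clients and does not return until the recall completes
    * (or the delegations are revoked), since the modifying operation
    * must wait for the recall. */
   extern void recall_dir_delegations_and_wait(const void *dir_fh);

   /* Decide whether a change to the directory forces a recall. */
   void maybe_recall(const void *dir_fh, enum dir_change what)
   {
       if (what == ENTRY_ATTRS_CHANGED)
           return; /* entry attribute changes do not force a recall */
       recall_dir_delegations_and_wait(dir_fh);
   }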
In addition to asking for delegations, a client can also ask for
notifications for certain events.  These events include changes to
the directory's attributes and/or its contents.  If a client asks for
notification for a certain event, the server will notify the client
when that event occurs.  This will not result in the delegation being
recalled for that client.  The notifications are asynchronous and
provide a way of avoiding recalls in situations where a directory is
changing enough that the pure recall model may not be effective,
while still allowing the client to obtain substantial benefit.  In
the absence of notifications, once the delegation is recalled the
client has to refresh its directory cache, which might not be very
efficient for very large directories.

The delegation is read-only, and the client may not make changes to
the directory other than by performing NFSv4.1 operations that modify
the directory or the associated file attributes so that the server
has knowledge of these changes.  In order to keep the client
namespace synchronized with the server, the server will, if the
client has requested notifications, notify the client holding the
delegation of the changes made as a result.  This is to avoid any
need for subsequent GETATTR or READDIR calls to the server.  If a
single client is holding the delegation and that client makes any
changes to the directory (i.e., the changes are made via operations
sent through a session associated with the client ID holding the
delegation), the delegation will not be recalled.  Multiple clients
may hold a delegation on the same directory, but if any such client
modifies the directory, the server MUST recall the delegation from
the other clients, unless those clients have made provisions to be
notified of that sort of modification.

Delegations can be recalled by the server at any time.  Normally, the
server will recall the delegation when the directory changes in a way
that is not covered by the notification, or when the directory
changes and notifications have not been requested.  If another client
removes the directory for which a delegation has been granted, the
server will recall the delegation.

10.9.3.  Attributes in Support of Directory Notifications

See Section 5.11 for a description of the attributes associated with
directory notifications.

10.9.4.  Directory Delegation Recall

The server will recall the directory delegation by sending a callback
to the client.  It will use the same callback procedure as used for
recalling file delegations.  The server will recall the delegation
when the directory changes in a way that is not covered by the
notification.  However, the server need not recall the delegation if
attributes of an entry within the directory change.

If the server notices that handing out a delegation for a directory
is causing too many notifications to be sent out, it may decide not
to hand out delegations for that directory, or recall those already
granted.  If a client tries to remove the directory for which a
delegation has been granted, the server will recall all associated
delegations.
The implementation sections for a number of operations describe
situations in which notification or delegation recall would be
required under some common circumstances.  In this regard, a similar
set of caveats to those listed in Section 10.2 applies.

o  For CREATE, see Section 18.4.4.

o  For LINK, see Section 18.9.4.

o  For OPEN, see Section 18.16.4.

o  For REMOVE, see Section 18.25.4.

o  For RENAME, see Section 18.26.4.

o  For SETATTR, see Section 18.30.4.

10.9.5.  Directory Delegation Recovery

Recovery from client or server restart for state on regular files has
two main goals: avoiding the necessity of breaking application
guarantees with respect to locked files, and delivery of updates
cached at the client.  Neither of these goals applies to directories
protected by read delegations and notifications.  Thus, no provision
is made for reclaiming directory delegations in the event of client
or server restart.  The client can simply establish a directory
delegation in the same fashion as was done initially.

11.  Multi-Server Namespace

NFSv4.1 supports attributes that allow a namespace to extend beyond
the boundaries of a single server.  It is RECOMMENDED that clients
and servers support construction of such multi-server namespaces.
Use of such multi-server namespaces is OPTIONAL, however, and for
many purposes, single-server namespaces are perfectly acceptable.
Use of multi-server namespaces can provide many advantages, however,
by separating a file system's logical position in a namespace from
the (possibly changing) logistical and administrative considerations
that result in particular file systems being located on particular
servers.

11.1.  Location Attributes

NFSv4.1 contains RECOMMENDED attributes that allow file systems on
one server to be associated with one or more instances of that file
system on other servers.  These attributes specify such file system
instances by specifying a server address target (either as a DNS name
representing one or more IP addresses or as a literal IP address)
together with the path of that file system within the associated
single-server namespace.

The fs_locations_info RECOMMENDED attribute allows specification of
one or more file system instance locations where the data
corresponding to a given file system may be found.  This attribute
provides to the client, in addition to information about file system
instance locations, significant information about the various file
system instance choices (e.g., priority for use, writability,
currency, etc.).  It also includes information to help the client
efficiently effect as seamless a transition as possible among
multiple file system instances, when and if that should be necessary.

The fs_locations RECOMMENDED attribute is inherited from NFSv4.0 and
only allows specification of the file system locations where the data
corresponding to a given file system may be found.  Servers SHOULD
make this attribute available whenever fs_locations_info is
supported, but client use of fs_locations_info is to be preferred.

11.2.
File System Presence or Absence 10265 A given location in an NFSv4.1 namespace (typically but not 10266 necessarily a multi-server namespace) can have a number of file 10267 system instance locations associated with it (via the fs_locations or 10268 fs_locations_info attribute). There may also be an actual current 10269 file system at that location, accessible via normal namespace 10270 operations (e.g. LOOKUP). In this case, the file system is said to 10271 be "present" at that position in the namespace and clients will 10272 typically use it, reserving use of additional locations specified via 10273 the location-related attributes to situations in which the principal 10274 location is no longer available. 10276 When there is no actual file system at the namespace location in 10277 question, the file system is said to be "absent". An absent file 10278 system contains no files or directories other than the root. Any 10279 reference to it, except to access a small set of attributes useful in 10280 determining alternate locations, will result in an error, 10281 NFS4ERR_MOVED. Note that if the server ever returns the error 10282 NFS4ERR_MOVED, it MUST support the fs_locations attribute and SHOULD 10283 support the fs_locations_info and fs_status attributes. 10285 While the error name suggests that we have a case of a file system 10286 which once was present, and has only become absent later, this is 10287 only one possibility. A position in the namespace may be permanently 10288 absent with the set of file system(s) designated by the location 10289 attributes being the only realization. The name NFS4ERR_MOVED 10290 reflects an earlier, more limited conception of its function, but 10291 this error will be returned whenever the referenced file system is 10292 absent, whether it has moved or not. 10294 Except in the case of GETATTR-type operations (to be discussed 10295 later), when the current filehandle at the start of an operation is 10296 within an absent file system, that operation is not performed and the 10297 error NFS4ERR_MOVED returned, to indicate that the file system is 10298 absent on the current server. 10300 Because a GETFH cannot succeed if the current filehandle is within an 10301 absent file system, filehandles within an absent file system cannot 10302 be transferred to the client. When a client does have filehandles 10303 within an absent file system, it is the result of obtaining them when 10304 the file system was present, and having the file system become absent 10305 subsequently. 10307 It should be noted that because the check for the current filehandle 10308 being within an absent file system happens at the start of every 10309 operation, operations that change the current filehandle so that it 10310 is within an absent file system will not result in an error. This 10311 allows such combinations as PUTFH-GETATTR and LOOKUP-GETATTR to be 10312 used to get attribute information, particularly location attribute 10313 information, as discussed below. 10315 The RECOMMENDED file system attribute fs_status can be used to 10316 interrogate the present/absent status of a given file system. 10318 11.3. Getting Attributes for an Absent File System 10320 When a file system is absent, most attributes are not available, but 10321 it is necessary to allow the client access to the small set of 10322 attributes that are available, and most particularly those that give 10323 information about the correct current locations for this file system, 10324 fs_locations and fs_locations_info. 
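A client-side sketch of this restriction follows; attribute
identities are represented by a local enumeration rather than the
protocol's bitmap encoding, and the helper is invented for the
illustration.  The underlying rule, detailed in Section 11.3.1 below,
is that the request must include at least one of fs_locations,
fs_locations_info, or fs_status for a GETATTR to succeed on an absent
file system.

   #include <stdbool.h>
   #include <stddef.h>

   /* Local, illustrative identifiers for a few attributes. */
   enum attr_id {
       ATTR_FS_LOCATIONS,
       ATTR_FS_LOCATIONS_INFO,
       ATTR_FS_STATUS,
       ATTR_SIZE,
       ATTR_CHANGE
       /* ... */
   };

   static bool is_location_attr(enum attr_id a)
   {
       return a == ATTR_FS_LOCATIONS ||
              a == ATTR_FS_LOCATIONS_INFO ||
              a == ATTR_FS_STATUS;
   }

   /*
    * Before sending GETATTR against a file system that may be absent,
    * make sure at least one location-related attribute is requested;
    * otherwise the operation would fail with NFS4ERR_MOVED.  Other
    * requested attributes are left in place; the server simply omits
    * any it cannot supply for an absent file system.
    */
   size_t ensure_location_attr(enum attr_id *req, size_t n, size_t cap)
   {
       for (size_t i = 0; i < n; i++)
           if (is_location_attr(req[i]))
               return n;  /* already asks for location information */
       if (n < cap)
           req[n++] = ATTR_FS_LOCATIONS;
       return n;
   }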
10326 11.3.1. GETATTR Within an Absent File System 10328 As mentioned above, an exception is made for GETATTR in that 10329 attributes may be obtained for a filehandle within an absent file 10330 system. This exception only applies if the attribute mask contains 10331 at least one attribute bit that indicates the client is interested in 10332 a result regarding an absent file system: fs_locations, 10333 fs_locations_info, or fs_status. If none of these attributes is 10334 requested, GETATTR will result in an NFS4ERR_MOVED error. 10336 When a GETATTR is done on an absent file system, the set of supported 10337 attributes is very limited. Many attributes, including those that 10338 are normally REQUIRED, will not be available on an absent file 10339 system. In addition to the attributes mentioned above (fs_locations, 10340 fs_locations_info, fs_status), the following attributes SHOULD be 10341 available on absent file systems, in the case of RECOMMENDED 10342 attributes at least to the same degree that they are available on 10343 present file systems. 10345 change_policy: This attribute is useful for absent file systems and 10346 can be helpful in summarizing to the client when any of the 10347 location-related attributes changes. 10349 fsid: This attribute should be provided so that the client can 10350 determine file system boundaries, including, in particular, the 10351 boundary between present and absent file systems. This value must 10352 be different from any other fsid on the current server and need 10353 have no particular relationship to fsids on any particular 10354 destination to which the client might be directed. 10356 mounted_on_fileid: For objects at the top of an absent file system 10357 this attribute needs to be available. Since the fileid is one 10358 which is within the present parent file system, there should be no 10359 need to reference the absent file system to provide this 10360 information. 10362 Other attributes SHOULD NOT be made available for absent file 10363 systems, even when it is possible to provide them. The server should 10364 not assume that more information is always better and should avoid 10365 gratuitously providing additional information. 10367 When a GETATTR operation includes a bit mask for one of the 10368 attributes fs_locations, fs_locations_info, or fs_status, but where 10369 the bit mask includes attributes which are not supported, GETATTR 10370 will not return an error, but will return the mask of the actual 10371 attributes supported with the results. 10373 Handling of VERIFY/NVERIFY is similar to GETATTR in that if the 10374 attribute mask does not include fs_locations, fs_locations_info, or 10375 fs_status, the error NFS4ERR_MOVED will result. It differs in that 10376 any appearance in the attribute mask of an attribute not supported 10377 for an absent file system (and note that this will include some 10378 normally REQUIRED attributes), will also cause an NFS4ERR_MOVED 10379 result. 10381 11.3.2. READDIR and Absent File Systems 10383 A READDIR performed when the current filehandle is within an absent 10384 file system will result in an NFS4ERR_MOVED error, since, unlike the 10385 case of GETATTR, no such exception is made for READDIR. 10387 Attributes for an absent file system may be fetched via a READDIR for 10388 a directory in a present file system, when that directory contains 10389 the root directories of one or more absent file systems. 
In this 10390 case, the handling is as follows: 10392 o If the attribute set requested includes one of the attributes 10393 fs_locations, fs_locations_info, or fs_status, then fetching of 10394 attributes proceeds normally and no NFS4ERR_MOVED indication is 10395 returned, even when the rdattr_error attribute is requested. 10397 o If the attribute set requested does not include one of the 10398 attributes fs_locations, fs_locations_info, or fs_status, then if 10399 the rdattr_error attribute is requested, each directory entry for 10400 the root of an absent file system, will report NFS4ERR_MOVED as 10401 the value of the rdattr_error attribute. 10403 o If the attribute set requested does not include any of the 10404 attributes fs_locations, fs_locations_info, fs_status, or 10405 rdattr_error then the occurrence of the root of an absent file 10406 system within the directory will result in the READDIR failing 10407 with an NFS4ERR_MOVED error. 10409 o The unavailability of an attribute because of a file system's 10410 absence, even one that is ordinarily REQUIRED, does not result in 10411 any error indication. The set of attributes returned for the root 10412 directory of the absent file system in that case is simply 10413 restricted to those actually available. 10415 11.4. Uses of Location Information 10417 The location-bearing attributes (fs_locations and fs_locations_info), 10418 provide, together with the possibility of absent file systems, a 10419 number of important facilities in providing reliable, manageable, and 10420 scalable data access. 10422 When a file system is present, these attributes can provide 10423 alternative locations, to be used to access the same data, in the 10424 event of server failures, communications problems, or other 10425 difficulties that make continued access to the current file system 10426 impossible or otherwise impractical. Under some circumstances 10427 multiple alternative locations may be used simultaneously to provide 10428 higher performance access to the file system in question. Provision 10429 of such alternate locations is referred to as "replication" although 10430 there are cases in which replicated sets of data are not in fact 10431 present, and the replicas are instead different paths to the same 10432 data. 10434 When a file system is present and becomes absent, clients can be 10435 given the opportunity to have continued access to their data, at an 10436 alternate location. In this case, a continued attempt to use the 10437 data in the now-absent file system will result in an NFS4ERR_MOVED 10438 error and at that point the successor locations (typically only one 10439 but multiple choices are possible) can be fetched and used to 10440 continue access. Transfer of the file system contents to the new 10441 location is referred to as "migration", but it should be kept in mind 10442 that there are cases in which this term can be used, like 10443 "replication", when there is no actual data migration per se. 10445 Where a file system was not previously present, specification of file 10446 system location provides a means by which file systems located on one 10447 server can be associated with a namespace defined by another server, 10448 thus allowing a general multi-server namespace facility. A 10449 designation of such a location, in place of an absent file system, is 10450 called a "referral". 
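To summarize the three uses of location information just described, the following non-normative sketch (with hypothetical function and argument names) shows how a client might classify a location-related event; it adds nothing beyond the definitions above.

      # Python sketch (illustrative only): rough classification of the
      # three uses of location information described above.

      def classify_location_event(previously_accessed, now_absent):
          """Distinguish replication, migration, and referral."""
          if not now_absent:
              # File system still present: any additional locations act
              # as replicas, usable on failure or for higher performance.
              return "replication"
          if previously_accessed:
              # A file system the client was using has become absent:
              # the successor locations are migration targets.
              return "migration"
          # First reference to this namespace position finds an absent
          # file system: the location attributes constitute a referral.
          return "referral"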
10452 Because client support for location-related attributes is OPTIONAL, a 10453 server may (but is not required to) take action to hide migration and 10454 referral events from such clients, by acting as a proxy, for example. 10455 The server can determine the presence of client support from the 10456 arguments of the EXCHANGE_ID operation (see Section 18.35.3). 10458 11.4.1. File System Replication 10460 The fs_locations and fs_locations_info attributes provide alternative 10461 locations, to be used to access data in place of or in addition to 10462 the current file system instance. On first access to a file system, 10463 the client should obtain the value of the set of alternate locations 10464 by interrogating the fs_locations or fs_locations_info attribute, 10465 with the latter being preferred. 10467 In the event that server failures, communications problems, or other 10468 difficulties make continued access to the current file system 10469 impossible or otherwise impractical, the client can use the alternate 10470 locations as a way to get continued access to its data. Depending on 10471 specific attributes of these alternate locations, as indicated within 10472 the fs_locations_info attribute, multiple locations may be used 10473 simultaneously, to provide higher performance through the 10474 exploitation of multiple paths between client and target file system. 10476 The alternate locations may be physical replicas of the (typically 10477 read-only) file system data, or they may reflect alternate paths to 10478 the same server or provide for the use of various forms of server 10479 clustering in which multiple servers provide alternate ways of 10480 accessing the same physical file system. How these different modes 10481 of file system transition are represented within the fs_locations and 10482 fs_locations_info attributes and how the client deals with file 10483 system transition issues will be discussed in detail below. 10485 Multiple server addresses, whether they are derived from a single 10486 entry with a DNS name representing a set of IP addresses, or from 10487 multiple entries each with its own server address may correspond to 10488 the same actual server. The fact that two addresses correspond to 10489 the same server is shown by a common so_major_id field within the 10490 eir_server_owner field returned by EXCHANGE_ID (see Section 18.35.3). 10491 For a detailed discussion of how server address targets interact with 10492 the determination of server identity specified by the server owner 10493 field, see Section 11.5. 10495 11.4.2. File System Migration 10497 When a file system is present and becomes absent, clients can be 10498 given the opportunity to have continued access to their data, at an 10499 alternate location, as specified by the fs_locations or 10500 fs_locations_info attribute. Typically, a client will be accessing 10501 the file system in question, get an NFS4ERR_MOVED error, and then use 10502 the fs_locations or fs_locations_info attribute to determine the new 10503 location of the data. When fs_locations_info is used, additional 10504 information will be available which will define the nature of the 10505 client's handling of the transition to a new server. 10507 Such migration can be helpful in providing load balancing or general 10508 resource reallocation. The protocol does not specify how the file 10509 system will be moved between servers. 
It is anticipated that a 10510 number of different server-to-server transfer mechanisms might be 10511 used, with the choice left to the server implementer. The NFSv4.1 10512 protocol specifies the method used to communicate the migration event 10513 between client and server. 10515 The new location may be an alternate communication path to the same 10516 server, or, in the case of various forms of server clustering, 10517 another server providing access to the same physical file system. 10518 The client's responsibilities in dealing with this transition depend 10519 on the specific nature of the new access path and how and whether 10520 data was in fact migrated. These issues will be discussed in detail 10521 below. 10523 When multiple server addresses correspond to the same actual server, 10524 as shown by a common value for the so_major_id field of the 10525 eir_server_owner field returned by EXCHANGE_ID, the location or 10526 locations may designate alternate server addresses in the form of 10527 specific server network addresses. These could be used to access the 10528 file system in question at those addresses when it is no longer 10529 accessible at the original address. 10531 Although a single successor location is typical, multiple locations 10532 may be provided, together with information that allows priority among 10533 the choices to be indicated, via information in the fs_locations_info 10534 attribute. Where suitable clustering mechanisms make it possible to 10535 provide multiple identical file systems or paths to them, this allows 10536 the client the opportunity to deal with any resource or 10537 communications issues that might limit data availability. 10539 When an alternate location is designated as the target for migration, 10540 it must designate the same data (with metadata being the same to the 10541 degree indicated by the fs_locations_info attribute). Where file 10542 systems are writable, a change made on the original file system must 10543 be visible on all migration targets. Where a file system is not 10544 writable but represents a read-only copy (possibly periodically 10545 updated) of a writable file system, similar requirements apply to the 10546 propagation of updates. Any change visible in the original file 10547 system must already be effected on all migration targets, to avoid 10548 any possibility that a client, in effecting a transition to the 10549 migration target, will see any reversion in file system state. 10551 11.4.3. Referrals 10553 Referrals provide a way of placing a file system in a location within 10554 the namespace essentially without respect to its physical location on 10555 a given server. This allows a single server or a set of servers to 10556 present a multi-server namespace that encompasses file systems 10557 located on multiple servers. Some likely uses of this include 10558 establishment of site-wide or organization-wide namespaces, or even 10559 knitting such together into a truly global namespace. 10561 Referrals occur when a client determines, upon first referencing a 10562 position in the current namespace, that it is part of a new file 10563 system and that the file system is absent. When this occurs, 10564 typically by receiving the error NFS4ERR_MOVED, the actual location 10565 or locations of the file system can be determined by fetching the 10566 fs_locations or fs_locations_info attribute.
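A non-normative sketch of this sequence follows.  The compound() and connect() helpers, and the way results are indexed, are hypothetical client internals; only the use of PUTFH, LOOKUP, and GETATTR with the fs_locations attribute reflects the protocol itself.

      # Python sketch (illustrative only): following a referral after
      # an NFS4ERR_MOVED error on first access to a file system.

      def follow_referral(client, parent_fh, name):
          # GETATTR is permitted here even though the target file
          # system is absent, because fs_locations is in the mask.
          reply = client.compound([("PUTFH", parent_fh),
                                   ("LOOKUP", name),
                                   ("GETATTR", {"fs_locations"})])
          fs_locations = reply[-1]["fs_locations"]

          # Try the listed locations until one can be reached; the file
          # system keeps its logical position in the original namespace.
          for location in fs_locations.locations:
              for server in location.server:
                  try:
                      connection = client.connect(server)
                      return connection, location.rootpath
                  except OSError:
                      continue
          raise RuntimeError("no reachable location in fs_locations")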
10568 The locations-related attribute may designate a single file system 10569 location or multiple file system locations, to be selected based on 10570 the needs of the client. The server, in the fs_locations_info 10571 attribute may specify priorities to be associated with various file 10572 system location choices. The server may assign different priorities 10573 to different locations as reported to individual clients, in order to 10574 adapt to client physical location or to effect load balancing. When 10575 both read-only and read-write file systems are present, some of the 10576 read-only locations may not be absolutely up-to-date (as they would 10577 have to be in the case of replication and migration). Servers may 10578 also specify file system locations that include client-substituted 10579 variables so that different clients are referred to different file 10580 systems (with different data contents) based on client attributes 10581 such as CPU architecture. 10583 When the fs_locations_info attribute indicates that there are 10584 multiple possible targets listed, the relationships among them may be 10585 important to the client in selecting the one to use. The same rules 10586 specified in Section 11.4.1 defining the appropriate standards for 10587 the data propagation, apply to these multiple replicas as well. For 10588 example, the client might prefer a writable target on a server that 10589 has additional writable replicas to which it subsequently might 10590 switch. Note that, as distinguished from the case of replication, 10591 there is no need to deal with the case of propagation of updates made 10592 by the current client, since the current client has not accessed the 10593 file system in question. 10595 Use of multi-server namespaces is enabled by NFSv4.1 but is not 10596 required. The use of multi-server namespaces and their scope will 10597 depend on the applications used, and system administration 10598 preferences. 10600 Multi-server namespaces can be established by a single server 10601 providing a large set of referrals to all of the included file 10602 systems. Alternatively, a single multi-server namespace may be 10603 administratively segmented with separate referral file systems (on 10604 separate servers) for each separately-administered portion of the 10605 namespace. Any segment or the top-level referral file system may use 10606 replicated referral file systems for higher availability. 10608 Generally, multi-server namespaces are for the most part uniform, in 10609 that the same data made available to one client at a given location 10610 in the namespace is made available to all clients at that location. 10611 There are however facilities provided which allow different clients 10612 to be directed to different sets of data, so as to adapt to such 10613 client characteristics as CPU architecture. 10615 11.5. Location Entries and Server Identity 10617 As mentioned above, a single location entry may have a server address 10618 target in the form of a DNS name which may represent multiple IP 10619 addresses, while multiple location entries may have their own server 10620 address targets, that reference the same server. Whether two IP 10621 addresses designate the same server is indicated by the existence of 10622 a common so_major_id field within the eir_server_owner field returned 10623 by EXCHANGE_ID (see Section 18.35.3), subject to further 10624 verification, for details of which see Section 2.10.4. 
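The check described above might look as follows; the exchange_id() helper and its result object are hypothetical, apart from the eir_server_owner and so_major_id names taken from the EXCHANGE_ID results.

      # Python sketch (illustrative only): deciding whether two network
      # addresses reach the same server.

      def addresses_reach_same_server(exchange_id, addr1, addr2):
          owner1 = exchange_id(addr1).eir_server_owner
          owner2 = exchange_id(addr2).eir_server_owner
          # A common so_major_id indicates the same server, but the
          # claim is subject to the verification described in
          # Section 2.10.4 before the client relies on it.
          return owner1.so_major_id == owner2.so_major_id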
10626 When multiple addresses for the same server exist, the client may 10627 assume that for each file system in the namespace of a given server 10628 network address, there exist file systems at corresponding namespace 10629 locations for each of the other server network addresses. It may do 10630 this even in the absence of explicit listing in fs_locations and 10631 fs_locations_info. Such corresponding file system locations can be 10632 used as alternate locations, just as those explicitly specified via 10633 the fs_locations and fs_locations_info attributes. Where these 10634 specific addresses are explicitly designated in the fs_locations_info 10635 attribute, the conditions of use specified in this attribute (e.g. 10636 priorities, specification of simultaneous use) may limit the client's 10637 use of these alternate locations. 10639 If a single location entry designates multiple server IP addresses, 10640 the client cannot assume that these addresses are multiple paths to 10641 the same server. In most cases they will be, but the client MUST 10642 verify that before acting on that assumption. When two server 10643 addresses are designated by a single location entry and they 10644 correspond to different servers, this normally indicates some sort of 10645 misconfiguration, and so the client should avoid using such location 10646 entries when alternatives are available. When they are not, clients 10647 should pick one of the IP addresses and use it, without using others 10648 that are not directed to the same server. 10650 11.6. Additional Client-side Considerations 10652 When clients make use of servers that implement referrals, 10653 replication, and migration, care should be taken so that a user who 10654 mounts a given file system that includes a referral or a relocated 10655 file system continues to see a coherent picture of that user-side 10656 file system despite the fact that it contains a number of server-side 10657 file systems which may be on different servers. 10659 One important issue is upward navigation from the root of a server- 10660 side file system to its parent (specified as ".." in UNIX), in the 10661 case in which it transitions to that file system as a result of 10662 referral, migration, or a transition as a result of replication. 10663 When the client is at such a point, and it needs to ascend to the 10664 parent, it must go back to the parent as seen within the multi-server 10665 namespace rather than issuing a LOOKUPP call to the server, which would 10666 result in the parent within that server's single-server namespace. 10667 In order to do this, the client needs to remember the filehandles 10668 that represent such file system roots, and use these instead of 10669 issuing a LOOKUPP to the current server. This will allow the client 10670 to present to applications a consistent namespace, where upward 10671 navigation and downward navigation are consistent. 10673 Another issue concerns refresh of referral locations. When referrals 10674 are used extensively, they may change as server configurations 10675 change. It is expected that clients will cache information related 10676 to traversing referrals so that future client-side requests are 10677 resolved locally without server communication. This is usually 10678 rooted in client-side name lookup caching. Clients should 10679 periodically purge this data for referral points in order to detect 10680 changes in location information.
When the change_policy attribute 10681 changes for directories that hold referral entries or for the 10682 referral entries themselves, clients should consider any associated 10683 cached referral information to be out of date. 10685 11.7. Effecting File System Transitions 10687 Transitions between file system instances, whether due to switching 10688 between replicas upon server unavailability, or in response to 10689 server-initiated migration events are best dealt with together. This 10690 is so even though for the server, pragmatic considerations will 10691 normally force different implementation strategies for planned and 10692 unplanned transitions. Even though the prototypical use cases of 10693 replication and migration contain distinctive sets of features, when 10694 all possibilities for these operations are considered, there is an 10695 underlying unity of these operations, from the client's point of 10696 view, that makes treating them together desirable. 10698 A number of methods are possible for servers to replicate data and to 10699 track client state in order to allow clients to transition between 10700 file system instances with a minimum of disruption. Such methods 10701 vary between those that use inter-server clustering techniques to 10702 limit the changes seen by the client, to those that are less 10703 aggressive, use more standard methods of replicating data, and impose 10704 a greater burden on the client to adapt to the transition. 10706 The NFSv4.1 protocol does not impose choices on clients and servers 10707 with regard to that spectrum of transition methods. In fact, there 10708 are many valid choices, depending on client and application 10709 requirements and their interaction with server implementation 10710 choices. The NFSv4.1 protocol does define the specific choices that 10711 can be made, how these choices are communicated to the client and how 10712 the client is to deal with any discontinuities. 10714 In the sections below, references will be made to various possible 10715 server implementation choices as a way of illustrating the transition 10716 scenarios that clients may deal with. The intent here is not to 10717 define or limit server implementations but rather to illustrate the 10718 range of issues that clients may face. 10720 In the discussion below, references will be made to a file system 10721 having a particular property or of two file systems (typically the 10722 source and destination) belonging to a common class of any of several 10723 types. Two file systems that belong to such a class share some 10724 important aspect of file system behavior that clients may depend upon 10725 when present, to easily effect a seamless transition between file 10726 system instances. Conversely, where the file systems do not belong 10727 to such a common class, the client has to deal with various sorts of 10728 implementation discontinuities which may cause performance or other 10729 issues in effecting a transition. 10731 Where the fs_locations_info attribute is available, such file system 10732 classification data will be made directly available to the client 10733 (see Section 11.10 for details). When only fs_locations is 10734 available, default assumptions with regard to such classifications 10735 have to be inferred (see Section 11.9 for details). 
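The way a client might use such classification data is sketched below.  The FSInstanceClasses record and its field names are hypothetical stand-ins for the class information carried in fs_locations_info; the classes themselves (_handle_, _fileid_, _write-verifier_, _change_, _readdir_) are discussed in the subsections that follow.

      # Python sketch (illustrative only): deciding which kinds of
      # cached state remain usable across a file system transition,
      # based on whether the source and destination instances belong
      # to the same classes.

      from dataclasses import dataclass

      @dataclass
      class FSInstanceClasses:
          handle_class: int
          fileid_class: int
          write_verifier_class: int
          change_class: int
          readdir_class: int

      def transition_effects(src, dst):
          """Report which cached state survives the transition."""
          return {
              "filehandles_usable":     src.handle_class == dst.handle_class,
              "fileids_stable":         src.fileid_class == dst.fileid_class,
              "write_verifiers_usable": src.write_verifier_class
                                        == dst.write_verifier_class,
              "change_attrs_usable":    src.change_class == dst.change_class,
              "readdir_cookies_usable": src.readdir_class == dst.readdir_class,
          }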
10737 In cases in which one server is expected to accept opaque values from 10738 the client that originated from another server, the servers SHOULD 10739 encode the "opaque" values in big-endian byte order. If this is 10740 done, servers acting as replicas or immigrating file systems will be 10741 able to parse values like stateids, directory cookies, filehandles, 10742 etc., even if their native byte order is different from that of other 10743 servers cooperating in the replication and migration of the file 10744 system. 10746 11.7.1. File System Transitions and Simultaneous Access 10748 When a single file system may be accessed at multiple locations, 10749 whether this is because of an indication of file system identity as 10750 reported by the fs_locations or fs_locations_info attributes or 10751 because two file system instances have corresponding locations on 10752 server addresses which connect to the same server (as indicated by a 10753 common so_major_id field in the eir_server_owner field returned by 10754 EXCHANGE_ID), the client will, depending on specific circumstances as 10755 discussed below, do one of the following: 10757 o The client accesses multiple instances simultaneously, as 10758 representing alternate paths to the same data and metadata. 10760 o The client accesses one instance (or set of instances) and then 10761 transitions to an alternative instance (or set of instances) as a 10762 result of network issues, server unresponsiveness, or server- 10763 directed migration. The transition may involve changes in 10764 filehandles, fileids, the change attribute, and/or locking state, 10765 depending on the attributes of the source and destination file 10766 system instances, as specified in the fs_locations_info attribute. 10768 Which of these choices is possible, and how a transition is effected, 10769 is governed by equivalence classes of file system instances as 10770 reported by the fs_locations_info attribute, and, for file system 10771 instances in the same location within multiple single-server 10772 namespaces, as indicated by the so_major_id field in the 10773 eir_server_owner field returned by EXCHANGE_ID. 10775 11.7.2. Simultaneous Use and Transparent Transitions 10777 When two file system instances have the same location within their 10778 respective single-server namespaces and those two server network 10779 addresses designate the same server (as indicated by the same 10780 so_major_id value in the eir_server_owner value returned in response 10781 to EXCHANGE_ID), those file system instances can be treated as the 10782 same, and either used together simultaneously or serially with no 10783 transition activity required on the part of the client. In this case, 10784 we refer to the transition as "transparent", and the client, in 10785 transferring access from one to the other, is acting as it would in the 10786 event that communication is interrupted, with a new connection and 10787 possibly a new session being established to continue access to the 10788 same file system. 10790 Whether simultaneous use of the two file system instances is valid is 10791 controlled by whether the fs_locations_info attribute shows the two 10792 instances as having the same _simultaneous-use_ class. See 10793 Section 11.10.1 for information about the definition of the various 10794 use classes, including the _simultaneous-use_ class. 10796 Note that for two such file systems, any information within the 10797 fs_locations_info attribute that indicates the need for special 10798 transition activity, i.e.
the appearance of the two file system 10799 instances with different _handle_, _fileid_, _write-verifier_, 10800 _change_, or _readdir_ classes, indicates a serious problem, and the 10801 client, if it allows transition to the file system instance at all, 10802 must not treat this as a transparent transition. The server SHOULD 10803 NOT indicate that these instances belong to different _handle_, 10804 _fileid_, _write-verifier_, _change_, or _readdir_ classes, whether the 10805 two instances are shown belonging to the same _simultaneous-use_ 10806 class or not. 10808 Where these conditions do not apply, a non-transparent file system 10809 instance transition is required, with the details depending on the 10810 respective _handle_, _fileid_, _write-verifier_, _change_, and _readdir_ 10811 classes of the two file system instances and whether the two server 10812 addresses in question have the same eir_server_scope value as reported 10813 by EXCHANGE_ID. 10815 11.7.2.1. Simultaneous Use of File System Instances 10817 When the conditions in Section 11.7.2 hold, in either of the 10818 following two cases, the client may use the two file system instances 10819 simultaneously. 10821 o The fs_locations_info attribute does not contain separate per- 10822 network-address entries for file system instances at the distinct 10823 network addresses. This includes the case in which the 10824 fs_locations_info attribute is unavailable. In this case, the 10825 fact that the two server addresses connect to the same server (as 10826 indicated by the two addresses sharing the same so_major_id 10827 value and subsequently confirmed as described in Section 2.10.4) 10828 justifies simultaneous use, and there is no fs_locations_info 10829 attribute information contradicting that use. 10831 o The fs_locations_info attribute indicates that two file system 10832 instances belong to the same _simultaneous-use_ class. 10834 In this case, the client may use both file system instances 10835 simultaneously, as representations of the same file system, whether 10836 that happens because the two network addresses connect to the same 10837 physical server or because different servers connect to clustered 10838 file systems and export their data in common. When simultaneous use 10839 is in effect, any change made to one file system instance must be 10840 immediately reflected in the other file system instance(s). Locks 10841 are treated as part of a common lease, associated with a common 10842 client ID. Depending on the details of the eir_server_owner returned 10843 by EXCHANGE_ID, the two server instances may be accessed by different 10844 sessions or a single session in common. 10846 11.7.2.2. Transparent File System Transitions 10848 When the conditions in Section 11.7.2 hold and the 10849 fs_locations_info attribute explicitly shows the file system 10850 instances for these distinct network addresses as belonging to 10851 different _simultaneous-use_ classes, the file system instances 10852 should not be used by the client simultaneously, but rather serially, 10853 with one being used unless and until communication difficulties, lack 10854 of responsiveness, or an explicit migration event causes another file 10855 system instance (or set of file system instances sharing a common 10856 _simultaneous-use_ class) to be used. 10858 When a change of file system instance is to be done, the client will 10859 use the same client ID already in effect. If it already has 10860 connections to the new server address, these will be used.
Otherwise 10861 new connections to existing sessions or new sessions associated with 10862 the existing client ID are established as indicated by the 10863 eir_server_owner returned by EXCHANGE_ID. 10865 In all such transparent transition cases, the following apply: 10867 o If filehandles are persistent they stay the same. If filehandles 10868 are volatile, they either stay the same, or if they expire, the 10869 reason for expiration is not due to the file system transition. 10871 o Fileid values do not change across the transition. 10873 o The file system will have the same fsid in both the old and new 10874 locations. 10876 o Change attribute values are consistent across the transition and 10877 do not have to be refetched. When change attributes indicate that 10878 a cached object is still valid, it can remain cached. 10880 o Client and state identifiers retain their validity across the 10881 transition, except where their staleness is recognized and 10882 reported by the new server. Except where such staleness requires 10883 it, no lock reclamation is needed. Any such staleness is an 10884 indication that the server should be considered to have restarted 10885 and is reported as discussed in Section 8.4.2. 10887 o Write verifiers are presumed to retain their validity and can be 10888 used to compare with verifiers returned by COMMIT on the new 10889 server, with the expectation that if COMMIT on the new server 10890 returns an identical verifier, then that server has all of the 10891 data unstably written to the original server and has committed it 10892 to stable storage as requested. 10894 o Readdir cookies are presumed to retain their validity and can be 10895 presented to subsequent READDIR requests together with the readdir 10896 verifier with which they are associated. When the verifier is 10897 accepted as valid, the cookie will continue the READDIR operation 10898 so that the entire directory can be obtained by the client. 10900 11.7.3. Filehandles and File System Transitions 10902 There are a number of ways in which filehandles can be handled across 10903 a file system transition. These can be divided into two broad 10904 classes depending upon whether the two file systems across which the 10905 transition happens share sufficient state to effect some sort of 10906 continuity of file system handling. 10908 When there is no such co-operation in filehandle assignment, the two 10909 file systems are reported as being in different _handle_ classes. In 10910 this case, all filehandles are assumed to expire as part of the file 10911 system transition. Note that this behavior does not depend on 10912 fh_expire_type attribute and supersedes the specification of 10913 FH4_VOL_MIGRATION bit, which only affects behavior when 10914 fs_locations_info is not available. 10916 When there is co-operation in filehandle assignment, the two file 10917 systems are reported as being in the same _handle_ classes. In this 10918 case, persistent filehandles remain valid after the file system 10919 transition, while volatile filehandles (excluding those that are only 10920 volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration 10921 on the target server. 10923 11.7.4. Fileids and File System Transitions 10925 In NFSv4.0, the issue of continuity of fileids in the event of a file 10926 system transition was not addressed. 
The general expectation had 10927 been that in situations in which the two file system instances are 10928 created by a single vendor using some sort of file system image copy, 10929 fileids will be consistent across the transition, while in the 10930 analogous multi-vendor transitions they will not. This poses 10931 difficulties, especially for the client without special knowledge of 10932 the transition mechanisms adopted by the server. Note that although 10933 fileid is not a REQUIRED attribute, many servers support fileids and 10934 many clients provide APIs that depend on fileids. 10936 It is important to note that while clients themselves may have no 10937 trouble with a fileid changing as a result of a file system 10938 transition event, applications do typically have access to the fileid 10939 (e.g. via stat), and the result of this is that an application may 10940 work perfectly well if there is no file system instance transition or 10941 if any such transition is among instances created by a single vendor, 10942 yet be unable to deal with the situation in which a multi-vendor 10943 transition occurs at the wrong time. 10945 Providing the same fileids in a multi-vendor (multiple server 10946 vendors) environment has generally been held to be quite difficult. 10947 While there is work to be done, it needs to be pointed out that this 10948 difficulty is partly self-imposed. Servers have typically identified 10949 fileid with inode number, i.e. with a quantity used to find the file 10950 in question. This identification poses special difficulties for 10951 migration of a file system between vendors where assigning the same 10952 index to a given file may not be possible. Note here that a fileid 10953 is not required to be useful to find the file in question, only that 10954 it is unique within the given file system. Servers prepared to 10955 accept a fileid as a single piece of metadata and store it apart from 10956 the value used to index the file information can relatively easily 10957 maintain a fileid value across a migration event, allowing a truly 10958 transparent migration event. 10960 In any case, where servers can provide continuity of fileids, they 10961 should, and the client should be able to find out that such 10962 continuity is available and take appropriate action. Information 10963 about the continuity (or lack thereof) of fileids across a file 10964 system transition is represented by specifying whether the file 10965 systems in question are of the same _fileid_ class. 10967 Note that when consistent fileids do not exist across a transition 10968 (either because there is no continuity of fileids or because fileid 10969 is not a supported attribute on one of the instances involved), and there 10970 are no reliable filehandles across a transition event (either because 10971 there is no filehandle continuity or because the filehandles are 10972 volatile), the client is in a position where it cannot verify that 10973 files it was accessing before the transition are the same objects. 10974 It is forced to assume that no object has been renamed, and, unless 10975 there are guarantees that provide this (e.g. the file system is read- 10976 only), problems for applications may occur. Therefore, use of such 10977 configurations should be limited to situations where the problems 10978 that this may cause can be tolerated. 10980 11.7.5.
Fsids and File System Transitions 10982 Since fsids are generally unique only on a per-server basis, it 10983 is likely that they will change during a file system transition. One 10984 exception is the case of transparent transitions, but in that case we 10985 have multiple network addresses that are defined as the same server 10986 (as specified by a common value of the so_major_id field of 10987 eir_server_owner). Clients should not make the fsids received from 10988 the server visible to applications since they may not be globally 10989 unique, and because they may change during a file system transition 10990 event. Applications are best served if they are isolated from such 10991 transitions to the extent possible. 10993 Although normally a single source file system will transition to a 10994 single target file system, there is a provision for splitting a 10995 single source file system into multiple target file systems, by 10996 specifying the FSLI4F_MULTI_FS flag. 10998 11.7.5.1. File System Splitting 11000 When a file system transition is made and the fs_locations_info 11001 indicates that the file system in question may be split into multiple 11002 file systems (via the FSLI4F_MULTI_FS flag), the client SHOULD do 11003 GETATTRs to determine the fsid attribute on all known objects within 11004 the file system undergoing transition to determine the new file 11005 system boundaries. 11007 Clients may maintain the fsids passed to existing applications by 11008 mapping all of the fsids for the descendent file systems to the 11009 common fsid used for the original file system. 11011 Splitting a file system may be done on a transition between file 11012 systems of the same _fileid_ class, since the fact that fileids are 11013 unique within the source file system ensures that they will be unique in 11014 each of the target file systems. 11016 11.7.6. The Change Attribute and File System Transitions 11018 Since the change attribute is defined as a server-specific one, 11019 change attributes fetched from one server are normally presumed to be 11020 invalid on another server. Such a presumption is troublesome since 11021 it would invalidate all cached change attributes, requiring 11022 refetching. Even more disruptive, the absence of any assured 11023 continuity for the change attribute means that even if the same value 11024 is retrieved on refetch, no conclusions can be drawn as to whether the 11025 object in question has changed. The identical change attribute could 11026 be merely an artifact of a modified file with a different change 11027 attribute construction algorithm, with that new algorithm just 11028 happening to result in an identical change value. 11030 When the two file systems have consistent change attribute formats, 11031 and this fact is communicated to the client by reporting the two file 11032 systems as being in the same _change_ class, the client may assume a continuity of change 11033 attribute construction and handle this situation just as it would be 11034 handled without any file system transition. 11036 11.7.7. Lock State and File System Transitions 11038 In a file system transition, the client needs to handle cases in 11039 which the two servers have cooperated in state management and in 11040 which they have not. Cooperation by two servers in state management 11041 requires coordination of client IDs.
Before the client attempts to 11042 use a client ID associated with one server in a request to the server 11043 of the other file system, it must eliminate the possibility that two 11044 non-cooperating servers have assigned the same client ID by accident. 11045 The client needs to compare the eir_server_scope values returned by 11046 each server. If the scope values do not match, then the servers have 11047 not cooperated in state management. If the scope values match, then 11048 this indicates the servers have cooperated in assigning client IDs to 11049 the point that they will reject client IDs that refer to state they 11050 do not know about. 11052 In the case of migration, the servers involved in the migration of a 11053 file system SHOULD transfer all server state from the original to the 11054 new server. When this is done, it must be done in a way that is 11055 transparent to the client. With replication, such a degree of common 11056 state is typically not present. Clients, however, should use the 11057 information provided by the eir_server_scope returned by EXCHANGE_ID 11058 to determine whether such sharing may be in effect, rather than 11059 making assumptions based on the reason for the transition. 11061 This state transfer will reduce disruption to the client when a file 11062 system transition occurs. If the servers are successful in 11063 transferring all state, the client can attempt to establish sessions 11064 associated with the client ID used for the source file system 11065 instance. If the server accepts that as a valid client ID, then the 11066 client may use the existing stateids associated with that client ID 11067 for the old file system instance in connection with that same client 11068 ID on the transitioned file system instance. 11070 When the two servers belong to the same server scope, it does not 11071 mean that when dealing with the transition, the client will not have 11072 to reclaim state. However, it does mean that the client may proceed 11073 using its current client ID when establishing communication with the 11074 new server, and the new server will either recognize the client ID as 11075 valid, or reject it, in which case locks must be reclaimed by the 11076 client. 11078 File systems co-operating in state management may actually share 11079 state or simply divide the identifier space so as to recognize (and 11080 reject as stale) each other's stateids and client IDs. Servers which 11081 do share state may not do so under all conditions or at all times. 11082 The requirement for the server is that if it cannot be sure in 11083 accepting a client ID that it reflects the locks the client was 11084 given, it must treat all associated state as stale and report it as 11085 such to the client. 11087 When the two file system instances are on servers that do not share a 11088 server scope value, the client must establish a new client ID on the 11089 destination, if it does not have one already, and reclaim locks if 11090 possible. In this case, old stateids and client IDs should not be 11091 presented to the new server since there is no assurance that they 11092 will not conflict with IDs valid on that server. 11094 In either case, when actual locks are not known to be maintained, the 11095 destination server may establish a grace period specific to the given 11096 file system, with non-reclaim locks being rejected for that file 11097 system, even though normal locks are being granted for other file 11098 systems.
Clients should not infer the absence of a grace period for 11099 file systems being transitioned to a server from responses to 11100 requests for other file systems. 11102 In the case of lock reclamation for a given file system after a file 11103 system transition, edge conditions can arise similar to those for 11104 reclaim after server restart (although in the case of the planned 11105 state transfer associated with migration, these can be avoided by 11106 securely recording lock state as part of state migration). Unless 11107 the destination server can guarantee that locks will not be 11108 incorrectly granted, the destination server should not allow lock 11109 reclaims and should avoid establishing a grace period. 11111 Once all locks have been reclaimed, or there were no locks to 11112 reclaim, the client indicates that there are no more reclaims to be 11113 done for the file system in question by issuing a RECLAIM_COMPLETE 11114 operation with the rca_one_fs parameter set to true. Once this has 11115 been done, non-reclaim locking operations may be done, and any 11116 subsequent request to do reclaims will be rejected with the error 11117 NFS4ERR_NO_GRACE. 11119 Information about client identity may be propagated between servers 11120 in the form of client_owner4 and associated verifiers, under the 11121 assumption that the client presents the same values to all the 11122 servers with which it deals. 11124 Servers are encouraged to provide facilities to allow locks to be 11125 reclaimed on the new server after a file system transition. Often, 11126 however, in cases in which the two servers do not share a server 11127 scope value, such facilities may not be available, and the client should 11128 be prepared to re-obtain locks, even though it is possible that the 11129 client may have its LOCK or OPEN request denied due to a conflicting 11130 lock. 11132 The consequences of having no facilities available to reclaim locks 11133 on the new server will depend on the type of environment. In some 11134 environments, such as the transition between read-only file systems, 11135 such denial of locks should not pose large difficulties in practice. 11136 When an attempt to re-establish a lock on a new server is denied, the 11137 client should treat the situation as if its original lock had been 11138 revoked. Note that when the lock is granted, the client cannot 11139 assume that no conflicting lock could have been granted in the 11140 interim. Where change attribute continuity is present, the client 11141 may check the change attribute to detect unwanted file 11142 modifications. Where even this is not available, and the file system 11143 is not read-only, a client may reasonably treat all pending locks as 11144 having been revoked. 11146 11.7.7.1. Leases and File System Transitions 11148 In the case of lease renewal, the client may not be submitting 11149 requests for a file system that has been transferred to another 11150 server. This can occur because of the lease renewal mechanism. The 11151 client renews the lease associated with all file systems when 11152 submitting a request on an associated session, regardless of the 11153 specific file system being referenced. 11155 In order for the client to schedule renewal of leases where there is 11156 locking state that may have been relocated to the new server, the 11157 client must find out about lease relocation before those leases 11158 expire.
To accomplish this, the SEQUENCE operation will return the 11159 status bit SEQ4_STATUS_LEASE_MOVED if responsibility for any of the 11160 locking state being renewed has been transferred to a new server. This 11161 will continue until the client receives an NFS4ERR_MOVED error for 11162 each of the file systems for which there has been locking state 11163 relocation. 11165 When a client receives a SEQ4_STATUS_LEASE_MOVED indication, it 11166 should perform an operation on each file system associated with the 11167 server for which the current client has locking state. 11168 The client may choose to reference 11169 all file systems in the interests of simplicity, but what is important 11170 is that it must reference all file systems for which 11171 locking state has moved. Once the client receives an 11172 NFS4ERR_MOVED error for each file system, the SEQ4_STATUS_LEASE_MOVED 11173 indication is cleared. The client can terminate the process of 11174 checking file systems once this indication is cleared (but only if 11175 the client has received a reply for all outstanding SEQUENCE requests 11176 on all sessions it has with the server), since there are no others 11177 for which locking state has moved. 11179 A client may use GETATTR of the fs_status (or fs_locations_info) 11180 attribute on all of the file systems to get absence indications in a 11181 single (or a few) request(s), since absent file systems will not 11182 cause an error in this context. However, it still must do an 11183 operation which receives NFS4ERR_MOVED on each file system, in order 11184 for the SEQ4_STATUS_LEASE_MOVED indication to be cleared. 11186 Once the set of file systems with transferred locking state has been 11187 determined, the client can follow the normal process to obtain the 11188 new server information (through the fs_locations and 11189 fs_locations_info attributes) and perform renewal of those leases on 11190 the new server, unless information in the fs_locations_info attribute 11191 shows that no state could have been transferred. If the server has 11192 not had state transferred to it transparently, the client will 11193 receive NFS4ERR_STALE_CLIENTID from the new server, as described 11194 above, and the client can then reclaim locks as is done in the event 11195 of server failure. 11197 11.7.7.2. Transitions and the Lease_time Attribute 11199 In order that the client may appropriately manage its leases in the 11200 case of a file system transition, the destination server must 11201 establish proper values for the lease_time attribute. 11203 When state is transferred transparently, that state should include 11204 the correct value of the lease_time attribute. The lease_time 11205 attribute on the destination server must never be less than that on 11206 the source since this would result in premature expiration of leases 11207 granted by the source server. Upon transitions in which state is 11208 transferred transparently, the client is under no obligation to re- 11209 fetch the lease_time attribute and may continue to use the value 11210 previously fetched (on the source server). 11212 If state has not been transferred transparently, either because the 11213 associated servers are shown as having different eir_server_scope 11214 strings or because the client ID is rejected when presented to the 11215 new server, the client should fetch the value of lease_time on the 11216 new (i.e.
destination) server, and use it for subsequent locking 11217 requests. However, the server must respect a grace period at least as 11218 long as the lease_time on the source server, in order to ensure that 11219 clients have ample time to reclaim their locks before potentially 11220 conflicting non-reclaimed locks are granted. 11222 11.7.8. Write Verifiers and File System Transitions 11224 In a file system transition, the two file systems may be clustered in 11225 the handling of unstably written data. When this is the case, and 11226 the two file systems belong to the same _write-verifier_ class, write 11227 verifiers returned from one system may be compared to those returned 11228 by the other and superfluous writes avoided. 11230 When two file systems belong to different _write-verifier_ classes, 11231 any verifier generated by one must not be compared to one provided by 11232 the other. Instead, it should be treated as not equal even when the 11233 values are identical. 11235 11.7.9. Readdir Cookies and Verifiers and File System Transitions 11237 In a file system transition, the two file systems may be consistent 11238 in their handling of READDIR cookies and verifiers. When this is the 11239 case, and the two file systems belong to the same _readdir_ class, 11240 READDIR cookies and verifiers from one system may be recognized by 11241 the other and READDIR operations started on one server may be validly 11242 continued on the other, simply by presenting the cookie and verifier 11243 returned by a READDIR operation done on the first file system to the 11244 second. 11246 When two file systems belong to different _readdir_ classes, any 11247 READDIR cookie and verifier generated by one is not valid on the 11248 second, and must not be presented to that server by the client. The 11249 client should act as if the verifier was rejected. 11251 11.7.10. File System Data and File System Transitions 11253 When multiple replicas exist and are used simultaneously or in 11254 succession by a client, applications using them will normally expect 11255 that they contain either the same data or data which is consistent with 11256 the normal sorts of changes that are made by other clients updating 11257 the data of the file system (with metadata being the same to the 11258 degree indicated by the fs_locations_info attribute). However, when 11259 multiple file systems are presented as replicas of one another, the 11260 precise relationship between the data of one and the data of another 11261 is not, as a general matter, specified by the NFSv4.1 protocol. It 11262 is quite possible to present as replicas file systems where the data 11263 of those file systems is sufficiently different that some 11264 applications have problems dealing with the transition between 11265 replicas. The namespace will typically be constructed so that 11266 applications can choose an appropriate level of support, so that in 11267 one position in the namespace a varied set of replicas will be listed 11268 while in another only those that are up-to-date may be considered 11269 replicas.
The protocol does define four special cases of the 11270 relationship among replicas to be specified by the server and relied 11271 upon by clients: 11273 o When multiple server addresses correspond to the same actual 11274 server, as indicated by a common so_major_id field within the 11275 eir_server_owner field returned by EXCHANGE_ID, the client may 11276 depend on the fact that changes to data, metadata, or locks made 11277 on one file system are immediately reflected on others. 11279 o When multiple replicas exist and are used simultaneously by a 11280 client (see the FSLIB4_CLSIMUL definition within 11281 fs_locations_info), they must designate the same data. Where file 11282 systems are writable, a change made on one instance must be 11283 visible on all instances, immediately upon the earlier of the 11284 return of the modifying requester or the visibility of that change 11285 on any of the associated replicas. This allows a client to use 11286 these replicas simultaneously without any special adaptation to 11287 the fact that there are multiple replicas. In this case, locks, 11288 whether shared or byte-range, and delegations obtained on one replica 11289 are immediately reflected on all replicas, even though these locks 11290 will be managed under a set of client IDs. 11292 o When one replica is designated as the successor instance to 11293 another existing instance after the return of NFS4ERR_MOVED (i.e. the 11294 case of migration), the client may depend on the fact that all 11295 changes securely made to data (uncommitted writes are dealt with 11296 in Section 11.7.8) on the original instance are made to the 11297 successor image. 11299 o Where a file system is not writable but represents a read-only 11300 copy (possibly periodically updated) of a writable file system, 11301 clients have similar requirements with regard to the propagation 11302 of updates. They may need a guarantee that any change visible on 11303 the original file system instance must be immediately visible on 11304 any replica before the client transitions access to that replica, 11305 in order to avoid any possibility that a client, in effecting a 11306 transition to a replica, will see any reversion in file system 11307 state. The specific means by which this will be prevented varies 11308 based on the fs4_status_type reported as part of the fs_status 11309 attribute (see Section 11.11). Since these file systems are 11310 presumed not to be suitable for simultaneous use, there is no 11311 specification of how locking is handled, and it generally will be 11312 the case that locks obtained on one file system will be separate from 11313 those on others. Since these are going to be read-only file 11314 systems, this is not expected to pose an issue for clients or 11315 applications. 11317 11.8. Effecting File System Referrals 11319 Referrals are effected when an absent file system is encountered, and 11320 one or more alternate locations are made available by the 11321 fs_locations or fs_locations_info attributes. The client will 11322 typically get an NFS4ERR_MOVED error, fetch the appropriate location 11323 information, and proceed to access the file system on a different 11324 server, even though it retains its logical position within the 11325 original namespace. Referrals differ from migration events in that 11326 they happen only when the client has not previously referenced the 11327 file system in question (so there is nothing to transition).
11328 Referrals can only come into effect when an absent file system is 11329 encountered at its root. 11331 The examples given in the sections below are somewhat artificial in 11332 that an actual client will not typically do a multi-component lookup, 11333 but will have cached information regarding the upper levels of the 11334 name hierarchy. However, these examples are chosen to make the 11335 required behavior clear and easy to put within the scope of a small 11336 number of requests, without getting unduly into details of how 11337 specific clients might choose to cache things. 11339 11.8.1. Referral Example (LOOKUP) 11341 Let us suppose that the following COMPOUND is sent in an environment 11342 in which /this/is/the/path is absent from the target server. This 11343 may be for a number of reasons. It may be the case that the file 11344 system has moved, or it may be the case that the target server is 11345 functioning mainly, or solely, to refer clients to the servers on 11346 which various file systems are located. 11348 o PUTROOTFH 11350 o LOOKUP "this" 11352 o LOOKUP "is" 11354 o LOOKUP "the" 11356 o LOOKUP "path" 11358 o GETFH 11360 o GETATTR fsid,fileid,size,time_modify 11362 Under the given circumstances, the following will be the result. 11364 o PUTROOTFH --> NFS_OK. The current fh is now the root of the 11365 pseudo-fs. 11367 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 11368 within the pseudo-fs. 11370 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 11371 within the pseudo-fs. 11373 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 11374 is within the pseudo-fs. 11376 o LOOKUP "path" --> NFS_OK. The current fh is for /this/is/the/path 11377 and is within a new, absent file system, but ... the client will 11378 never see the value of that fh. 11380 o GETFH --> NFS4ERR_MOVED. Fails because current fh is in an absent 11381 file system at the start of the operation and the spec makes no 11382 exception for GETFH. 11384 o GETATTR fsid,fileid,size,time_modify. Not executed because the 11385 failure of the GETFH stops processing of the COMPOUND. 11387 Given the failure of the GETFH, the client has the job of determining 11388 the root of the absent file system and where to find that file 11389 system, i.e. the server and path relative to that server's root fh. 11390 Note here that in this example, the client did not obtain filehandles 11391 and attribute information (e.g. fsid) for the intermediate 11392 directories, so that it cannot be sure where the absent file 11393 system starts. It could be the case, for example, that /this/is/the 11394 is the root of the moved file system and that the reason that the 11395 lookup of "path" succeeded is that the file system was not absent on 11396 that operation but was moved between the last LOOKUP and the GETFH 11397 (since COMPOUND is not atomic). Even if we had the fsids for all of 11398 the intermediate directories, we would have no way of knowing that 11399 /this/is/the/path was the root of a new file system, since we don't 11400 yet have its fsid. 11402 In order to get the necessary information, let us re-send the chain 11403 of LOOKUPs with GETFHs and GETATTRs to at least get the fsids so we 11404 can be sure where the appropriate file system boundaries are.
The 11405 client could choose to get fs_locations_info at the same time but in 11406 most cases the client will have a good guess as to where file system 11407 boundaries are (because of where and where not NFS4ERR_MOVED was 11408 received) making fetching of fs_locations_info unnecessary. 11410 OP01: PUTROOTFH --> NFS_OK 11412 - Current fh is root of pseudo-fs. 11414 OP02: GETATTR(fsid) --> NFS_OK 11416 - Just for completeness. Normally, clients will know the fsid of 11417 the pseudo-fs as soon as they establish communication with a 11418 server. 11420 OP03: LOOKUP "this" --> NFS_OK 11422 OP04: GETATTR(fsid) --> NFS_OK 11424 - Get current fsid to see where file system boundaries are. The 11425 fsid will be that for the pseudo-fs in this example, so no 11426 boundary. 11428 OP05: GETFH --> NFS_OK 11430 - Current fh is for /this and is within pseudo-fs. 11432 OP06: LOOKUP "is" --> NFS_OK 11434 - Current fh is for /this/is and is within pseudo-fs. 11436 OP07: GETATTR(fsid) --> NFS_OK 11438 - Get current fsid to see where file system boundaries are. The 11439 fsid will be that for the pseudo-fs in this example, so no 11440 boundary. 11442 OP08: GETFH --> NFS_OK 11444 - Current fh is for /this/is and is within pseudo-fs. 11446 OP09: LOOKUP "the" --> NFS_OK 11448 - Current fh is for /this/is/the and is within pseudo-fs. 11450 OP10: GETATTR(fsid) --> NFS_OK 11452 - Get current fsid to see where file system boundaries are. The 11453 fsid will be that for the pseudo-fs in this example, so no 11454 boundary. 11456 OP11: GETFH --> NFS_OK 11458 - Current fh is for /this/is/the and is within pseudo-fs. 11460 OP12: LOOKUP "path" --> NFS_OK 11462 - Current fh is for /this/is/the/path and is within a new, absent 11463 file system, but ... 11465 - The client will never see the value of that fh 11467 OP13: GETATTR(fsid, fs_locations_info) --> NFS_OK 11469 - We are getting the fsid to know where the file system boundaries 11470 are. In this operation the fsid will be different than that of 11471 the parent directory (which in turn was retrieved in OP10). Note 11472 that the fsid we are given will not necessarily be preserved at 11473 the new location. That fsid might be different and in fact the 11474 fsid we have for this file system might be a valid fsid of a 11475 different file system on that new server. 11477 - In this particular case, we are pretty sure anyway that what has 11478 moved is /this/is/the/path rather than /this/is/the since we have 11479 the fsid of the latter and it is that of the pseudo-fs, which 11480 presumably cannot move. However, in other examples, we might not 11481 have this kind of information to rely on (e.g. /this/is/the might 11482 be a non-pseudo file system separate from /this/is/the/path), so 11483 we need to have another reliable source information on the 11484 boundary of the file system which is moved. If, for example, the 11485 file system "/this/is" had moved we would have a case of migration 11486 rather than referral and once the boundaries of the migrated file 11487 system was clear we could fetch fs_locations_info. 11489 - We are fetching fs_locations_info because the fact that we got an 11490 NFS4ERR_MOVED at this point means that it most likely that this is 11491 a referral and we need the destination. Even if it is the case 11492 that "/this/is/the" is a file system which has migrated, we will 11493 still need the location information for that file system. 
11495 OP14: GETFH --> NFS4ERR_MOVED 11497 - Fails because current fh is in an absent file system at the start 11498 of the operation and the spec makes no exception for GETFH. Note 11499 that this means the server will never send the client a filehandle 11500 from within an absent file system. 11502 Given the above, the client knows where the root of the absent file 11503 system is (/this/is/the/path), by noting where the change of fsid 11504 occurred (between "the" and "path"). The fs_locations_info attribute 11505 also gives the client the actual location of the absent file system, 11506 so that the referral can proceed. The server gives the client the 11507 bare minimum of information about the absent file system so that 11508 there will be very little scope for problems of conflict between 11509 information sent by the referring server and information of the file 11510 system's home. No filehandles and very few attributes are present on 11511 the referring server and the client can treat those it receives as 11512 basically transient information with the function of enabling the 11513 referral. 11515 11.8.2. Referral Example (READDIR) 11517 Another context in which a client may encounter referrals is when it 11518 does a READDIR on directory in which some of the sub-directories are 11519 the roots of absent file systems. 11521 Suppose such a directory is read as follows: 11523 o PUTROOTFH 11525 o LOOKUP "this" 11527 o LOOKUP "is" 11529 o LOOKUP "the" 11531 o READDIR (fsid, size, time_modify, mounted_on_fileid) 11533 In this case, because rdattr_error is not requested, 11534 fs_locations_info is not requested, and some of attributes cannot be 11535 provided, the result will be an NFS4ERR_MOVED error on the READDIR, 11536 with the detailed results as follows: 11538 o PUTROOTFH --> NFS_OK. The current fh is at the root of the 11539 pseudo-fs. 11541 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 11542 within the pseudo-fs. 11544 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 11545 within the pseudo-fs. 11547 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 11548 is within the pseudo-fs. 11550 o READDIR (fsid, size, time_modify, mounted_on_fileid) --> 11551 NFS4ERR_MOVED. Note that the same error would have been returned 11552 if /this/is/the had migrated, when in fact it is because the 11553 directory contains the root of an absent file system. 11555 So now suppose that we re-send with rdattr_error: 11557 o PUTROOTFH 11559 o LOOKUP "this" 11561 o LOOKUP "is" 11563 o LOOKUP "the" 11565 o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) 11567 The results will be: 11569 o PUTROOTFH --> NFS_OK. The current fh is at the root of the 11570 pseudo-fs. 11572 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 11573 within the pseudo-fs. 11575 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 11576 within the pseudo-fs. 11578 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 11579 is within the pseudo-fs. 11581 o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) 11582 --> NFS_OK. The attributes for directory entry with the component 11583 named "path" will only contain rdattr_error with the value 11584 NFS4ERR_MOVED, together with an fsid value and a value for 11585 mounted_on_fileid. 11587 So suppose we do another READDIR to get fs_locations_info (although 11588 we could have used a GETATTR directly, as in Section 11.8.1). 
o  PUTROOTFH

o  LOOKUP "this"

o  LOOKUP "is"

o  LOOKUP "the"

o  READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, size, time_modify)

The results would be:

o  PUTROOTFH --> NFS_OK.  The current fh is at the root of the pseudo-fs.

o  LOOKUP "this" --> NFS_OK.  The current fh is for /this and is within the pseudo-fs.

o  LOOKUP "is" --> NFS_OK.  The current fh is for /this/is and is within the pseudo-fs.

o  LOOKUP "the" --> NFS_OK.  The current fh is for /this/is/the and is within the pseudo-fs.

o  READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, size, time_modify) --> NFS_OK.  The attributes will be as shown below.

The attributes for the directory entry with the component named "path" will only contain

o  rdattr_error (value: NFS_OK)

o  fs_locations_info

o  mounted_on_fileid (value: unique fileid within referring file system)

o  fsid (value: unique value within referring server)

The attributes for entry "path" will not contain size or time_modify because these attributes are not available within an absent file system.

11.9.  The Attribute fs_locations

The fs_locations attribute is structured in the following way:

   struct fs_location4 {
           utf8str_cis     server<>;
           pathname4       rootpath;
   };

   struct fs_locations4 {
           pathname4       fs_root;
           fs_location4    locations<>;
   };

The fs_location4 data type is used to represent the location of a file system by providing a server name and the path to the root of the file system within that server's namespace.  When a set of servers have corresponding file systems at the same path within their namespaces, an array of server names may be provided.  An entry in the server array is a UTF-8 string and represents a traditional DNS host name, an IPv4 address, an IPv6 address, or a zero-length string.  A zero-length string SHOULD be used to indicate the current address being used for the RPC call.  It is not a requirement that all servers that share the same rootpath be listed in one fs_location4 instance.  The array of server names is provided for convenience.  Servers that share the same rootpath may also be listed in separate fs_location4 entries in the fs_locations attribute.

The fs_locations4 data type and fs_locations attribute contain an array of such locations.  Since the namespace of each server may be constructed differently, the "fs_root" field is provided.  The path represented by fs_root represents the location of the file system in the current server's namespace, i.e. that of the server from which the fs_locations attribute was obtained.  The fs_root path is meant to aid the client by clearly referencing the root of the file system whose locations are being reported, no matter what object within the current file system the current filehandle designates.  The fs_root is simply the pathname the client used to reach the object on the current server, that object being the one to which the fs_locations attribute applies.

When the fs_locations attribute is interrogated and there are no alternate file system locations, the server SHOULD return a zero-length array of fs_location4 structures, together with a valid fs_root.
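To make the use of fs_root and rootpath concrete, the following is a minimal Python sketch (not part of the protocol; the helper name and the representation of pathname4 values as lists of component strings are assumptions) of how a client might combine the fs_root it received with the rootpath of a chosen location to construct the path to use on the new server.  The servA/servB example in the next paragraph works through the same computation.

   def translate_path(fs_root, rootpath, path_in_use):
       # The client replaces the fs_root prefix of the path it used on
       # the current server with the rootpath advertised for the chosen
       # replica.  All three arguments are lists of path components.
       if path_in_use[:len(fs_root)] != fs_root:
           raise ValueError("fs_root is not a prefix of the path in use")
       return rootpath + path_in_use[len(fs_root):]

   # fs_root "/a/b/c" on the current server, rootpath "/x/y/z" on the
   # chosen replica, object reached at "/a/b/c/d":
   assert translate_path(["a", "b", "c"],
                         ["x", "y", "z"],
                         ["a", "b", "c", "d"]) == ["x", "y", "z", "d"]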
As an example, suppose there is a replicated file system located at two servers (servA and servB).  At servA, the file system is located at path "/a/b/c".  At servB, the file system is located at path "/x/y/z".  If the client were to obtain the fs_locations value for the directory at "/a/b/c/d", it might not necessarily know that the file system's root is located in servA's namespace at "/a/b/c".  When the client switches to servB, it will need to determine that the directory it first referenced at servA is now represented by the path "/x/y/z/d" on servB.  To facilitate this, the fs_locations attribute provided by servA would have an fs_root value of "/a/b/c" and two entries in fs_locations.  One entry in fs_locations will be for itself (servA) and the other will be for servB with a path of "/x/y/z".  With this information, the client is able to substitute "/x/y/z" for the "/a/b/c" at the beginning of its access path and construct "/x/y/z/d" to use for the new server.

Note that: there is no requirement that the number of components in each rootpath be the same; there is no relation between the number of components in rootpath and fs_root; and none of the components in each rootpath and fs_root have to be the same.  In the above example, we could have had a third element in the locations array, with server equal to "servC" and rootpath equal to "/I/II", and a fourth element in locations with server equal to "servD" and rootpath equal to "/aleph/beth/gimel/daleth/he".

The relationship of fs_root to a rootpath is that the client replaces the pathname indicated in fs_root for the current server with the substitute indicated in rootpath for the new server.

As an example of a referred or migrated file system, suppose there is a file system located at serv1.  At serv1, the file system is located at "/az/buky/vedi/glagoli".  The client finds that the object at "glagoli" has migrated (or is a referral).  The client gets the fs_locations attribute, which contains an fs_root of "/az/buky/vedi/glagoli", and one element in the locations array, with server equal to "serv2" and rootpath equal to "/izhitsa/fita".  The client replaces "/az/buky/vedi/glagoli" with "/izhitsa/fita" and uses the latter pathname on "serv2".

Thus, the server MUST return an fs_root that is equal to the path the client used to reach the object to which the fs_locations attribute applies.  Otherwise, the client cannot determine the new path to use on the new server.

Since the fs_locations attribute lacks information defining various attributes of the various file system choices presented, it SHOULD only be interrogated and used when fs_locations_info is not available.  When fs_locations is used, information about the specific locations should be assumed based on the following rules.

The following rules are general and apply irrespective of the context.

o  All listed file system instances should be considered as of the same _handle_ class if and only if the current fh_expire_type attribute does not include the FH4_VOL_MIGRATION bit.
Note that in the case of referral, filehandle issues do not apply since there can be no filehandles known within the current file system, nor is there any access to the fh_expire_type attribute on the referring (absent) file system.

o  All listed file system instances should be considered as of the same _fileid_ class if and only if the fh_expire_type attribute indicates persistent filehandles and does not include the FH4_VOL_MIGRATION bit.  Note that in the case of referral, fileid issues do not apply since there can be no fileids known within the referring (absent) file system, nor is there any access to the fh_expire_type attribute.

o  All listed file system instances should be considered as of different _change_ classes.

For other class assignments, handling of file system transitions depends on the reasons for the transition:

o  When the transition is due to migration, that is, the client was directed to a new file system after receiving an NFS4ERR_MOVED error, the target should be treated as being of the same _write-verifier_ class as the source.

o  When the transition is due to failover to another replica, that is, the client selected another replica without receiving an NFS4ERR_MOVED error, the target should be treated as being of a different _write-verifier_ class from the source.

The specific choices reflect typical implementation patterns for failover and controlled migration, respectively.  Since other choices are possible and useful, this information is better obtained by using fs_locations_info.  When a server implementation needs to communicate other choices, it MUST support the fs_locations_info attribute.

See Section 21 for a discussion of the recommendations for the security flavor to be used by any GETATTR operation that requests the "fs_locations" attribute.

11.10.  The Attribute fs_locations_info

The fs_locations_info attribute is intended as a more functional replacement for fs_locations, which will continue to exist and be supported.  Clients can use it to get a more complete set of information about alternative file system locations.  When the server does not support fs_locations_info, fs_locations can be used to get a subset of the information.  A server which supports fs_locations_info MUST support fs_locations as well.

There is additional information present in fs_locations_info that is not available in fs_locations:

o  Attribute continuity information, to allow a client to select a location which meets the transparency requirements of the applications accessing the data and to take advantage of optimizations that server guarantees as to attribute continuity may provide (e.g. change attribute).

o  File system identity information, which indicates when multiple replicas, from the client's point of view, correspond to the same target file system, allowing them to be used interchangeably, without disruption, as multiple paths to the same thing.

o  Information which will bear on the suitability of various replicas, depending on the use that the client intends.  For example, many applications need an absolutely up-to-date copy (e.g. those that write), while others may only need access to the most up-to-date copy reasonably available.
o  Server-derived preference information for replicas, which can be used to implement load-balancing while giving the client the entire file system list to be used in case the primary fails.

The fs_locations_info attribute is structured similarly to the fs_locations attribute.  A top-level structure (fs_locations_info4) contains the entire attribute including the root pathname of the file system and an array of lower-level structures that define replicas that share a common root path on their respective servers.  The lower-level structure in turn (fs_locations_item4) contains a specific pathname and information on one or more individual server replicas.  At the lowest level, fs_locations_info has an fs_locations_server4 structure that contains per-server-replica information in addition to the server name.  This per-server-replica information includes a nominally opaque array, fls_info, in which specific pieces of information are located at the specific indices listed below.

The attribute will always contain at least a single fs_locations_server4 entry.  Typically, this will be an entry with the FSLI4GF_CUR_REQ flag set, although in the case of a referral there will be no entry with that flag set.

It should be noted that fs_locations_info attributes returned by servers for various replicas may differ for various reasons.  One server may know about a set of replicas that are not known to other servers.  Further, compatibility attributes may differ.  Filehandles may be of the same class going from replica A to replica B but not going in the reverse direction.  This may happen because the filehandles are the same but the implementation of the server hosting replica B may not have provision to note and report that equivalence.

The fs_locations_info attribute consists of a root pathname (fli_fs_root, just like fs_root in the fs_locations attribute), together with an array of fs_locations_item4 structures.  The fs_locations_item4 structures in turn consist of a root pathname (fli_rootpath) together with an array (fli_entries) of elements of data type fs_locations_server4, all defined as follows.

   /*
    * Defines an individual server replica
    */
   struct fs_locations_server4 {
           int32_t         fls_currency;
           opaque          fls_info<>;
           utf8str_cis     fls_server;
   };

   /*
    * Byte indices of items within
    * fls_info: flag fields, class numbers,
    * bytes indicating ranks and orders.
    */
   const FSLI4BX_GFLAGS            = 0;
   const FSLI4BX_TFLAGS            = 1;

   const FSLI4BX_CLSIMUL           = 2;
   const FSLI4BX_CLHANDLE          = 3;
   const FSLI4BX_CLFILEID          = 4;
   const FSLI4BX_CLWRITEVER        = 5;
   const FSLI4BX_CLCHANGE          = 6;
   const FSLI4BX_CLREADDIR         = 7;

   const FSLI4BX_READRANK          = 8;
   const FSLI4BX_WRITERANK         = 9;
   const FSLI4BX_READORDER         = 10;
   const FSLI4BX_WRITEORDER        = 11;

   /*
    * Bits defined within the general flag byte.
    */
   const FSLI4GF_WRITABLE          = 0x01;
   const FSLI4GF_CUR_REQ           = 0x02;
   const FSLI4GF_ABSENT            = 0x04;
   const FSLI4GF_GOING             = 0x08;
   const FSLI4GF_SPLIT             = 0x10;

   /*
    * Bits defined within the transport flag byte.
    */
   const FSLI4TF_RDMA              = 0x01;

   /*
    * Defines a set of replicas sharing
    * a common value of the root path
    * within the corresponding
    * single-server namespaces.
    */
   struct fs_locations_item4 {
           fs_locations_server4    fli_entries<>;
           pathname4               fli_rootpath;
   };

   /*
    * Defines the overall structure of
    * the fs_locations_info attribute.
    */
   struct fs_locations_info4 {
           uint32_t                fli_flags;
           int32_t                 fli_valid_for;
           pathname4               fli_fs_root;
           fs_locations_item4      fli_items<>;
   };

   /*
    * Flag bits in fli_flags.
    */
   const FSLI4IF_VAR_SUB = 0x00000001;

   typedef fs_locations_info4 fattr4_fs_locations_info;

As noted above, the fs_locations_info attribute, when supported, may be requested of absent file systems without causing NFS4ERR_MOVED to be returned, and it is generally expected that it will be available for both present and absent file systems even if only a single fs_locations_server4 entry is present, designating the current (present) file system, or two fs_locations_server4 entries designating the previous location of an absent file system (the one just referenced) and its successor location.  Servers are strongly urged to support this attribute on all file systems if they support it on any file system.

The data presented in the fs_locations_info attribute may be obtained by the server in any number of ways, including specification by the administrator or by current protocols for transferring data among replicas and protocols not yet developed.  NFSv4.1 only defines how this information is presented by the server to the client.

11.10.1.  The fs_locations_server4 Structure

The fs_locations_server4 structure consists of the following items:

o  An indication of file system up-to-date-ness (fls_currency) in terms of approximate seconds before the present.  This value is relative to the master copy.  A negative value indicates that the server is unable to give any reasonably useful value here.  A zero indicates that the file system is the actual writable data or a reliably coherent and fully up-to-date copy.  Positive values indicate how out-of-date this copy can normally be before it is considered for update.  Such a value is not a guarantee that such updates will always be performed on the required schedule but instead serves as a hint about how far the copy of the data would be expected to be behind the most up-to-date copy.

o  A counted array of one-byte values (fls_info) containing information about the particular file system instance.  This data includes general flags, transport capability flags, file system equivalence class information, and selection priority information.  The encoding will be discussed below.

o  The server string (fls_server).  For the case of the replica currently being accessed (via GETATTR), a zero-length string MAY be used to indicate the current address being used for the RPC call.

Data within the fls_info array is in the form of 8-bit data items with constants giving the offsets within the array of various values describing this particular file system instance.  This style of definition was chosen, in preference to explicit XDR structure definitions for these values, for a number of reasons.
o  The kinds of data in the fls_info array, representing flags, file system classes, and priorities among a set of file systems representing the same data, are such that eight bits provides a quite acceptable range of values.  Even where there might be more than 256 such file system instances, having more than 256 distinct classes or priorities is unlikely.

o  Explicit definition of the various specific data items within XDR would limit expandability in that any extension within a subsequent minor version would require yet another attribute, leading to specification and implementation clumsiness.

o  Such explicit definitions would also make it impossible to propose standards-track extensions apart from a full minor version.

This encoding scheme can be adapted to the specification of multi-byte numeric values, even though none are currently defined.  If extensions are made via standards-track RFCs, multi-byte quantities will be encoded as a range of bytes at a range of indices, with the bytes interpreted in big-endian byte order.  Further, any such index assignments are constrained so that the relevant quantities will not cross XDR word boundaries.

The set of fls_info data is subject to expansion in a future minor version, or in a standards-track RFC, within the context of a single minor version.  The server SHOULD NOT send and the client MUST NOT use indices within the fls_info array that are not defined in standards-track RFCs.

The fls_info array contains within it:

o  Two 8-bit flag fields, one devoted to general file-system characteristics and a second reserved for transport-related capabilities.

o  Six 8-bit class values which define various file system equivalence classes as explained below.

o  Four 8-bit priority values which govern file system selection as explained below.

The general file system characteristics flag (at byte index FSLI4BX_GFLAGS) has the following bits defined within it:

o  FSLI4GF_WRITABLE indicates that this file system target is writable, allowing it to be selected by clients which may need to write on this file system.  When the current file system instance is writable, and is defined as of the same simultaneous-use class (as specified by the value at index FSLI4BX_CLSIMUL) to which the client was previously writing, then it must incorporate within its data any committed write made on the source file system instance.  See Section 11.7.8, which discusses the write-verifier class.  While there is no harm in not setting this flag for a file system that turns out to be writable, turning the flag on for a read-only file system can cause problems for clients which select a migration or replication target based on it and then find themselves unable to write.

o  FSLI4GF_CUR_REQ indicates that this replica is the one on which the request is being made.  Only a single server entry may have this flag set and, in the case of a referral, no entry will have it.

o  FSLI4GF_ABSENT indicates that this entry corresponds to an absent file system replica.  It can only be set if FSLI4GF_CUR_REQ is set.
When both such bits are set, it indicates that a file system instance is not usable but that the information in the entry can be used to determine the sorts of continuity available when switching from this replica to other possible replicas.  Since this bit can only be true if FSLI4GF_CUR_REQ is true, the value could be determined using the fs_status attribute, but the information is also made available here for the convenience of the client.  An entry with this bit, since it represents a true file system (albeit absent), does not appear in the event of a referral, but only where a file system has been accessed at this location and has subsequently been migrated.

o  FSLI4GF_GOING indicates that a replica, while still available, should not be used further.  The client, if using it, should make an orderly transfer to another file system instance as expeditiously as possible.  It is expected that file systems going out of service will be announced as FSLI4GF_GOING some time before the actual loss of service and that the valid_for value will be sufficiently small to allow clients to detect and act on scheduled events, while large enough that the cost of the requests to fetch the fs_locations_info values will not be excessive.  Values on the order of ten minutes seem reasonable.

When this flag is seen as part of a transition into a new file system, a client might choose to transfer immediately to another replica, or it may reference the current file system and only transition when a migration event occurs.  Similarly, when this flag appears on a replica in a referral, clients would likely wish to avoid being referred to this instance whenever there is another choice.

o  FSLI4GF_SPLIT indicates that when a transition occurs from the current file system instance to this one, the replacement may consist of multiple file systems.  In this case, the client has to be prepared for the possibility that objects on the same file system before migration will be on different ones after.  Note that FSLI4GF_SPLIT is not incompatible with the file systems belonging to the same _fileid_ class since, if one has a set of fileids that are unique within a file system, each subset assigned to a smaller file system after migration would not have any conflicts internal to that file system.

A client, in the case of a split file system, will interrogate existing files with which it has continuing connection (it is free to simply forget cached filehandles).  If the client remembers the directory filehandle associated with each open file, it may proceed upward using LOOKUPP to find the new file system boundaries.  Note that in the event of a referral, there will not be any such files, and so these actions will not be performed.  Instead, references to portions of the original file system split off into others will encounter an fsid change and possibly a further referral.

Once the client recognizes that one file system has been split into two, it could maintain applications running without disruption by presenting the two file systems as a single one until a convenient point to recognize the transition, such as a restart.
This would require a mapping from the server's fsids to fsids as seen by the client, but this is already necessary for other reasons.  As noted above, existing fileids within the two descendant file systems will not conflict.  Providing non-conflicting fileids for newly created files on the split file systems is the responsibility of the server (or servers working in concert).  Note that filehandles could be different for file systems that took part in the split from those newly accessed, allowing the server to determine when the need for such treatment is over.

Although it is possible for this flag to be present in the event of a referral, it would generally be of little interest to the client, since the client is not expected to have information regarding the current contents of the absent file system.

The transport-flag field (at byte index FSLI4BX_TFLAGS) contains the following bits related to the transport capabilities of the specific file system.

o  FSLI4TF_RDMA indicates that this file system provides NFSv4.1 file system access using an RDMA-capable transport.

Attribute continuity and file system identity information are expressed by defining equivalence relations on the sets of file systems presented to the client.  Each such relation is expressed as a set of file system equivalence classes.  For each relation, a file system has an 8-bit class number.  Two file systems belong to the same class if both have identical non-zero class numbers.  Zero is treated as non-matching.  Most often, the relevant question for the client will be whether a given replica is identical-to/continuous-with the current one in a given respect, but the information should be available also as to whether two other replicas match in that respect as well.

The following fields specify the file system's class numbers for the equivalence relations used in determining the nature of file system transitions.  See Section 11.7 for details about how this information is to be used.  Servers may assign these values as they wish, so long as file system instances that share the same value have the specified relationship to one another; conversely, file systems that have the specified relationship to one another share a common class value.  As each instance entry is added, the relationships of this instance to previously entered instances can be consulted, and if one is found that bears the specified relationship, that entry's class value can be copied to the new entry.  When no such previous entry exists, a new value for that byte index, not previously used, can be selected, most likely by incrementing the last class value assigned for that index.

o  The field with byte index FSLI4BX_CLSIMUL defines the simultaneous-use class for the file system.

o  The field with byte index FSLI4BX_CLHANDLE defines the handle class for the file system.

o  The field with byte index FSLI4BX_CLFILEID defines the fileid class for the file system.

o  The field with byte index FSLI4BX_CLWRITEVER defines the write-verifier class for the file system.

o  The field with byte index FSLI4BX_CLCHANGE defines the change class for the file system.
12156 o The field with byte index FSLI4BX_CLREADDIR defines the readdir 12157 class for the file system. 12159 Server-specified preference information is also provided via 8-bit 12160 values within the fls_info array. The values provide a rank and an 12161 order (see below) to be used with separate values specifiable for the 12162 cases of read-only and writable file systems. These values are 12163 compared for different file systems to establish the server-specified 12164 preference, with lower values indicating "more preferred". 12166 Rank is used to express a strict server-imposed ordering on clients, 12167 with lower values indicating "more preferred." Clients should 12168 attempt to use all replicas with a given rank before they use one 12169 with a higher rank. Only if all of those file systems are 12170 unavailable should the client proceed to those of a higher rank. 12171 Because specifying a rank will override client preferences, servers 12172 should be conservative about using this mechanism, particularly when 12173 the environment is one in client communication characteristics are 12174 not tightly controlled and visible to the server. 12176 Within a rank, the order value is used to specify the server's 12177 preference to guide the client's selection when the client's own 12178 preferences are not controlling, with lower values of order 12179 indicating "more preferred." If replicas are approximately equal in 12180 all respects, clients should defer to the order specified by the 12181 server. When clients look at server latency as part of their 12182 selection, they are free to use this criterion but it is suggested 12183 that when latency differences are not significant, the server- 12184 specified order should guide selection. 12186 o The field at byte index FSLI4BX_READRANK gives the rank value to 12187 be used for read-only access. 12189 o The field at byte index FSLI4BX_READORDER gives the order value to 12190 be used for read-only access. 12192 o The field at byte index FSLI4BX_WRITERANK gives the rank value to 12193 be used for writable access. 12195 o The field at byte index FSLI4BX_WRITEORDER gives the order value 12196 to be used for writable access. 12198 Depending on the potential need for write access by a given client, 12199 one of the pairs of rank and order values is used. The read rank and 12200 order should only be used if the client knows that only reading will 12201 ever be done or if it is prepared to switch to a different replica in 12202 the event that any write access capability is required in the future. 12204 11.10.2. The fs_locations_info4 Structure 12206 The fs_locations_info4 structure, encoding the fs_locations_info 12207 attribute, contains the following: 12209 o The fli_flags field which contains general flags that affect the 12210 interpretation of this fs_locations_info4 structure and all 12211 fs_locations_item4 structures within it. The only flag currently 12212 defined is FSLI4IF_VAR_SUB. All bits in the fli_flags field which 12213 are not defined should always be returned as zero. 12215 o The fli_fs_root field which contains the pathname of the root of 12216 the current file system on the current server, just as it does in 12217 the fs_locations4 structure. 12219 o An array called fli_items of fs_locations4_item structures, which 12220 contain information about replicas of the current file system. 12221 Where the current file system is actually present, or has been 12222 present, i.e. 
this is not a referral situation, one of the 12223 fs_locations_item4 structures will contain an fs_locations_server4 12224 for the current server. This structure will have FSLI4GF_ABSENT 12225 set if the current file system is absent, i.e. normal access to it 12226 will return NFS4ERR_MOVED. 12228 o The fli_valid_for field specifies a time in seconds for which it 12229 is reasonable for a client to use the fs_locations_info attribute 12230 without refetch. The fli_valid_for value does not provide a 12231 guarantee of validity since servers can unexpectedly go out of 12232 service or become inaccessible for any number of reasons. Clients 12233 are well-advised to refetch this information for actively accessed 12234 file system at every fli_valid_for seconds. This is particularly 12235 important when file system replicas may go out of service in a 12236 controlled way using the FSLI4GF_GOING flag to communicate an 12237 ongoing change. The server should set fli_valid_for to a value 12238 which allows well-behaved clients to notice the FSLI4GF_GOING flag 12239 and make an orderly switch before the loss of service becomes 12240 effective. If this value is zero, then no refetch interval is 12241 appropriate and the client need not refetch this data on any 12242 particular schedule. In the event of a transition to a new file 12243 system instance, a new value of the fs_locations_info attribute 12244 will be fetched at the destination and it is to be expected that 12245 this may have a different valid_for value, which the client should 12246 then use, in the same fashion as the previous value. 12248 The FSLI4IF_VAR_SUB flag within fli_flags controls whether variable 12249 substitution is to be enabled. See Section 11.10.3 for an 12250 explanation of variable substitution. 12252 11.10.3. The fs_locations_item4 Structure 12254 The fs_locations_item4 structure contains a pathname (in the field 12255 fli_rootpath) which encodes the path of the target file system 12256 replicas on the set of servers designated by the included 12257 fs_locations_server4 entries. The precise manner in which this 12258 target location is specified depends on the value of the 12259 FSLI4IF_VAR_SUB flag within the associated fs_locations_info4 12260 structure. 12262 If this flag is not set, then fli_rootpath simply designates the 12263 location of the target file system within each server's single-server 12264 namespace just as it does for the rootpath within the fs_location4 12265 structure. When this bit is set, however, component entries of a 12266 certain form are subject to client-specific variable substitution so 12267 as to allow a degree of namespace non-uniformity in order to 12268 accommodate the selection of client-specific file system targets to 12269 adapt to different client architectures or other characteristics. 12271 When such substitution is in effect a variable beginning with the 12272 string "${" and ending with the string "}" and containing a colon is 12273 to be replaced by the client-specific value associated with that 12274 variable. The string "unknown" should be used by the client when it 12275 has no value for such a variable. The pathname resulting from such 12276 substitutions is used to designate the target file system, so that 12277 different clients may have different file systems, corresponding to 12278 that location in the multi-server namespace. 12280 As mentioned above, such substituted pathname variables contain a 12281 colon. 
The part before the colon is to be a DNS domain name, with the part after being a case-insensitive alphanumeric string.

Where the domain is "ietf.org", only variable names defined in this document or subsequent standards-track RFCs are subject to such substitution.  Organizations are free to use their domain names to create their own sets of client-specific variables, to be subject to such substitution.  In cases where such variables are intended to be used more broadly than a single organization, publication of an informational RFC defining such variables is RECOMMENDED.

The variable ${ietf.org:CPU_ARCH} is used to denote the CPU architecture for which object files are compiled.  This specification does not limit the acceptable values (except that they must be valid UTF-8 strings), but such values as "x86", "x86_64", and "sparc" would be expected to be used in line with industry practice.

The variable ${ietf.org:OS_TYPE} is used to denote the operating system, and thus the kernel and library APIs, for which code might be compiled.  This specification does not limit the acceptable values (except that they must be valid UTF-8 strings), but such values as "linux" and "freebsd" would be expected to be used in line with industry practice.

The variable ${ietf.org:OS_VERSION} is used to denote the operating system version, and thus the specific details of versioned interfaces, for which code might be compiled.  This specification does not limit the acceptable values (except that they must be valid UTF-8 strings), but combinations of numbers and letters with interspersed dots would be expected to be used in line with industry practice, with the details of the version format depending on the specific value of the variable ${ietf.org:OS_TYPE} with which it is used.

Use of these variables could result in the direction of different clients to different file systems on the same server, as appropriate to particular clients.  In cases in which the target file systems are located on different servers, a single server could serve as a referral point so that each valid combination of variable values would designate a referral hosted on a single server, with the targets of those referrals on a number of different servers.

Because namespace administration is affected by the values selected to substitute for various variables, clients should provide convenient means of determining what variable substitutions a client will implement, as well as, where appropriate, providing means to control the substitutions to be used.  The exact means by which this will be done is outside the scope of this specification.

Although variable substitution is most suitable for use in the context of referrals, it may be used in the context of replication and migration.  If it is used in these contexts, the server must ensure that no matter what values the client presents for the substituted variables, the result is always a valid successor file system instance to that from which a transition is occurring, i.e. that the data is identical or represents a later image of a writable file system.
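To make the substitution rule concrete, the following is a minimal Python sketch of the client-side processing described above.  The table of client values and the specific values shown ("x86_64", "linux", "2.6.25") are illustrative assumptions; only the ${domain:name} form, the ietf.org variable names, and the use of "unknown" for variables the client cannot evaluate come from this specification.

   import re

   # Illustrative client-side values for the variables defined above.
   CLIENT_VARS = {
       "ietf.org:CPU_ARCH": "x86_64",
       "ietf.org:OS_TYPE": "linux",
       "ietf.org:OS_VERSION": "2.6.25",
   }

   # A component is substituted only when it has the exact ${domain:name}
   # form: "${", a domain, a colon, a variable name, "}".
   VAR_RE = re.compile(r"^\$\{([^:}]+):([^:}]+)\}$")

   def substitute_component(component):
       m = VAR_RE.match(component)
       if m is None:
           return component
       # The domain is a DNS domain name and the variable name is
       # case-insensitive, so both are normalized before lookup.
       key = m.group(1).lower() + ":" + m.group(2).upper()
       return CLIENT_VARS.get(key, "unknown")

   def target_rootpath(fli_rootpath):
       # fli_rootpath is a pathname4, represented here as a list of
       # component strings.
       return [substitute_component(c) for c in fli_rootpath]

   # For this client, ["exports", "${ietf.org:OS_TYPE}",
   # "${ietf.org:CPU_ARCH}"] becomes ["exports", "linux", "x86_64"].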
Note that when fli_rootpath is a null pathname (that is, one with zero components), the file system designated is at the root of the specified server, whether the FSLI4IF_VAR_SUB flag within the associated fs_locations_info4 structure is set or not.

11.11.  The Attribute fs_status

In an environment in which multiple copies of the same basic set of data are available, information regarding the particular source of such data and the relationships among different copies can be very helpful in providing consistent data to applications.

   enum fs4_status_type {
           STATUS4_FIXED = 1,
           STATUS4_UPDATED = 2,
           STATUS4_VERSIONED = 3,
           STATUS4_WRITABLE = 4,
           STATUS4_REFERRAL = 5
   };

   struct fs4_status {
           bool            fss_absent;
           fs4_status_type fss_type;
           utf8str_cs      fss_source;
           utf8str_cs      fss_current;
           int32_t         fss_age;
           nfstime4        fss_version;
   };

The boolean fss_absent indicates whether the file system is currently absent.  This value will be set if the file system was previously present and becomes absent, or if the file system has never been present and the type is STATUS4_REFERRAL.  When this boolean is set and the type is not STATUS4_REFERRAL, the remaining information in the fs4_status reflects that which was last valid when the file system was present.

The fss_type field indicates the kind of file system image represented.  This is of particular importance when using the version values to determine appropriate succession of file system images.  When fss_absent is set and the file system was previously present, the value of fss_type reflected is that which was in effect when the file system was last present.  Five values are distinguished:

o  STATUS4_FIXED, which indicates a read-only image in the sense that it will never change.  The possibility is allowed that, as a result of migration or switch to a different image, changed data can be accessed, but within the confines of this instance, no change is allowed.  The client can use this fact to cache aggressively.

o  STATUS4_VERSIONED, which indicates that the image, like the STATUS4_UPDATED case, is updated externally, but it provides a guarantee that the server will carefully update an associated version value so that the client can protect itself from a situation in which it reads data from one version of the file system and then later reads data from an earlier version of the same file system.  See below for a discussion of how this can be done.

o  STATUS4_UPDATED, which indicates an image that cannot be updated by the user writing to it but may be changed externally, typically because it is a periodically updated copy of another writable file system somewhere else.  In this case, version information is not provided, and the client does not have the responsibility of making sure that this version only advances upon a file system instance transition.  In this case, it is the responsibility of the server to make sure that the data presented after a file system instance transition is a proper successor image and includes all changes seen by the client and any change made before all such changes.

o  STATUS4_WRITABLE, which indicates that the file system is an actual writable one.
The client need not, of course, actually write to the file system, but once it does, it should not accept a transition to anything other than a writable instance of that same file system.

o  STATUS4_REFERRAL, which indicates that the file system in question is absent and has never been present on this server.

Note that in the STATUS4_UPDATED and STATUS4_VERSIONED cases, the server is responsible for the appropriate handling of locks that are inconsistent with external changes to delegations.  If a server gives out delegations, they SHOULD be recalled before an inconsistent change is made to data, and MUST be revoked if this is not possible.  Similarly, if an open is inconsistent with data that is changed (the open denies WRITE and the data is changed), that lock SHOULD be considered administratively revoked.

The opaque strings fss_source and fss_current provide a way of presenting information about the source of the file system image being present.  It is not intended that the client do anything with this information other than make it available to administrative tools.  It is intended that this information be helpful when researching possible problems with a file system image that might arise when it is unclear if the correct image is being accessed and, if not, how that image came to be made.  This kind of diagnostic information will be helpful, if, as seems likely, copies of file systems are made in many different ways (e.g. simple user-level copies, file-system-level point-in-time copies, clones of the underlying storage), under a variety of administrative arrangements.  In such environments, determining how a given set of data was constructed can be very helpful in resolving problems.

The opaque string fss_source is used to indicate the source of a given file system, with the expectation that tools capable of creating a file system image propagate this information, when that is possible.  It is understood that this may not always be possible, since a user-level copy may be thought of as creating a new data set and the tools used may have no mechanism to propagate this data.  When a file system is initially created, it is desirable to associate with it data regarding how the file system was created, where it was created, by whom, etc.  Making this information available in this attribute in a human-readable string form will be helpful for applications and system administrators and also serves to make it available when the original file system is used to make subsequent copies.

The opaque string fss_current should provide whatever information is available about the source of the current copy, such as the tool that created it, any relevant parameters to that tool, the time at which the copy was done, the user making the change, the server on which the change was made, etc.  All information should be in a human-readable string form.

The field fss_age provides an indication of how out-of-date the file system currently is with respect to its ultimate data source (in case of cascading data updates).
This complements the fls_currency field 12466 of fs_locations_server4 (see Section 11.10) in the following way: the 12467 information in fls_currency gives a bound for how out of date the 12468 data in a file system might typically get, while the value in fss_age 12469 gives a bound on how out of date that data actually is. Negative 12470 values imply that no information is available. A zero means that 12471 this data is known to be current. A positive value means that this 12472 data is known to be no older than that number of seconds with respect 12473 to the ultimate data source. Using this value, the client may be 12474 able to decide that a data copy is too old, so that it may search for 12475 a newer version to use. 12477 The fss_version field provides a version identification, in the form 12478 of a time value, such that successive versions always have later time 12479 values. When the fs_type is anything other than STATUS4_VERSIONED, 12480 the server may provide such a value but there is no guarantee as to 12481 its validity and clients will not use it except to provide additional 12482 information to add to fss_source and fss_current. 12484 When fss_type is STATUS4_VERSIONED, servers SHOULD provide a value of 12485 version which progresses monotonically whenever any new version of 12486 the data is established. This allows the client, if reliable image 12487 progression is important to it, to fetch this attribute as part of 12488 each COMPOUND where data or metadata from the file system is used. 12490 When it is important to the client to make sure that only valid 12491 successor images are accepted, it must make sure that it does not 12492 read data or metadata from the file system without updating its sense 12493 of the current state of the image, to avoid the possibility that the 12494 fs_status which the client holds will be one for an earlier image, 12495 and so accept a new file system instance which is later than that but 12496 still earlier than updated data read by the client. 12498 In order to do this reliably, it must do a GETATTR of the fs_status 12499 attribute that follows any interrogation of data or metadata within 12500 the file system in question. Often this is most conveniently done by 12501 appending such a GETATTR after all other operations that reference a 12502 given file system. When errors occur between reading file system 12503 data and performing such a GETATTR, care must be exercised to make 12504 sure that the data in question is not used before obtaining the 12505 proper fs_status value. In this connection, when an OPEN is done 12506 within such a versioned file system and the associated GETATTR of 12507 fs_status is not successfully completed, the open file in question 12508 must not be accessed until that fs_status is fetched. 12510 The procedure above will ensure that before using any data from the 12511 file system the client has in hand a newly-fetched current version of 12512 the file system image. Multiple values for multiple requests in 12513 flight can be resolved by assembling them into the required partial 12514 order (and the elements should form a total order within it) and 12515 using the last. The client may then, when switching among file 12516 system instances, decline to use an instance which does not have an 12517 fss_type of STATUS4_VERSIONED or whose fss_version field is earlier 12518 than the last one obtained from the predecessor file system instance. 12520 12. Parallel NFS (pNFS) 12522 12.1. 
Introduction

pNFS is an OPTIONAL feature within NFSv4.1; the pNFS feature set allows direct client access to the storage devices containing file data.  When file data for a single NFSv4 server is stored on multiple and/or higher-throughput storage devices (by comparison to the server's throughput capability), the result can be significantly better file access performance.  The relationship among multiple clients, a single server, and multiple storage devices for pNFS (server and clients have access to all storage devices) is shown in this diagram:

      +-----------+
      |+-----------+                                 +-----------+
      ||+-----------+                                |           |
      |||           |         NFSv4.1 + pNFS         |           |
      +||  Clients  |<------------------------------>|  Server   |
       +|           |                                |           |
        +-----------+                                |           |
             |||                                     +-----------+
             |||                                           |
             |||                                           |
             ||| Storage        +-----------+              |
             ||| Protocol       |+-----------+             |
             ||+----------------||+-----------+  Control   |
             |+-----------------|||           |  Protocol  |
             +------------------+||  Storage  |------------+
                                  +|  Devices  |
                                   +-----------+

                            Figure 68

In this model, the clients, server, and storage devices are responsible for managing file access.  This is in contrast to NFSv4 without pNFS, where it is primarily the server's responsibility; some of this responsibility may be delegated to the client under strictly specified conditions.

pNFS takes the form of OPTIONAL operations that manage protocol objects called 'layouts', which contain byte-range and storage location information.  The layout is managed in a similar fashion as NFSv4.1 data delegations.  For example, the layout is leased, recallable, and revocable.  However, layouts are distinct abstractions and are manipulated with new operations.  When a client holds a layout, it is granted the ability to directly access the byte-range at the storage location specified in the layout.

There are interactions between layouts and other NFSv4.1 abstractions such as data delegations and byte-range locking.  Delegation issues are discussed in Section 12.5.5.  Byte-range locking issues are discussed in Section 12.2.9 and Section 12.5.1.

The NFSv4.1 pNFS feature has been structured to allow for a variety of storage protocols to be defined and used.  As noted in the diagram above, the storage protocol is the method used by the client to store and retrieve data directly from the storage devices.  The NFSv4.1 protocol directly defines one storage protocol, the NFSv4.1 storage type, and its use.

Examples of other storage protocols that could be used with NFSv4.1's pNFS are:

o  Block/volume protocols such as iSCSI ([38]) and FCP ([39]).  The block/volume protocol support can be independent of the addressing structure of the block/volume protocol used, allowing more than one protocol to access the same file data and enabling extensibility to other block/volume protocols.

o  Object protocols such as OSD over iSCSI or Fibre Channel [40].

o  Other storage protocols, including PVFS and other file systems that are in use in HPC environments.

It is possible that various storage protocols are available to both client and server and it may be possible that a client and server do not have a matching storage protocol available to them.
Because of 12598 this, the pNFS server MUST support normal NFSv4.1 access to any file 12599 accessible by the pNFS feature; this will allow for continued 12600 interoperability between an NFSv4.1 client and server. 12602 12.2. pNFS Definitions 12604 NFSv4.1's pNFS feature provides parallel data access to a file system 12605 that stripes its content across multiple storage servers. The first 12606 instantiation of pNFS, as part of NFSv4.1, separates the file system 12607 protocol processing into two parts: metadata processing and data 12608 processing. Data consist of the contents of regular files which are 12609 striped across storage servers. Data striping occurs in at least two 12610 ways: on a file-by-file basis, and within sufficiently large files, 12611 on a block-by-block basis. In contrast, striped access to metadata 12612 by pNFS clients is not provided in NFSv4.1, even though the file 12613 system back end of a pNFS server might stripe metadata. Metadata 12614 consist of everything else, including the contents of non-regular 12615 files (e.g. directories); see Section 12.2.1. The metadata 12616 functionality is implemented by an NFSv4.1 server that supports pNFS 12617 and the operations described in (Section 18); such a server is called 12618 a metadata server (Section 12.2.2). 12620 The data functionality is implemented by one or more storage devices, 12621 each of which are accessed by the client via a storage protocol. A 12622 subset (defined in Section 13.6) of NFSv4.1 is one such storage 12623 protocol. New terms are introduced to the NFSv4.1 nomenclature and 12624 existing terms are clarified to allow for the description of the pNFS 12625 feature. 12627 12.2.1. Metadata 12629 Information about a file system object, such as its name, location 12630 within the namespace, owner, ACL and other attributes. Metadata may 12631 also include storage location information and this will vary based on 12632 the underlying storage mechanism that is used. 12634 12.2.2. Metadata Server 12636 An NFSv4.1 server which supports the pNFS feature. A variety of 12637 architectural choices exists for the metadata server and its use of 12638 file system information held at the server. Some servers may contain 12639 metadata only for file objects residing at the metadata server while 12640 the file data resides on associated storage devices. Other metadata 12641 servers may hold both metadata and a varying degree of file data. 12643 12.2.3. pNFS Client 12645 An NFSv4.1 client that supports pNFS operations and supports at least 12646 one storage protocol for performing I/O to storage devices. 12648 12.2.4. Storage Device 12650 A storage device stores a regular file's data, but leaves metadata 12651 management to the metadata server. A storage device could be another 12652 NFSv4.1 server, an object storage device (OSD), a block device 12653 accessed over a SAN (e.g., either FiberChannel or iSCSI SAN), or some 12654 other entity. 12656 12.2.5. Storage Protocol 12658 A storage protocol is the protocol used between the pNFS client and 12659 the storage device to access the file data. 12661 12.2.6. Control Protocol 12663 The control protocol is used by the exported file system between the 12664 metadata server and storage devices. Specification of such protocols 12665 is outside the scope of the NFSv4.1 protocol. 
Such control protocols 12666 would be used to control activities such as the allocation and 12667 deallocation of storage and the management of state required by the 12668 storage devices to perform client access control. 12670 A particular control protocol is not REQUIRED by NFSv4.1 but 12671 requirements are placed on the control protocol for maintaining 12672 attributes like modify time, the change attribute, and the end-of- 12673 file (EOF) position. 12675 12.2.7. Layout Types 12677 A layout describes the mapping of a file's data to the storage 12678 devices that hold the data. A layout is said to belong to a specific 12679 layout type (data type layouttype4, see Section 3.3.13). The layout 12680 type allows for variants to handle different storage protocols, such 12681 as those associated with block/volume [31], object [30], and file 12682 (Section 13) layout types. A metadata server, along with its control 12683 protocol, MUST support at least one layout type. A private sub-range 12684 of the layout type name space is also defined. Values from the 12685 private layout type range MAY be used for internal testing or 12686 experimentation. 12688 As an example, the organization of the file layout type could be an 12689 array of tuples (e.g., device ID, filehandle), along with a 12690 definition of how the data is stored across the devices (e.g., 12691 striping). A block/volume layout might be an array of tuples that 12692 store <device ID, block number, block count> along with information 12693 about block size and the associated file offset of the block number. 12694 An object layout might be an array of tuples <device ID, object ID> 12695 and an additional structure (i.e., the aggregation map) that defines 12696 how the logical byte sequence of the file data is serialized into the 12697 different objects. Note that the actual layouts are typically more 12698 complex than these simple expository examples. 12700 Requests for pNFS-related operations will often specify a layout 12701 type. Examples of such operations are GETDEVICEINFO and LAYOUTGET. 12703 The response for these operations will include structures such as a 12704 device_addr4 or a layout4, each of which includes a layout type 12705 within it. The layout type sent by the server MUST always be the 12706 same one requested by the client. When a server sends a response 12707 that includes a different layout type, the client SHOULD ignore the 12708 response and behave as if the server had returned an error response. 12710 12.2.8. Layout 12712 A layout defines how a file's data is organized on one or more 12713 storage devices. There are many potential layout types; each of the 12714 layout types is differentiated by the storage protocol used to 12715 access data and by the aggregation scheme that lays out the file data 12716 on the underlying storage devices. A layout is precisely identified 12717 by the following tuple: <client ID, filehandle, layout type, iomode, 12718 range>; where filehandle refers to the filehandle of the file on the 12719 metadata server. 12721 It is important to define when layouts overlap and/or conflict with 12722 each other. For two layouts with overlapping byte ranges to actually 12723 overlap each other, both layouts must be of the same layout type, 12724 correspond to the same filehandle, and have the same iomode. Layouts 12725 conflict when they overlap and differ in the content of the layout 12726 (i.e., the storage device/file mapping parameters differ). Note that 12727 differing iomodes do not lead to conflicting layouts.
It is 12728 permissible for layouts with different iomodes, pertaining to the 12729 same byte range, to be held by the same client. An example of this 12730 would be copy-on-write functionality for a block/volume layout type. 12732 12.2.9. Layout Iomode 12734 The layout iomode (data type layoutiomode4, see Section 3.3.20) 12735 indicates to the metadata server the client's intent to perform 12736 either just read operations or a mixture of I/O possibly containing 12737 read and write operations. For certain layout types, it is useful 12738 for a client to specify this intent at the time it sends LAYOUTGET 12739 (Section 18.43). For example, for block/volume-based protocols, block 12740 allocation could occur when a READ/WRITE iomode is specified. A 12741 special LAYOUTIOMODE4_ANY iomode is defined and can only be used for 12742 LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTGET. It specifies 12743 that layouts pertaining to both READ and READ/WRITE iomodes are being 12744 returned or recalled, respectively. 12746 A storage device may validate I/O with regard to the iomode; this is 12747 dependent upon storage device implementation and layout type. Thus, 12748 if the client's layout iomode is inconsistent with the I/O being 12749 performed, the storage device may reject the client's I/O with an 12750 error indicating that a new layout with the correct iomode should be 12751 obtained via LAYOUTGET. For example, if a client gets a layout with 12752 a READ iomode and performs a WRITE to a storage device, the storage 12753 device is allowed to reject that WRITE. 12755 The use of the layout iomode does not conflict with OPEN share modes 12756 or byte-range lock requests; open mode and lock conflicts are 12757 enforced as they are without the use of pNFS, and are logically 12758 separate from the pNFS layout level. Open modes and locks are the 12759 preferred method for restricting user access to data files. For 12760 example, an OPEN of read, deny-write does not conflict with a 12761 LAYOUTGET containing an iomode of READ/WRITE performed by another 12762 client. Applications that depend on writing into the same file 12763 concurrently may use byte-range locking to serialize their accesses. 12765 12.2.10. Device IDs 12767 The device ID (data type deviceid4, see Section 3.3.14) identifies a 12768 group of storage devices. The scope of a device ID is the pair 12769 <client ID, layout type>. In practice, a significant amount of 12770 information may be required to fully address a storage device. 12771 Rather than embedding all such information in a layout, layouts embed 12772 device IDs. The NFSv4.1 operation GETDEVICEINFO (Section 18.40) is 12773 used to retrieve the complete address information (including all 12774 device addresses for the device ID) regarding the storage device 12775 according to its layout type and device ID. For example, the address 12776 of an NFSv4.1 data server or of an object storage device could be an 12777 IP address and port. The address of a block storage device could be 12778 a volume label. 12780 Clients cannot expect the mapping between a device ID and its storage 12781 device address(es) to persist across metadata server restart. See 12782 Section 12.7.4 for a description of how recovery works in that 12783 situation. 12785 A device ID lives as long as there is a layout referring to the 12786 device ID. If there are no layouts referring to the device ID, the 12787 server is free to delete the device ID at any time.
Once a device ID is 12788 deleted by the server, the server MUST NOT reuse the device ID for 12789 the same layout type and client ID again. This requirement is 12790 feasible because the device ID is 16 bytes long, leaving sufficient 12791 room to store a generation number if server's implementation requires 12792 most of the rest of the device ID's content to be reused. This 12793 requirement is necessary because otherwise the race conditions 12794 between asynchronous notification of device ID addition and deletion 12795 would be too difficult to sort out. 12797 Device ID to device address mappings are not leased, and can be 12798 changed at any time. (Note that while device ID to device address 12799 mappings are likely to change after the metadata server restarts, the 12800 server is not required to change the mappings.) A server has two 12801 choices for changing mappings. It can recall all layouts referring 12802 to the device ID or it can use a notification mechanism. 12804 The NFSv4.1 protocol has no optimal way to recall all layouts that 12805 referred to a particular device ID (unless the server associates a 12806 single device ID with a single fsid or a single client ID; in which 12807 case, CB_LAYOUTRECALL has options for recalling all layouts 12808 associated with the fsid, client ID pair or just the client ID). 12810 Via a notification mechanism (see Section 20.12), device ID to device 12811 address mappings can change over the duration of server operation 12812 without recalling or revoking the layouts that refer to device ID. 12813 The notification mechanism can also delete a device ID, but only if 12814 the client has no layouts referring to the device ID. A notification 12815 of a change to a device ID to device address mapping will immediately 12816 or eventually invalidate some or all of the device ID's mappings. 12817 The server MUST support notifications and the client must request 12818 them before they can be used. For further information about the 12819 notification types Section 20.12. 12821 12.3. pNFS Operations 12823 NFSv4.1 has several operations that are needed for pNFS servers, 12824 regardless of layout type or storage protocol. These operations are 12825 all sent to a metadata server and summarized here. While pNFS is an 12826 OPTIONAL feature, if pNFS is implemented, some operations are 12827 REQUIRED in order to comply with pNFS. See Section 17. 12829 These are the fore channel pNFS operations: 12831 GETDEVICEINFO. As noted previously (Section 12.2.10), GETDEVICEINFO 12832 (Section 18.40) returns the mapping of device ID to storage device 12833 address. 12835 GETDEVICELIST (Section 18.41), allows clients to fetch all device 12836 IDs for a specific file system. 12838 LAYOUTGET (Section 18.43) is used by a client to get a layout for a 12839 file. 12841 LAYOUTCOMMIT (Section 18.42) is used to inform the metadata server 12842 of the client's intent to commit data which has been written to 12843 the storage device; the storage device as originally indicated in 12844 the return value of LAYOUTGET. 12846 LAYOUTRETURN (Section 18.44) is used to return layouts for a file, 12847 an FSID and for client ID. 12849 These are the backchannel pNFS operations: 12851 CB_LAYOUTRECALL (Section 20.3) recalls a layout or all layouts 12852 belonging to a file system, or all layouts belonging to a client 12853 ID. 12855 CB_RECALL_ANY (Section 20.6), tells a client that it needs to return 12856 some number of recallable objects, including layouts, to the 12857 metadata server. 
12859 CB_RECALLABLE_OBJ_AVAIL (Section 20.7) tells a client that a 12860 recallable object that it was denied (in case of pNFS, a layout, 12861 denied by LAYOUTGET) due to resource exhaustion, is now available. 12863 CB_NOTIFY_DEVICEID (Section 20.12) Notifies the client of changes to 12864 device IDs. 12866 12.4. pNFS Attributes 12868 A number of attributes specific to pNFS are listed and described in 12869 Section 5.12 12871 12.5. Layout Semantics 12873 12.5.1. Guarantees Provided by Layouts 12875 Layouts grant to the client the ability to access data located at a 12876 storage device with the appropriate storage protocol. The client is 12877 guaranteed the layout will be recalled when one of two things occur; 12878 either a conflicting layout is requested or the state encapsulated by 12879 the layout becomes invalid and this can happen when an event directly 12880 or indirectly modifies the layout. When a layout is recalled and 12881 returned by the client, the client continues with the ability to 12882 access file data with normal NFSv4.1 operations through the metadata 12883 server. Only the ability to access the storage devices is affected. 12885 The requirement of NFSv4.1, that all user access rights MUST be 12886 obtained through the appropriate open, lock, and access operations, 12887 is not modified with the existence of layouts. Layouts are provided 12888 to NFSv4.1 clients and user access still follows the rules of the 12889 protocol as if they did not exist. It is a requirement that for a 12890 client to access a storage device, a layout must be held by the 12891 client. If a storage device receives an I/O for a byte range for 12892 which the client does not hold a layout, the storage device SHOULD 12893 reject that I/O request. Note that the act of modifying a file for 12894 which a layout is held, does not necessarily conflict with the 12895 holding of the layout that describes the file being modified. 12896 Therefore, it is the requirement of the storage protocol or layout 12897 type that determines the necessary behavior. For example, block/ 12898 volume layout types require that the layout's iomode agree with the 12899 type of I/O being performed. 12901 Depending upon the layout type and storage protocol in use, storage 12902 device access permissions may be granted by LAYOUTGET and may be 12903 encoded within the type-specific layout. For an example of storage 12904 device access permissions see an object based protocol such as [40]. 12905 If access permissions are encoded within the layout, the metadata 12906 server SHOULD recall the layout when those permissions become invalid 12907 for any reason; for example when a file becomes unwritable or 12908 inaccessible to a client. Note, clients are still required to 12909 perform the appropriate access operations with open, lock and access 12910 as described above. The degree to which it is possible for the 12911 client to circumvent these access operations and the consequences of 12912 doing so must be clearly specified by the individual layout type 12913 specifications. In addition, these specifications must be clear 12914 about the requirements and non-requirements for the checking 12915 performed by the server. 12917 In the presence of pNFS functionality, mandatory file locks MUST 12918 behave as they would without pNFS. Therefore, if mandatory file 12919 locks and layouts are provided simultaneously, the storage device 12920 MUST be able to enforce the mandatory file locks. 
For example, if 12921 one client obtains a mandatory lock and a second client accesses the 12922 storage device, the storage device MUST appropriately restrict I/O 12923 for the byte range of the mandatory file lock. If the storage device 12924 is incapable of providing this check in the presence of mandatory 12925 file locks, the metadata server then MUST NOT grant layouts and 12926 mandatory file locks simultaneously. 12928 12.5.2. Getting a Layout 12930 A client obtains a layout with the LAYOUTGET operation. The metadata 12931 server will grant layouts of a particular type (e.g., block/volume, 12932 object, or file). The client selects an appropriate layout type that 12933 the server supports and the client is prepared to use. The layout 12934 returned to the client might not exactly match the requested byte 12935 range as described in Section 18.43.3. As needed a client may make 12936 multiple LAYOUTGET requests; these might result in multiple 12937 overlapping, non-conflicting layouts (see Section 12.2.8). 12939 In order to get a layout, the client must first have opened the file 12940 via the OPEN operation. When a client has no layout on a file, it 12941 MUST present a stateid as returned by OPEN, a delegation stateid, or 12942 a byte-range lock stateid in the loga_stateid argument. A successful 12943 LAYOUTGET result includes a layout stateid. The first successful 12944 LAYOUTGET processed by the server using a non-layout stateid as an 12945 argument MUST have the "seqid" field of the layout stateid in the 12946 response set to one. Thereafter, the client uses a layout stateid 12947 (see Section 12.5.3) on future invocations of LAYOUTGET on the file, 12948 and the "seqid" MUST NOT be set to zero. Once the layout has been 12949 retrieved, it can be held across multiple OPEN and CLOSE sequences. 12950 Therefore, a client may hold a layout for a file that is not 12951 currently open by any user on the client. This allows for the 12952 caching of layouts beyond CLOSE. 12954 The storage protocol used by the client to access the data on the 12955 storage device is determined by the layout's type. The client is 12956 responsible for matching the layout type with an available method to 12957 interpret and use the layout. The method for this layout type 12958 selection is outside the scope of the pNFS functionality. 12960 Although the metadata server is in control of the layout for a file, 12961 the pNFS client can provide hints to the server when a file is opened 12962 or created about the preferred layout type and aggregation schemes. 12963 pNFS introduces a layout_hint (Section 5.12.4) attribute that the 12964 client can set at file creation time to provide a hint to the server 12965 for new files. Setting this attribute separately, after the file has 12966 been created might make it difficult, or impossible, for the server 12967 implementation to comply. 12969 Because the EXCLUSIVE4 createmode4 does not allow the setting of 12970 attributes at file creation time, NFSv4.1 introduces the EXCLUSIVE4_1 12971 createmode4, which does allow attributes to be set at file creation 12972 time. In addition, if the session is created with persistent reply 12973 caches, EXCLUSIVE4_1 is neither necessary nor allowed. Instead, 12974 GUARDED4 both works better and is prescribed. Table 18 in 12975 Section 18.16.3, summarizes how a client is allowed to send an 12976 exclusive create. 12978 12.5.3. 
Layout Stateid 12980 As with all other stateids, the layout stateid consists of a "seqid" 12981 and "other" field. Once a layout stateid is established, the "other" 12982 field will stay constant unless the stateid is revoked, or the client 12983 returns all layouts on the file and the server disposes of the 12984 stateid. The "seqid" field is initially set to one, and is never 12985 zero on any NFSv4.1 operation that uses layout stateids, whether it 12986 is a fore channel or backchannel operation. After the layout stateid 12987 is established, the server increments by one the value of the "seqid" 12988 in each subsequent LAYOUTGET and LAYOUTRETURN response, and in each 12989 CB_LAYOUTRECALL request. 12991 Given the design goal of pNFS to provide parallelism, the layout 12992 stateid differs from other stateid types in that the client is 12993 expected to send LAYOUTGET and LAYOUTRETURN operations in parallel. 12994 The "seqid" value is used by the client to properly sort responses to 12995 LAYOUTGET and LAYOUTRETURN. The "seqid" is also used to prevent race 12996 conditions between LAYOUTGET and CB_LAYOUTRECALL. Given that the 12997 processing rules for layout stateids differ from those for other 12998 stateid types, only the pNFS sections of this document should be 12999 considered to determine proper layout stateid handling. 13001 Once the client receives a layout stateid, it MUST use the correct 13002 "seqid" for subsequent LAYOUTGET or LAYOUTRETURN operations. The 13003 correct "seqid" is defined as the highest "seqid" value from 13004 responses of fully processed LAYOUTGET or LAYOUTRETURN operations or 13005 arguments of a fully processed CB_LAYOUTRECALL operation. Since the 13006 server is incrementing the "seqid" value on each layout operation, 13007 the client may determine the order of operation processing by 13008 inspecting the "seqid" value. In the case of overlapping layout 13009 ranges, the ordering information will provide the client with 13010 knowledge of which layout ranges are held. Note that overlapping 13011 layout ranges may occur because of the client's specific requests or 13012 because the server is allowed to expand the range of a requested 13013 layout and notify the client in the LAYOUTRETURN results. Additional 13014 layout stateid sequencing requirements are provided in 13015 Section 12.5.5.2. 13017 The client's receipt of a "seqid" is not sufficient for subsequent 13018 use. The client must fully process the operations before the "seqid" 13019 can be used. For LAYOUTGET results, if the client is not using the 13020 forgetful model (Section 12.5.5.1), it MUST first update its record 13021 of what ranges of the file's layout it has before using the seqid. 13022 For LAYOUTRETURN results, the client MUST delete the range from its 13023 record of what ranges of the file's layout it had before using the 13024 seqid. For CB_LAYOUTRECALL arguments, the client MUST send a 13025 response to the recall before using the seqid. The fundamental 13026 requirement in client processing is that the "seqid" is used to 13027 provide the order of processing. LAYOUTGET results may be processed 13028 in parallel. LAYOUTRETURN results may be processed in parallel. 13029 LAYOUTGET and LAYOUTRETURN responses may be processed in parallel as 13030 long as the ranges do not overlap. CB_LAYOUTRECALL requests 13031 MUST be processed in "seqid" order at all times. 13033 Once a client has no more layouts on a file, the layout stateid is no 13034 longer valid, and MUST NOT be used.
Any attempt to use such a layout 13035 stateid will result in NFS4ERR_BAD_STATEID. 13037 12.5.4. Committing a Layout 13039 Allowing for varying storage protocol capabilities, the pNFS 13040 protocol does not require the metadata server and storage devices to 13041 have a consistent view of file attributes and data location mappings. 13042 Data location mapping refers to aspects such as which offsets store 13043 data as opposed to storing holes (see Section 13.4.4 for a 13044 discussion). Related issues arise for storage protocols where a 13045 layout may hold provisionally allocated blocks where the allocation 13046 of those blocks does not survive a complete restart of both the 13047 client and server. Because of this inconsistency, it is necessary to 13048 re-synchronize the client with the metadata server and its storage 13049 devices and make any potential changes available to other clients. 13050 This is accomplished by use of the LAYOUTCOMMIT operation. 13052 The LAYOUTCOMMIT operation is responsible for committing a modified 13053 layout to the metadata server. The data should be written and 13054 committed to the appropriate storage devices before the LAYOUTCOMMIT 13055 occurs. The scope of the LAYOUTCOMMIT operation depends on the 13056 storage protocol in use. It is important to note that the level of 13057 synchronization is from the point of view of the client which sent 13058 the LAYOUTCOMMIT. The updated state on the metadata server need only 13059 reflect the state as of the client's last operation previous to the 13060 LAYOUTCOMMIT. It is not REQUIRED to maintain a global view that 13061 accounts for other clients' I/O that may have occurred within the 13062 same time frame. 13064 For block/volume-based layouts, LAYOUTCOMMIT may require updating the 13065 block list that comprises the file and committing this layout to 13066 stable storage. For file-based layouts, synchronization of attributes 13067 between the metadata and storage devices, primarily the size attribute, 13068 is required. 13070 The control protocol is free to synchronize the attributes before it 13071 receives a LAYOUTCOMMIT; however, upon successful completion of a 13072 LAYOUTCOMMIT, state that exists on the metadata server that describes 13073 the file MUST be in sync with the state existing on the storage 13074 devices that comprise that file as of the issuing client's last 13075 operation. Thus, a client that queries the size of a file between a 13076 WRITE to a storage device and the LAYOUTCOMMIT may observe a size 13077 that does not reflect the actual data written. 13079 The client MUST have a layout in order to issue LAYOUTCOMMIT. 13081 12.5.4.1. LAYOUTCOMMIT and change/time_modify 13083 The change and time_modify attributes may be updated by the server 13084 when the LAYOUTCOMMIT operation is processed. The reason for this is 13085 that some layout types do not support the update of these attributes 13086 when the storage devices process I/O operations. If the client has a 13087 layout with the LAYOUTIOMODE4_RW iomode on the file, the client MAY 13088 provide a suggested value to the server for time_modify within the 13089 arguments to LAYOUTCOMMIT. Based on the layout type, the provided 13090 value may or may not be used. The server should sanity check the 13091 client-provided values before they are used. For example, the server 13092 should ensure that time does not flow backwards. The client always 13093 has the option to set time_modify through an explicit SETATTR 13094 operation.
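The sanity check described above can be sketched in a few lines of
code.  The following is a minimal, illustrative example and not part
of the protocol; the local nfstime4 stand-in, the helper names, and
the "no later than the server's clock" policy are assumptions made
for the sketch, and an actual server is free to apply different
checks.

   /*
    * Minimal sketch (not normative): server-side sanity check of the
    * time_modify value suggested by a client in the LAYOUTCOMMIT
    * arguments.  Rejects suggestions that would make time flow
    * backwards or that lie ahead of the server's own clock.
    */
   #include <stdbool.h>
   #include <stdint.h>

   typedef struct { int64_t seconds; uint32_t nseconds; } nfstime4;

   /* Returns true if "a" is strictly earlier than "b". */
   static bool
   nfstime4_before(const nfstime4 *a, const nfstime4 *b)
   {
       if (a->seconds != b->seconds)
           return a->seconds < b->seconds;
       return a->nseconds < b->nseconds;
   }

   /* Choose the time_modify value to record at LAYOUTCOMMIT. */
   static nfstime4
   layoutcommit_time_modify(const nfstime4 *current_time_modify,
                            const nfstime4 *suggested,
                            const nfstime4 *server_now)
   {
       if (suggested == NULL)
           return *server_now;          /* no suggestion: use server time */
       if (nfstime4_before(suggested, current_time_modify))
           return *current_time_modify; /* would move time backwards */
       if (nfstime4_before(server_now, suggested))
           return *server_now;          /* ahead of the server's clock */
       return *suggested;               /* acceptable client suggestion */
   }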
13096 For some layout protocols, the storage device is able to notify the 13097 metadata server of the occurrence of an I/O and as a result the 13098 change and time_modify attributes may be updated at the metadata 13099 server. For a metadata server that is capable of monitoring updates 13100 to the change and time_modify attributes, LAYOUTCOMMIT processing is 13101 not required to update the change attribute; in this case the 13102 metadata server must ensure that no further update to the data has 13103 occurred since the last update of the attributes; file-based 13104 protocols may have enough information to make this determination or 13105 may update the change attribute upon each file modification. This 13106 also applies for the time_modify attribute. If the server 13107 implementation is able to determine that the file has not been 13108 modified since the last time_modify update, the server need not 13109 update time_modify at LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the 13110 updated attributes should be visible if that file was modified since 13111 the latest previous LAYOUTCOMMIT or LAYOUTGET. 13113 12.5.4.2. LAYOUTCOMMIT and size 13115 The size of a file may be updated when the LAYOUTCOMMIT operation is 13116 used by the client. One of the fields in the argument to 13117 LAYOUTCOMMIT is loca_last_write_offset; this field indicates the 13118 highest byte offset written but not yet committed with the 13119 LAYOUTCOMMIT operation. The data type of loca_last_write_offset is 13120 newoffset4 and is switched on a boolean value, no_newoffset, that 13121 indicates if a previous write occurred or not. If no_newoffset is 13122 FALSE, an offset is not given. If the client has a layout with 13123 LAYOUTIOMODE4_RW iomode on the file, with an lo_offset and lo_length 13124 that overlaps loca_last_write_offset, then the client MAY set 13125 no_newoffset to TRUE and provide an offset that will update the file 13126 size. Keep in mind that offset is not the same as length, though 13127 they are related. For example, a loca_last_write_offset value of 13128 zero means that one byte was written at offset zero, and so the 13129 length of the file is at least one byte. 13131 The metadata server may do one of the following: 13133 1. Update the file's size using the last write offset provided by 13134 the client as either the true file size or as a hint of the file 13135 size. If the metadata server has a method available, any new 13136 value for file size should be sanity checked. For example, the 13137 file must not be truncated if the client presents a last write 13138 offset less than the file's current size. 13140 2. Ignore the client provided last write offset; the metadata server 13141 must have sufficient knowledge from other sources to determine 13142 the file's size. For example, the metadata server queries the 13143 storage devices with the control protocol. 13145 The method chosen to update the file's size will depend on the 13146 storage device's and/or the control protocol's capabilities. For 13147 example, if the storage devices are block devices with no knowledge 13148 of file size, the metadata server must rely on the client to set the 13149 last write offset appropriately. 13151 The results of LAYOUTCOMMIT contain a new size value in the form of a 13152 newsize4 union data type. If the file's size is set as a result of 13153 LAYOUTCOMMIT, the metadata server must reply with the new size; 13154 otherwise the new size is not provided. 
If the file size is updated, 13155 the metadata server SHOULD update the storage devices such that the 13156 new file size is reflected when LAYOUTCOMMIT processing is complete. 13157 For example, the client should be able to READ up to the new file 13158 size. 13160 The client can extend the length of a file or truncate a file by 13161 sending a SETATTR operation to the metadata server with the size 13162 attribute specified. If the size specified is larger than the 13163 current size of the file, the file is "zero extended", i.e., zeroes 13164 are implicitly added between the file's previous EOF and the new EOF. 13165 (In many implementations the zero extended region of the file 13166 consists of unallocated holes in the file.) When the client writes 13167 past EOF via WRITE, the SETATTR operation does not need to be used. 13169 12.5.4.3. LAYOUTCOMMIT and layoutupdate 13171 The LAYOUTCOMMIT argument contains a loca_layoutupdate field 13172 (Section 18.42.1) of data type layoutupdate4 (Section 3.3.18). This 13173 argument is a layout type-specific structure. The structure can be 13174 used to pass arbitrary layout type-specific information from the 13175 client to the metadata server at LAYOUTCOMMIT time. For example, if 13176 using a block/volume layout, the client can indicate to the metadata 13177 server which reserved or allocated blocks the client used or did not 13178 use. The content of loca_layoutupdate (field lou_body) need not be 13179 the same layout type-specific content returned by LAYOUTGET 13180 (Section 18.43.2) in the loc_body field of the lo_content field, of 13181 the logr_layout field. The content of loca_layoutupdate is defined 13182 by the layout type specification and is opaque to LAYOUTCOMMIT. 13184 12.5.5. Recalling a Layout 13186 Since a layout protects a client's access to a file via a direct 13187 client-storage-device path, a layout need only be recalled when it is 13188 semantically unable to serve this function. Typically, this occurs 13189 when the layout no longer encapsulates the true location of the file 13190 over the byte range it represents. Any operation or action, such as 13191 server driven restriping or load balancing, that changes the layout 13192 will result in a recall of the layout. A layout is recalled by the 13193 CB_LAYOUTRECALL callback operation (see Section 20.3) and returned 13194 with LAYOUTRETURN Section 18.44. The CB_LAYOUTRECALL operation may 13195 recall a layout identified by a byte range, all the layouts 13196 associated with a file system (FSID), or all layouts associated with 13197 a client ID. Section 12.5.5.2 discusses sequencing issues 13198 surrounding the getting, returning, and recalling of layouts. 13200 An iomode is also specified when recalling a layout. Generally, the 13201 iomode in the recall request must match the layout being returned; 13202 for example, a recall with an iomode of LAYOUTIOMODE4_RW should cause 13203 the client to only return LAYOUTIOMODE4_RW layouts and not 13204 LAYOUTIOMODE4_READ layouts. However, a special LAYOUTIOMODE4_ANY 13205 enumeration is defined to enable recalling a layout of any iomode; in 13206 other words, the client must return both read-only and read/write 13207 layouts. 13209 A REMOVE operation SHOULD cause the metadata server to recall the 13210 layout to prevent the client from accessing a non-existent file and 13211 to reclaim state stored on the client. 
Since a REMOVE may be delayed 13212 until the last close of the file has occurred, the recall may also be 13213 delayed until this time. After the last reference on the file has 13214 been released and the file has been removed, the client should no 13215 longer be able to perform I/O using the layout. In the case of a 13216 file-based layout, the data server SHOULD return NFS4ERR_STALE in 13217 response to any operation on the removed file. 13219 Once a layout has been returned, the client MUST NOT send I/Os to the 13220 storage devices for the file, byte range, and iomode represented by 13221 the returned layout. If a client does send an I/O to a storage 13222 device for which it does not hold a layout, the storage device SHOULD 13223 reject the I/O. 13225 Although pNFS does not alter the file data caching capabilities of 13226 clients, or their semantics, it recognizes that some clients may 13227 perform more aggressive write-behind caching to optimize the benefits 13228 provided by pNFS. However, write-behind caching may negatively 13229 affect the latency in returning a layout in response to a 13230 CB_LAYOUTRECALL; this is similar to file delegations and the impact 13231 that file data caching has on DELEGRETURN. Client implementations 13232 SHOULD limit the amount of unwritten data they have outstanding at 13233 any one time in order to prevent excessively long responses to 13234 CB_LAYOUTRECALL. Once a layout is recalled, a server MUST wait one 13235 lease period before taking further action. As soon as a lease period 13236 has passed, the server may choose to fence the client's access to the 13237 storage devices if the server perceives the client has taken too long 13238 to return a layout. However, just as in the case of data delegation 13239 and DELEGRETURN, the server may choose to wait given that the client 13240 is showing forward progress on its way to returning the layout. This 13241 forward progress can take the form of successful interaction with the 13242 storage devices or sub-portions of the layout being returned by the 13243 client. The server can also limit exposure to these problems by 13244 limiting the byte ranges initially provided in the layouts and thus 13245 the amount of outstanding modified data. 13247 12.5.5.1. Layout Recall Callback Robustness 13249 It has been assumed thus far that pNFS client state for a file 13250 exactly matches the pNFS server state for that file and client 13251 regarding layout ranges and iomode. This assumption leads to the 13252 implication that any callback results in a LAYOUTRETURN or set of 13253 LAYOUTRETURNs that exactly match the range in the callback, since 13254 both client and server agree about the state being maintained. 13255 However, it can be useful if this assumption does not always hold. 13256 For example: 13258 o If conflicts that require callbacks are very rare, and a server 13259 can use a multi-file callback to recover per-client resources 13260 (e.g., via an FSID recall, or a multi-file recall within a single 13261 compound), the result may be significantly less client-server pNFS 13262 traffic. 13264 o It may be useful for servers to maintain information about what 13265 ranges are held by a client on a coarse-grained basis, leading to 13266 the server's layout ranges being beyond those actually held by the 13267 client. In the extreme, a server could manage conflicts on a per- 13268 file basis, only issuing whole-file callbacks even though clients 13269 may request and be granted sub-file ranges.
13271 o It may be useful for clients to "forget" details about what 13272 layouts and ranges the client actually has, leading to the 13273 server's layout ranges being beyond what the client "thinks" 13274 it has. As long as the client does not assume it has layouts that 13275 are beyond what the server has granted, this is a safe practice. 13276 When a client forgets what ranges and layouts it has, and it 13277 receives a CB_LAYOUTRECALL operation, the client MUST follow up 13278 with a LAYOUTRETURN for what the server recalled, or alternatively 13279 return the NFS4ERR_NOMATCHING_LAYOUT error if it has no layout to 13280 return in the recalled range. 13282 o In order to avoid errors, it is vital that a client not assign 13283 itself layout permissions beyond what the server has granted and 13284 that the server not forget layout permissions that have been 13285 granted. On the other hand, if a server believes that a client 13286 holds a layout that the client does not know about, it is useful 13287 for the client to cleanly indicate completion of the requested 13288 recall either by issuing a LAYOUTRETURN for the entire requested 13289 range or by returning an NFS4ERR_NOMATCHING_LAYOUT error to the 13290 CB_LAYOUTRECALL. 13292 Thus, in light of the above, it is useful for a server to be able to 13293 send callbacks for layout ranges it has not granted to a client, and 13294 for a client to return ranges it does not hold. A pNFS client MUST 13295 always return layouts that comprise the full range specified by the 13296 recall. Note that the full recalled layout range need not be returned as 13297 part of a single operation, but may be returned in portions. This 13298 allows the client to stage the flushing of dirty data, layout 13299 commits, and returns. Also, it indicates to the metadata server that 13300 the client is making progress. 13302 When a layout is returned, the client MUST NOT have any outstanding 13303 I/O requests to the storage devices involved in the layout. 13304 Rephrasing, the client MUST NOT return the layout while it has 13305 outstanding I/O requests to the storage device. 13307 Even with this requirement for the client, it is possible that I/O 13308 requests may be presented to a storage device no longer allowed to 13309 perform them. Since the server has no strict control as to when the 13310 client will return the layout, the server may later decide to 13311 unilaterally revoke the client's access to the storage devices as 13312 provided by the layout. In choosing to revoke access, the server 13313 must deal with the possibility of lingering I/O requests; those 13314 outstanding I/O requests are still in flight to storage devices 13315 identified by the revoked layout. All layout specifications MUST 13316 define whether unilateral layout revocation by the metadata server is 13317 supported; if it is, the specification must also describe how 13318 lingering writes are processed. For example, storage devices 13319 identified by the revoked layout could be fenced off from the client 13320 that held the layout. 13322 In order to ensure client/server convergence with regard to layout 13323 state, the final LAYOUTRETURN operation in a sequence of LAYOUTRETURN 13324 operations for a particular recall MUST specify the entire range 13325 being recalled, echoing the recalled layout type, iomode, recall/ 13326 return type (FILE, FSID, or ALL), and byte range; even if layouts 13327 pertaining to partial ranges were previously returned.
In addition, 13328 if the client holds no layouts that overlap the range being 13329 recalled, the client should return the NFS4ERR_NOMATCHING_LAYOUT 13330 error code to CB_LAYOUTRECALL. This allows the server to update its 13331 view of the client's layout state. 13333 12.5.5.2. Sequencing of Layout Operations 13335 As with other stateful operations, pNFS requires the correct 13336 sequencing of layout operations. pNFS uses the "seqid" in the layout 13337 stateid to provide the correct sequencing between regular operations 13338 and callbacks. It is the server's responsibility to avoid 13339 inconsistencies regarding the layouts provided and the client's 13340 responsibility to properly serialize its layout requests and layout 13341 returns. 13343 12.5.5.2.1. Layout Recall and Return Sequencing 13345 One critical issue with regard to layout operations sequencing 13346 concerns callbacks. The protocol must defend against races between 13347 the reply to a LAYOUTGET or LAYOUTRETURN operation and a subsequent 13348 CB_LAYOUTRECALL. A client MUST NOT process a CB_LAYOUTRECALL that 13349 implies one or more outstanding LAYOUTGET or LAYOUTRETURN operations 13350 to which the client has not yet received a reply. The client detects 13351 such a CB_LAYOUTRECALL by examining the "seqid" field of the recall's 13352 layout stateid. If the "seqid" is not one higher than what the 13353 client currently has recorded, and the client has at least one 13354 LAYOUTGET and/or LAYOUTRETURN operation outstanding, the client knows 13355 the server sent the CB_LAYOUTRECALL after sending a response to an 13356 outstanding LAYOUTGET or LAYOUTRETURN. The client MUST wait before 13357 processing such a CB_LAYOUTRECALL until it processes all replies for 13358 outstanding LAYOUTGET and LAYOUTRETURN operations for the 13359 corresponding file with seqid less than the seqid given by 13360 CB_LAYOUTRECALL (lor_stateid, see Section 20.3). 13362 In addition to the seqid-based mechanism, Section 2.10.5.3 describes 13363 the sessions mechanism for allowing the client to detect callback 13364 race conditions and delay processing such a CB_LAYOUTRECALL. The 13365 server MAY reference conflicting operations in the CB_SEQUENCE that 13366 precedes the CB_LAYOUTRECALL. Because the server has already sent 13367 replies for these operations before issuing the callback, the replies 13368 may race with the CB_LAYOUTRECALL. The client MUST wait for all the 13369 referenced calls to complete and update its view of the layout state 13370 before processing the CB_LAYOUTRECALL. 13372 12.5.5.2.1.1. Get/Return Sequencing 13374 The protocol allows the client to send concurrent LAYOUTGET and 13375 LAYOUTRETURN operations to the server. The protocol does not provide 13376 any means for the server to process the requests in the same order in 13377 which they were created. However, through the use of the "seqid" 13378 field in the layout stateid, the client can determine the order in 13379 which parallel outstanding operations were processed by the server. 13380 Thus, when a layout retrieved by an outstanding LAYOUTGET operation 13381 intersects with a layout returned by an outstanding LAYOUTRETURN on 13382 the same file, the order in which the two conflicting operations are 13383 processed determines the final state of the overlapping layout. The 13384 order is determined by the "seqid" returned in each operation: the 13385 operation with the higher seqid was executed later.
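A short sketch may help illustrate how a client can order two
completed operations by their "seqid" values.  This is an
illustrative example only; the wraparound comparison shown is one
possible application of the modulo arithmetic discussed in
Section 12.5.5.2.1.4, and the helper name is an assumption of the
sketch rather than a protocol definition.

   /*
    * Minimal sketch (not normative): wraparound-aware comparison of
    * two layout stateid "seqid" values already received by the
    * client, used to decide which of two completed LAYOUTGET or
    * LAYOUTRETURN operations the server processed later.  Zero is
    * never a valid layout "seqid".
    */
   #include <stdbool.h>
   #include <stdint.h>

   /* Returns true if seqid "a" was generated after seqid "b". */
   static bool
   layout_seqid_after(uint32_t a, uint32_t b)
   {
       if (a == b)
           return false;
       /* "a" is later than "b" if it lies within half the sequence
        * space ahead of "b", computed modulo 2^32. */
       return (uint32_t)(a - b) < UINT32_C(0x80000000);
   }

In this sketch, when two completed operations carry overlapping
ranges, the reply for which layout_seqid_after() reports the later
"seqid" determines the client's final view of the overlap.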
13387 It is permissible for the client to send in parallel multiple 13388 LAYOUTGET operations for the same file or multiple LAYOUTRETURN 13389 operations for the same file, and a mix of both. 13391 It is permissible for the client to use the current stateid (see 13392 Section 16.2.3.1.2) for LAYOUTGET operations for example when 13393 compounding LAYOUTGETs or compounding OPEN and LAYOUTGETs. It is 13394 also permissible to use the current stateid when compounding 13395 LAYOUTRETURNs. 13397 It is permissible for the client to use the current stateid when 13398 combining LAYOUTRETURN and LAYOUTGET operations for the same file in 13399 the same COMPOUND request since the server MUST process these in 13400 order. However, if a client does send such COMPOUND requests, it 13401 MUST NOT have more than one outstanding for the same file at the same 13402 time and MUST NOT have other LAYOUTGET or LAYOUTRETURN operations 13403 outstanding at the same time for that same file. 13405 12.5.5.2.1.2. Client Considerations 13407 Consider a pNFS client that has sent a LAYOUTGET and before it 13408 receives the reply to LAYOUTGET, it receives a CB_LAYOUTRECALL for 13409 the same file with an overlapping range. There are two 13410 possibilities, which the client can distinguish via the layout 13411 stateid in the recall. 13413 1. The server processed the LAYOUTGET before issuing the recall, so 13414 the LAYOUTGET must be waited for because it may be carrying 13415 layout information that will need to be returned to deal with the 13416 CB_LAYOUTRECALL. 13418 2. The server sent the callback before receiving the LAYOUTGET. The 13419 server will not respond to the LAYOUTGET until the 13420 CB_LAYOUTRECALL is processed. 13422 If these possibilities cannot be distinguished, a deadlock could 13423 result, as the client must wait for the LAYOUTGET response before 13424 processing the recall in the first case, but that response will not 13425 arrive until after the recall is processed in the second case. Note 13426 that in the first case, the "seqid" in the layout stateid of the 13427 recall is two greater than what the client has recorded and in the 13428 second case, the "seqid" is one greater than what the client has 13429 recorded. This allows the client to disambiguate between the two 13430 cases. The client thus knows precisely which possibility applies. 13432 In case 1 the client knows it needs to wait for the LAYOUTGET 13433 response before processing the recall (or the client can return 13434 NFS4ERR_DELAY). 13436 In case 2 the client will not wait for the LAYOUTGET response before 13437 processing the recall, because waiting would cause deadlock. 13438 Therefore, the action at the client will only require waiting in the 13439 case that the client has not yet seen the server's earlier responses 13440 to the LAYOUTGET operation(s). 13442 The recall process can be considered completed when the final 13443 LAYOUTRETURN operation for the recalled range is completed. The 13444 LAYOUTRETURN uses the layout stateid (with seqid) specified in 13445 CB_LAYOUTRECALL. If the client uses multiple LAYOUTRETURNs in 13446 processing the recall, the first LAYOUTRETURN will use the layout 13447 stateid as specified in CB_LAYOUTRECALL. Subsequent LAYOUTRETURNs 13448 will use the highest seqid as is the usual case. 13450 12.5.5.2.1.3. Server Considerations 13452 Consider a race from the metadata server's point of view. 
The 13453 metadata server has sent a CB_LAYOUTRECALL and receives an 13454 overlapping LAYOUTGET for the same file before the LAYOUTRETURN(s) 13455 that respond to the CB_LAYOUTRECALL. There are three cases: 13457 1. The client sent the LAYOUTGET before processing the 13458 CB_LAYOUTRECALL. The "seqid" in the layout stateid of LAYOUTGET 13459 is two less than the "seqid" in CB_LAYOUTRECALL. The server 13460 returns NFS4ERR_RECALLCONFLICT to the client, which indicates to 13461 the client that there is a pending recall. 13463 2. The client sent the LAYOUTGET after processing the 13464 CB_LAYOUTRECALL, but the LAYOUTGET arrived before the 13465 LAYOUTRETURN and the response to CB_LAYOUTRECALL that completed 13466 that processing. The "seqid" in the layout stateid of LAYOUTGET 13467 is equal to or greater than that of the "seqid" in 13468 CB_LAYOUTRECALL. The server has not received a response to the 13469 CB_LAYOUTRECALL, so it returns NFS4ERR_RECALLCONFLICT. 13471 3. The client sent the LAYOUTGET after processing the 13472 CB_LAYOUTRECALL, the server received the CB_LAYOUTRECALL 13473 response, but the LAYOUTGET arrived before the LAYOUTRETURN that 13474 completed that processing. The "seqid" in the layout stateid of 13475 LAYOUTGET is equal to that of the "seqid" in CB_LAYOUTRECALL. 13476 The server has received a response to the CB_LAYOUTRECALL, so it 13477 returns NFS4ERR_RETURNCONFLICT. 13479 12.5.5.2.1.4. Wraparound and Validation of Seqid 13481 The rules for layout stateid processing differ from other stateids in 13482 the protocol because the "seqid" value cannot be zero and the 13483 stateid's "seqid" value changes in a CB_LAYOUTRECALL operation. The 13484 non-zero requirement combined with the inherent parallelism of layout 13485 operations means that a set of LAYOUTGET and LAYOUTRETURN operations 13486 may contain the same value for "seqid". The server uses a slightly 13487 modified version of the modulo arithmetic as described in 13488 Section 2.10.5.1 when incrementing the layout stateid's "seqid". The 13489 modification to that modulo arithmetic description is to not use 13490 zero. The modulo arithmetic is also used for the comparisons of 13491 "seqid" values in the processing of CB_LAYOUTRECALL events as 13492 described above in Section 12.5.5.2.1.3. 13494 Just as the server validates the "seqid" in the event of 13495 CB_LAYOUTRECALL usage, as described in Section 12.5.5.2.1.3, the 13496 server also validates the "seqid" value to ensure that it is within 13497 an appropriate range. This range represents the degree of 13498 parallelism the server supports for layout stateids. If the client 13499 is sending multiple layout operations to the server in parallel, by 13500 definition, the "seqid" value in the supplied stateid will not be the 13501 current "seqid" as held by the server. The range of parallelism 13502 spans from the highest or current "seqid" to a "seqid" value in the 13503 past. To assist in the discussion, the server's current "seqid" 13504 value for a layout stateid is defined as: SERVER_CURRENT_SEQID. The 13505 lowest "seqid" value that is acceptable to the server is represented 13506 by PAST_SEQID. And the value for the range of valid "seqid"s or 13507 range of parallelism is VALID_SEQID_RANGE. Therefore, the following 13508 holds: VALID_SEQID_RANGE = SERVER_CURRENT_SEQID - PAST_SEQID. In the 13509 following, all arithmetic is the modulo arithmetic as described 13510 above. 13512 The server MUST support a minimum VALID_SEQID_RANGE. 
The minimum is 13513 defined as: VALID_SEQID_RANGE = summation of 1..N of 13514 (ca_maxoperations(i) - 1) where N is the number of session fore 13515 channels and ca_maxoperations(i) is the value of the ca_maxoperations 13516 returned from CREATE_SESSION of the i'th session. The reason for 13517 minus 1 is to allow for the required SEQUENCE operation. The server 13518 MAY support a VALID_SEQID_RANGE value larger than the minimum. The 13519 maximum VALID_SEQID_RANGE is (2 ^ 32 - 2) (accounts for 0 not being a 13520 valid "seqid" value). 13522 If the server finds the "seqid" is zero, the NFS4ERR_BAD_STATEID 13523 error is returned to the client. The server further validates the 13524 "seqid" to ensure it is within the range of parallelism, 13525 VALID_SEQID_RANGE. If the "seqid" value is outside of that range, 13526 the error NFS4ERR_OLD_STATEID is returned to the client. Upon 13527 receipt of NFS4ERR_OLD_STATEID, the client updates the stateid in the 13528 layout request based on processing of other layout requests and re- 13529 sends the operation to the server. 13531 12.5.5.2.1.5. Bulk Recall and Return 13533 pNFS supports recalling and returning all layouts that are for files 13534 belonging to a particular fsid (LAYOUTRECALL4_FSID, 13535 LAYOUTRETURN4_FSID) or client ID (LAYOUTRECALL4_ALL, 13536 LAYOUTRETURN4_ALL). There are no "bulk" stateids, so detection of 13537 races via the seqid is not possible. The server MUST NOT initiate 13538 bulk recall while another recall is in progress, or the corresponding 13539 LAYOUTRETURN is in progress or pending. In the event the server 13540 sends a bulk recall while the client has pending or in progress 13541 LAYOUTRETURN, CB_LAYOUTRECALL, or LAYOUTGET, the client returns 13542 NFS4ERR_DELAY. In the event the client sends a LAYOUTGET or 13543 LAYOUTRETURN while a bulk recall is in progress, the server returns 13544 NFS4ERR_RECALLCONFLICT. If the client sends a LAYOUTGET or 13545 LAYOUTRETURN after the server receives NFS4ERR_DELAY from a bulk 13546 recall, then to ensure forward progress, the server MAY return 13547 NFS4ERR_RECALLCONFLICT. 13549 Once a CB_LAYOUTRECALL of LAYOUTRECALL4_ALL is sent, the server MUST 13550 NOT allow the client to use any layout stateid except for 13551 LAYOUTCOMMIT operations. Once the client receives a CB_LAYOUTRECALL 13552 of LAYOUTRECALL4_ALL, it MUST NOT use any layout stateid except for 13553 LAYOUTCOMMIT operations. Once a LAYOUTRETURN of LAYOUTRETURN4_ALL is 13554 sent, all layout stateids granted to the client ID are freed. The 13555 client MUST NOT use the layout stateids again. It MUST use LAYOUTGET 13556 to obtain new layout stateids. 13558 Once a CB_LAYOUTRECALL of LAYOUTRECALL4_FSID is sent, the server MUST 13559 NOT allow the client to use any layout stateid that refers to a file 13560 with the specified fsid except for LAYOUTCOMMIT operations. Once the 13561 client receives a CB_LAYOUTRECALL of LAYOUTRECALL4_FSID, it MUST NOT 13562 use any layout stateid that refers to a file with the specified fsid 13563 except for LAYOUTCOMMIT operations. Once a LAYOUTRETURN of 13564 LAYOUTRETURN4_FSID is sent, all layout stateids granted to the 13565 referenced fsid are freed. The client MUST NOT use the layout 13566 stateids for files with the referenced fsid again. It MUST use 13567 LAYOUTGET to obtain new layout stateids for files with the referenced 13568 fsid.
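The restrictions in the two preceding paragraphs can be summarized by
a small client-side predicate.  The following is a minimal,
illustrative sketch; the types, field names, and fsid representation
are assumptions made for the example rather than protocol
definitions.

   /*
    * Minimal sketch (not normative): may this layout stateid still be
    * used for the given operation once a bulk CB_LAYOUTRECALL has
    * been received?  After a LAYOUTRECALL4_ALL, only LAYOUTCOMMIT may
    * use a layout stateid; after a LAYOUTRECALL4_FSID, the same
    * restriction applies to files within the recalled fsid.
    */
   #include <stdbool.h>
   #include <stdint.h>

   enum bulk_recall_type { RECALL_NONE, RECALL_FSID, RECALL_ALL };

   struct bulk_recall_state {
       enum bulk_recall_type type;
       uint64_t fsid_major, fsid_minor;  /* valid when type == RECALL_FSID */
   };

   static bool
   layout_stateid_usable(const struct bulk_recall_state *br,
                         bool op_is_layoutcommit,
                         uint64_t file_fsid_major,
                         uint64_t file_fsid_minor)
   {
       switch (br->type) {
       case RECALL_NONE:
           return true;
       case RECALL_ALL:
           return op_is_layoutcommit;
       case RECALL_FSID:
           if (file_fsid_major == br->fsid_major &&
               file_fsid_minor == br->fsid_minor)
               return op_is_layoutcommit;
           return true;
       }
       return false;  /* unreachable */
   }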
13570 If the server has sent a bulk CB_LAYOUTRECALL, and receives a 13571 LAYOUTGET, or a LAYOUTRETURN with a stateid, the server MUST return 13572 NFS4ERR_RECALLCONFLICT. If the server has sent a bulk 13573 CB_LAYOUTRECALL, and receives a LAYOUTRETURN with an lr_returntype 13574 that is not equal to the lor_recalltype of the CB_LAYOUTRECALL, the 13575 server MUST return NFS4ERR_RECALLCONFLICT. 13577 12.5.6. Revoking Layouts 13579 Parallel NFS permits servers to revoke layouts from clients that fail 13580 to respond to recalls and/or fail to renew their lease in time. 13581 Whether the server revokes the layout or not depends on the layout 13582 type, and what actions are taken with respect to the client's I/O to 13583 data servers is also layout type specific. 13585 12.5.7. Metadata Server Write Propagation 13587 Asynchronous writes written through the metadata server may be 13588 propagated lazily to the storage devices. For data written 13589 asynchronously through the metadata server, a client performing a 13590 read at the appropriate storage device is not guaranteed to see the 13591 newly written data until a COMMIT occurs at the metadata server. 13592 While the write is pending, reads to the storage device may give out 13593 either the old data, the new data, or a mixture of new and old. Upon 13594 completion of a synchronous WRITE or COMMIT (for asynchronously 13595 written data), the metadata server MUST ensure that storage devices 13596 give out the new data and that the data has been written to stable 13597 storage. If the server implements its storage in any way such that 13598 it cannot obey these constraints, then it MUST recall the layouts to 13599 prevent reads being done that cannot be handled correctly. Note that 13600 the layouts MUST be recalled prior to the server responding to the 13601 associated WRITE operations. 13603 12.6. pNFS Mechanics 13605 This section describes the operations flow taken by a pNFS client to 13606 a metadata server and storage device. 13608 When a pNFS client encounters a new FSID, it sends a GETATTR to the 13609 NFSv4.1 server for the fs_layout_type (Section 5.12.1) attribute. If 13610 the attribute returns at least one layout type, and the layout types 13611 returned are among the set supported by the client, the client knows 13612 that pNFS is a possibility for the file system. If, from the server 13613 that returned the new FSID, the client does not have a client ID that 13614 came from an EXCHANGE_ID result that returned 13615 EXCHGID4_FLAG_USE_PNFS_MDS, it MUST send an EXCHANGE_ID to the server 13616 with the EXCHGID4_FLAG_USE_PNFS_MDS bit set. If the server's 13617 response does not have EXCHGID4_FLAG_USE_PNFS_MDS, then contrary to 13618 what the fs_layout_type attribute said, the server does not support 13619 pNFS, and the client will not be able to use pNFS to that server; in 13620 this case, the server MUST return NFS4ERR_NOTSUPP in response to any 13621 pNFS operation. 13623 The client then creates a session, requesting a persistent session, 13624 so that exclusive creates can be done with a single round trip via the 13625 createmode4 of GUARDED4. If the session ends up not being 13626 persistent, the client will use EXCLUSIVE4_1 for exclusive creates. 13628 If a file is to be created on a pNFS-enabled file system, the client 13629 uses the OPEN operation. With the normal set of attributes that may 13630 be provided upon OPEN used for creation, there is an OPTIONAL 13631 layout_hint attribute.
The client's use of layout_hint allows the 13632 client to express its preference for a layout type and its associated 13633 layout details. The use of a createmode4 of UNCHECKED4, GUARDED4, or 13634 EXCLUSIVE4_1 will allow the client to provide the layout_hint 13635 attribute at create time. The client MUST NOT use EXCLUSIVE4 (see 13636 Table 18). The client is RECOMMENDED to combine a GETATTR operation 13637 after the OPEN within the same COMPOUND. The GETATTR may then 13638 retrieve the layout_type attribute for the newly created file. The 13639 client will then know what layout type the server has chosen for the 13640 file and therefore what storage protocol the client must use. 13642 If the client wants to open an existing file, then it also includes a 13643 GETATTR to determine what layout type the file supports. 13645 The GETATTR in either the file creation or plain file open case can 13646 also include the layout_blksize and layout_alignment attributes so 13647 that the client can determine optimal offsets and lengths for I/O on 13648 the file. 13650 Assuming the client supports the layout type returned by GETATTR and 13651 it chooses to use pNFS for data access, it then sends LAYOUTGET using 13652 the filehandle and stateid returned by OPEN, specifying the range it 13653 wants to do I/O on. The response is a layout, which may be a subset 13654 of the range for which the client asked. It also includes device IDs 13655 and a description of how data is organized (or in the case of 13656 writing, how data is to be organized) across the devices. The device 13657 IDs and data description are encoded in a format that is specific to 13658 the layout type, but the client is expected to understand. 13660 When the client wants to send an I/O, it determines which device ID 13661 it needs to send the I/O command to by examining the data description 13662 in the layout. It then sends a GETDEVICEINFO to find the device 13663 address(es) of the device ID. The client then sends the I/O request 13664 one of device ID's device addresses, using the storage protocol 13665 defined for the layout type. Note that if a client has multiple I/Os 13666 to send, these I/O requests may be done in parallel. 13668 If the I/O was a WRITE, then at some point the client may want to use 13669 LAYOUTCOMMIT to commit the modification time and the new size of the 13670 file (if it believes it extended the file size) to the metadata 13671 server and the modified data to the file system. 13673 12.7. Recovery 13675 Recovery is complicated by the distributed nature of the pNFS 13676 protocol. In general, crash recovery for layouts is similar to crash 13677 recovery for delegations in the base NFSv4.1 protocol. However, the 13678 client's ability to perform I/O without contacting the metadata 13679 server introduces subtleties that must be handled correctly if the 13680 possibility of file system corruption is to be avoided. 13682 12.7.1. Recovery from Client Restart 13684 Client recovery for layouts is similar to client recovery for other 13685 lock and delegation state. When an pNFS client restarts, it will 13686 lose all information about the layouts that it previously owned. 13687 There are two methods by which the server can reclaim these resources 13688 and allow otherwise conflicting layouts to be provided to other 13689 clients. 13691 The first is through the expiry of the client's lease. 
If the client 13692 recovery time is longer than the lease period, the client's lease 13693 will expire and the server will know that state may be released. For 13694 layouts the server may release the state immediately upon lease 13695 expiry or it may allow the layout to persist awaiting possible lease 13696 revival, as long as no other layout conflicts. 13698 The second is through the client restarting in less time than it 13699 takes for the lease period to expire. In such a case, the client 13700 will contact the server through the standard EXCHANGE_ID protocol. 13701 The server will find that the client's co_ownerid matches the 13702 co_ownerid of the previous client invocation, but that the verifier 13703 is different. The server uses this as a signal to release all layout 13704 state associated with the client's previous invocation. In this 13705 scenario, the data written by the client but not covered by a 13706 successful LAYOUTCOMMIT is in an undefined state; it may have been 13707 written or it may now be lost. This is acceptable behavior and it is 13708 the client's responsibility to use LAYOUTCOMMIT to achieve the 13709 desired level of stability. 13711 12.7.2. Dealing with Lease Expiration on the Client 13713 If a client believes its lease has expired, it MUST NOT send I/O to 13714 the storage device until it has validated its lease. The client can 13715 send a SEQUENCE operation to the metadata server. If the SEQUENCE 13716 operation is successful, but sr_status_flag has 13717 SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, 13718 SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, or 13719 SEQ4_STATUS_ADMIN_STATE_REVOKED set, the client MUST NOT use 13720 currently held layouts. The client has two choices to recover from 13721 the lease expiration. First, for all modified but uncommitted data, 13722 write it to the metadata server using the FILE_SYNC4 flag for the 13723 WRITEs or WRITE and COMMIT. Second, the client reestablishes a 13724 client ID and session with the server and obtain new layouts and 13725 device ID to device address mappings for the modified data ranges and 13726 then write the data to the storage devices with the newly obtained 13727 layouts. 13729 If sr_status_flags from the metadata server has 13730 SEQ4_STATUS_RESTART_RECLAIM_NEEDED set (or SEQUENCE returns 13731 NFS4ERR_STALE_CLIENTID, or SEQUENCE returns NFS4ERR_BAD_SESSION and 13732 CREATE_SESSION returns NFS4ERR_STALE_CLIENTID) then the metadata 13733 server has restarted, and the client SHOULD recover using the methods 13734 described in Section 12.7.4. 13736 If sr_status_flags from the metadata server has 13737 SEQ4_STATUS_LEASE_MOVED set, then the client recovers by following 13738 the procedure described in Section 11.7.7.1. After that, the client 13739 may get an indication that the layout state was not moved with the 13740 file system. The client recovers as in the other applicable 13741 situations discussed in Paragraph 1 or Paragraph 2 of this section. 13743 If sr_status_flags reports no loss of state, then the lease for the 13744 layouts the client has are valid and renewed, and the client can once 13745 again send I/O requests to the storage devices. 13747 While clients SHOULD NOT send I/Os to storage devices that may extend 13748 past the lease expiration time period, this is not always possible; 13749 for example, an extended network partition that starts after the I/O 13750 is sent and does not heal until the I/O request is received by the 13751 storage device. 
Thus the metadata server and/or storage devices are 13752 responsible for protecting themselves from I/Os that are sent before 13753 the lease expires, but arrive after the lease expires. See 13754 Section 12.7.3. 13756 12.7.3. Dealing with Loss of Layout State on the Metadata Server 13758 This is a description of the case where all of the following are 13759 true: 13761 o the metadata server has not restarted 13763 o a pNFS client's layouts have been discarded (usually because the 13764 client's lease expired) and are invalid 13766 o an I/O from the pNFS client arrives at the storage device 13768 The metadata server and its storage devices MUST solve this by 13769 fencing the client. In other words, prevent the execution of I/O 13770 operations from the client to the storage devices after layout state 13771 loss. The details of how fencing is done are specific to the layout 13772 type. The solution for NFSv4.1 file-based layouts is described in 13773 (Section 13.11), and for other layout types in their respective 13774 external specification documents. 13776 12.7.4. Recovery from Metadata Server Restart 13778 The pNFS client will discover that the metadata server has restarted 13779 via the methods described in Section 8.4.2 and discussed in a pNFS- 13780 specific context in Paragraph 2, of Section 12.7.2. The client MUST 13781 stop using layouts and delete the device ID to device address 13782 mappings it previously received from the metadata server. Having 13783 done that, if the client wrote data to the storage device without 13784 committing the layouts via LAYOUTCOMMIT, then the client has 13785 additional work to do in order to have the client, metadata server 13786 and storage device(s) all synchronized on the state of the data. 13788 o If the client has data still modified and unwritten in the 13789 client's memory, the client has only two choices. 13791 1. The client can obtain a layout via LAYOUTGET after the 13792 server's grace period and write the data to the storage 13793 devices. 13795 2. The client can write that data through the metadata server 13796 using the WRITE (Section 18.32) operation, and then obtain 13797 layouts as desired. 13799 o If the client asynchronously wrote data to the storage device, but 13800 still has a copy of the data in its memory, then it has available 13801 to it the recovery options listed above in the previous bullet 13802 point. If the metadata server is also in its grace period, the 13803 client has available to it the options below in the next bullet 13804 item. 13806 o The client does not have a copy of the data in its memory and the 13807 metadata server is still in its grace period. The client cannot 13808 use LAYOUTGET (within or outside the grace period) to reclaim a 13809 layout because the contents of the response from LAYOUTGET may not 13810 match what it had previously. The range might be different or it 13811 might get the same range but the content of the layout might be 13812 different. Even if the content of the layout appears to be the 13813 same, the device IDs may map to different device addresses, and 13814 even if the device addresses are the same, the device addresses 13815 could have been assigned to a different storage device. 
The 13816 option of retrieving the data from the storage device and writing 13817 it to the metadata server per the recovery scenario described 13818 above is not available because, again, the mappings of range to 13819 device ID, device ID to device address, and device address to physical 13820 device are stale, and new mappings via a new LAYOUTGET do not solve 13821 the problem. 13823 The only recovery option for this scenario is to send a 13824 LAYOUTCOMMIT in reclaim mode, which the metadata server will 13825 accept as long as it is in its grace period. The use of 13826 LAYOUTCOMMIT in reclaim mode informs the metadata server that the 13827 layout has changed. It is critical that the metadata server receive 13828 this information before its grace period ends, and thus before it 13829 starts allowing updates to the file system. 13831 To send LAYOUTCOMMIT in reclaim mode, the client sets the 13832 loca_reclaim field of the operation's arguments (Section 18.42.1) 13833 to TRUE. During the metadata server's recovery grace period (and 13834 only during the recovery grace period) the metadata server is 13835 prepared to accept LAYOUTCOMMIT requests with the loca_reclaim 13836 field set to TRUE. 13838 When loca_reclaim is TRUE, the client is attempting to commit 13839 changes to the layout that occurred prior to the restart of the 13840 metadata server. The metadata server applies some consistency 13841 checks on the loca_layoutupdate field of the arguments to 13842 determine whether the client can commit the data written to the 13843 storage device to the file system. The loca_layoutupdate field is 13844 of data type layoutupdate4, and contains layout type-specific 13845 content (in the lou_body field of loca_layoutupdate). The layout 13846 type-specific information that loca_layoutupdate might have is 13847 discussed in Section 12.5.4.3. If the metadata server's 13848 consistency checks on loca_layoutupdate succeed, then the metadata 13849 server MUST commit the data (as described by the loca_offset, 13850 loca_length, and loca_layoutupdate fields of the arguments) that 13851 was written to the storage device. If the metadata server's 13852 consistency checks on loca_layoutupdate fail, the metadata server 13853 rejects the LAYOUTCOMMIT operation, and makes no changes to the 13854 file system. However, any time LAYOUTCOMMIT with loca_reclaim 13855 TRUE fails, the pNFS client has lost all the data in the range 13856 defined by <loca_offset, loca_length>. A client can defend 13857 against this risk by caching all data, whether written 13858 synchronously or asynchronously, in its memory and not releasing the 13859 cached data until a successful LAYOUTCOMMIT. This condition does 13860 not hold true for all layout types; for example, file-based 13861 storage devices need not suffer from this limitation. 13863 o The client does not have a copy of the data in its memory and the 13864 metadata server is no longer in its grace period; i.e. the 13865 metadata server returns NFS4ERR_NO_GRACE. As with the scenario in 13866 the above bullet item, the failure of LAYOUTCOMMIT means the data 13867 in the range <loca_offset, loca_length> is lost. The defense against 13868 the risk is the same: cache all written data on the client until a 13869 successful LAYOUTCOMMIT. 13871 12.7.5. Operations During Metadata Server Grace Period 13873 Some of the recovery scenarios thus far noted that some operations, 13874 namely WRITE and LAYOUTGET, might be permitted during the metadata 13875 server's grace period. The metadata server may allow these 13876 operations during its grace period.
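One way to picture this choice is as a gate applied to each incoming request while the grace period is in effect. The sketch below (Python, illustrative only) uses the error value assigned to NFS4ERR_GRACE; the predicate passed in stands for whatever implementation-specific check, if any, the metadata server can make against information saved to stable storage.

   NFS4ERR_GRACE = 10013   # value assigned to this error by the protocol

   def check_during_grace(op, is_reclaim, can_prove_no_conflict):
       """Return None to allow a request during grace, or an error to reject it.

       Illustrative sketch only.  'is_reclaim' is true for reclaim-style
       requests such as LAYOUTCOMMIT with loca_reclaim set to TRUE.
       """
       if is_reclaim:
           return None
       if op in ("WRITE", "LAYOUTGET"):
           # The simple, always-valid policy is to reject; a server that has
           # recorded enough information may instead prove the request safe.
           return None if can_prove_no_conflict(op) else NFS4ERR_GRACE
       if op.startswith("LAYOUT") or op in ("GETDEVICEINFO", "GETDEVICELIST"):
           return NFS4ERR_GRACE    # other non-reclaim pNFS requests
       return None                 # non-pNFS requests: normal grace handling (not shown)

   # With the simple policy (never able to prove safety), WRITE and LAYOUTGET
   # are rejected, while a reclaim-mode LAYOUTCOMMIT is accepted.
   assert check_during_grace("LAYOUTGET", False, lambda op: False) == NFS4ERR_GRACE
   assert check_during_grace("LAYOUTCOMMIT", True, lambda op: False) is None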
For LAYOUTGET, the metadata 13877 server must reliably determine that servicing such a request will not 13878 conflict with an impending LAYOUTCOMMIT reclaim request. For WRITE, 13879 it must reliably determine that it will not conflict with an 13880 impending OPEN; or a LOCK where the file has mandatory file locking 13881 enabled. 13883 As mentioned previously, some operations, namely WRITE and LAYOUTGET 13884 may be rejected during the metadata server's grace period, because to 13885 provide simple, valid handling during the grace period, the easiest 13886 method is to simply reject all non-reclaim pNFS requests and WRITE 13887 operations by returning the NFS4ERR_GRACE error. However, depending 13888 on the storage protocol (which is specific to the layout type) and 13889 metadata server implementation, the metadata server may be able to 13890 determine that a particular request is safe. For example, a metadata 13891 server may save provisional allocation mappings for each file to 13892 stable storage, as well as information about potentially conflicting 13893 OPEN share modes and mandatory byte-range locks that might have been 13894 in effect at the time of restart, and use this information during the 13895 recovery grace period to determine that a WRITE request is safe. 13897 12.7.6. Storage Device Recovery 13899 Recovery from storage device restart is mostly dependent upon the 13900 layout type in use. However, there are a few general techniques a 13901 client can use if it discovers a storage device has crashed while 13902 holding modified, uncommitted data that was asynchronously written. 13903 First and foremost, it is important to realize that the client is the 13904 only one which has the information necessary to recover non-committed 13905 data; since, it holds the modified data and probably nothing else 13906 does. Second, the best solution is for the client to err on the side 13907 of caution and attempt to re-write the modified data through another 13908 path. 13910 The client SHOULD immediately write the data to the metadata server, 13911 with the stable field in the WRITE4args set to FILE_SYNC4. Once it 13912 does this, there is no need to wait for the original storage device. 13914 12.8. Metadata and Storage Device Roles 13916 If the same physical hardware is used to implement both a metadata 13917 server and storage device, then the same hardware entity is to be 13918 understood to be implementing two distinct roles and it is important 13919 that it be clearly understood on behalf of which role the hardware is 13920 executing at any given time. 13922 Two sub-cases can be distinguished. 13924 1. The storage device uses NFSv4.1 as the storage protocol, i.e. 13925 same physical hardware is used to implement both a metadata and 13926 data server. See Section 13.1 for a description how multiple 13927 roles are handled. 13929 2. The storage device does not use NFSv4.1 as the storage protocol, 13930 and the same physical hardware is used to implement both a 13931 metadata and storage device. Whether distinct network addresses 13932 are used to access metadata server and storage device is 13933 immaterial, because, it is always clear to the pNFS client and 13934 server, from upper layer protocol being used (NFSv4.1 or non- 13935 NFSv4.1) what role the request to the common server network 13936 address is directed to. 13938 12.9. Security Considerations for pNFS 13940 pNFS separates file system metadata and data and provides access to 13941 both. 
There are pNFS-specific operations (listed in Section 12.3) 13942 that provide access to the metadata; all existing NFSv4.1 13943 conventional (non-pNFS) security mechanisms and features apply to 13944 accessing the metadata. The combination of components in a pNFS 13945 system (see Figure 68) is required to preserve the security 13946 properties of NFSv4.1 with respect to an entity accessing a storage 13947 device from a client, including security countermeasures to defend 13948 against threats that NFSv4.1 provides defenses for in environments 13949 where these threats are considered significant. 13951 In some cases, the security countermeasures for connections to 13952 storage devices may take the form of physical isolation or a 13953 recommendation not to use pNFS in an environment. For example, it 13954 may be impractical to provide confidentiality protection for some 13955 storage protocols to protect against eavesdropping; in environments 13956 where eavesdropping on such protocols is of sufficient concern to 13957 require countermeasures, physical isolation of the communication 13958 channel (e.g., via direct connection from client(s) to storage 13959 device(s)) and/or a decision to forego use of pNFS (e.g., and fall 13960 back to conventional NFSv4.1) may be appropriate courses of action. 13962 Where communication with storage devices is subject to the same 13963 threats as client to metadata server communication, the protocols 13964 used for that communication need to provide security mechanisms as 13965 strong as or no weaker than those available via RPCSEC_GSS for 13966 NFSv4.1. 13968 pNFS implementations MUST NOT remove NFSv4.1's access controls. The 13969 combination of clients, storage devices, and the metadata server is 13970 responsible for ensuring that all client to storage device file data 13971 access respects NFSv4.1's ACLs and file open modes. This entails 13972 performing both of these checks on every access in the client, the 13973 storage device, or both (as applicable; when the storage device is an 13974 NFSv4.1 server, the storage device is ultimately responsible for 13975 controlling access). If a pNFS configuration performs these checks 13976 only in the client, the risk of a misbehaving client obtaining 13977 unauthorized access is an important consideration in determining when 13978 it is appropriate to use such a pNFS configuration. Such layout 13979 types SHOULD NOT be used when client-only access checks do not 13980 provide sufficient assurance that NFSv4.1 access control is being 13981 applied correctly. 13983 13. PNFS: NFSv4.1 File Layout Type 13985 This section describes the semantics and format of NFSv4.1 file-based 13986 layouts for pNFS. NFSv4.1 file-based layouts use the 13987 LAYOUT4_NFSV4_1_FILES layout type. The LAYOUT4_NFSV4_1_FILES type 13988 defines striping of data across multiple NFSv4.1 data servers. 13990 13.1. Client ID and Session Considerations 13992 Sessions are a REQUIRED feature of NFSv4.1, and this extends to both 13993 the metadata server and file-based (NFSv4.1-based) data servers. 13995 The role a server plays in pNFS is determined by the result it 13996 returns from EXCHANGE_ID. The roles are: 13998 o metadata server (EXCHGID4_FLAG_USE_PNFS_MDS is set in the result 13999 eir_flags), 14001 o data server (EXCHGID4_FLAG_USE_PNFS_DS), 14003 o non-metadata server (EXCHGID4_FLAG_USE_NON_PNFS). This is an 14004 NFSv4.1 server that does not support operations (e.g. LAYOUTGET) 14005 or attributes that pertain to pNFS.
14007 The client MAY request zero or more of EXCHGID4_FLAG_USE_NON_PNFS, 14008 EXCHGID4_FLAG_USE_PNFS_DS, or EXCHGID4_FLAG_USE_PNFS_MDS, even though 14009 some combinations (e.g. EXCHGID4_FLAG_USE_NON_PNFS | 14010 EXCHGID4_FLAG_USE_PNFS_MDS) are contradictory. The server however 14011 MUST only return the following acceptable combinations: 14013 +--------------------------------------------------------+ 14014 | Acceptable Results from EXCHANGE_ID | 14015 +--------------------------------------------------------+ 14016 | EXCHGID4_FLAG_USE_PNFS_MDS | 14017 | EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_PNFS_DS | 14018 | EXCHGID4_FLAG_USE_PNFS_DS | 14019 | EXCHGID4_FLAG_USE_NON_PNFS | 14020 | EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_NON_PNFS | 14021 +--------------------------------------------------------+ 14023 As the above table implies, a server can have one or two roles. A 14024 server can be both a metadata server and a data server or it can be 14025 both a data server and non-metadata server. In addition to returning 14026 two roles in EXCHANGE_ID's results, and thus serving both roles via a 14027 common client ID, a server can serve two roles by returning a unique 14028 client ID and server owner for each role in each of two EXCHANGE_ID 14029 results, with each result indicating each role. 14031 In the case of a server with concurrent PNFS roles that are served by 14032 a common client ID, if the EXCHANGE_ID request from the client has 14033 zero or a combination of the bits set in eia_flags, the server result 14034 should set bits which represent the higher of the acceptable 14035 combination of the server roles, with a preference to match the roles 14036 requested by the client. Thus if a client request has 14037 (EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS | 14038 EXCHGID4_FLAG_USE_PNFS_DS) flags set, and the server is both a 14039 metadata server and a data server, serving both the roles by a common 14040 client ID, the server SHOULD return with (EXCHGID4_FLAG_USE_PNFS_MDS 14041 | EXCHGID4_FLAG_USE_PNFS_DS) set. 14043 In the case of a server that has multiple concurrent PNFS roles, each 14044 role served by a unique client ID, if the client specifies zero or a 14045 combination of roles in the request, the server results SHOULD return 14046 only one of the roles from the combination specified by the client 14047 request. If the role specified by the server result does not match 14048 the intended use by the client, the client should send the 14049 EXCHANGE_ID specifying just the interested PNFS role. 14051 If a pNFS metadata client gets a layout that refers it to an NFSv4.1 14052 data server, it needs a client ID on that data server. If it does 14053 not yet have a client ID from the server that had the 14054 EXCHGID4_FLAG_USE_PNFS_DS flag set in the EXCHANGE_ID results, then 14055 the client must send an EXCHANGE_ID to the data server, using the 14056 same co_ownerid as it sent to the metadata server, with the 14057 EXCHGID4_FLAG_USE_PNFS_DS flag set in the arguments. If the server's 14058 EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_DS set, then the 14059 client may use the client ID to create sessions that will exchange 14060 pNFS data operations. The client ID returned by the data server has 14061 no relationship with the client ID returned by a metadata server 14062 unless the client IDs are equal and the server owners and server 14063 scopes of the data server and metadata server are equal. 
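The role bits and the acceptable combinations in the table above can be checked mechanically. The sketch below (Python, illustrative only) uses the pNFS role flag values defined in the NFSv4.1 XDR description; the helper name and return value are inventions of the sketch.

   # pNFS role flags from the NFSv4.1 XDR description.
   EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
   EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
   EXCHGID4_FLAG_USE_PNFS_DS  = 0x00040000

   # The acceptable role combinations from the table above.
   ACCEPTABLE_ROLE_COMBINATIONS = {
       EXCHGID4_FLAG_USE_PNFS_MDS,
       EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_PNFS_DS,
       EXCHGID4_FLAG_USE_PNFS_DS,
       EXCHGID4_FLAG_USE_NON_PNFS,
       EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_NON_PNFS,
   }

   def server_roles(eir_flags):
       """Classify the pNFS role(s) advertised in an EXCHANGE_ID result.

       Illustrative only; a real client also examines the other eir_flags
       bits (confirmation, state protection, and so on).
       """
       roles = eir_flags & (EXCHGID4_FLAG_USE_NON_PNFS |
                            EXCHGID4_FLAG_USE_PNFS_MDS |
                            EXCHGID4_FLAG_USE_PNFS_DS)
       if roles not in ACCEPTABLE_ROLE_COMBINATIONS:
           raise ValueError("server returned a disallowed role combination")
       return {
           "metadata_server": bool(roles & EXCHGID4_FLAG_USE_PNFS_MDS),
           "data_server":     bool(roles & EXCHGID4_FLAG_USE_PNFS_DS),
           "non_pnfs":        bool(roles & EXCHGID4_FLAG_USE_NON_PNFS),
       }

   # Example: a server acting as both metadata server and data server.
   r = server_roles(EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_PNFS_DS)
   assert r["metadata_server"] and r["data_server"] and not r["non_pnfs"]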
14065 In NFSv4.1, the session ID in the SEQUENCE operation implies the 14066 client ID, which in turn might be used by the server to map the 14067 stateid to the right client/server pair. However, when a data server 14068 is presented with a READ or WRITE operation with a stateid, because 14069 the stateid is associated with client ID on a metadata server, and 14070 because the session ID in the preceding SEQUENCE operation is tied to 14071 the client ID of the data server, the data server has no obvious way 14072 to determine the metadata server from the COMPOUND procedure, and 14073 thus has no way to validate the stateid. One RECOMMENDED approach is 14074 for pNFS servers to encode metadata server routing and/or identity 14075 information in the data server filehandles as returned in the layout. 14077 If metadata server routing and/or identity information is encoded in 14078 data server filehandles, when the metadata server identity or 14079 location changes, the data server filehandles it gave out must become 14080 invalid (stale), and so the metadata server must first recall the 14081 layouts. Invalidating a data server filehandle does not render the 14082 NFS client's data cache invalid. The client's cache should map a 14083 data server filehandle to a metadata server filehandle, and a 14084 metadata server filehandle to cached data. 14086 If a server is both a metadata server and a data server, the server 14087 might need to distinguish operations on files that are directed to 14088 the metadata server from those that are directed to the data server. 14089 It is RECOMMENDED that the values of the filehandles returned by the 14090 LAYOUTGET operation to be different than the value of the filehandle 14091 returned by the OPEN of the same file. 14093 Another scenario is for the metadata server and the storage device to 14094 be distinct from one client's point of view, and the roles reversed 14095 from another client's point of view. For example, in the cluster 14096 file system model, a metadata server to one client may be a data 14097 server to another client. If NFSv4.1 is being used as the storage 14098 protocol, then pNFS servers need to encode the values of filehandles 14099 according to their specific roles. 14101 13.1.1. Sessions Considerations for Data Servers 14103 Section 2.10.9.2 states that a client has to keep its lease renewed 14104 in order to prevent a session from being deleted by the server. If 14105 the reply to EXCHANGE_ID has just the EXCHGID4_FLAG_USE_PNFS_DS role 14106 set, then as noted in Section 13.6 the client will not be able to 14107 determine the data server's lease_time attribute, because GETATTR 14108 will not be permitted. Instead, the rule is that any time a client 14109 receives a layout referring it to a data server that returns just the 14110 EXCHGID4_FLAG_USE_PNFS_DS role, the client MAY assume that the 14111 lease_time attribute from the metadata server that returned the 14112 layout applies to the data server. Thus the data server MUST be 14113 aware of the values of all lease_time attributes of all metadata 14114 servers it is providing I/O for, and MUST use the maximum of all such 14115 lease_time values as the lease interval for all client IDs and 14116 sessions established on it. 
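A minimal sketch of this rule (Python, illustrative only; the helper name is an invention of the sketch) follows, using the same numbers as the example in the next paragraph.

   def data_server_lease_interval(mds_lease_times):
       """Lease interval a data server must honor: the maximum lease_time of
       all metadata servers it is providing I/O for (illustrative only)."""
       if not mds_lease_times:
           raise ValueError("no metadata server is currently using this data server")
       return max(mds_lease_times)

   # Two metadata servers with lease_time 20 and 10 seconds: the data server
   # must not expire a client's lease before 20 seconds have elapsed.
   assert data_server_lease_interval([20, 10]) == 20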
14118 For example, if one metadata server has a lease_time attribute of 20 14119 seconds, and a second metadata server has a lease_time attribute of 14120 10 seconds, then if both servers return layouts that refer to an 14121 EXCHGID4_FLAG_USE_PNFS_DS-only data server, the data server MUST 14122 renew a client's lease if the interval between two SEQUENCE 14123 operations on different COMPOUND requests is less than 20 seconds. 14125 13.2. File Layout Definitions 14127 The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout 14128 type, and may be applicable to other layout types. 14130 Unit. A unit is a fixed size quantity of data written to a data 14131 server. 14133 Pattern. A pattern is a method of distributing one or more equal 14134 sized units across a set of data servers. A pattern is iterated 14135 one or more times. 14137 Stripe. An stripe is a set of data distributed across a set of data 14138 servers in a pattern before that pattern repeats. 14140 Stripe Count. A stripe count is the number of units in a pattern. 14142 Stripe Width. A stripe width is the size of stripe in bytes. The 14143 stripe width = the stripe count * the size of the stripe unit. 14145 Hereafter, this document will refer to a unit that is a written in a 14146 pattern as a "stripe unit". 14148 A pattern may have more stripe units than data servers. If so, some 14149 data servers will have more than one stripe unit per stripe. A data 14150 server that has multiple stripe units per stripe MAY store each unit 14151 in a different data file (and depending on the implementation, will 14152 possibly assign a unique data filehandle to each data file). 14154 13.3. File Layout Data Types 14156 The high level NFSv4.1 layout types are nfsv4_1_file_layouthint4, 14157 nfsv4_1_file_layout_ds_addr4, and nfsv4_1_file_layout4. 14159 The SETATTR operation supports a layout hint attribute 14160 (Section 5.12.4). When the client sets a layout hint (data type 14161 layouthint4) with a layout type of LAYOUT4_NFSV4_1_FILES (the 14162 loh_type field), the loh_body field contains a value of data type 14163 nfsv4_1_file_layouthint4. 14165 const NFL4_UFLG_MASK = 0x0000003F; 14166 const NFL4_UFLG_DENSE = 0x00000001; 14167 const NFL4_UFLG_COMMIT_THRU_MDS = 0x00000002; 14168 const NFL4_UFLG_STRIPE_UNIT_SIZE_MASK 14169 = 0xFFFFFFC0; 14171 typedef uint32_t nfl_util4; 14172 enum filelayout_hint_care4 { 14173 NFLH4_CARE_DENSE = NFL4_UFLG_DENSE, 14175 NFLH4_CARE_COMMIT_THRU_MDS 14176 = NFL4_UFLG_COMMIT_THRU_MDS, 14178 NFLH4_CARE_STRIPE_UNIT_SIZE 14179 = 0x00000040, 14181 NFLH4_CARE_STRIPE_COUNT = 0x00000080 14182 }; 14184 /* Encoded in the loh_body field of type layouthint4: */ 14186 struct nfsv4_1_file_layouthint4 { 14187 uint32_t nflh_care; 14188 nfl_util4 nflh_util; 14189 count4 nflh_stripe_count; 14190 }; 14192 The generic layout hint structure is described in Section 3.3.19. 14193 The client uses the layout hint in the layout_hint (Section 5.12.4) 14194 attribute to indicate the preferred type of layout to be used for a 14195 newly created file. The LAYOUT4_NFSV4_1_FILES layout type-specific 14196 content for the layout hint is composed of three fields. The first 14197 field, nflh_care, is a set of flags indicating which values of the 14198 hint the client cares about. If the NFLH4_CARE_DENSE flag is set, 14199 then the client indicates in the second field, nflh_util, a 14200 preference for how the data file is packed (Section 13.4.4), which is 14201 controlled by the value of nflh_util & NFL4_UFLG_DENSE. 
If the 14202 NFLH4_CARE_COMMIT_THRU_MDS flag is set, then the client indicates a 14203 preference for whether the client should send COMMIT operations to 14204 the metadata server or data server (Section 13.7), which is 14205 controlled by the value of nflh_util & NFL4_UFLG_COMMIT_THRU_MDS. If 14206 the NFLH4_CARE_STRIPE_UNIT_SIZE flag is set, the client indicates its 14207 preferred stripe unit size, which is indicated in nflh_util & 14208 NFL4_UFLG_STRIPE_UNIT_SIZE_MASK (thus the stripe unit size MUST be a 14209 multiple of 64 bytes). The minimum stripe unit size is 64 bytes. If 14210 the NFLH4_CARE_STRIPE_COUNT flag is set, the client indicates in the 14211 third field, nflh_stripe_count, the stripe count. The stripe count 14212 multiplied by the stripe unit size is the stripe width. 14214 When LAYOUTGET returns a LAYOUT4_NFSV4_1_FILES layout (indicated in 14215 the loc_type field of the lo_content field), the loc_body field of 14216 the lo_content field contains a value of data type 14217 nfsv4_1_file_layout4. Among other content, nfsv4_1_file_layout4 has 14218 a storage device ID (field nfl_deviceid) of data type deviceid4. The 14219 GETDEVICEINFO operation maps a device ID to a storage device address 14220 (type device_addr4). When GETDEVICEINFO returns a device address 14221 with a layout type of LAYOUT4_NFSV4_1_FILES (the da_layout_type 14222 field), the da_addr_body field contains a value of data type 14223 nfsv4_1_file_layout_ds_addr4. 14225 typedef netaddr4 multipath_list4<>; 14227 /* Encoded in the da_addr_body field of type device_addr4: */ 14228 struct nfsv4_1_file_layout_ds_addr4 { 14229 uint32_t nflda_stripe_indices<>; 14230 multipath_list4 nflda_multipath_ds_list<>; 14231 }; 14233 The nfsv4_1_file_layout_ds_addr4 data type represents the device 14234 address. It is composed of two fields: 14236 1. nflda_multipath_ds_list: An array of lists of data servers, where 14237 each list can be one or more elements, and each element 14238 represents a (see Section 13.5) data server address which may 14239 serve equally as the target of IO operations. The length of this 14240 array might be different than the stripe count. 14242 2. nflda_stripe_indices: An array of indexes used to index into 14243 nflda_multipath_ds_list. Each element of nflda_stripe_indices 14244 MUST be less than the number of elements in 14245 nflda_multipath_ds_list. Each element of nflda_multipath_ds_list 14246 SHOULD be referred to by one or more elements of 14247 nflda_stripe_indices. The number of elements in 14248 nflda_stripe_indices is always equal to the stripe count. 14250 /* Encoded in the loc_body field of type layout_content4: */ 14251 struct nfsv4_1_file_layout4 { 14252 deviceid4 nfl_deviceid; 14253 nfl_util4 nfl_util; 14254 uint32_t nfl_first_stripe_index; 14255 offset4 nfl_pattern_offset; 14256 nfs_fh4 nfl_fh_list<>; 14257 }; 14259 The nfsv4_1_file_layout4 data type represents the layout. It is 14260 composed of the following fields: 14262 1. nfl_deviceid: The device ID which maps to a value of type 14263 nfsv4_1_file_layout_ds_addr4. 14265 2. nfl_util: Like the nflh_util field of data type 14266 nfsv4_1_file_layouthint4, a compact representation of how the 14267 data on a file on each data server is packed, whether the client 14268 should send COMMIT operations to the metadata server or data 14269 server, and the stripe unit size. If a server returns two or 14270 more overlapping layouts, each stripe unit size in each 14271 overlapping layout MUST be the same. 14273 3. 
nfl_first_stripe_index: The index into the first element of the 14274 nflda_stripe_indices array to use. 14276 4. nfl_pattern_offset: This field is the logical offset into the 14277 file where the striping pattern starts. It is required for 14278 converting the client's logical I/O offset (e.g. the current 14279 offset in a POSIX file descriptor before the read() or write() 14280 system call is sent) into the stripe unit number (see 14281 Section 13.4.1). 14283 If dense packing is used, then nfl_pattern_offset is also needed 14284 to convert the client's logical I/O offset to an offset on the 14285 file on the data server corresponding to the stripe unit number 14286 (see Section 13.4.4). 14288 Note that nfl_pattern_offset is not always the same as lo_offset. 14289 For example, via the LAYOUTGET operation, a client might request 14290 a layout starting at offset 1000 of a file that has its striping 14291 pattern start at offset 0. 14293 5. nfl_fh_list: An array of data server filehandles for each list of 14294 data servers in each element of the nflda_multipath_ds_list 14295 array. The number of elements in nfl_fh_list depends on whether 14296 sparse or dense packing is being used. 14298 * If sparse packing is being used, the number of elements in 14299 nfl_fh_list MUST be one of three values: 14301 + Zero. This means that filehandles used for each data 14302 server are the same as the filehandle returned by the OPEN 14303 operation from the metadata server. 14305 + One. This means that every data server uses the same 14306 filehandle: what is specified in nfl_fh_list[0]. 14308 + The same number of elements in nflda_multipath_ds_list. 14309 Thus, in this case, when issuing an I/O to any data server 14310 in nflda_multipath_ds_list[X], the filehandle in 14311 nfl_fh_list[X] MUST be used. 14313 See the discussion on sparse packing in Section 13.4.4. 14315 * If dense packing is being used, number of elements in 14316 nfl_fh_list MUST be the same as the number of elements in 14317 nflda_stripe_indices. Thus when issuing I/O to any data 14318 server in nflda_multipath_ds_list[nflda_stripe_indices[Y]], 14319 the filehandle in nfl_fh_list[Y] MUST be used. In addition, 14320 any time there exists i, and j, (i != j) such that the 14321 intersection of 14322 nflda_multipath_ds_list[nflda_stripe_indices[i]] and 14323 nflda_multipath_ds_list[nflda_stripe_indices[j]] is not empty, 14324 then nfl_fh_list[i] MUST NOT equal nfl_fh_list[j]. In other 14325 words, when dense packing is being used, if a data server 14326 appears in two or more units of a striping pattern, each 14327 reference to the data server MUST use a different filehandle. 14329 Indeed, if there are multiple striping patterns, as indicated 14330 by the presence of multiple objects of data type layout4 14331 (either returned in one or multiple LAYOUTGET operations), and 14332 a data server is the target of a unit of one pattern and 14333 another unit of another pattern, then each reference to each 14334 data server MUST use a different filehandle. 14336 See the discussion on dense packing in Section 13.4.4. 14338 The details on the interpretation of the layout are in Section 13.4. 14340 13.4. Interpreting the File Layout 14342 13.4.1. Determining the Stripe Unit Number 14344 To find the stripe unit number that corresponds to the client's 14345 logical file offset, the pattern offset must also be used. 
The i'th 14346 stripe unit (SUi) is: 14348 relative_offset = file_offset - nfl_pattern_offset; 14349 SUi = floor(relative_offset / stripe_unit_size); 14351 13.4.2. Interpreting the File Layout Using Sparse Packing 14353 When sparse packing is used, the algorithm for determining the 14354 filehandle and set of data server network addresses to write stripe 14355 unit i (SUi) to is: 14357 stripe_count = number of elements in nflda_stripe_indices; 14359 j = (SUi + nfl_first_stripe_index) % stripe_count; 14361 idx = nflda_stripe_indices[j]; 14363 fh_count = number of elements in nfl_fh_list; 14364 ds_count = number of elements in nflda_multipath_ds_list; 14366 switch (fh_count) { 14367 case ds_count: 14368 fh = nfl_fh_list[idx]; 14369 break; 14371 case 1: 14372 fh = nfl_fh_list[0]; 14373 break; 14375 case 0: 14376 fh = filehandle returned by OPEN; 14377 break; 14379 default: 14380 throw a fatal exception; 14381 break; 14382 } 14384 address_list = nflda_multipath_ds_list[idx]; 14386 The client would then select a data server from address_list, and 14387 send a READ or WRITE operation using the filehandle specified in fh. 14389 Consider the following example: 14391 Suppose we have a device address consisting of seven data servers, 14392 arranged in three equivalence (Section 13.5) classes: 14394 { A, B, C, D }, { E }, { F, G } 14396 Where A through G are network addresses. 14398 Then 14400 nflda_multipath_ds_list<> = { A, B, C, D }, { E }, { F, G } 14402 i.e. 14404 nflda_multipath_ds_list[0] = { A, B, C, D } 14406 nflda_multipath_ds_list[1] = { E } 14408 nflda_multipath_ds_list[2] = { F, G } 14410 Suppose the striping index array is: 14412 nflda_stripe_indices<> = { 2, 0, 1, 0 } 14414 Now suppose the client gets a layout which has a device ID that maps 14415 to the above device address. The initial index, 14417 nfl_first_stripe_index = 2, 14419 and 14421 nfl_fh_list = { 0x36, 0x87, 0x67 }. 14423 If the client wants to write to SU0, the set of valid { network 14424 address, filehandle } combinations for SUi are determined by: 14426 nfl_first_stripe_index = 2 14428 So 14430 idx = nflda_stripe_indices[(0 + 2) % 4] 14432 = nflda_stripe_indices[2] 14434 = 1 14436 So 14438 nflda_multipath_ds_list[1] = { E } 14440 and 14442 nfl_fh_list[1] = { 0x87 } 14444 The client can thus write SU0 to { 0x87, { E }, }. 14446 The destinations of the first thirteen storage units are: 14448 +-----+------------+--------------+ 14449 | SUi | filehandle | data servers | 14450 +-----+------------+--------------+ 14451 | 0 | 87 | E | 14452 | 1 | 36 | A,B,C,D | 14453 | 2 | 67 | F,G | 14454 | 3 | 36 | A,B,C,D | 14455 | 4 | 87 | E | 14456 | 5 | 36 | A,B,C,D | 14457 | 6 | 67 | F,G | 14458 | 7 | 36 | A,B,C,D | 14459 | 8 | 87 | E | 14460 | 9 | 36 | A,B,C,D | 14461 | 10 | 67 | F,G | 14462 | 11 | 36 | A,B,C,D | 14463 | 12 | 87 | E | 14464 +-----+------------+--------------+ 14466 13.4.3. 
Interpreting the File Layout Using Dense Packing 14468 When dense packing is used, the algorithm for determining the 14469 filehandle and set of data server network addresses to write stripe 14470 unit i (SUi) to is: 14472 stripe_count = number of elements in nflda_stripe_indices; 14474 j = (SUi + nfl_first_stripe_index) % stripe_count; 14476 idx = nflda_stripe_indices[j]; 14478 fh_count = number of elements in nfl_fh_list; 14479 ds_count = number of elements in nflda_multipath_ds_list; 14481 switch (fh_count) { 14482 case stripe_count: 14483 fh = nfl_fh_list[j]; 14484 break; 14486 default: 14487 throw a fatal exception; 14488 break; 14489 } 14491 address_list = nflda_multipath_ds_list[idx]; 14493 The client would then select a data server from address_list, and 14494 send a READ or WRITE operation using the filehandle specified in fh. 14496 Consider the following example (which is the same as the sparse 14497 packing example, except for the filehandle list): 14499 Suppose we have a device address consisting of seven data servers, 14500 arranged in three equivalence (Section 13.5) classes: 14502 { A, B, C, D }, { E }, { F, G } 14504 Where A through G are network addresses. 14506 Then 14508 nflda_multipath_ds_list<> = { A, B, C, D }, { E }, { F, G } 14510 i.e. 14512 nflda_multipath_ds_list[0] = { A, B, C, D } 14514 nflda_multipath_ds_list[1] = { E } 14516 nflda_multipath_ds_list[2] = { F, G } 14518 Suppose the striping index array is: 14520 nflda_stripe_indices<> = { 2, 0, 1, 0 } 14522 Now suppose the client gets a layout which has a device ID that maps 14523 to the above device address. The initial index, 14525 nfl_first_stripe_index = 2, 14527 and 14529 nfl_fh_list = { 0x67, 0x37, 0x87, 0x36 }. 14531 The interesting examples for dense packing are SU1 and SU3, because 14532 each stripe unit refers to the same data server list, yet MUST use a 14533 different filehandle. If the client wants to write to SU1, the set 14534 of valid { network address, filehandle } combinations for SUi are 14535 determined by: 14537 nfl_first_stripe_index = 2 14539 So 14540 j = (1 + 2) % 4 = 3 14542 idx = nflda_stripe_indices[j] 14544 = nflda_stripe_indices[3] 14546 = 0 14548 So 14550 nflda_multipath_ds_list[0] = { A, B, C, D } 14552 and 14554 nfl_fh_list[3] = { 0x36 } 14556 The client can thus write SU1 to { 0x36, { A, B, C, D }, }. 14558 For SU3, j = (3 + 2) % 4 = 1, and nflda_stripe_indices[1] = 0. Then 14559 nflda_multipath_ds_list[0] = { A, B, C, D }, and nfl_fh_list[1] = 14560 0x37. The client can thus write SU3 to { 0x37, { A, B, C, D } }. 14562 The destinations of the first thirteen storage units are: 14564 +-----+------------+--------------+ 14565 | SUi | filehandle | data servers | 14566 +-----+------------+--------------+ 14567 | 0 | 87 | E | 14568 | 1 | 36 | A,B,C,D | 14569 | 2 | 67 | F,G | 14570 | 3 | 37 | A,B,C,D | 14571 | 4 | 87 | E | 14572 | 5 | 36 | A,B,C,D | 14573 | 6 | 67 | F,G | 14574 | 7 | 37 | A,B,C,D | 14575 | 8 | 87 | E | 14576 | 9 | 36 | A,B,C,D | 14577 | 10 | 67 | F,G | 14578 | 11 | 37 | A,B,C,D | 14579 | 12 | 87 | E | 14580 +-----+------------+--------------+ 14582 13.4.4. Sparse and Dense Stripe Unit Packing 14584 The flag NFL4_UFLG_DENSE of the nfl_util4 data type (field nflh_util 14585 of the data type nfsv4_1_file_layouthint4 and field nfl_util of data 14586 type nfsv4_1_file_layout_ds_addr4) specifies how the data is packed 14587 within the data file on a data server. It allows for two different 14588 data packings: sparse and dense. 
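Both selection algorithms can be transcribed directly. The sketch below (Python, illustrative only; the function and dictionary field names are inventions of the sketch) maps a stripe unit number to its filehandle and data server list for either packing, and checks itself against the sparse and dense examples above.

   def locate_stripe_unit(sui, layout, ds_addr, open_fh=None,
                          dense_packing=False):
       """Return (filehandle, data server list) for stripe unit 'sui'."""
       indices = ds_addr["nflda_stripe_indices"]
       ds_list = ds_addr["nflda_multipath_ds_list"]
       fh_list = layout["nfl_fh_list"]
       stripe_count = len(indices)

       j = (sui + layout["nfl_first_stripe_index"]) % stripe_count
       idx = indices[j]

       if dense_packing:
           if len(fh_list) != stripe_count:
               raise ValueError("dense packing needs one filehandle per stripe unit")
           fh = fh_list[j]
       else:                               # sparse packing
           if len(fh_list) == len(ds_list):
               fh = fh_list[idx]
           elif len(fh_list) == 1:
               fh = fh_list[0]
           elif len(fh_list) == 0:
               fh = open_fh                # filehandle returned by OPEN
           else:
               raise ValueError("invalid filehandle list length for sparse packing")
       return fh, ds_list[idx]

   ds_addr = {"nflda_stripe_indices": [2, 0, 1, 0],
              "nflda_multipath_ds_list": [{"A", "B", "C", "D"}, {"E"}, {"F", "G"}]}
   sparse_layout = {"nfl_first_stripe_index": 2,
                    "nfl_fh_list": [0x36, 0x87, 0x67]}
   dense_layout = {"nfl_first_stripe_index": 2,
                   "nfl_fh_list": [0x67, 0x37, 0x87, 0x36]}

   # Sparse example: SU0 goes to filehandle 0x87 on data server E.
   assert locate_stripe_unit(0, sparse_layout, ds_addr) == (0x87, {"E"})
   # Dense example: SU1 and SU3 use the same data servers but different
   # filehandles (0x36 and 0x37).
   assert locate_stripe_unit(1, dense_layout, ds_addr,
                             dense_packing=True) == (0x36, {"A", "B", "C", "D"})
   assert locate_stripe_unit(3, dense_layout, ds_addr,
                             dense_packing=True) == (0x37, {"A", "B", "C", "D"})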
The packing type determines the 14589 calculation that must be made to map the client visible file offset 14590 to the offset within the data file located on the data server. 14592 If nfl_util & NFL4_UFLG_DENSE is zero, this means that sparse packing 14593 is being used. Hence the logical offsets of the file as viewed by a 14594 client issuing READs and WRITEs directly to the metadata server are 14595 the same offsets each data server uses when storing a stripe unit. 14596 The effect then, for striping patterns consisting of at least two 14597 stripe units, is for each data server file to be sparse or holey. So 14598 for example, suppose there is a pattern with three stripe units, the 14599 stripe unit size is a 4096 bytes, and there are three data servers in 14600 the pattern, then the file in data server 1 will have stripe units 0, 14601 3, 6, 9, ... filled, data server 2's file will have stripe units 1, 14602 4, 7, 10, ... filled, and data server 3's file will have stripe units 14603 2, 5, 8, 11, ... filled. The unfilled stripe units of each file will 14604 be holes, hence the files in each data server are sparse. 14606 If sparse packing is being used and a client attempts I/O to one of 14607 the holes, then an error MUST be returned by the data server. Using 14608 the above example, if data server 3 received a READ or WRITE request 14609 for block 4, the data server would return NFS4ERR_PNFS_IO_HOLE. Thus 14610 data servers need to understand the striping pattern in order to 14611 support sparse packing. 14613 If nfl_util & NFL4_UFLG_DENSE is one, this means that dense packing 14614 is being used and the data server files have no holes. Dense packing 14615 might be selected because the data server does not (efficiently) 14616 support holey files, or because the data server cannot recognize 14617 read-ahead unless there are no holes. If dense packing is indicated 14618 in the layout, the data files must be packed. Using the example 14619 striping pattern and stripe unit size that was used for the sparse 14620 packing example, the corresponding dense packing would have all 14621 stripe units of all data files filled. Logical stripe units 0, 3, 6, 14622 ... of the file would live on stripe units 0, 1, 2, ... of the file 14623 of data server 1, logical stripe units 1, 4, 7, ... of the file would 14624 live on stripe units 0, 1, 2, ... of the file of data server 2, and 14625 logical stripe units 2, 5, 8, ... of the file would live on stripe 14626 units 0, 1, 2, ... of the file of data server 3. 14628 Because dense packing does not leave holes on the data servers, the 14629 pNFS client is allowed to write to any offset of any data file of any 14630 data server in the stripe. Thus the data servers need not know the 14631 file's striping pattern. 14633 The calculation to determine the byte offset within the data file for 14634 dense data server layouts is: 14636 stripe_width = stripe_unit_size * N; 14637 where N = number of elements in nflda_stripe_indices. 14639 relative_offset = file_offset - nfl_pattern_offset; 14641 data_file_offset = floor(relative_offset / stripe_width) 14642 * stripe_unit_size 14643 + relative_offset % stripe_unit_size 14645 If dense packing is being used, and a data server appears more than 14646 once in a striping pattern, then to distinguish one stripe unit from 14647 another, the data server MUST use a different filehandle. Let's 14648 suppose there are two data servers. 
Logical stripe units 0, 3, 6 are 14649 served by data server 1, logical stripe units 1, 4, 7 are served by 14650 data server 2, and logical stripe units 2, 5, 8 are also served by 14651 data server 2. Unless data server 2 has two filehandles (each 14652 referring to a different data file), a write to 14653 logical stripe unit 1, for example, overwrites the write to logical stripe unit 2, 14654 because both logical stripe units are located in the same stripe unit 14655 (0) of data server 2. 14657 13.5. Data Server Multipathing 14659 The NFSv4.1 file layout supports multipathing to multiple data server 14660 addresses. Data server-level multipathing is used for bandwidth 14661 scaling via trunking (Section 2.10.4) and for higher availability of 14662 use in the case of a data server failure. Multipathing allows the 14663 client to switch to another data server address, which may be that of 14664 another data server that is exporting the same data stripe unit, 14665 without having to contact the metadata server for a new layout. 14667 To support data server multipathing, each element of the 14668 nflda_multipath_ds_list contains an array of one or more data server 14669 network addresses. This array (data type multipath_list4) represents 14670 a list of data servers (each identified by a network address), with 14671 it being possible that some data servers will appear in the list 14672 multiple times. 14674 The client is free to use any of the network addresses as a 14675 destination to send data server requests. If some network addresses 14676 are less optimal paths to the data than others, then the MDS SHOULD 14677 NOT include those network addresses in an element of 14678 nflda_multipath_ds_list. If less optimal network addresses exist to 14679 provide failover, the RECOMMENDED method to offer the addresses is 14680 to provide them in a replacement device ID to device address mapping, 14681 or a replacement device ID. When a client finds that no data server 14682 in an element of nflda_multipath_ds_list responds, it SHOULD send a 14683 GETDEVICEINFO to attempt to replace the existing device ID to device 14684 address mappings. If the MDS detects that all data servers 14685 represented by an element of nflda_multipath_ds_list are unavailable, 14686 the MDS SHOULD send a CB_NOTIFY_DEVICEID (if the client has indicated 14687 it wants device ID notifications for changed device IDs) to change 14688 the device ID to device address mappings to the available data 14689 servers. If the device ID itself must be replaced, the MDS SHOULD 14690 recall all layouts with the device ID, and thus force the client to 14691 get new layouts and device ID mappings via LAYOUTGET and 14692 GETDEVICEINFO. 14694 Generally, if two network addresses appear in an element of 14695 nflda_multipath_ds_list, they will designate the same data server, and 14696 the two data server addresses will support the implementation of client 14697 ID or session trunking (the latter is RECOMMENDED) as defined in 14698 Section 2.10.4, and the two data server addresses will share the same 14699 server owner, or major ID of the server owner. It is not always 14700 necessary for the two data server addresses to designate the same 14701 server with trunking being used. For example, the data could be read- 14702 only, and the data could consist of exact replicas. 14704 13.6.
Operations Sent to NFSv4.1 Data Servers 14706 Clients accessing data on an NFSv4.1 data server MUST send only the 14707 NULL procedure and COMPOUND procedures whose operations are taken 14708 only from two restricted subsets of the operations defined as valid 14709 NFSv4.1 operations. Clients MUST use the filehandle specified by the 14710 layout when accessing data on NFSv4.1 data servers. 14712 The first of these operation subsets consist of management operations 14713 where the current filehandle is not relevant. This subset consists 14714 of the BACKCHANNEL_CTL, BIND_CONN_TO_SESSION, CREATE_SESSION, 14715 DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID, SECINFO_NO_NAME, 14716 SET_SSV, and SEQUENCE operations. The client may use these 14717 operations in order to set up and maintain the appropriate client 14718 IDs, sessions, and security contexts involved in communication with 14719 the data server. Henceforth these will be referred to as data-server 14720 housekeeping operations. 14722 The second subset consists of COMMIT, READ, WRITE, and PUTFH, These 14723 operations must be used with a current filehandle specified by the 14724 layout. In the case of PUTFH, the new current filehandle must be one 14725 taken from the layout. Henceforth, these will be referred to as 14726 data-server I/O operations. As described in Section 12.5.1, a client 14727 MUST NOT send an I/O to a data server for which it does not hold a 14728 valid layout; the data server MUST reject such an I/O. 14730 Unless the server has a concurrent non-data-server personality, i.e. 14731 EXCHANGE_ID results returned (EXCHGID4_FLAG_USE_PNFS_DS | 14732 EXCHGID4_FLAG_USE_PNFS_MDS) or (EXCHGID4_FLAG_USE_PNFS_DS | 14733 EXCHGID4_FLAG_USE_NON_PNFS), see Section 13.1, any use of operations 14734 other than those specified in the two subsets above MUST return 14735 NFS4ERR_NOTSUPP to the client. 14737 When the server has concurrent data server and non-data-server 14738 personalities, each COMPOUND sent by the client MUST be constructed 14739 so that it is appropriate to one of the two personalities, and must 14740 not contain operations directed to a mix of those personalities. The 14741 server MUST enforce this. To understand the constraints, operations 14742 within a COMPOUND are divided into the following three classes: 14744 1. An operation which is ambiguous regarding its personality 14745 assignment. These include all of the data-server housekeeping 14746 operations. Additionally, if the server has assigned filehandles 14747 so that the ones defined by the layout are the same as those used 14748 by the metadata server, all operations in the second class are 14749 within this group unless a stateid used is incompatible with a 14750 data-server personality in that it is a special stateid or has a 14751 non-zero seqid field. 14753 2. An operation which is referable to the data server personality. 14754 These are data-server I/O operations where the filehandle is one 14755 that can only be validly directed to the data-server personality. 14757 3. An operation which is referable to the non-data-server 14758 personality. These include all COMPOUND operations that are 14759 neither data-server housekeeping nor data-server I/O operations 14760 plus data-server I/O operations where the current fh (or the one 14761 to be made the current fh in the case of PUTFH) is one that is 14762 only valid on the metadata server or where a stateid is used that 14763 is incompatible with the data server, i.e. 
is a special stateid 14764 or has a non-zero seqid value. 14766 When a COMPOUND first executes an operation from class 3 above, it 14767 acts as a normal COMPOUND on any other server and the data server 14768 personality ceases to be relevant. There are no special restrictions 14769 on the operations in the COMPOUND to limit them to those for a data 14770 server. When a PUTFH is done, filehandles derived from the layout 14771 are not valid. If their format is not normally acceptable, then 14772 NFS4ERR_BADHANDLE MUST result. Similarly, current filehandles for 14773 other operations do not accept filehandles derived from layouts and 14774 are not normally usable on the metadata server. Using these will 14775 result in NFS4ERR_STALE. 14777 When a COMPOUND first executes an operation from class 2, which would 14778 be PUTFH where the filehandle is one from a layout, the COMPOUND 14779 henceforth is interpreted with respect to the data server 14780 personality. Operations outside the two classes discussed above MUST 14781 result in NFS4ERR_NOTSUPP. Filehandles are validated using the rules 14782 of the data server, resulting in NFS4ERR_BADHANDLE and/or 14783 NFS4ERR_STALE even when they would not normally do so when addressed 14784 to the non-data-server personality. Stateids must obey the rules of 14785 the data server in that any use of special stateids or stateids with 14786 non-zero seqid values must result in NFS4ERR_BAD_STATEID. 14788 Until the server first executes an operation from class 2 or class 3, 14789 the client MUST NOT depend on the operation being executed by either 14790 the data-server or the non-data-server personality. The server MUST 14791 pick one personality consistently for a given COMPOUND, with the only 14792 possible transition being a single one when the first operation from 14793 class 2 or class 3 is executed. 14795 Because of the complexity induced by assigning filehandles so they 14796 can be used on both a data server and a metadata server, it is 14797 RECOMMENDED that where the same server can have both personalities, 14798 the server assign separate unique filehandles to both personalities. 14799 This makes it unambiguous for which server a given request is 14800 intended. 14802 GETATTR and SETATTR MUST be directed to the metadata server. In the 14803 case of a SETATTR of the size attribute, the control protocol is 14804 responsible for propagating size updates/truncations to the data 14805 servers. In the case of extending WRITEs to the data servers, the 14806 new size must be visible on the metadata server once a LAYOUTCOMMIT 14807 has completed (see Section 12.5.4.2). Section 13.10, describes the 14808 mechanism by which the client is to handle data server files that do 14809 not reflect the metadata server's size. 14811 13.7. COMMIT Through Metadata Server 14813 The file layout provides two alternate means of providing for the 14814 commit of data written through data servers. The flag 14815 NFL4_UFLG_COMMIT_THRU_MDS in the field nfl_util of the file layout 14816 (data type nfsv4_1_file_layout4) is an indication from the metadata 14817 server to the client of the REQUIRED way of performing COMMIT, either 14818 by sending the COMMIT to the data server or the metadata server. 14819 These two methods of dealing with the issue correspond to broad 14820 styles of implementation for a pNFS server supporting the files 14821 layout type. 
14823 o When the flag is FALSE, COMMIT operations MUST be sent to the 14824 data server to which the corresponding WRITE operations were sent. 14825 This approach is most useful when striping of files is implemented 14826 as part of the pNFS server, with the individual data servers each 14827 implementing their own file systems. 14829 o When the flag is TRUE, COMMIT operations MUST be sent to the 14830 metadata server, rather than to the individual data servers. This 14831 approach is most useful when the pNFS server is implemented on top 14832 of a clustered file system. In such an implementation, sending 14833 COMMITs to multiple data servers may result in repeated writes of 14834 metadata blocks as each individual COMMIT is executed, to the 14835 detriment of write performance. Sending a single COMMIT to the 14836 metadata server can provide more efficiency when there exists a 14837 clustered file system capable of implementing such a coordinated 14838 COMMIT. 14840 If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS is TRUE, then in order to 14841 maintain the current NFSv4.1 commit and recovery model, the data 14842 servers MUST return a common writeverf verifier in all WRITE 14843 responses for a given file layout, and the metadata server's 14844 COMMIT implementation must return the same writeverf. The value 14845 of the writeverf verifier MUST be changed at the metadata server 14846 or any data server that is referenced in the layout, whenever 14847 there is a server event that can possibly lead to loss of 14848 uncommitted data. The scope of the verifier can be for a file or 14849 for the entire pNFS server. It might be more difficult for the 14850 server to maintain the verifier at the file level, but the benefit 14851 is that only events that impact a given file will require recovery 14852 action. 14854 Note that if the layout specifies dense packing, then the offset used 14855 in a COMMIT to the MDS may differ from the offset used in a 14856 COMMIT to the data server. 14858 The single COMMIT to the metadata server will return a verifier, and 14859 the client should compare it to all the verifiers from the WRITEs and 14860 fail the COMMIT if there are any mismatched verifiers. If the COMMIT to 14861 the metadata server fails, the client should re-send WRITEs for all 14862 the modified data in the file. The client should treat modified data 14863 with a mismatched verifier as a WRITE failure and try to recover by 14864 reissuing the WRITEs to the original data server or using another 14865 path to that data if the layout has not been recalled. Another 14866 option the client has is to get a new layout or to rewrite the 14867 data through the metadata server. If nfl_util & 14868 NFL4_UFLG_COMMIT_THRU_MDS is FALSE, sending a COMMIT to the metadata 14869 server might have no effect. If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS 14870 is FALSE, a COMMIT sent to the metadata server should be used only to 14871 commit data that was written to the metadata server. See 14872 Section 12.7.6 for recovery options. 14874 13.8. The Layout Iomode 14876 The layout iomode need not be used by the metadata server when 14877 servicing NFSv4.1 file-based layouts, although in some circumstances 14878 it may be useful. For example, if the server implementation supports 14879 reading from read-only replicas or mirrors, it would be useful for 14880 the server to return a layout enabling the client to do so. As such, 14881 the client SHOULD set the iomode based on its intent to read or write 14882 the data.
The client may default to an iomode of LAYOUTIOMODE4_RW. 14883 The iomode need not be checked by the data servers when clients 14884 perform I/O. However, the data servers SHOULD still validate that the 14885 client holds a valid layout and return an error if the client does 14886 not. 14888 13.9. Metadata and Data Server State Coordination 14890 13.9.1. Global Stateid Requirements 14892 When the client sends I/O to a data server, the stateid used MUST NOT 14893 be a layout stateid as returned by LAYOUTGET or sent by 14894 CB_LAYOUTRECALL. Permitted stateids are based on one of the 14895 following: an open stateid (the stateid field of data type OPEN4resok 14896 as returned by OPEN), a delegation stateid (the stateid field of data 14897 types open_read_delegation4 and open_write_delegation4 as returned by 14898 OPEN or WANT_DELEGATION, or as sent by CB_PUSH_DELEG), or a stateid 14899 returned by the LOCK or LOCKU operations. The stateid sent to the 14900 data server MUST be sent with the seqid set to zero, indicating the 14901 most current version of that stateid, rather than indicating a 14902 specific non-zero seqid value. In no case is the use of special 14903 stateid values allowed. 14905 The stateid used for I/O MUST have the same effect and be subject to 14906 the same validation on a data server as it would if the I/O was being 14907 performed on the metadata server itself in the absence of pNFS. This 14908 has the implication that stateids are globally valid on both the 14909 metadata and data servers. This requires the metadata server to 14910 propagate changes in lock and open state to the data servers, so that 14911 the data servers can validate I/O accesses. This is discussed 14912 further in Section 13.9.2. Depending on when stateids are 14913 propagated, the existence of a valid stateid on the data server may 14914 act as proof of a valid layout. 14916 Clients performing I/O operations need to select an appropriate 14917 stateid based on the locks (including opens and delegations) held by 14918 the client and the various types of state-owners issuing the I/O 14919 requests. The rules for doing so when referencing data servers are 14920 somewhat different from those discussed in Section 8.2.5, which apply 14921 when accessing metadata servers. 14923 The following rules, applied in order of decreasing priority, govern 14924 the selection of the appropriate stateid: 14926 o If the client holds a delegation for the file in question, the 14927 delegation stateid should be used. 14929 o Otherwise, there must be an open stateid for the current open- 14930 owner, and that open stateid for the open file in question is 14931 used, unless mandatory locking prevents that. See below. 14933 o If the data server had previously responded with NFS4ERR_LOCKED to 14934 use of the open stateid, then the client should use the lock 14935 stateid whenever one exists for that open file with the current 14936 lock-owner. 14938 o Special stateids should never be used, and if used, the data server 14939 MUST reject the I/O with an NFS4ERR_BAD_STATEID error.
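The selection rules above can be summarized by the following non-normative C sketch. The types and per-file state fields are hypothetical stand-ins for a client's internal bookkeeping and are not protocol elements; note that the seqid is forced to zero before use, as required above, and that special stateids are never chosen.

   /* Non-normative sketch: stateid selection for data-server I/O,
    * applying the priority rules above.  All types and fields are
    * illustrative stand-ins for client-internal state tracking. */
   #include <stdbool.h>
   #include <stdint.h>

   typedef struct { uint32_t seqid; char other[12]; } stateid4;

   struct client_file_state {
       bool     has_delegation;
       stateid4 deleg_stateid;
       stateid4 open_stateid;      /* for the current open-owner */
       bool     ds_returned_locked; /* data server answered NFS4ERR_LOCKED */
       bool     has_lock_stateid;   /* lock stateid for the current lock-owner */
       stateid4 lock_stateid;
   };

   stateid4 select_ds_io_stateid(const struct client_file_state *fs)
   {
       stateid4 sid;

       if (fs->has_delegation)
           sid = fs->deleg_stateid;
       else if (fs->ds_returned_locked && fs->has_lock_stateid)
           sid = fs->lock_stateid;
       else
           sid = fs->open_stateid;  /* the usual case */

       sid.seqid = 0;  /* always send seqid zero to the data server */
       return sid;
   }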
14941 13.9.2. Data Server State Propagation 14943 Since the metadata server, which handles lock and open-mode state 14944 changes, as well as ACLs, may not be co-located with the data servers 14945 where I/O accesses are validated, the server implementation MUST take 14946 care of propagating changes of this state to the data servers. Once 14947 the propagation to the data servers is complete, the full effect of 14948 those changes MUST be in effect at the data servers. However, some 14949 state changes need not be propagated immediately, although all 14950 changes SHOULD be propagated promptly. These state propagations have 14951 an impact on the design of the control protocol, even though the 14952 control protocol is outside of the scope of this specification. 14953 Immediate propagation refers to the synchronous propagation of state 14954 from the metadata server to the data server(s); the propagation must 14955 be complete before returning to the client. 14957 13.9.2.1. Lock State Propagation 14959 If the pNFS server supports mandatory locking, any mandatory locks on 14960 a file MUST be made effective at the data servers before the request 14961 that establishes them returns to the caller. The effect MUST be the 14962 same as if the mandatory lock state were synchronously propagated to 14963 the data servers, even though the details of the control protocol may 14964 avoid actual transfer of the state under certain circumstances. 14966 On the other hand, since advisory lock state is not used for checking 14967 I/O accesses at the data servers, there is no semantic reason for 14968 propagating advisory lock state to the data servers. Since updates 14969 to advisory locks neither confer nor remove privileges, these changes 14970 need not be propagated immediately, and may not need to be propagated 14971 promptly. The updates to advisory locks need only be propagated when 14972 the data server needs to resolve a question about a stateid. In 14973 fact, if byte-range locking is not mandatory (i.e., is advisory), the 14974 clients are advised not to use the lock-based stateids for I/O at 14975 all. The stateids returned by OPEN are sufficient and eliminate 14976 overhead for this kind of state propagation. 14978 If a client gets back an NFS4ERR_LOCKED error from a data server, 14979 this is an indication that mandatory byte-range locking is in force. 14980 The client recovers from this by getting a byte-range lock that 14981 covers the affected range and re-sending the I/O with the stateid of 14982 the byte-range lock. 14984 13.9.2.2. Open and Deny Mode Validation 14986 Open and deny mode validation MUST be performed against the open and 14987 deny mode(s) held by the data servers. When access is reduced or a 14988 deny mode made more restrictive (because of CLOSE or OPEN_DOWNGRADE), the 14989 data server MUST prevent any I/Os that would be denied if performed 14990 on the metadata server. When access is expanded, the data server 14991 MUST make sure that no requests are subsequently rejected because of 14992 open or deny issues that no longer apply, given the previous 14993 relaxation. 14995 13.9.2.3. File Attributes 14997 Since the SETATTR operation has the ability to modify state that is 14998 visible on both the metadata and data servers (e.g., the size), care 14999 must be taken to ensure that the resultant state across the set of 15000 data servers is consistent, especially when truncating or growing the 15001 file. 15003 As described earlier, the LAYOUTCOMMIT operation is used to ensure 15004 that the metadata is synchronized with changes made to the data 15005 servers. For the NFSv4.1-based data storage protocol, it is 15006 necessary to re-synchronize state such as the size attribute and the 15007 setting of mtime/change/atime. See Section 12.5.4 for a full 15008 description of the semantics regarding LAYOUTCOMMIT and attribute 15009 synchronization. It should be noted that, by using an NFSv4.1-based 15010 layout type, it is possible to synchronize this state before 15011 LAYOUTCOMMIT occurs. For example, the control protocol can be used 15012 to query the attributes present on the data servers. 15014 Any changes to file attributes that control authorization or access, 15015 as reflected by ACCESS calls or READs and WRITEs on the metadata 15016 server, MUST be propagated to the data servers for enforcement on 15017 READ and WRITE I/O calls. If the changes made on the metadata server 15018 result in more restrictive access permissions for any user, those 15019 changes MUST be propagated to the data servers synchronously. 15021 The OPEN operation (Section 18.16.4) does not impose any requirement 15022 that I/O operations on an open file have the same credentials as the 15023 OPEN itself (unless EXCHGID4_FLAG_BIND_PRINC_STATEID is set when 15024 EXCHANGE_ID creates the client ID) and so requires the server's READ 15025 and WRITE operations to perform appropriate access checking. Changes 15026 to ACLs also require new access checking by READ and WRITE on the 15027 server. The propagation of access right changes due to changes in 15028 ACLs may be asynchronous only if the server implementation is able to 15029 determine that the updated ACL is not more restrictive for any user 15030 specified in the old ACL. Due to the relative infrequency of ACL 15031 updates, it is suggested that all changes be propagated 15032 synchronously.
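As a non-normative illustration of the attribute synchronization discussed above, a client that has extended a file through data-server WRITEs might make the new size visible at the metadata server roughly as follows; the helper functions are hypothetical stand-ins for the client's COMPOUND machinery.

   /* Non-normative sketch: after extending WRITEs to a data server,
    * make the new size visible at the metadata server via
    * LAYOUTCOMMIT (see Section 12.5.4).  Helpers are illustrative
    * stand-ins for the client's RPC layer. */
   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical helpers. */
   int ds_write(int ds_index, uint64_t offset, const void *buf, uint32_t len);
   int mds_layoutcommit(uint64_t last_write_offset, bool newoffset,
                        bool update_time_modify);

   int extend_file(int ds_index, uint64_t offset, const void *buf, uint32_t len)
   {
       int status = ds_write(ds_index, offset, buf, len);
       if (status != 0 || len == 0)
           return status;

       /* Report the last byte written so the metadata server can
        * update the size (and, if requested, the modify time). */
       return mds_layoutcommit(offset + len - 1, true, true);
   }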
15034 13.10. Data Server Component File Size 15036 A potential problem exists when a component data file on a particular 15037 data server is grown past EOF; the problem exists for both dense and 15038 sparse layouts. Imagine the following scenario: a client creates a 15039 new file (size == 0) and writes to byte 131072; the client then seeks 15040 to the beginning of the file and reads byte 100. The client should 15041 receive 0s back as a result of the READ. However, if the READ falls 15042 on a data server other than the one that received the client's original 15043 WRITE, the data server servicing the READ may still believe that the 15044 file's size is 0 and return no data with the EOF flag set. The 15045 data server can only return 0s if it knows that the file's size has 15046 been extended. This would require the immediate propagation of the 15047 file's size to all data servers, which is potentially very costly. 15048 Therefore, the client that has initiated the extension of the file's 15049 size MUST be prepared to deal with these EOF conditions; the EOF'ed 15050 or short READs will be treated as a hole in the file, and the NFS 15051 client will substitute 0s for the data when the offset is less than 15052 the client's view of the file size. 15054 The NFSv4.1 protocol only provides close-to-open file data cache 15055 semantics, meaning that when the file is closed, all modified data is 15056 written to the server. When a subsequent OPEN of the file is done, 15057 the change attribute is inspected for a difference from a cached 15058 value for the change attribute. For the case above, this means that 15059 a LAYOUTCOMMIT will be done at close (along with the data WRITEs) and 15060 will update the file's size and change attribute. Access from 15061 another client after that point will result in the appropriate size 15062 being returned.
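The following non-normative C sketch illustrates the client-side zero substitution described above; the read helper and the representation of the client's view of the file size are hypothetical stand-ins.

   /* Non-normative sketch: a pNFS client compensating for a data
    * server that has not yet learned of a file-size extension.  A
    * short or EOF'ed READ at an offset below the client's own view of
    * the file size is treated as a hole and zero-filled. */
   #include <stdbool.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical data-server READ: returns bytes read, sets *eof. */
   uint32_t ds_read(uint64_t offset, void *buf, uint32_t count, bool *eof);

   uint32_t pnfs_client_read(uint64_t offset, void *buf, uint32_t count,
                             uint64_t client_file_size)
   {
       bool eof = false;
       uint32_t got = ds_read(offset, buf, count, &eof);

       if ((got < count || eof) && offset + got < client_file_size) {
           /* Data server believes the file is shorter than the client
            * knows it to be: substitute zeros for the hole. */
           uint64_t remaining = client_file_size - (offset + got);
           uint32_t fill = count - got;
           if (remaining < fill)
               fill = (uint32_t)remaining;
           memset((char *)buf + got, 0, fill);
           got += fill;
       }
       return got;
   }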
15064 13.11. Layout Revocation and Fencing 15066 As described in Section 12.7, the layout type-specific storage 15067 protocol is responsible for handling the effects of I/Os started 15068 before lease expiration, extending through lease expiration. The 15069 LAYOUT4_NFSV4_1_FILES layout type can prevent all I/Os to data 15070 servers from being executed after lease expiration, without relying 15071 on a precise client lease timer and without requiring data servers to 15072 maintain lease timers. However, while a LAYOUT4_NFSV4_1_FILES pNFS 15073 server is free to deny the client all access to the data servers, 15074 because it supports revocation of layouts, it is also free to perform 15075 a denial on a per-file basis only when revoking a layout. 15077 In addition to lease expiration, the reasons a layout can be revoked 15078 include: the client fails to respond to a CB_LAYOUTRECALL, the metadata 15079 server restarts, or administrative intervention. Regardless of the 15080 reason, once a client's layout has been revoked, the pNFS server MUST 15081 prevent the client from issuing I/O for the affected file from and to 15082 all data servers; in other words, it MUST fence the client from the 15083 affected file on the data servers. 15085 Fencing works as follows. As described in Section 13.1, in COMPOUND 15086 procedure requests to the data server, the data filehandle provided 15087 by the PUTFH operation and the stateid in the READ or WRITE operation 15088 are used to validate that the client has a valid layout for the I/O 15089 being performed; if it does not, the I/O is rejected with 15090 NFS4ERR_PNFS_NO_LAYOUT. The server can simply check the stateid and, 15091 additionally, make the data filehandle stale if the layout specified 15092 a data filehandle that is different from the metadata server's 15093 filehandle for the file (see the nfl_fh_list description in 15094 Section 13.3). 15096 Before the metadata server takes any action to invalidate layout 15097 state given out by a previous instance, it must make sure that all 15098 layout state from that previous instance is invalidated at the data 15099 servers. This means that a metadata server may not restripe a file 15100 until it has contacted all of the data servers to invalidate the 15101 layouts from the previous instance, nor may it give out mandatory 15102 locks that conflict with layouts from the previous instance without 15103 either doing a specific invalidation (as it would have to do anyway) 15104 or doing a global data server invalidation. 15106 13.12. Security Considerations for the File Layout Type 15108 The NFSv4.1 file layout type MUST adhere to the security 15109 considerations outlined in Section 12.9. NFSv4.1 data servers MUST 15110 make all of the required access checks on each READ or WRITE I/O as 15111 determined by the NFSv4.1 protocol. If the metadata server would 15112 deny a READ or WRITE operation on a given file due to its ACL, mode 15113 attribute, open mode, open deny mode, mandatory lock state, or any 15114 other attributes and state, the data server MUST also deny the READ 15115 or WRITE operation. This impacts the control protocol and the 15116 propagation of state from the metadata server to the data servers; 15117 see Section 13.9.2 for more details. 15119 The methods for authentication, integrity, and privacy for file 15120 layout-based data servers are the same as those used by metadata 15121 servers. Metadata and data servers use ONC RPC security flavors to 15122 authenticate, and SECINFO and SECINFO_NO_NAME to negotiate the 15123 security mechanism and services to be used.
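Taken together, the fencing check of Section 13.11 and the access checks required above might be combined on a data server roughly as in the following non-normative C sketch. The lookup helpers are hypothetical stand-ins for server-internal state fed by the control protocol, and the ordering of the checks is illustrative rather than mandated; the error values are those listed in Table 11.

   /* Non-normative sketch: data-server validation of an incoming
    * READ or WRITE.  Error constants are from this specification;
    * helpers are illustrative stand-ins. */
   #include <stdbool.h>

   #define NFS4_OK                    0
   #define NFS4ERR_ACCESS            13
   #define NFS4ERR_BAD_STATEID    10025
   #define NFS4ERR_PNFS_NO_LAYOUT 10080

   struct ds_request;  /* filehandle, stateid, offset, length, etc. (opaque here) */

   /* Hypothetical lookups fed by the control protocol. */
   bool stateid_is_special_or_nonzero_seqid(const struct ds_request *r);
   bool client_holds_matching_layout(const struct ds_request *r);
   bool open_and_acl_checks_pass(const struct ds_request *r);

   int ds_validate_io(const struct ds_request *r)
   {
       if (stateid_is_special_or_nonzero_seqid(r))
           return NFS4ERR_BAD_STATEID;      /* Sections 13.6 and 13.9.1 */
       if (!client_holds_matching_layout(r))
           return NFS4ERR_PNFS_NO_LAYOUT;   /* fencing, Section 13.11 */
       if (!open_and_acl_checks_pass(r))
           return NFS4ERR_ACCESS;           /* same checks the MDS would make */
       return NFS4_OK;
   }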
15125 For a given file object, a metadata server MAY require different 15126 security parameters (secinfo4 value) from the data server. For a 15127 given file object with multiple data servers, the secinfo4 value 15128 SHOULD be the same across all data servers. If the secinfo4 values 15129 across a metadata server and its data servers differ for a specific 15130 file, the mapping of the principal to the server's internal user 15131 identifier MUST be the same in order for the access control checks 15132 based on ACL, mode, open and deny mode, and mandatory locking to be 15133 consistent across the pNFS server. 15135 If an NFSv4.1 implementation supports pNFS and supports NFSv4.1 file 15136 layouts, then the implementation MUST support the SECINFO_NO_NAME 15137 operation on both the metadata and data servers. 15139 14. Internationalization 15141 The primary area in which NFSv4.1 needs to deal with 15142 internationalization, or I18N, is that of file names and 15143 other strings as used within the protocol. The choice of string 15144 representation must allow reasonable name/string access to clients 15145 which use various languages. The UTF-8 encoding of the UCS as 15146 defined by ISO 10646 [14] allows for this type of access and follows 15147 the policy described in "IETF Policy on Character Sets and 15148 Languages", RFC 2277 [15]. 15150 RFC 3454 [16], otherwise known as "stringprep", documents a framework 15151 for using Unicode/UTF-8 in networking protocols, so as "to increase 15152 the likelihood that string input and string comparison work in ways 15153 that make sense for typical users throughout the world." A protocol 15154 must define a profile of stringprep "in order to fully specify the 15155 processing options." The remainder of this Internationalization 15156 section defines the NFSv4.1 stringprep profiles. Much of the terminology 15157 used for the remainder of this section comes from stringprep. 15159 There are three UTF-8 string types defined for NFSv4.1: utf8str_cs, 15160 utf8str_cis, and utf8str_mixed. Separate profiles are defined for 15161 each. Each profile defines the following, as required by stringprep: 15163 o The intended applicability of the profile 15165 o The character repertoire that is the input and output to 15166 stringprep (which is Unicode 3.2 for the referenced version of 15167 stringprep) 15169 o The mapping tables from stringprep used (as described in section 3 15170 of stringprep) 15172 o Any additional mapping tables specific to the profile 15174 o The Unicode normalization used, if any (as described in section 4 15175 of stringprep) 15177 o The tables from stringprep listing characters that are 15178 prohibited as output (as described in section 5 of stringprep) 15180 o The bidirectional string testing used, if any (as described in 15181 section 6 of stringprep) 15183 o Any additional characters that are prohibited as output specific 15184 to the profile 15186 Stringprep discusses Unicode characters, whereas NFSv4.1 renders 15187 UTF-8 characters. Since there is a one-to-one mapping from UTF-8 to 15188 Unicode, when the remainder of this document refers to Unicode, the 15189 reader should assume UTF-8. 15191 Much of the text for the profiles comes from RFC 3491 [17]. 15193 14.1.
Stringprep profile for the utf8str_cs type 15195 Every use of the utf8str_cs type definition in the NFSv4 protocol 15196 specification follows the profile named nfs4_cs_prep. 15198 14.1.1. Intended applicability of the nfs4_cs_prep profile 15200 The utf8str_cs type is a case sensitive string of UTF-8 characters. 15201 Its primary use in NFSv4.1 is for naming components and pathnames. 15202 Components and pathnames are stored on the server's file system. Two 15203 valid distinct UTF-8 strings might be the same after processing via 15204 the utf8str_cs profile. If the strings are two names inside a 15205 directory, the NFSv4.1 server will need to either: 15207 o disallow the creation of a second name if its post processed form 15208 collides with that of an existing name, or 15210 o allow the creation of the second name, but arrange so that after 15211 post processing, the second name is different than the post 15212 processed form of the first name. 15214 14.1.2. Character repertoire of nfs4_cs_prep 15216 The nfs4_cs_prep profile uses Unicode 3.2, as defined in stringprep's 15217 Appendix A.1 15219 14.1.3. Mapping used by nfs4_cs_prep 15221 The nfs4_cs_prep profile specifies mapping using the following tables 15222 from stringprep: 15224 Table B.1 15226 Table B.2 is normally not part of the nfs4_cs_prep profile as it is 15227 primarily for dealing with case-insensitive comparisons. However, if 15228 the NFSv4.1 file server supports the case_insensitive file system 15229 attribute, and if case_insensitive is TRUE, the NFSv4.1 server MUST 15230 use Table B.2 (in addition to Table B1) when processing utf8str_cs 15231 strings, and the NFSv4.1 client MUST assume Table B.2 (in addition to 15232 Table B.1) are being used. 15234 If the case_preserving attribute is present and set to FALSE, then 15235 the NFSv4.1 server MUST use table B.2 to map case when processing 15236 utf8str_cs strings. Whether the server maps from lower to upper case 15237 or the upper to lower case is an implementation dependency. 15239 14.1.4. Normalization used by nfs4_cs_prep 15241 The nfs4_cs_prep profile does not specify a normalization form. A 15242 later revision of this specification may specify a particular 15243 normalization form. Therefore, the server and client can expect that 15244 they may receive unnormalized characters within protocol requests and 15245 responses. If the operating environment requires normalization, then 15246 the implementation must normalize utf8str_cs strings within the 15247 protocol before presenting the information to an application (at the 15248 client) or local file system (at the server). 15250 14.1.5. Prohibited output for nfs4_cs_prep 15252 The nfs4_cs_prep profile specifies prohibiting using the following 15253 tables from stringprep: 15255 Table C.3 15257 Table C.4 15259 Table C.5 15261 Table C.6 15263 Table C.7 15265 Table C.8 15267 Table C.9 15269 14.1.6. Bidirectional output for nfs4_cs_prep 15271 The nfs4_cs_prep profile does not specify any checking of 15272 bidirectional strings. 15274 14.2. Stringprep profile for the utf8str_cis type 15276 Every use of the utf8str_cis type definition in the NFSv4.1 protocol 15277 specification follows the profile named nfs4_cis_prep. 15279 14.2.1. Intended applicability of the nfs4_cis_prep profile 15281 The utf8str_cis type is a case insensitive string of UTF-8 15282 characters. Its primary use in NFSv4.1 is for naming NFS servers. 15284 14.2.2. 
Character repertoire of nfs4_cis_prep 15286 The nfs4_cis_prep profile uses Unicode 3.2, as defined in 15287 stringprep's Appendix A.1. 15289 14.2.3. Mapping used by nfs4_cis_prep 15291 The nfs4_cis_prep profile specifies mapping using the following 15292 tables from stringprep: 15294 Table B.1 15296 Table B.2 15298 14.2.4. Normalization used by nfs4_cis_prep 15300 The nfs4_cis_prep profile specifies using Unicode normalization form 15301 KC, as described in stringprep. 15303 14.2.5. Prohibited output for nfs4_cis_prep 15305 The nfs4_cis_prep profile specifies prohibiting using the following 15306 tables from stringprep: 15308 Table C.1.2 15310 Table C.2.2 15312 Table C.3 15314 Table C.4 15316 Table C.5 15318 Table C.6 15320 Table C.7 15322 Table C.8 15324 Table C.9 15326 14.2.6. Bidirectional output for nfs4_cis_prep 15328 The nfs4_cis_prep profile specifies checking bidirectional strings as 15329 described in stringprep's section 6. 15331 14.3. Stringprep profile for the utf8str_mixed type 15333 Every use of the utf8str_mixed type definition in the NFSv4.1 15334 protocol specification follows the profile named nfs4_mixed_prep. 15336 14.3.1. Intended applicability of the nfs4_mixed_prep profile 15338 The utf8str_mixed type is a string of UTF-8 characters, with a prefix 15339 that is case sensitive, a separator equal to '@', and a suffix that 15340 is a fully qualified domain name. Its primary use in NFSv4.1 is for 15341 naming principals identified in an Access Control Entry. 15343 14.3.2. Character repertoire of nfs4_mixed_prep 15345 The nfs4_mixed_prep profile uses Unicode 3.2, as defined in 15346 stringprep's Appendix A.1. 15348 14.3.3. Mapping used by nfs4_mixed_prep 15350 For the prefix and the separator of a utf8str_mixed string, the 15351 nfs4_mixed_prep profile specifies mapping using the following table 15352 from stringprep: 15354 Table B.1 15356 For the suffix of a utf8str_mixed string, the nfs4_mixed_prep profile 15357 specifies mapping using the following tables from stringprep: 15359 Table B.1 15361 Table B.2 15363 14.3.4. Normalization used by nfs4_mixed_prep 15365 The nfs4_mixed_prep profile specifies using Unicode normalization 15366 form KC, as described in stringprep. 15368 14.3.5. Prohibited output for nfs4_mixed_prep 15370 The nfs4_mixed_prep profile specifies prohibiting using the following 15371 tables from stringprep: 15373 Table C.1.2 15375 Table C.2.2 15377 Table C.3 15379 Table C.4 15381 Table C.5 15383 Table C.6 15385 Table C.7 15387 Table C.8 15389 Table C.9 15391 14.3.6. Bidirectional output for nfs4_mixed_prep 15393 The nfs4_mixed_prep profile specifies checking bidirectional strings 15394 as described in stringprep's section 6. 15396 14.4. UTF-8 Capabilities 15398 const FSCHARSET_CAP4_CONTAINS_NON_UTF8 = 0x1; 15399 const FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 = 0x2; 15401 typedef uint32_t fs_charset_cap4; 15403 Because some operating environments and file systems do not enforce 15404 character set encodings, NFSv4.1 supports the fs_charset_cap 15405 attribute (Section 5.8.2.11) that indicates to the client a file 15406 system's UTF-8 capabilities. The attribute is an integer containing 15407 a pair of flags. The first flag is FSCHARSET_CAP4_CONTAINS_NON_UTF8, 15408 which, if set to one, tells the client the file system contains non- 15409 UTF-8 characters, and the server will not convert non-UTF-8 characters 15410 to UTF-8 if the client reads a symlink or directory, nor will 15411 operations with component names or pathnames in the arguments convert 15412 the strings to UTF-8. The second flag is 15413 FSCHARSET_CAP4_ALLOWS_ONLY_UTF8, which, if set to one, indicates that 15414 the server will accept (and generate) only UTF-8 characters on the 15415 file system. If FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 is set to one, 15416 FSCHARSET_CAP4_CONTAINS_NON_UTF8 MUST be set to zero. 15417 FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 SHOULD always be set to one.
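As a non-normative illustration, a client might act on these two flags as in the following C sketch; the flag values are those given in the XDR fragment above, while the validation helper and the surrounding function are hypothetical stand-ins.

   /* Non-normative sketch: client-side handling of fs_charset_cap.
    * Flag values are from the XDR above; the validator is an
    * illustrative stand-in. */
   #include <stdbool.h>
   #include <stdint.h>

   #define FSCHARSET_CAP4_CONTAINS_NON_UTF8 0x1
   #define FSCHARSET_CAP4_ALLOWS_ONLY_UTF8  0x2

   bool name_is_valid_utf8(const char *name);  /* hypothetical validator */

   /* Returns true if the client may send the name as-is, false if it
    * should reject it locally before issuing the operation. */
   bool client_check_component_name(uint32_t fs_charset_cap, const char *name)
   {
       if (fs_charset_cap & FSCHARSET_CAP4_ALLOWS_ONLY_UTF8) {
           /* Server accepts and generates only UTF-8; reject non-UTF-8
            * names locally rather than provoke NFS4ERR_INVAL. */
           return name_is_valid_utf8(name);
       }
       if (fs_charset_cap & FSCHARSET_CAP4_CONTAINS_NON_UTF8) {
           /* The file system may hand back non-UTF-8 strings and the
            * server will not convert; pass names through unmodified. */
           return true;
       }
       return true;
   }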
15419 14.5. UTF-8 Related Errors 15421 Where the client sends an invalid UTF-8 string, the server should 15422 return NFS4ERR_INVAL (see Table 11). This includes cases in which 15423 inappropriate prefixes are detected and where the count includes 15424 trailing bytes that do not constitute a full UCS character. 15426 Where the client-supplied string is valid UTF-8 but contains 15427 characters that are not supported by the server as a value for that 15428 string (e.g., names containing characters that have more than two 15429 bytes on a file system that supports Unicode characters only), the 15430 server should return NFS4ERR_BADCHAR. 15432 Where a UTF-8 string is used as a file name, and the file system, 15433 while supporting all of the characters within the name, does not 15434 allow that particular name to be used, the server should return the 15435 error NFS4ERR_BADNAME (Table 11). This includes situations in which 15436 the server file system imposes a normalization constraint on name 15437 strings, but will also include such situations as file system 15438 prohibitions of "." and ".." as file names for certain operations, 15439 and other such constraints. 15441 15. Error Values 15443 NFS error numbers are assigned to failed operations within a Compound 15444 (COMPOUND or CB_COMPOUND) request. A Compound request contains a 15445 number of NFS operations that have their results encoded in sequence 15446 in a Compound reply. The results of successful operations will 15447 consist of an NFS4_OK status followed by the encoded results of the 15448 operation. If an NFS operation fails, an error status will be 15449 entered in the reply and the Compound request will be terminated. 15451 15.1.
Error Definitions 15453 Protocol Error Definitions 15455 +-----------------------------------+--------+-------------------+ 15456 | Error | Number | Description | 15457 +-----------------------------------+--------+-------------------+ 15458 | NFS4_OK | 0 | Section 15.1.3.1 | 15459 | NFS4ERR_ACCESS | 13 | Section 15.1.6.1 | 15460 | NFS4ERR_ATTRNOTSUPP | 10032 | Section 15.1.15.1 | 15461 | NFS4ERR_ADMIN_REVOKED | 10047 | Section 15.1.5.1 | 15462 | NFS4ERR_BACK_CHAN_BUSY | 10057 | Section 15.1.12.1 | 15463 | NFS4ERR_BADCHAR | 10040 | Section 15.1.7.1 | 15464 | NFS4ERR_BADHANDLE | 10001 | Section 15.1.2.1 | 15465 | NFS4ERR_BADIOMODE | 10049 | Section 15.1.10.1 | 15466 | NFS4ERR_BADLAYOUT | 10050 | Section 15.1.10.2 | 15467 | NFS4ERR_BADNAME | 10041 | Section 15.1.7.2 | 15468 | NFS4ERR_BADOWNER | 10039 | Section 15.1.15.2 | 15469 | NFS4ERR_BADSESSION | 10052 | Section 15.1.11.1 | 15470 | NFS4ERR_BADSLOT | 10053 | Section 15.1.11.2 | 15471 | NFS4ERR_BADTYPE | 10007 | Section 15.1.4.1 | 15472 | NFS4ERR_BADXDR | 10036 | Section 15.1.1.1 | 15473 | NFS4ERR_BAD_COOKIE | 10003 | Section 15.1.1.2 | 15474 | NFS4ERR_BAD_HIGH_SLOT | 10077 | Section 15.1.11.3 | 15475 | NFS4ERR_BAD_RANGE | 10042 | Section 15.1.8.1 | 15476 | NFS4ERR_BAD_SEQID | 10026 | Section 15.1.16.1 | 15477 | NFS4ERR_BAD_SESSION_DIGEST | 10051 | Section 15.1.12.2 | 15478 | NFS4ERR_BAD_STATEID | 10025 | Section 15.1.5.2 | 15479 | NFS4ERR_CB_PATH_DOWN | 10048 | Section 15.1.11.4 | 15480 | NFS4ERR_CLID_INUSE | 10017 | Section 15.1.13.2 | 15481 | NFS4ERR_CLIENTID_BUSY | 10074 | Section 15.1.13.1 | 15482 | NFS4ERR_COMPLETE_ALREADY | 10054 | Section 15.1.9.1 | 15483 | NFS4ERR_CONN_NOT_BOUND_TO_SESSION | 10055 | Section 15.1.11.6 | 15484 | NFS4ERR_DEADLOCK | 10045 | Section 15.1.8.2 | 15485 | NFS4ERR_DEADSESSION | 10078 | Section 15.1.11.5 | 15486 | NFS4ERR_DELAY | 10008 | Section 15.1.1.3 | 15487 | NFS4ERR_DELEG_ALREADY_WANTED | 10056 | Section 15.1.14.1 | 15488 | NFS4ERR_DENIED | 10010 | Section 15.1.8.3 | 15489 | NFS4ERR_DIRDELEG_UNAVAIL | 10084 | Section 15.1.14.2 | 15490 | NFS4ERR_DQUOT | 69 | Section 15.1.4.2 | 15491 | NFS4ERR_ENCR_ALG_UNSUPP | 10079 | Section 15.1.13.3 | 15492 | NFS4ERR_EXIST | 17 | Section 15.1.4.3 | 15493 | NFS4ERR_EXPIRED | 10011 | Section 15.1.5.4 | 15494 | NFS4ERR_FBIG | 27 | Section 15.1.4.4 | 15495 | NFS4ERR_FHEXPIRED | 10014 | Section 15.1.2.2 | 15496 | NFS4ERR_FILE_OPEN | 10046 | Section 15.1.4.5 | 15497 | NFS4ERR_GRACE | 10013 | Section 15.1.9.2 | 15498 | NFS4ERR_HASH_ALG_UNSUPP | 10072 | Section 15.1.13.4 | 15499 | NFS4ERR_INVAL | 22 | Section 15.1.1.4 | 15500 | NFS4ERR_IO | 5 | Section 15.1.4.6 | 15501 | NFS4ERR_ISDIR | 21 | Section 15.1.2.3 | 15502 | NFS4ERR_LAYOUTTRYLATER | 10058 | Section 15.1.10.3 | 15503 | NFS4ERR_LAYOUTUNAVAILABLE | 10059 | Section 15.1.10.4 | 15504 | NFS4ERR_LEASE_MOVED | 10031 | Section 15.1.16.2 | 15505 | NFS4ERR_LOCKED | 10012 | Section 15.1.8.4 | 15506 | NFS4ERR_LOCKS_HELD | 10037 | Section 15.1.8.5 | 15507 | NFS4ERR_LOCK_NOTSUPP | 10043 | Section 15.1.8.6 | 15508 | NFS4ERR_LOCK_RANGE | 10028 | Section 15.1.8.7 | 15509 | NFS4ERR_MINOR_VERS_MISMATCH | 10021 | Section 15.1.3.2 | 15510 | NFS4ERR_MLINK | 31 | Section 15.1.4.7 | 15511 | NFS4ERR_MOVED | 10019 | Section 15.1.2.4 | 15512 | NFS4ERR_NAMETOOLONG | 63 | Section 15.1.7.3 | 15513 | NFS4ERR_NOENT | 2 | Section 15.1.4.8 | 15514 | NFS4ERR_NOFILEHANDLE | 10020 | Section 15.1.2.5 | 15515 | NFS4ERR_NOMATCHING_LAYOUT | 10060 | Section 15.1.10.5 | 15516 | NFS4ERR_NOSPC | 28 | Section 15.1.4.9 | 15517 | NFS4ERR_NOTDIR | 20 | Section 
15.1.2.6 | 15518 | NFS4ERR_NOTEMPTY | 66 | Section 15.1.4.10 | 15519 | NFS4ERR_NOTSUPP | 10004 | Section 15.1.1.5 | 15520 | NFS4ERR_NOT_ONLY_OP | 10081 | Section 15.1.3.3 | 15521 | NFS4ERR_NOT_SAME | 10027 | Section 15.1.15.3 | 15522 | NFS4ERR_NO_GRACE | 10033 | Section 15.1.9.3 | 15523 | NFS4ERR_NXIO | 6 | Section 15.1.16.3 | 15524 | NFS4ERR_OLD_STATEID | 10024 | Section 15.1.5.5 | 15525 | NFS4ERR_OPENMODE | 10038 | Section 15.1.8.8 | 15526 | NFS4ERR_OP_ILLEGAL | 10044 | Section 15.1.3.4 | 15527 | NFS4ERR_OP_NOT_IN_SESSION | 10071 | Section 15.1.3.5 | 15528 | NFS4ERR_PERM | 1 | Section 15.1.6.2 | 15529 | NFS4ERR_PNFS_IO_HOLE | 10075 | Section 15.1.10.6 | 15530 | NFS4ERR_PNFS_NO_LAYOUT | 10080 | Section 15.1.10.7 | 15531 | NFS4ERR_RECALLCONFLICT | 10061 | Section 15.1.14.3 | 15532 | NFS4ERR_RECLAIM_BAD | 10034 | Section 15.1.9.4 | 15533 | NFS4ERR_RECLAIM_CONFLICT | 10035 | Section 15.1.9.5 | 15534 | NFS4ERR_REJECT_DELEG | 10085 | Section 15.1.14.4 | 15535 | NFS4ERR_REP_TOO_BIG | 10066 | Section 15.1.3.6 | 15536 | NFS4ERR_REP_TOO_BIG_TO_CACHE | 10067 | Section 15.1.3.7 | 15537 | NFS4ERR_REQ_TOO_BIG | 10065 | Section 15.1.3.8 | 15538 | NFS4ERR_RESTOREFH | 10030 | Section 15.1.16.4 | 15539 | NFS4ERR_RETRY_UNCACHED_REP | 10068 | Section 15.1.3.9 | 15540 | NFS4ERR_RETURNCONFLICT | 10086 | Section 15.1.10.8 | 15541 | NFS4ERR_ROFS | 30 | Section 15.1.4.11 | 15542 | NFS4ERR_SAME | 10009 | Section 15.1.15.4 | 15543 | NFS4ERR_SHARE_DENIED | 10015 | Section 15.1.8.9 | 15544 | NFS4ERR_SEQUENCE_POS | 10064 | Section 15.1.3.10 | 15545 | NFS4ERR_SEQ_FALSE_RETRY | 10076 | Section 15.1.11.7 | 15546 | NFS4ERR_SEQ_MISORDERED | 10063 | Section 15.1.11.8 | 15547 | NFS4ERR_SERVERFAULT | 10006 | Section 15.1.1.6 | 15548 | NFS4ERR_STALE | 70 | Section 15.1.2.7 | 15549 | NFS4ERR_STALE_CLIENTID | 10022 | Section 15.1.13.5 | 15550 | NFS4ERR_STALE_STATEID | 10023 | Section 15.1.16.5 | 15551 | NFS4ERR_SYMLINK | 10029 | Section 15.1.2.8 | 15552 | NFS4ERR_TOOSMALL | 10005 | Section 15.1.1.7 | 15553 | NFS4ERR_TOO_MANY_OPS | 10070 | Section 15.1.3.11 | 15554 | NFS4ERR_UNKNOWN_LAYOUTTYPE | 10062 | Section 15.1.10.9 | 15555 | NFS4ERR_UNSAFE_COMPOUND | 10069 | Section 15.1.3.12 | 15556 | NFS4ERR_WRONGSEC | 10016 | Section 15.1.6.3 | 15557 | NFS4ERR_WRONG_CRED | 10082 | Section 15.1.6.4 | 15558 | NFS4ERR_WRONG_TYPE | 10083 | Section 15.1.2.9 | 15559 | NFS4ERR_XDEV | 18 | Section 15.1.4.12 | 15560 +-----------------------------------+--------+-------------------+ 15562 Table 11 15564 15.1.1. General Errors 15566 This section deals with errors that are applicable to a broad set of 15567 different purposes. 15569 15.1.1.1. NFS4ERR_BADXDR (Error Code 10036) 15571 The arguments for this operation do not match those specified in the 15572 XDR definition. This includes situations in which the request ends 15573 before all the arguments have been seen. Note that this error 15574 applies when fixed enumerations (these include booleans) have a value 15575 within the input stream which is not valid for the enum. A replier 15576 may pre-parse all operations for a Compound procedure before doing 15577 any operation execution and return RPC-level XDR errors in that case. 15579 15.1.1.2. NFS4ERR_BAD_COOKIE (Error Code 10003) 15581 Used for operations that provide a set of information indexed by some 15582 quantity provided by the client or cookie sent by the server for an 15583 earlier invocation. Where the value cannot be used for its intended 15584 purpose, this error results. 15586 15.1.1.3. 
NFS4ERR_DELAY (Error Code 10008) 15588 For any of a number of reasons, the replier could not process this 15589 operation in what was deemed a reasonable time. The client should 15590 wait and then try the request with a new slot and sequence value. 15592 Some examples of situations that might lead to this error: 15594 o A server that supports hierarchical storage receives a request to 15595 process a file that had been migrated. 15597 o An operation requires a delegation recall to proceed, and waiting 15598 for this delegation recall makes processing this request in a 15599 timely fashion impossible. 15601 In such cases, the error NFS4ERR_DELAY allows these preparatory 15602 operations to proceed without holding up client resources such as a 15603 session slot. After delaying for a period of time, the client can then 15604 re-send the operation in question (but not with the same slot ID and 15605 sequence ID; one or both MUST be different on the re-send). 15607 Note that without the ability to return NFS4ERR_DELAY and the 15608 client's willingness to re-send when receiving it, deadlock might 15609 well result. E.g., if a recall is done, and if the delegation return 15610 or operations preparatory to delegation return are held up by other 15611 operations that need the delegation to be returned, session slots 15612 might not be available. The result could be deadlock. 15614 15.1.1.4. NFS4ERR_INVAL (Error Code 22) 15616 The arguments for this operation are not valid for some reason, even 15617 though they do match those specified in the XDR definition for the 15618 request. 15620 15.1.1.5. NFS4ERR_NOTSUPP (Error Code 10004) 15622 Operation not supported, either because the operation is an OPTIONAL 15623 one and is not supported by this server or because the operation MUST 15624 NOT be implemented in the current minor version. 15626 15.1.1.6. NFS4ERR_SERVERFAULT (Error Code 10006) 15628 An error occurred on the server which does not map to any of the 15629 specific legal NFSv4.1 protocol error values. The client should 15630 translate this into an appropriate error. UNIX clients may choose to 15631 translate this to EIO. 15633 15.1.1.7. NFS4ERR_TOOSMALL (Error Code 10005) 15635 Used where an operation returns a variable amount of data, with a 15636 limit specified by the client. Where the data returned cannot be fit 15637 within the limit specified by the client, this error results. 15639 15.1.2. Filehandle Errors 15641 These errors deal with the situation in which the current or saved 15642 filehandle, or the filehandle passed to PUTFH intended to become the 15643 current filehandle, is invalid in some way. This includes situations 15644 in which the filehandle is a valid filehandle in general but is not 15645 of the appropriate object type for the current operation. 15647 Where the error description indicates a problem with the current or 15648 saved filehandle, it is to be understood that filehandles are only 15649 checked for the condition if they are implicit arguments of the 15650 operation in question. 15652 15.1.2.1. NFS4ERR_BADHANDLE (Error Code 10001) 15654 Illegal NFS filehandle for the current server. The current 15655 filehandle failed internal consistency checks. Once accepted as valid 15656 (by PUTFH), no subsequent status change can cause the filehandle to 15657 generate this error. 15659 15.1.2.2. NFS4ERR_FHEXPIRED (Error Code 10014) 15661 A current or saved filehandle which is an argument to the current 15662 operation is volatile and has expired at the server.
15664 15.1.2.3. NFS4ERR_ISDIR (Error Code 21) 15666 The current or saved filehandle designates a directory when the 15667 current operation does not allow a directory to be accepted as the 15668 target of this operation. 15670 15.1.2.4. NFS4ERR_MOVED (Error Code 10019) 15672 The file system which contains the current filehandle object is not 15673 present at the server. It may have been relocated, migrated to 15674 another server, or may never have been present. The client may obtain 15675 the new file system location by obtaining the "fs_locations" or 15676 "fs_locations_info" attribute for the current filehandle. For 15677 further discussion, refer to Section 11.2. 15679 15.1.2.5. NFS4ERR_NOFILEHANDLE (Error Code 10020) 15681 The logical current or saved filehandle value is required by the 15682 current operation and is not set. This may be a result of a 15683 malformed COMPOUND operation (i.e., no PUTFH or PUTROOTFH before an 15684 operation that requires the current filehandle be set). 15686 15.1.2.6. NFS4ERR_NOTDIR (Error Code 20) 15688 The current (or saved) filehandle designates an object which is not a 15689 directory for an operation in which a directory is required. 15691 15.1.2.7. NFS4ERR_STALE (Error Code 70) 15693 The current or saved filehandle value designating an argument to the 15694 current operation is invalid. The file referred to by that filehandle 15695 no longer exists or access to it has been revoked. 15697 15.1.2.8. NFS4ERR_SYMLINK (Error Code 10029) 15699 The current filehandle designates a symbolic link when the current 15700 operation does not allow a symbolic link as the target. 15702 15.1.2.9. NFS4ERR_WRONG_TYPE (Error Code 10083) 15704 The current (or saved) filehandle designates an object which is of an 15705 invalid type for the current operation, and there is no more specific 15706 error (such as NFS4ERR_ISDIR or NFS4ERR_SYMLINK) that applies. Note 15707 that in NFSv4.0, such situations generally resulted in the less 15708 specific error NFS4ERR_INVAL. 15710 15.1.3. Compound Structure Errors 15712 This section deals with errors that relate to the overall structure of a 15713 Compound request (by which we mean to include both COMPOUND and 15714 CB_COMPOUND), rather than to particular operations. 15716 There are a number of basic constraints on the operations that may 15717 appear in a Compound request. The Sessions feature adds to these basic 15718 constraints by requiring a Sequence operation (either SEQUENCE or 15719 CB_SEQUENCE) at the start of the Compound. 15721 15.1.3.1. NFS4_OK (Error Code 0) 15723 Indicates the operation completed successfully, in that all of the 15724 constituent operations completed without error. 15726 15.1.3.2. NFS4ERR_MINOR_VERS_MISMATCH (Error Code 10021) 15728 The minor version specified is not one that the current listener 15729 supports. This value is returned in the overall status for the 15730 Compound but is not associated with a specific operation since the 15731 results must specify a result count of zero. 15733 15.1.3.3. NFS4ERR_NOT_ONLY_OP (Error Code 10081) 15735 Certain operations, which are allowed to be executed outside of a 15736 session, must be the only operation within a COMPOUND. This error 15737 results when that constraint is not met. 15739 15.1.3.4. NFS4ERR_OP_ILLEGAL (Error Code 10044) 15741 The operation code is not a valid one for the current Compound 15742 procedure.
The opcode in the result stream matched with this error 15743 is the ILLEGAL value, although the value that appears in the request 15744 stream may be different. Where an illegal value appears and the 15745 replier pre-parses all operations for a Compound procedure before 15746 doing any operation execution, an RPC-level XDR error may be returned 15747 in this case. 15749 15.1.3.5. NFS4ERR_OP_NOT_IN_SESSION (Error Code 10071) 15751 Most forward operations and all callback operations are only valid 15752 within the context of a session, so that the Compound request in 15753 question must begin with a Sequence operation. If an attempt is made 15754 to execute these operations outside the context of a session, this 15755 error results. 15757 15.1.3.6. NFS4ERR_REP_TOO_BIG (Error Code 10066) 15759 The reply to a Compound would exceed the channel's negotiated maximum 15760 response size. 15762 15.1.3.7. NFS4ERR_REP_TOO_BIG_TO_CACHE (Error Code 10067) 15764 The reply to a Compound would exceed the channel's negotiated maximum 15765 size for replies cached in the reply cache when the Sequence for the 15766 current request specifies that this request is to be cached. 15768 15.1.3.8. NFS4ERR_REQ_TOO_BIG (Error Code 10065) 15770 The Compound request exceeds the channel's negotiated maximum size 15771 for requests. 15773 15.1.3.9. NFS4ERR_RETRY_UNCACHED_REP (Error Code 10068) 15775 The requester has attempted a retry of a Compound which it previously 15776 requested not be placed in the reply cache. 15778 15.1.3.10. NFS4ERR_SEQUENCE_POS (Error Code 10064) 15780 A Sequence operation appeared in a position other than the first 15781 operation of a Compound request. 15783 15.1.3.11. NFS4ERR_TOO_MANY_OPS (Error Code 10070) 15785 The Compound request has too many operations, exceeding the count 15786 negotiated when the session was created. 15788 15.1.3.12. NFS4ERR_UNSAFE_COMPOUND (Error Code 10069) 15790 The client has sent a COMPOUND request with an unsafe mix of 15791 operations, specifically with a non-idempotent operation changing the 15792 current filehandle that is not followed by a GETFH. 15794 15.1.4. File System Errors 15796 These errors describe situations which occurred in the underlying 15797 file system implementation rather than in the protocol or any NFSv4.x 15798 feature. 15800 15.1.4.1. NFS4ERR_BADTYPE (Error Code 10007) 15802 An attempt was made to create an object with an inappropriate type 15803 specified to CREATE. This may be because the type is undefined, 15804 because it is a type not supported by the server, or because it is a 15805 type for which create is not intended, such as a regular file or named 15806 attribute, for which OPEN is used to do the file creation. 15808 15.1.4.2. NFS4ERR_DQUOT (Error Code 69) 15810 Resource (quota) hard limit exceeded. The user's resource limit on 15811 the server has been exceeded. 15813 15.1.4.3. NFS4ERR_EXIST (Error Code 17) 15815 A file of the specified target name (when creating, renaming or 15816 linking) already exists. 15818 15.1.4.4. NFS4ERR_FBIG (Error Code 27) 15820 File too large. The operation would have caused a file to grow 15821 beyond the server's limit. 15823 15.1.4.5. NFS4ERR_FILE_OPEN (Error Code 10046) 15825 The operation is not allowed because a file involved in the operation 15826 is currently open. Servers may, but are not required to, disallow 15827 linking-to, removing, or renaming open files. 15829 15.1.4.6.
NFS4ERR_IO (Error Code 5) 15831 Indicates that an I/O error occurred for which the file system was 15832 unable to provide recovery. 15834 15.1.4.7. NFS4ERR_MLINK (Error Code 31) 15836 The request would have caused the server's limit for the number of 15837 hard links a file may have to be exceeded. 15839 15.1.4.8. NFS4ERR_NOENT (Error Code 2) 15841 Indicates no such file or directory. The file or directory name 15842 specified does not exist. 15844 15.1.4.9. NFS4ERR_NOSPC (Error Code 28) 15846 Indicates no space left on device. The operation would have caused 15847 the server's file system to exceed its limit. 15849 15.1.4.10. NFS4ERR_NOTEMPTY (Error Code 66) 15851 An attempt was made to remove a directory that was not empty. 15853 15.1.4.11. NFS4ERR_ROFS (Error Code 30) 15855 Indicates a read-only file system. A modifying operation was 15856 attempted on a read-only file system. 15858 15.1.4.12. NFS4ERR_XDEV (Error Code 18) 15860 Indicates an attempt to do an operation, such as linking, that 15861 inappropriately crosses a boundary. This may be due to such 15862 boundaries as: 15864 o That between file systems (where the fsids are different). 15866 o That between different named attribute directories or between a 15867 named attribute directory and an ordinary directory. 15869 o That between regions of a file system that the file system 15870 implementation treats as separate (for example, for space 15871 accounting purposes), and where cross-connection between the 15872 regions is not allowed. 15874 15.1.5. State Management Errors 15876 These errors indicate problems with the stateid (or one of the 15877 stateids) passed to a given operation. This includes situations in 15878 which the stateid is invalid as well as situations in which the 15879 stateid is valid but designates revoked locking state. Depending on 15880 the operation, the stateid when valid may designate opens, byte-range 15881 locks, file or directory delegations, layouts, or device maps. 15883 15.1.5.1. NFS4ERR_ADMIN_REVOKED (Error Code 10047) 15885 A stateid designates locking state of any type that has been revoked 15886 due to administrative interaction, possibly while the lease is valid. 15888 15.1.5.2. NFS4ERR_BAD_STATEID (Error Code 10025) 15890 A stateid does not properly designate any valid state. See 15891 Section 8.2.4 and Section 8.2.3 for a discussion of how stateids are 15892 validated. 15894 15.1.5.3. NFS4ERR_DELEG_REVOKED (Error Code 10087) 15896 A stateid designates recallable locking state of any type that has 15897 been revoked due to the failure of the client to return the lock 15898 when it was recalled. 15900 15.1.5.4. NFS4ERR_EXPIRED (Error Code 10011) 15902 A stateid designates locking state of any type that has been revoked 15903 due to expiration of the client's lease, either immediately upon 15904 lease expiration, or following a later request for a conflicting 15905 lock. 15907 15.1.5.5. NFS4ERR_OLD_STATEID (Error Code 10024) 15909 A stateid with a non-zero seqid value does not match the current seqid 15910 for the state designated by the user. 15912 15.1.6. Security Errors 15914 These are the various permission-related errors in NFSv4.1. 15916 15.1.6.1. NFS4ERR_ACCESS (Error Code 13) 15918 Indicates permission denied. The caller does not have the correct 15919 permission to perform the requested operation.
Contrast this with 15920 NFS4ERR_PERM (Section 15.1.6.2), which restricts itself to owner or 15921 privileged user permission failures, and NFS4ERR_WRONG_CRED 15922 (Section 15.1.6.4) which deals with appropriate permission to delete 15923 or modify transient objects, based on the credentials of the user 15924 that created them. 15926 15.1.6.2. NFS4ERR_PERM (Error Code 1) 15928 Indicates requester is not the owner. The operation was not allowed 15929 because the caller is neither a privileged user (root) nor the owner 15930 of the target of the operation. 15932 15.1.6.3. NFS4ERR_WRONGSEC (Error Code 10016) 15934 Indicates that the security mechanism being used by the client for 15935 the operation does not match the server's security policy. The 15936 client should change the security mechanism being used and re-send 15937 the operation (but not with the same slot ID and sequence ID; one or 15938 both MUST be different on the re-send). SECINFO and SECINFO_NO_NAME 15939 can be used to determine the appropriate mechanism. 15941 15.1.6.4. NFS4ERR_WRONG_CRED (Error Code 10082) 15943 An operation manipulating state was attempted by a principal that was 15944 not allowed to modify that piece of state. 15946 15.1.7. Name Errors 15948 Names in NFSv4 are UTF-8 strings. When the strings are not valid 15949 UTF-8 or are of length zero, the error NFS4ERR_INVAL results. 15950 Besides this, there are a number of other errors to indicate specific 15951 problems with names. 15953 15.1.7.1. NFS4ERR_BADCHAR (Error Code 10040) 15955 A UTF-8 string contains a character which is not supported by the 15956 server in the context in which it being used. 15958 15.1.7.2. NFS4ERR_BADNAME (Error Code 10041) 15960 A name string in a request consisted of valid UTF-8 characters 15961 supported by the server but the name is not supported by the server 15962 as a valid name for current operation. An example might be creating 15963 a file or directory named ".." on a server whose file system uses 15964 that name for links to parent directories. 15966 15.1.7.3. NFS4ERR_NAMETOOLONG (Error Code 63) 15968 Returned when the filename in an operation exceeds the server's 15969 implementation limit. 15971 15.1.8. Locking Errors 15973 This section deal with errors related to locking, both as to share 15974 reservations and byte-range locking. It does not deal with errors 15975 specific to the process of reclaiming locks. Those are dealt with in 15976 the next section. 15978 15.1.8.1. NFS4ERR_BAD_RANGE (Error Code 10042) 15980 The range for a LOCK, LOCKT, or LOCKU operation is not appropriate to 15981 the allowable range of offsets for the server. E.g., this error 15982 results when a server which only supports 32-bit ranges receives a 15983 range that cannot be handled by that server. (See Section 18.10.3). 15985 15.1.8.2. NFS4ERR_DEADLOCK (Error Code 10045) 15987 The server has been able to determine a file locking deadlock 15988 condition for a blocking lock request. 15990 15.1.8.3. NFS4ERR_DENIED (Error Code 10010) 15992 An attempt to lock a file is denied. Since this may be a temporary 15993 condition, the client is encouraged to re-send the lock request (but 15994 not with the same slot ID and sequence ID; one or both MUST be 15995 different on the re-send) until the lock is accepted. See 15996 Section 9.6 for a discussion of the re-send. 15998 15.1.8.4. 
NFS4ERR_LOCKED (Error Code 10012) 16000 A read or write operation was attempted on a file where there was a 16001 conflict between the I/O and an existing lock: 16003 o There is a share reservation inconsistent with the I/O being done. 16005 o The range to be read or written intersects an existing mandatory 16006 byte range lock. 16008 15.1.8.5. NFS4ERR_LOCKS_HELD (Error Code 10037) 16010 An operation was prevented by the unexpected presence of locks. 16012 15.1.8.6. NFS4ERR_LOCK_NOTSUPP (Error Code 10043) 16014 A locking request was attempted which would require the upgrade or 16015 downgrade of a lock range already held by the owner when the server 16016 does not support atomic upgrade or downgrade of locks. 16018 15.1.8.7. NFS4ERR_LOCK_RANGE (Error Code 10028) 16020 A lock request is operating on a range that overlaps in part a 16021 currently held lock for the current lock owner and does not precisely 16022 match a single such lock where the server does not support this type 16023 of request, and thus does not implement POSIX locking semantics. See 16024 Section 18.10.4, Section 18.11.4, and Section 18.12.4 for a 16025 discussion of how this applies to LOCK, LOCKT, and LOCKU 16026 respectively. 16028 15.1.8.8. NFS4ERR_OPENMODE (Error Code 10038) 16030 The client attempted a READ, WRITE, LOCK or other operation not 16031 sanctioned by the stateid passed (e.g. writing to a file opened only 16032 for read). 16034 15.1.8.9. NFS4ERR_SHARE_DENIED (Error Code 10015) 16036 An attempt to OPEN a file with a share reservation has failed because 16037 of a share conflict. 16039 15.1.9. Reclaim Errors 16041 These errors relate to the process of reclaiming locks after a server 16042 restart. 16044 15.1.9.1. NFS4ERR_COMPLETE_ALREADY (Error Code 10054) 16046 The client previously sent a successful RECLAIM_COMPLETE operation. 16047 An additional RECLAIM_COMPLETE operation is not necessary and results 16048 in this error. 16050 15.1.9.2. NFS4ERR_GRACE (Error Code 10013) 16052 The server is in its recovery or grace period which should at least 16053 match the lease period of the server. A locking request other than a 16054 reclaim could not be granted during that period. 16056 15.1.9.3. NFS4ERR_NO_GRACE (Error Code 10033) 16058 A reclaim of client state was attempted in circumstances in which the 16059 server cannot guarantee that conflicting state has not been provided 16060 to another client. This can occur because the reclaim has been done 16061 outside of the grace period of the server, after the client has done 16062 a RECLAIM_COMPLETE operation, or because previous operations have 16063 created a situation in which the server is not able to determine that 16064 a reclaim-interfering edge condition does not exist. 16066 15.1.9.4. NFS4ERR_RECLAIM_BAD (Error Code 10034) 16068 A reclaim attempted by the client does not match the server's state 16069 consistency checks and has been rejected therefore as invalid. 16071 15.1.9.5. NFS4ERR_RECLAIM_CONFLICT (Error Code 10035) 16073 The reclaim attempted by the client has encountered a conflict and 16074 cannot be satisfied. Potentially indicates a misbehaving client, 16075 although not necessarily the one receiving the error. The 16076 misbehavior might be on the part of the client that established the 16077 lock with which this client conflicted. 16079 15.1.10. pNFS Errors 16081 This section deals with pNFS-related errors including those that are 16082 associated with using NFSv4.1 to communicate with a data server. 16084 15.1.10.1. 
16084 15.1.10.1.  NFS4ERR_BADIOMODE (Error Code 10049)

16086 An invalid or inappropriate layout iomode was specified.

16088 15.1.10.2.  NFS4ERR_BADLAYOUT (Error Code 10050)

16090 The layout specified is invalid in some way.  For LAYOUTCOMMIT, this
16091 indicates that the specified layout is not held by the client or is
16092 not of mode LAYOUTIOMODE4_RW.  For LAYOUTGET, it indicates that a
16093 layout matching the client's specification as to minimum length
16094 cannot be granted.

16096 15.1.10.3.  NFS4ERR_LAYOUTTRYLATER (Error Code 10058)

16098 Layouts are temporarily unavailable for the file.  The client should
16099 re-send later (but not with the same slot ID and sequence ID; one or
16100 both MUST be different on the re-send).

16102 15.1.10.4.  NFS4ERR_LAYOUTUNAVAILABLE (Error Code 10059)

16104 Returned when layouts are not available for the current file system
16105 or the particular specified file.

16107 15.1.10.5.  NFS4ERR_NOMATCHING_LAYOUT (Error Code 10060)

16109 Returned when layouts are recalled and the client has no layouts
16110 matching the specification of the layouts being recalled.

16112 15.1.10.6.  NFS4ERR_PNFS_IO_HOLE (Error Code 10075)

16114 The pNFS client has attempted to read from or write to an illegal
16115 hole of a file on a data server that is using sparse packing.  See
16116 Section 13.4.4.

16118 15.1.10.7.  NFS4ERR_PNFS_NO_LAYOUT (Error Code 10080)

16120 The pNFS client has attempted to read from or write to a file (using
16121 a request to a data server) without holding a valid layout.  This
16122 includes the case where the client had a layout, but the iomode does
16123 not allow a WRITE.

16125 15.1.10.8.  NFS4ERR_RETURNCONFLICT (Error Code 10086)

16127 A layout is unavailable due to an attempt to perform a LAYOUTGET
16128 before a pending LAYOUTRETURN on the file has been received.  See
16129 Section 12.5.5.2.1.3.

16131 15.1.10.9.  NFS4ERR_UNKNOWN_LAYOUTTYPE (Error Code 10062)

16133 The client has specified a layout type that is not supported by the
16134 server.

16136 15.1.11.  Session Use Errors

16138 This section deals with errors encountered in using sessions, that
16139 is, in issuing requests over them using the Sequence (i.e., either
16140 SEQUENCE or CB_SEQUENCE) operations.

16142 15.1.11.1.  NFS4ERR_BADSESSION (Error Code 10052)

16144 A session ID was specified that does not exist.

16146 15.1.11.2.  NFS4ERR_BADSLOT (Error Code 10053)

16148 The requester sent a Sequence operation that attempted to use a slot
16149 the replier does not have in its slot table.  It is possible the slot
16150 may have been retired.

16152 15.1.11.3.  NFS4ERR_BAD_HIGH_SLOT (Error Code 10077)

16154 The highest_slot argument in a Sequence operation exceeds the
16155 replier's enforced highest_slotid.

16157 15.1.11.4.  NFS4ERR_CB_PATH_DOWN (Error Code 10048)

16159 There is a problem contacting the client via the callback path.  The
16160 function of this error has been mostly superseded by the use of
16161 status flags in the reply to the SEQUENCE operation (see
16162 Section 18.46).

16164 15.1.11.5.  NFS4ERR_DEADSESSION (Error Code 10078)

16166 The specified session is a persistent session that is dead and does
16167 not accept new requests or perform new operations on existing
16168 requests (in the case in which a request was partially executed
16169 before server restart).
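A client that receives NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION generally
cannot make further use of the session named in the request and will
typically establish a replacement session before retrying.  The sketch below
is non-normative and shows that reaction only in outline; nfs_client,
nfs_session, nfs_request, and the three helper functions are hypothetical,
and recovery steps such as redoing EXCHANGE_ID or reclaiming state after a
server restart are deliberately omitted.

   /*
    * Illustrative reaction to NFS4ERR_BADSESSION / NFS4ERR_DEADSESSION.
    * All types and helpers are hypothetical stand-ins.
    */
   enum {
       NFS4_OK             = 0,
       NFS4ERR_BADSESSION  = 10052,
       NFS4ERR_DEADSESSION = 10078
   };

   struct nfs_client;    /* opaque: client ID from EXCHANGE_ID */
   struct nfs_session;   /* opaque: session ID, slot tables */
   struct nfs_request;   /* opaque: a COMPOUND to be (re)issued */

   extern int  send_request(struct nfs_session *sp, struct nfs_request *rq);
   extern int  create_session(struct nfs_client *cp, struct nfs_session **spp);
   extern void destroy_session(struct nfs_session *sp);

   static int issue_with_session_recovery(struct nfs_client *cp,
                                          struct nfs_session **spp,
                                          struct nfs_request *rq)
   {
       int status = send_request(*spp, rq);

       if (status == NFS4ERR_BADSESSION || status == NFS4ERR_DEADSESSION) {
           /* The old session is unusable: discard the client's record
            * of it, establish a replacement with CREATE_SESSION, and
            * reissue the request using the new session's slots. */
           destroy_session(*spp);
           if (create_session(cp, spp) != NFS4_OK)
               return status;         /* give up; report original error */
           status = send_request(*spp, rq);
       }
       return status;
   }

Whether CREATE_SESSION alone is enough depends on why the session was lost;
after a server restart it may itself fail (for example, with
NFS4ERR_STALE_CLIENTID), in which case the client has to fall back to
EXCHANGE_ID and state reclaim, which this sketch does not attempt.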
16171 15.1.11.6.  NFS4ERR_CONN_NOT_BOUND_TO_SESSION (Error Code 10055)

16173 A Sequence operation was sent on a connection that has not been
16174 associated with the specified session, where the client specified
16175 that connection association was to be enforced with SP4_MACH_CRED or
16176 SP4_SSV state protection.

16178 15.1.11.7.  NFS4ERR_SEQ_FALSE_RETRY (Error Code 10076)

16180 The requester sent a Sequence operation with a slot ID and sequence
16181 ID that are in the reply cache, but the replier has detected that the
16182 retried request is not the same as the original request.

16184 15.1.11.8.  NFS4ERR_SEQ_MISORDERED (Error Code 10063)

16186 The requester sent a Sequence operation with an invalid sequence ID.

16188 15.1.12.  Session Management Errors

16190 This section deals with errors associated with requests used in
16191 session management.

16193 15.1.12.1.  NFS4ERR_BACK_CHAN_BUSY (Error Code 10057)

16195 An attempt was made to destroy a session when the session cannot be
16196 destroyed because the server has callback requests outstanding.

16198 15.1.12.2.  NFS4ERR_BAD_SESSION_DIGEST (Error Code 10051)

16200 The digest used in a SET_SSV request is not valid.

16202 15.1.13.  Client Management Errors

16204 This section deals with errors associated with requests used to
16205 create and manage client IDs.

16207 15.1.13.1.  NFS4ERR_CLIENTID_BUSY (Error Code 10074)

16209 The DESTROY_CLIENTID operation has found that there are sessions
16210 and/or unexpired state associated with the client ID to be destroyed.

16212 15.1.13.2.  NFS4ERR_CLID_INUSE (Error Code 10017)

16214 While processing an EXCHANGE_ID operation, the server was presented
16215 with a co_ownerid field that matches an existing client with valid
16216 leased state, but the principal issuing the EXCHANGE_ID is different
16217 from the one that established the existing client.  This indicates a
16218 collision (most likely due to chance) between clients.  The client
16219 should recover by changing the co_ownerid and re-sending EXCHANGE_ID
16220 (but not with the same slot ID and sequence ID; one or both MUST be
16221 different on the re-send).

16223 15.1.13.3.  NFS4ERR_ENCR_ALG_UNSUPP (Error Code 10079)

16225 An EXCHANGE_ID was sent that specified state protection via SSV, but
16226 the set of encryption algorithms presented by the client did not
16227 include any supported by the server.

16229 15.1.13.4.  NFS4ERR_HASH_ALG_UNSUPP (Error Code 10072)

16231 An EXCHANGE_ID was sent that specified state protection via SSV, but
16232 the set of hashing algorithms presented by the client did not include
16233 any supported by the server.

16235 15.1.13.5.  NFS4ERR_STALE_CLIENTID (Error Code 10022)

16237 A client ID not recognized by the server was passed to an operation.
16238 Note that, unlike the case of NFSv4.0, client IDs are not passed
16239 explicitly to the server in ordinary locking operations and cannot
16240 result in this error.  Instead, when there is a server restart, it is
16241 first manifested through an error on the associated session, and the
16242 staleness of the client ID is detected when trying to associate that
16243 client ID with a new session.

16245 15.1.14.  Delegation Errors

16247 This section deals with errors associated with requesting and
16248 returning delegations.

16250 15.1.14.1.  NFS4ERR_DELEG_ALREADY_WANTED (Error Code 10056)

16252 The client has requested a delegation when it had already registered
16253 that it wants that same delegation.
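A straightforward way for a client to avoid this error is to remember which
delegation wants it has already registered and to consult that record before
asking again in an OPEN or WANT_DELEGATION.  The fragment below is a
non-normative sketch of such bookkeeping; the fixed-size table, the field
names, and the helper are hypothetical and chosen purely for illustration.

   /*
    * Illustrative client-side bookkeeping used to avoid registering
    * the same delegation want twice (NFS4ERR_DELEG_ALREADY_WANTED).
    * The table layout and helper are hypothetical.
    */
   #include <stdbool.h>
   #include <string.h>

   #define FH_MAX 128                 /* NFS4_FHSIZE: max filehandle length */

   struct deleg_want {
       unsigned char fh[FH_MAX];      /* filehandle the want was made on */
       unsigned int  fh_len;
       unsigned int  want_flags;      /* which delegation type was wanted */
       bool          in_use;
   };

   static struct deleg_want want_table[64];    /* toy fixed-size table */

   static bool want_already_registered(const unsigned char *fh,
                                       unsigned int fh_len,
                                       unsigned int want_flags)
   {
       size_t i;

       for (i = 0; i < sizeof(want_table) / sizeof(want_table[0]); i++) {
           const struct deleg_want *w = &want_table[i];

           if (w->in_use && w->fh_len == fh_len &&
               w->want_flags == want_flags &&
               memcmp(w->fh, fh, fh_len) == 0)
               return true;           /* same want already outstanding */
       }
       return false;
   }

A real client would also drop an entry once the delegation is granted or the
want is otherwise resolved (for example, when the server cancels outstanding
wants via CB_WANTS_CANCELLED).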
16255 15.1.14.2.  NFS4ERR_DIRDELEG_UNAVAIL (Error Code 10084)

16257 This error is returned when the server is unable or unwilling to
16258 provide a requested directory delegation.

16260 15.1.14.3.  NFS4ERR_RECALLCONFLICT (Error Code 10061)

16262 A recallable object (i.e., a layout or delegation) is unavailable due
16263 to a conflicting recall operation for that object that is currently
16264 in progress.

16266 15.1.14.4.  NFS4ERR_REJECT_DELEG (Error Code 10085)

16268 The callback operation invoked to deal with a new delegation has
16269 rejected it.

16271 15.1.15.  Attribute Handling Errors

16273 This section deals with errors specific to attribute handling within
16274 NFSv4.

16276 15.1.15.1.  NFS4ERR_ATTRNOTSUPP (Error Code 10032)

16278 An attribute specified is not supported by the server.  This error
16279 MUST NOT be returned by the GETATTR operation.

16281 15.1.15.2.  NFS4ERR_BADOWNER (Error Code 10039)

16283 Returned when an owner or owner_group attribute value or the who
16284 field of an ACE within an ACL attribute value cannot be translated to
16285 a local representation.

16287 15.1.15.3.  NFS4ERR_NOT_SAME (Error Code 10027)

16289 This error is returned by the VERIFY operation to signify that the
16290 attributes compared were not the same as those provided in the
16291 client's request.

16293 15.1.15.4.  NFS4ERR_SAME (Error Code 10009)

16295 This error is returned by the NVERIFY operation to signify that the
16296 attributes compared were the same as those provided in the client's
16297 request.

16299 15.1.16.  Obsoleted Errors

16301 These errors MUST NOT be generated by any NFSv4.1 operation.  This
16302 can be for a number of reasons:

16304 o  The function provided by the error has been superseded by one of
16305    the status bits returned by the SEQUENCE operation.

16307 o  The new session structure and associated change in locking have
16308    made the error unnecessary.

16310 o  There has been a restructuring of some errors for NFSv4.1 that
16311    resulted in the elimination of certain errors.

16313 15.1.16.1.  NFS4ERR_BAD_SEQID (Error Code 10026)

16315 The sequence number (seqid) in a locking request is neither the next
16316 expected number nor the last number processed.  These seqids are
16317 ignored in NFSv4.1.

16319 15.1.16.2.  NFS4ERR_LEASE_MOVED (Error Code 10031)

16321 A lease being renewed is associated with a file system that has been
16322 migrated to a new server.  The error has been superseded by the
16323 SEQ4_STATUS_LEASE_MOVED status bit (see Section 18.46).

16325 15.1.16.3.  NFS4ERR_NXIO (Error Code 5)

16327 I/O error.  No such device or address.  This error is for errors
16328 involving block and character device access, but NFSv4.1 is not a
16329 device access protocol.

16331 15.1.16.4.  NFS4ERR_RESTOREFH (Error Code 10030)

16333 The RESTOREFH operation does not have a saved filehandle (identified
16334 by SAVEFH) to operate upon.  In NFSv4.1, this error has been
16335 superseded by NFS4ERR_NOFILEHANDLE.

16337 15.1.16.5.  NFS4ERR_STALE_STATEID (Error Code 10023)

16339 A stateid generated by an earlier server instance was used.  This
16340 error is moot in NFSv4.1 because all operations that take a stateid
16341 MUST be preceded by the SEQUENCE operation, and the earlier server
16342 instance is detected by the session infrastructure that supports
16343 SEQUENCE.

16345 15.2.  Operations and their valid errors

16347 This section contains a table which gives the valid error returns for
16348 each protocol operation.
The error code NFS4_OK (indicating no 16349 error) is not listed but should be understood to be returnable by all 16350 operations with two important exceptions: 16352 o The operations which MUST NOT be implemented: OPEN_CONFIRM, 16353 RELEASE_LOCKOWNER, RENEW, SETCLIENTID, and SETCLIENTID_CONFIRM. 16355 o The invalid operation: ILLEGAL. 16357 Valid error returns for each protocol operation 16359 +----------------------+--------------------------------------------+ 16360 | Operation | Errors | 16361 +----------------------+--------------------------------------------+ 16362 | ACCESS | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | 16363 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16364 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 16365 | | NFS4ERR_IO, NFS4ERR_MOVED, | 16366 | | NFS4ERR_NOFILEHANDLE, | 16367 | | NFS4ERR_OP_NOT_IN_SESSION, | 16368 | | NFS4ERR_REP_TOO_BIG, | 16369 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16370 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16371 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS | 16372 | BACKCHANNEL_CTL | NFS4ERR_BADXDR, NFS4ERR_DEADSESSION, | 16373 | | NFS4ERR_DELAY, NFS4ERR_INVAL, | 16374 | | NFS4ERR_NOENT, NFS4ERR_OP_NOT_IN_SESSION, | 16375 | | NFS4ERR_REP_TOO_BIG, | 16376 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16377 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS | 16378 | BIND_CONN_TO_SESSION | NFS4ERR_BADSESSION, NFS4ERR_BADXDR, | 16379 | | NFS4ERR_BAD_SESSION_DIGEST, | 16380 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16381 | | NFS4ERR_INVAL, NFS4ERR_NOT_ONLY_OP, | 16382 | | NFS4ERR_REP_TOO_BIG, | 16383 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16384 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16385 | | NFS4ERR_TOO_MANY_OPS | 16386 | CLOSE | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | 16387 | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | 16388 | | NFS4ERR_DELAY, NFS4ERR_EXPIRED, | 16389 | | NFS4ERR_FHEXPIRED, NFS4ERR_LOCKS_HELD, | 16390 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 16391 | | NFS4ERR_OLD_STATEID, | 16392 | | NFS4ERR_OP_NOT_IN_SESSION, | 16393 | | NFS4ERR_REP_TOO_BIG, | 16394 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16395 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16396 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | 16397 | | NFS4ERR_WRONG_CRED | 16398 | COMMIT | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | 16399 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16400 | | NFS4ERR_FHEXPIRED, NFS4ERR_IO, | 16401 | | NFS4ERR_ISDIR, NFS4ERR_MOVED, | 16402 | | NFS4ERR_NOFILEHANDLE, | 16403 | | NFS4ERR_OP_NOT_IN_SESSION, | 16404 | | NFS4ERR_REP_TOO_BIG, | 16405 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16406 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16407 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 16408 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE | 16409 | CREATE | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP, | 16410 | | NFS4ERR_BADCHAR, NFS4ERR_BADNAME, | 16411 | | NFS4ERR_BADOWNER, NFS4ERR_BADTYPE, | 16412 | | NFS4ERR_BADXDR, NFS4ERR_DEADSESSION, | 16413 | | NFS4ERR_DELAY, NFS4ERR_DQUOT, | 16414 | | NFS4ERR_EXIST, NFS4ERR_FHEXPIRED, | 16415 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MLINK, | 16416 | | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG, | 16417 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 16418 | | NFS4ERR_NOTDIR, NFS4ERR_OP_NOT_IN_SESSION, | 16419 | | NFS4ERR_PERM, NFS4ERR_REP_TOO_BIG, | 16420 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16421 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | 16422 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 16423 | | NFS4ERR_TOO_MANY_OPS, | 16424 | | NFS4ERR_UNSAFE_COMPOUND | 16425 | CREATE_SESSION | NFS4ERR_BADXDR, NFS4ERR_DEADSESSION, | 16426 | | NFS4ERR_DELAY, NFS4ERR_INVAL, | 16427 | | 
NFS4ERR_NOENT, NFS4ERR_NOT_ONLY_OP, | 16428 | | NFS4ERR_NOSPC, NFS4ERR_REP_TOO_BIG, | 16429 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16430 | | NFS4ERR_REQ_TOO_BIG, | 16431 | | NFS4ERR_SEQ_MISORDERED, | 16432 | | NFS4ERR_SERVERFAULT, | 16433 | | NFS4ERR_STALE_CLIENTID, NFS4ERR_TOOSMALL, | 16434 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED | 16435 | DELEGPURGE | NFS4ERR_BADXDR, NFS4ERR_DEADSESSION, | 16436 | | NFS4ERR_DELAY, NFS4ERR_NOTSUPP, | 16437 | | NFS4ERR_OP_NOT_IN_SESSION, | 16438 | | NFS4ERR_REP_TOO_BIG, | 16439 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16440 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16441 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED | 16442 | DELEGRETURN | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | 16443 | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | 16444 | | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, | 16445 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 16446 | | NFS4ERR_INVAL, NFS4ERR_MOVED, | 16447 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | 16448 | | NFS4ERR_OLD_STATEID, | 16449 | | NFS4ERR_OP_NOT_IN_SESSION, | 16450 | | NFS4ERR_REP_TOO_BIG, | 16451 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16452 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16453 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | 16454 | | NFS4ERR_WRONG_CRED | 16455 | DESTROY_CLIENTID | NFS4ERR_BADXDR, NFS4ERR_CLIENTID_BUSY, | 16456 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16457 | | NFS4ERR_NOT_ONLY_OP, NFS4ERR_REP_TOO_BIG, | 16458 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16459 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16460 | | NFS4ERR_STALE_CLIENTID, | 16461 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED | 16462 | DESTROY_SESSION | NFS4ERR_BACK_CHAN_BUSY, | 16463 | | NFS4ERR_BADSESSION, NFS4ERR_BADXDR, | 16464 | | NFS4ERR_CB_PATH_DOWN, | 16465 | | NFS4ERR_CONN_NOT_BOUND_TO_SESSION, | 16466 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16467 | | NFS4ERR_NOT_ONLY_OP, NFS4ERR_REP_TOO_BIG, | 16468 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16469 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16470 | | NFS4ERR_STALE_CLIENTID, | 16471 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED | 16472 | EXCHANGE_ID | NFS4ERR_BADCHAR, NFS4ERR_BADXDR, | 16473 | | NFS4ERR_CLID_INUSE, NFS4ERR_DEADSESSION, | 16474 | | NFS4ERR_DELAY, NFS4ERR_ENCR_ALG_UNSUPP, | 16475 | | NFS4ERR_HASH_ALG_UNSUPP, NFS4ERR_INVAL, | 16476 | | NFS4ERR_NOENT, NFS4ERR_NOT_ONLY_OP, | 16477 | | NFS4ERR_NOT_SAME, NFS4ERR_REP_TOO_BIG, | 16478 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16479 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16480 | | NFS4ERR_TOO_MANY_OPS | 16481 | FREE_STATEID | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 16482 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16483 | | NFS4ERR_LOCKS_HELD, NFS4ERR_OLD_STATEID, | 16484 | | NFS4ERR_OP_NOT_IN_SESSION, | 16485 | | NFS4ERR_REP_TOO_BIG, | 16486 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16487 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16488 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED | 16489 | GET_DIR_DELEGATION | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | 16490 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16491 | | NFS4ERR_DIRDELEG_UNAVAIL, | 16492 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 16493 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 16494 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 16495 | | NFS4ERR_NOTSUPP, | 16496 | | NFS4ERR_OP_NOT_IN_SESSION, | 16497 | | NFS4ERR_REP_TOO_BIG, | 16498 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16499 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16500 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS | 16501 | GETATTR | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | 16502 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16503 | | 
NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 16504 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 16505 | | NFS4ERR_NOFILEHANDLE, | 16506 | | NFS4ERR_OP_NOT_IN_SESSION, | 16507 | | NFS4ERR_REP_TOO_BIG, | 16508 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16509 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16510 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | 16511 | | NFS4ERR_WRONG_TYPE | 16512 | GETDEVICEINFO | NFS4ERR_BADXDR, NFS4ERR_DEADSESSION, | 16513 | | NFS4ERR_DELAY, NFS4ERR_INVAL, | 16514 | | NFS4ERR_NOENT, NFS4ERR_NOTSUPP, | 16515 | | NFS4ERR_OP_NOT_IN_SESSION, | 16516 | | NFS4ERR_REP_TOO_BIG, | 16517 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16518 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16519 | | NFS4ERR_TOOSMALL, NFS4ERR_TOO_MANY_OPS, | 16520 | | NFS4ERR_UNKNOWN_LAYOUTTYPE | 16521 | GETDEVICELIST | NFS4ERR_BADXDR, NFS4ERR_BAD_COOKIE, | 16522 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16523 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 16524 | | NFS4ERR_IO, NFS4ERR_NOFILEHANDLE, | 16525 | | NFS4ERR_NOTSUPP, NFS4ERR_NOT_SAME, | 16526 | | NFS4ERR_OP_NOT_IN_SESSION, | 16527 | | NFS4ERR_REP_TOO_BIG, | 16528 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16529 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16530 | | NFS4ERR_TOO_MANY_OPS, | 16531 | | NFS4ERR_UNKNOWN_LAYOUTTYPE | 16532 | GETFH | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, | 16533 | | NFS4ERR_NOFILEHANDLE, | 16534 | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_STALE | 16535 | ILLEGAL | NFS4ERR_BADXDR NFS4ERR_OP_ILLEGAL | 16536 | LAYOUTCOMMIT | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 16537 | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADIOMODE, | 16538 | | NFS4ERR_BADLAYOUT, NFS4ERR_BADXDR, | 16539 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16540 | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | 16541 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 16542 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR | 16543 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 16544 | | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE, | 16545 | | NFS4ERR_OP_NOT_IN_SESSION, | 16546 | | NFS4ERR_RECLAIM_BAD, | 16547 | | NFS4ERR_RECLAIM_CONFLICT, | 16548 | | NFS4ERR_REP_TOO_BIG, | 16549 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16550 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16551 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 16552 | | NFS4ERR_TOO_MANY_OPS, | 16553 | | NFS4ERR_UNKNOWN_LAYOUTTYPE, | 16554 | | NFS4ERR_WRONG_CRED | 16555 | LAYOUTGET | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 16556 | | NFS4ERR_BADIOMODE, NFS4ERR_BADLAYOUT, | 16557 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 16558 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16559 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, | 16560 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 16561 | | NFS4ERR_INVAL, NFS4ERR_IO, | 16562 | | NFS4ERR_LAYOUTTRYLATER, | 16563 | | NFS4ERR_LAYOUTUNAVAILABLE, NFS4ERR_LOCKED, | 16564 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 16565 | | NFS4ERR_NOSPC, NFS4ERR_NOTSUPP, | 16566 | | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, | 16567 | | NFS4ERR_OP_NOT_IN_SESSION, | 16568 | | NFS4ERR_RECALLCONFLICT, | 16569 | | NFS4ERR_REP_TOO_BIG, | 16570 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16571 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16572 | | NFS4ERR_STALE, NFS4ERR_TOOSMALL, | 16573 | | NFS4ERR_TOO_MANY_OPS, | 16574 | | NFS4ERR_UNKNOWN_LAYOUTTYPE, | 16575 | | NFS4ERR_WRONG_TYPE | 16576 | LAYOUTRETURN | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | 16577 | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | 16578 | | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, | 16579 | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | 16580 | | NFS4ERR_GRACE, NFS4ERR_INVAL, | 16581 | | NFS4ERR_ISDIR, NFS4ERR_MOVED, | 16582 | | 
NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | 16583 | | NFS4ERR_NO_GRACE, NFS4ERR_OLD_STATEID, | 16584 | | NFS4ERR_OP_NOT_IN_SESSION, | 16585 | | NFS4ERR_REP_TOO_BIG, | 16586 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16587 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16588 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | 16589 | | NFS4ERR_UNKNOWN_LAYOUTTYPE, | 16590 | | NFS4ERR_WRONG_CRED, NFS4ERR_WRONG_TYPE | 16591 | LINK | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 16592 | | NFS4ERR_BADNAME, NFS4ERR_BADXDR, | 16593 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16594 | | NFS4ERR_DQUOT, NFS4ERR_EXIST, | 16595 | | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN, | 16596 | | NFS4ERR_GRACE, NFS4ERR_INVAL, | 16597 | | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_MLINK, | 16598 | | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG, | 16599 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 16600 | | NFS4ERR_NOTDIR, NFS4ERR_NOTSUPP, | 16601 | | NFS4ERR_OP_NOT_IN_SESSION, | 16602 | | NFS4ERR_REP_TOO_BIG, | 16603 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16604 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | 16605 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 16606 | | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, | 16607 | | NFS4ERR_WRONGSEC, NFS4ERR_WRONG_TYPE, | 16608 | | NFS4ERR_XDEV | 16609 | LOCK | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 16610 | | NFS4ERR_BADXDR, NFS4ERR_BAD_RANGE, | 16611 | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADLOCK, | 16612 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16613 | | NFS4ERR_DENIED, NFS4ERR_EXPIRED, | 16614 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 16615 | | NFS4ERR_INVAL, NFS4ERR_ISDIR, | 16616 | | NFS4ERR_LOCK_NOTSUPP, NFS4ERR_LOCK_RANGE, | 16617 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 16618 | | NFS4ERR_NO_GRACE, NFS4ERR_OLD_STATEID, | 16619 | | NFS4ERR_OPENMODE, | 16620 | | NFS4ERR_OP_NOT_IN_SESSION, | 16621 | | NFS4ERR_RECLAIM_BAD, | 16622 | | NFS4ERR_RECLAIM_CONFLICT, | 16623 | | NFS4ERR_REP_TOO_BIG, | 16624 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16625 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | 16626 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 16627 | | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, | 16628 | | NFS4ERR_WRONG_CRED, NFS4ERR_WRONG_TYPE | 16629 | LOCKT | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | 16630 | | NFS4ERR_BAD_RANGE, NFS4ERR_DEADSESSION, | 16631 | | NFS4ERR_DELAY, NFS4ERR_DENIED, | 16632 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 16633 | | NFS4ERR_INVAL, NFS4ERR_ISDIR, | 16634 | | NFS4ERR_LOCK_RANGE, NFS4ERR_MOVED, | 16635 | | NFS4ERR_NOFILEHANDLE, | 16636 | | NFS4ERR_OP_NOT_IN_SESSION, | 16637 | | NFS4ERR_REP_TOO_BIG, | 16638 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16639 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | 16640 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 16641 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED, | 16642 | | NFS4ERR_WRONG_TYPE | 16643 | LOCKU | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 16644 | | NFS4ERR_BADXDR, NFS4ERR_BAD_RANGE, | 16645 | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | 16646 | | NFS4ERR_DELAY, NFS4ERR_EXPIRED, | 16647 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 16648 | | NFS4ERR_LOCK_RANGE, NFS4ERR_MOVED, | 16649 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_OLD_STATEID, | 16650 | | NFS4ERR_OP_NOT_IN_SESSION, | 16651 | | NFS4ERR_REP_TOO_BIG, | 16652 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16653 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16654 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | 16655 | | NFS4ERR_WRONG_CRED | 16656 | LOOKUP | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 16657 | | NFS4ERR_BADNAME, NFS4ERR_BADXDR, | 16658 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16659 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 16660 | | NFS4ERR_IO, NFS4ERR_MOVED, | 16661 | | 
NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 16662 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 16663 | | NFS4ERR_OP_NOT_IN_SESSION, | 16664 | | NFS4ERR_REP_TOO_BIG, | 16665 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16666 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16667 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 16668 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONGSEC | 16669 | LOOKUPP | NFS4ERR_ACCESS, NFS4ERR_DEADSESSION, | 16670 | | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED, | 16671 | | NFS4ERR_IO, NFS4ERR_MOVED, NFS4ERR_NOENT, | 16672 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 16673 | | NFS4ERR_OP_NOT_IN_SESSION, | 16674 | | NFS4ERR_REP_TOO_BIG, | 16675 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16676 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16677 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 16678 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONGSEC | 16679 | NVERIFY | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP, | 16680 | | NFS4ERR_BADCHAR, NFS4ERR_BADXDR, | 16681 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16682 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 16683 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 16684 | | NFS4ERR_NOFILEHANDLE, | 16685 | | NFS4ERR_OP_NOT_IN_SESSION, | 16686 | | NFS4ERR_REP_TOO_BIG, | 16687 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16688 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SAME, | 16689 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 16690 | | NFS4ERR_TOO_MANY_OPS, | 16691 | | NFS4ERR_UNKNOWN_LAYOUTTYPE, | 16692 | | NFS4ERR_WRONG_TYPE | 16693 | OPEN | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 16694 | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADCHAR, | 16695 | | NFS4ERR_BADNAME, NFS4ERR_BADOWNER, | 16696 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 16697 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16698 | | NFS4ERR_DELEG_ALREADY_WANTED, | 16699 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, | 16700 | | NFS4ERR_EXIST, NFS4ERR_EXPIRED, | 16701 | | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, | 16702 | | NFS4ERR_GRACE, NFS4ERR_INVAL, | 16703 | | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_MOVED, | 16704 | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, | 16705 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 16706 | | NFS4ERR_NOTDIR, NFS4ERR_NO_GRACE, | 16707 | | NFS4ERR_OLD_STATEID, | 16708 | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_PERM, | 16709 | | NFS4ERR_RECLAIM_BAD, | 16710 | | NFS4ERR_RECLAIM_CONFLICT, | 16711 | | NFS4ERR_REP_TOO_BIG, | 16712 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16713 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | 16714 | | NFS4ERR_SERVERFAULT, NFS4ERR_SHARE_DENIED, | 16715 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 16716 | | NFS4ERR_TOO_MANY_OPS, | 16717 | | NFS4ERR_UNSAFE_COMPOUND, NFS4ERR_WRONGSEC, | 16718 | | NFS4ERR_WRONG_TYPE | 16719 | OPEN_CONFIRM | NFS4ERR_NOTSUPP | 16720 | OPEN_DOWNGRADE | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | 16721 | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | 16722 | | NFS4ERR_DELAY, NFS4ERR_EXPIRED, | 16723 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 16724 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 16725 | | NFS4ERR_OLD_STATEID, | 16726 | | NFS4ERR_OP_NOT_IN_SESSION, | 16727 | | NFS4ERR_REP_TOO_BIG, | 16728 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16729 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | 16730 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 16731 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED | 16732 | OPENATTR | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | 16733 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16734 | | NFS4ERR_DQUOT, NFS4ERR_FHEXPIRED, | 16735 | | NFS4ERR_IO, NFS4ERR_MOVED, NFS4ERR_NOENT, | 16736 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 16737 | | NFS4ERR_NOTSUPP, | 16738 | | NFS4ERR_OP_NOT_IN_SESSION, | 16739 | | NFS4ERR_REP_TOO_BIG, | 16740 | | 
NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16741 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | 16742 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 16743 | | NFS4ERR_TOO_MANY_OPS, | 16744 | | NFS4ERR_UNSAFE_COMPOUND, | 16745 | | NFS4ERR_WRONG_TYPE | 16746 | PUTFH | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | 16747 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16748 | | NFS4ERR_MOVED, NFS4ERR_OP_NOT_IN_SESSION, | 16749 | | NFS4ERR_REP_TOO_BIG, | 16750 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16751 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16752 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | 16753 | | NFS4ERR_WRONGSEC | 16754 | PUTPUBFH | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16755 | | NFS4ERR_OP_NOT_IN_SESSION, | 16756 | | NFS4ERR_REP_TOO_BIG, | 16757 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16758 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16759 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONGSEC | 16760 | PUTROOTFH | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16761 | | NFS4ERR_OP_NOT_IN_SESSION, | 16762 | | NFS4ERR_REP_TOO_BIG, | 16763 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16764 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16765 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONGSEC | 16766 | READ | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 16767 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 16768 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16769 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, | 16770 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 16771 | | NFS4ERR_INVAL, NFS4ERR_ISDIR, NFS4ERR_IO, | 16772 | | NFS4ERR_LOCKED, NFS4ERR_MOVED, | 16773 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_OLD_STATEID, | 16774 | | NFS4ERR_OPENMODE, | 16775 | | NFS4ERR_OP_NOT_IN_SESSION, | 16776 | | NFS4ERR_PNFS_IO_HOLE, | 16777 | | NFS4ERR_PNFS_NO_LAYOUT, | 16778 | | NFS4ERR_REP_TOO_BIG, | 16779 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16780 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16781 | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | 16782 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE | 16783 | READDIR | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | 16784 | | NFS4ERR_BAD_COOKIE, NFS4ERR_DEADSESSION, | 16785 | | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED, | 16786 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 16787 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 16788 | | NFS4ERR_NOT_SAME, | 16789 | | NFS4ERR_OP_NOT_IN_SESSION, | 16790 | | NFS4ERR_REP_TOO_BIG, | 16791 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16792 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16793 | | NFS4ERR_STALE, NFS4ERR_TOOSMALL, | 16794 | | NFS4ERR_TOO_MANY_OPS | 16795 | READLINK | NFS4ERR_ACCESS, NFS4ERR_DEADSESSION, | 16796 | | NFS4ERR_DELAY, NFS4ERR_FHEXPIRED, | 16797 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 16798 | | NFS4ERR_NOFILEHANDLE, | 16799 | | NFS4ERR_OP_NOT_IN_SESSION, | 16800 | | NFS4ERR_REP_TOO_BIG, | 16801 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16802 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16803 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | 16804 | | NFS4ERR_WRONG_TYPE | 16805 | RECLAIM_COMPLETE | NFS4ERR_BADXDR, NFS4ERR_COMPLETE_ALREADY, | 16806 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16807 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 16808 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 16809 | | NFS4ERR_OP_NOT_IN_SESSION, | 16810 | | NFS4ERR_REP_TOO_BIG, | 16811 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16812 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16813 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | 16814 | | NFS4ERR_WRONG_CRED, NFS4ERR_WRONG_TYPE | 16815 | RELEASE_LOCKOWNER | NFS4ERR_NOTSUPP | 16816 | REMOVE | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 16817 | | NFS4ERR_BADNAME, NFS4ERR_BADXDR, | 16818 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, 
| 16819 | | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN, | 16820 | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, | 16821 | | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG, | 16822 | | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE, | 16823 | | NFS4ERR_NOTDIR, NFS4ERR_NOTEMPTY, | 16824 | | NFS4ERR_OP_NOT_IN_SESSION, | 16825 | | NFS4ERR_REP_TOO_BIG, | 16826 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16827 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | 16828 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 16829 | | NFS4ERR_TOO_MANY_OPS | 16830 | RENAME | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 16831 | | NFS4ERR_BADNAME, NFS4ERR_BADXDR, | 16832 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16833 | | NFS4ERR_DQUOT, NFS4ERR_EXIST, | 16834 | | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN, | 16835 | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, | 16836 | | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG, | 16837 | | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE, | 16838 | | NFS4ERR_NOSPC, NFS4ERR_NOTDIR, | 16839 | | NFS4ERR_NOTEMPTY, | 16840 | | NFS4ERR_OP_NOT_IN_SESSION, | 16841 | | NFS4ERR_REP_TOO_BIG, | 16842 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16843 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | 16844 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 16845 | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONGSEC, | 16846 | | NFS4ERR_XDEV | 16847 | RENEW | NFS4ERR_NOTSUPP | 16848 | RESTOREFH | NFS4ERR_DEADSESSION, NFS4ERR_FHEXPIRED, | 16849 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 16850 | | NFS4ERR_OP_NOT_IN_SESSION, | 16851 | | NFS4ERR_REP_TOO_BIG, | 16852 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16853 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16854 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | 16855 | | NFS4ERR_WRONGSEC | 16856 | SAVEFH | NFS4ERR_DEADSESSION, NFS4ERR_FHEXPIRED, | 16857 | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | 16858 | | NFS4ERR_OP_NOT_IN_SESSION, | 16859 | | NFS4ERR_REP_TOO_BIG, | 16860 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16861 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16862 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS | 16863 | SECINFO | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | 16864 | | NFS4ERR_BADNAME, NFS4ERR_BADXDR, | 16865 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16866 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 16867 | | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG, | 16868 | | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE, | 16869 | | NFS4ERR_NOTDIR, NFS4ERR_OP_NOT_IN_SESSION, | 16870 | | NFS4ERR_REP_TOO_BIG, | 16871 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16872 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16873 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS | 16874 | SECINFO_NO_NAME | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | 16875 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16876 | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | 16877 | | NFS4ERR_MOVED, NFS4ERR_NOENT, | 16878 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTDIR, | 16879 | | NFS4ERR_NOTSUPP, | 16880 | | NFS4ERR_OP_NOT_IN_SESSION, | 16881 | | NFS4ERR_REP_TOO_BIG, | 16882 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16883 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16884 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS | 16885 | SEQUENCE | NFS4ERR_BADSESSION, NFS4ERR_BADSLOT, | 16886 | | NFS4ERR_BADXDR, NFS4ERR_BAD_HIGH_SLOT, | 16887 | | NFS4ERR_CONN_NOT_BOUND_TO_SESSION, | 16888 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16889 | | NFS4ERR_REP_TOO_BIG, | 16890 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16891 | | NFS4ERR_REQ_TOO_BIG, | 16892 | | NFS4ERR_RETRY_UNCACHED_REP, | 16893 | | NFS4ERR_SEQUENCE_POS, | 16894 | | NFS4ERR_SEQ_FALSE_RETRY, | 16895 | | NFS4ERR_SEQ_MISORDERED, | 16896 | | NFS4ERR_TOO_MANY_OPS | 16897 | SET_SSV | NFS4ERR_BADXDR, | 16898 | | NFS4ERR_BAD_SESSION_DIGEST, | 16899 | | 
NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16900 | | NFS4ERR_INVAL, NFS4ERR_OP_NOT_IN_SESSION, | 16901 | | NFS4ERR_REP_TOO_BIG, | 16902 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16903 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS | 16904 | SETATTR | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 16905 | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADCHAR, | 16906 | | NFS4ERR_BADOWNER, NFS4ERR_BADXDR, | 16907 | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | 16908 | | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, | 16909 | | NFS4ERR_DQUOT, NFS4ERR_EXPIRED, | 16910 | | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, | 16911 | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, | 16912 | | NFS4ERR_LOCKED, NFS4ERR_MOVED, | 16913 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 16914 | | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, | 16915 | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_PERM, | 16916 | | NFS4ERR_REP_TOO_BIG, | 16917 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16918 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | 16919 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 16920 | | NFS4ERR_TOO_MANY_OPS, | 16921 | | NFS4ERR_UNKNOWN_LAYOUTTYPE, | 16922 | | NFS4ERR_WRONG_TYPE | 16923 | SETCLIENTID | NFS4ERR_NOTSUPP | 16924 | SETCLIENTID_CONFIRM | NFS4ERR_NOTSUPP | 16925 | TEST_STATEID | NFS4ERR_BADXDR, NFS4ERR_DEADSESSION, | 16926 | | NFS4ERR_DELAY, NFS4ERR_OP_NOT_IN_SESSION, | 16927 | | NFS4ERR_REP_TOO_BIG, | 16928 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16929 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16930 | | NFS4ERR_TOO_MANY_OPS | 16931 | VERIFY | NFS4ERR_ACCESS, NFS4ERR_ATTRNOTSUPP, | 16932 | | NFS4ERR_BADCHAR, NFS4ERR_BADXDR, | 16933 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16934 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 16935 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 16936 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOT_SAME, | 16937 | | NFS4ERR_OP_NOT_IN_SESSION, | 16938 | | NFS4ERR_REP_TOO_BIG, | 16939 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16940 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16941 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | 16942 | | NFS4ERR_UNKNOWN_LAYOUTTYPE, | 16943 | | NFS4ERR_WRONG_TYPE | 16944 | WANT_DELEGATION | NFS4ERR_BADXDR, NFS4ERR_DEADSESSION, | 16945 | | NFS4ERR_DELAY, | 16946 | | NFS4ERR_DELEG_ALREADY_WANTED, | 16947 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 16948 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_MOVED, | 16949 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | 16950 | | NFS4ERR_NO_GRACE, | 16951 | | NFS4ERR_OP_NOT_IN_SESSION, | 16952 | | NFS4ERR_RECALLCONFLICT, | 16953 | | NFS4ERR_RECLAIM_BAD, | 16954 | | NFS4ERR_RECLAIM_CONFLICT, | 16955 | | NFS4ERR_REP_TOO_BIG, | 16956 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16957 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | 16958 | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | 16959 | | NFS4ERR_WRONG_TYPE | 16960 | WRITE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | 16961 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 16962 | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | 16963 | | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, | 16964 | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | 16965 | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | 16966 | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, | 16967 | | NFS4ERR_LOCKED, NFS4ERR_MOVED, | 16968 | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | 16969 | | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, | 16970 | | NFS4ERR_OP_NOT_IN_SESSION, | 16971 | | NFS4ERR_PNFS_IO_HOLE, | 16972 | | NFS4ERR_PNFS_NO_LAYOUT, | 16973 | | NFS4ERR_REP_TOO_BIG, | 16974 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 16975 | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | 16976 | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | 16977 | | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, | 16978 | | 
NFS4ERR_WRONG_TYPE | 16979 +----------------------+--------------------------------------------+ 16981 Table 12 16983 15.3. Callback operations and their valid errors 16985 This section contains a table which gives the valid error returns for 16986 each callback operation. The error code NFS4_OK (indicating no 16987 error) is not listed but should be understood to be returnable by all 16988 callback operations with the exception of CB_ILLEGAL. 16990 Valid error returns for each protocol callback operation 16992 +-------------------------+-----------------------------------------+ 16993 | Callback Operation | Errors | 16994 +-------------------------+-----------------------------------------+ 16995 | CB_GETATTR | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | 16996 | | NFS4ERR_DELAY, NFS4ERR_INVAL, | 16997 | | NFS4ERR_OP_NOT_IN_SESSION, | 16998 | | NFS4ERR_REP_TOO_BIG, | 16999 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 17000 | | NFS4ERR_REQ_TOO_BIG, | 17001 | | NFS4ERR_SERVERFAULT, | 17002 | | NFS4ERR_TOO_MANY_OPS, | 17003 | CB_ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL | 17004 | CB_LAYOUTRECALL | NFS4ERR_BADHANDLE, NFS4ERR_BADIOMODE, | 17005 | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | 17006 | | NFS4ERR_DELAY, NFS4ERR_INVAL, | 17007 | | NFS4ERR_NOMATCHING_LAYOUT, | 17008 | | NFS4ERR_NOTSUPP, | 17009 | | NFS4ERR_OP_NOT_IN_SESSION, | 17010 | | NFS4ERR_REP_TOO_BIG, | 17011 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 17012 | | NFS4ERR_REQ_TOO_BIG, | 17013 | | NFS4ERR_TOO_MANY_OPS, | 17014 | | NFS4ERR_UNKNOWN_LAYOUTTYPE, | 17015 | | NFS4ERR_WRONG_TYPE | 17016 | CB_NOTIFY | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | 17017 | | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY, | 17018 | | NFS4ERR_INVAL, NFS4ERR_NOTSUPP, | 17019 | | NFS4ERR_OP_NOT_IN_SESSION, | 17020 | | NFS4ERR_REP_TOO_BIG, | 17021 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 17022 | | NFS4ERR_REQ_TOO_BIG, | 17023 | | NFS4ERR_SERVERFAULT, | 17024 | | NFS4ERR_TOO_MANY_OPS | 17025 | CB_NOTIFY_LOCK | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | 17026 | | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY, | 17027 | | NFS4ERR_NOTSUPP, | 17028 | | NFS4ERR_OP_NOT_IN_SESSION, | 17029 | | NFS4ERR_REP_TOO_BIG, | 17030 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 17031 | | NFS4ERR_REQ_TOO_BIG, | 17032 | | NFS4ERR_SERVERFAULT, | 17033 | | NFS4ERR_TOO_MANY_OPS | 17034 | CB_PUSH_DELEG | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | 17035 | | NFS4ERR_DELAY, NFS4ERR_INVAL, | 17036 | | NFS4ERR_NOTSUPP, | 17037 | | NFS4ERR_OP_NOT_IN_SESSION, | 17038 | | NFS4ERR_REJECT_DELEG, | 17039 | | NFS4ERR_REP_TOO_BIG, | 17040 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 17041 | | NFS4ERR_REQ_TOO_BIG, | 17042 | | NFS4ERR_SERVERFAULT, | 17043 | | NFS4ERR_TOO_MANY_OPS, | 17044 | | NFS4ERR_WRONG_TYPE | 17045 | CB_RECALL | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | 17046 | | NFS4ERR_BAD_STATEID, NFS4ERR_DELAY, | 17047 | | NFS4ERR_OP_NOT_IN_SESSION, | 17048 | | NFS4ERR_REP_TOO_BIG, | 17049 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 17050 | | NFS4ERR_REQ_TOO_BIG, | 17051 | | NFS4ERR_SERVERFAULT, | 17052 | | NFS4ERR_TOO_MANY_OPS | 17053 | CB_RECALL_ANY | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 17054 | | NFS4ERR_INVAL, | 17055 | | NFS4ERR_OP_NOT_IN_SESSION, | 17056 | | NFS4ERR_REP_TOO_BIG, | 17057 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 17058 | | NFS4ERR_REQ_TOO_BIG, | 17059 | | NFS4ERR_TOO_MANY_OPS | 17060 | CB_RECALLABLE_OBJ_AVAIL | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 17061 | | NFS4ERR_INVAL, NFS4ERR_NOTSUPP, | 17062 | | NFS4ERR_OP_NOT_IN_SESSION, | 17063 | | NFS4ERR_REP_TOO_BIG, | 17064 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 17065 | | NFS4ERR_REQ_TOO_BIG, | 17066 | | NFS4ERR_SERVERFAULT, | 17067 
| | NFS4ERR_TOO_MANY_OPS | 17068 | CB_RECALL_SLOT | NFS4ERR_BADXDR, NFS4ERR_BAD_HIGH_SLOT, | 17069 | | NFS4ERR_DELAY, | 17070 | | NFS4ERR_OP_NOT_IN_SESSION, | 17071 | | NFS4ERR_REP_TOO_BIG, | 17072 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 17073 | | NFS4ERR_REQ_TOO_BIG, | 17074 | | NFS4ERR_TOO_MANY_OPS | 17075 | CB_SEQUENCE | NFS4ERR_BADSESSION, NFS4ERR_BADSLOT, | 17076 | | NFS4ERR_BADXDR, NFS4ERR_BAD_HIGH_SLOT, | 17077 | | NFS4ERR_CONN_NOT_BOUND_TO_SESSION, | 17078 | | NFS4ERR_DELAY, NFS4ERR_REP_TOO_BIG, | 17079 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 17080 | | NFS4ERR_REQ_TOO_BIG, | 17081 | | NFS4ERR_RETRY_UNCACHED_REP, | 17082 | | NFS4ERR_SEQUENCE_POS, | 17083 | | NFS4ERR_SEQ_FALSE_RETRY, | 17084 | | NFS4ERR_SEQ_MISORDERED, | 17085 | | NFS4ERR_TOO_MANY_OPS | 17086 | CB_WANTS_CANCELLED | NFS4ERR_BADXDR, NFS4ERR_DELAY, | 17087 | | NFS4ERR_NOTSUPP, | 17088 | | NFS4ERR_OP_NOT_IN_SESSION, | 17089 | | NFS4ERR_REP_TOO_BIG, | 17090 | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | 17091 | | NFS4ERR_REQ_TOO_BIG, | 17092 | | NFS4ERR_SERVERFAULT, | 17093 | | NFS4ERR_TOO_MANY_OPS | 17094 +-------------------------+-----------------------------------------+ 17096 Table 13 17098 15.4. Errors and the operations that use them 17100 +-----------------------------------+-------------------------------+ 17101 | Error | Operations | 17102 +-----------------------------------+-------------------------------+ 17103 | NFS4ERR_ACCESS | ACCESS, COMMIT, CREATE, | 17104 | | GETATTR, GET_DIR_DELEGATION, | 17105 | | LAYOUTCOMMIT, LAYOUTGET, | 17106 | | LINK, LOCK, LOCKT, LOCKU, | 17107 | | LOOKUP, LOOKUPP, NVERIFY, | 17108 | | OPEN, OPENATTR, READ, | 17109 | | READDIR, READLINK, REMOVE, | 17110 | | RENAME, SECINFO, | 17111 | | SECINFO_NO_NAME, SETATTR, | 17112 | | VERIFY, WRITE | 17113 | NFS4ERR_ADMIN_REVOKED | CLOSE, DELEGRETURN, | 17114 | | LAYOUTCOMMIT, LAYOUTGET, | 17115 | | LAYOUTRETURN, LOCK, LOCKU, | 17116 | | OPEN, OPEN_DOWNGRADE, READ, | 17117 | | SETATTR, WRITE | 17118 | NFS4ERR_ATTRNOTSUPP | CREATE, LAYOUTCOMMIT, | 17119 | | NVERIFY, OPEN, SETATTR, | 17120 | | VERIFY | 17121 | NFS4ERR_BACK_CHAN_BUSY | DESTROY_SESSION | 17122 | NFS4ERR_BADCHAR | CREATE, EXCHANGE_ID, LINK, | 17123 | | LOOKUP, NVERIFY, OPEN, | 17124 | | REMOVE, RENAME, SECINFO, | 17125 | | SETATTR, VERIFY | 17126 | NFS4ERR_BADHANDLE | CB_GETATTR, CB_LAYOUTRECALL, | 17127 | | CB_NOTIFY, CB_NOTIFY_LOCK, | 17128 | | CB_PUSH_DELEG, CB_RECALL, | 17129 | | PUTFH | 17130 | NFS4ERR_BADIOMODE | CB_LAYOUTRECALL, | 17131 | | LAYOUTCOMMIT, LAYOUTGET | 17132 | NFS4ERR_BADLAYOUT | LAYOUTCOMMIT, LAYOUTGET | 17133 | NFS4ERR_BADNAME | CREATE, LINK, LOOKUP, OPEN, | 17134 | | REMOVE, RENAME, SECINFO | 17135 | NFS4ERR_BADOWNER | CREATE, OPEN, SETATTR | 17136 | NFS4ERR_BADSESSION | BIND_CONN_TO_SESSION, | 17137 | | CB_SEQUENCE, DESTROY_SESSION, | 17138 | | SEQUENCE | 17139 | NFS4ERR_BADSLOT | CB_SEQUENCE, SEQUENCE | 17140 | NFS4ERR_BADTYPE | CREATE | 17141 | NFS4ERR_BADXDR | ACCESS, BACKCHANNEL_CTL, | 17142 | | BIND_CONN_TO_SESSION, | 17143 | | CB_GETATTR, CB_ILLEGAL, | 17144 | | CB_LAYOUTRECALL, CB_NOTIFY, | 17145 | | CB_NOTIFY_LOCK, | 17146 | | CB_PUSH_DELEG, CB_RECALL, | 17147 | | CB_RECALLABLE_OBJ_AVAIL, | 17148 | | CB_RECALL_ANY, | 17149 | | CB_RECALL_SLOT, CB_SEQUENCE, | 17150 | | CB_WANTS_CANCELLED, CLOSE, | 17151 | | COMMIT, CREATE, | 17152 | | CREATE_SESSION, DELEGPURGE, | 17153 | | DELEGRETURN, | 17154 | | DESTROY_CLIENTID, | 17155 | | DESTROY_SESSION, EXCHANGE_ID, | 17156 | | FREE_STATEID, GETATTR, | 17157 | | GETDEVICEINFO, GETDEVICELIST, | 17158 | | GET_DIR_DELEGATION, 
ILLEGAL, | 17159 | | LAYOUTCOMMIT, LAYOUTGET, | 17160 | | LAYOUTRETURN, LINK, LOCK, | 17161 | | LOCKT, LOCKU, LOOKUP, | 17162 | | NVERIFY, OPEN, OPENATTR, | 17163 | | OPEN_DOWNGRADE, PUTFH, READ, | 17164 | | READDIR, RECLAIM_COMPLETE, | 17165 | | REMOVE, RENAME, SECINFO, | 17166 | | SECINFO_NO_NAME, SEQUENCE, | 17167 | | SETATTR, SET_SSV, | 17168 | | TEST_STATEID, VERIFY, | 17169 | | WANT_DELEGATION, WRITE | 17170 | NFS4ERR_BAD_COOKIE | GETDEVICELIST, READDIR | 17171 | NFS4ERR_BAD_HIGH_SLOT | CB_RECALL_SLOT, CB_SEQUENCE, | 17172 | | SEQUENCE | 17173 | NFS4ERR_BAD_RANGE | LOCK, LOCKT, LOCKU | 17174 | NFS4ERR_BAD_SESSION_DIGEST | BIND_CONN_TO_SESSION, SET_SSV | 17175 | NFS4ERR_BAD_STATEID | CB_LAYOUTRECALL, CB_NOTIFY, | 17176 | | CB_NOTIFY_LOCK, CB_RECALL, | 17177 | | CLOSE, DELEGRETURN, | 17178 | | FREE_STATEID, LAYOUTGET, | 17179 | | LAYOUTRETURN, LOCK, LOCKU, | 17180 | | OPEN, OPEN_DOWNGRADE, READ, | 17181 | | SETATTR, WRITE | 17182 | NFS4ERR_CB_PATH_DOWN | DESTROY_SESSION | 17183 | NFS4ERR_CLID_INUSE | EXCHANGE_ID | 17184 | NFS4ERR_CLIENTID_BUSY | DESTROY_CLIENTID | 17185 | NFS4ERR_COMPLETE_ALREADY | RECLAIM_COMPLETE | 17186 | NFS4ERR_CONN_NOT_BOUND_TO_SESSION | CB_SEQUENCE, DESTROY_SESSION, | 17187 | | SEQUENCE | 17188 | NFS4ERR_DEADLOCK | LOCK | 17189 | NFS4ERR_DEADSESSION | ACCESS, BACKCHANNEL_CTL, | 17190 | | BIND_CONN_TO_SESSION, CLOSE, | 17191 | | COMMIT, CREATE, | 17192 | | CREATE_SESSION, DELEGPURGE, | 17193 | | DELEGRETURN, | 17194 | | DESTROY_CLIENTID, | 17195 | | DESTROY_SESSION, EXCHANGE_ID, | 17196 | | FREE_STATEID, GETATTR, | 17197 | | GETDEVICEINFO, GETDEVICELIST, | 17198 | | GET_DIR_DELEGATION, | 17199 | | LAYOUTCOMMIT, LAYOUTGET, | 17200 | | LAYOUTRETURN, LINK, LOCK, | 17201 | | LOCKT, LOCKU, LOOKUP, | 17202 | | LOOKUPP, NVERIFY, OPEN, | 17203 | | OPENATTR, OPEN_DOWNGRADE, | 17204 | | PUTFH, PUTPUBFH, PUTROOTFH, | 17205 | | READ, READDIR, READLINK, | 17206 | | RECLAIM_COMPLETE, REMOVE, | 17207 | | RENAME, RESTOREFH, SAVEFH, | 17208 | | SECINFO, SECINFO_NO_NAME, | 17209 | | SEQUENCE, SETATTR, SET_SSV, | 17210 | | TEST_STATEID, VERIFY, | 17211 | | WANT_DELEGATION, WRITE | 17212 | NFS4ERR_DELAY | ACCESS, BACKCHANNEL_CTL, | 17213 | | BIND_CONN_TO_SESSION, | 17214 | | CB_GETATTR, CB_LAYOUTRECALL, | 17215 | | CB_NOTIFY, CB_NOTIFY_LOCK, | 17216 | | CB_PUSH_DELEG, CB_RECALL, | 17217 | | CB_RECALLABLE_OBJ_AVAIL, | 17218 | | CB_RECALL_ANY, | 17219 | | CB_RECALL_SLOT, CB_SEQUENCE, | 17220 | | CB_WANTS_CANCELLED, CLOSE, | 17221 | | COMMIT, CREATE, | 17222 | | CREATE_SESSION, DELEGPURGE, | 17223 | | DELEGRETURN, | 17224 | | DESTROY_CLIENTID, | 17225 | | DESTROY_SESSION, EXCHANGE_ID, | 17226 | | FREE_STATEID, GETATTR, | 17227 | | GETDEVICEINFO, GETDEVICELIST, | 17228 | | GET_DIR_DELEGATION, | 17229 | | LAYOUTCOMMIT, LAYOUTGET, | 17230 | | LAYOUTRETURN, LINK, LOCK, | 17231 | | LOCKT, LOCKU, LOOKUP, | 17232 | | LOOKUPP, NVERIFY, OPEN, | 17233 | | OPENATTR, OPEN_DOWNGRADE, | 17234 | | PUTFH, PUTPUBFH, PUTROOTFH, | 17235 | | READ, READDIR, READLINK, | 17236 | | RECLAIM_COMPLETE, REMOVE, | 17237 | | RENAME, SECINFO, | 17238 | | SECINFO_NO_NAME, SEQUENCE, | 17239 | | SETATTR, SET_SSV, | 17240 | | TEST_STATEID, VERIFY, | 17241 | | WANT_DELEGATION, WRITE | 17242 | NFS4ERR_DELEG_ALREADY_WANTED | OPEN, WANT_DELEGATION | 17243 | NFS4ERR_DELEG_REVOKED | DELEGRETURN, LAYOUTGET, | 17244 | | LAYOUTRETURN, OPEN, READ, | 17245 | | SETATTR, WRITE | 17246 | NFS4ERR_DENIED | LOCK, LOCKT | 17247 | NFS4ERR_DIRDELEG_UNAVAIL | GET_DIR_DELEGATION | 17248 | NFS4ERR_DQUOT | CREATE, LAYOUTGET, LINK, | 17249 | | 
OPEN, OPENATTR, RENAME, | 17250 | | SETATTR, WRITE | 17251 | NFS4ERR_ENCR_ALG_UNSUPP | EXCHANGE_ID | 17252 | NFS4ERR_EXIST | CREATE, LINK, OPEN, RENAME | 17253 | NFS4ERR_EXPIRED | CLOSE, DELEGRETURN, | 17254 | | LAYOUTCOMMIT, LAYOUTRETURN, | 17255 | | LOCK, LOCKU, OPEN, | 17256 | | OPEN_DOWNGRADE, READ, | 17257 | | SETATTR, WRITE | 17258 | NFS4ERR_FBIG | LAYOUTCOMMIT, OPEN, SETATTR, | 17259 | | WRITE | 17260 | NFS4ERR_FHEXPIRED | ACCESS, CLOSE, COMMIT, | 17261 | | CREATE, DELEGRETURN, GETATTR, | 17262 | | GETDEVICELIST, GETFH, | 17263 | | GET_DIR_DELEGATION, | 17264 | | LAYOUTCOMMIT, LAYOUTGET, | 17265 | | LAYOUTRETURN, LINK, LOCK, | 17266 | | LOCKT, LOCKU, LOOKUP, | 17267 | | LOOKUPP, NVERIFY, OPEN, | 17268 | | OPENATTR, OPEN_DOWNGRADE, | 17269 | | READ, READDIR, READLINK, | 17270 | | RECLAIM_COMPLETE, REMOVE, | 17271 | | RENAME, RESTOREFH, SAVEFH, | 17272 | | SECINFO, SECINFO_NO_NAME, | 17273 | | SETATTR, VERIFY, | 17274 | | WANT_DELEGATION, WRITE | 17275 | NFS4ERR_FILE_OPEN | LINK, REMOVE, RENAME | 17276 | NFS4ERR_GRACE | GETATTR, GET_DIR_DELEGATION, | 17277 | | LAYOUTCOMMIT, LAYOUTGET, | 17278 | | LAYOUTRETURN, LINK, LOCK, | 17279 | | LOCKT, NVERIFY, OPEN, READ, | 17280 | | REMOVE, RENAME, SETATTR, | 17281 | | VERIFY, WANT_DELEGATION, | 17282 | | WRITE | 17283 | NFS4ERR_HASH_ALG_UNSUPP | EXCHANGE_ID | 17284 | NFS4ERR_INVAL | ACCESS, BACKCHANNEL_CTL, | 17285 | | BIND_CONN_TO_SESSION, | 17286 | | CB_GETATTR, CB_LAYOUTRECALL, | 17287 | | CB_NOTIFY, CB_PUSH_DELEG, | 17288 | | CB_RECALLABLE_OBJ_AVAIL, | 17289 | | CB_RECALL_ANY, CREATE, | 17290 | | CREATE_SESSION, DELEGRETURN, | 17291 | | EXCHANGE_ID, GETATTR, | 17292 | | GETDEVICEINFO, GETDEVICELIST, | 17293 | | GET_DIR_DELEGATION, | 17294 | | LAYOUTCOMMIT, LAYOUTGET, | 17295 | | LAYOUTRETURN, LINK, LOCK, | 17296 | | LOCKT, LOCKU, LOOKUP, | 17297 | | NVERIFY, OPEN, | 17298 | | OPEN_DOWNGRADE, READ, | 17299 | | READDIR, READLINK, | 17300 | | RECLAIM_COMPLETE, REMOVE, | 17301 | | RENAME, SECINFO, | 17302 | | SECINFO_NO_NAME, SETATTR, | 17303 | | SET_SSV, VERIFY, | 17304 | | WANT_DELEGATION, WRITE | 17305 | NFS4ERR_IO | ACCESS, COMMIT, CREATE, | 17306 | | GETATTR, GETDEVICELIST, | 17307 | | GET_DIR_DELEGATION, | 17308 | | LAYOUTCOMMIT, LAYOUTGET, | 17309 | | LINK, LOOKUP, LOOKUPP, | 17310 | | NVERIFY, OPEN, OPENATTR, | 17311 | | READ, READDIR, READLINK, | 17312 | | REMOVE, RENAME, SETATTR, | 17313 | | VERIFY, WANT_DELEGATION, | 17314 | | WRITE | 17315 | NFS4ERR_ISDIR | COMMIT, LAYOUTCOMMIT, | 17316 | | LAYOUTRETURN, LINK, LOCK, | 17317 | | LOCKT, OPEN, READ, WRITE | 17318 | NFS4ERR_LAYOUTTRYLATER | LAYOUTGET | 17319 | NFS4ERR_LAYOUTUNAVAILABLE | LAYOUTGET | 17320 | NFS4ERR_LOCKED | LAYOUTGET, READ, SETATTR, | 17321 | | WRITE | 17322 | NFS4ERR_LOCKS_HELD | CLOSE, FREE_STATEID | 17323 | NFS4ERR_LOCK_NOTSUPP | LOCK | 17324 | NFS4ERR_LOCK_RANGE | LOCK, LOCKT, LOCKU | 17325 | NFS4ERR_MLINK | CREATE, LINK | 17326 | NFS4ERR_MOVED | ACCESS, CLOSE, COMMIT, | 17327 | | CREATE, DELEGRETURN, GETATTR, | 17328 | | GETFH, GET_DIR_DELEGATION, | 17329 | | LAYOUTCOMMIT, LAYOUTGET, | 17330 | | LAYOUTRETURN, LINK, LOCK, | 17331 | | LOCKT, LOCKU, LOOKUP, | 17332 | | LOOKUPP, NVERIFY, OPEN, | 17333 | | OPENATTR, OPEN_DOWNGRADE, | 17334 | | PUTFH, READ, READDIR, | 17335 | | READLINK, RECLAIM_COMPLETE, | 17336 | | REMOVE, RENAME, RESTOREFH, | 17337 | | SAVEFH, SECINFO, | 17338 | | SECINFO_NO_NAME, SETATTR, | 17339 | | VERIFY, WANT_DELEGATION, | 17340 | | WRITE | 17341 | NFS4ERR_NAMETOOLONG | CREATE, LINK, LOOKUP, OPEN, | 17342 | | REMOVE, RENAME, SECINFO | 17343 | 
NFS4ERR_NOENT | BACKCHANNEL_CTL, | 17344 | | CREATE_SESSION, EXCHANGE_ID, | 17345 | | GETDEVICEINFO, LOOKUP, | 17346 | | LOOKUPP, OPEN, OPENATTR, | 17347 | | REMOVE, RENAME, SECINFO, | 17348 | | SECINFO_NO_NAME | 17349 | NFS4ERR_NOFILEHANDLE | ACCESS, CLOSE, COMMIT, | 17350 | | CREATE, DELEGRETURN, GETATTR, | 17351 | | GETDEVICELIST, GETFH, | 17352 | | GET_DIR_DELEGATION, | 17353 | | LAYOUTCOMMIT, LAYOUTGET, | 17354 | | LAYOUTRETURN, LINK, LOCK, | 17355 | | LOCKT, LOCKU, LOOKUP, | 17356 | | LOOKUPP, NVERIFY, OPEN, | 17357 | | OPENATTR, OPEN_DOWNGRADE, | 17358 | | READ, READDIR, READLINK, | 17359 | | RECLAIM_COMPLETE, REMOVE, | 17360 | | RENAME, RESTOREFH, SAVEFH, | 17361 | | SECINFO, SECINFO_NO_NAME, | 17362 | | SETATTR, VERIFY, | 17363 | | WANT_DELEGATION, WRITE | 17364 | NFS4ERR_NOMATCHING_LAYOUT | CB_LAYOUTRECALL | 17365 | NFS4ERR_NOSPC | CREATE, CREATE_SESSION, | 17366 | | LAYOUTGET, LINK, OPEN, | 17367 | | OPENATTR, RENAME, SETATTR, | 17368 | | WRITE | 17369 | NFS4ERR_NOTDIR | CREATE, GET_DIR_DELEGATION, | 17370 | | LINK, LOOKUP, LOOKUPP, OPEN, | 17371 | | READDIR, REMOVE, RENAME, | 17372 | | SECINFO, SECINFO_NO_NAME | 17373 | NFS4ERR_NOTEMPTY | REMOVE, RENAME | 17374 | NFS4ERR_NOTSUPP | CB_LAYOUTRECALL, CB_NOTIFY, | 17375 | | CB_NOTIFY_LOCK, | 17376 | | CB_PUSH_DELEG, | 17377 | | CB_RECALLABLE_OBJ_AVAIL, | 17378 | | CB_WANTS_CANCELLED, | 17379 | | DELEGPURGE, DELEGRETURN, | 17380 | | GETDEVICEINFO, GETDEVICELIST, | 17381 | | GET_DIR_DELEGATION, | 17382 | | LAYOUTCOMMIT, LAYOUTGET, | 17383 | | LAYOUTRETURN, LINK, OPENATTR, | 17384 | | OPEN_CONFIRM, | 17385 | | RELEASE_LOCKOWNER, RENEW, | 17386 | | SECINFO_NO_NAME, SETCLIENTID, | 17387 | | SETCLIENTID_CONFIRM, | 17388 | | WANT_DELEGATION | 17389 | NFS4ERR_NOT_ONLY_OP | BIND_CONN_TO_SESSION, | 17390 | | CREATE_SESSION, | 17391 | | DESTROY_CLIENTID, | 17392 | | DESTROY_SESSION, EXCHANGE_ID | 17393 | NFS4ERR_NOT_SAME | EXCHANGE_ID, GETDEVICELIST, | 17394 | | READDIR, VERIFY | 17395 | NFS4ERR_NO_GRACE | LAYOUTCOMMIT, LAYOUTRETURN, | 17396 | | LOCK, OPEN, WANT_DELEGATION | 17397 | NFS4ERR_OLD_STATEID | CLOSE, DELEGRETURN, | 17398 | | FREE_STATEID, LAYOUTGET, | 17399 | | LAYOUTRETURN, LOCK, LOCKU, | 17400 | | OPEN, OPEN_DOWNGRADE, READ, | 17401 | | SETATTR, WRITE | 17402 | NFS4ERR_OPENMODE | LAYOUTGET, LOCK, READ, | 17403 | | SETATTR, WRITE | 17404 | NFS4ERR_OP_ILLEGAL | CB_ILLEGAL, ILLEGAL | 17405 | NFS4ERR_OP_NOT_IN_SESSION | ACCESS, BACKCHANNEL_CTL, | 17406 | | CB_GETATTR, CB_LAYOUTRECALL, | 17407 | | CB_NOTIFY, CB_NOTIFY_LOCK, | 17408 | | CB_PUSH_DELEG, CB_RECALL, | 17409 | | CB_RECALLABLE_OBJ_AVAIL, | 17410 | | CB_RECALL_ANY, | 17411 | | CB_RECALL_SLOT, | 17412 | | CB_WANTS_CANCELLED, CLOSE, | 17413 | | COMMIT, CREATE, DELEGPURGE, | 17414 | | DELEGRETURN, FREE_STATEID, | 17415 | | GETATTR, GETDEVICEINFO, | 17416 | | GETDEVICELIST, GETFH, | 17417 | | GET_DIR_DELEGATION, | 17418 | | LAYOUTCOMMIT, LAYOUTGET, | 17419 | | LAYOUTRETURN, LINK, LOCK, | 17420 | | LOCKT, LOCKU, LOOKUP, | 17421 | | LOOKUPP, NVERIFY, OPEN, | 17422 | | OPENATTR, OPEN_DOWNGRADE, | 17423 | | PUTFH, PUTPUBFH, PUTROOTFH, | 17424 | | READ, READDIR, READLINK, | 17425 | | RECLAIM_COMPLETE, REMOVE, | 17426 | | RENAME, RESTOREFH, SAVEFH, | 17427 | | SECINFO, SECINFO_NO_NAME, | 17428 | | SETATTR, SET_SSV, | 17429 | | TEST_STATEID, VERIFY, | 17430 | | WANT_DELEGATION, WRITE | 17431 | NFS4ERR_PERM | CREATE, OPEN, SETATTR | 17432 | NFS4ERR_PNFS_IO_HOLE | READ, WRITE | 17433 | NFS4ERR_PNFS_NO_LAYOUT | READ, WRITE | 17434 | NFS4ERR_RECALLCONFLICT | LAYOUTGET, WANT_DELEGATION | 
17435 | NFS4ERR_RECLAIM_BAD | LAYOUTCOMMIT, LOCK, OPEN, | 17436 | | WANT_DELEGATION | 17437 | NFS4ERR_RECLAIM_CONFLICT | LAYOUTCOMMIT, LOCK, OPEN, | 17438 | | WANT_DELEGATION | 17439 | NFS4ERR_REJECT_DELEG | CB_PUSH_DELEG | 17440 | NFS4ERR_REP_TOO_BIG | ACCESS, BACKCHANNEL_CTL, | 17441 | | BIND_CONN_TO_SESSION, | 17442 | | CB_GETATTR, CB_LAYOUTRECALL, | 17443 | | CB_NOTIFY, CB_NOTIFY_LOCK, | 17444 | | CB_PUSH_DELEG, CB_RECALL, | 17445 | | CB_RECALLABLE_OBJ_AVAIL, | 17446 | | CB_RECALL_ANY, | 17447 | | CB_RECALL_SLOT, CB_SEQUENCE, | 17448 | | CB_WANTS_CANCELLED, CLOSE, | 17449 | | COMMIT, CREATE, | 17450 | | CREATE_SESSION, DELEGPURGE, | 17451 | | DELEGRETURN, | 17452 | | DESTROY_CLIENTID, | 17453 | | DESTROY_SESSION, EXCHANGE_ID, | 17454 | | FREE_STATEID, GETATTR, | 17455 | | GETDEVICEINFO, GETDEVICELIST, | 17456 | | GET_DIR_DELEGATION, | 17457 | | LAYOUTCOMMIT, LAYOUTGET, | 17458 | | LAYOUTRETURN, LINK, LOCK, | 17459 | | LOCKT, LOCKU, LOOKUP, | 17460 | | LOOKUPP, NVERIFY, OPEN, | 17461 | | OPENATTR, OPEN_DOWNGRADE, | 17462 | | PUTFH, PUTPUBFH, PUTROOTFH, | 17463 | | READ, READDIR, READLINK, | 17464 | | RECLAIM_COMPLETE, REMOVE, | 17465 | | RENAME, RESTOREFH, SAVEFH, | 17466 | | SECINFO, SECINFO_NO_NAME, | 17467 | | SEQUENCE, SETATTR, SET_SSV, | 17468 | | TEST_STATEID, VERIFY, | 17469 | | WANT_DELEGATION, WRITE | 17470 | NFS4ERR_REP_TOO_BIG_TO_CACHE | ACCESS, BACKCHANNEL_CTL, | 17471 | | BIND_CONN_TO_SESSION, | 17472 | | CB_GETATTR, CB_LAYOUTRECALL, | 17473 | | CB_NOTIFY, CB_NOTIFY_LOCK, | 17474 | | CB_PUSH_DELEG, CB_RECALL, | 17475 | | CB_RECALLABLE_OBJ_AVAIL, | 17476 | | CB_RECALL_ANY, | 17477 | | CB_RECALL_SLOT, CB_SEQUENCE, | 17478 | | CB_WANTS_CANCELLED, CLOSE, | 17479 | | COMMIT, CREATE, | 17480 | | CREATE_SESSION, DELEGPURGE, | 17481 | | DELEGRETURN, | 17482 | | DESTROY_CLIENTID, | 17483 | | DESTROY_SESSION, EXCHANGE_ID, | 17484 | | FREE_STATEID, GETATTR, | 17485 | | GETDEVICEINFO, GETDEVICELIST, | 17486 | | GET_DIR_DELEGATION, | 17487 | | LAYOUTCOMMIT, LAYOUTGET, | 17488 | | LAYOUTRETURN, LINK, LOCK, | 17489 | | LOCKT, LOCKU, LOOKUP, | 17490 | | LOOKUPP, NVERIFY, OPEN, | 17491 | | OPENATTR, OPEN_DOWNGRADE, | 17492 | | PUTFH, PUTPUBFH, PUTROOTFH, | 17493 | | READ, READDIR, READLINK, | 17494 | | RECLAIM_COMPLETE, REMOVE, | 17495 | | RENAME, RESTOREFH, SAVEFH, | 17496 | | SECINFO, SECINFO_NO_NAME, | 17497 | | SEQUENCE, SETATTR, SET_SSV, | 17498 | | TEST_STATEID, VERIFY, | 17499 | | WANT_DELEGATION, WRITE | 17500 | NFS4ERR_REQ_TOO_BIG | ACCESS, BACKCHANNEL_CTL, | 17501 | | BIND_CONN_TO_SESSION, | 17502 | | CB_GETATTR, CB_LAYOUTRECALL, | 17503 | | CB_NOTIFY, CB_NOTIFY_LOCK, | 17504 | | CB_PUSH_DELEG, CB_RECALL, | 17505 | | CB_RECALLABLE_OBJ_AVAIL, | 17506 | | CB_RECALL_ANY, | 17507 | | CB_RECALL_SLOT, CB_SEQUENCE, | 17508 | | CB_WANTS_CANCELLED, CLOSE, | 17509 | | COMMIT, CREATE, | 17510 | | CREATE_SESSION, DELEGPURGE, | 17511 | | DELEGRETURN, | 17512 | | DESTROY_CLIENTID, | 17513 | | DESTROY_SESSION, EXCHANGE_ID, | 17514 | | FREE_STATEID, GETATTR, | 17515 | | GETDEVICEINFO, GETDEVICELIST, | 17516 | | GET_DIR_DELEGATION, | 17517 | | LAYOUTCOMMIT, LAYOUTGET, | 17518 | | LAYOUTRETURN, LINK, LOCK, | 17519 | | LOCKT, LOCKU, LOOKUP, | 17520 | | LOOKUPP, NVERIFY, OPEN, | 17521 | | OPENATTR, OPEN_DOWNGRADE, | 17522 | | PUTFH, PUTPUBFH, PUTROOTFH, | 17523 | | READ, READDIR, READLINK, | 17524 | | RECLAIM_COMPLETE, REMOVE, | 17525 | | RENAME, RESTOREFH, SAVEFH, | 17526 | | SECINFO, SECINFO_NO_NAME, | 17527 | | SEQUENCE, SETATTR, SET_SSV, | 17528 | | TEST_STATEID, VERIFY, | 17529 | | 
WANT_DELEGATION, WRITE | 17530 | NFS4ERR_RETRY_UNCACHED_REP | CB_SEQUENCE, SEQUENCE | 17531 | NFS4ERR_ROFS | CREATE, LINK, LOCK, LOCKT, | 17532 | | OPEN, OPENATTR, | 17533 | | OPEN_DOWNGRADE, REMOVE, | 17534 | | RENAME, SETATTR, WRITE | 17535 | NFS4ERR_SAME | NVERIFY | 17536 | NFS4ERR_SEQUENCE_POS | CB_SEQUENCE, SEQUENCE | 17537 | NFS4ERR_SEQ_FALSE_RETRY | CB_SEQUENCE, SEQUENCE | 17538 | NFS4ERR_SEQ_MISORDERED | CB_SEQUENCE, CREATE_SESSION, | 17539 | | SEQUENCE | 17540 | NFS4ERR_SERVERFAULT | ACCESS, BIND_CONN_TO_SESSION, | 17541 | | CB_GETATTR, CB_NOTIFY, | 17542 | | CB_NOTIFY_LOCK, | 17543 | | CB_PUSH_DELEG, CB_RECALL, | 17544 | | CB_RECALLABLE_OBJ_AVAIL, | 17545 | | CB_WANTS_CANCELLED, CLOSE, | 17546 | | COMMIT, CREATE, | 17547 | | CREATE_SESSION, DELEGPURGE, | 17548 | | DELEGRETURN, | 17549 | | DESTROY_CLIENTID, | 17550 | | DESTROY_SESSION, EXCHANGE_ID, | 17551 | | FREE_STATEID, GETATTR, | 17552 | | GETDEVICEINFO, GETDEVICELIST, | 17553 | | GET_DIR_DELEGATION, | 17554 | | LAYOUTCOMMIT, LAYOUTGET, | 17555 | | LAYOUTRETURN, LINK, LOCK, | 17556 | | LOCKU, LOOKUP, LOOKUPP, | 17557 | | NVERIFY, OPEN, OPENATTR, | 17558 | | OPEN_DOWNGRADE, PUTFH, | 17559 | | PUTPUBFH, PUTROOTFH, READ, | 17560 | | READDIR, READLINK, | 17561 | | RECLAIM_COMPLETE, REMOVE, | 17562 | | RENAME, RESTOREFH, SAVEFH, | 17563 | | SECINFO, SECINFO_NO_NAME, | 17564 | | SETATTR, TEST_STATEID, | 17565 | | VERIFY, WANT_DELEGATION, | 17566 | | WRITE | 17567 | NFS4ERR_SHARE_DENIED | OPEN | 17568 | NFS4ERR_STALE | ACCESS, CLOSE, COMMIT, | 17569 | | CREATE, DELEGRETURN, GETATTR, | 17570 | | GETFH, GET_DIR_DELEGATION, | 17571 | | LAYOUTCOMMIT, LAYOUTGET, | 17572 | | LAYOUTRETURN, LINK, LOCK, | 17573 | | LOCKT, LOCKU, LOOKUP, | 17574 | | LOOKUPP, NVERIFY, OPEN, | 17575 | | OPENATTR, OPEN_DOWNGRADE, | 17576 | | PUTFH, READ, READDIR, | 17577 | | READLINK, RECLAIM_COMPLETE, | 17578 | | REMOVE, RENAME, RESTOREFH, | 17579 | | SAVEFH, SECINFO, | 17580 | | SECINFO_NO_NAME, SETATTR, | 17581 | | VERIFY, WANT_DELEGATION, | 17582 | | WRITE | 17583 | NFS4ERR_STALE_CLIENTID | CREATE_SESSION, | 17584 | | DESTROY_CLIENTID, | 17585 | | DESTROY_SESSION | 17586 | NFS4ERR_SYMLINK | COMMIT, LAYOUTCOMMIT, LINK, | 17587 | | LOCK, LOCKT, LOOKUP, LOOKUPP, | 17588 | | OPEN, READ, WRITE | 17589 | NFS4ERR_TOOSMALL | CREATE_SESSION, | 17590 | | GETDEVICEINFO, LAYOUTGET, | 17591 | | READDIR | 17592 | NFS4ERR_TOO_MANY_OPS | ACCESS, BACKCHANNEL_CTL, | 17593 | | BIND_CONN_TO_SESSION, | 17594 | | CB_GETATTR, CB_LAYOUTRECALL, | 17595 | | CB_NOTIFY, CB_NOTIFY_LOCK, | 17596 | | CB_PUSH_DELEG, CB_RECALL, | 17597 | | CB_RECALLABLE_OBJ_AVAIL, | 17598 | | CB_RECALL_ANY, | 17599 | | CB_RECALL_SLOT, CB_SEQUENCE, | 17600 | | CB_WANTS_CANCELLED, CLOSE, | 17601 | | COMMIT, CREATE, | 17602 | | CREATE_SESSION, DELEGPURGE, | 17603 | | DELEGRETURN, | 17604 | | DESTROY_CLIENTID, | 17605 | | DESTROY_SESSION, EXCHANGE_ID, | 17606 | | FREE_STATEID, GETATTR, | 17607 | | GETDEVICEINFO, GETDEVICELIST, | 17608 | | GET_DIR_DELEGATION, | 17609 | | LAYOUTCOMMIT, LAYOUTGET, | 17610 | | LAYOUTRETURN, LINK, LOCK, | 17611 | | LOCKT, LOCKU, LOOKUP, | 17612 | | LOOKUPP, NVERIFY, OPEN, | 17613 | | OPENATTR, OPEN_DOWNGRADE, | 17614 | | PUTFH, PUTPUBFH, PUTROOTFH, | 17615 | | READ, READDIR, READLINK, | 17616 | | RECLAIM_COMPLETE, REMOVE, | 17617 | | RENAME, RESTOREFH, SAVEFH, | 17618 | | SECINFO, SECINFO_NO_NAME, | 17619 | | SEQUENCE, SETATTR, SET_SSV, | 17620 | | TEST_STATEID, VERIFY, | 17621 | | WANT_DELEGATION, WRITE | 17622 | NFS4ERR_UNKNOWN_LAYOUTTYPE | CB_LAYOUTRECALL, | 17623 | | 
GETDEVICEINFO, GETDEVICELIST, | 17624 | | LAYOUTCOMMIT, LAYOUTGET, | 17625 | | LAYOUTRETURN, NVERIFY, | 17626 | | SETATTR, VERIFY | 17627 | NFS4ERR_UNSAFE_COMPOUND | CREATE, OPEN, OPENATTR | 17628 | NFS4ERR_WRONGSEC | LINK, LOOKUP, LOOKUPP, OPEN, | 17629 | | PUTFH, PUTPUBFH, PUTROOTFH, | 17630 | | RENAME, RESTOREFH | 17631 | NFS4ERR_WRONG_CRED | CLOSE, CREATE_SESSION, | 17632 | | DELEGPURGE, DELEGRETURN, | 17633 | | DESTROY_CLIENTID, | 17634 | | DESTROY_SESSION, | 17635 | | FREE_STATEID, LAYOUTCOMMIT, | 17636 | | LAYOUTRETURN, LOCK, LOCKT, | 17637 | | LOCKU, OPEN_DOWNGRADE, | 17638 | | RECLAIM_COMPLETE | 17639 | NFS4ERR_WRONG_TYPE | CB_LAYOUTRECALL, | 17640 | | CB_PUSH_DELEG, COMMIT, | 17641 | | GETATTR, LAYOUTGET, | 17642 | | LAYOUTRETURN, LINK, LOCK, | 17643 | | LOCKT, NVERIFY, OPEN, | 17644 | | OPENATTR, READ, READLINK, | 17645 | | RECLAIM_COMPLETE, SETATTR, | 17646 | | VERIFY, WANT_DELEGATION, | 17647 | | WRITE | 17648 | NFS4ERR_XDEV | LINK, RENAME | 17649 +-----------------------------------+-------------------------------+ 17651 Table 14 17653 16. NFSv4.1 Procedures 17655 Both procedures, NULL and COMPOUND, MUST be implemented. 17657 16.1. Procedure 0: NULL - No Operation 17659 16.1.1. ARGUMENTS 17661 void; 17663 16.1.2. RESULTS 17665 void; 17667 16.1.3. DESCRIPTION 17669 This is the standard NULL procedure with the standard void argument 17670 and void response. This procedure has no functionality associated 17671 with it. Because of this it is sometimes used to measure the 17672 overhead of processing a service request. Therefore, the server 17673 SHOULD ensure that no unnecessary work is done in servicing this 17674 procedure. 17676 16.1.4. ERRORS 17678 None. 17680 16.2. Procedure 1: COMPOUND - Compound Operations 17682 16.2.1. ARGUMENTS 17684 enum nfs_opnum4 { 17685 OP_ACCESS = 3, 17686 OP_CLOSE = 4, 17687 OP_COMMIT = 5, 17688 OP_CREATE = 6, 17689 OP_DELEGPURGE = 7, 17690 OP_DELEGRETURN = 8, 17691 OP_GETATTR = 9, 17692 OP_GETFH = 10, 17693 OP_LINK = 11, 17694 OP_LOCK = 12, 17695 OP_LOCKT = 13, 17696 OP_LOCKU = 14, 17697 OP_LOOKUP = 15, 17698 OP_LOOKUPP = 16, 17699 OP_NVERIFY = 17, 17700 OP_OPEN = 18, 17701 OP_OPENATTR = 19, 17702 OP_OPEN_CONFIRM = 20, /* Mandatory not-to-implement */ 17703 OP_OPEN_DOWNGRADE = 21, 17704 OP_PUTFH = 22, 17705 OP_PUTPUBFH = 23, 17706 OP_PUTROOTFH = 24, 17707 OP_READ = 25, 17708 OP_READDIR = 26, 17709 OP_READLINK = 27, 17710 OP_REMOVE = 28, 17711 OP_RENAME = 29, 17712 OP_RENEW = 30, /* Mandatory not-to-implement */ 17713 OP_RESTOREFH = 31, 17714 OP_SAVEFH = 32, 17715 OP_SECINFO = 33, 17716 OP_SETATTR = 34, 17717 OP_SETCLIENTID = 35, /* Mandatory not-to-implement */ 17718 OP_SETCLIENTID_CONFIRM = 36, /* Mandatory not-to-implement */ 17719 OP_VERIFY = 37, 17720 OP_WRITE = 38, 17721 OP_RELEASE_LOCKOWNER = 39, /* Mandatory not-to-implement */ 17723 /* new operations for NFSv4.1 */ 17724 OP_BACKCHANNEL_CTL = 40, 17725 OP_BIND_CONN_TO_SESSION = 41, 17726 OP_EXCHANGE_ID = 42, 17727 OP_CREATE_SESSION = 43, 17728 OP_DESTROY_SESSION = 44, 17729 OP_FREE_STATEID = 45, 17730 OP_GET_DIR_DELEGATION = 46, 17731 OP_GETDEVICEINFO = 47, 17732 OP_GETDEVICELIST = 48, 17733 OP_LAYOUTCOMMIT = 49, 17734 OP_LAYOUTGET = 50, 17735 OP_LAYOUTRETURN = 51, 17736 OP_SECINFO_NO_NAME = 52, 17737 OP_SEQUENCE = 53, 17738 OP_SET_SSV = 54, 17739 OP_TEST_STATEID = 55, 17740 OP_WANT_DELEGATION = 56, 17741 OP_DESTROY_CLIENTID = 57, 17742 OP_RECLAIM_COMPLETE = 58, 17743 OP_ILLEGAL = 10044 17744 }; 17746 union nfs_argop4 switch (nfs_opnum4 argop) { 17747 case OP_ACCESS: ACCESS4args opaccess; 
17748 case OP_CLOSE: CLOSE4args opclose; 17749 case OP_COMMIT: COMMIT4args opcommit; 17750 case OP_CREATE: CREATE4args opcreate; 17751 case OP_DELEGPURGE: DELEGPURGE4args opdelegpurge; 17752 case OP_DELEGRETURN: DELEGRETURN4args opdelegreturn; 17753 case OP_GETATTR: GETATTR4args opgetattr; 17754 case OP_GETFH: void; 17755 case OP_LINK: LINK4args oplink; 17756 case OP_LOCK: LOCK4args oplock; 17757 case OP_LOCKT: LOCKT4args oplockt; 17758 case OP_LOCKU: LOCKU4args oplocku; 17759 case OP_LOOKUP: LOOKUP4args oplookup; 17760 case OP_LOOKUPP: void; 17761 case OP_NVERIFY: NVERIFY4args opnverify; 17762 case OP_OPEN: OPEN4args opopen; 17763 case OP_OPENATTR: OPENATTR4args opopenattr; 17765 /* Not for NFSv4.1 */ 17766 case OP_OPEN_CONFIRM: OPEN_CONFIRM4args opopen_confirm; 17768 case OP_OPEN_DOWNGRADE: 17769 OPEN_DOWNGRADE4args opopen_downgrade; 17771 case OP_PUTFH: PUTFH4args opputfh; 17772 case OP_PUTPUBFH: void; 17773 case OP_PUTROOTFH: void; 17774 case OP_READ: READ4args opread; 17775 case OP_READDIR: READDIR4args opreaddir; 17776 case OP_READLINK: void; 17777 case OP_REMOVE: REMOVE4args opremove; 17778 case OP_RENAME: RENAME4args oprename; 17780 /* Not for NFSv4.1 */ 17781 case OP_RENEW: RENEW4args oprenew; 17783 case OP_RESTOREFH: void; 17784 case OP_SAVEFH: void; 17785 case OP_SECINFO: SECINFO4args opsecinfo; 17786 case OP_SETATTR: SETATTR4args opsetattr; 17788 /* Not for NFSv4.1 */ 17789 case OP_SETCLIENTID: SETCLIENTID4args opsetclientid; 17791 /* Not for NFSv4.1 */ 17792 case OP_SETCLIENTID_CONFIRM: SETCLIENTID_CONFIRM4args 17793 opsetclientid_confirm; 17794 case OP_VERIFY: VERIFY4args opverify; 17795 case OP_WRITE: WRITE4args opwrite; 17797 /* Not for NFSv4.1 */ 17798 case OP_RELEASE_LOCKOWNER: 17799 RELEASE_LOCKOWNER4args 17800 oprelease_lockowner; 17802 /* Operations new to NFSv4.1 */ 17803 case OP_BACKCHANNEL_CTL: 17804 BACKCHANNEL_CTL4args opbackchannel_ctl; 17806 case OP_BIND_CONN_TO_SESSION: 17807 BIND_CONN_TO_SESSION4args 17808 opbind_conn_to_session; 17810 case OP_EXCHANGE_ID: EXCHANGE_ID4args opexchange_id; 17812 case OP_CREATE_SESSION: 17813 CREATE_SESSION4args opcreate_session; 17815 case OP_DESTROY_SESSION: 17816 DESTROY_SESSION4args opdestroy_session; 17818 case OP_FREE_STATEID: FREE_STATEID4args opfree_stateid; 17819 case OP_GET_DIR_DELEGATION: 17820 GET_DIR_DELEGATION4args 17821 opget_dir_delegation; 17823 case OP_GETDEVICEINFO: GETDEVICEINFO4args opgetdeviceinfo; 17824 case OP_GETDEVICELIST: GETDEVICELIST4args opgetdevicelist; 17825 case OP_LAYOUTCOMMIT: LAYOUTCOMMIT4args oplayoutcommit; 17826 case OP_LAYOUTGET: LAYOUTGET4args oplayoutget; 17827 case OP_LAYOUTRETURN: LAYOUTRETURN4args oplayoutreturn; 17829 case OP_SECINFO_NO_NAME: 17830 SECINFO_NO_NAME4args opsecinfo_no_name; 17832 case OP_SEQUENCE: SEQUENCE4args opsequence; 17833 case OP_SET_SSV: SET_SSV4args opset_ssv; 17834 case OP_TEST_STATEID: TEST_STATEID4args optest_stateid; 17836 case OP_WANT_DELEGATION: 17837 WANT_DELEGATION4args opwant_delegation; 17839 case OP_DESTROY_CLIENTID: 17840 DESTROY_CLIENTID4args 17841 opdestroy_clientid; 17843 case OP_RECLAIM_COMPLETE: 17844 RECLAIM_COMPLETE4args 17845 opreclaim_complete; 17847 /* Operations not new to NFSv4.1 */ 17848 case OP_ILLEGAL: void; 17849 }; 17851 struct COMPOUND4args { 17852 utf8str_cs tag; 17853 uint32_t minorversion; 17854 nfs_argop4 argarray<>; 17855 }; 17857 16.2.2. 
RESULTS 17859 union nfs_resop4 switch (nfs_opnum4 resop) { 17860 case OP_ACCESS: ACCESS4res opaccess; 17861 case OP_CLOSE: CLOSE4res opclose; 17862 case OP_COMMIT: COMMIT4res opcommit; 17863 case OP_CREATE: CREATE4res opcreate; 17864 case OP_DELEGPURGE: DELEGPURGE4res opdelegpurge; 17865 case OP_DELEGRETURN: DELEGRETURN4res opdelegreturn; 17866 case OP_GETATTR: GETATTR4res opgetattr; 17867 case OP_GETFH: GETFH4res opgetfh; 17868 case OP_LINK: LINK4res oplink; 17869 case OP_LOCK: LOCK4res oplock; 17870 case OP_LOCKT: LOCKT4res oplockt; 17871 case OP_LOCKU: LOCKU4res oplocku; 17872 case OP_LOOKUP: LOOKUP4res oplookup; 17873 case OP_LOOKUPP: LOOKUPP4res oplookupp; 17874 case OP_NVERIFY: NVERIFY4res opnverify; 17875 case OP_OPEN: OPEN4res opopen; 17876 case OP_OPENATTR: OPENATTR4res opopenattr; 17877 /* Not for NFSv4.1 */ 17878 case OP_OPEN_CONFIRM: OPEN_CONFIRM4res opopen_confirm; 17880 case OP_OPEN_DOWNGRADE: 17881 OPEN_DOWNGRADE4res 17882 opopen_downgrade; 17884 case OP_PUTFH: PUTFH4res opputfh; 17885 case OP_PUTPUBFH: PUTPUBFH4res opputpubfh; 17886 case OP_PUTROOTFH: PUTROOTFH4res opputrootfh; 17887 case OP_READ: READ4res opread; 17888 case OP_READDIR: READDIR4res opreaddir; 17889 case OP_READLINK: READLINK4res opreadlink; 17890 case OP_REMOVE: REMOVE4res opremove; 17891 case OP_RENAME: RENAME4res oprename; 17892 /* Not for NFSv4.1 */ 17893 case OP_RENEW: RENEW4res oprenew; 17894 case OP_RESTOREFH: RESTOREFH4res oprestorefh; 17895 case OP_SAVEFH: SAVEFH4res opsavefh; 17896 case OP_SECINFO: SECINFO4res opsecinfo; 17897 case OP_SETATTR: SETATTR4res opsetattr; 17898 /* Not for NFSv4.1 */ 17899 case OP_SETCLIENTID: SETCLIENTID4res opsetclientid; 17901 /* Not for NFSv4.1 */ 17902 case OP_SETCLIENTID_CONFIRM: 17903 SETCLIENTID_CONFIRM4res 17904 opsetclientid_confirm; 17905 case OP_VERIFY: VERIFY4res opverify; 17906 case OP_WRITE: WRITE4res opwrite; 17908 /* Not for NFSv4.1 */ 17909 case OP_RELEASE_LOCKOWNER: 17910 RELEASE_LOCKOWNER4res 17911 oprelease_lockowner; 17913 /* Operations new to NFSv4.1 */ 17914 case OP_BACKCHANNEL_CTL: 17915 BACKCHANNEL_CTL4res 17916 opbackchannel_ctl; 17918 case OP_BIND_CONN_TO_SESSION: 17919 BIND_CONN_TO_SESSION4res 17920 opbind_conn_to_session; 17922 case OP_EXCHANGE_ID: EXCHANGE_ID4res opexchange_id; 17924 case OP_CREATE_SESSION: 17925 CREATE_SESSION4res 17926 opcreate_session; 17928 case OP_DESTROY_SESSION: 17929 DESTROY_SESSION4res 17930 opdestroy_session; 17932 case OP_FREE_STATEID: FREE_STATEID4res 17933 opfree_stateid; 17935 case OP_GET_DIR_DELEGATION: 17936 GET_DIR_DELEGATION4res 17937 opget_dir_delegation; 17939 case OP_GETDEVICEINFO: GETDEVICEINFO4res 17940 opgetdeviceinfo; 17942 case OP_GETDEVICELIST: GETDEVICELIST4res 17943 opgetdevicelist; 17945 case OP_LAYOUTCOMMIT: LAYOUTCOMMIT4res oplayoutcommit; 17946 case OP_LAYOUTGET: LAYOUTGET4res oplayoutget; 17947 case OP_LAYOUTRETURN: LAYOUTRETURN4res oplayoutreturn; 17949 case OP_SECINFO_NO_NAME: 17950 SECINFO_NO_NAME4res 17951 opsecinfo_no_name; 17953 case OP_SEQUENCE: SEQUENCE4res opsequence; 17954 case OP_SET_SSV: SET_SSV4res opset_ssv; 17955 case OP_TEST_STATEID: TEST_STATEID4res optest_stateid; 17957 case OP_WANT_DELEGATION: 17958 WANT_DELEGATION4res 17959 opwant_delegation; 17961 case OP_DESTROY_CLIENTID: 17963 DESTROY_CLIENTID4res 17964 opdestroy_clientid; 17966 case OP_RECLAIM_COMPLETE: 17967 RECLAIM_COMPLETE4res 17968 opreclaim_complete; 17970 /* Operations not new to NFSv4.1 */ 17971 case OP_ILLEGAL: ILLEGAL4res opillegal; 17972 }; 17974 struct COMPOUND4res { 17975 nfsstat4 status; 17976 utf8str_cs 
tag; 17977 nfs_resop4 resarray<>; 17978 }; 17980 16.2.3. DESCRIPTION 17982 The COMPOUND procedure is used to combine one or more of the NFS 17983 operations into a single RPC request. The NFS RPC program has two 17984 main procedures: NULL and COMPOUND. All other operations use the 17985 COMPOUND procedure as a wrapper. 17987 The COMPOUND procedure is used to combine individual operations into 17988 a single RPC request. The server interprets each of the operations 17989 in turn. If an operation is executed by the server and the status of 17990 that operation is NFS4_OK, then the next operation in the COMPOUND 17991 procedure is executed. The server continues this process until there 17992 are no more operations to be executed or one of the operations has a 17993 status value other than NFS4_OK. 17995 In the processing of the COMPOUND procedure, the server may find that 17996 it does not have the available resources to execute any or all of the 17997 operations within the COMPOUND sequence. See Section 2.10.5.4 for a 17998 more detailed discussion. 18000 The server will generally choose between two methods of decoding the 18001 client's request. The first would be the traditional one pass XDR 18002 decode. If there is an XDR decoding error in this case, the RPC XDR 18003 decode error would be returned. The second method would be to make 18004 an initial pass to decode the basic COMPOUND request and then to XDR 18005 decode the individual operations; the most interesting is the decode 18006 of attributes. In this case, the server may encounter an XDR decode 18007 error during the second pass. In this case, the server would return 18008 the error NFS4ERR_BADXDR to signify the decode error. 18010 The COMPOUND arguments contain a "minorversion" field. For NFSv4.1, 18011 the value for this field is 1. If the server receives a COMPOUND 18012 procedure with a minorversion field value that it does not support, 18013 the server MUST return an error of NFS4ERR_MINOR_VERS_MISMATCH and a 18014 zero length resultdata array. 18016 Contained within the COMPOUND results is a "status" field. If the 18017 results array length is non-zero, this status must be equivalent to 18018 the status of the last operation that was executed within the 18019 COMPOUND procedure. Therefore, if an operation incurred an error 18020 then the "status" value will be the same error value as is being 18021 returned for the operation that failed. 18023 Note that operations, 0 (zero) and 1 (one) are not defined for the 18024 COMPOUND procedure. Operation 2 is not defined and is reserved for 18025 future definition and use with minor versioning. If the server 18026 receives a operation array that contains operation 2 and the 18027 minorversion field has a value of 0 (zero), an error of 18028 NFS4ERR_OP_ILLEGAL, as described in the next paragraph, is returned 18029 to the client. If an operation array contains an operation 2 and the 18030 minorversion field is non-zero and the server does not support the 18031 minor version, the server returns an error of 18032 NFS4ERR_MINOR_VERS_MISMATCH. Therefore, the 18033 NFS4ERR_MINOR_VERS_MISMATCH error takes precedence over all other 18034 errors. 18036 It is possible that the server receives a request that contains an 18037 operation that is less than the first legal operation (OP_ACCESS) or 18038 greater than the last legal operation (OP_RELEASE_LOCKOWNER). In 18039 this case, the server's response will encode the opcode OP_ILLEGAL 18040 rather than the illegal opcode of the request. 
The status field in 18041 the ILLEGAL return results will be set to NFS4ERR_OP_ILLEGAL. The 18042 COMPOUND procedure's return results will also be NFS4ERR_OP_ILLEGAL. 18044 The definition of the "tag" in the request is left to the 18045 implementor. It may be used to summarize the content of the compound 18046 request for the benefit of packet sniffers and engineers debugging 18047 implementations. However, the value of "tag" in the response SHOULD 18048 be the same value as provided in the request. This applies to the 18049 tag field of the CB_COMPOUND procedure as well. 18051 16.2.3.1. Current Filehandle and Stateid 18053 The COMPOUND procedure offers a simple environment for the execution 18054 of the operations specified by the client. Four values are part of that environment: the first two relate to 18055 the filehandle (the current and saved filehandles) while the second two relate to the stateid (the current and saved stateids). 18057 16.2.3.1.1. Current Filehandle 18059 The current and saved filehandles are used throughout the protocol. 18060 Most operations implicitly use the current filehandle as an argument 18061 and many set the current filehandle as part of the results. The 18062 combination of client-specified sequences of operations and current 18063 and saved filehandle arguments and results allows for greater 18064 protocol flexibility. The best or easiest example of current 18065 filehandle usage is a sequence like the following: 18067 PUTFH fh1 {fh1} 18068 LOOKUP "compA" {fh2} 18069 GETATTR {fh2} 18070 LOOKUP "compB" {fh3} 18071 GETATTR {fh3} 18072 LOOKUP "compC" {fh4} 18073 GETATTR {fh4} 18074 GETFH 18076 Figure 85 18078 In this example, the PUTFH (Section 18.19) operation explicitly sets 18079 the current filehandle value while the result of each LOOKUP 18080 operation sets the current filehandle value to the resultant file 18081 system object. Also, the client is able to insert GETATTR operations 18082 using the current filehandle as an argument. 18084 The PUTROOTFH (Section 18.21) and PUTPUBFH (Section 18.20) operations 18085 also set the current filehandle. The above example would replace 18086 "PUTFH fh1" with PUTROOTFH or PUTPUBFH with no filehandle argument in 18087 order to achieve the same effect (on the assumption that "compA" is 18088 directly below the root of the namespace). 18090 Along with the current filehandle, there is a saved filehandle. 18091 While the current filehandle is set as the result of operations like 18092 LOOKUP, the saved filehandle must be set directly with the use of the 18093 SAVEFH operation. The SAVEFH operation copies the current 18094 filehandle value to the saved value. The saved filehandle value is 18095 used in combination with the current filehandle value for the LINK 18096 and RENAME operations. The RESTOREFH operation will copy the saved 18097 filehandle value to the current filehandle value; as a result, the 18098 saved filehandle value may be used as a sort of "scratch" area for the 18099 client's series of operations. 18101 16.2.3.1.2. Current Stateid 18103 With NFSv4.1, additions of a current stateid and a saved stateid have 18104 been made to the COMPOUND processing environment; this allows for the 18105 passing of stateids between operations. There are no changes to the 18106 syntax of the protocol, only changes to the semantics of a few 18107 operations. 18109 A "current stateid" is the stateid that is associated with the 18110 current filehandle. The current stateid may only be changed by an 18111 operation that modifies the current filehandle or returns a stateid.
18112 If an operation returns a stateid it MUST set the current stateid to 18113 the returned value. If an operation sets the current filehandle but 18114 does not return a stateid, the current stateid MUST be set to the 18115 all-zeros special stateid, i.e., (seqid, other) = (0, 0). If an 18116 operation uses a stateid as an argument but does not return a 18117 stateid, the current stateid MUST NOT be changed. E.g., PUTFH, 18118 PUTROOTFH, and PUTPUBFH will change the current server state from 18119 {ocfh, (osid)} to {cfh, (0, 0)} while LOCK will change the current 18120 state from {cfh, (osid)} to {cfh, (nsid)}. Operations like LOOKUP 18121 that transform a current filehandle and component name into a new 18122 current filehandle will also change the current stateid to {0, 0}. 18123 The SAVEFH and RESTOREFH operations will save and restore both the 18124 current filehandle and the current stateid as a set. 18126 The following example is the common case of a simple READ operation 18127 with a supplied stateid showing that the PUTFH initializes the 18128 current stateid to (0, 0). The subsequent READ with stateid (sid1) 18129 leaves the current stateid unchanged, but does evaluate the 18130 operation. 18132 PUTFH fh1 - -> {fh1, (0, 0)} 18133 READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)} 18135 Figure 86 18137 This next example performs an OPEN with the root filehandle and as a 18138 result generates stateid (sid1). The next operation specifies the 18139 READ with the argument stateid set such that (seqid, other) are equal 18140 to (1, 0), but the current stateid set by the previous operation is 18141 actually used when the operation is evaluated. This allows correct 18142 interaction with any existing, potentially conflicting, locks. 18144 PUTROOTFH - -> {fh1, (0, 0)} 18145 OPEN "compA" {fh1, (0, 0)} -> {fh2, (sid1)} 18146 READ (1, 0), 0, 1024 {fh2, (sid1)} -> {fh2, (sid1)} 18147 CLOSE (1, 0) {fh2, (sid1)} -> {fh2, (sid2)} 18148 Figure 87 18150 This next example is similar to the second in how it passes the 18151 stateid sid2 generated by the LOCK operation to the next READ 18152 operation. This allows the client to explicitly surround a single 18153 I/O operation with a lock and its appropriate stateid to guarantee 18154 correctness with other client locks. The example also shows how 18155 SAVEFH and RESTOREFH can save and later re-use a filehandle and 18156 stateid, passing them as the current filehandle and stateid to a READ 18157 operation. 18159 PUTFH fh1 - -> {fh1, (0, 0)} 18160 LOCK 0, 1024, (sid1) {fh1, (sid1)} -> {fh1, (sid2)} 18161 READ (1, 0), 0, 1024 {fh1, (sid2)} -> {fh1, (sid2)} 18162 LOCKU 0, 1024, (1, 0) {fh1, (sid2)} -> {fh1, (sid3)} 18163 SAVEFH {fh1, (sid3)} -> {fh1, (sid3)} 18165 PUTFH fh2 {fh1, (sid3)} -> {fh2, (0, 0)} 18166 WRITE (1, 0), 0, 1024 {fh2, (0, 0)} -> {fh2, (0, 0)} 18168 RESTOREFH {fh2, (0, 0)} -> {fh1, (sid3)} 18169 READ (1, 0), 1024, 1024 {fh1, (sid3)} -> {fh1, (sid3)} 18171 Figure 88 18173 The final example shows a disallowed use of the current stateid. The 18174 client is attempting to implicitly pass the anonymous special stateid 18175 (0, 0) to the READ operation. The server MUST return 18176 NFS4ERR_BAD_STATEID in the reply to the READ operation. 18178 PUTFH fh1 - -> {fh1, (0, 0)} 18179 READ (1, 0), 0, 1024 {fh1, (0, 0)} -> NFS4ERR_BAD_STATEID 18181 Figure 89 18183 16.2.4. ERRORS 18185 COMPOUND will of course return every error that each operation on the 18186 fore channel can return (see Table 12).
However if COMPOUND returns 18187 zero operations, obviously the error returned by COMPOUND has nothing 18188 to do with an error returned by an operation. The list of errors 18189 COMPOUND will return if it processes zero operations include: 18191 COMPOUND error returns 18193 +------------------------------+------------------------------------+ 18194 | Error | Notes | 18195 +------------------------------+------------------------------------+ 18196 | NFS4ERR_BADCHAR | The tag argument has a character | 18197 | | the replier does not support. | 18198 | NFS4ERR_BADXDR | | 18199 | NFS4ERR_DELAY | | 18200 | NFS4ERR_INVAL | The tag argument is not in UTF-8 | 18201 | | encoding. | 18202 | NFS4ERR_MINOR_VERS_MISMATCH | | 18203 | NFS4ERR_SERVERFAULT | | 18204 | NFS4ERR_TOO_MANY_OPS | | 18205 | NFS4ERR_REP_TOO_BIG | | 18206 | NFS4ERR_REP_TOO_BIG_TO_CACHE | | 18207 | NFS4ERR_REQ_TOO_BIG | | 18208 +------------------------------+------------------------------------+ 18210 Table 15 18212 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL 18214 The following tables summarize the operations of the NFSv4.1 protocol 18215 and the corresponding designation of REQUIRED, RECOMMENDED, OPTIONAL 18216 to implement or MUST NOT implement. The designation of MUST NOT 18217 implement is reserved for those operations that were defined in 18218 NFSv4.0 and MUST NOT be implemented in NFSv4.1. 18220 For the most part, the REQUIRED, RECOMMENDED, or OPTIONAL designation 18221 for operations sent by the client is for the server implementation. 18222 The client is generally required to implement the operations needed 18223 for the operating environment for which it serves. For example, a 18224 read-only NFSv4.1 client would have no need to implement the WRITE 18225 operation and is not required to do so. 18227 The REQUIRED or OPTIONAL designation for callback operations sent by 18228 the server is for both the client and server. Generally, the client 18229 has the option of creating the backchannel and sending the operations 18230 on the fore channel that will be a catalyst for the server sending 18231 callback operations. A partial exception is CB_RECALL_SLOT; the only 18232 way the client can avoid supporting this operation is by not creating 18233 a backchannel. 18235 Since this is a summary of the operations and their designation, 18236 there are subtleties that are not presented here. Therefore, if 18237 there is a question of the requirements of implementation, the 18238 operation descriptions themselves must be consulted along with other 18239 relevant explanatory text within this specification. 18241 The abbreviations used in the second and third columns of the table 18242 are defined as follows. 18244 REQ REQUIRED to implement 18246 REC RECOMMEND to implement 18248 OPT OPTIONAL to implement 18250 MNI MUST NOT implement 18252 For the NFSv4.1 features that are OPTIONAL, the operations that 18253 support those features are OPTIONAL and the server would return 18254 NFS4ERR_NOTSUPP in response to the client's use of those operations. 18255 If an OPTIONAL feature is supported, it is possible that a set of 18256 operations related to the feature become REQUIRED to implement. The 18257 third column of the table designates the feature(s) and if the 18258 operation is REQUIRED or OPTIONAL in the presence of support for the 18259 feature. 
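To make the relationship between OPTIONAL features and NFS4ERR_NOTSUPP concrete, the following non-normative C sketch shows one way a server might dispatch operations against a table of the operations it actually implements. The table contents and the name dispatch_op() are illustrative assumptions, not requirements of this specification; opcode and error values are those defined by this document.

   /* Non-normative sketch: a server that does not implement an
    * OPTIONAL feature (here, pNFS) answers that feature's operations
    * with NFS4ERR_NOTSUPP; REQUIRED operations are always dispatched. */
   #include <stdbool.h>

   #define NFS4_OK         0
   #define NFS4ERR_NOTSUPP 10004

   enum { OP_ACCESS = 3, OP_LAYOUTGET = 50, OP_RECLAIM_COMPLETE = 58 };

   /* Which operations this particular server instance implements;
    * only illustrative entries are shown. */
   static const bool op_implemented[OP_RECLAIM_COMPLETE + 1] = {
       [OP_ACCESS]    = true,   /* REQUIRED */
       [OP_LAYOUTGET] = false,  /* OPTIONAL: pNFS not supported here */
   };

   static int dispatch_op(unsigned op)
   {
       /* Illegal opcodes are handled separately (Section 16.2.3);
        * here "op" is assumed to be within the defined range. */
       if (!op_implemented[op])
           return NFS4ERR_NOTSUPP;  /* unsupported OPTIONAL operation */
       /* ... decode arguments and execute the operation ... */
       return NFS4_OK;
   }

A client, in turn, can probe for OPTIONAL support simply by sending such an operation and treating NFS4ERR_NOTSUPP as an indication that the feature is absent.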
18261 The OPTIONAL features identified and their abbreviations are as 18262 follows: 18264 pNFS Parallel NFS 18266 FDELG File Delegations 18268 DDELG Directory Delegations 18270 Operations 18272 +----------------------+------------+--------------+----------------+ 18273 | Operation | REQ, REC, | Feature | Definition | 18274 | | OPT, or | (REQ, REC, | | 18275 | | MNI | or OPT) | | 18276 +----------------------+------------+--------------+----------------+ 18277 | ACCESS | REQ | | Section 18.1 | 18278 | BACKCHANNEL_CTL | REQ | | Section 18.33 | 18279 | BIND_CONN_TO_SESSION | REQ | | Section 18.34 | 18280 | CLOSE | REQ | | Section 18.2 | 18281 | COMMIT | REQ | | Section 18.3 | 18282 | CREATE | REQ | | Section 18.4 | 18283 | CREATE_SESSION | REQ | | Section 18.36 | 18284 | DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 | 18285 | DELEGRETURN | OPT | FDELG, | Section 18.6 | 18286 | | | DDELG, pNFS | | 18287 | | | (REQ) | | 18288 | DESTROY_CLIENTID | REQ | | Section 18.50 | 18289 | DESTROY_SESSION | REQ | | Section 18.37 | 18290 | EXCHANGE_ID | REQ | | Section 18.35 | 18291 | FREE_STATEID | REQ | | Section 18.38 | 18292 | GETATTR | REQ | | Section 18.7 | 18293 | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 | 18294 | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 | 18295 | GETFH | REQ | | Section 18.8 | 18296 | GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 | 18297 | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 | 18298 | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 | 18299 | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 | 18300 | LINK | OPT | | Section 18.9 | 18301 | LOCK | REQ | | Section 18.10 | 18302 | LOCKT | REQ | | Section 18.11 | 18303 | LOCKU | REQ | | Section 18.12 | 18304 | LOOKUP | REQ | | Section 18.13 | 18305 | LOOKUPP | REQ | | Section 18.14 | 18306 | NVERIFY | REQ | | Section 18.15 | 18307 | OPEN | REQ | | Section 18.16 | 18308 | OPENATTR | OPT | | Section 18.17 | 18309 | OPEN_CONFIRM | MNI | | N/A | 18310 | OPEN_DOWNGRADE | REQ | | Section 18.18 | 18311 | PUTFH | REQ | | Section 18.19 | 18312 | PUTPUBFH | REQ | | Section 18.20 | 18313 | PUTROOTFH | REQ | | Section 18.21 | 18314 | READ | REQ | | Section 18.22 | 18315 | READDIR | REQ | | Section 18.23 | 18316 | READLINK | OPT | | Section 18.24 | 18317 | RECLAIM_COMPLETE | REQ | | Section 18.51 | 18318 | RELEASE_LOCKOWNER | MNI | | N/A | 18319 | REMOVE | REQ | | Section 18.25 | 18320 | RENAME | REQ | | Section 18.26 | 18321 | RENEW | MNI | | N/A | 18322 | RESTOREFH | REQ | | Section 18.27 | 18323 | SAVEFH | REQ | | Section 18.28 | 18324 | SECINFO | REQ | | Section 18.29 | 18325 | SECINFO_NO_NAME | REC | pNFS files | Section 18.45, | 18326 | | | layout (REQ) | Section 13.12 | 18327 | SEQUENCE | REQ | | Section 18.46 | 18328 | SETATTR | REQ | | Section 18.30 | 18329 | SETCLIENTID | MNI | | N/A | 18330 | SETCLIENTID_CONFIRM | MNI | | N/A | 18331 | SET_SSV | REQ | | Section 18.47 | 18332 | TEST_STATEID | REQ | | Section 18.48 | 18333 | VERIFY | REQ | | Section 18.31 | 18334 | WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 | 18335 | WRITE | REQ | | Section 18.32 | 18336 +----------------------+------------+--------------+----------------+ 18338 Callback Operations: 18340 Callback Operations 18342 +-------------------------+-----------+-------------+---------------+ 18343 | Operation | REQ, REC, | Feature | Definition | 18344 | | OPT, or | (REQ, REC, | | 18345 | | MNI | or OPT) | | 18346 +-------------------------+-----------+-------------+---------------+ 18347 | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 | 18348 
| CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | 18349 | CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 | 18350 | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.4 | 18351 | CB_NOTIFY_LOCK | OPT | | Section 20.11 | 18352 | CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 | 18353 | CB_RECALL | OPT | FDELG, | Section 20.2 | 18354 | | | DDELG, pNFS | | 18355 | | | (REQ) | | 18356 | CB_RECALL_ANY | OPT | FDELG, | Section 20.6 | 18357 | | | DDELG, pNFS | | 18358 | | | (REQ) | | 18359 | CB_RECALL_SLOT | REQ | | Section 20.8 | 18360 | CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS | Section 20.7 | 18361 | | | (REQ) | | 18362 | CB_SEQUENCE | OPT | FDELG, | Section 20.9 | 18363 | | | DDELG, pNFS | | 18364 | | | (REQ) | | 18365 | CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 | 18366 | | | DDELG, pNFS | | 18367 | | | (REQ) | | 18368 +-------------------------+-----------+-------------+---------------+ 18370 18. NFSv4.1 Operations 18372 18.1. Operation 3: ACCESS - Check Access Rights 18373 18.1.1. ARGUMENTS 18375 const ACCESS4_READ = 0x00000001; 18376 const ACCESS4_LOOKUP = 0x00000002; 18377 const ACCESS4_MODIFY = 0x00000004; 18378 const ACCESS4_EXTEND = 0x00000008; 18379 const ACCESS4_DELETE = 0x00000010; 18380 const ACCESS4_EXECUTE = 0x00000020; 18382 struct ACCESS4args { 18383 /* CURRENT_FH: object */ 18384 uint32_t access; 18385 }; 18387 18.1.2. RESULTS 18389 struct ACCESS4resok { 18390 uint32_t supported; 18391 uint32_t access; 18392 }; 18394 union ACCESS4res switch (nfsstat4 status) { 18395 case NFS4_OK: 18396 ACCESS4resok resok4; 18397 default: 18398 void; 18399 }; 18401 18.1.3. DESCRIPTION 18403 ACCESS determines the access rights that a user, as identified by the 18404 credentials in the RPC request, has with respect to the file system 18405 object specified by the current filehandle. The client encodes the 18406 set of access rights that are to be checked in the bit mask "access". 18407 The server checks the permissions encoded in the bit mask. If a 18408 status of NFS4_OK is returned, two bit masks are included in the 18409 response. The first, "supported", represents the access rights for 18410 which the server can verify reliably. The second, "access", 18411 represents the access rights available to the user for the filehandle 18412 provided. On success, the current filehandle retains its value. 18414 Note that the reply's supported and access fields MUST NOT contain 18415 more values than originally set in the request's access field. For 18416 example, if the client sends an ACCESS operation with just the 18417 ACCESS4_READ value set and the server supports this value, the server 18418 MUST NOT set more than ACCESS4_READ in the supported field even if it 18419 could have reliably checked other values. 18421 The reply's access field MUST NOT contain more values than the 18422 supported field. 18424 The results of this operation are necessarily advisory in nature. A 18425 return status of NFS4_OK and the appropriate bit set in the bit mask 18426 does not imply that such access will be allowed to the file system 18427 object in the future. This is because access rights can be revoked 18428 by the server at any time. 18430 The following access permissions may be requested: 18432 ACCESS4_READ Read data from file or read a directory. 18434 ACCESS4_LOOKUP Look up a name in a directory (no meaning for non- 18435 directory objects). 18437 ACCESS4_MODIFY Rewrite existing file data or modify existing 18438 directory entries. 18440 ACCESS4_EXTEND Write new data or add directory entries. 
18442 ACCESS4_DELETE Delete an existing directory entry. 18444 ACCESS4_EXECUTE Execute a regular file (no meaning for a directory). 18446 On success, the current filehandle retains its value. 18448 ACCESS4_EXECUTE is a challenging semantic to implement because NFS 18449 provides remote file access, not remote execution. This leads to the 18450 following: 18452 o Whether a regular file is executable or not ought to be the 18453 responsibility of the NFS client and not the server. And yet the 18454 ACCESS operation is specified to seemingly require a server to own 18455 that responsibility. 18457 o When a client executes a regular file, it has to read the file 18458 from the server. Strictly speaking, the server should not allow 18459 the client to read a file being executed unless the user has read 18460 permissions on the file. Requiring users and administers to set 18461 read permissions on executable files in order to access them over 18462 NFS is not going to be acceptable to some people. Historically, 18463 NFS servers have allowed a user to READ a file if the user has 18464 execute access to the file. 18466 As a practical example, the UNIX specification [41] states that an 18467 implementation claiming conformance to UNIX may indicate in the 18468 access() programming interface's result that a privileged user has 18469 execute rights, even if no execute permission bits are set on the 18470 regular file's attributes. It is possible to claim conformance to 18471 the UNIX specification and instead not indicate execute rights in 18472 that situation, which is true for some operating environments. 18473 Suppose the operating environments of the client and server are 18474 implementing the access() semantics for privileged users differently, 18475 and the ACCESS operation implementations of the client and server 18476 follow their respective access() semantics. This can cause undesired 18477 behavior: 18479 o Suppose the client's access() interface returns X_OK if the user 18480 is privileged and no execute permission bits are set on the 18481 regular file's attribute, and the server's access() interface does 18482 not return X_OK in that situation. Then the client will be unable 18483 to execute files stored on the NFS server that could be executed 18484 if stored on a non-NFS file system. 18486 o Suppose the client's access() interface does not return X_OK if 18487 the user is privileged, and no execute permission bits are set on 18488 the regular file's attribute, and the server's access() interface 18489 does return X_OK in that situation. Then: 18491 * The client will be able to execute files stored on the NFS 18492 server that could be executed if stored on a non-NFS file 18493 system, unless the client's execution subsystem also checks for 18494 execute permission bits. 18496 * Even if the execution subsystem is checking for execute 18497 permission bits, there are more potential issues. E.g. suppose 18498 the client is invoking access() to build a "path search table" 18499 of all executable files in the user's "search path", where the 18500 path is a list of directories each containing executable files. 18501 Suppose there are two files each in separate directories of the 18502 search path, such that files have the same component name. In 18503 the first directory the file has no execute permission bits 18504 set, and in the second directory the file has execute bits set. 
18505 The path search table will indicate that the first directory 18506 has the executable file, but the execute subsystem will fail to 18507 execute it. The command shell might fail to try the second 18508 file in the second directory. And even if it did, this is a 18509 potential performance issue. Clearly the desired outcome for 18510 the client is for the path search table to not contain the 18511 first file. 18513 To deal the problems described above, the smart client, stupid server 18514 principle is used. The client owns overall responsibility for 18515 determining execute access and relies on the server to parse the 18516 execution permissions within the file's mode, acl, and dacl 18517 attributes. The rules for the client and server follow: 18519 o If the client is sending ACCESS in order to determine if the user 18520 can read the file, the client SHOULD set ACCESS4_READ in the 18521 request's access field. 18523 o If the client's operating environment only grants execution to the 18524 user if the user has execute access according to the execute 18525 permissions in the mode, acl, and dacl attributes, then if the 18526 client wants to determine execute access, the client SHOULD send 18527 an ACCESS request with ACCESS4_EXECUTE bit set in the request's 18528 access field. 18530 o If the client's operating environment grants execution to the user 18531 even if the user does not have execute access according to the 18532 execute permissions in the mode, acl, and dacl attributes, then if 18533 the client wants to determine execute access, it SHOULD send an 18534 ACCESS request with both the ACCESS4_EXECUTE and ACCESS4_READ bits 18535 set in the request's access field. This way, if any read or 18536 execute permission grants the user read or execute access (or if 18537 the server interprets the user as privileged), as indicated by the 18538 presence of ACCESS4_EXECUTE and/or ACCESS4_READ in the reply's 18539 access field, the client will be able to grant the user execute 18540 access to the file. 18542 o If the server supports execute permission bits, or some other 18543 method for denoting executability (e.g. the suffix of the name of 18544 the file might indicate execute), it MUST check only execute 18545 permissions, not read permissions, when determining whether the 18546 reply will have ACCESS4_EXECUTE set in the access field or not. 18547 The server MUST NOT also examine read permission bits when 18548 determining whether the reply will have ACCESS4_EXECUTE set in the 18549 access field or not. Even if the server's operating environment 18550 would grant execute access to the user (e.g., the user is 18551 privileged), the server MUST NOT reply with ACCESS4_EXECUTE set in 18552 reply's access field, unless there is at least one execute 18553 permission bit set in the mode, acl, or dacl attributes. In the 18554 case of acl and dacl, the "one execute permission bit" MUST be an 18555 ACE4_EXECUTE bit set in an ALLOW ACE. 18557 o If the server does not support execute permission bits or some 18558 other method for denoting executability, it MUST NOT set 18559 ACCESS4_EXECUTE in the reply's supported and access fields. If 18560 the client set ACCESS4_EXECUTE in the ACCESS request's access 18561 field, and ACCESS4_EXECUTE is not set in the reply's supported 18562 field, then the client will have to send an ACCESS request with 18563 the ACCESS4_READ bit set in the request's access field. 
18565 o If the server supports read permission bits, it MUST only check 18566 for read permissions in the mode, acl, and dacl attributes when it 18567 receives an ACCESS request with ACCESS4_READ set in the access field. 18568 The server MUST NOT also examine execute permission bits when 18569 determining whether the reply will have ACCESS4_READ set in the 18570 access field or not. 18572 Note that if the ACCESS reply has ACCESS4_READ or ACCESS4_EXECUTE set, 18573 then the user also has permissions to OPEN (Section 18.16) or READ 18574 (Section 18.22) the file. I.e., if the client sends an ACCESS request 18575 with the ACCESS4_READ and ACCESS4_EXECUTE bits set in the access field (or 18576 two separate requests, one with ACCESS4_READ set, and the other with 18577 ACCESS4_EXECUTE set), and the reply has just ACCESS4_EXECUTE set in 18578 the access field (or just one reply has ACCESS4_EXECUTE set), then 18579 the user has authorization to OPEN or READ the file. 18581 18.1.4. IMPLEMENTATION 18583 In general, it is not sufficient for the client to attempt to deduce 18584 access permissions by inspecting the uid, gid, and mode fields in the 18585 file attributes or by attempting to interpret the contents of the ACL 18586 attribute. This is because the server may perform uid or gid mapping 18587 or enforce additional access control restrictions. It is also 18588 possible that the server may not be in the same ID space as the 18589 client. In these cases (and perhaps others), the client cannot 18590 reliably perform an access check with only current file attributes. 18592 In the NFSv2 protocol, the only reliable way to determine whether an 18593 operation was allowed was to try it and see if it succeeded or 18594 failed. Using the ACCESS operation in the NFSv4.1 protocol, the 18595 client can ask the server to indicate whether or not one or more 18596 classes of operations are permitted. The ACCESS operation is 18597 provided to allow clients to check before doing a series of 18598 operations which will result in an access failure. The OPEN 18599 operation provides a point where the server can verify access to the 18600 file object and a method to return that information to the client. The 18601 ACCESS operation is still useful for directory operations or for use 18602 in the case where the UNIX interface access() is used on the client. 18604 The information returned by the server in response to an ACCESS call 18605 is not permanent. It was correct at the exact time that the server 18606 performed the checks, but not necessarily afterwards. The server can 18607 revoke access permission at any time. 18609 The client should use the effective credentials of the user to build 18610 the authentication information in the ACCESS request used to 18611 determine access rights. It is the effective user and group 18612 credentials that are used in subsequent read and write operations. 18614 Many implementations do not directly support the ACCESS4_DELETE 18615 permission. Operating systems like UNIX will ignore the 18616 ACCESS4_DELETE bit if set on an access request on a non-directory 18617 object. In these systems, delete permission on a file is determined 18618 by the access permissions on the directory in which the file resides, 18619 instead of being determined by the permissions of the file itself. 18620 Therefore, the mask returned enumerating which access rights can be 18621 determined will have the ACCESS4_DELETE value set to 0.
This 18622 indicates to the client that the server was unable to check that 18623 particular access right. The ACCESS4_DELETE bit in the access mask 18624 returned will then be ignored by the client. 18626 18.2. Operation 4: CLOSE - Close File 18628 18.2.1. ARGUMENTS 18630 struct CLOSE4args { 18631 /* CURRENT_FH: object */ 18632 seqid4 seqid; 18633 stateid4 open_stateid; 18634 }; 18636 18.2.2. RESULTS 18638 union CLOSE4res switch (nfsstat4 status) { 18639 case NFS4_OK: 18640 stateid4 open_stateid; 18641 default: 18642 void; 18643 }; 18645 18.2.3. DESCRIPTION 18647 The CLOSE operation releases share reservations for the regular or 18648 named attribute file as specified by the current filehandle. The 18649 share reservations and other state information released at the server 18650 as a result of this CLOSE is only that associated with the supplied 18651 stateid. State associated with other OPENs is not affected. 18653 If byte-range locks are held, the client SHOULD release all locks 18654 before issuing a CLOSE. The server MAY free all outstanding locks on 18655 CLOSE but some servers may not support the CLOSE of a file that still 18656 has byte-range locks held. The server MUST return failure if any 18657 locks would exist after the CLOSE. 18659 The argument seqid MAY have any value and the server MUST ignore 18660 seqid. 18662 On success, the current filehandle retains its value. 18664 The server MAY require that the principal, security flavor, and if 18665 applicable, the GSS mechanism, combination that sent the OPEN request 18666 also be the one to CLOSE the file. This might not be possible if 18667 credentials for the principal are no longer available. The server 18668 MAY allow the machine credential or SSV credential (see 18669 Section 18.35) to send CLOSE. 18671 18.2.4. IMPLEMENTATION 18673 Even though CLOSE returns a stateid, this stateid is not useful to 18674 the client and should be treated as deprecated. CLOSE "shuts down" 18675 the state associated with all OPENs for the file by a single open- 18676 owner. As noted above, CLOSE will either release all file locking 18677 state or return an error. Therefore, the stateid returned by CLOSE 18678 is not useful for operations that follow. To help find any uses of 18679 this stateid by clients, the server SHOULD return the invalid special 18680 stateid (the "other" value is zero and the "seqid" field is 18681 NFS4_UINT32_MAX, see Section 8.2.3). 18683 A CLOSE operation may make delegations grantable where they were not 18684 previously. Servers may choose to respond immediately if there are 18685 pending delegation want requests or may respond to the situation at a 18686 later time. 18688 18.3. Operation 5: COMMIT - Commit Cached Data 18690 18.3.1. ARGUMENTS 18692 struct COMMIT4args { 18693 /* CURRENT_FH: file */ 18694 offset4 offset; 18695 count4 count; 18696 }; 18698 18.3.2. RESULTS 18700 struct COMMIT4resok { 18701 verifier4 writeverf; 18702 }; 18704 union COMMIT4res switch (nfsstat4 status) { 18705 case NFS4_OK: 18706 COMMIT4resok resok4; 18707 default: 18708 void; 18709 }; 18711 18.3.3. DESCRIPTION 18713 The COMMIT operation forces or flushes uncommitted, modified data to 18714 stable storage for the file specified by the current filehandle. The 18715 flushed data is that which was previously written with a WRITE 18716 operation which had the stable field set to UNSTABLE4. 18718 The offset specifies the position within the file where the flush is 18719 to begin.
An offset value of 0 (zero) means to flush data starting 18720 at the beginning of the file. The count specifies the number of 18721 bytes of data to flush. If count is 0 (zero), a flush from offset to 18722 the end of the file is done. 18724 The server returns a write verifier upon successful completion of the 18725 COMMIT. The write verifier is used by the client to determine if the 18726 server has restarted between the initial WRITE(s) and the COMMIT. 18727 The client does this by comparing the write verifier returned from 18728 the initial writes and the verifier returned by the COMMIT operation. 18729 The server must vary the value of the write verifier at each server 18730 event or instantiation that may lead to a loss of uncommitted data. 18731 Most commonly this occurs when the server is restarted; however, 18732 other events at the server may result in uncommitted data loss as 18733 well. 18735 On success, the current filehandle retains its value. 18737 18.3.4. IMPLEMENTATION 18739 The COMMIT operation is similar in operation and semantics to the 18740 POSIX fsync(2) system call that synchronizes a file's state with the 18741 disk (file data and metadata is flushed to disk or stable storage). 18742 COMMIT performs the same operation for a client, flushing any 18743 unsynchronized data and metadata on the server to the server's disk 18744 or stable storage for the specified file. Like fsync(2), it may be 18745 that there is some modified data or no modified data to synchronize. 18746 The data may have been synchronized by the server's normal periodic 18747 buffer synchronization activity. COMMIT should return NFS4_OK, 18748 unless there has been an unexpected error. 18750 COMMIT differs from fsync(2) in that it is possible for the client to 18751 flush a range of the file (most likely triggered by a buffer- 18752 reclamation scheme on the client before file has been completely 18753 written). 18755 The server implementation of COMMIT is reasonably simple. If the 18756 server receives a full file COMMIT request, that is starting at 18757 offset 0 and count 0, it should do the equivalent of fsync()'ing the 18758 file. Otherwise, it should arrange to have the modified data in the 18759 range specified by offset and count to be flushed to stable storage. 18760 In both cases, any metadata associated with the file must be flushed 18761 to stable storage before returning. It is not an error for there to 18762 be nothing to flush on the server. This means that the data and 18763 metadata that needed to be flushed have already been flushed or lost 18764 during the last server failure. 18766 The client implementation of COMMIT is a little more complex. There 18767 are two reasons for wanting to commit a client buffer to stable 18768 storage. The first is that the client wants to reuse a buffer. In 18769 this case, the offset and count of the buffer are sent to the server 18770 in the COMMIT request. The server then flushes any modified data 18771 based on the offset and count, and flushes any modified metadata 18772 associated with the file. It then returns the status of the flush 18773 and the write verifier. The other reason for the client to generate 18774 a COMMIT is for a full file flush, such as may be done at close. In 18775 this case, the client would gather all of the buffers for this file 18776 that contain uncommitted data, do the COMMIT operation with an offset 18777 of 0 and count of 0, and then free all of those buffers. 
Any other 18778 dirty buffers would be sent to the server in the normal fashion. 18780 After a buffer is written by the client with the stable parameter set 18781 to UNSTABLE4, the buffer must be considered as modified by the client 18782 until the buffer has either been flushed via a COMMIT operation or 18783 written via a WRITE operation with stable parameter set to FILE_SYNC4 18784 or DATA_SYNC4. This is done to prevent the buffer from being freed 18785 and reused before the data can be flushed to stable storage on the 18786 server. 18788 When a response is returned from either a WRITE or a COMMIT operation 18789 and it contains a write verifier that is different than previously 18790 returned by the server, the client will need to retransmit all of the 18791 buffers containing uncommitted data to the server. How this is to be 18792 done is up to the implementor. If there is only one buffer of 18793 interest, then it should be sent in a WRITE request with the FILE_SYNC4 18794 stable parameter. If there is more than one buffer, it might be 18795 worthwhile retransmitting all of the buffers in WRITE requests with 18796 the stable parameter set to UNSTABLE4 and then retransmitting the 18797 COMMIT operation to flush all of the data on the server to stable 18798 storage. However, if the server repeatedly returns from COMMIT a 18799 verifier that differs from that returned by WRITE, the only way to 18800 ensure progress is to retransmit all of the buffers with WRITE 18801 requests with the FILE_SYNC4 stable parameter. 18803 The above description applies to page-cache-based systems as well as 18804 buffer-cache-based systems. In those systems, the virtual memory 18805 system will need to be modified instead of the buffer cache. 18807 18.4. Operation 6: CREATE - Create a Non-Regular File Object 18809 18.4.1. ARGUMENTS 18811 union createtype4 switch (nfs_ftype4 type) { 18812 case NF4LNK: 18813 linktext4 linkdata; 18814 case NF4BLK: 18815 case NF4CHR: 18816 specdata4 devdata; 18817 case NF4SOCK: 18818 case NF4FIFO: 18819 case NF4DIR: 18820 void; 18821 default: 18822 void; /* server should return NFS4ERR_BADTYPE */ 18823 }; 18825 struct CREATE4args { 18826 /* CURRENT_FH: directory for creation */ 18827 createtype4 objtype; 18828 component4 objname; 18829 fattr4 createattrs; 18830 }; 18832 18.4.2. RESULTS 18834 struct CREATE4resok { 18835 change_info4 cinfo; 18836 bitmap4 attrset; /* attributes set */ 18837 }; 18839 union CREATE4res switch (nfsstat4 status) { 18840 case NFS4_OK: 18841 /* new CURRENTFH: created object */ 18842 CREATE4resok resok4; 18843 default: 18844 void; 18845 }; 18847 18.4.3. DESCRIPTION 18849 The CREATE operation creates a file object other than an ordinary 18850 file in a directory with a given name. The OPEN operation MUST be 18851 used to create a regular file or a named attribute. 18853 The current filehandle must be a directory: an object of type NF4DIR. 18854 If the current filehandle is an attribute directory (type 18855 NF4ATTRDIR), the error NFS4ERR_WRONG_TYPE is returned. If the 18856 current filehandle designates any other type of object, the error 18857 NFS4ERR_NOTDIR results. 18859 The objname specifies the name for the new object. The objtype 18860 determines the type of object to be created: directory, symlink, etc. 18861 If the object type specified is that of an ordinary file, a named 18862 attribute, or a named attribute directory, the error NFS4ERR_BADTYPE 18863 results.
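As a non-normative illustration of the ARGUMENTS above, the following C fragment fills in a CREATE4args for a new subdirectory, assuming the XDR definitions have been compiled with rpcgen. The header name and the helpers make_component() and empty_fattr() are hypothetical and not part of this specification.

   /* Non-normative client-side sketch: build CREATE4args to create a
    * subdirectory "docs" under the current filehandle.  An empty
    * createattrs leaves owner/group derivation to the server. */
   #include <string.h>
   #include "nfs41_prot.h"   /* hypothetical rpcgen output of the XDR */

   component4 make_component(const char *name);  /* hypothetical helper */
   fattr4     empty_fattr(void);                 /* hypothetical helper */

   static void build_mkdir_args(CREATE4args *args)
   {
       memset(args, 0, sizeof(*args));
       args->objtype.type = NF4DIR;   /* directory: no type-specific arm */
       args->objname      = make_component("docs");
       args->createattrs  = empty_fattr();
   }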
18865 If an object of the same name already exists in the directory, the 18866 server will return the error NFS4ERR_EXIST. 18868 For the directory where the new file object was created, the server 18869 returns change_info4 information in cinfo. With the atomic field of 18870 the change_info4 data type, the server will indicate if the before 18871 and after change attributes were obtained atomically with respect to 18872 the file object creation. 18874 If the objname has a length of 0 (zero), or if objname does not obey 18875 the UTF-8 definition, the error NFS4ERR_INVAL will be returned. 18877 The current filehandle is replaced by that of the new object. 18879 The createattrs specifies the initial set of attributes for the 18880 object. The set of attributes may include any writable attribute 18881 valid for the object type. When the operation is successful, the 18882 server will return to the client an attribute mask signifying which 18883 attributes were successfully set for the object. 18885 If createattrs includes neither the owner attribute nor an ACL with 18886 an ACE for the owner, and if the server's file system both supports 18887 and requires an owner attribute (or an owner ACE) then the server 18888 MUST derive the owner (or the owner ACE). This would typically be 18889 from the principal indicated in the RPC credentials of the call, but 18890 the server's operating environment or file system semantics may 18891 dictate other methods of derivation. Similarly, if createattrs 18892 includes neither the group attribute nor a group ACE, and if the 18893 server's file system both supports and requires the notion of a group 18894 attribute (or group ACE), the server MUST derive the group attribute 18895 (or the corresponding owner ACE) for the file. This could be from 18896 the RPC call's credentials, such as the group principal if the 18897 credentials include it (such as with AUTH_SYS), from the group 18898 identifier associated with the principal in the credentials (for 18899 e.g., POSIX systems have a passwd database that has the group 18900 identifier for every user identifier), inherited from directory the 18901 object is created in, or whatever else the server's operating 18902 environment or file system semantics dictate. This applies to the 18903 OPEN operation too. 18905 Conversely, it is possible the client will specify in createattrs an 18906 owner attribute, group attribute, or ACL that the principal indicated 18907 the RPC call's credentials does not have permissions to create files 18908 for. The error to be returned in this instance is NFS4ERR_PERM. 18909 This applies to the OPEN operation too. 18911 If the current filehandle designates a directory for which another 18912 client holds a directory delegation, then, unless the delegation is 18913 such that the situation can be resolved by sending a notification, 18914 the delegation MUST be recalled, and the CREATE operation MUST NOT 18915 proceed until the delegation is returned or revoked. Except where 18916 this happens very quickly, one or more NFS4ERR_DELAY errors will be 18917 returned to requests made while delegation remains outstanding. 18919 When the current filehandle designates a directory for which one or 18920 more directory delegations exist, then, when those delegations 18921 request such notifications, NOTIFY4_ADD_ENTRY will be generated as a 18922 result of this operation. 
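The interaction with directory delegations described in the two preceding paragraphs can be summarized by the following non-normative server-side sketch; struct dir_delegation and the three helper functions are hypothetical names, not part of the protocol.

   /* Non-normative sketch: reconciling a CREATE with an outstanding
    * directory delegation held by another client.  If the holder asked
    * for ADD_ENTRY notifications, the CREATE proceeds and a
    * NOTIFY4_ADD_ENTRY is queued; otherwise the delegation is recalled
    * and the CREATE is answered with NFS4ERR_DELAY until the
    * delegation is returned or revoked. */
   #define NFS4ERR_DELAY 10008

   struct dir_delegation;                                      /* hypothetical */
   int  deleg_wants_add_entry_notify(struct dir_delegation *); /* hypothetical */
   void queue_notify_add_entry(struct dir_delegation *);       /* hypothetical */
   void recall_dir_delegation(struct dir_delegation *);        /* hypothetical */

   static int create_vs_dir_delegation(struct dir_delegation *d)
   {
       if (deleg_wants_add_entry_notify(d)) {
           queue_notify_add_entry(d);  /* resolvable by notification */
           return 0;                   /* CREATE may proceed */
       }
       recall_dir_delegation(d);
       return NFS4ERR_DELAY;           /* retry after return or revocation */
   }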
18924 If the capability FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 is set 18925 (Section 14.4), and a symbolic link is being created, then the 18926 content of the symbolic link MUST be in UTF-8 encoding. 18928 18.4.4. IMPLEMENTATION 18930 If the client desires to set attribute values after the create, a 18931 SETATTR operation can be added to the COMPOUND request so that the 18932 appropriate attributes will be set. 18934 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting Recovery 18936 18.5.1. ARGUMENTS 18938 struct DELEGPURGE4args { 18939 clientid4 clientid; 18940 }; 18942 18.5.2. RESULTS 18944 struct DELEGPURGE4res { 18945 nfsstat4 status; 18946 }; 18948 18.5.3. DESCRIPTION 18950 Purges all of the delegations awaiting recovery for a given client. 18951 This is useful for clients which do not commit delegation information 18952 to stable storage to indicate that conflicting requests need not be 18953 delayed by the server awaiting recovery of delegation information. 18955 The client is NOT specified by the clientid field of the request. 18956 The client SHOULD set the client field to zero and the server MUST 18957 ignore the clientid field. Instead the server MUST derive the client 18958 ID from the value of the session ID in the arguments of the SEQUENCE 18959 operation that precedes DELEGPURGE in the COMPOUND request. 18961 This operation should be used by clients that record delegation 18962 information on stable storage on the client. In this case, 18963 DELEGPURGE should be sent immediately after doing delegation recovery 18964 on all delegations known to the client. Doing so will notify the 18965 server that no additional delegations for the client will be 18966 recovered allowing it to free resources, and avoid delaying other 18967 clients which make requests that conflict with the unrecovered 18968 delegations. The set of delegations known to the server and the 18969 client may be different. The reason for this is that a client may 18970 fail after making a request which resulted in delegation but before 18971 it received the results and committed them to the client's stable 18972 storage. 18974 The server MAY support DELEGPURGE, but if it does not, it MUST NOT 18975 support CLAIM_DELEGATE_PREV. 18977 18.6. Operation 8: DELEGRETURN - Return Delegation 18979 18.6.1. ARGUMENTS 18981 struct DELEGRETURN4args { 18982 /* CURRENT_FH: delegated object */ 18983 stateid4 deleg_stateid; 18984 }; 18986 18.6.2. RESULTS 18988 struct DELEGRETURN4res { 18989 nfsstat4 status; 18990 }; 18992 18.6.3. DESCRIPTION 18994 Returns the delegation represented by the current filehandle and 18995 stateid. 18997 Delegations may be returned when recalled or voluntarily (i.e. before 18998 the server has recalled them). In either case the client must 18999 properly propagate state changed under the context of the delegation 19000 to the server before returning the delegation. 19002 The server MAY require that the principal, security flavor, and if 19003 applicable, the GSS mechanism, combination that acquired the 19004 delegation also be the one to send DELEGRETURN on the file. This 19005 might not be possible if credentials for the principal are no longer 19006 available. The server MAY allow the machine credential or SSV 19007 credential (see Section 18.35) to send DELEGRETURN. 19009 18.7. Operation 9: GETATTR - Get Attributes 19011 18.7.1. ARGUMENTS 19013 struct GETATTR4args { 19014 /* CURRENT_FH: object */ 19015 bitmap4 attr_request; 19016 }; 19018 18.7.2. 
RESULTS 19020 struct GETATTR4resok { 19021 fattr4 obj_attributes; 19022 }; 19024 union GETATTR4res switch (nfsstat4 status) { 19025 case NFS4_OK: 19026 GETATTR4resok resok4; 19027 default: 19028 void; 19029 }; 19031 18.7.3. DESCRIPTION 19033 The GETATTR operation will obtain attributes for the file system 19034 object specified by the current filehandle. The client sets a bit in 19035 the bitmap argument for each attribute value that it would like the 19036 server to return. The server returns an attribute bitmap that 19037 indicates the attribute values which it was able to return, which 19038 will include all attributes requested by the client which are 19039 attributes supported by the server for the target file system. This 19040 bitmap is followed by the attribute values ordered lowest attribute 19041 number first. 19043 The server MUST return a value for each attribute that the client 19044 requests if the attribute is supported by the server for the target 19045 file system. If the server does not support a particular attribute 19046 on the target file system then it MUST NOT return the attribute value 19047 and MUST NOT set the attribute bit in the result bitmap. The server 19048 MUST return an error if it supports an attribute on the target but 19049 cannot obtain its value. In that case, no attribute values will be 19050 returned. 19052 File systems which are absent should be treated as having support for 19053 a very small set of attributes as described in GETATTR Within an 19054 Absent File System (Section 5), even if previously, when the file 19055 system was present, more attributes were supported. 19057 All servers MUST support the REQUIRED attributes as specified in File 19058 Attributes (Section 11.3.1), for all file systems, with the exception 19059 of absent file systems. 19061 On success, the current filehandle retains its value. 19063 18.7.4. IMPLEMENTATION 19065 Suppose there is a write delegation held by another client for the file 19066 in question and size and/or change are among the set of attributes 19067 being interrogated. The server has two choices. First, the server 19068 can obtain the actual current value of these attributes from the 19069 client holding the delegation by using the CB_GETATTR callback. 19070 Second, the server, particularly when the delegated client is 19071 unresponsive, can recall the delegation in question. The GETATTR 19072 MUST NOT proceed until one of the following occurs: 19074 o The requested attribute values are returned in the response to 19075 CB_GETATTR. 19077 o The write delegation is returned. 19079 o The write delegation is revoked. 19081 Unless one of the above happens very quickly, one or more 19082 NFS4ERR_DELAY errors will be returned while the delegation is 19083 outstanding. 19085 18.8. Operation 10: GETFH - Get Current Filehandle 19087 18.8.1. ARGUMENTS 19089 /* CURRENT_FH: */ 19090 void; 19092 18.8.2. RESULTS 19094 struct GETFH4resok { 19095 nfs_fh4 object; 19096 }; 19098 union GETFH4res switch (nfsstat4 status) { 19099 case NFS4_OK: 19100 GETFH4resok resok4; 19101 default: 19102 void; 19103 }; 19105 18.8.3. DESCRIPTION 19107 This operation returns the current filehandle value. 19109 On success, the current filehandle retains its value. 19111 As described in Section 2.10.5.4, GETFH is REQUIRED or RECOMMENDED to 19112 immediately follow certain operations, and servers are free to reject 19113 such operations if the client fails to insert GETFH in the request as 19114 REQUIRED or RECOMMENDED.
Section 18.16.4.1 provides additional 19115 justification for why GETFH MUST follow OPEN. 19117 18.8.4. IMPLEMENTATION 19119 Operations that change the current filehandle like LOOKUP or CREATE 19120 do not automatically return the new filehandle as a result. For 19121 instance, if a client needs to lookup a directory entry and obtain 19122 its filehandle then the following request is needed. 19124 PUTFH (directory filehandle) 19126 LOOKUP (entry name) 19128 GETFH 19130 18.9. Operation 11: LINK - Create Link to a File 19132 18.9.1. ARGUMENTS 19134 struct LINK4args { 19135 /* SAVED_FH: source object */ 19136 /* CURRENT_FH: target directory */ 19137 component4 newname; 19138 }; 19140 18.9.2. RESULTS 19142 struct LINK4resok { 19143 change_info4 cinfo; 19144 }; 19146 union LINK4res switch (nfsstat4 status) { 19147 case NFS4_OK: 19148 LINK4resok resok4; 19149 default: 19150 void; 19151 }; 19153 18.9.3. DESCRIPTION 19155 The LINK operation creates an additional newname for the file 19156 represented by the saved filehandle, as set by the SAVEFH operation, 19157 in the directory represented by the current filehandle. The existing 19158 file and the target directory must reside within the same file system 19159 on the server. On success, the current filehandle will continue to 19160 be the target directory. If an object exists in the target directory 19161 with the same name as newname, the server must return NFS4ERR_EXIST. 19163 For the target directory, the server returns change_info4 information 19164 in cinfo. With the atomic field of the change_info4 data type, the 19165 server will indicate if the before and after change attributes were 19166 obtained atomically with respect to the link creation. 19168 If the newname has a length of 0 (zero), or if newname does not obey 19169 the UTF-8 definition, the error NFS4ERR_INVAL will be returned. 19171 18.9.4. IMPLEMENTATION 19173 The server MAY impose restrictions on the LINK operation such that 19174 LINK may not be done when the file is open or when that open is done 19175 by particular protocols, or with particular options or access modes. 19176 When LINK is rejected because of such restrictions, the error 19177 NFS4ERR_FILE_OPEN is returned. 19179 If a server does implement such restrictions and those restrictions 19180 include cases of NFSv4 opens preventing successful execution of a 19181 link, the server needs to recall any delegations which could hide the 19182 existence of opens relevant to that decision. The reason is that 19183 when a client holds a delegation, the server might not have an 19184 accurate account of the opens for that client, since the client may 19185 execute OPENs and CLOSEs locally. The LINK operation must be delayed 19186 only until a definitive result can be obtained. E.g., suppose there 19187 are multiple delegations and one of them establishes an open whose 19188 presence would prevent the link. Given the server's semantics, 19189 NFS4ERR_FILE_OPEN may be returned to the caller as soon as that 19190 delegation is returned without waiting for other delegations to be 19191 returned. Similarly, if such opens are not associated with 19192 delegations, NFS4ERR_FILE_OPEN can be returned immediately with no 19193 delegation recall being done. 
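Since LINK takes its source object from the saved filehandle and its target directory from the current filehandle, a client holding both filehandles might, for example, send the following COMPOUND (illustrative; SEQUENCE arguments and op results are omitted):

      SEQUENCE
      PUTFH (source file filehandle)
      SAVEFH
      PUTFH (target directory filehandle)
      LINK "newname"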
19195 If the current filehandle designates a directory for which another 19196 client holds a directory delegation, then, unless the delegation is 19197 such that the situation can be resolved by sending a notification, 19198 the delegation MUST be recalled, and the operation cannot be 19199 performed successfully until the delegation is returned or revoked. 19200 Except where this happens very quickly, one or more NFS4ERR_DELAY 19201 errors will be returned to requests made while delegation remains 19202 outstanding. 19204 When the current filehandle designates a directory for which one or 19205 more directory delegations exist, then, when those delegations 19206 request such notifications, instead of a recall, NOTIFY4_ADD_ENTRY 19207 will be generated as a result of the LINK operation. 19209 If the current file system supports the numlinks attribute, and other 19210 clients have delegations to the file being linked, then those 19211 delegations MUST be recalled and the LINK operation MUST NOT proceed 19212 until all delegations are returned or revoked. Except where this 19213 happens very quickly, one or more NFS4ERR_DELAY errors will be 19214 returned to requests made while delegation remains outstanding. 19216 Changes to any property of the "hard" linked files are reflected in 19217 all of the linked files. When a link is made to a file, the 19218 attributes for the file should have a value for numlinks that is one 19219 greater than the value before the LINK operation. 19221 The statement "file and the target directory must reside within the 19222 same file system on the server" means that the fsid fields in the 19223 attributes for the objects are the same. If they reside on different 19224 file systems, the error NFS4ERR_XDEV is returned. This error may be 19225 returned by some servers when there is an internal partitioning of a 19226 file system which the LINK operation would violate. 19228 On some servers, "." and ".." are illegal values for newname and the 19229 error NFS4ERR_BADNAME will be returned if they are specified. 19231 When the current filehandle designates a named attribute directory 19232 and the object to be linked (the saved filehandle) is not a named 19233 attribute for the same object, the error NFS4ERR_XDEV MUST be 19234 returned. When the saved filehandle designates a named attribute and 19235 the current filehandle is not the appropriate named attribute 19236 directory, the error NFS4ERR_XDEV MUST also be returned. 19238 When the current filehandle designates a named attribute directory 19239 and the object to be linked (the saved filehandle) is a named 19240 attribute within that directory, the server may return the error 19241 NFS4ERR_NOTSUPP. 19243 In the case that newname is already linked to the file represented by 19244 the saved filehandle, the server will return NFS4ERR_EXIST. 19246 Note that symbolic links are created with the CREATE operation. 19248 18.10. Operation 12: LOCK - Create Lock 19250 18.10.1. ARGUMENTS 19252 /* 19253 * For LOCK, transition from open_stateid and lock_owner 19254 * to a lock stateid. 19255 */ 19256 struct open_to_lock_owner4 { 19257 seqid4 open_seqid; 19258 stateid4 open_stateid; 19259 seqid4 lock_seqid; 19260 lock_owner4 lock_owner; 19261 }; 19263 /* 19264 * For LOCK, existing lock stateid continues to request new 19265 * file lock for the same lock_owner and open_stateid.
19266 */ 19267 struct exist_lock_owner4 { 19268 stateid4 lock_stateid; 19269 seqid4 lock_seqid; 19270 }; 19272 union locker4 switch (bool new_lock_owner) { 19273 case TRUE: 19274 open_to_lock_owner4 open_owner; 19275 case FALSE: 19276 exist_lock_owner4 lock_owner; 19277 }; 19279 /* 19280 * LOCK/LOCKT/LOCKU: Record lock management 19281 */ 19282 struct LOCK4args { 19283 /* CURRENT_FH: file */ 19284 nfs_lock_type4 locktype; 19285 bool reclaim; 19286 offset4 offset; 19287 length4 length; 19288 locker4 locker; 19289 }; 19291 18.10.2. RESULTS 19293 struct LOCK4denied { 19294 offset4 offset; 19295 length4 length; 19296 nfs_lock_type4 locktype; 19297 lock_owner4 owner; 19298 }; 19300 struct LOCK4resok { 19301 stateid4 lock_stateid; 19302 }; 19304 union LOCK4res switch (nfsstat4 status) { 19305 case NFS4_OK: 19306 LOCK4resok resok4; 19307 case NFS4ERR_DENIED: 19308 LOCK4denied denied; 19309 default: 19310 void; 19311 }; 19313 18.10.3. DESCRIPTION 19315 The LOCK operation requests a byte-range lock for the byte range 19316 specified by the offset and length parameters, and lock type 19317 specified in the locktype parameter. If this is a reclaim request, 19318 the reclaim parameter will be TRUE. 19320 Bytes in a file may be locked even if those bytes are not currently 19321 allocated to the file. To lock the file from a specific offset 19322 through the end-of-file (no matter how long the file actually is) use 19323 a length field equal to NFS4_UINT64_MAX. The server MUST return 19324 NFS4ERR_INVAL under the following combinations of length and offset: 19326 o Length is equal to zero. 19328 o Length is not equal to NFS4_UINT64_MAX, and the sum of length and 19329 offset exceeds NFS4_UINT64_MAX. 19331 32-bit servers are servers that support locking for byte offsets that 19332 fit within 32 bits (i.e. less than or equal to NFS4_UINT32_MAX). If 19333 the client specifies a range that overlaps one or more bytes beyond 19334 offset NFS4_UINT32_MAX, but does not end at offset NFS4_UINT64_MAX, 19335 then such a 32-bit server MUST return the error NFS4ERR_BAD_RANGE. 19337 If the server returns NFS4ERR_DENIED, owner, offset, and length of a 19338 conflicting lock are returned. 19340 The locker argument specifies the lock-owner that is associated with 19341 the LOCK request. The locker4 structure is a switched union that 19342 indicates whether the client has already created byte-range locking 19343 state associated with the current open file and lock-owner. In the 19344 case in which it has, the argument is just a stateid representing the 19345 set of locks associated with that open file and lock-owner, together 19346 with a lock_seqid value which MAY be any value and MUST be ignored by 19347 the server. In the case where no byte-range locking state has been 19348 established, or the client does not have the stateid available, the 19349 argument contains the stateid of the open file with which this lock 19350 is to be associated, together with the lock-owner with which the lock 19351 is to be associated. The open_to_lock_owner case covers the very 19352 first lock done by a lock-owner for a given open file and offers a 19353 method to use the established state of the open_stateid to transition 19354 to the use of a lock stateid. 19356 The following fields of the locker parameter MAY be set to any value 19357 by the client and MUST be ignored by the server: 19359 o The clientid field of the lock_owner field of the open_owner field 19360 (locker.open_owner.lock_owner.clientid). 
The reason the server 19361 MUST ignore the clientid field is that the server MUST derive the 19362 client ID from the session ID from the SEQUENCE operation of the 19363 COMPOUND request. 19365 o The open_seqid and lock_seqid fields of the open_owner field 19366 (locker.open_owner.open_seqid and locker.open_owner.lock_seqid). 19368 o The lock_seqid field of the lock_owner field 19369 (locker.lock_owner.lock_seqid). 19371 Note that the client ID appearing in a LOCK4denied structure is the 19372 actual client associated with the conflicting lock, whether this is 19373 the client ID associated with the current session, or a different 19374 one. Thus if the server returns NFS4ERR_DENIED, it MUST set clientid 19375 of the owner field of the denied field. 19377 If the current filehandle is not an ordinary file, an error will be 19378 returned to the client. In the case that the current filehandle 19379 represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. if 19380 the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is 19381 returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. 19383 On success, the current filehandle retains its value. 19385 18.10.4. IMPLEMENTATION 19387 If the server is unable to determine the exact offset and length of 19388 the conflicting lock, the same offset and length that were provided 19389 in the arguments should be returned in the denied results 19391 LOCK operations are subject to permission checks and to checks 19392 against the access type of the associated file. However, the 19393 specific right and modes required for various type of locks, reflect 19394 the semantics of the server-exported file system, and are not 19395 specified by the protocol. For example, Windows 2000 allows a write 19396 lock of a file open for READ, while a POSIX-compliant system does 19397 not. 19399 When the client makes a lock request that corresponds to a range that 19400 the lock-owner has locked already (with the same or different lock 19401 type), or to a sub-region of such a range, or to a region which 19402 includes multiple locks already granted to that lock-owner, in whole 19403 or in part, and the server does not support such locking operations 19404 (i.e. does not support POSIX locking semantics), the server will 19405 return the error NFS4ERR_LOCK_RANGE. In that case, the client may 19406 return an error, or it may emulate the required operations, using 19407 only LOCK for ranges that do not include any bytes already locked by 19408 that lock-owner and LOCKU of locks held by that lock-owner 19409 (specifying an exactly-matching range and type). Similarly, when the 19410 client makes a lock request that amounts to upgrading (changing from 19411 a read lock to a write lock) or downgrading (changing from write lock 19412 to a read lock) an existing byte-range lock, and the server does not 19413 support such a lock, the server will return NFS4ERR_LOCK_NOTSUPP. 19414 Such operations may not perfectly reflect the required semantics in 19415 the face of conflicting lock requests from other clients. 19417 When a client holds a write delegation, the client holding that 19418 delegation is assured that there are no opens by other clients. 19419 Thus, there can be no conflicting LOCK requests from such clients. 19420 Therefore, the client may be handling locking requests locally, 19421 without doing LOCK operations on the server. 
If it does that, it 19422 must be prepared to update the lock status on the server, by doing 19423 appropriate LOCK and LOCKU requests before returning the delegation. 19425 When one or more clients hold read delegations, any LOCK request 19426 where the server is implementing mandatory locking semantics, MUST 19427 result in the recall of all such delegations. The LOCK request may 19428 not be granted until all such delegations are return or revoked. 19429 Except where this happens very quickly, one or more NFS4ERR_DELAY 19430 errors will be returned to requests made while the delegation remains 19431 outstanding. 19433 18.11. Operation 13: LOCKT - Test For Lock 19435 18.11.1. ARGUMENTS 19437 struct LOCKT4args { 19438 /* CURRENT_FH: file */ 19439 nfs_lock_type4 locktype; 19440 offset4 offset; 19441 length4 length; 19442 lock_owner4 owner; 19443 }; 19445 18.11.2. RESULTS 19447 union LOCKT4res switch (nfsstat4 status) { 19448 case NFS4ERR_DENIED: 19449 LOCK4denied denied; 19450 case NFS4_OK: 19451 void; 19452 default: 19453 void; 19454 }; 19456 18.11.3. DESCRIPTION 19458 The LOCKT operation tests the lock as specified in the arguments. If 19459 a conflicting lock exists, the owner, offset, length, and type of the 19460 conflicting lock are returned. The owner field in the results 19461 includes the client ID of the owner of conflicting lock, whether this 19462 is the client ID associated with the current session or a different 19463 client ID. If no lock is held, nothing other than NFS4_OK is 19464 returned. Lock types READ_LT and READW_LT are processed in the same 19465 way in that a conflicting lock test is done without regard to 19466 blocking or non-blocking. The same is true for WRITE_LT and 19467 WRITEW_LT. 19469 The ranges are specified as for LOCK. The NFS4ERR_INVAL and 19470 NFS4ERR_BAD_RANGE errors are returned under the same circumstances as 19471 for LOCK. 19473 The clientid field of the owner MAY be set to any value by the client 19474 and MUST be ignored by the server. The reason the server MUST ignore 19475 the clientid field is that the server MUST derive the client ID from 19476 the session ID from the SEQUENCE operation of the COMPOUND request. 19478 If the current filehandle is not an ordinary file, an error will be 19479 returned to the client. In the case that the current filehandle 19480 represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. if 19481 the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is 19482 returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. 19484 On success, the current filehandle retains its value. 19486 18.11.4. IMPLEMENTATION 19488 If the server is unable to determine the exact offset and length of 19489 the conflicting lock, the same offset and length that were provided 19490 in the arguments should be returned in the denied results. 19492 LOCKT uses a lock_owner4 rather a stateid4, as is used in LOCK to 19493 identify the owner. This is because the client does not have to open 19494 the file to test for the existence of a lock, so a stateid might not 19495 be available. 19497 As noted in Section 18.10.4, some servers may return 19498 NFS4ERR_LOCK_RANGE to certain (otherwise non-conflicting) lock 19499 requests that overlap ranges already granted to the current lock- 19500 owner. 19502 The LOCKT operation's test for conflicting locks SHOULD exclude locks 19503 for the current lock-owner, and thus should return NFS4_OK in such 19504 cases. 
Note that this means that a server might return NFS4_OK to a 19505 LOCKT request even though a LOCK request for the same range and lock 19506 owner would fail with NFS4ERR_LOCK_RANGE. 19508 When a client holds a write delegation, it may choose (see 19509 Section 18.10.4) to handle LOCK requests locally. In such a case, 19510 LOCKT requests will similarly be handled locally. 19512 18.12. Operation 14: LOCKU - Unlock File 19514 18.12.1. ARGUMENTS 19516 struct LOCKU4args { 19517 /* CURRENT_FH: file */ 19518 nfs_lock_type4 locktype; 19519 seqid4 seqid; 19520 stateid4 lock_stateid; 19521 offset4 offset; 19522 length4 length; 19523 }; 19525 18.12.2. RESULTS 19527 union LOCKU4res switch (nfsstat4 status) { 19528 case NFS4_OK: 19529 stateid4 lock_stateid; 19530 default: 19531 void; 19532 }; 19534 18.12.3. DESCRIPTION 19536 The LOCKU operation unlocks the byte-range lock specified by the 19537 parameters. The client may set the locktype field to any value that 19538 is legal for the nfs_lock_type4 enumerated type, and the server MUST 19539 accept any legal value for locktype. Any legal value for locktype 19540 has no effect on the success or failure of the LOCKU operation. 19542 The ranges are specified as for LOCK. The NFS4ERR_INVAL and 19543 NFS4ERR_BAD_RANGE errors are returned under the same circumstances as 19544 for LOCK. 19546 The seqid parameter MAY be any value and the server MUST ignore it. 19548 If the current filehandle is not an ordinary file, an error will be 19549 returned to the client. In the case that the current filehandle 19550 represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. if 19551 the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is 19552 returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. 19554 On success, the current filehandle retains its value. 19556 The server MAY require that the principal, security flavor, and 19557 applicable, the GSS mechanism, combination that sent a LOCK request 19558 also be the one to send LOCKU on the file. This might not be 19559 possible if credentials for the principal are no longer available. 19560 The server MAY allow the machine credential or SSV credential (see 19561 Section 18.35) to send LOCKU. 19563 18.12.4. IMPLEMENTATION 19565 If the area to be unlocked does not correspond exactly to a lock 19566 actually held by the lock-owner the server may return the error 19567 NFS4ERR_LOCK_RANGE. This includes the case in which the area is not 19568 locked, where the area is a sub-range of the area locked, where it 19569 overlaps the area locked without matching exactly or the area 19570 specified includes multiple locks held by the lock-owner. In all of 19571 these cases, allowed by POSIX locking semantics, a client receiving 19572 this error, should if it desires support for such operations, 19573 simulate the operation using LOCKU on ranges corresponding to locks 19574 it actually holds, possibly followed by LOCK requests for the sub- 19575 ranges not being unlocked. 19577 When a client holds a write delegation, it may choose (See 19578 Section 18.10.4) to handle LOCK requests locally. In such a case, 19579 LOCKU requests will similarly be handled locally. 19581 18.13. Operation 15: LOOKUP - Lookup Filename 19583 18.13.1. ARGUMENTS 19585 struct LOOKUP4args { 19586 /* CURRENT_FH: directory */ 19587 component4 objname; 19588 }; 19590 18.13.2. RESULTS 19592 struct LOOKUP4res { 19593 /* New CURRENT_FH: object */ 19594 nfsstat4 status; 19595 }; 19597 18.13.3. 
DESCRIPTION 19599 This operation LOOKUPs or finds a file system object using the 19600 directory specified by the current filehandle. LOOKUP evaluates the 19601 component and if the object exists the current filehandle is replaced 19602 with the component's filehandle. 19604 If the component cannot be evaluated either because it does not exist 19605 or because the client does not have permission to evaluate the 19606 component, then an error will be returned and the current filehandle 19607 will be unchanged. 19609 If the component is a zero length string or if any component does not 19610 obey the UTF-8 definition, the error NFS4ERR_INVAL will be returned. 19612 18.13.4. IMPLEMENTATION 19614 If the client wants to achieve the effect of a multi-component 19615 lookup, it may construct a COMPOUND request such as (and obtain each 19616 filehandle): 19618 PUTFH (directory filehandle) 19619 LOOKUP "pub" 19620 GETFH 19621 LOOKUP "foo" 19622 GETFH 19623 LOOKUP "bar" 19624 GETFH 19626 Unlike NFSv3, NFSv4.1 allows LOOKUP requests to cross mountpoints on 19627 the server. The client can detect a mountpoint crossing by comparing 19628 the fsid attribute of the directory with the fsid attribute of the 19629 directory looked up. If the fsids are different then the new 19630 directory is a server mountpoint. UNIX clients that detect a 19631 mountpoint crossing will need to mount the server's file system. 19632 This needs to be done to maintain the file object identity checking 19633 mechanisms common to UNIX clients. 19635 Servers that limit NFS access to "shares" or "exported" file systems 19636 should provide a pseudo file system into which the exported file 19637 systems can be integrated, so that clients can browse the server's 19638 name space. The clients view of a pseudo file system will be limited 19639 to paths that lead to exported file systems. 19641 Note: previous versions of the protocol assigned special semantics to 19642 the names "." and "..". NFSv4.1 assigns no special semantics to 19643 these names. The LOOKUPP operator must be used to lookup a parent 19644 directory. 19646 Note that this operation does not follow symbolic links. The client 19647 is responsible for all parsing of filenames including filenames that 19648 are modified by symbolic links encountered during the lookup process. 19650 If the current filehandle supplied is not a directory but a symbolic 19651 link, the error NFS4ERR_SYMLINK is returned as the error. For all 19652 other non-directory file types, the error NFS4ERR_NOTDIR is returned. 19654 18.14. Operation 16: LOOKUPP - Lookup Parent Directory 19656 18.14.1. ARGUMENTS 19658 /* CURRENT_FH: object */ 19659 void; 19661 18.14.2. RESULTS 19663 struct LOOKUPP4res { 19664 /* new CURRENT_FH: parent directory */ 19665 nfsstat4 status; 19666 }; 19668 18.14.3. DESCRIPTION 19670 The current filehandle is assumed to refer to a regular directory or 19671 a named attribute directory. LOOKUPP assigns the filehandle for its 19672 parent directory to be the current filehandle. If there is no parent 19673 directory an NFS4ERR_NOENT error must be returned. Therefore, 19674 NFS4ERR_NOENT will be returned by the server when the current 19675 filehandle is at the root or top of the server's file tree. 19677 As is the case with LOOKUP, LOOKUPP will also cross mountpoints. 19679 If the current filehandle is not a directory or named attribute 19680 directory, the error NFS4ERR_NOTDIR is returned. 
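Since LOOKUPP, like LOOKUP, can cross mountpoints, a client can apply the fsid comparison described in Section 18.13.4 to the result. A minimal, non-normative C sketch follows; the fsid4 layout mirrors the XDR and the helper name is illustrative.

      #include <stdbool.h>
      #include <stdint.h>

      /* Mirrors the fsid attribute (fattr4_fsid). */
      typedef struct { uint64_t major; uint64_t minor; } fsid4;

      /* TRUE if the lookup moved from one server file system to
       * another, i.e. a server mountpoint was crossed. */
      static bool crossed_mountpoint(const fsid4 *starting_dir,
                                     const fsid4 *looked_up)
      {
          return starting_dir->major != looked_up->major ||
                 starting_dir->minor != looked_up->minor;
      }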
19682 If the requester's security flavor does not match that configured for 19683 the parent directory, then the server SHOULD return NFS4ERR_WRONGSEC 19684 (a future minor revision of NFSv4 may upgrade this to MUST) in the 19685 LOOKUPP response. However, if the server does so, it MUST support 19686 the SECINFO_NO_NAME operation (Section 18.45), so that the client can 19687 gracefully determine the correct security flavor. 19689 If the current filehandle is a named attribute directory that is 19690 associated with a file system object via OPENATTR (i.e. not a sub- 19691 directory of a named attribute directory) LOOKUPP SHOULD return the 19692 filehandle of the associated file system object. 19694 18.14.4. IMPLEMENTATION 19696 An issue to note is upward navigation from named attribute 19697 directories. The named attribute directories are essentially 19698 detached from the namespace and this property should be safely 19699 represented in the client operating environment. LOOKUPP on a named 19700 attribute directory may return the filehandle of the associated file 19701 and conveying this to applications might be unsafe as many 19702 applications expect the parent of an object to always be a directory. 19703 Therefore the client may want to hide the parent of named attribute 19704 directories (represented as ".." in UNIX) or represent the named 19705 attribute directory as its own parent (as typically done for the file 19706 system root directory in UNIX). 19708 18.15. Operation 17: NVERIFY - Verify Difference in Attributes 19710 18.15.1. ARGUMENTS 19712 struct NVERIFY4args { 19713 /* CURRENT_FH: object */ 19714 fattr4 obj_attributes; 19715 }; 19717 18.15.2. RESULTS 19719 struct NVERIFY4res { 19720 nfsstat4 status; 19721 }; 19723 18.15.3. DESCRIPTION 19725 This operation is used to prefix a sequence of operations to be 19726 performed if one or more attributes have changed on some file system 19727 object. If all the attributes match then the error NFS4ERR_SAME MUST 19728 be returned. 19730 On success, the current filehandle retains its value. 19732 18.15.4. IMPLEMENTATION 19734 This operation is useful as a cache validation operator. If the 19735 object to which the attributes belong has changed then the following 19736 operations may obtain new data associated with that object. For 19737 instance, to check if a file has been changed and obtain new data if 19738 it has: 19740 SEQUENCE 19741 PUTFH fh 19742 NVERIFY attrbits attrs 19743 READ 0 32767 19745 Contrast this with NFSv3, which would first send a GETATTR in one 19746 request/reply round trip, and then if attributes indicated that the 19747 client's cache was stale, then send a READ in another request/reply 19748 round trip. 19750 In the case that a RECOMMENDED attribute is specified in the NVERIFY 19751 operation and the server does not support that attribute for the file 19752 system object, the error NFS4ERR_ATTRNOTSUPP is returned to the 19753 client. 19755 When the attribute rdattr_error or any set-only attribute (e.g. 19756 time_modify_set) is specified, the error NFS4ERR_INVAL is returned to 19757 the client. 19759 18.16. Operation 18: OPEN - Open a Regular File 19761 18.16.1. ARGUMENTS 19763 /* 19764 * Various definitions for OPEN 19765 */ 19766 enum createmode4 { 19767 UNCHECKED4 = 0, 19768 GUARDED4 = 1, 19769 /* Deprecated in NFSv4.1. */ 19770 EXCLUSIVE4 = 2, 19771 /* 19772 * New to NFSv4.1. If session is persistent, 19773 * GUARDED4 MUST be used. Otherwise, use 19774 * EXCLUSIVE4_1 instead of EXCLUSIVE4. 
19775 */ 19776 EXCLUSIVE4_1 = 3 19777 }; 19779 struct creatverfattr { 19780 verifier4 cva_verf; 19781 fattr4 cva_attrs; 19782 }; 19784 union createhow4 switch (createmode4 mode) { 19785 case UNCHECKED4: 19786 case GUARDED4: 19787 fattr4 createattrs; 19788 case EXCLUSIVE4: 19789 verifier4 createverf; 19790 case EXCLUSIVE4_1: 19791 creatverfattr ch_createboth; 19792 }; 19794 enum opentype4 { 19795 OPEN4_NOCREATE = 0, 19796 OPEN4_CREATE = 1 19797 }; 19799 union openflag4 switch (opentype4 opentype) { 19800 case OPEN4_CREATE: 19801 createhow4 how; 19802 default: 19804 void; 19805 }; 19807 /* Next definitions used for OPEN delegation */ 19808 enum limit_by4 { 19809 NFS_LIMIT_SIZE = 1, 19810 NFS_LIMIT_BLOCKS = 2 19811 /* others as needed */ 19812 }; 19814 struct nfs_modified_limit4 { 19815 uint32_t num_blocks; 19816 uint32_t bytes_per_block; 19817 }; 19819 union nfs_space_limit4 switch (limit_by4 limitby) { 19820 /* limit specified as file size */ 19821 case NFS_LIMIT_SIZE: 19822 uint64_t filesize; 19823 /* limit specified by number of blocks */ 19824 case NFS_LIMIT_BLOCKS: 19825 nfs_modified_limit4 mod_blocks; 19826 } ; 19828 /* 19829 * Share Access and Deny constants for open argument 19830 */ 19831 const OPEN4_SHARE_ACCESS_READ = 0x00000001; 19832 const OPEN4_SHARE_ACCESS_WRITE = 0x00000002; 19833 const OPEN4_SHARE_ACCESS_BOTH = 0x00000003; 19835 const OPEN4_SHARE_DENY_NONE = 0x00000000; 19836 const OPEN4_SHARE_DENY_READ = 0x00000001; 19837 const OPEN4_SHARE_DENY_WRITE = 0x00000002; 19838 const OPEN4_SHARE_DENY_BOTH = 0x00000003; 19840 /* new flags for share_access field of OPEN4args */ 19841 const OPEN4_SHARE_ACCESS_WANT_DELEG_MASK = 0xFF00; 19842 const OPEN4_SHARE_ACCESS_WANT_NO_PREFERENCE = 0x0000; 19843 const OPEN4_SHARE_ACCESS_WANT_READ_DELEG = 0x0100; 19844 const OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG = 0x0200; 19845 const OPEN4_SHARE_ACCESS_WANT_ANY_DELEG = 0x0300; 19846 const OPEN4_SHARE_ACCESS_WANT_NO_DELEG = 0x0400; 19847 const OPEN4_SHARE_ACCESS_WANT_CANCEL = 0x0500; 19849 const 19850 OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL 19851 = 0x10000; 19853 const 19854 OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED 19855 = 0x20000; 19857 enum open_delegation_type4 { 19858 OPEN_DELEGATE_NONE = 0, 19859 OPEN_DELEGATE_READ = 1, 19860 OPEN_DELEGATE_WRITE = 2, 19861 OPEN_DELEGATE_NONE_EXT = 3 /* new to v4.1 */ 19862 }; 19864 enum open_claim_type4 { 19865 /* 19866 * Not a reclaim. 19867 */ 19868 CLAIM_NULL = 0, 19870 CLAIM_PREVIOUS = 1, 19871 CLAIM_DELEGATE_CUR = 2, 19872 CLAIM_DELEGATE_PREV = 3, 19874 /* 19875 * Not a reclaim. 19876 * 19877 * Like CLAIM_NULL, but object identified 19878 * by the current filehandle. 19879 */ 19880 CLAIM_FH = 4, /* new to v4.1 */ 19882 /* 19883 * Like CLAIM_DELEGATE_CUR, but object identified 19884 * by current filehandle. 19885 */ 19886 CLAIM_DELEG_CUR_FH = 5, /* new to v4.1 */ 19888 /* 19889 * Like CLAIM_DELEGATE_PREV, but object identified 19890 * by current filehandle. 19891 */ 19892 CLAIM_DELEG_PREV_FH = 6 /* new to v4.1 */ 19893 }; 19895 struct open_claim_delegate_cur4 { 19896 stateid4 delegate_stateid; 19897 component4 file; 19898 }; 19899 union open_claim4 switch (open_claim_type4 claim) { 19900 /* 19901 * No special rights to file. 19902 * Ordinary OPEN of the specified file. 19903 */ 19904 case CLAIM_NULL: 19905 /* CURRENT_FH: directory */ 19906 component4 file; 19907 /* 19908 * Right to the file established by an 19909 * open previous to server reboot. File 19910 * identified by filehandle obtained at 19911 * that time rather than by name. 
19912 */ 19913 case CLAIM_PREVIOUS: 19914 /* CURRENT_FH: file being reclaimed */ 19915 open_delegation_type4 delegate_type; 19917 /* 19918 * Right to file based on a delegation 19919 * granted by the server. File is 19920 * specified by name. 19921 */ 19922 case CLAIM_DELEGATE_CUR: 19923 /* CURRENT_FH: directory */ 19924 open_claim_delegate_cur4 delegate_cur_info; 19926 /* 19927 * Right to file based on a delegation 19928 * granted to a previous boot instance 19929 * of the client. File is specified by name. 19930 */ 19931 case CLAIM_DELEGATE_PREV: 19932 /* CURRENT_FH: directory */ 19933 component4 file_delegate_prev; 19935 /* 19936 * Like CLAIM_NULL. No special rights 19937 * to file. Ordinary OPEN of the 19938 * specified file by current filehandle. 19939 */ 19940 case CLAIM_FH: /* new to v4.1 */ 19941 /* CURRENT_FH: regular file to open */ 19942 void; 19944 /* 19945 * Like CLAIM_DELEGATE_PREV. Right to file based on a 19946 * delegation granted to a previous boot 19947 * instance of the client. File is identified by 19948 * by filehandle. 19949 */ 19950 case CLAIM_DELEG_PREV_FH: /* new to v4.1 */ 19951 /* CURRENT_FH: file being opened */ 19952 void; 19954 /* 19955 * Like CLAIM_DELEGATE_CUR. Right to file based on 19956 * a delegation granted by the server. 19957 * File is identified by filehandle. 19958 */ 19959 case CLAIM_DELEG_CUR_FH: /* new to v4.1 */ 19960 /* CURRENT_FH: file being opened */ 19961 stateid4 oc_delegate_stateid; 19963 }; 19965 /* 19966 * OPEN: Open a file, potentially receiving an open delegation 19967 */ 19968 struct OPEN4args { 19969 seqid4 seqid; 19970 uint32_t share_access; 19971 uint32_t share_deny; 19972 open_owner4 owner; 19973 openflag4 openhow; 19974 open_claim4 claim; 19975 }; 19977 18.16.2. RESULTS 19979 struct open_read_delegation4 { 19980 stateid4 stateid; /* Stateid for delegation*/ 19981 bool recall; /* Pre-recalled flag for 19982 delegations obtained 19983 by reclaim (CLAIM_PREVIOUS) */ 19985 nfsace4 permissions; /* Defines users who don't 19986 need an ACCESS call to 19987 open for read */ 19988 }; 19990 struct open_write_delegation4 { 19991 stateid4 stateid; /* Stateid for delegation */ 19992 bool recall; /* Pre-recalled flag for 19993 delegations obtained 19994 by reclaim 19995 (CLAIM_PREVIOUS) */ 19997 nfs_space_limit4 19998 space_limit; /* Defines condition that 19999 the client must check to 20000 determine whether the 20001 file needs to be flushed 20002 to the server on close. */ 20004 nfsace4 permissions; /* Defines users who don't 20005 need an ACCESS call as 20006 part of a delegated 20007 open. 
*/ 20008 }; 20010 enum why_no_delegation4 { /* new to v4.1 */ 20011 WND4_NOT_WANTED = 0, 20012 WND4_CONTENTION = 1, 20013 WND4_RESOURCE = 2, 20014 WND4_NOT_SUPP_FTYPE = 3, 20015 WND4_WRITE_DELEG_NOT_SUPP_FTYPE = 4, 20016 WND4_NOT_SUPP_UPGRADE = 5, 20017 WND4_NOT_SUPP_DOWNGRADE = 6, 20018 WND4_CANCELED = 7, 20019 WND4_IS_DIR = 8 20020 }; 20022 union open_none_delegation4 /* new to v4.1 */ 20023 switch (why_no_delegation4 ond_why) { 20024 case WND4_CONTENTION: 20025 bool ond_server_will_push_deleg; 20026 case WND4_RESOURCE: 20027 bool ond_server_will_signal_avail; 20028 default: 20029 void; 20030 }; 20032 union open_delegation4 20033 switch (open_delegation_type4 delegation_type) { 20034 case OPEN_DELEGATE_NONE: 20035 void; 20036 case OPEN_DELEGATE_READ: 20037 open_read_delegation4 read; 20038 case OPEN_DELEGATE_WRITE: 20039 open_write_delegation4 write; 20040 case OPEN_DELEGATE_NONE_EXT: /* new to v4.1 */ 20041 open_none_delegation4 od_whynone; 20042 }; 20044 /* 20045 * Result flags 20046 */ 20048 /* Client must confirm open */ 20049 const OPEN4_RESULT_CONFIRM = 0x00000002; 20050 /* Type of file locking behavior at the server */ 20051 const OPEN4_RESULT_LOCKTYPE_POSIX = 0x00000004; 20052 /* Server will preserve file if removed while open */ 20053 const OPEN4_RESULT_PRESERVE_UNLINKED = 0x00000008; 20055 /* 20056 * Server may use CB_NOTIFY_LOCK on locks 20057 * derived from this open 20058 */ 20059 const OPEN4_RESULT_MAY_NOTIFY_LOCK = 0x00000020; 20061 struct OPEN4resok { 20062 stateid4 stateid; /* Stateid for open */ 20063 change_info4 cinfo; /* Directory Change Info */ 20064 uint32_t rflags; /* Result flags */ 20065 bitmap4 attrset; /* attribute set for create*/ 20066 open_delegation4 delegation; /* Info on any open 20067 delegation */ 20068 }; 20070 union OPEN4res switch (nfsstat4 status) { 20071 case NFS4_OK: 20072 /* New CURRENT_FH: opened file */ 20073 OPEN4resok resok4; 20074 default: 20075 void; 20076 }; 20078 18.16.3. DESCRIPTION 20080 The OPEN operation opens a regular file in a directory with the 20081 provided name or filehandle. OPEN can also create a file if a name 20082 is provided, and the client specifies it wants to create a file. 20083 Specification whether a file is be created or not, and the method of 20084 creation is via the openhow parameter. The openhow parameter 20085 consists of a switched union (data type opengflag4), which switches 20086 on the value of opentype (OPEN4_NOCREATE or OPEN4_CREATE). If 20087 OPEN4_CREATE is specified, this leads to another switched union (data 20088 type createhow4) that supports four cases of creation methods: 20089 UNCHECKED4, GUARDED4, EXCLUSIVE4, or EXCLUSIVE4_1. If opentype is 20090 OPEN4_CREATE, then the claim field of the claim field (sic) MUST be 20091 one of CLAIM_NULL, CLAIM_DELEGATE_CUR, or CLAIM_DELEGATE_PREV, 20092 because these claim methods include a component of a file name. 20094 Upon success (which might entail creation of a new file), the current 20095 filehandle is replaced by that of the created or existing object. 20097 If the current filehandle is a named attribute directory, OPEN will 20098 then create or open a named attribute file. Note that exclusive 20099 create of a named attribute is not supported. If the createmode is 20100 EXCLUSIVE4 or EXCLUSIVE4_1 and the current filehandle is a named 20101 attribute directory, the server will return EINVAL. 
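As a non-normative illustration of the openhow switched union just described, a client creating a file with GUARDED4 semantics might populate the argument roughly as follows; the C declarations merely mirror the XDR, and fattr4 handling is reduced to a stub.

      #include <string.h>

      typedef struct { int attrs_elided; } fattr4;      /* stub */
      typedef struct { char data[8]; } verifier4;
      typedef struct { verifier4 cva_verf;
                       fattr4    cva_attrs; } creatverfattr;

      typedef enum { UNCHECKED4 = 0, GUARDED4 = 1,
                     EXCLUSIVE4 = 2, EXCLUSIVE4_1 = 3 } createmode4;
      typedef enum { OPEN4_NOCREATE = 0, OPEN4_CREATE = 1 } opentype4;

      typedef struct {
          createmode4 mode;
          union {
              fattr4        createattrs;   /* UNCHECKED4, GUARDED4 */
              verifier4     createverf;    /* EXCLUSIVE4 (deprecated) */
              creatverfattr ch_createboth; /* EXCLUSIVE4_1 */
          } u;
      } createhow4;

      typedef struct {
          opentype4  opentype;
          createhow4 how;                  /* meaningful only for
                                              OPEN4_CREATE */
      } openflag4;

      static openflag4 guarded_create(const fattr4 *initial_attrs)
      {
          openflag4 of;
          memset(&of, 0, sizeof of);
          of.opentype = OPEN4_CREATE;
          of.how.mode = GUARDED4;          /* NFS4ERR_EXIST if the name
                                              already exists */
          of.how.u.createattrs = *initial_attrs;
          return of;
      }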
20103 UNCHECKED4 means that the file should be created if a file of that 20104 name does not exist and encountering an existing regular file of that 20105 name is not an error. For this type of create, createattrs specifies 20106 the initial set of attributes for the file. The set of attributes 20107 may include any writable attribute valid for regular files. When an 20108 UNCHECKED4 create encounters an existing file, the attributes 20109 specified by createattrs are not used, except that when createattrs 20110 specifies the size attribute with a size of zero, the existing file 20111 is truncated. 20113 If GUARDED4 is specified, the server checks for the presence of a 20114 duplicate object by name before performing the create. If a 20115 duplicate exists, NFS4ERR_EXIST is returned. If the object does not 20116 exist, the request is performed as described for UNCHECKED4. 20118 For the UNCHECKED4 and GUARDED4 cases, where the operation is 20119 successful, the server will return to the client an attribute mask 20120 signifying which attributes were successfully set for the object. 20122 EXCLUSIVE4_1 and EXCLUSIVE4 specify that the server is to follow 20123 exclusive creation semantics, using the verifier to ensure exclusive 20124 creation of the target. The server should check for the presence of 20125 a duplicate object by name. If the object does not exist, the server 20126 creates the object and stores the verifier with the object. If the 20127 object does exist and the stored verifier matches the client provided 20128 verifier, the server uses the existing object as the newly created 20129 object. If the stored verifier does not match, then an error of 20130 NFS4ERR_EXIST is returned. 20132 If using EXCLUSIVE4, and if the server uses attributes to store the 20133 exclusive create verifier, the server will signify which attributes 20134 it used by setting the appropriate bits in the attribute mask that is 20135 returned in the results. Unlike UNCHECKED4, GUARDED4, and 20136 EXCLUSIVE4_1, EXCLUSIVE4 does not support the setting of attributes 20137 at file creation, and after a successful OPEN via EXCLUSIVE4, the 20138 client MUST send a SETATTR to set attributes to a known state. 20140 In NFSv4.1, EXCLUSIVE4 has been deprecated in favor of EXCLUSIVE4_1. 20141 Unlike EXCLUSIVE4, attributes may be provided in the EXCLUSIVE4_1 20142 case, but because the server may use attributes of the target object 20143 to store the verifier, the set of allowable attributes may be fewer 20144 than the set of attributes SETATTR allows. The allowable attributes 20145 for EXCLUSIVE4_1 are indicated in the suppattr_exclcreat 20146 (Section 5.8.1.14) attribute. If the client attempts to set in 20147 cva_attrs an attribute that is not in suppattr_exclcreat, the server 20148 MUST return NFS4ERR_INVAL. The response field, attrset indicates 20149 both which attributes the server set from cva_attrs, and which 20150 attributes the server used to store the verifier. As described in 20151 Section 18.16.4, the client can compare cva_attrs.attrmask with 20152 attrset to determine which attributes were used to store the 20153 verifier. 20155 With the addition of persistent sessions and pNFS, under some 20156 conditions EXCLUSIVE4 MUST NOT be used by the client or supported by 20157 the server. 
The following table summarizes the appropriate and 20158 mandated exclusive create methods for implementations of NFSv4.1: 20160 Required methods for exclusive create 20162 +--------------+--------+-----------------+-------------------------+ 20163 | Persistent | pNFS | Server REQUIRED | Client Allowed | 20164 | Reply Cache | server | | | 20165 +--------------+--------+-----------------+-------------------------+ 20166 | no | no | EXCLUSIVE4_1 | EXCLUSIVE4_1 (SHOULD) | 20167 | | | and EXCLUSIVE4 | or EXCLUSIVE4 (SHOULD | 20168 | | | | NOT) | 20169 | no | yes | EXCLUSIVE4_1 | EXCLUSIVE4_1 | 20170 | yes | no | GUARDED4 | GUARDED4 | 20171 | yes | yes | GUARDED4 | GUARDED4 | 20172 +--------------+--------+-----------------+-------------------------+ 20174 Table 18 20176 If CREATE_SESSION4_FLAG_PERSIST is set in the results of 20177 CREATE_SESSION the reply cache is persistent (see Section 18.36). If 20178 the EXCHGID4_FLAG_USE_PNFS_MDS flag is set in the results from 20179 EXCHANGE_ID, the server is a pNFS server (see Section 18.35). If the 20180 client attempts to use EXCLUSIVE4 on a persistent session, or a 20181 session derived from a EXCHGID4_FLAG_USE_PNFS_MDS client ID, the 20182 server MUST return NFS4ERR_INVAL. 20184 With persistent sessions, exclusive create semantics are fully 20185 achievable via GUARDED4, and so EXCLUSIVE4 or EXCLUSIVE4_1 MUST NOT 20186 be used. When pNFS is being used, the layout_hint attribute might 20187 not be supported after the file is created. Only the EXCLUSIVE4_1 20188 and GUARDED methods of exclusive file creation allow the atomic 20189 setting of attributes. 20191 For the target directory, the server returns change_info4 information 20192 in cinfo. With the atomic field of the change_info4 data type, the 20193 server will indicate if the before and after change attributes were 20194 obtained atomically with respect to the link creation. 20196 The OPEN operation provides for Windows share reservation capability 20197 with the use of the share_access and share_deny fields of the OPEN 20198 arguments. The client specifies at OPEN the required share_access 20199 and share_deny modes. For clients that do not directly support 20200 SHAREs (i.e. UNIX), the expected deny value is DENY_NONE. In the 20201 case that there is a existing SHARE reservation that conflicts with 20202 the OPEN request, the server returns the error NFS4ERR_SHARE_DENIED. 20203 For additional discussion of SHARE semantics see Section 9.7. 20205 For each OPEN, the client provides a value for the owner field of the 20206 OPEN argument. The owner field is of data type open_owner4, and 20207 contains a field called clientid and a field called owner. The 20208 client can set the clientid field to any value and the server MUST 20209 ignore it. Instead the server MUST derive the client ID from the 20210 session ID of the SEQUENCE operation of the COMPOUND request. 20212 The seqid field of the request is not used in NFSv4.1, but it MAY be 20213 any value and the server MUST ignore it. 20215 In the case that the client is recovering state from a server 20216 failure, the claim field of the OPEN argument is used to signify that 20217 the request is meant to reclaim state previously held. 20219 The "claim" field of the OPEN argument is used to specify the file to 20220 be opened and the state information which the client claims to 20221 possess. 
There are seven claim types as follows: 20223 +----------------------+--------------------------------------------+ 20224 | open type | description | 20225 +----------------------+--------------------------------------------+ 20226 | CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request | 20227 | | and there is no previous state associate | 20228 | | with the file for the client. With | 20229 | | CLAIM_NULL the file is identified by the | 20230 | | current filehandle and the specified | 20231 | | component name. With CLAIM_FH (new to | 20232 | | NFSv4.1) the file is identified by just | 20233 | | the current filehandle. | 20234 | CLAIM_PREVIOUS | The client is claiming basic OPEN state | 20235 | | for a file that was held previous to a | 20236 | | server restart. Generally used when a | 20237 | | server is returning persistent | 20238 | | filehandles; the client may not have the | 20239 | | file name to reclaim the OPEN. | 20240 | CLAIM_DELEGATE_CUR, | The client is claiming a delegation for | 20241 | CLAIM_DELEG_CUR_FH | OPEN as granted by the server. Generally | 20242 | | this is done as part of recalling a | 20243 | | delegation. With CLAIM_DELEGATE_CUR, the | 20244 | | file is identified by the current | 20245 | | filehandle and the specified component | 20246 | | name. With CLAIM_DELEG_CUR_FH (new to | 20247 | | NFSv4.1), the file is identified by just | 20248 | | the current filehandle. | 20249 | CLAIM_DELEGATE_PREV, | The client is claiming a delegation | 20250 | CLAIM_DELEG_PREV_FH | granted to a previous client instance; | 20251 | | used after the client restarts. The server | 20252 | | MAY support CLAIM_DELEGATE_PREV or | 20253 | | CLAIM_DELEG_PREV_FH (new to NFSv4.1). If | 20254 | | it does support either open type, | 20255 | | CREATE_SESSION MUST NOT remove the | 20256 | | client's delegation state, and the server | 20257 | | MUST support the DELEGPURGE operation. | 20258 +----------------------+--------------------------------------------+ 20260 For OPEN requests that reach the server during the grace period, the 20261 server returns an error of NFS4ERR_GRACE. The following claim types 20262 are exceptions: 20264 o OPEN requests specifying the claim type CLAIM_PREVIOUS are devoted 20265 to reclaiming opens after a server reboot and are typically only 20266 valid during the grace period. 20268 o OPEN requests specifying the claim types CLAIM_DELEGATE_CUR and 20269 CLAIM_DELEG_CUR_FH are valid both during and after the grace 20270 period. Since the granting of the delegation that they are 20271 subordinate to assures that there is no conflict with locks to be 20272 reclaimed by other clients, the server need not return 20273 NFS4ERR_GRACE when these are received during the grace period. 20275 For any OPEN request, the server may return an open delegation, which 20276 allows further opens and closes to be handled locally on the client 20277 as described in Section 10.4. Note that delegation is up to the 20278 server to decide. The client should never assume that delegation 20279 will or will not be granted in a particular instance. It should 20280 always be prepared for either case. A partial exception is the 20281 reclaim (CLAIM_PREVIOUS) case, in which a delegation type is claimed. 20282 In this case, delegation will always be granted, although the server 20283 may specify an immediate recall in the delegation structure. 20285 The rflags returned by a successful OPEN allow the server to return 20286 information governing how the open file is to be handled. 
20288 o OPEN4_RESULT_CONFIRM is deprecated and MUST NOT be returned by an 20289 NFSv4.1 server. 20291 o OPEN4_RESULT_LOCKTYPE_POSIX indicates the server's file locking 20292 behavior supports the complete set of Posix locking techniques. 20293 From this the client can choose to manage file locking state in a 20294 way to handle a mis-match of file locking management. 20296 o OPEN4_RESULT_PRESERVE_UNLINKED indicates the server will preserve 20297 the open file if the client (or any other client) removes the file 20298 as long as it is open. Furthermore, the server promises to 20299 preserve the file through the grace period after server restart, 20300 thereby giving the client the opportunity to reclaim its open. 20302 o OPEN4_RESULT_MAY_NOTIFY_LOCK indicates that the server may attempt 20303 CB_NOTIFY_LOCK callbacks for locks on this file. This flag is a 20304 hint only, and may be safely ignored by the client. 20306 If the component is of zero length, NFS4ERR_INVAL will be returned. 20307 The component is also subject to the normal UTF-8, character support, 20308 and name checks. See Section 14.5 for further discussion. 20310 When an OPEN is done and the specified open-owner already has the 20311 resulting filehandle open, the result is to "OR" together the new 20312 share and deny status together with the existing status. In this 20313 case, only a single CLOSE need be done, even though multiple OPENs 20314 were completed. When such an OPEN is done, checking of share 20315 reservations for the new OPEN proceeds normally, with no exception 20316 for the existing OPEN held by the same open-owner. In this case, the 20317 stateid returned as an "other" field that matches that of the 20318 previous open while the "seqid" field is incremented to reflect the 20319 change status due to the new open. 20321 If the underlying file system at the server is only accessible in a 20322 read-only mode and the OPEN request has specified ACCESS_WRITE or 20323 ACCESS_BOTH, the server will return NFS4ERR_ROFS to indicate a read- 20324 only file system. 20326 As with the CREATE operation, the server MUST derive the owner, owner 20327 ACE, group, or group ACE if any of the four attributes are required 20328 and supported by the server's file system. For an OPEN with the 20329 EXCLUSIVE4 createmode, the server has no choice, since such OPEN 20330 calls do not include the createattrs field. Conversely, if 20331 createattrs (UNCHECKED4 or GUARDED4) or cva_attrs (EXCLUSIVE4_1) is 20332 specified, and includes an owner, or owner_group, or ACE that the 20333 principal in the RPC call's credentials does not have authorization 20334 to create files for, then the server may return NFS4ERR_PERM. 20336 In the case of an OPEN which specifies a size of zero (e.g. 20337 truncation) and the file has named attributes, the named attributes 20338 are left as is and are not removed. 
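The result flags described earlier in this section form a bit mask. A client might interpret an OPEN reply's rflags along the following lines; the constants are those defined in the RESULTS section above (Section 18.16.2), while the structure and helper are illustrative only.

      #include <stdbool.h>
      #include <stdint.h>

      #define OPEN4_RESULT_CONFIRM            0x00000002
      #define OPEN4_RESULT_LOCKTYPE_POSIX     0x00000004
      #define OPEN4_RESULT_PRESERVE_UNLINKED  0x00000008
      #define OPEN4_RESULT_MAY_NOTIFY_LOCK    0x00000020

      struct open_result_hints {
          bool posix_locking;     /* full POSIX byte-range semantics  */
          bool preserve_unlinked; /* file preserved while held open   */
          bool may_notify_lock;   /* CB_NOTIFY_LOCK callbacks possible */
      };

      static struct open_result_hints parse_rflags(uint32_t rflags)
      {
          struct open_result_hints h;
          /* OPEN4_RESULT_CONFIRM is never set by an NFSv4.1 server. */
          h.posix_locking =
              (rflags & OPEN4_RESULT_LOCKTYPE_POSIX) != 0;
          h.preserve_unlinked =
              (rflags & OPEN4_RESULT_PRESERVE_UNLINKED) != 0;
          h.may_notify_lock =
              (rflags & OPEN4_RESULT_MAY_NOTIFY_LOCK) != 0;
          return h;
      }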
NFSv4.1 gives more precise control to clients over acquisition of delegations via the following new flags for the share_access field of OPEN4args:

   OPEN4_SHARE_ACCESS_WANT_READ_DELEG

   OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG

   OPEN4_SHARE_ACCESS_WANT_ANY_DELEG

   OPEN4_SHARE_ACCESS_WANT_NO_DELEG

   OPEN4_SHARE_ACCESS_WANT_CANCEL

   OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL

   OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED

If (share_access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) is not zero, then the client will have specified one and only one of:

   OPEN4_SHARE_ACCESS_WANT_READ_DELEG

   OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG

   OPEN4_SHARE_ACCESS_WANT_ANY_DELEG

   OPEN4_SHARE_ACCESS_WANT_NO_DELEG

   OPEN4_SHARE_ACCESS_WANT_CANCEL

Otherwise, the client is indicating no desire for a delegation and the server MAY or MAY NOT return a delegation in the OPEN response.

If the server supports the new _WANT_ flags and the client sends one or more of the new flags, then in the event the server does not return a delegation, it MUST return a delegation type of OPEN_DELEGATE_NONE_EXT. The field ond_why in the reply indicates why no delegation was returned and will be one of:

WND4_NOT_WANTED  The client specified OPEN4_SHARE_ACCESS_WANT_NO_DELEG.

WND4_CONTENTION  There is a conflicting delegation or open on the file.

WND4_RESOURCE  Resource limitations prevent the server from granting a delegation.

WND4_NOT_SUPP_FTYPE  The server does not support delegations on this file type.

WND4_WRITE_DELEG_NOT_SUPP_FTYPE  The server does not support write delegations on this file type.

WND4_NOT_SUPP_UPGRADE  The server does not support atomic upgrade of a read delegation to a write delegation.

WND4_NOT_SUPP_DOWNGRADE  The server does not support atomic downgrade of a write delegation to a read delegation.

WND4_CANCELED  The client specified OPEN4_SHARE_ACCESS_WANT_CANCEL and now any "want" for this file object is cancelled.

WND4_IS_DIR  The specified file object is a directory, and the operation is OPEN or WANT_DELEGATION, neither of which supports delegations on directories.

OPEN4_SHARE_ACCESS_WANT_READ_DELEG, OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG, or OPEN4_SHARE_ACCESS_WANT_ANY_DELEG mean, respectively, that the client wants a read, a write, or any delegation, regardless of which of OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_ACCESS_WRITE, or OPEN4_SHARE_ACCESS_BOTH is set. If the client has a read delegation on a file and requests a write delegation, then the client is requesting atomic upgrade of its read delegation to a write delegation. If the client has a write delegation on a file and requests a read delegation, then the client is requesting atomic downgrade to a read delegation. A server MAY support atomic upgrade or downgrade. If it does, then a returned delegation_type of OPEN_DELEGATE_READ or OPEN_DELEGATE_WRITE that is different from the delegation type the client currently has indicates a successful upgrade or downgrade. If it does not support atomic delegation upgrade or downgrade, then ond_why will be WND4_NOT_SUPP_UPGRADE or WND4_NOT_SUPP_DOWNGRADE.

OPEN4_SHARE_ACCESS_WANT_NO_DELEG means the client wants no delegation.
OPEN4_SHARE_ACCESS_WANT_CANCEL means the client wants no delegation and wants to cancel any previously registered "want" for a delegation.

The client may set one or both of OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL and OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED. However, they will have no effect unless one of the following is set:

o  OPEN4_SHARE_ACCESS_WANT_READ_DELEG

o  OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG

o  OPEN4_SHARE_ACCESS_WANT_ANY_DELEG

If the client specifies OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL, then it wishes to register a "want" for a delegation, in the event the OPEN results do not include a delegation. If so and the server denies the delegation due to insufficient resources, the server MAY later inform the client, via the CB_RECALLABLE_OBJ_AVAIL operation, that the resource limitation condition has eased. The server will tell the client that it intends to send a future CB_RECALLABLE_OBJ_AVAIL operation by setting delegation_type in the results to OPEN_DELEGATE_NONE_EXT, ond_why to WND4_RESOURCE, and ond_server_will_signal_avail to TRUE. If ond_server_will_signal_avail is set to TRUE, the server MUST later send a CB_RECALLABLE_OBJ_AVAIL operation.

If the client specifies OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED, then it wishes to register a "want" for a delegation, in the event the OPEN results do not include a delegation. If so and the server denies the delegation due to contention, the server MAY later inform the client, via the CB_PUSH_DELEG operation, that the contention has eased. The server will tell the client that it intends to send a future CB_PUSH_DELEG operation by setting delegation_type in the results to OPEN_DELEGATE_NONE_EXT, ond_why to WND4_CONTENTION, and ond_server_will_push_deleg to TRUE. If ond_server_will_push_deleg is TRUE, the server MUST later send a CB_PUSH_DELEG operation.

If the client has previously registered a want for a delegation on a file, and then sends a request to register a want for a delegation on the same file, the server MUST return a new error: NFS4ERR_DELEG_ALREADY_WANTED. If the client wishes to register a different type of delegation want for the same file, it MUST cancel the existing delegation want first.

18.16.4. IMPLEMENTATION

In the absence of a persistent session, the client invokes exclusive create by setting the how parameter to EXCLUSIVE4 or EXCLUSIVE4_1. In these cases, the client provides a verifier that can reasonably be expected to be unique. A combination of a client identifier, perhaps the client network address, and a unique number generated by the client, perhaps the RPC transaction identifier, may be appropriate.

If the object does not exist, the server creates the object and stores the verifier in stable storage. For file systems that do not provide a mechanism for the storage of arbitrary file attributes, the server may use one or more elements of the object metadata to store the verifier. The verifier MUST be stored in stable storage to prevent erroneous failure on retransmission of the request.
It is assumed that an exclusive create is being performed because exclusive semantics are critical to the application. Because of the expected usage, exclusive CREATE does not rely solely on the server's reply cache for storage of the verifier. A non-persistent reply cache does not survive a crash, and the session and reply cache may be deleted after a network partition that exceeds the lease time, thus opening failure windows.

An NFSv4.1 server SHOULD NOT store the verifier in any of the file's RECOMMENDED or REQUIRED attributes. If it does, the server SHOULD use time_modify_set or time_access_set to store the verifier. The server SHOULD NOT store the verifier in the following attributes: acl (it is desirable for access control to be established at creation), dacl (ditto), mode (ditto), owner (ditto), owner_group (ditto), retentevt_set (it may be desired to establish retention at creation), retention_hold (ditto), retention_set (ditto), sacl (it is desirable for auditing control to be established at creation), size (on some servers, size may have a limited range of values), mode_set_masked (as with mode), and time_creation (a meaningful file creation time should be set when the file is created). Another alternative for the server is to use a named attribute to store the verifier.

Because the EXCLUSIVE4 create method does not specify initial attributes, when processing an EXCLUSIVE4 create, the server

o  SHOULD set the owner of the file to that corresponding to the credential of the request's RPC header.

o  SHOULD NOT leave the file's access control to anyone but the owner of the file.

If the server cannot support exclusive create semantics, possibly because of the requirement to commit the verifier to stable storage, it should fail the OPEN request with the error NFS4ERR_NOTSUPP.

During an exclusive CREATE request, if the object already exists, the server reconstructs the object's verifier and compares it with the verifier in the request. If they match, the server treats the request as a success. The request is presumed to be a duplicate of an earlier, successful request for which the reply was lost and that the server's duplicate request cache mechanism did not detect. If the verifiers do not match, the request is rejected with the status NFS4ERR_EXIST.

After the client has performed a successful exclusive create, the attrset response indicates which attributes were used to store the verifier. If EXCLUSIVE4 was used, the attributes set in attrset were used for the verifier. If EXCLUSIVE4_1 was used, the client determines the attributes used for the verifier by comparing attrset with cva_attrs.attrmask; any bits set in the former but not the latter identify the attributes used to store the verifier. The client MUST immediately send a SETATTR to set the attributes used to store the verifier. Until it does so, the attributes used to store the verifier cannot be relied upon. The subsequent SETATTR MUST NOT occur in the same COMPOUND request as the OPEN.

Unless a persistent session is used, use of the GUARDED4 createmode does not provide exactly-once semantics.
In particular, if a reply is lost and the server does not detect the retransmission of the request, the operation can fail with NFS4ERR_EXIST, even though the create was performed successfully. The client would use this behavior in the case that the application has not requested an exclusive create but has asked to have the file truncated when the file is opened. In the case of the client timing out and retransmitting the create request, the client can use GUARDED4 to prevent the sequence create, write, create (retransmitted) from occurring.

For SHARE reservations, the client MUST specify a value for share_access that is one of READ, WRITE, or BOTH. For share_deny, the client MUST specify one of NONE, READ, WRITE, or BOTH. If the client fails to do this, the server MUST return NFS4ERR_INVAL.

Based on the share_access value (READ, WRITE, or BOTH), the server should check that the requester has the proper access rights to perform the specified operation. This would generally be the result of applying the ACL access rules to the file for the current requester. However, just as with the ACCESS operation, the client should not attempt to second-guess the server's decisions, as access rights may change and may be subject to server administrative controls outside the ACL framework. If the requester is not authorized to READ or WRITE (depending on the share_access value), the server MUST return NFS4ERR_ACCESS.

Note that if the client ID was not created with EXCHGID4_FLAG_BIND_PRINC_STATEID set in the reply to EXCHANGE_ID, then the server MUST NOT impose any requirement that READs and WRITEs sent for an open file have the same credentials as the OPEN itself, and the server is REQUIRED to perform access checking on the READs and WRITEs themselves. Otherwise, if the reply to EXCHANGE_ID did have EXCHGID4_FLAG_BIND_PRINC_STATEID set, then with one exception, the credentials used in the OPEN request MUST match those used in the READs and WRITEs, and the stateids in the READs and WRITEs MUST match, or be derived from, the stateid from the reply to OPEN. The exception is if SP4_SSV or SP4_MACH_CRED state protection is used, and the spo_must_allow result of EXCHANGE_ID includes the READ and/or WRITE operations. In that case, the machine or SSV credential will be allowed to issue READ and/or WRITE. See Section 18.35.

If the component provided to OPEN is a symbolic link, the error NFS4ERR_SYMLINK will be returned to the client, while if it is a directory, the error NFS4ERR_ISDIR will be returned. If the component is of some other type that is not an ordinary file, the error NFS4ERR_WRONG_TYPE is returned. If the current filehandle is not a directory, the error NFS4ERR_NOTDIR will be returned.

The use of the OPEN4_RESULT_PRESERVE_UNLINKED result flag allows a client to avoid the common implementation practice of renaming an open file to ".nfs" after it removes the file. After the server returns OPEN4_RESULT_PRESERVE_UNLINKED, if a client sends a REMOVE operation that would reduce the file's link count to zero, the server SHOULD report a value of zero for the numlinks attribute on the file.
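As a non-normative illustration of the share_access and share_deny validation described above, the following C sketch shows how a server might separate the delegation "want" bits from the base access bits before checking them. The function and its return convention are assumptions made for the example; the constant values are reproduced only for illustration, with the authoritative definitions in the companion XDR description.

   #include <stdint.h>

   /* Illustrative values only. */
   #define OPEN4_SHARE_ACCESS_BOTH  0x00000003
   #define OPEN4_SHARE_DENY_BOTH    0x00000003

   #define NFS4_OK                  0
   #define NFS4ERR_INVAL           22

   /* Hypothetical validation of the OPEN share arguments.  The two
    * low-order bits of share_access carry READ/WRITE/BOTH; any higher
    * bits carry the delegation "want" flags and are ignored here. */
   static int check_open_share_args(uint32_t share_access,
                                    uint32_t share_deny)
   {
       uint32_t base_access = share_access & OPEN4_SHARE_ACCESS_BOTH;

       if (base_access == 0)                   /* neither READ nor WRITE */
           return NFS4ERR_INVAL;

       if ((share_deny & ~OPEN4_SHARE_DENY_BOTH) != 0)
           return NFS4ERR_INVAL;               /* not NONE/READ/WRITE/BOTH */

       return NFS4_OK;
   }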
20606 If another client has a delegation of the file being opened that 20607 conflicts with open being done (sometimes depending of the 20608 share_access or share_deny value specified), the delegation(s) MUST 20609 be recalled, and the operation cannot proceed until each such 20610 delegation is returned or revoked. Except where this happens very 20611 quickly, one or more NFS4ERR_DELAY errors will be returned to 20612 requests made while delegation remains outstanding. In the case of a 20613 write delegation, any open by a different client will conflict, while 20614 for a read delegation only opens with one of the following 20615 characteristics will be considered conflicting: 20617 o The value of share_access includes the bit 20618 OPEN4_SHARE_ACCESS_WRITE. 20620 o The value of share_deny specifies READ or BOTH. 20622 o OPEN4_CREATE is specified together with UNCHECKED4, the size 20623 attribute is specified as zero (for truncation) and an existing 20624 file is truncated. 20626 If OPEN4_CREATE is specified and the file does not exist and the 20627 current filehandle designates a directory for which another client 20628 holds a directory delegation, then, unless the delegation is such 20629 that the situation can be resolved by sending a notification, the 20630 delegation MUST be recalled, and the operation cannot proceed until 20631 the delegation is returned or revoked. Except where this happens 20632 very quickly, one or more NFS4ERR_DELAY errors will be returned to 20633 requests made while delegation remains outstanding. 20635 If OPEN4_CREATE is specified and the file does not exist and the 20636 current filehandle designates a directory for which one or more 20637 directory delegations exist, then, when those delegations request 20638 such notifications, NOTIFY4_ADD_ENTRY will be generated as a result 20639 of this operation. 20641 18.16.4.1. WARNING TO CLIENT IMPLEMENTORS 20643 OPEN resembles LOOKUP in that it generates a filehandle for the 20644 client to use. Unlike LOOKUP though, OPEN creates server state on 20645 the filehandle. In normal circumstances, the client can only release 20646 this state with a CLOSE operation. CLOSE uses the current filehandle 20647 to determine which file to close. Therefore the client MUST follow 20648 every OPEN operation with a GETFH operation in the same COMPOUND 20649 procedure. This will supply the client with the filehandle such that 20650 CLOSE can be used appropriately. 20652 Simply waiting for the lease on the file to expire is insufficient 20653 because the server may maintain the state indefinitely as long as 20654 another client does not attempt to make a conflicting access to the 20655 same file. 20657 See also Section 2.10.5.4. 20659 18.17. Operation 19: OPENATTR - Open Named Attribute Directory 20661 18.17.1. ARGUMENTS 20663 struct OPENATTR4args { 20664 /* CURRENT_FH: object */ 20665 bool createdir; 20666 }; 20668 18.17.2. RESULTS 20670 struct OPENATTR4res { 20671 /* 20672 * If status is NFS4_OK, 20673 * new CURRENT_FH: named attribute 20674 * directory 20675 */ 20676 nfsstat4 status; 20677 }; 20679 18.17.3. DESCRIPTION 20681 The OPENATTR operation is used to obtain the filehandle of the named 20682 attribute directory associated with the current filehandle. The 20683 result of the OPENATTR will be a filehandle to an object of type 20684 NF4ATTRDIR. From this filehandle, READDIR and LOOKUP operations can 20685 be used to obtain filehandles for the various named attributes 20686 associated with the original file system object. 
Filehandles 20687 returned within the named attribute directory will designate objects 20688 of type of NF4NAMEDATTR. 20690 The createdir argument allows the client to signify if a named 20691 attribute directory should be created as a result of the OPENATTR 20692 operation. Some clients may use the OPENATTR operation with a value 20693 of FALSE for createdir to determine if any named attributes exist for 20694 the object. If none exist, then NFS4ERR_NOENT will be returned. If 20695 createdir has a value of TRUE and no named attribute directory 20696 exists, one is created and its filehandle becomes the current 20697 filehandle. On the other hand, if createdir has a value of TRUE and 20698 the named attribute directory already exists, no error results and 20699 the filehandle of the existing directory becomes the current 20700 filehandle. The creation of a named attribute directory assumes that 20701 the server has implemented named attribute support in this fashion 20702 and is not required to do so by this definition. 20704 If the current file handle designates an object of type NF4NAMEDATTR 20705 (a named attribute) or NF4ATTRDIR (a named attribute directory), an 20706 error of NFS4ERR_WRONG_TYPE is returned to the client. Name 20707 attributes or a named attribute directory may have their own named 20708 attributes. 20710 18.17.4. IMPLEMENTATION 20712 If the server does not support named attributes for the current 20713 filehandle, an error of NFS4ERR_NOTSUPP will be returned to the 20714 client. 20716 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access 20718 18.18.1. ARGUMENTS 20720 struct OPEN_DOWNGRADE4args { 20721 /* CURRENT_FH: opened file */ 20722 stateid4 open_stateid; 20723 seqid4 seqid; 20724 uint32_t share_access; 20725 uint32_t share_deny; 20726 }; 20728 18.18.2. RESULTS 20730 struct OPEN_DOWNGRADE4resok { 20731 stateid4 open_stateid; 20732 }; 20734 union OPEN_DOWNGRADE4res switch(nfsstat4 status) { 20735 case NFS4_OK: 20736 OPEN_DOWNGRADE4resok resok4; 20737 default: 20738 void; 20739 }; 20741 18.18.3. DESCRIPTION 20743 This operation is used to adjust the access and deny states for a 20744 given open. This is necessary when a given open-owner opens the same 20745 file multiple times with different access and deny values. In this 20746 situation, a close of one of the opens may change the appropriate 20747 share_access and share_deny flags to remove bits associated with 20748 opens no longer in effect. 20750 Valid values for the share_access field are: OPEN4_SHARE_ACCESS_READ, 20751 OPEN4_SHARE_ACCESS_WRITE, or OPEN4_SHARE_ACCESS_BOTH. If the client 20752 specifies other values, the server MUST reply with NFS4ERR_INVAL. 20754 Valid values for the share_deny field are: OPEN4_SHARE_DENY_NONE, 20755 OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, or 20756 OPEN4_SHARE_DENY_BOTH. If the client specifies other values, the 20757 server MUST reply with NFS4ERR_INVAL. 20759 After checking for valid values of share_access and share_deny, the 20760 server replaces the current access and deny modes on the file with 20761 share_access and share_deny subject to the following constraints: 20763 o The bits in share_access SHOULD equal the union of the 20764 share_access bits (not including OPEN4_SHARE_WANT_* bits) 20765 specified for some subset of the OPENs in effect for the current 20766 open-owner on the current file. 
20768 o The bits in share_deny SHOULD equal the union of the share_deny 20769 bits specified for some subset of the OPENs in effect for the 20770 current open-owner on the current file. 20772 If the above constraints are not respected, the server SHOULD return 20773 the error NFS4ERR_INVAL. Since share_access and share_deny bits 20774 should be subsets of those already granted, short of a defect in the 20775 client or server implementation, it is not possible for the 20776 OPEN_DOWNGRADE request to be denied because of conflicting share 20777 reservations. 20779 The seqid argument is not used in NFSv4.1, MAY be any value, and MUST 20780 be ignored by the server. 20782 On success, the current filehandle retains its value. 20784 18.18.4. IMPLEMENTATION 20786 An OPEN_DOWNGRADE operation may make read delegations grantable where 20787 they were not previously. Servers may choose to respond immediately 20788 if there are pending delegation want requests or may respond to the 20789 situation at a later time. 20791 18.19. Operation 22: PUTFH - Set Current Filehandle 20793 18.19.1. ARGUMENTS 20795 struct PUTFH4args { 20796 nfs_fh4 object; 20797 }; 20799 18.19.2. RESULTS 20801 struct PUTFH4res { 20802 /* 20803 * If status is NFS4_OK, 20804 * new CURRENT_FH: argument to PUTFH 20805 */ 20806 nfsstat4 status; 20807 }; 20809 18.19.3. DESCRIPTION 20811 Replaces the current filehandle with the filehandle provided as an 20812 argument. Clears the current stateid. 20814 If the security mechanism used by the requester does not meet the 20815 requirements of the filehandle provided to this operation, the server 20816 MUST return NFS4ERR_WRONGSEC. 20818 See Section 16.2.3.1.1 for more details on the current filehandle. 20820 See Section 16.2.3.1.2 for more details on the current stateid. 20822 18.19.4. IMPLEMENTATION 20824 Commonly used as the second operator (after SEQUENCE) in a COMPOUND 20825 request to set the context for following operations. 20827 18.20. Operation 23: PUTPUBFH - Set Public Filehandle 20829 18.20.1. ARGUMENT 20831 void; 20833 18.20.2. RESULT 20835 struct PUTPUBFH4res { 20836 /* 20837 * If status is NFS4_OK, 20838 * new CURRENT_FH: public fh 20839 */ 20840 nfsstat4 status; 20841 }; 20843 18.20.3. DESCRIPTION 20845 Replaces the current filehandle with the filehandle that represents 20846 the public filehandle of the server's name space. This filehandle 20847 may be different from the "root" filehandle which may be associated 20848 with some other directory on the server. 20850 PUTPUBFH also clears the current stateid. 20852 The public filehandle represents the concepts embodied in RFC2054 20853 [32], RFC2055 [33], RFC2224 [42]. The intent for NFSv4.1 is that the 20854 public filehandle (represented by the PUTPUBFH operation) be used as 20855 a method of providing WebNFS server compatibility with NFSv3. 20857 The public filehandle and the root filehandle (represented by the 20858 PUTROOTFH operation) SHOULD be equivalent. If the public and root 20859 filehandles are not equivalent, then the public filehandle MUST be a 20860 descendant of the root filehandle. 20862 See Section 16.2.3.1.1 for more details on the current filehandle. 20864 See Section 16.2.3.1.2 for more details on the current stateid. 20866 18.20.4. IMPLEMENTATION 20868 Used as the second operator (after SEQUENCE) in an NFS request to set 20869 the context for file accessing operations that follow in the same 20870 COMPOUND request. 
20872 With the NFSv3 public filehandle, the client is able to specify 20873 whether the path name provided in the LOOKUP should be evaluated as 20874 either an absolute path relative to the server's root or relative to 20875 the public filehandle. RFC2224 [42] contains further discussion of 20876 the functionality. With NFSv4.1, that type of specification is not 20877 directly available in the LOOKUP operation. The reason for this is 20878 because the component separators needed to specify absolute vs. 20879 relative are not allowed in NFSv4. Therefore, the client is 20880 responsible for constructing its request such that the use of either 20881 PUTROOTFH or PUTPUBFH are used to signify absolute or relative 20882 evaluation of an NFS URL respectively. 20884 Note that there are warnings mentioned in RFC2224 [42] with respect 20885 to the use of absolute evaluation and the restrictions the server may 20886 place on that evaluation with respect to how much of its namespace 20887 has been made available. These same warnings apply to NFSv4.1. It 20888 is likely, therefore that because of server implementation details, 20889 an NFSv3 absolute public filehandle lookup may behave differently 20890 than an NFSv4.1 absolute resolution. 20892 There is a form of security negotiation as described in RFC2755 [43] 20893 that uses the public filehandle and an overloading of the pathname. 20894 This method is not available with NFSv4.1 as filehandles are not 20895 overloaded with special meaning and therefore do not provide the same 20896 framework as NFSv3. Clients should therefore use the security 20897 negotiation mechanisms described in Section 2.6. 20899 18.21. Operation 24: PUTROOTFH - Set Root Filehandle 20901 18.21.1. ARGUMENTS 20903 void; 20905 18.21.2. RESULTS 20907 struct PUTROOTFH4res { 20908 /* 20909 * If status is NFS4_OK, 20910 * new CURRENT_FH: root fh 20911 */ 20912 nfsstat4 status; 20913 }; 20915 18.21.3. DESCRIPTION 20917 Replaces the current filehandle with the filehandle that represents 20918 the root of the server's name space. From this filehandle a LOOKUP 20919 operation can locate any other filehandle on the server. This 20920 filehandle may be different from the "public" filehandle which may be 20921 associated with some other directory on the server. 20923 PUTROOTFH also clears the current stateid. 20925 See Section 16.2.3.1.1 for more details on the current filehandle. 20927 See Section 16.2.3.1.2 for more details on the current stateid. 20929 18.21.4. IMPLEMENTATION 20931 Commonly used as the second operator (after SEQUENCE) in an NFS 20932 request to set the context for file accessing operations that follow 20933 in the same COMPOUND request. 20935 18.22. Operation 25: READ - Read from File 20937 18.22.1. ARGUMENTS 20939 struct READ4args { 20940 /* CURRENT_FH: file */ 20941 stateid4 stateid; 20942 offset4 offset; 20943 count4 count; 20944 }; 20946 18.22.2. RESULTS 20948 struct READ4resok { 20949 bool eof; 20950 opaque data<>; 20951 }; 20953 union READ4res switch (nfsstat4 status) { 20954 case NFS4_OK: 20955 READ4resok resok4; 20956 default: 20957 void; 20958 }; 20960 18.22.3. DESCRIPTION 20962 The READ operation reads data from the regular file identified by the 20963 current filehandle. 20965 The client provides an offset of where the READ is to start and a 20966 count of how many bytes are to be read. An offset of 0 (zero) means 20967 to read data starting at the beginning of the file. 
If offset is greater than or equal to the size of the file, the status NFS4_OK is returned with a data length set to 0 (zero) and eof set to TRUE. The READ is subject to access permissions checking.

If the client specifies a count value of 0 (zero), the READ succeeds and returns 0 (zero) bytes of data, again subject to access permissions checking. The server may choose to return fewer bytes than specified by the client. The client needs to check for this condition and handle the condition appropriately.

Except when special stateids are used, the stateid value for a READ request represents a value returned from a previous byte-range lock or share reservation request or the stateid associated with a delegation. The stateid identifies the associated owners, if any, and is used by the server to verify that the associated locks are still valid (e.g., have not been revoked).

If the read ended at the end-of-file (formally, in a correctly formed READ request, if offset + count is equal to the size of the file), or the read request extends beyond the size of the file (if offset + count is greater than the size of the file), eof is returned as TRUE; otherwise, it is FALSE. A successful READ of an empty file will always return eof as TRUE.

If the current filehandle is not an ordinary file, an error will be returned to the client. In the case that the current filehandle represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. If the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is returned. In all other cases, NFS4ERR_WRONG_TYPE is returned.

For a READ with a stateid value of all bits 0, the server MAY allow the READ to be serviced subject to mandatory file locks or the current share deny modes for the file. For a READ with a stateid value of all bits 1, the server MAY allow READ operations to bypass locking checks at the server.

On success, the current filehandle retains its value.

18.22.4. IMPLEMENTATION

It is possible for the server to return fewer than count bytes of data. If the server returns less than the count requested and eof is set to FALSE, the client should send another READ to get the remaining data. A server may return less data than requested under several circumstances. The file may have been truncated by another client or perhaps on the server itself, changing the file size from what the requesting client believes to be the case. This would reduce the actual amount of data available to the client. It is possible that the server may back off the transfer size and reduce the read request return. Server resource exhaustion may also occur, necessitating a smaller read return.

If mandatory file locking is in effect for the file, and if the region corresponding to the data to be read from the file is write locked by an owner not associated with the stateid, the server will return the NFS4ERR_LOCKED error. The client should try to get the appropriate read byte-range lock via the LOCK operation before re-attempting the READ. When the READ completes, the client should release the byte-range lock via LOCKU.
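The note above that a server may return fewer bytes than requested suggests a simple client-side pattern: re-issue READs for the remaining range until either the requested amount has been obtained or eof is set. The following C sketch illustrates such a loop; nfs_read() is a hypothetical wrapper around a single READ round trip and is not defined by this document.

   #include <stddef.h>
   #include <stdint.h>
   #include <stdbool.h>

   /* Result of one READ round trip, as seen by the client. */
   struct read_result {
       size_t bytes_returned;  /* length of the data returned */
       bool   eof;             /* eof flag from the READ reply */
   };

   /* Hypothetical wrapper that sends a single READ for (offset, count)
    * and copies the returned data into buf. */
   extern struct read_result nfs_read(uint64_t offset, size_t count,
                                      void *buf);

   /* Read up to 'count' bytes starting at 'offset', re-issuing READs
    * when the server returns a short (but non-eof) result.  Returns
    * the number of bytes actually read. */
   static size_t read_fully(uint64_t offset, size_t count,
                            unsigned char *buf)
   {
       size_t done = 0;

       while (done < count) {
           struct read_result r = nfs_read(offset + done, count - done,
                                           buf + done);
           done += r.bytes_returned;
           if (r.eof || r.bytes_returned == 0)
               break;          /* end of file or nothing more returned */
       }
       return done;
   }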
21028 If another client has a write delegation for the file being read, the 21029 delegation must be recalled, and the operation cannot proceed until 21030 that delegation is returned or revoked. Except where this happens 21031 very quickly, one or more NFS4ERR_DELAY errors will be returned to 21032 requests made while the delegation remains outstanding. Normally, 21033 delegations will not be recalled as a result of a READ operation 21034 since the recall will occur as a result of an earlier OPEN. However, 21035 since it is possible for a READ to be done with a special stateid, 21036 the server needs to check for this case even though the client should 21037 have done an OPEN previously. 21039 18.23. Operation 26: READDIR - Read Directory 21041 18.23.1. ARGUMENTS 21043 struct READDIR4args { 21044 /* CURRENT_FH: directory */ 21045 nfs_cookie4 cookie; 21046 verifier4 cookieverf; 21047 count4 dircount; 21048 count4 maxcount; 21049 bitmap4 attr_request; 21050 }; 21052 18.23.2. RESULTS 21054 struct entry4 { 21055 nfs_cookie4 cookie; 21056 component4 name; 21057 fattr4 attrs; 21058 entry4 *nextentry; 21059 }; 21061 struct dirlist4 { 21062 entry4 *entries; 21063 bool eof; 21064 }; 21066 struct READDIR4resok { 21067 verifier4 cookieverf; 21068 dirlist4 reply; 21069 }; 21071 union READDIR4res switch (nfsstat4 status) { 21072 case NFS4_OK: 21073 READDIR4resok resok4; 21074 default: 21075 void; 21076 }; 21078 18.23.3. DESCRIPTION 21080 The READDIR operation retrieves a variable number of entries from a 21081 file system directory and returns client requested attributes for 21082 each entry along with information to allow the client to request 21083 additional directory entries in a subsequent READDIR. 21085 The arguments contain a cookie value that represents where the 21086 READDIR should start within the directory. A value of 0 (zero) for 21087 the cookie is used to start reading at the beginning of the 21088 directory. For subsequent READDIR requests, the client specifies a 21089 cookie value that is provided by the server on a previous READDIR 21090 request. 21092 The request's cookieverf field should be set to 0 (zero) when the 21093 request's cookie field is 0 (zero) (first directory read). On 21094 subsequent requests, the cookieverf field must match the cookieverf 21095 returned by the READDIR in which the cookie was acquired. If the 21096 server determines that the cookieverf is no longer valid for the 21097 directory, the error NFS4ERR_NOT_SAME must be returned. 21099 The dircount field of the request is a hint of the maximum number of 21100 bytes of directory information that should be returned. This value 21101 represents the total length of the names of the directory entries and 21102 the cookie value for these entries. This length represents the XDR 21103 encoding of the data (names and cookies) and not the length in the 21104 native format of the server. 21106 The maxcount field of the request represents the maximum total size 21107 of all of the data being returned within the READDIR4resok structure 21108 and includes the XDR overhead. The server MAY return less data. If 21109 the server is unable to return a single directory entry within the 21110 maxcount limit, the error NFS4ERR_TOOSMALL MUST be returned to the 21111 client. 21113 Finally, the request's attr_request field represents the list of 21114 attributes to be returned for each directory entry supplied by the 21115 server. 21117 A successful reply consists of a list of directory entries. 
Each of 21118 these entries contains the name of the directory entry, a cookie 21119 value for that entry, and the associated attributes as requested. 21120 The "eof" flag has a value of TRUE if there are no more entries in 21121 the directory. 21123 The cookie value is only meaningful to the server and is used as a 21124 cursor for the directory entry. As mentioned, this cookie is used by 21125 the client for subsequent READDIR operations so that it may continue 21126 reading a directory. The cookie is similar in concept to a READ 21127 offset but MUST NOT be interpreted as such by the client. Ideally, 21128 the cookie value SHOULD NOT change if the directory is modified since 21129 the client may be caching these values. 21131 In some cases, the server may encounter an error while obtaining the 21132 attributes for a directory entry. Instead of returning an error for 21133 the entire READDIR operation, the server can instead return the 21134 attribute rdattr_error (Section 5.8.1.12). With this, the server is 21135 able to communicate the failure to the client and not fail the entire 21136 operation in the instance of what might be a transient failure. 21137 Obviously, the client must request the fattr4_rdattr_error attribute 21138 for this method to work properly. If the client does not request the 21139 attribute, the server has no choice but to return failure for the 21140 entire READDIR operation. 21142 For some file system environments, the directory entries "." and ".." 21143 have special meaning and in other environments, they do not. If the 21144 server supports these special entries within a directory, they SHOULD 21145 NOT be returned to the client as part of the READDIR response. To 21146 enable some client environments, the cookie values of 0, 1, and 2 are 21147 to be considered reserved. Note that the UNIX client will use these 21148 values when combining the server's response and local representations 21149 to enable a fully formed UNIX directory presentation to the 21150 application. 21152 For READDIR arguments, cookie values of 1 and 2 SHOULD NOT be used 21153 and for READDIR results cookie values of 0, 1, and 2 SHOULD NOT be 21154 returned. 21156 On success, the current filehandle retains its value. 21158 18.23.4. IMPLEMENTATION 21160 The server's file system directory representations can differ 21161 greatly. A client's programming interfaces may also be bound to the 21162 local operating environment in a way that does not translate well 21163 into the NFS protocol. Therefore the use of the dircount and 21164 maxcount fields are provided to enable the client to provide hints to 21165 the server. If the client is aggressive about attribute collection 21166 during a READDIR, the server has an idea of how to limit the encoded 21167 response. 21169 If dircount is zero, the server bounds the reply's size based on 21170 request's maxcount field. 21172 The cookieverf may be used by the server to help manage cookie values 21173 that may become stale. It should be a rare occurrence that a server 21174 is unable to continue properly reading a directory with the provided 21175 cookie/cookieverf pair. The server SHOULD make every effort to avoid 21176 this condition since the application at the client might be unable to 21177 properly handle this type of failure. 21179 The use of the cookieverf will also protect the client from using 21180 READDIR cookie values that might be stale. 
For example, if the file 21181 system has been migrated, the server might or might not be able to 21182 use the same cookie values to service READDIR as the previous server 21183 used. With the client providing the cookieverf, the server is able 21184 to provide the appropriate response to the client. This prevents the 21185 case where the server accepts a cookie value but the underlying 21186 directory has changed and the response is invalid from the client's 21187 context of its previous READDIR. 21189 Since some servers will not be returning "." and ".." entries as has 21190 been done with previous versions of the NFS protocol, the client that 21191 requires these entries be present in READDIR responses must fabricate 21192 them. 21194 18.24. Operation 27: READLINK - Read Symbolic Link 21196 18.24.1. ARGUMENTS 21198 /* CURRENT_FH: symlink */ 21199 void; 21201 18.24.2. RESULTS 21203 struct READLINK4resok { 21204 linktext4 link; 21205 }; 21207 union READLINK4res switch (nfsstat4 status) { 21208 case NFS4_OK: 21209 READLINK4resok resok4; 21210 default: 21211 void; 21212 }; 21214 18.24.3. DESCRIPTION 21216 READLINK reads the data associated with a symbolic link. Depending 21217 on the value of the UTF-8 capability attribute (Section 14.4), the 21218 data is encoded in UTF-8. Whether created by an NFS client or 21219 created locally on the server, the data in a symbolic link is not 21220 interpreted (except possibly to check for proper UTF-8 encoding) when 21221 created, but is simply stored. 21223 On success, the current filehandle retains its value. 21225 18.24.4. IMPLEMENTATION 21227 A symbolic link is nominally a pointer to another file. The data is 21228 not necessarily interpreted by the server, just stored in the file. 21229 It is possible for a client implementation to store a path name that 21230 is not meaningful to the server operating system in a symbolic link. 21231 A READLINK operation returns the data to the client for 21232 interpretation. If different implementations want to share access to 21233 symbolic links, then they must agree on the interpretation of the 21234 data in the symbolic link. 21236 The READLINK operation is only allowed on objects of type NF4LNK. 21237 The server should return the error NFS4ERR_WRONG_TYPE if the object 21238 is not of type NF4LNK. 21240 18.25. Operation 28: REMOVE - Remove File System Object 21242 18.25.1. ARGUMENTS 21244 struct REMOVE4args { 21245 /* CURRENT_FH: directory */ 21246 component4 target; 21247 }; 21249 18.25.2. RESULTS 21251 struct REMOVE4resok { 21252 change_info4 cinfo; 21253 }; 21255 union REMOVE4res switch (nfsstat4 status) { 21256 case NFS4_OK: 21257 REMOVE4resok resok4; 21258 default: 21259 void; 21260 }; 21262 18.25.3. DESCRIPTION 21264 The REMOVE operation removes (deletes) a directory entry named by 21265 filename from the directory corresponding to the current filehandle. 21266 If the entry in the directory was the last reference to the 21267 corresponding file system object, the object may be destroyed. The 21268 directory may be either of type NF4DIR or NF4ATTRDIR. 21270 For the directory where the filename was removed, the server returns 21271 change_info4 information in cinfo. With the atomic field of the 21272 change_info4 data type, the server will indicate if the before and 21273 after change attributes were obtained atomically with respect to the 21274 removal. 
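The change_info4 returned by REMOVE is typically used by a client to decide whether its cached image of the directory can be brought up to date in place or must be revalidated. The following C sketch shows one possible client policy; the cache structure and the policy itself are assumptions made for the example and are not mandated by this specification.

   #include <stdint.h>
   #include <stdbool.h>

   typedef uint64_t changeid4;

   /* Mirrors the protocol's change_info4. */
   struct change_info4 {
       bool      atomic;
       changeid4 before;
       changeid4 after;
   };

   /* Hypothetical client-side directory cache entry. */
   struct dir_cache {
       bool      valid;
       changeid4 change;   /* last known change attribute of the directory */
   };

   /* Apply the change_info4 from a REMOVE reply to the cached directory.
    * If the before/after values were obtained atomically and the cached
    * change value matches "before", the cache can simply be advanced;
    * otherwise another modification may have intervened and the cached
    * directory contents should be revalidated. */
   static void update_dir_cache(struct dir_cache *dc,
                                const struct change_info4 *ci)
   {
       if (dc->valid && ci->atomic && dc->change == ci->before) {
           dc->change = ci->after;   /* cache remains usable */
       } else {
           dc->valid = false;        /* force revalidation of the directory */
       }
   }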
If the target has a length of 0 (zero), or if target does not obey the UTF-8 definition (and the server is enforcing UTF-8 encoding, see Section 14.4), the error NFS4ERR_INVAL will be returned.

On success, the current filehandle retains its value.

18.25.4. IMPLEMENTATION

NFSv3 required a different operator, RMDIR, for directory removal and REMOVE for non-directory removal. This allowed clients to skip checking the file type when passed a non-directory delete system call (e.g., unlink() in POSIX) on a directory, as well as the converse (e.g., a rmdir() on a non-directory), because they knew the server would check the file type. NFSv4.1 REMOVE can be used to delete any directory entry independent of its file type. The implementor of an NFSv4.1 client's entry points from the unlink() and rmdir() system calls should first check the file type against the types the system call is allowed to remove before issuing a REMOVE. Alternatively, the implementor can produce a COMPOUND call that includes a LOOKUP/VERIFY sequence to verify the file type before a REMOVE operation in the same COMPOUND call.

The concept of last reference is server specific. However, if the numlinks field in the previous attributes of the object had the value 1, the client should not rely on referring to the object via a filehandle. Likewise, the client should not rely on the resources (disk space, directory entry, and so on) formerly associated with the object becoming immediately available. Thus, if a client needs to be able to continue to access a file after using REMOVE to remove it, the client should take steps to make sure that the file will still be accessible. While the traditional mechanism used is to RENAME the file from its old name to a new hidden name, the NFSv4.1 OPEN operation MAY return a result flag, OPEN4_RESULT_PRESERVE_UNLINKED, which indicates to the client that the file will be preserved if the file has an outstanding open (see Section 18.16).

If the server finds that the file is still open when the REMOVE arrives:

o  The server SHOULD NOT delete the file's directory entry if the file was opened with OPEN4_SHARE_DENY_WRITE or OPEN4_SHARE_DENY_BOTH.

o  If the file was not opened with OPEN4_SHARE_DENY_WRITE or OPEN4_SHARE_DENY_BOTH, the server SHOULD delete the file's directory entry. However, until the last CLOSE of the file, the server MAY continue to allow access to the file via its filehandle.

o  The server MUST NOT delete the directory entry if the reply from OPEN had the flag OPEN4_RESULT_PRESERVE_UNLINKED set.

The server MAY implement its own restrictions on removal of a file while it is open. The server might disallow such a REMOVE (or a removal that occurs as part of RENAME). The conditions that influence the restrictions on removal of a file while it is still open include:

o  Whether certain access protocols (i.e., not just NFS) are holding the file open.

o  Whether particular options, access modes, or policies on the server are enabled.

In all cases in which a decision is made to not allow the file's directory entry to be removed because of an open, the error NFS4ERR_FILE_OPEN is returned.
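The three rules above for a REMOVE that arrives while the file is still open can be condensed into a small decision function. The following C sketch is illustrative only; the boolean inputs stand in for state that a server would derive from its records of opens and of OPEN replies, and whether NFS4ERR_FILE_OPEN is then returned remains a matter of server policy as described above.

   #include <stdbool.h>

   /* Decide whether the directory entry of a currently open file may
    * be removed now.  Illustrative only; the inputs summarize server
    * state. */
   static bool may_remove_entry_of_open_file(bool open_denies_write,
                                             bool preserve_unlinked_promised)
   {
       if (preserve_unlinked_promised)
           return false;   /* OPEN reply carried
                              OPEN4_RESULT_PRESERVE_UNLINKED: the entry
                              MUST NOT be deleted */
       if (open_denies_write)
           return false;   /* opened with OPEN4_SHARE_DENY_WRITE or
                              _BOTH: the entry SHOULD NOT be deleted */
       return true;        /* entry SHOULD be deleted; the file itself
                              MAY remain reachable via its filehandle
                              until the last CLOSE */
   }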
Where the determination above cannot be made definitively because delegations are being held, they MUST be recalled to allow processing of the REMOVE to continue. When a delegation is held, the server's knowledge of the status of opens for that client is not to be relied on, so that unless there are files opened with the particular deny modes by clients without delegations, the determination cannot be made until delegations are recalled, and the operation cannot proceed until sufficient delegations have been returned or revoked to allow the server to make a correct determination.

In all cases in which delegations are recalled, the server is likely to return one or more NFS4ERR_DELAY errors while delegations remain outstanding.

If the current filehandle designates a directory for which another client holds a directory delegation, then, unless the situation can be resolved by sending a notification, the directory delegation MUST be recalled, and the operation MUST NOT proceed until the delegation is returned or revoked. Except where this happens very quickly, one or more NFS4ERR_DELAY errors will be returned to requests made while the delegation remains outstanding.

When the current filehandle designates a directory for which one or more directory delegations exist, then, when those delegations request such notifications, NOTIFY4_REMOVE_ENTRY will be generated as a result of this operation.

Note that when a remove occurs as a result of a RENAME, NOTIFY4_REMOVE_ENTRY will only be generated if the removal happens as a separate operation. In the case in which the removal is integrated and atomic with RENAME, the notification of the removal is integrated with the notification for the RENAME. See the discussion of the NOTIFY4_RENAME_ENTRY notification in Section 20.4.

18.26. Operation 29: RENAME - Rename Directory Entry

18.26.1. ARGUMENTS

   struct RENAME4args {
           /* SAVED_FH: source directory */
           component4      oldname;
           /* CURRENT_FH: target directory */
           component4      newname;
   };

18.26.2. RESULTS

   struct RENAME4resok {
           change_info4    source_cinfo;
           change_info4    target_cinfo;
   };

   union RENAME4res switch (nfsstat4 status) {
    case NFS4_OK:
           RENAME4resok    resok4;
    default:
           void;
   };

18.26.3. DESCRIPTION

The RENAME operation renames the object identified by oldname in the source directory corresponding to the saved filehandle, as set by the SAVEFH operation, to newname in the target directory corresponding to the current filehandle. The operation is required to be atomic to the client. Source and target directories MUST reside on the same file system on the server. On success, the current filehandle will continue to be the target directory.

If the target directory already contains an entry with the name newname, the source object MUST be compatible with the target: either both are non-directories or both are directories and the target MUST be empty. If compatible, the existing target is removed before the rename occurs or, preferably, as part of the rename and atomic with it. See Section 18.25.4 for client and server actions whenever a target is removed.
Note, however, that when the removal is performed atomically with the rename, certain parts of the removal described there are integrated with the rename. For example, notification of the removal will not be via a NOTIFY4_REMOVE_ENTRY but will be indicated as part of the NOTIFY4_ADD_ENTRY or NOTIFY4_RENAME_ENTRY generated by the rename.

If the source object and the target are not compatible or if the target is a directory but not empty, the server will return the error NFS4ERR_EXIST.

If oldname and newname both refer to the same file (e.g., they might be hard links of each other), then unless the file is open (see Section 18.26.4), RENAME MUST perform no action and return NFS4_OK.

For both directories involved in the RENAME, the server returns change_info4 information. With the atomic field of the change_info4 data type, the server will indicate if the before and after change attributes were obtained atomically with respect to the rename.

If oldname refers to a named attribute and the saved and current filehandles refer to different file system objects, the server will return NFS4ERR_XDEV just as if the saved and current filehandles represented directories on different file systems.

If oldname or newname have a length of 0 (zero), or if oldname or newname do not obey the UTF-8 definition, the error NFS4ERR_INVAL will be returned.

18.26.4. IMPLEMENTATION

The server MAY impose restrictions on the RENAME operation such that RENAME may not be done when the file being renamed is open or when that open is done by particular protocols, or with particular options or access modes. Similar restrictions may be applied when a file exists with the target name and is open. When RENAME is rejected because of such restrictions, the error NFS4ERR_FILE_OPEN is returned.

When oldname and newname refer to the same file and that file is open in a fashion such that RENAME would normally be rejected with NFS4ERR_FILE_OPEN if oldname and newname were different files, then RENAME SHOULD be rejected with NFS4ERR_FILE_OPEN.

If a server does implement such restrictions and those restrictions include cases of NFSv4 opens preventing successful execution of a rename, the server needs to recall any delegations that could hide the existence of opens relevant to that decision. This is because when a client holds a delegation, the server might not have an accurate account of the opens for that client, since the client may execute OPENs and CLOSEs locally. The RENAME operation need only be delayed until a definitive result can be obtained. For example, if there are multiple delegations and one of them establishes an open whose presence would prevent the rename, given the server's semantics, NFS4ERR_FILE_OPEN may be returned to the caller as soon as that delegation is returned, without waiting for other delegations to be returned. Similarly, if such opens are not associated with delegations, NFS4ERR_FILE_OPEN can be returned immediately with no delegation recall being done.
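The target-compatibility rules given in the DESCRIPTION above (both objects non-directories, or both directories with the target empty, and a no-op success when oldname and newname name the same file) can be summarized as follows. The object handle and the helper predicates in this C sketch are assumptions made for the example; the open-file restrictions discussed in this section are not modeled.

   #include <stddef.h>
   #include <stdbool.h>

   #define NFS4_OK          0
   #define NFS4ERR_EXIST   17

   /* Hypothetical server-side object handle and helpers. */
   struct fsobj;
   extern bool fsobj_is_dir(const struct fsobj *o);
   extern bool fsobj_dir_is_empty(const struct fsobj *o);
   extern bool fsobj_same(const struct fsobj *a, const struct fsobj *b);

   /* Check whether a RENAME may replace an existing target: the source
    * and target must both be directories (with the target empty) or
    * both be non-directories.  A RENAME of a file onto itself is a
    * successful no-op. */
   static int rename_target_check(const struct fsobj *source,
                                  const struct fsobj *target /* may be NULL */)
   {
       if (target == NULL)
           return NFS4_OK;                   /* nothing to replace */

       if (fsobj_same(source, target))
           return NFS4_OK;                   /* no action required */

       if (fsobj_is_dir(source) != fsobj_is_dir(target))
           return NFS4ERR_EXIST;             /* incompatible types */

       if (fsobj_is_dir(target) && !fsobj_dir_is_empty(target))
           return NFS4ERR_EXIST;             /* non-empty directory */

       return NFS4_OK;                       /* target may be removed */
   }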
If the current filehandle or the saved filehandle designates a directory for which another client holds a directory delegation, then, unless the situation can be resolved by sending a notification, the delegation MUST be recalled, and the operation cannot proceed until the delegation is returned or revoked. Except where this happens very quickly, one or more NFS4ERR_DELAY errors will be returned to requests made while the delegation remains outstanding.

When the current and saved filehandles are the same and they designate a directory for which one or more directory delegations exist, then, when those delegations request such notifications, a notification of type NOTIFY4_RENAME_ENTRY will be generated as a result of this operation. When oldname and newname refer to the same file, no notification is generated (because, as Section 18.26.3 states, the server MUST take no action). When a file is removed because it has the same name as the target, if that removal is done atomically with the rename, a NOTIFY4_REMOVE_ENTRY notification will not be generated. Instead, the deletion of the file will be reported as part of the NOTIFY4_RENAME_ENTRY notification.

When the current and saved filehandles are not the same:

o  If the current filehandle designates a directory for which one or more directory delegations exist, then, when those delegations request such notifications, NOTIFY4_ADD_ENTRY will be generated as a result of this operation. When a file is removed because it has the same name as the target, if that removal is done atomically with the rename, a NOTIFY4_REMOVE_ENTRY notification will not be generated. Instead, the deletion of the file will be reported as part of the NOTIFY4_ADD_ENTRY notification.

o  If the saved filehandle designates a directory for which one or more directory delegations exist, then, when those delegations request such notifications, NOTIFY4_REMOVE_ENTRY will be generated as a result of this operation.

If the object being renamed has file delegations held by clients other than the one doing the RENAME, the delegations MUST be recalled, and the operation cannot proceed until each such delegation is returned or revoked. Note that in the case of multiply linked files, the delegation recall requirement applies even if the delegation was obtained through a different name than the one being renamed. In all cases in which delegations are recalled, the server is likely to return one or more NFS4ERR_DELAY errors while the delegation(s) remains outstanding, although it may not do so if the returns happen quickly.

The RENAME operation must be atomic to the client. The statement "source and target directories MUST reside on the same file system on the server" means that the fsid fields in the attributes for the directories are the same. If they reside on different file systems, the error NFS4ERR_XDEV is returned.

Based on the value of the fh_expire_type attribute for the object, the filehandle may or may not expire on a RENAME. However, server implementors are strongly encouraged to attempt to keep filehandles from expiring in this fashion.

On some servers, the file names "." and ".." are illegal as either oldname or newname, and will result in the error NFS4ERR_BADNAME.
In 21538 addition, on many servers the case of oldname or newname being an 21539 alias for the source directory will be checked for. Such servers 21540 will return the error NFS4ERR_INVAL in these cases. 21542 If either of the source or target filehandles are not directories, 21543 the server will return NFS4ERR_NOTDIR. 21545 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle 21547 18.27.1. ARGUMENTS 21549 /* SAVED_FH: */ 21550 void; 21552 18.27.2. RESULTS 21554 struct RESTOREFH4res { 21555 /* 21556 * If status is NFS4_OK, 21557 * new CURRENT_FH: value of saved fh 21558 */ 21559 nfsstat4 status; 21560 }; 21562 18.27.3. DESCRIPTION 21564 Set the current filehandle and stateid to the values in the saved 21565 filehandle and stateid. If there is no saved filehandle then the 21566 server will return the error NFS4ERR_NOFILEHANDLE. 21568 See Section 16.2.3.1.1 for more details on the current filehandle. 21570 See Section 16.2.3.1.2 for more details on the current stateid. 21572 18.27.4. IMPLEMENTATION 21574 Operations like OPEN and LOOKUP use the current filehandle to 21575 represent a directory and replace it with a new filehandle. Assuming 21576 the previous filehandle was saved with a SAVEFH operator, the 21577 previous filehandle can be restored as the current filehandle. This 21578 is commonly used to obtain post-operation attributes for the 21579 directory, e.g. 21581 PUTFH (directory filehandle) 21582 SAVEFH 21583 GETATTR attrbits (pre-op dir attrs) 21584 CREATE optbits "foo" attrs 21585 GETATTR attrbits (file attributes) 21586 RESTOREFH 21587 GETATTR attrbits (post-op dir attrs) 21589 18.28. Operation 32: SAVEFH - Save Current Filehandle 21591 18.28.1. ARGUMENTS 21593 /* CURRENT_FH: */ 21594 void; 21596 18.28.2. RESULTS 21598 struct SAVEFH4res { 21599 /* 21600 * If status is NFS4_OK, 21601 * new SAVED_FH: value of current fh 21602 */ 21603 nfsstat4 status; 21604 }; 21606 18.28.3. DESCRIPTION 21608 Save the current filehandle and stateid. If a previous filehandle 21609 was saved then it is no longer accessible. The saved filehandle can 21610 be restored as the current filehandle with the RESTOREFH operator. 21612 On success, the current filehandle retains its value. 21614 See Section 16.2.3.1.1 for more details on the current filehandle. 21616 See Section 16.2.3.1.2 for more details on the current stateid. 21618 18.28.4. IMPLEMENTATION 21620 18.29. Operation 33: SECINFO - Obtain Available Security 21622 18.29.1. ARGUMENTS 21624 struct SECINFO4args { 21625 /* CURRENT_FH: directory */ 21626 component4 name; 21627 }; 21629 18.29.2. RESULTS 21631 /* 21632 * From RFC 2203 21633 */ 21634 enum rpc_gss_svc_t { 21635 RPC_GSS_SVC_NONE = 1, 21636 RPC_GSS_SVC_INTEGRITY = 2, 21637 RPC_GSS_SVC_PRIVACY = 3 21638 }; 21640 struct rpcsec_gss_info { 21641 sec_oid4 oid; 21642 qop4 qop; 21643 rpc_gss_svc_t service; 21644 }; 21646 /* RPCSEC_GSS has a value of '6' - See RFC 2203 */ 21647 union secinfo4 switch (uint32_t flavor) { 21648 case RPCSEC_GSS: 21649 rpcsec_gss_info flavor_info; 21650 default: 21651 void; 21652 }; 21654 typedef secinfo4 SECINFO4resok<>; 21656 union SECINFO4res switch (nfsstat4 status) { 21657 case NFS4_OK: 21658 /* CURRENTFH: consumed */ 21659 SECINFO4resok resok4; 21660 default: 21661 void; 21662 }; 21664 18.29.3. DESCRIPTION 21666 The SECINFO operation is used by the client to obtain a list of valid 21667 RPC authentication flavors for a specific directory filehandle, file 21668 name pair. 
SECINFO should apply the same access methodology used for 21669 LOOKUP when evaluating the name. Therefore, if the requester does 21670 not have the appropriate access to LOOKUP the name then SECINFO MUST 21671 behave the same way and return NFS4ERR_ACCESS. 21673 The result will contain an array which represents the security 21674 mechanisms available, with an order corresponding to the server's 21675 preferences, the most preferred being first in the array. The client 21676 is free to pick whatever security mechanism it both desires and 21677 supports, or to pick in the server's preference order the first one 21678 it supports. The array entries are represented by the secinfo4 21679 structure. The field 'flavor' will contain a value of AUTH_NONE, 21680 AUTH_SYS (as defined in RFC1831 [3]), or RPCSEC_GSS (as defined in 21681 RFC2203 [4]). The field flavor can also be any other security flavor 21682 registered with IANA. 21684 For the flavors AUTH_NONE and AUTH_SYS, no additional security 21685 information is returned. The same is true of many (if not most) 21686 other security flavors, including AUTH_DH. For a return value of 21687 RPCSEC_GSS, a security triple is returned that contains the mechanism 21688 object identifier (OID, as defined in RFC2743 [7]), the quality of 21689 protection (as defined in RFC2743 [7]) and the service type (as 21690 defined in RFC2203 [4]). It is possible for SECINFO to return 21691 multiple entries with flavor equal to RPCSEC_GSS with different 21692 security triple values. 21694 On success, the current filehandle is consumed (see 21695 Section 2.6.3.1.1.8), and if the next operation after SECINFO tries 21696 to use the current filehandle, that operation will fail with the 21697 status NFS4ERR_NOFILEHANDLE. 21699 If the name has a length of 0 (zero), or if name does not obey the 21700 UTF-8 definition (assuming UTF-8 capabilities are enabled, see 21701 Section 14.4), the error NFS4ERR_INVAL will be returned. 21703 See Section 2.6 for additional information on the use of SECINFO. 21705 18.29.4. IMPLEMENTATION 21707 The SECINFO operation is expected to be used by the NFS client when 21708 the error value of NFS4ERR_WRONGSEC is returned from another NFS 21709 operation. This signifies to the client that the server's security 21710 policy is different from what the client is currently using. At this 21711 point, the client is expected to obtain a list of possible security 21712 flavors and choose what best suits its policies. 21714 As mentioned, the server's security policies will determine when a 21715 client request receives NFS4ERR_WRONGSEC. See Table 14 for a list of 21716 operations that can return NFS4ERR_WRONGSEC. In addition, when 21717 READDIR returns attributes, the rdattr_error (Section 5.8.1.12) can 21718 contain NFS4ERR_WRONGSEC. Note that CREATE and REMOVE MUST NOT 21719 return NFS4ERR_WRONGSEC. The rationale for CREATE is that unless the 21720 target name exists it cannot have a separate security policy from the 21721 parent directory, and the security policy of the parent was checked 21722 when its filehandle was injected into the COMPOUND request's 21723 operations stream (for similar reasons, an OPEN operation that 21724 creates the target MUST NOT return NFS4ERR_WRONGSEC). If the target 21725 name exists, while it might have a separate security policy, that is 21726 irrelevant because CREATE MUST return NFS4ERR_EXIST.
The rationale 21727 for REMOVE is that while that target might have separate security 21728 policy, the target is going to be removed, and so the security policy 21729 of the parent trumps that of the object being removed. RENAME and 21730 LINK MAY return NFS4ERR_WRONGSEC, but the NFS4ERR_WRONGSEC error 21731 applies only to the saved filehandle (see Section 2.6.3.1.2). Any 21732 NFS4ERR_WRONGSEC error on the current filehandle used by LINK and 21733 RENAME MUST be returned by the PUTFH, PUTPUBFH, PUTROOTFH, or 21734 RESTOREFH operation that injected the current filehandle. 21736 With the exception of LINK and RENAME, the set of operations that can 21737 return NFS4ERR_WRONGSEC represent the point at which the client can 21738 inject a filehandle into the "current filehandle" at the server. The 21739 filehandle is either provided by the client (PUTFH, PUTPUBFH, 21740 PUTROOTFH), generated as a result of a name to filehandle translation 21741 (LOOKUP and OPEN), or generated from the saved filehandle via 21742 RESTOREFH. As Section 2.6.3.1.1.1 states, a put filehandle operation 21743 followed by SAVEFH MUST NOT return NFS4ERR_WRONGSEC. Thus the 21744 RESTOREFH operation, under certain conditions (see Section 2.6.3.1.1) 21745 is permitted to return NFS4ERR_WRONGSEC so that security policies can 21746 be honored. 21748 The READDIR operation will not directly return the NFS4ERR_WRONGSEC 21749 error. However, if the READDIR request included a request for 21750 attributes, it is possible that the READDIR request's security triple 21751 did not match that of a directory entry. If this is the case and the 21752 client has requested the rdattr_error attribute, the server will 21753 return the NFS4ERR_WRONGSEC error in rdattr_error for the entry. 21755 To resolve an error return of NFS4ERR_WRONGSEC, the client does the 21756 following: 21758 o For LOOKUP and OPEN, the client will use SECINFO with the same 21759 current filehandle and name as provided in the original LOOKUP or 21760 OPEN to enumerate the available security triples. 21762 o For the rdattr_error, the client will use SECINFO with the same 21763 current filehandle as provided in the original READDIR. The name 21764 passed to SECINFO will be that of the directory entry (as returned 21765 from READDIR) that had the NFS4ERR_WRONGSEC error in the 21766 rdattr_error attribute. 21768 o For PUTFH, PUTROOTFH, PUTPUBFH, RESTOREFH, LINK, and RENAME, the 21769 client will use SECINFO_NO_NAME { style = 21770 SECINFO_STYLE4_CURRENT_FH }. The client will prefix the 21771 SECINFO_NO_NAME operation with the appropriate PUTFH, PUTPUBFH, or 21772 PUTROOTFH operation that provides the filehandle originally 21773 provided by the PUTFH, PUTPUBFH, PUTROOTFH, or RESTOREFH 21774 operation. 21776 NOTE: In NFSv4.0, the client was required to use SECINFO, and had 21777 to reconstruct the parent of the original filehandle, and the 21778 component name of the original filehandle. The introduction in 21779 NFSv4.1 of SECINFO_NO_NAME obviates the need for reconstruction. 21781 o For LOOKUPP, the client will use SECINFO_NO_NAME { style = 21782 SECINFO_STYLE4_PARENT } and provide the filehandle which equals 21783 the filehandle originally provided to LOOKUPP. 21785 See Section 21 for a discussion on the recommendations for the 21786 security flavor used by SECINFO and SECINFO_NO_NAME. 21788 18.30. Operation 34: SETATTR - Set Attributes 21790 18.30.1. 
ARGUMENTS 21792 struct SETATTR4args { 21793 /* CURRENT_FH: target object */ 21794 stateid4 stateid; 21795 fattr4 obj_attributes; 21796 }; 21798 18.30.2. RESULTS 21800 struct SETATTR4res { 21801 nfsstat4 status; 21802 bitmap4 attrsset; 21803 }; 21805 18.30.3. DESCRIPTION 21807 The SETATTR operation changes one or more of the attributes of a file 21808 system object. The new attributes are specified with a bitmap and 21809 the attributes that follow the bitmap in bit order. 21811 The stateid argument for SETATTR is used to provide file locking 21812 context that is necessary for SETATTR requests that set the size 21813 attribute. Since setting the size attribute modifies the file's 21814 data, it has the same locking requirements as a corresponding WRITE. 21815 Any SETATTR that sets the size attribute is incompatible with a share 21816 reservation that specifies DENY_WRITE. The area between the old end- 21817 of-file and the new end-of-file is considered to be modified just as 21818 would have been the case had the area in question been specified as 21819 the target of WRITE, for the purpose of checking conflicts with byte- 21820 range locks, for those cases in which a server is implementing 21821 mandatory byte-range locking behavior. A valid stateid SHOULD always 21822 be specified. When the file size attribute is not set, the special 21823 stateid consisting of all bits zero MAY be passed. 21825 On either success or failure of the operation, the server will return 21826 the attrsset bitmask to represent what (if any) attributes were 21827 successfully set. The attrsset in the response is a subset of the 21828 attrmask field of the obj_attributes field in the argument. 21830 On success, the current filehandle retains its value. 21832 18.30.4. IMPLEMENTATION 21834 If the request specifies the owner attribute to be set, the server 21835 SHOULD allow the operation to succeed if the current owner of the 21836 object matches the value specified in the request. Some servers may 21837 be implemented in such a way as to prohibit the setting of the owner 21838 attribute unless the requester has privilege to do so. If the server 21839 is lenient in this one case of matching owner values, the client 21840 implementation may be simplified in cases of creation of an object 21841 (e.g. an exclusive create via OPEN) followed by a SETATTR. 21843 The file size attribute is used to request changes to the size of a 21844 file. A value of zero causes the file to be truncated, a value less 21845 than the current size of the file causes data from the new size to the 21846 end of the file to be discarded, and a size greater than the current 21847 size of the file causes logically zeroed data bytes to be added to 21848 the end of the file. Servers are free to implement this using 21849 unallocated bytes (holes) or allocated data bytes set to zero. 21850 Clients should not make any assumptions regarding a server's 21851 implementation of this feature, beyond that the bytes in the affected 21852 region returned by READ will be zeroed. Servers MUST support 21853 extending the file size via SETATTR. 21855 SETATTR is not guaranteed to be atomic. A failed SETATTR may 21856 partially change a file's attributes, which is why the reply 21857 always includes the status and the list of attributes that were set.
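
As a purely illustrative, non-normative sketch of the size-setting text above, a client truncating a file it has open might send the following COMPOUND (the SEQUENCE operation that begins the COMPOUND is omitted for brevity, and "open_stateid" is simply a label for the stateid returned by the client's earlier OPEN):

   PUTFH (file filehandle)
   SETATTR stateid=open_stateid, attrs={ size=0 }
   GETATTR attrbits (size, time_modify, change)

On success, the attrsset bitmask in the SETATTR reply would contain only the size bit, since no other attribute was requested to be set.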
21859 If the object whose attributes are being changed has a file 21860 delegation which is held by a client other than the one doing the 21861 SETATTR, the delegation(s) must be recalled, and the operation cannot 21862 proceed to actually change an attribute until each such delegation is 21863 returned or revoked. In all cases in which delegations are recalled, 21864 the server is likely to return one or more NFS4ERR_DELAY errors while 21865 the delegation(s) remains outstanding, although it may not do so if 21866 the returns happen quickly. 21868 If the object whose attributes are being set is a directory and 21869 another client holds a directory delegation for that directory, then 21870 if enabled, asynchronous notifications will be generated when the set 21871 of attributes changed has a non-null intersection with the set of 21872 attributes for which notification is requested. Notifications of 21873 type NOTIFY4_CHANGE_DIR_ATTRS will be sent to the appropriate 21874 client(s), but the SETATTR is not delayed by waiting for these 21875 notifications to be sent. 21877 If the object whose attributes are being set is a member of a directory 21878 for which another client holds a directory delegation, then 21879 asynchronous notifications will be generated when the set of 21880 attributes changed has a non-null intersection with the set of 21881 attributes for which notification is requested. Notifications of 21882 type NOTIFY4_CHANGE_CHILD_ATTRS will be sent to the appropriate 21883 clients, but the SETATTR is not delayed by waiting for these 21884 notifications to be sent. 21886 Changing the size of a file with SETATTR indirectly changes the 21887 time_modify and change attributes. A client must account for this as 21888 size changes can result in data deletion. 21890 The attributes time_access_set and time_modify_set are write-only 21891 attributes constructed as a switched union so the client can direct 21892 the server in setting the time values. If the switched union 21893 specifies SET_TO_CLIENT_TIME4, the client has provided an nfstime4 to 21894 be used for the operation. If the switched union does not specify 21895 SET_TO_CLIENT_TIME4, the server is to use its current time for the 21896 SETATTR operation. 21898 If server and client times differ, programs that compare client time 21899 to file times can break. A time synchronization protocol should be 21900 used to limit client/server time skew. 21902 Use of a COMPOUND containing a VERIFY operation specifying only the 21903 change attribute, immediately followed by a SETATTR, provides a means 21904 whereby a client may specify a request that emulates the 21905 functionality of the SETATTR guard mechanism of NFSv3. Since the 21906 function of the guard mechanism is to avoid changes to the file 21907 attributes based on stale information, delays between checking of the 21908 guard condition and the setting of the attributes have the potential 21909 to compromise this function, as would the corresponding delay in the 21910 NFSv4 emulation. Therefore, NFSv4.1 servers SHOULD take care to 21911 avoid such delays, to the degree possible, when executing such a 21912 request. 21914 If the server does not support an attribute as requested by the 21915 client, the server SHOULD return NFS4ERR_ATTRNOTSUPP. 21917 A mask of the attributes actually set is returned by SETATTR in all 21918 cases. That mask MUST NOT include attribute bits not requested to 21919 be set by the client.
If the attribute masks in the request and 21920 reply are equal, the status field in the reply MUST be NFS4_OK. 21922 18.31. Operation 37: VERIFY - Verify Same Attributes 21924 18.31.1. ARGUMENTS 21926 struct VERIFY4args { 21927 /* CURRENT_FH: object */ 21928 fattr4 obj_attributes; 21929 }; 21931 18.31.2. RESULTS 21933 struct VERIFY4res { 21934 nfsstat4 status; 21935 }; 21937 18.31.3. DESCRIPTION 21939 The VERIFY operation is used to verify that attributes have the value 21940 assumed by the client before proceeding with following operations in 21941 the COMPOUND request. If any of the attributes do not match then the 21942 error NFS4ERR_NOT_SAME must be returned. The current filehandle 21943 retains its value after successful completion of the operation. 21945 18.31.4. IMPLEMENTATION 21947 One possible use of the VERIFY operation is the following series of 21948 operations. With this the client is attempting to verify that the 21949 file being removed will match what the client expects to be removed. 21950 This series can help prevent the unintended deletion of a file. 21952 PUTFH (directory filehandle) 21953 LOOKUP (file name) 21954 VERIFY (filehandle == fh) 21955 PUTFH (directory filehandle) 21956 REMOVE (file name) 21958 This series does not prevent a second client from removing and 21959 creating a new file in the middle of this sequence but it does help 21960 avoid the unintended result. 21962 In the case that a RECOMMENDED attribute is specified in the VERIFY 21963 operation and the server does not support that attribute for the file 21964 system object, the error NFS4ERR_ATTRNOTSUPP is returned to the 21965 client. 21967 When the attribute rdattr_error or any set-only attribute (e.g. 21968 time_modify_set) is specified, the error NFS4ERR_INVAL is returned to 21969 the client. 21971 18.32. Operation 38: WRITE - Write to File 21973 18.32.1. ARGUMENTS 21975 enum stable_how4 { 21976 UNSTABLE4 = 0, 21977 DATA_SYNC4 = 1, 21978 FILE_SYNC4 = 2 21979 }; 21981 struct WRITE4args { 21982 /* CURRENT_FH: file */ 21983 stateid4 stateid; 21984 offset4 offset; 21985 stable_how4 stable; 21986 opaque data<>; 21987 }; 21989 18.32.2. RESULTS 21991 struct WRITE4resok { 21992 count4 count; 21993 stable_how4 committed; 21994 verifier4 writeverf; 21995 }; 21997 union WRITE4res switch (nfsstat4 status) { 21998 case NFS4_OK: 21999 WRITE4resok resok4; 22000 default: 22001 void; 22002 }; 22004 18.32.3. DESCRIPTION 22006 The WRITE operation is used to write data to a regular file. The 22007 target file is specified by the current filehandle. The offset 22008 specifies the offset where the data should be written. An offset of 22009 0 (zero) specifies that the write should start at the beginning of 22010 the file. The count, as encoded as part of the opaque data 22011 parameter, represents the number of bytes of data that are to be 22012 written. If the count is 0 (zero), the WRITE will succeed and return 22013 a count of 0 (zero) subject to permissions checking. The server MAY 22014 write fewer bytes than requested by the client. 22016 The client specifies with the stable parameter the method of how the 22017 data is to be processed by the server. If stable is FILE_SYNC4, the 22018 server MUST commit the data written plus all file system metadata to 22019 stable storage before returning results. This corresponds to the 22020 NFSv2 protocol semantics. Any other behavior constitutes a protocol 22021 violation. 
If stable is DATA_SYNC4, then the server MUST commit all 22022 of the data to stable storage and enough of the metadata to retrieve 22023 the data before returning. The server implementor is free to 22024 implement DATA_SYNC4 in the same fashion as FILE_SYNC4, but with a 22025 possible performance drop. If stable is UNSTABLE4, the server is 22026 free to commit any part of the data and the metadata to stable 22027 storage, including all or none, before returning a reply to the 22028 client. There is no guarantee whether or when any uncommitted data 22029 will subsequently be committed to stable storage. The only 22030 guarantees made by the server are that it will not destroy any data 22031 without changing the value of writeverf and that it will not commit 22032 the data and metadata at a level less than that requested by the 22033 client. 22035 Except when special stateids are used, the stateid value for a WRITE 22036 request represents a value returned from a previous byte-range LOCK 22037 or OPEN request or the stateid associated with a delegation. The 22038 stateid identifies the associated owners if any and is used by the 22039 server to verify that the associated locks are still valid (e.g. have 22040 not been revoked). 22042 Upon successful completion, the following results are returned. The 22043 count result is the number of bytes of data written to the file. The 22044 server may write fewer bytes than requested. If so, the actual 22045 number of bytes written, starting at the requested offset, is returned. 22047 The server also returns an indication of the level of commitment of 22048 the data and metadata via committed. Per Table 20, 22050 o The server MAY commit the data at a stronger level than requested. 22052 o The server MUST commit the data at a level at least as high as 22053 that committed. 22055 Table 20 lists the valid combinations of the field stable in the request 22056 and the field committed in the reply. 22058 +------------+-----------------------------------+ 22059 | stable | committed | 22060 +------------+-----------------------------------+ 22061 | UNSTABLE4 | FILE_SYNC4, DATA_SYNC4, UNSTABLE4 | 22062 | DATA_SYNC4 | FILE_SYNC4, DATA_SYNC4 | 22063 | FILE_SYNC4 | FILE_SYNC4 | 22064 +------------+-----------------------------------+ 22066 Table 20 22068 The final portion of the result is the field writeverf. This field 22069 is the write verifier and is a cookie that the client can use to 22070 determine whether a server has changed instance state (e.g. server 22071 restart) between a call to WRITE and a subsequent call to either 22072 WRITE or COMMIT. This cookie MUST be unchanged during a single 22073 instance of the NFSv4.1 server and MUST be unique between instances 22074 of the NFSv4.1 server. If the cookie changes, then the client MUST 22075 assume that any data written with an UNSTABLE4 value for committed 22076 and an old writeverf in the reply has been lost and will need to be 22077 recovered. 22079 If a client writes data to the server with the stable argument set to 22080 UNSTABLE4 and the reply yields a committed response of DATA_SYNC4 or 22081 UNSTABLE4, the client will follow up some time in the future with a 22082 COMMIT operation to synchronize outstanding asynchronous data and 22083 metadata with the server's stable storage, barring client error. It 22084 is possible that, due to a client crash or other error, a subsequent 22085 COMMIT will not be received by the server.
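
The following non-normative sketch illustrates the interaction described above. The SEQUENCE operation that begins each COMPOUND is omitted for brevity, the byte counts are arbitrary, and "V1" is merely a label for whatever verifier the server happens to return:

   PUTFH (file filehandle)
   WRITE stateid, offset=0, stable=UNSTABLE4, data
     --> count=8192, committed=UNSTABLE4, writeverf=V1

   (later, after further UNSTABLE4 WRITEs)

   PUTFH (file filehandle)
   COMMIT offset=0, count=0
     --> writeverf=V1

Because the verifier returned by COMMIT matches the verifier returned by the earlier WRITEs, the client may discard its cached copy of the data; had the verifier differed, the client would need to retransmit the uncommitted data.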
22087 For a WRITE with a stateid value of all bits 0, the server MAY allow 22088 the WRITE to be serviced subject to mandatory file locks or the 22089 current share deny modes for the file. For a WRITE with a stateid 22090 value of all bits 1, the server MUST NOT allow the WRITE operation to 22091 bypass locking checks at the server and otherwise is treated as if a 22092 stateid of all bits 0 were used. 22094 On success, the current filehandle retains its value. 22096 18.32.4. IMPLEMENTATION 22098 It is possible for the server to write fewer bytes of data than 22099 requested by the client. In this case, the server SHOULD NOT return 22100 an error unless no data was written at all. If the server writes 22101 less than the number of bytes specified, the client will need to send 22102 another WRITE to write the remaining data. 22104 It is assumed that the act of writing data to a file will cause the 22105 time_modified and change attributes of the file to be updated. 22106 However, these attributes SHOULD NOT be changed unless the contents 22107 of the file are changed. Thus, a WRITE request with count set to 0 22108 SHOULD NOT cause the time_modified and change attributes of the file 22109 to be updated. 22111 Stable storage is persistent storage that survives: 22113 1. Repeated power failures. 22115 2. Hardware failures (of any board, power supply, etc.). 22117 3. Repeated software crashes and restarts. 22119 This definition does not address failure of the stable storage module 22120 itself. 22122 The verifier is defined to allow a client to detect different 22123 instances of an NFSv4.1 protocol server over which cached, 22124 uncommitted data may be lost. In the most likely case, the verifier 22125 allows the client to detect server restarts. This information is 22126 required so that the client can safely determine whether the server 22127 could have lost cached data. If the server fails unexpectedly and 22128 the client has uncommitted data from previous WRITE requests (done 22129 with the stable argument set to UNSTABLE4 and in which the result 22130 committed was returned as UNSTABLE4 as well) the server might not 22131 have flushed cached data to stable storage. The burden of recovery 22132 is on the client and the client will need to retransmit the data to 22133 the server. 22135 A suggested verifier would be to use the time that the server was 22136 last started (if restarting the server results in lost buffers). 22138 The reply's committed field allows the client to do more effective 22139 caching. If the server is committing all WRITE requests to stable 22140 storage, then it SHOULD return with committed set to FILE_SYNC4, 22141 regardless of the value of the stable field in the arguments. A 22142 server that uses an NVRAM accelerator may choose to implement this 22143 policy. The client can use this to increase the effectiveness of the 22144 cache by discarding cached data that has already been committed on 22145 the server. 22147 Some implementations may return NFS4ERR_NOSPC instead of 22148 NFS4ERR_DQUOT when a user's quota is exceeded. 22150 In the case that the current filehandle is of type NF4DIR, the server 22151 will return NFS4ERR_ISDIR. If the current file is a symbolic link, 22152 the error NFS4ERR_SYMLINK will be returned. Otherwise, if the 22153 current filehandle does not designate an ordinary file, the server 22154 will return NFS4ERR_WRONG_TYPE. 
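
As an illustration of the special-stateid text earlier in this section (a sketch only, with the SEQUENCE operation omitted), a client that holds no open or byte-range lock state for the file might send:

   PUTFH (file filehandle)
   WRITE stateid=<all bits zero>, offset=0, stable=FILE_SYNC4, data

Whether such a WRITE succeeds remains subject to the server's share reservation and locking checks, as described above; a WRITE sent with the all-ones special stateid is treated the same way and gains no ability to bypass those checks.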
22156 If mandatory file locking is in effect for the file, and the 22157 corresponding byte-range of the data to be written to the file is 22158 read or write locked by an owner that is not associated with the 22159 stateid, the server MUST return NFS4ERR_LOCKED. If so, the client 22160 MUST check if the owner corresponding to the stateid used with the 22161 WRITE operation has a conflicting read lock that overlaps with the 22162 region that was to be written. If the stateid's owner has no 22163 conflicting read lock, then the client SHOULD try to get the 22164 appropriate write byte-range lock via the LOCK operation before re- 22165 attempting the WRITE. When the WRITE completes, the client SHOULD 22166 release the byte-range lock via LOCKU. 22168 If the stateid's owner had a conflicting read lock, then the client 22169 has no choice but to return an error to the application that 22170 attempted the WRITE. The reason is that since the stateid's owner 22171 had a read lock, the server either attempted to temporarily 22172 effectively upgrade this read lock to a write lock, or the server has 22173 no upgrade capability. If the server attempted to upgrade the read 22174 lock and failed, it is pointless for the client to re-attempt the 22175 upgrade via the LOCK operation, because there might be another client 22176 also trying to upgrade. If two clients are blocked trying to upgrade 22177 the same lock, the clients deadlock. If the server has no upgrade 22178 capability, then it is pointless to try a LOCK operation to upgrade. 22180 If one or more other clients have delegations for the file being 22181 written, those delegations MUST be recalled, and the operation cannot 22182 proceed until those delegations are returned or revoked. Except 22183 where this happens very quickly, one or more NFS4ERR_DELAY errors 22184 will be returned to requests made while the delegation remains 22185 outstanding. Normally, delegations will not be recalled as a result 22186 of a WRITE operation since the recall will occur as a result of an 22187 earlier OPEN. However, since it is possible for a WRITE to be done 22188 with a special stateid, the server needs to check for this case even 22189 though the client should have done an OPEN previously. 22191 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control 22193 Control aspects of the backchannel 22195 18.33.1. ARGUMENT 22197 typedef opaque gsshandle4_t<>; 22199 struct gss_cb_handles4 { 22200 rpc_gss_svc_t gcbp_service; /* RFC 2203 */ 22201 gsshandle4_t gcbp_handle_from_server; 22202 gsshandle4_t gcbp_handle_from_client; 22203 }; 22205 union callback_sec_parms4 switch (uint32_t cb_secflavor) { 22206 case AUTH_NONE: 22207 void; 22208 case AUTH_SYS: 22209 authsys_parms cbsp_sys_cred; /* RFC 1831 */ 22210 case RPCSEC_GSS: 22211 gss_cb_handles4 cbsp_gss_handles; 22212 }; 22214 struct BACKCHANNEL_CTL4args { 22215 uint32_t bca_cb_program; 22216 callback_sec_parms4 bca_sec_parms<>; 22217 }; 22219 18.33.2. RESULT 22221 struct BACKCHANNEL_CTL4res { 22222 nfsstat4 bcr_status; 22223 }; 22225 18.33.3. DESCRIPTION 22227 The BACKCHANNEL_CTL operation replaces the backchannel's callback 22228 program number and adds (not replaces) RPCSEC_GSS contexts for use by 22229 the backchannel. 22231 The arguments of the BACKCHANNEL_CTL call are a subset of the 22232 CREATE_SESSION parameters.
In the arguments of BACKCHANNEL_CTL, the 22233 bca_cb_program field and bca_sec_parms fields correspond respectively 22234 to the csa_cb_program and csa_sec_parms fields of the arguments of 22235 CREATE_SESSION (Section 18.36). 22237 BACKCHANNEL_CTL MUST appear in a COMPOUND that starts with SEQUENCE. 22239 If the RPCSEC_GSS handle identified by gcbp_handle_from_server does 22240 not exist on the server, the server MUST return NFS4ERR_NOENT. 22242 18.34. Operation 41: BIND_CONN_TO_SESSION 22244 18.34.1. ARGUMENT 22246 enum channel_dir_from_client4 { 22247 CDFC4_FORE = 0x1, 22248 CDFC4_BACK = 0x2, 22249 CDFC4_FORE_OR_BOTH = 0x3, 22250 CDFC4_BACK_OR_BOTH = 0x7 22251 }; 22253 struct BIND_CONN_TO_SESSION4args { 22254 sessionid4 bctsa_sessid; 22256 channel_dir_from_client4 22257 bctsa_dir; 22259 bool bctsa_use_conn_in_rdma_mode; 22260 }; 22262 18.34.2. RESULT 22264 enum channel_dir_from_server4 { 22265 CDFS4_FORE = 0x1, 22266 CDFS4_BACK = 0x2, 22267 CDFS4_BOTH = 0x3 22268 }; 22270 struct BIND_CONN_TO_SESSION4resok { 22271 sessionid4 bctsr_sessid; 22273 channel_dir_from_server4 22274 bctsr_dir; 22276 bool bctsr_use_conn_in_rdma_mode; 22277 }; 22279 union BIND_CONN_TO_SESSION4res 22280 switch (nfsstat4 bctsr_status) { 22282 case NFS4_OK: 22283 BIND_CONN_TO_SESSION4resok 22284 bctsr_resok4; 22286 default: void; 22287 }; 22289 18.34.3. DESCRIPTION 22291 BIND_CONN_TO_SESSION is used to associate additional connections with 22292 a session. It MUST be used on the connection being associated with 22293 the session. It MUST be the only operation in the COMPOUND 22294 procedure. If SP4_NONE (Section 18.35) state protection is used, any 22295 principal, security flavor, or RPCSEC_GSS context MAY be used to 22296 invoke the operation. If SP4_MACH_CRED is used, RPCSEC_GSS MUST be 22297 used with the integrity or privacy services, using the principal that 22298 created the client ID. If SP4_SSV is used, RPCSEC_GSS with the SSV 22299 GSS mechanism (Section 2.10.8) and integrity or privacy MUST be used. 22301 If, when the client ID was created, the client opted for SP4_NONE 22302 state protection, the client is not required to use 22303 BIND_CONN_TO_SESSION to associate the connection with the session, 22304 unless the client wishes to associate the connection with the 22305 backchannel. When SP4_NONE protection is used, simply sending a 22306 COMPOUND request with a SEQUENCE operation is sufficient to associate 22307 the connection with the session specified in SEQUENCE. 22309 The field bctsa_dir indicates whether the client wants to associate 22310 the connection with the fore channel or the backchannel or both 22311 channels. The value CDFC4_FORE_OR_BOTH indicates the client wants to 22312 associate the connection with both the fore channel and backchannel, 22313 but will accept the connection being associated with just the fore 22314 channel. The value CDFC4_BACK_OR_BOTH indicates the client wants to 22315 associate with both the fore and backchannel, but will accept the 22316 connection being associated with just the backchannel. The server 22317 indicates in bctsr_dir which channel(s) the connection is associated 22318 with. If the client specified CDFC4_FORE, the server MUST return 22319 CDFS4_FORE. If the client specified CDFC4_BACK, the server MUST 22320 return CDFS4_BACK. If the client specified CDFC4_FORE_OR_BOTH, the 22321 server MUST return CDFS4_FORE or CDFS4_BOTH. If the client specified 22322 CDFC4_BACK_OR_BOTH, the server MUST return CDFS4_BACK or CDFS4_BOTH.
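
As a non-normative example of the negotiation just described, a client adding a connection that it intends to use for the backchannel might send, as the only operation in the COMPOUND on that new connection (the session identifier S is a placeholder):

   BIND_CONN_TO_SESSION bctsa_sessid=S,
                        bctsa_dir=CDFC4_BACK_OR_BOTH,
                        bctsa_use_conn_in_rdma_mode=FALSE
     --> bctsr_sessid=S,
         bctsr_dir=CDFS4_BACK (or CDFS4_BOTH),
         bctsr_use_conn_in_rdma_mode=FALSE

Either reply satisfies the request, since CDFC4_BACK_OR_BOTH indicates that an association with just the backchannel is acceptable to the client.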
22324 See the CREATE_SESSION operation (Section 18.36), and the description 22325 of the argument csa_use_conn_in_rdma_mode to understand 22326 bctsa_use_conn_in_rdma_mode, and the description of 22327 csr_use_conn_in_rdma_mode to understand bctsr_use_conn_in_rdma_mode. 22329 Invoking BIND_CONN_TO_SESSION on a connection already associated with 22330 the specified session has no effect, and the server MUST respond with 22331 NFS4_OK, unless the client is demanding changes to the set of 22332 channels the connection is associated with. If so, the server MUST 22333 return NFS4ERR_INVAL. 22335 18.34.4. IMPLEMENTATION 22337 If a session's channel loses all connections, depending on the client 22338 ID's state protection and type of channel, the client might need to 22339 use BIND_CONN_TO_SESSION to associate a new connection. If the 22340 server restarted and does not keep the reply cache in stable storage, 22341 the server will not recognize the session ID. The client will 22342 ultimately have to invoke EXCHANGE_ID to create a new client ID and 22343 session. 22345 Suppose SP4_SSV state protection is being used, and 22346 BIND_CONN_TO_SESSION is among the operations included in the 22347 spo_must_enforce set when the client ID was created (Section 18.35). 22348 If so, there is an issue if SET_SSV is sent, no response is returned, 22349 and the last connection associated with the client ID drops. The 22350 client, per the sessions model, MUST retry the SET_SSV. But it needs 22351 a new connection to do so, and MUST associate that connection with 22352 the session via a BIND_CONN_TO_SESSION authenticated with the SSV GSS 22353 mechanism. The problem is that the RPCSEC_GSS message integrity 22354 codes use a subkey derived from the SSV as the key and the SSV may 22355 have changed. While there are multiple recovery strategies, a 22356 single, general strategy is described here. 22358 o The client reconnects. 22360 o The client assumes the SET_SSV was executed, and so sends 22361 BIND_CONN_TO_SESSION with the subkey (derived from the new SSV, 22362 i.e., what SET_SSV would have set the SSV to) used as the key for 22363 the RPCSEC_GSS credential message integrity codes. 22365 o If the request succeeds, this means the original attempted SET_SSV 22366 did execute successfully. The client re-sends the original 22367 SET_SSV, which the server will reply to via the reply cache. 22369 o If the server returns an RPC authentication error, this means the 22370 server's current SSV was not changed, (and the SET_SSV was likely 22371 not executed). The client then tries BIND_CONN_TO_SESSION with 22372 the subkey derived from the old SSV as the key for the RPCSEC_GSS 22373 message integrity codes. 22375 o The attempted BIND_CONN_TO_SESSION with the old SSV should 22376 succeed. If so the client re-sends the original SET_SSV. If the 22377 original SET_SSV was not executed, then the server executes it. 22378 If the original SET_SSV was executed, but failed, the server will 22379 return the SET_SSV from the reply cache. 22381 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID 22383 Exchange long hand client and server identifiers (owners), and create 22384 a client ID 22386 18.35.1. 
ARGUMENT 22387 const EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001; 22388 const EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002; 22390 const EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100; 22392 const EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000; 22393 const EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000; 22394 const EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000; 22396 const EXCHGID4_FLAG_MASK_PNFS = 0x00070000; 22398 const EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000; 22399 const EXCHGID4_FLAG_CONFIRMED_R = 0x80000000; 22401 struct state_protect_ops4 { 22402 bitmap4 spo_must_enforce; 22403 bitmap4 spo_must_allow; 22404 }; 22406 struct ssv_sp_parms4 { 22407 state_protect_ops4 ssp_ops; 22408 sec_oid4 ssp_hash_algs<>; 22409 sec_oid4 ssp_encr_algs<>; 22410 uint32_t ssp_window; 22411 uint32_t ssp_num_gss_handles; 22412 }; 22414 enum state_protect_how4 { 22415 SP4_NONE = 0, 22416 SP4_MACH_CRED = 1, 22417 SP4_SSV = 2 22418 }; 22420 union state_protect4_a switch(state_protect_how4 spa_how) { 22421 case SP4_NONE: 22422 void; 22423 case SP4_MACH_CRED: 22424 state_protect_ops4 spa_mach_ops; 22425 case SP4_SSV: 22426 ssv_sp_parms4 spa_ssv_parms; 22427 }; 22429 struct EXCHANGE_ID4args { 22430 client_owner4 eia_clientowner; 22431 uint32_t eia_flags; 22432 state_protect4_a eia_state_protect; 22433 nfs_impl_id4 eia_client_impl_id<1>; 22434 }; 22436 18.35.2. RESULT 22438 struct ssv_prot_info4 { 22439 state_protect_ops4 spi_ops; 22440 uint32_t spi_hash_alg; 22441 uint32_t spi_encr_alg; 22442 uint32_t spi_ssv_len; 22443 uint32_t spi_window; 22444 gsshandle4_t spi_handles<>; 22445 }; 22447 union state_protect4_r switch(state_protect_how4 spr_how) { 22448 case SP4_NONE: 22449 void; 22450 case SP4_MACH_CRED: 22451 state_protect_ops4 spr_mach_ops; 22452 case SP4_SSV: 22453 ssv_prot_info4 spr_ssv_info; 22454 }; 22456 struct EXCHANGE_ID4resok { 22457 clientid4 eir_clientid; 22458 sequenceid4 eir_sequenceid; 22459 uint32_t eir_flags; 22460 state_protect4_r eir_state_protect; 22461 server_owner4 eir_server_owner; 22462 opaque eir_server_scope; 22463 nfs_impl_id4 eir_server_impl_id<1>; 22464 }; 22466 union EXCHANGE_ID4res switch (nfsstat4 eir_status) { 22467 case NFS4_OK: 22468 EXCHANGE_ID4resok eir_resok4; 22470 default: 22471 void; 22472 }; 22474 18.35.3. DESCRIPTION 22476 The client uses the EXCHANGE_ID operation to register a particular 22477 client owner with the server. The client ID returned from this 22478 operation will be necessary for requests that create state on the 22479 server and will serve as a parent object to sessions created by the 22480 client. In order to confirm the client ID, it must first be used, 22481 along with the returned eir_sequenceid, as arguments to 22482 CREATE_SESSION. If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the 22483 result, eir_flags, then eir_sequenceid MUST be ignored, as it has no 22484 relevancy. 22486 EXCHANGE_ID MAY be sent in a COMPOUND procedure that starts with 22487 SEQUENCE. However, when a client communicates with a server for the 22488 first time, it will not have a session, so using SEQUENCE will not be 22489 possible. If EXCHANGE_ID is sent without a preceding SEQUENCE, then 22490 it MUST be the only operation in the COMPOUND procedure's request. 22491 If it is not, the server MUST return NFS4ERR_NOT_ONLY_OP. 22493 The eia_clientowner field is composed of a co_verifier field and a 22494 co_ownerid string. As noted in Section 2.4, the co_ownerid describes 22495 the client, and the co_verifier is the incarnation of the client.
An 22496 EXCHANGE_ID sent with a new incarnation of the client will lead to 22497 the server removing lock state of the old incarnation, whereas an 22498 EXCHANGE_ID sent with the current incarnation and co_ownerid will 22499 result in an error or an update of the client ID's properties, 22500 depending on the arguments to EXCHANGE_ID. 22502 A server MUST NOT use the same client ID for two different 22503 incarnations of an eir_clientowner. 22505 In addition to the client ID and sequence ID, the server returns a 22506 server owner (eir_server_owner) and server scope (eir_server_scope). 22507 The former field is used for network trunking as described in 22508 Section 2.10.4. The latter field is used to allow clients to 22509 determine when client IDs sent by one server may be recognized by 22510 another in the event of file system migration (see Section 11.7.7). 22512 The client ID returned by EXCHANGE_ID is only unique relative to the 22513 combination of eir_server_owner.so_major_id and eir_server_scope. 22514 Thus, if two servers return the same client ID, the onus is on the 22515 client to distinguish the client IDs on the basis of 22516 eir_server_owner.so_major_id and eir_server_scope. In the event two 22517 different servers claim matching server_owner.so_major_id and 22518 eir_server_scope, the client can use the verification techniques 22519 discussed in Section 2.10.4 to determine if the servers are distinct. 22520 If they are distinct, then the client will need to note the 22521 destination network addresses of the connections used with each 22522 server, and use the network address as the final discriminator. 22524 The server, as defined by the unique identity expressed in the 22525 so_major_id of the server owner and the server scope, needs to track 22526 several properties of each client ID it hands out. The properties 22527 apply to the client ID and all sessions associated with the client 22528 ID. The properties are derived from the arguments and results of 22529 EXCHANGE_ID. The client ID properties include: 22531 o The capabilities expressed by the following bits, which come from 22532 the results of EXCHANGE_ID: 22534 * EXCHGID4_FLAG_SUPP_MOVED_REFER 22536 * EXCHGID4_FLAG_SUPP_MOVED_MIGR 22538 * EXCHGID4_FLAG_BIND_PRINC_STATEID 22540 * EXCHGID4_FLAG_USE_NON_PNFS 22542 * EXCHGID4_FLAG_USE_PNFS_MDS 22544 * EXCHGID4_FLAG_USE_PNFS_DS 22546 These properties may be updated by subsequent EXCHANGE_ID requests 22547 on confirmed client IDs, though the server MAY refuse to change 22548 them. 22550 o The state protection method used, one of SP4_NONE, SP4_MACH_CRED, 22551 or SP4_SSV, as set by the spa_how field of the arguments to 22552 EXCHANGE_ID. Once the client ID is confirmed, this property 22553 cannot be updated by subsequent EXCHANGE_ID requests. 22555 o For SP4_MACH_CRED or SP4_SSV state protection: 22557 * The list of operations that MUST use the specified state 22558 protection: spo_must_enforce, which comes from the results of 22559 EXCHANGE_ID. 22561 * The list of operations that MAY use the specified state 22562 protection: spo_must_allow, which comes from the results of 22563 EXCHANGE_ID. 22565 Once the client ID is confirmed, these properties cannot be 22566 updated by subsequent EXCHANGE_ID requests. 22568 o For SP4_SSV protection: 22570 * The OID of the hash algorithm. This property is represented by 22571 one of the algorithms in the ssp_hash_algs field of the 22572 EXCHANGE_ID arguments.
Once the client ID is confirmed, this 22573 property cannot be updated by subsequent EXCHANGE_ID requests. 22575 * The OID of the encryption algorithm. This property is 22576 represented by one of the algorithms in the ssp_encr_algs field 22577 of the EXCHANGE_ID arguments. Once the client ID is confirmed, 22578 this property cannot be updated by subsequent EXCHANGE_ID 22579 requests. 22581 * The length of the SSV. This property is represented by the 22582 spi_ssv_len field in the EXCHANGE_ID results. Once the client 22583 ID is confirmed, this property cannot be updated by subsequent 22584 EXCHANGE_ID requests. The length of SSV MUST be equal to the 22585 length of the key used by the negotiated encryption algorithm. 22587 * Number of concurrent versions of the SSV the client and server 22588 will support (Section 2.10.8). This property is represented by 22589 spi_window, in the EXCHANGE_ID results. The property may be 22590 updated by subsequent EXCHANGE_ID requests. 22592 o The client's implementation ID as represented by the 22593 eia_client_impl_id field of the arguments. The property may be 22594 updated by subsequent EXCHANGE_ID requests. 22596 o The server's implementation ID as represented by the 22597 eir_server_impl_id field of the reply. The property may be 22598 updated by replies to subsequent EXCHANGE_ID requests. 22600 The eia_flags passed as part of the arguments and the eir_flags 22601 results allow the client and server to inform each other of their 22602 capabilities as well as indicate how the client ID will be used. 22603 Whether a bit is set or cleared on the arguments' flags does not 22604 force the server to set or clear the same bit on the results' side. 22605 Bits not defined above cannot be set in the eia_flags field. If they 22606 are, the server MUST reject the operation with NFS4ERR_INVAL. 22608 The EXCHGID4_FLAG_UPD_CONFIRMED_REC_A bit can only be set in 22609 eia_flags; it is always off in eir_flags. The 22610 EXCHGID4_FLAG_CONFIRMED_R bit can only be set in eir_flags; it is 22611 always off in eia_flags. If the server recognizes the co_ownerid and 22612 co_verifier as mapping to a confirmed client ID, it sets 22613 EXCHGID4_FLAG_CONFIRMED_R in eir_flags. The 22614 EXCHGID4_FLAG_CONFIRMED_R flag allows a client to tell if the client 22615 ID it is trying to create already exists and is confirmed. 22617 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, this means 22618 the client is attempting to update properties of an existing 22619 confirmed client ID (if the client wants to update properties of an 22620 unconfirmed client ID, it MUST NOT set 22621 EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). If so, it is RECOMMENDED the 22622 client send the update EXCHANGE_ID operation in the same COMPOUND as 22623 a SEQUENCE so that the EXCHANGE_ID is executed exactly once. Whether 22624 the client can update the properties of client ID depends on the 22625 state protection it selected when the client ID was created, and the 22626 principal and security flavor it uses when sending the EXCHANGE_ID 22627 request. The situations described in Sub-Paragraph 6, Sub- 22628 Paragraph 7, Sub-Paragraph 8, or Sub-Paragraph 9, of Paragraph 6 in 22629 Section 18.35.4 will apply. Note that if the operation succeeds and 22630 returns a client ID that is already confirmed, the server MUST set 22631 the EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. 
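
For illustration only, and assuming a client ID that was created with SP4_NONE state protection, a client updating the properties of that confirmed client ID might send (V, O, and C are placeholders for the verifier, owner string, and client ID already in use):

   SEQUENCE sessionid=S, ...
   EXCHANGE_ID eia_clientowner={ co_verifier=V, co_ownerid=O },
               eia_flags=EXCHGID4_FLAG_UPD_CONFIRMED_REC_A |
                         EXCHGID4_FLAG_SUPP_MOVED_REFER,
               eia_state_protect.spa_how=SP4_NONE,
               eia_client_impl_id=<>
     --> eir_clientid=C,
         eir_flags with EXCHGID4_FLAG_CONFIRMED_R set

As noted above, sending the update in the same COMPOUND as a SEQUENCE ensures that the EXCHANGE_ID is executed exactly once.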
22633 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this 22634 means the client is trying to establish a new client ID; it is 22635 attempting to trunk data communication to the server 22636 (Section 2.10.4); or it is attempting to update properties of an 22637 unconfirmed client ID. The situations described in Sub-Paragraph 1, 22638 Sub-Paragraph 2, Sub-Paragraph 3, Sub-Paragraph 4, or Sub-Paragraph 5 22639 of Paragraph 6 in Section 18.35.4 will apply. Note that if the 22640 operation succeeds and returns a client ID that was previously 22641 confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R bit in 22642 eir_flags. 22644 When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client 22645 indicates that it is capable of dealing with an NFS4ERR_MOVED error 22646 as part of a referral sequence. When this bit is not set, it is 22647 still legal for the server to perform a referral sequence. However, 22648 a server may use the fact that the client is incapable of correctly 22649 responding to a referral, by avoiding it for that particular client. 22650 It may, for instance, act as a proxy for that particular file system, 22651 at some cost in performance, although it is not obligated to do so. 22652 If the server will potentially perform a referral, it MUST set 22653 EXCHGID4_FLAG_SUPP_MOVED_REFER in eir_flags. 22655 When the EXCHGID4_FLAG_SUPP_MOVED_MIGR is set, the client indicates 22656 that it is capable of dealing with an NFS4ERR_MOVED error as part of 22657 a file system migration sequence. When this bit is not set, it is 22658 still legal for the server to indicate that a file system has moved, 22659 when this in fact happens. However, a server may use the fact that 22660 the client is incapable of correctly responding to a migration in its 22661 scheduling of file systems to migrate so as to avoid migration of 22662 file systems being actively used. It may also hide actual migrations 22663 from clients unable to deal with them by acting as a proxy for a 22664 migrated file system for particular clients, at some cost in 22665 performance, although it is not obligated to do so. If the server 22666 will potentially perform a migration, it MUST set 22667 EXCHGID4_FLAG_SUPP_MOVED_MIGR in eir_flags. 22669 When EXCHGID4_FLAG_BIND_PRINC_STATEID is set, the client indicates it 22670 wants the server to bind the stateid to the principal. This means 22671 that when a principal creates a stateid, it has to be the one to use 22672 the stateid. If the server will perform binding it will return 22673 EXCHGID4_FLAG_BIND_PRINC_STATEID. The server MAY return 22674 EXCHGID4_FLAG_BIND_PRINC_STATEID even if the client does not request 22675 it. If an update to the client ID changes the value of 22676 EXCHGID4_FLAG_BIND_PRINC_STATEID's client ID property, the effect 22677 applies only to new stateids. Existing stateids (and all stateids 22678 with the same "other" field) that were created with stateid to 22679 principal binding in force will continue to have binding in force. 22680 Existing stateids (and all stateids with same "other" field) that 22681 were created with stateid to principal not in force will continue to 22682 have binding not in force. 22684 The EXCHGID4_FLAG_USE_NON_PNFS, EXCHGID4_FLAG_USE_PNFS_MDS, and 22685 EXCHGID4_FLAG_USE_PNFS_DS bits are described in Section 13.1 and 22686 convey roles the client ID is to be used for in a pNFS environment. 
22687 The server MUST set one of the acceptable combinations of these bits 22688 (roles) in eir_flags, as specified in Section 13.1. Note that the 22689 same client owner/server owner pair can have multiple roles. 22690 Multiple roles can be associated with the same client ID or with 22691 different client IDs. Thus, if a client sends EXCHANGE_ID from the 22692 same client owner to the same server owner multiple times, but 22693 specifies different pNFS roles each time, the server might return 22694 different client IDs. Given that different pNFS roles might have 22695 different client IDs, the client may ask for different properties for 22696 each role/client ID. 22698 The spa_how field of the eia_state_protect field specifies how the 22699 client wants to protect its client, locking and session state from 22700 unauthorized changes (Section 2.10.7.3): 22702 o SP4_NONE. The client does not request the NFSv4.1 server to 22703 enforce state protection. The NFSv4.1 server MUST NOT enforce 22704 state protection for the returned client ID. 22706 o SP4_MACH_CRED. This choice is only valid if the client sent the 22707 request with RPCSEC_GSS as the security flavor, and with a service 22708 of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. The client wants 22709 to use an RPCSEC_GSS-based machine credential to protect its 22710 state. The server MUST note the principal the EXCHANGE_ID 22711 operation was sent with, and the GSS mechanism used. These notes 22712 collectively comprise the machine credential. 22714 After the client ID is confirmed, as long as the lease associated 22715 with the client ID is unexpired, a subsequent EXCHANGE_ID 22716 operation that uses the same eia_clientowner.co_owner as the first 22717 EXCHANGE_ID, MUST also use the same machine credential as the 22718 first EXCHANGE_ID. The server returns the same client ID for the 22719 subsequent EXCHANGE_ID as that returned from the first 22720 EXCHANGE_ID. 22722 o SP4_SSV. This choice is only valid if the client sent the request 22723 with RPCSEC_GSS as the security flavor, and with a service of 22724 RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. This choice 22725 indicates the client wants to use the SSV to protect state. The 22726 server records the credential used in the request as the machine 22727 credential (as defined above) for the eia_clientowner.co_owner. 22728 The CREATE_SESSION operation that confirms the client ID MUST use 22729 the same machine credential. 22731 When a client specifies SP4_MACH_CRED or SP4_SSV, it also provides 22732 two lists of operations (each expressed as a bit map). The first 22733 list is spo_must_enforce and consists of those operations the client 22734 MUST send (subject to the server confirming the list of operations in 22735 the result of EXCHANGE_ID) with the machine credential (if 22736 SP4_MACH_CRED protection is specified) or the SSV-based credential 22737 (if SP4_SSV protection is used). The client MUST send the operations 22738 with RPCSEC_GSS credentials that specify the RPC_GSS_SVC_INTEGRITY or 22739 RPC_GSS_SVC_PRIVACY security service. Typically the first list of 22740 operations includes EXCHANGE_ID, CREATE_SESSION, DELEGPURGE, 22741 DESTROY_SESSION, BIND_CONN_TO_SESSION, and DESTROY_CLIENTID. The 22742 client SHOULD NOT specify in this list any operations that require a 22743 filehandle because the server's access policies MAY conflict with the 22744 client's choice, and thus the client would then be unable to access a 22745 subset of the server's namespace. 
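
Purely as an illustrative sketch of the two lists, a client electing SP4_MACH_CRED protection might supply (the particular operation sets shown are examples, not requirements):

   spa_how                       = SP4_MACH_CRED
   spa_mach_ops.spo_must_enforce = { EXCHANGE_ID, CREATE_SESSION,
                                     DELEGPURGE, DESTROY_SESSION,
                                     BIND_CONN_TO_SESSION,
                                     DESTROY_CLIENTID }
   spa_mach_ops.spo_must_allow   = { CLOSE }

The spo_must_allow entry anticipates the situation discussed below, in which a user's RPCSEC_GSS credentials expire while the user still holds state that needs to be released.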
22747 Note that if SP4_SSV protection is specified, and the client 22748 indicates that CREATE_SESSION must be protected with SP4_SSV, because 22749 the SSV cannot exist without a confirmed client ID, the first 22750 CREATE_SESSION MUST instead be sent using the machine credential, and 22751 the server MUST accept the machine credential. 22753 There is a corresponding result, also called spo_must_enforce, of the 22754 operations the server will require SP4_MACH_CRED or SP4_SSV 22755 protection for. Normally the server's result equals the client's 22756 argument, but the result MAY be different. If the client requests 22757 one or more operations in the set { EXCHANGE_ID, CREATE_SESSION, 22758 DELEGPURGE, DESTROY_SESSION, BIND_CONN_TO_SESSION, DESTROY_CLIENTID 22759 }, then the result spo_must_enforce MUST include the operations the 22760 client requested from that set. 22762 If spo_must_enforce in the results has BIND_CONN_TO_SESSION set, then 22763 connection binding enforcement is enabled, and the client MUST use 22764 the machine (if SP4_MACH_CRED protection is used) or SSV (if SP4_SSV 22765 protection is used) credential on calls to BIND_CONN_TO_SESSION. 22767 The second list is spo_must_allow and consists of those operations 22768 the client wants to have the option of issuing with the machine 22769 credential or the SSV-based credential, even if the object the 22770 operations are performed on is not owned by the machine or SSV 22771 credential. 22773 The corresponding result, also called spo_must_allow, consists of the 22774 operations the server will allow the client to use SP4_SSV or 22775 SP4_MACH_CRED credentials with. Normally the server's result equals 22776 the client's argument, but the result MAY be different. 22778 The purpose of spo_must_allow is to allow clients to solve the 22779 following conundrum. Suppose the client ID is confirmed with 22780 EXCHGID4_FLAG_BIND_PRINC_STATEID, and it calls OPEN with the 22781 RPCSEC_GSS credentials of a normal user. Now suppose the user's 22782 credentials expire, and cannot be renewed (e.g. a Kerberos ticket 22783 granting ticket expires, and the user has logged off and will not be 22784 acquiring a new ticket granting ticket). The client will be unable 22785 to send CLOSE without the user's credentials, which is to say the 22786 client has to either leave the state on the server, or it has to re- 22787 send EXCHANGE_ID with a new verifier to clear all state. That is, 22788 unless the client includes CLOSE on the list of operations in 22789 spo_must_allow and the server agrees. 22791 The SP4_SSV protection parameters also have: 22793 ssp_hash_algs: 22795 This is the set of algorithms the client supports for the purpose 22796 of computing the digests needed for the internal SSV GSS mechanism 22797 and for the SET_SSV operation. Each algorithm is specified as an 22798 object identifier (OID). The REQUIRED algorithms for a server are 22799 id-sha1, id-sha224, id-sha256, id-sha384, and id-sha512 [18]. The 22800 algorithm the server selects among the set is indicated in 22801 spi_hash_alg, a field of spr_ssv_prot_info. The field 22802 spi_hash_alg is an index into the array ssp_hash_algs. If the 22803 server does not support any of the offered algorithms, it returns 22804 NFS4ERR_HASH_ALG_UNSUPP. If ssp_hash_algs is empty, the server 22805 MUST return NFS4ERR_INVAL. 22807 ssp_encr_algs: 22809 This is the set of algorithms the client supports for the purpose 22810 of providing privacy protection for the internal SSV GSS 22811 mechanism. 
Each algorithm is specified as an OID. The REQUIRED 22812 algorithm for a server is id-aes256-CBC. The RECOMMENDED 22813 algorithms are id-aes192-CBC and id-aes128-CBC [19]. The selected 22814 algorithm is returned in spi_encr_alg, an index into 22815 ssp_encr_algs. If the server does not support any of the offered 22816 algorithms, it returns NFS4ERR_ENCR_ALG_UNSUPP. If ssp_encr_algs 22817 is empty, the server MUST return NFS4ERR_INVAL. 22819 ssp_window: 22821 This is the number of SSV versions the client wants the server to 22822 maintain (i.e. each successful call to SET_SSV produces a new 22823 version of the SSV). If ssp_window is zero, the server MUST 22824 return NFS4ERR_INVAL. The server responds with spi_window, which 22825 MUST NOT exceed ssp_window, and MUST be at least one (1). Any 22826 requests on the backchannel or fore channel that are using a 22827 version of the SSV that is outside the window will fail with an 22828 ONC RPC authentication error, and the requester will have to retry 22829 them with the same slot ID and sequence ID. 22831 ssp_num_gss_handles: 22833 This is the number of RPCSEC_GSS handles the server should create 22834 that are based on the GSS SSV mechanism (Section 2.10.8). It is 22835 not the total number of RPCSEC_GSS handles for the client ID. 22836 Indeed, subsequent calls to EXCHANGE_ID will add RPCSEC_GSS 22837 handles. The server responds with a list of handles in 22838 spi_handles. If the client asks for at least one handle and the 22839 server cannot create it, the server MUST return an error. The 22840 handles in spi_handles are not available for use until the client 22841 ID is confirmed, which could be immediately if EXCHANGE_ID returns 22842 EXCHGID4_FLAG_CONFIRMED_R, or upon successful confirmation from 22843 CREATE_SESSION. While a client ID can span all the connections 22844 that are connected to a server sharing the same 22845 eir_server_owner.so_major_id, the RPCSEC_GSS handles returned in 22846 spi_handles can only be used on connections connected to a server 22847 that returns the same eir_server_owner.so_major_id and 22848 eir_server_owner.so_minor_id on each connection. It is 22849 permissible for the client to set ssp_num_gss_handles to zero (0); 22850 the client can create more handles with another EXCHANGE_ID call. 22852 The arguments include an array of up to one element in length called 22853 eia_client_impl_id. If eia_client_impl_id is present, it contains the 22854 information identifying the implementation of the client. Similarly, 22855 the results include an array of up to one element in length called 22856 eir_server_impl_id that identifies the implementation of the server. 22857 Servers MUST accept a zero length eia_client_impl_id array, and 22858 clients MUST accept a zero length eir_server_impl_id array. 22860 An example use for implementation identifiers would be diagnostic 22861 software that extracts this information in an attempt to identify 22862 interoperability problems, performance workload behaviors, or general 22863 usage statistics. Since the intent of having access to this 22864 information is for planning or general diagnosis only, the client and 22865 server MUST NOT interpret this implementation identity information in 22866 a way that affects interoperational behavior of the implementation. 22868 The reason is that if clients and servers did such a thing, they 22869 might use fewer capabilities of the protocol than the peer can 22870 support, or the client and server might refuse to interoperate.
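
As a non-normative illustration, a client might fill in the single element of eia_client_impl_id as follows, using the nii_domain, nii_name, and nii_date fields of the nfs_impl_id4 structure (the values shown are invented for this example):

   nii_domain = "client.example.net"
   nii_name   = "Example NFSv4.1 client, build 1234"
   nii_date   = <time at which the implementation was built>

As required above, the peer must treat such values as purely informational and not alter its protocol behavior based on them.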
22872 Because it is possible some implementations will violate the protocol 22873 specification and interpret the identity information, implementations 22874 MUST allow the users of the NFSv4 client and server to set the 22875 contents of the sent nfs_impl_id structure to any value. 22877 18.35.4. IMPLEMENTATION 22879 A server's client record is a 5-tuple: 22881 1. co_ownerid 22883 The client identifier string, from the eia_clientowner 22884 structure of the EXCHANGE_ID4args structure 22886 2. co_verifier: 22888 A client-specific value used to indicate incarnations (where a 22889 client restart represents a new incarnation), from the 22890 eia_clientowner structure of the EXCHANGE_ID4args structure 22892 3. principal: 22894 The principal that was defined in the RPC header's credential 22895 and/or verifier at the time the client record was established. 22897 4. client ID: 22899 The shorthand client identifier, generated by the server and 22900 returned via the eir_clientid field in the EXCHANGE_ID4resok 22901 structure 22903 5. confirmed: 22905 A private field on the server indicating whether or not a 22906 client record has been confirmed. A client record is 22907 confirmed if there has been a successful CREATE_SESSION 22908 operation to confirm it. Otherwise it is unconfirmed. An 22909 unconfirmed record is established by an EXCHANGE_ID call. Any 22910 unconfirmed record that is not confirmed within a lease period 22911 SHOULD be removed. 22913 The following identifiers represent special values for the fields in 22914 the records. 22916 ownerid_arg: 22918 The value of the eia_clientowner.co_ownerid subfield of the 22919 EXCHANGE_ID4args structure of the current request. 22921 verifier_arg: 22923 The value of the eia_clientowner.co_verifier subfield of the 22924 EXCHANGE_ID4args structure of the current request. 22926 old_verifier_arg: 22928 A value of the eia_clientowner.co_verifier field of a client 22929 record received in a previous request; this is distinct from 22930 verifier_arg. 22932 principal_arg: 22934 The value of the RPCSEC_GSS principal for the current request. 22936 old_principal_arg: 22938 A value of the principal of a client record as defined by the RPC 22939 header's credential or verifier of a previous request. This is 22940 distinct from principal_arg. 22942 clientid_ret: 22944 The value of the eir_clientid field the server will return in the 22945 EXCHANGE_ID4resok structure for the current request. 22947 old_clientid_ret: 22949 The value of the eir_clientid field the server returned in the 22950 EXCHANGE_ID4resok structure for a previous request. This is 22951 distinct from clientid_ret. 22953 confirmed: 22955 The client ID has been confirmed. 22957 unconfirmed: 22959 The client ID has not been confirmed. 22961 Since EXCHANGE_ID is a non-idempotent operation, we must consider the 22962 possibility that retries occur as a result of a client restart, 22963 network partition, malfunctioning router, etc. Retries are 22964 identified by the value of the eia_clientowner field of 22965 EXCHANGE_ID4args, and the method for dealing with them is outlined in 22966 the scenarios below. 22968 The scenarios are described in terms of the client record(s) a server 22969 has for a given co_ownerid. Note that if the client ID was created 22970 specifying SP4_SSV state protection and EXCHANGE_ID as one of the 22971 operations in spo_must_allow, then the server MUST authorize EXCHANGE_IDs 22972 with the SSV principal in addition to the principal that created the 22973 client ID. 22975 1.
New Owner ID 22977 If the server has no client records with 22978 eia_clientowner.co_ownerid matching ownerid_arg, and 22979 EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in the 22980 EXCHANGE_ID, then a new shorthand client ID (let us call it 22981 clientid_ret) is generated, and the following unconfirmed 22982 record is added to the server's state. 22984 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 22985 unconfirmed } 22987 Subsequently, the server returns clientid_ret. 22989 2. Non-Update on Existing Client ID 22991 If the server has the following confirmed record, and the 22992 request does not have EXCHGID4_FLAG_UPD_CONFIRMED_REC_A set, 22993 then the request is the result of a retried request due to a 22994 faulty router or lost connection, or the client is trying to 22995 determine if it can perform trunking. 22997 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 22998 confirmed } 23000 Since the record has been confirmed, the client must have 23001 received the server's reply from the initial EXCHANGE_ID 23002 request. Since the server has a confirmed record, and since 23003 EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, with the 23004 possible exception of eir_server_owner.so_minor_id, the server 23005 returns the same result it did when the client ID's properties 23006 were last updated (or if never updated, the result when the 23007 client ID was created). The confirmed record is unchanged. 23009 3. Client Collision 23011 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the 23012 server has the following confirmed record, then this request 23013 is likely the result of a chance collision between the values 23014 of the eia_clientowner.co_ownerid subfield of EXCHANGE_ID4args 23015 for two different clients. 23017 { ownerid_arg, *, old_principal_arg, old_clientid_ret, 23018 confirmed } 23020 If there is currently no state associated with 23021 old_clientid_ret, or if there is state but the lease has 23022 expired, then this case is effectively equivalent to the New 23023 Owner ID case of Paragraph 1. The confirmed record is 23024 deleted, the old_clientid_ret and its lock state are deleted, 23025 a new shorthand client ID is generated, and the following 23026 unconfirmed record is added to the server's state. 23028 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 23029 unconfirmed } 23031 Subsequently, the server returns clientid_ret. 23033 If old_clientid_ret has an unexpired lease with state, then no 23034 state of old_clientid_ret is changed or deleted. The server 23035 returns NFS4ERR_CLID_INUSE to indicate the client should retry 23036 with a different value for the eia_clientowner.co_ownerid 23037 subfield of EXCHANGE_ID4args. The client record is not 23038 changed. 23040 4. Replacement of Unconfirmed Record 23042 If the EXCHGID4_FLAG_UPD_CONFIRMED_REC_A flag is not set, and 23043 the server has the following unconfirmed record then the 23044 client is attempting EXCHANGE_ID again on an unconfirmed 23045 client ID, perhaps due to a retry, or perhaps due to a client 23046 restart before client ID confirmation (i.e. before 23047 CREATE_SESSION was called), or some other reason. 23049 { ownerid_arg, *, *, old_clientid_ret, unconfirmed } 23051 It is possible the properties of old_clientid_ret are 23052 different than those specified in the current EXCHANGE_ID. 
23053 Whether the properties are being updated or not, to eliminate 23054 ambiguity, the server deletes the unconfirmed record, 23055 generates a new client ID (clientid_ret) and establishes the 23056 following unconfirmed record: 23058 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 23059 unconfirmed } 23061 5. Client Restart 23063 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the 23064 server has the following confirmed client record, then this 23065 request is likely from a previously confirmed client which has 23066 restarted. 23068 { ownerid_arg, old_verifier_arg, principal_arg, 23069 old_clientid_ret, confirmed } 23071 Since the previous incarnation of the same client will no 23072 longer be making requests, once the new client ID is confirmed 23073 by CREATE_SESSION, lock and share reservations should be 23074 released immediately rather than forcing the new incarnation 23075 to wait for the lease time on the previous incarnation to 23076 expire. Furthermore, session state should be removed since if 23077 the client had maintained that information across restart, 23078 this request would not have been sent. If the server does not 23079 support the CLAIM_DELEGATE_PREV claim type, associated 23080 delegations should be purged as well; otherwise, delegations 23081 are retained and recovery proceeds according to 23082 Section 10.2.1. 23084 After processing, clientid_ret is returned to the client and 23085 this client record is added: 23087 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 23088 unconfirmed } 23090 The previously described confirmed record continues to exist, 23091 and thus the same ownerid_arg exists in both a confirmed and 23092 unconfirmed state at the same time. The number of states can 23093 collapse to one once the server receives an applicable 23094 CREATE_SESSION or EXCHANGE_ID. 23096 + If the server subsequently receives a successful 23097 CREATE_SESSION that confirms clientid_ret, then the server 23098 atomically destroys the confirmed record and makes the 23099 unconfirmed record confirmed as described in 23100 Section 18.36.4. 23102 + If the server instead subsequently receives an EXCHANGE_ID 23103 with the client owner equal to ownerid_arg, one strategy is 23104 to simply delete the unconfirmed record, and process the 23105 EXCHANGE_ID as described in the entirety of 23106 Section 18.35.4. 23108 6. Update 23110 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server 23111 has the following confirmed record, then this request is an 23112 attempt at an update. 23114 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 23115 confirmed } 23117 Since the record has been confirmed, the client must have 23118 received the server's reply from the initial EXCHANGE_ID 23119 request. The server allows the update, and the client record 23120 is left intact. 23122 7. Update but No Confirmed Record 23124 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server 23125 has no confirmed record corresponding to ownerid_arg, then the 23126 server returns NFS4ERR_NOENT and leaves any unconfirmed record 23127 intact. 23129 8. Update but Wrong Verifier 23131 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server 23132 has the following confirmed record, then this request is an 23133 illegal attempt at an update, perhaps because of a retry from 23134 a previous client incarnation. 23136 { ownerid_arg, old_verifier_arg, *, clientid_ret, confirmed } 23138 The server returns NFS4ERR_NOT_SAME and leaves the client 23139 record intact. 23141 9.
Update but Wrong Principal 23143 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server 23144 has the following confirmed record, then this request is an 23145 illegal attempt at an update by an unauthorized principal. 23147 { ownerid_arg, verifier_arg, old_principal_arg, clientid_ret, 23148 confirmed } 23149 The server returns NFS4ERR_PERM and leaves the client record 23150 intact. 23152 18.36. Operation 43: CREATE_SESSION - Create New Session and Confirm 23153 Client ID 23155 Start up session and confirm client ID. 23157 18.36.1. ARGUMENT 23159 struct channel_attrs4 { 23160 count4 ca_headerpadsize; 23161 count4 ca_maxrequestsize; 23162 count4 ca_maxresponsesize; 23163 count4 ca_maxresponsesize_cached; 23164 count4 ca_maxoperations; 23165 count4 ca_maxrequests; 23166 uint32_t ca_rdma_ird<1>; 23167 }; 23169 const CREATE_SESSION4_FLAG_PERSIST = 0x00000001; 23170 const CREATE_SESSION4_FLAG_CONN_BACK_CHAN = 0x00000002; 23171 const CREATE_SESSION4_FLAG_CONN_RDMA = 0x00000004; 23173 struct CREATE_SESSION4args { 23174 clientid4 csa_clientid; 23175 sequenceid4 csa_sequence; 23177 uint32_t csa_flags; 23179 channel_attrs4 csa_fore_chan_attrs; 23180 channel_attrs4 csa_back_chan_attrs; 23182 uint32_t csa_cb_program; 23183 callback_sec_parms4 csa_sec_parms<>; 23184 }; 23186 18.36.2. RESULT 23188 struct CREATE_SESSION4resok { 23189 sessionid4 csr_sessionid; 23190 sequenceid4 csr_sequence; 23192 uint32_t csr_flags; 23194 channel_attrs4 csr_fore_chan_attrs; 23195 channel_attrs4 csr_back_chan_attrs; 23196 }; 23198 union CREATE_SESSION4res switch (nfsstat4 csr_status) { 23199 case NFS4_OK: 23200 CREATE_SESSION4resok csr_resok4; 23201 default: 23202 void; 23203 }; 23205 18.36.3. DESCRIPTION 23207 This operation is used by the client to create new session objects on 23208 the server. 23210 CREATE_SESSION can be sent with or without a preceding SEQUENCE 23211 operation in the same COMPOUND procedure. If CREATE_SESSION is sent 23212 with a preceding SEQUENCE operation, any session created by 23213 CREATE_SESSION has no direct relation to the session specified in the 23214 SEQUENCE operation, although the two sessions might be associated 23215 with the same client ID. If CREATE_SESSION is sent without a 23216 preceding SEQUENCE, then it MUST be the only operation in the 23217 COMPOUND procedure's request. If it is not, the server MUST return 23218 NFS4ERR_NOT_ONLY_OP. 23220 In addition to creating a session, CREATE_SESSION has the following 23221 effects: 23223 o The first session created with a new client ID serves to confirm 23224 the creation of that client's state on the server. The server 23225 returns the parameter values for the new session. 23227 o The connection CREATE_SESSION is sent over is associated with the 23228 session's fore channel. 23230 The arguments and results of CREATE_SESSION are described as follows: 23232 csa_clientid: 23234 This is the client ID the new session will be associated with. 23235 The corresponding result is csr_sessionid, the session ID of the 23236 new session. 23238 csa_sequence: 23240 Each client ID serializes CREATE_SESSION via a per client ID 23241 sequence number (see Section 18.36.4). The corresponding result 23242 is csr_sequence, which MUST be equal to csa_sequence. 23244 In the next three arguments, the client offers a value that is to be 23245 a property of the session. It is RECOMMENDED that the server accept 23246 the value. If it is not acceptable, the server MAY use a different 23247 value.
Regardless, the server MUST return the value the session will 23248 use (which will be either what the client offered, or what the server 23249 is insisting on). These 23250 parameters have the following interpretation. 23252 csa_flags: 23254 The csa_flags field contains a list of the following flag bits: 23256 CREATE_SESSION4_FLAG_PERSIST: 23258 If CREATE_SESSION4_FLAG_PERSIST is set, the client wants the 23259 server to provide a persistent reply cache. For sessions in 23260 which only idempotent operations will be used (e.g. a read-only 23261 session), clients SHOULD NOT set CREATE_SESSION4_FLAG_PERSIST. 23262 If the server does not or cannot provide a persistent reply 23263 cache, the server MUST NOT set CREATE_SESSION4_FLAG_PERSIST in 23264 the field csr_flags. 23266 If the server is a pNFS metadata server, for reasons described 23267 in Section 12.5.2 it SHOULD support 23268 CREATE_SESSION4_FLAG_PERSIST if it supports the layout_hint 23269 (Section 5.12.4) attribute. 23271 CREATE_SESSION4_FLAG_CONN_BACK_CHAN: 23273 If CREATE_SESSION4_FLAG_CONN_BACK_CHAN is set in csa_flags, the 23274 client is requesting that the server use the connection 23275 CREATE_SESSION is called over for the backchannel as well as 23276 the fore channel. The server sets 23277 CREATE_SESSION4_FLAG_CONN_BACK_CHAN in the result field 23278 csr_flags if it agrees. If CREATE_SESSION4_FLAG_CONN_BACK_CHAN 23279 is not set in csa_flags, then 23280 CREATE_SESSION4_FLAG_CONN_BACK_CHAN MUST NOT be set in 23281 csr_flags. 23283 CREATE_SESSION4_FLAG_CONN_RDMA: 23285 If CREATE_SESSION4_FLAG_CONN_RDMA is set in csa_flags, and if 23286 the connection CREATE_SESSION is called over is currently in 23287 non-RDMA mode, but has the capability to operate in RDMA mode, 23288 then the client is requesting that the server agree to "step up" to RDMA 23289 mode on the connection. The server sets 23290 CREATE_SESSION4_FLAG_CONN_RDMA in the result field csr_flags if 23291 it agrees. If CREATE_SESSION4_FLAG_CONN_RDMA is not set in 23292 csa_flags, then CREATE_SESSION4_FLAG_CONN_RDMA MUST NOT be set 23293 in csr_flags. Note that once the server agrees to step up, it 23294 and the client MUST exchange all future traffic on the 23295 connection with RPC RDMA framing and not Record Marking ([8]). 23297 csa_fore_chan_attrs, csa_back_chan_attrs: 23299 The csa_fore_chan_attrs and csa_back_chan_attrs fields apply to 23300 attributes of the fore channel (which conveys requests originating 23301 from the client to the server), and the backchannel (the channel 23302 that conveys callback requests originating from the server to the 23303 client), respectively. The results are in corresponding 23304 structures called csr_fore_chan_attrs and csr_back_chan_attrs. 23305 The results establish attributes for each channel, and on all 23306 subsequent use of each channel of the session. Each structure has 23307 the following fields: 23309 ca_headerpadsize: 23311 The maximum amount of padding the requester is willing to apply 23312 to ensure that write payloads are aligned on some boundary at 23313 the replier. The replier should reply in ca_headerpadsize with 23314 its preferred value, or zero if padding is not in use. The 23315 replier may decrease this value but MUST NOT increase it. 23317 ca_maxrequestsize: 23319 The maximum size of a COMPOUND or CB_COMPOUND request that will 23320 be sent.
This size represents the XDR encoded size of the 23321 request, including the RPC headers (including security flavor 23322 credentials and verifiers) but excludes any RPC transport 23323 framing headers. Imagine a request coming over a non-RDMA 23324 TCP/IP connection, and that it has a single Record Marking 23325 header preceding it. The maximum allowable count encoded in 23326 the header will be ca_maxrequestsize. If a requester sends a 23327 request that exceeds ca_maxrequestsize, the error 23328 NFS4ERR_REQ_TOO_BIG will be returned per the description in 23329 Section 2.10.5.4. 23331 ca_maxresponsesize: 23333 The maximum size of a COMPOUND or CB_COMPOUND reply that the 23334 requester will accept from the replier including RPC headers 23335 (see the ca_maxrequestsize definition). The NFSv4.1 server 23336 MUST NOT increase the value of this parameter in the 23337 CREATE_SESSION results. However, if the client selects a value 23338 for ca_maxresponsesize such that a replier on a channel could 23339 never send a response, the server SHOULD return 23340 NFS4ERR_TOOSMALL in the CREATE_SESSION reply. If a requester 23341 sends a request for which the size of the reply would exceed 23342 this value, the replier will return NFS4ERR_REP_TOO_BIG, per 23343 the description in Section 2.10.5.4. 23345 ca_maxresponsesize_cached: 23347 Like ca_maxresponsesize, but the maximum size of a reply that 23348 will be stored in the reply cache (Section 2.10.5.1). If the 23349 reply to CREATE_SESSION has ca_maxresponsesize_cached less than 23350 ca_maxresponsesize, then this is an indication to the requester 23351 on the channel that it needs to be selective about which 23352 replies it directs the replier to cache; for example large 23353 replies from nonidempotent operations (e.g. COMPOUND requests 23354 with a READ operation), should not be cached. The requester 23355 decides which replies to cache via an argument to the SEQUENCE 23356 (the sa_cachethis field, see Section 18.46) or CB_SEQUENCE (the 23357 csa_cachethis field, see Section 20.9) operations. If a 23358 requester sends a request for which the size of the reply would 23359 exceed this value, the replier will return 23360 NFS4ERR_REP_TOO_BIG_TO_CACHE, per the description in 23361 Section 2.10.5.4. 23363 ca_maxoperations: 23365 The maximum number of operations the replier will accept in a 23366 COMPOUND or CB_COMPOUND. The server MUST NOT increase 23367 ca_maxoperations in the reply to CREATE_SESSION. If the 23368 requester sends a COMPOUND or CB_COMPOUND with more operations 23369 than ca_maxoperations, the replier MUST return 23370 NFS4ERR_TOO_MANY_OPS. 23372 ca_maxrequests: 23374 The maximum number of concurrent COMPOUND or CB_COMPOUND 23375 requests the requester will send on the session. Subsequent 23376 requests will each be assigned a slot identifier by the 23377 requester within the range 0 to ca_maxrequests - 1 inclusive. 23379 ca_rdma_ird: 23381 This array has a maximum of one element. If this array has one 23382 element, then the element contains the inbound RDMA read queue 23383 depth (IRD). 23385 csa_cb_program 23387 This is the ONC RPC program number the server must use in any 23388 callbacks sent through the backchannel to the client. The server 23389 MUST specify an ONC RPC program number equal to csa_cb_program and 23390 an ONC RPC version number equal to 4 in callbacks sent to the 23391 client. If a CB_COMPOUND is sent to the client, the server MUST 23392 use a minor version number of 1. There is no corresponding 23393 result. 
23395 csa_sec_parms 23397 The field csa_sec_parms is an array of acceptable security 23398 credentials the server can use on the session's backchannel. 23399 Three security flavors are supported: AUTH_NONE, AUTH_SYS, and 23400 RPCSEC_GSS. If AUTH_NONE is specified for a credential, then this 23401 says the client is authorizing the server to use AUTH_NONE on all 23402 callbacks for the session. If AUTH_SYS is specified, then the 23403 client is authorizing the server to use AUTH_SYS on all callbacks, 23404 using the credential specified in cbsp_sys_cred. If RPCSEC_GSS is 23405 specified, then the server is allowed to use the RPCSEC_GSS 23406 context specified in cbsp_gss_parms as the RPCSEC_GSS context in 23407 the credential of the RPC header of callbacks to the client. 23408 There is no corresponding result. 23410 The RPCSEC_GSS context for the backchannel is specified via a pair 23411 of values of data type gsshandle4_t. The data type gsshandle4_t 23412 represents an RPCSEC_GSS handle, and is precisely the same as the 23413 data type of the "handle" field of the rpc_gss_init_res data type 23414 defined in Section 5.2.3.1, "Context Creation Response - 23415 Successful Acceptance" of [4]. 23417 The first RPCSEC_GSS handle, gcbp_handle_from_server, is the fore 23418 handle the server returned to the client (either in the handle 23419 field of data type rpc_gss_init_res or one of the elements of the 23420 spi_handles field returned in the reply to EXCHANGE_ID) when the 23421 RPCSEC_GSS context was created on the server. The second handle, 23422 gcbp_handle_from_client, is the back handle the client will map 23423 the RPCSEC_GSS context to. The server can immediately use the 23424 value of gcbp_handle_from_client in the RPCSEC_GSS credential in 23425 callback RPCs. I.e., the value in gcbp_handle_from_client can be 23426 used as the value of the field "handle" in data type 23427 rpc_gss_cred_t (see Section 5, "Elements of the RPCSEC_GSS 23428 Security Protocol" of [4]) in callback RPCs. The server MUST use 23429 the RPCSEC_GSS security service specified in gcbp_service, i.e. it 23430 MUST set the "service" field of the rpc_gss_cred_t data type in 23431 the RPCSEC_GSS credential to the value of gcbp_service (see Section 23432 5.3.1, "RPC Request Header", of [4]). 23434 If the RPCSEC_GSS handle identified by gcbp_handle_from_server 23435 does not exist on the server, the server will return 23436 NFS4ERR_NOENT. 23438 Note that while the GSS context state is shared between the fore 23439 and back RPCSEC_GSS contexts, the fore and back RPCSEC_GSS context 23440 state are independent of each other with respect to the RPCSEC_GSS 23441 sequence number (see the seq_num field in the rpc_gss_cred_t data 23442 type of Section 5 and of Section 5.3.1, "RPC Request Header", of 23443 [4]). 23445 Once the session is created, the first SEQUENCE or CB_SEQUENCE 23446 received on a slot MUST have a sequence ID equal to 1; if not, the 23447 server MUST return NFS4ERR_SEQ_MISORDERED. 23449 18.36.4. IMPLEMENTATION 23451 To describe a possible implementation, the same notation for client 23452 records introduced in the description of EXCHANGE_ID is used with the 23453 following addition: 23455 clientid_arg: The value of the csa_clientid field of the 23456 CREATE_SESSION4args structure of the current request. 23458 Since CREATE_SESSION is a non-idempotent operation, we must consider 23459 the possibility that retries may occur as a result of a client 23460 restart, network partition, malfunctioning router, etc.
For each 23461 client ID created by EXCHANGE_ID, the server maintains a separate 23462 reply cache (called the CREATE_SESSION reply cache) similar to the 23463 session reply cache used for SEQUENCE operations, with two 23464 distinctions. 23466 o First, this is a reply cache just for detecting and processing 23467 CREATE_SESSION requests for a given client ID. 23469 o Second, the size of the client ID reply cache is one slot (and 23470 as a result, the CREATE_SESSION request does not carry a slot 23471 number). This means that at most one CREATE_SESSION request for a 23472 given client ID can be outstanding. 23474 As previously stated, CREATE_SESSION can be sent with or without a 23475 preceding SEQUENCE operation. Even if SEQUENCE precedes 23476 CREATE_SESSION, the server MUST maintain the CREATE_SESSION reply 23477 cache, which is separate from the reply cache for the session 23478 associated with SEQUENCE. If CREATE_SESSION was originally sent by 23479 itself, the client MAY send a retry of the CREATE_SESSION operation 23480 within a COMPOUND preceded by SEQUENCE. If CREATE_SESSION was 23481 originally sent in a COMPOUND that started with SEQUENCE, then the 23482 client SHOULD send a retry in a COMPOUND that starts with SEQUENCE 23483 that has the same session ID as the SEQUENCE of the original request. 23484 However, the client MAY send a retry in a COMPOUND that either has no 23485 preceding SEQUENCE, or has a preceding SEQUENCE that refers to a 23486 different session than the original CREATE_SESSION. This might be 23487 necessary if the client sends a CREATE_SESSION in a COMPOUND preceded 23488 by a SEQUENCE with session ID X, and session X no longer exists. 23489 Regardless, any retry of CREATE_SESSION, with or without a preceding 23490 SEQUENCE, MUST use the same value of csa_sequence as the original. 23492 When a client sends a successful EXCHANGE_ID and it is returned an 23493 unconfirmed client ID, the client is also returned eir_sequenceid, 23494 and the client is expected to set the value of csa_sequence in the 23495 client-ID-confirming CREATE_SESSION it sends with that client ID to 23496 the value of eir_sequenceid. When EXCHANGE_ID returns a new, 23497 unconfirmed client ID, the server initializes the client ID slot to 23498 be equal to eir_sequenceid - 1 (accounting for underflow), and 23499 records a contrived CREATE_SESSION result with a "cached" result of 23500 NFS4ERR_SEQ_MISORDERED. With the slot thus initialized, the 23501 processing of the CREATE_SESSION operation is divided into four 23502 phases: 23504 1. Client record lookup. The server looks up the client ID in its 23505 client record table. If the server contains no records with 23506 client ID equal to clientid_arg, then most likely the client's 23507 state has been purged during a period of inactivity, possibly due 23508 to a loss of connectivity. NFS4ERR_STALE_CLIENTID is returned, 23509 and no changes are made to any client records on the server. 23510 Otherwise, the server goes to phase 2. 23512 2. Sequence ID processing. If csa_sequence is equal to the 23513 sequence ID in the client ID's slot, then this is a replay of the 23514 previous CREATE_SESSION request, and the server returns the 23515 cached result. If csa_sequence is not equal to the sequence ID 23516 in the slot, and is more than one greater (accounting for 23517 wraparound), then the server returns the error 23518 NFS4ERR_SEQ_MISORDERED, and does not change the slot.
If 23519 csa_sequence is equal to the slot's sequence ID + 1 (accounting 23520 for wraparound), then the slot's sequence ID is set to 23521 csa_sequence, and the CREATE_SESSION processing goes to the 23522 next phase. A subsequent new CREATE_SESSION call MUST use a 23523 csa_sequence that is one greater than the last successfully used. 23525 3. Client ID confirmation. If this would be the first session for 23526 the client ID, the CREATE_SESSION operation serves to confirm the 23527 client ID. Otherwise, the client ID confirmation phase is skipped 23528 and only the session creation phase occurs. Any case in which 23529 there is more than one record with identical values for client ID 23530 represents a server implementation error. Operation in the 23531 potentially valid cases is summarized as follows. 23533 * Successful Confirmation 23535 If the server has the following unconfirmed record, then 23536 this is the expected confirmation of an unconfirmed record. 23538 { ownerid, verifier, principal_arg, clientid_arg, 23539 unconfirmed } 23541 As noted in Section 18.35.4, the server might also have the 23542 following confirmed record. 23544 { ownerid, old_verifier, principal_arg, old_clientid, 23545 confirmed } 23547 The server schedules the replacement of both records with: 23549 { ownerid, verifier, principal_arg, clientid_arg, confirmed 23550 } 23552 The processing of CREATE_SESSION continues on to session 23553 creation. Once the session is successfully created, the 23554 scheduled client record replacement is committed. If the 23555 session is not successfully created, then no changes are 23556 made to any client records on the server. 23558 * Unsuccessful Confirmation 23560 If the server has the following record, then the client has 23561 changed principals after the previous EXCHANGE_ID request, 23562 or there has been a chance collision between shorthand 23563 client identifiers. 23565 { *, *, old_principal_arg, clientid_arg, * } 23567 Neither of these cases is permissible. Processing stops 23568 and NFS4ERR_CLID_INUSE is returned to the client. No 23569 changes are made to any client records on the server. 23571 4. Session creation. The server confirmed the client ID, either in 23572 this CREATE_SESSION operation, or a previous CREATE_SESSION 23573 operation. The server examines the remaining fields of the 23574 arguments. 23576 5. The server creates the session by recording the parameter values 23577 used (including whether the CREATE_SESSION4_FLAG_PERSIST flag is 23578 set and has been accepted by the server) and allocating space for 23579 the session reply cache (if there is not enough space, the server 23580 returns NFS4ERR_NOSPC). For each slot in the reply cache, the 23581 server sets the sequence ID to zero (0), and records an entry 23582 containing a COMPOUND reply with zero operations and the error 23583 NFS4ERR_SEQ_MISORDERED. This way, if the first SEQUENCE request 23584 sent has a sequence ID equal to zero, the server can simply 23585 return what is in the reply cache: NFS4ERR_SEQ_MISORDERED. The 23586 client initializes its reply cache for receiving callbacks in the 23587 same way, and similarly, the first CB_SEQUENCE operation on a 23588 slot after session creation must have a sequence ID of one. 23590 6. If the session state is created successfully, the server 23591 associates the session with the client ID provided by the client. 23593 7.
When a request that had CREATE_SESSION4_FLAG_CONN_RDMA set needs 23594 to be retried, the retry MUST be done on a new connection that is 23595 in non-RDMA mode. If properties of the new connection are 23596 different enough that the arguments to CREATE_SESSION must 23597 change, then a non-retry MUST be sent. The server will 23598 eventually dispose of any session that was created on the 23599 original connection. 23601 On the backchannel, the client and server might wish to have many 23602 slots, in some cases perhaps more than the fore channel, in order to deal 23603 with the situations where the network link has high latency and is 23604 the primary bottleneck for response to recalls. If so, and if the 23605 client provides too few slots to the backchannel, the server might 23606 limit the number of recallable objects it gives to the client. 23608 Implementing RPCSEC_GSS callback support requires that the client and 23609 server change their RPCSEC_GSS implementations. One possible set of 23610 changes includes: 23612 o Adding a data structure that wraps the GSS-API context with a 23613 reference count. 23615 o New functions to increment and decrement the reference count. If 23616 the reference count is decremented to zero, the wrapper data 23617 structure and the GSS-API context it refers to would be freed. 23619 o Change RPCSEC_GSS to create the wrapper data structure upon 23620 receiving a GSS-API context from gss_accept_sec_context() and 23621 gss_init_sec_context(). The reference count would be initialized 23622 to 1. 23624 o Adding a function to map an existing RPCSEC_GSS handle to a 23625 pointer to the wrapper data structure. The reference count would 23626 be incremented. 23628 o Adding a function to create a new RPCSEC_GSS handle from a pointer 23629 to the wrapper data structure. The reference count would be 23630 incremented. 23632 o Replacing calls from RPCSEC_GSS that free GSS-API contexts, with 23633 calls to decrement the reference count on the wrapper data 23634 structure. 23636 18.37. Operation 44: DESTROY_SESSION - Destroy existing session 23638 Destroy existing session. 23640 18.37.1. ARGUMENT 23642 struct DESTROY_SESSION4args { 23643 sessionid4 dsa_sessionid; 23644 }; 23646 18.37.2. RESULT 23648 struct DESTROY_SESSION4res { 23649 nfsstat4 dsr_status; 23650 }; 23652 18.37.3. DESCRIPTION 23654 The DESTROY_SESSION operation closes the session and discards the 23655 session's reply cache, if any. Any remaining connections associated 23656 with the session are immediately disassociated. If the connection 23657 has no remaining associated sessions, the connection MAY be closed by 23658 the server. Locks, delegations, layouts, wants, and the lease, which 23659 are all tied to the client ID, are not affected by DESTROY_SESSION. 23661 DESTROY_SESSION MUST be invoked on a connection that is associated 23662 with the session being destroyed. In addition, if SP4_MACH_CRED state 23663 protection was specified when the client ID was created, the 23664 RPCSEC_GSS principal that created the session MUST be the one that 23665 destroys the session, using RPCSEC_GSS privacy or integrity. If 23666 SP4_SSV state protection was specified when the client ID was 23667 created, RPCSEC_GSS using the SSV mechanism (Section 2.10.8) MUST be 23668 used, with integrity or privacy. 23670 If the COMPOUND request starts with SEQUENCE, and if the sessionids 23671 specified in SEQUENCE and DESTROY_SESSION are the same, then 23673 o DESTROY_SESSION MUST be the final operation in the COMPOUND 23674 request.
23676 o It is advisable to not place DESTROY_SESSION in a COMPOUND request 23677 with other state-modifying operations, because the DESTROY_SESSION 23678 will destroy the reply cache. 23680 DESTROY_SESSION MAY be the only operation in a COMPOUND request. 23682 Because the session is destroyed, a client that retries the request 23683 may receive an error in reply to the retry, even though the original 23684 request was successful. 23686 If there is a backchannel on the session and the server has 23687 outstanding CB_COMPOUND operations for the session which have not 23688 been replied to, then the server MAY refuse to destroy the session 23689 and return an error. In the event the backchannel is down, the 23690 server SHOULD return NFS4ERR_CB_PATH_DOWN to inform the client that 23691 the backchannel needs to be repaired before the server will allow the 23692 session to be destroyed. Otherwise, the error NFS4ERR_BACK_CHAN_BUSY 23693 SHOULD be returned to indicate that there are CB_COMPOUNDs that need 23694 to be replied to. The client SHOULD reply to all outstanding 23695 CB_COMPOUNDs before re-sending DESTROY_SESSION. 23697 18.38. Operation 45: FREE_STATEID - Free stateid with no locks 23699 Free a single stateid. 23701 18.38.1. ARGUMENT 23703 struct FREE_STATEID4args { 23704 stateid4 fsa_stateid; 23705 }; 23707 18.38.2. RESULT 23709 struct FREE_STATEID4res { 23710 nfsstat4 fsr_status; 23711 }; 23713 18.38.3. DESCRIPTION 23715 The FREE_STATEID operation is used to free a stateid which no longer 23716 has any associated locks (including opens, byte-range locks, 23717 delegations, and layouts). This may be because of client unlock 23718 operations or because of server revocation. If there are valid locks 23719 (of any kind) associated with the stateid in question, the error 23720 NFS4ERR_LOCKS_HELD will be returned, and the associated stateid will 23721 not be freed. 23723 When a stateid is freed which had been associated with revoked locks, 23724 the client, by doing the FREE_STATEID, acknowledges the loss of those 23725 locks. This allows the server, once all such revoked state is 23726 acknowledged, to allow that client again to reclaim locks, without 23727 encountering the edge conditions discussed in Section 8.4.2. 23729 Once a successful FREE_STATEID is done for a given stateid, any 23730 subsequent use of that stateid will result in an NFS4ERR_BAD_STATEID 23731 error. 23733 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory delegation 23735 Obtain a directory delegation. 23737 18.39.1. ARGUMENT 23739 typedef nfstime4 attr_notice4; 23741 struct GET_DIR_DELEGATION4args { 23742 /* CURRENT_FH: delegated directory */ 23743 bool gdda_signal_deleg_avail; 23744 bitmap4 gdda_notification_types; 23745 attr_notice4 gdda_child_attr_delay; 23746 attr_notice4 gdda_dir_attr_delay; 23747 bitmap4 gdda_child_attributes; 23748 bitmap4 gdda_dir_attributes; 23749 }; 23751 18.39.2.
RESULT 23753 struct GET_DIR_DELEGATION4resok { 23754 verifier4 gddr_cookieverf; 23755 /* Stateid for get_dir_delegation */ 23756 stateid4 gddr_stateid; 23757 /* Which notifications can the server support */ 23758 bitmap4 gddr_notification; 23759 bitmap4 gddr_child_attributes; 23760 bitmap4 gddr_dir_attributes; 23761 }; 23763 enum gddrnf4_status { 23764 GDD4_OK = 0, 23765 GDD4_UNAVAIL = 1 23766 }; 23768 union GET_DIR_DELEGATION4res_non_fatal 23769 switch (gddrnf4_status gddrnf_status) { 23770 case GDD4_OK: 23771 GET_DIR_DELEGATION4resok gddrnf_resok4; 23772 case GDD4_UNAVAIL: 23773 bool gddrnf_will_signal_deleg_avail; 23774 }; 23776 union GET_DIR_DELEGATION4res 23777 switch (nfsstat4 gddr_status) { 23778 case NFS4_OK: 23779 GET_DIR_DELEGATION4res_non_fatal gddr_res_non_fatal4; 23780 default: 23781 void; 23782 }; 23784 18.39.3. DESCRIPTION 23786 The GET_DIR_DELEGATION operation is used by a client to request a 23787 directory delegation. The directory is represented by the current 23788 filehandle. The client also specifies whether it wants the server to 23789 notify it when the directory changes in certain ways by setting one 23790 or more bits in a bitmap. The server may choose not to grant the 23791 delegation. In that case the server will return 23792 NFS4ERR_DIRDELEG_UNAVAIL. If the server decides to hand out the 23793 delegation, it will return a cookie verifier for that directory. If 23794 the cookie verifier changes when the client is holding the 23795 delegation, the delegation will be recalled unless the client has 23796 asked for notification for this event. 23798 The server will also return a directory delegation stateid, 23799 gddr_stateid, as a result of the GET_DIR_DELEGATION operation. This 23800 stateid will appear in callback messages related to the delegation, 23801 such as notifications and delegation recalls. The client will use 23802 this stateid to return the delegation voluntarily or upon recall. A 23803 delegation is returned by calling the DELEGRETURN operation. 23805 The server might not be able to support notifications of certain 23806 events. If the client asks for such notifications, the server MUST 23807 inform the client of its inability to do so as part of the 23808 GET_DIR_DELEGATION reply by not setting the appropriate bits in the 23809 supported notifications bitmask, gddr_notification, contained in the 23810 reply. The server MUST NOT add bits to gddr_notification that the 23811 client did not request. 23813 The GET_DIR_DELEGATION operation can be used for both normal and 23814 named attribute directories. 23816 If the client sets gdda_signal_deleg_avail to TRUE, then it is 23817 registering with the server a "want" for a directory delegation. If 23818 the delegation is not available, and the server supports and will 23819 honor the "want", the results will have 23820 gddrnf_will_signal_deleg_avail set to TRUE and no error will be 23821 indicated on return. If so, the client should expect a future 23822 CB_RECALLABLE_OBJ_AVAIL operation to indicate that a directory 23823 delegation is available. If the server does not wish to honor the 23824 "want" or is not able to do so, it returns the error 23825 NFS4ERR_DIRDELEG_UNAVAIL. If the delegation is immediately 23826 available, the server SHOULD return it with the response to the 23827 operation, rather than via a callback. 23829 18.39.4. IMPLEMENTATION 23831 Directory delegations provide the benefit of improving cache 23832 consistency of namespace information. This is done through 23833 synchronous callbacks.
A server must support synchronous callbacks 23834 in order to support directory delegations. In addition to that, 23835 asynchronous notifications provide a way to reduce network traffic as 23836 well as improve client performance in certain conditions. 23838 Notifications are specified in terms of potential changes to the 23839 directory. A client can ask to be notified of events by setting one 23840 or more bits in gdda_notification_types. The client can ask for 23841 notifications on addition of entries to a directory (by setting the 23842 NOTIFY4_ADD_ENTRY in gdda_notification_types), notifications on entry 23843 removal (NOTIFY4_REMOVE_ENTRY), renames (NOTIFY4_RENAME_ENTRY), 23844 directory attribute changes (NOTIFY4_CHANGE_DIR_ATTRIBUTES), and 23845 cookie verifier changes (NOTIFY4_CHANGE_COOKIE_VERIFIER) by setting 23846 one or more corresponding bits in the gdda_notification_types field. 23848 The client can also ask for notifications of changes to attributes of 23849 directory entries (NOTIFY4_CHANGE_CHILD_ATTRIBUTES) in order to keep 23850 its attribute cache up to date. However any changes made to child 23851 attributes do not cause the delegation to be recalled. If a client 23852 is interested in directory entry caching, or negative name caching, 23853 it can set the gdda_notification_types appropriately to its 23854 particular need and the server will notify it of all changes that 23855 would otherwise invalidate its name cache. The kind of notification 23856 a client asks for may depend on the directory size, its rate of 23857 change and the applications being used to access that directory. The 23858 enumeration of the conditions under which a client might ask for a 23859 notification is out of the scope of this specification. 23861 For attribute notifications, the client will set bits in the 23862 gdda_dir_attributes bitmap to indicate which attributes it wants to 23863 be notified of. If the server does not support notifications for 23864 changes to a certain attribute, it SHOULD NOT set that attribute in 23865 the supported attribute bitmap specified in the reply 23866 (gddr_dir_attributes). The client will also set in the 23867 gdda_child_attributes bitmap the attributes of directory entries it 23868 wants to be notified of, and the server will indicate in 23869 gddr_child_attributes which attributes of directory entries it will 23870 notify the client of. 23872 The client will also let the server know if it wants to get the 23873 notification as soon as the attribute change occurs or after a 23874 certain delay by setting a delay factor; gdda_child_attr_delay is for 23875 attribute changes to directory entries and gdda_dir_attr_delay is for 23876 attribute changes to the directory. If this delay factor is set to 23877 zero, that indicates to the server that the client wants to be 23878 notified of any attribute changes as soon as they occur. If the 23879 delay factor is set to N seconds, the server will make a best effort 23880 guarantee that attribute updates are synchronized within N seconds. 23881 If the client asks for a delay factor that the server does not 23882 support or that may cause significant resource consumption on the 23883 server by causing the server to send a lot of notifications, the 23884 server should not commit to sending out notifications for attributes 23885 and therefore must not set the appropriate bit in the 23886 gddr_child_attributes and gddr_dir_attributes bitmaps in the 23887 response. 
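As a non-normative illustration of the notification requests discussed above, the following C sketch composes a bitmap4 value such as gdda_notification_types. It assumes the standard bitmap4 composition rule (bit number N is represented in 32-bit word N/32 at bit position N%32, as described in Section 3.3.7); the helper name and array sizes are hypothetical, and the concrete NOTIFY4_* bit numbers would come from the protocol's XDR description rather than being hard-coded here.

   /*
    * Non-normative sketch: building a bitmap4 (e.g. for
    * gdda_notification_types) from a list of bit numbers.  The bit
    * numbers passed in would be the NOTIFY4_* values (for example
    * NOTIFY4_ADD_ENTRY and NOTIFY4_REMOVE_ENTRY) defined in the
    * protocol's XDR description; none are assumed here.
    */
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   static void build_bitmap4(uint32_t *words, size_t nwords,
                             const unsigned int *bits, size_t nbits)
   {
       size_t i;

       memset(words, 0, nwords * sizeof(uint32_t));
       for (i = 0; i < nbits; i++) {
           unsigned int n = bits[i];
           if (n / 32 < nwords)            /* ignore bits the array cannot hold */
               words[n / 32] |= (uint32_t)1 << (n % 32);
       }
   }

The delay factors travel alongside the bitmap: a client that wants near-real-time child attribute notifications would set gdda_child_attr_delay to zero, while one that can tolerate batching would set it to a few seconds, as described above.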
23889 The client MUST use a security tuple (Section 2.6.1) that the 23890 directory or its applicable ancestor (Section 2.6) is exported with. 23891 If not, the server MUST return NFS4ERR_WRONGSEC to the operation that 23892 both precedes GET_DIR_DELEGATION and sets the current filehandle (see 23893 Section 2.6.3.1). 23895 The directory delegation covers all the entries in the directory 23896 except the parent entry. That means if a directory and its parent 23897 both hold directory delegations, any changes to the parent will not 23898 cause a notification to be sent for the child even though the child's 23899 parent entry points to the parent directory. 23901 18.40. Operation 47: GETDEVICEINFO - Get Device Information 23903 18.40.1. ARGUMENT 23905 struct GETDEVICEINFO4args { 23906 deviceid4 gdia_device_id; 23907 layouttype4 gdia_layout_type; 23908 count4 gdia_maxcount; 23909 bitmap4 gdia_notify_types; 23910 }; 23912 18.40.2. RESULT 23914 struct GETDEVICEINFO4resok { 23915 device_addr4 gdir_device_addr; 23916 bitmap4 gdir_notification; 23917 }; 23919 union GETDEVICEINFO4res switch (nfsstat4 gdir_status) { 23920 case NFS4_OK: 23921 GETDEVICEINFO4resok gdir_resok4; 23922 case NFS4ERR_TOOSMALL: 23923 count4 gdir_mincount; 23924 default: 23925 void; 23926 }; 23928 18.40.3. DESCRIPTION 23930 Returns pNFS storage device address information for the specified 23931 device ID. The client identifies the device information to be 23932 returned by providing the gdia_device_id and gdia_layout_type that 23933 uniquely identify the device. The client provides gdia_maxcount to 23934 limit the number of bytes for the result. This maximum size 23935 represents all of the data being returned within the 23936 GETDEVICEINFO4resok structure and includes the XDR overhead. The 23937 server may return less data. If the server is unable to return any 23938 information within the gdia_maxcount limit, the error 23939 NFS4ERR_TOOSMALL will be returned. However, if gdia_maxcount is 23940 zero, NFS4ERR_TOOSMALL MUST NOT be returned. 23942 The da_layout_type field of the gdir_device_addr returned by the 23943 server MUST be equal to the gdia_layout_type specified by the client. 23944 If it is not equal, the client SHOULD ignore the response as invalid 23945 and behave as if the server returned an error, even if the client 23946 does have support for the layout type returned. 23948 The client also provides a notification bitmap, gdia_notify_types, for 23949 the device ID mapping notifications it is interested in 23950 receiving; the server must support device ID notifications for the 23951 notification request to have effect. The notification mask is 23952 composed in the same manner as the bitmap for file attributes 23953 (Section 3.3.7). The numbers of bit positions are listed in the 23954 notify_device_type4 enumeration type (Section 20.12). Only two 23955 enumerated values of notify_device_type4 currently apply to 23956 GETDEVICEINFO: NOTIFY_DEVICEID4_CHANGE and NOTIFY_DEVICEID4_DELETE 23957 (see Section 20.12). 23959 The notification bitmap applies only to the specified device ID. If 23960 a client issues GETDEVICEINFO on a device ID multiple times, the last 23961 notification bitmap is used by the server for subsequent 23962 notifications. If the bitmap is zero or empty, then the device ID's 23963 notifications are turned off. 23965 If the client wants to just update or turn off notifications, it MAY 23966 issue GETDEVICEINFO with gdia_maxcount set to zero.
In that event, 23967 if the device ID is valid, the reply's da_addr_body field of the 23968 gdir_device_addr field will be of zero length. 23970 If an unknown device ID is given in gdia_device_id, the server 23971 returns NFS4ERR_NOENT. Otherwise, the device address information is 23972 returned in gdir_device_addr. Finally, if the server supports 23973 notifications for device ID mappings, the gdir_notification result 23974 will contain a bitmap of which notifications it will actually send to 23975 the client (via CB_NOTIFY_DEVICEID, see Section 20.12). 23977 If NFS4ERR_TOOSMALL is returned, the results also contain 23978 gdir_mincount. The value of gdir_mincount represents the minimum 23979 size necessary to obtain the device information. 23981 18.40.4. IMPLEMENTATION 23983 Aside from updating or turning off notifications, another use case 23984 for gdia_maxcount being set to zero is to validate a device ID. 23986 The client SHOULD request a notification for changes or deletion of a 23987 device ID to device address mapping so that the server can allow the 23988 client to gracefully use a new mapping, without having pending I/O fail 23989 abruptly, or force layouts using the device ID to be recalled or 23990 revoked. 23992 It is possible that GETDEVICEINFO (and GETDEVICELIST) will race with 23993 CB_NOTIFY_DEVICEID, i.e. CB_NOTIFY_DEVICEID arrives before the 23994 client gets and processes the response to GETDEVICEINFO or 23995 GETDEVICELIST. The analysis of the race leverages the fact that the 23996 server MUST NOT delete a device ID that is referred to by a layout 23997 the client has. 23999 o CB_NOTIFY_DEVICEID deletes a device ID. If the client believes it 24000 has layouts that refer to the device ID, then it is possible the 24001 layouts have been revoked. The client should send a TEST_STATEID 24002 request using the stateid for each layout that might have been 24003 revoked. If TEST_STATEID indicates any layouts have been revoked, 24004 the client must recover from layout revocation as described in 24005 Section 12.5.6. If TEST_STATEID indicates at least one layout has 24006 not been revoked, the client should send a GETDEVICEINFO on the 24007 device ID to verify that the device ID has been deleted. If 24008 GETDEVICEINFO indicates the device ID does not exist, the client 24009 then assumes the server is faulty, and recovers by issuing 24010 EXCHANGE_ID. If the client does not have layouts that refer to 24011 the device ID, no harm is done. The client should mark the device 24012 ID as deleted, and when the GETDEVICEINFO or GETDEVICELIST results 24013 are finally received for the device ID, delete the device ID from 24014 the client's cache. 24016 o CB_NOTIFY_DEVICEID indicates a device ID's device addressing 24017 mappings have changed. The client should assume that the results 24018 from the in-progress GETDEVICEINFO will be stale for the device ID 24019 once received, and so it should send another GETDEVICEINFO on the 24020 device ID. 24022 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings for a File 24023 System 24025 18.41.1. ARGUMENT 24027 struct GETDEVICELIST4args { 24028 /* CURRENT_FH: object belonging to the file system */ 24029 layouttype4 gdla_layout_type; 24031 /* number of deviceIDs to return */ 24032 count4 gdla_maxdevices; 24034 nfs_cookie4 gdla_cookie; 24035 verifier4 gdla_cookieverf; 24036 }; 24038 18.41.2.
RESULT 24040 struct GETDEVICELIST4resok { 24041 nfs_cookie4 gdlr_cookie; 24042 verifier4 gdlr_cookieverf; 24043 deviceid4 gdlr_deviceid_list<>; 24044 bool gdlr_eof; 24045 }; 24047 union GETDEVICELIST4res switch (nfsstat4 gdlr_status) { 24048 case NFS4_OK: 24049 GETDEVICELIST4resok gdlr_resok4; 24050 default: 24051 void; 24052 }; 24054 18.41.3. DESCRIPTION 24056 This operation is used by the client to enumerate all of the device 24057 IDs a server's file system uses. 24059 The client provides a current filehandle of a file object that 24060 belongs to the file system (i.e. all file objects sharing the same 24061 fsid as that of the current filehandle), and the layout type in 24062 gdla_layout_type. Since this operation might require multiple calls 24063 to enumerate all the device IDs (and is thus similar to the READDIR 24064 (Section 18.23) operation), the client also provides gdla_cookie and 24065 gdla_cookieverf to specify the current cursor position in the list. 24066 When the client wants to read from the beginning of the file system's 24067 device mappings, it sets gdla_cookie to zero. The field 24068 gdla_cookieverf MUST be ignored by the server when gdla_cookie is 24069 zero. The client provides gdla_maxdevices to limit the number of 24070 device IDs in the result. If gdla_maxdevices is zero, the server 24071 MUST return NFS4ERR_INVAL. The server MAY return fewer device IDs. 24073 The successful response to the operation will contain the cookie, 24074 gdlr_cookie, and cookie verifier, gdlr_cookieverf, to be used on the 24075 subsequent GETDEVICELIST. A gdlr_eof value of TRUE signifies that 24076 there are no remaining entries in the server's device list. Each 24077 element of gdlr_deviceid_list contains a device ID. 24079 18.41.4. IMPLEMENTATION 24081 An example of the use of this operation is for pNFS clients and 24082 servers that use LAYOUT4_BLOCK_VOLUME layouts. In these environments 24083 it may be helpful for a client to determine device accessibility upon 24084 first file system access. 24086 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using a layout 24088 18.42.1. ARGUMENT 24090 union newtime4 switch (bool nt_timechanged) { 24091 case TRUE: 24092 nfstime4 nt_time; 24093 case FALSE: 24094 void; 24095 }; 24097 union newoffset4 switch (bool no_newoffset) { 24098 case TRUE: 24099 offset4 no_offset; 24100 case FALSE: 24101 void; 24102 }; 24104 struct LAYOUTCOMMIT4args { 24105 /* CURRENT_FH: file */ 24106 offset4 loca_offset; 24107 length4 loca_length; 24108 bool loca_reclaim; 24109 stateid4 loca_stateid; 24110 newoffset4 loca_last_write_offset; 24111 newtime4 loca_time_modify; 24112 layoutupdate4 loca_layoutupdate; 24113 }; 24115 18.42.2. RESULT 24117 union newsize4 switch (bool ns_sizechanged) { 24118 case TRUE: 24119 length4 ns_size; 24120 case FALSE: 24121 void; 24122 }; 24124 struct LAYOUTCOMMIT4resok { 24125 newsize4 locr_newsize; 24126 }; 24128 union LAYOUTCOMMIT4res switch (nfsstat4 locr_status) { 24129 case NFS4_OK: 24130 LAYOUTCOMMIT4resok locr_resok4; 24131 default: 24132 void; 24133 }; 24135 18.42.3. DESCRIPTION 24137 Commits changes in the layout represented by the current filehandle, 24138 client ID (derived from the session ID in the preceding SEQUENCE 24139 operation), byte range, and stateid. Since layouts are sub- 24140 dividable, a smaller portion of a layout, retrieved via LAYOUTGET, 24141 can be committed. The region being committed is specified through 24142 the byte range (loca_offset and loca_length).
This region MUST 24143 overlap with one or more existing layouts previously granted via 24144 LAYOUTGET (Section 18.43), each with an iomode of LAYOUTIOMODE4_RW. 24145 In the case where the iomode of any held layout segment is not 24146 LAYOUTIOMODE4_RW, the server should return the error 24147 NFS4ERR_BAD_IOMODE. For the case where the client does not hold 24148 matching layout segment(s) for the defined region, the server should 24149 return the error NFS4ERR_BAD_LAYOUT. 24151 The LAYOUTCOMMIT operation indicates that the client has completed 24152 writes using a layout obtained by a previous LAYOUTGET. The client 24153 may have only written a subset of the data range it previously 24154 requested. LAYOUTCOMMIT allows it to commit or discard provisionally 24155 allocated space and to update the server with a new end of file. The 24156 layout referenced by LAYOUTCOMMIT is still valid after the operation 24157 completes and can continue to be referenced by the client ID, 24158 filehandle, byte range, layout type, and stateid. 24160 If the loca_reclaim field is set to TRUE, this indicates that the 24161 client is attempting to commit changes to a layout after the restart 24162 of the metadata server during the metadata server's recovery grace 24163 period (see Section 12.7.4). This type of request may be necessary 24164 when the client has uncommitted writes to provisionally allocated 24165 regions of a file which were sent to the storage devices before the 24166 restart of the metadata server. In this case the layout provided by 24167 the client MUST be a subset of a writable layout that the client held 24168 immediately before the restart of the metadata server. The metadata 24169 server is free to accept or reject this request based on its own 24170 internal metadata consistency checks. If the metadata server finds 24171 that the layout provided by the client does not pass its consistency 24172 checks, it MUST reject the request with the status 24173 NFS4ERR_RECLAIM_BAD. The successful completion of the LAYOUTCOMMIT 24174 request with loca_reclaim set to TRUE does NOT provide the client 24175 with a layout for the file. It simply commits the changes to the 24176 layout specified in the loca_layoutupdate field. To obtain a layout 24177 for the file, the client must send a LAYOUTGET request to the server 24178 after the server's grace period has expired. If the metadata server 24179 receives a LAYOUTCOMMIT request with loca_reclaim set to TRUE when 24180 the metadata server is not in its recovery grace period, it MUST 24181 reject the request with the status NFS4ERR_NO_GRACE. 24183 Setting the loca_reclaim field to TRUE is required if and only if the 24184 committed layout was acquired before the metadata server restart. If 24185 the client is committing a layout that was acquired during the 24186 metadata server's grace period, it MUST set the "reclaim" field to 24187 FALSE. 24189 The loca_stateid is a layout stateid value as returned by previously 24190 successful layout operations (see Section 12.5.3). 24192 The loca_last_write_offset field specifies the offset of the last 24193 byte written by the client previous to the LAYOUTCOMMIT. Note that 24194 this value is never equal to the file's size (at most it is one byte 24195 less than the file's size) and MUST be less than or equal to 24196 NFS4_MAXFILEOFF. Also, loca_last_write_offset MUST overlap the range 24197 described by loca_offset and loca_length.
The metadata server may 24198 use this information to determine whether the file's size needs to be 24199 updated. If the metadata server updates the file's size as the 24200 result of the LAYOUTCOMMIT operation, it must return the new size 24201 (locr_newsize.ns_size) as part of the results. 24203 The loca_time_modify field allows the client to suggest a 24204 modification time it would like the metadata server to set. The 24205 metadata server may use the suggestion or it may use the time of the 24206 LAYOUTCOMMIT operation to set the modification time. If the metadata 24207 server uses the client provided modification time, it should ensure 24208 time does not flow backwards. If the client wants to force the 24209 metadata server to set an exact time, the client should use a SETATTR 24210 operation in a COMPOUND right after LAYOUTCOMMIT. See Section 12.5.4 24211 for more details. If the client desires the resultant modification 24212 time it should construct the COMPOUND so that a GETATTR follows the 24213 LAYOUTCOMMIT. 24215 The loca_layoutupdate argument to LAYOUTCOMMIT provides a mechanism 24216 for a client to provide layout specific updates to the metadata 24217 server. For example, the layout update can describe what regions of 24218 the original layout have been used and what regions can be 24219 deallocated. There is no NFSv4.1 file layout-specific layoutupdate4 24220 structure. 24222 The layout information is more verbose for block devices than for 24223 objects and files because the latter two hide the details of block 24224 allocation behind their storage protocols. At the minimum, the 24225 client needs to communicate changes to the end of file location back 24226 to the server, and, if desired, its view of the file's modification 24227 time. For block/volume layouts, it needs to specify precisely which 24228 blocks have been used. 24230 If the layout identified in the arguments does not exist, the error 24231 NFS4ERR_BADLAYOUT is returned. The layout being committed may also 24232 be rejected if it does not correspond to an existing layout with an 24233 iomode of LAYOUTIOMODE4_RW. 24235 On success, the current filehandle retains its value and the current 24236 stateid retains its value. 24238 18.42.4. IMPLEMENTATION 24240 The client MAY also use LAYOUTCOMMIT with the loca_reclaim field set 24241 to TRUE to convey hints to modified file attributes or to report 24242 layout-type specific information such as I/O errors for object-based 24243 storage layouts, as normally done during normal operation. Doing so 24244 may help the metadata server to recover files more efficiently after 24245 restart. For example, some file system implementations may require 24246 expansive recovery of file system objects if the metadata server does 24247 not get a positive indication from all clients holding a write layout 24248 that they have successfully completed all their writes. Sending a 24249 LAYOUTCOMMIT (if required) and then following with LAYOUTRETURN can 24250 provide such an indication and allow for graceful and efficient 24251 recovery. 24253 18.43. Operation 50: LAYOUTGET - Get Layout Information 24254 18.43.1. ARGUMENT 24256 struct LAYOUTGET4args { 24257 /* CURRENT_FH: file */ 24258 bool loga_signal_layout_avail; 24259 layouttype4 loga_layout_type; 24260 layoutiomode4 loga_iomode; 24261 offset4 loga_offset; 24262 length4 loga_length; 24263 length4 loga_minlength; 24264 stateid4 loga_stateid; 24265 count4 loga_maxcount; 24266 }; 24268 18.43.2. 
RESULT 24270 struct LAYOUTGET4resok { 24271 bool logr_return_on_close; 24272 stateid4 logr_stateid; 24273 layout4 logr_layout<>; 24274 }; 24276 union LAYOUTGET4res switch (nfsstat4 logr_status) { 24277 case NFS4_OK: 24278 LAYOUTGET4resok logr_resok4; 24279 case NFS4ERR_LAYOUTTRYLATER: 24280 bool logr_will_signal_layout_avail; 24281 default: 24282 void; 24283 }; 24285 18.43.3. DESCRIPTION 24287 Requests a layout from the metadata server for reading or writing the 24288 file given by the filehandle at the byte range specified by offset 24289 and length. Layouts are identified by the client ID (derived from 24290 the session ID in the preceding SEQUENCE operation), current 24291 filehandle, layout type (loga_layout_type), and the layout stateid 24292 (loga_stateid). The use of the loga_iomode field depends upon the 24293 layout type, but should reflect the client's data access intent. 24295 If the metadata server is in a grace period, and does not persist 24296 layouts and device ID to device address mappings, then it MUST return 24297 NFS4ERR_GRACE (see Section 8.4.2.1). 24299 The LAYOUTGET operation returns layout information for the specified 24300 byte range: a layout. The client actually specifies two ranges, both 24301 starting at the offset in the loga_offset field. The first range is 24302 between loga_offset and loga_offset + loga_length - 1 inclusive. 24303 This range indicates the desired range the client wants the layout to 24304 cover. The second range is between loga_offset and loga_offset + 24305 loga_minlength - 1 inclusive. This range indicates the required 24306 range the client needs the layout to cover. Thus, loga_minlength 24307 MUST be less than or equal to loga_length. 24309 When a length field is set to NFS4_UINT64_MAX, this indicates a 24310 desire (when loga_length is NFS4_UINT64_MAX) or requirement (when 24311 loga_minlength is NFS4_UINT64_MAX) to get a layout from loga_offset 24312 through the end-of-file, regardless of the file's length. 24314 The following rules govern the relationships among, and the minima of 24315 loga_length, loga_minlength, and loga_offset. 24317 o If loga_length is less than loga_minlength, the metadata server 24318 MUST return NFS4ERR_INVAL. 24320 o If loga_minlength is zero, this is an indication to the metadata 24321 server that the client desires any layout at offset loga_offset or 24322 less that the metadata server has "readily available". "Readily" is 24323 subjective, and depends on the layout type and the pNFS server 24324 implementation. For example, some metadata servers might have to 24325 pre-allocate stable storage when they receive a request for a 24326 range of a file that goes beyond the file's current length. If 24327 loga_minlength is zero and loga_length is greater than zero, this 24328 tells the metadata server what range of the layout the client 24329 would prefer to have. If loga_length and loga_minlength are both 24330 zero, then the client is indicating it desires a layout of any 24331 length with the ending offset of the range no less than the specified 24332 loga_offset, and the starting offset at or below loga_offset. If 24333 the metadata server does not have a layout that is readily 24334 available, then it MUST return NFS4ERR_LAYOUTTRYLATER. 24336 o If the sum of loga_offset and loga_minlength exceeds 24337 NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, the 24338 error NFS4ERR_INVAL MUST result.
24340 o If the sum of loga_offset and loga_length exceeds NFS4_UINT64_MAX, 24341 and loga_length is not NFS4_UINT64_MAX, the error NFS4ERR_INVAL 24342 MUST result. 24344 After the metadata server has performed the above checks on 24345 loga_offset, loga_minlength, and loga_offset, the metadata server 24346 MUST return a layout according to the rules in Table 21. 24348 Acceptable layouts based on loga_minlength. Note: u64m = 24349 NFS4_UINT64_MAX; a_off = loga_offset; a_minlen = loga_minlength. 24351 +-----------+-----------+----------+----------+---------------------+ 24352 | Layout | Layout | Layout | Layout | Layout length of | 24353 | iomode of | a_minlen | iomode | offset | reply | 24354 | request | of | of reply | of reply | | 24355 | | request | | | | 24356 +-----------+-----------+----------+----------+---------------------+ 24357 | _READ | u64m | MAY be | MUST be | MUST be >= file | 24358 | | | _READ | <= a_off | length - layout | 24359 | | | | | offset | 24360 | _READ | u64m | MAY be | MUST be | MUST be u64m | 24361 | | | _RW | <= a_off | | 24362 | _READ | > 0 and < | MAY be | MUST be | MUST be >= MIN(file | 24363 | | u64m | _READ | <= a_off | length, a_minlen + | 24364 | | | | | a_off) - layout | 24365 | | | | | offset | 24366 | _READ | > 0 and < | MAY be | MUST be | MUST be >= a_off - | 24367 | | u64m | _RW | <= a_off | layout offset + | 24368 | | | | | a_minlen | 24369 | _READ | 0 | MAY be | MUST be | MUST be > 0 | 24370 | | | _READ | <= a_off | | 24371 | _READ | 0 | MAY be | MUST be | MUST be > 0 | 24372 | | | _RW | <= a_off | | 24373 | _RW | u64m | MUST be | MUST be | MUST be u64m | 24374 | | | _RW | <= a_off | | 24375 | _RW | > 0 and < | MUST be | MUST be | MUST be >= a_off - | 24376 | | u64m | _RW | <= a_off | layout offset + | 24377 | | | | | a_minlen | 24378 | _RW | 0 | MUST be | MUST be | MUST be > 0 | 24379 | | | _RW | <= a_off | | 24380 +-----------+-----------+----------+----------+---------------------+ 24382 Table 21 24384 If loga_minlength is not zero and the metadata server cannot return a 24385 layout according to the rules in Table 21, then the metadata server 24386 MUST return the error NFS4ERR_BADLAYOUT. If loga_minlength is zero 24387 and the metadata server cannot or will not return a layout according 24388 to the rules in Table 21, then the metadata server MUST return the 24389 error NFS4ERR_LAYOUTTRYLATER. Assuming loga_length is greater than 24390 loga_minlength or equal to zero, the metadata server SHOULD return a 24391 layout according to the rules in Table 22. 24393 Desired layouts based on loga_length. The rules of Table 21 MUST be 24394 applied first. Note: u64m = NFS4_UINT64_MAX; a_off = loga_offset; 24395 a_len = loga_length. 
24397 +------------+------------+-----------+-----------+-----------------+ 24398 | Layout | Layout | Layout | Layout | Layout length | 24399 | iomode of | a_len of | iomode of | offset of | of reply | 24400 | request | request | reply | reply | | 24401 +------------+------------+-----------+-----------+-----------------+ 24402 | _READ | u64m | MAY be | MUST be | SHOULD be u64m | 24403 | | | _READ | <= a_off | | 24404 | _READ | u64m | MAY be | MUST be | SHOULD be u64m | 24405 | | | _RW | <= a_off | | 24406 | _READ | > 0 and < | MAY be | MUST be | SHOULD be >= | 24407 | | u64m | _READ | <= a_off | a_off - layout | 24408 | | | | | offset + a_len | 24409 | _READ | > 0 and < | MAY be | MUST be | SHOULD be >= | 24410 | | u64m | _RW | <= a_off | a_off - layout | 24411 | | | | | offset + a_len | 24412 | _READ | 0 | MAY be | MUST be | SHOULD be > | 24413 | | | _READ | <= a_off | a_off - layout | 24414 | | | | | offset | 24415 | _READ | 0 | MAY be | MUST be | SHOULD be > | 24416 | | | _READ | <= a_off | a_off - layout | 24417 | | | | | offset | 24418 | _RW | u64m | MUST be | MUST be | SHOULD be u64m | 24419 | | | _RW | <= a_off | | 24420 | _RW | > 0 and < | MUST be | MUST be | SHOULD be >= | 24421 | | u64m | _RW | <= a_off | a_off - layout | 24422 | | | | | offset + a_len | 24423 | _RW | 0 | MUST be | MUST be | SHOULD be > | 24424 | | | _RW | <= a_off | a_off - layout | 24425 | | | | | offset | 24426 +------------+------------+-----------+-----------+-----------------+ 24428 Table 22 24430 The loga_stateid field specifies a valid stateid. If a layout is not 24431 currently held by the client, the loga_stateid field represents a 24432 stateid reflecting the correspondingly valid open, byte-range lock, 24433 or delegation stateid. Once a layout is held on the file by the 24434 client, the loga_stateid field MUST be a stateid as returned from a 24435 previous LAYOUTGET or LAYOUTRETURN operation or provided by a 24436 CB_LAYOUTRECALL operation (see Section 12.5.3). 24438 The loga_maxcount field specifies the maximum layout size (in bytes) 24439 that the client can handle. If the size of the layout structure 24440 exceeds the size specified by maxcount, the metadata server will 24441 return the NFS4ERR_TOOSMALL error. 24443 The returned layout is expressed as an array, logr_layout, with each 24444 element of type layout4. If a file has a single striping pattern, 24445 then logr_layout SHOULD contain just one entry. Otherwise, if the 24446 requested range overlaps more than one striping pattern, logr_layout 24447 will contain the required number of entries. The elements of 24448 logr_layout MUST be sorted in ascending order of the value of the 24449 lo_offset field of each element. There MUST be no gaps or overlaps 24450 in the range between two successive elements of logr_layout. The 24451 lo_iomode field in each element of logr_layout MUST be the same. 24453 Table 21 and Table 22 both refer to a returned layout iomode, offset, 24454 and length. Because the returned layout is encoded in the 24455 logr_layout array, more description is required. 24457 iomode 24459 The value of the returned layout iomode listed in Table 21 and 24460 Table 22 is equal to the value of the lo_iomode field in each 24461 element of logr_layout. As shown in Table 21 and Table 22, the 24462 metadata server MAY return a layout with an lo_iomode different 24463 from the requested iomode (field loga_iomode of the request). 
If 24464 it does so, it MUST ensure that the lo_iomode is more permissive 24465 than the loga_iomode requested. For example, this behavior allows 24466 an implementation to upgrade read-only requests to read/write 24467 requests at its discretion, within the limits of the layout type 24468 specific protocol. A lo_iomode of either LAYOUTIOMODE4_READ or 24469 LAYOUTIOMODE4_RW MUST be returned. 24471 offset 24473 The value of the returned layout offset listed in Table 21 and 24474 Table 22 is always equal to the lo_offset field of the first 24475 element logr_layout. 24477 length 24479 When setting the value of the returned layout length, the 24480 situation is complicated by the possibility that the special 24481 layout length value NFS4_UINT64_MAX is involved. For a 24482 logr_layout array of N elements, the lo_length field in the first 24483 N-1 elements MUST NOT be NFS4_UINT64_MAX. The lo_length field of 24484 the last element of logr_layout can be NFS4_UINT64_MAX under some 24485 conditions as described in the following list. 24487 * If an applicable rule of Table 21 states the metadata server 24488 MUST return a layout of length NFS4_UINT64_MAX, then lo_length 24489 field of the last element of logr_layout MUST be 24490 NFS4_UINT64_MAX. 24492 * If an applicable rule of Table 21 states the metadata server 24493 MUST NOT return a layout of length NFS4_UINT64_MAX, then 24494 lo_length field of the last element of logr_layout MUST NOT be 24495 NFS4_UINT64_MAX. 24497 * If an applicable rule of Table 22 states the metadata server 24498 SHOULD return a layout of length NFS4_UINT64_MAX, then 24499 lo_length field of the last element of logr_layout SHOULD be 24500 NFS4_UINT64_MAX. 24502 * When the value of the returned layout length of Table 21 and 24503 Table 22 is not NFS4_UINT64_MAX, then the returned layout 24504 length is equal to the sum of the lo_length fields of each 24505 element of logr_layout. 24507 The logr_return_on_close result field is a directive to return the 24508 layout before closing the file. When the metadata server sets this 24509 return value to TRUE, it MUST be prepared to recall the layout in the 24510 case the client fails to return the layout before close. For the 24511 metadata server that knows a layout must be returned before a close 24512 of the file, this return value can be used to communicate the desired 24513 behavior to the client and thus remove one extra step from the 24514 client's and metadata server's interaction. 24516 The logr_stateid stateid is returned to the client for use in 24517 subsequent layout related operations. See Section 8.2, 24518 Section 12.5.3, and Section 12.5.5.2 for a further discussion and 24519 requirements. 24521 The format of the returned layout (lo_content) is specific to the 24522 layout type. The value of the layout type (lo_content.loc_type) for 24523 each of the elements of the array of layouts returned by the metadata 24524 server (logr_layout) MUST be equal to the loga_layout_type specified 24525 by the client. If it is not equal, the client SHOULD ignore the 24526 response as invalid and behave as if the metadata server returned an 24527 error, even if the client does have support for the layout type 24528 returned. 24530 If layouts are not supported for the requested file or its containing 24531 file system the metadata server MUST return 24532 NFS4ERR_LAYOUTUNAVAILABLE. If the layout type is not supported, the 24533 metadata server MUST return NFS4ERR_UNKNOWN_LAYOUTTYPE. 
If layouts 24534 are supported but no layout matches the client provided layout 24535 identification, the metadata server MUST return NFS4ERR_BADLAYOUT. 24536 If an invalid loga_iomode is specified, or a loga_iomode of 24537 LAYOUTIOMODE4_ANY is specified, the metadata server MUST return 24538 NFS4ERR_BADIOMODE. 24540 If the layout for the file is unavailable due to transient 24541 conditions, e.g., file sharing prohibits layouts, the metadata server 24542 MUST return NFS4ERR_LAYOUTTRYLATER. 24544 If the layout request is rejected due to an overlapping layout 24545 recall, the metadata server MUST return NFS4ERR_RECALLCONFLICT. See 24546 Section 12.5.5.2 for details. 24548 If the layout conflicts with a mandatory byte range lock held on the 24549 file, and if the storage devices have no method of enforcing 24550 mandatory locks, other than through the restriction of layouts, the 24551 metadata server SHOULD return NFS4ERR_LOCKED. 24553 If the client sets loga_signal_layout_avail to TRUE, then it is 24554 registering with the server a "want" for a layout in the event the 24555 layout cannot be obtained due to resource exhaustion. If the 24556 metadata server supports and will honor the "want", the results will 24557 have logr_will_signal_layout_avail set to TRUE. If so, the client 24558 should expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a 24559 layout is available. 24561 On success, the current filehandle retains its value and the current 24562 stateid is updated to match the value as returned in the results. 24564 18.43.4. IMPLEMENTATION 24566 Typically, LAYOUTGET will be called as part of a COMPOUND request 24567 after an OPEN operation and results in the client having location 24568 information for the file; this requires that loga_stateid be set to 24569 the special stateid that tells the metadata server to use the current 24570 stateid, which is set by OPEN (see Section 16.2.3.1.2). A client 24571 may also hold a layout across multiple OPENs. The client specifies a 24572 layout type that limits what kind of layout the metadata server will 24573 return. This prevents metadata servers from granting layouts that 24574 are unusable by the client. 24576 As indicated by Table 21 and Table 22, the specification of LAYOUTGET 24577 allows a pNFS client and server considerable flexibility. A pNFS 24578 client can take several strategies for sending LAYOUTGET. Some 24579 examples are as follows. 24581 o If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and 24582 the OPEN requests read access, the client might opt to request a 24583 _READ layout with loga_offset set to zero, loga_minlength set to 24584 zero, and loga_length set to NFS4_UINT64_MAX. If the file has 24585 space allocated to it, that space is striped over one or more 24586 storage devices, and there is either no conflicting layout, or the 24587 concept of a conflicting layout does not apply to the pNFS 24588 server's layout type or implementation, then the metadata server 24589 might return a layout with a starting offset of zero, and a length 24590 equal to the length of the file, if not NFS4_UINT64_MAX. If the 24591 length of the file is not a multiple of the pNFS server's stripe 24592 width (see Section 13.2 for a formal definition), the metadata 24593 server might round the returned layout's length up.
24595 o If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and 24596 the OPEN does not truncate the file, and requests write access, 24597 the client might opt to request a _RW layout with loga_offset set 24598 to zero, loga_minlength set to zero, and loga_length set to the 24599 file's current length (if known), or NFS4_UINT64_MAX. As with the 24600 previous case, under some conditions the metadata server might 24601 return a layout that covers the entire length of the file or 24602 beyond. 24604 o As above, but the OPEN truncates the file. In this case, the client 24605 might anticipate it will be writing to the file from offset zero, 24606 and so loga_offset and loga_minlength are set to zero, and 24607 loga_length is set to the value of threshold4_write_iosize. The 24608 metadata server might return a layout from offset zero with a 24609 length at least as long as threshold4_write_iosize. 24611 o A process on the client invokes a request to read from offset 24612 10000 for length 50000. The client is using buffered I/O, and has 24613 buffer sizes of 4096 bytes. The client intends to map the request 24614 of the process into a series of READ requests starting at offset 24615 8192. The end offset needs to be higher than 10000 + 50000 = 24616 60000, and the next offset that is a multiple of 4096 is 61440. 24617 The difference between 61440 and the starting offset of the 24618 layout is 53248 (which is the product of 4096 and 13). The value 24619 of threshold4_read_iosize is less than 53248, so the client sends 24620 a LAYOUTGET request with loga_offset set to 8192, loga_minlength 24621 set to 53248, and loga_length set to the file's length (if known) 24622 minus 8192 or NFS4_UINT64_MAX (if the file's length is not known). 24623 Since this LAYOUTGET request exceeds the metadata server's 24624 threshold, it grants the layout, possibly with an initial offset 24625 of 0, with an end offset of at least 8192 + 53248 - 1 = 61439, but 24626 preferably a layout with an offset aligned on the stripe width and 24627 a length that is a multiple of the stripe width. 24629 o As above, but the client is not using buffered I/O, and instead 24630 all internal I/O requests are sent directly to the server. The 24631 LAYOUTGET request has loga_offset equal to 10000, and 24632 loga_minlength set to 50000. The value of loga_length is set to 24633 the length of the file. The metadata server is free to return a 24634 layout that fully overlaps the requested range, with a starting 24635 offset and length aligned on the stripe width. 24637 o Again a process on the client invokes a request to read from 24638 offset 10000 for length 50000, and buffered I/O is in use. The 24639 client is expecting that the server might not be able to return 24640 the layout for the full I/O range, with loga_offset set to 8192 24641 and loga_minlength set to 53248. The client intends to map the 24642 request of the process into a series of READ requests starting at 24643 offset 8192, each with length 4096, with a total length of 53248 24644 (which equals 13 * 4096). Because the value of 24645 threshold4_read_iosize is equal to 4096, it is practical and 24646 reasonable for the client to use several LAYOUTGETs to complete 24647 the series of READs. The client sends a LAYOUTGET request with 24648 loga_offset set to 8192, loga_minlength set to 4096, and 24649 loga_length set to 53248 or higher.
The server will grant a 24650 layout possibly with an initial offset of 0, with an end offset of 24651 at least 8192 + 4096 - 1 = 12287, but preferably a layout with an 24652 offset aligned on the stripe width and a length that is a multiple 24653 of the stripe width. This will allow the client to make forward 24654 progress, possibly having to issue more LAYOUTGET requests for the 24655 remainder of the range. 24657 o An NFS client detects a sequential read pattern, and so issues a 24658 LAYOUTGET that goes well beyond any current or pending read 24659 requests to the server. The server might likewise detect this 24660 pattern, and grant the LAYOUTGET request. The client continues to 24661 send LAYOUTGET requests once it has read from an offset of the 24662 file that represents 50% of the way through the last layout it 24663 received. 24665 o As above, but the client fails to detect the pattern, while the 24666 server does. The next time the metadata server gets a LAYOUTGET, 24667 it returns a layout with a length that is well beyond 24668 loga_minlength. 24670 o A client is using buffered I/O, and has a long queue of write 24671 behinds to process and also detects a sequential write pattern. 24672 It issues a LAYOUTGET for a layout that spans the range of the 24673 queued write behinds and well beyond, including ranges beyond the 24674 file's current length. The client continues to issue LAYOUTGETs 24675 once the write behind queue reaches 50% of the maximum queue 24676 length. 24678 Once the client has obtained a layout referring to a particular 24679 device ID, the metadata server MUST NOT delete the device ID until 24680 the layout is returned or revoked. 24682 CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is 24683 that LAYOUTGET returns a device ID the client does not have device 24684 address mappings for, and the metadata server sends a 24685 CB_NOTIFY_DEVICEID to add the device ID to the client's awareness and 24686 meanwhile the client sends GETDEVICEINFO on the device ID. This 24687 scenario is discussed in Section 18.40.4. Another scenario is that 24688 the CB_NOTIFY_DEVICEID is processed by the client before it processes 24689 the results from LAYOUTGET. The client will send a GETDEVICEINFO on 24690 the device ID. If the results from GETDEVICEINFO are received before 24691 the client gets results from LAYOUTGET, then there is no longer a 24692 race. If the results from LAYOUTGET are received before the results 24693 from GETDEVICEINFO, the client can either wait for results of 24694 GETDEVICEINFO, or send another one to get possibly more up-to-date 24695 device address mappings for the device ID. 24697 18.44. Operation 51: LAYOUTRETURN - Release Layout Information 24699 18.44.1.
ARGUMENT 24701 /* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */ 24702 const LAYOUT4_RET_REC_FILE = 1; 24703 const LAYOUT4_RET_REC_FSID = 2; 24704 const LAYOUT4_RET_REC_ALL = 3; 24706 enum layoutreturn_type4 { 24707 LAYOUTRETURN4_FILE = LAYOUT4_RET_REC_FILE, 24708 LAYOUTRETURN4_FSID = LAYOUT4_RET_REC_FSID, 24709 LAYOUTRETURN4_ALL = LAYOUT4_RET_REC_ALL 24710 }; 24712 struct layoutreturn_file4 { 24713 offset4 lrf_offset; 24714 length4 lrf_length; 24715 stateid4 lrf_stateid; 24716 /* layouttype4 specific data */ 24717 opaque lrf_body<>; 24718 }; 24720 union layoutreturn4 switch(layoutreturn_type4 lr_returntype) { 24721 case LAYOUTRETURN4_FILE: 24722 layoutreturn_file4 lr_layout; 24723 default: 24724 void; 24725 }; 24726 struct LAYOUTRETURN4args { 24727 /* CURRENT_FH: file */ 24728 bool lora_reclaim; 24729 layouttype4 lora_layout_type; 24730 layoutiomode4 lora_iomode; 24731 layoutreturn4 lora_layoutreturn; 24732 }; 24734 18.44.2. RESULT 24736 union layoutreturn_stateid switch (bool lrs_present) { 24737 case TRUE: 24738 stateid4 lrs_stateid; 24739 case FALSE: 24740 void; 24741 }; 24743 union LAYOUTRETURN4res switch (nfsstat4 lorr_status) { 24744 case NFS4_OK: 24745 layoutreturn_stateid lorr_stateid; 24746 default: 24747 void; 24748 }; 24750 18.44.3. DESCRIPTION 24752 This operation returns from the client to the server one or more 24753 layouts represented by the client ID (derived from the session ID in 24754 the preceding SEQUENCE operation), lora_layout_type, and lora_iomode. 24755 When lr_returntype is LAYOUTRETURN4_FILE, the returned layout is 24756 further identified by the current filehandle, lrf_offset, lrf_length, 24757 and lrf_stateid. If the lrf_length field is NFS4_UINT64_MAX, all 24758 bytes of the layout, starting at lrf_offset are returned. When 24759 lr_returntype is LAYOUTRETURN4_FSID, the current filehandle is used 24760 to identify the file system and all layouts matching the client ID, 24761 the fsid of the file system, lora_layout_type, and lora_iomode are 24762 returned. When lr_returntype is LAYOUTRETURN4_ALL, all layouts 24763 matching the client ID, lora_layout_type, and lora_iomode are 24764 returned and the current filehandle is not used. After this call, 24765 the client MUST NOT use the returned layout(s) and the associated 24766 storage protocol to access the file data. 24768 If the set of layouts designated in the case of LAYOUTRETURN4_FSID or 24769 LAYOUTRETURN4_ALL is empty, then no error results. In the case of 24770 LAYOUTRETURN4_FILE, the byte range specified is returned even if it 24771 is a subdivision of a layout previously obtained with LAYOUTGET, a 24772 combination of multiple layouts previously obtained with LAYOUTGET, 24773 or a combination including some layouts previously obtained with 24774 LAYOUTGET, and one or more subdivisions of such layouts. When the 24775 byte range does not designate any bytes for which a layout is held 24776 for the specified file, client ID, layout type and mode, no error 24777 results. See Section 12.5.5.2.1.5 for considerations with "bulk" 24778 return of layouts. 24780 The layout being returned may be a subset or superset of a layout 24781 specified by CB_LAYOUTRECALL. However, if it is a subset, the recall 24782 is not complete until the full recalled scope has been returned. 24783 Recalled scope refers to the byte range in the case of 24784 LAYOUTRETURN4_FILE, use of LAYOUTRETURN4_FSID, or the use of 24785 LAYOUTRETURN4_ALL. 
There must be a LAYOUTRETURN with a matching 24786 scope to complete the return even if all current layout ranges have 24787 been previously individually returned. 24789 For all lr_returntype values, an iomode of LAYOUTIOMODE4_ANY 24790 specifies that all layouts that match the other arguments to 24791 LAYOUTRETURN (i.e., client ID, lora_layout_type, and one of current 24792 filehandle and range; fsid derived from current filehandle; or 24793 LAYOUTRETURN4_ALL) are being returned. 24795 In the case that lr_returntype is LAYOUTRETURN4_FILE, the lrf_stateid 24796 provided by the client is a layout stateid as returned from previous 24797 layout operations. Note that the "seqid" field of lrf_stateid MUST 24798 NOT be zero. See Section 8.2, Section 12.5.3, and Section 12.5.5.2 24799 for a further discussion and requirements. 24801 Return of a layout or all layouts does not invalidate the mapping of 24802 storage device ID to storage device address, which remains in effect 24803 until specifically changed or deleted via device ID notification 24804 callbacks. 24806 If the lora_reclaim field is set to TRUE, the client is attempting to 24807 return a layout that was acquired before the restart of the metadata 24808 server during the metadata server's grace period. When returning 24809 layouts that were acquired during the metadata server's grace period, 24810 the client MUST set the lora_reclaim field to FALSE. The 24811 lora_reclaim field MUST be set to FALSE also when lr_returntype is 24812 LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL. See LAYOUTCOMMIT 24813 (Section 18.42) for more details. 24815 Layouts may be returned when recalled or voluntarily (i.e., before 24816 the server has recalled them). In either case, the client must 24817 properly propagate state changed under the context of the layout to 24818 the storage device(s) or to the metadata server before returning the 24819 layout. 24821 If the client returns the layout in response to a CB_LAYOUTRECALL 24822 where the lor_recalltype field of the clora_recall field was 24823 LAYOUTRECALL4_FILE, the client should use the lor_stateid value from 24824 CB_LAYOUTRECALL as the value for lrf_stateid. Otherwise, it should 24825 use logr_stateid (from a previous LAYOUTGET result) or lorr_stateid 24826 (from a previous LAYOUTRETURN result). This is done to indicate the 24827 point in time (in terms of layout stateid transitions) when the 24828 recall was sent. The client uses the precise lrf_stateid 24829 value and MUST NOT set the stateid's seqid to zero; otherwise 24830 NFS4ERR_BAD_STATEID MUST be returned. NFS4ERR_OLD_STATEID can be 24831 returned if the client is using an old seqid, and the server knows 24832 the client should not be using the old seqid. E.g., the client uses 24833 the seqid on slot 1 of the session, receives the response with the 24834 new seqid, and then uses the slot to send another request with the old 24835 seqid. 24837 If a client fails to return a layout in a timely manner, then the 24838 metadata server SHOULD use its control protocol with the storage 24839 devices to fence the client from accessing the data referenced by the 24840 layout. See Section 12.5.5 for more details. 24842 If the LAYOUTRETURN request sets the lora_reclaim field to TRUE after 24843 the metadata server's grace period, NFS4ERR_NO_GRACE is returned. 24845 If the LAYOUTRETURN request sets the lora_reclaim field to TRUE and 24846 lr_returntype is set to LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL, 24847 NFS4ERR_INVAL is returned.
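   The reclaim-related checks described above can be illustrated by the
   following non-normative C sketch.  The abbreviated argument
   structure, the enum stand-ins, and the in_grace() helper are
   assumptions made only for this example; a real server would use the
   XDR-generated definitions of this document and its own grace-period
   state, and the relative order of the checks is an implementation
   choice.

      #include <stdbool.h>

      typedef enum { NFS4_OK = 0, NFS4ERR_INVAL = 22,
                     NFS4ERR_NO_GRACE = 10033 } nfsstat4_t;

      typedef enum { LAYOUTRETURN4_FILE = 1, LAYOUTRETURN4_FSID = 2,
                     LAYOUTRETURN4_ALL  = 3 } layoutreturn_type4_t;

      struct layoutreturn_args_hdr {      /* abbreviated LAYOUTRETURN4args */
          bool                 lora_reclaim;
          layoutreturn_type4_t lr_returntype; /* lora_layoutreturn discriminant */
      };

      extern bool in_grace(void);         /* hypothetical grace-period test */

      /* Returns NFS4_OK if the reclaim-related argument checks pass. */
      nfsstat4_t check_layoutreturn_reclaim(const struct layoutreturn_args_hdr *a)
      {
          if (a->lora_reclaim) {
              /* Reclaims apply only to LAYOUTRETURN4_FILE ... */
              if (a->lr_returntype == LAYOUTRETURN4_FSID ||
                  a->lr_returntype == LAYOUTRETURN4_ALL)
                  return NFS4ERR_INVAL;

              /* ... and only while the metadata server is in its grace
               * period; a reclaim after the grace period gets
               * NFS4ERR_NO_GRACE. */
              if (!in_grace())
                  return NFS4ERR_NO_GRACE;
          }
          return NFS4_OK;
      }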
24849 If the client sets the lr_returntype field to LAYOUTRETURN4_FILE, 24850 then the lrs_stateid field will represent the layout stateid as 24851 updated for this operation's processing; the current stateid will 24852 also be updated to match the returned value. If the last byte of any 24853 layout for the current file, client ID, and layout type is being 24854 returned and there are no remaining pending CB_LAYOUTRECALL 24855 operations for which a LAYOUTRETURN operation must be done, 24856 lrs_present MUST be FALSE, and no stateid will be returned. In 24857 addition, the COMPOUND request's current stateid will be set to all- 24858 zeroes special stateid (see Section 16.2.3.1.2). The server MUST 24859 reject with NFS4ERR_BAD_STATEID any further use of the current 24860 stateid in that COMPOUND until the current stateid is re-established 24861 by a later stateid-returning operation. 24863 On success, the current filehandle retains its value. 24865 If the EXCHGID4_FLAG_BIND_PRINC_STATEID capability is set on the 24866 client ID (see Section 18.35), the server will require that the 24867 principal, security flavor, and if applicable, the GSS mechanism, 24868 combination that acquired the layout also be the one to send 24869 LAYOUTRETURN. This might not be possible if credentials for the 24870 principal are no longer available. The server will allow the machine 24871 credential or SSV credential (see Section 18.35) to send LAYOUTRETURN 24872 if LAYOUTRETURN's operation code was set in the spo_must_allow result 24873 of EXCHANGE_ID. 24875 18.44.4. IMPLEMENTATION 24877 The final LAYOUTRETURN operation in response to a CB_LAYOUTRECALL 24878 callback MUST be serialized with any outstanding, intersecting 24879 LAYOUTRETURN operations. Note that it is possible that while a 24880 client is returning the layout for some recalled range the server may 24881 recall a superset of that range (e.g. LAYOUTRECALL4_ALL); the final 24882 return operation for the latter must block until the former layout 24883 recall is done. 24885 Returning all layouts in a file system using LAYOUTRETURN4_FSID is 24886 typically done in response to a CB_LAYOUTRECALL for that file system 24887 as the final return operation. Similarly, LAYOUTRETURN4_ALL is used 24888 in response to a recall callback for all layouts. It is possible 24889 that the client already returned some outstanding layouts via 24890 individual LAYOUTRETURN calls and the call for LAYOUTRETURN4_FSID or 24891 LAYOUTRETURN4_ALL marks the end of the LAYOUTRETURN sequence. See 24892 Section 12.5.5.1 for more details. 24894 Once the client has returned all layouts referring to a particular 24895 device ID, the server MAY delete the device ID. 24897 18.45. Operation 52: SECINFO_NO_NAME - Get Security on Unnamed Object 24899 Obtain available security mechanisms with the use of the parent of an 24900 object or the current filehandle. 24902 18.45.1. ARGUMENT 24904 enum secinfo_style4 { 24905 SECINFO_STYLE4_CURRENT_FH = 0, 24906 SECINFO_STYLE4_PARENT = 1 24907 }; 24909 /* CURRENT_FH: object or child directory */ 24910 typedef secinfo_style4 SECINFO_NO_NAME4args; 24912 18.45.2. RESULT 24914 /* CURRENTFH: consumed if status is NFS4_OK */ 24915 typedef SECINFO4res SECINFO_NO_NAME4res; 24917 18.45.3. DESCRIPTION 24919 Like the SECINFO operation, SECINFO_NO_NAME is used by the client to 24920 obtain a list of valid RPC authentication flavors for a specific file 24921 object. Unlike SECINFO, SECINFO_NO_NAME only works with objects that 24922 are accessed by filehandle. 
24924 There are two styles of SECINFO_NO_NAME, as determined by the value 24925 of the secinfo_style4 enumeration. If SECINFO_STYLE4_CURRENT_FH is 24926 passed, then SECINFO_NO_NAME is querying for the required security 24927 for the current filehandle. If SECINFO_STYLE4_PARENT is passed, then 24928 SECINFO_NO_NAME is querying for the required security of the current 24929 filehandle's parent. If the style selected is SECINFO_STYLE4_PARENT, 24930 then SECINFO should apply the same access methodology used for 24931 LOOKUPP when evaluating the traversal to the parent directory. 24932 Therefore, if the requester does not have the appropriate access to 24933 LOOKUPP the parent then SECINFO_NO_NAME must behave the same way and 24934 return NFS4ERR_ACCESS. 24936 If PUTFH, PUTPUBFH, PUTROOTFH, or RESTOREFH return NFS4ERR_WRONGSEC, 24937 then the client resolves the situation by sending a COMPOUND request 24938 that consists of PUTFH, PUTPUBFH, or PUTROOTFH immediately followed 24939 by SECINFO_NO_NAME, style SECINFO_STYLE4_CURRENT_FH. See Section 2.6 24940 for instructions on dealing with NFS4ERR_WRONGSEC error returns from 24941 PUTFH, PUTROOTFH, PUTPUBFH, or RESTOREFH. 24943 If SECINFO_STYLE4_PARENT is specified and there is no parent 24944 directory, SECINFO_NO_NAME MUST return NFS4ERR_NOENT. 24946 On success, the current filehandle is consumed (see 24947 Section 2.6.3.1.1.8), and if the next operation after SECINFO_NO_NAME 24948 tries to use the current filehandle, that operation will fail with 24949 the status NFS4ERR_NOFILEHANDLE. 24951 Everything else about SECINFO_NO_NAME is the same as SECINFO. See 24952 the discussion on SECINFO (Section 18.29.3). 24954 18.45.4. IMPLEMENTATION 24956 See the discussion on SECINFO (Section 18.29.4). 24958 18.46. Operation 53: SEQUENCE - Supply per-procedure sequencing and 24959 control 24961 Supply per-procedure sequencing and control 24963 18.46.1. ARGUMENT 24965 struct SEQUENCE4args { 24966 sessionid4 sa_sessionid; 24967 sequenceid4 sa_sequenceid; 24968 slotid4 sa_slotid; 24969 slotid4 sa_highest_slotid; 24970 bool sa_cachethis; 24971 }; 24973 18.46.2. RESULT 24975 const SEQ4_STATUS_CB_PATH_DOWN = 0x00000001; 24976 const SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING = 0x00000002; 24977 const SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED = 0x00000004; 24978 const SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008; 24979 const SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010; 24980 const SEQ4_STATUS_ADMIN_STATE_REVOKED = 0x00000020; 24981 const SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040; 24982 const SEQ4_STATUS_LEASE_MOVED = 0x00000080; 24983 const SEQ4_STATUS_RESTART_RECLAIM_NEEDED = 0x00000100; 24984 const SEQ4_STATUS_CB_PATH_DOWN_SESSION = 0x00000200; 24985 const SEQ4_STATUS_BACKCHANNEL_FAULT = 0x00000400; 24986 const SEQ4_STATUS_DEVID_CHANGED = 0x00000800; 24987 const SEQ4_STATUS_DEVID_DELETED = 0x00001000; 24989 struct SEQUENCE4resok { 24990 sessionid4 sr_sessionid; 24991 sequenceid4 sr_sequenceid; 24992 slotid4 sr_slotid; 24993 slotid4 sr_highest_slotid; 24994 slotid4 sr_target_highest_slotid; 24995 uint32_t sr_status_flags; 24996 }; 24998 union SEQUENCE4res switch (nfsstat4 sr_status) { 24999 case NFS4_OK: 25000 SEQUENCE4resok sr_resok4; 25001 default: 25002 void; 25003 }; 25005 18.46.3. DESCRIPTION 25007 The SEQUENCE operation is used by the server to implement session 25008 request control and the reply cache semantics. 25010 This operation MUST appear as the first operation of any COMPOUND in 25011 which it appears. 
The error NFS4ERR_SEQUENCE_POS will be returned 25012 when it is found in any position in a COMPOUND beyond the first. 25013 Operations other than SEQUENCE, BIND_CONN_TO_SESSION, EXCHANGE_ID, 25014 CREATE_SESSION, and DESTROY_SESSION, MUST NOT appear as the first 25015 operation in a COMPOUND. Such operations MUST yield the error 25016 NFS4ERR_OP_NOT_IN_SESSION if they do appear at the start of a 25017 COMPOUND. 25019 If SEQUENCE is received on a connection not associated with the 25020 session via CREATE_SESSION or BIND_CONN_TO_SESSION, and connection 25021 association enforcement is enabled (see Section 18.35), then the 25022 server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION. 25024 The sa_sessionid argument identifies the session this request applies 25025 to. The sr_sessionid result MUST equal sa_sessionid. 25027 The sa_slotid argument is the index in the reply cache for the 25028 request. The sa_sequenceid field is the sequence number of the 25029 request for the reply cache entry (slot). The sr_slotid result MUST 25030 equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid. 25032 The sa_highest_slotid argument is the highest slot ID the client has 25033 a request outstanding for; it could be equal to sa_slotid. The 25034 server returns two "highest_slotid" values: sr_highest_slotid, and 25035 sr_target_highest_slotid. The former is the highest slot ID the 25036 server will accept in future SEQUENCE operation, and SHOULD NOT be 25037 less than the value of sa_highest_slotid. (but see Section 2.10.5.1 25038 for an exception). The latter is the highest slot ID the server 25039 would prefer the client use on a future SEQUENCE operation. 25041 If sa_cachethis is TRUE, then the client is requesting that the 25042 server cache the entire reply in the server's reply cache; therefore 25043 the server MUST cache the reply (see Section 2.10.5.1.3). The server 25044 MAY cache the reply if sa_cachethis is FALSE. If the server does not 25045 cache the entire reply, it MUST still record that it executed the 25046 request at the specified slot and sequence ID. 25048 The response to the SEQUENCE operation contains a word of status 25049 flags (sr_status_flags) that can provide to the client information 25050 related to the status of the client's lock state and communications 25051 paths. Note that any status bits relating to lock state MAY be reset 25052 when lock state is lost due to a server restart (even if the session 25053 is persistent across restarts; session persistence does not imply 25054 lock state persistence) or the establishment of a new client 25055 instance. 25057 SEQ4_STATUS_CB_PATH_DOWN 25058 When set, indicates that the client has no operational backchannel 25059 path for any session associated with the client ID, making it 25060 necessary for the client to re-establish one. This bit remains 25061 set on all SEQUENCE responses on all sessions associated with the 25062 client ID until at least one backchannel is available on any 25063 session associated with the client ID. If the client fails to re- 25064 establish a backchannel for the client ID, it is subject to having 25065 recallable state revoked. 25067 SEQ4_STATUS_CB_PATH_DOWN_SESSION 25068 When set, indicates that the session has no operational 25069 backchannel. There are two reasons why 25070 SEQ4_STATUS_CB_PATH_DOWN_SESSION may be set and not 25071 SEQ4_STATUS_CB_PATH_DOWN. First is that a callback operation that 25072 applies specifically to the session (e.g. 
CB_RECALL_SLOT, see 25073 Section 20.8) needs to be sent. Second is that the server did 25074 send a callback operation, but the connection was lost before the 25075 reply. The server cannot be sure whether the client received the 25076 callback operation or not, and so, per rules on request retry, the 25077 server MUST retry the callback operation over the same session. 25078 The SEQ4_STATUS_CB_PATH_DOWN_SESSION bit is the indication to the 25079 client that it needs to associate a connection to the session's 25080 backchannel. This bit remains set on all SEQUENCE responses on 25081 the session until a backchannel connection for the session is 25082 available. If the client fails to re-establish a backchannel for 25083 the session, it is subject to having recallable state revoked. 25085 SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING 25086 When set, indicates that all GSS contexts assigned to the 25087 session's backchannel will expire within a period equal to the 25088 lease time. This bit remains set on all SEQUENCE replies until 25089 the expiration time of at least one context is beyond the lease 25090 period from the current time (relative to the time of when a 25091 SEQUENCE response was sent) or until all GSS contexts for the 25092 session's backchannel have expired. 25094 SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED 25095 When set, indicates all GSS contexts assigned to the session's 25096 backchannel have expired. This bit remains set on all SEQUENCE 25097 replies until at least one non-expired context for the session's 25098 backchannel has been established. 25100 SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED 25101 When set, indicates that the lease has expired and as a result the 25102 server released all of the client's locking state. This status 25103 bit remains set on all SEQUENCE replies until the loss of all such 25104 locks has been acknowledged by use of FREE_STATEID (see 25105 Section 18.38), or by establishing a new client instance by 25106 destroying all sessions (via DESTROY_SESSION), the client ID (via 25107 DESTROY_CLIENTID), and then invoking EXCHANGE_ID and 25108 CREATE_SESSION to establish a new client ID. 25110 SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED 25111 When set, indicates that some subset of the client's locks have 25112 been revoked due to expiration of the lease period followed by 25113 another client's conflicting lock request. This status bit 25114 remains set on all SEQUENCE replies until the loss of all such 25115 locks has been acknowledged by use of FREE_STATEID. 25117 SEQ4_STATUS_ADMIN_STATE_REVOKED 25118 When set, indicates that one or more locks have been revoked 25119 without expiration of the lease period, due to administrative 25120 action. This status bit remains set on all SEQUENCE replies until 25121 the loss of all such locks has been acknowledged by use of 25122 FREE_STATEID. 25124 SEQ4_STATUS_RECALLABLE_STATE_REVOKED 25125 When set, indicates that one or more recallable objects have been 25126 revoked without expiration of the lease period, due to the 25127 client's failure to return them when recalled, which may be a 25128 consequence of there being no working backchannel and the client 25129 failing to reestablish a backchannel per the 25130 SEQ4_STATUS_CB_PATH_DOWN, SEQ4_STATUS_CB_PATH_DOWN_SESSION, or 25131 SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED status flags. This status bit 25132 remains set on all SEQUENCE replies until the loss of all such 25133 locks has been acknowledged by use of FREE_STATEID.
25135 SEQ4_STATUS_LEASE_MOVED 25136 When set indicates that responsibility for lease renewal has been 25137 transferred to one or more new servers. This condition will 25138 continue until the client receives an NFS4ERR_MOVED error and the 25139 server receives the subsequent GETATTR for the fs_locations or 25140 fs_locations_info attribute for an access to each file system for 25141 which a lease has been moved to a new server. See 25142 Section 11.7.7.1. 25144 SEQ4_STATUS_RESTART_RECLAIM_NEEDED 25145 When set indicates that due to server restart the client must 25146 reclaim locking state. Until the client sends a global 25147 RECLAIM_COMPLETE (Section 18.51), every SEQUENCE operation will 25148 return SEQ4_STATUS_RESTART_RECLAIM_NEEDED. 25150 SEQ4_STATUS_BACKCHANNEL_FAULT 25151 The server has encountered an unrecoverable fault with the 25152 backchannel (e.g. it has lost track of the sequence ID for a slot 25153 in the backchannel). The client MUST stop sending more requests 25154 on the session's fore channel, wait for all outstanding requests 25155 to complete on the fore and back channel, and then destroy the 25156 session. 25158 SEQ4_STATUS_DEVID_CHANGED 25159 The client is using device ID notifications and the server has 25160 changed a device ID mapping held by the client. This flag will 25161 stay present until the client has obtained the new mapping with 25162 GETDEVICEINFO. 25164 SEQ4_STATUS_DEVID_DELETED 25165 The client is using device ID notifications and the server has 25166 deleted a device ID mapping held by the client. This flag will 25167 stay in effect until the client sends a GETDEVICEINFO on the 25168 device ID with a null value in the argument gdia_notify_types. 25170 The value of the sa_sequenceid argument relative to the cached 25171 sequence ID on the slot falls into one of three cases. 25173 o If the difference between sa_sequenceid and the server's cached 25174 sequence ID at the slot ID is two (2) or more, or if sa_sequenceid 25175 is less than the cached sequence ID (accounting for wraparound of 25176 the unsigned sequence ID value), then the server MUST return 25177 NFS4ERR_SEQ_MISORDERED. 25179 o If sa_sequenceid and the cached sequence ID are the same, this is 25180 a retry, and the server replies with the COMPOUND reply that is 25181 stored the reply cache. The lease is possibly renewed as 25182 described below. 25184 o If sa_sequenceid is one greater (accounting for wraparound) than 25185 the cached sequence ID, then this is a new request, and the slot's 25186 sequence ID is incremented. The operations subsequent to 25187 SEQUENCE, if any, are processed. If there are no other 25188 operations, the only other effects are to cache the SEQUENCE reply 25189 in the slot, maintain the session's activity, and possibly renew 25190 the lease. 25192 If the client reuses a slot ID and sequence ID for a completely 25193 different request, the server MAY treat the request as if it is retry 25194 of what it has already executed. The server MAY however detect the 25195 client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. 25197 If SEQUENCE returns an error, then the state of the slot (sequence 25198 ID, cached reply) MUST NOT change, and the associated lease MUST NOT 25199 be renewed. 25201 If SEQUENCE returns NFS4_OK, then the associated lease MUST be 25202 renewed (see Section 8.3), except if 25203 SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags. 25205 18.46.4. 
IMPLEMENTATION 25207 The server MUST maintain a mapping of session ID to client ID in 25208 order to validate any operations that follow SEQUENCE that take a 25209 stateid as an argument and/or result. 25211 If the client establishes a persistent session, then a SEQUENCE done 25212 after a server restart may encounter requests performed and recorded 25213 in a persistent reply cache before the server restart. In this case, 25214 SEQUENCE will be processed successfully, while requests which were 25215 not processed previously are rejected with NFS4ERR_DEADSESSION. 25217 Depending on which of the operations within the COMPOUND were 25218 successfully performed before the server restart, these operations 25219 will also have replies sent from the server reply cache. Note that 25220 when these operations establish locking state it is locking state 25221 that applies to the previous server instance and to the previous 25222 client ID, even though the server restart, which logically happened 25223 after these operations, eliminated that state. In the case of a 25224 partially executed COMPOUND, processing may reach an operation not 25225 processed during the earlier server instance, making this operation a 25226 new one and not performable on the existing session. In this case, 25227 NFS4ERR_DEADSESSION will be returned from that operation. 25229 18.47. Operation 54: SET_SSV - Update SSV for a Client ID 25231 18.47.1. ARGUMENT 25233 struct ssa_digest_input4 { 25234 SEQUENCE4args sdi_seqargs; 25235 }; 25237 struct SET_SSV4args { 25238 opaque ssa_ssv<>; 25239 opaque ssa_digest<>; 25240 }; 25242 18.47.2. RESULT 25244 struct ssr_digest_input4 { 25245 SEQUENCE4res sdi_seqres; 25246 }; 25248 struct SET_SSV4resok { 25249 opaque ssr_digest<>; 25250 }; 25252 union SET_SSV4res switch (nfsstat4 ssr_status) { 25253 case NFS4_OK: 25254 SET_SSV4resok ssr_resok4; 25255 default: 25256 void; 25257 }; 25259 18.47.3. DESCRIPTION 25261 This operation is used to update the SSV for a client ID. Before 25262 SET_SSV is called the first time on a client ID, the SSV is zero (0). 25263 The SSV is the key used for the SSV GSS mechanism (Section 2.10.8) 25265 SET_SSV MUST be preceded by a SEQUENCE operation in the same 25266 COMPOUND. It MUST NOT be used if the client did not opt for SP4_SSV 25267 state protection when the client ID was created (see Section 18.35); 25268 the server returns NFS4ERR_INVAL in that case. 25270 The field ssa_digest is computed as the output of the HMAC RFC2104 25271 [11] using the subkey derived from the SSV4_SUBKEY_MIC_I2T and 25272 current SSV as the key (See Section 2.10.8 for a description of 25273 subkeys), and an XDR encoded value of data type ssa_digest_input4. 25274 The field sdi_seqargs is equal to the arguments of the SEQUENCE 25275 operation for the COMPOUND procedure that SET_SSV is within. 25277 The argument ssa_ssv is XORed with the current SSV to produce the new 25278 SSV. The argument ssa_ssv SHOULD be generated randomly. 25280 In the response, ssr_digest is the output of the HMAC using the 25281 subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, and 25282 an XDR encoded value of data type ssr_digest_input4. The field 25283 sdi_seqres is equal to the results of the SEQUENCE operation for the 25284 COMPOUND procedure that SET_SSV is within. 25286 As noted in Section 18.35, the client and server can maintain 25287 multiple concurrent versions of the SSV. 
The client and server each 25288 MUST maintain an internal SSV version number, which is set to one (1) 25289 the first time SET_SSV executes on the server and the client receives 25290 the first SET_SSV reply. Each subsequent SET_SSV increases the 25291 internal SSV version number by one (1). The value of this version 25292 number corresponds to the smpt_ssv_seq, smt_ssv_seq, sspt_ssv_seq, 25293 and ssct_ssv_seq fields of the SSV GSS mechanism tokens (see 25294 Section 2.10.8). 25296 18.47.4. IMPLEMENTATION 25298 When the server receives ssa_digest, it MUST verify the digest by 25299 computing the digest the same way the client did and comparing it 25300 with ssa_digest. If the server gets a different result, this is an 25301 error, NFS4ERR_BAD_SESSION_DIGEST. This error might be the result of 25302 another SET_SSV from the same client ID changing the SSV. If so, the 25303 client recovers by issuing SET_SSV again with a recomputed digest 25304 based on the subkey of the new SSV. If the transport connection is 25305 dropped after the SET_SSV request is sent, but before the SET_SSV 25306 reply is received, then there are special considerations for recovery 25307 if the client has no more connections associated with sessions 25308 associated with the client ID of the SSV. See Section 18.34.4. 25310 Clients SHOULD NOT send an ssa_ssv that is equal to a previous 25311 ssa_ssv, nor equal to a previous or current SSV (including an ssa_ssv 25312 equal to zero since the SSV is initialized to zero when the client ID 25313 is created). 25315 Clients SHOULD send SET_SSV with RPCSEC_GSS privacy. Servers MUST 25316 support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE, 25317 SET_SSV }. 25319 A client SHOULD NOT send SET_SSV with the SSV GSS mechanism's 25320 credential because the purpose of SET_SSV is to seed the SSV from 25321 non-SSV credentials. Instead, SET_SSV SHOULD be sent with the 25322 credential of a user that is accessing the client ID for the first 25323 time (Section 2.10.7.3). However, if the client does send SET_SSV 25324 with SSV credentials, the digest protecting the arguments uses the 25325 value of the SSV before ssa_ssv is XORed in, and the digest 25326 protecting the results uses the value of the SSV after the ssa_ssv is 25327 XORed in. 25329 18.48. Operation 55: TEST_STATEID - Test stateids for validity 25331 Test a series of stateids for validity. 25333 18.48.1. ARGUMENT 25335 struct TEST_STATEID4args { 25336 stateid4 ts_stateids<>; 25337 }; 25339 18.48.2. RESULT 25341 struct TEST_STATEID4resok { 25342 nfsstat4 tsr_status_codes<>; 25343 }; 25345 union TEST_STATEID4res switch (nfsstat4 tsr_status) { 25346 case NFS4_OK: 25347 TEST_STATEID4resok tsr_resok4; 25348 default: 25349 void; 25350 }; 25352 18.48.3. DESCRIPTION 25354 The TEST_STATEID operation is used to check the validity of a set of 25355 stateids. It can be used at any time, but the client should 25356 use it when it receives an indication that one or more of 25357 its stateids have been invalidated due to lock revocation. This 25358 occurs when the SEQUENCE operation returns with one of the following 25359 sr_status_flags set: 25361 o SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED 25363 o SEQ4_STATUS_ADMIN_STATE_REVOKED 25365 o SEQ4_STATUS_RECALLABLE_STATE_REVOKED 25367 The client can use TEST_STATEID one or more times to test the 25368 validity of its stateids.
Each use of TEST_STATEID allows a large 25369 set of such stateids to be tested and allows problems with earlier 25370 stateids not to interfere with checking of subsequent ones as would 25371 happen if individual stateids are tested by operation in a COMPOUND. 25373 For each stateid, the server returns the status code that would be 25374 returned if that stateid were to be used in normal operation. 25375 Returning such a status indication is not an error and does not cause 25376 compound processing to terminate. Checks for the validity of the 25377 stateid proceed as they would for normal operations with a number of 25378 exceptions: 25380 o There is no check for the type of stateid object, as would be the 25381 case for normal use of a stateid. 25383 o There is no reference to the current filehandle. 25385 o Special stateids are always considered invalid (they result in the 25386 error code NFS4ERR_BAD_STATEID). 25388 All stateids are interpreted as being associated with the client for 25389 the current session. Any possible association with a previous 25390 instance of the client (as stale stateids) is not considered. 25392 The errors which are validly returned within the status_code array 25393 are: NFS4ERR_OK, NFS4ERR_BAD_STATEID, NFS4ERR_OLD_STATEID, 25394 NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED. 25396 18.48.4. IMPLEMENTATION 25398 See Section 8.2.2 and Section 8.2.4 for a discussion of stateid 25399 structure, lifetime, and validation. 25401 18.49. Operation 56: WANT_DELEGATION - Request Delegation 25402 18.49.1. ARGUMENT 25404 union deleg_claim4 switch (open_claim_type4 dc_claim) { 25405 /* 25406 * No special rights to object. Ordinary delegation 25407 * request of the specified object. Object identified 25408 * by filehandle. 25409 */ 25410 case CLAIM_FH: /* new to v4.1 */ 25411 /* CURRENT_FH: object being delegated */ 25412 void; 25414 /* 25415 * Right to file based on a delegation granted 25416 * to a previous boot instance of the client. 25417 * File is specified by filehandle. 25418 */ 25419 case CLAIM_DELEG_PREV_FH: /* new to v4.1 */ 25420 /* CURRENT_FH: object being delegated */ 25421 void; 25423 /* 25424 * Right to the file established by an open previous 25425 * to server reboot. File identified by filehandle. 25426 * Used during server reclaim grace period. 25427 */ 25428 case CLAIM_PREVIOUS: 25429 /* CURRENT_FH: object being reclaimed */ 25430 open_delegation_type4 dc_delegate_type; 25431 }; 25433 struct WANT_DELEGATION4args { 25434 uint32_t wda_want; 25435 deleg_claim4 wda_claim; 25436 }; 25438 18.49.2. RESULT 25440 union WANT_DELEGATION4res switch (nfsstat4 wdr_status) { 25441 case NFS4_OK: 25442 open_delegation4 wdr_resok4; 25443 default: 25444 void; 25445 }; 25447 18.49.3. DESCRIPTION 25449 Where this description mandates the return of a specific error code 25450 for a specific condition, and where multiple conditions apply, the 25451 server MAY return any of the mandated error codes. 25453 This operation allows a client to: 25455 o Get a delegation on all types of files except directories. 25457 o Register a "want" for a delegation for the specified file object, 25458 and be notified via a callback when the delegation is available. 25459 The server MAY support notifications of availability via 25460 callbacks. 
If the server does not support registration of wants 25461 it MUST NOT return an error to indicate that, and instead MUST 25462 return with ond_why set to WND4_CONTENTION or WND4_RESOURCE and 25463 ond_server_will_push_deleg or ond_server_will_signal_avail set to 25464 FALSE. When the server indicates that it will notify the client 25465 by means of a callback, it will either provide the delegation 25466 using a CB_PUSH_DELEG operation, or cancel its promise by sending 25467 a CB_WANTS_CANCELLED operation. 25469 o Cancel a want for a delegation. 25471 The client SHOULD NOT set OPEN4_SHARE_ACCESS_READ and SHOULD NOT set 25472 OPEN4_SHARE_ACCESS_WRITE in wda_want. If it does, the server MUST 25473 ignore them. 25475 The meanings of the following flags in wda_want are the same as they 25476 are in OPEN: 25478 o OPEN4_SHARE_ACCESS_WANT_READ_DELEG 25480 o OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG 25482 o OPEN4_SHARE_ACCESS_WANT_ANY_DELEG 25484 o OPEN4_SHARE_ACCESS_WANT_NO_DELEG 25486 o OPEN4_SHARE_ACCESS_WANT_CANCEL 25488 o OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL 25490 o OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED 25492 The handling of the above flags in WANT_DELEGATION is the same as in 25493 OPEN. Information about the delegation and/or the promises the 25494 server is making regarding future callbacks are the same as those 25495 described in the open_delegation4 structure. 25497 The successful results of WANT_DELEG are of type open_delegation4 25498 which is the same type as the "delegation" field in the results of 25499 the OPEN operation (see Section 18.16.3). The server constructs 25500 wdr_resok4 the same way it constructs OPEN's "delegation" with one 25501 difference: WANT_DELEGATION MUST NOT return a delegation type of 25502 OPEN_DELEGATE_NONE. 25504 If (wda_want & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) is zero then the 25505 client is indicating no desire for a delegation and the server MUST 25506 return NFS4ERR_INVAL. 25508 The client uses the OPEN4_SHARE_ACCESS_WANT_NO_DELEG flag in the 25509 WANT_DELEGATION operation to cancel a previously requested want for a 25510 delegation. Note that if the server is in the process of sending the 25511 delegation (via CB_PUSH_DELEG) at the time the client sends a 25512 cancellation of the want, the delegation might still be pushed to the 25513 client. 25515 If WANT_DELEGATION fails to return a delegation, and the server 25516 returns NFS4_OK, the server MUST set the delegation type to 25517 OPEN4_DELEGATE_NONE_EXT, and set od_whynone, as described in 25518 Section 18.16. Write delegations are not available for file types 25519 that are not writeable. This includes file objects of types: NF4BLK, 25520 NF4CHR, NF4LNK, NF4SOCK, and NF4FIFO. If the client requests 25521 OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG without 25522 OPEN4_SHARE_ACCESS_WANT_READ_DELEG on an object with one of the 25523 aforementioned file types, the server must set 25524 WND4_WRITE_DELEG_NOT_SUPP_FTYPE. 25526 18.49.4. IMPLEMENTATION 25528 A request for a conflicting delegation is not normally intended to 25529 trigger the recall of the existing delegation. Servers may choose to 25530 treat some clients as having higher priority such that their wants 25531 will trigger recall of an existing delegation, although that is 25532 expected to be an unusual situation. 25534 Servers will generally recall delegations assigned by WANT_DELEGATION 25535 on the same basis as those assigned by OPEN. 
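A non-normative Python sketch of a client interpreting WANT_DELEGATION results as described above follows; the interpret_want_delegation() helper and the attribute-style object model are assumptions of the sketch, while the field names (od_whynone, ond_why, ond_server_will_push_deleg, ond_server_will_signal_avail) follow the description in this section.

   def interpret_want_delegation(wdr_resok4):
       # wdr_resok4 stands in for a decoded open_delegation4 result.  If
       # no delegation was granted, od_whynone explains why.
       whynone = getattr(wdr_resok4, "od_whynone", None)
       if whynone is None:
           return "granted"                  # a delegation was delivered now
       if whynone.ond_why == "WND4_CONTENTION":
           promised = whynone.ond_server_will_push_deleg
       elif whynone.ond_why == "WND4_RESOURCE":
           promised = whynone.ond_server_will_signal_avail
       else:
           promised = False
       # If promised, the client waits for CB_PUSH_DELEG or
       # CB_RECALLABLE_OBJ_AVAIL (or CB_WANTS_CANCELLED); otherwise it
       # may simply retry later.
       return "wait-for-callback" if promised else "no-promise"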
CB_RECALL will 25536 generally be done only when other clients perform operations 25537 inconsistent with the delegation. The normal response to aging of 25538 delegations is to use CB_RECALL_ANY, in order to give the client the 25539 opportunity to keep the delegations most useful from its point of 25540 view. 25542 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing client ID 25544 Destroy existing client ID. 25546 18.50.1. ARGUMENT 25548 struct DESTROY_CLIENTID4args { 25549 clientid4 dca_clientid; 25550 }; 25552 18.50.2. RESULT 25554 struct DESTROY_CLIENTID4res { 25555 nfsstat4 dcr_status; 25556 }; 25558 18.50.3. DESCRIPTION 25560 The DESTROY_CLIENTID operation destroys the client ID. If there are 25561 sessions (both idle and non-idle), opens, locks, delegations, 25562 layouts, and/or wants (Section 18.49) associated with the unexpired 25563 lease of the client ID, the server MUST return NFS4ERR_CLIENTID_BUSY. 25564 DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as 25565 the client ID derived from the session ID of SEQUENCE is not the same 25566 as the client ID to be destroyed. If the client IDs are the same, 25567 then the server MUST return NFS4ERR_CLIENTID_BUSY. 25569 If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only 25570 operation in the COMPOUND request (otherwise the server MUST return 25571 NFS4ERR_NOT_ONLY_OP). If the operation is sent without a SEQUENCE 25572 preceding it, a client that retransmits the request may receive an 25573 error in response, because the original request might have been 25574 successfully executed. 25576 18.50.4. IMPLEMENTATION 25578 DESTROY_CLIENTID allows a server to immediately reclaim the resources 25579 consumed by an unused client ID, and also to forget that it ever 25580 generated the client ID. By forgetting it ever generated the client 25581 ID the server can safely reuse the client ID on a future EXCHANGE_ID 25582 operation. 25584 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished 25586 Indicate transition between reclaim and non-reclaim locking. 25588 18.51.1. ARGUMENT 25590 struct RECLAIM_COMPLETE4args { 25591 /* 25592 * If rca_one_fs TRUE, 25593 * 25594 * CURRENT_FH: object in 25595 * filesystem reclaim is 25596 * complete for. 25597 */ 25598 bool rca_one_fs; 25599 }; 25601 18.51.2. RESULTS 25603 struct RECLAIM_COMPLETE4res { 25604 nfsstat4 rcr_status; 25605 }; 25607 18.51.3. DESCRIPTION 25609 A RECLAIM_COMPLETE operation is used to indicate that the client has 25610 reclaimed all of the locking state that it will recover, when it is 25611 recovering state due to either a server restart or the transfer of a 25612 file system to another server. There are two types of 25613 RECLAIM_COMPLETE operations: 25615 o When rca_one_fs is FALSE, a global RECLAIM_COMPLETE is being done. 25616 This indicates that recovery of all locks that the client held on 25617 the previous server instance have been completed. 25619 o When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE 25620 is being done. This indicates that recovery of locks for a single 25621 fs (the one designated by the current filehandle) due to a file 25622 system transition have been completed. Presence of a current 25623 filehandle is only required when rca_one_fs is set to TRUE. 25625 Once a RECLAIM_COMPLETE is done, there can be no further reclaim 25626 operations for locks whose scope is defined as having completed 25627 recovery. 
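To make the required ordering concrete, the following non-normative Python sketch shows a client reclaiming what it can and then sending the global RECLAIM_COMPLETE; do_reclaim() and do_reclaim_complete() are assumed stand-ins for the client's actual requests (e.g., OPEN with claim type CLAIM_PREVIOUS, and a { SEQUENCE, RECLAIM_COMPLETE } with rca_one_fs set to FALSE).

   def recover_after_restart(reclaimable_state, do_reclaim, do_reclaim_complete):
       # Reclaim each piece of state that the client still knows about.
       reclaimed, lost = [], []
       for item in reclaimable_state:
           try:
               reclaimed.append(do_reclaim(item))
           except Exception:            # e.g. NFS4ERR_NO_GRACE for that item
               lost.append(item)
       # The global RECLAIM_COMPLETE is sent even if nothing was reclaimed
       # and MUST precede any non-reclaim operation that obtains a lock.
       do_reclaim_complete()
       return reclaimed, lost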
Once the client sends RECLAIM_COMPLETE, the server will 25628 not allow the client to do subsequent reclaims of locking state for 25629 that scope and if these are attempted, will return NFS4ERR_NO_GRACE. 25631 Whenever a client establishes a new client ID and before it does the 25632 first non-reclaim operation that obtains a lock, it MUST do a global 25633 RECLAIM_COMPLETE, even if there are no locks to reclaim. If non- 25634 reclaim locking operations are done before the RECLAIM_COMPLETE, an 25635 NFS4ERR_GRACE error will be returned. 25637 Similarly, when the client accesses a file system on a new server, 25638 before it sends the first non-reclaim operation that obtains a lock 25639 on this new server, it must do a RECLAIM_COMPLETE with rca_one_fs set 25640 to TRUE and current filehandle within that file system, even if there 25641 are no locks to reclaim. If non-reclaim locking operations are done 25642 on that file system before the RECLAIM_COMPLETE, an NFS4ERR_GRACE 25643 error will be returned. 25645 Any locks not reclaimed at the point at which RECLAIM_COMPLETE is 25646 done become non-reclaimable. The client MUST NOT attempt to reclaim 25647 them, either during the current server instance or in any subsequent 25648 server instance, or on another server to which responsibility for 25649 that file system is transferred. If the client were to do so, it 25650 would be violating the protocol by representing itself as owning 25651 locks that it does not own, and so has no right to reclaim. See 25652 Section 8.4.3 for a discussion of edge conditions related to lock 25653 reclaim. 25655 By sending a RECLAIM_COMPLETE, the client indicates readiness to 25656 proceed to do normal non-reclaim locking operations. The client 25657 should be aware that such operations may temporarily result in 25658 NFS4ERR_GRACE errors until the server is ready to terminate its grace 25659 period. 25661 18.51.4. IMPLEMENTATION 25663 Servers will typically use the information as to when reclaim 25664 activity is complete to reduce the length of the grace period. When 25665 the server maintains in persistent storage a list of clients that 25666 might have had locks, it is in a position to use the fact that all 25667 such clients have done a RECLAIM_COMPLETE to terminate the grace 25668 period and begin normal operations (i.e. grant requests for new 25669 locks) sooner than it might otherwise. 25671 Latency can be minimized by doing a RECLAIM_COMPLETE as part of the 25672 COMPOUND request in which the last lock-reclaiming operation is done. 25673 When there are no reclaims to be done, RECLAIM_COMPLETE should be 25674 done immediately in order to allow the grace period to end as soon as 25675 possible. 25677 RECLAIM_COMPLETE should only be done once for each server instance, 25678 or occasion of the transition of a file system. If it is done a 25679 second time, the error NFS4ERR_COMPLETE_ALREADY will result. Note 25680 that because of the session feature's retry protection, retries of 25681 COMPOUND requests containing RECLAIM_COMPLETE operation will not 25682 result in this error. 25684 When a RECLAIM_COMPLETE is done, the client effectively acknowledges 25685 any locks not yet reclaimed as lost. This allows the server to again 25686 mark this client as able to subsequently recover locks if it had been 25687 prevented from doing so, be by logic to prevent the occurrence of 25688 edge conditions, as described in Section 8.4.3. 25690 18.52. Operation 10044: ILLEGAL - Illegal operation 25692 18.52.1. 
ARGUMENTS 25694 void; 25696 18.52.2. RESULTS 25698 struct ILLEGAL4res { 25699 nfsstat4 status; 25700 }; 25702 18.52.3. DESCRIPTION 25704 This operation is a placeholder for encoding a result to handle the 25705 case of the client sending an operation code within COMPOUND that is 25706 not supported. See the COMPOUND procedure description for more 25707 details. 25709 The status field of ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. 25711 18.52.4. IMPLEMENTATION 25713 A client will probably not send an operation with code OP_ILLEGAL but 25714 if it does, the response will be ILLEGAL4res just as it would be with 25715 any other invalid operation code. Note that if the server gets an 25716 illegal operation code that is not OP_ILLEGAL, and if the server 25717 checks for legal operation codes during the XDR decode phase, then 25718 the ILLEGAL4res would not be returned. 25720 19. NFSv4.1 Callback Procedures 25722 The procedures used for callbacks are defined in the following 25723 sections. In the interest of clarity, the terms "client" and 25724 "server" refer to NFS clients and servers, despite the fact that for 25725 an individual callback RPC, the sense of these terms would be 25726 precisely the opposite. 25728 Both procedures, CB_NULL and CB_COMPOUND, MUST be implemented. 25730 19.1. Procedure 0: CB_NULL - No Operation 25732 19.1.1. ARGUMENTS 25734 void; 25736 19.1.2. RESULTS 25738 void; 25740 19.1.3. DESCRIPTION 25742 CB_NULL is the standard ONC RPC NULL procedure, with the standard 25743 void argument and void response. Even though there is no direct 25744 functionality associated with this procedure, the server will use 25745 CB_NULL to confirm the existence of a path for RPCs from the server 25746 to client. 25748 19.1.4. ERRORS 25750 None. 25752 19.2. Procedure 1: CB_COMPOUND - Compound Operations 25754 19.2.1. ARGUMENTS 25756 enum nfs_cb_opnum4 { 25757 OP_CB_GETATTR = 3, 25758 OP_CB_RECALL = 4, 25759 /* Callback operations new to NFSv4.1 */ 25760 OP_CB_LAYOUTRECALL = 5, 25761 OP_CB_NOTIFY = 6, 25762 OP_CB_PUSH_DELEG = 7, 25763 OP_CB_RECALL_ANY = 8, 25764 OP_CB_RECALLABLE_OBJ_AVAIL = 9, 25765 OP_CB_RECALL_SLOT = 10, 25766 OP_CB_SEQUENCE = 11, 25767 OP_CB_WANTS_CANCELLED = 12, 25768 OP_CB_NOTIFY_LOCK = 13, 25769 OP_CB_NOTIFY_DEVICEID = 14, 25771 OP_CB_ILLEGAL = 10044 25772 }; 25773 union nfs_cb_argop4 switch (unsigned argop) { 25774 case OP_CB_GETATTR: 25775 CB_GETATTR4args opcbgetattr; 25776 case OP_CB_RECALL: 25777 CB_RECALL4args opcbrecall; 25778 case OP_CB_LAYOUTRECALL: 25779 CB_LAYOUTRECALL4args opcblayoutrecall; 25780 case OP_CB_NOTIFY: 25781 CB_NOTIFY4args opcbnotify; 25782 case OP_CB_PUSH_DELEG: 25783 CB_PUSH_DELEG4args opcbpush_deleg; 25784 case OP_CB_RECALL_ANY: 25785 CB_RECALL_ANY4args opcbrecall_any; 25786 case OP_CB_RECALLABLE_OBJ_AVAIL: 25787 CB_RECALLABLE_OBJ_AVAIL4args opcbrecallable_obj_avail; 25788 case OP_CB_RECALL_SLOT: 25789 CB_RECALL_SLOT4args opcbrecall_slot; 25790 case OP_CB_SEQUENCE: 25791 CB_SEQUENCE4args opcbsequence; 25792 case OP_CB_WANTS_CANCELLED: 25793 CB_WANTS_CANCELLED4args opcbwants_cancelled; 25794 case OP_CB_NOTIFY_LOCK: 25795 CB_NOTIFY_LOCK4args opcbnotify_lock; 25796 case OP_CB_NOTIFY_DEVICEID: 25797 CB_NOTIFY_DEVICEID4args opcbnotify_deviceid; 25798 case OP_CB_ILLEGAL: void; 25799 }; 25801 struct CB_COMPOUND4args { 25802 utf8str_cs tag; 25803 uint32_t minorversion; 25804 uint32_t callback_ident; 25805 nfs_cb_argop4 argarray<>; 25806 }; 25808 19.2.2. 
RESULTS 25810 union nfs_cb_resop4 switch (unsigned resop) { 25811 case OP_CB_GETATTR: CB_GETATTR4res opcbgetattr; 25812 case OP_CB_RECALL: CB_RECALL4res opcbrecall; 25814 /* new NFSv4.1 operations */ 25815 case OP_CB_LAYOUTRECALL: 25816 CB_LAYOUTRECALL4res 25817 opcblayoutrecall; 25819 case OP_CB_NOTIFY: CB_NOTIFY4res opcbnotify; 25821 case OP_CB_PUSH_DELEG: CB_PUSH_DELEG4res 25822 opcbpush_deleg; 25824 case OP_CB_RECALL_ANY: CB_RECALL_ANY4res 25825 opcbrecall_any; 25827 case OP_CB_RECALLABLE_OBJ_AVAIL: 25828 CB_RECALLABLE_OBJ_AVAIL4res 25829 opcbrecallable_obj_avail; 25831 case OP_CB_RECALL_SLOT: 25832 CB_RECALL_SLOT4res 25833 opcbrecall_slot; 25835 case OP_CB_SEQUENCE: CB_SEQUENCE4res opcbsequence; 25837 case OP_CB_WANTS_CANCELLED: 25838 CB_WANTS_CANCELLED4res 25839 opcbwants_cancelled; 25841 case OP_CB_NOTIFY_LOCK: 25842 CB_NOTIFY_LOCK4res 25843 opcbnotify_lock; 25845 case OP_CB_NOTIFY_DEVICEID: 25846 CB_NOTIFY_DEVICEID4res 25847 opcbnotify_deviceid; 25849 /* Not new operation */ 25850 case OP_CB_ILLEGAL: CB_ILLEGAL4res opcbillegal; 25851 }; 25852 struct CB_COMPOUND4res { 25853 nfsstat4 status; 25854 utf8str_cs tag; 25855 nfs_cb_resop4 resarray<>; 25856 }; 25858 19.2.3. DESCRIPTION 25860 The CB_COMPOUND procedure is used to combine one or more of the 25861 callback procedures into a single RPC request. The main callback RPC 25862 program has two main procedures: CB_NULL and CB_COMPOUND. All other 25863 operations use the CB_COMPOUND procedure as a wrapper. 25865 During the processing of the CB_COMPOUND procedure, the client may 25866 find that it does not have the available resources to execute any or 25867 all of the operations within the CB_COMPOUND sequence. Refer to 25868 Section 2.10.5.4 for details. 25870 The minorversion field of the arguments MUST be the same as the 25871 minorversion of the COMPOUND procedure used to created the client ID 25872 and session. For NFSv4.1, minorversion MUST be set to 1. 25874 Contained within the CB_COMPOUND results is a 'status' field. This 25875 status must be equivalent to the status of the last operation that 25876 was executed within the CB_COMPOUND procedure. Therefore, if an 25877 operation incurred an error then the 'status' value will be the same 25878 error value as is being returned for the operation that failed. 25880 The "tag" field is handled the same way as that of COMPOUND procedure 25881 (see Section 16.2.3). 25883 Illegal operation codes are handled in the same way as they are 25884 handled for the COMPOUND procedure. 25886 19.2.4. IMPLEMENTATION 25888 The CB_COMPOUND procedure is used to combine individual operations 25889 into a single RPC request. The client interprets each of the 25890 operations in turn. If an operation is executed by the client and 25891 the status of that operation is NFS4_OK, then the next operation in 25892 the CB_COMPOUND procedure is executed. The client continues this 25893 process until there are no more operations to be executed or one of 25894 the operations has a status value other than NFS4_OK. 25896 19.2.5. ERRORS 25898 CB_COMPOUND will of course return every error that each operation on 25899 the backchannel can return (see Table 13). However if CB_COMPOUND 25900 returns zero operations, obviously the error returned by COMPOUND has 25901 nothing to do with an error returned by an operation. 
The list of 25902 errors CB_COMPOUND will return if it processes zero operations 25903 include: 25905 CB_COMPOUND error returns 25907 +------------------------------+------------------------------------+ 25908 | Error | Notes | 25909 +------------------------------+------------------------------------+ 25910 | NFS4ERR_BADCHAR | The tag argument has a character | 25911 | | the replier does not support. | 25912 | NFS4ERR_BADXDR | | 25913 | NFS4ERR_DELAY | | 25914 | NFS4ERR_INVAL | The tag argument is not in UTF-8 | 25915 | | encoding. | 25916 | NFS4ERR_MINOR_VERS_MISMATCH | | 25917 | NFS4ERR_SERVERFAULT | | 25918 | NFS4ERR_TOO_MANY_OPS | | 25919 | NFS4ERR_REP_TOO_BIG | | 25920 | NFS4ERR_REP_TOO_BIG_TO_CACHE | | 25921 | NFS4ERR_REQ_TOO_BIG | | 25922 +------------------------------+------------------------------------+ 25924 Table 23 25926 20. NFSv4.1 Callback Operations 25928 20.1. Operation 3: CB_GETATTR - Get Attributes 25930 20.1.1. ARGUMENT 25932 struct CB_GETATTR4args { 25933 nfs_fh4 fh; 25934 bitmap4 attr_request; 25935 }; 25937 20.1.2. RESULT 25939 struct CB_GETATTR4resok { 25940 fattr4 obj_attributes; 25941 }; 25943 union CB_GETATTR4res switch (nfsstat4 status) { 25944 case NFS4_OK: 25945 CB_GETATTR4resok resok4; 25946 default: 25947 void; 25948 }; 25950 20.1.3. DESCRIPTION 25952 The CB_GETATTR operation is used by the server to obtain the current 25953 modified state of a file that has been write delegated. The 25954 attributes size and change are the only ones guaranteed to be 25955 serviced by the client. See Section 10.4.3 for a full description of 25956 how the client and server are to interact with the use of CB_GETATTR. 25958 If the filehandle specified is not one for which the client holds a 25959 write delegation, an NFS4ERR_BADHANDLE error is returned. 25961 20.1.4. IMPLEMENTATION 25963 The client returns attrmask bits and the associated attribute values 25964 only for the change attribute, and attributes that it may change 25965 (time_modify, and size). 25967 20.2. Operation 4: CB_RECALL - Recall a Delegation 25969 20.2.1. ARGUMENT 25971 struct CB_RECALL4args { 25972 stateid4 stateid; 25973 bool truncate; 25974 nfs_fh4 fh; 25975 }; 25977 20.2.2. RESULT 25979 struct CB_RECALL4res { 25980 nfsstat4 status; 25981 }; 25983 20.2.3. DESCRIPTION 25985 The CB_RECALL operation is used to begin the process of recalling a 25986 delegation and returning it to the server. 25988 The truncate flag is used to optimize recall for a file object which 25989 is a regular file and is about to be truncated to zero. When it is 25990 TRUE, the client is freed of the obligation to propagate modified 25991 data for the file to the server, since this data is irrelevant. 25993 If the handle specified is not one for which the client holds a 25994 delegation, an NFS4ERR_BADHANDLE error is returned. 25996 If the stateid specified is not one corresponding to an open 25997 delegation for the file specified by the filehandle, an 25998 NFS4ERR_BAD_STATEID is returned. 26000 20.2.4. IMPLEMENTATION 26002 The client SHOULD reply to the callback immediately. Replying does 26003 not complete the recall except when the value of the reply's status 26004 field is neither NFS4ERR_DELAY nor NFS4_OK. The recall is not 26005 complete until the delegation is returned using a DELEGRETURN 26006 operation. 26008 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client 26009 20.3.1. 
ARGUMENT 26011 /* 26012 * NFSv4.1 callback arguments and results 26013 */ 26015 enum layoutrecall_type4 { 26016 LAYOUTRECALL4_FILE = LAYOUT4_RET_REC_FILE, 26017 LAYOUTRECALL4_FSID = LAYOUT4_RET_REC_FSID, 26018 LAYOUTRECALL4_ALL = LAYOUT4_RET_REC_ALL 26019 }; 26021 struct layoutrecall_file4 { 26022 nfs_fh4 lor_fh; 26023 offset4 lor_offset; 26024 length4 lor_length; 26025 stateid4 lor_stateid; 26026 }; 26028 union layoutrecall4 switch(layoutrecall_type4 lor_recalltype) { 26029 case LAYOUTRECALL4_FILE: 26030 layoutrecall_file4 lor_layout; 26031 case LAYOUTRECALL4_FSID: 26032 fsid4 lor_fsid; 26033 case LAYOUTRECALL4_ALL: 26034 void; 26035 }; 26037 struct CB_LAYOUTRECALL4args { 26038 layouttype4 clora_type; 26039 layoutiomode4 clora_iomode; 26040 bool clora_changed; 26041 layoutrecall4 clora_recall; 26042 }; 26044 20.3.2. RESULT 26046 struct CB_LAYOUTRECALL4res { 26047 nfsstat4 clorr_status; 26048 }; 26050 20.3.3. DESCRIPTION 26052 The CB_LAYOUTRECALL operation is used by the server to recall layouts 26053 from the client; as a result, the client will begin the process of 26054 returning layouts via LAYOUTRETURN. The CB_LAYOUTRECALL operation 26055 specifies one of three forms of recall processing with the value of 26056 layoutrecall_type4. The recall is either for a specific layout (by 26057 file), for an entire file system (FSID), or for all file systems 26058 (ALL). 26060 The behavior of the operation varies based on the value of the 26061 layoutrecall_type4. The value and behaviors are: 26063 LAYOUTRECALL4_FILE 26065 For a layout to match the recall request, the values of the 26066 following fields must match those of the layout: clora_type, 26067 clora_iomode, lor_fh, and the byte range specified by lor_offset 26068 and lor_length. The clora_iomode field may have a special value 26069 of LAYOUTIOMODE4_ANY. The special value LAYOUTIOMODE4_ANY will 26070 match any iomode originally returned in a layout; therefore it 26071 acts as a wild card. The other special value used is for 26072 lor_length. If lor_length has a value of NFS4_UINT64_MAX, the 26073 lor_length field means the maximum possible file size. If a 26074 matching layout is found, it MUST be returned using the 26075 LAYOUTRETURN operation (see Section 18.44). An example of the 26076 field's special value use is if clora_iomode is LAYOUTIOMODE4_ANY, 26077 lor_offset is zero, and lor_length is NFS4_UINT64_MAX, then the 26078 entire layout is to be returned. 26080 The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the 26081 client does not hold layouts for the file or if the client does 26082 not have any overlapping layouts for the specification in the 26083 layout recall. 26085 LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL 26087 If LAYOUTRECALL4_FSID is specified, the fsid specifies the file 26088 system for which any outstanding layouts MUST be returned. If 26089 LAYOUTRECALL4_ALL is specified, all outstanding layouts MUST be 26090 returned. In addition, LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL 26091 specify that all the storage device ID to storage device address 26092 mappings in the affected file system(s) are also recalled. The 26093 respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or 26094 LAYOUTRETURN4_ALL acknowledges to the server that the client 26095 invalidated the said device mappings. See Section 12.5.5.2.1.5 26096 for considerations with "bulk" recall of layouts. 
26098 The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the 26099 client does not hold layouts and does not have valid deviceid 26100 mappings. 26102 In processing the layout recall request, the client also varies its 26103 behavior based on the value of the clora_changed field. This field 26104 is used by the server to provide additional context for the reason 26105 why the layout is being recalled. A FALSE value for clora_changed 26106 indicates that no change in the layout is expected and the client may 26107 write modified data to the storage devices involved; this must be 26108 done prior to returning the layout via LAYOUTRETURN. A TRUE value 26109 for clora_changed indicates that the server is changing the layout. 26110 Examples of layout changes and reasons for a TRUE indication are: the 26111 metadata server is restriping the file or a permanent error has 26112 occurred on a storage device and the metadata server would like to 26113 provide a new layout for the file. Therefore, a clora_changed value 26114 of TRUE indicates some level of change for the layout and the client 26115 SHOULD NOT write and commit modified data to the storage devices. In 26116 this case, the client writes and commits data through the metadata 26117 server. 26119 See Section 12.5.3 for a description of how the lor_stateid field in 26120 the arguments is to be constructed. Note that the "seqid" field of 26121 lor_stateid MUST NOT be zero. See Section 8.2, Section 12.5.3, and 26122 Section 12.5.5.2 for a further discussion and requirements. 26124 20.3.4. IMPLEMENTATION 26126 The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL 26127 (recall of file delegations) in that the client responds to the 26128 request before actually returning layouts via the LAYOUTRETURN 26129 operation. While the client responds to the CB_LAYOUTRECALL 26130 immediately, the operation is not considered complete (i.e. 26131 considered pending) until all affected layouts are returned to the 26132 server via the LAYOUTRETURN operation. 26134 Before returning the layout to the server via LAYOUTRETURN, the 26135 client should wait for the response from in-process or in-flight 26136 READ, WRITE, or COMMIT operations that use the recalled layout. 26138 If the client is holding modified data which is affected by a 26139 recalled layout, the client has various options for writing the data 26140 to the server. As always, the client may write the data through the 26141 metadata server. In fact, the client may not have a choice other 26142 than writing to the metadata server when the clora_changed argument 26143 is TRUE and a new layout is unavailable from the server. However, 26144 the client may be able to write the modified data to the storage 26145 device if the clora_changed argument is FALSE; this needs to be done 26146 before returning the layout via LAYOUTRETURN. If the client were to 26147 obtain a new layout covering the modified data's range, then writing 26148 to the storage devices is an available alternative. Note that before 26149 obtaining a new layout, the client must first return the original 26150 layout. 26152 In the case of modified data being written while the layout is held, 26153 the client must use LAYOUTCOMMIT operations at the appropriate time; 26154 as required LAYOUTCOMMIT must be done before the LAYOUTRETURN. 
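The clora_changed-dependent flow described in the preceding paragraphs can be summarized by the following non-normative Python sketch, in which every callable is an assumed stand-in for the client's own I/O and operation machinery.

   def handle_cb_layoutrecall(clora_changed, dirty_ranges, write_through_ds,
                              write_through_mds, layoutcommit, layoutreturn):
       # dirty_ranges is whatever modified data the recalled layout covers.
       if dirty_ranges:
           if not clora_changed:
               # Layout is not changing: the client may flush via the
               # storage devices and then LAYOUTCOMMIT, all before the
               # layout is returned.
               write_through_ds(dirty_ranges)
               layoutcommit(dirty_ranges)
           else:
               # The layout is changing: write and commit the data
               # through the metadata server instead.
               write_through_mds(dirty_ranges)
       layoutreturn()   # the recall completes only once the layout is returned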
If a 26155 large amount of modified data is outstanding, the client may send 26156 LAYOUTRETURNs for portions of the recalled layout; this allows the 26157 server to monitor the client's progress and adherence to the original 26158 recall request. However, the last LAYOUTRETURN in a sequence of 26159 returns, MUST specify the full range being recalled (see 26160 Section 12.5.5.1 for details). 26162 If a server needs to delete a device ID, and there are layouts 26163 referring to the device ID, CB_LAYOUTRECALL MUST be invoked to cause 26164 the client to return all layouts referring to device ID before the 26165 server can delete the device ID. If the client does not return the 26166 affected layouts, the server MAY revoke the layouts. 26168 20.4. Operation 6: CB_NOTIFY - Notify directory changes 26170 Tell the client of directory changes. 26172 20.4.1. ARGUMENT 26174 /* 26175 * Directory notification types. 26176 */ 26177 enum notify_type4 { 26178 NOTIFY4_CHANGE_CHILD_ATTRS = 0, 26179 NOTIFY4_CHANGE_DIR_ATTRS = 1, 26180 NOTIFY4_REMOVE_ENTRY = 2, 26181 NOTIFY4_ADD_ENTRY = 3, 26182 NOTIFY4_RENAME_ENTRY = 4, 26183 NOTIFY4_CHANGE_COOKIE_VERIFIER = 5 26184 }; 26186 /* Changed entry information. */ 26187 struct notify_entry4 { 26188 component4 ne_file; 26189 fattr4 ne_attrs; 26190 }; 26192 /* Previous entry information */ 26193 struct prev_entry4 { 26194 notify_entry4 pe_prev_entry; 26195 /* what READDIR returned for this entry */ 26196 nfs_cookie4 pe_prev_entry_cookie; 26197 }; 26199 struct notify_remove4 { 26200 notify_entry4 nrm_old_entry; 26201 nfs_cookie4 nrm_old_entry_cookie; 26202 }; 26204 struct notify_add4 { 26205 /* 26206 * Information on object 26207 * possibly renamed over. 26208 */ 26209 notify_remove4 nad_old_entry<1>; 26210 notify_entry4 nad_new_entry; 26211 /* what READDIR would have returned for this entry */ 26212 nfs_cookie4 nad_new_entry_cookie<1>; 26213 prev_entry4 nad_prev_entry<1>; 26214 bool nad_last_entry; 26215 }; 26217 struct notify_attr4 { 26218 notify_entry4 na_changed_entry; 26219 }; 26221 struct notify_rename4 { 26222 notify_remove4 nrn_old_entry; 26223 notify_add4 nrn_new_entry; 26224 }; 26226 struct notify_verifier4 { 26227 verifier4 nv_old_cookieverf; 26228 verifier4 nv_new_cookieverf; 26229 }; 26231 /* 26232 * Objects of type notify_<>4 and 26233 * notify_device_<>4 are encoded in this. 26234 */ 26235 typedef opaque notifylist4<>; 26237 struct notify4 { 26238 /* composed from notify_type4 or notify_deviceid_type4 */ 26239 bitmap4 notify_mask; 26240 notifylist4 notify_vals; 26241 }; 26243 struct CB_NOTIFY4args { 26244 stateid4 cna_stateid; 26245 nfs_fh4 cna_fh; 26246 notify4 cna_changes<>; 26247 }; 26249 20.4.2. RESULT 26251 struct CB_NOTIFY4res { 26252 nfsstat4 cnr_status; 26253 }; 26255 20.4.3. DESCRIPTION 26257 The CB_NOTIFY operation is used by the server to send notifications 26258 to clients about changes to delegated directories The registration of 26259 notifications for the directories occurs when the delegation is 26260 established using GET_DIR_DELEGATION. These notifications are sent 26261 over the backchannel. The notification is sent once the original 26262 request has been processed on the server. The server will send an 26263 array of notifications for changes that might have occurred in the 26264 directory. The notifications are sent as list of pairs of bitmaps 26265 and values. See Section 3.3.7 for a description of how NFSv4.1 26266 bitmaps work. 
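As a concrete, non-normative illustration of the "pairs of bitmaps and values" encoding, the Python sketch below lists which notify_type4 values are present in a notify_mask, assuming the bitmap4 convention of Section 3.3.7 in which enum value n corresponds to bit (n mod 32) of word (n / 32).

   NOTIFY4_CHANGE_CHILD_ATTRS     = 0
   NOTIFY4_CHANGE_DIR_ATTRS       = 1
   NOTIFY4_REMOVE_ENTRY           = 2
   NOTIFY4_ADD_ENTRY              = 3
   NOTIFY4_RENAME_ENTRY           = 4
   NOTIFY4_CHANGE_COOKIE_VERIFIER = 5

   def notify_types_set(notify_mask):
       # notify_mask is a bitmap4: a list of 32-bit words.
       types = []
       for word_index, word in enumerate(notify_mask):
           for bit in range(32):
               if word & (1 << bit):
                   types.append(word_index * 32 + bit)
       return types

   # A mask with NOTIFY4_REMOVE_ENTRY and NOTIFY4_ADD_ENTRY set:
   assert notify_types_set([0b1100]) == [NOTIFY4_REMOVE_ENTRY,
                                         NOTIFY4_ADD_ENTRY]

The corresponding notify_vals field then carries the XDR-encoded values, in the same order as the bits set in the mask.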
26268 If the server has more notifications than can fit in the CB_COMPOUND 26269 request, it SHOULD send a sequence of serial CB_COMPOUND requests 26270 so that the client's view of the directory does not become confused. 26271 For example, if the server indicates a file named "foo" is added, and that 26272 the file "foo" is removed, the order in which the client receives 26273 these notifications needs to be the same as the order in which 26274 corresponding operations occurred on the server. 26276 If the client holding the delegation makes any changes in the 26277 directory that cause files or subdirectories to be added or removed, 26278 the server will notify that client of the resulting change(s). If 26279 the client holding the delegation is making attribute or cookie 26280 verifier changes only, the server does not need to send notifications 26281 to that client. The server will send the following information for 26282 each operation: 26284 NOTIFY4_ADD_ENTRY 26285 The server will send information about the new directory entry 26286 being created along with the cookie for that entry. The entry 26287 information (data type notify_add4) includes the component name of 26288 the entry and attributes. The server will send this type of entry 26289 when a file is actually being created, when an entry is being 26290 added to a directory as a result of a rename across directories 26291 (see below), and when a hard link is being created to an existing 26292 file. If this entry is added to the end of the directory, the 26293 server will set the nad_last_entry flag to TRUE. If the file is 26294 added such that there is at least one entry before it, the server 26295 will also return the previous entry information (nad_prev_entry, a 26296 variable-length array of up to one element; if the array is of 26297 zero length, there is no previous entry), along with its cookie. 26298 This is to help clients find the right location in their file name 26299 caches and directory caches where this entry should be cached. If 26300 the new entry's cookie is available, it will be in the 26301 nad_new_entry_cookie (another variable-length array of up to one 26302 element) field. If the addition of the entry causes another entry 26303 to be deleted (which can only happen in the rename case) 26304 atomically with the addition, then information on this entry is 26305 reported in nad_old_entry. 26307 NOTIFY4_REMOVE_ENTRY 26308 The server will send information about the directory entry being 26309 deleted. The server will also send the cookie value for the 26310 deleted entry so that clients can get to the cached information 26311 for this entry. 26313 NOTIFY4_RENAME_ENTRY 26314 The server will send information about both the old entry and the 26315 new entry. This includes name and attributes for each entry. In 26316 addition, if the rename causes the deletion of an entry (i.e. the 26317 case of a file renamed over) then this is reported in 26318 nrn_new_entry.nad_old_entry. This notification is only sent 26319 if both entries are in the same directory. If the rename is 26320 across directories, the server will send a remove notification to 26321 one directory and an add notification to the other directory, 26322 assuming both have a directory delegation. 26324 NOTIFY4_CHANGE_CHILD_ATTRS/NOTIFY4_CHANGE_DIR_ATTRS 26325 The client will use the attribute mask to inform the server of 26326 attributes for which it wants to receive notifications.
This 26327 change notification can be requested for both changes to the 26328 attributes of the directory as well as changes to any file's 26329 attributes in the directory by using two separate attribute masks. 26330 The client cannot ask for change attribute notification for a 26331 specific file. One attribute mask covers all the files in the 26332 directory. Upon any attribute change, the server will send back 26333 the values of changed attributes. Notifications might not make 26334 sense for some file-system-wide attributes and it is up to the 26335 server to decide which subset it wants to support. The client can 26336 negotiate the frequency of attribute notifications by letting the 26337 server know how often it wants to be notified of an attribute 26338 change. The server will return supported notification frequencies 26339 or an indication that no notification is permitted for directory 26340 or child attributes by setting the dir_notif_delay and 26341 dir_entry_notif_delay attributes, respectively. 26343 NOTIFY4_CHANGE_COOKIE_VERIFIER 26344 If the cookie verifier changes while a client is holding a 26345 delegation, the server will notify the client so that it can 26346 invalidate its cookies and re-send a READDIR to get the new set of 26347 cookies. 26349 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to Client 26351 Offers a previously requested delegation to the client. 26353 20.5.1. ARGUMENT 26355 struct CB_PUSH_DELEG4args { 26356 nfs_fh4 cpda_fh; 26357 open_delegation4 cpda_delegation; 26359 }; 26361 20.5.2. RESULT 26363 struct CB_PUSH_DELEG4res { 26364 nfsstat4 cpdr_status; 26365 }; 26367 20.5.3. DESCRIPTION 26369 CB_PUSH_DELEG is used by the server to both signal to the client that 26370 the delegation it wants (previously indicated via a want established 26371 from an OPEN or WANT_DELEGATION operation) is available and to 26372 simultaneously offer the delegation to the client. The client has 26373 the choice of accepting the delegation by returning NFS4_OK to the 26374 server, delaying the decision to accept the offered delegation by 26375 returning NFS4ERR_DELAY, or permanently rejecting the offer of the 26376 delegation by returning NFS4ERR_REJECT_DELEG. When a delegation is 26377 rejected in this fashion, the want previously established is 26378 permanently deleted and the delegation is subject to acquisition by 26379 another client. 26381 20.5.4. IMPLEMENTATION 26383 If the client does return NFS4ERR_DELAY and there is a conflicting 26384 delegation request, the server MAY process it at the expense of the 26385 client that returned NFS4ERR_DELAY. The client's want will typically 26386 not be cancelled, but MAY be processed behind other delegation requests 26387 or registered wants. 26389 When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or 26390 NFS4ERR_REJECT_DELEG, the want remains pending, although servers may 26391 decide to cancel the want by sending a CB_WANTS_CANCELLED. 26393 20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable objects 26395 Notify client to return all but N recallable objects. 26397 20.6.1.
ARGUMENT 26399 const RCA4_TYPE_MASK_RDATA_DLG = 0; 26400 const RCA4_TYPE_MASK_WDATA_DLG = 1; 26401 const RCA4_TYPE_MASK_DIR_DLG = 2; 26402 const RCA4_TYPE_MASK_FILE_LAYOUT = 3; 26403 const RCA4_TYPE_MASK_BLK_LAYOUT = 4; 26404 const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; 26405 const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9; 26406 const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12; 26407 const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15; 26409 struct CB_RECALL_ANY4args { 26410 uint32_t craa_objects_to_keep; 26411 bitmap4 craa_type_mask; 26412 }; 26414 20.6.2. RESULT 26416 struct CB_RECALL_ANY4res { 26417 nfsstat4 crar_status; 26418 }; 26420 20.6.3. DESCRIPTION 26422 The server may decide that it cannot hold all of the state for 26423 recallable objects, such as delegations and layouts, without running 26424 out of resources. In such a case, it is free to recall individual 26425 objects to reduce the load but this would be far from optimal. 26427 Because the general purpose of such recallable objects as delegations 26428 is to eliminate client interaction with the server, the server cannot 26429 interpret lack of recent use as indicating that the object is no 26430 longer useful. The absence of visible use may be the result of a 26431 large number of potential operations eliminated. In the case of 26432 layouts, the layout will be used explicitly but the metadata server 26433 does not have direct knowledge of such use. 26435 In order to implement an effective reclaim scheme for such objects, 26436 the server's knowledge of available resources must be used to 26437 determine when objects must be recalled with the clients selecting 26438 the actual objects to be returned. 26440 Server implementations may differ in their resource allocation 26441 requirements. For example, one server may share resources among all 26442 classes of recallable objects whereas another may use separate 26443 resource pools for layouts and for delegations, or further separate 26444 resources by types of delegations. 26446 When a given resource pool is over-utilized, the server can send a 26447 CB_RECALL_ANY to clients holding recallable objects of the types 26448 involved, allowing it to keep a certain number of such objects and 26449 return any excess. A mask specifies which types of objects are to be 26450 limited. The client chooses, based on its own knowledge of current 26451 usefulness, which of the objects in that class should be returned. 26453 A number of bits are defined. For some of these, ranges are defined 26454 and it is up to the definition of the storage protocol to specify how 26455 these are to be used. There are ranges reserved for object-based 26456 storage protocols and for other experimental storage protocols. An 26457 RFC defining such a storage protocol needs to specify how particular 26458 bits within its range are to be used. For example, it may specify a 26459 mapping between attributes of the layout (read vs. write, size of 26460 area) and the bit to be used or it may define a field in the layout 26461 where the associated bit position is made available by the server to 26462 the client. 26464 RCA4_TYPE_MASK_RDATA_DLG 26466 The client is to return read delegations on non-directory file 26467 objects. 26469 RCA4_TYPE_MASK_WDATA_DLG 26471 The client is to return write delegations on regular file objects. 26473 RCA4_TYPE_MASK_DIR_DLG 26475 The client is to return directory delegations. 26477 RCA4_TYPE_MASK_FILE_LAYOUT 26479 The client is to return layouts of type LAYOUT4_NFSV4_1_FILES. 
26481 RCA4_TYPE_MASK_BLK_LAYOUT 26483 See [31] for a description. 26485 RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX 26487 See [30] for a description. 26489 RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX 26491 This range is reserved for telling the client to recall layouts of 26492 experimental or site specific layout types (see Section 3.3.13). 26494 When a bit is set in the type mask that corresponds to an undefined 26495 type of recallable object, NFS4ERR_INVAL MUST be returned. When a 26496 bit is set that corresponds to a defined type of object, but the 26497 client does not support an object of the type, NFS4ERR_INVAL MUST NOT 26498 be returned. Future minor versions of NFSv4 may expand the set of 26499 valid type mask bits. 26501 CB_RECALL_ANY specifies a count of objects that the client may keep 26502 as opposed to a count that the client must return. This is to avoid 26503 potential race between a CB_RECALL_ANY that had a count of objects to 26504 free with a set of client-originated operations to return layouts or 26505 delegations. As a result of the race, the client and server would 26506 have differing ideas as to how many objects to return. Hence the 26507 client could mistakenly free too many. 26509 If resource demands prompt it, the server may send another 26510 CB_RECALL_ANY with a lower count, even it has not yet received an 26511 acknowledgement from the client for a previous CB_RECALL_ANY with the 26512 same type mask. Although the possibility exists that these will be 26513 received by the client in a order different from the order in which 26514 they were sent, any such permutation of the callback stream is 26515 harmless. It is the job of the client to bring down the size of the 26516 recallable object set in line with each CB_RECALL_ANY received and 26517 until that obligation is met it cannot be canceled or modified by any 26518 subsequent CB_RECALL_ANY for the same type mask. Thus if the server 26519 sends two CB_RECALL_ANY's, the effect will be the same as if the 26520 lower count was sent, whatever the order of recall receipt. Note 26521 that this means that a server may not cancel the effect of a 26522 CB_RECALL_ANY by sending another recall with a higher count. When a 26523 CB_RECALL_ANY is received and the count is already within the limit 26524 set or is above a limit that the client is working to get down to, 26525 that callback has no effect. 26527 Servers are generally free not to give out recallable objects when 26528 insufficient resources are available. Note that the effect of such a 26529 policy is implicitly to give precedence to existing objects relative 26530 to requested ones, with the result that resources might not be 26531 optimally used. To prevent this, servers are well advised to make 26532 the point at which they start issuing CB_RECALL_ANY callbacks 26533 somewhat below that at which they cease to give out new delegations 26534 and layouts. This allows the client to purge its less-used objects 26535 whenever appropriate and so continue to have its subsequent requests 26536 given new resources freed up by object returns. 26538 20.6.4. IMPLEMENTATION 26540 The client can choose to return any type of object specified by the 26541 mask. If a server wishes to limit use of objects of a specific type, 26542 it should only specify that type in the mask sent. 
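The "count to keep" semantics can be illustrated with a short non-normative Python sketch; the usefulness() ranking is an assumed client-side heuristic and is not defined by this specification.

   def objects_to_return(held_objects, craa_objects_to_keep, usefulness):
       # held_objects: the client's recallable objects of the recalled
       # types; craa_objects_to_keep is a count to KEEP, so the client
       # returns only what it holds beyond that count (possibly nothing).
       excess = len(held_objects) - craa_objects_to_keep
       if excess <= 0:
           return []                 # already within the requested limit
       # Return the least useful objects first, per the client's own
       # ranking of current usefulness.
       return sorted(held_objects, key=usefulness)[:excess]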
The client may 26543 not return requested objects and it is up to the server to handle 26544 this situation, typically by doing specific recalls to properly limit 26545 resource usage. The server should give the client enough time to 26546 return objects before proceeding to specific recalls. This time 26547 should not be less than the lease period. 26549 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal Resources for 26550 Recallable Objects 26552 Signals that resources are available to grant recallable objects. 26554 20.7.1. ARGUMENT 26556 typedef CB_RECALL_ANY4args CB_RECALLABLE_OBJ_AVAIL4args; 26558 20.7.2. RESULT 26560 struct CB_RECALLABLE_OBJ_AVAIL4res { 26561 nfsstat4 croa_status; 26562 }; 26564 20.7.3. DESCRIPTION 26566 CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client 26567 that the server has resources to grant recallable objects that might 26568 previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEG, 26569 or LAYOUTGET. 26571 The argument craa_objects_to_keep means the total number of 26572 recallable objects of the types indicated in the argument type_mask 26573 that the server believes it can allow the client to have, including 26574 the number of such objects the client already has. A client that 26575 tries to acquire more recallable objects than the server informs it 26576 can have runs the risk of having objects recalled. 26578 The server is not obligated to reserve the difference between the 26579 number of the objects the client currently has and the value of 26580 craa_objects_to_keep, nor does delaying the reply to 26581 CB_RECALLABLE_OBJ_AVAIL prevent the server from using the resources 26582 of the recallable objects for another purpose. Indeed, if a client 26583 responds slowly to CB_RECALLABLE_OBJ_AVAIL, the server might 26584 interpret the client as having reduced capability to manage 26585 recallable objects, and so cancel or reduce any reservation it is 26586 maintaining on behalf of the client. Thus if the client desires to 26587 acquire more recallable objects, it needs to reply quickly to 26588 CB_RECALLABLE_OBJ_AVAIL, and then send the appropriate operations to 26589 acquire recallable objects. 26591 20.8. Operation 10: CB_RECALL_SLOT - change flow control limits 26593 Change flow control limits 26595 20.8.1. ARGUMENT 26597 struct CB_RECALL_SLOT4args { 26598 slotid4 rsa_target_highest_slotid; 26599 }; 26601 20.8.2. RESULT 26603 struct CB_RECALL_SLOT4res { 26604 nfsstat4 rsr_status; 26605 }; 26607 20.8.3. DESCRIPTION 26609 The CB_RECALL_SLOT operation requests the client to return session 26610 slots, and if applicable, transport credits (e.g. RDMA credits for 26611 connections associated with the operations channel) of the session's 26612 fore channel. CB_RECALL_SLOT specifies rsa_target_highest_slotid, 26613 the value of the target highest slot ID the server wants for the 26614 session. The client MUST then progress toward reducing the session's 26615 highest slot ID to the target value. 26617 If the session has only non-RDMA connections associated with its 26618 operations channel, then the client need only wait for all 26619 outstanding requests with a slot ID > rsa_target_highest_slotid to 26620 complete, then send a single COMPOUND consisting of a single SEQUENCE 26621 operation, with the sa_highestslot field set to 26622 rsa_target_highest_slotid. 
If there are RDMA-based connections 26623 associated with operation channel, then the client needs to also send 26624 enough zero-length RDMA Sends to take the total RDMA credit count to 26625 rsa_target_highest_slotid + 1 or below. 26627 20.8.4. IMPLEMENTATION 26629 If the client fails to reduce highest slot it has on the fore channel 26630 to what the server requests, the server can force the issue by 26631 asserting flow control on the receive side of all connections bound 26632 to the fore channel, and then finish servicing all outstanding 26633 requests that are in slots greater than rsa_target_highest_slotid. 26634 Once that is done, the server can then open the flow control, and any 26635 time the client sends a new request on a slot greater than 26636 rsa_target_highest_slotid, the server can return NFS4ERR_BADSLOT. 26638 20.9. Operation 11: CB_SEQUENCE - Supply backchannel sequencing and 26639 control 26641 Sequence and control 26643 20.9.1. ARGUMENT 26645 struct referring_call4 { 26646 sequenceid4 rc_sequenceid; 26647 slotid4 rc_slotid; 26648 }; 26650 struct referring_call_list4 { 26651 sessionid4 rcl_sessionid; 26652 referring_call4 rcl_referring_calls<>; 26653 }; 26655 struct CB_SEQUENCE4args { 26656 sessionid4 csa_sessionid; 26657 sequenceid4 csa_sequenceid; 26658 slotid4 csa_slotid; 26659 slotid4 csa_highest_slotid; 26660 bool csa_cachethis; 26661 referring_call_list4 csa_referring_call_lists<>; 26662 }; 26664 20.9.2. RESULT 26666 struct CB_SEQUENCE4resok { 26667 sessionid4 csr_sessionid; 26668 sequenceid4 csr_sequenceid; 26669 slotid4 csr_slotid; 26670 slotid4 csr_highest_slotid; 26671 slotid4 csr_target_highest_slotid; 26672 }; 26674 union CB_SEQUENCE4res switch (nfsstat4 csr_status) { 26675 case NFS4_OK: 26676 CB_SEQUENCE4resok csr_resok4; 26677 default: 26678 void; 26679 }; 26681 20.9.3. DESCRIPTION 26683 The CB_SEQUENCE operation is used to manage operational accounting 26684 for the backchannel of the session on which a request is sent. The 26685 contents include the session ID to which this request belongs, the 26686 slot ID and sequence ID used by the server to implement session 26687 request control and exactly once semantics, and exchanged slot ID 26688 maxima which are used to adjust the size of the reply cache. This 26689 operation will appear once as the first operation in each CB_COMPOUND 26690 request or a protocol error MUST result. See Section 18.46.3 for a 26691 description of how slots are processed. 26693 If csa_cachethis is TRUE, then the server is requesting that the 26694 client cache the reply in the callback reply cache. The client MUST 26695 cache the reply (see Section 2.10.5.1.3). 26697 The csa_referring_call_lists array is the list of COMPOUND requests, 26698 identified by session ID, slot ID and sequence ID. These are 26699 requests that the client previously sent to the server. These 26700 previous requests created state that some operation(s) in the same 26701 CB_COMPOUND as the csa_referring_call_lists are identifying. A 26702 session ID is included because leased state is tied to a client ID, 26703 and a client ID can have multiple sessions. See Section 2.10.5.3. 26705 The value of the csa_sequenceid argument relative to the cached 26706 sequence ID on the slot falls into one of three cases. 
26708 o If the difference between csa_sequenceid and the client's cached 26709 sequence ID at the slot ID is two (2) or more, or if 26710 csa_sequenceid is less than the cached sequence ID (accounting for 26711 wraparound of the unsigned sequence ID value), then the client 26712 MUST return NFS4ERR_SEQ_MISORDERED. 26714 o If csa_sequenceid and the cached sequence ID are the same, this is 26715 a retry, and the client returns the CB_COMPOUND request's cached 26716 reply. 26718 o If csa_sequenceid is one greater (accounting for wraparound) than 26719 the cached sequence ID, then this is a new request, and the slot's 26720 sequence ID is incremented. The operations subsequent to 26721 CB_SEQUENCE, if any, are processed. If there are no other 26722 operations, the only other effects are to cache the CB_SEQUENCE 26723 reply in the slot, maintain the session's activity, and when the 26724 server receives the CB_SEQUENCE reply, renew the lease of state 26725 related to the client ID. 26727 If the server reuses a slot ID and sequence ID for a completely 26728 different request, the client MAY treat the request as if it is retry 26729 of what it has already executed. The client MAY however detect the 26730 server's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. 26732 If CB_SEQUENCE returns an error, then the state of the slot (sequence 26733 ID, cached reply) MUST NOT change. 26735 The client returns two "highest_slotid" values: csr_highest_slotid, 26736 and csr_target_highest_slotid. The former is the highest slot ID the 26737 client will accept in a future CB_SEQUENCE operation, and SHOULD NOT 26738 be less than the value of csa_highest_slotid (but see 26739 Section 2.10.5.1 for an exception). The latter is the highest slot 26740 ID the client would prefer the server use on a future CB_SEQUENCE 26741 operation. 26743 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation 26744 Wants 26746 Retracts promise to signal delegation availability. 26748 20.10.1. ARGUMENT 26750 struct CB_WANTS_CANCELLED4args { 26751 bool cwca_contended_wants_cancelled; 26752 bool cwca_resourced_wants_cancelled; 26753 }; 26755 20.10.2. RESULT 26757 struct CB_WANTS_CANCELLED4res { 26758 nfsstat4 cwcr_status; 26759 }; 26761 20.10.3. DESCRIPTION 26763 The CB_WANTS_CANCELLED operation is used to notify the client that 26764 the some or all wants it registered for recallable delegations and 26765 layouts have been canceled. 26767 If cwca_contended_wants_cancelled is TRUE, this indicates the server 26768 will not be pushing to the client any delegations that become 26769 available after contention passes. 26771 If cwca_resourced_wants_cancelled is TRUE, this indicates the server 26772 will not notify the client when there are resources on the server to 26773 grant delegations or layouts. 26775 After receiving a CB_WANTS_CANCELLED operation, the client is free to 26776 attempt to acquire the delegations or layouts it was waiting for, and 26777 possibly re-register wants. 26779 20.10.4. IMPLEMENTATION 26781 When a client has an OPEN, WANT_DELEGATION, or GET_DIR_DELEGATION 26782 request outstanding, when a CB_WANTS_CANCELLED is sent, the server 26783 may need to make clear to the client whether a promise to signal 26784 delegation availability happened before the CB_WANTS_CANCELLED and is 26785 thus covered by it, or after the CB_WANTS_CANCELLED in which case it 26786 was not covered by it. 
The server can make this distinction by putting the appropriate
requests into the list of referring calls in the associated
CB_SEQUENCE.

20.11.  Operation 13: CB_NOTIFY_LOCK - Notify of possible lock
        availability

Notify client of possible byte-range lock availability.

20.11.1.  ARGUMENT

   struct CB_NOTIFY_LOCK4args {
           nfs_fh4     cnla_fh;
           lock_owner4 cnla_lock_owner;
   };

20.11.2.  RESULT

   struct CB_NOTIFY_LOCK4res {
           nfsstat4 cnlr_status;
   };

20.11.3.  DESCRIPTION

The server can use this operation to indicate that a byte-range lock
for the given file and lock-owner, previously requested by the client
via an unsuccessful LOCK request, might be available.

This callback is meant to be used by servers to help reduce the
latency of blocking locks in the case where they recognize that a
client that has been polling for a blocking lock may now be able to
acquire the lock.  If the server supports this callback for a given
file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when
responding to successful opens for that file.  This does not commit
the server to the use of CB_NOTIFY_LOCK, but the client may use it as
a hint to decide how frequently to poll for locks derived from that
open.

If an OPEN operation results in an upgrade, in which the stateid
returned has an "other" value matching that of a stateid already
allocated, with a new "seqid" indicating a change in the lock being
represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag
when responding to that new OPEN controls handling from that point
going forward.  When parallel OPENs are done on the same file and
open-owner, the ordering of the "seqid" field of the returned stateid
(subject to wraparound) is to be used to select the controlling value
of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag.

20.11.4.  IMPLEMENTATION

The server MUST NOT grant the lock to the client unless and until it
receives an actual LOCK request from the client.  Similarly, the
client receiving this callback cannot assume that it now has the
lock, or that a subsequent LOCK request for the lock will be
successful.

The server is not required to implement this callback, and even if
it does, it is not required to use it in any particular case.
Therefore, the client must still rely on polling for blocking locks,
as described in Section 9.6.

Similarly, the client is not required to implement this callback,
and even if it does, it is still free to ignore it.  Therefore, the
server MUST NOT assume that the client will act based on the
callback.
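To make the intended client behavior concrete, the following is a
non-normative sketch in C of one way a client might react to
CB_NOTIFY_LOCK: it marks any matching blocked LOCK request for an
immediate retry while leaving its normal polling in place.  The
blocked_lock structure, the list handling, and the function name are
illustrative only and are not defined by this protocol; for brevity
the clientid field of lock_owner4 is not compared.

   #include <stdbool.h>
   #include <string.h>

   /* Illustrative client-side record of a byte-range LOCK request
      that was denied and is being polled for. */
   struct blocked_lock {
           struct blocked_lock *next;
           unsigned char  fh[128];      /* filehandle (<= NFS4_FHSIZE)  */
           unsigned int   fh_len;
           unsigned char  owner[1024];  /* lock_owner4 "owner" opaque   */
           unsigned int   owner_len;
           bool           retry_now;    /* poller should retry LOCK now */
   };

   static struct blocked_lock *blocked_list;  /* locking omitted */

   /* Handle CB_NOTIFY_LOCK: the callback is only a hint, so the
      retried LOCK may still fail and normal polling continues. */
   static void
   cb_notify_lock(const unsigned char *fh, unsigned int fh_len,
                  const unsigned char *owner, unsigned int owner_len)
   {
           struct blocked_lock *bl;

           for (bl = blocked_list; bl != NULL; bl = bl->next) {
                   if (bl->fh_len == fh_len &&
                       memcmp(bl->fh, fh, fh_len) == 0 &&
                       bl->owner_len == owner_len &&
                       memcmp(bl->owner, owner, owner_len) == 0)
                           bl->retry_now = true;
           }
   }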
20.12.  Operation 14: CB_NOTIFY_DEVICEID - Notify device ID changes

Tell the client of device ID changes.

20.12.1.  ARGUMENT

   /*
    * Device notification types.
    */
   enum notify_deviceid_type4 {
           NOTIFY_DEVICEID4_CHANGE = 1,
           NOTIFY_DEVICEID4_DELETE = 2
   };

   /* For NOTIFY_DEVICEID4_DELETE */
   struct notify_deviceid_delete4 {
           layouttype4 ndd_layouttype;
           deviceid4   ndd_deviceid;
   };

   /* For NOTIFY_DEVICEID4_CHANGE */
   struct notify_deviceid_change4 {
           layouttype4 ndc_layouttype;
           deviceid4   ndc_deviceid;
           bool        ndc_immediate;
   };

   struct CB_NOTIFY_DEVICEID4args {
           notify4 cnda_changes<>;
   };

20.12.2.  RESULT

   struct CB_NOTIFY_DEVICEID4res {
           nfsstat4 cndr_status;
   };

20.12.3.  DESCRIPTION

The CB_NOTIFY_DEVICEID operation is used by the server to send
notifications to clients about changes to pNFS device IDs.  The
registration of device ID notifications is optional and is done via
GETDEVICEINFO.  These notifications are sent over the backchannel
once the original request has been processed on the server.  The
server will send an array of notifications, cnda_changes, as a list
of pairs of bitmaps and values.  See Section 3.3.7 for a description
of how NFSv4.1 bitmaps work.

As with CB_NOTIFY (Section 20.4.3), it is possible the server has
more notifications than can fit in a CB_COMPOUND, thus requiring
multiple CB_COMPOUNDs.  Unlike CB_NOTIFY, serialization is not an
issue because, unlike directory entries, device IDs cannot be reused
after being deleted (Section 12.2.10).

All device ID notifications contain a device ID and a layout type.
The layout type is necessary because two different layout types can
share the same device ID, and the common device ID can have
completely different mappings for each layout type.

The server will send the following notifications:

NOTIFY_DEVICEID4_CHANGE
   A previously provided device-ID-to-device-address mapping has
   changed, and the client uses GETDEVICEINFO to obtain the updated
   mapping.  The notification is encoded in a value of data type
   notify_deviceid_change4.  This data type also contains a boolean
   field, ndc_immediate, which if TRUE indicates that the change will
   be enforced immediately, and so the client might not be able to
   complete any pending I/O to the device ID.  If ndc_immediate is
   FALSE, then for an indefinite time, the client can complete
   pending I/O.  After pending I/O is complete, the client SHOULD get
   the new device-ID-to-device-address mappings before issuing new
   I/O to the device ID.

NOTIFY_DEVICEID4_DELETE
   Deletes a device ID from the mappings.  This notification MUST NOT
   be sent if the client has a layout that refers to the device ID.
   In other words, if the server is sending a delete device ID
   notification, one of the following is true for layouts associated
   with the layout type:

   *  The client never had a layout referring to that device ID.

   *  The client has returned all layouts referring to that device
      ID.

   *  The server has revoked all layouts referring to that device
      ID.

   The notification is encoded in a value of data type
   notify_deviceid_delete4.  After a server deletes a device ID, it
   MUST NOT reuse that device ID for the same layout type until the
   client ID is deleted.

20.13.  Operation 10044: CB_ILLEGAL - Illegal Callback Operation

20.13.1.  ARGUMENT

   void;

20.13.2.  RESULT

   /*
    * CB_ILLEGAL: Response for illegal operation numbers
    */
   struct CB_ILLEGAL4res {
           nfsstat4 status;
   };

20.13.3.  DESCRIPTION

This operation is a placeholder for encoding a result to handle the
case of the server sending an operation code within CB_COMPOUND that
is not defined in the NFSv4.1 specification.  See Section 19.2.3 for
more details.

The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL.
20.13.4.  IMPLEMENTATION

A server will probably not send an operation with code OP_CB_ILLEGAL,
but if it does, the response will be CB_ILLEGAL4res, just as it would
be with any other invalid operation code.  Note that if the client
gets an illegal operation code that is not OP_CB_ILLEGAL, and if the
client checks for legal operation codes during the XDR decode phase,
then an instance of data type CB_ILLEGAL4res will not be returned.

21.  Security Considerations

Historically, the authentication model of NFS was based on the entire
machine being the NFS client, with the NFS server trusting the NFS
client to authenticate the end-user.  The NFS server in turn shared
its files only with specific clients, as identified by the client's
source network address.  Given this model, the AUTH_SYS RPC security
flavor simply identified the end-user using the client to the NFS
server.  When processing NFS responses, the client ensured that the
responses came from the same network address and port number to which
the request was sent.  While such a model is easy to implement and
simple to deploy and use, it is unsafe.  Thus, NFSv4.1
implementations are REQUIRED to support a security model that uses
end-to-end authentication, where an end-user on a client mutually
authenticates (via cryptographic schemes that do not expose passwords
or keys in the clear on the network) to a principal on an NFS server.
Consideration is also given to the integrity and privacy of NFS
requests and responses.  The issues of end-to-end mutual
authentication, integrity, and privacy are discussed in
Section 2.2.1.1.1.

Note that being REQUIRED to implement does not mean REQUIRED to use;
AUTH_SYS can be used by NFSv4.1 clients and servers.  However,
AUTH_SYS is merely an OPTIONAL security flavor in NFSv4.1, and so
interoperability via AUTH_SYS is not assured.

For reasons of reduced administration overhead, better performance,
and/or reduction of CPU utilization, users of NFSv4.1 implementations
may opt not to use security mechanisms that enable integrity
protection on each remote procedure call and response.  The use of
mechanisms without integrity leaves the user vulnerable to an
attacker positioned between the NFS client and server who modifies
the RPC request and/or the response.  While implementations are free
to provide the option to use weaker security mechanisms, there are
three operations in particular that warrant the implementation
overriding user choices.

o  The first two such operations are SECINFO and SECINFO_NO_NAME.
   It is RECOMMENDED that the client send both operations such that
   they are protected with a security flavor that has integrity
   protection, such as RPCSEC_GSS with either the
   rpc_gss_svc_integrity or rpc_gss_svc_privacy service.  Without
   integrity protection encapsulating SECINFO and SECINFO_NO_NAME
   and their results, an attacker in the middle could modify results
   such that the client might select a weaker algorithm in the set
   allowed by the server, making the client and/or server vulnerable
   to further attacks.

o  The third operation that should definitely use integrity
   protection is any GETATTR for the fs_locations and
   fs_locations_info attributes.  The attack has two steps.  First,
   the attacker modifies the unprotected results of some operation
   to return NFS4ERR_MOVED.  Second, when the client follows up with
   a GETATTR for the fs_locations or fs_locations_info attributes,
   the attacker modifies the results to cause the client to migrate
   its traffic to a server controlled by the attacker.
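As a non-normative illustration of how an implementation might
override user choices for these operations, the C sketch below
centralizes the decision in a single policy helper.  The helper
function and the notion of a per-operation policy hook are purely
illustrative; the service values mirror those defined for RPCSEC_GSS
[4], and the operation code values are those assigned by this
specification.

   /* RPCSEC_GSS services (values as in the RPCSEC_GSS protocol [4]). */
   enum rpc_gss_service {
           rpc_gss_svc_none      = 1,
           rpc_gss_svc_integrity = 2,
           rpc_gss_svc_privacy   = 3
   };

   /* NFSv4.1 operation codes relevant to this policy. */
   enum {
           OP_GETATTR         = 9,
           OP_SECINFO         = 33,
           OP_SECINFO_NO_NAME = 52
   };

   /*
    * Hypothetical policy hook: given an operation and, for GETATTR,
    * whether fs_locations or fs_locations_info is being requested,
    * return the minimum RPCSEC_GSS service the enclosing COMPOUND
    * should use, even if the user configured a weaker flavor.
    */
   static enum rpc_gss_service
   min_gss_service(unsigned int opcode, int requests_fs_locations)
   {
           switch (opcode) {
           case OP_SECINFO:
           case OP_SECINFO_NO_NAME:
                   return rpc_gss_svc_integrity;
           case OP_GETATTR:
                   if (requests_fs_locations)
                           return rpc_gss_svc_integrity;
                   /* FALLTHROUGH */
           default:
                   return rpc_gss_svc_none;  /* no minimum imposed */
           }
   }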
Relative to previous NFS versions, NFSv4.1 has additional security
considerations for pNFS (see Section 12.9 and Section 13.12) and for
locking and session state (see Section 2.10.7.3).

22.  IANA Considerations

22.1.  Named Attribute Definitions

The NFSv4.1 protocol supports the association of a file with zero or
more named attributes.  The name space identifiers for these
attributes are defined as string names.  The protocol does not define
the specific assignment of the name space for these file attributes.
Even though the name space is not specifically controlled to prevent
collisions, an IANA registry has been created for the registration of
NFSv4.1 named attributes.  Registration will be achieved through the
publication of an Informational RFC and will require not only the
name of the attribute but also the syntax and semantics of the named
attribute's contents; the intent is to promote interoperability where
common interests exist.  While application developers are allowed to
define and use attributes as needed, they are encouraged to register
the attributes with IANA.

Such registered named attributes are presumed to apply to all minor
versions of NFSv4, including those defined subsequent to the
registration.  Where a named attribute is intended to be limited to
particular minor versions, the Informational RFC must clearly state
the applicable limits.

22.2.  ONC RPC Network Identifiers (netids)

Section 3.3.9 discusses the r_netid field and the corresponding
r_addr field of the netaddr4 structure.  The NFSv4 protocol depends
on the syntax and semantics of these fields to effectively
communicate callback and other information between client and
server.  Therefore, an IANA registry has been created to include the
values defined in this document and to allow for future expansion
based on transport usage/availability.  Additions to this ONC RPC
Network Identifier registry must be done with the publication of an
RFC.

The initial values for this registry are as follows (some of this
text is replicated from Section 3.3.9 for clarity):

The Network Identifier (or r_netid for short) is used to specify a
transport protocol and associated universal address (or r_addr for
short).  The syntax of the Network Identifier is a US-ASCII string.
The initial definitions for r_netid are:

   "tcp"  - TCP over IP version 4

   "udp"  - UDP over IP version 4

   "tcp6" - TCP over IP version 6

   "udp6" - UDP over IP version 6

Note: the '"' marks are used for delimiting the strings in this
document and are not part of the Network Identifier string.

For the "tcp" and "udp" Network Identifiers, the Universal Address or
r_addr (for IPv4) is a US-ASCII string and is of the form described
in Section 3.3.9.1.

For the "tcp6" and "udp6" Network Identifiers, the Universal Address
or r_addr (for IPv6) is a US-ASCII string and is of the form
described in Section 3.3.9.2.
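For the IPv4 form of Section 3.3.9.1, the universal address consists
of the four address octets followed by the high- and low-order bytes
of the port, all in decimal and separated by dots.  The following
non-normative C helper (the function name is illustrative) is
included only to make that r_addr form concrete.

   #include <stdint.h>
   #include <stdio.h>

   /*
    * Format an IPv4 address and port as the universal address string
    * used with the "tcp" and "udp" netids: "h1.h2.h3.h4.p1.p2",
    * where p1 and p2 are the high- and low-order bytes of the port.
    * Returns the snprintf() result.
    */
   static int
   format_uaddr4(char *buf, size_t buflen, const uint8_t ip[4],
                 uint16_t port)
   {
           return snprintf(buf, buflen, "%u.%u.%u.%u.%u.%u",
                           (unsigned)ip[0], (unsigned)ip[1],
                           (unsigned)ip[2], (unsigned)ip[3],
                           (unsigned)(port >> 8),
                           (unsigned)(port & 0xff));
   }

   /* For example, 192.0.2.10 port 2049 yields "192.0.2.10.8.1". */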
As mentioned, the registration of new Network Identifiers will
require the publication of an RFC with similar detail as listed above
for the Network Identifier itself and the corresponding Universal
Address.

22.3.  Defining New Notifications

New notification types may be added to the CB_NOTIFY_DEVICEID
operation (Section 20.12).  This can be done via changes to the
operations that register notifications, or by adding new operations
to NFSv4.  Either way requires a new minor version of NFSv4 and a
Standards Track document from the IETF.  Another way to add a
notification is to specify a new layout type.  Notifications for new
layout types would be requested via GETDEVICELIST (Section 18.41) and
GETDEVICEINFO (Section 18.40).  See Section 22.4.

22.4.  Defining New Layout Types

New layout type numbers will be requested from IANA.  IANA will only
provide layout type numbers for Standards Track RFCs approved by the
IESG, in accordance with the Standards Action policy defined in [20].
All layout types assigned by IANA MUST be in the range 0x00000001 to
0x7FFFFFFF.

The author of a new pNFS layout specification must follow these steps
to obtain acceptance of the layout type as a standard:

1.  The author devises the new layout specification.

2.  The new layout type specification MUST, at a minimum:

    *  Define the contents of the layout-type-specific fields of the
       following data types:

       +  the da_addr_body field of the device_addr4 data type;

       +  the loh_body field of the layouthint4 data type;

       +  the loc_body field of the layout_content4 data type (which
          in turn is the lo_content field of the layout4 data type);

       +  the lou_body field of the layoutupdate4 data type.

    *  Describe or define the storage access protocol used to access
       the data servers.

    *  Describe whether revocation of layouts is supported.

    *  At a minimum, describe the methods of recovery from:

       1.  Failure and restart for client, server, storage device.

       2.  Lease expiration from the perspective of the active
           client, server, storage device.

       3.  Loss of layout state resulting in fencing of client
           access to storage devices (for an example, see
           Section 12.7.3).

    *  List any new notification values for CB_NOTIFY_DEVICEID.

    *  List any new recallable object types for CB_RECALL_ANY.

    *  Include an IANA considerations section.

    *  Include a security considerations section.

3.  The author documents the new layout specification as an Internet
    Draft.

4.  The author submits the Internet Draft for review through the
    IETF standards process as defined in "Internet Official Protocol
    Standards" (STD 1).  The new layout specification will be
    submitted for eventual publication as a Standards Track RFC.

5.  The layout specification progresses through the IETF standards
    process; the new option will be reviewed by the NFSv4 Working
    Group (if that group still exists), or as an Internet Draft not
    submitted by an IETF working group.

22.5.  Path Variable Definitions

This section deals with the IANA considerations associated with the
variable substitution feature for location names as described in
Section 11.10.3.  As described there, variables subject to
substitution consist of a domain name and a specific name within
that domain, with the two separated by a colon.
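As a purely illustrative example (the OS_TYPE value shown is
hypothetical and is not registered by this document), a location
pathname component sequence of

   /exports/${ietf.org:OS_TYPE}/tools

would be converted by a client whose operating system corresponds to
a registered OS_TYPE value of "linux" into the component sequence

   /exports/linux/tools

before being used on the destination server.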
22.5.1.  Path Variable Values

For names with the domain "ietf.org", only three specific names are
currently defined, and additional names will only be created via
Standards Track RFCs.

For the variable names ${ietf.org:CPU_ARCH} and ${ietf.org:OS_TYPE},
IANA will have to create a registry of values to be used for each of
those variables.  Applications for such values must contain the
variable name, the proposed value of that variable, and a brief (one
or two paragraphs) explanation of what is indicated by that specific
value.  Such requests should be reviewed by nfsv4@ietf.org and a
Designated Expert.

For the name ${ietf.org:OS_VERSION}, no such registry need be
created, as the specifics of the values will vary with the value of
${ietf.org:OS_TYPE}.

22.5.2.  Path Variable Names

IANA needs to set up a registry to help make generally available
information about variables of the form ${domain:var}, where domain
is something other than "ietf.org".

Applications for the addition of variables to this registry should
contain the name of the variable and a brief (one or a few
paragraphs) explanation of the purpose of the variable.  No review
of these applications by IANA is necessary.

23.  References

23.1.  Normative References

[1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
      Levels", BCP 14, RFC 2119, March 1997.

[2]   Eisler, M., "XDR: External Data Representation Standard",
      STD 67, RFC 4506, May 2006.

[3]   Srinivasan, R., "RPC: Remote Procedure Call Protocol
      Specification Version 2", RFC 1831, August 1995.

[4]   Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol
      Specification", RFC 2203, September 1997.

[5]   Zhu, L., Jaganathan, K., and S. Hartman, "The Kerberos Version
      5 Generic Security Service Application Program Interface (GSS-
      API) Mechanism Version 2", RFC 4121, July 2005.

[6]   Eisler, M., "LIPKEY - A Low Infrastructure Public Key Mechanism
      Using SPKM", RFC 2847, June 2000.

[7]   Linn, J., "Generic Security Service Application Program
      Interface Version 2, Update 1", RFC 2743, January 2000.

[8]   Talpey, T. and B. Callaghan, "RDMA Transport for ONC RPC - A
      Work in Progress", Internet Draft draft-ietf-nfsv4-rpcrdma-06,
      May 2007.

[9]   Talpey, T. and B. Callaghan, "NFS Direct Data Placement - A
      Work in Progress", Internet
      Draft draft-ietf-nfsv4-nfsdirect-06, May 2007.

[10]  Recio, R., Metzler, B., Culley, P., Hilland, J., and D. Garcia,
      "A Remote Direct Memory Access Protocol Specification",
      RFC 5040, October 2007.

[11]  Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing
      for Message Authentication", RFC 2104, February 1997.

[12]  Shepler, S., Eisler, M., and D. Noveck, "NFSv4 Minor Version 1
      XDR Description - A Work in Progress", Internet
      Draft draft-ietf-nfsv4-minorversion1-dot-x-04.txt,
      December 2007.

[13]  Hinden, R. and S. Deering, "IP Version 6 Addressing
      Architecture", RFC 3513, April 2003.
27264 [14] International Organization for Standardization, "Information 27265 Technology - Universal Multiple-octet coded Character Set (UCS) 27266 - Part 1: Architecture and Basic Multilingual Plane", 27267 ISO Standard 10646-1, May 1993. 27269 [15] Alvestrand, H., "IETF Policy on Character Sets and Languages", 27270 BCP 18, RFC 2277, January 1998. 27272 [16] Hoffman, P. and M. Blanchet, "Preparation of Internationalized 27273 Strings ("stringprep")", RFC 3454, December 2002. 27275 [17] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile 27276 for Internationalized Domain Names (IDN)", RFC 3491, 27277 March 2003. 27279 [18] Schaad, J., Kaliski, B., and R. Housley, "Additional Algorithms 27280 and Identifiers for RSA Cryptography for use in the Internet 27281 X.509 Public Key Infrastructure Certificate and Certificate 27282 Revocation List (CRL) Profile", RFC 4055, June 2005. 27284 [19] National Institute of Standards and Technology, "Cryptographic 27285 Algorithm Object Registration", December 2005. 27287 [20] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA 27288 Considerations Section in RFCs", BCP 26, RFC 2434, 27289 October 1998. 27291 23.2. Informative References 27293 [21] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, 27294 C., Eisler, M., and D. Noveck, "Network File System (NFS) 27295 version 4 Protocol", RFC 3530, April 2003. 27297 [22] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 27298 Protocol Specification", RFC 1813, June 1995. 27300 [23] Eisler, M., "NFS Version 2 and Version 3 Security Issues and 27301 the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5", 27302 RFC 2623, June 1999. 27304 [24] Juszczak, C., "Improving the Performance and Correctness of an 27305 NFS Server", USENIX Conference Proceedings , June 1990. 27307 [25] Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by an On- 27308 line Database", RFC 3232, January 2002. 27310 [26] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", 27311 RFC 1833, August 1995. 27313 [27] Werme, R., "RPC XID Issues", USENIX Conference Proceedings , 27314 February 1996. 27316 [28] Nowicki, B., "NFS: Network File System Protocol specification", 27317 RFC 1094, March 1989. 27319 [29] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available 27320 Network Server", USENIX Conference Proceedings , January 1991. 27322 [30] Halevy, B., Welch, B., and J. Zelenka, "Object-based pNFS 27323 Operations", April 2008, . 27326 [31] Black, D., Fridella, S., and J. Glasgow, "pNFS Block/Volume 27327 Layout", April 2008, . 27330 [32] Callaghan, B., "WebNFS Client Specification", RFC 2054, 27331 October 1996. 27333 [33] Callaghan, B., "WebNFS Server Specification", RFC 2055, 27334 October 1996. 27336 [34] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, 27337 June 1999. 27339 [35] Simonsen, K., "Character Mnemonics and Character Sets", 27340 RFC 1345, June 1992. 27342 [36] The Open Group, "Protocols for Interworking: XNFS, Version 3W, 27343 ISBN 1-85912-184-5", February 1998. 27345 [37] Floyd, S. and V. Jacobson, "The Synchronization of Periodic 27346 Routing Messages", IEEE/ACM Transactions on Networking 2(2), 27347 pp. 122-136, April 1994. 27349 [38] Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M., and E. 27350 Zeidner, "Internet Small Computer Systems Interface (iSCSI)", 27351 RFC 3720, April 2004. 27353 [39] Snively, R., "Fibre Channel Protocol for SCSI, 2nd Version 27354 (FCP-2)", ANSI/INCITS 350-2003, Oct 2003. 
27356 [40] Weber, R., "Object-Based Storage Device Commands (OSD)", ANSI/ 27357 INCITS 400-2004, July 2004, 27358 . 27360 [41] The Open Group, "The Open Group Base Specifications Issue 6, 27361 IEEE Std 1003.1, 2004 Edition", 2004. 27363 [42] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997. 27365 [43] Chiu, A., Eisler, M., and B. Callaghan, "Security Negotiation 27366 for WebNFS", RFC 2755, January 2000. 27368 Appendix A. Acknowledgments 27370 The initial drafts for the SECINFO extensions were edited by Mike 27371 Eisler with contributions from Peng Dai, Sergey Klyushin, and Carl 27372 Burnett. 27374 The initial drafts for the SESSIONS extensions were edited by Tom 27375 Talpey, Spencer Shepler, Jon Bauman with contributions from Charles 27376 Antonelli, Brent Callaghan, Mike Eisler, John Howard, Chet Juszczak, 27377 Trond Myklebust, Dave Noveck, John Scott, Mike Stolarchuk and Mark 27378 Wittle. 27380 Initial drafts relating to multi-server namespace features, including 27381 the concept of referrals, were contributed by Dave Noveck, Carl 27382 Burnett, and Charles Fan with contributions from Ted Anderson, Neil 27383 Brown, and Jon Haswell. 27385 The initial drafts for the Directory Delegations support were 27386 contributed by Saadia Khan with input from Dave Noveck, Mike Eisler, 27387 Carl Burnett, Ted Anderson and Tom Talpey. 27389 The initial drafts for the ACL explanations were contributed by Sam 27390 Falkner and Lisa Week. 27392 The pNFS work was inspired by the NASD and OSD work done by Garth 27393 Gibson. Gary Grider has also been a champion of high-performance 27394 parallel I/O. Garth Gibson and Peter Corbett started the pNFS effort 27395 with a problem statement document for IETF that formed the basis for 27396 the pNFS work in NFSv4.1. 27398 The initial drafts for the parallel NFS support were edited by Brent 27399 Welch and Garth Goodson. Additional authors for those documents were 27400 Benny Halevy, David Black, and Andy Adamson. Additional input came 27401 from the informal group which contributed to the construction of the 27402 initial pNFS drafts; specific acknowledgement goes to Gary Grider, 27403 Peter Corbett, Dave Noveck, Peter Honeyman, and Stephen Fridella. 27405 Fredric Isaman found several errors in draft versions of the ONC RPC 27406 XDR description of the NFSv4.1 protocol. 27408 Audrey Van Belleghem provided, in numerous ways, essential co- 27409 ordination and management of the process of editing the specification 27410 drafts. 27412 Richard Jernigan gave feedback on the file layout's striping pattern 27413 design. 27415 Several formal inspection teams were formed to review various areas 27416 of the protocol. All the inspections found significant errors and 27417 room for improvement. NFSv4.1's inspection teams were: 27419 o ACLs, with the following inspectors: Sam Falkner, Bruce Fields, 27420 Rahul Iyer, Saadia Khan, Dave Noveck, Lisa Week, Mario Wurzl, and 27421 Alan Yoder. 27423 o Sessions, with the following inspectors: William Brown, Tom 27424 Doeppner, Robert Gordon, Benny Halevy, Fredric Isaman, Rick 27425 Macklem, Trond Myklebust, Dave Noveck, Karen Rochford, John Scott, 27426 and Peter Shah. 27428 o Initial pNFS inspection, with the following inspectors: Andy 27429 Adamson, David Black, Mike Eisler, Marc Eshel, Sam Falkner, Garth 27430 Goodson, Benny Halevy, Rahul Iyer, Trond Myklebust, Spencer 27431 Shepler, and Lisa Week. 
o  Global namespace, with the following inspectors: Mike Eisler, Dan
   Ellard, Craig Everhart, Fred Isaman, Trond Myklebust, Dave
   Noveck, Theresa Raj, Spencer Shepler, Renu Tewari, and Robert
   Thurlow.

o  NFSv4.1 file layout type, with the following inspectors: Andy
   Adamson, Marc Eshel, Sam Falkner, Garth Goodson, Rahul Iyer,
   Trond Myklebust, and Lisa Week.

o  NFSv4.1 locking and directory delegations, with the following
   inspectors: Mike Eisler, Pranoop Erasani, Robert Gordon, Saadia
   Khan, Eric Kustarz, Dave Noveck, Spencer Shepler, and Amy Weaver.

o  EXCHANGE_ID and DESTROY_CLIENTID, with the following inspectors:
   Mike Eisler, Pranoop Erasani, Robert Gordon, Benny Halevy, Fred
   Isaman, Saadia Khan, Rick Macklem, Spencer Shepler, and Brent
   Welch.

o  Final pNFS inspection, with the following inspectors: Andy
   Adamson, Mike Eisler, Sam Falkner, Marc Eshel, Jason Glasgow,
   Garth Goodson, Robert Gordon, Benny Halevy, Dean Hildebrand,
   Rahul Iyer, Suchit Kaura, Trond Myklebust, Anatoly Pinchuk,
   Spencer Shepler, Renu Tewari, Lisa Week, and Brent Welch.

A review team worked together to generate the tables of assignments
of error sets to operations and make sure that each such assignment
had two or more people validating it.  Participating in the process
were: Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert
Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey,
Amy Weaver, and Lisa Week.

Others who provided comments include: Jason Goldschmidt and Mahesh
Siddheshwar.

Authors' Addresses

   Spencer Shepler
   Sun Microsystems, Inc.
   7808 Moonflower Drive
   Austin, TX 78750
   USA

   Phone: +1-512-401-1080
   Email: spencer.shepler@sun.com

   Mike Eisler
   NetApp
   5765 Chase Point Circle
   Colorado Springs, CO 80919
   USA

   Phone: +1-719-599-9026
   Email: mike@eisler.com
   URI:   http://www.eisler.com

   David Noveck
   NetApp
   1601 Trapelo Road, Suite 16
   Waltham, MA 02454
   USA

   Phone: +1-781-768-5347
   Email: dnoveck@netapp.com

Full Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.

This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights.
Information 27520 on the procedures with respect to rights in RFC documents can be 27521 found in BCP 78 and BCP 79. 27523 Copies of IPR disclosures made to the IETF Secretariat and any 27524 assurances of licenses to be made available, or the result of an 27525 attempt made to obtain a general license or permission for the use of 27526 such proprietary rights by implementers or users of this 27527 specification can be obtained from the IETF on-line IPR repository at 27528 http://www.ietf.org/ipr. 27530 The IETF invites any interested party to bring to its attention any 27531 copyrights, patents or patent applications, or other proprietary 27532 rights that may cover technology that may be required to implement 27533 this standard. Please address the information to the IETF at 27534 ietf-ipr@ietf.org. 27536 Acknowledgment 27538 Funding for the RFC Editor function is provided by the IETF 27539 Administrative Support Activity (IASA).