NFSv4 Working Group                                           S. Faibish
Internet-Draft                                           EMC Corporation
Intended status: Proposed Standard                              D. Black
Expires: January 9, 2011                                 EMC Corporation
Updates: 5661, 5662                                            M. Eisler
                                                                  NetApp
                                                              J. Glasgow
                                                                  Google
                                                            July 9, 2010


                     pNFS Access Permissions Check
          draft-faibish-nfsv4-pnfs-access-permissions-check-03

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 9, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document extends the pNFS protocol to communicate errors caused
   by inability to access data servers referenced by layouts, including
   checks performed by both clients and the metadata server (MDS).  The
   extension provides means for clients to communicate client-detected
   access denial errors to the MDS, including the case in which a
   client requests direct NFS access via the MDS that the MDS cannot
   perform.  This document updates RFC 5661 and RFC 5662.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Changes to Operation 51: LAYOUTRETURN (RFC 5661)
      3.1. ARGUMENT (18.44.1)
      3.2. RESULT (18.44.2)
      3.3. DESCRIPTION (18.44.3)
      3.4. IMPLEMENTATION (18.44.4)
   4. Security Considerations
   5. IANA Considerations
   6. Conclusions
   7. References
      7.1. Normative References

1. Introduction

   Figure 1 shows the overall architecture of a Parallel NFS (pNFS)
   system:

      +-----------+
      |+-----------+                                 +-----------+
      ||+-----------+                                |           |
      |||           |         NFSv4.1 + pNFS         |           |
      +||  Clients  |<------------------------------>|    MDS    |
       +|           |                                |           |
        +-----------+                                |           |
             |||                                     +-----------+
             |||                                           |
             |||                                           |
             ||| Storage        +-----------+              |
             ||| Protocol       |+-----------+             |
             ||+----------------||+-----------+  Control   |
             |+-----------------|||           |  Protocol  |
             +------------------+||  Storage  |------------+
                                 +|  Devices  |
                                  +-----------+

                      Figure 1: pNFS Architecture

   In this document, "storage device" is used as a general term for a
   data server and/or storage server for the file, block, or object pNFS
   layouts.

   The current pNFS protocol [RFC5661] assumes that a client can access
   every storage device (SD) included in a valid layout sent by the
   MDS, and provides no means to communicate client access failures to
   the MDS.  Access failures can impair pNFS performance scaling and
   allow significant errors to go unreported.  If the MDS can access all
   the storage devices involved, but the client does not have
   sufficient access rights to some storage devices, the client may
   choose to fall back to accessing the file system using NFSv4.1
   without pNFS support; there are environments in which this behavior
   is undesirable, especially if it occurs silently.  An important
   example is the addition of a new storage device to which a large
   population of pNFS clients (e.g., thousands) lack access permission.
   Layouts that use this new device result in client errors, requiring
   that all I/Os to that new storage device be served by the MDS.  This
   creates a performance and scalability bottleneck that may be
   difficult to detect based on I/O behavior, because the other storage
   devices are functioning correctly.
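
   The silent fallback described above can be illustrated with a small
   model.  The following Python sketch is hypothetical: it assumes a
   simplified round-robin striping of a file across storage devices (not
   the actual RFC 5661 file layout encoding), and the names Layout,
   route_io, and mds_share are illustrative only.  It shows how a client
   that cannot reach one newly added device quietly sends a fixed
   fraction of its I/O through the MDS, while the other devices appear
   healthy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical striped layout: stripe unit i is stored on
    devices[i % len(devices)].  This is a simplification, not the
    RFC 5661 file layout encoding."""
    devices: tuple
    stripe_unit: int  # bytes per stripe unit

    def device_for(self, offset: int) -> str:
        index = (offset // self.stripe_unit) % len(self.devices)
        return self.devices[index]


def route_io(layout: Layout, offset: int, accessible: set) -> str:
    """Return "DS" when the client can reach the storage device holding
    this offset, otherwise "MDS" (the silent fallback)."""
    return "DS" if layout.device_for(offset) in accessible else "MDS"


def mds_share(layout: Layout, accessible: set, offsets) -> float:
    """Fraction of the given I/Os that end up being served by the MDS."""
    routes = [route_io(layout, off, accessible) for off in offsets]
    return routes.count("MDS") / len(routes)


if __name__ == "__main__":
    KIB = 1024
    # "sd3" stands for a newly added device that this client cannot reach.
    layout = Layout(devices=("sd0", "sd1", "sd2", "sd3"),
                    stripe_unit=64 * KIB)
    reachable = {"sd0", "sd1", "sd2"}
    offsets = range(0, 1024 * KIB, 64 * KIB)  # sixteen 64 KiB I/Os
    print(mds_share(layout, reachable, offsets))  # prints 0.25
```

   In this simplified model, if thousands of clients are in the same
   situation, roughly a quarter of all data I/O lands on the MDS, while
   nothing in the I/O seen by sd0 through sd2 points at the cause.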

   The preferable approach in this scenario is to report the access
   failures before any client attempts to issue I/Os that can only be
   serviced by the MDS.  This makes the problem explicit, rather than
   forcing the MDS, or a system administrator, to diagnose the
   performance problem caused by clients using NFS I/O instead of pNFS.
   There are limits to this approach, because complex mount structures
   may prevent a client from detecting this situation at mount time,
   but at a minimum, access problems involving the root of the mount
   structure can be detected.

   The most suitable time for the client to report inability to access
   a storage device is at mount time, but this is not always possible.
   If an application declares its intention to use pNFS at the client,
   e.g., via a special tag or a mount command switch (such as -pnfs)
   together with a system call, the client can check for both pNFS
   support and device accessibility.

   This document introduces an error reporting mechanism that extends
   the return of a pNFS layout; a pNFS client MAY use this mechanism to
   inform the MDS that the layout is being returned because one or more
   data servers are not accessible to the client.  Error reporting at
   I/O time is not affected, because an inaccessible data server need
   not result in an I/O error if a subsequent retry of the operation
   via the MDS succeeds.

   There is a related problem scenario involving an MDS that cannot
   access some storage devices and hence cannot perform I/Os on behalf
   of a client.  In the case of the block layout [RFC5663], if the MDS
   has no access to a storage device (e.g., a LUN), MDS implementations
   generally do not export any filesystem using that storage device.
   In contrast to the block layout, MDSs for the file [RFC5661] and
   object [RFC5664] layouts may be unable to access the storage devices
   that store data for an exported filesystem.
   This makes it possible for a file or object layout MDS to provide
   layouts that contain client-inaccessible devices.  For the specific
   case of adding a new storage device to a filesystem, having the MDS
   issue test I/Os to the newly added device before using it in layouts
   avoids this problem scenario, but this does not cover loss of access
   to existing storage devices at a later time.

   In addition, [RFC5661] states that a client can write through or
   read from the MDS, even if it has a layout; this assumes that the
   MDS can access all the storage devices.  This document makes that
   assumed access an explicit requirement.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. Changes to Operation 51: LAYOUTRETURN (RFC 5661)

   The existing LAYOUTRETURN operation is extended by introducing two
   new layout return types:

   o  LAYOUT4_RET_REC_FSID_NO_ACCESS at FSID scope; and

   o  LAYOUT4_RET_REC_FILE_NO_ACCESS at file scope.

   The former returns all layouts for the FSID and informs the server
   that the reason for the return is a storage device connectivity
   problem; the latter performs the same function for an individual
   file layout.

3.1. ARGUMENT (18.44.1)

   The ARGUMENT specification of the LAYOUTRETURN operation in Section
   18.44.1 of [RFC5661] is replaced by the following XDR code [XDR]:

   /* Constants used for new LAYOUTRETURN and CB_LAYOUTRECALL */
   const LAYOUT4_RET_REC_FILE           = 1;
   const LAYOUT4_RET_REC_FSID           = 2;
   const LAYOUT4_RET_REC_ALL            = 3;
   const LAYOUT4_RET_REC_FSID_NO_ACCESS = 4;
   const LAYOUT4_RET_REC_FILE_NO_ACCESS = 5;

   enum layoutreturn_type4 {
       LAYOUTRETURN4_FILE           = LAYOUT4_RET_REC_FILE,
       LAYOUTRETURN4_FSID           = LAYOUT4_RET_REC_FSID,
       LAYOUTRETURN4_ALL            = LAYOUT4_RET_REC_ALL,
       LAYOUTRETURN4_FSID_NO_ACCESS = LAYOUT4_RET_REC_FSID_NO_ACCESS,
       LAYOUTRETURN4_FILE_NO_ACCESS = LAYOUT4_RET_REC_FILE_NO_ACCESS
   };

   struct layoutreturn_file4 {
       offset4         lrf_offset;
       length4         lrf_length;
       stateid4        lrf_stateid;
       /* layouttype4 specific data */
       opaque          lrf_body<>;
   };

   struct layoutreturn_fsid_no_access4 {
       deviceid4       lrfna_deviceid;
       nfsstat4        lrfna_status;
   };

   struct layoutreturn_file_no_access4 {
       offset4         lrfna_offset;
       length4         lrfna_length;
       stateid4        lrfna_stateid;
       deviceid4       lrfna_deviceid;
       nfsstat4        lrfna_status;
       /* layouttype4 specific data */
       opaque          lrfna_body<>;
   };

   union layoutreturn4 switch (layoutreturn_type4 lr_returntype) {
       case LAYOUTRETURN4_FILE:
           layoutreturn_file4           lr_layout;
       case LAYOUTRETURN4_FSID_NO_ACCESS:
           layoutreturn_fsid_no_access4 lr_fsid<>;
       case LAYOUTRETURN4_FILE_NO_ACCESS:
           layoutreturn_file_no_access4 lr_layout;
       default:
           void;
   };

3.2. RESULT (18.44.2)

   The RESULT of the LAYOUTRETURN operation is unchanged; see Section
   18.44.2 of [RFC5661].

3.3. DESCRIPTION (18.44.3)

   The following text is added to the end of the LAYOUTRETURN operation
   DESCRIPTION in Section 18.44.3 of [RFC5661]:

   There are two NO_ACCESS layoutreturn_type4 values that indicate lack
   of storage device access, LAYOUT4_RET_REC_FSID_NO_ACCESS and
   LAYOUT4_RET_REC_FILE_NO_ACCESS.  A client uses these values to return
   all layouts for an FSID or to return a layout (or portion thereof)
   for a file, and in both cases to inform the server that the reason
   for the return is an inability to access one or more storage
   devices.  The same stateid may be used, or the client MAY force use
   of a new stateid in order to report a new error.  An NFS error
   (nfsstat4) is included in the layoutreturn data structures for these
   two types to distinguish access permission problems from device
   inaccessibility:

   o  NFS4ERR_PERM SHOULD be used for access permission denial; and

   o  NFS4ERR_NXIO SHOULD be used for inability to access a device.

   Other NFS errors MAY be used when they are appropriate.  All uses of
   these two layout return types that report errors SHOULD be logged by
   the client.

   The client MAY use LAYOUT4_RET_REC_FILE_NO_ACCESS instead of
   LAYOUT4_RET_REC_FSID_NO_ACCESS when it has reason to believe that
   only one file, or a small number of files, is affected.  If the
   problem affects multiple devices, the client may use multiple file
   layout return operations to communicate the multiple devices
   encountering errors; each return operation SHOULD return a layout
   extent that refers to the device for which an error is being
   reported.  In contrast, LAYOUT4_RET_REC_FSID_NO_ACCESS includes an
   array of <deviceid, status> pairs to enable errors to be reported
   for multiple devices in one operation, so that the client is not
   required to repeat the FSID-scoped layout return operation to report
   multiple errors.

3.4. IMPLEMENTATION (18.44.4)

   The following text is added to the end of the LAYOUTRETURN operation
   IMPLEMENTATION in Section 18.44.4 of [RFC5661]:

   A client that expects to use pNFS for a mounted filesystem SHOULD
   check for pNFS support at mount time.  This check SHOULD be
   performed by sending an OPEN request, a LAYOUTGET operation, and a
   GETDEVICELIST operation, followed by layout-type-specific checks for
   accessibility of each storage device returned by GETDEVICELIST.  If
   the NFS server does not support pNFS, the LAYOUTGET operation will be
   rejected with an NFS4ERR_NOTSUPP error; in this situation it is up to
   the client to determine whether it is acceptable to proceed with
   NFS-only access.

   When an I/O fails because a storage device is inaccessible, the
   client SHOULD retry the failed I/O via the MDS.  In this situation,
   before retrying the I/O, the client SHOULD return the layout, or the
   inaccessible portion thereof, and SHOULD indicate which storage
   device or devices were inaccessible.  If the client does not return
   at least the inaccessible portion of the layout before retrying the
   I/O via the MDS, and that retry fails with NFS4ERR_PERM or
   NFS4ERR_NXIO, then the client MUST return at least the inaccessible
   portion of the layout, as the MDS error indicates that the affected
   portion of that file is completely inaccessible to the client.

   Backwards compatibility may require a client to perform two layout
   return operations to deal with servers that do not understand the
   NO_ACCESS layoutreturn_type4 values and hence respond with
   NFS4ERR_INVAL.  In this situation, the client SHOULD perform an
   ordinary FSID or file layout return operation and remember that the
   new return types are not to be used with that server.

   The metadata server (MDS) SHOULD NOT use storage devices in pNFS
   layouts that are not accessible to the MDS.
   To the extent that an MDS can determine whether storage devices are
   accessible to clients, an MDS SHOULD NOT include a storage device in
   any pNFS layouts sent to a client that cannot access that storage
   device.  At a minimum, the server SHOULD perform these storage device
   accessibility checks before exporting a filesystem that supports
   pNFS and when the device configuration for such an exported
   filesystem is changed (e.g., to add a storage device).  A client MAY
   perform I/O via the MDS even when the client holds a layout that
   covers the I/O; servers MUST support this client behavior.

4. Security Considerations

   All control operations from the MDS to the storage devices,
   including any operations required for access permission checks,
   SHOULD be authenticated in order to maintain integrity of stored
   data.

5. IANA Considerations

   There are no IANA considerations in this document beyond the pNFS
   IANA considerations covered in [RFC5661].

6. Conclusions

   This draft specifies additions to the pNFS protocol addressing
   client and MDS inability to access storage devices used in pNFS
   layouts for all layout types.

7. References

7.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
              System (NFS) Version 4 Minor Version 1 Protocol",
              RFC 5661, January 2010.

   [RFC5663]  Black, D., Fridella, S., and J. Glasgow, "Parallel NFS
              (pNFS) Block/Volume Layout", RFC 5663, January 2010.

   [RFC5664]  Halevy, B., Welch, B., and J. Zelenka, "Object-Based
              Parallel NFS (pNFS) Operations", RFC 5664, January 2010.

   [XDR]      Eisler, M., "XDR: External Data Representation Standard",
              STD 67, RFC 4506, May 2006.

Acknowledgments

   This draft includes ideas from discussions with the primary author
   of the pNFS object layout, Benny Halevy, and the Linux kernel pNFS
   maintainers, including Bruce Fields.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Sorin Faibish (editor)
   EMC Corporation
   32 Coslin Drive
   Southboro, MA 01772
   US

   Phone: +1 (508) 305-8545
   Email: sfaibish@emc.com

   David L. Black
   EMC Corporation
   176 South Street
   Hopkinton, MA 01748
   US

   Phone: +1 (508) 293-7953
   Email: david.black@emc.com

   Michael Eisler
   NetApp
   5765 Chase Point Circle
   Colorado Springs, CO 80919
   US

   Phone: +1 (719) 599-9026
   Email: mike@eisler.com

   Jason Glasgow
   Google
   5 Cambridge Center, Floors 3-6
   Cambridge, MA 02142
   US

   Phone: +1 (617) 575-1599
   Email: jglasgow@google.com