NFSv4                                                          T. Haynes
Internet-Draft                                              Primary Data
Intended status: Informational                            April 10, 2014
Expires: October 12, 2014


                Considerations for a New pNFS Layout Type
                 draft-haynes-nfsv4-layout-types-02.txt

Abstract

   This document provides help in distinguishing between the
   requirements for Network File System (NFS) version 4.1's Parallel NFS
   (pNFS) and those specifically directed to the pNFS File Layout.
   The lack of a clear separation between the two sets of requirements
   may be troublesome for those trying to specify new Layout Types.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 12, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Difference Between a Data Server and a Storage Device
     2.2.  Requirements Language
   3.  The Control Protocol
     3.1.  Protocol Requirements
     3.2.  Non-protocol Requirements
     3.3.  Editorial Requirements
   4.  Implementations in Existing Layout Types
     4.1.  File Layout Type
     4.2.  Block Layout Type
     4.3.  Object Layout Type
   5.  Summary
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Acknowledgments
   Appendix B.  RFC Editor Notes
   Author's Address

1.  Introduction

   Both Parallel Network File System (pNFS) and the File Layout Type
   were defined in the Network File System (NFS) version 4.1 protocol
   specification, [RFC5661].  The Block Layout Type was defined in
   [RFC5663] and the Object Layout Type was in turn defined in
   [RFC5664].

   Some implementers have interpreted the text in Sections 12 ("Parallel
   NFS (pNFS)") and 13 ("NFSv4.1 as a Storage Protocol in pNFS: the File
   Layout Type") of [RFC5661] as both being strictly for the File Layout
   Type.  That is, since Section 13 was not covered in a separate RFC
   like those for both the Block and Object Layout Types, there is some
   confusion as to the responsibilities of both the Metadata Server
   (MDS) and the Data Servers (DS) which were laid out in Section 12.

   As a consequence, new Internet-Drafts (see [FlexFiles] and [Lustre])
   may struggle to meet the requirements to be a pNFS Layout Type.
   This document clarifies the Layout Type independent requirements
   placed on all Layout Types, whether one of the original three or any
   new variant.

2.  Definitions

   control protocol:  is a set of requirements for the communication of
      information on layouts, stateids, file metadata, and file data
      between the metadata server and the storage devices.

   Data Server (DS):  is one of the pNFS servers which provide the
      contents of a file system object which is a regular file.
      Depending on the layout, there might be one or more data servers
      over which the data is striped.  Note that while the metadata
      server is strictly accessed over the NFSv4.1 protocol, depending
      on the Layout Type, the data server could be accessed via any
      protocol that meets the pNFS requirements.

   fencing:  is when the metadata server prevents the storage devices
      from processing I/O from a specific client to a specific file.

   layout:  informs a client of which storage devices it needs to
      communicate with (and over which protocol) to perform I/O on a
      file.  The layout might also provide some hints about how the
      storage is physically organized.

   layout iomode:  describes whether the layout granted to the client is
      for read or read/write I/O.

   layout stateid:  is a 128-bit quantity returned by a server that
      uniquely defines the layout state provided by the server for a
      specific layout that describes a Layout Type and file (see
      Section 12.5.2 of [RFC5661]).  Further, Section 12.5.3 describes
      the difference between a layout stateid and a normal stateid.

   Layout Type:  describes both the storage protocol used to access the
      data and the aggregation scheme used to lay out the file data on
      the underlying storage devices.

   metadata:  is that part of the file system object which describes the
      object and not the payload.  E.g., it could be the time since last
      modification, access, etc.

   Metadata Server (MDS):  is the pNFS server which provides metadata
      information for a file system object.  It also is responsible for
      generating layouts for file system objects.  Note that the MDS is
      responsible for directory-based operations.

   recalling a layout:  is when the metadata server uses a back channel
      to inform the client that the layout is to be returned in a
      graceful manner.  Note that this gives the client the opportunity
      to flush any writes, etc., before replying to the metadata server.

   revoking a layout:  is when the metadata server invalidates the
      layout such that neither the metadata server nor any storage
      device will accept any access from the client with that layout.

   stateid:  is a 128-bit quantity returned by a server that uniquely
      defines the open and locking states provided by the server for a
      specific open-owner or lock-owner/open-owner pair for a specific
      file and type of lock.

   storage device:  is another term used almost interchangeably with
      data server.  See Section 2.1 for the nuances between the two.

2.1.  Difference Between a Data Server and a Storage Device

   We defined a data server as a pNFS server, which implies that it can
   utilize the NFSv4.1 protocol to communicate with the client.  As
   such, only the File Layout Type would currently meet this
   requirement.  The more generic concept is a storage device, which can
   use any protocol to communicate with the client.  The requirements
   for a storage device to act together with the metadata server to
   provide data to a client are that there is a Layout Type
   specification for the given protocol and that the metadata server has
   granted a layout to the client.  Note that nothing precludes there
   being multiple supported Layout Types (i.e., protocols) between a
   metadata server, storage devices, and client.

   As storage device is the more encompassing terminology, this document
   utilizes it over data server.
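
   As an illustration only, the two conditions stated in Section 2.1
   can be written as a small Python sketch.  Every name in it
   (StorageDevice, MetadataServer, grant_layout, may_serve, and the
   protocol strings) is hypothetical and is not taken from [RFC5661].
   The sketch also assumes that a granted layout can be recorded as a
   (client, file, device) triple, which is a simplification of real
   layout state.

```python
from dataclasses import dataclass, field

# Hypothetical protocol label; only the rule itself comes from Section 2.1.
NFSV41 = "nfsv4.1"


@dataclass(frozen=True)
class StorageDevice:
    name: str
    protocol: str  # protocol the client uses to reach this device

    def is_data_server(self) -> bool:
        # A data server is a storage device the client reaches over NFSv4.1.
        return self.protocol == NFSV41


@dataclass
class MetadataServer:
    # Protocols for which a Layout Type specification exists (assumed input).
    specified_protocols: set = field(default_factory=set)
    # Simplified layout state: (client, file, device name) triples.
    granted: set = field(default_factory=set)

    def grant_layout(self, client: str, file: str, device: StorageDevice) -> None:
        # Condition 1: a Layout Type specification covers the protocol.
        if device.protocol not in self.specified_protocols:
            raise ValueError(f"no Layout Type specified for {device.protocol}")
        self.granted.add((client, file, device.name))

    def may_serve(self, client: str, file: str, device: StorageDevice) -> bool:
        # Condition 2: the metadata server has granted a layout to the client.
        return (
            device.protocol in self.specified_protocols
            and (client, file, device.name) in self.granted
        )
```

   With this model, a device reached over some non-NFSv4.1 protocol is
   a storage device but not a data server, and it may provide data to
   a client only after grant_layout() has succeeded for that client
   and file.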

2.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  The Control Protocol

   In Section 12.2.6 of [RFC5661], the control protocol is introduced.
   There have been no specifications for control protocols, and indeed
   there need not be such a protocol in use for any given
   implementation.  The control protocol is actually a set of
   requirements provided to describe the interaction between the
   metadata server and the storage device.  When specifying a new Layout
   Type, the defining document MUST show how it meets these
   requirements, especially with respect to the security implications.

3.1.  Protocol Requirements

   The broad requirements of such interactions between the metadata
   server and the storage devices are:

   (1)  NFSv4.1 clients MUST be able to access a file directly through
        the metadata server and not the storage device.  I.e., the
        metadata server must be able to retrieve the data from the
        constituent storage devices and present it back to the client
        via normal NFSv4.1 operations.  Whether the metadata server
        allows access over other protocols (e.g., NFSv3, Server Message
        Block (SMB), etc.) is strictly an implementation choice.

   (2)  The metadata server MUST be able to restrict access to a file on
        the storage devices when it revokes a layout.  The metadata
        server typically would revoke a layout whenever a client fails
        to respond to a recall or fails to renew its lease in time.  It
        might also revoke the layout as a means of enforcing a change in
        state that the storage device cannot directly enforce with the
        client.

   (3)  Storage devices MUST NOT remove NFSv4.1's access controls: ACLs
        and file open modes.

   (4)  Locking MUST be respected.

   (5)  The metadata server and the storage devices MUST agree on
        attributes like modify time, the change attribute, and the
        end-of-file (EOF) position.

        Note that "agree" here means that some state changes need not be
        propagated immediately, although all changes SHOULD be
        propagated promptly.

   Note that there is no requirement on how these are implemented.
   While the File Layout Type does use the stateid to fence off the
   client, there is no requirement that other Layout Types use this
   stateid approach.  But the other Layout Types MUST document how the
   client, metadata server, and storage devices interact to meet these
   requirements.

3.2.  Non-protocol Requirements

   In gathering the requirements from Section 12 of [RFC5661], there are
   some which are notable in their absence:

   (1)  Storage devices MUST honor the byte range restrictions present
        in the layout.  For example, if the layout only provides access
        to the first 2 MB of the file, then any access after that MUST
        NOT be granted.

   (2)  The enforcement of authentication and authorization so that
        restrictions that would be enforced by the metadata server are
        also enforced by the storage device.  Examples include export
        access checks and iomode checks: if the layout has an iomode of
        LAYOUTIOMODE4_READ and the client attempts to write, the I/O
        may be rejected.

        While storage devices should make such checks on the layout
        iomode, [RFC5661] does not mandate that all Layout Types have to
        make such checks.

   (3)  The allocation and deallocation of storage.  I.e., creating and
        deleting files.

   Of these, the first two are of concern to this draft and Layout Types
   SHOULD honor them if at all possible.

3.3.  Editorial Requirements

   In addition to these protocol requirements, there are two editorial
   requirements for drafts that present a new Layout Type.
   At a minimum, the specification needs to address:

   (1)  The approach the new Layout Type takes towards fencing clients
        once the metadata server determines that the layout is revoked.

   (2)  The security considerations of the new Layout Type.

   While these could be envisioned as one section in that the fencing
   issue might be the only security issue, it is recommended to deal
   with them separately.

   The specification of the Layout Type should discuss how the client,
   metadata server, and storage device act together to meet the protocol
   requirements.  For example, if the storage device cannot enforce
   mandatory byte-range locks, then how can the metadata server and the
   client interact with the layout to enforce those locks?

4.  Implementations in Existing Layout Types

4.1.  File Layout Type

   Not surprisingly, the File Layout Type comes closest to the normal
   semantics of NFSv4.1.  In particular, the stateid used for I/O MUST
   have the same effect and be subject to the same validation on a data
   server as it would if the I/O were being performed on the metadata
   server itself in the absence of pNFS.

   And while for most implementations the storage devices can do the
   following validations:

   o  client holds a valid layout,

   o  client I/O matches the layout iomode, and,

   o  client does not go out of the byte ranges,

   these are each presented as a "SHOULD" and not a "MUST".  However, it
   is just these layout specific checks that are optional, not the
   normal file access semantics.  The storage devices MUST make all of
   the required access checks on each READ or WRITE I/O as determined by
   the NFSv4.1 protocol.  If the metadata server would deny a READ or
   WRITE operation on a file due to its ACL, mode attribute, open access
   mode, open deny mode, mandatory byte-range lock state, or any other
   attributes and state, the storage device MUST also deny the READ or
   WRITE operation.
   And note that while the NFSv4.1 protocol does not mandate export
   access checks based on the client's IP address, if the metadata
   server implements such a policy, then that counts as such state as
   outlined above.

   As the data filehandle provided by the PUTFH operation and the
   stateid in the READ or WRITE operation are used to ensure that the
   client has a valid layout for the I/O being performed, the client can
   be fenced off for access to a specific file via the invalidation of
   either key.

4.2.  Block Layout Type

   With the Block Layout Type, the storage devices are not guaranteed to
   be able to enforce file-based security.  Typically, storage area
   network (SAN) disk arrays and SAN protocols provide access control
   mechanisms (e.g., Logical Unit Number (LUN) mapping and/or masking),
   which operate at the granularity of individual hosts, not individual
   blocks.  Access to block storage is logically at a lower layer of the
   I/O stack than NFSv4, and hence NFSv4 security is not directly
   applicable to protocols that access such storage directly.  As such,
   [RFC5663] is very careful to define that in environments where pNFS
   clients cannot be trusted to enforce such policies, pNFS Block Layout
   Types SHOULD NOT be used.

   The implication here is that the security burden has shifted from the
   storage devices to the client.  It is the responsibility of the
   administrator doing the deployment to trust the client
   implementation.  However, this is not a new requirement when it comes
   to SAN protocols: the client is expected to provide block-based
   protection.

   This implication also extends to ACLs, locks, and layouts.  The
   storage devices might not be able to enforce any of these and the
   burden is pushed to the client to make the appropriate checks before
   sending I/O to the storage devices.
   As an example, if the metadata server uses a layout iomode for
   reading to enforce a mandatory read-only lock, then the client has to
   honor that intent by not sending WRITEs to the storage devices.  The
   basic issue here is that the storage device can be treated as a local
   dumb disk such that once the client has access to the storage device,
   it is able to perform either READ or WRITE I/O to the entire storage
   device.  The byte ranges in the layout, any locks, the layout iomode,
   etc., can only be enforced by the client.

   While the Block Layout Type does support client fencing upon revoking
   a layout, the above restrictions come into play again: the
   granularity of the fencing can only be at the host/logical-unit
   level.  Thus, if one of a client's layouts is unilaterally revoked by
   the server, it will effectively render useless *all* of the client's
   layouts for files located on the storage units comprising the logical
   volume.  This may render useless the client's layouts for files in
   other file systems.

4.3.  Object Layout Type

   The Object Layout Type concentrates its security checks at the time
   the layout is allocated.  The client will typically ask for a layout
   for each byte-range of either READ or READ/WRITE.  At that time, the
   metadata server should verify permissions against the layout iomode,
   the outstanding locks, the file mode bits or ACLs, etc.  As the
   client may be acting for multiple local users, it MUST authenticate
   and authorize the user by issuing respective OPEN and ACCESS calls to
   the metadata server, similar to having NFSv4 data delegations.

   Upon successful authorization, inside the layout, the client receives
   a set of object capabilities allowing it I/O access to the specified
   objects corresponding to the requested iomode.  These capabilities
   are used to enforce access control at the storage devices.
   Whenever the metadata server detects one of:

   o  the permissions on the object change,

   o  a conflicting mandatory byte-range lock is granted, or

   o  a layout is revoked and reassigned to another client,

   then it MUST change the capability version attribute on all objects
   comprising the file to implicitly invalidate any outstanding
   capabilities before committing to one of these changes.

   When the metadata server wishes to fence off a client to a particular
   object, then it can use the above approach to invalidate the
   capability attribute on the given object.  The client can be informed
   via the storage device that the capability has been rejected and is
   allowed to fetch a refreshed set of capabilities, i.e., re-acquire
   the layout.

5.  Summary

   In the three published Layout Types, the burden of enforcing the
   security of NFSv4.1 can fall to either the storage devices (Files),
   the client (Blocks), or the metadata server (Objects).  Such
   decisions seem to be forced by the native capabilities of the storage
   devices - if a real control protocol can be implemented, then the
   burden can be shifted primarily to the storage devices.

   But as we have seen, the control protocol is actually a set of
   requirements.  And as new Layout Types are published, the enclosing
   documents minimally MUST address:

   (1)  The fencing of clients after a layout is revoked.

   (2)  The security implications of the native capabilities of the
        storage devices with respect to the requirements of the NFSv4.1
        security model.

6.  Security Considerations

   The metadata server MUST be able to fence off a client's access to a
   file stored on a storage device.  When it revokes the layout, the
   client's access MUST be terminated at the storage devices.

7.  IANA Considerations

   This document has no actions for IANA.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
              System (NFS) Version 4 Minor Version 1 Protocol", RFC
              5661, January 2010.

   [RFC5663]  Black, D., Fridella, S., and J. Glasgow, "pNFS Block/
              Volume Layout", RFC 5663, January 2010.

   [RFC5664]  Halevy, B., Welch, B., and J. Zelenka, "Object-Based
              Parallel NFS (pNFS) Operations", RFC 5664, January 2010.

8.2.  Informative References

   [FlexFiles]
              Halevy, B., "Parallel NFS (pNFS) Flexible Files Layout",
              draft-bhalevy-nfsv4-flex-files-01 (Work In Progress),
              October 2013.

   [Lustre]   Faibish, S. and P. Tao, "Parallel NFS (pNFS) Lustre Layout
              Operations", draft-faibish-nfsv4-pnfs-lustre-layout-06
              (Work In Progress), November 2013.

Appendix A.  Acknowledgments

   Dave Noveck provided an early review that sharpened the clarity of
   the definitions.

Appendix B.  RFC Editor Notes

   [RFC Editor: please remove this section prior to publishing this
   document as an RFC]

   [RFC Editor: prior to publishing this document as an RFC, please
   replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the
   RFC number of this document]

Author's Address

   Thomas Haynes
   Primary Data, Inc.
   4300 El Camino Real Ste 100
   Los Altos, CA  94022
   USA

   Phone: +1 408 215 1519
   Email: thomas.haynes@primarydata.com