idnits 2.17.1

draft-myklebust-nfsv4-pnfs-backend-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([RFC2119]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 278 has weird spacing: '...s_parms pid...'

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'MUST not' in this paragraph:

     The REGISTER_DS operation allows an NFS client to signal to the
     metadata server its ability to act as a data server towards other NFS
     clients.  Note that the MDS MUST not ever issue a layout for a file for
     which the client does not hold a read delegation.

  -- The document date (October 25, 2010) is 4903 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 2 errors (**), 0 flaws (~~), 4 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

NFSv4                                                       T. Myklebust
Internet-Draft                                                    NetApp
Expires: April 28, 2011                                          Y. Allu
                                                        October 25, 2010


 Network File System (NFS) version 4 pNFS back end protocol extensions
                 draft-myklebust-nfsv4-pnfs-backend-01

Abstract

   This document describes an extension to the NFSv4.1 draft protocol to
   allow NFS clients to act as pNFS data servers towards other NFS
   clients.

   The intention is to reduce the load on the actual data servers by
   allowing some trusted clients to share the contents of their data
   caches with other clients.

Keywords

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 28, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Description of the proposed data sharing model
     2.1.  NFS client acting as a pure pNFS client
     2.2.  Meta data server responsibilities
     2.3.  NFS client acting as a pNFS data server
   3.  Security considerations
   4.  IANA Considerations
   5.  State expiration and recovery considerations
   6.  Structured Data Types
     6.1.  proxy_identifier4
   7.  New client operations
     7.1.  REGISTER_DS - Offer to act as a data server
       7.1.1.  ARGUMENTS
       7.1.2.  RESULTS
       7.1.3.  DESCRIPTION
     7.2.  UNREGISTER_DS - Revoke offer to act as a data server
       7.2.1.  ARGUMENTS
       7.2.2.  RESULTS
       7.2.3.  DESCRIPTION
     7.3.  PROXY_OPEN - Check proxy access rights to a file
       7.3.1.  ARGUMENTS
       7.3.2.  RESULTS
       7.3.3.  DESCRIPTION
   8.  New callback operations
     8.1.  CB_PROXY_REVOKE - revoke proxy access rights to a file
       8.1.1.  ARGUMENTS
       8.1.2.  RESULTS
       8.1.3.  DESCRIPTION
   9.  References
   Authors' Addresses

1.  Introduction

   The object of this proposal is to allow further scale out of NFS
   traffic by allowing NFS clients to share the contents of their file
   data caches with other NFS clients.

   The model assumes a server workload in which a number of read-only
   files are commonly accessed by more than one client at a time.  A
   typical use case would be one in which the exported filesystem
   contains a set of libraries, e.g. a UNIX /lib partition, a set of
   CAD/CAM objects, or a collection of PHP modules and other static
   webserver data.  On such systems, a common problem occurs when
   booting up the cluster, when potentially all clients need to access
   the same library data at roughly the same time.  Server bandwidth is
   then consumed by serving the same data over and over again.

   It is not obvious that use of the pNFS scale out mode is sufficient
   to avoid this kind of congestion.  The fundamental problem is that
   NFS clients are all accessing the same data, which are striped over
   the same data servers.  The effect may therefore simply be to move
   the bottleneck from the metadata server over onto the data servers.

   Current methods of reducing the impact of this congestion typically
   require the user to dedicate extra resources for the boot process.
   They include preloading the data on the client in local permanent
   caches (a.k.a. cachefs), replication of the shared data across
   several NFS servers, and setting up NFS proxy servers.

   Another solution, which does not require the use of dedicated
   resources, is the peer-to-peer model in which the first few clients
   to read the data from the server are allowed to share the contents of
   their cache with the next waves.  This document attempts to enable
   such a model by allowing clients which have already cached data to
   act as pNFS data servers toward their peers.  It does so by defining
   a control protocol in the sense defined in Section 12.2.6 of
   [draft-ietf-nfsv4-minorversion1-29] to enable the data server to
   enforce layouts, and negotiate authentication and authorization
   information with the server.

2.  Description of the proposed data sharing model

2.1.  NFS client acting as a pure pNFS client

   The proposal implies no protocol changes for NFSv4.1 clients that
   wish only to act as pNFS clients in order to access the cached data
   from other clients.  These clients will request file layouts from the
   metadata server using the LAYOUTGET operation in the usual fashion.
   Should the server return NFS4ERR_LAYOUTUNAVAILABLE, then the client
   proceeds to read from the file through the metadata server in the
   usual manner.  Otherwise, the client interprets the returned file
   layout in the manner specified by Section 13 of
   [draft-ietf-nfsv4-minorversion1-29] (NFSv4.1 as a Storage Protocol in
   pNFS: the File Layout Type).

2.2.  Meta data server responsibilities

   The metadata server has the usual responsibilities as dictated by
   Section 13 of [draft-ietf-nfsv4-minorversion1-29].
   It maintains the list of available data servers for each file,
   manages the layout requests from pNFS clients, responds to PROXY_OPEN
   requests from data servers, and ensures that PROXY_OPEN stateids are
   revoked when the corresponding layout is revoked.

2.3.  NFS client acting as a pNFS data server

   A client that wishes to act as a data server is required to notify
   the metadata server of that intention using the REGISTER_DS
   operation.  Depending on the circumstances, the client may opt to
   register as a data server for all cached files, for just a single
   filesystem, for a collection of filesystems, for a collection of
   specific files, or for just a single file.

   As stated in the introduction, the design assumes that the sharing of
   cached data between NFS clients will reduce the amount of NFS traffic
   to the permanent storage medium.  It therefore only makes sense to
   invoke this model when the server knows that the client that is
   acting as a data server is caching the file data aggressively.  To
   ensure that this is the case, the metadata server is required to
   issue layouts only for data servers that hold a read delegation for
   the file in question.

   Conversely, a client that is registered to act as a data server, and
   that receives a READ request for a file for which it does not hold a
   delegation, MUST reject that request with the error
   NFS4ERR_PNFS_NO_LAYOUT.

   When the data server receives a READ request from a client with a
   stateid or a data server filehandle that it does not recognise, it
   attempts to validate that request using the PROXY_OPEN call.  This
   operation will convert the data server filehandle as provided by the
   layout into a real filehandle that the data server can use to access
   the file on the metadata server.
   In order to make it easy for the data server to identify the file,
   the real filehandle SHOULD match the filehandle that was returned to
   the client when it received the read delegation.

   The PROXY_OPEN call also checks the access rights that were granted
   by the layout and the READ stateid for validity.  If the pNFS client
   in question does not hold a layout for this file, the PROXY_OPEN
   request from the data server will return NFS4ERR_PNFS_NO_LAYOUT.  In
   this case, the data server should not attempt to service the READ
   request, but should pass the error on to the pNFS client.

   If file access was verified by PROXY_OPEN, the data server can then
   attempt to service the READ request from its cache.  Should it fail
   to find the data in its cache, the data server should attempt to
   retrieve it from the metadata server.

   When layouts are returned to the metadata server, the data server is
   made responsible for fencing off any further READ requests.  To do
   so, the metadata server sends a CB_PROXY_REVOKE callback to the data
   server (or servers) referenced by that layout.  Upon receiving the
   CB_PROXY_REVOKE callback, the data server should match the filehandle
   and stateid arguments to the data filehandle that was previously used
   as an argument to the PROXY_OPEN request, and the stateid that was
   returned by that request.  Should the client attempt to reuse the
   same data filehandle and stateid in a future READ request, then the
   data server SHOULD revalidate the client's access using another
   PROXY_OPEN RPC call to the metadata server.

   An NFS client can at all times revoke its offer to act as a data
   server by using the UNREGISTER_DS operation.  This operation takes a
   single stateid, as returned by the original REGISTER_DS request.
   When the metadata server receives such a request, it must immediately
   revoke all layouts that reference that particular data server.
   It does not need to send a CB_PROXY_REVOKE notification to the data
   server that is unregistering; however, it MUST notify any other data
   servers that are referenced by the same layout.

3.  Security considerations

   As per Section 13.1 in [draft-ietf-nfsv4-minorversion1-29], it is
   expected that metadata servers will need to encode server routing
   information in the data server filehandles.  To enable this, the
   REGISTER_DS request includes a 64-bit cookie argument that the
   metadata server is required to store.  It is then required to encode
   that 64-bit cookie in the first 64 bits of the data server
   filehandle.

   All operations from the data server to the metadata server, including
   any operations required to refill the file cache in order to satisfy
   a READ request by the pNFS client, should be authenticated using a
   principal of the form "nfsd/hostname@REALM".  It is, however,
   expected that this requirement will be obsoleted should the proposal
   for RPCSEC_GSSv3 [draft-williams-rpcsecgssv3] be approved.  In this
   case, the data server may instead choose to create a process
   credential that asserts the credentials of the pNFS client.

4.  IANA Considerations

   This document has no actions for IANA.

5.  State expiration and recovery considerations

   Should the pNFS client's session expire on the metadata server, then
   the latter is required to recall all layouts from the data servers
   using the CB_PROXY_REVOKE callback.  Upon re-establishing the
   session, the pNFS client then proceeds to follow the usual state
   recovery routine, including layout recovery.

   Should the pNFS client's session expire on the data server, then it
   is required to recover that session before it can issue a new READ
   request.  In that case, the data server MUST assume that all existing
   layouts have been revoked.
   Should the pNFS client attempt to assert a layout, then it MUST be
   validated using a PROXY_OPEN call.

   Should the data server's session expire on the metadata server, then
   the metadata server MUST revoke all layouts that reference that data
   server.  It should also consider as invalid any REGISTER_DS requests
   that the data server had issued.  After recovering its session, the
   data server MAY reissue the REGISTER_DS requests.

   Finally, if the metadata server crashes, then the data server SHOULD
   reassert all REGISTER_DS requests as part of the recovery process.
   Once that is done, it must also assume that all layouts have been
   revoked, and that any attempt to reuse them MUST be revalidated using
   a PROXY_OPEN request.  Otherwise, both it and the pNFS client perform
   the normal NFS client recovery process.

6.  Structured Data Types

6.1.  proxy_identifier4

   union proxy_identifier4 switch (uint32_t flavor) {
   case RPCSEC_GSS:
       principal_arg        pid_principal;
   case AUTH_SYS:
       struct authsys_parms pid_authsys;
   default:
       void;
   };

   The proxy_identifier4 data type is used to identify the user on
   behalf of which the data server is issuing a PROXY_OPEN.

7.  New client operations

7.1.  REGISTER_DS - Offer to act as a data server

7.1.1.  ARGUMENTS

   const NFS4_MDS_IDENTIFIER_SIZE = 8;

   enum register_ds_type4 {
       REGISTER_DS_ALL            = 0,
       REGISTER_DS_FILESYSTEM     = 1,
       REGISTER_DS_ADD_FILESYSTEM = 2,
       REGISTER_DS_FILE           = 3,
       REGISTER_DS_ADD_FILE       = 4
   };

   typedef opaque mds_identifier4[NFS4_MDS_IDENTIFIER_SIZE];

   union register_ds switch (register_ds_type4 ds_type) {
   case REGISTER_DS_ALL:
       mds_identifier4 rea_mds_identifier;
   case REGISTER_DS_FILESYSTEM:
       /* CURRENT_FH: file on filesystem being re-exported */
       mds_identifier4 rea_mds_identifier;
   case REGISTER_DS_ADD_FILESYSTEM:
       /* CURRENT_FH: file on filesystem being re-exported */
       stateid4        rea_dataserver_stateid;
   case REGISTER_DS_FILE:
       /* CURRENT_FH: file being re-exported */
       mds_identifier4 rea_mds_identifier;
   case REGISTER_DS_ADD_FILE:
       /* CURRENT_FH: file being re-exported */
       stateid4        rea_dataserver_stateid;
   };

   struct REGISTER_DS4args {
       register_ds rea_dsinfo;
   };

7.1.2.  RESULTS

   union REGISTER_DS4res switch (nfsstat4 status) {
   case NFS4_OK:
       stateid4 res_dataserver_stateid;
   default:
       void;
   };

7.1.3.  DESCRIPTION

   The REGISTER_DS operation allows an NFS client to signal to the
   metadata server its ability to act as a data server towards other NFS
   clients.  Note that the MDS MUST NOT issue a layout for a file for
   which the client does not hold a read delegation.

   The client can register an intention to export all files for which it
   holds a read delegation, using the argument REGISTER_DS_ALL.

   It is also anticipated that some NFS setups may have the ability to
   set a caching and/or re-exporting policy.  For such setups, it is
   possible to set more fine-grained data server policies:
      REGISTER_DS_FILESYSTEM allows the client to specify that it wants
         to be a data server for a specific filesystem only.
      REGISTER_DS_ADD_FILESYSTEM allows the client to specify that it
         wants to add a filesystem to the data server represented by the
         stateid 'rea_dataserver_stateid'.
      REGISTER_DS_FILE allows the client to specify that it wants to
         serve a particular file only.
      REGISTER_DS_ADD_FILE allows the client to specify that it wants to
         add a file to the data server represented by the stateid
         'rea_dataserver_stateid'.

   The client should also supply a unique 64-bit identifier in the
   argument rea_mds_identifier.  The metadata server should place this
   identifier in the first 8 bytes of any data server filehandle, and
   the data server may use it to identify the MDS to which the
   filehandle belongs.

   On success, the server returns the stateid 'res_dataserver_stateid',
   which identifies the data server in future REGISTER_DS calls and in
   UNREGISTER_DS calls.

   The client may in fact hold several data server stateids, and use
   them to manage the overall policy.

7.2.  UNREGISTER_DS - Revoke offer to act as a data server

7.2.1.  ARGUMENTS

   struct UNREGISTER_DS4args {
       stateid4 una_dataserver_stateid;
   };

7.2.2.  RESULTS

   struct UNREGISTER_DS4res {
       nfsstat4 status;
   };

7.2.3.  DESCRIPTION

   When the MDS receives an UNREGISTER_DS operation, it must immediately
   invalidate all state associated with the data server stateid
   'una_dataserver_stateid'.

   It MUST therefore revoke all layouts that refer to the data server
   that is represented by una_dataserver_stateid.

   After revoking the layouts, the MDS MUST no longer issue layouts for
   these files and filesystems using the data server represented by
   una_dataserver_stateid.

7.3.  PROXY_OPEN - Check proxy access rights to a file

   This takes a data server filehandle, the read stateid, and a proxy
   user identifier, and returns the true filehandle on success.

7.3.1.  ARGUMENTS

   struct PROXY_OPEN4args {
       /* CURRENT_FH: "data server filehandle" */
       proxy_identifier4 popa_user_id;
       stateid4          popa_read_stateid;
   };

7.3.2.  RESULTS

   union PROXY_OPEN4res switch (nfsstat4 status) {
   case NFS4_OK:
       /* CURRENT_FH: true filehandle */
       stateid4 popr_proxy_stateid;
   default:
       void;
   };

7.3.3.  DESCRIPTION

   The PROXY_OPEN operation authenticates the READ request by the pNFS
   client.  If the data filehandle is valid, and the user identified by
   popa_user_id is authorised to access the file, then the metadata
   server returns the true filehandle (as returned by LOOKUP and/or
   OPEN) of the file.

   If the pNFS client does not currently hold a layout for this file,
   then the PROXY_OPEN request should fail with the error
   NFS4ERR_PNFS_NO_LAYOUT.

   If the data server filehandle argument cannot be translated into a
   valid metadata server filehandle, then the errors NFS4ERR_STALE,
   NFS4ERR_BADHANDLE, or NFS4ERR_FHEXPIRED should be returned, as
   appropriate.

   If the stateid argument does not correspond to a valid open stateid,
   delegation stateid, or lock stateid for the file on which the READ is
   being attempted, then the metadata server should return the
   appropriate error.

   In case of success, the metadata server returns a stateid
   "popr_proxy_stateid" that is used by the CB_PROXY_REVOKE callback to
   identify which layout is being revoked.

8.  New callback operations

8.1.  CB_PROXY_REVOKE - revoke proxy access rights to a file

   This takes a data server filehandle and a proxy open stateid, and
   revokes them.

8.1.1.  ARGUMENTS

   struct CB_PROXY_REVOKE4args {
       nfs_fh4  pra_object;
       stateid4 pra_proxy_stateid;
   };

8.1.2.  RESULTS

   struct CB_PROXY_REVOKE4res {
       nfsstat4 prr_status;
   };

8.1.3.  DESCRIPTION

   pra_object is the data server filehandle for the file, whereas
   pra_proxy_stateid is the stateid that was returned by the PROXY_OPEN
   operation.

   Upon receiving this callback, the data server MUST invalidate all
   state associated with the stateid pra_proxy_stateid, and return
   NFS4_OK.

   If the filehandle was not found, the data server MUST return
   NFS4ERR_BADHANDLE.  If the stateid was not found, it MUST return
   NFS4ERR_BAD_STATEID.

9.  References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", RFC 2119.

   [draft-ietf-nfsv4-minorversion1-29]
              Shepler, S., Eisler, M., and D. Noveck, "NFS Version 4
              Minor Version 1", draft-ietf-nfsv4-minorversion1-29.

   [draft-williams-rpcsecgssv3]
              Williams, N., "Remote Procedure Call (RPC) Security
              Version 3", draft-williams-rpcsecgssv3-00.

Authors' Addresses

   Trond Myklebust
   NetApp
   3215 Bellflower Ct
   Ann Arbor, MI  48103
   USA

   Phone: +1-734-662-6608
   Email: Trond.Myklebust@netapp.com


   Yamini Allu
   2421 Mission College Blvd
   Santa Clara, CA  95054
   USA

   Phone: +1-631-662-3422
   Email: yamini.allu@gmail.com
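
Illustrative sketch (not part of the protocol definition)

   The data server behaviour described in Section 2.3 and the
   CB_PROXY_REVOKE handling in Section 8.1.3 can be pictured with the
   following minimal, in-memory Python model.  It is only a sketch
   under stated assumptions: FakeMDS, DataServer, handle_read and
   cb_proxy_revoke are hypothetical names that do not appear in this
   document, filehandles and stateids are modelled as plain strings,
   no RPC, XDR encoding or authentication is performed, and the choice
   to check the read delegation after PROXY_OPEN has resolved the true
   filehandle is an assumption about ordering that the text does not
   specify.

```python
# Hypothetical in-memory model; nothing here is a real NFS API.
NFS4_OK = "NFS4_OK"
NFS4ERR_PNFS_NO_LAYOUT = "NFS4ERR_PNFS_NO_LAYOUT"
NFS4ERR_BADHANDLE = "NFS4ERR_BADHANDLE"
NFS4ERR_BAD_STATEID = "NFS4ERR_BAD_STATEID"


class FakeMDS:
    """Stand-in for the metadata server (assumption: layouts are a dict)."""

    def __init__(self):
        self.layouts = {}   # (ds_fh, read_stateid) -> true filehandle
        self.files = {}     # true filehandle -> file contents
        self.next_id = 0

    def proxy_open(self, ds_fh, read_stateid):
        """Model PROXY_OPEN: return (status, true_fh, proxy_stateid)."""
        true_fh = self.layouts.get((ds_fh, read_stateid))
        if true_fh is None:
            return NFS4ERR_PNFS_NO_LAYOUT, None, None
        self.next_id += 1
        return NFS4_OK, true_fh, f"proxy-{self.next_id}"

    def read(self, true_fh):
        """Cache refill path: fetch the whole file from the MDS."""
        return self.files[true_fh]


class DataServer:
    """Stand-in for an NFS client that has registered as a data server."""

    def __init__(self, mds):
        self.mds = mds
        self.delegations = set()  # true filehandles with a read delegation
        self.cache = {}           # true filehandle -> cached contents
        self.validated = {}       # (ds_fh, read_stateid) -> (true_fh, proxy_sid)

    def handle_read(self, ds_fh, read_stateid, offset, count):
        key = (ds_fh, read_stateid)
        if key not in self.validated:
            # Unrecognised filehandle/stateid: validate with PROXY_OPEN.
            status, true_fh, proxy_sid = self.mds.proxy_open(ds_fh, read_stateid)
            if status != NFS4_OK:
                return status, b""  # pass the error on to the pNFS client
            self.validated[key] = (true_fh, proxy_sid)
        true_fh, _ = self.validated[key]
        if true_fh not in self.delegations:
            # No read delegation held for this file: reject the READ.
            return NFS4ERR_PNFS_NO_LAYOUT, b""
        if true_fh not in self.cache:
            self.cache[true_fh] = self.mds.read(true_fh)  # cache miss
        return NFS4_OK, self.cache[true_fh][offset:offset + count]

    def cb_proxy_revoke(self, ds_fh, proxy_sid):
        """Model CB_PROXY_REVOKE: forget the validated access, if known."""
        matches = [k for k in self.validated if k[0] == ds_fh]
        if not matches:
            return NFS4ERR_BADHANDLE
        for k in matches:
            if self.validated[k][1] == proxy_sid:
                del self.validated[k]  # next READ must revalidate
                return NFS4_OK
        return NFS4ERR_BAD_STATEID
```

   In this model, a READ that follows a successful cb_proxy_revoke call
   for the same filehandle and stateid triggers a fresh PROXY_OPEN,
   which mirrors the revalidation that Section 2.3 recommends.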