Internet Engineering Task Force                           M. Eisler, Ed.
Internet-Draft                                                    NetApp
Intended status: Informational                          M. Susairaj, Ed.
Expires: April 17, 2011                                           Oracle
                                                        October 14, 2010


            Extending NFS to Support Enterprise Applications
                 draft-eisler-nfsv4-enterprise-apps-01

Abstract

   This document proposes extensions to NFSv4 to support enterprise
   applications, including a new operation to efficiently initialize
   files.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 17, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Operation XX: INITIALIZE - Initialize File
     2.1.  ARGUMENT
     2.2.  RESULT
     2.3.  MOTIVATION
     2.4.  DESCRIPTION
     2.5.  IMPLEMENTATION
   3.  Operation XX: IO_ADVISE - Advise server of client's intended
       I/O access pattern
     3.1.  ARGUMENT
     3.2.  RESULT
     3.3.  MOTIVATION
     3.4.  DESCRIPTION
   4.  Operation XX: READ_WITH_ADVICE - READ with advice
     4.1.  ARGUMENT
     4.2.  RESULT
     4.3.  MOTIVATION
     4.4.  DESCRIPTION
   5.  Operation XX: WRITE_WITH_ADVICE - WRITE with advice
     5.1.  ARGUMENT
     5.2.  RESULT
     5.3.  MOTIVATION
     5.4.  DESCRIPTION
   6.  Operation XX: SET_WORKFLOW_TAG - Sets the workflow tag of a
       given session
     6.1.  ARGUMENT
     6.2.  RESULT
     6.3.  MOTIVATION
     6.4.  DESCRIPTION
   7.  Operation XX: SESSION_CTL - Adjust session parameters
     7.1.  ARGUMENT
     7.2.  RESULT
     7.3.  MOTIVATION
     7.4.  DESCRIPTION
   8.  Modification to Operation 42: EXCHANGE_ID - Instantiate Client ID
     8.1.  ARGUMENT
     8.2.  RESULT
     8.3.  MOTIVATION
     8.4.  DESCRIPTION
   9.  Acknowledgements
   10. IANA Considerations
   11. Security Considerations
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   Enterprise applications (such as databases) have requirements that go
   beyond the traditional use cases for NFS.  The requirements fall into
   two broad categories: (1) data integrity and (2) quality of service.
   This document proposes a set of operations for a future minor version
   of NFSv4 to support the requirements of enterprise applications.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Operation XX: INITIALIZE - Initialize File

2.1.  ARGUMENT

   struct INITIALIZE4args {
           /* CURRENT_FH: file */
           stateid4        ia_stateid;
           offset4         ia_offset;
           length4         ia_blocksize;
           length4         ia_blockcount;
           length4         ia_reloff_pattern;
           length4         ia_reloff_blocknum;
           opaque          ia_pattern<>;
   };

2.2.  RESULT

   nfsstat4;

2.3.  MOTIVATION

   Enterprise applications that use files almost always need to
   initialize those files to a known state.  Even with existing files,
   after a file grows, the application needs to initialize the expanded
   region of the file.  The most trivial initial state is every byte set
   to zero.  The problem with initializing to zero is that it is often
   difficult to distinguish a byte range initialized to all zeroes from
   data corruption, since a run of zeroes is a probable pattern for
   corruption.  Instead, some applications, such as database management
   systems, use patterns consisting of bytes or words of non-zero
   values.  Ideally, one would like to efficiently initialize an entire
   file to a specified pattern without having to send WRITE requests for
   the entire file.  The INITIALIZE operation is a bandwidth-conserving
   operation for initializing file state.

2.4.  DESCRIPTION

   The INITIALIZE operation is used to initialize an open file to an
   iterated pattern.  Each iteration of the pattern consists of a fixed
   string and a block number.  The pattern is defined by the following
   arguments:

   o  ia_offset: where to start the iterated pattern.  This value is
      specified in bytes.

   o  ia_blocksize: the size, in bytes, of each iteration of the
      pattern.  Each iteration is called a block.

   o  ia_blockcount: the number of blocks to initialize.

   o  ia_reloff_pattern: the relative offset within a block at which to
      write the fixed string encoded in ia_pattern.

   o  ia_reloff_blocknum: the relative offset within a block at which to
      write a 64-bit block number.  The block number is incremented once
      a block is written.  The block number is always written in
      little-endian order.
      If ia_reloff_blocknum is set to NFS4_UINT64_MAX, this informs the
      server that no block number is to be written.

   o  ia_pattern: a fixed string written to every block.  If the length
      of ia_pattern is zero, this informs the server that no string is
      to be written.

   The field ia_stateid is the stateid corresponding to the current
   filehandle's share reservation, delegation, or byte-range lock.

   An example illustrates how the client uses INITIALIZE.  Suppose the
   arguments (except for ia_stateid) are { 0, 500, 1000, 8, 0,
   "DeadBeef" }; that is, ia_offset is 0, ia_blocksize is 500,
   ia_blockcount is 1000, ia_reloff_pattern is 8, ia_reloff_blocknum is
   0, and ia_pattern is "DeadBeef".  Then, starting at offset zero, the
   file will have the following contents:

   Offset           Value (decimal bytes or ASCII)
   0                0 0 0 0 0 0 0 0
   8                'D' 'e' 'a' 'd' 'B' 'e' 'e' 'f'
   16-499           zeroes

   500              1 0 0 0 0 0 0 0
   508              'D' 'e' 'a' 'd' 'B' 'e' 'e' 'f'
   516-999          zeroes

   ...

   499500           231 3 0 0 0 0 0 0
   499508           'D' 'e' 'a' 'd' 'B' 'e' 'e' 'f'
   499516-499999    zeroes

2.5.  IMPLEMENTATION

   When an NFS server receives this operation, instead of writing the
   iterated pattern over each block, it should deallocate the data of
   the affected range of the file and record the values of ia_offset,
   ia_blocksize, ia_blockcount, ia_reloff_pattern, ia_reloff_blocknum,
   and ia_pattern<> in the file's metadata.  When a client sends a READ
   request for that range, instead of returning zeroes, the server
   should construct a response corresponding to the pattern specified
   in the arguments to INITIALIZE.

   An application may well have a legacy pattern for initialized blocks
   that cannot be mapped to the one specified for INITIALIZE.  Such an
   application should be modified to detect that a block corresponds to
   INITIALIZE's pattern.  When the application sees such a block, it
   can overwrite the block with the legacy pattern.  Note that this will
   cause the block to be allocated on the NFS server.
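
   As a non-normative illustration of the READ behavior described
   above, the following minimal Python sketch shows how a server might
   synthesize the data for a byte range covered by an earlier INITIALIZE
   using only the recorded arguments.  The helper names initialize_block
   and read_initialized are hypothetical, as is the assumption that
   bytes covered by neither the pattern nor the block number read as
   zero (which matches the example in Section 2.4).  The block number is
   encoded little-endian, as the DESCRIPTION requires.

```python
import struct

NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF  # "no block number" sentinel


def initialize_block(blocksize, blocknum, reloff_pattern,
                     reloff_blocknum, pattern):
    """Return the contents of one INITIALIZE block (hypothetical helper)."""
    block = bytearray(blocksize)  # uncovered bytes are assumed to be zero
    if reloff_blocknum != NFS4_UINT64_MAX:
        if reloff_blocknum + 8 > blocksize:
            raise ValueError("block number does not fit in the block")
        # 64-bit block number, written in little-endian order.
        block[reloff_blocknum:reloff_blocknum + 8] = struct.pack("<Q", blocknum)
    if pattern:  # a zero-length ia_pattern means "write no string"
        if reloff_pattern + len(pattern) > blocksize:
            raise ValueError("pattern does not fit in the block")
        block[reloff_pattern:reloff_pattern + len(pattern)] = pattern
    return bytes(block)


def read_initialized(offset, count, ia_offset, blocksize, blockcount,
                     reloff_pattern, reloff_blocknum, pattern):
    """Synthesize READ data for [offset, offset + count) from INITIALIZE args.

    The result is truncated at the end of the initialized range.
    """
    if offset < ia_offset:
        raise ValueError("read starts before the initialized range")
    end = min(offset + count, ia_offset + blocksize * blockcount)
    out = bytearray()
    pos = offset
    while pos < end:
        # Which block this byte falls in, and where inside that block.
        blocknum, within = divmod(pos - ia_offset, blocksize)
        block = initialize_block(blocksize, blocknum, reloff_pattern,
                                 reloff_blocknum, pattern)
        take = min(blocksize - within, end - pos)
        out += block[within:within + take]
        pos += take
    return bytes(out)
```

   With the example arguments from Section 2.4, read_initialized(499500,
   16, 0, 500, 1000, 8, 0, b"DeadBeef") returns block number 999 (bytes
   231 3 0 0 0 0 0 0) followed by "DeadBeef".  Passing a zero-length
   pattern together with NFS4_UINT64_MAX for reloff_blocknum yields all
   zeroes, which corresponds to the hole-punching case.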

   When the length of ia_pattern is zero and the value of
   ia_reloff_blocknum is NFS4_UINT64_MAX, the client is requesting that
   a hole be punched into the file.

3.  Operation XX: IO_ADVISE - Advise server of client's intended I/O
    access pattern

3.1.  ARGUMENT

   enum io_advise_type {
           IO_ADVISE4_SEQUENTIAL_CACHE        = 0,
           IO_ADVISE4_SEQUENTIAL_DONTCACHE    = 1,
           IO_ADVISE4_RANDOM                  = 2,
           IO_ADVISE4_PREFETCH                = 3,
           IO_ADVISE4_PREFETCH_OPPORTUNISTIC  = 4,
           IO_ADVISE4_INTENT_TO_WRITE         = 5,
           IO_ADVISE4_RECENTLY_USED           = 6
   };

   struct io_directions {
           stateid4        iod_stateid;
           offset4         iod_offset;
           bitmap4         iod_flags;
   };

   struct IO_ADVISE4args {
           /* CURRENT_FH: file */
           io_directions   ioaa_directions;
           length4         ioaa_count;
   };

3.2.  RESULT

   struct IO_ADVISE4resok {
           bitmap4         ioar_flags;
   };

   union IO_ADVISE4res switch (nfsstat4 ioar_status) {
   case NFS4_OK:
           IO_ADVISE4resok ioar_resok4;
   default:
           void;
   };

3.3.  MOTIVATION

   The client is in a better position than the server to deduce the
   intended I/O pattern, especially if the application provides this
   information.  With this information, the server can optimize I/O to
   the file.

3.4.  DESCRIPTION

   The IO_ADVISE operation is used to advise the server as to how the
   holder of the stateid intends to access the file over the specified
   byte range (iod_offset through iod_offset + ioaa_count - 1).  The
   advice values are:

   o  IO_ADVISE4_SEQUENTIAL_CACHE: Sequential access to data is
      expected.  The server should leave data in its cache.

   o  IO_ADVISE4_SEQUENTIAL_DONTCACHE: Sequential access to data is
      expected.  The server does not need to leave data in its cache.

   o  IO_ADVISE4_RANDOM: Random access to data is expected.

   o  IO_ADVISE4_PREFETCH: The stateid holder expects to access the data
      soon; the server should prefetch the data in preparation.

   o  IO_ADVISE4_PREFETCH_OPPORTUNISTIC: The stateid holder expects to
      access the data soon; the server should prefetch the data if it
      can do so at marginal cost.

   o  IO_ADVISE4_INTENT_TO_WRITE: The byte range will be written soon,
      so there is no point in caching its data.

   o  IO_ADVISE4_RECENTLY_USED: The client has recently accessed the
      byte range in its own cache.  This informs the server that the
      data in the byte range remains important to the client.  When the
      server reaches resource exhaustion, knowing which data is more
      important allows the server to make better choices about which
      data to, for example, purge from a cache or move to secondary
      storage.  It also informs the server which delegations are more
      important, since, if delegations are working correctly, once a
      file is delegated to a client, the server might never receive
      another I/O request for it.

   The results indicate which advice the server intends to follow.  The
   server MUST NOT return an error if it does not recognize or does not
   support the requested advice.  The server MAY return different advice
   than what the client requested.  If it does, this might be due to one
   of several conditions, including, but not limited to: another client
   advising of a different I/O access pattern; a different I/O access
   pattern from another client that the server has heuristically
   detected; or the server being unable to support the requested I/O
   access pattern, perhaps due to a temporary resource limitation (for
   example, a request for IO_ADVISE4_SEQUENTIAL_CACHE might not be
   supported because the server cannot afford to cache data and/or
   cannot afford to queue read-ahead requests).

4.  Operation XX: READ_WITH_ADVICE - READ with advice

4.1.  ARGUMENT

   enum io_advise_type {
           IO_ADVISE4_SEQUENTIAL_CACHE        = 0,
           IO_ADVISE4_SEQUENTIAL_DONTCACHE    = 1,
           IO_ADVISE4_RANDOM                  = 2,
           IO_ADVISE4_PREFETCH                = 3,
           IO_ADVISE4_PREFETCH_OPPORTUNISTIC  = 4,
           IO_ADVISE4_INTENT_TO_WRITE         = 5,
           IO_ADVISE4_RECENTLY_USED           = 6
   };

   struct io_directions {
           stateid4        iod_stateid;
           offset4         iod_offset;
           bitmap4         iod_flags;
   };

   struct READ_WITH_ADVICE4args {
           /* CURRENT_FH: file */
           io_directions   rwaa_directions;
           length4         rwaa_count;
   };

4.2.  RESULT

   const NFS4_TWO_GB = 0x80000000;

   typedef opaque twoGB_byte_array4[NFS4_TWO_GB];

   struct fourGB_buffer4 {
           twoGB_byte_array4 fb_halves[2];
   };

   struct large_buffer4 {
           fourGB_buffer4  lb_big_buffers<>;
           opaque          lb_small_buffer<>;
   };

   struct READ_WITH_ADVICE4resok {
           bool            rwar_eof;
           bitmap4         rwar_flags;
           large_buffer4   rwar_data;
   };

   union READ_WITH_ADVICE4res switch (nfsstat4 rwar_status) {
   case NFS4_OK:
           READ_WITH_ADVICE4resok rwar_resok4;
   default:
           void;
   };

4.3.  MOTIVATION

   Under some circumstances, the IO_ADVISE operation is insufficient
   when the client is also performing a READ operation.  Some advice
   needs to be communicated atomically with the READ operation, and an
   IO_ADVISE in the same COMPOUND as the READ would fail to provide the
   necessary advice.  For example, if IO_ADVISE followed READ, and the
   server was given advice not to cache the data requested by READ, the
   IO_ADVISE would be too late, because the server might already have
   cached the data.  If IO_ADVISE preceded READ, then, to be effective,
   the advice would have to be carried across two operations in the
   same COMPOUND.  This would complicate the server implementation.

4.4.  DESCRIPTION

   The READ_WITH_ADVICE operation is used to read from a file and to
   advise the server as to how the reader intends to access the file
   over the specified byte range (iod_offset through iod_offset +
   rwaa_count - 1).

   o  IO_ADVISE4_SEQUENTIAL_CACHE: Sequential access to data is
      expected.  The server should leave data in its cache.

   o  IO_ADVISE4_SEQUENTIAL_DONTCACHE: Sequential access to data is
      expected.  The server does not need to leave data in its cache.

   o  IO_ADVISE4_RANDOM: Random access to data is expected.

   o  IO_ADVISE4_PREFETCH: Not applicable.

   o  IO_ADVISE4_PREFETCH_OPPORTUNISTIC: Not applicable.

   o  IO_ADVISE4_INTENT_TO_WRITE: The byte range will be written soon,
      so there is no point in caching its data.

   o  IO_ADVISE4_RECENTLY_USED: An explicit hint to keep the data of the
      byte range in cache.

   The results indicate which advice the server intends to follow.  The
   server MUST NOT return an error if it does not recognize or does not
   support the requested advice.

   The intent is that READ_WITH_ADVICE is preferred over READ.  In
   addition to providing I/O hints, READ_WITH_ADVICE uses 64-bit data
   lengths, in anticipation of expected improvements in average network
   speeds and network buffer capacities.  Because the XDR standard does
   not support 64-bit array lengths, the large_buffer4 data type is
   introduced to encode an array of zero or more fixed-size buffers of
   2^32 bytes each, followed by a variable-length array of up to
   2^32 - 1 bytes.

5.  Operation XX: WRITE_WITH_ADVICE - WRITE with advice

5.1.  ARGUMENT

   enum stable_how4 {              /* from NFSv4.0 */
           UNSTABLE4       = 0,
           DATA_SYNC4      = 1,
           FILE_SYNC4      = 2,
           LAYOUT_SYNC4    = 3     /* new */
   };

   enum io_advise_type {
           IO_ADVISE4_SEQUENTIAL_CACHE        = 0,
           IO_ADVISE4_SEQUENTIAL_DONTCACHE    = 1,
           IO_ADVISE4_RANDOM                  = 2,
           IO_ADVISE4_PREFETCH                = 3,
           IO_ADVISE4_PREFETCH_OPPORTUNISTIC  = 4,
           IO_ADVISE4_INTENT_TO_WRITE         = 5,
           IO_ADVISE4_RECENTLY_USED           = 6
   };

   struct io_directions {
           stateid4        iod_stateid;
           offset4         iod_offset;
           bitmap4         iod_flags;
   };

   struct WRITE_WITH_ADVICE4args {
           /* CURRENT_FH: file */
           stable_how4     wwaa_stable;
           io_directions   wwaa_directions;
           large_buffer4   wwaa_data;
   };

5.2.  RESULT

   struct WRITE_WITH_ADVICE4resok {
           length4         wwar_count;
           stable_how4     wwar_committed;
           bitmap4         wwar_flags;
   };

   union WRITE_WITH_ADVICE4res switch (nfsstat4 wwar_status) {
   case NFS4_OK:
           WRITE_WITH_ADVICE4resok wwar_resok4;
   default:
           void;
   };

5.3.  MOTIVATION

   Under some circumstances, the IO_ADVISE operation is insufficient
   when the client is also performing a WRITE operation.  Some advice
   needs to be communicated atomically with the WRITE operation, and an
   IO_ADVISE in the same COMPOUND as the WRITE would fail to provide the
   necessary advice.  For example, if IO_ADVISE followed WRITE, and the
   server was given advice not to cache the data written by WRITE, the
   IO_ADVISE would be too late, because the server might already have
   cached the data.  If IO_ADVISE preceded WRITE, then, to be effective,
   the advice would have to be carried across two operations in the
   same COMPOUND.  This would complicate the server implementation.

   This operation adds a new enumerated value for stable_how4, called
   LAYOUT_SYNC4, in order to reduce the need for LAYOUTCOMMIT
   operations.

5.4.  DESCRIPTION

   The WRITE_WITH_ADVICE operation is used to write to a file and to
   advise the server as to how the writer intends to access the file
   over the specified byte range (iod_offset through iod_offset + the
   amount of data in wwaa_data - 1).

   o  IO_ADVISE4_SEQUENTIAL_CACHE: Sequential access to data is
      expected.  The server should leave data in its cache.

   o  IO_ADVISE4_SEQUENTIAL_DONTCACHE: Sequential access to data is
      expected.  The server does not need to leave data in its cache.

   o  IO_ADVISE4_RANDOM: Random access to data is expected.

   o  IO_ADVISE4_PREFETCH: Not applicable.

   o  IO_ADVISE4_PREFETCH_OPPORTUNISTIC: Not applicable.

   o  IO_ADVISE4_INTENT_TO_WRITE: The byte range will be overwritten
      soon, so there is no point in caching its data.

   o  IO_ADVISE4_RECENTLY_USED: An explicit hint to keep the data of the
      byte range in cache.

   The results indicate which advice the server intends to follow.  The
   server MUST NOT return an error if it does not recognize or does not
   support the requested advice.

   The intent is that WRITE_WITH_ADVICE is preferred over WRITE.  In
   addition to providing I/O hints, WRITE_WITH_ADVICE uses 64-bit data
   lengths, in anticipation of expected improvements in average network
   speeds and network buffer capacities.  Because the XDR standard does
   not support 64-bit array lengths, WRITE_WITH_ADVICE uses the
   large_buffer4 data type defined in Section 4.2, which encodes an
   array of zero or more fixed-size buffers of 2^32 bytes each, followed
   by a variable-length array of up to 2^32 - 1 bytes.

   In general, if the value of wwaa_stable is valid, then the value of
   wwar_committed in the reply MUST NOT be less than the value of
   wwaa_stable.  The exception is when wwaa_stable is LAYOUT_SYNC4.
   LAYOUT_SYNC4 is an enumerated value that can be used by the client
   when the server is a pNFS data server and the client has a layout
   that covers the byte range specified by iod_offset and the amount of
   data in wwaa_data.
   If the client sends a WRITE_WITH_ADVICE to a data server with
   wwaa_stable set to LAYOUT_SYNC4, then a successful reply MUST return
   a value of wwar_committed equal to LAYOUT_SYNC4 or FILE_SYNC4.
   Regardless of the value of wwaa_stable, if the server is a pNFS data
   server, it MAY return a value of wwar_committed equal to
   LAYOUT_SYNC4.  Whenever wwar_committed is LAYOUT_SYNC4, this
   indicates that the range of the layout covered by iod_offset and
   wwar_count has been committed to the metadata server, and there is no
   need to send a LAYOUTCOMMIT for that range.

6.  Operation XX: SET_WORKFLOW_TAG - Sets the workflow tag of a given
    session

6.1.  ARGUMENT

   struct SET_WORKFLOW_TAG4args {
           uint64_t        swta_tag;
   };

6.2.  RESULT

   nfsstat4;

6.3.  MOTIVATION

   Enterprise applications require guarantees of quality and/or
   priority of service.  Providing end-to-end guarantees requires
   awareness, at the file service level, of the necessary quality
   and/or priority.

6.4.  DESCRIPTION

   SET_WORKFLOW_TAG sets the workflow tag of the session on which it is
   sent.  All operations in progress before the server receives
   SET_WORKFLOW_TAG use the previous tag (if any).  All operations
   received after the server receives SET_WORKFLOW_TAG use the new tag.

7.  Operation XX: SESSION_CTL - Adjust session parameters

7.1.  ARGUMENT

   struct channel_attrs4 {         /* from NFSv4.1 */
           count4          ca_headerpadsize;
           count4          ca_maxrequestsize;
           count4          ca_maxresponsesize;
           count4          ca_maxresponsesize_cached;
           count4          ca_maxoperations;
           count4          ca_maxrequests;
           uint32_t        ca_rdma_ird<1>;
   };

   /* from NFSv4.1 */
   const CREATE_SESSION4_FLAG_PERSIST        = 0x00000001;
   const CREATE_SESSION4_FLAG_CONN_BACK_CHAN = 0x00000002;
   const CREATE_SESSION4_FLAG_CONN_RDMA      = 0x00000004;

   struct session_ctl4 {
           uint32_t        sc_flags;
           channel_attrs4  sc_fore_chan_attrs;
           channel_attrs4  sc_back_chan_attrs;
   };

   typedef session_ctl4 SESSION_CTL4args;

7.2.  RESULT

   union SESSION_CTL4res switch (nfsstat4 scr_status) {
   case NFS4_OK:
           session_ctl4    scr_resok4;
   default:
           void;
   };

7.3.  MOTIVATION

   The introduction of the session model in NFSv4.1 imposes an explicit
   limit on the number of outstanding requests a client can make of an
   NFS server.  In enterprise applications, it is possible that each NFS
   request corresponds to a single application request.  Thus, the size
   of the slot table can bound the number of outstanding application
   requests.  There are workarounds, for example: (1) implementing a
   mapping layer between the application's request slot list and the
   client's slot table, or (2) creating additional sessions in order to
   preserve a one-to-one mapping between application and client slots.
   However, these workarounds introduce complexity.  The application's
   need for slots is dynamic.  The NFSv4.1 model assumes a dynamic slot
   table, but the size of the slot table is driven by the server, via
   the reply to the SEQUENCE operation and the CB_RECALL_SLOT operation.
   What is missing is a method for the client to request a larger slot
   table.

7.4.  DESCRIPTION

   This operation allows the client to request changes to the session's
   parameters.
   There are three major fields in the arguments and results:

   o  sc_flags.  These flags correspond to the csa_flags argument and
      the csr_flags result of CREATE_SESSION.  In the result, the value
      of each bit in sc_flags MUST equal one of the following:

      *  The corresponding bit in sc_flags of the arguments to
         SESSION_CTL.

      *  The corresponding bit in sc_flags of the result of the previous
         SESSION_CTL that the server executed.

      *  If the server has not executed a previous SESSION_CTL, the
         corresponding bit in the csr_flags field of the reply to the
         CREATE_SESSION operation that created the session.

   o  sc_fore_chan_attrs.  In the arguments of SESSION_CTL, the fields
      within sc_fore_chan_attrs correspond to the fields of the argument
      csa_fore_chan_attrs in the arguments of CREATE_SESSION.  In the
      results of SESSION_CTL, the values of the fields within
      sc_fore_chan_attrs correspond to the fields of the result
      csr_fore_chan_attrs in the response to CREATE_SESSION.  The values
      of the fields in the result sc_fore_chan_attrs are governed by the
      same rules that govern the values of the fields of
      csr_fore_chan_attrs.

   o  sc_back_chan_attrs.  In the arguments of SESSION_CTL, the fields
      within sc_back_chan_attrs correspond to the fields of the argument
      csa_back_chan_attrs in the arguments of CREATE_SESSION.  In the
      results of SESSION_CTL, the values of the fields within
      sc_back_chan_attrs correspond to the fields of the result
      csr_back_chan_attrs in the response to CREATE_SESSION.  The values
      of the fields in the result sc_back_chan_attrs are governed by the
      same rules that govern the values of the fields of
      csr_back_chan_attrs.

   The SESSION_CTL operation MUST be sent in a COMPOUND request whose
   SEQUENCE operation has the sa_slotid argument set to zero.
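
   The per-bit rule for sc_flags in the SESSION_CTL result can be
   checked mechanically.  The following minimal Python sketch, which a
   client or a test harness might use to validate a reply, is
   illustrative only: the function name sc_flags_result_ok is
   hypothetical, and it assumes the caller tracks the sc_flags returned
   by the previous SESSION_CTL (None if there was none) as well as the
   csr_flags returned by CREATE_SESSION.  Because each bit is either 0
   or 1, a result bit violates the rule only when it differs from both
   the requested bit and the bit currently in effect.

```python
from typing import Optional

# Flag values from NFSv4.1 CREATE_SESSION, reused by SESSION_CTL.
CREATE_SESSION4_FLAG_PERSIST = 0x00000001
CREATE_SESSION4_FLAG_CONN_BACK_CHAN = 0x00000002
CREATE_SESSION4_FLAG_CONN_RDMA = 0x00000004


def sc_flags_result_ok(result_flags: int, arg_flags: int,
                       prev_sc_flags: Optional[int],
                       csr_flags: int) -> bool:
    """Return True if every bit of result_flags obeys the SESSION_CTL rule.

    Each result bit must equal either the bit the client requested in
    arg_flags, or the bit in effect before this SESSION_CTL: the previous
    SESSION_CTL result if there was one, otherwise CREATE_SESSION's csr_flags.
    """
    in_effect = prev_sc_flags if prev_sc_flags is not None else csr_flags
    # Bits that differ from BOTH the requested value and the in-effect value.
    violations = (result_flags ^ arg_flags) & (result_flags ^ in_effect)
    return violations == 0
```

   For example, if the client requests CREATE_SESSION4_FLAG_PERSIST on a
   session created without it, the server may return that bit either
   set or clear, but it may not set CREATE_SESSION4_FLAG_CONN_RDMA
   unless the client asked for it or it was already in effect.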

   If SESSION_CTL requests a smaller slot table on the fore channel,
   and there are operations in progress on other slots of the fore
   channel, the server MUST do one of the following: (1) return
   NFS4ERR_FORE_CHAN_BUSY (a new error); (2) allow SESSION_CTL to
   succeed, wait for the in-progress operations to complete, and reply
   to those operations before replying to SESSION_CTL; or (3) if all
   the in-progress operations allow one or both of the errors
   NFS4ERR_DELAY and NFS4ERR_SERVERFAULT, allow SESSION_CTL to succeed,
   abort the in-progress operations, reply to those operations with
   either NFS4ERR_DELAY or NFS4ERR_SERVERFAULT, and then reply to
   SESSION_CTL.  Because a server is free to return
   NFS4ERR_FORE_CHAN_BUSY, it is strongly RECOMMENDED that a client have
   no other requests in progress when it sends a SESSION_CTL operation.

   If SESSION_CTL requests a smaller slot table on the backchannel, and
   there are operations in progress on other slots of the backchannel,
   the server MUST do one of the following: (1) return
   NFS4ERR_BACK_CHAN_BUSY; (2) allow SESSION_CTL to succeed, and wait
   for replies to the in-progress backchannel operations before
   replying to SESSION_CTL; or (3) if all the in-progress operations
   allow one or both of the errors NFS4ERR_DELAY and
   NFS4ERR_SERVERFAULT, allow SESSION_CTL to succeed, abort the
   in-progress operations, reply to those operations with either
   NFS4ERR_DELAY or NFS4ERR_SERVERFAULT, and then reply to SESSION_CTL.
   Before a client sends a SESSION_CTL operation, it SHOULD reply to all
   in-progress backchannel requests of the same session as the
   SESSION_CTL operation.

8.  Modification to Operation 42: EXCHANGE_ID - Instantiate Client ID

8.1.  ARGUMENT

   /* new */
   const EXCHGID4_FLAG_SUPP_FENCE_OPS = 0x00000004;

8.2.  RESULT

   Unchanged.

8.3.  MOTIVATION

   Enterprise applications require guarantees that an operation has
   either aborted or completed.  NFSv4.1 provides this guarantee as long
   as the session is alive: the client simply sends a SEQUENCE operation
   on the same slot with a new sequence number, and the successful
   return of SEQUENCE indicates that the previous operation has
   completed.  However, if the session is lost, there is no way to know
   when any in-progress operations have aborted or completed.  In
   hindsight, the NFSv4.1 specification should have mandated that
   DESTROY_SESSION abort or complete all outstanding operations.

8.4.  DESCRIPTION

   A client SHOULD request the EXCHGID4_FLAG_SUPP_FENCE_OPS capability
   when it sends an EXCHANGE_ID operation.  The server SHOULD set this
   capability in the EXCHANGE_ID reply whether or not the client
   requests it.  If the client ID is created with this capability, then
   the following will occur:

   o  The server will not reply to DESTROY_SESSION until all operations
      in progress are completed or aborted.

   o  The server will not reply to a subsequent EXCHANGE_ID invoked on
      the same Client Owner with a new verifier until all operations in
      progress on the client ID's sessions are completed or aborted.

   o  When DESTROY_CLIENTID is invoked, any sessions (both idle and
      non-idle), opens, locks, delegations, layouts, and/or wants
      (Section 18.49 of [I-D.ietf-nfsv4-minorversion1]) associated with
      the client ID are removed.  Pending operations will be completed
      or aborted before the sessions, opens, locks, delegations,
      layouts, and/or wants are deleted.

   o  The NFS server SHOULD support client ID trunking, and if it does
      and the EXCHGID4_FLAG_SUPP_FENCE_OPS capability is enabled, then a
      session ID created on one node of the storage cluster MUST be
      destroyable via DESTROY_SESSION sent to any node of the cluster.
      In addition, DESTROY_CLIENTID and an EXCHANGE_ID with a new
      verifier affect all sessions, regardless of which node the
      sessions were created on.

9.  Acknowledgements

   Contributors to this document include: Sumanta Chatterjee, Steve
   Daniel, Mike Eisler, Jeff Kimmel, Akshay Shah, Margaret Susairaj, and
   Lynne Thieme.  Reviewers of this document include: Dave Noveck.

10.  IANA Considerations

   The IO_ADVISE4 flags are considered extensible.  Values 32 through 63
   are reserved for private use.  All other values are Standards Track.

11.  Security Considerations

   None.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

12.2.  Informative References

   [I-D.eisler-nfsv4-pnfs-dedupe]
              Eisler, M., "Storage De-Duplication Awareness in NFS",
              draft-eisler-nfsv4-pnfs-dedupe-00 (work in progress),
              October 2008.

   [I-D.eisler-nfsv4-pnfs-metastripe]
              Eisler, M., "Metadata Striping for pNFS",
              draft-eisler-nfsv4-pnfs-metastripe-01 (work in progress),
              October 2008.

   [I-D.faibish-nfsv4-pnfs-access-permissions-check]
              Faibish, S., Black, D., Eisler, M., and J. Glasgow, "pNFS
              Access Permissions Check",
              draft-faibish-nfsv4-pnfs-access-permissions-check-03 (work
              in progress), July 2010.

   [I-D.ietf-nfsv4-minorversion1]
              Shepler, S., Eisler, M., and D. Noveck, "NFS Version 4
              Minor Version 1", draft-ietf-nfsv4-minorversion1-29 (work
              in progress), December 2008.

   [I-D.lentini-nfsv4-server-side-copy]
              Lentini, J., Eisler, M., Kenchammana, D., Madan, A., and
              R. Iyer, "NFS Server-side Copy",
              draft-lentini-nfsv4-server-side-copy-05 (work in
              progress), July 2010.

   [I-D.myklebust-nfsv4-pnfs-backend]
              Myklebust, T., "Network File System (NFS) version 4 pNFS
              back end protocol extensions",
              draft-myklebust-nfsv4-pnfs-backend-00 (work in progress),
              July 2009.

   [I-D.quigley-nfsv4-sec-label]
              Quigley, D. and J. Morris, "MAC Security Label Support for
              NFSv4", draft-quigley-nfsv4-sec-label-01 (work in
              progress), February 2010.

   [RFC3530]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
              Beame, C., Eisler, M., and D. Noveck, "Network File System
              (NFS) version 4 Protocol", RFC 3530, April 2003.

Authors' Addresses

   Michael Eisler (editor)
   NetApp
   5765 Chase Point Circle
   Colorado Springs, CO  80919
   US

   Phone: +1 719 599 9026
   Email: mike@eisler.com


   Margaret Susairaj (editor)
   Oracle
   7806 Garden Bend
   Sugar Land, TX  77479
   US

   Phone: +1 408 431 7405
   Email: Margaret.Susairaj@oracle.com