nfsv4                                                          D. Noveck
Internet-Draft                                                       EMC
Expires: September 30, 2011                                   P. Erasani
                                                      L. Bairavasundaram
                                                                  NetApp
                                                                  P. Dai
                                                          C. Karamonolis
                                                                  VMware
                                                          March 29, 2011


              Storage Control Extensions for NFS Version 4
                 draft-dnoveck-nfsv4-storage-control-01

Abstract

   Developments in storage systems have made it important for
   applications to have control over the characteristics of the storage
   that will be used for their particular files. The development of
   pNFS has added to the usefulness of such control mechanisms as it has
   created the opportunity for the hierarchical organization of file
   names to be separated from the control of storage characteristics for
   individual files, including the assignment to storage locations to
   reflect the performance or other needs of those specific files. This
   document proposes extensions to NFS version 4 to allow storage
   requirements to be communicated to the NFS version 4 server.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 30, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Storage Control Issues
   2.  Storage Choice and API Definition
   3.  Modes of Storage Choice
   4.  Assuring Extensibility
     4.1.  Requirements for Extensibility
     4.2.  XDR Encoding for Extensibility
   5.  Storage Control
     5.1.  Property Types
       5.1.1.  Informative Properties
       5.1.2.  Enforceable Properties
     5.2.  Base Property Specifications
       5.2.1.  Storage Size
       5.2.2.  Storage Use Duration
       5.2.3.  Storage Device Failure Limit
       5.2.4.  Storage System Failure Limit
       5.2.5.  Storage System Failure RPO
       5.2.6.  Storage System Failure RTO Properties
   6.  Uses of the Attribute storage_ctl
     6.1.  Use of storage_ctl when creating a file
     6.2.  Use of storage_ctl in SETATTR
     6.3.  Use of storage_ctl in GETATTR/READDIR
     6.4.  Use of storage_ctl in VERIFY/NVERIFY
   7.  The FETCH_SCNOTE Operation
   8.  Attribute Extension
     8.1.  Experimental and Other Non-standardized Extensions
     8.2.  Standardized Extensions
     8.3.  The storage_ext attribute
   9.  Summary
     9.1.  Errors
     9.2.  Semantic constraints
   10. Possible Future Work
   11. Acknowledgments
   Authors' Addresses

1.  Storage Control Issues

   Storage to which files may be assigned can differ in a number of
   ways, raising the issue of how to control the choice of storage for
   specific files. The range of such choices is not static but can be
   expected to increase as flash memory becomes an option whose use
   needs to be controlled, or as various choices of types of local
   caching need to be made. Although all files may well be helped by
   such approaches, the degree to which they will be helped will vary
   with the type of file and the typical application reference pattern
   for it. In addition, the value of improved access will differ, with
   quick access to certain files being of much greater value, thereby
   justifying the allocation of more expensive storage resources to such
   files.

   The traditional way that user decisions regarding assignment of
   storage resources have been effected is by assigning specific file
   systems to specific disks or sets of disks. Files placed in that
   file system thereby get the storage characteristics assigned to that
   file system. Where file systems contain storage of various types,
   various heuristics are used to assign files, or pieces thereof, to
   storage of various types, generally without any external input about
   application needs.

   The creation of pNFS modifies this pattern in that data and metadata
   are separated. Where pNFS is used, assigning a file to a specific
   file system now controls only where the metadata is located.
   Different files may have their data assigned to different sorts of
   storage, potentially located on different servers.
   This gives rise to the need for a means by which the storage choice
   for a particular file may be made.

   NFS version 4.1 contains a layouthint attribute but this does not
   really address the problem. The focus of the layouthint attribute is
   on the striping configuration, but there is a need to control storage
   characteristics other than this. This is the case even when there is
   only a single stripe (that is, no striping). Even though this is not
   "parallel NFS," using pNFS in this way to provide a separation of
   data and metadata, with the ability to choose locations for data
   based on its characteristics, subject to later change in a
   user-transparent manner, is very powerful, particularly if the
   storage location is subject to intelligent management.

   Additionally, more sophisticated storage management arrangements make
   it desirable to have a way to specify details for storage handling,
   even when pNFS is not used. When a file system contains different
   sorts of storage, input regarding desired or necessary storage
   characteristics can be used to make storage assignment choices more
   in line with application needs.

   As a result, the ability to specify desired storage characteristics
   can provide benefits, both when pNFS is used and when it is not,
   although pNFS has the most immediate need for a means by which to
   control storage selection.

2.  Storage Choice and API Definition

   Note that existing APIs may not provide a means by which some of the
   storage characteristics described herein may be communicated to NFSv4
   in-kernel clients and, from there, to NFSv4 servers. Nevertheless,
   definition of a means by which these storage characteristics may be
   communicated to the NFSv4 server is still useful for a number of
   reasons:

      Embedded clients for particular applications may specify this
      information even without any API definition.

      Client implementations may use various less-than-perfect ways of
      specifying storage characteristics, assigning storage
      characteristics based on file ownership or other nominally
      unrelated characteristics that correlate well with customer
      intentions.

   Note that if the absence of a standard kernel API were sufficient to
   stop this work, it would also probably be the case that the absence
   of a means to communicate the information to remote servers might
   make the definition of that API not worth the effort. By defining
   some storage characteristics and a general means of communicating
   them and others (via an extension mechanism) we allow for either:

      The later development of APIs to specify these storage
      characteristics.

      The development of APIs to specify different sets of storage
      characteristics that can then be easily assimilated to this
      mechanism as extensions.

3.  Modes of Storage Choice

   There are a number of different ways in which storage choices may be
   indicated:

   o  The specific file system location(s) might be specified.

   o  Specific types of storage might be specified, with the selection
      among such choices as SSD, SATA, or Fibre Channel SAN drives being
      made by the client and effected by the MDS.

   o  Desired characteristics of storage might be specified, including
      speed (latency and/or throughput), the amount of storage that will
      be needed, and safety (RAID level). Available storage would be
      selected to meet the required characteristics and would be subject
      to active management as the environment changes.

   These different modes of storage choice are all useful in different
   environments. Specification of a specific file system imposes the
   least need for a storage management infrastructure but it requires
   user/application knowledge of the storage environment.

   The other modes imply a sequence of progressively greater
   infrastructure requirements to map specifications to specific storage
   systems and a correspondingly smaller need for user/application
   knowledge of the storage environment. However, such modes of
   operation are very different from existing storage management
   paradigms and the precise ways in which applications and storage
   might communicate are not fully understood.

4.  Assuring Extensibility

4.1.  Requirements for Extensibility

   As the examples of different modes of storage choice suggest, there
   are potentially a large number of specific items that might be
   specified in order to effect storage choice. Further, in many cases,
   future developments in the area of storage can be expected to extend
   and otherwise modify the characteristics which might be specified.

   Extensibility is important because many ongoing developments,
   including those in the areas of storage hardware and file systems,
   can be expected to create corresponding needs to specify relevant
   storage characteristics.

   For example, local caching, including writeback caching using flash,
   creates the opportunity for greatly improved performance, at the risk
   of greater complexity in dealing with network failures. This raises
   the issue of allowing the user to make the choice of whether this
   greater performance is worth the risks and difficulties.

   Similarly, the development of distributed file systems raises many
   choices where performance will need to be balanced against various
   forms of safety issues, with specific choices reflecting the specific
   needs of applications dealing with the storage.

   These situations, and others that we may not be able to predict,
   require that any attribute scheme in this area allow the
   specification of multiple storage characteristics, with the ability
   to easily extend the specification so that it incorporates new
   characteristics to govern storage selection. Further, the need for
   actual use testing before incorporation in an IETF standard imposes
   new requirements as far as organizing specification of the
   characteristics.

   Having "working code" to effect characteristic selection is not
   sufficient to demonstrate usefulness. The working code may be
   trivial, while finding out whether a set of characteristics makes
   sense for applications to use, or requires extension or modification
   before assuming its final form, is not trivial. This may require
   significant trial use among a large set of users running different
   applications before the details are ready to be standardized.

   These factors increase the need for flexibility, including
   non-private use of characteristics not yet standardized.
   Accommodating this need for flexibility has the potential for unduly
   interfering with interoperability and the design of this feature will
   need to avoid that.

4.2.  XDR Encoding for Extensibility

   While each storage property could conceivably be made its own
   attribute, the burden that this would place on the IETF process would
   be immense. Co-ordination would be necessary (and confusion almost
   certain) as individual experimental properties needed temporary
   attribute numbers and then had to shift to other, more permanent
   numbers. Further, and even more of an issue, each storage property
   definition would seem to require a minor version, which seems too
   heavyweight. This would slow the process down more than is warranted
   for something that could be its own standards-track RFC.

   In order to address these issues, individual properties will be
   treated as sub-attributes within a single storage_ctl attribute. To
   simplify assignment of sub-attribute numbers, mainly in support of
   experimental use, multiple sub-attribute spaces will be supported, to
   allow independent development of features each involving multiple
   storage properties. Once such a feature is standardized, the
   definition of the specific sub-attribute space could simply be made
   the subject of a standards-track RFC, with no change for those using
   it.

   typedef uint32_t  spacenum_sc;    /* Individual property space id. */
   typedef uint32_t  bitmap_sc<*>;   /* Bit map for the presence or
                                        absence of individual properties
                                        using bit numbers assigned for
                                        the space. Like bitmap4. */
   typedef opaque    proplist_sc<*>; /* Data associated with each of the
                                        properties in the bitmap_sc.
                                        Like attrlist4. */

   struct section_sc {
       spacenum_sc  SpaceSection;    /* Section number. */
       bitmap_sc    WhichProperties; /* Bit map of properties present. */
       proplist_sc  PropertyData;    /* Data for each of the properties
                                        specified in this section. */
   };

   typedef section_sc fattr4_storage_ctl<*>;
                                     /* The attribute may have one or
                                        more property sections. */

   This form of property encoding allows the property set to be extended
   without requiring a new minor version. Also, by allowing property
   space numbers to be assigned, property sets can be developed
   independently, and converted to a standard state without undue
   interruption to those using the earlier form.

5.  Storage Control

   Storage, along with compute, memory, and network, is an integral part
   of an application's resources. Much like the other types of
   resources consumed by an application, storage needs can be described
   using a set of properties.
   These properties may serve to describe the characteristics of the
   storage, the intended usage both temporal and spatial, quality of
   service expectations, physical layout over available storage media,
   data access locations, and geographical distribution, to name just a
   few. The collection of such properties together defines the control
   an application ultimately wants to have over storage; conversely, it
   enables the storage system to more effectively and dynamically meet
   the application's needs as specifically expressed, rather than as
   inferred from fallible heuristics. Henceforth, we will use the term
   control to refer to the property collection.

   It is not difficult to conceive of various storage properties. In
   fact, there are many of them, due to the diversity of applications
   and the corresponding workload characteristics, the ever increasing
   storage value-adds in the form of data services, and the fast
   changing business requirements. It is an impossible task to capture
   all of them here. Rather, the goal of this document is to define a
   framework in which new properties can be easily added and new
   semantics of the properties can be introduced as necessary without
   disruption. It is desirable that such properties be capable of being
   used in more limited situations and refined as necessary.

5.1.  Property Types

   There may be numerous storage properties, as mentioned above. We
   need, however, to distinguish at least two types, namely, informative
   properties and enforceable properties. There may very well be other
   systems or criteria for the classification of storage properties, and
   extensibility shall apply in this case just as it does to adding new
   storage properties. However, there is a need to explicitly capture
   the distinction between informative and enforceable properties in the
   data model, due to its impact on the storage protocol semantics.

5.1.1.  Informative Properties

   An informative property, as the name suggests, provides some
   descriptive information about the storage in question. Such
   information is furnished in a single direction, from the application
   to the storage system, with absolutely no "contractual" implications.
   The storage system may use the information captured in such a
   property for storage optimization, but it is not obligated to do so.

   More importantly, the application is not offered any transparency as
   to how the storage system may utilize this information. As such, the
   information flow is strictly one-way without the prospect of any
   feedback. Examples of informative properties are the access pattern
   of the storage in use, the expected capacity need, and the estimated
   growth rate.

5.1.2.  Enforceable Properties

   In contrast, an enforceable property may have embedded in it varying
   degrees of binding effect. That is, the application specifying the
   property expects that the storage system will not only act upon it
   but also convey the status of that action back in some way. Unlike
   the case of an informative property, the information flow in this
   case is truly bi-directional, with the backward direction used for
   monitoring property status, including information on whether a
   property has been satisfied or is in the process of being satisfied.
   In that sense, an enforceable property has a resemblance to an
   agreement, where one might monitor the performance of the other
   party.

   Applications seeking tighter control of the storage may resort to
   enforceable properties. Examples of enforceable properties could
   include the type and speed of storage but could also include the
   availability, reliability, and average throughput and latency.

5.1.2.1.  Enforcement Level

   To allow varying degrees of control, an enforcement level may be
   associated with an enforceable property.
   There are two levels of control possible, namely, advisory and
   mandatory. Regardless of the level, the storage system should strive
   to fulfill an enforceable property. The difference lies in the
   treatment of an inability to do so. With an advisory enforcement
   level, the storage system shall continue to carry out the operation
   even if the property could not be fulfilled, whereas with mandatory,
   the storage system shall fail the operation without making any
   modification. In any case, the failure to fulfill an enforceable
   property can be communicated to the application.

5.1.2.2.  Compliance Status

   While control may suffice to describe the ultimate storage
   requirements, i.e., the intended behavior once it has been fully
   implemented, it does not by itself capture the dynamic aspects of the
   implementation process. This is encompassed by the concept of
   "compliance", which indicates the extent to which requested storage
   properties have or have not been provided or whether they are still
   in the process of being provided. Note that the word "compliance" as
   used here has no connection with this word as used to describe issues
   of conformance with a set of legal requirements for record-keeping,
   among other matters.

   Control implementation can be a fairly heavyweight process by nature
   due to the data intensity involved. This may be true whether it is
   during the initial provisioning of storage, the subsequent change
   management, or the remediation of compliance violations. The
   data-intensive nature of the control implementation process implies
   that the transition from non-compliance to compliance will not be
   instantaneous in the general case. In other words, the
   implementation process remains asynchronous relative to the operation
   that triggers it.

   The asynchronous nature of the control implementation process may be
   captured by the compliance status.
   The compliance status may have three different values, namely,
   Current, Complying, and Failed. The value Current represents a fully
   compliant state. The value Complying refers to a transient state in
   which the transition to Current is in progress.

   The value Failed represents an indefinite state of non-compliance.
   In this last case, the storage system may have made the determination
   that it is unable to fulfill some or all of the storage properties
   given the physical resources available. The application will
   continue to work without them, but its performance may not be what is
   desired.

   The compliance status describes the state of the control fulfillment
   as it pertains to each property. It applies to an enforceable
   property only. Its presence is not a syntactic requirement as
   defined by the XDR specification. Depending on the operational
   context in which the enforceable property is specified, specification
   of compliance status may be invalid, required, or optional, with the
   specification of more than one such status value possible in some
   cases.

5.1.2.3.  XDR Encoding for Enforceable Properties

   Enforceable properties contain a word which is of type enforce_sc and
   allows the enforcement level and compliance status to be specified.
   To allow the greatest flexibility, all enforcement levels and
   compliance status values are specified as bit values, allowing sets
   of enforcement levels and compliance statuses to be specified, as
   appropriate.

   typedef uint32_t enforce_sc;

   const enforce_sc ENFORCE_MANDATORY = 0x1;
   const enforce_sc ENFORCE_ADVISORY  = 0x2;
   const enforce_sc ENFORCE_CURRENT   = 0x10;
   const enforce_sc ENFORCE_COMPLYING = 0x20;
   const enforce_sc ENFORCE_FAILED    = 0x40;

   For most purposes, enforcement words should have a single enforcement
   level, either ENFORCE_MANDATORY or ENFORCE_ADVISORY.
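
   As an informal illustration, the level bits and the status bits of an
   enforcement word can be separated with simple masks. The C sketch
   below is not part of the proposed protocol: the EW_ names are local
   stand-ins that mirror the bit values in the XDR above, and the helper
   functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the enforce_sc bit values given above. */
#define EW_MANDATORY 0x1u
#define EW_ADVISORY  0x2u
#define EW_STATUS_A  0x10u  /* fully compliant state */
#define EW_STATUS_B  0x20u  /* transition in progress */
#define EW_STATUS_C  0x40u  /* indefinite non-compliance */

/* Hypothetical helper masks, not defined by this document. */
#define EW_LEVEL_MASK  (EW_MANDATORY | EW_ADVISORY)
#define EW_STATUS_MASK (EW_STATUS_A | EW_STATUS_B | EW_STATUS_C)

/* Count the bits of `word` that fall inside `mask`. */
unsigned ew_count_bits(uint32_t word, uint32_t mask)
{
    unsigned n = 0;
    for (uint32_t v = word & mask; v != 0; v &= v - 1)
        n++;                /* each step clears the lowest set bit */
    return n;
}

/* True when exactly one enforcement level bit is present. */
bool ew_has_single_level(uint32_t word)
{
    return ew_count_bits(word, EW_LEVEL_MASK) == 1;
}

/* Number of compliance status bits present in the word. */
unsigned ew_status_count(uint32_t word)
{
    return ew_count_bits(word, EW_STATUS_MASK);
}
```

   A receiver could use ew_has_single_level() to detect words that carry
   both level bits or neither, and ew_status_count() to check how many
   compliance status bits accompany the level.
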
   Any enforcement word containing both bits will result in
   NFS4ERR_SCTL_BADENF being returned. Specification of an enforcement
   word containing neither will generally result in NFS4ERR_SCTL_BADENF
   being returned. However, such a word may be specified when doing a
   SETATTR that specifies a reserved empty parameter value to remove a
   property specification. Also, it may be specified when doing a
   VERIFY or NVERIFY to specify a property without a defined enforcement
   level.

   When specifying a storage property as part of an OPEN, CREATE, or
   SETATTR, no compliance status bits should be specified. If they are,
   the error NFS4ERR_SCTL_BADENF is returned. For values returned by
   the server in response to GETATTR, enforcement words containing
   exactly one compliance status bit will be returned. When using
   storage properties as part of VERIFY or NVERIFY, enforcement words
   containing no compliance status bits or any subset of the valid
   compliance status bits may be specified.

5.2.  Base Property Specifications

   The goal for initial inclusion in an NFS version 4 minor version is
   to define a small set of property specifications that are generally
   useful and do not require a large management infrastructure to
   implement. The following property specifications fit that
   description.

   const spacenum_sc SCNUM_BASE = 1;     /* Base property space id for
                                            all properties in this
                                            group. */

   const uint32_t SCBASE_SIZE = 0;       /* Informative property for
                                            size. */
   const uint32_t SCBASE_DURATION = 1;   /* Informative property for
                                            duration. */
   const uint32_t SCBASE_DEVFAIL = 2;    /* Enforceable property for
                                            a device failure limit. */
   const uint32_t SCBASE_SYSFAIL = 3;    /* Enforceable property for
                                            a system failure limit. */
   const uint32_t SCBASE_FAIL_RPO = 4;   /* Enforceable property for
                                            a recovery point objective
                                            in the event of failure. */
   const uint32_t SCBASE_SFAIL_RTO = 5;  /* Enforceable property for
                                            a recovery time objective
                                            in the event of system
                                            failure. */
   const uint32_t SCBASE_DLOSS_RTO = 6;  /* Enforceable property for
                                            a recovery time objective
                                            in the event of data loss. */
   const uint32_t SCBASE_DISASTER_RTO = 7;
                                         /* Enforceable property for a
                                            recovery time objective in
                                            the event of disaster. */

5.2.1.  Storage Size

   The storage size is an informative property that allows the
   specification of the amount of storage expected to be needed. It may
   be used by the server to check whether appropriate space is available
   and to reserve space. It is specified as a 64-bit unsigned value
   giving a quantity of storage expressed in bytes.

   typedef uint64_t propbase_size;

   This value may be different from the expected file size. Areas not
   allocated, because of holes for example, are not included. This
   amount of storage may not be required immediately if the file starts
   small and grows. Any derating of specified values is purely a matter
   of server implementation choice and will typically reflect the
   ability to move data to respond to storage overcommitment.

   A value of zero is invalid and would result in the error
   NFS4ERR_SCTL_BADPARM when used in an OPEN or CREATE. When used in
   SETATTR, it causes deletion of a previous storage size specification.

5.2.2.  Storage Use Duration

   The storage use duration is an informative property that allows the
   specification of the amount of time that the storage is expected to
   be needed. It may be used in assigning files to storage so that
   space conflicts are reduced. It is specified as a 64-bit unsigned
   value giving a duration in milliseconds.

   typedef uint64_t propbase_duration;

   This allows times from 1 millisecond up to approximately 500 million
   years to be specified.
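
   As an informal illustration, the two informative properties defined
   so far can be packed into a single section_sc. The C sketch below is
   hypothetical and makes assumptions that this document does not spell
   out: that bit n of the property bitmap is carried as 1 << n in the
   first bitmap word (as for bitmap4), that property values appear in
   PropertyData in ascending bit order (as for attrlist4), and that the
   usual big-endian XDR encoding applies. The function names are
   illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Values taken from the base property definitions in Section 5.2. */
#define SCNUM_BASE      1u
#define SCBASE_SIZE     0u
#define SCBASE_DURATION 1u

/* Write a 32-bit value in big-endian (XDR) order; return new offset. */
size_t sc_put_u32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off]     = (uint8_t)(v >> 24);
    buf[off + 1] = (uint8_t)(v >> 16);
    buf[off + 2] = (uint8_t)(v >> 8);
    buf[off + 3] = (uint8_t)v;
    return off + 4;
}

/* Write a 64-bit value as two big-endian 32-bit halves. */
size_t sc_put_u64(uint8_t *buf, size_t off, uint64_t v)
{
    off = sc_put_u32(buf, off, (uint32_t)(v >> 32));
    return sc_put_u32(buf, off, (uint32_t)v);
}

/*
 * Encode one section_sc carrying storage size and use duration.
 * `buf` must have room for 32 bytes; returns the bytes written.
 */
size_t sc_encode_base_section(uint8_t *buf, uint64_t size_bytes,
                              uint64_t duration_ms)
{
    size_t off = 0;
    off = sc_put_u32(buf, off, SCNUM_BASE);  /* SpaceSection */
    off = sc_put_u32(buf, off, 1);           /* bitmap word count */
    off = sc_put_u32(buf, off,
                     (1u << SCBASE_SIZE) | (1u << SCBASE_DURATION));
    off = sc_put_u32(buf, off, 16);          /* PropertyData length */
    off = sc_put_u64(buf, off, size_bytes);  /* propbase_size */
    off = sc_put_u64(buf, off, duration_ms); /* propbase_duration */
    return off;
}
```

   Under these assumptions the section occupies 32 bytes: four 32-bit
   words of framing followed by the two 64-bit property values.
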
   A value of zero is invalid and would result in the error
   NFS4ERR_SCTL_BADPARM when used in an OPEN or CREATE. When used in
   SETATTR, it causes deletion of a previous storage duration
   specification.

5.2.3.  Storage Device Failure Limit

   The storage device failure limit is an enforceable property that
   allows the specification of a number of disk drives (or other
   devices) that can fail simultaneously with no data loss and with
   zero recovery time. It must be the case that any set of devices of
   the specified size can fail without data loss and with zero recovery
   time.

   Even though there is no recovery time, there may be a significant
   recovery period of modestly reduced performance while adaptation to
   the failure takes place. Until that adaptation completes, additional
   device failures will be considered simultaneous.

   The limit is specified as a 32-bit unsigned value giving the count of
   simultaneous device failures that must not result in data loss to
   clients accessing the file. Storage is assigned which either matches
   this specification or provides a greater value. When pNFS is
   involved, the specification applies to storage for the MDS and each
   DS.

   typedef uint32_t prop_dev_fail_lim;

   struct propbase_device_failure_limit {
       enforce_sc         DflEnforce;
       prop_dev_fail_lim  DflLimit;
   };

   This allows values from zero to approximately 4 billion to be
   specified. A value of zero is valid and specifies that data loss is
   tolerable in the event of a single device failure (e.g., RAID-0).

5.2.4.  Storage System Failure Limit

   The storage system failure limit is an enforceable property that
   allows the specification of the number of storage systems that must
   be able to fail simultaneously without complete data loss. Storage
   is assigned which either matches this specification or provides a
   greater value.
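
   The matching rule for failure limits, combined with the enforcement
   levels of Section 5.1.2.1, can be sketched in C as follows. This is
   an informal illustration only: the sc_outcome type and the
   sc_check_failure_limit() function are hypothetical, and the two level
   constants simply repeat the values given in Section 5.1.2.3.

```c
#include <stdint.h>

/* Enforcement level bits, repeated from Section 5.1.2.3. */
#define ENFORCE_MANDATORY 0x1u
#define ENFORCE_ADVISORY  0x2u

/* Hypothetical result of trying to honor a failure limit. */
typedef enum {
    SC_LIMIT_MET,       /* storage matches or exceeds the request */
    SC_LIMIT_UNMET_OK,  /* advisory: operation continues anyway */
    SC_LIMIT_REJECT     /* mandatory: operation fails unmodified */
} sc_outcome;

/*
 * Compare the requested failure limit with what candidate storage
 * provides.  `enforce` is the enforcement word from the property.
 */
sc_outcome sc_check_failure_limit(uint32_t enforce,
                                  uint32_t requested,
                                  uint32_t provided)
{
    if (provided >= requested)
        return SC_LIMIT_MET;        /* equal or greater value */
    if (enforce & ENFORCE_MANDATORY)
        return SC_LIMIT_REJECT;     /* fail without modification */
    return SC_LIMIT_UNMET_OK;       /* advisory: carry on */
}
```

   For example, storage that survives two simultaneous device failures,
   such as RAID-6, satisfies a requested device failure limit of 1 or 2,
   while RAID-0 storage, which survives none, satisfies only a limit of
   zero.
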
   When pNFS is involved, the specification applies to storage for the
   MDS and DSs as a unit.

   typedef uint32_t        prop_sys_fail_lim;

   struct propbase_system_failure_limit {
           enforce_sc              SflEnforce;
           prop_sys_fail_lim       SflLimit;
   };

   This allows values from zero to approximately four billion to be
   specified.  A value of zero is valid and specifies that data loss
   in the event of a single storage system failure is tolerable.

5.2.5.  Storage System Failure RPO

   The recovery point objective (RPO) is the age of files that must be
   recovered from backup storage for normal operations to resume if a
   computer, system, device, or network failure results in data loss.
   The RPO is expressed backward in time (that is, into the past) from
   the instant at which the failure occurs, and is specified in
   seconds.  It is an important consideration in disaster recovery
   planning.

   typedef uint64_t        prop_sys_fail_RPO;

   struct propbase_system_failure_RPO {
           enforce_sc              SfrpoEnforce;
           prop_sys_fail_RPO       SfrpoTime;
   };

   This allows values from zero seconds to a value far beyond the age
   of the universe to be specified.  A value of zero is valid and
   indicates that a real-time backup, one that reflects changes
   immediately as they are made, is required.

5.2.6.  Storage System Failure RTO Properties

   Recovery time objective (RTO) properties specify the maximum
   tolerable length of time that assigned storage may be unavailable
   in the event of various classes of failures.  There are three
   associated properties, each of which specifies this value for a
   particular class of failure:

      The system failure RTO property, with the property id
      SCBASE_SFAIL_RTO, defines the recovery time objective in the
      event of failures that do not involve data loss or data
      corruption.

      The data loss RTO property, with the property id
      SCBASE_DLOSS_RTO, defines the recovery time objective in the
      event of failures that do not involve the occurrence of a
      disaster, defined as a major environmental event such as a
      hurricane, earthquake, or flood.

      The disaster RTO property, with the property id
      SCBASE_DISASTER_RTO, defines the recovery time objective in the
      event of any failure, including disasters.

   The actual RTO is a function of the extent to which the
   interruption disrupts normal operations and the provisions made to
   ameliorate this situation.  The desired RTO is a function of the
   urgency to re-establish operations and the consequences of failure
   to do so promptly.  It is an important consideration in recovery
   planning.

   typedef uint64_t        prop_sys_fail_RTO;

   struct propbase_system_failure_RTO {
           enforce_sc              SfrtoEnforce;
           prop_sys_fail_RTO       SfrtoTime;
   };

   RTO values for all of these properties are specified as a 64-bit
   integer giving a number of microseconds.  Although sub-second RTO
   values may be difficult to achieve, the specification allows small
   values which might be useful in the future.  The maximum value is
   approximately five hundred thousand years.

6.  Uses of the Attribute storage_ctl

   There are four occasions in which the storage_ctl attribute is
   referred to as part of an fattr4 when the storage_ctl mask is
   present.

   o  As an attribute specified when creating a file or similar object
      by means of an OPEN or CREATE operation, in order to specify the
      storage properties that control the locations on which the data
      is to be put and other associated properties.

   o  As an attribute set in a SETATTR operation to change the
      requested location properties.
      Servers may or may not have the ability to change locations on
      request, but the operation structure will indicate whether or
      not the server has this ability when it is requested.

   o  As an attribute read in a GETATTR or READDIR operation to
      determine the currently requested storage properties and the
      degree to which they are currently being complied with.

   o  As an attribute specified in VERIFY or NVERIFY to test for
      current location property compliance status.

   In addition to the above, a fattr4_storage_ctl of the same
   structure as the storage_ctl attribute (although not within an
   fattr4) also appears within the response data in the following
   situations:

      For the OPEN, CREATE, and SETATTR operations, when the error
      returned is NFS4ERR_SCTL_FAIL.  (See Section 6.1, Use of
      storage_ctl when creating a file, and Section 6.2, Use of
      storage_ctl in SETATTR, for details.)

      For the response to the FETCH_SCNOTE operation, when there is a
      pending storage control note to be reported.

   For most purposes, a fattr4_storage_ctl which appears in OPEN,
   CREATE, and SETATTR requests is handled the same way in each, and a
   fattr4_storage_ctl which appears in the responses for OPEN, CREATE,
   and SETATTR is handled similarly, while the VERIFY and NVERIFY
   requests form a third similarity group.

6.1.  Use of storage_ctl when creating a file

   When the storage_ctl attribute is specified when creating a file,
   it helps decide on the location selected for the file data.  If all
   enforceable properties can be immediately satisfied, then the
   operation proceeds normally.

   If an enforceable property specified with the mandatory enforcement
   level cannot be satisfied, then the operation fails with the error
   NFS4ERR_SCTL_FAIL.  In that case, the response contains a
   fattr4_storage_ctl value which consists of all such enforceable
   properties which could not be satisfied.
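
   The mandatory-enforcement rule just described can be sketched as
   follows.  This is a minimal illustration in Python and not part of
   the protocol definition: the names RequestedProperty, Enforcement,
   and check_create, and the "satisfiable" flag standing in for the
   server's placement decision, are all assumptions introduced for
   this example.

```python
from dataclasses import dataclass
from enum import Enum


class Enforcement(Enum):
    # Hypothetical stand-ins for the two enforcement levels.
    ADVISORY = "advisory"
    MANDATORY = "mandatory"


@dataclass(frozen=True)
class RequestedProperty:
    prop_id: int        # e.g. 5 for SCBASE_SFAIL_RTO
    level: Enforcement
    satisfiable: bool   # assumed: the server's verdict for this property


def check_create(props):
    """Return (status, failed) for a create-time storage_ctl.

    failed holds the mandatory properties that cannot be satisfied,
    which is what the NFS4ERR_SCTL_FAIL response would carry.
    """
    failed = [p for p in props
              if p.level is Enforcement.MANDATORY and not p.satisfiable]
    if failed:
        return "NFS4ERR_SCTL_FAIL", failed
    return "NFS4_OK", []
```

   In this sketch an unsatisfiable advisory property does not fail the
   operation; only mandatory properties contribute to the error
   response.
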

   If there is a situation which is not as serious as the failure
   above, but still of note, then information relevant to that
   situation is stored as a pending storage control note, where it can
   be fetched (in the same COMPOUND) by the FETCH_SCNOTE operation.

   The following three classes of items lead to a pending storage
   control note being created:

   o  An enforceable property of the advisory enforcement level which
      could not be satisfied, i.e., its compliance status is indicated
      as failed.

   o  An enforceable property of the advisory enforcement level which
      could not be immediately satisfied, i.e., its compliance status
      is indicated as Complying.

   o  An enforceable property of the mandatory enforcement level which
      could not be immediately satisfied, i.e., its compliance status
      is indicated as Complying.

6.2.  Use of storage_ctl in SETATTR

   A value of the storage_ctl attribute with a structure similar to
   the OPEN case is used to change properties for an existing file.
   Existing properties not changed by the storage_ctl attribute remain
   in effect.

   An enforceable property of the same type and the same enforcement
   level is overridden by a corresponding one in the new attribute.
   To delete such an enforceable property without setting a new one,
   an enforceable property with no parameter values is used.
   Similarly, an informative property will override an existing one of
   the same type, and a specification of that property with no
   parameters is used to delete an existing informative property
   specification without replacing it.

   Failures and notifications are indicated via the error code
   NFS4ERR_SCTL_FAIL and creation of pending storage control notes,
   just as in the case of OPEN.

6.3.
Use of storage_ctl in GETATTR/READDIR

   When the storage_ctl attribute is requested as part of GETATTR or
   READDIR, the fattr4_storage_ctl returned within the file attributes
   reflects the current informative properties together with the
   enforceable properties, each with its current compliance status.

   The order of the elements need not reflect that used when the
   attribute was first set.  When enforceable properties specify a
   range of multiple possible values, the one returned in the
   attribute will reflect the value actually assigned.

6.4.  Use of storage_ctl in VERIFY/NVERIFY

   The storage_ctl attribute presented to VERIFY or NVERIFY is
   interpreted as a series of properties, each of which results in a
   truth value.  When the truth value for all properties presented is
   true, VERIFY succeeds and NVERIFY fails.  Conversely, when not all
   properties have that truth value, VERIFY fails and NVERIFY
   succeeds.

   When informative properties are present, they are compared to the
   value set at OPEN, CREATE, or the last SETATTR.  If no such value
   had been previously set, the result is treated as non-matching.

   Enforceable properties are classified according to three criteria:

   o  Whether they have parameters that indicate specific values
      (With-P) or have the special values defined for that purpose for
      each parameter, which are treated as being without parameters
      (Non-P).  In the Non-P case, the parameter values used are those
      specified in the corresponding property within the file's
      attributes.

   o  Whether they have an enforcement level specified (With-Enf) or
      not (Non-Enf).

   o  Whether they have one or more compliance levels specified
      (With-Comp) or not (Non-Comp).
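
   The three criteria above yield eight possible combinations.  As a
   rough sketch of how a property might be labelled, the following
   Python fragment builds the Comp/Enf/P label used in this section.
   The VerifyProperty type, its field names, and the use of None to
   stand in for the special "no parameters" value are assumptions made
   for illustration only; the actual encoding is given by the XDR
   definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerifyProperty:
    params: Optional[int]       # None: the special "no parameters" value
    enforcement: Optional[str]  # e.g. "mandatory"; None if not specified
    compliance: frozenset       # compliance levels named; empty if none


def classify(prop):
    """Return the Comp/Enf/P label for one enforceable property."""
    comp = "With-Comp" if prop.compliance else "Non-Comp"
    enf = "With-Enf" if prop.enforcement is not None else "Non-Enf"
    par = "With-P" if prop.params is not None else "Non-P"
    return f"{comp}/{enf}/{par}"
```

   A server could use such a label as the key for deciding whether a
   property presented to VERIFY or NVERIFY is acceptable and how its
   truth value is computed.
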

   Given the above classifications, the following sets of
   characteristics for enforceable properties in the context of
   storage_ctl for VERIFY and NVERIFY are treated as errors and should
   cause the return of the error NFS4ERR_SCTL_BAD.

   o  Non-Comp/Non-Enf/Non-P

   o  Non-Comp/Non-Enf/With-P

   o  With-Comp/Non-Enf/Non-P

   o  With-Comp/With-Enf/With-P

   Given the above classifications, the following sets of
   characteristics for enforceable properties in the context of
   storage_ctl for VERIFY and NVERIFY are handled as discussed below.

   Non-Comp/With-Enf/Non-P:  is true iff there exists an enforceable
      property with the associated enforcement level as part of the
      storage_ctl attribute of the file.

   Non-Comp/With-Enf/With-P:  is true iff the enforceable property
      specified is compatible with the corresponding enforceable
      property of the associated enforcement level, i.e., if it is
      possible to satisfy both at the same time, without reference to
      whether both or either actually is satisfied.

   With-Comp/Non-Enf/With-P:  is true iff the enforceable property
      (including a set of property specifications of the same type)
      which appears in the storage_ctl attribute passed to the op is
      consistent with the set of compliance levels (often a single
      level but sometimes two) in the specification.  That is, the
      actual compliance level must be one of the ones that is
      specified.

   With-Comp/With-Enf/Non-P:  is true iff the enforceable property
      designated by this specification (i.e., that of the same
      property type and the same enforcement level) is consistent with
      the set of compliance levels (often a single level but sometimes
      two) in this specification.  That is, the actual compliance
      level must be one of the ones that is specified.

7.  The FETCH_SCNOTE Operation

7.1.  SYNOPSIS

   (cfh) -> note_pres, note_fattr

7.2.
ARGUMENT

   /* CURRENT_FH: */
   void;

7.3.  RESULT

   enum SCFres_type {
           SCFres_ABSENT   = 0,
           SCFres_PRESENT  = 1
   };

   union SCFresok switch (SCFres_type note_pres) {
   case SCFres_PRESENT:
           fattr4_storage_ctl      note_attr;

   case SCFres_ABSENT:
           void;
   };

   union FETCHres switch (nfsstat4 status) {
   case NFS4_OK:
           /* CURRENT_FH: opened file */
           SCFresok        resok4;
   default:
           void;
   };

7.4.  DESCRIPTION

   The FETCH_SCNOTE operation is used to fetch a pending storage
   control note for a specified file handle (the current file handle).
   Note that these notes are stored according to the current file
   handle when the operation which gave rise to them was executed.
   Thus it will be the directory on (most) OPENs, and the specific
   file in the event of SETATTR.

   This operation uses the current filehandle value to identify the
   storage control note being sought.

   The operation returns an indication of whether the note is present
   and, if it is, a fattr4_storage_ctl value which consists of all
   enforceable properties for which there is a lack of adequate
   compliance to be noted.  The use of the enum SCFres_type rather
   than a boolean value allows later extension.

   If the note is present, it ceases to be so once the operation is
   executed.

7.5.  IMPLEMENTATION

   Storage control note items are maintained on a per-COMPOUND-request
   basis and cease to exist when a COMPOUND ends, whether due to
   completion or the occurrence of an error.  This makes it desirable
   to place the FETCH_SCNOTE operation close to, and generally
   immediately after, the operation capable of generating the storage
   control note.

8.  Attribute Extension

8.1.  Experimental and Other Non-standardized Extensions

   In order to support development of extensions to allow control of
   new file system support attributes, extensions may be defined, each
   with its own space id.
   The goal is to allow quick deployment of new features, including
   those that are vendor-specific at the time, with the definitions of
   extensions being publicly available.

   Each such extension set should be registered with IANA.  The
   registration will include:

   o  A short name (a few words) by which the extension will be known.

   o  The name or corporate identity of the owner of the extension.

   o  Data for the first version of the namespace extension, as
      described below.

   IANA will assign a space id by which the extension will be known.

   Successive versions of the properties for a space id should be
   registered by the owner of the extension.  The registration should
   include:

   o  The namespace name and number.

   o  The namespace version number.  The version number is in the form
      of a series of small (< 256) integers.  The length of the series
      will probably be restricted to something between four and six.
      The version numbers will not be checked for order but only that
      they are unique for a given extension.

   o  A document in the form of an Internet-Draft with information on
      the namespace elements paralleling this one.  The document will
      contain definitions and property numbers with the space id for
      all of the properties within the extension.

      Successive versions may add properties but may not delete them.
      Clarifications to the semantics of existing properties may be
      made, but substantive changes in their semantics should not be
      made.

      Existing properties may not be defined as invalid or
      mandatory-to-not-implement, but they may be defined as
      incompatible with some set of new properties.

   The definitional document should be subject to expert review, but
   the purpose of the review is to ensure that the document describes
   the extension adequately.  It should not be rejected simply because
   the expert would do things differently or does not believe the
   specified properties are useful.

8.2.
Standardized Extensions

   Storage properties may be extended via a standards-track document
   in a number of ways.  Such an extension may be part of a new minor
   version, but may also be done independently, in a standards-track
   document other than one for a new NFSv4 minor version.  When the
   extension occurs in a new minor version, the document should make
   clear whether the additional properties are recommended (as is
   normally the case) or mandatory.

   The following forms of extension are all valid options:

      Adding additional properties to an existing standardized
      property set such as PROP_BASE.

      Creating a new property set with its own property set id.

      Converting a previous experimental property set to
      standards-track status based on the publication of the RFC.
      [Need to clarify any possible transfer of ownership issues.]

8.3.  The storage_ext attribute

   The storage_ext attribute is a per-fs attribute which contains
   information on the storage_ctl extensions supported by the server
   when used on the associated file system.  Servers will often report
   the same value of the storage_ext attribute for all file systems,
   but clients should not assume that this is the case.

   struct section_se {
           spacenum_sc     SpaceSection;   /* Section number. */
           bitmap_sc       WhichProperties;/* Supported properties. */
   };

   typedef section_se      fattr4_storage_ext<>;

   The storage_ext attribute consists of an array of section_se
   structures, each of which specifies the supported properties for a
   specific space id.  The section_se structures should be reported in
   ascending numeric order of spacenum_sc values.

9.  Summary

   This chapter serves as a reference guide to things discussed above.
   For a more discursive treatment, with less attention to syntax
   details, see above.

9.1.  Errors

   This proposal would involve adding the following new errors to the
   NFS version 4 minor version in which it is included.

   NFS4ERR_SCTL_BADPROP  Returned when the storage_ctl attribute
      contains properties with a space id unknown to the server, or
      with property bits whose displacement in the bitmap corresponds
      to property numbers not known to the server as being associated
      with the current space id.

      This error is returnable by OPEN, CREATE, SETATTR, VERIFY, and
      NVERIFY.

   NFS4ERR_SCTL_BADPARM  Returned when the storage_ctl attribute
      contains parameters defined as not valid in connection with the
      current property.  This includes situations in which multiple
      properties contain values that are defined as inconsistent (as
      opposed to not being satisfiable).

      This error is returnable by OPEN, CREATE, SETATTR, VERIFY, and
      NVERIFY.

   NFS4ERR_SCTL_BADENF  Returned when the storage_ctl attribute
      contains an enforceable property whose enforce_sc is invalid, in
      that it contains multiple enforcement level bits, contains no
      enforcement level bits in a context in which that is not
      allowed, or contains a set of compliance specification bits that
      is not appropriate in the current context.

      This error is returnable by OPEN, CREATE, SETATTR, VERIFY, and
      NVERIFY.

   NFS4ERR_SCTL_BADDATA  Returned when the storage_ctl contains a
      section_sc whose PropertyData array does not match the length of
      the properties specified in the associated WhichProperties.

      This error is returnable by OPEN, CREATE, SETATTR, VERIFY, and
      NVERIFY.

   NFS4ERR_SCTL_FAIL  Returned when a required storage_ctl element
      cannot be satisfied.  This is as opposed to the case in which it
      cannot be satisfied immediately but is in the process of being
      satisfied.

      This error is returnable by OPEN, CREATE, and SETATTR only.

9.2.  Semantic constraints

   This section lists the semantic constraints on property
   specifications.
   There will be situations in which the attribute fully matches the
   XDR specification but is not in line with the appropriate
   contextual constraints.  This section lists those constraints, in
   order to complement the XDR definition above.

   There are four categories of constraints that need to be dealt
   with:

   o  Whether the properties have the associated parameters specified.

   o  Whether the properties have an associated enforcement level
      specified.

   o  Whether the properties have associated compliance level(s)
      specified.

   o  Constraints that involve the validity of combinations of what
      are otherwise allowed situations with regard to the above.

   Each property specifies a particular value which is invalid and is
   to be treated as indicating the absence of property parameters
   (zero values, zero-length arrays, etc.).  Specification of the
   parameters associated with storage properties is generally required
   and so these special values result in NFS4ERR_SCTL_BADPARM being
   returned.  The only exceptions are SETATTR, for which a storage
   property without parameters serves to delete the corresponding
   storage property in the existing attribute, and VERIFY/NVERIFY,
   where it is allowed under some circumstances, to be discussed
   below.

   Specification of the enforcement level is generally required for
   enforceable properties.  The only exception is VERIFY/NVERIFY,
   where its omission is allowed under some circumstances, to be
   discussed below.

   Specification of the compliance status for enforceable properties
   depends on the context in which the properties appear.  For OPEN,
   CREATE, and SETATTR, specification of compliance status is not
   allowed.  For VERIFY/NVERIFY, specification of multiple compliance
   status values is allowed, subject to the specific combination
   constraints appropriate to VERIFY and NVERIFY as listed below.
   For all other contexts, whether in GETATTR, READDIR, the responses
   in the NFS4ERR_SCTL_FAIL case, or in the response to the
   FETCH_SCNOTE operation, specification of compliance status is
   required and only a single compliance status may appear.

   In addition to the constraints listed above, in the case of a
   storage_ctl attribute within VERIFY/NVERIFY, the properties within
   the attribute must meet the additional constraints described in
   Section 6.4, Use of storage_ctl in VERIFY/NVERIFY.

   When sending responses to GETATTR, READDIR, OPEN, CREATE, and
   SETATTR, the server MUST obey these constraints.  When receiving
   OPEN, SETATTR, VERIFY, and NVERIFY requests that contain the
   storage_ctl attribute, the server MUST return the error
   NFS4ERR_SCTL_BADENF if the attribute does not follow the specified
   constraints and is otherwise valid (matching the XDR property
   definition).

   These constraints apply to properties introduced by extensions to
   the storage_ctl attribute unless explicitly overridden in the
   document defining the extension.  Such a document may add other
   contextual constraints that apply to the properties defined by that
   extension.

10.  Possible Future Work

   This document describes a basic framework for storage control and a
   basic set of properties.  It is a base for development of this
   feature and could have considerable additions before incorporation
   in an NFSv4 minor version.  On the other hand, the feature is
   intended to be defined with sufficient flexibility that many of
   these additions to the feature might be done as subsequent
   extensions, after the basic feature is made part of an NFSv4 minor
   version.

   The question of which additions are required for an initial version
   of the feature, which are best deferred to later, and which
   proposed extensions don't really belong is a complex one and will
   be a major subject of the development of the feature.

   The following list illustrates some of the possible additions that
   have had some preliminary discussion.  It is not intended to be
   exhaustive, and the examination of other additions not yet thought
   of is definitely part of the work to be done:

      Addition of other properties to those in this document that make
      sense as a basic set of properties, both informative and
      enforceable, for an initial set to be part of an NFSv4 minor
      version.

      Mechanisms to allow a set of properties to be applied to a large
      set of files, including those that are directory-based (with
      inheritance a possible part of the mix), by bulk attribute
      change on a client-specified set of files, or by allowing the
      client to store some set of properties as a persistent object in
      the file system, and allowing subsequent storage control
      attributes to reference that persistent object.

      Mechanisms to enable the client to determine possible choices
      (or ranges) for some properties within the context of a given
      server.  This would simplify and streamline property
      negotiation.

      Mechanisms by which a server could advertise various possible
      sets of property choices, to deal with environments where there
      exists only a small set of possible choices, each effecting a
      particular choice for many properties, as opposed to a case
      where multiple independent property choices are possible.

11.  Acknowledgments

   Mike Eisler reviewed early drafts of this work and made important
   contributions in helping define the direction of the effort.

   David Black reviewed many drafts of this work and made many helpful
   suggestions that improved the quality of the result.

Authors' Addresses

   David Noveck
   EMC
   228 South St.
   Hopkinton, MA  01748
   US

   Phone: +1 508 249 5748
   Email: david.noveck@emc.com

   Pranoop R. Erasani
   NetApp
   48980 Oat Grass Terrace
   Fremont, CA  94539
   US

   Phone: +1 408 822 3282
   Email: pranoop@netapp.com

   Lakshmi N. Bairavasundaram
   NetApp
   475 East Java Drive
   Sunnyvale, CA  94089
   US

   Phone: +1 408 419 5616
   Email: lakshmib@netapp.com

   Peng Dai
   Vmware
   5 Cambridge Center
   Cambridge, MA  02142
   US

   Phone: +1 617 528 7592
   Email: pdai@vmware.com

   Christos Karamonolis
   Vmware
   3401 Hillview Ave.
   Palo Alto, CA  94304
   US

   Phone: +1 650 427 2329
   Email: ckaramonolis@vmware.com