NFSv4 Working Group                                            D. Noveck
Internet-Draft                                        Network Appliance
Expires: December 21, 2006                                June 19, 2006

   Chapters for Migration, Replication, and Referrals for v4.1 Draft
                   draft-dnoveck-location-chapters-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 21, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document contains material proposed for inclusion in the next
   v4.1 draft.  It includes one revised chapter, one rewritten chapter,
   and a set of suggestions for changes in other parts of the v4.1
   document.

Table of Contents

   1.  Introduction
   2.  Single-server Name Space
       2.1.  Server Exports
       2.2.  Browsing Exports
       2.3.  Server Pseudo Filesystem
       2.4.  Multiple Roots
       2.5.  Filehandle Volatility
       2.6.  Exported Root
       2.7.  Mount Point Crossing
       2.8.  Security Policy and Name Space Presentation
   3.  Multi-server Name Space
       3.1.  Location attributes
       3.2.  File System Presence or Absence
       3.3.  Getting Attributes for an Absent File System
             3.3.1.  GETATTR Within an Absent File System
             3.3.2.  READDIR and Absent File Systems
       3.4.  Uses of Location Information
             3.4.1.  File System Replication
             3.4.2.  File System Migration
             3.4.3.  Referrals
       3.5.  Additional Client-side Considerations
       3.6.  Effecting File System Transitions
             3.6.1.  Transparent File System Transitions
             3.6.2.  Filehandles and File System Transitions
             3.6.3.  Fileid's and File System Transitions
             3.6.4.  Fsid's and File System Transitions
             3.6.5.  The Change Attribute and File System Transitions
             3.6.6.  Lock State and File System Transitions
             3.6.7.  Write Verifiers and File System Transitions
       3.7.  Effecting File System Referrals
             3.7.1.  Referral Example (LOOKUP)
             3.7.2.  Referral Example (READDIR)
       3.8.  The Attribute fs_absent
       3.9.  The Attribute fs_locations
       3.10. The Attribute fs_locations_info
       3.11. The Attribute fs_status
   4.  Other Changes
   5.  References
   Author's Address
   Intellectual Property and Copyright Statements
1.  Introduction

   This draft consists mainly of proposed material for the v4.1 draft,
   to replace the handling of server name space, migration, and
   replication in the current v4.1 draft, which has been inherited
   essentially unchanged from RFC 3530 [1].

   It is based on ideas published in some previously published, but
   now expired, individual and working group drafts ("Guide for
   Referrals in NFSv4" and "Next Steps for NFSv4 Migration/
   Replication").

   Other than stylistic changes that reflect the difference in purpose
   between those earlier drafts and the v4.1 spec-to-be, the following
   changes have been made:

   o  Deleted type as an attribute to be returned in an absent fs
      (sometimes).  Too much additional complexity for no real value.

   o  Added RDMA-capability flag to fs_locations_info as requested by
      Trond.

   o  Added a field telling how current an fs is to fs_status, as
      requested by Craig E.

   o  In light of "we don't know if this is the right set of stuff"
      comments I'd heard, restructured the fs_locations_info material
      for greater changeability/expandability.  Also hope to make this
      easier to understand without compromising other goals.

   o  Deleted fh-replacement material due to lack of interest.  Could
      add it back if people really want it.

   o  Deleted the VLCACHE bit.  Lack of interest.  Could come back.

   o  Deleted the option to transparently split an fs.  Could come
      back if there is a groundswell of support.

   o  Re-organized some of the continuity information in a way that I
      think adds clarity.  Got rid of same-fs and now have explicit
      server, endpoint, and sharing classes, which is a lot clearer.
      In doing this, I adopted a straight session orientation,
      although I have not had time to update this document to fully
      reflect the working group's decision to make sessions mandatory.
   This document proposes the following changes relative to the
   current draft of the spec for NFSv4 Minor Version 1
   (draft-ietf-nfsv4-minorversion1-02.txt [2]):

   o  Deletion of the current chapter 4 (Filesystem Migration and
      Replication) from that draft.

   o  Replacement of the current chapter 5 (NFS Server Name Space) by
      the second chapter of this document (Single-server Name Space).

   o  Deletion of section 6.14 (Migration, Replication and State) from
      the current v4.1 draft.

   o  Addition to the draft of the third chapter of this document
      (Multi-server Name Space), following the current chapter 10 of
      the v4.1 draft (NFSv4.1 Sessions).  This provides the
      replacement (rewritten to reflect referrals and other changes in
      the current draft) for chapter 4 and section 6.14.

   o  Minor changes to other areas of the spec, as set forth in the
      fourth chapter of this document (Other Spec Changes Needed).

2.  Single-server Name Space

   This chapter describes the NFSv4 single-server name space.  Single-
   server namespaces may be presented directly to clients, or they may
   be used as a basis to form larger multi-server namespaces (e.g.
   site-wide or organization-wide) to be presented to clients, as
   described in Section 3.

2.1.  Server Exports

   On a UNIX server, the name space describes all the files reachable
   by pathnames under the root directory or "/".  On a Windows NT
   server the name space constitutes all the files on disks named by
   mapped disk letters.  NFS server administrators rarely make the
   entire server's filesystem name space available to NFS clients.
   More often portions of the name space are made available via an
   "export" feature.  In previous versions of the NFS protocol, the
   root filehandle for each export is obtained through the MOUNT
   protocol; the client sends a string that identifies the export of
   name space and the server returns the root filehandle for it.  The
   MOUNT protocol also supports an EXPORTS procedure that will
   enumerate the server's exports.

2.2.  Browsing Exports

   The NFS version 4 protocol provides a root filehandle that clients
   can use to obtain filehandles for the exports of a particular
   server, via a series of LOOKUP operations within a COMPOUND, to
   traverse a path.  A common user experience is to use a graphical
   user interface (perhaps a file "Open" dialog window) to find a file
   via progressive browsing through a directory tree.  The client must
   be able to move from one export to another export via single-
   component, progressive LOOKUP operations.

   This style of browsing is not well supported by the NFS version 2
   and 3 protocols.  In those protocols, the client expects all LOOKUP
   operations to remain within a single server filesystem.  For
   example, the device attribute will not change.  This prevents a
   client from taking name space paths that span exports.

   An automounter on the client can obtain a snapshot of the server's
   name space using the EXPORTS procedure of the MOUNT protocol.  If
   it understands the server's pathname syntax, it can create an image
   of the server's name space on the client.  The parts of the name
   space that are not exported by the server are filled in with a
   "pseudo filesystem" that allows the user to browse from one mounted
   filesystem to another.  There is a drawback to this representation
   of the server's name space on the client: it is static.  If the
   server administrator adds a new export, the client will be unaware
   of it.
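   As an illustration of this browsing model, the sketch below walks a
   path one component at a time, the way a client might implement
   progressive browsing.  It is a sketch only: the nfs4_* helpers are
   hypothetical stand-ins for whatever wrappers a real client has
   around LOOKUP and GETATTR, and, as Section 2.7 discusses, a change
   in the fsid attribute marks the crossing of a filesystem (and hence
   export) boundary.

       /* Sketch only: progressive single-component browsing, with
        * filesystem boundaries detected via the fsid attribute.  */

       #include <stdbool.h>
       #include <stdint.h>

       struct fsid4 { uint64_t major, minor; };

       extern int nfs4_lookup(const char *component);    /* LOOKUP  */
       extern int nfs4_getattr_fsid(struct fsid4 *out);  /* GETATTR */

       /* Descend one component; report whether an fs boundary was
        * crossed, as indicated by a change in fsid. */
       static int step_into(const char *name, struct fsid4 *cur,
                            bool *crossed)
       {
           struct fsid4 next;

           if (nfs4_lookup(name) != 0)
               return -1;
           if (nfs4_getattr_fsid(&next) != 0)
               return -1;
           *crossed = (next.major != cur->major ||
                       next.minor != cur->minor);
           *cur = next;
           return 0;
       }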
2.3.  Server Pseudo Filesystem

   NFS version 4 servers avoid this name space inconsistency by
   presenting all the exports for a given server within the framework
   of a single namespace for that server.  An NFS version 4 client
   uses LOOKUP and READDIR operations to browse seamlessly from one
   export to another.  Portions of the server name space that are not
   exported are bridged via a "pseudo filesystem" that provides a view
   of exported directories only.  A pseudo filesystem has a unique
   fsid and behaves like a normal, read-only filesystem.

   Based on the construction of the server's name space, it is
   possible that multiple pseudo filesystems may exist.  For example,

           /a              pseudo filesystem
           /a/b            real filesystem
           /a/b/c          pseudo filesystem
           /a/b/c/d        real filesystem

   Each pseudo filesystem is considered a separate entity and
   therefore will have its own unique fsid.

2.4.  Multiple Roots

   The DOS and Windows operating environments are sometimes described
   as having "multiple roots".  Filesystems are commonly represented
   as disk letters.  MacOS represents filesystems as top-level names.
   NFS version 4 servers for these platforms can construct a pseudo
   filesystem above these root names so that disk letters or volume
   names are simply directory names in the pseudo root.

2.5.  Filehandle Volatility

   The nature of the server's pseudo filesystem is that it is a
   logical representation of filesystem(s) available from the server.
   Therefore, the pseudo filesystem is most likely constructed
   dynamically when the server is first instantiated.  It is expected
   that the pseudo filesystem may not have an on-disk counterpart from
   which persistent filehandles could be constructed.  Even though it
   is preferable that the server provide persistent filehandles for
   the pseudo filesystem, the NFS client should expect that pseudo
   filesystem filehandles are volatile.  This can be confirmed by
   checking the associated "fh_expire_type" attribute for the
   filehandles in question.  If the filehandles are volatile, the NFS
   client must be prepared to recover a filehandle value (e.g. with a
   series of LOOKUP operations) when receiving an error of
   NFS4ERR_FHEXPIRED.
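   A minimal sketch of the recovery that last sentence calls for,
   assuming the client caches the pathname by which each volatile
   filehandle was obtained (the helper functions are hypothetical):

       /* Sketch only: recover an expired volatile filehandle by
        * re-walking its cached path from the root.  */

       extern int nfs4_putrootfh(void);               /* PUTROOTFH */
       extern int nfs4_lookup(const char *component); /* LOOKUP    */

       static int recover_expired_fh(char *const comp[], int ncomp)
       {
           if (nfs4_putrootfh() != 0)
               return -1;
           for (int i = 0; i < ncomp; i++)
               if (nfs4_lookup(comp[i]) != 0)
                   return -1;
           /* The current filehandle is re-established; a GETFH can
            * now fetch the fresh value to replace the expired one. */
           return 0;
       }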
2.6.  Exported Root

   If the server's root filesystem is exported, one might conclude
   that a pseudo filesystem is unneeded.  This is not necessarily so.
   Assume the following filesystems on a server:

           /       disk1   (exported)
           /a      disk2   (not exported)
           /a/b    disk3   (exported)

   Because disk2 is not exported, disk3 cannot be reached with simple
   LOOKUPs.  The server must bridge the gap with a pseudo filesystem.

2.7.  Mount Point Crossing

   The server filesystem environment may be constructed in such a way
   that one filesystem contains a directory which is 'covered' or
   mounted upon by a second filesystem.  For example:

           /a/b            (filesystem 1)
           /a/b/c/d        (filesystem 2)

   The pseudo filesystem for this server may be constructed to look
   like:

           /               (place holder/not exported)
           /a/b            (filesystem 1)
           /a/b/c/d        (filesystem 2)

   It is the server's responsibility to present a pseudo filesystem
   that is complete to the client.  If the client sends a lookup
   request for the path "/a/b/c/d", the server's response is the
   filehandle of the filesystem "/a/b/c/d".  In previous versions of
   the NFS protocol, the server would respond with the filehandle of
   directory "/a/b/c/d" within the filesystem "/a/b".

   The NFS client will be able to determine if it crosses a server
   mount point by a change in the value of the "fsid" attribute.

2.8.  Security Policy and Name Space Presentation

   The application of the server's security policy needs to be
   carefully considered by the implementor.  One may choose to limit
   the viewability of portions of the pseudo filesystem based on the
   server's perception of the client's ability to authenticate itself
   properly.  However, with the support of multiple security
   mechanisms and the ability to negotiate the appropriate use of
   these mechanisms, the server is unable to properly determine
   whether a client will be able to authenticate itself.  If, based on
   its policies, the server chooses to limit the contents of the
   pseudo filesystem, the server may effectively hide filesystems from
   a client that may otherwise have legitimate access.

   As suggested practice, the server should apply the security policy
   of a shared resource in the server's namespace to the components of
   the resource's ancestors.  For example:

           /
           /a/b
           /a/b/c

   The /a/b/c directory is a real filesystem and is the shared
   resource.  The security policy for /a/b/c is Kerberos with
   integrity.  The server should apply the same security policy to /,
   /a, and /a/b.  This allows for the extension of the protection of
   the server's namespace to the ancestors of the real shared
   resource.

   For the case of the use of multiple, disjoint security mechanisms
   in the server's resources, the security for a particular object in
   the server's namespace should be the union of all security
   mechanisms of all direct descendants.

3.  Multi-server Name Space

   NFSv4.1 supports attributes that allow a namespace to extend beyond
   the boundaries of a single server.  Use of such multi-server
   namespaces is optional, and for many purposes, single-server
   namespaces are perfectly acceptable.  Use of multi-server
   namespaces can provide many advantages, however, by separating a
   file system's logical position in a name space from the (possibly
   changing) logistical and administrative considerations that result
   in particular file systems being located on particular servers.

3.1.  Location attributes

   NFSv4 contains recommended attributes that allow file systems on
   one server to be associated with one or more instances of that file
   system on other servers.  These attributes specify such file
   systems by specifying a server name (either a DNS name or an IP
   address) together with the path of that filesystem within that
   server's single-server name space.

   The fs_locations_info recommended attribute allows specification of
   one or more file system locations where the data corresponding to a
   given file system may be found.  This attribute provides to the
   client, in addition to information about file system locations,
   extensive information about the various file system choices (e.g.
   priority for use, writability, currency, etc.) as well as
   information to help the client efficiently effect as seamless a
   transition as possible among multiple file system instances, when
   and if that should be necessary.

   The fs_locations recommended attribute is inherited from NFSv4.0
   and only allows specification of the file system locations where
   the data corresponding to a given file system may be found.
   Servers should make this attribute available whenever
   fs_locations_info is supported, but client use of fs_locations_info
   is to be preferred.
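   For reference, the fs_locations attribute carried over from NFSv4.0
   has the following XDR form, as given in RFC 3530 [1]; the
   fs_locations_info structure is defined in Section 3.10.

       struct fs_location4 {
               utf8str_cis     server<>;    /* server names/addresses */
               pathname4       rootpath;    /* path on those servers  */
       };

       struct fs_locations4 {
               pathname4       fs_root;     /* fs root in the present
                                               namespace              */
               fs_location4    locations<>;
       };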
3.2.  File System Presence or Absence

   A given location in an NFSv4 namespace (typically but not
   necessarily a multi-server namespace) can have a number of file
   system locations associated with it (via the fs_locations or
   fs_locations_info attribute).  There may also be an actual current
   file system at that location, accessible via normal namespace
   operations (e.g. LOOKUP).  In this case, the file system is said to
   be "present" at that position in the namespace, and clients will
   typically use it, reserving use of additional locations specified
   via the location-related attributes to situations in which the
   principal location is no longer available.

   When there is no actual filesystem at the namespace location in
   question, the file system is said to be "absent".  An absent file
   system contains no files or directories other than the root, and
   any reference to it, except to access a small set of attributes
   useful in determining alternate locations, will result in the error
   NFS4ERR_MOVED.  Note that if the server ever returns NFS4ERR_MOVED
   (i.e. file systems may be absent), it MUST support the fs_locations
   attribute and SHOULD support the fs_locations_info and fs_absent
   attributes.

   While the error name suggests that we have a case of a file system
   which once was present and has only become absent later, this is
   only one possibility.  A position in the namespace may be
   permanently absent, with the file system(s) designated by the
   location attributes the only realization.  The name NFS4ERR_MOVED
   reflects an earlier, more limited conception of its function, but
   this error will be returned whenever the referenced file system is
   absent, whether it has moved or not.

   Except in the case of GETATTR-type operations (to be discussed
   later), when the current filehandle at the start of an operation is
   within an absent file system, that operation is not performed and
   the error NFS4ERR_MOVED is returned, to indicate that the
   filesystem is absent on the current server.

   Because a GETFH cannot succeed if the current filehandle is within
   an absent file system, filehandles within an absent filesystem
   cannot be transferred to the client.  When a client does have
   filehandles within an absent file system, it is the result of
   obtaining them when the file system was present, and having the
   file system become absent subsequently.

   It should be noted that because the check for the current
   filehandle being within an absent filesystem happens at the start
   of every operation, operations which change the current filehandle
   so that it is within an absent filesystem will not result in an
   error.  This allows such combinations as PUTFH-GETATTR and
   LOOKUP-GETATTR to be used to get attribute information,
   particularly location attribute information, as discussed below.

   The recommended file system attribute fs_absent can be used to
   interrogate the present/absent status of a given file system.
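   The following sketch shows the sort of PUTFH-GETATTR probe just
   described.  It is illustrative only: the helpers are hypothetical,
   FATTR4_FS_LOCATIONS is the NFSv4.0 attribute number, and the
   fs_absent attribute number is a placeholder, since that attribute
   is proposed here and not yet assigned.

       /* Sketch only: probe a possibly-absent file system by asking
        * for attributes that are defined even when the fs is absent,
        * so the GETATTR succeeds instead of failing NFS4ERR_MOVED. */

       #include <stdint.h>

       enum {
           FATTR4_FS_LOCATIONS = 24,  /* per RFC 3530            */
           FATTR4_FS_ABSENT    = 60   /* placeholder, unassigned */
       };

       extern int nfs4_putfh(const void *fh, unsigned len); /* PUTFH */
       extern int nfs4_getattr(const uint32_t *mask, int words);

       static int probe_location(const void *fh, unsigned len)
       {
           uint32_t mask[2] = { 0, 0 };

           mask[FATTR4_FS_LOCATIONS / 32] |=
               1u << (FATTR4_FS_LOCATIONS % 32);
           mask[FATTR4_FS_ABSENT / 32] |=
               1u << (FATTR4_FS_ABSENT % 32);

           if (nfs4_putfh(fh, len) != 0)
               return -1;
           return nfs4_getattr(mask, 2);   /* GETATTR */
       }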
3.3.  Getting Attributes for an Absent File System

   When a file system is absent, most attributes are not available,
   but it is necessary to allow the client access to the small set of
   attributes that are available, and most particularly those that
   give information about the correct current locations for this file
   system: fs_locations and fs_locations_info.

3.3.1.  GETATTR Within an Absent File System

   As mentioned above, an exception is made for GETATTR in that
   attributes may be obtained for a filehandle within an absent file
   system.  This exception only applies if the attribute mask contains
   at least one attribute bit that indicates the client is interested
   in a result regarding an absent file system: fs_locations,
   fs_locations_info, or fs_absent.  If none of these attributes is
   requested, GETATTR will result in an NFS4ERR_MOVED error.

   When a GETATTR is done on an absent file system, the set of
   supported attributes is very limited.  Many attributes, including
   those that are normally mandatory, will not be available on an
   absent file system.  In addition to the attributes mentioned above
   (fs_locations, fs_locations_info, fs_absent), the following
   attributes SHOULD be available on absent file systems, in the case
   of recommended attributes at least to the same degree that they are
   available on present file systems.

   change:  This attribute is useful for absent file systems and can
      be helpful in summarizing to the client when any of the
      location-related attributes changes.

   fsid:  This attribute should be provided so that the client can
      determine file system boundaries, including, in particular, the
      boundary between present and absent file systems.

   mounted_on_fileid:  For objects at the top of an absent file system
      this attribute needs to be available.  Since the fileid is one
      which is within the present parent file system, there should be
      no need to reference the absent file system to provide this
      information.

   Other attributes SHOULD NOT be made available for absent file
   systems, even when it is possible to provide them.  The server
   should not assume that more information is always better and should
   avoid gratuitously providing additional information.

   When a GETATTR operation includes a bit mask for one of the
   attributes fs_locations, fs_locations_info, or fs_absent, but where
   the bit mask includes attributes which are not supported, GETATTR
   will not return an error, but will return the mask of the actual
   attributes supported with the results.

   Handling of VERIFY/NVERIFY is similar to GETATTR in that if the
   attribute mask does not include fs_locations, fs_locations_info, or
   fs_absent, the error NFS4ERR_MOVED will result.  It differs in that
   any appearance in the attribute mask of an attribute not supported
   for an absent file system (and note that this will include some
   normally mandatory attributes) will also cause an NFS4ERR_MOVED
   result.
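   On the server side, the GETATTR and VERIFY/NVERIFY rules above
   reduce to a check like the following sketch.  All of it is
   illustrative rather than a required implementation; the helper and
   the attribute numbers other than FATTR4_FS_LOCATIONS are
   placeholders.

       /* Sketch only: is a GETATTR against an absent fs allowed?
        * Yes iff the request includes at least one of fs_locations,
        * fs_locations_info, or fs_absent. */

       #include <stdbool.h>
       #include <stdint.h>

       enum {
           FATTR4_FS_LOCATIONS      = 24, /* per RFC 3530         */
           FATTR4_FS_ABSENT         = 60, /* placeholder          */
           FATTR4_FS_LOCATIONS_INFO = 61  /* placeholder          */
       };

       extern bool attr_isset(const uint32_t *mask, int words,
                              int attr);

       static bool getattr_ok_when_absent(const uint32_t *mask,
                                          int words)
       {
           return attr_isset(mask, words, FATTR4_FS_LOCATIONS)
               || attr_isset(mask, words, FATTR4_FS_LOCATIONS_INFO)
               || attr_isset(mask, words, FATTR4_FS_ABSENT);
       }

       /* For GETATTR, unsupported requested bits are then simply
        * cleared from the reply mask; for VERIFY/NVERIFY, any
        * unsupported bit instead produces NFS4ERR_MOVED. */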
3.3.2.  READDIR and Absent File Systems

   A READDIR performed when the current filehandle is within an absent
   file system will result in an NFS4ERR_MOVED error, since, unlike
   the case of GETATTR, no such exception is made for READDIR.

   Attributes for an absent file system may be fetched via a READDIR
   for a directory in a present file system, when that directory
   contains the root directories of one or more absent filesystems.
   In this case, the handling is as follows:

   o  If the attribute set requested includes one of the attributes
      fs_locations, fs_locations_info, or fs_absent, then fetching of
      attributes proceeds normally and no NFS4ERR_MOVED indication is
      returned, even when the rdattr_error attribute is requested.

   o  If the attribute set requested does not include one of the
      attributes fs_locations, fs_locations_info, or fs_absent, then
      if the rdattr_error attribute is requested, each directory entry
      for the root of an absent file system will report NFS4ERR_MOVED
      as the value of the rdattr_error attribute.

   o  If the attribute set requested does not include any of the
      attributes fs_locations, fs_locations_info, fs_absent, or
      rdattr_error, then the occurrence of the root of an absent file
      system within the directory will result in the READDIR failing
      with an NFS4ERR_MOVED error.

   o  The unavailability of an attribute because of a file system's
      absence, even one that is ordinarily mandatory, does not result
      in any error indication.  The set of attributes returned for the
      root directory of the absent filesystem in that case is simply
      restricted to those actually available.

3.4.  Uses of Location Information

   The location-bearing attributes (fs_locations and
   fs_locations_info) provide, together with the possibility of absent
   filesystems, a number of important facilities in providing
   reliable, manageable, and scalable data access.

   When a file system is present, these attributes can provide
   alternative locations, to be used to access the same data, in the
   event that server failures, communications problems, or other
   difficulties make continued access to the current file system
   impossible or otherwise impractical.  Provision of such alternate
   locations is referred to as "replication", although there are cases
   in which replicated sets of data are not in fact present, and the
   replicas are instead different paths to the same data.

   When a file system is present and becomes absent, clients can be
   given the opportunity to have continued access to their data, at an
   alternate location.  In this case, a continued attempt to use the
   data in the now-absent file system will result in an NFS4ERR_MOVED
   error, and at that point the successor locations (typically only
   one but multiple choices are possible) can be fetched and used to
   continue access.  Transfer of the file system contents to the new
   location is referred to as "migration", but it should be kept in
   mind that there are cases in which this term can be used, like
   "replication", when there is no actual data migration per se.

   Where a file system was not previously present, specification of
   file system location provides a means by which file systems located
   on one server can be associated with a name space defined by
   another server, thus allowing a general multi-server namespace
   facility.  Designation of such a location, in place of an absent
   filesystem, is called a "referral".
Designation 539 of such a location, in place of an absent filesystem, is called 540 "referral". 542 3.4.1. File System Replication 544 The fs_locations and fs_locations_info attributes provide alternative 545 locations, to be used to access data in place of the current file 546 system. On first access to a filesystem, the client should obtain 547 the value of the set alternate locations by interrogating the 548 fs_locations or fs_locations_info attribute, with the latter being 549 preferred. 551 In the event that server failures, communications problems, or other 552 difficulties, make continued access to the current file system 553 impossible or otherwise impractical, the client can use the alternate 554 locations as a way to get continued access to his data. 556 The alternate locations may be physical replicas of the (typically 557 read-only) file system data, or they may reflect alternate paths to 558 the same server or provide for the use of various form of server 559 clustering in which multiple servers provide alternate ways of 560 accessing the same physical file system. How these different modes 561 of file system transition are represented within the fs_locations and 562 fs_locations_info attributes and how the client deals with file 563 system transition issues will be discussed in detail below. 565 3.4.2. File System Migration 567 When a file system is present and becomes absent, clients can be 568 given the opportunity to have continued access to their data, at an 569 alternate location, as specified by the fs_locations or 570 fs_locations_info attribute. Typically, a client will be accessing 571 the file system in question, get a an NFS4ERR_MOVED error, and then 572 use the fs_locations or fs_locations_info attribute to determine the 573 new location of the data. When fs_locations_info is used, additional 574 information will be available which will define the nature of the 575 client's handling of the transition to a new server. 577 Such migration can be helpful in providing load balancing or general 578 resource reallocation. The protocol does not specify how the 579 filesystem will be moved between servers. It is anticipated that a 580 number of different server-to-server transfer mechanisms might be 581 used with the choice left to the server implementor. The NFSv4.1 582 protocol specifies the method used to communicate the migration event 583 between client and server. 585 The new location may be an alternate communication path to the same 586 server, or, in the case of various forms of server clustering, 587 another server providing access to the same physical file system. 588 The client's responsibilities in dealing with this transition depend 589 on the specific nature of the new access path and how and whether 590 data was in fact migrated. These issues will be discussed in detail 591 below. 593 Although a single successor location is typical, multiple locations 594 may be provided, together with information that allows priority among 595 the choices to be indicated, via information in the fs_locations_info 596 attribute. Where suitable clustering mechanisms make it possible to 597 provide multiple identical file systems or paths to them, this allows 598 the client the opportunity to deal with any resource or 599 communications issues that might limit data availability. 601 3.4.3. Referrals 603 Referrals provide a way of placing a file system in a location 604 essentially without respect to its physical location on a given 605 server. 
   This allows a single server or a set of servers to present a
   multi-server namespace that encompasses filesystems located on
   multiple servers.  Some likely uses of this include establishment
   of site-wide or organization-wide namespaces, or even knitting such
   namespaces together into a truly global namespace.

   Referrals occur when a client determines, upon first referencing a
   position in the current namespace, that it is part of a new file
   system and that that file system is absent.  When this occurs,
   typically by receiving the error NFS4ERR_MOVED, the actual location
   or locations of the file system can be determined by fetching the
   fs_locations or fs_locations_info attribute.

   Use of multi-server namespaces is enabled by NFSv4 but is not
   required.  The use of multi-server namespaces and their scope will
   depend on the applications used and on system administration
   preferences.

   Multi-server namespaces can be established by a single server
   providing a large set of referrals to all of the included
   filesystems.  Alternatively, a single multi-server namespace may be
   administratively segmented with separate referral file systems (on
   separate servers) for each separately-administered section of the
   name space.  Any segment, or the top-level referral file system,
   may use replicated referral file systems for higher availability.

3.5.  Additional Client-side Considerations

   When clients make use of servers that implement referrals and
   migration, care should be taken so that a user who mounts a given
   filesystem that includes a referral or a relocated filesystem
   continues to see a coherent picture of that user-side filesystem,
   despite the fact that it contains a number of server-side
   filesystems which may be on different servers.

   One important issue is upward navigation from the root of a
   server-side filesystem to its parent (specified as ".." in UNIX).
   The client needs to determine when it hits an fsid root going up
   the filetree.  When it is at such a point and needs to ascend to
   the parent, it must do so locally instead of sending a LOOKUPP call
   to the server.  The LOOKUPP would normally return the ancestor of
   the target filesystem on the target server, which may not be part
   of the space that the client mounted.

   Another issue concerns refresh of referral locations.  When
   referrals are used extensively, they may change as server
   configurations change.  It is expected that clients will cache
   information related to traversing referrals so that future
   client-side requests are resolved locally without server
   communication.  This is usually rooted in client-side name lookup
   caching.  Clients should periodically purge this data for referral
   points in order to detect changes in location information.  When
   the change attribute changes for directories that hold referral
   entries, or for the referral entries themselves, clients should
   consider any associated cached referral information to be out of
   date.
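   The upward-navigation rule above can be sketched as follows; the
   dirnode structure and helpers are hypothetical, standing in for
   whatever a client uses to track the filesystems it has grafted
   together:

       /* Sketch only: resolve ".." locally when the current
        * directory is the root of a server-side fs within the
        * client's mounted view, instead of sending LOOKUPP. */

       #include <stdbool.h>
       #include <stddef.h>

       struct dirnode {
           struct dirnode *parent;  /* parent in client-side view  */
           bool            fs_root; /* fsid differs from parent's? */
       };

       extern int nfs4_lookupp(void);      /* LOOKUPP wrapper */

       static struct dirnode *dotdot(struct dirnode *d)
       {
           if (d->fs_root)
               return d->parent;    /* cross the junction locally;
                                       the server's answer could lie
                                       outside the mounted space   */
           if (nfs4_lookupp() != 0) /* ordinary case: ask server   */
               return NULL;
           return d->parent;
       }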
3.6.  Effecting File System Transitions

   Transitions between file system instances, whether due to switching
   between replicas upon server unavailability or in response to a
   server-initiated migration event, are best dealt with together.
   Even though the prototypical use cases of replication and migration
   contain distinctive sets of features, when all possibilities for
   these operations are considered, the underlying unity of these
   operations, from the client's point of view, is clear, even though
   for the server pragmatic considerations will normally force
   different implementation strategies for planned and unplanned
   transitions.

   A number of methods are possible for servers to replicate data and
   to track client state in order to allow clients to transition
   between file system instances with a minimum of disruption.  Such
   methods vary between those that use inter-server clustering
   techniques to limit the changes seen by the client, and those that
   are less aggressive, use more standard methods of replicating data,
   and impose a greater burden on the client to adapt to the
   transition.

   The NFSv4.1 protocol does not impose choices on clients and servers
   with regard to that spectrum of transition methods.  In fact, there
   are many valid choices, depending on client and application
   requirements and their interaction with server implementation
   choices.  The NFSv4.1 protocol does define the specific choices
   that can be made, how these choices are communicated to the client,
   and how the client is to deal with any discontinuities.

   In the sections below, references will be made to various possible
   server implementation choices as a way of illustrating the
   transition scenarios that clients may deal with.  The intent here
   is not to define or limit server implementations but rather to
   illustrate the range of issues that clients may face.

   In the discussion below, references will be made to a file system
   having a particular property, or to two file systems (typically the
   source and destination) belonging to a common class of any of
   several types.  Two file systems that belong to such a class share
   some important aspect of file system behavior that clients may
   depend upon, when present, to easily effect a seamless transition
   between file system instances.  Conversely, where the file systems
   do not belong to such a common class, the client has to deal with
   various sorts of implementation discontinuities which may cause
   performance or other issues in effecting a transition.

   Where the fs_locations_info attribute is available, such file
   system classification data will be made directly available to the
   client.  See Section 3.10 for details.  When only fs_locations is
   available, default assumptions with regard to such classifications
   have to be inferred.  See Section 3.9 for details.

   In cases in which one server is expected to accept opaque values
   from the client that originated from another server, it is a wise
   implementation practice for the servers to encode the "opaque"
   values in network byte order.  If this is done, servers acting as
   replicas or accepting migrated filesystems will be able to parse
   values like stateids, directory cookies, filehandles, etc. even if
   their native byte order is different from that of other servers
   cooperating in the replication and migration of the filesystem.
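   As a small illustration of that byte-order practice, a server might
   build a directory cookie from internal fields as sketched below.
   The field layout is invented for the example; only the use of
   network byte order is the point.

       /* Sketch only: encode the internal fields of an opaque value
        * (here, a READDIR cookie) in network byte order so that a
        * cooperating server with different native byte order can
        * still parse it after replication or migration. */

       #include <arpa/inet.h>   /* htonl */
       #include <stdint.h>
       #include <string.h>

       struct cookie_fields { uint32_t dir_gen; uint32_t slot; };

       static uint64_t encode_cookie(const struct cookie_fields *f)
       {
           uint32_t be[2] = { htonl(f->dir_gen), htonl(f->slot) };
           uint64_t cookie;

           memcpy(&cookie, be, sizeof cookie); /* opaque to clients */
           return cookie;
       }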
3.6.1.  Transparent File System Transitions

   Discussion of transition possibilities will start at the most
   transparent end of the spectrum of possibilities.  When there are
   multiple paths to a single server, and there are network problems
   that force another path to be used, or when a path is to be put out
   of service, a replication or migration event may occur without any
   real replication or migration.  Nevertheless, such events fit
   within the same general framework in that there is a transition
   between file system locations, communicated just as other, less
   transparent transitions are communicated.

   There are cases of transparent transitions that may happen
   independent of location information, in that a specific host name
   may map to several IP addresses, allowing session trunking to
   provide alternate paths.  In other cases, however, multiple
   addresses may have separate location entries for specific file
   systems, to preferentially direct traffic for those specific file
   systems to certain server addresses, with a change of address,
   planned or unplanned, corresponding to a nominal replication or
   migration event.

   The specific details of the transition depend on file system
   equivalence class information (as provided by the fs_locations_info
   and fs_locations attributes).

   o  Where the old and new filesystems belong to the same _endpoint_
      class, the transition consists of creating a new connection
      which is associated with the existing session to the old server
      endpoint.  Where a connection cannot be associated with the
      existing session, the target server must be able to recognize
      the sessionid as invalid and force creation of a new session or
      a new clientid.

   o  Where the old and new filesystems do not belong to the same
      _endpoint_ class, but belong to the same _server_ class, the
      transition consists of creating a new session, associated with
      the existing clientid.  Where the clientid is stale, the target
      server must be able to recognize the clientid as no longer valid
      and force creation of a new clientid.

   In either of the above cases, the file systems may be shown as
   belonging to the same _sharing_ class, allowing the alternate
   session or connection to be established in advance and used either
   to accelerate the file system transition when necessary (avoiding
   connection latency), or to provide higher performance by actively
   using multiple paths simultaneously.

   When two file systems belong to the same _endpoint_ class or
   _sharing_ class, many transition issues are eliminated, and any
   information indicating otherwise is ignored as erroneous.

   In all such transparent transition cases, the following apply:

   o  Filehandles stay the same if persistent, and if volatile are
      subject to expiration only if they would be in the absence of a
      file system transition.

   o  Fileid values do not change across the transition.

   o  The file system will have the same fsid in both the old and new
      locations.

   o  Change attribute values are consistent across the transition and
      do not have to be refetched.  When change attributes indicate
      that a cached object is still valid, it can remain cached.

   o  Session, client, and state identifiers retain their validity
      across the transition, except where their staleness is
      recognized and reported by the new server.  Except where such
      staleness requires it, no lock reclamation is needed.

   o  Write verifiers are presumed to retain their validity and can be
      presented to COMMIT, with the expectation that if COMMIT on the
      new server accepts them as valid, then that server has all of
      the data unstably written to the original server and has
      committed it to stable storage as requested.
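   In outline, the client-side handling of the two transparent cases
   above might look like the following sketch; the enum values and
   helpers are hypothetical, corresponding to the session and clientid
   reuse just described:

       /* Sketch only: dispatch on the equivalence-class relationship
        * between the old and new fs instances, per
        * fs_locations_info. */

       enum transition_kind { SAME_ENDPOINT, SAME_SERVER, OTHER };

       extern int bind_new_conn_to_session(const char *addr);
       extern int create_session_same_clientid(const char *addr);

       static int transparent_transition(enum transition_kind k,
                                         const char *new_addr)
       {
           switch (k) {
           case SAME_ENDPOINT:
               /* Same _endpoint_ class: new connection associated
                * with the existing session. */
               return bind_new_conn_to_session(new_addr);
           case SAME_SERVER:
               /* Same _server_ class: new session under the
                * existing clientid. */
               return create_session_same_clientid(new_addr);
           default:
               /* Not transparent; see the following sections. */
               return -1;
           }
       }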
3.6.2.  Filehandles and File System Transitions

   There are a number of ways in which filehandles can be handled
   across a file system transition.  These can be divided into two
   broad classes depending upon whether the two file systems across
   which the transition happens share sufficient state to effect some
   sort of continuity of filesystem handling.

   When there is no such co-operation in filehandle assignment, the
   two file systems are reported as being in different _handle_
   classes.  In this case, all filehandles are assumed to expire as
   part of the file system transition.  Note that this behavior does
   not depend on the fh_expire_type attribute and supersedes the
   specification of the FH4_VOL_MIGRATION bit, which only affects
   behavior when fs_locations_info is not available.

   When there is co-operation in filehandle assignment, the two file
   systems are reported as being in the same _handle_ class.  In this
   case, persistent filehandles remain valid after the file system
   transition, while volatile filehandles (excluding those which are
   only volatile due to the FH4_VOL_MIGRATION bit) are subject to
   expiration on the target server.

3.6.3.  Fileid's and File System Transitions

   In NFSv4.0, the issue of continuity of fileid's in the event of a
   file system transition was not addressed.  The general expectation
   had been that in situations in which the two filesystem instances
   are created by a single vendor using some sort of filesystem image
   copy, fileid's will be consistent across the transition, while in
   the analogous multi-vendor transitions they will not.  This poses
   difficulties, especially for the client without special knowledge
   of the transition mechanisms adopted by the server.

   It is important to note that while clients themselves may have no
   trouble with a fileid changing as a result of a file system
   transition event, applications do typically have access to the
   fileid (e.g. via stat), and the result of this is that an
   application may work perfectly well if there is no filesystem
   instance transition, or if any such transition is among instances
   created by a single vendor, yet be unable to deal with the
   situation in which a multi-vendor transition occurs at the wrong
   time.

   Providing the same fileid's in a multi-vendor (multiple server
   vendors) environment has generally been held to be quite difficult.
   While there is work to be done, it needs to be pointed out that
   this difficulty is partly self-imposed.  Servers have typically
   identified fileid with inode number, i.e. with a quantity used to
   find the file in question.  This identification poses special
   difficulties for migration of an fs between vendors, where
   assigning the same index to a given file may not be possible.  Note
   here that a fileid is not required to be useful in finding the file
   in question, only to be unique within the given fs.  Servers
   prepared to accept a fileid as a single piece of metadata and store
   it apart from the value used to index the file information can
   relatively easily maintain a fileid value across a migration event,
   allowing a truly transparent migration event.

   In any case, where servers can provide continuity of fileids, they
   should, and the client should be able to find out that such
   continuity is available and take appropriate action.  Information
   about the continuity (or lack thereof) of fileid's across a file
   system transition is represented by specifying whether the file
   systems in question are of the same _fileid_ class.
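   The storage arrangement that last discussion suggests can be as
   simple as the following sketch: the fileid is carried as one more
   piece of per-file metadata, decoupled from whatever index the
   server uses to locate the file.

       /* Sketch only: fileid stored as ordinary metadata rather than
        * being identified with the lookup index (inode number). */

       #include <stdint.h>

       struct file_record {
           uint64_t index;   /* local lookup key; may differ on the
                                destination server after migration */
           uint64_t fileid;  /* value reported to clients; copied
                                verbatim by a migration, preserving
                                the same _fileid_ class            */
       };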
3.6.4.  Fsid's and File System Transitions

   Since fsid's are unique only on a per-server basis, it is to be
   expected that they will change during a file system transition.
   Clients should not make the fsid's received from the server visible
   to applications, since they may not be globally unique and because
   they may change during a file system transition event.
   Applications are best served if they are isolated from such
   transitions to the extent possible.

3.6.5.  The Change Attribute and File System Transitions

   Since the change attribute is defined as a server-specific one,
   change attributes fetched from one server are normally presumed to
   be invalid on another server.  Such a presumption is troublesome
   since it would invalidate all cached change attributes, requiring
   refetching.  Even more disruptive, the absence of any assured
   continuity for the change attribute means that even if the same
   value is gotten on refetch, no conclusions can be drawn as to
   whether the object in question has changed.  The identical change
   attribute could be merely an artifact of a modified file with a
   different change attribute construction algorithm, with that new
   algorithm just happening to result in an identical change value.

   When the two file systems have consistent change attribute formats,
   and this fact is communicated to the client by reporting them as
   being in the same _change_ class, the client may assume a
   continuity of change attribute construction and handle this
   situation just as it would be handled without any filesystem
   transition.
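   In client terms, that rule reduces to something like this sketch:
   equality of change values may be trusted only when the two
   instances are reported as being in the same _change_ class.

       /* Sketch only: revalidate a cached object across a
        * transition. */

       #include <stdbool.h>
       #include <stdint.h>

       static bool cache_still_valid(bool same_change_class,
                                     uint64_t cached_change,
                                     uint64_t fetched_change)
       {
           if (!same_change_class)
               return false;  /* equal values could be coincidence;
                                 treat the cache as invalid        */
           return cached_change == fetched_change;
       }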
3.6.6.  Lock State and File System Transitions

   In a file system transition, the two file systems may have
   co-operated in state management.  When this is the case, and the
   two file systems belong to the same _state_ class, the two file
   systems will have compatible state environments.  In the case of
   migration, the servers involved in the migration of a filesystem
   SHOULD transfer all server state from the original to the new
   server.  When this is done, it must be done in a way that is
   transparent to the client.

   With replication, such a degree of common state is typically not
   the case.  Clients, however, should use the information provided by
   the fs_locations_info attribute to determine whether such sharing
   is in effect when that attribute is available, and depend on these
   defaults only when it is not.

   This state transfer will reduce disruption to the client when a
   file system transition occurs.  If the servers are successful in
   transferring all state, the client will continue to use stateids
   assigned by the original server.  Therefore the new server must
   recognize these stateids as valid.  This holds true for the
   clientid as well.  Since responsibility for an entire filesystem is
   transferred with such an event, there is no possibility that
   conflicts will arise on the new server as a result of the transfer
   of locks.

   As part of the transfer of information between servers, leases
   would be transferred as well.  The leases being transferred to the
   new server will typically have a different expiration time from
   those for the same client, previously on the old server.  To
   maintain the property that all leases on a given server for a given
   client expire at the same time, the server should advance the
   expiration time to the later of the leases being transferred or the
   leases already present.  This allows the client to maintain lease
   renewal of both classes without special effort.

   When the two servers belong to the same _state_ class, it does not
   necessarily mean that when dealing with the transition, the client
   will not have to reclaim state.  However it does mean that the
   client may proceed using its current clientid and stateid's, just
   as if there had been no file system transition event, and only
   reclaim state when an NFS4ERR_STALE_CLIENTID or
   NFS4ERR_STALE_STATEID error is received.

   File systems co-operating in state management may actually share
   state or simply divide the id space so as to recognize (and reject
   as stale) each other's state and client id's.  Servers which do
   share state may not do so under all conditions or at all times.
   The requirement for the server is that if it cannot be sure in
   accepting an id that it reflects the locks the client was given, it
   must treat all associated state as stale and report it as such to
   the client.

   When two file systems belong to different _state_ classes, the
   client must establish new state on the destination, and reclaim it
   if possible.  In this case, old stateids and clientid's should not
   be presented to the new server since there is no assurance that
   they will not conflict with id's valid on that server.

   In either case, when actual locks are not known to be maintained,
   the destination server may establish a grace period specific to the
   given file system, with non-reclaim locks being rejected for that
   file system, even though normal locks are being granted for other
   file systems.  Clients should not infer the absence of a grace
   period for file systems being transitioned to a server from
   responses to requests for other file systems.

   In the case of lock reclamation for a given file system after a
   file system transition, edge conditions can arise similar to those
   for reclaim after server reboot (although in the case of the
   planned state transfer associated with migration, these can be
   avoided by securely recording lock state as part of state
   migration).  Where the destination server cannot guarantee that
   locks will not be incorrectly granted, the destination server
   should not establish a file-system-specific grace period.

   In place of a file-system-specific version of RECLAIM_COMPLETE,
   servers may assume that an attempt to obtain a new lock, other than
   by reclaim, indicates the end of the client's attempt to reclaim
   locks for that file system.  [NOTE: The alternative would be to
   adapt RECLAIM_COMPLETE to this task.]
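   A destination server's per-file-system grace handling, including
   the non-reclaim-lock convention just noted, might be sketched as
   follows.  The types are hypothetical; the error code values are
   those of RFC 3530 [1].

       /* Sketch only: per-file-system grace period on the
        * destination, with an ordinary lock request ending the
        * client's reclaim phase in place of a RECLAIM_COMPLETE. */

       #include <stdbool.h>

       enum { NFS4_OK = 0, NFS4ERR_NO_GRACE = 10033 };

       struct fs_reclaim {
           bool grace_active;       /* grace for this fs only      */
           bool client_reclaiming;  /* client still reclaiming?    */
       };

       static int lock_request(struct fs_reclaim *r, bool is_reclaim)
       {
           if (is_reclaim)
               return r->grace_active ? NFS4_OK : NFS4ERR_NO_GRACE;

           /* A non-reclaim lock marks the end of this client's
            * reclaim attempt for the file system. */
           r->client_reclaiming = false;
           return NFS4_OK;      /* still subject to conflict checks */
       }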
   Information about client identity may be propagated between servers
   in the form of nfs_client_id4 and associated verifiers, under the
   assumption that the client presents the same values to all the
   servers with which it deals.  [NOTE: This contradicts what is
   currently said about SETCLIENTID, and interacts with the issue of
   what sessions should do about this.]

   Servers are encouraged to provide facilities to allow locks to be
   reclaimed on the new server after a file system transition.  Often,
   however, in cases in which the two file systems are not of the same
   _state_ class, such facilities may not be available, and the client
   should be prepared to re-obtain locks, even though it is possible
   that the client may have its LOCK or OPEN request denied due to a
   conflicting lock.  In some environments, such as the transition
   between read-only file systems, such denial of locks should not
   pose large difficulties in practice.  When an attempt to
   re-establish a lock on a new server is denied, the client should
   treat the situation as if its original lock had been revoked.  In
   all cases in which the lock is granted, the client cannot assume
   that no conflicting lock could have been granted in the interim.
   Where change attribute continuity is present, the client may
   examine the change attribute to check for unwanted file
   modifications.  Where even this is not available, and the file
   system is not read-only, a client may reasonably treat all pending
   locks as having been revoked.

3.6.6.1.  Leases and File System Transitions

   In the case of lease renewal, the client may not be submitting
   requests for a filesystem that has been transferred to another
   server.  This can occur because of the lease renewal mechanism.
   The client renews leases for all filesystems when submitting a
   request to any one filesystem at the server.

   In order for the client to schedule renewal of leases that may have
   been relocated to the new server, the client must find out about
   lease relocation before those leases expire.  To accomplish this,
   all operations which renew leases for a client (i.e. OPEN, CLOSE,
   READ, WRITE, RENEW, LOCK, LOCKT, LOCKU) will return the error
   NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be
   renewed has been transferred to a new server.  This condition will
   continue until the client receives an NFS4ERR_MOVED error and the
   server receives the subsequent GETATTR for the fs_locations or
   fs_locations_info attribute for an access to each filesystem for
   which a lease has been moved to a new server.

   [ISSUE: There is a conflict between this and the idea in the
   sessions text that we can have every op in the session implicitly
   renew the lease.  This needs to be dealt with.  D. Noveck will
   create an issue in the issue tracker.]

   When a client receives an NFS4ERR_LEASE_MOVED error, it should
   perform an operation on each filesystem associated with the server
   in question.  When the client receives an NFS4ERR_MOVED error, the
   client can follow the normal process to obtain the new server
   information (through the fs_locations and fs_locations_info
   attributes) and perform renewal of those leases on the new server,
   unless information in the fs_locations_info attribute shows that no
   state could have been transferred.  If the server has not had state
   transferred to it transparently, the client will receive either
   NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new
   server, as described above, and the client can then recover state
   information as it does in the event of server failure.
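   Put together, the client's response to NFS4ERR_LEASE_MOVED amounts
   to the loop sketched below; the helpers are hypothetical, and the
   error value is that of RFC 3530 [1].

       /* Sketch only: on NFS4ERR_LEASE_MOVED, touch every fs
        * associated with the server; the moved one(s) surface
        * NFS4ERR_MOVED, after which locations are fetched and
        * leases renewed at the new server. */

       enum { NFS4ERR_MOVED = 10019 };

       extern int nfs4_touch_fs(int fs);        /* any cheap per-fs
                                                   operation        */
       extern int nfs4_fetch_locations(int fs); /* GETATTR of
                                                   fs_locations     */
       extern int nfs4_renew_at_new_server(int fs);

       static void handle_lease_moved(int fs_count)
       {
           for (int fs = 0; fs < fs_count; fs++) {
               if (nfs4_touch_fs(fs) != NFS4ERR_MOVED)
                   continue;                 /* lease not moved     */
               if (nfs4_fetch_locations(fs) == 0)
                   nfs4_renew_at_new_server(fs); /* unless no state
                                                    was transferred */
           }
       }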
If the server has not had state transferred to it transparently, the client will receive either NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server, as described above, and the client can then recover state information as it does in the event of server failure.

3.6.6.2. Transitions and the Lease_time Attribute

In order that the client may appropriately manage its leases in the case of a file system transition, the destination server must establish proper values for the lease_time attribute.

When state is transferred transparently, that state should include the correct value of the lease_time attribute. The lease_time attribute on the destination server must never be less than that on the source, since this would result in premature expiration of leases granted by the source server. Upon transitions in which state is transferred transparently, the client is under no obligation to re-fetch the lease_time attribute and may continue to use the value previously fetched (on the source server).

If state has not been transferred transparently (either because the file systems are shown as being in different _state_ classes or because the client sees a real or simulated server reboot), the client should fetch the value of lease_time on the new (i.e. destination) server, and use it for subsequent locking requests. However, the server must respect a grace period at least as long as the lease_time on the source server, in order to ensure that clients have ample time to reclaim their locks before potentially conflicting non-reclaimed locks are granted.

3.6.7. Write Verifiers and File System Transitions

In a file system transition, the two file systems may be clustered in the handling of unstably written data. When this is the case, and the two file systems belong to the same _verifier_ class, valid verifiers from one system may be recognized by the other and superfluous writes avoided. There is no requirement that all valid verifiers be recognized, but it cannot be the case that a verifier is recognized as valid when it is not. [NOTE: We need to resolve the issue of proper verifier scope].

When two file systems belong to different _verifier_ classes, the client must assume that all unstable writes in existence at the time of the file system transition have been lost, since there is no way the old verifier can be recognized as valid (or not) on the target server.

3.7. Effecting File System Referrals

Referrals are effected when an absent file system is encountered, and one or more alternate locations are made available by the fs_locations or fs_locations_info attributes. The client will typically get an NFS4ERR_MOVED error, fetch the appropriate location information and proceed to access the file system on a different server, even though it retains its logical position within the original namespace.

The examples given in the sections below are somewhat artificial in that an actual client will not typically do a multi-component lookup, but will have cached information regarding the upper levels of the name hierarchy. However, these examples are chosen to make the required behavior clear and easy to put within the scope of a small number of requests, without getting unduly into details of how specific clients might choose to cache things. Before turning to the examples, the fragment below outlines the overall client flow.
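This c++ outline is purely illustrative; the Client and Locations types and their methods are hypothetical stand-ins for the client internals that issue the COMPOUNDs shown in the examples that follow.

   #include <string>

   // Minimal hypothetical stand-ins for client internals.
   enum Status { NFS_OK, NFS4ERR_MOVED /* ... */ };
   struct Locations { /* parsed fs_locations or fs_locations_info */ };
   struct Client {
       Status    Access(const std::string& path);       // ordinary COMPOUND
       Locations GetLocations(const std::string& path); // GETATTR of
                                                        // location attrs
       Status    AccessAt(const Locations& loc,         // re-drive at the
                          const std::string& path);     // chosen replica
   };

   // On NFS4ERR_MOVED, fetch the location attribute (permitted even
   // though the file system is absent), pick a replica, and re-drive
   // the access at the new server under the translated path.
   Status AccessWithReferral(Client& clnt, const std::string& path)
   {
       Status st = clnt.Access(path);
       if (st != NFS4ERR_MOVED)
           return st;

       Locations loc = clnt.GetLocations(path);
       return clnt.AccessAt(loc, path);
   }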
3.7.1. Referral Example (LOOKUP)

Let us suppose that the following COMPOUND is issued in an environment in which /src/linux/2.7/latest is absent from the target server. This may be for a number of reasons: the file system may have moved, or the target server may be functioning mainly, or solely, to refer clients to the servers on which various file systems are located.

o  PUTROOTFH

o  LOOKUP "src"

o  LOOKUP "linux"

o  LOOKUP "2.7"

o  LOOKUP "latest"

o  GETFH

o  GETATTR fsid,fileid,size,ctime

Under the given circumstances, the following will be the result.

o  PUTROOTFH --> NFS_OK. The current fh is now the root of the pseudo-fs.

o  LOOKUP "src" --> NFS_OK. The current fh is for /src and is within the pseudo-fs.

o  LOOKUP "linux" --> NFS_OK. The current fh is for /src/linux and is within the pseudo-fs.

o  LOOKUP "2.7" --> NFS_OK. The current fh is for /src/linux/2.7 and is within the pseudo-fs.

o  LOOKUP "latest" --> NFS_OK. The current fh is for /src/linux/2.7/latest and is within a new, absent fs, but ... the client will never see the value of that fh.

o  GETFH --> NFS4ERR_MOVED. Fails because the current fh is in an absent fs at the start of the operation and the spec makes no exception for GETFH.

o  GETATTR fsid,fileid,size,ctime. Not executed because the failure of the GETFH stops processing of the COMPOUND.

Given the failure of the GETFH, the client has the job of determining the root of the absent file system and where to find that file system, i.e. the server and path relative to that server's root fh. Note that in this example, the client did not obtain filehandles and attribute information (e.g. fsid) for the intermediate directories, so it cannot be sure where the absent file system starts. It could be the case, for example, that /src/linux/2.7 is the root of the moved filesystem and that the reason the lookup of "latest" succeeded is that the filesystem was not absent at the time of that operation but was moved between the last LOOKUP and the GETFH (since COMPOUND is not atomic). Even if we had the fsids for all of the intermediate directories, we would have no way of knowing that /src/linux/2.7/latest was the root of a new fs, since we don't yet have its fsid.

In order to get the necessary information, let us re-issue the chain of LOOKUPs with GETFHs and GETATTRs to at least get the fsids so we can be sure where the appropriate fs boundaries are. The client could choose to get fs_locations_info at the same time, but in most cases the client will have a good guess as to where the fs boundaries are (based on where NFS4ERR_MOVED was and was not received), making fetching of fs_locations_info unnecessary.

OP01: PUTROOTFH --> NFS_OK

- Current fh is root of pseudo-fs.

OP02: GETATTR(fsid) --> NFS_OK

- Just for completeness. Normally, clients will know the fsid of the pseudo-fs as soon as they establish communication with a server.

OP03: LOOKUP "src" --> NFS_OK

OP04: GETATTR(fsid) --> NFS_OK

- Get current fsid to see where fs boundaries are. The fsid will be that for the pseudo-fs in this example, so no boundary.

OP05: GETFH --> NFS_OK

- Current fh is for /src and is within pseudo-fs.
OP06: LOOKUP "linux" --> NFS_OK

- Current fh is for /src/linux and is within pseudo-fs.

OP07: GETATTR(fsid) --> NFS_OK

- Get current fsid to see where fs boundaries are. The fsid will be that for the pseudo-fs in this example, so no boundary.

OP08: GETFH --> NFS_OK

- Current fh is for /src/linux and is within pseudo-fs.

OP09: LOOKUP "2.7" --> NFS_OK

- Current fh is for /src/linux/2.7 and is within pseudo-fs.

OP10: GETATTR(fsid) --> NFS_OK

- Get current fsid to see where fs boundaries are. The fsid will be that for the pseudo-fs in this example, so no boundary.

OP11: GETFH --> NFS_OK

- Current fh is for /src/linux/2.7 and is within pseudo-fs.

OP12: LOOKUP "latest" --> NFS_OK

- Current fh is for /src/linux/2.7/latest and is within a new, absent fs, but ...

- The client will never see the value of that fh.

OP13: GETATTR(fsid, fs_locations_info) --> NFS_OK

- We are getting the fsid to know where the fs boundaries are. Note that the fsid we are given will not necessarily be preserved at the new location. That fsid might be different, and in fact the fsid we have for this fs might be a valid fsid of a different fs on that new server.

- In this particular case, we are pretty sure anyway that what has moved is /src/linux/2.7/latest rather than /src/linux/2.7, since we have the fsid of the latter and it is that of the pseudo-fs, which presumably cannot move. However, in other examples, we might not have this kind of information to rely on (e.g. /src/linux/2.7 might be a non-pseudo filesystem separate from /src/linux/2.7/latest), so we need another reliable source of information on the boundary of the fs which is moved. If, for example, the filesystem "/src/linux" had moved, we would have a case of migration rather than referral, and once the boundaries of the migrated filesystem were clear we could fetch fs_locations_info.

- We are fetching fs_locations_info because the fact that we got an NFS4ERR_MOVED at this point means that it is most likely that this is a referral and we need the destination. Even if it is the case that "/src/linux/2.7" is a filesystem which has migrated, we will still need the location information for that file system.

OP14: GETFH --> NFS4ERR_MOVED

- Fails because the current fh is in an absent fs at the start of the operation and the spec makes no exception for GETFH. Note that this has the happy consequence that we don't have to worry about the volatility or lack thereof of the fh. If the root of the fs on the new location is a persistent fh, then we can assume that this fh, which we never saw, is a persistent fh which, if we could see it, would exactly match the new fh. At least, there is no evidence to disprove that. On the other hand, if we find a volatile root at the new location, then the filehandle which we never saw must have been volatile, or at least nobody can prove otherwise.

Given the above, the client knows where the root of the absent file system is, by noting where the change of fsid occurred. The fs_locations_info attribute also gives the client the actual location of the absent file system, so that the referral can proceed. A sketch of this boundary-detection logic follows.
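The following fragment of c++ code sketches the boundary detection described above. The PathStep type and the idea of recording one entry per LOOKUP/GETATTR(fsid) pair are hypothetical client internals used only for illustration.

   #include <cstdint>
   #include <string>
   #include <vector>

   // One record per component in the re-issued LOOKUP chain.
   struct PathStep {
       std::string name;        // component looked up ("src", ...)
       uint64_t    fsidMajor;   // from GETATTR(fsid)
       uint64_t    fsidMinor;
       bool        moved;       // GETFH returned NFS4ERR_MOVED here
   };

   // Return the index of the first component whose fsid differs from
   // its parent's, or which reported NFS4ERR_MOVED: the root of the
   // absent file system, as well as the client can determine it.
   int FindAbsentFsRoot(const std::vector<PathStep>& steps)
   {
       for (size_t i = 1; i < steps.size(); i++) {
           bool fsidChanged =
                  steps[i].fsidMajor != steps[i - 1].fsidMajor
               || steps[i].fsidMinor != steps[i - 1].fsidMinor;
           if (fsidChanged || steps[i].moved)
               return static_cast<int>(i);
       }
       return -1;   // no boundary seen; whole path in one fs
   }

In the example above, the entries for the root, "src", "linux", and "2.7" all carry the fsid of the pseudo-fs, while the entry for "latest" carries a new fsid and its GETFH fails with NFS4ERR_MOVED, so FindAbsentFsRoot identifies "latest" as the root of the absent file system.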
The server gives the client the bare minimum of information about the absent file system so that there will be very little scope for problems of conflict between information sent by the referring server and the information available at the file system's home. No filehandles and very few attributes are present on the referring server, and the client can treat those it receives as basically transient information with the function of enabling the referral.

3.7.2. Referral Example (READDIR)

Another context in which a client may encounter referrals is when it does a READDIR on a directory in which some of the sub-directories are the roots of absent file systems.

Suppose such a directory is read as follows:

o  PUTROOTFH

o  LOOKUP "src"

o  LOOKUP "linux"

o  LOOKUP "2.7"

o  READDIR (fsid, size, ctime, mounted_on_fileid)

In this case, because rdattr_error is not requested, fs_locations_info is not requested, and some of the attributes cannot be provided, the result will be an NFS4ERR_MOVED error on the READDIR, with the detailed results as follows:

o  PUTROOTFH --> NFS_OK. The current fh is at the root of the pseudo-fs.

o  LOOKUP "src" --> NFS_OK. The current fh is for /src and is within the pseudo-fs.

o  LOOKUP "linux" --> NFS_OK. The current fh is for /src/linux and is within the pseudo-fs.

o  LOOKUP "2.7" --> NFS_OK. The current fh is for /src/linux/2.7 and is within the pseudo-fs.

o  READDIR (fsid, size, ctime, mounted_on_fileid) --> NFS4ERR_MOVED. Note that the same error would have been returned if /src/linux/2.7 had migrated, when in fact it is returned because the directory contains the root of an absent fs.

So now suppose that we reissue with rdattr_error:

o  PUTROOTFH

o  LOOKUP "src"

o  LOOKUP "linux"

o  LOOKUP "2.7"

o  READDIR (rdattr_error, fsid, size, ctime, mounted_on_fileid)

The results will be:

o  PUTROOTFH --> NFS_OK. The current fh is at the root of the pseudo-fs.

o  LOOKUP "src" --> NFS_OK. The current fh is for /src and is within the pseudo-fs.

o  LOOKUP "linux" --> NFS_OK. The current fh is for /src/linux and is within the pseudo-fs.

o  LOOKUP "2.7" --> NFS_OK. The current fh is for /src/linux/2.7 and is within the pseudo-fs.

o  READDIR (rdattr_error, fsid, size, ctime, mounted_on_fileid) --> NFS_OK. The attributes for "latest" will contain only rdattr_error, with the value NFS4ERR_MOVED, together with an fsid value and a value for mounted_on_fileid.

So suppose we do another READDIR to get fs_locations_info, although we could have used a GETATTR directly, as in the previous section.

o  PUTROOTFH

o  LOOKUP "src"

o  LOOKUP "linux"

o  LOOKUP "2.7"

o  READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, size, ctime)

The results would be:

o  PUTROOTFH --> NFS_OK. The current fh is at the root of the pseudo-fs.

o  LOOKUP "src" --> NFS_OK. The current fh is for /src and is within the pseudo-fs.

o  LOOKUP "linux" --> NFS_OK. The current fh is for /src/linux and is within the pseudo-fs.

o  LOOKUP "2.7" --> NFS_OK. The current fh is for /src/linux/2.7 and is within the pseudo-fs.

o  READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, size, ctime) --> NFS_OK. The attributes will be as shown below.
The attributes for "latest" will contain only:

o  rdattr_error (value: NFS4ERR_MOVED)

o  fs_locations_info (value: the location information for the absent file system)

o  mounted_on_fileid (value: unique fileid within referring fs)

o  fsid (value: unique value within referring server)

The attribute entry for "latest" will not contain size or ctime.

3.8. The Attribute fs_absent

In order to provide the client with information about whether the current file system is present or absent, the fs_absent attribute may be interrogated.

As noted above, this attribute, when supported, may be requested of absent filesystems without causing NFS4ERR_MOVED to be returned, and it should always be available. Servers are strongly urged to support this attribute on all filesystems if they support it on any filesystem.

3.9. The Attribute fs_locations

The fs_locations attribute is structured in the following way:

   struct fs_location {
           utf8str_cis     server<>;
           pathname4       rootpath;
   };

   struct fs_locations {
           pathname4       fs_root;
           fs_location     locations<>;
   };

The fs_location struct is used to represent the location of a filesystem by providing a server name and the path to the root of the file system within that server's namespace. When a set of servers have corresponding file systems at the same path within their namespaces, an array of server names may be provided. An entry in the server array is a UTF8 string and represents one of: a traditional DNS host name, an IPv4 address, or an IPv6 address. It is not a requirement that all servers that share the same rootpath be listed in one fs_location struct. The array of server names is provided for convenience. Servers that share the same rootpath may also be listed in separate fs_location entries in the fs_locations attribute.

The fs_locations struct, and thus the attribute, contains an array of such locations. Since the name space of each server may be constructed differently, the "fs_root" field is provided. The path represented by fs_root represents the location of the filesystem in the current server's name space, i.e. that of the server from which the fs_locations attribute was obtained. The fs_root path is meant to aid the client by clearly referencing the root of the file system whose locations are being reported, no matter what object within the current file system the current filehandle designates.

As an example, suppose there is a replicated filesystem located at two servers (servA and servB). At servA, the filesystem is located at path "/a/b/c". At servB, the filesystem is located at path "/x/y/z". If the client were to obtain the fs_locations value for the directory at "/a/b/c/d", it might not necessarily know that the filesystem's root is located in servA's name space at "/a/b/c". When the client switches to servB, it will need to determine that the directory it first referenced at servA is now represented by the path "/x/y/z/d" on servB. To facilitate this, the fs_locations attribute provided by servA would have an fs_root value of "/a/b/c" and two entries in fs_locations. One entry in fs_locations will be for itself (servA) and the other will be for servB with a path of "/x/y/z". With this information, the client is able to substitute "/x/y/z" for the "/a/b/c" at the beginning of its access path and construct "/x/y/z/d" to use for the new server. A sketch of this substitution follows.
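The following fragment of c++ code illustrates the path substitution just described. The Pathname representation and the assumption that the client has already verified that fs_root is a prefix of the path being translated are illustrative simplifications.

   #include <string>
   #include <vector>

   // Pathname components, as in pathname4.
   typedef std::vector<std::string> Pathname;

   // Replace the fs_root prefix of a path on the current server with
   // the rootpath of the chosen fs_location to obtain the path on the
   // new server (e.g. fsRoot=/a/b/c, rootpath=/x/y/z, path=/a/b/c/d
   // yields /x/y/z/d).  Assumes fsRoot is a prefix of path.
   Pathname TranslatePath(const Pathname& fsRoot,
                          const Pathname& rootpath,
                          const Pathname& path)
   {
       Pathname result = rootpath;
       result.insert(result.end(),
                     path.begin() + fsRoot.size(), path.end());
       return result;
   }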
Since the fs_locations attribute lacks information defining various characteristics of the various file system choices presented, it should only be interrogated and used when fs_locations_info is not available. When fs_locations is used, information about the specific locations should be assumed based on the following rules.

The following rules are general and apply irrespective of the context.

o  When a DNS server name maps to multiple IP addresses, they should be considered identical, i.e. of the same _endpoint_ class.

o  Except in the case of servers sharing an _endpoint_ class, all listed servers should be considered as of the same _handle_ class if and only if the current fh_expire_type attribute does not include the FH4_VOL_MIGRATION bit. Note that in the case of referral, filehandle issues do not apply since there can be no filehandles known within the current file system, nor is there any access to the fh_expire_type attribute on the referring (absent) file system.

o  Except in the case of servers sharing an _endpoint_ class, all listed servers should be considered as of the same _fileid_ class if and only if the fh_expire_type attribute indicates persistent filehandles and does not include the FH4_VOL_MIGRATION bit. Note that in the case of referral, fileid issues do not apply since there can be no fileids known within the referring (absent) file system, nor is there any access to the fh_expire_type attribute.

o  Except in the case of servers sharing an _endpoint_ class, all listed servers should be considered as of different _change_ classes.

For other class assignments, handling of file system transitions depends on the reason for the transition:

o  When the transition is due to migration, the target should be treated as being of the same _state_ and _verifier_ class as the source.

o  When the transition is due to failover to another replica, the target should be treated as being of a different _state_ and _verifier_ class from the source.

These specific choices reflect typical implementation patterns for controlled migration and failover, respectively. Since other choices are possible and useful, this information is better obtained by using fs_locations_info.

See the section "Security Considerations" for a discussion of the recommendations for the security flavor to be used by any GETATTR operation that requests the "fs_locations" attribute.

3.10. The Attribute fs_locations_info

The fs_locations_info attribute is intended as a more functional replacement for fs_locations, which will continue to exist and be supported. Clients can use it to get a more complete set of information about alternative file system locations. When the server does not support fs_locations_info, fs_locations can be used to get a subset of the information. A server which supports fs_locations_info MUST support fs_locations as well.

There are several sorts of additional information present in fs_locations_info that are not available in fs_locations:

o  Attribute continuity information to allow a client to select a location which meets the transparency requirements of the applications accessing the data and to take advantage of optimizations that server guarantees as to attribute continuity may provide (e.g. for the change attribute).
o  Filesystem identity information which indicates when multiple replicas, from the client's point of view, correspond to the same target filesystem, allowing them to be used interchangeably, without disruption, as multiple paths to the same thing.

o  Information which will bear on the suitability of various replicas, depending on the use that the client intends. For example, many applications need an absolutely up-to-date copy (e.g. those that write), while others may only need access to the most up-to-date copy reasonably available.

o  Server-derived preference information for replicas, which can be used to implement load-balancing while giving the client the entire fs list to be used in case the primary fails.

The fs_locations_info attribute consists of a root pathname (just like fs_locations), together with an array of locations4_item structures and a validity period for the information.

   struct locations4_server {
           int32_t         currency;
           uint32_t        info<>;
           utf8str_cis     server;
   };

   const LIBX_GFLAGS       = 0;
   const LIBX_TFLAGS       = 1;

   const LIBX_CLSHARE      = 2;
   const LIBX_CLSERVER     = 3;
   const LIBX_CLENDPOINT   = 4;
   const LIBX_CLHANDLE     = 5;
   const LIBX_CLFILEID     = 6;
   const LIBX_CLVERIFIER   = 7;
   const LIBX_CLSTATE      = 8;

   const LIBX_READRANK     = 9;
   const LIBX_WRITERANK    = 10;
   const LIBX_READORDER    = 11;
   const LIBX_WRITEORDER   = 12;

   const LIGF_WRITABLE     = 0x01;
   const LIGF_CUR_REQ      = 0x02;
   const LIGF_ABSENT       = 0x04;
   const LIGF_GOING        = 0x08;

   const LITF_RDMA         = 0x01;

   struct locations4_item {
           locations4_server       entries<>;
           pathname4               rootpath;
   };

   struct locations4_info {
           pathname4               fs_root;
           locations4_item         items<>;
           uint32_t                valid_for;
   };

The fs_locations_info attribute is structured similarly to the fs_locations attribute. A top-level structure (fs_locations or locations4_info) contains the entire attribute, including the root pathname of the fs and an array of lower-level structures that define replicas that share a common root path on their respective servers. The lower-level structures in turn (fs_location or locations4_item) contain a specific pathname and information on one or more individual server replicas. For that lowest-level information, fs_locations has a server name in the form of utf8str_cis, while fs_locations_info has a locations4_server structure that contains per-server-replica information in addition to the server name.

The locations4_server structure consists of the following items:

o  An indication of file system up-to-date-ness (currency) in terms of approximate seconds before the present. A negative value indicates that the server is unable to give any reasonably useful value here. A zero indicates that the filesystem is the actual writable data or a reliably coherent and fully up-to-date copy. Positive values indicate how out-of-date this copy can normally be before it is considered for update. Such a value is not a guarantee that such updates will always be performed on the required schedule, but instead serves as a hint about how far behind the most up-to-date copy of the data this copy would normally be expected to be.

o  A counted array of 32-bit words containing various sorts of data about the particular file system instance.
This data includes general flags, transport capability flags, file system equivalence class information, and selection priority information. The encoding will be discussed below.

o  The server string. For the case of the replica currently being accessed (via GETATTR), a null string may be used to indicate the current address being used for the RPC call.

Data within the info array is in the form of 8-bit data items, even though that array is, from XDR's point of view, an array of 32-bit integers. This definition was chosen because:

o  The kinds of data in the info array (flags, file system classes, and priorities among sets of file systems representing the same data) are such that eight bits provide a quite acceptable range of values. Even where there might be more than 256 such file system instances, having more than 256 distinct classes or priorities is unlikely.

o  XDR does not have any means to declare an 8-bit data type, other than an ASCII string, and using 32-bit data types would lead to significant space inefficiency.

o  Explicit definition of the various specific data items within XDR would limit expandability, in that any extension within a subsequent minor version would require yet another attribute, leading to specification and implementation clumsiness.

o  Such explicit definitions would also make it impossible to propose standards-track extensions apart from a full minor version.

Each successive 8-bit field within this array is designated by a constant byte-index, as defined above. More significant 8-bit fields within a single word have successive indices, with a transition to the next word following the most significant 8-bit field in each word.

The set of info data is subject to expansion in a future minor version, or in a standards-track RFC, within the context of a single minor version. The server SHOULD NOT send, and the client MUST NOT use, indices within the info array that are not defined in standards-track RFCs.

The following fragment of c++ code (with Doxygen-style comments) illustrates how data items within the info array can be found using a byte-index such as specified by the constants beginning with "LIBX_". The associated InfoArray object is assumed to be initialized with "Length" containing the XDR-specified length in terms of 32-bit words and "Data" referring to the array of words encoded by the "info<>" specification.

   class InfoArray {
   private:
         uint32_t   Length;
         uint32_t  *Data;

   public:
         uint8_t GetValue(int byteIndex);
   };

   /// @brief Get the value of a locations4_server info item
   ///
   /// This method obtains the specific info value given a
   /// byte index defined in the NFSv4.1 spec or another
   /// later standards-track document.
   ///
   /// @param[in] byteIndex The byte index identifying the
   ///            item requested.
   /// @returns The value of the requested item.
   uint8_t InfoArray::GetValue(int byteIndex)
   {
         int wordIndex = byteIndex / 4;
         int byteWithinWord = byteIndex % 4;

         // Indices beyond the transmitted data are read as zero.
         if (wordIndex >= static_cast<int>(Length)) {
               return 0;
         }

         uint32_t ourWord = Data[wordIndex];
         return (ourWord >> (byteWithinWord * 8)) & 0xff;
   }

The info array contains within it:

o  Two 8-bit flag fields, one devoted to general file-system characteristics and a second reserved for transport-related capabilities.

o  Seven 8-bit class values which define various file system equivalence classes as explained below.

o  Four 8-bit priority values which govern file system selection as explained below.

The general file system characteristics flag (at byte index LIBX_GFLAGS) has the following bits defined within it:

o  LIGF_WRITABLE indicates that this fs target is writable, allowing it to be selected by clients which may need to write on this filesystem. When the current filesystem instance is writable, then any other filesystem to which the client might switch must incorporate within its data any committed write made on the current filesystem instance. See the section on the verifier class for issues related to uncommitted writes. While there is no harm in not setting this flag for a filesystem that turns out to be writable, turning the flag on for a read-only filesystem can cause problems for clients which select a migration or replication target based on it and then find themselves unable to write.

o  LIGF_CUR_REQ indicates that this replica is the one on which the request is being made. Only a single server entry may have this flag set and, in the case of a referral, no entry will have it.

o  LIGF_ABSENT indicates that this entry corresponds to an absent filesystem replica. It can only be set if LIGF_CUR_REQ is set. When both such bits are set, it indicates that a filesystem instance is not usable but that the information in the entry can be used to determine the sorts of continuity available when switching from this replica to other possible replicas. Since this bit can only be true if LIGF_CUR_REQ is true, the value could be determined using the fs_absent attribute, but the information is also made available here for the convenience of the client. An entry with this bit, since it represents a true filesystem (albeit absent), does not appear in the event of a referral, but only where a filesystem has been accessed at this location and has subsequently been migrated.

o  LIGF_GOING indicates that a replica, while still available, should not be used further. The client, if using it, should make an orderly transfer to another filesystem instance as expeditiously as possible. It is expected that file systems going out of service will be announced as LIGF_GOING some time before the actual loss of service, and that the valid_for value will be sufficiently small to allow clients to detect and act on scheduled events, while large enough that the cost of the requests to fetch the fs_locations_info values will not be excessive. Values on the order of ten minutes seem reasonable.

The transport-flag field (at byte index LIBX_TFLAGS) contains the following bit related to the transport capabilities of the specific file system.

o  LITF_RDMA indicates that this file system provides NFSv4.1 file system access using an RDMA-capable transport.
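As a usage illustration, the fragment of c++ code below extracts the two flag fields via the GetValue method defined above and tests individual bits. It assumes, purely for illustration, that constants equivalent to the XDR definitions of LIBX_GFLAGS, LIBX_TFLAGS, LIGF_WRITABLE, and LITF_RDMA are available to the client code.

   // Hypothetical client helpers built on the InfoArray accessor.
   bool IsWritableReplica(InfoArray& info)
   {
         uint8_t gflags = info.GetValue(LIBX_GFLAGS);
         return (gflags & LIGF_WRITABLE) != 0;
   }

   bool IsRdmaCapable(InfoArray& info)
   {
         uint8_t tflags = info.GetValue(LIBX_TFLAGS);
         return (tflags & LITF_RDMA) != 0;
   }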
Attribute continuity and filesystem identity information are expressed by defining equivalence relations on the sets of file systems presented to the client. Each such relation is expressed as a set of file system equivalence classes. For each relation, a file system has an 8-bit class number. Two file systems belong to the same class if both have identical non-zero class numbers. Zero is treated as non-matching. Most often, the relevant question for the client will be whether a given replica is identical-with/continuous-to the current one in a given respect, but the information should also be available as to whether two other replicas match in that respect.

The following fields specify the file system's class numbers for the equivalence relations used in determining the nature of file system transitions. See Section 3.6 for details about how this information is to be used.

o  The field with byte-index LIBX_CLSHARE defines the sharing class for the file system.

o  The field with byte-index LIBX_CLSERVER defines the server class for the file system.

o  The field with byte-index LIBX_CLENDPOINT defines the endpoint class for the file system.

o  The field with byte-index LIBX_CLHANDLE defines the handle class for the file system.

o  The field with byte-index LIBX_CLFILEID defines the fileid class for the file system.

o  The field with byte-index LIBX_CLVERIFIER defines the verifier class for the file system.

o  The field with byte-index LIBX_CLSTATE defines the state class for the file system.

Server-specified preference information is also provided via 8-bit values within the info array. The values provide a rank and an order (see below), with separate values specifiable for the cases of read-only and writable file systems. These values are compared for different file systems to establish the server-specified preference, with lower values indicating "more preferred".

Rank is used to express a strict server-imposed ordering on clients, with lower values indicating "more preferred." Clients should attempt to use all replicas with a given rank before they use one with a higher rank. Only if all of those file systems are unavailable should the client proceed to those of a higher rank.

Within a rank, the order value is used to specify the server's preference to guide the client's selection when the client's own preferences are not controlling, with lower values of order indicating "more preferred." If replicas are approximately equal in all respects, clients should defer to the order specified by the server. When clients look at server latency as part of their selection, they are free to use this criterion, but it is suggested that when latency differences are not significant, the server-specified order should guide selection.

o  The field at byte index LIBX_READRANK gives the rank value to be used for read-only access.

o  The field at byte index LIBX_READORDER gives the order value to be used for read-only access.

o  The field at byte index LIBX_WRITERANK gives the rank value to be used for writable access.

o  The field at byte index LIBX_WRITEORDER gives the order value to be used for writable access.

A sketch of replica selection using these values follows.
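The following fragment of c++ code sketches how a client might order candidate replicas using the rank and order values; the Replica type and the use of a simple sort are illustrative choices, not requirements.

   #include <algorithm>
   #include <cstdint>
   #include <vector>

   // Hypothetical per-replica record with the preference values
   // already extracted from the info array (read or write pair,
   // as appropriate for the intended access).
   struct Replica {
       uint8_t rank;    // LIBX_READRANK or LIBX_WRITERANK
       uint8_t order;   // LIBX_READORDER or LIBX_WRITEORDER
       // ... server name, rootpath, etc.
   };

   // Sort so that lower rank comes first and, within a rank, lower
   // order comes first.  A client would try replicas in this
   // sequence, exhausting one rank before moving to the next.
   void SortByPreference(std::vector<Replica>& candidates)
   {
       std::stable_sort(candidates.begin(), candidates.end(),
                        [](const Replica& a, const Replica& b) {
                            if (a.rank != b.rank)
                                return a.rank < b.rank;
                            return a.order < b.order;
                        });
   }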
Depending on the potential need for write access by a given client, one of the pairs of rank and order values is used. The read rank and order should only be used if the client knows that only reading will ever be done or if it is prepared to switch to a different replica in the event that any write access capability is required in the future.

The locations4_info structure, encoding the fs_locations_info attribute, contains the following:

o  The fs_root field, which contains the pathname of the root of the current filesystem on the current server, just as it does in the fs_locations structure.

o  An array of locations4_item structures, which contain information about replicas of the current filesystem. Where the current filesystem is actually present, or has been present, i.e. this is not a referral situation, one of the locations4_item structures will contain a locations4_server entry for the current server. This entry will have LIGF_ABSENT set if the current filesystem is absent, i.e. normal access to it will return NFS4ERR_MOVED.

o  The valid_for field, which specifies a time for which it is reasonable for a client to use the fs_locations_info attribute without refetch. The valid_for value does not provide a guarantee of validity, since servers can unexpectedly go out of service or become inaccessible for any number of reasons. Clients are well-advised to refetch this information for an actively accessed filesystem every valid_for seconds. This is particularly important when filesystem replicas may go out of service in a controlled way using the LIGF_GOING flag to communicate an ongoing change. The server should set valid_for to a value which allows well-behaved clients to notice the LIGF_GOING flag and make an orderly switch before the loss of service becomes effective. If this value is zero, then no refetch interval is appropriate and the client need not refetch this data on any particular schedule. In the event of a transition to a new filesystem instance, a new value of the fs_locations_info attribute will be fetched at the destination, and it is to be expected that this may have a different valid_for value, which the client should then use in the same fashion as the previous value.

As noted above, the fs_locations_info attribute, when supported, may be requested of absent filesystems without causing NFS4ERR_MOVED to be returned, and it is generally expected that it will be available for both present and absent filesystems, even if only a single locations4_server entry is present, designating the current (present) filesystem, or two locations4_server entries designating the current (and now previous) location of an absent filesystem and its successor location. Servers are strongly urged to support this attribute on all filesystems if they support it on any filesystem.

3.11. The Attribute fs_status

In an environment in which multiple copies of the same basic set of data are available, information regarding the particular source of such data and the relationships among different copies can be very helpful in providing consistent data to applications.
   enum status4_type {
           STATUS4_FIXED = 1,
           STATUS4_UPDATED = 2,
           STATUS4_VERSIONED = 3,
           STATUS4_WRITABLE = 4,
           STATUS4_ABSENT = 5
   };

   struct fs4_status {
           status4_type    type;
           utf8str_cs      source;
           utf8str_cs      current;
           int32_t         age;
           nfstime4        version;
   };

The type value indicates the kind of filesystem image represented. This is of particular importance when using the version values to determine appropriate succession of filesystem images. Five types are distinguished:

o  STATUS4_FIXED, which indicates a read-only image in the sense that it will never change. The possibility is allowed that, as a result of migration or switch to a different image, changed data can be accessed, but within the confines of this instance no change is allowed. The client can use this fact to cache aggressively.

o  STATUS4_UPDATED, which indicates an image that cannot be updated by the user writing to it but may be changed exogenously, typically because it is a periodically updated copy of another writable filesystem somewhere else.

o  STATUS4_VERSIONED, which indicates that the image, like the STATUS4_UPDATED case, is updated exogenously, but it provides a guarantee that the server will carefully update the associated version value so that the client may, if it chooses, protect itself from a situation in which it reads data from one version of the filesystem and then later reads data from an earlier version of the same filesystem. See below for a discussion of how this can be done.

o  STATUS4_WRITABLE, which indicates that the filesystem is an actual writable one. The client need not, of course, actually write to the filesystem, but once it does, it should not accept a transition to anything other than a writable instance of that same filesystem.

o  STATUS4_ABSENT, which indicates that the information is the last valid information for a filesystem which is no longer present.

The opaque strings source and current provide a way of presenting information about the source of the filesystem image being presented. It is not intended that the client do anything with this information other than make it available to administrative tools. It is intended that this information be helpful when researching possible problems with a filesystem image that might arise when it is unclear if the correct image is being accessed and, if not, how that image came to be made. This kind of debugging information will be helpful, if, as seems likely, copies of filesystems are made in many different ways (e.g. simple user-level copies, filesystem-level point-in-time copies, cloning of the underlying storage), under a variety of administrative arrangements. In such environments, determining how a given set of data was constructed can be very helpful in resolving problems.

The opaque string 'source' is used to indicate the source of a given filesystem, with the expectation that tools capable of creating a filesystem image propagate this information, when that is possible. It is understood that this may not always be possible, since a user-level copy may be thought of as creating a new data set and the tools used may have no mechanism to propagate this data.
When a filesystem is initially created, data regarding how, where, and by whom it was created can be put in this attribute in a human-readable string form, so that it will be available when propagated to subsequent copies of this data.

The opaque string 'current' should provide whatever information is available about the source of the current copy: the tool creating it, any relevant parameters to that tool, the time at which the copy was done, the user making the change, the server on which the change was made, etc. All information should be in a human-readable string form.

The age provides an indication of how out-of-date the file system currently is with respect to its ultimate data source (in the case of cascading data updates). This complements the currency field of locations4_server (see Section 3.10) in the following way: the information in locations4_server.currency gives a bound for how out of date the data in a file system might typically get, while the age gives a bound on how out of date that data actually is. Negative values imply that no information is available. A zero means that this data is known to be current. A positive value means that this data is known to be no older than that number of seconds with respect to the ultimate data source.

The version field provides a version identification, in the form of a time value, such that successive versions always have later time values. When the filesystem type is anything other than STATUS4_VERSIONED, the server may provide such a value, but there is no guarantee as to its validity and clients will not use it except to provide additional information to add to 'source' and 'current'.

When the type is STATUS4_VERSIONED, servers should provide a value of version which progresses monotonically whenever any new version of the data is established. This allows the client, if reliable image progression is important to it, to fetch this attribute as part of each COMPOUND where data or metadata from the filesystem is used.

When it is important to the client to make sure that only valid successor images are accepted, it must make sure that it does not read data or metadata from the filesystem without updating its sense of the current state of the image. This is to avoid the possibility that the fs_status which the client holds will be one for an earlier image, leading it to accept a new filesystem instance which is later than that one but still earlier than the updated data already read by the client.

In order to do this reliably, the client must do a GETATTR of fs_status that follows any interrogation of data or metadata within the filesystem in question. Often this is most conveniently done by appending such a GETATTR after all other operations that reference a given filesystem. When errors occur between reading filesystem data and performing such a GETATTR, care must be exercised to make sure that the data in question is not used before obtaining the proper fs_status value. In this connection, when an OPEN is done within such a versioned filesystem and the associated GETATTR of fs_status is not successfully completed, the open file in question must not be accessed until that fs_status is fetched. A sketch of this version tracking appears below.
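The following fragment of c++ code sketches the client-side version tracking just described; the FsVersion and VersionTracker types are hypothetical client internals, not protocol elements.

   #include <cstdint>

   // Hypothetical client-side tracking of fs4_status.version for a
   // STATUS4_VERSIONED filesystem.
   struct FsVersion {
       int64_t  seconds;    // nfstime4-style time value
       uint32_t nseconds;
   };

   bool Later(const FsVersion& a, const FsVersion& b)
   {
       return a.seconds > b.seconds
           || (a.seconds == b.seconds && a.nseconds > b.nseconds);
   }

   struct VersionTracker {
       FsVersion current {INT64_MIN, 0};

       // Called with the version from the GETATTR(fs_status) appended
       // to each COMPOUND that read data or metadata; keeps the
       // latest version seen.
       void Observe(const FsVersion& v)
       {
           if (Later(v, current))
               current = v;
       }

       // When switching filesystem instances, decline any candidate
       // whose version is earlier than the last one observed.
       bool AcceptableSuccessor(const FsVersion& candidate) const
       {
           return !Later(current, candidate);
       }
   };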
The procedure above ensures that, before using any data from the filesystem, the client has in hand a newly fetched, current version of the filesystem image. Multiple values for multiple requests in flight can be resolved by assembling them into the required partial order (the elements should form a total order within it) and using the last. The client may then, when switching among filesystem instances, decline to use an instance which is not of type STATUS4_VERSIONED or whose version field is earlier than the last one obtained from the predecessor filesystem instance.

4. Other Changes

This is a list of changes in other areas of the spec that need to be made to conform with what is written here.

o  Need to add fs_absent, fs_locations_info, and fs_status to the list of recommended attributes.

o  Need to add NFS4ERR_MOVED to all the ops that don't currently include it, to match what the spec says. Alternatively, we may want to factor this out and create a list of errors that any op can receive.

o  Change the definition of NFS4ERR_MOVED in section 20 to indicate that it just means that the fs is not there and may never have really "moved".

o  Delete sections 6.14 and 6.14.*, which have been incorporated in the new chapter.

o  In the spirit of the "minior" issue, fix instances of ampersand-lt which need to come out as less-than. Also in that spirit, fix errors marked "TDB" in the error list.

o  Add locations4_server, locations4_item, and locations4_info as the appropriate sections 1.2.*.

o  Add status4_type and fs4_status as the appropriate sections 1.2.*.

o  Delete the errors NFS4ERR_MOVED_DATA_AND_STATE and NFS4ERR_MOVED_DATA from section 20.

o  Replace the sixth paragraph of section 2.2.3 with the following text: FH4_VOL_MIGRATION The filehandle will expire as a result of a file system transition (migration or replication), in those cases in which the continuity of filehandle use is not specified by _handle_ class information within the fs_locations_info attribute. When this bit is set, clients without access to fs_locations_info information should assume file handles will expire on file system transitions.

o  Note that the last sentence of the paragraph referred to above has been removed and was never true. It is one thing to say that a file handle may expire (i.e. that you have to be prepared for the server to tell you it is expired) and another to say that you must decide it is expired even if the server does not necessarily recognize it as expired (because it has no idea what your handles look like).

o  Replace the tenth paragraph of section 2.2.3 with the following text: Servers which provide volatile filehandles that may expire while open require special care as regards handling of RENAMEs and REMOVEs. This situation can arise if FH4_VOL_MIGRATION or FH4_VOL_RENAME is set, if FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN not set, or if a non-readonly file system has a transition target in a different _handle_ class. In these cases, the server should deny a RENAME or REMOVE that would affect an OPEN file or any of the components leading to the OPEN file.
In addition, the server should deny all RENAME or REMOVE requests during the grace period, in order to make sure that reclaims of files where filehandles may have expired do not result in a reclaim of the wrong file.

5. References

[1]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, C., Eisler, M., and D. Noveck, "Network File System (NFS) version 4 Protocol", RFC 3530, April 2003.

[2]  Shepler, S., "NFSv4 Minor Version 1", draft-ietf-nfsv4-minorversion1-02 (work in progress), March 2006.

Author's Address

David Noveck
Network Appliance
1601 Trapelo Road, Suite 16
Waltham, MA 02454
US

Phone: +1 781 961 9291
Email: dnoveck@netapp.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2006). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.