NFSv4                                                     D. Noveck, Ed.
Internet-Draft                                                    NetApp
Updates: 5661 (if approved)                                     C. Lever
Intended status: Standards Track                                  ORACLE
Expires: October 18, 2019                                 April 16, 2019


           NFS Version 4.1 Update for Multi-Server Namespace
                draft-ietf-nfsv4-rfc5661-msns-update-00

Abstract

   This document presents necessary clarifications and corrections
   concerning NFSv4.1 features that use the attributes related to file
   system location.  These revised features include migration, which
   transfers responsibility for a file system from one server to
   another, and facilities that support trunking by allowing discovery
   of the set of network addresses to use to access a file system.
   This document updates RFC5661.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 18, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Preliminaries
     3.1.  Terminology
     3.2.  Summary of Issues Addressed
     3.3.  Relationship of this Document to [RFC5661]
     3.4.  Compatibility Issues
   4.  Revised Preparatory Sections
     4.1.  Updated Section 1.7.3.3 of [RFC5661] to be retitled
           "Introduction to Multi-Server Namespace"
     4.2.  Updated Section 2.10.4 of [RFC5661] entitled
           "Server Scope"
     4.3.  Updated Section 2.10.5 of [RFC5661] entitled
           "Trunking"
       4.3.1.  Updated Section 2.10.5.1 of [RFC5661] entitled
               "Verifying Claims of Matching Server Identity"
   5.  Replacement for Section 11 of [RFC5661] entitled
       "Multi-Server Namespace"
     5.1.  New section to be added as the first sub-section of
           Section 11 of [RFC5661] to be entitled
           "Terminology Related to File System Location"
     5.2.  Replacement for Section 11.1 of [RFC5661] to be retitled
           "File System Location Attributes"
     5.3.  Transferred Section 11.2 of [RFC5661] to be entitled
           "File System Presence or Absence"
     5.4.  Transferred Section 11.3 of [RFC5661] entitled
           "Getting Attributes for an Absent File System"
       5.4.1.  GETATTR within an Absent File System (transferred
               section)
       5.4.2.  READDIR and Absent File Systems (transferred section)
     5.5.  Updated Section 11.4 of [RFC5661] to be retitled
           "Uses of File System Location Information"
       5.5.1.  New section to be added as the first sub-section of
               Section 11.4 of [RFC5661] to be entitled
               "Combining Multiple Uses in a Single Attribute"
       5.5.2.  New section to be added as the second sub-section of
               Section 11.4 of [RFC5661] to be entitled
               "File System Location Attributes and Trunking"
       5.5.3.  New section to be added as the third sub-section of
               Section 11.4 of [RFC5661] to be entitled
               "File System Location Attributes and Connection Type
               Selection"
       5.5.4.  Updated Section 11.4.1 of [RFC5661] entitled
               "File System Replication"
       5.5.5.  Updated Section 11.4.2 of [RFC5661] entitled
               "File System Migration"
       5.5.6.  Updated Section 11.4.3 of [RFC5661] entitled
               "Referrals"
       5.5.7.  New section to be added after Section 11.4.3 of
               [RFC5661] to be entitled
               "Changes in a File System Location Attribute"
     5.6.  Transferred Section 11.6 of [RFC5661] entitled
           "Additional Client-Side Considerations"
     5.7.  New section to be added after Section 11.6 of [RFC5661]
           to be entitled "Overview of File Access Transitions"
     5.8.  New section to be added second after Section 11.6 of
           [RFC5661] to be entitled
           "Effecting Network Endpoint Transitions"
     5.9.  Updated Section 11.7 of [RFC5661] entitled
           "Effecting File System Transitions"
       5.9.1.  Updated Section 11.7.1 of [RFC5661] entitled
               "File System Transitions and Simultaneous Access"
       5.9.2.  Updated Section 11.7.3 of [RFC5661] entitled
               "Filehandles and File System Transitions"
       5.9.3.  Updated Section 11.7.4 of [RFC5661] entitled
               "Fileids and File System Transitions"
       5.9.4.  Updated Section 11.7.5 of [RFC5661] entitled
               "Fsids and File System Transitions"
       5.9.5.  Updated Section 11.7.6 of [RFC5661] entitled
               "The Change Attribute and File System Transitions"
       5.9.6.  Updated Section 11.7.8 of [RFC5661] entitled
               "Write Verifiers and File System Transitions"
       5.9.7.  Updated Section 11.7.9 of [RFC5661] entitled
               "Readdir Cookies and Verifiers and File System
               Transitions"
       5.9.8.  Updated Section 11.7.10 entitled
               "File System Data and File System Transitions"
       5.9.9.  Updated Section 11.7.7 entitled
               "Lock State and File System Transitions"
     5.10. New section to be added after Section 11.7 of [RFC5661]
           to be entitled "Transferring State upon Migration"
       5.10.1.  Only sub-section within new section to be added to
                [RFC5661] to be entitled
                "Transparent State Migration and pNFS"
     5.11. New section to be added second after Section 11.7 of
           [RFC5661] to be entitled
           "Client Responsibilities when Access is Transitioned"
       5.11.1.  First sub-section within new section to be added to
                [RFC5661] to be entitled
                "Client Transition Notifications"
       5.11.2.  Second sub-section within new section to be added to
                [RFC5661] to be entitled
                "Performing Migration Discovery"
       5.11.3.  Third sub-section within new section to be added to
                [RFC5661] to be entitled
                "Overview of Client Response to NFS4ERR_MOVED"
       5.11.4.  Fourth sub-section within new section to be added to
                [RFC5661] to be entitled
                "Obtaining Access to Sessions and State after
                Migration"
       5.11.5.  Fifth sub-section within new section to be added to
                [RFC5661] to be entitled
                "Obtaining Access to Sessions and State after
                Network Address Transfer"
     5.12. New section to be added third after Section 11.7 of
           [RFC5661] to be entitled
           "Server Responsibilities Upon Migration"
       5.12.1.  First sub-section within new section to be added to
                [RFC5661] to be entitled
                "Server Responsibilities in Effecting State Reclaim
                after Migration"
       5.12.2.  Second sub-section within new section to be added to
                [RFC5661] to be entitled
                "Server Responsibilities in Effecting Transparent
                State Migration"
       5.12.3.  Third sub-section within new section to be added to
                [RFC5661] to be entitled
                "Server Responsibilities in Effecting Session
                Transfer"
     5.13. Transferred Section 11.8 of [RFC5661] entitled
           "Effecting File System Referrals"
       5.13.1.  Referral Example (LOOKUP) (transferred section)
       5.13.2.  Referral Example (READDIR) (transferred section)
     5.14. Transferred Section 11.9 of [RFC5661] entitled
           "The Attribute fs_locations"
     5.15. Updated Section 11.10 of [RFC5661] entitled
           "The Attribute fs_locations_info"
       5.15.1.  Updated Section 11.10.1 of [RFC5661] entitled
                "The fs_locations_server4 Structure"
       5.15.2.  Updated Section 11.10.2 of [RFC5661] entitled
                "The fs_locations_info4 Structure"
       5.15.3.  Updated Section 11.10.3 of [RFC5661] entitled
                "The fs_locations_item4 Structure"
     5.16. Transferred Section 11.11 of [RFC5661] entitled
           "The Attribute fs_status"
   6.  Revised Error Definitions within [RFC5661]
     6.1.  Added Initial subsection of Section 15.1 of [RFC5661]
           entitled "Overall Error Table"
     6.2.  Updated Section 15.1.2.4 of [RFC5661] entitled
           "NFS4ERR_MOVED (Error Code 10013)"
     6.3.  Updated Section 15.1.9 of [RFC5661] entitled
           "Reclaim Errors"
   7.  Revised Operations within [RFC5661]
     7.1.  Updated Section 18.35 of [RFC5661] entitled
           "Operation 42: EXCHANGE_ID - Instantiate Client ID"
     7.2.  Updated Section 18.51 of [RFC5661] entitled
           "Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
           Finished"
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Classification of Document Sections
   Appendix B.  Revisions Made to [RFC5661]
     B.1.  Revisions Made to Section 11 of [RFC5661]
     B.2.  Revisions Made to Operations in [RFC5661]
     B.3.  Revisions Made to Error Definitions in [RFC5661]
     B.4.  Other Revisions Made to [RFC5661]
   Appendix C.  Disposition of Sections Within [RFC5661]
   Acknowledgments
   Authors' Addresses

1.  Introduction

   This document defines the proper handling, within NFSv4.1, of the
   attributes related to file system location (fs_locations and
   fs_locations_info) and how necessary changes in those attributes are
   to be dealt with.  It supersedes the treatment of these issues that
   appeared in Section 11 of [RFC5661].  The necessary corrections and
   clarifications parallel those done for NFSv4.0 in [RFC7931] and
   [I-D.ietf-nfsv4-mv0-trunking-update].

   A large part of the changes to be made are necessary to clarify the
   handling of Transparent State Migration in NFSv4.1, which was not
   described in [RFC5661].  In addition, many of the issues dealt with
   in [RFC7931] for NFSv4.0 need to be addressed in the context of
   NFSv4.1.

   Another important issue to be dealt with concerns the handling of
   multiple entries within attributes related to file system locations
   that represent different ways to access the same file system.
   Unfortunately, [RFC5661], while recognizing that these entries can
   represent different ways to access the same file system, confuses the
   matter by treating network access paths as "replicas".  This makes it
   difficult to use these attributes to obtain information about the
   network addresses to be used to access particular file system
   instances.  It also engenders confusion between two different sorts
   of transition: those involving a change of network access paths to
   the same file system instance and those in which there is a shift
   between two distinct replicas.

   This document supplements facilities related to trunking, introduced
   in [RFC5661].  For some important terminology regarding trunking, see
   Section 3.1.  When file system location information is used to
   determine the set of network addresses to access a particular file
   system instance (i.e., to perform trunking discovery), clarification
   is needed regarding the interaction of trunking and transitions
   between file system replicas, including migration.  Unfortunately,
   [RFC5661], while it provided a method of determining whether two
   network addresses were connected to the same server, did not address
   the issue of trunking discovery, making it necessary to address it in
   this document.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Preliminaries

3.1.  Terminology

   While most of the terms related to multi-server namespace issues are
   appropriately defined in the section replacing Section 11 in
   [RFC5661] and appear in Section 5.1 below, there are a number of
   terms used outside that context that are explained here.

   In this document, the phrase "client ID" always refers to the 64-bit
   shorthand identifier assigned by the server (a clientid4) and never
   to the structure which the client uses to identify itself to the
   server (called an nfs_client_id4 or client_owner in NFSv4.0 and
   NFSv4.1 respectively).  The opaque identifier within those structures
   is referred to as a "client id string".

   It is particularly important to clarify the distinction between
   trunking detection and trunking discovery.  The definitions we
   present will be applicable to all minor versions of NFSv4, but we
   will put particular emphasis on how these terms apply to NFS version
   4.1.

   o  Trunking detection refers to ways of deciding whether two specific
      network addresses are connected to the same NFSv4 server.  The
      means available to make this determination depend on the protocol
      version, and, in some cases, on the client implementation.

      In the case of NFS version 4.1 and later minor versions, the means
      of trunking detection are as described by [RFC5661] and are
      available to every client.  Two network addresses connected to the
      same server are always server-trunkable but cannot necessarily be
      used together to access a single session.

   o  Trunking discovery is a process by which a client using one
      network address can obtain other addresses that are connected to
      the same server.  Typically, it builds on a trunking detection
      facility by providing one or more methods by which candidate
      addresses are made available to the client, which can then use
      trunking detection to filter them appropriately.

      Despite its support for trunking detection, [RFC5661] provided no
      description of trunking discovery.

   Regarding network addresses and the handling of trunking, we use the
   following terminology:

   o  Each NFSv4 server is assumed to have a set of IP addresses to
      which NFSv4 requests may be sent by clients.  These are referred
      to as the server's network addresses.  Access to a specific server
      network address may involve the use of multiple ports, since the
      ports to be used for various types of connections might be
      required to be different.

   o  Each network address, when combined with a pathname providing the
      location of a file system root directory relative to the
      associated server root file handle, defines a file system network
      access path.

   o  Server network addresses are used to establish connections to
      servers, and these connections may be of a number of connection
      types.  Separate connection types are used to support NFSv4
      layered on top of the RPC stream transport as described in
      [RFC5531] and on top of RPC-over-RDMA as described in [RFC8166].

   o  The combination of a server network address and a particular
      connection type to be used by a connection is referred to as a
      "server endpoint".  Although using different connection types may
      result in different ports being used, the use of different ports
      by multiple connections to the same network address is not the
      essence of the distinction between the two endpoints used.

   o  Two network addresses connected to the same server are said to be
      server-trunkable.  Two such addresses support the use of client
      ID trunking, as described in [RFC5661].

   o  Two network addresses connected to the same server such that those
      addresses can be used to support a single common session are
      referred to as session-trunkable.
      Note that two addresses may be server-trunkable without being
      session-trunkable.  Also note that when two connections of
      different connection types are made to the same network address
      and are based on a single file system location entry, they are
      always session-trunkable, independent of the connection type, as
      specified by [RFC5661].  Their derivation from the same file
      system location entry, together with the identity of their
      network addresses, assures that both connections are to the same
      server and will return server-owner information allowing session
      trunking to be used.

   Discussion of the term "replica" is complicated for a number of
   reasons:

   o  Even though the term is used in explaining the issues in [RFC5661]
      that need to be addressed in this document, a full explanation of
      this term requires explanation of related terms connected to the
      file system location attributes, which are provided in Section 5.1
      of the current document.

   o  The term is also used in [RFC5661], with a meaning different from
      that in the current document.  In short, in [RFC5661] each replica
      is identified by a single network access path while, in the
      current document, a set of network access paths which have
      server-trunkable network addresses and the same root-relative file
      system pathname is considered to be a single replica with multiple
      network access paths.

3.2.  Summary of Issues Addressed

   This document explains how clients and servers are to determine the
   particular network access paths to be used to access a file system.
   This includes describing how changes to the specific replica to be
   used or to the set of addresses to be used to access it are to be
   dealt with, and how transfers of responsibility that need to be made
   can be dealt with transparently.
   This includes cases in which there is a shift between one replica
   and another and those in which different network access paths are
   used to access the same replica.

   As a result of the following problems in [RFC5661], it is necessary
   to provide the specific updates which are made by this document.
   These updates are described in Appendix B.

   o  [RFC5661], while it dealt with situations in which various forms
      of clustering allowed co-ordination of the state assigned by
      co-operating servers to be used, made no provisions for
      Transparent State Migration, as introduced by [RFC7530] and
      corrected and clarified by [RFC7931].

   o  Although NFSv4.1 clearly defined how trunking detection was to be
      done, there was no clear specification of how trunking discovery
      was to be done, despite the fact that the specification clearly
      indicated that this information could be made available via the
      file system location attributes.

   o  Because the existence of multiple network access paths to the same
      file system was dealt with as if there were multiple replicas,
      issues relating to transitions between replicas could never be
      clearly distinguished from trunking-related transitions between
      the addresses used to access a particular file system instance.
      As a result, in situations in which both migration and trunking
      configuration changes were involved, neither of these could be
      clearly dealt with and the relationship between these two features
      was not seriously addressed.

   o  Because use of two network access paths to the same file system
      instance (i.e., trunking) was often treated as if two replicas
      were involved, it was considered that two replicas were being used
      simultaneously.
      As a result, the treatment of replicas being used simultaneously
      in [RFC5661] was not clear, as it covered two distinct cases: a
      single file system instance being accessed by two different
      network access paths, and two replicas being accessed
      simultaneously.  The limitations of the latter case were not
      clearly laid out.

   The majority of the consequences of these issues are dealt with by
   presenting, in Section 5 below, a replacement for Section 11 within
   [RFC5661].  This replacement modifies existing sub-sections within
   that section and adds new ones, as described in Appendix B.1.  Also,
   some existing sections are deleted.  These changes were made in order
   to:

   o  Reorganize the description so that the case of two network access
      paths to the same file system instance is distinguished clearly
      from the case of two different replicas since, in the former case,
      locking state is shared and there also can be sharing of session
      state.

   o  Provide a clear statement regarding the desirability of
      transparent transfer of state between replicas together with a
      recommendation that either that or a single-fs grace period be
      provided.

   o  Specifically delineate how such transfers are to be dealt with by
      the client, taking into account the differences from the treatment
      in [RFC7931] made necessary by the major protocol changes made in
      NFSv4.1.

   o  Provide discussion of the relationship between transparent state
      transfer and Parallel NFS (pNFS).

   o  Provide clarification of the fs_locations_info attribute in order
      to specify which portions of the information provided apply to a
      specific network access path and which to the replica which that
      path is used to access.
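
   As a non-normative illustration of the distinction between network
   access paths and replicas used in this document, the following
   sketch groups access paths into replicas: paths whose network
   addresses are server-trunkable and whose root-relative pathnames
   match are collected into one replica.  The names AccessPath,
   group_into_replicas, and server_identity are assumptions made for
   this example only and are not defined by [RFC5661] or by this
   document.  In particular, server_identity is a hypothetical mapping
   that stands in for the result of trunking detection, and the
   addresses shown are documentation addresses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPath:
    """One file system network access path (illustrative type only)."""
    address: str  # server network address taken from a location entry
    fs_path: str  # file system root pathname, relative to the server root


def group_into_replicas(paths, server_identity):
    """Group access paths into replicas.

    server_identity is a hypothetical mapping from a network address to
    a value that is equal for two addresses exactly when trunking
    detection found them to be server-trunkable.
    """
    replicas = defaultdict(list)
    for path in paths:
        # Same server and same root-relative pathname: same replica.
        replicas[(server_identity[path.address], path.fs_path)].append(path)
    return dict(replicas)


paths = [
    AccessPath("192.0.2.1", "/export/a"),
    AccessPath("192.0.2.2", "/export/a"),
    AccessPath("198.51.100.7", "/export/a"),
]
# Assumed outcome of trunking detection for the three addresses above.
server_identity = {
    "192.0.2.1": "server-A",
    "192.0.2.2": "server-A",
    "198.51.100.7": "server-B",
}

replicas = group_into_replicas(paths, server_identity)
for (server, fs_path), members in replicas.items():
    print(server, fs_path, [m.address for m in members])
```

   Under these assumptions, the three location entries yield two
   replicas, one of which has two network access paths.  Treating each
   entry as a separate replica, as in the reading of [RFC5661] that
   this document corrects, would instead yield three.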

   In addition, there are also updates to other sections of [RFC5661],
   where the consequences of the incorrect assumptions underlying the
   current treatment of multi-server namespace issues also need to be
   corrected.  These are to be dealt with as described in Sections B.2
   through B.4 of the current document.

   o  A revised introductory section regarding multi-server namespace
      facilities is provided.

   o  A more realistic treatment of server scope is provided, which
      reflects the more limited co-ordination of locking state adopted
      by servers actually sharing a common server scope.

   o  Some confusing text regarding changes in server_owner has been
      clarified.

   o  The description of some existing errors has been modified to more
      clearly explain certain error situations, in order to reflect the
      existence of trunking and the possible use of fs-specific grace
      periods.  For details, see Appendix B.3.

   o  New descriptions of certain existing operations are provided,
      either because the existing treatment did not account for
      situations that would arise in dealing with transparent state
      migration, or because some types of reclaim issues were not
      adequately dealt with in the context of fs-specific grace periods.
      For details, see Appendix B.2.

3.3.  Relationship of this Document to [RFC5661]

   The role of this document is to explain and specify a set of needed
   changes to [RFC5661].  All of these changes are related to the
   multi-server namespace features of NFSv4.1.

   This document contains sections that provide additions to and other
   modifications of [RFC5661] as well as others that explain the reasons
   for modifications but do not directly affect existing specifications.

   In consequence, the sections of this document can be divided into
   five groups based on how they relate to the eventual updating of the
   NFSv4.1 specification.
   Once the update is published, NFSv4.1 will be specified by two
   documents that need to be read together, until such time as a
   consolidated specification is produced.

   o  Explanatory sections do not contain any material that is meant to
      update the specification of NFSv4.1.  Such sections may contain
      explanations about why and how changes are to be done, without
      including any text that is to update [RFC5661] or appear in an
      eventual consolidated document.

   o  Replacement sections contain text that is to replace and thus
      supersede text within [RFC5661] and then appear in an eventual
      consolidated document.  The titles of replacement sections
      indicate the section(s) within [RFC5661] that are to be replaced.

   o  Additional sections contain text which, although not replacing
      anything in [RFC5661], will be part of the specification of
      NFSv4.1 and will be expected to be part of an eventual
      consolidated document.  The titles of additional sections indicate
      where, within [RFC5661], the new section would appear.

   o  Transferred sections contain text which reproduces that from a
      corresponding section of [RFC5661].  Such sections are reproduced
      in this document to avoid the need for the reader to continually
      switch between this document and [RFC5661] in reading about a
      particular topic.  Many subsections within Section 5 are of this
      type.  The titles of transferred sections typically indicate the
      source, within [RFC5661], of the transferred material.  An
      exception is the case of transferred sub-sections of a transferred
      section, where the title only notes that the subsection is
      transferred.

   o  Editing sections contain some text that replaces text within
      [RFC5661], although the entire section will not consist of such
      text and will include other text as well.
      Such sections make relatively minor adjustments in the existing
      NFSv4.1 specification which are expected to be reflected in an
      eventual consolidated document.  Generally, such replacement text
      appears in the form of a quotation, which may be rendered as an
      indented set of paragraphs.

   See Appendix A for a classification of the sections of this document
   according to the categories above.

   Overall, explanatory sections explain why the document makes the
   changes it does to the specification of NFSv4.1 in [RFC5661] while
   the other section types are used to specify how the specification of
   NFSv4.1 will be changed.  While the details of that process are
   described in Appendix B, the following summarizes the necessary
   changes:

   o  Section 4 provides replacements for preparatory sections important
      to establish the background for an updated treatment of issues
      related to multi-server namespace.

   o  Section 5 provides a complete replacement for Section 11 of
      [RFC5661].  This replacement is necessary to adapt the section to
      the existence of trunking with the multi-server namespace, to
      describe transparent state migration and session migration and to
      clarify how continuity of locking state is to be provided in the
      absence of transparent state migration.

   o  Section 6 provides updated descriptions of errors affected by the
      changes made in this document.

   o  Section 7 provides updated descriptions of two operations affected
      by the changes made in this document.

   o  Section 8 describes the changes to Section 21 of [RFC5661] (i.e.,
      the Security Considerations Section) made necessary by the other
      changes in this document.

   When this document is approved and published, [RFC5661] will be
   significantly updated as described above, with most of the changed
   sections appearing within the current Section 11 of that document.
A detailed discussion of how this affects each section of [RFC5661] can be found in Appendix C.

3.4. Compatibility Issues

Because of the extensive modification to the specification for an existing protocol, proper attention to compatibility issues is needed. Besides the fact that no XDR changes have been made, the following are the main reasons that compatibility issues have been avoided:

o  The addition of explicit reference to the fact that network addresses presented within location entries can provide the clients with candidates for trunking, while not mentioned in [RFC5661], is not incompatible with anything specified there. This is because, in situations in which there are multiple addresses by which a server could be reached, these addresses would be presented within additional location entries, even though the earlier document would erroneously present these as additional "replicas" which might be migrated to or used simultaneously with those at other addresses that are trunkable with them.

o  Many of the facilities described here, such as transparent state migration and session migration, are clearly specified as optional, with it being made clear how clients can be aware of this server functionality. As a result, clients previously unaware of these facilities will not look for them and not use them, while all clients will be able to see that they are not provided by servers unaware of them.

o  In cases such as the handling of server scope, in which [RFC5661] specified a level of inter-server co-operation that is, practically speaking, impossible to achieve, the necessary correction cannot give rise to compatibility issues. This is because clients could not rely on these assurances, since they could not be realized.

4. Revised Preparatory Sections

A number of sections appearing early in [RFC5661] require revisions to provide needed clarification and to be compatible with changes needed in this document. The reasons for these revisions are discussed in Appendix B.4.

4.1. Updated Section 1.7.3.3 of [RFC5661] to be retitled "Introduction to Multi-Server Namespace"

NFSv4.1 contains a number of features to allow implementation of namespaces that cross server boundaries and that allow and facilitate a non-disruptive transfer of support for individual file systems between servers. They are all based upon attributes that allow one file system to specify alternate, additional, and new location information that specifies how the client may access that file system.

These attributes can be used to provide for individual active file systems:

o  Alternate network addresses to access the current file system instance.

o  The locations of alternate file system instances or replicas to be used in the event that the current file system instance becomes unavailable.

These file system location attributes may be used together with the concept of absent file systems, in which a position in the server namespace is associated with locations on other servers without there being any corresponding file system instance on the current server.

o  These attributes may be used with absent file systems to implement referrals whereby one server may direct the client to a file system provided by another server. This allows extensive multi-server namespaces to be constructed.

o  These attributes may be provided when a previously present file system becomes absent. This allows non-disruptive migration of file systems to alternate servers.

4.2. Updated Section 2.10.4 of [RFC5661] entitled "Server Scope"

Servers each specify a server scope value in the form of an opaque string eir_server_scope returned as part of the results of an EXCHANGE_ID operation. The purpose of the server scope is to allow a group of servers to indicate to clients that a set of servers sharing the same server scope value has arranged to use compatible values of otherwise opaque identifiers. Thus, the identifiers generated by two servers within that set can be assumed compatible so that, in some cases, identifiers generated by one server in that set may be presented to another server of the same scope.

The use of such compatible values does not imply that a value generated by one server will always be accepted by another. In most cases, it will not. However, a server will not accept a value generated by another inadvertently. When it does accept it, it will be because it is recognized as valid and carrying the same meaning as on another server of the same scope.

When servers are of the same server scope, this compatibility of values applies to the following identifiers:

o  Filehandle values. A filehandle value accepted by two servers of the same server scope denotes the same object. A WRITE operation sent to one server is reflected immediately in a READ sent to the other.

o  Server owner values. When the server scope values are the same, server owner values may be validly compared. In cases where the server scope values are different, server owner values are treated as different even if they contain identical strings of bytes.

The coordination among servers required to provide such compatibility can be quite minimal, and limited to a simple partition of the ID space.
The recognition of common values requires additional implementation, but this can be tailored to the specific situations in which that recognition is desired.

Clients will have occasion to compare the server scope values of multiple servers under a number of circumstances, each of which will be discussed under the appropriate functional section:

o  When server owner values received in response to EXCHANGE_ID operations sent to multiple network addresses are compared for the purpose of determining the validity of various forms of trunking, as described in Section 5.5.2 of the current document.

o  When network or server reconfiguration causes the same network address to possibly be directed to different servers, with the necessity for the client to determine when lock reclaim should be attempted, as described in Section 8.4.2.1 of [RFC5661].

When two replies from EXCHANGE_ID, each from two different server network addresses, have the same server scope, there are a number of ways a client can validate that the common server scope is due to two servers cooperating in a group.

o  If both EXCHANGE_ID requests were sent with RPCSEC_GSS ([RFC2203], [RFC5403], [RFC7861]) authentication and the server principal is the same for both targets, the equality of server scope is validated. It is RECOMMENDED that two servers intending to share the same server scope also share the same principal name, simplifying the client's task of validating server scope.

o  The client may accept the appearance of the second server in the fs_locations or fs_locations_info attribute for a relevant file system. For example, if there is a migration event for a particular file system or there are locks to be reclaimed on a particular file system, the attributes for that particular file system may be used.
The client sends the GETATTR request to the first server for the fs_locations or fs_locations_info attribute with RPCSEC_GSS authentication. It may need to do this in advance of the need to verify the common server scope. If the client successfully authenticates the reply to GETATTR, and the GETATTR request and reply containing the fs_locations or fs_locations_info attribute refer to the second server, then the equality of server scope is supported. A client may choose to limit the use of this form of support to information relevant to the specific file system involved (e.g., a file system being migrated).

4.3. Updated Section 2.10.5 of [RFC5661] entitled "Trunking"

Trunking is the use of multiple connections between a client and server in order to increase the speed of data transfer. NFSv4.1 supports two types of trunking: session trunking and client ID trunking.

In the context of a single server network address, it can be assumed that all connections are accessing the same server and NFSv4.1 servers MUST support both forms of trunking. When multiple connections use a set of network addresses accessing the same server, the server MUST support both forms of trunking. NFSv4.1 servers in a clustered configuration MAY allow network addresses for different servers to use client ID trunking.

Clients may use either form of trunking as long as they do not, when trunking between different server network addresses, violate the servers' mandates as to the kinds of trunking to be allowed (see below). With regard to callback channels, the client MUST allow the server to choose among all callback channels valid for a given client ID and MUST support trunking when the connections supporting the backchannel allow session or client ID trunking to be used for callbacks.
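
As a rough illustration of how a client might classify two EXCHANGE_ID replies under the matching rules that this section goes on to describe, consider the following Python sketch. The names ExchangeIdResult, Trunking, and permitted_trunking are hypothetical and introduced only for this illustration; they are not part of the protocol or of any implementation. The sketch covers only the comparison of fields and does not perform the verification of server identity that the client is expected to carry out before trunking traffic.

```python
from dataclasses import dataclass
from enum import Enum


class Trunking(Enum):
    NONE = "no trunking"
    CLIENT_ID = "client ID trunking only"
    SESSION = "session trunking or client ID trunking"


@dataclass(frozen=True)
class ExchangeIdResult:
    # Hypothetical container for the EXCHANGE_ID fields compared here;
    # this is not the XDR definition.
    clientowner: bytes   # eia_clientowner sent in the request
    clientid: int        # eir_clientid
    so_major_id: bytes   # eir_server_owner.so_major_id
    so_minor_id: int     # eir_server_owner.so_minor_id
    server_scope: bytes  # eir_server_scope


def permitted_trunking(a: ExchangeIdResult, b: ExchangeIdResult) -> Trunking:
    """Classify which forms of trunking two EXCHANGE_ID results permit."""
    same_server = (
        a.clientowner == b.clientowner
        and a.clientid == b.clientid
        and a.so_major_id == b.so_major_id
        and a.server_scope == b.server_scope
    )
    if not same_server:
        return Trunking.NONE
    # A matching minor ID additionally permits session trunking.
    if a.so_minor_id == b.so_minor_id:
        return Trunking.SESSION
    return Trunking.CLIENT_ID


first = ExchangeIdResult(b"client-1", 0x1234, b"cluster-a", 1, b"scope-x")
second = ExchangeIdResult(b"client-1", 0x1234, b"cluster-a", 2, b"scope-x")
print(permitted_trunking(first, second).value)  # client ID trunking only
```

A client remains free to use less trunking than the result permits, for example by dropping the second connection.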

Session trunking is essentially the association of multiple connections, each with potentially different target and/or source network addresses, to the same session. When the target network addresses (server addresses) of the two connections are the same, the server MUST support such session trunking. When the target network addresses are different, the server MAY indicate such support using the data returned by the EXCHANGE_ID operation (see below).

Client ID trunking is the association of multiple sessions to the same client ID. Servers MUST support client ID trunking for two target network addresses whenever they allow session trunking for those same two network addresses. In addition, a server MAY, by presenting the same major server owner ID (see Section 2.5 of [RFC5661]) and server scope (Section 4.2), allow an additional case of client ID trunking. When two servers return the same major server owner and server scope, it means that the two servers are cooperating on locking state management, which is a prerequisite for client ID trunking.

Distinguishing when the client is allowed to use session and client ID trunking requires understanding how the results of the EXCHANGE_ID (Section 7.1) operation identify a server. Suppose a client sends EXCHANGE_IDs over two different connections, each with a possibly different target network address, but each EXCHANGE_ID operation has the same value in the eia_clientowner field. If the same NFSv4.1 server is listening over each connection, then each EXCHANGE_ID result MUST return the same values of eir_clientid, eir_server_owner.so_major_id, and eir_server_scope. The client can then treat each connection as referring to the same server (subject to verification; see Section 4.3.1 below), and it can use each connection to trunk requests and replies.
The client's choice is whether session trunking or client ID trunking applies.

Session Trunking.  If the eia_clientowner argument is the same in two different EXCHANGE_ID requests, and the eir_clientid, eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and eir_server_scope results match in both EXCHANGE_ID results, then the client is permitted to perform session trunking. If the client has no session mapping to the tuple of eir_clientid, eir_server_owner.so_major_id, eir_server_scope, and eir_server_owner.so_minor_id, then it creates the session via a CREATE_SESSION operation over one of the connections, which associates the connection to the session. If there is a session for the tuple, the client can send BIND_CONN_TO_SESSION to associate the connection to the session.

   Of course, if the client does not desire to use session trunking, it is not required to do so. It can invoke CREATE_SESSION on the connection. This will result in client ID trunking as described below. It can also decide to drop the connection if it does not choose to use trunking.

Client ID Trunking.  If the eia_clientowner argument is the same in two different EXCHANGE_ID requests, and the eir_clientid, eir_server_owner.so_major_id, and eir_server_scope results match in both EXCHANGE_ID results, then the client is permitted to perform client ID trunking (regardless of whether the eir_server_owner.so_minor_id results match). The client can associate each connection with different sessions, where each session is associated with the same server.

   The client completes the act of client ID trunking by invoking CREATE_SESSION on each connection, using the same client ID that was returned in eir_clientid. These invocations create two sessions and also associate each connection with its respective session.
   The client is free to decline to use client ID trunking by simply dropping the connection at this point.

   When doing client ID trunking, locking state is shared across sessions associated with that same client ID. This requires the server to coordinate state across sessions and the client to be able to associate the same locking state with multiple sessions.

It is always possible that, as a result of various sorts of reconfiguration events, eir_server_scope and eir_server_owner values may be different on subsequent EXCHANGE_ID requests made to the same network address.

In most cases such reconfiguration events will be disruptive and indicate that an IP address formerly connected to one server is now connected to an entirely different one.

Some guidelines on client handling of such situations follow:

o  When eir_server_scope changes, the client has no assurance that any IDs it obtained previously (e.g., filehandles, stateids, client IDs) can be validly used on the new server, and, even if the new server accepts them, there is no assurance that this is not due to accident. Thus, it is best to treat all such state as lost/stale, although a client may assume that the probability of inadvertent acceptance is low and treat this situation as within the next case.

o  When eir_server_scope remains the same and eir_server_owner.so_major_id changes, the client can use the filehandles it has, consider its locking state lost, and attempt to reclaim or otherwise re-obtain its locks. It may find that its filehandle is now stale, but if NFS4ERR_STALE is not received, it can proceed to reclaim or otherwise re-obtain its open locking state.

o  When eir_server_scope and eir_server_owner.so_major_id remain the same, the client has to use the now-current values of eir_server_owner.so_minor_id in deciding on appropriate forms of trunking.
This may result in connections being dropped or new sessions being created.

4.3.1. Updated Section 2.10.5.1 of [RFC5661] entitled "Verifying Claims of Matching Server Identity"

When the server responses using two different connections claim matching or partially matching eir_server_owner, eir_server_scope, and eir_clientid values, the client does not have to trust the servers' claims. The client may verify these claims before trunking traffic in the following ways:

o  For session trunking, clients SHOULD reliably verify whether connections between different network paths are in fact associated with the same NFSv4.1 server and usable on the same session, and servers MUST allow clients to perform reliable verification. When a client ID is created, the client SHOULD specify that BIND_CONN_TO_SESSION is to be verified according to the SP4_SSV or SP4_MACH_CRED (Section 7.1) state protection options. For SP4_SSV, reliable verification depends on a shared secret (the SSV) that is established via the SET_SSV (see Section 18.47 of [RFC5661]) operation.

   When a new connection is associated with the session (via the BIND_CONN_TO_SESSION operation, see Section 18.34 of [RFC5661]), if the client specified SP4_SSV state protection for the BIND_CONN_TO_SESSION operation, the client MUST send the BIND_CONN_TO_SESSION with RPCSEC_GSS protection, using integrity or privacy, and an RPCSEC_GSS handle created with the GSS SSV mechanism (see Section 2.10.9 of [RFC5661]).

   If the client mistakenly tries to associate a connection to a session of a wrong server, the server will either reject the attempt because it is not aware of the session identifier of the BIND_CONN_TO_SESSION arguments, or it will reject the attempt because the RPCSEC_GSS authentication fails.
Even if the server mistakenly or maliciously accepts the connection association attempt, the RPCSEC_GSS verifier it computes in the response will not be verified by the client, so the client will know it cannot use the connection for trunking the specified session.

   If the client specified SP4_MACH_CRED state protection, the BIND_CONN_TO_SESSION operation will use RPCSEC_GSS integrity or privacy, using the same credential that was used when the client ID was created. Mutual authentication via RPCSEC_GSS assures the client that the connection is associated with the correct session of the correct server.

o  For client ID trunking, the client has at least two options for verifying that the same client ID obtained from two different EXCHANGE_ID operations came from the same server. The first option is to use RPCSEC_GSS authentication when sending each EXCHANGE_ID operation. Each time an EXCHANGE_ID is sent with RPCSEC_GSS authentication, the client notes the principal name of the GSS target. If the EXCHANGE_ID results indicate that client ID trunking is possible, and the GSS targets' principal names are the same, the servers are the same and client ID trunking is allowed.

   The second option for verification is to use SP4_SSV protection. When the client sends EXCHANGE_ID, it specifies SP4_SSV protection. The first EXCHANGE_ID the client sends always has to be confirmed by a CREATE_SESSION call. The client then sends SET_SSV. Later, the client sends EXCHANGE_ID to a second destination network address different from the one the first EXCHANGE_ID was sent to. The client checks that each EXCHANGE_ID reply has the same eir_clientid, eir_server_owner.so_major_id, and eir_server_scope.
If so, the client verifies the claim by sending a CREATE_SESSION operation to the second destination address, protected with RPCSEC_GSS integrity using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If the server accepts the CREATE_SESSION request, and if the client verifies the RPCSEC_GSS verifier and integrity codes, then the client has proof the second server knows the SSV, and thus the two servers are cooperating for the purposes of specifying server scope and client ID trunking.

5. Replacement for Section 11 of [RFC5661] entitled "Multi-Server Namespace"

NFSv4.1 supports attributes that allow a namespace to extend beyond the boundaries of a single server. It is desirable that clients and servers support construction of such multi-server namespaces. Use of such multi-server namespaces is OPTIONAL, however, and for many purposes, single-server namespaces are perfectly acceptable. Use of multi-server namespaces can provide many advantages, by separating a file system's logical position in a namespace from the (possibly changing) logistical and administrative considerations that result in particular file systems being located on particular servers via a single network access path known in advance or determined using DNS.

5.1. New section to be added as the first sub-section of Section 11 of [RFC5661] to be entitled "Terminology Related to File System Location"

Regarding terminology relating to the construction of multi-server namespaces out of a set of local per-server namespaces:

o  Each server has a set of exported file systems which may be accessed by NFSv4 clients. Typically, this is done by assigning each file system a name within the pseudo-fs associated with the server, although the pseudo-fs may be dispensed with if there is only a single exported file system.
Each such file system is part of the server's local namespace, and can be considered as a file system instance within a larger multi-server namespace.

o  The set of all exported file systems for a given server constitutes that server's local namespace.

o  In some cases, a server will have a namespace more extensive than its local namespace by using features associated with attributes that provide file system location information. These features, which allow construction of a multi-server namespace, are all described in individual sections below and include referrals (described in Section 5.5.6), migration (described in Section 5.5.5), and replication (described in Section 5.5.4).

o  A file system present in a server's pseudo-fs may have multiple file system instances on different servers associated with it. All such instances are considered replicas of one another.

o  When a file system is present in a server's pseudo-fs, but there is no corresponding local file system, it is said to be "absent". In such cases, all associated instances will be accessed on other servers.

Regarding terminology relating to attributes used in trunking discovery and other multi-server namespace features:

o  File system location attributes include the fs_locations and fs_locations_info attributes.

o  File system location entries provide the individual file system locations within the file system location attributes. Each such entry specifies a server, in the form of a host name or IP address, and an fs name, which designates the location of the file system within the server's pseudo-fs. A file system location entry designates a set of server endpoints to which the client may establish connections.
There may be multiple endpoints because a host name may map to multiple network addresses and because multiple connection types may be used to communicate with a single network address. However, all such endpoints MUST provide a way of connecting to a single server. The exact form of the location entry varies with the particular file system location attribute used, as described in Section 5.2.

o  File system location elements are derived from location entries and each describes a particular network access path, consisting of a network address and a location within the server's pseudo-fs. Such location elements need not appear within a file system location attribute, but the existence of each location element derives from a corresponding location entry. When a location entry specifies an IP address, there is only a single corresponding location element. File system location entries that contain a host name are resolved using DNS, and may result in one or more location elements. All location elements consist of a location address, which is the IP address of an interface to a server, and an fs name, which is the location of the file system within the server's pseudo-fs. The fs name is empty if the server has no pseudo-fs and only a single exported file system at the root filehandle.

o  Two file system location elements are said to be server-trunkable if they specify the same fs name and their location addresses are server-trunkable. When the corresponding network paths are used, the client will always be able to use client ID trunking, but will only be able to use session trunking if the paths are also session-trunkable.

o  Two file system location elements are said to be session-trunkable if they specify the same fs name and their location addresses are session-trunkable. When the corresponding network paths are used, the client will be able to use either client ID trunking or session trunking.

Each set of server-trunkable location elements defines a set of available network access paths to a particular file system. When there are multiple such file systems, each of which contains the same data, these file systems are considered replicas of one another. Logically, such replication is symmetric, since the fs currently in use and an alternate fs are replicas of each other. Often, in other documents, the term "replica" is not applied to the fs currently in use, despite the fact that the replication relation is inherently symmetric.

5.2. Replacement for Section 11.1 of [RFC5661] to be retitled "File System Location Attributes"

NFSv4.1 contains attributes that provide information about how (i.e., at what network address and namespace position) a given file system may be accessed. As a result, file systems in the namespace of one server can be associated with one or more instances of that file system on other servers. These attributes contain file system location entries specifying a server address target (either as a DNS name representing one or more IP addresses or as a specific IP address) together with the pathname of that file system within the associated single-server namespace.

The fs_locations_info RECOMMENDED attribute allows specification of one or more file system instance locations where the data corresponding to a given file system may be found.
This attribute provides to the client, in addition to specification of file system instance locations, other helpful information such as:

o  Information guiding choices among the various file system instances provided (e.g., priority for use, writability, currency, etc.).

o  Information to help the client efficiently effect as seamless a transition as possible among multiple file system instances, when and if that should be necessary.

o  Information helping to guide the selection of the appropriate connection type to be used when establishing a connection.

Within the fs_locations_info attribute, each fs_locations_server4 entry corresponds to a file system location entry with the fls_server field designating the server, with the location pathname within the server's pseudo-fs given by the fl_rootpath field of the encompassing fs_locations_item4.

The fs_locations attribute defined in NFSv4.0 is also a part of NFSv4.1. This attribute only allows specification of the file system locations where the data corresponding to a given file system may be found. Servers should make this attribute available whenever fs_locations_info is supported, but client use of fs_locations_info is preferable, as it provides more information.

Within the fs_locations attribute, each fs_location4 contains a file system location entry with the server field designating the server and the rootpath field giving the location pathname within the server's pseudo-fs.

5.3. Transferred Section 11.2 of [RFC5661] to be entitled "File System Presence or Absence"

A given location in an NFSv4.1 namespace (typically but not necessarily a multi-server namespace) can have a number of file system instance locations associated with it (via the fs_locations or fs_locations_info attribute).
There may also be an actual current file system at that location, accessible via normal namespace operations (e.g., LOOKUP). In this case, the file system is said to be "present" at that position in the namespace, and clients will typically use it, reserving use of additional locations specified via the location-related attributes to situations in which the principal location is no longer available.

When there is no actual file system at the namespace location in question, the file system is said to be "absent". An absent file system contains no files or directories other than the root. Any reference to it, except to access a small set of attributes useful in determining alternate locations, will result in an error, NFS4ERR_MOVED. Note that if the server ever returns the error NFS4ERR_MOVED, it MUST support the fs_locations attribute and SHOULD support the fs_locations_info and fs_status attributes.

While the error name suggests that we have a case of a file system that once was present, and has only become absent later, this is only one possibility. A position in the namespace may be permanently absent with the set of file system(s) designated by the location attributes being the only realization. The name NFS4ERR_MOVED reflects an earlier, more limited conception of its function, but this error will be returned whenever the referenced file system is absent, whether it has moved or not.

Except in the case of GETATTR-type operations (to be discussed later), when the current filehandle at the start of an operation is within an absent file system, that operation is not performed and the error NFS4ERR_MOVED is returned, to indicate that the file system is absent on the current server.

Because a GETFH cannot succeed if the current filehandle is within an absent file system, filehandles within an absent file system cannot be transferred to the client. When a client does have filehandles within an absent file system, it is the result of obtaining them when the file system was present, and having the file system become absent subsequently.

It should be noted that because the check for the current filehandle being within an absent file system happens at the start of every operation, operations that change the current filehandle so that it is within an absent file system will not result in an error. This allows such combinations as PUTFH-GETATTR and LOOKUP-GETATTR to be used to get attribute information, particularly location attribute information, as discussed below.

The RECOMMENDED file system attribute fs_status can be used to interrogate the present/absent status of a given file system.

5.4. Transferred Section 11.3 of [RFC5661] entitled "Getting Attributes for an Absent File System"

When a file system is absent, most attributes are not available, but it is necessary to allow the client access to the small set of attributes that are available, and most particularly those that give information about the correct current locations for this file system: fs_locations and fs_locations_info.

5.4.1. GETATTR within an Absent File System (transferred section)

As mentioned above, an exception is made for GETATTR in that attributes may be obtained for a filehandle within an absent file system. This exception only applies if the attribute mask contains at least one attribute bit that indicates the client is interested in a result regarding an absent file system: fs_locations, fs_locations_info, or fs_status. If none of these attributes is requested, GETATTR will result in an NFS4ERR_MOVED error.
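
The GETATTR exception just described can be pictured with a small Python sketch. It is an illustration only: the attribute mask is modeled as a collection of attribute names rather than as the protocol's bitmap, and the names Status, ABSENT_FS_QUERY_ATTRS, and getattr_status_on_absent_fs are hypothetical, introduced here for the example.

```python
from enum import Enum


class Status(Enum):
    NFS4_OK = "NFS4_OK"
    NFS4ERR_MOVED = "NFS4ERR_MOVED"


# Attributes whose presence in the mask signals interest in a result
# regarding an absent file system.
ABSENT_FS_QUERY_ATTRS = frozenset(
    {"fs_locations", "fs_locations_info", "fs_status"}
)


def getattr_status_on_absent_fs(requested_attrs):
    """Status of a GETATTR whose filehandle is within an absent file system.

    requested_attrs is any iterable of attribute names (an assumption of
    this sketch; the protocol itself uses a bitmap).
    """
    if ABSENT_FS_QUERY_ATTRS.intersection(requested_attrs):
        return Status.NFS4_OK
    return Status.NFS4ERR_MOVED


print(getattr_status_on_absent_fs(["fs_locations", "fsid"]).value)  # NFS4_OK
print(getattr_status_on_absent_fs(["size", "owner"]).value)  # NFS4ERR_MOVED
```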
1173 When a GETATTR is done on an absent file system, the set of supported 1174 attributes is very limited. Many attributes, including those that 1175 are normally REQUIRED, will not be available on an absent file 1176 system. In addition to the attributes mentioned above (fs_locations, 1177 fs_locations_info, fs_status), the following attributes SHOULD be 1178 available on absent file systems. In the case of RECOMMENDED 1179 attributes, they should be available at least to the same degree that 1180 they are available on present file systems. 1182 change_policy: This attribute is useful for absent file systems and 1183 can be helpful in summarizing to the client when any of the 1184 location-related attributes change. 1186 fsid: This attribute should be provided so that the client can 1187 determine file system boundaries, including, in particular, the 1188 boundary between present and absent file systems. This value must 1189 be different from any other fsid on the current server and need 1190 have no particular relationship to fsids on any particular 1191 destination to which the client might be directed. 1193 mounted_on_fileid: For objects at the top of an absent file system, 1194 this attribute needs to be available. Since the fileid is within 1195 the present parent file system, there should be no need to 1196 reference the absent file system to provide this information. 1198 Other attributes SHOULD NOT be made available for absent file 1199 systems, even when it is possible to provide them. The server should 1200 not assume that more information is always better and should avoid 1201 gratuitously providing additional information. 1203 When a GETATTR operation includes a bit mask for one of the 1204 attributes fs_locations, fs_locations_info, or fs_status, but where 1205 the bit mask includes attributes that are not supported, GETATTR will 1206 not return an error, but will return the mask of the actual 1207 attributes supported with the results. 
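The GETATTR rules above can be summarized as a filter on the requested attribute mask. The following is a minimal sketch, assuming attribute masks are modeled as sets of attribute names; the function getattr_absent and its parameters are hypothetical. It ignores the restriction that mounted_on_fileid applies to objects at the top of the absent file system.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_MOVED = "NFS4ERR_MOVED"

# Requesting at least one of these is what permits GETATTR to proceed.
LOCATION_ATTRS = frozenset({"fs_locations", "fs_locations_info", "fs_status"})
# Attributes that SHOULD also be available on an absent file system.
ALSO_AVAILABLE = frozenset({"change_policy", "fsid", "mounted_on_fileid"})


def getattr_absent(requested, server_supported):
    """Model GETATTR on a filehandle within an absent file system.

    requested: attribute names in the client's mask.
    server_supported: attribute names this server implements at all.
    Returns (status, set of attribute names actually returned).
    """
    requested = set(requested)
    if not (requested & LOCATION_ATTRS):
        # No location-related attribute requested: no exception applies.
        return NFS4ERR_MOVED, set()
    available = (LOCATION_ATTRS | ALSO_AVAILABLE) & set(server_supported)
    # Unavailable attributes are dropped from the result rather than
    # causing an error.
    return NFS4_OK, requested & available
```

For example, a request for fs_locations, fsid, and size against a server that supports all three returns only fs_locations and fsid, with no error.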
1209 Handling of VERIFY/NVERIFY is similar to GETATTR in that if the 1210 attribute mask does not include fs_locations, fs_locations_info, or 1211 fs_status, the error NFS4ERR_MOVED will result. It differs in that 1212 any appearance in the attribute mask of an attribute not supported 1213 for an absent file system (and note that this will include some 1214 normally REQUIRED attributes) will also cause an NFS4ERR_MOVED 1215 result. 1217 5.4.2. READDIR and Absent File Systems (transferred section) 1219 A READDIR performed when the current filehandle is within an absent 1220 file system will result in an NFS4ERR_MOVED error, since, unlike the 1221 case of GETATTR, no such exception is made for READDIR. 1223 Attributes for an absent file system may be fetched via a READDIR for 1224 a directory in a present file system, when that directory contains 1225 the root directories of one or more absent file systems. In this 1226 case, the handling is as follows: 1228 o If the attribute set requested includes one of the attributes 1229 fs_locations, fs_locations_info, or fs_status, then fetching of 1230 attributes proceeds normally and no NFS4ERR_MOVED indication is 1231 returned, even when the rdattr_error attribute is requested. 1233 o If the attribute set requested does not include one of the 1234 attributes fs_locations, fs_locations_info, or fs_status, then if 1235 the rdattr_error attribute is requested, each directory entry for 1236 the root of an absent file system will report NFS4ERR_MOVED as the 1237 value of the rdattr_error attribute. 1239 o If the attribute set requested does not include any of the 1240 attributes fs_locations, fs_locations_info, fs_status, or 1241 rdattr_error, then the occurrence of the root of an absent file 1242 system within the directory will result in the READDIR failing 1243 with an NFS4ERR_MOVED error. 
1245 o The unavailability of an attribute because of a file system's 1246 absence, even one that is ordinarily REQUIRED, does not result in 1247 any error indication. The set of attributes returned for the root 1248 directory of the absent file system in that case is simply 1249 restricted to those actually available. 1251 5.5. Updated Section 11.4 of [RFC5661] to be retitled "Uses of File 1252 System Location Information" 1254 The file system location attributes (i.e. fs_locations and 1255 fs_locations_info), together with the possibility of absent file 1256 systems, provide a number of important facilities in providing 1257 reliable, manageable, and scalable data access. 1259 When a file system is present, these attributes can provide 1261 o The locations of alternative replicas, to be used to access the 1262 same data in the event of server failures, communications 1263 problems, or other difficulties that make continued access to the 1264 current replica impossible or otherwise impractical. Provision 1265 and use of such alternate replicas is referred to as "replication" 1266 and is discussed in Section 5.5.4 below. 1268 o The network address(es) to be used to access the current file 1269 system instance or replicas of it. Client use of this information 1270 is discussed in Section 5.5.2 below. 1272 Under some circumstances, multiple replicas may be used 1273 simultaneously to provide higher-performance access to the file 1274 system in question, although the lack of state sharing between 1275 servers may be an impediment to such use. 1277 When a file system is present and becomes absent, clients can be 1278 given the opportunity to have continued access to their data, using a 1279 different replica. In this case, a continued attempt to use the data 1280 in the now-absent file system will result in an NFS4ERR_MOVED error 1281 and, at that point, the successor replica or set of possible replica 1282 choices can be fetched and used to continue access. 
Transfer of 1283 access to the new replica location is referred to as "migration", and 1284 is discussed in Section 5.5.5 below. 1286 Where a file system had been absent, specification of file system 1287 location provides a means by which file systems located on one server 1288 can be associated with a namespace defined by another server, thus 1289 allowing a general multi-server namespace facility. A designation of 1290 such a remote instance, in place of a file system never previously 1291 present, is called a "pure referral" and is discussed in 1292 Section 5.5.6 below. 1294 Because client support for attributes related to file system location 1295 is OPTIONAL, a server may choose to take action to hide migration and 1296 referral events from such clients, by acting as a proxy, for example. 1297 The server can determine the presence of client support from the 1298 arguments of the EXCHANGE_ID operation (see Section 7.1.3 in the 1299 current document). 1301 5.5.1. New section to be added as the first sub-section of Section 11.4 1302 of [RFC5661] to be entitled "Combining Multiple Uses in a Single 1303 Attribute" 1305 A file system location attribute will sometimes contain information 1306 relating to the location of multiple replicas which may be used in 1307 different ways. 1309 o File system location entries that relate to the file system 1310 instance currently in use provide trunking information, allowing 1311 the client to find additional network addresses by which the 1312 instance may be accessed. 1314 o File system location entries that provide information about 1315 replicas to which access is to be transferred. 1317 o Other file system location entries that relate to replicas that 1318 are available to use in the event that access to the current 1319 replica becomes unsatisfactory. 1321 In order to simplify client handling and allow the best choice of 1322 replicas to access, the server should adhere to the following 1323 guidelines.
1325 o All file system location entries that relate to a single file 1326 system instance should be adjacent. 1328 o File system location entries that relate to the instance currently 1329 in use should appear first. 1331 o File system location entries that relate to replica(s) to which 1332 migration is occurring should appear before replicas which are 1333 available for later use if the current replica should become 1334 inaccessible. 1336 5.5.2. New section to be added as the second sub-section of 1337 Section 11.4 of [RFC5661] to be entitled "File System Location 1338 Attributes and Trunking" 1340 Trunking is the use of multiple connections between a client and 1341 server in order to increase the speed of data transfer. A client may 1342 determine the set of network addresses to use to access a given file 1343 system in a number of ways: 1345 o When the name of the server is known to the client, it may use DNS 1346 to obtain a set of network addresses to use in accessing the 1347 server. 1349 o The client may fetch the file system location attribute for the 1350 file system. This will provide either the name of the server 1351 (which can be turned into a set of network addresses using DNS), 1352 or a set of server-trunkable location entries. Using the latter 1353 alternative, the server can provide addresses it regards as 1354 desirable to use to access the file system in question. 1356 It should be noted that the client, when it fetches a location 1357 attribute for a file system, may encounter multiple entries for a 1358 number of reasons, so that, when determining trunking information, it 1359 may have to bypass addresses not trunkable with one already known. 1361 The server can provide location entries that include either names or 1362 network addresses. It might use the latter form because of DNS- 1363 related security concerns or because the set of addresses to be used 1364 might require active management by the server. 
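The bypassing of addresses that are not trunkable with one already known might be sketched as follows. This is an illustration only. It assumes the client judges server-trunkability by comparing values obtained by probing each candidate address, for instance the server scope and server owner major identifier returned by EXCHANGE_ID. The ProbeResult type, its field names, and the trunkable_addresses function are hypothetical, and the addresses are documentation examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """What a client learned by probing one candidate address (hypothetical)."""
    address: str
    server_scope: bytes
    major_id: bytes


def trunkable_addresses(known, probes):
    """Return addresses from probes that appear trunkable with known.

    Entries that report a different server scope or major identifier
    are bypassed, since they may describe other replicas.
    """
    return [
        p.address
        for p in probes
        if p.address != known.address
        and p.server_scope == known.server_scope
        and p.major_id == known.major_id
    ]


known = ProbeResult("192.0.2.1", b"scope-1", b"owner-A")
candidates = [
    ProbeResult("192.0.2.2", b"scope-1", b"owner-A"),  # same server
    ProbeResult("192.0.2.3", b"scope-1", b"owner-B"),  # a different server
]
extra = trunkable_addresses(known, candidates)
```

A client following this approach would add only the addresses in extra to the set it uses for the current file system instance.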
1366 Location entries used to discover candidate addresses for use in 1367 trunking are subject to change, as discussed in Section 5.5.7 below. 1368 The client may respond to such changes by using additional addresses 1369 once they are verified or by ceasing to use existing ones. The 1370 server can force the client to cease using an address by returning 1371 NFS4ERR_MOVED when that address is used to access a file system. 1372 This allows a transfer of client access which is similar to 1373 migration, although the same file system instance is accessed 1374 throughout. 1376 5.5.3. New section to be added as the third sub-section of Section 11.4 1377 of [RFC5661] to be entitled "File System Location Attributes and 1378 Connection Type Selection" 1380 Because of the need to support multiple connections, clients face the 1381 issue of determining the proper connection type to use when 1382 establishing a connection to a given server network address. In some 1383 cases, this issue can be addressed through the use of the connection 1384 "step-up" facility described in Section 18.16 of [RFC5661]. However, 1385 because there are cases in which that facility is not available, the 1386 client may have to choose a connection type with no possibility of 1387 changing it within the scope of a single connection. 1389 The two file system location attributes differ as to the information 1390 made available in this regard. Fs_locations provides no information 1391 to support connection type selection. As a result, clients 1392 supporting multiple connection types would need to attempt to 1393 establish connections using multiple connection types until the one 1394 preferred by the client is successfully established. 1396 Fs_locations_info includes a flag, FSLI4TF_RDMA, which, when set, 1397 indicates that RPC-over-RDMA support is available using the specified 1398 location entry, by "stepping up" an existing TCP connection to 1399 include support for RDMA operation.
This flag is a convenience 1400 for a client wishing to use RDMA. When this flag is set, the client can 1401 establish a TCP connection and then convert that connection to use 1402 RDMA by using the step-up facility. 1404 Irrespective of the particular attribute used, when there is no 1405 indication that a step-up operation can be performed, a client 1406 supporting RDMA operation can establish a new RDMA connection and it 1407 can be bound to the session already established by the TCP 1408 connection, allowing the TCP connection to be dropped and the session 1409 converted to further use in RDMA mode. 1411 5.5.4. Updated Section 11.4.1 of [RFC5661] entitled "File System 1412 Replication" 1414 The fs_locations and fs_locations_info attributes provide alternative 1415 file system locations, to be used to access data in place of or in 1416 addition to the current file system instance. On first access to a 1417 file system, the client should obtain the set of alternate locations 1418 by interrogating the fs_locations or fs_locations_info attribute, 1419 with the latter being preferred. 1421 In the event that server failures, communications 1422 problems, or other difficulties make continued access to the current 1423 file system impossible or otherwise impractical, the client can use 1424 the alternate locations as a way to get continued access to its data. 1426 The alternate locations may be physical replicas of the (typically 1427 read-only) file system data, or they may provide for the use of 1428 various forms of server clustering in which multiple servers provide 1429 alternate ways of accessing the same physical file system. How these 1430 different modes of file system transition are represented within the 1431 fs_locations and fs_locations_info attributes and how the client 1432 deals with file system transition issues will be discussed in detail 1433 below. 1435 5.5.5.
Updated Section 11.4.2 of [RFC5661] entitled "File System 1436 Migration" 1438 When a file system is present and becomes absent, the NFSv4.1 1439 protocol provides a means by which clients can be given the 1440 opportunity to have continued access to their data, using a different 1441 replica. The location of this replica is specified by a file system 1442 location attribute. The ensuing migration of access to another 1443 replica includes the ability to retain locks across the transition, 1444 either by using lock reclaim or by taking advantage of Transparent 1445 State Migration. 1447 Typically, a client will be accessing the file system in question, 1448 get an NFS4ERR_MOVED error, and then use a file system location 1449 attribute to determine the new location of the data. When 1450 fs_locations_info is used, additional information will be available 1451 that will define the nature of the client's handling of the 1452 transition to a new server. 1454 Such migration can be helpful in providing load balancing or general 1455 resource reallocation. The protocol does not specify how the file 1456 system will be moved between servers. It is anticipated that a 1457 number of different server-to-server transfer mechanisms might be 1458 used with the choice left to the server implementer. The NFSv4.1 1459 protocol specifies the method used to communicate the migration event 1460 between client and server. 1462 The new location may be, in the case of various forms of server 1463 clustering, another server providing access to the same physical file 1464 system. The client's responsibilities in dealing with this 1465 transition will depend on whether migration has occurred and the 1466 means the server has chosen to provide continuity of locking state. 1467 These issues will be discussed in detail below. 1469 Although a single successor location is typical, multiple locations 1470 may be provided. 
When multiple locations are provided, the client 1471 will typically use the first one provided. If that is inaccessible 1472 for some reason, later ones can be used. In such cases the client 1473 might consider the transition to the new replica to be a migration 1474 event, even though some of the servers involved might not be aware of 1475 the use of the server which was inaccessible. In such a case, a 1476 client might lose access to locking state as a result of the access 1477 transfer. 1479 When an alternate location is designated as the target for migration, 1480 it must designate the same data (with metadata being the same to the 1481 degree indicated by the fs_locations_info attribute). Where file 1482 systems are writable, a change made on the original file system must 1483 be visible on all migration targets. Where a file system is not 1484 writable but represents a read-only copy (possibly periodically 1485 updated) of a writable file system, similar requirements apply to the 1486 propagation of updates. Any change visible in the original file 1487 system must already be effected on all migration targets, to avoid 1488 any possibility that a client, in effecting a transition to the 1489 migration target, will see any reversion in file system state. 1491 5.5.6. Updated Section 11.4.3 of [RFC5661] entitled "Referrals" 1493 Referrals allow the server to associate a file system namespace entry 1494 located on one server with a file system located on another server. 1495 When this includes the use of pure referrals, servers are provided a 1496 way of placing a file system in a location within the namespace 1497 essentially without respect to its physical location on a particular 1498 server. This allows a single server or a set of servers to present a 1499 multi-server namespace that encompasses file systems located on a 1500 wider range of servers.
Some likely uses of this facility include 1501 establishment of site-wide or organization-wide namespaces, with the 1502 eventual possibility of combining them into a truly global 1503 namespace, such as the one provided by AFS (the Andrew File System) 1504 [TBD: appropriate reference needed] 1506 Referrals occur when a client determines, upon first referencing a 1507 position in the current namespace, that it is part of a new file 1508 system and that the file system is absent. When this occurs, 1509 typically upon receiving the error NFS4ERR_MOVED, the actual location 1510 or locations of the file system can be determined by fetching a 1511 file system location attribute. 1513 The file system location attribute may designate a single file system 1514 location or multiple file system locations, to be selected based on 1515 the needs of the client. The server, in the fs_locations_info 1516 attribute, may specify priorities to be associated with various file 1517 system location choices. The server may assign different priorities 1518 to different locations as reported to individual clients, in order to 1519 adapt to client physical location or to effect load balancing. When 1520 both read-only and read-write file systems are present, some of the 1521 read-only locations might not be absolutely up-to-date (as they would 1522 have to be in the case of replication and migration). Servers may 1523 also specify file system locations that include client-substituted 1524 variables so that different clients are referred to different file 1525 systems (with different data contents) based on client attributes 1526 such as CPU architecture. 1528 When the fs_locations_info attribute is such that there are 1529 multiple possible targets listed, the relationships among them may be 1530 important to the client in selecting which one to use. The same 1531 rules specified in Section 5.5.5 above regarding multiple migration 1532 targets apply to these multiple replicas as well.
For example, the 1533 client might prefer a writable target on a server that has additional 1534 writable replicas to which it subsequently might switch. Note that, 1535 as distinguished from the case of replication, there is no need to 1536 deal with the case of propagation of updates made by the current 1537 client, since the current client has not accessed the file system in 1538 question. 1540 Use of multi-server namespaces is enabled by NFSv4.1 but is not 1541 required. The use of multi-server namespaces and their scope will 1542 depend on the applications used and system administration 1543 preferences. 1545 Multi-server namespaces can be established by a single server 1546 providing a large set of pure referrals to all of the included file 1547 systems. Alternatively, a single multi-server namespace may be 1548 administratively segmented with separate referral file systems (on 1549 separate servers) for each separately administered portion of the 1550 namespace. The top-level referral file system or any segment may use 1551 replicated referral file systems for higher availability. 1553 Generally, multi-server namespaces are for the most part uniform, in 1554 that the same data made available to one client at a given location 1555 in the namespace is made available to all clients at that location. 1556 However, there are facilities provided that allow different clients 1557 to be directed to different sets of data, for reasons such as 1558 enabling adaptation to such client characteristics as CPU 1559 architecture. These facilities are described in Section 11.10.3 of 1560 [RFC5661] and in Section 5.15.3 of the current document. 1562 5.5.7. 
New section to be added after Section 11.4.3 of [RFC5661] to be 1563 entitled "Changes in a File System Location Attribute" 1565 Although clients will typically fetch a file system location 1566 attribute when first accessing a file system and when NFS4ERR_MOVED 1567 is returned, a client can choose to fetch the attribute periodically, 1568 in which case the value fetched may change over time. 1570 For clients not prepared to access multiple replicas simultaneously 1571 (see Section 5.9.1 of the current document), the handling of the 1572 various cases of location change is as follows: 1574 o Changes in the list of replicas or in the network addresses 1575 associated with replicas do not require immediate action. The 1576 client will typically update its list of replicas to reflect the 1577 new information. 1579 o Additions to the list of network addresses for the current file 1580 system instance need not be acted on promptly. However, to 1581 prepare for the case in which a migration event occurs 1582 subsequently, the client can choose to take note of the new 1583 address and then use it whenever it needs to switch access to a 1584 new replica. 1586 o Deletions from the list of network addresses for the current file 1587 system instance need not be acted on immediately, although the 1588 client might need to be prepared for a shift in access whenever 1589 the server indicates that a network access path is not usable to 1590 access the current file system, by returning NFS4ERR_MOVED. 1592 For clients that are prepared to access several replicas 1593 simultaneously, the following additional cases need to be addressed. 1594 As in the cases discussed above, changes in the set of replicas need 1595 not be acted upon promptly, although the client has the option of 1596 adjusting its access even in the absence of difficulties that would 1597 lead to a new replica being selected.
1599 o When a new replica is added which may be accessed simultaneously 1600 with one currently in use, the client is free to use the new 1601 replica immediately. 1603 o When a replica currently in use is deleted from the list, the 1604 client need not cease using it immediately. However, since the 1605 server may subsequently force such use to cease (by returning 1606 NFS4ERR_MOVED), clients might decide to limit the need for later 1607 state transfer. For example, new opens might be done on other 1608 replicas, rather than on one not present in the list. 1610 5.6. Transferred Section 11.6 of [RFC5661] entitled "Additional Client- 1611 Side Considerations" 1613 When clients make use of servers that implement referrals, 1614 replication, and migration, care should be taken that a user who 1615 mounts a given file system that includes a referral or a relocated 1616 file system continues to see a coherent picture of that user-side 1617 file system despite the fact that it contains a number of server-side 1618 file systems that may be on different servers. 1620 One important issue is upward navigation from the root of a server- 1621 side file system to its parent (specified as ".." in UNIX), in the 1622 case in which it transitions to that file system as a result of 1623 referral, migration, or a transition as a result of replication. 1625 When the client is at such a point, and it needs to ascend to the 1626 parent, it must go back to the parent as seen within the multi-server 1627 namespace rather than sending a LOOKUPP operation to the server, 1628 which would result in the parent within that server's single-server 1629 namespace. In order to do this, the client needs to remember the 1630 filehandles that represent such file system roots and use these 1631 instead of sending a LOOKUPP operation to the current server. 
This 1632 will allow the client to present to applications a consistent 1633 namespace, where upward navigation and downward navigation are 1634 consistent. 1636 Another issue concerns refresh of referral locations. When referrals 1637 are used extensively, they may change as server configurations 1638 change. It is expected that clients will cache information related 1639 to traversing referrals so that future client-side requests are 1640 resolved locally without server communication. This is usually 1641 rooted in client-side name look up caching. Clients should 1642 periodically purge this data for referral points in order to detect 1643 changes in location information. When the change_policy attribute 1644 changes for directories that hold referral entries or for the 1645 referral entries themselves, clients should consider any associated 1646 cached referral information to be out of date. 1648 5.7. New section to be added after Section 11.6 of [RFC5661] to be 1649 entitled "Overview of File Access Transitions" 1651 File access transitions are of two types: 1653 o Those that involve a transition from accessing the current replica 1654 to another one in connection with either replication or migration. 1655 How these are dealt with is discussed in Section 5.9 of the 1656 current document. 1658 o Those in which access to the current file system instance is 1659 retained, while the network path used to access that instance is 1660 changed. This case is discussed in Section 5.8 of the current 1661 document. 1663 5.8. New section to be added second after Section 11.6 of [RFC5661] to 1664 be entitled "Effecting Network Endpoint Transitions" 1666 The endpoints used to access a particular file system instance may 1667 change in a number of ways, as listed below. In each of these cases, 1668 the same fsid, filehandles, stateids, client IDs and session are used 1669 to continue access, with a continuity of lock state. 
1671 o When use of a particular address is to cease and there is also one 1672 currently in use which is server-trunkable with it, requests that 1673 would have been issued on the address whose use is to be 1674 discontinued can be issued on the remaining address(es). When an 1675 address is not a session-trunkable one, the request might need to 1676 be modified to reflect the fact that a different session will be 1677 used. 1679 o When use of a particular connection is to cease, as indicated by 1680 receiving NFS4ERR_MOVED when using that connection, but that 1681 address is still indicated as accessible according to the 1682 appropriate file system location entries, it is likely that 1683 requests can be issued on a new connection of a different 1684 connection type, once that connection is established. Since any 1685 two server endpoints that share a network address are inherently 1686 session-trunkable, the client can use BIND_CONN_TO_SESSION to 1687 access the existing session using the new connection and proceed 1688 to access the file system using the new connection. 1690 o When there are no potential replacement addresses in use but there 1691 are valid addresses session-trunkable with the one whose use is to 1692 be discontinued, the client can use BIND_CONN_TO_SESSION to access 1693 the existing session using the new address. Although the target 1694 session will generally be accessible, there may be cases in which 1695 that session is no longer accessible. In this case, the client 1696 can create a new session to enable continued access to the 1697 existing instance and provide for use of existing filehandles, 1698 stateids, and client IDs while providing continuity of locking 1699 state. 1701 o When there is no potential replacement address in use and there 1702 are no valid addresses session-trunkable with the one whose use is 1703 to be discontinued, other server-trunkable addresses may be used 1704 to provide continued access.
Although use of CREATE_SESSION is 1705 available to provide continued access to the existing instance, 1706 servers have the option of providing continued access to the 1707 existing session through the new network access path in a fashion 1708 similar to that provided by session migration (see Section 5.10 of 1709 the current document). To take advantage of this possibility, 1710 clients can perform an initial BIND_CONN_TO_SESSION, as in the 1711 previous case, and use CREATE_SESSION only if that fails. 1713 5.9. Updated Section 11.7 of [RFC5661] entitled "Effecting File System 1714 Transitions" 1716 There are a range of situations in which there is a change to be 1717 effected in the set of replicas used to access a particular file 1718 system. Some of these may involve an expansion or contraction of the 1719 set of replicas used as discussed in Section 5.9.1 below. 1721 For reasons explained in that section, most transitions will involve 1722 a transition from a single replica to a corresponding replacement 1723 replica. When effecting replica transition, some types of sharing 1724 between the replicas may affect handling of the transition as 1725 described in Sections 5.9.2 through 5.9.8 below. The attribute 1726 fs_locations_info provides helpful information to allow the client to 1727 determine the degree of inter-replica sharing. 1729 With regard to some types of state, the degree of continuity across 1730 the transition depends on the occasion prompting the transition, with 1731 transitions initiated by the servers (i.e. migration) offering much 1732 more scope for a non-disruptive transition than cases in which the 1733 client on its own shifts its access to another replica (i.e. 1734 replication). This issue potentially applies to locking state and to 1735 session state, which are dealt with below as follows: 1737 o An introduction to the possible means of providing continuity in 1738 these areas appears in Section 5.9.9 below. 
1740 o Transparent State Migration is introduced in Section 5.10 of the 1741 current document. The possible transfer of session state is 1742 addressed there as well. 1744 o The client handling of transitions, including determining how to 1745 deal with the various means that the server might take to supply 1746 effective continuity of locking state is discussed in Section 5.11 1747 of the current document. 1749 o The servers' (source and destination) responsibilities in 1750 effecting Transparent Migration of locking and session state are 1751 discussed in Section 5.12 of the current document. 1753 5.9.1. Updated Section 11.7.1 of [RFC5661] entitled "File System 1754 Transitions and Simultaneous Access" 1756 The fs_locations_info attribute (described in Section 11.10.1 of 1757 [RFC5661] and Section 5.15 of this document) may indicate that two 1758 replicas may be used simultaneously (see Section 11.7.2.1 of 1759 [RFC5661] for details). Although situations in which multiple 1760 replicas may be accessed simultaneously are somewhat similar to those 1761 in which a single replica is accessed by multiple network addresses, 1762 there are important differences, since locking state is not shared 1763 among multiple replicas. 1765 Because of this difference in state handling, many clients will not 1766 have the ability to take advantage of the fact that such replicas 1767 represent the same data. Such clients will not be prepared to use 1768 multiple replicas simultaneously but will access each file system 1769 using only a single replica, although the replica selected might make 1770 multiple server-trunkable addresses available. 1772 Clients who are prepared to use multiple replicas simultaneously will 1773 divide opens among replicas however they choose. Once that choice is 1774 made, any subsequent transitions will treat the set of locking state 1775 associated with each replica as a single entity. 
1777 For example, if one of the replicas becomes unavailable, access will 1778 be transferred to a different replica, also capable of simultaneous 1779 access with the one still in use. 1781 When there is no such replica, the transition may be to the replica 1782 already in use. At this point, the client has a choice between 1783 merging the locking state for the two replicas under the aegis of the 1784 sole replica in use or treating these separately, until another 1785 replica capable of simultaneous access presents itself. 1787 5.9.2. Updated Section 11.7.3 of [RFC5661] entitled "Filehandles and 1788 File System Transitions" 1790 There are a number of ways in which filehandles can be handled across 1791 a file system transition. These can be divided into two broad 1792 classes depending upon whether the two file systems across which the 1793 transition happens share sufficient state to effect some sort of 1794 continuity of file system handling. 1796 When there is no such cooperation in filehandle assignment, the two 1797 file systems are reported as being in different handle classes. In 1798 this case, all filehandles are assumed to expire as part of the file 1799 system transition. Note that this behavior does not depend on the 1800 fh_expire_type attribute and supersedes the specification of the 1801 FH4_VOL_MIGRATION bit, which only affects behavior when 1802 fs_locations_info is not available. 1804 When there is cooperation in filehandle assignment, the two file 1805 systems are reported as being in the same handle class. In this 1806 case, persistent filehandles remain valid after the file system 1807 transition, while volatile filehandles (excluding those that are only 1808 volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration 1809 on the target server. 1811 5.9.3.
Updated Section 11.7.4 of [RFC5661] entitled "Fileids and File 1812 System Transitions" 1814 In NFSv4.0, the issue of continuity of fileids in the event of a file 1815 system transition was not addressed. The general expectation had 1816 been that in situations in which the two file system instances are 1817 created by a single vendor using some sort of file system image copy, 1818 fileids would be consistent across the transition, while in the 1819 analogous multi-vendor transitions they would not. This poses 1820 difficulties, especially for the client without special knowledge of 1821 the transition mechanisms adopted by the server. Note that although 1822 fileid is not a REQUIRED attribute, many servers support fileids and 1823 many clients provide APIs that depend on fileids. 1825 It is important to note that while clients themselves may have no 1826 trouble with a fileid changing as a result of a file system 1827 transition event, applications do typically have access to the fileid 1828 (e.g., via stat). The result is that an application may work 1829 perfectly well if there is no file system instance transition or if 1830 any such transition is among instances created by a single vendor, 1831 yet be unable to deal with the situation in which a multi-vendor 1832 transition occurs at the wrong time. 1834 Providing the same fileids in a multi-vendor (multiple server 1835 vendors) environment has generally been held to be quite difficult. 1836 While there is work to be done, it needs to be pointed out that this 1837 difficulty is partly self-imposed. Servers have typically identified 1838 fileid with inode number, i.e. with a quantity used to find the file 1839 in question. This identification poses special difficulties for 1840 migration of a file system between vendors where assigning the same 1841 index to a given file may not be possible. 
Note here that a fileid 1842 is not required to be useful to find the file in question, only to 1843 be unique within the given file system. Servers prepared to 1844 accept a fileid as a single piece of metadata and store it apart from 1845 the value used to index the file information can relatively easily 1846 maintain a fileid value across a migration event, allowing a truly 1847 transparent migration event. 1849 In any case, where servers can provide continuity of fileids, they 1850 should, and the client should be able to find out that such 1851 continuity is available and take appropriate action. Information 1852 about the continuity (or lack thereof) of fileids across a file 1853 system transition is represented by specifying whether the file 1854 systems in question are of the same fileid class. 1856 Note that when consistent fileids do not exist across a transition 1857 (either because there is no continuity of fileids or because fileid 1858 is not a supported attribute on one of the instances involved), and there 1859 are no reliable filehandles across a transition event (either because 1860 there is no filehandle continuity or because the filehandles are 1861 volatile), the client is in a position where it cannot verify that 1862 files it was accessing before the transition are the same objects. 1863 It is forced to assume that no object has been renamed, and, unless 1864 there are guarantees that provide this (e.g., the file system is 1865 read-only), problems for applications may occur. Therefore, use of 1866 such configurations should be limited to situations where the 1867 problems that this may cause can be tolerated. 1869 5.9.4. Updated Section 11.7.5 of [RFC5661] entitled "Fsids and File 1870 System Transitions" 1872 Since fsids are generally only unique on a per-server basis, it is 1873 likely that they will change during a file system transition.
1874 Clients should not make the fsids received from the server visible to 1875 applications since they may not be globally unique, and because they 1876 may change during a file system transition event. Applications are 1877 best served if they are isolated from such transitions to the extent 1878 possible. 1880 Although normally a single source file system will transition to a 1881 single target file system, there is a provision for splitting a 1882 single source file system into multiple target file systems, by 1883 specifying the FSLI4F_MULTI_FS flag. 1885 5.9.4.1. Updated Section 11.7.5.1 of [RFC5661] entitled "File System 1886 Splitting" 1888 When a file system transition is made and the fs_locations_info 1889 indicates that the file system in question might be split into 1890 multiple file systems (via the FSLI4F_MULTI_FS flag), the client 1891 SHOULD do GETATTRs to determine the fsid attribute on all known 1892 objects within the file system undergoing transition to determine the 1893 new file system boundaries. 1895 Clients might choose to maintain the fsids passed to existing 1896 applications by mapping all of the fsids for the descendant file 1897 systems to the common fsid used for the original file system. 1899 Splitting a file system can be done on a transition between file 1900 systems of the same fileid class, since the fact that fileids are 1901 unique within the source file system ensures they will be unique in 1902 each of the target file systems. 1904 5.9.5. Updated Section 11.7.6 of [RFC5661] entitled "The Change 1905 Attribute and File System Transitions" 1907 Since the change attribute is defined as a server-specific one, 1908 change attributes fetched from one server are normally presumed to be 1909 invalid on another server. Such a presumption is troublesome since 1910 it would invalidate all cached change attributes, requiring 1911 refetching.
Even more disruptive, the absence of any assured 1912 continuity for the change attribute means that even if the same value 1913 is retrieved on refetch, no conclusions can be drawn as to whether 1914 the object in question has changed. The identical change attribute 1915 could be merely an artifact of a modified file with a different 1916 change attribute construction algorithm, with that new algorithm just 1917 happening to result in an identical change value. 1919 When the two file systems have consistent change attribute formats, 1920 and this fact is communicated to the client by reporting them as being in the same 1921 change class, the client may assume a continuity of change attribute 1922 construction and handle this situation just as it would be handled 1923 without any file system transition. 1925 5.9.6. Updated Section 11.7.8 of [RFC5661] entitled "Write Verifiers 1926 and File System Transitions" 1928 In a file system transition, the two file systems might be clustered 1929 in the handling of unstably written data. When this is the case, and 1930 the two file systems belong to the same write-verifier class, write 1931 verifiers returned from one system may be compared to those returned 1932 by the other and superfluous writes avoided. 1934 When two file systems belong to different write-verifier classes, any 1935 verifier generated by one must not be compared to one provided by the 1936 other. Instead, the two verifiers should be treated as not equal 1937 even when the values are identical. 1939 5.9.7. Updated Section 11.7.9 of [RFC5661] entitled "Readdir Cookies 1940 and Verifiers and File System Transitions" 1942 In a file system transition, the two file systems might be consistent 1943 in their handling of READDIR cookies and verifiers.
When this is the 1944 case, and the two file systems belong to the same readdir class, 1945 READDIR cookies and verifiers from one system may be recognized by 1946 the other and READDIR operations started on one server may be validly 1947 continued on the other, simply by presenting the cookie and verifier 1948 returned by a READDIR operation done on the first file system to the 1949 second. 1951 When two file systems belong to different readdir classes, any 1952 READDIR cookie and verifier generated by one is not valid on the 1953 second, and must not be presented to that server by the client. The 1954 client should act as if the verifier was rejected. 1956 5.9.8. Updated Section 11.7.10 entitled "File System Data and File 1957 System Transitions" 1959 When multiple replicas exist and are used simultaneously or in 1960 succession by a client, applications using them will normally expect 1961 that they contain either the same data or data that is consistent 1962 with the normal sorts of changes that are made by other clients 1963 updating the data of the file system (with metadata being the same to 1964 the degree indicated by the fs_locations_info attribute). However, 1965 when multiple file systems are presented as replicas of one another, 1966 the precise relationship between the data of one and the data of 1967 another is not, as a general matter, specified by the NFSv4.1 1968 protocol. It is quite possible to present as replicas file systems 1969 where the data of those file systems is sufficiently different that 1970 some applications have problems dealing with the transition between 1971 replicas. The namespace will typically be constructed so that 1972 applications can choose an appropriate level of support, so that in 1973 one position in the namespace a varied set of replicas will be 1974 listed, while in another only those that are up-to-date may be 1975 considered replicas. 
The protocol does define three special cases of 1976 the relationship among replicas to be specified by the server and 1977 relied upon by clients: 1979 o When multiple replicas exist and are used simultaneously by a 1980 client (see the FSLIB4_CLSIMUL definition within 1981 fs_locations_info), they must designate the same data. Where file 1982 systems are writable, a change made on one instance must be 1983 visible on all instances, immediately upon the earlier of the 1984 return of the modifying requester or the visibility of that change 1985 on any of the associated replicas. This allows a client to use 1986 these replicas simultaneously without any special adaptation to 1987 the fact that there are multiple replicas, beyond adapting to the 1988 fact that locks obtained on one replica are maintained separately 1989 (i.e. under a different client ID). In this case, locks (whether 1990 share reservations or byte-range locks) and delegations obtained 1991 on one replica are immediately reflected on all replicas, in the 1992 sense that access from all other servers is prevented regardless 1993 of the replica used. However, because the servers are not 1994 required to treat two associated client IDs as representing the 1995 same client, it is best to access each file using only a single 1996 client ID. 1998 o When one replica is designated as the successor instance to 1999 another existing instance after the return of NFS4ERR_MOVED (i.e., the 2000 case of migration), the client may depend on the fact that all 2001 changes written to stable storage on the original instance are 2002 written to stable storage of the successor (uncommitted writes are 2003 dealt with in Section 5.9.6 above). 2005 o Where a file system is not writable but represents a read-only 2006 copy (possibly periodically updated) of a writable file system, 2007 clients have similar requirements with regard to the propagation 2008 of updates.
They may need a guarantee that any change visible on 2009 the original file system instance must be immediately visible on 2010 any replica before the client transitions access to that replica, 2011 in order to avoid any possibility that a client, in effecting a 2012 transition to a replica, will see any reversion in file system 2013 state. The specific means of this guarantee varies based on the 2014 value of the fss_type field that is reported as part of the 2015 fs_status attribute (see Section 11.11 of [RFC5661]). Since these 2016 file systems are presumed to be unsuitable for simultaneous use, 2017 there is no specification of how locking is handled; in general, 2018 locks obtained on one file system will be separate from those on 2019 others. Since these are expected to be read-only file systems, 2020 this is not likely to pose an issue for clients or applications. 2022 5.9.9. Updated Section 11.7.7 entitled "Lock State and File System 2023 Transitions" 2025 While accessing a file system, clients obtain locks enforced by the 2026 server which may prevent actions by other clients that are 2027 inconsistent with those locks. 2029 When access is transferred between replicas, clients need to be 2030 assured that the actions disallowed by holding these locks cannot 2031 have occurred during the transition. This can be ensured by the 2032 methods below. Unless at least one of these is implemented, clients 2033 will not be assured of continuity of lock possession across a 2034 migration event. 2036 o Providing the client an opportunity to re-obtain its locks via a 2037 per-fs grace period on the destination server. Because the lock 2038 reclaim mechanism was originally defined to support server reboot, 2039 it implicitly assumes that file handles on reclaim will be 2040 the same as those at open.
In the case of migration, this 2041 requires that source and destination servers use the same 2042 filehandles, as evidenced by using the same server scope (see 2043 Section 4.2) or by showing this agreement using fs_locations_info 2044 (see Section 5.9.2 above). 2046 o Locking state can be transferred as part of the transition by 2047 providing Transparent State Migration as described in Section 5.10 2048 of the current document. 2050 Of these, Transparent State Migration provides the smoother 2051 experience for clients in that there is no grace-period-based delay 2052 before new locks can be obtained. However, it requires a greater 2053 degree of inter-server co-ordination. In general, the servers taking 2054 part in migration are free to provide either facility. However, when 2055 the filehandles can differ across the migration event, Transparent 2056 State Migration is the only available means of providing the needed 2057 functionality. 2059 It should be noted that these two methods are not mutually exclusive 2060 and that a server might well provide both. In particular, if there 2061 is some circumstance preventing a specific lock from being 2062 transferred transparently, the destination server can allow it to be 2063 reclaimed, by implementing a per-fs grace period for the migrated 2064 file system. 2066 5.9.9.1. Transferred Section 11.7.7.1 [RFC5661] entitled "Leases and 2067 File System Transitions" 2069 In the case of lease renewal, the client may not be submitting 2070 requests for a file system that has been transferred to another 2071 server. This can occur because of the lease renewal mechanism. The 2072 client renews the lease associated with all file systems when 2073 submitting a request on an associated session, regardless of the 2074 specific file system being referenced. 
2076 In order for the client to schedule renewal of its lease where there 2077 is locking state that may have been relocated to the new server, the 2078 client must find out about lease relocation before that lease expires. 2079 To accomplish this, the SEQUENCE operation will return the status bit 2080 SEQ4_STATUS_LEASE_MOVED if responsibility for any of the renewed 2081 locking state has been transferred to a new server. This will 2082 continue until the client receives an NFS4ERR_MOVED error for each of 2083 the file systems for which there has been locking state relocation. 2085 When a client receives a SEQ4_STATUS_LEASE_MOVED indication from a 2086 server, it should perform an operation on each file system of that 2087 server for which it has locking state. For 2088 simplicity, the client may choose to reference all file systems, but 2089 what is important is that it must reference all file systems for 2090 which there was locking state where that state has moved. Once the 2091 client receives an NFS4ERR_MOVED error for each such file system, the 2092 server will clear the SEQ4_STATUS_LEASE_MOVED indication. The client 2093 can terminate the process of checking file systems once this 2094 indication is cleared (but only if the client has received a reply 2095 for all outstanding SEQUENCE requests on all sessions it has with the 2096 server), since there are no others for which locking state has moved. 2098 A client may use GETATTR of the fs_status (or fs_locations_info) 2099 attribute on all of the file systems to get absence indications in a 2100 single (or a few) request(s), since absent file systems will not 2101 cause an error in this context. However, it still must do an 2102 operation that receives NFS4ERR_MOVED on each file system, in order 2103 to clear the SEQ4_STATUS_LEASE_MOVED indication.
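The probing procedure described above can be illustrated with a minimal sketch. It is not part of the protocol specification. The FakeSourceServer class, its access method, and the find_moved_file_systems function are hypothetical names introduced only for illustration, and the in-memory server merely imitates the behavior described in the text: it reports the lease-moved indication until NFS4ERR_MOVED has been delivered for every file system whose locking state was relocated. The sketch assumes a single-threaded client with no other outstanding SEQUENCE requests, which is the condition under which the text allows the scan to stop early.

```python
from dataclasses import dataclass, field

NFS4_OK = "NFS4_OK"
NFS4ERR_MOVED = "NFS4ERR_MOVED"


@dataclass
class FakeSourceServer:
    """Hypothetical in-memory stand-in for a source server (not a real API)."""
    moved: set                                    # file systems whose state moved
    acked: set = field(default_factory=set)       # NFS4ERR_MOVED already delivered
    probed: list = field(default_factory=list)    # request log, for inspection

    def access(self, fs):
        """Model one COMPOUND touching fs; returns (status, lease_moved)."""
        self.probed.append(fs)
        # Assumption: the SEQUENCE status bit is decided before the rest of
        # the COMPOUND is processed, so it reflects the state before this call.
        lease_moved = bool(self.moved - self.acked)
        if fs in self.moved:
            self.acked.add(fs)
            return NFS4ERR_MOVED, lease_moved
        return NFS4_OK, lease_moved


def find_moved_file_systems(server, fs_with_locks):
    """Probe file systems holding locking state until LEASE_MOVED clears."""
    moved = []
    for fs in fs_with_locks:
        status, lease_moved = server.access(fs)
        if status == NFS4ERR_MOVED:
            moved.append(fs)
        if not lease_moved:
            # Stopping here is safe only because this sketch never has
            # other SEQUENCE requests outstanding.
            break
    return moved


server = FakeSourceServer(moved={"fs2", "fs4"})
result = find_moved_file_systems(
    server, ["fs1", "fs2", "fs3", "fs4", "fs5", "fs6"])
```

In this sketch the scan stops one request after the last relocated file system is referenced, because the indication seen on a reply lags the request that cleared it. A real client would then obtain location information for each file system in the returned list.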
2105 Once the set of file systems with transferred locking state has been 2106 determined, the client can follow the normal process to obtain the 2107 new server information (through the fs_locations and 2108 fs_locations_info attributes) and perform renewal of that lease on 2109 the new server, unless information in the fs_locations_info attribute 2110 shows that no state could have been transferred. If the server has 2111 not had state transferred to it transparently, the client will 2112 receive NFS4ERR_STALE_CLIENTID from the new server, as described 2113 above, and the client can then reclaim locks as is done in the event 2114 of server failure. 2116 5.9.9.2. Transferred Section 11.7.7.2 of [RFC5661] entitled 2117 "Transitions and the Lease_time Attribute" 2119 In order that the client may appropriately manage its lease in the 2120 case of a file system transition, the destination server must 2121 establish proper values for the lease_time attribute. 2123 When state is transferred transparently, that state should include 2124 the correct value of the lease_time attribute. The lease_time 2125 attribute on the destination server must never be less than that on 2126 the source, since this would result in premature expiration of a 2127 lease granted by the source server. Upon transitions in which state 2128 is transferred transparently, the client is under no obligation to 2129 refetch the lease_time attribute and may continue to use the value 2130 previously fetched (on the source server). 2132 If state has not been transferred transparently, either because the 2133 associated servers are shown as having different eir_server_scope 2134 strings or because the client ID is rejected when presented to the 2135 new server, the client should fetch the value of lease_time on the 2136 new (i.e., destination) server, and use it for subsequent locking 2137 requests. 
However, the server must respect a grace period of at 2138 least as long as the lease_time on the source server, in order to 2139 ensure that clients have ample time to reclaim their lock before 2140 potentially conflicting non-reclaimed locks are granted. 2142 5.10. New section to be added after Section 11.7 of [RFC5661] to be 2143 entitled "Transferring State upon Migration" 2145 When the transition is a result of a server-initiated decision to 2146 transition access and the source and destination servers have 2147 implemented appropriate co-operation, it is possible to: 2149 o Transfer locking state from the source to the destination server, 2150 in a fashion similar to that provided by Transparent State 2151 Migration in NFSv4.0, as described in [RFC7931]. Server 2152 responsibilities are described in Section 5.12.2 of the current 2153 document. 2155 o Transfer session state from the source to the destination server. 2156 Server responsibilities in effecting such a transfer are described 2157 in Section 5.12.3 of the current document. 2159 The means by which the client determines which of these transfer 2160 events has occurred are described in Section 5.11 of the current 2161 document. 2163 5.10.1. Only sub-section within new section to be added to [RFC5661] to 2164 be entitled "Transparent State Migration and pNFS" 2166 When pNFS is involved, the protocol is capable of supporting: 2168 o Migration of the Metadata Server (MDS), leaving the Data Servers 2169 (DS's) in place. 2171 o Migration of the file system as a whole, including the MDS and 2172 associated DS's. 2174 o Replacement of one DS by another. 2176 o Migration of a pNFS file system to one in which pNFS is not used. 2178 o Migration of a file system not using pNFS to one in which layouts 2179 are available. 2181 Note that migration per se is only involved in the transfer of the 2182 MDS function. 
Although the servicing of a layout may be transferred 2183 from one data server to another, this is not done using the file system 2184 location attributes. The MDS can effect such transfers by recalling/ 2185 revoking existing layouts and granting new ones on a different data 2186 server. 2188 Migration of the MDS function is directly supported by Transparent 2189 State Migration. Layout state will normally be transparently 2190 transferred, just as other state is. As a result, Transparent State 2191 Migration provides a framework in which, given appropriate inter-MDS 2192 data transfer, one MDS can be substituted for another. 2194 Migration of the file system function as a whole can be accomplished 2195 by recalling all layouts as part of the initial phase of the 2196 migration process. As a result, IO will be done through the MDS 2197 during the migration process, and new layouts can be granted once the 2198 client is interacting with the new MDS. An MDS can also effect this 2199 sort of transition by revoking all layouts as part of Transparent 2200 State Migration, as long as the client is notified about the loss of 2201 locking state. 2203 In order to allow migration to a file system on which pNFS is not 2204 supported, clients need to be prepared for a situation in which 2205 layouts are not available or supported on the destination file system 2206 and so direct IO requests to the destination server, rather than 2207 depending on layouts being available. 2209 Replacement of one DS by another is not addressed by migration as 2210 such but can be effected by an MDS recalling layouts for the DS to be 2211 replaced and issuing new ones to be served by the successor DS. 2213 Migration may transfer a file system from a server which does not 2214 support pNFS to one which does.
In order to properly adapt to this 2215 situation, clients which support pNFS, but function adequately in its 2216 absence, should check for pNFS support when a file system is migrated 2217 and be prepared to use pNFS when support is available on the 2218 destination. 2220 5.11. New section to be added second after Section 11.7 of [RFC5661] to 2221 be entitled "Client Responsibilities when Access is Transitioned" 2223 For a client to respond to an access transition, it must become aware 2224 of it. The ways in which this can happen are discussed in 2225 Section 5.11.1, which covers indications that a specific file 2226 system access path has transitioned as well as situations in which 2227 additional activity is necessary to determine the set of file systems 2228 that have been migrated. Section 5.11.2 goes on to complete the 2229 discussion of how the set of migrated file systems might be 2230 determined. Sections 5.11.3 through 5.11.5 discuss how the client 2231 should deal with each transition it becomes aware of, either directly 2232 or as a result of migration discovery. 2234 The following terms are used to describe client activities: 2236 o "Transition recovery" refers to the process of restoring access to 2237 a file system on which NFS4ERR_MOVED was received. 2239 o "Migration recovery" refers to that subset of transition recovery which 2240 applies when the file system has migrated to a different replica. 2242 o "Migration discovery" refers to the process of determining which 2243 file system(s) have been migrated. It is necessary to avoid a 2244 situation in which leases could expire when a file system is not 2245 accessed for a long period of time, since a client unaware of the 2246 migration might be referencing an unmigrated file system and not 2247 renewing the lease associated with the migrated file system. 2249 5.11.1.
First sub-section within new section to be added to [RFC5661] 2250 to be entitled "Client Transition Notifications" 2252 When there is a change in the network access path which a client is 2253 to use to access a file system, there are a number of related status 2254 indications with which clients need to deal: 2256 o If an attempt is made to use or return a filehandle within a file 2257 system that is no longer accessible at the address previously used 2258 to access it, the error NFS4ERR_MOVED is returned. 2260 Exceptions are made to allow such file handles to be used when 2261 interrogating a file system location attribute. This enables a 2262 client to determine a new replica's location or a new network 2263 access path. 2265 This condition continues on subsequent attempts to access the file 2266 system in question. The only way the client can avoid the error 2267 is to cease accessing the file system in question at its old 2268 server location and access it instead using a different address at 2269 which it is now available. 2271 o Whenever a SEQUENCE operation is sent by a client to a server 2272 which generated state held on that client which is associated with 2273 a file system that is no longer accessible on the server at which 2274 it was previously available, the response will contain a lease- 2275 migrated indication, with the SEQ4_STATUS_LEASE_MOVED status bit 2276 being set. 2278 This condition continues until the client acknowledges the 2279 notification by fetching a file system location attribute for the 2280 file system whose network access path is being changed. When 2281 there are multiple such file systems, a location attribute for 2282 each such file system needs to be fetched. The location attribute 2283 for every migrated file system needs to be fetched in order to clear 2284 the condition.
Even after the condition is cleared, the client 2285 needs to respond by using the location information to access the 2286 file system at its new location to ensure that leases are not 2287 needlessly expired. 2289 Unlike the case of NFSv4.0, in which the corresponding conditions are 2290 both errors and thus mutually exclusive, in NFSv4.1 the client can, 2291 and often will, receive both indications on the same request. As a 2292 result, implementations need to address the question of how to co- 2293 ordinate the necessary recovery actions when both indications arrive 2294 in the response to the same request. It should be noted that when 2295 processing an NFSv4 COMPOUND, the server will normally decide whether 2296 SEQ4_STATUS_LEASE_MOVED is to be set before it determines which file 2297 system will be referenced or whether NFS4ERR_MOVED is to be returned. 2299 Since these indications are not mutually exclusive in NFSv4.1, the 2300 following combinations are possible results when a COMPOUND is 2301 issued: 2303 o The COMPOUND status is NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED 2304 is asserted. 2306 In this case, transition recovery is required. While it is 2307 possible that migration discovery is needed in addition, it is 2308 likely that only the accessed file system has transitioned. In 2309 any case, because addressing NFS4ERR_MOVED is necessary to allow 2310 the rejected requests to be processed on the target, dealing with 2311 it will typically have priority over migration discovery. 2313 o The COMPOUND status is NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED 2314 is clear. 2316 In this case, transition recovery is also required. It is clear 2317 that migration discovery is not needed to find file systems that 2318 have been migrated other than the one returning NFS4ERR_MOVED. 2319 Cases in which this result can arise include a referral or a 2320 migration for which there is no associated locking state.
This 2321 can also arise in cases in which an access path transition other 2322 than migration occurs within the same server. In such a case, 2323 there is no need to set SEQ4_STATUS_LEASE_MOVED, since the lease 2324 remains associated with the current server even though the access 2325 path has changed. 2327 o The COMPOUND status is not NFS4ERR_MOVED and 2328 SEQ4_STATUS_LEASE_MOVED is asserted. 2330 In this case, no transition recovery activity is required on the 2331 file system(s) accessed by the request. However, to prevent 2332 avoidable lease expiration, migration discovery needs to be done. 2334 o The COMPOUND status is not NFS4ERR_MOVED and 2335 SEQ4_STATUS_LEASE_MOVED is clear. 2337 In this case, neither transition-related activity nor migration 2338 discovery is required. 2340 Note that the specified actions only need to be taken if they are not 2341 already going on. For example, when NFS4ERR_MOVED is received when 2342 accessing a file system for which transition recovery is already going 2343 on, the client merely waits for that recovery to be completed, while 2344 the receipt of a SEQ4_STATUS_LEASE_MOVED indication only needs to 2345 initiate migration discovery for a server if such discovery is not 2346 already underway for that server. 2348 The fact that a lease-migrated condition does not result in an error 2349 in NFSv4.1 has a number of important consequences. In addition to 2350 the fact, discussed above, that the two indications are not mutually 2351 exclusive, there are a number of issues that are important in 2352 considering implementation of migration discovery, as discussed in 2353 Section 5.11.2. 2355 Because of the absence of NFS4ERR_LEASE_MOVED, it is possible for 2356 file systems whose access path has not changed to be successfully 2357 accessed on a given server even though recovery is necessary for 2358 other file systems on the same server.
As a result, access can go on 2359 while: 2361 o The migration discovery process is going on for that server. 2363 o The transition recovery process is going on for other file 2364 systems connected to that server. 2366 5.11.2. Second sub-section within new section to be added to [RFC5661] 2367 to be entitled "Performing Migration Discovery" 2369 Migration discovery can be performed in the same context as 2370 transition recovery, allowing recovery for each migrated file system 2371 to be invoked as it is discovered. Alternatively, it may be done in 2372 a separate migration discovery thread, allowing migration discovery 2373 to be done in parallel with one or more instances of transition 2374 recovery. 2376 In either case, because the lease-migrated indication does not result 2377 in an error, other access to file systems on the server can proceed 2378 normally, with the possibility that further such indications will be 2379 received, raising the issue of how such indications are to be dealt 2380 with. In general: 2382 o No action needs to be taken for such indications received by 2383 those performing migration discovery, since continuation of that 2384 work will address the issue. 2386 o In other cases in which migration discovery is currently being 2387 performed, nothing further needs to be done to respond to such 2388 lease-migrated indications, as long as one can be certain that 2389 the migration discovery process would deal with those indications. 2390 See below for details. 2392 o For such indications received in all other contexts, the 2393 appropriate response is to initiate or otherwise provide for the 2394 execution of migration discovery for file systems associated with 2395 the server IP address returning the indication. 2397 This leaves a potential difficulty in situations in which the 2398 migration discovery process is near to completion but is still 2399 operating.
   One should not ignore a LEASE_MOVED indication if the
   migration discovery process is not able to respond to the discovery
   of additional migrating file systems without additional aid. A
   further complexity relevant in addressing such situations is that a
   lease-migrated indication may reflect the server's state at the time
   the SEQUENCE operation was processed, which may be different from
   that in effect at the time the response is received. Because new
   migration events may occur at any time, and because a LEASE_MOVED
   indication may reflect the situation in effect a considerable time
   before the indication is received, special care needs to be taken to
   ensure that LEASE_MOVED indications are not inappropriately ignored.

   A useful approach to this issue involves the use of separate
   externally-visible migration discovery states for each server.
   Separate values could represent the various possible states for the
   migration discovery process for a server:

   o  non-operation, in which migration discovery is not being
      performed.

   o  normal operation, in which there is an ongoing scan for migrated
      file systems.

   o  completion/verification of migration discovery processing, in
      which the possible completion of migration discovery processing
      needs to be verified.

   Given that framework, migration discovery processing would proceed as
   follows.

   o  While in the normal-operation state, the thread performing
      discovery would fetch, for successive file systems known to the
      client on the server being worked on, a file system location
      attribute plus the fs_status attribute.

   o  If the fs_status attribute indicates that the file system is a
      migrated one (i.e.,
      fss_absent is true and fss_type != STATUS4_REFERRAL), then it is
      likely that the fetch of the file system location attribute has
      cleared one of the file systems contributing to the lease-migrated
      indication.

   o  In cases in which that happened, the thread cannot know whether
      the lease-migrated indication has been cleared and so it enters
      the completion/verification state and proceeds to issue a COMPOUND
      to see if the LEASE_MOVED indication has been cleared.

   o  When the discovery process is in the completion/verification
      state, if other requests get a lease-migrated indication they note
      that it was received. Later, the existence of such indications
      is used when the request completes, as described below.

   When the request used in the completion/verification state completes:

   o  If a lease-migrated indication is returned, the discovery
      continues normally. Note that this is so even if all file systems
      have been traversed, since new migrations could have occurred
      while the process was going on.

   o  Otherwise, if there is any record that other requests saw a
      lease-migrated indication while the request was going on, that
      record is cleared and the verification request retried. The
      discovery process remains in completion/verification state.

   o  If there have been no lease-migrated indications, the work of
      migration discovery is considered completed and it enters the
      non-operating state. Once it enters this state, subsequent
      lease-migrated indications will trigger a new migration discovery
      process.

   It should be noted that the process described above is not guaranteed
   to terminate, as a long series of new migration events might
   continually delay the clearing of the LEASE_MOVED indication.
   To prevent unnecessary lease expiration, it is appropriate for clients
   to use the discovery of migrations to effect lease renewal
   immediately, rather than waiting for clearing of the LEASE_MOVED
   indication when the complete set of migrations is available.

5.11.3.  Third sub-section within new section to be added to [RFC5661]
         to be entitled "Overview of Client Response to NFS4ERR_MOVED"

   This section outlines a way in which a client that receives
   NFS4ERR_MOVED can effect transition recovery by using a new server or
   server endpoint if one is available. As part of that process, it
   will determine:

   o  Whether the NFS4ERR_MOVED indicates migration has occurred, or
      whether it indicates another sort of file system access transition
      as discussed in Section 5.8 above.

   o  In the case of migration, whether Transparent State Migration has
      occurred.

   o  Whether any state has been lost during the process of Transparent
      State Migration.

   o  Whether sessions have been transferred as part of Transparent
      State Migration.

   During the first phase of this process, the client proceeds to
   examine file system location entries to find the initial network
   address it will use to continue access to the file system or its
   replacement. For each location entry that the client examines, the
   process consists of five steps:

   1.  Performing an EXCHANGE_ID directed at the location address. This
       operation is used to register the client owner (in the form of a
       client_owner4) with the server, to obtain a client ID to be used
       subsequently to communicate with it, to obtain that client ID's
       confirmation status, and to determine server_owner and scope for
       the purpose of determining if the entry is trunkable with that
       previously being used to access the file system (i.e.,
       that it represents another network access path to the same file
       system and can share locking state with it).

   2.  Making an initial determination of whether migration has
       occurred. The initial determination will be based on whether the
       EXCHANGE_ID results indicate that the current location element is
       server-trunkable with that used to access the file system when
       access was terminated by receiving NFS4ERR_MOVED. If it is, then
       migration has not occurred. In that case, the transition is
       dealt with, at least initially, as one involving continued access
       to the same file system on the same server through a new network
       address.

   3.  Obtaining access to existing session state or creating new
       sessions. How this is done depends on the initial determination
       of whether migration has occurred and can be done as described in
       Section 5.11.4 below in the case of migration or as described in
       Section 5.11.5 below in the case of a network address transfer
       without migration.

   4.  Verification of the trunking relationship assumed in step 2 as
       discussed in Section 2.10.5.1 of [RFC5661]. Although this step
       will generally confirm the initial determination, it is possible
       for verification to fail with the result that an initial
       determination that a network address shift (without migration)
       has occurred may be invalidated and migration determined to have
       occurred. There is no need to redo step 3 above, since it will
       be possible to continue use of the session established already.

   5.  Obtaining access to existing locking state and/or reobtaining it.
       How this is done depends on the final determination of whether
       migration has occurred and can be done as described below in
       Section 5.11.4 in the case of migration or as described in
       Section 5.11.5 in the case of a network address transfer without
       migration.
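
   The initial determination made in step 2 can be sketched as a
   comparison of two EXCHANGE_ID results. The sketch below is
   illustrative only: the ExchangeIdResult class and the function names
   are hypothetical, and the comparison rules (same server scope and
   same so_major_id for server-trunkable, plus the same so_minor_id for
   session-trunkable) are assumed from Section 2.10.5 of [RFC5661]. The
   verification required by step 4 is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    """Hypothetical holder for the EXCHANGE_ID reply fields compared here."""
    server_scope: bytes   # corresponds to eir_server_scope
    so_major_id: bytes    # corresponds to eir_server_owner.so_major_id
    so_minor_id: int      # corresponds to eir_server_owner.so_minor_id


def server_trunkable(old: ExchangeIdResult, new: ExchangeIdResult) -> bool:
    # Assumed rule: same scope and same major ID.
    return (old.server_scope == new.server_scope
            and old.so_major_id == new.so_major_id)


def session_trunkable(old: ExchangeIdResult, new: ExchangeIdResult) -> bool:
    # Assumed rule: server-trunkable and, in addition, same minor ID.
    return server_trunkable(old, new) and old.so_minor_id == new.so_minor_id


def initial_determination(old: ExchangeIdResult, new: ExchangeIdResult) -> str:
    """Step 2: server-trunkable means no migration, pending verification."""
    return "address-transfer" if server_trunkable(old, new) else "migration"
```

   A result of "address-transfer" is only provisional; if the later
   verification fails, the client would treat the transition as a
   migration.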

   Once the initial address has been determined, clients are free to
   apply an abbreviated process to find additional addresses trunkable
   with it (clients may seek session-trunkable or server-trunkable
   addresses depending on whether they support clientid trunking).
   During this later phase of the process, further location entries are
   examined using the abbreviated procedure specified below:

   1.  Before the EXCHANGE_ID, the fs name of the location entry is
       examined and if it does not match that currently being used, the
       entry is ignored. Otherwise, one proceeds as specified by step 1
       above.

   2.  In the case that the network address is session-trunkable with
       one used previously, a BIND_CONN_TO_SESSION is used to access
       that session using the new network address. Otherwise, or if the
       bind operation fails, a CREATE_SESSION is done.

   3.  The verification procedure referred to in step 4 above is used.
       However, if it fails, the entry is ignored and the next available
       entry is used.

5.11.4.  Fourth sub-section within new section to be added to [RFC5661]
         to be entitled "Obtaining Access to Sessions and State after
         Migration"

   In the event that migration has occurred, migration recovery will
   involve determining whether Transparent State Migration has occurred.
   This decision is made based on the client ID returned by the
   EXCHANGE_ID and the reported confirmation status.

   o  If the client ID is an unconfirmed client ID not previously known
      to the client, then Transparent State Migration has not occurred.

   o  If the client ID is a confirmed client ID previously known to the
      client, then any transferred state would have been merged with an
      existing client ID representing the client to the destination
      server.
      In this state merger case, Transparent State Migration
      might or might not have occurred and a determination as to whether
      it has occurred is deferred until sessions are established and the
      client is ready to begin state recovery.

   o  If the client ID is a confirmed client ID not previously known to
      the client, then the client can conclude that the client ID was
      transferred as part of Transparent State Migration. In this
      transferred client ID case, Transparent State Migration has
      occurred although some state might have been lost.

   Once the client ID has been obtained, it is necessary to obtain
   access to sessions to continue communication with the new server. In
   any of the cases in which Transparent State Migration has occurred,
   it is possible that a session was transferred as well. To deal with
   that possibility, clients can, after doing the EXCHANGE_ID, issue a
   BIND_CONN_TO_SESSION to connect the transferred session to a
   connection to the new server. If that fails, it is an indication
   that the session was not transferred and that a new session needs to
   be created to take its place.

   In some situations, it is possible for a BIND_CONN_TO_SESSION to
   succeed without session migration having occurred. If state merger
   has taken place then the associated client ID may have already had a
   set of existing sessions, with it being possible that the sessionid
   of a given session is the same as one that might have been migrated.
   In that event, a BIND_CONN_TO_SESSION might succeed, even though
   there could have been no migration of the session with that
   sessionid. In such cases, the client will receive sequence errors
   when the slot sequence values used are not appropriate on the new
   session. When this occurs, the client can create a new session and
   cease using the existing one.
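
   The three client ID cases and the session fallback described above
   can be sketched as follows. This is a minimal illustration, not a
   definitive client design: the Tsm enumeration, classify(), and
   obtain_session() are hypothetical names, and the two callables stand
   in for whatever a client uses to issue BIND_CONN_TO_SESSION and
   CREATE_SESSION. The combination of an unconfirmed client ID that was
   previously known is not addressed by the text, so the sketch rejects
   it rather than guessing.

```python
from enum import Enum


class Tsm(Enum):
    NOT_OCCURRED = "Transparent State Migration has not occurred"
    DEFERRED = "state merger case, determination deferred"
    OCCURRED = "transferred client ID case"


def classify(confirmed: bool, previously_known: bool) -> Tsm:
    """Map the EXCHANGE_ID outcome onto the three cases in the text."""
    if not confirmed and not previously_known:
        return Tsm.NOT_OCCURRED
    if confirmed and previously_known:
        return Tsm.DEFERRED
    if confirmed and not previously_known:
        return Tsm.OCCURRED
    # Unconfirmed but previously known: not covered by the text above.
    raise ValueError("combination not covered by the cases described")


def obtain_session(bind_conn_to_session, create_session):
    """Try the possibly transferred session first, else create a new one.

    Both arguments are hypothetical callables; bind_conn_to_session is
    assumed to return None when the bind fails.
    """
    session = bind_conn_to_session()
    return session if session is not None else create_session()
```

   A client using this approach would still need to watch for sequence
   errors on a session obtained by a successful bind, since in the state
   merger case the bind can succeed on a session that was never
   migrated.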

   Once the client has determined the initial migration status, and
   determined that there was a shift to a new server, it needs to
   re-establish its locking state, if possible. To enable this to happen
   without loss of the guarantees normally provided by locking, the
   destination server needs to implement a per-fs grace period in all
   cases in which lock state was lost, including those in which
   Transparent State Migration was not implemented.

   Clients need to deal with the following cases:

   o  In the state merger case, it is possible that the server has not
      attempted Transparent State Migration, in which case state may
      have been lost without it being reflected in the SEQ4_STATUS bits.
      To determine whether this has happened, the client can use
      TEST_STATEID to check whether the stateids created on the source
      server are still accessible on the destination server. Once a
      single stateid is found to have been successfully transferred, the
      client can conclude that Transparent State Migration was begun and
      any failure to transport all of the stateids will be reflected in
      the SEQ4_STATUS bits. Otherwise, Transparent State Migration has
      not occurred.

   o  In a case in which Transparent State Migration has not occurred,
      the client can use the per-fs grace period provided by the
      destination server to reclaim locks that were held on the source
      server.

   o  In a case in which Transparent State Migration has occurred, and
      no lock state was lost (as shown by SEQ4_STATUS flags), no lock
      reclaim is necessary.

   o  In a case in which Transparent State Migration has occurred, and
      some lock state was lost (as shown by SEQ4_STATUS flags), existing
      stateids need to be checked for validity using TEST_STATEID, and
      reclaim used to re-establish any that were not transferred.
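
   The cases above can be summarized as a small decision procedure. The
   sketch below is illustrative only; the function names and the
   step descriptions are hypothetical, and the inputs are assumed to
   have been derived already from TEST_STATEID results and the
   SEQ4_STATUS flags. The final step reflects the RECLAIM_COMPLETE
   requirement stated in the paragraph that follows.

```python
from typing import Iterable, List


def merger_tsm_occurred(stateid_valid_on_destination: Iterable[bool]) -> bool:
    """State merger case: a single surviving stateid shows that
    Transparent State Migration was begun."""
    return any(stateid_valid_on_destination)


def recovery_steps(tsm_occurred: bool, lock_state_lost: bool) -> List[str]:
    """Return the client actions, in order, for one migrated file system."""
    steps = []
    if not tsm_occurred:
        # No state was transferred: everything is reclaimed.
        steps.append("reclaim locks within the per-fs grace period")
    elif lock_state_lost:
        # Some state was transferred: find out which stateids survived.
        steps.append("TEST_STATEID on existing stateids")
        steps.append("reclaim locks whose stateids were not transferred")
    # Needed in every case, even if nothing had to be reclaimed.
    steps.append("RECLAIM_COMPLETE with rca_one_fs TRUE")
    return steps
```

   For example, a client in the state merger case would first call
   merger_tsm_occurred() with its TEST_STATEID results and then pass the
   outcome to recovery_steps().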

   For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value
   of TRUE needs to be done before normal use of the file system
   including obtaining new locks for the file system. This applies even
   if no locks were lost and there was no need for any to be reclaimed.

5.11.5.  Fifth sub-section within new section to be added to [RFC5661]
         to be entitled "Obtaining Access to Sessions and State after
         Network Address Transfer"

   The case in which there is a transfer to a new network address
   without migration is similar to that described in Section 5.11.4
   above in that there is a need to obtain access to needed sessions and
   locking state. However, the details are simpler and will vary
   depending on the type of trunking between the address receiving
   NFS4ERR_MOVED and that to which the transfer is to be made.

   To make a session available for use, a BIND_CONN_TO_SESSION should be
   used to obtain access to the session previously in use. Only if this
   fails, should a CREATE_SESSION be done. While this procedure mirrors
   that in Section 5.11.4 above, there is an important difference in
   that preservation of the session is not purely optional but depends
   on the type of trunking.

   Access to appropriate locking state will generally need no actions
   beyond access to the session. However, the SEQ4_STATUS bits need to
   be checked for lost locking state, including the need to reclaim
   locks after a server reboot, since there is always a possibility of
   locking state being lost.

5.12.  New section to be added third after Section 11.7 of [RFC5661] to
       be entitled "Server Responsibilities Upon Migration"

   In the event of file system migration, when the client connects to
   the destination server, that server needs to be able to provide the
   client continued access to the files it had open on the source
   server.
   There are two ways to provide this:

   o  By provision of an fs-specific grace period, allowing the client
      the ability to reclaim its locks, in a fashion similar to what
      would have been done in the case of recovery from a server
      restart. See Section 5.12.1 for a more complete discussion.

   o  By implementing Transparent State Migration, possibly in
      connection with session migration, the server can provide the
      client immediate access to the state built up on the source
      server, on the destination.

      These features are discussed separately in Sections 5.12.2 and
      5.12.3, which discuss Transparent State Migration and session
      migration respectively.

   All the features described above can involve transfer of lock-related
   information between source and destination servers. In some cases,
   this transfer is a necessary part of the implementation while in
   other cases it is a helpful implementation aid which servers might or
   might not use. The sub-sections below discuss the information which
   would be transferred but do not define the specifics of the transfer
   protocol. This is left as an implementation choice although
   standards in this area could be developed at a later time.

5.12.1.  First sub-section within new section to be added to [RFC5661]
         to be entitled "Server Responsibilities in Effecting State
         Reclaim after Migration"

   In this case, the destination server need have no knowledge of the
   locks held on the source server, but relies on the clients to
   accurately report (via reclaim operations) the locks previously held,
   not allowing new locks to be granted on the migrated file system
   until the grace period expires.
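
   The gating behavior just described can be sketched as a per-fs grace
   check on the destination server. This is a simplified illustration
   under stated assumptions: the PerFsGrace class and its parameters are
   hypothetical, the grace period is modeled as a fixed duration, and
   the use of NFS4ERR_GRACE and NFS4ERR_NO_GRACE is assumed to follow
   the server-restart behavior defined in [RFC5661]. A result of NFS4_OK
   here only means that the grace check does not block the request.

```python
import time


class PerFsGrace:
    """Hypothetical per-file-system grace gate on the destination server."""

    def __init__(self, duration_s: float, clock=time.monotonic):
        self._clock = clock                 # injectable for testing
        self._end = clock() + duration_s    # grace ends at this instant

    def in_grace(self) -> bool:
        return self._clock() < self._end

    def check(self, is_reclaim: bool) -> str:
        """Return the status the grace check alone would produce."""
        if self.in_grace():
            # Only reclaims are allowed; new locks must wait.
            return "NFS4_OK" if is_reclaim else "NFS4ERR_GRACE"
        # After the grace period, reclaims are no longer accepted.
        return "NFS4ERR_NO_GRACE" if is_reclaim else "NFS4_OK"
```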

   During this grace period clients have the opportunity to use reclaim
   operations to obtain locks for file system objects within the
   migrated file system, in the same way that they do when recovering
   from server restart, and the servers typically rely on clients to
   accurately report their locks, although they have the option of
   subjecting these requests to verification. If the clients only
   reclaim locks held on the source server, no conflict can arise. Once
   the client has reclaimed its locks, it indicates the completion of
   lock reclamation by performing a RECLAIM_COMPLETE specifying
   rca_one_fs as TRUE.

   While it is not necessary for source and destination servers to
   cooperate to transfer information about locks, implementations are
   well-advised to consider transferring the following useful
   information:

   o  If information about the set of clients that have locking state
      for the transferred file system is made available, the destination
      server will be able to terminate the grace period once all such
      clients have reclaimed their locks, allowing normal locking
      activity to resume earlier than it would have otherwise.

   o  Locking summary information for individual clients (at various
      possible levels of detail) can detect some instances in which
      clients do not accurately represent the locks held on the source
      server.

5.12.2.  Second sub-section within new section to be added to [RFC5661]
         to be entitled "Server Responsibilities in Effecting
         Transparent State Migration"

   The basic responsibility of the source server in effecting
   Transparent State Migration is to make available to the destination
   server a description of each piece of locking state associated with
   the file system being migrated.
   In addition to client id string and
   verifier, the source server needs to provide, for each stateid:

   o  The stateid including the current sequence value.

   o  The associated client ID.

   o  The handle of the associated file.

   o  The type of the lock, such as open, byte-range lock, delegation,
      or layout.

   o  For locks such as opens and byte-range locks, there will be
      information about the owner(s) of the lock.

   o  For recallable/revocable lock types, the current recall status
      needs to be included.

   o  For each lock type, there will be type-specific information, such
      as share and deny modes for opens and type and byte ranges for
      byte-range locks and layouts.

   Such information will most probably be organized by client id string
   on the destination server so that it can be used to provide
   appropriate context to each client when it makes itself known to the
   client. Issues connected with a client impersonating another by
   presenting another client's id string are discussed in Section 8.

   A further server responsibility concerns locks that are revoked or
   otherwise lost during the process of file system migration. Because
   locks that appear to be lost during the process of migration will be
   reclaimed by the client, the servers have to take steps to ensure
   that locks revoked soon before or soon after migration are not
   inadvertently allowed to be reclaimed in situations in which the
   continuity of lock possession cannot be assured.

   o  For locks lost on the source but whose loss has not yet been
      acknowledged by the client (by using FREE_STATEID), the
      destination must be aware of this loss so that it can deny a
      request to reclaim them.

   o  For locks lost on the destination after the state transfer but
      before the client's RECLAIM_COMPLETE is done, the destination
      server should note these and not allow them to be reclaimed.
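
   One possible shape for the transferred information, organized by
   client id string and tracking lost locks, is sketched below. Since
   the transfer protocol is not defined here, every name in the sketch
   (LockRecord, DestinationState, and all field names) is hypothetical;
   it only shows that the items listed above fit in a per-stateid record
   and that a lost lock can be remembered so that a reclaim of it can be
   denied.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LockRecord:
    """Hypothetical per-stateid description sent by the source server."""
    stateid: bytes                      # stateid "other" field
    seqid: int                          # current stateid sequence value
    clientid: int                       # associated client ID
    filehandle: bytes                   # handle of the associated file
    lock_type: str                      # open, byte-range, delegation, layout
    owners: Tuple[bytes, ...] = ()      # owner(s), where applicable
    recall_status: Optional[str] = None # for recallable/revocable types
    details: dict = field(default_factory=dict)  # type-specific information
    lost: bool = False                  # revoked or otherwise lost


class DestinationState:
    """Transferred state, organized by client id string."""

    def __init__(self) -> None:
        self.by_client: Dict[bytes, List[LockRecord]] = {}

    def add(self, client_id_string: bytes, record: LockRecord) -> None:
        self.by_client.setdefault(client_id_string, []).append(record)

    def mark_lost(self, client_id_string: bytes, stateid: bytes) -> None:
        for record in self.by_client.get(client_id_string, []):
            if record.stateid == stateid:
                record.lost = True

    def is_known_lost(self, client_id_string: bytes, stateid: bytes) -> bool:
        """True only if a record exists and is marked lost, in which
        case a request to reclaim it would be denied."""
        return any(r.stateid == stateid and r.lost
                   for r in self.by_client.get(client_id_string, []))
```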

   An additional responsibility of the cooperating servers concerns
   situations in which a stateid cannot be transferred transparently
   because it conflicts with an existing stateid held by the client and
   associated with a different file system. In this case there are two
   valid choices:

   o  Treat the transfer, as in NFSv4.0, as one without Transparent
      State Migration. In this case, conflicting locks cannot be
      granted until the client does a RECLAIM_COMPLETE, after reclaiming
      the locks it had, with the exception of reclaims denied because
      they were attempts to reclaim locks that had been lost.

   o  Implement Transparent State Migration, except for the lock with
      the conflicting stateid. In this case, the client will be aware
      of a lost lock (through the SEQ4_STATUS flags) and be allowed to
      reclaim it.

   When transferring state between the source and destination, the
   issues discussed in Section 7.2 of [RFC7931] must still be attended
   to. In this case, the use of NFS4ERR_DELAY may still be necessary in
   NFSv4.1, as it was in NFSv4.0, to prevent locking state changing
   while it is being transferred.

   There are a number of important differences in the NFSv4.1 context:

   o  The absence of RELEASE_LOCKOWNER means that the one case in which
      an operation could not be deferred by use of NFS4ERR_DELAY no
      longer exists.

   o  Sequencing of operations is no longer done using owner-based
      operation sequence numbers. Instead, sequencing is session-based.

   As a result, when sessions are not transferred, the techniques
   discussed in Section 7.2 of [RFC7931] are adequate and will not be
   further discussed.

5.12.3.
Third sub-section within new section to be added to [RFC5661]
to be entitled "Server Responsibilities in Effecting Session
Transfer"

   The basic responsibility of the source server in effecting session
   transfer is to make available to the destination server a description
   of the current state of each slot within the session, including:

   o  The last sequence value received for that slot.

   o  Whether there is cached reply data for the last request executed
      and, if so, the cached reply.

   When sessions are transferred, there are a number of issues that pose
   challenges in terms of making the transferred state unmodifiable
   during the period it is gathered up and transferred to the
   destination server.

   o  A single session may be used to access multiple file systems, not
      all of which are being transferred.

   o  Requests made on a session may, even if rejected, affect the state
      of the session by advancing the sequence number associated with
      the slot used.

   As a result, when the file system state might otherwise be considered
   unmodifiable, the client might have any number of in-flight requests,
   each of which is capable of changing session state, which may be of a
   number of types:

   1.  Those requests that were processed on the migrating file system,
       before migration began.

   2.  Those requests which got the error NFS4ERR_DELAY because the file
       system being accessed was in the process of being migrated.

   3.  Those requests which got the error NFS4ERR_MOVED because the file
       system being accessed had been migrated.

   4.  Those requests that accessed the migrating file system, in order
       to obtain location or status information.

   5.  Those requests that did not reference the migrating file system.

   It should be noted that the history of any particular slot is likely
   to include a number of these request classes.
   In the case in which a
   session which is migrated is used by file systems other than the one
   migrated, requests of class 5 may be common and be the last request
   processed, for many slots.

   Since session state can change even after the locking state has been
   fixed as part of the migration process, the session state known to
   the client could be different from that on the destination server,
   which necessarily reflects the session state on the source server, at
   an earlier time. In deciding how to deal with this situation, it is
   helpful to distinguish between two sorts of behavioral consequences
   of the choice of initial sequence ID values.

   o  The error NFS4ERR_SEQ_MISORDERED is returned when the sequence ID
      in a request is neither equal to the last one seen for the current
      slot nor the next greater one.

      In view of the difficulty of arriving at a mutually acceptable
      value for the correct last sequence value at the point of
      migration, it may be necessary for the server to show some degree
      of forbearance, when the sequence ID is one that would be
      considered unacceptable if session migration were not involved.

   o  Returning the cached reply for a previously executed request when
      the sequence ID in the request matches the last value recorded for
      the slot.

      In the cases in which an error is returned and there is no
      possibility of any non-idempotent operation having been executed,
      it may not be necessary to adhere to this as strictly as might be
      proper if session migration were not involved. For example, the
      fact that the error NFS4ERR_DELAY was returned may not assist the
      client in any material way, while the fact that NFS4ERR_MOVED was
      returned by the source server may not be relevant when the request
      was reissued, directed to the destination server.
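
   The two baseline behaviors just described, as they apply when session
   migration is not involved, can be sketched as a single slot check.
   This is a minimal illustration: check_slot() and its return labels
   are hypothetical names, and wraparound at 2^32 is an assumption based
   on the sequence ID being an unsigned 32-bit value. It does not model
   the forbearance that a destination server may need to show for a
   transferred session.

```python
SEQ_MODULUS = 2 ** 32  # assumed: sequence IDs are unsigned 32-bit values


def check_slot(last_seqid: int, request_seqid: int) -> str:
    """Classify a request's sequence ID against the slot's last value."""
    if request_seqid == last_seqid:
        # Retry of the last request: the cached reply is returned.
        return "REPLAY"
    if request_seqid == (last_seqid + 1) % SEQ_MODULUS:
        # The next greater one: a new request for this slot.
        return "NEW"
    # Neither equal to the last one seen nor the next greater one.
    return "NFS4ERR_SEQ_MISORDERED"
```

   With this baseline in hand, the difficulty for a destination server
   is that the last_seqid it received from the source server may already
   be stale by the time the client's first request arrives.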

   An important issue is that the specification needs to take note of
   all potential COMPOUNDs, even if they might be unlikely in practice.
   For example, a COMPOUND is allowed to access multiple file systems
   and might perform non-idempotent operations in some of them before
   accessing a file system being migrated. Also, a COMPOUND may return
   considerable data in the response, before being rejected with
   NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as
   sa_cachethis.

   To address these issues, a destination server MAY do any of the
   following when implementing session transfer.

   o  Avoid enforcing any sequencing semantics for a particular slot
      until the client has established the starting sequence for that
      slot on the destination server.

   o  For each slot, avoid returning a cached reply that returns
      NFS4ERR_DELAY or NFS4ERR_MOVED until the client has established
      the starting sequence for that slot on the destination server.

   o  Until the client has established the starting sequence for a
      particular slot on the destination server, avoid reporting
      NFS4ERR_SEQ_MISORDERED or returning a cached reply that returns
      NFS4ERR_DELAY or NFS4ERR_MOVED, where the reply consists solely of
      a series of operations where the response is NFS4_OK until the
      final error.

   Because of the considerations mentioned above, the destination server
   can respond appropriately to SEQUENCE operations received from the
   client by adopting the three policies listed below:

   o  Not responding with NFS4ERR_SEQ_MISORDERED for the initial request
      on a slot within a transferred session, since the destination
      server cannot be aware of requests made by the client after the
      server handoff but before the client became aware of the shift.

   o  Replying as it would for a retry whenever the sequence matches
      that transferred by the source server, even though this would not
      provide retry handling for requests issued after the server
      handoff, under the assumption that when such requests are issued
      they will never be responded to in a state-changing fashion,
      making retry support for them unnecessary.

   o  Once a non-retry SEQUENCE is received for a given slot, using that
      as the basis for further sequence checking, with no further
      reference to the sequence value transferred by the source server.

5.13.  Transferred Section 11.8 of [RFC5661] entitled "Effecting File
       System Referrals"

   Referrals are effected when an absent file system is encountered and
   one or more alternate locations are made available by the
   fs_locations or fs_locations_info attributes. The client will
   typically get an NFS4ERR_MOVED error, fetch the appropriate location
   information, and proceed to access the file system on a different
   server, even though it retains its logical position within the
   original namespace. Referrals differ from migration events in that
   they happen only when the client has not previously referenced the
   file system in question (so there is nothing to transition).
   Referrals can only come into effect when an absent file system is
   encountered at its root.

   The examples given in the sections below are somewhat artificial in
   that an actual client will not typically do a multi-component look
   up, but will have cached information regarding the upper levels of
   the name hierarchy. However, these examples are chosen to make the
   required behavior clear and easy to put within the scope of a small
   number of requests, without getting into a discussion of the details
   of how specific clients might choose to cache things.

5.13.1.
Referral Example (LOOKUP) (transferred section)

   Let us suppose that the following COMPOUND is sent in an environment
   in which /this/is/the/path is absent from the target server. This
   may be for a number of reasons. It may be that the file system has
   moved, or it may be that the target server is functioning mainly, or
   solely, to refer clients to the servers on which various file systems
   are located.

   o  PUTROOTFH

   o  LOOKUP "this"

   o  LOOKUP "is"

   o  LOOKUP "the"

   o  LOOKUP "path"

   o  GETFH

   o  GETATTR (fsid, fileid, size, time_modify)

   Under the given circumstances, the following will be the result.

   o  PUTROOTFH --> NFS4_OK. The current fh is now the root of the
      pseudo-fs.

   o  LOOKUP "this" --> NFS4_OK. The current fh is for /this and is
      within the pseudo-fs.

   o  LOOKUP "is" --> NFS4_OK. The current fh is for /this/is and is
      within the pseudo-fs.

   o  LOOKUP "the" --> NFS4_OK. The current fh is for /this/is/the and
      is within the pseudo-fs.

   o  LOOKUP "path" --> NFS4_OK. The current fh is for /this/is/the/path
      and is within a new, absent file system, but ... the client will
      never see the value of that fh.

   o  GETFH --> NFS4ERR_MOVED. Fails because current fh is in an absent
      file system at the start of the operation, and the specification
      makes no exception for GETFH.

   o  GETATTR (fsid, fileid, size, time_modify). Not executed because
      the failure of the GETFH stops processing of the COMPOUND.

   Given the failure of the GETFH, the client has the job of determining
   the root of the absent file system and where to find that file
   system, i.e., the server and path relative to that server's root fh.
   Note that in this example, the client did not obtain filehandles and
   attribute information (e.g., fsid) for the intermediate directories,
   so that it would not be sure where the absent file system starts.
It 3040 could be the case, for example, that /this/is/the is the root of the 3041 moved file system and that the reason that the look up of "path" 3042 succeeded is that the file system was not absent on that operation 3043 but was moved between the last LOOKUP and the GETFH (since COMPOUND 3044 is not atomic). Even if we had the fsids for all of the intermediate 3045 directories, we could have no way of knowing that /this/is/the/path 3046 was the root of a new file system, since we don't yet have its fsid. 3048 In order to get the necessary information, let us re-send the chain 3049 of LOOKUPs with GETFHs and GETATTRs to at least get the fsids so we 3050 can be sure where the appropriate file system boundaries are. The 3051 client could choose to get fs_locations_info at the same time but in 3052 most cases the client will have a good guess as to where file system 3053 boundaries are (because of where NFS4ERR_MOVED was, and was not, 3054 received) making fetching of fs_locations_info unnecessary. 3056 OP01: PUTROOTFH --> NFS_OK 3058 - Current fh is root of pseudo-fs. 3060 OP02: GETATTR(fsid) --> NFS_OK 3062 - Just for completeness. Normally, clients will know the fsid of 3063 the pseudo-fs as soon as they establish communication with a 3064 server. 3066 OP03: LOOKUP "this" --> NFS_OK 3068 OP04: GETATTR(fsid) --> NFS_OK 3070 - Get current fsid to see where file system boundaries are. The 3071 fsid will be that for the pseudo-fs in this example, so no 3072 boundary. 3074 OP05: GETFH --> NFS_OK 3076 - Current fh is for /this and is within pseudo-fs. 3078 OP06: LOOKUP "is" --> NFS_OK 3080 - Current fh is for /this/is and is within pseudo-fs. 3082 OP07: GETATTR(fsid) --> NFS_OK 3084 - Get current fsid to see where file system boundaries are. The 3085 fsid will be that for the pseudo-fs in this example, so no 3086 boundary. 3088 OP08: GETFH --> NFS_OK 3090 - Current fh is for /this/is and is within pseudo-fs. 
   OP09: LOOKUP "the" --> NFS_OK

   -  Current fh is for /this/is/the and is within pseudo-fs.

   OP10: GETATTR(fsid) --> NFS_OK

   -  Get current fsid to see where file system boundaries are. The
      fsid will be that for the pseudo-fs in this example, so no
      boundary.

   OP11: GETFH --> NFS_OK

   -  Current fh is for /this/is/the and is within pseudo-fs.

   OP12: LOOKUP "path" --> NFS_OK

   -  Current fh is for /this/is/the/path and is within a new, absent
      file system, but ...

   -  The client will never see the value of that fh.

   OP13: GETATTR(fsid, fs_locations_info) --> NFS_OK

   -  We are getting the fsid to know where the file system boundaries
      are. In this operation, the fsid will be different from that of
      the parent directory (which in turn was retrieved in OP10). Note
      that the fsid we are given will not necessarily be preserved at
      the new location. That fsid might be different, and in fact the
      fsid we have for this file system might be a valid fsid of a
      different file system on that new server.

   -  In this particular case, we are pretty sure anyway that what has
      moved is /this/is/the/path rather than /this/is/the since we have
      the fsid of the latter and it is that of the pseudo-fs, which
      presumably cannot move. However, in other examples, we might not
      have this kind of information to rely on (e.g., /this/is/the might
      be a non-pseudo file system separate from /this/is/the/path), so
      we need another reliable source of information on the boundary
      of the file system that is moved. If, for example, the file
      system /this/is had moved, we would have a case of migration
      rather than referral, and once the boundaries of the migrated file
      system were clear we could fetch fs_locations_info.
3134 - We are fetching fs_locations_info because the fact that we got an 3135 NFS4ERR_MOVED at this point means that it is most likely that this 3136 is a referral and we need the destination. Even if it is the case 3137 that /this/is/the is a file system that has migrated, we will 3138 still need the location information for that file system. 3140 OP14: GETFH --> NFS4ERR_MOVED 3142 - Fails because current fh is in an absent file system at the start 3143 of the operation, and the specification makes no exception for 3144 GETFH. Note that this means the server will never send the client 3145 a filehandle from within an absent file system. 3147 Given the above, the client knows where the root of the absent file 3148 system is (/this/is/the/path) by noting where the change of fsid 3149 occurred (between "the" and "path"). The fs_locations_info attribute 3150 also gives the client the actual location of the absent file system, 3151 so that the referral can proceed. The server gives the client the 3152 bare minimum of information about the absent file system so that 3153 there will be very little scope for problems of conflict between 3154 information sent by the referring server and information of the file 3155 system's home. No filehandles and very few attributes are present on 3156 the referring server, and the client can treat those it receives as 3157 transient information with the function of enabling the referral. 3159 5.13.2. Referral Example (READDIR) (transferred section) 3161 Another context in which a client may encounter referrals is when it 3162 does a READDIR on a directory in which some of the sub-directories 3163 are the roots of absent file systems. 
3165 Suppose such a directory is read as follows: 3167 o PUTROOTFH 3169 o LOOKUP "this" 3171 o LOOKUP "is" 3173 o LOOKUP "the" 3175 o READDIR (fsid, size, time_modify, mounted_on_fileid) 3177 In this case, because rdattr_error is not requested, 3178 fs_locations_info is not requested, and some of the attributes cannot 3179 be provided, the result will be an NFS4ERR_MOVED error on the 3180 READDIR, with the detailed results as follows: 3182 o PUTROOTFH --> NFS_OK. The current fh is at the root of the 3183 pseudo-fs. 3185 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 3186 within the pseudo-fs. 3188 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 3189 within the pseudo-fs. 3191 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 3192 is within the pseudo-fs. 3194 o READDIR (fsid, size, time_modify, mounted_on_fileid) --> 3195 NFS4ERR_MOVED. Note that the same error would have been returned 3196 if /this/is/the had migrated, but it is returned because the 3197 directory contains the root of an absent file system. 3199 So now suppose that we re-send with rdattr_error: 3201 o PUTROOTFH 3203 o LOOKUP "this" 3205 o LOOKUP "is" 3207 o LOOKUP "the" 3209 o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) 3211 The results will be: 3213 o PUTROOTFH --> NFS_OK. The current fh is at the root of the 3214 pseudo-fs. 3216 o LOOKUP "this" --> NFS_OK. The current fh is for /this and is 3217 within the pseudo-fs. 3219 o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is 3220 within the pseudo-fs. 3222 o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and 3223 is within the pseudo-fs. 3225 o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) 3226 --> NFS_OK. The attributes for directory entry with the component 3227 named "path" will only contain rdattr_error with the value 3228 NFS4ERR_MOVED, together with an fsid value and a value for 3229 mounted_on_fileid. 
   Suppose we do another READDIR to get fs_locations_info (although we
   could have used a GETATTR directly, as in Section 5.13.1).

   o  PUTROOTFH

   o  LOOKUP "this"

   o  LOOKUP "is"

   o  LOOKUP "the"

   o  READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid,
      size, time_modify)

   The results would be:

   o  PUTROOTFH --> NFS_OK. The current fh is at the root of the
      pseudo-fs.

   o  LOOKUP "this" --> NFS_OK. The current fh is for /this and is
      within the pseudo-fs.

   o  LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is
      within the pseudo-fs.

   o  LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and
      is within the pseudo-fs.

   o  READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid,
      size, time_modify) --> NFS_OK. The attributes will be as shown
      below.

   The attributes for the directory entry with the component named
   "path" will only contain:

   o  rdattr_error (value: NFS_OK)

   o  fs_locations_info

   o  mounted_on_fileid (value: unique fileid within referring file
      system)

   o  fsid (value: unique value within referring server)

   The attributes for entry "path" will not contain size or time_modify
   because these attributes are not available within an absent file
   system.

5.14.  Transferred Section 11.9 of [RFC5661] entitled "The Attribute
       fs_locations"

   The fs_locations attribute is structured in the following way:

   struct fs_location4 {
           utf8str_cis     server<>;
           pathname4       rootpath;
   };

   struct fs_locations4 {
           pathname4       fs_root;
           fs_location4    locations<>;
   };

   The fs_location4 data type is used to represent the location of a
   file system by providing a server name and the path to the root of
   the file system within that server's namespace.
When a set of 3296 servers have corresponding file systems at the same path within their 3297 namespaces, an array of server names may be provided. An entry in 3298 the server array is a UTF-8 string and represents one of a 3299 traditional DNS host name, IPv4 address, IPv6 address, or a zero- 3300 length string. An IPv4 or IPv6 address is represented as a universal 3301 address (see Section 3.3.9 of [RFC5661] and [RFC5665]), minus the 3302 netid, and either with or without the trailing ".p1.p2" suffix that 3303 represents the port number. If the suffix is omitted, then the 3304 default port, 2049, SHOULD be assumed. A zero-length string SHOULD 3305 be used to indicate the current address being used for the RPC call. 3306 It is not a requirement that all servers that share the same rootpath 3307 be listed in one fs_location4 instance. The array of server names is 3308 provided for convenience. Servers that share the same rootpath may 3309 also be listed in separate fs_location4 entries in the fs_locations 3310 attribute. 3312 The fs_locations4 data type and the fs_locations attribute each 3313 contain an array of such locations. Since the namespace of each 3314 server may be constructed differently, the "fs_root" field is 3315 provided. The path represented by fs_root represents the location of 3316 the file system in the current server's namespace, i.e., that of the 3317 server from which the fs_locations attribute was obtained. The 3318 fs_root path is meant to aid the client by clearly referencing the 3319 root of the file system whose locations are being reported, no matter 3320 what object within the current file system the current filehandle 3321 designates. The fs_root is simply the pathname the client used to 3322 reach the object on the current server (i.e., the object to which the 3323 fs_locations attribute applies). 
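
   The server-entry forms described above (DNS host name, IPv4 or IPv6
   universal address with or without the ".p1.p2" port suffix, or a
   zero-length string) can be told apart with a few string checks. The
   following is only a minimal sketch: the function name
   split_server_entry and its return shape are hypothetical, it does no
   address validation, and it assumes that host names also use the
   default port, which the text does not state.

```python
DEFAULT_NFS_PORT = 2049  # default port named in the text


def split_server_entry(entry):
    """Classify one fs_location4 server string (hypothetical helper).

    Returns a (kind, host, port) tuple. No address validation is done.
    """
    if entry == "":
        # Zero-length string: the address currently used for the RPC call.
        return ("current", None, None)

    parts = entry.split(".")
    if ":" in entry:
        # IPv6 text form. Without a port there is 1 dotted field, or 4
        # when the address ends in an embedded dotted IPv4 part; a port
        # suffix adds exactly two more fields.
        kind = "ipv6"
        has_port = len(parts) in (3, 6)
    elif len(parts) in (4, 6) and all(p.isdigit() for p in parts):
        # IPv4 universal address minus the netid: h1.h2.h3.h4[.p1.p2]
        kind = "ipv4"
        has_port = len(parts) == 6
    else:
        # Assumption: a DNS host name is reached on the default port.
        return ("dns", entry, DEFAULT_NFS_PORT)

    if not has_port:
        return (kind, entry, DEFAULT_NFS_PORT)

    # p1 is the high-order byte of the port and p2 the low-order byte.
    p1, p2 = int(parts[-2]), int(parts[-1])
    return (kind, ".".join(parts[:-2]), p1 * 256 + p2)
```

   For instance, the entry "192.0.2.10.8.1" carries the suffix ".8.1",
   that is, port 8 * 256 + 1 = 2049.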
   When the fs_locations attribute is interrogated and there are no
   alternate file system locations, the server SHOULD return a
   zero-length array of fs_location4 structures, together with a valid
   fs_root.

   As an example, suppose there is a replicated file system located at
   two servers (servA and servB). At servA, the file system is located
   at path /a/b/c. At servB, the file system is located at path /x/y/z.
   If the client were to obtain the fs_locations value for the directory
   at /a/b/c/d, it might not necessarily know that the file system's
   root is located in servA's namespace at /a/b/c. When the client
   switches to servB, it will need to determine that the directory it
   first referenced at servA is now represented by the path /x/y/z/d on
   servB. To facilitate this, the fs_locations attribute provided by
   servA would have an fs_root value of /a/b/c and two entries in
   fs_locations. One entry in fs_locations will be for itself (servA)
   and the other will be for servB with a path of /x/y/z. With this
   information, the client is able to substitute /x/y/z for the /a/b/c
   at the beginning of its access path and construct /x/y/z/d to use for
   the new server.

   Note that there is no requirement that the number of components in
   each rootpath be the same; there is no relation between the number of
   components in rootpath or fs_root, and none of the components in a
   rootpath and fs_root have to be the same. In the above example, we
   could have had a third element in the locations array, with server
   equal to "servC" and rootpath equal to "/I/II", and a fourth element
   in locations with server equal to "servD" and rootpath equal to
   "/aleph/beth/gimel/daleth/he".
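
   The prefix substitution in the servA/servB example can be sketched
   as follows. This is an illustration only: pathnames are modeled as
   lists of components (in the spirit of pathname4), and the function
   name translate_path is hypothetical.

```python
def translate_path(path, fs_root, rootpath):
    """Map a path on the current server to the matching path on a new one.

    path, fs_root, and rootpath are lists of pathname components.
    The fs_root prefix of path is replaced by rootpath.
    """
    if path[:len(fs_root)] != fs_root:
        raise ValueError("path does not lie within the file system at fs_root")
    # Keep the components below the file system root, re-rooted at rootpath.
    return rootpath + path[len(fs_root):]


# servA reports fs_root /a/b/c; the servB entry has rootpath /x/y/z.
new_path = translate_path(["a", "b", "c", "d"],
                          ["a", "b", "c"],
                          ["x", "y", "z"])
```

   Because only the prefix is exchanged, the two roots may differ in
   length, as with the "/I/II" rootpath for servC.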
   The relationship between fs_root and a rootpath is that the client
   replaces the pathname indicated in fs_root for the current server
   with the substitute indicated in rootpath for the new server.

   For an example of a referred or migrated file system, suppose there
   is a file system located at serv1. At serv1, the file system is
   located at /az/buky/vedi/glagoli. The client finds that the object at
   glagoli has migrated (or is a referral). The client gets the
   fs_locations attribute, which contains an fs_root of
   /az/buky/vedi/glagoli, and one element in the locations array, with
   server equal to serv2, and rootpath equal to /izhitsa/fita. The
   client replaces /az/buky/vedi/glagoli with /izhitsa/fita, and uses
   the latter pathname on serv2.

   Thus, the server MUST return an fs_root that is equal to the path the
   client used to reach the object to which the fs_locations attribute
   applies. Otherwise, the client cannot determine the new path to use
   on the new server.

   Since the fs_locations attribute lacks information defining various
   attributes of the various file system choices presented, it SHOULD
   only be interrogated and used when fs_locations_info is not
   available. When fs_locations is used, information about the specific
   locations should be assumed based on the following rules.

   The following rules are general and apply irrespective of the
   context.

   o  All listed file system instances should be considered as of the
      same handle class if and only if the current fh_expire_type
      attribute does not include the FH4_VOL_MIGRATION bit. Note that
      in the case of referral, filehandle issues do not apply since
      there can be no filehandles known within the current file system,
      nor is there any access to the fh_expire_type attribute on the
      referring (absent) file system.
   o  All listed file system instances should be considered as of the
      same fileid class if and only if the fh_expire_type attribute
      indicates persistent filehandles and does not include the
      FH4_VOL_MIGRATION bit. Note that in the case of referral, fileid
      issues do not apply since there can be no fileids known within the
      referring (absent) file system, nor is there any access to the
      fh_expire_type attribute.

   o  All file system instances should be considered as of different
      change classes.

   For other class assignments, handling of file system transitions
   depends on the reasons for the transition:

   o  When the transition is due to migration, that is, the client was
      directed to a new file system after receiving an NFS4ERR_MOVED
      error, the target should be treated as being of the same
      write-verifier class as the source.

   o  When the transition is due to failover to another replica, that
      is, the client selected another replica without receiving an
      NFS4ERR_MOVED error, the target should be treated as being of a
      different write-verifier class from the source.

   The specific choices reflect typical implementation patterns for
   controlled migration and failover, respectively. Since other choices
   are possible and useful, this information is better obtained by using
   fs_locations_info. When a server implementation needs to communicate
   other choices, it MUST support the fs_locations_info attribute.

   See Section 8 for a discussion on the recommendations for the
   security flavor to be used by any GETATTR operation that requests the
   "fs_locations" attribute.

5.15.  Updated Section 11.10 of [RFC5661] entitled "The Attribute
       fs_locations_info"

   The fs_locations_info attribute is intended as a more functional
   replacement for the fs_locations attribute which will continue to
   exist and be supported.
Clients can use it to get a more complete 3431 set of data about alternative file system locations, including 3432 additional network paths to access replicas in use and additional 3433 replicas. When the server does not support fs_locations_info, 3434 fs_locations can be used to get a subset of the data. A server that 3435 supports fs_locations_info MUST support fs_locations as well. 3437 There is additional data present in fs_locations_info, that is not 3438 available in fs_locations: 3440 o Attribute continuity information. This information will allow a 3441 client to select a replica that meets the transparency 3442 requirements of the applications accessing the data and to 3443 leverage optimizations due to the server guarantees of attribute 3444 continuity (e.g., if the change attribute of a file of the file 3445 system is continuous between multiple replicas, the client does 3446 not have to invalidate the file's cache when switching to a 3447 different replica). 3449 o File system identity information that indicates when multiple 3450 replicas, from the client's point of view, correspond to the same 3451 target file system, allowing them to be used interchangeably, 3452 without disruption, as distinct synchronized replicas of the same 3453 file data. 3455 Note that having two replicas with common identity information is 3456 distinct from the case of two (trunked) paths to the same replica. 3458 o Information that will bear on the suitability of various replicas, 3459 depending on the use that the client intends. For example, many 3460 applications need an absolutely up-to-date copy (e.g., those that 3461 write), while others may only need access to the most up-to-date 3462 copy reasonably available. 3464 o Server-derived preference information for replicas, which can be 3465 used to implement load-balancing while giving the client the 3466 entire file system list to be used in case the primary fails. 
   The fs_locations_info attribute is structured similarly to the
   fs_locations attribute. A top-level structure (fs_locations_info4)
   contains the entire attribute including the root pathname of the file
   system and an array of lower-level structures that define replicas
   that share a common rootpath on their respective servers. The
   lower-level structure in turn (fs_locations_item4) contains a
   specific pathname and information on one or more individual network
   access paths. For that last lowest level, fs_locations_info has an
   fs_locations_server4 structure that contains per-server-replica
   information in addition to the file system location entry. This
   per-server-replica information includes a nominally opaque array,
   fls_info, within which specific pieces of information are located at
   the specific indices listed below.

   Two fs_locations_server4 entries that are within different
   fs_locations_item4 structures are never trunkable, while two entries
   within the same fs_locations_item4 structure might or might not be
   trunkable. Two entries that are trunkable will have identical
   identity information, although, as noted above, the converse is not
   the case.

   The attribute will always contain at least a single
   fs_locations_server4 entry. Typically, there will be an entry with
   the FSLI4GF_CUR_REQ flag set, although in the case of a referral
   there will be no entry with that flag set.

   It should be noted that fs_locations_info attributes returned by
   servers for various replicas may differ for various reasons. One
   server may know about a set of replicas that are not known to other
   servers. Further, compatibility attributes may differ. Filehandles
   might be of the same class going from replica A to replica B but not
   going in the reverse direction.
   This might happen because the filehandles are the same, but
   replica B's server implementation might not have provision to note
   and report that equivalence.

   The fs_locations_info attribute consists of a root pathname
   (fli_fs_root, just like fs_root in the fs_locations attribute),
   together with an array of fs_locations_item4 structures. The
   fs_locations_item4 structures in turn consist of a root pathname
   (fli_rootpath) together with an array (fli_entries) of elements of
   data type fs_locations_server4, all defined as follows.

   /*
    * Defines an individual server access path
    */
   struct fs_locations_server4 {
           int32_t         fls_currency;
           opaque          fls_info<>;
           utf8str_cis     fls_server;
   };

   /*
    * Byte indices of items within
    * fls_info: flag fields, class numbers,
    * bytes indicating ranks and orders.
    */
   const FSLI4BX_GFLAGS            = 0;
   const FSLI4BX_TFLAGS            = 1;

   const FSLI4BX_CLSIMUL           = 2;
   const FSLI4BX_CLHANDLE          = 3;
   const FSLI4BX_CLFILEID          = 4;
   const FSLI4BX_CLWRITEVER        = 5;
   const FSLI4BX_CLCHANGE          = 6;
   const FSLI4BX_CLREADDIR         = 7;

   const FSLI4BX_READRANK          = 8;
   const FSLI4BX_WRITERANK         = 9;
   const FSLI4BX_READORDER         = 10;
   const FSLI4BX_WRITEORDER        = 11;

   /*
    * Bits defined within the general flag byte.
    */
   const FSLI4GF_WRITABLE          = 0x01;
   const FSLI4GF_CUR_REQ           = 0x02;
   const FSLI4GF_ABSENT            = 0x04;
   const FSLI4GF_GOING             = 0x08;
   const FSLI4GF_SPLIT             = 0x10;

   /*
    * Bits defined within the transport flag byte.
    */
   const FSLI4TF_RDMA              = 0x01;

   /*
    * Defines a set of replicas sharing
    * a common value of the rootpath
    * within the corresponding
    * single-server namespaces.
    */
   struct fs_locations_item4 {
           fs_locations_server4    fli_entries<>;
           pathname4               fli_rootpath;
   };

   /*
    * Defines the overall structure of
    * the fs_locations_info attribute.
    */
   struct fs_locations_info4 {
           uint32_t                fli_flags;
           int32_t                 fli_valid_for;
           pathname4               fli_fs_root;
           fs_locations_item4      fli_items<>;
   };

   /*
    * Flag bits in fli_flags.
    */
   const FSLI4IF_VAR_SUB           = 0x00000001;

   typedef fs_locations_info4 fattr4_fs_locations_info;

   As noted above, the fs_locations_info attribute, when supported, may
   be requested of absent file systems without causing NFS4ERR_MOVED to
   be returned. It is generally expected that it will be available for
   both present and absent file systems even if only a single
   fs_locations_server4 entry is present, designating the current
   (present) file system, or two fs_locations_server4 entries
   designating the previous location of an absent file system (the one
   just referenced) and its successor location. Servers are strongly
   urged to support this attribute on all file systems if they support
   it on any file system.

   The data presented in the fs_locations_info attribute may be obtained
   by the server in any number of ways, including specification by the
   administrator or by current protocols for transferring data among
   replicas and protocols not yet developed. NFSv4.1 only defines how
   this information is presented by the server to the client.

5.15.1.  Updated Section 11.10.1 of [RFC5661] entitled "The
         fs_locations_server4 Structure"

   The fs_locations_server4 structure consists of the following items in
   addition to the fls_server field which specifies a network address or
   set of addresses to be used to access the specified file system.
   Note that both of these items (i.e., fls_currency and fls_info)
   specify attributes of the file system replica and should not be
   different when there are multiple fs_locations_server4 structures for
   the same replica, each specifying a network path to the chosen
   replica.
   When these values are different in two fs_locations_server4
   structures, a client has no basis for choosing one over the other and
   is best off simply ignoring both entries, whether these entries apply
   to migration, replication, or referral. When there are more than two
   such entries, majority voting can be used to exclude a single
   erroneous entry from consideration. In the case in which trunking
   information is provided for a replica currently being accessed, the
   additional trunked addresses can be ignored while access continues on
   the address currently being used, even if the entry corresponding to
   that path might be considered invalid.

   o  An indication of how up-to-date the file system is (fls_currency)
      in seconds. This value is relative to the master copy. A
      negative value indicates that the server is unable to give any
      reasonably useful value here. A value of zero indicates that the
      file system is the actual writable data or a reliably coherent and
      fully up-to-date copy. Positive values indicate how out-of-date
      this copy can normally be before it is considered for update.
      Such a value is not a guarantee that such updates will always be
      performed on the required schedule but instead serves as a hint
      about how far the copy of the data would be expected to be behind
      the most up-to-date copy.

   o  A counted array of one-byte values (fls_info) containing
      information about the particular file system instance. This data
      includes general flags, transport capability flags, file system
      equivalence class information, and selection priority information.
      The encoding will be discussed below.

   o  The server string (fls_server). For the case of the replica
      currently being accessed (via GETATTR), a zero-length string MAY
      be used to indicate the current address being used for the RPC
      call.
      The fls_server field can also be an IPv4 or IPv6 address,
      formatted the same way as an IPv4 or IPv6 address in the "server"
      field of the fs_location4 data type (see Section 11.9 of
      [RFC5661]).

   With the exception of the transport-flag field (at offset
   FSLI4BX_TFLAGS within the fls_info array), all of this data applies
   to the replica specified by the entry, rather than the specific
   network path used to access it.

   Data within the fls_info array is in the form of 8-bit data items
   with constants giving the offsets within the array of various values
   describing this particular file system instance. This style of
   definition was chosen, in preference to explicit XDR structure
   definitions for these values, for a number of reasons.

   o  The kinds of data in the fls_info array, representing flags, file
      system classes, and priorities among sets of file systems
      representing the same data, are such that 8 bits provide a quite
      acceptable range of values. Even where there might be more than
      256 such file system instances, having more than 256 distinct
      classes or priorities is unlikely.

   o  Explicit definition of the various specific data items within XDR
      would limit expandability in that any extension would require yet
      another attribute, leading to specification and implementation
      clumsiness. In the context of the NFSv4 extension model in effect
      at the time fs_locations_info was designed (i.e., that described
      in [RFC5661]), this would necessitate a new minor version to
      effect any Standards Track extension to the data in fls_info.

   The set of fls_info data is subject to expansion in a future minor
   version, or in a Standards Track RFC, within the context of a single
   minor version. The server SHOULD NOT send and the client MUST NOT
   use indices within the fls_info array or flag bits that are not
   defined in Standards Track RFCs.
   In light of the new extension model defined in [RFC8178] and the fact
   that the individual items within fls_info are not explicitly
   referenced in the XDR, the following practices should be followed
   when extending or otherwise changing the structure of the data
   returned in fls_info within the scope of a single minor version.

   o  All extensions need to be described by Standards Track documents.
      There is no need for such documents to be marked as updating
      [RFC5661] or this document.

   o  It needs to be made clear whether the information in any added
      data items applies to the replica specified by the entry or to the
      specific network paths specified in the entry.

   o  There needs to be a reliable way defined to determine whether the
      server is aware of the extension. This may be based on the length
      field of the fls_info array, but it is more flexible to provide
      fs-scope or server-scope attributes to indicate what extensions
      are provided.

   This encoding scheme can be adapted to the specification of
   multi-byte numeric values, even though none are currently defined.
   If extensions are made via Standards Track RFCs, multi-byte
   quantities will be encoded as a range of bytes with a range of
   indices, with the bytes interpreted in big-endian byte order.
   Further, any such index assignments will be constrained by the need
   for the relevant quantities not to cross XDR word boundaries.

   The fls_info array currently contains:

   o  Two 8-bit flag fields, one devoted to general file-system
      characteristics and a second reserved for transport-related
      capabilities.

   o  Six 8-bit class values that define various file system equivalence
      classes as explained below.

   o  Four 8-bit priority values that govern file system selection as
      explained below.
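
   Reading the twelve currently defined items out of fls_info amounts
   to indexing the byte array with the FSLI4BX_* constants from the XDR
   definition. The sketch below is illustrative only: the dictionary
   and the function name decode_fls_info are hypothetical, and the
   choice to skip indices beyond the end of a short array, and to
   ignore bytes at undefined indices, is one possible reading of the
   rule that clients not use undefined indices.

```python
# Byte indices of items within fls_info, as given in the XDR definition.
FSLI4BX_INDICES = {
    "FSLI4BX_GFLAGS": 0,
    "FSLI4BX_TFLAGS": 1,
    "FSLI4BX_CLSIMUL": 2,
    "FSLI4BX_CLHANDLE": 3,
    "FSLI4BX_CLFILEID": 4,
    "FSLI4BX_CLWRITEVER": 5,
    "FSLI4BX_CLCHANGE": 6,
    "FSLI4BX_CLREADDIR": 7,
    "FSLI4BX_READRANK": 8,
    "FSLI4BX_WRITERANK": 9,
    "FSLI4BX_READORDER": 10,
    "FSLI4BX_WRITEORDER": 11,
}


def decode_fls_info(fls_info):
    """Return the defined 8-bit items found in fls_info, keyed by name.

    Items whose index lies past the end of the array are omitted, and
    bytes at indices with no defined meaning are ignored.
    """
    return {
        name: fls_info[index]
        for name, index in FSLI4BX_INDICES.items()
        if index < len(fls_info)
    }


# A 13-byte array: the final byte sits at an undefined index (12).
items = decode_fls_info(bytes([0x03, 0x01, 1, 1, 1, 1, 2, 1, 0, 0, 5, 5, 99]))
```

   A client comparing two replicas would then compare, for example, the
   FSLI4BX_CLHANDLE values of the two entries for equality.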
3723 The general file system characteristics flag (at byte index 3724 FSLI4BX_GFLAGS) has the following bits defined within it: 3726 o FSLI4GF_WRITABLE indicates that this file system target is 3727 writable, allowing it to be selected by clients that may need to 3728 write on this file system. When the current file system instance 3729 is writable and is defined as of the same simultaneous use class 3730 (as specified by the value at index FSLI4BX_CLSIMUL) to which the 3731 client was previously writing, then it must incorporate within its 3732 data any committed write made on the source file system instance. 3733 See Section 5.9.6, which discusses the write-verifier class. 3734 While there is no harm in not setting this flag for a file system 3735 that turns out to be writable, turning the flag on for a read-only 3736 file system can cause problems for clients that select a migration 3737 or replication target based on the flag and then find themselves 3738 unable to write. 3740 o FSLI4GF_CUR_REQ indicates that this replica is the one on which 3741 the request is being made. Only a single server entry may have 3742 this flag set and, in the case of a referral, no entry will have 3743 it set. Note that this flag might be set even if the request was 3744 made on a network access path different from any of those 3745 specified in the current entry. 3747 o FSLI4GF_ABSENT indicates that this entry corresponds to an absent 3748 file system replica. It can only be set if FSLI4GF_CUR_REQ is 3749 set. When both such bits are set, it indicates that a file system 3750 instance is not usable but that the information in the entry can 3751 be used to determine the sorts of continuity available when 3752 switching from this replica to other possible replicas. 
      Since this bit can only be true if FSLI4GF_CUR_REQ is true, the
      value could be determined using the fs_status attribute, but the
      information is also made available here for the convenience of the
      client. An entry with this bit set, since it represents a true
      file system (albeit absent), does not appear in the event of a
      referral, but only when a file system has been accessed at this
      location and has subsequently been migrated.

   o  FSLI4GF_GOING indicates that a replica, while still available,
      should not be used further. The client, if using it, should make
      an orderly transfer to another file system instance as
      expeditiously as possible. It is expected that file systems going
      out of service will be announced as FSLI4GF_GOING some time before
      the actual loss of service. It is also expected that the
      fli_valid_for value will be sufficiently small to allow clients to
      detect and act on scheduled events, while large enough that the
      cost of the requests to fetch the fs_locations_info values will
      not be excessive. Values on the order of ten minutes seem
      reasonable.

      When this flag is seen as part of a transition into a new file
      system, a client might choose to transfer immediately to another
      replica, or it may reference the current file system and only
      transition when a migration event occurs. Similarly, when this
      flag appears on a replica in a referral, clients would likely
      avoid being referred to this instance whenever there is another
      choice.

      This flag, like the other items within fls_info, applies to the
      replica rather than to a particular path to that replica. When
      it appears, a transition to a new replica, rather than to a
      different path to the same replica, is indicated.

   o  FSLI4GF_SPLIT indicates that when a transition occurs from the
      current file system instance to this one, the replacement may
      consist of multiple file systems. In this case, the client has to
      be prepared for the possibility that objects on the same file
      system before migration will be on different ones after. Note
      that FSLI4GF_SPLIT is not incompatible with the file systems
      belonging to the same fileid class since, if one has a set of
      fileids that are unique within a file system, each subset assigned
      to a smaller file system after migration would not have any
      conflicts internal to that file system.

      A client, in the case of a split file system, will interrogate
      existing files with which it has continuing connection (it is free
      to simply forget cached filehandles). If the client remembers the
      directory filehandle associated with each open file, it may
      proceed upward using LOOKUPP to find the new file system
      boundaries. Note that in the event of a referral, there will not
      be any such files and so these actions will not be performed.
      Instead, a reference to a portion of the original file system now
      split off into other file systems will encounter an fsid change
      and possibly a further referral.

      Once the client recognizes that one file system has been split
      into two, it can prevent the disruption of running applications by
      presenting the two file systems as a single one until a convenient
      point to recognize the transition, such as a restart. This would
      require a mapping from the server's fsids to fsids as seen by the
      client, but this is already necessary for other reasons. As noted
      above, existing fileids within the two descendant file systems
      will not conflict. Providing non-conflicting fileids for newly
      created files on the split file systems is the responsibility of
      the server (or servers working in concert).
      The server can encode filehandles such that filehandles generated
      before the split event can be discerned from those generated after
      the split, allowing the server to determine when the need for
      emulating two file systems as one is over.

      Although it is possible for this flag to be present in the event
      of referral, it would generally be of little interest to the
      client, since the client is not expected to have information
      regarding the current contents of the absent file system.

   The transport-flag field (at byte index FSLI4BX_TFLAGS) contains the
   following bits related to the transport capabilities of the specific
   network path(s) specified by the entry:

   o  FSLI4TF_RDMA indicates that any specified network paths provide
      NFSv4.1 clients access using an RDMA-capable transport.

   Attribute continuity and file system identity information are
   expressed by defining equivalence relations on the sets of file
   systems presented to the client. Each such relation is expressed as
   a set of file system equivalence classes. For each relation, a file
   system has an 8-bit class number. Two file systems belong to the
   same class if both have identical non-zero class numbers. Zero is
   treated as non-matching. Most often, the relevant question for the
   client will be whether a given replica is identical to or continuous
   with the current one in a given respect, but the information should
   also be available as to whether two other replicas match in that
   respect.

   The following fields specify the file system's class numbers for the
   equivalence relations used in determining the nature of file system
   transitions. See Sections 5.7 through 5.12 and their various
   subsections for details about how this information is to be used.
   Servers may assign these values as they wish, so long as file system
   instances that share the same value have the specified relationship
   to one another; conversely, file systems that have the specified
   relationship to one another share a common class value. As each
   instance entry is added, the relationships of this instance to
   previously entered instances can be consulted, and if one is found
   that bears the specified relationship, that entry's class value can
   be copied to the new entry. When no such previous entry exists, a
   new value for that byte index (not previously used) can be selected,
   most likely by incrementing the value of the last class value
   assigned for that index.

   o  The field with byte index FSLI4BX_CLSIMUL defines the
      simultaneous-use class for the file system.

   o  The field with byte index FSLI4BX_CLHANDLE defines the handle
      class for the file system.

   o  The field with byte index FSLI4BX_CLFILEID defines the fileid
      class for the file system.

   o  The field with byte index FSLI4BX_CLWRITEVER defines the
      write-verifier class for the file system.

   o  The field with byte index FSLI4BX_CLCHANGE defines the change
      class for the file system.

   o  The field with byte index FSLI4BX_CLREADDIR defines the readdir
      class for the file system.

   Server-specified preference information is also provided via 8-bit
   values within the fls_info array. The values provide a rank and an
   order (see below) to be used with separate values specifiable for the
   cases of read-only and writable file systems. These values are
   compared for different file systems to establish the server-specified
   preference, with lower values indicating "more preferred".

   Rank is used to express a strict server-imposed ordering on clients,
   with lower values indicating "more preferred".
   Clients should attempt to use all replicas with a given rank before
   they use one with a higher rank. Only if all of those file systems
   are unavailable should the client proceed to those of a higher rank.
   Because specifying a rank will override client preferences, servers
   should be conservative about using this mechanism, particularly when
   the environment is one in which client communication characteristics
   are neither tightly controlled nor visible to the server.

   Within a rank, the order value is used to specify the server's
   preference to guide the client's selection when the client's own
   preferences are not controlling, with lower values of order
   indicating "more preferred". If replicas are approximately equal in
   all respects, clients should defer to the order specified by the
   server. When clients look at server latency as part of their
   selection, they are free to use this criterion, but it is suggested
   that when latency differences are not significant, the
   server-specified order should guide selection.

   o  The field at byte index FSLI4BX_READRANK gives the rank value to
      be used for read-only access.

   o  The field at byte index FSLI4BX_READORDER gives the order value to
      be used for read-only access.

   o  The field at byte index FSLI4BX_WRITERANK gives the rank value to
      be used for writable access.

   o  The field at byte index FSLI4BX_WRITEORDER gives the order value
      to be used for writable access.

   Depending on the potential need for write access by a given client,
   one of the pairs of rank and order values is used. The read rank and
   order should only be used if the client knows that only reading will
   ever be done or if it is prepared to switch to a different replica in
   the event that any write access capability is required in the future.

5.15.2.  Updated Section 11.10.2 of [RFC5661] entitled "The
         fs_locations_info4 Structure"

   The fs_locations_info4 structure, encoding the fs_locations_info
   attribute, contains the following:

   o  The fli_flags field, which contains general flags that affect the
      interpretation of this fs_locations_info4 structure and all
      fs_locations_item4 structures within it. The only flag currently
      defined is FSLI4IF_VAR_SUB. All bits in the fli_flags field that
      are not defined should always be returned as zero.

   o  The fli_fs_root field, which contains the pathname of the root of
      the current file system on the current server, just as it does in
      the fs_locations4 structure.

   o  An array called fli_items of fs_locations_item4 structures, which
      contain information about replicas of the current file system.
      Where the current file system is actually present, or has been
      present, i.e., this is not a referral situation, one of the
      fs_locations_item4 structures will contain an fs_locations_server4
      for the current server. This structure will have FSLI4GF_ABSENT
      set if the current file system is absent, i.e., normal access to
      it will return NFS4ERR_MOVED.

   o  The fli_valid_for field specifies a time in seconds for which it
      is reasonable for a client to use the fs_locations_info attribute
      without refetch. The fli_valid_for value does not provide a
      guarantee of validity since servers can unexpectedly go out of
      service or become inaccessible for any number of reasons. Clients
      are well-advised to refetch this information for an actively
      accessed file system every fli_valid_for seconds. This is
      particularly important when file system replicas may go out of
      service in a controlled way using the FSLI4GF_GOING flag to
      communicate an ongoing change.
      The server should set fli_valid_for to a value that allows
      well-behaved clients to notice the FSLI4GF_GOING flag and make an
      orderly switch before the loss of service becomes effective. If
      this value is zero, then no refetch interval is appropriate and
      the client need not refetch this data on any particular schedule.
      In the event of a transition to a new file system instance, a new
      value of the fs_locations_info attribute will be fetched at the
      destination. It is to be expected that this may have a different
      fli_valid_for value, which the client should then use in the same
      fashion as the previous value. Because a refetch of the attribute
      causes information from all component entries to be refetched, the
      server will typically provide a low value for this field if any of
      the replicas are likely to go out of service in a short time
      frame. Note that, because of the ability of the server to return
      NFS4ERR_MOVED to trigger the use of different paths, when
      alternate trunked paths are available, there is generally no need
      to use low values of fli_valid_for in connection with the
      management of alternate paths to the same replica.

   The FSLI4IF_VAR_SUB flag within fli_flags controls whether variable
   substitution is to be enabled. See Section 5.15.3 for an explanation
   of variable substitution.

5.15.3.  Updated Section 11.10.3 of [RFC5661] entitled "The
         fs_locations_item4 Structure"

   The fs_locations_item4 structure contains a pathname (in the field
   fli_rootpath) that encodes the path of the target file system
   replicas on the set of servers designated by the included
   fs_locations_server4 entries. The precise manner in which this
   target location is specified depends on the value of the
   FSLI4IF_VAR_SUB flag within the associated fs_locations_info4
   structure.

   If this flag is not set, then fli_rootpath simply designates the
   location of the target file system within each server's single-server
   namespace just as it does for the rootpath within the fs_location4
   structure. When this bit is set, however, component entries of a
   certain form are subject to client-specific variable substitution so
   as to allow a degree of namespace non-uniformity in order to
   accommodate the selection of client-specific file system targets to
   adapt to different client architectures or other characteristics.

   When such substitution is in effect, a variable beginning with the
   string "${" and ending with the string "}" and containing a colon is
   to be replaced by the client-specific value associated with that
   variable. The string "unknown" should be used by the client when it
   has no value for such a variable. The pathname resulting from such
   substitutions is used to designate the target file system, so that
   different clients may have different file systems, corresponding to
   that location in the multi-server namespace.

   As mentioned above, such substituted pathname variables contain a
   colon. The part before the colon is to be a DNS domain name, and the
   part after is to be a case-insensitive alphanumeric string.

   Where the domain is "ietf.org", only variable names defined in this
   document or subsequent Standards Track RFCs are subject to such
   substitution. Organizations are free to use their domain names to
   create their own sets of client-specific variables, to be subject to
   such substitution. In cases where such variables are intended to be
   used more broadly than a single organization, publication of an
   Informational RFC defining such variables is RECOMMENDED.

   The variable ${ietf.org:CPU_ARCH} is used to denote the CPU
   architecture for which object files are compiled.
   This specification does not limit the acceptable values (except that
   they must be valid UTF-8 strings), but such values as "x86",
   "x86_64", and "sparc" would be expected to be used in line with
   industry practice.

   The variable ${ietf.org:OS_TYPE} is used to denote the operating
   system, and thus the kernel and library APIs, for which code might be
   compiled. This specification does not limit the acceptable values
   (except that they must be valid UTF-8 strings), but such values as
   "linux" and "freebsd" would be expected to be used in line with
   industry practice.

   The variable ${ietf.org:OS_VERSION} is used to denote the operating
   system version, and thus the specific details of versioned
   interfaces, for which code might be compiled. This specification
   does not limit the acceptable values (except that they must be valid
   UTF-8 strings). However, combinations of numbers and letters with
   interspersed dots would be expected to be used in line with industry
   practice, with the details of the version format depending on the
   specific value of the variable ${ietf.org:OS_TYPE} with which it is
   used.

   Use of these variables could result in the direction of different
   clients to different file systems on the same server, as appropriate
   to particular clients. In cases in which the target file systems are
   located on different servers, a single server could serve as a
   referral point so that each valid combination of variable values
   would designate a referral hosted on a single server, with the
   targets of those referrals on a number of different servers.

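   The substitution rules above might be sketched in Python as follows.
   This is a minimal illustration under stated assumptions, not a
   normative algorithm: the function name substitute and its parameters
   are hypothetical; the name part is allowed to contain "_" so that
   names such as CPU_ARCH match, although the text says only
   "alphanumeric"; the domain part is compared case-insensitively; and
   a variable may appear anywhere within a pathname component.

```python
import re

# Matches ${domain:name}. Accepting "_" in the name is an ASSUMPTION made
# so that the variables defined in this section (e.g., CPU_ARCH) match.
_VAR = re.compile(r"\$\{([A-Za-z0-9.-]+):([A-Za-z0-9_]+)\}")


def substitute(components, client_values, var_sub_enabled):
    """Hypothetical helper: apply client-specific variable substitution.

    components      -- list of pathname components from fli_rootpath
    client_values   -- dict mapping (domain, name) to the client's value
    var_sub_enabled -- True when FSLI4IF_VAR_SUB is set in fli_flags
    """
    if not var_sub_enabled:
        # Without the flag, the path is used exactly as given.
        return list(components)

    # Variable names are case-insensitive, so normalize before lookup.
    table = {(d.lower(), n.upper()): v
             for (d, n), v in client_values.items()}

    def replace(match):
        key = (match.group(1).lower(), match.group(2).upper())
        # "unknown" is used when the client has no value for a variable.
        return table.get(key, "unknown")

    return [_VAR.sub(replace, c) for c in components]
```

   For example, a client holding values for CPU_ARCH and OS_TYPE but
   not OS_VERSION would map the components "${ietf.org:CPU_ARCH}",
   "${ietf.org:OS_TYPE}", and "${ietf.org:OS_VERSION}" to its two known
   values followed by "unknown".
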
   Because namespace administration is affected by the values selected
   to substitute for various variables, clients should provide
   convenient means of determining what variable substitutions a client
   will implement, as well as, where appropriate, providing means to
   control the substitutions to be used. The exact means by which this
   will be done is outside the scope of this specification.

   Although variable substitution is most suitable for use in the
   context of referrals, it may be used in the context of replication
   and migration. If it is used in these contexts, the server must
   ensure that no matter what values the client presents for the
   substituted variables, the result is always a valid successor file
   system instance to that from which a transition is occurring, i.e.,
   that the data is identical or represents a later image of a writable
   file system.

   Note that when fli_rootpath is a null pathname (that is, one with
   zero components), the file system designated is at the root of the
   specified server, whether or not the FSLI4IF_VAR_SUB flag within the
   associated fs_locations_info4 structure is set.

5.16.  Transferred Section 11.11 of [RFC5661] entitled "The Attribute
       fs_status"

   In an environment in which multiple copies of the same basic set of
   data are available, information regarding the particular source of
   such data and the relationships among different copies can be very
   helpful in providing consistent data to applications.

   enum fs4_status_type {
           STATUS4_FIXED = 1,
           STATUS4_UPDATED = 2,
           STATUS4_VERSIONED = 3,
           STATUS4_WRITABLE = 4,
           STATUS4_REFERRAL = 5
   };

   struct fs4_status {
           bool            fss_absent;
           fs4_status_type fss_type;
           utf8str_cs      fss_source;
           utf8str_cs      fss_current;
           int32_t         fss_age;
           nfstime4        fss_version;
   };

   The boolean fss_absent indicates whether the file system is currently
   absent. This value will be set if the file system was previously
   present and becomes absent, or if the file system has never been
   present and the type is STATUS4_REFERRAL. When this boolean is set
   and the type is not STATUS4_REFERRAL, the remaining information in
   the fs4_status reflects that last valid when the file system was
   present.

   The fss_type field indicates the kind of file system image
   represented. This is of particular importance when using the version
   values to determine appropriate succession of file system images.
   When fss_absent is set, and the file system was previously present,
   the value of fss_type reflected is that when the file system was
   last present. Five values are distinguished:

   o  STATUS4_FIXED, which indicates a read-only image in the sense that
      it will never change. The possibility is allowed that, as a
      result of migration or switch to a different image, changed data
      can be accessed, but within the confines of this instance, no
      change is allowed. The client can use this fact to cache
      aggressively.

   o  STATUS4_VERSIONED, which indicates that the image, like the
      STATUS4_UPDATED case, is updated externally, but it provides a
      guarantee that the server will carefully update an associated
      version value so that the client can protect itself from a
      situation in which it reads data from one version of the file
      system and then later reads data from an earlier version of the
      same file system.
      See below for a discussion of how this can be done.

   o  STATUS4_UPDATED, which indicates an image that cannot be updated
      by the user writing to it but that may be changed externally,
      typically because it is a periodically updated copy of another
      writable file system somewhere else. In this case, version
      information is not provided, and the client does not have the
      responsibility of making sure that this version only advances upon
      a file system instance transition. In this case, it is the
      responsibility of the server to make sure that the data presented
      after a file system instance transition is a proper successor
      image and includes all changes seen by the client and any change
      made before all such changes.

   o  STATUS4_WRITABLE, which indicates that the file system is an
      actual writable one. The client need not, of course, actually
      write to the file system, but once it does, it should not accept a
      transition to anything other than a writable instance of that same
      file system.

   o  STATUS4_REFERRAL, which indicates that the file system in question
      is absent and has never been present on this server.

   Note that in the STATUS4_UPDATED and STATUS4_VERSIONED cases, the
   server is responsible for the appropriate handling of locks that are
   inconsistent with external changes to delegations. If a server gives
   out delegations, they SHOULD be recalled before an inconsistent
   change is made to the data, and MUST be revoked if this is not
   possible. Similarly, if an OPEN is inconsistent with data that is
   changed (the OPEN has OPEN4_SHARE_DENY_WRITE/OPEN4_SHARE_DENY_BOTH
   and the data is changed), that OPEN SHOULD be considered
   administratively revoked.

   The opaque strings fss_source and fss_current provide a way of
   presenting information about the source of the file system image
   being presented.
   It is not intended that the client do anything with this information
   other than make it available to administrative tools. It is intended
   that this information be helpful when researching possible problems
   with a file system image that might arise when it is unclear if the
   correct image is being accessed and, if not, how that image came to
   be made. This kind of diagnostic information will be helpful, if, as
   seems likely, copies of file systems are made in many different ways
   (e.g., simple user-level copies, file-system-level point-in-time
   copies, clones of the underlying storage), under a variety of
   administrative arrangements. In such environments, determining how a
   given set of data was constructed can be very helpful in resolving
   problems.

   The opaque string fss_source is used to indicate the source of a
   given file system with the expectation that tools capable of creating
   a file system image propagate this information, when possible. It is
   understood that this may not always be possible since a user-level
   copy may be thought of as creating a new data set and the tools used
   may have no mechanism to propagate this data. When a file system is
   initially created, it is desirable to associate with it data
   regarding how the file system was created, where it was created, who
   created it, etc. Making this information available in this attribute
   in a human-readable string will be helpful for applications and
   system administrators and will also serve to make it available when
   the original file system is used to make subsequent copies.

   The opaque string fss_current should provide whatever information is
   available about the source of the current copy.
   Such information includes the tool creating it, any relevant
   parameters to that tool, the time at which the copy was done, the
   user making the change, the server on which the change was made,
   etc. All information should be in a human-readable string.

   The field fss_age provides an indication of how out-of-date the file
   system currently is with respect to its ultimate data source (in case
   of cascading data updates). This complements the fls_currency field
   of fs_locations_server4 (see Section 5.15) in the following way: the
   information in fls_currency gives a bound for how out-of-date the
   data in a file system might typically get, while the value in fss_age
   gives a bound on how out-of-date that data actually is. Negative
   values imply that no information is available. A zero means that
   this data is known to be current. A positive value means that this
   data is known to be no older than that number of seconds with respect
   to the ultimate data source. Using this value, the client may be
   able to decide that a data copy is too old, so that it may search for
   a newer version to use.

   The fss_version field provides a version identification, in the form
   of a time value, such that successive versions always have later time
   values. When the fss_type is anything other than STATUS4_VERSIONED,
   the server may provide such a value, but there is no guarantee as to
   its validity and clients will not use it except to provide additional
   information to add to fss_source and fss_current.

   When fss_type is STATUS4_VERSIONED, servers SHOULD provide a value of
   fss_version that progresses monotonically whenever any new version of
   the data is established. This allows the client, if reliable image
   progression is important to it, to fetch this attribute as part of
   each COMPOUND where data or metadata from the file system is used.

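   The interpretation of fss_age and the ordering of fss_version values
   described above can be sketched in Python. This is an illustration
   under assumptions: NfsTime is a hypothetical stand-in for nfstime4
   (a seconds and nanoseconds pair), and the helper names and the
   limit_seconds policy parameter are not defined by this document.

```python
from typing import NamedTuple, Optional


class NfsTime(NamedTuple):
    """Hypothetical stand-in for nfstime4: seconds, then nanoseconds."""
    seconds: int
    nseconds: int


def describe_age(fss_age: int) -> str:
    """Map fss_age to the three cases distinguished in the text."""
    if fss_age < 0:
        return "unknown"        # negative: no information available
    if fss_age == 0:
        return "current"        # zero: data known to be current
    return f"at most {fss_age} seconds old"


def exceeds_limit(fss_age: int, limit_seconds: int) -> Optional[bool]:
    """Client policy sketch: is the copy older than the client accepts?

    Returns None when the age is unknown; what a client does in that
    case is a local decision not covered here.
    """
    if fss_age < 0:
        return None
    return fss_age > limit_seconds


def not_earlier(candidate: NfsTime, last_seen: NfsTime) -> bool:
    """True if candidate is the same version as last_seen or a later one.

    Tuples compare field by field, so seconds are compared first and
    nanoseconds break ties.
    """
    return candidate >= last_seen
```

   Comparing the (seconds, nseconds) pair as a tuple relies on
   successive versions always having later time values, which is only
   guaranteed when fss_type is STATUS4_VERSIONED.
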
   When it is important to the client to make sure that only valid
   successor images are accepted, it must make sure that it does not
   read data or metadata from the file system without updating its sense
   of the current state of the image. This is to avoid the possibility
   that the fs_status that the client holds will be one for an earlier
   image, which would cause the client to accept a new file system
   instance that is later than that but still earlier than the updated
   data read by the client.

   In order to accept valid images reliably, the client must do a
   GETATTR of the fs_status attribute that follows any interrogation of
   data or metadata within the file system in question. Often this is
   most conveniently done by appending such a GETATTR after all other
   operations that reference a given file system. When errors occur
   between reading file system data and performing such a GETATTR, care
   must be exercised to make sure that the data in question is not used
   before obtaining the proper fs_status value. In this connection,
   when an OPEN is done within such a versioned file system and the
   associated GETATTR of fs_status is not successfully completed, the
   open file in question must not be accessed until that fs_status is
   fetched.

   The procedure above will ensure that before using any data from the
   file system the client has in hand a newly-fetched current version of
   the file system image. Multiple values for multiple requests in
   flight can be resolved by assembling them into the required partial
   order (and the elements should form a total order within the partial
   order) and using the last.
The client may then, when switching among 4254 file system instances, decline to use an instance that does not have 4255 an fss_type of STATUS4_VERSIONED or whose fss_version field is 4256 earlier than the last one obtained from the predecessor file system 4257 instance. 4259 6. Revised Error Definitions within [RFC5661] 4261 6.1. Added Initial subsection of Section 15.1 of [RFC5661] entitled 4262 "Overall Error Table" 4264 This section contains an updated table including all NFSv4.1 error 4265 codes. In each case a reference to the most-current description is 4266 given, whether that description is within this document or [RFC5661]. 4268 Updated Error Definition References 4270 +------------------------------------+--------+---------------------+ 4271 | Error | Number | Description | 4272 +------------------------------------+--------+---------------------+ 4273 | NFS4_OK | 0 | 15.1.3.1 in RFC5661 | 4274 | NFS4ERR_ACCESS | 13 | 15.1.6.1 in RFC5661 | 4275 | NFS4ERR_ATTRNOTSUPP | 10032 | 15.1.15.1 in | 4276 | | | RFC5661 | 4277 | NFS4ERR_ADMIN_REVOKED | 10047 | 15.1.5.1 in RFC5661 | 4278 | NFS4ERR_BACK_CHAN_BUSY | 10057 | 15.1.12.1 in | 4279 | | | RFC5661 | 4280 | NFS4ERR_BADCHAR | 10040 | 15.1.7.1 in RFC5661 | 4281 | NFS4ERR_BADHANDLE | 10001 | 15.1.2.1 in RFC5661 | 4282 | NFS4ERR_BADIOMODE | 10049 | 15.1.10.1 in | 4283 | | | RFC5661 | 4284 | NFS4ERR_BADLAYOUT | 10050 | 15.1.10.2 in | 4285 | | | RFC5661 | 4286 | NFS4ERR_BADNAME | 10041 | 15.1.7.2 in RFC5661 | 4287 | NFS4ERR_BADOWNER | 10039 | 15.1.15.2 in | 4288 | | | RFC5661 | 4289 | NFS4ERR_BADSESSION | 10052 | 15.1.11.1 in | 4290 | | | RFC5661 | 4291 | NFS4ERR_BADSLOT | 10053 | 15.1.11.2 in | 4292 | | | RFC5661 | 4293 | NFS4ERR_BADTYPE | 10007 | 15.1.4.1 in RFC5661 | 4294 | NFS4ERR_BADXDR | 10036 | 15.1.1.1 in RFC5661 | 4295 | NFS4ERR_BAD_COOKIE | 10003 | 15.1.1.2 in RFC5661 | 4296 | NFS4ERR_BAD_HIGH_SLOT | 10077 | 15.1.11.3 in | 4297 | | | RFC5661 | 4298 | NFS4ERR_BAD_RANGE | 10042 | 15.1.8.1 in RFC5661 | 4299 
| NFS4ERR_BAD_SEQID | 10026 | 15.1.16.1 in | 4300 | | | RFC5661 | 4301 | NFS4ERR_BAD_SESSION_DIGEST | 10051 | 15.1.12.2 in | 4302 | | | RFC5661 | 4303 | NFS4ERR_BAD_STATEID | 10025 | 15.1.5.2 in RFC5661 | 4304 | NFS4ERR_CB_PATH_DOWN | 10048 | 15.1.11.4 in | 4305 | | | RFC5661 | 4306 | NFS4ERR_CLID_INUSE | 10017 | 15.1.13.2 in | 4307 | | | RFC5661 | 4308 | NFS4ERR_CLIENTID_BUSY | 10074 | 15.1.13.1 in | 4309 | | | RFC5661 | 4310 | NFS4ERR_COMPLETE_ALREADY | 10054 | Section 6.3.1 | 4311 | NFS4ERR_CONN_NOT_BOUND_TO_SESSION | 10055 | 15.1.11.6 in | 4312 | | | RFC5661 | 4313 | NFS4ERR_DEADLOCK | 10045 | 15.1.8.2 in RFC5661 | 4314 | NFS4ERR_DEADSESSION | 10078 | 15.1.11.5 in | 4315 | | | RFC5661 | 4316 | NFS4ERR_DELAY | 10008 | 15.1.1.3 in RFC5661 | 4317 | NFS4ERR_DELEG_ALREADY_WANTED | 10056 | 15.1.14.1 in | 4318 | | | RFC5661 | 4319 | NFS4ERR_DELEG_REVOKED | 10087 | 15.1.5.3 in RFC5661 | 4320 | NFS4ERR_DENIED | 10010 | 15.1.8.3 in RFC5661 | 4321 | NFS4ERR_DIRDELEG_UNAVAIL | 10084 | 15.1.14.2 in | 4322 | | | RFC5661 | 4323 | NFS4ERR_DQUOT | 69 | 15.1.4.2 in RFC5661 | 4324 | NFS4ERR_ENCR_ALG_UNSUPP | 10079 | 15.1.13.3 in | 4325 | | | RFC5661 | 4326 | NFS4ERR_EXIST | 17 | 15.1.4.3 in RFC5661 | 4327 | NFS4ERR_EXPIRED | 10011 | 15.1.5.4 in RFC5661 | 4328 | NFS4ERR_FBIG | 27 | 15.1.4.4 in RFC5661 | 4329 | NFS4ERR_FHEXPIRED | 10014 | 15.1.2.2 in RFC5661 | 4330 | NFS4ERR_FILE_OPEN | 10046 | 15.1.4.5 in RFC5661 | 4331 | NFS4ERR_GRACE | 10013 | Section 6.3.2 | 4332 | NFS4ERR_HASH_ALG_UNSUPP | 10072 | 15.1.13.4 in | 4333 | | | RFC5661 | 4334 | NFS4ERR_INVAL | 22 | 15.1.1.4 in RFC5661 | 4335 | NFS4ERR_IO | 5 | 15.1.4.6 in RFC5661 | 4336 | NFS4ERR_ISDIR | 21 | 15.1.2.3 in RFC5661 | 4337 | NFS4ERR_LAYOUTTRYLATER | 10058 | 15.1.10.3 in | 4338 | | | RFC5661 | 4339 | NFS4ERR_LAYOUTUNAVAILABLE | 10059 | 15.1.10.4 in | 4340 | | | RFC5661 | 4341 | NFS4ERR_LEASE_MOVED | 10031 | 15.1.16.2 in | 4342 | | | RFC5661 | 4343 | NFS4ERR_LOCKED | 10012 | 15.1.8.4 in RFC5661 | 4344 | 
NFS4ERR_LOCKS_HELD | 10037 | 15.1.8.5 in RFC5661 | 4345 | NFS4ERR_LOCK_NOTSUPP | 10043 | 15.1.8.6 in RFC5661 | 4346 | NFS4ERR_LOCK_RANGE | 10028 | 15.1.8.7 in RFC5661 | 4347 | NFS4ERR_MINOR_VERS_MISMATCH | 10021 | 15.1.3.2 in RFC5661 | 4348 | NFS4ERR_MLINK | 31 | 15.1.4.7 in RFC5661 | 4349 | NFS4ERR_MOVED | 10019 | Section 6.2 | 4350 | NFS4ERR_NAMETOOLONG | 63 | 15.1.7.3 in RFC5661 | 4351 | NFS4ERR_NOENT | 2 | 15.1.4.8 in RFC5661 | 4352 | NFS4ERR_NOFILEHANDLE | 10020 | 15.1.2.5 in RFC5661 | 4353 | NFS4ERR_NOMATCHING_LAYOUT | 10060 | 15.1.10.5 in | 4354 | | | RFC5661 | 4355 | NFS4ERR_NOSPC | 28 | 15.1.4.9 in RFC5661 | 4356 | NFS4ERR_NOTDIR | 20 | 15.1.2.6 in RFC5661 | 4357 | NFS4ERR_NOTEMPTY | 66 | 15.1.4.10 in | 4358 | | | RFC5661 | 4359 | NFS4ERR_NOTSUPP | 10004 | 15.1.1.5 in RFC5661 | 4360 | NFS4ERR_NOT_ONLY_OP | 10081 | 15.1.3.3 in RFC5661 | 4361 | NFS4ERR_NOT_SAME | 10027 | 15.1.15.3 in | 4362 | | | RFC5661 | 4363 | NFS4ERR_NO_GRACE | 10033 | Section 6.3.3 | 4364 | NFS4ERR_NXIO | 6 | 15.1.16.3 in | 4365 | | | RFC5661 | 4366 | NFS4ERR_OLD_STATEID | 10024 | 15.1.5.5 in RFC5661 | 4367 | NFS4ERR_OPENMODE | 10038 | 15.1.8.8 in RFC5661 | 4368 | NFS4ERR_OP_ILLEGAL | 10044 | 15.1.3.4 in RFC5661 | 4369 | NFS4ERR_OP_NOT_IN_SESSION | 10071 | 15.1.3.5 in RFC5661 | 4370 | NFS4ERR_PERM | 1 | 15.1.6.2 in RFC5661 | 4371 | NFS4ERR_PNFS_IO_HOLE | 10075 | 15.1.10.6 in | 4372 | | | RFC5661 | 4373 | NFS4ERR_PNFS_NO_LAYOUT | 10080 | 15.1.10.7 in | 4374 | | | RFC5661 | 4375 | NFS4ERR_RECALLCONFLICT | 10061 | 15.1.14.3 in | 4376 | | | RFC5661 | 4377 | NFS4ERR_RECLAIM_BAD | 10034 | Section 6.3.4 | 4378 | NFS4ERR_RECLAIM_CONFLICT | 10035 | Section 6.3.5 | 4379 | NFS4ERR_REJECT_DELEG | 10085 | 15.1.14.4 in | 4380 | | | RFC5661 | 4381 | NFS4ERR_REP_TOO_BIG | 10066 | 15.1.3.6 in RFC5661 | 4382 | NFS4ERR_REP_TOO_BIG_TO_CACHE | 10067 | 15.1.3.7 in RFC5661 | 4383 | NFS4ERR_REQ_TOO_BIG | 10065 | 15.1.3.8 in RFC5661 | 4384 | NFS4ERR_RESTOREFH | 10030 | 15.1.16.4 in | 4385 | | | RFC5661 | 4386 | 
   | NFS4ERR_RETRY_UNCACHED_REP         | 10068  | 15.1.3.9 in RFC5661 |
   | NFS4ERR_RETURNCONFLICT             | 10086  | 15.1.10.8 in        |
   |                                    |        | RFC5661             |
   | NFS4ERR_ROFS                       | 30     | 15.1.4.11 in        |
   |                                    |        | RFC5661             |
   | NFS4ERR_SAME                       | 10009  | 15.1.15.4 in        |
   |                                    |        | RFC5661             |
   | NFS4ERR_SHARE_DENIED               | 10015  | 15.1.8.9 in RFC5661 |
   | NFS4ERR_SEQUENCE_POS               | 10064  | 15.1.3.10 in        |
   |                                    |        | RFC5661             |
   | NFS4ERR_SEQ_FALSE_RETRY            | 10076  | 15.1.11.7 in        |
   |                                    |        | RFC5661             |
   | NFS4ERR_SEQ_MISORDERED             | 10063  | 15.1.11.8 in        |
   |                                    |        | RFC5661             |
   | NFS4ERR_SERVERFAULT                | 10006  | 15.1.1.6 in RFC5661 |
   | NFS4ERR_STALE                      | 70     | 15.1.2.7 in RFC5661 |
   | NFS4ERR_STALE_CLIENTID             | 10022  | 15.1.13.5 in        |
   |                                    |        | RFC5661             |
   | NFS4ERR_STALE_STATEID              | 10023  | 15.1.16.5 in        |
   |                                    |        | RFC5661             |
   | NFS4ERR_SYMLINK                    | 10029  | 15.1.2.8 in RFC5661 |
   | NFS4ERR_TOOSMALL                   | 10005  | 15.1.1.7 in RFC5661 |
   | NFS4ERR_TOO_MANY_OPS               | 10070  | 15.1.3.11 in        |
   |                                    |        | RFC5661             |
   | NFS4ERR_UNKNOWN_LAYOUTTYPE         | 10062  | 15.1.10.9 in        |
   |                                    |        | RFC5661             |
   | NFS4ERR_UNSAFE_COMPOUND            | 10069  | 15.1.3.12 in        |
   |                                    |        | RFC5661             |
   | NFS4ERR_WRONGSEC                   | 10016  | 15.1.6.3 in RFC5661 |
   | NFS4ERR_WRONG_CRED                 | 10082  | 15.1.6.4 in RFC5661 |
   | NFS4ERR_WRONG_TYPE                 | 10083  | 15.1.2.9 in RFC5661 |
   | NFS4ERR_XDEV                       | 18     | 15.1.4.12 in        |
   |                                    |        | RFC5661             |
   +------------------------------------+--------+---------------------+

                                  Table 1

6.2.  Updated Section 15.1.2.4 of [RFC5661] entitled "NFS4ERR_MOVED
      (Error Code 10019)"

   The file system that contains the current filehandle object is not
   accessible using the address on which the request was made. It still
   might be accessible using other addresses server-trunkable with it or
   it might not be present at the server. In the latter case, it might
   have been relocated or migrated to another server, or it might have
   never been present.
   The client may obtain information regarding access to the file
   system location by obtaining the "fs_locations" or
   "fs_locations_info" attribute for the current filehandle. For
   further discussion, refer to Section 5.

6.3.  Updated Section 15.1.9 of [RFC5661] entitled "Reclaim Errors"

   These errors relate to the process of reclaiming locks after a server
   restart or in connection with the migration of a file system (i.e.,
   in the case in which rca_one_fs is TRUE).

6.3.1.  Updated Section 15.1.9.1 of [RFC5661] entitled
        "NFS4ERR_COMPLETE_ALREADY (Error Code 10054)"

   The client previously sent a successful RECLAIM_COMPLETE operation
   specifying the same scope, whether that scope is global or for the
   same file system in the case of a per-fs RECLAIM_COMPLETE. An
   additional RECLAIM_COMPLETE operation is not necessary and results in
   this error.

6.3.2.  Updated Section 15.1.9.2 of [RFC5661] entitled "NFS4ERR_GRACE
        (Error Code 10013)"

   The server was in its recovery or grace period, with regard to the
   file system object for which the lock was requested. The locking
   request was not a reclaim request and so could not be granted during
   that period.

6.3.3.  Updated Section 15.1.9.3 of [RFC5661] entitled "NFS4ERR_NO_GRACE
        (Error Code 10033)"

   A reclaim of client state was attempted in circumstances in which the
   server cannot guarantee that conflicting state has not been provided
   to another client. This can occur because the reclaim has been done
   outside of a grace period implemented by the server, after the client
   has done a RECLAIM_COMPLETE operation which ends its ability to
   reclaim the requested lock, or because previous operations have
   created a situation in which the server is not able to determine that
   a reclaim-interfering edge condition does not exist.

6.3.4.
Updated Section 15.1.9.4 of [RFC5661] entitled
        "NFS4ERR_RECLAIM_BAD (Error Code 10034)"

   The server has determined that a reclaim attempted by the client is
   not valid, i.e., the lock specified as being reclaimed could not
   possibly have existed before the server restart or file system
   migration event. A server is not obliged to make this determination
   and will typically rely on the client to only reclaim locks that the
   client was granted prior to restart or file system migration.
   However, when a server does have reliable information to enable it
   to make this determination, this error indicates that the reclaim has
   been rejected as invalid. This is as opposed to the error
   NFS4ERR_RECLAIM_CONFLICT (see Section 6.3.5) where the server can
   only determine that there has been an invalid reclaim, but cannot
   determine which request is invalid.

6.3.5.  Updated Section 15.1.9.5 of [RFC5661] entitled
        "NFS4ERR_RECLAIM_CONFLICT (Error Code 10035)"

   The reclaim attempted by the client has encountered a conflict and
   cannot be satisfied. This error potentially indicates a misbehaving
   client, although not necessarily the one receiving the error. The
   misbehavior might be on the part of the client that established the
   lock with which this client conflicted. See also Section 6.3.4 for
   the related error, NFS4ERR_RECLAIM_BAD.

7.  Revised Operations within [RFC5661]

7.1.  Updated Section 18.35 of [RFC5661] entitled "Operation 42:
      EXCHANGE_ID - Instantiate Client ID"

   The EXCHANGE_ID operation exchanges long-hand client and server
   identifiers (owners), and provides access to a client ID, creating
   one if necessary. This client ID becomes associated with the
   connection on which the operation is done, so that it is available
   when a CREATE_SESSION is done or when the connection is used to issue
   a request on an existing session associated with the current client.

7.1.1.
Updated Section 18.35.1 of [RFC5661] entitled "ARGUMENT"

   const EXCHGID4_FLAG_SUPP_MOVED_REFER    = 0x00000001;
   const EXCHGID4_FLAG_SUPP_MOVED_MIGR     = 0x00000002;

   const EXCHGID4_FLAG_BIND_PRINC_STATEID  = 0x00000100;

   const EXCHGID4_FLAG_USE_NON_PNFS        = 0x00010000;
   const EXCHGID4_FLAG_USE_PNFS_MDS        = 0x00020000;
   const EXCHGID4_FLAG_USE_PNFS_DS         = 0x00040000;

   const EXCHGID4_FLAG_MASK_PNFS           = 0x00070000;

   const EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000;
   const EXCHGID4_FLAG_CONFIRMED_R         = 0x80000000;

   struct state_protect_ops4 {
           bitmap4 spo_must_enforce;
           bitmap4 spo_must_allow;
   };

   struct ssv_sp_parms4 {
           state_protect_ops4      ssp_ops;
           sec_oid4                ssp_hash_algs<>;
           sec_oid4                ssp_encr_algs<>;
           uint32_t                ssp_window;
           uint32_t                ssp_num_gss_handles;
   };

   enum state_protect_how4 {
           SP4_NONE = 0,
           SP4_MACH_CRED = 1,
           SP4_SSV = 2
   };

   union state_protect4_a switch(state_protect_how4 spa_how) {
           case SP4_NONE:
                   void;
           case SP4_MACH_CRED:
                   state_protect_ops4      spa_mach_ops;
           case SP4_SSV:
                   ssv_sp_parms4           spa_ssv_parms;
   };

   struct EXCHANGE_ID4args {
           client_owner4           eia_clientowner;
           uint32_t                eia_flags;
           state_protect4_a        eia_state_protect;
           nfs_impl_id4            eia_client_impl_id<1>;
   };

7.1.2.
Updated Section 18.35.2 of [RFC5661] entitled "RESULT"

   struct ssv_prot_info4 {
           state_protect_ops4     spi_ops;
           uint32_t               spi_hash_alg;
           uint32_t               spi_encr_alg;
           uint32_t               spi_ssv_len;
           uint32_t               spi_window;
           gsshandle4_t           spi_handles<>;
   };

   union state_protect4_r switch(state_protect_how4 spr_how) {
           case SP4_NONE:
                   void;
           case SP4_MACH_CRED:
                   state_protect_ops4     spr_mach_ops;
           case SP4_SSV:
                   ssv_prot_info4         spr_ssv_info;
   };

   struct EXCHANGE_ID4resok {
           clientid4        eir_clientid;
           sequenceid4      eir_sequenceid;
           uint32_t         eir_flags;
           state_protect4_r eir_state_protect;
           server_owner4    eir_server_owner;
           opaque           eir_server_scope<NFS4_OPAQUE_LIMIT>;
           nfs_impl_id4     eir_server_impl_id<1>;
   };

   union EXCHANGE_ID4res switch (nfsstat4 eir_status) {
           case NFS4_OK:
                   EXCHANGE_ID4resok      eir_resok4;

           default:
                   void;
   };

7.1.3.  Updated Section 18.35.3 of [RFC5661] entitled "DESCRIPTION"

   The client uses the EXCHANGE_ID operation to register a particular
   client_owner with the server. However, when the client_owner has
   already been registered by other means (e.g., Transparent State
   Migration), the client may still use EXCHANGE_ID to obtain the client
   ID assigned previously.

   The client ID returned from this operation will be associated with
   the connection on which the EXCHANGE_ID is received and will serve as
   a parent object for sessions created by the client on this connection
   or to which the connection is bound. As a result of using those
   sessions to make requests involving the creation of state, that state
   will become associated with the client ID returned.

   In situations in which the registration of the client_owner has not
   occurred previously, the client ID must first be used, along with the
   returned eir_sequenceid, in creating an associated session using
   CREATE_SESSION.
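
   The client-side decision that follows a successful EXCHANGE_ID can be
   sketched as below. This is a minimal illustration, not a definitive
   implementation: the function name and the action labels are
   hypothetical, and only the flag value is taken from the XDR in the
   ARGUMENT section.

```python
from typing import Optional, Tuple

# Value taken from the XDR in the ARGUMENT section of this document.
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000


def next_step_after_exchange_id(
    eir_flags: int, eir_sequenceid: int
) -> Tuple[str, Optional[int]]:
    """Decide what a client does after a successful EXCHANGE_ID.

    Returns an (action, sequence_id) pair. The action strings are
    illustrative labels only; they are not defined by the protocol.
    """
    if eir_flags & EXCHGID4_FLAG_CONFIRMED_R:
        # The client_owner is already registered and confirmed, so no
        # confirming CREATE_SESSION is needed and eir_sequenceid is ignored.
        return ("already_confirmed", None)
    # First registration: confirm the client ID with a CREATE_SESSION
    # that uses the returned eir_sequenceid.
    return ("confirm_with_create_session", eir_sequenceid)
```

   A client that receives "already_confirmed" may still issue
   CREATE_SESSION for other reasons, such as needing an additional
   session.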

   If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the result field
   eir_flags, then it is an indication that the registration of the
   client_owner has already occurred and that a further CREATE_SESSION
   is not needed to confirm it. Of course, subsequent CREATE_SESSION
   operations may be needed for other reasons.

   The value eir_sequenceid is used to establish an initial sequence
   value associated with the client ID returned. In cases in which a
   CREATE_SESSION has already been done, sequencing of such requests
   has already been established, so the client has no need for this
   value and will ignore it.

   EXCHANGE_ID MAY be sent in a COMPOUND procedure that starts with
   SEQUENCE. However, when a client communicates with a server for the
   first time, it will not have a session, so using SEQUENCE will not be
   possible. If EXCHANGE_ID is sent without a preceding SEQUENCE, then
   it MUST be the only operation in the COMPOUND procedure's request.
   If it is not, the server MUST return NFS4ERR_NOT_ONLY_OP.

   The eia_clientowner field is composed of a co_verifier field and a
   co_ownerid string. As noted in Section 2.4 of [RFC5661], the
   co_ownerid describes the client, and the co_verifier is the
   incarnation of the client. An EXCHANGE_ID sent with a new
   incarnation of the client will lead to the server removing lock state
   of the old incarnation. In contrast, an EXCHANGE_ID sent with the
   current incarnation and co_ownerid will result in an error or an
   update of the client ID's properties, depending on the arguments to
   EXCHANGE_ID.

   A server MUST NOT provide the same client ID to two different
   incarnations of an eia_clientowner.

   In addition to the client ID and sequence ID, the server returns a
   server owner (eir_server_owner) and server scope (eir_server_scope).
   The former field is used in connection with network trunking as
   described in Section 2.10.5 of [RFC5661]. The latter field is used
   to allow clients to determine when client IDs sent by one server may
   be recognized by another in the event of file system migration (see
   Section 5.9.9 of the current document).

   The client ID returned by EXCHANGE_ID is only unique relative to the
   combination of eir_server_owner.so_major_id and eir_server_scope.
   Thus, if two servers return the same client ID, the onus is on the
   client to distinguish the client IDs on the basis of
   eir_server_owner.so_major_id and eir_server_scope. In the event two
   different servers claim matching eir_server_owner.so_major_id and
   eir_server_scope, the client can use the verification techniques
   discussed in Section 2.10.5 of [RFC5661] to determine if the servers
   are distinct. If they are distinct, then the client will need to
   note the destination network addresses of the connections used with
   each server and use the network address as the final discriminator.

   The server, as defined by the unique identity expressed in the
   so_major_id of the server owner and the server scope, needs to track
   several properties of each client ID it hands out. The properties
   apply to the client ID and all sessions associated with the client
   ID. The properties are derived from the arguments and results of
   EXCHANGE_ID. The client ID properties include:

   o  The capabilities expressed by the following bits, which come from
      the results of EXCHANGE_ID:

      *  EXCHGID4_FLAG_SUPP_MOVED_REFER

      *  EXCHGID4_FLAG_SUPP_MOVED_MIGR

      *  EXCHGID4_FLAG_BIND_PRINC_STATEID

      *  EXCHGID4_FLAG_USE_NON_PNFS

      *  EXCHGID4_FLAG_USE_PNFS_MDS

      *  EXCHGID4_FLAG_USE_PNFS_DS

      These properties may be updated by subsequent EXCHANGE_ID
      operations on confirmed client IDs, though the server MAY refuse
      to change them.

   o  The state protection method used, one of SP4_NONE, SP4_MACH_CRED,
      or SP4_SSV, as set by the spa_how field of the arguments to
      EXCHANGE_ID. Once the client ID is confirmed, this property
      cannot be updated by subsequent EXCHANGE_ID operations.

   o  For SP4_MACH_CRED or SP4_SSV state protection:

      *  The list of operations (spo_must_enforce) that MUST use the
         specified state protection. This list comes from the results
         of EXCHANGE_ID.

      *  The list of operations (spo_must_allow) that MAY use the
         specified state protection. This list comes from the results
         of EXCHANGE_ID.

      Once the client ID is confirmed, these properties cannot be
      updated by subsequent EXCHANGE_ID requests.

   o  For SP4_SSV protection:

      *  The OID of the hash algorithm. This property is represented by
         one of the algorithms in the ssp_hash_algs field of the
         EXCHANGE_ID arguments. Once the client ID is confirmed, this
         property cannot be updated by subsequent EXCHANGE_ID requests.

      *  The OID of the encryption algorithm. This property is
         represented by one of the algorithms in the ssp_encr_algs field
         of the EXCHANGE_ID arguments. Once the client ID is confirmed,
         this property cannot be updated by subsequent EXCHANGE_ID
         requests.

      *  The length of the SSV. This property is represented by the
         spi_ssv_len field in the EXCHANGE_ID results. Once the client
         ID is confirmed, this property cannot be updated by subsequent
         EXCHANGE_ID operations.

         There are REQUIRED and RECOMMENDED relationships among the
         length of the key of the encryption algorithm ("key length"),
         the length of the output of the hash algorithm ("hash length"),
         and the length of the SSV ("SSV length").

         +  key length MUST be <= hash length. This is because the keys
            used for the encryption algorithm are actually subkeys
            derived from the SSV, and the derivation is via the hash
            algorithm.
            The selection of an encryption algorithm with a
            key length that exceeded the length of the output of the
            hash algorithm would require padding, and thus weaken the
            use of the encryption algorithm.

         +  hash length SHOULD be <= SSV length. This is because the
            SSV is a key used to derive subkeys via an HMAC, and it is
            recommended that the key used as input to an HMAC be at
            least as long as the length of the HMAC's hash algorithm's
            output (see Section 3 of [RFC2104]).

         +  key length SHOULD be <= SSV length. This is a transitive
            result of the above two invariants.

         +  key length SHOULD be >= hash length / 2. This is because
            the subkey derivation is via an HMAC and it is recommended
            that if the HMAC has to be truncated, it should not be
            truncated to less than half the hash length (see Section 4
            of [RFC2104]).

      *  Number of concurrent versions of the SSV the client and server
         will support (see Section 2.10.9 of [RFC5661]). This property
         is represented by spi_window in the EXCHANGE_ID results. The
         property may be updated by subsequent EXCHANGE_ID operations.

   o  The client's implementation ID as represented by the
      eia_client_impl_id field of the arguments. The property may be
      updated by subsequent EXCHANGE_ID requests.

   o  The server's implementation ID as represented by the
      eir_server_impl_id field of the reply. The property may be
      updated by replies to subsequent EXCHANGE_ID requests.

   The eia_flags passed as part of the arguments and the eir_flags
   results allow the client and server to inform each other of their
   capabilities as well as indicate how the client ID will be used.
   Whether a bit is set or cleared on the arguments' flags does not
   force the server to set or clear the same bit on the results' side.
   Bits not defined above cannot be set in the eia_flags field.
   If they are, the server MUST reject the operation with NFS4ERR_INVAL.

   The EXCHGID4_FLAG_UPD_CONFIRMED_REC_A bit can only be set in
   eia_flags; it is always off in eir_flags. The
   EXCHGID4_FLAG_CONFIRMED_R bit can only be set in eir_flags; it is
   always off in eia_flags. If the server recognizes the co_ownerid and
   co_verifier as mapping to a confirmed client ID, it sets
   EXCHGID4_FLAG_CONFIRMED_R in eir_flags. The
   EXCHGID4_FLAG_CONFIRMED_R flag allows a client to tell if the client
   ID it is trying to create already exists and is confirmed.

   If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, this means
   that the client is attempting to update properties of an existing
   confirmed client ID (if the client wants to update properties of an
   unconfirmed client ID, it MUST NOT set
   EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). If so, it is RECOMMENDED that
   the client send the update EXCHANGE_ID operation in the same COMPOUND
   as a SEQUENCE so that the EXCHANGE_ID is executed exactly once.
   Whether the client can update the properties of the client ID depends
   on the state protection it selected when the client ID was created,
   and the principal and security flavor it used when sending the
   EXCHANGE_ID operation. The situations described in items 6, 7, 8, or
   9 of the second numbered list of Section 7.1.4 below will apply.
   Note that if the operation succeeds and returns a client ID that is
   already confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R
   bit in eir_flags.

   If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this
   means that the client is trying to establish a new client ID; it is
   attempting to trunk data communication to the server (see
   Section 2.10.5 of [RFC5661]); or it is attempting to update
   properties of an unconfirmed client ID.
   The situations described in
   items 1, 2, 3, 4, or 5 of the second numbered list of Section 7.1.4
   below will apply. Note that if the operation succeeds and returns a
   client ID that was previously confirmed, the server MUST set the
   EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags.

   When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client
   indicates that it is capable of dealing with an NFS4ERR_MOVED error
   as part of a referral sequence. When this bit is not set, it is
   still legal for the server to perform a referral sequence. However,
   a server may use the fact that the client is incapable of correctly
   responding to a referral by avoiding it for that particular client.
   It may, for instance, act as a proxy for that particular file system,
   at some cost in performance, although it is not obligated to do so.
   If the server will potentially perform a referral, it MUST set
   EXCHGID4_FLAG_SUPP_MOVED_REFER in eir_flags.

   When the EXCHGID4_FLAG_SUPP_MOVED_MIGR flag bit is set, the client
   indicates that it is capable of dealing with an NFS4ERR_MOVED error
   as part of a file system migration sequence. When this bit is not
   set, it is still legal for the server to indicate that a file system
   has moved, when this in fact happens. However, a server may use the
   fact that the client is incapable of correctly responding to a
   migration in its scheduling of file systems to migrate so as to avoid
   migration of file systems being actively used. It may also hide
   actual migrations from clients unable to deal with them by acting as
   a proxy for a migrated file system for particular clients, at some
   cost in performance, although it is not obligated to do so. If the
   server will potentially perform a migration, it MUST set
   EXCHGID4_FLAG_SUPP_MOVED_MIGR in eir_flags.
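
   The flag handling described above can be illustrated with a short
   server-side sketch. It is a simplified illustration under stated
   assumptions: the function names are hypothetical, the flag values come
   from the XDR in the ARGUMENT section, and the check covers only the
   rule that undefined bits are rejected with NFS4ERR_INVAL (numeric
   value 22 in NFSv4). How a server treats EXCHGID4_FLAG_CONFIRMED_R
   appearing in eia_flags is not modeled here.

```python
from typing import Tuple

NFS4_OK = 0
NFS4ERR_INVAL = 22

# Flag values from the EXCHANGE_ID ARGUMENT XDR.
EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001
EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002
EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100
EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000

# Union of every bit that the XDR defines.
DEFINED_FLAG_BITS = (
    EXCHGID4_FLAG_SUPP_MOVED_REFER
    | EXCHGID4_FLAG_SUPP_MOVED_MIGR
    | EXCHGID4_FLAG_BIND_PRINC_STATEID
    | EXCHGID4_FLAG_USE_NON_PNFS
    | EXCHGID4_FLAG_USE_PNFS_MDS
    | EXCHGID4_FLAG_USE_PNFS_DS
    | EXCHGID4_FLAG_UPD_CONFIRMED_REC_A
    | EXCHGID4_FLAG_CONFIRMED_R
)


def check_eia_flags(eia_flags: int) -> int:
    """Return NFS4ERR_INVAL if any undefined bit is set, else NFS4_OK."""
    if eia_flags & ~DEFINED_FLAG_BITS & 0xFFFFFFFF:
        return NFS4ERR_INVAL
    return NFS4_OK


def client_handles_moved(eia_flags: int) -> Tuple[bool, bool]:
    """Return (referral_capable, migration_capable) as the client declared."""
    return (
        bool(eia_flags & EXCHGID4_FLAG_SUPP_MOVED_REFER),
        bool(eia_flags & EXCHGID4_FLAG_SUPP_MOVED_MIGR),
    )
```

   A server could consult client_handles_moved() when deciding whether
   to proxy a file system for a client instead of returning
   NFS4ERR_MOVED, although, as noted above, it is not obligated to do so.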

   When EXCHGID4_FLAG_BIND_PRINC_STATEID is set, the client indicates
   that it wants the server to bind the stateid to the principal. This
   means that when a principal creates a stateid, it has to be the one
   to use the stateid. If the server will perform binding, it will
   return EXCHGID4_FLAG_BIND_PRINC_STATEID. The server MAY return
   EXCHGID4_FLAG_BIND_PRINC_STATEID even if the client does not request
   it. If an update to the client ID changes the value of
   EXCHGID4_FLAG_BIND_PRINC_STATEID's client ID property, the effect
   applies only to new stateids. Existing stateids (and all stateids
   with the same "other" field) that were created with stateid to
   principal binding in force will continue to have binding in force.
   Existing stateids (and all stateids with the same "other" field) that
   were created with stateid to principal binding not in force will
   continue to have binding not in force.

   The EXCHGID4_FLAG_USE_NON_PNFS, EXCHGID4_FLAG_USE_PNFS_MDS, and
   EXCHGID4_FLAG_USE_PNFS_DS bits are described in Section 13.1 of
   [RFC5661] and convey roles the client ID is to be used for in a pNFS
   environment. The server MUST set one of the acceptable combinations
   of these bits (roles) in eir_flags, as specified in that section.
   Note that the same client owner/server owner pair can have multiple
   roles. Multiple roles can be associated with the same client ID or
   with different client IDs. Thus, if a client sends EXCHANGE_ID from
   the same client owner to the same server owner multiple times, but
   specifies different pNFS roles each time, the server might return
   different client IDs. Given that different pNFS roles might have
   different client IDs, the client may ask for different properties for
   each role/client ID.
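
   The rule that a change to the EXCHGID4_FLAG_BIND_PRINC_STATEID
   property affects only new stateids can be modeled by recording the
   binding state on each stateid when it is created. The sketch below is
   one possible way to express that; the class and function names are
   hypothetical and the stateid is reduced to the fields needed for the
   illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ClientIdRecord:
    # Current value of the BIND_PRINC_STATEID client ID property.
    bind_princ_stateid: bool


@dataclass(frozen=True)
class Stateid:
    other: int       # stands in for the stateid's "other" field
    principal: str   # principal that created the stateid
    bound: bool      # binding state captured at creation time


_next_other = itertools.count(1)


def create_stateid(client: ClientIdRecord, principal: str) -> Stateid:
    """Create a stateid, snapshotting the client's binding property."""
    return Stateid(next(_next_other), principal, client.bind_princ_stateid)


def may_use(stateid: Stateid, principal: str) -> bool:
    """A bound stateid may be used only by the principal that created it."""
    return (not stateid.bound) or stateid.principal == principal
```

   Because the binding state is stored on the stateid, a later update of
   ClientIdRecord.bind_princ_stateid leaves existing stateids unchanged.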

   The spa_how field of the eia_state_protect field specifies how the
   client wants to protect its client, locking, and session states from
   unauthorized changes (Section 2.10.8.3 of [RFC5661]):

   o  SP4_NONE. The client does not request the NFSv4.1 server to
      enforce state protection. The NFSv4.1 server MUST NOT enforce
      state protection for the returned client ID.

   o  SP4_MACH_CRED. If spa_how is SP4_MACH_CRED, then the client MUST
      send the EXCHANGE_ID operation with RPCSEC_GSS as the security
      flavor, and with a service of RPC_GSS_SVC_INTEGRITY or
      RPC_GSS_SVC_PRIVACY. If SP4_MACH_CRED is specified, then the
      client wants to use an RPCSEC_GSS-based machine credential to
      protect its state. The server MUST note the principal the
      EXCHANGE_ID operation was sent with, and the GSS mechanism used.
      These notes collectively comprise the machine credential.

      After the client ID is confirmed, as long as the lease associated
      with the client ID is unexpired, a subsequent EXCHANGE_ID
      operation that uses the same eia_clientowner.co_ownerid as the
      first EXCHANGE_ID MUST also use the same machine credential as the
      first EXCHANGE_ID. The server returns the same client ID for the
      subsequent EXCHANGE_ID as that returned from the first
      EXCHANGE_ID.

   o  SP4_SSV. If spa_how is SP4_SSV, then the client MUST send the
      EXCHANGE_ID operation with RPCSEC_GSS as the security flavor, and
      with a service of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY.
      If SP4_SSV is specified, then the client wants to use the SSV to
      protect its state. The server records the credential used in the
      request as the machine credential (as defined above) for the
      eia_clientowner.co_ownerid. The CREATE_SESSION operation that
      confirms the client ID MUST use the same machine credential.
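
   The security requirements attached to each spa_how value can be
   sketched as two small checks. This is an illustrative model only: the
   security flavor and service are represented as strings rather than
   wire values, the function and type names are hypothetical, and the
   error a server returns when a check fails is not modeled.

```python
from typing import NamedTuple

# Values from the state_protect_how4 enum in the ARGUMENT XDR.
SP4_NONE, SP4_MACH_CRED, SP4_SSV = 0, 1, 2

PROTECTED_SERVICES = ("RPC_GSS_SVC_INTEGRITY", "RPC_GSS_SVC_PRIVACY")


class MachineCredential(NamedTuple):
    """The principal and GSS mechanism noted by the server."""
    principal: str
    gss_mechanism: str


def request_meets_spa_how(spa_how: int, flavor: str, service: str) -> bool:
    """Check the flavor/service an EXCHANGE_ID was sent with."""
    if spa_how == SP4_NONE:
        return True  # no state protection requested
    # SP4_MACH_CRED and SP4_SSV both require RPCSEC_GSS with
    # integrity or privacy.
    return flavor == "RPCSEC_GSS" and service in PROTECTED_SERVICES


def same_machine_credential(
    recorded: MachineCredential, principal: str, gss_mechanism: str
) -> bool:
    """Compare a later request's credential with the recorded one."""
    return recorded == MachineCredential(principal, gss_mechanism)
```

   A server would apply same_machine_credential() to a subsequent
   EXCHANGE_ID for the same co_ownerid, and to the CREATE_SESSION that
   confirms a client ID created with SP4_SSV.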

   When a client specifies SP4_MACH_CRED or SP4_SSV, it also provides
   two lists of operations (each expressed as a bitmap). The first list
   is spo_must_enforce and consists of those operations the client MUST
   send (subject to the server confirming the list of operations in the
   result of EXCHANGE_ID) with the machine credential (if SP4_MACH_CRED
   protection is specified) or the SSV-based credential (if SP4_SSV
   protection is used). The client MUST send the operations with
   RPCSEC_GSS credentials that specify the RPC_GSS_SVC_INTEGRITY or
   RPC_GSS_SVC_PRIVACY security service. Typically, the first list of
   operations includes EXCHANGE_ID, CREATE_SESSION, DELEGPURGE,
   DESTROY_SESSION, BIND_CONN_TO_SESSION, and DESTROY_CLIENTID. The
   client SHOULD NOT specify in this list any operations that require a
   filehandle because the server's access policies MAY conflict with the
   client's choice, and thus the client would then be unable to access a
   subset of the server's namespace.

   Note that if SP4_SSV protection is specified, and the client
   indicates that CREATE_SESSION must be protected with SP4_SSV, because
   the SSV cannot exist without a confirmed client ID, the first
   CREATE_SESSION MUST instead be sent using the machine credential, and
   the server MUST accept the machine credential.

   There is a corresponding result, also called spo_must_enforce, of the
   operations for which the server will require SP4_MACH_CRED or SP4_SSV
   protection. Normally, the server's result equals the client's
   argument, but the result MAY be different. If the client requests
   one or more operations in the set { EXCHANGE_ID, CREATE_SESSION,
   DELEGPURGE, DESTROY_SESSION, BIND_CONN_TO_SESSION, DESTROY_CLIENTID
   }, then the result spo_must_enforce MUST include the operations the
   client requested from that set.
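
   The constraint on the spo_must_enforce result can be expressed with
   set operations. The sketch below is a simplified model: operations
   are represented by name rather than as bitmap4 bits, and the function
   names and the also_accepted parameter (a server's policy for
   operations outside the listed set) are hypothetical.

```python
from typing import FrozenSet, Iterable

# Operations that the result MUST include whenever the client requests them.
CORE_STATE_OPS: FrozenSet[str] = frozenset({
    "EXCHANGE_ID", "CREATE_SESSION", "DELEGPURGE",
    "DESTROY_SESSION", "BIND_CONN_TO_SESSION", "DESTROY_CLIENTID",
})


def server_must_enforce(
    requested: Iterable[str], also_accepted: Iterable[str] = ()
) -> FrozenSet[str]:
    """Build a conformant spo_must_enforce result.

    Requested operations from CORE_STATE_OPS are always kept; other
    requested operations are kept only if the server's policy
    (also_accepted) agrees to enforce protection for them.
    """
    req = frozenset(requested)
    return (req & CORE_STATE_OPS) | (req & frozenset(also_accepted))


def result_is_conformant(
    requested: Iterable[str], result: Iterable[str]
) -> bool:
    """Check the MUST: every requested core operation is in the result."""
    return (frozenset(requested) & CORE_STATE_OPS) <= frozenset(result)
```

   For example, a request for EXCHANGE_ID, CREATE_SESSION, and a
   filehandle-based operation such as OPEN yields a result containing
   only the first two unless the server's policy also accepts OPEN.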

   If spo_must_enforce in the results has BIND_CONN_TO_SESSION set, then
   connection binding enforcement is enabled, and the client MUST use
   the machine (if SP4_MACH_CRED protection is used) or SSV (if SP4_SSV
   protection is used) credential on calls to BIND_CONN_TO_SESSION.

   The second list is spo_must_allow and consists of those operations
   the client wants to have the option of sending with the machine
   credential or the SSV-based credential, even if the object the
   operations are performed on is not owned by the machine or SSV
   credential.

   The corresponding result, also called spo_must_allow, consists of the
   operations the server will allow the client to use SP4_SSV or
   SP4_MACH_CRED credentials with. Normally, the server's result equals
   the client's argument, but the result MAY be different.

   The purpose of spo_must_allow is to allow clients to solve the
   following conundrum. Suppose the client ID is confirmed with
   EXCHGID4_FLAG_BIND_PRINC_STATEID, and it calls OPEN with the
   RPCSEC_GSS credentials of a normal user. Now suppose the user's
   credentials expire, and cannot be renewed (e.g., a Kerberos ticket
   granting ticket expires, and the user has logged off and will not be
   acquiring a new ticket granting ticket). The client will be unable
   to send CLOSE without the user's credentials, which is to say the
   client has to either leave the state on the server or re-send
   EXCHANGE_ID with a new verifier to clear all state, that is, unless
   the client includes CLOSE on the list of operations in spo_must_allow
   and the server agrees.

   The SP4_SSV protection parameters also have:

   ssp_hash_algs:

      This is the set of algorithms the client supports for the purpose
      of computing the digests needed for the internal SSV GSS mechanism
      and for the SET_SSV operation. Each algorithm is specified as an
      object identifier (OID).
      The REQUIRED algorithms for a server are
      id-sha1, id-sha224, id-sha256, id-sha384, and id-sha512 [RFC4055].
      The algorithm the server selects among the set is indicated in
      spi_hash_alg, a field of spr_ssv_info. The field
      spi_hash_alg is an index into the array ssp_hash_algs. If the
      server does not support any of the offered algorithms, it returns
      NFS4ERR_HASH_ALG_UNSUPP. If ssp_hash_algs is empty, the server
      MUST return NFS4ERR_INVAL.

   ssp_encr_algs:

      This is the set of algorithms the client supports for the purpose
      of providing privacy protection for the internal SSV GSS
      mechanism. Each algorithm is specified as an OID. The REQUIRED
      algorithm for a server is id-aes256-CBC. The RECOMMENDED
      algorithms are id-aes192-CBC and id-aes128-CBC [CSOR_AES]. The
      selected algorithm is returned in spi_encr_alg, an index into
      ssp_encr_algs. If the server does not support any of the offered
      algorithms, it returns NFS4ERR_ENCR_ALG_UNSUPP. If ssp_encr_algs
      is empty, the server MUST return NFS4ERR_INVAL. Note that due to
      previously stated requirements and recommendations on the
      relationships between key length and hash length, some
      combinations of RECOMMENDED and REQUIRED encryption algorithm and
      hash algorithm either SHOULD NOT or MUST NOT be used. Table 2
      summarizes the illegal and discouraged combinations.

   ssp_window:

      This is the number of SSV versions the client wants the server to
      maintain (i.e., each successful call to SET_SSV produces a new
      version of the SSV). If ssp_window is zero, the server MUST
      return NFS4ERR_INVAL. The server responds with spi_window, which
      MUST NOT exceed ssp_window and MUST be at least one.
      Any requests
      on the backchannel or fore channel that are using a version of the
      SSV that is outside the window will fail with an ONC RPC
      authentication error, and the requester will have to retry them
      with the same slot ID and sequence ID.

   ssp_num_gss_handles:

      This is the number of RPCSEC_GSS handles the server should create
      that are based on the GSS SSV mechanism (see Section 2.10.9 of
      [RFC5661]). It is not the total number of RPCSEC_GSS handles for
      the client ID. Indeed, subsequent calls to EXCHANGE_ID will add
      RPCSEC_GSS handles. The server responds with a list of handles in
      spi_handles. If the client asks for at least one handle and the
      server cannot create it, the server MUST return an error. The
      handles in spi_handles are not available for use until the client
      ID is confirmed, which could be immediately if EXCHANGE_ID returns
      EXCHGID4_FLAG_CONFIRMED_R, or upon successful confirmation from
      CREATE_SESSION.

      While a client ID can span all the connections that are connected
      to a server sharing the same eir_server_owner.so_major_id, the
      RPCSEC_GSS handles returned in spi_handles can only be used on
      connections connected to a server that returns the same
      eir_server_owner.so_major_id and eir_server_owner.so_minor_id on
      each connection. It is permissible for the client to set
      ssp_num_gss_handles to zero; the client can create more handles
      with another EXCHANGE_ID call.

      Because each SSV RPCSEC_GSS handle shares a common SSV GSS
      context, there are security considerations specific to this
      situation discussed in Section 2.10.10 of [RFC5661].

      The seq_window (see Section 5.2.3.1 of [RFC2203]) of each
      RPCSEC_GSS handle in spi_handles MUST be the same as the
      seq_window of the RPCSEC_GSS handle used for the credential of the
      RPC request that the EXCHANGE_ID operation was sent as a part of.
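
   The combinations listed in Table 2 follow from two of the key length
   rules given earlier in this section: key length MUST be <= hash
   length, and key length SHOULD be >= hash length / 2. The sketch below
   re-derives the table from those rules as a cross-check. The byte
   lengths are the standard AES key sizes and SHA output sizes, which
   this document does not state explicitly; the function name and result
   labels are hypothetical.

```python
# Key lengths in bytes for the AES-CBC algorithms named in this section.
KEY_LEN = {"id-aes128-CBC": 16, "id-aes192-CBC": 24, "id-aes256-CBC": 32}

# Hash output lengths in bytes for the REQUIRED hash algorithms.
HASH_LEN = {
    "id-sha1": 20, "id-sha224": 28, "id-sha256": 32,
    "id-sha384": 48, "id-sha512": 64,
}


def classify(encr_alg: str, hash_alg: str) -> str:
    """Classify a combination as 'MUST NOT', 'SHOULD NOT', or 'ok'."""
    key, digest = KEY_LEN[encr_alg], HASH_LEN[hash_alg]
    if key > digest:
        return "MUST NOT"      # violates: key length MUST be <= hash length
    if 2 * key < digest:
        return "SHOULD NOT"    # violates: key length SHOULD be >= hash length / 2
    return "ok"


def combinations(encr_alg: str, verdict: str) -> list:
    """List the hash algorithms that yield the given verdict."""
    return [h for h in HASH_LEN if classify(encr_alg, h) == verdict]
```

   Each row of Table 2 corresponds to combinations(alg, "MUST NOT") and
   combinations(alg, "SHOULD NOT") for the encryption algorithm in that
   row.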

   +-------------------+----------------------+------------------------+
   | Encryption        | MUST NOT be combined | SHOULD NOT be combined |
   | Algorithm         | with                 | with                   |
   +-------------------+----------------------+------------------------+
   | id-aes128-CBC     |                      | id-sha384, id-sha512   |
   | id-aes192-CBC     | id-sha1              | id-sha512              |
   | id-aes256-CBC     | id-sha1, id-sha224   |                        |
   +-------------------+----------------------+------------------------+

                                  Table 2

   The arguments include an array of up to one element in length called
   eia_client_impl_id. If eia_client_impl_id is present, it contains
   the information identifying the implementation of the client.
   Similarly, the results include an array of up to one element in
   length called eir_server_impl_id that identifies the implementation
   of the server. Servers MUST accept a zero-length eia_client_impl_id
   array, and clients MUST accept a zero-length eir_server_impl_id
   array.

   A possible use for implementation identifiers would be in diagnostic
   software that extracts this information in an attempt to identify
   interoperability problems, performance workload behaviors, or general
   usage statistics. Since the intent of having access to this
   information is for planning or general diagnosis only, the client and
   server MUST NOT interpret this implementation identity information in
   a way that affects how the implementation behaves in interacting with
   its peer. The client and server are not allowed to depend on the
   peer's manifesting a particular allowed behavior based on an
   implementation identifier but are required to interoperate as
   specified elsewhere in the protocol specification.
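
   Returning to Table 2 above, its contents can be encoded as a simple
   lookup. The sketch below is illustrative only: the function name
   classify_combination and the returned labels are hypothetical, while
   the algorithm names and the combinations are taken directly from the
   table.

```python
# Hash algorithms that MUST NOT be combined with each encryption algorithm.
MUST_NOT = {
    "id-aes128-CBC": set(),
    "id-aes192-CBC": {"id-sha1"},
    "id-aes256-CBC": {"id-sha1", "id-sha224"},
}

# Hash algorithms that SHOULD NOT be combined with each encryption algorithm.
SHOULD_NOT = {
    "id-aes128-CBC": {"id-sha384", "id-sha512"},
    "id-aes192-CBC": {"id-sha512"},
    "id-aes256-CBC": set(),
}


def classify_combination(encr_alg, hash_alg):
    """Classify an (encryption, hash) pair according to Table 2."""
    if hash_alg in MUST_NOT[encr_alg]:
        return "MUST NOT"
    if hash_alg in SHOULD_NOT[encr_alg]:
        return "SHOULD NOT"
    return "allowed"
```

   For example, classify_combination("id-aes256-CBC", "id-sha224")
   yields "MUST NOT", while the same hash with id-aes128-CBC is
   allowed.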

   Because it is possible that some implementations might violate the
   protocol specification and interpret the identity information,
   implementations MUST provide facilities to allow the NFSv4 client and
   server to be configured to set the contents of the nfs_impl_id
   structures sent to any specified value.

7.1.4. Updated Section 18.35.4 of [RFC5661] entitled "IMPLEMENTATION"

   A server's client record is a 5-tuple:

   1. co_ownerid:

      The client identifier string, from the eia_clientowner
      structure of the EXCHANGE_ID4args structure.

   2. co_verifier:

      A client-specific value used to indicate incarnations (where a
      client restart represents a new incarnation), from the
      eia_clientowner structure of the EXCHANGE_ID4args structure.

   3. principal:

      The principal that was defined in the RPC header's credential
      and/or verifier at the time the client record was established.

   4. client ID:

      The shorthand client identifier, generated by the server and
      returned via the eir_clientid field in the EXCHANGE_ID4resok
      structure.

   5. confirmed:

      A private field on the server indicating whether or not a
      client record has been confirmed. A client record is
      confirmed if there has been a successful CREATE_SESSION
      operation to confirm it. Otherwise, it is unconfirmed. An
      unconfirmed record is established by an EXCHANGE_ID call. Any
      unconfirmed record that is not confirmed within a lease period
      SHOULD be removed.

   The following identifiers represent special values for the fields in
   the records.

   ownerid_arg:

      The value of the eia_clientowner.co_ownerid subfield of the
      EXCHANGE_ID4args structure of the current request.

   verifier_arg:

      The value of the eia_clientowner.co_verifier subfield of the
      EXCHANGE_ID4args structure of the current request.

   old_verifier_arg:

      A value of the eia_clientowner.co_verifier field of a client
      record received in a previous request; this is distinct from
      verifier_arg.

   principal_arg:

      The value of the RPCSEC_GSS principal for the current request.

   old_principal_arg:

      A value of the principal of a client record as defined by the RPC
      header's credential or verifier of a previous request. This is
      distinct from principal_arg.

   clientid_ret:

      The value of the eir_clientid field the server will return in the
      EXCHANGE_ID4resok structure for the current request.

   old_clientid_ret:

      The value of the eir_clientid field the server returned in the
      EXCHANGE_ID4resok structure for a previous request. This is
      distinct from clientid_ret.

   confirmed:

      The client ID has been confirmed.

   unconfirmed:

      The client ID has not been confirmed.

   Since EXCHANGE_ID is a non-idempotent operation, we must consider the
   possibility that retries occur as a result of a client restart,
   network partition, malfunctioning router, etc. Retries are
   identified by the value of the eia_clientowner field of
   EXCHANGE_ID4args, and the method for dealing with them is outlined in
   the scenarios below.

   The scenarios are described in terms of the client record(s) a server
   has for a given co_ownerid. Note that if the client ID was created
   specifying SP4_SSV state protection and EXCHANGE_ID as one of the
   operations in spo_must_allow, then the server MUST authorize
   EXCHANGE_IDs with the SSV principal in addition to the principal that
   created the client ID.

   1. New Owner ID

      If the server has no client records with
      eia_clientowner.co_ownerid matching ownerid_arg, and
      EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in the
      EXCHANGE_ID, then a new shorthand client ID (let us call it
      clientid_ret) is generated, and the following unconfirmed
      record is added to the server's state.

      { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
      unconfirmed }

      Subsequently, the server returns clientid_ret.

   2. Non-Update on Existing Client ID

      If the server has the following confirmed record, and the
      request does not have EXCHGID4_FLAG_UPD_CONFIRMED_REC_A set,
      then the request is the result of a retried request due to a
      faulty router or lost connection, or the client is trying to
      determine if it can perform trunking.

      { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
      confirmed }

      Since the record has been confirmed, the client must have
      received the server's reply from the initial EXCHANGE_ID
      request. Since the server has a confirmed record, and since
      EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, with the
      possible exception of eir_server_owner.so_minor_id, the server
      returns the same result it did when the client ID's properties
      were last updated (or if never updated, the result when the
      client ID was created). The confirmed record is unchanged.

   3. Client Collision

      If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the
      server has the following confirmed record, then this request
      is likely the result of a chance collision between the values
      of the eia_clientowner.co_ownerid subfield of EXCHANGE_ID4args
      for two different clients.

      { ownerid_arg, *, old_principal_arg, old_clientid_ret,
      confirmed }

      If there is currently no state associated with
      old_clientid_ret, or if there is state but the lease has
      expired, then this case is effectively equivalent to the New
      Owner ID case of Paragraph 1. The confirmed record is
      deleted, the old_clientid_ret and its lock state are deleted,
      a new shorthand client ID is generated, and the following
      unconfirmed record is added to the server's state.

      { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
      unconfirmed }

      Subsequently, the server returns clientid_ret.

      If old_clientid_ret has an unexpired lease with state, then no
      state of old_clientid_ret is changed or deleted. The server
      returns NFS4ERR_CLID_INUSE to indicate that the client should
      retry with a different value for the
      eia_clientowner.co_ownerid subfield of EXCHANGE_ID4args. The
      client record is not changed.

   4. Replacement of Unconfirmed Record

      If the EXCHGID4_FLAG_UPD_CONFIRMED_REC_A flag is not set, and
      the server has the following unconfirmed record, then the
      client is attempting EXCHANGE_ID again on an unconfirmed
      client ID, perhaps due to a retry, a client restart before
      client ID confirmation (i.e., before CREATE_SESSION was
      called), or some other reason.

      { ownerid_arg, *, *, old_clientid_ret, unconfirmed }

      It is possible that the properties of old_clientid_ret are
      different from those specified in the current EXCHANGE_ID.
      Whether or not the properties are being updated, to eliminate
      ambiguity, the server deletes the unconfirmed record,
      generates a new client ID (clientid_ret), and establishes the
      following unconfirmed record:

      { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
      unconfirmed }

   5. Client Restart

      If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the
      server has the following confirmed client record, then this
      request is likely from a previously confirmed client that has
      restarted.

      { ownerid_arg, old_verifier_arg, principal_arg,
      old_clientid_ret, confirmed }

      Since the previous incarnation of the same client will no
      longer be making requests, once the new client ID is confirmed
      by CREATE_SESSION, byte-range locks and share reservations
      should be released immediately rather than forcing the new
      incarnation to wait for the lease time on the previous
      incarnation to expire. Furthermore, session state should be
      removed since if the client had maintained that information
      across restart, this request would not have been sent. If the
      server supports neither the CLAIM_DELEGATE_PREV nor
      CLAIM_DELEG_PREV_FH claim types, associated delegations should
      be purged as well; otherwise, delegations are retained and
      recovery proceeds according to Section 10.2.1 of [RFC5661].

      After processing, clientid_ret is returned to the client and
      this client record is added:

      { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
      unconfirmed }

      The previously described confirmed record continues to exist,
      and thus the same ownerid_arg exists in both a confirmed and
      unconfirmed state at the same time. The number of states can
      collapse to one once the server receives an applicable
      CREATE_SESSION or EXCHANGE_ID.

      + If the server subsequently receives a successful
        CREATE_SESSION that confirms clientid_ret, then the server
        atomically destroys the confirmed record and makes the
        unconfirmed record confirmed as described in Section
        18.36.3 of [RFC5661].

      + If the server instead subsequently receives an EXCHANGE_ID
        with the client owner equal to ownerid_arg, one strategy is
        to simply delete the unconfirmed record, and process the
        EXCHANGE_ID as described in the entirety of Section 7.1.4.

   6. Update

      If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
      has the following confirmed record, then this request is an
      attempt at an update.

      { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
      confirmed }

      Since the record has been confirmed, the client must have
      received the server's reply from the initial EXCHANGE_ID
      request. The server allows the update, and the client record
      is left intact.

   7. Update but No Confirmed Record

      If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
      has no confirmed record corresponding to ownerid_arg, then the
      server returns NFS4ERR_NOENT and leaves any unconfirmed record
      intact.

   8. Update but Wrong Verifier

      If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
      has the following confirmed record, then this request is an
      illegal attempt at an update, perhaps because of a retry from
      a previous client incarnation.

      { ownerid_arg, old_verifier_arg, *, clientid_ret, confirmed }

      The server returns NFS4ERR_NOT_SAME and leaves the client
      record intact.

   9. Update but Wrong Principal

      If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
      has the following confirmed record, then this request is an
      illegal attempt at an update by an unauthorized principal.

      { ownerid_arg, verifier_arg, old_principal_arg, clientid_ret,
      confirmed }

      The server returns NFS4ERR_PERM and leaves the client record
      intact.

7.2. Updated Section 18.51 of [RFC5661] entitled "Operation 58:
     RECLAIM_COMPLETE - Indicates Reclaims Finished"

7.2.1. Updated Section 18.51.1 of [RFC5661] entitled "ARGUMENT"

   struct RECLAIM_COMPLETE4args {
           /*
            * If rca_one_fs TRUE,
            *
            *    CURRENT_FH: object in
            *    file system reclaim is
            *    complete for.
            */
           bool            rca_one_fs;
   };

7.2.2. Updated Section 18.51.2 of [RFC5661] entitled "RESULTS"

   struct RECLAIM_COMPLETE4res {
           nfsstat4        rcr_status;
   };

7.2.3. Updated Section 18.51.3 of [RFC5661] entitled "DESCRIPTION"

   A RECLAIM_COMPLETE operation is used to indicate that the client has
   reclaimed all of the locking state that it will recover using
   reclaim, when it is recovering state due to either a server restart
   or the migration of a file system to another server. There are two
   types of RECLAIM_COMPLETE operations:

   o When rca_one_fs is FALSE, a global RECLAIM_COMPLETE is being done.
     This indicates that recovery of all locks that the client held on
     the previous server instance has been completed. The current
     filehandle need not be set in this case.

   o When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE
     is being done. This indicates that recovery of locks for a single
     fs (the one designated by the current filehandle) due to the
     migration of the file system has been completed. Presence of a
     current filehandle is required when rca_one_fs is set to TRUE.
     When the current filehandle designates a filehandle in a file
     system not in the process of migration, the operation returns
     NFS4_OK and is otherwise ignored.

   Once a RECLAIM_COMPLETE is done, there can be no further reclaim
   operations for locks whose scope is defined as having completed
   recovery. Once the client sends RECLAIM_COMPLETE, the server will
   not allow the client to do subsequent reclaims of locking state for
   that scope and, if these are attempted, will return NFS4ERR_NO_GRACE.
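
   The two scopes described above can be modeled with a small amount of
   per-client bookkeeping. The following is a minimal sketch, not a
   server implementation: the class ReclaimState, its method names, and
   the use of NFS4ERR_NOFILEHANDLE for a missing current filehandle are
   assumptions made for illustration, as is the choice to treat a file
   system in a per-fs grace period as governed only by its own
   RECLAIM_COMPLETE. Repeated RECLAIM_COMPLETE calls are not modeled.

```python
class ReclaimState:
    """Hypothetical per-client record of which reclaim scopes are closed."""

    def __init__(self, migrating_fsids):
        # File systems currently in a per-fs grace period (assumed input).
        self.migrating = set(migrating_fsids)
        self.global_done = False
        self.fs_done = set()

    def reclaim_complete(self, rca_one_fs, current_fsid=None):
        if not rca_one_fs:
            # Global form: the current filehandle need not be set.
            self.global_done = True
            return "NFS4_OK"
        if current_fsid is None:
            # A current filehandle is required; this error choice is assumed.
            return "NFS4ERR_NOFILEHANDLE"
        if current_fsid in self.migrating:
            self.fs_done.add(current_fsid)
        # Not in migration: return NFS4_OK and otherwise ignore the request.
        return "NFS4_OK"

    def reclaim_lock(self, fsid):
        # Pick the scope that covers this file system, then check whether
        # recovery for that scope has already been declared complete.
        if fsid in self.migrating:
            closed = fsid in self.fs_done
        else:
            closed = self.global_done
        return "NFS4ERR_NO_GRACE" if closed else "NFS4_OK"
```

   In this model, a per-fs RECLAIM_COMPLETE closes reclaim only for the
   designated file system, while the global form closes it for file
   systems that are not in a per-fs grace period.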

   Whenever a client establishes a new client ID and before it does the
   first non-reclaim operation that obtains a lock, it MUST send a
   RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no
   locks to reclaim. If non-reclaim locking operations are done before
   the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned.

   Similarly, when the client accesses a migrated file system on a new
   server, before it sends the first non-reclaim operation that obtains
   a lock on this new server, it MUST send a RECLAIM_COMPLETE with
   rca_one_fs set to TRUE and current filehandle within that file
   system, even if there are no locks to reclaim. If non-reclaim
   locking operations are done on that file system before the
   RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned.

   It should be noted that there are situations in which a client needs
   to issue both forms of RECLAIM_COMPLETE. An example is an instance
   of file system migration in which the file system is migrated to a
   server for which the client has no clientid. As a result, the client
   needs to obtain a clientid from the server (incurring the
   responsibility to do RECLAIM_COMPLETE with rca_one_fs set to FALSE)
   as well as RECLAIM_COMPLETE with rca_one_fs set to TRUE to complete
   the per-fs grace period associated with the file system migration.
   These two may be done in any order as long as all necessary lock
   reclaims have been done before issuing either of them.

   Any locks not reclaimed at the point at which RECLAIM_COMPLETE is
   done become non-reclaimable. The client MUST NOT attempt to reclaim
   them, either during the current server instance or in any subsequent
   server instance, or on another server to which responsibility for
   that file system is transferred.
   If the client were to do so, it
   would be violating the protocol by representing itself as owning
   locks that it does not own, and so has no right to reclaim. See
   Section 8.4.3 of [RFC5661] for a discussion of edge conditions
   related to lock reclaim.

   By sending a RECLAIM_COMPLETE, the client indicates readiness to
   proceed to do normal non-reclaim locking operations. The client
   should be aware that such operations may temporarily result in
   NFS4ERR_GRACE errors until the server is ready to terminate its grace
   period.

7.2.4. Updated Section 18.51.4 of [RFC5661] entitled "IMPLEMENTATION"

   Servers will typically use the information as to when reclaim
   activity is complete to reduce the length of the grace period. When
   the server maintains in persistent storage a list of clients that
   might have had locks, it is able to use the fact that all such
   clients have done a RECLAIM_COMPLETE to terminate the grace period
   and begin normal operations (i.e., grant requests for new locks)
   sooner than it might otherwise.

   Latency can be minimized by doing a RECLAIM_COMPLETE as part of the
   COMPOUND request in which the last lock-reclaiming operation is done.
   When there are no reclaims to be done, RECLAIM_COMPLETE should be
   done immediately in order to allow the grace period to end as soon as
   possible.

   RECLAIM_COMPLETE should only be done once for each server instance or
   occasion of the transition of a file system. If it is done a second
   time, the error NFS4ERR_COMPLETE_ALREADY will result. Note that
   because of the session feature's retry protection, retries of
   COMPOUND requests containing a RECLAIM_COMPLETE operation will not
   result in this error.

   When a RECLAIM_COMPLETE is sent, the client effectively acknowledges
   any locks not yet reclaimed as lost.
   This allows the server to re-enable the client to recover locks if
   the occurrence of edge conditions, as described in Section 8.4.3 of
   [RFC5661], had caused the server to disable the client's ability to
   recover locks.

   Because previous descriptions of RECLAIM_COMPLETE were not
   sufficiently explicit about the circumstances in which use of
   RECLAIM_COMPLETE with rca_one_fs set to TRUE was appropriate, there
   have been cases in which it has been misused by clients, and cases in
   which servers have, in various ways, not responded to such misuse as
   described above. While clients SHOULD NOT misuse this feature and
   servers SHOULD respond to such misuse as described above,
   implementers need to be aware of the following considerations as they
   make necessary tradeoffs between interoperability with existing
   implementations and proper support for facilities to allow lock
   recovery in the event of file system migration.

   o When servers have no support for becoming the destination server
     of a file system subject to migration, there is no possibility of
     a per-fs RECLAIM_COMPLETE being done legitimately and occurrences
     of it SHOULD be ignored. However, the negative consequences of
     accepting such mistaken use are quite limited as long as the
     client does not issue it before all necessary reclaims are done.

   o When a server might become the destination for a file system being
     migrated, inappropriate use of per-fs RECLAIM_COMPLETE is more
     concerning. In the case in which the file system designated is
     not within a per-fs grace period, the per-fs RECLAIM_COMPLETE
     SHOULD be ignored, with the negative consequences of accepting it
     being limited, as in the case in which migration is not supported.
     However, if the server encounters a file system undergoing
     migration, the operation cannot be accepted as if it were a global
     RECLAIM_COMPLETE without invalidating its intended use.

8. Security Considerations

   The Security Considerations section of [RFC5661] needs the additions
   below to properly address some aspects of trunking discovery,
   referral, migration, and replication.

      The possibility that requests to determine the set of network
      addresses corresponding to a given server might be interfered with
      or have their responses modified in flight needs to be taken into
      account. In light of this, the following considerations should be
      taken note of:

      o When DNS is used to convert server names to addresses and
        DNSSEC [RFC4033] is not available, the validity of the network
        addresses returned cannot be relied upon. However, when the
        client uses RPCSEC_GSS to access the designated server, it is
        possible for mutual authentication to discover invalid server
        addresses provided, as long as the RPCSEC_GSS implementation
        used does not use insecure DNS queries to canonicalize the
        hostname components of the service principal names, as
        explained in [RFC4120].

      o The fetching of attributes containing file system location
        information SHOULD be performed using RPCSEC_GSS with integrity
        protection, as previously explained in the Security
        Considerations section of [RFC5661]. It is important to note
        here that a client making a request of this sort without using
        RPCSEC_GSS including integrity protection needs to be aware of
        the negative consequences of doing so, which can lead to invalid
        host names or network addresses being returned. These include
        cases in which the client is directed to a server under the
        control of an attacker, who might get access to data written or
        provide incorrect values for data read.
        In light of this, the client
        needs to recognize that using such returned location
        information to access an NFSv4 server without use of RPCSEC_GSS
        (i.e., by using AUTH_SYS) poses dangers as it can result in the
        client interacting with such an attacker-controlled server,
        without any authentication facilities to verify the server's
        identity.

      o Despite the fact that it is a requirement (of [RFC5661]) that
        "implementations" provide "support" for use of RPCSEC_GSS, it
        cannot be assumed that use of RPCSEC_GSS is always available
        between any particular client-server pair.

      o When a client has the network addresses of a server but not the
        associated host names, that would interfere with its ability to
        use RPCSEC_GSS.

      In light of the above, a server SHOULD present file system
      location entries that correspond to file systems on other servers
      using a host name. This would allow the client to interrogate the
      fs_locations on the destination server to obtain trunking
      information (as well as replica information) using RPCSEC_GSS with
      integrity, validating the name provided while assuring that the
      response has not been modified in flight.

      When RPCSEC_GSS is not available on a server, the client needs to
      be aware of the fact that the location entries are subject to
      modification in flight and so cannot be relied upon. In the case
      of a client being directed to another server after NFS4ERR_MOVED,
      this could vitiate the authentication provided by the use of
      RPCSEC_GSS on the destination. Even when RPCSEC_GSS
      authentication is available on the destination, the server might
      validly represent itself as the server to which the client was
      erroneously directed.
      Without a way to decide whether the server
      is a valid one, the client can only determine, using RPCSEC_GSS,
      that the server corresponds to the name provided, with no basis
      for trusting that server. As a result, the client SHOULD NOT use
      such unverified location entries as a basis for migration, even
      though RPCSEC_GSS might be available on the destination.

      When a file system location attribute is fetched upon connecting
      with an NFS server, it SHOULD, as stated above, be done using
      RPCSEC_GSS with integrity protection. When this is not possible,
      it is generally best for the client to ignore trunking and replica
      information or simply not fetch the location information for these
      purposes.

      When location information cannot be verified, it can be subjected
      to additional filtering to prevent the client from being
      inappropriately directed. For example, if a range of network
      addresses can be determined that assures that the servers and
      clients using AUTH_SYS are subject to the appropriate set of
      constraints (e.g., physical network isolation, administrative
      controls on the operating systems used), then network addresses in
      the appropriate range can be used with others discarded or
      restricted in their use of AUTH_SYS.

      To summarize considerations regarding the use of RPCSEC_GSS in
      fetching location information, we need to consider the following
      possibilities for requests to interrogate location information,
      with interrogation approaches on the referring and destination
      servers arrived at separately:

      o The use of RPCSEC_GSS with integrity protection is RECOMMENDED
        in all cases, since the absence of integrity protection exposes
        the client to the possibility of the results being modified in
        transit.

      o The use of requests issued without RPCSEC_GSS (i.e., using
        AUTH_SYS, which has no provision to avoid modification of data
        in flight), while undesirable and a potential security
        exposure, may not be avoidable in all cases. Where the use of
        the returned information cannot be avoided, it is made subject
        to filtering as described above to eliminate the possibility
        that the client would treat an invalid address as if it were an
        NFSv4 server. The specifics will vary depending on the degree
        of network isolation and whether the request is to the
        referring or destination servers.

9. IANA Considerations

   This document does not require actions by IANA.

10. References

10.1. Normative References

   [CSOR_AES]
              National Institute of Standards and Technology,
              "Cryptographic Algorithm Object Registration", URL
              http://csrc.nist.gov/groups/ST/crypto_apps_infra/csor/algorithms.html,
              November 2007.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2203]  Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol
              Specification", RFC 2203, DOI 10.17487/RFC2203, September
              1997.

   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "DNS Security Introduction and Requirements",
              RFC 4033, DOI 10.17487/RFC4033, March 2005.

   [RFC4055]  Schaad, J., Kaliski, B., and R. Housley, "Additional
              Algorithms and Identifiers for RSA Cryptography for use in
              the Internet X.509 Public Key Infrastructure Certificate
              and Certificate Revocation List (CRL) Profile", RFC 4055,
              DOI 10.17487/RFC4055, June 2005.

   [RFC4120]  Neuman, C., Yu, T., Hartman, S., and K. Raeburn, "The
              Kerberos Network Authentication Service (V5)", RFC 4120,
              DOI 10.17487/RFC4120, July 2005.

   [RFC5403]  Eisler, M., "RPCSEC_GSS Version 2", RFC 5403,
              DOI 10.17487/RFC5403, February 2009.

   [RFC5531]  Thurlow, R., "RPC: Remote Procedure Call Protocol
              Specification Version 2", RFC 5531, DOI 10.17487/RFC5531,
              May 2009.

   [RFC5665]  Eisler, M., "IANA Considerations for Remote Procedure Call
              (RPC) Network Identifiers and Universal Address Formats",
              RFC 5665, DOI 10.17487/RFC5665, January 2010.

   [RFC7530]  Haynes, T., Ed. and D. Noveck, Ed., "Network File System
              (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530,
              March 2015.

   [RFC7861]  Adamson, A. and N. Williams, "Remote Procedure Call (RPC)
              Security Version 3", RFC 7861, DOI 10.17487/RFC7861,
              November 2016.

   [RFC7931]  Noveck, D., Ed., Shivam, P., Lever, C., and B. Baker,
              "NFSv4.0 Migration: Specification Update", RFC 7931,
              DOI 10.17487/RFC7931, July 2016.

   [RFC8166]  Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct
              Memory Access Transport for Remote Procedure Call Version
              1", RFC 8166, DOI 10.17487/RFC8166, June 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8178]  Noveck, D., "Rules for NFSv4 Extensions and Minor
              Versions", RFC 8178, DOI 10.17487/RFC8178, July 2017.

10.2. Informative References

   [I-D.ietf-nfsv4-mv0-trunking-update]
              Lever, C. and D. Noveck, "NFS version 4.0 Trunking
              Update", draft-ietf-nfsv4-mv0-trunking-update-05 (work in
              progress), February 2019.

   [RFC2104]  Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
              Hashing for Message Authentication", RFC 2104,
              DOI 10.17487/RFC2104, February 1997.

   [RFC5661]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010.

Appendix A. Classification of Document Sections

   Using the classification appearing in Section 3.3, we can proceed
   through the current document and classify its sections as listed
   below. In this listing, when we refer to a Section X and there is a
   Section X.1 within it, the classification of Section X refers to the
   part of that section exclusive of subsections. In the case when that
   portion is empty, the section is not counted.

   o Sections 1 and 2 are both explanatory.

   o Section 3 consists of four sections all of which are explanatory.

   o Appendix B consists of nine sections all of which are explanatory.

   o Section 4 consists of five sections of which the first is
     explanatory, while the remaining four, from Section 4.1 to 4.3.1,
     are all replacement sections.

   o Overall, Section 5 is a replacement for Section 11 of [RFC5661].
     However, with regard to its subsections:

     o Section 5 itself is a replacement section.

     o Section 5.1 is an additional section.

     o Section 5.2 is a replacement section.

     o Sections 5.13 through 5.13.2, a total of four sections, are all
       additional sections.

     o Section 5.5 is a replacement section.

     o Sections 5.5.1 through 5.5.3, a total of three sections, are
       all additional sections.

     o Sections 5.5.4 through 5.5.6, a total of three sections, are
       all replacement sections.

     o Section 5.5.7 is an additional section.

     o Section 5.6 is a transferred section.

     o Sections 5.7 and 5.8 are both additional sections.

     o Sections 5.9 through 5.9.9, a total of eleven sections, are all
       replacement sections.

     o Sections 5.9.9.1 and 5.9.9.2 are both transferred sections.

     o Sections 5.10 through 5.12.3, a total of twelve sections, are
       all additional sections.

     o Sections 5.13 through 5.14, a total of four sections, are all
       transferred sections.

     o Section 5.15 is a replacement section, which consists of a
       total of four sections.

     o Section 5.16 is a transferred section.

   o Section 6 includes the following nine sections:

     o Section 6 itself is an explanatory section.

     o Table 1 is an additional section.

     o The remaining seven sections, from Section 6.2 through 6.3.5,
       are all replacement sections.

   o Section 7 is a replacement section, which consists of a total of
     ten sections.

   o Section 8 is an editing section.

   o Section 9 through Acknowledgments, a total of six sections, are
     all replacement sections.

   To summarize:

   o There are seventeen explanatory sections.

   o There are forty-eight replacement sections.

   o There are twenty-four additional sections.

   o There are eight transferred sections.

   o There is one editing section.

Appendix B. Revisions Made to [RFC5661]

B.1. Revisions Made to Section 11 of [RFC5661]

   A number of areas need to be revised, replacing existing sub-sections
   within Section 11 of [RFC5661]:

   o New introductory material, including a terminology section,
     replaces the existing material in [RFC5661] ranging from the start
     of the existing Section 11 up to and including the existing
     Section 11.1. The new material starts at the beginning of
     Section 5 and continues through 5.2 below.

   o A significant reorganization of the material in the existing
     Sections 11.4 and 11.5 (of [RFC5661]) is necessary. The reasons
     for the reorganization of these sections into a single section
     with multiple subsections are discussed in Appendix B.1.1 below.
     This replacement appears as Section 5.5 below.

     New material relating to the handling of the file system location
     attributes is contained in Sections 5.5.1 and 5.5.7 below.
5835 o A major replacement for the existing Section 11.7 of [RFC5661], 5836 entitled "Effecting File System Transitions", will appear as 5837 Sections 5.7 through 5.12 of the current document. The reasons 5838 for the reorganization of this section into multiple sections are 5839 discussed below in Appendix B.1.2 of the current document. 5841 o A replacement for the existing Section 11.10 of [RFC5661], entitled 5842 "The Attribute fs_locations_info", will appear as Section 5.15 of 5843 the current document, with Appendix B.1.3 describing the 5844 differences between the new section and the treatment within 5845 [RFC5661]. A revised treatment is necessary because the existing 5846 treatment did not make clear how the added attribute information 5847 relates to the case of trunked paths to the same replica. These 5848 issues were not addressed in [RFC5661], where the concepts of a 5849 replica and a network path used to access a replica were not 5850 clearly distinguished. 5852 B.1.1. Re-organization of Sections 11.4 and 11.5 of [RFC5661] 5854 Previously, issues related to the fact that multiple location entries 5855 directed the client to the same file system instance were dealt with 5856 in a separate Section 11.5 of [RFC5661]. Because of the new 5857 treatment of trunking, these issues now belong within Section 5.5 5858 below. 5860 In this new section of the current document, trunking is dealt with 5861 in Section 5.5.2 together with the other uses of file system location 5862 information described in Sections 5.5.3 through 5.5.6.
5864 As a result, Section 5.5 which will replace Section 11.4 of [RFC5661] 5865 is substantially different than the section it replaces in that some 5866 existing sections will be replaced by corresponding sections below 5867 while, at the same time, new sections will be added, resulting in a 5868 replacement containing some renumbered sections, as follows: 5870 o The material in Section 5.5 of the current document, exclusive of 5871 subsections, replaces the material in Section 11.4 of [RFC5661] 5872 exclusive of subsections. 5874 o Section 5.5.1 of the current document is a new first subsection of 5875 the overall section. In a consolidated document it would appear 5876 as Section 11.4.1. 5878 o Section 5.5.2 of the current document is a new second subsection 5879 of the overall section. In a consolidated document it would 5880 appear as Section 11.4.2. 5882 o Each of the Sections 5.5.4, 5.5.5, and 5.5.6 of the current 5883 document replaces (in order) one of the corresponding Sections 5884 11.4.1, 11.4.2, and 11.4.3 of [RFC5661]. In a consolidated 5885 document they would appear as Sections 11.4.3, 11.4.4, and 11.4.5. 5887 o Section 5.5.7 of the current document is a new final subsection of 5888 the overall section. In a consolidated document it would appear 5889 as Section 11.4.6. 5891 B.1.2. Re-organization of Material Dealing with File System Transitions 5893 The material relating to file system transition, previously contained 5894 in Section 11.7 of [RFC5661] has been reorganized and augmented as 5895 described below: 5897 o Because there can be a shift of the network access paths used to 5898 access a file system instance without any shift between replicas, 5899 a new Section 5.7 in the current document distinguishes between 5900 those cases in which there is a shift between distinct replicas 5901 and those involving a shift in network access paths with no shift 5902 between replicas. 
5904 As a result, a new Section 5.8 in the current document deals with 5905 network address transitions while the bulk of the former 5906 Section 11.7 (in [RFC5661]) is extensively modified as reflected 5907 by Section 5.9 in the current document which is now limited to 5908 cases in which there is a shift between two different sets of 5909 replicas. 5911 o The additional Section 5.10 in the current document discusses the 5912 case in which a shift to a different replica is made and state is 5913 transferred to allow the client the ability to have continued 5914 access to its accumulated locking state on the new server. 5916 o The additional Section 5.11 in the current document discusses the 5917 client's response to access transitions and how it determines 5918 whether migration has occurred, and how it gets access to any 5919 transferred locking and session state. 5921 o The additional Section 5.12 in the current document discusses the 5922 responsibilities of the source and destination servers when 5923 transferring locking and session state. 5925 This re-organization has caused a renumbering of the sections within 5926 Section 11 of [RFC5661] as described below: 5928 o The new Sections 5.7 and 5.8 in the current document would appear 5929 as Sections 11.7 and 11.8 respectively, in an eventual 5930 consolidated document. 5932 o Section 11.7 of [RFC5661] will be modified as described in 5933 Section 5.9. The necessary modifications reflect the fact that 5934 this section will only deal with transitions between replicas 5935 while transitions between network addresses are dealt with in 5936 other sections. Details of the reorganization are described later 5937 in this section. The updated section would appear as Section 11.9 5938 in an eventual consolidated document. 5940 o The additional Sections 5.10, 5.11, and 5.12 in the current 5941 document would appear as Sections 11.10, 11.11, and 11.12 5942 respectively, in an eventual consolidated document. 
5944 o Consequently, Sections 11.8, 11.9, 11.10, and 11.11 in [RFC5661] 5945 would appear as Sections 11.13, 11.14, 11.15, and 11.16 5946 respectively, in an eventual consolidated document. 5948 As part of this general re-organization, Section 11.7 of [RFC5661] 5949 will be modified as described below: 5951 o Sections 11.7 and 11.7.1 of [RFC5661] are to be replaced by 5952 Sections 5.9 and 5.9.1, respectively, of the current document. 5953 These sections would appear as Sections 11.9 and 11.9.1 in an 5954 eventual consolidated document. 5956 o Section 11.7.2 (and included subsections) of [RFC5661] is to be 5957 deleted. 5959 o Sections 11.7.3, 11.7.4, 11.7.5, 11.7.5.1, and 11.7.6 of [RFC5661] 5960 are to be replaced by Sections 5.9.2, 5.9.3, 5.9.4, 5.9.4.1, and 5961 5.9.5, respectively, of the current document. These sections would 5962 appear as Sections 11.9.2, 11.9.3, 11.9.4, 11.9.4.1, and 11.9.5 in 5963 an eventual consolidated document. 5965 o Section 11.7.7 of [RFC5661] is to be replaced by Section 5.9.9. 5966 Because this sub-section has been moved to the end of the section 5967 dealing with file system transitions, it would appear as 5968 Section 11.9.9 in an eventual consolidated document. 5970 o Sections 11.7.8, 11.7.9, and 11.7.10 of [RFC5661] are to be 5971 replaced by Sections 5.9.6, 5.9.7, and 5.9.8, respectively, of the 5972 current document. These sections would appear as Sections 11.9.6, 5973 11.9.7, and 11.9.8 in an eventual consolidated document. 5975 B.1.3. Updates to treatment of fs_locations_info 5977 Various elements of the fs_locations_info attribute contain 5978 information that applies to either a specific file system replica or 5979 to a network path or set of network paths used to access such a 5980 replica. The existing treatment of fs_locations_info (in 5981 Section 11.10 of [RFC5661]) does not clearly distinguish these cases, 5982 in part because the document did not clearly distinguish replicas 5983 from the paths used to access them.
5985 In addition, special clarification needed to be provided with regard 5986 to the following fields: 5988 o With regard to the handling of FSLI4GF_GOING, it needs to be made 5989 clear that this only applies to the unavailability of a replica 5990 rather than to a path to access a replica. 5992 o In describing the appropriate value for a server to use for 5993 fli_valid_for, it needs to be made clear that there is no need for 5994 the client to frequently fetch the fs_locations_info value to be 5995 prepared for shifts in trunking patterns. 5997 o Clarification of the rules for extensions to the fls_info needs to 5998 be provided. The existing treatment reflects the extension model 5999 in effect at the time [RFC5661] was written, and needs to be 6000 updated in accordance with the extension model described in 6001 [RFC8178]. 6003 B.2. Revisions Made to Operations in [RFC5661] 6005 Revised descriptions were needed to address issues that arose in 6006 effecting necessary changes to multi-server namespace features. 6008 o The existing treatment of EXCHANGE_ID (in Section 18.35 of 6009 [RFC5661]) assumes that client IDs cannot be created/confirmed 6010 other than by the EXCHANGE_ID and CREATE_SESSION operations. 6011 Also, the necessary use of EXCHANGE_ID in recovery from migration 6012 and related situations is not addressed clearly. A revised 6013 treatment of EXCHANGE_ID is necessary and it appears in 6014 Section 7.1 below while the specific differences between it and 6015 the treatment within [RFC5661] are explained in Appendix B.2.1 6016 below. 6018 o The existing treatment of RECLAIM_COMPLETE (in Section 18.51 of 6019 [RFC5661]) is not sufficiently clear about the purpose and use of 6020 the rca_one_fs argument and how the server is to deal with inappropriate 6021 values of this argument.
Because the resulting confusion raises 6022 interoperability issues, a new treatment of RECLAIM_COMPLETE is 6023 necessary and it appears in Section 7.2 below while the specific 6024 differences between it and the treatment within [RFC5661] are 6025 discussed in Appendix B.2.2 below. In addition, the definitions 6026 of the reclaim-related errors receive an updated treatment in 6027 Section 6.3 to reflect the fact that there are multiple contexts 6028 for lock reclaim operations. 6030 B.2.1. Revision to Treatment of EXCHANGE_ID 6032 There are a number of issues in the original treatment of EXCHANGE_ID 6033 (in [RFC5661]) that cause problems for Transparent State Migration 6034 and for the transfer of access between different network access paths 6035 to the same file system instance. 6037 These issues arise from the fact that this treatment was written: 6039 o Assuming that a client ID can only become known to a server by 6040 having been created by executing an EXCHANGE_ID, with confirmation 6041 of the ID only possible by execution of a CREATE_SESSION. 6043 o Considering the interactions between a client and a server as 6044 occurring only on a single network address. 6046 As these assumptions have become invalid in the context of 6047 Transparent State Migration and active use of trunking, the treatment 6048 has been modified in several respects. 6050 o It had been assumed that an EXCHANGE_ID executed when the server 6051 is already aware of a given client instance must be either 6052 updating associated parameters (e.g. with respect to callbacks) or 6053 a lingering retransmission to deal with a previously lost reply. 6054 As a result, any slot sequence returned by that operation would be 6055 of no use. The existing treatment goes so far as to say that it 6056 "MUST NOT" be used, although this usage is not in accord with 6057 [RFC2119].
This created a difficulty when an EXCHANGE_ID is done 6058 after Transparent State Migration since that slot sequence would 6059 need to be used in a subsequent CREATE_SESSION. 6061 In the updated treatment, CREATE_SESSION is a way that client IDs 6062 are confirmed but it is understood that other ways are possible. 6063 The slot sequence can be used as needed and cases in which it 6064 would be of no use are appropriately noted. 6066 o It was assumed that the only functions of EXCHANGE_ID were to 6067 inform the server of the client, create the client ID, and 6068 communicate it to the client. When multiple simultaneous 6069 connections are involved, as often happens when trunking, that 6070 treatment was inadequate in that it ignored the role of 6071 EXCHANGE_ID in associating the client ID with the connection on 6072 which it was done, so that it could be used by a subsequent 6073 CREATE_SESSION, whose parameters do not include an explicit 6074 client ID. 6076 The new treatment explicitly discusses the role of EXCHANGE_ID in 6077 associating the client ID with the connection so it can be used by 6078 CREATE_SESSION and in associating a connection with an existing 6079 session. 6081 The new treatment can be found in Section 7.1 below. It is intended 6082 to supersede the treatment in Section 18.35 of [RFC5661]. Publishing 6083 a complete replacement for Section 18.35 allows the corrected 6084 definition to be read as a whole, in place of the one in [RFC5661]. 6086 B.2.2. Revision to Treatment of RECLAIM_COMPLETE 6088 The following changes were made to the treatment of RECLAIM_COMPLETE 6089 in [RFC5661] to arrive at the treatment in Section 7.2. 6091 o In a number of places the text is made more explicit about the 6092 purpose of rca_one_fs and its connection to file system migration. 6094 o There is a discussion of situations in which particular forms of 6095 RECLAIM_COMPLETE would need to be done.
6097 o There is a discussion of interoperability issues that result from 6098 implementations that may have arisen due to the lack of clarity of 6099 the previous treatment of RECLAIM_COMPLETE. 6101 B.3. Revisions Made to Error Definitions in [RFC5661] 6103 The new handling of various situations required revisions of some 6104 existing error definitions: 6106 o Because of the need to appropriately address trunking-related 6107 issues, some uses of the term "replica" in [RFC5661] have become 6108 problematic since a shift in network access paths was considered 6109 to be a shift to a different replica. As a result, the existing 6110 definition of NFS4ERR_MOVED (in Section 15.1.2.4 of [RFC5661]) 6111 needs to be updated to reflect the different handling of 6112 unavailability of a particular fs via a specific network address. 6114 Since such a situation is no longer considered to constitute 6115 unavailability of a file system instance, the description needs to 6116 change even though the set of circumstances in which it is to be 6117 returned remains the same. The new paragraph explicitly recognizes 6118 that a different network address might be used, while the previous 6119 description, misleadingly, treated this as a shift between two 6120 replicas even though only a single file system instance might be 6121 involved. The updated description appears in Section 6.2 below. 6123 o Because of the need to accommodate use of fs-specific grace 6124 periods, it is necessary to clarify some of the error definitions 6125 of reclaim-related errors in Section 15 of [RFC5661], so the text 6126 applies properly to reclaims for all types of grace periods. The 6127 updated descriptions appear in Section 6.3 below. 6129 B.4.
Other Revisions Made to [RFC5661] 6131 Besides the major reworking of Section 11 and the associated revisions 6132 to existing operations and errors, there are a number of related 6133 changes that are necessary: 6135 o The summary that appeared in Section 1.7.3.3 of [RFC5661] was 6136 revised to reflect the changes made in Section 5 of the current 6137 document. The updated summary appears as Section 4.1 below. 6139 o The discussion of server scope which appeared in Section 2.10.4 of 6140 [RFC5661] needed to be replaced, since the previous text appears 6141 to require a level of inter-server co-ordination incompatible with 6142 its basic function of avoiding the need for a globally uniform 6143 means of assigning server_owner values. A revised treatment 6144 appears in Section 4.2 below. 6146 o The discussion of trunking which appeared in Section 2.10.5 of 6147 [RFC5661] needed to be revised, to more clearly explain the 6148 multiple types of trunking supported and how the client can be 6149 made aware of the existing trunking configuration. In addition, 6150 while the last paragraph (exclusive of sub-sections) of that section, 6151 dealing with server_owner changes, is literally true, it has been 6152 a source of confusion. Since the existing paragraph can be read 6153 as suggesting that such changes be dealt with non-disruptively, 6154 the issue needs to be clarified in the revised section, which 6155 appears in Section 4.3. 6157 Appendix C. Disposition of Sections Within [RFC5661] 6159 In this appendix, we proceed through [RFC5661] identifying sections 6160 as unchanged, modified, deleted, or replaced and indicating where 6161 additional sections from the current document would appear in an 6162 eventual consolidated description of NFSv4.1. In this presentation, 6163 when section X is referred to, it denotes that section plus all 6164 included subsections.
When it is necessary to refer to the part of a 6165 section outside any included subsections, the exclusion is noted 6166 explicitly. 6168 o Section 1 is unmodified except that Section 1.7.3.3 is to be 6169 replaced by Section 4.1 from the current document. 6171 o Section 2 is unmodified except for the specific items listed 6172 below: 6174 o Section 2.10.4 is replaced by Section 4.2 from the current 6175 document. 6177 o Section 2.10.5 is replaced by Section 4.3 of the current 6178 document. 6180 o Sections 3 through 10 are unchanged. 6182 o Section 11 is extensively modified as discussed below. 6184 o Section 11, exclusive of subsections, is replaced by the 6185 material from the start of Section 5 and continuing through 6186 Section 5.1, all from the current document. 6188 o Section 11.1 is replaced by Section 5.2 from the current 6189 document. 6191 o Sections 11.2, 11.3, 11.3.1, and 11.3.2 are unchanged. 6193 o Section 11.4 is replaced by Section 5.5 from the current 6194 document. For details regarding subsections see below. 6196 o New sections corresponding to Sections 5.5.1 through 5.5.3 6197 from the current document appear next. 6199 o Section 11.4.1 is replaced by Section 5.5.4 from the current 6200 document. 6202 o Section 11.4.2 is replaced by Section 5.5.5 from the current 6203 document. 6205 o Section 11.4.3 is replaced by Section 5.5.6 from the current 6206 document. 6208 o A new section corresponding to Section 5.5.7 from the 6209 current document appears next. 6211 o Section 11.5 is to be deleted. 6213 o Section 11.6 is unchanged. 6215 o New sections corresponding to Sections 5.7 and 5.8 from the 6216 current document appear next. 6218 o Section 11.7 is replaced by Section 5.9 from the current 6219 document. For details regarding subsections see below. This 6220 section (with included subsections) would appear as 6221 Section 11.9 in an eventual consolidated document. 
In addition 6222 to the shift from Section 11.7 to Section 11.9, subsections 6223 within it would be affected by the deletion of Section 11.7.2 6224 and the move of Section 11.7.7 to be the last sub-section. 6226 o Section 11.7.1 is replaced by Section 5.9.1 from the current 6227 document. In an eventual consolidated document, it would 6228 appear as Section 11.9.1. 6230 o Sections 11.7.2, 11.7.2.1, and 11.7.2.2 are deleted. 6232 o Section 11.7.3 is replaced by Section 5.9.2 from the current 6233 document. In an eventual consolidated document, it would 6234 appear as Section 11.9.2. 6236 o Section 11.7.4 is replaced by Section 5.9.3 from the current 6237 document. In an eventual consolidated document, it would 6238 appear as Section 11.9.3. 6240 o Sections 11.7.5 and 11.7.5.1 are replaced by Sections 5.9.4 6241 and 5.9.4.1 respectively, from the current document. In an 6242 eventual consolidated document, they would appear as 6243 Sections 11.9.4 and 11.9.4.1. 6245 o Section 11.7.6 is replaced by Section 5.9.5 from the current 6246 document. In an eventual consolidated document, it would 6247 appear as Section 11.9.5. 6249 o Section 11.7.7, exclusive of subsections, is replaced by 6250 Section 5.9.9 from the current document. Sections 11.7.7.1 6251 and 11.7.7.2 are unchanged. Because this section will 6252 become the last sub-section of the replacement for 6253 Section 11.7, it would appear as Section 11.9.9 in an 6254 eventual consolidated document. 6256 o Section 11.7.8 is replaced by Section 5.9.6 from the current 6257 document. In an eventual consolidated document, it would 6258 appear as Section 11.9.6. 6260 o Section 11.7.9 is replaced by Section 5.9.7 from the current 6261 document. In an eventual consolidated document, it would 6262 appear as Section 11.9.7. 6264 o Section 11.7.10 is replaced by Section 5.9.8 from the 6265 current document. In an eventual consolidated document, it 6266 would appear as Section 11.9.8. 
6268 o New sections corresponding to Sections 5.10, 5.11, and 5.12 6269 from the current document appear next as additional sub- 6270 sections of Section 11. Each of these has subsections, so 6271 there is a total of seventeen sections added. These sections 6272 would appear as Sections 11.10, 11.11, and 11.12 respectively 6273 in an eventual consolidated document. 6275 o Sections 11.8, 11.8.1, 11.8.2, and 11.9 are unchanged although 6276 they would be renumbered as Sections 11.13 (with included 6277 subsections) and 11.14 in an eventual consolidated document. 6279 o Sections 11.10, 11.10.1, 11.10.2, and 11.10.3 are replaced by 6280 Sections 5.15 through 5.15.3 from the current document. These 6281 sections would appear as Section 11.15 (with included 6282 subsections) in an eventual consolidated document. 6284 o Section 11.11 is unchanged, although it would appear as 6285 Section 11.16 in an eventual consolidated document. 6287 o Sections 12 through 14 are unchanged. 6289 o Section 15 is unmodified except that: 6291 * The description of NFS4ERR_MOVED in Section 15.1.2.4 is revised 6292 as described in Section 6.2 of the current document. 6294 * The description of the reclaim-related errors in Section 15.1.9 6295 is replaced by the revised descriptions in Section 6.3 of the 6296 current document. 6298 o Sections 16 and 17 are unchanged. 6300 o Section 18 is unmodified except for the following: 6302 * Section 18.35 is replaced by Section 7.1 in the current 6303 document. 6305 * Section 18.51 is replaced by Section 7.2 in the current 6306 document. 6308 o Sections 19 through 23 are unchanged. 6310 In terms of top-level sections, exclusive of appendices: 6312 o There is one heavily modified top-level section (Section 11). 6314 o There are five other modified top-level sections (Sections 1, 2, 6315 15, 18, and 21). 6317 o The other seventeen top-level sections are unchanged.
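The renumbering of Section 11.5 and of Sections 11.7 through 11.11 of [RFC5661] described in Appendix B.1.2 and in this appendix can be sketched as a lookup table. The following is an illustrative sketch only, not part of the specification: the names RENUMBER and consolidated_number are hypothetical, the entries are transcribed from the text above, and subsections of Section 11.4 are deliberately left out.

```python
# Hypothetical lookup table; entries transcribe the renumbering stated in
# Appendices B.1.2 and C. None marks a section that is to be deleted.
RENUMBER = {
    "11.5": None,
    "11.7": "11.9",
    "11.7.1": "11.9.1",
    "11.7.2": None,
    "11.7.3": "11.9.2",
    "11.7.4": "11.9.3",
    "11.7.5": "11.9.4",
    "11.7.6": "11.9.5",
    "11.7.7": "11.9.9",
    "11.7.8": "11.9.6",
    "11.7.9": "11.9.7",
    "11.7.10": "11.9.8",
    "11.8": "11.13",
    "11.9": "11.14",
    "11.10": "11.15",
    "11.11": "11.16",
}


def consolidated_number(section):
    """Map an RFC 5661 section number to its consolidated-document number.

    Returns None for deleted sections. Sections with no matching entry
    keep their number. Subsections inherit the mapping of the longest
    listed prefix, e.g. 11.8.1 follows 11.8.
    """
    if section.startswith("11.4."):
        # Assumption of this sketch: 11.4 subsections are not modeled.
        raise ValueError("subsections of 11.4 are not covered by this sketch")
    parts = section.split(".")
    # Try the longest prefix first so 11.7.5.1 matches 11.7.5, not 11.7.
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in RENUMBER:
            target = RENUMBER[prefix]
            if target is None:
                return None
            return ".".join([target] + parts[end:])
    return section


if __name__ == "__main__":
    for old in ("11.7.5.1", "11.7.2.1", "11.7.7.2", "11.8.2", "12.1"):
        print(old, "->", consolidated_number(old))
```

Matching on whole dotted components rather than on string prefixes keeps 11.7.1 from being mistaken for a prefix of 11.7.10.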
6319 The disposition of sections of [RFC5661] is summarized in the 6320 following table which provides counts of sections replaced, added, 6321 deleted, modified, or unchanged. Separate counts are provided for: 6323 o Top-level sections. 6325 o Sections with TOC entries. 6327 o Sections within Section 11. 6329 o Sections outside Section 11. 6331 In this table, the counts for top-level sections and TOC entries are 6332 for sections including subsections while other counts are for 6333 sections exclusive of included subsections.

6335 +------------+------+------+--------+------------+--------+
6336 | Status     | Top  | TOC  | in 11  | not in 11  | Total  |
6337 +------------+------+------+--------+------------+--------+
6338 | Replaced   | 0    | 6    | 21     | 15         | 36     |
6339 | Added      | 0    | 5    | 24     | 0          | 24     |
6340 | Deleted    | 0    | 1    | 4      | 0          | 4      |
6341 | Modified   | 6    | 9    | 0      | 2          | 2      |
6342 | Unchanged  | 17   | 199  | 12     | 910        | 922    |
6343 | in RFC5661 | 23   | 220  | 37     | 927        | 964    |
6344 +------------+------+------+--------+------------+--------+

6346 Acknowledgments 6348 The authors wish to acknowledge the important role of Andy Adamson of 6349 NetApp in clarifying the need for trunking discovery functionality, 6350 and exploring the role of the file system location attributes in 6351 providing the necessary support. 6353 The authors also wish to acknowledge the work of Xuan Qi of Oracle 6354 with NFSv4.1 client and server prototypes of transparent state 6355 migration functionality. 6357 The authors wish to thank others who brought attention to important 6358 issues. The comments of Trond Myklebust of Primary Data related to 6359 trunking helped to clarify the role of DNS in trunking discovery. 6360 Rick Macklem's comments brought attention to problems in the handling 6361 of the per-fs version of RECLAIM_COMPLETE. 6363 The authors wish to thank Olga Kornievskaia of NetApp for her helpful 6364 review comments.
6366 Authors' Addresses

6367 David Noveck (editor)
6368 NetApp
6369 1601 Trapelo Road
6370 Waltham, MA 02451
6371 United States of America

6373 Phone: +1 781 572 8038
6374 Email: davenoveck@gmail.com

6376 Charles Lever
6377 Oracle Corporation
6378 1015 Granger Avenue
6379 Ann Arbor, MI 48104
6380 United States of America

6382 Phone: +1 248 614 5091
6383 Email: chuck.lever@oracle.com