idnits 2.17.1

draft-dnoveck-nfsv4-mv1-msns-update-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC5661, updated by this document, for
     RFC5378 checks: 2005-10-21)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 13, 2017) is 2349 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 5661 (Obsoleted by RFC 8881)

  == Outdated reference: A later version (-16) exists of
     draft-ietf-nfsv4-migration-issues-13

     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

NFSv4                                                          D. Noveck
Internet-Draft                                                    NetApp
Updates: 5661 (if approved)                                     C. Lever
Intended status: Standards Track                                  ORACLE
Expires: May 17, 2018                                  November 13, 2017

               NFSv4.1 Update for Multi-Server Namespace
                 draft-dnoveck-nfsv4-mv1-msns-update-01

Abstract

   This document presents necessary clarifications and corrections
   concerning features related to the use of location-related attributes
   in NFSv4.1. These include migration, which transfers responsibility
   for a file system from one server to another, and trunking, which
   deals with the discovery and control of the set of network addresses
   to use to access a file system. This document updates RFC5661.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 17, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Preliminaries
     3.1.  Terminology
     3.2.  Summary of Issues
     3.3.  Relationship of this Document to RFC5661
   4.  Changes to Section 11 of RFC5661
     4.1.  Multi-Server Namespace (as updated)
     4.2.  Location-related Terminology (to be added)
     4.3.  Location Attributes (as updated)
     4.4.  Re-organization of Sections 11.4 and 11.5 of RFC5661
     4.5.  Uses of Location Information (as updated)
       4.5.1.  Combining Multiple Uses in a Single Attribute (to be
               added)
       4.5.2.  Location Attributes and Trunking (to be added)
       4.5.3.  File System Replication (as updated)
       4.5.4.  File System Migration (as updated)
       4.5.5.  Referrals (as updated)
       4.5.6.  Changes in a Location Attribute (to be added)
   5.  Re-organization of Section 11.7 of RFC5661
   6.  Overview of File Access Transitions (to be added)
   7.  Effecting Network Address Transitions (to be added)
   8.  Effecting File System Transitions (as updated)
     8.1.  File System Transitions and Simultaneous Access (as
           updated)
     8.2.  Filehandles and File System Transitions (as updated)
     8.3.  Fileids and File System Transitions (as updated)
     8.4.  Fsids and File System Transitions (as updated)
       8.4.1.  File System Splitting (as updated)
     8.5.  The Change Attribute and File System Transitions (as
           updated)
     8.6.  Write Verifiers and File System Transitions (as updated)
     8.7.  Readdir Cookies and Verifiers and File System Transitions
           (as updated)
     8.8.  File System Data and File System Transitions (as updated)
     8.9.  Lock State and File System Transitions (as updated)
   9.  Transferring State upon Migration (to be added)
     9.1.  Transparent State Migration and pNFS (to be added)
   10. Client Responsibilities when Access is Transitioned (to be
       added)
     10.1.  Client Transition Notifications (to be added)
     10.2.  Use of a Transition Discovery Thread (to be added)
     10.3.  Overview of Client Response to NFS4ERR_MOVED (to be
            added)
     10.4.  Obtaining Access to Sessions and State after Migration
            (to be added)
     10.5.  Obtaining Access to Sessions and State after Network
            Address Transfer (to be added)
   11. Server Responsibilities Upon Migration (to be added)
     11.1.  Server Responsibilities in Effecting Transparent State
            Migration (to be added)
     11.2.  Server Responsibilities in Effecting Session Transfer
            (to be added)
   12. Changes to RFC5661 outside Section 11
     12.1.  (Introduction to) Multi-Server Namespace (as updated)
     12.2.  Server Scope (as updated)
     12.3.  Revised Treatment of NFS4ERR_MOVED
     12.4.  Revised Discussion of Server_owner changes
     12.5.  Revision to Treatment of EXCHANGE_ID
   13. Operation 42: EXCHANGE_ID - Instantiate Client ID (as
       updated)
   14. Security Considerations
   15. IANA Considerations
   16. References
     16.1.  Normative References
     16.2.  Informative References
   Appendix A.  Classification of Document Sections
   Appendix B.  Updates to RFC5661
   Acknowledgments
   Authors' Addresses

1.  Introduction

   This document deals with the proper handling of the location-related
   attributes fs_locations and fs_locations_info and how necessary
   changes in those attributes are to be dealt with. Important
   background regarding these changes is to be found in
   [I-D.ietf-nfsv4-migration-issues].

   A large number of the changes to be made parallel those in [RFC7931],
   which clarifies the handling of Transparent State Migration in
   NFSv4.0.
   Many of the issues dealt with there need to be addressed in
   the context of NFSv4.1.

   Another important issue to be dealt with concerns the handling of
   multiple entries within location-related attributes that represent
   different ways to access the same file system. Unfortunately
   [RFC5661], while recognizing that these entries can represent
   different ways to access the same file system, confuses the matter by
   treating network access paths as "replicas", making it difficult for
   these attributes to be used to obtain information about the network
   addresses to be used to access particular file system instances and
   engendering confusion between a transition between network access
   paths to the same file system instance and a transition between two
   replicas.

   When location information is used to determine the set of network
   addresses to access a particular file system instance (i.e. to
   perform trunking discovery), clarification is needed regarding the
   interaction of trunking and transitions between file system replicas,
   including migration. Unfortunately [RFC5661], while it provided a
   method of determining whether two network addresses were connected to
   the same server, did not address the issue of trunking discovery,
   making it necessary to address it in this document.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Preliminaries

3.1.  Terminology

   While most of the terms related to multi-server namespace issues are
   appropriately defined in the replacement for Section 11 in [RFC5661]
   and appear in Section 4.2 below, there are a number of terms used
   outside that context that are explained here.

   In this document the phrase "client ID" always refers to the 64-bit
   shorthand identifier assigned by the server (a clientid4) and never
   to the structure which the client uses to identify itself to the
   server (called an nfs_client_id4 or client_owner in NFSv4.0 and
   NFSv4.1 respectively). The opaque identifier within those structures
   is referred to as a "client id string".

   It is particularly important to clarify the distinction between
   trunking detection and trunking discovery. The definitions we
   present will be applicable to all minor versions of NFSv4, but we
   will put particular emphasis on how these terms apply to NFS version
   4.1.

   o  Trunking detection refers to ways of deciding whether two specific
      network addresses are connected to the same NFSv4 server. The
      means available to make this determination depend on the protocol
      version, and, in some cases, on the client implementation.

      In the case of NFS version 4.1 and later minor versions, the means
      of trunking detection are as described by [RFC5661] and are
      available to every client. Two network addresses connected to the
      same server are always server-trunkable but are not necessarily
      session-trunkable.

      In the case of NFSv4.0, the means to be used are described in
      [RFC7931] and require use of the uniform client string approach to
      be effective. There is no concept of session-trunkability.

   o  Trunking discovery is a process by which a client using one
      network address can obtain other addresses that are connected to
      the same server. Typically, it builds on a trunking detection
      facility by providing one or more methods by which candidate
      addresses are made available to the client, which can then use
      trunking detection to appropriately filter them.

      Despite the support for trunking detection, there was no
      description of trunking discovery provided in [RFC5661].

      In the case of NFSv4.0, trunking discovery was not provided for in
      [RFC7530] and no description of it was provided by [RFC7931], when
      trunking detection support was added.

   Regarding network addresses and the handling of trunking we use the
   following terminology:

   o  Each NFSv4 server is assumed to have a set of IP addresses to
      which NFSv4 requests may be sent by clients. These are referred
      to as the server's network addresses.

   o  Each network address, when combined with a pathname providing the
      location of a file system root directory relative to the
      associated server root file handle, defines a file system network
      access path.

   o  Two network addresses connected to the same server are said to be
      server-trunkable.

   o  Two network addresses connected to the same server such that those
      addresses can be used to support a single common session are
      referred to as session-trunkable. Note that two addresses may be
      server-trunkable without being session-trunkable.

   Discussion of the term "replica" is complicated for a number of
   reasons:

   o  Even though the term is used in explaining the issues in [RFC5661]
      that need to be addressed in this document, a full explanation of
      this term requires explanation of related terms connected to the
      location attributes which are provided in Section 4.2 of the
      current document.

   o  The term is also used in [RFC5661], with a meaning different from
      that in the current document. In short, in [RFC5661] each replica
      is identified by a single network access path while, in the
      current document, a set of network access paths which have server-
      trunkable network addresses and the same root-relative file system
      pathname are considered to be a single replica with multiple
      network access paths.

3.2.  Summary of Issues

   This document explains how clients and servers are to determine the
   particular network access paths to be used to access a file system.
   This includes describing how changes to the specific replica or to
   the set of addresses to be used are to be dealt with, and how
   transfers of responsibility that need to be made can be dealt with
   transparently. This includes cases in which there is a shift between
   one replica and another and those in which different network access
   paths are used to access the same replica.

   As a result of the following problems in [RFC5661], it is necessary
   to provide the updates described later in this document.

   o  [RFC5661], while it dealt with situations in which various forms
      of clustering allowed co-ordination of the state assigned by co-
      operating servers to be used, made no provisions for Transparent
      State Migration, as introduced by [RFC7530] and corrected and
      clarified by [RFC7931].

   o  Although NFSv4.1 was defined with a clear definition of how
      trunking detection was to be done, there was no clear
      specification of how trunking discovery was to be done, despite
      the fact that the specification clearly indicated that this
      information could be made available via the location attributes.

   o  Because the existence of multiple network access paths to the same
      file system was dealt with as if there were multiple replicas,
      issues relating to transitions between replicas could never be
      clearly distinguished from trunking-related transitions between
      the addresses used to access a particular file system instance.
      As a result, in situations in which both migration and trunking
      were involved, neither of these could be clearly dealt with and
      the relationship between these two features was not seriously
      addressed.

   o  Because use of two network access paths to the same file system
      instance (i.e. trunking) was often treated as if two replicas were
      involved, it was considered that two replicas were being used
      simultaneously. As a result, the treatment of replicas being used
      simultaneously in [RFC5661] was not clear as it covered the two
      distinct cases of a single file system instance being accessed by
      two different network access paths and two replicas being accessed
      simultaneously, with the limitations of the latter case not being
      clearly laid out.

   The majority of the consequences of these issues are dealt with via
   the updates in various subsections of Section 4 of the current
   document which deal with problems within Section 11 of [RFC5661].
   These include:

   o  Reorganization made necessary by the fact that two network access
      paths to the same file system instance need to be distinguished
      clearly from two different replicas since the former share locking
      state and can share session state.

   o  The need for a clear statement regarding the desirability of
      transparent transfer of state together with a recommendation that
      either that or a single-fs grace period be provided.

   o  Specifically delineating how such transfers are to be dealt with
      by the client, taking into account the differences from the
      treatment in [RFC7931] made necessary by the major protocol
      changes made in NFSv4.1.

   o  Discussion of the relationship between transparent state transfer
      and pNFS.

   In addition, there are also updates to other sections of [RFC5661],
   where the consequences of the incorrect assumptions underlying the
   current treatment of multi-server namespace issues also need to be
   corrected. These are to be dealt with as described in various
   subsections of Section 12 of the current document.

   o  A revised introductory section regarding multi-server namespace
      facilities is provided.

   o  A more realistic treatment of server scope is provided, which
      reflects the more limited co-ordination of locking state adopted
      by servers actually sharing a common server scope.

   o  Some confusing text regarding changes in server_owner needs to be
      clarified.

   o  The description of NFS4ERR_MOVED needs to be updated since two
      different network access paths to the same file system are no
      longer considered to be two instances of the same file system.

   o  A new treatment of EXCHANGE_ID is needed, replacing that which
      appeared in Section 18.35 of [RFC5661].

3.3.  Relationship of this Document to RFC5661

   The role of this document is to explain and specify a set of needed
   changes to [RFC5661]. All of these changes are related to the multi-
   server namespace features of NFSv4.1.

   This document contains sections that propose additions to and other
   modifications of [RFC5661] as well as others that explain the reasons
   for modifications but do not directly affect existing specifications.

   In consequence, the sections of this document can be divided into
   four groups based on how they relate to the eventual updating of the
   NFSv4.1 specification. Once the update is published, NFSv4.1 will be
   specified by two documents that need to be read together, until such
   time as a consolidated specification is produced.

   o  Explanatory sections do not contain any material that is meant to
      update the specification of NFSv4.1. Such sections may contain
      explanation about why and how changes are to be done, without
      including any text that is to update [RFC5661] or appear in an
      eventual consolidated document.

   o  Replacement sections contain text that is to replace and thus
      supersede text within [RFC5661] and then appear in an eventual
      consolidated document. Replacement sections have the phrase "(as
      updated)" appended to the section title.

   o  Additional sections contain text which, although not replacing
      anything in [RFC5661], will be part of the specification of
      NFSv4.1 and will be expected to be part of an eventual
      consolidated document. Additional sections have the phrase "(to
      be added)" appended to the section title.

   o  Editing sections contain some text that replaces text within
      [RFC5661], although the entire section will not consist of such
      text and will include other text as well. Such sections make
      relatively minor adjustments in the existing NFSv4.1 specification
      which are expected to be reflected in an eventual consolidated
      document. Generally such replacement text appears as a quotation,
      which may take the form of an indented set of paragraphs.

   See Appendix A for a classification of the sections of this document
   according to the categories above.

   When this document is approved and published, [RFC5661] would be
   significantly updated with most of the changed sections within the
   current Section 11 of that document. A detailed discussion of the
   necessary updates can be found in Appendix B.

4.  Changes to Section 11 of RFC5661

   A number of sections need to be revised, replacing existing sub-
   sections within Section 11 of [RFC5661]:

   o  New introductory material, including a terminology section,
      replaces the existing material in [RFC5661] ranging from the start
      of the existing Section 11 up to and including the existing
      Section 11.1. The new material appears in Sections 4.1 through
      4.3 below.

   o  A significant reorganization of the material in the existing
      Sections 11.4 and 11.5 (of [RFC5661]) is necessary. The reasons
      for the reorganization of these sections into a single section
      with multiple subsections are discussed in Section 4.4 below.
      This replacement appears as Section 4.5 below.

      New material relating to the handling of the location attributes
      is contained in Sections 4.5.1 and 4.5.6 below.

   o  A major replacement for the existing Section 11.7 of [RFC5661],
      entitled "Effecting File System Transitions", will appear as
      Sections 6 through 11 of the current document. The reasons for
      the reorganization of this section into multiple sections are
      discussed below in Section 5 of the current document.

4.1.  Multi-Server Namespace (as updated)

   NFSv4.1 supports attributes that allow a namespace to extend beyond
   the boundaries of a single server. It is desirable that clients and
   servers support construction of such multi-server namespaces. Use of
   such multi-server namespaces is OPTIONAL, however, and for many
   purposes, single-server namespaces are perfectly acceptable. Use of
   multi-server namespaces can provide many advantages, however, by
   separating a file system's logical position in a namespace from the
   (possibly changing) logistical and administrative considerations that
   result in particular file systems being located on particular
   servers.

4.2.  Location-related Terminology (to be added)

   Regarding terminology relating to the construction of multi-server
   namespaces out of a set of local per-server namespaces:

   o  Each server has a set of exported file systems which may be
      accessed by NFSv4 clients. Typically, this is done by assigning
      each file system a name within the pseudo-fs associated with the
      server, although the pseudo-fs may be dispensed with if there is
      only a single exported file system. Each such file system is part
      of the server's local namespace, and can be considered as a file
      system instance within a larger multi-server namespace.

   o  The set of all exported file systems for a given server
      constitutes that server's local namespace.

   o  In some cases, a server will have a namespace more extensive than
      its local namespace by using features associated with attributes
      that provide location information. These features, which allow
      construction of a multi-server namespace, are all described in
      individual sections below and include referrals (described in
      Section 4.5.5), migration (described in Section 4.5.4), and
      replication (described in Section 4.5.3).

   o  A file system present in a server's pseudo-fs may have multiple
      file system instances on different servers associated with it.
      All such instances are considered replicas of one another.

   o  When a file system is present in a server's pseudo-fs, but there
      is no corresponding local file system, it is said to be "absent".
      In such cases, all associated instances will be accessed on other
      servers.

   Regarding terminology relating to attributes used in trunking
   discovery and other multi-server namespace features:

   o  Location attributes include the fs_locations and fs_locations_info
      attributes.

   o  Location entries are the individual file system locations in the
      location attributes. Each such entry specifies a server, in the
      form of a host name, and an fs name, which is the location of the
      file system within the server's pseudo-fs. The exact form of the
      location entry varies with the particular location attribute used
      as described in Section 4.3.

   o  Location elements are derived from location entries and each
      describes a particular network access path. Location elements
      need not appear within a location attribute, but the existence of
      each location element derives from a corresponding location entry.
      When a location entry specifies an IP address there is only a
      single corresponding location element. Location entries that
      contain a host name are resolved using DNS, and may result in one
      or more location elements.
      All location elements consist of a
      location address which is the IP address of an interface to a
      server and an fs name which is the location of the file system
      within the server's pseudo-fs. The fs name is empty if the server
      has no pseudo-fs and only a single exported file system at the
      root filehandle.

   o  Two location elements are said to be server-trunkable if they
      specify the same fs name and their location addresses are
      server-trunkable.

   o  Two location elements are said to be session-trunkable if they
      specify the same fs name and their location addresses are
      session-trunkable.

   Each set of server-trunkable location elements defines a set of
   available network access paths to a particular file system. When
   there are multiple such file systems, each of which contains the same
   data, these file systems are considered replicas of one another.
   Logically, such replication is symmetric, since the fs currently in
   use and an alternate fs are replicas of each other. Often, in other
   documents, the term "replica" is not applied to the fs currently in
   use, despite the fact that the replication relation is inherently
   symmetric.

4.3.  Location Attributes (as updated)

   NFSv4.1 contains RECOMMENDED attributes that provide information
   about how (i.e. at what network address and namespace position) a
   given file system may be accessed. As a result, file systems in the
   namespace of one server can be associated with one or more instances
   of that file system on other servers. These attributes contain
   location entries specifying a server address target (either as a DNS
   name representing one or more IP addresses or as a specific IP
   address) together with the pathname of that file system within the
   associated single-server namespace.

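   The derivation of location elements from location entries, as
   defined in the terminology above, can be illustrated with a short
   sketch. It is not part of the specification. The names
   LocationEntry, LocationElement, dns_resolve, and expand are
   hypothetical, and the resolver is passed in as a parameter so that
   the example does not depend on any particular DNS configuration.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List, NamedTuple


class LocationEntry(NamedTuple):
    server: str    # host name or IP address literal, as carried in the attribute
    fs_name: str   # location of the file system within the server's pseudo-fs


class LocationElement(NamedTuple):
    address: str   # a single IP address of a server interface
    fs_name: str


def dns_resolve(host: str) -> List[str]:
    """Default resolver (an assumption): all distinct addresses for a host name."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def expand(entries: Iterable[LocationEntry],
           resolve: Callable[[str], List[str]] = dns_resolve) -> List[LocationElement]:
    """Turn location entries into location elements (network access paths)."""
    elements: List[LocationElement] = []
    for entry in entries:
        try:
            # An entry holding an IP address yields exactly one element.
            ipaddress.ip_address(entry.server)
            addresses = [entry.server]
        except ValueError:
            # An entry holding a host name may yield one or more elements.
            addresses = resolve(entry.server)
        elements.extend(LocationElement(a, entry.fs_name) for a in addresses)
    return elements
```

   Each resulting element pairs one address with the fs name of the
   entry from which it was derived, so a host name that resolves to
   two addresses contributes two network access paths.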
   The fs_locations_info RECOMMENDED attribute allows specification of
   one or more file system instance locations where the data
   corresponding to a given file system may be found. This attribute
   provides to the client, in addition to information about file system
   instance locations, significant information about the various file
   system instance choices (e.g., priority for use, writability,
   currency, etc.). It also includes information to help the client
   efficiently effect as seamless a transition as possible among
   multiple file system instances, when and if that should be necessary.

   Within the fs_locations_info attribute, each fs_locations_server4
   entry corresponds to a location entry with the fls_server field
   designating the server, with the location pathname within the
   server's pseudo-fs given by the fl_rootpath field of the encompassing
   fs_locations_item4.

   The fs_locations attribute defined in NFSv4.0 is also a part of
   NFSv4.1. This attribute only allows specification of the file system
   locations where the data corresponding to a given file system may be
   found. Servers should make this attribute available whenever
   fs_locations_info is supported, but client use of fs_locations_info
   is preferable.

   Within the fs_locations attribute, each fs_location4 contains a
   location entry with the server field designating the server and the
   rootpath field giving the location pathname within the server's
   pseudo-fs.

4.4.  Re-organization of Sections 11.4 and 11.5 of RFC5661

   Previously, issues related to the fact that multiple location entries
   directed the client to the same file system instance were dealt with
   in a separate Section 11.5 of [RFC5661]. Because of the new
   treatment of trunking, these issues now belong within Section 4.5
   below.

   In this new section of the current document, trunking is dealt with
   in Section 4.5.2 together with the other uses of location information
   described in Sections 4.5.3, 4.5.4, and 4.5.5.

4.5.  Uses of Location Information (as updated)

   The location attributes (i.e. fs_locations and fs_locations_info),
   together with the possibility of absent file systems, provide a
   number of important facilities in providing reliable, manageable, and
   scalable data access.

   When a file system is present, these attributes can provide:

   o  The locations of alternative replicas, to be used to access the
      same data in the event of server failures, communications
      problems, or other difficulties that make continued access to the
      current replica impossible or otherwise impractical. Provision
      and use of such alternate replicas is referred to as "replication"
      and is discussed in Section 4.5.3 below.

   o  The network address(es) to be used to access the current file
      system instance or replicas of it. Client use of this information
      is discussed in Section 4.5.2 below.

   Under some circumstances, multiple replicas may be used
   simultaneously to provide higher-performance access to the file
   system in question, although the lack of state sharing between
   servers may be an impediment to such use.

   When a file system is present and becomes absent, clients can be
   given the opportunity to have continued access to their data, using a
   different replica. In this case, a continued attempt to use the data
   in the now-absent file system will result in an NFS4ERR_MOVED error
   and, at that point, the successor replica or set of possible replica
   choices can be fetched and used to continue access. Transfer of
   access to the new replica location is referred to as "migration", and
   is discussed in Section 4.5.4 below.

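   The client behavior described in the preceding paragraph can be
   sketched as follows. This is an illustration under stated
   assumptions rather than a description of any implementation:
   do_io and fetch_locations are hypothetical callables standing in
   for an NFSv4.1 request and for a fetch of the location attribute,
   a network access path is modeled as an (address, fs name) pair,
   and the client simply takes the first candidate offered. Only the
   error code value NFS4ERR_MOVED (10019) is taken from the protocol.

```python
from typing import Callable, List, Tuple

NFS4_OK = 0
NFS4ERR_MOVED = 10019   # error code value defined by the NFSv4 protocol

Path = Tuple[str, str]  # (network address, fs name); a modeling assumption


def access_with_migration(do_io: Callable[[Path], int],
                          fetch_locations: Callable[[Path], List[Path]],
                          current: Path,
                          max_moves: int = 4) -> Tuple[int, Path]:
    """Retry an operation, following NFS4ERR_MOVED to a successor replica."""
    moves = 0
    while True:
        status = do_io(current)
        if status != NFS4ERR_MOVED or moves == max_moves:
            return status, current
        # The file system is now absent here: fetch the successor choices.
        candidates = [c for c in fetch_locations(current) if c != current]
        if not candidates:
            return status, current
        current = candidates[0]   # simplistic choice; real clients may rank
        moves += 1
```

   The bound on the number of moves is an assumption of the sketch,
   added so that a misconfigured chain of locations cannot cause the
   loop to run indefinitely.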
596 Where a file system was previously absent, specification of file 597 system location provides a means by which file systems located on one 598 server can be associated with a namespace defined by another server, 599 thus allowing a general multi-server namespace facility. A 600 designation of such a remote instance, in place of a file system 601 never previously present, is called a "pure referral" and is 602 discussed in Section 4.5.5 below. 604 Because client support for location-related attributes is OPTIONAL, a 605 server may (but is not required to) take action to hide migration and 606 referral events from such clients, by acting as a proxy, for example. 607 The server can determine the presence of client support from the 608 arguments of the EXCHANGE_ID operation (see Section 13.3 in the 609 current document). 611 4.5.1. Combining Multiple Uses in a Single Attribute (to be added) 613 A location attribute will sometimes contain information relating to 614 the location of multiple replicas which may be used in different 615 ways. 617 o Location entries that relate to the file system instance currently 618 in use provide trunking information, allowing the client to find 619 additional network addresses by which the instance may be 620 accessed. 622 o Location entries that provide information about replicas to which 623 access is to be transferred. 625 o Other location entries that relate to replicas that are available 626 to use in the event that access to the current replica becomes 627 unsatisfactory. 629 In order to simplify client handling and allow the best choice of 630 replicas to access, the server should adhere to the following 631 guidelines. 633 o All location entries that relate to a single file system instance 634 should be adjacent. 636 o Location entries that relate to the instance currently in use 637 should appear first.
639 o Location entries that relate to replica(s) to which migration is 640 occurring should appear before replicas which are available for 641 later use if the current replica should become inaccessible. 643 4.5.2. Location Attributes and Trunking (to be added) 645 A client may determine the set of network addresses to use to access 646 a given file system in a number of ways: 648 o When the name of the server is known to the client, it may use DNS 649 to obtain a set of network addresses to use in accessing the 650 server. 652 o It may fetch the location attribute for the file system, which will 653 provide either the name of the server (which can be turned into a 654 set of network addresses using DNS) or a set of 655 server-trunkable location entries that provide the addresses 656 specified by the server as desirable to use to access the file 657 system in question. 659 The server can provide location entries that include either names or 660 network addresses. It might use the latter form because of DNS- 661 related security concerns or because the set of addresses to be used 662 might require active management by the server. 664 Location entries used to discover addresses for use in trunking are 665 subject to change, as discussed in Section 4.5.6 below. The client 666 may respond to such changes by using additional addresses or ceasing 667 to use existing ones. The server can force the client to cease using 668 an address by returning NFS4ERR_MOVED when that address is used to 669 access a file system. This allows a transfer of access very like 670 migration, although the same file system instance is accessed 671 throughout. 673 4.5.3. File System Replication (as updated) 675 The fs_locations and fs_locations_info attributes provide alternative 676 locations, to be used to access data in place of or in addition to 677 the current file system instance.
On first access to a file system, 678 the client should obtain the set of alternate locations by 679 interrogating the fs_locations or fs_locations_info attribute, with 680 the latter being preferred. 682 In the event that server failures, communications problems, or other 683 difficulties make continued access to the current file system 684 impossible or otherwise impractical, the client can use the alternate 685 locations as a way to get continued access to its data. 687 The alternate locations may be physical replicas of the (typically 688 read-only) file system data, or they may provide for the use of 689 various forms of server clustering in which multiple servers provide 690 alternate ways of accessing the same physical file system. How these 691 different modes of file system transition are represented within the 692 fs_locations and fs_locations_info attributes and how the client 693 deals with file system transition issues will be discussed in detail 694 below. 696 4.5.4. File System Migration (as updated) 698 When a file system is present and becomes absent, clients can be 699 given the opportunity to have continued access to their data, at an 700 alternate location, as specified by a location attribute. This 701 migration of access to another replica includes the ability to retain 702 locks across the transition, either by reclaim or by Transparent 703 State Migration. 705 Typically, a client will be accessing the file system in question, 706 get an NFS4ERR_MOVED error, and then use a location attribute to 707 determine the new location of the data. When fs_locations_info is 708 used, additional information will be available that will define the 709 nature of the client's handling of the transition to a new server. 711 Such migration can be helpful in providing load balancing or general 712 resource reallocation. The protocol does not specify how the file 713 system will be moved between servers. 
It is anticipated that a 714 number of different server-to-server transfer mechanisms might be 715 used with the choice left to the server implementer. The NFSv4.1 716 protocol specifies the method used to communicate the migration event 717 between client and server. 719 The new location may be, in the case of various forms of server 720 clustering, another server providing access to the same physical file 721 system. The client's responsibilities in dealing with this 722 transition will depend on whether migration has occurred and the 723 means the server has chosen to provide continuity of locking state. 724 These issues will be discussed in detail below. 726 Although a single successor location is typical, multiple locations 727 may be provided. When multiple locations are provided, the client 728 will typically use the first one provided. If that is inaccessible for some reason, 729 later ones can be used. In such cases the client might consider that 730 the transition to the new replica is a migration event, although it 731 would lose access to locking state if it did so. 733 When an alternate location is designated as the target for migration, 734 it must designate the same data (with metadata being the same to the 735 degree indicated by the fs_locations_info attribute). Where file 736 systems are writable, a change made on the original file system must 737 be visible on all migration targets. Where a file system is not 738 writable but represents a read-only copy (possibly periodically 739 updated) of a writable file system, similar requirements apply to the 740 propagation of updates. Any change visible in the original file 741 system must already be effected on all migration targets, to avoid 742 any possibility that a client, in effecting a transition to the 743 migration target, will see any reversion in file system state. 745 4.5.5.
Referrals (as updated) 747 Referrals allow the server to associate a file system located on one 748 server with a file system located on another server. When this 749 includes the use of pure referrals, servers are provided a way of 750 placing a file system in a location within the namespace essentially 751 without respect to its physical location on a given server. This 752 allows a single server or a set of servers to present a multi-server 753 namespace that encompasses file systems located on multiple servers. 754 Some likely uses of this include establishment of site-wide or 755 organization-wide namespaces, with the eventual possibility of 756 combining such together into a truly global namespace. 758 Referrals occur when a client determines, upon first referencing a 759 position in the current namespace, that it is part of a new file 760 system and that the file system is absent. When this occurs, 761 typically by receiving the error NFS4ERR_MOVED, the actual location 762 or locations of the file system can be determined by fetching the 763 fs_locations or fs_locations_info attribute. 765 The locations-related attribute may designate a single file system 766 location or multiple file system locations, to be selected based on 767 the needs of the client. The server, in the fs_locations_info 768 attribute, may specify priorities to be associated with various file 769 system location choices. The server may assign different priorities 770 to different locations as reported to individual clients, in order to 771 adapt to client physical location or to effect load balancing. When 772 both read-only and read-write file systems are present, some of the 773 read-only locations might not be absolutely up-to-date (as they would 774 have to be in the case of replication and migration).
Servers may 775 also specify file system locations that include client-substituted 776 variables so that different clients are referred to different file 777 systems (with different data contents) based on client attributes 778 such as CPU architecture. 780 When the fs_locations_info attribute indicates that there are 781 multiple possible targets listed, the relationships among them may be 782 important to the client in selecting which one to use. The same 783 rules specified in Section 4.5.4 above regarding multiple migration 784 targets apply to these multiple replicas as well. For example, the 785 client might prefer a writable target on a server that has additional 786 writable replicas to which it subsequently might switch. Note that, 787 as distinguished from the case of replication, there is no need to 788 deal with the case of propagation of updates made by the current 789 client, since the current client has not accessed the file system in 790 question. 792 Use of multi-server namespaces is enabled by NFSv4.1 but is not 793 required. The use of multi-server namespaces and their scope will 794 depend on the applications used and system administration 795 preferences. 797 Multi-server namespaces can be established by a single server 798 providing a large set of pure referrals to all of the included file 799 systems. Alternatively, a single multi-server namespace may be 800 administratively segmented with separate referral file systems (on 801 separate servers) for each separately administered portion of the 802 namespace. The top-level referral file system or any segment may use 803 replicated referral file systems for higher availability. 805 Generally, multi-server namespaces are for the most part uniform, in 806 that the same data made available to one client at a given location 807 in the namespace is made available to all clients at that location.
808 However, there are facilities provided that allow different clients 809 to be directed to different sets of data, so as to adapt to such 810 client characteristics as CPU architecture. 812 4.5.6. Changes in a Location Attribute (to be added) 814 Although clients will typically fetch a location attribute when first 815 accessing a file system and when NFS4ERR_MOVED is returned, a client 816 can choose to fetch the attribute periodically, in which case the 817 value fetched may change over time. 819 For clients not prepared to access multiple replicas simultaneously 820 (see Section 8.1 of the current document), the handling of the 821 various cases of change is as follows: 823 o Changes in the list of replicas or in the network addresses 824 associated with replicas do not require immediate action. The 825 client will typically update its list of replicas to reflect the 826 new information. 828 o Additions to the list of network addresses for the current file 829 system instance need not be acted on promptly. However, the client 830 can choose to use the new address whenever it needs to switch 831 access to a new replica. 833 o Deletions from the list of network addresses for the current file 834 system instance need not be acted on immediately, although the 835 client might need to be prepared for a shift in access whenever 836 the server indicates that a network access path is unusable to access 837 the current file system, by returning NFS4ERR_MOVED. 839 For clients that are prepared to access several replicas 840 simultaneously, the following additional cases need to be addressed. 841 As in the cases discussed above, changes in the set of replicas need 842 not be acted upon promptly, although the client has the option of 843 adjusting its access in the absence of difficulties that cause a new 844 replica to be selected.
846 o When a new replica is added which may be accessed simultaneously 847 with one currently in use, the client is free to use the new 848 replica immediately. 850 o When a replica currently in use is deleted from the list, the 851 client need not cease using it immediately. However, since the 852 server may subsequently force such use to cease (by returning 853 NFS4ERR_MOVED), clients may choose to limit the need for later 854 state transfer. For example, new opens might be done on other 855 replicas, rather than on one not present in the list. 857 5. Re-organization of Section 11.7 of RFC5661 859 The material in Section 11.7 of [RFC5661] has been reorganized and 860 augmented as specified below: 862 o Because there can be a shift of the network access paths used to 863 access a file system instance without any shift between replicas, 864 a new Section 6 in the current document distinguishes between 865 those cases in which there is a shift between distinct replicas 866 and those involving a shift in network access paths with no shift 867 between replicas. 869 As a result, a new Section 7 in the current document deals with 870 network address transitions while the bulk of the former 871 Section 11.7 (in [RFC5661]) is replaced by Section 8 in the 872 current document which is now limited to cases in which there is a 873 shift between two different sets of replicas. 875 o The additional Section 9 in the current document discusses the 876 case in which a shift to a different replica is made and state is 877 transferred to allow the client the ability to have continued 878 access to the accumulated locking state on the new server. 880 o The additional Section 10 in the current document discusses the 881 client's response to access transitions and how it determines 882 whether migration has occurred, and how it gets access to any 883 transferred locking and session state.
885 o The additional Section 11 in the current document discusses the 886 responsibilities of the source and destination servers when 887 transferring locking and session state. 889 6. Overview of File Access Transitions (to be added) 891 File access transitions are of two types: 893 o Those that involve a transition from accessing the current replica 894 to another one either due to replication or migration. How these 895 are dealt with is discussed in Section 8 in the current document. 897 o Those in which access to the current file system instance is 898 retained, while the network path used to access that instance is 899 changed. This case is discussed in Section 7 in the current 900 document. 902 7. Effecting Network Address Transitions (to be added) 904 The addresses used to access a particular file system instance may 905 change in a number of ways, as listed below. In each of these cases, 906 the same filehandles, stateids, client IDs and session are used to 907 continue access, with a continuity of lock state. 909 o When use of a particular address is to cease and there is one 910 currently in use which is server-trunkable with it, requests that 911 would have been issued on the address whose use is to be 912 discontinued can be issued on the remaining address(es). When an 913 address is not a session-trunkable one, the request may need to be 914 modified to reflect the fact that a different session will be 915 used. 917 o When there are no potential replacement addresses in use but there 918 are valid addresses session-trunkable with the one whose use is to 919 be discontinued, the client can use BIND_CONN_TO_SESSION to access 920 the existing session using the new address. Although the target 921 session will generally be accessible, there may be cases in which 922 that session is no longer accessible, in which case a new session 923 can be created to provide the client continued access to the 924 existing instance.
926 o When there is no potential replacement address in use and there are no 927 valid addresses session-trunkable with the one whose use is to be 928 discontinued, other server-trunkable addresses may be used to 929 provide continued access. Although use of CREATE_SESSION is 930 available to provide continued access to the existing instance, 931 servers have the option of providing continued access to the 932 existing session through the new network access path in a fashion 933 similar to that provided by session migration (see Section 9 of 934 the current document). To take advantage of this possibility, 935 clients can perform an initial BIND_CONN_TO_SESSION, as in the 936 previous case, and use CREATE_SESSION only when that fails. 938 8. Effecting File System Transitions (as updated) 940 There is a range of situations in which there is a change to be 941 effected in the set of replicas used to access a particular file 942 system. Some of these may involve an expansion or contraction of the 943 set of replicas used as discussed in Section 8.1 below. 945 For reasons explained in that section, most transitions will involve 946 a transition from a single replica to a corresponding replacement 947 replica. When effecting replica transition, some types of sharing 948 between the replicas may affect handling of the transition as 949 described in Sections 8.2 through 8.8 below. The attribute 950 fs_locations_info provides helpful information to allow the client to 951 determine the degree of inter-replica sharing. 953 With regard to some types of state, the degree of continuity across 954 the transition depends on the occasion prompting the transition, with 955 transitions initiated by the servers (i.e., migration) offering much 956 more scope for a non-disruptive transition than cases in which the 957 client on its own shifts its access to another replica (i.e., 958 replication).
This issue potentially applies to locking state and to 959 session state, which are dealt with below as follows: 961 o An introduction to the possible means of providing continuity of 962 these areas appears in Section 8.9 below. 964 o Transparent State Migration is introduced in Section 9 of the 965 current document. The possible transfer of session state is 966 addressed there as well. 968 o The client handling of transitions, including determining how to 969 deal with the various means that the server might take to supply 970 effective continuity of locking state, is discussed in Section 10 971 of the current document. 973 o The servers' (source and destination) responsibilities in 974 effecting Transparent Migration of locking and session state are 975 discussed in Section 11 of the current document. 977 8.1. File System Transitions and Simultaneous Access (as updated) 979 The fs_locations_info attribute (described in Section 11.10.1 of 980 [RFC5661]) may indicate that two replicas may be used simultaneously 981 (see Section 11.7.2.1 of [RFC5661] for details). Although situations 982 in which multiple replicas may be accessed simultaneously are 983 somewhat similar to those in which a single replica is accessed by 984 multiple network addresses, there are important differences, since 985 locking state is not shared among multiple replicas. 987 Because of this difference in state handling, many clients will not 988 have the ability to take advantage of the fact that such replicas 989 represent the same data. Such clients will not be prepared to use 990 multiple replicas simultaneously but will access each file system 991 using only a single replica, although the replica selected may make 992 multiple server-trunkable addresses available. 994 Clients that are prepared to use multiple replicas simultaneously will 995 divide opens among replicas however they choose.
Once that choice is 996 made, any subsequent transitions will treat the set of locking state 997 associated with each replica as a single entity. 999 For example, if one of the replicas becomes unavailable, access will 1000 be transferred to a different replica, also capable of simultaneous 1001 access with the one still in use. 1003 When there is no such replica, the transition may be to the replica 1004 already in use. At this point, the client has a choice between 1005 merging the locking state for the two replicas under the aegis of the 1006 sole replica in use or treating these separately, until another 1007 replica capable of simultaneous access presents itself. 1009 8.2. Filehandles and File System Transitions (as updated) 1011 There are a number of ways in which filehandles can be handled across 1012 a file system transition. These can be divided into two broad 1013 classes depending upon whether the two file systems across which the 1014 transition happens share sufficient state to effect some sort of 1015 continuity of file system handling. 1017 When there is no such cooperation in filehandle assignment, the two 1018 file systems are reported as being in different handle classes. In 1019 this case, all filehandles are assumed to expire as part of the file 1020 system transition. Note that this behavior does not depend on the 1021 fh_expire_type attribute and supersedes the specification of the 1022 FH4_VOL_MIGRATION bit, which only affects behavior when 1023 fs_locations_info is not available. 1025 When there is cooperation in filehandle assignment, the two file 1026 systems are reported as being in the same handle class. In this 1027 case, persistent filehandles remain valid after the file system 1028 transition, while volatile filehandles (excluding those that are only 1029 volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration 1030 on the target server. 1032 8.3.
Fileids and File System Transitions (as updated) 1034 In NFSv4.0, the issue of continuity of fileids in the event of a file 1035 system transition was not addressed. The general expectation had 1036 been that in situations in which the two file system instances are 1037 created by a single vendor using some sort of file system image copy, 1038 fileids will be consistent across the transition, while in the 1039 analogous multi-vendor transitions they will not. This poses 1040 difficulties, especially for the client without special knowledge of 1041 the transition mechanisms adopted by the server. Note that although 1042 fileid is not a REQUIRED attribute, many servers support fileids and 1043 many clients provide APIs that depend on fileids. 1045 It is important to note that while clients themselves may have no 1046 trouble with a fileid changing as a result of a file system 1047 transition event, applications do typically have access to the fileid 1048 (e.g., via stat). The result is that an application may work 1049 perfectly well if there is no file system instance transition or if 1050 any such transition is among instances created by a single vendor, 1051 yet be unable to deal with the situation in which a multi-vendor 1052 transition occurs at the wrong time. 1054 Providing the same fileids in a multi-vendor (multiple server 1055 vendors) environment has generally been held to be quite difficult. 1056 While there is work to be done, it needs to be pointed out that this 1057 difficulty is partly self-imposed. Servers have typically identified 1058 fileid with inode number, i.e. with a quantity used to find the file 1059 in question. This identification poses special difficulties for 1060 migration of a file system between vendors where assigning the same 1061 index to a given file may not be possible. Note here that a fileid 1062 is not required to be useful to find the file in question, only that 1063 it is unique within the given file system. 
Servers prepared to 1064 accept a fileid as a single piece of metadata and store it apart from 1065 the value used to index the file information can relatively easily 1066 maintain a fileid value across a migration event, allowing a truly 1067 transparent migration event. 1069 In any case, where servers can provide continuity of fileids, they 1070 should, and the client should be able to find out that such 1071 continuity is available and take appropriate action. Information 1072 about the continuity (or lack thereof) of fileids across a file 1073 system transition is represented by specifying whether the file 1074 systems in question are of the same fileid class. 1076 Note that when consistent fileids do not exist across a transition 1077 (either because there is no continuity of fileids or because fileid 1078 is not a supported attribute on one of the instances involved), and there 1079 are no reliable filehandles across a transition event (either because 1080 there is no filehandle continuity or because the filehandles are 1081 volatile), the client is in a position where it cannot verify that 1082 files it was accessing before the transition are the same objects. 1083 It is forced to assume that no object has been renamed, and, unless 1084 there are guarantees that provide this (e.g., the file system is 1085 read-only), problems for applications may occur. Therefore, use of 1086 such configurations should be limited to situations where the 1087 problems that this may cause can be tolerated. 1089 8.4. Fsids and File System Transitions (as updated) 1091 Since fsids are generally only unique on a per-server basis, it is 1092 likely that they will change during a file system transition. 1093 Clients should not make the fsids received from the server visible to 1094 applications since they may not be globally unique, and because they 1095 may change during a file system transition event.
Applications are 1096 best served if they are isolated from such transitions to the extent 1097 possible. 1099 Although normally a single source file system will transition to a 1100 single target file system, there is a provision for splitting a 1101 single source file system into multiple target file systems, by 1102 specifying the FSLI4F_MULTI_FS flag. 1104 8.4.1. File System Splitting (as updated) 1106 When a file system transition is made and the fs_locations_info 1107 indicates that the file system in question may be split into multiple 1108 file systems (via the FSLI4F_MULTI_FS flag), the client SHOULD do 1109 GETATTRs to determine the fsid attribute on all known objects within 1110 the file system undergoing transition to determine the new file 1111 system boundaries. 1113 Clients may maintain the fsids passed to existing applications by 1114 mapping all of the fsids for the descendant file systems to the 1115 common fsid used for the original file system. 1117 Splitting a file system may be done on a transition between file 1118 systems of the same fileid class, since the fact that fileids are 1119 unique within the source file system ensures they will be unique in 1120 each of the target file systems. 1122 8.5. The Change Attribute and File System Transitions (as updated) 1124 Since the change attribute is defined as a server-specific one, 1125 change attributes fetched from one server are normally presumed to be 1126 invalid on another server. Such a presumption is troublesome since 1127 it would invalidate all cached change attributes, requiring 1128 refetching. Even more disruptive, the absence of any assured 1129 continuity for the change attribute means that even if the same value 1130 is retrieved on refetch, no conclusions can be drawn as to whether 1131 the object in question has changed.
The identical change attribute 1132 could be merely an artifact of a modified file with a different 1133 change attribute construction algorithm, with that new algorithm just 1134 happening to result in an identical change value. 1136 When the two file systems have consistent change attribute formats, 1137 and this fact is communicated to the client by reporting in the same 1138 change class, the client may assume a continuity of change attribute 1139 construction and handle this situation just as it would be handled 1140 without any file system transition. 1142 8.6. Write Verifiers and File System Transitions (as updated) 1144 In a file system transition, the two file systems may be clustered in 1145 the handling of unstably written data. When this is the case, and 1146 the two file systems belong to the same write-verifier class, write 1147 verifiers returned from one system may be compared to those returned 1148 by the other and superfluous writes avoided. 1150 When two file systems belong to different write-verifier classes, any 1151 verifier generated by one must not be compared to one provided by the 1152 other. Instead, the two verifiers should be treated as not equal 1153 even when the values are identical. 1155 8.7. Readdir Cookies and Verifiers and File System Transitions (as 1156 updated) 1158 In a file system transition, the two file systems may be consistent 1159 in their handling of READDIR cookies and verifiers. When this is the 1160 case, and the two file systems belong to the same readdir class, 1161 READDIR cookies and verifiers from one system may be recognized by 1162 the other and READDIR operations started on one server may be validly 1163 continued on the other, simply by presenting the cookie and verifier 1164 returned by a READDIR operation done on the first file system to the 1165 second. 
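The write-verifier rule in Section 8.6 can be illustrated with a short sketch. It assumes the client has already determined, from fs_locations_info, whether the two file systems share a write-verifier class; the function name, the shape of the pending list, and the boolean parameter are hypothetical and only serve the illustration.

```python
def uncommitted_writes_to_resend(pending, new_verifier, same_write_verifier_class):
    """Return offsets of unstable writes that must be sent again.

    pending: list of (offset, verifier) pairs recorded for unstable writes
             issued before the transition (hypothetical bookkeeping).
    new_verifier: verifier returned by the file system now in use.
    same_write_verifier_class: True only if the client has determined that
             both file systems belong to the same write-verifier class.
    """
    if not same_write_verifier_class:
        # Verifiers from different classes must not be compared: treat every
        # one as not equal, even if the byte values happen to be identical.
        return [offset for offset, _ in pending]
    # Same class: verifiers are comparable, so only mismatches are resent.
    return [offset for offset, verf in pending if verf != new_verifier]
```

With a shared class, a matching verifier lets the client avoid a superfluous write; without one, every unstable write is resent regardless of the verifier values.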
1167 When two file systems belong to different readdir classes, any 1168 READDIR cookie and verifier generated by one is not valid on the 1169 second, and must not be presented to that server by the client. The 1170 client should act as if the verifier was rejected. 1172 8.8. File System Data and File System Transitions (as updated) 1174 When multiple replicas exist and are used simultaneously or in 1175 succession by a client, applications using them will normally expect 1176 that they contain either the same data or data that is consistent 1177 with the normal sorts of changes that are made by other clients 1178 updating the data of the file system (with metadata being the same to 1179 the degree indicated by the fs_locations_info attribute). However, 1180 when multiple file systems are presented as replicas of one another, 1181 the precise relationship between the data of one and the data of 1182 another is not, as a general matter, specified by the NFSv4.1 1183 protocol. It is quite possible to present as replicas file systems 1184 where the data of those file systems is sufficiently different that 1185 some applications have problems dealing with the transition between 1186 replicas. The namespace will typically be constructed so that 1187 applications can choose an appropriate level of support, so that in 1188 one position in the namespace a varied set of replicas will be 1189 listed, while in another only those that are up-to-date may be 1190 considered replicas. The protocol does define three special cases of 1191 the relationship among replicas to be specified by the server and 1192 relied upon by clients: 1194 o When multiple replicas exist and are used simultaneously by a 1195 client (see the FSLIB4_CLSIMUL definition within 1196 fs_locations_info), they must designate the same data. 
Where file 1197 systems are writable, a change made on one instance must be 1198 visible on all instances, immediately upon the earlier of the 1199 return to the modifying requester or the visibility of that change 1200 on any of the associated replicas. This allows a client to use 1201 these replicas simultaneously without any special adaptation to 1202 the fact that there are multiple replicas, beyond adapting to the 1203 fact that locks obtained on one replica are maintained separately 1204 (i.e., under a different client ID). In this case, locks (whether 1205 share reservations or byte-range locks) and delegations obtained 1206 on one replica are immediately reflected on all replicas, in the 1207 sense that access from all other servers is prevented regardless 1208 of the replica used. However, because the servers are not 1209 required to treat two associated client IDs as representing the 1210 same client, it is best to access each file using only a single 1211 client ID. 1213 o When one replica is designated as the successor instance to 1214 another existing instance after the return of NFS4ERR_MOVED (i.e., the 1215 case of migration), the client may depend on the fact that all 1216 changes written to stable storage on the original instance are 1217 written to stable storage of the successor (uncommitted writes are 1218 dealt with in Section 8.6 above). 1220 o Where a file system is not writable but represents a read-only 1221 copy (possibly periodically updated) of a writable file system, 1222 clients have similar requirements with regard to the propagation 1223 of updates. They may need a guarantee that any change visible on 1224 the original file system instance is immediately visible on 1225 any replica before the client transitions access to that replica, 1226 in order to avoid any possibility that a client, in effecting a 1227 transition to a replica, will see any reversion in file system 1228 state.
The specific means of this guarantee varies based on the 1229 value of the fss_type field that is reported as part of the 1230 fs_status attribute (see Section 11.11 of [RFC5661]). Since these 1231 file systems are presumed to be unsuitable for simultaneous use, 1232 there is no specification of how locking is handled; in general, 1233 locks obtained on one file system will be separate from those on 1234 others. Since these are going to be read-only file systems, this 1235 is not expected to pose an issue for clients or applications. 1237 8.9. Lock State and File System Transitions (as updated) 1239 While accessing a file system, clients obtain locks enforced by the 1240 server which may prevent actions by other clients that are 1241 inconsistent with those locks. 1243 When access is transferred between replicas, clients need to be 1244 assured that the actions disallowed by holding these locks cannot 1245 have occurred during the transition. This can be ensured by the 1246 methods below. If neither of these is implemented, clients 1247 cannot be assured of continuity of lock possession 1248 across a migration event. 1250 o Providing the client an opportunity to re-obtain its locks via a 1251 per-fs grace period on the destination server. Because the lock 1252 reclaim mechanism was originally defined to support server reboot, 1253 it implicitly assumes that filehandles on reclaim will be 1254 the same as those at open. In the case of migration, this requires 1255 that source and destination servers use the same filehandles, as 1256 evidenced by using the same server scope (see Section 12.2 of the 1257 current document) or by showing this agreement using 1258 fs_locations_info (see Section 8.2 above). 1260 o Transferring locking state as part of the transition as described 1261 in Section 9 of the current document to provide Transparent State 1262 Migration.
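The choice between the two methods listed above can be pictured with a minimal sketch. The function name, its parameters, and the string labels are hypothetical and are not defined by the protocol; the sketch only encodes the constraint stated above, that reclaim during a per-fs grace period presumes the source and destination servers use the same filehandles.

```python
def lock_continuity_options(same_filehandles: bool,
                            grace_period_offered: bool,
                            state_transfer_offered: bool) -> list:
    """Return the methods by which lock continuity could be assured.

    All three inputs are illustrative assumptions about what the client
    has learned regarding the source/destination server pair.
    """
    options = []
    # Reclaim relies on filehandles at reclaim matching those at open.
    if grace_period_offered and same_filehandles:
        options.append("reclaim")
    # Transferred locking state does not depend on filehandle agreement.
    if state_transfer_offered:
        options.append("transparent-state-migration")
    return options
```

An empty result corresponds to the case in which continuity of lock possession cannot be assured across the migration event.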
1264 Of these, Transparent State Migration provides the smoother 1265 experience for clients in that there is no grace-period-based delay 1266 before new locks can be obtained. However, it requires a greater 1267 degree of inter-server co-ordination. In general, the servers taking 1268 part in migration are free to provide either facility. However, when 1269 the filehandles can differ across the migration event, Transparent 1270 State Migration is the only available means of providing the needed 1271 functionality. 1273 It should be noted that these two methods are not mutually exclusive 1274 and that a server might well provide both. In particular, if there 1275 is some circumstance preventing a lock from being transferred, the 1276 server can allow it to be reclaimed. 1278 9. Transferring State upon Migration (to be added) 1280 When the transition is a result of a server-initiated decision to 1281 transition access and the source and destination servers have 1282 implemented appropriate co-operation, it is possible to: 1284 o Transfer locking state from the source to the destination server, 1285 in a fashion similar to that provided by Transparent State 1286 Migration in NFSv4.0, as described in [RFC7931]. Server 1287 responsibilities are described in Section 11.1 of the current 1288 document. 1290 o Transfer session state from the source to the destination server. 1291 Server responsibilities in effecting this transition are described 1292 in Section 11.2 of the current document. 1294 The means by which the client determines which of these transfer 1295 events has occurred are described in Section 10 of the current 1296 document. 1298 9.1. Transparent State Migration and pNFS (to be added) 1300 When pNFS is involved, the protocol is capable of supporting: 1302 o Migration of the MDS, leaving DS's in place. 1304 o Migration of the file system as a whole, including the MDS and 1305 associated DS's. 1307 o Replacement of one DS by another.
1309 o Migration of a pNFS file system to one in which pNFS is not used. 1311 o Migration of a file system not using pNFS to one in which layouts 1312 are available. 1314 Migration of the MDS function is directly supported by Transparent 1315 State Migration. Layout state will normally be transparently 1316 transferred, just as other state is. As a result, Transparent State 1317 Migration provides a framework in which, given appropriate inter-MDS 1318 data transfer, one MDS can be substituted for another. 1320 Migration of the file system function as a whole can be accomplished 1321 by recalling all layouts as part of the initial phase of the 1322 migration process. As a result, IO will be done through the MDS 1323 during the migration process, and new layouts can be granted once the 1324 client is interacting with the new MDS. An MDS can also effect this 1325 sort of transition by revoking all layouts as part of Transparent 1326 State Migration, as long as the client is notified about the loss of 1327 state. 1329 In order to allow migration to a file system on which pNFS is not 1330 supported, clients need to be prepared for a situation in which 1331 layouts are not available or supported on the destination file system 1332 and so direct IO requests to the destination server, rather than 1333 depending on layouts being available. 1335 Replacement of one DS by another is not addressed by migration as 1336 such but can be effected by an MDS recalling layouts for the DS to be 1337 replaced and issuing new ones to be served by the successor DS. 1339 Migration may transfer a file system from a server which does not 1340 support pNFS to one which does. In order to properly adapt to this 1341 situation, clients which support pNFS, but function adequately in its 1342 absence, should check for pNFS support when a file system is migrated 1343 and be prepared to use pNFS when support is available. 1345 10. 
Client Responsibilities when Access is Transitioned (to be added) 1347 For a client to respond to an access transition, it must be made 1348 aware of it. The ways in which this can happen are discussed in 1349 Section 10.1 below and subsequent sections. Section 10.2 goes on to 1350 complete the discussion of how the set of transitions to be responded 1351 to can be determined. Sections 10.3 through 10.5 discuss how the 1352 client should deal with each transition it becomes aware of. 1354 10.1. Client Transition Notifications (to be added) 1356 When there is a change in the network access path used to access a 1357 file system, there are a number of related status indications with 1358 which clients need to deal: 1360 o If an attempt is made to use or return a filehandle within a file 1361 system that is no longer accessible at the address previously used 1362 to access it, the error NFS4ERR_MOVED is returned. 1364 Exceptions are made to allow such filehandles to be used when 1365 interrogating a location attribute. This enables a client to 1366 determine the new replica's location or network access path. 1368 This condition continues on subsequent attempts to access the file 1369 system in question. The only way the client can avoid the error 1370 is to cease accessing the file system in question at its old server 1371 location and access it instead using a different address at which 1372 it is now available. 1374 o Whenever a SEQUENCE operation is sent by a client to a server 1375 which generated state held on that client which is associated with 1376 a file system that is no longer accessible at the address at which it was 1377 previously available, the status bit SEQ4_STATUS_LEASE_MOVED is 1378 set in the response. 1380 This condition continues until the client acknowledges the 1381 notification by fetching a location attribute for the file system 1382 whose network access path is being changed.
When there are 1383 multiple such file systems, a location attribute for each such 1384 file system needs to be fetched, in order to clear the condition. 1385 Even after the condition is cleared, the client needs to respond 1386 by using the location information to access the file system at its 1387 new location to ensure that leases are not needlessly expired. 1389 Unlike the case of NFSv4.0 in which the corresponding conditions are 1390 both errors, in NFSv4.1 the client can, and often will, receive both 1391 indications on the same request. As a result, implementations need 1392 to address the question of how to co-ordinate the necessary recovery 1393 actions when both indications arrive simultaneously. It should be 1394 noted that when the server decides whether SEQ4_STATUS_LEASE_MOVED is 1395 to be set, it has no way of knowing which file system will be 1396 referenced or whether NFS4ERR_MOVED will be returned. 1398 While it is true that, when only a single file system is subject to a 1399 change in its network access path, a single set of actions will clear 1400 both indications, the possibility of multiple file systems undergoing 1401 change calls for an approach in which there are separate recovery 1402 actions for each indication. In general, the response to neither 1403 indication can be subsumed within the other since: 1405 o If the client were to respond only to the MOVED indication, there 1406 would be no effective client response to a situation in which a 1407 file system was not being actively accessed at the time the access 1408 transition occurred. As a result, leases on the destination 1409 server might be needlessly expired. 1411 o If the client were to respond only to the LEASE_MOVED indication, 1412 recovery for file systems in active use could be deferred in order 1413 to accomplish recovery for others not being actively accessed. 
1414 The consequences of this choice can pose particular problems when 1415 there are a large number of file systems supported by a particular 1416 server, or when it happens that some servers, after receiving 1417 migrated file systems have periods of unavailability, such as 1418 occur as a result of server reboot. This can result in recovery 1419 for actively accessed migrated file systems being unnecessarily 1420 delayed for long periods of time. 1422 Similar considerations apply to other arrangements in which one of 1423 the indications, while not ignored per se, is subsumed within a 1424 single recovery process focused on recovery for the other indication. 1426 Although clients are free to decide on their own approaches to 1427 recovery, we will explore in Section 10.2 below an approach with the 1428 following characteristics: 1430 o All instances of the MOVED indication, whether they involve 1431 migration or not, are dealt with promptly, either by doing the 1432 necessary recovery directly, providing that it be done 1433 asynchronously, or ensuring that it is already under way. 1435 o All instances of the LEASE_MOVED indication are dealt with 1436 asynchronously, in a transition discovery thread whose job is to 1437 clear that indication by fetching the appropriate location 1438 attribute. Because this thread will only be fetching a location 1439 attribute and the fs_status attribute for the file systems 1440 referenced by the client, it cannot receive MOVED indications. 1441 Some useful guidance regarding possible implementation of a 1442 transition discovery thread can be found below. 1444 o When a transition discovery thread happens upon a migrated file 1445 system (i.e. not present and not a pure referral), the thread is 1446 likely to have cleared one (out of an unknown number) of file 1447 systems whose migration needs to be responded to. 
The discovery 1448 thread needs to schedule the appropriate migration recovery (as 1449 described in Section 10.3 below). This is necessary to ensure 1450 that migrated file systems will be referenced on the destination 1451 server in order to avoid unnecessary lease expiration. 1453 For many of the migrated file systems discovered in this way, the 1454 client has not received any MOVED indication. In such cases, 1455 lease recovery needs to be scheduled but it should not interfere 1456 with continuation of the transition discovery function. 1458 o When a transition discovery thread receives a LEASE_MOVED 1459 indication, it takes no special action but continues its normal 1460 operation. On the other hand, if a LEASE_MOVED indication is not 1461 received, it indicates that the thread has completed its work 1462 successfully. 1464 10.2. Use of a Transition Discovery Thread (to be added) 1466 As noted above, LEASE_MOVED indications can be dealt with in a 1467 transition discovery thread. When this approach is used, 1469 o No action needs to be taken for such indications received by the 1470 transition discovery thread, since continuation of that thread's 1471 work will address the issue. 1473 o For such indications received in other contexts, the generally 1474 appropriate response is to initiate or otherwise provide for the 1475 execution of a transition discovery thread for file systems 1476 associated with the server IP address returning the indication. 1478 o In all cases in which the appropriate transition discovery thread 1479 is running, nothing further needs to be done to respond to 1480 LEASE_MOVED indications. 1482 This leaves a potential difficulty in situations in which the 1483 transition discovery thread is near to completion but is still 1484 operating. One should not ignore a LEASE_MOVED indication if the 1485 discovery thread is not able to respond to additional transitioning 1486 file systems without additional aid.
A further difficulty in 1487 addressing such situations is that a LEASE_MOVED indication may 1488 reflect the server's state at the time the SEQUENCE operation was 1489 processed, which may be different from that in effect at the time the 1490 response is received. 1492 A useful approach to this issue involves the use of separate 1493 externally-visible discovery thread states representing non- 1494 operation, normal operation, and completion/verification of 1495 transition discovery processing. 1497 Within that framework, discovery thread processing would proceed as 1498 follows. 1500 o While in the normal-operation state, the thread would fetch, for 1501 successive file systems known to the client on the server being 1502 worked on, a location attribute plus the fs_status attribute. 1504 o If the fs_status attribute indicates that the file system is a 1505 migrated one (i.e., fss_absent is true and fss_type != 1506 STATUS4_REFERRAL), it is likely that the fetch of the 1507 location attribute has cleared one of the file systems contributing 1508 to the LEASE_MOVED indication. 1510 o In cases in which that happened, the thread cannot know whether 1511 the LEASE_MOVED indication has been cleared and so it enters the 1512 completion/verification state and proceeds to issue a COMPOUND to 1513 see if the LEASE_MOVED indication has been cleared. 1515 o When the discovery thread is in the completion/verification state, 1516 if other requests get a LEASE_MOVED indication, they note this fact and it 1517 is used when the verification request completes, as described below. 1519 When the request used in the completion/verification state completes: 1521 o If a LEASE_MOVED indication is returned, the discovery thread 1522 resumes its normal work. 1524 o Otherwise, if there is any record that other requests saw a 1525 LEASE_MOVED indication, that record is cleared and the 1526 verification request retried. The discovery thread remains in 1527 completion/verification state.
1529 o If there has been no LEASE_MOVED indication, the work of the 1530 discovery thread is considered completed and it enters the non- 1531 operating state. 1533 10.3. Overview of Client Response to NFS4ERR_MOVED (to be added) 1535 This section outlines a way in which a client that receives 1536 NFS4ERR_MOVED can respond by using a new server or network address if 1537 one is available. As part of that process, it will determine: 1539 o Whether the NFS4ERR_MOVED indicates migration has occurred, or 1540 whether it indicates another sort of file system access transition 1541 as discussed in Section 7 above. 1543 o In the case of migration, whether Transparent State Migration has 1544 occurred. 1546 o Whether any state has been lost during the process of Transparent 1547 State Migration. 1549 o Whether sessions have been transferred as part of Transparent 1550 State Migration. 1552 During the first phase of this process, the client proceeds to 1553 examine location entries to find the initial network address it will 1554 use to continue access to the file system or its replacement. For 1555 each location entry that the client examines, the process consists of 1556 five steps: 1558 1. Performing an EXCHANGE_ID directed at the location address. This 1559 operation is used to register the client-owner with the server, 1560 to obtain a client ID to be used subsequently to communicate with 1561 it, to obtain that client ID's confirmation status, and to 1562 determine server_owner and scope for the purpose of determining 1563 if the entry is trunkable with that previously being used to 1564 access the file system (i.e., that it represents another network 1565 access path to the same file system and can share locking state 1566 with it). 1568 2. Making an initial determination of whether migration has 1569 occurred.
The initial determination will be based on whether the 1570 EXCHANGE_ID results indicate that the current location element is 1571 server-trunkable with that used to access the file system when 1572 access was terminated by receiving NFS4ERR_MOVED. If it is, then 1573 migration has not occurred and the transition is dealt with, at 1574 least initially, as one involving continued access to the same 1575 file system on the same server through a new network address. 1577 3. Obtaining access to existing session state or creating new 1578 sessions. How this is done depends on the initial determination 1579 of whether migration has occurred and can be done as described in 1580 Section 10.4 below in the case of migration or as described in 1581 Section 10.5 below in the case of a network address transfer 1582 without migration. 1584 4. Verification of the trunking relationship assumed in step 2 as 1585 discussed in Section 2.10.5.1 of [RFC5661]. Although this step 1586 will generally confirm the initial determination, it is possible 1587 for verification to fail with the result that an initial 1588 determination that a network address shift (without migration) 1589 has occurred may be invalidated and migration determined to have 1590 occurred. There is no need to redo step 3 above, since it will 1591 be possible to continue use of the session established already. 1593 5. Obtaining access to existing locking state and/or reobtaining it. 1594 How this is done depends on the final determination of whether 1595 migration has occurred and can be done as described below in 1596 Section 10.4 in the case of migration or as described in 1597 Section 10.5 in the case of a network address transfer without 1598 migration. 
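Steps 2 and 4 of the procedure above can be sketched as follows. This is an illustrative sketch only: the record type, the function names, and the string labels are hypothetical, and the trunkability tests assume the comparison of server scope and server_owner fields described for trunking detection in Section 2.10.5 of [RFC5661], which is a simplification of the full rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    """Hypothetical summary of the EXCHANGE_ID results used in step 1."""
    clientid: int
    confirmed: bool
    so_major_id: bytes
    so_minor_id: int
    server_scope: bytes


def server_trunkable(prev: ExchangeIdResult, new: ExchangeIdResult) -> bool:
    # Assumption: same scope and same major ID indicate the same server.
    return (prev.server_scope == new.server_scope
            and prev.so_major_id == new.so_major_id)


def session_trunkable(prev: ExchangeIdResult, new: ExchangeIdResult) -> bool:
    # Assumption: session trunking additionally needs the same minor ID.
    return server_trunkable(prev, new) and prev.so_minor_id == new.so_minor_id


def initial_determination(prev: ExchangeIdResult, new: ExchangeIdResult) -> str:
    """Step 2: decide provisionally between address shift and migration."""
    return "address-shift" if server_trunkable(prev, new) else "migration"


def final_determination(initial: str, trunking_verified: bool) -> str:
    """Step 4: failed verification turns an address shift into migration."""
    if initial == "address-shift" and not trunking_verified:
        return "migration"
    return initial
```

The final determination, rather than the initial one, then selects between the procedures of Sections 10.4 and 10.5 when locking state is recovered in step 5.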
1600 Once the initial address has been determined, clients are free to 1601 apply an abbreviated process to find additional addresses trunkable 1602 with it (clients may seek session-trunkable or server-trunkable 1603 addresses depending on whether they support clientid trunking). 1604 During this later phase of the process, further location entries are 1605 examined using the abbreviated procedure specified below: 1607 1. Before the EXCHANGE_ID, the fs name of the location entry is 1608 examined and if it does not match that currently being used, the 1609 entry is ignored. Otherwise, one proceeds as specified by step 1 1610 above. 1612 2. In the case that the network address is session-trunkable with 1613 one used previously, a BIND_CONN_TO_SESSION is used to access that 1614 session using the new network address. Otherwise, or if the bind 1615 operation fails, a CREATE_SESSION is done. 1617 3. The verification procedure referred to in step 4 above is used. 1618 However, if it fails, the entry is ignored and the next available 1619 entry is used. 1621 10.4. Obtaining Access to Sessions and State after Migration (to be 1622 added) 1624 In the event that migration has occurred, the determination of 1625 whether Transparent State Migration has occurred is driven by the 1626 client ID returned by the EXCHANGE_ID and the reported confirmation 1627 status. 1629 o If the client ID is an unconfirmed client ID not previously known 1630 to the client, then Transparent State Migration has not occurred. 1632 o If the client ID is a confirmed client ID previously known to the 1633 client, then any transferred state would have been merged with an 1634 existing client ID representing the client to the destination 1635 server. In this state merger case, Transparent State Migration 1636 might or might not have occurred and a determination as to whether 1637 it has occurred is deferred until sessions are established and we 1638 are ready to begin state recovery.
1640 o If the client ID is a confirmed client ID not previously known to 1641 the client, then the client can conclude that the client ID was 1642 transferred as part of Transparent State Migration. In this 1643 transferred client ID case, Transparent State Migration has 1644 occurred although some state may have been lost. 1646 Once the client ID has been obtained, it is necessary to obtain 1647 access to sessions to continue communication with the new server. In 1648 any of the cases in which Transparent State Migration has occurred, 1649 it is possible that a session was transferred as well. To deal with 1650 that possibility, clients can, after doing the EXCHANGE_ID, issue a 1651 BIND_CONN_TO_SESSION to connect the transferred session to a 1652 connection to the new server. If that fails, it is an indication 1653 that the session was not transferred and that a new session needs to 1654 be created to take its place. 1656 In some situations, it is possible for a BIND_CONN_TO_SESSION to 1657 succeed without session migration having occurred. If state merger 1658 has taken place then the associated client ID may have already had a 1659 set of existing sessions, with it being possible that the sessionid 1660 of a given session is the same as one that might have been migrated. 1661 In that event, a BIND_CONN_TO_SESSION might succeed, even though 1662 there could have been no migration of the session with that 1663 sessionid. 1665 Once the client has determined the initial migration status, and 1666 determined that there was a shift to a new server, it needs to re- 1667 establish its lock state, if possible. To enable this to happen 1668 without loss of the guarantees normally provided by locking, the 1669 destination server needs to implement a per-fs grace period in all 1670 cases in which lock state was lost, including those in which 1671 Transparent State Migration was not implemented. 
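The three client ID cases described earlier in this section, together with the attempt to bind to a possibly transferred session, can be sketched as below. The function names, labels, and callables are hypothetical stand-ins for client internals. The combination of an unconfirmed client ID that is already known to the client is not covered by the text above and is reported here as "unspecified".

```python
from typing import Callable


def classify_client_id(confirmed: bool, previously_known: bool) -> str:
    """Map EXCHANGE_ID results onto the cases of this section."""
    if not confirmed and not previously_known:
        return "no-transparent-state-migration"
    if confirmed and previously_known:
        # Determination is deferred until state recovery begins.
        return "state-merger"
    if confirmed and not previously_known:
        return "transferred-client-id"
    return "unspecified"  # not addressed by the cases above


def obtain_session(try_bind: Callable[[], bool],
                   create_session: Callable[[], None]) -> str:
    """Try BIND_CONN_TO_SESSION first, then fall back to CREATE_SESSION.

    try_bind and create_session are assumed wrappers around the
    corresponding operations. In the state merger case a successful bind
    does not by itself prove that the session was migrated.
    """
    if try_bind():
        return "bound-existing-session"
    create_session()
    return "created-new-session"
```

A client might call classify_client_id() immediately after EXCHANGE_ID and obtain_session() before beginning any lock recovery.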
1673 Clients need to deal with the following cases: 1675 o In the state merger case, it is possible that the server has not 1676 attempted Transparent State Migration, in which case state may 1677 have been lost without it being reflected in the SEQ4_STATUS bits. 1678 To determine whether this has happened, the client can use 1679 TEST_STATEID to check whether the stateids created on the source 1680 server are still accessible on the destination server. Once a 1681 single stateid is found to have been successfully transferred, the 1682 client can conclude that Transparent State Migration was begun and 1683 any failure to transport all of the stateids will be reflected in 1684 the SEQ4_STATUS bits. Otherwise, Transparent State Migration has 1685 not occurred. 1687 o In a case in which Transparent State Migration has not occurred, 1688 the client can use the per-fs grace period provided by the 1689 destination server to reclaim locks that were held on the source 1690 server. 1692 o In a case in which Transparent State Migration has occurred, and 1693 no lock state was lost (as shown by SEQ4_STATUS flags), no lock 1694 reclaim is necessary. 1696 o In a case in which Transparent State Migration has occurred, and 1697 some lock state was lost (as shown by SEQ4_STATUS flags), existing 1698 stateids need to be checked for validity using TEST_STATEID, and 1699 reclaim used to re-establish any that were not transferred. 1701 For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value 1702 of true should be done before normal use of the file system including 1703 obtaining new locks for the file system. This applies even if no 1704 locks were lost and needed to be reclaimed. 1706 10.5.
Obtaining Access to Sessions and State after Network Address 1707 Transfer (to be added) 1709 The case in which there is a transfer to a new network address 1710 without migration is similar to that described in Section 10.4 above 1711 in that there is a need to obtain access to needed sessions and 1712 locking state. However, the details are simpler and will vary 1713 depending on the type of trunking between the address receiving 1714 NFS4ERR_MOVED and that to which the transfer is to be made. 1716 To make a session available for use, a BIND_CONN_TO_SESSION should be 1717 used to obtain access to the session previously in use. Only if this 1718 fails should a CREATE_SESSION be done. While this procedure mirrors 1719 that in Section 10.4 above, there is an important difference in that 1720 preservation of the session is not purely optional but depends on the 1721 type of trunking. 1723 Access to appropriate locking state should need no actions beyond 1724 access to the session. However, the SEQ4_STATUS bits should be 1725 checked for lost locking state, including the need to reclaim locks 1726 after a server reboot. 1728 11. Server Responsibilities Upon Migration (to be added) 1730 In order to effect Transparent State Migration and possibly session 1731 migration, the source and destination servers need to co-operate to transfer 1732 certain client-relevant information. The sections below discuss the 1733 information to be transferred but do not define the specifics of the 1734 transfer protocol. This is left as an implementation choice although 1735 standards in this area could be developed at a later time. 1737 Transparent State Migration and session migration are discussed 1738 separately, in Sections 11.1 and 11.2 below respectively. In each 1739 case, the discussion addresses the issue of providing the client a 1740 consistent view of the transferred state, even though the transfer 1741 might take an extended time. 1743 11.1.
Server Responsibilities in Effecting Transparent State Migration 1744 (to be added) 1746 The basic responsibility of the source server in effecting 1747 Transparent State Migration is to make available to the destination 1748 server a description of each piece of locking state associated with 1749 the file system being migrated. In addition to the client id string and 1750 verifier, the source server needs to provide, for each stateid: 1752 o The stateid including the current sequence value. 1754 o The associated client ID. 1756 o The handle of the associated file. 1758 o The type of the lock, such as open, byte-range lock, delegation, 1759 layout. 1761 o For locks such as opens and byte-range locks, there will be 1762 information about the owner(s) of the lock. 1764 o For recallable/revocable lock types, the current recall status 1765 needs to be included. 1767 o For each lock type there will be type-specific information, such 1768 as share and deny modes for opens and type and byte ranges for 1769 byte-range locks and layouts. 1771 A further server responsibility concerns locks that are revoked or 1772 otherwise lost during the process of file system migration. Because 1773 locks that appear to be lost during the process of migration will be 1774 reclaimed by the client, the servers have to take steps to ensure 1775 that locks revoked soon before or soon after migration are not 1776 inadvertently allowed to be reclaimed in situations in which the 1777 continuity of lock possession cannot be assured. 1779 o For locks lost on the source but whose loss has not yet been 1780 acknowledged by the client (by using FREE_STATEID), the 1781 destination must be aware of this loss so that it can deny a 1782 request to reclaim them. 1784 o For locks lost on the destination after the state transfer but 1785 before the client's RECLAIM_COMPLETE is done, the destination 1786 server should note these and not allow them to be reclaimed.
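The per-stateid description and the two reclaim restrictions above might be represented as in the following sketch. Since the transfer protocol is left to implementations, every name here (TransferredLock, ReclaimGuard, the field names, and the key used to match a reclaim request against a lost lock) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class LockType(Enum):
    OPEN = "open"
    BYTE_RANGE = "byte-range"
    DELEGATION = "delegation"
    LAYOUT = "layout"


@dataclass(frozen=True)
class TransferredLock:
    """One piece of locking state as described by the source server."""
    stateid_seqid: int            # current sequence value of the stateid
    stateid_other: bytes          # remainder of the stateid
    clientid: int
    filehandle: bytes
    lock_type: LockType
    owners: Tuple[bytes, ...] = ()       # for opens and byte-range locks
    recall_status: Optional[str] = None  # for recallable/revocable types
    type_specific: Tuple[Tuple[str, str], ...] = ()
    lost_unacknowledged: bool = False    # lost on source, no FREE_STATEID

    def reclaim_key(self):
        # Assumed way of recognizing a later reclaim of the same lock.
        return (self.clientid, self.filehandle, self.lock_type, self.owners)


class ReclaimGuard:
    """Destination-side record of locks that must not be reclaimed."""

    def __init__(self, transferred):
        self._denied = {l.reclaim_key() for l in transferred
                        if l.lost_unacknowledged}
        self._reclaim_complete = False

    def note_lock_lost(self, lock: TransferredLock) -> None:
        # Only losses before the client's RECLAIM_COMPLETE are recorded.
        if not self._reclaim_complete:
            self._denied.add(lock.reclaim_key())

    def reclaim_complete(self) -> None:
        self._reclaim_complete = True

    def may_reclaim(self, lock: TransferredLock) -> bool:
        return lock.reclaim_key() not in self._denied
```

Keying denial on client, file, lock type, and owner rather than on the stateid is a design assumption of this sketch; an implementation could equally track lost locks in some other form.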
1788 An additional responsibility of the cooperating servers concerns 1789 situations in which a stateid cannot be transferred transparently 1790 because it conflicts with an existing stateid held by the client and 1791 associated with a different file system. In this case there are two 1792 valid choices: 1794 o Treat the transfer, as in NFSv4.0, as one without Transparent 1795 State Migration. In this case, conflicting locks cannot be 1796 granted until the client does a RECLAIM_COMPLETE, after reclaiming 1797 the locks it had, with the exception of reclaims denied because 1798 they were attempts to reclaim locks that had been lost. 1800 o Implement Transparent State Migration, except for the lock with 1801 the conflicting stateid. In this case, the client will be aware 1802 of a lost lock (through the SEQ4_STATUS flags) and be allowed to 1803 reclaim it. 1805 When transferring state between the source and destination, the 1806 issues discussed in Section 7.2 of [RFC7931] must still be attended 1807 to. In this case, the use of NFS4ERR_DELAY may still be necessary in 1808 NFSv4.1, as it was in NFSv4.0, to prevent locking state changing 1809 while it is being transferred. 1811 There are a number of important differences in the NFSv4.1 context: 1813 o The absence of RELEASE_LOCKOWNER means that the one case in which 1814 an operation could not be deferred by use of NFS4ERR_DELAY no 1815 longer exists. 1817 o Sequencing of operations is no longer done using owner-based 1818 operation sequence numbers. Instead, sequencing is session- 1819 based. 1821 As a result, when sessions are not transferred, the techniques 1822 discussed in [RFC7931] are adequate and will not be further 1823 discussed. 1825 11.2.
Server Responsibilities in Effecting Session Transfer (to be added)

The basic responsibility of the source server in effecting session
transfer is to make available to the destination server a description
of the current state of each slot within the session, including:

o  The last sequence value received for that slot.

o  Whether there is cached reply data for the last request executed
   and, if so, the cached reply.

When sessions are transferred, there are a number of issues that pose
challenges in terms of making the transferred state unmodifiable
during the period it is gathered up and transferred to the
destination server.

o  A single session may be used to access multiple file systems, not
   all of which are being transferred.

o  Requests made on a session may, even if rejected, affect the state
   of the session by advancing the sequence number associated with
   the slot used.

As a result, when the file system state might otherwise be considered
unmodifiable, the client might have any number of in-flight requests,
each of which is capable of changing session state. These requests
may be of a number of types:

1.  Those requests that were processed on the migrating file system,
    before migration began.

2.  Those requests which got the error NFS4ERR_DELAY because the file
    system being accessed was in the process of being migrated.

3.  Those requests which got the error NFS4ERR_MOVED because the file
    system being accessed had been migrated.

4.  Those requests that accessed the migrating file system, in order
    to obtain location or status information.

5.  Those requests that did not reference the migrating file system.

It should be noted that the history of any particular slot is likely
to include a number of these request classes.
In the case in which a session which is migrated is used by file
systems other than the one migrated, requests of class 5 may be
common and be the last request processed, for many slots.

Since session state can change even after the locking state has been
fixed as part of the migration process, the session state known to
the client could be different from that on the destination server,
which necessarily reflects the session state on the source server, at
an earlier time. In deciding how to deal with this situation, it is
helpful to distinguish between two sorts of behavioral consequences
of the choice of initial sequence ID values.

o  The error NFS4ERR_SEQ_MISORDERED is returned when the sequence ID
   in a request is neither equal to the last one seen for the current
   slot nor the next greater one.

   In view of the difficulty of arriving at a mutually acceptable
   value for the correct last sequence value at the point of
   migration, it may be necessary for the server to show some degree
   of forbearance, when the sequence ID is one that would be
   considered unacceptable if session migration were not involved.

o  Returning the cached reply for a previously executed request when
   the sequence ID in the request matches the last value recorded for
   the slot.

   In the cases in which an error is returned and there is no
   possibility of any non-idempotent operation having been executed,
   it may not be necessary to adhere to this as strictly as might be
   proper if session migration were not involved. For example, the
   fact that the error NFS4ERR_DELAY was returned may not assist the
   client in any material way, while the fact that NFS4ERR_MOVED was
   returned by the source server may not be relevant when the request
   was reissued, directed to the destination server.
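
The two behaviors distinguished above can be pictured with a small
Python sketch of per-slot sequence checking on the destination
server. It is one possible policy, not a required one. The Slot
record, the "established" flag, and the string results are
hypothetical, and the handling of a retry for which no reply was
cached is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

UINT32 = 2 ** 32
# Cached errors that may no longer be useful after migration.
STALE_ERRORS = {"NFS4ERR_DELAY", "NFS4ERR_MOVED"}


@dataclass
class Slot:
    last_seqid: int
    cached_status: Optional[str] = None  # status of the cached reply, if any
    established: bool = True             # False just after session transfer


def check_sequence(slot: Slot, seqid: int) -> str:
    """Return "execute", "replay", or "NFS4ERR_SEQ_MISORDERED"."""
    is_retry = seqid == slot.last_seqid
    is_next = seqid == (slot.last_seqid + 1) % UINT32

    if slot.established:
        # Normal slot semantics.
        if is_retry:
            return "replay"
        if is_next:
            slot.last_seqid = seqid
            slot.cached_status = None
            return "execute"
        return "NFS4ERR_SEQ_MISORDERED"

    # Transferred slot not yet used on the destination: show forbearance.
    slot.established = True
    if is_retry and slot.cached_status not in STALE_ERRORS:
        return "replay"
    # Accept the client's value as the new starting sequence and do not
    # replay a cached NFS4ERR_DELAY or NFS4ERR_MOVED.
    slot.last_seqid = seqid
    slot.cached_status = None
    return "execute"
```

Once the first request on a transferred slot has been handled, the
slot falls back to the ordinary rules, so the relaxation is limited
to a single request per slot.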

One part of the necessary adaptation to these sorts of issues would
defer enforcement of normal slot sequence semantics until the client
itself, by issuing a request using a particular slot on the
destination server, established the new starting sequence for that
slot on the migrated session.

An important issue is that the specification needs to take note of
all potential COMPOUNDs, even if they might be unlikely in practice.
For example, a COMPOUND is allowed to access multiple file systems
and might perform non-idempotent operations in some of them before
accessing a file system being migrated. Also, a COMPOUND may return
considerable data in the response, before being rejected with
NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as
sa_cachethis.

To address these issues, the destination server MAY:

o  Avoid enforcing any sequencing semantics for a particular slot
   until the client has established the starting sequence for that
   slot on the destination server.

o  For each slot, avoid returning a cached reply containing
   NFS4ERR_DELAY or NFS4ERR_MOVED until the client has established
   the starting sequence for that slot on the destination server.

o  Until the client has established the starting sequence for a
   particular slot on the destination server, avoid reporting
   NFS4ERR_SEQ_MISORDERED or returning a cached reply containing
   NFS4ERR_DELAY or NFS4ERR_MOVED, where the reply consists solely of
   a series of operations where the response is NFS4_OK until the
   final error.

12. Changes to RFC5661 outside Section 11

Besides the major rework of Section 11, a number of related changes
are necessary.

o  The summary that appeared in Section 1.7.3.3 of [RFC5661] needs to
   be revised to reflect the changes called for in Section 4 of the
   current document.
   The updated summary appears in Section 12.1 below.

o  The discussion of server scope which appeared in Section 2.10.4 of
   [RFC5661] needs to be replaced, since the existing text appears to
   require a level of inter-server co-ordination incompatible with
   its basic function of avoiding the need for a globally uniform
   means of assigning server_owner values. A revised treatment
   appears in Section 12.2 below.

o  While the last paragraph (exclusive of sub-sections) of
   Section 2.10.5 in [RFC5661], dealing with server_owner changes, is
   literally true, it has been a source of confusion. Since the
   existing paragraph can be read as suggesting that such changes be
   dealt with non-disruptively, the treatment in Section 12.4 below
   needs to be substituted.

o  The existing definition of NFS4ERR_MOVED (in Section 15.1.2.4 of
   [RFC5661]) needs to be updated to reflect the different handling
   of unavailability of a particular fs via a specific network
   address. Since such a situation is no longer considered to
   constitute unavailability of a file system instance, the
   description needs to change even though the instances in which it
   is returned remain the same. The updated description appears in
   Section 12.3 below.

o  The existing treatment of EXCHANGE_ID (in Section 18.35 of
   [RFC5661]) assumes that client IDs cannot be created/confirmed
   other than by the EXCHANGE_ID and CREATE_SESSION operations.
   Also, the necessary use of EXCHANGE_ID in recovery from migration
   and related situations is not addressed clearly. A revised
   treatment of EXCHANGE_ID is necessary and it appears in Section 13
   below, while the specific differences between it and the treatment
   within [RFC5661] appear in Section 12.5.

12.1.
(Introduction to) Multi-Server Namespace (as updated)

NFSv4.1 contains a number of features to allow implementation of
namespaces that cross server boundaries and that allow and facilitate
a non-disruptive transfer of support for individual file systems
between servers. They are all based upon attributes that allow one
file system to specify alternate, additional, and new location
information which specifies how the client may access that file
system.

These attributes can be used to provide for individual active file
systems:

o  Alternate network addresses to access the current file system
   instance.

o  The locations of alternate file system instances or replicas to be
   used in the event that the current file system instance becomes
   unavailable.

These attributes may be used together with the concept of absent file
systems, in which a position in the server namespace is associated
with locations on other servers without any file system instance on
the current server.

o  Location attributes may be used with absent file systems to
   implement referrals whereby one server may direct the client to a
   file system provided by another server. This allows extensive
   multi-server namespaces to be constructed.

o  Location attributes may be provided when a previously present file
   system becomes absent. This allows non-disruptive migration of
   file systems to alternate servers.

12.2. Server Scope (as updated)

Servers each specify a server scope value in the form of an opaque
string eir_server_scope returned as part of the results of an
EXCHANGE_ID operation. The purpose of the server scope is to allow a
group of servers to indicate to clients that a set of servers sharing
the same server scope value has arranged to use compatible values of
otherwise opaque identifiers.
Thus, the identifiers generated by two servers within that set can be
assumed compatible so that, in some cases, identifiers generated by
one server in that set may be presented to another server of the same
scope.

The use of such compatible values does not imply that a value
generated by one server will always be accepted by another. In most
cases, it will not. However, a server will not accept a value
generated by another inadvertently. When it does accept it, it will
be because it is recognized as valid and carrying the same meaning as
on another server of the same scope.

When servers are of the same server scope, this compatibility of
values applies to the following identifiers:

o  Filehandle values. A filehandle value accepted by two servers of
   the same server scope denotes the same object. A WRITE operation
   sent to one server is reflected immediately in a READ sent to the
   other.

o  Server owner values. When the server scope values are the same,
   server owner values may be validly compared. In cases where the
   server scope values are different, server owner values are treated
   as different even if they contain all identical bytes.

The coordination among servers required to provide such compatibility
can be quite minimal, and limited to a simple partition of the ID
space. The recognition of common values requires additional
implementation, but this can be tailored to the specific situations
in which that recognition is desired.
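
The comparison rule for server owner values given above can be
sketched in Python as follows. The record and function names are
hypothetical; only the field names in the comments come from the
EXCHANGE_ID results. The sketch shows that a byte-for-byte match of
server owner values counts only when the server scopes also match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdIdentity:
    """Identity fields taken from one EXCHANGE_ID reply."""
    server_scope: bytes   # eir_server_scope
    so_major_id: bytes    # eir_server_owner.so_major_id
    so_minor_id: int      # eir_server_owner.so_minor_id


def same_scope(a: ExchangeIdIdentity, b: ExchangeIdIdentity) -> bool:
    return a.server_scope == b.server_scope


def server_owners_equal(a: ExchangeIdIdentity, b: ExchangeIdIdentity) -> bool:
    # With different scopes, owners are treated as different even if
    # their bytes are identical.
    if not same_scope(a, b):
        return False
    return a.so_major_id == b.so_major_id and a.so_minor_id == b.so_minor_id
```

A client would still need to validate that a shared scope value
really reflects cooperating servers before relying on the result.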

Clients will have occasion to compare the server scope values of
multiple servers under a number of circumstances, each of which will
be discussed under the appropriate functional section:

o  When server owner values received in response to EXCHANGE_ID
   operations sent to multiple network addresses are compared for the
   purpose of determining the validity of various forms of trunking,
   as described in Section 4.5.2 of the current document.

o  When network or server reconfiguration causes the same network
   address to possibly be directed to different servers, with the
   necessity for the client to determine when lock reclaim should be
   attempted, as described in Section 8.4.2.1 of [RFC5661].

When two replies from EXCHANGE_ID, each from a different server
network address, have the same server scope, there are a number of
ways a client can validate that the common server scope is due to two
servers cooperating in a group.

o  If both EXCHANGE_ID requests were sent with RPCSEC_GSS ([RFC2203],
   [RFC5403], [RFC7861]) authentication and the server principal is
   the same for both targets, the equality of server scope is
   validated. It is RECOMMENDED that two servers intending to share
   the same server scope also share the same principal name.

o  The client may accept the appearance of the second server in the
   fs_locations or fs_locations_info attribute for a relevant file
   system. For example, if there is a migration event for a
   particular file system or there are locks to be reclaimed on a
   particular file system, the attributes for that particular file
   system may be used. The client sends the GETATTR request to the
   first server for the fs_locations or fs_locations_info attribute
   with RPCSEC_GSS authentication. It may need to do this in advance
   of the need to verify the common server scope.
   If the client successfully authenticates the reply to GETATTR,
   and the GETATTR request and reply containing the fs_locations or
   fs_locations_info attribute refer to the second server, then the
   equality of server scope is supported. A client may choose to
   limit the use of this form of support to information relevant to
   the specific file system involved (e.g. a file system being
   migrated).

12.3. Revised Treatment of NFS4ERR_MOVED

Because the term "replica" is now used differently, the current
description of NFS4ERR_MOVED needs to be changed to the one below.
The new paragraph explicitly recognizes that a different network
address might be used, while the previous description, misleadingly,
treated this as a shift between two replicas while only a single file
system instance might be involved.

   The file system that contains the current filehandle object is
   not accessible using the address on which the request was made.
   It still might be accessible using other addresses
   server-trunkable with it or it might not be present at the
   server. In the latter case, it might have been relocated or
   migrated to another server, or it might have never been present.
   The client may obtain information regarding access to the file
   system location by obtaining the "fs_locations" or
   "fs_locations_info" attribute for the current filehandle. For
   further discussion, refer to Section 11 of [RFC5661], as modified
   by the current document.

12.4. Revised Discussion of Server_owner changes

Because of problems with the treatment of such changes, the confusing
paragraph, which simply says that such changes need to be dealt with,
is to be replaced by the one below.

   It is always possible that, as a result of various sorts of
   reconfiguration events, eir_server_scope and eir_server_owner
   values may be different on subsequent EXCHANGE_ID requests made to
   the same network address.

   In most cases such reconfiguration events will be disruptive and
   indicate that an IP address formerly connected to one server is
   now connected to an entirely different one.

   Some guidelines on client handling of such situations follow:

   *  When eir_server_scope changes, the client has no assurance that
      any IDs it obtained previously (e.g. filehandles) can be
      validly used on the new server, and, even if the new server
      accepts them, there is no assurance that this is not due to
      accident. Thus it is best to treat all such state as
      lost/stale, although a client may assume that the probability
      of inadvertent acceptance is low and treat this situation as
      within the next case.

   *  When eir_server_scope remains the same and
      eir_server_owner.so_major_id changes, the client can use
      filehandles it has and attempt reclaims. It may find that
      these are now stale, but if NFS4ERR_STALE is not received, it
      can proceed to reclaim its opens.

   *  When eir_server_scope and eir_server_owner.so_major_id remain
      the same, the client has to use the now-current values of
      eir_server_owner.so_minor_id in deciding on appropriate forms
      of trunking.

12.5. Revision to Treatment of EXCHANGE_ID

There are a number of issues in the original treatment of EXCHANGE_ID
(in [RFC5661]) that cause problems for Transparent State Migration
and for the transfer of access between different network access paths
to the same file system instance.

These issues arise from the fact that this treatment was written:

o  Assuming that a client ID can only become known to a server by
   having been created by executing an EXCHANGE_ID, with confirmation
   of the ID only possible by execution of a CREATE_SESSION.

o  Considering the interactions between a client and a server only on
   a single network address.

As these assumptions have become invalid in the context of
Transparent State Migration and active use of trunking, the treatment
has been modified in several respects.

o  It was assumed that an EXCHANGE_ID executed when the server is
   already aware of a given client instance must be either updating
   associated parameters (e.g. with respect to callbacks) or a
   lingering retransmission to deal with a previously lost reply. As
   a result, any slot sequence returned would be of no use. The
   existing treatment goes so far as to say that it "MUST NOT" be
   used, although this usage is not in accord with [RFC2119]. This
   created a difficulty when an EXCHANGE_ID is done after Transparent
   State Migration since that slot sequence needs to be used in a
   subsequent CREATE_SESSION.

   In the updated treatment, CREATE_SESSION is a way that client IDs
   are confirmed but it is understood that other ways are possible.
   The slot sequence can be used as needed and cases in which it
   would be of no use are appropriately noted.

o  It was assumed that the only functions of EXCHANGE_ID were to
   inform the server of the client, create the client ID, and
   communicate it to the client.
   When multiple simultaneous connections are involved, as often
   happens when trunking, that treatment was inadequate in that it
   ignored the role of EXCHANGE_ID in associating the client ID with
   the connection on which it was done, so that it could be used by a
   subsequent CREATE_SESSION, whose parameters do not include an
   explicit client ID.

   The new treatment explicitly discusses the role of EXCHANGE_ID in
   associating the client ID with the connection so it can be used by
   CREATE_SESSION and in associating a connection with an existing
   session.

The new treatment can be found in Section 13 below. It is intended
to supersede the treatment in Section 18.35 of [RFC5661]. Publishing
a complete replacement for Section 18.35 allows the corrected
definition to be read as a whole once [RFC5661] is updated.

13. Operation 42: EXCHANGE_ID - Instantiate Client ID (as updated)

The EXCHANGE_ID operation exchanges long-hand client and server
identifiers (owners), and provides access to a client ID, creating
one if necessary. This client ID becomes associated with the
connection on which the operation is done, so that it is available
when a CREATE_SESSION is done or when the connection is used to issue
a request on an existing session associated with the current client.

13.1.
ARGUMENT

   const EXCHGID4_FLAG_SUPP_MOVED_REFER    = 0x00000001;
   const EXCHGID4_FLAG_SUPP_MOVED_MIGR     = 0x00000002;

   const EXCHGID4_FLAG_BIND_PRINC_STATEID  = 0x00000100;

   const EXCHGID4_FLAG_USE_NON_PNFS        = 0x00010000;
   const EXCHGID4_FLAG_USE_PNFS_MDS        = 0x00020000;
   const EXCHGID4_FLAG_USE_PNFS_DS         = 0x00040000;

   const EXCHGID4_FLAG_MASK_PNFS           = 0x00070000;

   const EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000;
   const EXCHGID4_FLAG_CONFIRMED_R         = 0x80000000;

   struct state_protect_ops4 {
           bitmap4 spo_must_enforce;
           bitmap4 spo_must_allow;
   };

   struct ssv_sp_parms4 {
           state_protect_ops4      ssp_ops;
           sec_oid4                ssp_hash_algs<>;
           sec_oid4                ssp_encr_algs<>;
           uint32_t                ssp_window;
           uint32_t                ssp_num_gss_handles;
   };

   enum state_protect_how4 {
           SP4_NONE = 0,
           SP4_MACH_CRED = 1,
           SP4_SSV = 2
   };

   union state_protect4_a switch(state_protect_how4 spa_how) {
           case SP4_NONE:
                   void;
           case SP4_MACH_CRED:
                   state_protect_ops4      spa_mach_ops;
           case SP4_SSV:
                   ssv_sp_parms4           spa_ssv_parms;
   };

   struct EXCHANGE_ID4args {
           client_owner4           eia_clientowner;
           uint32_t                eia_flags;
           state_protect4_a        eia_state_protect;
           nfs_impl_id4            eia_client_impl_id<1>;
   };

13.2.
RESULT

   struct ssv_prot_info4 {
           state_protect_ops4      spi_ops;
           uint32_t                spi_hash_alg;
           uint32_t                spi_encr_alg;
           uint32_t                spi_ssv_len;
           uint32_t                spi_window;
           gsshandle4_t            spi_handles<>;
   };

   union state_protect4_r switch(state_protect_how4 spr_how) {
           case SP4_NONE:
                   void;
           case SP4_MACH_CRED:
                   state_protect_ops4      spr_mach_ops;
           case SP4_SSV:
                   ssv_prot_info4          spr_ssv_info;
   };

   struct EXCHANGE_ID4resok {
           clientid4        eir_clientid;
           sequenceid4      eir_sequenceid;
           uint32_t         eir_flags;
           state_protect4_r eir_state_protect;
           server_owner4    eir_server_owner;
           opaque           eir_server_scope<NFS4_OPAQUE_LIMIT>;
           nfs_impl_id4     eir_server_impl_id<1>;
   };

   union EXCHANGE_ID4res switch (nfsstat4 eir_status) {
           case NFS4_OK:
                   EXCHANGE_ID4resok       eir_resok4;

           default:
                   void;
   };

13.3. DESCRIPTION

The client uses the EXCHANGE_ID operation to register a particular
client_owner with the server. However, when the client_owner has
already been registered by other means (e.g. Transparent State
Migration), the client may still use EXCHANGE_ID to obtain the client
ID assigned previously.

The client ID returned from this operation will be associated with
the connection on which the EXCHANGE_ID is received and will serve as
a parent object for sessions created by the client on this connection
or to which the connection is bound. As a result of using those
sessions to make requests involving the creation of state, that state
will become associated with the client ID returned.

In situations in which the registration of the client_owner has not
occurred previously, the client ID must first be used, along with the
returned eir_sequenceid, in creating an associated session using
CREATE_SESSION.

If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the result,
eir_flags, then it is an indication that the registration of the
client_owner has already occurred and that a further CREATE_SESSION
is not needed to confirm it. Of course, subsequent CREATE_SESSION
operations may be needed for other reasons.

The value eir_sequenceid is used to establish an initial sequence
value associated with the client ID returned. In cases in which a
CREATE_SESSION has already been done, sequencing of such requests has
already been established, so the client has no need for this value
and will ignore it.

EXCHANGE_ID MAY be sent in a COMPOUND procedure that starts with
SEQUENCE. However, when a client communicates with a server for the
first time, it will not have a session, so using SEQUENCE will not be
possible. If EXCHANGE_ID is sent without a preceding SEQUENCE, then
it MUST be the only operation in the COMPOUND procedure's request.
If it is not, the server MUST return NFS4ERR_NOT_ONLY_OP.

The eia_clientowner field is composed of a co_verifier field and a
co_ownerid string. As noted in Section 2.4 of [RFC5661], the
co_ownerid describes the client, and the co_verifier is the
incarnation of the client. An EXCHANGE_ID sent with a new
incarnation of the client will lead to the server removing lock state
of the old incarnation. In contrast, an EXCHANGE_ID sent with the
current incarnation and co_ownerid will result in an error or an
update of the client ID's properties, depending on the arguments to
EXCHANGE_ID.

A server MUST NOT use the same client ID for two different
incarnations of an eia_clientowner.

In addition to the client ID and sequence ID, the server returns a
server owner (eir_server_owner) and server scope (eir_server_scope).
The former field is used for network trunking as described in
Section 2.10.5 of [RFC5661]. The latter field is used to allow
clients to determine when client IDs sent by one server may be
recognized by another in the event of file system migration (see
Section 8.9 of the current document).

The client ID returned by EXCHANGE_ID is only unique relative to the
combination of eir_server_owner.so_major_id and eir_server_scope.
Thus, if two servers return the same client ID, the onus is on the
client to distinguish the client IDs on the basis of
eir_server_owner.so_major_id and eir_server_scope. In the event two
different servers claim matching eir_server_owner.so_major_id and
eir_server_scope, the client can use the verification techniques
discussed in Section 2.10.5 of [RFC5661] to determine if the servers
are distinct. If they are distinct, then the client will need to
note the destination network addresses of the connections used with
each server, and use the network address as the final discriminator.

The server, as defined by the unique identity expressed in the
so_major_id of the server owner and the server scope, needs to track
several properties of each client ID it hands out. The properties
apply to the client ID and all sessions associated with the client
ID. The properties are derived from the arguments and results of
EXCHANGE_ID. The client ID properties include:

o  The capabilities expressed by the following bits, which come from
   the results of EXCHANGE_ID:

   *  EXCHGID4_FLAG_SUPP_MOVED_REFER

   *  EXCHGID4_FLAG_SUPP_MOVED_MIGR

   *  EXCHGID4_FLAG_BIND_PRINC_STATEID

   *  EXCHGID4_FLAG_USE_NON_PNFS

   *  EXCHGID4_FLAG_USE_PNFS_MDS

   *  EXCHGID4_FLAG_USE_PNFS_DS

   These properties may be updated by subsequent EXCHANGE_ID requests
   on confirmed client IDs, though the server MAY refuse to change
   them.

o  The state protection method used, one of SP4_NONE, SP4_MACH_CRED,
   or SP4_SSV, as set by the spa_how field of the arguments to
   EXCHANGE_ID. Once the client ID is confirmed, this property
   cannot be updated by subsequent EXCHANGE_ID requests.

o  For SP4_MACH_CRED or SP4_SSV state protection:

   *  The list of operations (spo_must_enforce) that MUST use the
      specified state protection. This list comes from the results
      of EXCHANGE_ID.

   *  The list of operations (spo_must_allow) that MAY use the
      specified state protection. This list comes from the results
      of EXCHANGE_ID.

   Once the client ID is confirmed, these properties cannot be
   updated by subsequent EXCHANGE_ID requests.

o  For SP4_SSV protection:

   *  The OID of the hash algorithm. This property is represented by
      one of the algorithms in the ssp_hash_algs field of the
      EXCHANGE_ID arguments. Once the client ID is confirmed, this
      property cannot be updated by subsequent EXCHANGE_ID requests.

   *  The OID of the encryption algorithm. This property is
      represented by one of the algorithms in the ssp_encr_algs field
      of the EXCHANGE_ID arguments. Once the client ID is confirmed,
      this property cannot be updated by subsequent EXCHANGE_ID
      requests.

   *  The length of the SSV. This property is represented by the
      spi_ssv_len field in the EXCHANGE_ID results. Once the client
      ID is confirmed, this property cannot be updated by subsequent
      EXCHANGE_ID requests.

      There are REQUIRED and RECOMMENDED relationships among the
      length of the key of the encryption algorithm ("key length"),
      the length of the output of hash algorithm ("hash length"), and
      the length of the SSV ("SSV length").

      +  key length MUST be <= hash length. This is because the keys
         used for the encryption algorithm are actually subkeys
         derived from the SSV, and the derivation is via the hash
         algorithm.
         The selection of an encryption algorithm with a key length
         that exceeded the length of the output of the hash algorithm
         would require padding, and thus weaken the use of the
         encryption algorithm.

      +  hash length SHOULD be <= SSV length. This is because the
         SSV is a key used to derive subkeys via an HMAC, and it is
         recommended that the key used as input to an HMAC be at
         least as long as the length of the HMAC's hash algorithm's
         output (see Section 3 of [RFC2104]).

      +  key length SHOULD be <= SSV length. This is a transitive
         result of the above two invariants.

      +  key length SHOULD be >= hash length / 2. This is because
         the subkey derivation is via an HMAC and it is recommended
         that if the HMAC has to be truncated, it should not be
         truncated to less than half the hash length (see Section 4
         of [RFC2104]).

   *  Number of concurrent versions of the SSV the client and server
      will support (see Section 2.10.9 of [RFC5661]). This property
      is represented by spi_window in the EXCHANGE_ID results. The
      property may be updated by subsequent EXCHANGE_ID requests.

o  The client's implementation ID as represented by the
   eia_client_impl_id field of the arguments. The property may be
   updated by subsequent EXCHANGE_ID requests.

o  The server's implementation ID as represented by the
   eir_server_impl_id field of the reply. The property may be
   updated by replies to subsequent EXCHANGE_ID requests.

The eia_flags passed as part of the arguments and the eir_flags
results allow the client and server to inform each other of their
capabilities as well as indicate how the client ID will be used.
Whether a bit is set or cleared on the arguments' flags does not
force the server to set or clear the same bit on the results' side.
Bits not defined above cannot be set in the eia_flags field.
If they are, the server MUST reject the operation with NFS4ERR_INVAL.

The EXCHGID4_FLAG_UPD_CONFIRMED_REC_A bit can only be set in
eia_flags; it is always off in eir_flags. The
EXCHGID4_FLAG_CONFIRMED_R bit can only be set in eir_flags; it is
always off in eia_flags. If the server recognizes the co_ownerid and
co_verifier as mapping to a confirmed client ID, it sets
EXCHGID4_FLAG_CONFIRMED_R in eir_flags. The
EXCHGID4_FLAG_CONFIRMED_R flag allows a client to tell if the client
ID it is trying to create already exists and is confirmed.

If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, this means
that the client is attempting to update properties of an existing
confirmed client ID (if the client wants to update properties of an
unconfirmed client ID, it MUST NOT set
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). If so, it is RECOMMENDED that
the client send the update EXCHANGE_ID operation in the same COMPOUND
as a SEQUENCE so that the EXCHANGE_ID is executed exactly once.
Whether the client can update the properties of the client ID depends
on the state protection it selected when the client ID was created,
and the principal and security flavor it uses when sending the
EXCHANGE_ID request. The situations described in items 6, 7, 8, or 9
of the second numbered list of Section 13.4 below will apply. Note
that if the operation succeeds and returns a client ID that is
already confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R
bit in eir_flags.

If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this
means that the client is trying to establish a new client ID; it is
attempting to trunk data communication to the server (see
Section 2.10.5 of [RFC5661]); or it is attempting to update
properties of an unconfirmed client ID.
The situations described in 2515 items 1, 2, 3, 4, or 5 of the second numbered list of Section 13.4 2516 below will apply. Note that if the operation succeeds and returns a 2517 client ID that was previously confirmed, the server MUST set the 2518 EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. 2520 When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client 2521 indicates that it is capable of dealing with an NFS4ERR_MOVED error 2522 as part of a referral sequence. When this bit is not set, it is 2523 still legal for the server to perform a referral sequence. However, 2524 a server may use the fact that the client is incapable of correctly 2525 responding to a referral, by avoiding it for that particular client. 2526 It may, for instance, act as a proxy for that particular file system, 2527 at some cost in performance, although it is not obligated to do so. 2528 If the server will potentially perform a referral, it MUST set 2529 EXCHGID4_FLAG_SUPP_MOVED_REFER in eir_flags. 2531 When the EXCHGID4_FLAG_SUPP_MOVED_MIGR is set, the client indicates 2532 that it is capable of dealing with an NFS4ERR_MOVED error as part of 2533 a file system migration sequence. When this bit is not set, it is 2534 still legal for the server to indicate that a file system has moved, 2535 when this in fact happens. However, a server may use the fact that 2536 the client is incapable of correctly responding to a migration in its 2537 scheduling of file systems to migrate so as to avoid migration of 2538 file systems being actively used. It may also hide actual migrations 2539 from clients unable to deal with them by acting as a proxy for a 2540 migrated file system for particular clients, at some cost in 2541 performance, although it is not obligated to do so. If the server 2542 will potentially perform a migration, it MUST set 2543 EXCHGID4_FLAG_SUPP_MOVED_MIGR in eir_flags. 
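The flag handling described above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the bit values are those assigned in [RFC5661], while the function names, the server_may_refer and server_may_migrate parameters, and the choice to reject EXCHGID4_FLAG_CONFIRMED_R when it appears in eia_flags are assumptions made for the example.

```python
NFS4_OK = 0
NFS4ERR_INVAL = 22

# Flag bit values as assigned in RFC 5661.
EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001
EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002
EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100
EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000

# Bits a client may set in eia_flags. CONFIRMED_R is result-only;
# treating it as invalid in the arguments is an assumption of this sketch.
ARG_FLAGS = (
    EXCHGID4_FLAG_SUPP_MOVED_REFER
    | EXCHGID4_FLAG_SUPP_MOVED_MIGR
    | EXCHGID4_FLAG_BIND_PRINC_STATEID
    | EXCHGID4_FLAG_USE_NON_PNFS
    | EXCHGID4_FLAG_USE_PNFS_MDS
    | EXCHGID4_FLAG_USE_PNFS_DS
    | EXCHGID4_FLAG_UPD_CONFIRMED_REC_A
)


def check_eia_flags(eia_flags):
    """Return NFS4ERR_INVAL if any bit outside ARG_FLAGS is set."""
    return NFS4ERR_INVAL if eia_flags & ~ARG_FLAGS else NFS4_OK


def moved_result_flags(server_may_refer, server_may_migrate, record_confirmed):
    """Build the referral, migration, and confirmation bits of eir_flags.

    The result depends on what the server may do, not on whether the
    client set the corresponding bits in eia_flags.
    """
    eir_flags = 0
    if server_may_refer:
        eir_flags |= EXCHGID4_FLAG_SUPP_MOVED_REFER
    if server_may_migrate:
        eir_flags |= EXCHGID4_FLAG_SUPP_MOVED_MIGR
    if record_confirmed:
        eir_flags |= EXCHGID4_FLAG_CONFIRMED_R
    return eir_flags
```

Note that moved_result_flags never consults eia_flags, which mirrors the statement that a bit set or cleared in the arguments does not force the same setting in the results.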
2545 When EXCHGID4_FLAG_BIND_PRINC_STATEID is set, the client indicates 2546 that it wants the server to bind the stateid to the principal. This 2547 means that when a principal creates a stateid, it has to be the one 2548 to use the stateid. If the server will perform binding, it will 2549 return EXCHGID4_FLAG_BIND_PRINC_STATEID. The server MAY return 2550 EXCHGID4_FLAG_BIND_PRINC_STATEID even if the client does not request 2551 it. If an update to the client ID changes the value of 2552 EXCHGID4_FLAG_BIND_PRINC_STATEID's client ID property, the effect 2553 applies only to new stateids. Existing stateids (and all stateids 2554 with the same "other" field) that were created with stateid to 2555 principal binding in force will continue to have binding in force. 2556 Existing stateids (and all stateids with the same "other" field) that 2557 were created with stateid to principal not in force will continue to 2558 have binding not in force. 2560 The EXCHGID4_FLAG_USE_NON_PNFS, EXCHGID4_FLAG_USE_PNFS_MDS, and 2561 EXCHGID4_FLAG_USE_PNFS_DS bits are described in Section 13.1 of 2562 [RFC5661] and convey roles the client ID is to be used for in a pNFS 2563 environment. The server MUST set one of the acceptable combinations 2564 of these bits (roles) in eir_flags, as specified in that section. 2565 Note that the same client owner/server owner pair can have multiple 2566 roles. Multiple roles can be associated with the same client ID or 2567 with different client IDs. Thus, if a client sends EXCHANGE_ID from 2568 the same client owner to the same server owner multiple times, but 2569 specifies different pNFS roles each time, the server might return 2570 different client IDs. Given that different pNFS roles might have 2571 different client IDs, the client may ask for different properties for 2572 each role/client ID. 
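The rule that a change to EXCHGID4_FLAG_BIND_PRINC_STATEID applies only to new stateids can be pictured with a small Python model. The ClientRecord and Stateid classes and their method names are hypothetical and exist only for this sketch; the point is that each stateid keeps the binding setting that was in force when it was created.

```python
from dataclasses import dataclass, field


@dataclass
class Stateid:
    other: bytes    # the "other" field identifying the stateid
    creator: str    # principal that created the stateid
    bound: bool     # binding setting captured at creation time


@dataclass
class ClientRecord:
    bind_princ_stateid: bool
    stateids: dict = field(default_factory=dict)

    def create_stateid(self, other, principal):
        # Snapshot the current binding property into the new stateid.
        sid = Stateid(other, principal, self.bind_princ_stateid)
        self.stateids[other] = sid
        return sid

    def update_binding(self, new_value):
        # Changes the client ID property; existing stateids are untouched.
        self.bind_princ_stateid = new_value

    def may_use(self, other, principal):
        sid = self.stateids[other]
        return (not sid.bound) or sid.creator == principal
```

A stateid created while binding was in force stays restricted to its creating principal even after update_binding(False) is called on the record.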
2574 The spa_how field of the eia_state_protect field specifies how the 2575 client wants to protect its client, locking, and session states from 2576 unauthorized changes (Section 2.10.8.3 of [RFC5661]): 2578 o SP4_NONE. The client does not request the NFSv4.1 server to 2579 enforce state protection. The NFSv4.1 server MUST NOT enforce 2580 state protection for the returned client ID. 2582 o SP4_MACH_CRED. If spa_how is SP4_MACH_CRED, then the client MUST 2583 send the EXCHANGE_ID request with RPCSEC_GSS as the security 2584 flavor, and with a service of RPC_GSS_SVC_INTEGRITY or 2585 RPC_GSS_SVC_PRIVACY. If SP4_MACH_CRED is specified, then the 2586 client wants to use an RPCSEC_GSS-based machine credential to 2587 protect its state. The server MUST note the principal the 2588 EXCHANGE_ID operation was sent with, and the GSS mechanism used. 2589 These notes collectively comprise the machine credential. 2591 After the client ID is confirmed, as long as the lease associated 2592 with the client ID is unexpired, a subsequent EXCHANGE_ID 2593 operation that uses the same eia_clientowner.co_owner as the first 2594 EXCHANGE_ID MUST also use the same machine credential as the first 2595 EXCHANGE_ID. The server returns the same client ID for the 2596 subsequent EXCHANGE_ID as that returned from the first 2597 EXCHANGE_ID. 2599 o SP4_SSV. If spa_how is SP4_SSV, then the client MUST send the 2600 EXCHANGE_ID request with RPCSEC_GSS as the security flavor, and 2601 with a service of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. 2603 If SP4_SSV is specified, then the client wants to use the SSV to 2604 protect its state. The server records the credential used in the 2605 request as the machine credential (as defined above) for the 2606 eia_clientowner.co_owner. The CREATE_SESSION operation that 2607 confirms the client ID MUST use the same machine credential. 
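The security preconditions attached to each spa_how value can be summarized in a short Python sketch. The enum values follow [RFC5661] and [RFC2203]; the helper names request_security_ok and same_machine_credential, and the representation of the machine credential as a (principal, mechanism) pair, are assumptions made for illustration.

```python
from enum import IntEnum


class StateProtectHow(IntEnum):
    SP4_NONE = 0
    SP4_MACH_CRED = 1
    SP4_SSV = 2


class GssService(IntEnum):
    NONE = 1
    INTEGRITY = 2
    PRIVACY = 3


AUTH_SYS = 1
RPCSEC_GSS = 6


def request_security_ok(spa_how, flavor, service=None):
    """Check the flavor and service the EXCHANGE_ID request was sent with.

    SP4_MACH_CRED and SP4_SSV both require RPCSEC_GSS with integrity
    or privacy; SP4_NONE places no requirement on the request.
    """
    if spa_how == StateProtectHow.SP4_NONE:
        return True
    return flavor == RPCSEC_GSS and service in (
        GssService.INTEGRITY,
        GssService.PRIVACY,
    )


def same_machine_credential(recorded, principal, mechanism):
    """Compare a later request against the recorded machine credential,
    modeled here as the (principal, GSS mechanism) pair noted by the server."""
    return recorded == (principal, mechanism)
```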
2609 When a client specifies SP4_MACH_CRED or SP4_SSV, it also provides 2610 two lists of operations (each expressed as a bitmap). The first list 2611 is spo_must_enforce and consists of those operations the client MUST 2612 send (subject to the server confirming the list of operations in the 2613 result of EXCHANGE_ID) with the machine credential (if SP4_MACH_CRED 2614 protection is specified) or the SSV-based credential (if SP4_SSV 2615 protection is used). The client MUST send the operations with 2616 RPCSEC_GSS credentials that specify the RPC_GSS_SVC_INTEGRITY or 2617 RPC_GSS_SVC_PRIVACY security service. Typically, the first list of 2618 operations includes EXCHANGE_ID, CREATE_SESSION, DELEGPURGE, 2619 DESTROY_SESSION, BIND_CONN_TO_SESSION, and DESTROY_CLIENTID. The 2620 client SHOULD NOT specify in this list any operations that require a 2621 filehandle because the server's access policies MAY conflict with the 2622 client's choice, and thus the client would then be unable to access a 2623 subset of the server's namespace. 2625 Note that if SP4_SSV protection is specified, and the client 2626 indicates that CREATE_SESSION must be protected with SP4_SSV, because 2627 the SSV cannot exist without a confirmed client ID, the first 2628 CREATE_SESSION MUST instead be sent using the machine credential, and 2629 the server MUST accept the machine credential. 2631 There is a corresponding result, also called spo_must_enforce, of the 2632 operations for which the server will require SP4_MACH_CRED or SP4_SSV 2633 protection. Normally, the server's result equals the client's 2634 argument, but the result MAY be different. If the client requests 2635 one or more operations in the set { EXCHANGE_ID, CREATE_SESSION, 2636 DELEGPURGE, DESTROY_SESSION, BIND_CONN_TO_SESSION, DESTROY_CLIENTID 2637 }, then the result spo_must_enforce MUST include the operations the 2638 client requested from that set. 
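The constraint on the spo_must_enforce result can be expressed compactly. The Python sketch below models the operation lists as sets of operation names rather than bitmaps, which is a simplification; the server_policy parameter and both function names are hypothetical.

```python
# Operations the server must keep in the result if the client asked for them.
CORE_OPS = frozenset({
    "EXCHANGE_ID",
    "CREATE_SESSION",
    "DELEGPURGE",
    "DESTROY_SESSION",
    "BIND_CONN_TO_SESSION",
    "DESTROY_CLIENTID",
})


def server_must_enforce(client_request, server_policy):
    """Compute a spo_must_enforce result.

    The server starts from its own policy (which normally equals the
    client's request) and adds every operation from CORE_OPS that the
    client requested.
    """
    return frozenset(server_policy) | (frozenset(client_request) & CORE_OPS)


def result_is_valid(client_request, result):
    """True if the result includes all CORE_OPS operations the client requested."""
    return (frozenset(client_request) & CORE_OPS) <= frozenset(result)
```

For instance, if the client requests EXCHANGE_ID and CLOSE while the server's own policy lists only CREATE_SESSION, the result contains EXCHANGE_ID and CREATE_SESSION; CLOSE may be dropped because it is outside the protected set.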
2640 If spo_must_enforce in the results has BIND_CONN_TO_SESSION set, then 2641 connection binding enforcement is enabled, and the client MUST use 2642 the machine (if SP4_MACH_CRED protection is used) or SSV (if SP4_SSV 2643 protection is used) credential on calls to BIND_CONN_TO_SESSION. 2645 The second list is spo_must_allow and consists of those operations 2646 the client wants to have the option of sending with the machine 2647 credential or the SSV-based credential, even if the object the 2648 operations are performed on is not owned by the machine or SSV 2649 credential. 2651 The corresponding result, also called spo_must_allow, consists of the 2652 operations the server will allow the client to use SP4_SSV or 2653 SP4_MACH_CRED credentials with. Normally, the server's result equals 2654 the client's argument, but the result MAY be different. 2656 The purpose of spo_must_allow is to allow clients to solve the 2657 following conundrum. Suppose the client ID is confirmed with 2658 EXCHGID4_FLAG_BIND_PRINC_STATEID, and it calls OPEN with the 2659 RPCSEC_GSS credentials of a normal user. Now suppose the user's 2660 credentials expire, and cannot be renewed (e.g., a Kerberos ticket 2661 granting ticket expires, and the user has logged off and will not be 2662 acquiring a new ticket granting ticket). The client will be unable 2663 to send CLOSE without the user's credentials, which is to say the 2664 client has to either leave the state on the server or re-send 2665 EXCHANGE_ID with a new verifier to clear all state, that is, unless 2666 the client includes CLOSE on the list of operations in spo_must_allow 2667 and the server agrees. 2669 The SP4_SSV protection parameters also have: 2671 ssp_hash_algs: 2673 This is the set of algorithms the client supports for the purpose 2674 of computing the digests needed for the internal SSV GSS mechanism 2675 and for the SET_SSV operation. Each algorithm is specified as an 2676 object identifier (OID). 
The REQUIRED algorithms for a server are 2677 id-sha1, id-sha224, id-sha256, id-sha384, and id-sha512 [RFC4055]. 2678 The algorithm the server selects among the set is indicated in 2679 spi_hash_alg, a field of spr_ssv_prot_info. The field 2680 spi_hash_alg is an index into the array ssp_hash_algs. If the 2681 server does not support any of the offered algorithms, it returns 2682 NFS4ERR_HASH_ALG_UNSUPP. If ssp_hash_algs is empty, the server 2683 MUST return NFS4ERR_INVAL. 2685 ssp_encr_algs: 2687 This is the set of algorithms the client supports for the purpose 2688 of providing privacy protection for the internal SSV GSS 2689 mechanism. Each algorithm is specified as an OID. The REQUIRED 2690 algorithm for a server is id-aes256-CBC. The RECOMMENDED 2691 algorithms are id-aes192-CBC and id-aes128-CBC [CSOR_AES]. The 2692 selected algorithm is returned in spi_encr_alg, an index into 2693 ssp_encr_algs. If the server does not support any of the offered 2694 algorithms, it returns NFS4ERR_ENCR_ALG_UNSUPP. If ssp_encr_algs 2695 is empty, the server MUST return NFS4ERR_INVAL. Note that due to 2696 previously stated requirements and recommendations on the 2697 relationships between key length and hash length, some 2698 combinations of RECOMMENDED and REQUIRED encryption algorithm and 2699 hash algorithm either SHOULD NOT or MUST NOT be used. Table 1 2700 summarizes the illegal and discouraged combinations. 2702 ssp_window: 2704 This is the number of SSV versions the client wants the server to 2705 maintain (i.e., each successful call to SET_SSV produces a new 2706 version of the SSV). If ssp_window is zero, the server MUST 2707 return NFS4ERR_INVAL. The server responds with spi_window, which 2708 MUST NOT exceed ssp_window, and MUST be at least one. 
      Any requests on the backchannel or fore channel that are using a
      version of the SSV that is outside the window will fail with an
      ONC RPC authentication error, and the requester will have to retry
      them with the same slot ID and sequence ID.

   ssp_num_gss_handles:

      This is the number of RPCSEC_GSS handles the server should create
      that are based on the GSS SSV mechanism (see Section 2.10.9 of
      [RFC5661]).  It is not the total number of RPCSEC_GSS handles for
      the client ID.  Indeed, subsequent calls to EXCHANGE_ID will add
      RPCSEC_GSS handles.  The server responds with a list of handles in
      spi_handles.  If the client asks for at least one handle and the
      server cannot create it, the server MUST return an error.  The
      handles in spi_handles are not available for use until the client
      ID is confirmed, which could be immediately if EXCHANGE_ID returns
      EXCHGID4_FLAG_CONFIRMED_R, or upon successful confirmation from
      CREATE_SESSION.

      While a client ID can span all the connections that are connected
      to a server sharing the same eir_server_owner.so_major_id, the
      RPCSEC_GSS handles returned in spi_handles can only be used on
      connections connected to a server that returns the same
      eir_server_owner.so_major_id and eir_server_owner.so_minor_id on
      each connection.  It is permissible for the client to set
      ssp_num_gss_handles to zero; the client can create more handles
      with another EXCHANGE_ID call.

      Because each SSV RPCSEC_GSS handle shares a common SSV GSS
      context, there are security considerations specific to this
      situation discussed in Section 2.10.10 of [RFC5661].

      The seq_window (see Section 5.2.3.1 of [RFC2203]) of each
      RPCSEC_GSS handle in spi_handles MUST be the same as the seq_window
      of the RPCSEC_GSS handle used for the credential of the RPC
      request that the EXCHANGE_ID request was sent with.
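The entries of Table 1 can be reproduced from the key-length and hash-length relationships given earlier in this section. The Python sketch below assumes that the MUST-level relationship is key length <= hash length (as the padding rationale implies) and that the SHOULD-level relationship is key length >= hash length / 2. It uses the standard key and digest sizes in bytes; classify is a hypothetical helper, not part of the protocol.

```python
# Key sizes of the encryption algorithms, in bytes.
KEY_LEN = {
    "id-aes128-CBC": 16,
    "id-aes192-CBC": 24,
    "id-aes256-CBC": 32,
}

# Digest sizes of the hash algorithms, in bytes.
HASH_LEN = {
    "id-sha1": 20,
    "id-sha224": 28,
    "id-sha256": 32,
    "id-sha384": 48,
    "id-sha512": 64,
}


def classify(encr_alg, hash_alg):
    """Classify a combination as 'MUST NOT', 'SHOULD NOT', or 'ok'."""
    key, digest = KEY_LEN[encr_alg], HASH_LEN[hash_alg]
    if key > digest:
        # Key longer than the HMAC output: the subkey would need padding.
        return "MUST NOT"
    if 2 * key < digest:
        # HMAC output would be truncated to less than half its length.
        return "SHOULD NOT"
    return "ok"


def table_row(encr_alg):
    """Return the (MUST NOT, SHOULD NOT) hash lists for one encryption algorithm."""
    must_not = [h for h in HASH_LEN if classify(encr_alg, h) == "MUST NOT"]
    should_not = [h for h in HASH_LEN if classify(encr_alg, h) == "SHOULD NOT"]
    return must_not, should_not
```

Calling table_row for each of the three encryption algorithms yields the same combinations that Table 1 lists.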
   +-------------------+----------------------+------------------------+
   | Encryption        | MUST NOT be combined | SHOULD NOT be combined |
   | Algorithm         | with                 | with                   |
   +-------------------+----------------------+------------------------+
   | id-aes128-CBC     |                      | id-sha384, id-sha512   |
   | id-aes192-CBC     | id-sha1              | id-sha512              |
   | id-aes256-CBC     | id-sha1, id-sha224   |                        |
   +-------------------+----------------------+------------------------+

                                  Table 1

   The arguments include an array of up to one element in length called
   eia_client_impl_id.  If eia_client_impl_id is present, it contains
   the information identifying the implementation of the client.
   Similarly, the results include an array of up to one element in
   length called eir_server_impl_id that identifies the implementation
   of the server.  Servers MUST accept a zero-length eia_client_impl_id
   array, and clients MUST accept a zero-length eir_server_impl_id
   array.

   A possible use for implementation identifiers would be in diagnostic
   software that extracts this information in an attempt to identify
   interoperability problems, performance workload behaviors, or general
   usage statistics.  Since the intent of having access to this
   information is for planning or general diagnosis only, the client and
   server MUST NOT interpret this implementation identity information in
   a way that affects how the implementation behaves in interacting with
   its peer.  The client and server are not allowed to depend on the
   peer's manifesting a particular allowed behavior based on an
   implementation identifier but are required to interoperate as
   specified elsewhere in the protocol specification.
   Because it is possible that some implementations might violate the
   protocol specification and interpret the identity information,
   implementations MUST provide facilities to allow the NFSv4 client and
   server to be configured to set the contents of the nfs_impl_id
   structures sent to any specified value.

13.4.  IMPLEMENTATION

   A server's client record is a 5-tuple:

   1.  co_ownerid:

       The client identifier string, from the eia_clientowner
       structure of the EXCHANGE_ID4args structure.

   2.  co_verifier:

       A client-specific value used to indicate incarnations (where a
       client restart represents a new incarnation), from the
       eia_clientowner structure of the EXCHANGE_ID4args structure.

   3.  principal:

       The principal that was defined in the RPC header's credential
       and/or verifier at the time the client record was established.

   4.  client ID:

       The shorthand client identifier, generated by the server and
       returned via the eir_clientid field in the EXCHANGE_ID4resok
       structure.

   5.  confirmed:

       A private field on the server indicating whether or not a
       client record has been confirmed.  A client record is
       confirmed if there has been a successful CREATE_SESSION
       operation to confirm it.  Otherwise, it is unconfirmed.  An
       unconfirmed record is established by an EXCHANGE_ID call.  Any
       unconfirmed record that is not confirmed within a lease period
       SHOULD be removed.

   The following identifiers represent special values for the fields in
   the records.

   ownerid_arg:

      The value of the eia_clientowner.co_ownerid subfield of the
      EXCHANGE_ID4args structure of the current request.

   verifier_arg:

      The value of the eia_clientowner.co_verifier subfield of the
      EXCHANGE_ID4args structure of the current request.
   old_verifier_arg:

      A value of the eia_clientowner.co_verifier field of a client
      record received in a previous request; this is distinct from
      verifier_arg.

   principal_arg:

      The value of the RPCSEC_GSS principal for the current request.

   old_principal_arg:

      A value of the principal of a client record as defined by the RPC
      header's credential or verifier of a previous request.  This is
      distinct from principal_arg.

   clientid_ret:

      The value of the eir_clientid field the server will return in the
      EXCHANGE_ID4resok structure for the current request.

   old_clientid_ret:

      The value of the eir_clientid field the server returned in the
      EXCHANGE_ID4resok structure for a previous request.  This is
      distinct from clientid_ret.

   confirmed:

      The client ID has been confirmed.

   unconfirmed:

      The client ID has not been confirmed.

   Since EXCHANGE_ID is a non-idempotent operation, we must consider the
   possibility that retries occur as a result of a client restart,
   network partition, malfunctioning router, etc.  Retries are
   identified by the value of the eia_clientowner field of
   EXCHANGE_ID4args, and the method for dealing with them is outlined in
   the scenarios below.

   The scenarios are described in terms of the client record(s) a server
   has for a given co_ownerid.  Note that if the client ID was created
   specifying SP4_SSV state protection and EXCHANGE_ID as one of the
   operations in spo_must_allow, then the server MUST authorize
   EXCHANGE_IDs with the SSV principal in addition to the principal that
   created the client ID.

   1.
New Owner ID 2884 If the server has no client records with 2885 eia_clientowner.co_ownerid matching ownerid_arg, and 2886 EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in the 2887 EXCHANGE_ID, then a new shorthand client ID (let us call it 2888 clientid_ret) is generated, and the following unconfirmed 2889 record is added to the server's state. 2891 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 2892 unconfirmed } 2894 Subsequently, the server returns clientid_ret. 2896 2. Non-Update on Existing Client ID 2898 If the server has the following confirmed record, and the 2899 request does not have EXCHGID4_FLAG_UPD_CONFIRMED_REC_A set, 2900 then the request is the result of a retried request due to a 2901 faulty router or lost connection, or the client is trying to 2902 determine if it can perform trunking. 2904 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 2905 confirmed } 2907 Since the record has been confirmed, the client must have 2908 received the server's reply from the initial EXCHANGE_ID 2909 request. Since the server has a confirmed record, and since 2910 EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, with the 2911 possible exception of eir_server_owner.so_minor_id, the server 2912 returns the same result it did when the client ID's properties 2913 were last updated (or if never updated, the result when the 2914 client ID was created). The confirmed record is unchanged. 2916 3. Client Collision 2918 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the 2919 server has the following confirmed record, then this request 2920 is likely the result of a chance collision between the values 2921 of the eia_clientowner.co_ownerid subfield of EXCHANGE_ID4args 2922 for two different clients. 
2924 { ownerid_arg, *, old_principal_arg, old_clientid_ret, 2925 confirmed } 2927 If there is currently no state associated with 2928 old_clientid_ret, or if there is state but the lease has 2929 expired, then this case is effectively equivalent to the New 2930 Owner ID case of Paragraph 1. The confirmed record is 2931 deleted, the old_clientid_ret and its lock state are deleted, 2932 a new shorthand client ID is generated, and the following 2933 unconfirmed record is added to the server's state. 2935 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 2936 unconfirmed } 2938 Subsequently, the server returns clientid_ret. 2940 If old_clientid_ret has an unexpired lease with state, then no 2941 state of old_clientid_ret is changed or deleted. The server 2942 returns NFS4ERR_CLID_INUSE to indicate that the client should 2943 retry with a different value for the 2944 eia_clientowner.co_ownerid subfield of EXCHANGE_ID4args. The 2945 client record is not changed. 2947 4. Replacement of Unconfirmed Record 2949 If the EXCHGID4_FLAG_UPD_CONFIRMED_REC_A flag is not set, and 2950 the server has the following unconfirmed record, then the 2951 client is attempting EXCHANGE_ID again on an unconfirmed 2952 client ID, perhaps due to a retry, a client restart before 2953 client ID confirmation (i.e., before CREATE_SESSION was 2954 called), or some other reason. 2956 { ownerid_arg, *, *, old_clientid_ret, unconfirmed } 2958 It is possible that the properties of old_clientid_ret are 2959 different than those specified in the current EXCHANGE_ID. 2960 Whether or not the properties are being updated, to eliminate 2961 ambiguity, the server deletes the unconfirmed record, 2962 generates a new client ID (clientid_ret), and establishes the 2963 following unconfirmed record: 2965 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 2966 unconfirmed } 2968 5. 
Client Restart 2970 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the 2971 server has the following confirmed client record, then this 2972 request is likely from a previously confirmed client that has 2973 restarted. 2975 { ownerid_arg, old_verifier_arg, principal_arg, 2976 old_clientid_ret, confirmed } 2977 Since the previous incarnation of the same client will no 2978 longer be making requests, once the new client ID is confirmed 2979 by CREATE_SESSION, byte-range locks and share reservations 2980 should be released immediately rather than forcing the new 2981 incarnation to wait for the lease time on the previous 2982 incarnation to expire. Furthermore, session state should be 2983 removed since if the client had maintained that information 2984 across restart, this request would not have been sent. If the 2985 server supports neither the CLAIM_DELEGATE_PREV nor 2986 CLAIM_DELEG_PREV_FH claim types, associated delegations should 2987 be purged as well; otherwise, delegations are retained and 2988 recovery proceeds according to section 10.2.1 of [RFC5661]. 2990 After processing, clientid_ret is returned to the client and 2991 this client record is added: 2993 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 2994 unconfirmed } 2996 The previously described confirmed record continues to exist, 2997 and thus the same ownerid_arg exists in both a confirmed and 2998 unconfirmed state at the same time. The number of states can 2999 collapse to one once the server receives an applicable 3000 CREATE_SESSION or EXCHANGE_ID. 3002 + If the server subsequently receives a successful 3003 CREATE_SESSION that confirms clientid_ret, then the server 3004 atomically destroys the confirmed record and makes the 3005 unconfirmed record confirmed as described in section 3006 16.36.3 of [RFC5661]. 
       +  If the server instead subsequently receives an EXCHANGE_ID
          with the client owner equal to ownerid_arg, one strategy is
          to simply delete the unconfirmed record, and process the
          EXCHANGE_ID as described in the entirety of Section 13.4.

   6.  Update

       If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
       has the following confirmed record, then this request is an
       attempt at an update.

       { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
       confirmed }

       Since the record has been confirmed, the client must have
       received the server's reply from the initial EXCHANGE_ID
       request.  The server allows the update, and the client record
       is left intact.

   7.  Update but No Confirmed Record

       If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
       has no confirmed record corresponding to ownerid_arg, then the
       server returns NFS4ERR_NOENT and leaves any unconfirmed record
       intact.

   8.  Update but Wrong Verifier

       If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
       has the following confirmed record, then this request is an
       illegal attempt at an update, perhaps because of a retry from
       a previous client incarnation.

       { ownerid_arg, old_verifier_arg, *, clientid_ret, confirmed }

       The server returns NFS4ERR_NOT_SAME and leaves the client
       record intact.

   9.  Update but Wrong Principal

       If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
       has the following confirmed record, then this request is an
       illegal attempt at an update by an unauthorized principal.

       { ownerid_arg, verifier_arg, old_principal_arg, clientid_ret,
       confirmed }

       The server returns NFS4ERR_PERM and leaves the client record
       intact.

14.  Security Considerations

   The Security Considerations section of [RFC5661] needs the additions
   below to properly address some aspects of trunking discovery,
   referral, migration and replication.
      The possibility that requests to determine the set of network
      addresses corresponding to a given server might be interfered with
      or have their responses corrupted needs to be taken into account.
      In light of this, the following considerations should be taken
      note of:

      o  When DNS is used to convert server names to addresses and
         DNSSEC [RFC4033] is not available, the validity of the network
         addresses returned cannot be relied upon.  However, when the
         client uses RPCSEC_GSS to access the designated server, mutual
         authentication makes it possible to detect that an invalid
         server address was provided.

      o  The fetching of attributes containing location information
         SHOULD be performed using RPCSEC_GSS with integrity protection,
         as previously explained in the Security Considerations section
         of [RFC5661].  It is important to note here that a client
         making a request of this sort without using RPCSEC_GSS
         including integrity protection needs to be aware of the
         negative consequences of doing so, which can lead to invalid
         host names or network addresses being returned.  In light of
         this, the client needs to recognize that using such returned
         location information to access an NFSv4 server without use of
         RPCSEC_GSS (i.e., by using AUTH_SYS) poses dangers as it can
         result in the client interacting with an unverified network
         address posing as an NFSv4 server.

      o  Despite the fact that it is a REQUIREMENT (of [RFC5661]) that
         "implementations" provide "support" for use of RPCSEC_GSS, it
         cannot be assumed that use of RPCSEC_GSS is always available
         between any particular client-server pair.

      o  When a client has the network addresses of a server but not the
         associated host names, that would interfere with its ability to
         use RPCSEC_GSS.
      In light of the above, a server should present location entries
      that correspond to file systems on other servers using a host
      name.  This would allow the client to interrogate the fs_locations
      on the destination server to obtain trunking information (as well
      as replica information) using RPCSEC_GSS with integrity,
      validating the name provided while assuring that the response has
      not been corrupted.

      When RPCSEC_GSS is not available on a server, the client needs to
      be aware of the fact that the location entries are subject to
      corruption and cannot be relied upon.  In the case of a client
      being directed to another server after NFS4ERR_MOVED, this could
      vitiate the authentication provided by the use of RPCSEC_GSS,
      since the destination might validly represent itself as the server
      to which the client was erroneously directed.  In the case in
      which a location attribute is fetched upon connecting with a
      server, it is best for the client to ignore trunking and replica
      information when RPCSEC_GSS with integrity protection cannot be
      used.

      In cases in which location information that cannot be verified is
      fetched and used, it needs to be subject to appropriate filtering
      to prevent the client from being inappropriately directed.  For
      example, where security depends on the physical isolation of the
      network on which clients and servers interact, validation of
      network addresses to make sure they are within this network can
      limit exposures.
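The address filtering mentioned above might look like the following Python sketch, which keeps only those returned addresses that fall inside networks the client has been configured to trust. The function name filter_locations and the idea of a configured list of trusted networks are assumptions for illustration; the addresses shown are documentation ranges, not real servers.

```python
import ipaddress


def filter_locations(addresses, trusted_networks):
    """Return the addresses that parse correctly and lie in a trusted network.

    addresses        -- address strings taken from unverified location data
    trusted_networks -- CIDR strings describing the isolated network(s)
    """
    networks = [ipaddress.ip_network(n) for n in trusted_networks]
    kept = []
    for text in addresses:
        try:
            addr = ipaddress.ip_address(text)
        except ValueError:
            # Not a literal address (for example, a host name): drop it
            # rather than resolve it through an unverified path.
            continue
        if any(addr in net for net in networks):
            kept.append(text)
    return kept
```

Filtering of this kind limits exposure only to the extent that the trusted networks are in fact isolated; it does not replace the use of RPCSEC_GSS with integrity protection.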
      To summarize considerations regarding the use of RPCSEC_GSS in
      fetching location information, we need to consider the following
      possibilities for requests to interrogate location information,
      with interrogation approaches on the referring and destination
      servers arrived at separately:

      o  The use of RPCSEC_GSS with integrity protection is RECOMMENDED
         in all cases, since the absence of integrity protection exposes
         the client to the possibility of the results being modified.

      o  The use of RPCSEC_GSS without integrity protection to fetch
         location information SHOULD NOT be attempted.  In cases of
         migration or referral, this applies both to the referring and
         destination servers.

      o  The use of requests issued without RPCSEC_GSS (i.e., using
         AUTH_SYS), while undesirable, may not be avoidable in all
         cases.  Where the use of the returned information cannot be
         avoided, it should be subject to filtering to eliminate the
         possibility that the client would treat an invalid address as
         if it were an NFSv4 server.  The specifics will vary depending
         on the degree of network isolation and whether the request is
         to the referring or destination servers.

15.  IANA Considerations

   This document does not require actions by IANA.

16.  References

16.1.  Normative References

   [CSOR_AES]
              National Institute of Standards and Technology,
              "Cryptographic Algorithm Object Registration", URL
              http://csrc.nist.gov/groups/ST/crypto_apps_infra/csor/algorithms.html,
              November 2007.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2203]  Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol
              Specification", RFC 2203, DOI 10.17487/RFC2203, September
              1997.

   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "DNS Security Introduction and Requirements",
              RFC 4033, DOI 10.17487/RFC4033, March 2005.

   [RFC4055]  Schaad, J., Kaliski, B., and R. Housley, "Additional
              Algorithms and Identifiers for RSA Cryptography for use in
              the Internet X.509 Public Key Infrastructure Certificate
              and Certificate Revocation List (CRL) Profile", RFC 4055,
              DOI 10.17487/RFC4055, June 2005.

   [RFC5403]  Eisler, M., "RPCSEC_GSS Version 2", RFC 5403,
              DOI 10.17487/RFC5403, February 2009.

   [RFC5661]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010.

   [RFC7530]  Haynes, T., Ed. and D. Noveck, Ed., "Network File System
              (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530,
              March 2015.

   [RFC7861]  Adamson, A. and N. Williams, "Remote Procedure Call (RPC)
              Security Version 3", RFC 7861, DOI 10.17487/RFC7861,
              November 2016.

   [RFC7931]  Noveck, D., Ed., Shivam, P., Lever, C., and B. Baker,
              "NFSv4.0 Migration: Specification Update", RFC 7931,
              DOI 10.17487/RFC7931, July 2016.

16.2.  Informative References

   [I-D.ietf-nfsv4-migration-issues]
              Noveck, D., Shivam, P., Lever, C., and B. Baker, "NFSv4
              Migration and Trunking: Implementation and Specification
              Issues", draft-ietf-nfsv4-migration-issues-13 (work in
              progress), May 2017.

   [RFC2104]  Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
              Hashing for Message Authentication", RFC 2104,
              DOI 10.17487/RFC2104, February 1997.

Appendix A.  Classification of Document Sections

   Using the classification appearing in Section 3.3, we can proceed
   through the current document and classify its sections as listed
   below.
   In this listing, when we refer to a Section X and there is a
   Section X.1 within it, the classification of Section X refers to the
   part of that section exclusive of subsections. In the case when that
   portion is empty, the section is not counted.

   o  Sections 1 through 4, a total of five sections, are all
      explanatory.

   o  Section 4.1 is a replacement section.

   o  Section 4.2 is an additional section.

   o  Section 4.3 is a replacement section.

   o  Section 4.4 is explanatory.

   o  Section 4.5 is a replacement section.

   o  Sections 4.5.1 and 4.5.2 are additional sections.

   o  Sections 4.5.3 through 4.5.5, a total of three sections, are all
      replacement sections.

   o  Section 4.5.6 is an additional section.

   o  Section 5 is explanatory.

   o  Sections 6 and 7 are additional sections.

   o  Sections 8 through 8.9, a total of ten sections, are all
      replacement sections.

   o  Sections 9 through 11.2, a total of eleven sections, are all
      additional sections.

   o  Section 12 is explanatory.

   o  Sections 12.1 and 12.2 are replacement sections.

   o  Sections 12.3 and 12.4 are editing sections.

   o  Section 12.5 is explanatory.

   o  Section 13 is a replacement section, which consists of a total of
      five sections.

   o  Section 14 is an editing section.

   o  Section 15 through Acknowledgments, a total of six sections, are
      all explanatory.

   To summarize:

   o  There are fifteen explanatory sections.

   o  There are twenty-two replacement sections.

   o  There are seventeen additional sections.

   o  There are three editing sections.

Appendix B. Updates to RFC5661

   In this appendix, we proceed through [RFC5661] identifying sections
   as unchanged, modified, deleted, or replaced and indicating where
   additional sections from the current document would appear in an
   eventual consolidated description of NFSv4.1.
   In this presentation,
   when section X is referred to, it denotes that section plus all
   included subsections. When it is necessary to refer to the part of a
   section outside any included subsections, the exclusion is noted
   explicitly.

   o  Section 1 is unmodified except that Section 1.7.3.3 is to be
      replaced by Section 12.1 from the current document.

   o  Section 2 is unmodified except for the specific items listed
      below:

      o  Section 2.10.4 is replaced by Section 12.2 from the current
         document.

      o  Section 2.10.5 is modified as discussed in Section 12.4 of the
         current document.

   o  Sections 3 through 10 are unchanged.

   o  Section 11 is extensively modified as discussed below.

      o  Section 11, exclusive of subsections, is replaced by Sections
         4.1 and 4.2 from the current document.

      o  Section 11.1 is replaced by Section 4.3 from the current
         document.

      o  Sections 11.2, 11.3, 11.3.1, and 11.3.2 are unchanged.

      o  Section 11.4 is replaced by Section 4.5 from the current
         document. For details regarding subsections see below.

         o  New sections corresponding to Sections 4.5.1 and 4.5.2 from
            the current document appear next.

         o  Section 11.4.1 is replaced by Section 4.5.3

         o  Section 11.4.2 is replaced by Section 4.5.4

         o  Section 11.4.3 is replaced by Section 4.5.5

         o  A new section corresponding to Section 4.5.6 from the
            current document appears next.

      o  Section 11.5 is to be deleted.

      o  Section 11.6 is unchanged.

      o  New sections corresponding to Sections 6 and 7 from the current
         document appear next.

      o  Section 11.7 is replaced by Section 8 from the current
         document. For details regarding subsections see below.

         o  Section 11.7.1 is replaced by Section 8.1

         o  Sections 11.7.2, 11.7.2.1, and 11.7.2.2 are deleted.

         o  Section 11.7.3 is replaced by Section 8.2

         o  Section 11.7.4 is replaced by Section 8.3

         o  Sections 11.7.5 and 11.7.5.1 are replaced by Sections 8.4
            and 8.4.1 respectively.

         o  Section 11.7.6 is replaced by Section 8.5

         o  Section 11.7.7, exclusive of subsections, is replaced by
            Section 8.9. Sections 11.7.7.1 and 11.7.7.2 are unchanged.

         o  Section 11.7.8 is replaced by Section 8.6

         o  Section 11.7.9 is replaced by Section 8.7

         o  Section 11.7.10 is replaced by Section 8.8

      o  Sections 11.8, 11.8.1, 11.8.2, 11.9, 11.10, 11.10.1, 11.10.2,
         11.10.3, and 11.11 are unchanged.

      o  New sections corresponding to Sections 9, 10, and 11 from the
         current document appear next as additional sub-sections of
         Section 11. Each of these has subsections, so there is a total
         of seventeen sections added.

   o  Sections 12 through 14 are unchanged.

   o  Section 15 is unmodified except that the description of
      NFS4ERR_MOVED in Section 15.1 is revised as described in
      Section 12.3 of the current document.

   o  Sections 16 and 17 are unchanged.

   o  Section 18 is unmodified except that Section 18.35 is replaced by
      Section 13 in the current document.

   o  Sections 19 through 23 are unchanged.

   In terms of top-level sections, exclusive of appendices:

   o  There is one heavily modified top-level section (Section 11).

   o  There are four other modified top-level sections (Sections 1, 2,
      15, and 18).

   o  The other eighteen top-level sections are unchanged.

   The disposition of sections of [RFC5661] is summarized in the
   following table which provides counts of sections replaced, added,
   deleted, modified, or unchanged. Separate counts are provided for:

   o  Top-level sections.

   o  Sections with TOC entries.

   o  Sections within Section 11.

   o  Sections outside Section 11.

   In this table, the counts for top-level sections and TOC entries are
   for sections including subsections while other counts are for
   sections exclusive of included subsections.

   +------------+------+------+--------+------------+--------+
   | Status     | Top  | TOC  | in 11  | not in 11  | Total  |
   +------------+------+------+--------+------------+--------+
   | Replaced   | 0    | 3    | 17     | 7          | 24     |
   | Added      | 0    | 5    | 22     | 0          | 22     |
   | Deleted    | 0    | 1    | 4      | 0          | 4      |
   | Modified   | 5    | 4    | 0      | 2          | 2      |
   | Unchanged  | 18   | 212  | 16     | 918        | 934    |
   | in RFC5661 | 23   | 220  | 37     | 927        | 964    |
   +------------+------+------+--------+------------+--------+

Acknowledgments

   The authors wish to acknowledge the important role of Andy Adamson of
   NetApp in clarifying the need for trunking discovery functionality,
   and exploring the role of the location attributes in providing the
   necessary support.

   The authors also wish to acknowledge the work of Xuan Qi of Oracle
   with NFSv4.1 client and server prototypes of transparent state
   migration functionality.

   The authors wish to thank Trond Myklebust of Primary Data for his
   comments related to trunking, helping to clarify the role of DNS in
   trunking discovery.

   The authors wish to thank Olga Kornievskaia of NetApp for her helpful
   review comments.

Authors' Addresses

   David Noveck
   NetApp
   1601 Trapelo Road
   Waltham, MA 02451
   United States of America

   Phone: +1 781 572 8038
   Email: davenoveck@gmail.com

   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI 48104
   United States of America

   Phone: +1 248 614 5091
   Email: chuck.lever@oracle.com