NFSv4                                                     D. Noveck, Ed.
Internet-Draft                                                    NetApp
Updates: 5661 (if approved)                                     C. Lever
Intended status: Standards Track                                  ORACLE
Expires: July 7, 2018                                    January 3, 2018

               NFSv4.1 Update for Multi-Server Namespace
                  draft-ietf-nfsv4-mv1-msns-update-00

Abstract

This document presents necessary clarifications and corrections concerning features related to the use of location-related attributes in NFSv4.1. These include migration, which transfers responsibility for a file system from one server to another, and facilities to support trunking by allowing discovery of the set of network addresses to use to access a file system. This document updates RFC5661.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on July 7, 2018.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Requirements Language
3. Preliminaries
   3.1. Terminology
   3.2. Summary of Issues
   3.3. Relationship of this Document to RFC5661
4. Changes to Section 11 of RFC5661
   4.1. Multi-Server Namespace (as updated)
   4.2. Location-related Terminology (to be added)
   4.3. Location Attributes (as updated)
   4.4. Re-organization of Sections 11.4 and 11.5 of RFC5661
   4.5. Uses of Location Information (as updated)
      4.5.1. Combining Multiple Uses in a Single Attribute (to be added)
      4.5.2. Location Attributes and Trunking (to be added)
      4.5.3. File System Replication (as updated)
      4.5.4. File System Migration (as updated)
      4.5.5. Referrals (as updated)
      4.5.6. Changes in a Location Attribute (to be added)
5. Re-organization of Section 11.7 of RFC5661
6. Overview of File Access Transitions (to be added)
7. Effecting Network Address Transitions (to be added)
8. Effecting File System Transitions (as updated)
   8.1. File System Transitions and Simultaneous Access (as updated)
   8.2. Filehandles and File System Transitions (as updated)
   8.3. Fileids and File System Transitions (as updated)
   8.4. Fsids and File System Transitions (as updated)
      8.4.1. File System Splitting (as updated)
   8.5. The Change Attribute and File System Transitions (as updated)
   8.6. Write Verifiers and File System Transitions (as updated)
   8.7. Readdir Cookies and Verifiers and File System Transitions (as updated)
   8.8. File System Data and File System Transitions (as updated)
   8.9. Lock State and File System Transitions (as updated)
9. Transferring State upon Migration (to be added)
   9.1. Transparent State Migration and pNFS (to be added)
10. Client Responsibilities when Access is Transitioned (to be added)
   10.1. Client Transition Notifications (to be added)
   10.2. Performing Migration Discovery (to be added)
   10.3. Overview of Client Response to NFS4ERR_MOVED (to be added)
   10.4. Obtaining Access to Sessions and State after Migration (to be added)
   10.5. Obtaining Access to Sessions and State after Network Address Transfer (to be added)
11. Server Responsibilities Upon Migration (to be added)
   11.1. Server Responsibilities in Effecting Transparent State Migration (to be added)
   11.2. Server Responsibilities in Effecting Session Transfer (to be added)
12. Changes to RFC5661 outside Section 11
   12.1. (Introduction to) Multi-Server Namespace (as updated)
   12.2. Server Scope (as updated)
   12.3. Revised Treatment of NFS4ERR_MOVED
   12.4. Revised Discussion of Server_owner changes
   12.5. Revision to Treatment of EXCHANGE_ID
13. Operation 42: EXCHANGE_ID - Instantiate Client ID (as updated)
14. Security Considerations
15. IANA Considerations
16. References
   16.1. Normative References
   16.2. Informative References
Appendix A. Classification of Document Sections
Appendix B. Updates to RFC5661
Acknowledgments
Authors' Addresses

1. Introduction

This document defines the proper handling, within NFSv4.1, of the location-related attributes fs_locations and fs_locations_info and how necessary changes in those attributes are to be dealt with. The necessary corrections and clarifications parallel those done for NFSv4.0 in [RFC7931] and [I-D.cel-nfsv4-mv0-trunking-update].

A large part of the changes to be made are necessary to clarify the handling of Transparent State Migration in NFSv4.1, which was omitted in [RFC5661]. Many of the issues dealt with in [RFC7931] need to be addressed in the context of NFSv4.1.

Another important issue to be dealt with concerns the handling of multiple entries within location-related attributes that represent different ways to access the same file system. Unfortunately, [RFC5661], while recognizing that these entries can represent different ways to access the same file system, confuses the matter by treating network access paths as "replicas". This makes it difficult for these attributes to be used to obtain information about the network addresses to be used to access particular file system instances, and it engenders confusion between two different sorts of transition: those involving a change of network access paths to the same file system instance and those in which there is a shift between two distinct replicas.

When location information is used to determine the set of network addresses to access a particular file system instance (i.e., to perform trunking discovery), clarification is needed regarding the interaction of trunking and transitions between file system replicas, including migration. Unfortunately, [RFC5661], while it provided a method of determining whether two network addresses were connected to the same server, did not address the issue of trunking discovery, making it necessary to address it in this document.

2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Preliminaries

3.1. Terminology

While most of the terms related to multi-server namespace issues are appropriately defined in the replacement for Section 11 in [RFC5661] and appear in Section 4.2 below, there are a number of terms used outside that context that are explained here.

In this document, the phrase "client ID" always refers to the 64-bit shorthand identifier assigned by the server (a clientid4) and never to the structure that the client uses to identify itself to the server (called an nfs_client_id4 or client_owner in NFSv4.0 and NFSv4.1, respectively). The opaque identifier within those structures is referred to as a "client id string".

It is particularly important to clarify the distinction between trunking detection and trunking discovery. The definitions we present will be applicable to all minor versions of NFSv4, but we will put particular emphasis on how these terms apply to NFS version 4.1.

o  Trunking detection refers to ways of deciding whether two specific network addresses are connected to the same NFSv4 server. The means available to make this determination depend on the protocol version and, in some cases, on the client implementation.

   In the case of NFS version 4.1 and later minor versions, the means of trunking detection are as described by [RFC5661] and are available to every client. Two network addresses connected to the same server are always server-trunkable but are not necessarily session-trunkable.

o  Trunking discovery is a process by which a client using one network address can obtain other addresses that are connected to the same server. Typically, it builds on a trunking detection facility by providing one or more methods by which candidate addresses are made available to the client, which can then use trunking detection to filter them appropriately.

   Despite the support for trunking detection, there was no description of trunking discovery provided in [RFC5661].

Regarding network addresses and the handling of trunking, we use the following terminology:

o  Each NFSv4 server is assumed to have a set of IP addresses to which NFSv4 requests may be sent by clients.
   These are referred to as the server's network addresses.

o  Each network address, when combined with a pathname providing the location of a file system root directory relative to the associated server root file handle, defines a file system network access path.

o  Two network addresses connected to the same server are said to be server-trunkable.

o  Two network addresses connected to the same server such that those addresses can be used to support a single common session are referred to as session-trunkable. Note that two addresses may be server-trunkable without being session-trunkable.

Discussion of the term "replica" is complicated for a number of reasons:

o  Even though the term is used in explaining the issues in [RFC5661] that need to be addressed in this document, a full explanation of this term requires explanation of related terms connected to the location attributes, which are provided in Section 4.2 of the current document.

o  The term is also used in [RFC5661], with a meaning different from that in the current document. In short, in [RFC5661] each replica is identified by a single network access path, while in the current document a set of network access paths that have server-trunkable network addresses and the same root-relative file system pathname is considered to be a single replica with multiple network access paths.

3.2. Summary of Issues

This document explains how clients and servers are to determine the particular network access paths to be used to access a file system. This includes describing how changes to the specific replica or to the set of addresses to be used are to be dealt with, and how transfers of responsibility that need to be made can be dealt with transparently.
This includes cases in which there is a shift between one replica and another and those in which different network access paths are used to access the same replica.

As a result of the following problems in [RFC5661], it is necessary to provide the updates described later in this document.

o  [RFC5661], while it dealt with situations in which various forms of clustering allowed co-ordination of the state assigned by co-operating servers to be used, made no provisions for Transparent State Migration, as introduced by [RFC7530] and corrected and clarified by [RFC7931].

o  Although NFSv4.1 was defined with a clear definition of how trunking detection was to be done, there was no clear specification of how trunking discovery was to be done, despite the fact that the specification clearly indicated that this information could be made available via the location attributes.

o  Because the existence of multiple network access paths to the same file system was dealt with as if there were multiple replicas, issues relating to transitions between replicas could never be clearly distinguished from trunking-related transitions between the addresses used to access a particular file system instance. As a result, in situations in which both migration and trunking configuration changes were involved, neither of these could be clearly dealt with, and the relationship between these two features was not seriously addressed.

o  Because use of two network access paths to the same file system instance (i.e., trunking) was often treated as if two replicas were involved, it was considered that two replicas were being used simultaneously.
   As a result, the treatment of replicas being used simultaneously in [RFC5661] was not clear, as it covered two distinct cases: a single file system instance being accessed by two different network access paths, and two replicas being accessed simultaneously. The limitations of the latter case were not clearly laid out.

The majority of the consequences of these issues are dealt with via the updates in various subsections of Section 4 of the current document, which deal with problems within Section 11 of [RFC5661]. These include:

o  Reorganization made necessary by the fact that two network access paths to the same file system instance need to be distinguished clearly from two different replicas, since the former share locking state and can share session state.

o  The need for a clear statement regarding the desirability of transparent transfer of state, together with a recommendation that either that or a single-fs grace period be provided.

o  Specifically delineating how such transfers are to be dealt with by the client, taking into account the differences from the treatment in [RFC7931] made necessary by the major protocol changes made in NFSv4.1.

o  Discussion of the relationship between transparent state transfer and Parallel NFS (pNFS).

In addition, there are also updates to other sections of [RFC5661], where the consequences of the incorrect assumptions underlying the current treatment of multi-server namespace issues also need to be corrected. These are to be dealt with as described in various subsections of Section 12 of the current document.

o  A revised introductory section regarding multi-server namespace facilities is provided.

o  A more realistic treatment of server scope is provided, which reflects the more limited co-ordination of locking state adopted by servers actually sharing a common server scope.

o  Some confusing text regarding changes in server_owner needs to be clarified.

o  The description of NFS4ERR_MOVED needs to be updated, since two different network access paths to the same file system are no longer considered to be two instances of the same file system.

o  A new treatment of EXCHANGE_ID is needed, replacing that which appeared in Section 18.35 of [RFC5661].

3.3. Relationship of this Document to RFC5661

The role of this document is to explain and specify a set of needed changes to [RFC5661]. All of these changes are related to the multi-server namespace features of NFSv4.1.

This document contains sections that propose additions to and other modifications of [RFC5661], as well as others that explain the reasons for modifications but do not directly affect existing specifications.

In consequence, the sections of this document can be divided into four groups based on how they relate to the eventual updating of the NFSv4.1 specification. Once the update is published, NFSv4.1 will be specified by two documents that need to be read together, until such time as a consolidated specification is produced.

o  Explanatory sections do not contain any material that is meant to update the specification of NFSv4.1. Such sections may contain explanations about why and how changes are to be done, without including any text that is to update [RFC5661] or appear in an eventual consolidated document.

o  Replacement sections contain text that is to replace and thus supersede text within [RFC5661] and then appear in an eventual consolidated document. Replacement sections have the phrase "(as updated)" appended to the section title.

o  Additional sections contain text which, although not replacing anything in [RFC5661], will be part of the specification of NFSv4.1 and will be expected to be part of an eventual consolidated document.
   Additional sections have the phrase "(to be added)" appended to the section title.

o  Editing sections contain some text that replaces text within [RFC5661], although the entire section will not consist of such text and will include other text as well. Such sections make relatively minor adjustments in the existing NFSv4.1 specification, which are expected to be reflected in an eventual consolidated document. Generally, such replacement text appears as a quotation, which may take the form of an indented set of paragraphs.

See Appendix A for a classification of the sections of this document according to the categories above.

When this document is approved and published, [RFC5661] would be significantly updated, with most of the changed sections within the current Section 11 of that document. A detailed discussion of the necessary updates can be found in Appendix B.

4. Changes to Section 11 of RFC5661

A number of sections need to be revised, replacing existing subsections within Section 11 of [RFC5661]:

o  New introductory material, including a terminology section, replaces the existing material in [RFC5661] ranging from the start of the existing Section 11 up to and including the existing Section 11.1. The new material appears in Sections 4.1 through 4.3 below.

o  A significant reorganization of the material in the existing Sections 11.4 and 11.5 (of [RFC5661]) is necessary. The reasons for the reorganization of these sections into a single section with multiple subsections are discussed in Section 4.4 below. This replacement appears as Section 4.5 below.

   New material relating to the handling of the location attributes is contained in Sections 4.5.1 and 4.5.6 below.

o  A major replacement for the existing Section 11.7 of [RFC5661], entitled "Effecting File System Transitions", will appear as Sections 6 through 11 of the current document.
   The reasons for the reorganization of this section into multiple sections are discussed below in Section 5 of the current document.

4.1. Multi-Server Namespace (as updated)

NFSv4.1 supports attributes that allow a namespace to extend beyond the boundaries of a single server. It is desirable that clients and servers support construction of such multi-server namespaces. Use of such multi-server namespaces is OPTIONAL, however, and for many purposes, single-server namespaces are perfectly acceptable. Use of multi-server namespaces can provide many advantages by separating a file system's logical position in a namespace from the (possibly changing) logistical and administrative considerations that result in particular file systems being located on particular servers.

4.2. Location-related Terminology (to be added)

Regarding terminology relating to the construction of multi-server namespaces out of a set of local per-server namespaces:

o  Each server has a set of exported file systems that may be accessed by NFSv4 clients. Typically, this is done by assigning each file system a name within the pseudo-fs associated with the server, although the pseudo-fs may be dispensed with if there is only a single exported file system. Each such file system is part of the server's local namespace and can be considered as a file system instance within a larger multi-server namespace.

o  The set of all exported file systems for a given server constitutes that server's local namespace.

o  In some cases, a server will have a namespace more extensive than its local namespace by using features associated with attributes that provide location information.
   These features, which allow construction of a multi-server namespace, are all described in individual sections below and include referrals (described in Section 4.5.5), migration (described in Section 4.5.4), and replication (described in Section 4.5.3).

o  A file system present in a server's pseudo-fs may have multiple file system instances on different servers associated with it. All such instances are considered replicas of one another.

o  When a file system is present in a server's pseudo-fs, but there is no corresponding local file system, it is said to be "absent". In such cases, all associated instances will be accessed on other servers.

Regarding terminology relating to attributes used in trunking discovery and other multi-server namespace features:

o  Location attributes include the fs_locations and fs_locations_info attributes.

o  Location entries are the individual file system locations in the location attributes. Each such entry specifies a server, in the form of a host name or an IP address, and an fs name, which is the location of the file system within the server's pseudo-fs. The exact form of the location entry varies with the particular location attribute used, as described in Section 4.3.

o  Location elements are derived from location entries, and each describes a particular network access path. Location elements need not appear within a location attribute, but the existence of each location element derives from a corresponding location entry. When a location entry specifies an IP address, there is only a single corresponding location element. Location entries that contain a host name are resolved using DNS and may result in one or more location elements. All location elements consist of a location address, which is the IP address of an interface to a server, and an fs name, which is the location of the file system within the server's pseudo-fs.
   The fs name is empty if the server has no pseudo-fs and only a single exported file system at the root filehandle.

o  Two location elements are said to be server-trunkable if they specify the same fs name and their location addresses are server-trunkable.

o  Two location elements are said to be session-trunkable if they specify the same fs name and their location addresses are session-trunkable.

Each set of server-trunkable location elements defines a set of available network access paths to a particular file system. When there are multiple such file systems, each of which contains the same data, these file systems are considered replicas of one another. Logically, such replication is symmetric, since the fs currently in use and an alternate fs are replicas of each other. Often, in other documents, the term "replica" is not applied to the fs currently in use, despite the fact that the replication relation is inherently symmetric.

4.3. Location Attributes (as updated)

NFSv4.1 contains RECOMMENDED attributes that provide information about how (i.e., at what network address and namespace position) a given file system may be accessed. As a result, file systems in the namespace of one server can be associated with one or more instances of that file system on other servers. These attributes contain location entries specifying a server address target (either as a DNS name representing one or more IP addresses or as a specific IP address) together with the pathname of that file system within the associated single-server namespace.

The fs_locations_info RECOMMENDED attribute allows specification of one or more file system instance locations where the data corresponding to a given file system may be found.
This attribute provides to the client, in addition to information about file system instance locations, significant information about the various file system instance choices (e.g., priority for use, writability, currency, etc.). It also includes information to help the client efficiently effect as seamless a transition as possible among multiple file system instances, when and if that should be necessary.

Within the fs_locations_info attribute, each fs_locations_server4 entry corresponds to a location entry, with the fls_server field designating the server and the location pathname within the server's pseudo-fs given by the fli_rootpath field of the encompassing fs_locations_item4.

The fs_locations attribute defined in NFSv4.0 is also a part of NFSv4.1. This attribute only allows specification of the file system locations where the data corresponding to a given file system may be found. Servers should make this attribute available whenever fs_locations_info is supported, but client use of fs_locations_info is preferable.

Within the fs_locations attribute, each fs_location4 contains a location entry, with the server field designating the server and the rootpath field giving the location pathname within the server's pseudo-fs.

4.4. Re-organization of Sections 11.4 and 11.5 of RFC5661

Previously, issues related to the fact that multiple location entries directed the client to the same file system instance were dealt with in a separate Section 11.5 of [RFC5661]. Because of the new treatment of trunking, these issues now belong within Section 4.5 below.

In this new section of the current document, trunking is dealt with in Section 4.5.2, together with the other uses of location information described in Sections 4.5.3, 4.5.4, and 4.5.5.

4.5. Uses of Location Information (as updated)

The location attributes (i.e.,
fs_locations and fs_locations_info), together with the possibility of absent file systems, provide a number of important facilities in providing reliable, manageable, and scalable data access.

When a file system is present, these attributes can provide:

o  The locations of alternative replicas, to be used to access the same data in the event of server failures, communications problems, or other difficulties that make continued access to the current replica impossible or otherwise impractical. Provision and use of such alternate replicas is referred to as "replication" and is discussed in Section 4.5.3 below.

o  The network address(es) to be used to access the current file system instance or replicas of it. Client use of this information is discussed in Section 4.5.2 below.

Under some circumstances, multiple replicas may be used simultaneously to provide higher-performance access to the file system in question, although the lack of state sharing between servers may be an impediment to such use.

When a file system is present and becomes absent, clients can be given the opportunity to have continued access to their data, using a different replica. In this case, a continued attempt to use the data in the now-absent file system will result in an NFS4ERR_MOVED error and, at that point, the successor replica or set of possible replica choices can be fetched and used to continue access. Transfer of access to the new replica location is referred to as "migration" and is discussed in Section 4.5.4 below.

Where a file system was previously absent, specification of file system location provides a means by which file systems located on one server can be associated with a namespace defined by another server, thus allowing a general multi-server namespace facility.
A designation of such a remote instance, in place of a file system never previously present, is called a "pure referral" and is discussed in Section 4.5.5 below.

Because client support for location-related attributes is OPTIONAL, a server may (but is not required to) take action to hide migration and referral events from clients that lack such support, by acting as a proxy, for example. The server can determine the presence of client support from the arguments of the EXCHANGE_ID operation (see Section 13.3 in the current document).

4.5.1. Combining Multiple Uses in a Single Attribute (to be added)

A location attribute will sometimes contain information relating to the location of multiple replicas, which may be used in different ways:

o Location entries that relate to the file system instance currently in use provide trunking information, allowing the client to find additional network addresses by which the instance may be accessed.

o Location entries that provide information about replicas to which access is to be transferred.

o Other location entries that relate to replicas that are available to use in the event that access to the current replica becomes unsatisfactory.

In order to simplify client handling and allow the best choice of replicas to access, the server should adhere to the following guidelines:

o All location entries that relate to a single file system instance should be adjacent.

o Location entries that relate to the instance currently in use should appear first.

o Location entries that relate to replica(s) to which migration is occurring should appear before replicas that are available for later use if the current replica should become inaccessible.

4.5.2. Location Attributes and Trunking (to be added)

Trunking is the use of multiple connections between a client and server in order to increase the speed of data transfer.
A client may determine the set of network addresses to use to access a given file system in a number of ways:

o When the name of the server is known to the client, it may use DNS to obtain a set of network addresses to use in accessing the server.

o It may fetch the location attribute for the file system. That attribute will provide either the name of the server, which can be turned into a set of network addresses using DNS, or a set of server-trunkable location entries giving the addresses that the server specifies as desirable for accessing the file system in question.

The server can provide location entries that include either names or network addresses. It might use the latter form because of DNS-related security concerns or because the set of addresses to be used might require active management by the server.

Location entries used to discover candidate addresses for use in trunking are subject to change, as discussed in Section 4.5.6 below. The client may respond to such changes by using additional addresses once they are verified or by ceasing to use existing ones. The server can force the client to cease using an address by returning NFS4ERR_MOVED when that address is used to access a file system. This allows a transfer of access similar to migration, although the same file system instance is accessed throughout.

4.5.3. File System Replication (as updated)

The fs_locations and fs_locations_info attributes provide alternative locations, to be used to access data in place of or in addition to the current file system instance. On first access to a file system, the client should obtain the set of alternate locations by interrogating the fs_locations or fs_locations_info attribute, with the latter being preferred.
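After fetching a location attribute as described above, the client has to sort its entries into two kinds. Some entries only supply additional addresses for the instance already in use (Section 4.5.2). Others designate different replicas. The client can also check whether the server followed the ordering guidelines of Section 4.5.1. The Python sketch below illustrates this under stated assumptions. LocationEntry, classify_entries, and follows_guidelines are hypothetical names. Each entry is assumed to be already tagged with the file system instance it designates; how a client determines that (for example, through trunking detection) is outside the scope of this sketch.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LocationEntry:
    """Hypothetical client-side view of one location entry."""
    server: str    # server name or network address
    rootpath: str  # pathname within that server's pseudo-fs
    instance: str  # ASSUMPTION: tag naming the file system instance designated


def classify_entries(
    entries: List[LocationEntry], current: str
) -> Tuple[List[LocationEntry], Dict[str, List[LocationEntry]]]:
    """Return (trunking entries for the current instance, other replicas).

    Other replicas are keyed by instance tag in order of first appearance,
    so replicas the server lists earlier (e.g., migration targets) come first.
    """
    trunking = [e for e in entries if e.instance == current]
    replicas: Dict[str, List[LocationEntry]] = {}
    for e in entries:
        if e.instance != current:
            replicas.setdefault(e.instance, []).append(e)
    return trunking, replicas


def follows_guidelines(entries: List[LocationEntry], current: str) -> bool:
    """Check the first two Section 4.5.1 guidelines (adjacency, current first).

    The third guideline (migration targets before other replicas) is not
    checked, because this sketch does not know which replicas are targets.
    """
    runs = [inst for inst, _ in groupby(entries, key=lambda e: e.instance)]
    adjacent = len(runs) == len(set(runs))
    current_first = current not in runs or runs[0] == current
    return adjacent and current_first
```

Because replicas keep their order of first appearance, a client can honor the server's preferred ordering without re-sorting the list.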
In the event that server failures, communications problems, or other difficulties make continued access to the current file system impossible or otherwise impractical, the client can use the alternate locations as a way to get continued access to its data.

The alternate locations may be physical replicas of the (typically read-only) file system data, or they may provide for the use of various forms of server clustering in which multiple servers provide alternate ways of accessing the same physical file system. How these different modes of file system transition are represented within the fs_locations and fs_locations_info attributes, and how the client deals with file system transition issues, will be discussed in detail below.

4.5.4. File System Migration (as updated)

When a file system is present and becomes absent, clients can be given the opportunity to have continued access to their data, at an alternate location, as specified by a location attribute. This migration of access to another replica includes the ability to retain locks across the transition, either by reclaim or by Transparent State Migration.

Typically, a client will be accessing the file system in question, get an NFS4ERR_MOVED error, and then use a location attribute to determine the new location of the data. When fs_locations_info is used, additional information will be available that will define the nature of the client's handling of the transition to a new server.

Such migration can be helpful in providing load balancing or general resource reallocation. The protocol does not specify how the file system will be moved between servers. It is anticipated that a number of different server-to-server transfer mechanisms might be used, with the choice left to the server implementer.
The NFSv4.1 protocol specifies the method used to communicate the migration event between client and server.

The new location may be, in the case of various forms of server clustering, another server providing access to the same physical file system. The client's responsibilities in dealing with this transition will depend on whether migration has occurred and the means the server has chosen to provide continuity of locking state. These issues will be discussed in detail below.

Although a single successor location is typical, multiple locations may be provided. When multiple locations are provided, the client will typically use the first one provided. If that is inaccessible for some reason, later ones can be used. In such cases, the client might consider the transition to the new replica to be a migration event, although it would lose access to locking state if it did so.

When an alternate location is designated as the target for migration, it must designate the same data (with metadata being the same to the degree indicated by the fs_locations_info attribute). Where file systems are writable, a change made on the original file system must be visible on all migration targets. Where a file system is not writable but represents a read-only copy (possibly periodically updated) of a writable file system, similar requirements apply to the propagation of updates. Any change visible in the original file system must already be effected on all migration targets, to avoid any possibility that a client, in effecting a transition to the migration target, will see any reversion in file system state.

4.5.5. Referrals (as updated)

Referrals allow the server to associate a file system located on one server with a file system located on another server.
When this includes the use of pure referrals, servers are provided a way of placing a file system in a location within the namespace essentially without respect to its physical location on a particular server. This allows a single server or a set of servers to present a multi-server namespace that encompasses file systems located on a wider range of servers. Some likely uses of this facility include establishment of site-wide or organization-wide namespaces, with the eventual possibility of combining such namespaces into a truly global namespace.

Referrals occur when a client determines, upon first referencing a position in the current namespace, that it is part of a new file system and that the file system is absent. When this occurs, typically by receiving the error NFS4ERR_MOVED, the actual location or locations of the file system can be determined by fetching the fs_locations or fs_locations_info attribute.

The location-related attribute may designate a single file system location or multiple file system locations, to be selected based on the needs of the client. The server, in the fs_locations_info attribute, may specify priorities to be associated with various file system location choices. The server may assign different priorities to different locations as reported to individual clients, in order to adapt to client physical location or to effect load balancing. When both read-only and read-write file systems are present, some of the read-only locations might not be absolutely up-to-date (as they would have to be in the case of replication and migration). Servers may also specify file system locations that include client-substituted variables so that different clients are referred to different file systems (with different data contents) based on client attributes such as CPU architecture.
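The choice among multiple referral locations described above can be sketched as follows. This is a minimal Python sketch. It does not follow the actual encoding used by fs_locations_info. Choice, its priority and writable fields, and pick_location are hypothetical simplifications; the real priority information is carried as described in Section 11.10.1 of [RFC5661]. The sketch assumes that a lower priority value is preferred and that the client supplies its own reachability check.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Choice:
    """Hypothetical, simplified view of one file system location choice."""
    server: str
    rootpath: str
    priority: int   # ASSUMPTION: lower value = more preferred
    writable: bool


def pick_location(
    choices: Iterable[Choice],
    reachable: Callable[[Choice], bool],
    need_write: bool = False,
) -> Optional[Choice]:
    """Return the most preferred reachable choice, or None if there is none.

    Read-only choices are skipped when the client needs to write.
    sorted() is stable, so choices with equal priority keep server order.
    """
    candidates = [c for c in choices if c.writable or not need_write]
    for choice in sorted(candidates, key=lambda c: c.priority):
        if reachable(choice):
            return choice
    return None
```

The server may report different priorities to different clients, so the same client code can select different targets on different clients. That is how the load balancing described above would take effect.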
When the fs_locations_info attribute is such that there are multiple possible targets listed, the relationships among them may be important to the client in selecting which one to use. The same rules specified in Section 4.5.4 above regarding multiple migration targets apply to these multiple replicas as well. For example, the client might prefer a writable target on a server that has additional writable replicas to which it subsequently might switch. Note that, as distinguished from the case of replication, there is no need to deal with the case of propagation of updates made by the current client, since the current client has not accessed the file system in question.

Use of multi-server namespaces is enabled by NFSv4.1 but is not required. The use of multi-server namespaces and their scope will depend on the applications used and system administration preferences.

Multi-server namespaces can be established by a single server providing a large set of pure referrals to all of the included file systems. Alternatively, a single multi-server namespace may be administratively segmented with separate referral file systems (on separate servers) for each separately administered portion of the namespace. The top-level referral file system or any segment may use replicated referral file systems for higher availability.

Multi-server namespaces are, for the most part, uniform, in that the same data made available to one client at a given location in the namespace is made available to all clients at that location. However, there are facilities provided that allow different clients to be directed to different sets of data, to enable adaptation to such client characteristics as CPU architecture.

4.5.6. Changes in a Location Attribute (to be added)

Although clients will typically fetch a location attribute when first accessing a file system and when NFS4ERR_MOVED is returned, a client can choose to fetch the attribute periodically, in which case the value fetched may change over time.

For clients not prepared to access multiple replicas simultaneously (see Section 8.1 of the current document), the various cases of change are handled as follows:

o Changes in the list of replicas or in the network addresses associated with replicas do not require immediate action. The client will typically update its list of replicas to reflect the new information.

o Additions to the list of network addresses for the current file system instance need not be acted on promptly. However, the client can choose to use the new address whenever it needs to switch access to a new replica.

o Deletions from the list of network addresses for the current file system instance need not be acted on immediately. However, the client might need to be prepared for a shift in access whenever the server returns NFS4ERR_MOVED to indicate that a network access path is not usable for accessing the current file system.

For clients that are prepared to access several replicas simultaneously, the following additional cases need to be addressed. As in the cases discussed above, changes in the set of replicas need not be acted upon promptly, although the client has the option of adjusting its access even in the absence of difficulties that would lead to a new replica being selected.

o When a new replica is added which may be accessed simultaneously with one currently in use, the client is free to use the new replica immediately.

o When a replica currently in use is deleted from the list, the client need not cease using it immediately.
However, since the server may subsequently force such use to cease (by returning NFS4ERR_MOVED), clients can decide to limit the need for later state transfer. For example, new opens might be done on other replicas, rather than on one not present in the list.

5. Re-organization of Section 11.7 of RFC5661

The material in Section 11.7 of [RFC5661] has been reorganized and augmented as specified below:

o Because there can be a shift of the network access paths used to access a file system instance without any shift between replicas, a new Section 6 in the current document distinguishes between those cases in which there is a shift between distinct replicas and those involving a shift in network access paths with no shift between replicas.

As a result, a new Section 7 in the current document deals with network address transitions, while the bulk of the former Section 11.7 (in [RFC5661]) is replaced by Section 8 in the current document, which is now limited to cases in which there is a shift between two different sets of replicas.

o The additional Section 9 in the current document discusses the case in which a shift to a different replica is made and state is transferred, so that the client has continued access to the accumulated locking state on the new server.

o The additional Section 10 in the current document discusses the client's response to access transitions, how it determines whether migration has occurred, and how it gets access to any transferred locking and session state.

o The additional Section 11 in the current document discusses the responsibilities of the source and destination servers when transferring locking and session state.

6. Overview of File Access Transitions (to be added)

File access transitions are of two types:

o Those that involve a transition from accessing the current replica to another one in connection with either replication or migration. How these are dealt with is discussed in Section 8 of the current document.

o Those in which access to the current file system instance is retained, while the network path used to access that instance is changed. This case is discussed in Section 7 of the current document.

7. Effecting Network Address Transitions (to be added)

The addresses used to access a particular file system instance may change in a number of ways, as listed below. In each of these cases, the same filehandles, stateids, client IDs, and session are used to continue access, with a continuity of lock state.

o When use of a particular address is to cease and there is also one currently in use which is server-trunkable with it, requests that would have been issued on the address whose use is to be discontinued can be issued on the remaining address(es). When an address is not a session-trunkable one, the request might need to be modified to reflect the fact that a different session will be used.

o When there are no potential replacement addresses in use but there are valid addresses session-trunkable with the one whose use is to be discontinued, the client can use BIND_CONN_TO_SESSION to access the existing session using the new address. Although the target session will generally be accessible, there may be cases in which that session is no longer accessible, in which case a new session can be created to provide the client continued access to the existing instance.
o When there is no potential replacement address in use and there are no valid addresses session-trunkable with the one whose use is to be discontinued, other server-trunkable addresses may be used to provide continued access. Although use of CREATE_SESSION is available to provide continued access to the existing instance, servers have the option of providing continued access to the existing session through the new network access path in a fashion similar to that provided by session migration (see Section 9 of the current document). To take advantage of this possibility, clients can perform an initial BIND_CONN_TO_SESSION, as in the previous case, and use CREATE_SESSION only when that fails.

8. Effecting File System Transitions (as updated)

There is a range of situations in which there is a change to be effected in the set of replicas used to access a particular file system. Some of these may involve an expansion or contraction of the set of replicas used, as discussed in Section 8.1 below.

For reasons explained in that section, most transitions will involve a transition from a single replica to a corresponding replacement replica. When effecting replica transition, some types of sharing between the replicas may affect handling of the transition, as described in Sections 8.2 through 8.8 below. The attribute fs_locations_info provides helpful information to allow the client to determine the degree of inter-replica sharing.

With regard to some types of state, the degree of continuity across the transition depends on the occasion prompting the transition, with transitions initiated by the servers (i.e., migration) offering much more scope for a non-disruptive transition than cases in which the client on its own shifts its access to another replica (i.e., replication).
This issue potentially applies to locking state and to session state, which are dealt with below as follows:

o An introduction to the possible means of providing continuity of these areas appears in Section 8.9 below.

o Transparent State Migration is introduced in Section 9 of the current document. The possible transfer of session state is addressed there as well.

o The client handling of transitions, including determining how to deal with the various means that the server might take to supply effective continuity of locking state, is discussed in Section 10 of the current document.

o The servers' (source and destination) responsibilities in effecting Transparent State Migration of locking and session state are discussed in Section 11 of the current document.

8.1. File System Transitions and Simultaneous Access (as updated)

The fs_locations_info attribute (described in Section 11.10.1 of [RFC5661]) may indicate that two replicas may be used simultaneously (see Section 11.7.2.1 of [RFC5661] for details). Although situations in which multiple replicas may be accessed simultaneously are somewhat similar to those in which a single replica is accessed by multiple network addresses, there are important differences, since locking state is not shared among multiple replicas.

Because of this difference in state handling, many clients will not have the ability to take advantage of the fact that such replicas represent the same data. Such clients will not be prepared to use multiple replicas simultaneously but will access each file system using only a single replica, although the replica selected may make multiple server-trunkable addresses available.

Clients that are prepared to use multiple replicas simultaneously will divide opens among replicas however they choose.
Once that choice is made, any subsequent transitions will treat the set of locking state associated with each replica as a single entity.

For example, if one of the replicas becomes unavailable, access will be transferred to a different replica, also capable of simultaneous access with the one still in use.

When there is no such replica, the transition may be to the replica already in use. At this point, the client has a choice between merging the locking state for the two replicas under the aegis of the sole replica in use or treating these separately, until another replica capable of simultaneous access presents itself.

8.2. Filehandles and File System Transitions (as updated)

There are a number of ways in which filehandles can be handled across a file system transition. These can be divided into two broad classes, depending upon whether the two file systems across which the transition happens share sufficient state to effect some sort of continuity of file system handling.

When there is no such cooperation in filehandle assignment, the two file systems are reported as being in different handle classes. In this case, all filehandles are assumed to expire as part of the file system transition. Note that this behavior does not depend on the fh_expire_type attribute and supersedes the specification of the FH4_VOL_MIGRATION bit, which only affects behavior when fs_locations_info is not available.

When there is cooperation in filehandle assignment, the two file systems are reported as being in the same handle class. In this case, persistent filehandles remain valid after the file system transition, while volatile filehandles (excluding those that are only volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration on the target server.

8.3. Fileids and File System Transitions (as updated)

In NFSv4.0, the issue of continuity of fileids in the event of a file system transition was not addressed. The general expectation had been that in situations in which the two file system instances are created by a single vendor using some sort of file system image copy, fileids would be consistent across the transition, while in the analogous multi-vendor transitions they would not. This poses difficulties, especially for the client without special knowledge of the transition mechanisms adopted by the server. Note that although fileid is not a REQUIRED attribute, many servers support fileids and many clients provide APIs that depend on fileids.

It is important to note that while clients themselves may have no trouble with a fileid changing as a result of a file system transition event, applications do typically have access to the fileid (e.g., via stat). The result is that an application may work perfectly well if there is no file system instance transition or if any such transition is among instances created by a single vendor, yet be unable to deal with the situation in which a multi-vendor transition occurs at the wrong time.

Providing the same fileids in a multi-vendor (multiple server vendors) environment has generally been held to be quite difficult. While there is work to be done, it needs to be pointed out that this difficulty is partly self-imposed. Servers have typically identified fileid with inode number, i.e., with a quantity used to find the file in question. This identification poses special difficulties for migration of a file system between vendors where assigning the same index to a given file may not be possible. Note here that a fileid is not required to be useful for finding the file in question, only to be unique within the given file system.
Servers prepared to accept a fileid as a single piece of metadata and store it apart from the value used to index the file information can relatively easily maintain a fileid value across a migration event, allowing a truly transparent migration event.

In any case, where servers can provide continuity of fileids, they should, and the client should be able to find out that such continuity is available and take appropriate action. Information about the continuity (or lack thereof) of fileids across a file system transition is represented by specifying whether the file systems in question are of the same fileid class.

Note that when consistent fileids do not exist across a transition (either because there is no continuity of fileids or because fileid is not a supported attribute on one of the instances involved), and there are no reliable filehandles across a transition event (either because there is no filehandle continuity or because the filehandles are volatile), the client is in a position where it cannot verify that files it was accessing before the transition are the same objects. It is forced to assume that no object has been renamed, and, unless there are guarantees that provide this (e.g., the file system is read-only), problems for applications may occur. Therefore, use of such configurations should be limited to situations where the problems that this may cause can be tolerated.

8.4. Fsids and File System Transitions (as updated)

Since fsids are generally only unique on a per-server basis, it is likely that they will change during a file system transition. Clients should not make the fsids received from the server visible to applications, since they may not be globally unique and may change during a file system transition event.
Applications are best served if they are isolated from such transitions to the extent possible.

Although normally a single source file system will transition to a single target file system, there is a provision for splitting a single source file system into multiple target file systems, by specifying the FSLI4F_MULTI_FS flag.

8.4.1. File System Splitting (as updated)

When a file system transition is made and the fs_locations_info attribute indicates that the file system in question may be split into multiple file systems (via the FSLI4F_MULTI_FS flag), the client SHOULD do GETATTRs to determine the fsid attribute on all known objects within the file system undergoing transition to determine the new file system boundaries.

Clients may maintain the fsids passed to existing applications by mapping all of the fsids for the descendant file systems to the common fsid used for the original file system.

Splitting a file system may be done on a transition between file systems of the same fileid class, since the fact that fileids are unique within the source file system ensures they will be unique in each of the target file systems.

8.5. The Change Attribute and File System Transitions (as updated)

Since the change attribute is defined as a server-specific one, change attributes fetched from one server are normally presumed to be invalid on another server. Such a presumption is troublesome since it would invalidate all cached change attributes, requiring refetching. Even more disruptive, the absence of any assured continuity for the change attribute means that even if the same value is retrieved on refetch, no conclusions can be drawn as to whether the object in question has changed.
The identical change attribute could be merely an artifact of a modified file with a different change attribute construction algorithm, with that new algorithm just happening to result in an identical change value.

When the two file systems have consistent change attribute formats, and this fact is communicated to the client by reporting them in the same change class, the client may assume a continuity of change attribute construction and handle this situation just as it would be handled without any file system transition.

8.6. Write Verifiers and File System Transitions (as updated)

In a file system transition, the two file systems may be clustered in the handling of unstably written data. When this is the case, and the two file systems belong to the same write-verifier class, write verifiers returned from one system may be compared to those returned by the other and superfluous writes avoided.

When two file systems belong to different write-verifier classes, any verifier generated by one must not be compared to one provided by the other. Instead, the two verifiers should be treated as not equal even when the values are identical.

8.7. Readdir Cookies and Verifiers and File System Transitions (as updated)

In a file system transition, the two file systems may be consistent in their handling of READDIR cookies and verifiers. When this is the case, and the two file systems belong to the same readdir class, READDIR cookies and verifiers from one system may be recognized by the other. READDIR operations started on one server may then be validly continued on the other, simply by presenting to the second file system the cookie and verifier returned by a READDIR operation done on the first.
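The class-based rules in Sections 8.5 through 8.7 reduce to a few checks that a client makes after a transition. The Python sketch below is illustrative only, and its function names are hypothetical. It assumes the client has already derived the three same-class flags from fs_locations_info. For READDIR across different readdir classes, the sketch restarts at cookie 0 with a zero verifier. That is only one plausible way to act as if the verifier had been rejected; this document does not mandate it.

```python
from typing import Tuple

ZERO_VERIFIER = bytes(8)  # NFSv4 verifiers are 8 octets


def change_attr_unchanged(cached: int, fetched: int,
                          same_change_class: bool) -> bool:
    """True only if the object can be presumed unchanged (Section 8.5)."""
    # Across change classes, an equal value proves nothing.
    return same_change_class and cached == fetched


def must_resend_uncommitted(old_verf: bytes, new_verf: bytes,
                            same_write_verifier_class: bool) -> bool:
    """True if uncommitted WRITEs must be resent (Section 8.6)."""
    # Verifiers from different classes are treated as not equal,
    # even when the values happen to be identical.
    if not same_write_verifier_class:
        return True
    return old_verf != new_verf


def readdir_resume_point(cookie: int, verifier: bytes,
                         same_readdir_class: bool) -> Tuple[int, bytes]:
    """Cookie and verifier to present on the new server (Section 8.7)."""
    if same_readdir_class:
        return cookie, verifier  # continue where the listing left off
    # ASSUMPTION: restart the directory listing from the beginning.
    return 0, ZERO_VERIFIER
```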
When two file systems belong to different readdir classes, any READDIR cookie and verifier generated by one is not valid on the second and must not be presented to that server by the client. The client should act as if the verifier had been rejected.

8.8. File System Data and File System Transitions (as updated)

When multiple replicas exist and are used simultaneously or in succession by a client, applications using them will normally expect that they contain either the same data or data that is consistent with the normal sorts of changes that are made by other clients updating the data of the file system (with metadata being the same to the degree indicated by the fs_locations_info attribute). However, when multiple file systems are presented as replicas of one another, the precise relationship between the data of one and the data of another is not, as a general matter, specified by the NFSv4.1 protocol. It is quite possible to present as replicas file systems where the data of those file systems is sufficiently different that some applications have problems dealing with the transition between replicas. The namespace will typically be constructed so that applications can choose an appropriate level of support, so that in one position in the namespace a varied set of replicas will be listed, while in another only those that are up-to-date may be considered replicas. The protocol does define three special cases of the relationship among replicas to be specified by the server and relied upon by clients:

o When multiple replicas exist and are used simultaneously by a client (see the FSLIB4_CLSIMUL definition within fs_locations_info), they must designate the same data.
   Where file systems are writable, a change made on one instance must be visible on all instances, immediately upon the earlier of the return of the modifying requester or the visibility of that change on any of the associated replicas. This allows a client to use these replicas simultaneously without any special adaptation to the fact that there are multiple replicas, beyond adapting to the fact that locks obtained on one replica are maintained separately (i.e., under a different client ID). In this case, locks (whether share reservations or byte-range locks) and delegations obtained on one replica are immediately reflected on all replicas, in the sense that access from all other servers is prevented regardless of the replica used. However, because the servers are not required to treat two associated client IDs as representing the same client, it is best to access each file using only a single client ID.

o When one replica is designated as the successor instance to another existing instance after the return of NFS4ERR_MOVED (i.e., the case of migration), the client may depend on the fact that all changes written to stable storage on the original instance are written to stable storage of the successor (uncommitted writes are dealt with in Section 8.6 above).

o Where a file system is not writable but represents a read-only copy (possibly periodically updated) of a writable file system, clients have similar requirements with regard to the propagation of updates. They may need a guarantee that any change visible on the original file system instance must be immediately visible on any replica before the client transitions access to that replica, in order to avoid any possibility that a client, in effecting a transition to a replica, will see any reversion in file system state.
   The specific means of this guarantee varies based on the value of the fss_type field that is reported as part of the fs_status attribute (see Section 11.11 of [RFC5661]). Since these file systems are presumed to be unsuitable for simultaneous use, there is no specification of how locking is handled; in general, locks obtained on one file system will be separate from those on others. Since these are expected to be read-only file systems, this is not likely to pose an issue for clients or applications.

8.9. Lock State and File System Transitions (as updated)

While accessing a file system, clients obtain locks enforced by the server that may prevent actions by other clients that are inconsistent with those locks.

When access is transferred between replicas, clients need to be assured that the actions disallowed by holding these locks cannot have occurred during the transition. This can be ensured by the methods below. If at least one of these is not implemented, clients will not be assured of continuity of lock possession across a migration event.

o Providing the client an opportunity to re-obtain its locks via a per-fs grace period on the destination server. Because the lock reclaim mechanism was originally defined to support server reboot, it implicitly assumes that filehandles on reclaim will be the same as those at open. In the case of migration, this requires that source and destination servers use the same filehandles, as evidenced by using the same server scope (see Section 12.2 of the current document) or by showing this agreement using fs_locations_info (see Section 8.2 above).

o Transferring locking state as part of the transition, as described in Section 9 of the current document, to provide Transparent State Migration.
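The conditions under which each of the two methods listed above can preserve lock possession can be summarized in a small decision function. This is a sketch under stated assumptions: the TransitionInfo fields and the function name are hypothetical summaries of what is known about the transition (for example, from comparing server scopes or from fs_locations_info), not protocol attributes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TransitionInfo:
    # All fields are hypothetical summaries, not protocol attributes.
    same_server_scope: bool      # source and destination report the same scope
    fh_agreement_in_fsli: bool   # fs_locations_info shows filehandle agreement
    state_transferred: bool      # servers performed Transparent State Migration
    per_fs_grace_period: bool    # destination offers a per-fs grace period


def lock_continuity_methods(t: TransitionInfo) -> List[str]:
    """Return the methods that can preserve lock possession across a
    migration event. An empty list means continuity is not assured."""
    methods = []
    if t.state_transferred:
        # Usable even when filehandles differ across the migration event.
        methods.append("transparent-state-migration")
    same_filehandles = t.same_server_scope or t.fh_agreement_in_fsli
    if t.per_fs_grace_period and same_filehandles:
        # Reclaim assumes filehandles are unchanged, so it is only usable
        # when source and destination agree on filehandles.
        methods.append("reclaim")
    return methods
```

Since the two methods are not mutually exclusive, the function can return both, which corresponds to a server that transfers state transparently but still allows reclaim of a lock that could not be transferred.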
Of these, Transparent State Migration provides the smoother experience for clients in that there is no grace-period-based delay before new locks can be obtained. However, it requires a greater degree of inter-server co-ordination. In general, the servers taking part in migration are free to provide either facility. However, when the filehandles can differ across the migration event, Transparent State Migration is the only available means of providing the needed functionality.

It should be noted that these two methods are not mutually exclusive and that a server might well provide both. In particular, if there is some circumstance preventing a specific lock from being transferred transparently, the server can allow it to be reclaimed.

9. Transferring State upon Migration (to be added)

When the transition is a result of a server-initiated decision to transition access and the source and destination servers have implemented appropriate co-operation, it is possible to:

o Transfer locking state from the source to the destination server, in a fashion similar to that provided by Transparent State Migration in NFSv4.0, as described in [RFC7931]. Server responsibilities are described in Section 11.1 of the current document.

o Transfer session state from the source to the destination server. Server responsibilities in effecting such a transfer are described in Section 11.2 of the current document.

The means by which the client determines which of these transfer events has occurred are described in Section 10 of the current document.

9.1. Transparent State Migration and pNFS (to be added)

When pNFS is involved, the protocol is capable of supporting:

o Migration of the Metadata Server (MDS), leaving the Data Servers (DS's) in place.

o Migration of the file system as a whole, including the MDS and associated DS's.
o Replacement of one DS by another.

o Migration of a pNFS file system to one in which pNFS is not used.

o Migration of a file system not using pNFS to one in which layouts are available.

Migration of the MDS function is directly supported by Transparent State Migration. Layout state will normally be transparently transferred, just as other state is. As a result, Transparent State Migration provides a framework in which, given appropriate inter-MDS data transfer, one MDS can be substituted for another.

Migration of the file system function as a whole can be accomplished by recalling all layouts as part of the initial phase of the migration process. As a result, IO will be done through the MDS during the migration process, and new layouts can be granted once the client is interacting with the new MDS. An MDS can also effect this sort of transition by revoking all layouts as part of Transparent State Migration, as long as the client is notified about the loss of locking state.

In order to allow migration to a file system on which pNFS is not supported, clients need to be prepared for a situation in which layouts are not available or supported on the destination file system, and so direct IO requests to the destination server, rather than depending on layouts being available.

Replacement of one DS by another is not addressed by migration as such but can be effected by an MDS recalling layouts for the DS to be replaced and issuing new ones to be served by the successor DS.

Migration may transfer a file system from a server that does not support pNFS to one that does. In order to properly adapt to this situation, clients that support pNFS, but function adequately in its absence, should check for pNFS support when a file system is migrated and be prepared to use pNFS when support is available on the destination.
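The last two paragraphs imply that a pNFS-capable client should re-evaluate its IO path after a migration rather than assume the source's configuration carries over. A minimal sketch, assuming the client compares the layout types it implements with the value of the fs_layout_type attribute fetched from the destination (modelled here as a list of layout-type numbers, empty when pNFS is not supported); the function name and the string results are hypothetical.

```python
from typing import Iterable

LAYOUT4_NFSV4_1_FILES = 1   # layout type numbers as defined in RFC 5661
LAYOUT4_BLOCK_VOLUME = 3


def io_path_after_migration(client_layout_types: Iterable[int],
                            dest_fs_layout_types: Iterable[int]) -> str:
    """Choose how to do IO once the migrated file system is reached at
    its destination."""
    usable = set(client_layout_types) & set(dest_fs_layout_types)
    if usable:
        # pNFS is available: request layouts (LAYOUTGET). If no layout is
        # granted, the client must still fall back to IO through the server.
        return "pnfs"
    # No common layout type, or no pNFS at all: direct IO to the server.
    return "server"
```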
10. Client Responsibilities when Access is Transitioned (to be added)

For a client to respond to an access transition, it must be made aware of it. The ways in which this can happen are discussed in Section 10.1, which discusses indications that a specific file system access path has transitioned as well as situations in which additional activity is necessary to determine the set of file systems that have been migrated. Section 10.2 goes on to complete the discussion of how the set of migrated file systems might be determined. Sections 10.3 through 10.5 discuss how the client should deal with each transition it becomes aware of, either directly or as a result of migration discovery.

The following terms are used to describe client activities:

o "Transition recovery" refers to the process of restoring access to a file system on which NFS4ERR_MOVED was received.

o "Migration recovery" refers to that subset of transition recovery which applies when the file system has migrated to a different replica.

o "Migration discovery" refers to the process of determining which file system(s) have been migrated. It is necessary to avoid a situation in which leases could expire when a file system is not accessed for a long period of time, since a client unaware of the migration might be referencing an unmigrated file system and not renewing the lease associated with the migrated file system.

10.1. Client Transition Notifications (to be added)

When there is a change in the network access path which a client is to use to access a file system, there are a number of related status indications with which clients need to deal:

o If an attempt is made to use or return a filehandle within a file system that is no longer accessible at the address previously used to access it, the error NFS4ERR_MOVED is returned.
   Exceptions are made to allow such filehandles to be used when interrogating a location attribute. This enables a client to determine a new replica's location or a new network access path.

   This condition continues on subsequent attempts to access the file system in question. The only way the client can avoid the error is to cease accessing the file system in question at its old server location and access it instead using a different address at which it is now available.

o Whenever a SEQUENCE operation is sent by a client to a server which generated state held on that client which is associated with a file system that is no longer accessible on the server at which it was previously available, a lease-migrated indication, in the form of the SEQ4_STATUS_LEASE_MOVED status bit being set, appears in the response.

   This condition continues until the client acknowledges the notification by fetching a location attribute for the file system whose network access path is being changed. When there are multiple such file systems, a location attribute for each of them needs to be fetched in order to clear the condition. Even after the condition is cleared, the client needs to respond by using the location information to access the file system at its new location to ensure that leases are not needlessly expired.

Unlike the case of NFSv4.0, in which the corresponding conditions are both errors and thus mutually exclusive, in NFSv4.1 the client can, and often will, receive both indications on the same request. As a result, implementations need to address the question of how to co-ordinate the necessary recovery actions when both indications arrive in the response to the same request.
It should be noted that when processing an NFSv4 COMPOUND, the server decides whether SEQ4_STATUS_LEASE_MOVED is to be set before it determines which file system will be referenced or whether NFS4ERR_MOVED is to be returned.

Since these indications are not mutually exclusive in NFSv4.1, the following combinations are possible results when a COMPOUND is issued:

o The COMPOUND status is NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED is asserted.

   In this case, transition recovery is required. While it is possible that migration discovery is needed in addition, it is likely that only the accessed file system has transitioned. In any case, because addressing NFS4ERR_MOVED is necessary to allow the rejected requests to be processed on the target, dealing with it will typically have priority over migration discovery.

o The COMPOUND status is NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED is clear.

   In this case, transition recovery is also required. It is clear that migration discovery is not needed to find file systems that have been migrated other than the one returning NFS4ERR_MOVED. Cases in which this result can arise include a referral or a migration for which there is no associated locking state. This can also arise in cases in which an access path transition other than migration occurs within the same server. In such a case, there is no need to set SEQ4_STATUS_LEASE_MOVED, since the lease remains associated with the current server even though the access path has changed.

o The COMPOUND status is not NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED is asserted.

   In this case, no transition recovery activity is required on the file system(s) accessed by the request.
   However, to prevent avoidable lease expiration, migration discovery needs to be done.

o The COMPOUND status is not NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED is clear.

   In this case, neither transition-related activity nor migration discovery is required.

Note that the specified actions only need to be taken if they are not already going on. For example, NFS4ERR_MOVED on a file system for which transition recovery is already going on merely waits for that recovery to be completed, while SEQ4_STATUS_LEASE_MOVED only needs to initiate migration discovery for a server if it is not already going on for that server.

The fact that a lease-migrated condition does not result in an error in NFSv4.1 has a number of important consequences. In addition to the fact, discussed above, that the two indications are not mutually exclusive, there are a number of issues that are important in considering implementation of migration discovery, as discussed in Section 10.2.

Because of the absence of NFS4ERR_LEASE_MOVED, it is possible for file systems whose access path has not changed to be successfully accessed on a given server even though recovery is necessary for other file systems on the same server. As a result, access can go on while:

o The migration discovery process is going on for that server.

o The transition recovery process is going on for other file systems connected to that server.

10.2. Performing Migration Discovery (to be added)

Migration discovery can be performed in the same context as transition recovery, allowing recovery for each migrated file system to be invoked as it is discovered. Alternatively, it may be done in a separate migration discovery thread, allowing migration discovery to be done in parallel with one or more instances of transition recovery.
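The four combinations listed in Section 10.1, together with the rule that an action is only started if it is not already under way, can be captured in a small dispatcher. This Python sketch is hypothetical: RecoveryTracker and its fields are illustrative names, file systems and servers are identified by plain strings, recovery is represented only by recording that it was started (completion is omitted), and the constant values are those defined for NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED in the NFSv4.1 XDR.

```python
from typing import List, Set, Tuple

NFS4ERR_MOVED = 10019                 # NFSv4 error code
SEQ4_STATUS_LEASE_MOVED = 0x00000080  # bit in SEQUENCE's sr_status_flags


class RecoveryTracker:
    """Hypothetical per-client record of recovery already under way."""

    def __init__(self) -> None:
        self.transitions: Set[str] = set()   # fsids under transition recovery
        self.discoveries: Set[str] = set()   # servers under migration discovery
        self.started: List[Tuple[str, str]] = []  # actions launched, in order

    def on_compound_reply(self, server: str, fsid: str,
                          status: int, sr_status_flags: int) -> None:
        moved = status == NFS4ERR_MOVED
        lease_moved = bool(sr_status_flags & SEQ4_STATUS_LEASE_MOVED)
        # NFS4ERR_MOVED: transition recovery for the accessed file system,
        # handled first since the rejected request depends on it. If that
        # recovery is already under way, the request simply waits for it.
        if moved and fsid not in self.transitions:
            self.transitions.add(fsid)
            self.started.append(("transition-recovery", fsid))
        # LEASE_MOVED: migration discovery for the whole server, so that
        # leases for other migrated file systems are not allowed to expire.
        if lease_moved and server not in self.discoveries:
            self.discoveries.add(server)
            self.started.append(("migration-discovery", server))
        # Neither indication: no transition-related work is needed.
```

Whether migration discovery then runs in the same context as transition recovery or in a separate thread is left open, as in the text; the tracker only prevents duplicate starts.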
In either case, because the lease-migrated indication does not result in an error, other access to file systems on the server can proceed normally, with the possibility that further such indications will be received, raising the issue of how such indications are to be dealt with. In general:

o No action needs to be taken for such indications received by those performing migration discovery, since continuation of that work will address the issue.

o In other cases in which migration discovery is currently being performed, nothing further needs to be done to respond to such lease migration indications, as long as one can be certain that the migration discovery process would deal with those indications. See below for details.

o For such indications received in all other contexts, the appropriate response is to initiate or otherwise provide for the execution of migration discovery for file systems associated with the server IP address returning the indication.

This leaves a potential difficulty in situations in which the migration discovery process is near to completion but is still operating. One should not ignore a LEASE_MOVED indication if the migration discovery process is not able to respond to the discovery of additional migrating file systems without additional aid. A further complexity relevant in addressing such situations is that a lease-migrated indication may reflect the server's state at the time the SEQUENCE operation was processed, which may be different from that in effect at the time the response is received. Because new migration events may occur at any time, and because a LEASE_MOVED indication may reflect the situation in effect a considerable time before the indication is received, special care needs to be taken to ensure that LEASE_MOVED indications are not inappropriately ignored.
A useful approach to this issue involves the use of separate externally-visible migration discovery states for each server. Separate values could represent the various possible states for the migration discovery process for a server:

o non-operation, in which migration discovery is not being performed.

o normal operation, in which there is an ongoing scan for migrated file systems.

o completion/verification of migration discovery processing, in which the possible completion of migration discovery processing needs to be verified.

Given that framework, migration discovery processing would proceed as follows.

o While in the normal-operation state, the thread performing discovery would fetch, for successive file systems known to the client on the server being worked on, a location attribute plus the fs_status attribute.

o If the fs_status attribute indicates that the file system is a migrated one (i.e., fss_absent is true and fss_type != STATUS4_REFERRAL), it is likely that the fetch of the location attribute has cleared one of the file systems contributing to the lease-migrated indication.

o In cases in which that happened, the thread cannot know whether the lease-migrated indication has been cleared, and so it enters the completion/verification state and proceeds to issue a COMPOUND to see if the LEASE_MOVED indication has been cleared.

o When the discovery process is in the completion/verification state, if others get a lease-migrated indication, they note that it was received, and the existence of such indications is used when the request completes, as described below.

When the request used in the completion/verification state completes:

o If a lease-migrated indication is returned, the discovery continues normally.
   Note that this is so even if all file systems have been traversed, since new migrations could have occurred while the process was going on.

o Otherwise, if there is any record that other requests saw a lease-migrated indication, that record is cleared and the verification request retried. The discovery process remains in the completion/verification state.

o If there have been no lease-migrated indications, the work of migration discovery is considered completed and it enters the non-operating state. Once it enters this state, subsequent lease-migrated indications will trigger a new migration discovery process.

It should be noted that the process described above is not guaranteed to terminate, as a long series of new migration events might continually delay the clearing of the LEASE_MOVED indication. To prevent unnecessary lease expiration, it is appropriate for clients to use the discovery of migrations to effect lease renewal immediately, rather than waiting for the LEASE_MOVED indication to be cleared once the complete set of migrations is known.

10.3. Overview of Client Response to NFS4ERR_MOVED (to be added)

This section outlines a way in which a client that receives NFS4ERR_MOVED can effect transition recovery by using a new server or network address if one is available. As part of that process, it will determine:

o Whether the NFS4ERR_MOVED indicates that migration has occurred, or whether it indicates another sort of file system access transition as discussed in Section 7 above.

o In the case of migration, whether Transparent State Migration has occurred.

o Whether any state has been lost during the process of Transparent State Migration.

o Whether sessions have been transferred as part of Transparent State Migration.
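Before turning to the details of transition recovery, it may help to see the per-server migration discovery states proposed in Section 10.2 as a small state machine. This is an illustrative Python model, not a definitive implementation: the class and method names are hypothetical, a single object is assumed to track one server, and the actual attribute fetches and the verification COMPOUND are left to the caller.

```python
from enum import Enum, auto


class DiscoveryState(Enum):
    NON_OPERATION = auto()  # no migration discovery under way
    NORMAL = auto()         # scanning file systems for migrated ones
    VERIFYING = auto()      # checking whether LEASE_MOVED has cleared


class MigrationDiscovery:
    """Hypothetical per-server migration discovery state (Section 10.2)."""

    def __init__(self) -> None:
        self.state = DiscoveryState.NON_OPERATION
        self.seen_while_verifying = False
        self.scans_started = 0

    def on_lease_moved_elsewhere(self) -> None:
        """LEASE_MOVED seen on a request not issued by the discovery scan."""
        if self.state is DiscoveryState.NON_OPERATION:
            # Nothing under way: start a new scan.
            self.state = DiscoveryState.NORMAL
            self.scans_started += 1
        elif self.state is DiscoveryState.VERIFYING:
            # Record it; it is checked when the verification reply arrives.
            self.seen_while_verifying = True
        # NORMAL: the ongoing scan will deal with it.

    def on_migrated_fs_found(self) -> None:
        """The scan fetched a location attribute for a migrated file system
        (fss_absent true, fss_type != STATUS4_REFERRAL); the caller then
        issues a COMPOUND to see whether LEASE_MOVED has cleared."""
        if self.state is DiscoveryState.NORMAL:
            self.state = DiscoveryState.VERIFYING
            self.seen_while_verifying = False

    def on_verification_reply(self, lease_moved: bool) -> bool:
        """Process the verification reply. Returns True if the verification
        COMPOUND should be retried."""
        if lease_moved:
            # Continue scanning, even if every file system was traversed.
            self.state = DiscoveryState.NORMAL
            return False
        if self.seen_while_verifying:
            # Another request saw LEASE_MOVED meanwhile: clear and retry.
            self.seen_while_verifying = False
            return True
        self.state = DiscoveryState.NON_OPERATION
        return False
```

Because this loop need not terminate under a steady stream of new migrations, a client would renew leases as each migrated file system is found rather than waiting to reach NON_OPERATION.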
During the first phase of this process, the client proceeds to examine location entries to find the initial network address it will use to continue access to the file system or its replacement. For each location entry that the client examines, the process consists of five steps:

1. Performing an EXCHANGE_ID directed at the location address. This operation is used to register the client owner with the server, to obtain a client ID to be used subsequently to communicate with it, to obtain that client ID's confirmation status, and to determine server_owner and scope for the purpose of determining if the entry is trunkable with that previously being used to access the file system (i.e., that it represents another network access path to the same file system and can share locking state with it).

2. Making an initial determination of whether migration has occurred. The initial determination will be based on whether the EXCHANGE_ID results indicate that the current location element is server-trunkable with that used to access the file system when access was terminated by receiving NFS4ERR_MOVED. If it is, then migration has not occurred and the transition is dealt with, at least initially, as one involving continued access to the same file system on the same server through a new network address.

3. Obtaining access to existing session state or creating new sessions. How this is done depends on the initial determination of whether migration has occurred and can be done as described in Section 10.4 below in the case of migration or as described in Section 10.5 below in the case of a network address transfer without migration.

4. Verification of the trunking relationship assumed in step 2, as discussed in Section 2.10.5.1 of [RFC5661].
   Although this step will generally confirm the initial determination, it is possible for verification to fail, with the result that an initial determination that a network address shift (without migration) has occurred may be invalidated and migration determined to have occurred. There is no need to redo step 3 above, since it will be possible to continue use of the session established already.

5. Obtaining access to existing locking state and/or reobtaining it. How this is done depends on the final determination of whether migration has occurred and can be done as described below in Section 10.4 in the case of migration or as described in Section 10.5 in the case of a network address transfer without migration.

Once the initial address has been determined, clients are free to apply an abbreviated process to find additional addresses trunkable with it (clients may seek session-trunkable or server-trunkable addresses depending on whether they support client ID trunking). During this later phase of the process, further location entries are examined using the abbreviated procedure specified below:

1. Before the EXCHANGE_ID, the fs name of the location entry is examined, and if it does not match that currently being used, the entry is ignored. Otherwise, one proceeds as specified by step 1 above.

2. In the case that the network address is session-trunkable with one used previously, a BIND_CONN_TO_SESSION is used to access that session using the new network address. Otherwise, or if the bind operation fails, a CREATE_SESSION is done.

3. The verification procedure referred to in step 4 above is used. However, if it fails, the entry is ignored and the next available entry is used.

10.4. Obtaining Access to Sessions and State after Migration (to be added)

In the event that migration has occurred, migration recovery will involve determining whether Transparent State Migration has occurred. This decision is made based on the client ID returned by the EXCHANGE_ID and the reported confirmation status.

o If the client ID is an unconfirmed client ID not previously known to the client, then Transparent State Migration has not occurred.

o If the client ID is a confirmed client ID previously known to the client, then any transferred state would have been merged with an existing client ID representing the client to the destination server. In this state merger case, Transparent State Migration might or might not have occurred, and a determination as to whether it has occurred is deferred until sessions are established and the client is ready to begin state recovery.

o If the client ID is a confirmed client ID not previously known to the client, then the client can conclude that the client ID was transferred as part of Transparent State Migration. In this transferred client ID case, Transparent State Migration has occurred, although some state may have been lost.

Once the client ID has been obtained, it is necessary to obtain access to sessions to continue communication with the new server. In any of the cases in which Transparent State Migration has occurred, it is possible that a session was transferred as well. To deal with that possibility, clients can, after doing the EXCHANGE_ID, issue a BIND_CONN_TO_SESSION to connect the transferred session to a connection to the new server. If that fails, it is an indication that the session was not transferred and that a new session needs to be created to take its place.
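The three client ID cases and the BIND_CONN_TO_SESSION-first approach described above can be sketched as follows. The function names are hypothetical; the confirmed flag stands for the confirmation status reported by EXCHANGE_ID (the EXCHGID4_FLAG_CONFIRMED_R flag in eir_flags), and the two callables stand in for sending BIND_CONN_TO_SESSION and CREATE_SESSION. An unconfirmed client ID that the client already knows is not covered by the text, so the sketch treats it as unexpected.

```python
from enum import Enum
from typing import Callable, Set


class TsmOutcome(Enum):
    NOT_OCCURRED = "not occurred"
    UNDETERMINED = "state merger: decide later"
    OCCURRED = "transferred client ID"


def classify_client_id(clientid: int, confirmed: bool,
                       known: Set[int]) -> TsmOutcome:
    """Infer whether Transparent State Migration occurred from the client ID
    and confirmation status returned by EXCHANGE_ID."""
    previously_known = clientid in known
    if not confirmed and not previously_known:
        return TsmOutcome.NOT_OCCURRED
    if confirmed and previously_known:
        # State merger: resolve later, e.g. with TEST_STATEID.
        return TsmOutcome.UNDETERMINED
    if confirmed:
        return TsmOutcome.OCCURRED
    raise ValueError("unconfirmed client ID already known to the client")


def obtain_session(old_sessionid: bytes,
                   bind_conn_to_session: Callable[[bytes], bool],
                   create_session: Callable[[], bytes]) -> bytes:
    """Try to reuse a possibly transferred session, else create a new one."""
    if bind_conn_to_session(old_sessionid):
        # Caution: after state merger, success may only mean that an existing
        # session on the destination has the same sessionid.
        return old_sessionid
    return create_session()
```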
In some situations, it is possible for a BIND_CONN_TO_SESSION to succeed without session migration having occurred. If state merger has taken place, then the associated client ID may already have had a set of existing sessions, and it is possible that the sessionid of a given session is the same as one that might have been migrated. In that event, a BIND_CONN_TO_SESSION might succeed even though there could have been no migration of the session with that sessionid.

Once the client has determined the initial migration status, and determined that there was a shift to a new server, it needs to re-establish its locking state, if possible. To enable this to happen without loss of the guarantees normally provided by locking, the destination server needs to implement a per-fs grace period in all cases in which lock state was lost, including those in which Transparent State Migration was not implemented.

Clients need to deal with the following cases:

o In the state merger case, it is possible that the server has not attempted Transparent State Migration, in which case state may have been lost without it being reflected in the SEQ4_STATUS bits. To determine whether this has happened, the client can use TEST_STATEID to check whether the stateids created on the source server are still accessible on the destination server. Once a single stateid is found to have been successfully transferred, the client can conclude that Transparent State Migration was begun and any failure to transport all of the stateids will be reflected in the SEQ4_STATUS bits. Otherwise, Transparent State Migration has not occurred.

o In a case in which Transparent State Migration has not occurred, the client can use the per-fs grace period provided by the destination server to reclaim locks that were held on the source server.
o In a case in which Transparent State Migration has occurred, and no lock state was lost (as shown by SEQ4_STATUS flags), no lock reclaim is necessary.

o In a case in which Transparent State Migration has occurred, and some lock state was lost (as shown by SEQ4_STATUS flags), existing stateids need to be checked for validity using TEST_STATEID, and reclaim used to re-establish any that were not transferred.

For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value of true should be done before normal use of the file system, including obtaining new locks for the file system. This applies even if no locks were lost and there was no need for any to be reclaimed.

10.5. Obtaining Access to Sessions and State after Network Address Transfer (to be added)

The case in which there is a transfer to a new network address without migration is similar to that described in Section 10.4 above in that there is a need to obtain access to needed sessions and locking state. However, the details are simpler and will vary depending on the type of trunking between the address receiving NFS4ERR_MOVED and that to which the transfer is to be made.

To make a session available for use, a BIND_CONN_TO_SESSION should be used to obtain access to the session previously in use. Only if this fails should a CREATE_SESSION be done. While this procedure mirrors that in Section 10.4 above, there is an important difference in that preservation of the session is not purely optional but depends on the type of trunking.

Access to appropriate locking state should need no actions beyond access to the session. However, the SEQ4_STATUS bits should be checked for lost locking state, including the need to reclaim locks after a server reboot.

11. Server Responsibilities Upon Migration (to be added)

In order to effect Transparent State Migration and possibly session migration, the source and destination servers need to co-operate to transfer certain client-relevant information. The sections below discuss the information to be transferred but do not define the specifics of the transfer protocol. This is left as an implementation choice, although standards in this area could be developed at a later time.

Transparent State Migration and session migration are discussed separately, in Sections 11.1 and 11.2 below respectively. In each case, the discussion addresses the issue of providing the client a consistent view of the transferred state, even though the transfer might take an extended time.

11.1. Server Responsibilities in Effecting Transparent State Migration (to be added)

The basic responsibility of the source server in effecting Transparent State Migration is to make available to the destination server a description of each piece of locking state associated with the file system being migrated. In addition to the client id string and verifier, the source server needs to provide, for each stateid:

o The stateid, including the current sequence value.

o The associated client ID.

o The handle of the associated file.

o The type of the lock, such as open, byte-range lock, delegation, or layout.

o For locks such as opens and byte-range locks, there will be information about the owner(s) of the lock.

o For recallable/revocable lock types, the current recall status needs to be included.

o For each lock type, there will be type-specific information, such as share and deny modes for opens and type and byte ranges for byte-range locks and layouts.

A further server responsibility concerns locks that are revoked or otherwise lost during the process of file system migration.
Because 1834 locks that appear to be lost during the process of migration will be 1835 reclaimed by the client, the servers have to take steps to ensure 1836 that locks revoked soon before or soon after migration are not 1837 inadvertently allowed to be reclaimed in situations in which the 1838 continuity of lock possession cannot be assured. 1840 o For locks lost on the source but whose loss has not yet been 1841 acknowledged by the client (by using FREE_STATEID), the 1842 destination must be aware of this loss so that it can deny a 1843 request to reclaim them. 1845 o For locks lost on the destination after the state transfer but 1846 before the client's RECLAIM_COMPLTE is done, the destination 1847 server should note these and not allow them to be reclaimed. 1849 An additional responsibility of the cooperating servers concerns 1850 situations in which a stateid cannot be transferred transparently 1851 because it conflicts with an existing stateid held by the client and 1852 associated with a different file system. In this case there are two 1853 valid choices: 1855 o Treat the transfer, as in NFSv4.0, as one without Transparent 1856 State Migration. In this case, conflicting locks cannot be 1857 granted until the client does a RECLAIM_COMPLETE, after reclaiming 1858 the locks it had, with the exception of reclaims denied because 1859 they were attempts to reclaim locks that had been lost. 1861 o Implement Transparent State Migration, except for the lock with 1862 the conflicting stateid. In this case, the client will be aware 1863 of a lost lock (through the SEQ4_STATUS flags) and be allowed to 1864 reclaim it. 1866 When transferring state between the source and destination, the 1867 issues discussed in Section 7.2 of [RFC7931] must still be attended 1868 to. In this case, the use of NFS4ERR_DELAY may still necessary in 1869 NFSv4.1, as it was in NFSv4.0, to prevent locking state changing 1870 while it is being transferred. 
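   The per-stateid information listed earlier in this section, together with the two rules for locks lost around the time of migration, can be pictured as a small data model. The following Python sketch is illustrative only. The record layout and the names TransferredLock, DestinationReclaimGuard, and reclaim_key are hypothetical. The choice to match a reclaim to a lock by client ID, filehandle, owner, and lock type is an assumption. This document leaves the actual transfer protocol as an implementation choice.

```python
from dataclasses import dataclass, field
from enum import Enum


class LockType(Enum):
    OPEN = "open"
    BYTE_RANGE = "byte-range lock"
    DELEGATION = "delegation"
    LAYOUT = "layout"


@dataclass
class TransferredLock:
    """Hypothetical record for one stateid handed from source to destination."""
    stateid: bytes          # the stateid, including its current sequence value
    client_id: int          # the associated client ID
    filehandle: bytes       # handle of the associated file
    lock_type: LockType
    owner: bytes = b""      # lock owner, for opens and byte-range locks
    recall_pending: bool = False                  # recall status, if recallable
    details: dict = field(default_factory=dict)   # e.g. share/deny modes, ranges

    def reclaim_key(self):
        # Assumption: this sketch matches a reclaim request to a lock by
        # client ID, file, owner, and lock type.
        return (self.client_id, self.filehandle, self.owner, self.lock_type)


class DestinationReclaimGuard:
    """Hypothetical destination-side record of locks that must not be reclaimed."""

    def __init__(self, transferred, lost_on_source):
        self.held = {lk.reclaim_key(): lk for lk in transferred}
        # Locks lost on the source whose loss the client has not yet
        # acknowledged with FREE_STATEID: reclaims of these are denied.
        self.denied = {lk.reclaim_key() for lk in lost_on_source}
        self.reclaim_complete = False  # set when the client's RECLAIM_COMPLETE arrives

    def lose(self, lock):
        # A lock lost on the destination after the state transfer. Before
        # RECLAIM_COMPLETE, note it so that it cannot be reclaimed.
        self.held.pop(lock.reclaim_key(), None)
        if not self.reclaim_complete:
            self.denied.add(lock.reclaim_key())

    def may_reclaim(self, key):
        # This sketch refuses reclaims after RECLAIM_COMPLETE and for any
        # lock known to have been lost.
        return not self.reclaim_complete and key not in self.denied
```

   Keeping the deny set separate from the held locks mirrors the two cases above. Losses on the source arrive with the transferred state. Losses on the destination are recorded as they happen, up to the client's RECLAIM_COMPLETE.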
   There are a number of important differences in the NFSv4.1 context:

   o  The absence of RELEASE_LOCKOWNER means that the one case in which an operation could not be deferred by use of NFS4ERR_DELAY no longer exists.

   o  Sequencing of operations is no longer done using owner-based operation sequence numbers. Instead, sequencing is session-based.

   As a result, when sessions are not transferred, the techniques discussed in Section 7.2 of [RFC7931] are adequate and will not be further discussed.

11.2.  Server Responsibilities in Effecting Session Transfer (to be added)

   The basic responsibility of the source server in effecting session transfer is to make available to the destination server a description of the current state of each slot within the session, including:

   o  The last sequence value received for that slot.

   o  Whether there is cached reply data for the last request executed and, if so, the cached reply.

   When sessions are transferred, a number of issues make it difficult to keep the transferred state unmodifiable while it is gathered up and transferred to the destination server:

   o  A single session may be used to access multiple file systems, not all of which are being transferred.

   o  Requests made on a session may, even if rejected, affect the state of the session by advancing the sequence number associated with the slot used.

   As a result, when the file system state might otherwise be considered unmodifiable, the client might have any number of in-flight requests, each of which is capable of changing session state. These requests may be of a number of types:

   1.  Those requests that were processed on the migrating file system before migration began.

   2.  Those requests that got the error NFS4ERR_DELAY because the file system being accessed was in the process of being migrated.
   3.  Those requests that got the error NFS4ERR_MOVED because the file system being accessed had been migrated.

   4.  Those requests that accessed the migrating file system in order to obtain location or status information.

   5.  Those requests that did not reference the migrating file system.

   The history of any particular slot is likely to include a number of these request classes. When a migrated session is also used by file systems other than the one migrated, requests of class 5 may be common and may be the last request processed for many slots.

   Session state can change even after the locking state has been fixed as part of the migration process. As a result, the session state known to the client could differ from that on the destination server, which necessarily reflects the session state on the source server at an earlier time. In deciding how to deal with this situation, it is helpful to distinguish between two sorts of behavioral consequences of the choice of initial sequence ID values:

   o  The error NFS4ERR_SEQ_MISORDERED is returned when the sequence ID in a request is neither equal to the last one seen for the current slot nor the next greater one.

      In view of the difficulty of arriving at a mutually acceptable value for the correct last sequence value at the point of migration, it may be necessary for the server to show some degree of forbearance when the sequence ID is one that would be considered unacceptable if session migration were not involved.

   o  The cached reply for a previously executed request is returned when the sequence ID in the request matches the last value recorded for the slot.
      In the cases in which an error is returned and there is no possibility of any non-idempotent operation having been executed, it may not be necessary to adhere to this as strictly as might be proper if session migration were not involved. For example, the fact that the error NFS4ERR_DELAY was returned may not assist the client in any material way, while the fact that NFS4ERR_MOVED was returned by the source server may not be relevant when the request is reissued to the destination server.

   One part of the necessary adaptation to these sorts of issues would be to defer enforcement of normal slot sequencing semantics until the client itself, by issuing a request using a particular slot on the destination server, has established the new starting sequence for that slot on the migrated session.

   An important issue is that the specification needs to take note of all potential COMPOUNDs, even if they might be unlikely in practice. For example, a COMPOUND is allowed to access multiple file systems and might perform non-idempotent operations in some of them before accessing a file system being migrated. Also, a COMPOUND may return considerable data in the response before being rejected with NFS4ERR_DELAY or NFS4ERR_MOVED, and may, in addition, be marked as sa_cachethis.

   To address these issues, the destination server MAY do any of the following:

   o  Avoid enforcing any sequencing semantics for a particular slot until the client has established the starting sequence for that slot on the destination server.

   o  For each slot, avoid returning a cached reply that contains NFS4ERR_DELAY or NFS4ERR_MOVED until the client has established the starting sequence for that slot on the destination server.
   o  Until the client has established the starting sequence for a particular slot on the destination server, avoid reporting NFS4ERR_SEQ_MISORDERED or returning a cached reply that contains NFS4ERR_DELAY or NFS4ERR_MOVED, where the reply consists solely of a series of operations whose status is NFS4_OK, followed by the final error.

12.  Changes to RFC5661 outside Section 11

   Besides the major rework of Section 11, a number of related changes are necessary:

   o  The summary that appeared in Section 1.7.3.3 of [RFC5661] needs to be revised to reflect the changes called for in Section 4 of the current document. The updated summary appears as Section 12.1 below.

   o  The discussion of server scope that appeared in Section 2.10.4 of [RFC5661] needs to be replaced. The existing text appears to require a level of inter-server coordination incompatible with its basic function of avoiding the need for a globally uniform means of assigning server_owner values. A revised treatment appears in Section 12.2 below.

   o  While the last paragraph (exclusive of sub-sections) of Section 2.10.5 in [RFC5661], dealing with server_owner changes, is literally true, it has been a source of confusion. Since the existing paragraph can be read as suggesting that such changes be dealt with non-disruptively, the treatment in Section 12.4 below needs to be substituted.

   o  The existing definition of NFS4ERR_MOVED (in Section 15.1.2.4 of [RFC5661]) needs to be updated to reflect the different handling of unavailability of a particular file system via a specific network address. Since such a situation is no longer considered to constitute unavailability of a file system instance, the description needs to change even though the instances in which it is returned remain the same. The updated description appears in Section 12.3 below.
   o  The existing treatment of EXCHANGE_ID (in Section 18.35 of [RFC5661]) assumes that client IDs cannot be created or confirmed other than by the EXCHANGE_ID and CREATE_SESSION operations. Also, the necessary use of EXCHANGE_ID in recovery from migration and related situations is not addressed clearly. A revised treatment of EXCHANGE_ID is necessary. It appears in Section 13 below, and the specific differences between it and the treatment within [RFC5661] are explained in Section 12.5 below.

12.1.  (Introduction to) Multi-Server Namespace (as updated)

   NFSv4.1 contains a number of features to allow implementation of namespaces that cross server boundaries and that allow and facilitate a non-disruptive transfer of support for individual file systems between servers. They are all based upon attributes that allow one file system to specify alternate, additional, and new location information that specifies how the client may access that file system.

   These attributes can be used to provide, for individual active file systems:

   o  Alternate network addresses to access the current file system instance.

   o  The locations of alternate file system instances or replicas to be used in the event that the current file system instance becomes unavailable.

   These attributes may be used together with the concept of absent file systems, in which a position in the server namespace is associated with locations on other servers without any file system instance on the current server.

   o  Location attributes may be used with absent file systems to implement referrals whereby one server may direct the client to a file system provided by another server. This allows extensive multi-server namespaces to be constructed.

   o  Location attributes may be provided when a previously present file system becomes absent.
      This allows non-disruptive migration of file systems to alternate servers.

12.2.  Server Scope (as updated)

   Servers each specify a server scope value in the form of an opaque string eir_server_scope returned as part of the results of an EXCHANGE_ID operation. The purpose of the server scope is to allow a group of servers to indicate to clients that a set of servers sharing the same server scope value has arranged to use compatible values of otherwise opaque identifiers. Thus, the identifiers generated by two servers within that set can be assumed compatible so that, in some cases, identifiers generated by one server in that set may be presented to another server of the same scope.

   The use of such compatible values does not imply that a value generated by one server will always be accepted by another. In most cases, it will not. However, a server will not inadvertently accept a value generated by another. When it does accept it, it will be because it is recognized as valid and carrying the same meaning as on another server of the same scope.

   When servers are of the same server scope, this compatibility of values applies to the following identifiers:

   o  Filehandle values. A filehandle value accepted by two servers of the same server scope denotes the same object. A WRITE operation sent to one server is reflected immediately in a READ sent to the other.

   o  Server owner values. When the server scope values are the same, server owner values may be validly compared. In cases where the server scope values are different, server owner values are treated as different even if they contain identical strings of bytes.

   The coordination among servers required to provide such compatibility can be quite minimal and limited to a simple partition of the ID space.
   The recognition of common values requires additional implementation, but this can be tailored to the specific situations in which that recognition is desired.

   Clients will have occasion to compare the server scope values of multiple servers under a number of circumstances, each of which will be discussed under the appropriate functional section:

   o  When server owner values received in response to EXCHANGE_ID operations sent to multiple network addresses are compared for the purpose of determining the validity of various forms of trunking, as described in Section 4.5.2 of the current document.

   o  When network or server reconfiguration causes the same network address to possibly be directed to different servers, with the necessity for the client to determine when lock reclaim should be attempted, as described in Section 8.4.2.1 of [RFC5661].

   When two replies from EXCHANGE_ID, each from a different server network address, have the same server scope, a client can validate in a number of ways that the common server scope is due to two servers cooperating in a group:

   o  If both EXCHANGE_ID requests were sent with RPCSEC_GSS ([RFC2203], [RFC5403], [RFC7861]) authentication and the server principal is the same for both targets, the equality of server scope is validated. It is RECOMMENDED that two servers intending to share the same server scope also share the same principal name.

   o  The client may accept the appearance of the second server in the fs_locations or fs_locations_info attribute for a relevant file system. For example, if there is a migration event for a particular file system or there are locks to be reclaimed on a particular file system, the attributes for that particular file system may be used.
      The client sends the GETATTR request to the first server for the fs_locations or fs_locations_info attribute with RPCSEC_GSS authentication. It may need to do this in advance of the need to verify the common server scope. If the client successfully authenticates the reply to GETATTR, and the fs_locations or fs_locations_info attribute in that reply refers to the second server, then the equality of server scope is supported. A client may choose to limit the use of this form of support to information relevant to the specific file system involved (e.g., a file system being migrated).

12.3.  Revised Treatment of NFS4ERR_MOVED

   Because the term "replica" is now used differently, the current description of NFS4ERR_MOVED needs to be changed to the one below. The new paragraph explicitly recognizes that a different network address might be used. The previous description misleadingly treated this as a shift between two replicas, even though only a single file system instance might be involved.

      The file system that contains the current filehandle object is not accessible using the address on which the request was made. It still might be accessible using other addresses server-trunkable with it, or it might not be present at the server. In the latter case, it might have been relocated or migrated to another server, or it might never have been present. The client may obtain information regarding access to the file system location by obtaining the "fs_locations" or "fs_locations_info" attribute for the current filehandle. For further discussion, refer to Section 11 of [RFC5661], as modified by the current document.

12.4.  Revised Discussion of Server_owner Changes

   Because of problems with the treatment of such changes, the confusing paragraph, which simply says that such changes need to be dealt with, is to be replaced by the one below.

      It is always possible that, as a result of various sorts of reconfiguration events, eir_server_scope and eir_server_owner values may be different on subsequent EXCHANGE_ID requests made to the same network address.

      In most cases, such reconfiguration events will be disruptive and indicate that an IP address formerly connected to one server is now connected to an entirely different one.

      Some guidelines on client handling of such situations follow:

      *  When eir_server_scope changes, the client has no assurance that any IDs it obtained previously (e.g., filehandles) can be validly used on the new server. Even if the new server accepts them, there is no assurance that this is not due to accident. Thus, it is best to treat all such state as lost or stale. Alternatively, a client may assume that the probability of inadvertent acceptance is low and treat this situation as within the next case.

      *  When eir_server_scope remains the same and eir_server_owner.so_major_id changes, the client can use the filehandles it has and attempt reclaims. It may find that these are now stale, but if NFS4ERR_STALE is not received, it can proceed to reclaim its opens.

      *  When eir_server_scope and eir_server_owner.so_major_id remain the same, the client has to use the now-current values of eir_server_owner.so_minor_id in deciding on appropriate forms of trunking.

12.5.  Revision to Treatment of EXCHANGE_ID

   There are a number of issues in the original treatment of EXCHANGE_ID (in [RFC5661]) that cause problems for Transparent State Migration and for the transfer of access between different network access paths to the same file system instance.

   These issues arise from the fact that this treatment was written:

   o  Assuming that a client ID can only become known to a server by having been created by executing an EXCHANGE_ID, with confirmation of the ID only possible by execution of a CREATE_SESSION.

   o  Considering the interactions between a client and a server only on a single network address.

   As these assumptions have become invalid in the context of Transparent State Migration and active use of trunking, the treatment has been modified in several respects:

   o  It had been assumed that an EXCHANGE_ID executed when the server is already aware of a given client instance must be either updating associated parameters (e.g., with respect to callbacks) or a lingering retransmission to deal with a previously lost reply. As a result, any slot sequence returned would be of no use. The existing treatment goes so far as to say that it "MUST NOT" be used, although this usage is not in accord with [RFC2119]. This created a difficulty when an EXCHANGE_ID is done after Transparent State Migration, since that slot sequence needs to be used in a subsequent CREATE_SESSION.

      In the updated treatment, CREATE_SESSION is one way that client IDs are confirmed, but it is understood that other ways are possible. The slot sequence can be used as needed, and cases in which it would be of no use are appropriately noted.

   o  It was assumed that the only functions of EXCHANGE_ID were to inform the server of the client, create the client ID, and communicate it to the client.
      When multiple simultaneous connections are involved, as often happens when trunking, that treatment was inadequate. It ignored the role of EXCHANGE_ID in associating the client ID with the connection on which it was done, so that the client ID could be used by a subsequent CREATE_SESSION, whose parameters do not include an explicit client ID.

      The new treatment explicitly discusses two roles of EXCHANGE_ID: associating the client ID with the connection so that it can be used by CREATE_SESSION, and associating a connection with an existing session.

   The new treatment can be found in Section 13 below. It is intended to supersede the treatment in Section 18.35 of [RFC5661]. Publishing a complete replacement for Section 18.35 allows the corrected definition to be read as a whole once [RFC5661] is updated.

13.  Operation 42: EXCHANGE_ID - Instantiate Client ID (as updated)

   The EXCHANGE_ID operation exchanges long-hand client and server identifiers (owners) and provides access to a client ID, creating one if necessary. This client ID becomes associated with the connection on which the operation is done. It is then available when a CREATE_SESSION is done or when the connection is used to issue a request on an existing session associated with the current client.

13.1.  ARGUMENT

   const EXCHGID4_FLAG_SUPP_MOVED_REFER    = 0x00000001;
   const EXCHGID4_FLAG_SUPP_MOVED_MIGR     = 0x00000002;

   const EXCHGID4_FLAG_BIND_PRINC_STATEID  = 0x00000100;

   const EXCHGID4_FLAG_USE_NON_PNFS        = 0x00010000;
   const EXCHGID4_FLAG_USE_PNFS_MDS        = 0x00020000;
   const EXCHGID4_FLAG_USE_PNFS_DS         = 0x00040000;

   const EXCHGID4_FLAG_MASK_PNFS           = 0x00070000;

   const EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000;
   const EXCHGID4_FLAG_CONFIRMED_R         = 0x80000000;

   struct state_protect_ops4 {
           bitmap4 spo_must_enforce;
           bitmap4 spo_must_allow;
   };

   struct ssv_sp_parms4 {
           state_protect_ops4      ssp_ops;
           sec_oid4                ssp_hash_algs<>;
           sec_oid4                ssp_encr_algs<>;
           uint32_t                ssp_window;
           uint32_t                ssp_num_gss_handles;
   };

   enum state_protect_how4 {
           SP4_NONE = 0,
           SP4_MACH_CRED = 1,
           SP4_SSV = 2
   };

   union state_protect4_a switch(state_protect_how4 spa_how) {
           case SP4_NONE:
                   void;
           case SP4_MACH_CRED:
                   state_protect_ops4      spa_mach_ops;
           case SP4_SSV:
                   ssv_sp_parms4           spa_ssv_parms;
   };

   struct EXCHANGE_ID4args {
           client_owner4           eia_clientowner;
           uint32_t                eia_flags;
           state_protect4_a        eia_state_protect;
           nfs_impl_id4            eia_client_impl_id<1>;
   };

13.2.  RESULT

   struct ssv_prot_info4 {
           state_protect_ops4      spi_ops;
           uint32_t                spi_hash_alg;
           uint32_t                spi_encr_alg;
           uint32_t                spi_ssv_len;
           uint32_t                spi_window;
           gsshandle4_t            spi_handles<>;
   };

   union state_protect4_r switch(state_protect_how4 spr_how) {
           case SP4_NONE:
                   void;
           case SP4_MACH_CRED:
                   state_protect_ops4      spr_mach_ops;
           case SP4_SSV:
                   ssv_prot_info4          spr_ssv_info;
   };

   struct EXCHANGE_ID4resok {
           clientid4               eir_clientid;
           sequenceid4             eir_sequenceid;
           uint32_t                eir_flags;
           state_protect4_r        eir_state_protect;
           server_owner4           eir_server_owner;
           opaque                  eir_server_scope<NFS4_OPAQUE_LIMIT>;
           nfs_impl_id4            eir_server_impl_id<1>;
   };

   union EXCHANGE_ID4res switch (nfsstat4 eir_status) {
           case NFS4_OK:
                   EXCHANGE_ID4resok       eir_resok4;

           default:
                   void;
   };

13.3.  DESCRIPTION

   The client uses the EXCHANGE_ID operation to register a particular client_owner with the server. However, when the client_owner has already been registered by other means (e.g., Transparent State Migration), the client may still use EXCHANGE_ID to obtain the client ID assigned previously.

   The client ID returned from this operation will be associated with the connection on which the EXCHANGE_ID is received. It will serve as a parent object for sessions created by the client on this connection or to which the connection is bound. As a result of using those sessions to make requests involving the creation of state, that state will become associated with the client ID returned.

   In situations in which the registration of the client_owner has not occurred previously, the client ID must first be used, along with the returned eir_sequenceid, in creating an associated session using CREATE_SESSION.
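   The flag constants in the ARGUMENT definitions are bit masks that are ORed together in the 32-bit eia_flags and eir_flags words. EXCHGID4_FLAG_MASK_PNFS is not a flag itself; it covers the three pNFS role bits. As an illustration only, a minimal Python sketch that decodes such a word might look like the following. The helper names flag_names and pnfs_role_bits are hypothetical.

```python
# Flag bits from the EXCHANGE_ID ARGUMENT definitions.
EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001
EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002
EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100
EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000
EXCHGID4_FLAG_MASK_PNFS = 0x00070000
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000

# Single-bit flags in ascending bit order. MASK_PNFS is a multi-bit mask,
# not a flag, so it is deliberately not listed here.
_FLAG_NAMES = {
    EXCHGID4_FLAG_SUPP_MOVED_REFER: "SUPP_MOVED_REFER",
    EXCHGID4_FLAG_SUPP_MOVED_MIGR: "SUPP_MOVED_MIGR",
    EXCHGID4_FLAG_BIND_PRINC_STATEID: "BIND_PRINC_STATEID",
    EXCHGID4_FLAG_USE_NON_PNFS: "USE_NON_PNFS",
    EXCHGID4_FLAG_USE_PNFS_MDS: "USE_PNFS_MDS",
    EXCHGID4_FLAG_USE_PNFS_DS: "USE_PNFS_DS",
    EXCHGID4_FLAG_UPD_CONFIRMED_REC_A: "UPD_CONFIRMED_REC_A",
    EXCHGID4_FLAG_CONFIRMED_R: "CONFIRMED_R",
}


def flag_names(flags):
    """Names of the EXCHGID4_FLAG_* bits set in an eia_flags or eir_flags word."""
    return [name for bit, name in _FLAG_NAMES.items() if flags & bit]


def pnfs_role_bits(flags):
    """Only the pNFS role bits (USE_NON_PNFS, USE_PNFS_MDS, USE_PNFS_DS)."""
    return flags & EXCHGID4_FLAG_MASK_PNFS


w = EXCHGID4_FLAG_CONFIRMED_R | EXCHGID4_FLAG_SUPP_MOVED_MIGR | EXCHGID4_FLAG_USE_PNFS_MDS
print(flag_names(w))  # ['SUPP_MOVED_MIGR', 'USE_PNFS_MDS', 'CONFIRMED_R']
```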
   If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the result's eir_flags, this indicates that the registration of the client_owner has already occurred and that a further CREATE_SESSION is not needed to confirm it. Of course, subsequent CREATE_SESSION operations may be needed for other reasons.

   The value eir_sequenceid is used to establish an initial sequence value associated with the client ID returned. In cases in which a CREATE_SESSION has already been done, there is no need for this value. Sequencing of such requests has already been established, and the client will ignore the value.

   EXCHANGE_ID MAY be sent in a COMPOUND procedure that starts with SEQUENCE. However, when a client communicates with a server for the first time, it will not have a session, so using SEQUENCE will not be possible. If EXCHANGE_ID is sent without a preceding SEQUENCE, then it MUST be the only operation in the COMPOUND procedure's request. If it is not, the server MUST return NFS4ERR_NOT_ONLY_OP.

   The eia_clientowner field is composed of a co_verifier field and a co_ownerid string. As noted in Section 2.4 of [RFC5661], the co_ownerid describes the client, and the co_verifier is the incarnation of the client. An EXCHANGE_ID sent with a new incarnation of the client will lead to the server removing lock state of the old incarnation. An EXCHANGE_ID sent with the current incarnation and co_ownerid will instead result in an error or an update of the client ID's properties, depending on the arguments to EXCHANGE_ID.

   A server MUST NOT use the same client ID for two different incarnations of an eia_clientowner.

   In addition to the client ID and sequence ID, the server returns a server owner (eir_server_owner) and server scope (eir_server_scope).
   The former field is used for network trunking as described in Section 2.10.5 of [RFC5661]. The latter field is used to allow clients to determine when client IDs sent by one server may be recognized by another in the event of file system migration (see Section 8.9 of the current document).

   The client ID returned by EXCHANGE_ID is only unique relative to the combination of eir_server_owner.so_major_id and eir_server_scope. Thus, if two servers return the same client ID, the onus is on the client to distinguish the client IDs on the basis of eir_server_owner.so_major_id and eir_server_scope. In the event that two different servers claim matching eir_server_owner.so_major_id and eir_server_scope, the client can use the verification techniques discussed in Section 2.10.5 of [RFC5661] to determine if the servers are distinct. If they are distinct, then the client will need to note the destination network addresses of the connections used with each server and use the network address as the final discriminator.

   The server, as defined by the unique identity expressed in the so_major_id of the server owner and the server scope, needs to track several properties of each client ID it hands out. The properties apply to the client ID and all sessions associated with the client ID. The properties are derived from the arguments and results of EXCHANGE_ID. The client ID properties include:

   o  The capabilities expressed by the following bits, which come from the results of EXCHANGE_ID:

      *  EXCHGID4_FLAG_SUPP_MOVED_REFER

      *  EXCHGID4_FLAG_SUPP_MOVED_MIGR

      *  EXCHGID4_FLAG_BIND_PRINC_STATEID

      *  EXCHGID4_FLAG_USE_NON_PNFS

      *  EXCHGID4_FLAG_USE_PNFS_MDS

      *  EXCHGID4_FLAG_USE_PNFS_DS

      These properties may be updated by subsequent EXCHANGE_ID requests on confirmed client IDs, though the server MAY refuse to change them.
   o  The state protection method used, one of SP4_NONE, SP4_MACH_CRED, or SP4_SSV, as set by the spa_how field of the arguments to EXCHANGE_ID. Once the client ID is confirmed, this property cannot be updated by subsequent EXCHANGE_ID requests.

   o  For SP4_MACH_CRED or SP4_SSV state protection:

      *  The list of operations (spo_must_enforce) that MUST use the specified state protection. This list comes from the results of EXCHANGE_ID.

      *  The list of operations (spo_must_allow) that MAY use the specified state protection. This list comes from the results of EXCHANGE_ID.

      Once the client ID is confirmed, these properties cannot be updated by subsequent EXCHANGE_ID requests.

   o  For SP4_SSV protection:

      *  The OID of the hash algorithm. This property is represented by one of the algorithms in the ssp_hash_algs field of the EXCHANGE_ID arguments. Once the client ID is confirmed, this property cannot be updated by subsequent EXCHANGE_ID requests.

      *  The OID of the encryption algorithm. This property is represented by one of the algorithms in the ssp_encr_algs field of the EXCHANGE_ID arguments. Once the client ID is confirmed, this property cannot be updated by subsequent EXCHANGE_ID requests.

      *  The length of the SSV. This property is represented by the spi_ssv_len field in the EXCHANGE_ID results. Once the client ID is confirmed, this property cannot be updated by subsequent EXCHANGE_ID requests.

         There are REQUIRED and RECOMMENDED relationships among the length of the key of the encryption algorithm ("key length"), the length of the output of the hash algorithm ("hash length"), and the length of the SSV ("SSV length").

         +  key length MUST be <= hash length. This is because the keys used for the encryption algorithm are actually subkeys derived from the SSV, and the derivation is via the hash algorithm.
            The selection of an encryption algorithm with a key length that exceeded the length of the output of the hash algorithm would require padding, and thus weaken the use of the encryption algorithm.

         +  hash length SHOULD be <= SSV length. This is because the SSV is a key used to derive subkeys via an HMAC, and it is recommended that the key used as input to an HMAC be at least as long as the length of the HMAC's hash algorithm's output (see Section 3 of [RFC2104]).

         +  key length SHOULD be <= SSV length. This is a transitive result of the above two invariants.

         +  key length SHOULD be >= hash length / 2. This is because the subkey derivation is via an HMAC, and it is recommended that if the HMAC has to be truncated, it should not be truncated to less than half the hash length (see Section 4 of [RFC2104]).

      *  Number of concurrent versions of the SSV the client and server will support (see Section 2.10.9 of [RFC5661]). This property is represented by spi_window in the EXCHANGE_ID results. The property may be updated by subsequent EXCHANGE_ID requests.

   o  The client's implementation ID as represented by the eia_client_impl_id field of the arguments. The property may be updated by subsequent EXCHANGE_ID requests.

   o  The server's implementation ID as represented by the eir_server_impl_id field of the reply. The property may be updated by replies to subsequent EXCHANGE_ID requests.

   The eia_flags passed as part of the arguments and the eir_flags results allow the client and server to inform each other of their capabilities as well as indicate how the client ID will be used. Whether a bit is set or cleared on the arguments' flags does not force the server to set or clear the same bit on the results' side. Bits not defined above cannot be set in the eia_flags field.
   If they are, the server MUST reject the operation with NFS4ERR_INVAL.

   The EXCHGID4_FLAG_UPD_CONFIRMED_REC_A bit can only be set in
   eia_flags; it is always off in eir_flags. The
   EXCHGID4_FLAG_CONFIRMED_R bit can only be set in eir_flags; it is
   always off in eia_flags. If the server recognizes the co_ownerid and
   co_verifier as mapping to a confirmed client ID, it sets
   EXCHGID4_FLAG_CONFIRMED_R in eir_flags. The
   EXCHGID4_FLAG_CONFIRMED_R flag allows a client to tell if the client
   ID it is trying to create already exists and is confirmed.

   If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, this means
   that the client is attempting to update properties of an existing
   confirmed client ID (if the client wants to update properties of an
   unconfirmed client ID, it MUST NOT set
   EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). If so, it is RECOMMENDED that
   the client send the update EXCHANGE_ID operation in the same COMPOUND
   as a SEQUENCE so that the EXCHANGE_ID is executed exactly once.
   Whether the client can update the properties of the client ID
   depends on the state protection it selected when the client ID was
   created, and the principal and security flavor it uses when sending
   the EXCHANGE_ID request. The situations described in items 6, 7, 8,
   or 9 of the second numbered list of Section 13.4 below will apply.
   Note that if the operation succeeds and returns a client ID that is
   already confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R
   bit in eir_flags.

   If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this
   means that the client is trying to establish a new client ID, is
   attempting to trunk data communication to the server (see
   Section 2.10.5 of [RFC5661]), or is attempting to update properties
   of an unconfirmed client ID.
   The situations described in items 1, 2, 3, 4, or 5 of the second
   numbered list of Section 13.4 below will apply. Note that if the
   operation succeeds and returns a client ID that was previously
   confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R bit in
   eir_flags.

   When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client
   indicates that it is capable of dealing with an NFS4ERR_MOVED error
   as part of a referral sequence. When this bit is not set, it is
   still legal for the server to perform a referral sequence. However,
   a server may use the fact that the client is incapable of correctly
   responding to a referral to avoid performing referrals for that
   particular client. It may, for instance, act as a proxy for that
   particular file system, at some cost in performance, although it is
   not obligated to do so. If the server will potentially perform a
   referral, it MUST set EXCHGID4_FLAG_SUPP_MOVED_REFER in eir_flags.

   When the EXCHGID4_FLAG_SUPP_MOVED_MIGR bit is set, the client
   indicates that it is capable of dealing with an NFS4ERR_MOVED error
   as part of a file system migration sequence. When this bit is not
   set, it is still legal for the server to indicate that a file system
   has moved, when this in fact happens. However, a server may use the
   fact that the client is incapable of correctly responding to a
   migration in its scheduling of file systems to migrate, so as to
   avoid migrating file systems that such a client is actively using.
   It may also hide actual migrations from clients unable to deal with
   them by acting as a proxy for a migrated file system for particular
   clients, at some cost in performance, although it is not obligated
   to do so. If the server will potentially perform a migration, it
   MUST set EXCHGID4_FLAG_SUPP_MOVED_MIGR in eir_flags.
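   The eia_flags and eir_flags rules above can be summarized in a short
   Python sketch. The flag values are those of the EXCHGID4_FLAG_*
   constants in the RFC 5661 XDR and are worth re-checking against it.
   The function names are hypothetical. Treating
   EXCHGID4_FLAG_CONFIRMED_R in eia_flags as an undefined argument bit
   is an assumption of this sketch.

```python
# EXCHGID4_FLAG_* values as given in the RFC 5661 XDR.
EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001
EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002
EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100
EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000
EXCHGID4_FLAG_MASK_PNFS = 0x00070000
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000

# Bits a client may set in eia_flags (CONFIRMED_R is result-only).
ARG_FLAGS_ALLOWED = (EXCHGID4_FLAG_SUPP_MOVED_REFER
                     | EXCHGID4_FLAG_SUPP_MOVED_MIGR
                     | EXCHGID4_FLAG_BIND_PRINC_STATEID
                     | EXCHGID4_FLAG_MASK_PNFS
                     | EXCHGID4_FLAG_UPD_CONFIRMED_REC_A)


def check_eia_flags(eia_flags: int) -> str:
    """Hypothetical helper: reject any bit outside the allowed set."""
    if eia_flags & ~ARG_FLAGS_ALLOWED:
        return "NFS4ERR_INVAL"
    return "NFS4_OK"


def scenario_group(eia_flags: int) -> tuple:
    """Items of the second numbered list in Section 13.4 that apply."""
    if eia_flags & EXCHGID4_FLAG_UPD_CONFIRMED_REC_A:
        return (6, 7, 8, 9)      # update of a confirmed client ID
    return (1, 2, 3, 4, 5)       # new ID, trunking, or unconfirmed update


def build_eir_flags(record_confirmed: bool, may_refer: bool,
                    may_migrate: bool, pnfs_roles: int) -> int:
    """Compose eir_flags; UPD_CONFIRMED_REC_A is never set in results."""
    flags = pnfs_roles & EXCHGID4_FLAG_MASK_PNFS
    if record_confirmed:
        flags |= EXCHGID4_FLAG_CONFIRMED_R
    if may_refer:
        flags |= EXCHGID4_FLAG_SUPP_MOVED_REFER
    if may_migrate:
        flags |= EXCHGID4_FLAG_SUPP_MOVED_MIGR
    return flags
```

   The results are composed independently of the argument bits. This
   reflects the statement that a bit set or cleared in eia_flags does
   not force the same bit in eir_flags.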

   When EXCHGID4_FLAG_BIND_PRINC_STATEID is set, the client indicates
   that it wants the server to bind the stateid to the principal. This
   means that only the principal that created a stateid can use that
   stateid. If the server will perform binding, it will return
   EXCHGID4_FLAG_BIND_PRINC_STATEID. The server MAY return
   EXCHGID4_FLAG_BIND_PRINC_STATEID even if the client does not request
   it. If an update to the client ID changes the value of
   EXCHGID4_FLAG_BIND_PRINC_STATEID's client ID property, the effect
   applies only to new stateids. Existing stateids (and all stateids
   with the same "other" field) that were created with stateid to
   principal binding in force will continue to have binding in force.
   Existing stateids (and all stateids with the same "other" field) that
   were created with stateid to principal binding not in force will
   continue to have binding not in force.

   The EXCHGID4_FLAG_USE_NON_PNFS, EXCHGID4_FLAG_USE_PNFS_MDS, and
   EXCHGID4_FLAG_USE_PNFS_DS bits are described in Section 13.1 of
   [RFC5661] and convey roles the client ID is to be used for in a pNFS
   environment. The server MUST set one of the acceptable combinations
   of these bits (roles) in eir_flags, as specified in that section.
   Note that the same client owner/server owner pair can have multiple
   roles. Multiple roles can be associated with the same client ID or
   with different client IDs. Thus, if a client sends EXCHANGE_ID from
   the same client owner to the same server owner multiple times, but
   specifies different pNFS roles each time, the server might return
   different client IDs. Given that different pNFS roles might have
   different client IDs, the client may ask for different properties for
   each role/client ID.
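   The rule that a change to the EXCHGID4_FLAG_BIND_PRINC_STATEID
   property affects only new stateids means the server can take a
   snapshot of the binding state when each "other" field is created.
   The Python sketch below shows one way to do that. ClientIdRecord and
   StateidTable are hypothetical server-side structures, not anything
   the protocol defines.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ClientIdRecord:
    # Current value of the EXCHGID4_FLAG_BIND_PRINC_STATEID property.
    bind_princ_stateid: bool


@dataclass
class StateidTable:
    client: ClientIdRecord
    # Maps a stateid "other" value to (creating principal, whether
    # binding was in force when that "other" value was created).
    entries: dict = field(default_factory=dict)
    _next: itertools.count = field(default_factory=itertools.count)

    def create(self, principal: str) -> int:
        other = next(self._next)
        self.entries[other] = (principal, self.client.bind_princ_stateid)
        return other

    def may_use(self, other: int, principal: str) -> bool:
        creator, bound = self.entries[other]
        # Judge by the snapshot taken at creation, not the current flag.
        return not bound or principal == creator
```

   The snapshot is keyed by the "other" field. As a result, every
   stateid that shares that field keeps the binding behavior it was
   created with, whatever later EXCHANGE_ID updates do.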

   The spa_how field of the eia_state_protect field specifies how the
   client wants to protect its client, locking, and session states from
   unauthorized changes (Section 2.10.8.3 of [RFC5661]):

   o  SP4_NONE. The client does not request the NFSv4.1 server to
      enforce state protection. The NFSv4.1 server MUST NOT enforce
      state protection for the returned client ID.

   o  SP4_MACH_CRED. If spa_how is SP4_MACH_CRED, then the client MUST
      send the EXCHANGE_ID request with RPCSEC_GSS as the security
      flavor, and with a service of RPC_GSS_SVC_INTEGRITY or
      RPC_GSS_SVC_PRIVACY. If SP4_MACH_CRED is specified, then the
      client wants to use an RPCSEC_GSS-based machine credential to
      protect its state. The server MUST note the principal the
      EXCHANGE_ID operation was sent with, and the GSS mechanism used.
      These notes collectively comprise the machine credential.

      After the client ID is confirmed, as long as the lease associated
      with the client ID is unexpired, a subsequent EXCHANGE_ID
      operation that uses the same eia_clientowner.co_owner as the first
      EXCHANGE_ID MUST also use the same machine credential as the first
      EXCHANGE_ID. The server returns the same client ID for the
      subsequent EXCHANGE_ID as that returned from the first
      EXCHANGE_ID.

   o  SP4_SSV. If spa_how is SP4_SSV, then the client MUST send the
      EXCHANGE_ID request with RPCSEC_GSS as the security flavor, and
      with a service of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY.

      If SP4_SSV is specified, then the client wants to use the SSV to
      protect its state. The server records the credential used in the
      request as the machine credential (as defined above) for the
      eia_clientowner.co_owner. The CREATE_SESSION operation that
      confirms the client ID MUST use the same machine credential.
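   The security checks implied by the spa_how values above are
   sketched below in Python. This is illustrative only. Flavors and
   services are modeled as strings, and the enum carries member names
   only, not the XDR wire values. The types and the function name
   exchange_id_allowed are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class SpHow(Enum):
    """spa_how choices (names only; XDR wire values are not modeled)."""
    SP4_NONE = auto()
    SP4_MACH_CRED = auto()
    SP4_SSV = auto()


@dataclass(frozen=True)
class MachineCred:
    principal: str   # GSS principal, e.g. "host/client.example.com"
    mechanism: str   # GSS mechanism name, e.g. "krb5"


@dataclass(frozen=True)
class RpcSecurity:
    flavor: str                  # "AUTH_SYS" or "RPCSEC_GSS"
    service: str                 # "none", "integrity" or "privacy"
    cred: Optional[MachineCred]  # None unless flavor is RPCSEC_GSS


def exchange_id_allowed(how: SpHow, sec: RpcSecurity,
                        noted: Optional[MachineCred]) -> bool:
    """Check an EXCHANGE_ID against the spa_how rules.

    `noted` is the machine credential recorded for this co_owner when a
    confirmed client ID with an unexpired lease exists, else None.
    """
    if how is SpHow.SP4_NONE:
        return True  # no state protection requested or enforced
    # SP4_MACH_CRED and SP4_SSV both require RPCSEC_GSS with the
    # integrity or privacy service.
    if sec.flavor != "RPCSEC_GSS":
        return False
    if sec.service not in ("integrity", "privacy"):
        return False
    # Under SP4_MACH_CRED, a later EXCHANGE_ID for the same co_owner
    # must use the same machine credential (principal and mechanism).
    if how is SpHow.SP4_MACH_CRED and noted is not None:
        return sec.cred == noted
    return True
```

   The sketch returns a boolean because the text above names no
   specific error for these cases. A server would map a False result
   to an appropriate error.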

   When a client specifies SP4_MACH_CRED or SP4_SSV, it also provides
   two lists of operations (each expressed as a bitmap). The first list
   is spo_must_enforce and consists of those operations the client MUST
   send (subject to the server confirming the list of operations in the
   result of EXCHANGE_ID) with the machine credential (if SP4_MACH_CRED
   protection is specified) or the SSV-based credential (if SP4_SSV
   protection is used). The client MUST send the operations with
   RPCSEC_GSS credentials that specify the RPC_GSS_SVC_INTEGRITY or
   RPC_GSS_SVC_PRIVACY security service. Typically, the first list of
   operations includes EXCHANGE_ID, CREATE_SESSION, DELEGPURGE,
   DESTROY_SESSION, BIND_CONN_TO_SESSION, and DESTROY_CLIENTID. The
   client SHOULD NOT specify in this list any operations that require a
   filehandle because the server's access policies MAY conflict with the
   client's choice, and thus the client would then be unable to access a
   subset of the server's namespace.

   Note that if SP4_SSV protection is specified and the client
   indicates that CREATE_SESSION must be protected with SP4_SSV, the
   first CREATE_SESSION MUST instead be sent using the machine
   credential, because the SSV cannot exist without a confirmed client
   ID. The server MUST accept the machine credential in this case.

   There is a corresponding result, also called spo_must_enforce, of the
   operations for which the server will require SP4_MACH_CRED or SP4_SSV
   protection. Normally, the server's result equals the client's
   argument, but the result MAY be different. If the client requests
   one or more operations in the set { EXCHANGE_ID, CREATE_SESSION,
   DELEGPURGE, DESTROY_SESSION, BIND_CONN_TO_SESSION, DESTROY_CLIENTID
   }, then the result spo_must_enforce MUST include the operations the
   client requested from that set.
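   A Python sketch of how a server might compute the spo_must_enforce
   result appears below. The operation numbers are taken from the
   NFSv4.1 nfs_opnum4 enumeration in RFC 5661. The bitmap encoding
   follows the usual bitmap4 layout, with bit n in word n / 32. The
   "willing" set stands for a hypothetical local server policy. The
   only rule the text fixes is that requested operations from the
   listed set are always kept.

```python
# Operation numbers from the NFSv4.1 nfs_opnum4 enumeration (RFC 5661).
OP_DELEGPURGE = 7
OP_BIND_CONN_TO_SESSION = 41
OP_EXCHANGE_ID = 42
OP_CREATE_SESSION = 43
OP_DESTROY_SESSION = 44
OP_DESTROY_CLIENTID = 57
OP_RECLAIM_COMPLETE = 58

# If requested, these MUST appear in the result spo_must_enforce.
MUST_HONOR = frozenset({OP_EXCHANGE_ID, OP_CREATE_SESSION, OP_DELEGPURGE,
                        OP_DESTROY_SESSION, OP_BIND_CONN_TO_SESSION,
                        OP_DESTROY_CLIENTID})


def to_bitmap4(ops) -> list:
    """Encode operation numbers as bitmap4 words: bit n is in word n // 32."""
    words = [0] * (max(ops) // 32 + 1) if ops else []
    for op in ops:
        words[op // 32] |= 1 << (op % 32)
    return words


def from_bitmap4(words) -> set:
    """Decode bitmap4 words back into a set of operation numbers."""
    return {32 * i + b for i, w in enumerate(words)
            for b in range(32) if (w >> b) & 1}


def result_must_enforce(requested: set, willing: set) -> set:
    """Requested ops the server is willing to enforce, plus every
    requested op from MUST_HONOR (the one rule the protocol fixes)."""
    return (requested & willing) | (requested & MUST_HONOR)
```

   For example, a server with an empty local policy still returns
   EXCHANGE_ID, CREATE_SESSION, and DESTROY_CLIENTID if the client
   asked for them. It may drop RECLAIM_COMPLETE.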

   If spo_must_enforce in the results has BIND_CONN_TO_SESSION set, then
   connection binding enforcement is enabled, and the client MUST use
   the machine (if SP4_MACH_CRED protection is used) or SSV (if SP4_SSV
   protection is used) credential on calls to BIND_CONN_TO_SESSION.

   The second list is spo_must_allow and consists of those operations
   the client wants to have the option of sending with the machine
   credential or the SSV-based credential, even if the object the
   operations are performed on is not owned by the machine or SSV
   credential.

   The corresponding result, also called spo_must_allow, consists of the
   operations the server will allow the client to use SP4_SSV or
   SP4_MACH_CRED credentials with. Normally, the server's result equals
   the client's argument, but the result MAY be different.

   The purpose of spo_must_allow is to allow clients to solve the
   following conundrum. Suppose the client ID is confirmed with
   EXCHGID4_FLAG_BIND_PRINC_STATEID, and it calls OPEN with the
   RPCSEC_GSS credentials of a normal user. Now suppose the user's
   credentials expire, and cannot be renewed (e.g., a Kerberos ticket
   granting ticket expires, and the user has logged off and will not be
   acquiring a new ticket granting ticket). The client will be unable
   to send CLOSE without the user's credentials. It therefore has to
   either leave the state on the server or re-send EXCHANGE_ID with a
   new verifier to clear all state, unless the client includes CLOSE on
   the list of operations in spo_must_allow and the server agrees.

   The SP4_SSV protection parameters also have:

   ssp_hash_algs:

      This is the set of algorithms the client supports for the purpose
      of computing the digests needed for the internal SSV GSS mechanism
      and for the SET_SSV operation. Each algorithm is specified as an
      object identifier (OID).
      The REQUIRED algorithms for a server are id-sha1, id-sha224,
      id-sha256, id-sha384, and id-sha512 [RFC4055]. The algorithm the
      server selects among the set is indicated in spi_hash_alg, a field
      of spr_ssv_prot_info. The field spi_hash_alg is an index into the
      array ssp_hash_algs. If the server does not support any of the
      offered algorithms, it returns NFS4ERR_HASH_ALG_UNSUPP. If
      ssp_hash_algs is empty, the server MUST return NFS4ERR_INVAL.

   ssp_encr_algs:

      This is the set of algorithms the client supports for the purpose
      of providing privacy protection for the internal SSV GSS
      mechanism. Each algorithm is specified as an OID. The REQUIRED
      algorithm for a server is id-aes256-CBC. The RECOMMENDED
      algorithms are id-aes192-CBC and id-aes128-CBC [CSOR_AES]. The
      selected algorithm is returned in spi_encr_alg, an index into
      ssp_encr_algs. If the server does not support any of the offered
      algorithms, it returns NFS4ERR_ENCR_ALG_UNSUPP. If ssp_encr_algs
      is empty, the server MUST return NFS4ERR_INVAL. Note that due to
      previously stated requirements and recommendations on the
      relationships between key length and hash length, some
      combinations of RECOMMENDED and REQUIRED encryption algorithm and
      hash algorithm either SHOULD NOT or MUST NOT be used. Table 1
      summarizes the illegal and discouraged combinations.

   ssp_window:

      This is the number of SSV versions the client wants the server to
      maintain (i.e., each successful call to SET_SSV produces a new
      version of the SSV). If ssp_window is zero, the server MUST
      return NFS4ERR_INVAL. The server responds with spi_window, which
      MUST NOT exceed ssp_window, and MUST be at least one.
      Any requests on the backchannel or fore channel that are using a
      version of the SSV that is outside the window will fail with an
      ONC RPC authentication error, and the requester will have to retry
      them with the same slot ID and sequence ID.

   ssp_num_gss_handles:

      This is the number of RPCSEC_GSS handles the server should create
      that are based on the GSS SSV mechanism (see Section 2.10.9 of
      [RFC5661]). It is not the total number of RPCSEC_GSS handles for
      the client ID. Indeed, subsequent calls to EXCHANGE_ID will add
      RPCSEC_GSS handles. The server responds with a list of handles in
      spi_handles. If the client asks for at least one handle and the
      server cannot create it, the server MUST return an error. The
      handles in spi_handles are not available for use until the client
      ID is confirmed, which could be immediately if EXCHANGE_ID returns
      EXCHGID4_FLAG_CONFIRMED_R, or upon successful confirmation from
      CREATE_SESSION.

      While a client ID can span all the connections that are connected
      to a server sharing the same eir_server_owner.so_major_id, the
      RPCSEC_GSS handles returned in spi_handles can only be used on
      connections connected to a server that returns the same
      eir_server_owner.so_major_id and eir_server_owner.so_minor_id on
      each connection. It is permissible for the client to set
      ssp_num_gss_handles to zero; the client can create more handles
      with another EXCHANGE_ID call.

      Because each SSV RPCSEC_GSS handle shares a common SSV GSS
      context, there are security considerations specific to this
      situation discussed in Section 2.10.10 of [RFC5661].

      The seq_window (see Section 5.2.3.1 of [RFC2203]) of each
      RPCSEC_GSS handle in spi_handles MUST be the same as the seq_window
      of the RPCSEC_GSS handle used for the credential of the RPC
      request that the EXCHANGE_ID request was sent with.
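   The key length and hash length relationships stated earlier can be
   checked mechanically, together with the ssp_hash_algs, ssp_encr_algs,
   and ssp_window error rules. The Python sketch below does both. It is
   not a normative algorithm. It assumes a server that supports every
   algorithm named above. Preferring an acceptable pair over a
   discouraged one is a choice made for this sketch. classify() marks
   the same combinations that Table 1 lists.

```python
# Output lengths, in bits, of the hash algorithms named above.
HASH_BITS = {"id-sha1": 160, "id-sha224": 224, "id-sha256": 256,
             "id-sha384": 384, "id-sha512": 512}
# Key lengths, in bits, of the AES-CBC encryption algorithms.
KEY_BITS = {"id-aes128-CBC": 128, "id-aes192-CBC": 192,
            "id-aes256-CBC": 256}


def classify(encr: str, hash_alg: str) -> str:
    """Apply the key length / hash length relationships to one pair."""
    key, hsh = KEY_BITS[encr], HASH_BITS[hash_alg]
    if key > hsh:        # key length MUST be <= hash length
        return "MUST NOT"
    if 2 * key < hsh:    # key length SHOULD be >= hash length / 2
        return "SHOULD NOT"
    return "OK"


def table1() -> dict:
    """Encryption algorithm -> (MUST NOT hashes, SHOULD NOT hashes)."""
    return {e: ([h for h in HASH_BITS if classify(e, h) == "MUST NOT"],
                [h for h in HASH_BITS if classify(e, h) == "SHOULD NOT"])
            for e in KEY_BITS}


def negotiate(ssp_hash_algs, ssp_encr_algs, ssp_window, server_max_window):
    """Return (spi_hash_alg, spi_encr_alg, spi_window) or an error name."""
    if not ssp_hash_algs or not ssp_encr_algs or ssp_window == 0:
        return "NFS4ERR_INVAL"
    hashes = [i for i, a in enumerate(ssp_hash_algs) if a in HASH_BITS]
    encrs = [j for j, a in enumerate(ssp_encr_algs) if a in KEY_BITS]
    if not hashes:
        return "NFS4ERR_HASH_ALG_UNSUPP"
    if not encrs:
        return "NFS4ERR_ENCR_ALG_UNSUPP"
    # Prefer an acceptable pair, then a discouraged one; never MUST NOT.
    rank = {"OK": 0, "SHOULD NOT": 1}
    candidates = []
    for i in hashes:
        for j in encrs:
            verdict = classify(ssp_encr_algs[j], ssp_hash_algs[i])
            if verdict != "MUST NOT":
                candidates.append((rank[verdict], i, j))
    if not candidates:
        return None  # the text names no error code for this case
    _, i, j = min(candidates)
    # spi_window MUST NOT exceed ssp_window and MUST be at least one.
    return i, j, max(1, min(ssp_window, server_max_window))
```

   The returned indexes point into the client's own arrays, matching the
   description of spi_hash_alg and spi_encr_alg as indexes rather than
   OIDs.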

   +-------------------+----------------------+------------------------+
   | Encryption        | MUST NOT be combined | SHOULD NOT be combined |
   | Algorithm         | with                 | with                   |
   +-------------------+----------------------+------------------------+
   | id-aes128-CBC     |                      | id-sha384, id-sha512   |
   | id-aes192-CBC     | id-sha1              | id-sha512              |
   | id-aes256-CBC     | id-sha1, id-sha224   |                        |
   +-------------------+----------------------+------------------------+

                                  Table 1

   The arguments include an array of up to one element in length called
   eia_client_impl_id. If eia_client_impl_id is present, it contains
   the information identifying the implementation of the client.
   Similarly, the results include an array of up to one element in
   length called eir_server_impl_id that identifies the implementation
   of the server. Servers MUST accept a zero-length eia_client_impl_id
   array, and clients MUST accept a zero-length eir_server_impl_id
   array.

   A possible use for implementation identifiers would be in diagnostic
   software that extracts this information in an attempt to identify
   interoperability problems, performance workload behaviors, or general
   usage statistics. Since the intent of having access to this
   information is for planning or general diagnosis only, the client and
   server MUST NOT interpret this implementation identity information in
   a way that affects how the implementation behaves in interacting with
   its peer. The client and server are not allowed to depend on the
   peer's manifesting a particular allowed behavior based on an
   implementation identifier but are required to interoperate as
   specified elsewhere in the protocol specification.

   Because it is possible that some implementations might violate the
   protocol specification and interpret the identity information,
   implementations MUST provide facilities to allow the NFSv4 client and
   server to be configured to set the contents of the nfs_impl_id
   structures sent to any specified value.

13.4.  IMPLEMENTATION

   A server's client record is a 5-tuple:

   1.  co_ownerid:

       The client identifier string, from the eia_clientowner
       structure of the EXCHANGE_ID4args structure.

   2.  co_verifier:

       A client-specific value used to indicate incarnations (where a
       client restart represents a new incarnation), from the
       eia_clientowner structure of the EXCHANGE_ID4args structure.

   3.  principal:

       The principal that was defined in the RPC header's credential
       and/or verifier at the time the client record was established.

   4.  client ID:

       The shorthand client identifier, generated by the server and
       returned via the eir_clientid field in the EXCHANGE_ID4resok
       structure.

   5.  confirmed:

       A private field on the server indicating whether or not a
       client record has been confirmed. A client record is
       confirmed if there has been a successful CREATE_SESSION
       operation to confirm it. Otherwise, it is unconfirmed. An
       unconfirmed record is established by an EXCHANGE_ID call. Any
       unconfirmed record that is not confirmed within a lease period
       SHOULD be removed.

   The following identifiers represent special values for the fields in
   the records.

   ownerid_arg:

      The value of the eia_clientowner.co_ownerid subfield of the
      EXCHANGE_ID4args structure of the current request.

   verifier_arg:

      The value of the eia_clientowner.co_verifier subfield of the
      EXCHANGE_ID4args structure of the current request.

   old_verifier_arg:

      A value of the eia_clientowner.co_verifier field of a client
      record received in a previous request; this is distinct from
      verifier_arg.

   principal_arg:

      The value of the RPCSEC_GSS principal for the current request.

   old_principal_arg:

      A value of the principal of a client record as defined by the RPC
      header's credential or verifier of a previous request. This is
      distinct from principal_arg.

   clientid_ret:

      The value of the eir_clientid field the server will return in the
      EXCHANGE_ID4resok structure for the current request.

   old_clientid_ret:

      The value of the eir_clientid field the server returned in the
      EXCHANGE_ID4resok structure for a previous request. This is
      distinct from clientid_ret.

   confirmed:

      The client ID has been confirmed.

   unconfirmed:

      The client ID has not been confirmed.

   Since EXCHANGE_ID is a non-idempotent operation, we must consider the
   possibility that retries occur as a result of a client restart,
   network partition, malfunctioning router, etc. Retries are
   identified by the value of the eia_clientowner field of
   EXCHANGE_ID4args, and the method for dealing with them is outlined in
   the scenarios below.

   The scenarios are described in terms of the client record(s) a server
   has for a given co_ownerid. Note that if the client ID was created
   specifying SP4_SSV state protection and EXCHANGE_ID as one of the
   operations in spo_must_allow, then the server MUST authorize
   EXCHANGE_IDs with the SSV principal in addition to the principal that
   created the client ID.

   1.  New Owner ID

       If the server has no client records with
       eia_clientowner.co_ownerid matching ownerid_arg, and
       EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in the
       EXCHANGE_ID, then a new shorthand client ID (let us call it
       clientid_ret) is generated, and the following unconfirmed
       record is added to the server's state.

           { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
           unconfirmed }

       Subsequently, the server returns clientid_ret.

   2.  Non-Update on Existing Client ID

       If the server has the following confirmed record, and the
       request does not have EXCHGID4_FLAG_UPD_CONFIRMED_REC_A set,
       then the request is the result of a retried request due to a
       faulty router or lost connection, or the client is trying to
       determine if it can perform trunking.

           { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
           confirmed }

       Since the record has been confirmed, the client must have
       received the server's reply from the initial EXCHANGE_ID
       request. Since the server has a confirmed record, and since
       EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, with the
       possible exception of eir_server_owner.so_minor_id, the server
       returns the same result it did when the client ID's properties
       were last updated (or if never updated, the result when the
       client ID was created). The confirmed record is unchanged.

   3.  Client Collision

       If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the
       server has the following confirmed record, then this request
       is likely the result of a chance collision between the values
       of the eia_clientowner.co_ownerid subfield of EXCHANGE_ID4args
       for two different clients.

           { ownerid_arg, *, old_principal_arg, old_clientid_ret,
           confirmed }

       If there is currently no state associated with
       old_clientid_ret, or if there is state but the lease has
       expired, then this case is effectively equivalent to the New
       Owner ID case of Paragraph 1. The confirmed record is
       deleted, the old_clientid_ret and its lock state are deleted,
       a new shorthand client ID is generated, and the following
       unconfirmed record is added to the server's state.

           { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
           unconfirmed }

       Subsequently, the server returns clientid_ret.

       If old_clientid_ret has an unexpired lease with state, then no
       state of old_clientid_ret is changed or deleted. The server
       returns NFS4ERR_CLID_INUSE to indicate that the client should
       retry with a different value for the
       eia_clientowner.co_ownerid subfield of EXCHANGE_ID4args. The
       client record is not changed.

   4.  Replacement of Unconfirmed Record

       If the EXCHGID4_FLAG_UPD_CONFIRMED_REC_A flag is not set, and
       the server has the following unconfirmed record, then the
       client is attempting EXCHANGE_ID again on an unconfirmed
       client ID, perhaps due to a retry, a client restart before
       client ID confirmation (i.e., before CREATE_SESSION was
       called), or some other reason.

           { ownerid_arg, *, *, old_clientid_ret, unconfirmed }

       It is possible that the properties of old_clientid_ret are
       different from those specified in the current EXCHANGE_ID.
       Whether or not the properties are being updated, to eliminate
       ambiguity, the server deletes the unconfirmed record,
       generates a new client ID (clientid_ret), and establishes the
       following unconfirmed record:

           { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
           unconfirmed }

   5.  Client Restart

       If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the
       server has the following confirmed client record, then this
       request is likely from a previously confirmed client that has
       restarted.

           { ownerid_arg, old_verifier_arg, principal_arg,
           old_clientid_ret, confirmed }

       Since the previous incarnation of the same client will no
       longer be making requests, once the new client ID is confirmed
       by CREATE_SESSION, byte-range locks and share reservations
       should be released immediately rather than forcing the new
       incarnation to wait for the lease time on the previous
       incarnation to expire. Furthermore, session state should be
       removed since if the client had maintained that information
       across restart, this request would not have been sent. If the
       server supports neither the CLAIM_DELEGATE_PREV nor
       CLAIM_DELEG_PREV_FH claim types, associated delegations should
       be purged as well; otherwise, delegations are retained and
       recovery proceeds according to Section 10.2.1 of [RFC5661].

       After processing, clientid_ret is returned to the client and
       this client record is added:

           { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
           unconfirmed }

       The previously described confirmed record continues to exist,
       and thus the same ownerid_arg exists in both a confirmed and
       unconfirmed state at the same time. The number of states can
       collapse to one once the server receives an applicable
       CREATE_SESSION or EXCHANGE_ID.

       +  If the server subsequently receives a successful
          CREATE_SESSION that confirms clientid_ret, then the server
          atomically destroys the confirmed record and makes the
          unconfirmed record confirmed as described in Section
          18.36.3 of [RFC5661].

       +  If the server instead subsequently receives an EXCHANGE_ID
          with the client owner equal to ownerid_arg, one strategy is
          to simply delete the unconfirmed record and process the
          EXCHANGE_ID as described in the entirety of Section 13.4.

   6.  Update

       If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
       has the following confirmed record, then this request is an
       attempt at an update.

           { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
           confirmed }

       Since the record has been confirmed, the client must have
       received the server's reply from the initial EXCHANGE_ID
       request. The server allows the update, and the client record
       is left intact.

   7.  Update but No Confirmed Record

       If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
       has no confirmed record corresponding to ownerid_arg, then the
       server returns NFS4ERR_NOENT and leaves any unconfirmed record
       intact.

   8.  Update but Wrong Verifier

       If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
       has the following confirmed record, then this request is an
       illegal attempt at an update, perhaps because of a retry from
       a previous client incarnation.

           { ownerid_arg, old_verifier_arg, *, clientid_ret, confirmed }

       The server returns NFS4ERR_NOT_SAME and leaves the client
       record intact.

   9.  Update but Wrong Principal

       If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
       has the following confirmed record, then this request is an
       illegal attempt at an update by an unauthorized principal.

           { ownerid_arg, verifier_arg, old_principal_arg, clientid_ret,
           confirmed }

       The server returns NFS4ERR_PERM and leaves the client record
       intact.

14.  Security Considerations

   The Security Considerations section of [RFC5661] needs the additions
   below to properly address some aspects of trunking discovery,
   referral, migration, and replication.

   The possibility that requests to determine the set of network
   addresses corresponding to a given server might be interfered with
   or have their responses corrupted needs to be taken into account.
   In light of this, the following considerations should be noted:

   o  When DNS is used to convert server names to addresses and
      DNSSEC [RFC4033] is not available, the validity of the network
      addresses returned cannot be relied upon. However, when the
      client uses RPCSEC_GSS to access the designated server, mutual
      authentication can reveal that the server addresses provided
      are invalid.

   o  The fetching of attributes containing location information
      SHOULD be performed using RPCSEC_GSS with integrity protection,
      as previously explained in the Security Considerations section
      of [RFC5661]. It is important to note here that a client
      making a request of this sort without using RPCSEC_GSS
      including integrity protection needs to be aware of the
      negative consequences of doing so, which can lead to invalid
      host names or network addresses being returned. In light of
      this, the client needs to recognize that using such returned
      location information to access an NFSv4 server without use of
      RPCSEC_GSS (i.e., by using AUTH_SYS) poses dangers, as it can
      result in the client interacting with an unverified network
      address posing as an NFSv4 server.

   o  Despite the fact that it is a REQUIREMENT (of [RFC5661]) that
      "implementations" provide "support" for use of RPCSEC_GSS, it
      cannot be assumed that use of RPCSEC_GSS is always available
      between any particular client-server pair.

   o  When a client has the network addresses of a server but not the
      associated host names, this interferes with its ability to use
      RPCSEC_GSS.
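   A client might act on these considerations with a filtering policy
   like the one sketched below in Python. The sketch assumes a locally
   configured list of trusted network ranges. LocationEntry,
   usable_entries, and that list are all hypothetical, and the
   addresses are RFC 5737 documentation addresses. Entries fetched with
   RPCSEC_GSS integrity protection are kept. Unverified entries are
   kept only if they are literal addresses inside a trusted range.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationEntry:
    server: str                   # host name or literal network address
    fetched_with_integrity: bool  # fetched via RPCSEC_GSS with integrity


def usable_entries(entries, trusted_nets):
    """Keep verified entries, plus unverified literal addresses that lie
    inside an administratively trusted range; drop everything else."""
    nets = [ipaddress.ip_network(n) for n in trusted_nets]
    kept = []
    for entry in entries:
        if entry.fetched_with_integrity:
            kept.append(entry)
            continue
        try:
            addr = ipaddress.ip_address(entry.server)
        except ValueError:
            continue  # unverified host name: no basis for trusting it
        if any(addr in net for net in nets):
            kept.append(entry)
    return kept
```

   An address family mismatch never matches, because an IPv6 address is
   never contained in an IPv4 network in Python's ipaddress module.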

   In light of the above, a server should present location entries
   that correspond to file systems on other servers using a host
   name. This would allow the client to interrogate the fs_locations
   on the destination server to obtain trunking information (as well
   as replica information) using RPCSEC_GSS with integrity,
   validating the name provided while assuring that the response has
   not been corrupted.

   When RPCSEC_GSS is not available on a server, the client needs to
   be aware of the fact that the location entries are subject to
   corruption and cannot be relied upon. In the case of a client
   being directed to another server after NFS4ERR_MOVED, this could
   vitiate the authentication provided by the use of RPCSEC_GSS on
   the destination. Even when RPCSEC_GSS authentication is available
   on the destination, the server might validly represent itself as
   the server to which the client was erroneously directed. Without
   a way to decide whether the server is a valid one, the client can
   only determine, using RPCSEC_GSS, that the server corresponds to
   the name provided, with no basis for trusting that server. As a
   result, the client should not use such unverified location entries
   as a basis for migration, even though RPCSEC_GSS might be
   available on the destination.

   When a location attribute is fetched upon connecting with an NFS
   server, it SHOULD, as stated above, be done using RPCSEC_GSS with
   integrity protection. When this is not possible, it is generally
   best for the client to ignore trunking and replica information or
   simply not fetch the location information for these purposes.

   When location information cannot be verified, it can be subjected
   to additional filtering to prevent the client from being
   inappropriately directed.
   For example, if a range of network addresses can be determined that
   ensures that the servers and clients using AUTH_SYS are subject to
   the appropriate set of constraints (e.g., physical network
   isolation, administrative controls on the operating systems used),
   then network addresses in the appropriate range can be used, with
   others discarded or restricted in their use of AUTH_SYS.

   To summarize considerations regarding the use of RPCSEC_GSS in
   fetching location information, we need to consider the following
   possibilities for requests to interrogate location information,
   with interrogation approaches on the referring and destination
   servers arrived at separately:

   o  The use of RPCSEC_GSS with integrity protection is RECOMMENDED
      in all cases, since the absence of integrity protection exposes
      the client to the possibility of the results being modified in
      transit.

   o  The use of requests issued without RPCSEC_GSS (i.e., using
      AUTH_SYS), while undesirable, may not be avoidable in all
      cases. Where the use of the returned information cannot be
      avoided, it should be subject to filtering to eliminate the
      possibility that the client would treat an invalid address as
      if it were an NFSv4 server. The specifics will vary depending
      on the degree of network isolation and whether the request is
      to the referring or destination servers.

15.  IANA Considerations

   This document does not require actions by IANA.

16.  References

16.1.  Normative References

   [CSOR_AES]
              National Institute of Standards and Technology,
              "Cryptographic Algorithm Object Registration", URL
              http://csrc.nist.gov/groups/ST/crypto_apps_infra/csor/
              algorithms.html, November 2007.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.
   [RFC2203]  Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol
              Specification", RFC 2203, DOI 10.17487/RFC2203,
              September 1997.

   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "DNS Security Introduction and Requirements",
              RFC 4033, DOI 10.17487/RFC4033, March 2005.

   [RFC4055]  Schaad, J., Kaliski, B., and R. Housley, "Additional
              Algorithms and Identifiers for RSA Cryptography for use in
              the Internet X.509 Public Key Infrastructure Certificate
              and Certificate Revocation List (CRL) Profile", RFC 4055,
              DOI 10.17487/RFC4055, June 2005.

   [RFC5403]  Eisler, M., "RPCSEC_GSS Version 2", RFC 5403,
              DOI 10.17487/RFC5403, February 2009.

   [RFC5661]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010.

   [RFC7530]  Haynes, T., Ed. and D. Noveck, Ed., "Network File System
              (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530,
              March 2015.

   [RFC7861]  Adamson, A. and N. Williams, "Remote Procedure Call (RPC)
              Security Version 3", RFC 7861, DOI 10.17487/RFC7861,
              November 2016.

   [RFC7931]  Noveck, D., Ed., Shivam, P., Lever, C., and B. Baker,
              "NFSv4.0 Migration: Specification Update", RFC 7931,
              DOI 10.17487/RFC7931, July 2016.

16.2.  Informative References

   [I-D.cel-nfsv4-mv0-trunking-update]
              Lever, C. and D. Noveck, "NFS version 4.0 Trunking
              Update", draft-cel-nfsv4-mv0-trunking-update-00 (work in
              progress), November 2017.

   [RFC2104]  Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
              Hashing for Message Authentication", RFC 2104,
              DOI 10.17487/RFC2104, February 1997.

Appendix A.  Classification of Document Sections

   Using the classification appearing in Section 3.3, we can proceed
   through the current document and classify its sections as listed
   below.  In this listing, when we refer to a Section X and there is a
   Section X.1 within it, the classification of Section X refers to the
   part of that section exclusive of subsections.  When that portion is
   empty, the section is not counted.

   o  Sections 1 through 4, a total of five sections, are all
      explanatory.

   o  Section 4.1 is a replacement section.

   o  Section 4.2 is an additional section.

   o  Section 4.3 is a replacement section.

   o  Section 4.4 is explanatory.

   o  Section 4.5 is a replacement section.

   o  Sections 4.5.1 and 4.5.2 are additional sections.

   o  Sections 4.5.3 through 4.5.5, a total of three sections, are all
      replacement sections.

   o  Section 4.5.6 is an additional section.

   o  Section 5 is explanatory.

   o  Sections 6 and 7 are additional sections.

   o  Sections 8 through 8.9, a total of ten sections, are all
      replacement sections.

   o  Sections 9 through 11.2, a total of eleven sections, are all
      additional sections.

   o  Section 12 is explanatory.

   o  Sections 12.1 and 12.2 are replacement sections.

   o  Sections 12.3 and 12.4 are editing sections.

   o  Section 12.5 is explanatory.

   o  Section 13 and its subsections, a total of five sections, are all
      replacement sections.

   o  Section 14 is an editing section.

   o  Sections 15 through Acknowledgments, a total of six sections, are
      all explanatory.

   To summarize:

   o  There are fifteen explanatory sections.

   o  There are twenty-two replacement sections.

   o  There are seventeen additional sections.

   o  There are three editing sections.

Appendix B.  Updates to RFC5661

   In this appendix, we proceed through [RFC5661], identifying sections
   as unchanged, modified, deleted, or replaced, and indicating where
   additional sections from the current document would appear in an
   eventual consolidated description of NFSv4.1.  In this presentation,
   when Section X is referred to, it denotes that section plus all
   included subsections.  When it is necessary to refer to the part of a
   section outside any included subsections, the exclusion is noted
   explicitly.

   o  Section 1 is unmodified except that Section 1.7.3.3 is to be
      replaced by Section 12.1 from the current document.

   o  Section 2 is unmodified except for the specific items listed
      below:

      o  Section 2.10.4 is replaced by Section 12.2 from the current
         document.

      o  Section 2.10.5 is modified as discussed in Section 12.4 of the
         current document.

   o  Sections 3 through 10 are unchanged.

   o  Section 11 is extensively modified as discussed below.

      o  Section 11, exclusive of subsections, is replaced by Sections
         4.1 and 4.2 from the current document.

      o  Section 11.1 is replaced by Section 4.3 from the current
         document.

      o  Sections 11.2, 11.3, 11.3.1, and 11.3.2 are unchanged.

      o  Section 11.4 is replaced by Section 4.5 from the current
         document.  For details regarding subsections, see below.

         o  New sections corresponding to Sections 4.5.1 and 4.5.2 from
            the current document appear next.

         o  Section 11.4.1 is replaced by Section 4.5.3.

         o  Section 11.4.2 is replaced by Section 4.5.4.

         o  Section 11.4.3 is replaced by Section 4.5.5.

         o  A new section corresponding to Section 4.5.6 from the
            current document appears next.

      o  Section 11.5 is to be deleted.

      o  Section 11.6 is unchanged.

      o  New sections corresponding to Sections 6 and 7 from the current
         document appear next.
      o  Section 11.7 is replaced by Section 8 from the current
         document.  For details regarding subsections, see below.

         o  Section 11.7.1 is replaced by Section 8.1.

         o  Sections 11.7.2, 11.7.2.1, and 11.7.2.2 are deleted.

         o  Section 11.7.3 is replaced by Section 8.2.

         o  Section 11.7.4 is replaced by Section 8.3.

         o  Sections 11.7.5 and 11.7.5.1 are replaced by Sections 8.4
            and 8.4.1, respectively.

         o  Section 11.7.6 is replaced by Section 8.5.

         o  Section 11.7.7, exclusive of subsections, is replaced by
            Section 8.9.  Sections 11.7.7.1 and 11.7.7.2 are unchanged.

         o  Section 11.7.8 is replaced by Section 8.6.

         o  Section 11.7.9 is replaced by Section 8.7.

         o  Section 11.7.10 is replaced by Section 8.8.

      o  Sections 11.8, 11.8.1, 11.8.2, 11.9, 11.10, 11.10.1, 11.10.2,
         11.10.3, and 11.11 are unchanged.

      o  New sections corresponding to Sections 9, 10, and 11 from the
         current document appear next as additional subsections of
         Section 11.  Each of these has subsections, so there is a total
         of seventeen sections added.

   o  Sections 12 through 14 are unchanged.

   o  Section 15 is unmodified except that the description of
      NFS4ERR_MOVED in Section 15.1 is revised as described in
      Section 12.3 of the current document.

   o  Sections 16 and 17 are unchanged.

   o  Section 18 is unmodified except that Section 18.35 is replaced by
      Section 13 of the current document.

   o  Sections 19 through 23 are unchanged.

   In terms of top-level sections, exclusive of appendices:

   o  There is one heavily modified top-level section (Section 11).

   o  There are four other modified top-level sections (Sections 1, 2,
      15, and 18).

   o  The other eighteen top-level sections are unchanged.

   The disposition of sections of [RFC5661] is summarized in the
   following table, which provides counts of sections replaced, added,
   deleted, modified, or unchanged.
   Separate counts are provided for:

   o  Top-level sections.

   o  Sections with TOC entries.

   o  Sections within Section 11.

   o  Sections outside Section 11.

   In this table, the counts for top-level sections and TOC entries are
   for sections including subsections, while other counts are for
   sections exclusive of included subsections.

   +------------+------+------+--------+------------+--------+
   | Status     | Top  | TOC  | in 11  | not in 11  | Total  |
   +------------+------+------+--------+------------+--------+
   | Replaced   | 0    | 3    | 17     | 7          | 24     |
   | Added      | 0    | 5    | 22     | 0          | 22     |
   | Deleted    | 0    | 1    | 4      | 0          | 4      |
   | Modified   | 5    | 4    | 0      | 2          | 2      |
   | Unchanged  | 18   | 212  | 16     | 918        | 934    |
   | in RFC5661 | 23   | 220  | 37     | 927        | 964    |
   +------------+------+------+--------+------------+--------+

Acknowledgments

   The authors wish to acknowledge the important role of Andy Adamson of
   NetApp in clarifying the need for trunking discovery functionality,
   and exploring the role of the location attributes in providing the
   necessary support.

   The authors also wish to acknowledge the work of Xuan Qi of Oracle
   with NFSv4.1 client and server prototypes of transparent state
   migration functionality.

   The authors wish to thank Trond Myklebust of Primary Data for his
   comments related to trunking, helping to clarify the role of DNS in
   trunking discovery.

   The authors wish to thank Olga Kornievskaia of NetApp for her helpful
   review comments.

Authors' Addresses

   David Noveck (editor)
   NetApp
   1601 Trapelo Road
   Waltham, MA  02451
   United States of America

   Phone: +1 781 572 8038
   Email: davenoveck@gmail.com

   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI  48104
   United States of America

   Phone: +1 248 614 5091
   Email: chuck.lever@oracle.com