NFSv4                                                     D. Noveck, Ed.
Internet-Draft                                                    NetApp
Updates: 5661 (if approved)                                     C. Lever
Intended status: Standards Track                                  ORACLE
Expires: May 17, 2019                                  November 13, 2018


           NFS Version 4.1 Update for Multi-Server Namespace
                  draft-ietf-nfsv4-mv1-msns-update-03

Abstract

   This document presents necessary clarifications and corrections
   concerning features related to the use of attributes in NFSv4.1
   related to file system location. These features include migration,
   which transfers responsibility for a file system from one server to
   another, and facilities to support trunking by allowing discovery of
   the set of network addresses to use to access a file system. This
   document updates RFC5661.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 17, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Preliminaries
     3.1.  Terminology
     3.2.  Summary of Issues
     3.3.  Relationship of this Document to [RFC5661]
   4.  Changes to Section 11 of [RFC5661]
     4.1.  Updated introductory material for Section 11 of [RFC5661]
           entitled "Multi-Server Namespace"
     4.2.  New section to be added as the first sub-section of
           Section 11 of [RFC5661] to be entitled
           "Terminology Related to File System Location"
     4.3.  Updated Section 11.1 of [RFC5661] to be retitled
           "File System Location Attributes"
     4.4.  Re-organization of Sections 11.4 and 11.5 of [RFC5661]
     4.5.  Updated Section 11.4 of [RFC5661] to be retitled
           "Uses of File System Location Information"
       4.5.1.  New section to be added as the first sub-section of
               Section 11.4 of [RFC5661] to be entitled
               "Combining Multiple Uses in a Single Attribute"
       4.5.2.  New section to be added as the second sub-section of
               Section 11.4 of [RFC5661] to be entitled
               "File System Location Attributes and Trunking"
       4.5.3.  New section to be added as the third sub-section of
               Section 11.4 of [RFC5661] to be entitled
               "File System Location Attributes and Connection Type
               Selection"
       4.5.4.  Updated Section 11.4.1 of [RFC5661] entitled
               "File System Replication"
       4.5.5.  Updated Section 11.4.2 of [RFC5661] entitled
               "File System Migration"
       4.5.6.  Updated Section 11.4.3 of [RFC5661] entitled
               "Referrals"
       4.5.7.  New section to be added after Section 11.4.3 of
               [RFC5661] to be entitled
               "Changes in a File System Location Attribute"
   5.  Re-organization of Section 11.7 of [RFC5661]
   6.  New section to be added after Section 11.6 of [RFC5661]
       to be entitled "Overview of File Access Transitions"
   7.  New section to be added second after Section 11.6 of
       [RFC5661] to be entitled
       "Effecting Network Endpoint Transitions"
   8.  Updated Section 11.7 of [RFC5661] entitled
       "Effecting File System Transitions"
     8.1.  Updated Section 11.7.1 of [RFC5661] entitled
           "File System Transitions and Simultaneous Access"
     8.2.  Updated Section 11.7.3 of [RFC5661] entitled
           "Filehandles and File System Transitions"
     8.3.  Updated Section 11.7.4 of [RFC5661] entitled
           "Fileids and File System Transitions"
     8.4.  Updated Section 11.7.5 of [RFC5661] entitled
           "Fsids and File System Transitions"
       8.4.1.  Updated Section 11.7.5.1 of [RFC5661] entitled
               "File System Splitting"
     8.5.  Updated Section 11.7.6 of [RFC5661] entitled
           "The Change Attribute and File System Transitions"
     8.6.  Updated Section 11.7.8 of [RFC5661] entitled
           "Write Verifiers and File System Transitions"
     8.7.  Updated Section 11.7.9 of [RFC5661] entitled
           "Readdir Cookies and Verifiers and File System
           Transitions"
     8.8.  Updated Section 11.7.10 of [RFC5661] entitled
           "File System Data and File System Transitions"
     8.9.  Updated Section 11.7.7 of [RFC5661] entitled
           "Lock State and File System Transitions"
   9.  New section to be added after Section 11.11 of [RFC5661]
       to be entitled "Transferring State upon Migration"
     9.1.  Only sub-section within new section to be added to
           [RFC5661] to be entitled
           "Transparent State Migration and pNFS"
   10. New section to be added second after Section 11.11 of
       [RFC5661] to be entitled
       "Client Responsibilities when Access is Transitioned"
     10.1.  First sub-section within new section to be added to
            [RFC5661] to be entitled
            "Client Transition Notifications"
     10.2.  Second sub-section within new section to be added to
            [RFC5661] to be entitled
            "Performing Migration Discovery"
     10.3.  Third sub-section within new section to be added to
            [RFC5661] to be entitled
            "Overview of Client Response to NFS4ERR_MOVED"
     10.4.  Fourth sub-section within new section to be added to
            [RFC5661] to be entitled
            "Obtaining Access to Sessions and State after Migration"
     10.5.  Fifth sub-section within new section to be added to
            [RFC5661] to be entitled
            "Obtaining Access to Sessions and State after Network
            Address Transfer"
   11. New section to be added third after Section 11.11 of
       [RFC5661] to be entitled
       "Server Responsibilities Upon Migration"
     11.1.  First sub-section within new section to be added to
            [RFC5661] to be entitled
            "Server Responsibilities in Effecting State Reclaim
            after Migration"
     11.2.  Second sub-section within new section to be added to
            [RFC5661] to be entitled
            "Server Responsibilities in Effecting Transparent State
            Migration"
     11.3.  Third sub-section within new section to be added to
            [RFC5661] to be entitled
            "Server Responsibilities in Effecting Session Transfer"
   12. fs_locations_info
     12.1.  Updates to treatment of fs_locations_info
     12.2.  Updated Section 11.10 of [RFC5661] entitled
            "The Attribute fs_locations_info"
       12.2.1.  Updated Section 11.10.1 of [RFC5661] entitled
                "The fs_locations_server4 Structure"
       12.2.2.  Updated Section 11.10.2 of [RFC5661] entitled
                "The fs_locations_info4 Structure"
       12.2.3.  Updated Section 11.10.3 of [RFC5661] entitled
                "The fs_locations_item4 Structure"
   13. Changes to [RFC5661] outside Section 11
     13.1.  Updated Section 1.7.3.3 of [RFC5661] to be retitled
            "Introduction to Multi-Server Namespace"
     13.2.  Updated Section 2.10.4 of [RFC5661] entitled
            "Server Scope"
     13.3.  Revised Treatment of NFS4ERR_MOVED
     13.4.  Revised Discussion of Server_owner changes
     13.5.  Revision to Treatment of EXCHANGE_ID
     13.6.  Revision to Treatment of RECLAIM_COMPLETE
     13.7.  Updated Section 15.1.9 of [RFC5661] entitled
            "Reclaim Errors"
       13.7.1.  Updated Section 15.1.9.1 of [RFC5661] entitled
                "NFS4ERR_COMPLETE_ALREADY (Error Code 10054)"
       13.7.2.  Updated Section 15.1.9.2 of [RFC5661] entitled
                "NFS4ERR_GRACE (Error Code 10013)"
       13.7.3.  Updated Section 15.1.9.3 of [RFC5661] entitled
                "NFS4ERR_NO_GRACE (Error Code 10033)"
       13.7.4.  Updated Section 15.1.9.4 of [RFC5661] entitled
                "NFS4ERR_RECLAIM_BAD (Error Code 10034)"
       13.7.5.  Updated Section 15.1.9.5 of [RFC5661] entitled
                "NFS4ERR_RECLAIM_CONFLICT (Error Code 10035)"
   14. Updated Section 18.35 of [RFC5661] entitled
       "Operation 42: EXCHANGE_ID - Instantiate Client ID"
   15. Updated Section 18.51 of [RFC5661] entitled
       "Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
       Finished"
   16. Security Considerations
   17. IANA Considerations
   18. References
     18.1.  Normative References
     18.2.  Informative References
   Appendix A.  Classification of Document Sections
   Appendix B.  Updates to [RFC5661]
   Acknowledgments
   Authors' Addresses

1. Introduction

   This document defines the proper handling, within NFSv4.1, of the
   file system location attributes, fs_locations and
   fs_locations_info, and describes how necessary changes in those
   attributes are to be dealt with.
   The necessary corrections and clarifications
   parallel those done for NFSv4.0 in [RFC7931] and
   [I-D.cel-nfsv4-mv0-trunking-update].

   A large part of the changes to be made are necessary to clarify the
   handling of Transparent State Migration in NFSv4.1, which was omitted
   in [RFC5661]. Many of the issues dealt with in [RFC7931] need to be
   addressed in the context of NFSv4.1.

   Another important issue to be dealt with concerns the handling of
   multiple entries within attributes related to file system locations
   that represent different ways to access the same file system.
   Unfortunately, [RFC5661], while recognizing that these entries can
   represent different ways to access the same file system, confuses the
   matter by treating network access paths as "replicas". This makes it
   difficult for these attributes to be used to obtain information about
   the network addresses to be used to access particular file system
   instances. It also engenders confusion between two different sorts of
   transition: those involving a change of network access paths to the
   same file system instance and those in which there is a shift between
   two distinct replicas.

   When file system location information is used to determine the set of
   network addresses to access a particular file system instance (i.e.,
   to perform trunking discovery), clarification is needed regarding the
   interaction of trunking and transitions between file system replicas,
   including migration. Unfortunately, [RFC5661], while it provided a
   method of determining whether two network addresses were connected to
   the same server, did not address the issue of trunking discovery,
   making it necessary to address it in this document.

2. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3. Preliminaries

3.1. Terminology

   While most of the terms related to multi-server namespace issues are
   appropriately defined in the replacement for Section 11 in [RFC5661]
   and appear in Section 4.2 below, there are a number of terms used
   outside that context that are explained here.

   In this document, the phrase "client ID" always refers to the 64-bit
   shorthand identifier assigned by the server (a clientid4) and never
   to the structure which the client uses to identify itself to the
   server (called an nfs_client_id4 or client_owner in NFSv4.0 and
   NFSv4.1 respectively). The opaque identifier within those structures
   is referred to as a "client id string".

   It is particularly important to clarify the distinction between
   trunking detection and trunking discovery. The definitions we
   present will be applicable to all minor versions of NFSv4, but we
   will put particular emphasis on how these terms apply to NFS version
   4.1.

   o  Trunking detection refers to ways of deciding whether two specific
      network addresses are connected to the same NFSv4 server. The
      means available to make this determination depend on the protocol
      version, and, in some cases, on the client implementation.

      In the case of NFS version 4.1 and later minor versions, the means
      of trunking detection are as described by [RFC5661] and are
      available to every client. Two network addresses connected to the
      same server are always server-trunkable but cannot necessarily be
      used together to access a single session.

   o  Trunking discovery is a process by which a client using one
      network address can obtain other addresses that are connected to
      the same server.
      Typically it builds on a trunking detection
      facility by providing one or more methods by which candidate
      addresses are made available to the client, which can then use
      trunking detection to appropriately filter them.

      Despite the support for trunking detection, there was no
      description of trunking discovery provided in [RFC5661].

   Regarding network addresses and the handling of trunking, we use the
   following terminology:

   o  Each NFSv4 server is assumed to have a set of IP addresses to
      which NFSv4 requests may be sent by clients. These are referred
      to as the server's network addresses. Access to a specific server
      network address may involve the use of multiple ports, since the
      ports to be used for various types of connections might be
      required to be different.

   o  Each network address, when combined with a pathname providing the
      location of a file system root directory relative to the
      associated server root file handle, defines a file system network
      access path.

   o  Server network addresses are used to establish connections to
      servers, which may be of a number of connection types. Separate
      connection types are used to support NFSv4 layered on top of the
      RPC stream transport as described in [RFC5531] and on top of
      RPC-over-RDMA as described in [RFC8166].

   o  The combination of a server network address and a particular
      connection type to be used by a connection is referred to as a
      "server endpoint". Although using different connection types may
      result in different ports being used, the use of different ports
      by multiple connections to the same network address is not the
      essence of the distinction between the two endpoints used.

   o  Two network addresses connected to the same server are said to be
      server-trunkable. Two such addresses support the use of client
      ID trunking, as described in [RFC5661].

   o  Two network addresses connected to the same server such that those
      addresses can be used to support a single common session are
      referred to as session-trunkable. Note that two addresses may be
      server-trunkable without being session-trunkable. Note also that
      when two connections of different connection types are made to the
      same network address and are based on a single file system
      location entry, they are always session-trunkable, independent of
      the connection type, as specified by [RFC5661]. Their derivation
      from the same file system location entry, together with the
      identity of their network addresses, assures that both connections
      are to the same server and will return server-owner information
      allowing session trunking to be used.

   Discussion of the term "replica" is complicated for a number of
   reasons:

   o  Even though the term is used in explaining the issues in [RFC5661]
      that need to be addressed in this document, a full explanation of
      this term requires explanation of related terms connected to the
      file system location attributes, which are provided in Section 4.2
      of the current document.

   o  The term is also used in [RFC5661], with a meaning different from
      that in the current document. In short, in [RFC5661] each replica
      is identified by a single network access path while, in the
      current document, a set of network access paths which have
      server-trunkable network addresses and the same root-relative file
      system pathname is considered to be a single replica with multiple
      network access paths.

3.2. Summary of Issues

   This document explains how clients and servers are to determine the
   particular network access paths to be used to access a file system.
   This includes describing how changes to the specific replica or to
   the set of addresses to be used are to be dealt with, and how
   transfers of responsibility that need to be made can be dealt with
   transparently. This includes cases in which there is a shift between
   one replica and another and those in which different network access
   paths are used to access the same replica.

   As a result of the following problems in [RFC5661], it is necessary
   to provide the updates described later in this document.

   o  [RFC5661], while it dealt with situations in which various forms
      of clustering allowed co-ordination of the state assigned by
      co-operating servers to be used, made no provisions for
      Transparent State Migration, as introduced by [RFC7530] and
      corrected and clarified by [RFC7931].

   o  Although NFSv4.1 was defined with a clear definition of how
      trunking detection was to be done, there was no clear
      specification of how trunking discovery was to be done, despite
      the fact that the specification clearly indicated that this
      information could be made available via the file system location
      attributes.

   o  Because the existence of multiple network access paths to the same
      file system was dealt with as if there were multiple replicas,
      issues relating to transitions between replicas could never be
      clearly distinguished from trunking-related transitions between
      the addresses used to access a particular file system instance.
      As a result, in situations in which both migration and trunking
      configuration changes were involved, neither of these could be
      clearly dealt with and the relationship between these two features
      was not seriously addressed.

   o  Because use of two network access paths to the same file system
      instance (i.e., trunking) was often treated as if two replicas
      were involved, it was considered that two replicas were being used
      simultaneously. As a result, the treatment of replicas being used
      simultaneously in [RFC5661] was not clear, as it covered the two
      distinct cases of a single file system instance being accessed by
      two different network access paths and two replicas being accessed
      simultaneously, with the limitations of the latter case not being
      clearly laid out.

   The majority of the consequences of these issues are dealt with via
   the updates in various subsections of Section 4 and the whole of
   Section 12 within the current document, which deal with problems
   within Section 11 of [RFC5661]. These changes include:

   o  Reorganization made necessary by the fact that two network access
      paths to the same file system instance need to be distinguished
      clearly from two different replicas, since the former share
      locking state and can share session state.

   o  The need for a clear statement regarding the desirability of
      transparent transfer of state together with a recommendation that
      either that or a single-fs grace period be provided.

   o  Specifically delineating how such transfers are to be dealt with
      by the client, taking into account the differences from the
      treatment in [RFC7931] made necessary by the major protocol
      changes made in NFSv4.1.

   o  Discussion of the relationship between transparent state transfer
      and Parallel NFS (pNFS).

   o  A clarification of the fs_locations_info attribute to specify
      which portions of the information provided apply to a specific
      network access path and which to the replica which that path is
      used to access.

   In addition, there are also updates to other sections of [RFC5661],
   where the consequences of the incorrect assumptions underlying the
   current treatment of multi-server namespace issues also need to be
   corrected. These are to be dealt with as described in Sections 13
   through 15 of the current document.

   o  A revised introductory section regarding multi-server namespace
      facilities is provided.

   o  A more realistic treatment of server scope is provided, which
      reflects the more limited co-ordination of locking state adopted
      by servers actually sharing a common server scope.

   o  Some confusing text regarding changes in server_owner needs to be
      clarified.

   o  The description of NFS4ERR_MOVED needs to be updated since two
      different network access paths to the same file system are no
      longer considered to be two instances of the same file system.

   o  A new treatment of EXCHANGE_ID is needed, replacing that which
      appeared in Section 18.35 of [RFC5661]. This is necessary since
      the existing treatment of client ID confirmation does not make
      sense in the context of transparent state migration, in which
      client IDs are transferred between source and destination servers.

   o  A new treatment of RECLAIM_COMPLETE is needed, replacing that
      which appeared in Section 18.51 of [RFC5661]. This is necessary
      to clarify the function of the one-fs flag and clarify how
      existing clients that might not properly use this flag are to be
      dealt with.

3.3. Relationship of this Document to [RFC5661]

   The role of this document is to explain and specify a set of needed
   changes to [RFC5661]. All of these changes are related to the
   multi-server namespace features of NFSv4.1.

   This document contains sections that propose additions to and other
   modifications of [RFC5661] as well as others that explain the reasons
   for modifications but do not directly affect existing specifications.

   In consequence, the sections of this document can be divided into
   four groups based on how they relate to the eventual updating of the
   NFSv4.1 specification.
   Once the update is published, NFSv4.1 will be
   specified by two documents that need to be read together, until such
   time as a consolidated specification is produced.

   o  Explanatory sections do not contain any material that is meant to
      update the specification of NFSv4.1. Such sections may contain
      explanations about why and how changes are to be done, without
      including any text that is to update [RFC5661] or appear in an
      eventual consolidated document.

   o  Replacement sections contain text that is to replace and thus
      supersede text within [RFC5661] and then appear in an eventual
      consolidated document. The titles of replacement sections
      indicate the section(s) within [RFC5661] that are to be replaced.

   o  Additional sections contain text which, although not replacing
      anything in [RFC5661], will be part of the specification of
      NFSv4.1 and will be expected to be part of an eventual
      consolidated document. The titles of additional sections indicate
      where, within [RFC5661], the new section would appear.

   o  Editing sections contain some text that replaces text within
      [RFC5661], although the entire section will not consist of such
      text and will include other text as well. Such sections make
      relatively minor adjustments in the existing NFSv4.1 specification
      which are expected to be reflected in an eventual consolidated
      document. Generally such replacement text appears as a quotation,
      which may take the form of an indented set of paragraphs.

   See Appendix A for a classification of the sections of this document
   according to the categories above.

   When this document is approved and published, [RFC5661] would be
   significantly updated with most of the changed sections within the
   current Section 11 of that document. A detailed discussion of the
   necessary updates can be found in Appendix B.

4. Changes to Section 11 of [RFC5661]

   A number of sections need to be revised, replacing existing
   sub-sections within Section 11 of [RFC5661]:

   o  New introductory material, including a terminology section,
      replaces the existing material in [RFC5661] ranging from the start
      of the existing Section 11 up to and including the existing
      Section 11.1. The new material appears in Sections 4.1 through
      4.3 below.

   o  A significant reorganization of the material in the existing
      Sections 11.4 and 11.5 (of [RFC5661]) is necessary. The reasons
      for the reorganization of these sections into a single section
      with multiple subsections are discussed in Section 4.4 below.
      This replacement appears as Section 4.5 below.

      New material relating to the handling of the file system location
      attributes is contained in Sections 4.5.1 and 4.5.7 below.

   o  A major replacement for the existing Section 11.7 of [RFC5661],
      entitled "Effecting File System Transitions", will appear as
      Sections 6 through 11 of the current document. The reasons for
      the reorganization of this section into multiple sections are
      discussed below in Section 5 of the current document.

   o  A replacement for the existing Section 11.10 of [RFC5661],
      entitled "The Attribute fs_locations_info", will appear as
      Section 12.2 of the current document, with Section 12.1 describing
      the differences between the new section and the treatment within
      [RFC5661]. A revised treatment is necessary because the existing
      treatment did not make clear how the added attribute information
      relates to the case of trunked paths to the same replica. These
      issues were not addressed in [RFC5661], where the concepts of a
      replica and a network path used to access a replica were not
      clearly distinguished.

4.1. Updated introductory material for Section 11 of [RFC5661] entitled
     "Multi-Server Namespace"

   NFSv4.1 supports attributes that allow a namespace to extend beyond
   the boundaries of a single server. It is desirable that clients and
   servers support construction of such multi-server namespaces. Use of
   such multi-server namespaces is OPTIONAL, however, and for many
   purposes, single-server namespaces are perfectly acceptable. Use of
   multi-server namespaces can provide many advantages, by separating a
   file system's logical position in a namespace from the (possibly
   changing) logistical and administrative considerations that result in
   particular file systems being located on particular servers.

4.2. New section to be added as the first sub-section of Section 11 of
     [RFC5661] to be entitled "Terminology Related to File System
     Location"

   Regarding terminology relating to the construction of multi-server
   namespaces out of a set of local per-server namespaces:

   o  Each server has a set of exported file systems which may be
      accessed by NFSv4 clients. Typically, this is done by assigning
      each file system a name within the pseudo-fs associated with the
      server, although the pseudo-fs may be dispensed with if there is
      only a single exported file system. Each such file system is part
      of the server's local namespace, and can be considered as a file
      system instance within a larger multi-server namespace.

   o  The set of all exported file systems for a given server
      constitutes that server's local namespace.

   o  In some cases, a server will have a namespace more extensive than
      its local namespace, by using features associated with attributes
      that provide file system location information.
   These features, which allow construction of a multi-server namespace, are all described in individual sections below and include referrals (described in Section 4.5.6), migration (described in Section 4.5.5), and replication (described in Section 4.5.4).

o  A file system present in a server's pseudo-fs may have multiple file system instances on different servers associated with it. All such instances are considered replicas of one another.

o  When a file system is present in a server's pseudo-fs, but there is no corresponding local file system, it is said to be "absent". In such cases, all associated instances will be accessed on other servers.

Regarding terminology relating to attributes used in trunking discovery and other multi-server namespace features:

o  File system location attributes include the fs_locations and fs_locations_info attributes.

o  File system location entries provide the individual file system locations within the file system location attributes. Each such entry specifies a server, in the form of a host name or IP address, and an fs name, which designates the location of the file system within the server's pseudo-fs. A file system location entry designates a set of server endpoints to which the client may establish connections. There may be multiple endpoints because a host name may map to multiple network addresses and because multiple connection types may be used to communicate with a single network address. However, all such endpoints MUST provide a way of connecting to a single server. The exact form of the location entry varies with the particular file system location attribute used, as described in Section 4.3.

o  File system location elements are derived from location entries and each describes a particular network access path, consisting of a network address and a location within the server's pseudo-fs.
   Such location elements need not appear within a file system location attribute, but the existence of each location element derives from a corresponding location entry. When a location entry specifies an IP address there is only a single corresponding location element. File system location entries that contain a host name are resolved using DNS, and may result in one or more location elements. All location elements consist of a location address which is the IP address of an interface to a server and an fs name which is the location of the file system within the server's pseudo-fs. The fs name is empty if the server has no pseudo-fs and only a single exported file system at the root filehandle.

o  Two file system location elements are said to be server-trunkable if they specify the same fs name and their location addresses are server-trunkable. When the corresponding network paths are used, the client will always be able to use client ID trunking, but will only be able to use session trunking if the paths are also session-trunkable.

o  Two file system location elements are said to be session-trunkable if they specify the same fs name and their location addresses are session-trunkable. When the corresponding network paths are used, the client will be able to use either client ID trunking or session trunking.

Each set of server-trunkable location elements defines a set of available network access paths to a particular file system. When there are multiple such file systems, each of which contains the same data, these file systems are considered replicas of one another. Logically, such replication is symmetric, since the fs currently in use and an alternate fs are replicas of each other.
Often, in other documents, the term "replica" is not applied to the fs currently in use, despite the fact that the replication relation is inherently symmetric.

4.3. Updated Section 11.1 of [RFC5661] to be retitled "File System Location Attributes"

NFSv4.1 contains RECOMMENDED attributes that provide information about how (i.e., at what network address and namespace position) a given file system may be accessed. As a result, file systems in the namespace of one server can be associated with one or more instances of that file system on other servers. These attributes contain file system location entries specifying a server address target (either as a DNS name representing one or more IP addresses or as a specific IP address) together with the pathname of that file system within the associated single-server namespace.

The fs_locations_info RECOMMENDED attribute allows specification of one or more file system instance locations where the data corresponding to a given file system may be found. This attribute provides to the client, in addition to specification of file system instance locations, other helpful information such as:

o  Information guiding choices among the various file system instances provided (e.g., priority for use, writability, currency, etc.).

o  Information to help the client efficiently effect as seamless a transition as possible among multiple file system instances, when and if that should be necessary.

o  Information helping to guide the selection of the appropriate connection type to be used when establishing a connection.

Within the fs_locations_info attribute, each fs_locations_server4 entry corresponds to a file system location entry with the fls_server field designating the server, with the location pathname within the server's pseudo-fs given by the fli_rootpath field of the encompassing fs_locations_item4.
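
As a rough illustration of the mapping just described, the following Python sketch flattens an fs_locations_info-like value into (server, fs name) location entries. The class names mirror the XDR type names, but the classes themselves, the function location_entries(), and the sample host names are assumptions made for this example. Only the server field and the root path field of fs_locations_item4 (named fli_rootpath in this sketch) are modeled; all other fields of the attribute are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FsLocationsServer4:
    # Host name or IP address of the server (the fls_server field).
    fls_server: str


@dataclass
class FsLocationsItem4:
    # Servers that share a single root path.
    fli_entries: List[FsLocationsServer4]
    # Path components of the file system within the server's pseudo-fs.
    fli_rootpath: List[str]


def location_entries(items: List[FsLocationsItem4]) -> List[Tuple[str, str]]:
    """Return one (server, fs name) pair per fs_locations_server4 entry."""
    entries = []
    for item in items:
        fs_name = "/" + "/".join(item.fli_rootpath)
        for server in item.fli_entries:
            entries.append((server.fls_server, fs_name))
    return entries


# Hypothetical sample data: two items, the first naming two servers.
items = [
    FsLocationsItem4(
        fli_entries=[FsLocationsServer4("nfs1.example.net"),
                     FsLocationsServer4("192.0.2.10")],
        fli_rootpath=["export", "proj"],
    ),
    FsLocationsItem4(
        fli_entries=[FsLocationsServer4("nfs2.example.net")],
        fli_rootpath=["vol", "proj"],
    ),
]

for server, fs_name in location_entries(items):
    print(server, fs_name)
```

Each pair produced this way is a location entry in the sense of the terminology section; resolving a host name through DNS would then yield the individual location elements.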
The fs_locations attribute defined in NFSv4.0 is also a part of NFSv4.1. This attribute only allows specification of the file system locations where the data corresponding to a given file system may be found. Servers should make this attribute available whenever fs_locations_info is supported, but client use of fs_locations_info is preferable, as it provides more information.

Within the fs_locations attribute, each fs_location4 contains a file system location entry with the server field designating the server and the rootpath field giving the location pathname within the server's pseudo-fs.

4.4. Re-organization of Sections 11.4 and 11.5 of [RFC5661]

Previously, issues related to the fact that multiple location entries directed the client to the same file system instance were dealt with in a separate Section 11.5 of [RFC5661]. Because of the new treatment of trunking, these issues now belong within Section 4.5 below.

In this new section of the current document, trunking is dealt with in Section 4.5.2 together with the other uses of file system location information described in Sections 4.5.4, 4.5.5, and 4.5.6.

4.5. Updated Section 11.4 of [RFC5661] to be retitled "Uses of File System Location Information"

The file system location attributes (i.e., fs_locations and fs_locations_info), together with the possibility of absent file systems, provide a number of important facilities in providing reliable, manageable, and scalable data access.

When a file system is present, these attributes can provide:

o  The locations of alternative replicas, to be used to access the same data in the event of server failures, communications problems, or other difficulties that make continued access to the current replica impossible or otherwise impractical.
   Provision and use of such alternate replicas is referred to as "replication" and is discussed in Section 4.5.4 below.

o  The network address(es) to be used to access the current file system instance or replicas of it. Client use of this information is discussed in Section 4.5.2 below.

Under some circumstances, multiple replicas may be used simultaneously to provide higher-performance access to the file system in question, although the lack of state sharing between servers may be an impediment to such use.

When a file system is present and becomes absent, clients can be given the opportunity to have continued access to their data, using a different replica. In this case, a continued attempt to use the data in the now-absent file system will result in an NFS4ERR_MOVED error and, at that point, the successor replica or set of possible replica choices can be fetched and used to continue access. Transfer of access to the new replica location is referred to as "migration", and is discussed in Section 4.5.5 below.

Where a file system was previously absent, specification of file system location provides a means by which file systems located on one server can be associated with a namespace defined by another server, thus allowing a general multi-server namespace facility. A designation of such a remote instance, in place of a file system never previously present, is called a "pure referral" and is discussed in Section 4.5.6 below.

Because client support for attributes related to file system location is OPTIONAL, a server may (but is not required to) take action to hide migration and referral events from such clients, by acting as a proxy, for example. The server can determine the presence of client support from the arguments of the EXCHANGE_ID operation (see Section 14.3 in the current document).

4.5.1.
New section to be added as the first sub-section of Section 11.4 of [RFC5661] to be entitled "Combining Multiple Uses in a Single Attribute"

A file system location attribute will sometimes contain information relating to the location of multiple replicas which may be used in different ways.

o  File system location entries that relate to the file system instance currently in use provide trunking information, allowing the client to find additional network addresses by which the instance may be accessed.

o  File system location entries that provide information about replicas to which access is to be transferred.

o  Other file system location entries that relate to replicas that are available to use in the event that access to the current replica becomes unsatisfactory.

In order to simplify client handling and allow the best choice of replicas to access, the server should adhere to the following guidelines.

o  All file system location entries that relate to a single file system instance should be adjacent.

o  File system location entries that relate to the instance currently in use should appear first.

o  File system location entries that relate to replica(s) to which migration is occurring should appear before replicas which are available for later use if the current replica should become inaccessible.

4.5.2. New section to be added as the second sub-section of Section 11.4 of [RFC5661] to be entitled "File System Location Attributes and Trunking"

Trunking is the use of multiple connections between a client and server in order to increase the speed of data transfer. A client may determine the set of network addresses to use to access a given file system in a number of ways:

o  When the name of the server is known to the client, it may use DNS to obtain a set of network addresses to use in accessing the server.
o  It may fetch the file system location attribute for the file system, which will provide either the name of the server (which can be turned into a set of network addresses using DNS) or a set of server-trunkable location entries that provide the addresses specified by the server as desirable to use to access the file system in question.

The server can provide location entries that include either names or network addresses. It might use the latter form because of DNS-related security concerns or because the set of addresses to be used might require active management by the server.

Location entries used to discover candidate addresses for use in trunking are subject to change, as discussed in Section 4.5.7 below. The client may respond to such changes by using additional addresses once they are verified or by ceasing to use existing ones. The server can force the client to cease using an address by returning NFS4ERR_MOVED when that address is used to access a file system. This allows a transfer of client access which is similar to migration, although the same file system instance is accessed throughout.

4.5.3. New section to be added as the third sub-section of Section 11.4 of [RFC5661] to be entitled "File System Location Attributes and Connection Type Selection"

Because of the need to support multiple connections, clients face the issue of determining the proper connection type to use when establishing a connection to a given server network address. In some cases, this issue can be addressed through the use of the connection "step-up" facility described in Section 18.16 of [RFC5661]. However, because there are cases in which that facility is not available, the client may have to choose a connection type with no possibility of changing it within the scope of a single connection.
The two file system location attributes differ as to the information made available in this regard. Fs_locations provides no information to support connection type selection. As a result, clients supporting multiple connection types would need to attempt to establish connections using multiple connection types until the one preferred by the client is successfully established.

Fs_locations_info provides a flag, FSLI4TF_RDMA, indicating that RPC-over-RDMA support is available using the specified location entry. This flag makes it convenient for a client wishing to use RDMA to establish a TCP connection and then convert to use of RDMA. After establishing a TCP connection, the step-up facility can be used, if available, to convert that connection to RDMA mode. Otherwise, if RDMA availability is indicated, a new RDMA connection can be established and bound to the session already established by the TCP connection, allowing the TCP connection to be dropped and the session converted to further use in RDMA mode.

4.5.4. Updated Section 11.4.1 of [RFC5661] entitled "File System Replication"

The fs_locations and fs_locations_info attributes provide alternative file system locations, to be used to access data in place of or in addition to the current file system instance. On first access to a file system, the client should obtain the set of alternate locations by interrogating the fs_locations or fs_locations_info attribute, with the latter being preferred.

In the event that server failures, communications problems, or other difficulties make continued access to the current file system impossible or otherwise impractical, the client can use the alternate locations as a way to get continued access to its data.
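
A client's fallback to alternate locations might be sketched as follows. This is only an illustration: the function first_reachable(), the try_access callback, and the server names are hypothetical, and a real client would also have to deal with filehandles and locking state as discussed later in this document.

```python
from typing import Callable, Iterable, Optional


def first_reachable(current: str,
                    alternates: Iterable[str],
                    try_access: Callable[[str], bool]) -> Optional[str]:
    """Return the first usable location, preferring the current one.

    try_access is a caller-supplied probe that returns True on success.
    """
    for location in [current, *alternates]:
        if try_access(location):
            return location
    return None  # no location is currently usable


# Simulated probe: only the second alternate is reachable.
reachable = {"replica-b.example.net"}
chosen = first_reachable(
    "primary.example.net",
    ["replica-a.example.net", "replica-b.example.net"],
    lambda loc: loc in reachable,
)
print(chosen)
```

The alternates are tried in the order given, which matches a client that simply walks the list obtained from the location attribute.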
The alternate locations may be physical replicas of the (typically read-only) file system data, or they may provide for the use of various forms of server clustering in which multiple servers provide alternate ways of accessing the same physical file system. How these different modes of file system transition are represented within the fs_locations and fs_locations_info attributes and how the client deals with file system transition issues will be discussed in detail below.

4.5.5. Updated Section 11.4.2 of [RFC5661] entitled "File System Migration"

When a file system is present and becomes absent, clients can be given the opportunity to have continued access to their data, at an alternate location, as specified by a file system location attribute. This migration of access to another replica includes the ability to retain locks across the transition, either by using lock reclaim or by taking advantage of Transparent State Migration.

Typically, a client will be accessing the file system in question, get an NFS4ERR_MOVED error, and then use a file system location attribute to determine the new location of the data. When fs_locations_info is used, additional information will be available that will define the nature of the client's handling of the transition to a new server.

Such migration can be helpful in providing load balancing or general resource reallocation. The protocol does not specify how the file system will be moved between servers. It is anticipated that a number of different server-to-server transfer mechanisms might be used with the choice left to the server implementer. The NFSv4.1 protocol specifies the method used to communicate the migration event between client and server.

The new location may be, in the case of various forms of server clustering, another server providing access to the same physical file system.
The client's responsibilities in dealing with this transition will depend on whether migration has occurred and the means the server has chosen to provide continuity of locking state. These issues will be discussed in detail below.

Although a single successor location is typical, multiple locations may be provided. When multiple locations are provided, the client will typically use the first one provided. If that is inaccessible for some reason, later ones can be used. In such cases, the client might consider the transition to the new replica to be a migration event, even though some of the servers involved might not be aware of the use of the server which was inaccessible. In such a case, a client might lose access to locking state as a result of the access transfer.

When an alternate location is designated as the target for migration, it must designate the same data (with metadata being the same to the degree indicated by the fs_locations_info attribute). Where file systems are writable, a change made on the original file system must be visible on all migration targets. Where a file system is not writable but represents a read-only copy (possibly periodically updated) of a writable file system, similar requirements apply to the propagation of updates. Any change visible in the original file system must already be effected on all migration targets, to avoid any possibility that a client, in effecting a transition to the migration target, will see any reversion in file system state.

4.5.6. Updated Section 11.4.3 of [RFC5661] entitled "Referrals"

Referrals allow the server to associate a file system namespace entry located on one server with a file system located on another server.
When this includes the use of pure referrals, servers are provided a way of placing a file system in a location within the namespace essentially without respect to its physical location on a particular server. This allows a single server or a set of servers to present a multi-server namespace that encompasses file systems located on a wider range of servers. Some likely uses of this facility include establishment of site-wide or organization-wide namespaces, with the eventual possibility of combining such together into a truly global namespace.

Referrals occur when a client determines, upon first referencing a position in the current namespace, that it is part of a new file system and that the file system is absent. When this occurs, typically upon receiving the error NFS4ERR_MOVED, the actual location or locations of the file system can be determined by fetching the locations attribute.

The file system location attribute may designate a single file system location or multiple file system locations, to be selected based on the needs of the client. The server, in the fs_locations_info attribute, may specify priorities to be associated with various file system location choices. The server may assign different priorities to different locations as reported to individual clients, in order to adapt to client physical location or to effect load balancing. When both read-only and read-write file systems are present, some of the read-only locations might not be absolutely up-to-date (as they would have to be in the case of replication and migration). Servers may also specify file system locations that include client-substituted variables so that different clients are referred to different file systems (with different data contents) based on client attributes such as CPU architecture.
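
One way a client might choose among several referral targets is sketched below. This is a simplified illustration, not the selection procedure defined by the protocol: the Target class, its rank and writable fields, the pick_target() function, and the convention that a smaller rank is more preferred are all assumptions of this example rather than the encoding used by fs_locations_info.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Target:
    server: str
    rank: int        # assumption: smaller value means more preferred
    writable: bool


def pick_target(targets: List[Target], need_write: bool) -> Optional[Target]:
    """Pick the most preferred target that satisfies the access need."""
    usable = [t for t in targets if t.writable or not need_write]
    if not usable:
        return None
    # min() returns the first minimal element, so list order breaks ties.
    return min(usable, key=lambda t: t.rank)


# Hypothetical targets: one read-only copy and two writable ones.
targets = [
    Target("ro1.example.net", rank=1, writable=False),
    Target("rw1.example.net", rank=5, writable=True),
    Target("rw2.example.net", rank=3, writable=True),
]

print(pick_target(targets, need_write=False).server)
print(pick_target(targets, need_write=True).server)
```

A client that only reads can accept the best-ranked target of any kind, while a client that needs to write restricts itself to writable targets before applying the server's priorities.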
When the fs_locations_info attribute is such that there are multiple possible targets listed, the relationships among them may be important to the client in selecting which one to use. The same rules specified in Section 4.5.5 above regarding multiple migration targets apply to these multiple replicas as well. For example, the client might prefer a writable target on a server that has additional writable replicas to which it subsequently might switch. Note that, as distinguished from the case of replication, there is no need to deal with the case of propagation of updates made by the current client, since the current client has not accessed the file system in question.

Use of multi-server namespaces is enabled by NFSv4.1 but is not required. The use of multi-server namespaces and their scope will depend on the applications used and system administration preferences.

Multi-server namespaces can be established by a single server providing a large set of pure referrals to all of the included file systems. Alternatively, a single multi-server namespace may be administratively segmented with separate referral file systems (on separate servers) for each separately administered portion of the namespace. The top-level referral file system or any segment may use replicated referral file systems for higher availability.

Multi-server namespaces are for the most part uniform, in that the same data made available to one client at a given location in the namespace is made available to all clients at that location. However, as described above, there are facilities provided that allow different clients to be directed to different sets of data, to enable adaptation to such client characteristics as CPU architecture.

4.5.7.
New section to be added after Section 11.4.3 of [RFC5661] to be entitled "Changes in a File System Location Attribute"

Although clients will typically fetch a file system location attribute when first accessing a file system and when NFS4ERR_MOVED is returned, a client can choose to fetch the attribute periodically, in which case the value fetched may change over time.

For clients not prepared to access multiple replicas simultaneously (see Section 8.1 of the current document), the handling of the various cases of change is as follows:

o  Changes in the list of replicas or in the network addresses associated with replicas do not require immediate action. The client will typically update its list of replicas to reflect the new information.

o  Additions to the list of network addresses for the current file system instance need not be acted on promptly. However, the client can choose to use the new address whenever it needs to switch access to a new replica.

o  Deletions from the list of network addresses for the current file system instance need not be acted on immediately, although the client might need to be prepared for a shift in access whenever the server indicates that a network access path is not usable to access the current file system, by returning NFS4ERR_MOVED.

For clients that are prepared to access several replicas simultaneously, the following additional cases need to be addressed. As in the cases discussed above, changes in the set of replicas need not be acted upon promptly, although the client has the option of adjusting its access even in the absence of difficulties that would lead to a new replica being selected.

o  When a new replica is added which may be accessed simultaneously with one currently in use, the client is free to use the new replica immediately.
o  When a replica currently in use is deleted from the list, the client need not cease using it immediately. However, since the server may subsequently force such use to cease (by returning NFS4ERR_MOVED), clients might decide to limit the need for later state transfer. For example, new opens might be done on other replicas, rather than on one not present in the list.

5. Re-organization of Section 11.7 of [RFC5661]

The material in Section 11.7 of [RFC5661] has been reorganized and augmented as specified below:

o  Because there can be a shift of the network access paths used to access a file system instance without any shift between replicas, a new Section 6 in the current document distinguishes between those cases in which there is a shift between distinct replicas and those involving a shift in network access paths with no shift between replicas.

   As a result, a new Section 7 in the current document deals with network address transitions while the bulk of the former Section 11.7 (in [RFC5661]) is replaced by Section 8 in the current document which is now limited to cases in which there is a shift between two different sets of replicas.

o  The additional Section 9 in the current document discusses the case in which a shift to a different replica is made and state is transferred to allow the client to have continued access to the accumulated locking state on the new server.

o  The additional Section 10 in the current document discusses the client's response to access transitions and how it determines whether migration has occurred, and how it gets access to any transferred locking and session state.

o  The additional Section 11 in the current document discusses the responsibilities of the source and destination servers when transferring locking and session state.

6.
New section to be added after Section 11.6 of [RFC5661] to be entitled "Overview of File Access Transitions"

File access transitions are of two types:

o  Those that involve a transition from accessing the current replica to another one in connection with either replication or migration. How these are dealt with is discussed in Section 8 of the current document.

o  Those in which access to the current file system instance is retained, while the network path used to access that instance is changed. This case is discussed in Section 7 of the current document.

7. New section to be added second after Section 11.6 of [RFC5661] to be entitled "Effecting Network Endpoint Transitions"

The endpoints used to access a particular file system instance may change in a number of ways, as listed below. In each of these cases, the same filehandles, stateids, client IDs and session are used to continue access, with a continuity of lock state.

o  When use of a particular address is to cease and there is also one currently in use which is server-trunkable with it, requests that would have been issued on the address whose use is to be discontinued can be issued on the remaining address(es). When an address is not a session-trunkable one, the request might need to be modified to reflect the fact that a different session will be used.

o  When use of a particular connection is to cease, as indicated by receiving NFS4ERR_MOVED when using that connection but that address is still indicated as accessible according to the appropriate file system location entries, it is likely that requests can be issued on a new connection of a different connection type, once that connection is established.
   Since any two server endpoints that share a network address are inherently session-trunkable, the client can use BIND_CONN_TO_SESSION to access the existing session using the new connection and proceed to access the file system using the new connection.

o  When there are no potential replacement addresses in use but there are valid addresses session-trunkable with the one whose use is to be discontinued, the client can use BIND_CONN_TO_SESSION to access the existing session using the new address. Although the target session will generally be accessible, there may be cases in which that session is no longer accessible, in which case a new session can be created to provide the client continued access to the existing instance.

o  When there is no potential replacement address in use and there are no valid addresses session-trunkable with the one whose use is to be discontinued, other server-trunkable addresses may be used to provide continued access. Although use of CREATE_SESSION is available to provide continued access to the existing instance, servers have the option of providing continued access to the existing session through the new network access path in a fashion similar to that provided by session migration (see Section 9 of the current document). To take advantage of this possibility, clients can perform an initial BIND_CONN_TO_SESSION, as in the previous case, and use CREATE_SESSION only if that fails.

8. Updated Section 11.7 of [RFC5661] entitled "Effecting File System Transitions"

There is a range of situations in which there is a change to be effected in the set of replicas used to access a particular file system. Some of these may involve an expansion or contraction of the set of replicas used as discussed in Section 8.1 below.
For reasons explained in that section, most transitions will involve a transition from a single replica to a corresponding replacement replica. When effecting replica transition, some types of sharing between the replicas may affect handling of the transition as described in Sections 8.2 through 8.8 below. The attribute fs_locations_info provides helpful information to allow the client to determine the degree of inter-replica sharing.

With regard to some types of state, the degree of continuity across the transition depends on the occasion prompting the transition, with transitions initiated by the servers (i.e., migration) offering much more scope for a non-disruptive transition than cases in which the client on its own shifts its access to another replica (i.e., replication). This issue potentially applies to locking state and to session state, which are dealt with below as follows:

o  An introduction to the possible means of providing continuity of these areas appears in Section 8.9 below.

o  Transparent State Migration is introduced in Section 9 of the current document. The possible transfer of session state is addressed there as well.

o  The client handling of transitions, including determining how to deal with the various means that the server might take to supply effective continuity of locking state, is discussed in Section 10 of the current document.

o  The servers' (source and destination) responsibilities in effecting Transparent Migration of locking and session state are discussed in Section 11 of the current document.

8.1.
Updated Section 11.7.1 of [RFC5661] entitled "File System 1190 Transitions and Simultaneous Access" 1192 The fs_locations_info attribute (described in Section 11.10.1 of 1193 [RFC5661] and Section 12.2 of this document) may indicate that two 1194 replicas may be used simultaneously (see Section 11.7.2.1 of 1195 [RFC5661] for details). Although situations in which multiple 1196 replicas may be accessed simultaneously are somewhat similar to those 1197 in which a single replica is accessed by multiple network addresses, 1198 there are important differences, since locking state is not shared 1199 among multiple replicas. 1201 Because of this difference in state handling, many clients will not 1202 have the ability to take advantage of the fact that such replicas 1203 represent the same data. Such clients will not be prepared to use 1204 multiple replicas simultaneously but will access each file system 1205 using only a single replica, although the replica selected might make 1206 multiple server-trunkable addresses available. 1208 Clients that are prepared to use multiple replicas simultaneously will 1209 divide opens among replicas however they choose. Once that choice is 1210 made, any subsequent transitions will treat the set of locking state 1211 associated with each replica as a single entity. 1213 For example, if one of the replicas becomes unavailable, access will 1214 be transferred to a different replica, also capable of simultaneous 1215 access with the one still in use. 1217 When there is no such replica, the transition may be to the replica 1218 already in use. At this point, the client has a choice between 1219 merging the locking state for the two replicas under the aegis of the 1220 sole replica in use or treating these separately, until another 1221 replica capable of simultaneous access presents itself. 1223 8.2.
Updated Section 11.7.3 of [RFC5661] entitled "Filehandles and File 1224 System Transitions" 1226 There are a number of ways in which filehandles can be handled across 1227 a file system transition. These can be divided into two broad 1228 classes depending upon whether the two file systems across which the 1229 transition happens share sufficient state to effect some sort of 1230 continuity of file system handling. 1232 When there is no such cooperation in filehandle assignment, the two 1233 file systems are reported as being in different handle classes. In 1234 this case, all filehandles are assumed to expire as part of the file 1235 system transition. Note that this behavior does not depend on the 1236 fh_expire_type attribute and supersedes the specification of the 1237 FH4_VOL_MIGRATION bit, which only affects behavior when 1238 fs_locations_info is not available. 1240 When there is cooperation in filehandle assignment, the two file 1241 systems are reported as being in the same handle class. In this 1242 case, persistent filehandles remain valid after the file system 1243 transition, while volatile filehandles (excluding those that are only 1244 volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration 1245 on the target server. 1247 8.3. Updated Section 11.7.4 of [RFC5661] entitled "Fileids and File 1248 System Transitions" 1250 In NFSv4.0, the issue of continuity of fileids in the event of a file 1251 system transition was not addressed. The general expectation had 1252 been that in situations in which the two file system instances are 1253 created by a single vendor using some sort of file system image copy, 1254 fileids would be consistent across the transition, while in the 1255 analogous multi-vendor transitions they would not. This poses 1256 difficulties, especially for the client without special knowledge of 1257 the transition mechanisms adopted by the server.
Note that although 1258 fileid is not a REQUIRED attribute, many servers support fileids and 1259 many clients provide APIs that depend on fileids. 1261 It is important to note that while clients themselves may have no 1262 trouble with a fileid changing as a result of a file system 1263 transition event, applications do typically have access to the fileid 1264 (e.g., via stat). The result is that an application may work 1265 perfectly well if there is no file system instance transition or if 1266 any such transition is among instances created by a single vendor, 1267 yet be unable to deal with the situation in which a multi-vendor 1268 transition occurs at the wrong time. 1270 Providing the same fileids in a multi-vendor (multiple server 1271 vendors) environment has generally been held to be quite difficult. 1272 While there is work to be done, it needs to be pointed out that this 1273 difficulty is partly self-imposed. Servers have typically identified 1274 fileid with inode number, i.e. with a quantity used to find the file 1275 in question. This identification poses special difficulties for 1276 migration of a file system between vendors where assigning the same 1277 index to a given file may not be possible. Note here that a fileid 1278 is not required to be useful to find the file in question, only that 1279 it is unique within the given file system. Servers prepared to 1280 accept a fileid as a single piece of metadata and store it apart from 1281 the value used to index the file information can relatively easily 1282 maintain a fileid value across a migration event, allowing a truly 1283 transparent migration event. 1285 In any case, where servers can provide continuity of fileids, they 1286 should, and the client should be able to find out that such 1287 continuity is available and take appropriate action. 
Information 1288 about the continuity (or lack thereof) of fileids across a file 1289 system transition is represented by specifying whether the file 1290 systems in question are of the same fileid class. 1292 Note that when consistent fileids do not exist across a transition 1293 (either because there is no continuity of fileids or because fileid 1294 is not a supported attribute on one of the instances involved), and there 1295 are no reliable filehandles across a transition event (either because 1296 there is no filehandle continuity or because the filehandles are 1297 volatile), the client is in a position where it cannot verify that 1298 files it was accessing before the transition are the same objects. 1299 It is forced to assume that no object has been renamed, and, unless 1300 there are guarantees that provide this (e.g., the file system is 1301 read-only), problems for applications may occur. Therefore, use of 1302 such configurations should be limited to situations where the 1303 problems that this may cause can be tolerated. 1305 8.4. Updated Section 11.7.5 of [RFC5661] entitled "Fsids and File 1306 System Transitions" 1308 Since fsids are generally only unique on a per-server basis, it is 1309 likely that they will change during a file system transition. 1310 Clients should not make the fsids received from the server visible to 1311 applications since they may not be globally unique, and because they 1312 may change during a file system transition event. Applications are 1313 best served if they are isolated from such transitions to the extent 1314 possible. 1316 Although normally a single source file system will transition to a 1317 single target file system, there is a provision for splitting a 1318 single source file system into multiple target file systems, by 1319 specifying the FSLI4F_MULTI_FS flag. 1321 8.4.1.
Updated Section 11.7.5.1 of [RFC5661] entitled "File System 1322 Splitting" 1324 When a file system transition is made and the fs_locations_info 1325 indicates that the file system in question might be split into 1326 multiple file systems (via the FSLI4F_MULTI_FS flag), the client 1327 SHOULD do GETATTRs to determine the fsid attribute on all known 1328 objects within the file system undergoing transition to determine the 1329 new file system boundaries. 1331 Clients might choose to maintain the fsids passed to existing 1332 applications by mapping all of the fsids for the descendant file 1333 systems to the common fsid used for the original file system. 1335 Splitting a file system can be done on a transition between file 1336 systems of the same fileid class, since the fact that fileids are 1337 unique within the source file system ensures they will be unique in 1338 each of the target file systems. 1340 8.5. Updated Section 11.7.6 of [RFC5661] entitled "The Change Attribute 1341 and File System Transitions" 1343 Since the change attribute is defined as a server-specific one, 1344 change attributes fetched from one server are normally presumed to be 1345 invalid on another server. Such a presumption is troublesome since 1346 it would invalidate all cached change attributes, requiring 1347 refetching. Even more disruptive, the absence of any assured 1348 continuity for the change attribute means that even if the same value 1349 is retrieved on refetch, no conclusions can be drawn as to whether 1350 the object in question has changed. The identical change attribute 1351 could be merely an artifact of a modified file with a different 1352 change attribute construction algorithm, with that new algorithm just 1353 happening to result in an identical change value.
1355 When the two file systems have consistent change attribute formats, 1356 and this fact is communicated to the client by reporting them in the same 1357 change class, the client may assume a continuity of change attribute 1358 construction and handle this situation just as it would be handled 1359 without any file system transition. 1361 8.6. Updated Section 11.7.8 of [RFC5661] entitled "Write Verifiers and 1362 File System Transitions" 1364 In a file system transition, the two file systems might be clustered 1365 in the handling of unstably written data. When this is the case, and 1366 the two file systems belong to the same write-verifier class, write 1367 verifiers returned from one system may be compared to those returned 1368 by the other and superfluous writes avoided. 1370 When two file systems belong to different write-verifier classes, any 1371 verifier generated by one must not be compared to one provided by the 1372 other. Instead, the two verifiers should be treated as not equal 1373 even when the values are identical. 1375 8.7. Updated Section 11.7.9 of [RFC5661] entitled "Readdir Cookies and 1376 Verifiers and File System Transitions" 1378 In a file system transition, the two file systems might be consistent 1379 in their handling of READDIR cookies and verifiers. When this is the 1380 case, and the two file systems belong to the same readdir class, 1381 READDIR cookies and verifiers from one system may be recognized by 1382 the other and READDIR operations started on one server may be validly 1383 continued on the other, simply by presenting the cookie and verifier 1384 returned by a READDIR operation done on the first file system to the 1385 second. 1387 When two file systems belong to different readdir classes, any 1388 READDIR cookie and verifier generated by one is not valid on the 1389 second, and must not be presented to that server by the client. The 1390 client should act as if the verifier was rejected. 1392 8.8.
Updated Section 11.7.10 of [RFC5661] entitled "File System Data 1393 and File System Transitions" 1395 When multiple replicas exist and are used simultaneously or in 1396 succession by a client, applications using them will normally expect 1397 that they contain either the same data or data that is consistent 1398 with the normal sorts of changes that are made by other clients 1399 updating the data of the file system (with metadata being the same to 1400 the degree indicated by the fs_locations_info attribute). However, 1401 when multiple file systems are presented as replicas of one another, 1402 the precise relationship between the data of one and the data of 1403 another is not, as a general matter, specified by the NFSv4.1 1404 protocol. It is quite possible to present as replicas file systems 1405 where the data of those file systems is sufficiently different that 1406 some applications have problems dealing with the transition between 1407 replicas. The namespace will typically be constructed so that 1408 applications can choose an appropriate level of support, so that in 1409 one position in the namespace a varied set of replicas will be 1410 listed, while in another only those that are up-to-date may be 1411 considered replicas. The protocol does define three special cases of 1412 the relationship among replicas to be specified by the server and 1413 relied upon by clients: 1415 o When multiple replicas exist and are used simultaneously by a 1416 client (see the FSLIB4_CLSIMUL definition within 1417 fs_locations_info), they must designate the same data. Where file 1418 systems are writable, a change made on one instance must be 1419 visible on all instances, immediately upon the earlier of the 1420 return of the modifying requester or the visibility of that change 1421 on any of the associated replicas. 
This allows a client to use 1422 these replicas simultaneously without any special adaptation to 1423 the fact that there are multiple replicas, beyond adapting to the 1424 fact that locks obtained on one replica are maintained separately 1425 (i.e. under a different client ID). In this case, locks (whether 1426 share reservations or byte-range locks) and delegations obtained 1427 on one replica are immediately reflected on all replicas, in the 1428 sense that access from all other servers is prevented regardless 1429 of the replica used. However, because the servers are not 1430 required to treat two associated client IDs as representing the 1431 same client, it is best to access each file using only a single 1432 client ID. 1434 o When one replica is designated as the successor instance to 1435 another existing instance after returning NFS4ERR_MOVED (i.e., the 1436 case of migration), the client may depend on the fact that all 1437 changes written to stable storage on the original instance are 1438 written to stable storage of the successor (uncommitted writes are 1439 dealt with in Section 8.6 above). 1441 o Where a file system is not writable but represents a read-only 1442 copy (possibly periodically updated) of a writable file system, 1443 clients have similar requirements with regard to the propagation 1444 of updates. They may need a guarantee that any change visible on 1445 the original file system instance must be immediately visible on 1446 any replica before the client transitions access to that replica, 1447 in order to avoid any possibility that a client, in effecting a 1448 transition to a replica, will see any reversion in file system 1449 state. The specific means of this guarantee varies based on the 1450 value of the fss_type field that is reported as part of the 1451 fs_status attribute (see Section 11.11 of [RFC5661]).
Since these 1452 file systems are presumed to be unsuitable for simultaneous use, 1453 there is no specification of how locking is handled; in general, 1454 locks obtained on one file system will be separate from those on 1455 others. Since these are expected to be read-only file systems, 1456 this is not likely to pose an issue for clients or applications. 1458 8.9. Updated Section 11.7.7 of [RFC5661] entitled "Lock State and File 1459 System Transitions" 1461 While accessing a file system, clients obtain locks enforced by the 1462 server which may prevent actions by other clients that are 1463 inconsistent with those locks. 1465 When access is transferred between replicas, clients need to be 1466 assured that the actions disallowed by holding these locks cannot 1467 have occurred during the transition. This can be ensured by the 1468 methods below. Unless at least one of these is implemented, clients 1469 will not be assured of continuity of lock possession across a 1470 migration event. 1472 o Providing the client an opportunity to re-obtain its locks via a 1473 per-fs grace period on the destination server. Because the lock 1474 reclaim mechanism was originally defined to support server reboot, 1475 it implicitly assumes that filehandles on reclaim will be 1476 the same as those at open. In the case of migration, this 1477 requires that source and destination servers use the same 1478 filehandles, as evidenced by using the same server scope (see 1479 Section 13.2 of the current document) or by showing this agreement 1480 using fs_locations_info (see Section 8.2 above). 1482 o Locking state can be transferred as part of the transition by 1483 providing Transparent State Migration as described in Section 9 of 1484 the current document. 1486 Of these, Transparent State Migration provides the smoother 1487 experience for clients in that there is no grace-period-based delay 1488 before new locks can be obtained.
However, it requires a greater 1489 degree of inter-server co-ordination. In general, the servers taking 1490 part in migration are free to provide either facility. However, when 1491 the filehandles can differ across the migration event, Transparent 1492 State Migration is the only available means of providing the needed 1493 functionality. 1495 It should be noted that these two methods are not mutually exclusive 1496 and that a server might well provide both. In particular, if there 1497 is some circumstance preventing a specific lock from being 1498 transferred transparently, the destination server can allow it to be 1499 reclaimed, by implementing a per-fs grace period for the migrated 1500 file system. 1502 9. New section to be added after Section 11.11 of [RFC5661] to be 1503 entitled "Transferring State upon Migration" 1505 When the transition is a result of a server-initiated decision to 1506 transition access and the source and destination servers have 1507 implemented appropriate co-operation, it is possible to: 1509 o Transfer locking state from the source to the destination server, 1510 in a fashion similar to that provided by Transparent State 1511 Migration in NFSv4.0, as described in [RFC7931]. Server 1512 responsibilities are described in Section 11.2 of the current 1513 document. 1515 o Transfer session state from the source to the destination server. 1516 Server responsibilities in effecting such a transfer are described 1517 in Section 11.3 of the current document. 1519 The means by which the client determines which of these transfer 1520 events has occurred are described in Section 10 of the current 1521 document. 1523 9.1. Only sub-section within new section to be added to [RFC5661] to be 1524 entitled "Transparent State Migration and pNFS" 1526 When pNFS is involved, the protocol is capable of supporting: 1528 o Migration of the Metadata Server (MDS), leaving the Data Servers 1529 (DS's) in place. 
1531 o Migration of the file system as a whole, including the MDS and 1532 associated DS's. 1534 o Replacement of one DS by another. 1536 o Migration of a pNFS file system to one in which pNFS is not used. 1538 o Migration of a file system not using pNFS to one in which layouts 1539 are available. 1541 Note that migration per se is only involved in the transfer of the 1542 MDS function. Although the servicing of a layout may be transferred 1543 from one data server to another, this is not done using the file system 1544 location attributes. The MDS can effect such transfers by recalling/ 1545 revoking existing layouts and granting new ones on a different data 1546 server. 1548 Migration of the MDS function is directly supported by Transparent 1549 State Migration. Layout state will normally be transparently 1550 transferred, just as other state is. As a result, Transparent State 1551 Migration provides a framework in which, given appropriate inter-MDS 1552 data transfer, one MDS can be substituted for another. 1554 Migration of the file system function as a whole can be accomplished 1555 by recalling all layouts as part of the initial phase of the 1556 migration process. As a result, IO will be done through the MDS 1557 during the migration process, and new layouts can be granted once the 1558 client is interacting with the new MDS. An MDS can also effect this 1559 sort of transition by revoking all layouts as part of Transparent 1560 State Migration, as long as the client is notified about the loss of 1561 locking state. 1563 In order to allow migration to a file system on which pNFS is not 1564 supported, clients need to be prepared for a situation in which 1565 layouts are not available or supported on the destination file system 1566 and so direct IO requests to the destination server, rather than 1567 depending on layouts being available.
1569 Replacement of one DS by another is not addressed by migration as 1570 such but can be effected by an MDS recalling layouts for the DS to be 1571 replaced and issuing new ones to be served by the successor DS. 1573 Migration may transfer a file system from a server which does not 1574 support pNFS to one which does. In order to properly adapt to this 1575 situation, clients that support pNFS but function adequately in its 1576 absence should check for pNFS support when a file system is migrated 1577 and be prepared to use pNFS when support is available on the 1578 destination. 1580 10. New section to be added second after Section 11.11 of [RFC5661] to 1581 be entitled "Client Responsibilities when Access is Transitioned" 1583 For a client to respond to an access transition, it must become aware 1584 of it. The ways in which this can happen are discussed in 1585 Section 10.1, which discusses indications that a specific file system 1586 access path has transitioned as well as situations in which 1587 additional activity is necessary to determine the set of file systems 1588 that have been migrated. Section 10.2 goes on to complete the 1589 discussion of how the set of migrated file systems might be 1590 determined. Sections 10.3 through 10.5 discuss how the client should 1591 deal with each transition it becomes aware of, either directly or as 1592 a result of migration discovery. 1594 The following terms are used to describe client activities: 1596 o "Transition recovery" refers to the process of restoring access to 1597 a file system on which NFS4ERR_MOVED was received. 1599 o "Migration recovery" refers to that subset of transition recovery which 1600 applies when the file system has migrated to a different replica. 1602 o "Migration discovery" refers to the process of determining which 1603 file system(s) have been migrated.
It is necessary to avoid a 1604 situation in which leases could expire when a file system is not 1605 accessed for a long period of time, since a client unaware of the 1606 migration might be referencing an unmigrated file system and not 1607 renewing the lease associated with the migrated file system. 1609 10.1. First sub-section within new section to be added to [RFC5661] to 1610 be entitled "Client Transition Notifications" 1612 When there is a change in the network access path which a client is 1613 to use to access a file system, there are a number of related status 1614 indications with which clients need to deal: 1616 o If an attempt is made to use or return a filehandle within a file 1617 system that is no longer accessible at the address previously used 1618 to access it, the error NFS4ERR_MOVED is returned. 1620 Exceptions are made to allow such filehandles to be used when 1621 interrogating a file system location attribute. This enables a 1622 client to determine a new replica's location or a new network 1623 access path. 1625 This condition continues on subsequent attempts to access the file 1626 system in question. The only way the client can avoid the error 1627 is to cease accessing the file system in question at its old server 1628 location and access it instead using a different address at which 1629 it is now available. 1631 o Whenever a SEQUENCE operation is sent by a client to a server 1632 which generated state held on that client which is associated with 1633 a file system that is no longer accessible on the server at which 1634 it was previously available, a lease-migrated indication, in the 1635 form of the SEQ4_STATUS_LEASE_MOVED status bit being set, appears in 1636 the response. 1638 This condition continues until the client acknowledges the 1639 notification by fetching a file system location attribute for the 1640 file system whose network access path is being changed.
When 1641 there are multiple such file systems, a location attribute for 1642 each such file system needs to be fetched. The location attributes 1643 for all migrated file systems need to be fetched in order to clear 1644 the condition. Even after the condition is cleared, the client 1645 needs to respond by using the location information to access the 1646 file system at its new location to ensure that leases are not 1647 needlessly expired. 1649 Unlike the case of NFSv4.0, in which the corresponding conditions are 1650 both errors and thus mutually exclusive, in NFSv4.1 the client can, 1651 and often will, receive both indications on the same request. As a 1652 result, implementations need to address the question of how to co- 1653 ordinate the necessary recovery actions when both indications arrive 1654 in the response to the same request. It should be noted that when 1655 processing an NFSv4 COMPOUND, the server will normally decide whether 1656 SEQ4_STATUS_LEASE_MOVED is to be set before it determines which file 1657 system will be referenced or whether NFS4ERR_MOVED is to be returned. 1659 Since these indications are not mutually exclusive in NFSv4.1, the 1660 following combinations are possible results when a COMPOUND is 1661 issued: 1663 o The COMPOUND status is NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED 1664 is asserted. 1666 In this case, transition recovery is required. While it is 1667 possible that migration discovery is needed in addition, it is 1668 likely that only the accessed file system has transitioned. In 1669 any case, because addressing NFS4ERR_MOVED is necessary to allow 1670 the rejected requests to be processed on the target, dealing with 1671 it will typically have priority over migration discovery. 1673 o The COMPOUND status is NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED 1674 is clear. 1676 In this case, transition recovery is also required.
It is clear 1677 that migration discovery is not needed to find file systems that 1678 have been migrated other than the one returning NFS4ERR_MOVED. 1679 Cases in which this result can arise include a referral or a 1680 migration for which there is no associated locking state. This 1681 can also arise in cases in which an access path transition other 1682 than migration occurs within the same server. In such a case, 1683 there is no need to set SEQ4_STATUS_LEASE_MOVED, since the lease 1684 remains associated with the current server even though the access 1685 path has changed. 1687 o The COMPOUND status is not NFS4ERR_MOVED and 1688 SEQ4_STATUS_LEASE_MOVED is asserted. 1690 In this case, no transition recovery activity is required on the 1691 file system(s) accessed by the request. However, to prevent 1692 avoidable lease expiration, migration discovery needs to be done. 1694 o The COMPOUND status is not NFS4ERR_MOVED and 1695 SEQ4_STATUS_LEASE_MOVED is clear. 1697 In this case, neither transition-related activity nor migration 1698 discovery is required. 1700 Note that the specified actions only need to be taken if they are not 1701 already going on. For example, when NFS4ERR_MOVED is received when 1702 accessing a file system for which transition recovery is already going 1703 on, the client merely waits for that recovery to be completed, while 1704 the receipt of a SEQ4_STATUS_LEASE_MOVED indication only needs to 1705 initiate migration discovery for a server if it is not going on for 1706 that server. 1708 The fact that a lease-migrated condition does not result in an error 1709 in NFSv4.1 has a number of important consequences. In addition to 1710 the fact, discussed above, that the two indications are not mutually 1711 exclusive, there are a number of issues that are important in 1712 considering implementation of migration discovery, as discussed in 1713 Section 10.2.
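The four combinations described above can be sketched as a small decision helper. This is only an illustrative sketch and not part of the protocol: the function name, the action strings, and the two "already running" flags are hypothetical. The numeric values of NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED are those given in the NFSv4.1 XDR description.

```python
# Constant values as given in the NFSv4.1 XDR description.
NFS4_OK = 0
NFS4ERR_MOVED = 10019
SEQ4_STATUS_LEASE_MOVED = 0x00000080


def actions_for_response(compound_status, sr_status_flags,
                         recovery_running=False, discovery_running=False):
    """Return the recovery actions to start for one COMPOUND response.

    Hypothetical helper: the action names and the two "*_running"
    flags are illustrative assumptions, not protocol elements.
    """
    actions = []
    # NFS4ERR_MOVED is listed first because the rejected requests cannot
    # be processed on the target until transition recovery is done.
    if compound_status == NFS4ERR_MOVED and not recovery_running:
        actions.append("transition_recovery")
    # The lease-migrated bit is independent of the COMPOUND status, so
    # it is tested separately rather than in an else branch.
    if (sr_status_flags & SEQ4_STATUS_LEASE_MOVED) and not discovery_running:
        actions.append("migration_discovery")
    return actions
```

Testing the two indications independently, rather than treating them as alternatives, reflects the point made above that in NFSv4.1 both can arrive in the response to the same request. An action is skipped when it is already in progress.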
1715 Because of the absence of NFS4ERR_LEASE_MOVED, it is possible for 1716 file systems whose access path has not changed to be successfully 1717 accessed on a given server even though recovery is necessary for 1718 other file systems on the same server. As a result, access can go on 1719 while: 1721 o The migration discovery process is going on for that server. 1723 o The transition recovery process is going on for other file 1724 systems connected to that server. 1726 10.2. Second sub-section within new section to be added to [RFC5661] to 1727 be entitled "Performing Migration Discovery" 1729 Migration discovery can be performed in the same context as 1730 transition recovery, allowing recovery for each migrated file system 1731 to be invoked as it is discovered. Alternatively, it may be done in 1732 a separate migration discovery thread, allowing migration discovery 1733 to be done in parallel with one or more instances of transition 1734 recovery. 1736 In either case, because the lease-migrated indication does not result 1737 in an error, other access to file systems on the server can proceed 1738 normally, with the possibility that further such indications will be 1739 received, raising the issue of how such indications are to be dealt 1740 with. In general, 1742 o No action needs to be taken for such indications received by 1743 those performing migration discovery, since continuation of that 1744 work will address the issue. 1746 o In other cases in which migration discovery is currently being 1747 performed, nothing further needs to be done to respond to such 1748 lease migration indications, as long as one can be certain that 1749 the migration discovery process would deal with those indications. 1750 See below for details.
1752 o For such indications received in all other contexts, the 1753 appropriate response is to initiate or otherwise provide for the 1754 execution of migration discovery for file systems associated with 1755 the server IP address returning the indication. 1757 This leaves a potential difficulty in situations in which the 1758 migration discovery process is near to completion but is still 1759 operating. One should not ignore a LEASE_MOVED indication if the 1760 migration discovery process is not able to respond to the discovery 1761 of additional migrating file systems without additional aid. A 1762 further complexity relevant in addressing such situations is that a 1763 lease-migrated indication may reflect the server's state at the time 1764 the SEQUENCE operation was processed, which may be different from 1765 that in effect at the time the response is received. Because new 1766 migration events may occur at any time, and because a LEASE_MOVED 1767 indication may reflect the situation in effect a considerable time 1768 before the indication is received, special care needs to be taken to 1769 ensure that LEASE_MOVED indications are not inappropriately ignored. 1771 A useful approach to this issue involves the use of separate 1772 externally-visible migration discovery states for each server. 1773 Separate values could represent the various possible states for the 1774 migration discovery process for a server: 1776 o non-operation, in which migration discovery is not being performed. 1778 o normal operation, in which there is an ongoing scan for migrated 1779 file systems. 1781 o completion/verification of migration discovery processing, in 1782 which the possible completion of migration discovery processing 1783 needs to be verified. 1785 Given that framework, migration discovery processing would proceed as 1786 follows.
1788 o While in the normal-operation state, the thread performing 1789 discovery would fetch, for successive file systems known to the 1790 client on the server being worked on, a file system location 1791 attribute plus the fs_status attribute. 1793 o If the fs_status attribute indicates that the file system is a 1794 migrated one (i.e. fss_absent is true and fss_type != 1795 STATUS4_REFERRAL), then it is likely that the fetch of the 1796 file system location attribute has cleared one of the file systems 1797 contributing to the lease-migrated indication. 1799 o In cases in which that happened, the thread cannot know whether 1800 the lease-migrated indication has been cleared and so it enters 1801 the completion/verification state and proceeds to issue a COMPOUND 1802 to see if the LEASE_MOVED indication has been cleared. 1804 o When the discovery process is in the completion/verification 1805 state, if other requests get a lease-migrated indication, they note 1806 that it was received, and the existence of such indications is used 1807 when the verification request completes, as described below. 1809 When the request used in the completion/verification state completes: 1811 o If a lease-migrated indication is returned, the discovery 1812 continues normally. Note that this is so even if all file systems 1813 have been traversed, since new migrations could have occurred while the 1814 process was going on. 1816 o Otherwise, if there is any record that other requests saw a lease- 1817 migrated indication while the request was going on, that record is 1818 cleared and the verification request retried. The discovery 1819 process remains in completion/verification state. 1821 o If there have been no lease-migrated indications, the work of 1822 migration discovery is considered completed and it enters the non- 1823 operating state. Once it enters this state, a subsequent lease- 1824 migrated indication will trigger a new migration discovery 1825 process.
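The per-server discovery states and the transitions listed above can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not a definitive implementation: the class, method, and attribute names are hypothetical, no actual NFS requests are issued, and the caller is assumed to report events (a migrated file system found by the scan, a lease-migrated indication seen by some other request, completion of the verification COMPOUND).

```python
from enum import Enum


class DiscoveryState(Enum):
    NON_OPERATING = "non-operating"
    NORMAL = "normal-operation"
    VERIFYING = "completion/verification"


class MigrationDiscovery:
    """Hypothetical per-server tracker for migration discovery state."""

    def __init__(self):
        self.state = DiscoveryState.NON_OPERATING
        # Record of lease-migrated indications seen by other requests
        # while a verification request is outstanding.
        self.seen_during_verify = False

    def on_lease_moved_seen(self):
        """Another request's response carried the lease-migrated bit."""
        if self.state is DiscoveryState.NON_OPERATING:
            self.state = DiscoveryState.NORMAL  # start a new scan
        elif self.state is DiscoveryState.VERIFYING:
            self.seen_during_verify = True  # consulted on completion
        # In NORMAL state the ongoing scan is assumed to cover it.

    def on_migrated_fs_found(self):
        """The scan fetched location attributes of a migrated file system."""
        if self.state is DiscoveryState.NORMAL:
            self.state = DiscoveryState.VERIFYING
            self.seen_during_verify = False

    def on_verify_complete(self, lease_moved):
        """Handle the verification response; True means retry it."""
        if self.state is not DiscoveryState.VERIFYING:
            raise RuntimeError("no verification request outstanding")
        if lease_moved:
            self.state = DiscoveryState.NORMAL  # keep scanning
            self.seen_during_verify = False
            return False
        if self.seen_during_verify:
            self.seen_during_verify = False  # clear record and retry
            return True
        self.state = DiscoveryState.NON_OPERATING
        return False
```

Keeping the record of indications separate from the state value is what lets the verification step distinguish "cleared" from "cleared, but another request saw the bit in the meantime", which is the case that forces a retry.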
1827 It should be noted that the process described above is not guaranteed 1828 to terminate, as a long series of new migration events might 1829 continually delay the clearing of the LEASE_MOVED indication. To 1830 prevent unnecessary lease expiration, it is appropriate for clients 1831 to use the discovery of migrations to effect lease renewal 1832 immediately, rather than waiting for clearing of the LEASE_MOVED 1833 indication when the complete set of migrations is available. 1835 10.3. Third sub-section within new section to be added to [RFC5661] to 1836 be entitled "Overview of Client Response to NFS4ERR_MOVED" 1838 This section outlines a way in which a client that receives 1839 NFS4ERR_MOVED can effect transition recovery by using a new server or 1840 server endpoint if one is available. As part of that process, it 1841 will determine: 1843 o Whether the NFS4ERR_MOVED indicates migration has occurred, or 1844 whether it indicates another sort of file system access transition 1845 as discussed in Section 7 above. 1847 o In the case of migration, whether Transparent State Migration has 1848 occurred. 1850 o Whether any state has been lost during the process of Transparent 1851 State Migration. 1853 o Whether sessions have been transferred as part of Transparent 1854 State Migration. 1856 During the first phase of this process, the client proceeds to 1857 examine file system location entries to find the initial network 1858 address it will use to continue access to the file system or its 1859 replacement. For each location entry that the client examines, the 1860 process consists of five steps: 1862 1. Performing an EXCHANGE_ID directed at the location address. 
This 1863 operation is used to register the client-owner with the server, 1864 to obtain a client ID to be used subsequently to communicate with 1865 it, to obtain that client ID's confirmation status, and to 1866 determine server_owner and scope for the purpose of determining 1867 if the entry is trunkable with that previously being used to 1868 access the file system (i.e. that it represents another network 1869 access path to the same file system and can share locking state 1870 with it). 1872 2. Making an initial determination of whether migration has 1873 occurred. The initial determination will be based on whether the 1874 EXCHANGE_ID results indicate that the current location element is 1875 server-trunkable with that used to access the file system when 1876 access was terminated by receiving NFS4ERR_MOVED. If it is, then 1877 migration has not occurred and the transition is dealt with, at 1878 least initially, as one involving continued access to the same 1879 file system on the same server through a new network address. 1881 3. Obtaining access to existing session state or creating new 1882 sessions. How this is done depends on the initial determination 1883 of whether migration has occurred and can be done as described in 1884 Section 10.4 below in the case of migration or as described in 1885 Section 10.5 below in the case of a network address transfer 1886 without migration. 1888 4. Verification of the trunking relationship assumed in step 2 as 1889 discussed in Section 2.10.5.1 of [RFC5661]. Although this step 1890 will generally confirm the initial determination, it is possible 1891 for verification to fail with the result that an initial 1892 determination that a network address shift (without migration) 1893 has occurred may be invalidated and migration determined to have 1894 occurred. There is no need to redo step 3 above, since it will 1895 be possible to continue use of the session established already. 1897 5.
Obtaining access to existing locking state and/or reobtaining it. 1898 How this is done depends on the final determination of whether 1899 migration has occurred and can be done as described below in 1900 Section 10.4 in the case of migration or as described in 1901 Section 10.5 in the case of a network address transfer without 1902 migration. 1904 Once the initial address has been determined, clients are free to 1905 apply an abbreviated process to find additional addresses trunkable 1906 with it (clients may seek session-trunkable or server-trunkable 1907 addresses depending on whether they support clientid trunking). 1908 During this later phase of the process, further location entries are 1909 examined using the abbreviated procedure specified below: 1911 1. Before the EXCHANGE_ID, the fs name of the location entry is 1912 examined and if it does not match that currently being used, the 1913 entry is ignored. Otherwise, one proceeds as specified by step 1 1914 above. 1916 2. In the case that the network address is session-trunkable with 1917 one used previously, a BIND_CONN_TO_SESSION is used to access that 1918 session using the new network address. Otherwise, or if the bind 1919 operation fails, a CREATE_SESSION is done. 1921 3. The verification procedure referred to in step 4 above is used. 1922 However, if it fails, the entry is ignored and the next available 1923 entry is used. 1925 10.4. Fourth sub-section within new section to be added to [RFC5661] to 1926 be entitled "Obtaining Access to Sessions and State after 1927 Migration" 1929 In the event that migration has occurred, migration recovery will 1930 involve determining whether Transparent State Migration has occurred. 1931 This decision is made based on the client ID returned by the 1932 EXCHANGE_ID and the reported confirmation status. 1934 o If the client ID is an unconfirmed client ID not previously known 1935 to the client, then Transparent State Migration has not occurred.
1937 o If the client ID is a confirmed client ID previously known to the 1938 client, then any transferred state would have been merged with an 1939 existing client ID representing the client to the destination 1940 server. In this state merger case, Transparent State Migration 1941 might or might not have occurred and a determination as to whether 1942 it has occurred is deferred until sessions are established and the 1943 client is ready to begin state recovery. 1945 o If the client ID is a confirmed client ID not previously known to 1946 the client, then the client can conclude that the client ID was 1947 transferred as part of Transparent State Migration. In this 1948 transferred client ID case, Transparent State Migration has 1949 occurred although some state might have been lost. 1951 Once the client ID has been obtained, it is necessary to obtain 1952 access to sessions to continue communication with the new server. In 1953 any of the cases in which Transparent State Migration has occurred, 1954 it is possible that a session was transferred as well. To deal with 1955 that possibility, clients can, after doing the EXCHANGE_ID, issue a 1956 BIND_CONN_TO_SESSION to connect the transferred session to a 1957 connection to the new server. If that fails, it is an indication 1958 that the session was not transferred and that a new session needs to 1959 be created to take its place. 1961 In some situations, it is possible for a BIND_CONN_TO_SESSION to 1962 succeed without session migration having occurred. If state merger 1963 has taken place then the associated client ID may have already had a 1964 set of existing sessions, with it being possible that the sessionid 1965 of a given session is the same as one that might have been migrated. 1966 In that event, a BIND_CONN_TO_SESSION might succeed, even though 1967 there could have been no migration of the session with that 1968 sessionid. 
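The three client ID cases distinguished at the start of this section can be sketched as a simple classification of the EXCHANGE_ID results. The function and enumeration names below are hypothetical. The combination of an unconfirmed client ID that is already known to the client is not addressed by the text, so the sketch rejects it rather than guessing.

```python
from enum import Enum


class MigrationStatus(Enum):
    NO_TSM = "Transparent State Migration has not occurred"
    STATE_MERGER = "state merger; determination deferred"
    CLIENT_ID_TRANSFERRED = "client ID transferred; TSM has occurred"


def classify_client_id(confirmed, previously_known):
    """Map EXCHANGE_ID results onto the three cases in the text.

    confirmed: the confirmation status reported for the client ID.
    previously_known: whether the client already held this client ID.
    """
    if confirmed and previously_known:
        return MigrationStatus.STATE_MERGER
    if confirmed and not previously_known:
        return MigrationStatus.CLIENT_ID_TRANSFERRED
    if not confirmed and not previously_known:
        return MigrationStatus.NO_TSM
    # Unconfirmed but previously known: not covered by the text above.
    raise ValueError("combination not addressed by this section")
```

In the STATE_MERGER case the caller would defer the decision until sessions are established, as described above.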
1970 Once the client has determined the initial migration status, and 1971 determined that there was a shift to a new server, it needs to re- 1972 establish its locking state, if possible. To enable this to happen 1973 without loss of the guarantees normally provided by locking, the 1974 destination server needs to implement a per-fs grace period in all 1975 cases in which lock state was lost, including those in which 1976 Transparent State Migration was not implemented. 1978 Clients need to deal with the following cases: 1980 o In the state merger case, it is possible that the server has not 1981 attempted Transparent State Migration, in which case state may 1982 have been lost without it being reflected in the SEQ4_STATUS bits. 1983 To determine whether this has happened, the client can use 1984 TEST_STATEID to check whether the stateids created on the source 1985 server are still accessible on the destination server. Once a 1986 single stateid is found to have been successfully transferred, the 1987 client can conclude that Transparent State Migration was begun and 1988 any failure to transport all of the stateids will be reflected in 1989 the SEQ4_STATUS bits. Otherwise, Transparent State Migration has 1990 not occurred. 1992 o In a case in which Transparent State Migration has not occurred, 1993 the client can use the per-fs grace period provided by the 1994 destination server to reclaim locks that were held on the source 1995 server. 1997 o In a case in which Transparent State Migration has occurred, and 1998 no lock state was lost (as shown by SEQ4_STATUS flags), no lock 1999 reclaim is necessary. 2001 o In a case in which Transparent State Migration has occurred, and 2002 some lock state was lost (as shown by SEQ4_STATUS flags), existing 2003 stateids need to be checked for validity using TEST_STATEID, and 2004 reclaim used to re-establish any that were not transferred.
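As an illustration of the cases above, the following sketch turns the client's conclusions into an ordered list of recovery actions, ending with the RECLAIM_COMPLETE (rca_one_fs set to TRUE) that this section requires in every case. The function names and the action strings are hypothetical; a real client would issue the corresponding operations rather than return text.

```python
from typing import List


def tsm_was_begun(test_stateid_ok: List[bool]) -> bool:
    """State merger case: one successfully transferred stateid is enough
    to conclude that Transparent State Migration was begun."""
    return any(test_stateid_ok)


def plan_lock_recovery(tsm_occurred: bool, state_lost: bool) -> List[str]:
    """Return the recovery actions for one migrated file system.

    state_lost reflects the SEQ4_STATUS flags and is only meaningful
    when tsm_occurred is True.
    """
    steps = []
    if not tsm_occurred:
        steps.append("reclaim locks in per-fs grace period")
    elif state_lost:
        steps.append("TEST_STATEID on existing stateids")
        steps.append("reclaim stateids that were not transferred")
    # Needed even when nothing was lost or reclaimed.
    steps.append("RECLAIM_COMPLETE (rca_one_fs=TRUE)")
    return steps
```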
2006 For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value 2007 of TRUE needs to be done before normal use of the file system 2008 including obtaining new locks for the file system. This applies even 2009 if no locks were lost and there was no need for any to be reclaimed. 2011 10.5. Fifth sub-section within new section to be added to [RFC5661] to 2012 be entitled "Obtaining Access to Sessions and State after Network 2013 Address Transfer" 2015 The case in which there is a transfer to a new network address 2016 without migration is similar to that described in Section 10.4 above 2017 in that there is a need to obtain access to needed sessions and 2018 locking state. However, the details are simpler and will vary 2019 depending on the type of trunking between the address receiving 2020 NFS4ERR_MOVED and that to which the transfer is to be made. 2022 To make a session available for use, a BIND_CONN_TO_SESSION should be 2023 used to obtain access to the session previously in use. Only if this 2024 fails should a CREATE_SESSION be done. While this procedure mirrors 2025 that in Section 10.4 above, there is an important difference in that 2026 preservation of the session is not purely optional but depends on the 2027 type of trunking. 2029 Access to appropriate locking state should need no actions beyond 2030 access to the session. However, the SEQ4_STATUS bits need to be 2031 checked for lost locking state, including the need to reclaim locks 2032 after a server reboot. 2034 11. New section to be added third after Section 11.11 of [RFC5661] to 2035 be entitled "Server Responsibilities Upon Migration" 2037 In the event of file system migration, when the client connects to 2038 the destination server, that server needs to be able to provide the client 2039 continued access to the files it had open on the source server.
2040 There are two ways to provide this: 2042 o By provision of an fs-specific grace period, allowing the client 2043 the ability to reclaim its locks, in a fashion similar to what 2044 would have been done in the case of recovery from a server 2045 restart. See Section 11.1 for a more complete discussion. 2047 o By implementing Transparent State Migration, possibly in connection 2048 with session migration, the server can provide the client 2049 immediate access to the state built up on the source server, on 2050 the destination. 2052 These features are discussed separately in Sections 11.2 and 11.3, 2053 which discuss Transparent State Migration and session migration 2054 respectively. 2056 All the features described above can involve transfer of lock-related 2057 information between source and destination servers. In some cases 2058 this transfer is a necessary part of the implementation while in 2059 other cases it is a helpful implementation aid which servers might or 2060 might not use. The sub-sections below discuss the information which 2061 would be transferred but do not define the specifics of the transfer 2062 protocol. This is left as an implementation choice although 2063 standards in this area could be developed at a later time. 2065 11.1. First sub-section within new section to be added to [RFC5661] to 2066 be entitled "Server Responsibilities in Effecting State Reclaim 2067 after Migration" 2069 In this case, the destination server need have no knowledge of the locks 2070 held on the source server, but relies on the clients to accurately 2071 report (via reclaim operations) the locks previously held, not 2072 allowing new locks to be granted on the migrated file system until the 2073 grace period expires.
2075 During this grace period clients have the opportunity to use reclaim 2076 operations to obtain locks for file system objects within the 2077 migrated file system, in the same way that they do when recovering 2078 from server restart, and the servers typically rely on clients to 2079 accurately report their locks, although they have the option of 2080 subjecting these requests to verification. If the clients only 2081 reclaim locks held on the source server, no conflict can arise. Once 2082 the client has reclaimed its locks, it indicates the completion of 2083 lock reclamation by performing a RECLAIM_COMPLETE specifying 2084 rca_one_fs as TRUE. 2086 While it is not necessary for source and destination servers to co- 2087 operate to transfer information about locks, implementations are 2088 well-advised to consider transferring the following useful 2089 information: 2091 o If information about the set of clients that have locking state 2092 for the transferred file system is transferred, the destination server will be 2093 able to terminate the grace period once all such clients have 2094 reclaimed their locks, allowing normal locking activity to resume 2095 earlier than it would have otherwise. 2097 o Locking summary information for individual clients (at various 2098 possible levels of detail) can detect some instances in which 2099 clients do not accurately represent the locks held on the source 2100 server. 2102 11.2. Second sub-section within new section to be added to [RFC5661] to 2103 be entitled "Server Responsibilities in Effecting Transparent 2104 State Migration" 2106 The basic responsibility of the source server in effecting 2107 Transparent State Migration is to make available to the destination 2108 server a description of each piece of locking state associated with 2109 the file system being migrated. In addition to client id string and 2110 verifier, the source server needs to provide, for each stateid: 2112 o The stateid including the current sequence value.
2114 o The associated client ID. 2116 o The handle of the associated file. 2118 o The type of the lock, such as open, byte-range lock, delegation, 2119 layout. 2121 o For locks such as opens and byte-range locks, there will be 2122 information about the owner(s) of the lock. 2124 o For recallable/revocable lock types, the current recall status 2125 needs to be included. 2127 o For each lock type there will be type-specific information, such 2128 as share and deny modes for opens and type and byte ranges for 2129 byte-range locks and layouts. 2131 A further server responsibility concerns locks that are revoked or 2132 otherwise lost during the process of file system migration. Because 2133 locks that appear to be lost during the process of migration will be 2134 reclaimed by the client, the servers have to take steps to ensure 2135 that locks revoked soon before or soon after migration are not 2136 inadvertently allowed to be reclaimed in situations in which the 2137 continuity of lock possession cannot be assured. 2139 o For locks lost on the source but whose loss has not yet been 2140 acknowledged by the client (by using FREE_STATEID), the 2141 destination must be aware of this loss so that it can deny a 2142 request to reclaim them. 2144 o For locks lost on the destination after the state transfer but 2145 before the client's RECLAIM_COMPLETE is done, the destination 2146 server should note these and not allow them to be reclaimed. 2148 An additional responsibility of the cooperating servers concerns 2149 situations in which a stateid cannot be transferred transparently 2150 because it conflicts with an existing stateid held by the client and 2151 associated with a different file system. In this case there are two 2152 valid choices: 2154 o Treat the transfer, as in NFSv4.0, as one without Transparent 2155 State Migration.
In this case, conflicting locks cannot be 2156 granted until the client does a RECLAIM_COMPLETE, after reclaiming 2157 the locks it had, with the exception of reclaims denied because 2158 they were attempts to reclaim locks that had been lost. 2160 o Implement Transparent State Migration, except for the lock with 2161 the conflicting stateid. In this case, the client will be aware 2162 of a lost lock (through the SEQ4_STATUS flags) and be allowed to 2163 reclaim it. 2165 When transferring state between the source and destination, the 2166 issues discussed in Section 7.2 of [RFC7931] must still be attended 2167 to. In this case, the use of NFS4ERR_DELAY may still be necessary in 2168 NFSv4.1, as it was in NFSv4.0, to prevent locking state changing 2169 while it is being transferred. 2171 There are a number of important differences in the NFSv4.1 context: 2173 o The absence of RELEASE_LOCKOWNER means that the one case in which 2174 an operation could not be deferred by use of NFS4ERR_DELAY no 2175 longer exists. 2177 o Sequencing of operations is no longer done using owner-based 2178 operation sequence numbers. Instead, sequencing is session- 2179 based. 2181 As a result, when sessions are not transferred, the techniques 2182 discussed in Section 7.2 of [RFC7931] are adequate and will not be 2183 further discussed. 2185 11.3. Third sub-section within new section to be added to [RFC5661] to 2186 be entitled "Server Responsibilities in Effecting Session 2187 Transfer" 2189 The basic responsibility of the source server in effecting session 2190 transfer is to make available to the destination server a description 2191 of the current state of each slot within the session, including: 2193 o The last sequence value received for that slot. 2195 o Whether there is cached reply data for the last request executed 2196 and, if so, the cached reply.
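The per-slot description listed above can be pictured as a small record. The following is a minimal sketch under stated assumptions: the type and function names are hypothetical, the cached reply is treated as an opaque byte string, and the transfer format between servers is deliberately left out, since this document leaves it as an implementation choice.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SlotDescription:
    """What the source server describes for one slot of a session."""
    last_sequence: int               # last sequence value received
    cached_reply: Optional[bytes]    # None when no reply data is cached

    @property
    def has_cached_reply(self) -> bool:
        return self.cached_reply is not None


def describe_session(
    slots: Iterable[Tuple[int, Optional[bytes]]]
) -> List[SlotDescription]:
    """Build the per-slot descriptions from (sequence, reply) pairs.

    The input form is hypothetical; a server would read these values
    from its own slot table.
    """
    return [SlotDescription(seq, reply) for seq, reply in slots]
```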
2198 When sessions are transferred, there are a number of issues that pose 2199 challenges in terms of making the transferred state unmodifiable 2200 during the period it is gathered up and transferred to the 2201 destination server. 2203 o A single session may be used to access multiple file systems, not 2204 all of which are being transferred. 2206 o Requests made on a session may, even if rejected, affect the state 2207 of the session by advancing the sequence number associated with 2208 the slot used. 2210 As a result, when the filesystem state might otherwise be considered 2211 unmodifiable, the client might have any number of in-flight requests, 2212 each of which is capable of changing session state, which may be of a 2213 number of types: 2215 1. Those requests that were processed on the migrating file system, 2216 before migration began. 2218 2. Those requests which got the error NFS4ERR_DELAY because the file 2219 system being accessed was in the process of being migrated. 2221 3. Those requests which got the error NFS4ERR_MOVED because the file 2222 system being accessed had been migrated. 2224 4. Those requests that accessed the migrating file system, in order 2225 to obtain location or status information. 2227 5. Those requests that did not reference the migrating file system. 2229 It should be noted that the history of any particular slot is likely 2230 to include a number of these request classes. In the case in which a 2231 session which is migrated is used by filesystems other than the one 2232 migrated, requests of class 5 may be common and be the last request 2233 processed, for many slots. 2235 Since session state can change even after the locking state has been 2236 fixed as part of the migration process, the session state known to 2237 the client could be different from that on the destination server, 2238 which necessarily reflects the session state on the source server, at 2239 an earlier time. 
In deciding how to deal with this situation, it is 2240 helpful to distinguish between two sorts of behavioral consequences 2241 of the choice of initial sequence ID values. 2243 o The error NFS4ERR_SEQ_MISORDERED is returned when the sequence ID 2244 in a request is neither equal to the last one seen for the current 2245 slot nor the next greater one. 2247 In view of the difficulty of arriving at a mutually acceptable 2248 value for the correct last sequence value at the point of 2249 migration, it may be necessary for the server to show some degree 2250 of forbearance, when the sequence ID is one that would be 2251 considered unacceptable if session migration were not involved. 2253 o Returning the cached reply for a previously executed request when 2254 the sequence ID in the request matches the last value recorded for 2255 the slot. 2257 In the cases in which an error is returned and there is no 2258 possibility of any non-idempotent operation having been executed, 2259 it may not be necessary to adhere to this as strictly as might be 2260 proper if session migration were not involved. For example, the 2261 fact that the error NFS4ERR_DELAY was returned may not assist the 2262 client in any material way, while the fact that NFS4ERR_MOVED was 2263 returned by the source server may not be relevant when the request 2264 was reissued, directed to the destination server. 2266 One part of the necessary adaptation to these sorts of issues would 2267 restrict enforcement of normal slot sequence enforcement semantics 2268 until the client itself, by issuing a request using a particular slot 2269 on the destination server, established the new starting sequence for 2270 that slot on the migrated session. 2272 An important issue is that the specification needs to take note of 2273 all potential COMPOUNDs, even if they might be unlikely in practice. 
2274 For example, a COMPOUND is allowed to access multiple file systems 2275 and might perform non-idempotent operations in some of them before 2276 accessing a file system being migrated. Also, a COMPOUND may return 2277 considerable data in the response, before being rejected with 2278 NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as 2279 sa_cachethis. 2281 To address these issues, a destination server MAY do any of the 2282 following when implementing session transfer. 2284 o Avoid enforcing any sequencing semantics for a particular slot 2285 until the client has established the starting sequence for that 2286 slot on the destination server. 2288 o For each slot, avoid returning a cached reply returning 2289 NFS4ERR_DELAY or NFS4ERR_MOVED until the client has established 2290 the starting sequence for that slot on the destination server. 2292 o Until the client has established the starting sequence for a 2293 particular slot on the destination server, avoid reporting 2294 NFS4ERR_SEQ_MISORDERED or returning a cached reply returning 2295 NFS4ERR_DELAY or NFS4ERR_MOVED, where the reply consists solely of 2296 a series of operations where the response is NFS4_OK until the 2297 final error. 2299 12. fs_locations_info 2301 12.1. Updates to treatment of fs_locations_info 2303 Various elements of the fs_locations_info attribute contain 2304 information that applies to either a specific filesystem replica or 2305 to a network path or set of network paths used to access such a 2306 replica. The existing treatment of fs_locations_info (in 2307 Section 11.10 of [RFC5661]) does not clearly distinguish these cases, 2308 in part because the document did not clearly distinguish replicas 2309 from the paths used to access them.
2311 In addition, special clarification needed to be provided for: 2313 o With regard to the handling of FSLI4GF_GOING, it needs to be made 2314 clear that this only applies to the unavailability of a replica 2315 rather than to a path to access a replica. 2317 o In describing the appropriate value for a server to use for 2318 fli_valid_for, it needs to be made clear that there is no need for 2319 the client to frequently fetch the fs_locations_info value to be 2320 prepared for shifts in trunking patterns. 2322 o Clarification of the rules for extensions of the fls_info needs to 2323 be provided. The existing treatment reflects the extension model 2324 in effect at the time [RFC5661] was written, and needs to be 2325 updated in accordance with the extension model described in 2326 [RFC8178]. 2328 12.2. Updated Section 11.10 of [RFC5661] entitled "The Attribute 2329 fs_locations_info" 2331 The fs_locations_info attribute is intended as a more functional 2332 replacement for the fs_locations attribute which will continue to 2333 exist and be supported. Clients can use it to get a more complete 2334 set of data about alternative file system locations, including 2335 additional network paths to access replicas in use and additional 2336 replicas. When the server does not support fs_locations_info, 2337 fs_locations can be used to get a subset of the data. A server that 2338 supports fs_locations_info MUST support fs_locations as well. 2340 There is additional data present in fs_locations_info, that is not 2341 available in fs_locations: 2343 o Attribute continuity information.
This information will allow a 2344 client to select a replica that meets the transparency 2345 requirements of the applications accessing the data and to 2346 leverage optimizations due to the server guarantees of attribute 2347 continuity (e.g., if the change attribute of a file of the file 2348 system is continuous between multiple replicas, the client does 2349 not have to invalidate the file's cache when switching to a 2350 different replica). 2352 o File system identity information that indicates when multiple 2353 replicas, from the client's point of view, correspond to the same 2354 target file system, allowing them to be used interchangeably, 2355 without disruption, as distinct synchronized replicas of the same 2356 file data. 2358 Note that having two replicas with common identity information is 2359 distinct from the case of two (trunked) paths to the same replica. 2361 o Information that will bear on the suitability of various replicas, 2362 depending on the use that the client intends. For example, many 2363 applications need an absolutely up-to-date copy (e.g., those that 2364 write), while others may only need access to the most up-to-date 2365 copy reasonably available. 2367 o Server-derived preference information for replicas, which can be 2368 used to implement load-balancing while giving the client the 2369 entire file system list to be used in case the primary fails. 2371 The fs_locations_info attribute is structured similarly to the 2372 fs_locations attribute. A top-level structure (fs_locations_info4) 2373 contains the entire attribute including the root pathname of the file 2374 system and an array of lower-level structures that define replicas 2375 that share a common rootpath on their respective servers. The lower- 2376 level structure in turn (fs_locations_item4) contains a specific 2377 pathname and information on one or more individual network access 2378 paths. 
For that last lowest level, fs_locations_info has an 2379 fs_locations_server4 structure that contains per-server-replica 2380 information in addition to the file system location entry. This per- 2381 server-replica information includes a nominally opaque array, 2382 fls_info, within which specific pieces of information are located at 2383 the specific indices listed below. 2385 Two fs_locations_server4 entries that are within different 2386 fs_locations_item4 structures are never trunkable, while two entries 2387 within the same fs_locations_item4 structure might or might not be 2388 trunkable. Two entries that are trunkable will have identical 2389 identity information, although, as noted above, the converse is not 2390 the case. 2392 The attribute will always contain at least a single 2393 fs_locations_server4 entry. Typically, there will be an entry with 2394 the FSLI4GF_CUR_REQ flag set, although in the case of a referral 2395 there will be no entry with that flag set. 2397 It should be noted that fs_locations_info attributes returned by 2398 servers for various replicas may differ for various reasons. One 2399 server may know about a set of replicas that are not known to other 2400 servers. Further, compatibility attributes may differ. Filehandles 2401 might be of the same class going from replica A to replica B but not 2402 going in the reverse direction. This might happen because the 2403 filehandles are the same, but replica B's server implementation might 2404 not have provision to note and report that equivalence. 2406 The fs_locations_info attribute consists of a root pathname 2407 (fli_fs_root, just like fs_root in the fs_locations attribute), 2408 together with an array of fs_locations_item4 structures. The 2409 fs_locations_item4 structures in turn consist of a root pathname 2410 (fli_rootpath) together with an array (fli_entries) of elements of 2411 data type fs_locations_server4, all defined as follows.
2413 2415 /* 2416 * Defines an individual server access path 2417 */ 2418 struct fs_locations_server4 { 2419 int32_t fls_currency; 2420 opaque fls_info<>; 2421 utf8str_cis fls_server; 2422 }; 2424 /* 2425 * Byte indices of items within 2426 * fls_info: flag fields, class numbers, 2427 * bytes indicating ranks and orders. 2428 */ 2429 const FSLI4BX_GFLAGS = 0; 2430 const FSLI4BX_TFLAGS = 1; 2432 const FSLI4BX_CLSIMUL = 2; 2433 const FSLI4BX_CLHANDLE = 3; 2434 const FSLI4BX_CLFILEID = 4; 2435 const FSLI4BX_CLWRITEVER = 5; 2436 const FSLI4BX_CLCHANGE = 6; 2437 const FSLI4BX_CLREADDIR = 7; 2439 const FSLI4BX_READRANK = 8; 2440 const FSLI4BX_WRITERANK = 9; 2441 const FSLI4BX_READORDER = 10; 2442 const FSLI4BX_WRITEORDER = 11; 2444 /* 2445 * Bits defined within the general flag byte. 2446 */ 2447 const FSLI4GF_WRITABLE = 0x01; 2448 const FSLI4GF_CUR_REQ = 0x02; 2449 const FSLI4GF_ABSENT = 0x04; 2450 const FSLI4GF_GOING = 0x08; 2451 const FSLI4GF_SPLIT = 0x10; 2453 /* 2454 * Bits defined within the transport flag byte. 2455 */ 2456 const FSLI4TF_RDMA = 0x01; 2458 /* 2459 * Defines a set of replicas sharing 2460 * a common value of the rootpath 2461 * within the corresponding 2462 * single-server namespaces. 2463 */ 2464 struct fs_locations_item4 { 2465 fs_locations_server4 fli_entries<>; 2466 pathname4 fli_rootpath; 2467 }; 2469 /* 2470 * Defines the overall structure of 2471 * the fs_locations_info attribute. 2472 */ 2473 struct fs_locations_info4 { 2474 uint32_t fli_flags; 2475 int32_t fli_valid_for; 2476 pathname4 fli_fs_root; 2477 fs_locations_item4 fli_items<>; 2478 }; 2480 /* 2481 * Flag bits in fli_flags. 2482 */ 2483 const FSLI4IF_VAR_SUB = 0x00000001; 2485 typedef fs_locations_info4 fattr4_fs_locations_info; 2487 2488 As noted above, the fs_locations_info attribute, when supported, may 2489 be requested of absent file systems without causing NFS4ERR_MOVED to 2490 be returned. 
It is generally expected that it will be available for 2491 both present and absent file systems even if only a single 2492 fs_locations_server4 entry is present, designating the current 2493 (present) file system, or two fs_locations_server4 entries 2494 designating the previous location of an absent file system (the one 2495 just referenced) and its successor location. Servers are strongly 2496 urged to support this attribute on all file systems if they support 2497 it on any file system. 2499 The data presented in the fs_locations_info attribute may be obtained 2500 by the server in any number of ways, including specification by the 2501 administrator or by current protocols for transferring data among 2502 replicas and protocols not yet developed. NFSv4.1 only defines how 2503 this information is presented by the server to the client. 2505 12.2.1. Updated section 11.10.1 of [RFC5661] entitled "The 2506 fs_locations_server4 Structure" 2508 The fs_locations_server4 structure consists of the following items in 2509 addition to the fls_server field which specifies a network address or 2510 set of addresses to be used to access the specified file system. 2511 Note that both of these items specify attributes of the file system 2512 replica and should not be different when there are multiple 2513 fs_locations_server4 structures for the same replica, each specifying 2514 a network path to the chosen replica. 2516 o An indication of how up-to-date the file system is (fls_currency) 2517 in seconds. This value is relative to the master copy. A 2518 negative value indicates that the server is unable to give any 2519 reasonably useful value here. A value of zero indicates that the 2520 file system is the actual writable data or a reliably coherent and 2521 fully up-to-date copy. Positive values indicate how out-of-date 2522 this copy can normally be before it is considered for update. 
      Such a value is not a guarantee that such updates will always be
      performed on the required schedule but instead serves as a hint
      about how far the copy of the data would be expected to be behind
      the most up-to-date copy.

   o  A counted array of one-byte values (fls_info) containing
      information about the particular file system instance. This data
      includes general flags, transport capability flags, file system
      equivalence class information, and selection priority information.
      The encoding will be discussed below.

   o  The server string (fls_server). For the case of the replica
      currently being accessed (via GETATTR), a zero-length string MAY
      be used to indicate the current address being used for the RPC
      call. The fls_server field can also be an IPv4 or IPv6 address,
      formatted the same way as an IPv4 or IPv6 address in the "server"
      field of the fs_location4 data type (see Section 11.9 of
      [RFC5661]).

   With the exception of the transport-flag field (at offset
   FSLI4BX_TFLAGS within the fls_info array), all of this data applies
   to the replica specified by the entry, rather than the specific
   network path used to access it.

   Data within the fls_info array is in the form of 8-bit data items
   with constants giving the offsets within the array of various values
   describing this particular file system instance. This style of
   definition was chosen, in preference to explicit XDR structure
   definitions for these values, for a number of reasons.

   o  The kinds of data in the fls_info array, representing flags, file
      system classes, and priorities among sets of file systems
      representing the same data, are such that 8 bits provide a quite
      acceptable range of values. Even where there might be more than
      256 such file system instances, having more than 256 distinct
      classes or priorities is unlikely.

   o  Explicit definition of the various specific data items within XDR
      would limit expandability in that any extension within would
      require yet another attribute, leading to specification and
      implementation clumsiness. In the context of the NFSv4 extension
      model in effect at the time fs_locations_info was designed (i.e.,
      that described in [RFC5661]), this would necessitate a new minor
      version to effect any Standards Track extension to the data in
      fls_info.

   The set of fls_info data is subject to expansion in a future minor
   version, or in a Standards Track RFC, within the context of a single
   minor version. The server SHOULD NOT send and the client MUST NOT
   use indices within the fls_info array or flag bits that are not
   defined in Standards Track RFCs.

   In light of the new extension model defined in [RFC8178] and the fact
   that the individual items within fls_info are not explicitly
   referenced in the XDR, the following practices should be followed
   when extending or otherwise changing the structure of the data
   returned in fls_info within the scope of a single minor version.

   o  All extensions need to be described by Standards Track documents.
      There is no need for such documents to be marked as updating
      [RFC5661] or this document.

   o  It needs to be made clear whether the information in any added
      data items applies to the replica specified by the entry or to the
      specific network paths specified in the entry.

   o  There needs to be a reliable way defined to determine whether the
      server is aware of the extension. This may be based on the length
      field of the fls_info array, but it is more flexible to provide
      fs-scope or server-scope attributes to indicate what extensions
      are provided.

   This encoding scheme can be adapted to the specification of
   multi-byte numeric values, even though none are currently defined.
   If extensions are made via Standards Track RFCs, multi-byte
   quantities will be encoded as a range of bytes with a range of
   indices, with the bytes interpreted in big-endian byte order.
   Further, any such index assignments will be constrained by the need
   for the relevant quantities not to cross XDR word boundaries.

   The fls_info array currently contains:

   o  Two 8-bit flag fields, one devoted to general file-system
      characteristics and a second reserved for transport-related
      capabilities.

   o  Six 8-bit class values that define various file system equivalence
      classes as explained below.

   o  Four 8-bit priority values that govern file system selection as
      explained below.

   The general file system characteristics flag (at byte index
   FSLI4BX_GFLAGS) has the following bits defined within it:

   o  FSLI4GF_WRITABLE indicates that this file system target is
      writable, allowing it to be selected by clients that may need to
      write on this file system. When the current file system instance
      is writable and is defined as of the same simultaneous use class
      (as specified by the value at index FSLI4BX_CLSIMUL) to which the
      client was previously writing, then it must incorporate within its
      data any committed write made on the source file system instance.
      See Section 8.6, which discusses the write-verifier class. While
      there is no harm in not setting this flag for a file system that
      turns out to be writable, turning the flag on for a read-only file
      system can cause problems for clients that select a migration or
      replication target based on the flag and then find themselves
      unable to write.

   o  FSLI4GF_CUR_REQ indicates that this replica is the one on which
      the request is being made. Only a single server entry may have
      this flag set and, in the case of a referral, no entry will have
      it set.
      Note that this flag might be set even if the request was
      made on a network access path different from any of those
      specified in the current entry.

   o  FSLI4GF_ABSENT indicates that this entry corresponds to an absent
      file system replica. It can only be set if FSLI4GF_CUR_REQ is
      set. When both such bits are set, it indicates that a file system
      instance is not usable but that the information in the entry can
      be used to determine the sorts of continuity available when
      switching from this replica to other possible replicas. Since
      this bit can only be true if FSLI4GF_CUR_REQ is true, the value
      could be determined using the fs_status attribute, but the
      information is also made available here for the convenience of the
      client. An entry with this bit, since it represents a true file
      system (albeit absent), does not appear in the event of a
      referral, but only when a file system has been accessed at this
      location and has subsequently been migrated.

   o  FSLI4GF_GOING indicates that a replica, while still available,
      should not be used further. The client, if using it, should make
      an orderly transfer to another file system instance as
      expeditiously as possible. It is expected that file systems going
      out of service will be announced as FSLI4GF_GOING some time before
      the actual loss of service. It is also expected that the
      fli_valid_for value will be sufficiently small to allow clients to
      detect and act on scheduled events, while large enough that the
      cost of the requests to fetch the fs_locations_info values will
      not be excessive. Values on the order of ten minutes seem
      reasonable.

      When this flag is seen as part of a transition into a new file
      system, a client might choose to transfer immediately to another
      replica, or it may reference the current file system and only
      transition when a migration event occurs.
      Similarly, when this flag appears for a replica in a referral,
      clients would likely avoid being referred to this instance
      whenever there is another choice.

      This flag, like the other items within fls_info, applies to the
      replica, rather than to a particular path to that replica. When
      it appears, a transition to a new replica, rather than to a
      different path to the same replica, is indicated.

   o  FSLI4GF_SPLIT indicates that when a transition occurs from the
      current file system instance to this one, the replacement may
      consist of multiple file systems. In this case, the client has to
      be prepared for the possibility that objects on the same file
      system before migration will be on different ones after. Note
      that FSLI4GF_SPLIT is not incompatible with the file systems
      belonging to the same fileid class since, if one has a set of
      fileids that are unique within a file system, each subset assigned
      to a smaller file system after migration would not have any
      conflicts internal to that file system.

      A client, in the case of a split file system, will interrogate
      existing files with which it has continuing connection (it is free
      to simply forget cached filehandles). If the client remembers the
      directory filehandle associated with each open file, it may
      proceed upward using LOOKUPP to find the new file system
      boundaries. Note that in the event of a referral, there will not
      be any such files and so these actions will not be performed.
      Instead, a reference to a portion of the original file system now
      split off into other file systems will encounter an fsid change
      and possibly a further referral.

      Once the client recognizes that one file system has been split
      into two, it can prevent the disruption of running applications by
      presenting the two file systems as a single one until a convenient
      point to recognize the transition, such as a restart. This would
      require a mapping from the server's fsids to fsids as seen by the
      client, but this is already necessary for other reasons. As noted
      above, existing fileids within the two descendant file systems
      will not conflict. Providing non-conflicting fileids for newly
      created files on the split file systems is the responsibility of
      the server (or servers working in concert). The server can encode
      filehandles such that filehandles generated before the split event
      can be discerned from those generated after the split, allowing
      the server to determine when the need for emulating two file
      systems as one is over.

      Although it is possible for this flag to be present in the event
      of referral, it would generally be of little interest to the
      client, since the client is not expected to have information
      regarding the current contents of the absent file system.

   The transport-flag field (at byte index FSLI4BX_TFLAGS) contains the
   following bits related to the transport capabilities of the specific
   network path(s) specified by the entry.

   o  FSLI4TF_RDMA indicates that any specified network paths provide
      NFSv4.1 clients access using an RDMA-capable transport.

   Attribute continuity and file system identity information are
   expressed by defining equivalence relations on the sets of file
   systems presented to the client. Each such relation is expressed as
   a set of file system equivalence classes. For each relation, a file
   system has an 8-bit class number. Two file systems belong to the
   same class if both have identical non-zero class numbers. Zero is
   treated as non-matching.
   Most often, the relevant question for the
   client will be whether a given replica is identical to / continuous
   with the current one in a given respect, but the information should
   be available also as to whether two other replicas match in that
   respect as well.

   The following fields specify the file system's class numbers for the
   equivalence relations used in determining the nature of file system
   transitions. See Sections 6 through 11 and their various subsections
   for details about how this information is to be used. Servers may
   assign these values as they wish, so long as file system instances
   that share the same value have the specified relationship to one
   another; conversely, file systems that have the specified
   relationship to one another share a common class value. As each
   instance entry is added, the relationships of this instance to
   previously entered instances can be consulted, and if one is found
   that bears the specified relationship, that entry's class value can
   be copied to the new entry. When no such previous entry exists, a
   new value for that byte index (not previously used) can be selected,
   most likely by incrementing the value of the last class value
   assigned for that index.

   o  The field with byte index FSLI4BX_CLSIMUL defines the
      simultaneous-use class for the file system.

   o  The field with byte index FSLI4BX_CLHANDLE defines the handle
      class for the file system.

   o  The field with byte index FSLI4BX_CLFILEID defines the fileid
      class for the file system.

   o  The field with byte index FSLI4BX_CLWRITEVER defines the
      write-verifier class for the file system.

   o  The field with byte index FSLI4BX_CLCHANGE defines the change
      class for the file system.

   o  The field with byte index FSLI4BX_CLREADDIR defines the readdir
      class for the file system.
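
   As an illustration only, the class-matching rule described above (two
   file systems are in the same class when both carry the same non-zero
   class number at a given byte index) might be sketched in Python as
   follows. The helper names class_at and same_class, the sample byte
   values, and the choice to treat a byte missing from a short fls_info
   array as zero are assumptions of this sketch, not part of the
   protocol definition.

```python
# Byte indices within fls_info, taken from the XDR constants.
FSLI4BX_GFLAGS = 0
FSLI4BX_TFLAGS = 1
FSLI4BX_CLSIMUL = 2
FSLI4BX_CLHANDLE = 3
FSLI4BX_CLFILEID = 4
FSLI4BX_CLWRITEVER = 5
FSLI4BX_CLCHANGE = 6
FSLI4BX_CLREADDIR = 7


def class_at(fls_info: bytes, index: int) -> int:
    """Return the class byte at index, or 0 if the array is too short.

    Mapping a missing byte to 0 (non-matching) is an assumption made
    for this sketch.
    """
    return fls_info[index] if index < len(fls_info) else 0


def same_class(info_a: bytes, info_b: bytes, index: int) -> bool:
    """True when both entries carry the same non-zero class number."""
    a = class_at(info_a, index)
    b = class_at(info_b, index)
    return a != 0 and a == b


# Hypothetical fls_info contents for the current replica and one other.
current = bytes([0x03, 0x00, 1, 1, 1, 0, 2, 1])
replica = bytes([0x01, 0x00, 1, 1, 2, 0, 2, 1])

handles_continue = same_class(current, replica, FSLI4BX_CLHANDLE)
fileids_continue = same_class(current, replica, FSLI4BX_CLFILEID)
verifiers_continue = same_class(current, replica, FSLI4BX_CLWRITEVER)
```

   With these sample values, the two entries share a handle class, do
   not share a fileid class, and do not share a write-verifier class
   because both carry zero at that index.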

   Server-specified preference information is also provided via 8-bit
   values within the fls_info array. The values provide a rank and an
   order (see below) to be used with separate values specifiable for the
   cases of read-only and writable file systems. These values are
   compared for different file systems to establish the server-specified
   preference, with lower values indicating "more preferred".

   Rank is used to express a strict server-imposed ordering on clients,
   with lower values indicating "more preferred". Clients should
   attempt to use all replicas with a given rank before they use one
   with a higher rank. Only if all of those file systems are
   unavailable should the client proceed to those of a higher rank.
   Because specifying a rank will override client preferences, servers
   should be conservative about using this mechanism, particularly when
   the environment is one in which client communication characteristics
   are neither tightly controlled nor visible to the server.

   Within a rank, the order value is used to specify the server's
   preference to guide the client's selection when the client's own
   preferences are not controlling, with lower values of order
   indicating "more preferred". If replicas are approximately equal in
   all respects, clients should defer to the order specified by the
   server. When clients look at server latency as part of their
   selection, they are free to use this criterion but it is suggested
   that when latency differences are not significant, the
   server-specified order should guide selection.

   o  The field at byte index FSLI4BX_READRANK gives the rank value to
      be used for read-only access.

   o  The field at byte index FSLI4BX_READORDER gives the order value to
      be used for read-only access.

   o  The field at byte index FSLI4BX_WRITERANK gives the rank value to
      be used for writable access.

   o  The field at byte index FSLI4BX_WRITEORDER gives the order value
      to be used for writable access.

   Depending on the potential need for write access by a given client,
   one of the pairs of rank and order values is used. The read rank and
   order should only be used if the client knows that only reading will
   ever be done or if it is prepared to switch to a different replica in
   the event that any write access capability is required in the future.

12.2.2.  Updated Section 11.10.2 of [RFC5661] entitled "The
         fs_locations_info4 Structure"

   The fs_locations_info4 structure, encoding the fs_locations_info
   attribute, contains the following:

   o  The fli_flags field, which contains general flags that affect the
      interpretation of this fs_locations_info4 structure and all
      fs_locations_item4 structures within it. The only flag currently
      defined is FSLI4IF_VAR_SUB. All bits in the fli_flags field that
      are not defined should always be returned as zero.

   o  The fli_fs_root field, which contains the pathname of the root of
      the current file system on the current server, just as it does in
      the fs_locations4 structure.

   o  An array called fli_items of fs_locations_item4 structures, which
      contain information about replicas of the current file system.
      Where the current file system is actually present, or has been
      present, i.e., this is not a referral situation, one of the
      fs_locations_item4 structures will contain an fs_locations_server4
      for the current server. This structure will have FSLI4GF_ABSENT
      set if the current file system is absent, i.e., normal access to
      it will return NFS4ERR_MOVED.

   o  The fli_valid_for field specifies a time in seconds for which it
      is reasonable for a client to use the fs_locations_info attribute
      without refetch.
      The fli_valid_for value does not provide a
      guarantee of validity since servers can unexpectedly go out of
      service or become inaccessible for any number of reasons. Clients
      are well-advised to refetch this information for an actively
      accessed file system every fli_valid_for seconds. This is
      particularly important when file system replicas may go out of
      service in a controlled way using the FSLI4GF_GOING flag to
      communicate an ongoing change. The server should set
      fli_valid_for to a value that allows well-behaved clients to
      notice the FSLI4GF_GOING flag and make an orderly switch before
      the loss of service becomes effective. If this value is zero,
      then no refetch interval is appropriate and the client need not
      refetch this data on any particular schedule. In the event of a
      transition to a new file system instance, a new value of the
      fs_locations_info attribute will be fetched at the destination.
      It is to be expected that this may have a different fli_valid_for
      value, which the client should then use in the same fashion as the
      previous value. Because a refetch of the attribute causes
      information from all component entries to be refetched, the server
      will typically provide a low value for this field if any of the
      replicas are likely to go out of service in a short time frame.

      Note that, because of the ability of the server to return
      NFS4ERR_MOVED to trigger the use of different paths, when
      alternate trunked paths are available, there is generally no need
      to use low values of fli_valid_for in connection with the
      management of alternate paths to the same replica.

   The FSLI4IF_VAR_SUB flag within fli_flags controls whether variable
   substitution is to be enabled. See Section 12.2.3 for an explanation
   of variable substitution.

12.2.3.  Updated Section 11.10.3 of [RFC5661] entitled "The
         fs_locations_item4 Structure"

   The fs_locations_item4 structure contains a pathname (in the field
   fli_rootpath) that encodes the path of the target file system
   replicas on the set of servers designated by the included
   fs_locations_server4 entries. The precise manner in which this
   target location is specified depends on the value of the
   FSLI4IF_VAR_SUB flag within the associated fs_locations_info4
   structure.

   If this flag is not set, then fli_rootpath simply designates the
   location of the target file system within each server's single-server
   namespace just as it does for the rootpath within the fs_location4
   structure. When this bit is set, however, component entries of a
   certain form are subject to client-specific variable substitution so
   as to allow a degree of namespace non-uniformity in order to
   accommodate the selection of client-specific file system targets to
   adapt to different client architectures or other characteristics.

   When such substitution is in effect, a variable beginning with the
   string "${" and ending with the string "}" and containing a colon is
   to be replaced by the client-specific value associated with that
   variable. The string "unknown" should be used by the client when it
   has no value for such a variable. The pathname resulting from such
   substitutions is used to designate the target file system, so that
   different clients may have different file systems, corresponding to
   that location in the multi-server namespace.

   As mentioned above, such substituted pathname variables contain a
   colon. The part before the colon is to be a DNS domain name, and the
   part after is to be a case-insensitive alphanumeric string.

   Where the domain is "ietf.org", only variable names defined in this
   document or subsequent Standards Track RFCs are subject to such
   substitution.
   Organizations are free to use their domain names to
   create their own sets of client-specific variables, to be subject to
   such substitution. In cases where such variables are intended to be
   used more broadly than a single organization, publication of an
   Informational RFC defining such variables is RECOMMENDED.

   The variable ${ietf.org:CPU_ARCH} is used to denote the CPU
   architecture for which object files are compiled. This specification
   does not limit the acceptable values (except that they must be valid
   UTF-8 strings), but such values as "x86", "x86_64", and "sparc" would
   be expected to be used in line with industry practice.

   The variable ${ietf.org:OS_TYPE} is used to denote the operating
   system, and thus the kernel and library APIs, for which code might be
   compiled. This specification does not limit the acceptable values
   (except that they must be valid UTF-8 strings), but such values as
   "linux" and "freebsd" would be expected to be used in line with
   industry practice.

   The variable ${ietf.org:OS_VERSION} is used to denote the operating
   system version, and thus the specific details of versioned
   interfaces, for which code might be compiled. This specification
   does not limit the acceptable values (except that they must be valid
   UTF-8 strings). However, combinations of numbers and letters with
   interspersed dots would be expected to be used in line with industry
   practice, with the details of the version format depending on the
   specific value of the variable ${ietf.org:OS_TYPE} with which it is
   used.

   Use of these variables could result in the direction of different
   clients to different file systems on the same server, as appropriate
   to particular clients.
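
   As an illustration only, client-side substitution in the components
   of fli_rootpath might be sketched in Python as follows. The function
   names substitute and expand_rootpath, the regular expression
   (including the decision to permit underscores in the variable name,
   as in CPU_ARCH), and the sample values are assumptions of this
   sketch. It also does not check whether an "ietf.org" name is one
   defined by a Standards Track RFC.

```python
import re

# Matches ${domain:name}. The character classes are an assumption of
# this sketch; the text only requires a DNS domain name, a colon, and
# a case-insensitive alphanumeric string.
_VAR = re.compile(r"\$\{([A-Za-z0-9.-]+):([A-Za-z0-9_]+)\}")


def substitute(component: str, values: dict) -> str:
    """Replace each ${domain:name} with the client's value for it.

    Variables for which the client has no value become "unknown".
    Names are compared case-insensitively by normalizing the domain
    to lower case and the variable name to upper case.
    """
    def repl(match):
        key = (match.group(1).lower(), match.group(2).upper())
        return values.get(key, "unknown")

    return _VAR.sub(repl, component)


def expand_rootpath(components, values, var_sub_enabled):
    """Apply substitution only when FSLI4IF_VAR_SUB is set."""
    if not var_sub_enabled:
        return list(components)
    return [substitute(c, values) for c in components]


# Hypothetical client values and a hypothetical fli_rootpath.
values = {
    ("ietf.org", "CPU_ARCH"): "x86_64",
    ("ietf.org", "OS_TYPE"): "linux",
}
path = [
    "export",
    "${ietf.org:CPU_ARCH}",
    "${ietf.org:os_type}",
    "${ietf.org:OS_VERSION}",
]

expanded = expand_rootpath(path, values, True)
untouched = expand_rootpath(path, values, False)
```

   In this sketch the client has no value for OS_VERSION, so that
   component expands to "unknown", and the path is returned unchanged
   when the flag is not set.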
   In cases in which the target file systems are
   located on different servers, a single server could serve as a
   referral point so that each valid combination of variable values
   would designate a referral hosted on a single server, with the
   targets of those referrals on a number of different servers.

   Because namespace administration is affected by the values selected
   to substitute for various variables, clients should provide
   convenient means of determining what variable substitutions a client
   will implement, as well as, where appropriate, providing means to
   control the substitutions to be used. The exact means by which this
   will be done is outside the scope of this specification.

   Although variable substitution is most suitable for use in the
   context of referrals, it may be used in the context of replication
   and migration. If it is used in these contexts, the server must
   ensure that no matter what values the client presents for the
   substituted variables, the result is always a valid successor file
   system instance to that from which a transition is occurring, i.e.,
   that the data is identical or represents a later image of a writable
   file system.

   Note that when fli_rootpath is a null pathname (that is, one with
   zero components), the file system designated is at the root of the
   specified server, whether or not the FSLI4IF_VAR_SUB flag within the
   associated fs_locations_info4 structure is set.

13.  Changes to [RFC5661] outside Section 11

   Besides the major rework of Section 11, there are a number of related
   changes that are necessary:

   o  The summary that appeared in Section 1.7.3.3 of [RFC5661] needs to
      be revised to reflect the changes called for in Section 4 of the
      current document. The updated summary appears as Section 13.1
      below.

   o  The discussion of server scope which appeared in Section 2.10.4 of
      [RFC5661] needs to be replaced, since the existing text appears to
      require a level of inter-server co-ordination incompatible with
      its basic function of avoiding the need for a globally uniform
      means of assigning server_owner values. A revised treatment
      appears in Section 13.2 below.

   o  While the last paragraph (exclusive of sub-sections) of
      Section 2.10.5 in [RFC5661], dealing with server_owner changes, is
      literally true, it has been a source of confusion. Since the
      existing paragraph can be read as suggesting that such changes be
      dealt with non-disruptively, the treatment in Section 13.4 below
      needs to be substituted.

   o  The existing definition of NFS4ERR_MOVED (in Section 15.1.2.4 of
      [RFC5661]) needs to be updated to reflect the different handling
      of unavailability of a particular fs via a specific network
      address. Since such a situation is no longer considered to
      constitute unavailability of a file system instance, the
      description needs to change even though the set of circumstances
      in which it is to be returned remains the same. The updated
      description appears in Section 13.3 below.

   o  The existing treatment of EXCHANGE_ID (in Section 18.35 of
      [RFC5661]) assumes that client IDs cannot be created/confirmed
      other than by the EXCHANGE_ID and CREATE_SESSION operations.
      Also, the necessary use of EXCHANGE_ID in recovery from migration
      and related situations is not addressed clearly. A revised
      treatment of EXCHANGE_ID is necessary and it appears in Section 14
      below while the specific differences between it and the treatment
      within [RFC5661] are explained in Section 13.5 below.

   o  The existing treatment of RECLAIM_COMPLETE (in Section 18.51 of
      [RFC5661]) is not sufficiently clear about the purpose and use of
      the rca_one_fs argument and how the server is to deal with
      inappropriate values of this argument. Because the resulting
      confusion raises interoperability issues, a new treatment of
      RECLAIM_COMPLETE is necessary and it appears in Section 15 below
      while the specific differences between it and the treatment within
      [RFC5661] are discussed in Section 13.6 below. In addition, the
      definitions of the reclaim-related errors receive an updated
      treatment in Section 13.7 to reflect the fact that there are
      multiple contexts for lock reclaim operations.

13.1.  Updated Section 1.7.3.3 of [RFC5661] to be retitled "Introduction
       to Multi-Server Namespace"

   NFSv4.1 contains a number of features to allow implementation of
   namespaces that cross server boundaries and that allow and facilitate
   a non-disruptive transfer of support for individual file systems
   between servers. They are all based upon attributes that allow one
   file system to specify alternate, additional, and new location
   information which specifies how the client may access that file
   system.

   These attributes can be used to provide for individual active file
   systems:

   o  Alternate network addresses to access the current file system
      instance.

   o  The locations of alternate file system instances or replicas to be
      used in the event that the current file system instance becomes
      unavailable.

   These file system location attributes may be used together with the
   concept of absent file systems, in which a position in the server
   namespace is associated with locations on other servers without there
   being any corresponding file system instance on the current server.

   o  These attributes may be used with absent file systems to implement
      referrals whereby one server may direct the client to a file
      system provided by another server. This allows extensive
      multi-server namespaces to be constructed.

   o  These attributes may be provided when a previously present file
      system becomes absent. This allows non-disruptive migration of
      file systems to alternate servers.

13.2.  Updated Section 2.10.4 of [RFC5661] entitled "Server Scope"

   Servers each specify a server scope value in the form of an opaque
   string eir_server_scope returned as part of the results of an
   EXCHANGE_ID operation. The purpose of the server scope is to allow a
   group of servers to indicate to clients that a set of servers sharing
   the same server scope value has arranged to use compatible values of
   otherwise opaque identifiers. Thus, the identifiers generated by two
   servers within that set can be assumed compatible so that, in some
   cases, identifiers generated by one server in that set may be
   presented to another server of the same scope.

   The use of such compatible values does not imply that a value
   generated by one server will always be accepted by another. In most
   cases, it will not. However, a server will not accept a value
   generated by another inadvertently. When it does accept it, it will
   be because it is recognized as valid and carrying the same meaning as
   on another server of the same scope.

   When servers are of the same server scope, this compatibility of
   values applies to the following identifiers:

   o  Filehandle values. A filehandle value accepted by two servers of
      the same server scope denotes the same object. A WRITE operation
      sent to one server is reflected immediately in a READ sent to the
      other.

   o  Server owner values. When the server scope values are the same,
      server owner values may be validly compared.
      In cases where the
      server scope values are different, server owner values are treated
      as different even if they contain identical strings of bytes.

   The coordination among servers required to provide such compatibility
   can be quite minimal, and limited to a simple partition of the ID
   space. The recognition of common values requires additional
   implementation, but this can be tailored to the specific situations
   in which that recognition is desired.

   Clients will have occasion to compare the server scope values of
   multiple servers under a number of circumstances, each of which will
   be discussed under the appropriate functional section:

   o  When server owner values received in response to EXCHANGE_ID
      operations sent to multiple network addresses are compared for the
      purpose of determining the validity of various forms of trunking,
      as described in Section 4.5.2 of the current document.

   o  When network or server reconfiguration causes the same network
      address to possibly be directed to different servers, with the
      necessity for the client to determine when lock reclaim should be
      attempted, as described in Section 8.4.2.1 of [RFC5661].

   When two replies from EXCHANGE_ID, each from a different server
   network address, have the same server scope, there are a number of
   ways a client can validate that the common server scope is due to two
   servers cooperating in a group.

   o  If both EXCHANGE_ID requests were sent with RPCSEC_GSS ([RFC2203],
      [RFC5403], [RFC7861]) authentication and the server principal is
      the same for both targets, the equality of server scope is
      validated. It is RECOMMENDED that two servers intending to share
      the same server scope also share the same principal name.

   o  The client may accept the appearance of the second server in the
      fs_locations or fs_locations_info attribute for a relevant file
      system.
For example, if there is a migration event for a 3125 particular file system or there are locks to be reclaimed on a 3126 particular file system, the attributes for that particular file 3127 system may be used. The client sends the GETATTR request to the 3128 first server for the fs_locations or fs_locations_info attribute 3129 with RPCSEC_GSS authentication. It may need to do this in advance 3130 of the need to verify the common server scope. If the client 3131 successfully authenticates the reply to GETATTR, and the GETATTR 3132 request and reply containing the fs_locations or fs_locations_info 3133 attribute refers to the second server, then the equality of server 3134 scope is supported. A client may choose to limit the use of this 3135 form of support to information relevant to the specific file 3136 system involved (e.g. a file system being migrated). 3138 13.3. Revised Treatment of NFS4ERR_MOVED 3140 Because of the need to appropriately address trunking-related issues, 3141 some uses of the term "replica" in [RFC5661] have become problematic 3142 since a shift in network access paths was considered to be a shift to 3143 a different replica. As a result, the description of NFS4ERR_MOVED 3144 in [RFC5661] needs to be changed to the one below. The new paragraph 3145 explicitly recognizes that a different network address might be used, 3146 while the previous description, misleadingly, treated this as a shift 3147 between two replicas while only a single file system instance might 3148 be involved. 3150 The file system that contains the current filehandle object is not 3151 accessible using the address on which the request was made. It 3152 still might be accessible using other addresses server-trunkable 3153 with it or it might not be present at the server. In the latter 3154 case, it might have been relocated or migrated to another server, 3155 or it might have never been present. 
The client may obtain 3156 information regarding access to the file system location by 3157 obtaining the "fs_locations" or "fs_locations_info" attribute for 3158 the current filehandle. For further discussion, refer to 3159 Section 11 of [RFC5661], as modified by the current document. 3161 13.4. Revised Discussion of Server_owner changes 3163 Section 2.10.5 of [RFC5661] discusses the issue of possible 3164 server_owner changes as follows: 3166 The client should be prepared for the possibility that 3167 eir_server_owner values may be different on subsequent EXCHANGE_ID 3168 requests made to the same network address, as a result of various 3169 sorts of reconfiguration events. When this happens and the 3170 changes result in the invalidation of previously valid forms of 3171 trunking, the client should cease to use those forms, either by 3172 dropping connections or by adding sessions. For a discussion of 3173 lock reclaim as it relates to such reconfiguration events, see 3174 Section 8.4.2.1. 3176 While this paragraph is literally true in that such reconfiguration 3177 events can happen and clients have to deal with them, it is confusing 3178 in that it can be read as suggesting that clients have to deal with 3179 them without disruption, which in general is impossible. This has 3180 led to confusion, especially since the text is not very clear about 3181 the actions that might need to be taken, since: 3183 o The cases of change which are very disruptive (e.g. a change of 3184 server scope) are not sufficiently distinguished from those that 3185 simply involve a change of trunking modes (i.e. a change of 3186 server_owner minor id). 3188 o There is an undue focus on the effect of such changes as they 3189 affect the comparison with corresponding values from other servers, 3190 without fully dealing with the issues that result from value 3191 discontinuity within a single server.
3193 Because of these issues, the paragraph which appears at the end of 3194 Section 2.10.5 needs to be replaced by the material below. 3196 It is always possible that, as a result of various sorts of 3197 reconfiguration events, eir_server_scope and eir_server_owner 3198 values may be different on subsequent EXCHANGE_ID requests made to 3199 the same network address. 3201 In most cases such reconfiguration events will be disruptive and 3202 indicate that an IP address formerly connected to one server is 3203 now connected to an entirely different one. 3205 Some guidelines on client handling of such situations follow: 3207 * When eir_server_scope changes, the client has no assurance that 3208 any IDs it obtained previously (e.g. filehandles, stateids, 3209 client IDs) can be validly used on the new server, and, even if 3210 the new server accepts them, there is no assurance that this is 3211 not due to accident. Thus it is best to treat all such state 3212 as lost/stale, although a client may assume that the probability 3213 of inadvertent acceptance is low and treat this situation as 3214 within the next case. 3216 * When eir_server_scope remains the same and 3217 eir_server_owner.so_major_id changes, the client can use the 3218 filehandles it has, consider its locking state lost, and 3219 attempt to reclaim or otherwise re-obtain its locks. It may 3220 find that its filehandles are now stale, but if NFS4ERR_STALE is 3221 not received, it can proceed to reclaim or otherwise re-obtain 3222 its open locking state. 3224 * When eir_server_scope and eir_server_owner.so_major_id remain 3225 the same, the client has to use the now-current values of 3226 eir_server_owner.so_minor_id in deciding on appropriate forms 3227 of trunking. This may result in connections being dropped or 3228 new sessions being created. 3230 13.5.
Revision to Treatment of EXCHANGE_ID 3232 There are a number of issues in the original treatment of EXCHANGE_ID 3233 (in [RFC5661]) that cause problems for Transparent State Migration 3234 and for the transfer of access between different network access paths 3235 to the same file system instance. 3237 These issues arise from the fact that this treatment was written: 3239 o Assuming that a client ID can only become known to a server by 3240 having been created by executing an EXCHANGE_ID, with confirmation 3241 of the ID only possible by execution of a CREATE_SESSION. 3243 o Considering the interactions between a client and a server only on 3244 a single network address. 3246 As these assumptions have become invalid in the context of 3247 Transparent State Migration and active use of trunking, the treatment 3248 has been modified in several respects. 3250 o It had been assumed that an EXCHANGE_ID executed when the server 3251 is already aware of a given client instance must be either 3252 updating associated parameters (e.g. with respect to callbacks) or 3253 a lingering retransmission to deal with a previously lost reply. 3254 As a result, any slot sequence returned by that operation would be 3255 of no use. The existing treatment goes so far as to say that it 3256 "MUST NOT" be used, although this usage is not in accord with 3257 [RFC2119]. This created a difficulty when an EXCHANGE_ID is done 3258 after Transparent State Migration since that slot sequence would 3259 need to be used in a subsequent CREATE_SESSION. 3261 In the updated treatment, CREATE_SESSION is a way that client IDs 3262 are confirmed but it is understood that other ways are possible. 3263 The slot sequence can be used as needed and cases in which it 3264 would be of no use are appropriately noted. 3266 o It was assumed that the only functions of EXCHANGE_ID were to 3267 inform the server of the client, create the client ID, and 3268 communicate it to the client.
When multiple simultaneous 3269 connections are involved, as often happens when trunking, that 3270 treatment was inadequate in that it ignored the role of 3271 EXCHANGE_ID in associating the client ID with the connection on 3272 which it was done, so that it could be used by a subsequent 3273 CREATE_SESSION, whose parameters do not include an explicit 3274 client ID. 3276 The new treatment explicitly discusses the role of EXCHANGE_ID in 3277 associating the client ID with the connection so it can be used by 3278 CREATE_SESSION and in associating a connection with an existing 3279 session. 3281 The new treatment can be found in Section 14 below. It is intended 3282 to supersede the treatment in Section 18.35 of [RFC5661]. Publishing 3283 a complete replacement for Section 18.35 allows the corrected 3284 definition to be read as a whole once [RFC5661] is updated. 3286 13.6. Revision to Treatment of RECLAIM_COMPLETE 3288 The following changes were made to the treatment of RECLAIM_COMPLETE 3289 in [RFC5661] to arrive at the treatment in Section 15. 3291 o In a number of places the text is more explicit about the purpose 3292 of rca_one_fs and its connection to file system migration. 3294 o There is a discussion of situations in which either form of 3295 RECLAIM_COMPLETE would need to be done. 3297 o There is a discussion of interoperability issues that result from 3298 implementations that may have arisen due to the lack of clarity of 3299 the previous treatment of RECLAIM_COMPLETE. 3301 13.7. Updated Section 15.1.9 of [RFC5661] entitled "Reclaim Errors" 3303 These errors relate to the process of reclaiming locks after a server 3304 restart or in connection with the migration of a file system (i.e. in 3305 the case in which rca_one_fs is TRUE). 3307 13.7.1.
Updated Section 15.1.9.1 of [RFC5661] entitled 3308 "NFS4ERR_COMPLETE_ALREADY (Error Code 10054)" 3310 The client previously sent a successful RECLAIM_COMPLETE operation 3311 specifying the same scope, whether that scope is global or for the 3312 same file system in the case of a per-fs RECLAIM_COMPLETE. An 3313 additional RECLAIM_COMPLETE operation is not necessary and results in 3314 this error. 3316 13.7.2. Updated Section 15.1.9.2 of [RFC5661] entitled "NFS4ERR_GRACE 3317 (Error Code 10013)" 3319 The server was in its recovery or grace period, with regard to the 3320 file system object for which the lock was requested. The locking 3321 request was not a reclaim request and so could not be granted during 3322 that period. 3324 13.7.3. Updated Section 15.1.9.3 of [RFC5661] entitled 3325 "NFS4ERR_NO_GRACE (Error Code 10033)" 3327 A reclaim of client state was attempted in circumstances in which the 3328 server cannot guarantee that conflicting state has not been provided 3329 to another client. This can occur because the reclaim has been done 3330 outside of a grace period implemented by the server, after the 3331 client has done a RECLAIM_COMPLETE operation which ends its ability 3332 to reclaim the requested lock, or because previous operations have 3333 created a situation in which the server is not able to determine that 3334 a reclaim-interfering edge condition does not exist. 3336 13.7.4. Updated Section 15.1.9.4 of [RFC5661] entitled 3337 "NFS4ERR_RECLAIM_BAD (Error Code 10034)" 3339 The server has determined that a reclaim attempted by the client is 3340 not valid, i.e. the lock specified as being reclaimed could not 3341 possibly have existed before the server restart or file system 3342 migration event. A server is not obliged to make this determination 3343 and will typically rely on the client to only reclaim locks that the 3344 client was granted prior to restart or file system migration.
3345 However, when a server does have reliable information to enable it 3346 to make this determination, this error indicates that the reclaim has 3347 been rejected as invalid. This is as opposed to the error 3348 NFS4ERR_RECLAIM_CONFLICT (see Section 13.7.5) where the server can 3349 only determine that there has been an invalid reclaim, but cannot 3350 determine which request is invalid. 3352 13.7.5. Updated Section 15.1.9.5 of [RFC5661] entitled 3353 "NFS4ERR_RECLAIM_CONFLICT (Error Code 10035)" 3355 The reclaim attempted by the client has encountered a conflict and 3356 cannot be satisfied. This potentially indicates a misbehaving client, 3357 although not necessarily the one receiving the error. The 3358 misbehavior might be on the part of the client that established the 3359 lock with which this client conflicted. See also Section 13.7.4 for 3360 the related error, NFS4ERR_RECLAIM_BAD. 3362 14. Updated Section 18.35 of [RFC5661] entitled "Operation 42: 3363 EXCHANGE_ID - Instantiate Client ID" 3365 The EXCHANGE_ID operation exchanges long-hand client and server identifiers 3366 (owners), and provides access to a client ID, creating one if 3367 necessary. This client ID becomes associated with the connection on 3368 which the operation is done, so that it is available when a 3369 CREATE_SESSION is done or when the connection is used to issue a 3370 request on an existing session associated with the current client. 3372 14.1.
Updated Section 18.35.1 of [RFC5661] entitled "ARGUMENT" 3374 3376 const EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001; 3377 const EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002; 3379 const EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100; 3381 const EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000; 3382 const EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000; 3383 const EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000; 3385 const EXCHGID4_FLAG_MASK_PNFS = 0x00070000; 3387 const EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000; 3388 const EXCHGID4_FLAG_CONFIRMED_R = 0x80000000; 3389 struct state_protect_ops4 { 3390 bitmap4 spo_must_enforce; 3391 bitmap4 spo_must_allow; 3392 }; 3394 struct ssv_sp_parms4 { 3395 state_protect_ops4 ssp_ops; 3396 sec_oid4 ssp_hash_algs<>; 3397 sec_oid4 ssp_encr_algs<>; 3398 uint32_t ssp_window; 3399 uint32_t ssp_num_gss_handles; 3400 }; 3402 enum state_protect_how4 { 3403 SP4_NONE = 0, 3404 SP4_MACH_CRED = 1, 3405 SP4_SSV = 2 3406 }; 3408 union state_protect4_a switch(state_protect_how4 spa_how) { 3409 case SP4_NONE: 3410 void; 3411 case SP4_MACH_CRED: 3412 state_protect_ops4 spa_mach_ops; 3413 case SP4_SSV: 3414 ssv_sp_parms4 spa_ssv_parms; 3415 }; 3417 struct EXCHANGE_ID4args { 3418 client_owner4 eia_clientowner; 3419 uint32_t eia_flags; 3420 state_protect4_a eia_state_protect; 3421 nfs_impl_id4 eia_client_impl_id<1>; 3422 }; 3424 3426 14.2. 
Updated Section 18.35.2 of [RFC5661] entitled "RESULT" 3427 3429 struct ssv_prot_info4 { 3430 state_protect_ops4 spi_ops; 3431 uint32_t spi_hash_alg; 3432 uint32_t spi_encr_alg; 3433 uint32_t spi_ssv_len; 3434 uint32_t spi_window; 3435 gsshandle4_t spi_handles<>; 3436 }; 3438 union state_protect4_r switch(state_protect_how4 spr_how) { 3439 case SP4_NONE: 3440 void; 3441 case SP4_MACH_CRED: 3442 state_protect_ops4 spr_mach_ops; 3443 case SP4_SSV: 3444 ssv_prot_info4 spr_ssv_info; 3445 }; 3447 struct EXCHANGE_ID4resok { 3448 clientid4 eir_clientid; 3449 sequenceid4 eir_sequenceid; 3450 uint32_t eir_flags; 3451 state_protect4_r eir_state_protect; 3452 server_owner4 eir_server_owner; 3453 opaque eir_server_scope; 3454 nfs_impl_id4 eir_server_impl_id<1>; 3455 }; 3457 union EXCHANGE_ID4res switch (nfsstat4 eir_status) { 3458 case NFS4_OK: 3459 EXCHANGE_ID4resok eir_resok4; 3461 default: 3462 void; 3463 }; 3465 3467 14.3. Updated Section 18.35.3 of [RFC5661] entitled "DESCRIPTION" 3469 The client uses the EXCHANGE_ID operation to register a particular 3470 client_owner with the server. However, when the client_owner has 3471 already been registered by other means (e.g. Transparent State 3472 Migration), the client may still use EXCHANGE_ID to obtain the client 3473 ID assigned previously. 3475 The client ID returned from this operation will be associated with 3476 the connection on which the EXCHANGE_ID is received and will serve as 3477 a parent object for sessions created by the client on this connection 3478 or to which the connection is bound. As a result of using those 3479 sessions to make requests involving the creation of state, that state 3480 will become associated with the client ID returned. 3482 In situations in which the registration of the client_owner has not 3483 occurred previously, the client ID must first be used, along with the 3484 returned eir_sequenceid, in creating an associated session using 3485 CREATE_SESSION.
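The registration flow described above can be illustrated with a minimal client-side sketch. It is not part of the protocol definition: the ExchangeIdResult class and the next_step helper are hypothetical names introduced here, and only the flag value is taken from the ARGUMENT section. The sketch assumes the EXCHANGE_ID returned NFS4_OK.

```python
from dataclasses import dataclass

# Flag value as defined in the ARGUMENT section (Section 14.1).
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000


@dataclass
class ExchangeIdResult:
    """Hypothetical client-side mirror of a few EXCHANGE_ID4resok fields."""
    eir_clientid: int
    eir_sequenceid: int
    eir_flags: int


def next_step(res: ExchangeIdResult) -> str:
    """Describe what a client would do after a successful EXCHANGE_ID."""
    if res.eir_flags & EXCHGID4_FLAG_CONFIRMED_R:
        # The client_owner was already registered and confirmed (for
        # example as a result of Transparent State Migration), so no
        # CREATE_SESSION is needed merely to confirm the client ID.
        return "already-confirmed"
    # New registration: the client ID is first used, together with
    # eir_sequenceid, in a CREATE_SESSION that confirms it.
    return (f"create-session(clientid={res.eir_clientid:#x}, "
            f"seq={res.eir_sequenceid})")
```

A client that receives "already-confirmed" may still issue CREATE_SESSION later for other reasons, such as needing an additional session.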
3487 If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the result, 3488 eir_flags, then it is an indication that the registration of the 3489 client_owner has already occurred and that a further CREATE_SESSION 3490 is not needed to confirm it. Of course, subsequent CREATE_SESSION 3491 operations may be needed for other reasons. 3493 The value eir_sequenceid is used to establish an initial sequence 3494 value associated with the client ID returned. In cases in which a 3495 CREATE_SESSION has already been done, there is no need for this 3496 value, since sequencing of such requests has already been established, 3497 and the client will ignore it. 3499 EXCHANGE_ID MAY be sent in a COMPOUND procedure that starts with 3500 SEQUENCE. However, when a client communicates with a server for the 3501 first time, it will not have a session, so using SEQUENCE will not be 3502 possible. If EXCHANGE_ID is sent without a preceding SEQUENCE, then 3503 it MUST be the only operation in the COMPOUND procedure's request. 3504 If it is not, the server MUST return NFS4ERR_NOT_ONLY_OP. 3506 The eia_clientowner field is composed of a co_verifier field and a 3507 co_ownerid string. As noted in section 2.4 of [RFC5661], the 3508 co_ownerid describes the client, and the co_verifier is the 3509 incarnation of the client. An EXCHANGE_ID sent with a new 3510 incarnation of the client will lead to the server removing lock state 3511 of the old incarnation. In contrast, an EXCHANGE_ID sent with the current 3512 incarnation and co_ownerid will result in an error or an update of 3513 the client ID's properties, depending on the arguments to 3514 EXCHANGE_ID. 3516 A server MUST NOT provide the same client ID to two different 3517 incarnations of an eia_clientowner. 3519 In addition to the client ID and sequence ID, the server returns a 3520 server owner (eir_server_owner) and server scope (eir_server_scope).
3521 The former field is used in connection with network trunking as 3522 described in Section 2.10.5 of [RFC5661]. The latter field is used 3523 to allow clients to determine when client IDs sent by one server may 3524 be recognized by another in the event of file system migration (see 3525 Section 8.9 of the current document). 3527 The client ID returned by EXCHANGE_ID is only unique relative to the 3528 combination of eir_server_owner.so_major_id and eir_server_scope. 3529 Thus, if two servers return the same client ID, the onus is on the 3530 client to distinguish the client IDs on the basis of 3531 eir_server_owner.so_major_id and eir_server_scope. In the event two 3532 different servers claim matching eir_server_owner.so_major_id and 3533 eir_server_scope, the client can use the verification techniques 3534 discussed in Section 2.10.5 of [RFC5661] to determine if the servers 3535 are distinct. If they are distinct, then the client will need to 3536 note the destination network addresses of the connections used with 3537 each server, and use the network address as the final discriminator. 3539 The server, as defined by the unique identity expressed in the 3540 so_major_id of the server owner and the server scope, needs to track 3541 several properties of each client ID it hands out. The properties 3542 apply to the client ID and all sessions associated with the client 3543 ID. The properties are derived from the arguments and results of 3544 EXCHANGE_ID. The client ID properties include: 3546 o The capabilities expressed by the following bits, which come from 3547 the results of EXCHANGE_ID: 3549 * EXCHGID4_FLAG_SUPP_MOVED_REFER 3551 * EXCHGID4_FLAG_SUPP_MOVED_MIGR 3553 * EXCHGID4_FLAG_BIND_PRINC_STATEID 3555 * EXCHGID4_FLAG_USE_NON_PNFS 3557 * EXCHGID4_FLAG_USE_PNFS_MDS 3559 * EXCHGID4_FLAG_USE_PNFS_DS 3561 These properties may be updated by subsequent EXCHANGE_ID 3562 operations on confirmed client IDs though the server MAY refuse to 3563 change them.
3565 o The state protection method used, one of SP4_NONE, SP4_MACH_CRED, 3566 or SP4_SSV, as set by the spa_how field of the arguments to 3567 EXCHANGE_ID. Once the client ID is confirmed, this property 3568 cannot be updated by subsequent EXCHANGE_ID operations. 3570 o For SP4_MACH_CRED or SP4_SSV state protection: 3572 * The list of operations (spo_must_enforce) that MUST use the 3573 specified state protection. This list comes from the results 3574 of EXCHANGE_ID. 3576 * The list of operations (spo_must_allow) that MAY use the 3577 specified state protection. This list comes from the results 3578 of EXCHANGE_ID. 3580 Once the client ID is confirmed, these properties cannot be 3581 updated by subsequent EXCHANGE_ID requests. 3583 o For SP4_SSV protection: 3585 * The OID of the hash algorithm. This property is represented by 3586 one of the algorithms in the ssp_hash_algs field of the 3587 EXCHANGE_ID arguments. Once the client ID is confirmed, this 3588 property cannot be updated by subsequent EXCHANGE_ID requests. 3590 * The OID of the encryption algorithm. This property is 3591 represented by one of the algorithms in the ssp_encr_algs field 3592 of the EXCHANGE_ID arguments. Once the client ID is confirmed, 3593 this property cannot be updated by subsequent EXCHANGE_ID 3594 requests. 3596 * The length of the SSV. This property is represented by the 3597 spi_ssv_len field in the EXCHANGE_ID results. Once the client 3598 ID is confirmed, this property cannot be updated by subsequent 3599 EXCHANGE_ID operations. 3601 There are REQUIRED and RECOMMENDED relationships among the 3602 length of the key of the encryption algorithm ("key length"), 3603 the length of the output of hash algorithm ("hash length"), and 3604 the length of the SSV ("SSV length"). 3606 + key length MUST be <= hash length. This is because the keys 3607 used for the encryption algorithm are actually subkeys 3608 derived from the SSV, and the derivation is via the hash 3609 algorithm. 
The selection of an encryption algorithm with a 3610 key length that exceeded the length of the output of the 3611 hash algorithm would require padding, and thus weaken the 3612 use of the encryption algorithm. 3614 + hash length SHOULD be <= SSV length. This is because the 3615 SSV is a key used to derive subkeys via an HMAC, and it is 3616 recommended that the key used as input to an HMAC be at 3617 least as long as the length of the HMAC's hash algorithm's 3618 output (see Section 3 of [RFC2104]). 3620 + key length SHOULD be <= SSV length. This is a transitive 3621 result of the above two invariants. 3623 + key length SHOULD be >= hash length / 2. This is because 3624 the subkey derivation is via an HMAC and it is recommended 3625 that if the HMAC has to be truncated, it should not be 3626 truncated to less than half the hash length (see Section 4 3627 of RFC2104 [RFC2104]). 3629 * Number of concurrent versions of the SSV the client and server 3630 will support (see Section 2.10.9 of [RFC5661]). This property 3631 is represented by spi_window in the EXCHANGE_ID results. The 3632 property may be updated by subsequent EXCHANGE_ID operations. 3634 o The client's implementation ID as represented by the 3635 eia_client_impl_id field of the arguments. The property may be 3636 updated by subsequent EXCHANGE_ID requests. 3638 o The server's implementation ID as represented by the 3639 eir_server_impl_id field of the reply. The property may be 3640 updated by replies to subsequent EXCHANGE_ID requests. 3642 The eia_flags passed as part of the arguments and the eir_flags 3643 results allow the client and server to inform each other of their 3644 capabilities as well as indicate how the client ID will be used. 3645 Whether a bit is set or cleared on the arguments' flags does not 3646 force the server to set or clear the same bit on the results' side. 3647 Bits not defined above cannot be set in the eia_flags field. 
If they 3648 are, the server MUST reject the operation with NFS4ERR_INVAL. 3650 The EXCHGID4_FLAG_UPD_CONFIRMED_REC_A bit can only be set in 3651 eia_flags; it is always off in eir_flags. The 3652 EXCHGID4_FLAG_CONFIRMED_R bit can only be set in eir_flags; it is 3653 always off in eia_flags. If the server recognizes the co_ownerid and 3654 co_verifier as mapping to a confirmed client ID, it sets 3655 EXCHGID4_FLAG_CONFIRMED_R in eir_flags. The 3656 EXCHGID4_FLAG_CONFIRMED_R flag allows a client to tell if the client 3657 ID it is trying to create already exists and is confirmed. 3659 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, this means 3660 that the client is attempting to update properties of an existing 3661 confirmed client ID (if the client wants to update properties of an 3662 unconfirmed client ID, it MUST NOT set 3663 EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). If so, it is RECOMMENDED that 3664 the client send the update EXCHANGE_ID operation in the same COMPOUND 3665 as a SEQUENCE so that the EXCHANGE_ID is executed exactly once. 3666 Whether the client can update the properties of client ID depends on 3667 the state protection it selected when the client ID was created, and 3668 the principal and security flavor it uses when sending the 3669 EXCHANGE_ID operation. The situations described in items 6, 7, 8, or 3670 9 of the second numbered list of Section 14.4 below will apply. Note 3671 that if the operation succeeds and returns a client ID that is 3672 already confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R 3673 bit in eir_flags. 3675 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this 3676 means that the client is trying to establish a new client ID; it is 3677 attempting to trunk data communication to the server (See 3678 Section 2.10.5 of [RFC5661]); or it is attempting to update 3679 properties of an unconfirmed client ID. 
The situations described in 3680 items 1, 2, 3, 4, or 5 of the second numbered list of Section 14.4 3681 below will apply. Note that if the operation succeeds and returns a 3682 client ID that was previously confirmed, the server MUST set the 3683 EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. 3685 When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client 3686 indicates that it is capable of dealing with an NFS4ERR_MOVED error 3687 as part of a referral sequence. When this bit is not set, it is 3688 still legal for the server to perform a referral sequence. However, 3689 a server may use the fact that the client is incapable of correctly 3690 responding to a referral, by avoiding it for that particular client. 3691 It may, for instance, act as a proxy for that particular file system, 3692 at some cost in performance, although it is not obligated to do so. 3693 If the server will potentially perform a referral, it MUST set 3694 EXCHGID4_FLAG_SUPP_MOVED_REFER in eir_flags. 3696 When the EXCHGID4_FLAG_SUPP_MOVED_MIGR is set, the client indicates 3697 that it is capable of dealing with an NFS4ERR_MOVED error as part of 3698 a file system migration sequence. When this bit is not set, it is 3699 still legal for the server to indicate that a file system has moved, 3700 when this in fact happens. However, a server may use the fact that 3701 the client is incapable of correctly responding to a migration in its 3702 scheduling of file systems to migrate so as to avoid migration of 3703 file systems being actively used. It may also hide actual migrations 3704 from clients unable to deal with them by acting as a proxy for a 3705 migrated file system for particular clients, at some cost in 3706 performance, although it is not obligated to do so. If the server 3707 will potentially perform a migration, it MUST set 3708 EXCHGID4_FLAG_SUPP_MOVED_MIGR in eir_flags. 
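The flag rules in the preceding paragraphs can be sketched as a server-side check. This is an illustrative sketch only: check_eia_flags and case_group are hypothetical helpers, status codes are shown by name rather than by numeric value, and rejecting a request that sets EXCHGID4_FLAG_CONFIRMED_R with NFS4ERR_INVAL is an assumption of this sketch (the text above states only that the bit is always off in eia_flags).

```python
# Flag values as defined in the ARGUMENT section (Section 14.1).
EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001
EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002
EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100
EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000

# Bits a client may set in eia_flags. CONFIRMED_R is result-only and is
# deliberately left out (assumption: it is rejected like an undefined bit).
EIA_ALLOWED = (
    EXCHGID4_FLAG_SUPP_MOVED_REFER
    | EXCHGID4_FLAG_SUPP_MOVED_MIGR
    | EXCHGID4_FLAG_BIND_PRINC_STATEID
    | EXCHGID4_FLAG_USE_NON_PNFS
    | EXCHGID4_FLAG_USE_PNFS_MDS
    | EXCHGID4_FLAG_USE_PNFS_DS
    | EXCHGID4_FLAG_UPD_CONFIRMED_REC_A
)


def check_eia_flags(eia_flags: int) -> str:
    """Return the status name a server would give for the request flags."""
    if eia_flags & ~EIA_ALLOWED & 0xFFFFFFFF:
        return "NFS4ERR_INVAL"
    return "NFS4_OK"


def case_group(eia_flags: int) -> str:
    """Pick which items of the second numbered list of Section 14.4 apply."""
    if eia_flags & EXCHGID4_FLAG_UPD_CONFIRMED_REC_A:
        return "items 6-9"   # update of an existing confirmed client ID
    return "items 1-5"       # new client ID, trunking, or unconfirmed update
```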
3710 When EXCHGID4_FLAG_BIND_PRINC_STATEID is set, the client indicates 3711 that it wants the server to bind the stateid to the principal. This 3712 means that when a principal creates a stateid, it has to be the one 3713 to use the stateid. If the server will perform binding, it will 3714 return EXCHGID4_FLAG_BIND_PRINC_STATEID. The server MAY return 3715 EXCHGID4_FLAG_BIND_PRINC_STATEID even if the client does not request 3716 it. If an update to the client ID changes the value of 3717 EXCHGID4_FLAG_BIND_PRINC_STATEID's client ID property, the effect 3718 applies only to new stateids. Existing stateids (and all stateids 3719 with the same "other" field) that were created with stateid to 3720 principal binding in force will continue to have binding in force. 3721 Existing stateids (and all stateids with the same "other" field) that 3722 were created with stateid to principal not in force will continue to 3723 have binding not in force. 3725 The EXCHGID4_FLAG_USE_NON_PNFS, EXCHGID4_FLAG_USE_PNFS_MDS, and 3726 EXCHGID4_FLAG_USE_PNFS_DS bits are described in Section 13.1 of 3727 [RFC5661] and convey roles the client ID is to be used for in a pNFS 3728 environment. The server MUST set one of the acceptable combinations 3729 of these bits (roles) in eir_flags, as specified in that section. 3730 Note that the same client owner/server owner pair can have multiple 3731 roles. Multiple roles can be associated with the same client ID or 3732 with different client IDs. Thus, if a client sends EXCHANGE_ID from 3733 the same client owner to the same server owner multiple times, but 3734 specifies different pNFS roles each time, the server might return 3735 different client IDs. Given that different pNFS roles might have 3736 different client IDs, the client may ask for different properties for 3737 each role/client ID. 
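The rule that a change to the EXCHGID4_FLAG_BIND_PRINC_STATEID property affects only new stateids can be sketched as follows. The Stateid and ClientIdState classes are hypothetical and model only the binding decision; real stateids carry more fields, and the sketch keys stateids solely by their "other" field for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Stateid:
    other: int
    # Principal captured at creation time; None means no binding in force.
    bound_principal: Optional[str]


@dataclass
class ClientIdState:
    bind_princ_stateid: bool                      # current client ID property
    stateids: Dict[int, Stateid] = field(default_factory=dict)

    def create_stateid(self, other: int, principal: str) -> Stateid:
        # The binding decision is taken once, when the stateid is created.
        bound = principal if self.bind_princ_stateid else None
        sid = Stateid(other, bound)
        self.stateids[other] = sid
        return sid

    def may_use(self, other: int, principal: str) -> bool:
        # Later property updates do not change an existing stateid's binding.
        sid = self.stateids[other]
        return sid.bound_principal is None or sid.bound_principal == principal
```

For example, a stateid created while binding is in force stays restricted to its creating principal even if a later EXCHANGE_ID update turns the property off.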
3739 The spa_how field of the eia_state_protect field specifies how the 3740 client wants to protect its client, locking, and session states from 3741 unauthorized changes (Section 2.10.8.3 of [RFC5661]): 3743 o SP4_NONE. The client does not request the NFSv4.1 server to 3744 enforce state protection. The NFSv4.1 server MUST NOT enforce 3745 state protection for the returned client ID. 3747 o SP4_MACH_CRED. If spa_how is SP4_MACH_CRED, then the client MUST 3748 send the EXCHANGE_ID operation with RPCSEC_GSS as the security 3749 flavor, and with a service of RPC_GSS_SVC_INTEGRITY or 3750 RPC_GSS_SVC_PRIVACY. If SP4_MACH_CRED is specified, then the 3751 client wants to use an RPCSEC_GSS-based machine credential to 3752 protect its state. The server MUST note the principal the 3753 EXCHANGE_ID operation was sent with, and the GSS mechanism used. 3754 These notes collectively comprise the machine credential. 3756 After the client ID is confirmed, as long as the lease associated 3757 with the client ID is unexpired, a subsequent EXCHANGE_ID 3758 operation that uses the same eia_clientowner.co_owner as the first 3759 EXCHANGE_ID MUST also use the same machine credential as the first 3760 EXCHANGE_ID. The server returns the same client ID for the 3761 subsequent EXCHANGE_ID as that returned from the first 3762 EXCHANGE_ID. 3764 o SP4_SSV. If spa_how is SP4_SSV, then the client MUST send the 3765 EXCHANGE_ID operation with RPCSEC_GSS as the security flavor, and 3766 with a service of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. 3767 If SP4_SSV is specified, then the client wants to use the SSV to 3768 protect its state. The server records the credential used in the 3769 request as the machine credential (as defined above) for the 3770 eia_clientowner.co_owner. The CREATE_SESSION operation that 3771 confirms the client ID MUST use the same machine credential. 
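The state protection rules above might be checked along the following lines. This is a sketch under stated assumptions: the MachineCredential class and both helper functions are hypothetical, and security flavors, services, and spa_how values are represented as plain strings rather than as their wire encodings.

```python
from dataclasses import dataclass

# Services that SP4_MACH_CRED and SP4_SSV require on the EXCHANGE_ID request.
GSS_SERVICES_OK = {"RPC_GSS_SVC_INTEGRITY", "RPC_GSS_SVC_PRIVACY"}


@dataclass(frozen=True)
class MachineCredential:
    """The principal and GSS mechanism noted by the server."""
    principal: str
    gss_mechanism: str


def request_acceptable(spa_how: str, flavor: str, service: str) -> bool:
    """Check the security flavor and service used to send EXCHANGE_ID."""
    if spa_how == "SP4_NONE":
        return True  # no state protection requested, no flavor constraint
    # SP4_MACH_CRED and SP4_SSV both need RPCSEC_GSS with integrity or privacy.
    return flavor == "RPCSEC_GSS" and service in GSS_SERVICES_OK


def same_machine_credential(recorded: MachineCredential,
                            presented: MachineCredential) -> bool:
    """A later EXCHANGE_ID for the same owner must match the recorded pair."""
    return recorded == presented
```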
3773 When a client specifies SP4_MACH_CRED or SP4_SSV, it also provides 3774 two lists of operations (each expressed as a bitmap). The first list 3775 is spo_must_enforce and consists of those operations the client MUST 3776 send (subject to the server confirming the list of operations in the 3777 result of EXCHANGE_ID) with the machine credential (if SP4_MACH_CRED 3778 protection is specified) or the SSV-based credential (if SP4_SSV 3779 protection is used). The client MUST send the operations with 3780 RPCSEC_GSS credentials that specify the RPC_GSS_SVC_INTEGRITY or 3781 RPC_GSS_SVC_PRIVACY security service. Typically, the first list of 3782 operations includes EXCHANGE_ID, CREATE_SESSION, DELEGPURGE, 3783 DESTROY_SESSION, BIND_CONN_TO_SESSION, and DESTROY_CLIENTID. The 3784 client SHOULD NOT specify in this list any operations that require a 3785 filehandle because the server's access policies MAY conflict with the 3786 client's choice, and thus the client would then be unable to access a 3787 subset of the server's namespace. 3789 Note that if SP4_SSV protection is specified, and the client 3790 indicates that CREATE_SESSION must be protected with SP4_SSV, because 3791 the SSV cannot exist without a confirmed client ID, the first 3792 CREATE_SESSION MUST instead be sent using the machine credential, and 3793 the server MUST accept the machine credential. 3795 There is a corresponding result, also called spo_must_enforce, of the 3796 operations for which the server will require SP4_MACH_CRED or SP4_SSV 3797 protection. Normally, the server's result equals the client's 3798 argument, but the result MAY be different. If the client requests 3799 one or more operations in the set { EXCHANGE_ID, CREATE_SESSION, 3800 DELEGPURGE, DESTROY_SESSION, BIND_CONN_TO_SESSION, DESTROY_CLIENTID 3801 }, then the result spo_must_enforce MUST include the operations the 3802 client requested from that set. 
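The constraint on the spo_must_enforce result can be expressed as a simple set relation. In this sketch operations are represented by name in Python sets instead of as bitmap4 values, and server_must_enforce and result_is_valid are hypothetical helpers; the server_policy argument stands for whatever additional operations a given server chooses to protect.

```python
# Operations the server must echo back if the client asked for them.
PROTECTED_SET = frozenset({
    "EXCHANGE_ID", "CREATE_SESSION", "DELEGPURGE",
    "DESTROY_SESSION", "BIND_CONN_TO_SESSION", "DESTROY_CLIENTID",
})


def server_must_enforce(client_requested: set, server_policy: set) -> set:
    """Build a result that keeps every requested op from PROTECTED_SET."""
    return set(server_policy) | (set(client_requested) & PROTECTED_SET)


def result_is_valid(client_requested: set, result: set) -> bool:
    """Check that the result includes the requested ops from the set."""
    return (set(client_requested) & PROTECTED_SET) <= set(result)
```

A requested operation outside the set, such as one that requires a filehandle, may be dropped by the server without violating the rule.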
3804 If spo_must_enforce in the results has BIND_CONN_TO_SESSION set, then 3805 connection binding enforcement is enabled, and the client MUST use 3806 the machine (if SP4_MACH_CRED protection is used) or SSV (if SP4_SSV 3807 protection is used) credential on calls to BIND_CONN_TO_SESSION. 3809 The second list is spo_must_allow and consists of those operations 3810 the client wants to have the option of sending with the machine 3811 credential or the SSV-based credential, even if the object the 3812 operations are performed on is not owned by the machine or SSV 3813 credential. 3815 The corresponding result, also called spo_must_allow, consists of the 3816 operations the server will allow the client to use SP4_SSV or 3817 SP4_MACH_CRED credentials with. Normally, the server's result equals 3818 the client's argument, but the result MAY be different. 3820 The purpose of spo_must_allow is to allow clients to solve the 3821 following conundrum. Suppose the client ID is confirmed with 3822 EXCHGID4_FLAG_BIND_PRINC_STATEID, and it calls OPEN with the 3823 RPCSEC_GSS credentials of a normal user. Now suppose the user's 3824 credentials expire, and cannot be renewed (e.g., a Kerberos ticket 3825 granting ticket expires, and the user has logged off and will not be 3826 acquiring a new ticket granting ticket). The client will be unable 3827 to send CLOSE without the user's credentials, which is to say the 3828 client has to either leave the state on the server or re-send 3829 EXCHANGE_ID with a new verifier to clear all state, that is, unless 3830 the client includes CLOSE on the list of operations in spo_must_allow 3831 and the server agrees. 3833 The SP4_SSV protection parameters also have: 3835 ssp_hash_algs: 3837 This is the set of algorithms the client supports for the purpose 3838 of computing the digests needed for the internal SSV GSS mechanism 3839 and for the SET_SSV operation. Each algorithm is specified as an 3840 object identifier (OID). 
The REQUIRED algorithms for a server are 3841 id-sha1, id-sha224, id-sha256, id-sha384, and id-sha512 [RFC4055]. 3842 The algorithm the server selects among the set is indicated in 3843 spi_hash_alg, a field of spr_ssv_prot_info. The field 3844 spi_hash_alg is an index into the array ssp_hash_algs. If the 3845 server does not support any of the offered algorithms, it returns 3846 NFS4ERR_HASH_ALG_UNSUPP. If ssp_hash_algs is empty, the server 3847 MUST return NFS4ERR_INVAL. 3849 ssp_encr_algs: 3851 This is the set of algorithms the client supports for the purpose 3852 of providing privacy protection for the internal SSV GSS 3853 mechanism. Each algorithm is specified as an OID. The REQUIRED 3854 algorithm for a server is id-aes256-CBC. The RECOMMENDED 3855 algorithms are id-aes192-CBC and id-aes128-CBC [CSOR_AES]. The 3856 selected algorithm is returned in spi_encr_alg, an index into 3857 ssp_encr_algs. If the server does not support any of the offered 3858 algorithms, it returns NFS4ERR_ENCR_ALG_UNSUPP. If ssp_encr_algs 3859 is empty, the server MUST return NFS4ERR_INVAL. Note that due to 3860 previously stated requirements and recommendations on the 3861 relationships between key length and hash length, some 3862 combinations of RECOMMENDED and REQUIRED encryption algorithm and 3863 hash algorithm either SHOULD NOT or MUST NOT be used. Table 1 3864 summarizes the illegal and discouraged combinations. 3866 ssp_window: 3868 This is the number of SSV versions the client wants the server to 3869 maintain (i.e., each successful call to SET_SSV produces a new 3870 version of the SSV). If ssp_window is zero, the server MUST 3871 return NFS4ERR_INVAL. The server responds with spi_window, which 3872 MUST NOT exceed ssp_window, and MUST be at least one. 
Any 3873 requests on the backchannel or fore channel that are using a 3874 version of the SSV that is outside the window will fail with an 3875 ONC RPC authentication error, and the requester will have to retry 3876 them with the same slot ID and sequence ID. 3878 ssp_num_gss_handles: 3880 This is the number of RPCSEC_GSS handles the server should create 3881 that are based on the GSS SSV mechanism (see Section 2.10.9 of 3882 [RFC5661]). It is not the total number of RPCSEC_GSS handles for 3883 the client ID. Indeed, subsequent calls to EXCHANGE_ID will add 3884 RPCSEC_GSS handles. The server responds with a list of handles in 3885 spi_handles. If the client asks for at least one handle and the 3886 server cannot create it, the server MUST return an error. The 3887 handles in spi_handles are not available for use until the client 3888 ID is confirmed, which could be immediately if EXCHANGE_ID returns 3889 EXCHGID4_FLAG_CONFIRMED_R, or upon successful confirmation from 3890 CREATE_SESSION. 3892 While a client ID can span all the connections that are connected 3893 to a server sharing the same eir_server_owner.so_major_id, the 3894 RPCSEC_GSS handles returned in spi_handles can only be used on 3895 connections connected to a server that returns the same 3896 eir_server_owner.so_major_id and eir_server_owner.so_minor_id on 3897 each connection. It is permissible for the client to set 3898 ssp_num_gss_handles to zero; the client can create more handles 3899 with another EXCHANGE_ID call. 3901 Because each SSV RPCSEC_GSS handle shares a common SSV GSS 3902 context, there are security considerations specific to this 3903 situation discussed in Section 2.10.10 of [RFC5661]. 3905 The seq_window (see Section 5.2.3.1 of [RFC2203]) of each 3906 RPCSEC_GSS handle in spi_handles MUST be the same as the seq_window 3907 of the RPCSEC_GSS handle used for the credential of the RPC 3908 request that the EXCHANGE_ID operation was sent as a part of.
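The illegal and discouraged algorithm pairings summarized in Table 1, which follows, can be encoded as two lookup tables. This is a minimal sketch: algorithm names are used as plain strings in place of OIDs, and the classify helper and its return labels are hypothetical conveniences rather than protocol elements.

```python
# Hash algorithms that MUST NOT be combined with each encryption algorithm.
MUST_NOT = {
    "id-aes128-CBC": set(),
    "id-aes192-CBC": {"id-sha1"},
    "id-aes256-CBC": {"id-sha1", "id-sha224"},
}

# Hash algorithms that SHOULD NOT be combined with each encryption algorithm.
SHOULD_NOT = {
    "id-aes128-CBC": {"id-sha384", "id-sha512"},
    "id-aes192-CBC": {"id-sha512"},
    "id-aes256-CBC": set(),
}


def classify(encr_alg: str, hash_alg: str) -> str:
    """Classify an (encryption, hash) pairing according to Table 1."""
    if hash_alg in MUST_NOT[encr_alg]:
        return "MUST NOT"
    if hash_alg in SHOULD_NOT[encr_alg]:
        return "SHOULD NOT"
    return "ok"
```

A server choosing spi_hash_alg and spi_encr_alg could use such a check to avoid the "MUST NOT" pairings and to prefer "ok" pairings over "SHOULD NOT" ones.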
3910 +-------------------+----------------------+------------------------+ 3911 | Encryption | MUST NOT be combined | SHOULD NOT be combined | 3912 | Algorithm | with | with | 3913 +-------------------+----------------------+------------------------+ 3914 | id-aes128-CBC | | id-sha384, id-sha512 | 3915 | id-aes192-CBC | id-sha1 | id-sha512 | 3916 | id-aes256-CBC | id-sha1, id-sha224 | | 3917 +-------------------+----------------------+------------------------+ 3919 Table 1 3921 The arguments include an array of up to one element in length called 3922 eia_client_impl_id. If eia_client_impl_id is present, it contains 3923 the information identifying the implementation of the client. 3924 Similarly, the results include an array of up to one element in 3925 length called eir_server_impl_id that identifies the implementation 3926 of the server. Servers MUST accept a zero-length eia_client_impl_id 3927 array, and clients MUST accept a zero-length eir_server_impl_id 3928 array. 3930 A possible use for implementation identifiers would be in diagnostic 3931 software that extracts this information in an attempt to identify 3932 interoperability problems, performance workload behaviors, or general 3933 usage statistics. Since the intent of having access to this 3934 information is for planning or general diagnosis only, the client and 3935 server MUST NOT interpret this implementation identity information in 3936 a way that affects how the implementation behaves in interacting with 3937 its peer. The client and server are not allowed to depend on the 3938 peer's manifesting a particular allowed behavior based on an 3939 implementation identifier but are required to interoperate as 3940 specified elsewhere in the protocol specification. 
3942 Because it is possible that some implementations might violate the 3943 protocol specification and interpret the identity information, 3944 implementations MUST provide facilities to allow the NFSv4 client and 3945 server to be configured to set the contents of the nfs_impl_id 3946 structures sent to any specified value. 3948 14.4. Updated Section 18.35.4 of [RFC5661] entitled "IMPLEMENTATION" 3950 A server's client record is a 5-tuple: 3952 1. co_ownerid: 3953 The client identifier string, from the eia_clientowner 3954 structure of the EXCHANGE_ID4args structure. 3956 2. co_verifier: 3958 A client-specific value used to indicate incarnations (where a 3959 client restart represents a new incarnation), from the 3960 eia_clientowner structure of the EXCHANGE_ID4args structure. 3962 3. principal: 3964 The principal that was defined in the RPC header's credential 3965 and/or verifier at the time the client record was established. 3967 4. client ID: 3969 The shorthand client identifier, generated by the server and 3970 returned via the eir_clientid field in the EXCHANGE_ID4resok 3971 structure. 3973 5. confirmed: 3975 A private field on the server indicating whether or not a 3976 client record has been confirmed. A client record is 3977 confirmed if there has been a successful CREATE_SESSION 3978 operation to confirm it. Otherwise, it is unconfirmed. An 3979 unconfirmed record is established by an EXCHANGE_ID call. Any 3980 unconfirmed record that is not confirmed within a lease period 3981 SHOULD be removed. 3983 The following identifiers represent special values for the fields in 3984 the records. 3986 ownerid_arg: 3988 The value of the eia_clientowner.co_ownerid subfield of the 3989 EXCHANGE_ID4args structure of the current request. 3991 verifier_arg: 3993 The value of the eia_clientowner.co_verifier subfield of the 3994 EXCHANGE_ID4args structure of the current request.
3996 old_verifier_arg: 3998 A value of the eia_clientowner.co_verifier field of a client 3999 record received in a previous request; this is distinct from 4000 verifier_arg. 4002 principal_arg: 4004 The value of the RPCSEC_GSS principal for the current request. 4006 old_principal_arg: 4008 A value of the principal of a client record as defined by the RPC 4009 header's credential or verifier of a previous request. This is 4010 distinct from principal_arg. 4012 clientid_ret: 4014 The value of the eir_clientid field the server will return in the 4015 EXCHANGE_ID4resok structure for the current request. 4017 old_clientid_ret: 4019 The value of the eir_clientid field the server returned in the 4020 EXCHANGE_ID4resok structure for a previous request. This is 4021 distinct from clientid_ret. 4023 confirmed: 4025 The client ID has been confirmed. 4027 unconfirmed: 4029 The client ID has not been confirmed. 4031 Since EXCHANGE_ID is a non-idempotent operation, we must consider the 4032 possibility that retries occur as a result of a client restart, 4033 network partition, malfunctioning router, etc. Retries are 4034 identified by the value of the eia_clientowner field of 4035 EXCHANGE_ID4args, and the method for dealing with them is outlined in 4036 the scenarios below. 4038 The scenarios are described in terms of the client record(s) a server 4039 has for a given co_ownerid. Note that if the client ID was created 4040 specifying SP4_SSV state protection and EXCHANGE_ID as one of the 4041 operations in spo_must_allow, then the server MUST authorize 4042 EXCHANGE_IDs with the SSV principal in addition to the principal that 4043 created the client ID. 4045 1.
New Owner ID 4047 If the server has no client records with 4048 eia_clientowner.co_ownerid matching ownerid_arg, and 4049 EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in the 4050 EXCHANGE_ID, then a new shorthand client ID (let us call it 4051 clientid_ret) is generated, and the following unconfirmed 4052 record is added to the server's state. 4054 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 4055 unconfirmed } 4057 Subsequently, the server returns clientid_ret. 4059 2. Non-Update on Existing Client ID 4061 If the server has the following confirmed record, and the 4062 request does not have EXCHGID4_FLAG_UPD_CONFIRMED_REC_A set, 4063 then the request is the result of a retried request due to a 4064 faulty router or lost connection, or the client is trying to 4065 determine if it can perform trunking. 4067 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 4068 confirmed } 4070 Since the record has been confirmed, the client must have 4071 received the server's reply from the initial EXCHANGE_ID 4072 request. Since the server has a confirmed record, and since 4073 EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, with the 4074 possible exception of eir_server_owner.so_minor_id, the server 4075 returns the same result it did when the client ID's properties 4076 were last updated (or if never updated, the result when the 4077 client ID was created). The confirmed record is unchanged. 4079 3. Client Collision 4081 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the 4082 server has the following confirmed record, then this request 4083 is likely the result of a chance collision between the values 4084 of the eia_clientowner.co_ownerid subfield of EXCHANGE_ID4args 4085 for two different clients. 
4087 { ownerid_arg, *, old_principal_arg, old_clientid_ret, 4088 confirmed } 4090 If there is currently no state associated with 4091 old_clientid_ret, or if there is state but the lease has 4092 expired, then this case is effectively equivalent to the New 4093 Owner ID case of Paragraph 1. The confirmed record is 4094 deleted, the old_clientid_ret and its lock state are deleted, 4095 a new shorthand client ID is generated, and the following 4096 unconfirmed record is added to the server's state. 4098 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 4099 unconfirmed } 4101 Subsequently, the server returns clientid_ret. 4103 If old_clientid_ret has an unexpired lease with state, then no 4104 state of old_clientid_ret is changed or deleted. The server 4105 returns NFS4ERR_CLID_INUSE to indicate that the client should 4106 retry with a different value for the 4107 eia_clientowner.co_ownerid subfield of EXCHANGE_ID4args. The 4108 client record is not changed. 4110 4. Replacement of Unconfirmed Record 4112 If the EXCHGID4_FLAG_UPD_CONFIRMED_REC_A flag is not set, and 4113 the server has the following unconfirmed record, then the 4114 client is attempting EXCHANGE_ID again on an unconfirmed 4115 client ID, perhaps due to a retry, a client restart before 4116 client ID confirmation (i.e., before CREATE_SESSION was 4117 called), or some other reason. 4119 { ownerid_arg, *, *, old_clientid_ret, unconfirmed } 4121 It is possible that the properties of old_clientid_ret are 4122 different than those specified in the current EXCHANGE_ID. 4123 Whether or not the properties are being updated, to eliminate 4124 ambiguity, the server deletes the unconfirmed record, 4125 generates a new client ID (clientid_ret), and establishes the 4126 following unconfirmed record: 4128 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 4129 unconfirmed } 4131 5. 
Client Restart 4133 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the 4134 server has the following confirmed client record, then this 4135 request is likely from a previously confirmed client that has 4136 restarted. 4138 { ownerid_arg, old_verifier_arg, principal_arg, 4139 old_clientid_ret, confirmed } 4141 Since the previous incarnation of the same client will no 4142 longer be making requests, once the new client ID is confirmed 4143 by CREATE_SESSION, byte-range locks and share reservations 4144 should be released immediately rather than forcing the new 4145 incarnation to wait for the lease time on the previous 4146 incarnation to expire. Furthermore, session state should be 4147 removed since if the client had maintained that information 4148 across restart, this request would not have been sent. If the 4149 server supports neither the CLAIM_DELEGATE_PREV nor 4150 CLAIM_DELEG_PREV_FH claim types, associated delegations should 4151 be purged as well; otherwise, delegations are retained and 4152 recovery proceeds according to Section 10.2.1 of [RFC5661]. 4154 After processing, clientid_ret is returned to the client and 4155 this client record is added: 4157 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 4158 unconfirmed } 4160 The previously described confirmed record continues to exist, 4161 and thus the same ownerid_arg exists in both a confirmed and 4162 unconfirmed state at the same time. The number of states can 4163 collapse to one once the server receives an applicable 4164 CREATE_SESSION or EXCHANGE_ID. 4166 + If the server subsequently receives a successful 4167 CREATE_SESSION that confirms clientid_ret, then the server 4168 atomically destroys the confirmed record and makes the 4169 unconfirmed record confirmed as described in Section 4170 18.36.3 of [RFC5661].
4172 + If the server instead subsequently receives an EXCHANGE_ID 4173 with the client owner equal to ownerid_arg, one strategy is 4174 to simply delete the unconfirmed record, and process the 4175 EXCHANGE_ID as described in the entirety of Section 14.4. 4177 6. Update 4179 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server 4180 has the following confirmed record, then this request is an 4181 attempt at an update. 4183 { ownerid_arg, verifier_arg, principal_arg, clientid_ret, 4184 confirmed } 4186 Since the record has been confirmed, the client must have 4187 received the server's reply from the initial EXCHANGE_ID 4188 request. The server allows the update, and the client record 4189 is left intact. 4191 7. Update but No Confirmed Record 4193 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server 4194 has no confirmed record corresponding to ownerid_arg, then the 4195 server returns NFS4ERR_NOENT and leaves any unconfirmed record 4196 intact. 4198 8. Update but Wrong Verifier 4200 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server 4201 has the following confirmed record, then this request is an 4202 illegal attempt at an update, perhaps because of a retry from 4203 a previous client incarnation. 4205 { ownerid_arg, old_verifier_arg, *, clientid_ret, confirmed } 4207 The server returns NFS4ERR_NOT_SAME and leaves the client 4208 record intact. 4210 9. Update but Wrong Principal 4212 If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server 4213 has the following confirmed record, then this request is an 4214 illegal attempt at an update by an unauthorized principal. 4216 { ownerid_arg, verifier_arg, old_principal_arg, clientid_ret, 4217 confirmed } 4219 The server returns NFS4ERR_PERM and leaves the client record 4220 intact. 4222 15. Updated Section 18.51 of [RFC5661] entitled "Operation 58: 4223 RECLAIM_COMPLETE - Indicates Reclaims Finished" 4225 15.1.
Updated Section 18.51.1 of [RFC5661] entitled "ARGUMENT" 4226 4228 struct RECLAIM_COMPLETE4args { 4229 /* 4230 * If rca_one_fs TRUE, 4231 * 4232 * CURRENT_FH: object in 4233 * file system reclaim is 4234 * complete for. 4235 */ 4236 bool rca_one_fs; 4237 }; 4239 4241 15.2. Updated Section 18.51.2 of [RFC5661] entitled "RESULTS" 4243 4245 struct RECLAIM_COMPLETE4res { 4246 nfsstat4 rcr_status; 4247 }; 4249 4251 15.3. Updated Section 18.51.3 of [RFC5661] entitled "DESCRIPTION" 4253 A RECLAIM_COMPLETE operation is used to indicate that the client has 4254 reclaimed all of the locking state that it will recover using 4255 reclaim, when it is recovering state due to either a server restart 4256 or the migration of a file system to another server. There are two 4257 types of RECLAIM_COMPLETE operations: 4259 o When rca_one_fs is FALSE, a global RECLAIM_COMPLETE is being done. 4260 This indicates that recovery of all locks that the client held on 4261 the previous server instance has been completed. The current 4262 filehandle need not be set in this case. 4264 o When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE 4265 is being done. This indicates that recovery of locks for a single 4266 fs (the one designated by the current filehandle) due to the 4267 migration of the file system has been completed. Presence of a 4268 current filehandle is required when rca_one_fs is set to TRUE. 4269 When the current filehandle designates a filehandle in a file 4270 system not in the process of migration, the operation returns 4271 NFS4_OK and is otherwise ignored. 4273 Once a RECLAIM_COMPLETE is done, there can be no further reclaim 4274 operations for locks whose scope is defined as having completed 4275 recovery. Once the client sends RECLAIM_COMPLETE, the server will 4276 not allow the client to do subsequent reclaims of locking state for 4277 that scope and, if these are attempted, will return NFS4ERR_NO_GRACE.
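As an illustration of how the two forms of RECLAIM_COMPLETE differ on the wire, the sketch below encodes the operation as it would appear inside a COMPOUND argument array. It assumes standard XDR encoding, in which enumerations and booleans are 4-byte big-endian integers and a discriminated union is encoded as its discriminant followed by the selected arm. The function names are hypothetical; the operation number 58 is taken from the section title, and NFS4_OK is zero.

```python
import struct

OP_RECLAIM_COMPLETE = 58  # "Operation 58" per the section title
NFS4_OK = 0


def encode_reclaim_complete(rca_one_fs: bool) -> bytes:
    """Encode one nfs_argop4 entry: opnum, then RECLAIM_COMPLETE4args."""
    return struct.pack(">II", OP_RECLAIM_COMPLETE, 1 if rca_one_fs else 0)


def decode_reclaim_complete_res(data: bytes) -> int:
    """Decode one nfs_resop4 entry and return rcr_status."""
    opnum, status = struct.unpack(">II", data[:8])
    if opnum != OP_RECLAIM_COMPLETE:
        raise ValueError("not a RECLAIM_COMPLETE result")
    return status
```

The only difference between the global and the per-fs form is the single boolean; for the per-fs form, the COMPOUND would also need an earlier operation (such as PUTFH) that sets the current filehandle.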
4279 Whenever a client establishes a new client ID and before it does the 4280 first non-reclaim operation that obtains a lock, it MUST send a 4281 RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no 4282 locks to reclaim. If non-reclaim locking operations are done before 4283 the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. 4285 Similarly, when the client accesses a migrated file system on a new 4286 server, before it sends the first non-reclaim operation that obtains 4287 a lock on this new server, it MUST send a RECLAIM_COMPLETE with 4288 rca_one_fs set to TRUE and current filehandle within that file 4289 system, even if there are no locks to reclaim. If non-reclaim 4290 locking operations are done on that file system before the 4291 RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. 4293 It should be noted that there are situations in which a client needs 4294 to issue both forms of RECLAIM_COMPLETE. An example is an instance 4295 of file system migration in which the file system is migrated to a 4296 server for which the client has no clientid. As a result, the client 4297 needs to obtain a clientid from the server (incurring the 4298 responsibility to do RECLAIM_COMPLETE with rca_one_fs set to FALSE) 4299 as well as RECLAIM_COMPLETE with rca_one_fs set to TRUE to complete 4300 the per-fs grace period associated with the file system migration. 4302 Any locks not reclaimed at the point at which RECLAIM_COMPLETE is 4303 done become non-reclaimable. The client MUST NOT attempt to reclaim 4304 them, either during the current server instance or in any subsequent 4305 server instance, or on another server to which responsibility for 4306 that file system is transferred. If the client were to do so, it 4307 would be violating the protocol by representing itself as owning 4308 locks that it does not own, and so has no right to reclaim. 
See 4309 Section 8.4.3 of [RFC5661] for a discussion of edge conditions 4310 related to lock reclaim. 4312 By sending a RECLAIM_COMPLETE, the client indicates readiness to 4313 proceed to do normal non-reclaim locking operations. The client 4314 should be aware that such operations may temporarily result in 4315 NFS4ERR_GRACE errors until the server is ready to terminate its grace 4316 period. 4318 15.4. Updated Section 18.51.4 of [RFC5661] entitled "IMPLEMENTATION" 4320 Servers will typically use the information as to when reclaim 4321 activity is complete to reduce the length of the grace period. When 4322 the server maintains in persistent storage a list of clients that 4323 might have had locks, it is able to use the fact that all such 4324 clients have done a RECLAIM_COMPLETE to terminate the grace period 4325 and begin normal operations (i.e., grant requests for new locks) 4326 sooner than it might otherwise. 4328 Latency can be minimized by doing a RECLAIM_COMPLETE as part of the 4329 COMPOUND request in which the last lock-reclaiming operation is done. 4330 When there are no reclaims to be done, RECLAIM_COMPLETE should be 4331 done immediately in order to allow the grace period to end as soon as 4332 possible. 4334 RECLAIM_COMPLETE should only be done once for each server instance or 4335 occasion of the transition of a file system. If it is done a second 4336 time, the error NFS4ERR_COMPLETE_ALREADY will result. Note that 4337 because of the session feature's retry protection, retries of 4338 COMPOUND requests containing a RECLAIM_COMPLETE operation will not 4339 result in this error. 4341 When a RECLAIM_COMPLETE is sent, the client effectively acknowledges 4342 any locks not yet reclaimed as lost. This allows the server to 4343 re-enable the client to recover locks if the occurrence of edge 4344 conditions, as described in Section 8.4.3 of [RFC5661], had caused 4345 the server to disable the client's ability to recover locks.
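The interaction between global RECLAIM_COMPLETE, the grace period, and the errors named above can be sketched with a toy server model. It is deliberately simplified and rests on several assumptions: only the global (rca_one_fs FALSE) form is modeled, grace-period timers are ignored, the grace period ends only when every client in the persisted list has completed, and the GraceServer class and its method names are hypothetical.

```python
class GraceServer:
    """Toy model of grace-period handling driven by RECLAIM_COMPLETE."""

    def __init__(self, clients_with_locks):
        # Clients read from persistent storage that might have held locks.
        self.pending = set(clients_with_locks)
        self.completed = set()

    @property
    def in_grace(self):
        # Grace ends once every listed client has sent RECLAIM_COMPLETE.
        return bool(self.pending)

    def reclaim_complete(self, client):
        if client in self.completed:
            return "NFS4ERR_COMPLETE_ALREADY"
        self.completed.add(client)
        self.pending.discard(client)
        return "NFS4_OK"

    def lock(self, client, reclaim):
        if reclaim:
            # Reclaims are refused after RECLAIM_COMPLETE or outside grace.
            if client in self.completed or not self.in_grace:
                return "NFS4ERR_NO_GRACE"
            return "NFS4_OK"
        # Non-reclaim locks need RECLAIM_COMPLETE first and may still see
        # NFS4ERR_GRACE until the server ends its grace period.
        if client not in self.completed or self.in_grace:
            return "NFS4ERR_GRACE"
        return "NFS4_OK"
```

In this model, a client that completes early keeps receiving NFS4ERR_GRACE on new locks until the last listed client completes, which matches the note that such errors may occur temporarily.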
4347 Because previous descriptions of RECLAIM_COMPLETE were not 4348 sufficiently explicit about the circumstances in which use of 4349 RECLAIM_COMPLETE with rca_one_fs set to TRUE was appropriate, there 4350 have been cases in which it has been misused by clients, and cases in 4351 which servers have, in various ways, not responded to such misuse as 4352 described above. While clients SHOULD NOT misuse this feature and 4353 servers SHOULD respond to such misuse as described above, 4354 implementers need to be aware of the following considerations as they 4355 make necessary tradeoffs between interoperability with existing 4356 implementations and proper support for facilities to allow lock 4357 recovery in the event of file system migration. 4359 o When servers have no support for becoming the destination server 4360 of a file system subject to migration, there is no possibility of 4361 a per-fs RECLAIM_COMPLETE being done legitimately and occurrences 4362 of it SHOULD be ignored. However, the negative consequences of 4363 accepting mistaken use are quite limited as long as the client does not 4364 issue it before all necessary reclaims are done. 4366 o When a server might become the destination for a file system being 4367 migrated, inappropriate use of per-fs RECLAIM_COMPLETE is more 4368 concerning. In the case in which the file system designated is 4369 not within a per-fs grace period, it SHOULD be ignored, with the 4370 negative consequences of accepting it being limited, as in the 4371 case in which migration is not supported. However, if it should 4372 encounter a file system undergoing migration, it cannot be 4373 accepted as if it were a global RECLAIM_COMPLETE without 4374 invalidating its intended use. 4376 16. Security Considerations 4378 The Security Considerations section of [RFC5661] needs the additions 4379 below to properly address some aspects of trunking discovery, 4380 referral, migration and replication.
4382 The possibility that requests to determine the set of network 4383 addresses corresponding to a given server might be interfered with 4384 or have their responses corrupted needs to be taken into account. 4385 In light of this, the following considerations should be taken 4386 note of: 4388 o When DNS is used to convert server names to addresses and 4389 DNSSEC [RFC4033] is not available, the validity of the network 4390 addresses returned cannot be relied upon. However, when the 4391 client uses RPCSEC_GSS to access the designated server, it is 4392 possible for mutual authentication to discover invalid server 4393 addresses provided. 4395 o The fetching of attributes containing file system location 4396 information SHOULD be performed using RPCSEC_GSS with integrity 4397 protection, as previously explained in the Security 4398 Considerations section of [RFC5661]. It is important to note 4399 here that a client making a request of this sort without using 4400 RPCSEC_GSS including integrity protection needs to be aware of the 4401 negative consequences of doing so, which can lead to invalid 4402 host names or network addresses being returned. In light of 4403 this, the client needs to recognize that using such returned 4404 location information to access an NFSv4 server without use of 4405 RPCSEC_GSS (i.e., by using AUTH_SYS) poses dangers as it can 4406 result in the client interacting with an unverified network 4407 address posing as an NFSv4 server. 4409 o Despite the fact that it is a REQUIREMENT (of [RFC5661]) that 4410 "implementations" provide "support" for use of RPCSEC_GSS, it 4411 cannot be assumed that use of RPCSEC_GSS is always available 4412 between any particular client-server pair. 4414 o When a client has the network addresses of a server but not the 4415 associated host names, that would interfere with its ability to 4416 use RPCSEC_GSS.
4418 In light of the above, a server should present file system 4419 location entries that correspond to file systems on other servers 4420 using a host name. This would allow the client to interrogate the 4421 fs_locations on the destination server to obtain trunking 4422 information (as well as replica information) using RPCSEC_GSS with 4423 integrity, validating the name provided while assuring that the 4424 response has not been corrupted. 4426 When RPCSEC_GSS is not available on a server, the client needs to 4427 be aware of the fact that the location entries are subject to 4428 corruption and cannot be relied upon. In the case of a client 4429 being directed to another server after NFS4ERR_MOVED, this could 4430 vitiate the authentication provided by the use of RPCSEC_GSS on 4431 the destination. Even when RPCSEC_GSS authentication is available 4432 on the destination, the server might validly represent itself as 4433 the server to which the client was erroneously directed. Without 4434 a way to decide whether the server is a valid one, the client can 4435 only determine, using RPCSEC_GSS, that the server corresponds to 4436 the name provided, with no basis for trusting that server. As a 4437 result, the client should not use such unverified location entries 4438 as a basis for migration, even though RPCSEC_GSS might be 4439 available on the destination. 4441 When a file system location attribute is fetched upon connecting 4442 with an NFS server, it SHOULD, as stated above, be done using 4443 RPCSEC_GSS with integrity protection. When this is not possible, it 4444 is generally best for the client to ignore trunking and replica 4445 information or simply not fetch the location information for these 4446 purposes. 4448 When location information cannot be verified, it can be subjected 4449 to additional filtering to prevent the client from being 4450 inappropriately directed.
For example, if a range of network 4451 addresses can be determined that assures that the servers and 4452 clients using AUTH_SYS are subject to the appropriate set of 4453 constraints (e.g., physical network isolation, administrative 4454 controls on the operating systems used), then network addresses in 4455 the appropriate range can be used with others discarded or 4456 restricted in their use of AUTH_SYS. 4458 To summarize considerations regarding the use of RPCSEC_GSS in 4459 fetching location information, we need to consider the following 4460 possibilities for requests to interrogate location information, 4461 with interrogation approaches on the referring and destination 4462 servers arrived at separately: 4464 o The use of RPCSEC_GSS with integrity protection is RECOMMENDED 4465 in all cases, since the absence of integrity protection exposes 4466 the client to the possibility of the results being modified in 4467 transit. 4469 o The use of requests issued without RPCSEC_GSS (i.e., using 4470 AUTH_SYS), while undesirable, may not be avoidable in all 4471 cases. Where the use of the returned information cannot be 4472 avoided, it should be subject to filtering to eliminate the 4473 possibility that the client would treat an invalid address as 4474 if it were an NFSv4 server. The specifics will vary depending 4475 on the degree of network isolation and whether the request is 4476 to the referring or destination servers. 4478 17. IANA Considerations 4480 This document does not require actions by IANA. 4482 18. References 4484 18.1. Normative References 4486 [CSOR_AES] 4487 National Institute of Standards and Technology, 4488 "Cryptographic Algorithm Object Registration", URL 4489 http://csrc.nist.gov/groups/ST/crypto_apps_infra/csor/ 4490 algorithms.html, November 2007. 4492 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 4493 Requirement Levels", BCP 14, RFC 2119, 4494 DOI 10.17487/RFC2119, March 1997, 4495 .
4497 [RFC2203] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol 4498 Specification", RFC 2203, DOI 10.17487/RFC2203, September 4499 1997, . 4501 [RFC4033] Arends, R., Austein, R., Larson, M., Massey, D., and S. 4502 Rose, "DNS Security Introduction and Requirements", 4503 RFC 4033, DOI 10.17487/RFC4033, March 2005, 4504 . 4506 [RFC4055] Schaad, J., Kaliski, B., and R. Housley, "Additional 4507 Algorithms and Identifiers for RSA Cryptography for use in 4508 the Internet X.509 Public Key Infrastructure Certificate 4509 and Certificate Revocation List (CRL) Profile", RFC 4055, 4510 DOI 10.17487/RFC4055, June 2005, 4511 . 4513 [RFC5403] Eisler, M., "RPCSEC_GSS Version 2", RFC 5403, 4514 DOI 10.17487/RFC5403, February 2009, 4515 . 4517 [RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol 4518 Specification Version 2", RFC 5531, DOI 10.17487/RFC5531, 4519 May 2009, . 4521 [RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., 4522 "Network File System (NFS) Version 4 Minor Version 1 4523 Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010, 4524 . 4526 [RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System 4527 (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, 4528 March 2015, . 4530 [RFC7861] Adamson, A. and N. Williams, "Remote Procedure Call (RPC) 4531 Security Version 3", RFC 7861, DOI 10.17487/RFC7861, 4532 November 2016, . 4534 [RFC7931] Noveck, D., Ed., Shivam, P., Lever, C., and B. Baker, 4535 "NFSv4.0 Migration: Specification Update", RFC 7931, 4536 DOI 10.17487/RFC7931, July 2016, 4537 . 4539 [RFC8166] Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct 4540 Memory Access Transport for Remote Procedure Call Version 4541 1", RFC 8166, DOI 10.17487/RFC8166, June 2017, 4542 . 4544 [RFC8178] Noveck, D., "Rules for NFSv4 Extensions and Minor 4545 Versions", RFC 8178, DOI 10.17487/RFC8178, July 2017, 4546 . 4548 18.2. Informative References 4550 [I-D.cel-nfsv4-mv0-trunking-update] 4551 Lever, C. and D. 
              Noveck, "NFS version 4.0 Trunking Update",
              draft-cel-nfsv4-mv0-trunking-update-00 (work in
              progress), November 2017.

   [RFC2104]  Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
              Hashing for Message Authentication", RFC 2104,
              DOI 10.17487/RFC2104, February 1997.

Appendix A.  Classification of Document Sections

   Using the classification appearing in Section 3.3, we can proceed
   through the current document and classify its sections as listed
   below. In this listing, when we refer to a Section X and there is a
   Section X.1 within it, the classification of Section X refers to the
   part of that section exclusive of subsections. When that portion is
   empty, the section is not counted.

   o  Sections 1 through 4, a total of five sections, are all
      explanatory.

   o  Section 4.1 is a replacement section.

   o  Section 4.2 is an additional section.

   o  Section 4.3 is a replacement section.

   o  Section 4.4 is explanatory.

   o  Section 4.5 is a replacement section.

   o  Sections 4.5.1 through 4.5.3, a total of three sections, are all
      additional sections.

   o  Sections 4.5.4 through 4.5.6, a total of three sections, are all
      replacement sections.

   o  Section 4.5.7 is an additional section.

   o  Section 5 is explanatory.

   o  Sections 6 and 7 are additional sections.

   o  Sections 8 through 8.9, a total of ten sections, are all
      replacement sections.

   o  Sections 9 through 11.3, a total of twelve sections, are all
      additional sections.

   o  Section 12.1 is explanatory.

   o  Sections 12.2 through 12.2.3, a total of four sections, are all
      replacement sections.

   o  Section 13 is explanatory.

   o  Sections 13.1 and 13.2 are replacement sections.

   o  Sections 13.3 and 13.4 are editing sections.

   o  Sections 13.5 and 13.6 are explanatory.

   o  Section 13.7 is a replacement section, which consists of a total
      of six sections.

   o  Section 14 is a replacement section, which consists of a total of
      five sections.

   o  Section 15 is a replacement section, which consists of a total of
      five sections.

   o  Section 16 is an editing section.

   o  Section 17 through Acknowledgments, a total of six sections, are
      all explanatory.

   To summarize:

   o  There are seventeen explanatory sections.

   o  There are thirty-seven replacement sections.

   o  There are eighteen additional sections.

   o  There are three editing sections.

Appendix B.  Updates to [RFC5661]

   In this appendix, we proceed through [RFC5661] identifying sections
   as unchanged, modified, deleted, or replaced and indicating where
   additional sections from the current document would appear in an
   eventual consolidated description of NFSv4.1. In this presentation,
   when section X is referred to, it denotes that section plus all
   included subsections. When it is necessary to refer to the part of a
   section outside any included subsections, the exclusion is noted
   explicitly.

   o  Section 1 is unmodified except that Section 1.7.3.3 is to be
      replaced by Section 13.1 from the current document.

   o  Section 2 is unmodified except for the specific items listed
      below:

      o  Section 2.10.4 is replaced by Section 13.2 from the current
         document.

      o  Section 2.10.5 is modified as discussed in Section 13.4 of the
         current document.

   o  Sections 3 through 10 are unchanged.

   o  Section 11 is extensively modified as discussed below.

      o  Section 11, exclusive of subsections, is replaced by Sections
         4.1 and 4.2 from the current document.

      o  Section 11.1 is replaced by Section 4.3 from the current
         document.

      o  Sections 11.2, 11.3, 11.3.1, and 11.3.2 are unchanged.

      o  Section 11.4 is replaced by Section 4.5 from the current
         document. For details regarding subsections see below.

         o  New sections corresponding to Sections 4.5.1 through 4.5.3
            from the current document appear next.

         o  Section 11.4.1 is replaced by Section 4.5.4 from the current
            document.

         o  Section 11.4.2 is replaced by Section 4.5.5 from the current
            document.

         o  Section 11.4.3 is replaced by Section 4.5.6 from the current
            document.

         o  A new section corresponding to Section 4.5.7 from the
            current document appears next.

      o  Section 11.5 is to be deleted.

      o  Section 11.6 is unchanged.

      o  New sections corresponding to Sections 6 and 7 from the current
         document appear next.

      o  Section 11.7 is replaced by Section 8 from the current
         document. For details regarding subsections see below.

         o  Section 11.7.1 is replaced by Section 8.1 from the current
            document.

         o  Sections 11.7.2, 11.7.2.1, and 11.7.2.2 are deleted.

         o  Section 11.7.3 is replaced by Section 8.2 from the current
            document.

         o  Section 11.7.4 is replaced by Section 8.3 from the current
            document.

         o  Sections 11.7.5 and 11.7.5.1 are replaced by Sections 8.4
            and 8.4.1 respectively, from the current document.

         o  Section 11.7.6 is replaced by Section 8.5 from the current
            document.

         o  Section 11.7.7, exclusive of subsections, is replaced by
            Section 8.9 from the current document. Sections 11.7.7.1
            and 11.7.7.2 are unchanged.

         o  Section 11.7.8 is replaced by Section 8.6 from the current
            document.

         o  Section 11.7.9 is replaced by Section 8.7 from the current
            document.

         o  Section 11.7.10 is replaced by Section 8.8 from the current
            document.

      o  Sections 11.8, 11.8.1, 11.8.2, and 11.9 are unchanged.

      o  Sections 11.10, 11.10.1, 11.10.2, and 11.10.3 are replaced by
         Sections 12.2 through 12.2.3 from the current document.

      o  Section 11.11 is unchanged.

      o  New sections corresponding to Sections 9, 10, and 11 from the
         current document appear next as additional subsections of
         Section 11. Each of these has subsections, so there is a total
         of seventeen sections added.

   o  Sections 12 through 14 are unchanged.

   o  Section 15 is unmodified except that:

      *  The description of NFS4ERR_MOVED in Section 15.1 is revised as
         described in Section 13.3 of the current document.

      *  The description of the reclaim-related errors in Section 15.1.9
         is replaced by the revised descriptions in Section 13.7 of the
         current document.

   o  Sections 16 and 17 are unchanged.

   o  Section 18 is unmodified except for the following:

      *  Section 18.35 is replaced by Section 14 in the current
         document.

      *  Section 18.51 is replaced by Section 15 in the current
         document.

   o  Sections 19 through 23 are unchanged.

   In terms of top-level sections, exclusive of appendices:

   o  There is one heavily modified top-level section (Section 11).

   o  There are four other modified top-level sections (Sections 1, 2,
      15, and 18).

   o  The other eighteen top-level sections are unchanged.

   The disposition of sections of [RFC5661] is summarized in the
   following table, which provides counts of sections replaced, added,
   deleted, modified, or unchanged. Separate counts are provided for:

   o  Top-level sections.

   o  Sections with TOC entries.

   o  Sections within Section 11.

   o  Sections outside Section 11.

   In this table, the counts for top-level sections and TOC entries are
   for sections including subsections, while other counts are for
   sections exclusive of included subsections.

   +------------+------+------+--------+------------+--------+
   | Status     | Top  | TOC  | in 11  | not in 11  | Total  |
   +------------+------+------+--------+------------+--------+
   | Replaced   | 0    | 6    | 21     | 15         | 36     |
   | Added      | 0    | 5    | 24     | 0          | 24     |
   | Deleted    | 0    | 1    | 4      | 0          | 4      |
   | Modified   | 5    | 3    | 0      | 2          | 2      |
   | Unchanged  | 18   | 210  | 12     | 910        | 922    |
   | in RFC5661 | 23   | 220  | 37     | 927        | 964    |
   +------------+------+------+--------+------------+--------+

Acknowledgments

   The authors wish to acknowledge the important role of Andy Adamson of
   NetApp in clarifying the need for trunking discovery functionality,
   and exploring the role of the file system location attributes in
   providing the necessary support.

   The authors also wish to acknowledge the work of Xuan Qi of Oracle
   with NFSv4.1 client and server prototypes of transparent state
   migration functionality.

   The authors wish to thank others who brought attention to important
   issues. The comments of Trond Myklebust of Primary Data related to
   trunking helped to clarify the role of DNS in trunking discovery.
   Rick Macklem's comments brought attention to problems in the handling
   of the per-fs version of RECLAIM_COMPLETE.

   The authors wish to thank Olga Kornievskaia of NetApp for her helpful
   review comments.

Authors' Addresses

   David Noveck (editor)
   NetApp
   1601 Trapelo Road
   Waltham, MA 02451
   United States of America

   Phone: +1 781 572 8038
   Email: davenoveck@gmail.com

   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI 48104
   United States of America

   Phone: +1 248 614 5091
   Email: chuck.lever@oracle.com