idnits 2.17.1

draft-ietf-nfsv4-migration-issues-12.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (March 29, 2017) is 2578 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 5661 (Obsoleted by RFC 8881)

     Summary: 1 error (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

NFSv4                                                     D. Noveck, Ed.
Internet-Draft                                                    NetApp
Intended status: Informational                                 P. Shivam
Expires: September 30, 2017                                     C. Lever
                                                                B. Baker
                                                                  ORACLE
                                                          March 29, 2017

   NFSv4 migration: Implementation Experience and Specification Issues
                  draft-ietf-nfsv4-migration-issues-12

Abstract

   The migration feature of NFSv4 provides for moving responsibility for
   a single filesystem from one server to another, without disruption to
   clients.  A number of problems in the specification of this feature
   in NFSv4.0 were resolved by the publication of RFC 7931.
   In addition, there are specification issues to be resolved with
   regard to the NFSv4.1 version of this feature, which are discussed in
   this document.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 30, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions
   3.  NFSv4.0 Issues and Their Resolution
     3.1.  NFSv4.0 Issues
     3.2.  Resolution of NFSv4.0 Protocol Difficulties
   4.  Issues for NFSv4.1
     4.1.  Issues to Address for NFSv4.1
       4.1.1.  Addressing state merger in NFSv4.1
       4.1.2.  Addressing pNFS relationship with migration
       4.1.3.  Addressing server_owner changes in NFSv4.1
       4.1.4.  Addressing Confirmation Status of Migrated Client IDs in
               NFSv4.1
       4.1.5.  Addressing Session Migration in NFSv4.1
     4.2.  Possible Resolutions for NFSv4.1 Issues
       4.2.1.  Server Responsibilities in Effecting Transparent State
               Migration
       4.2.2.  Determining Initial Migration Status in NFSv4.1
       4.2.3.  Client Response to Migration in NFSv4.1
       4.2.4.  Dealing with Multiple Location Entries
       4.2.5.  Client Recovery from Migration Events
       4.2.6.  The Migration Discovery Process
       4.2.7.  Synchronizing Session Transfer
       4.2.8.  Migration and pNFS
   5.  Security Considerations
   6.  IANA Considerations
   7.  Normative References
   Appendix A.  Acknowledgements
   Authors' Addresses

1.  Introduction

   This document, which deals with existing issues/problems in
   standards-track documents, is in the informational category, and
   while the facts it reports may have normative implications, any such
   normative significance reflects the readers' preferences.  For
   example, we may report that the existing definition of migration for
   NFSv4.1 does not properly describe how migrating state is to be
   merged with existing state for the destination server.
   While it is to be expected that client and server implementers will
   judge this to be a situation that is best avoided, the judgment as to
   how pressing this issue should be considered is one for the reader,
   and eventually the nfsv4 working group, to make.

   We do explore possible ways in which such issues can be avoided, with
   minimal negative effects, given that the working group has decided to
   address these issues, but the choice of exactly how to address these
   is best given effect in one or more standards-track documents and/or
   errata.

   This document focuses on NFSv4.1, since the analogous issues for
   NFSv4.0 have already been addressed by the publication of [RFC7931].
   Nevertheless, the history of these issues in NFSv4.0 is presented,
   since understanding the similarities and differences between these
   protocols may be helpful in deciding how best to address remaining
   issues.

2.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   In the context of this informational document, these normative
   keywords will always occur in the context of a quotation, most often
   direct but sometimes indirect.  The context will make it clear
   whether the quotation is from:

   o  The previously current definitive definition of the NFSv4.0
      protocol [RFC7530].

   o  The current definitive definition of the NFSv4.1 protocol
      [RFC5661].

   o  A proposed or possible text to serve as a replacement for the
      current or previous definitive document text.  Sometimes, a number
      of possible alternative texts may be listed and benefits and
      detriments of each examined in turn.

3.  NFSv4.0 Issues and Their Resolution

3.1.  NFSv4.0 Issues

   Many of the problems seen with Transparent State Migration derived
   from the inability of servers to determine whether two client IDs,
   issued on different servers, corresponded to the same client.  This
   difficulty derived in turn from the common practice, recommended by
   [RFC7530], in which each client presented different client
   identification strings to different servers, rather than presenting
   the same identification string to all servers.

   This practice, later referred to as the "non-uniform" client string
   approach, derived from concern that, since NFSv4.0 provided no means
   to determine whether two IP addresses correspond to the same server,
   a single client connected to both might be confused by the fact that
   state changes made via one IP address might unexpectedly affect the
   state maintained with respect to the second IP address, thought of as
   a separate server.

   To avoid this unexpected behavior, clients used the non-uniform
   client id string approach.  By doing so, a client connected to two
   different servers (or to two IP addresses connected to the same
   server) appeared to be two different clients.  Since the server is
   under the impression that two different clients are involved, state
   changes made on each distinct IP address cannot be reflected on
   another.

   However, by doing things this way, state migrated from server to
   server cannot be attributed to the actual client that generated it,
   leading to confusion.

   In addition to this core problem, the following issues with regard to
   Transparent State Migration needed to be addressed:

   o  Clarification regarding the ability to merge state from different
      leases even though their expiration times might not be precisely
      synchronized.

   o  Clarifying the treatment of client IDs since it is not always
      clear when clientid4 and when nfs_client_id4 was intended.

   o  Clarifying the logic of returning NFS4ERR_LEASE_MOVED.

   o  Clarifying the handling of NFS4ERR_CLID_INUSE.

3.2.  Resolution of NFSv4.0 Protocol Difficulties

   The client string identification issue was addressed in [RFC7931] as
   follows:

   o  Defining both the uniform and non-uniform client id string
      approaches as valid choices but indicating that the latter posed
      difficulties for Transparent State Migration.

   o  Providing a way that clients could use to determine whether two IP
      addresses are connected to the same server.

   o  Allowing clients using the uniform approach to avoid negative
      consequences due to otherwise unexpected behavior since behavior
      that is a consequence of known trunking relationships is not
      unexpected.

   o  As a result, servers migrating state are aware of the fact that
      the same client is associated with two different items of state
      even when that state was originally created on two different
      servers.

   Since all of the other issues noted in Section 3.1 were also
   addressed, publication of [RFC7931] updating [RFC7530] addressed all
   known issues with Transparent State Migration in NFSv4.0.

4.  Issues for NFSv4.1

4.1.  Issues to Address for NFSv4.1

   Because NFSv4.1 embraces the uniform client-string approach, as
   advised by section 2.4 of [RFC5661], addressing migration issues is
   simpler, in that a shift in client id string models is not required.
   Instead, NFSv4.1 returns information in the EXCHANGE_ID response to
   enable trunking relationships to be determined by the client.

   The other necessary part of addressing migration issues, providing
   for the server's merger of leases that relate to the same client, is
   not currently addressed by [RFC5661] and changes need to be made to
   make it clear that state needs to be appropriately merged as part of
   migration, to avoid multiple client IDs between a client-server pair.
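
   The merger requirement can be illustrated with a small sketch.  The
   Python below is not taken from [RFC5661] or from any implementation:
   ClientRecord, DestinationServer, and accept_migrated_state are
   hypothetical names, and the model assumes that the destination server
   keys its client records by the client owner string, so that migrated
   state lands under an existing client ID whenever one exists.  Stateid
   conflicts are deliberately ignored here.

```python
from dataclasses import dataclass, field


@dataclass
class ClientRecord:
    """Hypothetical per-client record kept by a server."""
    client_id: int
    owner: bytes                                # client owner string
    stateids: set = field(default_factory=set)  # locking state held


class DestinationServer:
    """Toy model of a migration destination; not a real NFS server."""

    def __init__(self):
        self.clients_by_owner = {}

    def accept_migrated_state(self, migrated: ClientRecord) -> ClientRecord:
        existing = self.clients_by_owner.get(migrated.owner)
        if existing is None:
            # No prior relationship with this client on this server:
            # keep the transferred client ID as it is.
            self.clients_by_owner[migrated.owner] = migrated
            return migrated
        # The same client is already known here: merge the state so the
        # client-server pair ends up with a single client ID.
        existing.stateids |= migrated.stateids
        return existing


dest = DestinationServer()
dest.accept_migrated_state(ClientRecord(0x1001, b"client-A", {"open-1"}))
merged = dest.accept_migrated_state(
    ClientRecord(0x2002, b"client-A", {"lock-7"}))
```

   In this sketch the second transfer is folded into the record created
   by the first, so only one client ID remains for the client.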

   In addition, there are a number of new features within NFSv4.1 whose
   relationship with migration needs to be clarified.  Some examples:

   o  The interaction of trunking with migration and other aspects of
      multi-server namespace needs to be clarified.

   o  There needs to be some clarification of how migration, and
      particularly Transparent State Migration, should interact with
      pNFS layouts.

   o  The current discussion (in [RFC5661]) of the possibility of
      server_owner changes is incomplete and confusing.

   o  The expected confirmation status of client IDs transferred by
      Transparent State Migration needs to be clarified.

   o  There are a number of issues related to the migration of sessions
      that need to be addressed.

   Discussion of how to resolve these issues will appear in the sections
   below.

4.1.1.  Addressing state merger in NFSv4.1

   The existing treatment of state transfer in [RFC5661] has similar
   problems to that in [RFC7530] in that it assumes that the state for
   multiple filesystems formerly on different servers will not be merged
   so that it appears under a single common client ID.  We've already
   seen the reasons that this is a problem with regard to NFSv4.0.

   Although we don't have the problems stemming from the non-uniform
   client-string approach, there are a number of complexities in the
   existing treatment of state management in the section entitled "Lock
   State and File System Transitions" in [RFC5661] that make this
   non-trivial to address:

   o  Migration is currently treated together with other sorts of
      filesystem transitions including transitioning between replicas
      without any NFS4ERR_MOVED errors.

   o  There is separate handling and discussion of the cases of matching
      and non-matching server scopes.

   o  In the case of matching server scopes, the text calls for an
      unrealistic degree of transparency, suggesting that the source and
      destination servers need to cooperate in stateid assignment.

   o  In the case of non-matching server scopes, the text does not
      mention the possibility of the transparent migration of state at
      all, resulting in a functional regression from NFSv4.0.

   o  The potential interaction between migration and trunking has not
      been addressed.

   o  There is insufficient attention to the question of how clients can
      deal with the complexities of recovering from migration.  As part
      of this, the implications of the shift of lease migration
      notification from an error (NFS4ERR_LEASE_MOVED in NFSv4.0) to a
      status bit (SEQ4_STATUS_LEASE_MOVED in NFSv4.1) need to be
      explored.

   To summarize, there is a need for an NFSv4.1 treatment of Transparent
   State Migration that is an extension of that in [RFC7931] and that
   includes appropriate handling for NFSv4.1 features such as trunking.

4.1.2.  Addressing pNFS relationship with migration

   This is made difficult because, within the pNFS framework, migration
   might mean any of several things:

   o  Transfer of the MDS, leaving DS's as they are.

      This would be minimally disruptive to those using layouts but
      would require the pNFS control protocol being used to support the
      DS being directed to a new MDS.

   o  Transfer of a DS, leaving everything else in place.

      Such a transfer can be handled without using migration at all.
      The server can recall/revoke layouts, and issue new ones, as
      appropriate.

   o  Transfer of the filesystem to a new filesystem with both MDS and
      DS's moving.

      In such a transfer, an entirely different set of DS's will be at
      the target location.  There may even be no pNFS support on the
      destination filesystem at all.

   Migration needs to support both the first and last of these models.

4.1.3.  Addressing server_owner changes in NFSv4.1

   Section 2.10.5 of [RFC5661] states the following.

      The client should be prepared for the possibility that
      eir_server_owner values may be different on subsequent EXCHANGE_ID
      requests made to the same network address, as a result of various
      sorts of reconfiguration events.  When this happens and the
      changes result in the invalidation of previously valid forms of
      trunking, the client should cease to use those forms, either by
      dropping connections or by adding sessions.  For a discussion of
      lock reclaim as it relates to such reconfiguration events, see
      Section 8.4.2.1.

   While this paragraph is literally true in that such reconfiguration
   events can happen and clients have to deal with them, it is confusing
   in that it can be read as suggesting that clients have to deal with
   them without disruption, which in general is impossible.

   A clearer alternative would be:

      It is always possible that, as a result of various sorts of
      reconfiguration events, eir_server_scope and eir_server_owner
      values may be different on subsequent EXCHANGE_ID requests made to
      the same network address.

      In most cases such reconfiguration events will be disruptive and
      indicate that an IP address formerly connected to one server is
      now connected to an entirely different one.

      Some guidelines on client handling of such situations follow:

      o  When eir_server_scope changes, the client has no assurance that
         any IDs it obtained previously (e.g., filehandles) can be
         validly used on the new server, and, even if the new server
         accepts them, there is no assurance that this is not due to
         accident.  Thus it is best to treat all such state as
         lost/stale, although a client may assume that the probability
         of inadvertent acceptance is low and treat this situation as
         within the next case.

      o  When eir_server_scope remains the same and
         eir_server_owner.so_major_id changes, the client can use
         filehandles it has and attempt reclaims.  It may find that
         these are now stale but if NFS4ERR_STALE is not received, it
         can proceed to reclaim its opens.

      o  When eir_server_scope and eir_server_owner.so_major_id remain
         the same, the client has to use the now-current values of
         eir_server_owner.so_minor_id in deciding on appropriate forms
         of trunking.

4.1.4.  Addressing Confirmation Status of Migrated Client IDs in NFSv4.1

   When a client ID is transferred between systems as a part of
   migration, it is not always clear whether it should be considered
   confirmed or unconfirmed on the target server.  In the case in which
   an associated session is transferred together with the client ID, it
   is clear that the transferred client ID needs to be considered
   confirmed, as the existence of an associated session is incompatible
   with an unconfirmed client ID.

   The case in which a client ID is transferred without an associated
   session is less clear-cut and there needs to be a choice between two
   possibilities:

   o  Consider it unconfirmed, because of the lack of an associated
      session.  This makes it simpler for the client to determine
      whether there is an associated session transferred at the same
      time.  However, it is inconsistent with the fact that there are
      stateids which have been transferred with the client ID.

   o  Consider it confirmed, because it was confirmed on the source
      server and the transfer is not considered to have affected that.
      Although this makes it simpler for the client to determine whether
      there is an associated session transferred at the same time, an
      alternative is discussed in Section 4.1.5.

   A related issue concerns the potential use of the SEQ4_STATUS flags
   to determine whether all or some of the state present on the source
   has been transferred to the destination server.  This could be done
   using either of the alternatives above but it is more in the spirit
   of the second alternative.  One potential use of these flags is
   discussed in more detail in Section 4.2.2.

4.1.5.  Addressing Session Migration in NFSv4.1

   Some issues that need to be addressed regard the migration of
   sessions, in addition to client IDs and stateids:

   o  It needs to be made clearer how the client can deal with the
      possibility that sessions might or might not be transferred as
      part of Transparent State Migration.

   o  Rules need to be clarified regarding possible transfer of sessions
      when either the source session is being used to access other file
      systems on the source server or there is already a session
      connecting the client to the destination server.

   o  There needs to be more detail regarding how the protocol avoids
      situations in which the same session is subject to concurrent
      changes on two different servers at the same time.

4.2.  Possible Resolutions for NFSv4.1 Issues

   The subsections below explore some ways of dealing with the issues
   discussed in Section 4.1.

   First we introduce some terminology we will be using in these
   sections:

   o  Location attributes include the fs_locations and fs_locations_info
      attributes.

   o  Location entries are the individual file system locations in the
      location attributes.

   o  Location elements are derived from location entries.  If a
      location entry specifies an IP address there is only a single
      corresponding location element.  Location entries that contain a
      host name are resolved using DNS, and may result in one or more
      location elements.
      All location elements consist of a location address, which is the
      IP address of an interface to a server, and an fs name, which is
      the location of the file system within the server's pseudo-fs.
      The fs name is empty if the server has no pseudo-fs and only a
      single exported file system at the root filehandle.

   o  Two location elements are trunkable if they specify the same fs
      name and the location addresses are such that trunking of the
      location addresses can be used as shown by the server_owner values
      returned.

4.2.1.  Server Responsibilities in Effecting Transparent State Migration

   The basic responsibility of the source server in effecting
   Transparent State Migration is to make available to the destination
   server a description of each piece of locking state associated with
   the file system being migrated.  In addition to client id string and
   verifier, the source server needs to provide, for each stateid:

   o  The stateid including the current sequence value.

   o  The associated client ID.

   o  The handle of the associated file.

   o  The type of the lock, such as open, byte-range lock, delegation,
      layout.

   o  For locks such as opens and byte-range locks, there will be
      information about the owner(s) of the lock.

   o  For recallable/revocable lock types, the current recall status
      needs to be included.

   o  For each lock type there will be type-specific information, such
      as share and deny modes for opens and type and byte ranges for
      byte-range locks and layouts.

   A further server responsibility concerns locks that are revoked or
   otherwise lost during the process of file system migration.
   Because locks that appear to be lost during the process of migration
   will be reclaimed by the client, the servers have to take steps to
   ensure that locks revoked shortly before or shortly after migration
   are not inadvertently allowed to be reclaimed in situations in which
   the continuity of lock possession cannot be assured.

   o  For locks lost on the source but whose loss has not yet been
      acknowledged by the client (by using FREE_STATEID), the
      destination must be aware of this loss so that it can deny a
      request to reclaim them.

   o  For locks lost on the destination after the state transfer but
      before the client's RECLAIM_COMPLETE is done, the destination
      server should note these and not allow them to be reclaimed.

   A further responsibility of the servers concerns situations in which
   a stateid cannot be transferred transparently because it conflicts
   with an existing stateid held by the client and associated with a
   different file system.  In this case there are two valid choices:

   o  Treat the transfer, as in NFSv4.0, as one without Transparent
      State Migration.  In this case, conflicting locks cannot be
      granted until the client does a RECLAIM_COMPLETE, after reclaiming
      the locks it had, with the exception of reclaims denied because
      they were attempts to reclaim locks that had been lost.

   o  Implement Transparent State Migration, except for the lock with
      the conflicting stateid.  In this case, the client will be aware
      of a lost lock (through the SEQ4_STATUS flags) and be allowed to
      reclaim it.

4.2.2.  Determining Initial Migration Status in NFSv4.1

   This section proposes a way in which a client which receives
   NFS4ERR_MOVED can determine:

   o  Whether the NFS4ERR_MOVED indicates migration has occurred, or
      whether it indicates another sort of file system transition as
      discussed in Section 4.2.4.

   o  In the case of migration, whether Transparent State Migration has
      occurred.

   o  Whether any state has been lost during the process of Transparent
      State Migration.

   o  Whether sessions have been transferred as part of Transparent
      State Migration.

   This is written assuming that the second option regarding client ID
   confirmation status after migration (as discussed in Section 4.1.4)
   is adopted.  However, that choice is not essential to the procedure
   and could be changed.

   The process begins with the client examining the location entries
   using either of the location attributes.  For those whose fs name
   matches that currently being used, an EXCHANGE_ID is directed at the
   location address and the server_owner and scope are used to determine
   if the entry is trunkable with that previously being used to access
   the file system (i.e., that it represents another path to the same
   file system and can share locking state with it).  If it is, then
   this should be treated as a transition from one set of paths to
   another, as described in Section 4.2.4, rather than a migration
   event.

   Otherwise, if one or more of the EXCHANGE_ID operations above has
   encountered a distinct server, then migration has occurred and the
   procedure continues.  If there were no location entries with a
   matching fs name, then one with another fs name is selected, an
   EXCHANGE_ID is done, and the procedure continues using the result of
   that operation.

   The determination of whether Transparent State Migration has occurred
   is driven by the client ID returned and its confirmation status.

   o  If the client ID is an unconfirmed client ID not previously known
      to the client, then Transparent State Migration has not occurred.

   o  If the client ID is a confirmed client ID previously known to the
      client, then any transferred state would have been merged with an
      existing client ID representing the client to the destination
      server.  In this state merger case, Transparent State Migration
      might or might not have occurred.

   o  If the client ID is a confirmed client ID not previously known to
      the client, then the client can conclude that the client ID was
      transferred as part of Transparent State Migration.  In this
      transferred client ID case, Transparent State Migration has
      occurred although some state may have been lost.

   In the state merger case, it is possible that the server has not
   attempted Transparent State Migration, in which case state may have
   been lost without it being reflected in the SEQ4_STATUS bits.  To
   determine whether this has happened, the client can use TEST_STATEID
   to check whether the stateids created on the source server are still
   accessible on the destination server.  Once a single stateid is found
   to have been successfully transferred, the client can conclude that
   Transparent State Migration was begun and any failure to transfer all
   of the stateids will be reflected in the SEQ4_STATUS bits.

   In any of the cases in which Transparent State Migration has
   occurred, it is possible that a session was transferred as well.  To
   deal with that possibility, clients can, after doing the EXCHANGE_ID,
   issue a BIND_CONN_TO_SESSION to connect the transferred session to a
   connection to the new server.  If that fails, it is an indication
   that the session was not transferred and that a new session needs to
   be created to take its place.

4.2.3.  Client Response to Migration in NFSv4.1

   Once the client has determined the initial migration status, it needs
   to re-establish its lock state, if possible.  To enable this to
   happen without loss of the guarantees normally provided by locking,
   the destination server needs to implement a per-fs grace period in
   all cases in which lock state was lost, including those in which
   Transparent State Migration was not implemented.

   The following cases need to be dealt with:

   o  In a case in which Transparent State Migration has not occurred,
      the client can use the per-fs grace period provided by the
      destination server to reclaim locks that were held on the source
      server.

   o  In a case in which Transparent State Migration has occurred, and
      no lock state was lost (as shown by SEQ4_STATUS flags), no lock
      reclaim is necessary.

   o  In a case in which Transparent State Migration has occurred, and
      some lock state was lost (as shown by SEQ4_STATUS flags), existing
      stateids need to be checked for validity using TEST_STATEID, and
      reclaim used to re-establish any that were not transferred.

   For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value
   of true should be done before normal use of the file system including
   obtaining new locks for the file system.  This applies even if no
   locks were lost and needed to be reclaimed.

4.2.4.  Dealing with Multiple Location Entries

   The possibility that more than one server address may be present in
   location attributes requires further clarification.  This is
   particularly the case given the potential role of trunking for
   NFSv4.1, whose connection to migration needs to be clarified.
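
   To make the role of trunking concrete, the sketch below groups
   location elements into sets that are trunkable with one another,
   following the definition given in Section 4.2.  It is illustrative
   Python only: LocationElement, ServerIdentity, group_trunkable, and
   the identities mapping (standing in for the results of an EXCHANGE_ID
   sent to each address) are assumptions made for this example.  The
   test used, matching eir_server_scope and so_major_id, is a
   simplification that corresponds to client ID trunking; it does not
   distinguish session trunking, which also depends on so_minor_id.

```python
from collections import defaultdict
from typing import NamedTuple


class LocationElement(NamedTuple):
    address: str   # IP address of an interface to a server
    fs_name: str   # location of the file system in the pseudo-fs


class ServerIdentity(NamedTuple):
    scope: bytes     # eir_server_scope
    major_id: bytes  # eir_server_owner.so_major_id


def group_trunkable(elements, identities):
    """Group elements that share an fs name and a server identity.

    `identities` maps each address to the ServerIdentity learned from
    it (hypothetical input; a real client would obtain it on the wire).
    """
    groups = defaultdict(list)
    for element in elements:
        key = (element.fs_name, identities[element.address])
        groups[key].append(element)
    return list(groups.values())


elements = [
    LocationElement("192.0.2.1", "/export/a"),
    LocationElement("192.0.2.2", "/export/a"),
    LocationElement("198.51.100.7", "/export/a"),
]
identities = {
    "192.0.2.1": ServerIdentity(b"scope1", b"srvA"),
    "192.0.2.2": ServerIdentity(b"scope1", b"srvA"),
    "198.51.100.7": ServerIdentity(b"scope1", b"srvB"),
}
groups = group_trunkable(elements, identities)
```

   Elements within one group are additional paths to the same file
   system and locking state; elements in different groups are
   alternative access locations on distinct servers.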

   The description of the location attributes in [RFC5661], while it
   indicates that multiple address entries in these attributes may be
   used to indicate alternate paths to the file system, does so mainly
   in the context of replication and does so without mentioning
   trunking.  The discussion of migration does not discuss the
   possibility of multiple location entries or trunking, which we will
   explore here.

   We will cover cases in which multiple addresses appear directly in
   the attributes as well as those in which the multiple addresses
   result because a single location entry is expanded into multiple
   location elements using addresses provided by DNS.

   When the set of valid location elements by which a file system may be
   accessed changes, migration need not be involved.  Some cases to
   consider:

   o  When the set of location elements expands, migration is not
      involved.  In the case in which the additional elements are not
      trunkable with ones previously being used, the new elements serve
      as additional access locations, available in case of the failure
      of server addresses being used.  When additional elements are
      trunkable with those currently being used, the client may use the
      additional addresses just as it might have if they had been
      available when use of the file system began.

      There is no current mechanism by which the client can be notified
      of a change in the set of available locations for an fs.  Given
      that the client has at least one IP address available to access
      the filesystem in question, periodic polling is an adequate
      mechanism for the client to find additional server addresses to
      use to access the file system.

   o  When the set of location elements contracts but none of the
      elements no longer usable were in fact being used by the client,
      then no migration is involved.
      Only if the client were to start
      using one of the unavailable elements will the client be notified
      (via NFS4ERR_MOVED) of the need to not use those elements and to
      use others provided by a location attribute.

   When a specific server address being used becomes unavailable to
   service a particular file system, NFS4ERR_MOVED will be returned, and
   the client will respond based on the available locations.  Whether
   continuity of locking state will be available depends on a number of
   factors:

   o  If there are still elements in use trunkable with the element that
      has become unavailable, there will still be a continuity of
      locking state, even though Transparent State Migration per se has
      not occurred.  If the in-use addresses are session-trunkable with
      the address becoming unavailable, only one connection is lost and
      all existing sessions will remain available.  If, on the other
      hand, the in-use addresses are only clientid-trunkable with the
      address becoming unavailable, a session can be lost.  However,
      that session can be made available on those other nodes, just as
      it would have been if Transparent State Migration were in effect,
      even though no migration has occurred.

   o  Otherwise, if there are available addresses trunkable with the one
      that has become unavailable, the client has access to existing
      locking state once it establishes a connection with the new
      addresses, using a new or existing session depending on the type
      of trunking in effect.  This is also similar to the case in which
      Transparent State Migration has occurred, even though there is no
      migration, with the state remaining on the existing server.

      Note that this case, as well as the previous one, can be expected
      in the case in which the server seeks to direct traffic with
      regard to particular file systems to chosen addresses, in the
      interest of load balancing, to adjust to hardware availability
      constraints, or for other reasons.

   o  In other cases, migration has occurred and the client can use the
      procedure described in Section 4.2.2 to determine whether
      Transparent State Migration occurred and whether any locking state
      was lost during the transfer.

   One should note the following differences between migration with
   Transparent State Migration and the similar cases in which there is a
   continuity of locking state with no change in the server.

   o  When locks are lost (as indicated when using them or via the
      SEQ4_STATUS flags) and migration has not been done, they are not
      to be reclaimed.  Instead, such losses are treated as lock
      revocations and acknowledged using FREE_STATEID.

   o  When migration has not been done, there is no need for a
      RECLAIM_COMPLETE (with rca_one_fs set to true).

4.2.5.  Client Recovery from Migration Events

   When a file system is migrated, there are a number of
   migration-related status indications with which clients need to
   deal:

   o  If an attempt is made to use or return a filehandle within a file
      system that has been migrated away from the server on which it was
      previously available, the error NFS4ERR_MOVED is returned.

      This condition continues on subsequent attempts to access the file
      system in question.  The only way the client can avoid the error
      is to cease accessing the filesystem in question at its old server
      location and access it instead on the server to which it has been
      migrated.

   o  Whenever a SEQUENCE operation is sent by a client to a server
      which generated state held on that client which is associated with
      a file system that has been migrated away from the server on which
      it was previously available, the status bit
      SEQ4_STATUS_LEASE_MOVED is set in the response.

      This condition continues until the client acknowledges the
      notification by fetching a location attribute for the migrated
      file system.  When there are multiple migrated file systems, a
      location attribute for each such migrated file system needs to be
      fetched, in order to clear the condition.  Even after the
      condition is cleared, the client needs to respond by using the
      location information to access the destination server to ensure
      that leases are not needlessly expired.

   Unlike the case of NFSv4.0, in which the corresponding conditions are
   both errors, in NFSv4.1 the client can, and often will, receive both
   indications on the same request.  As a result, the question of how to
   coordinate the necessary recovery actions when both indications
   arrive simultaneously must be resolved.  It should be noted that when
   the server decides whether SEQ4_STATUS_LEASE_MOVED is to be set, it
   has no way of knowing which file system will be referenced or whether
   NFS4ERR_MOVED will be returned.

   While it is true that, when only a single migrated file system is
   involved, a single set of actions will clear both indications, the
   possibility of multiple migrated file systems calls for an approach
   in which there are separate recovery actions for each indication.  In
   general, the response to neither indication can be subsumed within
   the other since:

   o  If the client were to respond only to the MOVED indication, there
      would be no effective client response to a situation in which a
      file system was not being actively accessed at the time migration
      occurred.
      As a result, leases on the destination server might be
      needlessly expired.

   o  If the client were to respond only to the LEASE_MOVED indication,
      recovery for migrated file systems in active use could be deferred
      in order to accomplish recovery for others not being actively
      accessed.  The consequences of this choice can pose particular
      problems when there are a large number of file systems supported
      by a particular server, or when it happens that some servers,
      after receiving migrated file systems, have periods of
      unavailability, such as occur as a result of server reboot.  This
      can result in recovery for actively accessed migrated file systems
      being unnecessarily delayed for long periods of time.

   Similar considerations apply to other arrangements in which one of
   the indications, while not ignored per se, is subsumed within a
   single recovery process focused on recovery for the other indication.

   Generally speaking, client recovery for these indications should have
   the following characteristics:

   o  All instances of the MOVED indication should be dealt with
      promptly, either by doing the necessary recovery directly,
      providing that it be done asynchronously, or ensuring that it is
      already under way.

   o  All instances of the LEASE_MOVED indication should be dealt with
      asynchronously, in a migration discovery thread whose job is to
      clear that indication by fetching the appropriate location
      attribute.  Because this thread will only be fetching a location
      attribute and the fs_status attribute for the file systems
      referenced by the client, it cannot receive MOVED indications.
      Some useful guidance regarding possible implementation of the
      migration discovery thread can be found in Section 4.2.6.

   o  When a migration discovery thread happens upon a migrated file
      system (i.e.
      not present and not a referral), the thread is likely
      to have cleared one (out of an unknown number) of the file systems
      whose migration needs to be responded to.  The discovery thread
      needs to schedule the appropriate migration recovery (as described
      in Section 4.2.3).  This is necessary to ensure that migrated file
      systems will be referenced on the destination server in order to
      avoid lease expiration.

      For many of the migrated file systems discovered in this way, the
      client has not received any MOVED indication.  In such cases,
      lease recovery needs to be scheduled but it should not interfere
      with continuation of the migration discovery function.

   o  When a migration discovery thread receives a LEASE_MOVED
      indication, it takes no special action but continues its normal
      operation.  On the other hand, if a LEASE_MOVED indication is not
      received, it indicates that the thread has completed its work
      successfully.

4.2.6.  The Migration Discovery Process

   As noted above, LEASE_MOVED indications are best dealt with in a
   migration discovery thread.  Because of this structure:

   o  No action needs to be taken for such indications received by the
      migration discovery thread, since continuation of that thread's
      work will address the issue.

   o  For such indications received in other contexts, the generally
      appropriate response is to initiate or otherwise provide for the
      execution of a migration discovery thread for file systems
      associated with the server IP address returning the indication.

   o  In all cases in which the appropriate migration discovery thread
      is running, nothing further need be done to respond to LEASE_MOVED
      indications.

   This leaves a potential difficulty in situations in which the
   migration discovery thread is near to completion but is still
   operating.
   One should not ignore a LEASE_MOVED indication if the
   discovery thread is not able to respond to the migrated file system
   without additional aid.  A further difficulty in addressing such a
   situation is that a LEASE_MOVED indication may reflect the server's
   state at the time the SEQUENCE operation was processed, which may be
   different from that in effect at the time the response is received.

   A useful approach to this issue involves the use of separate
   externally-visible discovery thread states representing
   non-operation, normal operation, and completion/verification of
   migration discovery processing.

   Within that framework, discovery thread processing would proceed as
   follows.

   o  While in the normal-operation state, the thread would fetch, for
      successive file systems known to the client on the server being
      worked on, a location attribute plus the fs_status attribute.

   o  The fs_status attribute may indicate that the file system is a
      migrated one (i.e., fss_absent is true and fss_type !=
      STATUS4_REFERRAL), in which case it is likely that the fetch of
      the location attribute has cleared one of the file systems
      contributing to the LEASE_MOVED indication.

   o  In cases in which that happened, the thread cannot know whether
      the LEASE_MOVED indication has been cleared and so it enters the
      completion/verification state and proceeds to issue a COMPOUND to
      see if the LEASE_MOVED indication has been cleared.

   o  When the discovery thread is in the completion/verification state,
      if other requests get a LEASE_MOVED indication, they note this
      fact and it is used when the request completes, as described
      below.

   When the request used in the completion/verification state completes:

   o  If a LEASE_MOVED indication is returned, the discovery thread
      resumes its normal work.

   o  Otherwise, if there is any record that other requests saw a
      LEASE_MOVED indication, that record is cleared and the
      verification request retried.  The discovery thread remains in
      the completion/verification state.

   o  If there has been no LEASE_MOVED indication, the work of the
      discovery thread is considered completed and it enters the
      non-operating state.

4.2.7.  Synchronizing Session Transfer

   When transferring state between the source and destination, the
   issues discussed in Section 7.2 of [RFC7931] must still be attended
   to.  In this case, the use of NFS4ERR_DELAY is still necessary in
   NFSv4.1, as it was in NFSv4.0, to prevent locking state changing
   while it is being transferred.

   There are a number of important differences in the NFSv4.1 context:

   o  The absence of RELEASE_LOCKOWNER means that the one case in which
      an operation could not be deferred by use of NFS4ERR_DELAY no
      longer exists.

   o  Sequencing of operations is no longer done using owner-based
      operation sequence numbers.  Instead, sequencing is
      session-based.

   As a result, when sessions are not transferred, the techniques
   discussed in [RFC7931] are adequate and will not be further
   discussed.

   When sessions are transferred, there are a number of issues that pose
   challenges since:

   o  A single session may be used to access multiple file systems, not
      all of which are being transferred.

   o  Requests made on a session, even if rejected, may affect the state
      of the session by advancing the sequence number associated with
      the slot used.

   As a result, when the filesystem state might otherwise be considered
   unmodifiable, the client might have any number of in-flight requests,
   each of which is capable of changing session state.  Such requests
   may be of a number of types:

   1.  Those requests that were processed on the migrating file system,
       before migration began.

   2.  Those requests which got the error NFS4ERR_DELAY because the file
       system being accessed was in the process of being migrated.

   3.  Those requests which got the error NFS4ERR_MOVED because the file
       system being accessed had been migrated.

   4.  Those requests that accessed the migrating file system, in order
       to obtain location or status information.

   5.  Those requests that did not reference the migrating file system.

   It should be noted that the history of any particular slot is likely
   to include a number of these request classes.  In the case in which a
   session which is migrated is used by filesystems other than the one
   migrated, requests of class 5 may be common and be the last request
   processed, for many slots.

   Since session state can change even after the locking state has been
   fixed as part of the migration process, the session state known to
   the client could be different from that on the destination server,
   which necessarily reflects the session state on the source server, at
   an earlier time.  In deciding how to deal with this situation, it is
   helpful to distinguish between two sorts of behavioral consequences
   of the choice of initial sequence ID values.

   o  The error NFS4ERR_SEQ_MISORDERED is returned when the sequence ID
      in a request is neither equal to the last one seen for the current
      slot nor the next greater one.

      In view of the difficulty of arriving at a mutually acceptable
      value for the correct last sequence at the point of migration, it
      may be necessary for the server to show some degree of
      forbearance, when the sequence ID is one that would be considered
      unacceptable if session migration were not involved.

   o  Returning the cached reply for a previously executed request when
      the sequence ID in the request matches the last value recorded for
      the slot.

      In the cases in which an error is returned and there is no
      possibility of any non-idempotent operation having been executed,
      it may not be necessary to adhere to this as strictly as might be
      proper if session migration were not involved.  For example, the
      fact that the error NFS4ERR_DELAY was returned may not assist the
      client in any material way, while the fact that NFS4ERR_MOVED was
      returned by the source server may not be relevant when the request
      was reissued, directed to the destination server.

   One part of adapting to these sorts of issues would be to defer
   enforcement of normal slot sequence semantics until the client
   itself, by issuing a request using a particular slot on the
   destination server, has established the new starting sequence for
   that slot on the migrated session.

   An important issue is that the specification needs to take note of
   all potential COMPOUNDs, even if they might be unlikely in practice.
   For example, a COMPOUND is allowed to access multiple file systems
   and might perform non-idempotent operations in some of them before
   accessing a file system being migrated.  Also, a COMPOUND may return
   considerable data in the response, before being rejected with
   NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as
   sa_cachethis.

   Some possibilities that need to be considered to address the issues:

   o  Do not enforce any sequencing semantics for a particular slot
      until the client has established the starting sequence for that
      slot on the destination server.

   o  For each slot, do not return a cached reply returning
      NFS4ERR_DELAY or NFS4ERR_MOVED until the client has established
      the starting sequence for that slot on the destination server.

   o  Until the client has established the starting sequence for a
      particular slot on the destination server, do not report
      NFS4ERR_SEQ_MISORDERED or return a cached reply returning
      NFS4ERR_DELAY or NFS4ERR_MOVED, where the reply consists solely of
      a series of operations where the response is NFS4_OK until the
      final error.

4.2.8.  Migration and pNFS

   When pNFS is involved, migration is capable of supporting:

   o  Migration of the MDS, leaving DS's in place.

   o  Migration of the file system as a whole, including the MDS and
      associated DS's.

   o  Replacement of one DS by another.

   o  Migration of a pNFS file system to one in which pNFS is not used.

   o  Migration of a file system not using pNFS to one in which layouts
      are available.

   Migration of the MDS function is directly supported by Transparent
   State Migration.  Layout state will normally be transparently
   transferred, just as other state is.  As a result, Transparent State
   Migration provides a framework in which, given appropriate inter-MDS
   data transfer, one MDS can be substituted for another.

   Migration of the file system function can be accomplished by
   recalling all layouts as part of the initial phase of the migration
   process.  As a result, IO will be done through the MDS during the
   migration process, and new layouts can be granted once the client is
   interacting with the new MDS.  An MDS can also effect this sort of
   transition by revoking all layouts as part of Transparent State
   Migration, as long as the client is notified about the loss of state.

   In order to allow migration to a file system on which pNFS is not
   supported, clients need to be prepared for a situation in which
   layouts are not available or supported on the destination file system
   and be prepared to direct IO requests to the destination server,
   rather than depending on layouts being available.
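
   The fallback described in the preceding paragraph can be sketched
   as a small decision function.  This is a minimal illustration, not
   a prescribed client design: the names IoPath and choose_io_path are
   hypothetical, and destination_layout_types is an assumed stand-in
   for whatever the client has learned about layout support on the
   destination file system (for example, from the fs_layout_type
   attribute of [RFC5661]).

```python
from enum import Enum


class IoPath(Enum):
    """Hypothetical labels for the two ways a client can do IO."""
    PNFS_LAYOUT = "use layouts"
    THROUGH_SERVER = "send IO to the destination server"


def choose_io_path(client_supports_pnfs, destination_layout_types):
    """Pick an IO path for a file system after it has been migrated.

    destination_layout_types is an assumed list of layout types the
    destination file system offers; an empty list models a
    destination on which pNFS is not supported.
    """
    if client_supports_pnfs and destination_layout_types:
        return IoPath.PNFS_LAYOUT
    # Layouts are unavailable or unsupported: do not depend on them.
    return IoPath.THROUGH_SERVER
```

   The point of the sketch is only that the decision is re-evaluated
   against the destination rather than carried over from the source
   server.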

   Replacement of one DS by another is not addressed by migration as
   such but can be effected by an MDS recalling layouts for the DS to be
   replaced and issuing new ones to be served by the successor DS.

   Migration may transfer a file system from a server which does not
   support pNFS to one which does.  In order to properly adapt to this
   situation, clients which support pNFS, but function adequately in its
   absence, should check for pNFS support when a file system is migrated
   and be prepared to use pNFS when support is available.

5.  Security Considerations

   With regard to NFSv4.0, the Security Considerations section of
   [RFC7530] encourages clients to protect the integrity of the SECINFO
   operation and any GETATTR operation for the fs_locations attribute.
   A needed change is to include the operations SETCLIENTID/
   SETCLIENTID_CONFIRM as among those for which integrity protection is
   recommended.  A migration recovery event can use any or all of these
   operations.

   With regard to NFSv4.1, the Security Considerations section of
   [RFC5661] takes proper care of migration-related issues.  No change
   is needed.

6.  IANA Considerations

   This document does not require actions by IANA.

7.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5661]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010.

   [RFC7530]  Haynes, T., Ed. and D. Noveck, Ed., "Network File System
              (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530,
              March 2015.

   [RFC7931]  Noveck, D., Ed., Shivam, P., Lever, C., and B. Baker,
              "NFSv4.0 Migration: Specification Update", RFC 7931,
              DOI 10.17487/RFC7931, July 2016.

Appendix A.  Acknowledgements

   The editor and authors of this document gratefully acknowledge the
   contributions of Trond Myklebust of NetApp and Robert Thurlow of
   Oracle.  We also thank Tom Haynes of Primary Data and Spencer Shepler
   of Microsoft for their guidance and suggestions.

   Special thanks go to members of the Oracle Solaris NFS team,
   especially Rick Mesta and James Wahlig, for their work implementing
   an NFSv4.0 migration prototype and identifying many of the issues
   documented here.

Authors' Addresses

   David Noveck (editor)
   NetApp
   26 Locust Avenue
   Lexington, MA  02421
   US

   Phone: +1 781 572 8038
   Email: davenoveck@gmail.com


   Piyush Shivam
   Oracle Corporation
   5300 Riata Park Ct.
   Austin, TX  78727
   US

   Phone: +1 512 401 1019
   Email: piyush.shivam@oracle.com


   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI  48104
   US

   Phone: +1 248 614 5091
   Email: chuck.lever@oracle.com


   Bill Baker
   Oracle Corporation
   5300 Riata Park Ct.
   Austin, TX  78727
   US

   Phone: +1 512 401 1081
   Email: bill.baker@oracle.com