Internet-Draft | RECALL_DEVICE | March 2024 |
Haynes | Expires 20 September 2024 | [Page] |
The Parallel Network File System (pNFS) allows the metadata server to use CB_LAYOUTRECALL to recall a layout from a client by file id, by file system id, or for all layouts. It also allows the server to use CB_NOTIFY_DEVICEID to delete a deviceid. It does not provide a mechanism for the metadata server to recall all layouts that have a data file on a specific deviceid. This document presents an extension to RFC8881 to allow the server to recall layouts from clients based on deviceid.¶
This note is to be removed before publishing as an RFC.¶
Discussion of this draft takes place on the NFSv4 working group mailing list (nfsv4@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/nfsv4/. Working Group information can be found at https://datatracker.ietf.org/wg/nfsv4/about/.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 20 September 2024.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
In the Network File System version 4 (NFSv4) with a Parallel NFS (pNFS) metadata server ([RFC8881]), there is no mechanism for the metadata server to recall layouts from the client when a particular deviceid (see Section 3.3.14 of [RFC8881]) becomes temporarily or permanently unavailable.¶
One use case is when the deviceids in a layout are separated by power fault domains. Each layout might describe three different devices, each contained in a different power fault domain. In such a scenario, a single fault domain can have the power removed without causing loss of access to the data. However, client I/O will be impacted: the client still attempts WRITEs (see Section 18.32 of [RFC8881]) to the unavailable device and then sends LAYOUTERRORs (see Section 15.6 of [RFC7862]) to inform the metadata server of NFS4ERR_NXIO (see Section 15.1.16.3 of [RFC8881]).¶
If the metadata server had the means to recall layouts by deviceid, much of this unnecessary traffic could be eliminated. Finally, while the metadata server could recall the affected layouts one by one, that too is unnecessary traffic, and the work of identifying the affected layouts can be offloaded to the client.¶
Besides the use case above, consider a metadata server that wants to set NOTIFY4_DEVICEID_DELETE in the CB_NOTIFY_DEVICEID callback (see Section 20.12 of [RFC8881]). This flag cannot be set if a layout is outstanding for a deviceid. While the metadata server can revoke all such layouts, there is no way to know that the client has acknowledged that revocation and hence is no longer doing I/O to other data files in the layout. The metadata server could fence those layouts as well (see Section 12.5.5 of [RFC8881]), but that can be an expensive operation.¶
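The precondition on NOTIFY4_DEVICEID_DELETE can be illustrated with a minimal sketch of server-side bookkeeping. This is not part of the protocol: the class DeviceLayoutTracker and its methods are hypothetical names, and the sketch assumes the metadata server indexes outstanding layouts by the deviceids they reference.

```python
from collections import defaultdict


class DeviceLayoutTracker:
    """Hypothetical server-side index of outstanding layouts per deviceid."""

    def __init__(self):
        # deviceid -> set of (client, layout stateid) pairs still outstanding
        self._outstanding = defaultdict(set)

    def grant(self, client, stateid, deviceids):
        """Record a layout handed to a client and the deviceids it references."""
        for deviceid in deviceids:
            self._outstanding[deviceid].add((client, stateid))

    def returned(self, client, stateid):
        """Record that a client returned a layout (e.g. via LAYOUTRETURN)."""
        for holders in self._outstanding.values():
            holders.discard((client, stateid))

    def clients_holding(self, deviceid):
        """Clients that would need a recall before the device can be deleted."""
        return {client for client, _ in self._outstanding.get(deviceid, set())}

    def may_delete(self, deviceid):
        """True only when no layout referencing the deviceid is outstanding."""
        return not self._outstanding.get(deviceid)
```

In this model, the server would only send CB_NOTIFY_DEVICEID with NOTIFY4_DEVICEID_DELETE once may_delete() returns True for the deviceid in question.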
Using the process detailed in [RFC8178], the revisions in this document become an extension of NFSv4.2 [RFC7862]. They are built on top of the external data representation (XDR) [RFC4506] generated from [RFC7863].¶
This section is to be removed before publishing as an RFC.¶
The authors have tried to introduce this new functionality outside of a particular pNFS Layout Type. Does that work?¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The original union layoutrecall4 (see Section 20.3.1 of [RFC8881]) is:¶
<CODE BEGINS> file "new_union_layoutrecall4"

enum layoutrecall_type4 {
        LAYOUTRECALL4_FILE = LAYOUT4_RET_REC_FILE,
        LAYOUTRECALL4_FSID = LAYOUT4_RET_REC_FSID,
        LAYOUTRECALL4_ALL  = LAYOUT4_RET_REC_ALL
};

union layoutrecall4 switch(layoutrecall_type4 lor_recalltype) {
case LAYOUTRECALL4_FILE:
        layoutrecall_file4 lor_layout;
case LAYOUTRECALL4_FSID:
        fsid4              lor_fsid;
case LAYOUTRECALL4_ALL:
        void;
};

<CODE ENDS>¶
The proposed extension is:¶
<CODE BEGINS> file "new_union_layoutrecall4"

/// const LAYOUTRECALL4_RET_REC_DEVICEID = 4;
///
/// enum layoutrecall_type4 {
///         LAYOUTRECALL4_FILE     = LAYOUT4_RET_REC_FILE,
///         LAYOUTRECALL4_FSID     = LAYOUT4_RET_REC_FSID,
///         LAYOUTRECALL4_ALL      = LAYOUT4_RET_REC_ALL,
///         LAYOUTRECALL4_DEVICEID = LAYOUTRECALL4_RET_REC_DEVICEID
/// };
///
/// union layoutrecall4 switch(layoutrecall_type4 lor_recalltype) {
/// case LAYOUTRECALL4_FILE:
///         layoutrecall_file4 lor_layout;
/// case LAYOUTRECALL4_FSID:
///         fsid4              lor_fsid;
/// case LAYOUTRECALL4_DEVICEID:
///         deviceid4          lor_deviceid;
/// case LAYOUTRECALL4_ALL:
///         void;
/// };

<CODE ENDS>¶
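To make the wire format of the new arm concrete, the following is a minimal Python sketch of how the layoutrecall4 union might be XDR-encoded. It is illustrative only: the helper names are hypothetical, the discriminant value 4 is taken from the constant in the XDR above, and the sketch encodes only the union itself, not the other CB_LAYOUTRECALL arguments.

```python
import struct

# Discriminant values. 1 through 3 mirror the LAYOUT4_RET_REC_* constants in
# RFC 8881; 4 is the value proposed in this document for the deviceid arm.
LAYOUTRECALL4_FILE = 1
LAYOUTRECALL4_FSID = 2
LAYOUTRECALL4_ALL = 3
LAYOUTRECALL4_DEVICEID = 4

NFS4_DEVICEID4_SIZE = 16  # deviceid4 is a fixed-length opaque of 16 bytes


def encode_recall_by_deviceid(deviceid: bytes) -> bytes:
    """Encode the LAYOUTRECALL4_DEVICEID arm (hypothetical helper)."""
    if len(deviceid) != NFS4_DEVICEID4_SIZE:
        raise ValueError("deviceid4 must be exactly 16 bytes")
    # An XDR enum is a 4-byte big-endian integer. A fixed-length opaque has
    # no length prefix, and 16 bytes is already a multiple of 4, so no
    # padding is needed.
    return struct.pack(">i", LAYOUTRECALL4_DEVICEID) + deviceid


def encode_recall_all() -> bytes:
    """Encode the LAYOUTRECALL4_ALL arm, whose body is void (hypothetical helper)."""
    return struct.pack(">i", LAYOUTRECALL4_ALL)
```

Under these assumptions the deviceid arm occupies 20 bytes on the wire: 4 for the discriminant and 16 for the deviceid4.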
With this minimal change, all of the semantics of CB_LAYOUTRECALL (see Section 20.3 of [RFC8881]) remain the same, i.e., the client and server already know how to interact with each other over CB_LAYOUTRECALL. The one issue to be investigated is what happens if an NFSv4.2 client that does not implement this extension sees a LAYOUTRECALL4_DEVICEID in a CB_LAYOUTRECALL. It SHOULD return NFS4ERR_UNION_NOTSUPP, but implementations might not be compliant with [RFC8178]. As such, a survey should be conducted of the major implementations.¶
Finally, when the client does handle a LAYOUTRECALL4_DEVICEID in a CB_LAYOUTRECALL, it MUST return all layouts that reference the given deviceid. The server can determine that the client no longer has any layouts with the given deviceid once the client replies with NFS4ERR_NOMATCHING_LAYOUT.¶
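The client-side behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not an implementation: the classes Layout and ClientLayoutCache are hypothetical, status codes are represented by their names rather than numeric values, and dropping a layout from the cache stands in for the LAYOUTRETURN operations a real client would issue.

```python
from dataclasses import dataclass, field

# Status codes are modelled as strings for readability.
NFS4_OK = "NFS4_OK"
NFS4ERR_NOMATCHING_LAYOUT = "NFS4ERR_NOMATCHING_LAYOUT"
NFS4ERR_UNION_NOTSUPP = "NFS4ERR_UNION_NOTSUPP"


@dataclass
class Layout:
    stateid: str
    deviceids: frozenset  # every deviceid referenced by this layout


@dataclass
class ClientLayoutCache:
    # False models a client that does not know the new union arm.
    supports_deviceid_recall: bool = True
    layouts: list = field(default_factory=list)

    def on_recall_by_deviceid(self, deviceid):
        """Return (callback status, layouts the client must hand back)."""
        if not self.supports_deviceid_recall:
            return NFS4ERR_UNION_NOTSUPP, []
        matching = [lo for lo in self.layouts if deviceid in lo.deviceids]
        if not matching:
            # Tells the server that nothing referencing the device remains.
            return NFS4ERR_NOMATCHING_LAYOUT, []
        # Dropping the layouts models returning them to the server.
        self.layouts = [lo for lo in self.layouts
                        if deviceid not in lo.deviceids]
        return NFS4_OK, matching
```

In this model a server would repeat the recall until the client answers NFS4ERR_NOMATCHING_LAYOUT, at which point no layout referencing the deviceid remains on that client.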
This document contains the external data representation (XDR) [RFC4506] description of the extension to the layoutrecall4 union. The XDR description is embedded in this document in a way that makes it simple for the reader to extract into a ready-to-compile form. The reader can feed this document into the following shell script to produce the machine-readable XDR description of the extension:¶
<CODE BEGINS>
#!/bin/sh
grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??'
<CODE ENDS>¶
That is, if the above script is stored in a file called "extract.sh", and this document is in a file called "spec.txt", then the reader can do:¶
<CODE BEGINS>
sh extract.sh < spec.txt > layout_wcc.x
<CODE ENDS>¶
The effect of the script is to remove leading white space from each line, plus a sentinel sequence of "///". XDR descriptions with the sentinel sequence are embedded throughout the document.¶
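For readers without a POSIX shell, the same extraction can be sketched in Python. The function name extract_xdr is hypothetical; the regular expressions are intended to mirror the grep and sed patterns of the script above, one for one.

```python
import re


def extract_xdr(text: str) -> str:
    """Mirror the shell pipeline: keep '///' lines and strip the sentinel."""
    out = []
    for line in text.splitlines():
        # grep '^ *///': keep only lines that start with the sentinel
        if not re.match(r"^ *///", line):
            continue
        # sed 's?^ */// ??': drop leading blanks, the sentinel, and one space
        line = re.sub(r"^ */// ", "", line, count=1)
        # sed 's?^ *///$??': a bare sentinel becomes an empty line
        line = re.sub(r"^ *///$", "", line)
        out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```

Reading the document text from a file and writing the result of extract_xdr() to a ".x" file would then correspond to the shell invocation shown above.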
Note that the XDR code contained in this document depends on types from the NFSv4.2 nfs4_prot.x file (generated from [RFC7863]). This includes both NFS types that end with a 4, such as offset4, length4, etc., as well as more generic types such as uint32_t and uint64_t.¶
While the XDR can be appended to that from [RFC7863], the various code snippets belong in their respective areas of that XDR.¶
Both the XDR description and the scripts used for extracting the XDR description are Code Components as described in Section 4 of "Legal Provisions Relating to IETF Documents" [LEGAL]. These Code Components are licensed according to the terms of that document.¶
There are no new security considerations beyond those in [RFC7862].¶
IANA should use the current document (RFC-TBD) as the reference for the new entries.¶
Trond Myklebust and Paul Saab were involved in the initial requirements for this functionality.¶