Notes from NFSv4 working meeting at IETF 76 held on November 12, 2009.
Notes are from Trond Myklebust and Mike Eisler.

Documents Release Update (Eisler)

  - A month ago, RFC editor sent copy.
  - 60 suggestions that they want editors to consider.
  - Goal is to publish by end of year (document, xdr, blocks & objects doc).

Federated FS (Eisler/Lentini)

  - The FedFS requirements document has been approved for publication as an RFC
  - In the RFC editor's queue...
  - Other documents are in last call.
  - Four drafts published as WG documents
  - Because FedFS requirements document makes normative references to NFSv4.1, it is waiting for publication of NFSv4.
  - DNS directorate has reviewed the DNS record document
  - Needs to add proper disclaimer
  - Needs to respond to mailing list...
  - LDAP schema document
      o Has gone through expert review
      o TM has demonstrated and released a Linux prototype.
      o Updated schema in -04 by splitting path name fields into components
      o Needs to choose the format for NFS path (some prefer string encoding)
          - Eisler said he prefers using the NFS method.
      o Decide if and how an NSDB is discoverable via DNS SRV
      o Need LDAP expert review for changes
  - Admin protocol
      o Version -03 specifies the path format to be same type as NFSv4 (used to be string)
      o Proposal from the security directorate to discuss trust anchor management
      o Wants to ensure that NFSv4/.1 servers are given a certificate on junction construction, so that they can use TLS to communicate with the NSDB server.
          - Eisler worries that the Admin host may be compromised and add bogus certificates.
          - Myklebust: why are we worried about certificate poisoning when Admin uses RPCSEC_GSS to talk to NFSv4/.1 server?
          - Eisler: Worried about case where Admin server has keytab. Also granting the admin authority to dictate approved Certificate Authorities seems heavy handed.
          - Eisler: What if the communication between the NFS server and the LDAP service is not protected with certificates (even TLS has cipher suites that are not based on certificates)
      o Add a parameter to the query junction procedure to instruct a fileserver to test/resolve the junction.
  - Meetings are held once a week. See slides. Callers from outside Canada/US should contact James Lentini for a local dial in phone number.

NFSv4 Multi-Domain Access (Myklebust for Adamson)

  o Unfortunately no notes are available as the designated note taker was presenting, and Eisler was running the presentation. The other note takers did not submit notes.

Requirements for NFSv4.2 (Eisler)

  - Draft document is a personal I-D, not an official work item
  - No commitment from IESG/IETF to re-charter our WG for NFSv4.2.
  - Eisler is seeking that re-chartering.
  - Summary of motivation for NFSv4.1
      o Storage needs to react to IT trends
          - NFSv4.1 was reaction to
              - scale out (pNFS)
              - high speed networking (exactly once semantics)
              - XID based replay caches too limited
  - Storage is now faced with new major trends
      - Space/efficiency demands
          - cost of energy & backup times
      o Flash is now least expensive medium when measuring cost/IOPS
      o compliance
          - There are laws that regulate customer data management
      o We didn't get all of NFSv4.1 right
  - Disc capacities are doubling on 1-2 year cycles
      o Access times are not
      o Neither are allotments for data management operations
      o Energy price spikes are a compounding problem
      o Storage industry has responded with de-duplication
          - NFS needs to catch up
          - need space reporting, hole punching, de-duplication mapping on read/write
  - Peer to peer networking has been proven
      o For some workloads, NFS clients and server would benefit from this model
      o See draft-myklebust-nfsv4-pnfs-backend
  - Today, pNFS allows I/O offload, but not metadata offload
      o Doesn't have to be the case, see draft-eisler-nfsv4-pnfs-metastripe
  - File copy is more efficient if NFS servers take care of it
      o Now have APIs on some NFS clients for performing file copy
          - draft-lentini-nfsv4-server-side-copy has reached WG consensus...
  - Adding flash to storage arrays is goodness
      o doesn't require changes to storage protocols
  - Value of flash is best realized on the client side
      - We could cede this ground to DAS
      o Or we could embrace use cases that leverage client-side flash for network storage
      o caching
          - Sub-file caching is needed.
          - Myklebust: persistent caching?
          - Eisler: That is certainly a side benefit of flash, and of course NFSv4 delegations are specified for that use case. But my primary thinking was that flash has higher capacity for less cost than DRAM without the high latency of disk thus flash enables clients to efficiently and cheaply cache much more data.
  - Compliance:
      o Data continues to expand rapidly
      o Rules for managing the data are expanding just as rapidly
      o Immutable compliance attribute needs to be settable on the file when it is created
      o Security labeling is a framework for reducing mistakes and making malicious misuse harder
          - Dave Quigley: this is policy. Not required.
          - Eisler: Yes, but being able to impose that policy makes compliance a more tractable task for storage administrators.
          - See draft-quigley-nfsv4-sec-label
  - Bug fixes and minor enhancements
      o Examples include
          - pNFS connectivity problem (see draft-faibish-nfsv4-pnfs-access-permissions-check-01)
          - Trunking discovery
          - Hints of I/O pattern
          - E.g. much harder to discern sequential access when pNFS is in use
  - Proposed next steps
      o Make draft-eisler-nfsv4-minorversion-2-requirements a work item of NFSv4 - November 16, 2009
          - Lars Eggert (Transport Area Advisor) points out that it is unclear to him that there is consensus for all these items.
          - Eisler: Agreed. However a requirements I-D is a framework for driving consensus. draft-eisler-nfsv4-minorversion-2-requirements-02 is not final word on NFSv4.2 work items.
          - Lars: also wants NFSv4.0-bis and NFSv4.1 out and significant progress toward WG Last Call on remaining FedFS items before we try to push new items onto the WG charter.
      o Drive to WG consensus - January 2010
      o WG Last Call - February 2010
      o Re-charter WG, based on final requirements - March 2010 (before Anaheim IETF meeting)

pNFS Access Permission Check (Sorin Faibish)

  - Follow up to work from Stockholm.
  - Problem is
      o pNFS clients can receive valid layouts to a DS, without being able to access it for I/O
      o There is no mechanism for detection or correction on the MDS server
      o MDS has no info about permissions access to DS for fallback
      o This is a serious scalability problem for pNFS.
      o client will have to fall back to write through metadata
      o Permission denial is not detected at mount time, it is detected at I/O time...
      o MDS has no info about permission access to DS for fallback
      o MDS doesn't check client permissions except on fallback detection
  - Protocol gaps
      o There is no error reporting mechanism for client and MDS for permission access issues
      o MDS can deliver valid layout to clients that have no permission to a DS without checking.
      o There is no correction mechanism to allow the MDS to recall a layout and remove the DS with issues.
      o The permission problem is not reported at mount time (/ is pNFS mounted) and may have a performance penalty during I/O
      o No guarantees that fallback to MDS will succeed.
      o pNFS doesn't address the protocol between the MDS and DS
  - Remedies
      o A protocol change is needed. See draft-faibish-nfsv4-pnfs-access-permissions-check
      o Add access permission error reporting to client and server using a new LAYOUTRETURN command
      o Add a new LAYOUTRECALL_CB command requiring the client to perform a permission check, and return all layouts for DS with permissions issues.
      o Leave the detection of permission problem condition as a recommendation for the server implementation
      o Could be a problem only for a few clients...
      o On detection, the server will remove the DS from the list of valid DSes, or flag it as inaccessible, and will recall all layouts that include that DS and send new layouts excluding the DS to clients.
  - Error reporting
      o Add client error reporting to LAYOUTRETURN. Opaque for permission access denial before fallback to NFS.
          - Marc Eshel: why not do general error reporting?
          - Eisler: Agreed. For example lack of connectivity is another error condition worth reporting to the MDS (from the client)
      o Faibish: yes, might be able to add issues like that. Might want to do this for NFSv4.1.
          - Eisler: 4.1 is off the table.
      o Introduce new LAYOUTRETURN_DEVICE command, which returns all layouts for the denied DS, and report a new error case.
      o Same error reporting will be used in combination with the new LAYOUTRETURN/CB_LAYOUTRECALL.
  - Permission checks
      o Detection of the problem is left as an exercise to the client.
      o Protocol defines the permission checks and responses to permission issues. A new LAYOUT operation could be introduced in 4.2 as well as a new error reporting mechanism
      o A new CB_LAYOUTRECALL command could be introduced in 4.2 asking for a permissions check (not connectivity).
      o A new LAYOUTRETURN command will be introduced, returning all layouts for a given DS/device before a fallback to write through MDS.
      o Eisler: If you do add new LAYOUT operation, why do we need LAYOUTRETURN?
          - Faibish: These are alternatives for discussion
      o A series of alternatives were proposed. See slides...
      o See implementation example
      o For the object layout type, error report already exists - see draft-ietf-nfsv4-pnfs-objs-12
  - Questions
      o Is permission check required, or is error reporting enough?
      o Is this issue a protocol change or an optimization?
      o Are the protocol changes too complex for the pNFS protocol?
      o Are the new layout operations needed or should we modify existing operations?
  - Myklebust: What is the justification for a new callback?
      o Faibish: It would be used when MDS finds out from a client that permissions to a certain DS are broken, and the MDS wants to determine if the problem affects other clients.

NFSv4 MAC attribute interoperability (Dave Quigley)

  - Multiple MAC models exist
      o MLS/Biba
      o Type enforcement
  - Multiple policies exist
      o RHEL-4/5/Fedora 9-11
      o RHEL MLS vs. trusted extensions MLS
  - Policy definitions must be flexible
  - Accommodating everyone in one format is impossible
  - Quigley: Not sure this is local to the NFSv4 WG. Wants to define a model that is more general.
      o Eisler: Does it belong in the IETF?
          - Quigley: IANA is a fit for label formats
  - NFSv4 MAC attributes contain 2 components
      o Opaque label data
      o Some sort of policy/model identifier
  - How do we use the opaque data section?
  - How do we use the policy/model identifier?
  - Old idea (DOIs)
      o A DOI is a 32-bit value
          - Identifies a MAC model + policy
      o Problem is DOI scales badly, and is difficult to implement.
  - New idea is LFS
      o LFS - label format specifier
          - identifies an entry in a label format registry
          - separates label format from meaning
  - Label format registry
      o Contains entries describing structure of the opaque field
      o Registry managed by an external entity
          - Maintained by IANA
      o Entry 0 reserved for keeping the field completely opaque.
  - Contents of an entry
      o Entry contains unique identifier
      o Description of the format
          - colon separated string of strings
          - description of the binary encoding of label data
          - Comma separated key/value pairs
      o Reference to a document describing format.
  - Example
      o Deployment uses CALIPSO style MLS with labeled NFS
      o Registers LFS 1 as a CALIPSO label format
          - Places CALIPSO draft as label description
      o Format contains DOI to specify policies
      o Label now has 2 identifiers
          - LFS@
      o Jarrett Lu: How do you reconcile different policies?
          - Quigley: That's what the policy field is for. Scope is outside of NFS
      o Myklebust: Do Linux distributors need to register every version with IANA?
          - Dave Q: No. For example, someone will register the SELinux label format with IANA, and then register an entity that would act as the registrar for the different policy versions. IANA does not need to be policy broker.
      o Eisler: Is there a mandated LFS that NFSv4.x server and client must implement?
          - Quigley: Open to discussion.
  - MAC attribute needs to handle more than MLS
      o Do not expect MLS to be preferred way going forward. There has been a move away from MLS. Type Enforcement approach is the most popular replacement (being used with SE Linux).
  - A question for Spencer Shepler: should registry documents be NFSv4 work items?

NFSv4.2 sparse files (Marc Eshel)

  - Sparse files are a common way to represent huge files
      o Database files
      o HPC applications
      o Virtual machine images
  - Applications are not aware of file organization
      o Read holes
      o Pre-fetch holes
  - Simple proposal to change the READ operation
      o New return code when holes are read
      o Don't return data that is all zeros
      o Return extra info to tell where the next real data is found.
  - Myklebust: Do we need this in addition to the de-duplication I-D (draft-eisler-nfsv4-pnfs-dedupe)?
      o Eshel: Does that require pnfs?
          - Eisler: No, it uses layouts but not pnfs. Could achieve the same thing as this proposal. Difference is that in the de-duplication draft, client can get the entire map of holes before starting reading. In this proposal, the client is prevented from reading a hole. Possibly Marc's idea and draft-eisler-nfsv4-pnfs-dedupe are complementary.
      o Eisler: Marc's proposal uses the count4 data type which is just 32 bits. This has been an ongoing issue with NFSv4.x proposals. Perhaps want to make this 64-bit. Maybe time to retire READ and WRITE anyway, since they are limited to 4GB reads and writes.
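
Illustration (not from the meeting): the READ change proposed in the sparse files discussion can be sketched in Python as below. This is only a sketch under stated assumptions, not the wire format of the proposal, which these notes do not record. The names sparse_read, Extent, DataReply and HoleReply are hypothetical, and the file is modeled as a sorted list of non-overlapping data extents. The COUNT4_MAX check reflects the point that count4 is a 32-bit quantity.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Union

# count4 is an unsigned 32-bit integer, so one READ can ask for at most this.
COUNT4_MAX = 2**32 - 1


class Extent(NamedTuple):
    """Hypothetical model of one run of real data in a sparse file."""
    offset: int
    data: bytes


class DataReply(NamedTuple):
    """Normal reply: the bytes that were read (possibly short)."""
    data: bytes


class HoleReply(NamedTuple):
    """Hypothetical 'hole' reply: no zeros are sent, only a hint."""
    next_data: Optional[int]  # offset of next real data, None if none remains


def sparse_read(extents: List[Extent], file_size: int,
                offset: int, count: int) -> Union[DataReply, HoleReply]:
    """Read from a file modeled as sorted, non-overlapping extents."""
    if not 0 <= count <= COUNT4_MAX:
        raise ValueError("count does not fit in a 32-bit count4")
    if offset >= file_size:
        return DataReply(b"")  # at or past end of file

    # Find the last extent that starts at or before the requested offset.
    starts = [e.offset for e in extents]
    i = bisect_right(starts, offset) - 1
    if i >= 0:
        ext = extents[i]
        if offset < ext.offset + len(ext.data):
            lo = offset - ext.offset
            # Short read: stop at the end of this extent.
            return DataReply(ext.data[lo:lo + count])

    # The offset falls in a hole: report where real data resumes, if anywhere.
    nxt = extents[i + 1].offset if i + 1 < len(extents) else None
    return HoleReply(nxt)


if __name__ == "__main__":
    layout = [Extent(0, b"abcd"), Extent(1000, b"wxyz")]
    print(sparse_read(layout, 2000, 2, 10))    # data inside the first extent
    print(sparse_read(layout, 2000, 100, 10))  # hole before offset 1000
```

A client receiving the hole reply can skip straight to the reported offset instead of reading zeros, which is the behavior the proposal describes. Returning the whole hole map up front, as in the de-duplication draft, would be a different design.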