[nfsv4] 75th IETF
July 25-31, 2009, Stockholm, Sweden
WEDNESDAY, July 29, 2009, 0900-1015, Morning Session I

=== Agenda Bash and Note Well

No additions.

=== Server Side Copy Offload Operation (Lentini)

Client triggers a local copy on the NFS server, and perhaps an NFS server-to-server copy. Extensive feedback and comments on the NFSv4 WG mailing list starting in April 2009.

Today: to copy a file, the client reads it and then writes it back: wasteful.
Proposal: the client sends a copy operation to the server, which does the copy - saves network and client CPU resources.

Today: copying from one server to another has the same issues.
Proposal: the client arranges for the copy to take place: COPY_NOTIFY to the source server, COPY to the destination server. It is possible for the source and destination servers to communicate on a different network than the client-to-server networks.

Black: How does security work? (Covered later.)

Faibish: Is there a check that there is a connection between the source and destination servers?
Lentini: The reply to COPY_NOTIFY has a set of addresses, so the source server can tell the client which addresses the destination can use to reach it.

Lentini: Uses:
- File restore (snapshot).
- Virtualized environments - allows a hypervisor to snapshot, clone, and migrate a VM's storage.

File versus directory copy: the proposal is for file-based copy only.

Synchronous vs. asynchronous: the server decides.
- Client gets a completion message (synchronous case).
- Client gets an in-progress message with a handle to the server (asynchronous case); when the copy completes, the server sends a completion message.

Partial file copy is supported.

Space reservation: a bit to ensure there is enough destination space; prevents a destination server from thin provisioning a file that is dominated by zero-filled content.

Intra- and inter-server copies. In the two-server case, no server-to-server protocol is mandated - proprietary and standards-based protocols are both possible.

Selected pull instead of push (push would mean the source server has write permission on the destination server, which is less secure). The reply to COPY_NOTIFY has a list of URLs (addresses or services). These can be NFS, ftp, http, etc. URLs.

Black: What is the definition of an NFS URL, where is it specified, and how is the version negotiated?
Eisler: The NFS URL was defined in the 1990s (re: WebNFS) with RFCs. The version is negotiated the same way NFS negotiates versions today (in-band, via the major version in the ONC RPC header and the minor version in the COMPOUND header).

Security: requirements listed; two options: RPCSEC_GSSv3 (work in progress) or host-based (AUTH_SYS).

Black: Full delegation mechanism, or delegation restricted to the copy operation?
Lentini: Restricted to the NFS user credentials, the copy privilege, and the source file being copied.

Faibish: Is it assumed the client has access to both servers?
Eisler: Yes, the client must have access to both.

Pawlowski: Faibish to review the draft to see if there is a security concern on the copy operation from the perspective of client authorization.

Next steps:
- Get the RPCSEC_GSSv3 draft completed.
- Make copy offload part of the WG charter.
- Make it a piece of NFSv4.2.

Black: Is NFSv4.2 part of the WG charter?
Pawlowski: Nope! We should do this!

Pawlowski: COPY comes up every few years within the WG, and WG consensus has been to push back because of the lack of APIs that would use it. Inter-server COPY has also generated push back because NFS previously had no notion of a formal relationship between NFS servers. What has changed is that APIs exist, and with the introduction of pNFS, we now have an example of formal relationships between NFS servers.
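A minimal sketch of the client-orchestrated inter-server flow as presented, for illustration only: COPY_NOTIFY and COPY are the operations from the proposal, but the server objects and helper methods below are hypothetical, not the draft's actual XDR or API.

    # Hypothetical sketch of the proposed inter-server copy flow.
    # COPY_NOTIFY and COPY are the operations from the proposal; the
    # objects and method names are invented for illustration.

    def inter_server_copy(src_server, dst_server, src_fh, dst_fh):
        # 1. COPY_NOTIFY to the source server: authorize the destination
        #    to pull the file.  The reply carries the list of URLs
        #    (NFS, http, ftp, ...) the destination may pull from.
        notify = src_server.copy_notify(src_fh, destination=dst_server.address)

        # 2. COPY to the destination server: ask it to pull the data.
        res = dst_server.copy(sources=notify.urls, target=dst_fh,
                              offset=0, count=None)   # None = whole file

        # 3. The server chooses sync vs. async completion.
        if res.complete:                    # synchronous: done in one reply
            return res.bytes_copied
        while not res.complete:             # asynchronous: poll the handle
            res = dst_server.copy_status(res.handle)
        return res.bytes_copied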
=== Federated FS (Lentini)

Drafts:
- "Using DNS SRV to Specify a Global File Name Space with NFS version 4" --
- "Administration Protocol for Federated Filesystems" --
- "Requirements for Federated File Systems" --
- "NSDB Protocol for Federated Filesystems" --

Some future extensions (a root fileset FSL type for SMB, etc.) are not in the current drafts.

The requirements document has finished WG Last Call. Shepler owes a shepherding statement to Lars Eggert (Area Advisor); Shepler has committed to Monday, August 10, 2009.

Name space root discovery: defines a DNS record to find the root of the namespace. Ready for last call review in October.
Eggert: Ask the DNS directorate to review beforehand! Lentini to follow up.

Lentini would like to do another pass on the security considerations in the NSDB etc. spec.

Shepler (via Eisler): Is it possible to have WG last call on or before October?
Lentini: There is an NFSv4 bake-a-thon before October. If FedFS is going to be tested there, we would like to leverage that opportunity to shake out protocol details, so stick with October.

The NFSv4 referral mechanism is undefined in RFC 3530 (but fully defined in the NFSv4.1 spec) - there is an expired fs_locations draft. Should we resurrect the draft and include it in RFC3530bis?
Black: Lars - comment on the size of 3530?
Eggert: If you can structure it into multiple documents, that is preferred - but it is OK to put it in 3530, as we have an example that says even longer documents can be processed by the IETF and the RFC Editor.
Eisler: Actually, technically we don't have proof that NFSv4.1 can be processed by the RFC Editor (see the end of these minutes).

AI: Dave Noveck and Tom Haynes to review the course forward on resurrecting the referrals document and whether to fold it into RFC3530bis or not.

NFSv4 multi-domain access (re: Andy Adamson's I-D) will become important; this will be needed as NFS is deployed across domains.

In September, we should have feedback on implementation details of Federated Naming. Should hit last call in October though.
Pawlowski: By October, will we have testing experience that will impact the Last Call schedule?
Lentini: We won't know until September.

=== NFS operation over IPv4 and IPv6 (Alex RN)

The WG charter suggests updating the specifications for IPv6.

IPv6 allows two clients to have the same address on two private networks. The server needs a way to distinguish between these two clients.

IPv6 is a problem/issue for:
- multi-homing
- RPCBIND
- NLM
- NFSv4.0 client identification
- the reply cache
- dual- to single-stack transitions

Multi-homing: how are IPv4 and IPv6 different?
- Private address boundaries: an NFS client may boot while IPv6 is also being bootstrapped.
- IPv6 networks can potentially have the same subnet IDs for different private addresses.
- Server scope ambiguity.
- Need to store extra info for private addresses. Embedded addresses: a separate address prefix with the IPv4 address included.

RPCBIND issues:
- RPCBIND should always be used for IPv6 if the client or server supports it; it is preferred over portmap because the netid and universal address provide unambiguous information about whether a service is supported on IPv6 (see the encoding sketch below).
- Problem: advertising non-local info.
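For reference, the netid/universal-address encoding that makes RPCBIND unambiguous follows the rpcbind universal address convention (RFC 1833), with the "tcp"/"tcp6" netids used by RFC 3530; the function itself is an invented illustration.

    import ipaddress

    def to_netid_uaddr(ip: str, port: int) -> tuple[str, str]:
        """Encode an (IP, port) endpoint as an ONC RPC (netid, universal
        address) pair: the text address followed by the port as two
        dot-separated decimal octets (high byte, low byte)."""
        addr = ipaddress.ip_address(ip)
        netid = "tcp6" if addr.version == 6 else "tcp"
        return netid, f"{addr.compressed}.{port >> 8}.{port & 0xFF}"

    # The same NFS service (port 2049 = 0x0801) advertised over both
    # families gets distinct, unambiguous netid/uaddr pairs:
    print(to_netid_uaddr("192.0.2.7", 2049))    # ('tcp', '192.0.2.7.8.1')
    print(to_netid_uaddr("2001:db8::7", 2049))  # ('tcp6', '2001:db8::7.8.1')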
NLM issues:
- Assuming a dual-stack NLM client: if an IPv4 or IPv6 path goes down, locks can get 'stuck'; there is no clear-cut notion of an NLM client identifier, i.e., whether the IPv4 or the IPv6 NLM client owns the state.
- Solution: if the monitor name is the same as the client name, the server can contact the client over the IPv4 network rather than the IPv6 network.
- Client restart: the client can reconnect and re-establish locks.

NFSv4.0 client identification is needed. E.g., a client that gets a delegation through both the IPv4 and IPv6 address families causes revocation even though it is the same client. Solution: use the same client string across address families and send a SETCLIENTID whenever a new TCP connection to an NFSv4.0 server is created.

Reply cache and Exactly Once Semantics (EOS): if the transmit is from one address family and the retransmit is over another address family, the retransmit will not be recognized as a retry, will miss the reply cache, and will break EOS.
Eisler: What's the solution for NFSv3?
Alex: The solution is only for NFSv4. There is no solution being proposed for NFSv2 or NFSv3. For NFSv4.0, the solution is to include the client ID in the reply cache key, in addition to the xid (i.e., don't use the source IP address and port).
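A minimal sketch of the proposed reply cache keying (the class and all names are hypothetical illustrations, not from any draft):

    # Duplicate reply cache (DRC) keyed on (client identifier, xid)
    # instead of (source IP, source port, xid), so a retransmit arriving
    # over the other address family still hits the cache and EOS holds.

    class ReplyCache:
        def __init__(self):
            self._cache = {}          # (client_id, xid) -> cached reply

        def lookup(self, client_id: bytes, xid: int):
            """Return the cached reply for a retry, or None if new."""
            return self._cache.get((client_id, xid))

        def insert(self, client_id: bytes, xid: int, reply: bytes):
            self._cache[(client_id, xid)] = reply

    drc = ReplyCache()
    # First transmit over IPv4: miss, execute, cache the reply.
    if drc.lookup(b"client-A", 42) is None:
        drc.insert(b"client-A", 42, b"<reply>")
    # Retransmit over IPv6: same client ID and xid, so it hits the cache
    # even though the source address and port differ.
    assert drc.lookup(b"client-A", 42) == b"<reply>"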
Summary: expand the charter to include implementation advice for NFSv2, v3, v4, and v4.1 for IPv6.
Pawlowski: Why expand for NFSv[23]? I thought this was not solvable for NFSv[23]?
Eisler: Some things, like switching address families between retransmits, produce unsolvable problems. The WG could produce a document that advises implementations to strive to avoid such conditions, or at least advises them that when it has to be done, unsolvable problems will result.
Pawlowski: Nonetheless, I don't want the WG to spend lots of time on v2 and v3. There might be advice to give. We definitely need to nail v4 and v4.1.
Black: It would be useful to have advice on how existing v2 and v3 implementations will behave when confronted with IPv6.
Pawlowski: Lars, is this what a BCP would do?
Eggert: BCPs represent a class of document that is stronger than an Informational RFC. Recommendation for how to structure this: (1) If you want to run this in an IPv6-only environment, are there any changes needed to the NFS or ONC RPC protocols? If so, then there is standards-based work here, but that work would be in document(s) separate from a BCP. (2) If you are running in a mixed environment (IPv4 + IPv6 or IPv6 + IPv6, ...), what implementation advice is there? Have a BCP for the latter; the BCP would propose how to deal with this.

=== NFSv4 Multi-Domain Access (Adamson)

Work done at UMich CITI with Kevin Coffman. Proposes taking the draft under the charter. The first draft covers UID mappings; GID mappings are future work.

Definition of a name service: exports a unique UID number space. The focus is on LDAP as a name service.

David Black noted the issue of a multi-named user in DNS domains and NFS naming. E.g., if adamson.com, beepy.com, and eisler.com are hosted on a single machine, then this machine must pick one domain that is none of those three.
Eisler: This is important in light of Federated Name Spaces (multi-realm considerations).
Lentini: Pointed out that the work is applicable/important even if FedFS is not being used.
Faibish: Would like use cases for when this is important.
Eggert: Andy must run this by our LDAP advisor, Leif Johansson (listed on the NFSv4 WG charter).

=== Proposal for an NFSv4 extension to allow the use of NFS clients as pNFS data servers (Adamson, presenting for Trond)

Solutions like cachefs or NFS-server-side data replication are inefficient because the same data is cached multiple times and fetched multiple times from the same origin. The solution is a peer-to-peer approach in which pNFS clients act as data servers. The work being presented has been prototyped by NetApp for Linux and showed "correct" scaling.
Eisler: Define correct?
Adamson: Not sure what Trond meant, but likely linear.

Proposed extensions? Eisler and Black pretty much said the same thing:
- Eisler: This could easily be extended to write-through and write-back (Pawlowski: my sentiment exactly; Eisler made the case for sub-file caching in San Francisco).
- Black: Suggesting another place to dig. I see why you need weak delegation; I just want the delegation to be revocable. What if the client holds a layout? Leverage the pNFS data channel.
Eisler: A big believer in using layouts to extend NFS. Especially with write-through or write-back, this proposal could be highly disruptive.

=== Access checks and pNFS (Sorin Faibish)

Note: no accompanying I-D at this time.

Summary: The problem is not specific to a particular layout type. The proposal is to add a clarifying error code to unsnarl the errors that arise, and perhaps to describe the expected function of the MDS and DS without specifying the protocol between them.

The problem arises when the metadata server does not have permission to access a data server. If the client cannot see all the devices at mount time, the client will fall back to NFS with no indication of an issue: a silent loss of scalability. In the file/object layouts, the client does not check that it has access to the data servers at mount time. If this error is detected at mount time, the problem is much easier to fix; otherwise the admin will not notice the error and scalability will suffer.

Protocol gaps:
- The client doesn't communicate to the MDS that it had an I/O error on the DS.
- The MDS can give clients a layout with no expectation that the client has permission to access the DS.
- A permission problem is not reported at mount time.
- (Two more not covered because of questions interrupting; see slides.)

Adamson: By access, do you mean physical access or permissions?
Faibish: File permissions.
Black: "I can't get there" and "I can get there and I can't see anything" are very similar. Functional requirements/expectations on the MDS-to-DS protocol are reasonable even if the protocol is not specified.
Faibish: Agree.
Eisler: Bullet #1 is the protocol gap.
Pawlowski: This is not earth-shattering to pNFS. This is really a diagnostic error code proposal.
Black: Agree. Eisler: Agree. Faibish: Agree.

Proposed remedies:
1. Add permission checks for the client to access all the data servers (using a list sent by the MDS) at mount time; see the sketch after this list.
2. Add a new client error case for when the client cannot access a data server at mount time, and propagate it to the MDS.
3. Add a permission check of the MDS to the DS after a client reports a permission access error to that DS.
4. Add a new I/O error for when a pNFS client cannot access a DS that was accessible at mount time, and then ask for the redirect.

Adamson: For #1, this makes sense for block. For users, how would you check for every user at mount time?
Faibish: In the file case, this would be for basic client access checks.
Pawlowski: The implementations should be logging these errors. This is not totally a protocol error.
Faibish: Agree.
Black: #3 (and some other items) are implementation advice, not protocol changes.
Black: Complete the I/O successfully through the MDS, but also inform the user via the protocol. Don't fail the I/O.

Continuation of remedies:
5. The pNFS server that granted a layout to the client should check that the client has access to the storage devices (files, LUNs, or objects).
6. The pNFS client should add a new pNFS mount switch to inform the pNFS server of the client's pNFS access intention, and log on both sides (client/server) in case of failure.
7. The pNFS MDS should check that it can perform normal I/Os to any device it hands out in a pNFS layout.
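A hypothetical sketch of the mount-time check in remedy #1. It probes only TCP reachability; a real client would also verify NFS-level permissions, and every name here is invented.

    import socket

    def check_data_servers(device_addrs, timeout=5.0):
        """At mount time, probe every data server address handed out by
        the MDS and return the unreachable ones, so the failure can be
        logged and reported to the MDS instead of silently degrading to
        plain NFS.  device_addrs is a list of (host, port) pairs."""
        unreachable = []
        for host, port in device_addrs:
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    pass    # reachable; a real client would go on to
                            # check NFS-level access permissions here
            except OSError as err:
                unreachable.append(((host, port), err))
        return unreachable

    failed = check_data_servers([("ds1.example.com", 2049),
                                 ("ds2.example.com", 2049)])
    for (addr, err) in failed:
        # Proposed behavior: log locally and propagate an error to the
        # MDS rather than silently losing pNFS scalability.
        print(f"pNFS: data server {addr} unreachable at mount: {err}")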
Black (commenting on item 6): The important underlying principle to write up is opportunistic versus intentional use of pNFS (a topic on the list). In the latter case, you really want to know that it is not working, because there will be major problems otherwise.

Implementation ideas:
* Add an error case into LAYOUTRETURN or LAYOUTCOMMIT.
* Add a new layout return type that is "FSID with prejudice", i.e., return all layouts for this FSID and tell the server that the reason for the return is a connectivity issue.
* Add periodic access permission check retries, and return the layout only after several retries.
* Add a new mount switch -pNFS and a possible error when the pNFS optimization didn't work and the client carries on using plain NFS (not pNFS) to the MDS.

Questions raised on the implementation ideas:
Pawlowski: Are you proposing an alternative scalability protocol to pNFS? If pNFS does not work, we are no worse off than with regular NFS.
Faibish: Described the use cases for the addition of a DS after I/O has already started. I was thinking about programs that run for many days. If DSes come and go, we want to know when the client is not using pNFS.
Black: This seems appropriate as implementation guidance, even if the pNFS MDS-to-DS protocol is not specified. The only protocol change being suggested is how errors are reported between the client and the MDS.

Slide with a list of questions:
* Should we leave this entire issue as an implementation detail?
* Should we include protocol changes to address the scalability limitation to pNFS's scalable protocol?
* If we answer yes to protocol changes, should we introduce a new layout command or modify LAYOUTGET or LAYOUTCOMMIT?
* Should we amend/enhance NFSv4.1 or leave it for v4.2?

Pawlowski: Lars, should we overload the protocol specification with a BCP?
Eggert: BCPs are a class of documents that are used for a lot of things. They are stronger than Informational. A BCP-level statement is a way of saying "the NFSv4 consensus is to do X". Lars doesn't know what we want to say; a BCP might be appropriate or it might not be. The IETF has always been careful not to tell you how to implement the spec. It is useful to communicate implementation advice, though.
Black: The last two questions on the slide are connected. We need to figure out the right way to do the error reporting channel; then we can figure out whether to put this in v4.1 or v4.2.
Faibish: This is where we fail. Early adopters of pNFS will be tripped up by these issues first.
Black: I want to see a reasonable description of the proposed solution.
Lentini: Referring to Black's point, observes that this is the third time v4.2 has come up in this meeting - where do we stand with making it a WG work item?
Pawlowski: Having an 'it takes a village' approach to v4.2. Get Falkner or Eisler to start rolling out a v4.2 potential item list.
Eisler: Will make the v4.2 list ASAP.
Pawlowski: Two weeks from today?
Eisler: Done.

=== Topic: why NFSv4.1 doesn't have a published RFC yet

The BTNS WG needs to resolve an issue with the connection latching specification, which NFSv4.1 indirectly depends on.
Eggert: For 4.1, track down Nico... talk to Russ - check his concern, and the BTNS working group. Eisler should try to meet with the BTNS WG chairs if either or both are (still) at IETF.
Eggert: WRT v4.1, maybe we want to track down (with Nico) and resolve what we need this week.
Pawlowski: I pointed Spencer to Nico... I'll put an Aug 12th timer on getting back to Lars.

=== Wrap-up (Pawlowski)