idnits 2.17.1

draft-eisler-nfsv4-minorversion-2-requirements-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.ii or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 19, 2009) is 5302 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Obsolete informational reference (is this intentional?): RFC 3530
     (Obsoleted by RFC 7530)

     Summary: 1 error (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Internet Engineering Task Force                           M. Eisler, Ed.
Internet-Draft                                                    NetApp
Intended status: Informational                          October 19, 2009
Expires: April 22, 2010


                        Requirements for NFSv4.2
           draft-eisler-nfsv4-minorversion-2-requirements-01

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 22, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.

Abstract

   This document proposes requirements for NFSv4.2.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Efficiency and Utilization Requirements
     2.1.  Capacity
     2.2.  Network Bandwidth and Processing
   3.  Flash Memory Requirements
   4.  Compliance
   5.  Incremental Improvements
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Author's Address
1.  Introduction

   NFSv4.1 [I-D.ietf-nfsv4-minorversion1] is an approved specification.
   The NFSv4 [RFC3530] community has indicated a desire to continue
   innovating NFS, specifically via a new minor version of NFSv4,
   namely NFSv4.2.  The desire for future innovation is primarily
   driven by two trends in the storage industry:

   o  High efficiency and utilization of resources such as capacity,
      network bandwidth, and processors.

   o  Solid state flash storage, which promises faster throughput and
      lower latency than magnetic disk drives and lower cost than
      dynamic random access memory.

   Secondarily, innovation is being driven by the trend toward stronger
   compliance with information management requirements.  In addition,
   as might be expected with a complex protocol like NFSv4.1,
   implementation experience has shown that minor changes to the
   protocol would be useful to improve the end user experience.

   This document proposes requirements along these four themes, and
   attempts to strike a balance between stating the problem and
   proposing solutions.  With respect to the latter, some thinking
   among the NFS community has taken place, and a future revision of
   this document will reference embodiments of such thinking.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Efficiency and Utilization Requirements

2.1.  Capacity

   Despite the capacity of magnetic disks continuing to increase at
   exponential rates, the storage industry is under pressure to make
   the storage of data increasingly efficient, so that more data can be
   stored.  The driver for this counter-intuitive demand is that disk
   access times are not improving anywhere near as quickly as
   capacities.  The industry has responded to this development by
   increasing effective data density, limiting the number of times a
   unique pattern of data is stored on a storage device.  For example,
   some storage devices support de-duplication.  When storing two
   files, a storage device might compare them for shared patterns of
   data, store each shared pattern just once, and set the reference
   count on the blocks of the unique pattern to two.  With de-
   duplication, the number of times a storage device has to read a
   particular pattern from disk is reduced to just once, thus improving
   average access time.
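   The reference-counting behavior described above can be shown with a
   short sketch.  The following C fragment is purely illustrative and
   is not part of any proposed protocol: the structure and function
   names are invented for this example, and a real storage device would
   match blocks by content fingerprint against persistent metadata
   rather than by a memcmp() scan of an in-memory table.  It shows two
   identical blocks, written on behalf of two files, being stored once
   with a reference count of two.

   /*
    * Illustrative sketch of block de-duplication with reference
    * counting.  All names are hypothetical.
    */
   #include <stdio.h>
   #include <string.h>

   #define BLOCK_SIZE 8
   #define MAX_BLOCKS 16

   struct stored_block {
       unsigned char data[BLOCK_SIZE];
       int refcount;
   };

   static struct stored_block store[MAX_BLOCKS];
   static int nblocks;

   /* Store one block; return its index in the de-duplicated store. */
   static int store_block(const unsigned char *data)
   {
       for (int i = 0; i < nblocks; i++) {
           if (memcmp(store[i].data, data, BLOCK_SIZE) == 0) {
               store[i].refcount++;       /* pattern already present */
               return i;
           }
       }
       memcpy(store[nblocks].data, data, BLOCK_SIZE);
       store[nblocks].refcount = 1;       /* first reference */
       return nblocks++;
   }

   int main(void)
   {
       unsigned char a[BLOCK_SIZE] = "PATTERN";
       unsigned char b[BLOCK_SIZE] = "PATTERN";  /* identical content */

       int ia = store_block(a);   /* on behalf of file A */
       int ib = store_block(b);   /* on behalf of file B */

       printf("blocks stored: %d, refcount of shared block: %d\n",
              nblocks, store[ia].refcount);
       return ia == ib ? 0 : 1;
   }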
   For a file access protocol such as NFS, this capacity efficiency
   trend implies several requirements:

   o  The "space_used" attribute of NFSv4 does not report meaningful
      information.  Removing a file with a "space_used" value of X
      bytes does not mean that the file system will see an increase of
      X available bytes.  Providing more meaningful information is a
      requirement.

   o  Because it is probable, especially for applications such as
      hypervisors, that an NFSv4 client is accessing multiple files
      with shared blocks of data, it is in the interest of both the
      client and the server for the client to know which blocks are
      shared, so that they are not read multiple times and not cached
      multiple times.  Providing a block map of shared blocks is a
      requirement.

   o  If an NFSv4 client is aware of which patterns exist in which
      files, then when it wants to write pattern X to file B at offset
      J, and it knows that X already exists at offset I of file A, it
      can advise the server of its intent, and the server can arrange
      for pattern X to appear in file B as a zero-copy operation.  Even
      if the server does not support de-duplication, it can at least
      perform a local copy that saves network bandwidth and processor
      overhead on the client and server.

   o  File holes are patterns of zeros that in some file systems are
      represented as unallocated blocks.  In a sense, holes are the
      ultimate de-duplicated pattern.  While proposals to extend NFS to
      support hole punching have been around since the 1980s, until
      recently there have not been NFS clients that could make use of
      hole punching.  The Information Technology (IT) trend toward
      virtualizing operating environments via hypervisors has resulted
      in a need for hypervisors to translate a (virtual) disk command
      to free a block into an NFS request to free that block.  On the
      read side, if a file contains holes, then again, as the ultimate
      in de-duplication, it would be better for the client to be told
      that the region it wants to read has a hole, instead of being
      returned long arrays of zero bytes.  Even if a server does not
      support holes on write or read, avoiding the transmission of
      zeroes will save network bandwidth and reduce processor overhead.
      (A sketch of the corresponding client-side interfaces follows
      this list.)
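   As a minimal sketch of what the hole-related requirements look like
   from the client side, the fragment below uses existing Linux-
   specific local interfaces: fallocate() with FALLOC_FL_PUNCH_HOLE to
   free a block range, and lseek() with SEEK_DATA/SEEK_HOLE to discover
   where data actually exists.  These interfaces are shown only to
   illustrate the kind of request an NFS client would need to forward
   to the server; they are not NFSv4.2 operations, and the file name is
   a placeholder.

   /* Illustrative only: Linux-specific sparse-file interfaces. */
   #define _GNU_SOURCE
   #include <fcntl.h>
   #include <stdio.h>
   #include <unistd.h>

   int main(void)
   {
       int fd = open("sparse.img", O_RDWR);
       if (fd < 0) {
           perror("open");
           return 1;
       }

       /* Free ("punch") a 1 MiB region instead of writing zeroes. */
       if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     0, 1024 * 1024) < 0)
           perror("fallocate");

       /* On the read side, learn where data and holes are so that
        * holes never have to be transferred or cached. */
       off_t data = lseek(fd, 0, SEEK_DATA);
       off_t hole = lseek(fd, 0, SEEK_HOLE);
       printf("first data at %lld, first hole at %lld\n",
              (long long)data, (long long)hole);

       close(fd);
       return 0;
   }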
2.2.  Network Bandwidth and Processing

   The computational capabilities of processors continue to grow at an
   exponential rate.  However, as noted previously, because disk access
   times are not showing a commensurate exponential decrease, disk
   performance is not tracking processor performance.  In addition,
   while network bandwidth is increasing exponentially, unlike disk
   capacities and processor bandwidth, the improvement is not seen on a
   1-2 year cycle, but happens on something closer to a 10 year cycle.
   The lag of disk and network performance behind processor performance
   means that there is often a discontinuity between the processing
   capabilities of NFS clients and the speed at which they can extract
   data from an NFS server.  For some use cases, much of the data that
   is read by one client from an NFS server also needs to be read by
   other clients.  Re-reading this data wastes network bandwidth and
   processing on the NFS server.  This same observation has driven the
   creation of peer-to-peer content distribution protocols, where data
   is read directly from peers rather than servers.  It is apparent
   that a similar technique could be used to offload primary storage.

   The pNFS protocol distributes the I/O to a set of files across a
   cluster of data servers.  Arguably, its primary value is in
   balancing load across storage devices, especially when it can
   leverage a back end file system or storage cluster with automatic
   load balancing capabilities.  In NFSv4.1, no consideration was given
   to metadata.  Metadata is critical to several workloads, to the
   point that, as defined in NFSv4.1, pNFS will not offer much value in
   those cases.  The load balancing capabilities of pNFS need to be
   brought to metadata.

   From an end user perspective, the operations performed on a file
   include creating, reading, writing, deleting, and copying.  NFSv4
   has operations for all but the last.  While file copy has been
   proposed for NFS in the past, it was always rejected because of the
   lack of Application Programming Interfaces (APIs) within existing
   operating environments to issue a copy operation.  The IT trend
   toward virtualization via hypervisors has changed the situation: the
   emerging use case is to copy a virtual disk.  The use of a copy
   operation will save network bandwidth on the client and server, and
   where the server supports it, intra-server file copy has the
   potential to avoid all physical data copy.
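   To make the copy requirement concrete, the sketch below expresses a
   whole-file copy through the Linux copy_file_range() interface.  This
   interface is an assumption of the illustration, not part of any
   NFSv4.2 proposal: it simply shows the shape of an application-
   visible copy call that lets the data movement be offloaded to the
   kernel, and potentially to the server, instead of being read by the
   client and written back over the network.  The file names are
   placeholders.

   /* Illustrative only: offloaded copy via Linux copy_file_range(). */
   #define _GNU_SOURCE
   #include <fcntl.h>
   #include <stdio.h>
   #include <sys/stat.h>
   #include <unistd.h>

   int main(void)
   {
       int in = open("vmdisk.img", O_RDONLY);
       int out = open("vmdisk-clone.img",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
       struct stat st;

       if (in < 0 || out < 0 || fstat(in, &st) < 0) {
           perror("open/fstat");
           return 1;
       }

       off_t remaining = st.st_size;
       while (remaining > 0) {
           /* No user-space data buffer: the kernel (or the file
            * server, when the protocol supports it) moves the bytes. */
           ssize_t n = copy_file_range(in, NULL, out, NULL,
                                       remaining, 0);
           if (n <= 0) {
               perror("copy_file_range");
               return 1;
           }
           remaining -= n;
       }

       close(in);
       close(out);
       return 0;
   }

   The property of interest for NFS is that, with such an interface, no
   file data needs to traverse the network when the server can perform
   the copy itself.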
3.  Flash Memory Requirements

   Flash memory is rapidly filling the wide gap between expensive but
   fast Dynamic Random Access Memory (DRAM) and inexpensive but slow
   magnetic disk.  The cost per bit of flash is between DRAM and disk.
   The access time per bit of flash is between DRAM and disk.  This has
   resulted in the File access Operations Per Second (FOPS) per unit of
   cost of flash exceeding that of both DRAM and disk.  Flash can be
   easily added as another storage medium to NFS servers, and this does
   not require a change to the NFS protocol.  However, the value of
   flash's superior FOPS is best realized when flash is closest to the
   application, i.e., on the NFS client.  One approach would be to
   forgo the use of network storage and revert to Direct Attached
   Storage (DAS).  However, this would require that the data protection
   found in modern storage devices be brought to DAS, and this is not
   always convenient or cost effective.  A less disruptive way to
   leverage the full FOPS of flash would be for NFSv4 clients to use
   flash for caching of data.

   Today NFSv4 supports whole file delegations for enabling caching.
   Such a granularity is useful for applications like user home
   directories where there is little file sharing.  However, NFS is
   used for many more workloads, which include file sharing.  In these
   workloads, files are shared, whereas individual blocks might not be.
   This drives a requirement for sub-file caching.

4.  Compliance

   New regulations for the IT industry limit who can view what data.
   NFSv4 has Access Control Lists (ACLs), but the ACL can be changed by
   the nominal file owner.  In practice, the end user that owns the
   file (essentially, has the right to delete the file or give
   permissions to other users) is not the legal owner of the file.  The
   legal owner of the file wants to control not just who can access
   the file, but also to whom the content of the file can be passed.
   The IT industry has addressed this need in the past with the notion
   of security labeling.  Labels are attached to devices, files, users,
   applications, network connections, etc.  When the labels of two
   objects match, data can be transferred from one to the other.  For
   example, a label called "Secret" on a file results in only users
   with a "Secret" security clearance being allowed to view the file,
   regardless of what the ACL says.

   Attaching a label to a file requires that the label be created
   atomically with the file, which means that a new RECOMMENDED
   attribute for a security label is needed.

5.  Incremental Improvements

   Implementation experience with NFSv4.1 and related protocols, such
   as SMB2, has shown a number of areas where the protocol can be
   improved.

   o  Hints for the type of file access, such as sequential read.
      While traditionally NFS servers have been able to detect read-
      ahead patterns, with the introduction of pNFS this will be
      harder.  Since NFS clients can detect patterns of access, they
      can advise servers.  In addition, the UNIX/Linux madvise() API is
      an example of how applications can provide direct advice that
      could be relayed to the NFS server.  (A sketch follows this
      list.)

   o  Head of line blocking.  Consider a client that wants to send
      three operations: a file creation, a read for one megabyte, and a
      write for one megabyte.  Each of these might be sent on a
      separate slot.  The client determines that it is not desirable
      for the read operation to wait for the write operation to be
      sent, so it sends the create first.  However, it does not want to
      serialize the read and write behind the create, so the read is
      sent next, followed by the write.  On the reply side, the server
      does not know that the client wants the create satisfied first,
      so the read and write operations are processed first.  By the
      time the create is performed on the server, the response to the
      read may still be filling the reply channel.  While NFSv4.1 could
      solve this problem by associating two connections with the
      session, using one connection for the create and the other for
      the read and write, multiple connections come at a cost.  The
      requirement is to solve this head of line blocking problem.
      Tagging a request as one that should go to the head of the line
      for request and response processing is one possible way to
      address it.

   o  pNFS connectivity/access indication.  If a pNFS client is given a
      layout that directs it to a storage device it cannot access due
      to connectivity or access control issues, it has no way in
      NFSv4.1 to indicate the problem to the metadata server.

   o  RPCSEC_GSS sequence window size on the backchannel.  The NFSv4.1
      specification does not provide a way for the client to tell the
      server what window size to use on the backchannel.  The
      specification says that the window size will be the same as the
      one the server uses.  Potentially, a server could use a very
      large window size that the client does not want.

   o  Trunking discovery.  The NFSv4.1 specification is long on how a
      client verifies whether trunking is available between two
      connections, but short on how a client can discover destination
      addresses that can be trunked.  It would be useful if there were
      a method (such as an operation) to get a list of destinations
      that can be session or client ID trunked, as well as a
      notification when the set of destinations changes.
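   As a minimal sketch of the access-hint item in the list above, the
   fragment below uses the existing POSIX posix_fadvise() interface to
   declare sequential access to a file.  Whether and how such a hint
   would be carried to an NFS server is exactly the open protocol
   question; the fragment only illustrates the application-side advice
   that exists today, and the file name is a placeholder.

   /* Illustrative only: application-level sequential-access hint. */
   #define _POSIX_C_SOURCE 200112L
   #include <fcntl.h>
   #include <stdio.h>
   #include <unistd.h>

   int main(void)
   {
       int fd = open("dataset.bin", O_RDONLY);
       if (fd < 0) {
           perror("open");
           return 1;
       }

       /* Declare that the whole file will be read sequentially, so
        * that read-ahead can be scheduled ahead of demand. */
       int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
       if (err != 0)
           fprintf(stderr, "posix_fadvise: error %d\n", err);

       close(fd);
       return 0;
   }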
6.  IANA Considerations

   None.

7.  Security Considerations

   None.

8.  Acknowledgements

   Thanks to Dave Noveck for reviewing this document and providing
   valuable feedback.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [I-D.ietf-nfsv4-minorversion1]
              Shepler, S., Eisler, M., and D. Noveck, "NFS Version 4
              Minor Version 1", draft-ietf-nfsv4-minorversion1-29 (work
              in progress), December 2008.

   [RFC3530]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
              Beame, C., Eisler, M., and D. Noveck, "Network File
              System (NFS) version 4 Protocol", RFC 3530, April 2003.

Author's Address

   Michael Eisler (editor)
   NetApp
   5765 Chase Point Circle
   Colorado Springs, CO 80919
   US

   Phone: +1 719 599 8759
   Email: mike@eisler.com