INTERNET-DRAFT                                              Tom Talpey
Expires: April 2007                                      Chet Juszczak

                                                          October, 2006

                       NFS RDMA Problem Statement
             draft-ietf-nfsv4-nfs-rdma-problem-statement-05

Status of this Memo

By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. The list of
Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This draft addresses applying Remote Direct Memory Access (RDMA) to
the NFS protocols. NFS implementations historically incur
significant overhead due to data copies on end-host systems, as well
as other sources. The potential benefits of RDMA to these
implementations are explored, and the reasons why RDMA is especially
well-suited to NFS and network file protocols in general are
evaluated.

Table of Contents

1. Introduction
   1.1. Background
2. Problem Statement
3. File Protocol Architecture
4. Sources of Overhead
   4.1. Savings from TOE
   4.2. Savings from RDMA
5. Application of RDMA to NFS
6. Conclusions
7. Security Considerations
8. IANA Considerations
9. Acknowledgements
10. Normative References
11. Informative References
Authors' Addresses
Intellectual Property and Copyright Statements

1. Introduction

The Network File System (NFS) protocol (as described in [RFC1094],
[RFC1813], and [RFC3530]) is one of several remote file access
protocols used in the class of processing architecture sometimes
called Network Attached Storage (NAS).

Historically, remote file access has proved to be a convenient,
cost-effective way to share information over a network, a concept
proven over time by the popularity of the NFS protocol. However,
there are issues in such a deployment.

As compared to a local (direct-attached) file access architecture,
NFS removes the overhead of managing the local on-disk filesystem
state and its metadata, but interposes at least a transport network
and two network endpoints between an application process and the
files it is accessing. This tradeoff has to date usually resulted
in a net performance loss as a result of reduced bandwidth,
increased application server CPU utilization, and other overheads.

Several classes of applications, including those directly supporting
enterprise activities in high-performance domains such as database
applications and shared clusters, have therefore encountered issues
with moving to NFS architectures.
While this has been due
principally to the performance costs of NFS versus direct-attached
files, other reasons are relevant, such as the lack of strong
consistency guarantees provided by NFS implementations.

Replication of local file access performance on NAS using
traditional network protocol stacks has proven difficult, not
because of protocol processing overheads, but because of data copy
costs in the network endpoints. This is especially true since host
buses are now often the main bottleneck in NAS architectures
[MOG03] [CHA+01].

The External Data Representation [RFC1832] employed beneath NFS and
RPC [RFC1831] can add more data copies, exacerbating the problem.

Data copy-avoidance designs have not been widely adopted for a
variety of reasons. [BRU99] points out that "many copy avoidance
techniques for network I/O are not applicable or may even backfire
if applied to file I/O." Other designs that eliminate unnecessary
copies, such as [PAI+00], are incompatible with existing APIs and
therefore force application changes.

In recent years, an effort to standardize a set of protocols for
Remote Direct Memory Access (RDMA) over the standard Internet
Protocol Suite has been chartered [RDDP]. Several drafts have been
proposed and are being considered for Standards Track.

RDMA is a general solution to the problem of CPU overhead incurred
due to data copies, primarily at the receiver. Substantial research
has addressed this and has borne out the efficacy of the approach.
An overview of this research is provided by the RDDP "Remote Direct
Memory Access (RDMA) over IP Problem Statement" document [RFC4297].

In addition to the per-byte savings of offloading data copies,
RDMA-enabled NICs (RNICs) offload the underlying protocol layers as
well, e.g., TCP, further reducing CPU overhead due to NAS
processing.

1.1. Background

The RDDP Problem Statement [RFC4297] asserts:

   "High costs associated with copying are an issue primarily for
   large scale systems ... with high bandwidth feeds, usually
   multiprocessors and clusters, that are adversely affected by
   copying overhead. Examples of such machines include all varieties
   of servers: database servers, storage servers, application servers
   for transaction processing, for e-commerce, and web serving,
   content distribution, video distribution, backups, data mining and
   decision support, and scientific computing.

   Note that such servers almost exclusively service many concurrent
   sessions (transport connections), which, in aggregate, are
   responsible for > 1 Gbits/s of communication. Nonetheless, the
   cost of copying overhead for a particular load is the same whether
   from few or many sessions."

Note that each of the servers listed above could be accessing its
file data as an NFS client, or serving that data to such clients as
an NFS server, or acting as both.

The CPU overhead of the NFS and TCP/IP protocol stacks (including
data copies or reduced-copy workarounds) becomes a significant
matter in these clients and servers. File access using locally
attached disks imposes relatively low overhead due to the highly
optimized I/O path and direct memory access afforded to the storage
controller. This is not the case with NFS, which must pass data to,
and especially from, the network and network processing stack to the
NFS stack.
Frequently, data copies are imposed on this transfer,
in some cases several such copies in each direction.

Copies are potentially encountered in an NFS implementation
exchanging data to and from user address spaces, within kernel
buffer caches, in XDR marshalling and unmarshalling, and within
network stacks and network drivers. Other overheads, such as
serialization among multiple threads of execution sharing a single
NFS mount point and transport connection, are additionally
encountered.

Numerous upper layer protocols achieve extremely high bandwidth and
low overhead through the use of RDMA. [MAF+02] shows that the
RDMA-based Direct Access File System (with a user-level
implementation of the file system client) can outperform even a
zero-copy implementation of NFS [CHA+01] [CHA+99] [GAL+99] [KM02].
Also, file data access implies the use of large Upper Layer Protocol
(ULP) messages. These large messages tend to amortize any increase
in per-message costs due to the offload of protocol processing
incurred when using RNICs, while gaining the benefits of reduced
per-byte costs. Finally, the direct memory addressing afforded by
RDMA avoids many sources of contention on network resources.

2. Problem Statement

The principal performance problem encountered by NFS
implementations is the CPU overhead required to implement the
protocol. Primary among the sources of this overhead is the
movement of data from NFS protocol messages to its eventual
destination in user buffers or aligned kernel buffers. Due to the
nature of the RPC and XDR protocols, the NFS data payload arrives at
arbitrary alignment, necessitating a copy at the receiver, and the
NFS requests are completed in an arbitrary sequence.

The data copies consume system bus bandwidth and CPU time, reducing
the available system capacity for applications [RFC4297].
Achieving zero-copy with NFS has, to date, required sophisticated,
version-specific "header cracking" hardware and/or extensive
platform-specific virtual memory mapping tricks. Such approaches
become even more difficult for NFS version 4 due to the existence of
the COMPOUND operation, which further reduces alignment and greatly
complicates ULP offload.

Furthermore, NFS will soon be challenged by emerging high-speed
network fabrics such as 10 Gbit/s Ethernet. Performing even raw
network I/O such as TCP is an issue at such speeds with today's
hardware. The problem is fundamental in nature and has led the IETF
to explore RDMA [RFC4297].

Zero-copy techniques benefit file protocols extensively, as they
enable direct user I/O, reduce the overhead of protocol stacks,
provide perfect alignment into caches, etc. Many studies have
already shown the performance benefits of such techniques [SKE+01]
[DCK+03] [FJNFS] [FJDAFS] [KM02] [MAF+02].

RDMA is compelling here for another reason: hardware-offloaded
networking support by itself does not avoid data copies without
resorting to implementing part of the NFS protocol in the NIC.
Support of RDMA by NFS enables the highest performance at the
architecture level rather than by implementation; this enables
ubiquitous and interoperable solutions.

By providing file access performance equivalent to that of local
file systems, NFS over RDMA will enable applications running on a
set of client machines to interact through an NFS file system, just
as applications running on a single machine might interact through a
local file system.

3. File Protocol Architecture

NFS runs as an ONC RPC [RFC1831] application. Being a file access
protocol, NFS is very "rich" in data content (versus control
information).

NFS messages can range from very small (under 100 bytes) to very
large (from many kilobytes to a megabyte or more). They are all
contained within an RPC message and follow a variable-length RPC
header. This layout provides an alignment challenge for the data
items contained in an NFS call (request) or reply (response)
message.

In addition to the control information in each NFS call or reply
message, sometimes there are large "chunks" of application file
data, for example, read and write requests. With NFS version 4 (due
to the existence of the COMPOUND operation) there can be several of
these data chunks interspersed with control information.

ONC RPC is a remote procedure call protocol that has been run over a
variety of transports. Most implementations today use UDP or TCP.
RPC messages are defined in terms of an eXternal Data Representation
(XDR) [RFC1832], which provides a canonical data representation
across a variety of host architectures. An XDR data stream is
conveyed differently on each type of transport. On UDP, RPC
messages are encapsulated inside datagrams, while on a TCP byte
stream, RPC messages are delineated by a record marking protocol.
An RDMA transport also conveys RPC messages in a unique fashion that
must be fully described if client and server implementations are to
interoperate.

The RPC transport is responsible for conveying an RPC message from a
sender to a receiver. An RPC message is either an RPC call from a
client to a server, or an RPC reply from the server back to the
client. An RPC message contains an RPC call header followed by
arguments if the message is an RPC call, or an RPC reply header
followed by results if the message is an RPC reply. The call header
contains a transaction ID (XID) followed by the program and
procedure number as well as a security credential. An RPC reply
header begins with an XID that matches that of the RPC call message,
followed by a security verifier and results. All data in an RPC
message is XDR encoded.

The encoding of XDR data into transport buffers is referred to as
"marshalling", and the decoding of XDR data from transport buffers
into destination RPC procedure result buffers is referred to as
"unmarshalling". Marshalling therefore takes place at the sender of
any particular message, be it an RPC call or an RPC reply;
unmarshalling takes place at the receiver.

Normally, any bulk data is moved (copied) as a result of the
unmarshalling process, because the destination address is not known
until the RPC code receives control and subsequently invokes the XDR
unmarshalling routine. In other words, XDR-encoded data is not
self-describing, and it carries no placement information. This
results in a data copy in most NFS implementations.
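
To make the copy concrete, the following minimal sketch (all names
and types are illustrative, not taken from any actual NFS
implementation) decodes an XDR opaque field from a receive buffer.
The bulk data begins wherever the variable-length headers happen to
end, so the decoder can do nothing but copy it into the caller's
aligned buffer after parsing:

   /* Sketch: unmarshalling an XDR opaque<> from a receive buffer.
    * 'p' points just past the decoded RPC/NFS headers, at whatever
    * byte alignment they left it. */
   #include <stdint.h>
   #include <string.h>
   #include <arpa/inet.h>

   size_t xdr_decode_opaque(const uint8_t *p, uint8_t *dst,
                            size_t dst_max)
   {
       uint32_t be_len, len;

       memcpy(&be_len, p, sizeof be_len);  /* may be unaligned */
       len = ntohl(be_len);                /* XDR length word */
       if (len > dst_max)
           len = dst_max;

       /* The copy this document is about: the payload cannot be
        * received directly into 'dst', because its position in the
        * stream is unknown until this point. */
       memcpy(dst, p + 4, len);

       return 4 + ((len + 3) & ~(size_t)3);  /* data padded to 4B */
   }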

One mechanism by which the RPC layer may overcome this is for each
request to include placement information, to be used for direct
placement during XDR encode. This "write chunk" can avoid sending
bulk data inline in an RPC message and generally results in one or
more RDMA Write operations.

Similarly, a "read chunk" conveys placement information referring to
bulk data that may be fetched directly, via one or more RDMA Read
operations, during XDR decode. The "read chunk" will therefore be
useful in both RPC calls and replies, while the "write chunk" is
used solely in replies.

These "chunks" are the key concept in an existing proposal
[RPCRDMA]. They convey what are effectively pointers to remote
memory across the network. They allow cooperating peers to exchange
data outside of XDR encodings, but still use XDR for describing the
data to be transferred. And, finally, through use of XDR they
maintain a large degree of on-the-wire compatibility.

The central concept of the RDMA transport is to provide the
additional encoding conventions to convey this placement information
in transport-specific encoding, and to modify the XDR handling of
bulk data.
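
The placement information a chunk conveys can be sketched as below,
loosely following the layout proposed in [RPCRDMA] (field names here
are illustrative; the actual XDR encoding is deferred to that
document):

   /* Sketch: the remote-memory reference conveyed by a chunk. */
   #include <stdint.h>

   struct rdma_segment {
       uint32_t handle;  /* steering tag for a registered region */
       uint32_t length;  /* length of the bulk data, in bytes */
       uint64_t offset;  /* remote virtual address of the data */
   };

   /* A read chunk also records where the segment belongs in the
    * XDR stream, so the receiver can fetch the bulk data with RDMA
    * Read and still decode the surrounding message normally. */
   struct read_chunk {
       uint32_t position;          /* byte offset in the XDR stream */
       struct rdma_segment target; /* where to fetch the data */
   };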

Block Diagram

   +------------------------+-----------------------------------+
   | NFS                    | NFS + RDMA                        |
   +------------------------+----------------------+------------+
   |            Operations / Procedures            |            |
   +-----------------------------------------------+            |
   |                    RPC/XDR                    |            |
   +--------------------------------+--------------+            |
   |        Stream Transport        |      RDMA Transport       |
   +--------------------------------+---------------------------+

4. Sources of Overhead

Network and file protocol costs can be categorized as follows:

o  per-byte costs - data touching costs such as checksum or data
   copy. Today's network interface hardware commonly offloads the
   checksum, which leaves the other major source of per-byte
   overhead, data copy.

o  per-packet costs - interrupts and lower-layer processing.
   Today's network interface hardware also commonly coalesces
   interrupts to reduce per-packet costs.

o  per-message (request or response) costs - Lower Layer Protocol
   (LLP) and ULP processing.

Improvement from optimization becomes more important if the overhead
it targets is a larger share of the total cost. As other sources of
overhead, such as the checksumming and interrupt handling above, are
eliminated, the remaining overheads (primarily data copy) loom
larger.

With copies crossing the bus twice per copy, network processing
overhead is high whenever network bandwidth is large in comparison
to CPU and memory bandwidths. Generally, with today's end-systems,
the effects are observable at network speeds at or above 1 Gbit/s.

A common question is whether an increase in CPU processing power
alleviates the problem of high processing costs of network I/O. The
answer is no; it is the memory bandwidth that is the issue. Faster
CPUs do not help if the CPU spends most of its time waiting for
memory [RFC4297].

TCP offload engine (TOE) technology aims to offload the CPU by
moving TCP/IP protocol processing to the NIC. However, TOE
technology by itself does nothing to avoid necessary data copies
within upper layer protocols. [MOG03] provides a description of the
role TOE can play in reducing per-packet and per-message costs.
Beyond the offloads commonly provided by today's network interface
hardware, TOE alone (without RDMA) helps in protocol header
processing, but this has been shown to be a minority component of
the total protocol processing overhead [CHA+01].

Numerous software approaches to the optimization of network
throughput have been made. Experience has shown that network I/O
interacts with other aspects of system processing such as file I/O
and disk I/O [BRU99] [CHU96]. Zero-copy optimizations based on page
remapping [CHU96] can be dependent upon machine architecture, and
are not scalable to multi-processor architectures. Correct buffer
alignment and sizing together are needed to optimize the performance
of zero-copy movement mechanisms [SKE+01]. The NFS message layout
described above does not facilitate the splitting of headers from
data, nor does it facilitate providing correct data buffer
alignment.

4.1. Savings from TOE

The expected improvement of TOE specifically for NFS protocol
processing can be quantified and shown to be fundamentally limited.
[SHI+03] presents a set of "LAWS" parameters which serve to
illustrate the issues. In the TOE case, the copy cost can be viewed
as part of the application processing "a". Application processing
increases the LAWS "gamma", which is shown by the paper to result in
a diminished benefit for TOE.

For example, if the overhead is 20% TCP/IP, 30% copy, and 50% real
application work, then gamma is 80/20, or 4, which means the maximum
benefit of TOE is 1/gamma, or only 25%.

For RDMA (with embedded TOE) and the same example, the "overhead"
(o) offloaded or eliminated is 50% (20% + 30%). Therefore, in the
RDMA case, gamma is 50/50, or 1, and the inverse gives the potential
benefit of 1 (100%), a factor of two.

   CPU overhead reduction factor

      No Offload   TCP Offload   RDMA Offload
     ------------+--------------+-------------
        1.00x         1.25x          2.00x

The analysis in the paper shows that RDMA could improve throughput
by the same factor of two, even when the host is (just) powerful
enough to drive the full network bandwidth without RDMA. It can
also be shown that the speedup may be higher if network bandwidth
grows faster than Moore's Law, although the higher benefits will
apply to a narrow range of applications.
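
The bound used above follows directly from the offloaded fraction of
work; the small sketch below restates the worked example in code (a
restatement of the arithmetic above, not code from [SHI+03]):

   /* Sketch: upper bound on CPU-overhead reduction from offload.
    * 'offloaded' is the fraction of total processing removed by
    * the offload (0 < offloaded < 1); gamma is the ratio of the
    * remaining work to the offloaded work, and the bound is
    * 1/gamma. */
   #include <stdio.h>

   static double max_offload_benefit(double offloaded)
   {
       double gamma = (1.0 - offloaded) / offloaded;
       return 1.0 / gamma;
   }

   int main(void)
   {
       /* TOE: only the 20% TCP/IP share is offloaded. */
       printf("TOE:  +%.0f%%\n", 100 * max_offload_benefit(0.20));
       /* RDMA with embedded TOE: 20% TCP/IP + 30% copy. */
       printf("RDMA: +%.0f%%\n", 100 * max_offload_benefit(0.50));
       return 0;  /* prints "TOE:  +25%" and "RDMA: +100%" */
   }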

4.2. Savings from RDMA

Performance measurements directly comparing an NFS-over-RDMA
prototype with conventional network-based NFS processing are
described in [CAL+03]. Comparisons of Read throughput and CPU
overhead were performed on two Gigabit Ethernet adapters, one
conventional and one with RDMA capability. The prototype RDMA
protocol performed all transfers via RDMA Read.

In these results, conventional network-based throughput was severely
limited by the client's CPU being saturated at 100% for all
transfers. Read throughput reached no more than 60 MBytes/s.

   I/O Type        Size    Read Throughput    CPU Utilization
   Conventional     2KB        20MB/s              100%
   Conventional    16KB        40MB/s              100%
   Conventional   256KB        60MB/s              100%

However, over RDMA, throughput rose to the theoretical maximum
throughput of the platform, while saturating the single-CPU system
only at maximum throughput.

   I/O Type        Size    Read Throughput    CPU Utilization
   RDMA             2KB        10MB/s               45%
   RDMA            16KB        40MB/s               70%
   RDMA           256KB       100MB/s              100%

The lower relative throughput of the RDMA prototype at the small
blocksize may be attributable to the RDMA Read imposed by the
prototype protocol, which reduces the operation rate since it
introduces additional latency. As well, it may reflect the relative
increase of per-packet setup costs within the DMA portion of the
transfer.

5. Application of RDMA to NFS

Efficient file protocols require efficient data positioning and
movement. The client system knows the client memory address where
the application has data to be written or wants read data deposited.
The server system knows the server memory address where the local
filesystem will accept write data or has data to be read. Neither
peer, however, is aware of the other's data destination in the
current NFS, RPC, or XDR protocols. Existing NFS implementations
have struggled with the performance costs of data copies when using
traditional Ethernet transports.

With the onset of faster networks, the network I/O bottleneck will
worsen. Fortunately, new transports that support RDMA have emerged.
RDMA excels at bulk transfer efficiency; it is an efficient way to
deliver direct data placement and remove a major part of the
problem: data copies. RDMA also addresses other overheads, e.g.,
underlying protocol offload, and offers separation of control
information from data.

The current NFS message layout provides the performance-enhancing
opportunity for an NFS-over-RDMA protocol that separates the control
information from data chunks while meeting the alignment needs of
both. The data chunks can be copied "directly" between the client
and server memory addresses above (with a single occurrence on each
memory bus) while the control information can be passed "inline".
[RPCRDMA] describes such a protocol.
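
The exchange such a protocol enables can be sketched as follows.
This is a minimal client-side sketch only; the helpers
(register_memory, send_inline, recv_inline) and the message layout
are hypothetical stand-ins for whatever registration and send
primitives an RDMA transport provides, not any particular API or the
actual [RPCRDMA] encoding:

   /* Sketch: an NFS READ using a "write chunk". The client
    * advertises its destination buffer; the server RDMA Writes the
    * file data directly into it, so no receive-side copy occurs. */
   #include <stdint.h>
   #include <stddef.h>

   struct rdma_segment { uint32_t handle, length; uint64_t offset; };

   /* Hypothetical transport helpers (assumed for illustration). */
   extern struct rdma_segment register_memory(void *buf, size_t len);
   extern void send_inline(const void *msg, size_t len);
   extern size_t recv_inline(void *msg, size_t len);

   size_t nfs_read_over_rdma(void *user_buf, size_t count)
   {
       /* 1. Expose the destination buffer as a write chunk. */
       struct rdma_segment chunk = register_memory(user_buf, count);

       /* 2. Send the READ call inline, carrying the chunk. */
       struct { uint32_t xid; struct rdma_segment wchunk; } call =
           { 0x1234, chunk };  /* remaining header fields elided */
       send_inline(&call, sizeof call);

       /* 3. The server (not shown) RDMA Writes the file data into
        *    'user_buf' via chunk.handle, then sends its reply. */

       /* 4. The inline reply carries only control information; the
        *    bulk data is already placed in 'user_buf'. */
       uint32_t reply[8];
       (void)recv_inline(reply, sizeof reply);
       return count;  /* simplification: assume a full-length read */
   }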

6. Conclusions

NFS version 4 [RFC3530] has been granted "Proposed Standard" status.
The NFSv4 protocol was developed along several design points,
important among them: effective operation over wide-area networks,
including the Internet itself; strong security integrated into the
protocol; extensive cross-platform interoperability, including
integrated locking semantics compatible with multiple operating
systems; and (this is key) protocol extension.

NFS version 4 is an excellent base on which to add the needed
performance enhancements and improved semantics described above.
The minor versioning support defined in NFS version 4 was designed
to support protocol improvements without disruption to the installed
base. Evolutionary improvement of the protocol via minor versioning
is a conservative and cautious approach to current and future
problems and shortcomings.

Many arguments can be made as to the efficacy of the file
abstraction in meeting the future needs of enterprise data service
and the Internet. Fine-grained Quality of Service (QoS) policies
(e.g., data delivery, retention, availability, security, ...) are
high among them.

It is vital that the NFS protocol continue to provide these benefits
to a wide range of applications, without its usefulness being
compromised by concerns about performance and semantic inadequacies.
This can reasonably be addressed in the existing NFS protocol
framework. A cautious evolutionary improvement of performance and
semantics allows building on the value already present in the NFS
protocol, while addressing new requirements that have arisen from
the application of networking technology.

7. Security Considerations

Security considerations are not covered by this document. Please
refer to the appropriate protocol documents for any security issues.

8. IANA Considerations

IANA considerations are not covered by this document. Please refer
to the appropriate protocol documents for any IANA issues.

9. Acknowledgements

The authors wish to thank Jeff Chase, who provided many useful
suggestions.

10. Normative References

[RFC3530]
   S. Shepler, et al., "NFS Version 4 Protocol", Standards Track RFC.

[RFC1831]
   R. Srinivasan, "RPC: Remote Procedure Call Protocol Specification
   Version 2", Standards Track RFC.

[RFC1832]
   R. Srinivasan, "XDR: External Data Representation Standard",
   Standards Track RFC.

[RFC1813]
   B. Callaghan, B. Pawlowski, P. Staubach, "NFS Version 3 Protocol
   Specification", Informational RFC.

11. Informative References

[BRU99]
   J. Brustoloni, "Interoperation of copy avoidance in network and
   file I/O", in Proc. INFOCOM '99, pages 534-542, New York, NY,
   March 1999, IEEE. Also available from
   http://www.cs.pitt.edu/~jcb/publs.html

[CAL+03]
   B. Callaghan, T. Lingutla-Raj, A. Chiu, P. Staubach, O. Asad,
   "NFS over RDMA", in Proceedings of ACM SIGCOMM Summer 2003 NICELI
   Workshop.

[CHA+01]
   J. S. Chase, A. J. Gallatin, K. G. Yocum, "Endsystem
   optimizations for high-speed TCP", IEEE Communications,
   39(4):68-74, April 2001.

[CHA+99]
   J. S. Chase, D. C. Anderson, A. J. Gallatin, A. R. Lebeck, K. G.
   Yocum, "Network I/O with Trapeze", in 1999 Hot Interconnects
   Symposium, August 1999.

[CHU96]
   H.K. Chu, "Zero-copy TCP in Solaris", in Proc. of the USENIX 1996
   Annual Technical Conference, San Diego, CA, January 1996.

[DCK+03]
   M. DeBergalis, P. Corbett, S. Kleiman, A. Lent, D. Noveck, T.
   Talpey, M. Wittle, "The Direct Access File System", in
   Proceedings of the 2nd USENIX Conference on File and Storage
   Technologies (FAST '03), San Francisco, CA, March 31 - April 2,
   2003.

[FJDAFS]
   Fujitsu Prime Software Technologies, "Meet the DAFS Performance
   with DAFS/VI Kernel Implementation using cLAN", available from
   http://www.pst.fujitsu.com/english/dafsdemo/index.html, 2001.

[FJNFS]
   Fujitsu Prime Software Technologies, "An Adaptation of VIA to NFS
   on Linux", available from
   http://www.pst.fujitsu.com/english/nfs/index.html, 2000.

[GAL+99]
   A. Gallatin, J. Chase, K. Yocum, "Trapeze/IP: TCP/IP at Near-
   Gigabit Speeds", in 1999 USENIX Technical Conference (Freenix
   Track), June 1999.

[KM02]
   K. Magoutis, "Design and Implementation of a Direct Access File
   System (DAFS) Kernel Server for FreeBSD", in Proceedings of the
   USENIX BSDCon 2002 Conference, San Francisco, CA, February 11-14,
   2002.

[MAF+02]
   K. Magoutis, S. Addetia, A. Fedorova, M. Seltzer, J. Chase, D.
   Gallatin, R. Kisley, R. Wickremesinghe, E. Gabber, "Structure and
   Performance of the Direct Access File System (DAFS)", in
   Proceedings of the 2002 USENIX Annual Technical Conference,
   Monterey, CA, June 9-14, 2002.

[MOG03]
   J. Mogul, "TCP offload is a dumb idea whose time has come", in
   9th Workshop on Hot Topics in Operating Systems (HotOS IX),
   Lihue, HI, May 2003, USENIX.

[NFSv4.1]
   S. Shepler, ed., "NFSv4 Minor Version 1", Internet-Draft work in
   progress, draft-ietf-nfsv4-minorversion1.

[PAI+00]
   V. S. Pai, P. Druschel, W. Zwaenepoel, "IO-Lite: a unified I/O
   buffering and caching system", ACM Trans. Computer Systems,
   18(1):37-66, February 2000.

[RDDP]
   RDDP Working Group charter,
   http://www.ietf.org/html.charters/rddp-charter.html

[RFC4297]
   A. Romanow, J. Mogul, T. Talpey, S. Bailey, "Remote Direct Memory
   Access (RDMA) over IP Problem Statement", Informational RFC.

[RFC1094]
   Sun Microsystems, "NFS: Network File System Protocol
   Specification".

[RPCRDMA]
   T. Talpey, B. Callaghan, "RDMA Transport for ONC RPC",
   Internet-Draft work in progress, draft-ietf-nfsv4-rpcrdma.

[SHI+03]
   P. Shivam, J. Chase, "On the Elusive Benefits of Protocol
   Offload", in Proceedings of ACM SIGCOMM Summer 2003 NICELI
   Workshop, also available from
   http://issg.cs.duke.edu/publications/niceli03.pdf

[SKE+01]
   K.-A. Skevik, T. Plagemann, V. Goebel, P. Halvorsen, "Evaluation
   of a Zero-Copy Protocol Implementation", in Proceedings of the
   27th Euromicro Conference - Multimedia and Telecommunications
   Track (MTT'2001), Warsaw, Poland, September 2001.

Authors' Addresses

Tom Talpey
Network Appliance, Inc.
375 Totten Pond Road
Waltham, MA 02451 USA

Phone: +1 781 768 5329
Email: thomas.talpey@netapp.com

Chet Juszczak
Chet's Boathouse Co.
P.O. Box 1467
Merrimack, NH 03054

Email: chetnh@earthlink.net

Intellectual Property and Copyright Statements

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology described
in this document or the extent to which any license under such
rights might or might not be available; nor does it represent that
it has made any independent effort to identify any such rights.
Information on the procedures with respect to rights in RFC
documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use
of such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository
at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.

Acknowledgement

Funding for the RFC Editor function is currently provided by the
Internet Society.