-------------------------------------------------------------------------------- 2 NFSv4 Working Group Tom Talpey 3 Internet-Draft NetApp 4 Intended status: Standards Track Brent Callaghan 5 Expires: August 23, 2008 Apple 6 February 22, 2008 8 Remote Direct Memory Access Transport for Remote Procedure Call 9 draft-ietf-nfsv4-rpcrdma-07 11 Status of this Memo 13 By submitting this Internet-Draft, each author represents that any 14 applicable patent or other IPR claims of which he or she is aware 15 have been or will be disclosed, and any of which he or she becomes 16 aware will be disclosed, in accordance with Section 6 of BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six 24 months and may be updated, replaced, or obsoleted by other 25 documents at any time. It is inappropriate to use Internet-Drafts 26 as reference material or to cite them other than as "work in 27 progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt 32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 Copyright Notice 37 Copyright (C) The IETF Trust (2008). 39 Abstract 41 A protocol is described providing Remote Direct Memory Access 42 (RDMA) as a new transport for Computing Remote Procedure Call 43 (RPC). The RDMA transport binding conveys the benefits of 44 efficient, bulk data transport over high speed networks, while 45 providing for minimal change to RPC applications and with no 46 required revision of the application RPC protocol, or the RPC 47 protocol itself. 49 Table of Contents 51 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 52 2. Abstract RDMA Requirements . . . . . . . . . . . . . . . . . 3 53 3. Protocol Outline . . . . . . . . . . . . . . . . . . . . . . 4 54 3.1. Short Messages . . . . . . . . . . . . . . . . . . . . . . 5 55 3.2. Data Chunks . . . . . . . . . . . . . . . . . . . . . . . 5 56 3.3. Flow Control . . . . . . . . . . . . . . . . . . . . . . . 6 57 3.4. XDR Encoding with Chunks . . . . . . . . . . . . . . . . . 7 58 3.5. XDR Decoding with Read Chunks . . . . . . . . . . . . . 10 59 3.6. XDR Decoding with Write Chunks . . . . . . . . . . . . . 11 60 3.7. XDR Roundup and Chunks . . . . . . . . . . . . . . . . . 12 61 3.8. RPC Call and Reply . . . . . . . . . . . . . . . . . . . 13 62 3.9. Padding . . . . . . . . . . . . . . . . . . . . . . . . 16 63 4. RPC RDMA Message Layout . . . . . . . . . . . . . . . . . 17 64 4.1. RPC over RDMA Header . . . . . . . . . . . . . . . . . . 17 65 4.2. RPC over RDMA header errors . . . . . . . . . . . . . . 19 66 4.3. XDR Language Description . . . . . . . . . . . . . . . . 20 67 5. Long Messages . . . . . . . . . . . . . . . . . . . . . . 22 68 5.1. Message as an RDMA Read Chunk . . . . . . . . . . . . . 22 69 5.2. RDMA Write of Long Replies (Reply Chunks) . . . . . . . 24 70 6. Connection Configuration Protocol . . . . . . . . . . . . 25 71 6.1. Initial Connection State . . . . . . . . . . . . . . . . 26 72 6.2. Protocol Description . . . . . . . . . . . . . . . . . . 26 73 7. Memory Registration Overhead . . . . . . . . . . . . . . . 28 74 8. Errors and Error Recovery . . . . . . . . . . . . . . . . 28 75 9. Node Addressing . . . . . . . . . . . . . . . . . . . . . 28 76 10. RPC Binding . . . 
. . . . . . . . . . . . . . . . . . . . 29 77 11. Security Considerations . . . . . . . . . . . . . . . . . 30 78 12. IANA Considerations . . . . . . . . . . . . . . . . . . . 31 79 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . 32 80 14. Normative References . . . . . . . . . . . . . . . . . . 32 81 15. Informative References . . . . . . . . . . . . . . . . . 33 82 16. Authors' Addresses . . . . . . . . . . . . . . . . . . . 34 83 17. Intellectual Property and Copyright Statements . . . . . 35 84 Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . 36 86 Requirements Language 88 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 89 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in 90 this document are to be interpreted as described in [RFC2119]. 92 1. Introduction 94 Remote Direct Memory Access (RDMA) [RFC5040, RFC5041] [IB] is a 95 technique for efficient movement of data between end nodes, which 96 becomes increasingly compelling over high speed transports. By 97 directing data into destination buffers as it is sent on a network, 98 and placing it via direct memory access by hardware, the double 99 benefit of faster transfers and reduced host overhead is obtained. 101 Open Network Computing Remote Procedure Call (ONC RPC, or simply, 102 RPC) [RFC1831bis] is a remote procedure call protocol that has been 103 run over a variety of transports. Most RPC implementations today 104 use UDP or TCP. RPC messages are defined in terms of an eXternal 105 Data Representation (XDR) [RFC4506] which provides a canonical data 106 representation across a variety of host architectures. An XDR data 107 stream is conveyed differently on each type of transport. On UDP, 108 RPC messages are encapsulated inside datagrams, while on a TCP byte 109 stream, RPC messages are delineated by a record marking protocol. 110 An RDMA transport also conveys RPC messages in a unique fashion 111 that must be fully described if client and server implementations 112 are to interoperate. 114 RDMA transports present new semantics unlike the behaviors of 115 either UDP or TCP alone. They retain message delineations like UDP 116 while also providing a reliable, sequenced data transfer like TCP. 117 And, they provide the new efficient, bulk transfer service of RDMA. 118 RDMA transports are therefore naturally viewed as a new transport 119 type by RPC. 121 RDMA as a transport will benefit the performance of RPC protocols 122 that move large "chunks" of data, since RDMA hardware excels at 123 moving data efficiently between host memory and a high speed 124 network with little or no host CPU involvement. In this context, 125 the NFS protocol, in all its versions [RFC1094] [RFC1813] [RFC3530] 126 [NFSv4.1], is an obvious beneficiary of RDMA. A complete problem 127 statement is discussed in [NFSRDMAPS], and related NFSv4 issues are 128 discussed in [NFSv4.1]. Many other RPC-based protocols will also 129 benefit. 131 Although the RDMA transport described here provides relatively 132 transparent support for any RPC application, the proposal goes 133 further in describing mechanisms that can optimize the use of RDMA 134 with more active participation by the RPC application. 136 2. Abstract RDMA Requirements 138 An RPC transport is responsible for conveying an RPC message from a 139 sender to a receiver. An RPC message is either an RPC call from a 140 client to a server, or an RPC reply from the server back to the 141 client. 
An RPC message contains an RPC call header followed by 142 arguments if the message is an RPC call, or an RPC reply header 143 followed by results if the message is an RPC reply. The call 144 header contains a transaction ID (XID) followed by the program and 145 procedure number as well as a security credential. An RPC reply 146 header begins with an XID that matches that of the RPC call 147 message, followed by a security verifier and results. All data in 148 an RPC message is XDR encoded. For a complete description of the 149 RPC protocol and XDR encoding, see [RFC1831bis] and [RFC4506]. 151 This protocol assumes the following abstract model for RDMA 152 transports. These terms, common in the RDMA lexicon, are used in 153 this document. A more complete glossary of RDMA terms can be found 154 in [RFC5040]. 156 o Registered Memory 157 All data moved via tagged RDMA operations is resident in 158 registered memory at its destination. This protocol assumes 159 that each segment of registered memory MUST be identified with 160 a steering tag of no more than 32 bits and memory addresses of 161 up to 64 bits in length. 163 o RDMA Send 164 The RDMA provider supports an RDMA Send operation with 165 completion signalled at the receiver when data is placed in a 166 pre-posted buffer. The amount of transferred data is limited 167 only by the size of the receiver's buffer. Sends complete at 168 the receiver in the order they were issued at the sender. 170 o RDMA Write 171 The RDMA provider supports an RDMA Write operation to directly 172 place data in the receiver's buffer. An RDMA Write is 173 initiated by the sender and completion is signalled at the 174 sender. No completion is signalled at the receiver. The 175 sender uses a steering tag, memory address and length of the 176 remote destination buffer. RDMA Writes are not necessarily 177 ordered with respect to one another, but are ordered with 178 respect to RDMA Sends; a subsequent RDMA Send completion 179 obtained at the receiver guarantees that prior RDMA Write data 180 has been successfully placed in the receiver's memory. 182 o RDMA Read 183 The RDMA provider supports an RDMA Read operation to directly 184 place peer source data in the requester's buffer. An RDMA 185 Read is initiated by the receiver and completion is signalled 186 at the receiver. The receiver provides steering tags, memory 187 addresses and a length for the remote source and local 188 destination buffers. Since the peer at the data source 189 receives no notification of RDMA Read completion, there is an 190 assumption that on receiving the data the receiver will signal 191 completion with an RDMA Send message, so that the peer can 192 free the source buffers and the associated steering tags. 194 This protocol is designed to be carried over all RDMA transports 195 meeting the stated requirements. This protocol conveys to the RPC 196 peer, information sufficient for that RPC peer to direct an RDMA 197 layer to perform transfers containing RPC data, and to communicate 198 their result(s). For example, it is readily carried over RDMA 199 transports such as iWARP [RFC5040, RFC5041] or Infiniband [IB]. 201 3. Protocol Outline 203 An RPC message can be conveyed in identical fashion, whether it is 204 a call or reply message. In each case, the transmission of the 205 message proper is preceded by transmission of a transport-specific 206 header for use by RPC over RDMA transports. 
This header is 207 analogous to the record marking used for RPC over TCP, but is more 208 extensive, since RDMA transports support several modes of data 209 transfer and it is important to allow the client and server to use 210 the most efficient mode for any given transfer. Multiple segments 211 of a message may be transferred in different ways to different 212 remote memory destinations. 214 All transfers of a call or reply begin with an RDMA Send which 215 transfers at least the RPC over RDMA header, usually with the call 216 or reply message appended, or at least some part thereof. Because 217 the size of what may be transmitted via RDMA Send is limited by the 218 size of the receiver's pre-posted buffer, the RPC over RDMA 219 transport provides a number of methods to reduce the amount 220 transferred by means of the RDMA Send, when necessary, by 221 transferring various parts of the message using RDMA Read and RDMA 222 Write. 224 RPC over RDMA framing replaces all other RPC framing (such as TCP 225 record marking) when used atop an RPC/RDMA association, even though 226 the underlying RDMA protocol may itself be layered atop a protocol 227 with a defined RPC framing (such as TCP). An upper layer may 228 however define an exchange to dynamically enable RPC/RDMA on an 229 existing RPC association. Any such exchange must be carefully 230 architected so as to prevent any ambiguity as to the framing in use 231 for each side of the connection. Because RPC/RDMA framing delimits 232 an entire RPC request or reply, any such shift must occur between 233 distinct RPC messages. 235 3.1. Short Messages 237 Many RPC messages are quite short. For example, the NFS version 3 238 GETATTR request, is only 56 bytes: 20 bytes of RPC header, plus a 239 32 byte file handle argument and 4 bytes of length. The reply to 240 this common request is about 100 bytes. 242 There is no benefit in transferring such small messages with an 243 RDMA Read or Write operation. The overhead in transferring 244 steering tags and memory addresses is justified only by large 245 transfers. The critical message size that justifies RDMA transfer 246 will vary depending on the RDMA implementation and network, but is 247 typically of the order of a few kilobytes. It is appropriate to 248 transfer a short message with an RDMA Send to a pre-posted buffer. 249 The RPC over RDMA header with the short message (call or reply) 250 immediately following is transferred using a single RDMA Send 251 operation. 253 Short RPC messages over an RDMA transport: 255 RPC Client RPC Server 256 | RPC Call | 257 Send | ------------------------------> | 258 | | 259 | RPC Reply | 260 | <------------------------------ | Send 262 3.2. Data Chunks 264 Some protocols, like NFS, have RPC procedures that can transfer 265 very large "chunks" of data in the RPC call or reply and would 266 cause the maximum send size to be exceeded if one tried to transfer 267 them as part of the RDMA Send. These large chunks typically range 268 from a kilobyte to a megabyte or more. An RDMA transport can 269 transfer large chunks of data more efficiently via the direct 270 placement of an RDMA Read or RDMA Write operation. Using direct 271 placement instead of inline transfer not only avoids expensive data 272 copies, but provides correct data alignment at the destination. 274 3.3. Flow Control 276 It is critical to provide RDMA Send flow control for an RDMA 277 connection. 
RDMA receive operations will fail if a pre-posted 278 receive buffer is not available to accept an incoming RDMA Send, 279 and repeated occurrences of such errors can be fatal to the 280 connection. This is a departure from conventional TCP/IP 281 networking where buffers are allocated dynamically on an as-needed 282 basis, and where pre-posting is not required. 284 It is not practical to provide for fixed credit limits at the RPC 285 server. Fixed limits scale poorly, since posted buffers are 286 dedicated to the associated connection until consumed by receive 287 operations. Additionally for protocol correctness, the RPC server 288 must always be able to reply to client requests, whether or not new 289 buffers have been posted to accept future receives. (Note that the 290 RPC server may in fact be a client at some other layer. For 291 example, NFSv4 callbacks are processed by the NFSv4 client, acting 292 as an RPC server. The credit discussions apply equally in either 293 case.) 295 Flow control for RDMA Send operations is implemented as a simple 296 request/grant protocol in the RPC over RDMA header associated with 297 each RPC message. The RPC over RDMA header for RPC call messages 298 contains a requested credit value for the RPC server, which MAY be 299 dynamically adjusted by the caller to match its expected needs. 300 The RPC over RDMA header for the RPC reply messages provides the 301 granted result, which MAY have any value except it MUST NOT be zero 302 when no in-progress operations are present at the server, since 303 such a value would result in deadlock. The value MAY be adjusted 304 up or down at each opportunity to match the server's needs or 305 policies. 307 The RPC client MUST NOT send unacknowledged requests in excess of 308 this granted RPC server credit limit. If the limit is exceeded, 309 the RDMA layer may signal an error, possibly terminating the 310 connection. Even if an error does not occur, it is OPTIONAL that 311 the server handle the excess request(s), and it MAY return an RPC 312 error to the client. Also note that the never-zero requirement 313 implies that an RPC server MUST always provide at least one credit 314 to each connected RPC client from which no requests are 315 outstanding. The client would deadlock otherwise, unable to send 316 another request. 318 While RPC calls complete in any order, the current flow control 319 limit at the RPC server is known to the RPC client from the Send 320 ordering properties. It is always the most recent server-granted 321 credit value minus the number of requests in flight. 323 Certain RDMA implementations may impose additional flow control 324 restrictions, such as limits on RDMA Read operations in progress at 325 the responder. Because these operations are outside the scope of 326 this protocol, they are not addressed and SHOULD be provided for by 327 other layers. For example, a simple upper layer RPC consumer might 328 perform single-issue RDMA Read requests, while a more 329 sophisticated, multithreaded RPC consumer might implement its own 330 FIFO queue of such operations. For further discussion of possible 331 protocol implementations capable of negotiating these values, see 332 section 6 "Connection Configuration Protocol" of this draft, or 333 [NFSv4.1]. 335 3.4. XDR Encoding with Chunks 337 The data comprising an RPC call or reply message is marshaled or 338 serialized into a contiguous stream by an XDR routine. 
XDR data 339 types such as integers, strings, arrays and linked lists are 340 commonly implemented over two very simple functions that encode 341 either an XDR data unit (32 bits) or an array of bytes. 343 Normally, the separate data items in an RPC call or reply are 344 encoded as a contiguous sequence of bytes for network transmission 345 over UDP or TCP. However, in the case of an RDMA transport, local 346 routines such as XDR encode can determine that (for instance) an 347 opaque byte array is large enough to be more efficiently moved via 348 an RDMA data transfer operation like RDMA Read or RDMA Write. 350 Semantically speaking, the protocol has no restriction regarding 351 data types which may or may not be represented by a read or write 352 chunk. In practice however, efficiency considerations lead to the 353 conclusion that certain data types are not generally "chunkable". 354 Typically, only those opaque and aggregate data types that may 355 attain substantial size are considered to be eligible. With 356 today's hardware this size may be a kilobyte or more. However any 357 object MAY be chosen for chunking in any given message. 359 The eligibility of XDR data items to be candidates for being moved 360 as data chunks (as opposed to being marshaled inline) is not 361 specified by the RPC over RDMA protocol. Chunk eligibility 362 criteria MUST be determined by each upper layer in order to provide 363 for an interoperable specification. One such example with 364 rationale, for the NFS protocol family, is provided in [NFSDDP]. 366 The interface by which an upper layer implementation communicates 367 the eligibility of a data item locally to RPC for chunking is out 368 of scope for this specification. In many implementations, it is 369 possible to implement a transparent RPC chunking facility. 370 However, such implementations may lead to inefficiencies, either 371 because they require the RPC layer to perform expensive 372 registration and deregistration of memory "on the fly", or they may 373 require using RDMA chunks in reply messages, along with the 374 resulting additional handshaking with the RPC over RDMA peer. 375 However, these issues are internal and generally confined to the 376 local interface between RPC and its upper layers, one in which 377 implementations are free to innovate. The only requirement is that 378 the resulting RPC RDMA protocol sent to the peer is valid for the 379 upper layer. See for example [NFSDDP]. 381 When sending any message (request or reply) that contains an 382 eligible large data chunk, the XDR encoding routine avoids moving 383 the data into the XDR stream. Instead, it does not encode the data 384 portion, but records the address and size of each chunk in a 385 separate "read chunk list" encoded within RPC RDMA transport- 386 specific headers. Such chunks will be transferred via RDMA Read 387 operations initiated by the receiver. 389 When the read chunks are to be moved via RDMA, the memory for each 390 chunk is registered. This registration may take place within XDR 391 itself, providing for full transparency to upper layers, or it may 392 be performed by any other specific local implementation. 394 Additionally, when making an RPC call that can result in bulk data 395 transferred in the reply, write chunks MAY be provided to accept 396 the data directly via RDMA Write. These write chunks will 397 therefore be pre-filled by the RPC server prior to responding, and 398 XDR decode of the data at the client will not be required. 
These 399 chunks undergo a similar registration and advertisement via "write 400 chunk lists" built as a part of XDR encoding. 402 Some RPC client implementations are not able to determine where an 403 RPC call's results reside during the "encode" phase. This makes it 404 difficult or impossible for the RPC client layer to encode the 405 write chunk list at the time of building the request. In this 406 case, it is difficult for the RPC implementation to provide 407 transparency to the RPC consumer, which may require recoding to 408 provide result information at this earlier stage. 410 Therefore if the RPC client does not make a write chunk list 411 available to receive the result, then the RPC server MAY return 412 data inline in the reply, or if the upper layer specification 413 permits, it MAY be returned via a read chunk list. It is NOT 414 RECOMMENDED that upper layer RPC client protocol specifications 415 omit write chunk lists for eligible replies, due to the lower 416 performance of the additional handshaking to perform data transfer, 417 and the requirement that the RPC server must expose (and preserve) 418 the reply data for a period of time. In the absence of a server- 419 provided read chunk list in the reply, if the encoded reply 420 overflows the posted receive buffer, the RPC will fail with an RDMA 421 transport error. 423 When any data within a message is provided via either read or write 424 chunks, the chunk itself refers only to the data portion of the XDR 425 stream element. In particular, for counted fields (e.g., a "<>" 426 encoding) the byte count which is encoded as part of the field 427 remains in the XDR stream, and is also encoded in the chunk list. 428 The data portion is however elided from the encoded XDR stream, and 429 is transferred as part of chunk list processing. This is important 430 to maintain upper layer implementation compatibility - both the 431 count and the data must be transferred as part of the logical XDR 432 stream. While the chunk list processing results in the data being 433 available to the upper layer peer for XDR decoding, the length 434 present in the chunk list entries is not. Any byte count in the 435 XDR stream MUST match the sum of the byte counts present in the 436 corresponding read or write chunk list. If they do not agree, an 437 RPC protocol encoding error results. 439 The following items are contained in a chunk list entry. 441 Handle 442 Steering tag or handle obtained when the chunk memory is 443 registered for RDMA. 445 Length 446 The length of the chunk in bytes. 448 Offset 449 The offset or beginning memory address of the chunk. In order 450 to support the widest array of RDMA implementations, as well 451 as the most general steering tag scheme, this field is 452 unconditionally included in each chunk list entry. 454 While zero-based offset schemes are available in many RDMA 455 implementations, their use by RPC requires individual 456 registration of each read or write chunk. On many such 457 implementations this can be a significant overhead. By 458 providing an offset in each chunk, many pre-registration or 459 region-based registrations can be readily supported, and by 460 using a single, universal chunk representation, the RPC RDMA 461 protocol implementation is simplified to its most general 462 form. 464 Position 465 For data which is to be encoded, the position in the XDR 466 stream where the chunk would normally reside. 
Note that the 467 chunk therefore inserts its data into the XDR stream at this 468 position, but its transfer is no longer "inline". Also note 469 therefore that all chunks belonging to a single RPC argument 470 or result will have the same position. For data which is to 471 be decoded, no position is used. 473 When XDR marshaling is complete, the chunk list is XDR encoded, 474 then sent to the receiver prepended to the RPC message. Any source 475 data for a read chunk, or the destination of a write chunk, remain 476 behind in the sender's registered memory and their actual payload 477 is not marshaled into the request or reply. 479 +----------------+----------------+------------- 480 | RPC over RDMA | | 481 | header w/ | RPC Header | Non-chunk args/results 482 | chunks | | 483 +----------------+----------------+------------- 485 Read chunk lists and write chunk lists are structured somewhat 486 differently. This is due to the different usage - read chunks are 487 decoded and indexed by their argument's or result's position in the 488 XDR data stream; their size is always known. Write chunks on the 489 other hand are used only for results, and have neither a 490 preassigned offset in the XDR stream, nor a size until the results 491 are produced, since the buffers may be only partially filled, or 492 may not be used for results at all. Their presence in the XDR 493 stream is therefore not known until the reply is processed. The 494 mapping of Write chunks onto designated NFS procedures and their 495 results is described in [NFSDDP]. 497 Therefore, read chunks are encoded into a read chunk list as a 498 single array, with each entry tagged by its (known) size and its 499 argument's or result's position in the XDR stream. Write chunks 500 are encoded as a list of arrays of RDMA buffers, with each list 501 element (an array) providing buffers for a separate result. 502 Individual write chunk list elements MAY thereby result in being 503 partially or fully filled, or in fact not being filled at all. 504 Unused write chunks, or unused bytes in write chunk buffer lists, 505 are not returned as results, and their memory is returned to the 506 upper layer as part of RPC completion. However, the RPC layer MUST 507 NOT assume that the buffers have not been modified. 509 3.5. XDR Decoding with Read Chunks 511 The XDR decode process moves data from an XDR stream into a data 512 structure provided by the RPC client or server application. Where 513 elements of the destination data structure are buffers or strings, 514 the RPC application can either pre-allocate storage to receive the 515 data, or leave the string or buffer fields null and allow the XDR 516 decode stage of RPC processing to automatically allocate storage of 517 sufficient size. 519 When decoding a message from an RDMA transport, the receiver first 520 XDR decodes the chunk lists from the RPC over RDMA header, then 521 proceeds to decode the body of the RPC message (arguments or 522 results). Whenever the XDR offset in the decode stream matches 523 that of a chunk in the read chunk list, the XDR routine initiates 524 an RDMA Read to bring over the chunk data into locally registered 525 memory for the destination buffer. 527 When processing an RPC request, the RPC receiver (RPC server) 528 acknowledges its completion of use of the source buffers by simply 529 replying to the RPC sender (client), and the peer may then free all 530 source buffers advertised by the request. 
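As an illustration of this decode path (not part of the protocol), the following sketch rebuilds the logical XDR stream of a received message by issuing an RDMA Read for each advertised chunk position; all names and helper interfaces are hypothetical:

   from itertools import groupby

   def reassemble_xdr_stream(inline, read_chunks, rdma_read):
       # 'inline' holds the bytes received with the RDMA Send (chunk data
       # elided); 'read_chunks' is a list of (xdr_position, handle, length,
       # offset) entries decoded from the RPC over RDMA header; 'rdma_read'
       # performs the actual transfer and returns the fetched bytes.
       out, consumed = bytearray(), 0
       for pos, group in groupby(sorted(read_chunks), key=lambda c: c[0]):
           # Copy inline bytes up to the logical XDR position of this
           # argument or result, then fetch its elided data via RDMA Read.
           take = pos - len(out)
           out += inline[consumed:consumed + take]
           consumed += take
           total = 0
           for _, handle, length, offset in group:
               # Chunks with the same position belong to the same argument.
               out += rdma_read(handle, length, offset)
               total += length
           # XDR roundup bytes are never transferred (Section 3.7); pad
           # locally so later fields keep their 4-byte alignment.
           out += b"\0" * ((4 - total % 4) % 4)
       out += inline[consumed:]    # remaining inline portion
       return bytes(out)

A production decoder would normally place chunk data directly into the destination buffers registered for it rather than copying into a contiguous stream, but the position bookkeeping is the same.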
532 When processing an RPC reply, after completing such a transfer the 533 RPC receiver (client) MUST issue an RDMA_DONE message (described in 534 Section 3.8) to notify the peer (server) that the source buffers 535 can be freed. 537 The read chunk list is constructed and used entirely within the 538 RPC/XDR layer. Other than specifying the minimum chunk size, the 539 management of the read chunk list is automatic and transparent to 540 an RPC application. 542 3.6. XDR Decoding with Write Chunks 544 When a "write chunk list" is provided for the results of the RPC 545 call, the RPC server MUST provide any corresponding data via RDMA 546 Write to the memory referenced in the chunk list entries. The RPC 547 reply conveys this by returning the write chunk list to the client 548 with the lengths rewritten to match the actual transfer. The XDR 549 "decode" of the reply therefore performs no local data transfer but 550 merely returns the length obtained from the reply. 552 Each decoded result consumes one entry in the write chunk list, 553 which in turn consists of an array of RDMA segments. The length is 554 therefore the sum of all returned lengths in all segments 555 comprising the corresponding list entry. As each list entry is 556 "decoded", the entire entry is consumed. 558 The write chunk list is constructed and used by the RPC 559 application. The RPC/XDR layer simply conveys the list between 560 client and server and initiates the RDMA Writes back to the client. 561 The mapping of write chunk list entries to procedure arguments MUST 562 be determined for each protocol. An example of a mapping is 563 described in [NFSDDP]. 565 3.7. XDR Roundup and Chunks 567 The XDR protocol requires 4-byte alignment of each new encoded 568 element in any XDR stream. This requirement is for efficiency and 569 ease of decode/unmarshaling at the receiver - if the XDR stream 570 buffer begins on a native machine boundary, then the XDR elements 571 will lie on similarly predictable offsets in memory. 573 Within XDR, when non-4-byte encodes (such as an odd-length string 574 or bulk data) are marshaled, their length is encoded literally, 575 while their data is padded to begin the next element at a 4-byte 576 boundary in the XDR stream. For TCP or RDMA inline encoding, this 577 minimal overhead is required because the transport-specific framing 578 relies on the fact that the relative offset of the elements in the 579 XDR stream from the start of the message determines the XDR 580 position during decode. 582 On the other hand, RPC/RDMA Read chunks carry the XDR position of 583 each chunked element and length of the Chunk segment, and can be 584 placed by the receiver exactly where they belong in the receiver's 585 memory without regard to the alignment of their position in the XDR 586 stream. Since any rounded-up data is not actually part of the 587 upper layer's message, the receiver will not reference it, and 588 there is no reason to set it to any particular value in the 589 receiver's memory. 591 When roundup is present at the end of a sequence of chunks, the 592 length of the sequence will terminate it at a non-4-byte XDR 593 position. When the receiver proceeds to decode the remaining part 594 of the XDR stream, it inspects the XDR position indicated by the 595 next chunk. Because this position will not match (else roundup 596 would not have occurred), the receiver decoding will fall back to 597 inspecting the remaining inline portion. 
If in turn, no data remains to be decoded from the inline portion, then the receiver MUST conclude that roundup is present, and therefore advances the XDR decode position to that indicated by the next chunk (if any).  In this way, roundup is passed without ever actually transferring additional XDR bytes.

Some protocol operations over RPC/RDMA, for instance NFS writes of data encountered at the end of a file or in direct I/O situations, commonly yield these roundups within RDMA Read Chunks.  Because any roundup bytes are not actually present in the data buffers being written, memory for these bytes would come from noncontiguous buffers, either as an additional memory registration segment, or as an additional Chunk.  The overhead of these operations can be significant for the sender, which must marshal them, and even higher for the receiver, which must transfer them.  Senders SHOULD therefore avoid encoding individual RDMA Read Chunks for roundup whenever possible.  It is acceptable, but not necessary, to include roundup data in an existing RDMA Read Chunk, but only if it is already present in the XDR stream to carry upper layer data.

Note that there is no exposure of additional data at the sender due to eliding roundup data from the XDR stream, since any additional sender buffers are never exposed to the peer.  The data is literally not there to be transferred.

For RDMA Write Chunks, a simpler encoding method applies.  Again, roundup bytes are not transferred; instead, the chunk length sent to the receiver in the reply is simply increased to include any roundup.  Because of the requirement that the RDMA Write chunks are filled sequentially without gaps, this situation can only occur on the final chunk receiving data.  Therefore there is no opportunity for roundup data to insert misalignment or positional gaps into the XDR stream.

3.8. RPC Call and Reply

The RDMA transport for RPC provides three methods of moving data between RPC client and server:

   Inline
      Data are moved between RPC client and server within an RDMA
      Send.

   RDMA Read
      Data are moved between RPC client and server via an RDMA Read
      operation, using a steering tag, address and offset obtained
      from a read chunk list.

   RDMA Write
      Result data is moved from RPC server to client via an RDMA
      Write operation, using a steering tag, address and offset
      obtained from a write chunk list or reply chunk in the client's
      RPC call message.

These methods of data movement may occur in combinations within a single RPC.  For instance, an RPC call may contain some inline data along with some large chunks to be transferred via RDMA Read to the server.  The reply to that call may have some result chunks that the server RDMA Writes back to the client.
The following protocol 657 interactions illustrate RPC calls that use these methods to move 658 RPC message data: 660 An RPC with write chunks in the call message: 662 RPC Client RPC Server 663 | RPC Call + Write Chunk list | 664 Send | ------------------------------> | 665 | | 666 | Chunk 1 | 667 | <------------------------------ | Write 668 | : | 669 | Chunk n | 670 | <------------------------------ | Write 671 | | 672 | RPC Reply | 673 | <------------------------------ | Send 675 In the presence of write chunks, RDMA ordering provides the 676 guarantee that all data in the RDMA Write operations has been 677 placed in memory prior to the client's RPC reply processing. 679 An RPC with read chunks in the call message: 681 RPC Client RPC Server 682 | RPC Call + Read Chunk list | 683 Send | ------------------------------> | 684 | | 685 | Chunk 1 | 686 | +------------------------------ | Read 687 | v-----------------------------> | 688 | : | 689 | Chunk n | 690 | +------------------------------ | Read 691 | v-----------------------------> | 692 | | 693 | RPC Reply | 694 | <------------------------------ | Send 696 An RPC with read chunks in the reply message: 698 RPC Client RPC Server 699 | RPC Call | 700 Send | ------------------------------> | 701 | | 702 | RPC Reply + Read Chunk list | 703 | <------------------------------ | Send 704 | | 705 | Chunk 1 | 706 Read | ------------------------------+ | 707 | <-----------------------------v | 708 | : | 709 | Chunk n | 710 Read | ------------------------------+ | 711 | <-----------------------------v | 712 | | 713 | Done | 714 Send | ------------------------------> | 716 The final Done message allows the RPC client to signal the server 717 that it has received the chunks, so the server can de-register and 718 free the memory holding the chunks. A Done completion is not 719 necessary for an RPC call, since the RPC reply Send is itself a 720 receive completion notification. In the event that the client 721 fails to return the Done message within some timeout period, the 722 server MAY conclude that a protocol violation has occurred and 723 close the RPC connection, or it MAY proceed with a de-register and 724 free its chunk buffers. This may result in a fatal RDMA error if 725 the client later attempts to perform an RDMA Read operation, which 726 amounts to the same thing. 728 The use of read chunks in RPC reply messages is much less efficient 729 than providing write chunks in the originating RPC calls, due to 730 the additional message exchanges, the need for the RPC server to 731 advertise buffers to the peer, the necessity of the server 732 maintaining a timer for the purpose of recovery from misbehaving 733 clients, and the need for additional memory registration. Their 734 use is NOT RECOMMENDED by upper layers where efficiency is a 735 primary concern. [NFSDDP] However, they MAY be employed by upper 736 layer protocol bindings which are primarily concerned with 737 transparency, since they can frequently be implemented completely 738 within the RPC lower layers. 740 It is important to note that the Done message consumes a credit at 741 the RPC server. The RPC server SHOULD provide sufficient credits 742 to the client to allow the Done message to be sent without deadlock 743 (driving the outstanding credit count to zero). 
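As a purely illustrative sketch of such client-side credit accounting (names are hypothetical; this is not part of the protocol), both RPC calls and Done messages are charged against the most recently granted value described in Section 3.3:

   class SendCredits:
       # Client-side view of the server-granted RDMA Send credit limit.

       def __init__(self, initial_grant=1):
           self.granted = initial_grant   # most recent grant from a reply header
           self.outstanding = 0           # unretired RPC calls plus Done messages

       def reserve(self):
           # Call before posting any RDMA Send (an RPC call or an RDMA_DONE).
           if self.outstanding >= self.granted:
               raise RuntimeError("would exceed the granted credit limit")
           self.outstanding += 1

       def reply_received(self, granted_in_reply):
           # Each reply retires the credit consumed by its call and carries
           # the server's current grant, which may be adjusted up or down.
           self.outstanding -= 1
           self.granted = granted_in_reply

Credits consumed by Done messages are recovered only as the server replenishes the grant in subsequent replies.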
The RPC client 744 MUST account for its required Done messages to the server in its 745 accounting of available credits, and the server SHOULD replenish 746 any credit consumed by its use of such exchanges at its earliest 747 opportunity. 749 Finally, it is possible to conceive of RPC exchanges that involve 750 any or all combinations of write chunks in the RPC call, read 751 chunks in the RPC call, and read chunks in the RPC reply. Support 752 for such exchanges is straightforward from a protocol perspective, 753 but in practice such exchanges would be quite rare, limited to 754 upper layer protocol exchanges which transferred bulk data in both 755 the call and corresponding reply. 757 3.9. Padding 759 Alignment of specific opaque data enables certain scatter/gather 760 optimizations. Padding leverages the useful property that RDMA 761 transfers preserve alignment of data, even when they are placed 762 into pre-posted receive buffers by Sends. 764 Many servers can make good use of such padding. Padding allows the 765 chaining of RDMA receive buffers such that any data transferred by 766 RDMA on behalf of RPC requests will be placed into appropriately 767 aligned buffers on the system that receives the transfer. In this 768 way, the need for servers to perform RDMA Read to satisfy all but 769 the largest client writes is obviated. 771 The effect of padding is demonstrated below showing prior bytes on 772 an XDR stream (XXX) followed by an opaque field consisting of four 773 length bytes (LLLL) followed by data bytes (DDDD). The receiver of 774 the RDMA Send has posted two chained receive buffers. Without 775 padding, the opaque data is split across the two buffers. With the 776 addition of padding bytes ("ppp" in the figure below) prior to the 777 first data byte, the data can be forced to align correctly in the 778 second buffer. 780 Buffer 1 Buffer 2 781 Unpadded -------------- -------------- 783 XXXXXXXLLLLDDDDDDDDDDDDDD ---> XXXXXXXLLLLDDD DDDDDDDDDDD 785 Padded 787 XXXXXXXLLLLpppDDDDDDDDDDDDDD ---> XXXXXXXLLLLppp DDDDDDDDDDDDDD 789 Padding is implemented completely within the RDMA transport 790 encoding, flagged with a specific message type. Where padding is 791 applied, two values are passed to the peer: an "rdma_align" which 792 is the padding value used, and "rdma_thresh", which is the opaque 793 data size at or above which padding is applied. For instance, if 794 the server is using chained 4 KB receive buffers, then up to (4 KB 795 - 1) padding bytes could be used to achieve alignment of the data. 796 The XDR routine at the peer MUST consult these values when decoding 797 opaque values. Where the decoded length exceeds the rdma_thresh, 798 the XDR decode MUST skip over the appropriate padding as indicated 799 by rdma_align and the current XDR stream position. 801 4. RPC RDMA Message Layout 803 RPC call and reply messages are conveyed across an RDMA transport 804 with a prepended RPC over RDMA header. The RPC over RDMA header 805 includes data for RDMA flow control credits, padding parameters and 806 lists of addresses that provide direct data placement via RDMA Read 807 and Write operations. The layout of the RPC message itself is 808 unchanged from that described in [RFC1831bis] except for the 809 possible exclusion of large data chunks that will be moved by RDMA 810 Read or Write operations. 
If the RPC message (along with the RPC over RDMA header) is too long for the posted receive buffer (even after any large chunks are removed), then the entire RPC message MAY be moved separately as a chunk, leaving just the RPC over RDMA header in the RDMA Send.

4.1. RPC over RDMA Header

The RPC over RDMA header begins with four 32-bit fields that are always present and which control the RDMA interaction including RDMA-specific flow control.  These are then followed by a number of items such as chunk lists and padding which MAY or MUST NOT be present depending on the type of transmission.  The four fields which are always present are:

   1. Transaction ID (XID).
      The XID generated for the RPC call and reply.  Having the XID at
      the beginning of the message makes it easy to establish the
      message context.  This XID MUST be the same as the XID in the
      RPC header.  The receiver MAY perform its processing based
      solely on the XID in the RPC over RDMA header, and thereby
      ignore the XID in the RPC header, if it so chooses.

   2. Version number.
      This version of the RPC RDMA message protocol is 1.  The version
      number MUST be increased by one whenever the format of the RPC
      RDMA messages is changed.

   3. Flow control credit value.
      When sent in an RPC call message, the requested value is
      provided.  When sent in an RPC reply message, the granted value
      is returned.  RPC calls SHOULD NOT be sent in excess of the
      currently granted limit.

   4. Message type.

      o  RDMA_MSG = 0 indicates that chunk lists and RPC message
         follow.

      o  RDMA_NOMSG = 1 indicates that after the chunk lists there is
         no RPC message.  In this case, the chunk lists provide
         information to allow the message proper to be transferred
         using RDMA Read or Write, and thus it is not appended to the
         RPC over RDMA header.

      o  RDMA_MSGP = 2 indicates that a chunk list and RPC message
         with some padding follow.

      o  RDMA_DONE = 3 indicates that the message signals the
         completion of a chunk transfer via RDMA Read.

      o  RDMA_ERROR = 4 is used to signal any detected error(s) in the
         RPC RDMA chunk encoding.

Because the version number is encoded as part of this header, and the RDMA_ERROR message type is used to indicate errors, these first four fields and the start of the following message body MUST always remain aligned at these fixed offsets for all versions of the RPC over RDMA header.

For a message of type RDMA_MSG or RDMA_NOMSG, the Read and Write chunk lists follow.  If the Read chunk list is null (a 32-bit word of zeros), then there are no chunks to be transferred separately and the RPC message follows in its entirety.  If non-null, then it is the beginning of an XDR encoded sequence of Read chunk list entries.  If the Write chunk list is non-null, then an XDR encoded sequence of Write chunk entries follows.

If the message type is RDMA_MSGP, then two additional fields that specify the padding alignment and threshold are inserted prior to the Read and Write chunk lists.

A header of message type RDMA_MSG or RDMA_MSGP MUST be followed by the RPC call or RPC reply message body, beginning with the XID.  The XID in the RDMA_MSG or RDMA_MSGP header MUST match this.
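As an illustration only (not a normative definition), the four fixed fields can be produced by any XDR encoder, since each is a big-endian 32-bit word; the on-the-wire layout diagram follows the sketch:

   import struct

   RDMA_MSG, RDMA_NOMSG, RDMA_MSGP, RDMA_DONE, RDMA_ERROR = range(5)
   RPCRDMA_VERSION = 1

   def encode_fixed_header(xid, credits, msg_type):
       # The four always-present fields, XDR-encoded as big-endian words:
       # XID, version, flow control credit value, message type.  Chunk
       # lists, optional padding fields, and the RPC message body follow.
       return struct.pack(">4I", xid, RPCRDMA_VERSION, credits, msg_type)

   # Example: header prefix for an RDMA_MSG call requesting 32 credits.
   hdr = encode_fixed_header(xid=0x12345678, credits=32, msg_type=RDMA_MSG)
   assert len(hdr) == 16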
886 +--------+---------+---------+-----------+-------------+---------- 887 | | | | Message | NULLs | RPC Call 888 | XID | Version | Credits | Type | or | or 889 | | | | | Chunk Lists | Reply Msg 890 +--------+---------+---------+-----------+-------------+---------- 892 Note that in the case of RDMA_DONE and RDMA_ERROR, no chunk list or 893 RPC message follows. As an implementation hint: a gather operation 894 on the Send of the RDMA RPC message can be used to marshal the 895 initial header, the chunk list, and the RPC message itself. 897 4.2. RPC over RDMA header errors 899 When a peer receives an RPC RDMA message, it MUST perform the 900 following basic validity checks on the header and chunk contents. 901 If such errors are detected in the request, an RDMA_ERROR reply 902 MUST be generated. 904 Two types of errors are defined, version mismatch and invalid chunk 905 format. When the peer detects an RPC over RDMA header version 906 which it does not support (currently this draft defines only 907 version 1), it replies with an error code of ERR_VERS, and provides 908 the low and high inclusive version numbers it does, in fact, 909 support. The version number in this reply MUST be any value 910 otherwise valid at the receiver. When other decoding errors are 911 detected in the header or chunks, either an RPC decode error MAY be 912 returned, or the ROC/RDMA error code ERR_CHUNK MUST be returned. 914 4.3. XDR Language Description 916 Here is the message layout in XDR language. 918 struct xdr_rdma_segment { 919 uint32 handle; /* Registered memory handle */ 920 uint32 length; /* Length of the chunk in bytes */ 921 uint64 offset; /* Chunk virtual address or offset */ 922 }; 924 struct xdr_read_chunk { 925 uint32 position; /* Position in XDR stream */ 926 struct xdr_rdma_segment target; 927 }; 929 struct xdr_read_list { 930 struct xdr_read_chunk entry; 931 struct xdr_read_list *next; 932 }; 934 struct xdr_write_chunk { 935 struct xdr_rdma_segment target<>; 936 }; 938 struct xdr_write_list { 939 struct xdr_write_chunk entry; 940 struct xdr_write_list *next; 941 }; 943 struct rdma_msg { 944 uint32 rdma_xid; /* Mirrors the RPC header xid */ 945 uint32 rdma_vers; /* Version of this protocol */ 946 uint32 rdma_credit; /* Buffers requested/granted */ 947 rdma_body rdma_body; 948 }; 950 enum rdma_proc { 951 RDMA_MSG=0, /* An RPC call or reply msg */ 952 RDMA_NOMSG=1, /* An RPC call or reply msg - separate body */ 953 RDMA_MSGP=2, /* An RPC call or reply msg with padding */ 954 RDMA_DONE=3, /* Client signals reply completion */ 955 RDMA_ERROR=4 /* An RPC RDMA encoding error */ 956 }; 957 union rdma_body switch (rdma_proc proc) { 958 case RDMA_MSG: 959 rpc_rdma_header rdma_msg; 960 case RDMA_NOMSG: 961 rpc_rdma_header_nomsg rdma_nomsg; 962 case RDMA_MSGP: 963 rpc_rdma_header_padded rdma_msgp; 964 case RDMA_DONE: 965 void; 966 case RDMA_ERROR: 967 rpc_rdma_error rdma_error; 968 }; 970 struct rpc_rdma_header { 971 struct xdr_read_list *rdma_reads; 972 struct xdr_write_list *rdma_writes; 973 struct xdr_write_chunk *rdma_reply; 974 /* rpc body follows */ 975 }; 977 struct rpc_rdma_header_nomsg { 978 struct xdr_read_list *rdma_reads; 979 struct xdr_write_list *rdma_writes; 980 struct xdr_write_chunk *rdma_reply; 981 }; 983 struct rpc_rdma_header_padded { 984 uint32 rdma_align; /* Padding alignment */ 985 uint32 rdma_thresh; /* Padding threshold */ 986 struct xdr_read_list *rdma_reads; 987 struct xdr_write_list *rdma_writes; 988 struct xdr_write_chunk *rdma_reply; 989 /* rpc body follows */ 990 }; 991 enum 
rpc_rdma_errcode { 992 ERR_VERS = 1, 993 ERR_CHUNK = 2 994 }; 996 union rpc_rdma_error switch (rpc_rdma_errcode err) { 997 case ERR_VERS: 998 uint32 rdma_vers_low; 999 uint32 rdma_vers_high; 1000 case ERR_CHUNK: 1001 void; 1002 default: 1003 uint32 rdma_extra[8]; 1004 }; 1006 5. Long Messages 1008 The receiver of RDMA Send messages is required by RDMA to have 1009 previously posted one or more adequately sized buffers. The RPC 1010 client can inform the server of the maximum size of its RDMA Send 1011 messages via the Connection Configuration Protocol described later 1012 in this document. 1014 Since RPC messages are frequently small, memory savings can be 1015 achieved by posting small buffers. Even large messages like NFS 1016 READ or WRITE will be quite small once the chunks are removed from 1017 the message. However, there may be large messages that would 1018 demand a very large buffer be posted, where the contents of the 1019 buffer may not be a chunkable XDR element. A good example is an 1020 NFS READDIR reply which may contain a large number of small 1021 filename strings. Also, the NFS version 4 protocol [RFC3530] 1022 features COMPOUND request and reply messages of unbounded length. 1024 Ideally, each upper layer will negotiate these limits. However, it 1025 is frequently necessary to provide a transparent solution. 1027 5.1. Message as an RDMA Read Chunk 1029 One relatively simple method is to have the client identify any RPC 1030 message that exceeds the RPC server's posted buffer size and move 1031 it separately as a chunk, i.e., reference it as the first entry in 1032 the read chunk list with an XDR position of zero. 1034 Normal Message 1036 +--------+---------+---------+------------+-------------+---------- 1037 | | | | | | RPC Call 1038 | XID | Version | Credits | RDMA_MSG | Chunk Lists | or 1039 | | | | | | Reply Msg 1040 +--------+---------+---------+------------+-------------+---------- 1042 Long Message 1044 +--------+---------+---------+------------+-------------+ 1045 | | | | | | 1046 | XID | Version | Credits | RDMA_NOMSG | Chunk Lists | 1047 | | | | | | 1048 +--------+---------+---------+------------+-------------+ 1049 | 1050 | +---------- 1051 | | Long RPC Call 1052 +->| or 1053 | Reply Message 1054 +---------- 1056 If the receiver gets an RPC over RDMA header with a message type of 1057 RDMA_NOMSG and finds an initial read chunk list entry with a zero 1058 XDR position, it allocates a registered buffer and issues an RDMA 1059 Read of the long RPC message into it. The receiver then proceeds 1060 to XDR decode the RPC message as if it had received it inline with 1061 the Send data. Further decoding may issue additional RDMA Reads to 1062 bring over additional chunks. 1064 Although the handling of long messages requires one extra network 1065 turnaround, in practice these messages will be rare if the posted 1066 receive buffers are correctly sized, and of course they will be 1067 non-existent for RDMA-aware upper layers. 
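The sender-side choice reduces to a size test against the peer's inline limit (known, for example, via the Connection Configuration Protocol of Section 6).  The following sketch is illustrative only, with hypothetical names; the corresponding protocol interactions are diagrammed next:

   RDMA_MSG, RDMA_NOMSG = 0, 1

   def choose_transfer(encoded_msg, max_inline, register):
       # 'encoded_msg' is the XDR-encoded RPC message with any chunk-
       # eligible data already removed; 'register' registers memory and
       # returns a (handle, length, offset) triple.  A real implementation
       # would also account for the RPC over RDMA header and chunk lists
       # when comparing against the peer's receive buffer size.
       if len(encoded_msg) <= max_inline:
           # Fits in the peer's posted buffer: send inline after the
           # RPC over RDMA header (message type RDMA_MSG).
           return RDMA_MSG, [], encoded_msg
       # Too large: advertise the entire message as the first read chunk,
       # with XDR position zero, and send only the header (RDMA_NOMSG).
       handle, length, offset = register(encoded_msg)
       return RDMA_NOMSG, [(0, handle, length, offset)], b""

For replies, the server makes the analogous decision, preferring a client-provided reply chunk (Section 5.2) when one is available.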
1069 A long call RPC with request supplied via RDMA Read 1071 RPC Client RPC Server 1072 | RDMA over RPC Header | 1073 Send | ------------------------------> | 1074 | | 1075 | Long RPC Call Msg | 1076 | +------------------------------ | Read 1077 | v-----------------------------> | 1078 | | 1079 | RDMA over RPC Reply | 1080 | <------------------------------ | Send 1082 An RPC with long reply returned via RDMA Read 1084 RPC Client RPC Server 1085 | RPC Call | 1086 Send | ------------------------------> | 1087 | | 1088 | RDMA over RPC Header | 1089 | <------------------------------ | Send 1090 | | 1091 | Long RPC Reply Msg | 1092 Read | ------------------------------+ | 1093 | <-----------------------------v | 1094 | | 1095 | Done | 1096 Send | ------------------------------> | 1098 It is possible for a single RPC procedure to employ both a long 1099 call for its arguments, and a long reply for its results. However, 1100 such an operation is atypical, as few upper layers define such 1101 exchanges. 1103 5.2. RDMA Write of Long Replies (Reply Chunks) 1105 A superior method of handling long RPC replies is to have the RPC 1106 client post a large buffer into which the server can write a large 1107 RPC reply. This has the advantage that an RDMA Write may be 1108 slightly faster in network latency than an RDMA Read, and does not 1109 require the server to wait for the completion as it must for RDMA 1110 Read. Additionally, for a reply it removes the need for an 1111 RDMA_DONE message if the large reply is returned as a Read chunk. 1113 This protocol supports direct return of a large reply via the 1114 inclusion of an OPTIONAL rdma_reply write chunk after the read 1115 chunk list and the write chunk list. The client allocates a buffer 1116 sized to receive a large reply and enters its steering tag, address 1117 and length in the rdma_reply write chunk. If the reply message is 1118 too long to return inline with an RDMA Send (exceeds the size of 1119 the client's posted receive buffer), even with read chunks removed, 1120 then the RPC server performs an RDMA Write of the RPC reply message 1121 into the buffer indicated by the rdma_reply chunk. If the client 1122 doesn't provide an rdma_reply chunk, or if it's too small, then if 1123 the upper layer specification permits, the message MAY be returned 1124 as a Read chunk. 1126 An RPC with long reply returned via RDMA Write 1128 RPC Client RPC Server 1129 | RPC Call with rdma_reply | 1130 Send | ------------------------------> | 1131 | | 1132 | Long RPC Reply Msg | 1133 | <------------------------------ | Write 1134 | | 1135 | RDMA over RPC Header | 1136 | <------------------------------ | Send 1138 The use of RDMA Write to return long replies requires that the 1139 client applications anticipate a long reply and have some knowledge 1140 of its size so that an adequately sized buffer can be allocated. 1141 This is certainly true of NFS READDIR replies; where the client 1142 already provides an upper bound on the size of the encoded 1143 directory fragment to be returned by the server. 1145 The use of these "reply chunks" is highly efficient and convenient 1146 for both RPC client and server. Their use is encouraged for 1147 eligible RPC operations such as NFS READDIR, which would otherwise 1148 require extensive chunk management within the results or use of 1149 RDMA Read and a Done message. [NFSDDP] 1151 6. 
6. Connection Configuration Protocol

RDMA Send operations require the receiver to post one or more buffers at the RDMA connection endpoint, each large enough to receive the largest Send message. Buffers are consumed as Send messages are received. If a buffer is too small, or if there are no buffers posted, the RDMA transport MAY return an error and break the RDMA connection. The receiver MUST post sufficient, adequately sized buffers to avoid buffer overrun or capacity errors.

The protocol described above includes only a mechanism for managing the number of such receive buffers, and no explicit features to allow the RPC client and server to provision or control buffer sizing, nor any other session parameters.

In the past, this type of connection management has not been necessary for RPC. RPC over UDP or TCP does not have a protocol to negotiate the link. The server can get a rough idea of the maximum size of messages from the server protocol code. However, a protocol to negotiate transport features on a more dynamic basis is desirable.

The Connection Configuration Protocol allows the client to pass its connection requirements to the server, and allows the server to inform the client of its connection limits.

Use of the Connection Configuration Protocol by an upper layer is OPTIONAL.

6.1. Initial Connection State

This protocol MAY be used for connection setup prior to the use of another RPC protocol that uses the RDMA transport. It operates in-band, i.e., it uses the connection itself to negotiate the connection parameters. To provide a basis for connection negotiation, the connection is assumed to provide a basic level of interoperability: the ability to exchange at least one RPC message at a time that is at least 1 KB in size. The server MAY exceed this basic level of configuration, but the client MUST NOT assume more than one, and MUST receive a valid reply from the server carrying the actual number of available receive messages, prior to sending its next request.

6.2. Protocol Description

Version 1 of the Connection Configuration protocol consists of a single procedure that allows the client to inform the server of its connection requirements and the server to return connection information to the client.

The maxcall_sendsize argument is the maximum size of an RPC call message that the client MAY send inline in an RDMA Send message to the server. The server MAY return a maxcall_sendsize value that is smaller or larger than the client's request. The client MUST NOT send an inline call message larger than what the server will accept. The maxcall_sendsize limits only the size of inline RPC calls. It does not limit the size of long RPC messages transferred as an initial chunk in the Read chunk list.

The maxreply_sendsize is the maximum size of an inline RPC message that the client will accept from the server.

The maxrdmaread is the maximum number of RDMA Reads that may be active at the peer. This number correlates to the incoming RDMA Read count ("IRD") configured into each originating endpoint by the client or server. If the connected peer issues more than this number of RDMA Read operations simultaneously, connection loss or suboptimal flow control may result; the value therefore SHOULD be observed at all times. The peers' values need not be equal. If zero, the peer MUST NOT issue requests which require RDMA Read to satisfy, as no transfer will be possible.

The align value is the alignment recommended by the server for opaque data values such as strings and counted byte arrays. The client MAY use this value to compute the number of prepended pad bytes when XDR encoding opaque values in the RPC call message.

   typedef unsigned int uint32;

   struct config_rdma_req {
        uint32  maxcall_sendsize;
                /* max size of inline RPC call */
        uint32  maxreply_sendsize;
                /* max size of inline RPC reply */
        uint32  maxrdmaread;
                /* max active RDMA Reads at client */
   };

   struct config_rdma_reply {
        uint32  maxcall_sendsize;
                /* max call size accepted by server */
        uint32  align;
                /* server's receive buffer alignment */
        uint32  maxrdmaread;
                /* max active RDMA Reads at server */
   };

   program CONFIG_RDMA_PROG {
        version VERS1 {
             /*
              * Config call/reply
              */
             config_rdma_reply CONF_RDMA(config_rdma_req) = 1;
        } = 1;
   } = 100400;
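As a non-normative illustration, a client using a conventional ONC RPC library might invoke this procedure roughly as follows. The sketch assumes a header generated by rpcgen from the XDR definition above (here called "config_rdma.h", which would supply the structures, the XDR routines, and the CONFIG_RDMA_PROG, VERS1 and CONF_RDMA constants), a TI-RPC implementation that recognizes the "rdma" netid, and simplified error handling.

   /* Non-normative sketch; see the assumptions stated above. */
   #include <stdint.h>
   #include <string.h>
   #include <sys/time.h>
   #include <rpc/rpc.h>
   #include "config_rdma.h"   /* hypothetical rpcgen output */

   int negotiate_rdma_connection(const char *server,
                                 uint32_t *send_size, /* in/out: inline
                                                         call size     */
                                 uint32_t recv_size,
                                 uint32_t ird)
   {
       struct config_rdma_req   req;
       struct config_rdma_reply rep;
       struct timeval tv = { 25, 0 };
       CLIENT *clnt;

       /* The "rdma" netid is the one defined in section 12. */
       clnt = clnt_create(server, CONFIG_RDMA_PROG, VERS1, "rdma");
       if (clnt == NULL)
           return -1;

       req.maxcall_sendsize  = *send_size; /* largest inline call sent */
       req.maxreply_sendsize = recv_size;  /* largest inline reply taken*/
       req.maxrdmaread       = ird;        /* our incoming Read depth  */

       memset(&rep, 0, sizeof(rep));
       if (clnt_call(clnt, CONF_RDMA,
                     (xdrproc_t)xdr_config_rdma_req,   (char *)&req,
                     (xdrproc_t)xdr_config_rdma_reply, (char *)&rep,
                     tv) != RPC_SUCCESS) {
           clnt_destroy(clnt);
           return -1;
       }
       clnt_destroy(clnt);

       /* Honor the server's limits from here on. */
       if (rep.maxcall_sendsize < *send_size)
           *send_size = rep.maxcall_sendsize;  /* clamp inline calls   */

       return 0;
   }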
7. Memory Registration Overhead

RDMA requires that all data be transferred between registered memory regions at the source and destination. All protocol headers as well as separately transferred data chunks use registered memory. Since the cost of registering and de-registering memory can be a large proportion of the RDMA transaction cost, it is important to minimize registration activity. This is easily achieved within RPC controlled memory by allocating chunk list data and RPC headers in a reusable way from pre-registered pools.

The data chunks transferred via RDMA MAY occupy memory that persists outside the bounds of the RPC transaction. Hence, the default behavior of an RPC over RDMA transport is to register and de-register these chunks on every transaction. However, this is not a limitation of the protocol, but only of the existing local RPC API. The API is easily extended through such functions as rpc_control(3) to change the default behavior, so that the application can assume responsibility for controlling memory registration through an RPC-provided registered memory allocator.
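The pre-registered pool approach described above might be structured as in the following non-normative sketch. The registration calls are placeholders for whatever the local RDMA provider offers; none of the names are defined by this document.

   /* Non-normative sketch of a reusable, pre-registered buffer pool. */
   #include <stdlib.h>
   #include <stdint.h>
   #include <stddef.h>

   #define POOL_BUFS 64
   #define BUF_SIZE  1024          /* header and chunk-list buffers    */

   struct reg_buf {
       void     *addr;
       uint32_t  lkey;             /* local registration handle        */
       struct reg_buf *next;
   };

   /* Placeholders for the local RDMA provider's registration API. */
   extern int  rdma_register_memory(void *addr, size_t len,
                                    uint32_t *lkey);
   extern void rdma_deregister_memory(uint32_t lkey);

   static struct reg_buf *free_list;

   /* Register the pool once, at transport setup time. */
   int pool_init(void)
   {
       for (int i = 0; i < POOL_BUFS; i++) {
           struct reg_buf *b = malloc(sizeof(*b));
           if (b == NULL)
               return -1;
           b->addr = malloc(BUF_SIZE);
           if (b->addr == NULL ||
               rdma_register_memory(b->addr, BUF_SIZE, &b->lkey) != 0)
               return -1;
           b->next = free_list;
           free_list = b;
       }
       return 0;
   }

   /* Per-RPC buffers come from the pool: no per-call registration. */
   struct reg_buf *pool_get(void)
   {
       struct reg_buf *b = free_list;
       if (b != NULL)
           free_list = b->next;
       return b;
   }

   void pool_put(struct reg_buf *b)
   {
       b->next = free_list;
       free_list = b;
   }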
8. Errors and Error Recovery

RPC RDMA protocol errors are described in section 4. RPC errors and RPC error recovery are not affected by the protocol, and proceed as for any RPC error condition. RDMA transport error reporting and recovery are outside the scope of this protocol.

It is assumed that the link itself will provide some degree of error detection and retransmission. iWARP's MPA layer (when used over TCP), SCTP, as well as the Infiniband link layer all provide CRC protection of the RDMA payload, and CRC-class protection is a general attribute of such transports. Additionally, the RPC layer itself can accept errors from the link level and recover via retransmission. RPC recovery can handle complete loss and re-establishment of the link.

See section 11 for further discussion of the use of RPC-level integrity schemes to detect errors, and related efficiency issues.

9. Node Addressing

In setting up a new RDMA connection, the first action by an RPC client will be to obtain a transport address for the server. The mechanism used to obtain this address, and to open an RDMA connection, is dependent on the type of RDMA transport and is the responsibility of each RPC protocol binding and its local implementation.

10. RPC Binding

RPC services normally register with a portmap or rpcbind [RFC1833] service, which associates an RPC program number with a service address. (In the case of UDP or TCP, the service address for NFS is normally port 2049.) This policy is no different with RDMA interconnects, although it may require the allocation of port numbers appropriate to each upper layer binding which uses the RPC framing defined here.

When mapped atop the iWARP [RFC5040, RFC5041] transport, which uses IP port addressing due to its layering on TCP and/or SCTP, port mapping is trivial and consists merely of issuing the port in the connection process.

When mapped atop Infiniband [IB], which uses a GID-based service endpoint naming scheme, a translation MUST be employed. One such translation is defined in the Infiniband Port Addressing Annex [IBPORT], which is appropriate for translating IP port addressing to the Infiniband network. Therefore, in this case, IP port addressing may be readily employed by the upper layer.

When a mapping standard or convention exists for IP ports on an RDMA interconnect, there are several possibilities for each upper layer to consider:

   One possibility is to have an upper layer server register its
   mapped IP port with the rpcbind service, under the netid (or
   netids) defined here.  An RPC/RDMA-aware client can then
   resolve its desired service to a mappable port, and proceed to
   connect.  This is the most flexible and compatible approach,
   for those upper layers which are defined to use the rpcbind
   service.

   A second possibility is to have the server's portmapper
   register itself on the RDMA interconnect at a "well known"
   service address.  (On UDP or TCP, this corresponds to port
   111.)  A client could connect to this service address and use
   the portmap protocol to obtain a service address in response
   to a program number, e.g., an iWARP port number, or an
   Infiniband GID.

   Alternatively, the client could simply connect to the mapped
   well-known port for the service itself, if it is appropriately
   defined.

Historically, different RPC protocols have taken different approaches to their port assignment; therefore, the specific method is left to each RPC/RDMA-enabled upper layer binding, and is not addressed here.

This specification defines a new "netid", to be used for registration of upper layers atop iWARP [RFC5040, RFC5041] and (when a suitable port translation service is available) Infiniband [IB] in section 12, "IANA Considerations." Additional RDMA-capable networks MAY define their own netids, or if they provide a port translation, MAY share the one defined here.

11. Security Considerations

RPC provides its own security via the RPCSEC_GSS framework [RFC2203]. RPCSEC_GSS can provide message authentication, integrity checking, and privacy. This security mechanism will be unaffected by the RDMA transport.
The data integrity and privacy features alter the body of the message, presenting it as a single chunk. For large messages the chunk may be large enough to qualify for RDMA Read transfer. However, there is much data movement associated with computation and verification of integrity, or encryption/decryption, so certain performance advantages may be lost.

For efficiency, a more appropriate security mechanism for RDMA links may be link-level protection, such as certain configurations of IPsec, which may be co-located in the RDMA hardware. The use of link-level protection MAY be negotiated through the use of the new RPCSEC_GSS mechanism defined in [RPCSECGSSV2], in conjunction with the Channel Binding mechanism [RFC5056] and IPsec Channel Connection Latching [BTNSLATCH]. Use of such mechanisms is REQUIRED where integrity and/or privacy is desired, and where efficiency is required.

An additional consideration is the protection of the integrity and privacy of local memory by the RDMA transport itself. The use of RDMA by RPC MUST NOT introduce any vulnerabilities to system memory contents, or to memory owned by user processes. These protections are provided by the RDMA layer specifications, and specifically their security models. It is REQUIRED that any RDMA provider used for RPC transport be conformant to the requirements of [RFC5042] in order to satisfy these protections.

Once delivered securely by the RDMA provider, any RDMA-exposed addresses will contain only RPC payloads in the chunk lists, transferred under the protection of RPCSEC_GSS integrity and privacy. By these means, the data will be protected end-to-end, as required by the RPC layer security model.

Where results are supplied to the requester via Read chunks, a server resource deficit can arise if the client does not promptly acknowledge their status via the RDMA_DONE message. This can potentially lead to a denial of service situation, with a single client unfairly (and unnecessarily) consuming server RDMA resources. Servers MUST protect against this situation, originating from one or many clients. For example, a time-based window of buffer availability may be offered; if the client fails to obtain the data within the window, it will simply retry using ordinary RPC retry semantics. Or, a more severe method would be for the server to simply close the client's RDMA connection, freeing the RDMA resources and allowing the server to reclaim them.

A fairer and more useful method is provided by the protocol itself. The server MAY use the rdma_credit value to limit the number of outstanding requests for each client. By including the number of outstanding RDMA_DONE completions in the computation of available client credits, the server can limit its exposure to each client, and therefore provide uninterrupted service as its resources permit.

However, the server must ensure that it does not decrease the credit count to zero with this method, since the RDMA_DONE message is not acknowledged. If the credit count were to drop to zero solely due to outstanding RDMA_DONE messages, the client would deadlock, since it would never obtain a new credit with which to continue. Therefore, if the server adjusts credits to zero for outstanding RDMA_DONE, it MUST withhold its reply to at least one message in order to provide the next credit. The time-based window (or any other appropriate method) SHOULD be used by the server to recover resources in the event that the client never returns.
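The credit computation discussed above can be illustrated with a small non-normative sketch. It simply clamps the advertised credit value so that it never reaches zero; a real server would instead apply the reply-withholding rule above together with a resource-recovery window.

   /* Non-normative sketch of per-client credit accounting. */
   #include <stdint.h>

   /*
    * max_credits:      per-client limit the server supports
    * outstanding_done: replies returned as Read chunks still awaiting
    *                   the client's RDMA_DONE
    *
    * Returns the value to place in the rdma_credit field of the next
    * reply to this client.
    */
   uint32_t grant_credits(uint32_t max_credits, uint32_t outstanding_done)
   {
       /* Charge unacknowledged Read-chunk replies against the limit. */
       uint32_t granted = (outstanding_done < max_credits)
                              ? max_credits - outstanding_done : 0;

       /*
        * Never advertise zero: RDMA_DONE is not itself acknowledged,
        * so a client with zero credits could never send the message
        * that would replenish them.
        */
       return granted > 0 ? granted : 1;
   }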
The "Connection Configuration Protocol", when used, MUST be protected by an appropriate RPC security flavor, to ensure it is not attacked in the process of initiating an RPC/RDMA connection.

12. IANA Considerations

The new RPC transport is to be assigned a new RPC "netid", which is an rpcbind [RFC1833] string used to describe the underlying protocol in order for RPC to select the appropriate transport framing, as well as the format of the service ports.

The following "nc_proto" registry string is hereby defined for this purpose:

   NC_RDMA "rdma"

This netid MAY be used for any RDMA network satisfying the requirements of section 2, and able to identify service endpoints using IP port addressing, possibly through use of a translation service as described above in section 10, RPC Binding.

As a new RPC transport, this protocol has no effect on RPC program numbers or existing registered port numbers. However, new port numbers MAY be registered for use by RPC/RDMA-enabled services, as appropriate to the new networks over which the services will operate.

The OPTIONAL Connection Configuration protocol described herein requires an RPC program number assignment. The value "100400" is hereby assigned:

   rdmaconfig  100400  rpc.rdmaconfig

Currently, neither the nc_proto netids nor the RPC program numbers are assigned by IANA. The list in [RFC1833] has served as the netid registry, and the republication declared in [IANA-RPC] has served as the program number registry. Ideally, IANA will create explicit registries for these objects. However, in the absence of new registries, this document would serve as the repository for the RPC program number assignment, and the protocol netid.

13. Acknowledgements

The authors wish to thank Rob Thurlow, John Howard, Chet Juszczak, Alex Chiu, Peter Staubach, Dave Noveck, Brian Pawlowski, Steve Kleiman, Mike Eisler, Mark Wittle, Shantanu Mehendale, David Robinson and Mallikarjun Chadalapaka for their contributions to this document.

14. Normative References

   [RFC2119]
      S. Bradner, "Key words for use in RFCs to Indicate Requirement
      Levels", Best Current Practice, BCP 14, RFC 2119, March 1997.

   [RFC1094]
      Sun Microsystems, "NFS: Network File System Protocol
      Specification", (NFS version 2) Informational RFC,
      http://www.ietf.org/rfc/rfc1094.txt

   [RFC1831bis]
      R. Thurlow, Ed., "RPC: Remote Procedure Call Protocol
      Specification Version 2", Standards Track RFC

   [RFC4506]
      M. Eisler, Ed., "XDR: External Data Representation Standard",
      Standards Track RFC, http://www.ietf.org/rfc/rfc4506.txt

   [RFC1813]
      B. Callaghan, B. Pawlowski, P. Staubach, "NFS Version 3
      Protocol Specification", Informational RFC,
      http://www.ietf.org/rfc/rfc1813.txt

   [RFC1833]
      R. Srinivasan, "Binding Protocols for ONC RPC Version 2",
      Standards Track RFC, http://www.ietf.org/rfc/rfc1833.txt
Noveck, "NFS version 4 Protocol", Standards 1514 Track RFC, http://www.ietf.org/rfc/rfc3530.txt 1516 [RFC2203] 1517 M. Eisler, A. Chiu, L. Ling, "RPCSEC_GSS Protocol 1518 Specification", Standards Track RFC, 1519 http://www.ietf.org/rfc/rfc2203.txt 1521 [RPCSECGSSV2] 1522 M. Eisler, "RPCSEC_GSS Version 2", Internet Draft Work in 1523 Progress draft-ietf-nfsv4-rpcsec-gss-v2 1525 [RFC5056] 1526 N. Williams, "On the Use of Channel Bindings to Secure 1527 Channels", Standards Track RFC 1529 [BTNSLATCH] 1530 N. Williams, "IPsec Channels: Connection Latching", Internet 1531 Draft Work in Progress draft-ietf-btns-connection-latching 1533 [RFC5042] 1534 J. Pinkerton, E. Deleganes, "Direct Data Placement Protocol 1535 (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security" 1536 Standards Track RFC 1538 15. Informative References 1540 [NFSDDP] 1541 B. Callaghan, T. Talpey, "NFS Direct Data Placement" Internet 1542 Draft Work in Progress, draft-ietf-nfsv4-nfsdirect 1544 [RFC5040] 1545 R. Recio et al., "A Remote Direct Memory Access Protocol 1546 Specification", Standards Track RFC 1548 [RFC5041] 1549 H. Shah et al., "Direct Data Placement over Reliable 1550 Transports", Standards Track RFC 1552 [NFSRDMAPS] 1553 T. Talpey, C. Juszczak, "NFS RDMA Problem Statement", Internet 1554 Draft Work in Progress, draft-ietf-nfsv4-nfs-rdma-problem- 1555 statement 1557 [NFSv4.1] 1558 S. Shepler et al., ed., "NFSv4 Minor Version 1" Internet Draft 1559 Work in Progress, draft-ietf-nfsv4-minorversion1 1561 [IB] 1562 Infiniband Architecture Specification, available from 1563 http://www.infinibandta.org 1565 [IBPORT] 1566 Infiniband Trade Association, "IP Addressing Annex", available 1567 from http://www.infinibandta.org 1569 [IANA-RPC] 1570 IANA Sun RPC number statement, 1571 http://www.iana.org/assignments/sun-rpc-numbers 1573 16. Authors' Addresses 1575 Tom Talpey 1576 Network Appliance, Inc. 1577 1601 Trapelo Road, #16 1578 Waltham, MA 02451 USA 1580 Phone: +1 781 768 5329 1581 EMail: thomas.talpey@netapp.com 1582 Brent Callaghan 1583 Apple Computer, Inc. 1584 MS: 302-4K 1585 2 Infinite Loop 1586 Cupertino, CA 95014 USA 1588 EMail: brentc@apple.com 1590 17. Intellectual Property and Copyright Statements 1592 Full Copyright Statement 1594 Copyright (C) The IETF Trust (2008). 1596 This document is subject to the rights, licenses and restrictions 1597 contained in BCP 78, and except as set forth therein, the authors 1598 retain all their rights. 1600 This document and the information contained herein are provided on 1601 an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 1602 REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE 1603 IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL 1604 WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY 1605 WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE 1606 ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 1607 FOR A PARTICULAR PURPOSE. 1609 Intellectual Property 1610 The IETF takes no position regarding the validity or scope of any 1611 Intellectual Property Rights or other rights that might be claimed 1612 to pertain to the implementation or use of the technology described 1613 in this document or the extent to which any license under such 1614 rights might or might not be available; nor does it represent that 1615 it has made any independent effort to identify any such rights. 
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).