idnits 2.17.1 

draft-ietf-rddp-applicability-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 1004.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 981.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 988.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 994.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (April 24, 2006) is 6576 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '1' is defined on line 909, but no explicit reference
     was found in the text

  == Unused Reference: '2' is defined on line 912, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 915, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 918, but no explicit reference
     was found in the text

  == Unused Reference: '5' is defined on line 923, but no explicit reference
     was found in the text

  == Unused Reference: '9' is defined on line 937, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 2246 (ref. '2') (Obsoleted by RFC 4346)

  ** Obsolete normative reference: RFC 2406 (ref. '3') (Obsoleted by RFC
     4303, RFC 4305)

  ** Obsolete normative reference: RFC 2960 (ref. '4') (Obsoleted by RFC 4960)

  ** Downref: Normative reference to an Informational RFC: RFC 3257 (ref. '5')

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-rdmap-05

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-ddp-05

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-sctp-02

  == Outdated reference: A later version (-10) exists of
     draft-ietf-rddp-security-08

  == Outdated reference: A later version (-08) exists of
     draft-ietf-rddp-mpa-02

  == Outdated reference: A later version (-08) exists of
     draft-ietf-nfsv4-nfsdirect-02


     Summary: 7 errors (**), 0 flaws (~~), 15 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Remote Direct Data Placement                                  C. Bestler
3	Working group                                       Broadcom Corporation
4	Internet-Draft                                                  L. Coene
5	Expires: October 26, 2006                                        Siemens
6	                                                          April 24, 2006

8	Applicability of Remote Direct Memory Access Protocol (RDMA) and Direct
9	                          Data Placement (DDP)
10	                  draft-ietf-rddp-applicability-06.txt

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt.

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html.

35	   This Internet-Draft will expire on October 26, 2006.

37	Copyright Notice

39	   Copyright (C) The Internet Society (2006).

41	Abstract

43	   This document describes the applicability of Remote Direct Memory
44	   Access Protocol (RDMAP) and the Direct Data Placement Protocol (DDP).
45	   It compares and contrasts the different transport options over IP
46	   that DDP can use, provides guidance to Upper Layer Protocol (ULP)
47	   developers on choosing between available transports and/or on being
48	   indifferent to the specific transport layer used, compares use of DDP
49	   with direct use of the supporting transports, and compares DDP over
50	   IP transports with non-IP transports that support RDMA functionality.

52	Table of Contents

54	   1.  Introduction
55	   2.  Definitions
56	   3.  Direct Placement
57	     3.1.  Fewer Required ULP Interactions
58	     3.2.  Direct Placement using only the LLP
59	   4.  Tagged Messages
60	     4.1.  Order Independent Reception
61	     4.2.  Reduced ULP Notifications
62	     4.3.  Simplified ULP Exchanges
63	     4.4.  Order Independent Sending
64	     4.5.  Untagged Messages and Tagged Buffers as ULP Credits
65	   5.  RDMA Read
66	   6.  LLP Comparisons
67	     6.1.  Multistreaming Implications
68	     6.2.  Out of Order Reception Implications
69	     6.3.  Header and Marker Overhead
70	     6.4.  Middlebox Support
71	     6.5.  Processing Overhead
72	     6.6.  Data Integrity Implications
73	       6.6.1.  MPA/TCP Specifics
74	       6.6.2.  SCTP Specifics
75	     6.7.  Non-IP Transports
76	       6.7.1.  No RDMA Layer Ack
77	     6.8.  Other IP Transports
78	     6.9.  LLP Independent Session Establishment
79	       6.9.1.  RDMA-only Session Establishment
80	       6.9.2.  RDMA-Conditional Session Establishment
81	   7.  Local Interface Implications
82	   8.  IANA Considerations
83	   9.  Security considerations
84	     9.1.  Connection/Association Setup
85	     9.2.  Tagged Buffer Exposure
86	     9.3.  Impact of Encrypted Transports
87	   10. References
88	     10.1. Normative references
89	     10.2. Informative References
90	   Authors' Addresses
91	   Intellectual Property and Copyright Statements

93	1.  Introduction

95	   Remote Direct Memory Access Protocol [6] and Direct Data Placement
96	   [7] work together to provide efficient, application-independent
97	   placement of application payload directly into buffers specified by
98	   the Upper Layer Protocol (ULP).

100	   The DDP protocol is responsible for direct placement of received
101	   payload into ULP specified buffers.  The RDMAP protocol provides
102	   completion notifications to the ULP and support for fetches of
103	   advertised buffers initiated by the Data Sink (RDMA Reads).

105	   DDP and RDMAP are both application independent protocols which allow
106	   the ULP to perform remote direct data placement.  DDP can use
107	   multiple standard IP transports including SCTP and TCP.

109	   By clarifying the situations where the functionality of these
110	   protocols is applicable, this document can guide implementers and
111	   application and protocol designers in selecting which protocols to
112	   use.

114	   The applicability of RDMAP/DDP is driven by their unique
115	   capabilities:

117	   o  The existence of an application independent protocol allows common
118	      solutions to be implemented in hardware and/or the kernel.  This
119	      document will discuss when common data placement procedures are of
120	      the greatest benefit to applications as contrasted with
121	      application specific solutions built on top of direct use of the
122	      underlying transport.

124	   o  DDP supports both untagged and tagged buffers.  Tagged buffers
125	      allow the Data Sink ULP to be indifferent to the order in which
126	      (or the messages in which) the Data Source sent the data, and to
127	      the order in which packets are received.  Typically, tagged
128	      messages are used for payload transfer, while untagged messages
129	      are best used for control.  However, each ULP can determine the
130	      optimal use of tagged and untagged messages for itself.  This
131	      document will discuss when Data Source flexibility is of benefit
132	      to applications.

134	   o  RDMAP consolidates ULP notifications, thereby minimizing the
135	      number of required ULP interactions.

137	   o  RDMAP defines RDMA Reads, which allow remote access to advertised
138	      buffers.  This document will review the advantages of using RDMA
139	      Reads compared with alternative solutions.

141	   Some non-IP transports, such as InfiniBand, directly integrate RDMA
142	   features.  This document will review the applicability of providing
143	   RDMA services over ubiquitous IP transports as opposed to the use of
144	   customized transport protocols.  Because DDP is cleanly defined as a
145	   layer over existing IP transports, it has simpler ordering rules
146	   than some prior RDMA protocols.  This may have implications for
147	   application designers.

149	   The capabilities of DDP and RDMAP can be fully realized only by
150	   applications that are designed to exploit them.  The coexistence of
151	   RDMAP/DDP-aware local interfaces with traditional socket interfaces
152	   will also be explored.

154	   Finally, DDP support is defined for at least two IP transports: SCTP
155	   [8] and MPA over TCP [10].  The rationale for supporting both
156	   transports is reviewed, as well as when each would be the appropriate
157	   selection.

159	2.  Definitions

161	   Advertisement - The act of informing a Remote Peer that a local RDMA
162	      Buffer is available to it.  A Node makes an RDMA Buffer available
163	      for incoming RDMA Read or RDMA Write access by informing its RDMA/
164	      DDP peer of the Tagged Buffer identifiers (STag, base address, and
165	      buffer length).  This advertisement of Tagged Buffer information
166	      is not defined by RDMA/DDP and is left to the ULP.  A typical
167	      method would be for the Local Peer to embed the Tagged Buffer's
168	      Steering Tag, base address, and length in a Send Message destined
169	      for the Remote Peer.

171	   Data Sink - The peer receiving a data payload.  Note that the Data
172	      Sink can be required to both send and receive RDMA/DDP Messages to
173	      transfer a data payload.

175	   Data Source - The peer sending a data payload.  Note that the Data
176	      Source can be required to both send and receive RDMA/DDP Messages
177	      to transfer a data payload.

179	   Lower Layer Protocol (LLP) - The transport protocol that provides
180	      services to DDP.  This is an IP transport with any required
181	      adaptation layer.  Adaptation layers are defined for SCTP and TCP.

183	   Steering Tag (STag) - An identifier of a Tagged Buffer on a Node,
184	      valid as defined within a protocol specification.

186	   Tagged Message - A DDP message that is directed to a ULP-specified
187	      buffer based upon embedded addressing information.  In the
188	      immediate sense, the destination buffer is specified by the
189	      message sender.  The message receiver is given no independent
190	      indication that a tagged message has been received.

192	   Untagged Message - A DDP message that is directed to a ULP-specified
193	      buffer based upon a Message Sequence Number being matched with a
194	      receiver supplied buffer.  The destination buffer is specified by
195	      the message receiver.  The message receiver is notified by some
196	      mechanism that an untagged message has been received.

198	   Upper Layer Protocol (ULP) - The direct user of RDMAP/DDP services.
199	      In addition to protocols such as iSER [11] and NFSv4 over RDMA
200	      [12], the ULP may be embedded in an application or in a
201	      middleware layer, as is often the case for the Sockets Direct
202	      Protocol (SDP) and Remote Procedure Call (RPC) protocols.

204	3.  Direct Placement

206	   Direct Data Placement optimizes the placement of ULP payload into the
207	   correct destination buffers, typically eliminating intermediate
208	   copying.  Placement occurs without regard to order of arrival or
209	   order of transmission, and without requiring per-placement
210	   interaction with the ULP.

212	   RDMAP minimizes the required ULP interactions.  This capability is
213	   most valuable for applications that require multiple transport layer
214	   packets for each required ULP interaction.

216	3.1.  Fewer Required ULP Interactions

218	   While reducing the number of required ULP interactions is in itself
219	   desirable, it is critical for high-speed connections.  The burst
220	   packet rate for a high-speed interface could easily exceed the host
221	   system's ability to switch ULP contexts.
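   To put rough numbers on this, the sketch below compares how often a
   ULP would have to run if it were involved in every packet with how
   often it runs if it is involved once per exchange.  The 10 Gb/s link,
   1500-byte packets, and 64 KiB per ULP interaction are hypothetical
   figures chosen only for illustration, and framing overhead is ignored.

```python
# Illustrative arithmetic only: the link speed, packet size, and bytes per
# ULP interaction are hypothetical values, and header/framing overhead is
# ignored.
LINK_BITS_PER_SEC = 10_000_000_000      # assumed 10 Gb/s interface
PACKET_BYTES = 1500                     # assumed packet size
BYTES_PER_ULP_INTERACTION = 64 * 1024   # assumed payload per ULP exchange


def packets_per_second(link_bps: int, packet_bytes: int) -> float:
    """Packets per second that a saturated link delivers to the host."""
    return link_bps / (packet_bytes * 8)


def ulp_interactions_per_second(link_bps: int, bytes_per_interaction: int) -> float:
    """ULP context switches per second if the ULP runs once per exchange."""
    return link_bps / (bytes_per_interaction * 8)


if __name__ == "__main__":
    per_packet = packets_per_second(LINK_BITS_PER_SEC, PACKET_BYTES)
    per_exchange = ulp_interactions_per_second(LINK_BITS_PER_SEC,
                                               BYTES_PER_ULP_INTERACTION)
    print(f"ULP wakeups if involved per packet:   {per_packet:,.0f}/s")
    print(f"ULP wakeups if involved per exchange: {per_exchange:,.0f}/s")
```

   Under these assumptions the first rate is about 833,000 per second
   and the second about 19,000 per second.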

223	   Content access applications are primary examples of applications with
224	   both high bandwidth and a high ratio of content to required ULP
225	   interactions.  These applications include file access protocols (NAS),
226	   storage access (SAN), database access and other application specific
227	   forms of content access such as HTTP, XML and email.

229	3.2.  Direct Placement using only the LLP

231	   Direct data placement can be achieved without RDMA.  Pre-posting of
232	   receive buffers could allow a non-RDMA network stack to place data
233	   directly into user buffers.

235	   The degree of optimization DDP provides depends on which transport it
236	   is compared with, and on the nature of the local interface.  Without
237	   RDMAP/DDP, pre-posting buffers requires the receiving side to
238	   accurately predict the required buffers and their sizes.  This is not
239	   feasible for all ULPs.  By contrast, DDP only requires the ULP to
240	   predict the sequence and size of incoming untagged messages.

242	   An application that could predict incoming messages and required
243	   nothing more than direct placement into buffers might be able to do
244	   so with a properly designed local interface to SCTP or TCP.  Doing so
245	   for TCP requires making predictions at a byte level rather than a
246	   message level.

248	   The main benefit of DDP for such an application would be that pre-
249	   posting of receive buffers is a mandated local interface capability,
250	   and that predictions can be made on a per-message basis (not per
251	   byte).
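   A toy model can show the difference between byte-level and
   message-level prediction.  It assumes two incoming messages of 80 and
   70 bytes and a receiver that pre-posted two 100-byte buffers.  The
   function names are hypothetical, and no real socket or RDMA API is
   modeled.

```python
from typing import List


def place_byte_stream(stream: bytes, posted_sizes: List[int]) -> List[bytes]:
    """Byte-stream model (e.g. TCP): bytes fill pre-posted buffers in order,
    ignoring message boundaries, so sizes must be predicted to the byte."""
    buffers: List[bytes] = []
    pos = 0
    for size in posted_sizes:
        buffers.append(stream[pos:pos + size])
        pos += size
    return buffers


def place_messages(messages: List[bytes], posted_sizes: List[int]) -> List[bytes]:
    """Message model (e.g. SCTP, or DDP untagged messages): each message
    lands in the next posted buffer, which only needs to be large enough."""
    if len(messages) > len(posted_sizes):
        raise RuntimeError("message arrived with no posted buffer")
    buffers: List[bytes] = []
    for msg, size in zip(messages, posted_sizes):
        if len(msg) > size:
            raise ValueError("message larger than its posted buffer")
        buffers.append(msg)
    return buffers


if __name__ == "__main__":
    msgs = [b"A" * 80, b"B" * 70]   # two ULP messages of 80 and 70 bytes
    posted = [100, 100]             # receiver guessed "at most 100 bytes each"
    # [100, 50]: the first buffer spans the boundary between the messages
    print([len(b) for b in place_byte_stream(b"".join(msgs), posted)])
    # [80, 70]: one message per buffer, only the maximum size was predicted
    print([len(b) for b in place_messages(msgs, posted)])
```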

253	   The LLP can also be used directly if ULP-specific knowledge is built
254	   into the protocol stack to allow "parse and place" handling of
255	   received packets.  Such a solution requires either interaction with
256	   the ULP or knowledge of ULP-specific syntax rules within the
257	   protocol stack.

259	   DDP achieves the benefits of directly placing incoming payload
260	   without requiring tight coupling between the ULP and the protocol
261	   stack.  However, "parse and place" capabilities can certainly provide
262	   equivalent services to a limited number of ULPs.

264	4.  Tagged Messages

266	   This section covers the major benefits from the use of Tagged
267	   Messages.

269	   A more critical advantage of DDP is the ability of the Data Source to
270	   use tagged buffers.  Tagging messages allows the Data Source to
271	   choose the ordering and packetization of its payload deliveries.
272	   With direct data placement based solely upon pre-posted receives, the
273	   packetization and delivery of payload must be agreed upon by the ULP
274	   peers in advance.

276	   The Upper Layer Protocol can allocate content between untagged and/or
277	   tagged messages to maximize the potential optimizations.  Placing
278	   content within an untagged message can deliver the content in the
279	   same packet that signals completion to the receiver.  This can
280	   improve latency and can even eliminate round trips, but it requires
281	   that larger anonymous buffers be made available.

283	   Examples of data that typically belong in the untagged message
284	   include short, fixed-size control data that is inherently part of
285	   the control message; relatively short payload that is almost
286	   always needed, especially when including it eliminates a round trip
287	   to fetch the data (for example, the initial data on a write
288	   request); and, of course, the advertisements of tagged buffers that
289	   specify the location of data not carried in the untagged
290	   message.

292	   Tagged messages standardize direct placement of data without per-
293	   packet interaction with the upper layers.  Even if there is an upper
294	   layer protocol encoding of what is being transferred, as is common
295	   with middleware solutions, this information is not understood at the
296	   application independent layers.  The directions on where to place the
297	   incoming data cannot be accessed without switching to the ULP first.
298	   DDP provides a standardized 'packing list' which can be interpreted
299	   without requiring ULP interaction.  Indeed, it is designed to be
300	   implementable in hardware.

302	4.1.  Order Independent Reception

304	   Tagged messages are directed to a buffer based on an included
305	   Steering Tag. Additionally, no notice is provided to the ULP for each
306	   individual Tagged Message's arrival.  Together these allow tagged
307	   messages received out-of-order to be processed without intermediate
308	   buffering or additional notifications to the ULP.
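   This mechanism can be modeled in a few lines.  The class and method
   names are hypothetical.  For simplicity, offsets here are relative to
   the start of each buffer rather than to an advertised base address,
   and validation is reduced to an STag lookup and a bounds check.

```python
import random
from typing import Dict, List, Tuple


class TaggedBufferTable:
    """Toy model of a Data Sink's advertised tagged buffers, keyed by STag.

    Each tagged segment carries its own (STag, offset), so it can be placed
    the moment it arrives: no reassembly queue and no per-segment ULP call.
    """

    def __init__(self) -> None:
        self._buffers: Dict[int, bytearray] = {}

    def register(self, stag: int, length: int) -> None:
        self._buffers[stag] = bytearray(length)

    def place(self, stag: int, offset: int, payload: bytes) -> None:
        buf = self._buffers.get(stag)
        if buf is None:
            raise KeyError(f"invalid STag {stag:#x}")
        if offset < 0 or offset + len(payload) > len(buf):
            raise ValueError("segment falls outside the advertised buffer")
        buf[offset:offset + len(payload)] = payload

    def contents(self, stag: int) -> bytes:
        return bytes(self._buffers[stag])


def deliver_out_of_order(table: TaggedBufferTable, stag: int, data: bytes,
                         segment_size: int, seed: int = 0) -> None:
    """Split data into tagged segments and deliver them in shuffled order."""
    segments: List[Tuple[int, bytes]] = [
        (off, data[off:off + segment_size])
        for off in range(0, len(data), segment_size)
    ]
    random.Random(seed).shuffle(segments)  # simulate out-of-order arrival
    for off, payload in segments:
        table.place(stag, off, payload)


if __name__ == "__main__":
    table = TaggedBufferTable()
    table.register(0x1234, 26)
    message = bytes(range(ord("A"), ord("Z") + 1))
    deliver_out_of_order(table, 0x1234, message, segment_size=4, seed=7)
    print(table.contents(0x1234) == message)  # True, whatever the arrival order
```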

310	4.2.  Reduced ULP Notifications

312	   RDMAP offers both tagged and untagged messages.  No receiving side
313	   ULP interactions are required for tagged messages.  By optimally
314	   dividing traffic between tagged and untagged messages the ULP can
315	   limit the number of events that must be dealt with at the ULP layer.
316	   This typically reduces the number of context switches required and
317	   improves performance.

319	   RDMAP further reduces required ULP interactions by consolidating
320	   completion notifications of tagged messages with the completion
321	   notification of a trailing untagged message.  For most ULPs, this
322	   reduces the number of required ULP interactions even further, often
323	   radically.
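   A minimal model of this consolidation is shown below.  It assumes a
   simplified rule in which only untagged messages produce ULP-visible
   completions; RDMA Read and error completions are ignored, and the
   names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Message:
    tagged: bool
    length: int


def ulp_completions(arrivals: List[Message]) -> List[Tuple[int, int]]:
    """Return one ULP completion per untagged message.

    Tagged messages raise no completion of their own.  Each completion
    reports how many tagged messages (and bytes) it implicitly covers.
    """
    events: List[Tuple[int, int]] = []
    pending_count = pending_bytes = 0
    for msg in arrivals:
        if msg.tagged:
            pending_count += 1
            pending_bytes += msg.length
        else:
            events.append((pending_count, pending_bytes))
            pending_count = pending_bytes = 0
    return events


if __name__ == "__main__":
    exchange_1 = [Message(True, 4096)] * 8 + [Message(False, 128)]
    exchange_2 = [Message(True, 4096)] * 3 + [Message(False, 128)]
    events = ulp_completions(exchange_1 + exchange_2)
    total = len(exchange_1) + len(exchange_2)
    print(f"{total} messages, {len(events)} ULP completions: {events}")
```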

325	   While RDMAP consolidation of notices is beneficial to most
326	   applications, it may be detrimental to some applications that benefit
327	   from streamed delivery to enable ULP processing of received data as
328	   promptly as possible.  A ULP that uses RDMAP cannot begin processing
329	   any portion of an exchange until it receives notification that the
330	   entire exchange has been placed.  An "exchange" here is a set of zero
331	   or more tagged messages and a single terminating untagged message.
332	   An application that would prefer to begin work on received payload
333	   as soon as possible, in whatever order it arrives, might prefer to
334	   work directly with the LLP.  RDMAP is optimized for applications
335	   that are more concerned with when the entire exchange is
336	   complete.

338	   An application that benefits from being able to begin processing of
339	   each received packet as quickly as possible may find RDMAP interferes
340	   with that goal.

342	   Such an application might be able to retain most of the benefits of
343	   RDMAP by using the DDP layer directly.  However, in addition to
344	   taking on the responsibilities of the RDMAP layer, the application
345	   would likely have more difficulty finding support for a DDP-only API.
346	   Many hardware implementations may choose to tightly couple RDMAP and
347	   DDP, and might not provide an API directly to DDP services.

349	   These features minimize the required interactions with the ULP.  This
350	   can be extremely beneficial for applications that use multiple
351	   transport layer packets to accomplish what is a single ULP
352	   interaction.

354	4.3.  Simplified ULP Exchanges

356	   The notification rules for Tagged Messages allow ULPs to create
357	   multi-message "exchanges" of zero or more tagged messages and one
358	   terminating untagged message that represent a single step in the ULP
359	   interaction.  The receiving ULP is notified that the untagged message
360	   has arrived, and implicitly of any associated tagged messages.

362	   A ULP whose exchanges would naturally consist only of untagged
363	   messages would derive virtually no benefit from the use of RDMAP/DDP
364	   as opposed to SCTP.  But while tagged buffers are the justification
365	   for RDMAP/DDP, untagged buffers are still necessary.  Without
366	   untagged buffers the only method to exchange buffer advertisements
367	   would involve out-of-band communications and/or sharing of compile
368	   time constants.  Most RDMA-aware ULPs use untagged buffers for
369	   requests and responses.  Buffer advertisements are typically done
370	   within these untagged messages.

372	   More importantly, without untagged messages there would be no
373	   reliable method for the upper layer peers to synchronize.  The
374	   absence of any ordering guarantees within or between tagged messages
375	   is fundamental to allowing DDP to optimize transfer of tagged payload.

377	   Consequently, no ULP can be defined entirely in terms of tagged
378	   messages.  Eventually a notification that confirms delivery must be
379	   generated from the RDMAP/DDP layer.

381	   Limiting use of untagged buffers to requests and responses, and
382	   moving all bulk data with tagged transfers, can greatly reduce the
383	   amount of prediction that the Data Sink must perform in pre-posting
384	   receive buffers.  For example, a typical RDMA-enabled interaction
385	   would consist of the following:

387	      The client sends a transaction request to the server as an
388	      untagged message.

390	      This message includes buffer advertisements for the buffers where
391	      the results are to be placed.

393	      The Server sends multiple tagged messages to the advertised
394	      buffers.

396	      The Server sends transaction reply as an untagged message to the
397	      client.

399	      The client receives a single notification, indicating completion
400	      of the interaction.

402	   With this type of exchange the pacing and required size of untagged
403	   buffers is highly predictable.  The variability of response sizes is
404	   absorbed by tagged transfers.
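   The interaction above can be traced with an in-process toy model in
   which method calls stand in for messages on the wire.  The request
   layout, the `Client` class, and the `serve` function are hypothetical.
   The advertisement field widths follow DDP's 32-bit STag and 64-bit
   Tagged Offset, but the encoding itself is left to each ULP.

```python
import struct
from typing import List, Tuple

# Hypothetical ULP message layouts (network byte order).
REQUEST = struct.Struct("!IIQI")  # opcode, STag, base address, buffer length
REPLY = struct.Struct("!II")      # opcode, result length


class Client:
    def __init__(self, stag: int, base: int, length: int) -> None:
        self.stag, self.base = stag, base
        self.buffer = bytearray(length)
        self.notifications: List[Tuple[int, int]] = []

    def request(self, opcode: int) -> bytes:
        """Untagged request that advertises the result buffer."""
        return REQUEST.pack(opcode, self.stag, self.base, len(self.buffer))

    def on_tagged(self, stag: int, tagged_offset: int, payload: bytes) -> None:
        """Tagged data is placed silently; the ULP is not notified."""
        if stag != self.stag:
            raise ValueError("unknown STag")
        start = tagged_offset - self.base
        if start < 0 or start + len(payload) > len(self.buffer):
            raise ValueError("tagged write outside the advertised buffer")
        self.buffer[start:start + len(payload)] = payload

    def on_untagged(self, message: bytes) -> None:
        """Only the untagged reply produces a ULP notification."""
        self.notifications.append(REPLY.unpack(message))


def serve(request: bytes, result: bytes, client: Client, segment: int = 16) -> None:
    opcode, stag, base, length = REQUEST.unpack(request)
    if len(result) > length:
        raise ValueError("result does not fit the advertised buffer")
    for off in range(0, len(result), segment):          # multiple tagged messages
        client.on_tagged(stag, base + off, result[off:off + segment])
    client.on_untagged(REPLY.pack(opcode, len(result)))  # transaction reply


if __name__ == "__main__":
    client = Client(stag=0xABCD, base=0x10000, length=128)
    serve(client.request(opcode=7), b"record-" * 10, client)
    print(client.notifications)       # a single notification for the interaction
    print(bytes(client.buffer[:14]))
```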

406	4.4.  Order Independent Sending

408	   Use of tagged messages is especially applicable when the Data Sink
409	   does not know the actual size, structure or location of the content
410	   it is requesting (or updating).

412	   For example, suppose the Data Sink ULP needs to fetch four related
413	   pieces of data into four separate buffers.  With SCTP, the Data Sink
414	   ULP could receive four messages into four separate buffers, only
415	   having to predict the maximum size of each.  However, it would have
416	   to dictate the order in which the Data Source supplied the separate
417	   pieces.  If the Data Source found it advantageous to fetch them in a
418	   different order, it would have to use intermediate buffering to
419	   reorder the pieces into the expected order, even though the
420	   application only required that all four be delivered and did not
421	   truly have an ordering requirement.

423	   Techniques such as RAID striping and mirroring represent this same
424	   problem, but one step further.  What appears to be a single resource
425	   to the Data Sink is actually stored in separate locations by the Data
426	   Source.  Non-RDMA protocols would either require the Data Source to
427	   fetch the material in the desired order or force the Data Source to
428	   use its own holding buffers to assemble an image of the destination
429	   buffer.

431	   While sometimes referred to as a "buffer-to-buffer" solution, RDMA
432	   more fundamentally enables remote buffer access.  The ULP is free to
433	   work with larger remote buffers than it has locally.  This reduces
434	   buffering requirements and the number of times the data must be
435	   copied in an end-to-end transfer.

437	   There are numerous reasons why the Data Sink would not know the true
438	   order or location of the requested data.  These include per-client
439	   differences (different records selected and/or different sort orders),
440	   RAID striping, file fragmentation, volume fragmentation, volume
441	   mirroring, and server-side dynamic compositing of content (such as
442	   server-side includes for HTTP).

444	   In all of these cases the Data Source is free to assemble the desired
445	   data in the Data Sink's buffer in whatever order the component data
446	   becomes available to it.  It is not constrained by ordering.  It does
447	   not have to assemble an image in its own memory before creating it in
448	   the Data Sink's buffers.
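   The striping case can be sketched as follows.  The sketch assumes a
   simple round-robin layout over three disks and a 4-byte stripe unit,
   both of which are hypothetical.  A slice assignment into a bytearray
   stands in for a tagged RDMA Write into the Data Sink's advertised
   buffer.

```python
import random
from typing import List, Tuple

STRIPE_UNIT = 4  # hypothetical stripe unit in bytes, tiny for readability

Layout = List[List[Tuple[int, bytes]]]


def stripe(data: bytes, disks: int) -> Layout:
    """Spread data round-robin across disks, tagging each block with its
    logical index so that its final position in the resource is known."""
    layout: Layout = [[] for _ in range(disks)]
    for start in range(0, len(data), STRIPE_UNIT):
        index = start // STRIPE_UNIT
        layout[index % disks].append((index, data[start:start + STRIPE_UNIT]))
    return layout


def serve_read(layout: Layout, sink: bytearray, seed: int = 0) -> None:
    """Data Source: as each block becomes available (in arbitrary order),
    write it straight to its final offset in the Data Sink's buffer."""
    ready = [block for disk in layout for block in disk]
    random.Random(seed).shuffle(ready)  # disks finish in no particular order
    for index, chunk in ready:
        offset = index * STRIPE_UNIT
        sink[offset:offset + len(chunk)] = chunk  # stands in for an RDMA Write


if __name__ == "__main__":
    resource = b"The quick brown fox jumps over"
    sink_buffer = bytearray(len(resource))  # buffer advertised by the Data Sink
    serve_read(stripe(resource, disks=3), sink_buffer, seed=3)
    print(bytes(sink_buffer) == resource)  # True
```

   No holding buffer exists on the Data Source side.  Each block goes
   from "disk" to its final place in the sink buffer exactly once.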

450	   Note that while DDP enables use of tagged messages for bulk transfer,
451	   there are some application scenarios where untagged messages would
452	   still be used for bulk transfer.  For example, a file server may not
453	   expose its own memory to its clients.  A client wishing to write may
454	   advertise a buffer against which the server will issue RDMA Reads.
455	   However, when performing a small write it may be preferable to
456	   include the data in the untagged message rather than incurring an
457	   additional round trip with the RDMA Read and its response.
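   A client-side policy along these lines might look like the sketch
   below.  The threshold value, the request fields, and the function
   names are hypothetical.  An actual ULP would define its own encoding
   and limits.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ULP policy parameter: the largest write payload carried inline
# in the untagged request (it must fit the server's untagged receive buffers).
INLINE_THRESHOLD = 1024


@dataclass
class WriteRequest:
    length: int
    inline_data: Optional[bytes] = None  # payload carried in the untagged message
    stag: Optional[int] = None           # advertised buffer for the server to read


def build_write_request(data: bytes, stag: int) -> WriteRequest:
    if len(data) <= INLINE_THRESHOLD:
        # Small write: ship the data with the request, with no extra round trip.
        return WriteRequest(length=len(data), inline_data=data)
    # Large write: advertise the client buffer; the server pulls it via RDMA Read.
    return WriteRequest(length=len(data), stag=stag)


def extra_round_trips(req: WriteRequest) -> int:
    """Round trips the server needs after the request before it holds the
    data (simplified: assumes one RDMA Read covers the whole buffer)."""
    return 0 if req.inline_data is not None else 1


if __name__ == "__main__":
    for size in (200, 64 * 1024):
        req = build_write_request(bytes(size), stag=0x42)
        how = "inline" if req.inline_data is not None else f"RDMA Read of STag {req.stag:#x}"
        print(size, how, extra_round_trips(req))
```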

459	   Generally, the best use of an untagged message is to synchronize and
460	   to deliver data that is naturally tied to the same message as the
461	   synchronization.  For initial data transfers this has the additional
462	   benefit of avoiding the need to advertise specific tagged buffers for
463	   indefinite time periods.  Instead, anonymous buffers can be used for
464	   initial data reception.  Because anonymous buffers do not need to be
465	   tied to specific messages in advance, this can be a major benefit.

467	4.5.  Untagged Messages and Tagged Buffers as ULP Credits

469	   The handling of end-to-end buffer credits with DDP differs
470	   considerably from when the ULP directly uses either TCP or SCTP.

472	   With both TCP and SCTP, buffer credits are based upon the receiver
473	   granting transmit permission based on the total number of bytes.
474	   These credits reflect system buffering resources and/or simple flow
475	   control.  They do not represent ULP resources.

477	   DDP defines no standard flow control, but presumes the existence of a
478	   ULP mechanism.  The presumed mechanism is that the Data Sink ULP has
479	   issued credits to the Data Source allowing the Data Source to send a
480	   specific number of untagged messages.

482	   The ULP peers must ensure that the sender is aware of the maximum
483	   size that can be sent to any specific target buffer.  One method of
484	   doing so is to use a standard size for all untagged buffers within a
485	   given connection.  For example, a ULP may specify an initial untagged
486	   buffer size to be used immediately after session establishment, and
487	   then optionally specify mechanisms for negotiating changes.
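   The presumed credit mechanism, combined with a fixed untagged buffer
   size, can be sketched as a small sender-side counter.  How credits are
   carried between peers is left to the ULP, so the class, its methods,
   and the one-credit-per-buffer rule are all assumptions.

```python
class UntaggedCredits:
    """Data Source view of ULP-level credits for untagged messages.

    Hypothetical scheme: the Data Sink grants one credit per pre-posted
    untagged buffer, and all such buffers share a size agreed at session
    establishment.
    """

    def __init__(self, initial_credits: int, buffer_size: int) -> None:
        self.credits = initial_credits
        self.buffer_size = buffer_size

    def can_send(self, message_len: int) -> bool:
        return self.credits > 0 and message_len <= self.buffer_size

    def send(self, message_len: int) -> None:
        if message_len > self.buffer_size:
            raise ValueError("message exceeds the agreed untagged buffer size")
        if self.credits == 0:
            raise RuntimeError("no credit: the peer has no posted untagged buffer")
        self.credits -= 1  # consumes one of the peer's posted buffers

    def grant(self, n: int) -> None:
        """Apply a credit update carried in a ULP message from the Data Sink."""
        self.credits += n


if __name__ == "__main__":
    link = UntaggedCredits(initial_credits=2, buffer_size=1024)
    link.send(100)
    link.send(1024)
    print(link.can_send(10))   # False: both credits used
    link.grant(1)              # the peer reposted one buffer
    print(link.can_send(10))   # True
```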

489	   Tagged buffers are ULP resources advertised directly from ULP to ULP.
490	   A DDP put to a known tagged buffer is constrained only by transport
491	   level flow control, not by available system buffering.

493	   Both tagged and untagged buffers allow system buffer resources to be
494	   bypassed.  Use of tagged buffers additionally allows the Data Source
495	   to choose the order in which it exercises the credits.

497	   To the extent allowed by the ULP, tagged buffers are also divisible
498	   resources.  The Data Sink can advertise a single 100 KB buffer, and
499	   then receive notifications from its peer that it had written 50 KB,
500	   20 KB and 30 KB to that buffer in three successive transactions.
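   The 100 KB example works out as follows.  The sketch assumes that the
   ULP lets the Data Source fill the buffer sequentially from its start,
   although a ULP could equally allow other placements.  `carve` is a
   hypothetical helper, and KB is taken as 1024 bytes.

```python
from typing import List, Tuple

KB = 1024  # assumption: KB taken as 1024 bytes


def carve(buffer_len: int, writes: List[int]) -> List[Tuple[int, int]]:
    """Return (offset, length) for successive writes into one advertised
    tagged buffer, each reported to the Data Sink in a later notification."""
    regions: List[Tuple[int, int]] = []
    offset = 0
    for length in writes:
        if offset + length > buffer_len:
            raise ValueError("write would overrun the advertised buffer")
        regions.append((offset, length))
        offset += length
    return regions


if __name__ == "__main__":
    for off, length in carve(100 * KB, [50 * KB, 20 * KB, 30 * KB]):
        print(f"offset {off // KB:3d} KB, length {length // KB:2d} KB")
```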

502	   ULP management of tagged buffer resources, independent of transport
503	   and DDP layer credits, is an additional benefit of RDMA protocols.
504	   Large bulk transfers cannot be blocked by limited general purpose
505	   buffering capacity.  Applications can apply flow control based upon
506	   higher-level abstractions, such as the number of outstanding requests,
507	   independent of the amount of data that must be transferred.

509	   However, use of system buffering, as offered by direct use of the
510	   underlying transports, can be preferable under certain circumstances.

512	   One example would be when the number of target ULP buffers is
513	   sufficiently large, and the rate at which any writes arrive is
514	   sufficiently low, that pinning all the target ULP buffers in memory
515	   would be undesirable.  The maximum transfer rate, and hence the
516	   maximum amount of system buffering required, may be more stable and
517	   predictable than the total ULP buffer exposure.

519	   Another would be when the Data Sink wishes to receive a stream of
520	   data at a predictable rate, but does not know in advance what the
521	   size of each data packet will be.  This is common for streaming media
522	   that has been encoded with a variable bit rate.  With DDP, the Data
523	   Sink would either have to use untagged buffers large enough for the
524	   largest packet, or advertise a circular buffer.  If for security or
525	   other reasons the Data Sink did not want the size of its buffer to be
526	   publicly known, using the underlying SCTP transport directly may be
527	   preferable because of its byte-oriented credits.
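   The cost of sizing every untagged buffer for the largest packet can
   be shown with simple arithmetic.  The packet sizes below are invented
   for illustration.  The sketch assumes one untagged buffer is posted
   per packet and that the worst-case packet size is known in advance.

```python
# Hypothetical sizes (bytes) of variable-bit-rate media packets.
packets = [1200, 300, 8000, 500, 2400]

# Untagged DDP: every posted buffer must fit the largest possible packet.
# Here the observed maximum stands in for that known worst case.
largest = max(packets)
untagged_reserved = largest * len(packets)

# Byte-oriented transport credits: buffering tracks the bytes actually sent.
byte_credits_used = sum(packets)

print(untagged_reserved, byte_credits_used)   # 40000 12400
```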

529	5.  RDMA Read

531	   RDMA Reads are a further service provided by RDMAP.  RDMA Reads allow
532	   the Data Sink to fetch exactly the portion of the peer ULP buffer
533	   required on a "just in time" basis.  This can be done without
534	   requiring per-fetch support from the Data Source ULP.

536	   Storage servers may wish to limit the maximum write buffer allocated
537	   to any single session.  The storage server may be a very minimal
538	   layer between the client and the disk storage media, or the server
539	   may merely wish to limit the total resources that would be required
540	   if all clients could push the entire payload they wished written at
541	   their own convenience.

543	   In either case, there is little benefit in transferring data from the
544	   Data Source far in advance of when it will be written to the
545	   persistent storage media.  RDMA Reads allow the Storage Server to
546	   fetch the payload on a "just in time" basis.  In this fashion a
547	   relatively small number of block sized buffers can be used to execute
548	   a single transaction that specified writing a large file, or a
549	   Storage Server with numerous clients can fetch buffers from the
550	   individual clients in the order that is most convenient to the
551	   server.
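   The "just in time" pattern can be sketched as a loop that fetches one
   block at a time and persists it before fetching the next.  In this
   minimal model, rdma_read is a hypothetical stand-in for issuing an
   RDMA Read Request against the client's advertised buffer.  BLOCK is
   an assumed block size, and the list named disk stands in for
   persistent storage.

```python
BLOCK = 4096  # hypothetical block size of the storage media


def rdma_read(remote: bytes, offset: int, length: int) -> bytes:
    # Stand-in for an RDMA Read of [offset, offset + length) from the
    # client's advertised buffer.
    return remote[offset:offset + length]


def write_file_jit(remote: bytes, total_len: int, disk: list) -> int:
    """Fetch and persist a large payload one block at a time.

    The server stages only one block-sized buffer at any moment, however
    large the client's advertised buffer is. Returns the reads issued.
    """
    reads = 0
    for offset in range(0, total_len, BLOCK):
        block = rdma_read(remote, offset, min(BLOCK, total_len - offset))
        reads += 1
        disk.append(block)   # written before the next fetch is issued
    return reads


client_buffer = bytes(range(256)) * 40   # 10240-byte advertised buffer
disk = []
print(write_file_jit(client_buffer, len(client_buffer), disk))   # 3
```

   A server with many clients could interleave such loops in whatever
   order suits its disk schedule.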

553	   This same capability can be used when the desired portion of the
554	   advertised buffer is not known in advance.  For example, the
555	   advertised buffer could contain performance statistics.  The Data
556	   Sink could request the portions of the data it requires, without
557	   requiring an interaction with the Data Source ULP.

559	   This is applicable for many applications that publish semi-volatile
560	   data that does not require transactional validity checking (i.e.,
561	   authorized users have read access to the entire set of data).  It is
562	   less applicable when there are ULP consistency checks that must be
563	   performed upon the data.  Such applications would be better served by
564	   having the client send a request, and having the server use RDMA
565	   Writes to publish the requested data.  Neither RDMAP nor DDP provides
566	   mechanisms for bundling multiple disjoint updates into an atomic
567	   operation.  Therefore use of an advertised buffer as a data resource
568	   is subject to the same caveats as any randomly updated data resource,
569	   such as flat files, that do not enforce their own consistency.

571	6.  LLP Comparisons

573	   Normally the choice of underlying IP transport is irrelevant to the
574	   ULP.  RDMAP and DDP provide the same services over either.  There
575	   may be performance impacts of the choice, however.  It is the
576	   responsibility of the ULP to determine which IP transport is best
577	   suited to its needs.

579	   SCTP provides for preservation of message boundaries.  Each DDP
580	   segment will be delivered within a single SCTP packet.  The
581	   equivalent services are only available with TCP through the use of
582	   the MPA (Marker PDU Aligned Framing) adaptation layer.

584	6.1.  Multistreaming Implications

586	   SCTP also provides multi-streaming.  When the same pair of hosts
587	   needs multiple DDP streams, this can be a major advantage.  A
588	   single SCTP association carries multiple DDP streams, consolidating
589	   connection setup, congestion control and acknowledgements.

591	   Completions are controlled by the DDP Source Sequence Number (DDP-
592	   SSN) on a per stream basis.  Therefore combining multiple DDP Streams
593	   into a single SCTP association cannot result in a dropped packet
594	   carrying data for one stream delaying completions on others.
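   Per-stream completion ordering can be modeled as follows.  This is a
   minimal sketch.  It assumes each untagged message carries a per-stream
   sequence number, as described above.  The class and method names are
   hypothetical.  A gap on one stream holds back only that stream's
   later completions.

```python
from collections import defaultdict


class PerStreamCompletions:
    """Hypothetical receiver delivering completions in order per stream."""

    def __init__(self) -> None:
        self.next_seq = defaultdict(int)    # next sequence expected per stream
        self.pending = defaultdict(dict)    # stream -> {seq: message}
        self.completed = []                 # (stream, message) in delivery order

    def arrive(self, stream: int, seq: int, message: str) -> None:
        self.pending[stream][seq] = message
        # Deliver every consecutive message now available on this stream only.
        while self.next_seq[stream] in self.pending[stream]:
            msg = self.pending[stream].pop(self.next_seq[stream])
            self.completed.append((stream, msg))
            self.next_seq[stream] += 1


rx = PerStreamCompletions()
rx.arrive(2, 1, "b1")     # stream 2 message 0 was dropped and is awaited
rx.arrive(1, 0, "a0")
rx.arrive(1, 1, "a1")
print(rx.completed)       # [(1, 'a0'), (1, 'a1')]
rx.arrive(2, 0, "b0")     # retransmission fills the gap on stream 2
print(rx.completed[-2:])  # [(2, 'b0'), (2, 'b1')]
```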

596	6.2.  Out of Order Reception Implications

598	   The use of unordered Data Chunks with SCTP guarantees that the DDP
599	   layer will be able to perform placements when IP datagrams are
600	   received out of order.

602	   Placement of out-of-order DDP Segments carried over MPA/TCP is not
603	   guaranteed, but certainly allowed.  The ability of the MPA receiver
604	   to process out-of-order DDP Segments may be impaired when alignment
605	   of TCP segments and MPA FPDUs is lost.  Using SCTP, each DDP Segment
606	   is encoded in a single Data Chunk and never spread over multiple IP
607	   datagrams.
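   Out-of-order placement is possible because each tagged segment
   identifies its own destination.  This minimal sketch represents
   segments as hypothetical (offset, payload) pairs that have already
   been validated against the target STag.  The order of arrival does
   not matter.

```python
def place(buffer: bytearray, segments) -> None:
    """Place tagged segments directly, in whatever order they arrive."""
    for offset, payload in segments:
        # No earlier segment is needed to compute this destination.
        buffer[offset:offset + len(payload)] = payload


sink = bytearray(12)
arrivals = [(8, b"IJKL"), (0, b"ABCD"), (4, b"EFGH")]   # reordered by network
place(sink, arrivals)
print(bytes(sink))   # b'ABCDEFGHIJKL'
```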

609	6.3.  Header and Marker Overhead

611	   MPA and TCP headers together are smaller than the headers used by
612	   SCTP and its adaptation layer.  However, this advantage can be
613	   reduced by the insertion of MPA markers.  The difference in ULP
614	   payload per IP Datagram is not likely to be a significant factor.
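   To give a rough sense of marker overhead, the following sketch counts
   marker positions inside one TCP segment.  It assumes a 4-octet marker
   every 512 octets of TCP sequence space, counted from the start of MPA
   framing, as in the MPA specification [10].  These constants and the
   function names are assumptions of the sketch.

```python
MARKER_INTERVAL = 512  # assumed marker spacing (octets of TCP sequence space)
MARKER_LEN = 4         # assumed marker size in octets


def ceil_div(x: int, n: int) -> int:
    return -(-x // n)


def markers_in_segment(seq_offset: int, seg_len: int) -> int:
    """Count marker positions in [seq_offset, seq_offset + seg_len).

    seq_offset is measured from the assumed origin of MPA framing, so
    markers sit at every multiple of MARKER_INTERVAL.
    """
    # Multiples of n in the half-open range [a, b) = ceil(b/n) - ceil(a/n).
    return (ceil_div(seq_offset + seg_len, MARKER_INTERVAL)
            - ceil_div(seq_offset, MARKER_INTERVAL))


# A 1460-octet segment carries 2 or 3 markers depending on where it starts.
print([markers_in_segment(off, 1460) for off in (0, 1030)])   # [3, 2]
print(f"{MARKER_LEN / MARKER_INTERVAL:.2%} of the stream")    # 0.78% of the stream
```

   Under these assumptions a full-sized segment gives up 8 to 12 octets
   to markers.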

616	6.4.  Middlebox Support

618	   Even with the MPA adaptation layer, DDP traffic carried over MPA/TCP
619	   will appear to all network middleboxes as a normal TCP connection.
620	   In many environments there may be a requirement to use only TCP
621	   connections to satisfy existing network elements and/or to facilitate
622	   monitoring and control of connections.  While SCTP is certainly just
623	   as monitorable and controllable as TCP, there is no guarantee that
624	   the network management infrastructure has the required support for
625	   both.

627	6.5.  Processing Overhead

629	   A DDP stream delivered via MPA/TCP will require more processing
630	   effort than one delivered over SCTP.  However, this extra work may be
631	   justified for many deployments where full SCTP support is unavailable
632	   in the endpoints of the network, or where middleboxes impair the
633	   usability of SCTP.

635	6.6.  Data Integrity Implications

637	   Both the SCTP and MPA/TCP adaptations provide end-to-end CRC32c
638	   protection against accidental data corruption, or its equivalent.
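   For reference, the following bitwise sketch computes CRC32c (the
   Castagnoli CRC, reflected polynomial 0x82F63B78).  It shows only the
   checksum value.  It does not show how MPA or SCTP place the value in
   a packet.  A bit-at-a-time loop is also far slower than the
   table-driven or hardware implementations an adapter would use.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32c: init 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


payload = b"123456789"
print(hex(crc32c(payload)))                    # 0xe3069283 (standard check value)
print(crc32c(b"123456788") != crc32c(payload)) # True: one flipped bit is detected
```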

640	   A ULP that requires a greater degree of protection may add its own.
641	   However, DDP and RDMAP headers will only be guaranteed to have the
642	   equivalent of end-to-end CRC32c protection.  A ULP that requires data
643	   integrity checking more thorough than an end-to-end CRC32c should
644	   first invalidate all STags that reference a buffer before applying
645	   its own integrity check.
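   The required ordering, invalidating first and checking second, can be
   sketched as follows.  ExposedBuffer and its methods are hypothetical.
   SHA-256 is an arbitrary choice for the ULP's stronger check.  In
   practice the expected digest would arrive in a ULP message from the
   peer.  It is computed locally here only for illustration.

```python
import hashlib


class ExposedBuffer:
    """Hypothetical ULP-side record of a buffer exposed through an STag."""

    def __init__(self, data: bytearray) -> None:
        self.data = data
        self.stag_valid = True

    def invalidate_stag(self) -> None:
        # Stand-in for invalidating every STag that references the buffer.
        self.stag_valid = False

    def verify(self, expected_sha256: str) -> bool:
        # A check made while an STag is still valid proves little: the
        # peer could alter the buffer after the check has passed.
        if self.stag_valid:
            raise RuntimeError("invalidate all STags before checking integrity")
        return hashlib.sha256(self.data).hexdigest() == expected_sha256


data = bytearray(b"block contents")
expected = hashlib.sha256(data).hexdigest()
exposed = ExposedBuffer(data)
exposed.invalidate_stag()
print(exposed.verify(expected))   # True
```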

647	   CRC32c only provides protection against random corruption.  To
648	   protect against unauthorized alteration or forging of data packets,
649	   security methods must be applied.  IPsec is supported for both SCTP
650	   and MPA/TCP.

652	6.6.1.  MPA/TCP Specifics

654	   It is mandatory for MPA/TCP implementations to implement CRC32c, but
655	   it is NOT mandatory to use the CRC32c during an RDMA connection.
656	   Activating or deactivating the CRC in MPA/TCP is an administrative
657	   configuration operation at the local and remote ends.  The
658	   administration of the CRC setting (ON/OFF) is invisible to the ULP.

660	   Applications SHOULD trust that this administrative option will only
661	   be used when the end-to-end protection is at least as effective as a
662	   transport layer CRC32c.  Applications SHOULD NOT apply additional
663	   protection as a guard against this administrative option being turned
664	   on inadvertently.

666	   Administrators MUST NOT enable CRC32c suppression unless the end-to-
667	   end protection is truly equivalent.

669	   If the CRC is used in one direction, then the use of the CRC is
670	   mandatory in both directions.

672	   If both ends have been configured NOT to use the CRC, this is allowed
673	   as long as equivalent protection (comparable to or better than the
674	   CRC) against undetected errors on the connection is provided.
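   The CRC configuration rules for MPA/TCP reduce to a small decision
   function.  This is a minimal sketch, and effective_crc is a
   hypothetical helper.  How an administrator establishes that
   equivalent protection exists, for example through end-to-end IPsec,
   is outside the sketch and is passed in as a flag.

```python
def effective_crc(local_wants_crc: bool, remote_wants_crc: bool,
                  equivalent_protection: bool) -> bool:
    """Decide whether CRC32c is used on an MPA/TCP connection.

    If either end uses the CRC, both directions use it. Disabling it
    requires both ends to agree and equivalent protection to be present.
    """
    if local_wants_crc or remote_wants_crc:
        return True
    if not equivalent_protection:
        raise ValueError("CRC suppression requires equivalent protection")
    return False


print(effective_crc(True, False, False))   # True: one end forces it for both
print(effective_crc(False, False, True))   # False: suppression permitted
```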

676	6.6.2.  SCTP Specifics

678	   SCTP provides CRC32c protection automatically.  The adaptation to
679	   SCTP provides no option to suppress SCTP CRC32c protection.

681	6.7.  Non-IP Transports

683	   DDP is defined to operate over ubiquitous IP transports such as SCTP
684	   and TCP.  This enables a new DDP-enabled node to be added anywhere to
685	   an IP network.  No DDP-specific support from middleboxes is
686	   required.

688	   There are non-IP transport fabrics offering RDMA capabilities.
689	   Because these capabilities are integrated with the transport protocol,
690	   they have some technical advantages when compared to RDMA over IP.
691	   For example, fencing of RDMA operations can be based upon transport
692	   level acks.  Because DDP is cleanly layered over an IP transport, any
693	   explicit RDMA layer ack must be separate from the transport layer
694	   ack.

696	   There may be deployments where the benefits of RDMA/transport
697	   integration outweigh the benefits of being on an IP network.

699	6.7.1.  No RDMA Layer Ack

701	   DDP does not provide for its own acknowledgements.  The only form of
702	   ack provided at the RDMAP layer is an RDMA Read Response.  DDP and
703	   RDMAP rely almost entirely upon other layers for flow control and
704	   pacing.  The LLP is relied upon to guarantee delivery and avoid
705	   network congestion, and ULP level acking is relied upon for ULP
706	   pacing and to avoid ULP buffer overruns.

708	   Previous RDMA protocols, such as InfiniBand, have been able to use
709	   their integration with the transport layer to provide stronger
710	   ordering guarantees.  It is important that application designers who
711	   require such guarantees provide them through ULP interaction.

713	   Specifically:

715	      There is no ability for a local interface to "fence" outbound
716	      messages to guarantee that prior tagged messages have been placed
717	      prior to sending a tagged message.  The only guarantees available
718	      from the other side would be an RDMA Read Response (coming from
719	      the RDMAP layer) or a response from the ULP layer.  Remember that
720	      the normal ordering rules only guarantee when the Data Sink ULP
721	      will be notified of untagged messages; they do not control when
722	      data is placed into receive buffers.

724	      Re-use of tagged buffers must be done with extreme care.  The fact
725	      that an untagged message indicates that all prior tagged messages
726	      have been placed does not guarantee that no later tagged message
727	      has been.  The best strategy is to change the state of any given
728	      advertised buffer only with untagged messages.

730	      As covered elsewhere in this document, flow control of untagged
731	      messages MUST be provided by the ULP itself.
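   The buffer re-use strategy above can be modeled from the Data Sink's
   side.  This minimal sketch uses hypothetical names throughout.  It
   assumes the ULP contract that an untagged "release" message from the
   peer means the peer will send no further tagged messages for that
   STag.  Tagged placement alone never frees a buffer.

```python
class SinkBuffers:
    """Hypothetical Data Sink bookkeeping for advertised tagged buffers."""

    def __init__(self) -> None:
        self.lent = set()   # STags currently advertised to the peer

    def advertise(self, stag: int) -> None:
        # Sent to the peer inside an untagged ULP message.
        self.lent.add(stag)

    def on_tagged_placed(self, stag: int) -> None:
        # Deliberately changes nothing: later tagged messages for the
        # same buffer may still be in flight.
        pass

    def on_untagged_release(self, stag: int) -> None:
        # The peer's untagged message says it is done with this buffer.
        self.lent.discard(stag)

    def can_reuse(self, stag: int) -> bool:
        return stag not in self.lent


buffers = SinkBuffers()
buffers.advertise(0x2A)
buffers.on_tagged_placed(0x2A)
print(buffers.can_reuse(0x2A))   # False
buffers.on_untagged_release(0x2A)
print(buffers.can_reuse(0x2A))   # True
```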

733	6.8.  Other IP Transports

735	   Both TCP and SCTP provide DDP with reliable transport with TCP-
736	   friendly rate control.  DDP is currently defined to work over
737	   reliable transports and implicitly relies upon some form of rate
738	   control.

740	   DDP is fully compatible with a non-reliable protocol.  Out-of-order
741	   placement is obviously not dependent on whether the other DDP
742	   Segments ever actually arrive.

744	   However, RDMAP requires the LLP to provide reliable service.  An
745	   alternate completion handling protocol would be required if DDP were
746	   to be deployed over an unreliable IP transport.

748	   As noted in the prior section on tagged buffers as ULP credits,
749	   neither RDMAP nor DDP provides any flow control for tagged messages.
750	   If no transport layer flow control is provided, an RDMAP/DDP
751	   application would be only limited by the link layer rate, almost
752	   inevitably resulting in severe network congestion.

754	   RDMAP encourages applications to be ignorant of the underlying
755	   transport PMTU.  The ULP is only notified when all messages ending in
756	   a single untagged message have completed.  The ULP is not aware of
757	   the granularity or ordering of the underlying messages.  This approach
758	   assumes that the ULP is only interested in the complete set of
759	   messages, and has no use for a subset of them.

761	6.9.  LLP Independent Session Establishment

763	   For an RDMAP/DDP application, the transport services provided by a
764	   pair of SCTP Streams and by a TCP connection both provide the same
765	   service (reliable delivery of DDP Segments between two connected
766	   RDMAP/DDP endpoints).

768	6.9.1.  RDMA-only Session Establishment

770	   It is also possible to allow for transport-neutral establishment of
771	   RDMAP/DDP sessions between endpoints.  Combined, these two features
772	   would allow most applications to be unconcerned as to which LLP was
773	   actually in use.

775	   Specifically, the procedures for DDP Stream Session establishment
776	   discussed in section 3 of the SCTP mapping [8], and section 13.3 of
777	   the MPA/TCP mapping [10], both allow for the exchange of ULP-specific
778	   data ("Private Data") before enabling the exchange of DDP Segments.
779	   This delay can allow for proper selection and/or configuration of the
780	   endpoints based upon the exchanged data.  For example, each DDP
781	   Stream Session associated with a single client session might be
782	   assigned to the same DDP Protection Domain.

784	   To be transport neutral, the applications should exchange Private
785	   Data as part of session establishment messages to determine how the
786	   RDMA endpoints are to be configured.  One side must be the Initiator,
787	   and the other the Responder.

789	   With SCTP, a pair of SCTP streams can be used for sequential
790	   sessions.  With MPA/TCP each connection can be used for at most one
791	   session.  However, the same source/destination pair of ports can be
792	   re-used sequentially subject to normal TCP rules.

794	   Both SCTP and MPA limit the private data size to a maximum of 512
795	   bytes.

797	   MPA/TCP requires the end of the TCP connection that initiated the
798	   conversion to MPA mode to send the first DDP Segment.  SCTP does not
799	   have this requirement.  ULPs which wish to be transport neutral
800	   should require the initiating end to send the first message.  A
801	   zero-length RDMA Write can be used for this purpose if the ULP logic
802	   itself does not naturally satisfy this restriction.
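   The transport-neutral rules above (Private Data of at most 512 bytes,
   Initiator sends first) can be sketched as two small helpers.  The
   function names and the message labels are hypothetical.  The labels
   are descriptive strings, not wire formats.

```python
MAX_PRIVATE_DATA = 512  # limit shared by the SCTP and MPA adaptations


def build_private_data(ulp_params: bytes) -> bytes:
    # Transport-neutral ULPs keep Private Data within the common limit.
    if len(ulp_params) > MAX_PRIVATE_DATA:
        raise ValueError("Private Data exceeds 512 bytes")
    return ulp_params


def first_messages(role: str, ulp_first_message=None) -> list:
    """Return what this end sends first once DDP Segments are enabled.

    The Initiator always sends first, as MPA/TCP requires; if the ULP has
    nothing natural to send, a zero-length RDMA Write fills the role.
    """
    if role == "responder":
        return []   # wait for the Initiator's first DDP Segment
    if ulp_first_message is not None:
        return [ulp_first_message]
    return ["RDMA Write (length 0)"]


print(first_messages("initiator"))   # ['RDMA Write (length 0)']
```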

804	6.9.2.  RDMA-Conditional Session Establishment

806	   It is sometimes desirable for the active side of a session to connect
807	   with the passive side before knowing whether the passive side
808	   supports RDMA.

810	   This style of session establishment can be supported with either TCP
811	   or SCTP, but not as transparently as for RDMA-only sessions.  Pre-
812	   existing non-RDMA servers are also far more likely to be using TCP
813	   than SCTP.

815	   With TCP, a normal TCP connection is established.  It is then used by
816	   the ULP to determine whether or not to convert to MPA mode and use
817	   RDMA.  This will typically be integral with other session
818	   establishment negotiations.

820	   With SCTP, the establishment of an association tests whether RDMA is
821	   supported.  If not supported, the application simply requests the
822	   association without the RDMA adaptation indication.

824	   One key difference is that with SCTP the determination as to whether
825	   the peer can support RDMA is made before the transport layer
826	   association/connection is established, while with TCP the established
827	   connection itself is used to determine whether RDMA is supported.

829	7.  Local Interface Implications

831	   Full utilization of DDP and RDMAP capabilities requires a local
832	   interface that explicitly requests these services.  Protocols such as
833	   Sockets Direct Protocol (SDP) can allow applications to keep their
834	   traditional byte-stream or message-stream interface and still enjoy
835	   many of the benefits of the optimized wire level protocols.

837	8.  IANA Considerations

839	   There are no IANA considerations in this document.

841	9.  Security Considerations

843	9.1.  Connection/Association Setup

845	   Both the SCTP and TCP adaptations allow for existing procedures to be
846	   followed for the establishment of the SCTP association or TCP
847	   connection.  Use of DDP does not impair the use of any security
848	   measures to filter, validate and/or log the remote end of an
849	   association/connection.

851	   Authentication of peers and approval of connections is outside of the
852	   scope of DDP.  Connection authentication is the responsibility of the
853	   ULP, which may be based upon information from the LLP.  IPsec is
854	   usable for both TCP and SCTP.

856	9.2.  Tagged Buffer Exposure

858	   DDP only exposes ULP memory to the extent explicitly allowed by ULP
859	   actions.  These include posting of receive operations and enabling of
860	   Steering Tags.

862	   DDP validates that STags are only used by the remote peer to the
863	   extent authorized by the ULP.  The STag selects from a pool of
864	   buffers previously authorized by the ULP; an STag by itself does not
865	   authorize access.

867	   Use of randomization in generating STag values may be useful in
868	   preventing 'off by one' and other programmatic errors, but is of
869	   limited value in countering generation and misuse of STag values by
870	   an active attacker.  IPsec provides countermeasures that can prevent
871	   such an unauthorized attacker from gaining access to buffers used by
872	   DDP and RDMAP.

874	   Neither RDMAP nor DDP places requirements on how ULPs advertise
875	   buffers.  A ULP may use a single Steering Tag for multiple buffer
876	   advertisements.  However, the ULP should be aware that enforcement of
877	   STag usage is likely limited to the overall range that is enabled.
878	   If the remote peer writes into the 'wrong' advertised buffer, neither
879	   the DDP nor the RDMAP layer will be aware of this.  Nor is there any
880	   report to the ULP on how the remote peer specifically used tagged
881	   buffers.
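   Why a shared STag cannot catch a write to the "wrong" advertisement
   can be seen from a simple range check.  This minimal sketch models the
   enabled region as a hypothetical (base, length) pair.  It treats
   offsets as plain integers and ignores other access rights.

```python
from dataclasses import dataclass


@dataclass
class StagRegion:
    """Hypothetical region enabled for one STag: [base, base + length)."""
    base: int
    length: int


def access_allowed(region: StagRegion, offset: int, nbytes: int) -> bool:
    # The only check available: the access must lie inside the range.
    return region.base <= offset and offset + nbytes <= region.base + region.length


# One STag covering two ULP advertisements: A = [0, 4096), B = [4096, 8192).
shared = StagRegion(base=0, length=8192)
# The peer was told to use A but writes into B; the check still passes.
print(access_allowed(shared, 4096, 1024))   # True
print(access_allowed(shared, 8000, 1024))   # False: beyond the enabled range
```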

883	   Unless the ULP peers have an adequate basis for mutual trust, the
884	   receiving ULP might be well advised to use a distinct STag for each
885	   interaction, and to invalidate it after each use or to require its
886	   peer to use the RDMAP option to invalidate the STag with its
887	   responding untagged message.
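   A distinct STag per interaction, invalidated after use, might be
   tracked as follows.  This is a minimal sketch with hypothetical names.
   It draws 32-bit values from Python's secrets module.  As noted above,
   random STags help catch stale or off-by-one usage but are not a
   defense against an active attacker.

```python
import secrets


class OneShotStags:
    """Hypothetical allocator issuing a distinct STag per interaction."""

    def __init__(self) -> None:
        self.active = {}   # stag -> buffer description

    def issue(self, buffer_name: str) -> int:
        # Draw random 32-bit values until one is not already in use.
        while True:
            stag = secrets.randbits(32)
            if stag not in self.active:
                self.active[stag] = buffer_name
                return stag

    def invalidate(self, stag: int) -> None:
        # Called after the interaction completes, or when the peer's
        # responding untagged message requests invalidation.
        self.active.pop(stag, None)

    def lookup(self, stag: int):
        return self.active.get(stag)


stags = OneShotStags()
first = stags.issue("reply buffer 1")
second = stags.issue("reply buffer 2")
stags.invalidate(first)
print(stags.lookup(first), stags.lookup(second))   # None reply buffer 2
```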

889	9.3.  Impact of Encrypted Transports

891	   While DDP is cleanly layered over the LLP, its maximum benefit may be
892	   limited when the LLP Stream is secured with a stream cipher, such
893	   as Transport Layer Security (TLS).  If the LLP must decrypt in order,
894	   it cannot provide out-of-order DDP Segments to the DDP layer for
895	   placement purposes.  IPsec tunnel mode encrypts entire IP Datagrams.
896	   IPsec transport mode encrypts TCP Segments or SCTP packets.  In
897	   neither case should IPsec preclude providing out-of-order DDP
898	   Segments to the DDP layer for placement.

900	   Note that end-to-end use of IPsec cryptographic integrity protection
901	   may allow suppression of MPA CRC generation and checking under
902	   certain circumstances.  This is one example where the LLP may be
903	   judged to have "or equivalent" protection to an end-to-end CRC32c.

905	10.  References

907	10.1.  Normative references

909	   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
910	         Levels", BCP 14, RFC 2119, March 1997.

912	   [2]   Dierks, T. and C. Allen, "The TLS Protocol Version 1.0",
913	         RFC 2246, January 1999.

915	   [3]   Kent, S. and R. Atkinson, "IP Encapsulating Security Payload
916	         (ESP)", RFC 2406, November 1998.

918	   [4]   Stewart, R., Xie, Q., Morneault, K., Sharp, C., Schwarzbauer,
919	         H., Taylor, T., Rytina, I., Kalla, M., Zhang, L., and V.
920	         Paxson, "Stream Control Transmission Protocol", RFC 2960,
921	         October 2000.

923	   [5]   Coene, L., "Stream Control Transmission Protocol Applicability
924	         Statement", RFC 3257, April 2002.

926	   [6]   Recio, R., "An RDMA Protocol Specification",
927	         draft-ietf-rddp-rdmap-05 (work in progress), July 2005.

929	   [7]   Shah, H., "Direct Data Placement over Reliable Transports",
930	         draft-ietf-rddp-ddp-05 (work in progress), July 2005.

932	   [8]   Stewart, R., "Stream Control Transmission Protocol (SCTP)
933	         Remote Direct Memory Access (RDMA) Direct Data Placement (DDP)
934	         Adaptation", draft-ietf-rddp-sctp-02 (work in progress),
935	         August 2005.

937	   [9]   Pinkerton, J., "DDP/RDMAP Security",
938	         draft-ietf-rddp-security-08 (work in progress), March 2006.

940	   [10]  Culley, P., "Marker PDU Aligned Framing for TCP Specification",
941	         draft-ietf-rddp-mpa-02 (work in progress), February 2005.

943	10.2.  Informative References

945	   [11]  Ko, M., "iSCSI Extensions for RDMA Specification",
946	         October 2005.

948	   [12]  Callaghan, B. and T. Talpey, "NFS Direct Data Placement",
949	         draft-ietf-nfsv4-nfsdirect-02 (work in progress), October 2005.

951	Authors' Addresses

953	   Caitlin Bestler
954	   Broadcom Corporation
955	   16215 Alton Parkway
956	   P.O. Box 57013
957	   Irvine, CA  92619-7013
958	   USA

960	   Phone: 949-926-6383
961	   Email: caitlinb@broadcom.com

963	   Lode Coene
964	   Siemens
965	   Atealaan 26
966	   Herentals,   2200
967	   Belgium

969	   Phone: +32-14-252081
970	   Email: lode.coene@siemens.com

972	Intellectual Property Statement

974	   The IETF takes no position regarding the validity or scope of any
975	   Intellectual Property Rights or other rights that might be claimed to
976	   pertain to the implementation or use of the technology described in
977	   this document or the extent to which any license under such rights
978	   might or might not be available; nor does it represent that it has
979	   made any independent effort to identify any such rights.  Information
980	   on the procedures with respect to rights in RFC documents can be
981	   found in BCP 78 and BCP 79.

983	   Copies of IPR disclosures made to the IETF Secretariat and any
984	   assurances of licenses to be made available, or the result of an
985	   attempt made to obtain a general license or permission for the use of
986	   such proprietary rights by implementers or users of this
987	   specification can be obtained from the IETF on-line IPR repository at
988	   http://www.ietf.org/ipr.

990	   The IETF invites any interested party to bring to its attention any
991	   copyrights, patents or patent applications, or other proprietary
992	   rights that may cover technology that may be required to implement
993	   this standard.  Please address the information to the IETF at
994	   ietf-ipr@ietf.org.

996	Disclaimer of Validity

998	   This document and the information contained herein are provided on an
999	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1000	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1001	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1002	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1003	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1004	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1006	Copyright Statement

1008	   Copyright (C) The Internet Society (2006).  This document is subject
1009	   to the rights, licenses and restrictions contained in BCP 78, and
1010	   except as set forth therein, the authors retain all their rights.

1012	Acknowledgment

1014	   Funding for the RFC Editor function is currently provided by the
1015	   Internet Society.