idnits 2.17.1 

draft-ietf-rddp-applicability-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 914.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 891.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 898.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 904.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 11, 2005) is 6765 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '1' is defined on line 832, but no explicit reference
     was found in the text

  == Unused Reference: '2' is defined on line 835, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 838, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 841, but no explicit reference
     was found in the text

  == Unused Reference: '5' is defined on line 845, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 848, but no explicit reference
     was found in the text

  == Unused Reference: '7' is defined on line 851, but no explicit reference
     was found in the text

  == Unused Reference: '8' is defined on line 854, but no explicit reference
     was found in the text

  == Unused Reference: '9' is defined on line 859, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 2246 (ref. '2') (Obsoleted by RFC 4346)

  ** Obsolete normative reference: RFC 2406 (ref. '3') (Obsoleted by RFC
     4303, RFC 4305)

  ** Obsolete normative reference: RFC 2960 (ref. '4') (Obsoleted by RFC 4960)

  ** Downref: Normative reference to an Informational RFC: RFC 3257 (ref. '5')

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-rdmap-05

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-ddp-05

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-sctp-02

  == Outdated reference: A later version (-08) exists of
     draft-ietf-rddp-mpa-02


     Summary: 7 errors (**), 0 flaws (~~), 16 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Remote Direct Data Placement                                  C. Bestler
3	Working group                                                   Broadcom
4	Internet-Draft                                                  L. Coene
5	Expires: April 14, 2006                                          Siemens
6	                                                        October 11, 2005

8	Applicability of Remote Direct Memory Access Protocol (RDMA) and Direct
9	                          Data Placement (DDP)
10	                  draft-ietf-rddp-applicability-04.txt

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt.

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html.

35	   This Internet-Draft will expire on April 14, 2006.

37	Copyright Notice

39	   Copyright (C) The Internet Society (2005).

41	Abstract

43	   This document describes the applicability of Remote Direct Memory
44	   Access Protocol (RDMAP) and the Direct Data Placement Protocol (DDP).
45	   It compares and contrasts the different transport options over IP
46	   that DDP can use, provides guidance to ULP developers on choosing
47	   between available transports and/or how to be indifferent to the
48	   specific transport layer used, compares use of DDP with direct use of
49	   the supporting transports, and compares DDP over IP transports with
50	   non-IP transports that support RDMA functionality.

52	Table of Contents

54	   1.  Introduction
55	   2.  Definitions
56	   3.  Direct Placement
57	     3.1.  Fewer Required ULP Interactions
58	     3.2.  Direct Placement using only the LLP
59	   4.  Tagged Messages
60	     4.1.  Order Independent Reception
61	     4.2.  Reduced ULP Notifications
62	     4.3.  Simplified ULP Exchanges
63	     4.4.  Order Independent Sending
64	     4.5.  Tagged Buffers as ULP Credits
65	   5.  RDMA Read
66	   6.  LLP Comparisons
67	     6.1.  Multistreaming Implications
68	     6.2.  Out of Order Reception Implications
69	     6.3.  Header and Marker Overhead
70	     6.4.  Middlebox Support
71	     6.5.  Processing Overhead
72	     6.6.  Data Integrity Implications
73	       6.6.1.  MPA/TCP Specifics
74	       6.6.2.  SCTP Specifics
75	     6.7.  Non-IP Transports
76	       6.7.1.  No RDMA Layer Ack
77	     6.8.  Other IP Transports
78	     6.9.  LLP Independent Session Establishment
79	       6.9.1.  RDMA-only Session Establishment
80	       6.9.2.  RDMA-Conditional Session Establishment
81	   7.  Local Interface Implications
82	   8.  IANA Considerations
83	   9.  Security considerations
84	     9.1.  Connection/Association Setup
85	     9.2.  Tagged Buffer Exposure
86	     9.3.  Impact of Encrypted Transports
87	   10. Normative references
88	   Authors' Addresses
89	   Intellectual Property and Copyright Statements

91	1.  Introduction

93	   Remote Direct Memory Access Protocol (RDMAP) and Direct Data
94	   Placement (DDP) work together to provide application independent
95	   efficient placement of application payload directly into buffers
96	   specified by the Upper Layer Protocol (ULP).

98	   The DDP protocol is responsible for direct placement of received
99	   payload into ULP specified buffers.  The RDMAP protocol provides
100	   completion notifications to the ULP and support for Data Sink
101	   initiated fetch of advertised buffers (RDMA Reads).

103	   DDP and RDMAP are both application independent protocols which allow
104	   the ULP to perform remote direct data placement.  DDP can use
105	   multiple standard IP transports including SCTP and TCP.

107	   By clarifying the situations where the functionality of these
108	   protocols is applicable, this document can guide implementers,
109	   application and protocol designers in selecting which protocols to
110	   use.

112	   The applicability of RDMAP/DDP is driven by their unique
113	   capabilities:

115	   o  The existence of an application independent protocol allows common
116	      solutions to be implemented in hardware and/or the kernel.  This
117	      document will discuss when common data placement procedures are of
118	      the greatest benefit to applications as contrasted with
119	      application specific solutions built on top of direct use of the
120	      underlying transport.

122	   o  DDP supports both untagged and tagged buffers.  Tagged buffers
123	      allow the Data Sink ULP to be indifferent to what order (or in
124	      what packets) the Data Source sent the data, or what order they
125	      are received in.  This document will discuss when Data Source
126	      flexibility is of benefit to applications.

128	   o  RDMAP consolidates ULP notifications, thereby minimizing the
129	      number of required ULP interactions.

131	   o  RDMAP defines RDMA Reads, which allow remote access to advertised
132	      buffers.  This document will review the advantages of using RDMA
133	      Reads as contrasted to alternate solutions.

135	   Some non-IP transports, such as InfiniBand, directly integrate RDMA
136	   features.  This document will review the applicability of providing
137	   RDMA services over ubiquitous IP transports as opposed to the use of
138	   customized transport protocols.  Because DDP is defined cleanly as
139	   a layer over existing IP transports, it has simpler ordering rules
140	   than some prior RDMA protocols.  This may have some implications
141	   for application designers.

143	   The full capabilities of DDP and RDMAP can only be realized by
144	   applications that are designed to exploit them.  The co-existence of
145	   RDMAP/DDP aware local interfaces with traditional socket interfaces
146	   will also be explored.

148	   Finally, DDP support is defined for at least two IP transports: SCTP
149	   and TCP.  The rationale for supporting both transports is reviewed,
150	   as well as when each would be the appropriate selection.

152	2.  Definitions

154	   Advertisement - the act of informing a Remote Peer that a local RDMA
155	      Buffer is available to it.  A Node makes available an RDMA Buffer
156	      for incoming RDMA Read or RDMA Write access by informing its RDMA/
157	      DDP peer of the Tagged Buffer identifiers (STag, base address, and
158	      buffer length).  This advertisement of Tagged Buffer information
159	      is not defined by RDMA/DDP and is left to the ULP.  A typical
160	      method would be for the Local Peer to embed the Tagged Buffer's
161	      Steering Tag, base address, and length in a Send Message destined
162	      for the Remote Peer.

164	   Data Sink - The peer receiving a data payload.  Note that the Data
165	      Sink can be required to both send and receive RDMA/DDP Messages to
166	      transfer a data payload.

168	   Data Source - The peer sending a data payload.  Note that the Data
169	      Source can be required to both send and receive RDMA/DDP Messages
170	      to transfer a data payload.

172	   Lower Layer Protocol (LLP) - The transport protocol that provides
173	      services to DDP.  This is an IP transport with any required
174	      adaptation layer.  Adaptation layers are defined for SCTP and TCP.

176	   Steering Tag (STag) - An identifier of a Tagged Buffer on a Node,
177	      valid as defined within a protocol specification.

179	   Tagged Message - A DDP message that is directed to a ULP specified
180	      buffer based upon embedded addressing information.  In the
181	      immediate sense, the destination buffer is specified by the
182	      message sender.

184	   Untagged Message - A DDP message that is directed to a ULP specified
185	      buffer based upon a Message Sequence Number being matched with a
186	      receiver supplied buffer.  The destination buffer is specified by
187	      the message receiver.

189	   Upper Layer Protocol (ULP) - The direct user of RDMAP/DDP services.
190	      This may be an application, or a middleware layer such as Sockets
191	      Direct Protocol (SDP) or Remote Procedure Calls (RPC).
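   The two placement models defined above can be contrasted with a
   small Python sketch.  It is a toy model, not an implementation of
   the DDP wire protocol: the STag table, the list of pre-posted
   receive buffers and the assumption that Message Sequence Numbers
   start at 1 are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataSink:
    """Toy model of the two DDP placement rules (illustrative only).

    tagged:   STag -> advertised buffer; the sender names the target.
    untagged: pre-posted receive buffers; the receiver names the target,
              and the Message Sequence Number (MSN) picks the buffer.
    """
    tagged: dict = field(default_factory=dict)
    untagged: list = field(default_factory=list)

    def place_tagged(self, stag: int, offset: int, data: bytes) -> None:
        # Destination chosen by the sender: STag plus offset.
        buf = self.tagged[stag]
        if offset + len(data) > len(buf):
            raise ValueError("write exceeds advertised buffer")
        buf[offset:offset + len(data)] = data

    def place_untagged(self, msn: int, offset: int, data: bytes) -> None:
        # Destination chosen by the receiver: MSN 1 maps to the first
        # posted buffer, MSN 2 to the second, and so on (assumed here).
        if msn < 1:
            raise ValueError("MSNs are assumed to start at 1")
        buf = self.untagged[msn - 1]
        if offset + len(data) > len(buf):
            raise ValueError("message exceeds posted receive buffer")
        buf[offset:offset + len(data)] = data


sink = ToyDataSink(tagged={0x1234: bytearray(8)},
                   untagged=[bytearray(16), bytearray(16)])
sink.place_tagged(0x1234, 4, b"WXYZ")   # sender-selected location
sink.place_untagged(1, 0, b"request")   # receiver-selected location
```

   For the tagged message the sender chose the location (STag 0x1234,
   offset 4); for the untagged message the receiver did, by posting the
   buffer that MSN 1 selects.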

193	3.  Direct Placement

195	   Direct Data Placement optimizes the placement of ULP payload into the
196	   correct destination buffers, typically eliminating intermediate
197	   copying.  Placement is enabled without regard to order of arrival or
198	   order of transmission, and without requiring per-placement
199	   interaction with the ULP.

201	   RDMAP minimizes the required ULP interactions.  This capability is
202	   most valuable for applications that require multiple transport layer
203	   packets for each required ULP interaction.

205	3.1.  Fewer Required ULP Interactions

207	   While reducing the number of required ULP interactions is in itself
208	   desirable, it is critical for high speed connections.  The burst
209	   packet rate for a high speed interface could easily exceed the host
210	   system's ability to switch ULP contexts.

212	   Content access applications are primary examples of applications with
213	   both high bandwidth and high content to required ULP interaction
214	   ratios.  These applications include file access protocols (NAS),
215	   storage access (SAN), database access and other application specific
216	   forms of content access such as HTTP, XML and email.

218	3.2.  Direct Placement using only the LLP

220	   Direct data placement can be achieved without RDMA.  Pre-posting of
221	   receive buffers could allow a non-RDMA network stack to place data
222	   directly into user buffers.

224	   How much DDP improves on direct placement depends on which transport
225	   it is compared with, and on the nature of the local interface.
226	   Without RDMAP/DDP, pre-posting buffers requires the receiver to
227	   accurately predict the required buffers and their sizes.  This is not
228	   feasible for all ULPs.  By contrast, DDP only requires the ULP to
229	   predict the sequence and size of incoming untagged messages.

231	   An application that could predict incoming messages and required
232	   nothing more than direct placement into buffers might be able to do
233	   so with a properly designed local interface to SCTP or TCP.  Doing so
234	   for TCP requires making predictions at a byte level rather than a
235	   message level.
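   The difference between byte-level and message-level prediction can
   be illustrated with a hypothetical Python model.  The first function
   treats the transport as a byte stream that fills pre-posted buffers
   back to back; the second places untagged message i into posted
   buffer i, so each posted size need only be an upper bound.

```python
def fill_from_byte_stream(stream: bytes, posted_sizes: list) -> list:
    """Byte-stream model: data fills pre-posted buffers back to back,
    so a mispredicted size spills one message into the next buffer."""
    filled, pos = [], 0
    for size in posted_sizes:
        filled.append(stream[pos:pos + size])
        pos += size
    return filled


def fill_per_message(messages: list, posted_sizes: list) -> list:
    """Message model: message i lands in posted buffer i, so the
    receiver only has to predict an upper bound on each size."""
    filled = []
    for msg, size in zip(messages, posted_sizes):
        if len(msg) > size:
            raise ValueError("posted buffer too small for message")
        filled.append(msg)
    return filled


messages = [b"a" * 60, b"b" * 90]
posted = [100, 100]
by_bytes = fill_from_byte_stream(b"".join(messages), posted)
by_message = fill_per_message(messages, posted)
```

   Here the byte-stream model leaves the first 40 bytes of the second
   message in the first buffer, while the per-message model keeps each
   message intact in its own buffer.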

237	   The main benefit of DDP for such an application would be that pre-
238	   posting of receive buffers is a mandated local interface capability,
239	   and that predictions can be made on a per-message basis (not per
240	   byte).

242	   The LLP can also be used directly if ULP specific knowledge is built
243	   into the protocol stack to allow "parse and place" handling of
244	   received packets.  Such a solution either requires interaction with
245	   the ULP, or that the protocol stack have knowledge of ULP specific
246	   syntax rules.

248	   DDP achieves the benefits of directly placing incoming payload
249	   without requiring tight coupling between the ULP and the protocol
250	   stack.  However, "parse and place" capabilities can certainly provide
251	   equivalent services to a limited number of ULPs.

253	4.  Tagged Messages

255	   This section covers the major benefits from the use of Tagged
256	   Messages.

258	   A more critical advantage of DDP is the ability of the Data Source to
259	   use tagged buffers.  Tagging messages allows the Data Source to
260	   choose the ordering and packetization of its payload deliveries.
261	   With direct data placement based solely upon pre-posted receives, the
262	   packetization and delivery of payload must be agreed by the ULP peers
263	   in advance.  Even if there is an encoding of what is being
264	   transferred, as is common with middleware solutions, this information
265	   is not understood at the application independent layers.  The
266	   directions on where to place the incoming data cannot be accessed
267	   without switching to the ULP first.  DDP provides a standardized
268	   'packing list' which can be interpreted without requiring ULP
269	   interaction.  Indeed, it is designed to be implementable in hardware.

271	4.1.  Order Independent Reception

273	   Tagged messages are directed to a buffer based on an included
274	   Steering Tag.  Additionally, no notice is provided to the ULP for
275	   each individual Tagged Message's arrival.  Together these allow
276	   tagged messages received out-of-order to be processed without
277	   intermediate buffering or additional notifications to the ULP.
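   A minimal Python sketch of order independent reception, assuming
   each tagged segment arrives as an (offset, payload) pair already
   resolved from its Steering Tag: because every segment names its own
   destination, shuffling the arrival order does not change the result.

```python
import random


def place_tagged_segments(buf_len: int, segments: list) -> bytes:
    """Write each (offset, payload) segment straight into the target buffer.

    Every segment carries its own offset, so arrival order does not matter:
    there is no reorder queue, no staging copy, and no per-segment ULP
    notification in this model.
    """
    buf = bytearray(buf_len)
    for offset, payload in segments:
        buf[offset:offset + len(payload)] = payload
    return bytes(buf)


message = b"placed without reordering"
segments = [(i, message[i:i + 5]) for i in range(0, len(message), 5)]
random.shuffle(segments)  # model out-of-order arrival
assert place_tagged_segments(len(message), segments) == message
```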

279	4.2.  Reduced ULP Notifications

281	   RDMAP further reduces required ULP interactions by consolidating
282	   completion notifications of tagged messages with the completion
283	   notification of a trailing untagged message.  For most ULPs this
284	   radically reduces the number of required ULP interactions even
285	   further.
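   The consolidation rule can be modeled as follows.  This is a
   simplified, hypothetical event model in Python, not the RDMAP
   completion interface: tagged placements are counted silently, and
   each untagged message produces one completion that covers them.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    msn: int           # MSN of the terminating untagged message
    tagged_bytes: int  # tagged payload placed since the previous completion


def consolidate(events: list) -> list:
    """Collapse ("tagged", nbytes) / ("untagged", msn) events into completions.

    Tagged placements raise no notification of their own; each untagged
    message yields exactly one completion that implicitly covers the
    tagged data placed before it.
    """
    completions, pending = [], 0
    for kind, value in events:
        if kind == "tagged":
            pending += value
        elif kind == "untagged":
            completions.append(Completion(msn=value, tagged_bytes=pending))
            pending = 0
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return completions


# Sixteen 4 KB tagged messages followed by one untagged reply.
events = [("tagged", 4096)] * 16 + [("untagged", 1)]
print(consolidate(events))  # one Completion instead of seventeen notices
```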

287	   While RDMAP consolidation of notices is beneficial to most
288	   applications, it may be detrimental to some applications that benefit
289	   from streamed delivery to enable ULP processing of received data as
290	   promptly as possible.  A ULP that uses RDMAP cannot begin processing
291	   any portion of an exchange until it receives notification that the
292	   entire exchange has been placed.  An "exchange" here is a set of zero
293	   or more tagged messages and a single terminating untagged message.
294	   An application that would prefer to begin work on the received
295	   payload, no matter what order it arrived in, as soon as possible
296	   might prefer to work directly with the LLP.  RDMAP is optimized for
297	   applications that are more concerned with when the entire exchange
298	   is complete.

300	   An application that benefits from being able to begin processing of
301	   each received packet as quickly as possible may find RDMAP interferes
302	   with that goal.

304	   Such an application might be able to retain most of the benefits of
305	   RDMAP by using the DDP layer directly.  However, in addition to
306	   taking on the responsibilities of the RDMAP layer, the application
307	   would likely have more difficulty finding support for a DDP-only API.
308	   Many hardware implementations may choose to tightly couple RDMAP and
309	   DDP, and might not provide an API directly to DDP services.

311	   These features minimize the required interactions with the ULP.  This
312	   can be extremely beneficial for applications that use multiple
313	   transport layer packets to accomplish what is a single ULP
314	   interaction.

316	4.3.  Simplified ULP Exchanges

318	   The notification rules for Tagged Messages allow ULPs to create
319	   multi-message "exchanges" of zero or more tagged messages and one
320	   untagged message that represent a single step in the ULP
321	   interaction.  The receiving ULP is notified that the untagged message
322	   has arrived, and implicitly of any associated tagged messages.

324	   A ULP whose exchanges would naturally consist of a single untagged
325	   message would derive virtually no benefit from the use of RDMAP/DDP
326	   as opposed to SCTP.  But while tagged buffers are the justification
327	   for RDMAP/DDP, untagged buffers are still necessary.  Without
328	   untagged buffers the only method to exchange buffer advertisements
329	   would involve out-of-band communications and/or sharing of compile
330	   time constants.  Most RDMA-aware ULPs use untagged buffers for
331	   requests and responses.  Buffer advertisements are typically done
332	   within these untagged messages.
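   The format of buffer advertisements is left to the ULP.  As one
   hypothetical example, a ULP could place the STag, base address and
   length as fixed-width, network-byte-order fields at the start of an
   untagged request.  The 16-byte layout below (32-bit STag, 64-bit
   base address, 32-bit length) is an assumption for illustration and
   is not defined by RDMAP or DDP.

```python
import struct
from dataclasses import dataclass

# Hypothetical ULP layout: 32-bit STag, 64-bit base address, 32-bit length,
# network byte order, no padding (16 bytes total).
_ADVERT = struct.Struct("!IQI")


@dataclass(frozen=True)
class BufferAdvertisement:
    stag: int
    base: int
    length: int

    def encode(self) -> bytes:
        """Serialize for embedding in an untagged (Send) message."""
        return _ADVERT.pack(self.stag, self.base, self.length)

    @classmethod
    def decode(cls, payload: bytes) -> "BufferAdvertisement":
        """Parse an advertisement from the start of an untagged payload."""
        return cls(*_ADVERT.unpack_from(payload))


# A request carrying an advertisement followed by ULP request bytes.
advert = BufferAdvertisement(stag=0x1234ABCD, base=0x10000, length=65536)
request = advert.encode() + b"READ /results"
assert BufferAdvertisement.decode(request) == advert
```

   The receiver parses the advertisement from the first 16 bytes and
   treats the remainder as the ULP request itself.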

334	   Limiting use of untagged buffers to requests and responses by moving
335	   all bulk data using tagged transfers can greatly reduce the amount
336	   of prediction that the Data Sink must perform in pre-posting receive
337	   buffers.  For example, a typical RDMA enabled interaction would
338	   consist of the following:

340	      The client sends a transaction request to the server as an
341	      untagged message.

343	      This message includes buffer advertisements for the buffers where
344	      the results are to be placed.

346	      The server sends multiple tagged messages to the advertised
347	      buffers.

349	      The server sends a transaction reply to the client as an untagged
350	      message.

352	      The client receives a single notification, indicating completion
353	      of the interaction.

355	   With this type of exchange the pacing and required size of untagged
356	   buffers is highly predictable.  The variability of response sizes is
357	   absorbed by tagged transfers.
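   A rough Python model of the exchange above shows why the untagged
   buffer size stays predictable: whatever the size of the result, the
   bulk data travels in tagged messages and only a short reply uses an
   untagged buffer.  The 4 KB chunk size, the 64-byte reply slot and
   the message tuples are hypothetical.

```python
REPLY_SLOT = 64  # hypothetical size of the client's pre-posted untagged buffer
CHUNK = 4096     # hypothetical payload size of each tagged message


def server_messages(result: bytes) -> list:
    """Messages the server sends for one request: tagged data, then a reply."""
    msgs = [("tagged", off, result[off:off + CHUNK])
            for off in range(0, len(result), CHUNK)]
    reply = f"OK {len(result)}".encode()
    if len(reply) > REPLY_SLOT:
        raise ValueError("reply does not fit the pre-posted untagged buffer")
    msgs.append(("untagged", reply))
    return msgs


def client_receive(msgs: list, advertised: bytearray) -> bytes:
    """Place tagged data in the advertised buffer; return the untagged reply."""
    for msg in msgs:
        if msg[0] == "tagged":
            _, off, data = msg
            advertised[off:off + len(data)] = data  # no ULP notification
        else:
            return msg[1]                           # the single completion
    raise ValueError("exchange ended without an untagged reply")


# The untagged reply stays short whether the result is 10 bytes or 1 MB.
for size in (10, 1_000_000):
    buf = bytearray(size)
    reply = client_receive(server_messages(b"x" * size), buf)
    print(size, reply)
```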

359	4.4.  Order Independent Sending

361	   Use of tagged messages is especially applicable when the Data Sink
362	   does not know the actual size, structure or location of the content
363	   it is requesting (or updating).

365	   For example, suppose the Data Sink ULP needs to fetch four related
366	   pieces of data into four separate buffers.  With SCTP, the Data Sink
367	   ULP could receive four messages into four separate buffers, only
368	   having to predict the maximum size of each.  However, it would have to
369	   dictate the order in which the Data Source supplied the separate
370	   pieces.  If the Data Source found it advantageous to fetch them in a
371	   different order it would have to use intermediate buffering to re-
372	   order the pieces into the expected order even though the application
373	   only required that all four be delivered and did not truly have an
374	   ordering requirement.

376	   Techniques such as RAID striping and mirroring represent this same
377	   problem, but one step further.  What appears to be a single resource
378	   to the Data Sink is actually stored in separate locations by the Data
379	   Source.  Non-RDMA protocols would either require the Data Source to
380	   fetch the material in the desired order or force the Data Source to
381	   use its own holding buffers to assemble an image of the destination
382	   buffer.

384	   While sometimes referred to as a "buffer-to-buffer" solution, RDMA
385	   more fundamentally enables remote buffer access.  The ULP is free to
386	   work with larger remote buffers than it has locally.  This reduces
387	   buffering requirements and the number of times the data must be
388	   copied in an end-to-end transfer.

390	   There are numerous reasons why the Data Sink would not know the true
391	   order or location of the requested data: it may differ per client,
392	   as with different records selected and/or different sort orders, or
393	   result from RAID striping, file fragmentation, volume fragmentation,
394	   volume mirroring or server-side dynamic compositing of content (such
395	   as server-side includes for HTTP).

397	   In all of these cases the Data Source is free to assemble the desired
398	   data in the Data Sink's buffers in whatever order the component data
399	   becomes available to it.  It is not constrained on ordering.  It does
400	   not have to assemble an image in its own memory before creating it in
401	   the Data Sink's buffers.
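   The following Python sketch models a Data Source that reads striped
   content from several disks and writes each stripe to its final
   offset in the Data Sink's buffer as soon as the read completes.  The
   disks, the 4-byte stripe size and the random latencies are
   simulated; a local bytearray stands in for the advertised buffer and
   a slice assignment stands in for an RDMA Write.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

STRIPE = 4  # hypothetical stripe size in bytes


def read_stripe(disks: list, index: int):
    """Simulated read of stripe `index` from a round-robin striped set."""
    time.sleep(random.uniform(0, 0.005))  # per-disk latency varies
    disk = disks[index % len(disks)]
    return index, disk[index // len(disks)]


def fill_sink_buffer(disks: list, n_stripes: int) -> bytes:
    """Write each stripe to its final offset as soon as it is read."""
    sink = bytearray(n_stripes * STRIPE)  # stands in for the advertised buffer
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        futures = [pool.submit(read_stripe, disks, i) for i in range(n_stripes)]
        for fut in as_completed(futures):    # completion order, not index order
            index, data = fut.result()
            offset = index * STRIPE              # tagged offset at the Data Sink
            sink[offset:offset + STRIPE] = data  # stands in for an RDMA Write
    return bytes(sink)


content = b"ABCDEFGHIJKLMNOPQRSTUVWX"  # 6 stripes over 2 disks
disks = [[content[i * STRIPE:(i + 1) * STRIPE] for i in range(d, 6, 2)]
         for d in range(2)]
assert fill_sink_buffer(disks, 6) == content
```

   Because as_completed yields futures in completion order, the write
   order varies from run to run while the final buffer contents do not.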

403	   Note that while DDP enables use of tagged messages for bulk transfer,
404	   there are some application scenarios where untagged messages would
405	   still be used for bulk transfer.  For example, under the Direct
406	   Access File Server (DAFS) protocol the file server does not expose
407	   its own memory to its clients.  A client wishing to write may
408	   advertise a buffer against which the server will issue RDMA Reads.
409	   However, when performing a small write it may be preferable to
410	   include the data in the untagged message rather than incurring an
411	   additional round trip with the RDMA Read and its response.

413	4.5.  Tagged Buffers as ULP Credits

415	   The handling of end-to-end buffer credits differs considerably with
416	   DDP than when the ULP directly uses either TCP or SCTP.

418	   With both TCP and SCTP, buffer credits are based upon the receiver
419	   granting transmit permission based on the total number of bytes.
420	   These credits reflect system buffering resources and/or simple flow
421	   control.  They do not represent ULP resources.

423	   DDP defines no standard flow control, but presumes the existence of a
424	   ULP mechanism.  The presumed mechanism is that the Data Sink ULP has
425	   issued credits to the Data Source allowing the Data Source to send a
426	   specific number of untagged messages.

428	   The ULP peers must ensure that the sender is aware of the maximum
429	   size that can be sent to any specific target buffer.  One method of
430	   doing so is to use a standard size for all untagged buffers within a
431	   given connection.  For example, DAFS specifies an initial size
432	   requirement for session establishment, during which the untagged
433	   buffer size for the remainder of the session is negotiated.
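   A minimal sketch of the presumed credit mechanism, from the Data
   Source's side, assuming one credit per pre-posted untagged buffer
   and a single buffer size negotiated at session setup.  The class and
   method names are hypothetical; DDP itself defines no such interface.

```python
class UntaggedCredits:
    """Sender-side view of a hypothetical ULP credit scheme.

    The Data Sink grants credits, one per pre-posted untagged buffer, and
    all untagged buffers share a size negotiated at session setup.
    """

    def __init__(self, credits: int, max_size: int):
        self.credits = credits
        self.max_size = max_size

    def grant(self, n: int) -> None:
        """Called when the Data Sink reports n more posted buffers."""
        self.credits += n

    def try_send(self, payload: bytes) -> bool:
        """Consume one credit; return False if the sender must wait."""
        if len(payload) > self.max_size:
            raise ValueError("payload exceeds negotiated untagged buffer size")
        if self.credits == 0:
            return False
        self.credits -= 1
        return True


credits = UntaggedCredits(credits=2, max_size=1024)
sent = [credits.try_send(b"request") for _ in range(3)]
print(sent)  # [True, True, False]
credits.grant(1)
```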

435	   Tagged buffers are ULP resources advertised directly from ULP to ULP.
436	   A DDP put to a known tagged buffer is constrained only by transport
437	   level flow control, not by available system buffering.

439	   Either tagged or untagged buffers allow bypassing of system buffer
440	   resources.  Use of tagged buffers additionally allows the Data Source
441	   to choose the order in which to exercise the credits.

443	   To the extent allowed by the ULP, tagged buffers are also divisible
444	   resources.  The Data Sink can advertise a single 100 KB buffer, and
445	   then receive notifications from its peer that it had written 50 KB,
446	   20 KB and 30 KB to that buffer in three successive transactions.
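   The 100 KB example works out as follows in a hypothetical Python
   bookkeeping class, assuming the ULP fills the advertised buffer
   sequentially: the three writes start at offsets 0, 51200 and 71680,
   and together they use the whole buffer.

```python
class DivisibleTaggedBuffer:
    """Data Sink bookkeeping for one advertised tagged buffer (hypothetical).

    Assumes the ULP fills the buffer sequentially; the peer reports each
    write and the sink tracks where the next one begins.
    """

    def __init__(self, size: int):
        self.size = size
        self.used = 0

    def record_write(self, nbytes: int) -> int:
        """Record a completed write; return the offset it started at."""
        if self.used + nbytes > self.size:
            raise ValueError("write would overrun the advertised buffer")
        start = self.used
        self.used += nbytes
        return start


KB = 1024
buf = DivisibleTaggedBuffer(100 * KB)
starts = [buf.record_write(n * KB) for n in (50, 20, 30)]
print(starts)  # [0, 51200, 71680]
```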

448	   ULP-management of tagged buffer resources, independent of transport
449	   and DDP layer credits, is an additional benefit of RDMA protocols.
450	   Large bulk transfers cannot be blocked by limited general purpose
451	   buffering capacity.  Applications can flow control based upon higher
452	   level abstractions, such as number of outstanding requests,
453	   independent of the amount of data that must be transferred.

455	   However, use of system buffering, as offered by direct use of the
456	   underlying transports, can be preferable under certain circumstances.

458	   One example would be when the number of target ULP buffers is
459	   sufficiently large, and the rate at which any writes arrive is
460	   sufficiently low, that pinning all the target ULP buffers in memory
461	   would be undesirable.  The maximum transfer rate, and hence the
462	   maximum amount of system buffering required, may be more stable and
463	   predictable than the total ULP buffer exposure.

465	   Another would be when the Data Sink wishes to receive a stream of
466	   data at a predictable rate, but does not know in advance what the
467	   size of each data packet will be.  This is common with streaming
468	   media that has been encoded with a variable bit rate.  With DDP, the
469	   Data Sink would either have to use untagged buffers large enough for
470	   the largest packet, or advertise a circular buffer.  If for security
471	   or other reasons the Data Sink did not want the size of its buffer to
472	   be publicly known, using the underlying SCTP transport directly may
473	   be preferable because of its byte-oriented credits.

475	5.  RDMA Read

477	   RDMA Reads are a further service provided by RDMAP.  RDMA Reads allow
478	   the Data Sink to fetch exactly the portion of the peer ULP buffer
479	   required on a "just in time" basis.  This can be done without
480	   requiring per-fetch support from the Data Source ULP.

482	   Storage servers may wish to limit the maximum write buffer allocated
483	   to any single session.  The storage server may be a very minimal
484	   layer between the client and the disk storage media, or the server
485	   may merely wish to limit the total resources that would be required
486	   if all clients could push the entire payload they wished written at
487	   their own convenience.

489	   In either case, there is little benefit in transferring data from the
490	   Data Source far in advance of when it will be written to the
491	   persistent storage media.  RDMA Reads allow the Storage Server to
492	   fetch the payload on a "just in time" basis.  In this fashion a
493	   relatively small number of block sized buffers can be used to execute
494	   a single transaction that specified writing a large file, or a
495	   Storage Server with numerous clients can fetch buffers from the
496	   individual clients in the order that is most convenient to the
497	   server.
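   The "just in time" pattern can be sketched in Python as a storage
   server loop that pulls one block at a time.  The rdma_read callable,
   the 4 KB block size and the two-buffer limit are hypothetical; the
   point is that server memory in use is bounded by n_buffers blocks
   regardless of the size of the write.

```python
BLOCK = 4096  # hypothetical block size


def just_in_time_write(rdma_read, total_len: int, disk: bytearray,
                       n_buffers: int = 2) -> int:
    """Hypothetical storage-server loop that pulls a client's write payload.

    rdma_read(offset, length) stands in for an RDMA Read of the client's
    advertised buffer.  At most n_buffers blocks are held in server memory
    at once; the peak number held is returned.
    """
    staged, peak = [], 0
    for off in range(0, total_len, BLOCK):
        staged.append((off, rdma_read(off, min(BLOCK, total_len - off))))
        peak = max(peak, len(staged))
        if len(staged) == n_buffers or off + BLOCK >= total_len:
            for o, blk in staged:  # flush staged blocks to the media
                disk[o:o + len(blk)] = blk
            staged.clear()
    return peak


client_buffer = bytes(range(256)) * 100  # 25,600-byte payload to be written
disk = bytearray(len(client_buffer))
peak = just_in_time_write(lambda o, n: client_buffer[o:o + n],
                          len(client_buffer), disk)
```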

499	   This same capability can be used when the desired portion of the
500	   advertised buffer is not known in advance.  For example, the
501	   advertised buffer could contain performance statistics.  The Data
502	   Sink could request the portions of the data it requires, without
503	   requiring an interaction with the Data Source ULP.

505	   This is applicable for many applications that publish semi-volatile
506	   data that does not require transactional validity checking (i.e.,
507	   authorized users have read access to the entire set of data).  It is
508	   less applicable when there are ULP consistency checks that must be
509	   performed upon the data.  Such applications would be better served by
510	   having the client send a request, and having the server use RDMA
511	   Writes to publish the requested data.  Neither RDMAP nor DDP provides
512	   mechanisms for bundling multiple disjoint updates into an atomic
513	   operation.  Therefore use of an advertised buffer as a data resource
514	   is subject to the same caveats as any randomly updated data resource,
515	   such as a flat file, that does not enforce its own consistency.

517	6.  LLP Comparisons

519	   Normally the choice of underlying IP transport is irrelevant to the
520	   ULP.  RDMAP and DDP provide the same services over either.  The
521	   choice may, however, affect performance.  It is the
522	   responsibility of the ULP to determine which IP transport is best
523	   suited to its needs.

525	   SCTP provides for preservation of message boundaries.  Each DDP
526	   segment will be delivered within a single SCTP packet.  The
527	   equivalent services are only available with TCP through the use of
528	   the MPA adaptation layer.

530	6.1.  Multistreaming Implications

532	   SCTP also provides multi-streaming.  When the same pair of hosts
533	   needs multiple DDP Streams, this can be a major advantage.  A
534	   single SCTP association carries multiple DDP streams, consolidating
535	   connection setup, congestion control and acknowledgements.

537	   Completions are controlled by the DDP Source Sequence Number (DDP-
538	   SSN) on a per stream basis.  Therefore combining multiple DDP Streams
539	   into a single SCTP association cannot result in a dropped packet
540	   carrying data for one stream delaying completions on others.
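   A minimal sketch shows why a loss on one stream does not delay
   completions on another.  It assumes a receiver that completes each
   stream's messages in sequence-number order, independently of every
   other stream.  The class and method names are hypothetical.  The
   sequence numbers start at 0 purely for simplicity; actual numbering
   is defined by the DDP specification.

```python
from collections import defaultdict


class MultiStreamReceiver:
    """Completes messages in sequence-number order, separately per stream."""

    def __init__(self) -> None:
        self.next_ssn = defaultdict(int)   # next number to complete, per stream
        self.pending = defaultdict(dict)   # stream -> {ssn: payload}
        self.completed = []                # (stream, ssn, payload), in order

    def receive(self, stream: int, ssn: int, payload: str) -> None:
        self.pending[stream][ssn] = payload
        # Drain only *this* stream: a gap elsewhere cannot block it.
        while self.next_ssn[stream] in self.pending[stream]:
            n = self.next_ssn[stream]
            self.completed.append((stream, n, self.pending[stream].pop(n)))
            self.next_ssn[stream] += 1


rx = MultiStreamReceiver()
rx.receive(0, 1, "a1")        # stream 0, message 0 was dropped and is late
rx.receive(1, 0, "b0")
rx.receive(1, 1, "b1")        # stream 1 completes despite stream 0's gap
rx.receive(0, 0, "a0")        # retransmission fills the gap on stream 0
```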

542	6.2.  Out of Order Reception Implications

544	   The use of unordered Data Chunks with SCTP guarantees that the DDP
545	   layer will be able to perform placements when IP datagrams are
546	   received out of order.

548	   Placement of out-of-order DDP Segments carried over MPA/TCP is
549	   permitted but not guaranteed.  The ability of the MPA receiver
550	   to process out-of-order DDP Segments may be impaired when alignment
551	   of TCP segments and MPA FPDUs is lost.  Using SCTP, each DDP Segment
552	   is encoded in a single Data Chunk and never spread over multiple IP
553	   datagrams.
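   Each tagged DDP Segment carries the STag and Tagged Offset of its
   destination.  A receiver that can locate the DDP header in an
   arriving datagram can therefore place the payload immediately,
   whatever order the segments arrive in.  The sketch below assumes that
   step is already done.  TaggedSink and place are hypothetical names.
   The MPA-specific work of finding the header (markers, FPDU alignment)
   is not modeled.

```python
class TaggedSink:
    """Destination buffer advertised under one (hypothetical) STag."""

    def __init__(self, stag: int, size: int) -> None:
        self.stag = stag
        self.buffer = bytearray(size)

    def place(self, stag: int, tagged_offset: int, payload: bytes) -> None:
        # Placement needs only the STag and Tagged Offset carried in the
        # segment itself, not the arrival order of the other segments.
        end = tagged_offset + len(payload)
        if stag != self.stag or tagged_offset < 0 or end > len(self.buffer):
            raise PermissionError("segment outside the advertised buffer")
        self.buffer[tagged_offset:end] = payload


sink = TaggedSink(stag=0x42, size=6)
segments = [(0, b"He"), (2, b"ll"), (4, b"o!")]
for offset, data in reversed(segments):    # arrive last-to-first
    sink.place(0x42, offset, data)
```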

555	6.3.  Header and Marker Overhead

557	   MPA and TCP headers together are smaller than the headers used by
558	   SCTP and its adaptation layer.  However, this advantage can be
559	   considerably reduced by the insertion of MPA markers.  In any event,
560	   the difference in ULP payload per IP datagram is not likely to be a
561	   significant factor.
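   The comparison can be made concrete with rough per-datagram
   arithmetic.  This is an estimate under stated assumptions, not a
   normative calculation.  It assumes IPv4 and TCP headers without
   options, a 2-byte MPA length field and 4-byte CRC per FPDU, a 4-byte
   MPA marker every 512 octets of the TCP stream, and a 12-byte SCTP
   common header plus a 16-byte DATA chunk header.  DDP and RDMAP
   headers are omitted because they are the same over either transport.
   The exact number of markers in a segment depends on stream alignment,
   so the marker count is approximated.

```python
IPV4_HDR = 20            # assumed: no IP options
TCP_HDR = 20             # assumed: no TCP options (timestamps would add 12)
MPA_LEN_FIELD = 2        # ULPDU length field at the start of each FPDU
MPA_CRC = 4              # CRC32c trailer of each FPDU
MARKER_LEN = 4           # one marker ...
MARKER_INTERVAL = 512    # ... every 512 octets of the TCP stream
SCTP_COMMON_HDR = 12
SCTP_DATA_CHUNK_HDR = 16


def ddp_bytes_over_sctp(mtu: int) -> int:
    """DDP Segment bytes that fit in one IP datagram over SCTP."""
    return mtu - IPV4_HDR - SCTP_COMMON_HDR - SCTP_DATA_CHUNK_HDR


def ddp_bytes_over_mpa_tcp(mtu: int, markers: bool = True) -> int:
    """Approximate DDP Segment bytes in one full-sized TCP segment via MPA."""
    tcp_payload = mtu - IPV4_HDR - TCP_HDR
    # Approximation: the exact marker count depends on stream alignment.
    marker_bytes = (tcp_payload // MARKER_INTERVAL) * MARKER_LEN if markers else 0
    fpdu = tcp_payload - marker_bytes
    fpdu -= fpdu % 4                  # FPDUs are padded to a 4-byte multiple
    return fpdu - MPA_LEN_FIELD - MPA_CRC


for mtu in (1500, 9000):
    print(mtu, ddp_bytes_over_sctp(mtu), ddp_bytes_over_mpa_tcp(mtu),
          ddp_bytes_over_mpa_tcp(mtu, markers=False))
```

   With these assumptions and a 1500-byte MTU, SCTP carries 1452 DDP
   bytes.  MPA/TCP carries 1454 without markers and about 1446 with
   them.  The smaller MPA/TCP headers win by 2 bytes until markers are
   inserted, and either way the gap is well under one percent.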

563	6.4.  Middlebox Support

565	   Even with the MPA adaptation layer, DDP traffic carried over MPA/TCP
566	   will appear to all network middleboxes as a normal TCP connection.
567	   In many environments there may be a requirement to use only TCP
568	   connections to satisfy existing network elements and/or to facilitate
569	   monitoring and control of connections.  While SCTP is certainly just
570	   as monitorable and controllable as TCP, there is no guarantee that
571	   the network management infrastructure has the required support for
572	   both.

574	6.5.  Processing Overhead

576	   A DDP stream delivered via MPA/TCP will require more processing
577	   effort than one delivered over SCTP.  However, this extra work may be
578	   justified for many deployments where full SCTP support is unavailable
579	   in the endpoints of the network, or where middleboxes impair the
580	   usability of SCTP.

582	6.6.  Data Integrity Implications

584	   Both the SCTP and MPA/TCP adaptations provide end-to-end CRC32c
585	   protection against data corruption, or its equivalent.
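   For reference, CRC32c is the CRC-32 variant that uses the Castagnoli
   polynomial.  The following is a minimal bitwise sketch.  It uses no
   lookup table and is therefore slow, and adapters would normally
   compute the CRC in hardware.  It does not show where MPA or SCTP
   place the value on the wire, or in which byte order.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78   # Castagnoli polynomial, bit-reversed


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC32c; pass a previous result as `crc` to continue it."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


assert crc32c(b"123456789") == 0xE3069283   # standard CRC32c check value
```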

587	   A ULP that requires a greater degree of protection may add its own.
588	   However, DDP and RDMAP headers will only be guaranteed to have the
589	   equivalent of end-to-end CRC32c protection.  A ULP that requires data
590	   integrity checking more thorough than an end-to-end CRC32c should
591	   first invalidate all STags that reference a buffer before applying
592	   its own integrity check.
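   The ordering matters because a peer that still holds a valid STag
   could change the buffer after the ULP has checked it.  The sketch
   below is hypothetical: TaggedBuffer, remote_write and
   verify_ulp_digest are illustrative names.  SHA-256 from the Python
   standard library merely stands in for whatever stronger check a ULP
   might choose.

```python
import hashlib


class TaggedBuffer:
    """Local buffer exposed to the peer under a (hypothetical) STag."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.stag_valid = True

    def remote_write(self, offset: int, payload: bytes) -> None:
        # Models an incoming RDMA Write; rejected once the STag is invalid.
        if not self.stag_valid:
            raise PermissionError("STag has been invalidated")
        self.data[offset:offset + len(payload)] = payload

    def invalidate_stag(self) -> None:
        self.stag_valid = False


def verify_ulp_digest(buf: TaggedBuffer, expected_hex: str) -> bool:
    # Invalidate first, so no remote write can alter the bytes between
    # computing the digest and the ULP acting on the verified data.
    buf.invalidate_stag()
    return hashlib.sha256(bytes(buf.data)).hexdigest() == expected_hex
```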

594	6.6.1.  MPA/TCP Specifics

596	   It is mandatory for MPA/TCP implementations to implement CRC32c, but
597	   it is NOT mandatory to use the CRC32c during an RDMA connection.  The
598	   enabling or disabling of the CRC in MPA/TCP is an administrative
599	   configuration operation at the local and remote ends.  The
600	   administration of the CRC (ON/OFF) is invisible to the ULP.

602	   Applications SHOULD trust that this administrative option will only
603	   be used when the end-to-end protection is at least as effective as a
604	   transport layer CRC32c.  Applications SHOULD NOT apply additional
605	   protection as a guard against this administrative option being turned
606	   on inadvertently.

608	   Administrators MUST NOT enable CRC32c suppression unless the end-to-
609	   end protection is truly equivalent.

611	   If the CRC is active at either end or in either direction, then its
612	   use is mandatory in both directions.

614	   If both ends have been configured NOT to use the CRC, this is allowed
615	   as long as equivalent protection (comparable to or better than the
616	   CRC) against undetected errors on the connection is provided.
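   The rules above reduce to a small decision function.  The sketch
   assumes hypothetical per-end configuration flags and an
   administrator-supplied judgement of whether equivalent protection
   exists.  How MPA actually signals the choice between peers is
   defined in the MPA specification and is not modeled here.

```python
def mpa_crc_in_use(local_wants_crc: bool, remote_wants_crc: bool,
                   equivalent_protection: bool) -> bool:
    """Return whether the CRC is used, in both directions, on a connection."""
    if local_wants_crc or remote_wants_crc:
        # CRC active at either end: mandatory in both directions.
        return True
    if not equivalent_protection:
        # Both ends configured off, but nothing equivalent protects the data.
        raise ValueError("CRC suppression requires equivalent protection")
    return False
```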

618	6.6.2.  SCTP Specifics

620	   SCTP provides CRC32c protection automatically.  The adaptation to
621	   SCTP provides no option to suppress SCTP CRC32c protection.

623	6.7.  Non-IP Transports

625	   DDP is defined to operate over ubiquitous IP transports such as SCTP
626	   and TCP.  This enables a new DDP-enabled node to be added anywhere to
627	   an IP network.  No DDP-specific support from middleboxes is
628	   required.

630	   There are non-IP transport fabrics offering RDMA capabilities.
631	   Because these capabilities are integrated with the transport protocol
632	   they have some technical advantages when compared to RDMA over IP.
633	   For example, fencing of RDMA operations can be based upon transport
634	   level acks.  Because DDP is cleanly layered over an IP transport, any
635	   explicit RDMA layer ack must be separate from the transport layer
636	   ack.

638	   There may be deployments where the benefits of RDMA/transport
639	   integration outweigh the benefits of being on an IP network.

641	6.7.1.  No RDMA Layer Ack

643	   DDP does not provide for its own acknowledgements.  The only form of
644	   ack provided at the RDMAP layer is an RDMA Read Response.  DDP and
645	   RDMAP rely almost entirely upon other layers for flow control and
646	   pacing.  The LLP is relied upon to guarantee delivery and avoid
647	   network congestion, and ULP level acking is relied upon for ULP
648	   pacing and to avoid ULP buffer overruns.

650	   Previous RDMA protocols, such as InfiniBand, have been able to use
651	   their integration with the transport layer to provide stronger
652	   ordering guarantees.  It is important that application designers who
653	   require such guarantees provide them through ULP interaction.

655	   Specifically:

657	      There is no ability for a local interface to "fence" outbound
658	      messages to guarantee that prior tagged messages have been placed
659	      prior to sending a tagged message.  The only guarantees available
660	      from the other side would be an RDMA Read Response (coming from
661	      the RDMAP layer) or a response from the ULP layer.  Remember that
662	      the normal ordering rules only guarantee when the Data Sink ULP
663	      will be notified of untagged messages; they do not control when
664	      data is placed into receive buffers.

666	      Re-use of tagged buffers must be done with extreme care.  The fact
667	      that an untagged message indicates that all prior tagged messages
668	      have been placed does not guarantee that no later tagged messages
669	      have been placed.  The best strategy is to only change the state
670	      of any given advertised buffer with untagged messages.

672	      As covered elsewhere in this document, flow control of untagged
673	      messages MUST be provided by the ULP itself.
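   One common way for a ULP to provide that flow control is a credit
   scheme.  This is a swapped-in illustration, not a mechanism defined
   by RDMAP or DDP.  In the sketch, each credit stands for one receive
   buffer the peer is known to have posted, and credits come back inside
   the peer's own untagged ULP messages.  UntaggedSender,
   on_peer_message and the credit counts are hypothetical.

```python
from collections import deque


class UntaggedSender:
    """Sends untagged messages only while it holds receive-buffer credits."""

    def __init__(self, initial_credits: int) -> None:
        self.credits = initial_credits   # peer receive buffers known posted
        self.backlog = deque()
        self.sent = []

    def send(self, msg: str) -> None:
        self.backlog.append(msg)
        self._drain()

    def on_peer_message(self, credits_returned: int) -> None:
        # Credits ride in the peer's untagged ULP messages; DDP and RDMAP
        # carry no credit field of their own.
        self.credits += credits_returned
        self._drain()

    def _drain(self) -> None:
        while self.credits > 0 and self.backlog:
            self.sent.append(self.backlog.popleft())
            self.credits -= 1
```

   Because credits only change when an untagged message arrives, this
   also follows the advice above to change buffer state only through
   untagged messages.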

675	6.8.  Other IP Transports

677	   Both TCP and SCTP provide DDP with reliable transport and
678	   TCP-friendly rate control.  As currently defined, DDP works over
679	   reliable transports and implicitly relies upon some form of rate
680	   control.

682	   DDP is fully compatible with a non-reliable protocol.  Out-of-order
683	   placement is obviously not dependent on whether the other DDP
684	   Segments ever actually arrive.

686	   However, RDMAP requires the LLP to provide reliable service.  An
687	   alternate completion handling protocol would be required if DDP were
688	   to be deployed over an unreliable IP transport.

690	   As noted in the prior section on tagged buffers as ULP credits,
691	   neither RDMAP nor DDP provides any flow control for tagged messages.
692	   If no transport layer flow control is provided, an RDMAP/DDP
693	   application would be limited only by the link layer rate, almost
694	   inevitably resulting in severe network congestion.

696	   RDMAP encourages applications to be ignorant of the underlying
697	   transport PMTU.  The ULP is only notified when all messages ending in
698	   a single untagged message have completed.  The ULP is not aware of
699	   the granularity or ordering of the underlying message.  This approach
700	   assumes that the ULP is only interested in the complete set of
701	   messages, and has no use for a subset of them.

703	6.9.  LLP Independent Session Establishment

705	   For an RDMAP/DDP application, the transport services provided by a
706	   pair of SCTP Streams and by a TCP connection both provide the same
707	   service (reliable delivery of DDP Segments between two connected
708	   RDMAP/DDP endpoints).

710	6.9.1.  RDMA-only Session Establishment

712	   It is also possible to allow for transport neutral establishment of
713	   RDMAP/DDP sessions between endpoints.  Combined, these two features
714	   would allow most applications to be unconcerned as to which LLP was
715	   actually in use.

717	   Specifically, the procedures for DDP Stream Session establishment
718	   discussed in section 3 of the SCTP mapping, and section 13.3 of the
719	   MPA/TCP mapping, both allow for the exchange of ULP specific data
720	   ("Private Data") before enabling the exchange of DDP Segments.  This
721	   delay allows for proper selection and/or configuration of the
722	   endpoints based upon the exchanged data.  For example, each DDP
723	   Stream Session associated with a single client session might be
724	   assigned to the same DDP Protection Domain.

726	   To be transport neutral, the applications should exchange Private
727	   Data as part of session establishment messages to determine how the
728	   RDMA endpoints are to be configured.  One side must be the Initiator,
729	   and the other the Responder.

731	   With SCTP, a pair of SCTP streams can be used for sequential
732	   sessions.  With MPA/TCP each connection can be used for at most one
733	   session.  However, the same source/destination pair of ports can be
734	   re-used sequentially subject to normal TCP rules.

736	   Both SCTP and MPA limit the private data size to a maximum of 512
737	   bytes.

739	   MPA/TCP requires the end of the TCP connection that initiated the
740	   conversion to MPA mode to send the first DDP Segment.  SCTP does not
741	   have this requirement.  ULPs that wish to be transport neutral
742	   should require the initiating end to send the first message.  A
743	   zero-length RDMA Write can be used for this purpose if the ULP logic
744	   itself does not naturally support this restriction.
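   A transport-neutral ULP might wrap these two constraints in helpers
   like the following.  The sketch is hypothetical throughout.  The
   Private Data layout (2-byte version, 4-byte session identifier,
   opaque tail) and the function names are illustrative.  The string
   results stand in for the real RDMAP operations.

```python
import struct
from typing import List, Optional

MAX_PRIVATE_DATA = 512   # limit shared by the SCTP and MPA/TCP mappings


def encode_private_data(ulp_version: int, session_id: int,
                        extra: bytes = b"") -> bytes:
    """Pack hypothetical ULP parameters, refusing anything over the limit."""
    blob = struct.pack("!HI", ulp_version, session_id) + extra   # 6-byte head
    if len(blob) > MAX_PRIVATE_DATA:
        raise ValueError(f"private data is {len(blob)} bytes, "
                         f"limit is {MAX_PRIVATE_DATA}")
    return blob


def first_messages(role: str, ulp_first_message: Optional[str]) -> List[str]:
    """Messages this end sends once DDP Segments may flow."""
    if role != "initiator":
        return []            # the Responder waits for the Initiator
    if ulp_first_message is not None:
        return [ulp_first_message]
    # The ULP has nothing natural to send first: satisfy the MPA rule anyway.
    return ["zero-length RDMA Write"]
```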

746	6.9.2.  RDMA-Conditional Session Establishment

748	   It is sometimes desirable for the active side of a session to connect
749	   with the passive side before knowing whether the passive side
750	   supports RDMA.

752	   This style of session establishment can be supported with either TCP
753	   or SCTP, but not as transparently as for RDMA-only sessions.  Pre-
754	   existing non-RDMA servers are also far more likely to be using TCP
755	   than SCTP.

757	   With TCP, a normal TCP connection is established.  It is then used by
758	   the ULP to determine whether or not to convert to MPA mode and use
759	   RDMA.  This will typically be integral with other session
760	   establishment negotiations.

762	   With SCTP, the establishment of an association tests whether RDMA is
763	   supported.  If not supported, the application simply requests the
764	   association without the RDMA adaptation indication.

766	   The key difference is that with SCTP the determination as to whether
767	   the peer can support RDMA is made before the transport layer
768	   association/connection is established while with TCP the established
769	   connection itself is used to determine whether RDMA is supported.

771	7.  Local Interface Implications

773	   Full utilization of DDP and RDMAP capabilities requires a local
774	   interface that explicitly requests these services.  Protocols such as
775	   Sockets Direct Protocol (SDP) can allow applications to keep their
776	   traditional byte-stream or message-stream interface and still enjoy
777	   many of the benefits of the optimized wire level protocols.

779	8.  IANA Considerations

781	   There are no IANA considerations in this document.

783	9.  Security Considerations

785	9.1.  Connection/Association Setup

787	   Both the SCTP and TCP adaptations allow for existing procedures to be
788	   followed for the establishment of the SCTP association or TCP
789	   connection.  Use of DDP does not impair the use of any security
790	   measures to filter, validate and/or log the remote end of an
791	   association/connection.

793	9.2.  Tagged Buffer Exposure

795	   DDP only exposes ULP memory to the extent explicitly allowed by ULP
796	   actions.  These include posting of receive operations and enabling of
797	   Steering Tags.

799	   Neither RDMAP nor DDP places requirements on how ULPs advertise
800	   buffers.  A ULP may use a single Steering Tag for multiple buffer
801	   advertisements.  However, the ULP should be aware that enforcement of
802	   STag usage is likely limited to the overall range that is enabled.
803	   If the remote peer writes into the 'wrong' advertised buffer, neither
804	   the DDP nor the RDMAP layer will be aware of this.  Nor is there any
805	   report to the ULP on how the remote peer specifically used tagged
806	   buffers.

808	   Unless the ULP peers have an adequate basis for mutual trust, the
809	   receiving ULP might be well advised to use a distinct STag for each
810	   interaction, and to invalidate it after each use or to require its
811	   peer to use the RDMAP option to invalidate the STag with its
812	   responding untagged message.
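   The per-interaction STag advice can be sketched as follows.
   StagTable and its methods are hypothetical.  The STag layout (a
   never-reused index in the upper bits plus a random 8-bit key) is an
   assumption made for illustration.  Local invalidation on the
   responding untagged message stands in for either of the two options
   described above.

```python
import itertools
import secrets


class StagTable:
    """Issues a fresh STag per interaction and invalidates it after use."""

    def __init__(self) -> None:
        self._next_index = itertools.count(1)
        self._buffers = {}          # valid STag -> exposed buffer

    def advertise(self, length: int) -> int:
        # Assumed layout: unique, never-reused index plus a random 8-bit key.
        stag = (next(self._next_index) << 8) | secrets.randbelow(256)
        self._buffers[stag] = bytearray(length)
        return stag

    def remote_write(self, stag: int, offset: int, payload: bytes) -> None:
        # Models an incoming RDMA Write checked against this STag's range.
        buf = self._buffers.get(stag)
        end = offset + len(payload)
        if buf is None or offset < 0 or end > len(buf):
            raise PermissionError("invalid STag or access outside its range")
        buf[offset:end] = payload

    def on_untagged_response(self, stag: int) -> bytes:
        # The peer's untagged reply ends the interaction: the STag is
        # removed, so any later write using it is rejected.
        return bytes(self._buffers.pop(stag))
```

   Because each buffer has its own STag, a peer writing into the
   "wrong" buffer is caught by the range check.  With one shared STag,
   it would only be caught at the edges of the whole enabled range.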

814	9.3.  Impact of Encrypted Transports

816	   While DDP is cleanly layered over the LLP, its maximum benefit may be
817	   limited when the LLP Stream is secured with a streaming cipher, such
818	   as Transport Layer Security (TLS).  If the LLP must decrypt in order,
819	   it cannot provide out-of-order DDP Segments to the DDP layer for
820	   placement purposes.  IPsec tunnel mode encrypts entire IP Datagrams.
821	   IPsec transport mode encrypts TCP Segments or SCTP packets.  In
822	   neither case should IPsec preclude providing out-of-order DDP
823	   Segments to the DDP layer for placement.

825	   Note that end-to-end use of IPsec cryptographic integrity protection
826	   may allow suppression of MPA CRC generation and checking under
827	   certain circumstances.  This is one example where the LLP may be
828	   judged to have "or equivalent" protection to an end-to-end CRC32c.

830	10.  Normative references

832	   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
833	        Levels", BCP 14, RFC 2119, March 1997.

835	   [2]  Dierks, T. and C. Allen, "The TLS Protocol Version 1.0",
836	        RFC 2246, January 1999.

838	   [3]  Kent, S. and R. Atkinson, "IP Encapsulating Security Payload
839	        (ESP)", RFC 2406, November 1998.

841	   [4]  Stewart, R., Xie, Q., Morneault, K., Sharp, C., Schwarzbauer,
842	        H., Taylor, T., Rytina, I., Kalla, M., Zhang, L., and V. Paxson,
843	        "Stream Control Transmission Protocol", RFC 2960, October 2000.

845	   [5]  Coene, L., "Stream Control Transmission Protocol Applicability
846	        Statement", RFC 3257, April 2002.

848	   [6]  Recio, R., "An RDMA Protocol Specification",
849	        draft-ietf-rddp-rdmap-05 (work in progress), July 2005.

851	   [7]  Shah, H., "Direct Data Placement over Reliable Transports",
852	        draft-ietf-rddp-ddp-05 (work in progress), July 2005.

854	   [8]  Stewart, R., "Stream Control Transmission Protocol (SCTP) Remote
855	        Direct Memory Access (RDMA) Direct Data Placement (DDP)
856	        Adaptation", draft-ietf-rddp-sctp-02 (work in progress),
857	        August 2005.

859	   [9]  Culley, P., "Marker PDU Aligned Framing for TCP Specification",
860	        draft-ietf-rddp-mpa-02 (work in progress), February 2005.

862	Authors' Addresses

864	   Caitlin Bestler
865	   Broadcom
866	   49 Discovery
867	   Irvine, CA  92618
868	   USA

870	   Phone: 949-926-6383
871	   Email: caitlinb@broadcom.com

873	   Lode Coene
874	   Siemens
875	   Atealaan 26
876	   Herentals,   2200
877	   Belgium

879	   Phone: +32-14-252081
880	   Email: lode.coene@siemens.com

882	Intellectual Property Statement

884	   The IETF takes no position regarding the validity or scope of any
885	   Intellectual Property Rights or other rights that might be claimed to
886	   pertain to the implementation or use of the technology described in
887	   this document or the extent to which any license under such rights
888	   might or might not be available; nor does it represent that it has
889	   made any independent effort to identify any such rights.  Information
890	   on the procedures with respect to rights in RFC documents can be
891	   found in BCP 78 and BCP 79.

893	   Copies of IPR disclosures made to the IETF Secretariat and any
894	   assurances of licenses to be made available, or the result of an
895	   attempt made to obtain a general license or permission for the use of
896	   such proprietary rights by implementers or users of this
897	   specification can be obtained from the IETF on-line IPR repository at
898	   http://www.ietf.org/ipr.

900	   The IETF invites any interested party to bring to its attention any
901	   copyrights, patents or patent applications, or other proprietary
902	   rights that may cover technology that may be required to implement
903	   this standard.  Please address the information to the IETF at
904	   ietf-ipr@ietf.org.

906	Disclaimer of Validity

908	   This document and the information contained herein are provided on an
909	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
910	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
911	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
912	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
913	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
914	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

916	Copyright Statement

918	   Copyright (C) The Internet Society (2005).  This document is subject
919	   to the rights, licenses and restrictions contained in BCP 78, and
920	   except as set forth therein, the authors retain all their rights.

922	Acknowledgment

924	   Funding for the RFC Editor function is currently provided by the
925	   Internet Society.