idnits 2.17.1 

draft-ietf-rddp-applicability-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 915.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 892.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 899.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 905.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 26, 2005) is 6780 days in the past.  Is
     this intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '1' is defined on line 827, but no explicit reference
     was found in the text

  == Unused Reference: '2' is defined on line 830, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 833, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 836, but no explicit reference
     was found in the text

  == Unused Reference: '5' is defined on line 841, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 844, but no explicit reference
     was found in the text

  == Unused Reference: '7' is defined on line 847, but no explicit reference
     was found in the text

  == Unused Reference: '8' is defined on line 850, but no explicit reference
     was found in the text

  == Unused Reference: '9' is defined on line 855, but no explicit reference
     was found in the text

  == Unused Reference: '10' is defined on line 858, but no explicit reference
     was found in the text

  == Unused Reference: '11' is defined on line 860, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 2246 (ref. '2') (Obsoleted by RFC 4346)

  ** Obsolete normative reference: RFC 2406 (ref. '3') (Obsoleted by RFC
     4303, RFC 4305)

  ** Obsolete normative reference: RFC 2960 (ref. '4') (Obsoleted by RFC 4960)

  ** Downref: Normative reference to an Informational RFC: RFC 3257 (ref. '5')

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-rdmap-05

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-ddp-05

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-sctp-02

  == Outdated reference: A later version (-08) exists of
     draft-ietf-rddp-mpa-02

  -- Possible downref: Non-RFC (?) normative reference: ref. '10'

  -- Possible downref: Non-RFC (?) normative reference: ref. '11'


     Summary: 9 errors (**), 0 flaws (~~), 18 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Remote Direct Data Placement                                  C. Bestler
3	Working group                                                   Broadcom
4	Internet-Draft                                                  L. Coene
5	Expires: March 30, 2006                                          Siemens
6	                                                      September 26, 2005

8	Applicability of Remote Direct Memory Access Protocol (RDMA) and Direct
9	                          Data Placement (DDP)
10	                  draft-ietf-rddp-applicability-03.txt

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt.

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html.

35	   This Internet-Draft will expire on March 30, 2006.

37	Copyright Notice

39	   Copyright (C) The Internet Society (2005).

41	Abstract

43	   This document describes the applicability of Remote Direct Memory
44	   Access Protocol (RDMAP) and the Direct Data Placement Protocol (DDP).
45	   It compares and contrasts the different transport options over IP
46	   that DDP can use, provides guidance to ULP developers on choosing
47	   between available transports and/or how to be indifferent to the
48	   specific transport layer used, compares use of DDP with direct use of
49	   the supporting transports, and compares DDP over IP transports with
50	   non-IP transports that support RDMA functionality.

52	Table of Contents

54	   1.  Introduction
55	   2.  Definitions
56	   3.  Direct Placement
57	     3.1.  Fewer Required ULP Interactions
58	     3.2.  Direct Placement using only the LLP
59	   4.  Tagged Messages
60	     4.1.  Order Independent Reception
61	     4.2.  Reduced ULP Notifications
62	     4.3.  Simplified ULP Exchanges
63	     4.4.  Order Independent Sending
64	     4.5.  Tagged Buffers as ULP Credits
65	   5.  RDMA Read
66	   6.  LLP Comparisons
67	     6.1.  Multistreaming Implications
68	     6.2.  Out of Order Reception Implications
69	     6.3.  Header and Marker Overhead
70	     6.4.  Middlebox Support
71	     6.5.  Processing Overhead
72	     6.6.  Data Integrity Implications
73	       6.6.1.  MPA/TCP Specifics
74	       6.6.2.  SCTP Specifics
75	     6.7.  Non-IP Transports
76	       6.7.1.  No RDMA Layer Ack
77	     6.8.  Other IP Transports
78	     6.9.  LLP Independent Session Establishment
79	       6.9.1.  RDMA-only Session Establishment
80	       6.9.2.  RDMA-Conditional Session Establishment
81	   7.  Local Interface Implications
82	   8.  Security considerations
83	     8.1.  Connection/Association Setup
84	     8.2.  Tagged Buffer Exposure
85	     8.3.  Impact of Encrypted Transports
86	   9.  References
87	   Authors' Addresses
88	   Intellectual Property and Copyright Statements

90	1.  Introduction

92	   Remote Direct Memory Access Protocol (RDMAP) and Direct Data
93	   Placement (DDP) work together to provide application independent
94	   efficient placement of application payload directly into buffers
95	   specified by the Upper Layer Protocol (ULP).

97	   The DDP protocol is responsible for direct placement of received
98	   payload into ULP specified buffers.  The RDMAP protocol provides
99	   completion notifications to the ULP and support for Data Sink
100	   initiated fetch of advertised buffers (RDMA Reads).

102	   DDP and RDMAP are both application independent protocols which allow
103	   the ULP to perform remote direct data placement.  DDP can use
104	   multiple standard IP transports including SCTP and TCP.

106	   By clarifying the situations where the functionality of these
107	   protocols is applicable, this document can guide implementers,
108	   application and protocol designers in selecting which protocols to
109	   use.

111	   The applicability of RDMAP/DDP is driven by their unique
112	   capabilities:

114	   o  The existence of an application independent protocol allows common
115	      solutions to be implemented in hardware and/or the kernel.  This
116	      document will discuss when common data placement procedures are of
117	      the greatest benefit to applications as contrasted with
118	      application specific solutions built on top of direct use of the
119	      underlying transport.

121	   o  DDP supports both untagged and tagged buffers.  Tagged buffers
122	      allow the Data Sink ULP to be indifferent to what order (or in
123	      what packets) the Data Source sent the data, or what order they
124	      are received in.  This document will discuss when Data Source
125	      flexibility is of benefit to applications.

127	   o  RDMAP consolidates ULP notifications, thereby minimizing the
128	      number of required ULP interactions.

130	   o  RDMAP defines RDMA Reads, which allow remote access to advertised
131	      buffers.  This document will review the advantages of using RDMA
132	      Reads as contrasted to alternate solutions.

134	   Some non-IP transports, such as InfiniBand, directly integrate RDMA
135	   features.  This document will review the applicability of providing
136	   RDMA services over ubiquitous IP transports as opposed to the use of
137	   customized transport protocols.  Because DDP is defined cleanly as a
138	   layer over existing IP transports, DDP has simpler
139	   ordering rules than some prior RDMA protocols.  This may have some
140	   implications for application designers.

142	   The capabilities of DDP and RDMAP can only be fully realized by
143	   applications that are designed to exploit them.  The co-existence of
144	   RDMAP/DDP aware local interfaces with traditional socket interfaces
145	   will also be explored.

147	   Finally, DDP support is defined for at least two IP transports: SCTP
148	   and TCP.  The rationale for supporting both transports is reviewed,
149	   as well as when each would be the appropriate selection.

151	2.  Definitions

153	   Advertisement - The act of informing a Remote Peer that a local RDMA
154	      Buffer is available to it.  A Node makes available an RDMA Buffer
155	      for incoming RDMA Read or RDMA Write access by informing its RDMA/
156	      DDP peer of the Tagged Buffer identifiers (STag, base address, and
157	      buffer length).  This advertisement of Tagged Buffer information
158	      is not defined by RDMA/DDP and is left to the ULP.  A typical
159	      method would be for the Local Peer to embed the Tagged Buffer's
160	      Steering Tag, base address, and length in a Send Message destined
161	      for the Remote Peer.

163	   Data Sink - The peer receiving a data payload.  Note that the Data
164	      Sink can be required to both send and receive RDMA/DDP Messages to
165	      transfer a data payload.

167	   Data Source - The peer sending a data payload.  Note that the Data
168	      Source can be required to both send and receive RDMA/DDP Messages
169	      to transfer a data payload.

171	   Lower Layer Protocol (LLP) - The transport protocol that provides
172	      services to DDP.  This is an IP transport with any required
173	      adaptation layer.  Adaptation layers are defined for SCTP and TCP.

175	   Steering Tag (STag) - An identifier of a Tagged Buffer on a Node,
176	      valid as defined within a protocol specification.

178	   Tagged Message - A DDP message that is directed to a ULP specified
179	      buffer based upon embedded addressing information.  In the
180	      immediate sense, the destination buffer is specified by the
181	      message sender.

183	   Untagged Message - A DDP message that is directed to a ULP specified
184	      buffer based upon a Message Sequence Number being matched with a
185	      receiver supplied buffer.  The destination buffer is specified by
186	      the message receiver.

188	   Upper Layer Protocol (ULP) - The direct user of RDMAP/DDP services.
189	      This may be an application, or a middleware layer such as Sockets
190	      Direct Protocol (SDP) or Remote Procedure Calls (RPC).

192	3.  Direct Placement

194	   Direct Data Placement optimizes the placement of ULP payload into the
195	   correct destination buffers, typically eliminating intermediate
196	   copying.  Placement is enabled without regard to order of arrival,
197	   order of transmission or requiring per-placement interaction with the
198	   ULP.

200	   RDMAP minimizes the required ULP interactions.  This capability is
201	   most valuable for applications that require multiple transport layer
202	   packets for each required ULP interaction.

204	3.1.  Fewer Required ULP Interactions

206	   While reducing the number of required ULP interactions is in itself
207	   desirable, it is critical for high speed connections.  The burst
208	   packet rate for a high speed interface could easily exceed the host
209	   system's ability to switch ULP contexts.

211	   Content access applications are primary examples of applications with
212	   both high bandwidth and high content to required ULP interaction
213	   ratios.  These applications include file access protocols (NAS),
214	   storage access (SAN), database access and other application specific
215	   forms of content access such as HTTP, XML and email.

217	3.2.  Direct Placement using only the LLP

219	   Direct data placement can be achieved without RDMA.  Pre-posting of
220	   receive buffers could allow a non-RDMA network stack to place data
221	   directly to user buffers.

223	   The degree to which DDP optimizes depends on which transport is being
224	   compared with, and on the nature of the local interface.  Without
225	   RDMAP/DDP, pre-posting buffers requires the receiving side to
226	   accurately predict the required buffers and their sizes.  This is not
227	   feasible for all ULPs.  By contrast, DDP only requires the ULP to
228	   predict the sequence and size of incoming untagged messages.

230	   An application that could predict incoming messages and required
231	   nothing more than direct placement into buffers might be able to do
232	   so with a properly designed local interface to SCTP or TCP.  Doing so
233	   for TCP requires making predictions at a byte level rather than a
234	   message level.

236	   The main benefit of DDP for such an application would be that pre-
237	   posting of receive buffers is a mandated local interface capability,
238	   and that predictions can be made on a per-message basis (not per
239	   byte).

241	   The LLP can also be used directly if ULP specific knowledge is built
242	   into the protocol stack to allow "parse and place" handling of
243	   received packets.  Such a solution either requires interaction with
244	   the ULP, or that the protocol stack have knowledge of ULP specific
245	   syntax rules.

247	   DDP achieves the benefits of directly placing incoming payload
248	   without requiring tight coupling between the ULP and the protocol
249	   stack.  However, "parse and place" capabilities can certainly provide
250	   equivalent services to a limited number of ULPs.

252	4.  Tagged Messages

254	   This section covers the major benefits from the use of Tagged
255	   Messages.

257	   A more critical advantage of DDP is the ability of the Data Source to
258	   use tagged buffers.  Tagging messages allows the Data Source to
259	   choose the ordering and packetization of its payload deliveries.
260	   With direct data placement based solely upon pre-posted receives, the
261	   packetization and delivery of payload must be agreed by the ULP peers
262	   in advance.  Even if there is an encoding of what is being
263	   transferred, as is common with middleware solutions, this information
264	   is not understood at the application independent layers.  The
265	   directions on where to place the incoming data cannot be accessed
266	   without switching to the ULP first.  DDP provides a standardized
267	   'packing list' which can be interpreted without requiring ULP
268	   interaction.  Indeed, it is designed to be implementable in hardware.

270	4.1.  Order Independent Reception

272	   Tagged messages are directed to a buffer based on an included
273	   Steering Tag.  Additionally, no notice is provided to the ULP for each
274	   individual Tagged Message's arrival.  Together these allow tagged
275	   messages received out-of-order to be processed without intermediate
276	   buffering or additional notifications to the ULP.
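
	   The placement rule described above can be illustrated with a small
	   model.  This is only a sketch under stated assumptions: the class
	   and field names (TaggedSegment, DataSink, stag, offset) are
	   hypothetical and are not taken from the DDP specification, and a
	   Python dictionary stands in for the set of registered Tagged
	   Buffers.  It shows that segments placed by Steering Tag and offset
	   yield the same buffer contents in any arrival order, with no
	   per-segment ULP notification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedSegment:
    stag: int       # Steering Tag naming the destination buffer (hypothetical field)
    offset: int     # byte offset within that buffer
    payload: bytes


class DataSink:
    """Toy Data Sink that places tagged segments without involving the ULP."""

    def __init__(self):
        self.buffers = {}            # stag -> bytearray (registered Tagged Buffers)
        self.ulp_notifications = 0   # never incremented by tagged placement

    def register(self, stag, length):
        self.buffers[stag] = bytearray(length)

    def place(self, seg):
        buf = self.buffers[seg.stag]
        end = seg.offset + len(seg.payload)
        if seg.offset < 0 or end > len(buf):
            raise ValueError("segment falls outside the registered buffer")
        # Placement depends only on (stag, offset), not on arrival order.
        buf[seg.offset:end] = seg.payload


def receive(segments, stag=7, length=12):
    sink = DataSink()
    sink.register(stag, length)
    for seg in segments:
        sink.place(seg)
    return bytes(sink.buffers[stag]), sink.ulp_notifications


segments = [
    TaggedSegment(7, 0, b"abcd"),
    TaggedSegment(7, 4, b"efgh"),
    TaggedSegment(7, 8, b"ijkl"),
]
in_order = receive(segments)
reordered = receive([segments[2], segments[0], segments[1]])
```

	   Both calls leave identical contents in the buffer, which is why no
	   intermediate reordering buffer is needed in this model.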

278	4.2.  Reduced ULP Notifications

280	   RDMAP further reduces required ULP interactions by consolidating
281	   completion notifications of tagged messages with the completion
282	   notification of a trailing untagged message.  For most ULPs this
283	   radically reduces the number of required ULP interactions even
284	   further.

286	   While RDMAP consolidation of notices is beneficial to most
287	   applications, it may be detrimental to some applications that benefit
288	   from streamed delivery to enable ULP processing of received data as
289	   promptly as possible.  A ULP that uses RDMAP cannot begin processing
290	   any portion of an exchange until it receives notification that the
291	   entire exchange has been placed.  An "exchange" here is a set of zero
292	   or more tagged messages and a single terminating untagged message.
293	   An application that would prefer to begin work on the received
294	   payload, no matter what order it arrived in, as soon as possible
295	   might prefer to work directly with the LLP.  RDMAP is optimized for
296	   applications that are more concerned with when the entire exchange is
297	   complete.

299	   An application that benefits from being able to begin processing of
300	   each received packet as quickly as possible may find RDMAP interferes
301	   with that goal.

303	   Such an application might be able to retain most of the benefits of
304	   RDMAP by using the DDP layer directly.  However, in addition to
305	   taking on the responsibilities of the RDMAP layer, the application
306	   would likely have more difficulty finding support for a DDP-only API.
307	   Many hardware implementations may choose to tightly couple RDMAP and
308	   DDP, and might not provide an API directly to DDP services.

310	   These features minimize the required interactions with the ULP.  This
311	   can be extremely beneficial for applications that use multiple
312	   transport layer packets to accomplish what is a single ULP
313	   interaction.

315	4.3.  Simplified ULP Exchanges

317	   The notification rules for Tagged Messages allow ULPs to create
318	   multi-message "exchanges" consisting of zero or more tagged messages
319	   that represent a single step in the ULP interaction.  The receiving
320	   ULP is notified that the untagged message has arrived, and implicitly
321	   of any associated tagged messages.

323	   A ULP where all exchanges would naturally be only the untagged
324	   message would derive virtually no benefit from the use of RDMAP/DDP
325	   as opposed to SCTP.  But while tagged buffers are the justification
326	   for RDMAP/DDP, untagged buffers are still necessary.  Without
327	   untagged buffers the only method to exchange buffer advertisements
328	   would involve out-of-band communications and/or sharing of compile
329	   time constants.  Most RDMA-aware ULPs use untagged buffers for
330	   requests and responses.  Buffer advertisements are typically done
331	   within these untagged messages.

333	   Limiting use of untagged buffers to requests and responses by moving
334	   all bulk data using tagged transfers can greatly simplify the amount
335	   of prediction that the Data Sink must perform in pre-posting receive
336	   buffers.  For example, a typical RDMA enabled interaction would
337	   consist of the following:

339	      Client sends transaction request to the server as an untagged
340	      message.

342	      This message includes buffer advertisements for the buffers where
343	      the results are to be placed.

345	      The Server sends multiple tagged messages to the advertised
346	      buffers.

348	      The Server sends transaction reply as an untagged message to the
349	      client.

351	      Client receives single notification, indicating completion of the
352	      interaction.

354	   With this type of exchange the pacing and required size of untagged
355	   buffers is highly predictable.  The variability of response sizes is
356	   absorbed by tagged transfers.
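
	   The exchange outlined above might be modeled as follows.  This is a
	   minimal sketch, not an RDMAP implementation: the Client class, the
	   serve function, the dictionary-shaped messages, and the upper-casing
	   "transaction" are all assumptions made for illustration.  Direct
	   method calls stand in for the transport.

```python
class Client:
    """Toy client ULP: advertises a result buffer, gets one completion."""

    def __init__(self):
        self.tagged = {}          # stag -> bytearray (advertised result buffers)
        self.notifications = []   # completions surfaced to the ULP

    def build_request(self, stag, length, query):
        self.tagged[stag] = bytearray(length)
        # Untagged request that carries the buffer advertisement.
        return {"query": query, "stag": stag, "offset": 0, "length": length}

    def on_tagged(self, stag, offset, payload):
        # Placed directly into the advertised buffer; no ULP notification.
        self.tagged[stag][offset:offset + len(payload)] = payload

    def on_untagged(self, reply):
        # The single notification that implicitly covers the tagged messages.
        self.notifications.append(reply)


def serve(request, client, chunk=4):
    """Toy server: writes the result with tagged messages, then replies."""
    result = request["query"].upper().encode()
    if len(result) > request["length"]:
        raise ValueError("result does not fit the advertised buffer")
    for pos in range(0, len(result), chunk):
        client.on_tagged(request["stag"], request["offset"] + pos,
                         result[pos:pos + chunk])
    client.on_untagged({"status": "ok", "result_length": len(result)})


client = Client()
request = client.build_request(stag=1, length=64, query="hello rdma")
serve(request, client)
reply = client.notifications[0]
result = bytes(client.tagged[1][:reply["result_length"]])
```

	   In this model the client posts one untagged receive for the reply,
	   regardless of how large the result turns out to be.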

358	4.4.  Order Independent Sending

360	   Use of tagged messages is especially applicable when the Data Sink
361	   does not know the actual size, structure or location of the content
362	   it is requesting (or updating).

364	   For example, suppose the Data Sink ULP needs to fetch four related
365	   pieces of data into four separate buffers.  With SCTP the Data Sink
366	   ULP could receive four messages into four separate buffers, only
367	   having to predict the maximum size of each.  However it would have to
368	   dictate the order in which the Data Source supplied the separate
369	   pieces.  If the Data Source found it advantageous to fetch them in a
370	   different order it would have to use intermediate buffering to re-
371	   order the pieces into the expected order even though the application
372	   only required that all four be delivered and did not truly have an
373	   ordering requirement.

375	   Techniques such as RAID striping and mirroring represent this same
376	   problem, but one step further.  What appears to be a single resource
377	   to the Data Sink is actually stored in separate locations by the Data
378	   Source.  Non-RDMA protocols would either require the Data Source to
379	   fetch the material in the desired order or force the Data Source to
380	   use its own holding buffers to assemble an image of the destination
381	   buffer.

383	   While sometimes referred to as a "buffer-to-buffer" solution, RDMA
384	   more fundamentally enables remote buffer access.  The ULP is free to
385	   work with larger remote buffers than it has locally.  This reduces
386	   buffering requirements and the number of times the data must be
387	   copied in an end-to-end transfer.

389	   There are numerous reasons why the Data Sink would not know the true
390	   order or location of the requested data.  It could be different for
391	   each client, different records selected and/or different sort orders,
392	   RAID striping, file fragmentation, volume fragmentation, volume
393	   mirroring and server-side dynamic compositing of content (such as
394	   server side includes for HTTP).

396	   In all of these cases the Data Source is free to assemble the desired
397	   data in the Data Sink's buffer in whatever order the component data
398	   becomes available to it.  It is not constrained on ordering.  It does
399	   not have to assemble an image in its own memory before creating it in
400	   the Data Sink's buffers.

402	   Note that while DDP enables use of tagged messages for bulk transfer,
403	   there are some application scenarios where untagged messages would
404	   still be used for bulk transfer.  For example, under the Direct
405	   Access File Server (DAFS) protocol the file server does not expose
406	   its own memory to its clients.  A client wishing to write may
407	   advertise a buffer which the server will issue RDMA Reads upon.
408	   However, when performing a small write it may be preferable to
409	   include the data in the untagged message rather than incurring an
410	   additional round trip with the RDMA Read and its response.

412	4.5.  Tagged Buffers as ULP Credits

414	   The handling of end-to-end buffer credits differs considerably
415	   between DDP and direct ULP use of either TCP or SCTP.

417	   With both TCP and SCTP, buffer credits are based upon the receiver
418	   granting transmit permission based on the total number of bytes.
419	   These credits reflect system buffering resources and/or simple flow
420	   control.  They do not represent ULP resources.

422	   DDP defines no standard flow control, but presumes the existence of a
423	   ULP mechanism.  The presumed mechanism is that the Data Sink ULP has
424	   issued credits to the Data Source allowing the Data Source to send a
425	   specific number of untagged messages.
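
	   One way such a ULP credit mechanism could look is sketched below.
	   DDP does not define this mechanism, so everything here is an
	   assumption: the CreditedSink and CreditedSource names, the rule of
	   one credit per pre-posted untagged buffer, and the direct method
	   call standing in for the transport.

```python
from collections import deque


class CreditedSink:
    """Toy Data Sink ULP: one credit per pre-posted untagged buffer."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.posted = deque()    # pre-posted untagged receive buffers
        self.received = []

    def post_receives(self, count):
        for _ in range(count):
            self.posted.append(bytearray(self.buffer_size))
        return count             # credits to grant to the Data Source

    def deliver(self, payload):
        if not self.posted:
            raise RuntimeError("untagged message arrived with no posted buffer")
        buf = self.posted.popleft()
        if len(payload) > len(buf):
            raise ValueError("message exceeds the agreed untagged buffer size")
        buf[:len(payload)] = payload
        self.received.append(bytes(buf[:len(payload)]))


class CreditedSource:
    """Toy Data Source ULP: sends only while it holds credits."""

    def __init__(self):
        self.credits = 0

    def add_credits(self, count):
        self.credits += count

    def send(self, sink, payload):
        if self.credits == 0:
            return False         # must wait for the sink to grant more credits
        self.credits -= 1
        sink.deliver(payload)
        return True


sink = CreditedSink(buffer_size=16)
source = CreditedSource()
source.add_credits(sink.post_receives(2))
results = [source.send(sink, msg) for msg in (b"one", b"two", b"three")]
```

	   The third send is refused locally, so the sink never sees an
	   untagged message for which it has no buffer.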

427	   The ULP peers must ensure that the sender is aware of the maximum
428	   size that can be sent to any specific target buffer.  One method of
429	   doing so is to use a standard size for all untagged buffers within a
430	   given connection.  For example, DAFS specifies an initial size
431	   requirement for session establishment, during which the untagged
432	   buffer size for the remainder of the session is negotiated.

434	   Tagged buffers are ULP resources advertised directly from ULP to ULP.
435	   A DDP put to a known tagged buffer is constrained only by transport
436	   level flow control, not by available system buffering.

438	   Either tagged or untagged buffers allow bypassing of system buffer
439	   resources.  Use of tagged buffers additionally allows the Data Source
440	   to choose what order to exercise the credits in.

442	   To the extent allowed by the ULP, tagged buffers are also divisible
443	   resources.  The Data Sink can advertise a single 100 KB buffer, and
444	   then receive notifications from its peer that it had written 50 KB,
445	   20 KB and 30 KB to that buffer in three successive transactions.
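
	   The 100 KB example can be worked through as a sketch.  The
	   AdvertisedBuffer class and its note_written method are hypothetical,
	   and the model assumes that the peer fills the buffer contiguously
	   from offset zero, which a ULP is free to define differently.

```python
KB = 1024


class AdvertisedBuffer:
    """Toy bookkeeping for one advertised tagged buffer at the Data Sink."""

    def __init__(self, length):
        self.length = length
        self.used = 0            # bytes the peer has reported writing so far

    def note_written(self, nbytes):
        """Record a peer notification; return the offset where that data starts."""
        if nbytes < 0 or self.used + nbytes > self.length:
            raise ValueError("peer reported more than was advertised")
        offset = self.used
        self.used += nbytes
        return offset

    @property
    def remaining(self):
        return self.length - self.used


buf = AdvertisedBuffer(100 * KB)
offsets = [buf.note_written(n * KB) for n in (50, 20, 30)]
```

	   A single advertisement covers all three transactions; the Data Sink
	   only tracks how much of the buffer each notification consumed.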

447	   ULP-management of tagged buffer resources, independent of transport
448	   and DDP layer credits, is an additional benefit of RDMA protocols.
449	   Large bulk transfers cannot be blocked by limited general purpose
450	   buffering capacity.  Applications can flow control based upon higher
451	   level abstractions, such as number of outstanding requests,
452	   independent of the amount of data that must be transferred.

454	   However, use of system buffering, as offered by direct use of the
455	   underlying transports, can be preferable under certain circumstances.

457	   One example would be when the number of target ULP buffers is
458	   sufficiently large, and the rate at which any writes arrive is
459	   sufficiently low, that pinning all the target ULP buffers in memory
460	   would be undesirable.  The maximum transfer rate, and hence the
461	   maximum amount of system buffering required, may be more stable and
462	   predictable than the total ULP buffer exposure.

464	   Another is when the Data Sink wishes to receive a stream of data at
465	   a predictable rate, but does not know in advance what the size of
466	   each data packet will be.  This is common for streaming media that
467	   has been encoded with a variable bit rate.  With DDP the Data Sink
468	   would either have to use untagged buffers large enough for the
469	   largest packet, or advertise a circular buffer.  If for security or
470	   other reasons the Data Sink did not want the size of its buffer to be
471	   publicly known, using the underlying SCTP transport directly may be
472	   preferable because of its byte-oriented credits.

474	5.  RDMA Read

476	   RDMA Reads are a further service provided by RDMAP.  RDMA Reads allow
477	   the Data Sink to fetch exactly the portion of the peer ULP buffer
478	   required on a "just in time" basis.  This can be done without
479	   requiring per-fetch support from the Data Source ULP.

481	   Storage servers may wish to limit the maximum write buffer allocated
482	   to any single session.  The storage server may be a very minimal
483	   layer between the client and the disk storage media, or the server
484	   may merely wish to limit the total resources that would be required
485	   if all clients could push the entire payload they wished written at
486	   their own convenience.

488	   In either case, there is little benefit in transferring data from the
489	   Data Source far in advance of when it will be written to the
490	   persistent storage media.  RDMA Reads allow the Storage Server to
491	   fetch the payload on a "just in time" basis.  In this fashion a
492	   relatively small number of block sized buffers can be used to execute
493	   a single transaction that specified writing a large file, or a
494	   Storage Server with numerous clients can fetch buffers from the
495	   individual clients in the order that is most convenient to the
496	   server.
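
   The "just in time" fetch pattern described above can be illustrated
   with a minimal sketch.  It is a local simulation only, not an RDMA
   API: the AdvertisedBuffer class, its rdma_read method and the
   write_large_file function are hypothetical names, and the block size
   and pool size are arbitrary assumptions.

```python
from collections import deque


class AdvertisedBuffer:
    """Hypothetical stand-in for a client buffer advertised with an STag."""

    def __init__(self, data: bytes):
        self._data = data

    def __len__(self) -> int:
        return len(self._data)

    def rdma_read(self, offset: int, length: int) -> bytes:
        # Models an RDMA Read: the server pulls a byte range without any
        # per-fetch action by the client ULP.
        return self._data[offset:offset + length]


def write_large_file(source, disk, block_size=4096, pool_size=2):
    """Copy source to disk, staging at most pool_size blocks at a time.

    Returns the peak number of block buffers that were in use.
    """
    staged = deque()
    offset = 0
    peak = 0
    while offset < len(source) or staged:
        # Fetch only as many blocks as there are free buffers.
        while offset < len(source) and len(staged) < pool_size:
            staged.append(source.rdma_read(offset, block_size))
            offset += block_size
        peak = max(peak, len(staged))
        # Writing the oldest block to "disk" frees its buffer.
        disk.extend(staged.popleft())
    return peak
```

   In this sketch the server's buffer use is bounded by pool_size
   blocks regardless of how large the client's payload is.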

498	   This same capability can be used when the desired portion of the
499	   advertised buffer is not known in advance.  For example, the
500	   advertised buffer could contain performance statistics.  The Data
501	   Sink could request the portions of the data it requires, without
502	   requiring an interaction with the Data Source ULP.

504	   This is applicable for many applications that publish semi-volatile
505	   data that does not require transactional validity checking (i.e.,
506	   authorized users have read access to the entire set of data).  It is
507	   less applicable when there are ULP consistency checks that must be
508	   performed upon the data.  Such applications would be better served by
509	   having the client send a request, and having the server use RDMA
510	   Writes to publish the requested data.  Neither RDMAP nor DDP provides
511	   mechanisms for bundling multiple disjoint updates into an atomic
512	   operation.  Therefore use of an advertised buffer as a data resource
513	   is subject to the same caveats as any randomly updated data resource,
514	   such as a flat file, that does not enforce its own consistency.

516	6.  LLP Comparisons

518	   Normally the choice of underlying IP transport is irrelevant to the
519	   ULP.  RDMAP and DDP provide the same services over either.  There
520	   may be performance impacts of the choice, however.  It is the
521	   responsibility of the ULP to determine which IP transport is best
522	   suited to its needs.

524	   SCTP provides for preservation of message boundaries.  Each DDP
525	   segment will be delivered within a single SCTP packet.  The
526	   equivalent services are only available with TCP through the use of
527	   the MPA adaptation layer.

529	6.1.  Multistreaming Implications

531	   SCTP also provides multi-streaming.  When the same pair of hosts has
532	   need for multiple DDP streams, this can be a major advantage.  A
533	   single SCTP association carries multiple DDP streams, consolidating
534	   connection setup, congestion control and acknowledgements.

536	   Completions are controlled by the DDP Source Sequence Number (DDP-
537	   SSN) on a per stream basis.  Therefore combining multiple DDP Streams
538	   into a single SCTP association cannot result in a dropped packet
539	   carrying data for one stream delaying completions on others.
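
   The per-stream independence of completions can be illustrated with
   a small sketch.  It models only in-order completion per stream; the
   function name, the stream labels and the assumption that sequence
   numbers start at 1 are hypothetical and are not taken from the SCTP
   adaptation.

```python
def completable(arrived):
    """Map each stream to the highest sequence number completable in order.

    arrived maps a stream id to the set of sequence numbers received so far.
    Numbering from 1 is an assumption of this sketch.
    """
    result = {}
    for stream, seqs in arrived.items():
        last = 0
        # Advance while the next expected sequence number has arrived.
        while last + 1 in seqs:
            last += 1
        result[stream] = last
    return result


# Stream "A" lost its first message; stream "B" lost nothing.
status = completable({"A": {2, 3}, "B": {1, 2, 3}})
```

   Here stream "A" cannot complete anything until its missing message
   is retransmitted, while stream "B" completes all three messages.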

541	6.2.  Out of Order Reception Implications

543	   The use of unordered Data Chunks with SCTP guarantees that the DDP
544	   layer will be able to perform placements when IP datagrams are
545	   received out of order.

547	   Placement of out-of-order DDP Segments carried over MPA/TCP is not
548	   guaranteed, but certainly allowed.  The ability of the MPA receiver
549	   to process out-of-order DDP Segments may be impaired when alignment
550	   of TCP segments and MPA FPDUs is lost.  Using SCTP, each DDP Segment
551	   is encoded in a single Data Chunk and never spread over multiple IP
552	   datagrams.

554	6.3.  Header and Marker Overhead

556	   MPA and TCP headers together are smaller than the headers used by
557	   SCTP and its adaptation layer.  However, this advantage can be
558	   considerably reduced by the insertion of MPA markers.  In any event
559	   the difference in ULP payload per IP Datagram is not likely to be a
560	   significant factor.

562	6.4.  Middlebox Support

564	   Even with the MPA adaptation layer, DDP traffic carried over MPA/TCP
565	   will appear to all network middleboxes as a normal TCP connection.
566	   In many environments there may be a requirement to use only TCP
567	   connections to satisfy existing network elements and/or to facilitate
568	   monitoring and control of connections.  While SCTP is certainly just
569	   as monitorable and controllable as TCP, there is no guarantee that
570	   the network management infrastructure has the required support for
571	   both.

573	6.5.  Processing Overhead

575	   A DDP stream delivered via MPA/TCP will require more processing
576	   effort than one delivered over SCTP.  However, this extra work may be
577	   justified for many deployments where full SCTP support is unavailable
578	   in the endpoints of the network, or where middleboxes impair the
579	   usability of SCTP.

581	6.6.  Data Integrity Implications

583	   Both the SCTP and MPA/TCP adaptations provide end-to-end CRC32c
584	   protection against data corruption, or its equivalent.

586	   A ULP that requires a greater degree of protection may add its own.
587	   However, DDP and RDMAP headers will only be guaranteed to have the
588	   equivalent of end-to-end CRC32c protection.  A ULP that requires data
589	   integrity checking more thorough than an end-to-end CRC32c should
590	   first invalidate all STags that reference a buffer before applying
591	   its own integrity check.

593	6.6.1.  MPA/TCP Specifics

595	   It is mandatory for MPA/TCP implementations to implement CRC32c, but
596	   it is NOT mandatory to use the CRC32c during an RDMA connection.  The
597	   activating or deactivating of the CRC in MPA/TCP is an administrative
598	   configuration operation at the local and remote end.  The
599	   administration of the CRC(ON/OFF) is invisible to the ULP.

601	   Applications SHOULD trust that this administrative option will only
602	   be used when the end-to-end protection is at least as effective as a
603	   transport layer CRC32c.  Applications SHOULD NOT apply additional
604	   protection as a guard against this administrative option being turned
605	   on inadvertently.

607	   Administrators MUST NOT enable CRC32c suppression unless the end-to-
608	   end protection is truly equivalent.

610	   If the CRC is active/used for one direction/end, then the use of the
611	   CRC is mandatory in both directions/ends.

613	   If both ends have been configured NOT to use the CRC, then this is
614	   allowed as long as an equivalent protection (comparable to or better
615	   than the CRC) from undetected errors on the connection is provided.
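
   The two rules above can be restated as a short sketch.  The function
   and parameter names are hypothetical, and how each end learns the
   other's setting is outside the scope of this illustration.

```python
def crc_in_use(local_wants_crc: bool, remote_wants_crc: bool) -> bool:
    """The CRC is used in both directions if either end asks for it."""
    return local_wants_crc or remote_wants_crc


def configuration_allowed(local_wants_crc: bool,
                          remote_wants_crc: bool,
                          equivalent_protection: bool) -> bool:
    """Running without the CRC is acceptable only with equivalent protection."""
    if crc_in_use(local_wants_crc, remote_wants_crc):
        return True
    # Both ends suppressed the CRC: something else must cover the gap.
    return equivalent_protection
```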

617	6.6.2.  SCTP Specifics

619	   SCTP provides CRC32c protection automatically.  The adaptation to
620	   SCTP provides for no option to suppress SCTP CRC32c protection.

622	6.7.  Non-IP Transports

624	   DDP is defined to operate over ubiquitous IP transports such as SCTP
625	   and TCP.  This enables a new DDP-enabled node to be added anywhere in
626	   an IP network.  No DDP-specific support from middleboxes is
627	   required.

629	   There are non-IP transport fabrics offering RDMA capabilities.
630	   Because these capabilities are integrated with the transport protocol
631	   they have some technical advantages when compared to RDMA over IP.
632	   For example, fencing of RDMA operations can be based upon transport
633	   level acks.  Because DDP is cleanly layered over an IP transport, any
634	   explicit RDMA layer ack must be separate from the transport layer
635	   ack.

637	   There may be deployments where the benefits of RDMA/transport
638	   integration outweigh the benefits of being on an IP network.

640	6.7.1.  No RDMA Layer Ack

642	   DDP does not provide for its own acknowledgements.  The only form of
643	   ack provided at the RDMAP layer is an RDMA Read Response.  DDP and
644	   RDMAP rely almost entirely upon other layers for flow control and
645	   pacing.  The LLP is relied upon to guarantee delivery and avoid
646	   network congestion, and ULP level acking is relied upon for ULP
647	   pacing and to avoid ULP buffer overruns.

649	   Previous RDMA protocols, such as InfiniBand, have been able to use
650	   their integration with the transport layer to provide stronger
651	   ordering guarantees.  It is important that application designers who
652	   require such guarantees provide them through ULP interaction.

654	   Specifically:

656	      There is no ability for a local interface to "fence" outbound
657	      messages to guarantee that prior tagged messages have been placed
658	      prior to sending a tagged message.  The only guarantees available
659	      from the other side would be an RDMA Read Response (coming from
660	      the RDMAP layer) or a response from the ULP layer.  Remember that
661	      the normal ordering rules only guarantee when the Data Sink ULP
662	      will be notified of untagged messages; they do not control when
663	      data is placed into receive buffers.

665	      Re-use of tagged buffers must be done with extreme care.  The fact
666	      that an untagged message indicates that all prior tagged messages
667	      have been placed does not guarantee that no later tagged message
668	      has been placed.  The best strategy is to change the state of any
669	      given advertised buffer only with untagged messages.

671	      As covered elsewhere in this document, flow control of untagged
672	      messages MUST be provided by the ULP itself.

674	6.8.  Other IP Transports

676	   Both TCP and SCTP provide DDP with reliable transport with TCP
677	   friendly rate control.  As currently defined, DDP works over
678	   reliable transports and implicitly relies upon some form of rate
679	   control.

681	   DDP is fully compatible with a non-reliable protocol.  Out-of-order
682	   placement is obviously not dependent on whether the other DDP
683	   Segments ever actually arrive.

685	   However, RDMAP requires the LLP to provide reliable service.  An
686	   alternate completion handling protocol would be required if DDP were
687	   to be deployed over an unreliable IP transport.

689	   As noted in the prior section on tagged buffers as ULP credits,
690	   neither RDMAP nor DDP provides any flow control for tagged messages.
691	   If no transport layer flow control is provided, an RDMAP/DDP
692	   application would be limited only by the link layer rate, almost
693	   inevitably resulting in severe network congestion.

695	   RDMAP encourages applications to be ignorant of the underlying
696	   transport PMTU.  The ULP is only notified when all messages ending in
697	   a single untagged message have completed.  The ULP is not aware of
698	   the granularity or ordering of the underlying message.  This approach
699	   assumes that the ULP is only interested in the complete set of
700	   messages, and has no use for a subset of them.

702	6.9.  LLP Independent Session Establishment

704	   For an RDMAP/DDP application, the transport services provided by a
705	   pair of SCTP Streams and by a TCP connection are equivalent: both
706	   offer reliable delivery of DDP Segments between two connected
707	   RDMAP/DDP endpoints.

709	6.9.1.  RDMA-only Session Establishment

711	   It is also possible to allow for transport neutral establishment of
712	   RDMAP/DDP sessions between endpoints.  Combined, these two features
713	   would allow most applications to be unconcerned as to which LLP was
714	   actually in use.

716	   Specifically, the procedures for DDP Stream Session establishment
717	   discussed in section 3 of the SCTP mapping, and section 13.3 of the
718	   MPA/TCP mapping, both allow for the exchange of ULP specific data
719	   ("Private Data") before enabling the exchange of DDP Segments.  This
720	   delay can allow for proper selection and/or configuration of the
721	   endpoints based upon the exchanged data.  For example, each DDP
722	   Stream Session associated with a single client session might be
723	   assigned to the same DDP Protection Domain.

725	   To be transport neutral, the applications should exchange Private
726	   Data as part of session establishment messages to determine how the
727	   RDMA endpoints are to be configured.  One side must be the Initiator,
728	   and the other the Responder.

730	   With SCTP, a pair of SCTP streams can be used for sequential
731	   sessions.  With MPA/TCP each connection can be used for at most one
732	   session.  However, the same source/destination pair of ports can be
733	   re-used sequentially subject to normal TCP rules.

735	   Both SCTP and MPA limit the private data size to a maximum of 512
736	   bytes.

738	   MPA/TCP requires the end of the TCP connection that initiated the
739	   conversion to MPA mode to send the first DDP Segment.  SCTP does not
740	   have this requirement.  ULPs which wish to be transport neutral
741	   should require the initiating end to send the first message.  A zero-
742	   length RDMA Write can be used for this purpose if the ULP logic
743	   itself does not naturally support this restriction.
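
   A transport neutral ULP might apply the constraints above roughly as
   in the following sketch.  The role strings, the returned action
   names and the function itself are hypothetical labels for
   illustration; only the 512 byte limit and the rule that the
   initiating end sends first come from the text.

```python
MAX_PRIVATE_DATA = 512  # limit shared by the SCTP and MPA mappings


def first_action(role: str, private_data: bytes,
                 has_natural_first_message: bool) -> str:
    """Decide what an endpoint does right after session establishment."""
    if len(private_data) > MAX_PRIVATE_DATA:
        raise ValueError("Private Data exceeds 512 bytes")
    if role == "initiator":
        # The initiating end always sends first, so that the same ULP
        # logic works over both MPA/TCP and SCTP.
        if has_natural_first_message:
            return "send_ulp_message"
        return "send_zero_length_rdma_write"
    if role == "responder":
        return "wait_for_first_message"
    raise ValueError("role must be 'initiator' or 'responder'")
```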

745	6.9.2.  RDMA-Conditional Session Establishment

747	   It is sometimes desirable for the active side of a session to connect
748	   with the passive side before knowing whether the passive side
749	   supports RDMA.

751	   This style of session establishment can be supported with either TCP
752	   or SCTP, but not as transparently as for RDMA-only sessions.  Pre-
753	   existing non-RDMA servers are also far more likely to be using TCP
754	   than SCTP.

756	   With TCP, a normal TCP connection is established.  It is then used by
757	   the ULP to determine whether or not to convert to MPA mode and use
758	   RDMA.  This will typically be integral with other session
759	   establishment negotiations.

761	   With SCTP, the establishment of an association tests whether RDMA is
762	   supported.  If not supported, the application simply requests the
763	   association without the RDMA adaptation indication.

765	   The key difference is that with SCTP the determination as to whether
766	   the peer can support RDMA is made before the transport layer
767	   association/connection is established, while with TCP the established
768	   connection itself is used to determine whether RDMA is supported.

770	7.  Local Interface Implications

772	   Full utilization of DDP and RDMAP capabilities requires a local
773	   interface that explicitly requests these services.  Protocols such as
774	   Sockets Direct Protocol (SDP) can allow applications to keep their
775	   traditional byte-stream or message-stream interface and still enjoy
776	   many of the benefits of the optimized wire level protocols.

778	8.  Security Considerations

780	8.1.  Connection/Association Setup

782	   Both the SCTP and TCP adaptations allow for existing procedures to be
783	   followed for the establishment of the SCTP association or TCP
784	   connection.  Use of DDP does not impair the use of any security
785	   measures to filter, validate and/or log the remote end of an
786	   association/connection.

788	8.2.  Tagged Buffer Exposure

790	   DDP only exposes ULP memory to the extent explicitly allowed by ULP
791	   actions.  These include posting of receive operations and enabling of
792	   Steering Tags.

794	   Neither RDMAP nor DDP places requirements on how ULPs advertise
795	   buffers.  A ULP may use a single Steering Tag for multiple buffer
796	   advertisements.  However, the ULP should be aware that enforcement on
797	   STag usage is likely limited to the overall range that is enabled.
798	   If the remote peer writes into the 'wrong' advertised buffer, neither
799	   the DDP nor the RDMAP layer will be aware of this.  Nor is there any
800	   report to the ULP on how the remote peer specifically used tagged
801	   buffers.

803	   Unless the ULP peers have an adequate basis for mutual trust, the
804	   receiving ULP might be well advised to use a distinct STag for each
805	   interaction, and to invalidate it after each use or to require its
806	   peer to use the RDMAP option to invalidate the STag with its
807	   responding untagged message.
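
   The "one STag per interaction" practice can be sketched as follows.
   This is a local model only: the StagTable class and its methods are
   hypothetical, and a real implementation would rely on the RDMA
   interface's own registration and invalidation operations.

```python
import itertools


class StagTable:
    """Hypothetical model of STags that are valid for a single interaction."""

    def __init__(self):
        self._next = itertools.count(1)
        self._valid = {}

    def advertise(self, buffer: bytearray) -> int:
        # A fresh STag per interaction limits what a peer can reach.
        stag = next(self._next)
        self._valid[stag] = buffer
        return stag

    def remote_write(self, stag: int, offset: int, data: bytes) -> None:
        buffer = self._valid.get(stag)
        if buffer is None:
            raise PermissionError("STag is not valid")
        if offset < 0 or offset + len(data) > len(buffer):
            raise ValueError("write falls outside the advertised buffer")
        buffer[offset:offset + len(data)] = data

    def invalidate(self, stag: int) -> None:
        # After invalidation, further use of the STag is rejected.
        self._valid.pop(stag, None)
```

   Because each STag covers exactly one buffer, a write aimed at the
   wrong buffer or at a completed interaction is rejected rather than
   silently placed.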

809	8.3.  Impact of Encrypted Transports

811	   While DDP is cleanly layered over the LLP, its maximum benefit may be
812	   limited when the LLP Stream is secured with a streaming cipher, such
813	   as Transport Layer Security (TLS).  If the LLP must decrypt in order,
814	   it cannot provide out-of-order DDP Segments to the DDP layer for
815	   placement purposes.  IPsec tunnel mode encrypts entire IP Datagrams.
816	   IPsec transport mode encrypts TCP Segments or SCTP packets.  In
817	   neither case should IPsec preclude providing out-of-order DDP
818	   Segments to the DDP layer for placement.

820	   Note that end-to-end use of IPsec cryptographic integrity protection
821	   may allow suppression of MPA CRC generation and checking under
822	   certain circumstances.  This is one example where the LLP may be
823	   judged to have "or equivalent" protection to an end-to-end CRC32c.

825	9.  References

827	   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
828	         Levels", BCP 14, RFC 2119, March 1997.

830	   [2]   Dierks, T. and C. Allen, "The TLS Protocol Version 1.0",
831	         RFC 2246, January 1999.

833	   [3]   Kent, S. and R. Atkinson, "IP Encapsulating Security Payload
834	         (ESP)", RFC 2406, November 1998.

836	   [4]   Stewart, R., Xie, Q., Morneault, K., Sharp, C., Schwarzbauer,
837	         H., Taylor, T., Rytina, I., Kalla, M., Zhang, L., and V.
838	         Paxson, "Stream Control Transmission Protocol", RFC 2960,
839	         October 2000.

841	   [5]   Coene, L., "Stream Control Transmission Protocol Applicability
842	         Statement", RFC 3257, April 2002.

844	   [6]   Recio, R., "An RDMA Protocol Specification",
845	         draft-ietf-rddp-rdmap-05 (work in progress), July 2005.

847	   [7]   Shah, H., "Direct Data Placement over Reliable Transports",
848	         draft-ietf-rddp-ddp-05 (work in progress), July 2005.

850	   [8]   Stewart, R., "Stream Control Transmission Protocol (SCTP)
851	         Remote Direct Memory Access (RDMA) Direct Data Placement (DDP)
852	         Adaptation", draft-ietf-rddp-sctp-02 (work in progress),
853	         August 2005.

855	   [9]   Culley, P., "Marker PDU Aligned Framing for TCP Specification",
856	         draft-ietf-rddp-mpa-02 (work in progress), February 2005.

858	   [10]  "Direct Access File System Version 1.0", September 2001.

860	   [11]  Pinkerton, J., "Sockets Direct Protocol (SDP) for iWARP over
861	         TCP 1.0", October 2003.

863	Authors' Addresses

865	   Caitlin Bestler
866	   Broadcom
867	   49 Discovery
868	   Irvine, CA  92618
869	   USA

871	   Phone: 949-926-6383
872	   Email: caitlinb@broadcom.com

874	   Lode Coene
875	   Siemens
876	   Atealaan 26
877	   Herentals,   2200
878	   Belgium

880	   Phone: +32-14-252081
881	   Email: lode.coene@siemens.com

883	Intellectual Property Statement

885	   The IETF takes no position regarding the validity or scope of any
886	   Intellectual Property Rights or other rights that might be claimed to
887	   pertain to the implementation or use of the technology described in
888	   this document or the extent to which any license under such rights
889	   might or might not be available; nor does it represent that it has
890	   made any independent effort to identify any such rights.  Information
891	   on the procedures with respect to rights in RFC documents can be
892	   found in BCP 78 and BCP 79.

894	   Copies of IPR disclosures made to the IETF Secretariat and any
895	   assurances of licenses to be made available, or the result of an
896	   attempt made to obtain a general license or permission for the use of
897	   such proprietary rights by implementers or users of this
898	   specification can be obtained from the IETF on-line IPR repository at
899	   http://www.ietf.org/ipr.

901	   The IETF invites any interested party to bring to its attention any
902	   copyrights, patents or patent applications, or other proprietary
903	   rights that may cover technology that may be required to implement
904	   this standard.  Please address the information to the IETF at
905	   ietf-ipr@ietf.org.

907	Disclaimer of Validity

909	   This document and the information contained herein are provided on an
910	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
911	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
912	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
913	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
914	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
915	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

917	Copyright Statement

919	   Copyright (C) The Internet Society (2005).  This document is subject
920	   to the rights, licenses and restrictions contained in BCP 78, and
921	   except as set forth therein, the authors retain all their rights.

923	Acknowledgment

925	   Funding for the RFC Editor function is currently provided by the
926	   Internet Society.