idnits 2.17.1 

draft-ietf-rddp-applicability-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 21 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 329 has weird spacing: '...r sends  multi...'

  == Line 413 has weird spacing: '...g so is  to us...'

  == Line 415 has weird spacing: '...ich the  untag...'

  == Line 434 has weird spacing: '...control  based...'

  == Line 620 has weird spacing: '...e level  proto...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 12, 2003) is 7623 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '1' is defined on line 671, but no explicit reference
     was found in the text

  == Unused Reference: '2' is defined on line 674, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 678, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 681, but no explicit reference
     was found in the text

  == Unused Reference: '5' is defined on line 685, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 688, but no explicit reference
     was found in the text

  == Unused Reference: '7' is defined on line 691, but no explicit reference
     was found in the text

  == Unused Reference: '8' is defined on line 694, but no explicit reference
     was found in the text

  == Unused Reference: '9' is defined on line 699, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 2246 (ref. '2') (Obsoleted by RFC 4346)

  ** Obsolete normative reference: RFC 2406 (ref. '3') (Obsoleted by RFC
     4303, RFC 4305)

  ** Obsolete normative reference: RFC 2960 (ref. '4') (Obsoleted by RFC 4960)

  ** Downref: Normative reference to an Informational RFC: RFC 3257 (ref. '5')

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-rdmap-00

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-ddp-00

  -- Possible downref: Normative reference to a draft: ref. '8' 

  == Outdated reference: A later version (-03) exists of
     draft-culley-iwarp-mpa-02

  -- Possible downref: Normative reference to a draft: ref. '9' 


     Summary: 7 errors (**), 0 flaws (~~), 20 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Remote Direct Data Placement                                  C. Bestler
3	Working group                                                   L. Coene
4	Internet-Draft                                             June 12, 2003
5	Expires: December 11, 2003

7	    Applicability of Remote Direct Memory Access Protocol (RDMA) and
8	                      Direct Data Placement (DDP)
9	                  draft-ietf-rddp-applicability-00.txt

11	Status of this Memo

13	   This document is an Internet-Draft and is in full conformance with
14	   all provisions of Section 10 of RFC2026.

16	   Internet-Drafts are working documents of the Internet Engineering
17	   Task Force (IETF), its areas, and its working groups.  Note that
18	   other groups may also distribute working documents as Internet-
19	   Drafts.

21	   Internet-Drafts are draft documents valid for a maximum of six months
22	   and may be updated, replaced, or obsoleted by other documents at any
23	   time.  It is inappropriate to use Internet-Drafts as reference
24	   material or to cite them other than as "work in progress."

26	   The list of current Internet-Drafts can be accessed at http://
27	   www.ietf.org/ietf/1id-abstracts.txt.

29	   The list of Internet-Draft Shadow Directories can be accessed at
30	   http://www.ietf.org/shadow.html.

32	   This Internet-Draft will expire on December 11, 2003.

34	Copyright Notice

36	   Copyright (C) The Internet Society (2003).  All Rights Reserved.

38	Abstract

40	   This document describes the applicability of Remote Direct Memory
41	   Access Protocol (RDMAP) and the Direct Data Placement Protocol
42	   (DDP).  It contrasts the different transport options over IP that DDP
43	   can use, compares use of DDP with direct use of the supporting
44	   transports, and compares DDP over IP transports with non-IP
45	   transports that support RDMA functionality.

47	Table of Contents

49	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
50	   2.  Definitions  . . . . . . . . . . . . . . . . . . . . . . . . .  5
51	   3.  Direct Placement . . . . . . . . . . . . . . . . . . . . . . .  6
52	   3.1 Fewer Required ULP Interactions  . . . . . . . . . . . . . . .  6
53	   3.2 Direct Placement using only the LLP  . . . . . . . . . . . . .  6
54	   4.  Tagged Messages  . . . . . . . . . . . . . . . . . . . . . . .  8
55	   4.1 Order Independent Reception  . . . . . . . . . . . . . . . . .  8
56	   4.2 Reduced ULP Notifications  . . . . . . . . . . . . . . . . . .  8
57	   4.3 Simplified ULP Exchanges . . . . . . . . . . . . . . . . . . .  9
58	   4.4 Order Independent Sending  . . . . . . . . . . . . . . . . . . 10
59	   4.5 Tagged Buffers as ULP Credits  . . . . . . . . . . . . . . . . 11
60	   5.  RDMA Read  . . . . . . . . . . . . . . . . . . . . . . . . . . 13
61	   6.  LLP Comparisons  . . . . . . . . . . . . . . . . . . . . . . . 14
62	   6.1 Multistreaming Implications  . . . . . . . . . . . . . . . . . 14
63	   6.2 Out of Order Reception Implications  . . . . . . . . . . . . . 14
64	   6.3 Header and Marker Overhead . . . . . . . . . . . . . . . . . . 14
65	   6.4 Data Integrity Implications  . . . . . . . . . . . . . . . . . 15
66	   6.5 Non-IP Transports  . . . . . . . . . . . . . . . . . . . . . . 15
67	   6.6 Other IP Transports  . . . . . . . . . . . . . . . . . . . . . 15
68	   7.  Local Interface Implications . . . . . . . . . . . . . . . . . 17
69	   8.  Security considerations  . . . . . . . . . . . . . . . . . . . 18
70	   8.1 Connection/Association Setup . . . . . . . . . . . . . . . . . 18
71	   8.2 Tagged Buffer Exposure . . . . . . . . . . . . . . . . . . . . 18
72	   8.3 Impact of Encrypted Transports . . . . . . . . . . . . . . . . 18
73	       References . . . . . . . . . . . . . . . . . . . . . . . . . . 19
74	       Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 19
75	       Full Copyright Statement . . . . . . . . . . . . . . . . . . . 21

77	1. Introduction

79	   Remote Direct Memory Access Protocol (RDMAP) and Direct Data
80	   Placement (DDP) work together to provide application independent
81	   efficient placement of application payload directly into buffers
82	   specified by the Upper Layer Protocol (ULP).

84	   The DDP protocol is responsible for direct placement of received
85	   payload into ULP specified buffers.  The RDMAP protocol provides
86	   completion notifications to the ULP and support for Data Sink
87	   initiated fetch of advertised buffers (RDMA Reads).

89	   DDP and RDMAP are both application independent protocols which allow
90	   the ULP to perform remote direct data placement.  DDP can use
91	   multiple standard IP transports including SCTP and TCP.

93	   By clarifying the situations where the functionality of these
94	   protocols is applicable, this document can guide implementers,
95	   application and protocol designers in selecting which protocols to
96	   use.

98	   The applicability of RDMAP/DDP is driven by their unique
99	   capabilities:

101	   o  The existence of an application independent protocol allows common
102	      solutions to be implemented in hardware and/or the kernel.  This
103	      document will discuss when common data placement procedures are of
104	      the greatest benefit to applications as contrasted with
105	      application specific solutions built on top of direct use of the
106	      underlying transport.

108	   o  DDP supports both untagged and tagged buffers.  Tagged buffers
109	      allow the Data Sink ULP to be indifferent to what order (or in
110	      what packets) the Data Source sent the data, or what order they
111	      are received in.  This document will discuss when Data Source
112	      flexibility is of benefit to applications.

114	   o  RDMAP consolidates ULP notifications, thereby minimizing the
115	      number of required ULP interactions.

117	   o  RDMAP defines RDMA Reads, which allow remote access to advertised
118	      buffers.  This document will review the advantages of using RDMA
119	      Reads as contrasted to alternate solutions.

121	   Some non-IP transports, such as InfiniBand, directly integrate RDMA
122	   features.  This document will review the applicability of providing
123	   RDMA services over ubiquitous IP transports as opposed to the use of
124	   customized transport protocols.

126	   The capabilities of DDP and RDMAP can only be fully realized by
127	   applications that are designed to exploit them.  The co-existence of
128	   RDMAP/DDP aware local interfaces with traditional socket interfaces
129	   will also be explored.

131	   Finally, DDP support is defined for at least two IP transports: SCTP
132	   and TCP.  The rationale for supporting both transports is reviewed,
133	   as well as when each would be the appropriate selection.

135	2. Definitions

137	   Advertisement - The act of informing a Remote Peer that a local RDMA
138	      Buffer is available to it.  A Node makes available an RDMA Buffer
139	      for incoming RDMA Read or RDMA Write access by informing its RDMA/
140	      DDP peer of the Tagged Buffer identifiers (STag, base address, and
141	      buffer length).  This advertisement of Tagged Buffer information
142	      is not defined by RDMA/DDP and is left to the ULP.  A typical
143	      method would be for the Local Peer to embed the Tagged Buffer's
144	      Steering Tag, base address, and length in a Send Message destined
145	      for the Remote Peer.

147	   Data Sink - The peer receiving a data payload.  Note that the Data
148	      Sink can be required to both send and receive RDMA/DDP Messages to
149	      transfer a data payload.

151	   Data Source - The peer sending a data payload.  Note that the Data
152	      Source can be required to both send and receive RDMA/DDP Messages
153	      to transfer a data payload.

155	   Lower Layer Protocol (LLP) - The transport protocol that provides
156	      services to DDP.  This is an IP transport with any required
157	      adaptation layer.  Adaptation layers are defined for SCTP and TCP.

159	   Steering Tag (STag) - An identifier of a Tagged Buffer on a Node,
160	      valid as defined within a protocol specification.

162	   Tagged Message - A DDP message that is directed to a ULP specified
163	      buffer based upon embedded addressing information.  In the
164	      immediate sense, the destination buffer is specified by the
165	      message sender.

167	   Untagged Message - A DDP message that is directed to a ULP specified
168	      buffer based upon a Message Sequence Number being matched with a
169	      receiver supplied buffer.  The destination buffer is specified by
170	      the message receiver.

172	   Upper Layer Protocol (ULP) - The direct user of RDMAP/DDP services.
173	      This may be an application, or a middleware layer such as Sockets
174	      Direct Protocol (SDP) or Remote Procedure Calls (RPC).

176	3. Direct Placement

178	   Direct Data Placement optimizes the placement of ULP payload into the
179	   correct destination buffers, typically eliminating intermediate
180	   copying.  Placement is enabled without regard to order of arrival,
181	   order of transmission or requiring per-placement interaction with the
182	   ULP.

184	   RDMAP minimizes the required ULP interactions.  This capability is
185	   most valuable for applications that require multiple transport layer
186	   packets for each required ULP interaction.

188	3.1 Fewer Required ULP Interactions

190	   While reducing the number of required ULP interactions is in itself
191	   desirable, it is critical for high speed connections.  The burst
192	   packet rate for a high speed interface could easily exceed the host
193	   system's ability to switch ULP contexts.

195	   Content access applications are primary examples of applications with
196	   both high bandwidth and high content to required ULP interaction
197	   ratios.  These applications include file access protocols (NAS),
198	   storage access (SAN), database access and other application specific
199	   forms of content access such as HTTP, XML and email.

201	   Direct data placement can be achieved without RDMA.  Pre-posting of
202	   receive buffers could allow a non-RDMA network stack to place data
203	   directly to user buffers.

205	3.2 Direct Placement using only the LLP

207	   The degree to which DDP optimizes depends on which transport it is
208	   compared with, and on the nature of the local interface.  Without
209	   RDMAP/DDP, pre-posting buffers requires the receiving side to
210	   accurately predict the required buffers and their sizes.  This is not
211	   feasible for all ULPs.  By contrast, DDP only requires the ULP to
212	   predict the sequence and size of incoming untagged messages.

214	   An application that could predict incoming messages and required
215	   nothing more than direct placement into buffers might be able to do
216	   so with a properly designed local interface to SCTP or TCP.  Doing so
217	   for TCP requires making predictions at a byte level rather than a
218	   message level.

220	   The main benefit of DDP for such an application would be that pre-
221	   posting of receive buffers is a mandated local interface capability,
222	   and that predictions can be made on a per-message basis (not per
223	   byte).
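
	   As an illustration of per-message prediction, the following Python
	   sketch models a queue of pre-posted buffers for untagged messages.
	   The class and method names are hypothetical and are not taken from
	   the DDP specification; numbering messages from 1 is also an
	   assumption made for the example.

```python
class UntaggedReceiveQueue:
    """Hypothetical model of pre-posted buffers for untagged messages."""

    def __init__(self):
        self._next_msn = 1   # assumed: message sequence numbers start at 1
        self._posted = {}    # message sequence number -> posted buffer

    def post_receive(self, size):
        """The ULP predicts the size of one future message and posts a buffer."""
        msn = self._next_msn
        self._posted[msn] = bytearray(size)
        self._next_msn += 1
        return msn

    def place(self, msn, payload):
        """Place one arriving untagged message into its matching buffer."""
        if msn not in self._posted:
            raise LookupError("no buffer was posted for message %d" % msn)
        buf = self._posted[msn]
        if len(payload) > len(buf):
            raise ValueError("message is larger than the posted buffer")
        buf[:len(payload)] = payload
        del self._posted[msn]            # the buffer is consumed by this message
        return bytes(buf[:len(payload)])


queue = UntaggedReceiveQueue()
first = queue.post_receive(16)
second = queue.post_receive(64)
# Each message is matched to its buffer by sequence number, not by byte count.
late = queue.place(second, b"second message")
early = queue.place(first, b"first message")
```

	   The receiver only has to predict how many messages will arrive and
	   an upper bound on the size of each; it never has to predict where a
	   message boundary falls within a byte stream.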

225	   The LLP can also be used directly if ULP specific knowledge is built
226	   into the protocol stack to allow "parse and place" handling of
227	   received packets.  Such a solution either requires interaction with
228	   the ULP, or that the protocol stack have knowledge of ULP specific
229	   syntax rules.

231	   DDP achieves the benefits of directly placing incoming payload
232	   without requiring tight coupling between the ULP and the protocol
233	   stack.  However, "parse and place" capabilities can certainly provide
234	   equivalent services to a limited number of ULPs.

236	4. Tagged Messages

238	   This section covers the major benefits from the use of Tagged
239	   Messages.

241	   A more critical advantage of DDP is the ability of the Data Source to
242	   use tagged buffers.  Tagging transfers allows the Data Source to
243	   choose the ordering and packetization of its payload deliveries.
244	   With direct data placement based solely upon pre-posted receives, the
245	   packetization and delivery of payload must be agreed by the ULP
246	   peers.  Even if there is an encoding of what is being transferred, as
247	   is common with middleware solutions, this information is not
248	   understood at the application independent layers.  The directions on
249	   where to place the incoming data cannot be accessed without switching
250	   to the ULP first.  DDP provides a standardized 'packing list' which
251	   can be interpreted without requiring ULP interaction.  Indeed, it is
252	   designed to be implementable in hardware.

254	4.1 Order Independent Reception

256	   Tagged messages are directed to a buffer based on an included
257	   Steering Tag.  Additionally, no notice is provided to the ULP for
258	   each individual Tagged Message's arrival.  Together these allow
259	   tagged messages received out-of-order to be processed without
260	   intermediate buffering or additional notifications to the ULP.
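
	   A minimal Python sketch of this behavior follows.  It assumes each
	   tagged segment carries a Steering Tag, an offset and a payload; the
	   class name, the method names and the 8-byte segment size are
	   illustrative assumptions rather than anything defined by DDP.

```python
import random


class TaggedBuffers:
    """Hypothetical table of tagged buffers, keyed by Steering Tag."""

    def __init__(self):
        self._buffers = {}

    def register(self, stag, length):
        self._buffers[stag] = bytearray(length)

    def place(self, stag, offset, payload):
        """Place a segment directly; no ULP notification is generated."""
        buf = self._buffers[stag]
        end = offset + len(payload)
        if offset < 0 or end > len(buf):
            raise ValueError("placement outside the tagged buffer")
        buf[offset:end] = payload

    def contents(self, stag):
        return bytes(self._buffers[stag])


data = b"The quick brown fox jumps over the lazy dog."
segments = [(off, data[off:off + 8]) for off in range(0, len(data), 8)]
random.Random(0).shuffle(segments)      # simulate out-of-order arrival

sink = TaggedBuffers()
sink.register(stag=0x1234, length=len(data))
for offset, payload in segments:
    sink.place(0x1234, offset, payload)  # placed as received, no reordering
```

	   Because every segment names its own destination, the final buffer
	   contents do not depend on arrival order and no intermediate
	   reassembly buffer is needed.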

262	4.2 Reduced ULP Notifications

264	   RDMAP further reduces required ULP interactions by consolidating
265	   completion notifications of tagged messages with the completion
266	   notification of a trailing untagged message.  For most ULPs this
267	   radically reduces the number of required ULP interactions.

270	   While RDMAP consolidation of notices is beneficial to most
271	   applications, it may be detrimental to some applications that
272	   benefit from streamed delivery to enable ULP processing of received
273	   data as promptly as possible.  A ULP that uses RDMAP cannot begin
274	   processing any portion of an exchange until it receives notification
275	   that the entire exchange has been placed.  An "exchange" here is a
276	   set of zero or more tagged messages and a single terminating untagged
277	   message.  An application that would prefer to begin work on the
278	   received payload, no matter what order it arrived in, as soon as
279	   possible might prefer to work directly with the LLP.  RDMAP is
280	   optimized for applications that are more concerned with when the
281	   entire exchange is complete.

283	   An application that benefits from being able to begin processing of
284	   each received packet as quickly as possible may find RDMAP interferes
285	   with that goal.

287	   Such an application might be able to retain most of the benefits of
288	   RDMAP by using the DDP layer directly.  However, in addition to
289	   taking on the responsibilities of the RDMAP layer, the application
290	   would likely have more difficulty finding support for a DDP-only API.
291	   Many hardware implementations may choose to tightly couple RDMAP and
292	   DDP, and might not provide an API directly to DDP services.

294	   These features minimize the required interactions with the ULP.  This
295	   can be extremely beneficial for applications that use multiple
296	   transport layer packets to accomplish what is a single ULP
297	   interaction.
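
	   The effect can be made concrete with a small Python calculation.
	   The function below is a hypothetical helper, not part of RDMAP; it
	   compares a receiver that is notified for every message with one
	   that is notified only for untagged messages, and the figure of 64
	   tagged messages per exchange is an arbitrary assumption.

```python
def count_notifications(messages):
    """Return (per_message, consolidated) notification counts.

    `messages` is a sequence of the strings "tagged" and "untagged" in
    arrival order.  The first count models a notification per message;
    the second models notification on untagged messages only.
    """
    per_message = 0
    consolidated = 0
    for kind in messages:
        if kind not in ("tagged", "untagged"):
            raise ValueError("unknown message kind: %r" % (kind,))
        per_message += 1
        if kind == "untagged":
            consolidated += 1
    return per_message, consolidated


# Three exchanges, each made of 64 tagged messages and one untagged message.
exchange = ["tagged"] * 64 + ["untagged"]
per_message, consolidated = count_notifications(exchange * 3)
```

	   Under these assumptions the ULP is interrupted 3 times instead of
	   195 times for the same amount of transferred data.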

299	4.3 Simplified ULP Exchanges

301	   The notification rules for Tagged Messages allow ULPs to create
302	   multi-message "exchanges" of zero or more tagged messages and one
303	   untagged message that represent a single step in the ULP interaction.
304	   The receiving ULP is notified that the untagged message has arrived,
305	   and implicitly of any associated tagged messages.

307	   A ULP where all exchanges would naturally be only the untagged
308	   message would derive virtually no benefit from the use of RDMAP/DDP
309	   as opposed to SCTP.  But while tagged buffers are the justification
310	   for RDMAP/DDP, untagged buffers are still necessary.  Without
311	   untagged buffers the only method to exchange buffer advertisements
312	   would involve out-of-band communications and/or sharing of compile
313	   time constants.  Most RDMA-aware ULPs use untagged buffers for
314	   requests and responses.  Buffer advertisements are typically done
315	   within these untagged messages.

317	   Limiting use of untagged buffers to requests and responses by moving
318	   all bulk data using tagged transfers can greatly simplify the amount
319	   of prediction that the Data Sink must perform in pre-posting receive
320	   buffers.  For example, a typical RDMA enabled interaction would
321	   consist of the following:

323	      Client sends transaction request to the server as an untagged
324	      message.

326	      This message includes buffer advertisements for the buffers where
327	      the results are to be placed.

329	      The Server sends multiple tagged messages to the advertised
330	      buffers.

332	      The Server sends transaction reply as an untagged message to the
333	      client.

335	      Client receives single notification, indicating completion of the
336	      interaction.

338	   With this type of exchange the pacing and required size of untagged
339	   buffers is highly predictable.  The variability of response sizes is
340	   absorbed by tagged transfers.
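
	   The interaction above can be sketched in Python as follows.  The
	   Client class, the serve() function, the dictionary message layout
	   and the upper-casing "transaction" are all hypothetical stand-ins
	   chosen for illustration; a real ULP defines its own formats.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    buffers: dict = field(default_factory=dict)        # STag -> bytearray
    notifications: list = field(default_factory=list)  # ULP-visible events

    def build_request(self, query, stag, length):
        """Untagged request that carries a buffer advertisement."""
        self.buffers[stag] = bytearray(length)
        return {"query": query, "stag": stag, "length": length}

    def on_tagged(self, stag, offset, payload):
        """Tagged data is placed silently; the ULP is not notified."""
        buf = self.buffers[stag]
        if offset < 0 or offset + len(payload) > len(buf):
            raise ValueError("placement outside the advertised buffer")
        buf[offset:offset + len(payload)] = payload

    def on_untagged(self, reply):
        """Only the untagged reply produces a ULP notification."""
        self.notifications.append(reply)


def serve(request, client, chunk=4):
    """Server side: tagged writes for the result, then one untagged reply."""
    result = request["query"].upper().encode()
    if len(result) > request["length"]:
        raise ValueError("result does not fit the advertised buffer")
    for offset in range(0, len(result), chunk):
        client.on_tagged(request["stag"], offset, result[offset:offset + chunk])
    client.on_untagged({"status": "ok", "bytes": len(result)})


client = Client()
request = client.build_request("hello world", stag=7, length=32)
serve(request, client)
```

	   The untagged request and reply have small, predictable sizes, while
	   the variable-size result travels entirely in tagged messages.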

342	4.4 Order Independent Sending

344	   Use of tagged messages is especially applicable when the Data Sink
345	   does not know the actual size, structure or location of the content
346	   it is requesting (or updating).

348	   For example, suppose the Data Sink ULP needs to fetch four related
349	   pieces of data into four separate buffers.  With SCTP the Data Sink
350	   ULP could receive four messages into four separate buffers, only
351	   having to predict the maximum size of each.  However, it would have
352	   to dictate the order in which the Data Source supplied the separate
353	   pieces.  If the Data Source found it advantageous to fetch them in a
354	   different order it would have to use intermediate buffering to re-
355	   order the pieces into the expected order even though the application
356	   only required that all four be delivered and did not truly have an
357	   ordering requirement.

359	   Techniques such as RAID striping and mirroring represent this same
360	   problem, but one step further.  What appears to be a single resource
361	   to the Data Sink is actually stored in separate locations by the Data
362	   Source.  Non-RDMA protocols would either require the Data Source to
363	   fetch the material in the desired order or force the Data Source to
364	   use its own holding buffers to assemble an image of the destination
365	   buffer.

367	   While sometimes referred to as a "buffer-to-buffer" solution, RDMA
368	   more fundamentally enables remote buffer access.  The ULP is free to
369	   work with larger remote buffers than it has locally.  This reduces
370	   buffering requirements and the number of times the data must be
371	   copied in an end-to-end transfer.

373	   There are numerous reasons why the Data Sink would not know the true
374	   order or location of the requested data.  It could be different for
375	   each client, different records selected and/or different sort orders,
376	   RAID striping, file fragmentation, volume fragmentation, volume
377	   mirroring and server-side dynamic compositing of content (such as
378	   server side includes for HTTP).

380	   In all of these cases the Data Source is free to assemble the desired
381	   data in the Data Sink's buffer in whatever order the component data
382	   becomes available to it.  It is not constrained on ordering.  It does
383	   not have to assemble an image in its own memory before creating it in
384	   the Data Sink's buffers.
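
	   The cost of an imposed ordering can be estimated with a short
	   Python sketch.  The function name and the piece sizes are
	   hypothetical; the model assumes that a Data Source forced to send
	   in a fixed order must hold every piece that becomes available
	   before its turn.

```python
def peak_staging_bytes(sizes, available_order):
    """Peak bytes a Data Source must hold when delivery order is fixed.

    sizes[i] is the size of piece i, and pieces must be sent in index
    order.  available_order lists piece indices in the order in which
    they become available at the Data Source.
    """
    next_to_send = 0
    held = {}            # pieces waiting for an earlier piece to be sent
    peak = 0
    for index in available_order:
        held[index] = sizes[index]
        while next_to_send in held:      # send everything that is now in order
            del held[next_to_send]
            next_to_send += 1
        peak = max(peak, sum(held.values()))
    return peak


sizes = [10, 20, 30, 40]
worst_case = peak_staging_bytes(sizes, [3, 2, 1, 0])   # reverse availability
in_order = peak_staging_bytes(sizes, [0, 1, 2, 3])
```

	   With tagged messages the Data Source can write each piece to its
	   offset as soon as it is available, so the corresponding staging
	   requirement in this model is zero regardless of availability order.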

386	   Note that while DDP enables use of tagged messages for bulk transfer,
387	   there are some application scenarios where untagged messages would
388	   still be used for bulk transfer.  For example, under the Direct
389	   Access File Server (DAFS) protocol the file server does not expose
390	   its own memory to its clients.  A client wishing to write may
391	   advertise a buffer which the server will issue RDMA Reads upon.
392	   However, when performing a small write it may be preferable to
393	   include the data in the untagged message rather than incurring an
394	   additional round trip with the RDMA Read and its response.

396	4.5 Tagged Buffers as ULP Credits

398	   The handling of end-to-end buffer credits differs considerably
399	   between DDP and direct ULP use of either TCP or SCTP.

401	   With both TCP and SCTP, buffer credits are based upon the receiver
402	   granting transmit permission based on the total number of bytes.
403	   These credits reflect system buffering resources and/or simple flow
404	   control.  They do not represent ULP resources.

406	   DDP defines no standard flow control, but presumes the existence of a
407	   ULP mechanism.  The presumed mechanism is that the Data Sink ULP has
408	   issued credits to the Data Source allowing the Data Source to send a
409	   specific number of untagged messages.

411	   The ULP peers must ensure that the sender is aware of the maximum
412	   size that can be sent to any specific target buffer.  One method of
413	   doing so is to use a standard size for all untagged buffers within a
414	   given connection.  For example, DAFS specifies an initial size
415	   requirement for session establishment, during which the untagged
416	   buffer size for the remainder of the session is negotiated.

418	   Tagged buffers are ULP resources advertised directly from ULP to ULP.
419	   A DDP put to a known tagged buffer is constrained only by transport
420	   level flow control, not by available system buffering.

422	   Either tagged or untagged buffers allow bypassing of system buffer
423	   resources.  Use of tagged buffers additionally allows the Data Source
424	   to choose what order to exercise the credits in.

426	   To the extent allowed by the ULP, tagged buffers are also divisible
427	   resources.  The Data Sink can advertise a single 100 KB buffer, and
428	   then receive notifications from its peer that it had written 50 KB,
429	   20 KB and 30 KB to that buffer in three successive transactions.
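
	   The 100 KB example can be written out as a small Python sketch.
	   The AdvertisedBuffer class is hypothetical, and it assumes that the
	   peer fills the buffer sequentially; a ULP is free to define other
	   ways of dividing a tagged buffer.

```python
KB = 1024


class AdvertisedBuffer:
    """Hypothetical Data Sink bookkeeping for one advertised tagged buffer."""

    def __init__(self, stag, length):
        self.stag = stag
        self.length = length
        self.used = 0

    @property
    def remaining(self):
        return self.length - self.used

    def record_write(self, nbytes):
        """Record a peer notification that nbytes more were written.

        Returns the offset at which that portion starts (assumed to be
        sequential for this example).
        """
        if nbytes < 0 or nbytes > self.remaining:
            raise ValueError("write exceeds the advertised buffer")
        offset = self.used
        self.used += nbytes
        return offset


buf = AdvertisedBuffer(stag=1, length=100 * KB)
offsets = [buf.record_write(n * KB) for n in (50, 20, 30)]
```

	   A single advertisement thus serves three transactions, and the Data
	   Sink can tell from its own bookkeeping when the credit is exhausted.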

431	   ULP-management of tagged buffer resources, independent of transport
432	   and DDP layer credits, is an additional benefit of RDMA protocols.
433	   Large bulk transfers cannot be blocked by limited general purpose
434	   buffering capacity.  Applications can flow control based upon higher
435	   level abstractions, such as number of outstanding requests,
436	   independent of the amount of data that must be transferred.

438	   However, use of system buffering, as offered by direct use of the
439	   underlying transports, can be preferable under certain circumstances.

441	   One example would be when the number of target ULP buffers is
442	   sufficiently large, and the rate at which any writes arrive is
443	   sufficiently low, that pinning all the target ULP buffers in memory
444	   would be undesirable.  The maximum transfer rate, and hence the
445	   maximum amount of system buffering required, may be more stable and
446	   predictable than the total ULP buffer exposure.

448	   Another would be when the Data Sink wishes to receive a stream of
449	   data at a predictable rate, but does not know in advance what the
450	   size of each data packet will be.  This is common with streaming
451	   media that has been encoded with a variable bit rate.  With DDP the
452	   Data Sink would either have to use untagged buffers large enough for
453	   the largest packet, or advertise a circular buffer.  If for security
454	   or other reasons the Data Sink did not want the size of its buffer to
455	   be publicly known, using the underlying SCTP transport directly may
456	   be preferable because of its byte-oriented credits.

458	5. RDMA Read

460	   RDMA Reads are a further service provided by RDMAP.  RDMA Reads allow
461	   the Data Sink to fetch exactly the portion of the peer ULP buffer
462	   required on a "just in time" basis.  This can be done without
463	   requiring per-fetch support from the Data Source ULP.

465	   Storage servers may wish to limit the maximum write buffer allocated
466	   to any single session.  The storage server may be a very minimal
467	   layer between the client and the disk storage media, or the server
468	   may merely wish to limit the total resources that would be required
469	   if all clients could push the entire payload they wished written at
470	   their own convenience.

472	   In either case, there is little benefit in transferring data from the
473	   Data Source far in advance of when it will be written to the
474	   persistent storage media.  RDMA Reads allow the Storage Server to
475	   fetch the payload on a "just in time" basis.  In this fashion a
476	   relatively small number of block sized buffers can be used to execute
477	   a single transaction that specified writing a large file, or a
478	   Storage Server with numerous clients can fetch buffers from the
479	   individual clients in the order that is most convenient to the
480	   server.
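
	   A rough Python model of "just in time" fetching is shown below.
	   The rdma_read() function merely copies from a local byte string to
	   stand in for an RDMA Read of the client's advertised buffer, and
	   the function names, block size and pool size are assumptions made
	   for the example.

```python
def rdma_read(advertised, offset, length):
    """Stand-in for an RDMA Read of part of a peer's advertised buffer."""
    return bytes(advertised[offset:offset + length])


def store_with_rdma_reads(client_buffer, block_size, pool_blocks):
    """Write a client's buffer to 'disk' using a small pool of block buffers.

    Returns (disk_contents, peak_bytes_held_by_server).
    """
    disk = bytearray()
    peak_held = 0
    offset = 0
    total = len(client_buffer)
    while offset < total:
        held = []
        # Fetch only as many blocks as the server has buffers for.
        while len(held) < pool_blocks and offset < total:
            block = rdma_read(client_buffer, offset, block_size)
            held.append(block)
            offset += len(block)
        peak_held = max(peak_held, sum(len(b) for b in held))
        for block in held:               # write out, then reuse the buffers
            disk += block
    return bytes(disk), peak_held


client_data = bytes(range(256)) * 40     # 10240 bytes advertised by a client
disk, peak = store_with_rdma_reads(client_data, block_size=512, pool_blocks=2)
```

	   In this model the server never holds more than two blocks at a
	   time, however large the client's advertised buffer is.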

482	   This same capability can be used when the desired portion of the
483	   advertised buffer is not known in advance.  For example, the
484	   advertised buffer could contain performance statistics.  The Data
485	   Sink could request the portions of the data it required, without
486	   requiring an interaction with the Data Source ULP.

488	   This is applicable for many applications that publish semi-volatile
489	   data that does not require transactional validity checking (i.e.,
490	   authorized users have read access to the entire set of data).  It is
491	   less applicable when there are ULP consistency checks that must be
492	   performed upon the data.  Such applications would be better served by
493	   having the client send a request, and having the server use RDMA
494	   Writes to publish the requested data.  Neither RDMAP nor DDP provides
495	   mechanisms for bundling multiple disjoint updates into an atomic
496	   operation.  Therefore use of an advertised buffer as a data resource
497	   is subject to the same caveats as any randomly updated data resource,
498	   such as flat files, that do not enforce their own consistency.

500	6. LLP Comparisons

502	   Normally the choice of underlying IP transport is irrelevant to the
503	   ULP.  RDMAP and DDP provide the same services over either.  There
504	   may be performance impacts of the choice, however.  It is the
505	   responsibility of the ULP to determine which IP transport is best
506	   suited to its needs.

508	   SCTP provides for preservation of message boundaries.  Each DDP
509	   segment will be delivered within a single SCTP packet.  The
510	   equivalent services are only available with TCP through the use of
511	   the MPA adaptation layer.

513	6.1 Multistreaming Implications

515	   SCTP also provides multi-streaming.  When the same pair of hosts
516	   needs multiple DDP streams, this can be a major advantage.  A
517	   single SCTP association carries multiple DDP streams, consolidating
518	   connection setup and flow control.

520	   Completions are controlled by the DDP Source Sequence Number (DDP-
521	   SSN) on a per stream basis.  Therefore combining multiple DDP Streams
522	   into a single SCTP association cannot result in a dropped packet
523	   carrying data for one stream delaying completions on others.
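
   The per-stream behavior described above can be sketched as a toy
   model in Python.  The class name, the method name and the choice to
   start sequence numbers at 1 are assumptions for illustration; the
   sketch only shows that a gap in one stream's sequence does not hold
   back completions on another stream.

```python
from collections import defaultdict


class CompletionTracker:
    """Toy model: completions are ordered per DDP stream, not per association."""

    def __init__(self):
        # Next sequence number allowed to complete, per stream (assumed to start at 1).
        self.next_ssn = defaultdict(lambda: 1)
        # Sequence numbers already placed but not yet completed, per stream.
        self.pending = defaultdict(set)

    def message_placed(self, stream_id, ssn):
        """Record a fully placed message; return the SSNs that can now complete."""
        self.pending[stream_id].add(ssn)
        completed = []
        while self.next_ssn[stream_id] in self.pending[stream_id]:
            done = self.next_ssn[stream_id]
            self.pending[stream_id].remove(done)
            completed.append(done)
            self.next_ssn[stream_id] += 1
        return completed
```

   If the packet carrying message 1 of stream 1 is dropped, message 2 of
   stream 1 waits, but message 1 of stream 2 completes immediately.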

525	6.2 Out of Order Reception Implications

527	   The use of unordered Data Chunks with SCTP guarantees that the DDP
528	   layer will be able to perform placements when IP datagrams are
529	   received out of order.

531	   Placement of out-of-order DDP Segments carried over MPA/TCP is not
532	   guaranteed, but certainly allowed.  The ability of the MPA receiver
533	   to process out-of-order DDP Segments may be impaired when TCP
534	   alignment is lost.  Using SCTP, each DDP Segment is encoded in a
535	   single Data Chunk and never spread over multiple IP datagrams.
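
   A minimal sketch of why out-of-order placement is possible for tagged
   data: each segment names its own destination offset, so the receiver
   can copy it into the buffer without waiting for earlier segments.
   The function name and the use of a bytearray as the tagged buffer are
   assumptions of this example.

```python
def place_tagged(buffer, offset, payload):
    """Copy one segment's payload to its stated offset in the tagged buffer."""
    if offset < 0 or offset + len(payload) > len(buffer):
        raise ValueError("segment outside advertised buffer")
    buffer[offset:offset + len(payload)] = payload


def place_all(buffer, segments):
    """Place (offset, payload) segments in whatever order they arrive."""
    for offset, payload in segments:
        place_tagged(buffer, offset, payload)
    return buffer
```

   Placement order does not affect the final buffer contents; only the
   completion reported to the ULP has to wait for all segments.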

537	6.3 Header and Marker Overhead

539	   MPA and TCP headers together are smaller than the headers used by
540	   SCTP and its adaptation layer.  However, this advantage can be
541	   considerably reduced by the insertion of MPA markers.  In any event
542	   the difference in ULP payload per IP Datagram is not likely to be a
543	   significant factor.

545	   Even with the MPA adaptation layer, DDP traffic will appear to all
546	   network elements as a normal TCP connection.  In many environments
547	   there may be a requirement to use only TCP connections to satisfy
548	   existing network elements and/or to facilitate monitoring and control
549	   of connections.

551	   A DDP stream delivered via MPA/TCP will require more processing
552	   effort than one delivered over SCTP.  However this extra work may be
553	   justified for many deployments where full SCTP support is unavailable
554	   in the intermediate network.

556	6.4 Data Integrity Implications

558	   Both the SCTP and MPA/TCP adaptations provide end-to-end CRC32c
559	   protection against data corruption, or its equivalent.
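
   For readers unfamiliar with CRC32c, the following is a minimal
   bit-at-a-time Python sketch of the Castagnoli CRC (reflected
   polynomial 0x82F63B78).  It is written for clarity rather than speed,
   and it says nothing about how either adaptation layer frames the data
   it covers.

```python
def crc32c(data: bytes) -> int:
    """Compute CRC32c (Castagnoli) over data, one bit at a time."""
    crc = 0xFFFFFFFF                      # initial value: all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; fold in the reflected polynomial when a 1 falls out.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF               # final inversion
```

   The standard check value for the ASCII string "123456789" is
   0xE3069283.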

561	   A ULP that requires a greater degree of protection may add its own.
562	   However, DDP and RDMAP headers will only be guaranteed to have the
563	   equivalent of end-to-end CRC32c protection.  A ULP that requires data
564	   integrity checking more thorough than an end-to-end CRC32c should
565	   first invalidate all STags that reference a buffer before applying
566	   its own integrity check.

568	6.5 Non-IP Transports

570	   DDP is defined to operate over ubiquitous IP transports such as SCTP
571	   and TCP.  This enables a new DDP-enabled node to be added anywhere to
572	   an IP network.  No DDP-specific support from middle-boxes is
573	   required.

575	   There are non-IP transport fabrics offering RDMA capabilities.
576	   Because these capabilities are integrated with the transport protocol
577	   they have some technical advantages when compared to RDMA over IP.
578	   For example fencing of RDMA operations can be based upon transport
579	   level acks.  Because DDP is cleanly layered over an IP transport, any
580	   explicit RDMA layer ack must be separate from the transport layer
581	   ack.

583	   There may be deployments where the benefits of RDMA/transport
584	   integration outweigh the benefits of being on an IP network.

586	6.6 Other IP Transports

588	   Both TCP and SCTP provide DDP with reliable transport and
589	   TCP-friendly rate control.  As currently defined, DDP works over
590	   reliable transports and implicitly relies upon some form of rate
591	   control.

593	   DDP is fully compatible with a non-reliable protocol.  Out-of-order
594	   placement is obviously not dependent on whether the other DDP
595	   Segments ever actually arrive.

597	   However, RDMAP requires the LLP to provide reliable service.  An
598	   alternate completion handling protocol would be required if DDP were
599	   to be deployed over an unreliable IP transport.

601	   As noted in the prior section on tagged buffers as ULP credits,
602	   neither RDMAP nor DDP provides any flow control for tagged messages.
603	   If no transport layer flow control is provided, an RDMAP/DDP
604	   application would be limited only by the link layer rate, almost
605	   inevitably resulting in severe network congestion.

607	   RDMAP encourages applications to be ignorant of the underlying
608	   transport PMTU.  The ULP is only notified when all messages ending in
609	   a single untagged message have completed.  The ULP is not aware of
610	   the granularity or ordering of the underlying message.  This approach
611	   assumes that the ULP is only interested in the complete set of
612	   messages, and has no use for a subset of them.

614	7. Local Interface Implications

616	   Full utilization of DDP and RDMAP capabilities requires a local
617	   interface that explicitly requests these services.  Protocols such as
618	   Sockets Direct Protocol (SDP) can allow applications to keep their
619	   traditional byte-stream or message-stream interface and still enjoy
620	   many of the benefits of the optimized wire level  protocols.

622	8. Security considerations

624	8.1 Connection/Association Setup

626	   Both the SCTP and TCP adaptations allow for existing procedures to be
627	   followed for the establishment of the SCTP association or TCP
628	   connection.  Use of DDP does not impair the use of any security
629	   measures to filter, validate and/or log the remote end of an
630	   association/connection.

632	8.2 Tagged Buffer Exposure

634	   DDP only exposes ULP memory to the extent explicitly allowed by ULP
635	   actions.  These include posting of receive operations and enabling of
636	   Steering Tags.

638	   Neither RDMAP nor DDP places requirements on how ULPs advertise
639	   buffers.  A ULP may use a single Steering Tag for multiple buffer
640	   advertisements.  However, the ULP should be aware that enforcement of
641	   STag usage is likely limited to the overall range that is enabled.
642	   If the remote peer writes into the 'wrong' advertised buffer, neither
643	   the DDP nor the RDMAP layer will be aware of this.  Nor is there any
644	   report to the ULP on how the remote peer specifically used tagged
645	   buffers.

647	   Unless the ULP peers have an adequate basis for mutual trust, the
648	   receiving ULP might be well advised to use a distinct STag for each
649	   interaction, and to invalidate it after each use or to require its
650	   peer to use the RDMAP option to invalidate the STag with its
651	   responding untagged message.
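
   The recommendation above can be sketched as a small Python model in
   which every interaction gets its own STag and the tag is invalidated
   after use.  The STagTable class, its method names and the random
   32-bit tag values are assumptions for illustration; real STag
   allocation is a local implementation matter.

```python
import secrets


class STagTable:
    """Toy model of one-STag-per-interaction buffer exposure."""

    def __init__(self):
        self._enabled = {}                # STag -> exposed buffer

    def advertise(self, buffer):
        """Enable a fresh STag covering exactly one buffer."""
        stag = secrets.randbits(32)
        while stag in self._enabled:      # avoid reusing a live tag
            stag = secrets.randbits(32)
        self._enabled[stag] = buffer
        return stag

    def write(self, stag, offset, payload):
        """Tagged write: allowed only while the STag is enabled and in range."""
        buffer = self._enabled.get(stag)
        if buffer is None:
            raise PermissionError("STag not enabled")
        if offset < 0 or offset + len(payload) > len(buffer):
            raise ValueError("write outside the range enabled for this STag")
        buffer[offset:offset + len(payload)] = payload

    def invalidate(self, stag):
        """Withdraw access once the interaction is finished."""
        self._enabled.pop(stag, None)
```

   Because each STag covers a single buffer, a write aimed at the
   'wrong' buffer fails the range or validity check instead of silently
   landing in another advertisement.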

653	8.3 Impact of Encrypted Transports

655	   While DDP is cleanly layered over the LLP, its maximum benefit may be
656	   limited when the LLP Stream is secured with a streaming cipher, such
657	   as Transport Layer Security (TLS).  If the LLP must decrypt in order,
658	   it cannot provide out-of-order DDP Segments to the DDP layer for
659	   placement purposes.  IPsec tunnel mode encrypts entire IP Datagrams.
660	   IPsec transport mode encrypts TCP Segments or SCTP packets.  In
661	   neither case should IPsec preclude providing out-of-order DDP
662	   Segments to the DDP layer for placement.

664	   Note that end-to-end use of IPsec cryptographic integrity protection
665	   may allow suppression of MPA CRC generation and checking under
666	   certain circumstances.  This is one example where the LLP may be
667	   judged to have "or equivalent" protection to an end-to-end CRC32c.

669	References

671	   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
672	        Levels", BCP 14, RFC 2119, March 1997.

674	   [2]  Dierks, T., Allen, C., Treese, W., Karlton, P., Freier, A. and
675	        P. Kocher, "The TLS Protocol Version 1.0", RFC 2246, January
676	        1999.

678	   [3]  Kent, S. and R. Atkinson, "IP Encapsulating Security Payload
679	        (ESP)", RFC 2406, November 1998.

681	   [4]  Stewart, R., Xie, Q., Morneault, K., Sharp, C., Schwarzbauer,
682	        H., Taylor, T., Rytina, I., Kalla, M., Zhang, L. and V. Paxson,
683	        "Stream Control Transmission Protocol", RFC 2960, October 2000.

685	   [5]  Coene, L., "Stream Control Transmission Protocol Applicability
686	        Statement", RFC 3257, April 2002.

688	   [6]  Recio, R., "An RDMA Protocol Specification", draft-ietf-rddp-
689	        rdmap-00 (work in progress), February 2003.

691	   [7]  Shah, H., "Direct Data Placement over Reliable Transports",
692	        draft-ietf-rddp-ddp-00 (work in progress), February 2003.

694	   [8]  Stewart, R., "Stream Control Transmission Protocol (SCTP) Remote
695	        Direct Memory Access  (RDMA) Direct Data Placement (DDP)
696	        Adaption", draft-stewart-rddp-sctp-02 (work in progress),
697	        February 2003.

699	   [9]  Culley, P., "Marker PDU Aligned Framing for TCP Specification",
700	        draft-culley-iwarp-mpa-02 (work in progress), February 2003.

702	Authors' Addresses

704	   Caitlin Bestler
705	   1241 W. North Shore
706	   # 2G
707	   Chicago, IL  60626
708	   USA

710	   Phone: +1-773-743-1594
711	   EMail: cait@asomi.com
712	   Lode Coene
713	   Atealaan 26
714	   Herentals,   2200
715	   Belgium

717	   Phone: +32-14-252081
718	   EMail: lode.coene@siemens.com

720	Full Copyright Statement

722	   Copyright (C) The Internet Society (2003).  All Rights Reserved.

724	   This document and translations of it may be copied and furnished to
725	   others, and derivative works that comment on or otherwise explain it
726	   or assist in its implementation may be prepared, copied, published
727	   and distributed, in whole or in part, without restriction of any
728	   kind, provided that the above copyright notice and this paragraph are
729	   included on all such copies and derivative works.  However, this
730	   document itself may not be modified in any way, such as by removing
731	   the copyright notice or references to the Internet Society or other
732	   Internet organizations, except as needed for the purpose of
733	   developing Internet standards in which case the procedures for
734	   copyrights defined in the Internet Standards process must be
735	   followed, or as required to translate it into languages other than
736	   English.

738	   The limited permissions granted above are perpetual and will not be
739	   revoked by the Internet Society or its successors or assigns.

741	   This document and the information contained herein is provided on an
742	   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
743	   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
744	   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
745	   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
746	   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

748	Acknowledgement

750	   Funding for the RFC Editor function is currently provided by the
751	   Internet Society.