idnits 2.17.1 

draft-ietf-rddp-applicability-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 109 instances of too long lines in the document, the longest
     one being 1 character in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 340 has weird spacing: '...r sends  multi...'

  == Line 424 has weird spacing: '...g so is  to us...'

  == Line 426 has weird spacing: '...ich the  untag...'

  == Line 445 has weird spacing: '...control  based...'

  == Line 759 has weird spacing: '...e level  proto...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 13, 2003) is 7494 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '1' is defined on line 810, but no explicit reference
     was found in the text

  == Unused Reference: '2' is defined on line 813, but no explicit reference
     was found in the text

  == Unused Reference: '3' is defined on line 817, but no explicit reference
     was found in the text

  == Unused Reference: '4' is defined on line 820, but no explicit reference
     was found in the text

  == Unused Reference: '5' is defined on line 824, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 827, but no explicit reference
     was found in the text

  == Unused Reference: '7' is defined on line 830, but no explicit reference
     was found in the text

  == Unused Reference: '8' is defined on line 833, but no explicit reference
     was found in the text

  == Unused Reference: '9' is defined on line 838, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 2246 (ref. '2') (Obsoleted by RFC 4346)

  ** Obsolete normative reference: RFC 2406 (ref. '3') (Obsoleted by RFC
     4303, RFC 4305)

  ** Obsolete normative reference: RFC 2960 (ref. '4') (Obsoleted by RFC 4960)

  ** Downref: Normative reference to an Informational RFC: RFC 3257 (ref. '5')

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-rdmap-00

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-ddp-00

  == Outdated reference: A later version (-07) exists of
     draft-ietf-rddp-sctp-00

  == Outdated reference: A later version (-08) exists of
     draft-ietf-rddp-mpa-00


     Summary: 8 errors (**), 0 flaws (~~), 21 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Remote Direct Data Placement                                  C. Bestler
3	Working group                                                   L. Coene
4	Internet-Draft                                          October 13, 2003
5	Expires: April 12, 2004

7	     Applicability of Remote Direct Memory Access Protocol (RDMA) and
8	                       Direct Data Placement (DDP)
9	                   draft-ietf-rddp-applicability-01.txt

11	Status of this Memo

13	    This document is an Internet-Draft and is in full conformance with
14	    all provisions of Section 10 of RFC2026.

16	    Internet-Drafts are working documents of the Internet Engineering
17	    Task Force (IETF), its areas, and its working groups.  Note that
18	    other groups may also distribute working documents as Internet-
19	    Drafts.

21	    Internet-Drafts are draft documents valid for a maximum of six months
22	    and may be updated, replaced, or obsoleted by other documents at any
23	    time.  It is inappropriate to use Internet-Drafts as reference
24	    material or to cite them other than as "work in progress."

26	    The list of current Internet-Drafts can be accessed at http://
27	    www.ietf.org/ietf/1id-abstracts.txt.

29	    The list of Internet-Draft Shadow Directories can be accessed at
30	    http://www.ietf.org/shadow.html.

32	    This Internet-Draft will expire on April 12, 2004.

34	Copyright Notice

36	    Copyright (C) The Internet Society (2003).  All Rights Reserved.

38	Abstract

40	    This document describes the applicability of Remote Direct Memory
41	    Access Protocol (RDMAP) and the Direct Data Placement Protocol
42	    (DDP).  It contrasts the different transport options over IP that DDP
43	    can use, compares use of DDP with direct use of the supporting
44	    transports, and compares DDP over IP transports with non-IP
45	    transports that support RDMA functionality.

47	Table of Contents

49	    1.    Introduction . . . . . . . . . . . . . . . . . . . . . . . .  3
50	    2.    Definitions  . . . . . . . . . . . . . . . . . . . . . . . .  5
51	    3.    Direct Placement . . . . . . . . . . . . . . . . . . . . . .  6
52	    3.1   Fewer Required ULP Interactions  . . . . . . . . . . . . . .  6
53	    3.2   Direct Placement using only the LLP  . . . . . . . . . . . .  6
54	    4.    Tagged Messages  . . . . . . . . . . . . . . . . . . . . . .  8
55	    4.1   Order Independent Reception  . . . . . . . . . . . . . . . .  8
56	    4.2   Reduced ULP Notifications  . . . . . . . . . . . . . . . . .  8
57	    4.3   Simplified ULP Exchanges . . . . . . . . . . . . . . . . . .  9
58	    4.4   Order Independent Sending  . . . . . . . . . . . . . . . . . 10
59	    4.5   Tagged Buffers as ULP Credits  . . . . . . . . . . . . . . . 11
60	    5.    RDMA Read  . . . . . . . . . . . . . . . . . . . . . . . . . 13
61	    6.    LLP Comparisons  . . . . . . . . . . . . . . . . . . . . . . 14
62	    6.1   Multistreaming Implications  . . . . . . . . . . . . . . . . 14
63	    6.2   Out of Order Reception Implications  . . . . . . . . . . . . 14
64	    6.3   Header and Marker Overhead . . . . . . . . . . . . . . . . . 14
65	    6.4   Middlebox Support  . . . . . . . . . . . . . . . . . . . . . 14
66	    6.5   Processing Overhead  . . . . . . . . . . . . . . . . . . . . 15
67	    6.6   Data Integrity Implications  . . . . . . . . . . . . . . . . 15
68	    6.6.1 MPA/TCP Specifics  . . . . . . . . . . . . . . . . . . . . . 15
69	    6.6.2 SCTP Specifics . . . . . . . . . . . . . . . . . . . . . . . 16
70	    6.7   Non-IP Transports  . . . . . . . . . . . . . . . . . . . . . 16
71	    6.7.1 No RDMA Layer Ack  . . . . . . . . . . . . . . . . . . . . . 16
72	    6.8   Other IP Transports  . . . . . . . . . . . . . . . . . . . . 17
73	    6.9   LLP Independent Session Establishment  . . . . . . . . . . . 17
74	    6.9.1 RDMA-only Session Establishment  . . . . . . . . . . . . . . 18
75	    6.9.2 RDMA-Conditional Session Establishment . . . . . . . . . . . 18
76	    7.    Local Interface Implications . . . . . . . . . . . . . . . . 20
77	    8.    Security considerations  . . . . . . . . . . . . . . . . . . 21
78	    8.1   Connection/Association Setup . . . . . . . . . . . . . . . . 21
79	    8.2   Tagged Buffer Exposure . . . . . . . . . . . . . . . . . . . 21
80	    8.3   Impact of Encrypted Transports . . . . . . . . . . . . . . . 21
81	          References . . . . . . . . . . . . . . . . . . . . . . . . . 22
82	          Authors' Addresses . . . . . . . . . . . . . . . . . . . . . 22
83	          Full Copyright Statement . . . . . . . . . . . . . . . . . . 24

85	1. Introduction

87	    Remote Direct Memory Access Protocol (RDMAP) and Direct Data
88	    Placement (DDP) work together to provide application independent
89	    efficient placement of application payload directly into buffers
90	    specified by the Upper Layer Protocol (ULP).

92	    The DDP protocol is responsible for direct placement of received
93	    payload into ULP specified buffers.  The RDMAP protocol provides
94	    completion notifications to the ULP and support for Data Sink
95	    initiated fetch of advertised buffers (RDMA Reads).

97	    DDP and RDMAP are both application independent protocols which allow
98	    the ULP to perform remote direct data placement.  DDP can use
99	    multiple standard IP transports including SCTP and TCP.

101	    By clarifying the situations where the functionality of these
102	    protocols is applicable, this document can guide implementers,
103	    application designers and protocol designers in selecting which
104	    protocols to use.

106	    The applicability of RDMAP/DDP is driven by their unique
107	    capabilities:

109	    o  The existence of an application independent protocol allows common
110	       solutions to be implemented in hardware and/or the kernel.  This
111	       document will discuss when common data placement procedures are of
112	       the greatest benefit to applications as contrasted with
113	       application specific solutions built on top of direct use of the
114	       underlying transport.

116	    o  DDP supports both untagged and tagged buffers.  Tagged buffers
117	       allow the Data Sink ULP to be indifferent to what order (or in
118	       what packets) the Data Source sent the data, or what order they
119	       are received in.  This document will discuss when Data Source
120	       flexibility is of benefit to applications.

122	    o  RDMAP consolidates ULP notifications, thereby minimizing the
123	       number of required ULP interactions.

125	    o  RDMAP defines RDMA Reads, which allow remote access to advertised
126	       buffers.  This document will review the advantages of using RDMA
127	       Reads as contrasted to alternate solutions.

129	    Some non-IP transports, such as InfiniBand, directly integrate RDMA
130	    features.  This document will review the applicability of providing
131	    RDMA services over ubiquitous IP transports as opposed to the use of
132	    customized transport protocols.  Because DDP is defined cleanly
133	    as a layer over existing IP transports, DDP has simpler ordering
134	    rules than some prior RDMA protocols.  This may have some
135	    implications for application designers.

137	    The capabilities of DDP and RDMAP can only be fully realized by
138	    applications that are designed to exploit them.  The co-existence of
139	    RDMAP/DDP aware local interfaces with traditional socket interfaces
140	    will also be explored.

142	    Finally, DDP support is defined for at least two IP transports: SCTP
143	    and TCP.  The rationale for supporting both transports is reviewed,
144	    as well as when each would be the appropriate selection.

146	2. Definitions

148	    Advertisement - the act of informing a Remote Peer that a local RDMA
149	       Buffer is available to it.  A Node makes available an RDMA Buffer
150	       for incoming RDMA Read or RDMA Write access by informing its RDMA/
151	       DDP peer of the Tagged Buffer identifiers (STag, base address, and
152	       buffer length).  This advertisement of Tagged Buffer information
153	       is not defined by RDMA/DDP and is left to the ULP.  A typical
154	       method would be for the Local Peer to embed the Tagged Buffer's
155	       Steering Tag, base address, and length in a Send Message destined
156	       for the Remote Peer.

158	    Data Sink - The peer receiving a data payload.  Note that the Data
159	       Sink can be required to both send and receive RDMA/DDP Messages to
160	       transfer a data payload.

162	    Data Source - The peer sending a data payload.  Note that the Data
163	       Source can be required to both send and receive RDMA/DDP Messages
164	       to transfer a data payload.

166	    Lower Layer Protocol (LLP) - The transport protocol that provides
167	       services to DDP.  This is an IP transport with any required
168	       adaptation layer.  Adaptation layers are defined for SCTP and TCP.

170	    Steering Tag (STag) - An identifier of a Tagged Buffer on a Node,
171	       valid as defined within a protocol specification.

173	    Tagged Message - A DDP message that is directed to a ULP specified
174	       buffer based upon embedded addressing information.  In the
175	       immediate sense, the destination buffer is specified by the
176	       message sender.

178	    Untagged Message - A DDP message that is directed to a ULP specified
179	       buffer based upon a Message Sequence Number being matched with a
180	       receiver supplied buffer.  The destination buffer is specified by
181	       the message receiver.

183	    Upper Layer Protocol (ULP) - The direct user of RDMAP/DDP services.
184	       This may be an application, or a middleware layer such as Sockets
185	       Direct Protocol (SDP) or Remote Procedure Calls (RPC).
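
    As an illustration of the Advertisement definition above, the
    following Python sketch shows one way a ULP might embed a Tagged
    Buffer's Steering Tag, base address and length in the payload of a
    Send Message.  The field widths and byte order are assumptions made
    for this example only; RDMA/DDP leaves the advertisement format to
    the ULP, and the class name is hypothetical.

```python
import struct
from dataclasses import dataclass

# Hypothetical ULP wire layout (an assumption, not defined by RDMA/DDP):
# 32-bit STag, 64-bit base address, 32-bit length, network byte order.
_ADVERT = struct.Struct("!IQI")


@dataclass(frozen=True)
class Advertisement:
    stag: int      # Steering Tag identifying the Tagged Buffer
    base: int      # base address of the buffer
    length: int    # buffer length in bytes

    def pack(self) -> bytes:
        """Encode the advertisement for inclusion in a Send Message."""
        return _ADVERT.pack(self.stag, self.base, self.length)

    @classmethod
    def unpack(cls, payload: bytes) -> "Advertisement":
        """Decode an advertisement from the start of a received payload."""
        return cls(*_ADVERT.unpack(payload[:_ADVERT.size]))


adv = Advertisement(stag=0x1234, base=0x10000, length=4096)
wire = adv.pack()                      # carried inside a Send Message
received = Advertisement.unpack(wire)  # what the Remote Peer decodes
```

    The Remote Peer recovers the same three identifiers it needs to
    target the buffer with later tagged transfers.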

187	3. Direct Placement

189	    Direct Data Placement optimizes the placement of ULP payload into the
190	    correct destination buffers, typically eliminating intermediate
191	    copying.  Placement is enabled without regard to order of arrival,
192	    order of transmission or requiring per-placement interaction with the
193	    ULP.

195	    RDMAP minimizes the required ULP interactions.  This capability is
196	    most valuable for applications that require multiple transport layer
197	    packets for each required ULP interaction.

199	3.1 Fewer Required ULP Interactions

201	    While reducing the number of required ULP interactions is in itself
202	    desirable, it is critical for high speed connections.  The burst
203	    packet rate for a high speed interface could easily exceed the host
204	    system's ability to switch ULP contexts.

206	    Content access applications are primary examples of applications with
207	    both high bandwidth and high ratios of content to required ULP
208	    interactions.  These applications include file access protocols
209	    (NAS), storage access (SAN), database access and other application
210	    specific forms of content access such as HTTP, XML and email.

212	3.2 Direct Placement using only the LLP

214	    Direct data placement can be achieved without RDMA.  Pre-posting of
215	    receive buffers could allow a non-RDMA network stack to place data
216	    directly to user buffers.

218	    The degree to which DDP optimizes depends on which transport is being
219	    compared with, and on the nature of the local interface.  Without
220	    RDMAP/DDP, pre-posting buffers requires the receiving side to
221	    accurately predict the required buffers and their sizes.  This is not
222	    feasible for all ULPs.  By contrast, DDP only requires the ULP to
223	    predict the sequence and size of incoming untagged messages.

225	    An application that could predict incoming messages and required
226	    nothing more than direct placement into buffers might be able to do
227	    so with a properly designed local interface to SCTP or TCP.  Doing so
228	    for TCP requires making predictions at a byte level rather than a
229	    message level.

231	    The main benefit of DDP for such an application would be that pre-
232	    posting of receive buffers is a mandated local interface capability,
233	    and that predictions can be made on a per-message basis (not per
234	    byte).
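
    The per-message prediction described above can be sketched in Python
    as a queue of pre-posted receive buffers matched by Message Sequence
    Number (MSN).  This is a minimal model under stated assumptions: the
    class and method names are hypothetical, and numbering MSNs from 1 is
    a choice made for the example rather than something this document
    specifies.

```python
class UntaggedQueue:
    """Receiver-side sketch: pre-posted buffers matched by MSN."""

    def __init__(self):
        self.posted = {}     # MSN -> pre-posted bytearray
        self.next_msn = 1    # MSN assigned to the next posted buffer

    def post_receive(self, size):
        """ULP predicts one incoming message and its maximum size."""
        msn = self.next_msn
        self.posted[msn] = bytearray(size)
        self.next_msn += 1
        return msn

    def place(self, msn, payload):
        """Place an untagged message into the buffer posted for its MSN."""
        buf = self.posted.get(msn)
        if buf is None:
            raise LookupError(f"no buffer posted for MSN {msn}")
        if len(payload) > len(buf):
            raise ValueError("message larger than pre-posted buffer")
        buf[:len(payload)] = payload
        return buf


queue = UntaggedQueue()
first = queue.post_receive(16)
second = queue.post_receive(16)
# Arrival order does not matter: the MSN selects the buffer.
queue.place(second, b"reply-2")
queue.place(first, b"reply-1")
```

    The receiver predicts only how many messages will arrive and how
    large each may be; it never has to reason about byte positions in a
    stream.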

236	    The LLP can also be used directly if ULP specific knowledge is built
237	    into the protocol stack to allow "parse and place" handling of
238	    received packets.  Such a solution either requires interaction with
239	    the ULP, or that the protocol stack have knowledge of ULP specific
240	    syntax rules.

242	    DDP achieves the benefits of directly placing incoming payload
243	    without requiring tight coupling between the ULP and the protocol
244	    stack.  However, "parse and place" capabilities can certainly provide
245	    equivalent services to a limited number of ULPs.

247	4. Tagged Messages

249	    This section covers the major benefits from the use of Tagged
250	    Messages.

252	    A more critical advantage of DDP is the ability of the Data Source to
253	    use tagged buffers.  Tagging messages allows the Data Source to
254	    choose the ordering and packetization of its payload deliveries.
255	    With direct data placement based solely upon pre-posted receives, the
256	    packetization and delivery of payload must be agreed by the ULP peers
257	    in advance.  Even if there is an encoding of what is being
258	    transferred, as is common with middleware solutions, this information
259	    is not understood at the application independent layers.  The
260	    directions on where to place the incoming data cannot be accessed
261	    without switching to the ULP first.  DDP provides a standardized
262	    'packing list' which can be interpreted without requiring ULP
263	    interaction.  Indeed, it is designed to be implementable in hardware.

265	4.1 Order Independent Reception

267	    Tagged messages are directed to a buffer based on an included
268	    Steering Tag.  Additionally, no notice is provided to the ULP for
269	    each individual Tagged Message's arrival.  Together these allow
270	    tagged messages received out-of-order to be processed without
271	    intermediate buffering or additional notifications to the ULP.
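
    A minimal Python sketch of order independent placement follows.  It
    assumes each tagged message carries a Steering Tag and an offset into
    the buffer that tag names; the class and method names are
    hypothetical and the model ignores registration and access rights.

```python
class TaggedBuffers:
    """Sketch of tagged placement keyed by Steering Tag."""

    def __init__(self):
        self.buffers = {}    # STag -> bytearray registered by the ULP

    def register(self, stag, size):
        self.buffers[stag] = bytearray(size)

    def place(self, stag, offset, payload):
        """Place payload directly; no ULP notification is generated."""
        buf = self.buffers[stag]          # KeyError if the STag is unknown
        end = offset + len(payload)
        if offset < 0 or end > len(buf):
            raise ValueError("placement outside the tagged buffer")
        buf[offset:end] = payload


sink = TaggedBuffers()
sink.register(stag=7, size=11)
# The second half arrives first; no intermediate buffering is needed.
sink.place(stag=7, offset=6, payload=b"world")
sink.place(stag=7, offset=0, payload=b"hello ")
result = bytes(sink.buffers[7])
```

    Because each message names its own destination, the final buffer
    contents are the same for any arrival order.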

273	4.2 Reduced ULP Notifications

275	    RDMAP further reduces required ULP interactions by consolidating
276	    completion notifications of tagged messages with the completion
277	    notification of a trailing untagged message.  For most ULPs this
278	    radically reduces the number of required ULP interactions.

281	    While RDMAP consolidation of notices is beneficial to most
282	    applications, it may be detrimental to some applications that
283	    benefit from streamed delivery to enable ULP processing of received
284	    data as promptly as possible.  A ULP that uses RDMAP cannot begin
285	    processing any portion of an exchange until it receives notification
286	    that the entire exchange has been placed.  An "exchange" here is a
287	    set of zero or more tagged messages and a single terminating untagged
288	    message.  An application that would prefer to begin work on the
289	    received payload, no matter what order it arrived in, as soon as
290	    possible might prefer to work directly with the LLP.  RDMAP is
291	    optimized for applications that are more concerned with when the
292	    entire exchange is complete.

294	    An application that benefits from being able to begin processing of
295	    each received packet as quickly as possible may find RDMAP interferes
296	    with that goal.

298	    Such an application might be able to retain most of the benefits of
299	    RDMAP by using the DDP layer directly.  However, in addition to
300	    taking on the responsibilities of the RDMAP layer, the application
301	    would likely have more difficulty finding support for a DDP-only API.
302	    Many hardware implementations may choose to tightly couple RDMAP and
303	    DDP, and might not provide an API directly to DDP services.

305	    These features minimize the required interactions with the ULP.  This
306	    can be extremely beneficial for applications that use multiple
307	    transport layer packets to accomplish what is a single ULP
308	    interaction.

310	4.3 Simplified ULP Exchanges

312	    The notification rules for Tagged Messages allow ULPs to create
313	    multi-message "exchanges" consisting of zero or more tagged messages
314	    that represent a single step in the ULP interaction.  The receiving
315	    ULP is notified that the untagged message has arrived, and implicitly
316	    of any associated tagged messages.

318	    A ULP where all exchanges would naturally be only the untagged
319	    message would derive virtually no benefit from the use of RDMAP/DDP
320	    as opposed to SCTP.  But while tagged buffers are the justification
321	    for RDMAP/DDP, untagged buffers are still necessary.  Without
322	    untagged buffers the only method to exchange buffer advertisements
323	    would involve out-of-band communications and/or sharing of compile
324	    time constants.  Most RDMA-aware ULPs use untagged buffers for
325	    requests and responses.  Buffer advertisements are typically done
326	    within these untagged messages.

328	    Limiting use of untagged buffers to requests and responses by moving
329	    all bulk data using tagged transfers can greatly simplify the amount
330	    of prediction that the Data Sink must perform in pre-posting receive
331	    buffers.  For example, a typical RDMA enabled interaction would
332	    consist of the following:

334	       Client sends transaction request to the server as an untagged
335	       message.

337	       This message includes buffer advertisements for the buffers where
338	       the results are to be placed.

340	       The Server sends multiple tagged messages to the advertised
341	       buffers.

343	       The Server sends transaction reply as an untagged message to the
344	       client.

346	       Client receives single notification, indicating completion of the
347	       interaction.

349	    With this type of exchange the pacing and required size of untagged
350	    buffers is highly predictable.  The variability of response sizes is
351	    absorbed by tagged transfers.
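
    The interaction above can be modeled in a few lines of Python to show
    where the single notification arises.  This is a sketch under
    simplifying assumptions: one advertised result buffer, no bounds
    checking, and hypothetical class and method names.

```python
class Client:
    """Sketch of the client side of one RDMA enabled exchange."""

    def __init__(self, result_size):
        self.result = bytearray(result_size)  # advertised tagged buffer
        self.notifications = []               # what the client ULP sees

    def on_tagged(self, offset, payload):
        # Placed silently: tagged messages do not notify the ULP.
        # (Bounds checks are omitted in this sketch.)
        self.result[offset:offset + len(payload)] = payload

    def on_untagged(self, reply):
        # The only ULP notification; it implicitly covers the tagged
        # messages that belong to the same exchange.
        self.notifications.append(reply)


client = Client(result_size=8)
# Server side of the exchange: bulk data as tagged messages, then the
# transaction reply as an untagged message.
client.on_tagged(4, b"BBBB")
client.on_tagged(0, b"AAAA")
client.on_untagged(b"status=ok")
```

    However many tagged messages carry the results, the client ULP is
    interrupted once, when the untagged reply arrives.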

353	4.4 Order Independent Sending

355	    Use of tagged messages is especially applicable when the Data Sink
356	    does not know the actual size, structure or location of the content
357	    it is requesting (or updating).

359	    For example, suppose the Data Sink ULP needs to fetch four related
360	    pieces of data into four separate buffers.  With SCTP the Data Sink
361	    ULP could receive four messages into four separate buffers, only
362	    having to predict the maximum size of each.  However it would have to
363	    dictate the order in which the Data Source supplied the separate
364	    pieces.  If the Data Source found it advantageous to fetch them in a
365	    different order it would have to use intermediate buffering to re-
366	    order the pieces into the expected order even though the application
367	    only required that all four be delivered and did not truly have an
368	    ordering requirement.

370	    Techniques such as RAID striping and mirroring represent this same
371	    problem, but one step further.  What appears to be a single resource
372	    to the Data Sink is actually stored in separate locations by the Data
373	    Source.  Non-RDMA protocols would either require the Data Source to
374	    fetch the material in the desired order or force the Data Source to
375	    use its own holding buffers to assemble an image of the destination
376	    buffer.

378	    While sometimes referred to as a "buffer-to-buffer" solution, RDMA
379	    more fundamentally enables remote buffer access.  The ULP is free to
380	    work with larger remote buffers than it has locally.  This reduces
381	    buffering requirements and the number of times the data must be
382	    copied in an end-to-end transfer.

384	    There are numerous reasons why the Data Sink would not know the true
385	    order or location of the requested data.  It could be different for
386	    each client, different records selected and/or different sort orders,
387	    RAID striping, file fragmentation, volume fragmentation, volume
388	    mirroring and server-side dynamic compositing of content (such as
389	    server side includes for HTTP).

391	    In all of these cases the Data Source is free to assemble the desired
392	    data in the Data Sink's buffer in whatever order the component data
393	    becomes available to it.  It is not constrained on ordering.  It does
394	    not have to assemble an image in its own memory before creating it in
395	    the Data Sink's buffers.

397	    Note that while DDP enables use of tagged messages for bulk transfer,
398	    there are some application scenarios where untagged messages would
399	    still be used for bulk transfer.  For example, under the Direct
400	    Access File Server (DAFS) protocol the file server does not expose
401	    its own memory to its clients.  A client wishing to write may
402	    advertise a buffer which the server will issue RDMA Reads upon.
403	    However, when performing a small write it may be preferable to
404	    include the data in the untagged message rather than incurring an
405	    additional round trip with the RDMA Read and its response.

407	4.5 Tagged Buffers as ULP Credits

409	    The handling of end-to-end buffer credits differs considerably
410	    between DDP and direct ULP use of either TCP or SCTP.

412	    With both TCP and SCTP buffer credits are based upon the receiver
413	    granting transmit permission based on the total number of bytes.
414	    These credits reflect system buffering resources and/or simple flow
415	    control.  They do not represent ULP resources.

417	    DDP defines no standard flow control, but presumes the existence of a
418	    ULP mechanism.  The presumed mechanism is that the Data Sink ULP has
419	    issued credits to the Data Source allowing the Data Source to send a
420	    specific number of untagged messages.

422	    The ULP peers must ensure that the sender is aware of the maximum
423	    size that can be sent to any specific target buffer.  One method of
424	    doing so is to use a standard size for all untagged buffers within a
425	    given connection.  For example, DAFS specifies an initial size
426	    requirement for session establishment, during which the untagged
427	    buffer size for the remainder of the session is negotiated.

429	    Tagged buffers are ULP resources advertised directly from ULP to ULP.
430	    A DDP put to a known tagged buffer is constrained only by transport
431	    level flow control, not by available system buffering.

433	    Either tagged or untagged buffers allow bypassing of system buffer
434	    resources.  Use of tagged buffers additionally allows the Data Source
435	    to choose what order to exercise the credits in.

437	    To the extent allowed by the ULP, tagged buffers are also divisible
438	    resources.  The Data Sink can advertise a single 100 KB buffer, and
439	    then receive notifications from its peer that it had written 50 KB,
440	    20 KB and 30 KB to that buffer in three successive transactions.
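
    The 100 KB example can be worked through with a small Python sketch
    of the bookkeeping a Data Sink ULP might keep for a divisible tagged
    buffer.  The class is hypothetical; how the peer reports what it
    wrote is a ULP matter and is simply assumed here.

```python
KB = 1024


class AdvertisedBuffer:
    """Tracks how much of one advertised tagged buffer has been used."""

    def __init__(self, size):
        self.size = size
        self.used = 0

    def consume(self, nbytes):
        """Record a peer notification that nbytes were written."""
        if nbytes > self.size - self.used:
            raise ValueError("peer exceeded the advertised buffer")
        self.used += nbytes
        return self.size - self.used   # bytes still available


buf = AdvertisedBuffer(100 * KB)
# Three successive transactions of 50 KB, 20 KB and 30 KB.
remaining = [buf.consume(n * KB) for n in (50, 20, 30)]
```

    After the three transactions the remaining space is 50 KB, then
    30 KB, then zero, all against a single advertisement.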

442	    ULP-management of tagged buffer resources, independent of transport
443	    and DDP layer credits, is an additional benefit of RDMA protocols.
444	    Large bulk transfers cannot be blocked by limited general purpose
445	    buffering capacity.  Applications can flow control based upon higher
446	    level abstractions, such as number of outstanding requests,
447	    independent of the amount of data that must be transferred.

449	    However, use of system buffering, as offered by direct use of the
450	    underlying transports, can be preferable under certain circumstances.

452	    One example would be when the number of target ULP buffers is
453	    sufficiently large, and the rate at which any writes arrive is
454	    sufficiently low, that pinning all the target ULP buffers in memory
455	    would be undesirable.  The maximum transfer rate, and hence the
456	    maximum amount of system buffering required, may be more stable and
457	    predictable than the total ULP buffer exposure.

459	    Another would be when the Data Sink wishes to receive a stream of
460	    data at a predictable rate, but does not know in advance what the
461	    size of each data packet will be.  This is common for streaming
462	    media that has been encoded with a variable bit rate.  With DDP the
463	    Data Sink would either have to use untagged buffers large enough for
464	    the largest packet, or advertise a circular buffer.  If for security
465	    or other reasons the Data Sink did not want the size of its buffer
466	    to be publicly known, using the underlying SCTP transport directly
467	    may be preferable because of its byte-oriented credits.

469	5. RDMA Read

471	    RDMA Reads are a further service provided by RDMAP.  RDMA Reads allow
472	    the Data Sink to fetch exactly the portion of the peer ULP buffer
473	    required on a "just in time" basis.  This can be done without
474	    requiring per-fetch support from the Data Source ULP.

476	    Storage servers may wish to limit the maximum write buffer allocated
477	    to any single session.  The storage server may be a very minimal
478	    layer between the client and the disk storage media, or the server
479	    may merely wish to limit the total resources that would be required
480	    if all clients could push the entire payload they wished written at
481	    their own convenience.

483	    In either case, there is little benefit in transferring data from the
484	    Data Source far in advance of when it will be written to the
485	    persistent storage media.  RDMA Reads allow the Storage Server to
486	    fetch the payload on a "just in time" basis.  In this fashion a
487	    relatively small number of block sized buffers can be used to execute
488	    a single transaction that specified writing a large file, or a
489	    Storage Server with numerous clients can fetch buffers from the
490	    individual clients in the order that is most convenient to the
491	    server.
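
    The "just in time" pattern can be sketched in Python as a loop that
    fetches one block sized piece of the client's advertised buffer only
    when the server is ready to write it.  The rdma_read function below
    is a stand-in that slices a local bytes object; it is an assumption
    of this example and not an RDMAP interface.

```python
def rdma_read(remote, offset, length):
    """Stand-in for an RDMA Read of a peer's advertised buffer."""
    return remote[offset:offset + length]


def write_just_in_time(remote, total_len, block_size, write_block):
    """Fetch and write one block at a time, using block sized buffers."""
    fetched = 0
    while fetched < total_len:
        n = min(block_size, total_len - fetched)
        block = rdma_read(remote, fetched, n)   # fetch only when ready
        write_block(fetched, block)             # e.g. commit to storage
        fetched += n


client_buffer = bytes(range(10))   # payload advertised by the client
disk = bytearray(len(client_buffer))
block_sizes = []


def write_block(offset, data):
    disk[offset:offset + len(data)] = data
    block_sizes.append(len(data))


write_just_in_time(client_buffer, len(client_buffer), 4, write_block)
```

    The server holds at most one block of the client's payload at a
    time, regardless of how large the advertised buffer is.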

493	    This same capability can be used when the desired portion of the
494	    advertised buffer is not known in advance.  For example the
495	    advertised buffer could contain performance statistics.  The Data
496	    Sink could request the portions of the data it required, without
497	    requiring an interaction with the Data Source ULP.

499	    This is applicable for many applications that publish semi-volatile
500	    data that does not require transactional validity checking (i.e.,
501	    authorized users have read access to the entire set of data).  It is
502	    less applicable when there are ULP consistency checks that must be
503	    performed upon the data.  Such applications would be better served by
504	    having the client send a request, and having the server use RDMA
505	    Writes to publish the requested data.  Neither RDMAP nor DDP provides
506	    mechanisms for bundling multiple disjoint updates into an atomic
507	    operation.  Therefore use of an advertised buffer as a data resource
508	    is subject to the same caveats as any randomly updated data resource,
509	    such as flat files, that does not enforce its own consistency.

511	6. LLP Comparisons

513	    Normally the choice of underlying IP transport is irrelevant to the
514	    ULP.  RDMAP and DDP provide the same services over either.  There
515	    may be performance impacts of the choice, however.  It is the
516	    responsibility of the ULP to determine which IP transport is best
517	    suited to its needs.

519	    SCTP provides for preservation of message boundaries.  Each DDP
520	    segment will be delivered within a single SCTP packet.  The
521	    equivalent services are only available with TCP through the use of
522	    the MPA adaptation layer.

524	6.1 Multistreaming Implications

526	    SCTP also provides multi-streaming.  When the same pair of hosts
527	    needs multiple DDP streams, this can be a major advantage.  A
528	    single SCTP association carries multiple DDP streams, consolidating
529	    connection setup, congestion control and acknowledgements.

531	    Completions are controlled by the DDP Source Sequence Number (DDP-
532	    SSN) on a per stream basis.  Therefore combining multiple DDP Streams
533	    into a single SCTP association cannot result in a dropped packet
534	    carrying data for one stream delaying completions on others.
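
    A minimal sketch of per-stream completion tracking follows.  It is a
    model of the idea only, not of the SCTP adaptation itself: the class
    and method names are hypothetical, and starting each stream's
    sequence at 1 is an assumption made for the example.

```python
from collections import defaultdict


class PerStreamCompletions:
    """Tracks in-order completion separately for each DDP stream."""

    def __init__(self):
        # Next sequence number that may complete, per stream (assumed to start at 1).
        self.next_seq = defaultdict(lambda: 1)
        # Sequence numbers that have arrived but cannot complete yet.
        self.pending = defaultdict(set)
        # Sequence numbers completed so far, in completion order.
        self.completed = defaultdict(list)

    def deliver(self, stream_id, seq):
        """Record arrival of `seq` on `stream_id`, then complete whatever is in order."""
        self.pending[stream_id].add(seq)
        while self.next_seq[stream_id] in self.pending[stream_id]:
            n = self.next_seq[stream_id]
            self.pending[stream_id].remove(n)
            self.completed[stream_id].append(n)
            self.next_seq[stream_id] = n + 1
```

    If message 1 of stream "A" is lost and retransmitted later, stream
    "B" keeps completing in the meantime because each stream has its own
    sequence space.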

536	6.2 Out of Order Reception Implications

538	    The use of unordered Data Chunks with SCTP guarantees that the DDP
539	    layer will be able to perform placements when IP datagrams are
540	    received out of order.

542	    Placement of out-of-order DDP Segments carried over MPA/TCP is not
543	    guaranteed, but certainly allowed.  The ability of the MPA receiver
544	    to process out-of-order DDP Segments may be impaired when TCP
545	    alignment is lost.  Using SCTP, each DDP Segment is encoded in a
546	    single Data Chunk and never spread over multiple IP datagrams.

548	6.3 Header and Marker Overhead

550	    MPA and TCP headers together are smaller than the headers used by
551	    SCTP and its adaptation layer.  However, this advantage can be
552	    considerably reduced by the insertion of MPA markers.  In any event
553	    the difference in ULP payload per IP Datagram is not likely to be a
554	    significant factor.

556	6.4 Middlebox Support

558	    Even with the MPA adaptation layer, DDP traffic carried over MPA/TCP
559	    will appear to all network middleboxes as a normal TCP connection.
560	    In many environments there may be a requirement to use only TCP
561	    connections to satisfy existing network elements and/or to facilitate
562	    monitoring and control of connections.  While SCTP is certainly just
563	    as monitorable and controllable as TCP, there is no guarantee that
564	    the network management infrastructure has the required support for
565	    both.

567	6.5 Processing Overhead

569	    A DDP stream delivered via MPA/TCP will require more processing
570	    effort than one delivered over SCTP.  However this extra work may be
571	    justified for many deployments where full SCTP support is unavailable
572	    in the intermediate network.

574	6.6 Data Integrity Implications

576	    Both the SCTP and MPA/TCP adaptations provide end-to-end CRC32c
577	    protection against data corruption, or its equivalent.
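
    For readers unfamiliar with CRC32c, the following is a deliberately
    simple bit-at-a-time sketch of the checksum (the Castagnoli
    polynomial in its reflected form, 0x82F63B78).  It illustrates the
    computation only.  It is not how either adaptation frames or
    transmits the CRC, and real implementations normally use table
    driven or hardware assisted versions.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF                      # initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell off.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF               # final inversion


# Standard check value for the ASCII string "123456789".
assert crc32c(b"123456789") == 0xE3069283
```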

579	    A ULP that requires a greater degree of protection may add its own.
580	    However, DDP and RDMAP headers will only be guaranteed to have the
581	    equivalent of end-to-end CRC32c protection.  A ULP that requires data
582	    integrity checking more thorough than an end-to-end CRC32c should
583	    first invalidate all STags that reference a buffer before applying
584	    its own integrity check.

586	6.6.1 MPA/TCP Specifics

588	    It is mandatory for MPA/TCP implementations to implement CRC32c, but
589	    it is NOT mandatory to use the CRC32c during an RDMA connection.
590	    Activating or deactivating the CRC in MPA/TCP is an administrative
591	    configuration operation at the local and remote ends.  The
592	    administration of the CRC (ON/OFF) is invisible to the ULP.

594	    Applications SHOULD trust that this administrative option will only
595	    be used when the end-to-end protection is at least as effective as a
596	    transport layer CRC32c.  Applications SHOULD NOT apply additional
597	    protection as a guard against this administrative option being turned
598	    on inadvertently.

600	    Administrators MUST NOT enable CRC32c suppression unless the end-to-
601	    end protection is truly equivalent.

603	    If the CRC is active for either direction or end, then the use of
604	    the CRC is mandatory in both directions and at both ends.

606	    If both ends have been configured NOT to use the CRC, then this is
607	    allowed as long as equivalent protection (comparable to or better
608	    than the CRC) from undetected errors on the connection is provided.
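
    The rule in the two preceding paragraphs reduces to a logical OR of
    the two administrative settings.  The sketch below states only that
    rule.  The function name is hypothetical and nothing here reflects
    how MPA actually carries the settings between the two ends.

```python
def mpa_crc_in_use(local_wants_crc: bool, remote_wants_crc: bool) -> bool:
    """Return True when the CRC must be used in both directions.

    The CRC may be suppressed only when both ends are configured to
    suppress it.  Whether suppression is acceptable at all (equivalent
    end-to-end protection) is an administrative decision made elsewhere.
    """
    return local_wants_crc or remote_wants_crc
```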

610	6.6.2 SCTP Specifics

612	    SCTP provides CRC32c protection automatically.  The adaptation to
613	    SCTP provides for no option to suppress SCTP CRC32c protection.

615	6.7 Non-IP Transports

617	    DDP is defined to operate over ubiquitous IP transports such as SCTP
618	    and TCP.  This enables a new DDP-enabled node to be added anywhere in
619	    an IP network.  No DDP-specific support from middleboxes is
620	    required.

622	    There are non-IP transport fabrics offering RDMA capabilities.
623	    Because these capabilities are integrated with the transport protocol
624	    they have some technical advantages when compared to RDMA over IP.
625	    For example, fencing of RDMA operations can be based upon transport
626	    level acks.  Because DDP is cleanly layered over an IP transport, any
627	    explicit RDMA layer ack must be separate from the transport layer
628	    ack.

630	    There may be deployments where the benefits of RDMA/transport
631	    integration outweigh the benefits of being on an IP network.

633	6.7.1 No RDMA Layer Ack

635	    DDP does not provide for its own acknowledgements.  The only form of
636	    ack provided at the RDMAP layer is an RDMA Read Response.  DDP and
637	    RDMAP rely almost entirely upon other layers for flow control and
638	    pacing.  The LLP is relied upon to guarantee delivery and avoid
639	    network congestion, and ULP level acking is relied upon for ULP
640	    pacing and to avoid ULP buffer overruns.

642	    Previous RDMA protocols, such as InfiniBand, have been able to use
643	    their integration with the transport layer to provide stronger
644	    ordering guarantees.  It is important that application designers who
645	    require such guarantees provide them through ULP interaction.

647	    Specifically:

649	       There is no ability for a local interface to "fence" outbound
650	       messages to guarantee that prior tagged messages have been placed
651	       prior to sending a tagged message.  The only guarantees available
652	       from the other side would be an RDMA Read Response (coming from
653	       the RDMAP layer) or a response from the ULP layer.  Remember that
654	       the normal ordering rules only guarantee when the Data Sink ULP
655	       will be notified of untagged messages; they do not control when
656	       data is placed into receive buffers.

658	       Re-use of tagged buffers must be done with extreme care.  The fact
659	       that an untagged message indicates that all prior tagged messages
660	       have been placed does not guarantee that no later tagged messages
661	       have.  The best strategy is to only change the state of any given
662	       advertised buffer with untagged messages.
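
    One way a ULP might follow that strategy is to keep an explicit
    state for each advertised buffer and change it only when an untagged
    message is sent or received.  The sketch below assumes a simple two
    state model.  The class, method and state names are hypothetical and
    are not defined by RDMAP or DDP.

```python
from enum import Enum


class BufState(Enum):
    IDLE = "idle"              # owned by the local ULP, safe to reuse
    ADVERTISED = "advertised"  # peer may place tagged data at any time


class AdvertisedBuffer:
    """Buffer whose state changes only on untagged messages."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.state = BufState.IDLE

    def advertise(self):
        """Local ULP sends an untagged message advertising the buffer."""
        if self.state is not BufState.IDLE:
            raise RuntimeError("buffer is already advertised")
        self.state = BufState.ADVERTISED

    def on_untagged_release(self):
        """Peer's untagged message says it is finished with the buffer."""
        if self.state is not BufState.ADVERTISED:
            raise RuntimeError("release for a buffer that was not advertised")
        self.state = BufState.IDLE

    def can_reuse(self):
        """True only when no tagged placement into the buffer is expected."""
        return self.state is BufState.IDLE
```

    Arrival of tagged data never changes the state, so a late tagged
    message cannot make the buffer appear free.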

664	       As covered elsewhere in this document, flow control of untagged
665	       messages MUST be provided by the ULP itself.

667	6.8 Other IP Transports

669	    Both TCP and SCTP provide DDP with reliable transport with TCP
670	    friendly rate control.  As currently defined, DDP works over
671	    reliable transports and implicitly relies upon some form of rate
672	    control.

674	    DDP is fully compatible with a non-reliable protocol.  Out-of-order
675	    placement is obviously not dependent on whether the other DDP
676	    Segments ever actually arrive.

678	    However, RDMAP requires the LLP to provide reliable service.  An
679	    alternate completion handling protocol would be required if DDP were
680	    to be deployed over an unreliable IP transport.

682	    As noted in the prior section on tagged buffers as ULP credits,
683	    neither RDMAP nor DDP provides any flow control for tagged messages.
684	    If no transport layer flow control is provided, an RDMAP/DDP
685	    application would be limited only by the link layer rate, almost
686	    inevitably resulting in severe network congestion.

688	    RDMAP encourages applications to be ignorant of the underlying
689	    transport PMTU.  The ULP is only notified when all messages ending in
690	    a single untagged message have completed.  The ULP is not aware of
691	    the granularity or ordering of the underlying message.  This approach
692	    assumes that the ULP is only interested in the complete set of
693	    messages, and has no use for a subset of them.

695	6.9 LLP Independent Session Establishment

697	    For an RDMAP/DDP application, a pair of SCTP Streams and a TCP
698	    connection both provide the same transport service (reliable
699	    delivery of DDP Segments between two connected RDMAP/DDP
700	    endpoints).

702	6.9.1 RDMA-only Session Establishment

704	    It is also possible to allow for transport neutral establishment of
705	    RDMAP/DDP sessions between endpoints.  Combined, these two features
706	    would allow most applications to be unconcerned as to which LLP was
707	    actually in use.

709	    Specifically, the procedures for DDP Stream Session establishment
710	    discussed in section 3 of the SCTP mapping, and section 13.3 of the
711	    MPA/TCP mapping, both allow for the exchange of ULP specific data
712	    ("Private Data") before enabling the exchange of DDP Segments.  This
713	    delay can allow for proper selection and/or configuration of the
714	    endpoints based upon the exchanged data.  For example, each DDP
715	    Stream Session associated with a single client session might be
716	    assigned to the same DDP Protection Domain.

718	    To be transport neutral, the applications should exchange Private
719	    Data as part of session establishment messages to determine how the
720	    RDMA endpoints are to be configured.  One side must be the Initiator,
721	    and the other the Responder.

723	    With SCTP, a pair of SCTP streams can be used for sequential
724	    sessions.  With MPA/TCP each connection can be used for at most one
725	    session.  However, the same source/destination pair of ports can be
726	    re-used sequentially subject to normal TCP rules.

728	6.9.2 RDMA-Conditional Session Establishment

730	    It is sometimes desirable for the active side of a session to connect
731	    with the passive side before knowing whether the passive side
732	    supports RDMA.

734	    This style of session establishment can be supported with either TCP
735	    or SCTP, but not as transparently as for RDMA-only sessions.  Pre-
736	    existing non-RDMA servers are also far more likely to be using TCP
737	    than SCTP.

739	    With TCP, a normal TCP connection is established.  It is then used
740	    by the ULP to determine whether or not to convert to MPA mode and use
741	    RDMA.  This will typically be integral with other session
742	    establishment negotiations.

744	    With SCTP, the establishment of an association tests whether RDMA is
745	    supported.  If not supported, the application simply requests the
746	    association without the RDMA adaptation indication.

748	    The key difference is that with SCTP the determination as to whether
749	    the peer can support RDMA is made before the transport layer
750	    association/connection is established, while with TCP the established
751	    connection itself is used to determine whether RDMA is supported.

753	7. Local Interface Implications

755	    Full utilization of DDP and RDMAP capabilities requires a local
756	    interface that explicitly requests these services.  Protocols such as
757	    Sockets Direct Protocol (SDP) can allow applications to keep their
758	    traditional byte-stream or message-stream interface and still enjoy
759	    many of the benefits of the optimized wire level protocols.

761	8. Security considerations

763	8.1 Connection/Association Setup

765	    Both the SCTP and TCP adaptations allow for existing procedures to be
766	    followed for the establishment of the SCTP association or TCP
767	    connection.  Use of DDP does not impair the use of any security
768	    measures to filter, validate and/or log the remote end of an
769	    association/connection.

771	8.2 Tagged Buffer Exposure

773	    DDP only exposes ULP memory to the extent explicitly allowed by ULP
774	    actions.  These include posting of receive operations and enabling of
775	    Steering Tags.

777	    Neither RDMAP nor DDP places requirements on how ULPs advertise
778	    buffers.  A ULP may use a single Steering Tag for multiple buffer
779	    advertisements.  However, the ULP should be aware that enforcement on
780	    STag usage is likely limited to the overall range that is enabled.
781	    If the remote peer writes into the 'wrong' advertised buffer, neither
782	    the DDP nor the RDMAP layer will be aware of this.  Nor is there any
783	    report to the ULP on how the remote peer specifically used tagged
784	    buffers.

786	    Unless the ULP peers have an adequate basis for mutual trust, the
787	    receiving ULP might be well advised to use a distinct STag for each
788	    interaction, and to invalidate it after each use or to require its
789	    peer to use the RDMAP option to invalidate the STag with its
790	    responding untagged message.
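
    The "distinct STag per interaction" advice can be pictured with the
    following sketch.  It is a ULP-level model only: the STagTable class
    and its methods are hypothetical, the 32-bit random tag value is an
    assumption of the example, and real STag allocation and enforcement
    are performed by the local interface and the RDMA-capable adapter.

```python
import secrets


class STagTable:
    """Maps hypothetical STag values to one advertised buffer each."""

    def __init__(self):
        self._active = {}

    def advertise(self, buffer: bytearray) -> int:
        """Enable a fresh STag covering exactly this buffer."""
        stag = secrets.randbits(32)
        while stag in self._active:
            stag = secrets.randbits(32)
        self._active[stag] = buffer
        return stag

    def tagged_write(self, stag: int, offset: int, payload: bytes) -> None:
        """Place payload at offset, refusing invalid STags and out-of-range writes."""
        buffer = self._active.get(stag)
        if buffer is None:
            raise PermissionError("STag is not valid")
        if offset < 0 or offset + len(payload) > len(buffer):
            raise PermissionError("write outside the advertised range")
        buffer[offset:offset + len(payload)] = payload

    def invalidate(self, stag: int) -> None:
        """Disable the STag once the interaction is complete."""
        self._active.pop(stag, None)
```

    Because each STag covers a single buffer and is invalidated after
    one interaction, a misbehaving peer cannot reach other buffers or
    write again after the interaction has ended.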

792	8.3 Impact of Encrypted Transports

794	    While DDP is cleanly layered over the LLP, its maximum benefit may be
795	    limited when the LLP Stream is secured with a streaming cipher, such
796	    as Transport Layer Security (TLS).  If the LLP must decrypt in order,
797	    it cannot provide out-of-order DDP Segments to the DDP layer for
798	    placement purposes.  IPsec tunnel mode encrypts entire IP Datagrams.
799	    IPsec transport mode encrypts TCP Segments or SCTP packets.  In
800	    neither case should IPsec preclude providing out-of-order DDP
801	    Segments to the DDP layer for placement.

803	    Note that end-to-end use of IPsec cryptographic integrity protection
804	    may allow suppression of MPA CRC generation and checking under
805	    certain circumstances.  This is one example where the LLP may be
806	    judged to have "or equivalent" protection to an end-to-end CRC32c.

808	References

810	    [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
811	         Levels", BCP 14, RFC 2119, March 1997.

813	    [2]  Dierks, T., Allen, C., Treese, W., Karlton, P., Freier, A. and
814	         P. Kocher, "The TLS Protocol Version 1.0", RFC 2246, January
815	         1999.

817	    [3]  Kent, S. and R. Atkinson, "IP Encapsulating Security Payload
818	         (ESP)", RFC 2406, November 1998.

820	    [4]  Stewart, R., Xie, Q., Morneault, K., Sharp, C., Schwarzbauer,
821	         H., Taylor, T., Rytina, I., Kalla, M., Zhang, L. and V. Paxson,
822	         "Stream Control Transmission Protocol", RFC 2960, October 2000.

824	    [5]  Coene, L., "Stream Control Transmission Protocol Applicability
825	         Statement", RFC 3257, April 2002.

827	    [6]  Recio, R., "An RDMA Protocol Specification", draft-ietf-rddp-
828	         rdmap-00 (work in progress), February 2003.

830	    [7]  Shah, H., "Direct Data Placement over Reliable Transports",
831	         draft-ietf-rddp-ddp-00 (work in progress), February 2003.

833	    [8]  Stewart, R., "Stream Control Transmission Protocol (SCTP) Remote
834	         Direct Memory Access (RDMA) Direct Data Placement (DDP)
835	         Adaptation", draft-ietf-rddp-sctp-00 (work in progress),
836	         September 2003.

838	    [9]  Culley, P., "Marker PDU Aligned Framing for TCP Specification",
839	         draft-ietf-rddp-mpa-00 (work in progress), October 2003.

841	Authors' Addresses

843	    Caitlin Bestler
844	    1241 W. North Shore
845	    # 2G
846	    Chicago, IL  60626
847	    USA

849	    Phone: +1-773-743-1594
850	    EMail: cait@asomi.com

851	    Lode Coene
852	    Atealaan 26
853	    Herentals,   2200
854	    Belgium

856	    Phone: +32-14-252081
857	    EMail: lode.coene@siemens.com

859	Full Copyright Statement

861	    Copyright (C) The Internet Society (2003).  All Rights Reserved.

863	    This document and translations of it may be copied and furnished to
864	    others, and derivative works that comment on or otherwise explain it
865	    or assist in its implementation may be prepared, copied, published
866	    and distributed, in whole or in part, without restriction of any
867	    kind, provided that the above copyright notice and this paragraph are
868	    included on all such copies and derivative works.  However, this
869	    document itself may not be modified in any way, such as by removing
870	    the copyright notice or references to the Internet Society or other
871	    Internet organizations, except as needed for the purpose of
872	    developing Internet standards in which case the procedures for
873	    copyrights defined in the Internet Standards process must be
874	    followed, or as required to translate it into languages other than
875	    English.

877	    The limited permissions granted above are perpetual and will not be
878	    revoked by the Internet Society or its successors or assigns.

880	    This document and the information contained herein is provided on an
881	    "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
882	    TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
883	    BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
884	    HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
885	    MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

887	Acknowledgement

889	    Funding for the RFC Editor function is currently provided by the
890	    Internet Society.