idnits 2.17.1 

draft-ietf-rddp-mpa-08.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 21.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 3201.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 3216.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 3223.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 3229.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([DDP]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     C: This bit declares an endpoint's preferred CRC usage.  When this
     field is '0' in the MPA Request Frame and the MPA Reply Frame, CRCs MUST
     not be checked and need not be generated by either endpoint.  When this
     bit is '1' in either the MPA Request Frame or MPA Reply Frame, CRCs MUST
     be generated and checked by both endpoints.  Note that even when not in
     use, the CRC field remains present in the FPDU.  When CRCs are not in
     use, the CRC field MUST be considered valid for FPDU checking regardless
     of its contents.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     9.  MPA implementations MUST validate the PD_Length field.  The
     buffer that receives the Private Data field MUST be large enough to
     receive that data; the amount of Private Data MUST not exceed the
     PD_Length, or the application buffer.  If any of the above fails, the
     startup frame MUST be considered improperly formatted.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 7, 2006) is 6404 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'Seconds' is mentioned on line 2376, but not defined

  ** Obsolete normative reference: RFC 2401 (Obsoleted by RFC 4301)

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  == Outdated reference: A later version (-10) exists of
     draft-ietf-rddp-security-09

  == Outdated reference: A later version (-04) exists of
     draft-ietf-nfsv4-channel-bindings-02

  -- Obsolete informational reference (is this intentional?): RFC  896
     (Obsoleted by RFC 7805)

  -- Obsolete informational reference (is this intentional?): RFC 2960
     (Obsoleted by RFC 4960)

  -- No information found for draft-hilland-iwarp-verbs-v1 - is the name
     correct?


     Summary: 7 errors (**), 0 flaws (~~), 8 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	   Remote Direct Data Placement Work Group                    P. Culley
2	   INTERNET-DRAFT                               Hewlett-Packard Company
3	   draft-ietf-rddp-mpa-08.txt                                  U. Elzur
4	                                                   Broadcom Corporation
5	                                                               R. Recio
6	                                                        IBM Corporation
7	                                                              S. Bailey
8	                                                  Sandburst Corporation
9	                                                             J. Carrier
10	                                                              Cray Inc.

12	   Expires: April 2007                       October 7, 2006

14	             Marker PDU Aligned Framing for TCP Specification

16	Status of this Memo

18	   By submitting this Internet-Draft, each author represents that any
19	   applicable patent or other IPR claims of which he or she is aware
20	   have been or will be disclosed, and any of which he or she becomes
21	   aware will be disclosed, in accordance with Section 6 of BCP 79.

23	   Internet-Drafts are working documents of the Internet Engineering
24	   Task Force (IETF), its areas, and its working groups.  Note that
25	   other groups may also distribute working documents as Internet-
26	   Drafts.

28	   Internet-Drafts are draft documents valid for a maximum of six months
29	   and may be updated, replaced, or obsoleted by other documents at any
30	   time.  It is inappropriate to use Internet-Drafts as reference
31	   material or to cite them other than as "work in progress."

33	   The list of current Internet-Drafts can be accessed at
34	   http://www.ietf.org/1id-abstracts.html.  The list of Internet-Draft
35	   Shadow Directories can be accessed at http://www.ietf.org/shadow.html

37	Abstract

39	   MPA (Marker Protocol data unit Aligned framing) is designed to work
40	   as an "adaptation layer" between TCP and the Direct Data Placement
41	   [DDP] protocol, preserving the reliable, in-order delivery of TCP,
42	   while adding the preservation of higher-level protocol record
43	   boundaries that DDP requires.  MPA is fully compliant with applicable
44	   TCP RFCs and can be utilized with existing TCP implementations.  MPA
45	   also supports integrated implementations that combine TCP, MPA and
46	   DDP to reduce buffering requirements in the implementation and
47	   improve performance at the system level.

49	   Table of Contents

51	   Status of this Memo                                                 1
52	   Abstract                                                            1
53	   1      Glossary                                                     5
54	   2      Introduction                                                 8
55	   2.1    Motivation                                                   8
56	   2.2    Protocol Overview                                            8
57	   3      MPA's interactions with DDP                                 12
58	   4      MPA Full Operation Mode                                     14
59	   4.1    FPDU Format                                                 14
60	   4.2    Marker Format                                               15
61	   4.3    MPA Markers                                                 15
62	   4.4    CRC Calculation                                             18
63	   4.5    FPDU Size Considerations                                    21
64	   5      MPA's interactions with TCP                                 23
65	   5.1    MPA transmitters with a standard layered TCP                23
66	   5.2    MPA receivers with a standard layered TCP                   24
67	   6      MPA Receiver FPDU Identification                            24
68	   7      Connection Semantics                                        26
69	   7.1    Connection setup                                            26
70	   7.1.1  MPA Request and Reply Frame Format                          28
71	   7.1.2  Connection Startup Rules                                    29
72	   7.1.3  Example Delayed Startup sequence                            32
73	   7.1.4  Use of Private Data                                         35
74	   7.1.4.1  Motivation                                                35
75	   7.1.4.2  Example Immediate Startup using Private Data              36
76	   7.1.5  "Dual stack" implementations                                38
77	   7.2    Normal Connection Teardown                                  39
78	   8      Error Semantics                                             40
79	   9      Security Considerations                                     41
80	   9.1    Protocol-specific Security Considerations                   41
81	   9.1.1  Spoofing                                                    41
82	   9.1.1.1  Impersonation                                             41
83	   9.1.1.2  Stream Hijacking                                          42
84	   9.1.1.3  Man in the Middle Attack                                  42
85	   9.1.2  Eavesdropping                                               42
86	   9.2    Introduction to Security Options                            43
87	   9.3    Using IPsec With MPA                                        43
88	   9.4    Requirements for IPsec Encapsulation of MPA/DDP             44
89	   10     IANA Considerations                                         45
90	   A Appendix. Optimized MPA-aware TCP implementations                46
91	   A.1    Optimized MPA/TCP transmitters                              46
92	   A.2    Effects of Optimized MPA/TCP Segmentation                   47
93	   A.3    Optimized MPA/TCP receivers                                 49
94	   A.4    Re-segmenting Middle boxes and non optimized MPA/TCP senders 50
95	   A.5    Receiver implementation                                     51
96	   A.5.1  Network Layer Reassembly Buffers                            52
97	   A.5.2  TCP Reassembly buffers                                      53
98	   B Appendix. Analysis of MPA over TCP Operations                    54
99	   B.1    Assumptions                                                 54
100	   B.1.1  MPA is layered beneath DDP [DDP]                            54
101	   B.1.2  MPA preserves DDP message framing                           55
102	   B.1.3  The size of the ULPDU passed to MPA is less than EMSS under
103	          normal conditions                                           55
104	   B.1.4  Out-of-order placement but NO out-of-order Delivery         55
105	   B.2    The Value of FPDU Alignment                                 55
106	   B.2.1  Impact of lack of FPDU Alignment on the receiver computational
107	          load and complexity                                         57
108	   B.2.2  FPDU Alignment effects on TCP wire protocol                 61
109	   C Appendix. IETF Implementation Interoperability with RDMA Consortium
110	          Protocols                                                   63
111	   C.1    Negotiated Parameters                                       63
112	   C.2    RDMAC RNIC and Non-permissive IETF RNIC                     65
113	   C.2.1  RDMAC RNIC Initiator                                        65
114	   C.2.2  Non-Permissive IETF RNIC Initiator                          66
115	   C.2.3  RDMAC RNIC and Permissive IETF RNIC                         66
116	   C.2.4  RDMAC RNIC Initiator                                        67
117	   C.2.5  Permissive IETF RNIC Initiator                              67
118	   C.3    Non-Permissive IETF RNIC and Permissive IETF RNIC           67
119	   Normative References                                               69
120	   Informative References                                             69
121	   Author's Addresses                                                 71
122	   Acknowledgments                                                    72
123	   Full Copyright Statement                                           75
124	   Intellectual Property                                              75

126	   Table of Figures

128	   Figure 1 ULP MPA TCP Layering                                       9
129	   Figure 2 FPDU Format                                               14
130	   Figure 3 Marker Format                                             15
131	   Figure 4 Example FPDU Format with Marker                           17
132	   Figure 5 Annotated Hex Dump of an FPDU                             20
133	   Figure 6 Annotated Hex Dump of an FPDU with Marker                 21
134	   Figure 7 Fully layered implementation                              23
135	   Figure 8 MPA Request/Reply Frame                                   28
136	   Figure 9: Example Delayed Startup negotiation                      33
137	   Figure 10: Example Immediate Startup negotiation                   36
138	   Figure 11 Optimized MPA/TCP implementation                         46
139	   Figure 12: Non-aligned FPDU freely placed in TCP octet stream      57
140	   Figure 13: Aligned FPDU placed immediately after TCP header        59
141	   Figure 14.  Connection Parameters for the RNIC Types.              64
142	   Figure 15: MPA negotiation between an RDMAC RNIC and a Non-permissive
143	          IETF RNIC.                                                  65
144	   Figure 16: MPA negotiation between an RDMAC RNIC and a Permissive
145	          IETF RNIC.                                                  66
146	   Figure 17: MPA negotiation between a Non-permissive IETF RNIC and a
147	          Permissive IETF RNIC.                                       68

149	   Revision history [To be deleted prior to RFC publication]

151	   [draft-ietf-rddp-mpa-08] workgroup draft with following changes:

153	        Re-submission to correct conversion errors.

155	   [draft-ietf-rddp-mpa-07] workgroup draft with following changes:

157	        Minor clarifications; added CRC to glossary, made 2.1 discussion
158	        on probabilistic/deterministic a little less global.  Added note
159	        that MULPDU is likely smaller than 64768, clarified 'M' bit
160	        description, added xref to private data discussion in field
161	        definition, removed LLP acronym, added sentence on DOS attack to
162	        "Man in Middle" in security.

164	   [draft-ietf-rddp-mpa-06] workgroup draft with following changes:

166	        Document restructuring to move descriptive information on
167	        implementing optimized MPA/TCP implementations to an appendix.
168	        All normative text was removed from the appendix.  Paragraph
169	        added to security section explaining IPSEC version.  Added
170	        informative references to architecture, applicability, and
171	        problem statement documents.

173	1  Glossary

175	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
176	       "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
177	       this document are to be interpreted as described in [RFC2119].

179	   Consumer - the ULPs or applications that lie above MPA and DDP.  The
180	       Consumer is responsible for making TCP connections, starting MPA
181	       and DDP connections, and generally controlling operations.

183	   CRC - Cyclic Redundancy Check.

185	   Delivery - (Delivered, Delivers) - For MPA, Delivery is defined as
186	       the process of informing DDP that a particular PDU is ordered for
187	       use.  A PDU is Delivered in the exact order that it was sent by
188	       the original sender; MPA uses TCP's byte stream ordering to
189	       determine when Delivery is possible.  This is specifically
190	       different from "passing the PDU to DDP", which may generally
191	       occur in any order, while the order of Delivery is strictly
192	       defined.

194	   EMSS - Effective Maximum Segment Size.  EMSS is the smaller of the
195	       TCP maximum segment size (MSS) as defined in RFC 793 [RFC793],
196	       and the current path Maximum Transfer Unit (MTU) [RFC1191].

198	   FPDU - Framed Protocol Data Unit.  The unit of data created by an MPA
199	       sender.

201	   FPDU Alignment - the property that an FPDU is Header Aligned with the
202	       TCP segment, and the TCP segment includes an integer number of
203	       FPDUs.  A TCP segment with FPDU Alignment allows immediate
204	       processing of the contained FPDUs without waiting on other TCP
205	       segments to arrive or combining with prior segments.

207	   FPDU Pointer (FPDUPTR) - This field of the Marker is used to indicate
208	       the beginning of an FPDU.

210	   Full Operation (Full Operation Phase) - After the completion of the
211	       Startup Phase MPA begins exchanging FPDUs.

213	   Header Alignment - the property that a TCP segment begins with an
214	       FPDU.  The FPDU is Header Aligned when the FPDU header is exactly
215	       at the start of the TCP segment (right behind the TCP headers on
216	       the wire).

218	   Initiator - The endpoint of a connection that sends the MPA Request
219	       Frame, i.e. the first to actually send data (which may not be the
220	       one which sends the TCP SYN).

222	   Marker - A four octet field that is placed in the MPA data stream at
223	       fixed octet intervals (every 512 octets).

225	   MPA-aware TCP - a TCP implementation that is aware of the receiver
226	       efficiencies of MPA FPDU Alignment and is capable of sending TCP
227	       segments that begin with an FPDU.

229	   MPA-enabled - MPA is enabled if the MPA protocol is visible on the
230	       wire.  When the sender is MPA-enabled, it is inserting framing
231	       and Markers.  When the receiver is MPA-enabled, it is
232	       interpreting framing and Markers.

234	   MPA Request Frame - Data sent from the MPA Initiator to the MPA
235	       Responder during the Startup Phase.

237	   MPA Reply Frame - Data sent from the MPA Responder to the MPA
238	       Initiator during the Startup Phase.

240	   MPA - Marker-based ULP PDU Aligned Framing for TCP protocol.  This
241	       document defines the MPA protocol.

243	   MULPDU - Maximum ULPDU.  The current maximum size of the record that
244	       is acceptable for DDP to pass to MPA for transmission.

246	   Node - A computing device attached to one or more links of a Network.
247	       A Node in this context does not refer to a specific application
248	       or protocol instantiation running on the computer.  A Node may
249	       consist of one or more MPA on TCP devices installed in a host
250	       computer.

252	   PAD - A 1-3 octet group of zeros used to fill an FPDU to an exact
253	       modulo 4 size.

255	   PDU - protocol data unit

257	   Private Data - A block of data exchanged between MPA endpoints during
258	       initial connection setup.

260	   Protection Domain - An RDMA concept (see [VERBS] and [RDMASEC]) that
261	       ties use of various endpoint resources (memory access etc.) to the
262	       specific RDMA/DDP/MPA connection.

264	   RDDP - a suite of protocols including MPA, [DDP], [RDMAP], an overall
265	       security document [RDMASEC], a problem statement [RFC4297], an
266	       architecture document [RFC4296], and an applicability document
267	       [APPL].

269	   RDMA - Remote Direct Memory Access; a protocol that uses DDP and MPA
270	       to enable applications to transfer data directly from memory
271	       buffers.  See [RDMAP].

273	   Remote Peer - The MPA protocol implementation on the opposite end of
274	       the connection.  Used to refer to the remote entity when
275	       describing protocol exchanges or other interactions between two
276	       Nodes.

278	   Responder - The connection endpoint which responds to an incoming MPA
279	       connection request (the MPA Request Frame).  This may not be the
280	       endpoint which awaited the TCP SYN.

282	   Startup Phase - The initial exchanges of an MPA connection which
283	       serve to more fully identify MPA endpoints to each other and
284	       pass connection specific setup information to each other.

286	   ULP - Upper Layer Protocol.  The protocol layer above the protocol
287	       layer currently being referenced.  The ULP for MPA is DDP [DDP].

289	   ULPDU - Upper Layer Protocol Data Unit.  The data record defined by
290	      the layer above MPA (DDP).  ULPDU corresponds to DDP's DDP
291	      segment.

293	   ULPDU_Length - a field in the FPDU describing the length of the
294	      included ULPDU.

296	2  Introduction

298	   This section discusses the reason for creating MPA on TCP and a
299	   general overview of the protocol.

301	2.1 Motivation

303	   The Direct Data Placement protocol [DDP], when used with TCP [RFC793],
304	   requires a mechanism to detect record boundaries.  The DDP records
305	   are referred to as Upper Layer Protocol Data Units by this document.
306	   The ability to locate the Upper Layer Protocol Data Unit (ULPDU)
307	   boundary is useful to a hardware network adapter that uses DDP to
308	   directly place the data in the application buffer based on the
309	   control information carried in the ULPDU header.  This may be done
310	   without requiring that the packets arrive in order.  Potential
311	   benefits of this capability are the avoidance of the memory copy
312	   overhead and a smaller memory requirement for handling out of order
313	   or dropped packets.

315	   Many approaches have been proposed for a generalized framing
316	   mechanism.  Some are probabilistic in nature and others are
317	   deterministic.  An example probabilistic approach is characterized by
318	   a detectable value embedded in the octet stream, with no method of
319	   preventing that value from appearing elsewhere within user data.  It is
320	   probabilistic because under some conditions the receiver may
321	   incorrectly interpret application data as the detectable value.
322	   Under these conditions, the protocol may fail with unacceptable
323	   frequency.  One deterministic approach is characterized by embedded
324	   controls at known locations in the octet stream.  Because the
325	   receiver can guarantee it will only examine the data stream at
326	   locations that are known to contain the embedded control, the
327	   protocol can never misinterpret application data as being embedded
328	   control data.  For unambiguous handling of an out of order packet, a
329	   deterministic approach is preferred.

331	   The MPA protocol provides a framing mechanism for DDP running over
332	   TCP using the deterministic approach.  It allows the location of the
333	   ULPDU to be determined in the TCP stream even if the TCP segments
334	   arrive out of order.
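
The deterministic approach can be illustrated with a minimal sketch.  It relies on the Glossary's statement that Markers recur at fixed 512 octet intervals.  It additionally assumes that offsets are counted from the first octet of Full Operation and that a Marker sits at offset 0; section 4.3 gives the actual placement rules.  The function name marker_offsets is hypothetical.

```python
MARKER_INTERVAL = 512  # octets between Markers, per the Glossary


def marker_offsets(segment_start, segment_len, interval=MARKER_INTERVAL):
    """Return the stream offsets of the Markers inside one TCP segment.

    segment_start is the stream offset of the segment's first payload
    octet (assumption: counted from the start of Full Operation, with a
    Marker at offset 0).  Only arithmetic on the offset is used; the
    payload itself is never inspected.
    """
    if segment_start < 0 or segment_len < 0:
        raise ValueError("offset and length must be non-negative")
    # Round segment_start up to the next multiple of the interval.
    first = -(-segment_start // interval) * interval
    return list(range(first, segment_start + segment_len, interval))
```

Because the result depends only on the segment's position in the stream, a receiver could apply the same calculation to a segment that arrives out of order, without mistaking application data for control data.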

336	2.2 Protocol Overview

338	   The layering of PDUs with MPA is shown in Figure 1, below.

340	               +------------------+
341	               |     ULP client   |
342	               +------------------+  <- Consumer messages
343	               |        DDP       |
344	               +------------------+  <- ULPDUs
345	               |        MPA*      |
346	               +------------------+  <- FPDUs (containing ULPDUs)
347	               |        TCP*      |
348	               +------------------+  <- TCP Segments (containing FPDUs)
349	               |      IP etc.     |
350	               +------------------+
351	                * These may be fully layered or optimized together.

353	                       Figure 1 ULP MPA TCP Layering

355	   MPA is described as an extra layer above TCP and below DDP.  The
356	   operation sequence is:

358	   1.  A TCP connection is established by ULP action.  This is done
359	       using methods not described by this specification.  The ULP may
360	       exchange some amount of data in streaming mode prior to starting
361	       MPA, but is not required to do so.

363	   2.  The Consumer negotiates the use of DDP and MPA at both ends of a
364	       connection.  The mechanisms to do this are not described in this
365	       specification.  The negotiation may be done in streaming mode, or
366	       by some other mechanism (such as a pre-arranged port number).

368	   3.  The ULP activates MPA on each end in the Startup Phase, either as
369	       an Initiator or a Responder, as determined by the ULP.  This mode
370	       verifies the usage of MPA, specifies the use of CRC and Markers,
371	       and allows the ULP to communicate some additional data via a
372	       Private Data exchange.  See section 7.1 Connection setup for more
373	       details on the startup process.

375	   4.  At the end of the Startup Phase, the ULP puts MPA (and DDP) into
376	       Full Operation and begins sending DDP data as further described
377	       below.  In this document, DDP data chunks are called ULPDUs.  For
378	       a description of the DDP data, see [DDP].

380	   Following is a description of data transfer when MPA is in Full
381	   Operation.

383	   1.  DDP determines the Maximum ULPDU (MULPDU) size by querying MPA
384	       for this value.  MPA derives this information from TCP or IP,
385	       when it is available, or chooses a reasonable value.

387	   2.  DDP creates ULPDUs of MULPDU size or smaller, and hands them to
388	       MPA at the sender.

390	   3.  MPA creates a Framed Protocol Data Unit (FPDU) by pre-pending a
391	       header, optionally inserting Markers, and appending a CRC field
392	       after the ULPDU and PAD (if any).  MPA delivers the FPDU to TCP.

394	   4.  The TCP sender puts the FPDUs into the TCP stream.  If the sender
395	       is optimized MPA/TCP, it segments the TCP stream in such a way
396	       that a TCP Segment boundary is also the boundary of an FPDU.  TCP
397	       then passes each segment to the IP layer for transmission.

399	   5.  The receiver may or may not be optimized.  If it is optimized
400	       MPA/TCP, it may separate passing the TCP payload to MPA from
401	       passing the TCP payload ordering information to MPA.  In either
402	       case, RFC compliant TCP wire behavior is observed at both the
403	       sender and receiver.

405	   6.  The MPA receiver locates and assembles complete FPDUs within the
406	       stream, verifies their integrity, and removes MPA Markers (when
407	       present), ULPDU_Length, PAD and the CRC field.

409	   7.  MPA then provides the complete ULPDUs to DDP.  MPA may also
410	       separate passing MPA payload to DDP from passing the MPA payload
411	       ordering information.
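
Steps 3 and 6 above can be sketched as a pair of functions.  This is an illustration under stated assumptions, not the normative format, which is defined in section 4.1.  It assumes a 2-octet ULPDU_Length in network byte order and a 4-octet CRC field, and it omits Markers.  zlib.crc32 is used purely as a stand-in for the CRC defined in section 4.4.  The names build_fpdu and parse_fpdu are hypothetical.

```python
import struct
import zlib

HEADER_LEN = 2  # assumed size of the ULPDU_Length field, in octets
CRC_LEN = 4     # assumed size of the CRC field, in octets


def build_fpdu(ulpdu: bytes) -> bytes:
    """Frame one ULPDU: header, ULPDU, PAD to a multiple of 4, CRC."""
    header = struct.pack("!H", len(ulpdu))
    pad = b"\x00" * (-(len(header) + len(ulpdu)) % 4)
    body = header + ulpdu + pad
    # Stand-in CRC (zlib CRC-32); the real CRC is specified in section 4.4.
    return body + struct.pack("!I", zlib.crc32(body))


def parse_fpdu(fpdu: bytes) -> bytes:
    """Verify the stand-in CRC and return the ULPDU without header and PAD."""
    if len(fpdu) < HEADER_LEN + CRC_LEN or len(fpdu) % 4:
        raise ValueError("FPDU has an impossible length")
    (length,) = struct.unpack_from("!H", fpdu, 0)
    body = fpdu[:-CRC_LEN]
    (crc,) = struct.unpack_from("!I", fpdu, len(fpdu) - CRC_LEN)
    if HEADER_LEN + length > len(body):
        raise ValueError("ULPDU_Length exceeds the FPDU")
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    return fpdu[HEADER_LEN:HEADER_LEN + length]
```

A round trip such as parse_fpdu(build_fpdu(b"hello")) returns the original ULPDU, and the framed result is always a multiple of 4 octets long.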

413	   A fully layered MPA on TCP is implemented as a data stream ULP for
414	   TCP and is therefore RFC compliant.

416	   An optimized DDP/MPA/TCP uses a TCP layer which potentially contains
417	   some additional behaviors as suggested in this document.  When
418	   DDP/MPA/TCP are cross-layer optimized, the behavior of TCP (esp.
419	   sender segmentation) may change from that of the un-optimized
420	   implementation, but the changes are within the bounds permitted by
421	   the TCP RFC specifications, and will interoperate with an un-
422	   optimized TCP.  The additional behaviors are described in Appendix A
423	   and are not normative; they are described at a TCP interface layer as
424	   a convenience.  Implementations may achieve the described
425	   functionality using any method, including cross layer optimizations
426	   between TCP, MPA and DDP.

428	   An optimized DDP/MPA/TCP sender is able to segment the data stream
429	   such that TCP segments begin with FPDUs (FPDU Alignment).  This has
430	   significant advantages for receivers.  When segments arrive with
431	   aligned FPDUs the receiver usually need not buffer any portion of the
432	   segment, allowing DDP to place it in its destination memory
433	   immediately, thus avoiding copies from intermediate buffers (DDP's
434	   reason for existence).

436	   An optimized DDP/MPA/TCP receiver allows a DDP on MPA implementation
437	   to locate the start of ULPDUs that may be received out of order.  It
438	   also allows the implementation to determine if the entire ULPDU has
439	   been received.  As a result, MPA can pass out of order ULPDUs to DDP
440	   for immediate use.  This enables a DDP on MPA implementation to save
441	   a significant amount of intermediate storage by placing the ULPDUs in
442	   the right locations in the application buffers when they arrive,
443	   rather than waiting until full ordering can be restored.

445	   The ability of a receiver to recover out of order ULPDUs is optional
446	   and declared to the transmitter during startup.  When the receiver
447	   declares that it does not support out of order recovery, the
448	   transmitter does not add the control information to the data stream
449	   needed for out of order recovery.

451	   If the receiver is fully layered, then MPA receives a strictly
452	   ordered stream of data and does not deal with out of order ULPDUs.
453	   In this case MPA passes each ULPDU to DDP when the last bytes arrive
454	   from TCP, along with the indication that they are in order.
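
The fully layered case can be sketched as a small reassembly buffer that releases each ULPDU once its last octets have arrived.  The sketch assumes a 2-octet ULPDU_Length, padding to a multiple of 4 octets, and a 4-octet CRC field.  It omits Markers and does not verify the CRC.  The class name LayeredReceiver and the helper fpdu_total_len are hypothetical.

```python
HEADER_LEN = 2  # assumed size of the ULPDU_Length field, in octets
CRC_LEN = 4     # assumed size of the CRC field, in octets


def fpdu_total_len(ulpdu_len: int) -> int:
    """Length of a whole FPDU (header, ULPDU, PAD, CRC), without Markers."""
    pad = -(HEADER_LEN + ulpdu_len) % 4
    return HEADER_LEN + ulpdu_len + pad + CRC_LEN


class LayeredReceiver:
    """Consumes an in-order TCP byte stream and returns complete ULPDUs."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data: bytes) -> list:
        """Add in-order stream data; return the ULPDUs completed by it."""
        self._buf += data
        ulpdus = []
        while len(self._buf) >= HEADER_LEN:
            length = int.from_bytes(self._buf[:HEADER_LEN], "big")
            total = fpdu_total_len(length)
            if len(self._buf) < total:
                break  # the last octets of this FPDU have not arrived yet
            ulpdus.append(bytes(self._buf[HEADER_LEN:HEADER_LEN + length]))
            del self._buf[:total]
        return ulpdus
```

Since TCP hands over the stream strictly in order, every ULPDU returned by feed is already in order and no out of order bookkeeping is needed.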

456	   MPA implementations that support recovery of out of order ULPDUs MUST
457	   support a mechanism to indicate the ordering of ULPDUs as the sender
458	   transmitted them and indicate when missing intermediate segments
459	   arrive.  These mechanisms allow DDP to reestablish record ordering
460	   and report Delivery of complete messages (groups of records).

462	   MPA also addresses enhanced data integrity.  Some users of TCP have
463	   noted that the TCP checksum is not as strong as could be desired (see
464	   [CRCTCP]).  Studies such as [CRCTCP] have shown that the TCP checksum
465	   indicates segments in error at a much higher rate than the underlying
466	   link characteristics would indicate.  With these higher error rates,
467	   the chance that an error will escape detection, when using only the
468	   TCP checksum for data integrity, becomes a concern.  A stronger
469	   integrity check can reduce the chance of data errors being missed.
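
One well-known weakness of a ones-complement sum can be shown with a short sketch: reordering 16-bit words leaves the checksum unchanged, while a CRC detects the change.  The sketch uses zlib.crc32 only as a convenient stand-in; the CRC used by MPA is specified in section 4.4.  The function name internet_checksum is hypothetical.

```python
import struct
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones-complement checksum of the kind used by TCP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


original = bytes.fromhex("12345678")
swapped = bytes.fromhex("56781234")  # same 16-bit words, reordered

# Addition is commutative, so the checksum cannot see the reordering.
same_checksum = internet_checksum(original) == internet_checksum(swapped)
# The CRC depends on position, so the two values differ.
same_crc = zlib.crc32(original) == zlib.crc32(swapped)
```

Here same_checksum is True and same_crc is False.  This is one specific error class, not a measure of overall error rates; those are discussed in [CRCTCP].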

471	   MPA includes a CRC check to increase the ULPDU data integrity to the
472	   level provided by other modern protocols, such as SCTP [RFC2960].  It
473	   is possible to disable this CRC check; however, CRCs MUST be enabled
474	   unless it is clear that the end to end connection through the network
475	   has data integrity at least as good as an MPA with CRC enabled (for
476	   example when IPsec is implemented end to end).  DDP's ULP expects
477	   this level of data integrity and therefore the ULP does not have to
478	   provide its own duplicate data integrity and error recovery for lost
479	   data.

481	3  MPA's interactions with DDP

483	   DDP requires MPA to maintain DDP record boundaries from the sender to
484	   the receiver.  When using MPA on TCP to send data, DDP provides
485	   records (ULPDUs) to MPA.  MPA will use the reliable transmission
486	   abilities of TCP to transmit the data, and will insert appropriate
487	   additional information into the TCP stream to allow the MPA receiver
488	   to locate the record boundary information.

490	   As such, MPA accepts complete records (ULPDUs) from DDP at the sender
491	   and returns them to DDP at the receiver.

493	   MPA MUST encapsulate the ULPDU such that there is exactly one ULPDU
494	   contained in one FPDU.

496	   MPA over a standard TCP stack can usually provide FPDU Alignment with
497	   the TCP Header if the FPDU is equal to TCP's EMSS.  An optimized
498	   MPA/TCP stack can also maintain alignment as long as the FPDU is less
499	   than or equal to TCP's EMSS.  Since FPDU Alignment is generally
500	   desired by the receiver, DDP cooperates with MPA to ensure FPDUs'
501	   lengths do not exceed the EMSS under normal conditions.  This is done
502	   with the MULPDU mechanism.

504	   MPA MUST provide information to DDP on the current maximum size of
505	   the record that is acceptable to send (MULPDU).  DDP SHOULD limit
506	   each record size to MULPDU.  The range of MULPDU values MUST be
507	   between 128 octets and 64768 octets, inclusive.

509	   The sending DDP MUST NOT post a ULPDU larger than 64768 octets to
510	   MPA.  DDP MAY post a ULPDU of any size between one and 64768 octets;
511	   however, MPA is not REQUIRED to support a ULPDU Length that is greater
512	   than the current MULPDU.

514	   While the maximum theoretical length supported by the MPA header
515	   ULPDU_Length field is 65535, TCP over IP requires the IP datagram
516	   maximum length to be 65535 octets.  To enable MPA to support FPDU
517	   Alignment, the maximum size of the FPDU must fit within an IP
518	   datagram.  Thus the ULPDU limit of 64768 octets was derived by taking
519	   the maximum IP datagram length, subtracting from it the maximum total
520	   length of the sum of the IPv4 header, TCP header, IPv4 options, TCP
521	   options, and the worst case MPA overhead, and then rounding the
522	   result down to a 128 octet boundary.
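
	   The derivation above can be reproduced with a short calculation.
	   This is only a sketch: the text does not itemize the individual
	   header sizes, so the 60 octet maximums for the IPv4 and TCP headers
	   (each including options) and the makeup of the worst case MPA
	   overhead (length field, CRC, 3 pad octets, and one Marker per 512
	   octets) are assumptions made here for illustration.

```python
import math

IP_MAX_DATAGRAM = 65535   # maximum IP datagram length (from the text)
IPV4_HEADER_MAX = 60      # assumption: 20-octet header + 40 octets of options
TCP_HEADER_MAX = 60       # assumption: 20-octet header + 40 octets of options
MPA_FIXED = 2 + 4         # ULPDU_Length field + CRC field
MPA_PAD_MAX = 3           # assumption: worst-case pad
MARKER_SIZE = 4
MARKER_INTERVAL = 512


def ulpdu_limit():
    """Largest ULPDU that still fits in one IP datagram, rounded down to 128."""
    tcp_payload = IP_MAX_DATAGRAM - IPV4_HEADER_MAX - TCP_HEADER_MAX
    markers = MARKER_SIZE * math.ceil(tcp_payload / MARKER_INTERVAL)
    room = tcp_payload - (MPA_FIXED + MPA_PAD_MAX + markers)
    return (room // 128) * 128


print(ulpdu_limit())  # 64768
```

	   Under these assumptions the result matches the 64768 octet limit
	   stated above.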

524	   Note that MULPDU will be significantly smaller than the theoretical
525	   maximum in most implementations for most circumstances, due to link
526	   MTUs, use of extra headers such as those required for IPsec, etc.

528	   On receive, MPA MUST pass each ULPDU with its length to DDP when it
529	   has been validated.

531	   If an MPA implementation supports passing out of order ULPDUs to DDP,
532	   the MPA implementation SHOULD:

534	   *   Pass each ULPDU with its length to DDP as soon as it has been
535	       fully received and validated.

537	   *   Provide a mechanism to indicate the ordering of ULPDUs as the
538	       sender transmitted them.  One possible mechanism might be
539	       providing the TCP sequence number for each ULPDU.

541	   *   Provide a mechanism to indicate when a given ULPDU (and prior
542	       ULPDUs) are complete (Delivered to DDP).  One possible mechanism
543	       might be to allow DDP to see the current outgoing TCP Ack
544	       sequence number.

546	   *   Provide an indication to DDP that the TCP has closed or has begun
547	       to close the connection (e.g. received a FIN).

549	   MPA MUST provide the protocol version negotiated with its peer to
550	   DDP.  DDP will use this version to set the version in its header and
551	   to report the version to [RDMAP].

553	4  MPA Full Operation Mode

555	   The following sections describe the main semantics of the full
556	   operation mode of MPA.

558	4.1 FPDU Format

560	   MPA senders create FPDUs out of ULPDUs.  The format of an FPDU shown
561	   below MUST be used for all MPA FPDUs.  For purposes of clarity,
562	   Markers are not shown in Figure 2.

564	       0                   1                   2                   3
565	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
566	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
567	      |          ULPDU_Length         |                               |
568	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
569	      |                                                               |
570	      ~                                                               ~
571	      ~                            ULPDU                              ~
572	      |                                                               |
573	      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
574	      |                               |          PAD (0-3 octets)     |
575	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
576	      |                             CRC                               |
577	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
578	                           Figure 2 FPDU Format

580	   ULPDU_Length: 16 bits (unsigned integer).  This is the number of
581	   octets of the contained ULPDU.  It does not include the length of the
582	   FPDU header itself, the pad, the CRC, or of any Markers that fall
583	   within the ULPDU.  The 16-bit ULPDU Length field is large enough to
584	   support the largest IP datagrams for IPv4 or IPv6.

586	   PAD: The PAD field trails the ULPDU and contains between zero and
587	   three octets of data.  The pad data MUST be set to zero by the sender
588	   and ignored by the receiver (except for CRC checking).  The length of
589	   the pad is set so as to make the size of the FPDU an integral
590	   multiple of four.

592	   CRC: 32 bits.  When CRCs are enabled, this field contains a CRC32C
593	   check value, which is used to verify the entire contents of the FPDU.
594	   See section 4.4 CRC Calculation on page 18.  When CRCs
595	   are not enabled, this field is still present, may contain any value,
596	   and MUST NOT be checked.

598	   The FPDU adds a minimum of 6 octets to the length of the ULPDU.  In
599	   addition, the total length of the FPDU will include the length of any
600	   Markers and from 0 to 3 pad octets added to round-up the ULPDU size.
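
	   As an illustration of the layout in Figure 2, the following sketch
	   frames a ULPDU into an FPDU for the simple case where Markers are
	   not in use.  The function names are hypothetical.  The CRC field is
	   left as a caller-supplied placeholder (zero by default), which is
	   only acceptable on the wire when CRCs are not enabled.

```python
import struct


def pad_length(ulpdu_len):
    """Pad octets (0-3) that make length field + ULPDU + pad a multiple of 4."""
    return (-(2 + ulpdu_len)) % 4


def build_fpdu(ulpdu, crc=0):
    """Frame one ULPDU: ULPDU_Length, ULPDU, zero pad, 4-octet CRC field.

    Markers are not handled here; crc is a placeholder value.
    """
    if not 1 <= len(ulpdu) <= 64768:
        raise ValueError("ULPDU length out of range")
    header = struct.pack("!H", len(ulpdu))      # 16-bit unsigned, network order
    pad = bytes(pad_length(len(ulpdu)))         # pad octets are zero
    return header + ulpdu + pad + struct.pack("!I", crc)


fpdu = build_fpdu(bytes(16))
print(len(fpdu))  # 24: 2 (length) + 16 (ULPDU) + 2 (pad) + 4 (CRC)
```

	   A 16 octet ULPDU therefore needs 2 pad octets, and a 42 octet ULPDU
	   needs none.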

602	4.2 Marker Format

604	   The format of a Marker MUST be as specified in Figure 3:

606	       0                   1                   2                   3
607	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
608	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
609	      |           RESERVED            |            FPDUPTR            |
610	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
611	                          Figure 3 Marker Format

613	   RESERVED: The Reserved field MUST be set to zero on transmit and
614	   ignored on receive (except for CRC calculation).

616	   FPDUPTR: The FPDU Pointer is a relative pointer, 16-bits long,
617	   interpreted as an unsigned integer that indicates the number of
618	   octets in the TCP stream from the beginning of the ULPDU Length field
619	   to the first octet of the entire Marker.  The least significant two
620	   bits MUST always be set to zero at the transmitter, and the receivers
621	   MUST always treat these as zero for calculations.

623	4.3 MPA Markers

625	   MPA Markers are used to identify the start of FPDUs when packets are
626	   received out of order.  This is done by locating the Markers at fixed
627	   intervals in the data stream (which is correlated to the TCP sequence
628	   number) and using the Marker value to locate the preceding FPDU
629	   start.

631	   All MPA Markers are included in the containing FPDU CRC calculation
632	   (when both CRCs and Markers are in use).

634	   The MPA receiver's ability to locate out of order FPDUs and pass the
635	   ULPDUs to DDP is implementation dependent.  MPA/DDP allows those
636	   receivers that are able to deal with out of order FPDUs in this way
637	   to require the insertion of Markers in the data stream.  When the
638	   receiver cannot deal with out of order FPDUs in this way, it may
639	   disable the insertion of Markers at the sender.  All MPA senders MUST
640	   be able to generate Markers when their use is declared by the
641	   opposing receiver (see section 7.1 Connection setup on page 26).

643	   When Markers are enabled, MPA senders MUST insert a Marker into the
644	   data stream at a 512 octet periodic interval in the TCP Sequence
645	   Number Space.  The Marker contains a 16 bit unsigned integer referred
646	   to as the FPDUPTR (FPDU Pointer).

648	   If the FPDUPTR's value is non-zero, the FPDU Pointer is a 16 bit
649	   relative back-pointer.  FPDUPTR MUST contain the number of octets in
650	   the TCP stream from the beginning of the ULPDU Length field to the
651	   first octet of the Marker, unless the Marker falls between FPDUs.
652	   Thus the location of the first octet of the previous FPDU header can
653	   be determined by subtracting the value of the given Marker from the
654	   current octet-stream sequence number (i.e. TCP sequence number) of
655	   the first octet of the Marker.  Note that this computation MUST take
656	   into account that the TCP sequence number could have wrapped between
657	   the Marker and the header.

659	   An FPDUPTR value of 0x0000 is a special case - it is used when the
660	   Marker falls exactly between FPDUs (between the preceding FPDU CRC
661	   field, and the next FPDU's ULPDU Length field).  In this case, the
662	   Marker is considered to be contained in the following FPDU; the
663	   Marker MUST be included in the CRC calculation of the FPDU following
664	   the Marker (if CRCs are being generated or checked).  Thus an FPDUPTR
665	   value of 0x0000 means that immediately following the Marker is an
666	   FPDU header (the ULPDU Length field).

668	   Since all FPDUs are integral multiples of 4 octets, the bottom two
669	   bits of the FPDUPTR as calculated by the sender are zero.  MPA
670	   reserves these bits so they MUST be treated as zero for computation
671	   at the receiver.
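
	   The back-pointer arithmetic described above might be sketched as
	   follows.  The function name is hypothetical, and the sketch assumes
	   that the TCP sequence number is used directly as the octet-stream
	   sequence number, so that all arithmetic is modulo 2^32.

```python
SEQ_MOD = 1 << 32     # TCP sequence numbers wrap at 2^32
MARKER_SIZE = 4


def fpdu_header_seq(marker_seq, fpduptr):
    """Sequence number of the ULPDU Length field that a Marker refers to.

    marker_seq is the sequence number of the first octet of the Marker.
    """
    offset = fpduptr & 0xFFFC          # low two bits are treated as zero
    if offset == 0:
        # Marker fell between FPDUs: the header immediately follows it.
        return (marker_seq + MARKER_SIZE) % SEQ_MOD
    # Otherwise the header lies 'offset' octets before the Marker.
    return (marker_seq - offset) % SEQ_MOD


print(hex(fpdu_header_seq(0x200, 0x0014)))  # 0x1ec
print(hex(fpdu_header_seq(4, 0x000C)))      # 0xfffffff8 (wrapped)
```

	   The modulo operation handles the case where the sequence number
	   wrapped between the header and the Marker.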

673	   When Markers are enabled (see section 7.1 Connection setup on page
674	   26), the MPA Markers MUST be inserted immediately preceding the first
675	   FPDU of Full Operation phase, and at every 512th octet of the TCP
676	   octet stream thereafter.  As a result, the first Marker has an
677	   FPDUPTR value of 0x0000.  If the first Marker begins at octet
678	   sequence number SeqStart, then Markers are inserted such that the
679	   first octet of the Marker is at octet sequence number SeqNum if the
680	   remainder of (SeqNum - SeqStart) mod 512 is zero.  Note that SeqNum
681	   can wrap.

683	   For example, if the TCP sequence number were used to calculate the
684	   insertion point of the Marker, the starting TCP sequence number is
685	   unlikely to be zero, and 512 octet multiples are unlikely to fall on
686	   a modulo 512 of zero.  If the MPA connection is started at TCP
687	   sequence number 11, then the 1st Marker will begin at 11, and
688	   subsequent Markers will begin at 523, 1035, etc.
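
	   The placement rule can be checked with a few lines of code.  This
	   is a sketch with hypothetical helper names; it assumes 32-bit
	   sequence numbers, so the difference (SeqNum - SeqStart) is taken
	   modulo 2^32 before the modulo 512 test.

```python
SEQ_MOD = 1 << 32
MARKER_INTERVAL = 512


def is_marker_start(seq_num, seq_start):
    """True if a Marker begins at seq_num, given the first Marker at seq_start."""
    return ((seq_num - seq_start) % SEQ_MOD) % MARKER_INTERVAL == 0


def marker_positions(seq_start, count):
    """Sequence numbers of the first 'count' Markers."""
    return [(seq_start + i * MARKER_INTERVAL) % SEQ_MOD for i in range(count)]


print(marker_positions(11, 3))  # [11, 523, 1035]
```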

690	   If an FPDU is large enough to contain multiple Markers, they MUST all
691	   point to the same point in the TCP stream: the first octet of the
692	   ULPDU Length field for the FPDU.

694	   If a Marker interval contains multiple FPDUs (the FPDUs are small),
695	   the Marker MUST point to the start of the ULPDU Length field for the
696	   FPDU containing the Marker unless the Marker falls between FPDUs, in
697	   which case the Marker MUST be zero.
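
	   One way a sender could apply these rules is sketched below.  The
	   function is hypothetical and takes an FPDU that has already been
	   framed without Markers (length field, ULPDU, pad, and a CRC field),
	   plus the stream offset, relative to SeqStart, at which its first
	   octet will be emitted.  The CRC field is carried through unchanged;
	   a real sender would compute it after the Markers are in place.

```python
import struct

MARKER_INTERVAL = 512


def insert_markers(fpdu, offset):
    """Return the FPDU octets with Markers inserted at 512-octet intervals."""
    if len(fpdu) % 4 or offset % 4:
        raise ValueError("FPDUs and stream offsets are multiples of 4 octets")
    out = bytearray()
    pos = offset            # stream position of the next octet to emit
    header_pos = None       # stream position of the ULPDU Length field
    for i in range(0, len(fpdu), 4):
        if pos % MARKER_INTERVAL == 0:
            # A Marker ahead of the header fell between FPDUs: FPDUPTR = 0.
            fpduptr = 0 if header_pos is None else pos - header_pos
            out += struct.pack("!HH", 0, fpduptr)   # RESERVED, FPDUPTR
            pos += 4
        if header_pos is None:
            header_pos = pos
        out += fpdu[i:i + 4]
        pos += 4
    return bytes(out)


# 42-octet ULPDU of zeros, placeholder CRC, starting at stream offset 492.
fpdu = struct.pack("!H", 42) + bytes(42) + bytes(4)
wire = insert_markers(fpdu, 492)
print(wire[20:24].hex())  # 00000014
```

	   Every Marker inside a large FPDU points back to the same header
	   position, because header_pos is fixed once per FPDU.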

699	   The following example shows an FPDU containing a Marker.

701	       0                   1                   2                   3
702	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
703	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
704	      |       ULPDU Length (0x0010)   |                               |
705	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
706	      |                                                               |
707	      +                                                               +
708	      |                         ULPDU (octets 0-9)                    |
709	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
710	      |            (0x0000)           |        FPDU ptr (0x000C)      |
711	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
712	      |                        ULPDU (octets 10-15)                   |
713	      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
714	      |                               |          PAD (2 octets:0,0)   |
715	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
716	      |                              CRC                              |
717	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
718	                 Figure 4 Example FPDU Format with Marker

720	   MPA Receivers MUST preserve ULPDU boundaries when passing data to
721	   DDP.  MPA Receivers MUST pass the ULPDU data and the ULPDU Length to
722	   DDP and not the Markers, headers, and CRC.

724	4.4 CRC Calculation

726	   An MPA implementation MUST implement CRC support and MUST either:

728	   (1) always use CRCs; the MPA provider is not REQUIRED to support
729	       an administrator's request that CRCs not be used.

731	       or

733	   (2a) only indicate a preference to not use CRCs on the explicit
734	       request of the system administrator, via an interface not defined
735	       in this spec.  The default configuration for a connection MUST be
736	       to use CRCs.

738	   (2b) disable CRC checking (and possibly generation) if both the local
739	       and remote endpoints indicate preference to not use CRCs.

741	   The decision for hosts to request CRC suppression MAY be made on an
742	   administrative basis for any path that provides equivalent protection
743	   from undetected errors as an end-to-end CRC32c.

745	   The process MUST be invisible to the ULP.

747	   After receipt of an MPA startup declaration indicating that its peer
748	   requires CRCs, an MPA instance MUST continue generating and checking
749	   CRCs until the connection terminates.  If an MPA instance has
750	   declared that it does not require CRCs, it MUST turn off CRC checking
751	   immediately after receipt of an MPA mode declaration indicating that
752	   its peer also does not require CRCs.  It MAY continue generating
753	   CRCs.  See section 7.1 Connection setup on page 26 for details on the
754	   MPA startup.
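
	   The outcome of the CRC declarations can be summarized as a simple
	   truth function.  This is a sketch; the function name and the
	   boolean inputs are hypothetical stand-ins for the declarations
	   exchanged during MPA startup.

```python
def crc_checking_enabled(local_requires_crc, peer_requires_crc):
    """CRC checking is turned off only when both endpoints decline CRCs."""
    return local_requires_crc or peer_requires_crc


for local in (True, False):
    for peer in (True, False):
        print(local, peer, crc_checking_enabled(local, peer))
```

	   Generation follows the same rule, except that an instance MAY keep
	   generating CRCs even when checking has been turned off.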

756	   When sending an FPDU, the sender MUST include a CRC field.  When CRCs
757	   are enabled, the CRC field in the MPA FPDU MUST be computed using the
758	   CRC32C polynomial in the manner described in the iSCSI Protocol
759	   [iSCSI] document for Header and Data Digests.

761	   The fields which MUST be included in the CRC calculation when sending
762	   an FPDU are as follows:

764	   1)  If a Marker does not immediately precede the ULPDU Length field,
765	       the CRC-32c is calculated from the first octet of the ULPDU
766	       Length field, through all the ULPDU and Markers (if present), to
767	       the last octet of the PAD (if present), inclusive.  If there is a
768	       Marker immediately following the PAD, the Marker is included in
769	       the CRC calculation for this FPDU.

771	   2)  If a Marker immediately precedes the first octet of the ULPDU
772	       Length field of the FPDU, (i.e. the Marker fell between FPDUs,
773	       and thus is required to be included in the second FPDU), the CRC-
774	       32c is calculated from the first octet of the Marker, through the
775	       ULPDU Length header, through all the ULPDU and Markers (if
776	       present), to the last octet of the PAD (if present), inclusive.

778	   3)  After calculating the CRC-32c, the resultant value is placed into
779	       the CRC field at the end of the FPDU.
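
	   The coverage rules above amount to "everything on the wire for this
	   FPDU except the CRC field itself, plus a leading Marker if one fell
	   between FPDUs".  The following sketch shows this with a plain
	   bitwise CRC32C.  The function names are hypothetical, and the
	   serialization of the 32-bit result into the CRC field is not shown
	   because it is defined by [iSCSI].

```python
def crc32c(data):
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def fpdu_crc(wire_fpdu):
    """CRC32C over an FPDU as sent on the wire.

    wire_fpdu starts at the Marker preceding the ULPDU Length field (if the
    Marker fell between FPDUs) or at the ULPDU Length field, includes any
    embedded Markers and the pad, and ends with the 4-octet CRC field.
    """
    return crc32c(wire_fpdu[:-4])


print(hex(crc32c(b"123456789")))  # 0xe3069283 (standard CRC32C check value)
```

	   A table-driven or hardware-assisted CRC32C would normally replace
	   the bitwise loop in practice.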

781	   When an FPDU is received, and CRC checking is enabled, the receiver
782	   MUST first perform the following:

784	   1)  Calculate the CRC of the incoming FPDU in the same fashion as
785	       defined above.

787	   2)  Verify that the calculated CRC-32c value is the same as the
788	       received CRC-32c value found in the FPDU CRC field.  If not, the
789	       receiver MUST treat the FPDU as an invalid FPDU.

791	   The procedure for handling invalid FPDUs is covered in the Error
792	   Section (see section 8 on page 40).

794	   The following is an annotated hex dump of an example FPDU sent as the
795	   first FPDU on the stream.  As such, it starts with a Marker.  The
796	   FPDU contains a 42 octet ULPDU (an example DDP segment), which in turn
797	   contains 24 octets of Send Data, a data load that
798	   is all zeros.  The CRC32c has been correctly calculated and can be
799	   used as a reference.  See the [DDP] and [RDMAP] specification for
800	   definitions of the DDP Control field, Queue, MSN, MO, and Send Data.

802	       Octet Contents  Annotation
803	       Count

805	       0000    00      Marker: Reserved
806	       0001    00
807	       0002    00      Marker: FPDUPTR
808	       0003    00
809	       0004    00      ULPDU Length
810	       0005    2a
811	       0006    41      DDP Control Field, Send with Last flag set
812	       0007    43
813	       0008    00      Reserved (DDP STag position with no STag)
814	       0009    00
815	       000a    00
816	       000b    00
817	       000c    00      DDP Queue = 0
818	       000d    00
819	       000e    00
820	       000f    00
821	       0010    00      DDP MSN = 1
822	       0011    00
823	       0012    00
824	       0013    01
825	       0014    00      DDP MO = 0
826	       0015    00
827	       0016    00
828	       0017    00
829	       0018    00      DDP Send Data (24 octets of zeros)
830	       ...
831	       002f    00
832	       0030    52      CRC32c
833	       0031    23
834	       0032    99
835	       0033    83
836	                  Figure 5 Annotated Hex Dump of an FPDU

838	   The following is an example sent as the second FPDU of the stream
839	   where the first FPDU (which is not shown here) had a length of 492
840	   octets and was also a Send to Queue 0 with Last Flag set.  This
841	   example contains a Marker.

843	       Octet Contents  Annotation
844	       Count

846	       01ec    00      Length
847	       01ed    2a
848	       01ee    41      DDP Control Field: Send with Last Flag set
849	       01ef    43
850	       01f0    00      Reserved (DDP STag position with no STag)
851	       01f1    00
852	       01f2    00
853	       01f3    00
854	       01f4    00      DDP Queue = 0
855	       01f5    00
856	       01f6    00
857	       01f7    00
858	       01f8    00      DDP MSN = 2
859	       01f9    00
860	       01fa    00
861	       01fb    02
862	       01fc    00      DDP MO = 0
863	       01fd    00
864	       01fe    00
865	       01ff    00
866	       0200    00      Marker: Reserved
867	       0201    00
868	       0202    00      Marker: FPDUPTR
869	       0203    14
870	       0204    00      DDP Send Data (24 octets of zeros)
871	       ...
872	       021b    00
873	       021c    84      CRC32c
874	       021d    92
875	       021e    58
876	       021f    98
877	            Figure 6 Annotated Hex Dump of an FPDU with Marker

879	4.5 FPDU Size Considerations

881	   MPA defines the Maximum Upper Layer Protocol Data Unit (MULPDU) as
882	   the size of the largest ULPDU fitting in an FPDU.  For an empty TCP
883	   Segment, MULPDU is EMSS minus the FPDU overhead (6 octets) minus
884	   space for Markers and pad octets.

886	        The maximum ULPDU Length for a single ULPDU when Markers are
887	        present MUST be computed as:

889	        MULPDU = EMSS - (6 + 4 * Ceiling(EMSS / 512) + EMSS mod 4)

891	   The formula above accounts for the worst-case number of Markers.

893	        The maximum ULPDU Length for a single ULPDU when Markers are NOT
894	        present MUST be computed as:

896	        MULPDU = EMSS - (6 + EMSS mod 4)
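
	   Both formulas can be expressed in one small function.  This is a
	   sketch with a hypothetical function name; it does not apply the 128
	   octet lower bound on MULPDU.

```python
def mulpdu(emss, markers_enabled):
    """MULPDU for an empty TCP segment of size emss."""
    overhead = 6 + emss % 4                      # header + CRC, plus pad term
    if markers_enabled:
        overhead += 4 * ((emss + 511) // 512)    # 4 * Ceiling(EMSS / 512)
    return emss - overhead


print(mulpdu(1460, True))   # 1442
print(mulpdu(1460, False))  # 1454
```

	   For an EMSS of 1460 octets, enabling Markers costs 12 octets of
	   MULPDU (three Markers in the worst case).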

898	   As a further optimization of the wire efficiency an MPA
899	   implementation MAY dynamically adjust the MULPDU (see section 5 for
900	   latency and wire efficiency trade-offs).  When one or more FPDUs are
901	   already packed into a TCP Segment, MULPDU MAY be reduced accordingly.

903	   DDP SHOULD provide ULPDUs that are as large as possible, but less
904	   than or equal to MULPDU.

906	   If the TCP implementation needs to adjust EMSS to support MTU changes
907	   or changing TCP options, the MULPDU value is changed accordingly.

909	   In certain rare situations, the EMSS may shrink below 128 octets in
910	   size.  If this occurs, the MPA on TCP sender MUST NOT shrink the
911	   MULPDU below 128 octets and is not required to follow the
912	   segmentation rules in Section 5.1 and Appendix A.

914	   If one or more FPDUs are already packed into a TCP segment, such that
915	   the remaining room is less than 128 octets, MPA MUST NOT provide a
916	   MULPDU smaller than 128.  In this case, MPA would typically provide a
917	   MULPDU for the next full sized segment, but may still pack the next
918	   FPDU into the small remaining room, provided that the next FPDU is
919	   small enough to fit.

921	   The value 128 is chosen so as to allow DDP designers room for the DDP
922	   Header and some user data.

924	5  MPA's interactions with TCP

926	   The following sections describe MPA's interactions with TCP.  This
927	   section discusses using a standard layered TCP stack with MPA
928	   attached above a TCP socket.  Discussion of using an optimized MPA-
929	   aware TCP with an MPA implementation that takes advantage of the
930	   extra optimizations is done in Appendix A.

932	                   +-----------------------------------+
933	                   | +-----+       +-----------------+ |
934	                   | | MPA |       | Other Protocols | |
935	                   | +-----+       +-----------------+ |
936	                   |    ||                  ||         |
937	                   |  ----- socket API --------------  |
938	                   |            ||                     |
939	                   |         +-----+                   |
940	                   |         | TCP |                   |
941	                   |         +-----+                   |
942	                   |            ||                     |
943	                   |         +-----+                   |
944	                   |         | IP  |                   |
945	                   |         +-----+                   |
946	                   +-----------------------------------+

948	                   Figure 7 Fully layered implementation

950	   The Fully layered implementation is described for completeness;
951	   however, the user is cautioned that the reduced probability of FPDU
952	   alignment when transmitting with this implementation will tend to
953	   introduce a higher overhead at optimized receivers.  In addition, the
954	   lack of out-of-order receive processing will significantly reduce the
955	   value of DDP/MPA by imposing higher buffering and copying overhead in
956	   the local receiver.

958	5.1 MPA transmitters with a standard layered TCP

960	   MPA transmitters SHOULD calculate a MULPDU as described in section
961	   4.5.  If the TCP implementation allows EMSS to be determined by MPA,
962	   that value should be used.  If the transmit side TCP implementation
963	   is not able to report the EMSS, MPA SHOULD use the current MTU value
964	   to establish a likely FPDU size, taking into account the various
965	   expected header sizes.

967	   MPA transmitters SHOULD also use whatever facilities the TCP stack
968	   presents to cause the TCP transmitter to start TCP segments at FPDU
969	   boundaries.  Multiple FPDUs MAY be packed into a single TCP segment
970	   as determined by the EMSS calculation as long as they are entirely
971	   contained in the TCP segment.

973	   For example, passing FPDU buffers sized to the current EMSS to the
974	   TCP socket and using the TCP_NODELAY socket option to disable the
975	   Nagle [RFC0896] algorithm will usually result in many of the segments
976	   starting with an FPDU.
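
	   With a sockets-style API, the approach in this example might look
	   like the following sketch.  The helper names are hypothetical, and
	   nothing here guarantees that a segment starts with an FPDU; it only
	   disables the Nagle algorithm and hands TCP one FPDU per send call.

```python
import socket


def open_mpa_socket():
    """Create a TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def send_fpdu(sock, fpdu):
    """Pass one complete FPDU buffer to TCP in a single call."""
    sock.sendall(fpdu)
```

	   The caller is expected to size each FPDU buffer to the current
	   EMSS, as described above.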

978	   It is recognized that various effects can cause FPDU alignment to
979	   be lost.  Following are a few of the effects:

981	   *   ULPDUs that are smaller than the MULPDU.  If these are sent in a
982	       continuous stream, FPDU alignment will be lost.  Note that
983	       careful use of a dynamic MULPDU can help in this case; the MULPDU
984	       for future FPDUs can be adjusted to re-establish alignment with
985	       the segments based on the current EMSS.

987	   *   Sending enough data that the TCP receive window limit is reached.
988	       TCP may send a smaller segment to exactly fill the receive
989	       window.

991	   *   Sending data when TCP is operating up against the congestion
992	       window.  If TCP is not tracking the congestion window in
993	       segments, it may transmit a smaller segment to exactly fill the
994	       congestion window.

996	   *   Changes in EMSS due to varying TCP options, or changes in MTU.

998	   If FPDU alignment with TCP segments is lost for any reason, the
999	   alignment is regained after a break in transmission where the TCP
1000	   send buffers are emptied.  Many usage models for DDP/MPA will include
1001	   such breaks.

1003	   MPA receivers are REQUIRED to be able to operate correctly even if
1004	   alignment is lost (see section 6).

1006	5.2 MPA receivers with a standard layered TCP

1008	   MPA receivers will get TCP data in the usual ordered stream.  The
1009	   receivers MUST identify FPDU boundaries by using the ULPDU_Length
1010	   field, as described in section 6.  Receivers MAY utilize markers to
1011	   check for FPDU boundary consistency, but they are not REQUIRED to
1012	   examine the markers to determine the FPDU boundaries.

1014	6  MPA Receiver FPDU Identification

1016	   An MPA receiver MUST first verify the FPDU before passing the ULPDU
1017	   to DDP.  To do this, the receiver MUST:

1019	   *   locate the start of the FPDU unambiguously,

1021	   *   verify its CRC (if CRC checking is enabled).

1023	   If the above conditions are true, the MPA receiver passes the ULPDU
1024	   to DDP.

1026	   To detect the start of the FPDU unambiguously one of the following
1027	   MUST be used:

1029	   1:  In an ordered TCP stream, the ULPDU Length field in the current
1030	       FPDU, when the FPDU has a valid CRC, can be used to identify the
1031	       beginning of the next FPDU.

1033	   2:  For optimized MPA/TCP receivers that support out of order
1034	       reception of FPDUs (see section 4.3 MPA Markers on page 15) a
1035	       Marker can always be used to locate the beginning of an FPDU (in
1036	       FPDUs with valid CRCs).  Since the location of the Marker is
1037	       known in the octet stream (sequence number space), the Marker can
1038	       always be found.

1040	   3:  Having found an FPDU by means of a Marker, an optimized MPA/TCP
1041	       receiver can find following contiguous FPDUs by using the ULPDU
1042	       Length fields (from FPDUs with valid CRCs) to establish the next
1043	       FPDU boundary.

1045	   The ULPDU Length field (see section 4 on page 14) MUST be used to
1046	   determine if the entire FPDU is present before forwarding the ULPDU
1047	   to DDP.
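
	   For the ordered-stream case (method 1 above), boundary detection
	   might be sketched as follows.  The function name is hypothetical.
	   To keep the example short it assumes that Markers are disabled and
	   that CRC checking is disabled, so the CRC field is skipped rather
	   than verified.

```python
import struct


def split_fpdus(stream):
    """Split an ordered octet stream into ULPDUs.

    Returns (ulpdus, leftover), where leftover holds the octets of an FPDU
    that has not yet fully arrived.
    """
    ulpdus = []
    pos = 0
    while len(stream) - pos >= 2:
        (ulpdu_len,) = struct.unpack_from("!H", stream, pos)
        pad = (-(2 + ulpdu_len)) % 4
        fpdu_len = 2 + ulpdu_len + pad + 4       # header, ULPDU, pad, CRC
        if len(stream) - pos < fpdu_len:
            break                                # wait for the rest
        ulpdus.append(stream[pos + 2:pos + 2 + ulpdu_len])
        pos += fpdu_len
    return ulpdus, stream[pos:]


data = b"\x00\x03abc" + bytes(3) + bytes(4) + b"\x00\x02hi" + bytes(4)
print(split_fpdus(data))  # ([b'abc', b'hi'], b'')
```

	   Only the ULPDU and its length are handed onward; the header, pad,
	   and CRC field are consumed by the receiver.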

1049	   CRC calculation is discussed in section 4.4 on page 18 above.

1051	7  Connection Semantics

1053	7.1 Connection setup

1055	   MPA requires that the Consumer MUST activate MPA, and any TCP
1056	   enhancements for MPA, on a TCP half connection at the same location
1057	   in the octet stream at both the sender and the receiver.  This is
1058	   required in order for the Marker scheme to correctly locate the
1059	   Markers (if enabled) and to correctly locate the first FPDU.

1061	   MPA, and any TCP enhancements for MPA, are enabled by the ULP in both
1062	   directions at once at an endpoint.

1064	   This can be accomplished several ways, and is left up to DDP's ULP:

1066	   *   DDP's ULP MAY require DDP on MPA startup immediately after TCP
1067	       connection setup.  This has the advantage that no streaming mode
1068	       negotiation is needed.  An example of such a protocol is shown in
1069	       Figure 10: Example Immediate Startup negotiation on page 36.

1071	       This may be accomplished by using a well-known port, or a service
1072	       locator protocol to locate an appropriate port on which DDP on
1073	       MPA is expected to operate.

1075	   *   DDP's ULP MAY negotiate the start of DDP on MPA sometime after a
1076	       normal TCP startup, using TCP streaming data exchanges on the
1077	       same connection.  The exchange establishes that DDP on MPA (as
1078	       well as other ULPs) will be used, and exactly locates the point
1079	       in the octet stream where MPA is to begin operation.  Note that
1080	       such a negotiation protocol is outside the scope of this
1081	       specification.  A simplified example of such a protocol is shown
1082	       in Figure 9: Example Delayed Startup negotiation on page 33.

1084	   An MPA endpoint operates in two distinct phases.

1086	   The Startup Phase is used to verify correct MPA setup, exchange CRC
1087	   and Marker configuration, and optionally pass Private Data between
1088	   endpoints prior to completing a DDP connection.  During this phase,
1089	   specifically formatted frames are exchanged as TCP byte streams
1090	   without using CRCs or Markers.  During this phase a DDP endpoint need
1091	   not be "bound" to the MPA connection.  In fact, the choice of DDP
1092	   endpoint and its operating parameters may not be known until the
1093	   Consumer supplied Private Data (if any) has been examined by the
1094	   Consumer.

1096	   The second distinct phase is Full Operation during which FPDUs are
1097	   sent using all the rules that pertain (CRCs, Markers, MULPDU
1098	   restrictions etc.).  A DDP endpoint MUST be "bound" to the MPA
1099	   connection at entry to this phase.

1101	   When Private Data is passed between ULPs in the Startup Phase, the
1102	   ULP is responsible for interpreting that data, and then placing MPA
1103	   into Full Operation.

1105	   Note: The following text differentiates the two endpoints by calling
1106	       them Initiator and Responder.  This is quite arbitrary and is NOT
1107	       related to the TCP startup (SYN, SYN/ACK sequence).  The
1108	       Initiator is the side that sends first in the MPA startup
1109	       sequence (the MPA Request Frame).

1111	   Note: The possibility that both endpoints would be allowed to make a
1112	       connection at the same time, sometimes called an active/active
1113	       connection, was considered by the work group and rejected.  There
1114	       were several motivations for this decision.  One was that
1115	       applications needing this facility were few (none other than
1116	       theoretical at the time of this draft).  Another was that the
1117	       facility created some implementation difficulties, particularly
1118	       with the "dual stack" designs described later on.  A last issue
1119	       was that dealing with rejected connections at startup would have
1120	       required at least an additional frame type, and more recovery
1121	       actions, complicating the protocol.  While none of these issues
1122	       was overwhelming, the group and implementers were not motivated
1123	       to do the work to resolve these issues.  The protocol includes a
1124	       method of detecting these active/active startup attempts so that
1125	       they can be rejected and an error reported.

1127	   The ULP is responsible for determining which side is Initiator or
1128	   Responder.  For client/server type ULPs this is easy.  For peer-peer
1129	   ULPs (which might utilize a TCP style active/active startup), some
1130	   mechanism (not defined by this specification) must be established, or
1131	   some streaming mode data exchanged prior to MPA startup to determine
1132	   which side starts in Initiator and which starts in Responder MPA
1133	   mode.

1135	7.1.1  MPA Request and Reply Frame Format

1137	       0                   1                   2                   3
1138	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1139	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1140	   0  |                                                               |
1141	      +         Key (16 bytes containing "MPA ID Req Frame")          +
1142	   4  |      (4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65)        |
1143	      +         Or  (16 bytes containing "MPA ID Rep Frame")          +
1144	   8  |      (4D 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65)        |
1145	      +                                                               +
1146	   12 |                                                               |
1147	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1148	   16 |M|C|R| Res     |     Rev       |          PD_Length            |
1149	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1150	      |                                                               |
1151	      ~                                                               ~
1152	      ~                   Private Data                                ~
1153	      |                                                               |
1154	      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1155	      |                               |
1156	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1157	                     Figure 8 MPA Request/Reply Frame

1159	   Key: This field contains the "key" used to validate that the sender
1160	       is an MPA sender.  Initiator mode senders MUST set this field to
1161	       the fixed value "MPA ID Req Frame" or (in byte order) 4D 50 41 20
1162	       49 44 20 52 65 71 20 46 72 61 6D 65 (in hexadecimal).  Responder
1163	       mode receivers MUST check this field for the same value, and
1164	       close the connection and report an error locally if any other
1165	       value is detected.  Responder mode senders MUST set this field to
1166	       the fixed value "MPA ID Rep Frame" or (in byte order) 4D 50 41 20
1167	       49 44 20 52 65 70 20 46 72 61 6D 65 (in hexadecimal).  Initiator
1168	       mode receivers MUST check this field for the same value, and
1169	       close the connection and report an error locally if any other
1170	       value is detected.

1172	   M: This bit declares an endpoint's REQUIRED Marker usage.  When this
1173	       bit is '1' in an MPA Request Frame, the Initiator declares that
1174	       Markers are REQUIRED in FPDUs sent from the Responder.  When set
1175	       to '1' in an MPA Reply Frame, this bit declares that Markers are
1176	       REQUIRED in FPDUs sent from the Initiator.  When the value in a
1177	       received MPA Request Frame or MPA Reply Frame is '0', Markers
1178	       MUST NOT be added to the data stream by the receiving endpoint.
1179	       When the value is '1', Markers MUST be added as described in
1180	       section 4.3 MPA Markers on page 15.

1182	   C: This bit declares an endpoint's preferred CRC usage.  When this
1183	       field is '0' in the MPA Request Frame and the MPA Reply Frame,
1184	       CRCs MUST NOT be checked and need not be generated by either
1185	       endpoint.  When this bit is '1' in either the MPA Request Frame
1186	       or MPA Reply Frame, CRCs MUST be generated and checked by both
1187	       endpoints.  Note that even when not in use, the CRC field remains
1188	       present in the FPDU.  When CRCs are not in use, the CRC field
1189	       MUST be considered valid for FPDU checking regardless of its
1190	       contents.

1192	   R: This bit is set to zero, and not checked on reception in the MPA
1193	       Request Frame.  In the MPA Reply Frame, this bit is the Rejected
1194	       Connection bit, set by the Responder's ULP to indicate acceptance
1195	       '0', or rejection '1', of the connection parameters provided in
1196	       the Private Data.

1198	   Res: This field is reserved for future use.  It MUST be set to zero
1199	       when sending, and not checked on reception.

1201	   Rev: This field contains the Revision of MPA.  For this version of
1202	       the specification senders MUST set this field to one.  MPA
1203	       receivers compliant with this version of the specification MUST
1204	       check this field.  If the MPA receiver cannot interoperate with
1205	       the received version, then it MUST close the connection and
1206	       report an error locally.  Otherwise, the MPA receiver should
1207	       report the received version to the ULP.

1209	   PD_Length: This field MUST contain the length in Octets of the
1210	       Private Data field.  A value of zero indicates that there is no
1211	       Private Data field present at all.  If the receiver detects that
1212	       the PD_Length field does not match the length of the Private Data
1213	       field, or if the length of the Private Data field exceeds 512
1214	       octets, the receiver MUST close the connection and report an
1215	       error locally.  Otherwise, the MPA receiver should pass the
1216	       PD_Length value and Private Data to the ULP.

1218	   Private Data: This field may contain any value defined by ULPs or may
1219	       not be present.  The Private Data field MUST be between 0 and 512
1220	       octets in length.  ULPs define how to size, set, and validate
1221	       this field within these limits.  Private Data usage is further
1222	       discussed in section 7.1.4 on page 35.
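
   The field layout above can be sketched in a few lines of Python.
   This is a minimal illustration, not a reference implementation: the
   names StartupFrame, pack_frame, parse_frame and negotiate are
   hypothetical, the sketch assumes that only Revision 1 is
   interoperable, and it assumes the caller has already collected one
   complete frame from the TCP byte stream.  M, C and R are taken as
   the three most significant bits of octet 16, following the bit
   numbering of Figure 8.

```python
import struct
from dataclasses import dataclass

REQ_KEY = b"MPA ID Req Frame"  # Key sent by the Initiator
REP_KEY = b"MPA ID Rep Frame"  # Key sent by the Responder
MPA_REVISION = 1
MAX_PRIVATE_DATA = 512
# Key (16 octets), flag octet (M, C, R, 5 Res bits), Rev, PD_Length
HEADER = struct.Struct("!16sBBH")
M_BIT, C_BIT, R_BIT = 0x80, 0x40, 0x20


@dataclass
class StartupFrame:
    markers: bool        # M bit
    crc: bool            # C bit
    rejected: bool       # R bit, meaningful only in a Reply Frame
    private_data: bytes


def pack_frame(key, markers, crc, rejected=False, private_data=b""):
    """Build an MPA Request or Reply Frame; Res bits are sent as zero."""
    if len(private_data) > MAX_PRIVATE_DATA:
        raise ValueError("Private Data exceeds 512 octets")
    flags = (M_BIT if markers else 0) | (C_BIT if crc else 0)
    flags |= R_BIT if rejected else 0
    header = HEADER.pack(key, flags, MPA_REVISION, len(private_data))
    return header + private_data


def parse_frame(data, expected_key):
    """Parse one complete frame; ValueError means 'improperly formatted'."""
    if len(data) < HEADER.size:
        raise ValueError("truncated startup frame")
    key, flags, revision, pd_length = HEADER.unpack_from(data)
    if key != expected_key:
        raise ValueError("unexpected Key")
    if revision != MPA_REVISION:
        raise ValueError("unsupported Revision")
    private_data = data[HEADER.size:]
    if pd_length > MAX_PRIVATE_DATA or pd_length != len(private_data):
        raise ValueError("PD_Length does not match Private Data")
    # Res bits are deliberately not checked on reception.
    return StartupFrame(bool(flags & M_BIT), bool(flags & C_BIT),
                        bool(flags & R_BIT), private_data)


def negotiate(request, reply):
    """Combine the M and C bits of both frames into connection settings."""
    return {
        "responder_sends_markers": request.markers,
        "initiator_sends_markers": reply.markers,
        "crc_in_use": request.crc or reply.crc,
    }
```

   Note that Marker use is decided per direction (each side's M bit
   governs what its peer sends), while CRC use is symmetric: a '1' from
   either side turns CRCs on for both.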

1224	7.1.2  Connection Startup Rules

1226	   The following rules apply to MPA connection Startup Phase:

1228	   1.  When MPA is started in the Initiator mode, the MPA implementation
1229	       MUST send a valid MPA Request Frame.  The MPA Request Frame MAY
1230	       include ULP supplied Private Data.

1232	   2.  When MPA is started in the Responder mode, the MPA implementation
1233	       MUST wait until an MPA Request Frame is received and validated
1234	       before entering full MPA/DDP operation.

1236	       If the MPA Request Frame is improperly formatted, the
1237	       implementation MUST close the TCP connection and exit MPA.

1239	       If the MPA Request Frame is properly formatted but the Private
1240	       Data is not acceptable, the implementation SHOULD return an MPA
1241	       Reply Frame with the Rejected Connection bit set to '1'; the MPA
1242	       Reply Frame MAY include ULP supplied Private Data; the
1243	       implementation MUST exit MPA, leaving the TCP connection open.
1244	       The ULP may close TCP or use the connection for other purposes.

1246	       If the MPA Request Frame is properly formatted and the Private
1247	       Data is acceptable, the implementation SHOULD return an MPA Reply
1248	       Frame with the Rejected Connection bit set to '0'; the MPA Reply
1249	       Frame MAY include ULP supplied Private Data; and the Responder
1250	       SHOULD prepare to interpret any data received as FPDUs and pass
1251	       any received ULPDUs to DDP.

1253	       Note: Since the receiver's ability to deal with Markers is
1254	           unknown until the Request and Reply frames have been
1255	           received, sending FPDUs before this occurs is not possible.

1257	       Note: The requirement to wait on a Request Frame before sending a
1258	           Reply frame is a design choice; it makes for a well-ordered
1259	           sequence of events at each end, and avoids having to specify
1260	           how to deal with situations where both ends start at the same
1261	           time.

1263	   3.  MPA Initiator mode implementations MUST receive and validate an
1264	       MPA Reply Frame.

1266	       If the MPA Reply Frame is improperly formatted, the
1267	       implementation MUST close the TCP connection and exit MPA.

1269	       If the MPA Reply Frame is properly formatted but the Private
1270	       Data is not acceptable, or if the Rejected Connection bit is set
1271	       to '1', the implementation MUST exit MPA, leaving the TCP
1272	       connection open.  The ULP may close TCP or use the connection for
1273	       other purposes.

1275	       If the MPA Reply Frame is properly formatted and the Private Data
1276	       is acceptable, and the Rejected Connection bit is set to '0',
1277	       the implementation SHOULD enter full MPA/DDP operation mode;
1278	       interpreting any received data as FPDUs and sending DDP ULPDUs as
1279	       FPDUs.

1281	   4.  MPA Responder mode implementations MUST receive and validate at
1282	       least one FPDU before sending any FPDUs or Markers.

1284	       Note: this requirement is present to allow the Initiator time to
1285	           get its receiver into Full Operation before an FPDU arrives,
1286	           avoiding potential race conditions at the Initiator.  This
1287	           was also subject to some debate in the work group before
1288	           rough consensus was reached.  Eliminating this requirement
1289	           would allow faster startup in some types of applications.
1290	           However, that would also make certain implementations
1291	           (particularly "dual stack") much harder.

1293	   5.  If a received "Key" does not match the expected value (see 7.1.1
1294	       MPA Request and Reply Frame Format above), the TCP/DDP connection
1295	       MUST be closed, and an error returned to the ULP.

1297	   6.  The received Private Data fields may be used by Consumers at
1298	       either end to further validate the connection, and set up DDP or
1299	       other ULP parameters.  The Initiator ULP MAY close the
1300	       TCP/MPA/DDP connection as a result of validating the Private Data
1301	       fields.  The Responder SHOULD return an MPA Reply Frame with the
1302	       "Rejected Connection" bit set to '1' if the validation of the
1303	       Private Data is not acceptable to the ULP.

1305	   7.  When the first FPDU is to be sent, then if Markers are enabled,
1306	       the first octets sent are the special Marker 0x00000000, followed
1307	       by the start of the FPDU (the FPDU's ULPDU Length field).  If
1308	       Markers are not enabled, the first octets sent are the start of
1309	       the FPDU (the FPDU's ULPDU Length field).

1311	   8.  MPA implementations MUST use the difference between the MPA
1312	       Request Frame and the MPA Reply Frame to check for incorrect
1313	       "Initiator/Initiator" startups.  Implementations SHOULD put a
1314	       timeout on waiting for the MPA Request Frame when started in
1315	       Responder mode, to detect incorrect "Responder/Responder"
1316	       startups.

1318	   9.  MPA implementations MUST validate the PD_Length field.  The
1319	       buffer that receives the Private Data field MUST be large enough
1320	       to receive that data; the amount of Private Data MUST NOT exceed
1321	       the PD_Length, or the application buffer.  If any of the above
1322	       fails, the startup frame MUST be considered improperly formatted.

1324	   10. MPA implementations SHOULD implement a reasonable timeout while
1325	       waiting for each entire startup frame; this prevents certain
1326	       denial of service attacks.  ULPs SHOULD implement a reasonable
1327	       timeout while waiting for FPDUs, ULPDUs and application level
1328	       messages to guard against application failures and certain denial
1329	       of service attacks.
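
   The outcomes prescribed by rules 2, 3 and 7 can be summarized as a
   small decision sketch.  The names Action, responder_action,
   initiator_action and first_octets are hypothetical, and the
   "valid"/"ok" inputs stand for checks made elsewhere (frame
   validation by MPA, Private Data validation by the ULP).  The sketch
   assumes the 16-bit ULPDU Length field from the FPDU format defined
   earlier in this document.

```python
from enum import Enum


class Action(Enum):
    CLOSE_TCP = 1        # improperly formatted frame: close TCP, exit MPA
    EXIT_MPA = 2         # rejected: exit MPA, TCP connection stays open
    FULL_OPERATION = 3   # accepted: interpret received data as FPDUs


def responder_action(request_valid, private_data_ok):
    """Rule 2: return (action, R bit for the Reply Frame or None)."""
    if not request_valid:
        return Action.CLOSE_TCP, None   # no Reply Frame is sent
    if not private_data_ok:
        return Action.EXIT_MPA, 1
    return Action.FULL_OPERATION, 0


def initiator_action(reply_valid, rejected_bit, private_data_ok):
    """Rule 3: decide what the Initiator does with a received reply."""
    if not reply_valid:
        return Action.CLOSE_TCP
    if rejected_bit or not private_data_ok:
        return Action.EXIT_MPA
    return Action.FULL_OPERATION


def first_octets(markers_enabled, ulpdu_length):
    """Rule 7: octets that open the FPDU stream in one direction."""
    marker = (0).to_bytes(4, "big") if markers_enabled else b""
    return marker + ulpdu_length.to_bytes(2, "big")
```

   Even after reaching FULL_OPERATION, a Responder still holds back its
   own FPDUs until it has received and validated one from the Initiator
   (rule 4); that ordering is not modeled here.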

1331	7.1.3  Example Delayed Startup sequence

1333	   A variety of startup sequences are possible when using MPA on TCP.
1334	   Following is an example of an MPA/DDP startup that occurs after TCP
1335	   has been running for a while and has exchanged some amount of
1336	   streaming data.  This example does not use any Private Data (an
1337	   example that does is shown later in 7.1.4.2 Example Immediate Startup
1338	   using Private Data on page 36), although it is perfectly legal to
1339	   include the Private Data.  Note that since the example does not use
1340	   any Private Data, there are no ULP interactions shown between
1341	   receiving "Startup frames" and putting MPA into Full Operation.

1343	          Initiator                                 Responder

1345	   +---------------------------+
1346	   |ULP streaming mode         |
1347	   | <Hello> request to        |
1348	   | transition to DDP/MPA     |           +--------------------------+
1349	   | mode (optional)           | --------> |ULP gets request;         |
1350	   +---------------------------+           |enables MPA Responder mode|
1351	                                           |with last (optional)      |
1352	                                           |streaming mode <Hello Ack>|
1353	                                           |for MPA to send.          |
1354	   +---------------------------+           |MPA waits for incoming    |
1355	   |ULP receives streaming     | <-------- |  <MPA Request frame>     |
1356	   | <Hello Ack>;              |           +--------------------------+
1357	   |Enters MPA Initiator mode; |
1358	   |MPA sends                  |
1359	   |  <MPA Request Frame>;     |
1360	   |MPA waits for incoming     |           +--------------------------+
1361	   |  <MPA Reply Frame>        | - - - - > |MPA receives              |
1362	   +---------------------------+           |  <MPA Request Frame>     |
1363	                                           |Consumer binds DDP to MPA,|
1364	                                           |MPA sends the             |
1365	                                           |  <MPA Reply Frame>.      |
1366	                                           |DDP/MPA enables FPDU      |
1367	   +---------------------------+           |decoding, but does not    |
1368	   |MPA receives the           | < - - - - |send any FPDUs.           |
1369	   |  <MPA Reply Frame>        |           +--------------------------+
1370	   |Consumer binds DDP to MPA, |
1371	   |DDP/MPA begins full        |
1372	   |operation.                 |
1373	   |MPA sends first FPDU (as   |           +--------------------------+
1374	   |DDP ULPDUs become          | ========> |MPA Receives first FPDU.  |
1375	   |available).                |           |MPA sends first FPDU (as  |
1376	   +---------------------------+           |DDP ULPDUs become         |
1377	                                   <====== |available).               |
1378	                                           +--------------------------+
1379	               Figure 9: Example Delayed Startup negotiation

1381	   An example Delayed Startup sequence is described below:

1383	       *   Active and passive sides start up a TCP connection in the
1384	           usual fashion, probably using sockets APIs.  They exchange
1385	           some amount of streaming mode data.  At some point one side
1386	           (the MPA Initiator) sends streaming mode data that
1387	           effectively says "Hello, Let's go into MPA/DDP mode."

1389	   *   When the remote side (the MPA Responder) gets this streaming mode
1390	       message, the Consumer would send a last streaming mode message
1391	       that effectively says "I Acknowledge your Hello, and am now in
1392	       MPA Responder Mode".  The exchange of these messages establishes
1393	       the exact point in the TCP stream where MPA is enabled.  The
1394	       Responding Consumer enables MPA in the Responder mode and waits
1395	       for the initial MPA startup message.

1397	       *   The Initiating Consumer would enable MPA startup in the
1398	           Initiator mode which then sends the MPA Request Frame.  It is
1399	           assumed that no Private Data messages are needed for this
1400	           example, although it is possible to do so.  The Initiating
1401	           MPA (and Consumer) would also wait for the MPA connection to
1402	           be accepted.

1404	   *   The Responding MPA would receive the initial MPA Request Frame
1405	       and would inform the Consumer that this message arrived.  The
1406	       Consumer can then accept the MPA/DDP connection or close the TCP
1407	       connection.

1409	   *   To accept the connection request, the Responding Consumer would
1410	       use an appropriate API to bind the TCP/MPA connections to a DDP
1411	       endpoint, thus enabling MPA/DDP into Full Operation.  In the
1412	       process of going to Full Operation, MPA sends the MPA Reply
1413	       Frame.  MPA/DDP waits for the first incoming FPDU before sending
1414	       any FPDUs.

1416	   *   If the initial TCP data was not a properly formatted MPA Request
1417	       Frame, MPA will close or reset the TCP connection immediately.

1419	       *   The Initiating MPA would receive the MPA Reply Frame and
1420	           would report this message to the Consumer.  The Consumer can
1421	           then accept the MPA/DDP connection, or close or reset the TCP
1422	           connection to abort the process.

1424	       *   On determining that the Connection is acceptable, the
1425	           Initiating Consumer would use an appropriate API to bind the
1426	           TCP/MPA connections to a DDP endpoint thus enabling MPA/DDP
1427	           into Full Operation.  MPA/DDP would begin sending DDP
1428	           messages as MPA FPDUs.
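
   The Delayed Startup sequence can be walked through with a local
   socket pair standing in for the TCP connection.  This is only a
   sketch under stated assumptions: the HELLO and HELLO-ACK messages
   are hypothetical ULP messages (this specification does not define
   them), both endpoints run in one thread for brevity, and the startup
   frames carry no Private Data and request neither Markers nor CRCs.

```python
import socket

# Startup frames with M=C=R=0, Rev=1, PD_Length=0 (no Private Data).
REQUEST = b"MPA ID Req Frame" + bytes([0x00, 0x01, 0x00, 0x00])
REPLY = b"MPA ID Rep Frame" + bytes([0x00, 0x01, 0x00, 0x00])
HELLO, HELLO_ACK = b"HELLO\n", b"HELLO-ACK\n"  # hypothetical ULP messages


def recv_exact(sock, count):
    """Read exactly count octets from the byte stream."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed during startup")
        data += chunk
    return data


def expect(sock, wanted):
    """Fail the startup if the next octets are not the wanted message."""
    if recv_exact(sock, len(wanted)) != wanted:
        raise ValueError("unexpected data; close the TCP connection")


def delayed_startup():
    initiator, responder = socket.socketpair()  # stands in for TCP
    events = []
    with initiator, responder:
        initiator.sendall(HELLO)               # streaming mode request
        expect(responder, HELLO)
        responder.sendall(HELLO_ACK)           # last streaming mode data
        events.append("responder: waiting for MPA Request Frame")
        expect(initiator, HELLO_ACK)
        initiator.sendall(REQUEST)             # MPA starts at this octet
        events.append("initiator: waiting for MPA Reply Frame")
        expect(responder, REQUEST)
        responder.sendall(REPLY)
        events.append("responder: decoding FPDUs, sending none yet")
        expect(initiator, REPLY)
        events.append("initiator: Full Operation")
    return events
```

   The point to notice is that the Hello/Hello Ack exchange fixes the
   exact octet in each direction at which streaming mode ends and the
   MPA startup frame begins.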

1430	7.1.4  Use of Private Data

1432	   This section is advisory in nature, in that it suggests a method by
1433	   which a ULP can deal with pre-DDP connection information exchange.

1435	7.1.4.1  Motivation

1437	   Prior RDMA protocols have been developed that provide Private Data
1438	   via out of band mechanisms.  As a result, many applications now
1439	   expect some form of Private Data to be available for application use
1440	   prior to setting up the DDP/RDMA connection.  Following are some
1441	   examples of the use of Private Data.

1443	   An RDMA Endpoint (referred to as a Queue Pair, or QP, in InfiniBand
1444	   and the [VERBS]) must be associated with a Protection Domain.  No
1445	   receive operations may be posted to the endpoint before it is
1446	   associated with a Protection Domain.  Indeed under both the
1447	   InfiniBand and proposed RDMA/DDP verbs [VERBS] an endpoint/QP is
1448	   created within a Protection Domain.

1450	   There are some applications where the choice of Protection Domain is
1451	   dependent upon the identity of the remote ULP client.  For example,
1452	   if a user session requires multiple connections, it is highly
1453	   desirable for all of those connections to use a single Protection
1454	   Domain.  Note: use of Protection Domains is further discussed in
1455	   [RDMASEC].

1457	   InfiniBand, the DAT APIs [DAT-API] and the [IT-API] all provide for
1458	   the active side ULP to provide Private Data when requesting a
1459	   connection.  This data is passed to the ULP to allow it to determine
1460	   whether to accept the connection, and if so with which endpoint (and
1461	   implicitly which Protection Domain).

1463	   The Private Data can also be used to ensure that both ends of the
1464	   connection have configured their RDMA endpoints compatibly on such
1465	   matters as the RDMA Read capacity (see [RDMAP]).  Further ULP-
1466	   specific uses are also presumed, such as establishing the identity of
1467	   the client.

1469	   Private Data is also allowed for when accepting the connection, to
1470	   allow completion of any negotiation on RDMA resources and for other
1471	   ULP reasons.

1473	   There are several potential ways to exchange this Private Data.  For
1474	   example, the InfiniBand specification includes a connection
1475	   management protocol that allows a small amount of Private Data to be
1476	   exchanged using datagrams before actually starting the RDMA
1477	   connection.

1479	   This draft allows for small amounts of Private Data to be exchanged
1480	   as part of the MPA startup sequence.  The actual Private Data fields
1481	   are carried in the MPA Request Frame, and the MPA Reply Frame.

1483	   If larger amounts of Private Data or more negotiation is necessary,
1484	   TCP streaming mode messages may be exchanged prior to enabling MPA.

1486	7.1.4.2  Example Immediate Startup using Private Data

1488	          Initiator                                 Responder

1490	   +---------------------------+
1491	   |TCP SYN sent               |           +--------------------------+
1492	   +---------------------------+ --------> |TCP gets SYN packet;      |
1493	   +---------------------------+           |  Sends SYN-Ack           |
1494	   |TCP gets SYN-Ack           | <-------- +--------------------------+
1495	   |  Sends Ack                |
1496	   +---------------------------+ --------> +--------------------------+
1497	   +---------------------------+           |Consumer enables MPA      |
1498	   |Consumer enables MPA       |           |Responder Mode, waits for |
1499	   |Initiator mode with        |           |  <MPA Request frame>     |
1500	   |Private Data; MPA sends    |           +--------------------------+
1501	   |  <MPA Request Frame>;     |
1502	   |MPA waits for incoming     |           +--------------------------+
1503	   |  <MPA Reply Frame>        | - - - - > |MPA receives              |
1504	   +---------------------------+           |  <MPA Request Frame>     |
1505	                                           |Consumer examines Private |
1506	                                           |Data, provides MPA with   |
1507	                                           |return Private Data,      |
1508	                                           |binds DDP to MPA, and     |
1509	                                           |enables MPA to send an    |
1510	                                           |  <MPA Reply Frame>.      |
1511	                                           |DDP/MPA enables FPDU      |
1512	   +---------------------------+           |decoding, but does not    |
1513	   |MPA receives the           | < - - - - |send any FPDUs.           |
1514	   |  <MPA Reply Frame>        |           +--------------------------+
1515	   |Consumer examines Private  |
1516	   |Data, binds DDP to MPA,    |
1517	   |and enables DDP/MPA to     |
1518	   |begin Full Operation.      |
1519	   |MPA sends first FPDU (as   |           +--------------------------+
1520	   |DDP ULPDUs become          | ========> |MPA Receives first FPDU.  |
1521	   |available).                |           |MPA sends first FPDU (as  |
1522	   +---------------------------+           |DDP ULPDUs become         |
1523	                                   <====== |available).               |
1524	                                           +--------------------------+
1525	             Figure 10: Example Immediate Startup negotiation

1527	   Note: the exact order of when MPA is started in the TCP connection
1528	       sequence is implementation dependent; the above diagram shows one
1529	       possible sequence.  Also, the Initiator "Ack" to the Responder's
1530	       "SYN-Ack" may be combined into the same TCP segment containing
1531	       the MPA Request Frame (as is allowed by TCP RFCs).

1533	   The example immediate startup sequence is described below:

1535	   *   The passive side (Responding Consumer) would listen on the TCP
1536	       destination port, to indicate its readiness to accept a
1537	       connection.

1539	       *   The active side (Initiating Consumer) would request a
1540	           connection from a TCP endpoint (that expected to upgrade to
1541	           MPA/DDP/RDMA and expected the Private Data) to a destination
1542	           address and port.

1544	       *   The Initiating Consumer would initiate a TCP connection to
1545	           the destination port.  Acceptance/rejection of the connection
1546	           would proceed as per normal TCP connection establishment.

1548	   *   The passive side (Responding Consumer) would receive the TCP
1549	       connection request as usual allowing normal TCP gatekeepers, such
1550	       as INETD and TCPserver, to exercise their normal
1551	       safeguard/logging functions.  On acceptance of the TCP
1552	       connection, the Responding Consumer would enable MPA in the
1553	       Responder mode and wait for the initial MPA startup message.

1555	       *   The Initiating Consumer would enable MPA startup in the
1556	           Initiator mode to send an initial MPA Request Frame with its
1557	           included Private Data message.  The Initiating MPA
1558	           (and Consumer) would also wait for the MPA connection to be
1559	           accepted, and any returned Private Data.

1561	   *   The Responding MPA would receive the initial MPA Request Frame
1562	       with the Private Data message and would pass the Private Data
1563	       through to the Consumer.  The Consumer can then accept the
1564	       MPA/DDP connection, close the TCP connection, or reject the MPA
1565	       connection with a return message.

1567	   *   To accept the connection request, the Responding Consumer would
1568	       use an appropriate API to bind the TCP/MPA connections to a DDP
1569	       endpoint, thus enabling MPA/DDP into Full Operation.  In the
1570	       process of going to Full Operation, MPA sends the MPA Reply Frame
1571	       which includes the Consumer supplied Private Data containing any
1572	       appropriate Consumer response.  MPA/DDP waits for the first
1573	       incoming FPDU before sending any FPDUs.

1575	   *   If the initial TCP data was not a properly formatted MPA Request
1576	       Frame, MPA will close or reset the TCP connection immediately.

1578	   *   To reject the MPA connection request, the Responding Consumer
1579	       would send an MPA Reply Frame with any ULP supplied Private Data
1580	       (with reason for rejection), with the "Rejected Connection" bit
1581	       set to '1', and may close the TCP connection.

1583	       *   The Initiating MPA would receive the MPA Reply Frame with the
1584	           Private Data message and would report this message to the
1585	           Consumer, including the supplied Private Data.

1587	           If the "Rejected Connection" bit is set to a '1', MPA will
1588	           close the TCP connection and exit.

1590	           If the "Rejected Connection" bit is set to a '0', and on
1591	           determining from the MPA Reply Frame Private Data that the
1592	           Connection is acceptable, the Initiating Consumer would use
1593	           an appropriate API to bind the TCP/MPA connections to a DDP
1594	           endpoint thus enabling MPA/DDP into Full Operation.  MPA/DDP
1595	           would begin sending DDP messages as MPA FPDUs.
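
   The accept/reject decision driven by Private Data might look as
   follows.  This is a hedged sketch: session_policy is a hypothetical
   ULP rule invented for illustration, the helper names are not part of
   this specification, and frame validation (Key, Rev, PD_Length) is
   assumed to have been done already.

```python
import struct

HEADER = struct.Struct("!16sBBH")  # Key, flags (M|C|R|Res), Rev, PD_Length
REQ_KEY, REP_KEY = b"MPA ID Req Frame", b"MPA ID Rep Frame"
R_BIT = 0x20  # Rejected Connection bit


def build(key, flags, private_data):
    """Assemble a startup frame with Rev set to one."""
    return HEADER.pack(key, flags, 1, len(private_data)) + private_data


def private_data_of(frame):
    """Extract the Private Data field using PD_Length."""
    _key, _flags, _rev, pd_length = HEADER.unpack_from(frame)
    return frame[HEADER.size:HEADER.size + pd_length]


def responder_reply(request, consumer_policy):
    """Responding Consumer examines the Private Data and answers."""
    accepted, answer = consumer_policy(private_data_of(request))
    return build(REP_KEY, 0 if accepted else R_BIT, answer), accepted


def initiator_result(reply):
    """Return (accepted, returned Private Data) from a Reply Frame."""
    flags = reply[16]
    return not (flags & R_BIT), private_data_of(reply)


def session_policy(private_data):
    """Hypothetical ULP rule: accept only requests naming a session."""
    if private_data.startswith(b"session="):
        return True, b"ok"
    return False, b"unknown session"
```

   For example, a request carrying b"session=42" yields a Reply Frame
   with the R bit clear and Private Data b"ok", while any other request
   yields a reply with the R bit set and the reason for rejection in
   its Private Data.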

1597	7.1.5  "Dual stack" implementations

1599	   MPA/DDP implementations are commonly expected to be implemented as
1600	   part of a "dual stack" architecture.  One "stack" is the traditional
1601	   TCP stack, usually with a sockets interface API (Application
1602	   Programming Interface).  The second stack is the MPA/DDP "stack" with
1603	   its own API, and potentially separate code or hardware to deal with
1604	   the MPA/DDP data.  Of course, implementations may vary, so the
1605	   following comments are of an advisory nature only.

1607	   The use of the two "stacks" offers advantages:

1609	        TCP connection setup is usually done with the TCP stack.  This
1610	        allows use of the usual naming and addressing mechanisms.  It
1611	        also means that any mechanisms used to "harden" the connection
1612	        setup against security threats are also used when starting
1613	        MPA/DDP.

1615	        Some applications may have been originally designed for TCP, but
1616	        are "enhanced" to utilize MPA/DDP after a negotiation reveals
1617	        the capability to do so.  The negotiation process takes place in
1618	        TCP's streaming mode, using the usual TCP APIs.

1620	        Some new applications, designed for RDMA or DDP, still need to
1621	        exchange some data prior to starting MPA/DDP.  This exchange can
1622	        be of arbitrary length or complexity, but often consists of only
1623	        a small amount of Private Data, perhaps only a single message.
1624	        Using the TCP streaming mode for this exchange allows this to be
1625	        done using well understood methods.

1627	   The main disadvantage of using two stacks is the conversion of an
1628	   active TCP connection between them.  This process must be done with
1629	   care to prevent loss of data.

1631	   To avoid some of the problems when using a "dual stack" architecture
1632	   the following additional restrictions may be required by the
1633	   implementation:

1635	   1.  Enabling the DDP/MPA stack SHOULD be done only when no incoming
1636	       stream data is expected.  This is typically managed by the ULP
1637	       protocol.  When following the recommended startup sequence, the
1638	       Responder side enters DDP/MPA mode, sends the last streaming mode
1639	       data, and then waits for the MPA Request Frame.  No additional
1640	       streaming mode data is expected.  The Initiator side ULP receives
1641	       the last streaming mode data, and then enters DDP/MPA mode.
1642	       Again, no additional streaming mode data is expected.

1644	   2.  The DDP/MPA stack MAY provide the ability to send a "last
1645	       streaming message" as part of its Responder DDP/MPA enable
1646	       function.  This allows the DDP/MPA stack to more easily manage
1647	       the conversion to DDP/MPA mode (and avoid problems with a very
1648	       fast return of the MPA Request Frame from the Initiator side).

1650	   Note: Regardless of the "stack" architecture used, TCP's rules MUST
1651	       be followed.  For example, if network data is lost, re-segmented
1652	       or re-ordered, TCP MUST recover appropriately even when this
1653	       occurs while switching stacks.

1655	7.2 Normal Connection Teardown

1657	   Each half connection of MPA terminates when DDP closes the
1658	   corresponding TCP half connection.

1660	   A mechanism SHOULD be provided by MPA to DDP for DDP to be made aware
1661	   that a graceful close of the TCP connection has been received by the
1662	   TCP (e.g. FIN is received).

1664	8  Error Semantics

1666	   The following errors MUST be detected by MPA and the codes SHOULD be
1667	   provided to DDP or other Consumer:

1669	    Code Error

1671	    1    TCP connection closed, terminated or lost.  This includes lost
1672	         by timeout, too many retries, RST received or FIN received.

1674	    2    Received MPA CRC does not match the calculated value for the
1675	         FPDU.

1677	    3    In the event that the CRC is valid, received MPA Marker (if
1678	         enabled) and ULPDU Length fields do not agree on the start of
1679	         a FPDU.  If the FPDU start determined from previous ULPDU
1680	         Length fields does not match with the MPA Marker position, MPA
1681	         SHOULD deliver an error to DDP.  It may not be possible to
1682	         make this check as a segment arrives, but the check SHOULD be
1683	         made when a gap creating an out of order sequence is closed
1684	         and any time a Marker points to an already identified FPDU.
1685	         It is OPTIONAL for a receiver to check each Marker, if
1686	         multiple Markers are present in an FPDU, or if the segment is
1687	         received in order.

1689	    4    Invalid MPA Request Frame or MPA Response Frame received.  In
1690	         this case, the TCP connection MUST be immediately closed.  DDP
1691	         and other ULPs should treat this similarly to code 1, above.

1693	   When conditions 2 or 3 above are detected, an optimized MPA/TCP
1694	   implementation MAY choose to silently drop the TCP segment rather
1695	   than reporting the error to DDP.  In this case, the sending TCP will
1696	   retry the segment, usually correcting the error, unless the problem
1697	   was at the source.  In that case, the source will usually exceed the
1698	   number of retries and terminate the connection.

1700	   Once MPA delivers an error of any type, it MUST NOT pass or deliver
1701	   any additional FPDUs on that half connection.

1703	   For Error codes 2 and 3, MPA MUST NOT close the TCP connection
1704	   following a reported error.  Closing the connection is the
1705	   responsibility of DDP's ULP.

1707	        Note that since MPA will not Deliver any FPDUs on a half
1708	        connection following an error detected on the receive side of
1709	        that connection, DDP's ULP is expected to tear down the
1710	        connection.  This may not occur until after one or more last
1711	        messages are transmitted on the opposite half connection.  This
1712	        allows a diagnostic error message to be sent.
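
   The error rules above can be illustrated with a minimal sketch.  It
   is not part of the specification: the class and attribute names are
   hypothetical, and only the code numbers and the "no Delivery after an
   error" and "who closes TCP" rules are taken from the text.

```python
from enum import IntEnum


class MpaError(IntEnum):
    """Error codes from the table in this section (names are illustrative)."""
    TCP_CONNECTION_LOST = 1
    CRC_MISMATCH = 2
    MARKER_LENGTH_MISMATCH = 3
    INVALID_REQUEST_OR_RESPONSE = 4


class HalfConnectionReceiver:
    """Hypothetical receive-side state for one MPA half connection."""

    def __init__(self):
        self.error = None       # first error reported, if any
        self.delivered = []     # FPDUs handed to DDP so far
        self.close_tcp = False  # True only when MPA itself must close TCP

    def report_error(self, code):
        code = MpaError(code)
        if self.error is None:
            self.error = code
        # Code 4: the TCP connection is closed immediately by MPA.
        # Codes 2 and 3: MPA leaves the close to DDP's ULP.
        if code is MpaError.INVALID_REQUEST_OR_RESPONSE:
            self.close_tcp = True

    def deliver(self, fpdu):
        """Deliver an FPDU unless an error was already reported."""
        if self.error is not None:
            return False        # no further FPDUs on this half connection
        self.delivered.append(fpdu)
        return True
```

   In this sketch the opposite half connection is untouched, so a ULP
   could still send a final diagnostic message before tearing down the
   connection.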

1714	9  Security Considerations

1716	   This section discusses the security considerations for MPA.

1718	9.1 Protocol-specific Security Considerations

1720	   The vulnerabilities of MPA to third-party attacks are no greater than
1721	   for any other protocol running over TCP.  A third party, by sending
1722	   packets into the network that are delivered to an MPA receiver, could
1723	   launch a variety of attacks that take advantage of how MPA operates.
1724	   For example, a third party could send random packets that are valid
1725	   for TCP, but contain no FPDU headers.  An MPA receiver reports an
1726	   error to DDP when any packet arrives that cannot be validated as an
1727	   FPDU when properly located on an FPDU boundary.  A third party could
1728	   also send packets that are valid for TCP, MPA, and DDP, but do not
1729	   target valid buffers.  These types of attacks ultimately result in
1730	   loss of connection and thus become a type of DoS (Denial of Service)
1731	   attack.  Communication security mechanisms such as IPsec [RFC2401]
1732	   may be used to prevent such attacks.

1734	   Independent of how MPA operates, a third party could use ICMP
1735	   messages to reduce the path MTU to such a small size that performance
1736	   would likewise be severely impacted.  Range checking on path MTU
1737	   sizes in ICMP packets may be used to prevent such attacks.
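
   A range check of this kind might look like the sketch below.  The
   function name and the floor value of 576 octets are assumptions made
   for illustration; this document does not specify a particular bound,
   and a deployment would choose its own policy.

```python
def accept_path_mtu(advertised, current, floor=576):
    """Return the path MTU to use after an ICMP message advertises a new one.

    `floor` is a hypothetical local policy limit, not a value taken from
    this document.
    """
    if advertised < floor:
        # Implausibly small value: ignore it and keep the current estimate.
        return current
    # An ICMP report is only allowed to lower the estimate, never raise it.
    return min(advertised, current)
```

   The trade-off is that a floor set too high can ignore a legitimate
   reduction, so the value needs to reflect the paths actually in use.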

1739	   [RDMAP] and [DDP] are used to control, read and write data buffers
1740	   over IP networks.  Therefore, the control and the data packets of
1741	   these protocols are vulnerable to the spoofing, tampering and
1742	   information disclosure attacks listed below.  In addition, Connection
1743	   to/from an unauthorized or unauthenticated endpoint is a potential
1744	   problem with most applications using RDMA, DDP, and MPA.

1746	9.1.1  Spoofing

1748	   Spoofing attacks can be launched by the Remote Peer, or by a network
1749	   based attacker.  A network based spoofing attack applies to all
1750	   Remote Peers.  Because the MPA Stream requires a TCP Stream in the
1751	   ESTABLISHED state, certain types of traditional forms of wire attacks
1752	   do not apply -- an end-to-end handshake must have occurred to
1753	   establish the MPA Stream.  So, the only form of spoofing that applies
1754	   is one when a remote node can both send and receive packets.  Yet
1755	   even with this limitation the Stream is still exposed to the
1756	   following spoofing attacks.

1758	9.1.1.1  Impersonation

1760	   A network based attacker can impersonate a legal MPA/DDP/RDMAP peer
1761	   (by spoofing a legal IP address), and establish an MPA/DDP/RDMAP
1762	   Stream with the victim.  End to end authentication (i.e. IPsec or ULP
1763	   authentication) provides protection against this attack.

1765	9.1.1.2  Stream Hijacking

1767	   Stream hijacking happens when a network based attacker follows the
1768	   Stream establishment phase, and waits until the authentication phase
1769	   (if such a phase exists) is completed successfully.  The attacker can
1770	   then spoof the IP address and re-direct the Stream from the victim to
1771	   its own machine.  For example, an attacker can wait until an iSCSI
1772	   authentication is completed successfully, and hijack the iSCSI
1773	   Stream.

1775	   The best protection against this form of attack is end-to-end
1776	   integrity protection and authentication, such as IPsec, to prevent
1777	   spoofing.  Another option is to provide physical security.
1778	   Discussion of physical security is out of scope for this document.

1780	9.1.1.3  Man in the Middle Attack

1782	   If a network based attacker has the ability to delete, inject,
1783	   replay, or modify packets which will still be accepted by MPA (e.g.,
1784	   TCP sequence number is correct, FPDU is valid etc.) then the Stream
1785	   can be exposed to a man in the middle attack.  The attacker could
1786	   potentially use the services of [DDP] and [RDMAP] to read the
1787	   contents of the associated data buffer, modify the contents of the
1788	   associated data buffer, or to disable further access to the buffer.
1789	   Other attacks on the connection setup sequence and even on TCP can be
1790	   used to cause denial of service.  The only countermeasure for this
1791	   form of attack is to either secure the MPA/DDP/RDMAP Stream (i.e.
1792	   integrity protect) or attempt to provide physical security to prevent
1793	   man-in-the-middle type attacks.

1795	   The best protection against this form of attack is end-to-end
1796	   integrity protection and authentication, such as IPsec, to prevent
1797	   spoofing or tampering.  If Stream or session level authentication and
1798	   integrity protection are not used, then a man-in-the-middle attack
1799	   can occur, enabling spoofing and tampering.

1801	   Another approach is to restrict access to only the local subnet/link,
1802	   and provide some mechanism to limit access, such as physical security
1803	   or 802.1X.  This model is an extremely limited deployment scenario,
1804	   and will not be further examined here.

1806	9.1.2  Eavesdropping

1808	   Generally speaking, Stream confidentiality protects against
1809	   eavesdropping.  Stream and/or session authentication and integrity
1810	   protection is a countermeasure against various spoofing and
1811	   tampering attacks.  The effectiveness of authentication and integrity
1812	   against a specific attack depends on whether the authentication is
1813	   machine level authentication (such as that provided by IPsec), or ULP
1814	   authentication.

1816	9.2 Introduction to Security Options

1818	   The following security services can be applied to an MPA/DDP/RDMAP
1819	   Stream:

1821	   1.  Session confidentiality - protects against eavesdropping.

1823	   2.  Per-packet data source authentication - protects against the
1824	   following spoofing attacks: network based impersonation, Stream
1825	   hijacking, and man in the middle.

1827	   3.  Per-packet integrity - protects against tampering done by
1828	   network based modification of FPDUs (indirectly affecting buffer
1829	   content through DDP services).

1831	   4.  Packet sequencing - protects against replay attacks, which is
1832	   a special case of the above tampering attack.

1834	   If an MPA/DDP/RDMAP Stream may be subject to impersonation attacks,
1835	   or Stream hijacking attacks, it is recommended that the Stream be
1836	   authenticated, integrity protected, and protected from replay
1837	   attacks; it may use confidentiality protection to protect from
1838	   eavesdropping (in case the MPA/DDP/RDMAP Stream traverses a public
1839	   network).

1841	   IPsec is capable of providing the above security services for IP and
1842	   TCP traffic.

1844	   ULP protocols may be able to provide part of the above security
1845	   services.  See [NFSv4CHANNEL] for additional information on a
1846	   promising approach called "channel binding".  From [NFSv4CHANNEL]:

1848	        "The concept of channel bindings allows applications to prove
1849	        that the end-points of two secure channels at different network
1850	        layers are the same by binding authentication at one channel to
1851	        the session protection at the other channel.  The use of channel
1852	        bindings allows applications to delegate session protection to
1853	        lower layers, which may significantly improve performance for
1854	        some applications."

1856	9.3 Using IPsec With MPA

1858	   IPsec can be used to protect against the packet injection attacks
1859	   outlined above.  Because IPsec is designed to secure individual IP
1860	   packets, MPA can run above IPsec without change.  IPsec packets are
1861	   processed (e.g., integrity checked and decrypted) in the order they
1862	   are received, and an MPA receiver will process the decrypted FPDUs
1863	   contained in these packets in the same manner as FPDUs contained in
1864	   unsecured IP packets.

1866	   MPA Implementations MUST implement IPsec as described in Section 9.4
1867	   below.  The use of IPsec is up to ULPs and administrators.

1869	9.4 Requirements for IPsec Encapsulation of MPA/DDP

1871	   The IP Storage working group has spent significant time and effort to
1872	   define the normative IPsec requirements for IP Storage [RFC3723].
1873	   Portions of that specification are applicable to a wide variety of
1874	   protocols, including the RDDP protocol suite.  In order to not
1875	   replicate this effort, an MPA on TCP implementation MUST follow the
1876	   requirements defined in RFC3723 Section 2.3 and Section 5, including
1877	   the associated normative references for those sections.

1879	   Additionally, since IPsec acceleration hardware may only be able to
1880	   handle a limited number of active IKE Phase 2 SAs, Phase 2 delete
1881	   messages MAY be sent for idle SAs, as a means of keeping the number
1882	   of active Phase 2 SAs to a minimum.  The receipt of an IKE Phase 2
1883	   delete message MUST NOT be interpreted as a reason for tearing down
1884	   a DDP/RDMA Stream.  Rather, it is preferable to leave the Stream up,
1885	   and if additional traffic is sent on it, to bring up another IKE
1886	   Phase 2 SA to protect it.  This avoids the potential for continually
1887	   bringing Streams up and down.

1889	   The IPsec requirements for RDDP are based on the version of IPsec
1890	   specified in RFC 2401 [RFC2401] and related RFCs, as profiled by RFC
1891	   3723 [RFC3723], despite the existence of a newer version of IPsec
1892	   specified in RFC 4301 [RFC4301] and related RFCs.  One of the
1893	   important early applications of the RDDP protocols is their use with
1894	   iSCSI [iSER]; RDDP's IPsec requirements follow those of IPsec in
1895	   order to facilitate that usage by allowing a common profile of IPsec
1896	   to be used with iSCSI and the RDDP protocols.  In the future, RFC
1897	   3723 may be updated to the newer version of IPsec; the IPsec security
1898	   requirements of any such update should apply uniformly to iSCSI and
1899	   the RDDP protocols.

1901	   Note that there are serious security issues if IPsec is not
1902	   implemented end-to-end.  For example, if IPsec is implemented as a
1903	   tunnel in the middle of the network, any hosts between the peer and
1904	   the IPsec tunneling device can freely attack the unprotected Stream.

1906	10 IANA Considerations

1908	   No IANA actions are required by this document.

1910	   If a well-known port is chosen as the mechanism to identify a DDP on
1911	   MPA on TCP, the well-known port must be registered with IANA.
1912	   Because the use of the port is DDP specific, registration of the port
1913	   with IANA is left to DDP.

1915	A Appendix.
1916	            Optimized MPA-aware TCP implementations

1918	   This appendix is for information only and is NOT part of the
1919	   standard.

1921	   This appendix covers some Optimized MPA-aware TCP implementation
1922	   guidance to implementers.  It is intended for those implementations
1923	   that want to send/receive as much traffic as possible in an aligned
1924	   and zero-copy fashion.

1926	                   +-----------------------------------+
1927	                   | +-----------+ +-----------------+ |
1928	                   | | Optimized | | Other Protocols | |
1929	                   | |  MPA/TCP  | +-----------------+ |
1930	                   | +-----------+        ||           |
1931	                   |         \\     --- socket API --- |
1932	                   |          \\          ||           |
1933	                   |           \\      +-----+         |
1934	                   |            \\     | TCP |         |
1935	                   |             \\    +-----+         |
1936	                   |              \\    //             |
1937	                   |             +-------+             |
1938	                   |             |  IP   |             |
1939	                   |             +-------+             |
1940	                   +-----------------------------------+

1942	                Figure 11 Optimized MPA/TCP implementation

1944	   The diagram above shows a block diagram of a potential
1945	   implementation.  The network sub-system in the diagram can support
1946	   traditional sockets based connections using the normal API as shown
1947	   on the right side of the diagram.  Connections for DDP/MPA/TCP are
1948	   run using the facilities shown on the left side of the diagram.

1950	   The DDP/MPA/TCP connections can be started using the facilities shown
1951	   on the left side using some suitable API, or they can be initiated
1952	   using the facilities shown on the right side and transitioned to the
1953	   left side at the point in the connection setup where MPA goes to
1954	   "full MPA/DDP operation mode" as described in section 7.1.2 on page
1955	   29.

1957	   The optimized MPA/TCP implementations (left side of diagram and
1958	   described below) are only applicable to MPA; all other TCP
1959	   applications continue to use the standard TCP stacks and interfaces
1960	   shown in the right side of the diagram.

1962	A.1  Optimized MPA/TCP transmitters

1964	   The various TCP RFCs allow considerable choice in segmenting a TCP
1965	   stream.  In order to optimize FPDU recovery at the MPA receiver, an
1966	   optimized MPA/TCP implementation uses additional segmentation rules.

1968	   To provide optimum performance, an optimized MPA/TCP transmit side
1969	   implementation should be enabled to:

1971	   *   With an EMSS large enough to contain the FPDU(s), segment the
1972	       outgoing TCP stream such that the first octet of every TCP
1973	       Segment begins with an FPDU.  Multiple FPDUs may be packed into a
1974	       single TCP segment as long as they are entirely contained in the
1975	       TCP segment.

1977	   *   Report the current EMSS from the TCP to the MPA transmit layer.

1979	   There are exceptions to the above rule.  Once an ULPDU is provided to
1980	   MPA, the MPA/TCP sender transmits it or fails the connection; it
1981	   cannot be repudiated.  As a result, during changes in MTU and EMSS,
1982	   or when TCP's Receive Window size (RWIN) becomes too small, it may be
1983	   necessary to send FPDUs that do not conform to the segmentation rule
1984	   above.

1986	   A possible, but less desirable, alternative is to use IP
1987	   fragmentation on accepted FPDUs to deal with MTU reductions or
1988	   extremely small EMSS.

1990	   Even when alignment with TCP segments is lost, the sender still
1991	   formats the FPDU according to FPDU format as shown in Figure 2.

1993	   On a retransmission, TCP does not necessarily preserve original TCP
1994	   segmentation boundaries.  This can lead to the loss of FPDU Alignment
1995	   and containment within a TCP segment during TCP retransmissions.  An
1996	   optimized MPA/TCP sender should try to preserve original TCP
1997	   segmentation boundaries on a retransmission.
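
   The segmentation rule can be sketched as follows.  This is an
   illustration under stated assumptions, not a transmitter design: the
   function names are hypothetical, Markers are ignored, and the FPDU
   size is taken to be a 2-octet ULPDU_Length, the ULPDU, padding to a
   4-octet multiple, and a 4-octet CRC.

```python
def fpdu_size(ulpdu_len):
    """Octets occupied by one FPDU, ignoring Markers (assumption)."""
    body = 2 + ulpdu_len          # ULPDU_Length field plus the ULPDU
    pad = (-body) % 4             # pad up to a 4-octet multiple
    return body + pad + 4         # plus the CRC


def pack_segments(ulpdu_lens, emss):
    """Group whole FPDUs into TCP segments that each begin with an FPDU.

    Returns a list of segments; each segment is a list of FPDU sizes.
    """
    segments, current, used = [], [], 0
    for n in ulpdu_lens:
        size = fpdu_size(n)
        if size > emss:
            # The exception case in the text: alignment cannot be kept.
            raise ValueError("FPDU does not fit in one segment at this EMSS")
        if used + size > emss:
            segments.append(current)   # close the segment on an FPDU boundary
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        segments.append(current)
    return segments
```

   Because a segment is only ever closed on an FPDU boundary, every
   segment produced here both starts with an FPDU and contains whole
   FPDUs, at the cost of sometimes leaving a segment under-filled.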

1999	A.2  Effects of Optimized MPA/TCP Segmentation

2001	   Optimized MPA/TCP senders will fill TCP segments to the EMSS with a
2002	   single FPDU when a DDP message is large enough.  Since the DDP
2003	   message may not exactly fit into TCP segments, a "message tail" often
2004	   occurs that results in an FPDU that is smaller than a single TCP
2005	   segment.  Additionally some DDP messages may be considerably shorter
2006	   than the EMSS.  If a small FPDU is sent in a single TCP segment the
2007	   result is a "short" TCP segment.

2009	   Applications expected to see strong advantages from Direct Data
2010	   Placement include transaction-based applications and throughput
2011	   applications.  Request/response protocols typically send one FPDU per
2012	   TCP segment and then wait for a response.  Under these conditions,
2013	   these "short" TCP segments are an appropriate and expected effect of
2014	   the segmentation.

2016	   Another possibility is that the application might be sending multiple
2017	   messages (FPDUs) to the same endpoint before waiting for a response.
2019	   In this case, the segmentation policy would tend to reduce the
2020	   available connection bandwidth by under-filling the TCP segments.

2022	   Standard TCP implementations often utilize the Nagle [RFC0896]
2023	   algorithm to ensure that segments are filled to the EMSS whenever the
2024	   round trip latency is large enough that the source stream can fully
2025	   fill segments before Acks arrive.  The algorithm does this by
2026	   delaying the transmission of TCP segments until a ULP can fill a
2027	   segment, or until an ACK arrives from the far side.  The algorithm
2028	   thus allows for smaller segments when latencies are shorter to keep
2029	   the ULP's end to end latency to reasonable levels.

2031	   The Nagle algorithm is not mandatory to use [RFC1122].

2033	   When used with optimized MPA/TCP stacks, Nagle and similar algorithms
2034	   can result in the "packing" of multiple FPDUs into TCP segments.

2036	   If a "message tail", small DDP messages, or the start of a larger DDP
2037	   message are available, MPA may pack multiple FPDUs into TCP segments.
2038	   When this is done, the TCP segments can be more fully utilized, but,
2039	   due to the size constraints of FPDUs, segments may not be filled to
2040	   the EMSS.  A dynamic MULPDU that informs DDP of the size of the
2041	   remaining TCP segment space makes filling the TCP segment more
2042	   effective.

2044	        Note that MPA receivers do more processing of a TCP segment that
2045	        contains multiple FPDUs; this may affect the performance of some
2046	        receiver implementations.

2048	   It is up to the ULP to decide if Nagle is useful with DDP/MPA.  Note
2049	   that many of the applications expected to take advantage of MPA/DDP
2050	   prefer to avoid the extra delays caused by Nagle.  In such scenarios
2051	   it is anticipated there will be minimal opportunity for packing at
2052	   the transmitter and receivers may choose to optimize their
2053	   performance for this anticipated behavior.

2055	   Therefore, the application is expected to set TCP parameters such
2056	   that it can trade off latency and wire efficiency.  Implementations
2057	   should provide a connection option which disables Nagle for MPA/TCP
2058	   similar to the way the TCP_NODELAY socket option is provided for a
2059	   traditional sockets interface.
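
   For comparison, this is how Nagle is disabled on a traditional
   sockets interface.  The sketch uses the standard Python socket
   module; the function name is illustrative, and an optimized MPA/TCP
   implementation would expose its own equivalent option rather than
   this API.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY asks TCP to send small segments without waiting to
    # coalesce them, trading wire efficiency for lower latency.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```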

2061	   When latency is not critical, the application is expected to leave
2062	   Nagle enabled.  In this case the TCP implementation may pack any
2063	   available FPDUs into TCP segments so that the segments are filled to
2064	   the EMSS.  If the amount of data available is not enough to fill the
2065	   TCP segment when it is prepared for transmission, TCP can send the
2066	   segment partly filled, or use the Nagle algorithm to wait for the ULP
2067	   to post more data.

2069	A.3  Optimized MPA/TCP receivers

2071	   When an MPA receive implementation and the MPA-aware receive side TCP
2072	   implementation support handling out of order ULPDUs, the TCP receive
2073	   implementation performs the following functions:

2075	   1)  The implementation passes incoming TCP segments to MPA as soon as
2076	       they have been received and validated, even if not received in
2077	       order.  The TCP layer commits to keeping each segment before it
2078	       can be passed to the MPA.  This means that the segment must have
2079	       passed the TCP, IP, and lower layer data integrity validation
2080	       (i.e., checksum), must be in the receive window, must be part of
2081	       the same epoch (if timestamps are used to verify this) and any
2082	       other checks required by TCP RFCs.

2084	       This is not to imply that the data must be completely ordered
2085	       before use.  An implementation can accept out of order segments,
2086	       SACK them [RFC2018], and pass them to MPA immediately, before the
2087	       segments needed to fill in the gaps arrive.
2088	       MPA expects to utilize these segments when they are complete
2089	       FPDUs or can be combined into complete FPDUs to allow the passing
2090	       of ULPDUs to DDP when they arrive, independent of ordering.  DDP
2091	       uses the passed ULPDU to "place" the DDP segments (see [DDP] for
2092	       more details).

2094	       Since MPA performs a CRC calculation and other checks on received
2095	       FPDUs, the MPA/TCP implementation ensures that any TCP segments
2096	       that duplicate data already received and processed (as can happen
2097	       during TCP retries) do not overwrite already received and
2098	       processed FPDUs.  This avoids the possibility that duplicate data
2099	       may corrupt already validated FPDUs.

2101	   2)  The implementation provides a mechanism to indicate the ordering
2102	       of TCP segments as the sender transmitted them.  One possible
2103	       mechanism might be attaching the TCP sequence number to each
2104	       segment.

2106	   3)  The implementation also provides a mechanism to indicate when a
2107	       given TCP segment (and the prior TCP stream) is complete.  One
2108	       possible mechanism might be to utilize the leading (left) edge of
2109	       the TCP Receive Window.

2111	       MPA uses the ordering and completion indications to inform DDP
2112	       when a ULPDU is complete; MPA Delivers the FPDU to DDP.  DDP uses
2113	       the indications to "deliver" its messages to the DDP consumer
2114	       (see [DDP] for more details).

2116	       DDP on MPA utilizes the above two mechanisms to establish the
2117	       Delivery semantics that DDP's consumers agree to.  These
2118	       semantics are described fully in [DDP].  These include
2119	       requirements on DDP's consumer to respect ownership of buffers
2120	       prior to the time that DDP delivers them to the Consumer.
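
   The ordering and completion indications in items 2) and 3) can be
   sketched as below.  The class is hypothetical and deliberately
   simplified: it assumes segments are never re-segmented (so they do
   not partially overlap) and that sequence numbers do not wrap.  A
   segment is available to MPA for Placement as soon as it is recorded;
   the returned list says which segments have become complete, in
   order, and so may now be Delivered.

```python
class OutOfOrderTracker:
    """Tracks validated TCP segments by sequence number (illustrative)."""

    def __init__(self, initial_seq):
        self.left_edge = initial_seq  # next in-order octet expected
        self.pending = {}             # seq -> length, beyond the left edge

    def on_segment(self, seq, length):
        """Record a segment; return (seq, length) pairs now complete."""
        if seq + length <= self.left_edge or seq in self.pending:
            # Duplicate data (e.g. a TCP retry): never reprocess it, so
            # already validated FPDUs cannot be overwritten.
            return []
        self.pending[seq] = length
        completed = []
        # Advance the left edge over every contiguous segment.
        while self.left_edge in self.pending:
            n = self.pending.pop(self.left_edge)
            completed.append((self.left_edge, n))
            self.left_edge += n
        return completed
```

   A segment that arrives ahead of a gap returns an empty list, which
   corresponds to data that can be placed but not yet delivered.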

2122	   The use of SACK [RFC2018] significantly improves network utilization
2123	   and performance and is therefore recommended.  When combined with the
2124	   out-of-order passing of segments to MPA and DDP, significant
2125	   buffering and copying of received data can be avoided.

2127	A.4  Re-segmenting Middle boxes and non optimized MPA/TCP senders

2129	   Since MPA senders often start FPDUs on TCP segment boundaries, a
2130	   receiving optimized MPA/TCP implementation may be able to optimize
2131	   the reception of data in various ways.

2133	   However, MPA receivers MUST NOT depend on FPDU Alignment on TCP
2134	   segment boundaries.

2136	   Some MPA senders may be unable to conform to the sender requirements
2137	   because their implementation of TCP is not designed with MPA in mind.
2138	   Even for optimized MPA/TCP senders, the network may contain "middle
2139	   boxes" which modify the TCP stream by changing the segmentation.
2140	   This is generally interoperable with TCP and its users and MPA must
2141	   be no exception.

2143	   The presence of Markers in MPA (when enabled) allows an optimized
2144	   MPA/TCP receiver to recover the FPDUs despite these obstacles,
2145	   although it may be necessary to utilize additional buffering at the
2146	   receiver to do so.

2148	   Some of the cases that a receiver may have to contend with are listed
2149	   below as a reminder to the implementer:

2151	   *   A single Aligned and complete FPDU, either in order, or out of
2152	       order:  This can be passed to DDP as soon as validated, and
2153	       Delivered when ordering is established.

2155	   *   Multiple FPDUs in a TCP segment, aligned and fully contained,
2156	       either in order, or out of order:  These can be passed to DDP as
2157	       soon as validated, and Delivered when ordering is established.

2159	   *   Incomplete FPDU: The receiver should buffer until the remainder
2160	       of the FPDU arrives.  If the remainder of the FPDU is already
2161	       available, this can be passed to DDP as soon as validated, and
2162	       Delivered when ordering is established.

2164	   *   Unaligned FPDU start: The partial FPDU must be combined with its
2165	       preceding portion(s).  If the preceding parts are already
2166	       available, and the whole FPDU is present, this can be passed to
2167	       DDP as soon as validated, and Delivered when ordering is
2168	       established.  If the whole FPDU is not available, the receiver
2169	       should buffer until the remainder of the FPDU arrives.

2171	   *   Combinations of Unaligned or incomplete FPDUs (and potentially
2172	       other complete FPDUs) in the same TCP segment:  If any FPDU is
2173	       present in its entirety, or can be completed with portions
2174	       already available, it can be passed to DDP as soon as validated,
2175	       and Delivered when ordering is established.
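
   The "complete versus incomplete FPDU" cases above can be sketched
   for an in-order byte stream.  This is a simplified illustration: the
   function name is hypothetical, the buffer is assumed to start on an
   FPDU boundary, Markers are assumed to be disabled, ULPDU_Length is
   assumed to be in network byte order, and the CRC is not checked.

```python
import struct


def extract_fpdus(buffer):
    """Split `buffer` into complete FPDUs and the leftover partial FPDU."""
    fpdus = []
    offset = 0
    while len(buffer) - offset >= 2:
        (ulpdu_len,) = struct.unpack_from("!H", buffer, offset)
        body = 2 + ulpdu_len
        size = body + (-body) % 4 + 4   # pad to 4 octets, then CRC
        if len(buffer) - offset < size:
            break                       # incomplete: keep buffering
        fpdus.append(buffer[offset:offset + size])
        offset += size
    return fpdus, buffer[offset:]
```

   The leftover bytes are what a receiver would hold in its reassembly
   buffer until the remainder of the FPDU arrives.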

2177	A.5  Receiver implementation

2179	   Transport & Network Layer Reassembly Buffers:

2181	   The use of reassembly buffers (either TCP reassembly buffers or IP
2182	   fragmentation reassembly buffers) is implementation dependent.  When
2183	   MPA is enabled, reassembly buffers are needed if out of order packets
2184	   arrive and Markers are not enabled.  Buffers are also needed if FPDU
2185	   Alignment is lost or if IP fragmentation occurs.  This is because the
2186	   incoming out of order segment may not contain enough information for
2187	   MPA to process all of the FPDU.  For cases where a re-segmenting
2188	   middle box is present, or where the TCP sender is not optimized, the
2189	   presence of Markers significantly reduces the amount of buffering
2190	   needed.

2192	   Recovery from IP Fragmentation is transparent to the MPA Consumers.

2194	A.5.1  Network Layer Reassembly Buffers

2196	   The MPA/TCP implementation should set the IP Don't Fragment bit at
2197	   the IP layer.  Thus upon a path MTU change, intermediate devices drop
2198	   the IP datagram if it is too large and reply with an ICMP message
2199	   which tells the source TCP that the path MTU has changed.  This
2200	   causes TCP to emit segments conformant with the new path MTU size.
2201	   Thus IP fragments under most conditions should never occur at the
2202	   receiver.  But it is possible.

2204	   There are several options for implementation of network layer
2205	   reassembly buffers:

2207	   1.  drop any IP fragments, and reply with an ICMP message according
2208	       to [RFC792] (fragmentation needed and DF set) to tell the Remote
2209	       Peer to resize its TCP segment

2211	   2.  support an IP reassembly buffer, but have it of limited size
2212	       (possibly the same size as the local link's MTU).  The end Node
2213	       would normally never advertise a path MTU larger than the local
2214	       link MTU.  It is recommended that a dropped IP fragment cause an
2215	       ICMP message to be generated according to RFC792.

2217	   3.  multiple IP reassembly buffers, of effectively unlimited size.

2219	   4.  support an IP reassembly buffer for the largest IP datagram (64
2220	       KB).

2222	   5.  support for a large IP reassembly buffer which could span
2223	       multiple IP datagrams.

2225	   An implementation should support at least option 2 or 3 above, to
2226	   avoid dropping packets that have traversed the entire fabric.

2228	   There is no end-to-end ACK for IP reassembly buffers, so there is no
2229	   flow control on the buffer.  The only end-to-end ACK is a TCP ACK,
2230	   which can only occur when a complete IP datagram is delivered to TCP.
2231	   Because of this, under worst case, pathological scenarios, the
2232	   largest IP reassembly buffer is the TCP receive window (to buffer
2233	   multiple IP datagrams that have all been fragmented).

2235	   Note that if the Remote Peer does not implement re-segmentation of
2236	   the data stream upon receiving the ICMP reply updating the path MTU,
2237	   it is possible to halt forward progress because the opposite peer
2238	   would continue to retransmit using a transport segment size that is
2239	   too large.  This deadlock scenario is no different than if the fabric
2240	   MTU (not last hop MTU) was reduced after connection setup, and the
2241	   remote Node's behavior is not compliant with [RFC1122].

2243	A.5.2  TCP Reassembly buffers

2245	   A TCP reassembly buffer is also needed.  TCP reassembly buffers are
2246	   needed if FPDU Alignment is lost when using TCP with MPA or when the
2247	   MPA FPDU spans multiple TCP segments.  Buffers are also needed if
2248	   Markers are disabled and out of order packets arrive.

2250	   Since lost FPDU Alignment often means that FPDUs are incomplete, an
2251	   MPA on TCP implementation must have a reassembly buffer large enough
2252	   to recover an FPDU that is less than or equal to the MTU of the
2253	   locally attached link (this should be the largest possible advertised
2254	   TCP path MTU).  If the MTU is smaller than 140 octets, a buffer at
2255	   least 140 octets long is needed to support the minimum FPDU size.
2256	   The 140 octets allows for the minimum MULPDU of 128, 2 octets of pad,
2257	   2 of ULPDU_Length, 4 of CRC, and space for a possible Marker.  As
2258	   usual, additional buffering is likely to provide better performance.
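
   The 140-octet figure can be reproduced with a short calculation.
   The constant and function names are illustrative; the 4-octet Marker
   size is inferred from the arithmetic in the paragraph above.

```python
MIN_MULPDU = 128      # minimum MULPDU, in octets
ULPDU_LENGTH = 2      # ULPDU_Length field
CRC = 4               # CRC field
MARKER = 4            # room for one possible Marker


def min_reassembly_buffer(link_mtu):
    """Smallest TCP reassembly buffer, in octets, for a given link MTU."""
    body = ULPDU_LENGTH + MIN_MULPDU
    pad = (-body) % 4                      # 2 octets of pad for 130
    min_fpdu = body + pad + CRC + MARKER   # 128 + 2 + 2 + 4 + 4
    return max(link_mtu, min_fpdu)
```

   For a typical Ethernet MTU of 1500 octets the MTU term dominates;
   the 140-octet floor only matters on links with a very small MTU.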

2260	   Note that if the TCP segment were not stored, it is possible to
2261	   deadlock the MPA algorithm.  If the path MTU is reduced, FPDU
2262	   Alignment requires the source TCP to re-segment the data stream to
2263	   the new path MTU.  The source MPA will detect this condition and
2264	   reduce the MPA segment size, but any FPDUs already posted to the
2265	   source TCP will be re-segmented and lose FPDU Alignment.  If the
2266	   destination does not support a TCP reassembly buffer, these segments
2267	   can never be successfully transmitted and the protocol deadlocks.

2269	   When a complete FPDU is received, processing continues normally.

2271	B Appendix.
2272	            Analysis of MPA over TCP Operations

2274	   This appendix is for information only and is NOT part of the
2275	   standard.

2277	   This appendix is an analysis of MPA on TCP and why it is useful to
2278	   integrate MPA with TCP (with modifications to typical TCP
2279	   implementations) to reduce overall system buffering and overhead.

2281	   One of MPA's high level goals is to provide enough information, when
2282	   combined with the Direct Data Placement Protocol [DDP], to enable
2283	   out-of-order placement of DDP payload into the final Upper Layer
2284	   Protocol (ULP) buffer.  Note that DDP separates the act of placing
2285	   data into a ULP buffer from that of notifying the ULP that the ULP
2286	   buffer is available for use.  In DDP terminology, the former is
2287	   defined as "Placement", and the latter is defined as "Delivery".  MPA
2288	   supports in-order Delivery of the data to the ULP, including support
2289	   for Direct Data Placement in the final ULP buffer location when TCP
2290	   segments arrive out-of-order.  Effectively, the goal is to use the
2291	   pre-posted ULP buffers as the TCP receive buffer, where the
2292	   reassembly of the ULP Protocol Data Unit (PDU) by TCP (with MPA and
2293	   DDP) is done in place, in the ULP buffer, with no data copies.

2295	   This Appendix walks through the advantages and disadvantages of the
2296	   TCP sender modifications proposed by MPA:

2298	   1) that the TCP sender do Header Alignment, where a TCP segment
2299	   should begin with an MPA Framing Protocol Data Unit (FPDU) (if there
2300	   is payload present).

2302	   2) that there be an integral number of FPDUs in a TCP segment (under
2303	   conditions where the Path MTU is not changing).

2305	   This Appendix concludes that the scaling advantages of FPDU Alignment
2306	   are strong, based primarily on fairly drastic TCP receive buffer
2307	   reduction requirements and simplified receive handling.  The analysis
2308	   also shows that there is little effect on TCP wire behavior.

2310	B.1  Assumptions

2312	B.1.1  MPA is layered beneath DDP [DDP]

2314	   MPA is an adaptation layer between DDP and TCP.  DDP requires
2315	   preservation of DDP segment boundaries and a CRC32C digest covering
2316	   the DDP header and data.   MPA adds these features to the TCP stream
2317	   so that DDP over TCP has the same basic properties as DDP over SCTP.

2319	B.1.2  MPA preserves DDP message framing

2321	   MPA was designed as a framing layer specifically for DDP and was not
2322	   intended as a general-purpose framing layer for any other ULP using
2323	   TCP.

2325	   A framing layer allows ULPs using it to receive indications from the
2326	   transport layer only when complete ULPDUs are present.  As a framing
2327	   layer, MPA is not aware of the content of the DDP PDU, only that it
2328	   has received and, if necessary, reassembled a complete PDU for
2329	   Delivery to the DDP.

2331	B.1.3  The size of the ULPDU passed to MPA is less than EMSS under
2332	          normal conditions

2334	   To make reception of a complete DDP PDU on every received segment
2335	   possible, DDP passes to MPA a PDU that is no larger than the EMSS of
2336	   the underlying fabric.  Each FPDU that MPA creates contains
2337	   sufficient information for the receiver to directly place the ULP
2338	   payload in the correct location in the correct receive buffer.

2340	   Edge cases when this condition does not occur are dealt with, but do
2341	   not need to be on the fast path.

2343	B.1.4  Out-of-order placement but NO out-of-order Delivery

2345	   DDP receives complete DDP PDUs from MPA.  Each DDP PDU contains the
2346	   information necessary to place its ULP payload directly in the
2347	   correct location in host memory.

2349	   Because each DDP segment is self-describing, it is possible for DDP
2350	   segments received out of order to have their ULP payload placed
2351	   immediately in the ULP receive buffer.

2353	   Data delivery to the ULP is guaranteed to be in the order the data
2354	   was sent.  DDP only indicates data delivery to the ULP after TCP has
2355	   acknowledged the complete byte stream.

2357	B.2  The Value of FPDU Alignment

2359	   Significant receiver optimizations can be achieved when Header
2360	   Alignment and complete FPDUs are the common case.  The optimizations
2361	   allow utilizing significantly fewer buffers on the receiver and less
2362	   computation per FPDU.  The net effect is the ability to build a
2363	   "flow-through" receiver that enables TCP-based solutions to scale to
2364	   10G and beyond in an economical way.  The optimizations are
2365	   especially relevant to hardware implementations of receivers that
2366	   process multiple protocol layers - Data Link Layer (e.g., Ethernet),
2367	   Network and Transport Layer (e.g., TCP/IP), and even some ULP on top
2368	   of TCP (e.g., MPA/DDP).  As network speed increases, there is an
2369	   increasing desire to use a hardware based receiver in order to
2370	   achieve an efficient high performance solution.

2372	   A TCP receiver, under worst case conditions, has to allocate buffers
2373	   (BufferSizeTCP) whose capacities are a function of the bandwidth-
2374	   delay product.  Thus:

2376	       BufferSizeTCP = K * bandwidth [octets/Second] * Delay [Seconds].

2378	   Where bandwidth is the end-to-end bandwidth of the connection, delay
2379	   is the round trip delay of the connection, and K is an implementation
2380	   dependent constant.
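
   As a rough illustration of the formula above, the following sketch
   evaluates BufferSizeTCP for example inputs.  The function name, the
   default K = 1, and the sample link speeds and delay are assumptions
   made for the example only.

```python
def buffer_size_tcp(bandwidth_bps: float, rtt_seconds: float,
                    k: float = 1.0) -> float:
    """BufferSizeTCP = K * bandwidth [octets/second] * delay [seconds]."""
    octets_per_second = bandwidth_bps / 8   # bits/second to octets/second
    return k * octets_per_second * rtt_seconds


# Example inputs (assumed): 1 ms round trip delay at 1 and 10 Gbit/s.
one_gig = buffer_size_tcp(1e9, 0.001)
ten_gig = buffer_size_tcp(1e10, 0.001)
```

   With these inputs the 10 Gbit/s case needs ten times the buffering
   of the 1 Gbit/s case, which is the linear scaling discussed next.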

2382	   Thus BufferSizeTCP scales with the end-to-end bandwidth (10x more
2383	   buffers for a 10x increase in end-to-end bandwidth).  As this
2384	   buffering approach may scale poorly for hardware or software
2385	   implementations alike, several approaches allow reduction in the
2386	   amount of buffering required for high-speed TCP communication.

2388	   The MPA/DDP approach is to enable the ULP's buffer to be used as the
2389	   TCP receive buffer.  If the application pre-posts a sufficient amount
2390	   of buffering, and each TCP segment has sufficient information to
2391	   place the payload into the right application buffer, when an out-of-
2392	   order TCP segment arrives it could potentially be placed directly in
2393	   the ULP buffer.  However, placement can only be done when a complete
2394	   FPDU with the placement information is available to the receiver, and
2395	   the FPDU contents contain enough information to place the data into
2396	   the correct ULP buffer (e.g., there is a DDP header available).

2398	   For the case when the FPDU is not aligned with the TCP segment, it
2399	   may take, on average, 2 TCP segments to assemble one FPDU.
2400	   Therefore, the receiver has to allocate BufferSizeNAF (Buffer Size,
2401	   Non-Aligned FPDU) octets:

2403	       BufferSizeNAF = K1 * EMSS * number_of_connections + K2 * EMSS

2405	   Where K1 and K2 are implementation dependent constants and EMSS is
2406	   the effective maximum segment size.

2408	   For example, a 1 Gbps link with 10,000 connections and an EMSS of
2409	   1500B would require 15 MB of memory.  Often the number of connections
2410	   used scales with the network speed, aggravating the situation for
2411	   higher speeds.

2413	   FPDU Alignment would allow the receiver to allocate BufferSizeAF
2414	   (Buffer Size, Aligned FPDU) octets:

2416	       BufferSizeAF = K2 * EMSS

2418	   for the same conditions.  An FPDU Aligned receiver may require memory
2419	   on the order of hundreds of KB, which is feasible for on-chip memory
2420	   and enables a "flow-through" design, in which the data flows through
2421	   the NIC and is placed directly in the destination buffer.  Assuming
2422	   most of the connections support FPDU Alignment, the receiver buffers
2423	   no longer scale with number of connections.
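
   The two buffer formulas can be compared directly.  This is a minimal
   sketch: the function names are hypothetical, and the constants
   K1 = 1 and K2 = 100 are assumed values chosen only so that the
   results land near the figures quoted in the text.

```python
def buffer_size_naf(emss: int, connections: int,
                    k1: int = 1, k2: int = 1) -> int:
    """BufferSizeNAF = K1 * EMSS * number_of_connections + K2 * EMSS."""
    return k1 * emss * connections + k2 * emss


def buffer_size_af(emss: int, k2: int = 1) -> int:
    """BufferSizeAF = K2 * EMSS (no per-connection term)."""
    return k2 * emss


# 10,000 connections and an EMSS of 1500 octets, as in the example.
non_aligned = buffer_size_naf(1500, 10_000, k1=1, k2=100)
aligned = buffer_size_af(1500, k2=100)
```

   With K1 = 1 the per-connection term alone accounts for the 15 MB in
   the example, while the aligned case stays at K2 * EMSS regardless of
   the number of connections.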

2425	   Additional optimizations can be achieved in a balanced I/O sub-system
2426	   -- where the system interface of the network controller provides
2427	   ample bandwidth as compared with the network bandwidth.  For almost
2428	   twenty years this has been the case and the trend is expected to
2429	   continue - while Ethernet speeds have scaled by 1000 (from 10
2430	   megabit/sec to 10 gigabit/sec), I/O bus bandwidth of volume CPU
2431	   architectures has scaled from ~2 MB/sec to ~2 GB/sec (PC-XT bus to
2432	   PCI-X DDR).  Under these conditions, the FPDU Alignment approach
2433	   allows BufferSizeAF to be indifferent to network speed.  It is
2434	   primarily a function of the local processing time for a given frame.
2435	   Thus when the FPDU Alignment approach is used, receive buffering is
2436	   expected to scale gracefully (i.e., less than linear scaling) as
2437	   network speed is increased.

2439	B.2.1  Impact of lack of FPDU Alignment on the receiver computational
2440	          load and complexity

2442	   The receiver must perform IP and TCP processing, and then perform
2443	   FPDU CRC checks, before it can trust the FPDU header placement
2444	   information.  For simplicity of the description, the assumption is
2445	   that an FPDU is carried in no more than 2 TCP segments.  In reality,
2446	   with no FPDU Alignment, an FPDU can be carried by more than 2 TCP
2447	   segments (e.g., if the PMTU was reduced).

2449	   ----++-----------------------------++-----------------------++-----
2450	   +---||---------------+    +--------||--------+   +----------||----+
2451	   |   TCP Seg X-1      |    |     TCP Seg X    |   |  TCP Seg X+1   |
2452	   +---||---------------+    +--------||--------+   +----------||----+
2453	   ----++-----------------------------++-----------------------++-----
2454	                   FPDU #N-1                  FPDU #N

2456	       Figure 12: Non-aligned FPDU freely placed in TCP octet stream

2458	   The receiver algorithm for processing TCP segments (e.g., TCP segment
2459	   #X in Figure 12) carrying non-aligned FPDUs (in-order or out-of-
2460	   order) includes:

2462	      1.  Data Link Layer processing (whole frame) - typically
2463	          including a CRC calculation.

2465	      2.  Network Layer processing (assuming not an IP fragment, the
2466	          whole Data Link Layer frame contains one IP datagram.  IP
2467	          fragments should be reassembled in a local buffer.  This is
2468	          not a performance optimization goal)

2470	      3.  Transport Layer processing -- TCP protocol processing, header
2471	          and checksum checks.

2473	          a.  Classify incoming TCP segment using the 5 tuple (IP SRC,
2474	              IP DST, TCP SRC Port, TCP DST Port, protocol)

2476	      4.  Find FPDU message boundaries.

2478	          a.  Get MPA state information for the connection

2480	              If the TCP segment is in-order, use the receiver managed
2481	                  MPA state information to calculate where the previous
2482	                  FPDU message (#N-1) ends in the current TCP segment X.
2483	                  (previously, when the MPA receiver processed the first
2484	                  part of FPDU #N-1, it calculated the number of bytes
2485	                  remaining to complete FPDU #N-1 by using the MPA
2486	                  Length field).

2488	                  Get the stored partial CRC for FPDU #N-1

2490	                  Complete CRC calculation for FPDU #N-1 data (first
2491	                      portion of TCP segment #X)

2493	                  Check CRC calculation for FPDU #N-1

2495	                  If no FPDU CRC errors, placement is allowed

2497	                  Locate the local buffer for the first portion of
2498	                      FPDU#N-1, CopyData(local buffer of first portion
2499	                      of FPDU #N-1, host buffer address, length)

2501	                  Compute host buffer address for second portion of FPDU
2502	                      #N-1

2504	                  CopyData (local buffer of second portion of FPDU #N-1,
2505	                      host buffer address for second portion, length)

2507	                  Calculate the octet offset into the TCP segment for
2508	                      the next FPDU #N.

2510	                  Start Calculation of CRC for available data for FPDU
2511	                      #N

2513	                  Store partial CRC results for FPDU #N

2515	                  Store local buffer address of first portion of FPDU #N

2517	                  No further action is possible on FPDU #N, before it is
2518	                      completely received

2520	              If TCP out-of-order, receiver must buffer the data until
2521	                  at least one complete FPDU is received.  Typically
2522	                  buffering for more than one TCP segment per connection
2523	                  is required.  Use the MPA based Markers to calculate
2524	                  where FPDU boundaries are.

2526	                  When a complete FPDU is available, a similar procedure
2527	                      to the in-order algorithm above is used.  There is
2528	                      additional complexity, though, because when the
2529	                      missing segment arrives, this TCP segment must be
2530	                      run through the CRC engine after the CRC is
2531	                      calculated for the missing segment.
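
   The in-order branch of the non-aligned algorithm above can be
   sketched as follows.  This is an illustration under stated
   assumptions, not a conforming implementation: Markers are assumed to
   be disabled, zlib.crc32 stands in for the CRC32C that MPA uses, the
   CRC is assumed to be carried in network byte order, and all names
   are hypothetical.  The out-of-order branch is not modeled.

```python
import struct
import zlib


def build_fpdu(ulpdu: bytes) -> bytes:
    """ULPDU_Length (2 octets), ULPDU, pad to 4 octets, CRC (4 octets)."""
    body = struct.pack("!H", len(ulpdu)) + ulpdu
    body += b"\x00" * (-len(body) % 4)
    return body + struct.pack("!I", zlib.crc32(body))


def _body_len(ulpdu_len: int) -> int:
    return (2 + ulpdu_len + 3) & ~3      # length field + data + pad


class NonAlignedReceiver:
    """Per-connection MPA state kept between in-order TCP segments."""

    def __init__(self) -> None:
        self.local = bytearray()   # local buffer for the FPDU in progress
        self.partial_crc = 0       # stored partial CRC for that FPDU
        self.placed = []           # stands in for the host (ULP) buffer

    def on_in_order_segment(self, payload: bytes) -> None:
        pos = 0
        while pos < len(payload):
            if len(self.local) < 2:          # length field may be split
                body_len = 2
                want = 2 - len(self.local)
            else:
                (ulpdu_len,) = struct.unpack_from("!H", self.local)
                body_len = _body_len(ulpdu_len)
                want = body_len + 4 - len(self.local)
            chunk = payload[pos:pos + want]
            pos += len(chunk)
            # Continue the CRC over the part of the chunk that precedes
            # the CRC field, then buffer the chunk locally.
            body_part = chunk[:max(0, body_len - len(self.local))]
            self.partial_crc = zlib.crc32(body_part, self.partial_crc)
            self.local += chunk
            self._finish_if_complete()

    def _finish_if_complete(self) -> None:
        if len(self.local) < 2:
            return
        (ulpdu_len,) = struct.unpack_from("!H", self.local)
        total = _body_len(ulpdu_len) + 4
        if len(self.local) < total:
            return                           # wait for the next segment
        (wire_crc,) = struct.unpack_from("!I", self.local, total - 4)
        if wire_crc != self.partial_crc:
            raise ValueError("FPDU CRC error; placement not allowed")
        # CopyData from the local buffer to the host buffer.
        self.placed.append(bytes(self.local[2:2 + ulpdu_len]))
        self.local.clear()
        self.partial_crc = 0
```

   Note the state that must survive between segments (the local buffer
   and the partial CRC); this per-connection state is what the aligned
   case avoids.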

2533	   If we assume FPDU Alignment, the following diagram and the algorithm
2534	   below apply.  Note that when using MPA, the receiver is assumed to
2535	   actively detect presence or loss of FPDU Alignment for every TCP
2536	   segment received.

2538	      +--------------------------+      +--------------------------+
2539	   +--|--------------------------+   +--|--------------------------+
2540	   |  |       TCP Seg X          |   |  |         TCP Seg X+1      |
2541	   +--|--------------------------+   +--|--------------------------+
2542	      +--------------------------+      +--------------------------+
2543	                FPDU #N                          FPDU #N+1

2545	        Figure 13: Aligned FPDU placed immediately after TCP header

2547	   The receiver algorithm for FPDU Aligned frames (in-order or out-of-
2548	   order) includes:

2550	       1)  Data Link Layer processing (whole frame) - typically
2551	           including a CRC calculation.

2553	       2)  Network Layer processing (assuming not an IP fragment, the
2554	           whole Data Link Layer frame contains one IP datagram.  IP
2555	           fragments should be reassembled in a local buffer.  This is
2556	           not a performance optimization goal)

2558	       3)  Transport Layer processing -- TCP protocol processing, header
2559	           and checksum checks.

2561	           a.  Classify incoming TCP segment using the 5 tuple (IP SRC,
2562	               IP DST, TCP SRC Port, TCP DST Port, protocol)

2564	       4)  Check for Header Alignment. (Described in detail in Section
2565	           6).  Assuming Header Alignment for the rest of the algorithm
2566	           below.

2568	           a.  If the header is not aligned, see the algorithm defined
2569	               in the prior section.

2571	       5)  Whether the TCP segment is in-order or out-of-order, the MPA
2572	           header is at the beginning of the current TCP payload.  Get
2573	           the FPDU length from the FPDU header.

2575	       6)  Calculate CRC over FPDU

2577	       7)  Check CRC calculation for FPDU #N

2579	       8)  If no FPDU CRC errors, placement is allowed

2581	       9)  CopyData(TCP segment #X, host buffer address, length)

2583	       10) Loop to #5 until all the FPDUs in the TCP segment are
2584	           consumed in order to handle FPDU packing.
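
   The MPA portion of the aligned algorithm (length lookup, CRC check,
   placement, and the packing loop) can be sketched as below.  The same
   caveats apply as for any illustration here: Markers are assumed to
   be disabled, zlib.crc32 stands in for CRC32C, the CRC byte order is
   an assumption, and the function names are hypothetical.

```python
import struct
import zlib
from typing import List


def build_fpdu(ulpdu: bytes) -> bytes:
    """ULPDU_Length (2 octets), ULPDU, pad to 4 octets, CRC (4 octets)."""
    body = struct.pack("!H", len(ulpdu)) + ulpdu
    body += b"\x00" * (-len(body) % 4)
    return body + struct.pack("!I", zlib.crc32(body))


def process_aligned_segment(payload: bytes) -> List[bytes]:
    """Return the ULPDUs of every FPDU packed into one TCP payload."""
    ulpdus = []
    pos = 0
    while pos < len(payload):                  # loop handles packing
        if len(payload) - pos < 2:
            raise ValueError("no FPDU header; use the non-aligned path")
        # The FPDU header sits at the current offset; read the length.
        (ulpdu_len,) = struct.unpack_from("!H", payload, pos)
        body_len = (2 + ulpdu_len + 3) & ~3
        end = pos + body_len + 4
        if end > len(payload):
            raise ValueError("incomplete FPDU; use the non-aligned path")
        # Calculate and check the CRC in a single pass.
        (wire_crc,) = struct.unpack_from("!I", payload, end - 4)
        if zlib.crc32(payload[pos:pos + body_len]) != wire_crc:
            raise ValueError("FPDU CRC error; placement not allowed")
        # Placement: stands in for CopyData to the host buffer.
        ulpdus.append(payload[pos + 2:pos + 2 + ulpdu_len])
        pos = end
    return ulpdus
```

   No state is carried from one segment to the next, so the same
   function serves in-order and out-of-order segments alike.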

2586	   Implementation note: In both cases the receiver has to classify the
2587	   incoming TCP segment and associate it with one of the flows it
2588	   maintains.  In the case of no FPDU Alignment, the receiver is forced
2589	   to classify incoming traffic before it can calculate the FPDU CRC.
2590	   In the case of FPDU Alignment the operations order is left to the
2591	   implementer.

2593	   The FPDU Aligned receiver algorithm is significantly simpler.  There
2594	   is no need to locally buffer portions of FPDUs.  Accessing state
2595	   information is also substantially simplified - the normal case does
2596	   not require retrieving information to find out where an FPDU starts
2597	   and ends or retrieval of a partial CRC before the CRC calculation can
2598	   commence.  This avoids adding internal latencies, having multiple
2599	   data passes through the CRC machine, or scheduling multiple commands
2600	   for moving the data to the host buffer.

2602	   The aligned FPDU approach is useful for in-order and out-of-order
2603	   reception.  The receiver can use the same mechanisms for data storage
2604	   in both cases, and only needs to account for when all the TCP
2605	   segments have arrived to enable Delivery.  The Header Alignment,
2606	   along with the high probability that at least one complete FPDU is
2607	   found with every TCP segment, allows the receiver to perform data
2608	   placement for out-of-order TCP segments with no need for intermediate
2609	   buffering.  Essentially the TCP receive buffer has been eliminated
2610	   and TCP reassembly is done in place within the ULP buffer.

2612	   If FPDU Alignment is not found, the receiver should follow the
2613	   algorithm for non-aligned FPDU reception, which may be slower and
2614	   less efficient.

2616	B.2.2  FPDU Alignment effects on TCP wire protocol

2618	   In an optimized MPA/TCP implementation, TCP exposes its EMSS to
2619	   MPA.  MPA uses the EMSS to calculate its MULPDU, which it then
2620	   exposes to DDP, its ULP.  DDP uses the MULPDU to segment its
2621	   payload so that each FPDU sent by MPA fits completely into one
2622	   TCP segment.  This has no impact on the wire protocol, and exposing
2623	   this information is already supported on many TCP
2624	   implementations, including all modern flavors of BSD networking,
2625	   through the TCP_MAXSEG socket option.
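
   A minimal sketch of this flow follows.  It reads the segment size
   through the TCP_MAXSEG socket option and derives a MULPDU from it.
   The function names are hypothetical, and the overhead accounting is
   an assumption for illustration: 2 octets of ULPDU_Length, 4 octets
   of CRC, FPDUs padded to a multiple of 4 octets, and at most one
   4-octet Marker per 512 octets of segment.  The value returned by
   TCP_MAXSEG may also need adjusting for TCP options to obtain the
   EMSS.

```python
import socket


def query_mss(sock: socket.socket) -> int:
    """Read the segment size TCP reports for this socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)


def mulpdu_from_emss(emss: int, markers: bool = True) -> int:
    """Largest ULPDU whose FPDU still fits in one TCP segment."""
    fpdu_max = emss - emss % 4       # FPDU length is a multiple of 4
    overhead = 2 + 4                 # ULPDU_Length + CRC
    if markers:
        # Assumed Marker cost: 4 octets per started 512-octet block.
        overhead += 4 * -(-emss // 512)
    return fpdu_max - overhead
```

   For example, under these assumptions an EMSS of 1460 octets yields a
   MULPDU of 1442 octets with Markers and 1454 octets without.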

2627	   In the common case, the ULP (i.e. DDP over MPA) messages provided to
2628	   the TCP layer are segmented to MULPDU size.  It is assumed that the
2629	   ULP message size is bounded by MULPDU, such that a single ULP message
2630	   can be encapsulated in a single TCP segment.  Therefore, in the
2631	   common case, there is no increase in the number of TCP segments
2632	   emitted.  For smaller ULP messages, the sender can also apply
2633	   packing, i.e., the sender packs as many complete FPDUs as possible
2634	   into one TCP segment.  The requirement to always have a complete FPDU
2635	   may increase the number of TCP segments emitted.  Typically, a ULP
2636	   message size varies from a few bytes to multiple EMSS (e.g., 64
2637	   Kbytes).  In some cases the ULP may post more than one message at a
2638	   time for transmission, giving the sender an opportunity for packing.
2639	   In the case where more than one FPDU is available for transmission
2640	   and the FPDUs are encapsulated into a TCP segment and there is no
2641	   room in the TCP segment to include the next complete FPDU, another
2642	   TCP segment is sent.  In this corner case some of the TCP segments
2643	   are not full size.  In the worst case scenario, the ULP may choose an
2644	   FPDU size of EMSS/2 + 1 and have multiple messages available for
2645	   transmission.  For this poor choice of FPDU size, the average TCP
2646	   segment size is therefore about 1/2 of the EMSS and the number of TCP
2647	   segments emitted is approaching 2x of what is possible without the
2648	   requirement to encapsulate an integer number of complete FPDUs in
2649	   every TCP segment.  This is a dynamic situation that only lasts for
2650	   the duration where the sender ULP has multiple non-optimal messages
2651	   for transmission and this causes a minor impact on the wire
2652	   utilization.
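
   The packing behavior and its worst case can be illustrated by
   counting segments.  The sketch below uses hypothetical function
   names and compares a sender that places only complete FPDUs in each
   segment with one that treats the FPDUs as a plain byte stream.

```python
from typing import Iterable


def segments_with_fpdu_alignment(fpdu_sizes: Iterable[int],
                                 emss: int) -> int:
    """Segments needed when every segment holds only complete FPDUs."""
    segments = 0
    room = 0                       # free octets in the current segment
    for size in fpdu_sizes:
        if size > emss:
            raise ValueError("FPDU larger than EMSS")
        if size > room:            # next FPDU does not fit: new segment
            segments += 1
            room = emss
        room -= size
    return segments


def segments_without_alignment(fpdu_sizes: Iterable[int],
                               emss: int) -> int:
    """Segments needed when FPDUs may straddle segment boundaries."""
    total = sum(fpdu_sizes)
    return -(-total // emss)       # ceiling division


# Worst case from the text: FPDU size of EMSS/2 + 1.
emss = 1460
worst = [emss // 2 + 1] * 100
aligned = segments_with_fpdu_alignment(worst, emss)
unaligned = segments_without_alignment(worst, emss)
```

   For this input the aligned sender emits one segment per FPDU,
   roughly twice the count of the unaligned sender, which matches the
   2x bound discussed above.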

2654	   However, it is not expected that requiring FPDU Alignment will have a
2655	   measurable impact on wire behavior of most applications.  Throughput
2656	   applications with large I/Os are expected to take full advantage of
2657	   the EMSS.  Another class of applications with many small outstanding
2658	   buffers (as compared to EMSS) is expected to use packing when
2659	   applicable.  Transaction oriented applications are also optimal.

2661	   TCP retransmission is another area that can affect sender behavior.
2662	   TCP supports retransmission of the exact, originally transmitted
2663	   segment (see [RFC793] section 2.6, [RFC793] section 3.7 "managing the
2664	   window" and [RFC1122] section 4.2.2.15).  In the unlikely event that
2665	   part of the original segment has been received and acknowledged by
2666	   the remote peer (e.g., a re-segmenting middle box, as documented in
2667	   Appendix A.4, Re-segmenting Middle boxes and non optimized MPA/TCP
2668	   senders on page 50), a better available bandwidth utilization may be
2669	   possible by re-transmitting only the missing octets.  If an optimized
2670	   MPA/TCP retransmits complete FPDUs, there may be some marginal
2671	   bandwidth loss.

2673	   Another area where a change in the TCP segment number may have impact
2674	   is that of Slow Start and Congestion Avoidance.  Slow-start
2675	   exponential increase is measured in segments per second, as the
2676	   algorithm focuses on the overhead per segment at the source for
2677	   congestion that eventually results in dropped segments.  Slow-start
2678	   exponential bandwidth growth for optimized MPA/TCP is similar to any
2679	   TCP implementation.  Congestion Avoidance allows for a linear growth
2680	   in available bandwidth when recovering after a packet drop.  Similar
2681	   to the analysis for slow-start, optimized MPA/TCP doesn't change the
2682	   behavior of the algorithm.  Therefore the average size of the segment
2683	   versus EMSS is not a major factor in the assessment of the bandwidth
2684	   growth for a sender.  Both Slow Start and Congestion Avoidance for an
2685	   optimized MPA/TCP will behave similarly to any TCP sender and allow
2686	   an optimized MPA/TCP to enjoy the theoretical performance limits of
2687	   the algorithms.

2689	   In summary, the ULP messages generated at the sender (e.g., the
2690	   number of messages grouped for every transmission request) and the
2691	   message size distribution have the most significant impact on the
2692	   number of TCP segments emitted.  The worst case effect for certain
2693	   ULPs (with average message size of EMSS/2+1 to EMSS) is bounded by
2694	   an increase of up to 2x in the number of TCP segments and
2695	   acknowledgments.  In reality the effect is expected to be marginal.

2697	C Appendix.
2698	            IETF Implementation Interoperability with RDMA Consortium
2699	        Protocols

2701	   This appendix is for information only and is NOT part of the
2702	   standard.

2704	   This appendix covers methods of making MPA implementations
2705	   interoperate with both IETF and RDMA Consortium versions of the
2706	   protocols.

2708	   The RDMA Consortium created early specifications of the MPA/DDP/RDMA
2709	   protocols and some manufacturers created implementations of those
2710	   protocols before the IETF versions were finalized.  These protocols
2711	   are very similar to the IETF versions, making it possible for
2712	   implementations to be created or modified to support either set of
2713	   specifications.

2715	   For those interested, the RDMA Consortium protocol documents
2716	   (draft-culley-iwarp-mpa-v1.0.pdf, draft-shah-iwarp-ddp-v1.0.pdf, and
2717	   draft-recio-iwarp-rdmac-v1.0.pdf) can be obtained at
2718	   http://www.rdmaconsortium.org.

2720	   In this section, implementations of MPA/DDP/RDMA that conform to the
2721	   RDMAC specifications are called RDMAC RNICs.  Implementations of
2722	   MPA/DDP/RDMA that conform to the IETF RFCs are called IETF RNICs.

2724	   Without the exchange of MPA Request/Reply Frames, there is no
2725	   standard mechanism for enabling RDMAC RNICs to interoperate with IETF
2726	   RNICs.  Even if a ULP uses a well-known port to start an IETF RNIC
2727	   immediately in RDMA mode (i.e., without exchanging the MPA
2728	   Request/Reply messages), there is no reason to believe an IETF RNIC
2729	   will interoperate with an RDMAC RNIC because of the differences in
2730	   the version number in the DDP and RDMAP headers on the wire.

2732	   Therefore, the ULP or other supporting entity at the RDMAC RNIC must
2733	   implement MPA Request/Reply Frames on behalf of the RNIC in order to
2734	   negotiate the connection parameters.  The following section describes
2735	   the results following the exchange of the MPA Request/Reply Frames
2736	   before the conversion from streaming to RDMA mode.

2738	C.1  Negotiated Parameters

2740	   Three types of RNICs are considered:

2742	   Upgraded RDMAC RNIC - an RNIC implementing the RDMAC protocols which
2743	       has a ULP or other supporting entity that exchanges the MPA
2744	       Request/Reply Frames in streaming mode before the conversion to
2745	       RDMA mode.

2747	   Non-permissive IETF RNIC - an RNIC implementing the IETF protocols
2748	       which is not capable of implementing the RDMAC protocols.  Such
2749	       an RNIC can only interoperate with other IETF RNICs.

2751	   Permissive IETF RNIC - an RNIC implementing the IETF protocols which
2752	       is capable of implementing the RDMAC protocols on a per
2753	       connection basis.

2755	   The Permissive IETF RNIC is recommended for those implementers that
2756	   want maximum interoperability with other RNIC implementations.

2758	   The values used by these three RNIC types for the MPA, DDP, and RDMAP
2759	   versions as well as MPA Markers and CRC are summarized in Figure 14.

2761	    +----------------++-----------+-----------+-----------+-----------+
2762	    | RNIC TYPE      || DDP/RDMAP |    MPA    |    MPA    |    MPA    |
2763	    |                ||  Version  | Revision  |  Markers  |    CRC    |
2764	    +----------------++-----------+-----------+-----------+-----------+
2765	    +----------------++-----------+-----------+-----------+-----------+
2766	    | RDMAC          ||     0     |     0     |     1     |     1     |
2767	    |                ||           |           |           |           |
2768	    +----------------++-----------+-----------+-----------+-----------+
2769	    | IETF           ||     1     |     1     |  0 or 1   |  0 or 1   |
2770	    | Non-permissive ||           |           |           |           |
2771	    +----------------++-----------+-----------+-----------+-----------+
2772	    | IETF           ||  1 or 0   |  1 or 0   |  0 or 1   |  0 or 1   |
2773	    | permissive     ||           |           |           |           |
2774	    +----------------++-----------+-----------+-----------+-----------+
2775	           Figure 14.  Connection Parameters for the RNIC Types.
2776	            For MPA Markers and MPA CRC, enabled=1, disabled=0.

2778	   It is assumed there is no mixing of versions allowed between MPA, DDP
2779	   and RDMAP.  The RNIC either generates the RDMAC protocols on the wire
2780	   (version is zero) or the IETF protocols (version is one).

2782	   During the exchange of the MPA Request/Reply Frames, each peer
2783	   provides its MPA Revision, Marker preference (M: 0=disabled,
2784	   1=enabled), and CRC preference.  The MPA Revision provided in the MPA
2785	   Request Frame and the MPA Reply Frame may differ.

2787	   From the information in the MPA Request/Reply Frames, each side sets
2788	   the Version field (V: 0=RDMAC, 1=IETF) of the DDP/RDMAP protocols as
2789	   well as the state of the Markers for each half connection.  Between
2790	   DDP and RDMAP, no mixing of versions is allowed.  Moreover, the DDP
2791	   and RDMAP version MUST be identical in the two directions.  The RNIC
2792	   either generates the RDMAC protocols on the wire (version is zero) or
2793	   the IETF protocols (version is one).

2795	   In the following sections, the figures do not discuss CRC negotiation
2796	   because there is no interoperability issue for CRCs.  Since the RDMAC
2797	   RNIC will always request CRC use, both peers MUST generate and check
2798	   CRCs, as the IETF MPA specification requires.

2800	C.2  RDMAC RNIC and Non-permissive IETF RNIC

2802	   Figure 15 shows that a Non-permissive IETF RNIC cannot interoperate
2803	   with an RDMAC RNIC, despite the fact that both peers exchange MPA
2804	   Request/Reply Frames.  For a Non-permissive IETF RNIC, the MPA
2805	   negotiation has no effect on the DDP/RDMAP version and it is unable
2806	   to interoperate with the RDMAC RNIC.

2808	   The rows in the figure show the state of the Marker field in the MPA
2809	   Request Frame sent by the MPA Initiator.  The columns show the state
2810	   of the Marker field in the MPA Reply Frame sent by the MPA Responder.
2811	   Each type of RNIC is shown as an Initiator and a Responder.  The
2812	   connection results are shown in the lower right corner, at the
2813	   intersection of the different RNIC types, where V=0 is the RDMAC
2814	   DDP/RDMAP version, V=1 is the IETF DDP/RDMAP version, M=0 means MPA
2815	   Markers are disabled and M=1 means MPA Markers are enabled.  The
2816	   negotiated Marker state is shown as X/Y, for the receive direction of
2817	   the Initiator/Responder.

2819	          +---------------------------++-----------------------+
2820	          |   MPA                     ||          MPA          |
2821	          | CONNECT                   ||       Responder       |
2822	          |   MODE  +-----------------++-------+---------------+
2823	          |         |   RNIC          || RDMAC |     IETF      |
2824	          |         |   TYPE          ||       | Non-permissive|
2825	          |         |          +------++-------+-------+-------+
2826	          |         |          |MARKER|| M=1   | M=0   |  M=1  |
2827	          +---------+----------+------++-------+-------+-------+
2828	          +---------+----------+------++-------+-------+-------+
2829	          |         |   RDMAC  | M=1  || V=0   | close | close |
2830	          |         |          |      || M=1/1 |       |       |
2831	          |         +----------+------++-------+-------+-------+
2832	          |   MPA   |          | M=0  || close | V=1   | V=1   |
2833	          |Initiator|   IETF   |      ||       | M=0/0 | M=0/1 |
2834	          |         |Non-perms.+------++-------+-------+-------+
2835	          |         |          | M=1  || close | V=1   | V=1   |
2836	          |         |          |      ||       | M=1/0 | M=1/1 |
2837	          +---------+----------+------++-------+-------+-------+
2838	   Figure 15: MPA negotiation between an RDMAC RNIC and a Non-permissive
2839	                                IETF RNIC.

2841	C.2.1  RDMAC RNIC Initiator

2843	   If the RDMAC RNIC is the MPA Initiator, its ULP sends an MPA Request
2844	   Frame with Rev field set to zero and the M and C bits set to one.
2845	   Because the Non-permissive IETF RNIC cannot dynamically downgrade the
2846	   version number it uses for DDP and RDMAP, it would send an MPA Reply
2847	   Frame with the Rev field equal to one and then gracefully close the
2848	   connection.

2850	C.2.2  Non-Permissive IETF RNIC Initiator

2852	   If the Non-permissive IETF RNIC is the MPA Initiator, it sends an MPA
2853	   Request Frame with Rev field equal to one.  The ULP or supporting
2854	   entity for the RDMAC RNIC responds with an MPA Reply Frame that has
2855	   the Rev field equal to zero and the M bit set to one.  The Non-
2856	   permissive IETF RNIC will gracefully close the connection after it
2857	   reads the incompatible Rev field in the MPA Reply Frame.
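
   The behavior described in C.2.1 and C.2.2 reduces to a check on the
   Rev field of the frame received from the peer.  The following Python
   sketch is illustrative only: the function and constant names are
   hypothetical and are not defined by this specification, and treating
   every Rev value other than one as incompatible is an assumption of
   the sketch.

```python
# Illustrative sketch only; all names here are hypothetical.
REV_RDMAC = 0  # Rev value sent by an RDMAC RNIC
REV_IETF = 1   # Rev value sent by an IETF RNIC


def non_permissive_action(peer_rev):
    """Decide what a Non-permissive IETF RNIC does after reading the
    Rev field of the peer's MPA Request or MPA Reply Frame."""
    if peer_rev == REV_IETF:
        # Compatible peer: normal MPA negotiation continues.
        return "continue"
    # The RNIC cannot downgrade its DDP/RDMAP version, so it closes the
    # connection gracefully.  As MPA Responder it first sends an MPA
    # Reply Frame with Rev equal to one (see C.2.1).
    return "close"
```

   Both "close" entries in the RDMAC row and the RDMAC column of
   Figure 15 correspond to the second branch.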

2859	C.2.3  RDMAC RNIC and Permissive IETF RNIC

2861	   Figure 16 shows that a Permissive IETF RNIC can interoperate with an
2862	   RDMAC RNIC regardless of its Marker preference.  The figure uses the
2863	   same format as shown with the Non-permissive IETF RNIC.

2865	          +---------------------------++-----------------------+
2866	          |   MPA                     ||          MPA          |
2867	          | CONNECT                   ||       Responder       |
2868	          |   MODE  +-----------------++-------+---------------+
2869	          |         |   RNIC          || RDMAC |     IETF      |
2870	          |         |   TYPE          ||       |  Permissive   |
2871	          |         |          +------++-------+-------+-------+
2872	          |         |          |MARKER|| M=1   | M=0   | M=1   |
2873	          +---------+----------+------++-------+-------+-------+
2874	          +---------+----------+------++-------+-------+-------+
2875	          |         |   RDMAC  | M=1  || V=0   | N/A   | V=0   |
2876	          |         |          |      || M=1/1 |       | M=1/1 |
2877	          |         +----------+------++-------+-------+-------+
2878	          |   MPA   |          | M=0  || V=0   | V=1   | V=1   |
2879	          |Initiator|   IETF   |      || M=1/1 | M=0/0 | M=0/1 |
2880	          |         |Permissive+------++-------+-------+-------+
2881	          |         |          | M=1  || V=0   | V=1   | V=1   |
2882	          |         |          |      || M=1/1 | M=1/0 | M=1/1 |
2883	          +---------+----------+------++-------+-------+-------+
2884	     Figure 16: MPA negotiation between an RDMAC RNIC and a Permissive
2885	                                IETF RNIC.

2887	   A truly Permissive IETF RNIC will recognize an RDMAC RNIC from the
2888	   Rev field of the MPA Req/Rep Frames and then adjust its receive
2889	   Marker state and DDP/RDMAP version to accommodate the RDMAC RNIC.  As
2890	   a result, as an MPA Responder, the Permissive IETF RNIC will never
2891	   return an MPA Reply Frame with the M bit set to zero.  This case is
2892	   shown as not applicable (N/A) in Figure 16.

2894	C.2.4  RDMAC RNIC Initiator

2896	   When the RDMAC RNIC is the MPA Initiator, its ULP or other supporting
2897	   entity prepares an MPA Request message and sets the revision to zero
2898	   and the M bit and C bit to one.

2900	   The Permissive IETF Responder receives the MPA Request message and
2901	   checks the revision field.  Since it is capable of generating RDMAC
2902	   DDP/RDMAP headers, it sends an MPA Reply message with revision set to
2903	   zero and the M and C bits set to one.  The Responder must inform its
2904	   ULP that it is generating version zero DDP/RDMAP messages.
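
   The Responder's choice can be sketched as follows.  This is a minimal
   Python illustration, not a definitive implementation: the function
   name, the dictionary keys, and the local_m and local_c parameters
   (the Responder's own Marker and CRC preferences toward an IETF peer)
   are hypothetical.  The use of DDP/RDMAP version one toward an IETF
   peer is an assumption drawn from the rest of this appendix.

```python
# Illustrative sketch only; all names here are hypothetical.
def permissive_responder_reply(request_rev, local_m, local_c):
    """Build the MPA Reply fields a Permissive IETF Responder sends,
    together with the DDP/RDMAP version it reports to its ULP."""
    if request_rev == 0:
        # Peer is an RDMAC RNIC: answer with revision zero and set the
        # M and C bits to one, then generate version zero DDP/RDMAP.
        return {"rev": 0, "m": 1, "c": 1, "ddp_rdmap_version": 0}
    # Peer is an IETF RNIC: keep revision one and local preferences
    # (assumed behavior, see the lead-in above).
    return {"rev": 1, "m": local_m, "c": local_c, "ddp_rdmap_version": 1}
```

   Note that local_m is ignored when the Request carries revision zero,
   which is why the M=0 Responder column is marked N/A in Figure 16.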

2906	C.2.5  Permissive IETF RNIC Initiator

2908	   If the Permissive IETF RNIC is the MPA Initiator, it prepares the MPA
2909	   Request Frame setting the Rev field to one.  Regardless of the value
2910	   of the M bit in the MPA Request Frame, the ULP or other supporting
2911	   entity for the RDMAC RNIC will create an MPA Reply Frame with Rev
2912	   equal to zero and the M bit set to one.

2914	   When the Initiator reads the Rev field of the MPA Reply Frame and
2915	   finds that its peer is an RDMAC RNIC, it must inform its ULP that it
2916	   should generate version zero DDP/RDMAP messages and enable MPA
2917	   Markers and CRC.
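
   The Initiator-side handling of the MPA Reply Frame can be sketched in
   the same way.  The Python below is illustrative only; the function
   name and the dictionary keys are hypothetical.  For an IETF peer the
   sketch returns None for Markers and CRC to indicate that those
   settings follow the normal M and C bit negotiation, which is not
   modeled here.

```python
# Illustrative sketch only; all names here are hypothetical.
def permissive_initiator_on_reply(reply_rev):
    """Return the settings a Permissive IETF Initiator passes to its
    ULP after reading the Rev field of the MPA Reply Frame."""
    if reply_rev == 0:
        # Peer is an RDMAC RNIC: generate version zero DDP/RDMAP
        # messages and enable MPA Markers and CRC.
        return {"ddp_rdmap_version": 0, "markers": True, "crc": True}
    # IETF peer: Markers and CRC are decided by the M and C bits
    # (not modeled in this sketch).
    return {"ddp_rdmap_version": 1, "markers": None, "crc": None}
```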

2919	C.3  Non-Permissive IETF RNIC and Permissive IETF RNIC

2921	   For completeness, Figure 17 below shows the results of MPA
2922	   negotiation between a Non-permissive IETF RNIC and a Permissive IETF
2923	   RNIC.  The important point from this figure is that an IETF RNIC
2924	   cannot detect whether its peer is a Permissive or Non-permissive
2925	   RNIC.

2927	      +---------------------------++-------------------------------+
2928	      |   MPA                     ||              MPA              |
2929	      | CONNECT                   ||            Responder          |
2930	      |   MODE  +-----------------++---------------+---------------+
2931	      |         |   RNIC          ||     IETF      |     IETF      |
2932	      |         |   TYPE          || Non-permissive|  Permissive   |
2933	      |         |          +------++-------+-------+-------+-------+
2934	      |         |          |MARKER|| M=0   | M=1   | M=0   | M=1   |
2935	      +---------+----------+------++-------+-------+-------+-------+
2936	      +---------+----------+------++-------+-------+-------+-------+
2937	      |         |          | M=0  || V=1   | V=1   | V=1   | V=1   |
2938	      |         |   IETF   |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
2939	      |         |Non-perms.+------++-------+-------+-------+-------+
2940	      |         |          | M=1  || V=1   | V=1   | V=1   | V=1   |
2941	      |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
2942	      |   MPA   +----------+------++-------+-------+-------+-------+
2943	      |Initiator|          | M=0  || V=1   | V=1   | V=1   | V=1   |
2944	      |         |   IETF   |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
2945	      |         |Permissive+------++-------+-------+-------+-------+
2946	      |         |          | M=1  || V=1   | V=1   | V=1   | V=1   |
2947	      |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
2948	      +---------+----------+------++-------+-------+-------+-------+
2949	    Figure 17: MPA negotiation between a Non-permissive IETF RNIC and a
2950	                           Permissive IETF RNIC.
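
   The cells of Figure 17 can be regenerated from the two M values
   alone, which illustrates why an IETF RNIC cannot tell a Permissive
   peer from a Non-permissive one.  The Python sketch below is
   illustrative only; the function name and its parameters are
   hypothetical, and the two "permissive" flags are accepted solely to
   show that they do not influence the result.

```python
# Illustrative sketch only; all names here are hypothetical.
def ietf_cell(initiator_m, responder_m,
              initiator_permissive=False, responder_permissive=False):
    """Return the Figure 17 cell for two IETF RNICs as a pair of
    strings.  The permissive flags are deliberately unused."""
    return ("V=1", "M={}/{}".format(initiator_m, responder_m))


# Every combination of Permissive/Non-permissive yields the same cell.
for i_m in (0, 1):
    for r_m in (0, 1):
        cells = {ietf_cell(i_m, r_m, ip, rp)
                 for ip in (False, True) for rp in (False, True)}
        assert len(cells) == 1
```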

2952	Normative References

2954	   [iSCSI] Satran, J., "Internet Small Computer Systems Interface
2955	       (iSCSI)", RFC 3720, April 2004.

2957	   [RFC1191] Mogul, J., and Deering, S., "Path MTU Discovery", RFC 1191,
2958	       November 1990.

2960	   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., Romanow, A., "TCP
2961	       Selective Acknowledgment Options", RFC 2018, October 1996.

2963	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
2964	       Requirement Levels", BCP 14, RFC 2119, March 1997.

2966	   [RFC2401]  Atkinson, R., Kent, S., "Security Architecture for the
2967	       Internet Protocol", RFC 2401, November 1998.

2969	   [RFC3723] Aboba, B., et al., "Securing Block Storage Protocols over
2970	       IP", RFC 3723, April 2004.

2972	   [RFC793] Postel, J., "Transmission Control Protocol - DARPA Internet
2973	       Program Protocol Specification", RFC 793, September 1981.

2975	   [RDMASEC]  Pinkerton J., Deleganes E., Bitan S., "DDP/RDMAP
2976	       Security", draft-ietf-rddp-security-09.txt (work in progress),
2977	       May 2006.

2979	Informative References

2981	   [APPL] Bestler, C., "Applicability of Remote Direct Memory Access
2982	       Protocol (RDMA) and Direct Data Placement (DDP)", draft-ietf-
2983	       rddp-applicability-08.txt (Work in progress), June 2006.

2985	   [CRCTCP] Stone J., Partridge, C., "When the CRC and TCP checksum
2986	       disagree", ACM Sigcomm, Sept. 2000.

2988	   [DAT-API] DAT Collaborative, "kDAPL (Kernel Direct Access Programming
2989	       Library) and uDAPL (User Direct Access Programming Library)",
2990	       http://www.datcollaborative.org.

2992	   [DDP] H. Shah et al., "Direct Data Placement over Reliable
2993	       Transports", draft-ietf-rddp-ddp-07.txt (Work in progress),
2994	       September 2006.

2996	   [iSER] Mike Ko et al., "iSCSI Extensions for RDMA Specification",
2997	       draft-ietf-ips-iser-05.txt (Work in progress), October 2005.

2999	   [IT-API] The Open Group, "Interconnect Transport API (IT-API)"
3000	       Version 2.1, http://www.opengroup.org.

3002	   [NFSv4CHANNEL] Williams, N., "On the Use of Channel Bindings to
3003	       Secure Channels", Internet-Draft draft-ietf-nfsv4-channel-
3004	       bindings-02.txt, July 2004.

3006	   [RDMAP] R. Recio et al., "RDMA Protocol Specification",
3007	       draft-ietf-rddp-rdmap-07.txt, September 2006.

3009	   [RFC792] Postel, J., "Internet Control Message Protocol", RFC 792,
3010	       September 1981.

3012	   [RFC0896] J. Nagle, "Congestion Control in IP/TCP Internetworks", RFC
3013	       896, January 1984.

3015	   [RFC1122] Braden, R.T., "Requirements for Internet hosts -
3016	       communication layers", RFC 1122, October 1989.

3018	   [RFC2960] R. Stewart et al., "Stream Control Transmission Protocol",
3019	       RFC 2960, October 2000.

3021	   [RFC4296] Bailey, S., Talpey, T., "The Architecture of Direct Data
3022	       Placement (DDP) and Remote Direct Memory Access (RDMA) on
3023	       Internet Protocols", RFC 4296, December 2005.

3025	   [RFC4297] Romanow, A., et al., "Remote Direct Memory Access (RDMA)
3026	       over IP Problem Statement", RFC 4297, December 2005.

3028	   [RFC4301] Kent, S., Seo, K., "Security Architecture for the Internet
3029	       Protocol", RFC 4301, December 2005.

3031	   [VERBS] J. Hilland et al., "RDMA Protocol Verbs Specification",
3032	       draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf April 2003,
3033	       http://www.rdmaconsortium.org.

3035	Authors' Addresses

3037	   Stephen Bailey
3038	       Sandburst Corporation
3039	       600 Federal Street
3040	       Andover, MA  01810 USA
3041	       Phone: +1 978 689 1614
3042	       Email: steph@sandburst.com

3044	   Paul R. Culley
3045	       Hewlett-Packard Company
3046	       20555 SH 249
3047	       Houston, Tx. USA 77070-2698
3048	       Phone:  281-514-5543
3049	       Email:  paul.culley@hp.com

3051	   Uri Elzur
3052	       Broadcom
3053	       16215 Alton Parkway
3054	       Irvine, CA 92618
3055	       Phone: 949.585.6432
3056	       Email:  uri@broadcom.com

3058	   Renato J Recio
3059	       IBM
3060	       Internal Zip 9043
3061	       11400 Burnett Road
3062	       Austin,  Texas  78759
3063	       Phone:  512-838-3685
3064	       Email:  recio@us.ibm.com

3066	   John Carrier
3067	       Cray Inc.
3068	       411 First Avenue S, Suite 600
3069	       Seattle, WA 98104-2860
3070	       Phone: 206-701-2090
3071	       Email: carrier@cray.com

3073	Acknowledgments

3075	   Dwight Barron
3076	       Hewlett-Packard Company
3077	       20555 SH 249
3078	       Houston, Tx. USA 77070-2698
3079	       Phone: 281-514-2769
3080	       Email: dwight.barron@hp.com

3082	   Jeff Chase
3083	       Department of Computer Science
3084	       Duke University
3085	       Durham, NC 27708-0129 USA
3086	       Phone: +1 919 660 6559
3087	       Email: chase@cs.duke.edu

3089	   Ted Compton
3090	       EMC Corporation
3091	       Research Triangle Park, NC 27709, USA
3092	       Phone: 919-248-6075
3093	       Email: compton_ted@emc.com

3095	   Dave Garcia
3096	       Hewlett-Packard Company
3097	       19333 Vallco Parkway
3098	       Cupertino, Ca. USA 95014
3099	       Phone: 408.285.6116
3100	       Email: dave.garcia@hp.com

3102	   Hari Ghadia
3103	       Adaptec, Inc.
3104	       691 S. Milpitas Blvd.,
3105	       Milpitas, CA 95035  USA
3106	       Phone: +1 (408) 957-5608
3107	       Email: hari_ghadia@adaptec.com

3109	   Howard C. Herbert
3110	       Intel Corporation
3111	       MS CH7-404
3112	       5000 West Chandler Blvd.
3113	       Chandler, Arizona 85226
3114	       Phone: 480-554-3116
3115	       Email: howard.c.herbert@intel.com

3117	   Jeff Hilland
3118	       Hewlett-Packard Company
3119	       20555 SH 249
3120	       Houston, Tx. USA 77070-2698
3121	       Phone: 281-514-9489
3122	       Email: jeff.hilland@hp.com

3124	   Mike Ko
3125	       IBM
3126	       650 Harry Rd.
3127	       San Jose, CA 95120
3128	       Phone: (408) 927-2085
3129	       Email: mako@us.ibm.com

3131	   Mike Krause
3132	       Hewlett-Packard Corporation, 43LN
3133	       19410 Homestead Road
3134	       Cupertino, CA 95014 USA
3135	       Phone: +1 (408) 447-3191
3136	       Email: krause@cup.hp.com

3138	   Dave Minturn
3139	       Intel Corporation
3140	       MS JF1-210
3141	       5200 North East Elam Young Parkway
3142	       Hillsboro, Oregon  97124
3143	       Phone: 503-712-4106
3144	       Email: dave.b.minturn@intel.com

3146	   Jim Pinkerton
3147	       Microsoft, Inc.
3148	       One Microsoft Way
3149	       Redmond, WA, USA 98052
3150	       Email: jpink@microsoft.com

3152	   Hemal Shah
3153	       16215 Alton Parkway
3154	       Irvine, California 92619-7013 USA
3155	       Phone: +1 949 926-6941
3156	       Email: hemal@broadcom.com

3158	   Allyn Romanow
3159	       Cisco Systems
3160	       170 W Tasman Drive
3161	       San Jose, CA 95134 USA
3162	       Phone: +1 408 525 8836
3163	       Email: allyn@cisco.com

3165	   Tom Talpey
3166	       Network Appliance
3167	       375 Totten Pond Road
3168	       Waltham, MA 02451 USA
3169	       Phone: +1 (781) 768-5329
3170	       Email: thomas.talpey@netapp.com

3172	   Patricia Thaler
3173	       Broadcom
3174	       16215 Alton Parkway
3175	       Irvine, CA 92618
3176	       Phone: 916 570 2707
3177	       Email: pthaler@broadcom.com

3179	   Jim Wendt
3180	       Hewlett Packard Corporation
3181	       8000 Foothills Boulevard MS 5668
3182	       Roseville, CA 95747-5668 USA
3183	       Phone: +1 916 785 5198
3184	       Email: jim_wendt@hp.com

3186	   Jim Williams
3187	       Emulex Corporation
3188	       580 Main Street
3189	       Bolton, MA 01740 USA
3190	       Phone: +1 978 779 7224
3191	       Email: jim.williams@emulex.com

3193	Full Copyright Statement

3195	   This document and the information contained herein are provided on an
3196	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
3197	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
3198	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
3199	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
3200	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
3201	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

3203	   Copyright (C) The Internet Society (2006).  This document is subject
3204	   to the rights, licenses and restrictions contained in BCP 78, and
3205	   except as set forth therein, the authors retain all their rights.

3207	Intellectual Property

3209	   The IETF takes no position regarding the validity or scope of any
3210	   Intellectual Property Rights or other rights that might be claimed to
3211	   pertain to the implementation or use of the technology described in
3212	   this document or the extent to which any license under such rights
3213	   might or might not be available; nor does it represent that it has
3214	   made any independent effort to identify any such rights.  Information
3215	   on the procedures with respect to rights in RFC documents can be
3216	   found in BCP 78 and BCP 79.

3218	   Copies of IPR disclosures made to the IETF Secretariat and any
3219	   assurances of licenses to be made available, or the result of an
3220	   attempt made to obtain a general license or permission for the use of
3221	   such proprietary rights by implementers or users of this
3222	   specification can be obtained from the IETF on-line IPR repository at
3223	   http://www.ietf.org/ipr.

3225	   The IETF invites any interested party to bring to its attention any
3226	   copyrights, patents or patent applications, or other proprietary
3227	   rights that may cover technology that may be required to implement
3228	   this standard.  Please address the information to the IETF at
3229	   ietf-ipr@ietf.org.