2	Transport Area Working Group                   S. Bailey    (Sandburst)
3	Internet-draft                                 J. Chase          (Duke)
4	Expires: May 2002                              J. Pinkerton (Microsoft)
5	                                               A. Romanow       (Cisco)
6	                                               C. Sapuntzakis   (Cisco)
7	                                               J. Wendt            (HP)
8	                                               J. Williams     (Emulex)

10	                     TCP ULP Framing Protocol (TUF)
11	                   draft-ietf-tsvwg-tcp-ulp-frame-01

13	Status of this Memo

15	     This document is an Internet-Draft and is in full conformance with
16	     all provisions of Section 10 of RFC2026.

18	     Internet-Drafts are working documents of the Internet Engineering
19	     Task Force (IETF), its areas, and its working groups.  Note that
20	     other groups may also distribute working documents as Internet-
21	     Drafts.

23	     Internet-Drafts are draft documents valid for a maximum of six
24	     months and may be updated, replaced, or obsoleted by other
25	     documents at any time.  It is inappropriate to use Internet-Drafts
26	     as reference material or to cite them other than as "work in
27	     progress."

29	     The list of current Internet-Drafts can be accessed at
30	     http://www.ietf.org/ietf/1id-abstracts.txt

32	     The list of Internet-Draft Shadow Directories can be accessed at
33	     http://www.ietf.org/shadow.html.

35	Copyright Notice

37	     Copyright (C) The Internet Society (2001). All Rights Reserved.

39	Abstract

41	     The TCP ULP Framing (TUF) protocol defines a shim layer protocol
42	     between an Upper Layer Protocol (ULP) and TCP.  TUF also depends on
43	     a specified TCP segmentation convention between TUF endpoints.
44	     Together, the shim and segmentation conventions enable a TUF/TCP
45	     receiver to recognize ULP data units within a TCP segment
46	     independently of other TCP segments.  This capability simplifies
47	     the design of enhanced network interfaces implementing direct data
48	     placement for ULPs using TCP.  Direct data placement is a key step
49	     to making IP networking competitive with high-end interconnect
50	     solutions in data centers and other high-performance application
51	     domains.

53	Table Of Contents

55	     1.     Definitions
56	     2.     Overview
57	     2.1.   Motivation
58	     2.2.   Approach
59	     3.     Rationale For TUF
60	     3.1.   Direct Data Placement
61	     3.2.   Direct Data Placement with TCP
62	     3.2.1. The Simple Case: ULP-unaware Placement
63	     3.2.2. The Complex Case: ULP-aware Placement
64	     3.2.3. The Problem of ULP-aware Placement with TCP
65	     3.2.4. Finding ULPDUs In Out-of-order Segments
66	     3.2.5. The TUF Solution
67	     3.2.6. TUF's ULP Assumptions
68	     4.     The Protocol
69	     4.1.   The Framing Protocol Data Unit (FPDU)
70	     4.1.1. FPDU Format
71	     4.1.2. FPDU Size Selection
72	     4.2.   TUF-conforming TCP Sender Segmentation
73	     4.3.   Negotiating TUF
74	     4.4.   TUF Receiver ULPDU Containment Property Testing
75	     5.     Protocol Characteristics
76	     5.1.   Properties Of TUF-conforming TCP Senders
77	     5.2.   Exception Cases
78	     5.2.1. Resegmenting Intermediaries
79	     5.2.2. PMTU Reduction
80	     5.2.3. PMTU Increase
81	     5.2.4. Receive Window < EMSS
82	     5.2.5. Size of ULPDU + 8 > EMSS
83	     6.     Security Considerations
84	     6.1.   Protocol-specific Security Considerations
85	     6.2.   Using IPSec With TUF
86	     6.3.   Using TLS With TUF
87	     7.     IANA Considerations
88	            References
89	            Authors' Addresses
90	     A.     Sample Sockets Support For TUF
91	     A.1    Basic Principles
92	     A.2    Enabling TUF
93	     A.3    Sending Data
94	     A.4    Retrieving The Current EMSS or MULPDU
95	     A.5    Disabling ULPDU Packing
96	     A.6    Disabling The Report of Oversized ULPDUs
97	            Full Copyright Statement

99	1.  Definitions

101	     The following terms and abbreviations are used in this document.

103	          data delivery - the delivery of received ULP payloads to the
104	          ULP application, i.e., notifying the application of data
105	          arrival by completing a receive operation or generating an
106	          event.

108	          data placement - the storage of received ULP payloads to host
109	          memory, pending delivery to the ULP application.

111	          direct data placement - the storage of received ULP payloads
112	          directly to application-specified buffers without intermediate
113	          buffering or copying.

115	          EMSS - the effective maximum segment size.  EMSS is the TCP
116	          maximum segment size (MSS) defined in RFC 793 [TCP] and
117	          exchanged during TCP connection establishment, adjusted by the
118	          current path maximum transfer unit (MTU) [PathMTU].

120	          FPDU - framing protocol data unit.  The protocol data unit
121	          defined by TUF.

123	          MULPDU - maximum upper layer protocol data unit size.  The
124	          size of the largest ULPDU that fits in an EMSS-sized FPDU.

126	          NIC - network interface controller.  The device that provides
127	          a host's access to a physical network link.

129	          PDU - protocol data unit.  A self-contained block of control
130	          and data defined by a particular protocol.

132	          RDMA - Remote Direct Memory Access protocol.  A data transfer
133	          protocol which uses memory access-style transfer mode(s) to
134	          provide generic direct data placement capabilities for
135	          arbitrary ULPs.

137	          TUF - TCP ULP Framing protocol.  The protocol defined in this
138	          document.

140	          ULP - upper layer protocol.  The client protocol using the
141	          services of the transport layer, or TUF.

143	          ULPDU - upper layer protocol data unit.

145	          ULPDU containment property - the property that a TCP segment
146	          contains exactly an integral number of ULPDUs.
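
     As an illustration of how EMSS and MULPDU relate, the following
     Python sketch computes both values.  It is not part of the protocol
     definition.  The function names are hypothetical, the header sizes
     assume IPv4 and TCP without options, and the 8-octet FPDU overhead
     is an assumption inferred from the heading of Section 5.2.5 (`Size
     of ULPDU + 8 > EMSS').

```python
# Sketch only: header sizes and the FPDU overhead below are assumptions.
IPV4_HEADER = 20    # octets, assumes IPv4 with no options
TCP_HEADER = 20     # octets, assumes no TCP options
FPDU_OVERHEAD = 8   # octets, assumed from the "Size of ULPDU + 8 > EMSS" heading


def effective_mss(negotiated_mss: int, path_mtu: int) -> int:
    """EMSS: the negotiated MSS, capped by what the current path MTU allows."""
    return min(negotiated_mss, path_mtu - IPV4_HEADER - TCP_HEADER)


def max_ulpdu(emss: int) -> int:
    """MULPDU: the largest ULPDU that still fits in one EMSS-sized FPDU."""
    return max(emss - FPDU_OVERHEAD, 0)


emss = effective_mss(negotiated_mss=1460, path_mtu=1500)
print(emss, max_ulpdu(emss))  # 1460 1452
```

     If the path MTU later drops to 576 octets, the same functions give
     an EMSS of 536 and a MULPDU of 528 under these assumptions.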

148	2.  Overview

150	     This section summarizes the motivation for the TCP ULP Framing
151	     (TUF) protocol and explains its operation in brief.  Section 3
152	     (`Rationale for TUF') develops the rationale for TUF in detail.
153	     Section 4 (`The Protocol') defines the protocol itself.  Section 5
154	     (`Protocol Characteristics') examines various properties of the
155	     protocol's operation.  Implementors may wish to refer directly to
156	     sections 4 and 5.

158	2.1.  Motivation

160	     The IP protocols are not usually used for high-performance high
161	     speed data transfers due to overhead in TCP processing. Instead, a
162	     number of special purpose protocols have been used. The domain of
163	     application for such high speed buffer transfer includes storage,
164	     video delivery and processing, and various applications of cluster
165	     computing, such as scalable database or application service.  For
166	     reasons discussed below, there is now great industry interest in
167	     developing an IP standard for low overhead high bandwidth data
168	     transfer, which would decrease the costs of high speed
169	     interconnects and supplant special purpose protocols.

171	     The approach typically used for low overhead transfers is called
172	     direct data placement, in which the network interface places data
173	     directly in application buffers, avoiding the latency and memory
174	     bandwidth costs associated with copying.  Direct data placement can
175	     in principle be done with either of IP's reliable transports--SCTP
176	     or TCP.  This document considers what is needed to do direct data
177	     placement with TCP.

179	     In order to place data directly in application buffers, the network
180	     interface needs to use information in the Upper Layer Protocol Data
181	     Units (ULPDUs) contained in the TCP stream.  This can be
182	     accomplished routinely except when TCP segments arrive out of
183	     order.  If TCP segments arrive out of order, the location of the
184	     ULPDUs in the TCP segment cannot be found.  The TUF protocol
185	     addresses this problem of finding ULPDU headers in the TCP stream,
186	     even when TCP segments arrive out of order.

188	2.2.  Approach

190	     TUF is implemented as a shim layer between a ULP and TCP.  The
191	     end-to-end data flow is:

193	     0.   Use of TUF is negotiated end-to-end by the ULP.

195	     1.   The ULP delivers a data stream with ULPDUs delimited to TUF.

197	     2.   TUF inserts a header and delivers the shimmed ULPDUs to TCP.

199	     3.   The TUF-aware TCP sender preserves boundaries of shimmed
200	          ULPDUs (TUF FPDUs) as much as possible when delivering
201	          segments to the IP layer.

203	     4.   The receiving TCP delivers shimmed ULPDUs to the receiving TUF
204	          layer.

206	     5.   TUF removes the shim and delivers the ULPDUs to the ULP.

208	     In other words, the layering of TUF is:

210	                          ULP client
211	                               ^
212	                               |
213	                               | ULPDUs (in octet stream)
214	                               |
215	                               v
216	                              TUF
217	                               ^
218	                               |
219	                               | FPDUs (containing ULPDUs)
220	                               |
221	                               v
222	                      TUF-conforming TCP
223	                               ^
224	                               |
225	                               | TCP Segments (each containing an FPDU)
226	                               |
227	                               v
228	                             . . .

230	     Note that while the semantics of this protocol layering must be
231	     maintained, the receiving network interface may use the information
232	     in the framed ULPDUs to place the data in memory on the host.
233	     Whatever the case, the data is only delivered to the ULP when all
234	     preceding TCP data has arrived.

236	3.  Rationale For TUF

238	     This document defines the TUF protocol as a shim layer between an
239	     Upper Layer Protocol (ULP) and TCP.  TUF also depends on a TCP
240	     segmentation convention between TUF/TCP endpoints specified in this
241	     document.  Taken together they provide the capability for a TUF/TCP
242	     receiver to recognize ULPDUs by processing each TCP segment
243	     independently, without requiring state from previous segments.

245	     The purpose of TUF is to enable practical designs for enhanced
246	     network interfaces (NICs) implementing direct data placement for
247	     TCP-based ULPs.  The purpose of direct data placement is to
248	     eliminate the need for a host to copy received data after it
249	     arrives in host memory.  This copying incurs CPU, memory and bus
250	     costs that are substantial and are not masked by advancing hardware
251	     technology.

253	     A general and practical solution to the receive copy problem has
254	     eluded the IP networking community for almost two decades.  There
255	     is a long history of research and experimental schemes to reduce or
256	     eliminate receiver copying overhead for IP networking in general,
257	     and for TCP/IP communication in particular.  While these systems
258	     have convincingly demonstrated the potential performance benefits
259	     of reducing copy costs, all such schemes suffer from one or more of
260	     the following limitations: they require a significant restructuring
261	     of operating system buffering and/or APIs; they are limited to
262	     specific modes of communication (e.g., bulk data transfer) or
263	     specific application ULPs; they do not scale on multiprocessor
264	     hosts; their benefits depend on specific properties of the network
265	     (e.g., large MTUs) or host buffer size and alignment.  Moreover,
266	     all such schemes require some degree of support from NICs to
267	     separate payloads from headers and/or ensure that their placement
268	     in host memory meets specific requirements (e.g., for page
269	     placement and alignment).

271	     Inherent copying costs for IP communication are one motivation to
272	     use alternative non-IP technologies for high-speed networking.  A
273	     number of specialized technologies have been developed for high
274	     speed data transfers in which network interfaces transfer data from
275	     application buffer to application buffer without software touching
276	     the data.  Some examples include the VAXCluster Interconnect in
277	     1983, Fibre Channel (FC) in 1994, and today InfiniBand (IB) and
278	     Virtual Interface Architecture (VIA).  These alternatives have
279	     eroded the popularity of IP technologies in application domains
280	     including network storage, video processing and delivery, and
281	     cluster computing for scientific applications and scalable
282	     database-related services.

284	     Until recently, several factors have limited interest in promoting
285	     IP networking as a solution in these application domains.  First,
286	     the competing network technologies offered significantly higher
287	     link speeds than the network hardware available for use with IP.
288	     Second, these application domains were a relatively small segment
289	     of the network market.  Recently, however, Ethernet networks have
290	     closed the bandwidth gap and even exceeded the bandwidth of
291	     alternatives such as Fibre Channel, at much lower cost.  At the same
292	     time, an increasing number of applications are server-hosted in
293	     data centers to enable sharing and access from a growing number of
294	     IP-connected client devices and locations.  With the growth in
295	     importance and number of data centers, high-speed interconnection
296	     within the data center is now central to the everyday operation of
297	     Internet services.

299	     Thus, technology changes have created an opportunity and demand to
300	     extend the benefits of IP technologies to high-performance
301	     application domains, while simultaneously increasing the importance
302	     of those domains.  The ubiquity of IP offers economies of scale
303	     heavily favoring IP in these domains.  For example, reliance on
304	     specialized non-IP technologies for high-performance domains
305	     creates a need to support multiple protocols and redundant network
306	     infrastructure in data centers, and it compromises portability and
307	     interoperability of data center solutions.  Moreover, comprehensive
308	     support for network management and security is developing rapidly
309	     in the IP space.  Use of IP technologies would allow data centers
310	     to benefit from these enhancements.

312	3.1.  Direct Data Placement

314	     Direct data placement is a key step toward making IP networking
315	     competitive in data centers and other high-performance domains.
316	     Direct data placement refers to the ability of a NIC to place data
317	     directly from the network into designated application buffers,
318	     without intermediate copying.  Direct data placement is attractive
319	     relative to other solutions to the receive copy problem.  It is the
320	     only solution that can be implemented in a way that is compatible
321	     with existing operating systems, since the receiving NIC takes over
322	     most of the responsibility to avoid receive copying.  Also, direct
323	     data placement generalizes easily to a range of ULPs.  In
324	     particular, an IETF standard for an IP transport-based direct
325	     data placement protocol would allow NICs to directly place data
326	     independent of the application ULP using it.

329	     The TUF protocol is necessary to permit easily deployable enhanced
330	     NICs supporting direct data placement.  Such NICs already exist and
331	     their usage is growing rapidly, but their development is impeded by
332	     the lack of standards.  Direct data placement is unnecessarily
333	     difficult and expensive to design and implement for existing TCP-
334	     based ULPs; the key objective of TUF is to define transport
335	     conventions to simplify the design of these NICs.  A related
336	     impediment is that in the absence of a general direct data
337	     placement protocol these products are limited to specific ULPs such
338	     as iSCSI.  TUF, and possibly additional, higher layer protocol
339	     definitions outside the scope of this document, would encourage the
340	     market by ensuring interoperability of product offerings from
341	     different vendors.

343	     This document defines a framing protocol (TUF) and TCP segmentation
344	     conventions that enable simple support of direct data placement for
345	     a class of TCP-based ULPs.  It does not propose a generic direct
346	     placement ULP, such as an RDMA protocol, or any facility for direct
347	     data placement, but only the foundations for building such a
348	     facility on TCP.  A key objective of TUF is to do this in a way
349	     that is compatible with existing standards and with the spirit of
350	     TCP's stream communication model.  TUF can simplify support for
351	     direct data placement for ULPs such as iSCSI, and it can serve as a
352	     basis for a future RDMA proposal.

354	     The key limitation of TUF as a solution to the receive copy problem
355	     is that it works only if the ULP standard and the sending and
356	     receiving implementations all support it.  Impact on the sender and
357	     ULPs is minimal, but ULPs must be adapted to allow use of TUF at
358	     the ULP/transport boundary.  The necessary modifications may be
359	     quite small.  Use of TUF is a negotiated option between the sender
360	     and receiver for each ULP session, preserving interoperability
361	     among senders and receivers that do not support TUF.

363	3.2.  Direct Data Placement with TCP

365	     Direct data placement is widely used to accomplish high-performance
366	     data transfer in non-IP technologies such as block storage channels
367	     (SCSI, Fibre Channel, etc.), and other specialized high performance
368	     networks like InfiniBand.  This section considers how direct
369	     placement can be done with TCP.

371	     The Internet Protocol suite provides two transports that are prime
372	     candidates for use with direct data placement -- SCTP and TCP.  The
373	     framing features of the Stream Control Transmission Protocol
374	     [SCTP] make it more directly adaptable for direct data placement
375	     for future ULPs using SCTP.  However, the maturity and ubiquity of
376	     TCP make it desirable to define a flexible method for direct data
377	     placement for TCP-based ULPs as well.

379	     There has been a great deal of `moral confusion' concerning the
380	     interaction of direct data placement with TCP's ordering
381	     guarantees.  These ordering guarantees do not prohibit direct data
382	     placement, even if data is placed as it arrives out of order.

384	     TCP guarantees data delivery to the application ULP as an ordered,
385	     sequential stream [TCP].  Data is delivered only when TCP has
386	     notified the application of its arrival and transferred ownership
387	     of the receive data buffer.  TCP does not specify how received data
388	     is stored prior to its delivery, and it does not preclude placement
389	     of data in application buffers out of order, as long as no data is
390	     delivered until all preceding data has also been delivered.  Out-
391	     of-order placement greatly simplifies direct data placement NICs
392	     because it streamlines data paths and eliminates the need for a TCP
393	     reassembly buffer on the NIC.
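
     The distinction between placement and delivery can be sketched in
     Python as follows.  This is an illustrative model only, not an
     implementation of any NIC or TCP stack: the class name, the use of
     stream offsets starting at zero in place of TCP sequence numbers,
     and the single fixed-size buffer are all assumptions.

```python
class PlacementBuffer:
    """Places payloads by stream offset; advances delivery only in order."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)  # stands in for an application buffer
        self.placed = []            # (start, end) ranges already placed
        self.delivered = 0          # octets delivered to the ULP so far

    def place(self, offset: int, payload: bytes) -> int:
        """Store payload at its offset; return newly deliverable octets."""
        end = offset + len(payload)
        if offset < 0 or end > len(self.buf):
            raise ValueError("payload does not fit in the buffer")
        self.buf[offset:end] = payload      # placement may be out of order
        self.placed.append((offset, end))
        before = self.delivered
        advanced = True
        while advanced:                     # delivery point moves in order
            advanced = False
            for start, stop in self.placed:
                if start <= self.delivered < stop:
                    self.delivered = stop
                    advanced = True
        return self.delivered - before


rx = PlacementBuffer(12)
print(rx.place(4, b"BBBB"))  # 0: placed, but preceding data is missing
print(rx.place(8, b"CCCC"))  # 0
print(rx.place(0, b"AAAA"))  # 12: all preceding data has now arrived
```

     In this model every payload is written to its final location on
     arrival, yet nothing is reported as deliverable until the stream
     is contiguous from the start.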

395	     An implementation performing direct data placement must still
396	     respect all TCP delivery semantics.  For example, if a checksum
397	     integrity check fails, the data must not be placed in ULP-supplied
398	     buffers, because the TCP ports and the TCP sequence number are then
399	     not trustworthy.

401	3.2.1.  The Simple Case: ULP-unaware Placement

403	     Direct data placement into a ULP client-supplied buffer designated
404	     to hold the next data delivered to the ULP, regardless of the
405	     contents of the received data, is one of the simplest possible
406	     forms of direct data placement.  This form of direct data placement
407	     is already fully supported by existing TCP mechanisms.  New NIC
408	     products currently, or soon to be available, which claim to offer
409	     `full zero copy operation' typically provide only this ULP-unaware
410	     form of direct data placement.

412	     While ULP-unaware direct data placement works well for ULPs like
413	     FTP where the entire contents of a TCP connection are known to be
414	     nothing but a single stream of bulk client data, most widely used
415	     ULPs, e.g. HTTP [HTTP], BEEP [BEEP] and storage protocols,
416	     multiplex control and data, and possibly even interleave data from
417	     different requests on the same TCP connection.  The simple ULP-
418	     unaware direct data placement is inadequate to avoid data copies
419	     for these ULPs.

421	3.2.2.  The Complex Case: ULP-aware Placement

423	     An explicit goal of this proposal is to support out-of-order direct
424	     data placement for ULPs that provide additional transport-like
425	     features such as control and data multiplexing, layered above TCP
426	     (e.g., iSCSI or a generic direct data placement protocol such as
427	     RDMA).  In many ULPs, such as storage protocols, control
428	     information contained in the ULP uniquely identifies the
429	     destination application buffer of each particular piece of data.

431	     For example, suppose a client requests a read operation using a
432	     network storage ULP, specifying the destination buffer for the
433	     requested data.  The requesting ULP includes control information in
434	     the request (e.g., in the ULPDU header) uniquely identifying that
435	     buffer, and the responder includes that information in the read
436	     response.  For some protocols, the identifier is a unique request
437	     ID, allowing the client ULP to identify the buffer indirectly
438	     through a table of pending requests.  If the storage protocol uses
439	     RDMA, the response may specify the buffer directly by means of a
440	     region identifier.

442	     A network interface that understands the relevant ULP control
443	     information can use it to place the incoming data (e.g., read
444	     response payload) directly in the correct buffer.  In this case,
445	     data placement is guided by ULPDU headers embedded in the TCP data
446	     stream.  The NIC accesses these headers as hints for placement of
447	     the ULP payloads--a form of integrated layer processing for each
448	     TCP segment as it arrives.  This is compatible with TCP's ordering
449	     properties if completion of ULP header processing and delivery of
450	     the payload data to the application are strictly in order.
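
     The indirect case, in which a request ID selects the destination
     buffer through a table of pending requests, might be modeled as in
     the Python sketch below.  The class, method names, and the integer
     request ID are hypothetical and are not taken from any particular
     storage ULP.

```python
class PendingReads:
    """Table of outstanding reads, keyed by a hypothetical request ID."""

    def __init__(self) -> None:
        self.buffers = {}  # request ID -> destination application buffer

    def issue(self, request_id: int, length: int) -> bytearray:
        """Register the buffer that will receive the response to a read."""
        buf = bytearray(length)
        self.buffers[request_id] = buf
        return buf

    def place(self, request_id: int, offset: int, payload: bytes) -> bool:
        """Place response payload directly; False means use the copy path."""
        buf = self.buffers.get(request_id)
        if buf is None or offset < 0 or offset + len(payload) > len(buf):
            return False
        buf[offset:offset + len(payload)] = payload
        return True


reads = PendingReads()
buf = reads.issue(request_id=7, length=8)
reads.place(7, 4, b"WXYZ")  # second half of the response arrives first
reads.place(7, 0, b"ABCD")
print(bytes(buf))           # b'ABCDWXYZ'
```

     The sketch covers placement only.  Completion of the request would
     still be signaled to the application strictly in order.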

452	3.2.3.  The Problem of ULP-aware Placement with TCP

454	     The problem with performing direct data placement as a function of
455	     ULP control information in TCP is that it may be difficult to
456	     locate the ULP control information (ULPDU headers) within a TCP
457	     segment.

459	     If all TCP segments are received in sequence order, ULP control
460	     information can be unambiguously located by the rules that permit
461	     any ULP implementation to do so.  For example, each ULPDU may
462	     contain a length field that implicitly specifies the location of
463	     the beginning of the subsequent ULPDU.

465	     If TCP segments are not received in sequence order, without taking
466	     additional measures, it may not be possible to unambiguously locate
467	     ULP control information needed for direct data placement.  For
468	     example, if ULPDU length information is in a TCP segment that is
469	     delayed or lost in transmission, assuming the ULPDU length is the
470	     only means of locating the beginning of the subsequent ULPDU, it is
471	     impossible to locate ULP control information for ULPDUs in
472	     subsequent TCP segments until the lost or delayed TCP segment is
473	     received.  ULP control information and the data whose placement
474	     depends on it may even be in different TCP segments.  If the ULP
475	     control information is in a TCP segment that is delayed or lost, it
476	     is impossible to directly place the data until the ULP control
477	     information is received.
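
     The following Python sketch illustrates the problem with a
     hypothetical ULP whose ULPDUs each begin with a 2-octet length
     field.  The framing and the helper names are assumptions made for
     the example and do not correspond to any real ULP.

```python
import struct


def encode(payload: bytes) -> bytes:
    """Hypothetical ULPDU: 2-octet big-endian payload length, then payload."""
    return struct.pack("!H", len(payload)) + payload


def ulpdu_offsets(stream: bytes) -> list:
    """Walk ULPDUs from a known boundary; return each header's offset."""
    offsets, pos = [], 0
    while pos + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, pos)
        offsets.append(pos)
        pos += 2 + length
    return offsets


stream = encode(b"a" * 5) + encode(b"b" * 3) + encode(b"c" * 6)
print(ulpdu_offsets(stream))  # [0, 7, 12]

# Cut the stream into 8-octet segments and suppose the first one is lost.
second_segment = stream[8:16]
(bogus,) = struct.unpack_from("!H", second_segment, 0)
print(bogus)  # 866: payload octets misread as a length field
```

     With the whole stream in hand, the headers are found at offsets 0,
     7 and 12.  A receiver holding only the second segment cannot tell
     that the next real header sits 4 octets into that segment.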

479	3.2.4.  Finding ULPDUs In Out-of-order Segments

481	     Early attempts at ULP-aware direct data placement in TCP took the
482	     approach of only directly placing data for TCP segments received
483	     in-order.  Otherwise, data was copied through a reassembly buffer
484	     as in a traditional implementation.  Unfortunately, packet loss and
485	     the attendant out-of-order reception are frequent, continuous
486	     characteristics of both wide-area and switched local-area networks
487	     of almost any size, as TCP adjusts to varying congestion
488	     conditions.  Under these conditions, a large portion of the data
489	     transferred ends up being copied, rather than being directly
490	     placed.

492	     Another solution to this problem is to build a reassembly buffer
493	     into the network interface.  Data received out-of-order can be held
494	     in the network interface reassembly buffer until all preceding data
495	     is received, and then direct placement can be performed on the
496	     reassembled data.  Within certain implementation assumptions, this
497	     is a reasonable approach, but unfortunately a number of issues,
498	     including very large memory requirements, limited scalability,
499	     and increased latency, make the reassembly
500	     approach undesirable.

502	     The size of the reassembly buffer needed in the network interface
503	     is a direct function of the bandwidth * delay product of all active
504	     TCP connections.  Reasonable assumptions on the active bandwidth *
505	     delay product can imply a large amount of reassembly memory.
506	     Furthermore, this large reassembly memory must run at high
507	     speed---more than two times the link speed, to maintain full link
508	     bandwidth.

510	     Finally, performing reassembly in the network interface requires
511	     that the bandwidth from the network interface to host memory be not
512	     just equal, but substantially greater than the maximum bandwidth of
513	     the network link, to ensure that the reassembly buffer is drained
514	     when reassembly is complete.  System bus and interconnect bandwidth
515	     are particularly scarce and expensive resources in most systems.

517	     What is needed to permit ULP-aware direct data placement without
518	     reassembly buffering is a way to ensure that the ULP control
519	     information and the data associated with it are highly likely to be
520	     contained completely within a single TCP segment, and a way for a
521	     receiver to validate this containment property on TCP segments it
522	     receives.  If the receiver can determine that a ULPDU starts at the
523	     beginning of a TCP segment, the receiver can perform ULP-aware
524	     direct placement for that ULPDU, and subsequent ULPDUs contained in
525	     that TCP segment.  The property that a ULPDU is completely
526	     contained within a TCP segment is called the `ULPDU containment
527	     property'.

529	3.2.5.  The TUF Solution

531	     The TUF protocol defines a shim layer above TCP and below the ULP
532	     that allows the receiver to validate the ULPDU containment property
533	     for each TCP segment received, independently of any other TCP
534	     segment.  The TUF protocol also defines a segmentation behavior for
535	     the TCP sender that ensures the ULPDU containment property holds as
536	     often as possible while still respecting the protocol requirements
537	     for TCP senders.

539	     The TUF-specified TCP segmentation behavior ensures that the ULPDU
540	     containment property is maintained as long as the receiver window
541	     size is at least equal to the effective MSS (EMSS), the path MTU
542	     (PMTU) does not change, and the TCP stream is not resegmented by an
543	     intermediary.  In conditions where the TCP receiver window size is
544	     smaller than EMSS, or the PMTU changes, the segmentation behavior
545	     further ensures that once the relevant condition is restored, the
546	     ULPDU containment property will be satisfied again.

548	     For the high-performance applications that this protocol targets,
549	     small receiver window sizes and PMTU changes are rare transients.
550	     Thus, the specified protocol ensures that ULP control information
551	     and its associated data are virtually always together in a single
552	     TCP segment.

554	3.2.6.  TUF's ULP Assumptions

556	     A key assumption of TUF is that ULPs running on TUF can adjust
557	     ULPDU sizes to fit completely within an EMSS-sized TCP segment.
558	     Clearly, if a ULPDU does not fit within an EMSS-sized TCP segment,
559	     the ULPDU containment property can not be satisfied.  Most storage
560	     protocols (e.g. iSCSI), and other performance-targeted protocols
561	     (e.g. RDMA protocols) support this capability.  ULPs that can not
562	     adjust ULPDU sizes to fit within an EMSS-sized TCP segment, but
563	     still want the performance advantages of direct data placement, can
564	     be mapped on top of an intermediate protocol (e.g. an RDMA
565	     protocol) that does support this data `chunking'.

567	     TUF does not change the stream delivery semantics that TCP offers
568	     the ULP through the TUF implementation.  It merely inserts a shim
569	     header that can be used by direct placement network interfaces to
570	     verify the ULPDU containment property.  The shim header is inserted
571	     by the sending TUF implementation and removed by the receiving TUF
572	     implementation, leaving a stream to be delivered to the ULP.

574	4.  The Protocol

576	     This section defines the TUF protocol itself.  The first two
577	     sections are the core of the protocol, defining:

579	     o    the shim layer PDUs, called FPDUs,

581	     o    a TCP-conforming segmentation behavior which ensures the ULPDU
582	          containment property holds under most conditions.

584	     The remaining sections cover other aspects of the protocol which
585	     are primarily implications of the core protocol:

587	     o    what ULP-specified negotiations to enable TUF must accomplish,

589	     o    how receivers can process received TCP segments to establish
590	          whether the ULPDU containment property holds.

592	4.1.  The Framing Protocol Data Unit (FPDU)

594	     TUF sends groups of one or more complete ULPDUs in a framing
595	     protocol data unit (FPDU).

597	4.1.1.  FPDU Format

599	     The format of an FPDU is:

601	     0                   1                   2                   3
602	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
603	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
604	     |          Length               |             Key               |
605	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
606	     |                              Key                              |
607	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
608	     |                                                               |
609	     |                                                               |
610	     ~                                                               ~
611	     ~                            ULPDUs                             ~
612	     |                                                               |
613	     |                                                               |
614	     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
615	     |            ULPDUs             |
616	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

618	     Length: 16 bits (unsigned integer)
619	          This is the length in octets of the set of framed ULPDUs.  It
620	          does not include the length of the FPDU header itself.

622	     Key: 48 bits (unsigned integer)

624	          This is used by the receiver to validate the ULPDU containment
625	          property.  It is selected at random by the sender, and
626	          initially signaled to the receiver in a ULP-specified way,
627	          before the receiver attempts to test the ULPDU containment
628	          property.  All FPDUs sent on the same connection in the same
629	          direction must use the same key value.  A good quality random
630	          number generator MUST be used to generate the initial key.
631	          RFC 1750 discusses relevant characteristics and provides
632	          references for good quality random number generation
633	          [RFC1750].

635	     The length of an FPDU is 8 + L octets, where L is the length of the
636	     set of framed ULPDUs.  The 16-bit length field is sufficient to
637	     permit a TCP segment with an FPDU to completely fill a maximum-size
638	     IPv4 or IPv6 datagram.
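
	     As an illustration only, the header layout above could be built
	     and parsed as in the following Python sketch.  Network byte order
	     and the helper names (new_key, build_fpdu, parse_fpdu_header) are
	     assumptions of the sketch and are not defined by this document.
	     The standard `secrets' module stands in for the good quality
	     random number generator.

```python
import secrets
import struct
from typing import Tuple

FPDU_HEADER_LEN = 8  # 2-octet Length field + 6-octet Key field


def new_key() -> int:
    """Pick a 48-bit key from a cryptographic-quality generator."""
    return secrets.randbits(48)


def build_fpdu(key: int, ulpdus: bytes) -> bytes:
    """Frame a run of complete ULPDUs as one FPDU (header + payload)."""
    if not 0 <= key < (1 << 48):
        raise ValueError("key must fit in 48 bits")
    if len(ulpdus) > 0xFFFF:
        raise ValueError("framed ULPDUs exceed the 16-bit Length field")
    # Assumption: both fields are sent in network (big-endian) byte order.
    return struct.pack("!H", len(ulpdus)) + key.to_bytes(6, "big") + ulpdus


def parse_fpdu_header(data: bytes) -> Tuple[int, int]:
    """Return (length, key) read from the first 8 octets of data."""
    if len(data) < FPDU_HEADER_LEN:
        raise ValueError("need at least 8 octets for an FPDU header")
    (length,) = struct.unpack("!H", data[:2])
    key = int.from_bytes(data[2:8], "big")
    return length, key
```

	     Length counts only the framed ULPDUs, so an FPDU built by this
	     sketch is always 8 octets longer than its Length field.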

640	4.1.2.  FPDU Size Selection

642	     Each FPDU SHOULD contain as many contiguous, complete ULPDUs as
643	     will fit within the current EMSS, unless ULPDU packing is disabled.
644	     If ULPDU packing is disabled each FPDU SHALL contain a single
645	     ULPDU.  ULPDU packing mode may be negotiated, or specified a priori
646	     by a ULP.  Disabling ULPDU packing is analogous to disabling the
647	     Nagle algorithm in TCP.

649	     TUF SHALL present the size of the largest ULPDU fitting in an
650	     EMSS-sized FPDU (MULPDU) to the ULP.  MULPDU is EMSS - the FPDU
651	     header size (8 octets).  ULPs SHOULD submit as large ULPDUs as
652	     possible to TUF, up to MULPDU, subject to limits imposed by
653	     specific ULP properties.  The ULP MAY also choose to pack several
654	     ULPDUs into an EMSS-sized unit before submitting them as one ULPDU
655	     to TUF.  Depending upon the ULP, ULP packing may improve data
656	     transfer efficiency, and is unlikely to have any detrimental
657	     effect.
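
	     The size rules above can be sketched in Python as follows.  This
	     is a simplified illustration, not a definitive implementation: it
	     assumes all ULPDUs are available up front as a list, and the
	     names mulpdu and pack_ulpdus are hypothetical.

```python
from typing import Iterable, List

FPDU_HEADER_LEN = 8  # octets


def mulpdu(emss: int) -> int:
    """Largest ULPDU that fits in an EMSS-sized FPDU."""
    return emss - FPDU_HEADER_LEN


def pack_ulpdus(ulpdus: Iterable[bytes], emss: int,
                packing: bool = True) -> List[List[bytes]]:
    """Group contiguous, complete ULPDUs into per-FPDU payload groups.

    With packing enabled, each group holds as many ULPDUs as fit in
    MULPDU octets.  With packing disabled, each group holds one ULPDU.
    A ULPDU larger than MULPDU always ends up alone in its group.
    """
    limit = mulpdu(emss)
    groups: List[List[bytes]] = []
    current: List[bytes] = []
    size = 0
    for pdu in ulpdus:
        # Close the current group if adding this ULPDU would overflow
        # it, or if packing is disabled.
        if current and (not packing or size + len(pdu) > limit):
            groups.append(current)
            current, size = [], 0
        current.append(pdu)
        size += len(pdu)
    if current:
        groups.append(current)
    return groups
```

	     ULPDUs are never split or reordered by this sketch; only the
	     grouping into FPDUs changes with the packing mode.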

659	     A TUF implementation probing for PMTU increase SHOULD present an
660	     increased MULPDU value to the ULP until an FPDU large enough to
661	     perform the probe results.

663	     Under exceptional circumstances, the EMSS can become too small to
664	     accommodate even a single ULPDU.  For example, a ULP may define
665	     fixed-sized PDUs that are incompressible, or variable size PDUs
666	     with some absolute minimum size, such as the size of a data PDU
667	     containing a minimum amount of data.  It is possible for the EMSS
668	     to shrink to as small as 8 octets [PathMTU].  If the EMSS is too
669	     small to accommodate an incompressible ULPDU, the FPDU MUST contain
670	     only that ULPDU.  ULPs using TUF SHOULD NOT define ULPDUs with a
671	     minimum size greater than 128 octets.

673	4.2.  TUF-conforming TCP Sender Segmentation

675	     TCP senders are allowed substantial freedom in the choice of how to
676	     segment an outgoing TCP stream.  Within the confines of the
677	     receiver-advertised receive window, and the sender computed
678	     congestion window, any segmentation is permitted.  Virtually all
679	     TCP implementations do attempt to segment outgoing TCP streams into
680	     EMSS-sized segments where possible because it improves performance.

682	     TUF-conforming TCP sender behavior ensures that the ULPDU
683	     containment property holds most of the time.  To do this, a TUF-
684	     conforming TCP sender MUST respect a single additional rule in
685	     performing segmentation:

687	          A TUF-conforming TCP sender MUST segment the outgoing TCP
688	          stream such that the first octet of every FPDU is sent at the
689	          beginning of a TCP segment.
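
	     A minimal Python sketch of this rule follows.  It is an
	     illustration under simplifying assumptions: the receive window
	     and congestion window are ignored, and an FPDU larger than EMSS
	     is simply cut into EMSS-sized pieces without the false-positive
	     avoidance discussed for the exception cases.  The name
	     segment_fpdus is hypothetical.

```python
from typing import List, Sequence


def segment_fpdus(fpdus: Sequence[bytes], emss: int) -> List[bytes]:
    """Return TCP segment payloads in which every FPDU starts a segment.

    Segments are never filled with the start of the next FPDU, so a
    segment may be shorter than EMSS.
    """
    if emss <= 0:
        raise ValueError("emss must be positive")
    segments: List[bytes] = []
    for fpdu in fpdus:
        # Each FPDU begins a fresh segment; only an oversized FPDU
        # spills into further segments.
        for off in range(0, len(fpdu), emss):
            segments.append(fpdu[off:off + emss])
    return segments
```

	     Two FPDUs of 30 and 40 octets therefore leave as two segments
	     even when EMSS would allow them to share one.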

691	4.3.  Negotiating TUF

693	     Negotiating the use of TUF is the responsibility of the ULP.  The
694	     use of TUF MAY be negotiated separately for each direction on a
695	     connection.  The negotiation procedure MUST ensure that when TUF is
696	     enabled or disabled, the remote peer will not transmit its first
697	     TCP segment in the new mode until it is certain that the local peer
698	     has actually enabled or disabled TUF.

700	     TUF operation is characteristically requested by the receiver and
701	     offered by the sender.  Before enabling TUF, the relevant
702	     parameters:

704	     1.   the sender's 48-bit key

706	     2.   ULPDU packing mode

708	     MUST be established at each peer.

710	     A natural way to enable the use of TUF is a ULP-defined negotiation
711	     exchange of the TUF parameters culminating in enabling TUF, if
712	     requested, for each transfer direction.  A three-way handshake
713	     protocol can be used to ensure that the point at which TUF is
714	     enabled is unambiguous and each end has time to perform local state
715	     changes.  A connection on which TUF is enabled is likely to be the
716	     same connection on which the negotiation occurs, but this is not
717	     required.  A new connection could also use TUF from its initial
718	     establishment, if the TUF parameters and modes are known through
719	     some out-of-band mechanism.

721	     Use of TUF could be disabled during a connection using a similar
722	     ULP-defined three-way handshake.

724	     Other alternatives to parameter exchange include stipulating some
725	     parameters a priori.  For example, a ULP could specify that TUF
726	     with ULPDU packing enabled is always used in both directions.  In
727	     this case, only the 48-bit keys need to be exchanged before TUF is
728	     enabled.  Or, a ULP could determine TUF characteristics on the
729	     basis of the TCP port number.

731	4.4.  TUF Receiver ULPDU Containment Property Testing

733	     A TUF receiver that wishes to use ULP control information to
734	     perform direct data placement must first verify the ULPDU
735	     containment property.  To do this, the receiver MUST establish that
736	     the TCP segment contains exactly one FPDU.  Abstractly, this can be
737	     done by assuming the TCP segment payload begins with an FPDU, and
738	     verifying the following properties of that putative FPDU:

740	     o    The received TCP segment payload length equals the FPDU length
741	          plus the length of the FPDU header (8 octets).

743	     o    The 48-bit key equals the value signaled to the receiver when
744	          TUF was enabled for the connection.

746	     If these conditions are true, the TUF receiver MAY assume that the
747	     ULPDU containment property holds, and use ULP control information
748	     to directly place data in the contained ULPDUs.
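
	     The two checks above can be sketched in Python as follows.  The
	     function name and the big-endian field encoding are assumptions
	     made for illustration; the sketch is not a definitive
	     implementation.

```python
FPDU_HEADER_LEN = 8  # 2-octet Length field + 6-octet Key field


def holds_containment(payload: bytes, expected_key: int) -> bool:
    """Test the ULPDU containment property for one TCP segment payload.

    The payload is treated as a putative FPDU.  The property is taken
    to hold only if the payload length equals Length + 8 and the Key
    equals the value signaled when TUF was enabled.
    """
    if len(payload) < FPDU_HEADER_LEN:
        return False  # too short to carry an FPDU header at all
    # Assumption: fields are in network (big-endian) byte order.
    length = int.from_bytes(payload[0:2], "big")
    key = int.from_bytes(payload[2:8], "big")
    return len(payload) == length + FPDU_HEADER_LEN and key == expected_key
```

	     A segment that fails either check is simply not eligible for
	     direct placement by this test; the sketch makes no attempt to
	     scan for an FPDU elsewhere in the payload.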

750	     TUF DOES NOT provide any information that a TUF receiver can use to
751	     locate ULP control information beyond the ULPDU containment
752	     property.  In particular, a TUF receiver MUST NOT scan TCP segments
753	     in an attempt to locate FPDUs that do not begin at the beginning of
754	     a TCP segment.  However, even if the ULPDU containment property
755	     does not hold, a TUF receiver may still be able to reliably locate
756	     and use ULP control information.  For example, if a received TCP
757	     segment contains the next unreceived data in the TCP stream, the
758	     location of ULPDUs in that segment is unambiguous.  The behavior
759	     of a TUF receiver acting on ULP control information located with
760	     properties other than the ULPDU containment property is not
761	     specified here.

763	5.  Protocol Characteristics

765	     This section discusses some characteristics and behavior which are
766	     implications of the TUF protocol.

768	5.1.  Properties Of TUF-conforming TCP Senders

770	     The general practice of TCP senders to send as much data as
771	     possible within a TCP segment (up to EMSS) implies that an FPDU
772	     whose size is less than or equal to EMSS, and whose first octet
773	     begins a TCP segment will be sent entirely within a single TCP
774	     segment.  This ensures the ULPDU containment property for that TCP
775	     segment.

777	     A TUF-conforming TCP sender still obeys all requirements of TCP.
778	     While the segmentation of a TUF-conforming TCP sender will have
779	     distinctive characteristics when viewed from the network wire, the
780	     same segmentation behavior could also result from a stock TCP
781	     sender.

783	     The one property of a TUF-conforming TCP sender which arguably
784	     departs from traditional expectations is that a TUF-conforming TCP
785	     sender may not produce TCP segments which are as close in size to
786	     EMSS as a stock TCP sender.  The need to ensure the ULPDU
787	     containment property may result in TCP segments which are not as
788	     full as if the property did not need to hold.  While this is
789	     abstractly true, in practice, several characteristics combine to
790	     minimize this effect.  Specifically:

792	     o    Packing ULPDUs into FPDUs gives behavior similar to that of
793	          stock TCP segmentation, albeit with coarser granularity.

795	     o    ULPs which benefit from data-dependent direct data placement
796	          (candidates for TUF) usually transfer large amounts of data in
797	          bulk.  This means that most ULPDUs are data-carrying, and will
798	          be EMSS-sized.  Even when control is interleaved with data,
799	          the combination of a small number of control ULPDUs with a
800	          data ULPDU can be packed to fill an EMSS-sized segment.

802	     Therefore, a TUF-conforming TCP sender seems likely to behave
803	     similarly to a stock TCP sender under most circumstances.  However,
804	     applications that both send and receive data over the same TCP
805	     connection, where there might be dependencies between incoming and
806	     outgoing data, are often subject to excessive delays attributable
807	     to TCP's Nagle algorithm and/or delayed-ACK algorithm [NagleDAck].
808	     These algorithms generally perform best when TCP always sends full-
809	     EMSS segments.  Because TUF can generate sub-EMSS segments as a by-
810	     product of aligning FPDU boundaries with TCP segment boundaries,
811	     TUF might be especially vulnerable to the known problems with the
812	     Nagle and/or delayed-ACK algorithms.

814	     Further work, including implementation experience with TUF, as well
815	     as existing and future proposals for improvements to the Nagle
816	     and/or delayed-ACK algorithms, might be necessary to optimize TUF
817	     performance while fully preserving the congestion-avoidance
818	     features of TCP.  This work is currently outside the scope of this
819	     document.

821	5.2.  Exception Cases

823	     The complete operational specification of TUF is contained in the
824	     rules for forming FPDUs, and sending those FPDUs in TCP segments.
825	     However, the operation of TUF will be subject to a variety of
826	     transient or exceptional conditions.  The behavior of TUF under
827	     those conditions is discussed below to illustrate specifically how
828	     TUF addresses them.

830	5.2.1.  Resegmenting Intermediaries

832	     Resegmenting TCP-layer intermediaries (middleboxes) are one of the
833	     most formidable obstacles to maintaining the ULPDU containment
834	     property.  In the presence of such an intermediary, the
835	     segmentation chosen by the sender may not be the segmentation at
836	     the receiver.  While such intermediaries may or may not be common
837	     in particular networks, in many cases the presence or absence of
838	     such resegmenting behavior is beyond the control or even knowledge
839	     of the end points using TUF.  Therefore, TUF must detect such
840	     resegmentation by design.

842	     A primary reason for the presence of a random key in the FPDU
843	     header is to detect such resegmentation.  An alternative to the
844	     random key that has been proposed is to use ULP-specific
845	     validation criteria to determine the ULPDU containment property.
846	     For example, some ULP PDUs include relatively strong data integrity
847	     checks such as CRCs, and other ULP control information can often be
848	     validated against various ULP-specific criteria.

850	     While such ULP-specific validation criteria may involve checking
851	     many more bits than the combination of the FPDU's 16-bit length and
852	     48-bit key, ULP-specific validation criteria may not actually offer
853	     a strong guarantee of the ULPDU containment property.  For certain
854	     data streams, the probability of a false-positive indication of the
855	     ULPDU containment property can be extremely high.

857	     Assume that the intermediary resegments to a granularity of no
858	     finer than G octets (e.g. 4).  Also assume that the TCP data stream
859	     contains predominantly application data.  If the ULP is a storage
860	     protocol, simply transferring a file containing a continuous,
861	     repeated stream of well-formed ULPDUs which are some multiple of G
862	     in size increases the probability of a false-positive indication of
863	     the ULPDU containment property to approximately:

865	             1 / (sizeof(repeated ULPDU)/G)

867	     If the well-formed ULPDUs are relatively small (e.g. 32 octets
868	     where G=4 octets), the probability of a false-positive indication
869	     of the ULPDU containment property is approximately 1/8, for EACH
870	     TCP segment which does not actually begin with a ULPDU.  Clearly,
871	     in this case, it would take only a very small number of TCP
872	     segments which do not begin with an actual ULPDU before the `fake'
873	     ULPDU in the application data is interpreted as an actual ULPDU.
874	     The consequences of such a false-positive interpretation could be
875	     dire, for example executing a destructive operation request.

877	     The 48-bit random key in the FPDU results in a low probability of a
878	     false-positive indication of the ULPDU containment property because
879	     it is effectively secret with respect to the application data
880	     stream.
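
	     The arithmetic can be checked with the short Python sketch below.
	     It assumes, for illustration, that misaligned segments are
	     independent trials and that the key is chosen uniformly at
	     random, independently of the application data.  The function
	     names are hypothetical.

```python
from fractions import Fraction


def fp_per_segment(ulpdu_size: int, g: int) -> Fraction:
    """1 / (sizeof(repeated ULPDU) / G) for ULP-specific validation."""
    return Fraction(g, ulpdu_size)


def fp_within(n: int, p: Fraction) -> Fraction:
    """Chance of at least one false positive in n misaligned segments,
    assuming each segment is an independent trial."""
    return 1 - (1 - p) ** n


p_ulp = fp_per_segment(32, 4)   # 32-octet ULPDUs, G = 4 octets
p_key = Fraction(1, 2 ** 48)    # chance fixed data matches a random key

print(p_ulp)                       # 1/8
print(float(fp_within(6, p_ulp)))  # exceeds 0.5 after six segments
print(float(p_key))                # roughly 3.6e-15 per segment
```

	     Under these assumptions a false positive becomes more likely
	     than not after six misaligned segments with ULP-specific
	     validation, while the per-segment chance of matching the 48-bit
	     key is at most 2^-48.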

882	     Note that although this analysis may appear to be security-minded,
883	     prompting the image of a sighted third-party adversary that can
884	     `sniff' the 48-bit key, it is actually considering a safety, rather
885	     than a security property.  The security properties of TUF are
886	     discussed in Section 6 (`Security Considerations') below.

888	     Even though TUF can detect the presence of a resegmenting
889	     intermediary, such an intermediary will almost certainly
890	     substantially reduce the chance of the ULPDU containment property
891	     being satisfied.  A TUF implementation which detects a very low
892	     incidence of the ULPDU containment property for a sustained
893	     interval (>> RTT) may assume that a resegmenting intermediary is in
894	     operation and SHOULD discontinue the use of ULP control information
895	     found using the ULPDU containment property.  In such cases, the ULP
896	     MAY elect to disable the use of TUF altogether, or simply just stop
897	     exploiting the ULPDU containment property.

899	5.2.2.  PMTU Reduction

901	     When a PMTU reduction is detected by a TUF-compliant TCP, the TUF-
902	     compliant TCP sender may send FPDUs already committed to the TCP
903	     layer in one of two ways:

905	     o    send unsegmented FPDUs in TCP segments of the old EMSS size,
906	          and rely on IP fragmentation to deliver the segments,

908	     o    segment FPDUs to fit in TCP segments which respect the new
909	          EMSS size.

911	     Stock TCPs face a similar choice on PMTU change, and both
912	     alternatives are used in practice.

914	     In the case that a TUF-compliant TCP chooses to segment FPDUs, it
915	     SHOULD segment them in such a way that, in the absence of
916	     resegmentation by an intermediary, the segments are guaranteed not
917	     to give a false-positive indication of the ULPDU containment
918	     property.  There are various ways to ensure this.  For example, no
919	     matter how the FPDU is segmented, the first segment is guaranteed
920	     not to give a false-positive indication of the ULPDU containment
921	     property---the 48-bit key will match, but the length will not.  In
922	     the worst possible case, each subsequent TCP segment could be sent
923	     with fewer than 8 octets of data, also guaranteed not to give a
924	     false-positive indication of the ULPDU containment property.  More
925	     efficient approaches are possible, but PMTU reduction is a rare
926	     event, and reacting to it is only a transient condition.
927	     Eventually a new MULPDU will be presented to the ULP, and FPDUs
928	     that fit in the new EMSS will result.  During the transient
929	     condition, performance will suffer temporarily no matter how FPDUs
930	     are segmented.
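
	     The worst-case approach described above might look like the
	     following Python sketch.  It is deliberately inefficient and is
	     only one of the possible ways to meet the requirement.  The
	     names split_fpdu and passes_test, and the big-endian header
	     encoding, are assumptions of the sketch.

```python
from typing import List

FPDU_HEADER_LEN = 8


def passes_test(payload: bytes, key: int) -> bool:
    """Receiver-side containment test (length and key both match)."""
    if len(payload) < FPDU_HEADER_LEN:
        return False
    length = int.from_bytes(payload[0:2], "big")
    seen_key = int.from_bytes(payload[2:8], "big")
    return len(payload) == length + FPDU_HEADER_LEN and seen_key == key


def split_fpdu(fpdu: bytes, new_emss: int) -> List[bytes]:
    """Split an FPDU that no longer fits so that no piece can pass.

    The first piece is a proper prefix, so its Length field cannot
    match the piece's size.  Every later piece carries at most 7
    octets, which is too short to hold an FPDU header.
    """
    if new_emss <= 0:
        raise ValueError("new_emss must be positive")
    if len(fpdu) <= new_emss:
        return [fpdu]  # still fits; no segmentation needed
    pieces = [fpdu[:new_emss]]
    rest = fpdu[new_emss:]
    for off in range(0, len(rest), FPDU_HEADER_LEN - 1):
        pieces.append(rest[off:off + FPDU_HEADER_LEN - 1])
    return pieces
```

	     Concatenating the pieces in order reproduces the original FPDU,
	     so the TCP octet stream itself is unchanged.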

932	     No matter what segmentation is chosen by a TUF-compliant TCP sender
933	     when segmenting an FPDU, if the segments pass through a
934	     resegmenting intermediary, the correctness of the ULPDU containment
935	     property remains strictly a matter of probability.

937	5.2.3.  PMTU Increase

939	     As described in `FPDU Size Selection' above, a TUF-compliant TCP
940	     probing for PMTU increase will present an increased MULPDU value to
941	     the ULP.  This should eventually lead to an FPDU large enough to
942	     actually perform the PMTU increase probe.  The MULPDU value should
943	     not be further adjusted until the probe is actually performed.
944	     This behavior is similar to when a stock TCP would like to perform
945	     a PMTU increase, but less data is available than would fill the
946	     desired segment.

948	     Also, note that depending on the ULP, the actual distribution of
949	     FPDU sizes may have a granularity coarser than a single octet.  An
950	     FPDU with a particular, desired TCP segment size may never be
951	     generated.  Therefore when probing for PMTU increase, a TUF-
952	     compliant TCP must be satisfied with an FPDU that produces a TCP
953	     segment size that is `close' to the desired size.

955	     Finally, note that in cases where PMTU grows and shrinks relatively
956	     frequently, better performance may result from not probing for PMTU
957	     increase at all, or probing very rarely.  This is because the
958	     performance disruption resulting from PMTU decrease can be
959	     substantial and, in many cases, implementations of TUF will be in
960	     hardware, so performance may be less sensitive to PMTU differences.

962	5.2.4.  Receive Window < EMSS

964	     A TUF-compliant TCP sender that is presented with a receive window
965	     smaller than EMSS may be required to segment FPDUs.  The TCP window
966	     probe is a limiting case of this condition where the advertised
967	     receive window is 0, and the amount of data typically sent in
968	     response is a single octet.

970	     In this case, a TUF-compliant TCP sender will segment according to
971	     the requirements of TCP and the rule defined in `TUF-conforming
972	     TCP Sender Segmentation' above.  In addition, as when resegmenting
973	     in response to PMTU decrease, a TUF-compliant TCP sender SHOULD
974	     segment in such a way that, in the absence of a resegmenting
975	     intermediary, segments are guaranteed not to give a false-positive
976	     indication of the ULPDU containment property.  In situations where
977	     the receive window is smaller than EMSS, data transfer performance
978	     is likely to be limited independently of any segmentation behavior
979	     by the TCP sender.  Furthermore, ULP implementations that choose to
980	     use TUF will almost certainly be designed to maintain a receiver
981	     window larger than EMSS, so a small receiver window should occur
982	     extremely infrequently.

984	5.2.5.  Size of ULPDU + 8 > EMSS

986	     In cases where EMSS shrinks below the minimum size of a ULPDU that
987	     a ULP wants to send, TUF will create FPDUs that are larger than
988	     EMSS, and a TUF-compliant TCP sender will face the same
989	     alternatives as during PMTU reduction:

991	     o    send unsegmented FPDUs and rely on IP fragmentation to deliver
992	          the segments

994	     o    segment FPDUs to fit in TCP segments which respect the EMSS
995	          size

997	     A ULP which is presented with an MULPDU value that is too small to
998	     accommodate the PDUs necessary for operation SHOULD simply attempt
999	     to use ULPDUs which are as small as possible.

1001	     If the EMSS shrinks to a pathologically small size, then a TUF
1002	     implementation SHOULD discontinue the use of ULP control
1003	     information found using the ULPDU containment property.  In such
1004	     cases, the ULP MAY elect to disable the use of TUF altogether, or
1005	     simply just stop exploiting the ULPDU containment property.

1007	     A path MTU which results in an EMSS < 128 + 8 octets is an
1008	     extremely unlikely occurrence and when it does occur, poor data
1009	     transfer performance is a likely result, independent of TCP sender
1010	     segmentation behavior.

1012	6.  Security Considerations

1014	     This section discusses both protocol-specific considerations and
1015	     the implications of using TUF with existing security mechanisms.

1017	6.1.  Protocol-specific Security Considerations

1019	     A third-party that can inject spoofed packets into the network
1020	     which can be delivered to a TUF receiver could launch a variety of
1021	     attacks that exploit TUF-specific behavior.  For example a blind
1022	     third-party adversary could inject random packets which appear in
1023	     the valid TCP window and do not begin with valid FPDU headers.  A
1024	     barrage of such packets might cause a TUF receiver to conclude that
1025	     a resegmenting intermediary is present and disable the use of TUF
1026	     and direct data placement.  This would substantially degrade
1027	     performance.  However, it would probably not have more dire
1028	     consequences than performance, such as causing the ULP to interpret
1029	     the bogus data as valid.  Furthermore, such a third-party could
1030	     also degrade performance just as effectively in a TUF-independent
1031	     way by injecting spoofed ICMP packets which result in reduction of
1032	     the path MTU to an inefficiently small size.

1034	     Fundamentally, the vulnerabilities of TUF to active third-party
1035	     interference are no more acute than those of TCP without TUF.  In
1036	     both cases, a communication security mechanism such as IPSec is the
1037	     only way to completely prevent such attacks.

1039	6.2.  Using IPSec With TUF

1041	     Since IPSec is designed to secure arbitrary IP packet streams,
1042	     including streams where packets are lost, TUF can run cleanly on
1043	     top of IPSec without any change.  IPSec packets may be decrypted in
1044	     the order they are received, and a TUF receiver may test and
1045	     exploit the ULPDU containment property just as if the IP datagram
1046	     were unsecured.

1048	6.3.  Using TLS With TUF

1050	     Using TLS [TLS] with TUF, particularly trying to exploit the ULPDU
1051	     containment property to locate ULP control information, is not a
1052	     straightforward process.  TUF can be directly layered on top of
1053	     TLS, but many of the advantages of TUF are lost.  This document
1054	     does not define a way of using TLS with TUF that could offer better
1055	     performance than stock reassembly buffer-based implementations.
1056	     That task is left to a different document, if there is sufficient
1057	     motivation to address the problems.  This section does outline
1058	     some of the known complications of trying to do better than stock
1059	     reassembly buffer-based implementations using TLS with TUF.

1061	     TLS is a record-oriented protocol.  TLS records are PDUs with a
1062	     similar structure to ULPDUs defined in application ULPs.  As with
1063	     other ULPs, the only way to avoid a complete reassembly buffer is
1064	     to be able to find TLS PDUs in the presence of lost TCP segments.
1065	     The ULPDU containment property could be used to do this, which
1066	     suggests that TLS itself should be layered on top of TUF.  In this
1067	     case, the FPDU header will travel in the clear, but this will
1068	     probably not present serious vulnerabilities other than denial of
1069	     service attacks comparable to what is already possible without TUF.

1071	     Once the TLS records are located and processed, it still remains to
1072	     locate the ULPDUs.  The simplest way to do this would be to have
1073	     the TLS implementation be TUF-compliant, and ensure the ULPDU
1074	     containment property within each TLS record.  In this case, the
1075	     protocol layering would look like:

1077	                         ULP client
1078	                              ^
1079	                              |
1080	                              | ULPDUs (in octet stream)
1081	                              |
1082	                              v
1083	                      TUF-conforming TLS
1084	                              ^
1085	                              |
1086	                              | TLS records (containing ULPDUs)
1087	                              |
1088	                              v
1089	                             TUF
1090	                              ^
1091	                              |
1092	                              | FPDUs (each containing a TLS record)
1093	                              |
1094	                              v
1095	                     TUF-conforming TCP
1096	                              ^
1097	                              |
1098	                              | TCP Segments (each containing an FPDU)
1099	                              |
1100	                              v
1101	                            . . .

1103	     An obvious complication of using TLS with TUF is that ciphers
1104	     defined for use with TLS do not offer independence across TLS
1105	     records.  The most common cipher used with TLS is RC4, which is a
1106	     stream cipher.  Efficient decryption of an RC4 stream depends upon
1107	     the entire preceding data stream.  In other words, it is simply not
1108	     feasible to decrypt TLS records encrypted with RC4 in any order
1109	     other than the TCP stream order.  This clearly defeats the purpose
1110	     of TUF.
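
     The dependency can be illustrated with a minimal sketch of RC4 in
     C.  It is provided only as an aid to understanding and is not part
     of TUF.  The names struct rc4, rc4_init(), rc4_next(), rc4_crypt()
     and rc4_skip() are illustrative assumptions, not an interface
     defined by [TLS] or by this document.  The sketch assumes the
     receiver knows the length of the missing data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RC4 cipher state: a 256-entry permutation plus two indices. */
struct rc4 {
    uint8_t s[256];
    uint8_t i, j;
};

/* Standard RC4 key scheduling. */
static void rc4_init(struct rc4 *c, const uint8_t *key, size_t keylen)
{
    assert(keylen > 0 && keylen <= 256);
    for (int k = 0; k < 256; k++)
        c->s[k] = (uint8_t)k;
    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + c->s[k] + key[k % keylen]);
        uint8_t t = c->s[k];
        c->s[k] = c->s[j];
        c->s[j] = t;
    }
    c->i = 0;
    c->j = 0;
}

/* Produce the next keystream octet, advancing the state. */
static uint8_t rc4_next(struct rc4 *c)
{
    c->i = (uint8_t)(c->i + 1);
    c->j = (uint8_t)(c->j + c->s[c->i]);
    uint8_t t = c->s[c->i];
    c->s[c->i] = c->s[c->j];
    c->s[c->j] = t;
    return c->s[(uint8_t)(c->s[c->i] + c->s[c->j])];
}

/* Encrypt or decrypt in place (the operation is its own inverse). */
static void rc4_crypt(struct rc4 *c, uint8_t *buf, size_t len)
{
    for (size_t n = 0; n < len; n++)
        buf[n] ^= rc4_next(c);
}

/* Advance the state past 'len' octets of a record not yet received.
 * The cost is one keystream step per skipped octet. */
static void rc4_skip(struct rc4 *c, size_t len)
{
    while (len-- > 0)
        (void)rc4_next(c);
}
```

     A receiver holding a later TLS record but not an earlier one would
     have to call rc4_skip() over every missing octet before calling
     rc4_crypt(), so the work grows with the amount of missing data.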

1112	     TLS is also defined to work with block ciphers such as 3DES in
1113	     Cipher Block Chaining (CBC) mode.  In this case, the dependency of
1114	     the decryption operation on data in previous TLS records is less
1115	     severe.  Decrypting the current TLS record requires only ciphertext
1116	     from the previous TLS record.  While this does not allow complete
1117	     independence of processing TLS records, a lost or delayed TCP
1118	     segment containing a TLS record only prevents decrypting the
1119	     immediately subsequent TLS record, not all TLS records after it.
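
     The reduced dependency can be sketched in C as follows.  This is
     an illustration under stated assumptions, not a TLS
     implementation: toy_encrypt() and toy_decrypt() are a deliberately
     insecure placeholder standing in for 3DES, and all function names
     are hypothetical.  The sketch assumes that the chaining value for
     a record is the last ciphertext block of the previous record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK 8 /* block size in octets, the same as 3DES */

static uint8_t rotl8(uint8_t v, unsigned n)
{
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

static uint8_t rotr8(uint8_t v, unsigned n)
{
    return (uint8_t)((v >> n) | (v << (8 - n)));
}

/* Toy block "cipher": NOT secure, a placeholder for 3DES only. */
static void toy_encrypt(const uint8_t key[BLK], uint8_t blk[BLK])
{
    for (int i = 0; i < BLK; i++)
        blk[i] = rotl8((uint8_t)(blk[i] ^ key[i]), 3);
}

static void toy_decrypt(const uint8_t key[BLK], uint8_t blk[BLK])
{
    for (int i = 0; i < BLK; i++)
        blk[i] = (uint8_t)(rotr8(blk[i], 3) ^ key[i]);
}

/* CBC-encrypt one record in place.  'chain' holds the chaining value
 * on entry and the record's last ciphertext block on return. */
static void cbc_encrypt_record(const uint8_t key[BLK], uint8_t chain[BLK],
                               uint8_t *rec, size_t len)
{
    assert(len % BLK == 0);
    for (size_t off = 0; off < len; off += BLK) {
        for (int i = 0; i < BLK; i++)
            rec[off + i] ^= chain[i];
        toy_encrypt(key, rec + off);
        for (int i = 0; i < BLK; i++)
            chain[i] = rec[off + i];
    }
}

/* CBC-decrypt one record in place, given only the last ciphertext
 * block of the previous record ('prev_last'). */
static void cbc_decrypt_record(const uint8_t key[BLK],
                               const uint8_t prev_last[BLK],
                               uint8_t *rec, size_t len)
{
    uint8_t chain[BLK], next[BLK];
    assert(len % BLK == 0);
    for (int i = 0; i < BLK; i++)
        chain[i] = prev_last[i];
    for (size_t off = 0; off < len; off += BLK) {
        for (int i = 0; i < BLK; i++)
            next[i] = rec[off + i];   /* save ciphertext before overwrite */
        toy_decrypt(key, rec + off);
        for (int i = 0; i < BLK; i++) {
            rec[off + i] ^= chain[i];
            chain[i] = next[i];
        }
    }
}
```

     A record can therefore be decrypted as soon as the final block of
     the preceding record's ciphertext is available, regardless of
     whether any earlier records have been received or decrypted.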

1121	     TLS compression presents another complication to using TLS with
1122	     TUF.  TLS compression algorithms are allowed to increase the
1123	     content length by up to 1024 octets.  If the content length does
1124	     increase, the TLS record may not fit within an EMSS-sized TCP
1125	     segment, even if the uncompressed ULPDU does.  If the risk of
1126	     exceeding an EMSS-sized TCP segment is small, it may be acceptable
1127	     to occasionally send FPDUs containing TLS records that span several
1128	     TCP segments, or use IP fragmentation.  Some TLS compression
1129	     algorithms may never increase the content length, or only increase
1130	     it by some small, manageable amount.
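
     The sizing risk can be checked with simple arithmetic, sketched
     below in C.  The function name fits_worst_case() and the single
     'overhead' parameter are assumptions made for illustration.  The
     caller is expected to supply the total of TLS record, MAC, padding
     and FPDU header octets that apply to its own configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Maximum growth permitted to a TLS compression algorithm, in octets,
 * as stated in the text above. */
#define TLS_MAX_COMPRESSION_EXPANSION 1024

/* Returns 1 if a ULPDU of 'ulpdu_len' octets is guaranteed to fit in
 * one EMSS-sized TCP segment even under worst-case compression
 * expansion, and 0 otherwise.  'overhead' is the caller's estimate of
 * all header, MAC and padding octets added around the ULPDU. */
static int fits_worst_case(size_t ulpdu_len, size_t overhead, size_t emss)
{
    assert(overhead <= emss);
    size_t budget = emss - overhead;
    if (budget < TLS_MAX_COMPRESSION_EXPANSION)
        return 0; /* no ULPDU size is safe in the worst case */
    return ulpdu_len <= budget - TLS_MAX_COMPRESSION_EXPANSION;
}
```

     For example, with an assumed EMSS of 1460 octets and an assumed
     overhead of 40 octets, only ULPDUs of 396 octets or fewer are
     guaranteed to fit in the worst case.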

1132	7.  IANA Considerations

1134	     If framing is enabled a priori for a ULP by connecting to a well-
1135	     known port, this well-known port would be registered for the framed
1136	     ULP with IANA.

1138	8.  References

1140	     [BEEP]
1141	          Rose, M., "The Blocks Extensible Exchange Protocol Core", RFC
1142	          3080, March 2001.

1144	     [HTTP]
1145	          Fielding, R. and others, "Hypertext Transfer Protocol --
1146	          HTTP/1.1", RFC 2616, June 1999.

1150	     [NagleDAck]
1151	          Minshall, G., Mogul, J., Saito, Y., Verghese, B., "Application
1152	          performance pitfalls and TCP's Nagle algorithm", Workshop on
1153	          Internet Server Performance, May 1999.

1155	     [PathMTU]
1156	          Mogul, J., and Deering, S., "Path MTU Discovery", RFC 1191,
1157	          November 1990.

1159	     [RFC1750]
1160	          Eastlake, D., Crocker, S., Schiller, J., "Randomness
1161	          Recommendations for Security", RFC 1750, December 1994.

1163	     [RFC2581]
1164	          Allman, M., and others, "TCP Congestion Control," RFC 2581,
1165	          April 1999.

1167	     [SCTP]
1168	          Stewart, R.R. and others, "Stream Control Transmission
1169	          Protocol," RFC 2960, October 2000.

1171	     [Stevens]
1172	          Stevens, W. Richard, "Unix Network Programming Volume 1,"
1173	          Prentice Hall, 1998, ISBN 0-13-490012-X.

1175	     [TCP]
1176	          Postel, J., "Transmission Control Protocol - DARPA Internet
1177	          Program Protocol Specification", RFC 793, September 1981.

1179	     [TLS]
1180	          Dierks, T. and others, "The TLS Protocol, Version 1.0", RFC
1181	          2246, January 1999.

1183	Authors' Addresses

1185	     Stephen Bailey
1186	     Sandburst Corporation
1187	     600 Federal Street
1188	     Andover, MA  01810
1189	     USA

1191	     Phone: +1 978 689 1614
1192	     Email: steph@sandburst.com

1194	     Jeff Chase
1195	     Department of Computer Science
1196	     Duke University
1197	     Durham, NC 27708-0129
1198	     USA

1200	     Phone: +1 919 660 6559
1201	     Email: chase@cs.duke.edu

1203	     Jim Pinkerton
1204	     Microsoft, Inc.
1205	     1 Microsoft Way
1206	     Redmond, WA 98052
1207	     USA

1209	     EMail: jpink@microsoft.com

1210	     Allyn Romanow
1211	     Cisco Systems
1212	     170 W Tasman Drive
1213	     San Jose, CA 95134
1214	     USA

1216	     Phone: +1 408 525 8836
1217	     Email: allyn@cisco.com

1219	     Constantine Sapuntzakis
1220	     Cisco Systems
1221	     170 W Tasman Drive
1222	     San Jose, CA 95134
1223	     USA

1225	     Phone: +1 408 525 5497
1226	     EMail: csapuntz@cisco.com

1228	     Jim Wendt
1229	     Hewlett Packard Corporation
1230	     8000 Foothills Boulevard MS 5668
1231	     Roseville, CA 95747-5668
1232	     USA

1234	     Phone: +1 916 785 5198
1235	     EMail: jim_wendt@hp.com

1237	     Jim Williams
1238	     Emulex Corporation
1239	     580 Main Street
1240	     Bolton, MA 01740
1241	     USA

1243	     Phone: +1 978 779 7224
1244	     EMail: jim.williams@emulex.com

1246	Appendix A. Sample Sockets Support For TUF

1248	     The sockets support for TUF described below is only a sketch.  It
1249	     is provided as an aid to understanding TUF.  Implementing this
1250	     interface is not a requirement for a TUF implementation.

1252	     Other software interfaces are possible.  The described interface
1253	     draws from the sockets interface for UDP.  The described interface
1254	     might be natural for applications already designed to support both
1255	     TCP and UDP, or that do network input and output in complete PDU
1256	     units.  For applications that perform octet-at-a-time style input
1257	     and output, an alternative interface that draws from the tradition
1258	     of the TCP URG pointer interface (e.g., using the MSG_OOB flag to
1259	     send()) is equally possible.  An implementation may even offer
1260	     several different interfaces to TUF.

1262	     That said, the sockets support sketched below might well provide
1263	     the basis for a complete, standard interface to be described
1264	     outside this draft.

1266	A.1 Basic Principles

1268	     The sockets support for TUF takes the form of a set of socket
1269	     options that may be set or requested to enable the appropriate
1270	     behavior.

1272	     A socket may be in one of two TUF-related modes in the send
1273	     direction:

1275	     1.   TUF-compliant TCP sender mode.  No data (FPDU headers) is
1276	          added to the TCP octet stream, but each data buffer presented
1277	          in a sending operation is to be sent according to the rules of
1278	          TCP and TUF-compliant TCP senders.  This mode provides direct
1279	          access to a TUF-compliant TCP sender for purposes such as
1280	          implementing TUF.

1282	     2.   TUF sender mode.  An FPDU header is added to data presented by
1283	          an integral number of sending operations, and the FPDU is
1284	          passed to a TUF-compliant TCP sender for transmission.

1286	     A socket may be in one TUF-related mode in the receive direction:

1288	     1.   TUF receiver mode.  FPDUs are expected in each TCP segment.

1290	     If a socket receiving operation is used to retrieve received data
1291	     (as opposed to the data being directly placed), FPDU headers are
1292	     removed before the data is returned.

1294	A.2 Enabling TUF

1295	          /* Pick a sending mode */
1296	          if (sendMode == TUF_TCP)
1297	            mode = TUF_SEND_TCP;
1298	          else
1299	            mode = TUF_SEND;

1301	          mode |= TUF_RECEIVE;

1303	          setsockopt (s, SOL_TCP, TUF_MODE, &mode, sizeof(mode));

1305	A.3 Sending Data

1307	     The standard socket sending operations, including send(), sendto(),
1308	     sendmsg(), writev(), and others are used to send ULPDUs in TUF.
1309	     The EMSGSIZE error should be returned if the buffer passed to the
1310	     sending operation would result in an FPDU that does not fit in an
1311	     EMSS-sized TCP segment, unless oversized ULPDU errors are disabled,
1312	     as described below.

1314	     When the path EMSS increases, the sending operation MAY return
1315	     EMSGSIZE once to inform the client of the change.
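
     One way a ULP might react to EMSGSIZE is sketched below in C.  It
     is only an illustration.  The option number given to TUF_MULPDU is
     a hypothetical placeholder, tuf_send_ulpdu() and its result codes
     are invented names, and the sketch assumes that a sending
     operation either accepts a whole ULPDU or fails.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP /* fallback where SOL_TCP is not defined */
#endif
#ifndef TUF_MULPDU
#define TUF_MULPDU 0x7f01   /* HYPOTHETICAL option number */
#endif

/* Outcome of one attempt to send a ULPDU (names are illustrative). */
enum tuf_send_result {
    TUF_SENT = 0,    /* ULPDU accepted for transmission */
    TUF_RESIZE = 1,  /* EMSGSIZE seen; *mulpdu has been refreshed */
    TUF_FAILED = -1  /* any other error; errno describes it */
};

/* Submit one ULPDU.  On EMSGSIZE, query the current MULPDU so the
 * caller can adjust the size of the ULPDUs it produces. */
static enum tuf_send_result
tuf_send_ulpdu(int s, const void *ulpdu, size_t len, int *mulpdu)
{
    assert(ulpdu != NULL && mulpdu != NULL);
    if (send(s, ulpdu, len, 0) >= 0)
        return TUF_SENT;
    if (errno != EMSGSIZE)
        return TUF_FAILED;
    socklen_t optlen = sizeof(*mulpdu);
    if (getsockopt(s, SOL_TCP, TUF_MULPDU, mulpdu, &optlen) < 0)
        return TUF_FAILED;
    return TUF_RESIZE;
}
```

     On TUF_RESIZE the ULP can compare the ULPDU length with the
     refreshed MULPDU and either resubmit the same ULPDU or produce
     smaller ones.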

1317	A.4 Retrieving The Current EMSS or MULPDU

1319	          socklen_t len = sizeof(emss);
1320	          getsockopt (s, SOL_TCP, TUF_MULPDU, &emss, &len);

1321	     If the socket is in TUF_SEND_TCP mode, this call returns the TCP
1322	     EMSS.  If the socket is in TUF_SEND mode, the call returns the
1323	     maximum ULPDU that can be submitted in a sending operation without
1324	     requiring fragmentation of the associated FPDU.

1326	     The number should not count any octets that go towards TCP options.

1328	A.5 Disabling ULPDU Packing

1330	          flag = 0;
1331	          setsockopt (s, SOL_TCP, TUF_PACK_PDUS, &flag, sizeof(flag));

1333	     This call prevents TUF from packing more than one ULPDU into an
1334	     FPDU.  By default, ULPDU packing is enabled.

1336	A.6 Disabling The Report of Oversized ULPDUs

1338	          flag = 0;
1339	          setsockopt (s, SOL_TCP, TUF_REPORT_OVERSIZED, &flag,
1340	                      sizeof(flag));

1342	     This call prevents sending operations from returning EMSGSIZE in
1343	     response to oversized ULPDUs.  It may be called at any time on a
1344	     socket, whether connected or not.  It is used to continue ULP
1345	     operation when MULPDU is already known to be too small to permit
1346	     some ULPDUs to be sent without segmentation.  Oversized ULPDU
1347	     reporting can be enabled again if PMTU is discovered to have
1348	     increased.

1350	Full Copyright Statement

1352	     Copyright (C) The Internet Society (2001). All Rights Reserved.

1354	     This document and translations of it may be copied and furnished to
1355	     others, and derivative works that comment on or otherwise explain
1356	     it or assist in its implementation may be prepared, copied,
1357	     published and distributed, in whole or in part, without restriction
1358	     of any kind, provided that the above copyright notice and this
1359	     paragraph are included on all such copies and derivative works.
1360	     However, this document itself may not be modified in any way, such
1361	     as by removing the copyright notice or references to the Internet
1362	     Society or other Internet organizations, except as needed for the
1363	     purpose of developing Internet standards in which case the
1364	     procedures for copyrights defined in the Internet Standards process
1365	     must be followed, or as required to translate it into languages
1366	     other than English.

1368	     The limited permissions granted above are perpetual and will not be
1369	     revoked by the Internet Society or its successors or assigns.

1371	     This document and the information contained herein is provided on
1372	     an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
1373	     ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
1374	     IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
1375	     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1376	     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.