idnits 2.17.1

draft-ietf-tsvwg-tcp-ulp-frame-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 630: '...number generator MUST be used to gener...'
     RFC 2119 keyword, line 642: '... Each FPDU SHOULD contain as many ...'
     RFC 2119 keyword, line 644: '...sabled each FPDU SHALL contain a singl...'
     RFC 2119 keyword, line 649: '... TUF SHALL present the size of the...'
     RFC 2119 keyword, line 651: '...8 octets). ULPs SHOULD submit as larg...'
     (20 more instances...)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to
     grant the BCP78 rights to the IETF Trust, then this is fine, and you can
     ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC793' is mentioned on line 385, but not defined

  ** Obsolete undefined reference: RFC 793 (Obsoleted by RFC 9293)

  == Unused Reference: 'RFC2581' is defined on line 1163, but no explicit
     reference was found in the text

  == Unused Reference: 'Stevens' is defined on line 1171, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2616 (ref. 'HTTP') (Obsoleted by RFC
     7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'NagleDAck'

  ** Obsolete normative reference: RFC 1750 (Obsoleted by RFC 4086)

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Obsolete normative reference: RFC 2960 (ref. 'SCTP') (Obsoleted by RFC
     4960)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Stevens'

  ** Obsolete normative reference: RFC 793 (ref. 'TCP') (Obsoleted by RFC
     9293)

  ** Obsolete normative reference: RFC 2246 (ref. 'TLS') (Obsoleted by RFC
     4346)

     Summary: 10 errors (**), 0 flaws (~~), 5 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2     Transport Area Working Group                 S. Bailey (Sandburst)
3     Internet-draft                               J. Chase (Duke)
4     Expires: May 2002                            J. Pinkerton (Microsoft)
5                                                  A. Romanow (Cisco)
6                                                  C. Sapuntzakis (Cisco)
7                                                  J. Wendt (HP)
8                                                  J.
Williams (Emulex) 10 TCP ULP Framing Protocol (TUF) 11 draft-ietf-tsvwg-tcp-ulp-frame-01 13 Status of this Memo 15 This document is an Internet-Draft and is in full conformance with 16 all provisions of Section 10 of RFC2026. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six 24 months and may be updated, replaced, or obsoleted by other 25 documents at any time. It is inappropriate to use Internet-Drafts 26 as reference material or to cite them other than as "work in 27 progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt 32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 Copyright Notice 37 Copyright (C) The Internet Society (2001). All Rights Reserved. 39 Abstract 41 The TCP ULP Framing (TUF) protocol defines a shim layer protocol 42 between an Upper Layer Protocol (ULP) and TCP. TUF also depends on 43 a specified TCP segmentation convention between TUF endpoints. 44 Together, the shim and segmentation conventions enable a TUF/TCP 45 receiver to recognize ULP data units within a TCP segment 46 independently of other TCP segments. This capability simplifies 47 the design of enhanced network interfaces implementing direct data 48 placement for ULPs using TCP. Direct data placement is a key step 49 to making IP networking competitive with high-end interconnect 50 solutions in data centers and other high-performance application 51 domains. 53 Table Of Contents 55 1. Definitions . . . . . . . . . . . . . . . . . . . . . . 3 56 2. Overview . . . . . . . . . . . . . . . . . . . . . . . . 4 57 2.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . 4 58 2.2. Approach . . . . . . . . . . . . . . . . . . . . . . . . 5 59 3. 
Rationale For TUF . . . . . . . . . . . . . . . . . . . . 6 60 3.1. Direct Data Placement . . . . . . . . . . . . . . . . . 7 61 3.2. Direct Data Placement with TCP . . . . . . . . . . . . . 8 62 3.2.1. The Simple Case: ULP-unaware Placement . . . . . . . . . 9 63 3.2.2. The Complex Case: ULP-aware Placement . . . . . . . . . 9 64 3.2.3. The Problem of ULP-aware Placement with TCP . . . . . . 10 65 3.2.4. Finding ULPDUs In Out-of-order Segments . . . . . . . . 11 66 3.2.5. The TUF Solution . . . . . . . . . . . . . . . . . . . . 12 67 3.2.6. TUF's ULP Assumptions . . . . . . . . . . . . . . . . . 12 68 4. The Protocol . . . . . . . . . . . . . . . . . . . . . . 13 69 4.1. The Framing Protocol Data Unit (FPDU) . . . . . . . . . 13 70 4.1.1. FPDU Format . . . . . . . . . . . . . . . . . . . . . . 13 71 4.1.2. FPDU Size Selection . . . . . . . . . . . . . . . . . . 14 72 4.2. TUF-conforming TCP Sender Segmentation . . . . . . . . . 15 73 4.3. Negotiating TUF . . . . . . . . . . . . . . . . . . . . 15 74 4.4. TUF Receiver ULPDU Containment Property Testing . . . . 16 75 5. Protocol Characteristics . . . . . . . . . . . . . . . . 17 76 5.1. Properties Of TUF-conforming TCP Senders . . . . . . . . 17 77 5.2. Exception Cases . . . . . . . . . . . . . . . . . . . . 18 78 5.2.1. Resegmenting Intermediaries . . . . . . . . . . . . . . 18 79 5.2.2. PMTU Reduction . . . . . . . . . . . . . . . . . . . . . 19 80 5.2.3. PMTU Increase . . . . . . . . . . . . . . . . . . . . . 20 81 5.2.4. Receive Window < EMSS . . . . . . . . . . . . . . . . . 21 82 5.2.5. Size of ULPDU + 8 > EMSS . . . . . . . . . . . . . . . . 21 83 6. Security Considerations . . . . . . . . . . . . . . . . 22 84 6.1. Protocol-specific Security Considerations . . . . . . . 22 85 6.2. Using IPSec With TUF . . . . . . . . . . . . . . . . . . 22 86 6.3. Using TLS With TUF . . . . . . . . . . . . . . . . . . . 22 87 7. IANA Considerations . . . . . . . . . . . . . . . . . . 25 88 References . . . . . . . . . . . . . 
. . . . . . . . . . 25 89 Authors' Addresses . . . . . . . . . . . . . . . . . . . 26 90 A. Sample Sockets Support For TUF . . . . . . . . . . . . . 27 91 A.1 Basic Principles . . . . . . . . . . . . . . . . . . . . 28 92 A.2 Enabling TUF . . . . . . . . . . . . . . . . . . . . . . 28 93 A.3 Sending Data . . . . . . . . . . . . . . . . . . . . . . 29 94 A.4 Retrieving The Current EMSS or MULPDU . . . . . . . . . 29 95 A.5 Disabling ULPDU Packing . . . . . . . . . . . . . . . . 29 96 A.6 Disabling The Report of Oversized ULPDUs . . . . . . . . 30 97 Full Copyright Statement . . . . . . . . . . . . . . . . 30 99 1. Definitions 101 The following terms and abbreviations are used in this document. 103 data delivery - the delivery of received ULP payloads to the 104 ULP application, i.e, notifying the application of data 105 arrival by completing a receive operation or generating an 106 event. 108 data placement - the storage of received ULP payloads to host 109 memory, pending delivery to the ULP application. 111 direct data placement - the storage of received ULP payloads 112 directly to application-specified buffers without intermediate 113 buffering or copying. 115 EMSS - the effective maximum segment size. EMSS is the TCP 116 maximum segment size (MSS) defined in RFC 793 [TCP] and 117 exchanged during TCP connection establishment, adjusted by the 118 current path maximum transfer unit (MTU) [PathMTU]. 120 FPDU - framing protocol data unit. The protocol data unit 121 defined by TUF. 123 MULPDU - maximum upper layer protocol data unit size. The 124 size of the largest ULPDU that fits in an EMSS-sized FPDU. 126 NIC - network interface controller. The device that provides 127 a host's access to a physical network link. 129 PDU - protocol data unit. A self-contained block of control 130 and data defined by a particular protocol. 132 RDMA - Remote Direct Memory Access protocol. 
A data transfer 133 protocol which uses memory access-style transfer mode(s) to 134 provide generic direct data placement capabilities for 135 arbitrary ULPs. 137 TUF - TCP ULP Framing protocol. The protocol defined in this 138 document. 140 ULP - upper layer protocol. The client protocol using the 141 services of the transport layer, or TUF. 143 ULPDU - upper layer protocol data unit. 145 ULPDU containment property - the property that a TCP segment 146 contains exactly an integral number of ULPDUs. 148 2. Overview 150 This section summarizes the motivation for the TCP ULP Framing 151 (TUF) protocol and explains its operation in brief. Section 3 152 (`Rationale for TUF') develops the rationale for TUF in detail. 153 Section 4 (`The Protocol') defines the protocol itself. Section 5 154 (`Protocol Characteristics') examines various properties of the 155 protocol's operation. Implementors may wish to refer directly to 156 sections 4 and 5. 158 2.1. Motivation 160 The IP protocols are not usually used for high-performance, 161 high-speed data transfers because of the overhead of TCP processing. Instead, a 162 number of special purpose protocols have been used. The domain of 163 application for such high speed buffer transfer includes storage, 164 video delivery and processing, and various applications of cluster 165 computing, such as scalable database or application service. For 166 reasons discussed below, there is now great industry interest in 167 developing an IP standard for low overhead high bandwidth data 168 transfer, which would decrease the costs of high speed 169 interconnects and supplant special purpose protocols. 171 The approach typically used for low overhead transfers is called 172 direct data placement, in which the network interface places data 173 directly in application buffers, avoiding the latency and memory 174 bandwidth costs associated with copying. 
Direct data placement can 175 in principle be done with either of IP's reliable transports--SCTP 176 or TCP. This document considers what is needed to do direct data 177 placement with TCP. 179 In order to place data directly in application buffers, the network 180 interface needs to use information in the Upper Layer Protocol Data 181 Units (ULPDUs) contained in the TCP stream. This can be 182 accomplished routinely except when TCP segments arrive out of 183 order. If TCP segments arrive out of order, the location of the 184 ULPDUs within a TCP segment cannot, in general, be determined. The TUF protocol 185 addresses this problem of finding ULPDU headers in the TCP stream, 186 even when TCP segments arrive out of order. 188 2.2. Approach 190 TUF is implemented as a shim layer between a ULP and TCP. The 191 end-to-end data flow is: 193 0. Use of TUF is negotiated end-to-end by the ULP. 195 1. The ULP delivers to TUF a data stream in which ULPDUs are delimited. 197 2. TUF inserts a header and delivers the shimmed ULPDUs to TCP. 199 3. The TUF-aware TCP sender preserves boundaries of shimmed 200 ULPDUs (TUF FPDUs) as much as possible when delivering 201 segments to the IP layer. 203 4. The receiving TCP delivers shimmed ULPDUs to the receiving TUF 204 layer. 206 5. TUF removes the shim and delivers the ULPDUs to the ULP. 208 In other words, the layering of TUF is:

210                 ULP client
211                     ^
212                     |
213                     |    ULPDUs (in octet stream)
214                     |
215                     v
216                    TUF
217                     ^
218                     |
219                     |    FPDUs (containing ULPDUs)
220                     |
221                     v
222             TUF-conforming TCP
223                     ^
224                     |
225                     |    TCP Segments (each containing an FPDU)
226                     |
227                     v
228                   . . .

230 Note that while the semantics of this protocol layering must be 231 maintained, the receiving network interface may use the information 232 in the framed ULPDUs to place the data in memory on the host. 233 Whatever the case, the data is only delivered to the ULP when all 234 preceding TCP data has arrived. 236 3. 
Rationale For TUF 238 This document defines the TUF protocol as a shim layer between an 239 Upper Layer Protocol (ULP) and TCP. TUF also depends on a TCP 240 segmentation convention between TUF/TCP endpoints specified in this 241 document. Taken together, they provide the capability for a TUF/TCP 242 receiver to recognize ULPDUs by processing each TCP segment 243 independently, without requiring state from previous segments. 245 The purpose of TUF is to enable practical designs for enhanced 246 network interfaces (NICs) implementing direct data placement for 247 TCP-based ULPs. The purpose of direct data placement is to 248 eliminate the need for a host to copy received data after it 249 arrives in host memory. This copying incurs CPU, memory and bus 250 costs that are substantial and are not masked by advancing hardware 251 technology. 253 A general and practical solution to the receive copy problem has 254 eluded the IP networking community for almost two decades. There 255 is a long history of research and experimental schemes to reduce or 256 eliminate receiver copying overhead for IP networking in general, 257 and for TCP/IP communication in particular. While these systems 258 have convincingly demonstrated the potential performance benefits 259 of reducing copy costs, all such schemes suffer from one or more of 260 the following limitations: they require a significant restructuring 261 of operating system buffering and/or APIs; they are limited to 262 specific modes of communication (e.g., bulk data transfer) or 263 specific application ULPs; they do not scale on multiprocessor 264 hosts; their benefits depend on specific properties of the network 265 (e.g., large MTUs) or host buffer size and alignment. Moreover, 266 all such schemes require some degree of support from NICs to 267 separate payloads from headers and/or ensure that their placement 268 in host memory meets specific requirements (e.g., for page 269 placement and alignment). 
271 Inherent copying costs for IP communication are one motivation to 272 use alternative non-IP technologies for high-speed networking. A 273 number of specialized technologies have been developed for high 274 speed data transfers in which network interfaces transfer data from 275 application buffer to application buffer without software touching 276 the data. Some examples include the VAXCluster Interconnect in 277 1983, Fibre Channel (FC) in 1994, and today InfiniBand (IB) and 278 Virtual Interface Architecture (VIA). These alternatives have 279 eroded the popularity of IP technologies in application domains 280 including network storage, video processing and delivery, and 281 cluster computing for scientific applications and scalable 282 database-related services. 284 Until recently, several factors have limited interest in promoting 285 IP networking as a solution in these application domains. First, 286 the competing network technologies offered significantly higher 287 link speeds than the network hardware available for use with IP. 288 Second, these application domains were a relatively small segment 289 of the network market. Recently, however, Ethernet networks have 290 closed the bandwidth gap and even exceeded the bandwidth of 291 alternatives such as FibreChannel, at much lower cost. At the same 292 time, an increasing number of applications are server-hosted in 293 data centers to enable sharing and access from a growing number of 294 IP-connected client devices and locations. With the growth in 295 importance and number of data centers, high-speed interconnection 296 within the data center is now central to the everyday operation of 297 Internet services. 299 Thus, technology changes have created an opportunity and demand to 300 extend the benefits of IP technologies to high-performance 301 application domains, while simultaneously increasing the importance 302 of those domains. 
The ubiquity of IP offers economies of scale 303 heavily favoring IP in these domains. For example, reliance on 304 specialized non-IP technologies for high-performance domains 305 creates a need to support multiple protocols and redundant network 306 infrastructure in data centers, and it compromises portability and 307 interoperability of data center solutions. Moreover, comprehensive 308 support for network management and security is developing rapidly 309 in the IP space. Use of IP technologies would allow data centers 310 to benefit from these enhancements. 312 3.1. Direct Data Placement 314 Direct data placement is a key step toward making IP networking 315 competitive in data centers and other high-performance domains. 316 Direct data placement refers to the ability of a NIC to place data 317 directly from the network into designated application buffers, 318 without intermediate copying. Direct data placement is attractive 319 relative to other solutions to the receive copy problem. It is the 320 only solution that can be implemented in a way that is compatible 321 with existing operating systems, since the receiving NIC takes over 322 most of the responsibility to avoid receive copying. Also, direct 323 data placement generalizes easily to a range of ULPs. In 324 particular, it would permit the establishment of an IETF standard for an IP 325 transport-based direct data placement protocol, which would allow 326 NICs to place data directly, independent of the application ULP 327 using it. 329 The TUF protocol is necessary to permit easily deployable enhanced 330 NICs supporting direct data placement. Such NICs already exist and 331 their usage is growing rapidly, but their development is impeded by 332 the lack of standards. Direct data placement is unnecessarily 333 difficult and expensive to design and implement for existing 334 TCP-based ULPs; the key objective of TUF is to define transport 335 conventions to simplify the design of these NICs. 
A related 336 impediment is that in the absence of a general direct data 337 placement protocol these products are limited to specific ULPs such 338 as iSCSI. TUF, and possibly additional, higher layer protocol 339 definitions outside the scope of this document, would encourage the 340 market by ensuring interoperability of product offerings from 341 different vendors. 343 This document defines a framing protocol (TUF) and TCP segmentation 344 conventions that enable simple support of direct data placement for 345 a class of TCP-based ULPs. It does not propose a generic direct 346 placement ULP, such as an RDMA protocol, or any facility for direct 347 data placement, but only the foundations for building such a 348 facility on TCP. A key objective of TUF is to do this in a way 349 that is compatible with existing standards and with the spirit of 350 TCP's stream communication model. TUF can simplify support for 351 direct data placement for ULPs such as iSCSI, and it can serve as a 352 basis for a future RDMA proposal. 354 The key limitation of TUF as a solution to the receive copy problem 355 is that it works only if the ULP standard and the sending and 356 receiving implementations all support it. Impact on the sender and 357 ULPs is minimal, but ULPs must be adapted to allow use of TUF at 358 the ULP/transport boundary. The necessary modifications may be 359 quite small. Use of TUF is a negotiated option between the sender 360 and receiver for each ULP session, preserving interoperability 361 among senders and receivers that do not support TUF. 363 3.2. Direct Data Placement with TCP 365 Direct data placement is widely used to accomplish high-performance 366 data transfer in non-IP technologies such as block storage channels 367 (SCSI, Fibre Channel, etc.), and other specialized high performance 368 networks like InfiniBand. This section considers how direct 369 placement can be done with TCP. 
371 The Internet Protocol suite provides two transports that are prime 372 candidates for use with direct data placement -- SCTP and TCP. The 373 framing features of the SCTP Stream Control Transmission Protocol 374 [SCTP] make it more directly adaptable for direct data placement 375 for future ULPs using SCTP. However, the maturity and ubiquity of 376 TCP make it desirable to define a flexible method for direct data 377 placement for TCP-based ULPs as well. 379 There has been a great deal of `moral confusion' concerning the 380 interaction of direct data placement with TCP's ordering 381 guarantees. These ordering guarantees do not prohibit direct data 382 placement, even if data is placed as it arrives out of order. 384 TCP guarantees data delivery to the application ULP as an ordered, 385 sequential stream [RFC793]. Data is delivered only when TCP has 386 notified the application of its arrival and transferred ownership 387 of the receive data buffer. TCP does not specify how received data 388 is stored prior to its delivery, and it does not preclude placement 389 of data in application buffers out of order, as long as no data is 390 delivered until all preceding data has also been delivered. Out- 391 of-order placement greatly simplifies direct data placement NICs 392 because it streamlines data paths and eliminates the need for a TCP 393 reassembly buffer on the NIC. 395 An implementation performing direct data placement must still 396 respect all TCP delivery semantics. For example, if a checksum 397 integrity check fails, the data must not be placed in ULP-supplied 398 buffers, because, for example, the TCP ports and the TCP sequence 399 number are not trustworthy. 401 3.2.1. The Simple Case: ULP-unaware Placement 403 Direct data placement into a ULP client-supplied buffer designated 404 to hold the next data delivered to the ULP, regardless of the 405 contents of the received data, is one of the simplest possible 406 forms of direct data placement. 
This form of direct data placement 407 is already fully supported by existing TCP mechanisms. New NIC 408 products, available now or soon, that claim to offer 409 `full zero copy operation' typically provide only this ULP-unaware 410 form of direct data placement. 412 While ULP-unaware direct data placement works well for ULPs like 413 FTP where the entire contents of a TCP connection are known to be 414 nothing but a single stream of bulk client data, most widely used 415 ULPs, e.g., HTTP [HTTP], BEEP [BEEP] and storage protocols, 416 multiplex control and data, and possibly even interleave data from 417 different requests on the same TCP connection. The simple 418 ULP-unaware direct data placement is inadequate to avoid data copies 419 for these ULPs. 421 3.2.2. The Complex Case: ULP-aware Placement 423 An explicit goal of this proposal is to support out-of-order direct 424 data placement for ULPs that provide additional transport-like 425 features such as control and data multiplexing, layered above TCP 426 (e.g., iSCSI or a generic direct data placement protocol such as 427 RDMA). In many ULPs, such as storage protocols, control 428 information contained in the ULP uniquely identifies the 429 destination application buffer of each particular piece of data. 431 For example, suppose a client requests a read operation using a 432 network storage ULP, specifying the destination buffer for the 433 requested data. The requesting ULP includes control information in 434 the request (e.g., in the ULPDU header) uniquely identifying that 435 buffer, and the responder includes that information in the read 436 response. For some protocols, the identifier is a unique request 437 ID, allowing the client ULP to identify the buffer indirectly 438 through a table of pending requests. If the storage protocol uses 439 RDMA, the response may specify the buffer directly by means of a 440 region identifier. 
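The request-ID form of buffer identification described above can be sketched as follows. This is an illustration only, not part of TUF or of any storage ULP: the class `PendingRequests`, its methods `register`, `place` and `complete`, and the `offset` parameter are all hypothetical names, and a `bytearray` stands in for an application buffer.

```python
# Hypothetical sketch: a table of pending requests maps the request ID carried
# in a ULPDU header to the application buffer named when the read was issued.


class PendingRequests:
    def __init__(self):
        self._buffers = {}   # request ID -> destination buffer
        self._next_id = 0

    def register(self, buffer):
        """Record the destination buffer for a new read and return its ID."""
        request_id = self._next_id
        self._next_id += 1
        self._buffers[request_id] = buffer
        return request_id

    def place(self, request_id, offset, payload):
        """Store a response payload directly in the registered buffer."""
        buffer = self._buffers[request_id]
        if offset < 0 or offset + len(payload) > len(buffer):
            raise ValueError("payload does not fit in the destination buffer")
        buffer[offset:offset + len(payload)] = payload

    def complete(self, request_id):
        """Retire the request and hand the filled buffer back to the caller."""
        return self._buffers.pop(request_id)


table = PendingRequests()
destination = bytearray(8)
rid = table.register(destination)
# Payloads may be placed in whatever order they arrive ...
table.place(rid, 4, b"DATA")
table.place(rid, 0, b"READ")
# ... while the buffer is handed over only when the request completes.
result = table.complete(rid)
```

The sketch keeps placement (`place`) separate from delivery (`complete`), mirroring the distinction between data placement and data delivery drawn in the definitions.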
442 A network interface that understands the relevant ULP control 443 information can use it to place the incoming data (e.g., read 444 response payload) directly in the correct buffer. In this case, 445 data placement is guided by ULPDU headers embedded in the TCP data 446 stream. The NIC accesses these headers as hints for placement of 447 the ULP payloads--a form of integrated layer processing for each 448 TCP segment as it arrives. This is compatible with TCP's ordering 449 properties if completion of ULP header processing and delivery of 450 the payload data to the application are strictly in order. 452 3.2.3. The Problem of ULP-aware Placement with TCP 454 The problem with performing direct data placement as a function of 455 ULP control information in TCP is that it may be difficult to 456 locate the ULP control information (ULPDU headers) within a TCP 457 segment. 459 If all TCP segments are received in sequence order, ULP control 460 information can be unambiguously located by the rules that permit 461 any ULP implementation to do so. For example, each ULPDU may 462 contain a length field that implicitly specifies the location of 463 the beginning of the subsequent ULPDU. 465 If TCP segments are not received in sequence order, without taking 466 additional measures, it may not be possible to unambiguously locate 467 ULP control information needed for direct data placement. For 468 example, if ULPDU length information is in a TCP segment that is 469 delayed or lost in transmission, assuming the ULPDU length is the 470 only means of locating the beginning of the subsequent ULPDU, it is 471 impossible to locate ULP control information for ULPDUs in 472 subsequent TCP segments until the lost or delayed TCP segment is 473 received. ULP control information, and the data whose placement 474 depends on it may even be in different TCP segments. 
If the ULP 475 control information is in a TCP segment that is delayed or lost, it 476 is impossible to directly place the data until the ULP control 477 information is received. 479 3.2.4. Finding ULPDUs In Out-of-order Segments 481 Early attempts at ULP-aware direct data placement in TCP took the 482 approach of only directly placing data for TCP segments received 483 in order. Otherwise, data was copied through a reassembly buffer 484 as in a traditional implementation. Unfortunately, packet loss and 485 the attendant out-of-order reception are frequent, continuous 486 characteristics of both wide-area and switched local area networks 487 of almost any size, as TCP adjusts to varying congestion 488 conditions. Under these conditions, a large portion of the data 489 transferred ends up being copied, rather than being directly 490 placed. 492 Another solution to this problem is to build a reassembly buffer 493 into the network interface. Data received out of order can be held 494 in the network interface reassembly buffer until all preceding data 495 is received, and then direct placement can be performed on the 496 reassembled data. Within certain implementation assumptions, this 497 is a reasonable approach but, unfortunately, there are a number of 498 issues, including very large memory requirements, limited 499 scalability, and increased latency, that make the reassembly 500 approach undesirable. 502 The size of the reassembly buffer needed in the network interface is a 503 direct function of the bandwidth * delay product of all active TCP 504 connections. Reasonable assumptions on the active bandwidth * 505 delay product can imply a large amount of reassembly memory. 506 Furthermore, this large reassembly memory must run at high 507 speed, more than two times the link speed, to maintain full link 508 bandwidth. 
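As a rough worked example of the sizing argument above, the sketch below sums bandwidth * delay over a set of active connections and applies the "more than two times the link speed" observation. The connection count, rates, and round-trip times are made-up illustrative inputs, not figures from this document, and `reassembly_bytes` and `min_memory_bandwidth_bps` are hypothetical helper names.

```python
# Illustrative sizing only: every input value below is an assumption.


def reassembly_bytes(connections):
    """Sum bandwidth * delay over (bits_per_second, rtt_milliseconds) pairs
    and return the result in octets."""
    total_bit_ms = sum(rate_bps * rtt_ms for rate_bps, rtt_ms in connections)
    return total_bit_ms / (8 * 1000)


def min_memory_bandwidth_bps(link_bps):
    """Lower bound on buffer memory bandwidth: each octet is written once on
    arrival and read once when drained, so at least twice the link rate."""
    return 2 * link_bps


# Assumed workload: 100 connections, each at 10 Mbit/s with a 50 ms RTT,
# sharing a 1 Gbit/s link.
connections = [(10_000_000, 50)] * 100
buffer_size = reassembly_bytes(connections)
memory_rate = min_memory_bandwidth_bps(1_000_000_000)
print(f"reassembly buffer: {buffer_size / 1e6:.2f} MB")
print(f"memory bandwidth:  {memory_rate / 1e9:.1f} Gbit/s")
```

With these assumed inputs the buffer comes to 6.25 MB and the memory must sustain at least 2 Gbit/s. Both figures scale linearly with link speed and round-trip time.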
510 Finally, performing reassembly in the network interface requires 511 that the bandwidth from the network interface to host memory be not 512 just equal to, but substantially greater than, the maximum bandwidth of 513 the network link, to ensure that the reassembly buffer is drained 514 when reassembly is complete. System bus and interconnect bandwidth 515 are particularly scarce and expensive resources in most systems. 517 What is needed to permit ULP-aware direct data placement without 518 reassembly buffering is a way to ensure that the ULP control 519 information and the data associated with it are highly likely to be 520 contained completely within a single TCP segment, and a way for a 521 receiver to validate this containment property on TCP segments it 522 receives. If the receiver can determine that a ULPDU starts at the 523 beginning of a TCP segment, the receiver can perform ULP-aware 524 direct placement for that ULPDU and for subsequent ULPDUs contained in 525 that TCP segment. The property that a ULPDU is completely 526 contained within a TCP segment is called the `ULPDU containment 527 property'. 529 3.2.5. The TUF Solution 531 The TUF protocol defines a shim layer above TCP and below the ULP 532 that allows the receiver to validate the ULPDU containment property 533 for each TCP segment received, independently of any other TCP 534 segment. The TUF protocol also defines a segmentation behavior for 535 the TCP sender that ensures the ULPDU containment property holds as 536 often as possible while still respecting the protocol requirements 537 for TCP senders. 539 The TUF-specified TCP segmentation behavior ensures that the ULPDU 540 containment property is maintained as long as the receiver window 541 size is at least equal to the effective MSS (EMSS), the path MTU 542 (PMTU) does not change, and the TCP stream is not resegmented by an 543 intermediary. 
In conditions where the TCP receiver window size is 544 smaller than EMSS, or the PMTU changes, the segmentation behavior 545 further ensures that once the relevant condition is restored, the 546 ULPDU containment property will be satisfied again. 548 For the high-performance applications that this protocol targets, 549 small receiver window sizes and PMTU changes are rare transients. 550 Thus, the specified protocol ensures that ULP control information 551 and its associated data are virtually always together in a single 552 TCP segment. 554 3.2.6. TUF's ULP Assumptions 556 A key assumption of TUF is that ULPs running on TUF can adjust 557 ULPDU sizes to fit completely within an EMSS-sized TCP segment. 558 Clearly, if a ULPDU does not fit within an EMSS-sized TCP segment, 559 the ULPDU containment property cannot be satisfied. Most storage 560 protocols (e.g., iSCSI), and other performance-targeted protocols 561 (e.g., RDMA protocols) support this capability. ULPs that cannot 562 adjust ULPDU sizes to fit within an EMSS-sized TCP segment, but 563 still want the performance advantages of direct data placement, can 564 be mapped on top of an intermediate protocol (e.g., an RDMA 565 protocol) that does support this data `chunking'. 567 TUF does not change the stream delivery semantics that TCP, 568 through the TUF implementation, presents to the ULP. It merely inserts a shim 569 header that can be used by direct placement network interfaces to 570 verify the ULPDU containment property. The shim header is inserted 571 by the sending TUF implementation and removed by the receiving TUF 572 implementation, leaving a stream to be delivered to the ULP. 574 4. The Protocol 576 This section defines the TUF protocol itself. The first two 577 sections are the core of the protocol, defining: 579 o the shim layer PDUs, called FPDUs, 581 o a TCP-conforming segmentation behavior which ensures the ULPDU 582 containment property holds under most conditions. 

The remaining sections cover other aspects of the protocol which
are primarily implications of the core protocol:

   o  what ULP-specified negotiations to enable TUF must accomplish,

   o  how receivers can process received TCP segments to establish
      whether the ULPDU containment property holds.

4.1. The Framing Protocol Data Unit (FPDU)

TUF sends groups of one or more complete ULPDUs in a framing
protocol data unit (FPDU).

4.1.1. FPDU Format

The format of an FPDU is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |              Key              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              Key                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                                                               |
   ~                                                               ~
   ~                            ULPDUs                             ~
   |                                                               |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            ULPDUs             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Length: 16 bits (unsigned integer)

   This is the length in octets of the set of framed ULPDUs. It
   does not include the length of the FPDU header itself.

Key: 48 bits (unsigned integer)

   This is used by the receiver to validate the ULPDU containment
   property. It is selected at random by the sender, and
   initially signaled to the receiver in a ULP-specified way,
   before the receiver attempts to test the ULPDU containment
   property. All FPDUs sent on the same connection in the same
   direction must use the same key value. A good quality random
   number generator MUST be used to generate the initial key.
   RFC 1750 discusses relevant characteristics and provides
   references for good quality random number generation
   [RFC1750].

The length of an FPDU is 8 + L octets, where L is the length of the
set of framed ULPDUs. The 16-bit length field is sufficient to
permit a TCP segment with an FPDU to completely fill a maximum-size
IPv4 or IPv6 datagram.

4.1.2. FPDU Size Selection

Each FPDU SHOULD contain as many contiguous, complete ULPDUs as
will fit within the current EMSS, unless ULPDU packing is disabled.
If ULPDU packing is disabled, each FPDU SHALL contain a single
ULPDU. ULPDU packing mode may be negotiated, or specified a priori
by a ULP. Disabling ULPDU packing is analogous to disabling the
Nagle algorithm in TCP.

TUF SHALL present to the ULP the size of the largest ULPDU that
fits in an EMSS-sized FPDU (MULPDU). MULPDU is EMSS minus the FPDU
header size (8 octets). ULPs SHOULD submit ULPDUs that are as large
as possible to TUF, up to MULPDU, subject to limits imposed by
specific ULP properties. The ULP MAY also choose to pack several
ULPDUs into an EMSS-sized unit before submitting them as one ULPDU
to TUF. Depending upon the ULP, ULP packing may improve data
transfer efficiency, and is unlikely to have any detrimental
effect.

A TUF implementation probing for PMTU increase SHOULD present an
increased MULPDU value to the ULP until an FPDU large enough to
perform the probe results.

Under exceptional circumstances, the EMSS can become too small to
accommodate even a single ULPDU. For example, a ULP may define
fixed-sized PDUs that are incompressible, or variable size PDUs
with some absolute minimum size, such as the size of a data PDU
containing a minimum amount of data. It is possible for the EMSS
to shrink to as small as 8 octets [PathMTU]. If the EMSS is too
small to accommodate an incompressible ULPDU, the FPDU MUST contain
only that ULPDU. ULPs using TUF SHOULD NOT define ULPDUs with a
minimum size greater than 128 octets.

4.2. TUF-conforming TCP Sender Segmentation

TCP senders are allowed substantial freedom in the choice of how to
segment an outgoing TCP stream.
Within the confines of the
receiver-advertised receive window and the sender-computed
congestion window, any segmentation is permitted. Virtually all
TCP implementations do attempt to segment outgoing TCP streams into
EMSS-sized segments where possible because it improves performance.

TUF-conforming TCP sender behavior ensures that the ULPDU
containment property holds most of the time. To do this, a
TUF-conforming TCP sender MUST respect a single additional rule in
performing segmentation:

   A TUF-conforming TCP sender MUST segment the outgoing TCP
   stream such that the first octet of every FPDU is sent at the
   beginning of a TCP segment.

4.3. Negotiating TUF

Negotiating the use of TUF is the responsibility of the ULP. The
use of TUF MAY be negotiated separately for each direction on a
connection. The negotiation procedure MUST ensure that when TUF is
enabled or disabled, the remote peer will not transmit its first
TCP segment in the new mode until it is certain that the local peer
has actually enabled or disabled TUF.

TUF operation is characteristically requested by the receiver and
offered by the sender. Before enabling TUF, the relevant
parameters:

   1. the sender's 48-bit key

   2. ULPDU packing mode

MUST be established at each peer.

A natural way to enable the use of TUF is a ULP-defined negotiation
exchange of the TUF parameters culminating in enabling TUF, if
requested, for each transfer direction. A three-way handshake
protocol can be used to ensure that the point at which TUF is
enabled is unambiguous and each end has time to perform local state
changes. A connection on which TUF is enabled is likely to be the
same connection on which the negotiation occurs, but this is not
required. A new connection could also use TUF from its initial
establishment, if the TUF parameters and modes are known through
some out-of-band mechanism.
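
Once the sender's key has been established, the framing step of
Section 4.1.1 and the MULPDU definition of Section 4.1.2 can be
pictured with the following minimal sketch. It is illustrative
only and not part of the protocol definition. The names tuf_mulpdu
and tuf_frame are hypothetical, and the sketch ASSUMES that Length
and Key are carried in network byte order, which the text above
does not state. Key generation is out of scope here; the key is
assumed to have come from a good quality random number generator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TUF_HDR_LEN  8u                  /* 16-bit Length + 48-bit Key */
#define TUF_KEY_MASK 0xFFFFFFFFFFFFULL   /* low 48 bits */

/* MULPDU: largest ULPDU that fits in an EMSS-sized FPDU (EMSS - 8). */
static size_t tuf_mulpdu(size_t emss)
{
    return emss > TUF_HDR_LEN ? emss - TUF_HDR_LEN : 0;
}

/*
 * Write one FPDU (header followed by the framed ULPDUs) into out.
 * Returns the FPDU length (8 + len), or 0 if len does not fit the
 * 16-bit Length field or out is too small.
 * ASSUMPTION: Length and Key are encoded in network byte order.
 */
static size_t tuf_frame(uint8_t *out, size_t out_cap, uint64_t key,
                        const uint8_t *ulpdus, size_t len)
{
    if (len > 0xFFFF || out_cap < TUF_HDR_LEN + len)
        return 0;

    key &= TUF_KEY_MASK;
    out[0] = (uint8_t)(len >> 8);
    out[1] = (uint8_t)len;
    for (int i = 0; i < 6; i++)          /* most significant octet first */
        out[2 + i] = (uint8_t)(key >> (8 * (5 - i)));

    memcpy(out + TUF_HDR_LEN, ulpdus, len);
    return TUF_HDR_LEN + len;
}
```

A sender following the rule of Section 4.2 would hand each buffer
produced this way to TCP so that its first octet begins a TCP
segment. With an EMSS of 1460 octets, for example, tuf_mulpdu()
yields 1452.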

Use of TUF could be disabled during a connection using a similar
ULP-defined three-way handshake.

Other alternatives to parameter exchange include stipulating some
parameters a priori. For example, a ULP could specify that TUF
with ULPDU packing enabled is always used in both directions. In
this case, only the 48-bit keys need to be exchanged before TUF is
enabled. Or, a ULP could determine TUF characteristics on the
basis of the TCP port number.

4.4. TUF Receiver ULPDU Containment Property Testing

A TUF receiver that wishes to use ULP control information to
perform direct data placement must first verify the ULPDU
containment property. To do this, the receiver MUST establish that
the TCP segment contains exactly one FPDU. Abstractly, this can be
done by assuming the TCP segment payload begins with an FPDU, and
verifying the following properties of that putative FPDU:

   o  The received TCP segment payload length equals the FPDU length
      plus the length of the FPDU header (8 octets).

   o  The 48-bit key equals the value signaled to the receiver when
      TUF was enabled for the connection.

If these conditions are true, the TUF receiver MAY assume that the
ULPDU containment property holds, and use ULP control information
to directly place data in the contained ULPDUs.

TUF does not provide any information that a TUF receiver can use to
locate ULP control information beyond the ULPDU containment
property. In particular, a TUF receiver MUST NOT scan TCP segments
in an attempt to locate FPDUs that do not begin at the beginning of
a TCP segment. However, even if the ULPDU containment property
does not hold, a TUF receiver may still be able to reliably locate
and use ULP control information. For example, if a received TCP
segment contains the next unreceived data in the TCP stream, the
location of ULPDUs in that segment is unambiguous.
The behavior of a TUF receiver acting on ULP control information
located with properties other than the ULPDU containment property
is not specified here.

5. Protocol Characteristics

This section discusses some characteristics and behavior which are
implications of the TUF protocol.

5.1. Properties Of TUF-conforming TCP Senders

The general practice of TCP senders to send as much data as
possible within a TCP segment (up to EMSS) implies that an FPDU
whose size is less than or equal to EMSS, and whose first octet
begins a TCP segment will be sent entirely within a single TCP
segment. This ensures the ULPDU containment property for that TCP
segment.

A TUF-conforming TCP sender still obeys all requirements of TCP.
While the segmentation of a TUF-conforming TCP sender will have
distinctive characteristics when viewed from the network wire, the
same segmentation behavior could also result from a stock TCP
sender.

The one property of a TUF-conforming TCP sender which arguably
departs from traditional expectations is that a TUF-conforming TCP
sender may not produce TCP segments which are as close in size to
EMSS as a stock TCP sender. The need to ensure the ULPDU
containment property may result in TCP segments which are not as
full as if the property did not need to hold. While this is
abstractly true, in practice, several characteristics combine to
minimize this effect. Specifically:

   o  Packing ULPDUs into FPDUs gives behavior similar to that of
      stock TCP segmentation, albeit with coarser granularity.

   o  ULPs which benefit from data-dependent direct data placement
      (candidates for TUF) usually transfer large amounts of data in
      bulk. This means that most ULPDUs are data-carrying, and will
      be EMSS-sized.
      Even when control is interleaved with data,
      the combination of a small number of control ULPDUs with a
      data ULPDU can be packed to fill an EMSS-sized segment.

Therefore, a TUF-conforming TCP sender seems likely to behave
similarly to a stock TCP sender under most circumstances. However,
applications that both send and receive data over the same TCP
connection, where there might be dependencies between incoming and
outgoing data, are often subject to excessive delays attributable
to TCP's Nagle algorithm and/or delayed-ACK algorithm [NagleDAck].
These algorithms generally perform best when TCP always sends
full-EMSS segments. Because TUF can generate sub-EMSS segments as a
by-product of aligning FPDU boundaries with TCP segment boundaries,
TUF might be especially vulnerable to the known problems with the
Nagle and/or delayed-ACK algorithms.

Further work, including implementation experience with TUF, as well
as existing and future proposals for improvements to the Nagle
and/or delayed-ACK algorithms, might be necessary to optimize TUF
performance while fully preserving the congestion-avoidance
features of TCP. This work is currently outside the scope of this
document.

5.2. Exception Cases

The complete operational specification of TUF is contained in the
rules for forming FPDUs, and sending those FPDUs in TCP segments.
However, the operation of TUF will be subject to a variety of
transient or exceptional conditions. The behavior of TUF under
those conditions is discussed below to illustrate specifically how
TUF addresses them.

5.2.1. Resegmenting Intermediaries

Resegmenting TCP-layer intermediaries (middleboxes) are one of the
most formidable obstacles to maintaining the ULPDU containment
property. In the presence of such an intermediary, the
segmentation chosen by the sender may not be the segmentation at
the receiver.
While such intermediaries may or may not be common
in particular networks, in many cases the presence or absence of
such resegmenting behavior is beyond the control or even knowledge
of the end points using TUF. Therefore, TUF must detect such
resegmentation by design.

A primary reason for the presence of a random key in the FPDU
header is to detect such resegmentation. An alternative to the
random key that has been proposed is to use ULP-specific
validation criteria to determine the ULPDU containment property.
For example, some ULP PDUs include relatively strong data integrity
checks such as CRCs, and other ULP control information can often be
validated against various ULP-specific criteria.

While such ULP-specific validation criteria may involve checking
many more bits than the combination of the FPDU's 16-bit length and
48-bit key, ULP-specific validation criteria may not actually offer
a strong guarantee of the ULPDU containment property. For certain
data streams, the probability of a false-positive indication of the
ULPDU containment property can be extremely high.

Assume that the intermediary resegments to a granularity of no
finer than G octets (e.g. 4). Also assume that the TCP data stream
contains predominantly application data. If the ULP is a storage
protocol, simply transferring a file containing a continuous,
repeated stream of well-formed ULPDUs which are some multiple of G
in size increases the probability of a false-positive indication of
the ULPDU containment property to approximately:

   1 / (sizeof(repeated ULPDU)/G)

If the well-formed ULPDUs are relatively small (e.g. 32 octets
where G=4 octets), the probability of a false-positive indication
of the ULPDU containment property is approximately 1/8, for EACH
TCP segment which does not actually begin with a ULPDU.
Clearly, in this case, it would take only a very small number of
TCP segments which do not begin with an actual ULPDU before the
`fake' ULPDU in the application data is interpreted as an actual
ULPDU. The consequences of such a false-positive interpretation
could be dire, for example executing a destructive operation
request.

The 48-bit random key in the FPDU results in a low probability of a
false-positive indication of the ULPDU containment property because
it is effectively secret with respect to the application data
stream.

Note that although this analysis may appear to be security-minded,
prompting the image of a sighted third-party adversary that can
`sniff' the 48-bit key, it is actually considering a safety, rather
than a security property. The security properties of TUF are
discussed in Section 6 (`Security Considerations') below.

Even though TUF can detect the presence of a resegmenting
intermediary, such an intermediary will almost certainly
substantially reduce the chance of the ULPDU containment property
being satisfied. A TUF implementation which detects a very low
incidence of the ULPDU containment property for a sustained
interval (>> RTT) may assume that a resegmenting intermediary is in
operation and SHOULD discontinue the use of ULP control information
found using the ULPDU containment property. In such cases, the ULP
MAY elect to disable the use of TUF altogether, or simply stop
exploiting the ULPDU containment property.

5.2.2. PMTU Reduction

When a PMTU reduction is detected by a TUF-compliant TCP, the
TUF-compliant TCP sender may send FPDUs already committed to the
TCP layer in one of two ways:

   o  send unsegmented FPDUs in TCP segments of the old EMSS size,
      and rely on IP fragmentation to deliver the segments,

   o  segment FPDUs to fit in TCP segments which respect the new
      EMSS size.

Stock TCPs face a similar choice on PMTU change, and both
alternatives are used in practice.

In the case that a TUF-compliant TCP chooses to segment FPDUs, it
SHOULD segment them in such a way that, in the absence of
resegmentation by an intermediary, the segments are guaranteed not
to give a false-positive indication of the ULPDU containment
property. There are various ways to ensure this. For example, no
matter how the FPDU is segmented, the first segment is guaranteed
not to give a false-positive indication of the ULPDU containment
property: the 48-bit key will match, but the length will not. In
the worst possible case, each subsequent TCP segment could be sent
with fewer than 8 octets of data, also guaranteed not to give a
false-positive indication of the ULPDU containment property. More
efficient approaches are possible, but PMTU reduction is a rare
event, and reacting to it is only a transient condition.
Eventually a new MULPDU will be presented to the ULP, and FPDUs
that fit in the new EMSS will result. During the transient
condition, performance will suffer temporarily no matter how FPDUs
are segmented.

No matter what segmentation is chosen by a TUF-compliant TCP sender
when segmenting an FPDU, if the segments pass through a
resegmenting intermediary, the correctness of the ULPDU containment
property remains strictly a matter of probability.

5.2.3. PMTU Increase

As described in `FPDU Size Selection' above, a TUF-compliant TCP
probing for PMTU increase will present an increased MULPDU value to
the ULP. This should eventually lead to an FPDU large enough to
actually perform the PMTU increase probe. The MULPDU value should
not be further adjusted until the probe is actually performed.
This behavior is similar to when a stock TCP would like to perform
a PMTU increase, but less data is available than would fill the
desired segment.

Also, note that depending on the ULP, the actual distribution of
FPDU sizes may have a granularity coarser than a single octet. An
FPDU with a particular, desired TCP segment size may never be
generated. Therefore, when probing for PMTU increase, a
TUF-compliant TCP must be satisfied with an FPDU that produces a
TCP segment size that is `close' to the desired size.

Finally, note that in cases where PMTU grows and shrinks relatively
frequently, better performance may result from not probing for PMTU
increase at all, or probing very rarely. This is because the
performance disruption resulting from PMTU decrease can be
substantial, and in many cases, implementations of TUF will be in
hardware, so performance may be less sensitive to differences in
PMTU.

5.2.4. Receive Window < EMSS

A TUF-compliant TCP sender that is presented with a receive window
smaller than EMSS may be required to segment FPDUs. The TCP window
probe is a limiting case of this condition where the advertised
receive window is 0, and the amount of data typically sent in
response is a single octet.

In this case, a TUF-compliant TCP sender will segment in accordance
with the requirements of TCP, and the rule defined in
`TUF-conforming TCP Sender Segmentation' above. In addition, as
when resegmenting in response to PMTU decrease, a TUF-compliant TCP
sender SHOULD segment in such a way that, in the absence of a
resegmenting intermediary, segments are guaranteed not to give a
false-positive indication of the ULPDU containment property. In
situations where the receive window is smaller than EMSS, data
transfer performance is likely to be limited independently of any
segmentation behavior by the TCP sender.
Furthermore, ULP implementations that choose to
use TUF will almost certainly be designed to maintain a receiver
window larger than EMSS, so a small receiver window should occur
extremely infrequently.

5.2.5. Size of ULPDU + 8 > EMSS

In cases where EMSS shrinks below the minimum size of a ULPDU that
a ULP wants to send, TUF will create FPDUs that are larger than
EMSS, and a TUF-compliant TCP sender will face the same
alternatives as during PMTU reduction:

   o  send unsegmented FPDUs and rely on IP fragmentation to deliver
      the segments,

   o  segment FPDUs to fit in TCP segments which respect the EMSS
      size.

A ULP which is presented with an MULPDU value that is too small to
accommodate the PDUs necessary for operation SHOULD simply attempt
to use ULPDUs which are as small as possible.

If the EMSS shrinks to a pathologically small size, then a TUF
implementation SHOULD discontinue the use of ULP control
information found using the ULPDU containment property. In such
cases, the ULP MAY elect to disable the use of TUF altogether, or
simply stop exploiting the ULPDU containment property.

A path MTU which results in an EMSS < 128 + 8 octets is an
extremely unlikely occurrence and when it does occur, poor data
transfer performance is a likely result, independent of TCP sender
segmentation behavior.

6. Security Considerations

This section discusses both protocol-specific considerations and
the implications of using TUF with existing security mechanisms.

6.1. Protocol-specific Security Considerations

A third-party that can inject spoofed packets into the network
which can be delivered to a TUF receiver could launch a variety of
attacks that exploit TUF-specific behavior. For example, a blind
third-party adversary could inject random packets which appear in
the valid TCP window and do not begin with valid FPDU headers.
A barrage of such packets might cause a TUF receiver to conclude
that a resegmenting intermediary is present and disable the use of
TUF and direct data placement. This would substantially degrade
performance. However, such an attack would probably also have
consequences more dire than degraded performance, such as causing
the ULP to interpret the bogus data as valid. Furthermore, such a
third-party could also degrade performance just as effectively in a
TUF-independent way by injecting spoofed ICMP packets which result
in reduction of the path MTU to an inefficiently small size.

Fundamentally, the vulnerabilities of TUF to active third-party
interference are no more acute than those of TCP without TUF. In
both cases, a communication security mechanism such as IPSec is the
only way to completely prevent such attacks.

6.2. Using IPSec With TUF

Since IPSec is designed to secure arbitrary IP packet streams,
including streams where packets are lost, TUF can run cleanly on
top of IPSec without any change. IPSec packets may be decrypted in
the order they are received, and a TUF receiver may test and
exploit the ULPDU containment property just as if the IP datagram
were unsecured.

6.3. Using TLS With TUF

Using TLS [TLS] with TUF, particularly trying to exploit the ULPDU
containment property to locate ULP control information, is not a
straightforward process. TUF can be directly layered on top of
TLS, but many of the advantages of TUF are lost. This document
does not define a way of using TLS with TUF that could offer better
performance than stock reassembly buffer-based implementations.
That task is left to a different document, if there is sufficient
motivation to address the problems. This section does outline
some of the known complications of trying to do better than stock
reassembly buffer-based implementations using TLS with TUF.

TLS is a record-oriented protocol. TLS records are PDUs with a
similar structure to ULPDUs defined in application ULPs. As with
other ULPs, the only way to avoid a complete reassembly buffer is
to be able to find TLS PDUs in the presence of lost TCP segments.
The ULPDU containment property could be used to do this, which
suggests that TLS itself should be layered on top of TUF. In this
case, the FPDU header will travel in the clear, but this will
probably not present serious vulnerabilities other than denial of
service attacks comparable to what is already possible without TUF.

Once the TLS records are located and processed it still remains to
locate the ULPDUs. The simplest way to do this would be to have
the TLS implementation be TUF-compliant, and ensure the ULPDU
containment property within each TLS record. In this case, the
protocol layering would look like:

            ULP client
                ^
                |
                | ULPDUs (in octet stream)
                |
                v
        TUF-conforming TLS
                ^
                |
                | TLS records (containing ULPDUs)
                |
                v
               TUF
                ^
                |
                | FPDUs (each containing a TLS record)
                |
                v
        TUF-conforming TCP
                ^
                |
                | TCP Segments (each containing an FPDU)
                |
                v
              . . .

An obvious complication of using TLS with TUF is that ciphers
defined for use with TLS do not offer independence across TLS
records. The most common cipher used with TLS is RC4, which is a
stream cipher. Efficient decryption of an RC4 stream depends upon
the entire preceding data stream. In other words, it is simply not
feasible to decrypt TLS records encrypted with RC4 in any order
other than the TCP stream order. This clearly defeats the purpose
of TUF.

TLS is also defined to work with block ciphers such as 3DES in
Cipher Block Chaining (CBC) mode.
In this case, the dependency of
the decryption operation on data in previous TLS records is less
severe. To decrypt the current TLS record only requires ciphertext
from the previous TLS record. While this does not allow complete
independence of processing TLS records, a lost or delayed TCP
segment containing a TLS record only prevents decrypting the
immediately subsequent TLS record, not all TLS records after it.

TLS compression presents another complication to using TLS with
TUF. TLS compression algorithms are allowed to increase the
content length by up to 1024 octets. If the content length does
increase, the TLS record may not fit within an EMSS-sized TCP
segment, even if the uncompressed ULPDU does. If the risk of
exceeding an EMSS-sized TCP segment is small, it may be acceptable
to occasionally send FPDUs containing TLS records that span several
TCP segments, or use IP fragmentation. Some TLS compression
algorithms may never increase the content length, or only increase
it by some small, manageable amount.

7. IANA Considerations

If framing is enabled a priori for a ULP by connecting to a
well-known port, this well-known port would be registered for the
framed ULP with IANA.

8. References

[BEEP]
   Rose, M., "The Blocks Extensible Exchange Protocol Core", RFC
   3080, March 2001.

[HTTP]
   Fielding, R. and others, "Hypertext Transfer Protocol --
   HTTP/1.1.", RFC 2616, June 1999.
   http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-
   initwin-00.txt.

[NagleDAck]
   Minshall G., Mogul, J., Saito, Y., Verghese, B., "Application
   performance pitfalls and TCP's Nagle algorithm", Workshop on
   Internet Server Performance, May 1999.

[PathMTU]
   Mogul, J., and Deering, S., "Path MTU Discovery", RFC 1191,
   November 1990.

[RFC1750]
   Eastlake, D., Crocker, S., Schiller, J., "Randomness
   Recommendations for Security", RFC 1750, December 1994.

[RFC2581]
   Allman, M., and others, "TCP Congestion Control", RFC 2581,
   April 1999.

[SCTP]
   Stewart, R.R. and others, "Stream Control Transmission
   Protocol", RFC 2960, October 2000.

[Stevens]
   Stevens, W. Richard, "Unix Network Programming Volume 1",
   Prentice Hall, 1998, ISBN 0-13-490012-X.

[TCP]
   Postel, J., "Transmission Control Protocol - DARPA Internet
   Program Protocol Specification", RFC 793, September 1981.

[TLS]
   Dierks, T. and others, "The TLS Protocol, Version 1.0", RFC
   2246, January 1999.

Authors' Addresses

   Stephen Bailey
   Sandburst Corporation
   600 Federal Street
   Andover, MA 01810
   USA

   Phone: +1 978 689 1614
   Email: steph@sandburst.com

   Jeff Chase
   Department of Computer Science
   Duke University
   Durham, NC 27708-0129
   USA

   Phone: +1 919 660 6559
   Email: chase@cs.duke.edu

   Jim Pinkerton
   Microsoft, Inc.
   1 Microsoft Way
   Redmond, WA 98052
   USA

   EMail: jpink@microsoft.com

   Allyn Romanow
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA 95134
   USA

   Phone: +1 408 525 8836
   Email: allyn@cisco.com

   Constantine Sapuntzakis
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA 95134
   USA

   Phone: +1 408 525 5497
   EMail: csapuntz@cisco.com

   Jim Wendt
   Hewlett Packard Corporation
   8000 Foothills Boulevard MS 5668
   Roseville, CA 95747-5668
   USA

   Phone: +1 916 785 5198
   EMail: jim_wendt@hp.com

   Jim Williams
   Emulex Corporation
   580 Main Street
   Bolton, MA 01740
   USA

   Phone: +1 978 779 7224
   EMail: jim.williams@emulex.com

Appendix A. Sample Sockets Support For TUF

The sockets support for TUF described below is only a sketch.
It is provided as an aid to understanding TUF. Implementing this
interface is not a requirement for a TUF implementation.

Other software interfaces are possible. The described interface
draws from the sockets interface for UDP. The described interface
might be natural for applications already designed to support both
TCP and UDP, or that do network input and output in complete PDU
units. For applications that perform octet-at-a-time style input
and output, an alternative interface that draws from the tradition
of the TCP URG pointer interface (e.g. using a MSG_OOB flag to
send()) is equally possible. An implementation may even offer
several different interfaces to TUF.

That said, the sockets support sketched below might well provide
the basis for a complete, standard interface to be described
outside this draft.

A.1 Basic Principles

The sockets support for TUF takes the form of a set of socket
options that may be set or requested to enable the appropriate
behavior.

A socket may be in one of two TUF-related modes in the send
direction:

   1. TUF-compliant TCP sender mode. No data (FPDU headers) is
      added to the TCP octet stream, but each data buffer presented
      in a sending operation is to be sent according to the rules of
      TCP and TUF-compliant TCP senders. This mode provides direct
      access to a TUF-compliant TCP sender for purposes such as
      implementing TUF.

   2. TUF sender mode. An FPDU header is added to data presented by
      an integral number of sending operations, and the FPDU is
      passed to a TUF-compliant TCP sender for transmission.

A socket may be in one TUF-related mode in the receive direction:

   1. TUF receiver mode. FPDUs are expected in each TCP segment.
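
As an aid to understanding receiver mode, the per-segment test of
Section 4.4 might look roughly like the following. This is a
minimal sketch and not part of the interface described in this
appendix. The function name tuf_segment_contained is hypothetical,
and the sketch ASSUMES that the Length and Key fields are carried
in network byte order, which this document does not state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TUF_HDR_LEN  8u                  /* 16-bit Length + 48-bit Key */
#define TUF_KEY_MASK 0xFFFFFFFFFFFFULL   /* low 48 bits */

/*
 * Return true if the TCP segment payload looks like exactly one FPDU:
 *   - payload length == Length field + 8-octet FPDU header, and
 *   - Key field == the key signaled when TUF was enabled.
 * ASSUMPTION: both fields are in network byte order.
 */
static bool tuf_segment_contained(const uint8_t *payload,
                                  size_t payload_len,
                                  uint64_t expected_key)
{
    if (payload_len < TUF_HDR_LEN)
        return false;                    /* too short to hold a header */

    size_t fpdu_len = ((size_t)payload[0] << 8) | payload[1];

    uint64_t key = 0;
    for (int i = 0; i < 6; i++)          /* most significant octet first */
        key = (key << 8) | payload[2 + i];

    return payload_len == fpdu_len + TUF_HDR_LEN &&
           key == (expected_key & TUF_KEY_MASK);
}
```

The check uses only the segment in hand, so it can be applied to
segments in whatever order they arrive. A segment that fails the
check is simply handled without direct placement; as Section 4.4
requires, the sketch does not scan the payload for FPDUs at other
offsets.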

If a socket receiving operation is used to retrieve received data
(as opposed to the data being directly placed), FPDU headers are
removed before the data is returned.

A.2 Enabling TUF

   /* Pick a sending mode */
   if (sendMode == TUF_TCP)
       mode = TUF_SEND_TCP;
   else
       mode = TUF_SEND;

   mode |= TUF_RECEIVE;

   setsockopt (s, SOL_TCP, TUF_MODE, &mode, sizeof(mode));

A.3 Sending Data

The standard socket sending operations, including send(), sendto(),
sendmsg(), writev(), and others are used to send ULPDUs in TUF.
The EMSGSIZE error should be returned if the buffer passed to the
sending operation would result in an FPDU that does not fit in an
EMSS-sized TCP segment, unless oversized ULPDU errors are disabled,
as described below.

When the path EMSS increases, the sending operation MAY return
EMSGSIZE once to inform the client of the change.

A.4 Retrieving The Current EMSS or MULPDU

   len = sizeof(emss);   /* getsockopt() takes a pointer to the length */
   getsockopt (s, SOL_TCP, TUF_MULPDU, &emss, &len);

If the socket is in TUF_SEND_TCP mode, this call returns the TCP
EMSS. If the socket is in TUF_SEND mode, the call returns the
maximum ULPDU that can be submitted in a sending operation without
requiring fragmentation of the associated FPDU.

The number should not count any octets that go towards TCP options.

A.5 Disabling ULPDU Packing

   flag = 0;
   setsockopt (s, SOL_TCP, TUF_PACK_PDUS, &flag, sizeof(flag));

This call disables TUF from packing more than one ULPDU into an
FPDU. By default, ULP PDU packing is enabled.

A.6 Disabling The Report of Oversized ULPDUs

   flag = 0;
   setsockopt (s, SOL_TCP, TUF_REPORT_OVERSIZED, &flag,
               sizeof(flag));

This call disables sending operations from returning EMSGSIZE in
response to oversized ULPDUs. It may be called at any time on a
socket, whether connected or not.
It is used to continue ULP
operation when MULPDU is already known to be too small to permit
some ULPDUs to be sent without segmentation. Oversized ULPDU
reporting can be enabled again if PMTU is discovered to have
increased.

Full Copyright Statement

Copyright (C) The Internet Society (2001). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this
paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards process
must be followed, or as required to translate it into languages
other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.