                                                        J. Williams
Internet-draft                                          Emulex
Expires: September 2001                                 J. Pinkerton
                                                        Microsoft
                                                        C. Sapuntzakis
                                                        Cisco
                                                        J. Wendt
                                                        HP
                                                        J. Chase
                                                        Duke
                                                        13 March 2001

                        ULP Framing for TCP
                   draft-williams-tcpulpframe-01

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

This document proposes a framing protocol for TCP which is designed
to be fully compliant with applicable TCP RFCs and fully
interoperable with existing TCP implementations. The framing
mechanism is designed to work as a "shim" between TCP and
higher-level protocols, preserving the reliable, in-order delivery
of TCP while adding the preservation of higher-level protocol record
boundaries if the record is less than or equal to the path MTU. The
shim is designed to enable hardware acceleration of data movement
operations (e.g. direct placement of received TCP segments into
higher-level protocol buffers) for the protocols that use it, even
if TCP segments are delivered out-of-order.

Introduction

This document proposes a TCP convention that facilitates hardware
acceleration for upper-level protocols (ULPs) using TCP as a
transport.
The proposal does not change the semantics of TCP. It is designed
to be fully compliant with applicable RFCs and fully interoperable
with existing TCP implementations.

The proposal supports ULPs that use TCP to transmit ordered streams
of ULP messages, also called Protocol Data Units or PDUs. Many such
ULPs exist today, including NFS, CIFS, ONC RPC, IIOP, and HTTP 1.1.
A scheme to locate ULP PDU boundaries in a data stream is called
"framing". These ULPs use byte-counting above TCP to implement
framing within the ordered TCP byte stream, i.e., the header of each
PDU contains the PDU's length.

This proposal is intended to support a new ULP that acts as a "shim"
between TCP and higher-level protocols. The shim implements framing
and other features above TCP, and acts as a TCP-based transport for
higher-level protocols. The shim ULP is designed to enable hardware
acceleration of data movement operations for the protocols that use
it. Details of the shim are beyond the scope of this document.

The purpose of this proposal is to allow a TCP-based ULP sender and
receiver, by mutual consent negotiated at the ULP endpoints, to
observe conventions that enable a hardware-accelerated receiver to
locate ULP headers in the incoming data stream, even if a previous
TCP segment is lost. A TCP implementation that supports these
conventions is called "framing-aware".

Framing-aware senders and receivers are fully interoperable with TCP
implementations that are not framing-aware. Traffic between a
framing-aware sender and a framing-aware receiver is
indistinguishable from traffic between existing TCP implementations
that conform to existing applicable RFCs. Any TCP-based ULP,
including the proposed shim, will run correctly even if the sender
or receiver is not framing-aware. The purpose of the framing
conventions is only to improve performance.

If both the sender and receiver are framing-aware, and the ULP
endpoints negotiate to enable the proposed framing conventions, then
the connection is called a "framing connection". The proposed
conventions enable a hardware-accelerated receiver to handle
incoming data on a framing connection efficiently. In general, the
framing conventions allow the receiver to reliably locate ULP PDU
headers contained in TCP segments. Although the receiver does not
immediately process the ULP headers or data, it may use PDU header
fields as a hint to guide its choice of where to place the segment
in the receiver's memory for later processing.

The framing conventions for a framing connection are as follows.
The sending ULP attempts to align its PDU headers with TCP segment
boundaries. In common existing TCP implementations (e.g., BSD
kernels), this may be implemented in the ULP and network buffering
code without modifying the TCP implementation: the sending ULP may
generate its PDU headers as TCP upcalls it for data to fill each
segment, fragmenting or coalescing its PDUs as necessary.

In certain instances it may not be possible for the sender to
guarantee this alignment. In particular, the sender may be forced
to violate the alignment if it must retransmit a previously
transmitted PDU after an MTU change. In this case it is necessary
to signal the receiver that the alignment has been violated.

The purpose of this proposal is to define a convention to allow the
sender to notify the receiver that a segment is not aligned on a PDU
boundary, without allocating a bit in the TCP header for this
purpose.

The framing connection algorithm is viewed as a transition step
between currently deployed TCP protocol stacks and the eventual
deployment of SCTP.
Because SCTP supports framing directly, rather than requiring a shim
to be implemented on top of it, it is expected that this framing
mechanism will eventually be replaced by SCTP when SCTP becomes
pervasive.

1. ULP Support for Framing

A ULP will submit PDUs to the framing protocol. In the standard
mode, the ULP PDUs are limited to the smaller of 2^16-1 (65535)
bytes and the size that will fit entirely within a TCP segment. The
framing protocol MUST fail any attempt to submit a ULP PDU that is
larger than will fit in a TCP segment.

The TCP maximum segment size (MSS) can shrink to 8 bytes [see
PathMTU] which leaves no room for ULP PDUs. If the MSS goes below
512 bytes, the ULP MAY instruct the framing protocol to enter an
"emergency mode." In this mode, the framing protocol MUST accept ULP
PDUs up to 512 bytes and MAY fragment the ULP PDUs across TCP
segments.

Framing-aware TCP implementations will indicate to the framing
protocol the path maximum segment size (MSS). This size may change
during the course of the connection due to changes in the path MTU.
The framing protocol MUST notify the ULP sender of changes in the
MSS. The framing protocol MUST provide the current value of the
path MSS to the ULP on request.

2. Framing Protocol

Each ULP PDU will be encapsulated in a framing PDU. The format of
the framing PDU is as follows.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |              Key              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              Key                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                                                               |
   ~                                                               ~
   ~                            ULP PDU                            ~
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            ULP PDU            |      PAD (up to 3 bytes)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The "Length" field is 16 bits and contains the length in bytes of
the ULP PDU.

The PAD field trails the ULP PDU and contains between zero and three
bytes of data. The pad data must be set to zero by the sender and
ignored by the receiver. The length of the pad is set so as to make
the length of the framing PDU a multiple of four bytes.

The KEY field is 48 bits and its usage depends on whether the
sending TCP implementation is framing aware. If the sending TCP
implementation is NOT framing aware (i.e. is a conventional TCP
implementation) then the framing protocol must set the key to zero.
If the sending TCP implementation is framing aware, then the KEY
value is a non-zero random value selected by the sender at
connection setup time. All framing PDUs sent on a given connection
in one direction must use the same (original) KEY value. Each
direction will in general have a different KEY value.

The length of the framing PDU in bytes will be 8 + round(L), where L
is the length of the ULP PDU and round(L) is the value of L rounded
up to a multiple of four (round(L) = L when L is already a multiple
of four). The length of the ULP PDU may or may not be a multiple of
four.

3. Encapsulation of Framing PDUs within a TCP Stream

If a TCP connection supports the ULP Framing Protocol, then all data
sent on that connection must be Framing PDUs. There is no provision
to mix both framed and unframed data on the same connection.

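As a non-normative illustration of the framing PDU format in Section
2, the following C sketch computes the framing PDU length 8 + round(L)
and encodes one framing PDU into a buffer. The function names, the
buffer-based interface, and the use of network (big-endian) byte order
for the Length and KEY fields are assumptions made for this example;
the draft does not specify them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAMING_HEADER_LEN 8u   /* 16-bit Length + 48-bit KEY */

/* round(L): L rounded up to a multiple of four (hypothetical helper). */
size_t framing_round4(size_t len)
{
    return (len + 3u) & ~(size_t)3u;
}

/* Total framing PDU length in bytes: 8 + round(L). */
size_t framing_pdu_len(uint16_t ulp_len)
{
    return FRAMING_HEADER_LEN + framing_round4(ulp_len);
}

/*
 * Encode one framing PDU into out[]. Returns the number of bytes
 * written, or 0 if out_cap is too small. key48 must fit in 48 bits;
 * it is zero for a non framing aware sender. Big-endian field
 * encoding is an assumption of this sketch.
 */
size_t framing_encode(uint8_t *out, size_t out_cap, uint64_t key48,
                      const uint8_t *ulp, uint16_t ulp_len)
{
    size_t total = framing_pdu_len(ulp_len);
    int i;

    assert((key48 >> 48) == 0);
    if (out_cap < total)
        return 0;

    out[0] = (uint8_t)(ulp_len >> 8);
    out[1] = (uint8_t)(ulp_len & 0xff);
    for (i = 0; i < 6; i++)
        out[2 + i] = (uint8_t)(key48 >> (8 * (5 - i)));

    if (ulp_len > 0)
        memcpy(out + FRAMING_HEADER_LEN, ulp, ulp_len);

    /* PAD: zero to three zero bytes after the ULP PDU. */
    memset(out + FRAMING_HEADER_LEN + ulp_len, 0,
           total - FRAMING_HEADER_LEN - ulp_len);
    return total;
}
```

For example, a 5-byte ULP PDU yields a 16-byte framing PDU under this
sketch: 8 bytes of header, 5 bytes of ULP data, and 3 bytes of PAD.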
Two types of TCP implementations are supported, framing aware and
non framing aware. The requirements for each are as follows.

3.1. Non Framing Aware TCP implementations

Conventional TCP implementations without special support for framing
are considered "non framing aware". In this case the KEY field of
the framing header must be set to zero. There are no requirements
beyond the standard TCP requirements.

3.2. Framing Aware TCP implementation

Framing aware TCP implementations must notify the framing protocol
of changes in the path maximum segment size (PMSS). The framing
protocol must be able to retrieve the PMSS from the framing-aware
TCP.

Because of changes in the PMSS, there may be cases when a fully
framing aware ULP will fail to create PDUs which fit in a TCP
segment. This can occur, for example, when retransmitting framing
segments after a path MSS change. The use of oversize TCP segments
sent by means of IP fragmentation is discouraged due to the limited
ID number size of IP and the potential for undetected error due to
ID number wrap. Framing aware TCP implementations should resegment
at the TCP layer when necessary to meet requirements of the path
MSS.

If a framing PDU must be split across multiple TCP segments, then
the sending TCP implementation must ensure that each TCP segment
containing a piece of the split framing PDU has a length which is
NOT a multiple of four. See Appendix A for an algorithm the sender
can use to ensure this property.

4. Validity of Framing-aware TCP segmentation

A framing-aware TCP might send more segments than a non-framing
aware TCP and each segment might be smaller. A non-framing aware
TCP might in some circumstances be able to place multiple messages
in a segment, splitting the first or last or both messages.
However, framing-aware TCPs still respect congestion control
windows, which are maintained as a byte count not as a segment
count. Also, in the worst case, the framing-aware TCP sends out a
little under twice as many segments as the non-framing aware TCP.
This is the only extra load on the network.

On retransmission, a framing-aware TCP respects the original stream
segmentation. This is allowed by RFC1122, section 4.2.2.15.

5. TCP Receiver Framing Recovery

Because each framing PDU contains sufficient information to
determine its length, the beginning of the next framing PDU can be
determined. Therefore each successive ULP PDU can be recovered.
Conventional TCP implementations will pass received data to the ULP
in order, so framing is easily recovered by the ULP.

Special framing aware TCP receive implementations may allow the ULP
to do immediate data placement on TCP segments received out of
order. The receiving end can safely assume that a framing header is
aligned with the beginning of the TCP segment's payload if the
following conditions are met.

   1. Standard TCP processing indicates that this is a valid,
      in-window segment.

   2. The remote sending TCP implementation is framing aware as
      evidenced by a non-zero KEY on all previous framing PDUs.

   3. The received TCP segment length is a multiple of four.

   4. No evidence of a resegmenting middle-box has been observed on
      this connection. Evidence of a resegmenting middle-box would
      be a previously received TCP segment whose length is a
      multiple of four and which contained a piece of a split
      framing PDU.

   5. The data contained in the TCP segment parses correctly when
      interpreted as one or more framing PDUs. In particular, all
      the KEYs are correct, and the lengths add up to the length of
      the containing TCP segment.

The framing protocol passes the ULP PDU to a ULP parser.
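The five conditions above can be sketched in C as a single predicate
over one received segment. This is only an illustration under stated
assumptions: the structure and function names are hypothetical, the
Length and KEY fields are assumed to be in network byte order, and the
caller is assumed to have learned the peer's KEY from in-order data
and to track any evidence of a resegmenting middle-box itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection receive state for this sketch. */
struct framing_rx_state {
    uint64_t peer_key;       /* KEY seen on all previous framing PDUs;
                                0 means the peer is not framing aware */
    bool resegmenter_seen;   /* evidence of a resegmenting middle-box */
};

/* Read a 48-bit KEY, assuming network (big-endian) byte order. */
static uint64_t read_key48(const uint8_t *p)
{
    uint64_t k = 0;
    int i;

    for (i = 0; i < 6; i++)
        k = (k << 8) | p[i];
    return k;
}

/*
 * Returns true if the payload may be assumed to start with a framing
 * header. tcp_in_window is the result of standard TCP processing.
 */
bool framing_segment_is_aligned(const struct framing_rx_state *st,
                                bool tcp_in_window,
                                const uint8_t *payload, size_t len)
{
    size_t off = 0;

    if (!tcp_in_window)                 /* condition 1 */
        return false;
    if (st->peer_key == 0)              /* condition 2 */
        return false;
    if (len == 0 || len % 4 != 0)       /* condition 3 */
        return false;
    if (st->resegmenter_seen)           /* condition 4 */
        return false;

    /* Condition 5: parse as one or more framing PDUs. */
    while (off < len) {
        size_t ulp_len, pdu_len;

        if (len - off < 8)
            return false;
        ulp_len = ((size_t)payload[off] << 8) | payload[off + 1];
        if (read_key48(payload + off + 2) != st->peer_key)
            return false;
        pdu_len = 8 + ((ulp_len + 3u) & ~(size_t)3u);
        if (pdu_len > len - off)
            return false;
        off += pdu_len;
    }
    return true;    /* lengths add up exactly to the segment length */
}
```

A segment that fails this check is not an error; in this sketch the
receiver would simply leave it for conventional in-order TCP
processing instead of attempting direct placement.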
The ULP parser determines where in memory the various parts of the
ULP PDU shall be placed. The ULP parser MUST NOT execute the ULP
protocol (i.e. none of the ULP protocol state variables change).

6. Validity of the Alignment Algorithm

The objective of the transmit and receive algorithms is to ensure
that the receiver, when processing an out of order TCP segment,
never assumes alignment of the framing header with the TCP segment
when in fact alignment is not the case. In the absence of a
middle-box which resegments the TCP stream, this should never occur.

In the presence of such a middle-box, every effort is made to avoid
making an invalid alignment assumption. However, in the extremely
rare case that the middle-box maintained perfect alignment until the
critical moment when an out of order TCP segment is received at the
destination, avoiding erroneous processing of the data depends on
the sufficiently low probability that the data stream will contain,
at the non-aligned point, one or more valid framing headers whose
lengths sum to the TCP segment length AND whose KEYs are valid.

7. Security considerations

The modification of the sender's TCP segmentation algorithm does not
open any new attacks, since:

   1) the segmentation algorithm is not based on input from the
      network,

   2) the segmentation algorithm packs small ULP PDUs into a single
      TCP segment so it does not open packet flooding attacks.

If an attacker can send an in-window TCP segment that is accepted,
the attacker can probably force the TCP receiver out of the framing
recovery mode, degrading service. However, such an attacker can
also place data into the stream, so it is not entirely clear how
compelling this attack is.

The ULP framing protocol works on top of an unmodified TCP. As
such, TLS may be used to secure any protocol using the ULP framing
protocol. However, when TLS is in use, the framing protocol is best
layered under TLS.
If this is done, the ULP framing protocol headers are NOT protected
by the TLS authentication and integrity features. If an attacker
modifies the ULP framing protocol headers, the attacker can corrupt
the TLS packet. TLS will detect this corruption because of its
integrity checks and terminate the connection. The authors do not
believe this denial-of-service attack is any simpler than inserting
data into the TCP stream and corrupting TLS that way.

8. References

[ALF]      D. D. Clark and D. L. Tennenhouse, "Architectural
           considerations for a new generation of protocols," in
           SIGCOMM Symposium on Communications Architectures and
           Protocols, (Philadelphia, Pennsylvania), pp. 200--208,
           IEEE, Sept. 1990. Computer Communications Review, Vol.
           20(4), Sept. 1990.

[SOCKS]    Leech, M., and others, "SOCKS Protocol Version 5," RFC
           1928, April 1996

[RFC1122]  Braden, R., ed., "Requirements for Internet Hosts --
           Communications Layers", RFC 1122, October 1989

[PathMTU]  Mogul, J., and Deering, S., "Path MTU Discovery", RFC 1191

[RFC2581]  Allman, M. and others, "TCP Congestion Control," RFC 2581,
           April 1999

[Stevens]  Stevens, W. Richard, "Unix Network Programming Volume 1,"
           Prentice Hall, 1998, ISBN 0-13-490012-X

[TCP]      Postel, J., "Transmission Control Protocol - DARPA
           Internet Program Protocol Specification", RFC 793,
           September 1981

[TLS]      Dierks, T. and others, "The TLS Protocol, Version 1.0",
           RFC 2246

Appendix A: Segmentation algorithm for Framing-aware TCP

   SeqNextByte = next byte to send
   SeqStartFrame = start of the current Framing protocol frame
   SeqStartNextFrame = start of the next Framing protocol frame
   PathMss = maximum segment size with options subtracted

   // How many bytes do we have to send in the current frame
   FrameBytesLeft = SeqStartNextFrame - SeqNextByte

   if (FrameBytesLeft <= PathMss) {
       if (SeqNextByte == SeqStartFrame || FrameBytesLeft % 4) {
           // Pack as many complete framing protocol frames in a
           // segment as possible
           SegmentBytesLeft = PathMss
           do {
               copy frame into segment
               SegmentBytesLeft -= FrameBytesLeft
               update SeqStartFrame and SeqStartNextFrame to point
                   to next frame
               FrameBytesLeft = SeqStartNextFrame - SeqStartFrame
           } while (FrameBytesLeft < SegmentBytesLeft);
       } else {
           // This case happens when the remote TCP acknowledges
           // up to an even byte boundary
           send segment with FrameBytesLeft - 1 bytes
       }
   } else {
       // FrameBytesLeft > PathMss
       if ((PathMss % 4) && (FrameBytesLeft - PathMss) % 4) {
           send segment with the next PathMss bytes from current
               frame
       } else if (((PathMss - 1) % 4) &&
                  (FrameBytesLeft - (PathMss - 1)) % 4) {
           send segment with PathMss - 1 bytes
       } else if (((PathMss - 2) % 4) &&
                  (FrameBytesLeft - (PathMss - 2)) % 4) {
           send segment with PathMss - 2 bytes
       }
   }

Appendix B: Sockets support for framing-aware TCP senders

B.1.1. Creating a TCP socket with segmentation support

   s = socket(PF_INET, SOCK_STREAM, getprotobyname("tcp")->p_proto);
   flag = 1;
   setsockopt(s, SOL_TCP, TCP_FRAMING_AWARE, &flag, sizeof(flag));

A TCP that does not support segmentation MUST fail the setsockopt
call. The setsockopt call MUST NOT be made on an open TCP
connection.

B.1.2. Sending data atomically on that socket

The send or sendmsg calls should be used to write data to the TCP
stream. The EMSGSIZE error should be returned if the buffer passed
to send or sendmsg is too large to fit in a single TCP segment.

When the path MSS increases, the TCP MAY return EMSGSIZE once to
inform the client of the change.

B.1.3. Retrieving the max segment size

   getsockopt(s, SOL_TCP, TCP_SEND_MSS, &mbs, &mbslen);

This call returns the maximum segment size that can be sent without
fragmentation. The number should not count any bytes that go
towards TCP options.

Authors' Addresses

   Jim Williams
   Giganet, Inc.
   Concord Office Center
   580 Main Street
   Bolton, MA 01740 US
   Phone: +1 978 779 7224
   EMail: jimw@giganet.com

   Jim Pinkerton
   Microsoft, Inc.
   1 Microsoft Way
   Redmond, WA 98052 USA
   EMail: jpink@microsoft.com

   Constantine Sapuntzakis
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA 95134 USA
   Phone: +1 408 525 5497
   EMail: csapuntz@cisco.com

   Jim Wendt
   Hewlett Packard Corporation
   8000 Foothills Boulevard MS 5668
   Roseville, CA 95747-5668 USA
   Phone: +1 916 785 5198
   EMail: jim_wendt@hp.com

   Jeff Chase
   Duke University
   Email: chase@cs.duke.edu

Full Copyright Statement

Copyright (C) The Internet Society (2001). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards process
must be followed, or as required to translate it into languages
other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.