Internet Engineering Task Force                               S. Dawkins
INTERNET DRAFT                                             G. Montenegro
                                                                 M. Kojo
                                                               V. Magret
                                                        October 21, 1999
                                                  Expires April 21, 2000

          End-to-end Performance Implications of Slow Links
                     draft-ietf-pilc-slow-02.txt

Status of This Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026.

Comments should be submitted to the PILC mailing list at pilc@grc.nasa.gov.

Distribution of this memo is unlimited.

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document makes performance-related recommendations for users of network paths that traverse "very low bit-rate" links.

"Very low bit-rate" implies "slower than we would like". This recommendation may be useful in any network where hosts can saturate available bandwidth, but the design space for this recommendation explicitly includes connections that traverse 56 Kb/second modem links or 4.8 Kb/second wireless access links - both of which are widely deployed.

This document discusses general-purpose mechanisms. Where application-specific mechanisms can outperform the relevant general-purpose mechanism, we point this out and explain why.

This document has some recommendations in common with RFC 2689, "Providing integrated services over low-bitrate links", especially in areas like header compression. This document focuses more on traditional data applications for which "best-effort delivery" is appropriate.

Changes since last draft:

Rewrite of Abstract to say less about history and more about technical motivation.

Addition of considerations about MTU sizes.

Clarification about whether TCP timestamps are actually recommended(!).

Clarify discussion of "Interactions with TCP Congestion Avoidance", and add discussion of "Buffer Auto-Tuning".

Other editorial changes and corrections.

Table of Contents

1.0 Introduction
2.0 Description of Optimizations
    2.1 Header Compression Alternatives
    2.2 Payload Compression Alternatives
    2.3 Interactions with TCP Congestion Avoidance [RFC2581]
    2.4 Choosing MTU Sizes
    2.5 Small Window Effects (Experimental)
    2.6 TCP Buffer Auto-tuning
3.0 Summary of Recommended Optimizations
4.0 Acknowledgements
5.0 References
Authors' addresses

1.0 Introduction

The Internet protocol stack was designed to span a wide range of link speeds, and has met this design goal with only a limited number of enhancements (for example, the use of TCP window scaling as described in "TCP Extensions for High Performance" [RFC1323] for very-high-bandwidth connections).

Pre-World Wide Web application protocols tended to be either interactive applications sending very little data (e.g., Telnet) or bulk transfer applications that did not require interactive response (e.g., File Transfer Protocol, Network News). The World Wide Web has given us traffic that is both interactive and "bulky", including images, sound, and video.

The World Wide Web has also popularized the Internet, so that there is significant interest in accessing the World Wide Web over link speeds that are much "slower" than typical desktop host speeds.

In order to provide the best interactive response for these "bulky" transfers, implementors may wish to minimize the number of bits actually transmitted over these "slow" connections. There are two areas that can be considered - compressing the bits that make up the overhead associated with the connection, and compressing the bits that make up the payload being transported over the connection.

In addition, implementors may wish to consider TCP receive window settings and queuing mechanisms as techniques to improve performance over low-speed links. While these techniques don't involve protocol changes, they are included in this document for completeness.

2.0 Description of Optimizations

This section describes optimizations which have been suggested for use in situations where hosts can saturate their links. The next section summarizes recommendations about the use of these optimizations.

2.1 Header Compression Alternatives

Mechanisms for TCP and IP header compression defined in [RFC1144, RFC2507, RFC2508, RFC2509] provide the following benefits:

- Improve interactive response time

- Allow using small packets for bulk data with good line efficiency

- Allow using small packets for delay sensitive low data-rate traffic

- Decrease header overhead (for a typical dialup MTU of 296 bytes, the overhead of TCP/IP headers can decrease from about 13 percent with typical 40-byte headers to 1-1.5 percent with 3-5 byte compressed headers, for most packets)

- Reduce packet loss rate over lossy links (simply because shorter transmission times expose packets to fewer events that cause loss).

Van Jacobson (VJ) header compression [RFC1144] describes a Proposed Standard for TCP header compression that is widely deployed. It uses TCP timeouts to detect a loss of synchronization between the compressor and decompressor.

A more recent header compression proposal [RFC2507] includes an explicit request for retransmission of an uncompressed packet to allow resynchronization without waiting for a TCP timeout (and executing congestion avoidance procedures).

Recommendation: Implement [RFC2507], in particular as it relates to IPv4 tunnels and Minimal Encapsulation for Mobile IP, as well as TCP header compression for lossy links and links that reorder packets. PPP capable devices should implement "IP Header Compression over PPP" [RFC2509]. [RFC1144] header compression should only be enabled when operating over reliable "slow" links, because even a single bit error may result in dropping a full TCP window, waiting for a full RTO, and performing slow-start unnecessarily.

[RFC1323] defines a "TCP Timestamp" option, used to prevent "wrapping" of the TCP sequence number space on high-speed links, and to improve TCP RTT estimates by providing unambiguous TCP roundtrip timings. Use of TCP timestamps prevents header compression, because the timestamps are sent as TCP options. This means that each timestamped header has TCP options that differ from the previous header, and headers with changed TCP options are always sent uncompressed.

For these reasons, and because connections traversing "slow" links do not require protection against TCP sequence-number wrapping, use of TCP Timestamps is not recommended for these connections.

2.2 Payload Compression Alternatives

Compression of IP payloads is also desirable on "slow" network links. "IP Payload Compression Protocol (IPComp)" [RFC2393] defines a framework where common compression algorithms can be applied to arbitrary IP segment payloads.

IP payload compression is something of a niche optimization. It is necessary because IP-level security converts IP payloads to random bitstreams, defeating commonly-deployed link-layer compression mechanisms which are faced with payloads that have no redundant "information" that can be more compactly represented.

However, many IP payloads are already compressed (images, audio, video, "zipped" files being FTPed), or are already encrypted above the IP layer (e.g., SSL [SSL]/TLS [RFC2246]). These payloads will not "compress" further, limiting the benefit of this optimization.

For uncompressed HTTP payload types, HTTP/1.1 [RFC2616] also includes Content-Encoding and Accept-Encoding headers, supporting a variety of compression algorithms for common compressible MIME types like text/plain. This leaves only the HTTP headers themselves uncompressed.

The most recent HTTP-NG proposal [HTTP-NG] replaces the text-based HTTP header representation with a binary representation for compactness.

In general, application-level compression can often outperform IPComp, because of the opportunity to use compression dictionaries based on knowledge of the specific data being compressed.

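The dictionary advantage can be illustrated with a minimal sketch using the preset-dictionary support in Python's standard zlib module. The dictionary contents, the sample message, and the helper names deflate() and inflate() are assumptions made for illustration only; they are not taken from IPComp, HTTP/1.1, or HTTP-NG, and both endpoints would have to agree on the dictionary in advance.

```python
import zlib

# Hypothetical preset dictionary: strings assumed to be common in the
# traffic being compressed. Both ends must share it ahead of time.
PRESET_DICT = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: \r\n"
    b"Cache-Control: max-age=\r\n"
    b"Connection: keep-alive\r\n"
)


def deflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress data, optionally priming the compressor with a dictionary."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: bytes = b"") -> bytes:
    """Reverse deflate(); the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


# Illustrative message: a short block of response headers.
message = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 512\r\n"
    b"Connection: keep-alive\r\n\r\n"
)

generic = deflate(message)              # no knowledge of the data
primed = deflate(message, PRESET_DICT)  # dictionary tuned to the data

print(len(message), len(generic), len(primed))
```

A generic compressor sees little redundancy inside a single short message, while the primed compressor can replace most of it with references into the shared dictionary. The gain depends entirely on how well the dictionary matches the data.
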
All these compression techniques will reduce the need for IPComp, especially for WWW users.

Recommendation: IPComp may optionally be implemented. Track HTTP-NG standardization (or any proposed mechanism that will compress HTTP headers).

2.3 Interactions with TCP Congestion Avoidance [RFC2581]

In many cases, TCP connections that traverse slow links have the slow link as an "access" link, with higher-speed links in use for most of the connection path. One common configuration might be a laptop computer using dialup access to a terminal server, with an HTTP server on a high-speed LAN "behind" the terminal server.

The HTTP server may be able to place packets on a directly-attached high-speed LAN at a higher rate than the terminal server can forward them on the low-speed link. The consequence is that the terminal server will be unable to buffer unlimited traffic intended for the low-speed link, and will begin to "drop" the excess packets.

The self-clocking nature of TCP's slow start and congestion avoidance algorithms prevents this buffer overrun from continuing, but these algorithms also allow senders to "probe" for available bandwidth - cycling through an increasing rate of transmission until loss occurs, followed by a dramatic (50-percent) drop in transmission rate. This happens when a host directly connected to a low-speed link offers a receive window that is unrealistically large for the low-speed link. The peer host continues to probe for available bandwidth, trying to fill the receive window, until packet loss occurs.

Hosts that are directly connected to low-speed links should limit the receive windows they advertise. This recommendation takes two forms:

- Modern operating systems are using increasingly large default TCP receive buffers, in order to maximize throughput on high-speed links. Users should be able to choose the default receive window size in use - typically a system-wide parameter. (This "choice" may be as simple as "dial-up access/LAN access" on a dialog box - this would accommodate many environments without requiring hand-tuning by experienced network engineers).

- Application developers should rely on the system default, instead of increasing the receive buffer in use (typically via a socket option), to accommodate users connecting via low-speed links. If an application does manage the receive buffer in use, this should still be under the user's control, as previously suggested.

For example, in the case (described in [RFC2416]) where a modem has only three buffers, whenever the HTTP server returns four back-to-back packets, one will be dropped. If this bottleneck link causes the TCP window to be less than four to five segments, it will not be possible to receive three duplicate acknowledgements, so Fast Retransmit/Fast Recovery will never happen, and TCP recovery will take place with full RTO and slow start.

In this case, the common MTU of 296 bytes gives an MSS of 256 bytes, so an appropriate receive buffer size would be 768 bytes (three 256-byte segments, one per modem buffer) - any value larger would allow unproductive probing for non-existent bandwidth.

This recommendation is applicable in environments where the host "knows" it is always connected to other hosts via "slow links". For hosts that may connect to other hosts over a variety of links (e.g., dial-up laptop computers with LAN-connected docking stations), buffer auto-tuning is a more reasonable recommendation, and is discussed below.

2.4 Choosing MTU Sizes

There are several points to keep in mind when choosing an MTU for low-speed links.

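The points that follow all turn on how long one full-sized packet occupies the link. A minimal sketch of that arithmetic is given here, assuming the nominal link bit rate and ignoring link-layer framing, header compression, and modem compression, so the figures are rough. The function names are hypothetical, and the link speeds are the 56 Kb/second and 4.8 Kb/second examples from the Abstract.

```python
def transmit_time_ms(packet_bytes: int, link_bps: int) -> float:
    """Time one packet occupies the link, in milliseconds.

    Assumes the nominal bit rate; framing overhead is ignored.
    """
    return packet_bytes * 8 * 1000 / link_bps


def largest_mtu(link_bps: int, budget_ms: int) -> int:
    """Largest packet (in bytes) that fits within a time budget."""
    return link_bps * budget_ms // 8000


for bps in (56_000, 4_800):
    print(bps, "b/s:",
          "1500-byte packet takes", round(transmit_time_ms(1500, bps), 1), "ms;",
          "200 ms budget allows", largest_mtu(bps, 200), "bytes")
```

Under these assumptions even a 56 Kb/second link needs slightly more than 200 milliseconds to carry a 1500-byte packet, and a 4.8 Kb/second link stays within a 200 millisecond budget only for packets of about 120 bytes.
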
First, using an MTU that takes more than 200 milliseconds to transmit effectively turns off delayed acknowledgements, because the receiver will never receive a second full-sized segment before the delayed acknowledgement timer expires.

Second, "relatively large" MTUs (which take human-perceptible amounts of time to be transmitted into the network) create human-perceptible delays in other connections using the same network interface. [RFC1144] considers 100-200 millisecond delays as human-perceptible. If it is possible to do so, MTUs should be chosen that do not monopolize network interfaces for human-perceptible amounts of time.

The convention of 296-byte MTUs for dialup access was chosen to limit the time a single MTU-sized packet occupies the link to 100-200 milliseconds on 9.6 Kb/second links [RFC1144], and implementors should not choose MTUs that will occupy a network interface for more than 100-200 milliseconds.

2.5 Small Window Effects (Experimental)

If a TCP connection stabilizes with a window of only a few segments (as would be expected on a "slow" link), the sender isn't sending enough segments to generate the three duplicate acknowledgements needed to trigger fast retransmit/fast recovery. This means that a retransmission timeout is required to repair the loss - dropping the TCP connection to a congestion window with only one segment.

[TCPB98] and [TCPF98] observe that (in studies of network trace datasets) it is relatively common for TCP retransmission timeouts to occur even when some duplicate acknowledgements are being sent. The challenge is to use these duplicate acknowledgements to trigger fast retransmit/fast recovery without injecting traffic into the network unnecessarily - and especially not injecting traffic in ways that will result in instability.

In these situations, it may be desirable to trigger fast retransmit/fast recovery more aggressively. [TCPB98] and [TCPF98] suggest sending a new segment when the first and second duplicate acknowledgements are received, so that the receiver will continue to generate duplicate acknowledgements until the TCP retransmit threshold is reached, triggering fast retransmit/fast recovery. We note that a maximum of two additional new segments will be sent before the receiver sends either an acknowledgement advancing the window or two additional duplicate acknowledgements, triggering fast retransmit/fast recovery, and that these new segments will be acknowledgement-clocked, not back-to-back.

The alternative, lowering the fast retransmit/fast recovery threshold, is more likely to inject unnecessary retransmissions when the duplicate acknowledgements are the result of out-of-order delivery to the far-end TCP [PAX97].

2.6 TCP Buffer Auto-tuning

[SMM98] recognizes a tension between the desire to allocate "large" TCP buffers, so that network paths are fully utilized, and a desire to limit the amount of memory dedicated to TCP buffers, in order to efficiently support large numbers of connections to hosts over network paths that may vary by six orders of magnitude.

The technique proposed is to dynamically allocate TCP buffers, based on the current effective window, rather than attempting to preallocate TCP buffers based on anticipated window sizes that may be achieved. This proposal results in receive buffers that are appropriate for the window sizes in use, and send buffers large enough to contain two windows of segments, so that SACK can recover losses without "stalling" the connection.

While most of the motivation for this proposal is given from a server's perspective, hosts that connect using multiple interfaces with markedly-different link speeds may also find this technique useful.

3.0 Summary of Recommended Optimizations

This section summarizes our recommendations regarding the previous mechanisms, for end nodes that are capable of saturating available bandwidth.

Header compression should be implemented. [RFC1144] header compression can be enabled over robust network connections. [RFC2507] should be used over network connections that are expected to experience loss due to corruption as well as loss due to congestion. [RFC1323] TCP timestamps must be turned off to allow header compression.

IP Payload Compression [RFC2393] should be implemented, although compression at higher layers of the protocol stack (examples: [RFC2616, HTTP-NG]) may make this mechanism less useful.

For HTTP/1.1 environments, [RFC2616] payload compression should be implemented and should be used for payloads that are not already compressed.

Implementors should choose MTUs that don't monopolize network interfaces for more than 100-200 milliseconds, in order to limit the impact of a single connection on all other connections sharing the network interface.

Implementors should consider the possibility that a host will be directly connected to a low-speed link when choosing default TCP receive window sizes, and, if the host is likely to be used with a range of

Application developers should consider the possibility that an application will be used on a host that is directly connected to a low-speed link, before increasing the TCP receive window size beyond the default for TCP connections used by this application.

All of the mechanisms described above are stable standards-track RFCs (at Proposed Standard status, as of this writing), with the exception of [HTTP-NG], which is included for completeness.

In addition, implementors may wish to consider TCP buffer auto-tuning, especially when the host system is likely to be used with a wide variety of access link speeds. This is not a standards-track TCP mechanism.

In addition, researchers may wish to experiment with injecting new traffic into the network when duplicate acknowledgements are being received, as described in [TCPB98] and [TCPF98]. This is not a standards-track TCP mechanism.

Of the above mechanisms, only Header Compression (for IP and TCP) ceases to work in the presence of end-to-end IPSEC.

4.0 Acknowledgements

This recommendation has grown out of the Internet Draft "TCP Over Long Thin Networks", which was in turn based on work done in the IETF TCPSAT working group.

5.0 References

[SMM98] Jeffrey Semke, Matthew Mathis, and Jamshid Mahdavi, "Automatic TCP Buffer Tuning", 1998. Available from http://www.acm.org/sigcomm/sigcomm98/tp/abs_26.html.

[HTTP-NG] H. Frystyk Nielsen, Mike Spreitzer, Bill Janssen, Jim Gettys, "HTTP-NG Overview", draft-frystyk-httpng-overview-00.txt, November 17, 1998, expired, but also available from http://www.w3.org/Protocols/HTTP-NG/1998/11/.

[PAX97] Paxson, V., "End-to-End Internet Packet Dynamics", 1997, in SIGCOMM 97 Proceedings, available as http://www.acm.org/sigcomm/ccr/archive/ccr-toc/ccr-toc-97.html

[RFC1144] Jacobson, V., "Compressing TCP/IP Headers for Low-Speed Serial Links", RFC 1144, February 1990. (Proposed Standard)

[RFC1323] Jacobson, V., Braden, R., Borman, D., "TCP Extensions for High Performance", RFC 1323, May 1992. (Proposed Standard)

[RFC2246] T. Dierks, C. Allen, "The TLS Protocol: Version 1.0", RFC 2246, January 1999. (Proposed Standard)

[RFC2393] A. Shacham, R. Monsour, R. Pereira, M. Thomas, "IP Payload Compression Protocol (IPComp)", RFC 2393, December 1998. (Proposed Standard)

[RFC2416] T. Shepard, C. Partridge, "When TCP Starts Up With Four Packets Into Only Three Buffers", RFC 2416, September 1998.

[RFC2507] Mikael Degermark, Bjorn Nordgren, Stephen Pink, "IP Header Compression", RFC 2507, February 1999. (Proposed Standard)

[RFC2508] S. Casner, V. Jacobson, "Compressing IP/UDP/RTP Headers for Low-Speed Serial Links", RFC 2508, February 1999. (Proposed Standard)

[RFC2509] Mathias Engan, S. Casner, C. Bormann, "IP Header Compression over PPP", RFC 2509, February 1999. (Proposed Standard)

[RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", RFC 2581, April 1999. (Proposed Standard)

[RFC2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. (Draft Standard)

[SSL] Alan O. Freier, Philip Karlton, Paul C. Kocher, "The SSL Protocol: Version 3.0", March 1996. (Expired Internet-Draft, available from http://home.netscape.com/eng/ssl3/ssl-toc.html)

[TCPB98] Hari Balakrishnan, Venkata N. Padmanabhan, Srinivasan Seshan, Mark Stemm, Randy H. Katz, "TCP Behavior of a Busy Internet Server: Analysis and Improvements", IEEE Infocom, March 1998. Available from: http://www.cs.berkeley.edu/~hari/papers/infocom98.ps.gz

[TCPF98] Dong Lin and H.T. Kung, "TCP Fast Recovery Strategies: Analysis and Improvements", IEEE Infocom, March 1998. Available from: http://www.eecs.harvard.edu/networking/papers/infocom-tcp-final-198.pdf

Authors' addresses

Questions about this document may be directed to:

Spencer Dawkins
Nortel Networks
3 Crockett Ct
Allen, TX 75002
Voice: +1-972-684-4827
Fax: +1-972-685-3292
E-Mail: sdawkins@nortelnetworks.com

Gabriel E. Montenegro
Sun Labs Networking and Security Group
Sun Microsystems, Inc.
901 San Antonio Road
Mailstop UMPK 15-214
Mountain View, California 94303
Voice: +1-650-786-6288
Fax: +1-650-786-6445
E-Mail: gab@sun.com

Markku Kojo
University of Helsinki/Department of Computer Science
P.O. Box 26 (Teollisuuskatu 23)
FIN-00014 HELSINKI
Finland
Voice: +358-9-7084-4179
Fax: +358-9-7084-4441
E-Mail: kojo@cs.helsinki.fi

Vincent Magret
Corporate Research Center
Alcatel Network Systems, Inc
1201 Campbell
Mail stop 446-310
Richardson Texas 75081 USA
M/S 446-310
Voice: +1-972-996-2625
Fax: +1-972-996-5902
E-Mail: vincent.magret@aud.alcatel.com