Internet Engineering Task Force                               S. Dawkins
INTERNET DRAFT                                             G. Montenegro
                                                                 M. Kojo
                                                               V. Magret
                                                               N. Vaidya
                                                            June 9, 1999

        Performance Implications of Link-Layer Characteristics:
                           Links with Errors
                      draft-ietf-pilc-error-00.txt

Status of This Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Comments should be submitted to the PILC mailing list at pilc@grc.nasa.gov.

Distribution of this memo is unlimited.

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document is part of the PILC (Performance Implications of Link-Layer Characteristics) series of recommendations for "extreme network conditions", and focuses on network paths that traverse "high error-rate" links.

Expires December 9, 1999                                        [Page 1]

INTERNET DRAFT          PILC - Links with Errors               June 1999

Because TCP is still the flagship protocol for reliable data transport on the Internet and is used for Hypertext Transfer Protocol (HTTP) in particular, and because TCP congestion avoidance procedures interact badly with high uncorrected error rates, this document is focused on TCP over high error rate links.

The definition of "high error rate" used here is informal: a path has a high error rate when the sender spends an excessive amount of time waiting for acknowledgements that are not coming, whether due to data losses in the forward path or acknowledgement losses in the return path, and these losses are not due to congestion-related buffer exhaustion. The sender then transmits at substantially reduced traffic levels as it probes the network to determine "safe" traffic levels.

Table of Contents

1.0 Introduction
2.0 Interactions with Standard TCP Mechanisms
    2.1 Slow Start and Congestion Avoidance [RFC2581]
    2.2 Fast Retransmit and Fast Recovery [RFC2581]
    2.3 Selective Acknowledgements [RFC2018]
    2.4 Delayed Duplicate Acknowledgements [MV97, VMPM99]
    2.5 Detecting Corruption Loss With Explicit Notifications
3.0 Summary of Recommendations
4.0 Acknowledgements
5.0 References
Authors' addresses

1.0 Introduction

It has been axiomatic that most losses on the Internet are due to congestion, as routers run out of buffers and discard incoming traffic. This observation is the basis for current TCP congestion avoidance strategies - if losses are due to congestion, there is no need for an explicit "congestion encountered" notification to the sender.

Quoting Van Jacobson in 1988:

   "If packet loss is (almost) always due to congestion and if a timeout is (almost) always due to a lost packet, we have a good candidate for the `network is congested' signal."
   [VJ-DCAC]

This axiom has served the Internet community well, because it enabled the deployment of TCPs that have allowed the Internet to accommodate explosive growth in link speeds and traffic levels.

This same explosive growth has attracted users of networking technologies that DON'T have low uncorrected error rates - especially, but not only, some of the wireless Wide Area Network communities. Senders using these networks may not be able to transmit at anything close to the available bandwidth because their TCP connections are spending time in congestion avoidance procedures, or even slow-start procedures, that were triggered by corruption losses in the absence of congestion.

This document makes recommendations about what the participants in connections that traverse high error-rate links may wish to consider doing to improve utilization of available bandwidth in ways that do not threaten the stability of the Internet.

This document discusses end-to-end mechanisms that do not require TCP-level awareness by intermediate nodes. This places severe limitations on what the end nodes can know about the nature of the losses occurring between them. Attempts to apply heuristics to distinguish between congestion and corruption losses have not been successful [BV97, BV98, BV98a].

A companion PILC recommendation, on Performance-Enhancing Proxies (PEPs), relaxes this restriction; because PEPs can be placed on boundaries where network characteristics change dramatically, PEPs have an additional opportunity to improve performance over links with uncorrected errors.

Reducing the level of uncorrected errors would also improve utilization of available bandwidth, and a PILC recommendation for designers of future link-layer protocols discusses this issue in greater detail.

2.0 Interactions with Standard TCP Mechanisms

A TCP sender adapts its use of bandwidth based on feedback from the receiver.
When TCP is not able to distinguish between losses due to congestion and losses due to uncorrected errors, it is not able to determine available bandwidth.

Some TCP mechanisms, targeting recovery from losses due to congestion, coincidentally assist in recovery from losses due to uncorrected errors as well.

2.1 Slow Start and Congestion Avoidance [RFC2581]

Slow Start and Congestion Avoidance [RFC2581] are essential to the Internet's stability. They are based on implicit congestion notification, not explicit congestion notification.

TCP connections with high error rates interact badly with Slow Start and with Congestion Avoidance, because high error rates make the interpretation of losses ambiguous - the sender cannot tell whether detected losses are due to congestion or to data corruption.

- Whenever TCP's retransmission timer expires, the sender assumes that the network is congested and invokes slow start.

- During slow start, the sender increases its window in units of segments. Because the window grows a segment at a time, the segment size determines how quickly the window opens in bytes. This is why it is important to use an appropriately sized MTU - and less reliable link layers often use smaller MTUs.

2.2 Fast Retransmit and Fast Recovery [RFC2581]

TCPs deliver data as a reliable byte-stream to applications, so when a segment is lost (due to either congestion or corruption) delivery of data to the receiving application must wait until the missing data is received. The receiver detects missing segments when later segments arrive with out-of-order sequence numbers.

TCPs are required to immediately acknowledge data when it is received out-of-order, sending the next expected sequence number with no delay, so that the sender can retransmit the required data and the receiver can resume delivery of data to the receiving application.
These acknowledgements are called "duplicate ACKs", because they carry the same expected sequence number as an acknowledgement that has already been sent for the last in-order segment received (unless this acknowledgement was delayed for performance reasons).

Because IP networks are allowed to reorder packets, the receiver may send duplicate acknowledgements for segments that are still en route, but are arriving out of order due to routing changes, link-level retransmission, etc. When a TCP sender receives three duplicate ACKs, fast retransmit [RFC2581] allows it to infer that a segment was lost. The sender retransmits what it considers to be this lost segment without waiting for the full timeout, thus saving time.

After a fast retransmit, a sender invokes the fast recovery [RFC2581] algorithm, whereby it invokes congestion avoidance, but not slow start. This also saves time.

In general, TCP can increase its window beyond the delay-bandwidth product. On links with high error rates, the TCP window may remain rather small, less than four segments, for long periods of time due to any of the following reasons:

1. Typical "file size" to be transferred over a connection is relatively small (Web requests, Web document objects, email messages, files, etc.). In particular, users of links with high error rates are often unwilling to carry out large transfers because the response time is so long.

2. When links have high uncorrected error rates, the cwnd tends to stay small.

3. When a TCP path with high uncorrected error rates "crosses" a highly congested wireline Internet path, congestion losses on the Internet have the same effect as in item 2.

4. Commonly, ISPs/operators configure only a small number of buffers (even as few as for 3 packets) per user in their dial-up routers.

5. Often small socket buffers are recommended with high error-rate links in order to prevent the RTO from inflating.
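
The interaction between window size and Fast Retransmit can be illustrated with a minimal sketch. It assumes a single loss per window, no reordering, no lost ACKs, and a sender that transmits no new data while waiting; the function names are hypothetical and are not taken from any TCP implementation.

```python
DUPACK_THRESHOLD = 3  # three duplicate ACKs trigger fast retransmit [RFC2581]


def dupacks_for_single_loss(window_segments: int, lost_index: int) -> int:
    """Count duplicate ACKs the receiver sends when exactly one segment is lost.

    Simplifying assumptions (for illustration only): the sender transmits
    `window_segments` back to back and nothing more until ACKs arrive,
    no ACKs are lost, and no packets are reordered. `lost_index` is the
    0-based position of the lost segment within the window.
    """
    if not 0 <= lost_index < window_segments:
        raise ValueError("lost_index must fall inside the window")
    # Every segment that arrives after the hole elicits one duplicate ACK.
    return window_segments - lost_index - 1


def fast_retransmit_possible(window_segments: int, lost_index: int) -> bool:
    """True if enough duplicate ACKs arrive to avoid waiting for the timer."""
    return dupacks_for_single_loss(window_segments, lost_index) >= DUPACK_THRESHOLD


if __name__ == "__main__":
    for window in range(1, 7):
        positions = [i for i in range(window) if fast_retransmit_possible(window, i)]
        print(f"window={window}: fast retransmit possible for losses at {positions}")
```

Under these assumptions a window of three segments never yields three duplicate ACKs, and a window of four does so only when the first segment of the window is lost.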

A small window - especially a window of less than four segments - effectively prevents the sender from taking advantage of Fast Retransmits, because too few segments follow a loss to elicit three duplicate ACKs. Moreover, efficient recovery from multiple losses within a single window requires adoption of new proposals (NewReno [RFC2582]).

Recommendation: Implement Fast Retransmit and Fast Recovery at this time. This is a widely-implemented optimization and is currently at Proposed Standard level. [RFC2488] recommends implementation of Fast Retransmit/Fast Recovery in satellite environments. NewReno [RFC2582] appears to help a sender better handle partial ACKs and multiple losses in a single window, but at this point is not recommended due to its experimental nature. Instead, SACK is the preferred mechanism.

2.3 Selective Acknowledgements [RFC2018]

Selective Acknowledgements allow the repair of multiple segment losses per window without requiring one round-trip per loss.

Selective acknowledgements are most useful in LFNs ("Long Fat Networks"), because of the long round trip times that may be encountered in these environments, according to Section 1.1 of [RFC1323], and are especially useful if large windows are required, because there is a considerable probability of multiple segment losses per window.

In low-speed, high error-rate environments (for example, the wireless WAN environment), TCP windows are much smaller, and burst errors must be much longer in duration in order to damage multiple segments. Accordingly, the complexity of SACK may not be justifiable, unless there is a high probability of both burst errors and congestion.

Berkeley's SNOOP protocol research [SNOOP] indicates that SACK does improve throughput for SNOOP when multiple segments are lost per window [BPSK96]. SACK allows SNOOP to recover from multi-segment losses in one round-trip. In this case, the wireless device needs to implement some form of selective acknowledgements.
If SACK is not used, recovery from multi-segment losses takes so long that TCP enters congestion avoidance anyway.

Recommendation: Implement SACK now for compatibility with other TCPs and improved performance with SNOOP.

2.4 Delayed Duplicate Acknowledgements [MV97, VMPM99]

When link layers try aggressively to correct a high underlying error rate, it is imperative to prevent interaction between link-layer retransmission and TCP retransmission, as these layers would otherwise duplicate each other's efforts. In such an environment it may make sense to delay TCP's efforts so as to give the link layer a chance to recover. With this in mind, the Delayed Dupacks [MV97, VMPM99] scheme selectively delays duplicate acknowledgements at the receiver. It is preferable to allow a local mechanism to resolve a local problem, instead of invoking TCP's end-to-end mechanism and incurring the associated costs, both in terms of wasted bandwidth and in terms of its effect on TCP's window behavior.

At this time, it is not well understood how long the receiver should delay the duplicate acknowledgements. In particular, the impact of the medium access control (MAC) protocol on the choice of delay parameter needs to be studied. The MAC protocol may affect the ability to choose the appropriate delay (either statically or dynamically). In general, significant variability in link-level retransmission times can have an adverse impact on the performance of the Delayed Dupacks scheme.

Recommendation: Delaying duplicate acknowledgements may be useful in specific network topologies, but a general recommendation requires further research and experience.
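
The receiver-side behavior described in Section 2.4 can be sketched roughly as follows. This is only an illustration of the idea of holding duplicate ACKs back for a while, not the algorithm specified in [MV97] or [VMPM99]; the class name, the `delay` parameter, the use of whole segment numbers instead of byte sequence numbers, and the caller-supplied timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DelayedDupackReceiver:
    """Hold duplicate ACKs for `delay` seconds after a hole is first seen."""
    delay: float                                  # hypothetical tuning knob
    next_expected: int = 0                        # next in-order segment number
    buffered: Set[int] = field(default_factory=set)
    held_dupacks: int = 0                         # duplicate ACKs not yet sent
    deadline: Optional[float] = None              # when held ACKs are released

    def on_timer(self, now: float) -> List[int]:
        """Release held duplicate ACKs once the delay has expired."""
        if self.deadline is None or now < self.deadline:
            return []
        released = [self.next_expected] * self.held_dupacks
        self.held_dupacks = 0
        return released

    def on_segment(self, seq: int, now: float) -> List[int]:
        """Handle one arriving segment and return the ACK numbers sent."""
        acks = self.on_timer(now)
        if seq == self.next_expected:
            # The hole is filled (for example by a link-layer retransmission):
            # deliver buffered data and discard any duplicate ACKs still held.
            self.next_expected += 1
            while self.next_expected in self.buffered:
                self.buffered.remove(self.next_expected)
                self.next_expected += 1
            self.held_dupacks = 0
            self.deadline = None
            acks.append(self.next_expected)
        elif seq > self.next_expected:
            self.buffered.add(seq)
            if self.deadline is None:
                self.deadline = now + self.delay  # give the link layer time
            if now < self.deadline:
                self.held_dupacks += 1            # hold this duplicate ACK back
            else:
                acks.append(self.next_expected)   # delay expired: ACK at once
        else:
            acks.append(self.next_expected)       # old segment: re-ACK
        return acks
```

If the missing segment arrives before the delay expires, the sender never sees a duplicate ACK and so never invokes Fast Retransmit; if it does not, the held duplicate ACKs are released and TCP recovers end-to-end as usual. With `delay` set to zero the sketch reduces to ordinary immediate duplicate ACKs.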

2.5 Detecting Corruption Loss With Explicit Notifications

As noted above, today's TCPs assume that any loss is due to congestion, and encounter difficulty in distinguishing between congestion loss and corruption loss because this "implicit notification" mechanism can't carry both meanings at once. With explicit notification from the network it is possible to determine when a loss is due to congestion. Several proposals along these lines include:

- Explicit Loss Notification (ELN) [BPSK96]

- Explicit Bad State Notification (EBSN) [BBKVP96]

- Explicit Loss Notification to the Receiver (ELNR), and Explicit Delayed Dupack Activation Notification (EDDAN) [MV97]

- Explicit Congestion Notification (ECN) [ECN]

Of these proposals, Explicit Congestion Notification (ECN) seems closest to deployment on the Internet. ECN requires changes to the routing infrastructure to perform "active queue management" - to detect impending buffer exhaustion, and to randomly drop packets when impending buffer exhaustion has been detected, so that senders will respond to this implicit notification by slowing their transmission rate and avoiding total buffer exhaustion.

ECN then builds on "active queue management" by providing a mechanism for hosts to mark packets as "ECN-capable", and for routers to mark ECN-capable packets as "congestion encountered" during periods of impending buffer exhaustion. This allows ECN-capable routers to provide congestion notification to ECN-capable hosts without dropping packets that would otherwise have been delivered (because the router still has available buffers when the packet arrives).

The problem with ECN is that the absence of packets marked as "congestion encountered" should not be interpreted by ECN-capable TCP connections as a green light for aggressive retransmissions.
On the contrary, during periods of extreme network congestion routers may drop packets marked with explicit notification because their buffers are exhausted - exactly the wrong time for a host to begin retransmitting aggressively.

This isn't a criticism of ECN, which was never intended to be used as a surrogate for explicit corruption notification - only an explanation of why it isn't such a surrogate.

ECN uses the TOS byte in the IP header to carry congestion information (ECN-Capable and Congestion-Encountered). This byte is not encrypted in IPsec, so ECN can be used on TCP connections that are encrypted using IPsec.

Recommendation: Implement ECN, but do not (mis)use it as a surrogate for explicit corruption notification. Continue to investigate true corruption-notification mechanisms. Mechanisms like ELNR and EDDAN [MV97], in which the only systems that need to be modified are the base station and the mobile device, justify further research. However, the requirement that the base station be able to examine the TCP headers passing through it raises issues with respect to IPsec-encrypted packets.

3.0 Summary of Recommendations

Because existing TCPs have only one implicit loss feedback mechanism, it is not possible to use this mechanism to distinguish between congestion loss and corruption loss without additional information.

Because congestion affects all traffic on a path while corruption affects only the specific traffic encountering uncorrected corruption, avoiding congestion has to take precedence over quickly repairing corruption loss. This means that the best that can be achieved without new feedback mechanisms is minimizing the amount of time spent unnecessarily in congestion avoidance.

Fast Retransmit/Fast Recovery allows quick repair of loss without giving up the safety of congestion avoidance.
In order for Fast Retransmit/Fast Recovery to work, the window size must be large enough to force the receiver to send three duplicate acknowledgements before the retransmission timeout interval expires; if the timer expires first, the sender is forced into full TCP slow-start.

Selective Acknowledgements (SACK) extend the benefit of Fast Retransmit/Fast Recovery to situations where multiple "holes" in the window need to be repaired more quickly than can be accomplished by executing Fast Retransmit for each hole, only to discover the next hole.

SACK has been found particularly useful in SNOOP environments [SNOOP] (where an intermediate network node is handling retransmissions on behalf of the endpoints). SNOOP will be described in more detail in the PILC PEP draft, and is only mentioned here in conjunction with SACK.

Delayed Duplicate Acknowledgements is an attractive scheme, especially when link layers use fixed retransmission timer mechanisms that may still be trying to recover when TCP-level retransmission timeouts occur, adding additional traffic to the network. This proposal is worthy of additional study, but is not recommended at this time, because we don't know how to calculate optimal amounts of delay for an arbitrary network topology.

Explicit corruption notification mechanisms are being overshadowed by explicit congestion notification mechanisms, and it's not possible to use explicit congestion notification as a surrogate for explicit corruption notification.

Of these mechanisms, SNOOP plus SACK and Delayed Duplicate Acknowledgements apply only to wireless networks. The others cover both wireless and wireline environments. Their more general applicability attracts more attention and analysis from the research community.

Of these mechanisms, only "SNOOP plus SACK" ceases working in the presence of IPsec.
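
The contrast drawn in this summary between repairing one hole per Fast Retransmit and repairing several holes with SACK can be illustrated with a small sketch. It is a simplification for illustration only: segments are numbered individually rather than by byte sequence number as in [RFC2018], the function names are hypothetical, and the round-trip estimate ignores timeouts and congestion window limits.

```python
from typing import List, Tuple


def holes_from_sack(cumulative_ack: int,
                    sack_blocks: List[Tuple[int, int]]) -> List[int]:
    """Return the segments missing below the highest SACKed segment.

    `cumulative_ack` is the next segment the receiver expects in order.
    Each SACK block is a half-open range [start, end) of segments that
    the receiver already holds above `cumulative_ack`.
    """
    received = set()
    for start, end in sack_blocks:
        received.update(range(start, end))
    if not received:
        return []
    highest = max(received)
    return [seq for seq in range(cumulative_ack, highest) if seq not in received]


def recovery_round_trips(holes: List[int], sack_enabled: bool) -> int:
    """Rough round-trip count needed to repair `holes`.

    Assumption for illustration: without SACK the sender learns of one
    missing segment per round trip; with SACK a single ACK reveals every
    hole, so all of them can be retransmitted together.
    """
    if not holes:
        return 0
    return 1 if sack_enabled else len(holes)


if __name__ == "__main__":
    holes = holes_from_sack(cumulative_ack=3, sack_blocks=[(4, 6), (7, 9)])
    print("missing segments:", holes)
    print("round trips with SACK:", recovery_round_trips(holes, True))
    print("round trips without SACK:", recovery_round_trips(holes, False))
```

In this example the receiver holds segments 4, 5, 7 and 8, so segments 3 and 6 are the holes; the sketch counts one round trip to repair them with SACK and two without.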

4.0 Acknowledgements

This recommendation has grown out of the Internet Draft "TCP Over Long Thin Networks", which was in turn based on work done in the IETF TCPSAT working group.

5.0 References

[BBKVP96] Bakshi, B., Krishna, P., Vaidya, N., Pradhan, D.K., "Improving Performance of TCP over Wireless Networks," Technical Report 96-014, Texas A&M University, 1996.

[BPSK96] Balakrishnan, H., Padmanabhan, V., Seshan, S., Katz, R., "A Comparison of Mechanisms for Improving TCP Performance over Wireless Links," in ACM SIGCOMM, Stanford, California, August 1996.

[BV97] Biaz, S., Vaidya, N., "Using End-to-end Statistics to Distinguish Congestion and Corruption Losses: A Negative Result," Texas A&M University, Technical Report 97-009, August 18, 1997.

[BV98] Biaz, S., Vaidya, N., "Sender-Based heuristics for Distinguishing Congestion Losses from Wireless Transmission Losses," Texas A&M University, Technical Report 98-013, June 1998.

[BV98a] Biaz, S., Vaidya, N., "Discriminating Congestion Losses from Wireless Losses using Inter-Arrival Times at the Receiver," Texas A&M University, Technical Report 98-014, June 1998.

[ECN] Ramakrishnan, K.K., Floyd, S., "A Proposal to add Explicit Congestion Notification (ECN) to IP," RFC 2481, January 1999.

[MV97] Mehta, M., Vaidya, N., "Delayed Duplicate-Acknowledgements: A Proposal to Improve Performance of TCP on Wireless Links," Texas A&M University, December 24, 1997. Available at http://www.cs.tamu.edu/faculty/vaidya/mobile.html

[RFC1122] Braden, R., "Requirements for Internet Hosts -- Communication Layers," RFC 1122, October 1989.

[RFC1323] Jacobson, V., Braden, R., Borman, D., "TCP Extensions for High Performance," RFC 1323, May 1992.

[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., Romanow, A., "TCP Selective Acknowledgment Options," RFC 2018, October 1996.

[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K.K., Shenker, S., Wroclawski, J., Zhang, L., "Recommendations on Queue Management and Congestion Avoidance in the Internet," RFC 2309, April 1998.

[RFC2488] Allman, M., Glover, D., Sanchez, L., "Enhancing TCP Over Satellite Channels using Standard Mechanisms," RFC 2488 (BCP 28), January 1999.

[RFC2581] Allman, M., Paxson, V., Stevens, W., "TCP Congestion Control," RFC 2581, April 1999.

[RFC2582] Floyd, S., Henderson, T., "The NewReno Modification to TCP's Fast Recovery Algorithm," RFC 2582, April 1999.

[SNOOP] Balakrishnan, H., Seshan, S., Amir, E., Katz, R., "Improving TCP/IP Performance over Wireless Networks," Proc. 1st ACM Conf. on Mobile Computing and Networking (Mobicom), Berkeley, CA, November 1995.

[VJ-DCAC] Jacobson, V., "Dynamic Congestion Avoidance / Control," e-mail dated February 11, 1988, available from http://www.kohala.com/~rstevens/vanj.88feb11.txt

[VMPM99] Vaidya, N.H., Mehta, M., Perkins, C., Montenegro, G., "Delayed Duplicate Acknowledgements: A TCP-Unaware Approach to Improve Performance of TCP over Wireless," Technical Report 99-003, Computer Science Dept., Texas A&M University, February 1999.

Authors' addresses

Questions about this document may be directed at:

   Spencer Dawkins
   Nortel Networks
   P.O. Box 833805
   Richardson, Texas 75083-3805

   Voice:  +1-972-684-4827
   Fax:    +1-972-685-3292
   E-Mail: sdawkins@nortel.com

   Gabriel E. Montenegro
   Sun Labs Networking and Security Group
   Sun Microsystems, Inc.
   901 San Antonio Road
   Mailstop UMPK 15-214
   Mountain View, California 94303

   Voice:  +1-650-786-6288
   Fax:    +1-650-786-6445
   E-Mail: gab@sun.com

   Markku Kojo
   University of Helsinki/Department of Computer Science
   P.O. Box 26 (Teollisuuskatu 23)
   FIN-00014 HELSINKI
   Finland

   Voice:  +358-9-7084-4179
   Fax:    +358-9-7084-4441
   E-Mail: kojo@cs.helsinki.fi

   Vincent Magret
   Corporate Research Center
   Alcatel Network Systems, Inc
   1201 Campbell
   Mail stop 446-310
   Richardson Texas 75081 USA
   M/S 446-310

   Voice:  +1-972-996-2625
   Fax:    +1-972-996-5902
   E-Mail: vincent.magret@aud.alcatel.com

   Nitin Vaidya
   Dept. of Computer Science
   Texas A&M University
   College Station, TX 77843-3112

   Voice:  +1 409-845-0512
   Fax:    +1 409-847-8578
   E-Mail: vaidya@cs.tamu.edu