Network Working Group                           S. Jagannath and N. Yin
INTERNET DRAFT                                        Bay Networks Inc.
Expires in six months (February 5, 1998)                    August 1997


       End-to-End Traffic Management Issues in IP/ATM Internetworks


Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as ``work in
   progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

Abstract

   This document addresses end-to-end traffic management issues in
   IP/ATM internetworks.  In the internetwork environment, the ATM
   control mechanisms (e.g., Available Bit Rate (ABR) and Unspecified
   Bit Rate (UBR) with Early Packet Discard (EPD)) are applicable to
   the ATM subnetwork, while TCP flow control extends from end to end.
   We investigated the end-to-end performance, in terms of TCP
   throughput and file transfer delay, of cases using ABR and UBR in
   the ATM subnetwork.  In this document, we also discuss the trade-off
   between the buffer requirements at the ATM edge device (e.g.,
   Ethernet-ATM switch, ATM router interface) and at the ATM switches
   inside the ATM network.  Our simulation results show that in certain
   scenarios (e.g., with limited edge device buffer memory) UBR with
   EPD may perform comparably to ABR or even outperform ABR.  We show
   that a lossless ATM subnetwork is not sufficient from the end-to-end
   performance point of view.  The results illustrate the necessity for
   an edge device congestion handling mechanism that can couple the ABR
   and TCP feedback control loops.  We present an algorithm that makes
   use of the ABR feedback information and the edge device congestion
   state to make packet dropping decisions at the edge of the ATM
   network.  Using the algorithm at the edge device, the end-to-end
   throughput and delay are improved while using ABR as the ATM
   subnetwork technology, even with small buffers in the edge device.

1. Introduction

   In an IP/ATM internetwork environment, legacy networks (e.g.,
   Ethernet) are typically interconnected by an ATM subnetwork (e.g.,
   an ATM backbone).  Supporting end-to-end Quality of Service (QOS)
   and traffic management in such internetworks has become an important
   requirement for multimedia applications.  In this document, we focus
   on the end-to-end traffic management issues in IP/ATM internetworks.
   We investigated the end-to-end performance, in terms of TCP
   throughput and file transfer delay, of cases using ABR and UBR in
   the ATM subnetwork.  We also discuss the trade-off between the
   buffer requirement at the ATM edge device (e.g., Ethernet-ATM
   switch, ATM router interface) and at the ATM switches inside the ATM
   network.

   The ABR rate-based flow control mechanism has been specified by the
   ATM Forum [1].  In the past, there have been significant
   investigations of ABR flow control and TCP over ATM [8-11].
   Most of this work focused on the ATM switch buffer requirements and
   throughput fairness of different algorithms (e.g., simple binary
   Explicit Forward Congestion Indication (EFCI) vs. Explicit Rate
   control; FIFO vs. per-VC queuing, etc.).  It is generally believed
   that ABR can effectively control the congestion within ATM networks.
   There has been little work on end-to-end traffic management in the
   internetworking environment, where networks are not end-to-end ATM.
   In such cases, ABR flow control may simply push the congestion to
   the edge of the ATM network.  Even if the ATM network is kept free
   of congestion by using ABR flow control, the end-to-end performance
   (e.g., the time to transfer a file) perceived by the application
   (typically hosted in the legacy network) may not necessarily be
   better.  Furthermore, one may argue that the reduction in buffer
   requirements in the ATM switch obtained by using ABR flow control
   may come at the expense of an increase in buffer requirements at the
   edge device (e.g., ATM router interface, legacy LAN to ATM switch).
   Since most of today's data applications use the TCP flow control
   protocol, one may question the benefits of ABR flow control as a
   subnetwork technology, arguing that UBR is equally effective and
   much less complex than ABR [12].  Figures 1 and 2 illustrate the two
   cases under consideration.

   Source --> IP    -> Router --> ATM Net --> Router -> IP    --> Destination
              Cloud                UBR                  Cloud

       ----------------------->----------------------
       |                                             |
       --------------- TCP Control Loop --<----------

                 Figure 1: UBR based ATM Subnetwork

   In these cases, the source and destination are interconnected
   through an IP/ATM/IP internetwork.  In Figure 1 the connection uses
   the UBR service, and in Figure 2 the connection uses ABR.  In the
   UBR case, when congestion occurs in the ATM network, cells are
   dropped (e.g., Early Packet Discard [15] may be utilized), resulting
   in the reduction of the TCP congestion window.  In the ABR case,
   when congestion is detected in the ATM network, ABR rate control
   becomes effective and forces the ATM edge device to reduce its
   transmission rate into the ATM network.  If congestion persists, the
   buffer in the edge device will reach its capacity and start to drop
   packets, resulting in the reduction of the TCP congestion window.
   From the performance point of view, the latter involves two control
   loops: ABR and TCP.  The ABR inner control loop may result in a
   longer feedback delay for the TCP control.  Furthermore, with two
   feedback control protocols, one may argue that the interactions or
   interference between the two may actually degrade TCP performance
   [12].

   Source --> IP    -> Router --> ATM Net --> Router -> IP    --> Destination
              Cloud                ABR                  Cloud

                      ----->---------
                      |             |
                      -- ABR Loop -<-

       -------------------------->-------------------
       |                                             |
       --------------- TCP Control Loop --<----------

                 Figure 2: ABR based ATM Subnetwork

   From the implementation perspective, there are trade-offs between
   the memory requirements at the ATM switch and at the edge device for
   UBR and ABR.  In addition, we need to consider the costs of
   implementing ABR and UBR at ATM switches and edge devices.  From
   this discussion, we believe that the end-to-end traffic management
   of the entire internetwork environment should be considered.  In
   this document, we study the interaction and performance of ABR and
   UBR from the point of view of end-to-end traffic management.
   Our simulation results show that in certain scenarios (e.g., with
   limited edge device buffer memory) UBR with EPD may perform
   comparably to ABR or even outperform ABR.  We show that a lossless
   ATM subnetwork is not sufficient from the end-to-end performance
   point of view.  The results illustrate the necessity for an edge
   device congestion handling mechanism that can couple the two
   feedback control loops - ABR and TCP.  We present an algorithm that
   makes use of the ABR feedback information and the edge device
   congestion state to make packet dropping decisions at the edge of
   the ATM network.  Using the algorithm at the edge device, the end-
   to-end throughput and delay are improved while using ABR as the ATM
   subnetwork technology, even with small buffers in the edge device.

2. Buffer Requirements

   In the context of feedback congestion control, buffers are used to
   absorb transient traffic conditions and the steady state rate
   oscillations caused by the feedback delay.  The buffers help to
   avoid losses in the network and to maintain link utilization when
   the aggregate traffic rate falls below the link bandwidth.  When
   there is congestion, the switch conveys this information back to the
   source.  The source reduces its rate according to the feedback
   information.  The switch sees a drop in its input rates after a
   round trip delay.  If the buffer size is large enough to store the
   extra data arriving during this delay, it is possible to avoid
   losses.  When the congestion abates, the switch sends information
   back to the source allowing it to increase its rate.  There is again
   a delay before the switch sees an increase in its input rates.
   Meanwhile, data is drained from the buffer at a rate higher than its
   input rate.  The switch continues to see full link utilization as
   long as there is data in the buffer.  Since the maximum drain rate
   is the bandwidth of the link and the minimum feedback delay is the
   round trip propagation delay, it is generally believed that a buffer
   size of one bandwidth-delay product (link bandwidth times round trip
   propagation delay) is required to achieve good throughput.

   The above argument assumes that the source can reduce its rate when
   it receives feedback information from the network and that the
   source will be able to send data at a higher rate when congestion
   abates in the network.  In the internetworking environment, the edge
   device is the ATM end-system.  It controls its transmission rate
   according to the ABR feedback information, whereas TCP sources send
   data packets according to the TCP window control protocol.  TCP uses
   implicit negative feedback (i.e., packet loss and timeout).  When
   there is congestion in the network, TCP may continue to transmit a
   full window's worth of data, causing packet loss if the buffer size
   is not greater than the window size.  When multiple TCP streams
   share the same buffer, it is impractical to size the buffer to be
   greater than the sum of the window sizes of all the streams.  When
   there is packet loss, the TCP window size is reduced
   multiplicatively and TCP goes through a recovery phase.  In this
   phase, TCP is constrained to transmit at a lower rate even if the
   network is free of congestion, causing a decrease in link
   utilization.  When ABR is used, it is possible to avoid losses
   within the ATM subnetwork.  However, since the host uses TCP, the
   congestion is merely shifted from the ATM subnetwork to the IP/ATM
   interface.
   There may be packet loss in the edge device, and a loss of
   throughput because of it.  Moreover, there is a possibility of
   negative interaction between the two feedback loops.  For example,
   when the TCP window is large, the available rate may be reduced, and
   when the TCP window is small due to packet losses, the available
   rate may be high.  When edge buffers are limited, this kind of
   negative interaction can cause a severe degradation in throughput.

   When UBR is used, packet loss due to EPD or cell discard at the ATM
   switches triggers the reduction of the TCP window.  Hence, in the
   UBR case, there is only a single TCP feedback loop and no additional
   buffer requirement at the edge device.  In general, using UBR needs
   more memory at the ATM switch, whereas using ABR needs more memory
   at the edge device.  The amount of buffering required for zero cell
   loss, at the ATM switch for UBR and at the edge device for ABR, is
   on the order of the maximum TCP window size times the number of TCP
   sessions.  The edge device may be a low cost LAN-ATM switch with
   limited buffer memory.  Furthermore, the cost of the edge device is
   shared by a smaller number of TCP sessions than that of the ATM
   switch, making it more cost sensitive than the backbone switch.  In
   reality, the buffers at both the edge device and the ATM switch are
   limited and smaller than required for zero cell loss.  Hence, it is
   our goal to maximize the TCP throughput with limited buffers at both
   the edge device and the ATM switches.  In our simulation
   experiments, we assume limited buffers at both the edge device and
   the ATM switches.
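   As a rough illustration of the two buffer-sizing rules of thumb
   above, the following sketch (written in C for this discussion, not
   part of the draft's simulator) plugs in the parameter values used in
   the simulation model of the next section: an OC-3 bottleneck, an
   18 ms TCP round trip, ten TCP sessions, a 655360-byte maximum
   window, and 48 bytes of payload per ATM cell.

      /*
       * Illustrative buffer-sizing arithmetic for the two rules of
       * thumb discussed above, using parameter values taken from the
       * simulation model in Section 3.  This is only a back-of-the-
       * envelope calculation, not the simulator.
       */
      #include <stdio.h>

      int main(void)
      {
          const double link_bps     = 155e6;    /* OC-3 bottleneck link   */
          const double tcp_rtt_s    = 0.018;    /* 2*(D1+D2+D3) = 18 ms   */
          const double max_window_B = 655360.0; /* max TCP window (bytes) */
          const int    tcp_sessions = 10;       /* flows over bottleneck  */
          const double cell_payload = 48.0;     /* AAL5 payload per cell  */

          /* Rule 1: a rate-controlled source needs roughly one
           * bandwidth-delay product of buffering to ride out the
           * feedback delay.                                           */
          double bdp_bytes = (link_bps / 8.0) * tcp_rtt_s;
          double bdp_cells = bdp_bytes / cell_payload;

          /* Rule 2: window-controlled (TCP) sources need roughly one
           * maximum window per session at the congestion point for
           * zero loss (one window is ~13650 cells here).              */
          double win_bytes = max_window_B * tcp_sessions;
          double win_cells = win_bytes / cell_payload;

          printf("bandwidth-delay product : %.0f bytes (~%.0f cells)\n",
                 bdp_bytes, bdp_cells);
          printf("sum of max TCP windows  : %.0f bytes (~%.0f cells)\n",
                 win_bytes, win_cells);
          return 0;
      }

   Both numbers are far larger than the 500-1000 cell buffers assumed
   in the simulations that follow, which is the limited-buffer regime
   of interest in this document.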
3. Internetwork Model

   In the end-to-end model, the whole network can be considered a black
   box with TCP sources and destinations at the periphery.  The only
   properties of the black box that the TCP modules are sensitive to
   are the packet loss and the round trip delay through the black box.
   The round trip delay determines how fast the TCP window can open up
   to utilize available bandwidth in the network, and also how fast it
   can react to impending congestion in the network.  Packet loss
   triggers the TCP congestion avoidance and recovery scheme.  The
   feedback control loop of the ABR service may cause an increase in
   the overall round trip delay as it constricts the rate available to
   each TCP stream.  It is generally believed that ABR is able to
   reduce the congestion level within the ATM network; however, it does
   so at the expense of increasing the congestion level at the edge of
   the ATM network.  From an end-to-end performance point of view, it
   is not clear whether this is indeed beneficial.

   In our simulations we use a modified version of TCP Reno.  This
   implementation of TCP uses fast retransmission and recovery, along
   with a modification to deal with multiple packet losses within the
   same window [4].  The simulation experiments were performed using a
   TCP/IP and IP over ATM [5] protocol stack, as shown in Figure 3.
   The TCP model conforms to RFCs 793 and 1122.  In addition, the
   window scaling option is used and the TCP timer granularity is set
   to 0.1 s.  The options considered for ATM are UBR, UBR with Early
   Packet Discard (EPD), ABR with EFCI marking, and ABR with Explicit
   Rate (ER) [1].

   At a high level, the model consists of two IP clouds connected by an
   ATM cloud, as shown in Figures 1 and 2.  More specifically, the
   model consists of two ATM switches connected back to back with a
   single OC-3 (155 Mb/s) link.  The IP/ATM edge devices are connected
   to the ATM switches via OC-3 access links.  Ten such devices are
   attached to each switch, and therefore ten bi-directional TCP
   connections go through the bottleneck link between the ATM switches.
   One such connection is shown in the layered model in Figure 3.

    _________                                                  _________
   |  App.   |                                                |  App.   |
   |---------|                                                |---------|
   | App. I/F|                                                | App. I/F|
   |---------|                                                |---------|
   |  TCP    |      _________               _________         |  TCP    |
   |---------|     |   IP    |             |   IP    |        |---------|
   |         |     |  over   |             |  over   |        |         |
   |         |     |  ATM    |             |  ATM    |        |         |
   |         |     |---------|             |---------|        |         |
   |   IP    |     | IP |AAL5|  _________  |AAL5| IP |        |   IP    |
   |         |     |    |----| |         | |----|    |        |         |
   |         |     |    | ATM| |   ATM   | | ATM|    |        |         |
   |---------|     |----|----| |----|----| |----|----|        |---------|
   |   PL    |     | PL | PL | | PL | PL | | PL | PL |        |   PL    |
    ---------       ---------   ---------   ---------          ---------
        |            |     |     |     |     |     |               |
         ------------       -----       -----       ---------------

                    -------->----------------------
                   |                               |
                    --------- ABR Loop -----<------

    --------------->------------------->---------------------------
   |                                                                |
    ---------<------------- TCP Control Loop ----<------------------

                       Figure 3.  Simulation model

   The propagation delay (D1 + D2 + D3) and the transmission delay
   corresponding to the finite transmission rate in the IP clouds are
   simulated in the TCP/IP stack.  In the model described below, the
   delay associated with each of the clouds is 3 ms (D1 = D2 = D3 =
   3 ms).

   The performance metrics considered are the goodput at the TCP layer
   (R) and the delay for a successful transfer of a 100 KB file (D),
   defined as the duration between the transmission of the first byte
   and the receipt of an ack for the last byte.  The goodput at the TCP
   layer is defined as the rate at which data is successfully delivered
   from the TCP layer to the application layer.  Retransmitted data is
   counted only once, when it is delivered to the application layer.
   Packets that arrive out of sequence must wait until all preceding
   data has been received before they can be delivered to the
   application layer.

   Explicit Rate Switch Scheme: The switch keeps a flag for every VC
   that indicates whether the VC is active.  The switch also maintains
   a timer for each VC.  Every time a cell comes in on a particular VC,
   the flag is turned on (if it is off) and the corresponding timer is
   started (if the flag is already on, the timer is restarted).  The
   flag is turned off only when the timer expires.  Thus, the VC is
   considered active when the flag is on and inactive when the flag is
   off.  The explicit rate timer value (ERT) is an adjustable
   parameter.  For our simulations we set the timer value to 1 ms,
   which corresponds to a little more than the transmission time of a
   single TCP packet at line rate.  The explicit rate for each VC is
   simply the total link bandwidth divided equally among all active
   VCs.  Thus, if a VC is inactive for longer than the timer value, its
   bandwidth is reallocated to the other active VCs.
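   A minimal C sketch of the activity tracking and rate computation
   just described is given below.  The array sizes, the representation
   of the timer as a last-arrival timestamp, and the function names are
   assumptions made for illustration; only the logic follows the scheme
   described above.

      /*
       * Sketch of the per-VC activity tracking and explicit rate
       * computation used by the ER switch scheme.  A VC is treated as
       * active if a cell has been seen within the last ERT
       * milliseconds, which is equivalent to restarting a timer on
       * every cell arrival.
       */
      #define NUM_VC          10
      #define ERT_MS          1.0     /* explicit rate timer (ms)      */
      #define LINK_RATE_MBPS  155.0   /* OC-3 bottleneck link          */

      static double last_cell_ms[NUM_VC]; /* last cell arrival per VC; */
                                          /* 0 => inactive once        */
                                          /* now_ms exceeds ERT_MS     */
      static double er_mbps[NUM_VC];      /* ER fed back to each VC    */

      /* Called for every cell arrival on a VC: restarts its timer.    */
      void on_cell_arrival(int vc, double now_ms)
      {
          last_cell_ms[vc] = now_ms;
      }

      /* Recompute the explicit rates: the link bandwidth is divided
       * equally among the VCs whose activity timer has not expired.   */
      void recompute_explicit_rates(double now_ms)
      {
          int active = 0;

          for (int vc = 0; vc < NUM_VC; vc++)
              if (now_ms - last_cell_ms[vc] <= ERT_MS)
                  active++;

          if (active == 0)
              return;

          double share = LINK_RATE_MBPS / active;
          for (int vc = 0; vc < NUM_VC; vc++)
              if (now_ms - last_cell_ms[vc] <= ERT_MS)
                  er_mbps[vc] = share; /* stamped into backward RM cells */
      }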
3.1 Simulation Results

   In this section we provide some initial results that help us
   understand the issues discussed in the previous sections.  The
   simulation parameters are the following:

   For TCP with Fast Retransmission and Recovery:
      Timer Granularity = 100 ms
      MSS               = 9180 bytes
      Round Trip Prop.  = 18 ms = 2*(D1 + D2 + D3)
      Max. RTO          = 500 ms
      Initial RTO       = 100 ms
      Max. Window Size  = 655360 bytes ( > Bw * RTT)

   For ABR:
      Round Trip Prop. Delay    = 6 ms = 2*D2
      Buffer sizes              = variable
      EFCI Thresholds           = configurable parameter
      Nrm                       = 32
      RDF = RIF                 = 1/32
      All link rates            = 155 Mb/s
      Explicit Rate Timer (ERT) = 1 ms

   The application is modeled as having an infinite amount of data to
   send.  The performance metrics shown are the goodput at the TCP
   layer (R) and the delay for a successful transfer of a 100 KB file
   within a contiguous data stream (D = duration between the
   transmission of the first byte and the receipt of an ack for the
   last byte).

   Note: All results are with a TCP timer granularity of 100 ms.

   Table 1: EB = 500, SB = 1000, TCP with FRR
   Edge-buffer size = 500 cells, Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, UBR-EPD threshold = 700,
   Explicit Rate Timer (ERT) = 1 ms.
    ____________________________________________________________
   |              UBR         UBR + EPD   ABR-EFCI    ABR-ER    |
   |------------------------------------------------------------|
   | Delay (D)    520 ms      521 ms      460 ms      625 ms    |
   | Goodput(R)   81.06 Mbps  87.12 Mbps  77.77 Mbps  58.95 Mbps|
    ------------------------------------------------------------

   In the case of UBR and UBR+EPD, the switch buffer is fully utilized,
   but the maximum occupancy of the edge buffers is about the size of
   one TCP packet.  Thus, cells are lost only in the switch buffer in
   the UBR cases.  ABR with EFCI reduces the cell loss in the switch
   but results in an increase in the cell loss at the edge buffer.  ABR
   with ER eliminates cell loss in the switch buffer completely but
   results in even more cell loss in the edge device.  Thus, in the
   case shown, ER results in the lowest throughput and the longest file
   transfer delay.

   For the sake of comparison, in Table 2 (below) we show the results
   obtained earlier for TCP without fast retransmission and recovery.

   Table 2: EB = 500, SB = 1000, No Fast Retransmit and Recovery
   Edge-buffer size = 500 cells, Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, UBR-EPD threshold = 700.
    ________________________________________________
   |              UBR         UBR+EPD     ABR-EFCI  |
   |------------------------------------------------|
   | Delay (D)    201 ms      139 ms      226 ms    |
   | Goodput(R)   63.94 Mbps  86.39 Mbps  54.16 Mbps|
    ------------------------------------------------

   The set of results with the enhanced TCP (Table 1) shows an overall
   improvement in throughput, but an increase in the file transfer
   delay.  In the presence of fast retransmit and recovery, the lost
   packet is detected through the receipt of duplicate acknowledgments,
   and TCP, instead of reducing its window size to one packet, merely
   reduces it to half its current size.  This increases the total
   amount of data that is transmitted by TCP.  In the absence of FRR,
   the TCP sources are silent for long periods of time while TCP
   recovers from a packet loss through a timeout.  The increase in
   throughput in the case of TCP with FRR can therefore be attributed
   mainly to the larger amount of data transmitted by the TCP source.
   However, that also increases the number of packets that are dropped,
   causing an increase in the average file transfer delay.  (The file
   transfer delay is the duration between the transmission of the first
   byte of the file and the receipt of the acknowledgment for the last
   byte of the file.)  In both sets of results we see that UBR with
   Early Packet Discard results in better performance.
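   To make the window-reduction difference described above concrete,
   the following toy calculation contrasts the two recovery behaviors
   on a single detected loss.  The 64-segment starting window is an
   assumption for illustration; this is a deliberate simplification
   (no slow start, no ssthresh bookkeeping, no ack clocking) and is not
   the TCP model used in the simulations.

      /*
       * Toy illustration of the congestion-window reduction discussed
       * above: a sender with fast retransmit and recovery halves its
       * window on a loss signalled by duplicate acks, while a sender
       * recovering through a retransmission timeout collapses its
       * window to one segment (after an idle period of at least one
       * RTO, i.e., >= 100 ms with the timer granularity used here).
       */
      #include <stdio.h>

      #define MSS 9180   /* segment size used in the simulations (bytes) */

      int main(void)
      {
          long cwnd_frr     = 64L * MSS;  /* assumed window before loss  */
          long cwnd_timeout = 64L * MSS;

          cwnd_frr    /= 2;     /* loss detected via duplicate acks      */
          cwnd_timeout = MSS;   /* loss detected via timeout             */

          printf("after loss, FRR window     = %ld bytes\n", cwnd_frr);
          printf("after loss, timeout window = %ld bytes\n", cwnd_timeout);
          return 0;
      }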
   While the ER scheme (Table 1) results in zero cell loss within the
   ATM network, this does not translate into better end-to-end
   performance for the TCP connections.  The limited buffer sizes at
   the edge of the network result in poor effective throughput, due
   both to increased retransmissions and to poor interaction between
   the TCP window and the ABR allowed cell rate (ACR).

   All the earlier results assume limited buffers at the edge as well
   as in the switch.  We also performed simulations with larger buffers
   at the edge device and in the switch.  When the edge device buffer
   size is increased, the end-to-end throughput improves, though the
   delay may still be high, as shown in the following tables.

   Table 3: EB = 5000, SB = 1000, TCP with FRR
   Edge-buffer size = 5000 cells, Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, ER Timer (ERT) = 1 ms
    ___________________________________________
   |              ABR with EFCI   ABR with ER  |
   |-------------------------------------------|
   | Delay (D)    355 ms          233 ms       |
   | Goodput (R)  89.8 Mbps       112.6 Mbps   |
    -------------------------------------------

   Table 4: EB = 10000, SB = 1000, TCP with FRR
   Edge-buffer size = 10000 cells, Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, ER Timer (ERT) = 1 ms
    ___________________________________________
   |              ABR with EFCI   ABR with ER  |
   |-------------------------------------------|
   | Delay (D)    355 ms          256 ms       |
   | Goodput (R)  89.8 Mbps       119.6 Mbps   |
    -------------------------------------------

   The maximum TCP window size is approximately 13600 cells (655360
   bytes).  With larger buffers, the edge buffer is large enough to
   hold most of the TCP window.  However, we still get low throughput
   for ABR with EFCI because of cell loss within the switch.  ABR with
   ER, on the other hand, pushes the congestion into the edge device,
   which can now handle it with its larger buffers.  Cell loss occurs
   only when the edge buffer overflows, which is a very infrequent
   event because of the size of the buffer.  Since the TCP window
   dynamics are actually triggered by packet loss, there is less
   interaction here between the TCP flow control and the ATM flow
   control mechanisms, contrary to the earlier results (Table 1).  The
   only factor that influences the TCP flow control, in the absence of
   lost packets, is the total round trip time, which in this case
   slowly increases as the buffers fill up, thereby increasing the end-
   to-end delays.

   From the results it is evident that keeping the ATM subnetwork free
   of congestion is not sufficient to provide good end-to-end
   throughput and delay performance.  Our approach to this problem is
   to couple the two flow control loops.  The information received from
   the network, along with the local queue sizes, is used to detect
   congestion early and to speed up the TCP slow-down and recovery
   processes.  In the next section we present an edge device congestion
   handling mechanism that results in significant improvement in the
   end-to-end performance with ABR-ER and small buffers at the edge
   device.

4. Edge Device Congestion Handling Mechanism

   The packet discard algorithm, which couples ABR flow control with
   TCP window flow control, has the objectives of relieving congestion
   as well as conveying congestion information to the TCP source as
   soon as possible.  In addition, the algorithm needs to take into
   consideration how the TCP source can recover from the lost packet
   without a significant loss of throughput.  Figure 4 gives a state
   diagram for the algorithm, followed by the pseudocode and an
   explanation of the algorithm.
   Currently we are working on an enhancement of this algorithm that is
   more sensitive to the explicit rate feedback and can lead to a
   further improvement in performance.

4.1 Adaptive Discard Algorithm

                          _________________
                   --->--|     Phase 0     |--->----
                  |       -----------------        |
                  |                                |
   Queue < LT     |                                |   (ACR < f*LD_ACR
   and            |                                |    and Queue >= LT)
   P_CTR = 0      |                                |    or
                  |                                |    Queue = MQS
                  |       _________________        |
                   ---<--|     Phase 1     |---<----
                          -----------------

                   Figure 4.  Algorithm State Diagram

   Variables:

   ACR     = Current Allowed Cell Rate (the current cell transmission
             rate).
   ER      = Maximum cell transmission rate allowed by the network, as
             received in RM cells; it is the ceiling for ACR.
   CI      = Congestion Indication for the network, as received in RM
             cells.
   LD_ACR  = ACR when a packet was last dropped, or when ACR is
             increased.  LD_ACR is always greater than or equal to ACR.
   QL      = Queue Length, the number of packets in the VC queue.
   LT      = Low Queue Threshold.
   MQS     = Maximum Queue Size.
   P_CTR   = Stores the number of packets in the VC queue when a packet
             is dropped, and is then decremented for every packet that
             is removed from the VC queue.
   Congestion Phase = 1-bit indication of the congestion phase (0 or 1).
   f       = Factor that determines how significant a rate reduction
             must be, with 0 < f < 1.

   Pseudocode:

   if (Congestion Phase == 0) {
       if ((ACR < f * LD_ACR and QL >= LT) or (QL >= MQS)) {
           Drop packet from front of the queue;
           Congestion Phase = 1;
           LD_ACR = ACR;
           P_CTR = QL;
       }
       if (ACR > LD_ACR)
           LD_ACR = ACR;
   }
   if (Congestion Phase == 1) {
       if (QL == MQS) {
           Drop packet from front of the queue;
           LD_ACR = ACR;
           P_CTR = QL;
       }
       if (QL < LT and P_CTR == 0)
           Congestion Phase = 0;
   }
   Decrement P_CTR when a packet is serviced;
   P_CTR = max(P_CTR, 0);
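   A C rendering of the pseudocode above is sketched below for
   concreteness.  The per-VC structure, the function boundaries, and
   the drop_front_packet() hook are illustrative assumptions; only the
   decision logic itself follows the pseudocode.

      /*
       * C sketch of the adaptive discard pseudocode above.  One
       * instance of ada_state is kept per VC queue at the edge device.
       */
      struct ada_state {
          double acr;     /* current Allowed Cell Rate (from RM cells) */
          double ld_acr;  /* ACR at last drop, or last increase of ACR */
          int    ql;      /* current queue length, in packets          */
          int    lt;      /* low queue threshold                       */
          int    mqs;     /* maximum queue size                        */
          int    p_ctr;   /* packets queued when the last drop occurred*/
          int    phase;   /* congestion phase, 0 or 1                  */
          double f;       /* rate reduction factor, 0 < f < 1          */
      };

      /* Hook into the edge device queue: drop the packet at the head. */
      void drop_front_packet(struct ada_state *s);

      /* Run on each packet arrival, after s->acr has been updated
       * from the most recently received RM cell.                      */
      void ada_on_packet_arrival(struct ada_state *s)
      {
          if (s->phase == 0) {
              if ((s->acr < s->f * s->ld_acr && s->ql >= s->lt) ||
                  s->ql >= s->mqs) {
                  drop_front_packet(s);
                  s->phase  = 1;
                  s->ld_acr = s->acr;
                  s->p_ctr  = s->ql;
              }
              if (s->acr > s->ld_acr)
                  s->ld_acr = s->acr;
          } else {                       /* phase 1 */
              if (s->ql >= s->mqs) {
                  drop_front_packet(s);
                  s->ld_acr = s->acr;
                  s->p_ctr  = s->ql;
              }
              if (s->ql < s->lt && s->p_ctr == 0)
                  s->phase = 0;
          }
      }

      /* Run every time a packet is serviced from the VC queue.        */
      void ada_on_packet_service(struct ada_state *s)
      {
          if (s->p_ctr > 0)
              s->p_ctr--;
      }

   Whether these checks are run on every packet arrival, on every RM
   cell, or both is an implementation choice that the pseudocode leaves
   open; the sketch assumes the checks run on packet arrival.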
   Description: On receipt of the RM cells carrying the feedback
   information, the ACR value is updated for the particular VC.  When
   the ACR is reduced drastically, we would like to convey this
   information to the TCP source as soon as possible.  The above
   algorithm uses this feedback information and the current queue
   length to decide whether a packet should be dropped from the VC
   queue.  This coupling is a critical factor in resolving the issues
   discussed in the earlier sections.  Every time a packet is dropped,
   the algorithm updates the LD_ACR (Last Drop ACR) value and uses it
   as the reference rate against which it compares new ACR values.
   This value is increased and made equal to the ACR whenever the ACR
   is larger than its current value.  Thus, LD_ACR is always greater
   than or equal to the ACR.  The algorithm consists of two phases, and
   the criteria used to effect a packet drop are different in the two
   phases.

   Phase 0: The network is assumed not to be congested in this phase.
   This is reflected both by an ACR that is changing slowly and by a
   queue length less than the maximum queue size (MQS).  Two possible
   scenarios can cause a packet drop.  If the ACR is constant or
   changing slowly, the TCP window, and hence the input rate to the
   queue, will eventually become large enough to cause the queue length
   to reach the MQS level.  Then it is necessary to drop a packet to
   trigger a reduction in the TCP window.  In the second scenario, a
   packet is dropped when there is a drastic reduction in the ER
   available to the VC and the queue length is greater than the low
   queue threshold.  The threshold should be set to at least a few
   packets to ensure the transmission of duplicate acks, and should not
   be set too close to the maximum queue size, so that the algorithm
   can still perform congestion avoidance.  A drastic reduction in the
   ACR value signifies congestion in the network, and the algorithm
   sends an early warning to the TCP source by dropping a packet.

   Front Drop: In both of the above cases the packet is dropped from
   the front of the queue.  This results in earlier triggering of the
   congestion control mechanism in TCP, as has also been observed by
   others [16].  Additionally, TCP window flow control has the property
   that the start of the sliding window aligns itself with the dropped
   packet and stays there until that packet is successfully
   retransmitted.  This reduces the amount of data that the TCP source
   can pump into a congested network.  In implementations of TCP with
   the fast retransmit and recovery options, recovery from the lost
   packet happens sooner, since at least one buffer's worth of data is
   transmitted after the lost packet (which in turn generates the
   duplicate acknowledgments required for fast retransmit and
   recovery).

   Transition to Phase 1: When TCP detects a lost packet, depending on
   the implementation, it either reduces its window size to one packet
   or reduces the window size to half its current size.  When multiple
   packets are lost within the same TCP window, different TCP
   implementations recover differently.  The first packet that is
   dropped causes the reduction in the TCP window and hence in its
   average rate.  Multiple packet losses within the same TCP window
   cause a degradation of throughput and are not desirable,
   irrespective of the TCP implementation.  After the first packet is
   dropped, the algorithm makes a transition to Phase 1 and does not
   drop any further packets to cause rate reduction in that phase.

   Phase 1: In this phase the algorithm does not drop packets to convey
   a reduction in the ACR.  Packets are dropped only when the queue
   reaches the MQS value.  The packets are dropped from the front for
   the same reasons as described for Phase 0.  When a packet is
   dropped, the algorithm records the number of packets in the queue in
   the variable P_CTR.  The TCP window size is at least as large as
   P_CTR when a packet is dropped.  Thus, the algorithm tries not to
   drop any more packets due to rate reduction until P_CTR packets have
   been serviced.

   Transition to Phase 0: If the ACR stays at the value that caused the
   transition to Phase 1, the queue length will decrease after one
   round trip time and the algorithm can transition back to Phase 0.
   If the ACR decreases further, the algorithm eventually drops another
   packet when the queue length reaches the MQS, but does not
   transition back to Phase 0.  The transition to Phase 0 takes place
   when at least P_CTR packets have been serviced and the queue length
   recedes below the LT threshold.

4.2 Simulation Results

   The initial set of results served to identify the problem and to
   motivate the edge device congestion handling mechanism.  In this
   section we discuss some of the simulation results obtained with the
   algorithm described above.  In order to get a better understanding
   of the behavior of the algorithm, these simulation experiments were
   performed with a smaller TCP packet size of 1500 bytes.  A 9180 byte
   TCP packet is approximately equivalent to 192 ATM cells, which would
   be very close to the threshold values of the algorithm for the small
   buffer range that we are considering and would unduly influence the
   results.  The rest of the simulation parameters are the same as
   above, and all results use TCP with fast retransmit and recovery.
   The results are shown for UBR and UBR with Early Packet Discard
   (EPD).  Along with ABR, we show results for ABR with a simple front
   drop strategy in the edge device and for ABR with the Adaptive
   Discard Algorithm.  These are termed ABR-ER, ABR-ER with FD, and
   ABR-ER with ADA in the tables below.  From the results it is evident
   that the end-to-end performance improves as we use intelligent
   packet discard algorithms in the edge device.  There is significant
   improvement due to the early detection of congestion based on the
   network feedback information provided by ABR.

   Table 5: EB = 500, SB = 1000, TCP with Fast Retransmit and Recovery
    ____________________________________________
   |              UBR         UBR with EPD      |
   |--------------------------------------------|
   | Delay (D)    370 ms      230 ms            |
   | Goodput(R)   63.4 Mbps   79.1 Mbps         |
    --------------------------------------------

   Table 6: EB = 500, SB = 1000, TCP with Fast Retransmit and Recovery
    __________________________________________________________
   |              ABR-ER      ABR-ER with FD  ABR-ER with ADA |
   |----------------------------------------------------------|
   | Delay (D)    715 ms      237 ms          244 ms          |
   | Goodput(R)   36.15 Mbps  67.9 Mbps       87.68 Mbps      |
    ----------------------------------------------------------

   Table 7: EB = 1000, SB = 1000, TCP with Fast Retransmit and Recovery
    __________________________________________________________
   |              ABR-ER      ABR-ER with FD  ABR-ER with ADA |
   |----------------------------------------------------------|
   | Delay (D)    310 ms      145 ms          108 ms          |
   | Goodput(R)   66.7 Mbps   107.7 Mbps      112.86 Mbps     |
    ----------------------------------------------------------

   The use of the algorithm leads to an increase in the TCP throughput
   and a decrease in the end-to-end delays seen by the end host.  The
   algorithm has two effects on the TCP streams: it drops a packet as
   soon as it detects congestion, and it then tries to avoid dropping
   additional packets.  The first packet that is dropped achieves a
   reduction in the TCP window size, and hence in the input rate; after
   that, the algorithm tries not to drop any more.  This keeps the TCP
   streams active with a higher sustained average rate.  In order to
   improve the ABR throughput and delay performance, it appears
   important to control the TCP window dynamics and to reduce the
   possibility of negative interaction between the TCP flow control
   loop and the explicit rate feedback loop.

5. Summary

   In this document we discuss the necessity of considering end-to-end
   traffic management in IP/ATM internetworks.  In particular, we
   consider the impact of the interactions between the TCP and ABR flow
   control loops on providing end-to-end quality of service to the
   user.  We also discuss the trade-off in memory requirements between
   the switch and the edge device based on the type of flow control
   employed within the ATM subnetwork.  We show that it is important to
   consider the behavior of the end-to-end protocol, such as TCP, when
   selecting the traffic management mechanisms in the IP/ATM
   internetworking environment.  ABR rate control pushes the congestion
   out of the ATM subnetwork and into the edge devices.  We show that
   this can have a detrimental effect on the end-to-end performance
   when the buffers at the edge device are limited.  In such cases we
   show that UBR with Early Packet Discard can achieve better
   performance than ABR.
   We present an algorithm for handling the congestion in the edge
   device that improves the end-to-end throughput and delay
   significantly while using ABR.  The algorithm couples the two flow
   control loops by using the information received through the explicit
   rate scheme to intelligently discard packets when there is
   congestion in the edge device.  In addition, the algorithm is
   sensitive to the TCP flow control and recovery mechanisms, to make
   sure that the throughput does not suffer due to the closing of the
   TCP window.  As presented in this document, the algorithm is limited
   to cases where different TCP streams are queued separately.  As
   future work, the algorithm will be extended to include shared memory
   cases.  More work also needs to be done on studies with
   heterogeneous sources having different round-trip delays.

References

   [1]  ATM Forum TM SWG, "Traffic Management Specification V. 4.0",
        ATM Forum, April 1996.

   [2]  V. Jacobson, "Modified TCP Congestion Avoidance Algorithm",
        end2end-interest mailing list, April 30, 1990.
        ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail

   [3]  K. Fall and S. Floyd, "Comparisons of Tahoe, Reno and Sack
        TCP", available from
        http://www-nrg.ee.lbl.gov/nrg-abstracts.html#KF95

   [4]  J. C. Hoe, "Improving the Start-up Behavior of a Congestion
        Control Scheme for TCP", SIGCOMM '96.

   [5]  RFC 1577, "Classical IP and ARP over ATM", December 1993.

   [6]  RFC 793, "Transmission Control Protocol", September 1981.

   [7]  RFC 1122, "Requirements for Internet Hosts - Communication
        Layers", October 1989.

   [8]  C. Fang and A. Lin, "A Simulation Study of ABR Robustness with
        Binary-Mode Switches", ATMF 95-132, October 1995.

   [9]  H. Chen and J. Brandt, "Performance Evaluation of EFCI Switch
        with Per VC Queuing", ATMF 96-0048, February 1996.

   [10] H. Li, K. Sui, and H. Tzeng, "TCP over ATM with ABR vs.
        UBR+EPD", ATMF 95-0718, June 1995.

   [11] R. Jain, S. Kalyanaraman, R. Goyal, and S. Fahmy, "Buffer
        Requirements for TCP over ABR", ATMF 96-0517, April 1996.

   [12] P. Newman, "Data over ATM: Standardization of Flow Control",
        SuperCon '96, Santa Clara, January 1996.

   [13] N. Yin and S. Jagannath, "End-to-End Traffic Management in
        IP/ATM Internetworks", ATM Forum 96-1406, October 1996.

   [14] S. Jagannath and N. Yin, "End-to-End TCP Performance in IP/ATM
        Internetworks", ATM Forum 96-1711, December 1996.

   [15] A. Romanow and S. Floyd, "Dynamics of TCP Traffic over ATM
        Networks", IEEE Journal on Selected Areas in Communications,
        Vol. 13, No. 4, pp. 633-641, May 1995.

   [16] T. V. Lakshman et al., "The Drop from Front Strategy in TCP and
        TCP over ATM", IEEE Infocom, 1996.

Authors' Address

   S. Jagannath and N. Yin
   Bay Networks Inc.
   Phone: (508) 670-8888
   Fax:   (508) 670-8153
   Email: jagan_pop@baynetworks.com, nyin@baynetworks.com

   The ION working group can be contacted through the chairs:

   Andy Malis
   George Swallow