Network Working Group                           S. Jagannath and N. Yin
INTERNET DRAFT                                        Bay Networks Inc.
Expires in six months (February 5, 1998)                    August 1997


       End-to-End Traffic Management Issues in IP/ATM Internetworks


Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as ``work in
   progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

Abstract

   This document addresses end-to-end traffic management issues in
   IP/ATM internetworks.  In the internetwork environment, the ATM
   control mechanisms (e.g., Available Bit Rate (ABR) and Unspecified
   Bit Rate (UBR) with Early Packet Discard (EPD)) are applicable to
   the ATM subnetwork, while TCP flow control extends from end to end.
   We investigated the end-to-end performance, in terms of TCP
   throughput and file transfer delay, of cases using ABR and UBR in
   the ATM subnetwork.  In this document, we also discuss the trade-off
   between the buffer requirements at the ATM edge device (e.g.,
   Ethernet-ATM switch, ATM router interface) and at the ATM switches
   inside the ATM network.  Our simulation results show that in certain
   scenarios (e.g., with limited edge device buffer memory) UBR with
   EPD may perform comparably to ABR or even outperform ABR.  We show
   that a lossless ATM subnetwork is not sufficient from the end-to-end
   performance point of view.  The results illustrate the necessity for
   an edge device congestion handling mechanism that can couple the ABR
   and TCP feedback control loops.  We present an algorithm that makes
   use of the ABR feedback information and the edge device congestion
   state to make packet dropping decisions at the edge of the ATM
   network.  Using the algorithm at the edge device, the end-to-end
   throughput and delay are improved while using ABR as the ATM
   subnetwork technology, even with small buffers in the edge device.

1. Introduction

   In an IP/ATM internetwork environment, legacy networks (e.g.,
   Ethernet) are typically interconnected by an ATM subnetwork (e.g.,
   an ATM backbone).  Supporting end-to-end Quality of Service (QOS)
   and traffic management in such internetworks has become an important
   requirement for multimedia applications.  In this document, we focus
   on the end-to-end traffic management issues in IP/ATM internetworks.
   We investigated the end-to-end performance, in terms of TCP
   throughput and file transfer delay, of cases using ABR and UBR in
   the ATM subnetwork.  We also discuss the trade-off between the
   buffer requirement at the ATM edge device (e.g., Ethernet-ATM
   switch, ATM router interface) and at the ATM switches inside the ATM
   network.

   The ABR rate-based flow control mechanism has been specified by the
   ATM Forum [1].  In the past, there have been significant
   investigations of ABR flow control and TCP over ATM [8-11].
   Most of this work focused on the ATM switch buffer requirements and
   throughput fairness of different algorithms (e.g., simple binary
   Explicit Forward Congestion Indication (EFCI) vs. Explicit Rate
   control; FIFO vs. per-VC queuing, etc.).  It is generally believed
   that ABR can effectively control the congestion within ATM networks.
   There has been little work on end-to-end traffic management in the
   internetworking environment, where networks are not end-to-end ATM.
   In such cases, ABR flow control may simply push the congestion to
   the edge of the ATM network.  Even if the ATM network is kept free
   of congestion by using ABR flow control, the end-to-end performance
   (e.g., the time to transfer a file) perceived by the application
   (typically hosted in the legacy network) may not necessarily be
   better.  Furthermore, one may argue that the reduction in buffer
   requirements in the ATM switch obtained by using ABR flow control
   may come at the expense of an increase in buffer requirements at the
   edge device (e.g., ATM router interface, legacy LAN to ATM switch).
   Since most of today's data applications use the TCP flow control
   protocol, one may question the benefits of ABR flow control as a
   subnetwork technology, arguing that UBR is equally effective and
   much less complex than ABR [12].  Figures 1 and 2 illustrate the two
   cases under consideration.

   Source --> IP    -> Router --> ATM Net --> Router -> IP    --> Destination
              Cloud                UBR                  Cloud

       ----------------------->----------------------
       |                                             |
       --------------- TCP Control Loop --<----------

                 Figure 1: UBR based ATM Subnetwork

   In these cases, the source and destination are interconnected
   through an IP/ATM/IP internetwork.  In Figure 1 the connection uses
   the UBR service, and in Figure 2 the connection uses ABR.  In the
   UBR case, when congestion occurs in the ATM network, cells are
   dropped (e.g., Early Packet Discard [15] may be utilized), resulting
   in the reduction of the TCP congestion window.  In the ABR case,
   when congestion is detected in the ATM network, ABR rate control
   becomes effective and forces the ATM edge device to reduce its
   transmission rate into the ATM network.  If congestion persists, the
   buffer in the edge device will reach its capacity and start to drop
   packets, resulting in the reduction of the TCP congestion window.
   From the performance point of view, the latter involves two control
   loops: ABR and TCP.  The ABR inner control loop may result in a
   longer feedback delay for the TCP control.  Furthermore, with two
   feedback control protocols, one may argue that the interactions or
   interference between the two may actually degrade TCP performance
   [12].

   Source --> IP    -> Router --> ATM Net --> Router -> IP    --> Destination
              Cloud                ABR                  Cloud

                      ----->---------
                      |             |
                      -- ABR Loop -<-

       -------------------------->-------------------
       |                                             |
       --------------- TCP Control Loop --<----------

                 Figure 2: ABR based ATM Subnetwork

   From the implementation perspective, there are trade-offs between
   the memory requirements at the ATM switch and at the edge device for
   UBR and ABR.  In addition, we need to consider the costs of
   implementing ABR and UBR at ATM switches and edge devices.  From
   this discussion, we believe that the end-to-end traffic management
   of the entire internetwork environment should be considered.  In
   this document, we study the interaction and performance of ABR and
   UBR from the point of view of end-to-end traffic management.
   Our simulation results show that in certain scenarios (e.g., with
   limited edge device buffer memory) UBR with EPD may perform
   comparably to ABR or even outperform ABR.  We show that a lossless
   ATM subnetwork is not sufficient from the end-to-end performance
   point of view.  The results illustrate the necessity for an edge
   device congestion handling mechanism that can couple the two
   feedback control loops - ABR and TCP.  We present an algorithm that
   makes use of the ABR feedback information and the edge device
   congestion state to make packet dropping decisions at the edge of
   the ATM network.  Using the algorithm at the edge device, the end-
   to-end throughput and delay are improved while using ABR as the ATM
   subnetwork technology, even with small buffers in the edge device.

2. Buffer Requirements

   In the context of feedback congestion control, buffers are used to
   absorb transient traffic conditions and the steady state rate
   oscillations caused by the feedback delay.  The buffers help to
   avoid losses in the network and to maintain link utilization when
   the aggregate traffic rate falls below the link bandwidth.  When
   there is congestion, the switch conveys this information back to the
   source.  The source reduces its rate according to the feedback
   information.  The switch sees a drop in its input rates after a
   round trip delay.  If the buffer size is large enough to store the
   extra data arriving during this delay, it is possible to avoid
   losses.  When the congestion abates, the switch sends information
   back to the source allowing it to increase its rate.  There is again
   a delay before the switch sees an increase in its input rates.
   Meanwhile, data is drained from the buffer at a rate higher than its
   input rate.  The switch continues to see full link utilization as
   long as there is data in the buffer.  Since the maximum drain rate
   is the bandwidth of the link and the minimum feedback delay is the
   round trip propagation delay, it is generally believed that a buffer
   size of one bandwidth-delay product (link bandwidth times round trip
   propagation delay) is required to achieve good throughput.

   The above argument assumes that the source can reduce its rate when
   it receives feedback information from the network and that the
   source will be able to send data at a higher rate when congestion
   abates in the network.  In the internetworking environment, the edge
   device is the ATM end-system.  It controls its transmission rate
   according to the ABR feedback information, whereas TCP sources send
   data packets according to the TCP window control protocol.  TCP uses
   implicit negative feedback (i.e., packet loss and timeout).  When
   there is congestion in the network, TCP may continue to transmit a
   full window's worth of data, causing packet loss if the buffer size
   is not greater than the window size.  When multiple TCP streams
   share the same buffer, it is impractical to size the buffer to be
   greater than the sum of the window sizes of all the streams.  When
   there is packet loss, the TCP window size is reduced
   multiplicatively and TCP goes through a recovery phase.  In this
   phase, TCP is constrained to transmit at a lower rate even if the
   network is free of congestion, causing a decrease in link
   utilization.  When ABR is used, it is possible to avoid losses
   within the ATM subnetwork.  However, since the host uses TCP, the
   congestion is merely shifted from the ATM subnetwork to the IP/ATM
   interface.
   There may be packet loss in the edge device, and a loss of
   throughput because of it.  Moreover, there is a possibility of
   negative interaction between the two feedback loops.  For example,
   when the TCP window is large, the available rate may be reduced, and
   when the TCP window is small due to packet losses, the available
   rate may be high.  When edge buffers are limited, this kind of
   negative interaction can cause a severe degradation in throughput.

   When UBR is used, packet loss due to EPD or cell discard at the ATM
   switches triggers the reduction of the TCP window.  Hence, in the
   UBR case, there is only a single TCP feedback loop and no additional
   buffer requirement at the edge device.  In general, using UBR needs
   more memory at the ATM switch, whereas using ABR needs more memory
   at the edge device.  The amount of buffering required for zero cell
   loss, at the ATM switch for UBR and at the edge device for ABR, is
   on the order of the maximum TCP window size times the number of TCP
   sessions.  The edge device may be a low cost LAN-ATM switch with
   limited buffer memory.  Furthermore, the cost of the edge device is
   shared by a smaller number of TCP sessions than that of the ATM
   switch, making it more cost sensitive than the backbone switch.  In
   reality, the buffers at both the edge device and the ATM switch are
   limited and smaller than required for zero cell loss.  Hence, it is
   our goal to maximize the TCP throughput with limited buffers at both
   the edge device and the ATM switches.  In our simulation
   experiments, we assume limited buffers at both the edge device and
   the ATM switches.
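   As a rough illustration of the two buffer-sizing rules of thumb
   above, the following sketch (written in C for this discussion, not
   part of the draft's simulator) plugs in the parameter values used in
   the simulation model of the next section: an OC-3 bottleneck, an
   18 ms TCP round trip, ten TCP sessions, a 655360-byte maximum
   window, and 48 bytes of payload per ATM cell.

      /*
       * Illustrative buffer-sizing arithmetic for the two rules of
       * thumb discussed above, using parameter values taken from the
       * simulation model in Section 3.  This is only a back-of-the-
       * envelope calculation, not the simulator.
       */
      #include <stdio.h>

      int main(void)
      {
          const double link_bps     = 155e6;    /* OC-3 bottleneck link   */
          const double tcp_rtt_s    = 0.018;    /* 2*(D1+D2+D3) = 18 ms   */
          const double max_window_B = 655360.0; /* max TCP window (bytes) */
          const int    tcp_sessions = 10;       /* flows over bottleneck  */
          const double cell_payload = 48.0;     /* AAL5 payload per cell  */

          /* Rule 1: a rate-controlled source needs roughly one
           * bandwidth-delay product of buffering to ride out the
           * feedback delay.                                           */
          double bdp_bytes = (link_bps / 8.0) * tcp_rtt_s;
          double bdp_cells = bdp_bytes / cell_payload;

          /* Rule 2: window-controlled (TCP) sources need roughly one
           * maximum window per session at the congestion point for
           * zero loss (one window is ~13650 cells here).              */
          double win_bytes = max_window_B * tcp_sessions;
          double win_cells = win_bytes / cell_payload;

          printf("bandwidth-delay product : %.0f bytes (~%.0f cells)\n",
                 bdp_bytes, bdp_cells);
          printf("sum of max TCP windows  : %.0f bytes (~%.0f cells)\n",
                 win_bytes, win_cells);
          return 0;
      }

   Both numbers are far larger than the 500-1000 cell buffers assumed
   in the simulations that follow, which is the limited-buffer regime
   of interest in this document.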
3. Internetwork Model

   In the end-to-end model, the whole network can be considered a black
   box with TCP sources and destinations at the periphery.  The only
   properties of the black box that the TCP modules are sensitive to
   are the packet loss and the round trip delay through the black box.
   The round trip delay determines how fast the TCP window can open up
   to utilize available bandwidth in the network, and also how fast it
   can react to impending congestion in the network.  Packet loss
   triggers the TCP congestion avoidance and recovery scheme.  The
   feedback control loop of the ABR service may cause an increase in
   the overall round trip delay as it constricts the rate available to
   each TCP stream.  It is generally believed that ABR is able to
   reduce the congestion level within the ATM network; however, it does
   so at the expense of increasing the congestion level at the edge of
   the ATM network.  From an end-to-end performance point of view, it
   is not clear whether this is indeed beneficial.

   In our simulations we use a modified version of TCP Reno.  This
   implementation of TCP uses fast retransmission and recovery, along
   with a modification to deal with multiple packet losses within the
   same window [4].  The simulation experiments were performed using a
   TCP/IP and IP over ATM [5] protocol stack, as shown in Figure 3.
   The TCP model conforms to RFCs 793 and 1122.  In addition, the
   window scaling option is used and the TCP timer granularity is set
   to 0.1 s.  The options considered for ATM are UBR, UBR with Early
   Packet Discard (EPD), ABR with EFCI marking, and ABR with Explicit
   Rate (ER) [1].

   At a high level, the model consists of two IP clouds connected by an
   ATM cloud, as shown in Figures 1 and 2.  More specifically, the
   model consists of two ATM switches connected back to back with a
   single OC-3 (155 Mb/s) link.  The IP/ATM edge devices are connected
   to the ATM switches via OC-3 access links.  Ten such devices are
   attached to each switch, and therefore ten bi-directional TCP
   connections go through the bottleneck link between the ATM switches.
   One such connection is shown in the layered model in Figure 3.

    _________                                                  _________
   |  App.   |                                                |  App.   |
   |---------|                                                |---------|
   | App. I/F|                                                | App. I/F|
   |---------|                                                |---------|
   |  TCP    |      _________               _________         |  TCP    |
   |---------|     |   IP    |             |   IP    |        |---------|
   |         |     |  over   |             |  over   |        |         |
   |         |     |  ATM    |             |  ATM    |        |         |
   |         |     |---------|             |---------|        |         |
   |   IP    |     | IP |AAL5|  _________  |AAL5| IP |        |   IP    |
   |         |     |    |----| |         | |----|    |        |         |
   |         |     |    | ATM| |   ATM   | | ATM|    |        |         |
   |---------|     |----|----| |----|----| |----|----|        |---------|
   |   PL    |     | PL | PL | | PL | PL | | PL | PL |        |   PL    |
    ---------       ---------   ---------   ---------          ---------
        |            |     |     |     |     |     |               |
         ------------       -----       -----       ---------------

                    -------->----------------------
                   |                               |
                    --------- ABR Loop -----<------

    --------------->------------------->---------------------------
   |                                                                |
    ---------<------------- TCP Control Loop ----<------------------

                       Figure 3.  Simulation model

   The propagation delay (D1 + D2 + D3) and the transmission delay
   corresponding to the finite transmission rate in the IP clouds are
   simulated in the TCP/IP stack.  In the model described below, the
   delay associated with each of the clouds is 3 ms (D1 = D2 = D3 =
   3 ms).

   The performance metrics considered are the goodput at the TCP layer
   (R) and the delay for a successful transfer of a 100 KB file (D),
   defined as the duration between the transmission of the first byte
   and the receipt of an ack for the last byte.  The goodput at the TCP
   layer is defined as the rate at which data is successfully delivered
   from the TCP layer to the application layer.  Retransmitted data is
   counted only once, when it is delivered to the application layer.
   Packets that arrive out of sequence must wait until all preceding
   data has been received before they can be delivered to the
   application layer.

   Explicit Rate Switch Scheme: The switch keeps a flag for every VC
   that indicates whether the VC is active.  The switch also maintains
   a timer for each VC.  Every time a cell comes in on a particular VC,
   the flag is turned on (if it is off) and the corresponding timer is
   started (if the flag is already on, the timer is restarted).  The
   flag is turned off only when the timer expires.  Thus, the VC is
   considered active when the flag is on and inactive when the flag is
   off.  The explicit rate timer value (ERT) is an adjustable
   parameter.  For our simulations we set the timer value to 1 ms,
   which corresponds to a little more than the transmission time of a
   single TCP packet at line rate.  The explicit rate for each VC is
   simply the total link bandwidth divided equally among all active
   VCs.  Thus, if a VC is inactive for longer than the timer value, its
   bandwidth is reallocated to the other active VCs.
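   A minimal C sketch of the activity tracking and rate computation
   just described is given below.  The array sizes, the representation
   of the timer as a last-arrival timestamp, and the function names are
   assumptions made for illustration; only the logic follows the scheme
   described above.

      /*
       * Sketch of the per-VC activity tracking and explicit rate
       * computation used by the ER switch scheme.  A VC is treated as
       * active if a cell has been seen within the last ERT
       * milliseconds, which is equivalent to restarting a timer on
       * every cell arrival.
       */
      #define NUM_VC          10
      #define ERT_MS          1.0     /* explicit rate timer (ms)      */
      #define LINK_RATE_MBPS  155.0   /* OC-3 bottleneck link          */

      static double last_cell_ms[NUM_VC]; /* last cell arrival per VC; */
                                          /* 0 => inactive once        */
                                          /* now_ms exceeds ERT_MS     */
      static double er_mbps[NUM_VC];      /* ER fed back to each VC    */

      /* Called for every cell arrival on a VC: restarts its timer.    */
      void on_cell_arrival(int vc, double now_ms)
      {
          last_cell_ms[vc] = now_ms;
      }

      /* Recompute the explicit rates: the link bandwidth is divided
       * equally among the VCs whose activity timer has not expired.   */
      void recompute_explicit_rates(double now_ms)
      {
          int active = 0;

          for (int vc = 0; vc < NUM_VC; vc++)
              if (now_ms - last_cell_ms[vc] <= ERT_MS)
                  active++;

          if (active == 0)
              return;

          double share = LINK_RATE_MBPS / active;
          for (int vc = 0; vc < NUM_VC; vc++)
              if (now_ms - last_cell_ms[vc] <= ERT_MS)
                  er_mbps[vc] = share; /* stamped into backward RM cells */
      }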
3.1 Simulation Results

   In this section we provide some initial results that help us
   understand the issues discussed in the previous sections.  The
   simulation parameters are the following:

   For TCP with Fast Retransmission and Recovery:
      Timer Granularity = 100 ms
      MSS               = 9180 bytes
      Round Trip Prop.  = 18 ms = 2*(D1 + D2 + D3)
      Max. RTO          = 500 ms
      Initial RTO       = 100 ms
      Max. Window Size  = 655360 bytes ( > Bw * RTT)

   For ABR:
      Round Trip Prop. Delay    = 6 ms = 2*D2
      Buffer sizes              = variable
      EFCI Thresholds           = configurable parameter
      Nrm                       = 32
      RDF = RIF                 = 1/32
      All link rates            = 155 Mb/s
      Explicit Rate Timer (ERT) = 1 ms

   The application is modeled as having an infinite amount of data to
   send.  The performance metrics shown are the goodput at the TCP
   layer (R) and the delay for a successful transfer of a 100 KB file
   within a contiguous data stream (D = duration between the
   transmission of the first byte and the receipt of an ack for the
   last byte).

   Note: All results are with a TCP timer granularity of 100 ms.

   Table 1: EB = 500, SB = 1000, TCP with FRR
   Edge-buffer size = 500 cells, Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, UBR-EPD threshold = 700,
   Explicit Rate Timer (ERT) = 1 ms.
    ____________________________________________________________
   |              UBR         UBR + EPD   ABR-EFCI    ABR-ER    |
   |------------------------------------------------------------|
   | Delay (D)    520 ms      521 ms      460 ms      625 ms    |
   | Goodput(R)   81.06 Mbps  87.12 Mbps  77.77 Mbps  58.95 Mbps|
    ------------------------------------------------------------

   In the case of UBR and UBR+EPD, the switch buffer is fully utilized,
   but the maximum occupancy of the edge buffers is about the size of
   one TCP packet.  Thus, cells are lost only in the switch buffer in
   the UBR cases.  ABR with EFCI reduces the cell loss in the switch
   but results in an increase in the cell loss at the edge buffer.  ABR
   with ER eliminates cell loss in the switch buffer completely but
   results in even more cell loss in the edge device.  Thus, in the
   case shown, ER results in the lowest throughput and the longest file
   transfer delay.

   For the sake of comparison, in Table 2 (below) we show the results
   obtained earlier for TCP without fast retransmission and recovery.

   Table 2: EB = 500, SB = 1000, No Fast Retransmit and Recovery
   Edge-buffer size = 500 cells, Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, UBR-EPD threshold = 700.
    ________________________________________________
   |              UBR         UBR+EPD     ABR-EFCI  |
   |------------------------------------------------|
   | Delay (D)    201 ms      139 ms      226 ms    |
   | Goodput(R)   63.94 Mbps  86.39 Mbps  54.16 Mbps|
    ------------------------------------------------

   The set of results with the enhanced TCP (Table 1) shows an overall
   improvement in throughput, but an increase in the file transfer
   delay.  In the presence of fast retransmit and recovery, the lost
   packet is detected through the receipt of duplicate acknowledgments,
   and TCP, instead of reducing its window size to one packet, merely
   reduces it to half its current size.  This increases the total
   amount of data that is transmitted by TCP.  In the absence of FRR,
   the TCP sources are silent for long periods of time while TCP
   recovers from a packet loss through a timeout.  The increase in
   throughput in the case of TCP with FRR can therefore be attributed
   mainly to the larger amount of data transmitted by the TCP source.
   However, that also increases the number of packets that are dropped,
   causing an increase in the average file transfer delay.  (The file
   transfer delay is the duration between the transmission of the first
   byte of the file and the receipt of the acknowledgment for the last
   byte of the file.)  In both sets of results we see that UBR with
   Early Packet Discard results in better performance.
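   To make the window-reduction difference described above concrete,
   the following toy calculation contrasts the two recovery behaviors
   on a single detected loss.  The 64-segment starting window is an
   assumption for illustration; this is a deliberate simplification
   (no slow start, no ssthresh bookkeeping, no ack clocking) and is not
   the TCP model used in the simulations.

      /*
       * Toy illustration of the congestion-window reduction discussed
       * above: a sender with fast retransmit and recovery halves its
       * window on a loss signalled by duplicate acks, while a sender
       * recovering through a retransmission timeout collapses its
       * window to one segment (after an idle period of at least one
       * RTO, i.e., >= 100 ms with the timer granularity used here).
       */
      #include <stdio.h>

      #define MSS 9180   /* segment size used in the simulations (bytes) */

      int main(void)
      {
          long cwnd_frr     = 64L * MSS;  /* assumed window before loss  */
          long cwnd_timeout = 64L * MSS;

          cwnd_frr    /= 2;     /* loss detected via duplicate acks      */
          cwnd_timeout = MSS;   /* loss detected via timeout             */

          printf("after loss, FRR window     = %ld bytes\n", cwnd_frr);
          printf("after loss, timeout window = %ld bytes\n", cwnd_timeout);
          return 0;
      }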
   While the ER scheme (Table 1) results in zero cell loss within the
   ATM network, this does not translate into better end-to-end
   performance for the TCP connections.  The limited buffer sizes at
   the edge of the network result in poor effective throughput, due
   both to increased retransmissions and to poor interaction between
   the TCP window and the ABR allowed cell rate (ACR).

   All the earlier results assume limited buffers at the edge as well
   as in the switch.  We also performed simulations with larger buffers
   at the edge device and in the switch.  When the edge device buffer
   size is increased, the end-to-end throughput improves, though the
   delay may still be high, as shown in the following tables.

   Table 3: EB = 5000, SB = 1000, TCP with FRR
   Edge-buffer size = 5000 cells, Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, ER Timer (ERT) = 1 ms
    ___________________________________________
   |              ABR with EFCI   ABR with ER  |
   |-------------------------------------------|
   | Delay (D)    355 ms          233 ms       |
   | Goodput (R)  89.8 Mbps       112.6 Mbps   |
    -------------------------------------------

   Table 4: EB = 10000, SB = 1000, TCP with FRR
   Edge-buffer size = 10000 cells, Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, ER Timer (ERT) = 1 ms
    ___________________________________________
   |              ABR with EFCI   ABR with ER  |
   |-------------------------------------------|
   | Delay (D)    355 ms          256 ms       |
   | Goodput (R)  89.8 Mbps       119.6 Mbps   |
    -------------------------------------------

   The maximum TCP window size is approximately 13600 cells (655360
   bytes).  With larger buffers, the edge buffer is large enough to
   hold most of the TCP window.  However, we still get low throughput
   for ABR with EFCI because of cell loss within the switch.  ABR with
   ER, on the other hand, pushes the congestion into the edge device,
   which can now handle it with its larger buffers.  Cell loss occurs
   only when the edge buffer overflows, which is a very infrequent
   event because of the size of the buffer.  Since the TCP window
   dynamics are actually triggered by packet loss, there is less
   interaction here between the TCP flow control and the ATM flow
   control mechanisms, contrary to the earlier results (Table 1).  The
   only factor that influences the TCP flow control, in the absence of
   lost packets, is the total round trip time, which in this case
   slowly increases as the buffers fill up, thereby increasing the end-
   to-end delays.

   From the results it is evident that keeping the ATM subnetwork free
   of congestion is not sufficient to provide good end-to-end
   throughput and delay performance.  Our approach to this problem is
   to couple the two flow control loops.  The information received from
   the network, along with the local queue sizes, is used to detect
   congestion early and to speed up the TCP slow-down and recovery
   processes.  In the next section we present an edge device congestion
   handling mechanism that results in significant improvement in the
   end-to-end performance with ABR-ER and small buffers at the edge
   device.

4. Edge Device Congestion Handling Mechanism

   The packet discard algorithm, which couples ABR flow control with
   TCP window flow control, has the objectives of relieving congestion
   as well as conveying congestion information to the TCP source as
   soon as possible.  In addition, the algorithm needs to take into
   consideration how the TCP source can recover from the lost packet
   without a significant loss of throughput.  Figure 4 gives a state
   diagram for the algorithm, followed by the pseudocode and an
   explanation of the algorithm.
   Currently we are working on an enhancement of this algorithm that is
   more sensitive to the explicit rate feedback and can lead to a
   further improvement in performance.

4.1 Adaptive Discard Algorithm

                          _________________
                   --->--|     Phase 0     |--->----
                  |       -----------------        |
                  |                                |
   Queue < LT     |                                |   (ACR < f*LD_ACR
   and            |                                |    and Queue >= LT)
   P_CTR = 0      |                                |    or
                  |                                |    Queue = MQS
                  |       _________________        |
                   ---<--|     Phase 1     |---<----
                          -----------------

                   Figure 4.  Algorithm State Diagram

   Variables:

   ACR     = Current Allowed Cell Rate (the current cell transmission
             rate).
   ER      = Maximum cell transmission rate allowed by the network, as
             received in RM cells; it is the ceiling for ACR.
   CI      = Congestion Indication for the network, as received in RM
             cells.
   LD_ACR  = ACR when a packet was last dropped, or when ACR is
             increased.  LD_ACR is always greater than or equal to ACR.
   QL      = Queue Length, the number of packets in the VC queue.
   LT      = Low Queue Threshold.
   MQS     = Maximum Queue Size.
   P_CTR   = Stores the number of packets in the VC queue when a packet
             is dropped, and is then decremented for every packet that
             is removed from the VC queue.
   Congestion Phase = 1-bit indication of the congestion phase (0 or 1).
   f       = Factor that determines how significant a rate reduction
             must be, with 0 < f < 1.

   Pseudocode:

   if (Congestion Phase == 0) {
       if ((ACR < f * LD_ACR and QL >= LT) or (QL >= MQS)) {
           Drop packet from front of the queue;
           Congestion Phase = 1;
           LD_ACR = ACR;
           P_CTR = QL;
       }
       if (ACR > LD_ACR)
           LD_ACR = ACR;
   }
   if (Congestion Phase == 1) {
       if (QL == MQS) {
           Drop packet from front of the queue;
           LD_ACR = ACR;
           P_CTR = QL;
       }
       if (QL < LT and P_CTR == 0)
           Congestion Phase = 0;
   }
   Decrement P_CTR when a packet is serviced;
   P_CTR = max(P_CTR, 0);
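   A C rendering of the pseudocode above is sketched below for
   concreteness.  The per-VC structure, the function boundaries, and
   the drop_front_packet() hook are illustrative assumptions; only the
   decision logic itself follows the pseudocode.

      /*
       * C sketch of the adaptive discard pseudocode above.  One
       * instance of ada_state is kept per VC queue at the edge device.
       */
      struct ada_state {
          double acr;     /* current Allowed Cell Rate (from RM cells) */
          double ld_acr;  /* ACR at last drop, or last increase of ACR */
          int    ql;      /* current queue length, in packets          */
          int    lt;      /* low queue threshold                       */
          int    mqs;     /* maximum queue size                        */
          int    p_ctr;   /* packets queued when the last drop occurred*/
          int    phase;   /* congestion phase, 0 or 1                  */
          double f;       /* rate reduction factor, 0 < f < 1          */
      };

      /* Hook into the edge device queue: drop the packet at the head. */
      void drop_front_packet(struct ada_state *s);

      /* Run on each packet arrival, after s->acr has been updated
       * from the most recently received RM cell.                      */
      void ada_on_packet_arrival(struct ada_state *s)
      {
          if (s->phase == 0) {
              if ((s->acr < s->f * s->ld_acr && s->ql >= s->lt) ||
                  s->ql >= s->mqs) {
                  drop_front_packet(s);
                  s->phase  = 1;
                  s->ld_acr = s->acr;
                  s->p_ctr  = s->ql;
              }
              if (s->acr > s->ld_acr)
                  s->ld_acr = s->acr;
          } else {                       /* phase 1 */
              if (s->ql >= s->mqs) {
                  drop_front_packet(s);
                  s->ld_acr = s->acr;
                  s->p_ctr  = s->ql;
              }
              if (s->ql < s->lt && s->p_ctr == 0)
                  s->phase = 0;
          }
      }

      /* Run every time a packet is serviced from the VC queue.        */
      void ada_on_packet_service(struct ada_state *s)
      {
          if (s->p_ctr > 0)
              s->p_ctr--;
      }

   Whether these checks are run on every packet arrival, on every RM
   cell, or both is an implementation choice that the pseudocode leaves
   open; the sketch assumes the checks run on packet arrival.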
   Description: On receipt of the RM cells carrying the feedback
   information, the ACR value is updated for the particular VC.  When
   the ACR is reduced drastically, we would like to convey this
   information to the TCP source as soon as possible.  The above
   algorithm uses this feedback information and the current queue
   length to decide whether a packet should be dropped from the VC
   queue.  This coupling is a critical factor in resolving the issues
   discussed in the earlier sections.  Every time a packet is dropped,
   the algorithm updates the LD_ACR (Last Drop ACR) value and uses it
   as the reference rate against which it compares new ACR values.
   This value is increased and made equal to the ACR whenever the ACR
   is larger than its current value.  Thus, LD_ACR is always greater
   than or equal to the ACR.  The algorithm consists of two phases, and
   the criteria used to effect a packet drop are different in the two
   phases.

   Phase 0: The network is assumed not to be congested in this phase.
   This is reflected both by an ACR that is changing slowly and by a
   queue length less than the maximum queue size (MQS).  Two possible
   scenarios can cause a packet drop.  If the ACR is constant or
   changing slowly, the TCP window, and hence the input rate to the
   queue, will eventually become large enough to cause the queue length
   to reach the MQS level.  Then it is necessary to drop a packet to
   trigger a reduction in the TCP window.  In the second scenario, a
   packet is dropped when there is a drastic reduction in the ER
   available to the VC and the queue length is greater than the low
   queue threshold.  The threshold should be set to at least a few
   packets to ensure the transmission of duplicate acks, and should not
   be set too close to the maximum queue size, so that the algorithm
   can still perform congestion avoidance.  A drastic reduction in the
   ACR value signifies congestion in the network, and the algorithm
   sends an early warning to the TCP source by dropping a packet.

   Front Drop: In both of the above cases the packet is dropped from
   the front of the queue.  This results in earlier triggering of the
   congestion control mechanism in TCP, as has also been observed by
   others [16].  Additionally, TCP window flow control has the property
   that the start of the sliding window aligns itself with the dropped
   packet and stays there until that packet is successfully
   retransmitted.  This reduces the amount of data that the TCP source
   can pump into a congested network.  In implementations of TCP with
   the fast retransmit and recovery options, recovery from the lost
   packet happens sooner, since at least one buffer's worth of data is
   transmitted after the lost packet (which in turn generates the
   duplicate acknowledgments required for fast retransmit and
   recovery).

   Transition to Phase 1: When TCP detects a lost packet, depending on
   the implementation, it either reduces its window size to one packet
   or reduces the window size to half its current size.  When multiple
   packets are lost within the same TCP window, different TCP
   implementations recover differently.  The first packet that is
   dropped causes the reduction in the TCP window and hence in its
   average rate.  Multiple packet losses within the same TCP window
   cause a degradation of throughput and are not desirable,
   irrespective of the TCP implementation.  After the first packet is
   dropped, the algorithm makes a transition to Phase 1 and does not
   drop any further packets to cause rate reduction in that phase.

   Phase 1: In this phase the algorithm does not drop packets to convey
   a reduction in the ACR.  Packets are dropped only when the queue
   reaches the MQS value.  The packets are dropped from the front for
   the same reasons as described for Phase 0.  When a packet is
   dropped, the algorithm records the number of packets in the queue in
   the variable P_CTR.  The TCP window size is at least as large as
   P_CTR when a packet is dropped.  Thus, the algorithm tries not to
   drop any more packets due to rate reduction until P_CTR packets have
   been serviced.

   Transition to Phase 0: If the ACR stays at the value that caused the
   transition to Phase 1, the queue length will decrease after one
   round trip time and the algorithm can transition back to Phase 0.
   If the ACR decreases further, the algorithm eventually drops another
   packet when the queue length reaches the MQS, but does not
   transition back to Phase 0.  The transition to Phase 0 takes place
   when at least P_CTR packets have been serviced and the queue length
   recedes below the LT threshold.

4.2 Simulation Results

   The initial set of results served to identify the problem and to
   motivate the edge device congestion handling mechanism.  In this
   section we discuss some of the simulation results obtained with the
   algorithm described above.  In order to get a better understanding
   of the behavior of the algorithm, these simulation experiments were
   performed with a smaller TCP packet size of 1500 bytes.  A 9180 byte
   TCP packet is approximately equivalent to 192 ATM cells, which would
   be very close to the threshold values of the algorithm for the small
   buffer range that we are considering and would unduly influence the
   results.  The rest of the simulation parameters are the same as
   above, and all results use TCP with fast retransmit and recovery.
   The results are shown for UBR and UBR with Early Packet Discard
   (EPD).  Along with ABR, we show results for ABR with a simple front
   drop strategy in the edge device and for ABR with the Adaptive
   Discard Algorithm.  These are termed ABR-ER, ABR-ER with FD, and
   ABR-ER with ADA in the tables below.  From the results it is evident
   that the end-to-end performance improves as we use intelligent
   packet discard algorithms in the edge device.  There is significant
   improvement due to the early detection of congestion based on the
   network feedback information provided by ABR.

   Table 5: EB = 500, SB = 1000, TCP with Fast Retransmit and Recovery
    ____________________________________________
   |              UBR         UBR with EPD      |
   |--------------------------------------------|
   | Delay (D)    370 ms      230 ms            |
   | Goodput(R)   63.4 Mbps   79.1 Mbps         |
    --------------------------------------------

   Table 6: EB = 500, SB = 1000, TCP with Fast Retransmit and Recovery
    __________________________________________________________
   |              ABR-ER      ABR-ER with FD  ABR-ER with ADA |
   |----------------------------------------------------------|
   | Delay (D)    715 ms      237 ms          244 ms          |
   | Goodput(R)   36.15 Mbps  67.9 Mbps       87.68 Mbps      |
    ----------------------------------------------------------

   Table 7: EB = 1000, SB = 1000, TCP with Fast Retransmit and Recovery
    __________________________________________________________
   |              ABR-ER      ABR-ER with FD  ABR-ER with ADA |
   |----------------------------------------------------------|
   | Delay (D)    310 ms      145 ms          108 ms          |
   | Goodput(R)   66.7 Mbps   107.7 Mbps      112.86 Mbps     |
    ----------------------------------------------------------

   The use of the algorithm leads to an increase in the TCP throughput
   and a decrease in the end-to-end delays seen by the end host.  The
   algorithm has two effects on the TCP streams: it drops a packet as
   soon as it detects congestion, and it then tries to avoid dropping
   additional packets.  The first packet that is dropped achieves a
   reduction in the TCP window size, and hence in the input rate; after
   that, the algorithm tries not to drop any more.  This keeps the TCP
   streams active with a higher sustained average rate.  In order to
   improve the ABR throughput and delay performance, it appears
   important to control the TCP window dynamics and to reduce the
   possibility of negative interaction between the TCP flow control
   loop and the explicit rate feedback loop.

5. Summary

   In this document we discuss the necessity of considering end-to-end
   traffic management in IP/ATM internetworks.  In particular, we
   consider the impact of the interactions between the TCP and ABR flow
   control loops on providing end-to-end quality of service to the
   user.  We also discuss the trade-off in memory requirements between
   the switch and the edge device based on the type of flow control
   employed within the ATM subnetwork.  We show that it is important to
   consider the behavior of the end-to-end protocol, such as TCP, when
   selecting the traffic management mechanisms in the IP/ATM
   internetworking environment.  ABR rate control pushes the congestion
   out of the ATM subnetwork and into the edge devices.  We show that
   this can have a detrimental effect on the end-to-end performance
   when the buffers at the edge device are limited.  In such cases we
   show that UBR with Early Packet Discard can achieve better
   performance than ABR.
   We present an algorithm for handling the congestion in the edge
   device that improves the end-to-end throughput and delay
   significantly while using ABR.  The algorithm couples the two flow
   control loops by using the information received through the explicit
   rate scheme to intelligently discard packets when there is
   congestion in the edge device.  In addition, the algorithm is
   sensitive to the TCP flow control and recovery mechanisms, to make
   sure that the throughput does not suffer due to the closing of the
   TCP window.  As presented in this document, the algorithm is limited
   to cases where different TCP streams are queued separately.  As
   future work, the algorithm will be extended to include shared memory
   cases.  More work also needs to be done on studies with
   heterogeneous sources having different round-trip delays.

References

   [1]  ATM Forum TM SWG, "Traffic Management Specification V. 4.0",
        ATM Forum, April 1996.

   [2]  V. Jacobson, "Modified TCP Congestion Avoidance Algorithm",
        end2end-interest mailing list, April 30, 1990.
        ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail

   [3]  K. Fall and S. Floyd, "Comparisons of Tahoe, Reno and Sack
        TCP", available from
        http://www-nrg.ee.lbl.gov/nrg-abstracts.html#KF95

   [4]  J. C. Hoe, "Improving the Start-up Behavior of a Congestion
        Control Scheme for TCP", SIGCOMM '96.

   [5]  RFC 1577, "Classical IP and ARP over ATM", December 1993.

   [6]  RFC 793, "Transmission Control Protocol", September 1981.

   [7]  RFC 1122, "Requirements for Internet Hosts - Communication
        Layers", October 1989.

   [8]  C. Fang and A. Lin, "A Simulation Study of ABR Robustness with
        Binary-Mode Switches", ATMF 95-132, October 1995.

   [9]  H. Chen and J. Brandt, "Performance Evaluation of EFCI Switch
        with Per VC Queuing", ATMF 96-0048, February 1996.

   [10] H. Li, K. Sui, and H. Tzeng, "TCP over ATM with ABR vs.
        UBR+EPD", ATMF 95-0718, June 1995.

   [11] R. Jain, S. Kalyanaraman, R. Goyal, and S. Fahmy, "Buffer
        Requirements for TCP over ABR", ATMF 96-0517, April 1996.

   [12] P. Newman, "Data over ATM: Standardization of Flow Control",
        SuperCon '96, Santa Clara, January 1996.

   [13] N. Yin and S. Jagannath, "End-to-End Traffic Management in
        IP/ATM Internetworks", ATM Forum 96-1406, October 1996.

   [14] S. Jagannath and N. Yin, "End-to-End TCP Performance in IP/ATM
        Internetworks", ATM Forum 96-1711, December 1996.

   [15] A. Romanow and S. Floyd, "Dynamics of TCP Traffic over ATM
        Networks", IEEE Journal on Selected Areas in Communications,
        Vol. 13, No. 4, pp. 633-641, May 1995.

   [16] T. V. Lakshman et al., "The Drop from Front Strategy in TCP and
        TCP over ATM", IEEE Infocom, 1996.

Authors' Address

   S. Jagannath and N. Yin
   Bay Networks Inc.
   Phone: (508) 670-8888
   Fax:   (508) 670-8153
   Email: jagan_pop@baynetworks.com, nyin@baynetworks.com

   The ION working group can be contacted through the chairs:

   Andy Malis
   George Swallow