Network Working Group
INTERNET DRAFT                                 S. Jagannath and N. Yin
Expires in six months                                Bay Networks Inc.
                                                           August 1997

     End-to-End Traffic Management Issues in IP/ATM Internetworks

Status of this Memo

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups.  Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time.
It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).

Abstract

This document addresses the end-to-end traffic management issues in
IP/ATM internetworks.  In the internetwork environment, the ATM
control mechanisms (e.g., Available Bit Rate (ABR) and UBR with Early
Packet Discard (EPD)) are applicable to the ATM subnetwork, while TCP
flow control extends from end to end.  We investigated the end-to-end
performance in terms of TCP throughput and file transfer delay in
cases using ABR and UBR in the ATM subnetwork.  We also discuss the
trade-off between the buffer requirements at the ATM edge device
(e.g., Ethernet-ATM switch, ATM router interface) and at the ATM
switches inside the ATM network.

Our simulation results show that in certain scenarios (e.g., with
limited edge device buffer memory) UBR with EPD may perform
comparably to ABR or even outperform it.  We show that a lossless ATM
subnetwork is not sufficient from the end-to-end performance point of
view.  The results illustrate the need for an edge device congestion
handling mechanism that couples the ABR and TCP feedback control
loops.  We present an algorithm that uses the ABR feedback
information and the edge device congestion state to make packet
dropping decisions at the edge of the ATM network.  With this
algorithm at the edge device, the end-to-end throughput and delay are
improved when ABR is used as the ATM subnetwork technology with small
buffers in the edge device.

1. Introduction

In an IP/ATM internetwork environment, legacy networks (e.g.,
Ethernet) are typically interconnected by an ATM subnetwork (e.g., an
ATM backbone).  Supporting end-to-end Quality of Service (QOS) and
traffic management in such internetworks has become an important
requirement for multimedia applications.  In this document, we focus
on the end-to-end traffic management issues in IP/ATM internetworks.
We investigated the end-to-end performance in terms of TCP throughput
and file transfer delay in cases using ABR and UBR in the ATM
subnetwork.  We also discuss the trade-off between the buffer
requirement at the ATM edge device (e.g., Ethernet-ATM switch, ATM
router interface) and at the ATM switches inside the ATM network.

The ABR rate-based control mechanism has been specified by the ATM
Forum [1].  In the past, there have been significant investigations
of ABR flow control and TCP over ATM [8-11].  Most of this work
focused on the ATM switch buffer requirement and throughput fairness
using different algorithms (e.g., simple binary Explicit Forward
Congestion Indication (EFCI) vs. Explicit Rate control; FIFO vs.
per-VC queuing, etc.).  It is generally believed that ABR can
effectively control congestion within ATM networks.

There has been little work on end-to-end traffic management in the
internetworking environment, where networks are not end-to-end ATM.
In such cases ABR flow control may simply push the congestion to the
edge of the ATM network.
Even if the ATM network is kept free of
congestion by using ABR flow control, the end-to-end performance
(e.g., the time to transfer a file) perceived by the application
(typically hosted in the legacy network) may not necessarily be
better.  Furthermore, one may argue that the reduction in buffer
requirement in the ATM switch by using ABR flow control comes at the
expense of an increase in the buffer requirement at the edge device
(e.g., ATM router interface, legacy LAN to ATM switches).

Since most of today's data applications use the TCP flow control
protocol, one may question the benefits of ABR flow control as a
subnetwork technology, arguing that UBR is equally effective and much
less complex than ABR [12].  Figures 1 and 2 illustrate the two cases
under consideration.

Source --> IP -> Router --> ATM Net --> Router -> IP --> Destination
          Cloud              UBR                 Cloud

     ----------------------->----------------------
     |                                             |
     --------------- TCP Control Loop --<----------

               Figure 1: UBR based ATM Subnetwork

In these cases, source and destination are interconnected through
IP/ATM/IP internetworks.  In Figure 1 the connection uses the UBR
service, and in Figure 2 the connection uses ABR.  In the UBR case,
when congestion occurs in the ATM network, cells are dropped (e.g.,
Early Packet Discard [15] may be utilized), resulting in a reduction
of the TCP congestion window.  In the ABR case, when congestion is
detected in the ATM network, ABR rate control becomes effective and
forces the ATM edge device to reduce its transmission rate into the
ATM network.  If congestion persists, the buffer in the edge device
will reach its capacity and start to drop packets, resulting in a
reduction of the TCP congestion window.

From the performance point of view, the latter involves two control
loops: ABR and TCP.  The ABR inner control loop may result in a
longer feedback delay for the TCP control.  Furthermore, there are
two feedback control protocols, and one may argue that the
interactions or interference between the two may actually degrade TCP
performance [12].

Source --> IP -> Router --> ATM Net --> Router -> IP --> Destination
          Cloud              ABR                 Cloud

                      ----->---------
                      |             |
                      -- ABR Loop -<-

     -------------------------->-------------------
     |                                             |
     --------------- TCP Control Loop --<----------

               Figure 2: ABR based ATM Subnetwork

From the implementation perspective, there is a trade-off between the
memory requirements at the ATM switch and at the edge device for UBR
and ABR.  In addition, we need to consider the costs of implementing
ABR and UBR at ATM switches and edge devices.

From this discussion, we believe that end-to-end traffic management
of the entire internetwork environment should be considered.  In this
document, we study the interaction and performance of ABR and UBR
from the point of view of end-to-end traffic management.  Our
simulation results show that in certain scenarios (e.g., with limited
edge device buffer memory) UBR with EPD may perform comparably to ABR
or even outperform it.  We show that a lossless ATM subnetwork is not
sufficient from the end-to-end performance point of view.  The
results illustrate the need for an edge device congestion handling
mechanism that couples the two feedback control loops - ABR and TCP.
We present an algorithm that uses the ABR feedback
information and the edge device congestion state to make packet
dropping decisions at the edge of the ATM network.  With this
algorithm at the edge device, the end-to-end throughput and delay are
improved when ABR is used as the ATM subnetwork technology with small
buffers in the edge device.

2. Buffer Requirements

In the context of feedback congestion control, buffers are used to
absorb transient traffic conditions and the steady-state rate
oscillations due to the feedback delay.  The buffers help to avoid
losses in the network and to keep the link utilized while the
aggregate traffic rate is below the link bandwidth.  When there is
congestion, the switch conveys this information back to the source.
The source reduces its rate according to the feedback information.
The switch sees a drop in its input rates after a round trip delay.
If the buffer is large enough to store the extra data arriving during
this delay, it is possible to avoid losses.  When the congestion
abates, the switch sends information back to the source allowing it
to increase its rate.  There is again a delay before the switch sees
an increase in its input rates.  Meanwhile data is drained from the
buffer at a rate higher than its input rate.  The switch continues to
see full link utilization as long as there is data in the buffer.
Since the maximum drain rate is the bandwidth of the link and the
minimum feedback delay is the round trip propagation delay, it is
generally believed that a buffer of one bandwidth-delay product (link
bandwidth times round trip propagation delay) is required to achieve
good throughput.
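As a rough illustration of this rule of thumb (not part of the
simulations in this document), the sketch below computes one
bandwidth-delay product of buffering, in ATM cells, for the OC-3 link
rate and the feedback delays used in the simulation model of
Section 3.

   # One bandwidth-delay product of buffering, expressed in ATM cells,
   # for the link rate and delays used in this document (OC-3 links,
   # D1 = D2 = D3 = 3 ms).
   CELL_BITS = 53 * 8                  # one ATM cell on the wire, in bits

   def bdp_cells(link_bps, rtt_s):
       return int(link_bps * rtt_s / CELL_BITS)

   print(bdp_cells(155e6, 0.006))      # ABR loop, 2*D2 = 6 ms      -> ~2200 cells
   print(bdp_cells(155e6, 0.018))      # TCP loop, 2*(D1+D2+D3)     -> ~6600 cells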
The above argument assumes that the source can reduce its rate when
it receives feedback information from the network and that the source
will be able to send data at a higher rate when congestion abates in
the network.  In the internetworking environment, the edge device is
the ATM end-system.  It controls its transmission rate according to
the ABR feedback information, whereas TCP sources send data packets
according to the TCP window control protocol.

TCP uses implicit negative feedback information (i.e., packet loss
and timeout).  When there is congestion in the network, TCP may
continue to transmit a full window amount of data, causing packet
loss if the buffer size is not greater than the window size.  When
multiple TCP streams share the same buffer, it is generally
impractical to size the buffer to be greater than the sum of the
window sizes of all the streams.  When there is packet loss, the TCP
window size is reduced multiplicatively and the connection goes
through a recovery phase.  In this phase, TCP is constrained to
transmit at a lower rate even if the network is free of congestion,
causing a decrease in link utilization.

When ABR is used, it is possible to avoid losses within the ATM
subnetwork.  However, since the host uses TCP, the congestion is
merely shifted from the ATM subnetwork to the IP/ATM interface.
There may be packet loss in the edge device and a corresponding loss
of throughput.  Moreover, there is a possibility of negative
interaction between the two feedback loops.  For example, when the
TCP window is large the available rate may be reduced, and when the
TCP window is small due to packet losses the available rate may be
high.  When edge buffers are limited, this kind of negative
interaction can cause a severe degradation in throughput.  When UBR
is used, packet loss due to EPD or cell discard at the ATM switches
triggers the reduction of the TCP window.  Hence, in the UBR case,
there is only a single TCP feedback loop and no additional buffer
requirement at the edge device.

In general, using UBR needs more memory at the ATM switch, whereas
using ABR needs more memory at the edge device.  For zero cell loss,
the buffer required at the ATM switch (for UBR) or at the edge device
(for ABR) is on the order of the maximum TCP window size times the
number of TCP sessions.  The edge device may be a low cost LAN-ATM
switch with limited buffer memory.  Furthermore, the cost of the edge
device is shared by fewer TCP sessions than that of the ATM switch,
making it more cost sensitive than the backbone switch.  In reality,
the buffers at both the edge device and the ATM switch are limited
and smaller than required for zero cell loss.

Hence, our goal is to maximize the TCP throughput with limited
buffers at both the edge device and the ATM switches.  In our
simulation experiments, we assume limited buffers at both the edge
device and the ATM switches.
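To make the zero-loss figure concrete, the short calculation below
sizes such a buffer as maximum TCP window times number of sessions,
using the ten connections and the 655360-byte maximum window of the
simulation model in Section 3 (48 payload bytes per cell, matching
the roughly 13600-cell window quoted there); the helper itself is
only an illustration.

   # Zero-loss buffer requirement: maximum TCP window * number of sessions.
   CELL_PAYLOAD = 48                   # AAL5 payload bytes carried per cell

   def zero_loss_buffer_cells(max_window_bytes, sessions):
       cells_per_window = -(-max_window_bytes // CELL_PAYLOAD)  # ceiling
       return cells_per_window * sessions

   # 10 TCP connections with a 655360-byte maximum window (Section 3.1):
   print(zero_loss_buffer_cells(655360, 10))  # ~136,540 cells (~6.5 MB of data)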
3. Internetwork Model

In the end-to-end model the whole network can be considered to be a
black box with TCP sources and destinations at the periphery.  The
only properties of the black box that the TCP modules are sensitive
to are the packet loss and the round trip delay through the black
box.  The round trip delay determines how fast the TCP window can
open up to utilize available bandwidth in the network, and also how
fast it can react to impending congestion in the network.  Packet
loss triggers the TCP congestion avoidance and recovery scheme.  The
feedback control loop of the ABR service may cause an increase in the
overall round trip delay as it constricts the rate available to each
TCP stream.  It is generally believed that ABR is able to reduce the
congestion level within the ATM network; however, it does so at the
expense of increasing the congestion level at the edge of the ATM
network.  From an end-to-end performance point of view it is not
clear if this is indeed beneficial.

In our simulations we use a modified version of TCP Reno.  This
implementation of TCP uses fast retransmission and recovery along
with a modification to deal with multiple packet losses within the
same window [4].  The simulation experiments were performed using a
TCP/IP and IP over ATM [5] protocol stack, as shown in Figure 3.  The
TCP model conforms to RFC 793 [6] and RFC 1122 [7].  In addition, the
window scaling option is used and the TCP timer granularity is set to
0.1 s.  The options considered for ATM are UBR, UBR with Early Packet
Discard (EPD), ABR with EFCI, and ABR with Explicit Rate (ER) [1].

The bigger picture of the model consists of two IP clouds connected
by an ATM cloud, as shown in Figures 1 and 2.  More specifically, the
model consists of two ATM switches connected back to back with a
single OC-3 (155 Mb/s) link.  The IP/ATM edge devices are connected
to the ATM switches via OC-3 access links.  Ten such devices are
attached to each switch, and therefore ten bi-directional TCP
connections go through the bottleneck link between the ATM switches.
One such connection is shown in the layered model in Figure 3.

 _________                                                  _________
|  App.   |                                                |  App.   |
|---------|                                                |---------|
| App. I/F|                                                | App. I/F|
|---------|                                                |---------|
|   TCP   |                                                |   TCP   |
|---------|   _________                    _________       |---------|
|         |  |   IP    |                  |   IP    |      |         |
|         |  |  over   |                  |  over   |      |         |
|         |  |   ATM   |                  |   ATM   |      |         |
|         |  |---------|                  |---------|      |         |
|   IP    |  | IP |AAL5|                  |AAL5| IP |      |   IP    |
|         |  |    |----|    _________     |----|    |      |         |
|         |  |    | ATM|   |   ATM   |    | ATM|    |      |         |
|---------|  |----|----|   |---------|    |----|----|      |---------|
|   PL    |  | PL | PL |   | PL | PL |    | PL | PL |      |   PL    |
 ---------    ---------     ---------      ---------        ---------
     |         |       |     |     |        |       |           |
      ---------         -----       --------         -----------

                  -------->----------
                  |                 |
                  ----- ABR Loop -<--

      ------------>--------------->--------------------
      |                                                |
      -----<--------- TCP Control Loop --<-------------

                    Figure 3. Simulation model

The propagation delay (D1 + D2 + D3) and the transmission delay
corresponding to the finite transmission rate in the IP cloud are
simulated in the TCP/IP stack.  In the model described below the
delay associated with each of the clouds is 3 ms (D1 = D2 = D3 =
3 ms).  The performance metrics considered are the goodput at the
TCP layer, R, and the delay, D, for a successful transfer of a 100 KB
file (D is the duration between the transmission of the first byte
and the receipt of an ack for the last byte).  The goodput at the TCP
layer is defined as the rate at which data is successfully delivered
from the TCP layer to the application layer.  Data that is
retransmitted is counted only once, when it is delivered to the
application layer.  Packets that arrive out of sequence have to wait
until all preceding data is received before they can be delivered to
the application layer.

Explicit Rate Switch Scheme: The switch keeps a flag for every VC
which indicates whether the VC is active.  The switch maintains a
timer for each VC.  Every time a cell comes in on a particular VC the
flag is turned on (if it is off) and the corresponding timer is
started (if the flag is already on, the timer is restarted).  The
flag is only turned off when the timer expires.  Thus, the VC is
considered active when the flag is on and inactive when the flag is
off.  The explicit rate timer value (ERT) is a parameter that can be
adjusted.  For our simulation we set the timer value to 1 ms - this
corresponds to slightly more than the transmission time of a single
TCP packet (at line rate).  The explicit rate for each VC is simply
the total link bandwidth divided equally among all active VCs.  Thus,
if a VC is inactive for a duration longer than the timer value, its
bandwidth is reallocated to the other active VCs.
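As an illustration of this switch scheme (a per-VC activity flag
refreshed by a timer, with the link bandwidth divided equally among
active VCs), a minimal sketch is given below; the class and method
names are ours, not part of the simulator.

   # Sketch of the explicit-rate switch scheme described above: a VC is
   # considered active if a cell arrived on it within the last ERT
   # seconds, and the explicit rate is the link bandwidth divided
   # equally among the active VCs.
   class ExplicitRateSwitch:
       def __init__(self, link_bps=155e6, ert_s=0.001):
           self.link_bps = link_bps
           self.ert_s = ert_s            # explicit rate timer value (ERT)
           self.last_cell = {}           # VC id -> arrival time of last cell

       def on_cell(self, vc, now):
           # A cell arrival turns the activity flag on and restarts the timer.
           self.last_cell[vc] = now

       def active_vcs(self, now):
           return [vc for vc, t in self.last_cell.items()
                   if now - t < self.ert_s]

       def explicit_rate(self, vc, now):
           active = self.active_vcs(now)
           if vc not in active:
               return 0.0                # an inactive VC holds no allocation
           return self.link_bps / len(active)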
3.1 Simulation Results

In this section we provide some initial results that help us
understand the issues discussed in the previous sections.  The
simulation parameters are the following:

For TCP with Fast Retransmission and Recovery:
   Timer Granularity = 100 ms
   MSS               = 9180 bytes
   Round Trip Prop.  = 18 ms = 2*(D1 + D2 + D3)
   Max. RTO          = 500 ms
   Initial RTO       = 100 ms
   Max. Window Size  = 655360 bytes ( > Bw * RTT)

For ABR:
   Round Trip Prop. Delay    = 6 ms = 2*(D2)
   Buffer sizes              = variable
   EFCI Thresholds           = configurable parameter
   Nrm                       = 32
   RDF = RIF                 = 1/32
   All link rates            = 155 Mb/s
   Explicit Rate Timer (ERT) = 1 ms

The application is modeled as having an infinite amount of data to
send.

The performance metrics shown are the goodput at the TCP layer, R,
and the delay, D, for a successful transfer of a 100 KB file (D is
the duration between the transmission of the first byte and the
receipt of an ack for the last byte) within a contiguous data stream.

Note: All results are with a TCP timer granularity of 100 ms.

Table 1: EB = 500, SB = 1000, TCP with FRR
   Edge-buffer size = 500 cells,
   Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250,
   UBR-EPD threshold = 700,
   Exp. Rate Timer (ERT) = 1 ms.
   ______________________________________________________________
  |              UBR         UBR + EPD   ABR-EFCI    ABR-ER      |
  |---------------------------------------------------------------|
  | Delay (D)    520 ms      521 ms      460 ms      625 ms      |
  | Goodput(R)   81.06 Mbps  87.12 Mbps  77.77 Mbps  58.95 Mbps  |
   ---------------------------------------------------------------

In the case of UBR and UBR+EPD the switch buffer is fully utilized,
but the maximum occupancy of the edge buffers is about the size of
one TCP packet.  Thus, cells are lost only in the switch buffer in
the case of UBR.  ABR with EFCI reduces the cell loss in the switch
but results in an increase in cell loss in the edge buffer.  ABR with
ER eliminates cell loss completely from the switch buffer but results
in even more cell loss in the edge device.  Thus, in the case shown,
ER results in the lowest throughput and the longest file transfer
delay.

For the sake of comparison, Table 2 (below) shows the results
obtained earlier for TCP without fast retransmission and recovery.

Table 2: EB = 500, SB = 1000, No Fast Retransmit and Recovery
   Edge-buffer size = 500 cells,
   Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250,
   UBR-EPD threshold = 700.
   __________________________________________________
  |              UBR         UBR+EPD      ABR-EFCI   |
  |----------------------------------------------------|
  | Delay (D)    201 ms      139 ms       226 ms      |
  | Goodput(R)   63.94 Mbps  86.39 Mbps   54.16 Mbps  |
   ----------------------------------------------------

The set of results (Table 1) with the enhanced TCP shows an overall
improvement in throughput, but an increase in the file transfer
delay.  With fast retransmit and recovery the lost packet is detected
on receipt of duplicate acknowledgments, and TCP, instead of reducing
its window size to one packet, merely reduces it to half its current
size.  This increases the total amount of data that is transmitted by
TCP.  In the absence of FRR the TCP sources are silent for long
periods of time while TCP recovers from a packet loss through
timeout.  The increase in throughput in the case of TCP with FRR can
be mainly attributed to the larger amount of data transmitted by the
TCP source.  However, that also increases the number of packets that
are dropped, causing an increase in the average file transfer delay.
(The file transfer delay is the duration between the transmission of
the first byte of the file and the receipt of the acknowledgment for
the last byte of the file.)
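The different window reductions referred to above can be summarized
by the following simplified sketch of a Reno-style reaction to loss;
it is a textbook-level illustration (ignoring window inflation during
fast recovery), not the modified Reno implementation used in our
simulations.

   # Simplified Reno-style congestion window reduction on loss detection.
   MSS = 9180                          # segment size used in the simulations

   def on_loss(cwnd, ssthresh, detected_by_dup_acks):
       ssthresh = max(cwnd // 2, 2 * MSS)
       if detected_by_dup_acks:        # fast retransmit and recovery
           cwnd = ssthresh             # window merely halved
       else:                           # retransmission timeout
           cwnd = MSS                  # window collapses to one segment
       return cwnd, ssthresh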
In both sets of results we see that UBR with Early Packet Discard
gives better performance.  While the ER scheme (Table 1) results in
zero cell loss within the ATM network, this does not translate into
better end-to-end performance for the TCP connections.  The limited
buffer sizes at the edge of the network result in poor effective
throughput, both due to increased retransmissions and due to poor
interaction between the TCP window and the ABR allowed rate (ACR).

All the earlier results assume limited buffers at the edge as well as
at the switch.  We also performed simulations with larger buffers at
the edge device and in the switch.  When the edge device buffer size
is increased, the end-to-end throughput improves, though the delay
may still be high, as shown in the following tables.

Table 3: EB = 5000, SB = 1000, with FRR
   Edge-buffer size = 5000 cells,
   Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, ER Timer (ERT) = 1 ms
   ___________________________________________
  |               ABR with EFCI   ABR with ER |
  |---------------------------------------------|
  | Delay (D)     355 ms          233 ms       |
  | Goodput (R)   89.8 Mbps       112.6 Mbps   |
   ---------------------------------------------

Table 4: EB = 10000, SB = 1000, with FRR
   Edge-buffer size = 10000 cells,
   Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, ER Timer (ERT) = 1 ms
   ___________________________________________
  |               ABR with EFCI   ABR with ER |
  |---------------------------------------------|
  | Delay (D)     355 ms          256 ms       |
  | Goodput (R)   89.8 Mbps       119.6 Mbps   |
   ---------------------------------------------

The maximum TCP window size is approximately 13600 cells (655360
bytes).  With larger buffers, the edge buffer is large enough to hold
most of the TCP window.  However, we still get low throughput for ABR
with EFCI because of cell loss within the switch.  ABR with ER, on
the other hand, pushes the congestion into the edge device, which can
handle it with the larger buffers.  Cell loss occurs only when the
edge buffer overflows, which is a very infrequent event because of
the size of the buffer.  Since the TCP window dynamics are actually
triggered by loss of packets, there is less interaction here between
the TCP flow control and the ATM flow control mechanism, contrary to
the earlier results (Table 1).  The only factor that influences the
TCP flow control, in the absence of lost packets, is the total round
trip time, which in this case slowly increases as the buffers fill
up, thereby increasing end-to-end delays.

From these results it is evident that keeping the ATM subnetwork free
of congestion is not sufficient to provide good end-to-end throughput
and delay performance.  Our approach to solving this problem is to
couple the two flow control loops.  The information received from the
network, along with the local queue sizes, is used to detect
congestion early and to speed up the TCP slow-down and recovery
processes.  In the next section we present an edge device congestion
handling mechanism that results in significant improvement in the
end-to-end performance with ABR-ER and small buffers at the edge
device.

4. Edge Device Congestion Handling Mechanism

The Packet Discard Algorithm, which couples ABR flow control with TCP
window flow control, has the objectives of relieving congestion as
well as conveying congestion information to the TCP source as soon as
possible.
In addition, the algorithm needs to take into consideration
how the TCP source can recover from the lost packet without a
significant loss of throughput.  Figure 4 gives a state diagram for
the algorithm, followed by the pseudocode and an explanation of the
algorithm.  Currently we are working on an enhancement of this
algorithm that is more sensitive to the explicit rate feedback and
can lead to further improvement in performance.

4.1 Adaptive Discard Algorithm

                        _________________
                 --->--|     Phase 0     |--->----
                 |      -----------------        |
       QL < LT   |                        |  (ACR < f*LD_ACR
       and       |                        |   and QL >= LT)
       P_CTR = 0 |                        |       or
                 |                        |   QL = MQS
                 |      _________________        |
                 ---<--|     Phase 1     |---<----
                        -----------------

                 Figure 4. Algorithm State Diagram

Variables:
   ACR     = current Available Cell Rate (current cell transmission
             rate).
   ER      = maximum network-allowed cell transmission rate from the
             received RM cell; it is the ceiling of ACR.
   CI      = Congestion Indication in the network from the received
             RM cell.
   LD_ACR  = ACR when a packet was last dropped, or when ACR is
             increased.  LD_ACR is always greater than or equal to
             ACR.
   QL      = Queue Length, the number of packets in the VC queue.
   LT      = Low Queue Threshold.
   MQS     = Maximum Queue Size.
   P_CTR   = Stores the number of packets in the VC queue when a
             packet is dropped; it is then decremented for every
             packet that is removed from the VC queue.
   Congestion Phase = 1 bit of information on the phase of the
             congestion (0 or 1).
   f       = Factor that determines the significance of a rate
             reduction, with 0 < f < 1.

Pseudocode:

   if (Congestion Phase == 0) {
       if ((ACR < f*LD_ACR and QL >= LT) or (QL >= MQS)) {
           Drop packet from front of the queue;
           Congestion Phase = 1;
           LD_ACR = ACR;
           P_CTR = QL;
       }
       if (ACR > LD_ACR)
           LD_ACR = ACR;
   }
   if (Congestion Phase == 1) {
       if (QL == MQS) {
           Drop packet from front of the queue;
           LD_ACR = ACR;
           P_CTR = QL;
       }
       if (QL < LT and P_CTR == 0)
           Congestion Phase = 0;
   }

   Decrement P_CTR when a packet is serviced;
   P_CTR = Max(P_CTR, 0);

Description: On receipt of the RM cells carrying the feedback
information, the ACR value is updated for the particular VC.  When
ACR is reduced drastically we would like to convey this information
to the TCP source as soon as possible.  The algorithm above uses this
feedback information and the current queue length to decide whether a
packet should be dropped from the VC queue.  This coupling is a
critical factor in resolving the issues discussed in the earlier
sections.  Every time a packet is dropped the algorithm updates the
LD_ACR (Last Drop ACR) value and uses that as the reference rate
against which it compares new ACR values.  This value is increased
and made equal to the ACR whenever ACR is larger than its current
value.  Thus, LD_ACR is always greater than or equal to the ACR.  The
algorithm consists of two phases, described below.  The criterion
used to effect a packet drop is different in the two phases.
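The pseudocode above can be restated as a small state machine.  The
Python sketch below is our own transcription for illustration (not
the simulator code): variable names follow the definitions above, the
drop decision is returned as a boolean, and the two phase blocks are
written as alternatives so that a single event triggers at most one
drop.  The two phases themselves are described in detail after the
sketch.

   # Sketch of the Adaptive Discard Algorithm: given the latest ACR and
   # the current per-VC queue length, decide whether the packet at the
   # front of the queue should be dropped.
   class AdaptiveDiscard:
       def __init__(self, f=0.5, low_threshold=10, max_queue=100):
           self.f = f                # rate-reduction factor, 0 < f < 1
           self.LT = low_threshold   # low queue threshold LT (packets)
           self.MQS = max_queue      # maximum queue size MQS (packets)
           self.phase = 0            # Congestion Phase (0 or 1)
           self.ld_acr = 0.0         # LD_ACR: ACR at the last drop/increase
           self.p_ctr = 0            # P_CTR: queue length at the last drop

       def evaluate(self, acr, queue_len):
           """Run on RM-cell receipt or packet arrival; True means drop
           the packet at the front of the VC queue."""
           drop = False
           if self.phase == 0:
               if (acr < self.f * self.ld_acr and queue_len >= self.LT) \
                       or queue_len >= self.MQS:
                   drop, self.phase = True, 1
                   self.ld_acr, self.p_ctr = acr, queue_len
               if acr > self.ld_acr:
                   self.ld_acr = acr
           else:
               if queue_len >= self.MQS:
                   drop = True
                   self.ld_acr, self.p_ctr = acr, queue_len
               if queue_len < self.LT and self.p_ctr == 0:
                   self.phase = 0
           return drop

       def on_packet_serviced(self):
           # P_CTR is decremented for every packet removed from the queue.
           self.p_ctr = max(self.p_ctr - 1, 0)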
Phase 0: The network is assumed not to be congested in this phase.
This is reflected both by an ACR that is changing slowly and by a
queue length less than the maximum queue size (MQS).  Two scenarios
can cause a packet drop.  If the ACR is constant or slowly changing,
eventually the TCP window, and hence the input rate to the queue,
will become large enough to cause the queue length to reach the MQS
level.  Then a packet must be dropped to trigger a reduction in the
TCP window.  In the second scenario a packet is dropped when there is
a drastic reduction in the ER available to the VC and the queue
length is greater than the queue threshold (LT).  The threshold
should be set to at least a few packets to ensure the transmission of
duplicate acks, and should not be set too close to the size of the
queue, to preserve the congestion avoidance function.  A drastic
reduction in the ACR value signifies congestion in the network, and
the algorithm sends an early warning to the TCP source by dropping a
packet.

Front Drop: In both the above cases the packet is dropped from the
front of the queue.  This results in earlier triggering of the
congestion control mechanism in TCP.  This has also been observed by
others [16].  Additionally, TCP window flow control has the property
that the start of the sliding window aligns itself with the dropped
packet and stops there until that packet is successfully
retransmitted.  This reduces the amount of data that the TCP source
can pump into a congested network.  In implementations of TCP with
the fast recovery and retransmit options, the recovery from the lost
packet is quicker, since at least one buffer's worth of data is
transmitted after the lost packet (which in turn generates the
duplicate acknowledgments required for fast retransmit and recovery).

Transition to Phase 1: When TCP detects a lost packet, depending on
the implementation, it will either reduce its window size to one
packet or reduce the window size to half its current value.  When
multiple packets are lost within the same TCP window, different TCP
implementations recover differently.

The first packet that is dropped causes the reduction in the TCP
window and hence in its average rate.  Multiple packet losses within
the same TCP window cause a degradation of throughput and are not
desirable, irrespective of the TCP implementation.  After the first
packet is dropped the algorithm makes a transition to Phase 1 and
does not drop any packets to cause rate reduction in this phase.

Phase 1: In this phase the algorithm does not drop packets to convey
a reduction of the ACR rate.  Packets are dropped only when the queue
reaches the MQS value.  The packets are dropped from the front for
the same reasons as described for Phase 0.  When a packet is dropped
the algorithm records the number of packets in the queue in the
variable P_CTR.  The TCP window size is at least as large as P_CTR
when a packet is dropped.  Thus, the algorithm tries not to drop any
more packets due to rate reduction until P_CTR packets have been
serviced.

Transition to Phase 0: If the ACR stays at the value that caused the
transition to Phase 1, the queue length will decrease after one round
trip time and the algorithm can transit to Phase 0.  If the ACR
decreases further, the algorithm eventually drops another packet when
the queue length reaches the MQS, but does not transit back to Phase
0.  The transition to Phase 0 takes place when at least P_CTR packets
have been serviced and the queue length recedes below the LT
threshold.
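As a quick illustration of these phase transitions, the hypothetical
walk-through below drives the AdaptiveDiscard sketch from Section 4.1
with made-up ACR and queue-length values: a sharp ACR reduction
causes one front drop and the move to Phase 1, and no further
rate-triggered drop occurs until P_CTR packets have been serviced and
the queue falls below LT.

   ada = AdaptiveDiscard(f=0.5, low_threshold=5, max_queue=50)

   print(ada.evaluate(acr=100.0, queue_len=2))  # False: ACR rising, queue short
   print(ada.evaluate(acr=30.0, queue_len=10))  # True: ACR < f*LD_ACR, QL >= LT
   print(ada.evaluate(acr=30.0, queue_len=12))  # False: Phase 1, queue below MQS
   for _ in range(12):                          # queue drains; P_CTR reaches 0
       ada.on_packet_serviced()
   print(ada.evaluate(acr=30.0, queue_len=3))   # False, and back to Phase 0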
4.2 Simulation Results

The initial set of results served to identify the problem and to
motivate the edge device congestion handling mechanism.  In this
section we discuss some simulation results obtained with the
algorithm described above.  In order to get a better understanding of
the behavior of the algorithm, these simulation experiments were
performed with a smaller TCP packet size of 1500 bytes.  A 9180-byte
TCP packet is approximately equivalent to 192 ATM cells; this would
be very close to the threshold value of the algorithm for the small
buffer range that we are considering and would unduly influence the
results.  The rest of the simulation parameters are the same as used
above, with all results using TCP with fast retransmit and recovery.

The results are shown for UBR and UBR with Early Packet Discard
(EPD).  Along with ABR we show results for ABR with a simple front
drop strategy in the edge device and for ABR with the Adaptive
Discard Algorithm.  These are termed ABR-ER, ABR-ER with FD, and
ABR-ER with ADA in the tables below.  From the results it is evident
that the end-to-end performance improves as we use intelligent packet
discard algorithms in the edge device.  There is significant
improvement due to the early detection of congestion based on the
network feedback information received from ABR.

Table 5: EB = 500, SB = 1000,
   TCP with Fast Retransmit and Recovery
   _____________________________________________
  |              UBR          UBR with EPD      |
  |-----------------------------------------------|
  | Delay (D)    370 ms       230 ms             |
  | Goodput(R)   63.4 Mbps    79.1 Mbps          |
   -----------------------------------------------

Table 6: EB = 500, SB = 1000,
   TCP with Fast Retransmit and Recovery
   ____________________________________________________________
  |              ABR-ER       ABR-ER with FD   ABR-ER with ADA |
  |--------------------------------------------------------------|
  | Delay (D)    715 ms       237 ms           244 ms           |
  | Goodput(R)   36.15 Mbps   67.9 Mbps        87.68 Mbps       |
   --------------------------------------------------------------

Table 7: EB = 1000, SB = 1000,
   TCP with Fast Retransmit and Recovery
   ____________________________________________________________
  |              ABR-ER       ABR-ER with FD   ABR-ER with ADA |
  |--------------------------------------------------------------|
  | Delay (D)    310 ms       145 ms           108 ms           |
  | Goodput(R)   66.7 Mbps    107.7 Mbps       112.86 Mbps      |
   --------------------------------------------------------------

The use of the algorithm leads to an increase in the TCP throughput
and a decrease in the end-to-end delays seen by the end host.  The
algorithm has two effects on the TCP streams: it detects congestion
early and drops a packet as soon as it detects congestion, and it
then tries to avoid dropping additional packets.  The first packet
that is dropped achieves a reduction in the TCP window size and hence
in the input rate; after that, the algorithm tries not to drop any
more packets.  This keeps the TCP streams active with a higher
sustained average rate.  In order to improve the ABR throughput and
delay performance it seems to be important to control the TCP window
dynamics and reduce the possibilities for negative interaction
between the TCP flow control loop and the explicit rate feedback
loop.

5. Summary

In this document we discuss the necessity of considering end-to-end
traffic management in IP/ATM internetworks.  In particular we
consider the impact of the interactions between the TCP and ABR flow
control loops on providing end-to-end quality of service to the user.
We also discuss the trade-off in memory requirements between the
switch and the edge device based on the type of flow control employed
within the ATM subnetwork.  We show that it is important to consider
the behavior of the end-to-end protocol, such as TCP, when selecting
the traffic management mechanisms in the IP/ATM internetworking
environment.

ABR rate control pushes the congestion out of the ATM subnetwork and
into the edge devices.  We show that this can have a detrimental
effect on the end-to-end performance when the buffers at the edge
device are limited.  In such cases we show that UBR with Early Packet
Discard can achieve better performance than ABR.

We present an algorithm for handling congestion in the edge device
which significantly improves the end-to-end throughput and delay
while using ABR.  The algorithm couples the two flow control loops by
using the information received via the explicit rate scheme to
intelligently discard packets when there is congestion in the edge
device.  In addition, the algorithm is sensitive to the TCP flow
control and recovery mechanism, to make sure that throughput does not
suffer due to the closing of the TCP window.  As presented in this
document, the algorithm is limited to cases where different TCP
streams are queued separately.  As future work the algorithm will be
extended to include shared memory cases.  More work needs to be done
to include studies with heterogeneous sources with different
round-trip delays.

References

[1]  ATM Forum TM SWG, "Traffic Management Specification V. 4.0",
     ATM Forum, April 1996.

[2]  V. Jacobson, "Modified TCP Congestion Avoidance Algorithm",
     end2end-interest mailing list, April 30, 1990,
     ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail.

[3]  K. Fall and S. Floyd, "Comparisons of Tahoe, Reno and Sack TCP",
     available from http://www-nrg.ee.lbl.gov/nrg-abstracts.html#KF95.

[4]  J. C. Hoe, "Improving the Start-up Behavior of a Congestion
     Control Scheme for TCP", SIGCOMM '96.

[5]  RFC 1577, "Classical IP and ARP over ATM", December 1993.

[6]  RFC 793, "Transmission Control Protocol", September 1981.

[7]  RFC 1122, "Requirements for Internet Hosts - Communication
     Layers", October 1989.

[8]  C. Fang and A. Lin, "A Simulation Study of ABR Robustness with
     Binary-Mode Switches", ATMF 95-132, October 1995.

[9]  H. Chen and J. Brandt, "Performance Evaluation of EFCI Switch
     with Per VC Queuing", ATMF 96-0048, February 1996.

[10] H. Li, K. Sui, and H. Tzeng, "TCP over ATM with ABR vs.
     UBR+EPD", ATMF 95-0718, June 1995.

[11] R. Jain, S. Kalyanaraman, R. Goyal, and S. Fahmy, "Buffer
     Requirements for TCP over ABR", ATMF 96-0517, April 1996.

[12] P. Newman, "Data over ATM: Standardization of Flow Control",
     SuperCon '96, Santa Clara, January 1996.

[13] N. Yin and S. Jagannath, "End-to-End Traffic Management in
     IP/ATM Internetworks", ATM Forum 96-1406, October 1996.

[14] S. Jagannath and N. Yin, "End-to-End TCP Performance in IP/ATM
     Internetworks", ATM Forum 96-1711, December 1996.

[15] A. Romanow and S. Floyd, "Dynamics of TCP Traffic over ATM
     Networks", IEEE Journal on Selected Areas in Communications,
     vol. 13, no. 4, pp. 633-641, May 1995.

[16] T. V. Lakshman et al., "The Drop from Front Strategy in TCP and
     TCP over ATM", IEEE Infocom, 1996.

Authors' Addresses
S. Jagannath and N. Yin
Bay Networks Inc.
Phone: (508) 670-8888, 670-8153 (fax)
EMail: jagan_pop@baynetworks.com, nyin@baynetworks.com

The ION working group can be contacted through the chairs:
Andy Malis and George Swallow