Internet Engineering Task Force                        Jamal Hadi Salim
Internet Draft                                          Nortel Networks
Expires: September 2000                                     Uvaiz Ahmed
draft-hadi-jhsua-ecnperf-01.txt                     Carleton University
                                                             March 2000

 Performance Evaluation of Explicit Congestion Notification (ECN) in IP
                                Networks

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This draft presents a performance study of the Explicit Congestion
Notification (ECN) mechanism in the TCP/IP protocol, using our
implementation on the Linux Operating System.  ECN is an end-to-end
congestion avoidance mechanism proposed by [6] and incorporated into
RFC 2481 [7].  We study the behavior of ECN for both bulk and
transactional transfers.
Our experiments show an improvement in throughput over NON ECN (TCP
employing any of Reno, SACK/FACK or NewReno congestion control) in the
case of bulk transfers, and a substantial improvement for
transactional transfers.

A more complete pdf version of this document is available at:
http://www7.nortel.com:8080/CTL/ecnperf.pdf

This draft in its current revision is missing many of the visual
representations and experimental results found in the pdf version.

1. Introduction

In current IP networks, congestion management is left to the protocols
running on top of IP.  An IP router, when congested, simply drops
packets.  TCP is the dominant transport protocol today [26].  TCP
infers that there is congestion in the network by detecting packet
drops (RFC 2581).  Congestion control algorithms [11] [15] [21] are
then invoked to alleviate congestion.  TCP initially sends at an
increasing rate (slow start) until it detects a packet loss.  A packet
loss is inferred by the receipt of 3 duplicate ACKs or detected by a
timeout.  The sending TCP then moves into a congestion avoidance state
where it carefully probes the network by sending at a slower rate
(which goes up until another packet loss is detected).  Traditionally
a router reacts to congestion by dropping a packet in the absence of
buffer space.  This is referred to as Tail Drop.  This method has a
number of drawbacks (outlined in Section 2).  These drawbacks, coupled
with the limitations of end-to-end congestion control, have led to
interest in introducing smarter congestion control mechanisms in
routers.  One such mechanism is Random Early Detection (RED) [9],
which detects incipient congestion and implicitly signals the
oversubscribing flow to slow down by dropping its packets.  A RED-
enabled router detects congestion before the buffer overflows, based
on a running average queue size, and drops packets probabilistically
before the queue actually fills up.  The probability of dropping a
newly arriving packet increases as the average queue size increases
above a low water mark, minth, towards a high water mark, maxth.  When
the average queue size exceeds maxth, all arriving packets are
dropped.

An extension to RED is to mark the IP header instead of dropping
packets (when the average queue size is between minth and maxth; above
maxth arriving packets are dropped as before).  Cooperating end
systems would then use this as a signal that the network is congested
and slow down.  This is known as Explicit Congestion Notification
(ECN).  In this paper we study an ECN implementation on Linux for both
the router and the end systems in a live network.  The draft is
organized as follows.  In Section 2 we give an overview of queue
management in routers.  Section 3 gives an overview of ECN and the
changes required at the router and the end hosts to support ECN.
Section 4 describes the experimental testbed and the terminology used
throughout this draft.  Section 5 introduces the experiments that are
carried out, outlines the results and presents an analysis of the
results obtained.  Section 6 concludes the paper.

2. Queue Management in routers

TCP's congestion control and avoidance algorithms are necessary and
powerful, but they are not enough to provide good service in all
circumstances since they treat the network as a black box.
Some sort of control is required from the routers to complement the
end system congestion control mechanisms.  A more detailed analysis is
contained in [19].  Queue management algorithms traditionally manage
the length of packet queues in the router by dropping packets only
when the buffer overflows.  A maximum length for each queue is
configured.  The router will accept packets until this maximum size is
exceeded, at which point it will drop incoming packets.  New packets
are accepted when buffer space allows.  This technique is known as
Tail Drop.  This method has served the Internet well for years, but it
has several drawbacks.  Since all arriving packets (from all flows)
are dropped when the buffer overflows, this interacts badly with the
congestion control mechanism of TCP.  A cycle is formed: a burst of
drops occurs after the maximum queue size is exceeded, followed by a
period of underutilization at the router as end systems back off.  End
systems then increase their windows simultaneously up to a point where
a burst of drops happens again.  This phenomenon is called Global
Synchronization.  It leads to poor link utilization and lower overall
throughput [19].  Another problem with Tail Drop is that a single
connection or a few flows could monopolize the queue space in some
circumstances.  This results in a lock-out phenomenon, often due to
synchronization or other timing effects [19].  Lastly, one of the
major drawbacks of Tail Drop is that queues remain full for long
periods of time; one of the major goals of queue management is to
reduce the steady state queue size [19].  Other queue management
techniques include random drop on full and drop from front on full
[13].

2.1. Active Queue Management

Active queue management mechanisms detect congestion before the queue
overflows and provide an indication of this congestion to the end
nodes [7].  With this approach TCP does not have to rely only on
buffer overflow as the indication of congestion, since notification
happens before serious congestion occurs.  One such active queue
management technique is RED.

2.1.1. Random Early Detection

Random Early Detection (RED) [9] is a congestion avoidance mechanism
implemented in routers that works on the basis of active queue
management.  RED addresses the shortcomings of Tail Drop.  A RED
router signals incipient congestion to TCP by dropping packets
probabilistically before the queue runs out of buffer space.  This
drop probability is dependent on a running average queue size to avoid
any bias against bursty traffic.  A RED router randomly drops arriving
packets, with the result that the probability of dropping a packet
belonging to a particular flow is approximately proportional to the
flow's share of the bandwidth.  Thus, if a sender is using relatively
more bandwidth it gets penalized by having more of its packets
dropped.  RED operates by maintaining two thresholds, a minimum
(minth) and a maximum (maxth).  It drops a packet probabilistically if
and only if the average queue size lies between the minth and maxth
thresholds.  If the average queue size is above the maximum threshold,
the arriving packet is always dropped.  When the average queue size is
between the minimum and the maximum threshold, each arriving packet is
dropped with probability pa, where pa is a function of the average
queue size.  As the average queue length varies between minth and
maxth, pa increases linearly towards a configured maximum drop
probability, maxp.  Beyond maxth, the drop probability is 100%.
Dropping packets in this way ensures that only a subset of the sources
see drops at any one time; those sources invoke their congestion
avoidance algorithms, easing the congestion at the gateway.  Since the
dropping is distributed across flows, the problem of global
synchronization is avoided.
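To make the decision rule concrete, the following minimal C sketch
computes the RED drop decision described above.  It is illustrative
only: the names are ours, and the full RED algorithm of [9]
additionally spaces drops out by adjusting pa based on the count of
packets accepted since the last drop, which we omit here.

   #include <stdlib.h>

   #define RED_PASS 0
   #define RED_DROP 1  /* drop (or mark, if the flow is ECN capable) */

   /* Decide the fate of an arriving packet given the running
    * average queue size avg and the configured RED parameters. */
   int red_decide(double avg, double minth, double maxth, double maxp)
   {
           double pa;

           if (avg < minth)
                   return RED_PASS;    /* below the low water mark */
           if (avg >= maxth)
                   return RED_DROP;    /* above the high water mark */

           /* pa grows linearly from 0 at minth to maxp at maxth */
           pa = maxp * (avg - minth) / (maxth - minth);
           return ((double)rand() / RAND_MAX < pa) ? RED_DROP
                                                   : RED_PASS;
   }

For example, with minth = 5 packets, maxth = 15 packets and maxp =
0.1, an average queue of 10 packets gives pa = 0.1 * (10 - 5) /
(15 - 5) = 0.05.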
3. Explicit Congestion Notification

Explicit Congestion Notification is a proposed extension to RED that
marks a packet instead of dropping it when the average queue size is
between minth and maxth [7].  Since ECN marks packets before
congestion actually occurs, it is useful for protocols like TCP that
are sensitive to even a single packet loss.  Upon receipt of a
congestion-marked packet, the TCP receiver informs the sender (in the
subsequent ACK) about incipient congestion, which in turn triggers the
congestion avoidance algorithm at the sender.  ECN requires support
from both the router and the end hosts, i.e., the end hosts' TCP
stacks need to be modified.  Packets from flows that are not ECN
capable will continue to be dropped by RED (as was the case before
ECN).

3.1. Changes at the router

Router side support for ECN can be added by modifying current RED
implementations.  For packets from ECN capable hosts, the router marks
the packets rather than dropping them (if the average queue size is
between minth and maxth).  The router must be able to identify that a
packet is ECN capable, and should only mark packets that are from ECN
capable hosts.  This uses two bits in the IP header.  The ECN Capable
Transport (ECT) bit is set by the sender end system; for a unicast
transport, it is set only if both end systems are ECN capable.  In TCP
this is confirmed in the pre-negotiation during the connection setup
phase (explained in Section 3.2).  Packets encountering congestion are
marked by the router using the Congestion Experienced (CE) bit (if the
average queue size is between minth and maxth) on their way from the
sender to the receiver end system, with a probability proportional to
the average queue size, following the procedure used in RED (RFC 2309)
routers.  Bits 10 and 11 in the IPv6 header are proposed respectively
for the ECT and CE bits.  Bits 6 and 7 of the IPv4 header DSCP field
are likewise specified, for experimental purposes, as the ECT and CE
bits respectively.
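The marking operation itself reduces to a few bit operations on the
TOS octet.  The sketch below assumes the RFC 2481 IPv4 bit positions
given above (ECT as 0x02 and CE as 0x01 within the TOS octet); the
function name is ours, and a real router must also recompute the IP
header checksum after modifying the octet.

   #define IP_ECT 0x02  /* ECN Capable Transport: bit 6 of IPv4 TOS */
   #define IP_CE  0x01  /* Congestion Experienced: bit 7 of IPv4 TOS */

   /* Called once RED has decided to signal congestion on a packet.
    * Returns 1 if the packet was marked, 0 if it should be dropped
    * (i.e. the flow is not ECN capable). */
   int ecn_mark_or_drop(unsigned char *tos)
   {
           if (*tos & IP_ECT) {
                   *tos |= IP_CE;    /* ECN capable: mark instead */
                   return 1;
           }
           return 0;                 /* NON ECN: drop as before */
   }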
3.2. Changes at the TCP Host side

The proposal to add ECN to TCP specifies two new flags in the reserved
field of the TCP header.  Bit 9 in the reserved field of the TCP
header is designated as the ECN-Echo (ECE) flag and Bit 8 is
designated as the Congestion Window Reduced (CWR) flag.  These two
bits are used both for the initializing phase, in which the sender and
the receiver negotiate the capability and the desire to use ECN, and
for the subsequent actions to be taken in case congestion is
experienced in the network during the established state.

Two main changes need to be made to add ECN to TCP at an end system,
along with one extension to a router running RED:

1. In the connection setup phase, the source and destination TCPs have
to exchange information about their desire and/or capability to use
ECN.  The sender does this by setting both the ECN-Echo flag and the
CWR flag in the SYN packet of the initial connection phase; on receipt
of this SYN packet, the receiver sets the ECN-Echo flag in the SYN-ACK
response.  Once this agreement has been reached, the sender thereafter
sets the ECT bit in the IP header of data packets for that flow, to
indicate to the network that it is capable and willing to participate
in ECN.  The ECT bit is set on all packets other than pure ACKs.

2. When a router has decided, based on its active queue management
mechanism, to drop or mark a packet, it checks the IP-ECT bit in the
packet header.  It sets the CE bit in the IP header if the IP-ECT bit
is set.  When such a packet reaches the receiver, the receiver
responds by setting the ECN-Echo flag (in the TCP header) in the next
outgoing ACK for the flow.  The receiver will continue to do this in
subsequent ACKs until it receives an indication from the sender that
it (the sender) has responded to the congestion notification.

3. Upon receipt of this ACK, the sender triggers its congestion
avoidance algorithm by halving its congestion window, cwnd, and
updating its slow start threshold, ssthresh.  Once it has taken these
steps, the sender sets the CWR bit on the next outgoing data packet to
tell the receiver that it has reacted to the (receiver's) notification
of congestion.  The receiver reacts to the CWR by halting the sending
of congestion notifications (ECE) to the sender if there is no new
congestion in the network.

Note that the sender's reaction to the indication of congestion in the
network (when it receives an ACK packet that has the ECN-Echo flag
set) is equivalent to the Fast Retransmit/Recovery algorithm (when
there is a congestion loss) in NON-ECN-capable TCP, i.e., the sender
halves the congestion window cwnd and reduces the slow start threshold
ssthresh.  Fast Retransmit/Recovery is still available to ECN capable
stacks for responding to three duplicate acknowledgments.
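The established-state behaviour implied by steps 2 and 3 amounts to a
small state machine at each host.  The C sketch below is a
simplification of what an actual stack (such as the Linux
implementation used in our experiments) must do; the structure and
field names are ours, and the once-per-window bookkeeping is reduced
to a single flag that the caller is assumed to clear as each window of
data is acknowledged.

   struct ecn_state {
           unsigned int cwnd;      /* congestion window, in segments */
           unsigned int ssthresh;  /* slow start threshold */
           int ecn_echo;   /* receiver: set ECE on outgoing ACKs */
           int send_cwr;   /* sender: set CWR on next data packet */
           int reacted;    /* sender: already reacted this window */
   };

   /* Receiver side: run for every arriving data segment. */
   void ecn_rcv_data(struct ecn_state *tp, int ip_ce, int tcp_cwr)
   {
           if (ip_ce)
                   tp->ecn_echo = 1;  /* echo until CWR is seen */
           if (tcp_cwr)
                   tp->ecn_echo = 0;  /* sender reacted; stop echoing */
   }

   /* Sender side: run for every arriving ACK. */
   void ecn_rcv_ack(struct ecn_state *tp, int tcp_ece)
   {
           if (tcp_ece && !tp->reacted) {
                   /* same reaction as Fast Retransmit/Recovery: halve
                    * cwnd and reduce ssthresh, but retransmit nothing */
                   tp->ssthresh = tp->cwnd / 2 > 2 ? tp->cwnd / 2 : 2;
                   tp->cwnd = tp->ssthresh;
                   tp->send_cwr = 1;  /* announce reaction via CWR */
                   tp->reacted = 1;   /* at most once per window */
           }
   }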
4. Experimental setup

For testing purposes we added ECN to the Linux TCP/IP stack, kernel
versions 2.0.32, 2.2.5 and 2.3.43 (earlier revisions of 2.3 were also
tested).  The 2.0.32 implementation conforms to RFC 2481 [7] for the
end systems only.  We have also modified the code in the 2.1, 2.2 and
2.3 cases for the router portion as well as the end systems to conform
to the RFC.  An outdated version of the 2.0 code is available at [18].
Note that Linux version 2.0.32 implements TCP Reno congestion control,
while kernels >= 2.2.0 default to New Reno but will opt for a
SACK/FACK combo when the remote end understands SACK.  Our initial
tests were carried out with the 2.0 kernel at the end systems and 2.1
(pre 2.2) for the router part.  The majority of the test results here
apply to the 2.0 tests.  We did repeat these tests on a different
testbed (a move from Pentium to Pentium-II class machines) with faster
machines for the 2.2 and 2.3 kernels, so the 2.0 and 2.2/2.3 results
are not directly comparable.

We have updated this draft release to reflect the tests against SACK
and New Reno.

4.1. Testbed setup

                                           -----      ----
                                          | ECN |    | ECN |
                                          | ON  |    | OFF |
       data direction ---->>              -----      ----
                                             |          |
    server                                   |          |
     ----      ------        ------          |          |
    |    |    |  R1  |      |  R2  |         |          |
    |    |----|      | ---- |      |----------------------
     ----      ------   ^    ------              |
                        ^                        |
                        |                      -----
    congestion point ___|                     |  C  |
                                              |     |
                                               -----

The figure above shows our test setup.

All the physical links are 10Mbps Ethernet.  Using Class Based Queuing
(CBQ) [22], packets from the data server are constricted to a 1.5Mbps
pipe at the router R1.  Data is always retrieved from the server
towards the clients, labelled "ECN ON", "ECN OFF", and "C".  Since the
pipe from the server is 10Mbps, this creates congestion at the exit
from the router towards the clients for competing flows.  The machines
labeled "ECN ON" and "ECN OFF" are running the same version of Linux
and have exactly the same hardware configuration.  The server is
always ECN capable (and can handle NON ECN flows as well, using the
standard congestion algorithms).  The machine labeled "C" is used to
create congestion in the network.  Router R2 acts as a path-delay
controller; with it we adjust the RTT the clients see.  Router R1 has
RED implemented in it and is capable of supporting ECN flows.  The
path-delay router is a PC running the Nistnet [16] package on a Linux
platform.  The latency of the link for the experiments was set to 20
millisecs.

4.2. Validating the Implementation

We spent time validating that the implementation conformed to the
specification in RFC 2481.  To do this, the popular tcpdump sniffer
[24] was modified to show the packets being marked.  We visually
inspected tcpdump traces to validate conformance to the RFC under many
different scenarios.  We also modified tcptrace [25] in order to plot
the marked packets for visualization and analysis.

Both tcpdump and tcptrace revealed that the implementation was
conformant to the RFC.

4.3. Terminology used

This section presents background terminology used in the next few
sections.

* Congesting flows: These are TCP flows that are started in the
background so as to create congestion from R1 towards R2.  We use the
laptop labeled "C" to introduce congesting flows.  Note that "C", like
the other clients, retrieves data from the server.

* Low, Moderate and High congestion: For the case of low congestion we
start two congesting flows in the background, for moderate congestion
we start five congesting flows, and for the case of high congestion we
start ten congesting flows in the background.

* Competing flows: These are the flows that we are interested in.
They are either ECN TCP flows from/to "ECN ON" or NON ECN TCP flows
from/to "ECN OFF".

* Maximum drop rate: This is the RED parameter that sets the maximum
probability of a packet being marked at the router.  This corresponds
to maxp as explained in Section 2.1.1.

* Low, Medium and High drop probability: We use the term low
probability to mean a drop probability maxp of 0.02, medium
probability for 0.2 and high probability for 0.5.  We also
experimented with drop probabilities of 0.05, 0.1 and 0.3.
* Goodput: We define goodput as the effective data rate as observed by
the user, i.e., if we transmit 4 data packets of which two are
retransmissions, the efficiency is 50% and the resulting goodput is
2 * packet size / time taken to transmit.

* RED Region: When the router's average queue size is between minth
and maxth, we say that we are operating in the RED region.

Our tests were repeated for varying levels of congestion with varying
maximum drop rates.  The results are presented in the subsequent
sections.

4.4. RED parameter selection

In our initial testing we noticed that as we increase the number of
congesting flows, the RED queue degenerates into a simple Tail Drop
queue, i.e., the average queue size exceeds the maximum threshold most
of the time.  Note that this phenomenon has also been observed by [5],
which proposes a dynamic solution to alleviate it by adjusting the
packet dropping probability maxp based on the past history of the
average queue size.  Hence, it is necessary that in the course of our
experiments the router operate in the RED region, i.e., we have to
make sure that the average queue size is maintained between minth and
maxth.  If this is not maintained, the queue acts like a Tail Drop
queue and the advantages of ECN diminish.  Our goal is to validate
ECN's benefits when used with RED at the router.  To ensure that we
were operating in the RED region, we monitored the average queue size
and the actual queue size in times of low, moderate and high
congestion, and fine-tuned the RED parameters such that the average
queue size stays within the RED region before running the experiment
proper.  Our results are, therefore, not influenced by operating in
the wrong RED region.

5. The Experiments

We start by making sure that the background flows do not bias our
results by computing the fairness index [12] in Section 5.1.  We
proceed to carry out the experiments for bulk transfers, presenting
the results and analysis in Section 5.2.  Section 5.3 presents the
results and analysis for transactional transfers.  More details on the
experimental results can be found in [27].

5.1. Fairness

In the course of the experiments we wanted to make sure that our
choice of the type of background flows does not bias the results that
we collect.  Hence we initially carried out some tests with both ECN
and NON ECN flows as the background flows.  We repeated the
experiments for different drop probabilities and calculated the
fairness index [12].  We also noticed (when there were an equal number
of ECN and NON ECN flows) that the number of packets dropped for the
NON ECN flows was equal to the number of packets marked for the ECN
flows, showing that the RED algorithm was fair to both kinds of flows.

Fairness index: The fairness index is a performance metric described
in [12].  Jain [12] postulates that the network is a multi-user
system, and derives a metric to see how fairly each user is treated.
He defines fairness as a function of the variability of throughput
across users.  For a given set of user throughputs (x1, x2, ..., xn),
the fairness index of the set is defined as follows:

   f(x1,x2,...,xn) = (sum[i=1..n] xi)^2 / (n * sum[i=1..n] (xi)^2)

The fairness index always lies between 0 and 1.  A value of 1
indicates that all flows got exactly the same throughput.  Each of the
tests was carried out 10 times to gain confidence in our results.  To
compute the fairness index we used FTP to generate traffic.
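The index is straightforward to compute.  A small self-contained C
example follows; the four throughput values are made up purely for
illustration.

   #include <stdio.h>

   /* Jain's fairness index [12]:
    * f = (sum of xi)^2 / (n * sum of xi^2) */
   double fairness_index(const double *x, int n)
   {
           double sum = 0.0, sumsq = 0.0;
           int i;

           for (i = 0; i < n; i++) {
                   sum += x[i];
                   sumsq += x[i] * x[i];
           }
           return (sum * sum) / (n * sumsq);
   }

   int main(void)
   {
           /* hypothetical per-flow throughputs, e.g. in KB/s */
           double tput[4] = { 370.0, 375.0, 380.0, 372.0 };

           printf("fairness = %.6f\n", fairness_index(tput, 4));
           return 0;
   }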
Experiment details: At time t = 0 we start 2 NON ECN FTP sessions in
the background to create congestion.  At time t = 20 seconds we start
two competing flows.  We note the throughput of all the flows in the
network and calculate the fairness index.  The experiment was carried
out for various maximum drop probabilities and congestion levels.  The
same procedure is repeated with ECN background flows.  The fairness
index was fairly constant whether the background flows were ECN or NON
ECN, indicating that the choice of background flows introduced no
bias.

   Max Drop    Fairness with     Fairness with
   Prob        ECN BG flows      NON ECN BG flows

   0.02        0.996888          0.991946
   0.05        0.995987          0.988286
   0.1         0.985403          0.989726
   0.2         0.979368          0.983342

With the observation that the nature of the background flows does not
alter the results, we use NON ECN background flows for the rest of the
experiments.

5.2. Bulk transfers

The metric we chose for bulk transfers is end user throughput.

Experiment details: All TCP flows used are Reno TCP.  For the case of
low congestion we start 2 FTP flows in the background at time 0.
Then, after about 20 seconds, we start the competing flows: one data
transfer to the ECN machine and the second to the NON ECN machine.
The size of the file used is 20MB.  For the case of moderate
congestion we start 5 FTP flows in the background, and for the case of
high congestion we start 10 FTP flows in the background.  We repeat
the experiments for various maximum drop rates, each repeated for a
number of sets.

Observation and Analysis:

We make three key observations:

1) As the congestion level increases, the relative advantage for ECN
increases but the absolute advantage decreases (expected, since there
are more flows competing for the same link resource).  ECN still does
better than NON ECN even under high congestion.  Taking a sample from
the collected results: at a maximum drop probability of 0.1, for
example, the relative advantage of ECN increases from 23% to 50% as
the congestion level increases from low to high.

2) Maintaining the congestion level and varying the maximum drop
probability (MDP) reveals that the relative advantage of ECN increases
with increasing MDP.  As an example, for the case of high congestion,
as we vary the drop probability from 0.02 to 0.5 the relative
advantage of ECN increases from 10% to 60%.

3) There were hardly any retransmissions for ECN flows (except the
occasional packet drop in a minority of the tests for the case of high
congestion and low maximum drop probability).

We analyzed tcpdump traces for NON ECN with the help of tcptrace and
observed that there were hardly any retransmits due to timeouts.
(Retransmits due to timeouts are inferred by counting the fast
retransmits triggered by 3 DUPACKs and subtracting them from the total
recorded number of retransmits.)  This means that over a long period
of time (as is the case for long bulk transfers), the data-driven loss
recovery mechanism of the Fast Retransmit/Recovery algorithm is very
effective.  The sender's reaction to an ECN congestion notification
(ECE) is the same as its reaction to a Fast Retransmit for NON ECN.
Since both are operating in the RED region, ECN barely gets any
advantage over NON ECN from the signaling (packet drop vs. marking).
It is clear, however, from the results that ECN flows benefit in bulk
transfers.  We believe that the main advantage of ECN for bulk
transfers is that less time is spent recovering (whereas NON ECN
spends time retransmitting), and timeouts are avoided altogether.
[23] has shown that even with RED deployed, TCP Reno could suffer from
multiple packet drops within the same window of data, likely leading
to multiple congestion reactions or timeouts (these problems are
alleviated by ECN).  However, while TCP Reno has performance problems
with multiple packets dropped in a window of data, New Reno and SACK
have no such problems.

Thus, for scenarios with very high levels of congestion, the
advantages of ECN for TCP Reno flows could be more dramatic than the
advantages of ECN for NewReno or SACK flows.  An important observation
to make from our results is that we do not notice multiple drops
within a single window of data.  Thus, we would expect that our
results are not heavily influenced by Reno's performance problems with
multiple packets dropped from a window of data.

We repeated these tests with newer, ECN-patched Linux kernels.  As
mentioned earlier, these kernels use a SACK/FACK combo with a fallback
to New Reno.  SACK can be selectively turned off (defaulting to New
Reno).  Our results indicate that ECN still improves performance for
bulk transfers.  More results are available in the pdf version [27].
As in 1) above, maintaining a maximum drop probability of 0.1 and
increasing the congestion level, we observe that ECN-SACK improves
performance from about 5% at low congestion to about 15% at high
congestion.  In the scenario where high congestion is maintained and
the maximum drop probability is moved from 0.02 to 0.5, the relative
advantage of ECN-SACK improves from 10% to 40%.  Although these
numbers are lower than the ones exhibited by Reno, they do reflect the
improvement that ECN offers even in the presence of robust recovery
mechanisms such as SACK.

5.3. Transactional transfers

We model transactional transfers by sending a small request and
getting a response from a server before sending the next request.  To
generate transactional transfer traffic we use Netperf [17] with the
CRR (Connect Request Response) option.  As an example, assume that we
are retrieving a small file of, say, 5-20KB; in effect we send a small
request to the server and the server responds by sending us the file.
The transaction is complete when we receive the complete file.  To
gain confidence in our results we run each test for about one hour,
during which a few thousand of these requests and responses take
place.  Although not exactly modeling HTTP 1.0 traffic, where several
concurrent sessions are opened, Netperf-CRR is nevertheless a close
approximation.  Since Netperf-CRR waits for one connection to complete
before opening the next one (0 think time), that single connection
could be viewed as the slowest response in the set of the opened
concurrent sessions (in HTTP).  The transactional data sizes were
selected based on [2], which indicates that the average web
transaction is around 8-10KB; the smaller (5KB) size was selected as
an estimate of the size of transactional processing that may become
prevalent with policy management schemes in the diffserv [4] context.
Using Netperf we are able to initiate these kinds of transactional
transfers for a variable length of time.  The main metric of interest
in this case is the transaction rate, which is recorded by Netperf.

* Transaction rate: The number of complete request/response exchanges
of a particular requested size that we are able to perform per second.
For example, if our request is of 1KB and the response is 5KB, then we
define the transaction rate as the number of such complete
transactions that we can accomplish per second.

Experiment details: As in the case of bulk transfers, we start the
background FTP flows to introduce congestion in the network at time 0.
About 20 seconds later we start the transactional transfers and run
each test for three minutes.  We record the number of completed
transactions per second.  We repeat the test for about an hour and
plot the various transactions per second, averaged over the runs.  The
experiment is repeated for various maximum drop probabilities, file
sizes and levels of congestion.

Observation and Analysis

There are three key observations:

1) As congestion increases (with fixed drop probability), the relative
advantage for ECN increases (again, the absolute advantage does not
increase since more flows are sharing the same bandwidth).  For
example, from the results, if we consider the 5KB transactional flow,
as we increase the congestion from medium congestion (5 congesting
flows) to high congestion (10 congesting flows) for a maximum drop
probability of 0.1, the relative gain for ECN increases from 42% to
62%.

2) Maintaining the congestion level while adjusting the maximum drop
probability indicates that the relative advantage for ECN flows
increases.  For the case of high congestion with the 5KB flow, we
observe that the number of transactions per second increases from 0.8
to 2.2, which corresponds to an increase in the relative gain for ECN
from 20% to 140%.

3) As the transactional data size increases, ECN's advantage
diminishes because the probability of recovering via a Fast Retransmit
increases for NON ECN.  ECN therefore has a huge advantage as the
transactional data size gets smaller, as is observed in the results.
This can be explained by looking at TCP's recovery mechanisms.  NON
ECN in the short flows depends, for recovery, on congestion signaling
via the receipt of 3 duplicate ACKs or, worse, on a retransmit timer
expiration, whereas ECN depends mostly on the TCP-ECE flag.  This is
by design in our experimental setup.  [3] shows that most TCP loss
recovery in fact happens via timeouts for short flows.  The
effectiveness of the Fast Retransmit/Recovery algorithm is limited by
the fact that there might not be enough data in the pipe to elicit 3
duplicate ACKs.  TCP Reno needs at least 4 outstanding packets to
recover from losses without going into a timeout.  For 5KB (4 packets
for an MTU of 1500 bytes) a NON ECN flow will always have to wait for
a retransmit timeout if any of its packets are lost.  (This timeout
could only have been avoided if the flow had used an initial window of
four packets, and the first of the four packets was the packet
dropped.)
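The arithmetic behind this can be checked with a short sketch of our
own (assuming a 1460 byte MSS and all four segments outstanding): only
a loss of the first segment leaves enough trailing segments to
generate the 3 duplicate ACKs that Fast Retransmit requires.

   #include <stdio.h>

   int main(void)
   {
           int mss  = 1460;          /* MTU 1500 minus 40B of headers */
           int size = 5 * 1024;      /* 5KB transactional response */
           int segs = (size + mss - 1) / mss;  /* ceiling -> 4 */
           int lost;

           for (lost = 1; lost <= segs; lost++) {
                   int dupacks = segs - lost;  /* segments after loss */
                   printf("segment %d lost -> %d dup ACKs -> %s\n",
                          lost, dupacks,
                          dupacks >= 3 ? "fast retransmit" : "timeout");
           }
           return 0;
   }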
We repeated these experiments with the kernels implementing the
SACK/FACK and New Reno algorithms.  Our observation was that there was
hardly any difference from what we saw with Reno.  For example, in the
case of SACK-ECN, maintaining the maximum drop probability at 0.1 and
increasing the congestion level for the 5KB transaction, we noticed
that the relative gain for the ECN enabled flows increases from 47% to
80%.  If we instead maintain the congestion level for the 5KB
transactions and increase the maximum drop probabilities, we notice
that SACK-ECN's relative gain increases from 15% to 120%.  It is fair
to comment that the difference in the testbeds (different machines,
same topology) might have contributed to these results; however, it is
worth noting that the relative advantage of SACK-ECN is obvious.

6. Conclusion

ECN enhancements improve both bulk and transactional TCP traffic.  The
improvement is more obvious in short transactional types of flows
(popularly referred to as mice).

* Because fewer retransmits happen with ECN, there is less traffic on
the network.  Although the relative amount of data retransmitted in
our case is small, the effect could be higher when there are more
contributing end systems.  The absence of retransmits also implies an
improvement in goodput.  This becomes very important for scenarios
where bandwidth is expensive, such as on low-bandwidth links.  It also
implies that ECN lends itself well to applications that require
reliability but would prefer to avoid unnecessary retransmissions.

* The fact that ECN avoids timeouts by getting faster notification (as
opposed to the traditional packet-drop inference from 3 duplicate ACKs
or, even worse, timeouts) implies that less time is spent in error
recovery - this also improves goodput.

* ECN could be used to help in service differentiation, where the end
user is able to "probe" for their target rate faster.  Assured
forwarding [1] in the diffserv working group at the IETF proposes
using RED with varying drop probabilities as a service differentiation
mechanism.  Multiple packets within a single window in TCP Reno could
be dropped even in the presence of RED, likely leading to timeouts
[23].  ECN end systems ignore multiple notifications within a window
of data, which helps counter this scenario, resulting in improved
goodput.  The ECN end system also ends up probing the network faster
(to reach an optimal bandwidth).  [23] also notes that Reno is the
most widely deployed TCP implementation today.

It is clear that the advent of policy management schemes introduces
new requirements for transactional applications, which consist of a
very short query and a response on the order of a few packets.  ECN
provides advantages to transactional traffic, as we have shown in the
experiments.

7. Acknowledgements

We would like to thank Alan Chapman, Ioannis Lambadaris, Thomas Kunz,
Biswajit Nandy, Nabil Seddigh, Sally Floyd, and Rupinder Makkar for
their helpful feedback and valuable suggestions.

8. References

[1] Heinanen, J., et al., "Assured Forwarding PHB Group",
    http://search.ietf.org/internet-drafts/draft-ietf-diffserv-af-06.txt.
    Work in Progress.

[2] Mah, B. A., "An Empirical Model of HTTP Network Traffic", In
    Proceedings of INFOCOM '97.

[3] Balakrishnan, H., Padmanabhan, V., Seshan, S., Stemm, M., and R.
    H. Katz, "TCP Behavior of a Busy Internet Server: Analysis and
    Improvements", Proceedings of IEEE Infocom, San Francisco, CA,
    USA, March 1998.
    http://www.cs.berkeley.edu/hari/papers/infocom98.ps.gz
[4] Blake, S., et al., "An Architecture for Differentiated Services",
    RFC 2475, December 1998.

[5] Feng, W., Kandlur, D., Saha, D., and K. Shin, "Techniques for
    Eliminating Packet Loss in Congested TCP/IP Networks", U. Michigan
    CSE-TR-349-97, November 1997.

[6] Floyd, S., "TCP and Explicit Congestion Notification", ACM
    Computer Communication Review, 24, October 1994.

[7] Ramakrishnan, K. K. and S. Floyd, "A Proposal to add Explicit
    Congestion Notification (ECN) to IP", RFC 2481, January 1999.

[8] Fall, K. and S. Floyd, "Simulation-based Comparisons of Tahoe,
    Reno and SACK TCP", Computer Communication Review, V. 26 N. 3,
    July 1996, pp. 5-21.

[9] Floyd, S. and V. Jacobson, "Random Early Detection Gateways for
    Congestion Avoidance", IEEE/ACM Transactions on Networking, 1(4),
    August 1993.

[10] Hashem, E., "Analysis of random drop for gateway congestion
     control", Report LCS TR-465, Laboratory for Computer Science,
     M.I.T., 1989.

[11] Jacobson, V., "Congestion Avoidance and Control", In Proceedings
     of SIGCOMM '88, Stanford, CA, August 1988.

[12] Jain, R., "The Art of Computer Systems Performance Analysis",
     John Wiley and Sons, QA76.9.E94J32, 1991.

[13] Lakshman, T. V., Neidhardt, A., and T. Ott, "The Drop From Front
     Strategy in TCP Over ATM and Its Interworking with Other Control
     Features", Infocom '96, MA28.1.

[14] Mishra, P. and H. Kanakia, "A hop by hop rate based congestion
     control scheme", Proc. SIGCOMM '92, pp. 112-123, August 1992.

[15] Floyd, S. and T. Henderson, "The NewReno Modification to TCP's
     Fast Recovery Algorithm", RFC 2582, April 1999.

[16] The NIST Network Emulation Tool,
     http://www.antd.nist.gov/itg/nistnet/

[17] The Netperf network performance tool,
     http://www.netperf.org/netperf/NetperfPage.html

[18] ftp://ftp.ee.lbl.gov/ECN/ECN-package.tgz

[19] Braden, B., Clark, D., et al., "Recommendations on Queue
     Management and Congestion Avoidance in the Internet", RFC 2309,
     April 1998.

[20] Ramakrishnan, K. K. and R. Jain, "A Binary Feedback Scheme for
     Congestion Avoidance in Computer Networks", ACM Trans. Comput.
     Syst., 8(2):158-181, 1990.

[21] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
     Selective Acknowledgement Options", RFC 2018, October 1996.

[22] Floyd, S. and V. Jacobson, "Link-sharing and Resource Management
     Models for Packet Networks", IEEE/ACM Transactions on Networking,
     Vol. 3 No. 4, August 1995.

[23] Bagal, P., Kalyanaraman, S., and B. Packer, "Comparative study of
     RED, ECN and TCP Rate Control",
     http://www.packeteer.com/technology/articles.htm

[24] tcpdump, the protocol packet capture and dumper program,
     ftp://ftp.ee.lbl.gov/tcpdump.tar.Z

[25] tcptrace, TCP dump file analysis tool,
     http://jarok.cs.ohiou.edu/software/tcptrace/tcptrace.html

[26] Thompson, K., Miller, G. J., and R. Wilder, "Wide-Area Internet
     Traffic Patterns and Characteristics", IEEE Network Magazine,
     November/December 1997.

[27] http://www7.nortel.com:8080/CTL/ecnperf.pdf

9. Authors' Addresses

Jamal Hadi Salim
Nortel Networks
3500 Carling Ave
Ottawa, ON, K2H 8E9
Canada
Email: hadi@nortelnetworks.com

Uvaiz Ahmed
Dept. of Systems and Computer Engineering
Carleton University
Ottawa
Canada
Email: ahmed@sce.carleton.ca