Internet Engineering Task Force                        Jamal Hadi Salim
Internet Draft                                          Nortel Networks
Expires: September 2000                                     Uvaiz Ahmed
draft-hadi-jhsua-ecnperf-01.txt                     Carleton University
                                                             March 2000

 Performance Evaluation of Explicit Congestion Notification (ECN) in IP
                                Networks

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This draft presents a performance study of the Explicit Congestion
Notification (ECN) mechanism in the TCP/IP protocol, using our
implementation on the Linux Operating System.  ECN is an end-to-end
congestion avoidance mechanism proposed by [6] and incorporated into
RFC 2481 [7].  We study the behavior of ECN for both bulk and
transactional transfers.
Our experiments show an improvement in throughput over NON ECN (TCP
employing any of Reno, SACK/FACK or NewReno congestion control) in the
case of bulk transfers, and a substantial improvement for
transactional transfers.

A more complete pdf version of this document is available at:
http://www7.nortel.com:8080/CTL/ecnperf.pdf

This draft in its current revision is missing many of the visual
representations and experimental results found in the pdf version.

1. Introduction

In current IP networks, congestion management is left to the protocols
running on top of IP.  An IP router, when congested, simply drops
packets.  TCP is the dominant transport protocol today [26].  TCP
infers that there is congestion in the network by detecting packet
drops (RFC 2581).  Congestion control algorithms [11] [15] [21] are
then invoked to alleviate congestion.  TCP initially sends at an
increasing rate (slow start) until it detects a packet loss.  A packet
loss is inferred by the receipt of 3 duplicate ACKs or detected by a
timeout.  The sending TCP then moves into a congestion avoidance state
where it carefully probes the network by sending at a slower rate
(which goes up until another packet loss is detected).  Traditionally
a router reacts to congestion by dropping a packet in the absence of
buffer space.  This is referred to as Tail Drop.  This method has a
number of drawbacks (outlined in Section 2).  These drawbacks, coupled
with the limitations of end-to-end congestion control, have led to
interest in introducing smarter congestion control mechanisms in
routers.  One such mechanism is Random Early Detection (RED) [9],
which detects incipient congestion and implicitly signals the
oversubscribing flow to slow down by dropping its packets.  A RED-
enabled router detects congestion before the buffer overflows, based
on a running average queue size, and drops packets probabilistically
before the queue actually fills up.  The probability of dropping a
newly arriving packet increases as the average queue size increases
above a low water mark, minth, towards a high water mark, maxth.  When
the average queue size exceeds maxth, all arriving packets are
dropped.

An extension to RED is to mark the IP header instead of dropping
packets (when the average queue size is between minth and maxth; above
maxth arriving packets are dropped as before).  Cooperating end
systems would then use this as a signal that the network is congested
and slow down.  This is known as Explicit Congestion Notification
(ECN).  In this paper we study an ECN implementation on Linux for both
the router and the end systems in a live network.  The draft is
organized as follows.  In Section 2 we give an overview of queue
management in routers.  Section 3 gives an overview of ECN and the
changes required at the router and the end hosts to support ECN.
Section 4 describes the experimental testbed and the terminology used
throughout this draft.  Section 5 introduces the experiments that are
carried out, outlines the results and presents an analysis of the
results obtained.  Section 6 concludes the paper.

2. Queue Management in routers

TCP's congestion control and avoidance algorithms are necessary and
powerful, but they are not enough to provide good service in all
circumstances since they treat the network as a black box.
Some sort of control is required from the routers to complement the
end system congestion control mechanisms.  A more detailed analysis is
contained in [19].  Queue management algorithms traditionally manage
the length of packet queues in the router by dropping packets only
when the buffer overflows.  A maximum length for each queue is
configured.  The router will accept packets until this maximum size is
exceeded, at which point it will drop incoming packets.  New packets
are accepted when buffer space allows.  This technique is known as
Tail Drop.  This method has served the Internet well for years, but it
has several drawbacks.  Since all arriving packets (from all flows)
are dropped when the buffer overflows, this interacts badly with the
congestion control mechanism of TCP.  A cycle is formed: a burst of
drops occurs after the maximum queue size is exceeded, followed by a
period of underutilization at the router as end systems back off.  End
systems then increase their windows simultaneously up to a point where
a burst of drops happens again.  This phenomenon is called Global
Synchronization.  It leads to poor link utilization and lower overall
throughput [19].  Another problem with Tail Drop is that a single
connection or a few flows could monopolize the queue space in some
circumstances.  This results in a lock-out phenomenon, often due to
synchronization or other timing effects [19].  Lastly, one of the
major drawbacks of Tail Drop is that queues remain full for long
periods of time; one of the major goals of queue management is to
reduce the steady state queue size [19].  Other queue management
techniques include random drop on full and drop from front on full
[13].

2.1. Active Queue Management

Active queue management mechanisms detect congestion before the queue
overflows and provide an indication of this congestion to the end
nodes [7].  With this approach TCP does not have to rely only on
buffer overflow as the indication of congestion, since notification
happens before serious congestion occurs.  One such active queue
management technique is RED.

2.1.1. Random Early Detection

Random Early Detection (RED) [9] is a congestion avoidance mechanism
implemented in routers that works on the basis of active queue
management.  RED addresses the shortcomings of Tail Drop.  A RED
router signals incipient congestion to TCP by dropping packets
probabilistically before the queue runs out of buffer space.  This
drop probability is dependent on a running average queue size to avoid
any bias against bursty traffic.  A RED router randomly drops arriving
packets, with the result that the probability of dropping a packet
belonging to a particular flow is approximately proportional to the
flow's share of the bandwidth.  Thus, if a sender is using relatively
more bandwidth it gets penalized by having more of its packets
dropped.  RED operates by maintaining two thresholds, a minimum
(minth) and a maximum (maxth).  It drops a packet probabilistically if
and only if the average queue size lies between the minth and maxth
thresholds.  If the average queue size is above the maximum threshold,
the arriving packet is always dropped.  When the average queue size is
between the minimum and the maximum threshold, each arriving packet is
dropped with probability pa, where pa is a function of the average
queue size.  As the average queue length varies between minth and
maxth, pa increases linearly towards a configured maximum drop
probability, maxp.  Beyond maxth, the drop probability is 100%.
Dropping packets in this way ensures that only a subset of the sources
see drops at any one time; those sources invoke their congestion
avoidance algorithms, easing the congestion at the gateway.  Since the
dropping is distributed across flows, the problem of global
synchronization is avoided.
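To make the decision rule concrete, the following minimal C sketch
computes the RED drop decision described above.  It is illustrative
only: the names are ours, and the full RED algorithm of [9]
additionally spaces drops out by adjusting pa based on the count of
packets accepted since the last drop, which we omit here.

   #include <stdlib.h>

   #define RED_PASS 0
   #define RED_DROP 1  /* drop (or mark, if the flow is ECN capable) */

   /* Decide the fate of an arriving packet given the running
    * average queue size avg and the configured RED parameters. */
   int red_decide(double avg, double minth, double maxth, double maxp)
   {
           double pa;

           if (avg < minth)
                   return RED_PASS;    /* below the low water mark */
           if (avg >= maxth)
                   return RED_DROP;    /* above the high water mark */

           /* pa grows linearly from 0 at minth to maxp at maxth */
           pa = maxp * (avg - minth) / (maxth - minth);
           return ((double)rand() / RAND_MAX < pa) ? RED_DROP
                                                   : RED_PASS;
   }

For example, with minth = 5 packets, maxth = 15 packets and maxp =
0.1, an average queue of 10 packets gives pa = 0.1 * (10 - 5) /
(15 - 5) = 0.05.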
3. Explicit Congestion Notification

Explicit Congestion Notification is a proposed extension to RED that
marks a packet instead of dropping it when the average queue size is
between minth and maxth [7].  Since ECN marks packets before
congestion actually occurs, it is useful for protocols like TCP that
are sensitive to even a single packet loss.  Upon receipt of a
congestion-marked packet, the TCP receiver informs the sender (in the
subsequent ACK) about incipient congestion, which in turn triggers the
congestion avoidance algorithm at the sender.  ECN requires support
from both the router and the end hosts, i.e., the end hosts' TCP
stacks need to be modified.  Packets from flows that are not ECN
capable will continue to be dropped by RED (as was the case before
ECN).

3.1. Changes at the router

Router side support for ECN can be added by modifying current RED
implementations.  For packets from ECN capable hosts, the router marks
the packets rather than dropping them (if the average queue size is
between minth and maxth).  The router must be able to identify that a
packet is ECN capable, and should only mark packets that are from ECN
capable hosts.  This uses two bits in the IP header.  The ECN Capable
Transport (ECT) bit is set by the sender end system; for a unicast
transport, it is set only if both end systems are ECN capable.  In TCP
this is confirmed in the pre-negotiation during the connection setup
phase (explained in Section 3.2).  Packets encountering congestion are
marked by the router using the Congestion Experienced (CE) bit (if the
average queue size is between minth and maxth) on their way from the
sender to the receiver end system, with a probability proportional to
the average queue size, following the procedure used in RED (RFC 2309)
routers.  Bits 10 and 11 in the IPv6 header are proposed respectively
for the ECT and CE bits.  Bits 6 and 7 of the IPv4 header DSCP field
are likewise specified, for experimental purposes, as the ECT and CE
bits respectively.
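The marking operation itself reduces to a few bit operations on the
TOS octet.  The sketch below assumes the RFC 2481 IPv4 bit positions
given above (ECT as 0x02 and CE as 0x01 within the TOS octet); the
function name is ours, and a real router must also recompute the IP
header checksum after modifying the octet.

   #define IP_ECT 0x02  /* ECN Capable Transport: bit 6 of IPv4 TOS */
   #define IP_CE  0x01  /* Congestion Experienced: bit 7 of IPv4 TOS */

   /* Called once RED has decided to signal congestion on a packet.
    * Returns 1 if the packet was marked, 0 if it should be dropped
    * (i.e. the flow is not ECN capable). */
   int ecn_mark_or_drop(unsigned char *tos)
   {
           if (*tos & IP_ECT) {
                   *tos |= IP_CE;    /* ECN capable: mark instead */
                   return 1;
           }
           return 0;                 /* NON ECN: drop as before */
   }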
3.2. Changes at the TCP Host side

The proposal to add ECN to TCP specifies two new flags in the reserved
field of the TCP header.  Bit 9 in the reserved field of the TCP
header is designated as the ECN-Echo (ECE) flag and Bit 8 is
designated as the Congestion Window Reduced (CWR) flag.  These two
bits are used both for the initializing phase, in which the sender and
the receiver negotiate the capability and the desire to use ECN, and
for the subsequent actions to be taken in case congestion is
experienced in the network during the established state.

Two main changes need to be made to add ECN to TCP at an end system,
along with one extension to a router running RED:

1. In the connection setup phase, the source and destination TCPs have
to exchange information about their desire and/or capability to use
ECN.  The sender does this by setting both the ECN-Echo flag and the
CWR flag in the SYN packet of the initial connection phase; on receipt
of this SYN packet, the receiver sets the ECN-Echo flag in the SYN-ACK
response.  Once this agreement has been reached, the sender thereafter
sets the ECT bit in the IP header of data packets for that flow, to
indicate to the network that it is capable and willing to participate
in ECN.  The ECT bit is set on all packets other than pure ACKs.

2. When a router has decided, based on its active queue management
mechanism, to drop or mark a packet, it checks the IP-ECT bit in the
packet header.  It sets the CE bit in the IP header if the IP-ECT bit
is set.  When such a packet reaches the receiver, the receiver
responds by setting the ECN-Echo flag (in the TCP header) in the next
outgoing ACK for the flow.  The receiver will continue to do this in
subsequent ACKs until it receives an indication from the sender that
it (the sender) has responded to the congestion notification.

3. Upon receipt of this ACK, the sender triggers its congestion
avoidance algorithm by halving its congestion window, cwnd, and
updating its slow start threshold, ssthresh.  Once it has taken these
steps, the sender sets the CWR bit on the next outgoing data packet to
tell the receiver that it has reacted to the (receiver's) notification
of congestion.  The receiver reacts to the CWR by halting the sending
of congestion notifications (ECE) to the sender if there is no new
congestion in the network.

Note that the sender's reaction to the indication of congestion in the
network (when it receives an ACK packet that has the ECN-Echo flag
set) is equivalent to the Fast Retransmit/Recovery algorithm (when
there is a congestion loss) in NON-ECN-capable TCP, i.e., the sender
halves the congestion window cwnd and reduces the slow start threshold
ssthresh.  Fast Retransmit/Recovery is still available to ECN capable
stacks for responding to three duplicate acknowledgments.
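The established-state behaviour implied by steps 2 and 3 amounts to a
small state machine at each host.  The C sketch below is a
simplification of what an actual stack (such as the Linux
implementation used in our experiments) must do; the structure and
field names are ours, and the once-per-window bookkeeping is reduced
to a single flag that the caller is assumed to clear as each window of
data is acknowledged.

   struct ecn_state {
           unsigned int cwnd;      /* congestion window, in segments */
           unsigned int ssthresh;  /* slow start threshold */
           int ecn_echo;   /* receiver: set ECE on outgoing ACKs */
           int send_cwr;   /* sender: set CWR on next data packet */
           int reacted;    /* sender: already reacted this window */
   };

   /* Receiver side: run for every arriving data segment. */
   void ecn_rcv_data(struct ecn_state *tp, int ip_ce, int tcp_cwr)
   {
           if (ip_ce)
                   tp->ecn_echo = 1;  /* echo until CWR is seen */
           if (tcp_cwr)
                   tp->ecn_echo = 0;  /* sender reacted; stop echoing */
   }

   /* Sender side: run for every arriving ACK. */
   void ecn_rcv_ack(struct ecn_state *tp, int tcp_ece)
   {
           if (tcp_ece && !tp->reacted) {
                   /* same reaction as Fast Retransmit/Recovery: halve
                    * cwnd and reduce ssthresh, but retransmit nothing */
                   tp->ssthresh = tp->cwnd / 2 > 2 ? tp->cwnd / 2 : 2;
                   tp->cwnd = tp->ssthresh;
                   tp->send_cwr = 1;  /* announce reaction via CWR */
                   tp->reacted = 1;   /* at most once per window */
           }
   }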
4. Experimental setup

For testing purposes we added ECN to the Linux TCP/IP stack, kernel
versions 2.0.32, 2.2.5 and 2.3.43 (earlier revisions of 2.3 were also
tested).  The 2.0.32 implementation conforms to RFC 2481 [7] for the
end systems only.  We have also modified the code in the 2.1, 2.2 and
2.3 cases for the router portion as well as the end systems to conform
to the RFC.  An outdated version of the 2.0 code is available at [18].
Note that Linux version 2.0.32 implements TCP Reno congestion control,
while kernels >= 2.2.0 default to New Reno but will opt for a
SACK/FACK combo when the remote end understands SACK.  Our initial
tests were carried out with the 2.0 kernel at the end systems and 2.1
(pre 2.2) for the router part.  The majority of the test results here
apply to the 2.0 tests.  We did repeat these tests on a different
testbed (a move from Pentium to Pentium-II class machines) with faster
machines for the 2.2 and 2.3 kernels, so the 2.0 and 2.2/2.3 results
are not directly comparable.

We have updated this draft release to reflect the tests against SACK
and New Reno.

4.1. Testbed setup

                                           -----      ----
                                          | ECN |    | ECN |
                                          | ON  |    | OFF |
       data direction ---->>              -----      ----
                                             |          |
    server                                   |          |
     ----      ------        ------          |          |
    |    |    |  R1  |      |  R2  |         |          |
    |    |----|      | ---- |      |----------------------
     ----      ------   ^    ------              |
                        ^                        |
                        |                      -----
    congestion point ___|                     |  C  |
                                              |     |
                                               -----

The figure above shows our test setup.

All the physical links are 10Mbps Ethernet.  Using Class Based Queuing
(CBQ) [22], packets from the data server are constricted to a 1.5Mbps
pipe at the router R1.  Data is always retrieved from the server
towards the clients, labelled "ECN ON", "ECN OFF", and "C".  Since the
pipe from the server is 10Mbps, this creates congestion at the exit
from the router towards the clients for competing flows.  The machines
labeled "ECN ON" and "ECN OFF" are running the same version of Linux
and have exactly the same hardware configuration.  The server is
always ECN capable (and can handle NON ECN flows as well, using the
standard congestion algorithms).  The machine labeled "C" is used to
create congestion in the network.  Router R2 acts as a path-delay
controller; with it we adjust the RTT the clients see.  Router R1 has
RED implemented in it and is capable of supporting ECN flows.  The
path-delay router is a PC running the Nistnet [16] package on a Linux
platform.  The latency of the link for the experiments was set to 20
millisecs.

4.2. Validating the Implementation

We spent time validating that the implementation conformed to the
specification in RFC 2481.  To do this, the popular tcpdump sniffer
[24] was modified to show the packets being marked.  We visually
inspected tcpdump traces to validate conformance to the RFC under many
different scenarios.  We also modified tcptrace [25] in order to plot
the marked packets for visualization and analysis.

Both tcpdump and tcptrace revealed that the implementation was
conformant to the RFC.

4.3. Terminology used

This section presents background terminology used in the next few
sections.

* Congesting flows: These are TCP flows that are started in the
background so as to create congestion from R1 towards R2.  We use the
laptop labeled "C" to introduce congesting flows.  Note that "C", like
the other clients, retrieves data from the server.

* Low, Moderate and High congestion: For the case of low congestion we
start two congesting flows in the background, for moderate congestion
we start five congesting flows, and for the case of high congestion we
start ten congesting flows in the background.

* Competing flows: These are the flows that we are interested in.
They are either ECN TCP flows from/to "ECN ON" or NON ECN TCP flows
from/to "ECN OFF".

* Maximum drop rate: This is the RED parameter that sets the maximum
probability of a packet being marked at the router.  This corresponds
to maxp as explained in Section 2.1.1.

* Low, Medium and High drop probability: We use the term low
probability to mean a drop probability maxp of 0.02, medium
probability for 0.2 and high probability for 0.5.  We also
experimented with drop probabilities of 0.05, 0.1 and 0.3.
* Goodput: We define goodput as the effective data rate as observed by
the user, i.e., if we transmit 4 data packets of which two are
retransmissions, the efficiency is 50% and the resulting goodput is
2 * packet size / time taken to transmit.

* RED Region: When the router's average queue size is between minth
and maxth, we say that we are operating in the RED region.

Our tests were repeated for varying levels of congestion with varying
maximum drop rates.  The results are presented in the subsequent
sections.

4.4. RED parameter selection

In our initial testing we noticed that as we increase the number of
congesting flows, the RED queue degenerates into a simple Tail Drop
queue, i.e., the average queue size exceeds the maximum threshold most
of the time.  Note that this phenomenon has also been observed by [5],
which proposes a dynamic solution to alleviate it by adjusting the
packet dropping probability maxp based on the past history of the
average queue size.  Hence, it is necessary that in the course of our
experiments the router operate in the RED region, i.e., we have to
make sure that the average queue size is maintained between minth and
maxth.  If this is not maintained, the queue acts like a Tail Drop
queue and the advantages of ECN diminish.  Our goal is to validate
ECN's benefits when used with RED at the router.  To ensure that we
were operating in the RED region, we monitored the average queue size
and the actual queue size in times of low, moderate and high
congestion, and fine-tuned the RED parameters such that the average
queue size stays within the RED region before running the experiment
proper.  Our results are, therefore, not influenced by operating in
the wrong RED region.

5. The Experiments

We start by making sure that the background flows do not bias our
results by computing the fairness index [12] in Section 5.1.  We
proceed to carry out the experiments for bulk transfers, presenting
the results and analysis in Section 5.2.  Section 5.3 presents the
results and analysis for transactional transfers.  More details on the
experimental results can be found in [27].

5.1. Fairness

In the course of the experiments we wanted to make sure that our
choice of the type of background flows does not bias the results that
we collect.  Hence we initially carried out some tests with both ECN
and NON ECN flows as the background flows.  We repeated the
experiments for different drop probabilities and calculated the
fairness index [12].  We also noticed (when there were an equal number
of ECN and NON ECN flows) that the number of packets dropped for the
NON ECN flows was equal to the number of packets marked for the ECN
flows, showing that the RED algorithm was fair to both kinds of flows.

Fairness index: The fairness index is a performance metric described
in [12].  Jain [12] postulates that the network is a multi-user
system, and derives a metric to see how fairly each user is treated.
He defines fairness as a function of the variability of throughput
across users.  For a given set of user throughputs (x1, x2, ..., xn),
the fairness index of the set is defined as follows:

   f(x1,x2,...,xn) = (sum[i=1..n] xi)^2 / (n * sum[i=1..n] (xi)^2)

The fairness index always lies between 0 and 1.  A value of 1
indicates that all flows got exactly the same throughput.  Each of the
tests was carried out 10 times to gain confidence in our results.  To
compute the fairness index we used FTP to generate traffic.
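The index is straightforward to compute.  A small self-contained C
example follows; the four throughput values are made up purely for
illustration.

   #include <stdio.h>

   /* Jain's fairness index [12]:
    * f = (sum of xi)^2 / (n * sum of xi^2) */
   double fairness_index(const double *x, int n)
   {
           double sum = 0.0, sumsq = 0.0;
           int i;

           for (i = 0; i < n; i++) {
                   sum += x[i];
                   sumsq += x[i] * x[i];
           }
           return (sum * sum) / (n * sumsq);
   }

   int main(void)
   {
           /* hypothetical per-flow throughputs, e.g. in KB/s */
           double tput[4] = { 370.0, 375.0, 380.0, 372.0 };

           printf("fairness = %.6f\n", fairness_index(tput, 4));
           return 0;
   }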
Experiment details: At time t = 0 we start 2 NON ECN FTP sessions in
the background to create congestion.  At time t = 20 seconds we start
two competing flows.  We note the throughput of all the flows in the
network and calculate the fairness index.  The experiment was carried
out for various maximum drop probabilities and congestion levels.  The
same procedure is repeated with ECN background flows.  The fairness
index was fairly constant whether the background flows were ECN or NON
ECN, indicating that the choice of background flows introduced no
bias.

   Max Drop    Fairness with     Fairness with
   Prob        ECN BG flows      NON ECN BG flows

   0.02        0.996888          0.991946
   0.05        0.995987          0.988286
   0.1         0.985403          0.989726
   0.2         0.979368          0.983342

With the observation that the nature of the background flows does not
alter the results, we use NON ECN background flows for the rest of the
experiments.

5.2. Bulk transfers

The metric we chose for bulk transfers is end user throughput.

Experiment details: All TCP flows used are Reno TCP.  For the case of
low congestion we start 2 FTP flows in the background at time 0.
Then, after about 20 seconds, we start the competing flows: one data
transfer to the ECN machine and the second to the NON ECN machine.
The size of the file used is 20MB.  For the case of moderate
congestion we start 5 FTP flows in the background, and for the case of
high congestion we start 10 FTP flows in the background.  We repeat
the experiments for various maximum drop rates, each repeated for a
number of sets.

Observation and Analysis:

We make three key observations:

1) As the congestion level increases, the relative advantage for ECN
increases but the absolute advantage decreases (expected, since there
are more flows competing for the same link resource).  ECN still does
better than NON ECN even under high congestion.  Taking a sample from
the collected results: at a maximum drop probability of 0.1, for
example, the relative advantage of ECN increases from 23% to 50% as
the congestion level increases from low to high.

2) Maintaining the congestion level and varying the maximum drop
probability (MDP) reveals that the relative advantage of ECN increases
with increasing MDP.  As an example, for the case of high congestion,
as we vary the drop probability from 0.02 to 0.5 the relative
advantage of ECN increases from 10% to 60%.

3) There were hardly any retransmissions for ECN flows (except the
occasional packet drop in a minority of the tests for the case of high
congestion and low maximum drop probability).

We analyzed tcpdump traces for NON ECN with the help of tcptrace and
observed that there were hardly any retransmits due to timeouts.
(Retransmits due to timeouts are inferred by counting the fast
retransmits triggered by 3 DUPACKs and subtracting them from the total
recorded number of retransmits.)  This means that over a long period
of time (as is the case for long bulk transfers), the data-driven loss
recovery mechanism of the Fast Retransmit/Recovery algorithm is very
effective.  The sender's reaction to an ECN congestion notification
(ECE) is the same as its reaction to a Fast Retransmit for NON ECN.
Since both are operating in the RED region, ECN barely gets any
advantage over NON ECN from the signaling (packet drop vs. marking).
It is clear, however, from the results that ECN flows benefit in bulk
transfers.  We believe that the main advantage of ECN for bulk
transfers is that less time is spent recovering (whereas NON ECN
spends time retransmitting), and timeouts are avoided altogether.
[23] has shown that even with RED deployed, TCP Reno could suffer from
multiple packet drops within the same window of data, likely leading
to multiple congestion reactions or timeouts (these problems are
alleviated by ECN).  However, while TCP Reno has performance problems
with multiple packets dropped in a window of data, New Reno and SACK
have no such problems.

Thus, for scenarios with very high levels of congestion, the
advantages of ECN for TCP Reno flows could be more dramatic than the
advantages of ECN for NewReno or SACK flows.  An important observation
to make from our results is that we do not notice multiple drops
within a single window of data.  Thus, we would expect that our
results are not heavily influenced by Reno's performance problems with
multiple packets dropped from a window of data.

We repeated these tests with newer, ECN-patched Linux kernels.  As
mentioned earlier, these kernels use a SACK/FACK combo with a fallback
to New Reno.  SACK can be selectively turned off (defaulting to New
Reno).  Our results indicate that ECN still improves performance for
bulk transfers.  More results are available in the pdf version [27].
As in 1) above, maintaining a maximum drop probability of 0.1 and
increasing the congestion level, we observe that ECN-SACK improves
performance from about 5% at low congestion to about 15% at high
congestion.  In the scenario where high congestion is maintained and
the maximum drop probability is moved from 0.02 to 0.5, the relative
advantage of ECN-SACK improves from 10% to 40%.  Although these
numbers are lower than the ones exhibited by Reno, they do reflect the
improvement that ECN offers even in the presence of robust recovery
mechanisms such as SACK.

5.3. Transactional transfers

We model transactional transfers by sending a small request and
getting a response from a server before sending the next request.  To
generate transactional transfer traffic we use Netperf [17] with the
CRR (Connect Request Response) option.  As an example, assume that we
are retrieving a small file of, say, 5-20KB; in effect we send a small
request to the server and the server responds by sending us the file.
The transaction is complete when we receive the complete file.  To
gain confidence in our results we run each test for about one hour,
during which a few thousand of these requests and responses take
place.  Although not exactly modeling HTTP 1.0 traffic, where several
concurrent sessions are opened, Netperf-CRR is nevertheless a close
approximation.  Since Netperf-CRR waits for one connection to complete
before opening the next one (0 think time), that single connection
could be viewed as the slowest response in the set of the opened
concurrent sessions (in HTTP).  The transactional data sizes were
selected based on [2], which indicates that the average web
transaction is around 8-10KB; the smaller (5KB) size was selected as
an estimate of the size of transactional processing that may become
prevalent with policy management schemes in the diffserv [4] context.
Using Netperf we are able to initiate these kinds of transactional
transfers for a variable length of time.  The main metric of interest
in this case is the transaction rate, which is recorded by Netperf.

* Transaction rate: The number of complete request/response exchanges
of a particular requested size that we are able to perform per second.
For example, if our request is of 1KB and the response is 5KB, then we
define the transaction rate as the number of such complete
transactions that we can accomplish per second.

Experiment details: As in the case of bulk transfers, we start the
background FTP flows to introduce congestion in the network at time 0.
About 20 seconds later we start the transactional transfers and run
each test for three minutes.  We record the number of completed
transactions per second.  We repeat the test for about an hour and
plot the various transactions per second, averaged over the runs.  The
experiment is repeated for various maximum drop probabilities, file
sizes and levels of congestion.

Observation and Analysis

There are three key observations:

1) As congestion increases (with fixed drop probability), the relative
advantage for ECN increases (again, the absolute advantage does not
increase since more flows are sharing the same bandwidth).  For
example, from the results, if we consider the 5KB transactional flow,
as we increase the congestion from medium congestion (5 congesting
flows) to high congestion (10 congesting flows) for a maximum drop
probability of 0.1, the relative gain for ECN increases from 42% to
62%.

2) Maintaining the congestion level while adjusting the maximum drop
probability indicates that the relative advantage for ECN flows
increases.  For the case of high congestion with the 5KB flow, we
observe that the number of transactions per second increases from 0.8
to 2.2, which corresponds to an increase in the relative gain for ECN
from 20% to 140%.

3) As the transactional data size increases, ECN's advantage
diminishes because the probability of recovering via a Fast Retransmit
increases for NON ECN.  ECN therefore has a huge advantage as the
transactional data size gets smaller, as is observed in the results.
This can be explained by looking at TCP's recovery mechanisms.  NON
ECN in the short flows depends, for recovery, on congestion signaling
via the receipt of 3 duplicate ACKs or, worse, on a retransmit timer
expiration, whereas ECN depends mostly on the TCP-ECE flag.  This is
by design in our experimental setup.  [3] shows that most TCP loss
recovery in fact happens via timeouts for short flows.  The
effectiveness of the Fast Retransmit/Recovery algorithm is limited by
the fact that there might not be enough data in the pipe to elicit 3
duplicate ACKs.  TCP Reno needs at least 4 outstanding packets to
recover from losses without going into a timeout.  For 5KB (4 packets
for an MTU of 1500 bytes) a NON ECN flow will always have to wait for
a retransmit timeout if any of its packets are lost.  (This timeout
could only have been avoided if the flow had used an initial window of
four packets, and the first of the four packets was the packet
dropped.)
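The arithmetic behind this can be checked with a short sketch of our
own (assuming a 1460 byte MSS and all four segments outstanding): only
a loss of the first segment leaves enough trailing segments to
generate the 3 duplicate ACKs that Fast Retransmit requires.

   #include <stdio.h>

   int main(void)
   {
           int mss  = 1460;          /* MTU 1500 minus 40B of headers */
           int size = 5 * 1024;      /* 5KB transactional response */
           int segs = (size + mss - 1) / mss;  /* ceiling -> 4 */
           int lost;

           for (lost = 1; lost <= segs; lost++) {
                   int dupacks = segs - lost;  /* segments after loss */
                   printf("segment %d lost -> %d dup ACKs -> %s\n",
                          lost, dupacks,
                          dupacks >= 3 ? "fast retransmit" : "timeout");
           }
           return 0;
   }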
We repeated these experiments with the kernels implementing the
SACK/FACK and New Reno algorithms.  Our observation was that there was
hardly any difference from what we saw with Reno.  For example, in the
case of SACK-ECN, maintaining the maximum drop probability at 0.1 and
increasing the congestion level for the 5KB transaction, we noticed
that the relative gain for the ECN enabled flows increases from 47% to
80%.  If we instead maintain the congestion level for the 5KB
transactions and increase the maximum drop probabilities, we notice
that SACK-ECN's relative gain increases from 15% to 120%.  It is fair
to comment that the difference in the testbeds (different machines,
same topology) might have contributed to these results; however, it is
worth noting that the relative advantage of SACK-ECN is obvious.

6. Conclusion

ECN enhancements improve both bulk and transactional TCP traffic.  The
improvement is more obvious in short transactional types of flows
(popularly referred to as mice).

* Because fewer retransmits happen with ECN, there is less traffic on
the network.  Although the relative amount of data retransmitted in
our case is small, the effect could be higher when there are more
contributing end systems.  The absence of retransmits also implies an
improvement in goodput.  This becomes very important for scenarios
where bandwidth is expensive, such as on low-bandwidth links.  It also
implies that ECN lends itself well to applications that require
reliability but would prefer to avoid unnecessary retransmissions.

* The fact that ECN avoids timeouts by getting faster notification (as
opposed to the traditional packet-drop inference from 3 duplicate ACKs
or, even worse, timeouts) implies that less time is spent in error
recovery - this also improves goodput.

* ECN could be used to help in service differentiation, where the end
user is able to "probe" for their target rate faster.  Assured
forwarding [1] in the diffserv working group at the IETF proposes
using RED with varying drop probabilities as a service differentiation
mechanism.  Multiple packets within a single window in TCP Reno could
be dropped even in the presence of RED, likely leading to timeouts
[23].  ECN end systems ignore multiple notifications within a window
of data, which helps counter this scenario, resulting in improved
goodput.  The ECN end system also ends up probing the network faster
(to reach an optimal bandwidth).  [23] also notes that Reno is the
most widely deployed TCP implementation today.

It is clear that the advent of policy management schemes introduces
new requirements for transactional applications, which consist of a
very short query and a response on the order of a few packets.  ECN
provides advantages to transactional traffic, as we have shown in the
experiments.

7. Acknowledgements

We would like to thank Alan Chapman, Ioannis Lambadaris, Thomas Kunz,
Biswajit Nandy, Nabil Seddigh, Sally Floyd, and Rupinder Makkar for
their helpful feedback and valuable suggestions.

8. References

[1] Heinanen, J., et al., "Assured Forwarding PHB Group",
    http://search.ietf.org/internet-drafts/draft-ietf-diffserv-af-06.txt.
    Work in Progress.

[2] Mah, B. A., "An Empirical Model of HTTP Network Traffic", In
    Proceedings of INFOCOM '97.

[3] Balakrishnan, H., Padmanabhan, V., Seshan, S., Stemm, M., and R.
    H. Katz, "TCP Behavior of a Busy Internet Server: Analysis and
    Improvements", Proceedings of IEEE Infocom, San Francisco, CA,
    USA, March 1998.
    http://www.cs.berkeley.edu/hari/papers/infocom98.ps.gz
[4] Blake, S., et al., "An Architecture for Differentiated Services",
    RFC 2475, December 1998.

[5] Feng, W., Kandlur, D., Saha, D., and K. Shin, "Techniques for
    Eliminating Packet Loss in Congested TCP/IP Networks", U. Michigan
    CSE-TR-349-97, November 1997.

[6] Floyd, S., "TCP and Explicit Congestion Notification", ACM
    Computer Communication Review, 24, October 1994.

[7] Ramakrishnan, K. K. and S. Floyd, "A Proposal to add Explicit
    Congestion Notification (ECN) to IP", RFC 2481, January 1999.

[8] Fall, K. and S. Floyd, "Simulation-based Comparisons of Tahoe,
    Reno and SACK TCP", Computer Communication Review, V. 26 N. 3,
    July 1996, pp. 5-21.

[9] Floyd, S. and V. Jacobson, "Random Early Detection Gateways for
    Congestion Avoidance", IEEE/ACM Transactions on Networking, 1(4),
    August 1993.

[10] Hashem, E., "Analysis of random drop for gateway congestion
     control", Report LCS TR-465, Laboratory for Computer Science,
     M.I.T., 1989.

[11] Jacobson, V., "Congestion Avoidance and Control", In Proceedings
     of SIGCOMM '88, Stanford, CA, August 1988.

[12] Jain, R., "The Art of Computer Systems Performance Analysis",
     John Wiley and Sons, QA76.9.E94J32, 1991.

[13] Lakshman, T. V., Neidhardt, A., and T. Ott, "The Drop From Front
     Strategy in TCP Over ATM and Its Interworking with Other Control
     Features", Infocom '96, MA28.1.

[14] Mishra, P. and H. Kanakia, "A hop by hop rate based congestion
     control scheme", Proc. SIGCOMM '92, pp. 112-123, August 1992.

[15] Floyd, S. and T. Henderson, "The NewReno Modification to TCP's
     Fast Recovery Algorithm", RFC 2582, April 1999.

[16] The NIST Network Emulation Tool,
     http://www.antd.nist.gov/itg/nistnet/

[17] The Netperf network performance tool,
     http://www.netperf.org/netperf/NetperfPage.html

[18] ftp://ftp.ee.lbl.gov/ECN/ECN-package.tgz

[19] Braden, B., Clark, D., et al., "Recommendations on Queue
     Management and Congestion Avoidance in the Internet", RFC 2309,
     April 1998.

[20] Ramakrishnan, K. K. and R. Jain, "A Binary Feedback Scheme for
     Congestion Avoidance in Computer Networks", ACM Trans. Comput.
     Syst., 8(2):158-181, 1990.

[21] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
     Selective Acknowledgement Options", RFC 2018, October 1996.

[22] Floyd, S. and V. Jacobson, "Link-sharing and Resource Management
     Models for Packet Networks", IEEE/ACM Transactions on Networking,
     Vol. 3 No. 4, August 1995.

[23] Bagal, P., Kalyanaraman, S., and B. Packer, "Comparative study of
     RED, ECN and TCP Rate Control",
     http://www.packeteer.com/technology/articles.htm

[24] tcpdump, the protocol packet capture and dumper program,
     ftp://ftp.ee.lbl.gov/tcpdump.tar.Z

[25] tcptrace, TCP dump file analysis tool,
     http://jarok.cs.ohiou.edu/software/tcptrace/tcptrace.html

[26] Thompson, K., Miller, G. J., and R. Wilder, "Wide-Area Internet
     Traffic Patterns and Characteristics", IEEE Network Magazine,
     November/December 1997.

[27] http://www7.nortel.com:8080/CTL/ecnperf.pdf

9. Authors' Addresses

Jamal Hadi Salim
Nortel Networks
3500 Carling Ave
Ottawa, ON, K2H 8E9
Canada
Email: hadi@nortelnetworks.com

Uvaiz Ahmed
Dept. of Systems and Computer Engineering
Carleton University
Ottawa
Canada
Email: ahmed@sce.carleton.ca