Network Working Group
INTERNET DRAFT                                 S. Jagannath and N. Yin
Expires in six months                                Bay Networks Inc.
                                                           August 1997

     End-to-End Traffic Management Issues in IP/ATM Internetworks

Status of this Memo

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups.  Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time.
It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).

Abstract

This document addresses the end-to-end traffic management issues in
IP/ATM internetworks.  In the internetwork environment, the ATM
control mechanisms (e.g., Available Bit Rate (ABR) and UBR with Early
Packet Discard (EPD)) are applicable to the ATM subnetwork, while TCP
flow control extends from end to end.  We investigated the end-to-end
performance in terms of TCP throughput and file transfer delay in
cases using ABR and UBR in the ATM subnetwork.  We also discuss the
trade-off between the buffer requirements at the ATM edge device
(e.g., Ethernet-ATM switch, ATM router interface) and at the ATM
switches inside the ATM network.

Our simulation results show that in certain scenarios (e.g., with
limited edge device buffer memory) UBR with EPD may perform
comparably to ABR or even outperform it.  We show that a lossless ATM
subnetwork is not sufficient from the end-to-end performance point of
view.  The results illustrate the need for an edge device congestion
handling mechanism that couples the ABR and TCP feedback control
loops.  We present an algorithm that uses the ABR feedback
information and the edge device congestion state to make packet
dropping decisions at the edge of the ATM network.  With this
algorithm at the edge device, the end-to-end throughput and delay are
improved when ABR is used as the ATM subnetwork technology with small
buffers in the edge device.

1. Introduction

In an IP/ATM internetwork environment, legacy networks (e.g.,
Ethernet) are typically interconnected by an ATM subnetwork (e.g., an
ATM backbone).  Supporting end-to-end Quality of Service (QOS) and
traffic management in such internetworks has become an important
requirement for multimedia applications.  In this document, we focus
on the end-to-end traffic management issues in IP/ATM internetworks.
We investigated the end-to-end performance in terms of TCP throughput
and file transfer delay in cases using ABR and UBR in the ATM
subnetwork.  We also discuss the trade-off between the buffer
requirement at the ATM edge device (e.g., Ethernet-ATM switch, ATM
router interface) and at the ATM switches inside the ATM network.

The ABR rate-based control mechanism has been specified by the ATM
Forum [1].  In the past, there have been significant investigations
of ABR flow control and TCP over ATM [8-11].  Most of this work
focused on the ATM switch buffer requirement and throughput fairness
using different algorithms (e.g., simple binary Explicit Forward
Congestion Indication (EFCI) vs. Explicit Rate control; FIFO vs.
per-VC queuing, etc.).  It is generally believed that ABR can
effectively control congestion within ATM networks.

There has been little work on end-to-end traffic management in the
internetworking environment, where networks are not end-to-end ATM.
In such cases ABR flow control may simply push the congestion to the
edge of the ATM network.
Even if the ATM network is kept free of
congestion by using ABR flow control, the end-to-end performance
(e.g., the time to transfer a file) perceived by the application
(typically hosted in the legacy network) may not necessarily be
better.  Furthermore, one may argue that the reduction in buffer
requirement in the ATM switch by using ABR flow control comes at the
expense of an increase in the buffer requirement at the edge device
(e.g., ATM router interface, legacy LAN to ATM switches).

Since most of today's data applications use the TCP flow control
protocol, one may question the benefits of ABR flow control as a
subnetwork technology, arguing that UBR is equally effective and much
less complex than ABR [12].  Figures 1 and 2 illustrate the two cases
under consideration.

Source --> IP -> Router --> ATM Net --> Router -> IP --> Destination
          Cloud              UBR                 Cloud

     ----------------------->----------------------
     |                                             |
     --------------- TCP Control Loop --<----------

               Figure 1: UBR based ATM Subnetwork

In these cases, source and destination are interconnected through
IP/ATM/IP internetworks.  In Figure 1 the connection uses the UBR
service, and in Figure 2 the connection uses ABR.  In the UBR case,
when congestion occurs in the ATM network, cells are dropped (e.g.,
Early Packet Discard [15] may be utilized), resulting in a reduction
of the TCP congestion window.  In the ABR case, when congestion is
detected in the ATM network, ABR rate control becomes effective and
forces the ATM edge device to reduce its transmission rate into the
ATM network.  If congestion persists, the buffer in the edge device
will reach its capacity and start to drop packets, resulting in a
reduction of the TCP congestion window.

From the performance point of view, the latter involves two control
loops: ABR and TCP.  The ABR inner control loop may result in a
longer feedback delay for the TCP control.  Furthermore, there are
two feedback control protocols, and one may argue that the
interactions or interference between the two may actually degrade TCP
performance [12].

Source --> IP -> Router --> ATM Net --> Router -> IP --> Destination
          Cloud              ABR                 Cloud

                      ----->---------
                      |             |
                      -- ABR Loop -<-

     -------------------------->-------------------
     |                                             |
     --------------- TCP Control Loop --<----------

               Figure 2: ABR based ATM Subnetwork

From the implementation perspective, there is a trade-off between the
memory requirements at the ATM switch and at the edge device for UBR
and ABR.  In addition, we need to consider the costs of implementing
ABR and UBR at ATM switches and edge devices.

From this discussion, we believe that end-to-end traffic management
of the entire internetwork environment should be considered.  In this
document, we study the interaction and performance of ABR and UBR
from the point of view of end-to-end traffic management.  Our
simulation results show that in certain scenarios (e.g., with limited
edge device buffer memory) UBR with EPD may perform comparably to ABR
or even outperform it.  We show that a lossless ATM subnetwork is not
sufficient from the end-to-end performance point of view.  The
results illustrate the need for an edge device congestion handling
mechanism that couples the two feedback control loops - ABR and TCP.
We present an algorithm that uses the ABR feedback
information and the edge device congestion state to make packet
dropping decisions at the edge of the ATM network.  With this
algorithm at the edge device, the end-to-end throughput and delay are
improved when ABR is used as the ATM subnetwork technology with small
buffers in the edge device.

2. Buffer Requirements

In the context of feedback congestion control, buffers are used to
absorb transient traffic conditions and the steady-state rate
oscillations due to the feedback delay.  The buffers help to avoid
losses in the network and to keep the link utilized while the
aggregate traffic rate is below the link bandwidth.  When there is
congestion, the switch conveys this information back to the source.
The source reduces its rate according to the feedback information.
The switch sees a drop in its input rates after a round trip delay.
If the buffer is large enough to store the extra data arriving during
this delay, it is possible to avoid losses.  When the congestion
abates, the switch sends information back to the source allowing it
to increase its rate.  There is again a delay before the switch sees
an increase in its input rates.  Meanwhile data is drained from the
buffer at a rate higher than its input rate.  The switch continues to
see full link utilization as long as there is data in the buffer.
Since the maximum drain rate is the bandwidth of the link and the
minimum feedback delay is the round trip propagation delay, it is
generally believed that a buffer of one bandwidth-delay product (link
bandwidth times round trip propagation delay) is required to achieve
good throughput.
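As a rough illustration of this rule of thumb (not part of the
simulations in this document), the sketch below computes one
bandwidth-delay product of buffering, in ATM cells, for the OC-3 link
rate and the feedback delays used in the simulation model of
Section 3.

   # One bandwidth-delay product of buffering, expressed in ATM cells,
   # for the link rate and delays used in this document (OC-3 links,
   # D1 = D2 = D3 = 3 ms).
   CELL_BITS = 53 * 8                  # one ATM cell on the wire, in bits

   def bdp_cells(link_bps, rtt_s):
       return int(link_bps * rtt_s / CELL_BITS)

   print(bdp_cells(155e6, 0.006))      # ABR loop, 2*D2 = 6 ms      -> ~2200 cells
   print(bdp_cells(155e6, 0.018))      # TCP loop, 2*(D1+D2+D3)     -> ~6600 cells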
The above argument assumes that the source can reduce its rate when
it receives feedback information from the network and that the source
will be able to send data at a higher rate when congestion abates in
the network.  In the internetworking environment, the edge device is
the ATM end-system.  It controls its transmission rate according to
the ABR feedback information, whereas TCP sources send data packets
according to the TCP window control protocol.

TCP uses implicit negative feedback information (i.e., packet loss
and timeout).  When there is congestion in the network, TCP may
continue to transmit a full window amount of data, causing packet
loss if the buffer size is not greater than the window size.  When
multiple TCP streams share the same buffer, it is generally
impractical to size the buffer to be greater than the sum of the
window sizes of all the streams.  When there is packet loss, the TCP
window size is reduced multiplicatively and the connection goes
through a recovery phase.  In this phase, TCP is constrained to
transmit at a lower rate even if the network is free of congestion,
causing a decrease in link utilization.

When ABR is used, it is possible to avoid losses within the ATM
subnetwork.  However, since the host uses TCP, the congestion is
merely shifted from the ATM subnetwork to the IP/ATM interface.
There may be packet loss in the edge device and a corresponding loss
of throughput.  Moreover, there is a possibility of negative
interaction between the two feedback loops.  For example, when the
TCP window is large the available rate may be reduced, and when the
TCP window is small due to packet losses the available rate may be
high.  When edge buffers are limited, this kind of negative
interaction can cause a severe degradation in throughput.  When UBR
is used, packet loss due to EPD or cell discard at the ATM switches
triggers the reduction of the TCP window.  Hence, in the UBR case,
there is only a single TCP feedback loop and no additional buffer
requirement at the edge device.

In general, using UBR needs more memory at the ATM switch, whereas
using ABR needs more memory at the edge device.  For zero cell loss,
the buffer required at the ATM switch (for UBR) or at the edge device
(for ABR) is on the order of the maximum TCP window size times the
number of TCP sessions.  The edge device may be a low cost LAN-ATM
switch with limited buffer memory.  Furthermore, the cost of the edge
device is shared by fewer TCP sessions than that of the ATM switch,
making it more cost sensitive than the backbone switch.  In reality,
the buffers at both the edge device and the ATM switch are limited
and smaller than required for zero cell loss.

Hence, our goal is to maximize the TCP throughput with limited
buffers at both the edge device and the ATM switches.  In our
simulation experiments, we assume limited buffers at both the edge
device and the ATM switches.
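To make the zero-loss figure concrete, the short calculation below
sizes such a buffer as maximum TCP window times number of sessions,
using the ten connections and the 655360-byte maximum window of the
simulation model in Section 3 (48 payload bytes per cell, matching
the roughly 13600-cell window quoted there); the helper itself is
only an illustration.

   # Zero-loss buffer requirement: maximum TCP window * number of sessions.
   CELL_PAYLOAD = 48                   # AAL5 payload bytes carried per cell

   def zero_loss_buffer_cells(max_window_bytes, sessions):
       cells_per_window = -(-max_window_bytes // CELL_PAYLOAD)  # ceiling
       return cells_per_window * sessions

   # 10 TCP connections with a 655360-byte maximum window (Section 3.1):
   print(zero_loss_buffer_cells(655360, 10))  # ~136,540 cells (~6.5 MB of data)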
3. Internetwork Model

In the end-to-end model the whole network can be considered to be a
black box with TCP sources and destinations at the periphery.  The
only properties of the black box that the TCP modules are sensitive
to are the packet loss and the round trip delay through the black
box.  The round trip delay determines how fast the TCP window can
open up to utilize available bandwidth in the network, and also how
fast it can react to impending congestion in the network.  Packet
loss triggers the TCP congestion avoidance and recovery scheme.  The
feedback control loop of the ABR service may cause an increase in the
overall round trip delay as it constricts the rate available to each
TCP stream.  It is generally believed that ABR is able to reduce the
congestion level within the ATM network; however, it does so at the
expense of increasing the congestion level at the edge of the ATM
network.  From an end-to-end performance point of view it is not
clear if this is indeed beneficial.

In our simulations we use a modified version of TCP Reno.  This
implementation of TCP uses fast retransmission and recovery along
with a modification to deal with multiple packet losses within the
same window [4].  The simulation experiments were performed using a
TCP/IP and IP over ATM [5] protocol stack, as shown in Figure 3.  The
TCP model conforms to RFC 793 [6] and RFC 1122 [7].  In addition, the
window scaling option is used and the TCP timer granularity is set to
0.1 s.  The options considered for ATM are UBR, UBR with Early Packet
Discard (EPD), ABR with EFCI, and ABR with Explicit Rate (ER) [1].

The bigger picture of the model consists of two IP clouds connected
by an ATM cloud, as shown in Figures 1 and 2.  More specifically, the
model consists of two ATM switches connected back to back with a
single OC-3 (155 Mb/s) link.  The IP/ATM edge devices are connected
to the ATM switches via OC-3 access links.  Ten such devices are
attached to each switch, and therefore ten bi-directional TCP
connections go through the bottleneck link between the ATM switches.
One such connection is shown in the layered model in Figure 3.

 _________                                                  _________
|  App.   |                                                |  App.   |
|---------|                                                |---------|
| App. I/F|                                                | App. I/F|
|---------|                                                |---------|
|   TCP   |                                                |   TCP   |
|---------|   _________                    _________       |---------|
|         |  |   IP    |                  |   IP    |      |         |
|         |  |  over   |                  |  over   |      |         |
|         |  |   ATM   |                  |   ATM   |      |         |
|         |  |---------|                  |---------|      |         |
|   IP    |  | IP |AAL5|                  |AAL5| IP |      |   IP    |
|         |  |    |----|    _________     |----|    |      |         |
|         |  |    | ATM|   |   ATM   |    | ATM|    |      |         |
|---------|  |----|----|   |---------|    |----|----|      |---------|
|   PL    |  | PL | PL |   | PL | PL |    | PL | PL |      |   PL    |
 ---------    ---------     ---------      ---------        ---------
     |         |       |     |     |        |       |           |
      ---------         -----       --------         -----------

                  -------->----------
                  |                 |
                  ----- ABR Loop -<--

      ------------>--------------->--------------------
      |                                                |
      -----<--------- TCP Control Loop --<-------------

                    Figure 3. Simulation model

The propagation delay (D1 + D2 + D3) and the transmission delay
corresponding to the finite transmission rate in the IP cloud are
simulated in the TCP/IP stack.  In the model described below the
delay associated with each of the clouds is 3 ms (D1 = D2 = D3 =
3 ms).  The performance metrics considered are the goodput at the
TCP layer, R, and the delay, D, for a successful transfer of a 100 KB
file (D is the duration between the transmission of the first byte
and the receipt of an ack for the last byte).  The goodput at the TCP
layer is defined as the rate at which data is successfully delivered
from the TCP layer to the application layer.  Data that is
retransmitted is counted only once, when it is delivered to the
application layer.  Packets that arrive out of sequence have to wait
until all preceding data is received before they can be delivered to
the application layer.

Explicit Rate Switch Scheme: The switch keeps a flag for every VC
which indicates whether the VC is active.  The switch maintains a
timer for each VC.  Every time a cell comes in on a particular VC the
flag is turned on (if it is off) and the corresponding timer is
started (if the flag is already on, the timer is restarted).  The
flag is only turned off when the timer expires.  Thus, the VC is
considered active when the flag is on and inactive when the flag is
off.  The explicit rate timer value (ERT) is a parameter that can be
adjusted.  For our simulation we set the timer value to 1 ms - this
corresponds to slightly more than the transmission time of a single
TCP packet (at line rate).  The explicit rate for each VC is simply
the total link bandwidth divided equally among all active VCs.  Thus,
if a VC is inactive for a duration longer than the timer value, its
bandwidth is reallocated to the other active VCs.
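As an illustration of this switch scheme (a per-VC activity flag
refreshed by a timer, with the link bandwidth divided equally among
active VCs), a minimal sketch is given below; the class and method
names are ours, not part of the simulator.

   # Sketch of the explicit-rate switch scheme described above: a VC is
   # considered active if a cell arrived on it within the last ERT
   # seconds, and the explicit rate is the link bandwidth divided
   # equally among the active VCs.
   class ExplicitRateSwitch:
       def __init__(self, link_bps=155e6, ert_s=0.001):
           self.link_bps = link_bps
           self.ert_s = ert_s            # explicit rate timer value (ERT)
           self.last_cell = {}           # VC id -> arrival time of last cell

       def on_cell(self, vc, now):
           # A cell arrival turns the activity flag on and restarts the timer.
           self.last_cell[vc] = now

       def active_vcs(self, now):
           return [vc for vc, t in self.last_cell.items()
                   if now - t < self.ert_s]

       def explicit_rate(self, vc, now):
           active = self.active_vcs(now)
           if vc not in active:
               return 0.0                # an inactive VC holds no allocation
           return self.link_bps / len(active)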
3.1 Simulation Results

In this section we provide some initial results that help us
understand the issues discussed in the previous sections.  The
simulation parameters are the following:

For TCP with Fast Retransmission and Recovery:
   Timer Granularity = 100 ms
   MSS               = 9180 bytes
   Round Trip Prop.  = 18 ms = 2*(D1 + D2 + D3)
   Max. RTO          = 500 ms
   Initial RTO       = 100 ms
   Max. Window Size  = 655360 bytes ( > Bw * RTT)

For ABR:
   Round Trip Prop. Delay    = 6 ms = 2*(D2)
   Buffer sizes              = variable
   EFCI Thresholds           = configurable parameter
   Nrm                       = 32
   RDF = RIF                 = 1/32
   All link rates            = 155 Mb/s
   Explicit Rate Timer (ERT) = 1 ms

The application is modeled as having an infinite amount of data to
send.

The performance metrics shown are the goodput at the TCP layer, R,
and the delay, D, for a successful transfer of a 100 KB file (D is
the duration between the transmission of the first byte and the
receipt of an ack for the last byte) within a contiguous data stream.

Note: All results are with a TCP timer granularity of 100 ms.

Table 1: EB = 500, SB = 1000, TCP with FRR
   Edge-buffer size = 500 cells,
   Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250,
   UBR-EPD threshold = 700,
   Exp. Rate Timer (ERT) = 1 ms.
   ______________________________________________________________
  |              UBR         UBR + EPD   ABR-EFCI    ABR-ER      |
  |---------------------------------------------------------------|
  | Delay (D)    520 ms      521 ms      460 ms      625 ms      |
  | Goodput(R)   81.06 Mbps  87.12 Mbps  77.77 Mbps  58.95 Mbps  |
   ---------------------------------------------------------------

In the case of UBR and UBR+EPD the switch buffer is fully utilized,
but the maximum occupancy of the edge buffers is about the size of
one TCP packet.  Thus, cells are lost only in the switch buffer in
the case of UBR.  ABR with EFCI reduces the cell loss in the switch
but results in an increase in cell loss in the edge buffer.  ABR with
ER eliminates cell loss completely from the switch buffer but results
in even more cell loss in the edge device.  Thus, in the case shown,
ER results in the lowest throughput and the longest file transfer
delay.

For the sake of comparison, Table 2 (below) shows the results
obtained earlier for TCP without fast retransmission and recovery.

Table 2: EB = 500, SB = 1000, No Fast Retransmit and Recovery
   Edge-buffer size = 500 cells,
   Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250,
   UBR-EPD threshold = 700.
   __________________________________________________
  |              UBR         UBR+EPD      ABR-EFCI   |
  |----------------------------------------------------|
  | Delay (D)    201 ms      139 ms       226 ms      |
  | Goodput(R)   63.94 Mbps  86.39 Mbps   54.16 Mbps  |
   ----------------------------------------------------

The set of results (Table 1) with the enhanced TCP shows an overall
improvement in throughput, but an increase in the file transfer
delay.  With fast retransmit and recovery the lost packet is detected
on receipt of duplicate acknowledgments, and TCP, instead of reducing
its window size to one packet, merely reduces it to half its current
size.  This increases the total amount of data that is transmitted by
TCP.  In the absence of FRR the TCP sources are silent for long
periods of time while TCP recovers from a packet loss through
timeout.  The increase in throughput in the case of TCP with FRR can
be mainly attributed to the larger amount of data transmitted by the
TCP source.  However, that also increases the number of packets that
are dropped, causing an increase in the average file transfer delay.
(The file transfer delay is the duration between the transmission of
the first byte of the file and the receipt of the acknowledgment for
the last byte of the file.)
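The different window reductions referred to above can be summarized
by the following simplified sketch of a Reno-style reaction to loss;
it is a textbook-level illustration (ignoring window inflation during
fast recovery), not the modified Reno implementation used in our
simulations.

   # Simplified Reno-style congestion window reduction on loss detection.
   MSS = 9180                          # segment size used in the simulations

   def on_loss(cwnd, ssthresh, detected_by_dup_acks):
       ssthresh = max(cwnd // 2, 2 * MSS)
       if detected_by_dup_acks:        # fast retransmit and recovery
           cwnd = ssthresh             # window merely halved
       else:                           # retransmission timeout
           cwnd = MSS                  # window collapses to one segment
       return cwnd, ssthresh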
In both sets of results we see that UBR with Early Packet Discard
gives better performance.  While the ER scheme (Table 1) results in
zero cell loss within the ATM network, this does not translate into
better end-to-end performance for the TCP connections.  The limited
buffer sizes at the edge of the network result in poor effective
throughput, both due to increased retransmissions and due to poor
interaction between the TCP window and the ABR allowed rate (ACR).

All the earlier results assume limited buffers at the edge as well as
at the switch.  We also performed simulations with larger buffers at
the edge device and in the switch.  When the edge device buffer size
is increased, the end-to-end throughput improves, though the delay
may still be high, as shown in the following tables.

Table 3: EB = 5000, SB = 1000, with FRR
   Edge-buffer size = 5000 cells,
   Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, ER Timer (ERT) = 1 ms
   ___________________________________________
  |               ABR with EFCI   ABR with ER |
  |---------------------------------------------|
  | Delay (D)     355 ms          233 ms       |
  | Goodput (R)   89.8 Mbps       112.6 Mbps   |
   ---------------------------------------------

Table 4: EB = 10000, SB = 1000, with FRR
   Edge-buffer size = 10000 cells,
   Switch-buffer size = 1000 cells,
   ABR-EFCI threshold = 250, ER Timer (ERT) = 1 ms
   ___________________________________________
  |               ABR with EFCI   ABR with ER |
  |---------------------------------------------|
  | Delay (D)     355 ms          256 ms       |
  | Goodput (R)   89.8 Mbps       119.6 Mbps   |
   ---------------------------------------------

The maximum TCP window size is approximately 13600 cells (655360
bytes).  With larger buffers, the edge buffer is large enough to hold
most of the TCP window.  However, we still get low throughput for ABR
with EFCI because of cell loss within the switch.  ABR with ER, on
the other hand, pushes the congestion into the edge device, which can
handle it with the larger buffers.  Cell loss occurs only when the
edge buffer overflows, which is a very infrequent event because of
the size of the buffer.  Since the TCP window dynamics are actually
triggered by loss of packets, there is less interaction here between
the TCP flow control and the ATM flow control mechanism, contrary to
the earlier results (Table 1).  The only factor that influences the
TCP flow control, in the absence of lost packets, is the total round
trip time, which in this case slowly increases as the buffers fill
up, thereby increasing end-to-end delays.

From these results it is evident that keeping the ATM subnetwork free
of congestion is not sufficient to provide good end-to-end throughput
and delay performance.  Our approach to solving this problem is to
couple the two flow control loops.  The information received from the
network, along with the local queue sizes, is used to detect
congestion early and to speed up the TCP slow-down and recovery
processes.  In the next section we present an edge device congestion
handling mechanism that results in significant improvement in the
end-to-end performance with ABR-ER and small buffers at the edge
device.

4. Edge Device Congestion Handling Mechanism

The Packet Discard Algorithm, which couples ABR flow control with TCP
window flow control, has the objectives of relieving congestion as
well as conveying congestion information to the TCP source as soon as
possible.
In addition, the algorithm needs to take into consideration
how the TCP source can recover from the lost packet without a
significant loss of throughput.  Figure 4 gives a state diagram for
the algorithm, followed by the pseudocode and an explanation of the
algorithm.  Currently we are working on an enhancement of this
algorithm that is more sensitive to the explicit rate feedback and
can lead to further improvement in performance.

4.1 Adaptive Discard Algorithm

                        _________________
                 --->--|     Phase 0     |--->----
                 |      -----------------        |
       QL < LT   |                        |  (ACR < f*LD_ACR
       and       |                        |   and QL >= LT)
       P_CTR = 0 |                        |       or
                 |                        |   QL = MQS
                 |      _________________        |
                 ---<--|     Phase 1     |---<----
                        -----------------

                 Figure 4. Algorithm State Diagram

Variables:
   ACR     = current Available Cell Rate (current cell transmission
             rate).
   ER      = maximum network-allowed cell transmission rate from the
             received RM cell; it is the ceiling of ACR.
   CI      = Congestion Indication in the network from the received
             RM cell.
   LD_ACR  = ACR when a packet was last dropped, or when ACR is
             increased.  LD_ACR is always greater than or equal to
             ACR.
   QL      = Queue Length, the number of packets in the VC queue.
   LT      = Low Queue Threshold.
   MQS     = Maximum Queue Size.
   P_CTR   = Stores the number of packets in the VC queue when a
             packet is dropped; it is then decremented for every
             packet that is removed from the VC queue.
   Congestion Phase = 1 bit of information on the phase of the
             congestion (0 or 1).
   f       = Factor that determines the significance of a rate
             reduction, with 0 < f < 1.

Pseudocode:

   if (Congestion Phase == 0) {
       if ((ACR < f*LD_ACR and QL >= LT) or (QL >= MQS)) {
           Drop packet from front of the queue;
           Congestion Phase = 1;
           LD_ACR = ACR;
           P_CTR = QL;
       }
       if (ACR > LD_ACR)
           LD_ACR = ACR;
   }
   if (Congestion Phase == 1) {
       if (QL == MQS) {
           Drop packet from front of the queue;
           LD_ACR = ACR;
           P_CTR = QL;
       }
       if (QL < LT and P_CTR == 0)
           Congestion Phase = 0;
   }

   Decrement P_CTR when a packet is serviced;
   P_CTR = Max(P_CTR, 0);

Description: On receipt of the RM cells carrying the feedback
information, the ACR value is updated for the particular VC.  When
ACR is reduced drastically we would like to convey this information
to the TCP source as soon as possible.  The algorithm above uses this
feedback information and the current queue length to decide whether a
packet should be dropped from the VC queue.  This coupling is a
critical factor in resolving the issues discussed in the earlier
sections.  Every time a packet is dropped the algorithm updates the
LD_ACR (Last Drop ACR) value and uses that as the reference rate
against which it compares new ACR values.  This value is increased
and made equal to the ACR whenever ACR is larger than its current
value.  Thus, LD_ACR is always greater than or equal to the ACR.  The
algorithm consists of two phases, described below.  The criterion
used to effect a packet drop is different in the two phases.
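The pseudocode above can be restated as a small state machine.  The
Python sketch below is our own transcription for illustration (not
the simulator code): variable names follow the definitions above, the
drop decision is returned as a boolean, and the two phase blocks are
written as alternatives so that a single event triggers at most one
drop.  The two phases themselves are described in detail after the
sketch.

   # Sketch of the Adaptive Discard Algorithm: given the latest ACR and
   # the current per-VC queue length, decide whether the packet at the
   # front of the queue should be dropped.
   class AdaptiveDiscard:
       def __init__(self, f=0.5, low_threshold=10, max_queue=100):
           self.f = f                # rate-reduction factor, 0 < f < 1
           self.LT = low_threshold   # low queue threshold LT (packets)
           self.MQS = max_queue      # maximum queue size MQS (packets)
           self.phase = 0            # Congestion Phase (0 or 1)
           self.ld_acr = 0.0         # LD_ACR: ACR at the last drop/increase
           self.p_ctr = 0            # P_CTR: queue length at the last drop

       def evaluate(self, acr, queue_len):
           """Run on RM-cell receipt or packet arrival; True means drop
           the packet at the front of the VC queue."""
           drop = False
           if self.phase == 0:
               if (acr < self.f * self.ld_acr and queue_len >= self.LT) \
                       or queue_len >= self.MQS:
                   drop, self.phase = True, 1
                   self.ld_acr, self.p_ctr = acr, queue_len
               if acr > self.ld_acr:
                   self.ld_acr = acr
           else:
               if queue_len >= self.MQS:
                   drop = True
                   self.ld_acr, self.p_ctr = acr, queue_len
               if queue_len < self.LT and self.p_ctr == 0:
                   self.phase = 0
           return drop

       def on_packet_serviced(self):
           # P_CTR is decremented for every packet removed from the queue.
           self.p_ctr = max(self.p_ctr - 1, 0)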
Phase 0: The network is assumed not to be congested in this phase.
This is reflected both by an ACR that is changing slowly and by a
queue length less than the maximum queue size (MQS).  Two scenarios
can cause a packet drop.  If the ACR is constant or slowly changing,
eventually the TCP window, and hence the input rate to the queue,
will become large enough to cause the queue length to reach the MQS
level.  Then a packet must be dropped to trigger a reduction in the
TCP window.  In the second scenario a packet is dropped when there is
a drastic reduction in the ER available to the VC and the queue
length is greater than the queue threshold (LT).  The threshold
should be set to at least a few packets to ensure the transmission of
duplicate acks, and should not be set too close to the size of the
queue, to preserve the congestion avoidance function.  A drastic
reduction in the ACR value signifies congestion in the network, and
the algorithm sends an early warning to the TCP source by dropping a
packet.

Front Drop: In both the above cases the packet is dropped from the
front of the queue.  This results in earlier triggering of the
congestion control mechanism in TCP.  This has also been observed by
others [16].  Additionally, TCP window flow control has the property
that the start of the sliding window aligns itself with the dropped
packet and stops there until that packet is successfully
retransmitted.  This reduces the amount of data that the TCP source
can pump into a congested network.  In implementations of TCP with
the fast recovery and retransmit options, the recovery from the lost
packet is quicker, since at least one buffer's worth of data is
transmitted after the lost packet (which in turn generates the
duplicate acknowledgments required for fast retransmit and recovery).

Transition to Phase 1: When TCP detects a lost packet, depending on
the implementation, it will either reduce its window size to one
packet or reduce the window size to half its current value.  When
multiple packets are lost within the same TCP window, different TCP
implementations recover differently.

The first packet that is dropped causes the reduction in the TCP
window and hence in its average rate.  Multiple packet losses within
the same TCP window cause a degradation of throughput and are not
desirable, irrespective of the TCP implementation.  After the first
packet is dropped the algorithm makes a transition to Phase 1 and
does not drop any packets to cause rate reduction in this phase.

Phase 1: In this phase the algorithm does not drop packets to convey
a reduction of the ACR rate.  Packets are dropped only when the queue
reaches the MQS value.  The packets are dropped from the front for
the same reasons as described for Phase 0.  When a packet is dropped
the algorithm records the number of packets in the queue in the
variable P_CTR.  The TCP window size is at least as large as P_CTR
when a packet is dropped.  Thus, the algorithm tries not to drop any
more packets due to rate reduction until P_CTR packets have been
serviced.

Transition to Phase 0: If the ACR stays at the value that caused the
transition to Phase 1, the queue length will decrease after one round
trip time and the algorithm can transit to Phase 0.  If the ACR
decreases further, the algorithm eventually drops another packet when
the queue length reaches the MQS, but does not transit back to Phase
0.  The transition to Phase 0 takes place when at least P_CTR packets
have been serviced and the queue length recedes below the LT
threshold.
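As a quick illustration of these phase transitions, the hypothetical
walk-through below drives the AdaptiveDiscard sketch from Section 4.1
with made-up ACR and queue-length values: a sharp ACR reduction
causes one front drop and the move to Phase 1, and no further
rate-triggered drop occurs until P_CTR packets have been serviced and
the queue falls below LT.

   ada = AdaptiveDiscard(f=0.5, low_threshold=5, max_queue=50)

   print(ada.evaluate(acr=100.0, queue_len=2))  # False: ACR rising, queue short
   print(ada.evaluate(acr=30.0, queue_len=10))  # True: ACR < f*LD_ACR, QL >= LT
   print(ada.evaluate(acr=30.0, queue_len=12))  # False: Phase 1, queue below MQS
   for _ in range(12):                          # queue drains; P_CTR reaches 0
       ada.on_packet_serviced()
   print(ada.evaluate(acr=30.0, queue_len=3))   # False, and back to Phase 0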
4.2 Simulation Results

The initial set of results served to identify the problem and to
motivate the edge device congestion handling mechanism.  In this
section we discuss some simulation results obtained with the
algorithm described above.  In order to get a better understanding of
the behavior of the algorithm, these simulation experiments were
performed with a smaller TCP packet size of 1500 bytes.  A 9180-byte
TCP packet is approximately equivalent to 192 ATM cells; this would
be very close to the threshold value of the algorithm for the small
buffer range that we are considering and would unduly influence the
results.  The rest of the simulation parameters are the same as used
above, with all results using TCP with fast retransmit and recovery.

The results are shown for UBR and UBR with Early Packet Discard
(EPD).  Along with ABR we show results for ABR with a simple front
drop strategy in the edge device and for ABR with the Adaptive
Discard Algorithm.  These are termed ABR-ER, ABR-ER with FD, and
ABR-ER with ADA in the tables below.  From the results it is evident
that the end-to-end performance improves as we use intelligent packet
discard algorithms in the edge device.  There is significant
improvement due to the early detection of congestion based on the
network feedback information received from ABR.

Table 5: EB = 500, SB = 1000,
   TCP with Fast Retransmit and Recovery
   _____________________________________________
  |              UBR          UBR with EPD      |
  |-----------------------------------------------|
  | Delay (D)    370 ms       230 ms             |
  | Goodput(R)   63.4 Mbps    79.1 Mbps          |
   -----------------------------------------------

Table 6: EB = 500, SB = 1000,
   TCP with Fast Retransmit and Recovery
   ____________________________________________________________
  |              ABR-ER       ABR-ER with FD   ABR-ER with ADA |
  |--------------------------------------------------------------|
  | Delay (D)    715 ms       237 ms           244 ms           |
  | Goodput(R)   36.15 Mbps   67.9 Mbps        87.68 Mbps       |
   --------------------------------------------------------------

Table 7: EB = 1000, SB = 1000,
   TCP with Fast Retransmit and Recovery
   ____________________________________________________________
  |              ABR-ER       ABR-ER with FD   ABR-ER with ADA |
  |--------------------------------------------------------------|
  | Delay (D)    310 ms       145 ms           108 ms           |
  | Goodput(R)   66.7 Mbps    107.7 Mbps       112.86 Mbps      |
   --------------------------------------------------------------

The use of the algorithm leads to an increase in the TCP throughput
and a decrease in the end-to-end delays seen by the end host.  The
algorithm has two effects on the TCP streams: it detects congestion
early and drops a packet as soon as it detects congestion, and it
then tries to avoid dropping additional packets.  The first packet
that is dropped achieves a reduction in the TCP window size and hence
in the input rate; after that, the algorithm tries not to drop any
more packets.  This keeps the TCP streams active with a higher
sustained average rate.  In order to improve the ABR throughput and
delay performance it seems to be important to control the TCP window
dynamics and reduce the possibilities for negative interaction
between the TCP flow control loop and the explicit rate feedback
loop.

5. Summary

In this document we discuss the necessity of considering end-to-end
traffic management in IP/ATM internetworks.  In particular we
consider the impact of the interactions between the TCP and ABR flow
control loops on providing end-to-end quality of service to the user.
We also discuss the trade-off in memory requirements between the
switch and the edge device based on the type of flow control employed
within the ATM subnetwork.  We show that it is important to consider
the behavior of the end-to-end protocol, such as TCP, when selecting
the traffic management mechanisms in the IP/ATM internetworking
environment.

ABR rate control pushes the congestion out of the ATM subnetwork and
into the edge devices.  We show that this can have a detrimental
effect on the end-to-end performance when the buffers at the edge
device are limited.  In such cases we show that UBR with Early Packet
Discard can achieve better performance than ABR.

We present an algorithm for handling congestion in the edge device
which significantly improves the end-to-end throughput and delay
while using ABR.  The algorithm couples the two flow control loops by
using the information received via the explicit rate scheme to
intelligently discard packets when there is congestion in the edge
device.  In addition, the algorithm is sensitive to the TCP flow
control and recovery mechanism, to make sure that throughput does not
suffer due to the closing of the TCP window.  As presented in this
document, the algorithm is limited to cases where different TCP
streams are queued separately.  As future work the algorithm will be
extended to include shared memory cases.  More work needs to be done
to include studies with heterogeneous sources with different
round-trip delays.

References

[1]  ATM Forum TM SWG, "Traffic Management Specification V. 4.0",
     ATM Forum, April 1996.

[2]  V. Jacobson, "Modified TCP Congestion Avoidance Algorithm",
     end2end-interest mailing list, April 30, 1990,
     ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail.

[3]  K. Fall and S. Floyd, "Comparisons of Tahoe, Reno and Sack TCP",
     available from http://www-nrg.ee.lbl.gov/nrg-abstracts.html#KF95.

[4]  J. C. Hoe, "Improving the Start-up Behavior of a Congestion
     Control Scheme for TCP", SIGCOMM '96.

[5]  RFC 1577, "Classical IP and ARP over ATM", December 1993.

[6]  RFC 793, "Transmission Control Protocol", September 1981.

[7]  RFC 1122, "Requirements for Internet Hosts - Communication
     Layers", October 1989.

[8]  C. Fang and A. Lin, "A Simulation Study of ABR Robustness with
     Binary-Mode Switches", ATMF 95-132, October 1995.

[9]  H. Chen and J. Brandt, "Performance Evaluation of EFCI Switch
     with Per VC Queuing", ATMF 96-0048, February 1996.

[10] H. Li, K. Sui, and H. Tzeng, "TCP over ATM with ABR vs.
     UBR+EPD", ATMF 95-0718, June 1995.

[11] R. Jain, S. Kalyanaraman, R. Goyal, and S. Fahmy, "Buffer
     Requirements for TCP over ABR", ATMF 96-0517, April 1996.

[12] P. Newman, "Data over ATM: Standardization of Flow Control",
     SuperCon '96, Santa Clara, January 1996.

[13] N. Yin and S. Jagannath, "End-to-End Traffic Management in
     IP/ATM Internetworks", ATM Forum 96-1406, October 1996.

[14] S. Jagannath and N. Yin, "End-to-End TCP Performance in IP/ATM
     Internetworks", ATM Forum 96-1711, December 1996.

[15] A. Romanow and S. Floyd, "Dynamics of TCP Traffic over ATM
     Networks", IEEE Journal on Selected Areas in Communications,
     vol. 13, no. 4, pp. 633-641, May 1995.

[16] T. V. Lakshman et al., "The Drop from Front Strategy in TCP and
     TCP over ATM", IEEE Infocom, 1996.

Authors' Addresses
S. Jagannath and N. Yin
Bay Networks Inc.
Phone: (508) 670-8888, 670-8153 (fax)
EMail: jagan_pop@baynetworks.com, nyin@baynetworks.com

The ION working group can be contacted through the chairs:
Andy Malis and George Swallow