INTERNET-DRAFT                                                  T. Faber
Expires: February 10, 1998                                      J. Touch
draft-faber-time-wait-avoidance-00.txt                            W. Yue
                                                                 USC/ISI
                                                             August 1997

           Avoiding the TCP TIME_WAIT state at Busy Servers

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

   This document describes the problems associated with the accumulation of TCP TIME_WAIT states at a network server, such as a web server, and details two methods for avoiding that accumulation.  Servers that have many TCP connections in TIME_WAIT state experience performance degradation and can collapse.  One solution is a TCP modification that causes clients to enter TIME_WAIT state rather than servers.  The other is an HTTP modification that allows the client to close the transport connection, maintaining the TIME_WAIT state at the client.  The goal of both approaches is to ensure that TIME_WAIT states accumulate at the less loaded endpoint.

   The document also presents initial performance data from reference implementations of these solutions, which suggest that the modifications improve HTTP connection rates at the server by as much as 50%, and allow servers to operate at small-transaction throughputs that they cannot sustain in their default configuration.

Introduction

   This draft describes the causes and effects of TIME_WAIT TCP protocol control block (TCB) accumulation at servers and proposes independent application and transport level modifications that remove that buildup.
   We present experimental results showing a 50% improvement in HTTP connection rates, as measured by WebSTONE[1], as well as evidence that modified servers function at higher loads than unmodified servers can.

TIME_WAIT state and its effects

   TCP includes a mechanism to ensure that packets associated with one connection that are delayed in the network are not accepted by later connections between the same hosts[2].  The mechanism is implemented by the TIME_WAIT state of the TCP protocol.  When an endpoint closes a TCP connection, it keeps state about that connection, usually a copy of the TCB, for twice the maximum segment lifetime (MSL).  A connection in this state is in TIME_WAIT, and the endpoint holding the TIME_WAIT TCB rejects any packets addressed to the TIME_WAIT connection from the other endpoint.

   Keeping this TIME_WAIT TCB at either of the hosts prevents a new connection with the same combination of source address, source port, destination address, destination port from being created.  Either endpoint being in TIME_WAIT prevents data transfer on the connection, so protocol correctness is unaffected by which host holds the TIME_WAIT TCB.  Our modifications center on ensuring that the TIME_WAIT TCB is on the less loaded endpoint.

   Heavily loaded servers potentially keep thousands of TIME_WAIT TCBs, which consume memory and can slow active connections.  In BSD-based TCP implementations, TCBs are kept in mbufs, the memory allocation unit of the networking subsystem.  There are a finite number of mbufs available in the system, and mbufs consumed by TCBs cannot be used for other purposes, e.g., to move data.  Certain systems on high speed networks run out of mbufs due to TIME_WAIT buildup under high connection load.  A SPARCStation 20/71 under SunOS 4.1.3 on a 640 Mb/sec Myrinet[3] cannot support more than 60 connections/sec.

   Incoming packets must be demultiplexed by finding the receiving connection in the host's TCB list.  This process can be slowed when the TCB list is full of TIME_WAIT TCBs.  In the simplest implementation, the TCB list is searched linearly to demultiplex the incoming packet to the appropriate connection, which can make TCB lookup a bottleneck.  The additional search overhead can cut throughput between two SunOS 4.1.3 SPARCStations on a Myrinet in half.

Other Proposed Solutions

   There are other solutions to the increased lookup overhead problem, e.g., storing all TIME_WAIT TCBs at the end of the list and using them as a search terminator as BSDI's BSD/OS does[4], or hashing TCBs rather than keeping them in a list[5].  These solutions do not address the loss of memory due to accumulation of TIME_WAIT states, so servers may still be unable to serve a high client load.  These approaches improve system response until the server collapses due to lack of free mbufs; our approach of removing the TIME_WAIT state from the server eliminates this cause of server collapse.

   Allocating more memory to system mbufs or reducing the amount of data cached per connection allows servers to function under a higher load before collapsing.  The servers' performance will continue to degrade.  Moving TIME_WAIT to clients removes this cause of system degradation and collapse without changing resource allocations.

   The costs of accumulating TIME_WAIT TCBs have become more apparent as HTTP becomes more prevalent.
   Under HTTP 1.1, servers terminate connections by closing the underlying TCP connection[6], which results in accumulation of TCBs at servers[7].

   HTTP 1.1 reduces the number of connections per transaction using persistent connections; however, with respect to TIME_WAIT buildup, the use of persistent connections[6] is similar to adding more memory to servers: servers can support a larger load before the effect becomes noticeable, but performance eventually degrades.  Servers supporting persistent connections can support more transactions per connection, and will benefit from our modifications by being able to support more connections.

Our Proposed Solutions

   Because the accumulation of TIME_WAIT TCBs is caused by the interaction between transport and application protocols, modifications can be made to either protocol to alleviate it.  Changing the transport protocol confers the benefits on more applications, but there may be more resistance to changing a protocol on which many applications depend.  Application level changes restrict the benefits (and drawbacks) to the application for which the solution is implemented.  Furthermore, application solutions are not always possible; for example, protocols that use the closing of a transport connection to indicate end-of-file are not good candidates for removing TIME_WAIT TCBs at the application layer.

   This document proposes distinct extensions to TCP and to HTTP that allow hosts to control which end of the connection remains in TIME_WAIT state.  A solution needs to be implemented at only one level, transport or application.  We describe and measure both to have a basis for comparison.  Preliminary experiments indicate that both systems reduce the memory usage of web servers due to TIME_WAIT states to negligible levels, with accompanying performance improvements.  The TCP modifications require only client side changes, and can be deployed incrementally.  The HTTP changes affect client and server, but are compatible with HTTP 1.1 behavior, and can also be incrementally deployed.

   The remainder of this document presents the two proposed solutions, compares them, discusses the results of initial experiments with the solutions, draws conclusions, and outlines future work.

Transport Level (TCP) Solution

   The TCP solution exchanges the TIME_WAIT state between the server and client.  We modify the client's TCP implementation so that, after it has completed a passive close of a transport connection, it sends an RST packet to the server and puts itself into a TIME_WAIT state.  The RST packet removes the TCB in TIME_WAIT state from the server; the explicit transition to a TIME_WAIT state in the client preserves correct TCP behavior.  If the client RST is lost, both server and client remain in TIME_WAIT state, which also ensures correct behavior and is equivalent to a simultaneous close in the current protocol.  If either host reboots during the RST exchange, the behavior is the same as if a host running unmodified TCP fails with connections in TIME_WAIT state: packets will not be erroneously accepted if the host recovers and refuses connections until a 2 MSL period has elapsed[2].
   More formally, the change to the TCP state machine replaces the arc from LAST_ACK to CLOSED with an arc from LAST_ACK to TIME_WAIT; an RST is sent when that arc is traversed.  These modifications need to be made only to clients.

   Hosts that act primarily as clients may be configured with the new behavior for all connections; hosts that serve as both client and server, for example proxies, may be configured to support both behaviors.  The implementation of both behaviors is straightforward, although it requires a more extensive modification of the TCP state machine.

   Allowing both behaviors on the same host requires splitting the LAST_ACK state into two states, one that represents the current behavior (LAST_ACK) and one that represents the modified behavior (LAST_ACK_SWAP).  These states may both be reported as LAST_ACK to monitoring tools.  The state machine determines which state to enter from CLOSE_WAIT based on whether the application issues a close or a close_swap.

   The current passive close path is:

       server                                  client
       -----------------------------------------------------------
       ESTABLISHED                             ESTABLISHED
       (get application close)
       goto FIN_WAIT_1
       send FIN          ---FIN--->
                                               goto CLOSE_WAIT
                         <---ACK---            send ACK
       goto FIN_WAIT_2
                                               (get application close)
                                               goto LAST_ACK
                         <---FIN---            send FIN
       goto TIME_WAIT
       send ACK          ---ACK--->
                                               goto CLOSED

   This solution adds this branch from CLOSE_WAIT on the client side:

       server                                  client
       -----------------------------------------------------------
       ESTABLISHED                             ESTABLISHED
       (get application close)
       goto FIN_WAIT_1
       send FIN          ---FIN--->
                                               goto CLOSE_WAIT
                         <---ACK---            send ACK
       goto FIN_WAIT_2
                                               (get application close_swap)
                                               goto LAST_ACK_SWAP
                         <---FIN---            send FIN
       goto TIME_WAIT
       send ACK          ---ACK--->
                                               goto TIME_WAIT
                         <---RST---            send RST
       goto CLOSED

   Strictly speaking, the transition of the client to TIME_WAIT is extraneous, because any host sending an RST is obligated not to allow a connection between the same pair of addresses and ports for a time of at least 2 MSL.

   Distinguishing between close and close_swap does not require changing the application interface.  For example, a per-connection flag can be added to change the default behavior, where the default behavior is chosen based on whether the host is primarily a client or a server.  Hosts that are primarily clients will follow the close_swap path unless overridden, and servers will follow the close path.

   Implementations of this system do not change the API at all if all connections from the same host have the same semantics; hosts which are primarily clients will see no change.  Only hosts that support both semantics will see a change to the API, and this will be an additional socket option or similar small change.

   The solution we propose is designed to interoperate with the existing TCP specification.  A cleaner implementation of our solution would be to change both endpoint implementations to negotiate which endpoint maintains the TIME_WAIT TCB.  However, this would require changing all TCP implementations, which ours does not.

   A SunOS 4.1.3 patch is available from the authors.
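   The following sketch illustrates, in C, one way the close/close_swap distinction and the LAST_ACK_SWAP transition described above might be expressed.  It is an illustrative sketch only, not the SunOS 4.1.3 patch: the state, function, and variable names are invented for the example, and segment construction, sequence number checks, and the 2 MSL timer are omitted.

      /*
       * Illustrative sketch (not the SunOS 4.1.3 patch): client-side
       * passive close with the LAST_ACK_SWAP extension.  Real TCP code
       * must also build segments, check sequence numbers, and run the
       * 2*MSL timer that eventually discards the TIME_WAIT TCB.
       */
      #include <stdio.h>

      enum tcp_state {
          ESTABLISHED, CLOSE_WAIT, LAST_ACK, LAST_ACK_SWAP,
          TIME_WAIT, CLOSED
      };

      /* Application closes a connection that has already received the
       * peer's FIN (CLOSE_WAIT).  "swap" selects the modified behavior;
       * a FIN is sent in either case.                                  */
      static enum tcp_state app_close(enum tcp_state s, int swap)
      {
          if (s != CLOSE_WAIT)
              return s;
          return swap ? LAST_ACK_SWAP : LAST_ACK;
      }

      /* The ACK of our FIN arrives while we wait in LAST_ACK*.         */
      static enum tcp_state ack_of_fin(enum tcp_state s, int *send_rst)
      {
          *send_rst = 0;
          switch (s) {
          case LAST_ACK:            /* unmodified behavior              */
              return CLOSED;
          case LAST_ACK_SWAP:       /* modified behavior                */
              *send_rst = 1;        /* RST removes the server's TIME_WAIT TCB */
              return TIME_WAIT;     /* client keeps the 2*MSL state     */
          default:
              return s;
          }
      }

      int main(void)
      {
          int rst;
          enum tcp_state s = CLOSE_WAIT;   /* peer's FIN already received */

          s = app_close(s, 1 /* close_swap */);
          s = ack_of_fin(s, &rst);
          printf("client state: %s, send RST: %s\n",
                 s == TIME_WAIT ? "TIME_WAIT" : "other",
                 rst ? "yes" : "no");
          return 0;
      }

   On a host that supports both behaviors, the swap flag would correspond to the per-connection default or the socket option discussed above.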
Application Level Solution for HTTP

   Protocols that use the state of the transport connection as signalling dictate which endpoint closes a connection, and therefore which incurs the cost of the TIME_WAIT TCB.  For example, early HTTP servers used the state of the transport connection as an implicit indicator of both transaction lifetime and request length.  The server closing the TCP connection indicated to the client that the whole response had arrived, and, because there were no persistent connections, that the HTTP exchange was over.  Because the server was using the close to mark the end of both the transaction and the exchange, it was required to initiate the close.

   HTTP 1.1 has sufficient framing to allow a modification that shifts TIME_WAIT TCBs to the clients[6].  Responses are self-delineating; all responses include the size of the response either in the headers or via the chunking mechanism.  When using persistent connections, which is the default behavior in HTTP/1.1, requests have fields which can be used to control the transport connection.  The server is no longer required by the protocol to close the transport connection.

   To control the distribution of TIME_WAIT TCBs from the application level, our HTTP modifications arrange for the client to close the TCP connection.  This requires the client to be able to detect the end of a response.  Under HTTP 1.1, this information is available to the client as a side effect of persistent connections.  We advocate a change in client behavior that requires clients to close the transport connection underlying an HTTP connection, and an extension of the request format that allows the client to notify the server that it is breaking the TCP connection.

   We propose adding a CLIENT_CLOSE request to HTTP that indicates that a client is ending the HTTP exchange by closing the underlying TCP connection.  A CLIENT_CLOSE request requires no reply.  It terminates a series of requests on a persistent connection, and indicates to the server that the client has closed the TCP connection.  A client will initiate an active close on the TCP connection immediately after sending the CLIENT_CLOSE request to the server.

   A CLIENT_CLOSE request differs from including a "Connection: close" header in a request because a request that includes "Connection: close" still requires a reply from the server, and the server will (passively) close the connection[6].  A CLIENT_CLOSE request indicates that the client has severed the TCP connection, and that the server should close its end without replying.

   Incorporating CLIENT_CLOSE into the transaction is a minor extension to the HTTP protocol.  Current HTTP clients conduct an HTTP transaction by opening the TCP connection, making a series of requests with a "Connection: close" line in the final request header, and collecting the responses.  The server closes the connection after sending the final byte of the final response.  Modified clients open a connection to the server, make a series of requests, collect the responses, and send a CLIENT_CLOSE request to the server after the end of the last response.  The client closes the connection immediately after sending the CLIENT_CLOSE.

   Modified clients are compatible with the HTTP 1.1 specification[6]; an illustrative sketch of the modified client behavior appears below.
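   As an illustration, the sketch below shows the client side of such an exchange in C.  The draft does not specify the on-the-wire form of CLIENT_CLOSE; the request line used here ("CLIENT_CLOSE / HTTP/1.1"), the host name, and the helper function are assumptions made for the example, and response parsing is reduced to a single read.

      /*
       * Sketch of a modified HTTP client: issue requests on a persistent
       * connection, then send CLIENT_CLOSE and actively close, keeping
       * the TIME_WAIT TCB at the client.  The CLIENT_CLOSE request line
       * below is an assumed syntax; the draft leaves it unspecified.
       */
      #include <string.h>
      #include <unistd.h>
      #include <netdb.h>
      #include <sys/types.h>
      #include <sys/socket.h>

      static int http_connect(const char *host, const char *port)
      {
          struct addrinfo hints, *res;
          int fd;

          memset(&hints, 0, sizeof(hints));
          hints.ai_socktype = SOCK_STREAM;
          if (getaddrinfo(host, port, &hints, &res) != 0)
              return -1;
          fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
          if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
              close(fd);
              fd = -1;
          }
          freeaddrinfo(res);
          return fd;
      }

      int main(void)
      {
          char buf[4096];
          const char *req =
              "GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n";
          const char *client_close =          /* assumed request syntax */
              "CLIENT_CLOSE / HTTP/1.1\r\nHost: www.example.com\r\n\r\n";
          int fd = http_connect("www.example.com", "80");

          if (fd < 0)
              return 1;
          write(fd, req, strlen(req));     /* one of a series of requests */
          read(fd, buf, sizeof(buf));      /* response is self-delineating */
          write(fd, client_close, strlen(client_close)); /* no reply expected */
          close(fd);                       /* client performs the active close */
          return 0;
      }

   To a server that does not implement the extension, this traffic looks like an ordinary persistent-connection exchange followed by an unknown method and an early close, which is the compatibility case discussed next.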
   A server that does not understand CLIENT_CLOSE will see a conventional HTTP exchange, followed by a request that it does not implement, and a closed connection when it tries to send an error response.  A conformant server must be able to handle the client closing the TCP connection at any point.  The client has gotten its data, closed the connection, and holds the TIME_WAIT TCB.

   Modifying servers to recognize CLIENT_CLOSE can make parts of their implementation easier.  Mogul et al. note that detecting closed connections can be difficult for servers[6].  CLIENT_CLOSE marks closing connections, which simplifies the server code that detects and closes connections that clients have intentionally closed.

   The CLIENT_CLOSE request has been implemented directly in the apache-1.24[8] server and test programs from the WebSTONE performance suite.  Patches are available from the authors.

Initial Implementation

   In this section we present experiments that demonstrate the problem and show our solutions' effectiveness.  The proposed solutions have been implemented under SunOS 4.1.3, and initial evaluations of their performance have been made using both custom benchmark programs and the WebSTONE benchmark.  The tests were run on hosts connected to the 640 Mb/sec Myrinet LAN.

   We performed two experiments.  The first experiment shows that TCB load degrades server performance and that our modifications reduce that degradation.  The second illustrates that both our TCP and HTTP solutions improve server performance under the WebSTONE benchmark, which simulates typical HTTP traffic.  The last experiment shows that our modifications enable a server to support HTTP loads that it cannot support in its default configuration.

   The first experiment was designed to determine whether TCB load reduces server performance and whether our modifications alleviate that degradation.  This experiment used four Sparc 20/71s across the Myrinet using a user-level data transfer program over TCP.  The throughput is the average of each of two client hosts doing a simultaneous bulk transfer to the server host.  We vary the number of TIME_WAIT TCBs at the server by adding dummy TIME_WAIT states.

   The experiment was:

      1. Two client machines establish connections to the server.

      2. The server is loaded with extraneous TIME_WAIT TCBs by a fourth
         host.

      3. The two bulk transport connections transfer data.  (Throughput
         timing begins when the data transfer begins, not when the
         connection is established.  TIME_WAIT TCBs may expire during
         the transfer.)

      4. Between runs, the server is allowed to idle and remove TCBs,
         to control conditions for all runs.

   Each result is the average of ten runs.

      Connections     Throughput (Mb/sec)     Throughput (Mb/sec)
      in TIME_WAIT    (Unmodified)            (with TCP Modification)
                      avg.      std. dev.     avg.      std. dev.
      -----------------------------------------------------------------
             0        66.8      3.3           66.8      3.3
           500        49.6      3.9           66.8      3.3
          1000        41.9      4.1           66.5      3.1
          1500        35.3      2.8           64.6      3.0
          2000        31.2      4.9           64.3      3.0
          2500        30.5      3.0           64.3      2.9

                 Table 1: Worst case throughput experiment

   The experimental procedure is designed to isolate a worst case at the server.  The client connections are established first to put them at the end of the list of TCBs in the server kernel, which will maximize the time needed to find them using SunOS's linear search.
   Two clients are used to neutralize the simple caching behavior in the SunOS kernel, which consists of keeping a single pointer to the most recently accessed TCB.  Two distinct client hosts are used to allow bursts from the two clients to interleave; two client programs on the same host send bursts in lock-step, which reduces the cost of the TCB list scans.

   The experiment shows that under worst case conditions, TCB load can reduce throughput by as much as 50%, and that our TCP modifications improve performance under those conditions.

   While it is useful that our modifications perform well in the worst case, it is important to assess the worth of the modifications under expected conditions.  The previous experiment constructed a worst case scenario; the following experiment uses WebSTONE to test our modifications under a more typical HTTP load.

   WebSTONE is a standard benchmark used to measure web server performance in terms of connection rate and per-connection throughput.  To measure server performance, several workstations make HTTP requests of a server and monitor the response time and throughput.  A central process collects and combines the information from the individual web clients.  The benchmark has been augmented to measure the amount of memory consumed by TCBs on the server machine.  We used WebSTONE version 2 for these experiments.

   WebSTONE models a heavy load that simulates HTTP traffic.  Two hosts run multiple web clients which continuously request files ranging from 9KB to 5MB from the server.  Each host runs 20 web clients.

   Results from a typical run are summarized below:

      System                Throughput     Connections     TCB Memory Use
      Type                  (Mb/sec)       per second      (Kbytes)
      --------------------------------------------------------------------
      Unmodified            20.97          49.09           722.7
      TCP Modification      26.40          62.02            24.1
      HTTP Modifications    31.73          74.70            24.4

               Table 2: WebSTONE benchmark with large fileset

   Both modifications show marked improvements in throughput and connection rate.  The TCP modifications increase connection rate by 25%, and the HTTP modifications increase connection rate by 50%.  We believe the TCP modification is less effective than the HTTP modification because it adds another packet exchange.  Packet traces are being used to confirm this.  [note: more will be included on this in later drafts]

   When more clients request smaller files, unmodified systems fail completely because they run out of memory; systems using our modifications can support much higher connection rates than unmodified systems.  The following table reports data from a typical WebSTONE run using 8 clients on 4 hosts connecting to a dedicated server.  All clients request only 500 byte files.

      System                Throughput     Connections     TCB Memory Use
      Type                  (Mb/sec)       per second      (Kbytes)
      --------------------------------------------------------------------
      Unmodified            fails          fails           fails
      TCP Modification      1.14           223.8           16.1
      HTTP Modifications    1.14           222.4           16.1

                Table 3: WebSTONE benchmark with small files

   The experiments support the hypothesis that the proposed solutions reduce the memory load on servers.  The custom benchmark shows that the system with a modified transport performs much better in the worst case, and that server bandwidth loss can be considerable.
   The WebSTONE benchmark shows that both systems reduce memory usage, and that this leads to performance gains.  Finally, modified systems are able to handle workloads that unmodified systems cannot.

   This is a challenging test environment because the TCB load of the server host is spread across only two client hosts rather than the hundreds that would share the load in a real system.  The clients suffer some performance degradation due to the accumulating TCBs, much as the server does in the unmodified system.

Comparison of Methods

   The primary contrast between the TCP solution and the HTTP solution is that they are implemented at different levels of the protocol hierarchy.  The TCP solution has the benefits and drawbacks of a transport level solution: it applies the fix transparently to all application protocols running over TCP, but may be difficult to adopt for the same reason.  A change to TCP affects many applications, and many resist changes to TCP to avoid unintended consequences.  The HTTP solution has the trade-offs of an application modification: only HTTP will exhibit the new behavior, and other applications will see no benefit.  If another protocol causes a TIME_WAIT state buildup, an HTTP fix will not prevent it.

   The performance of our TCP modification will also be limited by how efficiently hosts process RST packets.  Hosts that incur a high overhead in handling RSTs, or that delay processing them, will not perform as well.  This may be one reason that the TCP solution shows less improvement than the HTTP solution in the large fileset experiment above.  [note: this will be expanded upon]

   The meaning of the RST packet is also changed by our TCP solution.  An RST packet is intended to indicate an unusual condition or error in the connection.  We are proposing making it part of standard operating procedure.  The change in semantics of the RST packet is a result of maintaining compatibility with current TCP.  Some browsers are currently using RST in unintended ways as well.

   Ideally, the two TCP endpoints would negotiate during connection establishment which of them will hold the TIME_WAIT TCB, but this would require changing the TCP packet format to allow room for that negotiation, and further changes to the state machine.  We believe such a system is the best solution to the TIME_WAIT TCB accumulation problem, but recognize that such a large change to TCP would be difficult to get adopted.

   Adopting the HTTP solution is effective if HTTP connections are the source of TIME_WAIT loading; however, if another protocol begins loading servers with TIME_WAIT states, that protocol will have to be fixed as well.  Currently, we believe HTTP causes the bulk of TIME_WAIT loading, which is why we chose to implement our solution under HTTP; in the future other protocols may be the source.

   Not adopting a TCP fix means that future protocols should be designed to control TIME_WAIT loading, which will constrain their semantics.  Specifically, application protocols will not be able to use the state of the transport connection as implicit signalling; application layer protocols will be constrained to include framing and connection control information, or run the risk of loading servers with TIME_WAIT states.  For example, streaming real-time transmission systems may make use of such implicit signalling.
   Some existing protocols, such as FTP[9], make use of implicit signalling and cannot be retrofitted with TIME_WAIT controls.  As these protocols are currently used, they do not appear to be major sources of TIME_WAIT loading.  They could become important sources of TIME_WAIT load if such a protocol has a resurgence or is used in new ways, or if its smaller loading characteristics become significant after the HTTP load is reduced.  If that happens, a backward-compatible solution may not be possible.

   Both the TCP and the HTTP solutions are incrementally deployable and solve the problem at hand.  Which to deploy in the Internet depends on how the community weighs changing the semantics of the existing transport protocol against restricting the semantics of future application protocols.

Conclusions

   This document has discussed the problem of server memory load due to terminated connections remaining in TIME_WAIT state.  Servers can become so memory poor at high connection rates that they are unable to transfer data at all.  Even if servers can continue to function, their performance can suffer.

   Two solutions to the memory load problem have been presented, at the transport (TCP) level and at the application (HTTP) level.  Both solutions allow a client to take on its share of the server memory load.  The transport level solution adds a new state and operation to the TCP state machine that explicitly moves the TIME_WAIT state from the active close initiator to the passive closer.  The application level solution adds an access method to HTTP that allows a client to notify a server that it is actively closing the connection and will maintain the TIME_WAIT state.

   Both solutions will interoperate with existing systems, allowing for easy deployment.  Patches are available from the authors for both solutions: TCP modifications are available for SunOS 4.1.3 and HTTP modifications are available for apache-1.24.

   Although there are certainly other methods of dealing with TIME_WAIT state accumulation, the methods presented here have the benefits that they preserve current TCP behavior, are incrementally deployable, and are small, simple changes to existing systems.  Most other solutions, such as ending connections with an RST or moving TIME_WAIT TCBs to other internal queues at the server, either break transport behavior or do not address the memory load problem directly.

Security Considerations

   The practices advocated in this document do not seem to affect the security of either the HTTP or TCP protocols.

   The increased use and change in semantics of RST packets may cause false alarms in systems that monitor them.

Authors' Addresses

   Ted Faber
   Joseph Touch
   Wei Yue
   University of Southern California/Information Sciences Institute
   4676 Admiralty Way
   Marina del Rey, CA 90292-6695
   USA
   Phone: +1 310 822 1511
   Fax:   +1 310 823 6714
   EMail: faber@isi.edu
          touch@isi.edu
          wyue@isi.edu

   This draft expires March 20, 1998.

References

   1. Gene Trent and Mark Sake, "WebSTONE: The First Generation in HTTP
      Server Benchmarking," white paper, Silicon Graphics International
      (February 1995), available electronically.

   2. Jon Postel, "Transmission Control Protocol," RFC-793/STD-7
      (September 1981).
   3. Nanette J. Boden, Danny Cohen, Robert E. Felderman, Alan E.
      Kulawik, Charles L. Seitz, Jakov N. Seizovic, and Wen-King Su
      (Myricom, Inc.), "Myrinet: A Gigabit-per-second Local Area
      Network," IEEE Micro, pp. 29-36, IEEE (February 1995).

   4. Mike Karels and David Borman, Personal Communication (July 1997).

   5. Paul E. McKenney and Ken F. Dove, "Efficient Demultiplexing of
      Incoming TCP Packets," Proceedings of SIGCOMM 1992, vol. 22,
      no. 4, pp. 269-279, Baltimore, MD (August 17-20, 1992).

   6. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, and T. Berners-Lee,
      "Hypertext Transfer Protocol -- HTTP/1.1," RFC-2068 (January
      1997).

   7. Robert G. Moskowitz, "Why in the World Is the Web So Slow,"
      Network Computing, pp. 22-24 (March 15, 1996).

   8. Roy T. Fielding and Gail Kaiser, "Collaborative Work: The Apache
      Server Project," IEEE Internet Computing, vol. 1, no. 4, pp.
      88-90, IEEE (July/August 1997), available electronically.

   9. J. Postel and J. K. Reynolds, "File Transfer Protocol," RFC-959
      (October 1985).