Network Working Group                                         J. Freniche
                                                                      CASA
Category: Informational                                          July 1998

                         TCP Window Probe Deadlock

Status of This Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   To view the entire list of current Internet-Drafts, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
   Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
   Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.

Copyright Notice.

   Copyright (C) The Internet Society (1998).  All Rights Reserved.

Introduction.

   In the course of developing and testing a TCP/IP stack for embedded
   computers, a situation that can be called "TCP window probe
   deadlock", followed by a connection abort, has been observed.

   The deadlock has been detected when a client host sends, using TCP
   (Ref. 1), a huge amount of data to a server host, which in turn
   processes that input and also returns a huge amount of data.  If the
   sender does not appropriately mix send and receive requests to its
   underlying TCP, it is possible to enter a situation where both
   applications are blocked and the respective TCP layers interchange
   window probes forever (unless aborted by some sort of alarm).

   Initially it was thought that the deadlock was a fault in the
   implementation of the TCP/IP stack.  But it was immediately
   reproduced in several other configurations (FreeBSD <-> FreeBSD,
   FreeBSD <-> HP-UX, HP-UX <-> HP-UX, FreeBSD <-> Solaris and FreeBSD
   <-> AIX, all tested using Ethernet interfaces and also when using
   the local interface).  Given its nature, it is believed that it
   occurs in many, if not all, TCP implementations.

   The next section gives indications on how to reproduce the deadlock,
   followed by a more detailed description and analysis.  Conditions
   and factors influencing the deadlock are examined, and a solution
   (at the TCP and application levels) is proposed, whose impact on
   current applications is analyzed.  Appendixes with traces are also
   included.

   For the curious, the board was a "bare machine" Motorola MC68040
   single board computer with an AMD79C90 LANCE interface, the
   programming language was Ada, and the operating system was the nuke
   Ada Run Time System.

Reproducing the Deadlock.

   C and S are hosts communicating by TCP.  C (client) runs a client
   program that sends lots of data to S (server) between receive
   requests.  The server processes that data and also returns lots of
   data to the client.  The interface between the applications and the
   underlying TCPs is blocking (which is the default behavior for
   Sockets).

   Note that C and S do not need to be directly connected, i.e.,
   routers can exist between C and S.

   A good example of such a server application is the echo service
   (Ref. 2).
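   For illustration, the server side of such a pair follows what will
   later be called the "echo" pattern: read a chunk, process it, and
   write the result back over the same connection, all through blocking
   calls.  A minimal sketch of such a handler (with a hypothetical
   8192-byte application buffer; this is not the actual "inetd" echo
   code) could be:

      /* Echo-pattern handler sketch: read a chunk, "process" it
         (here just a copy) and send it back on the same blocking
         socket. */

      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/socket.h>

      void echo_pattern_handler (int conn)
      {
          char buf [8192];       /* hypothetical application buffer */
          int  n;

          /* recv() blocks until data arrives; send() blocks while
             the local TCP send buffer is full. */
          while ((n = recv (conn, buf, sizeof (buf), 0)) > 0) {
              if (send (conn, buf, n, 0) != n)
                  break;         /* error or interrupted send */
          }
          close (conn);
      }

   Whether this pattern deadlocks or not depends on the client behavior
   and on the buffer sizes discussed in the following sections.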
   Specifically, the deadlock was detected by enabling the echo service
   in the server S, and then running the "tcpecho" client program
   (included in Appendix 1) in the client C.  In the following
   explanations, "echo" will be used as that server; however, any
   client/server pair with the same characteristics will exhibit the
   deadlock.

   In the server, enable the stream echo service (uncomment the echo
   stream service in /etc/inetd.conf and then reboot "inetd").  In the
   client, compile the "tcpecho.c" program and execute it (as a normal
   user):

      client> tcpecho -n 1 -a 120 -m 60000 server A

   where:

      -n 1      send just 1 buffer
      -a 120    set an alarm (for socket operations) for 120 seconds
      -m 60000  buffer size is 60000 bytes
      server    name or ip address of the server
      A         just one character placed in all bytes of the payload

   The payload (-m 60000) may need to be adjusted to provoke the
   deadlock, as it depends upon the receive buffer sizes and other
   connection parameters on both sides.

   The communication between S and C is monitored using a "sniffer"
   (HP Advisor) and "tcpdump" (Ref. 3) in any machine attached to the
   same subnet as either of the other two hosts (assuming the media is
   Ethernet); it can even be the client or the server host itself.

   In fact, "tcpdump" alone is sufficient to see the deadlock.  A
   clearer trace is obtained by setting strong "tcpdump" filters for
   the socket pair (C, port number) and (S, port number).  One sample
   of the trace with the deadlock is included in Appendix 2.

Description.

   Once the TCP connection is established, segments are interchanged
   between C and S.  After some amount of data is sent and received,
   the client continues sending segments (with data) but announcing
   that its receive window is 0.

   S then stops sending data to C, but continues receiving data
   segments (with window 0) from C.  After a while, S also announces to
   C that its receive window is now 0.

   An interchange of window probes is now made, one after the other, by
   both hosts, spaced increasingly in time.  No more data is
   effectively interchanged and processed, and the client and server
   applications do not progress.

   Both hosts are now in window probe deadlock.

   The deadlock is finally broken by exhausting (in hosts that
   implement a limit on retransmissions of window probes) the number of
   retransmissions of window probes (between 10 and 15 per host, which,
   given the back-off, means between 10 minutes and 1 hour), or by the
   alarm in the client side.

   The connection is aborted if such an alarm was implemented;
   otherwise the deadlock continues forever.

   Appendix 2 contains a trace of a connection in "window probe
   deadlock" and its subsequent abort by an alarm.

Explanation of the Deadlock.

   To make the explanation easier, numeric parameters (representative
   of actual figures) for the connection in both hosts are used, as in
   this example:

      Client C:
         application send buffer = 60000 bytes
         application recv buffer = 60000 bytes
         MSS is 1460 bytes
         TCP send buffer = 16384
         TCP recv buffer = same as send buffer

      Server S:
         application send buffer = 8192 bytes
         application recv buffer = 8192 bytes
         MSS is 1460 bytes
         TCP send buffer = 16383
         TCP recv buffer = same as send buffer

   The sequence of the communication is:

   1  Client C opens the connection with the server S.

   2  Client C issues a socket send with a buffer of 60000 bytes.
   3  TCP in client C copies 16384 bytes from the application send
      buffer to the TCP send buffer and starts to send several TCP
      segments of 1460 bytes.  As there are bytes in the application
      send buffer still pending to be accepted by TCP, the client
      application is blocked.

   4  TCP in server S receives the several segments, sends back the
      acknowledgments and delivers data to the server application, in
      chunks with a maximum size of 8192 bytes.  The server program
      processes the data (in the case of echo, the processing is just a
      copy) and issues a socket send with 8194 bytes.  TCP in the
      server accepts such data and sends it to the TCP client.

   5  The TCP client receives the data sent by the server and keeps it
      in its receive buffer.  The client application is still blocked.

   6  Steps 3, 4 and 5 continue.  Evidently, as the client TCP
      continues sending/receiving data, the TCP receive buffer in C
      will be filled up.

   7  Therefore, C starts to announce receive window 0.  As the TCP
      send buffer in C still contains data, TCP C will continue sending
      data segments to S (with receive window 0).  But the client
      application is still blocked, as not all the bytes in the send
      request were accepted by TCP C.

   8  Obviously S continues receiving data segments (with window 0)
      from C.  Such data is passed to the server application, which
      processes it and sends it back to the client.  But now such
      transmission is blocked by the server TCP (as the TCP client
      announced window 0).  Eventually the TCP send buffer in S will be
      filled.  The next send call issued by the server application will
      block.

   9  Finally, the TCP receive window in S will be filled up completely
      by the data segments received from C.

   10 At this moment, both applications are blocked in their socket
      send calls and both TCPs have their receive windows completely
      filled up.

   Therefore both TCPs will start to send window probes, increasingly
   spaced in time, until the retransmission attempts are exhausted (if
   such a limit is implemented) or an alarm expires.  In that case the
   connection is aborted; otherwise it will run forever in "window
   probe deadlock".

Conditions for the Deadlock.

   Clearly the first condition is that the client tries to send a huge
   amount of data in one or several consecutive socket send calls.
   Huge is understood here in comparison with the local TCP send buffer
   size.

   This last size must also be sufficient to produce a number of
   segments that start to fill up the peer TCP receive buffer.

   To obtain the deadlock, server applications must follow the "echo"
   pattern: they must read data from their local TCP in chunks that are
   not too large (in comparison with the client's application send
   buffer size), process such data and finally respond to the client
   with a large amount of data.

   The server's application send buffer must be of medium size, so that
   the TCP send buffer fills up completely and the socket send call
   blocks.  The server's TCP send buffer size must also allow for
   sufficient segments to fill the client TCP receive buffer.

   As noted, such conditions on TCP send and receive buffer sizes are
   usually found in current TCP implementations.  The same holds for
   the send and receive buffer sizes of server applications.  It is
   also not so unusual that client and server interchange large amounts
   of data.
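   As a quick check of which sizes apply on a particular pair of hosts,
   the TCP send and receive buffer sizes of a connected socket can be
   read with getsockopt().  The following sketch uses only standard
   Sockets calls; the function itself is just an illustration and is
   not part of "tcpecho.c":

      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/socket.h>

      /* Print the TCP send and receive buffer sizes of socket s, to
         compare them with the application buffer sizes in use. */
      void print_tcp_buffer_sizes (int s)
      {
          int sndbuf = 0, rcvbuf = 0;
          socklen_t slen = sizeof (sndbuf), rlen = sizeof (rcvbuf);

          if (getsockopt (s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &slen) == 0
              && getsockopt (s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &rlen) == 0)
              printf ("TCP send buffer = %d, TCP recv buffer = %d\n",
                      sndbuf, rcvbuf);
      }

   With the figures of the example in the previous section (16384-byte
   TCP buffers and 8192-byte server application buffers), a 60000-byte
   payload is enough to meet these conditions.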
   The only unusual condition is the client sending a huge amount of
   data, in one or several consecutive blocking send calls, before
   reading responses.

Other Factors.

   Given that the root of the phenomenon is a "mechanical" blocking
   among the applications' and TCPs' send/receive buffers, it is
   evident that transmission media characteristics such as MTU and
   speed do not contribute, positively or negatively.

   However, TCP can be used on top of some transmission protocols
   (SMDS, ATM) that have a larger MTU, and there the TCP send and
   receive windows are usually larger.  This may mitigate or even avoid
   the deadlock.  But, on the other hand, such protocols are used to
   interchange large amounts of data.  Again, the driving condition is
   to send a large amount of data, which will be processed and
   returned, without intermixing receive requests.

   For the same reason, host processing power is not a relevant factor.

   The deadlock is also independent of slow start/congestion avoidance,
   as well as of the sender/receiver silly window avoidance and Nagle
   algorithms.  Note that recent FreeBSD versions initially use a large
   send congestion window on local networks, while this feature is not
   implemented in HP-UX; however, the deadlock was obtained on both
   types of hosts.

   The phenomenon is clearly dependent upon the send and receive buffer
   sizes of the client and server applications, as well as upon the
   send and receive buffer sizes of both TCPs.  Modifying such
   parameters can solve the particular problem, but the possibility of
   deadlock and subsequent timeouts will still be there.

   This has been checked for several combinations of sizes.  Just by
   conveniently adjusting the -m payload, the deadlock is obtained
   again.

Proposed Solution.

   Only a blocking interface between the application and the TCP level
   is considered in this section (i.e., blocking sockets).

   If the conditions of the client/server are as described in the
   previous paragraphs, and blocking sockets are used, there is a
   potential for deadlock.  Such a situation can be avoided by using
   other models in the client/server communication, such as
   non-blocking sockets, or even using two connections, one for sending
   and the other for receiving data (see Ref. 4 for a detailed
   discussion and implementation).

   However, there is a solution (at the TCP code level in the client)
   that can prevent the deadlock, even with blocking sockets.

   The application is blocked because there is no space in the TCP send
   buffer.  But that buffer will remain completely full because the
   peer closed its receive window, and the peer application is also
   blocked in a send call for the same reason.

   One way to break the deadlock is to wake up the local application
   when:

   A    The application send buffer is completely copied to the TCP
        send buffer (this is the behavior already available),

   or else when ALL the following conditions hold:

   B-1  the TCP send buffer is full (this TCP cannot accept more data
        from the send call),

   B-2  the TCP send window is 0 (this TCP cannot send data to the
        peer, so the TCP send buffer will not be emptied),

   B-3  the TCP receive buffer is full (this TCP cannot accept more
        received data),

   B-4  the TCP has a pending blocking send request.

   If the four B conditions hold on one side, that part of the
   connection is blocked.
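   Expressed as code, the test for conditions B-1 to B-4 might look
   like the following sketch inside a TCP implementation's send path.
   The control block layout and the field names are hypothetical and do
   not belong to any particular stack:

      /* Hypothetical per-connection state, reduced to what the check
         needs; real TCP control blocks are of course richer. */
      struct tcpcb_sketch {
          unsigned long snd_buf_size, snd_buf_used; /* TCP send buffer */
          unsigned long rcv_buf_size, rcv_buf_used; /* TCP recv buffer */
          unsigned long snd_wnd;       /* send window offered by peer  */
          int blocked_send_pending;    /* a send call is blocked       */
      };

      /* Return non-zero when conditions B-1 .. B-4 all hold, i.e. when
         the blocked sender should be woken up and told how many bytes
         have actually been accepted so far. */
      int should_wake_blocked_sender (struct tcpcb_sketch *tp)
      {
          return tp->snd_buf_used >= tp->snd_buf_size   /* B-1 */
              && tp->snd_wnd == 0                       /* B-2 */
              && tp->rcv_buf_used >= tp->rcv_buf_size   /* B-3 */
              && tp->blocked_send_pending;              /* B-4 */
      }

   In terms of the places discussed below, a TCP could run this check
   when it is about to arm the retransmission timer for a window probe,
   waking the blocked application instead of probing.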
   The peer status, in that case, is as follows: the peer TCP receive
   buffer is full (by condition B-2); the peer TCP will not send any
   more data (by condition B-3).

   There is the potential that the peer application now issues a send
   call with more data than its local TCP send buffer can accommodate,
   therefore also blocking.

   Obviously, if the application is awakened on the side where the B
   conditions hold, the deadlock is avoided.

   The solution implies at most three modifications to the TCP code
   that outputs a segment, to wake up the application when the
   conditions hold.  The check must be done in at most three places:

   Place 1  When the retransmission timer expires and the TCP must
            transmit a window probe.

   Place 2  When a peer window probe is received and an acknowledgment
            must be transmitted.

   Place 3  When setting up the retransmission timer for window probe
            transmission.

   Places 1 and 2 are reactive solutions, applied once the deadlock is
   present.  Place 3, instead, is proactive, acting immediately before
   the deadlock can occur.  If the check succeeds, the application must
   be notified with the number of bytes accepted by the TCP, or with a
   specific error in case no bytes were accepted.

   If the check is implemented only in Places 1 and 2, and only in the
   client TCP, the connection now runs to completion but with
   unnecessary delays: if the connection falls into the "window probe
   deadlock", the client application will be awakened after the first
   window probe timeout, which is about 3 seconds.

   If the check is implemented only in Places 1 and 2, and only in the
   server TCP, in addition to the previous unnecessary delay the
   connection runs slowly after the first window probe deadlock is
   avoided.  The reason is that the server application usually has a
   short to medium sized receive buffer (for example, 8194 for "inetd"
   built-in servers such as echo).  Once forced to wake up, the server
   TCP announces a receive window of 8194, which is immediately filled
   by the client, and the deadlock occurs again until the server is
   awakened again by this modification (after 3 seconds), and so on.

   Modifying the TCP code just in Place 3 avoids such delays and is
   clearly sufficient for all cases.

   Testing a TCP level modified according to this section (just in
   Place 3), it was seen that connections now run correctly and
   smoothly to completion when the modified host acts as the client,
   independently of whether the TCP server has also implemented the
   modification.

   On the other hand, if the TCP client does not implement the
   solution, the deadlock may occur even if the TCP server has
   implemented it.  This asymmetry is caused by the characteristics of
   the client and the server: even if the server is notified, it has no
   choice other than to read (until its memory resources are exhausted)
   and respond, now causing the deadlock.  The client, on the other
   hand, may now switch between sending and reading, avoiding the
   deadlock.

   A sample trace for a modified client and an unmodified server is
   included in Appendix 3.  The connection now runs to completion with
   no delays.

Impact on Current Applications.

   The new behavior is compatible with current applications, as the
   socket send specification (Ref. 5)

      number_of_accepted_bytes = send (socket,
                                       application_send_buffer_address,
                                       number_of_bytes_to_send,
                                       flags)

   already returned the number of bytes sent.
   If the socket is blocking, the caller will be blocked and only
   notified when the complete buffer has been accepted by the local
   TCP.  In this case, it will return that number of bytes.

   The caller can also be notified when a send timeout expires (if that
   socket option was set; not all TCP implementations provide such an
   option).  In this case it will return the effective number of bytes
   accepted (which can be 0; in that case the error EWOULDBLOCK is
   set).

   Therefore, the implications of the new behavior were already present
   in the interface provided by Sockets.  Well-coded applications are
   therefore aware of such behavior, and the proposed TCP modification
   will have no impact on them.

   There follows an example of code aware of this possibility:

      if ((n = send (s, buf, strlen (buf), flags)) == strlen (buf)) {
          /* the complete buffer was sent */
          ...

      } else if (n == -1) {
          /* some error detected, check errno */
          if (errno == EWOULDBLOCK) {

              /* could not send, proceed accordingly */
              ... code for trying again ...

          } else {
              /* serious error, proceed accordingly */
              ...
          }

      } else {
          /* the buffer was sent partially */
          ... code for sending the remaining part ...
      }

   Note that it is the attempt to send more data when the buffer was
   not completely sent that can lead to the window probe deadlock
   described.

   If the conditions are as described in the previous sections, the
   deadlock may occur.  If the TCP code is not modified, the
   application will remain blocked in a send call until retransmissions
   expire (if implemented) or alarms expire (if used).  No modification
   to the application will avoid the deadlock.

   If the TCP code is modified as proposed, a client application
   notified of a send call with the buffer not completely sent must not
   immediately try to send the rest again.  Instead, it must replace
   the code for sending the remaining part with code for reading the
   first responses to what was already sent.  There follows a
   pseudo-code for this (see also Appendix 1):

      adjust pointers to cover the whole buffer as a chunk to be sent
      loop until the whole buffer is sent
          send a chunk       /* the first time it is the whole buffer */
          check status:
              if -1 then error and exit,
              except when errno is EWOULDBLOCK, continue.
          read response from server
          adjust the pointers to the next chunk
      end loop

Security Considerations.

   The only security implication detected in this study is a "denial of
   service" attack against those hosts that do not implement a limit on
   the retransmission of window probes but that provide servers that
   send a large amount of data to clients in response to a large amount
   of data sent by those clients.  A representative server is one
   implementing the echo protocol (Ref. 2).

   An attacker could then establish connections as described in this
   memo.  As the server host will not abort the retransmission of
   window probes, the attacker will be able to waste resources in the
   server for as long as he maintains such connections.

   To avoid this attack, do implement a limit on the retransmission of
   window probes.  The modifications proposed at the client TCP code
   level will avoid the deadlock on the client side, but not on the
   server side.

References.

   [1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
       September 1981.

   [2] Postel, J., "Echo Protocol", STD 20, RFC 862, May 1983.
   [3] Lawrence Berkeley National Laboratory, Network Research Group
       (tcpdump@ee.lbl.gov): ftp://ftp.ee.lbl.gov/tcpdump.tar.Z

   [4] Stevens, W. R., "Network Programming", Vol. 1, 2nd Ed., Prentice
       Hall, 1998.

   [5] IEEE, "Protocol Independent Interfaces", IEEE Std 1003.1g.

Author's Address.

   Juan L. Freniche
   Engineering Division
   Construcciones Aeronauticas (CASA)
   Getafe (SPAIN)

   Phone: + 34.91.624-2950
   Fax:   + 34.91.624-2705

   EMail: jlfreniche@acm.org

Appendix 1: Listing of tcpecho.c

   /* Headers required by the calls used below. */
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <signal.h>
   #include <unistd.h>
   #include <sys/types.h>
   #include <sys/time.h>
   #include <sys/socket.h>
   #include <netinet/in.h>
   #include <netdb.h>

   #define SA struct sockaddr

   int s;                          /* socket descriptor */

   void set_alarm (int duration) {
       struct itimerval inttimer;
       struct itimerval ointtimer;
       inttimer.it_interval.tv_sec = duration;
       inttimer.it_interval.tv_usec = 0;
       inttimer.it_value.tv_sec = duration;
       inttimer.it_value.tv_usec = 0;
       setitimer (ITIMER_REAL, &inttimer, &ointtimer);
   }

   void close_comms () {
       struct linger linger;
       linger.l_onoff = 1;
       linger.l_linger = 0;
       setsockopt (s, SOL_SOCKET, SO_LINGER, &linger, sizeof (linger));
       close (s);
   }

   void timeout () {
       fprintf (stderr, "Connection timeout\n");
       close_comms ();
       exit (1);
   }

   int process_by_tcp (char *remote_host, char *msg,
                       int multiple, int times, int alarm_time)
   {
       struct hostent *hp;
       struct servent *sp;
       struct sockaddr_in peeraddr_in;

       int nbytes, i, pending_bytes;
       char echo_msg [multiple * strlen (msg) + 2];
       char echo_constant [multiple * strlen (msg) + 2];
       char *aux;

       memset ((char *) &peeraddr_in, 0, sizeof (struct sockaddr_in));
       peeraddr_in.sin_family = AF_INET;
       hp = gethostbyname (remote_host);
       if (hp == NULL) {
           fprintf (stderr, "tcpecho: %s not found\n", remote_host);
           return -1;
       }
       peeraddr_in.sin_addr.s_addr =
           ((struct in_addr *) (hp->h_addr))->s_addr;

       sp = getservbyname ("echo", "tcp");
       if (sp == NULL) {
           fprintf (stderr, "tcpecho: echo not found in /etc/services\n");
           return -1;
       }
       peeraddr_in.sin_port = sp->s_port;

       s = socket (AF_INET, SOCK_STREAM, 0);
       if (s == -1) {
           fprintf (stderr, "tcpecho: Unable to create socket\n");
           return -1;
       }

       set_alarm (alarm_time);
       if (connect (s, (SA *) &peeraddr_in,
                    sizeof (struct sockaddr_in)) == -1) {
           set_alarm (0);
           fprintf (stderr,
                    "tcpecho: Unable to connect to remote host %s\n",
                    remote_host);
           return -1;
       }
       set_alarm (0);

       /* Build the payload: "multiple" copies of msg. */
       echo_constant [0] = ' ';
       aux = echo_constant;
       for (i = 1; i <= multiple; i++) {
           strcpy (aux, msg);
           aux = aux + strlen (msg);
       }
       strcat (echo_constant, " ");
       /* Working copy of the payload, overwritten by recv () below. */
       strcpy (echo_msg, echo_constant);

       for (i = 1; i <= times; i++) {
           nbytes = strlen (echo_msg);
           set_alarm (alarm_time);

           if (send (s, echo_msg, nbytes, 0) != nbytes) {
               fprintf (stderr, "tcpecho: Unable to send all bytes\n");
               close_comms ();
               exit (1);
           }
           pending_bytes = nbytes;
           while (pending_bytes > 0) {
               if ((nbytes = recv (s, echo_msg, pending_bytes, 0)) <= 0) {
                   fprintf (stderr,
                            "tcpecho: Error reading echo from server\n");
                   close_comms ();
                   exit (1);
               } else {
                   pending_bytes = pending_bytes - nbytes;
                   echo_msg [nbytes] = ' ';
               }
           }
       }
       set_alarm (0);
       shutdown (s, 1);
       return 0;
   }

   void print_usage () {
       fprintf
           (stderr,
            "tcpecho: [-n times -a alarm -m multiple] remote_host string\n");
       exit (1);
   }

   int main (int argc, char *argv[])
   {
       int c;
       int times = 1;
       int alarm_time = 20;
       int status = 0;
       int multiple = 0;
       char *remote_host;

       if (argc <= 1) {
           print_usage ();
       }

       while ((c = getopt (argc, argv, "n:m:a:")) != -1) {
           switch (c) {
           case 'a':
               alarm_time = atoi (optarg);
               break;
           case 'n':
               times = atoi (optarg);
               if (times < 1) times = 1;
               break;
           case 'm':
               multiple = atoi (optarg);
               if (multiple < 1) multiple = 0;
               break;
           }
       }

       if ((argc - optind) < 2) {
           print_usage ();
       }

       remote_host = argv [optind];
       optind ++;

       signal (SIGINT, close_comms);
       signal (SIGALRM, timeout);

       status = process_by_tcp (remote_host, argv [optind],
                                multiple, times, alarm_time);
       exit (status);
   }

Appendix 2: Trace of a Connection in Deadlock.

   Trace of a local connection where the phenomenon occurs.  To
   reproduce it, enable the inetd echo service, compile tcpecho.c,
   launch a second xterm, and execute in it:

      localhost> tcpdump -N -p -i lo0 -s 128 -S

   Now, in the first xterm, execute (adjust the payload conveniently,
   observing the mss on the local interface):

      localhost> tcpecho -t -n 1 -a 120 -m 300000 localhost A

   The trace has been edited to remove some unnecessary fields and to
   align the remaining ones.

      server> tcpdump -N -p -i lo0 -s 128
      tcpdump: listening on lo0
      4:26.637 1026 > echo: S 0:0(0) win 16384
      4:26.637 echo > 1026: S 0:0(0) ack 1 win 57344
      4:26.637 1026 > echo: . ack 1 win 57344
      4:26.781 1026 > echo: P 1:2049(2048) ack 1 win 57344
      4:26.782 1026 > echo: P 2049:16385(14336) ack 1 win 57344
      4:26.782 1026 > echo: P 16385:30721(14336) ack 1 win 57344
      4:26.783 1026 > echo: P 30721:45057(14336) ack 1 win 57344
      4:26.784 echo > 1026: P 1:2049(2048) ack 45057 win 20480
      4:26.784 1026 > echo: P 45057:57345(12288) ack 2049 win 55296
      4:26.784 echo > 1026: P 2049:4097(2048) ack 57345 win 8192
      4:26.785 1026 > echo: P 57345:59393(2048) ack 4097 win 53248
      4:26.785 echo > 1026: P 4097:8193(4096) ack 59393 win 6144
      4:26.785 1026 > echo: P 59393:61441(2048) ack 8193 win 49152
      4:26.786 echo > 1026: P 8193:10241(2048) ack 61441 win 4096
      4:26.787 echo > 1026: P 10241:24577(14336) ack 61441 win 20480
      4:26.787 1026 > echo: . 61441:75777(14336) ack 24577 win 32768
      4:26.788 echo > 1026: P 24577:26625(2048) ack 75777 win 14336
      4:26.788 1026 > echo: . 75777:90113(14336) ack 26625 win 30720
      4:26.788 echo > 1026: P 26625:28673(2048) ack 90113 win 0
      4:26.790 echo > 1026: P 28673:43009(14336) ack 90113 win 16384
      4:26.790 1026 > echo: . 90113:104449(14336) ack 43009 win 14336
      4:26.790 echo > 1026: P 43009:45057(2048) ack 104449 win 2048
      4:26.792 echo > 1026: . 45057:57345(12288) ack 104449 win 34816
      4:26.792 1026 > echo: . 104449:118785(14336) ack 57345 win 0
      4:26.792 1026 > echo: . 118785:133121(14336) ack 57345 win 0
      4:26.794 echo > 1026: . ack 133121 win 38912
      4:26.794 1026 > echo: . 133121:147457(14336) ack 57345 win 0
      4:26.794 1026 > echo: P 147457:161793(14336) ack 57345 win 0
      4:26.951 echo > 1026: . ack 161793 win 18432
      4:26.951 1026 > echo: . 161793:176129(14336) ack 57345 win 0
      4:27.151 echo > 1026: . ack 176129 win 4096
      4:31.451 echo > 1026: . 57345:57346(1) ack 176129 win 4096
      4:31.451 1026 > echo: . 176129:180225(4096) ack 57345 win 0
      4:31.551 echo > 1026: . ack 180225 win 0
      4:36.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
      4:36.451 echo > 1026: . ack 180225 win 0
      4:38.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
      4:38.451 1026 > echo: . ack 57345 win 0
      4:42.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
      4:42.451 echo > 1026: . ack 180225 win 0
      4:52.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
      4:52.451 1026 > echo: . ack 57345 win 0
      4:54.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
      4:54.451 echo > 1026: . ack 180225 win 0
      5:18.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
      5:18.451 echo > 1026: . ack 180225 win 0
      5:20.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
      5:20.451 1026 > echo: . ack 57345 win 0
      6:06.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
      6:06.451 echo > 1026: . ack 180225 win 0
      6:16.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
      6:16.451 1026 > echo: . ack 57345 win 0
      6:26.791 1026 > echo: R 180225:180225(0) ack 57345 win 0

   The connection is aborted by the alarm used for socket operations.
   If that alarm is set sufficiently high, all the retransmissions of
   window probes would have been seen, followed again by the reset.

Appendix 3: Trace of a Connection Solving the Deadlock.

   In a client that has implemented the modification, execute:

      client> tcpecho -t -n 1 -a 120 -m 60000 server A

      server> tcpdump -N -p -i tun0 -s 128
      tcpdump: listening on tun0
      9:36.118 49152 > echo: S 0:0(0) win 11680
      9:36.119 echo > 49152: S 0:0(0) ack 1 win 17520
      9:36.121 49152 > echo: . ack 1 win 11680
      9:36.130 49152 > echo: . 1:1461(1460) ack 1 win 11680
      9:36.130 echo > 49152: P 1:1461(1460) ack 1461 win 17520
      9:36.131 49152 > echo: . 1461:2921(1460) ack 1 win 11680
      9:36.131 echo > 49152: P 1461:2921(1460) ack 2921 win 17520
      9:36.132 49152 > echo: . 2921:4381(1460) ack 1 win 11680
      9:36.132 echo > 49152: P 2921:4381(1460) ack 4381 win 17520
      9:36.133 49152 > echo: . 4381:5841(1460) ack 1 win 11680
      9:36.133 echo > 49152: P 4381:5841(1460) ack 5841 win 17520
      9:36.134 49152 > echo: . 5841:7301(1460) ack 1 win 11680
      9:36.134 echo > 49152: P 5841:7301(1460) ack 7301 win 17520
      9:36.135 49152 > echo: . 7301:8761(1460) ack 1 win 11680
      9:36.135 echo > 49152: P 7301:8761(1460) ack 8761 win 17520
      9:36.136 49152 > echo: . 8761:10221(1460) ack 1 win 11680
      9:36.136 echo > 49152: P 8761:10221(1460) ack 10221 win 17520
      9:36.136 49152 > echo: P 10221:11681(1460) ack 1 win 11680
      9:36.137 echo > 49152: P 10221:11681(1460) ack 11681 win 17520
      9:36.139 49152 > echo: P 11681:13141(1460) ack 1461 win 10220
      9:36.151 echo > 49152: . ack 13141 win 17520
      9:36.168 49152 > echo: P 13141:14601(1460) ack 2921 win 8760
      9:36.170 49152 > echo: P 14601:16061(1460) ack 4381 win 7300
      9:36.171 echo > 49152: . ack 16061 win 17520
      9:36.173 49152 > echo: P 16061:17521(1460) ack 5841 win 5840
      9:36.175 49152 > echo: P 17521:18981(1460) ack 7301 win 4380
      9:36.176 echo > 49152: . ack 18981 win 17520
      9:36.178 49152 > echo: P 18981:20441(1460) ack 8761 win 2920
      9:36.180 49152 > echo: P 20441:21901(1460) ack 10221 win 1460
      9:36.180 echo > 49152: . ack 21901 win 17520
      9:36.183 49152 > echo: P 21901:23361(1460) ack 11681 win 0
      9:36.185 49152 > echo: P 23361:24821(1460) ack 11681 win 0
      9:36.185 echo > 49152: . ack 24821 win 17520
      9:36.188 49152 > echo: . 24821:26281(1460) ack 11681 win 0
      9:36.188 49152 > echo: P 26281:27741(1460) ack 11681 win 0
      9:36.189 echo > 49152: . ack 27741 win 17520
      9:36.191 49152 > echo: . 27741:29201(1460) ack 11681 win 0
      9:36.191 49152 > echo: P 29201:30661(1460) ack 11681 win 0
      9:36.192 echo > 49152: . ack 30661 win 17520
      9:36.194 49152 > echo: . 30661:32121(1460) ack 11681 win 0
      9:36.194 49152 > echo: P 32121:33581(1460) ack 11681 win 0
      9:36.196 49152 > echo: . 33581:35041(1460) ack 11681 win 0
      9:36.197 49152 > echo: P 35041:36501(1460) ack 11681 win 0
      9:36.199 49152 > echo: . 36501:37961(1460) ack 11681 win 0
      9:36.200 49152 > echo: P 37961:39421(1460) ack 11681 win 0
      9:36.202 49152 > echo: . 39421:40881(1460) ack 11681 win 0
      9:36.202 49152 > echo: P 40881:42341(1460) ack 11681 win 0
      9:36.351 echo > 49152: . ack 42341 win 5840
      9:36.353 49152 > echo: . 42341:43801(1460) ack 11681 win 0
      9:36.354 49152 > echo: . 43801:45261(1460) ack 11681 win 0
      9:36.355 49152 > echo: . 45261:46721(1460) ack 11681 win 0
      9:36.355 49152 > echo: . 46721:48181(1460) ack 11681 win 0
      9:36.551 echo > 49152: . ack 48181 win 0
      9:36.554 49152 > echo: . ack 11681 win 11680
      9:36.554 echo > 49152: . 11681:13141(1460) ack 48181 win 0
      9:36.554 echo > 49152: . 13141:14601(1460) ack 48181 win 0
      9:36.555 echo > 49152: . 14601:16061(1460) ack 48181 win 0
      9:36.555 echo > 49152: . 16061:17521(1460) ack 48181 win 0
      9:36.555 echo > 49152: . 17521:18981(1460) ack 48181 win 0
      9:36.555 echo > 49152: . 18981:20441(1460) ack 48181 win 0
      9:36.555 echo > 49152: . 20441:21901(1460) ack 48181 win 0
      9:36.555 echo > 49152: . 21901:23361(1460) ack 48181 win 0
      9:36.558 49152 > echo: . ack 14601 win 8760
      9:36.559 echo > 49152: . ack 48181 win 8192
      9:36.645 49152 > echo: . ack 17521 win 5840
      9:36.655 49152 > echo: . ack 20441 win 2920
      9:36.658 49152 > echo: . ack 23361 win 0
      9:36.659 echo > 49152: . ack 48181 win 16384
      9:36.661 49152 > echo: . 48181:49641(1460) ack 23361 win 0
      9:36.661 49152 > echo: . 49641:51101(1460) ack 23361 win 0
      9:36.662 49152 > echo: . 51101:52561(1460) ack 23361 win 0
      9:36.663 49152 > echo: . 52561:54021(1460) ack 23361 win 0
      9:36.663 49152 > echo: . 54021:55481(1460) ack 23361 win 0
      9:36.665 49152 > echo: . 55481:56941(1460) ack 23361 win 0
      9:36.666 49152 > echo: . 56941:58401(1460) ack 23361 win 0
      9:36.667 49152 > echo: P 58401:59861(1460) ack 23361 win 0
      9:36.669 49152 > echo: . ack 23361 win 11680
      9:36.669 echo > 49152: . 23361:24821(1460) ack 59861 win 4704
      9:36.669 echo > 49152: . 24821:26281(1460) ack 59861 win 4704
      9:36.669 echo > 49152: . 26281:27741(1460) ack 59861 win 4704
      9:36.669 echo > 49152: . 27741:29201(1460) ack 59861 win 4704
      9:36.669 echo > 49152: . 29201:30661(1460) ack 59861 win 4704
      9:36.669 echo > 49152: . 30661:32121(1460) ack 59861 win 4704
      9:36.669 echo > 49152: . 32121:33581(1460) ack 59861 win 4704
      9:36.670 echo > 49152: . 33581:35041(1460) ack 59861 win 4704
      9:36.682 49152 > echo: . ack 26281 win 8760
      9:36.697 49152 > echo: . ack 29201 win 5840
      9:36.700 49152 > echo: . ack 32121 win 2920
      9:36.701 echo > 49152: . ack 59861 win 12896
      9:36.704 49152 > echo: . ack 35041 win 0
      9:36.707 49152 > echo: . ack 35041 win 11680
      9:36.707 echo > 49152: . 35041:36501(1460) ack 59861 win 12896
      9:36.707 echo > 49152: . 36501:37961(1460) ack 59861 win 12896
      9:36.707 echo > 49152: . 37961:39421(1460) ack 59861 win 12896
      9:36.708 echo > 49152: . 39421:40881(1460) ack 59861 win 12896
      9:36.708 echo > 49152: . 40881:42341(1460) ack 59861 win 12896
      9:36.708 echo > 49152: . 42341:43801(1460) ack 59861 win 12896
      9:36.708 echo > 49152: . 43801:45261(1460) ack 59861 win 12896
      9:36.708 echo > 49152: . 45261:46721(1460) ack 59861 win 12896
      9:36.757 49152 > echo: . ack 37961 win 8760
      9:36.758 echo > 49152: . ack 59861 win 17520
      9:36.761 49152 > echo: . ack 40881 win 5840
      9:36.764 49152 > echo: . ack 43801 win 2920
      9:36.767 49152 > echo: . ack 46721 win 0
      9:36.788 49152 > echo: . ack 46721 win 11680
      9:36.788 echo > 49152: . 46721:48181(1460) ack 59861 win 17520
      9:36.788 echo > 49152: . 48181:49641(1460) ack 59861 win 17520
      9:36.788 echo > 49152: . 49641:51101(1460) ack 59861 win 17520
      9:36.788 echo > 49152: . 51101:52561(1460) ack 59861 win 17520
      9:36.788 echo > 49152: . 52561:54021(1460) ack 59861 win 17520
      9:36.788 echo > 49152: . 54021:55481(1460) ack 59861 win 17520
      9:36.788 echo > 49152: . 55481:56941(1460) ack 59861 win 17520
      9:36.788 echo > 49152: . 56941:58401(1460) ack 59861 win 17520
      9:36.792 49152 > echo: . ack 49641 win 8760
      9:36.795 49152 > echo: . ack 52561 win 5840
      9:36.798 49152 > echo: . ack 55481 win 2920
      9:36.801 49152 > echo: . ack 58401 win 0
      9:36.803 49152 > echo: . ack 58401 win 11680
      9:36.803 echo > 49152: P 58401:59861(1460) ack 59861 win 17520
      9:36.807 49152 > echo: P 59861:60001(140) ack 59861 win 11680
      9:36.808 echo > 49152: P 59861:59961(100) ack 60001 win 17520
      9:37.131 49152 > echo: . ack 59961 win 11680
      9:37.131 echo > 49152: P 59961:60001(40) ack 60001 win 17520
      9:37.135 49152 > echo: F 60001:60001(0) ack 60001 win 11680
      9:37.135 echo > 49152: . ack 60002 win 17520
      9:37.136 echo > 49152: F 60001:60001(0) ack 60002 win 17520
      9:37.153 49152 > echo: . ack 60002 win 11679

   After the client sends a window probe, its application is awakened
   and reads the complete receive buffer (11680 bytes, at time
   9:36.554).  A window advertisement is sent to the server, inviting
   it to respond with more data, thus avoiding the deadlock.

Full Copyright Statement.

   Copyright (C) The Internet Society (1998).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.
   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.