Internet Engineering Task Force                             W. Eddy, Ed.
Internet-Draft                                               MTI Systems
Obsoletes: 793, 879, 6093, 6528, 6691                      June 19, 2015
(if approved)
Updates: 1122 (if approved)
Intended status: Standards Track
Expires: December 21, 2015

              Transmission Control Protocol Specification
                      draft-ietf-tcpm-rfc793bis-00

Abstract

   This document specifies the Internet's Transmission Control Protocol
   (TCP). TCP is an important transport layer protocol in the Internet
   stack, and has continuously evolved over decades of use and growth of
   the Internet. Over this time, a number of changes have been made to
   TCP as it was specified in RFC 793, though these have only been
   documented in a piecemeal fashion. This document collects and brings
   those changes together with the protocol specification from RFC 793.
   This document obsoletes RFC 793 and several other RFCs (TODO: list
   all actual RFCs when finished).

   RFC EDITOR NOTE: If approved for publication as an RFC, this should
   be marked additionally as "STD: 7" and replace RFC 793 in that role.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 21, 2015.
Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1. Purpose and Scope
   2. Introduction
   3. Functional Specification
     3.1. Header Format
     3.2. Terminology
     3.3. Sequence Numbers
     3.4. Establishing a connection
     3.5. Closing a Connection
     3.6. Precedence and Security
     3.7. Segmentation
       3.7.1. Maximum Segment Size Option
       3.7.2. Path MTU Discovery
       3.7.3. Interfaces with Variable MSS Values
       3.7.4. Nagle Algorithm
       3.7.5. IPv6 Jumbograms
     3.8. Data Communication
     3.9. Interfaces
       3.9.1. User/TCP Interface
       3.9.2. TCP/Lower-Level Interface
     3.10. Event Processing
     3.11. Glossary
   4. Changes from RFC 793
   5. IANA Considerations
   6. Security and Privacy Considerations
   7. Acknowledgements
   8. References
     8.1. Normative References
     8.2. Informative References
   Appendix A. TCP Requirement Summary
   Author's Address

1. Purpose and Scope

   In 1981, RFC 793 [6] was released, documenting the Transmission
   Control Protocol (TCP), and replacing earlier specifications for TCP
   that had been published in the past.

   Since then, TCP has been implemented many times, and has been used as
   a transport protocol for numerous applications on the Internet.

   For several decades, RFC 793 plus a number of other documents have
   combined to serve as the specification for TCP [16].
   Over time, a number of errata have been identified on RFC 793, as
   well as deficiencies in security, performance, and other aspects. A
   number of enhancements have been developed and documented separately.
   These were never accumulated together into an update to the base
   specification.

   The purpose of this document is to bring together all of the IETF
   Standards Track changes that have been made to the basic TCP
   functional specification and unify them into an update of the RFC 793
   protocol specification. Some companion documents are referenced for
   important algorithms that TCP uses (e.g., for congestion control),
   but no attempt has been made to include them in this document. This
   is a conscious choice, as this base specification can be used with
   multiple additional algorithms that are developed and incorporated
   separately, but all TCP implementations need to implement this
   specification as a common basis in order to interoperate. As some
   additional TCP features have become quite complicated themselves
   (e.g., advanced loss recovery and congestion control), future
   companion documents may attempt to similarly bring these together.

   In addition to the protocol specification that describes the TCP
   segment format, generation, and processing rules that are to be
   implemented in code, RFC 793 and other updates also contain
   informative and descriptive text for human readers to understand
   aspects of the protocol design and operation. This document does not
   attempt to alter or update this informative text, and is focused only
   on updating the normative protocol specification. We preserve
   references to the documentation containing the important explanations
   and rationale, where appropriate.

   This document is intended to be useful both in checking existing TCP
   implementations for conformance and in writing new implementations.

2. Introduction

   RFC 793 contains a discussion of the TCP design goals and provides
   examples of its operation, including examples of connection
   establishment, closing connections, and retransmitting packets to
   repair losses.

   This document describes the basic functionality expected in modern
   implementations of TCP, and replaces the protocol specification in
   RFC 793. It does not replicate or attempt to update the examples and
   other discussion in RFC 793. Other documents are referenced to
   provide explanation of the theory of operation, rationale, and
   detailed discussion of design decisions. This document only focuses
   on the normative behavior of the protocol.

   TEMPORARY EDITOR'S NOTE: This is an early revision in the process of
   updating RFC 793. Many planned changes are not yet incorporated.

   ***Please do not use this revision as a basis for any work or
   reference.***

   A list of changes from RFC 793 is contained in Section 4.

   TEMPORARY EDITOR'S NOTE: the current revision of this document does
   not yet collect all of the changes that will be in the final version.
   The set of content changes planned for future revisions is kept in
   Section 4.

3. Functional Specification

3.1. Header Format

   TCP segments are sent as internet datagrams. The Internet Protocol
   header carries several information fields, including the source and
   destination host addresses [2]. A TCP header follows the internet
   header, supplying information specific to the TCP protocol. This
   division allows for the existence of host level protocols other than
   TCP.
   TCP Header Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Acknowledgment Number                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                            TCP Header Format

          Note that one tick mark represents one bit position.

                                Figure 1

   Source Port: 16 bits

      The source port number.

   Destination Port: 16 bits

      The destination port number.

   Sequence Number: 32 bits

      The sequence number of the first data octet in this segment
      (except when SYN is present). If SYN is present the sequence
      number is the initial sequence number (ISN) and the first data
      octet is ISN+1.

   Acknowledgment Number: 32 bits

      If the ACK control bit is set this field contains the value of the
      next sequence number the sender of the segment is expecting to
      receive. Once a connection is established this is always sent.

   Data Offset: 4 bits

      The number of 32 bit words in the TCP Header. This indicates where
      the data begins. The TCP header (even one including options) is an
      integral number of 32 bits long.

   Reserved: 6 bits

      Reserved for future use. Must be zero.
   Control Bits: 6 bits (from left to right):

      URG: Urgent Pointer field significant
      ACK: Acknowledgment field significant
      PSH: Push Function
      RST: Reset the connection
      SYN: Synchronize sequence numbers
      FIN: No more data from sender

   Window: 16 bits

      The number of data octets beginning with the one indicated in the
      acknowledgment field which the sender of this segment is willing
      to accept.

   Checksum: 16 bits

      The checksum field is the 16 bit one's complement of the one's
      complement sum of all 16 bit words in the header and text. If a
      segment contains an odd number of header and text octets to be
      checksummed, the last octet is padded on the right with zeros to
      form a 16 bit word for checksum purposes. The pad is not
      transmitted as part of the segment. While computing the checksum,
      the checksum field itself is replaced with zeros.

      The checksum also covers a 96 bit pseudo header conceptually
      prefixed to the TCP header. This pseudo header contains the Source
      Address, the Destination Address, the Protocol, and TCP length.
      This gives the TCP protection against misrouted segments. This
      information is carried in the Internet Protocol and is transferred
      across the TCP/Network interface in the arguments or results of
      calls by the TCP on the IP.

         +--------+--------+--------+--------+
         |           Source Address          |
         +--------+--------+--------+--------+
         |         Destination Address       |
         +--------+--------+--------+--------+
         |  zero  |  PTCL  |    TCP Length   |
         +--------+--------+--------+--------+

      The TCP Length is the TCP header length plus the data length in
      octets (this is not an explicitly transmitted quantity, but is
      computed), and it does not count the 12 octets of the pseudo
      header.

   Urgent Pointer: 16 bits

      This field communicates the current value of the urgent pointer as
      a positive offset from the sequence number in this segment. The
      urgent pointer points to the sequence number of the octet
      following the urgent data. This field is only to be interpreted in
      segments with the URG control bit set.

   Options: variable

      Options may occupy space at the end of the TCP header and are a
      multiple of 8 bits in length. All options are included in the
      checksum. An option may begin on any octet boundary. There are
      two cases for the format of an option:

         Case 1: A single octet of option-kind.

         Case 2: An octet of option-kind, an octet of option-length, and
         the actual option-data octets.

      The option-length counts the two octets of option-kind and
      option-length as well as the option-data octets.

      Note that the list of options may be shorter than the data offset
      field might imply. The content of the header beyond the
      End-of-Option option must be header padding (i.e., zero).

      Currently defined options include (kind indicated in octal):

         Kind     Length   Meaning
         ----     ------   -------
          0         -      End of option list.
          1         -      No-Operation.
          2         4      Maximum Segment Size.

      A TCP MUST be able to receive a TCP option in any segment. A TCP
      MUST ignore without error any TCP option it does not implement,
      assuming that the option has a length field (all TCP options
      except End of option list and No-Operation have length fields).
      TCP MUST be prepared to handle an illegal option length (e.g.,
      zero) without crashing; a suggested procedure is to reset the
      connection and log the reason.

      Specific Option Definitions

         End of Option List

            +--------+
            |00000000|
            +--------+
             Kind=0

            This option code indicates the end of the option list. This
            might not coincide with the end of the TCP header according
            to the Data Offset field. This is used at the end of all
            options, not the end of each option, and need only be used
            if the end of the options would not otherwise coincide with
            the end of the TCP header.
         No-Operation

            +--------+
            |00000001|
            +--------+
             Kind=1

            This option code may be used between options, for example,
            to align the beginning of a subsequent option on a word
            boundary. There is no guarantee that senders will use this
            option, so receivers must be prepared to process options
            even if they do not begin on a word boundary.

         Maximum Segment Size (MSS)

            +--------+--------+---------+--------+
            |00000010|00000100|   max seg size   |
            +--------+--------+---------+--------+
             Kind=2   Length=4

            Maximum Segment Size Option Data: 16 bits

            If this option is present, then it communicates the maximum
            receive segment size at the TCP which sends this segment.
            This field may be sent in the initial connection request
            (i.e., in segments with the SYN control bit set) and must
            not be sent in other segments. If this option is not used,
            any segment size is allowed.

   Padding: variable

      The TCP header padding is used to ensure that the TCP header ends
      and data begins on a 32 bit boundary. The padding is composed of
      zeros.

3.2. Terminology

   Before we can discuss very much about the operation of the TCP we
   need to introduce some detailed terminology. The maintenance of a
   TCP connection requires the remembering of several variables. We
   conceive of these variables being stored in a connection record
   called a Transmission Control Block or TCB. Among the variables
   stored in the TCB are the local and remote socket numbers, the
   security and precedence of the connection, pointers to the user's
   send and receive buffers, pointers to the retransmit queue and to the
   current segment. In addition several variables relating to the send
   and receive sequence numbers are stored in the TCB.
   Send Sequence Variables

      SND.UNA - send unacknowledged
      SND.NXT - send next
      SND.WND - send window
      SND.UP  - send urgent pointer
      SND.WL1 - segment sequence number used for last window update
      SND.WL2 - segment acknowledgment number used for last window
                update
      ISS     - initial send sequence number

   Receive Sequence Variables

      RCV.NXT - receive next
      RCV.WND - receive window
      RCV.UP  - receive urgent pointer
      IRS     - initial receive sequence number

   The following diagrams may help to relate some of these variables to
   the sequence space.

   Send Sequence Space

                   1         2          3          4
              ----------|----------|----------|----------
                     SND.UNA    SND.NXT    SND.UNA
                                          +SND.WND

        1 - old sequence numbers which have been acknowledged
        2 - sequence numbers of unacknowledged data
        3 - sequence numbers allowed for new data transmission
        4 - future sequence numbers which are not yet allowed

                          Send Sequence Space

                               Figure 2

   The send window is the portion of the sequence space labeled 3 in
   Figure 2.

   Receive Sequence Space

                       1          2          3
                   ----------|----------|----------
                          RCV.NXT    RCV.NXT
                                    +RCV.WND

        1 - old sequence numbers which have been acknowledged
        2 - sequence numbers allowed for new reception
        3 - future sequence numbers which are not yet allowed

                         Receive Sequence Space

                               Figure 3

   The receive window is the portion of the sequence space labeled 2 in
   Figure 3.

   There are also some variables used frequently in the discussion that
   take their values from the fields of the current segment.

   Current Segment Variables

      SEG.SEQ - segment sequence number
      SEG.ACK - segment acknowledgment number
      SEG.LEN - segment length
      SEG.WND - segment window
      SEG.UP  - segment urgent pointer
      SEG.PRC - segment precedence value

   A connection progresses through a series of states during its
   lifetime.
   The states are: LISTEN, SYN-SENT, SYN-RECEIVED, ESTABLISHED,
   FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT, and
   the fictional state CLOSED. CLOSED is fictional because it
   represents the state when there is no TCB, and therefore, no
   connection. Briefly the meanings of the states are:

      LISTEN - represents waiting for a connection request from any
      remote TCP and port.

      SYN-SENT - represents waiting for a matching connection request
      after having sent a connection request.

      SYN-RECEIVED - represents waiting for a confirming connection
      request acknowledgment after having both received and sent a
      connection request.

      ESTABLISHED - represents an open connection, data received can be
      delivered to the user. The normal state for the data transfer
      phase of the connection.

      FIN-WAIT-1 - represents waiting for a connection termination
      request from the remote TCP, or an acknowledgment of the
      connection termination request previously sent.

      FIN-WAIT-2 - represents waiting for a connection termination
      request from the remote TCP.

      CLOSE-WAIT - represents waiting for a connection termination
      request from the local user.

      CLOSING - represents waiting for a connection termination request
      acknowledgment from the remote TCP.

      LAST-ACK - represents waiting for an acknowledgment of the
      connection termination request previously sent to the remote TCP
      (this termination request sent to the remote TCP already included
      an acknowledgment of the termination request sent from the remote
      TCP).

      TIME-WAIT - represents waiting for enough time to pass to be sure
      the remote TCP received the acknowledgment of its connection
      termination request.

      CLOSED - represents no connection state at all.

   A TCP connection progresses from one state to another in response to
   events.
   The events are the user calls, OPEN, SEND, RECEIVE, CLOSE, ABORT, and
   STATUS; the incoming segments, particularly those containing the SYN,
   ACK, RST and FIN flags; and timeouts.

   The state diagram in Figure 4 illustrates only state changes,
   together with the causing events and resulting actions, but addresses
   neither error conditions nor actions which are not connected with
   state changes. In a later section, more detail is offered with
   respect to the reaction of the TCP to events.

   NOTA BENE: this diagram is only a summary and must not be taken as
   the total specification.

                              +---------+ ---------\      active OPEN
                              |  CLOSED |            \    -----------
                              +---------+<---------\   \   create TCB
                                |     ^              \   \  snd SYN
                   passive OPEN |     |   CLOSE        \   \
                   ------------ |     | ----------       \   \
                    create TCB  |     | delete TCB         \   \
                                V     |                      \   \
            rcv RST (note 1)  +---------+            CLOSE    |    \
         -------------------->|  LISTEN |          ---------- |     |
        /                     +---------+          delete TCB |     |
       /           rcv SYN      |     |     SEND              |     |
      /           -----------   |     |    -------            |     V
 +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
 |         |<-----------------           ------------------>|         |
 |   SYN   |                    rcv SYN                     |   SYN   |
 |   RCVD  |<-----------------------------------------------|   SENT  |
 |         |                  snd SYN,ACK                   |         |
 |         |------------------           -------------------|         |
 +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
   |           --------------   |     |   -----------
   |                  x         |     |     snd ACK
   |                            V     V
   |  CLOSE                   +---------+
   | -------                  |  ESTAB  |
   | snd FIN                  +---------+
   |                   CLOSE    |     |    rcv FIN
   V                  -------   |     |    -------
 +---------+          snd FIN  /       \   snd ACK          +---------+
 |  FIN    |<-----------------           ------------------>|  CLOSE  |
 | WAIT-1  |------------------                              |   WAIT  |
 +---------+          rcv FIN  \                            +---------+
   | rcv ACK of FIN   -------   |                            CLOSE  |
   | --------------   snd ACK   |                           ------- |
   V        x                   V                           snd FIN V
 +---------+                  +---------+                   +---------+
 |FINWAIT-2|                  | CLOSING |                   | LAST-ACK|
 +---------+                  +---------+                   +---------+
   |                rcv ACK of FIN |                 rcv ACK of FIN |
   |  rcv FIN       -------------- |    Timeout=2MSL -------------- |
   |  -------              x       V    ------------        x       V
    \ snd ACK                 +---------+delete TCB         +---------+
     ------------------------>|TIME WAIT|------------------>| CLOSED  |
                              +---------+                   +---------+

   note 1: The transition from SYN-RCVD to LISTEN on receiving a RST is
   conditional on having reached SYN-RCVD after a passive open.

   note 2: An unshown transition exists from FIN-WAIT-1 to TIME-WAIT if
   a FIN is received and the local FIN is also acknowledged.

                      TCP Connection State Diagram

                                Figure 4

3.3. Sequence Numbers

   A fundamental notion in the design is that every octet of data sent
   over a TCP connection has a sequence number. Since every octet is
   sequenced, each of them can be acknowledged. The acknowledgment
   mechanism employed is cumulative so that an acknowledgment of
   sequence number X indicates that all octets up to but not including X
   have been received. This mechanism allows for straightforward
   duplicate detection in the presence of retransmission. Octets within
   a segment are numbered so that the first data octet immediately
   following the header is the lowest numbered, and the following octets
   are numbered consecutively.

   It is essential to remember that the actual sequence number space is
   finite, though very large. This space ranges from 0 to 2**32 - 1.
   Since the space is finite, all arithmetic dealing with sequence
   numbers must be performed modulo 2**32. This unsigned arithmetic
   preserves the relationship of sequence numbers as they cycle from
   2**32 - 1 to 0 again. There are some subtleties to computer modulo
   arithmetic, so great care should be taken in programming the
   comparison of such values. The symbol "=<" means "less than or
   equal" (modulo 2**32).
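   As an illustration of the care needed, the following is a minimal
   sketch of one common way to compare sequence numbers modulo 2**32.
   It is not part of the specification. The helper names (seq_diff,
   seq_lt, seq_leq) are hypothetical, and the sketch assumes that the
   two values being compared are less than 2**31 apart, which is how
   many implementations bound the comparison.

```python
MOD = 1 << 32    # size of the sequence number space
HALF = 1 << 31   # assumed maximum meaningful distance between two values


def seq_diff(a, b):
    """Signed distance from b to a, folded into [-2**31, 2**31)."""
    d = (a - b) % MOD
    return d - MOD if d >= HALF else d


def seq_lt(a, b):
    """True if a precedes b in sequence space ("a < b" modulo 2**32)."""
    return seq_diff(a, b) < 0


def seq_leq(a, b):
    """True if a precedes or equals b ("a =< b" modulo 2**32)."""
    return seq_diff(a, b) <= 0
```

   With these helpers, a value just below 2**32 - 1 compares as
   "less than" a small value just past the wrap, which a plain integer
   comparison would get wrong.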
   The typical kinds of sequence number comparisons which the TCP must
   perform include:

      (a) Determining that an acknowledgment refers to some sequence
      number sent but not yet acknowledged.

      (b) Determining that all sequence numbers occupied by a segment
      have been acknowledged (e.g., to remove the segment from a
      retransmission queue).

      (c) Determining that an incoming segment contains sequence numbers
      which are expected (i.e., that the segment "overlaps" the receive
      window).

   In response to sending data the TCP will receive acknowledgments.
   The following comparisons are needed to process the acknowledgments.

      SND.UNA = oldest unacknowledged sequence number

      SND.NXT = next sequence number to be sent

      SEG.ACK = acknowledgment from the receiving TCP (next sequence
      number expected by the receiving TCP)

      SEG.SEQ = first sequence number of a segment

      SEG.LEN = the number of octets occupied by the data in the segment
      (counting SYN and FIN)

      SEG.SEQ+SEG.LEN-1 = last sequence number of a segment

   A new acknowledgment (called an "acceptable ack") is one for which
   the inequality below holds:

      SND.UNA < SEG.ACK =< SND.NXT

   A segment on the retransmission queue is fully acknowledged if the
   sum of its sequence number and length is less than or equal to the
   acknowledgment value in the incoming segment.
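   The two acknowledgment checks above can be sketched as follows.
   This is an illustrative sketch only, not a normative procedure. The
   function names are hypothetical, and the code assumes all inputs are
   32-bit values that lie within 2**31 of one another.

```python
MOD = 1 << 32


def ack_is_acceptable(snd_una, seg_ack, snd_nxt):
    """SND.UNA < SEG.ACK =< SND.NXT, evaluated modulo 2**32.

    Both distances are measured forward from SND.UNA, so the test
    keeps working when the window straddles the 2**32 - 1 -> 0 wrap.
    """
    newly_acked = (seg_ack - snd_una) % MOD
    outstanding = (snd_nxt - snd_una) % MOD
    return 0 < newly_acked <= outstanding


def segment_fully_acked(seg_seq, seg_len, seg_ack):
    """SEG.SEQ + SEG.LEN =< SEG.ACK, evaluated modulo 2**32."""
    d = (seg_seq + seg_len - seg_ack) % MOD
    signed = d - MOD if d >= (1 << 31) else d
    return signed <= 0
```

   A duplicate acknowledgment (SEG.ACK equal to SND.UNA) is not
   "acceptable" in this sense, and neither is one that acknowledges
   data beyond SND.NXT.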
   When data is received the following comparisons are needed:

      RCV.NXT = next sequence number expected on an incoming segment,
      and is the left or lower edge of the receive window

      RCV.NXT+RCV.WND-1 = last sequence number expected on an incoming
      segment, and is the right or upper edge of the receive window

      SEG.SEQ = first sequence number occupied by the incoming segment

      SEG.SEQ+SEG.LEN-1 = last sequence number occupied by the incoming
      segment

   A segment is judged to occupy a portion of valid receive sequence
   space if

      RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND

   or

      RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND

   The first part of this test checks to see if the beginning of the
   segment falls in the window, the second part of the test checks to
   see if the end of the segment falls in the window; if the segment
   passes either part of the test it contains data in the window.

   Actually, it is a little more complicated than this. Due to zero
   windows and zero length segments, we have four cases for the
   acceptability of an incoming segment:

      Segment Receive  Test
      Length  Window
      ------- -------  -------------------------------------------

         0       0     SEG.SEQ = RCV.NXT

         0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND

        >0       0     not acceptable

        >0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
                    or RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND

   Note that when the receive window is zero no segments should be
   acceptable except ACK segments. Thus, it is possible for a TCP to
   maintain a zero receive window while transmitting data and receiving
   ACKs. However, even when the receive window is zero, a TCP must
   process the RST and URG fields of all incoming segments.

   We have taken advantage of the numbering scheme to protect certain
   control information as well.
   This is achieved by implicitly including some control flags in the
   sequence space so they can be retransmitted and acknowledged without
   confusion (i.e., one and only one copy of the control will be acted
   upon). Control information is not physically carried in the segment
   data space. Consequently, we must adopt rules for implicitly
   assigning sequence numbers to control. The SYN and FIN are the only
   controls requiring this protection, and these controls are used only
   at connection opening and closing. For sequence number purposes, the
   SYN is considered to occur before the first actual data octet of the
   segment in which it occurs, while the FIN is considered to occur
   after the last actual data octet in a segment in which it occurs.
   The segment length (SEG.LEN) includes both data and sequence space
   occupying controls. When a SYN is present then SEG.SEQ is the
   sequence number of the SYN.

   Initial Sequence Number Selection

   The protocol places no restriction on a particular connection being
   used over and over again. A connection is defined by a pair of
   sockets. New instances of a connection will be referred to as
   incarnations of the connection. The problem that arises from this is
   -- "how does the TCP identify duplicate segments from previous
   incarnations of the connection?" This problem becomes apparent if
   the connection is being opened and closed in quick succession, or if
   the connection breaks with loss of memory and is then reestablished.

   To avoid confusion we must prevent segments from one incarnation of a
   connection from being used while the same sequence numbers may still
   be present in the network from an earlier incarnation. We want to
   assure this, even if a TCP crashes and loses all knowledge of the
   sequence numbers it has been using.
   When new connections are
   created, an initial sequence number (ISN) generator is employed which
   selects a new 32 bit ISN.  There are security issues that result if
   an off-path attacker is able to predict or guess ISN values.

   The recommended ISN generator is based on the combination of a
   (possibly fictitious) 32 bit clock whose low order bit is incremented
   roughly every 4 microseconds, and a pseudorandom hash function (PRF).
   The clock component is intended to ensure that within a Maximum
   Segment Lifetime (MSL), generated ISNs will be unique, since it
   cycles approximately every 4.55 hours, which is much longer than the
   MSL.

   TCP SHOULD generate its Initial Sequence Numbers with the expression:

      ISN = M + F(localip, localport, remoteip, remoteport, secretkey)

   where M is the 4 microsecond timer, and F() is a pseudorandom
   function (PRF) of the connection's identifying parameters ("localip,
   localport, remoteip, remoteport") and a secret key ("secretkey").
   F() MUST NOT be computable from the outside, or an attacker could
   still guess at sequence numbers from the ISN used for some other
   connection.  The PRF could be implemented as a cryptographic hash of
   the concatenation of the TCP connection parameters and some secret
   data.  For discussion of the selection of a specific hash algorithm
   and management of the secret key data, please see Section 3 of [14].

   For each connection there is a send sequence number and a receive
   sequence number.  The initial send sequence number (ISS) is chosen by
   the data sending TCP, and the initial receive sequence number (IRS)
   is learned during the connection establishing procedure.

   For a connection to be established or initialized, the two TCPs must
   synchronize on each other's initial sequence numbers.
   This is done
   in an exchange of connection establishing segments carrying a control
   bit called "SYN" (for synchronize) and the initial sequence numbers.
   As a shorthand, segments carrying the SYN bit are also called "SYNs".
   Hence, the solution requires a suitable mechanism for picking an
   initial sequence number and a slightly involved handshake to exchange
   the ISNs.

   The synchronization requires each side to send its own initial
   sequence number and to receive a confirmation of it in acknowledgment
   from the other side.  Each side must also receive the other side's
   initial sequence number and send a confirming acknowledgment.

      1) A --> B  SYN my sequence number is X
      2) A <-- B  ACK your sequence number is X
      3) A <-- B  SYN my sequence number is Y
      4) A --> B  ACK your sequence number is Y

   Because steps 2 and 3 can be combined in a single message this is
   called the three way (or three message) handshake.

   A three way handshake is necessary because sequence numbers are not
   tied to a global clock in the network, and TCPs may have different
   mechanisms for picking the ISNs.  The receiver of the first SYN has
   no way of knowing whether the segment was an old delayed one or not,
   unless it remembers the last sequence number used on the connection
   (which is not always possible), and so it must ask the sender to
   verify this SYN.  The three way handshake and the advantages of a
   clock-driven scheme are discussed in [3].

   Knowing When to Keep Quiet

   To be sure that a TCP does not create a segment that carries a
   sequence number which may be duplicated by an old segment remaining
   in the network, the TCP must keep quiet for an MSL before assigning
   any sequence numbers upon starting up or recovering from a crash in
   which memory of sequence numbers in use was lost.  For this
   specification the MSL is taken to be 2 minutes.
   This is an
   engineering choice, and may be changed if experience indicates it is
   desirable to do so.  Note that if a TCP is reinitialized in some
   sense, yet retains its memory of sequence numbers in use, then it
   need not wait at all; it must only be sure to use sequence numbers
   larger than those recently used.

   The TCP Quiet Time Concept

   This specification provides that hosts which "crash" without
   retaining any knowledge of the last sequence numbers transmitted on
   each active (i.e., not closed) connection shall delay emitting any
   TCP segments for at least the agreed MSL in the internet system of
   which the host is a part.  In the paragraphs below, an explanation
   for this specification is given.  TCP implementors may violate the
   "quiet time" restriction, but only at the risk of causing some old
   data to be accepted as new, or new data to be rejected as an old
   duplicate, by some receivers in the internet system.

   TCPs consume sequence number space each time a segment is formed and
   entered into the network output queue at a source host.  The
   duplicate detection and sequencing algorithm in the TCP protocol
   relies on the unique binding of segment data to sequence space to the
   extent that sequence numbers will not cycle through all 2**32 values
   before the segment data bound to those sequence numbers has been
   delivered and acknowledged by the receiver and all duplicate copies
   of the segments have "drained" from the internet.  Without such an
   assumption, two distinct TCP segments could conceivably be assigned
   the same or overlapping sequence numbers, causing confusion at the
   receiver as to which data is new and which is old.  Remember that
   each segment is bound to as many consecutive sequence numbers as
   there are octets of data and SYN or FIN flags in the segment.
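   As an illustration only, the segment-length rule restated above and
   the four-case acceptability test given earlier in this section can
   be sketched in Python.  The function names, and the choice to
   evaluate the window comparison modulo 2**32 so that it keeps working
   when the window straddles the wrap-around point, are assumptions of
   this sketch and are not defined by this specification.

```python
MOD = 2**32  # size of the 32-bit sequence space


def seg_len(data_octets: int, syn: bool = False, fin: bool = False) -> int:
    """SEG.LEN: data octets plus one sequence number each for SYN and FIN."""
    return data_octets + int(syn) + int(fin)


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """RCV.NXT =< seq < RCV.NXT+RCV.WND, compared modulo 2**32 (assumption)."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def acceptable(seg_seq: int, seg_length: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Apply the four rows of the acceptability table."""
    if seg_length == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt                   # row 1
        return in_window(seg_seq, rcv_nxt, rcv_wnd)      # row 2
    if rcv_wnd == 0:
        return False                                    # row 3
    last = (seg_seq + seg_length - 1) % MOD
    # row 4: either the first or the last octet falls in the window
    return in_window(seg_seq, rcv_nxt, rcv_wnd) or in_window(last, rcv_nxt, rcv_wnd)
```

   For example, with RCV.NXT = 100 and RCV.WND = 50, a ten-octet
   segment starting at 95 is acceptable because its last octet (104)
   lies in the window, whereas one starting at 150 is not.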
   Under normal conditions, TCPs keep track of the next sequence number
   to emit and the oldest awaiting acknowledgment so as to avoid
   mistakenly reusing a sequence number before its first use has been
   acknowledged.  This alone does not guarantee that old duplicate data
   is drained from the net, so the sequence space has been made very
   large to reduce the probability that a wandering duplicate will cause
   trouble upon arrival.  At 2 megabits/sec. it takes 4.5 hours to use
   up 2**32 octets of sequence space.  Since the maximum segment
   lifetime in the net is not likely to exceed a few tens of seconds,
   this is deemed ample protection for foreseeable nets, even if data
   rates escalate to tens of megabits/sec.  At 100 megabits/sec, the
   cycle time is 5.4 minutes which may be a little short, but still
   within reason.

   The basic duplicate detection and sequencing algorithm in TCP can be
   defeated, however, if a source TCP does not have any memory of the
   sequence numbers it last used on a given connection.  For example, if
   the TCP were to start all connections with sequence number 0, then
   upon crashing and restarting, a TCP might re-form an earlier
   connection (possibly after half-open connection resolution) and emit
   packets with sequence numbers identical to or overlapping with
   packets still in the network which were emitted on an earlier
   incarnation of the same connection.  In the absence of knowledge
   about the sequence numbers used on a particular connection, the TCP
   specification recommends that the source delay for MSL seconds before
   emitting segments on the connection, to allow time for segments from
   the earlier connection incarnation to drain from the system.
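   The sequence-space cycle times quoted above can be checked with a
   few lines of arithmetic.  The sketch below is illustrative only; the
   function name is hypothetical, and it assumes that a "megabit" means
   2**20 bits, which is the reading that reproduces the figures in the
   text (with 10**6 bits the results are roughly 4.8 hours and 5.7
   minutes instead).

```python
SEQ_SPACE_OCTETS = 2**32  # one sequence number per octet


def wrap_time_seconds(megabits_per_sec: float, bits_per_megabit: int = 2**20) -> float:
    """Seconds needed to consume the whole sequence space at a given rate.

    bits_per_megabit defaults to 2**20, an assumption of this sketch.
    """
    total_bits = SEQ_SPACE_OCTETS * 8
    return total_bits / (megabits_per_sec * bits_per_megabit)


hours_at_2_mbit = wrap_time_seconds(2) / 3600       # about 4.55 hours
minutes_at_100_mbit = wrap_time_seconds(100) / 60   # about 5.46 minutes
```

   Under that assumption the cycle time is 16384 seconds at 2
   megabits/sec and 327.68 seconds at 100 megabits/sec, both far longer
   than an MSL of a few tens of seconds but, in the second case, close
   to the 2 minute MSL assumed by this specification.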
   Even hosts which can remember the time of day and use it to select
   initial sequence number values are not immune from this problem
   (i.e., even if time of day is used to select an initial sequence
   number for each new connection incarnation).

   Suppose, for example, that a connection is opened starting with
   sequence number S.  Suppose that this connection is not used much and
   that eventually the initial sequence number function (ISN(t)) takes
   on a value equal to the sequence number, say S1, of the last segment
   sent by this TCP on a particular connection.  Now suppose, at this
   instant, the host crashes, recovers, and establishes a new
   incarnation of the connection.  The initial sequence number chosen is
   S1 = ISN(t), the last used sequence number on the old incarnation of
   the connection!  If the recovery occurs quickly enough, any old
   duplicates in the net bearing sequence numbers in the neighborhood of
   S1 may arrive and be treated as new packets by the receiver of the
   new incarnation of the connection.

   The problem is that the recovering host may not know for how long it
   crashed nor does it know whether there are still old duplicates in
   the system from earlier connection incarnations.

   One way to deal with this problem is to deliberately delay emitting
   segments for one MSL after recovery from a crash; this is the "quiet
   time" specification.  Hosts which prefer to avoid waiting and are
   willing to risk possible confusion of old and new packets at a given
   destination may choose not to wait for the "quiet time".
   Implementors may provide TCP users with the ability to select on a
   connection by connection basis whether to wait after a crash, or may
   informally implement the "quiet time" for all connections.
   Obviously, even where a user selects to "wait," this is not necessary
   after the host has been "up" for at least MSL seconds.
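   For reference, the clock-plus-PRF expression recommended under
   "Initial Sequence Number Selection", ISN = M + F(localip, localport,
   remoteip, remoteport, secretkey), might be sketched as follows.
   This is a minimal sketch under stated assumptions: the use of
   HMAC-SHA-256 for F(), the encoding of the connection parameters, the
   use of a monotonic clock for M, and all function names are choices
   made here for illustration.  This specification does not select a
   hash algorithm; see Section 3 of [14] for that discussion.

```python
import hashlib
import hmac
import time
from typing import Optional

MOD = 2**32


def isn_clock(now: Optional[float] = None) -> int:
    """M: a 32-bit counter whose low order bit ticks every 4 microseconds."""
    if now is None:
        now = time.monotonic()  # assumption: any steadily advancing clock
    return int(now * 1_000_000 / 4) % MOD


def prf(localip: str, localport: int, remoteip: str, remoteport: int,
        secretkey: bytes) -> int:
    """F(): keyed hash of the connection parameters (HMAC-SHA-256 assumed)."""
    msg = f"{localip}|{localport}|{remoteip}|{remoteport}".encode()
    digest = hmac.new(secretkey, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def generate_isn(localip: str, localport: int, remoteip: str, remoteport: int,
                 secretkey: bytes, now: Optional[float] = None) -> int:
    """ISN = M + F(...), reduced to 32 bits."""
    return (isn_clock(now) + prf(localip, localport, remoteip, remoteport,
                                 secretkey)) % MOD
```

   Because F() is fixed for a given connection and key, successive ISNs
   for the same connection advance with the clock M alone.  A host
   that crashes and recovers quickly can therefore still choose an ISN
   close to sequence numbers in flight from the previous incarnation,
   which is the situation the quiet time is meant to cover.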
   To summarize: every segment emitted occupies one or more sequence
   numbers in the sequence space; the numbers occupied by a segment are
   "busy" or "in use" until MSL seconds have passed; upon crashing, a
   block of space-time is occupied by the octets and SYN or FIN flags of
   the last emitted segment; and if a new connection is started too soon
   and uses any of the sequence numbers in the space-time footprint of
   the last segment of the previous connection incarnation, there is a
   potential sequence number overlap area which could cause confusion at
   the receiver.

3.4.  Establishing a connection

   The "three-way handshake" is the procedure used to establish a
   connection.  This procedure normally is initiated by one TCP and
   responded to by another TCP.  The procedure also works if two TCPs
   simultaneously initiate the procedure.  When a simultaneous attempt
   occurs, each TCP receives a "SYN" segment which carries no
   acknowledgment after it has sent a "SYN".  Of course, the arrival of
   an old duplicate "SYN" segment can potentially make it appear, to the
   recipient, that a simultaneous connection initiation is in progress.
   Proper use of "reset" segments can disambiguate these cases.

   Several examples of connection initiation follow.  Although these
   examples do not show connection synchronization using data-carrying
   segments, this is perfectly legitimate, so long as the receiving TCP
   doesn't deliver the data to the user until it is clear the data is
   valid (i.e., the data must be buffered at the receiver until the
   connection reaches the ESTABLISHED state).  The three-way handshake
   reduces the possibility of false connections.  It is the
   implementation of a trade-off between memory and messages to provide
   information for this checking.

   The simplest three-way handshake is shown in Figure 5 below.  The
   figures should be interpreted in the following way.
   Each line is
   numbered for reference purposes.  Right arrows (-->) indicate
   departure of a TCP segment from TCP A to TCP B, or arrival of a
   segment at B from A.  Left arrows (<--) indicate the reverse.
   Ellipsis (...) indicates a segment which is still in the network
   (delayed).  An "XXX" indicates a segment which is lost or rejected.
   Comments appear in parentheses.  TCP states represent the state AFTER
   the departure or arrival of the segment (whose contents are shown in
   the center of each line).  Segment contents are shown in abbreviated
   form, with sequence number, control flags, and ACK field.  Other
   fields such as window, addresses, lengths, and text have been left
   out in the interest of clarity.

       TCP A                                           TCP B

   1.  CLOSED                                          LISTEN

   2.  SYN-SENT    -->                             --> SYN-RECEIVED

   3.  ESTABLISHED <--                             <-- SYN-RECEIVED

   4.  ESTABLISHED -->                             --> ESTABLISHED

   5.  ESTABLISHED -->                             --> ESTABLISHED

          Basic 3-Way Handshake for Connection Synchronization

                                Figure 5

   In line 2 of Figure 5, TCP A begins by sending a SYN segment
   indicating that it will use sequence numbers starting with sequence
   number 100.  In line 3, TCP B sends a SYN and acknowledges the SYN it
   received from TCP A.  Note that the acknowledgment field indicates
   TCP B is now expecting to hear sequence 101, acknowledging the SYN
   which occupied sequence 100.

   At line 4, TCP A responds with an empty segment containing an ACK for
   TCP B's SYN; and in line 5, TCP A sends some data.  Note that the
   sequence number of the segment in line 5 is the same as in line 4
   because the ACK does not occupy sequence number space (if it did, we
   would wind up ACKing ACKs!).

   Simultaneous initiation is only slightly more complex, as is shown in
   Figure 6.  Each TCP cycles from CLOSED to SYN-SENT to SYN-RECEIVED to
   ESTABLISHED.

       TCP A                                           TCP B

   1.  CLOSED                                          CLOSED

   2.  SYN-SENT     -->                            ...

   3.  SYN-RECEIVED <--                            <-- SYN-SENT

   4.               ...                            --> SYN-RECEIVED

   5.  SYN-RECEIVED -->                            ...

   6.  ESTABLISHED  <--                            <-- SYN-RECEIVED

   7.               ...                            --> ESTABLISHED

                Simultaneous Connection Synchronization

                                Figure 6

   The principal reason for the three-way handshake is to prevent old
   duplicate connection initiations from causing confusion.  To deal
   with this, a special control message, reset, has been devised.  If
   the receiving TCP is in a non-synchronized state (i.e., SYN-SENT,
   SYN-RECEIVED), it returns to LISTEN on receiving an acceptable reset.
   If the TCP is in one of the synchronized states (ESTABLISHED,
   FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), it
   aborts the connection and informs its user.  We discuss this latter
   case under "half-open" connections below.

       TCP A                                           TCP B

   1.  CLOSED                                          LISTEN

   2.  SYN-SENT    -->                             ...

   3.  (duplicate) ...                             --> SYN-RECEIVED

   4.  SYN-SENT    <--                             <-- SYN-RECEIVED

   5.  SYN-SENT    -->                             --> LISTEN

   6.              ...                             --> SYN-RECEIVED

   7.  SYN-SENT    <--                             <-- SYN-RECEIVED

   8.  ESTABLISHED -->                             --> ESTABLISHED

                    Recovery from Old Duplicate SYN

                                Figure 7

   As a simple example of recovery from old duplicates, consider
   Figure 7.  At line 3, an old duplicate SYN arrives at TCP B.  TCP B
   cannot tell that this is an old duplicate, so it responds normally
   (line 4).  TCP A detects that the ACK field is incorrect and returns
   a RST (reset) with its SEQ field selected to make the segment
   believable.  TCP B, on receiving the RST, returns to the LISTEN
   state.  When the original SYN (pun intended) finally arrives at line
   6, the synchronization proceeds normally.  If the SYN at line 6 had
   arrived before the RST, a more complex exchange might have occurred
   with RSTs sent in both directions.
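   The behaviour of TCP A in lines 4, 5, 7 and 8 of Figure 7 can be
   sketched as a single decision made in the SYN-SENT state.  The
   sketch is illustrative only and rests on several assumptions: the
   Segment type and function name are hypothetical, only the SYN has
   been sent so the one acceptable ACK value is ISS+1, and the
   simultaneous-open case (a SYN without ACK) is deliberately not
   handled.  The RST takes its sequence number from the ACK field of
   the offending segment, as the Reset Generation rules later in this
   section require.

```python
from dataclasses import dataclass
from typing import Optional

MOD = 2**32


@dataclass
class Segment:
    seq: int
    ack: Optional[int] = None   # None means the ACK bit is not set
    syn: bool = False
    rst: bool = False


def syn_sent_reply(iss: int, seg: Segment) -> Optional[Segment]:
    """Reply of a TCP in SYN-SENT (having sent only its SYN) to seg."""
    if seg.rst:
        return None  # a reset is never answered with a reset
    expected_ack = (iss + 1) % MOD
    if seg.ack is not None and seg.ack != expected_ack:
        # The ACK does not acknowledge our SYN, so it was provoked by an
        # old duplicate; make the RST believable by using SEQ = SEG.ACK.
        return Segment(seq=seg.ack, rst=True)
    if seg.syn and seg.ack == expected_ack:
        # Normal SYN,ACK: acknowledge the peer's SYN and move on.
        return Segment(seq=expected_ack, ack=(seg.seq + 1) % MOD)
    return None  # other cases are outside this sketch
```

   With ISS = 100 as in the text, and purely illustrative values for
   the other fields, a SYN,ACK carrying ACK=91 draws a RST with SEQ=91,
   while a SYN,ACK carrying SEQ=400 and ACK=101 draws an ACK with
   SEQ=101 and ACK=401.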
   Half-Open Connections and Other Anomalies

   An established connection is said to be "half-open" if one of the
   TCPs has closed or aborted the connection at its end without the
   knowledge of the other, or if the two ends of the connection have
   become desynchronized owing to a crash that resulted in loss of
   memory.  Such connections will automatically become reset if an
   attempt is made to send data in either direction.  However, half-open
   connections are expected to be unusual, and the recovery procedure is
   mildly involved.

   If at site A the connection no longer exists, then an attempt by the
   user at site B to send any data on it will result in the site B TCP
   receiving a reset control message.  Such a message indicates to the
   site B TCP that something is wrong, and it is expected to abort the
   connection.

   Assume that two user processes A and B are communicating with one
   another when a crash occurs causing loss of memory to A's TCP.
   Depending on the operating system supporting A's TCP, it is likely
   that some error recovery mechanism exists.  When the TCP is up again,
   A is likely to start again from the beginning or from a recovery
   point.  As a result, A will probably try to OPEN the connection again
   or try to SEND on the connection it believes open.  In the latter
   case, it receives the error message "connection not open" from the
   local (A's) TCP.  In an attempt to establish the connection, A's TCP
   will send a segment containing SYN.  This scenario leads to the
   example shown in Figure 8.  After TCP A crashes, the user attempts to
   re-open the connection.  TCP B, in the meantime, thinks the
   connection is open.

       TCP A                                           TCP B

   1.  (CRASH)                               (send 300,receive 100)

   2.  CLOSED                                          ESTABLISHED

   3.  SYN-SENT -->                                --> (??)

   4.  (!!)     <--                                <-- ESTABLISHED

   5.  SYN-SENT -->                                --> (Abort!!)

   6.  SYN-SENT                                        CLOSED

   7.  SYN-SENT -->                                -->

                     Half-Open Connection Discovery

                                Figure 8

   When the SYN arrives at line 3, TCP B, being in a synchronized state,
   and the incoming segment outside the window, responds with an
   acknowledgment indicating what sequence it next expects to hear (ACK
   100).  TCP A sees that this segment does not acknowledge anything it
   sent and, being unsynchronized, sends a reset (RST) because it has
   detected a half-open connection.  TCP B aborts at line 5.  TCP A will
   continue to try to establish the connection; the problem is now
   reduced to the basic 3-way handshake of Figure 5.

   An interesting alternative case occurs when TCP A crashes and TCP B
   tries to send data on what it thinks is a synchronized connection.
   This is illustrated in Figure 9.  In this case, the data arriving at
   TCP A from TCP B (line 2) is unacceptable because no such connection
   exists, so TCP A sends a RST.  The RST is acceptable so TCP B
   processes it and aborts the connection.

       TCP A                                           TCP B

   1.  (CRASH)                               (send 300,receive 100)

   2.  (??)    <--                                 <-- ESTABLISHED

   3.          -->                                 --> (ABORT!!)

           Active Side Causes Half-Open Connection Discovery

                                Figure 9

   In Figure 10, we find the two TCPs A and B with passive connections
   waiting for SYN.  An old duplicate arriving at TCP B (line 2) stirs B
   into action.  A SYN-ACK is returned (line 3) and causes TCP A to
   generate a RST (the ACK in line 3 is not acceptable).  TCP B accepts
   the reset and returns to its passive LISTEN state.

       TCP A                                           TCP B

   1.  LISTEN                                          LISTEN

   2.       ...                                    --> SYN-RECEIVED

   3.  (??) <--                                    <-- SYN-RECEIVED

   4.       -->                                    --> (return to LISTEN!)

   5.  LISTEN                                          LISTEN

       Old Duplicate SYN Initiates a Reset on two Passive Sockets

                               Figure 10

   A variety of other cases are possible, all of which are accounted for
   by the following rules for RST generation and processing.
   Reset Generation

   As a general rule, reset (RST) must be sent whenever a segment
   arrives which apparently is not intended for the current connection.
   A reset must not be sent if it is not clear that this is the case.

   There are three groups of states:

      1.  If the connection does not exist (CLOSED) then a reset is sent
      in response to any incoming segment except another reset.  In
      particular, SYNs addressed to a non-existent connection are
      rejected by this means.

      If the incoming segment has the ACK bit set, the reset takes its
      sequence number from the ACK field of the segment, otherwise the
      reset has sequence number zero and the ACK field is set to the sum
      of the sequence number and segment length of the incoming segment.
      The connection remains in the CLOSED state.

      2.  If the connection is in any non-synchronized state (LISTEN,
      SYN-SENT, SYN-RECEIVED), and the incoming segment acknowledges
      something not yet sent (the segment carries an unacceptable ACK),
      or if an incoming segment has a security level or compartment
      which does not exactly match the level and compartment requested
      for the connection, a reset is sent.

      If our SYN has not been acknowledged and the precedence level of
      the incoming segment is higher than the precedence level requested
      then either raise the local precedence level (if allowed by the
      user and the system) or send a reset; or if the precedence level
      of the incoming segment is lower than the precedence level
      requested then continue as if the precedence matched exactly (if
      the remote TCP cannot raise the precedence level to match ours
      this will be detected in the next segment it sends, and the
      connection will be terminated then).
      If our SYN has been
      acknowledged (perhaps in this incoming segment) the precedence
      level of the incoming segment must match the local precedence
      level exactly; if it does not, a reset must be sent.

      If the incoming segment has an ACK field, the reset takes its
      sequence number from the ACK field of the segment, otherwise the
      reset has sequence number zero and the ACK field is set to the sum
      of the sequence number and segment length of the incoming segment.
      The connection remains in the same state.

      3.  If the connection is in a synchronized state (ESTABLISHED,
      FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT),
      any unacceptable segment (out of window sequence number or
      unacceptable acknowledgment number) must elicit only an empty
      acknowledgment segment containing the current send-sequence number
      and an acknowledgment indicating the next sequence number expected
      to be received, and the connection remains in the same state.

      If an incoming segment has a security level, or compartment, or
      precedence which does not exactly match the level, and
      compartment, and precedence requested for the connection, a reset
      is sent and the connection goes to the CLOSED state.  The reset
      takes its sequence number from the ACK field of the incoming
      segment.

   Reset Processing

   In all states except SYN-SENT, all reset (RST) segments are validated
   by checking their SEQ-fields.  A reset is valid if its sequence
   number is in the window.  In the SYN-SENT state (a RST received in
   response to an initial SYN), the RST is acceptable if the ACK field
   acknowledges the SYN.

   The receiver of a RST first validates it, then changes state.  If the
   receiver was in the LISTEN state, it ignores it.
   If the receiver was
   in SYN-RECEIVED state and had previously been in the LISTEN state,
   then the receiver returns to the LISTEN state, otherwise the receiver
   aborts the connection and goes to the CLOSED state.  If the receiver
   was in any other state, it aborts the connection and advises the user
   and goes to the CLOSED state.

3.5.  Closing a Connection

   CLOSE is an operation meaning "I have no more data to send."  The
   notion of closing a full-duplex connection is subject to ambiguous
   interpretation, of course, since it may not be obvious how to treat
   the receiving side of the connection.  We have chosen to treat CLOSE
   in a simplex fashion.  The user who CLOSEs may continue to RECEIVE
   until he is told that the other side has CLOSED also.  Thus, a
   program could initiate several SENDs followed by a CLOSE, and then
   continue to RECEIVE until signaled that a RECEIVE failed because the
   other side has CLOSED.  We assume that the TCP will signal a user,
   even if no RECEIVEs are outstanding, that the other side has closed,
   so the user can terminate his side gracefully.  A TCP will reliably
   deliver all buffers SENT before the connection was CLOSED so a user
   who expects no data in return need only wait to hear the connection
   was CLOSED successfully to know that all his data was received at the
   destination TCP.  Users must keep reading connections they close for
   sending until the TCP says no more data.

   There are essentially three cases:

      1) The user initiates by telling the TCP to CLOSE the connection

      2) The remote TCP initiates by sending a FIN control signal

      3) Both users CLOSE simultaneously

   Case 1: Local user initiates the close

      In this case, a FIN segment can be constructed and placed on the
      outgoing segment queue.  No further SENDs from the user will be
      accepted by the TCP, and it enters the FIN-WAIT-1 state.
      RECEIVEs
      are allowed in this state.  All segments preceding and including
      FIN will be retransmitted until acknowledged.  When the other TCP
      has both acknowledged the FIN and sent a FIN of its own, the first
      TCP can ACK this FIN.  Note that a TCP receiving a FIN will ACK
      but not send its own FIN until its user has CLOSED the connection
      also.

   Case 2: TCP receives a FIN from the network

      If an unsolicited FIN arrives from the network, the receiving TCP
      can ACK it and tell the user that the connection is closing.  The
      user will respond with a CLOSE, upon which the TCP can send a FIN
      to the other TCP after sending any remaining data.  The TCP then
      waits until its own FIN is acknowledged whereupon it deletes the
      connection.  If an ACK is not forthcoming, after the user timeout
      the connection is aborted and the user is told.

   Case 3: both users close simultaneously

      A simultaneous CLOSE by users at both ends of a connection causes
      FIN segments to be exchanged.  When all segments preceding the
      FINs have been processed and acknowledged, each TCP can ACK the
      FIN it has received.  Both will, upon receiving these ACKs, delete
      the connection.

       TCP A                                           TCP B

   1.  ESTABLISHED                                     ESTABLISHED

   2.  (Close)
       FIN-WAIT-1  -->                             --> CLOSE-WAIT

   3.  FIN-WAIT-2  <--                             <-- CLOSE-WAIT

   4.                                                  (Close)
       TIME-WAIT   <--                             <-- LAST-ACK

   5.  TIME-WAIT   -->                             --> CLOSED

   6.  (2 MSL)
       CLOSED

                         Normal Close Sequence

                               Figure 11

       TCP A                                           TCP B

   1.  ESTABLISHED                                     ESTABLISHED

   2.  (Close)                                         (Close)
       FIN-WAIT-1  -->                             ... FIN-WAIT-1
                   <--                             <--
                   ...                             -->

   3.  CLOSING     -->                             ... CLOSING
                   <--                             <--
                   ...                             -->

   4.  TIME-WAIT                                       TIME-WAIT
       (2 MSL)                                         (2 MSL)
       CLOSED                                          CLOSED

                      Simultaneous Close Sequence

                               Figure 12

3.6.  Precedence and Security

   The intent is that a connection be allowed only between ports
   operating with exactly the same security and compartment values and
   at the higher of the precedence levels requested by the two ports.

   The precedence and security parameters used in TCP are exactly those
   defined in the Internet Protocol (IP) [2].  Throughout this TCP
   specification the term "security/compartment" is intended to indicate
   the security parameters used in IP including security, compartment,
   user group, and handling restriction.

   A connection attempt with mismatched security/compartment values or a
   lower precedence value must be rejected by sending a reset.
   Rejecting a connection due to too low a precedence only occurs after
   an acknowledgment of the SYN has been received.

   Note that TCP modules which operate only at the default value of
   precedence will still have to check the precedence of incoming
   segments and possibly raise the precedence level they use on the
   connection.

   The security parameters may be used even in a non-secure environment
   (the values would indicate unclassified data), thus hosts in
   non-secure environments must be prepared to receive the security
   parameters, though they need not send them.

3.7.  Segmentation

   The term "segmentation" refers to the activity TCP performs when
   ingesting a stream of bytes from a sending application and
   packetizing that stream of bytes into TCP segments.  Individual TCP
   segments often do not correspond one-for-one to individual send (or
   socket write) calls from the application.  Applications may perform
   writes at the granularity of messages in the upper layer protocol,
   but TCP guarantees no boundary coherence between the TCP segments
   sent and received versus user application data read or write buffer
   boundaries.
   In some specific protocols, such as RDMA using DDP and
   MPA [10], there are performance optimizations possible when the
   relation between TCP segments and application data units can be
   controlled, and MPA includes a specific mechanism for detecting and
   verifying this relationship between TCP segments and application
   message data structures, but this is specific to applications like
   RDMA.  In general, multiple goals influence the sizing of TCP
   segments created by a TCP implementation.

   Goals driving the sending of larger segments include:

   o  Reducing the number of packets in flight within the network.

   o  Increasing processing efficiency and potential performance by
      enabling a smaller number of interrupts and inter-layer
      interactions.

   o  Limiting the overhead of TCP headers.

   Note that the performance benefits of sending larger segments may
   decrease as the size increases, and there may be boundaries where
   advantages are reversed.  For instance, on some machines 1025 bytes
   within a segment could lead to worse performance than 1024 bytes, due
   purely to data alignment on copy operations.

   Goals driving the sending of smaller segments include:

   o  Avoiding sending segments larger than the smallest MTU within an
      IP network path, because this results in either packet loss or
      fragmentation.  Making matters worse, some firewalls or
      middleboxes may drop fragmented packets or ICMP messages related
      to fragmentation.

   o  Preventing delays to the application data stream, especially when
      TCP is waiting on the application to generate more data, or when
      the application is waiting on an event or input from its peer in
      order to generate more data.

   o  Enabling "fate sharing" between TCP segments and lower-layer data
      units (e.g., below IP, for links with cell or frame sizes smaller
      than the IP MTU).
   Towards meeting these competing sets of goals, TCP includes several
   mechanisms, including the Maximum Segment Size option, Path MTU
   Discovery, the Nagle algorithm, and support for IPv6 Jumbograms, as
   discussed in the following subsections.

3.7.1.  Maximum Segment Size Option

   TCP MUST implement both sending and receiving the MSS option.

   TCP SHOULD send an MSS option in every SYN segment when its receive
   MSS differs from the default 536 for IPv4 or 1220 for IPv6, and MAY
   send it always.

   If an MSS option is not received at connection setup, TCP MUST assume
   a default send MSS of 536 (576 - 40) for IPv4 or 1220 (1280 - 40) for
   IPv6.

   The maximum size of a segment that TCP really sends, the "effective
   send MSS," MUST be the smaller of the send MSS (which reflects the
   available reassembly buffer size at the remote host) and the largest
   size permitted by the IP layer:

      Eff.snd.MSS =
         min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize

   where:

   o  SendMSS is the MSS value received from the remote host, or the
      default 536 for IPv4 or 1220 for IPv6, if no MSS option is
      received.

   o  MMS_S is the maximum size for a transport-layer message that TCP
      may send.

   o  TCPhdrsize is the size of the fixed TCP header and any options.
      This is 20 in the (rare) case that no options are present, but may
      be larger if TCP options are to be sent.  Note that some options
      may not be included on all segments, but that for each segment
      sent, the sender should adjust the data length accordingly, within
      the Eff.snd.MSS.

   o  IPoptionsize is the size of any IP options associated with a TCP
      connection.  Note that some options may not be included on all
      packets, but that for each segment sent, the sender should adjust
      the data length accordingly, within the Eff.snd.MSS.
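   The Eff.snd.MSS expression above is simple enough to state directly
   in code.  The following Python sketch is illustrative only; the
   function names are hypothetical, and the example values used
   afterwards (an MMS_S of 1480 and a 32-octet TCP header) are
   assumptions chosen for the example rather than values taken from
   this specification.

```python
from typing import Optional


def default_send_mss(ipv6: bool) -> int:
    """Default send MSS when no MSS option was received."""
    return 1220 if ipv6 else 536


def effective_send_mss(send_mss: Optional[int], mms_s: int,
                       tcp_hdr_size: int = 20, ip_option_size: int = 0,
                       ipv6: bool = False) -> int:
    """Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize.

    send_mss is the value from the peer's MSS option, or None if no
    option was received, in which case the default applies.
    """
    if send_mss is None:
        send_mss = default_send_mss(ipv6)
    return min(send_mss + 20, mms_s) - tcp_hdr_size - ip_option_size
```

   For instance, with a received MSS of 1460, an assumed MMS_S of 1480,
   and a TCP header of 32 octets (the fixed 20 plus 12 octets of
   options), the sender may place at most 1448 octets of data in a
   segment.  With no MSS option received over IPv4 and no TCP or IP
   options in use, the result is the default 536.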
1441 The MSS value to be sent in an MSS option should be equal to the 1442 effective MTU minus the fixed IP and TCP headers. By ignoring both 1443 IP and TCP options when calculating the value for the MSS option, if 1444 there are any IP or TCP options to be sent in a packet, then the 1445 sender must decrease the size of the TCP data accordingly. RFC 6691 1446 [15] discusses this in greater detail. 1448 The MSS value to be sent in an MSS option must be less than or equal 1449 to: 1451 MMS_R - 20 1453 where MMS_R is the maximum size for a transport-layer message that 1454 can be received (and reassembled). TCP obtains MMS_R and MMS_S from 1455 the IP layer; see the generic call GET_MAXSIZES in Section 3.4 of RFC 1456 1122. 1458 When TCP is used in a situation where either the IP or TCP headers 1459 are not fixed, the sender must reduce the amount of TCP data in any 1460 given packet by the number of octets used by the IP and TCP options. 1461 This has been a point of confusion historically, as explained in RFC 1462 6691, Section 3.1. 1464 3.7.2. Path MTU Discovery 1466 A TCP implementation may be aware of the MTU on directly connected 1467 links, but will rarely have insight about MTUs across an entire 1468 network path. For IPv4, RFC 1122 provides an IP-layer recommendation 1469 on the default effective MTU for sending to be less than or equal to 1470 576 for destinations not directly connected. For IPv6, this would be 1471 1280. In all cases, however, implementation of Path MTU Discovery 1472 (PMTUD) and Packetization Layer Path MTU Discovery (PLPMTUD) is 1473 strongly recommended in order for TCP to improve segmentation 1474 decisions. 1476 PMTUD for IPv4 [1] or IPv6 [2] is implemented in conjunction between 1477 TCP, IP, and ICMP protocols. Several adjustments to a TCP 1478 implementation with PMTUD are described in RFC 2923 in order to deal 1479 with problems experienced in practice [5]. 
PLPMTUD [9] is a 1480 Standards Track improvement to PMTUD that relaxes the requirement for 1481 ICMP support across a path, and improves performance in cases where 1482 ICMP is not consistently conveyed. The mechanisms in all four of 1483 these RFCs are recommended to be included in TCP implementations. 1485 The TCP MSS option specifies an upper bound for the size of packets 1486 that can be received. Hence, setting the value in the MSS option too 1487 small can impact the ability for PMTUD or PLPMTUD to find a larger 1488 path MTU. RFC 1191 discusses this implication of many older TCP 1489 implementations setting MSS to 536 for non-local destinations, rather 1490 than deriving it from the MTUs of connected interfaces as 1491 recommended. 1493 3.7.3. Interfaces with Variable MSS Values 1495 The effective MTU can sometimes vary, as when used with variable 1496 compression, e.g., RObust Header Compression (ROHC) [12]. It is 1497 tempting for TCP to want to advertise the largest possible MSS, to 1498 support the most efficient use of compressed payloads. 1499 Unfortunately, some compression schemes occasionally need to transmit 1500 full headers (and thus smaller payloads) to resynchronize state at 1501 their endpoint compressors/decompressors. If the largest MTU is used 1502 to calculate the value to advertise in the MSS option, TCP 1503 retransmission may interfere with compressor resynchronization. 1505 As a result, when the effective MTU of an interface varies, TCP 1506 SHOULD use the smallest effective MTU of the interface to calculate 1507 the value to advertise in the MSS option. 1509 3.7.4. Nagle Algorithm 1511 The "Nagle algorithm" was described in RFC 896 [7] and was 1512 recommended in RFC 1122 [8] for mitigation of an early problem of too 1513 many small packets being generated. It has been implemented in most 1514 current TCP code bases, sometimes with minor variations. 
1516 If there is unacknowledged data (i.e., SND.NXT > SND.UNA), then the 1517 sending TCP buffers all user data (regardless of the PSH bit), until 1518 the outstanding data has been acknowledged or until the TCP can send 1519 a full-sized segment (Eff.snd.MSS bytes). 1521 TODO - see if SEND description later should be updated to reflect 1522 this 1524 A TCP SHOULD implement the Nagle Algorithm to coalesce short 1525 segments. However, there MUST be a way for an application to disable 1526 the Nagle algorithm on an individual connection. In all cases, 1527 sending data is also subject to the limitation imposed by the Slow 1528 Start algorithm [11]. 1530 3.7.5. IPv6 Jumbograms 1532 In order to support TCP over IPv6 jumbograms, implementations need to 1533 be able to send TCP segments larger than the 64KB limit that the MSS 1534 option can convey. RFC 2675 [4] defines that an MSS value of 65,535 1535 bytes is to be treated as infinity, and Path MTU Discovery [2] is 1536 used to determine the actual MSS. 1538 3.8. Data Communication 1540 Once the connection is established data is communicated by the 1541 exchange of segments. Because segments may be lost due to errors 1542 (checksum test failure), or network congestion, TCP uses 1543 retransmission (after a timeout) to ensure delivery of every segment. 1544 Duplicate segments may arrive due to network or TCP retransmission. 1545 As discussed in the section on sequence numbers the TCP performs 1546 certain tests on the sequence and acknowledgment numbers in the 1547 segments to verify their acceptability. 1549 The sender of data keeps track of the next sequence number to use in 1550 the variable SND.NXT. The receiver of data keeps track of the next 1551 sequence number to expect in the variable RCV.NXT. The sender of 1552 data keeps track of the oldest unacknowledged sequence number in the 1553 variable SND.UNA. 
If the data flow is momentarily idle and all data 1554 sent has been acknowledged then the three variables will be equal. 1556 When the sender creates a segment and transmits it the sender 1557 advances SND.NXT. When the receiver accepts a segment it advances 1558 RCV.NXT and sends an acknowledgment. When the data sender receives 1559 an acknowledgment it advances SND.UNA. The extent to which the 1560 values of these variables differ is a measure of the delay in the 1561 communication. The amount by which the variables are advanced is the 1562 length of the data and SYN or FIN flags in the segment. Note that 1563 once in the ESTABLISHED state all segments must carry current 1564 acknowledgment information. 1566 The CLOSE user call implies a push function, as does the FIN control 1567 flag in an incoming segment. 1569 Retransmission Timeout 1571 NOTE: TODO this needs to be updated in light of 1122 4.2.2.15 and 1572 errata 573; this will be done as part of RFC 1122 incorporation into 1573 this document. 1574 Because of the variability of the networks that compose an 1575 internetwork system and the wide range of uses of TCP connections the 1576 retransmission timeout must be dynamically determined. One procedure 1577 for determining a retransmission timeout is given here as an 1578 illustration. 1580 An Example Retransmission Timeout Procedure 1582 Measure the elapsed time between sending a data octet with a 1583 particular sequence number and receiving an acknowledgment that 1584 covers that sequence number (segments sent do not have to match 1585 segments received). This measured elapsed time is the Round Trip 1586 Time (RTT). 
Next compute a Smoothed Round Trip Time (SRTT) as: 1588 SRTT = ( ALPHA * SRTT ) + ((1-ALPHA) * RTT) 1590 and based on this, compute the retransmission timeout (RTO) as: 1592 RTO = min[UBOUND,max[LBOUND,(BETA*SRTT)]] 1594 where UBOUND is an upper bound on the timeout (e.g., 1 minute), 1595 LBOUND is a lower bound on the timeout (e.g., 1 second), ALPHA is 1596 a smoothing factor (e.g., .8 to .9), and BETA is a delay variance 1597 factor (e.g., 1.3 to 2.0). 1599 The Communication of Urgent Information 1601 As a result of implementation differences and middlebox interactions, 1602 new applications SHOULD NOT employ the TCP urgent mechanism. 1604 However, TCP implementations MUST still include support for the 1605 urgent mechanism. Details can be found in RFC 6093 [13]. 1607 The objective of the TCP urgent mechanism is to allow the sending 1608 user to stimulate the receiving user to accept some urgent data and 1609 to permit the receiving TCP to indicate to the receiving user when 1610 all the currently known urgent data has been received by the user. 1612 This mechanism permits a point in the data stream to be designated as 1613 the end of urgent information. Whenever this point is in advance of 1614 the receive sequence number (RCV.NXT) at the receiving TCP, that TCP 1615 must tell the user to go into "urgent mode"; when the receive 1616 sequence number catches up to the urgent pointer, the TCP must tell 1617 the user to go into "normal mode". If the urgent pointer is updated 1618 while the user is in "urgent mode", the update will be invisible to 1619 the user. 1621 The method employs an urgent field which is carried in all segments 1622 transmitted. The URG control flag indicates that the urgent field is 1623 meaningful and must be added to the segment sequence number to yield 1624 the urgent pointer. The absence of this flag indicates that there is 1625 no urgent data outstanding. 1627 To send an urgent indication the user must also send at least one 1628 data octet.
If the sending user also indicates a push, timely 1629 delivery of the urgent information to the destination process is 1630 enhanced. 1632 A TCP MUST support a sequence of urgent data of any length. [8] 1634 A TCP MUST inform the application layer asynchronously whenever it 1635 receives an Urgent pointer and there was previously no pending urgent 1636 data, or whenever the Urgent pointer advances in the data stream. 1637 There MUST be a way for the application to learn how much urgent data 1638 remains to be read from the connection, or at least to determine 1639 whether or not more urgent data remains to be read. [8] 1641 Managing the Window 1643 The window sent in each segment indicates the range of sequence 1644 numbers the sender of the window (the data receiver) is currently 1645 prepared to accept. There is an assumption that this is related to 1646 the data buffer space currently available for this 1647 connection. 1649 Indicating a large window encourages transmissions. If more data 1650 arrives than can be accepted, it will be discarded. This will result 1651 in excessive retransmissions, adding unnecessarily to the load on the 1652 network and the TCPs. Indicating a small window may restrict the 1653 transmission of data to the point of introducing a round trip delay 1654 between each new segment transmitted. 1656 The mechanisms provided allow a TCP to advertise a large window and 1657 to subsequently advertise a much smaller window without having 1658 accepted that much data. This so-called "shrinking the window" is 1659 strongly discouraged. The robustness principle dictates that TCPs 1660 will not shrink the window themselves, but will be prepared for such 1661 behavior on the part of other TCPs. 1663 The sending TCP must be prepared to accept from the user and send at 1664 least one octet of new data even if the send window is zero. The 1665 sending TCP must regularly retransmit to the receiving TCP even when 1666 the window is zero.
Two minutes is recommended for the 1667 retransmission interval when the window is zero. This retransmission 1668 is essential to guarantee that when either TCP has a zero window the 1669 re-opening of the window will be reliably reported to the other. 1671 When the receiving TCP has a zero window and a segment arrives it 1672 must still send an acknowledgment showing its next expected sequence 1673 number and current window (zero). 1675 The sending TCP packages the data to be transmitted into segments 1676 which fit the current window, and may repackage segments on the 1677 retransmission queue. Such repackaging is not required, but may be 1678 helpful. 1680 In a connection with a one-way data flow, the window information will 1681 be carried in acknowledgment segments that all have the same sequence 1682 number so there will be no way to reorder them if they arrive out of 1683 order. This is not a serious problem, but it will allow the window 1684 information to be on occasion temporarily based on old reports from 1685 the data receiver. A refinement to avoid this problem is to act on 1686 the window information from segments that carry the highest 1687 acknowledgment number (that is segments with acknowledgment number 1688 equal or greater than the highest previously received). 1690 The window management procedure has significant influence on the 1691 communication performance. The following comments are suggestions to 1692 implementers. 1694 Window Management Suggestions 1696 Allocating a very small window causes data to be transmitted in 1697 many small segments when better performance is achieved using 1698 fewer large segments. 1700 One suggestion for avoiding small windows is for the receiver to 1701 defer updating a window until the additional allocation is at 1702 least X percent of the maximum allocation possible for the 1703 connection (where X might be 20 to 40). 
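The receiver-side suggestion above (deferring a window update until the additional allocation reaches X percent of the maximum) might be sketched as follows. The function name, parameter names, and the default of 25 percent are assumptions chosen for illustration; the text only says X might be 20 to 40.

```python
def should_update_window(advertised, available, max_window, x_percent=25):
    """Decide whether the receiver should advertise a larger window.

    advertised: window size the peer currently believes is open (octets)
    available:  window size the receiver could offer now (octets)
    max_window: maximum allocation possible for the connection (octets)
    x_percent:  threshold X from the suggestion (hypothetical default)
    """
    additional = available - advertised
    # Integer arithmetic avoids rounding: additional / max_window >= X / 100.
    return additional * 100 >= x_percent * max_window


# Hypothetical values: a 64000-octet maximum allocation with X = 25 means
# updates are deferred until at least 16000 additional octets are free.
small_gain = should_update_window(advertised=1000, available=2000, max_window=64000)
large_gain = should_update_window(advertised=1000, available=17000, max_window=64000)
```

Here small_gain is False (only 1000 additional octets) and large_gain is True (exactly 16000 additional octets), so the receiver coalesces many small openings into one larger window update.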
1705 Another suggestion is for the sender to avoid sending small 1706 segments by waiting until the window is large enough before 1707 sending data. If the user signals a push function then the data 1708 must be sent even if it is a small segment. 1710 Note that the acknowledgments should not be delayed or unnecessary 1711 retransmissions will result. One strategy would be to send an 1712 acknowledgment when a small segment arrives (without updating the 1713 window information), and then to send another acknowledgment with 1714 new window information when the window is larger. 1716 The segment sent to probe a zero window may also begin a break up 1717 of transmitted data into smaller and smaller segments. If a 1718 segment containing a single data octet sent to probe a zero window 1719 is accepted, it consumes one octet of the window now available. 1720 If the sending TCP simply sends as much as it can whenever the 1721 window is non zero, the transmitted data will be broken into 1722 alternating big and small segments. As time goes on, occasional 1723 pauses in the receiver making window allocation available will 1724 result in breaking the big segments into a small and not quite so 1725 big pair. And after a while the data transmission will be in 1726 mostly small segments. 1728 The suggestion here is that the TCP implementations need to 1729 actively attempt to combine small window allocations into larger 1730 windows, since the mechanisms for managing the window tend to lead 1731 to many small windows in the simplest minded implementations. 1733 3.9. Interfaces 1735 There are of course two interfaces of concern: the user/TCP interface 1736 and the TCP/lower-level interface. We have a fairly elaborate model 1737 of the user/TCP interface, but the interface to the lower level 1738 protocol module is left unspecified here, since it will be specified 1739 in detail by the specification of the lower level protocol.
For the 1740 case that the lower level is IP we note some of the parameter values 1741 that TCPs might use. 1743 3.9.1. User/TCP Interface 1745 The following functional description of user commands to the TCP is, 1746 at best, fictional, since every operating system will have different 1747 facilities. Consequently, we must warn readers that different TCP 1748 implementations may have different user interfaces. However, all 1749 TCPs must provide a certain minimum set of services to guarantee that 1750 all TCP implementations can support the same protocol hierarchy. 1751 This section specifies the functional interfaces required of all TCP 1752 implementations. 1754 TCP User Commands 1756 The following sections functionally characterize a USER/TCP 1757 interface. The notation used is similar to most procedure or 1758 function calls in high level languages, but this usage is not 1759 meant to rule out trap type service calls (e.g., SVCs, UUOs, 1760 EMTs). 1762 The user commands described below specify the basic functions the 1763 TCP must perform to support interprocess communication. 1764 Individual implementations must define their own exact format, and 1765 may provide combinations or subsets of the basic functions in 1766 single calls. In particular, some implementations may wish to 1767 automatically OPEN a connection on the first SEND or RECEIVE 1768 issued by the user for a given connection. 1770 In providing interprocess communication facilities, the TCP must 1771 not only accept commands, but must also return information to the 1772 processes it serves. The latter consists of: 1774 (a) general information about a connection (e.g., interrupts, 1775 remote close, binding of unspecified foreign socket). 1777 (b) replies to specific user commands indicating success or 1778 various types of failure. 
1780 Open 1782 Format: OPEN (local port, foreign socket, active/passive [, 1783 timeout] [, precedence] [, security/compartment] [, options]) 1784 -> local connection name 1786 We assume that the local TCP is aware of the identity of the 1787 processes it serves and will check the authority of the process 1788 to use the connection specified. Depending upon the 1789 implementation of the TCP, the local network and TCP 1790 identifiers for the source address will either be supplied by 1791 the TCP or the lower level protocol (e.g., IP). These 1792 considerations are the result of concern about security, to the 1793 extent that no TCP be able to masquerade as another one, and so 1794 on. Similarly, no process can masquerade as another without 1795 the collusion of the TCP. 1797 If the active/passive flag is set to passive, then this is a 1798 call to LISTEN for an incoming connection. A passive open may 1799 have either a fully specified foreign socket to wait for a 1800 particular connection or an unspecified foreign socket to wait 1801 for any call. A fully specified passive call can be made 1802 active by the subsequent execution of a SEND. 1804 A transmission control block (TCB) is created and partially 1805 filled in with data from the OPEN command parameters. 1807 On an active OPEN command, the TCP will begin the procedure to 1808 synchronize (i.e., establish) the connection at once. 1810 The timeout, if present, permits the caller to set up a timeout 1811 for all data submitted to TCP. If data is not successfully 1812 delivered to the destination within the timeout period, the TCP 1813 will abort the connection. The present global default is five 1814 minutes. 1816 The TCP or some component of the operating system will verify 1817 the user's authority to open a connection with the specified 1818 precedence or security/compartment.
The absence of precedence 1819 or security/compartment specification in the OPEN call 1820 indicates the default values must be used. 1822 TCP will accept incoming requests as matching only if the 1823 security/compartment information is exactly the same and only 1824 if the precedence is equal to or higher than the precedence 1825 requested in the OPEN call. 1827 The precedence for the connection is the higher of the values 1828 requested in the OPEN call and received from the incoming 1829 request, and fixed at that value for the life of the 1830 connection. Implementers may want to give the user control of 1831 this precedence negotiation. For example, the user might be 1832 allowed to specify that the precedence must be exactly matched, 1833 or that any attempt to raise the precedence be confirmed by the 1834 user. 1836 A local connection name will be returned to the user by the 1837 TCP. The local connection name can then be used as a short 1838 hand term for the connection defined by the <local socket, foreign socket> pair. 1841 Send 1843 Format: SEND (local connection name, buffer address, byte 1844 count, PUSH flag, URGENT flag [,timeout]) 1845 This call causes the data contained in the indicated user 1846 buffer to be sent on the indicated connection. If the 1847 connection has not been opened, the SEND is considered an 1848 error. Some implementations may allow users to SEND first; in 1849 which case, an automatic OPEN would be done. If the calling 1850 process is not authorized to use this connection, an error is 1851 returned. 1853 If the PUSH flag is set, the data must be transmitted promptly 1854 to the receiver, and the PUSH bit will be set in the last TCP 1855 segment created from the buffer. If the PUSH flag is not set, 1856 the data may be combined with data from subsequent SENDs for 1857 transmission efficiency. 1859 New applications SHOULD NOT set the URGENT flag [13] due to 1860 implementation differences and middlebox issues.
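The PUSH handling described for SEND above (the PUSH bit is set only in the last segment created from the buffer) could be sketched as follows. This is an illustration under stated assumptions: the function name, the tuple representation of a segment, and passing the segment size as a plain `mss` argument are all hypothetical, and coalescing with later SENDs when PUSH is not set is not modeled.

```python
def segment_buffer(data, mss, push):
    """Split a SEND buffer into (payload, psh_bit) pairs of at most `mss` octets.

    The PSH bit is set only on the final segment, and only if the user set
    the PUSH flag on the SEND call.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(data), mss):
        payload = data[offset:offset + mss]
        is_last = offset + mss >= len(data)
        segments.append((payload, push and is_last))
    return segments


# Hypothetical example: an 8-octet buffer sent with PUSH and a 3-octet
# segment size yields three segments, with PSH only on the last one.
pushed = segment_buffer(b"abcdefgh", mss=3, push=True)
```

A real implementation would take the segment size from Eff.snd.MSS and the current send window rather than a fixed argument.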
1862 If the URGENT flag is set, segments sent to the destination TCP 1863 will have the urgent pointer set. The receiving TCP will 1864 signal the urgent condition to the receiving process if the 1865 urgent pointer indicates that data preceding the urgent pointer 1866 has not been consumed by the receiving process. The purpose of 1867 urgent is to stimulate the receiver to process the urgent data 1868 and to indicate to the receiver when all the currently known 1869 urgent data has been received. The number of times the sending 1870 user's TCP signals urgent will not necessarily be equal to the 1871 number of times the receiving user will be notified of the 1872 presence of urgent data. 1874 If no foreign socket was specified in the OPEN, but the 1875 connection is established (e.g., because a LISTENing connection 1876 has become specific due to a foreign segment arriving for the 1877 local socket), then the designated buffer is sent to the 1878 implied foreign socket. Users who make use of OPEN with an 1879 unspecified foreign socket can make use of SEND without ever 1880 explicitly knowing the foreign socket address. 1882 However, if a SEND is attempted before the foreign socket 1883 becomes specified, an error will be returned. Users can use 1884 the STATUS call to determine the status of the connection. In 1885 some implementations the TCP may notify the user when an 1886 unspecified socket is bound. 1888 If a timeout is specified, the current user timeout for this 1889 connection is changed to the new one. 1891 In the simplest implementation, SEND would not return control 1892 to the sending process until either the transmission was 1893 complete or the timeout had been exceeded. However, this 1894 simple method is both subject to deadlocks (for example, both 1895 sides of the connection might try to do SENDs before doing any 1896 RECEIVEs) and offers poor performance, so it is not 1897 recommended. 
A more sophisticated implementation would return 1898 immediately to allow the process to run concurrently with 1899 network I/O, and, furthermore, to allow multiple SENDs to be in 1900 progress. Multiple SENDs are served in first come, first 1901 served order, so the TCP will queue those it cannot service 1902 immediately. 1904 We have implicitly assumed an asynchronous user interface in 1905 which a SEND later elicits some kind of SIGNAL or pseudo- 1906 interrupt from the serving TCP. An alternative is to return a 1907 response immediately. For instance, SENDs might return 1908 immediate local acknowledgment, even if the segment sent had 1909 not been acknowledged by the distant TCP. We could 1910 optimistically assume eventual success. If we are wrong, the 1911 connection will close anyway due to the timeout. In 1912 implementations of this kind (synchronous), there will still be 1913 some asynchronous signals, but these will deal with the 1914 connection itself, and not with specific segments or buffers. 1916 In order for the process to distinguish among error or success 1917 indications for different SENDs, it might be appropriate for 1918 the buffer address to be returned along with the coded response 1919 to the SEND request. TCP-to-user signals are discussed below, 1920 indicating the information which should be returned to the 1921 calling process. 1923 Receive 1925 Format: RECEIVE (local connection name, buffer address, byte 1926 count) -> byte count, urgent flag, push flag 1928 This command allocates a receiving buffer associated with the 1929 specified connection. If no OPEN precedes this command or the 1930 calling process is not authorized to use this connection, an 1931 error is returned. 1933 In the simplest implementation, control would not return to the 1934 calling program until either the buffer was filled, or some 1935 error occurred, but this scheme is highly subject to deadlocks. 
1936 A more sophisticated implementation would permit several 1937 RECEIVEs to be outstanding at once. These would be filled as 1938 segments arrive. This strategy permits increased throughput at 1939 the cost of a more elaborate scheme (possibly asynchronous) to 1940 notify the calling program that a PUSH has been seen or a 1941 buffer filled. 1943 If enough data arrive to fill the buffer before a PUSH is seen, 1944 the PUSH flag will not be set in the response to the RECEIVE. 1945 The buffer will be filled with as much data as it can hold. If 1946 a PUSH is seen before the buffer is filled the buffer will be 1947 returned partially filled and PUSH indicated. 1949 If there is urgent data the user will have been informed as 1950 soon as it arrived via a TCP-to-user signal. The receiving 1951 user should thus be in "urgent mode". If the URGENT flag is 1952 on, additional urgent data remains. If the URGENT flag is off, 1953 this call to RECEIVE has returned all the urgent data, and the 1954 user may now leave "urgent mode". Note that data following the 1955 urgent pointer (non-urgent data) cannot be delivered to the 1956 user in the same buffer with preceding urgent data unless the 1957 boundary is clearly marked for the user. 1959 To distinguish among several outstanding RECEIVEs and to take 1960 care of the case that a buffer is not completely filled, the 1961 return code is accompanied by both a buffer pointer and a byte 1962 count indicating the actual length of the data received. 1964 Alternative implementations of RECEIVE might have the TCP 1965 allocate buffer storage, or the TCP might share a ring buffer 1966 with the user. 1968 Close 1970 Format: CLOSE (local connection name) 1972 This command causes the connection specified to be closed. If 1973 the connection is not open or the calling process is not 1974 authorized to use this connection, an error is returned. 
1975 Closing connections is intended to be a graceful operation in 1976 the sense that outstanding SENDs will be transmitted (and 1977 retransmitted), as flow control permits, until all have been 1978 serviced. Thus, it should be acceptable to make several SEND 1979 calls, followed by a CLOSE, and expect all the data to be sent 1980 to the destination. It should also be clear that users should 1981 continue to RECEIVE on CLOSING connections, since the other 1982 side may be trying to transmit the last of its data. Thus, 1983 CLOSE means "I have no more to send" but does not mean "I will 1984 not receive any more." It may happen (if the user level 1985 protocol is not well thought out) that the closing side is 1986 unable to get rid of all its data before timing out. In this 1987 event, CLOSE turns into ABORT, and the closing TCP gives up. 1989 The user may CLOSE the connection at any time on his own 1990 initiative, or in response to various prompts from the TCP 1991 (e.g., remote close executed, transmission timeout exceeded, 1992 destination inaccessible). 1994 Because closing a connection requires communication with the 1995 foreign TCP, connections may remain in the closing state for a 1996 short time. Attempts to reopen the connection before the TCP 1997 replies to the CLOSE command will result in error responses. 1999 Close also implies push function. 2001 Status 2003 Format: STATUS (local connection name) -> status data 2005 This is an implementation dependent user command and could be 2006 excluded without adverse effect. Information returned would 2007 typically come from the TCB associated with the connection. 
2009 This command returns a data block containing the following 2010 information: 2012 local socket, 2013 foreign socket, 2014 local connection name, 2015 receive window, 2016 send window, 2017 connection state, 2018 number of buffers awaiting acknowledgment, 2019 number of buffers pending receipt, 2020 urgent state, 2021 precedence, 2022 security/compartment, 2023 and transmission timeout. 2025 Depending on the state of the connection, or on the 2026 implementation itself, some of this information may not be 2027 available or meaningful. If the calling process is not 2028 authorized to use this connection, an error is returned. This 2029 prevents unauthorized processes from gaining information about 2030 a connection. 2032 Abort 2034 Format: ABORT (local connection name) 2035 This command causes all pending SENDs and RECEIVES to be 2036 aborted, the TCB to be removed, and a special RESET message to 2037 be sent to the TCP on the other side of the connection. 2038 Depending on the implementation, users may receive abort 2039 indications for each outstanding SEND or RECEIVE, or may simply 2040 receive an ABORT-acknowledgment. 2042 TCP-to-User Messages 2044 It is assumed that the operating system environment provides a 2045 means for the TCP to asynchronously signal the user program. 2046 When the TCP does signal a user program, certain information is 2047 passed to the user. Often in the specification the information 2048 will be an error message. In other cases there will be 2049 information relating to the completion of processing a SEND or 2050 RECEIVE or other user call. 2052 The following information is provided: 2054 Local Connection Name Always 2055 Response String Always 2056 Buffer Address Send & Receive 2057 Byte count (counts bytes received) Receive 2058 Push flag Receive 2059 Urgent flag Receive 2061 3.9.2. TCP/Lower-Level Interface 2063 The TCP calls on a lower level protocol module to actually send and 2064 receive information over a network. 
One case is that of the ARPA 2065 internetwork system where the lower level module is the Internet 2066 Protocol (IP) [2]. 2068 If the lower level protocol is IP it provides arguments for a type of 2069 service and for a time to live. TCP uses the following settings for 2070 these parameters: 2072 Type of Service = Precedence: given by user, Delay: normal, 2073 Throughput: normal, Reliability: normal; or binary XXX00000, where 2074 XXX are the three bits determining precedence, e.g. 000 means 2075 routine precedence. 2077 Time to Live = one minute, or 00111100. 2079 Note that the assumed maximum segment lifetime is two minutes. 2080 Here we explicitly ask that a segment be destroyed if it cannot 2081 be delivered by the internet system within one minute. 2083 If the lower level is IP (or other protocol that provides this 2084 feature) and source routing is used, the interface must allow the 2085 route information to be communicated. This is especially important 2086 so that the source and destination addresses used in the TCP checksum 2087 be the originating source and ultimate destination. It is also 2088 important to preserve the return route to answer connection requests. 2090 Any lower level protocol will have to provide the source address, 2091 destination address, and protocol fields, and some way to determine 2092 the "TCP length", both to provide the functional equivalent service 2093 of IP and to be used in the TCP checksum. 2095 3.10. Event Processing 2097 The processing depicted in this section is an example of one possible 2098 implementation. Other implementations may have slightly different 2099 processing sequences, but they should differ from those in this 2100 section only in detail, not in substance. 2102 The activity of the TCP can be characterized as responding to events. 2103 The events that occur can be cast into three categories: user calls, 2104 arriving segments, and timeouts. 
   This section describes the processing the TCP does in response to
   each of the events. In many cases the processing required depends
   on the state of the connection.

   Events that occur:

      User Calls

         OPEN
         SEND
         RECEIVE
         CLOSE
         ABORT
         STATUS

      Arriving Segments

         SEGMENT ARRIVES

      Timeouts

         USER TIMEOUT
         RETRANSMISSION TIMEOUT
         TIME-WAIT TIMEOUT

   The model of the TCP/user interface is that user commands receive an
   immediate return and possibly a delayed response via an event or
   pseudo interrupt. In the following descriptions, the term "signal"
   means cause a delayed response.

   Error responses are given as character strings. For example, user
   commands referencing connections that do not exist receive "error:
   connection not open".

   Please note in the following that all arithmetic on sequence numbers,
   acknowledgment numbers, windows, et cetera, is modulo 2**32, the size
   of the sequence number space. Also note that "=<" means less than or
   equal to (modulo 2**32).

   A natural way to think about processing incoming segments is to
   imagine that they are first tested for proper sequence number (i.e.,
   that their contents lie in the range of the expected "receive window"
   in the sequence number space) and then that they are generally queued
   and processed in sequence number order.

   When a segment overlaps other already received segments we
   reconstruct the segment to contain just the new data, and adjust the
   header fields to be consistent.

   Note that if no state change is mentioned the TCP stays in the same
   state.

   OPEN Call

      CLOSED STATE (i.e., TCB does not exist)

         Create a new transmission control block (TCB) to hold
         connection state information. Fill in local socket identifier,
         foreign socket, precedence, security/compartment, and user
         timeout information.
         Note that some parts of the foreign socket may be unspecified
         in a passive OPEN and are to be filled in by the parameters of
         the incoming SYN segment. Verify the security and precedence
         requested are allowed for this user; if not, return "error:
         precedence not allowed" or "error: security/compartment not
         allowed." If passive, enter the LISTEN state and return. If
         active and the foreign socket is unspecified, return "error:
         foreign socket unspecified"; if active and the foreign socket
         is specified, issue a SYN segment. An initial send sequence
         number (ISS) is selected. A SYN segment of the form
         <SEQ=ISS><CTL=SYN> is sent. Set SND.UNA to ISS, SND.NXT to
         ISS+1, enter SYN-SENT state, and return.

         If the caller does not have access to the local socket
         specified, return "error: connection illegal for this process".
         If there is no room to create a new connection, return "error:
         insufficient resources".

      LISTEN STATE

         If active and the foreign socket is specified, then change the
         connection from passive to active, select an ISS. Send a SYN
         segment, set SND.UNA to ISS, SND.NXT to ISS+1. Enter SYN-SENT
         state. Data associated with SEND may be sent with SYN segment
         or queued for transmission after entering ESTABLISHED state.
         The urgent bit if requested in the command must be sent with
         the data segments sent as a result of this command. If there
         is no room to queue the request, respond with "error:
         insufficient resources". If the foreign socket was not
         specified, then return "error: foreign socket unspecified".

      SYN-SENT STATE
      SYN-RECEIVED STATE
      ESTABLISHED STATE
      FIN-WAIT-1 STATE
      FIN-WAIT-2 STATE
      CLOSE-WAIT STATE
      CLOSING STATE
      LAST-ACK STATE
      TIME-WAIT STATE

         Return "error: connection already exists".
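
   As an illustration only, the OPEN processing above might be sketched
   as follows. This is a minimal sketch, not a definitive
   implementation: the TCB class, the open_call function, the choose_iss
   and send callbacks, and the "ok" success response are hypothetical
   names introduced here. Security, precedence, access, and resource
   checks are omitted, and a passive OPEN on a connection already in
   LISTEN (a case the text does not describe) is assumed to be reported
   as "error: connection already exists".

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

SEQ_MOD = 2**32  # size of the sequence number space


@dataclass
class TCB:
    """Hypothetical, heavily reduced transmission control block."""
    state: str
    foreign_socket: Optional[Tuple[str, int]] = None
    iss: int = 0
    snd_una: int = 0
    snd_nxt: int = 0


def _send_syn(tcb: TCB, choose_iss: Callable[[], int],
              send: Callable[[dict], None]) -> None:
    # Select an ISS, emit <SEQ=ISS><CTL=SYN>, then set SND.UNA and SND.NXT.
    tcb.iss = choose_iss() % SEQ_MOD
    send({"SEQ": tcb.iss, "CTL": "SYN"})
    tcb.snd_una = tcb.iss
    tcb.snd_nxt = (tcb.iss + 1) % SEQ_MOD
    tcb.state = "SYN-SENT"


def open_call(tcb: Optional[TCB], active: bool,
              foreign_socket: Optional[Tuple[str, int]],
              choose_iss: Callable[[], int],
              send: Callable[[dict], None]):
    """Return (tcb, response); tcb=None models the CLOSED state."""
    if tcb is None:  # CLOSED STATE: no TCB exists yet
        if active and foreign_socket is None:
            return None, "error: foreign socket unspecified"
        tcb = TCB(state="CLOSED", foreign_socket=foreign_socket)
        if active:
            _send_syn(tcb, choose_iss, send)
        else:
            tcb.state = "LISTEN"
        return tcb, "ok"  # "ok" is an assumed success response
    if tcb.state == "LISTEN" and active:
        target = foreign_socket or tcb.foreign_socket
        if target is None:
            return tcb, "error: foreign socket unspecified"
        tcb.foreign_socket = target  # passive connection becomes active
        _send_syn(tcb, choose_iss, send)
        return tcb, "ok"
    # All other states (and, by assumption, a passive OPEN in LISTEN).
    return tcb, "error: connection already exists"
```

   A caller would pass None for the TCB when no connection exists,
   together with a function that selects the ISS and a function that
   transmits a segment. The address 192.0.2.1 used in the checks below
   is a documentation example address.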

   SEND Call

      CLOSED STATE (i.e., TCB does not exist)

         If the user does not have access to such a connection, then
         return "error: connection illegal for this process".

         Otherwise, return "error: connection does not exist".

      LISTEN STATE

         If the foreign socket is specified, then change the connection
         from passive to active, select an ISS. Send a SYN segment, set
         SND.UNA to ISS, SND.NXT to ISS+1. Enter SYN-SENT state. Data
         associated with SEND may be sent with SYN segment or queued for
         transmission after entering ESTABLISHED state. The urgent bit
         if requested in the command must be sent with the data segments
         sent as a result of this command. If there is no room to queue
         the request, respond with "error: insufficient resources". If
         the foreign socket was not specified, then return "error:
         foreign socket unspecified".

      SYN-SENT STATE
      SYN-RECEIVED STATE

         Queue the data for transmission after entering ESTABLISHED
         state. If no space to queue, respond with "error: insufficient
         resources".

      ESTABLISHED STATE
      CLOSE-WAIT STATE

         Segmentize the buffer and send it with a piggybacked
         acknowledgment (acknowledgment value = RCV.NXT). If there is
         insufficient space to remember this buffer, simply return
         "error: insufficient resources".

         If the urgent flag is set, then SND.UP <- SND.NXT and set the
         urgent pointer in the outgoing segments.

      FIN-WAIT-1 STATE
      FIN-WAIT-2 STATE
      CLOSING STATE
      LAST-ACK STATE
      TIME-WAIT STATE

         Return "error: connection closing" and do not service request.

   RECEIVE Call

      CLOSED STATE (i.e., TCB does not exist)

         If the user does not have access to such a connection, return
         "error: connection illegal for this process".

         Otherwise return "error: connection does not exist".

      LISTEN STATE
      SYN-SENT STATE
      SYN-RECEIVED STATE

         Queue for processing after entering ESTABLISHED state. If
         there is no room to queue this request, respond with "error:
         insufficient resources".

      ESTABLISHED STATE
      FIN-WAIT-1 STATE
      FIN-WAIT-2 STATE

         If insufficient incoming segments are queued to satisfy the
         request, queue the request. If there is no queue space to
         remember the RECEIVE, respond with "error: insufficient
         resources".

         Reassemble queued incoming segments into receive buffer and
         return to user. Mark "push seen" (PUSH) if this is the case.

         If RCV.UP is in advance of the data currently being passed to
         the user, notify the user of the presence of urgent data.

         When the TCP takes responsibility for delivering data to the
         user, that fact must be communicated to the sender via an
         acknowledgment. The formation of such an acknowledgment is
         described below in the discussion of processing an incoming
         segment.

      CLOSE-WAIT STATE

         Since the remote side has already sent FIN, RECEIVEs must be
         satisfied by text already on hand, but not yet delivered to the
         user. If no text is awaiting delivery, the RECEIVE will get an
         "error: connection closing" response. Otherwise, any remaining
         text can be used to satisfy the RECEIVE.

      CLOSING STATE
      LAST-ACK STATE
      TIME-WAIT STATE

         Return "error: connection closing".

   CLOSE Call

      CLOSED STATE (i.e., TCB does not exist)

         If the user does not have access to such a connection, return
         "error: connection illegal for this process".

         Otherwise, return "error: connection does not exist".

      LISTEN STATE

         Any outstanding RECEIVEs are returned with "error: closing"
         responses. Delete TCB, enter CLOSED state, and return.

      SYN-SENT STATE

         Delete the TCB and return "error: closing" responses to any
         queued SENDs or RECEIVEs.

      SYN-RECEIVED STATE

         If no SENDs have been issued and there is no pending data to
         send, then form a FIN segment and send it, and enter FIN-WAIT-1
         state; otherwise queue for processing after entering
         ESTABLISHED state.

      ESTABLISHED STATE

         Queue this until all preceding SENDs have been segmentized,
         then form a FIN segment and send it. In any case, enter
         FIN-WAIT-1 state.

      FIN-WAIT-1 STATE
      FIN-WAIT-2 STATE

         Strictly speaking, this is an error and should receive an
         "error: connection closing" response. An "ok" response would
         be acceptable, too, as long as a second FIN is not emitted (the
         first FIN may be retransmitted though).

      CLOSE-WAIT STATE

         Queue this request until all preceding SENDs have been
         segmentized; then send a FIN segment, enter LAST-ACK state.

      CLOSING STATE
      LAST-ACK STATE
      TIME-WAIT STATE

         Respond with "error: connection closing".

   ABORT Call

      CLOSED STATE (i.e., TCB does not exist)

         If the user should not have access to such a connection, return
         "error: connection illegal for this process".

         Otherwise return "error: connection does not exist".

      LISTEN STATE

         Any outstanding RECEIVEs should be returned with "error:
         connection reset" responses. Delete TCB, enter CLOSED state,
         and return.

      SYN-SENT STATE

         All queued SENDs and RECEIVEs should be given "connection
         reset" notification, delete the TCB, enter CLOSED state, and
         return.

      SYN-RECEIVED STATE
      ESTABLISHED STATE
      FIN-WAIT-1 STATE
      FIN-WAIT-2 STATE
      CLOSE-WAIT STATE

         Send a reset segment:

            <SEQ=SND.NXT><CTL=RST>

         All queued SENDs and RECEIVEs should be given "connection
         reset" notification; all segments queued for transmission
         (except for the RST formed above) or retransmission should be
         flushed, delete the TCB, enter CLOSED state, and return.

      CLOSING STATE
      LAST-ACK STATE
      TIME-WAIT STATE

         Respond with "ok" and delete the TCB, enter CLOSED state, and
         return.

   STATUS Call

      CLOSED STATE (i.e., TCB does not exist)

         If the user should not have access to such a connection, return
         "error: connection illegal for this process".

         Otherwise return "error: connection does not exist".

      LISTEN STATE

         Return "state = LISTEN", and the TCB pointer.

      SYN-SENT STATE

         Return "state = SYN-SENT", and the TCB pointer.

      SYN-RECEIVED STATE

         Return "state = SYN-RECEIVED", and the TCB pointer.

      ESTABLISHED STATE

         Return "state = ESTABLISHED", and the TCB pointer.

      FIN-WAIT-1 STATE

         Return "state = FIN-WAIT-1", and the TCB pointer.

      FIN-WAIT-2 STATE

         Return "state = FIN-WAIT-2", and the TCB pointer.

      CLOSE-WAIT STATE

         Return "state = CLOSE-WAIT", and the TCB pointer.

      CLOSING STATE

         Return "state = CLOSING", and the TCB pointer.

      LAST-ACK STATE

         Return "state = LAST-ACK", and the TCB pointer.

      TIME-WAIT STATE

         Return "state = TIME-WAIT", and the TCB pointer.

   SEGMENT ARRIVES

      If the state is CLOSED (i.e., TCB does not exist) then

         all data in the incoming segment is discarded. An incoming
         segment containing a RST is discarded. An incoming segment not
         containing a RST causes a RST to be sent in response. The
         acknowledgment and sequence field values are selected to make
         the reset sequence acceptable to the TCP that sent the
         offending segment.

         If the ACK bit is off, sequence number zero is used,

            <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>

         If the ACK bit is on,

            <SEQ=SEG.ACK><CTL=RST>

         Return.

      If the state is LISTEN then

         first check for an RST

            An incoming RST should be ignored. Return.

         second check for an ACK

            Any acknowledgment is bad if it arrives on a connection
            still in the LISTEN state. An acceptable reset segment
            should be formed for any arriving ACK-bearing segment.
            The RST should be formatted as follows:

               <SEQ=SEG.ACK><CTL=RST>

            Return.

         third check for a SYN

            If the SYN bit is set, check the security. If the security/
            compartment on the incoming segment does not exactly match
            the security/compartment in the TCB then send a reset and
            return.

               <SEQ=SEG.ACK><CTL=RST>

            If the SEG.PRC is greater than the TCB.PRC then if allowed
            by the user and the system set TCB.PRC<-SEG.PRC, if not
            allowed send a reset and return.

               <SEQ=SEG.ACK><CTL=RST>

            If the SEG.PRC is less than the TCB.PRC then continue.

            Set RCV.NXT to SEG.SEQ+1, IRS is set to SEG.SEQ and any
            other control or text should be queued for processing later.
            ISS should be selected and a SYN segment sent of the form:

               <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK>

            SND.NXT is set to ISS+1 and SND.UNA to ISS. The connection
            state should be changed to SYN-RECEIVED. Note that any
            other incoming control or data (combined with SYN) will be
            processed in the SYN-RECEIVED state, but processing of SYN
            and ACK should not be repeated. If the listen was not fully
            specified (i.e., the foreign socket was not fully
            specified), then the unspecified fields should be filled in
            now.

         fourth other text or control

            Any other control or text-bearing segment (not containing
            SYN) must have an ACK and thus would be discarded by the ACK
            processing. An incoming RST segment could not be valid,
            since it could not have been sent in response to anything
            sent by this incarnation of the connection. So you are
            unlikely to get here, but if you do, drop the segment, and
            return.

      If the state is SYN-SENT then

         first check the ACK bit

            If the ACK bit is set

               If SEG.ACK =< ISS, or SEG.ACK > SND.NXT, send a reset
               (unless the RST bit is set, if so drop the segment and
               return)

                  <SEQ=SEG.ACK><CTL=RST>

               and discard the segment. Return.

               If SND.UNA < SEG.ACK =< SND.NXT then the ACK is
               acceptable.
               (TODO: in processing Errata ID 3300, it was noted that
               some stacks in the wild that do not send data on the SYN
               are just checking that SEG.ACK == SND.NXT ... think about
               whether anything should be said about that here)

         second check the RST bit

            If the RST bit is set

               If the ACK was acceptable then signal the user "error:
               connection reset", drop the segment, enter CLOSED state,
               delete TCB, and return. Otherwise (no ACK) drop the
               segment and return.

         third check the security and precedence

            If the security/compartment in the segment does not exactly
            match the security/compartment in the TCB, send a reset

               If there is an ACK

                  <SEQ=SEG.ACK><CTL=RST>

               Otherwise

                  <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>

            If there is an ACK

               The precedence in the segment must match the precedence
               in the TCB, if not, send a reset

                  <SEQ=SEG.ACK><CTL=RST>

            If there is no ACK

               If the precedence in the segment is higher than the
               precedence in the TCB then if allowed by the user and the
               system raise the precedence in the TCB to that in the
               segment, if not allowed to raise the prec then send a
               reset.

                  <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>

               If the precedence in the segment is lower than the
               precedence in the TCB continue.

            If a reset was sent, discard the segment and return.

         fourth check the SYN bit

            This step should be reached only if the ACK is ok, or there
            is no ACK, and the segment did not contain a RST.

            If the SYN bit is on and the security/compartment and
            precedence are acceptable then, RCV.NXT is set to SEG.SEQ+1,
            IRS is set to SEG.SEQ. SND.UNA should be advanced to equal
            SEG.ACK (if there is an ACK), and any segments on the
            retransmission queue which are thereby acknowledged should
            be removed.

            If SND.UNA > ISS (our SYN has been ACKed), change the
            connection state to ESTABLISHED, form an ACK segment

               <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

            and send it. Data or controls which were queued for
            transmission may be included.
            If there are other controls or text in the segment then
            continue processing at the sixth step below where the URG
            bit is checked, otherwise return.

            Otherwise enter SYN-RECEIVED, form a SYN,ACK segment

               <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK>

            and send it. Set the variables:

               SND.WND <- SEG.WND
               SND.WL1 <- SEG.SEQ
               SND.WL2 <- SEG.ACK

            If there are other controls or text in the segment, queue
            them for processing after the ESTABLISHED state has been
            reached, return.

         fifth, if neither of the SYN or RST bits is set then drop the
         segment and return.

      Otherwise,

         first check sequence number

            SYN-RECEIVED STATE
            ESTABLISHED STATE
            FIN-WAIT-1 STATE
            FIN-WAIT-2 STATE
            CLOSE-WAIT STATE
            CLOSING STATE
            LAST-ACK STATE
            TIME-WAIT STATE

               Segments are processed in sequence. Initial tests on
               arrival are used to discard old duplicates, but further
               processing is done in SEG.SEQ order. If a segment's
               contents straddle the boundary between old and new, only
               the new parts should be processed.

               There are four cases for the acceptability test for an
               incoming segment:

               Segment Receive  Test
               Length  Window
               ------- -------  -------------------------------------------

                  0       0     SEG.SEQ = RCV.NXT

                  0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND

                 >0       0     not acceptable

                 >0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
                             or RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND

               If the RCV.WND is zero, no segments will be acceptable,
               but special allowance should be made to accept valid
               ACKs, URGs and RSTs.

               If an incoming segment is not acceptable, an
               acknowledgment should be sent in reply (unless the RST
               bit is set, if so drop the segment and return):

                  <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

               After sending the acknowledgment, drop the unacceptable
               segment and return.
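
   The four-case acceptability test above can be sketched in code. This
   is an illustrative sketch rather than a normative algorithm: the
   function names in_window and segment_acceptable are hypothetical. A
   sequence number lies in the receive window exactly when its offset
   from RCV.NXT, taken modulo 2**32, is less than RCV.WND, which keeps
   the comparison correct across sequence number wraparound.

```python
SEQ_MOD = 2**32  # size of the sequence number space


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if RCV.NXT =< seq < RCV.NXT+RCV.WND, modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def segment_acceptable(seg_seq: int, seg_len: int,
                       rcv_nxt: int, rcv_wnd: int) -> bool:
    """Apply the four cases of the segment acceptability test."""
    if seg_len == 0:
        if rcv_wnd == 0:
            # Zero length, zero window: must sit exactly at RCV.NXT.
            return seg_seq == rcv_nxt
        # Zero length, open window: sequence number must be in window.
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        # Data cannot be accepted while the window is closed.
        return False
    # Data with an open window: the first or the last octet must fall
    # inside the window.
    last = (seg_seq + seg_len - 1) % SEQ_MOD
    return (in_window(seg_seq, rcv_nxt, rcv_wnd)
            or in_window(last, rcv_nxt, rcv_wnd))
```

   For example, with RCV.NXT = 1000 and RCV.WND = 100, a 10-octet
   segment starting at 995 is acceptable because its last octet (1004)
   lies in the window, while one starting at 1100 is not.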

               In the following it is assumed that the segment is the
               idealized segment that begins at RCV.NXT and does not
               exceed the window. One could tailor actual segments to
               fit this assumption by trimming off any portions that lie
               outside the window (including SYN and FIN), and only
               processing further if the segment then begins at RCV.NXT.
               Segments with higher beginning sequence numbers should be
               held for later processing.

         second check the RST bit,

            SYN-RECEIVED STATE

               If the RST bit is set

                  If this connection was initiated with a passive OPEN
                  (i.e., came from the LISTEN state), then return this
                  connection to LISTEN state and return. The user need
                  not be informed. If this connection was initiated
                  with an active OPEN (i.e., came from SYN-SENT state)
                  then the connection was refused, signal the user
                  "connection refused". In either case, all segments on
                  the retransmission queue should be removed. And in
                  the active OPEN case, enter the CLOSED state and
                  delete the TCB, and return.

            ESTABLISHED
            FIN-WAIT-1
            FIN-WAIT-2
            CLOSE-WAIT

               If the RST bit is set then, any outstanding RECEIVEs and
               SENDs should receive "reset" responses. All segment
               queues should be flushed. Users should also receive an
               unsolicited general "connection reset" signal. Enter the
               CLOSED state, delete the TCB, and return.

            CLOSING STATE
            LAST-ACK STATE
            TIME-WAIT

               If the RST bit is set then, enter the CLOSED state,
               delete the TCB, and return.

         third check security and precedence

            SYN-RECEIVED

               If the security/compartment and precedence in the segment
               do not exactly match the security/compartment and
               precedence in the TCB then send a reset, and return.

            ESTABLISHED
            FIN-WAIT-1
            FIN-WAIT-2
            CLOSE-WAIT
            CLOSING
            LAST-ACK
            TIME-WAIT

               If the security/compartment and precedence in the segment
               do not exactly match the security/compartment and
               precedence in the TCB then send a reset, any outstanding
               RECEIVEs and SENDs should receive "reset" responses. All
               segment queues should be flushed. Users should also
               receive an unsolicited general "connection reset" signal.
               Enter the CLOSED state, delete the TCB, and return.

            Note this check is placed following the sequence check to
            prevent a segment from an old connection between these ports
            with a different security or precedence from causing an
            abort of the current connection.

         fourth, check the SYN bit,

            SYN-RECEIVED STATE
            ESTABLISHED STATE
            FIN-WAIT-1 STATE
            FIN-WAIT-2 STATE
            CLOSE-WAIT STATE
            CLOSING STATE
            LAST-ACK STATE
            TIME-WAIT STATE

               TODO: need to incorporate RFC 1122 4.2.2.20(e) here

               If the SYN is in the window it is an error, send a reset,
               any outstanding RECEIVEs and SENDs should receive "reset"
               responses, all segment queues should be flushed, the user
               should also receive an unsolicited general "connection
               reset" signal, enter the CLOSED state, delete the TCB,
               and return.

               If the SYN is not in the window this step would not be
               reached and an ack would have been sent in the first step
               (sequence number check).

         fifth check the ACK field,

            if the ACK bit is off drop the segment and return

            if the ACK bit is on

               SYN-RECEIVED STATE

                  If SND.UNA < SEG.ACK =< SND.NXT then enter ESTABLISHED
                  state and continue processing with variables below set
                  to:

                     SND.WND <- SEG.WND
                     SND.WL1 <- SEG.SEQ
                     SND.WL2 <- SEG.ACK

                  If the segment acknowledgment is not acceptable,
                  form a reset segment,

                     <SEQ=SEG.ACK><CTL=RST>

                  and send it.

               ESTABLISHED STATE

                  If SND.UNA < SEG.ACK =< SND.NXT then, set SND.UNA <-
                  SEG.ACK. Any segments on the retransmission queue
                  which are thereby entirely acknowledged are removed.
                  Users should receive positive acknowledgments for
                  buffers which have been SENT and fully acknowledged
                  (i.e., SEND buffer should be returned with "ok"
                  response). If the ACK is a duplicate (SEG.ACK =<
                  SND.UNA), it can be ignored. If the ACK acks
                  something not yet sent (SEG.ACK > SND.NXT) then send
                  an ACK, drop the segment, and return.

                  If SND.UNA =< SEG.ACK =< SND.NXT, the send window
                  should be updated. If (SND.WL1 < SEG.SEQ or (SND.WL1
                  = SEG.SEQ and SND.WL2 =< SEG.ACK)), set SND.WND <-
                  SEG.WND, set SND.WL1 <- SEG.SEQ, and set SND.WL2 <-
                  SEG.ACK.

                  Note that SND.WND is an offset from SND.UNA, that
                  SND.WL1 records the sequence number of the last
                  segment used to update SND.WND, and that SND.WL2
                  records the acknowledgment number of the last segment
                  used to update SND.WND. The check here prevents using
                  old segments to update the window.

               FIN-WAIT-1 STATE

                  In addition to the processing for the ESTABLISHED
                  state, if our FIN is now acknowledged then enter
                  FIN-WAIT-2 and continue processing in that state.

               FIN-WAIT-2 STATE

                  In addition to the processing for the ESTABLISHED
                  state, if the retransmission queue is empty, the
                  user's CLOSE can be acknowledged ("ok") but do not
                  delete the TCB.

               CLOSE-WAIT STATE

                  Do the same processing as for the ESTABLISHED state.

               CLOSING STATE

                  In addition to the processing for the ESTABLISHED
                  state, if the ACK acknowledges our FIN then enter the
                  TIME-WAIT state, otherwise ignore the segment.

               LAST-ACK STATE

                  The only thing that can arrive in this state is an
                  acknowledgment of our FIN. If our FIN is now
                  acknowledged, delete the TCB, enter the CLOSED state,
                  and return.
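
   The ESTABLISHED-state ACK processing described above, including the
   SND.WL1/SND.WL2 guard on window updates, might be sketched as
   follows. This is a hedged sketch under stated assumptions: the
   SendState class, the process_ack function, and its return strings
   are hypothetical, and the modulo 2**32 comparisons assume the common
   convention that the two values compared are less than 2**31 apart.
   Retransmission queue handling and user notifications are omitted.

```python
from dataclasses import dataclass

SEQ_MOD = 2**32  # size of the sequence number space


def seq_lt(a: int, b: int) -> bool:
    """a < b modulo 2**32 (assumes a and b are less than 2**31 apart)."""
    return 0 < (b - a) % SEQ_MOD < 2**31


def seq_le(a: int, b: int) -> bool:
    """a =< b modulo 2**32, under the same assumption as seq_lt."""
    return a == b or seq_lt(a, b)


@dataclass
class SendState:
    """Hypothetical subset of the TCB send variables."""
    snd_una: int
    snd_nxt: int
    snd_wnd: int
    snd_wl1: int
    snd_wl2: int


def process_ack(s: SendState, seg_seq: int, seg_ack: int,
                seg_wnd: int) -> str:
    """Return 'advanced', 'duplicate', or 'ack-and-drop'."""
    if seq_lt(s.snd_nxt, seg_ack):
        # ACK for something not yet sent: send an ACK, drop the segment.
        return "ack-and-drop"
    if seq_lt(s.snd_una, seg_ack):
        s.snd_una = seg_ack  # SND.UNA < SEG.ACK =< SND.NXT: new data acked
        result = "advanced"
    else:
        result = "duplicate"  # SEG.ACK =< SND.UNA
    # Window update is considered only if SND.UNA =< SEG.ACK =< SND.NXT.
    if seq_le(s.snd_una, seg_ack):
        if seq_lt(s.snd_wl1, seg_seq) or (
                s.snd_wl1 == seg_seq and seq_le(s.snd_wl2, seg_ack)):
            s.snd_wnd = seg_wnd
            s.snd_wl1 = seg_seq
            s.snd_wl2 = seg_ack
    return result
```

   In this sketch a segment whose SEG.SEQ is older than SND.WL1 never
   changes SND.WND, which is the protection against stale window
   updates that the note above describes.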

               TIME-WAIT STATE

                  The only thing that can arrive in this state is a
                  retransmission of the remote FIN. Acknowledge it, and
                  restart the 2 MSL timeout.

         sixth, check the URG bit,

            ESTABLISHED STATE
            FIN-WAIT-1 STATE
            FIN-WAIT-2 STATE

               If the URG bit is set, RCV.UP <- max(RCV.UP,SEG.UP), and
               signal the user that the remote side has urgent data if
               the urgent pointer (RCV.UP) is in advance of the data
               consumed. If the user has already been signaled (or is
               still in the "urgent mode") for this continuous sequence
               of urgent data, do not signal the user again.

            CLOSE-WAIT STATE
            CLOSING STATE
            LAST-ACK STATE
            TIME-WAIT

               This should not occur, since a FIN has been received from
               the remote side. Ignore the URG.

         seventh, process the segment text,

            ESTABLISHED STATE
            FIN-WAIT-1 STATE
            FIN-WAIT-2 STATE

               Once in the ESTABLISHED state, it is possible to deliver
               segment text to user RECEIVE buffers. Text from segments
               can be moved into buffers until either the buffer is full
               or the segment is empty. If the segment empties and
               carries a PUSH flag, then the user is informed, when the
               buffer is returned, that a PUSH has been received.

               When the TCP takes responsibility for delivering the data
               to the user it must also acknowledge the receipt of the
               data.

               Once the TCP takes responsibility for the data it
               advances RCV.NXT over the data accepted, and adjusts
               RCV.WND as appropriate to the current buffer
               availability. The total of RCV.NXT and RCV.WND should
               not be reduced.

               Please note the window management suggestions in section
               3.7.

               Send an acknowledgment of the form:

                  <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

               This acknowledgment should be piggybacked on a segment
               being transmitted if possible without incurring undue
               delay.

            CLOSE-WAIT STATE
            CLOSING STATE
            LAST-ACK STATE
            TIME-WAIT STATE

               This should not occur, since a FIN has been received from
               the remote side. Ignore the segment text.

         eighth, check the FIN bit,

            Do not process the FIN if the state is CLOSED, LISTEN or
            SYN-SENT since the SEG.SEQ cannot be validated; drop the
            segment and return.

            If the FIN bit is set, signal the user "connection closing"
            and return any pending RECEIVEs with same message, advance
            RCV.NXT over the FIN, and send an acknowledgment for the
            FIN. Note that FIN implies PUSH for any segment text not
            yet delivered to the user.

               SYN-RECEIVED STATE
               ESTABLISHED STATE

                  Enter the CLOSE-WAIT state.

               FIN-WAIT-1 STATE

                  If our FIN has been ACKed (perhaps in this segment),
                  then enter TIME-WAIT, start the time-wait timer, turn
                  off the other timers; otherwise enter the CLOSING
                  state.

               FIN-WAIT-2 STATE

                  Enter the TIME-WAIT state. Start the time-wait timer,
                  turn off the other timers.

               CLOSE-WAIT STATE

                  Remain in the CLOSE-WAIT state.

               CLOSING STATE

                  Remain in the CLOSING state.

               LAST-ACK STATE

                  Remain in the LAST-ACK state.

               TIME-WAIT STATE

                  Remain in the TIME-WAIT state. Restart the 2 MSL
                  time-wait timeout.

         and return.

   USER TIMEOUT

      USER TIMEOUT

         For any state if the user timeout expires, flush all queues,
         signal the user "error: connection aborted due to user timeout"
         in general and for any outstanding calls, delete the TCB, enter
         the CLOSED state and return.

      RETRANSMISSION TIMEOUT

         For any state if the retransmission timeout expires on a
         segment in the retransmission queue, send the segment at the
         front of the retransmission queue again, reinitialize the
         retransmission timer, and return.

      TIME-WAIT TIMEOUT

         If the time-wait timeout expires on a connection, delete the
         TCB, enter the CLOSED state and return.

3.11. Glossary

   1822
      BBN Report 1822, "The Specification of the Interconnection of
      a Host and an IMP". The specification of the interface between
      a host and the ARPANET.

   ACK
      A control bit (acknowledge) occupying no sequence space,
      which indicates that the acknowledgment field of this segment
      specifies the next sequence number the sender of this segment
      is expecting to receive, hence acknowledging receipt of all
      previous sequence numbers.

   ARPANET message
      The unit of transmission between a host and an IMP in the
      ARPANET. The maximum size is about 1012 octets (8096 bits).

   ARPANET packet
      A unit of transmission used internally in the ARPANET between
      IMPs. The maximum size is about 126 octets (1008 bits).

   connection
      A logical communication path identified by a pair of sockets.

   datagram
      A message sent in a packet switched computer communications
      network.

   Destination Address
      The destination address, usually the network and host
      identifiers.

   FIN
      A control bit (finis) occupying one sequence number, which
      indicates that the sender will send no more data or control
      occupying sequence space.

   fragment
      A portion of a logical unit of data, in particular an
      internet fragment is a portion of an internet datagram.

   FTP
      A file transfer protocol.

   header
      Control information at the beginning of a message, segment,
      fragment, packet or block of data.

   host
      A computer. In particular a source or destination of
      messages from the point of view of the communication network.

   Identification
      An Internet Protocol field. This identifying value assigned
      by the sender aids in assembling the fragments of a datagram.

   IMP
      The Interface Message Processor, the packet switch of the
      ARPANET.

   internet address
      A source or destination address specific to the host level.

   internet datagram
      The unit of data exchanged between an internet module and the
      higher level protocol together with the internet header.

   internet fragment
      A portion of the data of an internet datagram with an
      internet header.

   IP
      Internet Protocol.

   IRS
      The Initial Receive Sequence number. The first sequence
      number used by the sender on a connection.

   ISN
      The Initial Sequence Number. The first sequence number used
      on a connection, (either ISS or IRS). Selected in a way that
      is unique within a given period of time and is unpredictable
      to attackers.

   ISS
      The Initial Send Sequence number. The first sequence number
      used by the sender on a connection.

   leader
      Control information at the beginning of a message or block of
      data. In particular, in the ARPANET, the control information
      on an ARPANET message at the host-IMP interface.

   left sequence
      This is the next sequence number to be acknowledged by the
      data receiving TCP (or the lowest currently unacknowledged
      sequence number) and is sometimes referred to as the left
      edge of the send window.

   local packet
      The unit of transmission within a local network.

   module
      An implementation, usually in software, of a protocol or
      other procedure.

   MSL
      Maximum Segment Lifetime, the time a TCP segment can exist in
      the internetwork system. Arbitrarily defined to be 2
      minutes.

   octet
      An eight bit byte.

   Options
      An Option field may contain several options, and each option
      may be several octets in length. The options are used
      primarily in testing situations; for example, to carry
      timestamps. Both the Internet Protocol and TCP provide for
      options fields.

   packet
      A package of data with a header which may or may not be
      logically complete. More often a physical packaging than a
      logical packaging of data.

   port
      The portion of a socket that specifies which logical input or
      output channel of a process is associated with the data.

   process
      A program in execution. A source or destination of data from
      the point of view of the TCP or other host-to-host protocol.

   PUSH
      A control bit occupying no sequence space, indicating that
      this segment contains data that must be pushed through to the
      receiving user.

   RCV.NXT
      receive next sequence number

   RCV.UP
      receive urgent pointer

   RCV.WND
      receive window

   receive next sequence number
      This is the next sequence number the local TCP is expecting
      to receive.

   receive window
      This represents the sequence numbers the local (receiving)
      TCP is willing to receive. Thus, the local TCP considers
      that segments overlapping the range RCV.NXT to RCV.NXT +
      RCV.WND - 1 carry acceptable data or control. Segments
      containing sequence numbers entirely outside of this range
      are considered duplicates and discarded.

   RST
      A control bit (reset), occupying no sequence space,
      indicating that the receiver should delete the connection
      without further interaction. The receiver can determine,
      based on the sequence number and acknowledgment fields of the
      incoming segment, whether it should honor the reset command
      or ignore it. In no case does receipt of a segment
      containing RST give rise to a RST in response.

   RTP
      Real Time Protocol: A host-to-host protocol for communication
      of time critical information.

   SEG.ACK
      segment acknowledgment

   SEG.LEN
      segment length

   SEG.PRC
      segment precedence value

   SEG.SEQ
      segment sequence

   SEG.UP
      segment urgent pointer field

   SEG.WND
      segment window field

   segment
      A logical unit of data, in particular a TCP segment is the
      unit of data transferred between a pair of TCP modules.

   segment acknowledgment
      The sequence number in the acknowledgment field of the
      arriving segment.

   segment length
      The amount of sequence number space occupied by a segment,
      including any controls which occupy sequence space.

   segment sequence
      The number in the sequence field of the arriving segment.

   send sequence
      This is the next sequence number the local (sending) TCP will
      use on the connection. It is initially selected from an
      initial sequence number curve (ISN) and is incremented for
      each octet of data or sequenced control transmitted.

   send window
      This represents the sequence numbers which the remote
      (receiving) TCP is willing to receive. It is the value of
      the window field specified in segments from the remote (data
      receiving) TCP. The range of new sequence numbers which may
      be emitted by a TCP lies between SND.NXT and SND.UNA +
      SND.WND - 1. (Retransmissions of sequence numbers between
      SND.UNA and SND.NXT are expected, of course.)

   SND.NXT
      send sequence

   SND.UNA
      left sequence

   SND.UP
      send urgent pointer

   SND.WL1
      segment sequence number at last window update

   SND.WL2
      segment acknowledgment number at last window update

   SND.WND
      send window

   socket
      An address which specifically includes a port identifier,
      that is, the concatenation of an Internet Address with a TCP
      port.

   Source Address
      The source address, usually the network and host identifiers.
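
   As an illustration of the "send window" entry above, the following
   minimal sketch computes how many new octets may be emitted and the
   range SND.NXT to SND.UNA + SND.WND - 1 they occupy. The function
   names usable_window and new_data_range are hypothetical and are
   not defined by this document. The sketch assumes 32-bit sequence
   numbers modulo 2**32 and treats a window that has shrunk below the
   amount of data in flight as leaving no room for new data.

```python
from typing import Optional, Tuple

MOD = 1 << 32  # sequence numbers are 32-bit and wrap around


def usable_window(snd_una: int, snd_nxt: int, snd_wnd: int) -> int:
    """Number of new octets that may be sent (hypothetical helper)."""
    in_flight = (snd_nxt - snd_una) % MOD  # sent but not yet acknowledged
    return max(0, snd_wnd - in_flight)


def new_data_range(snd_una: int, snd_nxt: int,
                   snd_wnd: int) -> Optional[Tuple[int, int]]:
    """Return (first, last) new sequence numbers, or None if none fit."""
    if usable_window(snd_una, snd_nxt, snd_wnd) == 0:
        return None
    # New data runs from SND.NXT through SND.UNA + SND.WND - 1.
    return snd_nxt, (snd_una + snd_wnd - 1) % MOD
```

   For example, with SND.UNA = 1000, SND.NXT = 1200, and SND.WND =
   500, 200 octets are in flight, so 300 new octets may be sent.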

   SYN
      A control bit in the incoming segment, occupying one sequence
      number, used at the initiation of a connection, to indicate
      where the sequence numbering will start.

   TCB
      Transmission control block, the data structure that records
      the state of a connection.

   TCB.PRC
      The precedence of the connection.

   TCP
      Transmission Control Protocol: A host-to-host protocol for
      reliable communication in internetwork environments.

   TOS
      Type of Service, an Internet Protocol field.

   Type of Service
      An Internet Protocol field which indicates the type of
      service for this internet fragment.

   URG
      A control bit (urgent), occupying no sequence space, used to
      indicate that the receiving user should be notified to do
      urgent processing as long as there is data to be consumed
      with sequence numbers less than the value indicated in the
      urgent pointer.

   urgent pointer
      A control field meaningful only when the URG bit is on. This
      field communicates the value of the urgent pointer which
      indicates the data octet associated with the sending user's
      urgent call.

4. Changes from RFC 793

   This document obsoletes RFC 793 as well as RFC 6093 and 6528, which
   updated 793. In all cases, only the normative protocol specification
   and requirements have been incorporated into this document, and the
   informational text with background and rationale has not been carried
   in. The informational content of those documents is still valuable
   in learning about and understanding TCP, and they are valid
   Informational references, even though their normative content has
   been incorporated into this document.

   The main body of this document was adapted from RFC 793's Section 3,
   titled "FUNCTIONAL SPECIFICATION", with an attempt to keep formatting
   and layout as close as possible.

   The collection of applicable RFC Errata that have been reported and
   either accepted or held for an update to RFC 793 was incorporated
   (Errata IDs: 573, 574, 700, 701, 1283, 1561, 1562, 1564, 1565, 1571,
   1572, 2296, 2297, 2298, 2748, 2749, 2934, 3213, 3300, 3301). Some
   errata were not applicable due to other changes (Errata IDs: 572,
   575, 1569, 3602). TODO: 3305

   Changes to the specification of the Urgent Pointer described in RFC
   1122 and 6093 were incorporated. See RFC 6093 for detailed
   discussion of why these changes were necessary.

   The more secure Initial Sequence Number generation algorithm from RFC
   6528 was incorporated. See RFC 6528 for discussion of the attacks
   that this mitigates, as well as advice on selecting PRF algorithms
   and managing secret key data.

   RFC EDITOR'S NOTE: the content below is for detailed change tracking
   and planning, and not to be included with the final revision of the
   document.

   This document started as draft-eddy-rfc793bis-00, which was merely a
   proposal and rough plan for updating RFC 793.

   The -01 revision of this document incorporates the content of RFC 793
   Section 3 titled "FUNCTIONAL SPECIFICATION". Other content from RFC
   793 has not been incorporated. The -01 revision of this document
   makes some minor formatting changes to the RFC 793 content in order
   to convert the content into XML2RFC format and account for left-out
   parts of RFC 793. For instance, figure numbering differs and some
   indentation is not exactly the same.

   The -02 revision of draft-eddy-rfc793bis incorporates errata that
   have been verified:

      Errata ID 573: Reported by Bob Braden (note: This erratum is
      essentially a reminder that RFC 1122 updates 793. Some of the
      associated changes are deferred to a separate revision that
      incorporates 1122. Bob's mention of PUSH in 793 section 2.8 was
      not applicable here because that section was not part of the
      "functional specification". Also, the 1122 text on the
      retransmission timeout has been updated by subsequent RFCs,
      so the change here deviates from Bob's suggestion to apply the
      1122 text.)
      Errata ID 574: Reported by Yin Shuming
      Errata ID 700: Reported by Yin Shuming
      Errata ID 701: Reported by Yin Shuming
      Errata ID 1283: Reported by Pei-chun Cheng
      Errata ID 1561: Reported by Constantin Hagemeier
      Errata ID 1562: Reported by Constantin Hagemeier
      Errata ID 1564: Reported by Constantin Hagemeier
      Errata ID 1565: Reported by Constantin Hagemeier
      Errata ID 1571: Reported by Constantin Hagemeier
      Errata ID 1572: Reported by Constantin Hagemeier
      Errata ID 2296: Reported by Vishwas Manral
      Errata ID 2297: Reported by Vishwas Manral
      Errata ID 2298: Reported by Vishwas Manral
      Errata ID 2748: Reported by Mykyta Yevstifeyev
      Errata ID 2749: Reported by Mykyta Yevstifeyev
      Errata ID 2934: Reported by Constantin Hagemeier
      Errata ID 3213: Reported by EungJun Yi
      Errata ID 3300: Reported by Botong Huang
      Errata ID 3301: Reported by Botong Huang
      Note: Some verified errata were not used in this update, as they
      relate to sections of RFC 793 elided from this document. These
      include Errata ID 572, 575, and 1569.
      Note: Errata ID 3602 was not applied in this revision because it
      duplicates the 1122 corrections.
      Erratum 3305 is currently reported and needs to be verified,
      held, or rejected by the ADs; it addresses the same issue as
      draft-gont-tcpm-tcp-seq-validation, and no attempt was made to
      apply it to this document.

   Unrelated to RFC 793 content, this revision also makes small tweaks
   to the introductory text, fixes the indentation of the pseudoheader
   diagram, and notes that the Security Considerations section should
   also cover privacy once it is written.

   The -03 revision of draft-eddy-rfc793bis revises all discussion of
   the urgent pointer in order to comply with RFC 6093, 1122, and 1011.
   Since 1122 contains requirements on the urgent pointer, the full list
   of requirements was brought into an appendix of this document so that
   it can be updated as needed.

   The -04 revision of draft-eddy-rfc793bis includes the ISN generation
   changes from RFC 6528.

   The -05 revision of draft-eddy-rfc793bis incorporates MSS
   requirements and definitions from RFC 879, 1122, and 6691, as well as
   option-handling requirements from RFC 1122.

   The -00 revision of draft-ietf-tcpm-rfc793bis incorporates several
   additional clarifications and updates to the section on segmentation,
   many of which are based on feedback from Joe Touch that improved on
   the initial text in the previous revision.

   TODO: Incomplete list of other planned changes - these can be added
   to and made more specific, as the document proceeds:

   1.  incorporate all other 1122 additions
   2.  point to major additional docs like 1323bis and 5681
   3.  incorporate relevant parts of 3168 (ECN)
   4.  incorporate Fernando's new number-checking fixes (if past the
       IESG in time)
   5.  point to 5461 (soft errors)
   6.  mention 5961 state machine option
   7.  mention 6161 (reducing TIME-WAIT)
   8.  incorporate 6429 (ZWP/persist)
   9.  look at Tony Sabatini suggestion for describing DO field
   10. clearly specify treatment of reserved bits (see TCPM thread on
       EDO draft April 25, 2014)
   11. look at possible mention of draft-minshall-nagle (e.g. as in
       Linux)
   12. make sure that clarifications in RFC 1011 are captured
   13. per TCPM discussion, discussion of checking reserved bits may
       need to be altered from 793

5. IANA Considerations

   This memo includes no request to IANA. Existing IANA registries for
   TCP parameters are sufficient.

   TODO: check whether entries pointing to 793 and other documents
   obsoleted by this one should be updated to point to this one instead.

6. Security and Privacy Considerations

   TODO

   See RFC 6093 [13] for discussion of security considerations related
   to the urgent pointer field.

   Editor's Note: Scott Brim mentioned that this should include a
   PERPASS/privacy review.

7. Acknowledgements

   This document is largely a revision of RFC 793, of which Jon Postel
   was the editor. Thanks to his excellent work, it lasted for three
   decades before we felt the need to revise it.

   Andre Oppermann was a contributor and helped to edit the first
   revision of this document.

   We are thankful for the assistance of the IETF TCPM working group
   chairs:

      Michael Scharf
      Yoshifumi Nishida
      Pasi Sarolahti

   During early discussion of this work on the TCPM mailing list, and at
   the IETF 88 meeting in Vancouver, helpful comments, critiques, and
   reviews were received from (listed alphabetically): David Borman,
   Yuchung Cheng, Martin Duke, Kevin Lahey, Kevin Mason, Matt Mathis,
   Hagen Paul Pfeifer, Anthony Sabatini, Joe Touch, Reji Varghese, Lloyd
   Wood, and Alex Zimmermann.

   This document includes content from errata that were reported by
   (listed chronologically): Yin Shuming, Bob Braden, Morris M. Keesan,
   Pei-chun Cheng, Constantin Hagemeier, Vishwas Manral, Mykyta
   Yevstifeyev, EungJun Yi, Botong Huang.

8. References

8.1. Normative References

   [1]   Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
         November 1990.

   [2]   McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
         for IP version 6", RFC 1981, August 1996.

   [3]   Bradner, S., "Key words for use in RFCs to Indicate
         Requirement Levels", BCP 14, RFC 2119, March 1997.

   [4]   Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms",
         RFC 2675, August 1999.

   [5]   Lahey, K., "TCP Problems with Path MTU Discovery", RFC
         2923, September 2000.

8.2. Informative References

   [6]   Postel, J., "Transmission Control Protocol", STD 7, RFC
         793, September 1981.

   [7]   Nagle, J., "Congestion control in IP/TCP internetworks",
         RFC 896, January 1984.

   [8]   Braden, R., "Requirements for Internet Hosts -
         Communication Layers", STD 3, RFC 1122, October 1989.

   [9]   Mathis, M. and J. Heffner, "Packetization Layer Path MTU
         Discovery", RFC 4821, March 2007.

   [10]  Culley, P., Elzur, U., Recio, R., Bailey, S., and J.
         Carrier, "Marker PDU Aligned Framing for TCP
         Specification", RFC 5044, October 2007.

   [11]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
         Control", RFC 5681, September 2009.

   [12]  Sandlund, K., Pelletier, G., and L-E. Jonsson, "The RObust
         Header Compression (ROHC) Framework", RFC 5795, March
         2010.

   [13]  Gont, F. and A. Yourtchenko, "On the Implementation of the
         TCP Urgent Mechanism", RFC 6093, January 2011.

   [14]  Gont, F. and S. Bellovin, "Defending against Sequence
         Number Attacks", RFC 6528, February 2012.

   [15]  Borman, D., "TCP Options and Maximum Segment Size (MSS)",
         RFC 6691, July 2012.

   [16]  Duke, M., Braden, R., Eddy, W., Blanton, E., and A.
         Zimmermann, "A Roadmap for Transmission Control Protocol
         (TCP) Specification Documents", RFC 7414, February 2015.

Appendix A. TCP Requirement Summary

   This section is adapted from RFC 1122.

   TODO: this needs to be seriously redone, to use 793bis section
   numbers instead of 1122 ones, the RFC1122 heading should be removed,
   and all 1122 requirements need to be reflected in 793bis text.

   TODO: NOTE that PMTUD+PLPMTUD is not included in this table of
   recommendations.

                                                 |        | | | |S| |
                                                 |        | | | |H| |F
                                                 |        | | | |O|M|o
                                                 |        | |S| |U|U|o
                                                 |        | |H| |L|S|t
                                                 |        |M|O| |D|T|n
                                                 |        |U|U|M| | |o
                                                 |        |S|L|A|N|N|t
                                                 |RFC1122 |T|D|Y|O|O|t
FEATURE                                          |SECTION | | | |T|T|e
-------------------------------------------------|--------|-|-|-|-|-|--
                                                 |        | | | | | |
Push flag                                        |        | | | | | |
  Aggregate or queue un-pushed data              |4.2.2.2 | | |x| | |
  Sender collapse successive PSH flags           |4.2.2.2 | |x| | | |
  SEND call can specify PUSH                     |4.2.2.2 | | |x| | |
  If cannot: sender buffer indefinitely          |4.2.2.2 | | | | |x|
  If cannot: PSH last segment                    |4.2.2.2 |x| | | | |
  Notify receiving ALP of PSH                    |4.2.2.2 | | |x| | |1
  Send max size segment when possible            |4.2.2.2 | |x| | | |
                                                 |        | | | | | |
Window                                           |        | | | | | |
  Treat as unsigned number                       |4.2.2.3 |x| | | | |
  Handle as 32-bit number                        |4.2.2.3 | |x| | | |
  Shrink window from right                       |4.2.2.16| | | |x| |
  Robust against shrinking window                |4.2.2.16|x| | | | |
  Receiver's window closed indefinitely          |4.2.2.17| | |x| | |
  Sender probe zero window                       |4.2.2.17|x| | | | |
  First probe after RTO                          |4.2.2.17| |x| | | |
  Exponential backoff                            |4.2.2.17| |x| | | |
  Allow window stay zero indefinitely            |4.2.2.17|x| | | | |
  Sender timeout OK conn with zero wind          |4.2.2.17| | | | |x|
                                                 |        | | | | | |
Urgent Data                                      |        | | | | | |
  Pointer indicates first non-urgent octet       |4.2.2.4 |x| | | | |
  Arbitrary length urgent data sequence          |4.2.2.4 |x| | | | |
  Inform ALP asynchronously of urgent data       |4.2.2.4 |x| | | | |1
  ALP can learn if/how much urgent data Q'd      |4.2.2.4 |x| | | | |1
                                                 |        | | | | | |
TCP Options                                      |        | | | | | |
  Receive TCP option in any segment              |4.2.2.5 |x| | | | |
  Ignore unsupported options                     |4.2.2.5 |x| | | | |
  Cope with illegal option length                |4.2.2.5 |x| | | | |
  Implement sending & receiving MSS option       |4.2.2.6 |x| | | | |
  IPv4 Send MSS option unless 536                |4.2.2.6 | |x| | | |
  IPv6 Send MSS option unless 1220               | N/A    | |x| | | |
  Send MSS option always                         |4.2.2.6 | | |x| | |
  IPv4 Send-MSS default is 536                   |4.2.2.6 |x| | | | |
  IPv6 Send-MSS default is 1220                  | N/A    |x| | | | |
  Calculate effective send seg size              |4.2.2.6 |x| | | | |
  MSS accounts for varying MTU                   | N/A    | |x| | | |
                                                 |        | | | | | |
TCP Checksums                                    |        | | | | | |
  Sender compute checksum                        |4.2.2.7 |x| | | | |
  Receiver check checksum                        |4.2.2.7 |x| | | | |
                                                 |        | | | | | |
ISN Selection                                    |        | | | | | |
  Include a clock-driven ISN generator component |4.2.2.9 |x| | | | |
  Secure ISN generator with a PRF component      | N/A    | |x| | | |
                                                 |        | | | | | |
Opening Connections                              |        | | | | | |
  Support simultaneous open attempts             |4.2.2.10|x| | | | |
  SYN-RCVD remembers last state                  |4.2.2.11|x| | | | |
  Passive Open call interfere with others        |4.2.2.18| | | | |x|
  Function: simultan. LISTENs for same port      |4.2.2.18|x| | | | |
  Ask IP for src address for SYN if necc.        |4.2.3.7 |x| | | | |
  Otherwise, use local addr of conn.             |4.2.3.7 |x| | | | |
  OPEN to broadcast/multicast IP Address         |4.2.3.14| | | | |x|
  Silently discard seg to bcast/mcast addr       |4.2.3.14|x| | | | |
                                                 |        | | | | | |
Closing Connections                              |        | | | | | |
  RST can contain data                           |4.2.2.12| |x| | | |
  Inform application of aborted conn             |4.2.2.13|x| | | | |
  Half-duplex close connections                  |4.2.2.13| | |x| | |
  Send RST to indicate data lost                 |4.2.2.13| |x| | | |
  In TIME-WAIT state for 2MSL seconds            |4.2.2.13|x| | | | |
  Accept SYN from TIME-WAIT state                |4.2.2.13| | |x| | |
                                                 |        | | | | | |
Retransmissions                                  |        | | | | | |
  Jacobson Slow Start algorithm                  |4.2.2.15|x| | | | |
  Jacobson Congestion-Avoidance algorithm        |4.2.2.15|x| | | | |
  Retransmit with same IP ident                  |4.2.2.15| | |x| | |
  Karn's algorithm                               |4.2.3.1 |x| | | | |
  Jacobson's RTO estimation alg.                 |4.2.3.1 |x| | | | |
  Exponential backoff                            |4.2.3.1 |x| | | | |
  SYN RTO calc same as data                      |4.2.3.1 | |x| | | |
  Recommended initial values and bounds          |4.2.3.1 | |x| | | |
                                                 |        | | | | | |
Generating ACK's:                                |        | | | | | |
  Queue out-of-order segments                    |4.2.2.20| |x| | | |
  Process all Q'd before send ACK                |4.2.2.20|x| | | | |
  Send ACK for out-of-order segment              |4.2.2.21| | |x| | |
  Delayed ACK's                                  |4.2.3.2 | |x| | | |
  Delay < 0.5 seconds                            |4.2.3.2 |x| | | | |
  Every 2nd full-sized segment ACK'd             |4.2.3.2 |x| | | | |
  Receiver SWS-Avoidance Algorithm               |4.2.3.3 |x| | | | |
                                                 |        | | | | | |
Sending data                                     |        | | | | | |
  Configurable TTL                               |4.2.2.19|x| | | | |
  Sender SWS-Avoidance Algorithm                 |4.2.3.4 |x| | | | |
  Nagle algorithm                                |4.2.3.4 | |x| | | |
  Application can disable Nagle algorithm        |4.2.3.4 |x| | | | |
                                                 |        | | | | | |
Connection Failures:                             |        | | | | | |
  Negative advice to IP on R1 retxs              |4.2.3.5 |x| | | | |
  Close connection on R2 retxs                   |4.2.3.5 |x| | | | |
  ALP can set R2                                 |4.2.3.5 |x| | | | |1
  Inform ALP of R1<=retxs [...] inform ALP       |4.2.3.9 | |x| | | |
  Dest. Unreach (0,1,5) => abort conn            |4.2.3.9 | | | | |x|
  Dest. Unreach (2-4) => abort conn              |4.2.3.9 | |x| | | |
  Source Quench => slow start                    |4.2.3.9 | |x| | | |
  Time Exceeded => tell ALP, don't abort         |4.2.3.9 | |x| | | |
  Param Problem => tell ALP, don't abort         |4.2.3.9 | |x| | | |
                                                 |        | | | | | |
Address Validation                               |        | | | | | |
  Reject OPEN call to invalid IP address         |4.2.3.10|x| | | | |
  Reject SYN from invalid IP address             |4.2.3.10|x| | | | |
  Silently discard SYN to bcast/mcast addr       |4.2.3.10|x| | | | |
                                                 |        | | | | | |
TCP/ALP Interface Services                       |        | | | | | |
  Error Report mechanism                         |4.2.4.1 |x| | | | |
  ALP can disable Error Report Routine           |4.2.4.1 | |x| | | |
  ALP can specify TOS for sending                |4.2.4.2 |x| | | | |
  Passed unchanged to IP                         |4.2.4.2 | |x| | | |
  ALP can change TOS during connection           |4.2.4.2 | |x| | | |
  Pass received TOS up to ALP                    |4.2.4.2 | | |x| | |
  FLUSH call                                     |4.2.4.3 | | |x| | |
  Optional local IP addr parm. in OPEN           |4.2.4.4 |x| | | | |
-------------------------------------------------|--------|-|-|-|-|-|--

   FOOTNOTES: (1) "ALP" means Application-Layer program.

Author's Address

   Wesley M. Eddy (editor)
   MTI Systems
   US

   Email: wes@mti-systems.com