| draft-ietf-tcpm-rfc793bis-25.txt | draft-ietf-tcpm-rfc793bis-28.txt | |||
|---|---|---|---|---|
| Internet Engineering Task Force W. Eddy, Ed. | Internet Engineering Task Force W. Eddy, Ed. | |||
| Internet-Draft MTI Systems | Internet-Draft MTI Systems | |||
| Obsoletes: 793, 879, 2873, 6093, 6429, 6528, 7 September 2021 | Obsoletes: 793, 879, 2873, 6093, 6429, 6528, 7 March 2022 | |||
| 6691 (if approved) | 6691 (if approved) | |||
| Updates: 5961, 1122 (if approved) | Updates: 5961, 1011, 1122 (if approved) | |||
| Intended status: Standards Track | Intended status: Standards Track | |||
| Expires: 11 March 2022 | Expires: 8 September 2022 | |||
| Transmission Control Protocol (TCP) Specification | Transmission Control Protocol (TCP) Specification | |||
| draft-ietf-tcpm-rfc793bis-25 | draft-ietf-tcpm-rfc793bis-28 | |||
| Abstract | Abstract | |||
| This document specifies the Transmission Control Protocol (TCP). TCP | This document specifies the Transmission Control Protocol (TCP). TCP | |||
| is an important transport layer protocol in the Internet protocol | is an important transport layer protocol in the Internet protocol | |||
| stack, and has continuously evolved over decades of use and growth of | stack, and has continuously evolved over decades of use and growth of | |||
| the Internet. Over this time, a number of changes have been made to | the Internet. Over this time, a number of changes have been made to | |||
| TCP as it was specified in RFC 793, though these have only been | TCP as it was specified in RFC 793, though these have only been | |||
| documented in a piecemeal fashion. This document collects and brings | documented in a piecemeal fashion. This document collects and brings | |||
| those changes together with the protocol specification from RFC 793. | those changes together with the protocol specification from RFC 793. | |||
| This document obsoletes RFC 793, as well as RFCs 879, 2873, 6093, | This document obsoletes RFC 793, as well as RFCs 879, 2873, 6093, | |||
| 6429, 6528, and 6691 that updated parts of RFC 793. It updates RFC | 6429, 6528, and 6691 that updated parts of RFC 793. It updates RFCs | |||
| 1122, and should be considered as a replacement for the portions of | 1011 and 1122, and should be considered as a replacement for the | |||
| that document dealing with TCP requirements. It also updates RFC | portions of those documents dealing with TCP requirements. It also | |||
| 5961 by adding a small clarification in reset handling while in the | updates RFC 5961 by adding a small clarification in reset handling | |||
| SYN-RECEIVED state. The TCP header control bits from RFC 793 have | while in the SYN-RECEIVED state. The TCP header control bits from | |||
| also been updated based on RFC 3168. | RFC 793 have also been updated based on RFC 3168. | |||
| RFC EDITOR NOTE: If approved for publication as an RFC, this should | RFC EDITOR NOTE: If approved for publication as an RFC, this should | |||
| be marked additionally as "STD: 7" and replace RFC 793 in that role. | be marked additionally as "STD: 7" and replace RFC 793 in that role. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on 11 March 2022. | This Internet-Draft will expire on 8 September 2022. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
| license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
| Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
| and restrictions with respect to this document. Code Components | and restrictions with respect to this document. Code Components | |||
| extracted from this document must include Simplified BSD License text | extracted from this document must include Revised BSD License text as | |||
| as described in Section 4.e of the Trust Legal Provisions and are | described in Section 4.e of the Trust Legal Provisions and are | |||
| provided without warranty as described in the Simplified BSD License. | provided without warranty as described in the Revised BSD License. | |||
| This document may contain material from IETF Documents or IETF | This document may contain material from IETF Documents or IETF | |||
| Contributions published or made publicly available before November | Contributions published or made publicly available before November | |||
| 10, 2008. The person(s) controlling the copyright in some of this | 10, 2008. The person(s) controlling the copyright in some of this | |||
| material may not have granted the IETF Trust the right to allow | material may not have granted the IETF Trust the right to allow | |||
| modifications of such material outside the IETF Standards Process. | modifications of such material outside the IETF Standards Process. | |||
| Without obtaining an adequate license from the person(s) controlling | Without obtaining an adequate license from the person(s) controlling | |||
| the copyright in such materials, this document may not be modified | the copyright in such materials, this document may not be modified | |||
| outside the IETF Standards Process, and derivative works of it may | outside the IETF Standards Process, and derivative works of it may | |||
| not be created outside the IETF Standards Process, except to format | not be created outside the IETF Standards Process, except to format | |||
| it for publication as an RFC or to translate it into languages other | it for publication as an RFC or to translate it into languages other | |||
| than English. | than English. | |||
| Table of Contents | Table of Contents | |||
| 1. Purpose and Scope . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Purpose and Scope . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 2.1. Requirements Language . . . . . . . . . . . . . . . . . . 5 | 2.1. Requirements Language . . . . . . . . . . . . . . . . . . 5 | |||
| 2.2. Key TCP Concepts . . . . . . . . . . . . . . . . . . . . 5 | 2.2. Key TCP Concepts . . . . . . . . . . . . . . . . . . . . 6 | |||
| 3. Functional Specification . . . . . . . . . . . . . . . . . . 6 | 3. Functional Specification . . . . . . . . . . . . . . . . . . 6 | |||
| 3.1. Header Format . . . . . . . . . . . . . . . . . . . . . . 6 | 3.1. Header Format . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 3.2. Specific Option Definitions . . . . . . . . . . . . . . . 11 | 3.2. Specific Option Definitions . . . . . . . . . . . . . . . 12 | |||
| 3.2.1. Other Common Options . . . . . . . . . . . . . . . . 13 | 3.2.1. Other Common Options . . . . . . . . . . . . . . . . 13 | |||
| 3.2.2. Experimental TCP Options . . . . . . . . . . . . . . 13 | 3.2.2. Experimental TCP Options . . . . . . . . . . . . . . 13 | |||
| 3.3. TCP Terminology Overview . . . . . . . . . . . . . . . . 13 | 3.3. TCP Terminology Overview . . . . . . . . . . . . . . . . 13 | |||
| 3.3.1. Key Connection State Variables . . . . . . . . . . . 13 | 3.3.1. Key Connection State Variables . . . . . . . . . . . 13 | |||
| 3.3.2. State Machine Overview . . . . . . . . . . . . . . . 15 | 3.3.2. State Machine Overview . . . . . . . . . . . . . . . 15 | |||
| 3.4. Sequence Numbers . . . . . . . . . . . . . . . . . . . . 18 | 3.4. Sequence Numbers . . . . . . . . . . . . . . . . . . . . 18 | |||
| 3.5. Establishing a connection . . . . . . . . . . . . . . . . 26 | 3.4.1. Initial Sequence Number Selection . . . . . . . . . . 21 | |||
| 3.4.2. Knowing When to Keep Quiet . . . . . . . . . . . . . 23 | ||||
| 3.4.3. The TCP Quiet Time Concept . . . . . . . . . . . . . 23 | ||||
| 3.5. Establishing a connection . . . . . . . . . . . . . . . . 25 | ||||
| 3.5.1. Half-Open Connections and Other Anomalies . . . . . . 28 | ||||
| 3.5.2. Reset Generation . . . . . . . . . . . . . . . . . . 31 | ||||
| 3.5.3. Reset Processing . . . . . . . . . . . . . . . . . . 32 | ||||
| 3.6. Closing a Connection . . . . . . . . . . . . . . . . . . 32 | 3.6. Closing a Connection . . . . . . . . . . . . . . . . . . 32 | |||
| 3.6.1. Half-Closed Connections . . . . . . . . . . . . . . . 35 | 3.6.1. Half-Closed Connections . . . . . . . . . . . . . . . 35 | |||
| 3.7. Segmentation . . . . . . . . . . . . . . . . . . . . . . 35 | 3.7. Segmentation . . . . . . . . . . . . . . . . . . . . . . 35 | |||
| 3.7.1. Maximum Segment Size Option . . . . . . . . . . . . . 37 | 3.7.1. Maximum Segment Size Option . . . . . . . . . . . . . 37 | |||
| 3.7.2. Path MTU Discovery . . . . . . . . . . . . . . . . . 38 | 3.7.2. Path MTU Discovery . . . . . . . . . . . . . . . . . 38 | |||
| 3.7.3. Interfaces with Variable MTU Values . . . . . . . . . 39 | 3.7.3. Interfaces with Variable MTU Values . . . . . . . . . 39 | |||
| 3.7.4. Nagle Algorithm . . . . . . . . . . . . . . . . . . . 39 | 3.7.4. Nagle Algorithm . . . . . . . . . . . . . . . . . . . 39 | |||
| 3.7.5. IPv6 Jumbograms . . . . . . . . . . . . . . . . . . . 40 | 3.7.5. IPv6 Jumbograms . . . . . . . . . . . . . . . . . . . 40 | |||
| 3.8. Data Communication . . . . . . . . . . . . . . . . . . . 40 | 3.8. Data Communication . . . . . . . . . . . . . . . . . . . 40 | |||
| 3.8.1. Retransmission Timeout . . . . . . . . . . . . . . . 41 | 3.8.1. Retransmission Timeout . . . . . . . . . . . . . . . 41 | |||
| 3.8.2. TCP Congestion Control . . . . . . . . . . . . . . . 41 | 3.8.2. TCP Congestion Control . . . . . . . . . . . . . . . 41 | |||
| 3.8.3. TCP Connection Failures . . . . . . . . . . . . . . . 42 | 3.8.3. TCP Connection Failures . . . . . . . . . . . . . . . 42 | |||
| 3.8.4. TCP Keep-Alives . . . . . . . . . . . . . . . . . . . 43 | 3.8.4. TCP Keep-Alives . . . . . . . . . . . . . . . . . . . 43 | |||
| 3.8.5. The Communication of Urgent Information . . . . . . . 44 | 3.8.5. The Communication of Urgent Information . . . . . . . 44 | |||
| 3.8.6. Managing the Window . . . . . . . . . . . . . . . . . 45 | 3.8.6. Managing the Window . . . . . . . . . . . . . . . . . 45 | |||
| 3.9. Interfaces . . . . . . . . . . . . . . . . . . . . . . . 50 | 3.9. Interfaces . . . . . . . . . . . . . . . . . . . . . . . 50 | |||
| 3.9.1. User/TCP Interface . . . . . . . . . . . . . . . . . 50 | 3.9.1. User/TCP Interface . . . . . . . . . . . . . . . . . 50 | |||
| 3.9.2. TCP/Lower-Level Interface . . . . . . . . . . . . . . 59 | 3.9.2. TCP/Lower-Level Interface . . . . . . . . . . . . . . 59 | |||
| 3.10. Event Processing . . . . . . . . . . . . . . . . . . . . 62 | 3.10. Event Processing . . . . . . . . . . . . . . . . . . . . 61 | |||
| 3.10.1. OPEN Call . . . . . . . . . . . . . . . . . . . . . 63 | 3.10.1. OPEN Call . . . . . . . . . . . . . . . . . . . . . 63 | |||
| 3.10.2. SEND Call . . . . . . . . . . . . . . . . . . . . . 64 | 3.10.2. SEND Call . . . . . . . . . . . . . . . . . . . . . 64 | |||
| 3.10.3. RECEIVE Call . . . . . . . . . . . . . . . . . . . . 66 | 3.10.3. RECEIVE Call . . . . . . . . . . . . . . . . . . . . 65 | |||
| 3.10.4. CLOSE Call . . . . . . . . . . . . . . . . . . . . . 67 | 3.10.4. CLOSE Call . . . . . . . . . . . . . . . . . . . . . 67 | |||
| 3.10.5. ABORT Call . . . . . . . . . . . . . . . . . . . . . 68 | 3.10.5. ABORT Call . . . . . . . . . . . . . . . . . . . . . 68 | |||
| 3.10.6. STATUS Call . . . . . . . . . . . . . . . . . . . . 69 | 3.10.6. STATUS Call . . . . . . . . . . . . . . . . . . . . 69 | |||
| 3.10.7. SEGMENT ARRIVES . . . . . . . . . . . . . . . . . . 70 | 3.10.7. SEGMENT ARRIVES . . . . . . . . . . . . . . . . . . 70 | |||
| 3.10.8. Timeouts . . . . . . . . . . . . . . . . . . . . . . 84 | 3.10.8. Timeouts . . . . . . . . . . . . . . . . . . . . . . 84 | |||
| 4. Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . 84 | 4. Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . 84 | |||
| 5. Changes from RFC 793 . . . . . . . . . . . . . . . . . . . . 89 | 5. Changes from RFC 793 . . . . . . . . . . . . . . . . . . . . 89 | |||
| 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 95 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 96 | |||
| 7. Security and Privacy Considerations . . . . . . . . . . . . . 96 | 7. Security and Privacy Considerations . . . . . . . . . . . . . 97 | |||
| 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 98 | 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 99 | |||
| 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 99 | 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 100 | |||
| 9.1. Normative References . . . . . . . . . . . . . . . . . . 99 | 9.1. Normative References . . . . . . . . . . . . . . . . . . 100 | |||
| 9.2. Informative References . . . . . . . . . . . . . . . . . 100 | 9.2. Informative References . . . . . . . . . . . . . . . . . 102 | |||
| Appendix A. Other Implementation Notes . . . . . . . . . . . . . 106 | Appendix A. Other Implementation Notes . . . . . . . . . . . . . 107 | |||
| A.1. IP Security Compartment and Precedence . . . . . . . . . 106 | A.1. IP Security Compartment and Precedence . . . . . . . . . 108 | |||
| A.1.1. Precedence . . . . . . . . . . . . . . . . . . . . . 106 | A.1.1. Precedence . . . . . . . . . . . . . . . . . . . . . 108 | |||
| A.1.2. MLS Systems . . . . . . . . . . . . . . . . . . . . . 107 | A.1.2. MLS Systems . . . . . . . . . . . . . . . . . . . . . 109 | |||
| A.2. Sequence Number Validation . . . . . . . . . . . . . . . 107 | A.2. Sequence Number Validation . . . . . . . . . . . . . . . 109 | |||
| A.3. Nagle Modification . . . . . . . . . . . . . . . . . . . 108 | A.3. Nagle Modification . . . . . . . . . . . . . . . . . . . 109 | |||
| A.4. Low Water Mark Settings . . . . . . . . . . . . . . . . . 108 | A.4. Low Watermark Settings . . . . . . . . . . . . . . . . . 110 | |||
| Appendix B. TCP Requirement Summary . . . . . . . . . . . . . . 108 | Appendix B. TCP Requirement Summary . . . . . . . . . . . . . . 110 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 112 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 114 | |||
| 1. Purpose and Scope | 1. Purpose and Scope | |||
| In 1981, RFC 793 [16] was released, documenting the Transmission | In 1981, RFC 793 [16] was released, documenting the Transmission | |||
| Control Protocol (TCP), and replacing earlier specifications for TCP | Control Protocol (TCP), and replacing earlier specifications for TCP | |||
| that had been published in the past. | that had been published in the past. | |||
| Since then, TCP has been widely implemented, and has been used as a | Since then, TCP has been widely implemented, and has been used as a | |||
| transport protocol for numerous applications on the Internet. | transport protocol for numerous applications on the Internet. | |||
| For several decades, RFC 793 plus a number of other documents have | For several decades, RFC 793 plus a number of other documents have | |||
| combined to serve as the core specification for TCP [48]. Over time, | combined to serve as the core specification for TCP [50]. Over time, | |||
| a number of errata have been filed against RFC 793, as well as | a number of errata have been filed against RFC 793. There have also | |||
| deficiencies in security, performance, and many other aspects. The | been deficiencies found and resolved in security, performance, and | |||
| number of enhancements has grown over time across many separate | many other aspects. The number of enhancements has grown over time | |||
| documents. These were never accumulated together into a | across many separate documents. These were never accumulated | |||
| comprehensive update to the base specification. | together into a comprehensive update to the base specification. | |||
| The purpose of this document is to bring together all of the IETF | The purpose of this document is to bring together all of the IETF | |||
| Standards Track changes that have been made to the base TCP | Standards Track changes and other clarifications that have been made | |||
| functional specification and unify them into an update of RFC 793. | to the base TCP functional specification and unify them into an | |||
| updated version of RFC 793. | ||||
| Some companion documents are referenced for important algorithms that | Some companion documents are referenced for important algorithms that | |||
| are used by TCP (e.g. for congestion control), but have not been | are used by TCP (e.g. for congestion control), but have not been | |||
| completely included in this document. This is a conscious choice, as | completely included in this document. This is a conscious choice, as | |||
| this base specification can be used with multiple additional | this base specification can be used with multiple additional | |||
| algorithms that are developed and incorporated separately. This | algorithms that are developed and incorporated separately. This | |||
| document focuses on the common basis all TCP implementations must | document focuses on the common basis all TCP implementations must | |||
| support in order to interoperate. Since some additional TCP features | support in order to interoperate. Since some additional TCP features | |||
| have become quite complicated themselves (e.g. advanced loss recovery | have become quite complicated themselves (e.g. advanced loss recovery | |||
| and congestion control), future companion documents may attempt to | and congestion control), future companion documents may attempt to | |||
| skipping to change at page 4, line 46 | skipping to change at page 5, line 9 | |||
| explanations and rationale, where appropriate. | explanations and rationale, where appropriate. | |||
| This document is intended to be useful both in checking existing TCP | This document is intended to be useful both in checking existing TCP | |||
| implementations for conformance purposes, as well as in writing new | implementations for conformance purposes, as well as in writing new | |||
| implementations. | implementations. | |||
| 2. Introduction | 2. Introduction | |||
| RFC 793 contains a discussion of the TCP design goals and provides | RFC 793 contains a discussion of the TCP design goals and provides | |||
| examples of its operation, including examples of connection | examples of its operation, including examples of connection | |||
| establishment, connection termination, packet retransmission to | establishment, connection termination, and packet retransmission to | |||
| repair losses. | repair losses. | |||
| This document describes the basic functionality expected in modern | This document describes the basic functionality expected in modern | |||
| TCP implementations, and replaces the protocol specification in RFC | TCP implementations, and replaces the protocol specification in RFC | |||
| 793. It does not replicate or attempt to update the introduction and | 793. It does not replicate or attempt to update the introduction and | |||
| philosophy content in Sections 1 and 2 of RFC 793. Other documents | philosophy content in Sections 1 and 2 of RFC 793. Other documents | |||
| are referenced to provide explanation of the theory of operation, | are referenced to provide explanation of the theory of operation, | |||
| rationale, and detailed discussion of design decisions. This | rationale, and detailed discussion of design decisions. This | |||
| document only focuses on the normative behavior of the protocol. | document only focuses on the normative behavior of the protocol. | |||
| The "TCP Roadmap" [48] provides a more extensive guide to the RFCs | The "TCP Roadmap" [50] provides a more extensive guide to the RFCs | |||
| that define TCP and describe various important algorithms. The TCP | that define TCP and describe various important algorithms. The TCP | |||
| Roadmap contains sections on strongly encouraged enhancements that | Roadmap contains sections on strongly encouraged enhancements that | |||
| improve performance and other aspects of TCP beyond the basic | improve performance and other aspects of TCP beyond the basic | |||
| operation specified in this document. As one example, implementing | operation specified in this document. As one example, implementing | |||
| congestion control (e.g. [9]) is a TCP requirement, but is a complex | congestion control (e.g. [8]) is a TCP requirement, but is a complex | |||
| topic on its own, and not described in detail in this document, as | topic on its own, and not described in detail in this document, as | |||
| there are many options and possibilities that do not impact basic | there are many options and possibilities that do not impact basic | |||
| interoperability. Similarly, most TCP implementations today include | interoperability. Similarly, most TCP implementations today include | |||
| the high-performance extensions in [46], but these are not strictly | the high-performance extensions in [48], but these are not strictly | |||
| required or discussed in this document. Multipath considerations for | required or discussed in this document. Multipath considerations for | |||
| TCP are also specified separately in [57]. | TCP are also specified separately in [59]. | |||
| A list of changes from RFC 793 is contained in Section 5. | A list of changes from RFC 793 is contained in Section 5. | |||
| 2.1. Requirements Language | 2.1. Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| 14 [3][12] when, and only when, they appear in all capitals, as shown | 14 [3][12] when, and only when, they appear in all capitals, as shown | |||
| here. | here. | |||
| skipping to change at page 6, line 16 | skipping to change at page 6, line 24 | |||
| segments, with each TCP segment sent as an Internet Protocol (IP) | segments, with each TCP segment sent as an Internet Protocol (IP) | |||
| datagram. | datagram. | |||
| TCP reliability consists of detecting packet losses (via sequence | TCP reliability consists of detecting packet losses (via sequence | |||
| numbers) and errors (via per-segment checksums), as well as | numbers) and errors (via per-segment checksums), as well as | |||
| correction via retransmission. | correction via retransmission. | |||
| TCP supports unicast delivery of data. Anycast applications exist | TCP supports unicast delivery of data. Anycast applications exist | |||
| that successfully use TCP without modifications, though there is some | that successfully use TCP without modifications, though there is some | |||
| risk of instability due to changes of lower-layer forwarding behavior | risk of instability due to changes of lower-layer forwarding behavior | |||
| [45]. | [47]. | |||
| TCP is connection-oriented, though does not inherently include a | TCP is connection-oriented, though does not inherently include a | |||
| liveness detection capability. | liveness detection capability. | |||
| Data flow is supported bidirectionally over TCP connections, though | Data flow is supported bidirectionally over TCP connections, though | |||
| applications are free to send data only unidirectionally, if they so | applications are free to send data only unidirectionally, if they so | |||
| choose. | choose. | |||
| TCP uses port numbers to identify application services and to | TCP uses port numbers to identify application services and to | |||
| multiplex distinct flows between hosts. | multiplex distinct flows between hosts. | |||
| A more detailed description of TCP features compared to other | A more detailed description of TCP features compared to other | |||
| transport protocols can be found in Section 3.1 of [51]. Further | transport protocols can be found in Section 3.1 of [53]. Further | |||
| description of the motivations for developing TCP and its role in the | description of the motivations for developing TCP and its role in the | |||
| Internet protocol stack can be found in Section 2 of [16] and earlier | Internet protocol stack can be found in Section 2 of [16] and earlier | |||
| versions of the TCP specification. | versions of the TCP specification. | |||
| 3. Functional Specification | 3. Functional Specification | |||
| 3.1. Header Format | 3.1. Header Format | |||
| TCP segments are sent as internet datagrams. The Internet Protocol | TCP segments are sent as internet datagrams. The Internet Protocol | |||
| (IP) header carries several information fields, including the source | (IP) header carries several information fields, including the source | |||
| and destination host addresses [1] [13]. A TCP header follows the IP | and destination host addresses [1] [13]. A TCP header follows the IP | |||
| headers, supplying information specific to the TCP protocol. This | headers, supplying information specific to the TCP protocol. This | |||
| division allows for the existence of host level protocols other than | division allows for the existence of host level protocols other than | |||
| TCP. In early development of the Internet suite of protocols, the IP | TCP. In early development of the Internet suite of protocols, the IP | |||
| header fields had been a part of TCP. | header fields had been a part of TCP. | |||
| This document describes the TCP protocol. The TCP protocol uses TCP | This document describes the TCP protocol. The TCP protocol uses TCP | |||
| Headers. | Headers. | |||
| A TCP Header is formatted as follows, using the style from [65]: | A TCP Header, followed by any user data in the segment, is formatted | |||
| as follows, using the style from [67]: | ||||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Source Port | Destination Port | | | Source Port | Destination Port | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Sequence Number | | | Sequence Number | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Acknowledgment Number | | | Acknowledgment Number | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| skipping to change at page 7, line 46 | skipping to change at page 8, line 4 | |||
| Destination Port: 16 bits. | Destination Port: 16 bits. | |||
| The destination port number. | The destination port number. | |||
| Sequence Number: 32 bits. | Sequence Number: 32 bits. | |||
| The sequence number of the first data octet in this segment (except | The sequence number of the first data octet in this segment (except | |||
| when the SYN flag is set). If SYN is set the sequence number is | when the SYN flag is set). If SYN is set the sequence number is | |||
| the initial sequence number (ISN) and the first data octet is | the initial sequence number (ISN) and the first data octet is | |||
| ISN+1. | ISN+1. | |||
| Acknowledgment Number: 32 bits. | Acknowledgment Number: 32 bits. | |||
| If the ACK control bit is set, this field contains the value of the | If the ACK control bit is set, this field contains the value of the | |||
| next sequence number the sender of the segment is expecting to | next sequence number the sender of the segment is expecting to | |||
| receive. Once a connection is established, this is always sent. | receive. Once a connection is established, this is always sent. | |||
| Data Offset (DOffset): 4 bits. | Data Offset (DOffset): 4 bits. | |||
| The number of 32 bit words in the TCP Header. This indicates where | The number of 32 bit words in the TCP Header. This indicates where | |||
| the data begins. The TCP header (even one including options) is an | the data begins. The TCP header (even one including options) is an | |||
| integer multiple of 32 bits long. | integer multiple of 32 bits long. | |||
| Reserved (Rsrvd): 4 bits. | Reserved (Rsrvd): 4 bits. | |||
| A set of control bits reserved for future use. Must be zero in | A set of control bits reserved for future use. Must be zero in | |||
| generated segments and must be ignored in received segments, if | generated segments and must be ignored in received segments, if | |||
| corresponding future features are unimplemented by the sending or | corresponding future features are unimplemented by the sending or | |||
| receiving host. | receiving host. | |||
| The control bits are also know as "flags". Assignment is managed | The control bits are also known as "flags". Assignment is managed | |||
| by IANA from the "TCP Header Flags" registry [61]. The currently | by IANA from the "TCP Header Flags" registry [63]. The currently | |||
| assigned control bits are CWR, ECE, URG, ACK, PSH, RST, SYN, and | assigned control bits are CWR, ECE, URG, ACK, PSH, RST, SYN, and | |||
| FIN. | FIN. | |||
| CWR: 1 bit. | CWR: 1 bit. | |||
| Congestion Window Reduced (see [7]). | Congestion Window Reduced (see [6]). | |||
| ECE: 1 bit. | ECE: 1 bit. | |||
| ECN-Echo (see [7]). | ECN-Echo (see [6]). | |||
| URG: 1 bit. | URG: 1 bit. | |||
| Urgent Pointer field is significant. | Urgent Pointer field is significant. | |||
| ACK: 1 bit. | ACK: 1 bit. | |||
| Acknowledgment field is significant. | Acknowledgment field is significant. | |||
| PSH: 1 bit. | PSH: 1 bit. | |||
| Push Function (see the Send Call description in Section 3.9.1). | Push Function (see the Send Call description in Section 3.9.1). | |||
| skipping to change at page 8, line 45 ¶ | skipping to change at page 9, line 4 ¶ | |||
| RST: 1 bit. | RST: 1 bit. | |||
| Reset the connection. | Reset the connection. | |||
| SYN: 1 bit. | SYN: 1 bit. | |||
| Synchronize sequence numbers. | Synchronize sequence numbers. | |||
| FIN: 1 bit. | FIN: 1 bit. | |||
| No more data from sender. | No more data from sender. | |||
| Window: 16 bits. | Window: 16 bits. | |||
| The number of data octets beginning with the one indicated in the | The number of data octets beginning with the one indicated in the | |||
| acknowledgment field that the sender of this segment is willing to | acknowledgment field that the sender of this segment is willing to | |||
| accept. The value is shifted when the Window Scaling extension is | accept. The value is shifted when the Window Scaling extension is | |||
| used [46]. | used [48]. | |||
| The window size MUST be treated as an unsigned number, or else | The window size MUST be treated as an unsigned number, or else | |||
| large window sizes will appear like negative windows and TCP will | large window sizes will appear like negative windows and TCP will | |||
| not work (MUST-1). It is RECOMMENDED that implementations will | not work (MUST-1). It is RECOMMENDED that implementations will | |||
| reserve 32-bit fields for the send and receive window sizes in the | reserve 32-bit fields for the send and receive window sizes in the | |||
| connection record and do all window computations with 32 bits (REC- | connection record and do all window computations with 32 bits (REC- | |||
| 1). | 1). | |||
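
The unsigned-window requirement (MUST-1) can be illustrated with a minimal sketch. The function name `parse_window` and its `shift` parameter are hypothetical, introduced only for this example; `shift` stands in for a scale factor negotiated through the Window Scaling extension, which the text above mentions but does not define here.

```python
import struct


def parse_window(raw: bytes, shift: int = 0) -> int:
    """Return the advertised window in octets from the 2-octet Window field.

    `shift` is an assumed, already negotiated window-scale shift count.
    """
    # "!H" reads a network-byte-order *unsigned* 16-bit integer.
    (window,) = struct.unpack("!H", raw)
    # Python integers are unbounded, so the shifted value cannot overflow;
    # a fixed-width implementation would need a 32-bit field here (REC-1).
    return window << shift


# Reading the same two octets as a signed value ("!h") shows the failure
# mode described in the text: a large window appears negative.
signed_misread = struct.unpack("!h", b"\xff\xff")[0]
```

With this sketch, `parse_window(b"\xff\xff")` yields 65535, whereas the signed misreading of the same field yields -1.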
| Checksum: 16 bits. | Checksum: 16 bits. | |||
| The checksum field is the 16 bit one's complement of the one's | The checksum field is the 16 bit ones' complement of the ones' | |||
| complement sum of all 16 bit words in the header and text. The | complement sum of all 16 bit words in the header and text. The | |||
| checksum computation needs to ensure the 16-bit alignment of the | checksum computation needs to ensure the 16-bit alignment of the | |||
| data being summed. If a segment contains an odd number of header | data being summed. If a segment contains an odd number of header | |||
| and text octets, alignment can be achieved by padding the last | and text octets, alignment can be achieved by padding the last | |||
| octet with zeros on its right to form a 16 bit word for checksum | octet with zeros on its right to form a 16 bit word for checksum | |||
| purposes. The pad is not transmitted as part of the segment. | purposes. The pad is not transmitted as part of the segment. | |||
| While computing the checksum, the checksum field itself is replaced | While computing the checksum, the checksum field itself is replaced | |||
| with zeros. | with zeros. | |||
| The checksum also covers a pseudo header (Figure 2) conceptually | The checksum also covers a pseudo header (Figure 2) conceptually | |||
| skipping to change at page 10, line 23 ¶ | skipping to change at page 10, line 28 ¶ | |||
| header value in the case of extension headers present in between | header value in the case of extension headers present in between | |||
| IPv6 and TCP). | IPv6 and TCP). | |||
| The TCP checksum is never optional. The sender MUST generate it | The TCP checksum is never optional. The sender MUST generate it | |||
| (MUST-2) and the receiver MUST check it (MUST-3). | (MUST-2) and the receiver MUST check it (MUST-3). | |||
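
The checksum rules above (16-bit ones' complement sum, right-padding of an odd final octet, checksum field zeroed during computation) can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, and the pseudo header is passed in as opaque, already assembled bytes because its layout (Figure 2) is not reproduced in this excerpt.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Ones' complement sum of all 16-bit big-endian words in `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad on the right; the pad is never transmitted
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries out of bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(pseudo_header: bytes, segment: bytes) -> int:
    """Checksum over the pseudo header plus the TCP header and text.

    `pseudo_header` is assumed to be built by the caller and to have an
    even length. The Checksum field occupies octets 16-17 of the TCP
    header and is replaced with zeros while computing.
    """
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    return ~ones_complement_sum16(pseudo_header + zeroed) & 0xFFFF
```

A receiver can verify a segment by summing the pseudo header and the segment as received, checksum field included; for an undamaged segment the folded sum is 0xFFFF.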
| Urgent Pointer: 16 bits. | Urgent Pointer: 16 bits. | |||
| This field communicates the current value of the urgent pointer as | This field communicates the current value of the urgent pointer as | |||
| a positive offset from the sequence number in this segment. The | a positive offset from the sequence number in this segment. The | |||
| urgent pointer points to the sequence number of the octet following | urgent pointer points to the sequence number of the octet following | |||
| the urgent data. This field is only be interpreted in segments | the urgent data. This field is only to be interpreted in segments | |||
| with the URG control bit set. | with the URG control bit set. | |||
| Options: [TCP Option]; Options#Size == (DOffset-5)*32; present | Options: [TCP Option]; size(Options) == (DOffset-5)*32; present | |||
| only when DOffset > 5. | only when DOffset > 5. Note that this size expression also | |||
| | includes any padding trailing the actual options present. | |||
| Options may occupy space at the end of the TCP header and are a | Options may occupy space at the end of the TCP header and are a | |||
| multiple of 8 bits in length. All options are included in the | multiple of 8 bits in length. All options are included in the | |||
| checksum. An option may begin on any octet boundary. There are | checksum. An option may begin on any octet boundary. There are | |||
| two cases for the format of an option: | two cases for the format of an option: | |||
| Case 1: A single octet of option-kind. | Case 1: A single octet of option-kind. | |||
| Case 2: An octet of option-kind (Kind), an octet of option- | Case 2: An octet of option-kind (Kind), an octet of option- | |||
| length, and the actual option-data octets. | length, and the actual option-data octets. | |||
| The option-length counts the two octets of option-kind and option- | The option-length counts the two octets of option-kind and option- | |||
| length as well as the option-data octets. | length as well as the option-data octets. | |||
| Note that the list of options may be shorter than the data offset | Note that the list of options may be shorter than the data offset | |||
| field might imply. The content of the header beyond the End-of- | field might imply. The content of the header beyond the End-of- | |||
| Option option must be header padding (i.e., zero). | Option option MUST be header padding of zeros (MUST-69). | |||
| The list of all currently defined options is managed by IANA [60], | The list of all currently defined options is managed by IANA [62], | |||
| and each option is defined in other RFCs, as indicated there. That | and each option is defined in other RFCs, as indicated there. That | |||
| set includes experimental options that can be extended to support | set includes experimental options that can be extended to support | |||
| multiple concurrent usages [44]. | multiple concurrent usages [46]. | |||
| A given TCP implementation can support any currently defined | A given TCP implementation can support any currently defined | |||
| options, but the following options MUST be supported (MUST-4 - note | options, but the following options MUST be supported (MUST-4 - note | |||
| Maximum Segment Size option support is also part of MUST-19 in | Maximum Segment Size option support is also part of MUST-19 in | |||
| Section 3.7.2): | Section 3.7.2): | |||
| Kind Length Meaning | Kind Length Meaning | |||
| ---- ------ ------- | ---- ------ ------- | |||
| 0 - End of option list. | 0 - End of option list. | |||
| 1 - No-Operation. | 1 - No-Operation. | |||
| skipping to change at page 11, line 35 ¶ | skipping to change at page 11, line 39 ¶ | |||
| A TCP implementation MUST (MUST-6) ignore without error any TCP | A TCP implementation MUST (MUST-6) ignore without error any TCP | |||
| option it does not implement, assuming that the option has a length | option it does not implement, assuming that the option has a length | |||
| field. All TCP options except End of option list and No-Operation | field. All TCP options except End of option list and No-Operation | |||
| MUST have length fields, including all future options (MUST-68). | MUST have length fields, including all future options (MUST-68). | |||
| TCP implementations MUST be prepared to handle an illegal option | TCP implementations MUST be prepared to handle an illegal option | |||
| length (e.g., zero); a suggested procedure is to reset the | length (e.g., zero); a suggested procedure is to reset the | |||
| connection and log the error cause (MUST-7). | connection and log the error cause (MUST-7). | |||
| Note: There is ongoing work to extend the space available for TCP | Note: There is ongoing work to extend the space available for TCP | |||
| options, such as [64]. | options, such as [66]. | |||
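
The two option formats and the handling of illegal lengths described above can be sketched as a small parser. The names `OptionError` and `parse_options` are hypothetical. Raising an exception is only a stand-in for the suggested procedure of resetting the connection and logging the cause (MUST-7).

```python
class OptionError(ValueError):
    """Raised when the options area cannot be parsed safely."""


def parse_options(options: bytes):
    """Walk the options area and return a list of (kind, data) tuples."""
    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:  # End of Option List: the remainder is padding
            break
        if kind == 1:  # No-Operation: single octet, no length field
            parsed.append((1, b""))
            i += 1
            continue
        # Every other option carries a length octet (MUST-68).
        if i + 1 >= len(options):
            raise OptionError("option truncated before its length octet")
        length = options[i + 1]
        # The length counts the kind and length octets, so it is at least 2,
        # and it must not run past the end of the options area.
        if length < 2 or i + length > len(options):
            raise OptionError(f"illegal length {length} for option kind {kind}")
        parsed.append((kind, options[i + 2 : i + length]))
        i += length
    return parsed
```

Because every option other than kinds 0 and 1 has a length field, the parser can step over kinds it does not recognize; the caller simply ignores those entries, which is how MUST-6 is met in this sketch.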
| Data: variable length. | Data: variable length. | |||
| User data carried by the TCP segment. | User data carried by the TCP segment. | |||
| 3.2. Specific Option Definitions | 3.2. Specific Option Definitions | |||
| A TCP Option is one of: an End of Option List Option, a No-Operation | A TCP Option, in the mandatory option set, is one of: an End of | |||
| Option, or a Maximum Segment Size Option. | Option List Option, a No-Operation Option, or a Maximum Segment Size | |||
| | Option. | |||
| An End of Option List Option is formatted as follows: | An End of Option List Option is formatted as follows: | |||
| 0 | 0 | |||
| 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
| +-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
| | 0 | | | 0 | | |||
| +-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
| where: | where: | |||
| skipping to change at page 13, line 20 ¶ | skipping to change at page 13, line 29 ¶ | |||
| Maximum Segment Size (MSS): 2 bytes. | Maximum Segment Size (MSS): 2 bytes. | |||
| The maximum receive segment size at the TCP endpoint that sends | The maximum receive segment size at the TCP endpoint that sends | |||
| this segment. | this segment. | |||
| 3.2.1. Other Common Options | 3.2.1. Other Common Options | |||
| Additional RFCs define some other commonly used options that are | Additional RFCs define some other commonly used options that are | |||
| recommended to implement for high performance, but not necessary for | recommended to implement for high performance, but not necessary for | |||
| basic TCP interoperability. These are the TCP Selective | basic TCP interoperability. These are the TCP Selective | |||
| Acknowledgement (SACK) option [21][24], TCP Timestamp (TS) option | Acknowledgement (SACK) option [23][27], TCP Timestamp (TS) option | |||
| [46], and TCP Window Scaling (WS) option [46]. | [48], and TCP Window Scaling (WS) option [48]. | |||
| 3.2.2. Experimental TCP Options | 3.2.2. Experimental TCP Options | |||
| Experimental TCP option values are defined in [28], and [44] | Experimental TCP option values are defined in [31], and [46] | |||
| describes the current recommended usage for these experimental | describes the current recommended usage for these experimental | |||
| values. | values. | |||
| 3.3. TCP Terminology Overview | 3.3. TCP Terminology Overview | |||
| This section includes an overview of key terms needed to understand | This section includes an overview of key terms needed to understand | |||
| the detailed protocol operation in the rest of the document. There | the detailed protocol operation in the rest of the document. There | |||
| is a traditional glossary of terms in Section 4. | is a glossary of terms in Section 4. | |||
| 3.3.1. Key Connection State Variables | 3.3.1. Key Connection State Variables | |||
| Before we can discuss very much about the operation of the TCP | Before we can discuss very much about the operation of the TCP | |||
| implementation we need to introduce some detailed terminology. The | implementation we need to introduce some detailed terminology. The | |||
| maintenance of a TCP connection requires the remembering of several | maintenance of a TCP connection requires maintaining state for | |||
| variables. We conceive of these variables being stored in a | several variables. We conceive of these variables being stored in a | |||
| connection record called a Transmission Control Block or TCB. Among | connection record called a Transmission Control Block or TCB. Among | |||
| the variables stored in the TCB are the local and remote IP addresses | the variables stored in the TCB are the local and remote IP addresses | |||
| and port numbers, the IP security level and compartment of the | and port numbers, the IP security level and compartment of the | |||
| connection (see Appendix A.1), pointers to the user's send and | connection (see Appendix A.1), pointers to the user's send and | |||
| receive buffers, pointers to the retransmit queue and to the current | receive buffers, pointers to the retransmit queue and to the current | |||
| segment. In addition several variables relating to the send and | segment. In addition, several variables relating to the send and | |||
| receive sequence numbers are stored in the TCB. | receive sequence numbers are stored in the TCB. | |||
| Send Sequence Variables: | Send Sequence Variables: | |||
| SND.UNA - send unacknowledged | SND.UNA - send unacknowledged | |||
| SND.NXT - send next | SND.NXT - send next | |||
| SND.WND - send window | SND.WND - send window | |||
| SND.UP - send urgent pointer | SND.UP - send urgent pointer | |||
| SND.WL1 - segment sequence number used for last window update | SND.WL1 - segment sequence number used for last window update | |||
| SND.WL2 - segment acknowledgment number used for last window | SND.WL2 - segment acknowledgment number used for last window | |||
| skipping to change at page 16, line 26 ¶ | skipping to change at page 16, line 36 ¶ | |||
| termination request, and to avoid new connections being impacted | termination request, and to avoid new connections being impacted | |||
| by delayed segments from previous connections. | by delayed segments from previous connections. | |||
| CLOSED - represents no connection state at all. | CLOSED - represents no connection state at all. | |||
| A TCP connection progresses from one state to another in response to | A TCP connection progresses from one state to another in response to | |||
| events. The events are the user calls, OPEN, SEND, RECEIVE, CLOSE, | events. The events are the user calls, OPEN, SEND, RECEIVE, CLOSE, | |||
| ABORT, and STATUS; the incoming segments, particularly those | ABORT, and STATUS; the incoming segments, particularly those | |||
| containing the SYN, ACK, RST and FIN flags; and timeouts. | containing the SYN, ACK, RST and FIN flags; and timeouts. | |||
| | The OPEN call specifies whether connection establishment is to be | |||
| | actively pursued, or to be passively waited for. | |||
| | A passive OPEN request means that the process wants to accept | |||
| | incoming connection requests, in contrast to an active OPEN | |||
| | attempting to initiate a connection. | |||
| The state diagram in Figure 5 illustrates only state changes, | The state diagram in Figure 5 illustrates only state changes, | |||
| together with the causing events and resulting actions, but addresses | together with the causing events and resulting actions, but addresses | |||
| neither error conditions nor actions that are not connected with | neither error conditions nor actions that are not connected with | |||
| state changes. In a later section, more detail is offered with | state changes. In a later section, more detail is offered with | |||
| respect to the reaction of the TCP implementation to events. Some | respect to the reaction of the TCP implementation to events. Some | |||
| state names are abbreviated or hyphenated differently in the diagram | state names are abbreviated or hyphenated differently in the diagram | |||
| from how they appear elsewhere in the document. | from how they appear elsewhere in the document. | |||
| NOTA BENE: This diagram is only a summary and must not be taken as | NOTA BENE: This diagram is only a summary and must not be taken as | |||
| the total specification. Many details are not included. | the total specification. Many details are not included. | |||
| skipping to change at page 18, line 9 ¶ | skipping to change at page 18, line 9 ¶ | |||
| +---------+ +---------+ | +---------+ +---------+ | |||
| Figure 5: TCP Connection State Diagram | Figure 5: TCP Connection State Diagram | |||
| The following notes apply to Figure 5: | The following notes apply to Figure 5: | |||
| Note 1: The transition from SYN-RECEIVED to LISTEN on receiving a | Note 1: The transition from SYN-RECEIVED to LISTEN on receiving a | |||
| RST is conditional on having reached SYN-RECEIVED after a passive | RST is conditional on having reached SYN-RECEIVED after a passive | |||
| open. | open. | |||
| Note 2: An unshown transition exists from FIN-WAIT-1 to TIME-WAIT | Note 2: The figure omits a transition from FIN-WAIT-1 to TIME-WAIT | |||
| if a FIN is received and the local FIN is also acknowledged. | if a FIN is received and the local FIN is also acknowledged. | |||
| Note 3: A RST can be sent from any state with a corresponding | Note 3: A RST can be sent from any state with a corresponding | |||
| transition to TIME-WAIT (see [69] for rationale). These | transition to TIME-WAIT (see [71] for rationale). These | |||
| transitions are not not explicitly shown, otherwise the diagram | transitions are not explicitly shown, otherwise the diagram would | |||
| would become very difficult to read. Similarly, receipt of a RST | become very difficult to read. Similarly, receipt of a RST from | |||
| from any state results in a transition to LISTEN or CLOSED, though | any state results in a transition to LISTEN or CLOSED, though this | |||
| this is also omitted from the diagram for legibility. | is also omitted from the diagram for legibility. | |||
| 3.4. Sequence Numbers | 3.4. Sequence Numbers | |||
| A fundamental notion in the design is that every octet of data sent | A fundamental notion in the design is that every octet of data sent | |||
| over a TCP connection has a sequence number. Since every octet is | over a TCP connection has a sequence number. Since every octet is | |||
| sequenced, each of them can be acknowledged. The acknowledgment | sequenced, each of them can be acknowledged. The acknowledgment | |||
| mechanism employed is cumulative so that an acknowledgment of | mechanism employed is cumulative so that an acknowledgment of | |||
| sequence number X indicates that all octets up to but not including X | sequence number X indicates that all octets up to but not including X | |||
| have been received. This mechanism allows for straight-forward | have been received. This mechanism allows for straight-forward | |||
| duplicate detection in the presence of retransmission. Numbering of | duplicate detection in the presence of retransmission. Numbering of | |||
| octets within a segment is that the first data octet immediately | octets within a segment is that the first data octet immediately | |||
| following the header is the lowest numbered, and the following octets | following the header is the lowest numbered, and the following octets | |||
| are numbered consecutively. | are numbered consecutively. | |||
| It is essential to remember that the actual sequence number space is | It is essential to remember that the actual sequence number space is | |||
| finite, though very large. This space ranges from 0 to 2**32 - 1. | finite, though large. This space ranges from 0 to 2**32 - 1. Since | |||
| Since the space is finite, all arithmetic dealing with sequence | the space is finite, all arithmetic dealing with sequence numbers | |||
| numbers must be performed modulo 2**32. This unsigned arithmetic | must be performed modulo 2**32. This unsigned arithmetic preserves | |||
| preserves the relationship of sequence numbers as they cycle from | the relationship of sequence numbers as they cycle from 2**32 - 1 to | |||
| 2**32 - 1 to 0 again. There are some subtleties to computer modulo | 0 again. There are some subtleties to computer modulo arithmetic, so | |||
| arithmetic, so great care should be taken in programming the | great care should be taken in programming the comparison of such | |||
| comparison of such values. The symbol "=<" means "less than or | values. The symbol "=<" means "less than or equal" (modulo 2**32). | |||
| equal" (modulo 2**32). | ||||
| The typical kinds of sequence number comparisons that the TCP | The typical kinds of sequence number comparisons that the TCP | |||
| implementation must perform include: | implementation must perform include: | |||
| (a) Determining that an acknowledgment refers to some sequence | (a) Determining that an acknowledgment refers to some sequence | |||
| number sent but not yet acknowledged. | number sent but not yet acknowledged. | |||
| (b) Determining that all sequence numbers occupied by a segment | (b) Determining that all sequence numbers occupied by a segment | |||
| have been acknowledged (e.g., to remove the segment from a | have been acknowledged (e.g., to remove the segment from a | |||
| retransmission queue). | retransmission queue). | |||
| skipping to change at page 19, line 38 ¶ | skipping to change at page 19, line 38 ¶ | |||
| the inequality below holds: | the inequality below holds: | |||
| SND.UNA < SEG.ACK =< SND.NXT | SND.UNA < SEG.ACK =< SND.NXT | |||
| A segment on the retransmission queue is fully acknowledged if the | A segment on the retransmission queue is fully acknowledged if the | |||
| sum of its sequence number and length is less or equal than the | sum of its sequence number and length is less or equal than the | |||
| acknowledgment value in the incoming segment. | acknowledgment value in the incoming segment. | |||
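
The modulo 2**32 comparisons used for acknowledgments can be sketched as below. The function names are hypothetical. Measuring distances relative to SND.UNA is one common way to keep the test correct across wraparound. The half-space rule in `seq_leq` is an assumption of this sketch rather than something the text specifies, and it holds only while the compared values are less than 2**31 apart.

```python
MOD = 1 << 32


def seq_leq(a: int, b: int) -> bool:
    """True if a =< b in sequence space (values assumed < 2**31 apart)."""
    return ((b - a) % MOD) < (1 << 31)


def ack_is_acceptable(snd_una: int, seg_ack: int, snd_nxt: int) -> bool:
    """SND.UNA < SEG.ACK =< SND.NXT, evaluated modulo 2**32."""
    newly_acked = (seg_ack - snd_una) % MOD
    outstanding = (snd_nxt - snd_una) % MOD
    return 0 < newly_acked <= outstanding


def fully_acknowledged(seg_seq: int, seg_len: int, seg_ack: int) -> bool:
    """A queued segment is fully acknowledged if SEG.SEQ + SEG.LEN =< SEG.ACK."""
    return seq_leq((seg_seq + seg_len) % MOD, seg_ack)
```

For example, with SND.UNA = 0xFFFFFFF0 and SND.NXT = 0x10, an acknowledgment of 0x5 is acceptable even though it is numerically smaller than SND.UNA, because the sequence space has wrapped.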
| When data is received the following comparisons are needed: | When data is received the following comparisons are needed: | |||
| RCV.NXT = next sequence number expected on an incoming segments, | RCV.NXT = next sequence number expected on an incoming segment, | |||
| and is the left or lower edge of the receive window | and is the left or lower edge of the receive window | |||
| RCV.NXT+RCV.WND-1 = last sequence number expected on an incoming | RCV.NXT+RCV.WND-1 = last sequence number expected on an incoming | |||
| segment, and is the right or upper edge of the receive window | segment, and is the right or upper edge of the receive window | |||
| SEG.SEQ = first sequence number occupied by the incoming segment | SEG.SEQ = first sequence number occupied by the incoming segment | |||
| SEG.SEQ+SEG.LEN-1 = last sequence number occupied by the incoming | SEG.SEQ+SEG.LEN-1 = last sequence number occupied by the incoming | |||
| segment | segment | |||
| skipping to change at page 20, line 51 ¶ | skipping to change at page 20, line 51 ¶ | |||
| retransmitted and acknowledged without confusion (i.e., one and only | retransmitted and acknowledged without confusion (i.e., one and only | |||
| one copy of the control will be acted upon). Control information is | one copy of the control will be acted upon). Control information is | |||
| not physically carried in the segment data space. Consequently, we | not physically carried in the segment data space. Consequently, we | |||
| must adopt rules for implicitly assigning sequence numbers to | must adopt rules for implicitly assigning sequence numbers to | |||
| control. The SYN and FIN are the only controls requiring this | control. The SYN and FIN are the only controls requiring this | |||
| protection, and these controls are used only at connection opening | protection, and these controls are used only at connection opening | |||
| and closing. For sequence number purposes, the SYN is considered to | and closing. For sequence number purposes, the SYN is considered to | |||
| occur before the first actual data octet of the segment in which it | occur before the first actual data octet of the segment in which it | |||
| occurs, while the FIN is considered to occur after the last actual | occurs, while the FIN is considered to occur after the last actual | |||
| data octet in a segment in which it occurs. The segment length | data octet in a segment in which it occurs. The segment length | |||
| (SEG.LEN) includes both data and sequence space occupying controls. | (SEG.LEN) includes both data and sequence space-occupying controls. | |||
| When a SYN is present then SEG.SEQ is the sequence number of the SYN. | When a SYN is present then SEG.SEQ is the sequence number of the SYN. | |||
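
The implicit numbering of SYN and FIN can be made concrete with a short sketch. The helper names are hypothetical and the functions only restate the rules above: SYN occupies the sequence number before the first data octet, FIN the one after the last.

```python
MOD = 1 << 32


def seg_len(data: bytes, syn: bool = False, fin: bool = False) -> int:
    """SEG.LEN: data octets plus one for each of SYN and FIN."""
    return len(data) + int(syn) + int(fin)


def first_data_seq(seg_seq: int, syn: bool = False) -> int:
    """Sequence number of the first data octet in the segment."""
    return (seg_seq + int(syn)) % MOD


def fin_seq(seg_seq: int, data: bytes, syn: bool = False) -> int:
    """Sequence number occupied by a FIN carried in the segment."""
    return (seg_seq + int(syn) + len(data)) % MOD
```

A bare SYN therefore has SEG.LEN of 1, which is why the acknowledgment for a SYN with sequence number X carries X+1.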
| Initial Sequence Number Selection | 3.4.1. Initial Sequence Number Selection | |||
| A connection is defined by a pair of sockets. Connections can be | A connection is defined by a pair of sockets. Connections can be | |||
| reused. New instances of a connection will be referred to as | reused. New instances of a connection will be referred to as | |||
| incarnations of the connection. The problem that arises from this is | incarnations of the connection. The problem that arises from this is | |||
| -- "how does the TCP implementation identify duplicate segments from | -- "how does the TCP implementation identify duplicate segments from | |||
| previous incarnations of the connection?" This problem becomes | previous incarnations of the connection?" This problem becomes | |||
| apparent if the connection is being opened and closed in quick | apparent if the connection is being opened and closed in quick | |||
| succession, or if the connection breaks with loss of memory and is | succession, or if the connection breaks with loss of memory and is | |||
| then reestablished. To support this, the TIME-WAIT state limits the | then reestablished. To support this, the TIME-WAIT state limits the | |||
| rate of connection reuse, while the initial sequence number selection | rate of connection reuse, while the initial sequence number selection | |||
| described below further protects against ambiguity about what | described below further protects against ambiguity about what | |||
| incarnation of a connection an incoming packet corresponds to. | incarnation of a connection an incoming packet corresponds to. | |||
| To avoid confusion we must prevent segments from one incarnation of a | To avoid confusion we must prevent segments from one incarnation of a | |||
| connection from being used while the same sequence numbers may still | connection from being used while the same sequence numbers may still | |||
| be present in the network from an earlier incarnation. We want to | be present in the network from an earlier incarnation. We want to | |||
| assure this, even if a TCP endpoint loses all knowledge of the | assure this, even if a TCP endpoint loses all knowledge of the | |||
| sequence numbers it has been using. When new connections are | sequence numbers it has been using. When new connections are | |||
| created, an initial sequence number (ISN) generator is employed that | created, an initial sequence number (ISN) generator is employed that | |||
| selects a new 32 bit ISN. There are security issues that result if | selects a new 32 bit ISN. There are security issues that result if | |||
| an off-path attacker is able to predict or guess ISN values. | an off-path attacker is able to predict or guess ISN values [43]. | |||
| TCP Initial Sequence Numbers are generated from a number sequence | TCP Initial Sequence Numbers are generated from a number sequence | |||
| that monotonically increases until it wraps, known loosely as a | that monotonically increases until it wraps, known loosely as a | |||
| "clock". This clock is a 32-bit counter that typically increments at | "clock". This clock is a 32-bit counter that typically increments at | |||
| least once every roughly 4 microseconds, although it is neither | least once every roughly 4 microseconds, although it is neither | |||
| assumed to be realtime nor precise, and need not persist across | assumed to be realtime nor precise, and need not persist across | |||
| reboots. The clock component is intended to insure that with a | reboots. The clock component is intended to ensure that with a | |||
| Maximum Segment Lifetime (MSL), generated ISNs will be unique, since | Maximum Segment Lifetime (MSL), generated ISNs will be unique, since | |||
| it cycles approximately every 4.55 hours, which is much longer than | it cycles approximately every 4.55 hours, which is much longer than | |||
| the MSL. | the MSL. | |||
| A TCP implementation MUST use the above type of "clock" for clock- | A TCP implementation MUST use the above type of "clock" for clock- | |||
| driven selection of initial sequence numbers (MUST-8), and SHOULD | driven selection of initial sequence numbers (MUST-8), and SHOULD | |||
| generate its Initial Sequence Numbers with the expression: | generate its Initial Sequence Numbers with the expression: | |||
| ISN = M + F(localip, localport, remoteip, remoteport, secretkey) | ISN = M + F(localip, localport, remoteip, remoteport, secretkey) | |||
| where M is the 4 microsecond timer, and F() is a pseudorandom | where M is the 4 microsecond timer, and F() is a pseudorandom | |||
| function (PRF) of the connection's identifying parameters ("localip, | function (PRF) of the connection's identifying parameters ("localip, | |||
| localport, remoteip, remoteport") and a secret key ("secretkey") | localport, remoteip, remoteport") and a secret key ("secretkey") | |||
| (SHLD-1). F() MUST NOT be computable from the outside (MUST-9), or | (SHLD-1). F() MUST NOT be computable from the outside (MUST-9), or | |||
| an attacker could still guess at sequence numbers from the ISN used | an attacker could still guess at sequence numbers from the ISN used | |||
| for some other connection. The PRF could be implemented as a | for some other connection. The PRF could be implemented as a | |||
| cryptographic hash of the concatenation of the TCP connection | cryptographic hash of the concatenation of the TCP connection | |||
| parameters and some secret data. For discussion of the selection of | parameters and some secret data. For discussion of the selection of | |||
| a specific hash algorithm and management of the secret key data, | a specific hash algorithm and management of the secret key data, | |||
| please see Section 3 of [41]. | please see Section 3 of [43]. | |||
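
The expression ISN = M + F(...) can be sketched as follows. Everything concrete here is an assumption made for illustration: the function names, the use of HMAC-SHA256 as the PRF, the way the connection parameters are serialized, and the use of a monotonic clock for M. The text above leaves the choice of hash algorithm and key management to the cited reference.

```python
import hashlib
import hmac
import time

MOD = 1 << 32
TICKS_PER_SECOND = 250_000  # one tick per 4 microseconds


def isn_clock(now=None) -> int:
    """M: a 32-bit counter that advances roughly every 4 microseconds."""
    if now is None:
        now = time.monotonic()  # assumed clock source; need not be precise
    return int(now * TICKS_PER_SECOND) % MOD


def isn_prf(local_ip, local_port, remote_ip, remote_port, secret_key: bytes) -> int:
    """F(): keyed hash of the connection identifiers, truncated to 32 bits."""
    msg = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode()
    digest = hmac.new(secret_key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def generate_isn(local_ip, local_port, remote_ip, remote_port, secret_key, now=None) -> int:
    """ISN = M + F(localip, localport, remoteip, remoteport, secretkey)."""
    offset = isn_prf(local_ip, local_port, remote_ip, remote_port, secret_key)
    return (isn_clock(now) + offset) % MOD
```

For a fixed connection and key, successive ISNs differ only by the clock component, so they keep increasing until the counter wraps, while an observer without the key cannot derive the offset used for any other connection.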
| For each connection there is a send sequence number and a receive | For each connection there is a send sequence number and a receive | |||
| sequence number. The initial send sequence number (ISS) is chosen by | sequence number. The initial send sequence number (ISS) is chosen by | |||
| the data sending TCP peer, and the initial receive sequence number | the data sending TCP peer, and the initial receive sequence number | |||
| (IRS) is learned during the connection establishing procedure. | (IRS) is learned during the connection establishing procedure. | |||
| For a connection to be established or initialized, the two TCP peers | For a connection to be established or initialized, the two TCP peers | |||
| must synchronize on each other's initial sequence numbers. This is | must synchronize on each other's initial sequence numbers. This is | |||
| done in an exchange of connection establishing segments carrying a | done in an exchange of connection establishing segments carrying a | |||
| control bit called "SYN" (for synchronize) and the initial sequence | control bit called "SYN" (for synchronize) and the initial sequence | |||
| skipping to change at page 22, line 45 ¶ | skipping to change at page 22, line 45 ¶ | |||
| 2) A <-- B ACK your sequence number is X | 2) A <-- B ACK your sequence number is X | |||
| 3) A <-- B SYN my sequence number is Y | 3) A <-- B SYN my sequence number is Y | |||
| 4) A --> B ACK your sequence number is Y | 4) A --> B ACK your sequence number is Y | |||
| Because steps 2 and 3 can be combined in a single message this is | Because steps 2 and 3 can be combined in a single message this is | |||
| called the three-way (or three message) handshake (3WHS). | called the three-way (or three message) handshake (3WHS). | |||
| A 3WHS is necessary because sequence numbers are not tied to a global | A 3WHS is necessary because sequence numbers are not tied to a global | |||
| clock in the network, and TCP implementations may have different | clock in the network, and TCP implementations may have different | |||
| mechanisms for picking the ISNs. The receiver of the first SYN has | mechanisms for picking the ISNs. The receiver of the first SYN has | |||
| no way of knowing whether the segment was an old delayed one or not, | no way of knowing whether the segment was an old one or not, unless | |||
| unless it remembers the last sequence number used on the connection | it remembers the last sequence number used on the connection (which | |||
| (which is not always possible), and so it must ask the sender to | is not always possible), and so it must ask the sender to verify this | |||
| verify this SYN. The three way handshake and the advantages of a | SYN. The three-way handshake and the advantages of a clock-driven | |||
| clock-driven scheme are discussed in [68]. | scheme for ISN selection are discussed in [70]. | |||
| Knowing When to Keep Quiet | 3.4.2. Knowing When to Keep Quiet | |||
| A theoretical problem exists where data could be corrupted due to | A theoretical problem exists where data could be corrupted due to | |||
| confusion between old segments in the network and new ones after a | confusion between old segments in the network and new ones after a | |||
| host reboots, if the same port numbers and sequence space are reused. | host reboots, if the same port numbers and sequence space are reused. | |||
| The "Quiet Time" concept discussed below addresses this and the | The "Quiet Time" concept discussed below addresses this and the | |||
| discussion of it is included for situations where it might be | discussion of it is included for situations where it might be | |||
| relevant, although it is not felt to be necessary in most current | relevant, although it is not felt to be necessary in most current | |||
| implementations. The problem was more relevant earlier in the | implementations. The problem was more relevant earlier in the | |||
| history of TCP. In practical use on the Internet today, the error- | history of TCP. In practical use on the Internet today, the error- | |||
| prone conditions are sufficiently unlikely that it is felt safe to | prone conditions are sufficiently unlikely that it is felt safe to | |||
| ignore. Reasons why it is now negligible include: (a) ISS and | ignore. Reasons why it is now negligible include: (a) ISS and | |||
| skipping to change at page 23, line 31 ¶ | skipping to change at page 23, line 34 ¶ | |||
| remaining in the network, the TCP endpoint must keep quiet for an MSL | remaining in the network, the TCP endpoint must keep quiet for an MSL | |||
| before assigning any sequence numbers upon starting up or recovering | before assigning any sequence numbers upon starting up or recovering | |||
| from a situation where memory of sequence numbers in use was lost. | from a situation where memory of sequence numbers in use was lost. | |||
| For this specification the MSL is taken to be 2 minutes. This is an | For this specification the MSL is taken to be 2 minutes. This is an | |||
| engineering choice, and may be changed if experience indicates it is | engineering choice, and may be changed if experience indicates it is | |||
| desirable to do so. Note that if a TCP endpoint is reinitialized in | desirable to do so. Note that if a TCP endpoint is reinitialized in | |||
| some sense, yet retains its memory of sequence numbers in use, then | some sense, yet retains its memory of sequence numbers in use, then | |||
| it need not wait at all; it must only be sure to use sequence numbers | it need not wait at all; it must only be sure to use sequence numbers | |||
| larger than those recently used. | larger than those recently used. | |||
| The TCP Quiet Time Concept | 3.4.3. The TCP Quiet Time Concept | |||
| Hosts that for any reason lose knowledge of the last sequence numbers | Hosts that for any reason lose knowledge of the last sequence numbers | |||
| transmitted on each active (i.e., not closed) connection shall delay | transmitted on each active (i.e., not closed) connection shall delay | |||
| emitting any TCP segments for at least the agreed MSL in the internet | emitting any TCP segments for at least the agreed MSL in the internet | |||
| system that the host is a part of. In the paragraphs below, an | system that the host is a part of. In the paragraphs below, an | |||
| explanation for this specification is given. TCP implementors may | explanation for this specification is given. TCP implementors may | |||
| violate the "quiet time" restriction, but only at the risk of causing | violate the "quiet time" restriction, but only at the risk of causing | |||
| some old data to be accepted as new or new data rejected as old | some old data to be accepted as new or new data rejected as old | |||
| duplicated by some receivers in the internet system. | duplicated data by some receivers in the internet system. | |||
| TCP endpoints consume sequence number space each time a segment is | TCP endpoints consume sequence number space each time a segment is | |||
| formed and entered into the network output queue at a source host. | formed and entered into the network output queue at a source host. | |||
| The duplicate detection and sequencing algorithm in the TCP protocol | The duplicate detection and sequencing algorithm in the TCP protocol | |||
| relies on the unique binding of segment data to sequence space to the | relies on the unique binding of segment data to sequence space to the | |||
| extent that sequence numbers will not cycle through all 2**32 values | extent that sequence numbers will not cycle through all 2**32 values | |||
| before the segment data bound to those sequence numbers has been | before the segment data bound to those sequence numbers has been | |||
| delivered and acknowledged by the receiver and all duplicate copies | delivered and acknowledged by the receiver and all duplicate copies | |||
| of the segments have "drained" from the internet. Without such an | of the segments have "drained" from the internet. Without such an | |||
| assumption, two distinct TCP segments could conceivably be assigned | assumption, two distinct TCP segments could conceivably be assigned | |||
| the same or overlapping sequence numbers, causing confusion at the | the same or overlapping sequence numbers, causing confusion at the | |||
| receiver as to which data is new and which is old. Remember that | receiver as to which data is new and which is old. Remember that | |||
| each segment is bound to as many consecutive sequence numbers as | each segment is bound to as many consecutive sequence numbers as | |||
| there are octets of data and SYN or FIN flags in the segment. | there are octets of data and SYN or FIN flags in the segment. | |||
| Under normal conditions, TCP implementations keep track of the next | Under normal conditions, TCP implementations keep track of the next | |||
| sequence number to emit and the oldest awaiting acknowledgment so as | sequence number to emit and the oldest awaiting acknowledgment so as | |||
| to avoid mistakenly using a sequence number over before its first use | to avoid mistakenly using a sequence number over before its first use | |||
| has been acknowledged. This alone does not guarantee that old | has been acknowledged. This alone does not guarantee that old | |||
| duplicate data is drained from the net, so the sequence space has | duplicate data is drained from the net, so the sequence space has | |||
| been made very large to reduce the probability that a wandering | been made large to reduce the probability that a wandering duplicate | |||
| duplicate will cause trouble upon arrival. At 2 megabits/sec. it | will cause trouble upon arrival. At 2 megabits/sec. it takes 4.5 | |||
| takes 4.5 hours to use up 2**32 octets of sequence space. Since the | hours to use up 2**32 octets of sequence space. Since the maximum | |||
| maximum segment lifetime in the net is not likely to exceed a few | segment lifetime in the net is not likely to exceed a few tens of | |||
| tens of seconds, this is deemed ample protection for foreseeable | seconds, this is deemed ample protection for foreseeable nets, even | |||
| nets, even if data rates escalate to 10's of megabits/sec. At 100 | if data rates escalate to 10s of megabits/sec. At 100 megabits/sec, | |||
| megabits/sec, the cycle time is 5.4 minutes, which may be a little | the cycle time is 5.4 minutes, which may be a little short, but still | |||
| short, but still within reason. | within reason. Much higher data rates are possible today, with | |||
| | implications described in the final paragraph of this subsection. | |||
| The basic duplicate detection and sequencing algorithm in TCP can be | The basic duplicate detection and sequencing algorithm in TCP can be | |||
| defeated, however, if a source TCP endpoint does not have any memory | defeated, however, if a source TCP endpoint does not have any memory | |||
| of the sequence numbers it last used on a given connection. For | of the sequence numbers it last used on a given connection. For | |||
| example, if the TCP implementation were to start all connections with | example, if the TCP implementation were to start all connections with | |||
| sequence number 0, then upon the host rebooting, a TCP peer might re- | sequence number 0, then upon the host rebooting, a TCP peer might re- | |||
| form an earlier connection (possibly after half-open connection | form an earlier connection (possibly after half-open connection | |||
| resolution) and emit packets with sequence numbers identical to or | resolution) and emit packets with sequence numbers identical to or | |||
| overlapping with packets still in the network, which were emitted on | overlapping with packets still in the network, which were emitted on | |||
| an earlier incarnation of the same connection. In the absence of | an earlier incarnation of the same connection. In the absence of | |||
| skipping to change at page 25, line 24 ¶ | skipping to change at page 25, line 16 ¶ | |||
| bearing sequence numbers in the neighborhood of S1 may arrive and be | bearing sequence numbers in the neighborhood of S1 may arrive and be | |||
| treated as new packets by the receiver of the new incarnation of the | treated as new packets by the receiver of the new incarnation of the | |||
| connection. | connection. | |||
| The problem is that the recovering host may not know for how long it | The problem is that the recovering host may not know for how long it | |||
| was down between rebooting nor does it know whether there are still | was down between rebooting nor does it know whether there are still | |||
| old duplicates in the system from earlier connection incarnations. | old duplicates in the system from earlier connection incarnations. | |||
| One way to deal with this problem is to deliberately delay emitting | One way to deal with this problem is to deliberately delay emitting | |||
| segments for one MSL after recovery from a reboot - this is the | segments for one MSL after recovery from a reboot - this is the | |||
| "quiet time" specification. Hosts that prefer to avoid waiting are | "quiet time" specification. Hosts that prefer to avoid waiting and | |||
| willing to risk possible confusion of old and new packets at a given | are willing to risk possible confusion of old and new packets at a | |||
| destination may choose not to wait for the "quiet time". | given destination may choose not to wait for the "quiet time". | |||
| Implementors may provide TCP users with the ability to select on a | Implementors may provide TCP users with the ability to select on a | |||
| connection by connection basis whether to wait after a reboot, or may | connection by connection basis whether to wait after a reboot, or may | |||
| informally implement the "quiet time" for all connections. | informally implement the "quiet time" for all connections. | |||
| Obviously, even where a user selects to "wait," this is not necessary | Obviously, even where a user selects to "wait," this is not necessary | |||
| after the host has been "up" for at least MSL seconds. | after the host has been "up" for at least MSL seconds. | |||
| To summarize: every segment emitted occupies one or more sequence | To summarize: every segment emitted occupies one or more sequence | |||
| numbers in the sequence space, the numbers occupied by a segment are | numbers in the sequence space, the numbers occupied by a segment are | |||
| "busy" or "in use" until MSL seconds have passed, upon rebooting a | "busy" or "in use" until MSL seconds have passed, upon rebooting a | |||
| block of space-time is occupied by the octets and SYN or FIN flags of | block of space-time is occupied by the octets and SYN or FIN flags of | |||
| the last emitted segment, if a new connection is started too soon and | any potentially still in-flight segments, and if a new connection is | |||
| uses any of the sequence numbers in the space-time footprint of the | started too soon and uses any of the sequence numbers in the space- | |||
| last segment of the previous connection incarnation, there is a | time footprint of those potentially still in-flight segments of the | |||
| potential sequence number overlap area that could cause confusion at | previous connection incarnation, there is a potential sequence number | |||
| the receiver. | overlap area that could cause confusion at the receiver. | |||
| | High performance cases will have shorter cycle times than those in | |||
| | the megabits per second that the base TCP design described above | |||
| | considers. At 1 Gbps, the cycle time is 34 seconds, only 3 seconds | |||
| | at 10 Gbps, and around a third of a second at 100 Gbps. In these | |||
| | higher performance cases, TCP Timestamp options and Protection | |||
| | Against Wrapped Sequences (PAWS) [48] provide the needed capability | |||
| | to detect and discard old duplicates. | |||
| 3.5. Establishing a connection | 3.5. Establishing a connection | |||
| The "three-way handshake" is the procedure used to establish a | The "three-way handshake" is the procedure used to establish a | |||
| connection. This procedure normally is initiated by one TCP peer and | connection. This procedure normally is initiated by one TCP peer and | |||
| responded to by another TCP peer. The procedure also works if two | responded to by another TCP peer. The procedure also works if two | |||
| TCP peers simultaneously initiate the procedure. When simultaneous | TCP peers simultaneously initiate the procedure. When simultaneous | |||
| open occurs, each TCP peer receives a "SYN" segment that carries no | open occurs, each TCP peer receives a "SYN" segment that carries no | |||
| acknowledgment after it has sent a "SYN". Of course, the arrival of | acknowledgment after it has sent a "SYN". Of course, the arrival of | |||
| an old duplicate "SYN" segment can potentially make it appear, to the | an old duplicate "SYN" segment can potentially make it appear, to the | |||
| recipient, that a simultaneous connection initiation is in progress. | recipient, that a simultaneous connection initiation is in progress. | |||
| Proper use of "reset" segments can disambiguate these cases. | Proper use of "reset" segments can disambiguate these cases. | |||
| Several examples of connection initiation follow. Although these | Several examples of connection initiation follow. Although these | |||
| examples do not show connection synchronization using data-carrying | examples do not show connection synchronization using data-carrying | |||
| segments, this is perfectly legitimate, so long as the receiving TCP | segments, this is perfectly legitimate, so long as the receiving TCP | |||
| endpoint doesn't deliver the data to the user until it is clear the | endpoint doesn't deliver the data to the user until it is clear the | |||
| data is valid (e.g., the data is buffered at the receiver until the | data is valid (e.g., the data is buffered at the receiver until the | |||
| connection reaches the ESTABLISHED state, given that the three-way | connection reaches the ESTABLISHED state, given that the three-way | |||
| handshake reduces the possibility of false connections). It is the | handshake reduces the possibility of false connections). It is a | |||
| implementation of a trade-off between memory and messages to provide | trade-off between memory and messages to provide information for this | |||
| information for this checking. | checking. | |||
| The simplest 3WHS is shown in Figure 6. The figures should be | The simplest 3WHS is shown in Figure 6. The figures should be | |||
| interpreted in the following way. Each line is numbered for | interpreted in the following way. Each line is numbered for | |||
| reference purposes. Right arrows (-->) indicate departure of a TCP | reference purposes. Right arrows (-->) indicate departure of a TCP | |||
| segment from TCP peer A to TCP peer B, or arrival of a segment at B | segment from TCP peer A to TCP peer B, or arrival of a segment at B | |||
| from A. Left arrows (<--), indicate the reverse. Ellipsis (...) | from A. Left arrows (<--), indicate the reverse. Ellipsis (...) | |||
| indicates a segment that is still in the network (delayed). Comments | indicates a segment that is still in the network (delayed). Comments | |||
| appear in parentheses. TCP connection states represent the state | appear in parentheses. TCP connection states represent the state | |||
| AFTER the departure or arrival of the segment (whose contents are | AFTER the departure or arrival of the segment (whose contents are | |||
| shown in the center of each line). Segment contents are shown in | shown in the center of each line). Segment contents are shown in | |||
| skipping to change at page 28, line 39 ¶ | skipping to change at page 28, line 36 ¶ | |||
| Figure 8. At line 3, an old duplicate SYN arrives at TCP Peer B. | Figure 8. At line 3, an old duplicate SYN arrives at TCP Peer B. | |||
| TCP Peer B cannot tell that this is an old duplicate, so it responds | TCP Peer B cannot tell that this is an old duplicate, so it responds | |||
| normally (line 4). TCP Peer A detects that the ACK field is | normally (line 4). TCP Peer A detects that the ACK field is | |||
| incorrect and returns a RST (reset) with its SEQ field selected to | incorrect and returns a RST (reset) with its SEQ field selected to | |||
| make the segment believable. TCP Peer B, on receiving the RST, | make the segment believable. TCP Peer B, on receiving the RST, | |||
| returns to the LISTEN state. When the original SYN finally arrives | returns to the LISTEN state. When the original SYN finally arrives | |||
| at line 6, the synchronization proceeds normally. If the SYN at line | at line 6, the synchronization proceeds normally. If the SYN at line | |||
| 6 had arrived before the RST, a more complex exchange might have | 6 had arrived before the RST, a more complex exchange might have | |||
| occurred with RST's sent in both directions. | occurred with RST's sent in both directions. | |||
| Half-Open Connections and Other Anomalies | 3.5.1. Half-Open Connections and Other Anomalies | |||
| An established connection is said to be "half-open" if one of the TCP | An established connection is said to be "half-open" if one of the TCP | |||
| peers has closed or aborted the connection at its end without the | peers has closed or aborted the connection at its end without the | |||
| knowledge of the other, or if the two ends of the connection have | knowledge of the other, or if the two ends of the connection have | |||
| become desynchronized owing to a failure or reboot that resulted in | become desynchronized owing to a failure or reboot that resulted in | |||
| loss of memory. Such connections will automatically become reset if | loss of memory. Such connections will automatically become reset if | |||
| an attempt is made to send data in either direction. However, half- | an attempt is made to send data in either direction. However, half- | |||
| open connections are expected to be unusual. | open connections are expected to be unusual. | |||
| If at site A the connection no longer exists, then an attempt by the | If at site A the connection no longer exists, then an attempt by the | |||
| skipping to change at page 30, line 46 ¶ | skipping to change at page 31, line 5 ¶ | |||
| 4. --> <SEQ=Z+1><CTL=RST> --> (return to LISTEN!) | 4. --> <SEQ=Z+1><CTL=RST> --> (return to LISTEN!) | |||
| 5. LISTEN LISTEN | 5. LISTEN LISTEN | |||
| Figure 11: Old Duplicate SYN Initiates a Reset on two Passive Sockets | Figure 11: Old Duplicate SYN Initiates a Reset on two Passive Sockets | |||
| A variety of other cases are possible, all of which are accounted for | A variety of other cases are possible, all of which are accounted for | |||
| by the following rules for RST generation and processing. | by the following rules for RST generation and processing. | |||
| Reset Generation | 3.5.2. Reset Generation | |||
| A TCP user or application can issue a reset on a connection at any | A TCP user or application can issue a reset on a connection at any | |||
| time, though reset events are also generated by the protocol itself | time, though reset events are also generated by the protocol itself | |||
| when various error conditions occur, as described below. The side of | when various error conditions occur, as described below. The side of | |||
| a connection issuing a reset should enter the TIME-WAIT state, as | a connection issuing a reset should enter the TIME-WAIT state, as | |||
| this generally helps to reduce the load on busy servers for reasons | this generally helps to reduce the load on busy servers for reasons | |||
| described in [69]. | described in [71]. | |||
| As a general rule, reset (RST) is sent whenever a segment arrives | As a general rule, reset (RST) is sent whenever a segment arrives | |||
| that apparently is not intended for the current connection. A reset | that apparently is not intended for the current connection. A reset | |||
| must not be sent if it is not clear that this is the case. | must not be sent if it is not clear that this is the case. | |||
| There are three groups of states: | There are three groups of states: | |||
| 1. If the connection does not exist (CLOSED) then a reset is sent | 1. If the connection does not exist (CLOSED) then a reset is sent | |||
| in response to any incoming segment except another reset. A SYN | in response to any incoming segment except another reset. A SYN | |||
| segment that does not match an existing connection is rejected by | segment that does not match an existing connection is rejected by | |||
| skipping to change at page 31, line 31 ¶ | skipping to change at page 31, line 34 ¶ | |||
| If the incoming segment has the ACK bit set, the reset takes its | If the incoming segment has the ACK bit set, the reset takes its | |||
| sequence number from the ACK field of the segment, otherwise the | sequence number from the ACK field of the segment, otherwise the | |||
| reset has sequence number zero and the ACK field is set to the sum | reset has sequence number zero and the ACK field is set to the sum | |||
| of the sequence number and segment length of the incoming segment. | of the sequence number and segment length of the incoming segment. | |||
| The connection remains in the CLOSED state. | The connection remains in the CLOSED state. | |||
| 2. If the connection is in any non-synchronized state (LISTEN, | 2. If the connection is in any non-synchronized state (LISTEN, | |||
| SYN-SENT, SYN-RECEIVED), and the incoming segment acknowledges | SYN-SENT, SYN-RECEIVED), and the incoming segment acknowledges | |||
| something not yet sent (the segment carries an unacceptable ACK), | something not yet sent (the segment carries an unacceptable ACK), | |||
| or if an incoming segment has a security level or compartment that | or if an incoming segment has a security level or compartment | |||
| does not exactly match the level and compartment requested for the | Appendix A.1 that does not exactly match the level and compartment | |||
| connection, a reset is sent. | requested for the connection, a reset is sent. | |||
| If the incoming segment has an ACK field, the reset takes its | If the incoming segment has an ACK field, the reset takes its | |||
| sequence number from the ACK field of the segment, otherwise the | sequence number from the ACK field of the segment, otherwise the | |||
| reset has sequence number zero and the ACK field is set to the sum | reset has sequence number zero and the ACK field is set to the sum | |||
| of the sequence number and segment length of the incoming segment. | of the sequence number and segment length of the incoming segment. | |||
| The connection remains in the same state. | The connection remains in the same state. | |||
| 3. If the connection is in a synchronized state (ESTABLISHED, | 3. If the connection is in a synchronized state (ESTABLISHED, | |||
| FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), | FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), | |||
| any unacceptable segment (out of window sequence number or | any unacceptable segment (out of window sequence number or | |||
| unacceptable acknowledgment number) must be responded to with an | unacceptable acknowledgment number) must be responded to with an | |||
| empty acknowledgment segment (without any user data) containing | empty acknowledgment segment (without any user data) containing | |||
| the current send-sequence number and an acknowledgment indicating | the current send-sequence number and an acknowledgment indicating | |||
| the next sequence number expected to be received, and the | the next sequence number expected to be received, and the | |||
| connection remains in the same state. | connection remains in the same state. | |||
| If an incoming segment has a security level, or compartment that | If an incoming segment has a security level or compartment that | |||
| does not exactly match the level and compartment requested for the | does not exactly match the level and compartment requested for the | |||
| connection, a reset is sent and the connection goes to the CLOSED | connection, a reset is sent and the connection goes to the CLOSED | |||
| state. The reset takes its sequence number from the ACK field of | state. The reset takes its sequence number from the ACK field of | |||
| the incoming segment. | the incoming segment. | |||
| Reset Processing | 3.5.3. Reset Processing | |||
| In all states except SYN-SENT, all reset (RST) segments are validated | In all states except SYN-SENT, all reset (RST) segments are validated | |||
| by checking their SEQ-fields. A reset is valid if its sequence | by checking their SEQ-fields. A reset is valid if its sequence | |||
| number is in the window. In the SYN-SENT state (a RST received in | number is in the window. In the SYN-SENT state (a RST received in | |||
| response to an initial SYN), the RST is acceptable if the ACK field | response to an initial SYN), the RST is acceptable if the ACK field | |||
| acknowledges the SYN. | acknowledges the SYN. | |||
| The receiver of a RST first validates it, then changes state. If the | The receiver of a RST first validates it, then changes state. If the | |||
| receiver was in the LISTEN state, it ignores it. If the receiver was | receiver was in the LISTEN state, it ignores it. If the receiver was | |||
| in SYN-RECEIVED state and had previously been in the LISTEN state, | in SYN-RECEIVED state and had previously been in the LISTEN state, | |||
| then the receiver returns to the LISTEN state, otherwise the receiver | then the receiver returns to the LISTEN state, otherwise the receiver | |||
| aborts the connection and goes to the CLOSED state. If the receiver | aborts the connection and goes to the CLOSED state. If the receiver | |||
| was in any other state, it aborts the connection and advises the user | was in any other state, it aborts the connection and advises the user | |||
| and goes to the CLOSED state. | and goes to the CLOSED state. | |||
| TCP implementations SHOULD allow a received RST segment to include | TCP implementations SHOULD allow a received RST segment to include | |||
| data (SHLD-2). | data (SHLD-2). It has been suggested that a RST segment could | |||
| | contain diagnostic data that explains the cause of the RST. No | |||
| | standard has yet been established for such data. | |||
| 3.6. Closing a Connection | 3.6. Closing a Connection | |||
| CLOSE is an operation meaning "I have no more data to send." The | CLOSE is an operation meaning "I have no more data to send." The | |||
| notion of closing a full-duplex connection is subject to ambiguous | notion of closing a full-duplex connection is subject to ambiguous | |||
| interpretation, of course, since it may not be obvious how to treat | interpretation, of course, since it may not be obvious how to treat | |||
| the receiving side of the connection. We have chosen to treat CLOSE | the receiving side of the connection. We have chosen to treat CLOSE | |||
| in a simplex fashion. The user who CLOSEs may continue to RECEIVE | in a simplex fashion. The user who CLOSEs may continue to RECEIVE | |||
| until the TCP receiver is told that the remote peer has CLOSED also. | until the TCP receiver is told that the remote peer has CLOSED also. | |||
| Thus, a program could initiate several SENDs followed by a CLOSE, and | Thus, a program could initiate several SENDs followed by a CLOSE, and | |||
| then continue to RECEIVE until signaled that a RECEIVE failed because | then continue to RECEIVE until signaled that a RECEIVE failed because | |||
| the remote peer has CLOSED. The TCP implementation will signal a | the remote peer has CLOSED. The TCP implementation will signal a | |||
| user, even if no RECEIVEs are outstanding, that the remote peer has | user, even if no RECEIVEs are outstanding, that the remote peer has | |||
| closed, so the user can terminate his side gracefully. A TCP | closed, so the user can terminate their side gracefully. A TCP | |||
| implementation will reliably deliver all buffers SENT before the | implementation will reliably deliver all buffers SENT before the | |||
| connection was CLOSED so a user who expects no data in return need | connection was CLOSED so a user who expects no data in return need | |||
| only wait to hear the connection was CLOSED successfully to know that | only wait to hear the connection was CLOSED successfully to know that | |||
| all their data was received at the destination TCP endpoint. Users | all their data was received at the destination TCP endpoint. Users | |||
| must keep reading connections they close for sending until the TCP | must keep reading connections they close for sending until the TCP | |||
| implementation indicates there is no more data. | implementation indicates there is no more data. | |||
| There are essentially three cases: | There are essentially three cases: | |||
| 1) The user initiates by telling the TCP implementation to CLOSE | 1) The user initiates by telling the TCP implementation to CLOSE | |||
| skipping to change at page 35, line 19 ¶ | skipping to change at page 35, line 19 ¶ | |||
| independently, it is possible for a connection to be "half closed," | independently, it is possible for a connection to be "half closed," | |||
| i.e., closed in only one direction, and a host is permitted to | i.e., closed in only one direction, and a host is permitted to | |||
| continue sending data in the open direction on a half-closed | continue sending data in the open direction on a half-closed | |||
| connection. | connection. | |||
| A host MAY implement a "half-duplex" TCP close sequence, so that an | A host MAY implement a "half-duplex" TCP close sequence, so that an | |||
| application that has called CLOSE cannot continue to read data from | application that has called CLOSE cannot continue to read data from | |||
| the connection (MAY-1). If such a host issues a CLOSE call while | the connection (MAY-1). If such a host issues a CLOSE call while | |||
| received data is still pending in the TCP connection, or if new data | received data is still pending in the TCP connection, or if new data | |||
| is received after CLOSE is called, its TCP implementation SHOULD send | is received after CLOSE is called, its TCP implementation SHOULD send | |||
| a RST to show that data was lost (SHLD-3). See [22] section 2.17 for | a RST to show that data was lost (SHLD-3). See [24] section 2.17 for | |||
| discussion. | discussion. | |||
| When a connection is closed actively, it MUST linger in the TIME-WAIT | When a connection is closed actively, it MUST linger in the TIME-WAIT | |||
| state for a time 2xMSL (Maximum Segment Lifetime) (MUST-13). | state for a time 2xMSL (Maximum Segment Lifetime) (MUST-13). | |||
| However, it MAY accept a new SYN from the remote TCP endpoint to | However, it MAY accept a new SYN from the remote TCP endpoint to | |||
| reopen the connection directly from TIME-WAIT state (MAY-2), if it: | reopen the connection directly from TIME-WAIT state (MAY-2), if it: | |||
| (1) assigns its initial sequence number for the new connection to | (1) assigns its initial sequence number for the new connection to | |||
| be larger than the largest sequence number it used on the previous | be larger than the largest sequence number it used on the previous | |||
| connection incarnation, and | connection incarnation, and | |||
| (2) returns to TIME-WAIT state if the SYN turns out to be an old | (2) returns to TIME-WAIT state if the SYN turns out to be an old | |||
| duplicate. | duplicate. | |||
| When the TCP Timestamp options are available, an improved algorithm | When the TCP Timestamp options are available, an improved algorithm | |||
| is described in [39] in order to support higher connection | is described in [41] in order to support higher connection | |||
| establishment rates. This algorithm for reducing TIME-WAIT is a Best | establishment rates. This algorithm for reducing TIME-WAIT is a Best | |||
| Current Practice that SHOULD be implemented, since timestamp options | Current Practice that SHOULD be implemented, since timestamp options | |||
| are commonly used, and using them to reduce TIME-WAIT provides | are commonly used, and using them to reduce TIME-WAIT provides | |||
| benefits for busy Internet servers (SHLD-4). | benefits for busy Internet servers (SHLD-4). | |||
| 3.7. Segmentation | 3.7. Segmentation | |||
| The term "segmentation" refers to the activity TCP performs when | The term "segmentation" refers to the activity TCP performs when | |||
| ingesting a stream of bytes from a sending application and | ingesting a stream of bytes from a sending application and | |||
| packetizing that stream of bytes into TCP segments. Individual TCP | packetizing that stream of bytes into TCP segments. Individual TCP | |||
| segments often do not correspond one-for-one to individual send (or | segments often do not correspond one-for-one to individual send (or | |||
| socket write) calls from the application. Applications may perform | socket write) calls from the application. Applications may perform | |||
| writes at the granularity of messages in the upper layer protocol, | writes at the granularity of messages in the upper layer protocol, | |||
| but TCP guarantees no boundary coherence between the TCP segments | but TCP guarantees no boundary coherence between the TCP segments | |||
| sent and received versus user application data read or write buffer | sent and received versus user application data read or write buffer | |||
| boundaries. In some specific protocols, such as Remote Direct Memory | boundaries. In some specific protocols, such as Remote Direct Memory | |||
| Access (RDMA) using Direct Data Placement (DDP) and Marker PDU | Access (RDMA) using Direct Data Placement (DDP) and Marker PDU | |||
| Aligned Framing (MPA) [32], there are performance optimizations | Aligned Framing (MPA) [35], there are performance optimizations | |||
| possible when the relation between TCP segments and application data | possible when the relation between TCP segments and application data | |||
| units can be controlled, and MPA includes a specific mechanism for | units can be controlled, and MPA includes a specific mechanism for | |||
| detecting and verifying this relationship between TCP segments and | detecting and verifying this relationship between TCP segments and | |||
| application message data structures, but this is specific to | application message data structures, but this is specific to | |||
| applications like RDMA. In general, multiple goals influence the | applications like RDMA. In general, multiple goals influence the | |||
| sizing of TCP segments created by a TCP implementation. | sizing of TCP segments created by a TCP implementation. | |||
| Goals driving the sending of larger segments include: | Goals driving the sending of larger segments include: | |||
| * Reducing the number of packets in flight within the network. | * Reducing the number of packets in flight within the network. | |||
| skipping to change at page 37, line 15 ¶ | skipping to change at page 37, line 15 ¶ | |||
| 3.7.1. Maximum Segment Size Option | 3.7.1. Maximum Segment Size Option | |||
| TCP endpoints MUST implement both sending and receiving the MSS | TCP endpoints MUST implement both sending and receiving the MSS | |||
| option (MUST-14). | option (MUST-14). | |||
| TCP implementations SHOULD send an MSS option in every SYN segment | TCP implementations SHOULD send an MSS option in every SYN segment | |||
| when its receive MSS differs from the default 536 for IPv4 or 1220 | when its receive MSS differs from the default 536 for IPv4 or 1220 | |||
| for IPv6 (SHLD-5), and MAY send it always (MAY-3). | for IPv6 (SHLD-5), and MAY send it always (MAY-3). | |||
| If an MSS option is not received at connection setup, TCP | If an MSS option is not received at connection setup, TCP | |||
| implementations MUST assume a default send MSS of 536 (576-40) for | implementations MUST assume a default send MSS of 536 (576 - 40) for | |||
| IPv4 or 1220 (1280 - 60) for IPv6 (MUST-15). | IPv4 or 1220 (1280 - 60) for IPv6 (MUST-15). | |||
| The maximum size of a segment that a TCP endpoint really sends, the | The maximum size of a segment that a TCP endpoint really sends, the | |||
| "effective send MSS," MUST be the smaller (MUST-16) of the send MSS | "effective send MSS," MUST be the smaller (MUST-16) of the send MSS | |||
| (that reflects the available reassembly buffer size at the remote | (that reflects the available reassembly buffer size at the remote | |||
| host, the EMTU_R [18]) and the largest transmission size permitted by | host, the EMTU_R [20]) and the largest transmission size permitted by | |||
| the IP layer (EMTU_S [18]): | the IP layer (EMTU_S [20]): | |||
| Eff.snd.MSS = | Eff.snd.MSS = | |||
| min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize | min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize | |||
| where: | where: | |||
| * SendMSS is the MSS value received from the remote host, or the | * SendMSS is the MSS value received from the remote host, or the | |||
| default 536 for IPv4 or 1220 for IPv6, if no MSS option is | default 536 for IPv4 or 1220 for IPv6, if no MSS option is | |||
| received. | received. | |||
| skipping to change at page 38, line 6 ¶ | skipping to change at page 38, line 6 ¶ | |||
| headers associated with a TCP connection. Note that some options | headers associated with a TCP connection. Note that some options | |||
| or extension headers might not be included on all packets, but | or extension headers might not be included on all packets, but | |||
| that for each segment sent, the sender should adjust the data | that for each segment sent, the sender should adjust the data | |||
| length accordingly, within the Eff.snd.MSS. | length accordingly, within the Eff.snd.MSS. | |||
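
The Eff.snd.MSS formula can be checked with a few lines of arithmetic. This is a sketch rather than implementation guidance: the function and parameter names are hypothetical, and it assumes (following RFC 1122) that TCPhdrsize counts the fixed TCP header plus any TCP options and that MMS_S is the largest transport-layer message the IP layer allows TCP to send.

```python
def effective_send_mss(send_mss: int, mms_s: int,
                       tcp_hdr_size: int = 20, ip_option_size: int = 0) -> int:
    """Eff.snd.MSS = min(SendMSS + 20, MMS_S) - TCPhdrsize - IPoptionsize."""
    return min(send_mss + 20, mms_s) - tcp_hdr_size - ip_option_size


# Peer advertised 1460; local IPv4 MTU of 1500 gives MMS_S = 1480 (assumed);
# 12 bytes of TCP options make TCPhdrsize 32.
print(effective_send_mss(1460, 1480, tcp_hdr_size=32))

# No MSS option received over IPv4: SendMSS falls back to the default 536.
print(effective_send_mss(536, 1480))

# The local side is the bottleneck (for example a tunnel with MMS_S = 1380).
print(effective_send_mss(1460, 1380))
```

With these inputs the three calls yield 1448, 536, and 1360 bytes of TCP data per segment.
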
| The MSS value to be sent in an MSS option should be equal to the | The MSS value to be sent in an MSS option should be equal to the | |||
| effective MTU minus the fixed IP and TCP headers. By ignoring both | effective MTU minus the fixed IP and TCP headers. By ignoring both | |||
| IP and TCP options when calculating the value for the MSS option, if | IP and TCP options when calculating the value for the MSS option, if | |||
| there are any IP or TCP options to be sent in a packet, then the | there are any IP or TCP options to be sent in a packet, then the | |||
| sender must decrease the size of the TCP data accordingly. RFC 6691 | sender must decrease the size of the TCP data accordingly. RFC 6691 | |||
| [42] discusses this in greater detail. | [44] discusses this in greater detail. | |||
| The MSS value to be sent in an MSS option must be less than or equal | The MSS value to be sent in an MSS option must be less than or equal | |||
| to: | to: | |||
| MMS_R - 20 | MMS_R - 20 | |||
| where MMS_R is the maximum size for a transport-layer message that | where MMS_R is the maximum size for a transport-layer message that | |||
| can be received (and reassembled at the IP layer) (MUST-67). TCP | can be received (and reassembled at the IP layer) (MUST-67). TCP | |||
| obtains MMS_R and MMS_S from the IP layer; see the generic call | obtains MMS_R and MMS_S from the IP layer; see the generic call | |||
| GET_MAXSIZES in Section 3.4 of RFC 1122. These are defined in terms | GET_MAXSIZES in Section 3.4 of RFC 1122. These are defined in terms | |||
| of their IP MTU equivalents, EMTU_R and EMTU_S [18]. | of their IP MTU equivalents, EMTU_R and EMTU_S [20]. | |||
| When TCP is used in a situation where either the IP or TCP headers | When TCP is used in a situation where either the IP or TCP headers | |||
| are not fixed, the sender must reduce the amount of TCP data in any | are not fixed, the sender must reduce the amount of TCP data in any | |||
| given packet by the number of octets used by the IP and TCP options. | given packet by the number of octets used by the IP and TCP options. | |||
| This has been a point of confusion historically, as explained in RFC | This has been a point of confusion historically, as explained in RFC | |||
| 6691, Section 3.1. | 6691, Section 3.1. | |||
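
As a worked example of the rule that the advertised MSS ignores IP and TCP options, the sketch below subtracts only the fixed headers from the effective MTU. The names are hypothetical; the header sizes (20 bytes for IPv4, 40 for IPv6, 20 for TCP) are the standard fixed sizes, and they reproduce the defaults quoted above (576 - 40 and 1280 - 60).

```python
IPV4_FIXED_HEADER = 20  # bytes, IPv4 header without options
IPV6_FIXED_HEADER = 40  # bytes, IPv6 header without extension headers
TCP_FIXED_HEADER = 20   # bytes, TCP header without options


def mss_to_advertise(effective_mtu: int, ipv6: bool = False) -> int:
    """MSS option value: effective MTU minus the fixed IP and TCP headers."""
    ip_header = IPV6_FIXED_HEADER if ipv6 else IPV4_FIXED_HEADER
    return effective_mtu - ip_header - TCP_FIXED_HEADER


def max_advertised_mss(mms_r: int) -> int:
    """Upper bound on the MSS option value: MMS_R - 20."""
    return mms_r - TCP_FIXED_HEADER


print(mss_to_advertise(1500))             # Ethernet-sized MTU, IPv4
print(mss_to_advertise(1500, ipv6=True))  # Ethernet-sized MTU, IPv6
print(mss_to_advertise(576))              # IPv4 default
print(mss_to_advertise(1280, ipv6=True))  # IPv6 default
```

Any options actually sent then come out of the data portion of each segment, not out of the advertised value.
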
| 3.7.2. Path MTU Discovery | 3.7.2. Path MTU Discovery | |||
| A TCP implementation may be aware of the MTU on directly connected | A TCP implementation may be aware of the MTU on directly connected | |||
| skipping to change at page 38, line 46 ¶ | skipping to change at page 38, line 46 ¶ | |||
| PMTUD and PLPMTUD help TCP choose segment sizes that avoid both on- | PMTUD and PLPMTUD help TCP choose segment sizes that avoid both on- | |||
| path (for IPv4) and source fragmentation (IPv4 and IPv6). | path (for IPv4) and source fragmentation (IPv4 and IPv6). | |||
| PMTUD for IPv4 [2] or IPv6 [14] is implemented in conjunction between | PMTUD for IPv4 [2] or IPv6 [14] is implemented in conjunction between | |||
| TCP, IP, and ICMP protocols. It relies both on avoiding source | TCP, IP, and ICMP protocols. It relies both on avoiding source | |||
| fragmentation and setting the IPv4 DF (don't fragment) flag, the | fragmentation and setting the IPv4 DF (don't fragment) flag, the | |||
| latter to inhibit on-path fragmentation. It relies on ICMP errors | latter to inhibit on-path fragmentation. It relies on ICMP errors | |||
| from routers along the path, whenever a segment is too large to | from routers along the path, whenever a segment is too large to | |||
| traverse a link. Several adjustments to a TCP implementation with | traverse a link. Several adjustments to a TCP implementation with | |||
| PMTUD are described in RFC 2923 in order to deal with problems | PMTUD are described in RFC 2923 in order to deal with problems | |||
| experienced in practice [25]. PLPMTUD [29] is a Standards Track | experienced in practice [28]. PLPMTUD [32] is a Standards Track | |||
| improvement to PMTUD that relaxes the requirement for ICMP support | improvement to PMTUD that relaxes the requirement for ICMP support | |||
| across a path, and improves performance in cases where ICMP is not | across a path, and improves performance in cases where ICMP is not | |||
| consistently conveyed, but still tries to avoid source fragmentation. | consistently conveyed, but still tries to avoid source fragmentation. | |||
| The mechanisms in all four of these RFCs are recommended to be | The mechanisms in all four of these RFCs are recommended to be | |||
| included in TCP implementations. | included in TCP implementations. | |||
| The TCP MSS option specifies an upper bound for the size of packets | The TCP MSS option specifies an upper bound for the size of packets | |||
| that can be received (see [42]). Hence, setting the value in the MSS | that can be received (see [44]). Hence, setting the value in the MSS | |||
| option too small can impact the ability for PMTUD or PLPMTUD to find | option too small can impact the ability for PMTUD or PLPMTUD to find | |||
| a larger path MTU. RFC 1191 discusses this implication of many older | a larger path MTU. RFC 1191 discusses this implication of many older | |||
| TCP implementations setting the TCP MSS to 536 (corresponding to the | TCP implementations setting the TCP MSS to 536 (corresponding to the | |||
| IPv4 576 byte default MTU) for non-local destinations, rather than | IPv4 576 byte default MTU) for non-local destinations, rather than | |||
| deriving it from the MTUs of connected interfaces as recommended. | deriving it from the MTUs of connected interfaces as recommended. | |||
| 3.7.3. Interfaces with Variable MTU Values | 3.7.3. Interfaces with Variable MTU Values | |||
| The effective MTU can sometimes vary, as when used with variable | The effective MTU can sometimes vary, as when used with variable | |||
| compression, e.g., RObust Header Compression (ROHC) [35]. It is | compression, e.g., RObust Header Compression (ROHC) [38]. It is | |||
| tempting for a TCP implementation to advertise the largest possible | tempting for a TCP implementation to advertise the largest possible | |||
| MSS, to support the most efficient use of compressed payloads. | MSS, to support the most efficient use of compressed payloads. | |||
| Unfortunately, some compression schemes occasionally need to transmit | Unfortunately, some compression schemes occasionally need to transmit | |||
| full headers (and thus smaller payloads) to resynchronize state at | full headers (and thus smaller payloads) to resynchronize state at | |||
| their endpoint compressors/decompressors. If the largest MTU is used | their endpoint compressors/decompressors. If the largest MTU is used | |||
| to calculate the value to advertise in the MSS option, TCP | to calculate the value to advertise in the MSS option, TCP | |||
| retransmission may interfere with compressor resynchronization. | retransmission may interfere with compressor resynchronization. | |||
| As a result, when the effective MTU of an interface varies packet-to- | As a result, when the effective MTU of an interface varies packet-to- | |||
| packet, TCP implementations SHOULD use the smallest effective MTU of | packet, TCP implementations SHOULD use the smallest effective MTU of | |||
| the interface to calculate the value to advertise in the MSS option | the interface to calculate the value to advertise in the MSS option | |||
| (SHLD-6). | (SHLD-6). | |||
| 3.7.4. Nagle Algorithm | 3.7.4. Nagle Algorithm | |||
| The "Nagle algorithm" was described in RFC 896 [17] and was | The "Nagle algorithm" was described in RFC 896 [18] and was | |||
| recommended in RFC 1122 [18] for mitigation of an early problem of | recommended in RFC 1122 [20] for mitigation of an early problem of | |||
| too many small packets being generated. It has been implemented in | too many small packets being generated. It has been implemented in | |||
| most current TCP code bases, sometimes with minor variations (see | most current TCP code bases, sometimes with minor variations (see | |||
| Appendix A.3). | Appendix A.3). | |||
| If there is unacknowledged data (i.e., SND.NXT > SND.UNA), then the | If there is unacknowledged data (i.e., SND.NXT > SND.UNA), then the | |||
| sending TCP endpoint buffers all user data (regardless of the PSH | sending TCP endpoint buffers all user data (regardless of the PSH | |||
| bit), until the outstanding data has been acknowledged or until the | bit), until the outstanding data has been acknowledged or until the | |||
| TCP endpoint can send a full-sized segment (Eff.snd.MSS bytes). | TCP endpoint can send a full-sized segment (Eff.snd.MSS bytes). | |||
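
The rule in the preceding paragraph can be written as a small predicate. This is an illustrative sketch only: the function names are hypothetical, and it deliberately ignores the send window and congestion control, which also limit sending.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def has_unacked_data(snd_nxt: int, snd_una: int) -> bool:
    """True when SND.NXT > SND.UNA in sequence space (wraparound-safe)."""
    return (snd_nxt - snd_una) % SEQ_MOD > 0


def nagle_allows_send(snd_nxt: int, snd_una: int, buffered_bytes: int,
                      eff_snd_mss: int, nagle_enabled: bool = True) -> bool:
    """Decide whether buffered user data may be sent now."""
    if buffered_bytes == 0:
        return False          # nothing to send
    if not nagle_enabled:
        return True           # the application disabled the algorithm
    if buffered_bytes >= eff_snd_mss:
        return True           # a full-sized segment can always go out
    # A small segment is sent only when nothing is outstanding.
    return not has_unacked_data(snd_nxt, snd_una)
```

For example, with 10 bytes buffered and an Eff.snd.MSS of 1460, the predicate is true when SND.NXT equals SND.UNA and false while earlier data is still unacknowledged.
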
| A TCP implementation SHOULD implement the Nagle Algorithm to coalesce | A TCP implementation SHOULD implement the Nagle Algorithm to coalesce | |||
| short segments (SHLD-7). However, there MUST be a way for an | short segments (SHLD-7). However, there MUST be a way for an | |||
| application to disable the Nagle algorithm on an individual | application to disable the Nagle algorithm on an individual | |||
| connection (MUST-17). In all cases, sending data is also subject to | connection (MUST-17). In all cases, sending data is also subject to | |||
| the limitation imposed by the Slow Start algorithm [9]. | the limitation imposed by the Slow Start algorithm [8]. | |||
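
Neither draft names a specific API for the per-connection disable required by MUST-17. On stacks that expose the common sockets interface, the `TCP_NODELAY` option is the usual way to do it; the sketch below assumes such a platform, and the helper name is hypothetical.

```python
import socket


def open_socket_without_nagle() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled for it alone."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY is a per-socket option; other connections are unaffected.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Because the option is set per socket, it disables the algorithm for one connection without changing the behavior of any other.
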
| Since there can be problematic interactions between the Nagle | Since there can be problematic interactions between the Nagle | |||
| Algorithm and delayed acknowledgements, some implementations use | Algorithm and delayed acknowledgements, some implementations use | |||
| minor variations of the Nagle algorithm, such as the one described in | minor variations of the Nagle algorithm, such as the one described in | |||
| Appendix A.3. | Appendix A.3. | |||
| 3.7.5. IPv6 Jumbograms | 3.7.5. IPv6 Jumbograms | |||
| In order to support TCP over IPv6 Jumbograms, implementations need to | In order to support TCP over IPv6 Jumbograms, implementations need to | |||
| be able to send TCP segments larger than the 64KB limit that the MSS | be able to send TCP segments larger than the 64KB limit that the MSS | |||
| option can convey. RFC 2675 [5] defines that an MSS value of 65,535 | option can convey. RFC 2675 [25] defines that an MSS value of 65,535 | |||
| bytes is to be treated as infinity, and Path MTU Discovery [14] is | bytes is to be treated as infinity, and Path MTU Discovery [14] is | |||
| used to determine the actual MSS. | used to determine the actual MSS. | |||
| The Jumbo Payload option need not be implemented or understood by | The Jumbo Payload option need not be implemented or understood by | |||
| IPv6 nodes that do not support attachment to links with an MTU greater | IPv6 nodes that do not support attachment to links with an MTU greater | |||
| than 65,575 [5], and the present IPv6 Node Requirements does not | than 65,575 [25], and the present IPv6 Node Requirements does not | |||
| include support for Jumbograms [53]. | include support for Jumbograms [55]. | |||
| 3.8. Data Communication | 3.8. Data Communication | |||
| Once the connection is established data is communicated by the | Once the connection is established data is communicated by the | |||
| exchange of segments. Because segments may be lost due to errors | exchange of segments. Because segments may be lost due to errors | |||
| (checksum test failure), or network congestion, TCP uses | (checksum test failure), or network congestion, TCP uses | |||
| retransmission to ensure delivery of every segment. Duplicate | retransmission to ensure delivery of every segment. Duplicate | |||
| segments may arrive due to network or TCP retransmission. As | segments may arrive due to network or TCP retransmission. As | |||
| discussed in the section on sequence numbers the TCP implementation | discussed in the section on sequence numbers, the TCP implementation | |||
| performs certain tests on the sequence and acknowledgment numbers in | performs certain tests on the sequence and acknowledgment numbers in | |||
| the segments to verify their acceptability. | the segments to verify their acceptability. | |||
| The sender of data keeps track of the next sequence number to use in | The sender of data keeps track of the next sequence number to use in | |||
| the variable SND.NXT. The receiver of data keeps track of the next | the variable SND.NXT. The receiver of data keeps track of the next | |||
| sequence number to expect in the variable RCV.NXT. The sender of | sequence number to expect in the variable RCV.NXT. The sender of | |||
| data keeps track of the oldest unacknowledged sequence number in the | data keeps track of the oldest unacknowledged sequence number in the | |||
| variable SND.UNA. If the data flow is momentarily idle and all data | variable SND.UNA. If the data flow is momentarily idle and all data | |||
| sent has been acknowledged then the three variables will be equal. | sent has been acknowledged then the three variables will be equal. | |||
| skipping to change at page 41, line 14 ¶ | skipping to change at page 41, line 14 ¶ | |||
| 3.8.1. Retransmission Timeout | 3.8.1. Retransmission Timeout | |||
| Because of the variability of the networks that compose an | Because of the variability of the networks that compose an | |||
| internetwork system and the wide range of uses of TCP connections the | internetwork system and the wide range of uses of TCP connections the | |||
| retransmission timeout (RTO) must be dynamically determined. | retransmission timeout (RTO) must be dynamically determined. | |||
| The RTO MUST be computed according to the algorithm in [10], | The RTO MUST be computed according to the algorithm in [10], | |||
| including Karn's algorithm for taking RTT samples (MUST-18). | including Karn's algorithm for taking RTT samples (MUST-18). | |||
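
Reference [10] is RFC 6298. A compact sketch of its computation follows, assuming the constants given there (alpha = 1/8, beta = 1/4, K = 4, an initial RTO of 1 second, a 1 second minimum, and an optional maximum of at least 60 seconds). The class and method names are hypothetical, and the clock granularity default is an arbitrary illustrative value.

```python
class RtoEstimator:
    """Smoothed RTT and RTO state, following the shape of RFC 6298."""

    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4

    def __init__(self, clock_granularity: float = 0.001,
                 min_rto: float = 1.0, max_rto: float = 60.0) -> None:
        self.g = clock_granularity
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None      # no RTT measurement taken yet
        self.rttvar = None
        self.rto = 1.0        # initial RTO before any measurement

    def on_rtt_sample(self, r: float, retransmitted: bool = False) -> float:
        """Fold in one RTT sample (seconds) and return the new RTO."""
        if retransmitted:
            # Karn's algorithm: samples from retransmitted segments are
            # ambiguous and must not be used.
            return self.rto
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(self.max_rto, max(self.min_rto, rto))
        return self.rto

    def on_timeout(self) -> float:
        """Exponential back-off: double the RTO after a retransmission timeout."""
        self.rto = min(self.max_rto, self.rto * 2)
        return self.rto
```

With two consecutive 0.5 second samples, the RTO moves from the initial 1 second to 1.5 seconds and then to 1.25 seconds; a timeout after that doubles it to 2.5 seconds.
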
| RFC 793 contains an early example procedure for computing the RTO. | RFC 793 contains an early example procedure for computing the RTO, | |||
| This was then replaced by the algorithm described in RFC 1122, and | based on work mentioned in IEN 177 [72]. This was then replaced by | |||
| subsequently updated in RFC 2988, and then again in RFC 6298. | the algorithm described in RFC 1122, and subsequently updated in RFC | |||
| 2988, and then again in RFC 6298. | ||||
| RFC 1122 allows that if a retransmitted packet is identical to the | RFC 1122 allows that if a retransmitted packet is identical to the | |||
| original packet (which implies not only that the data boundaries have | original packet (which implies not only that the data boundaries have | |||
| not changed, but also that none of the headers have changed), then | not changed, but also that none of the headers have changed), then | |||
| the same IPv4 Identification field MAY be used (see Section 3.2.1.5 | the same IPv4 Identification field MAY be used (see Section 3.2.1.5 | |||
| of RFC 1122) (MAY-4). The same IP identification field may be reused | of RFC 1122) (MAY-4). The same IP identification field may be reused | |||
| anyways, since it is only meaningful when a datagram is fragmented | anyways, since it is only meaningful when a datagram is fragmented | |||
| [43]. TCP implementations should not rely on or typically interact | [45]. TCP implementations should not rely on or typically interact | |||
| with this IPv4 header field in any way. It is not a reasonable way | with this IPv4 header field in any way. It is not a reasonable way | |||
| to either indicate duplicate sent segments or to identify duplicate | to either indicate duplicate sent segments or to identify duplicate | |||
| received segments. | received segments. | |||
| 3.8.2. TCP Congestion Control | 3.8.2. TCP Congestion Control | |||
| RFC 2914 [6] explains the importance of congestion control for the | RFC 2914 [5] explains the importance of congestion control for the | |||
| Internet. | Internet. | |||
| RFC 1122 required implementation of Van Jacobson's congestion control | RFC 1122 required implementation of Van Jacobson's congestion control | |||
| algorithms slow start and congestion avoidance together with | algorithms slow start and congestion avoidance together with | |||
| exponential back-off for successive RTO values for the same segment. | exponential back-off for successive RTO values for the same segment. | |||
| RFC 2581 provided an IETF Standards Track description of slow start and | RFC 2581 provided an IETF Standards Track description of slow start and | |||
| congestion avoidance, along with fast retransmit and fast recovery. | congestion avoidance, along with fast retransmit and fast recovery. | |||
| RFC 5681 is the current description of these algorithms and is the | RFC 5681 is the current description of these algorithms and is the | |||
| current Standards Track specification providing guidelines for TCP | current Standards Track specification providing guidelines for TCP | |||
| congestion control. RFC 6298 describes exponential back-off of RTO | congestion control. RFC 6298 describes exponential back-off of RTO | |||
| skipping to change at page 42, line 7 ¶ | skipping to change at page 42, line 8 ¶ | |||
| A TCP endpoint MUST implement the basic congestion control algorithms | A TCP endpoint MUST implement the basic congestion control algorithms | |||
| slow start, congestion avoidance, and exponential back-off of RTO to | slow start, congestion avoidance, and exponential back-off of RTO to | |||
| avoid creating congestion collapse conditions (MUST-19). RFC 5681 | avoid creating congestion collapse conditions (MUST-19). RFC 5681 | |||
| and RFC 6298 describe the basic algorithms on the IETF Standards | and RFC 6298 describe the basic algorithms on the IETF Standards | |||
| Track that are broadly applicable. Multiple other suitable | Track that are broadly applicable. Multiple other suitable | |||
| algorithms exist and have been widely used. Many TCP implementations | algorithms exist and have been widely used. Many TCP implementations | |||
| support a set of alternative algorithms that can be configured for | support a set of alternative algorithms that can be configured for | |||
| use on the endpoint. An endpoint MAY implement such alternative | use on the endpoint. An endpoint MAY implement such alternative | |||
| algorithms provided that the algorithms are conformant with the TCP | algorithms provided that the algorithms are conformant with the TCP | |||
| specifications from the IETF Standards Track as described in RFC | specifications from the IETF Standards Track as described in RFC | |||
| 2914, RFC 5033 [8], and RFC 8961 [15] (MAY-18). | 2914, RFC 5033 [7], and RFC 8961 [15] (MAY-18). | |||
| Explicit Congestion Notification (ECN) was defined in RFC 3168 and is | Explicit Congestion Notification (ECN) was defined in RFC 3168 and is | |||
| an IETF Standards Track enhancement that has many benefits [50]. | an IETF Standards Track enhancement that has many benefits [52]. | |||
| A TCP endpoint SHOULD implement ECN as described in RFC 3168 (SHLD- | A TCP endpoint SHOULD implement ECN as described in RFC 3168 (SHLD- | |||
| 8). | 8). | |||
| 3.8.3. TCP Connection Failures | 3.8.3. TCP Connection Failures | |||
| Excessive retransmission of the same segment by a TCP endpoint | Excessive retransmission of the same segment by a TCP endpoint | |||
| indicates some failure of the remote host or the Internet path. This | indicates some failure of the remote host or the Internet path. This | |||
| failure may be of short or long duration. The following procedure | failure may be of short or long duration. The following procedure | |||
| MUST be used to handle excessive retransmissions of data segments | MUST be used to handle excessive retransmissions of data segments | |||
| (MUST-20): | (MUST-20): | |||
| (a) There are two thresholds R1 and R2 measuring the amount of | (a) There are two thresholds R1 and R2 measuring the amount of | |||
| retransmission that has occurred for the same segment. R1 and R2 | retransmission that has occurred for the same segment. R1 and R2 | |||
| might be measured in time units or as a count of retransmissions | might be measured in time units or as a count of retransmissions | |||
| (with the current RTO and corresponding backoffs as a conversion | (with the current RTO and corresponding backoffs as a conversion | |||
| factor, if needed). | factor, if needed). | |||
| (b) When the number of transmissions of the same segment reaches | (b) When the number of transmissions of the same segment reaches | |||
| or exceeds threshold R1, pass negative advice (see Section 3.3.1.4 | or exceeds threshold R1, pass negative advice (see Section 3.3.1.4 | |||
| of [18]) to the IP layer, to trigger dead-gateway diagnosis. | of [20]) to the IP layer, to trigger dead-gateway diagnosis. | |||
| (c) When the number of transmissions of the same segment reaches a | (c) When the number of transmissions of the same segment reaches a | |||
| threshold R2 greater than R1, close the connection. | threshold R2 greater than R1, close the connection. | |||
| (d) An application MUST (MUST-21) be able to set the value for R2 | (d) An application MUST (MUST-21) be able to set the value for R2 | |||
| for a particular connection. For example, an interactive | for a particular connection. For example, an interactive | |||
| application might set R2 to "infinity," giving the user control | application might set R2 to "infinity," giving the user control | |||
| over when to disconnect. | over when to disconnect. | |||
| (e) TCP implementations SHOULD inform the application of the | (e) TCP implementations SHOULD inform the application of the | |||
| delivery problem (unless such information has been disabled by the | delivery problem (unless such information has been disabled by the | |||
| application; see Asynchronous Reports section), when R1 is reached | application; see Asynchronous Reports section), when R1 is reached | |||
| and before R2 (SHLD-9). This will allow a remote login (User | and before R2 (SHLD-9). This will allow a remote login | |||
| Telnet) application program to inform the user, for example. | application program to inform the user, for example. | |||
| The value of R1 SHOULD correspond to at least 3 retransmissions, at | The value of R1 SHOULD correspond to at least 3 retransmissions, at | |||
| the current RTO (SHLD-10). The value of R2 SHOULD correspond to at | the current RTO (SHLD-10). The value of R2 SHOULD correspond to at | |||
| least 100 seconds (SHLD-11). | least 100 seconds (SHLD-11). | |||
| An attempt to open a TCP connection could fail with excessive | An attempt to open a TCP connection could fail with excessive | |||
| retransmissions of the SYN segment or by receipt of a RST segment or | retransmissions of the SYN segment or by receipt of a RST segment or | |||
| an ICMP Port Unreachable. SYN retransmissions MUST be handled in the | an ICMP Port Unreachable. SYN retransmissions MUST be handled in the | |||
| general way just described for data retransmissions, including | general way just described for data retransmissions, including | |||
| notification of the application layer. | notification of the application layer. | |||
| skipping to change at page 44, line 10 ¶ | skipping to change at page 44, line 10 ¶ | |||
| An implementation SHOULD send a keep-alive segment with no data | An implementation SHOULD send a keep-alive segment with no data | |||
| (SHLD-12); however, it MAY be configurable to send a keep-alive | (SHLD-12); however, it MAY be configurable to send a keep-alive | |||
| segment containing one garbage octet (MAY-6), for compatibility with | segment containing one garbage octet (MAY-6), for compatibility with | |||
| erroneous TCP implementations. | erroneous TCP implementations. | |||
| 3.8.5. The Communication of Urgent Information | 3.8.5. The Communication of Urgent Information | |||
| As a result of implementation differences and middlebox interactions, | As a result of implementation differences and middlebox interactions, | |||
| new applications SHOULD NOT employ the TCP urgent mechanism | new applications SHOULD NOT employ the TCP urgent mechanism | |||
| (SHLD-13). However, TCP implementations MUST still include support for the | (SHLD-13). However, TCP implementations MUST still include support for the | |||
| urgent mechanism (MUST-30). Details can be found in RFC 6093 [38]. | urgent mechanism (MUST-30). Information on how some TCP | |||
| | implementations interpret the urgent pointer can be found in RFC 6093 | |||
| | [40]. | |||
| The objective of the TCP urgent mechanism is to allow the sending | The objective of the TCP urgent mechanism is to allow the sending | |||
| user to stimulate the receiving user to accept some urgent data and | user to stimulate the receiving user to accept some urgent data and | |||
| to permit the receiving TCP endpoint to indicate to the receiving | to permit the receiving TCP endpoint to indicate to the receiving | |||
| user when all the currently known urgent data has been received by | user when all the currently known urgent data has been received by | |||
| the user. | the user. | |||
| This mechanism permits a point in the data stream to be designated as | This mechanism permits a point in the data stream to be designated as | |||
| the end of urgent information. Whenever this point is in advance of | the end of urgent information. Whenever this point is in advance of | |||
| the receive sequence number (RCV.NXT) at the receiving TCP endpoint, | the receive sequence number (RCV.NXT) at the receiving TCP endpoint, | |||
| skipping to change at page 44, line 42 | skipping to change at page 44, line 44 | |||
| To send an urgent indication the user must also send at least one | To send an urgent indication the user must also send at least one | |||
| data octet. If the sending user also indicates a push, timely | data octet. If the sending user also indicates a push, timely | |||
| delivery of the urgent information to the destination process is | delivery of the urgent information to the destination process is | |||
| enhanced. Note that because changes in the urgent pointer correspond | enhanced. Note that because changes in the urgent pointer correspond | |||
| to data being written by a sending application, the urgent pointer | to data being written by a sending application, the urgent pointer | |||
| can not "recede" in the sequence space, but a TCP receiver should be | can not "recede" in the sequence space, but a TCP receiver should be | |||
| robust to invalid urgent pointer values. | robust to invalid urgent pointer values. | |||
| A TCP implementation MUST support a sequence of urgent data of any | A TCP implementation MUST support a sequence of urgent data of any | |||
| length (MUST-31). [18] | length (MUST-31). [20] | |||
| The urgent pointer MUST point to the sequence number of the octet | The urgent pointer MUST point to the sequence number of the octet | |||
| following the urgent data (MUST-62). | following the urgent data (MUST-62). | |||
| A TCP implementation MUST (MUST-32) inform the application layer | A TCP implementation MUST (MUST-32) inform the application layer | |||
| asynchronously whenever it receives an Urgent pointer and there was | asynchronously whenever it receives an Urgent pointer and there was | |||
| previously no pending urgent data, or whenever the Urgent pointer | previously no pending urgent data, or whenever the Urgent pointer | |||
| advances in the data stream. The TCP implementation MUST (MUST-33) | advances in the data stream. The TCP implementation MUST (MUST-33) | |||
| provide a way for the application to learn how much urgent data | provide a way for the application to learn how much urgent data | |||
| remains to be read from the connection, or at least to determine | remains to be read from the connection, or at least to determine | |||
| whether or not more urgent data remains to be read [18]. | whether more urgent data remains to be read [20]. | |||
| 3.8.6. Managing the Window | 3.8.6. Managing the Window | |||
| The window sent in each segment indicates the range of sequence | The window sent in each segment indicates the range of sequence | |||
| numbers the sender of the window (the data receiver) is currently | numbers the sender of the window (the data receiver) is currently | |||
| prepared to accept. There is an assumption that this is related to | prepared to accept. There is an assumption that this is related to | |||
| the currently available data buffer space available for this | the currently available data buffer space available for this | |||
| connection. | connection. | |||
| The sending TCP endpoint packages the data to be transmitted into | The sending TCP endpoint packages the data to be transmitted into | |||
| segments that fit the current window, and may repackage segments on | segments that fit the current window, and may repackage segments on | |||
| the retransmission queue. Such repackaging is not required, but may | the retransmission queue. Such repackaging is not required, but may | |||
| be helpful. | be helpful. | |||
| In a connection with a one-way data flow, the window information will | In a connection with a one-way data flow, the window information will | |||
| be carried in acknowledgment segments that all have the same sequence | be carried in acknowledgment segments that all have the same sequence | |||
| number so there will be no way to reorder them if they arrive out of | number, so there will be no way to reorder them if they arrive out of | |||
| order. This is not a serious problem, but it will allow the window | order. This is not a serious problem, but it will allow the window | |||
| information to be on occasion temporarily based on old reports from | information to be on occasion temporarily based on old reports from | |||
| the data receiver. A refinement to avoid this problem is to act on | the data receiver. A refinement to avoid this problem is to act on | |||
| the window information from segments that carry the highest | the window information from segments that carry the highest | |||
| acknowledgment number (that is segments with acknowledgment number | acknowledgment number (that is segments with acknowledgment number | |||
| equal or greater than the highest previously received). | equal or greater than the highest previously received). | |||
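
The refinement described above can be sketched as a small update rule. This is an illustrative Python sketch under stated assumptions: the class `SendWindow` and its attribute names are hypothetical, and the rule is reduced to the acknowledgment-number test in the paragraph above, with the comparison done modulo 2**32 so that it keeps working across sequence-number wraparound.

```python
SEQ_MOD = 1 << 32
HALF_SEQ = 1 << 31


def ack_not_older(seg_ack, highest_ack):
    """True if seg_ack is equal to or ahead of highest_ack, modulo 2**32."""
    return ((seg_ack - highest_ack) % SEQ_MOD) < HALF_SEQ


class SendWindow:
    """Hypothetical holder for the sender's view of the peer's window."""

    def __init__(self, snd_wnd, highest_ack):
        self.snd_wnd = snd_wnd
        self.highest_ack = highest_ack

    def on_ack(self, seg_ack, seg_wnd):
        """Apply the window from a segment only if its ACK is not stale."""
        if ack_not_older(seg_ack, self.highest_ack):
            self.highest_ack = seg_ack
            self.snd_wnd = seg_wnd
            return True
        # A reordered, older acknowledgment: ignore its window report.
        return False
```

A delayed segment that acknowledges less than what has already been acknowledged leaves `snd_wnd` untouched, so an old report cannot overwrite a newer one.
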
| Indicating a large window encourages transmissions. If more data | Indicating a large window encourages transmissions. If more data | |||
| arrives than can be accepted, it will be discarded. This will result | arrives than can be accepted, it will be discarded. This will result | |||
| in excessive retransmissions, adding unnecessarily to the load on the | in excessive retransmissions, adding unnecessarily to the load on the | |||
| network and the TCP endpoints. Indicating a small window may | network and the TCP endpoints. Indicating a small window may | |||
| restrict the transmission of data to the point of introducing a round | restrict the transmission of data to the point of introducing a round | |||
| trip delay between each new segment transmitted. | trip delay between each new segment transmitted. | |||
| The mechanisms provided allow a TCP endpoint to advertise a large | The mechanisms provided allow a TCP endpoint to advertise a large | |||
| window and to subsequently advertise a much smaller window without | window and to subsequently advertise a much smaller window without | |||
| having accepted that much data. This, so called "shrinking the | having accepted that much data. This, so-called "shrinking the | |||
| window," is strongly discouraged. The robustness principle [18] | window," is strongly discouraged. The robustness principle [20] | |||
| dictates that TCP peers will not shrink the window themselves, but | dictates that TCP peers will not shrink the window themselves, but | |||
| will be prepared for such behavior on the part of other TCP peers. | will be prepared for such behavior on the part of other TCP peers. | |||
| A TCP receiver SHOULD NOT shrink the window, i.e., move the right | A TCP receiver SHOULD NOT shrink the window, i.e., move the right | |||
| window edge to the left (SHLD-14). However, a sending TCP peer MUST | window edge to the left (SHLD-14). However, a sending TCP peer MUST | |||
| be robust against window shrinking, which may cause the "usable | be robust against window shrinking, which may cause the "usable | |||
| window" (see Section 3.8.6.2.1) to become negative (MUST-34). | window" (see Section 3.8.6.2.1) to become negative (MUST-34). | |||
| If this happens, the sender SHOULD NOT send new data (SHLD-15), but | If this happens, the sender SHOULD NOT send new data (SHLD-15), but | |||
| SHOULD retransmit normally the old unacknowledged data between | SHOULD retransmit normally the old unacknowledged data between | |||
| skipping to change at page 46, line 36 | skipping to change at page 46, line 31 | |||
| reported to the other. This is referred to as Zero-Window Probing | reported to the other. This is referred to as Zero-Window Probing | |||
| (ZWP) in other documents. | (ZWP) in other documents. | |||
| Probing of zero (offered) windows MUST be supported (MUST-36). | Probing of zero (offered) windows MUST be supported (MUST-36). | |||
| A TCP implementation MAY keep its offered receive window closed | A TCP implementation MAY keep its offered receive window closed | |||
| indefinitely (MAY-8). As long as the receiving TCP peer continues to | indefinitely (MAY-8). As long as the receiving TCP peer continues to | |||
| send acknowledgments in response to the probe segments, the sending | send acknowledgments in response to the probe segments, the sending | |||
| TCP peer MUST allow the connection to stay open (MUST-37). This | TCP peer MUST allow the connection to stay open (MUST-37). This | |||
| enables TCP to function in scenarios such as the "printer ran out of | enables TCP to function in scenarios such as the "printer ran out of | |||
| paper" situation described in Section 4.2.2.17 of RFC1122. The | paper" situation described in Section 4.2.2.17 of [20]. The behavior | |||
| behavior is subject to the implementation's resource management | is subject to the implementation's resource management concerns, as | |||
| concerns, as noted in [40]. | noted in [42]. | |||
| When the receiving TCP peer has a zero window and a segment arrives | When the receiving TCP peer has a zero window and a segment arrives | |||
| it must still send an acknowledgment showing its next expected | it must still send an acknowledgment showing its next expected | |||
| sequence number and current window (zero). | sequence number and current window (zero). | |||
| The transmitting host SHOULD send the first zero-window probe when a | The transmitting host SHOULD send the first zero-window probe when a | |||
| zero window has existed for the retransmission timeout period | zero window has existed for the retransmission timeout period | |||
| (SHLD-29) (Section 3.8.1), and SHOULD increase exponentially the interval | (SHLD-29) (Section 3.8.1), and SHOULD increase exponentially the interval | |||
| between successive probes (SHLD-30). | between successive probes (SHLD-30). | |||
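
One way to picture the probe timing above is as a schedule of offsets from the moment the window closed. The Python sketch below is illustrative only: the function name, the doubling factor, and the 60-second cap on the interval are assumptions of this sketch, since the text asks only for exponential growth.

```python
def zero_window_probe_times(rto, probes, max_interval=60.0):
    """Return the times (seconds after the window closed) of each probe.

    The first probe is sent once the zero window has lasted one RTO;
    the gap before each later probe doubles, up to max_interval
    (the cap is an assumption of this sketch, not a value from the text).
    """
    times = []
    elapsed = 0.0
    interval = float(rto)
    for _ in range(probes):
        elapsed += interval
        times.append(elapsed)
        interval = min(interval * 2, max_interval)
    return times
```

For example, `zero_window_probe_times(1.0, 5)` yields probes at 1, 3, 7, 15, and 31 seconds. Because the receiver may keep its window closed indefinitely, a real sender keeps probing for as long as the probes are acknowledged.
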
| skipping to change at page 47, line 28 | skipping to change at page 47, line 18 | |||
| 3.8.6.2.1. Sender's Algorithm - When to Send Data | 3.8.6.2.1. Sender's Algorithm - When to Send Data | |||
| A TCP implementation MUST include a SWS avoidance algorithm in the | A TCP implementation MUST include a SWS avoidance algorithm in the | |||
| sender (MUST-38). | sender (MUST-38). | |||
| The Nagle algorithm from Section 3.7.4 additionally describes how to | The Nagle algorithm from Section 3.7.4 additionally describes how to | |||
| coalesce short segments. | coalesce short segments. | |||
| The sender's SWS avoidance algorithm is more difficult than the | The sender's SWS avoidance algorithm is more difficult than the | |||
| receivers's, because the sender does not know (directly) the | receiver's, because the sender does not know (directly) the | |||
| receiver's total buffer space RCV.BUFF. An approach that has been | receiver's total buffer space RCV.BUFF. An approach that has been | |||
| found to work well is for the sender to calculate Max(SND.WND), the | found to work well is for the sender to calculate Max(SND.WND), the | |||
| maximum send window it has seen so far on the connection, and to use | maximum send window it has seen so far on the connection, and to use | |||
| this value as an estimate of RCV.BUFF. Unfortunately, this can only | this value as an estimate of RCV.BUFF. Unfortunately, this can only | |||
| be an estimate; the receiver may at any time reduce the size of | be an estimate; the receiver may at any time reduce the size of | |||
| RCV.BUFF. To avoid a resulting deadlock, it is necessary to have a | RCV.BUFF. To avoid a resulting deadlock, it is necessary to have a | |||
| timeout to force transmission of data, overriding the SWS avoidance | timeout to force transmission of data, overriding the SWS avoidance | |||
| algorithm. In practice, this timeout should seldom occur. | algorithm. In practice, this timeout should seldom occur. | |||
| The "usable window" is: | The "usable window" is: | |||
| skipping to change at page 49, line 31 | skipping to change at page 49, line 20 | |||
| min( Fr * RCV.BUFF, Eff.snd.MSS ) | min( Fr * RCV.BUFF, Eff.snd.MSS ) | |||
| where Fr is a fraction whose recommended value is 1/2, and | where Fr is a fraction whose recommended value is 1/2, and | |||
| Eff.snd.MSS is the effective send MSS for the connection (see | Eff.snd.MSS is the effective send MSS for the connection (see | |||
| Section 3.7.1). When the inequality is satisfied, RCV.WND is set to | Section 3.7.1). When the inequality is satisfied, RCV.WND is set to | |||
| RCV.BUFF-RCV.USER. | RCV.BUFF-RCV.USER. | |||
| Note that the general effect of this algorithm is to advance RCV.WND | Note that the general effect of this algorithm is to advance RCV.WND | |||
| in increments of Eff.snd.MSS (for realistic receive buffers: | in increments of Eff.snd.MSS (for realistic receive buffers: | |||
| Eff.snd.MSS < RCV.BUFF/2). Note also that the receiver must use its | Eff.snd.MSS < RCV.BUFF/2). Note also that the receiver must use its | |||
| own Eff.snd.MSS, assuming it is the same as the sender's. | own Eff.snd.MSS, making the assumption that it is the same as the | |||
| | sender's. | |||
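
The receiver-side rule can be sketched in a few lines of Python. The left-hand side of the inequality falls in a part of the text that this diff skips; the form used here, RCV.BUFF - RCV.USER - RCV.WND, is the one given in RFC 1122 Section 4.2.3.3 and should be checked against the full draft. The function and parameter names are hypothetical.

```python
def receiver_sws_update(rcv_buff, rcv_user, rcv_wnd, eff_snd_mss, fr=0.5):
    """Return the window to advertise after applying receiver SWS avoidance.

    rcv_buff    -- total receive buffer space (RCV.BUFF)
    rcv_user    -- received data not yet consumed by the user (RCV.USER)
    rcv_wnd     -- window currently advertised (RCV.WND)
    eff_snd_mss -- effective send MSS (Eff.snd.MSS)
    fr          -- the fraction Fr; 1/2 is the recommended value
    """
    threshold = min(fr * rcv_buff, eff_snd_mss)
    # Assumed left-hand side (from RFC 1122): space freed but not yet offered.
    if rcv_buff - rcv_user - rcv_wnd >= threshold:
        return rcv_buff - rcv_user
    # Otherwise keep the current window instead of opening it slightly.
    return rcv_wnd
```

With an 8192-byte buffer and a 1460-byte MSS, freeing 1192 bytes leaves the advertised window unchanged, while freeing 2192 bytes opens it to the full available space. This is the "increments of Eff.snd.MSS" effect noted above.
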
| 3.8.6.3. Delayed Acknowledgements - When to Send an ACK Segment | 3.8.6.3. Delayed Acknowledgements - When to Send an ACK Segment | |||
| A host that is receiving a stream of TCP data segments can increase | A host that is receiving a stream of TCP data segments can increase | |||
| efficiency in both the Internet and the hosts by sending fewer than | efficiency in both the Internet and the hosts by sending fewer than | |||
| one ACK (acknowledgment) segment per data segment received; this is | one ACK (acknowledgment) segment per data segment received; this is | |||
| known as a "delayed ACK". | known as a "delayed ACK". | |||
| A TCP endpoint SHOULD implement a delayed ACK (SHLD-18), but an ACK | A TCP endpoint SHOULD implement a delayed ACK (SHLD-18), but an ACK | |||
| should not be excessively delayed; in particular, the delay MUST be | should not be excessively delayed; in particular, the delay MUST be | |||
| less than 0.5 seconds (MUST-40). An ACK SHOULD be generated for at | less than 0.5 seconds (MUST-40). An ACK SHOULD be generated for at | |||
| least every second full-sized segment or 2*RMSS bytes of new data | least every second full-sized segment or 2*RMSS bytes of new data | |||
| (where RMSS is the MSS specified by the TCP endpoint receiving the | (where RMSS is the MSS specified by the TCP endpoint receiving the | |||
| segments to be acknowledged, or the default value if not specified) | segments to be acknowledged, or the default value if not specified) | |||
| (SHLD-19). Excessive delays on ACKs can disturb the round-trip | (SHLD-19). Excessive delays on ACKs can disturb the round-trip | |||
| timing and packet "clocking" algorithms. More complete discussion of | timing and packet "clocking" algorithms. More complete discussion of | |||
| delayed ACK behavior is in Section 4.2 of RFC 5681 [9], including | delayed ACK behavior is in Section 4.2 of RFC 5681 [8], including | |||
| recomendations to immediately acknowledge out-of-order segments, | recommendations to immediately acknowledge out-of-order segments, | |||
| segments above a gap in sequence space, or segments that fill all or | segments above a gap in sequence space, or segments that fill all or | |||
| part of a gap, in order to accelerate loss recovery. | part of a gap, in order to accelerate loss recovery. | |||
| Note that there are several current practices that further lead to a | Note that there are several current practices that further lead to a | |||
| reduced number of ACKs, including generic receive offload (GRO), ACK | reduced number of ACKs, including generic receive offload (GRO) [73], | |||
| compression, and ACK decimation [26]. | ACK compression, and ACK decimation [29]. | |||
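
The delayed ACK rules in this section can be modeled with a small state holder. This is a simplified Python sketch, not an implementation from either draft: the class name, the 0.2-second default delay, and the single `in_order` flag (standing in for the out-of-order and gap-filling cases that RFC 5681 says to acknowledge immediately) are all assumptions made for illustration.

```python
class DelayedAck:
    """Decides when a receiver should send an ACK (simplified model)."""

    def __init__(self, rmss, max_delay=0.2):
        # The delay MUST be less than 0.5 seconds (MUST-40); the 0.2 s
        # default is an assumption of this sketch.
        if not 0 < max_delay < 0.5:
            raise ValueError("ACK delay must be less than 0.5 seconds")
        self.rmss = rmss
        self.max_delay = max_delay
        self.pending_bytes = 0
        self.deadline = None

    def _ack_sent(self):
        self.pending_bytes = 0
        self.deadline = None

    def on_data(self, nbytes, now, in_order=True):
        """Return True if an ACK should be sent immediately."""
        if not in_order:
            # Out-of-order or gap-filling data: acknowledge at once.
            self._ack_sent()
            return True
        self.pending_bytes += nbytes
        if self.pending_bytes >= 2 * self.rmss:
            # At least 2*RMSS bytes of new data (SHLD-19).
            self._ack_sent()
            return True
        if self.deadline is None:
            self.deadline = now + self.max_delay
        return False

    def on_timer(self, now):
        """Return True if the delay has expired and an ACK is due."""
        if self.deadline is not None and now >= self.deadline:
            self._ack_sent()
            return True
        return False
```

A caller would invoke `on_data` for each arriving segment and `on_timer` from its clock tick, sending an ACK whenever either returns True.
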
| 3.9. Interfaces | 3.9. Interfaces | |||
| There are of course two interfaces of concern: the user/TCP interface | There are of course two interfaces of concern: the user/TCP interface | |||
| and the TCP/lower-level interface. We have a fairly elaborate model | and the TCP/lower level interface. We have a fairly elaborate model | |||
| of the user/TCP interface, but the interface to the lower level | of the user/TCP interface, but the interface to the lower level | |||
| protocol module is left unspecified here, since it will be specified | protocol module is left unspecified here, since it will be specified | |||
| in detail by the specification of the lower level protocol. For the | in detail by the specification of the lower level protocol. For the | |||
| case that the lower level is IP we note some of the parameter values | case that the lower level is IP we note some of the parameter values | |||
| that TCP implementations might use. | that TCP implementations might use. | |||
| 3.9.1. User/TCP Interface | 3.9.1. User/TCP Interface | |||
| The following functional description of user commands to the TCP | The following functional description of user commands to the TCP | |||
| implementation is, at best, fictional, since every operating system | implementation is, at best, fictional, since every operating system | |||
| will have different facilities. Consequently, we must warn readers | will have different facilities. Consequently, we must warn readers | |||
| that different TCP implementations may have different user | that different TCP implementations may have different user | |||
| interfaces. However, all TCP implementations must provide a certain | interfaces. However, all TCP implementations must provide a certain | |||
| minimum set of services to guarantee that all TCP implementations can | minimum set of services to guarantee that all TCP implementations can | |||
| support the same protocol hierarchy. This section specifies the | support the same protocol hierarchy. This section specifies the | |||
| functional interfaces required of all TCP implementations. | functional interfaces required of all TCP implementations. | |||
| Section 3.1 of [52] also identifies primitives provided by TCP, and | Section 3.1 of [54] also identifies primitives provided by TCP, and | |||
| could be used as an additional reference for implementers. | could be used as an additional reference for implementers. | |||
| TCP User Commands | The following sections functionally characterize a USER/TCP | |||
| | interface. The notation used is similar to most procedure or | |||
| The following sections functionally characterize a USER/TCP | function calls in high level languages, but this usage is not meant | |||
| interface. The notation used is similar to most procedure or | to rule out trap type service calls. | |||
| function calls in high level languages, but this usage is not | ||||
| meant to rule out trap type service calls. | ||||
| The user commands described below specify the basic functions the | ||||
| TCP implementation must perform to support interprocess | ||||
| communication. Individual implementations must define their own | ||||
| exact format, and may provide combinations or subsets of the basic | ||||
| functions in single calls. In particular, some implementations | ||||
| may wish to automatically OPEN a connection on the first SEND or | ||||
| RECEIVE issued by the user for a given connection. | ||||
| In providing interprocess communication facilities, the TCP | The user commands described below specify the basic functions the TCP | |||
| implementation must not only accept commands, but must also return | implementation must perform to support interprocess communication. | |||
| information to the processes it serves. The latter consists of: | Individual implementations must define their own exact format, and | |||
| | may provide combinations or subsets of the basic functions in single | |||
| | calls. In particular, some implementations may wish to automatically | |||
| | OPEN a connection on the first SEND or RECEIVE issued by the user for | |||
| | a given connection. | |||
| (a) general information about a connection (e.g., interrupts, | In providing interprocess communication facilities, the TCP | |||
| remote close, binding of unspecified remote socket). | implementation must not only accept commands, but must also return | |||
| | information to the processes it serves. The latter consists of: | |||
| (b) replies to specific user commands indicating success or | (a) general information about a connection (e.g., interrupts, | |||
| various types of failure. | remote close, binding of unspecified remote socket). | |||
| Open | (b) replies to specific user commands indicating success or | |||
| | various types of failure. | |||
| Format: OPEN (local port, remote socket, active/passive [, | 3.9.1.1. Open | |||
| timeout] [, DiffServ field] [, security/compartment] [local IP | ||||
| address,] [, options]) -> local connection name | ||||
| If the active/passive flag is set to passive, then this is a | Format: OPEN (local port, remote socket, active/passive [, | |||
| call to LISTEN for an incoming connection. A passive open may | timeout] [, DiffServ field] [, security/compartment] [local IP | |||
| have either a fully specified remote socket to wait for a | address,] [, options]) -> local connection name | |||
| particular connection or an unspecified remote socket to wait | ||||
| for any call. A fully specified passive call can be made | ||||
| active by the subsequent execution of a SEND. | ||||
| A transmission control block (TCB) is created and partially | If the active/passive flag is set to passive, then this is a call | |||
| filled in with data from the OPEN command parameters. | to LISTEN for an incoming connection. A passive open may have | |||
| | either a fully specified remote socket to wait for a particular | |||
| | connection or an unspecified remote socket to wait for any call. | |||
| | A fully specified passive call can be made active by the | |||
| | subsequent execution of a SEND. | |||
| Every passive OPEN call either creates a new connection record | A transmission control block (TCB) is created and partially filled | |||
| in LISTEN state, or it returns an error; it MUST NOT affect any | in with data from the OPEN command parameters. | |||
| previously created connection record (MUST-41). | ||||
| A TCP implementation that supports multiple concurrent | Every passive OPEN call either creates a new connection record in | |||
| connections MUST provide an OPEN call that will functionally | LISTEN state, or it returns an error; it MUST NOT affect any | |||
| allow an application to LISTEN on a port while a connection | previously created connection record (MUST-41). | |||
| block with the same local port is in SYN-SENT or SYN-RECEIVED | ||||
| state (MUST-42). | ||||
| On an active OPEN command, the TCP endpoint will begin the | A TCP implementation that supports multiple concurrent connections | |||
| procedure to synchronize (i.e., establish) the connection at | MUST provide an OPEN call that will functionally allow an | |||
| once. | application to LISTEN on a port while a connection block with the | |||
| | same local port is in SYN-SENT or SYN-RECEIVED state (MUST-42). | |||
| The timeout, if present, permits the caller to set up a timeout | On an active OPEN command, the TCP endpoint will begin the | |||
| for all data submitted to TCP. If data is not successfully | procedure to synchronize (i.e., establish) the connection at once. | |||
| delivered to the destination within the timeout period, the TCP | ||||
| endpoint will abort the connection. The present global default | ||||
| is five minutes. | ||||
| The TCP implementation or some component of the operating | The timeout, if present, permits the caller to set up a timeout | |||
| system will verify the users authority to open a connection | for all data submitted to TCP. If data is not successfully | |||
| with the specified DiffServ field value or security/ | delivered to the destination within the timeout period, the TCP | |||
| compartment. The absence of a DiffServ field value or | endpoint will abort the connection. The present global default is | |||
| security/compartment specification in the OPEN call indicates | five minutes. | |||
| the default values must be used. | ||||
| TCP will accept incoming requests as matching only if the | The TCP implementation or some component of the operating system | |||
| security/compartment information is exactly the same as that | will verify the user's authority to open a connection with the | |||
| requested in the OPEN call. | specified DiffServ field value or security/compartment. The | |||
| | absence of a DiffServ field value or security/compartment | |||
| | specification in the OPEN call indicates the default values must | |||
| | be used. | |||
| The DiffServ field value indicated by the user only impacts | TCP will accept incoming requests as matching only if the | |||
| outgoing packets, may be altered en route through the network, | security/compartment information is exactly the same as that | |||
| and has no direct bearing or relation to received packets. | requested in the OPEN call. | |||
| A local connection name will be returned to the user by the TCP | The DiffServ field value indicated by the user only impacts | |||
| implementation. The local connection name can then be used as | outgoing packets, may be altered en route through the network, and | |||
| a short hand term for the connection defined by the <local | has no direct bearing or relation to received packets. | |||
| socket, remote socket> pair. | ||||
| The optional "local IP address" parameter MUST be supported to | A local connection name will be returned to the user by the TCP | |||
| allow the specification of the local IP address (MUST-43). | implementation. The local connection name can then be used as a | |||
| This enables applications that need to select the local IP | short-hand term for the connection defined by the <local socket, | |||
| address used when multihoming is present. | remote socket> pair. | |||
| A passive OPEN call with a specified "local IP address" | The optional "local IP address" parameter MUST be supported to | |||
| parameter will await an incoming connection request to that | allow the specification of the local IP address (MUST-43). This | |||
| address. If the parameter is unspecified, a passive OPEN will | enables applications that need to select the local IP address used | |||
| await an incoming connection request to any local IP address, | when multihoming is present. | |||
| and then bind the local IP address of the connection to the | ||||
| particular address that is used. | ||||
| For an active OPEN call, a specified "local IP address" | A passive OPEN call with a specified "local IP address" parameter | |||
| parameter will be used for opening the connection. If the | will await an incoming connection request to that address. If the | |||
| parameter is unspecified, the host will choose an appropriate | parameter is unspecified, a passive OPEN will await an incoming | |||
| local IP address (see RFC 1122 section 3.3.4.2). | connection request to any local IP address, and then bind the | |||
| | local IP address of the connection to the particular address that | |||
| | is used. | |||
| If an application on a multihomed host does not specify the | For an active OPEN call, a specified "local IP address" parameter | |||
| local IP address when actively opening a TCP connection, then | will be used for opening the connection. If the parameter is | |||
| the TCP implementation MUST ask the IP layer to select a local | unspecified, the host will choose an appropriate local IP address | |||
| IP address before sending the (first) SYN (MUST-44). See the | (see RFC 1122 section 3.3.4.2). | |||
| function GET_SRCADDR() in Section 3.4 of RFC 1122. | ||||
| At all other times, a previous segment has either been sent or | If an application on a multihomed host does not specify the local | |||
| received on this connection, and TCP implementations MUST use | IP address when actively opening a TCP connection, then the TCP | |||
| the same local address is used that was used in those previous | implementation MUST ask the IP layer to select a local IP address | |||
| segments (MUST-45). | before sending the (first) SYN (MUST-44). See the function | |||
| | GET_SRCADDR() in Section 3.4 of RFC 1122. | |||
| A TCP implementation MUST reject as an error a local OPEN call | At all other times, a previous segment has either been sent or | |||
| for an invalid remote IP address (e.g., a broadcast or | received on this connection, and TCP implementations MUST use the | |||
| multicast address) (MUST-46). | same local address that was used in those previous segments | |||
| | (MUST-45). | |||
| Send | A TCP implementation MUST reject as an error a local OPEN call for | |||
| an invalid remote IP address (e.g., a broadcast or multicast | ||||
| address) (MUST-46). | ||||
| Format: SEND (local connection name, buffer address, byte | 3.9.1.2. Send | |||
| count, PUSH flag (optional), URGENT flag [,timeout]) | ||||
| This call causes the data contained in the indicated user | Format: SEND (local connection name, buffer address, byte count, | |||
| buffer to be sent on the indicated connection. If the | PUSH flag (optional), URGENT flag [,timeout]) | |||
| connection has not been opened, the SEND is considered an | This call causes the data contained in the indicated user buffer | |||
| error. Some implementations may allow users to SEND first; in | to be sent on the indicated connection. If the connection has not | |||
| which case, an automatic OPEN would be done. For example, this | been opened, the SEND is considered an error. Some | |||
| might be one way for application data to be included in SYN | implementations may allow users to SEND first; in which case, an | |||
| segments. If the calling process is not authorized to use this | automatic OPEN would be done. For example, this might be one way | |||
| connection, an error is returned. | for application data to be included in SYN segments. If the | |||
| | calling process is not authorized to use this connection, an error | |||
| | is returned. | |||
| A TCP endpoint MAY implement PUSH flags on SEND calls (MAY-15). | A TCP endpoint MAY implement PUSH flags on SEND calls (MAY-15). | |||
| If PUSH flags are not implemented, then the sending TCP peer: | If PUSH flags are not implemented, then the sending TCP peer: (1) | |||
| (1) MUST NOT buffer data indefinitely (MUST-60), and (2) MUST | MUST NOT buffer data indefinitely (MUST-60), and (2) MUST set the | |||
| set the PSH bit in the last buffered segment (i.e., when there | PSH bit in the last buffered segment (i.e., when there is no more | |||
| is no more queued data to be sent) (MUST-61). The remaining | queued data to be sent) (MUST-61). The remaining description | |||
| description below assumes the PUSH flag is supported on SEND | below assumes the PUSH flag is supported on SEND calls. | |||
| calls. | ||||
| If the PUSH flag is set, the application intends the data to be | If the PUSH flag is set, the application intends the data to be | |||
| transmitted promptly to the receiver, and the PUSH bit will be | transmitted promptly to the receiver, and the PUSH bit will be set | |||
| set in the last TCP segment created from the buffer. When an | in the last TCP segment created from the buffer. | |||
| application issues a series of SEND calls without setting the | ||||
| PUSH flag, the TCP implementation MAY aggregate the data | ||||
| internally without sending it (MAY-16). | ||||
| The PSH bit is not a record marker and is independent of | The PSH bit is not a record marker and is independent of segment | |||
| segment boundaries. The transmitter SHOULD collapse successive | boundaries. The transmitter SHOULD collapse successive PSH bits | |||
| PSH bits when it packetizes data, to send the largest possible | when it packetizes data, to send the largest possible segment | |||
| segment (SHLD-27). | (SHLD-27). | |||
| If the PUSH flag is not set, the data may be combined with data | If the PUSH flag is not set, the data may be combined with data | |||
| from subsequent SENDs for transmission efficiency. Note that | from subsequent SENDs for transmission efficiency. When an | |||
| when the Nagle algorithm is in use, TCP implementations may | application issues a series of SEND calls without setting the PUSH | |||
| buffer the data before sending, without regard to the PUSH flag | flag, the TCP implementation MAY aggregate the data internally | |||
| (see Section 3.7.4). | without sending it (MAY-16). Note that when the Nagle algorithm | |||
| is in use, TCP implementations may buffer the data before sending, | ||||
| without regard to the PUSH flag (see Section 3.7.4). | ||||
| An application program is logically required to set the PUSH | An application program is logically required to set the PUSH flag | |||
| flag in a SEND call whenever it needs to force delivery of the | in a SEND call whenever it needs to force delivery of the data to | |||
| data to avoid a communication deadlock. However, a TCP | avoid a communication deadlock. However, a TCP implementation | |||
| implementation SHOULD send a maximum-sized segment whenever | SHOULD send a maximum-sized segment whenever possible (SHLD-28), | |||
| possible (SHLD-28), to improve performance (see | to improve performance (see Section 3.8.6.2.1). | |||
| Section 3.8.6.2.1). | ||||
| New applications SHOULD NOT set the URGENT flag [38] due to | New applications SHOULD NOT set the URGENT flag [40] due to | |||
| implementation differences and middlebox issues (SHLD-13). | implementation differences and middlebox issues (SHLD-13). | |||
| If the URGENT flag is set, segments sent to the destination TCP | If the URGENT flag is set, segments sent to the destination TCP | |||
| peer will have the urgent pointer set. The receiving TCP peer | peer will have the urgent pointer set. The receiving TCP peer | |||
| will signal the urgent condition to the receiving process if | will signal the urgent condition to the receiving process if the | |||
| the urgent pointer indicates that data preceding the urgent | urgent pointer indicates that data preceding the urgent pointer | |||
| pointer has not been consumed by the receiving process. The | has not been consumed by the receiving process. The purpose of | |||
| purpose of urgent is to stimulate the receiver to process the | urgent is to stimulate the receiver to process the urgent data and | |||
| urgent data and to indicate to the receiver when all the | to indicate to the receiver when all the currently known urgent | |||
| currently known urgent data has been received. The number of | data has been received. The number of times the sending user's | |||
| times the sending user's TCP implementation signals urgent will | TCP implementation signals urgent will not necessarily be equal to | |||
| not necessarily be equal to the number of times the receiving | the number of times the receiving user will be notified of the | |||
| user will be notified of the presence of urgent data. | presence of urgent data. | |||
| If no remote socket was specified in the OPEN, but the | If no remote socket was specified in the OPEN, but the connection | |||
| connection is established (e.g., because a LISTENing connection | is established (e.g., because a LISTENing connection has become | |||
| has become specific due to a remote segment arriving for the | specific due to a remote segment arriving for the local socket), | |||
| local socket), then the designated buffer is sent to the | then the designated buffer is sent to the implied remote socket. | |||
| implied remote socket. Users who make use of OPEN with an | Users who make use of OPEN with an unspecified remote socket can | |||
| unspecified remote socket can make use of SEND without ever | make use of SEND without ever explicitly knowing the remote socket | |||
| explicitly knowing the remote socket address. | address. | |||
| However, if a SEND is attempted before the remote socket | However, if a SEND is attempted before the remote socket becomes | |||
| becomes specified, an error will be returned. Users can use | specified, an error will be returned. Users can use the STATUS | |||
| the STATUS call to determine the status of the connection. | call to determine the status of the connection. Some TCP | |||
| Some TCP implementations may notify the user when an | implementations may notify the user when an unspecified socket is | |||
| unspecified socket is bound. | bound. | |||
| If a timeout is specified, the current user timeout for this | If a timeout is specified, the current user timeout for this | |||
| connection is changed to the new one. | connection is changed to the new one. | |||
| In the simplest implementation, SEND would not return control | In the simplest implementation, SEND would not return control to | |||
| to the sending process until either the transmission was | the sending process until either the transmission was complete or | |||
| complete or the timeout had been exceeded. However, this | the timeout had been exceeded. However, this simple method is | |||
| simple method is both subject to deadlocks (for example, both | both subject to deadlocks (for example, both sides of the | |||
| sides of the connection might try to do SENDs before doing any | connection might try to do SENDs before doing any RECEIVEs) and | |||
| RECEIVEs) and offers poor performance, so it is not | offers poor performance, so it is not recommended. A more | |||
| recommended. A more sophisticated implementation would return | sophisticated implementation would return immediately to allow the | |||
| immediately to allow the process to run concurrently with | process to run concurrently with network I/O, and, furthermore, to | |||
| network I/O, and, furthermore, to allow multiple SENDs to be in | allow multiple SENDs to be in progress. Multiple SENDs are served | |||
| progress. Multiple SENDs are served in first come, first | in first come, first served order, so the TCP endpoint will queue | |||
| served order, so the TCP endpoint will queue those it cannot | those it cannot service immediately. | |||
| service immediately. | ||||
| We have implicitly assumed an asynchronous user interface in | We have implicitly assumed an asynchronous user interface in which | |||
| which a SEND later elicits some kind of SIGNAL or pseudo- | a SEND later elicits some kind of SIGNAL or pseudo-interrupt from | |||
| interrupt from the serving TCP endpoint. An alternative is to | the serving TCP endpoint. An alternative is to return a response | |||
| return a response immediately. For instance, SENDs might | immediately. For instance, SENDs might return immediate local | |||
| return immediate local acknowledgment, even if the segment sent | acknowledgment, even if the segment sent had not been acknowledged | |||
| had not been acknowledged by the distant TCP endpoint. We | by the distant TCP endpoint. We could optimistically assume | |||
| could optimistically assume eventual success. If we are wrong, | eventual success. If we are wrong, the connection will close | |||
| the connection will close anyway due to the timeout. In | anyway due to the timeout. In implementations of this kind | |||
| implementations of this kind (synchronous), there will still be | (synchronous), there will still be some asynchronous signals, but | |||
| some asynchronous signals, but these will deal with the | these will deal with the connection itself, and not with specific | |||
| connection itself, and not with specific segments or buffers. | segments or buffers. | |||
| In order for the process to distinguish among error or success | In order for the process to distinguish among error or success | |||
| indications for different SENDs, it might be appropriate for | indications for different SENDs, it might be appropriate for the | |||
| the buffer address to be returned along with the coded response | buffer address to be returned along with the coded response to the | |||
| to the SEND request. TCP-to-user signals are discussed below, | SEND request. TCP-to-user signals are discussed below, indicating | |||
| indicating the information that should be returned to the | the information that should be returned to the calling process. | |||
| calling process. | ||||
| Receive | 3.9.1.3. Receive | |||
| Format: RECEIVE (local connection name, buffer address, byte | Format: RECEIVE (local connection name, buffer address, byte | |||
| count) -> byte count, urgent flag, push flag (optional) | count) -> byte count, urgent flag, push flag (optional) | |||
| This command allocates a receiving buffer associated with the | This command allocates a receiving buffer associated with the | |||
| specified connection. If no OPEN precedes this command or the | specified connection. If no OPEN precedes this command or the | |||
| calling process is not authorized to use this connection, an | calling process is not authorized to use this connection, an error | |||
| error is returned. | is returned. | |||
| In the simplest implementation, control would not return to the | In the simplest implementation, control would not return to the | |||
| calling program until either the buffer was filled, or some | calling program until either the buffer was filled, or some error | |||
| error occurred, but this scheme is highly subject to deadlocks. | occurred, but this scheme is highly subject to deadlocks. A more | |||
| A more sophisticated implementation would permit several | sophisticated implementation would permit several RECEIVEs to be | |||
| RECEIVEs to be outstanding at once. These would be filled as | outstanding at once. These would be filled as segments arrive. | |||
| segments arrive. This strategy permits increased throughput at | This strategy permits increased throughput at the cost of a more | |||
| the cost of a more elaborate scheme (possibly asynchronous) to | elaborate scheme (possibly asynchronous) to notify the calling | |||
| notify the calling program that a PUSH has been seen or a | program that a PUSH has been seen or a buffer filled. | |||
| buffer filled. | ||||
| A TCP receiver MAY pass a received PSH flag to the application | A TCP receiver MAY pass a received PSH flag to the application | |||
| layer via the PUSH flag in the interface (MAY-17), but it is | layer via the PUSH flag in the interface (MAY-17), but it is not | |||
| not required (this was clarified in Section 4.2.2.2 of RFC 1122). | required (this was clarified in Section 4.2.2.2 of RFC 1122). The | |||
| The remainder of text describing the RECEIVE call below assumes | remainder of text describing the RECEIVE call below assumes that | |||
| that passing the PUSH indication is supported. | passing the PUSH indication is supported. | |||
| If enough data arrive to fill the buffer before a PUSH is seen, | If enough data arrive to fill the buffer before a PUSH is seen, | |||
| the PUSH flag will not be set in the response to the RECEIVE. | the PUSH flag will not be set in the response to the RECEIVE. The | |||
| The buffer will be filled with as much data as it can hold. If | buffer will be filled with as much data as it can hold. If a PUSH | |||
| a PUSH is seen before the buffer is filled, the buffer will be | is seen before the buffer is filled, the buffer will be returned | |||
| returned partially filled and PUSH indicated. | partially filled and PUSH indicated. | |||
| If there is urgent data, the user will have been informed as | If there is urgent data, the user will have been informed as soon | |||
| soon as it arrived via a TCP-to-user signal. The receiving | as it arrived via a TCP-to-user signal. The receiving user should | |||
| user should thus be in "urgent mode". If the URGENT flag is | thus be in "urgent mode". If the URGENT flag is on, additional | |||
| on, additional urgent data remains. If the URGENT flag is off, | urgent data remains. If the URGENT flag is off, this call to | |||
| this call to RECEIVE has returned all the urgent data, and the | RECEIVE has returned all the urgent data, and the user may now | |||
| user may now leave "urgent mode". Note that data following the | leave "urgent mode". Note that data following the urgent pointer | |||
| urgent pointer (non-urgent data) cannot be delivered to the | (non-urgent data) cannot be delivered to the user in the same | |||
| user in the same buffer with preceding urgent data unless the | buffer with preceding urgent data unless the boundary is clearly | |||
| boundary is clearly marked for the user. | marked for the user. | |||
| To distinguish among several outstanding RECEIVEs and to take | To distinguish among several outstanding RECEIVEs and to take care | |||
| care of the case that a buffer is not completely filled, the | of the case that a buffer is not completely filled, the return | |||
| return code is accompanied by both a buffer pointer and a byte | code is accompanied by both a buffer pointer and a byte count | |||
| count indicating the actual length of the data received. | indicating the actual length of the data received. | |||
| Alternative implementations of RECEIVE might have the TCP | Alternative implementations of RECEIVE might have the TCP endpoint | |||
| endpoint allocate buffer storage, or the TCP endpoint might | allocate buffer storage, or the TCP endpoint might share a ring | |||
| share a ring buffer with the user. | buffer with the user. | |||
| Close | 3.9.1.4. Close | |||
| Format: CLOSE (local connection name) | Format: CLOSE (local connection name) | |||
| This command causes the connection specified to be closed. If | This command causes the connection specified to be closed. If the | |||
| the connection is not open or the calling process is not | connection is not open or the calling process is not authorized to | |||
| authorized to use this connection, an error is returned. | use this connection, an error is returned. Closing connections is | |||
| Closing connections is intended to be a graceful operation in | intended to be a graceful operation in the sense that outstanding | |||
| the sense that outstanding SENDs will be transmitted (and | SENDs will be transmitted (and retransmitted), as flow control | |||
| retransmitted), as flow control permits, until all have been | permits, until all have been serviced. Thus, it should be | |||
| serviced. Thus, it should be acceptable to make several SEND | acceptable to make several SEND calls, followed by a CLOSE, and | |||
| calls, followed by a CLOSE, and expect all the data to be sent | expect all the data to be sent to the destination. It should also | |||
| to the destination. It should also be clear that users should | be clear that users should continue to RECEIVE on CLOSING | |||
| continue to RECEIVE on CLOSING connections, since the remote | connections, since the remote peer may be trying to transmit the | |||
| peer may be trying to transmit the last of its data. Thus, | last of its data. Thus, CLOSE means "I have no more to send" but | |||
| CLOSE means "I have no more to send" but does not mean "I will | does not mean "I will not receive any more." It may happen (if | |||
| not receive any more." It may happen (if the user level | the user level protocol is not well-thought-out) that the closing | |||
| protocol is not well thought out) that the closing side is | side is unable to get rid of all its data before timing out. In | |||
| unable to get rid of all its data before timing out. In this | this event, CLOSE turns into ABORT, and the closing TCP peer gives | |||
| event, CLOSE turns into ABORT, and the closing TCP peer gives | up. | |||
| up. | ||||
| The user may CLOSE the connection at any time on their own | The user may CLOSE the connection at any time on their own | |||
| initiative, or in response to various prompts from the TCP | initiative, or in response to various prompts from the TCP | |||
| implementation (e.g., remote close executed, transmission | implementation (e.g., remote close executed, transmission timeout | |||
| timeout exceeded, destination inaccessible). | exceeded, destination inaccessible). | |||
| Because closing a connection requires communication with the | Because closing a connection requires communication with the | |||
| remote TCP peer, connections may remain in the closing state | remote TCP peer, connections may remain in the closing state for a | |||
| for a short time. Attempts to reopen the connection before the | short time. Attempts to reopen the connection before the TCP peer | |||
| TCP peer replies to the CLOSE command will result in error | replies to the CLOSE command will result in error responses. | |||
| responses. | ||||
| CLOSE also implies the push function. | CLOSE also implies the push function. | |||
| Status | 3.9.1.5. Status | |||
| Format: STATUS (local connection name) -> status data | Format: STATUS (local connection name) -> status data | |||
| This is an implementation-dependent user command and could be | ||||
| excluded without adverse effect. Information returned would | ||||
| typically come from the TCB associated with the connection. | ||||
| This is an implementation-dependent user command and could be | This command returns a data block containing the following | |||
| excluded without adverse effect. Information returned would | information: | |||
| typically come from the TCB associated with the connection. | ||||
| This command returns a data block containing the following | - local socket, | |||
| information: | ||||
| local socket, | remote socket, | |||
| remote socket, | local connection name, | |||
| local connection name, | receive window, | |||
| receive window, | send window, | |||
| send window, | connection state, | |||
| connection state, | number of buffers awaiting acknowledgment, | |||
| number of buffers awaiting acknowledgment, | number of buffers pending receipt, | |||
| number of buffers pending receipt, | urgent state, | |||
| urgent state, | DiffServ field value, | |||
| DiffServ field value, | security/compartment, | |||
| security/compartment, | and transmission timeout. | |||
| and transmission timeout. | Depending on the state of the connection, or on the implementation | |||
| itself, some of this information may not be available or | ||||
| meaningful. If the calling process is not authorized to use this | ||||
| connection, an error is returned. This prevents unauthorized | ||||
| processes from gaining information about a connection. | ||||
| Depending on the state of the connection, or on the | 3.9.1.6. Abort | |||
| implementation itself, some of this information may not be | ||||
| available or meaningful. If the calling process is not | ||||
| authorized to use this connection, an error is returned. This | ||||
| prevents unauthorized processes from gaining information about | ||||
| a connection. | ||||
| Abort | Format: ABORT (local connection name) | |||
| Format: ABORT (local connection name) | This command causes all pending SENDs and RECEIVEs to be aborted, | |||
| This command causes all pending SENDs and RECEIVEs to be | the TCB to be removed, and a special RESET message to be sent to | |||
| aborted, the TCB to be removed, and a special RESET message to | the remote TCP peer of the connection. Depending on the | |||
| be sent to the remote TCP peer of the connection. Depending on | implementation, users may receive abort indications for each | |||
| the implementation, users may receive abort indications for | outstanding SEND or RECEIVE, or may simply receive an ABORT- | |||
| each outstanding SEND or RECEIVE, or may simply receive an | acknowledgment. | |||
| ABORT-acknowledgment. | ||||
| Flush | 3.9.1.7. Flush | |||
| Some TCP implementations have included a FLUSH call, which will | Some TCP implementations have included a FLUSH call, which will | |||
| empty the TCP send queue of any data that the user has issued | empty the TCP send queue of any data that the user has issued SEND | |||
| SEND calls but is still to the right of the current send | calls for but is still to the right of the current send window. | |||
| window. That is, it flushes as much queued send data as | That is, it flushes as much queued send data as possible without | |||
| possible without losing sequence number synchronization. The | losing sequence number synchronization. The FLUSH call MAY be | |||
| FLUSH call MAY be implemented (MAY-14). | implemented (MAY-14). | |||
| Asynchronous Reports | 3.9.1.8. Asynchronous Reports | |||
| There MUST be a mechanism for reporting soft TCP error | There MUST be a mechanism for reporting soft TCP error conditions | |||
| conditions to the application (MUST-47). Generically, we | to the application (MUST-47). Generically, we assume this takes | |||
| assume this takes the form of an application-supplied | the form of an application-supplied ERROR_REPORT routine that may | |||
| ERROR_REPORT routine that may be upcalled asynchronously from | be upcalled asynchronously from the transport layer: | |||
| the transport layer: | ||||
| ERROR_REPORT(local connection name, reason, subreason) | - ERROR_REPORT(local connection name, reason, subreason) | |||
| The precise encoding of the reason and subreason parameters is | The precise encoding of the reason and subreason parameters is not | |||
| not specified here. However, the conditions that are reported | specified here. However, the conditions that are reported | |||
| asynchronously to the application MUST include: | asynchronously to the application MUST include: | |||
| * ICMP error message arrived (see Section 3.9.2.2 for | - * ICMP error message arrived (see Section 3.9.2.2 for | |||
| description of handling each ICMP message type, since some | description of handling each ICMP message type, since some | |||
| message types need to be suppressed from generating reports | message types need to be suppressed from generating reports to | |||
| to the application) | the application) | |||
| * Excessive retransmissions (see Section 3.8.3) | - * Excessive retransmissions (see Section 3.8.3) | |||
| * Urgent pointer advance (see Section 3.8.5) | - * Urgent pointer advance (see Section 3.8.5) | |||
| However, an application program that does not want to receive | However, an application program that does not want to receive such | |||
| such ERROR_REPORT calls SHOULD be able to effectively disable | ERROR_REPORT calls SHOULD be able to effectively disable these | |||
| these calls (SHLD-20). | calls (SHLD-20). | |||
| Set Differentiated Services Field (IPv4 TOS or IPv6 Traffic Class) | 3.9.1.9. Set Differentiated Services Field (IPv4 TOS or IPv6 Traffic | |||
| Class) | ||||
| The application layer MUST be able to specify the | The application layer MUST be able to specify the Differentiated | |||
| Differentiated Services field for segments that are sent on a | Services field for segments that are sent on a connection (MUST- | |||
| connection (MUST-48). The Differentiated Services field | 48). The Differentiated Services field includes the 6-bit | |||
| includes the 6-bit Differentiated Services Code Point (DSCP) | Differentiated Services Code Point (DSCP) value. It is not | |||
| value. It is not required, but the application SHOULD be able | required, but the application SHOULD be able to change the | |||
| to change the Differentiated Services field during the | Differentiated Services field during the connection lifetime | |||
| connection lifetime (SHLD-21). TCP implementations SHOULD pass | (SHLD-21). TCP implementations SHOULD pass the current | |||
| the current Differentiated Services field value without change | Differentiated Services field value without change to the IP | |||
| to the IP layer, when it sends segments on the connection | layer, when it sends segments on the connection (SHLD-22). | |||
| (SHLD-22). | ||||
| The Differentiated Services field will be specified | The Differentiated Services field will be specified independently | |||
| independently in each direction on the connection, so that the | in each direction on the connection, so that the receiver | |||
| receiver application will specify the Differentiated Services | application will specify the Differentiated Services field used | |||
| field used for ACK segments. | for ACK segments. | |||
| TCP implementations MAY pass the most recently received | TCP implementations MAY pass the most recently received | |||
| Differentiated Services field up to the application (MAY-9). | Differentiated Services field up to the application (MAY-9). | |||
| 3.9.2. TCP/Lower-Level Interface | 3.9.2. TCP/Lower-Level Interface | |||
| The TCP endpoint calls on a lower level protocol module to actually | The TCP endpoint calls on a lower level protocol module to actually | |||
| send and receive information over a network. The two current | send and receive information over a network. The two current | |||
| standard Internet Protocol (IP) versions layered below TCP are IPv4 | standard Internet Protocol (IP) versions layered below TCP are IPv4 | |||
| [1] and IPv6 [13]. | [1] and IPv6 [13]. | |||
| If the lower level protocol is IPv4, it provides arguments for a type | If the lower level protocol is IPv4, it provides arguments for a type | |||
| of service (used within the Differentiated Services field) and for a | of service (used within the Differentiated Services field) and for a | |||
| skipping to change at page 60, line 5 | skipping to change at page 59, line 44 | |||
| that a segment be destroyed if it cannot be delivered by the | that a segment be destroyed if it cannot be delivered by the | |||
| internet system within one minute. RFC 1122 changed this | internet system within one minute. RFC 1122 changed this | |||
| specification to require that the TTL be configurable. | specification to require that the TTL be configurable. | |||
| - Note that the DiffServ field is permitted to change during a | - Note that the DiffServ field is permitted to change during a | |||
| connection (Section 4.2.4.2 of RFC 1122). However, the | connection (Section 4.2.4.2 of RFC 1122). However, the | |||
| application interface might not support this ability, and the | application interface might not support this ability, and the | |||
| application does not have knowledge about individual TCP | application does not have knowledge about individual TCP | |||
| segments, so this can only be done on a coarse granularity, at | segments, so this can only be done on a coarse granularity, at | |||
| best. This limitation is further discussed in RFC 7657 (Sections | best. This limitation is further discussed in RFC 7657 (Sections | |||
| 5.1, 5.3, and 6) [49]. Generally, an application SHOULD NOT | 5.1, 5.3, and 6) [51]. Generally, an application SHOULD NOT | |||
| change the DiffServ field value during the course of a | change the DiffServ field value during the course of a | |||
| connection (SHLD-23). | connection (SHLD-23). | |||
| Any lower level protocol will have to provide the source address, | Any lower level protocol will have to provide the source address, | |||
| destination address, and protocol fields, and some way to determine | destination address, and protocol fields, and some way to determine | |||
| the "TCP length", both to provide the functional equivalent service | the "TCP length", both to provide the functional equivalent service | |||
| of IP and to be used in the TCP checksum. | of IP and to be used in the TCP checksum. | |||
| When received options are passed up to TCP from the IP layer, TCP | When received options are passed up to TCP from the IP layer, a TCP | |||
| implementations MUST ignore options that it does not understand | implementation MUST ignore options that it does not understand (MUST- | |||
| (MUST-50). | 50). | |||
| A TCP implementation MAY support the Time Stamp (MAY-10) and Record | A TCP implementation MAY support the Time Stamp (MAY-10) and Record | |||
| Route (MAY-11) options. | Route (MAY-11) options. | |||
| 3.9.2.1. Source Routing | 3.9.2.1. Source Routing | |||
| If the lower level is IP (or other protocol that provides this | If the lower level is IP (or other protocol that provides this | |||
| feature) and source routing is used, the interface must allow the | feature) and source routing is used, the interface must allow the | |||
| route information to be communicated. This is especially important | route information to be communicated. This is especially important | |||
| so that the source and destination addresses used in the TCP checksum | so that the source and destination addresses used in the TCP checksum | |||
| skipping to change at page 61, line 5 | skipping to change at page 60, line 41 | |||
| 3.9.2.2. ICMP Messages | 3.9.2.2. ICMP Messages | |||
| TCP implementations MUST act on an ICMP error message passed up from | TCP implementations MUST act on an ICMP error message passed up from | |||
| the IP layer, directing it to the connection that created the error | the IP layer, directing it to the connection that created the error | |||
| (MUST-54). The necessary demultiplexing information can be found in | (MUST-54). The necessary demultiplexing information can be found in | |||
| the IP header contained within the ICMP message. | the IP header contained within the ICMP message. | |||
| This applies to ICMPv6 in addition to IPv4 ICMP. | This applies to ICMPv6 in addition to IPv4 ICMP. | |||
| [33] contains discussion of specific ICMP and ICMPv6 messages | [36] contains discussion of specific ICMP and ICMPv6 messages | |||
| classified as either "soft" or "hard" errors that may bear different | classified as either "soft" or "hard" errors that may bear different | |||
| responses. Treatment for classes of ICMP messages is described | responses. Treatment for classes of ICMP messages is described | |||
| below: | below: | |||
| Source Quench | Source Quench | |||
| TCP implementations MUST silently discard any received ICMP Source | TCP implementations MUST silently discard any received ICMP Source | |||
| Quench messages (MUST-55). See [11] for discussion. | Quench messages (MUST-55). See [11] for discussion. | |||
| Soft Errors | Soft Errors | |||
| For ICMP these include: Destination Unreachable -- codes 0, 1, 5, | For IPv4 ICMP these include: Destination Unreachable -- codes 0, 1, | |||
| Time Exceeded -- codes 0, 1, and Parameter Problem. | 5; Time Exceeded -- codes 0, 1; and Parameter Problem. | |||
| For ICMPv6 these include: Destination Unreachable -- codes 0 and 3, | For ICMPv6 these include: Destination Unreachable -- codes 0, 3; | |||
| Time Exceeded -- codes 0, 1, and Parameter Problem -- codes 0, 1, | Time Exceeded -- codes 0, 1; and Parameter Problem -- codes 0, 1, | |||
| 2. | 2. | |||
| Since these Unreachable messages indicate soft error conditions, | Since these Unreachable messages indicate soft error conditions, | |||
| TCP implementations MUST NOT abort the connection (MUST-56), and it | TCP implementations MUST NOT abort the connection (MUST-56), and it | |||
| SHOULD make the information available to the application (SHLD-25). | SHOULD make the information available to the application (SHLD-25). | |||
| Hard Errors | Hard Errors | |||
| For ICMP these include Destination Unreachable -- codes 2-4. | For ICMP these include Destination Unreachable -- codes 2-4. | |||
| These are hard error conditions, so TCP implementations SHOULD | These are hard error conditions, so TCP implementations SHOULD | |||
| abort the connection (SHLD-26). [33] notes that some | abort the connection (SHLD-26). [36] notes that some | |||
| implementations do not abort connections when an ICMP hard error is | implementations do not abort connections when an ICMP hard error is | |||
| received for a connection that is in any of the synchronized | received for a connection that is in any of the synchronized | |||
| states. | states. | |||
| Note that [33] section 4 describes widespread implementation behavior | Note that [36] section 4 describes widespread implementation behavior | |||
| that treats soft errors as hard errors during connection | that treats soft errors as hard errors during connection | |||
| establishment. | establishment. | |||
| 3.9.2.3. Source Address Validation | 3.9.2.3. Source Address Validation | |||
| RFC 1122 requires addresses to be validated in incoming SYN packets: | RFC 1122 requires addresses to be validated in incoming SYN packets: | |||
| An incoming SYN with an invalid source address MUST be ignored | An incoming SYN with an invalid source address MUST be ignored | |||
| either by TCP or by the IP layer (MUST-63) (Section 3.2.1.3 of | either by TCP or by the IP layer (MUST-63) (Section 3.2.1.3 of | |||
| [18]). | [20]). | |||
| A TCP implementation MUST silently discard an incoming SYN segment | A TCP implementation MUST silently discard an incoming SYN segment | |||
| that is addressed to a broadcast or multicast address (MUST-57). | that is addressed to a broadcast or multicast address (MUST-57). | |||
| This prevents connection state and replies from being erroneously | This prevents connection state and replies from being erroneously | |||
| generated, and implementers should note that this guidance is | generated, and implementers should note that this guidance is | |||
| applicable to all incoming segments, not just SYNs, as specifically | applicable to all incoming segments, not just SYNs, as specifically | |||
| indicated in RFC 1122. | indicated in RFC 1122. | |||
| 3.10. Event Processing | 3.10. Event Processing | |||
| skipping to change at page 63, line 15 ¶ | skipping to change at page 62, line 50 ¶ | |||
| The model of the TCP/user interface is that user commands receive an | The model of the TCP/user interface is that user commands receive an | |||
| immediate return and possibly a delayed response via an event or | immediate return and possibly a delayed response via an event or | |||
| pseudo interrupt. In the following descriptions, the term "signal" | pseudo interrupt. In the following descriptions, the term "signal" | |||
| means cause a delayed response. | means cause a delayed response. | |||
| Error responses in this document are identified by character strings. | Error responses in this document are identified by character strings. | |||
| For example, user commands referencing connections that do not exist | For example, user commands referencing connections that do not exist | |||
| receive "error: connection not open". | receive "error: connection not open". | |||
| Please note in the following that all arithmetic on sequence numbers, | Please note in the following that all arithmetic on sequence numbers, | |||
| acknowledgment numbers, windows, et cetera, is modulo 2**32 the size | acknowledgment numbers, windows, et cetera, is modulo 2**32 (the size | |||
| of the sequence number space. Also note that "=<" means less than or | of the sequence number space). Also note that "=<" means less than | |||
| equal to (modulo 2**32). | or equal to (modulo 2**32). | |||
| A natural way to think about processing incoming segments is to | A natural way to think about processing incoming segments is to | |||
| imagine that they are first tested for proper sequence number (i.e., | imagine that they are first tested for proper sequence number (i.e., | |||
| that their contents lie in the range of the expected "receive window" | that their contents lie in the range of the expected "receive window" | |||
| in the sequence number space) and then that they are generally queued | in the sequence number space) and then that they are generally queued | |||
| and processed in sequence number order. | and processed in sequence number order. | |||
| When a segment overlaps other already received segments we | When a segment overlaps other already received segments we | |||
| reconstruct the segment to contain just the new data, and adjust the | reconstruct the segment to contain just the new data, and adjust the | |||
| header fields to be consistent. | header fields to be consistent. | |||
| skipping to change at page 63, line 43 ¶ | skipping to change at page 63, line 29 ¶ | |||
| CLOSED STATE (i.e., TCB does not exist) | CLOSED STATE (i.e., TCB does not exist) | |||
| - Create a new transmission control block (TCB) to hold | - Create a new transmission control block (TCB) to hold | |||
| connection state information. Fill in local socket identifier, | connection state information. Fill in local socket identifier, | |||
| remote socket, DiffServ field, security/compartment, and user | remote socket, DiffServ field, security/compartment, and user | |||
| timeout information. Note that some parts of the remote socket | timeout information. Note that some parts of the remote socket | |||
| may be unspecified in a passive OPEN and are to be filled in by | may be unspecified in a passive OPEN and are to be filled in by | |||
| the parameters of the incoming SYN segment. Verify the | the parameters of the incoming SYN segment. Verify the | |||
| security and DiffServ value requested are allowed for this | security and DiffServ value requested are allowed for this | |||
| user, if not return "error: precedence not allowed" or "error: | user, if not return "error: DiffServ value not allowed" or | |||
| security/compartment not allowed." If passive enter the LISTEN | "error: security/compartment not allowed." If passive enter | |||
| state and return. If active and the remote socket is | the LISTEN state and return. If active and the remote socket | |||
| unspecified, return "error: remote socket unspecified"; if | is unspecified, return "error: remote socket unspecified"; if | |||
| active and the remote socket is specified, issue a SYN segment. | active and the remote socket is specified, issue a SYN segment. | |||
| An initial send sequence number (ISS) is selected. A SYN | An initial send sequence number (ISS) is selected. A SYN | |||
| segment of the form <SEQ=ISS><CTL=SYN> is sent. Set SND.UNA to | segment of the form <SEQ=ISS><CTL=SYN> is sent. Set SND.UNA to | |||
| ISS, SND.NXT to ISS+1, enter SYN-SENT state, and return. | ISS, SND.NXT to ISS+1, enter SYN-SENT state, and return. | |||
| - If the caller does not have access to the local socket | - If the caller does not have access to the local socket | |||
| specified, return "error: connection illegal for this process". | specified, return "error: connection illegal for this process". | |||
| If there is no room to create a new connection, return "error: | If there is no room to create a new connection, return "error: | |||
| insufficient resources". | insufficient resources". | |||
| LISTEN STATE | LISTEN STATE | |||
| - If the OPEN call is active and the remote socket is specified, | - If the OPEN call is active and the remote socket is specified, | |||
| then change the connection from passive to active, select an | then change the connection from passive to active, select an | |||
| ISS. Send a SYN segment, set SND.UNA to ISS, SND.NXT to ISS+1. | ISS. Send a SYN segment, set SND.UNA to ISS, SND.NXT to ISS+1. | |||
| Enter SYN-SENT state. Data associated with SEND may be sent | Enter SYN-SENT state. Data associated with SEND may be sent | |||
| with SYN segment or queued for transmission after entering | with SYN segment or queued for transmission after entering | |||
| ESTABLISHED state. The urgent bit if requested in the command | ESTABLISHED state. The urgent bit if requested in the command | |||
| must be sent with the data segments sent as a result of this | must be sent with the data segments sent as a result of this | |||
| command. If there is no room to queue the request, respond | command. If there is no room to queue the request, respond | |||
| with "error: insufficient resources". If Foreign socket was | with "error: insufficient resources". If the remote socket was | |||
| not specified, then return "error: remote socket unspecified". | not specified, then return "error: remote socket unspecified". | |||
| SYN-SENT STATE | SYN-SENT STATE | |||
| SYN-RECEIVED STATE | SYN-RECEIVED STATE | |||
| ESTABLISHED STATE | ESTABLISHED STATE | |||
| FIN-WAIT-1 STATE | FIN-WAIT-1 STATE | |||
| skipping to change at page 65, line 15 ¶ | skipping to change at page 65, line 7 ¶ | |||
| LISTEN STATE | LISTEN STATE | |||
| - If the remote socket is specified, then change the connection | - If the remote socket is specified, then change the connection | |||
| from passive to active, select an ISS. Send a SYN segment, set | from passive to active, select an ISS. Send a SYN segment, set | |||
| SND.UNA to ISS, SND.NXT to ISS+1. Enter SYN-SENT state. Data | SND.UNA to ISS, SND.NXT to ISS+1. Enter SYN-SENT state. Data | |||
| associated with SEND may be sent with SYN segment or queued for | associated with SEND may be sent with SYN segment or queued for | |||
| transmission after entering ESTABLISHED state. The urgent bit | transmission after entering ESTABLISHED state. The urgent bit | |||
| if requested in the command must be sent with the data segments | if requested in the command must be sent with the data segments | |||
| sent as a result of this command. If there is no room to queue | sent as a result of this command. If there is no room to queue | |||
| the request, respond with "error: insufficient resources". If | the request, respond with "error: insufficient resources". If | |||
| Foreign socket was not specified, then return "error: remote | the remote socket was not specified, then return "error: remote | |||
| socket unspecified". | socket unspecified". | |||
| SYN-SENT STATE | SYN-SENT STATE | |||
| SYN-RECEIVED STATE | SYN-RECEIVED STATE | |||
| - Queue the data for transmission after entering ESTABLISHED | - Queue the data for transmission after entering ESTABLISHED | |||
| state. If no space to queue, respond with "error: insufficient | state. If no space to queue, respond with "error: insufficient | |||
| resources". | resources". | |||
| skipping to change at page 67, line 4 ¶ | skipping to change at page 66, line 36 ¶ | |||
| - If RCV.UP is in advance of the data currently being passed to | - If RCV.UP is in advance of the data currently being passed to | |||
| the user notify the user of the presence of urgent data. | the user notify the user of the presence of urgent data. | |||
| - When the TCP endpoint takes responsibility for delivering data | - When the TCP endpoint takes responsibility for delivering data | |||
| to the user that fact must be communicated to the sender via an | to the user that fact must be communicated to the sender via an | |||
| acknowledgment. The formation of such an acknowledgment is | acknowledgment. The formation of such an acknowledgment is | |||
| described below in the discussion of processing an incoming | described below in the discussion of processing an incoming | |||
| segment. | segment. | |||
| CLOSE-WAIT STATE | CLOSE-WAIT STATE | |||
| - Since the remote side has already sent FIN, RECEIVEs must be | - Since the remote side has already sent FIN, RECEIVEs must be | |||
| satisfied by data already on hand, but not yet delivered to the | satisfied by data already on hand, but not yet delivered to the | |||
| user. If no text is awaiting delivery, the RECEIVE will get a | user. If no text is awaiting delivery, the RECEIVE will get an | |||
| "error: connection closing" response. Otherwise, any remaining | "error: connection closing" response. Otherwise, any remaining | |||
| text can be used to satisfy the RECEIVE. | data can be used to satisfy the RECEIVE. | |||
| CLOSING STATE | CLOSING STATE | |||
| LAST-ACK STATE | LAST-ACK STATE | |||
| TIME-WAIT STATE | TIME-WAIT STATE | |||
| - Return "error: connection closing". | - Return "error: connection closing". | |||
| 3.10.4. CLOSE Call | 3.10.4. CLOSE Call | |||
| skipping to change at page 68, line 4 ¶ | skipping to change at page 67, line 38 ¶ | |||
| state; otherwise queue for processing after entering | state; otherwise queue for processing after entering | |||
| ESTABLISHED state. | ESTABLISHED state. | |||
| ESTABLISHED STATE | ESTABLISHED STATE | |||
| - Queue this until all preceding SENDs have been segmentized, | - Queue this until all preceding SENDs have been segmentized, | |||
| then form a FIN segment and send it. In any case, enter FIN- | then form a FIN segment and send it. In any case, enter FIN- | |||
| WAIT-1 state. | WAIT-1 state. | |||
| FIN-WAIT-1 STATE | FIN-WAIT-1 STATE | |||
| FIN-WAIT-2 STATE | FIN-WAIT-2 STATE | |||
| - Strictly speaking, this is an error and should receive a | - Strictly speaking, this is an error and should receive an | |||
| "error: connection closing" response. An "ok" response would | "error: connection closing" response. An "ok" response would | |||
| be acceptable, too, as long as a second FIN is not emitted (the | be acceptable, too, as long as a second FIN is not emitted (the | |||
| first FIN may be retransmitted though). | first FIN may be retransmitted though). | |||
| CLOSE-WAIT STATE | CLOSE-WAIT STATE | |||
| - Queue this request until all preceding SENDs have been | - Queue this request until all preceding SENDs have been | |||
| segmentized; then send a FIN segment, enter LAST-ACK state. | segmentized; then send a FIN segment, enter LAST-ACK state. | |||
| CLOSING STATE | CLOSING STATE | |||
| skipping to change at page 72, line 4 ¶ | skipping to change at page 71, line 42 ¶ | |||
| processing in the second step, unless it was first discarded by | processing in the second step, unless it was first discarded by | |||
| RST checking in the first step. | RST checking in the first step. | |||
| 3.10.7.3. SYN-SENT State | 3.10.7.3. SYN-SENT State | |||
| If the state is SYN-SENT then | If the state is SYN-SENT then | |||
| first check the ACK bit | first check the ACK bit | |||
| - If the ACK bit is set | - If the ACK bit is set | |||
| o If SEG.ACK =< ISS, or SEG.ACK > SND.NXT, send a reset | o If SEG.ACK =< ISS, or SEG.ACK > SND.NXT, send a reset | |||
| (unless the RST bit is set, if so drop the segment and | (unless the RST bit is set, if so drop the segment and | |||
| return) | return) | |||
| + <SEQ=SEG.ACK><CTL=RST> | + <SEQ=SEG.ACK><CTL=RST> | |||
| o and discard the segment. Return. | o and discard the segment. Return. | |||
| o If SND.UNA < SEG.ACK =< SND.NXT then the ACK is acceptable. | o If SND.UNA < SEG.ACK =< SND.NXT then the ACK is acceptable. | |||
| Some deployed TCP code has used the check SEG.ACK == SND.NXT | Some deployed TCP code has used the check SEG.ACK == SND.NXT | |||
| (using "==" rather than "=<"), but this is not appropriate | (using "==" rather than "=<"), but this is not appropriate | |||
| when the stack is capable of sending data on the SYN, | when the stack is capable of sending data on the SYN, | |||
| because the TCP peer may not accept and acknowledge all of | because the TCP peer may not accept and acknowledge all of | |||
| the data on the SYN. | the data on the SYN. | |||
| second check the RST bit | second check the RST bit | |||
| - If the RST bit is set | - If the RST bit is set | |||
| o A potential blind reset attack is described in RFC 5961 | o A potential blind reset attack is described in RFC 5961 [9]. | |||
| [37]. The mitigation described in that document has | The mitigation described in that document has specific | |||
| specific applicability explained therein, and is not a | applicability explained therein, and is not a substitute for | |||
| substitute for cryptographic protection (e.g. IPsec or TCP- | cryptographic protection (e.g. IPsec or TCP-AO). A TCP | |||
| AO). A TCP implementation that supports the RFC 5961 | implementation that supports the RFC 5961 mitigation SHOULD | |||
| mitigation SHOULD first check that the sequence number | first check that the sequence number exactly matches RCV.NXT | |||
| exactly matches RCV.NXT prior to executing the action in the | prior to executing the action in the next paragraph. | |||
| next paragraph. | ||||
| o If the ACK was acceptable then signal the user "error: | o If the ACK was acceptable then signal the user "error: | |||
| connection reset", drop the segment, enter CLOSED state, | connection reset", drop the segment, enter CLOSED state, | |||
| delete TCB, and return. Otherwise (no ACK) drop the segment | delete TCB, and return. Otherwise (no ACK), drop the | |||
| and return. | segment and return. | |||
| third check the security | third check the security | |||
| - If the security/compartment in the segment does not exactly | - If the security/compartment in the segment does not exactly | |||
| match the security/compartment in the TCB, send a reset | match the security/compartment in the TCB, send a reset | |||
| o If there is an ACK | o If there is an ACK | |||
| + <SEQ=SEG.ACK><CTL=RST> | + <SEQ=SEG.ACK><CTL=RST> | |||
| skipping to change at page 74, line 10 ¶ | skipping to change at page 73, line 47 ¶ | |||
| If there are other controls or text in the segment, queue them | If there are other controls or text in the segment, queue them | |||
| for processing after the ESTABLISHED state has been reached, | for processing after the ESTABLISHED state has been reached, | |||
| return. | return. | |||
| - Note that it is legal to send and receive application data on | - Note that it is legal to send and receive application data on | |||
| SYN segments (this is the "text in the segment" mentioned | SYN segments (this is the "text in the segment" mentioned | |||
| above). There has been significant misinformation and | above). There has been significant misinformation and | |||
| misunderstanding of this topic historically. Some firewalls | misunderstanding of this topic historically. Some firewalls | |||
| and security devices consider this suspicious. However, the | and security devices consider this suspicious. However, the | |||
| capability was used in T/TCP [20] and is used in TCP Fast Open | capability was used in T/TCP [22] and is used in TCP Fast Open | |||
| (TFO) [47], so it is important for implementations and network | (TFO) [49], so it is important for implementations and network | |||
| devices to permit. | devices to permit. | |||
| fifth, if neither of the SYN or RST bits is set then drop the | fifth, if neither of the SYN or RST bits is set then drop the | |||
| segment and return. | segment and return. | |||
| 3.10.7.4. Other States | 3.10.7.4. Other States | |||
| Otherwise, | Otherwise, | |||
| first check sequence number | first check sequence number | |||
| skipping to change at page 74, line 43 ¶ | skipping to change at page 74, line 31 ¶ | |||
| CLOSING STATE | CLOSING STATE | |||
| LAST-ACK STATE | LAST-ACK STATE | |||
| TIME-WAIT STATE | TIME-WAIT STATE | |||
| o Segments are processed in sequence. Initial tests on | o Segments are processed in sequence. Initial tests on | |||
| arrival are used to discard old duplicates, but further | arrival are used to discard old duplicates, but further | |||
| processing is done in SEG.SEQ order. If a segment's | processing is done in SEG.SEQ order. If a segment's | |||
| contents straddle the boundary between old and new, only the | contents straddle the boundary between old and new, only the | |||
| new parts should be processed. | new parts are processed. | |||
| o In general, the processing of received segments MUST be | o In general, the processing of received segments MUST be | |||
| implemented to aggregate ACK segments whenever possible | implemented to aggregate ACK segments whenever possible | |||
| (MUST-58). For example, if the TCP endpoint is processing a | (MUST-58). For example, if the TCP endpoint is processing a | |||
| series of queued segments, it MUST process them all before | series of queued segments, it MUST process them all before | |||
| sending any ACK segments (MUST-59). | sending any ACK segments (MUST-59). | |||
| o There are four cases for the acceptability test for an | o There are four cases for the acceptability test for an | |||
| incoming segment: | incoming segment: | |||
| skipping to change at page 75, line 35 ¶ | skipping to change at page 75, line 35 ¶ | |||
| o If an incoming segment is not acceptable, an acknowledgment | o If an incoming segment is not acceptable, an acknowledgment | |||
| should be sent in reply (unless the RST bit is set, if so | should be sent in reply (unless the RST bit is set, if so | |||
| drop the segment and return): | drop the segment and return): | |||
| + <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | + <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | |||
| o After sending the acknowledgment, drop the unacceptable | o After sending the acknowledgment, drop the unacceptable | |||
| segment and return. | segment and return. | |||
| o Note that for the TIME-WAIT state, there is an improved | o Note that for the TIME-WAIT state, there is an improved | |||
| algorithm described in [39] for handling incoming SYN | algorithm described in [41] for handling incoming SYN | |||
| segments, that utilizes timestamps rather than relying on | segments, that utilizes timestamps rather than relying on | |||
| the sequence number check described here. When the improved | the sequence number check described here. When the improved | |||
| algorithm is implemented, the logic above is not applicable | algorithm is implemented, the logic above is not applicable | |||
| for incoming SYN segments with timestamp options, received | for incoming SYN segments with timestamp options, received | |||
| on a connection in the TIME-WAIT state. | on a connection in the TIME-WAIT state. | |||
| o In the following it is assumed that the segment is the | o In the following it is assumed that the segment is the | |||
| idealized segment that begins at RCV.NXT and does not exceed | idealized segment that begins at RCV.NXT and does not exceed | |||
| the window. One could tailor actual segments to fit this | the window. One could tailor actual segments to fit this | |||
| assumption by trimming off any portions that lie outside the | assumption by trimming off any portions that lie outside the | |||
| window (including SYN and FIN), and only processing further | window (including SYN and FIN), and only processing further | |||
| if the segment then begins at RCV.NXT. Segments with higher | if the segment then begins at RCV.NXT. Segments with higher | |||
| beginning sequence numbers SHOULD be held for later | beginning sequence numbers SHOULD be held for later | |||
| processing (SHLD-31). | processing (SHLD-31). | |||
| - second check the RST bit, | - second check the RST bit, | |||
| o RFC 5961 [37] section 3 describes a potential blind reset | o RFC 5961 [9] section 3 describes a potential blind reset | |||
| attack and optional mitigation approach. This does not | attack and optional mitigation approach. This does not | |||
| provide a cryptographic protection (e.g. as in IPsec or TCP- | provide a cryptographic protection (e.g. as in IPsec or TCP- | |||
| AO), but can be applicable in situations described in RFC | AO), but can be applicable in situations described in RFC | |||
| 5961. For stacks implementing the RFC 5961 protection, the | 5961. For stacks implementing the RFC 5961 protection, the | |||
| three checks below apply, otherwise processing for these | three checks below apply, otherwise processing for these | |||
| states is indicated further below. | states is indicated further below. | |||
| + 1) If the RST bit is set and the sequence number is | + 1) If the RST bit is set and the sequence number is | |||
| outside the current receive window, silently drop the | outside the current receive window, silently drop the | |||
| segment. | segment. | |||
| skipping to change at page 76, line 44 ¶ | skipping to change at page 76, line 44 ¶ | |||
| o SYN-RECEIVED STATE | o SYN-RECEIVED STATE | |||
| + If the RST bit is set | + If the RST bit is set | |||
| * If this connection was initiated with a passive OPEN | * If this connection was initiated with a passive OPEN | |||
| (i.e., came from the LISTEN state), then return this | (i.e., came from the LISTEN state), then return this | |||
| connection to LISTEN state and return. The user need | connection to LISTEN state and return. The user need | |||
| not be informed. If this connection was initiated | not be informed. If this connection was initiated | |||
| with an active OPEN (i.e., came from SYN-SENT state) | with an active OPEN (i.e., came from SYN-SENT state) | |||
| then the connection was refused, signal the user | then the connection was refused, signal the user | |||
| "connection refused". In either case, all segments on | "connection refused". In either case, the | |||
| the retransmission queue should be removed. And in | retransmission queue should be flushed. And in the | |||
| the active OPEN case, enter the CLOSED state and | active OPEN case, enter the CLOSED state and delete | |||
| delete the TCB, and return. | the TCB, and return. | |||
| o ESTABLISHED | o ESTABLISHED | |||
| FIN-WAIT-1 | FIN-WAIT-1 | |||
| FIN-WAIT-2 | FIN-WAIT-2 | |||
| CLOSE-WAIT | CLOSE-WAIT | |||
| + If the RST bit is set, then any outstanding RECEIVEs and | + If the RST bit is set, then any outstanding RECEIVEs and | |||
| SEND should receive "reset" responses. All segment | SEND should receive "reset" responses. All segment | |||
| skipping to change at page 78, line 37 ¶ | skipping to change at page 78, line 37 ¶ | |||
| CLOSING STATE | CLOSING STATE | |||
| LAST-ACK STATE | LAST-ACK STATE | |||
| TIME-WAIT STATE | TIME-WAIT STATE | |||
| + If the SYN bit is set in these synchronized states, it | + If the SYN bit is set in these synchronized states, it | |||
| may be either a legitimate new connection attempt (e.g. | may be either a legitimate new connection attempt (e.g. | |||
| in the case of TIME-WAIT), an error where the connection | in the case of TIME-WAIT), an error where the connection | |||
| should be reset, or the result of an attack attempt, as | should be reset, or the result of an attack attempt, as | |||
| described in RFC 5961 [37]. For the TIME-WAIT state, new | described in RFC 5961 [9]. For the TIME-WAIT state, new | |||
| connections can be accepted if the timestamp option is | connections can be accepted if the timestamp option is | |||
| used and meets expectations (per [39]). For all other | used and meets expectations (per [41]). For all other | |||
| cases, RFC 5961 provides a mitigation with applicability | cases, RFC 5961 provides a mitigation with applicability | |||
| to some situations, though there are also alternatives | to some situations, though there are also alternatives | |||
| that offer cryptographic protection (see Section 7). RFC | that offer cryptographic protection (see Section 7). RFC | |||
| 5961 recommends that in these synchronized states, if the | 5961 recommends that in these synchronized states, if the | |||
| SYN bit is set, irrespective of the sequence number, TCP | SYN bit is set, irrespective of the sequence number, TCP | |||
| endpoints MUST send a "challenge ACK" to the remote peer: | endpoints MUST send a "challenge ACK" to the remote peer: | |||
| + <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | + <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | |||
| + After sending the acknowledgment, TCP implementations | + After sending the acknowledgment, TCP implementations | |||
| skipping to change at page 79, line 24 ¶ | skipping to change at page 79, line 24 ¶ | |||
| + If the SYN is not in the window this step would not be | + If the SYN is not in the window this step would not be | |||
| reached and an ACK would have been sent in the first step | reached and an ACK would have been sent in the first step | |||
| (sequence number check). | (sequence number check). | |||
| - fifth check the ACK field, | - fifth check the ACK field, | |||
| o if the ACK bit is off drop the segment and return | o if the ACK bit is off drop the segment and return | |||
| o if the ACK bit is on | o if the ACK bit is on | |||
| + RFC 5961 [37] section 5 describes a potential blind data | + RFC 5961 [9] section 5 describes a potential blind data | |||
| injection attack, and mitigation that implementations MAY | injection attack, and mitigation that implementations MAY | |||
| choose to include (MAY-12). TCP stacks that implement | choose to include (MAY-12). TCP stacks that implement | |||
| RFC 5961 MUST add an input check that the ACK value is | RFC 5961 MUST add an input check that the ACK value is | |||
| acceptable only if it is in the range of ((SND.UNA - | acceptable only if it is in the range of ((SND.UNA - | |||
| MAX.SND.WND) =< SEG.ACK =< SND.NXT). All incoming | MAX.SND.WND) =< SEG.ACK =< SND.NXT). All incoming | |||
| segments whose ACK value doesn't satisfy the above | segments whose ACK value doesn't satisfy the above | |||
| condition MUST be discarded and an ACK sent back. The | condition MUST be discarded and an ACK sent back. The | |||
| new state variable MAX.SND.WND is defined as the largest | new state variable MAX.SND.WND is defined as the largest | |||
| window that the local sender has ever received from its | window that the local sender has ever received from its | |||
| peer (subject to window scaling) or may be hard-coded to | peer (subject to window scaling) or may be hard-coded to | |||
| skipping to change at page 82, line 9 ¶ | skipping to change at page 82, line 9 ¶ | |||
| the remote side. Ignore the URG. | the remote side. Ignore the URG. | |||
| - seventh, process the segment text, | - seventh, process the segment text, | |||
| o ESTABLISHED STATE | o ESTABLISHED STATE | |||
| FIN-WAIT-1 STATE | FIN-WAIT-1 STATE | |||
| FIN-WAIT-2 STATE | FIN-WAIT-2 STATE | |||
| + Once in the ESTABLISHED state, it is possible to deliver | + Once in the ESTABLISHED state, it is possible to deliver | |||
| segment text to user RECEIVE buffers. Text from segments | segment data to user RECEIVE buffers. Data from segments | |||
| can be moved into buffers until either the buffer is full | can be moved into buffers until either the buffer is full | |||
| or the segment is empty. If the segment empties and | or the segment is empty. If the segment empties and | |||
| carries a PUSH flag, then the user is informed, when the | carries a PUSH flag, then the user is informed, when the | |||
| buffer is returned, that a PUSH has been received. | buffer is returned, that a PUSH has been received. | |||
| + When the TCP endpoint takes responsibility for delivering | + When the TCP endpoint takes responsibility for delivering | |||
| the data to the user it must also acknowledge the receipt | the data to the user it must also acknowledge the receipt | |||
| of the data. | of the data. | |||
| + Once the TCP endpoint takes responsibility for the data | + Once the TCP endpoint takes responsibility for the data | |||
| skipping to change at page 84, line 43 ¶ | skipping to change at page 84, line 43 ¶ | |||
| previous sequence numbers. | previous sequence numbers. | |||
| connection | connection | |||
| A logical communication path identified by a pair of sockets. | A logical communication path identified by a pair of sockets. | |||
| datagram | datagram | |||
| A message sent in a packet switched computer communications | A message sent in a packet switched computer communications | |||
| network. | network. | |||
| Destination Address | Destination Address | |||
| The network layer address of the remote endpoint. | The network layer address of the endpoint intended to receive | |||
| | a segment. | |||
| FIN | FIN | |||
| A control bit (finis) occupying one sequence number, which | A control bit (finis) occupying one sequence number, which | |||
| indicates that the sender will send no more data or control | indicates that the sender will send no more data or control | |||
| occupying sequence space. | occupying sequence space. | |||
| | flush | |||
| | To remove all of the contents (data or segments) from a store | |||
| | (buffer or queue). | |||
| fragment | fragment | |||
| A portion of a logical unit of data, in particular an | A portion of a logical unit of data, in particular an | |||
| internet fragment is a portion of an internet datagram. | internet fragment is a portion of an internet datagram. | |||
| header | header | |||
| Control information at the beginning of a message, segment, | Control information at the beginning of a message, segment, | |||
| fragment, packet or block of data. | fragment, packet or block of data. | |||
| host | host | |||
| A computer. In particular a source or destination of | A computer. In particular a source or destination of | |||
| messages from the point of view of the communication network. | messages from the point of view of the communication network. | |||
| Identification | Identification | |||
| An Internet Protocol field. This identifying value assigned | An Internet Protocol field. This identifying value assigned | |||
| by the sender aids in assembling the fragments of a datagram. | by the sender aids in assembling the fragments of a datagram. | |||
| internet address | internet address | |||
| A network layer address. | A network layer address. | |||
| internet datagram | internet datagram | |||
| The unit of data exchanged between an internet module and the | A unit of data exchanged between internet hosts, together | |||
| higher level protocol together with the internet header. | with the internet header that allows the datagram to be | |||
| | routed from source to destination. | |||
| internet fragment | internet fragment | |||
| A portion of the data of an internet datagram with an | A portion of the data of an internet datagram with an | |||
| internet header. | internet header. | |||
| IP | IP | |||
| Internet Protocol. See [1] and [13]. | Internet Protocol. See [1] and [13]. | |||
| IRS | IRS | |||
| The Initial Receive Sequence number. The first sequence | The Initial Receive Sequence number. The first sequence | |||
| skipping to change at page 87, line 11 ¶ | skipping to change at page 87, line 18 ¶ | |||
| receive next sequence number | receive next sequence number | |||
| This is the next sequence number the local TCP endpoint is | This is the next sequence number the local TCP endpoint is | |||
| expecting to receive. | expecting to receive. | |||
| receive window | receive window | |||
| This represents the sequence numbers the local (receiving) | This represents the sequence numbers the local (receiving) | |||
| TCP endpoint is willing to receive. Thus, the local TCP | TCP endpoint is willing to receive. Thus, the local TCP | |||
| endpoint considers that segments overlapping the range | endpoint considers that segments overlapping the range | |||
| RCV.NXT to RCV.NXT + RCV.WND - 1 carry acceptable data or | RCV.NXT to RCV.NXT + RCV.WND - 1 carry acceptable data or | |||
| control. Segments containing sequence numbers entirely | control. Segments containing sequence numbers entirely | |||
| outside of this range are considered duplicates and | outside this range are considered duplicates or injection | |||
| discarded. | attacks and discarded. | |||
| RST | RST | |||
| A control bit (reset), occupying no sequence space, | A control bit (reset), occupying no sequence space, | |||
| indicating that the receiver should delete the connection | indicating that the receiver should delete the connection | |||
| without further interaction. The receiver can determine, | without further interaction. The receiver can determine, | |||
| based on the sequence number and acknowledgment fields of the | based on the sequence number and acknowledgment fields of the | |||
| incoming segment, whether it should honor the reset command | incoming segment, whether it should honor the reset command | |||
| or ignore it. In no case does receipt of a segment | or ignore it. In no case does receipt of a segment | |||
| containing RST give rise to a RST in response. | containing RST give rise to a RST in response. | |||
| skipping to change at page 89, line 17 ¶ | skipping to change at page 89, line 22 ¶ | |||
| the state of a connection. | the state of a connection. | |||
| TCP | TCP | |||
| Transmission Control Protocol: A host-to-host protocol for | Transmission Control Protocol: A host-to-host protocol for | |||
| reliable communication in internetwork environments. | reliable communication in internetwork environments. | |||
| TOS | TOS | |||
| Type of Service, an obsoleted IPv4 field. The same header | Type of Service, an obsoleted IPv4 field. The same header | |||
| bits currently are used for the Differentiated Services field | bits currently are used for the Differentiated Services field | |||
| [4] containing the Differentiated Services Code Point (DSCP) | [4] containing the Differentiated Services Code Point (DSCP) | |||
| value and the 2-bit ECN codepoint [7]. | value and the 2-bit ECN codepoint [6]. | |||
| Type of Service | Type of Service | |||
| See "TOS". | See "TOS". | |||
| URG | URG | |||
| A control bit (urgent), occupying no sequence space, used to | A control bit (urgent), occupying no sequence space, used to | |||
| indicate that the receiving user should be notified to do | indicate that the receiving user should be notified to do | |||
| urgent processing as long as there is data to be consumed | urgent processing as long as there is data to be consumed | |||
| with sequence numbers less than the value indicated in the | with sequence numbers less than the value indicated by the | |||
| urgent pointer. | urgent pointer. | |||
| urgent pointer | urgent pointer | |||
| A control field meaningful only when the URG bit is on. This | A control field meaningful only when the URG bit is on. This | |||
| field communicates the value of the urgent pointer that | field communicates the value of the urgent pointer that | |||
| indicates the data octet associated with the sending user's | indicates the data octet associated with the sending user's | |||
| urgent call. | urgent call. | |||
| 5. Changes from RFC 793 | 5. Changes from RFC 793 | |||
| skipping to change at page 89, line 52 ¶ | skipping to change at page 90, line 11 ¶ | |||
| valuable in learning about and understanding TCP, and they are valid | valuable in learning about and understanding TCP, and they are valid | |||
| Informational references, even though their normative content has | Informational references, even though their normative content has | |||
| been incorporated into this document. | been incorporated into this document. | |||
| The main body of this document was adapted from RFC 793's Section 3, | The main body of this document was adapted from RFC 793's Section 3, | |||
| titled "FUNCTIONAL SPECIFICATION", with an attempt to keep formatting | titled "FUNCTIONAL SPECIFICATION", with an attempt to keep formatting | |||
| and layout as close as possible. | and layout as close as possible. | |||
| The collection of applicable RFC Errata that have been reported and | The collection of applicable RFC Errata that have been reported and | |||
| either accepted or held for an update to RFC 793 were incorporated | either accepted or held for an update to RFC 793 were incorporated | |||
| (Errata IDs: 573, 574, 700, 701, 1283, 1561, 1562, 1564, 1565, 1571, | (Errata IDs: 573, 574, 700, 701, 1283, 1561, 1562, 1564, 1571, 1572, | |||
| 1572, 2296, 2297, 2298, 2748, 2749, 2934, 3213, 3300, 3301, 6222). | 2297, 2298, 2748, 2749, 2934, 3213, 3300, 3301, 6222). Some errata | |||
| Some errata were not applicable due to other changes (Errata IDs: | were not applicable due to other changes (Errata IDs: 572, 575, 1565, | |||
| 572, 575, 1569, 3305, 3602). | 1569, 2296, 3305, 3602). | |||
| Changes to the specification of the Urgent Pointer described in RFC | Changes to the specification of the Urgent Pointer described in RFCs | |||
| 1122 and 6093 were incorporated. See RFC 6093 for detailed | 1011, 1122, and 6093 were incorporated. See RFC 6093 for detailed | |||
| discussion of why these changes were necessary. | discussion of why these changes were necessary. | |||
| The discussion of the RTO from RFC 793 was updated to refer to RFC | The discussion of the RTO from RFC 793 was updated to refer to RFC | |||
| 6298. The RFC 1122 text on the RTO originally replaced the 793 text, | 6298. The RFC 1122 text on the RTO originally replaced the 793 text, | |||
| however, RFC 2988 should have updated 1122, and has subsequently been | however, RFC 2988 should have updated 1122, and has subsequently been | |||
| obsoleted by 6298. | obsoleted by 6298. | |||
| RFC 1122 contains a collection of other changes and clarifications to | RFC 1011 [19] contains a number of comments about RFC 793, including | |||
| RFC 793. The normative items impacting the protocol have been | some needed changes to the TCP specification. These are expanded in | |||
| incorporated here, though some historically useful implementation | RFC 1122, which contains a collection of other changes and | |||
| advice and informative discussion from RFC 1122 is not included here. | clarifications to RFC 793. The normative items impacting the | |||
| | protocol have been incorporated here, though some historically useful | |||
| | implementation advice and informative discussion from RFC 1122 is not | |||
| | included here. The present document updates RFC 1011, since this is | |||
| | now the TCP specification rather than RFC 793, and the comments noted | |||
| | in 1011 have been incorporated. | |||
| RFC 1122 contains more than just TCP requirements, so this document | RFC 1122 contains more than just TCP requirements, so this document | |||
| can't obsolete RFC 1122 entirely. It is only marked as "updating" | can't obsolete RFC 1122 entirely. It is only marked as "updating" | |||
| 1122, however, it should be understood to effectively obsolete all of | 1122, however, it should be understood to effectively obsolete all of | |||
| the RFC 1122 material on TCP. | the RFC 1122 material on TCP. | |||
| The more secure Initial Sequence Number generation algorithm from RFC | The more secure Initial Sequence Number generation algorithm from RFC | |||
| 6528 was incorporated. See RFC 6528 for discussion of the attacks | 6528 was incorporated. See RFC 6528 for discussion of the attacks | |||
| that this mitigates, as well as advice on selecting PRF algorithms | that this mitigates, as well as advice on selecting PRF algorithms | |||
| and managing secret key data. | and managing secret key data. | |||
| skipping to change at page 91, line 12 ¶ | skipping to change at page 91, line 27 ¶ | |||
| content of RFC 793 Section 3 titled "FUNCTIONAL SPECIFICATION". | content of RFC 793 Section 3 titled "FUNCTIONAL SPECIFICATION". | |||
| Other content from RFC 793 has not been incorporated. The -01 | Other content from RFC 793 has not been incorporated. The -01 | |||
| revision of this document makes some minor formatting changes to the | revision of this document makes some minor formatting changes to the | |||
| RFC 793 content in order to convert the content into XML2RFC format | RFC 793 content in order to convert the content into XML2RFC format | |||
| and account for left-out parts of RFC 793. For instance, figure | and account for left-out parts of RFC 793. For instance, figure | |||
| numbering differs and some indentation is not exactly the same. | numbering differs and some indentation is not exactly the same. | |||
| The -02 revision of draft-eddy-rfc793bis incorporates errata that | The -02 revision of draft-eddy-rfc793bis incorporates errata that | |||
| have been verified: | have been verified: | |||
| Errata ID 573: Reported by Bob Braden (note: This errata basically | Errata ID 573: Reported by Bob Braden (note: This errata report | |||
| is just a reminder that RFC 1122 updates 793. Some of the | basically is just a reminder that RFC 1122 updates 793. Some of | |||
| associated changes are left pending to a separate revision that | the associated changes are left pending to a separate revision | |||
| incorporates 1122. Bob's mention of PUSH in 793 section 2.8 was | that incorporates 1122. Bob's mention of PUSH in 793 section 2.8 | |||
| not applicable here because that section was not part of the | was not applicable here because that section was not part of the | |||
| "functional specification". Also the 1122 text on the | "functional specification". Also, the 1122 text on the | |||
| retransmission timeout also has been updated by subsequent RFCs, | retransmission timeout also has been updated by subsequent RFCs, | |||
| so the change here deviates from Bob's suggestion to apply the | so the change here deviates from Bob's suggestion to apply the | |||
| 1122 text.) | 1122 text.) | |||
| Errata ID 574: Reported by Yin Shuming | Errata ID 574: Reported by Yin Shuming | |||
| Errata ID 700: Reported by Yin Shuming | Errata ID 700: Reported by Yin Shuming | |||
| Errata ID 701: Reported by Yin Shuming | Errata ID 701: Reported by Yin Shuming | |||
| Errata ID 1283: Reported by Pei-chun Cheng | Errata ID 1283: Reported by Pei-chun Cheng | |||
| Errata ID 1561: Reported by Constantin Hagemeier | Errata ID 1561: Reported by Constantin Hagemeier | |||
| Errata ID 1562: Reported by Constantin Hagemeier | Errata ID 1562: Reported by Constantin Hagemeier | |||
| Errata ID 1564: Reported by Constantin Hagemeier | Errata ID 1564: Reported by Constantin Hagemeier | |||
| skipping to change at page 92, line 15 ¶ | skipping to change at page 92, line 28 ¶ | |||
| The -03 revision of draft-eddy-rfc793bis revises all discussion of | The -03 revision of draft-eddy-rfc793bis revises all discussion of | |||
| the urgent pointer in order to comply with RFC 6093, 1122, and 1011. | the urgent pointer in order to comply with RFC 6093, 1122, and 1011. | |||
| Since 1122 held requirements on the urgent pointer, the full list of | Since 1122 held requirements on the urgent pointer, the full list of | |||
| requirements was brought into an appendix of this document, so that | requirements was brought into an appendix of this document, so that | |||
| it can be updated as-needed. | it can be updated as-needed. | |||
| The -04 revision of draft-eddy-rfc793bis includes the ISN generation | The -04 revision of draft-eddy-rfc793bis includes the ISN generation | |||
| changes from RFC 6528. | changes from RFC 6528. | |||
| The -05 revision of draft-eddy-rfc793bis incorporates MSS | The -05 revision of draft-eddy-rfc793bis incorporates MSS | |||
| requirements and definitions from RFC 879, 1122, and 6691, as well as | requirements and definitions from RFC 879 [17], 1122, and 6691, as | |||
| option-handling requirements from RFC 1122. | well as option-handling requirements from RFC 1122. | |||
| The -00 revision of draft-ietf-tcpm-rfc793bis incorporates several | The -00 revision of draft-ietf-tcpm-rfc793bis incorporates several | |||
| additional clarifications and updates to the section on segmentation, | additional clarifications and updates to the section on segmentation, | |||
| many of which are based on feedback from Joe Touch improving from the | many of which are based on feedback from Joe Touch improving from the | |||
| initial text on this in the previous revision. | initial text on this in the previous revision. | |||
| The -01 revision incorporates the change to Reserved bits due to ECN, | The -01 revision incorporates the change to Reserved bits due to ECN, | |||
| as well as many other changes that come from RFC 1122. | as well as many other changes that come from RFC 1122. | |||
| The -02 revision has small formatting modifications in order to | The -02 revision has small formatting modifications in order to | |||
| address xml2rfc warnings about long lines. It was a quick update to | address xml2rfc warnings about long lines. It was a quick update to | |||
| avoid document expiration. TCPM working group discussion in 2015 | avoid document expiration. TCPM working group discussion in 2015 | |||
| also indicated that that we should not try to add sections on | also indicated that we should not try to add sections on | |||
| implementation advice or similar non-normative information. | implementation advice or similar non-normative information. | |||
| The -03 revision incorporates more content from RFC 1122: Passive | The -03 revision incorporates more content from RFC 1122: Passive | |||
| OPEN Calls, Time-To-Live, Multihoming, IP Options, ICMP messages, | OPEN Calls, Time-To-Live, Multihoming, IP Options, ICMP messages, | |||
| Data Communications, When to Send Data, When to Send a Window Update, | Data Communications, When to Send Data, When to Send a Window Update, | |||
| Managing the Window, Probing Zero Windows, When to Send an ACK | Managing the Window, Probing Zero Windows, When to Send an ACK | |||
| Segment. The section on data communications was re-organized into | Segment. The section on data communications was re-organized into | |||
| clearer subsections (previously headings were embedded in the 793 | clearer subsections (previously headings were embedded in the 793 | |||
| text), and windows management advice from 793 was removed (as | text), and windows management advice from 793 was removed (as | |||
| reviewed by TCPM working group) in favor of the 1122 additions on | reviewed by TCPM working group) in favor of the 1122 additions on | |||
| skipping to change at page 95, line 20 ¶ | skipping to change at page 95, line 45 ¶ | |||
| Joe Touch. Important changes for review include (1) removal of the | Joe Touch. Important changes for review include (1) removal of the | |||
| need to check for the PUSH flag when evaluating the SWS override | need to check for the PUSH flag when evaluating the SWS override | |||
| timer expiration, (2) clarification about receding urgent pointer, | timer expiration, (2) clarification about receding urgent pointer, | |||
| and (3) de-duplicating handling of the RST checking between step 4 | and (3) de-duplicating handling of the RST checking between step 4 | |||
| and step 1. | and step 1. | |||
| The -25 revision incorporates changes based on the GENART review from | The -25 revision incorporates changes based on the GENART review from | |||
| Francis Dupont, SECDIR review from Kyle Rose, and OPSDIR review from | Francis Dupont, SECDIR review from Kyle Rose, and OPSDIR review from | |||
| Sarah Banks. | Sarah Banks. | |||
| | The -26 revision incorporates changes stemming from the IESG reviews, | |||
| | and INTDIR review from Bernie Volz. | |||
| | The -27 revision fixes a few small editorial incompatibilities that | |||
| | Stephen McQuistin found related to automated code generation. | |||
| | The -28 revision addresses some COMMENTs from Ben Kaduk's IESG | |||
| | review. | |||
| Some other suggested changes that will not be incorporated in this | Some other suggested changes that will not be incorporated in this | |||
| 793 update unless TCPM consensus changes with regard to scope are: | 793 update unless TCPM consensus changes with regard to scope are: | |||
| 1. Tony Sabatini's suggestion for describing DO field | 1. Tony Sabatini's suggestion for describing DO field | |||
| 2. Per discussion with Joe Touch (TAPS list, 6/20/2015), the | 2. Per discussion with Joe Touch (TAPS list, 6/20/2015), the | |||
| description of the API could be revisited | description of the API could be revisited | |||
| 3. Reducing the R2 value for SYNs has been suggested as a possible | 3. Reducing the R2 value for SYNs has been suggested as a possible | |||
| topic for future consideration. | topic for future consideration. | |||
| Early in the process of updating RFC 793, Scott Brim mentioned that | Early in the process of updating RFC 793, Scott Brim mentioned that | |||
| skipping to change at page 96, line 13 ¶ | skipping to change at page 97, line 13 ¶ | |||
| IANA should assign values indicated below. | IANA should assign values indicated below. | |||
| TCP Header Flags | TCP Header Flags | |||
| Bit Name Reference Assignment Notes | Bit Name Reference Assignment Notes | |||
| Offset | Offset | |||
| --- ---- --------- ---------------- | --- ---- --------- ---------------- | |||
| 4 Reserved for future use (this document) | 4 Reserved for future use (this document) | |||
| 5 Reserved for future use (this document) | 5 Reserved for future use (this document) | |||
| 6 Reserved for future use (this document) | 6 Reserved for future use (this document) | |||
| 7 Reserved for future use [RFC8311] Previously used by Historic [RFC3540] as NS (Nonce Sum) | 7 Reserved for future use [RFC8311] [1] | |||
| 8 CWR (Congestion Window Reduced) [RFC3168] | 8 CWR (Congestion Window Reduced) [RFC3168] | |||
| 9 ECE (ECN-Echo) [RFC3168] | 9 ECE (ECN-Echo) [RFC3168] | |||
| 10 Urgent Pointer field is significant (URG) (this document) | 10 Urgent Pointer field is significant (URG) (this document) | |||
| 11 Acknowledgment field is significant (ACK) (this document) | 11 Acknowledgment field is significant (ACK) (this document) | |||
| 12 Push Function (PSH) (this document) | 12 Push Function (PSH) (this document) | |||
| 13 Reset the connection (RST) (this document) | 13 Reset the connection (RST) (this document) | |||
| 14 Synchronize sequence numbers (SYN) (this document) | 14 Synchronize sequence numbers (SYN) (this document) | |||
| 15 No more data from sender (FIN) (this document) | 15 No more data from sender (FIN) (this document) | |||
| | FOOTNOTES: | |||
| | [1] Previously used by Historic [RFC3540] as NS (Nonce Sum). | |||
| This TCP Header Flags registry should also be moved to a sub-registry | This TCP Header Flags registry should also be moved to a sub-registry | |||
| under the global "Transmission Control Protocol (TCP) Parameters | under the global "Transmission Control Protocol (TCP) Parameters | |||
| registry (https://www.iana.org/assignments/tcp-parameters/tcp- | registry (https://www.iana.org/assignments/tcp-parameters/tcp- | |||
| parameters.xhtml). | parameters.xhtml). | |||
| The registry's Registration Procedure should remain Standards Action, | The registry's Registration Procedure should remain Standards Action, | |||
| but the Reference can be updated to this document, and the Note | but the Reference can be updated to this document, and the Note | |||
| removed. | removed. | |||
| 7. Security and Privacy Considerations | 7. Security and Privacy Considerations | |||
| The TCP design includes only rudimentary security features that | The TCP design includes only rudimentary security features that | |||
| improve the robustness and reliability of connections and application | improve the robustness and reliability of connections and application | |||
| data transfer, but there are no built-in cryptographic capabilities | data transfer, but there are no built-in cryptographic capabilities | |||
| to support any form of privacy, authentication, or other typical | to support any form of confidentiality, authentication, or other | |||
| security functions. Non-cryptographic enhancements (e.g. [37]) have | typical security functions. Non-cryptographic enhancements (e.g. | |||
| been developed to improve robustness of TCP connections to particular | [9]) have been developed to improve robustness of TCP connections to | |||
| types of attacks, but the applicability and protections of non- | particular types of attacks, but the applicability and protections of | |||
| cryptographic enhancements are limited (e.g. see section 1.1 of | non-cryptographic enhancements are limited (e.g. see section 1.1 of | |||
| [37]). Applications typically utilize lower-layer (e.g. IPsec) and | [9]). Applications typically utilize lower-layer (e.g. IPsec) and | |||
| upper-layer (e.g. TLS) protocols to provide security and privacy for | upper-layer (e.g. TLS) protocols to provide security and privacy for | |||
| TCP connections and application data carried in TCP. Methods based | TCP connections and application data carried in TCP. Methods based | |||
| on TCP options have been developed as well, to support some security | on TCP options have been developed as well, to support some security | |||
| capabilities. | capabilities. | |||
| In order to fully protect TCP connections (including their control | In order to fully provide confidentiality, integrity protection, and | |||
| flags) IPsec or the TCP Authentication Option (TCP-AO) [36] are the | authentication for TCP connections (including their control flags) | |||
| only current effective methods. Other methods discussed in this | IPsec is the only current effective method. For integrity protection | |||
| section may protect the payload, but either only a subset of the | and authentication, the TCP Authentication Option (TCP-AO) [39] is | |||
| fields (e.g. tcpcrypt [55]) or none at all (e.g. TLS). Other | available, with a proposed extension to also provide confidentiality | |||
| security features that have been added to TCP (e.g. ISN generation, | for the segment payload. Other methods discussed in this section may | |||
| sequence number checks, and others) are only capable of partially | provide confidentiality or integrity protection for the payload, but | |||
| hindering attacks. | for the TCP header only cover either a subset of the fields (e.g. | |||
| | tcpcrypt [57]) or none at all (e.g. TLS). Other security features | |||
| | that have been added to TCP (e.g. ISN generation, sequence number | |||
| | checks, and others) are only capable of partially hindering attacks. | |||
| Applications using long-lived TCP flows have been vulnerable to | Applications using long-lived TCP flows have been vulnerable to | |||
| attacks that exploit the processing of control flags described in | attacks that exploit the processing of control flags described in | |||
| earlier TCP specifications [31]. TCP-MD5 was a commonly implemented | earlier TCP specifications [34]. TCP-MD5 was a commonly implemented | |||
| TCP option to support authentication for some of these connections, | TCP option to support authentication for some of these connections, | |||
| but had flaws and is now deprecated. TCP-AO provides a capability to | but had flaws and is now deprecated. TCP-AO provides a capability to | |||
| protect long-lived TCP connections from attacks, and has superior | protect long-lived TCP connections from attacks, and has superior | |||
| properties to TCP-MD5. It does not provide any privacy for | properties to TCP-MD5. It does not provide any privacy for | |||
| application data, nor for the TCP headers. | application data, nor for the TCP headers. | |||
| The "tcpcrypt" [55] Experimental extension to TCP provides the | The "tcpcrypt" [57] Experimental extension to TCP provides the | |||
| ability to cryptographically protect connection data. Metadata | ability to cryptographically protect connection data. Metadata | |||
| aspects of the TCP flow are still visible, but the application stream | aspects of the TCP flow are still visible, but the application stream | |||
| is well-protected. Within the TCP header, only the urgent pointer | is well-protected. Within the TCP header, only the urgent pointer | |||
| and FIN flag are protected through tcpcrypt. | and FIN flag are protected through tcpcrypt. | |||
| The TCP Roadmap [48] includes notes about several RFCs related to TCP | The TCP Roadmap [50] includes notes about several RFCs related to TCP | |||
| security. Many of the enhancements provided by these RFCs have been | security. Many of the enhancements provided by these RFCs have been | |||
| integrated into the present document, including ISN generation, | integrated into the present document, including ISN generation, | |||
| mitigating blind in-window attacks, and improving handling of soft | mitigating blind in-window attacks, and improving handling of soft | |||
| errors and ICMP packets. These are all discussed in greater detail | errors and ICMP packets. These are all discussed in greater detail | |||
| in the referenced RFCs that originally described the changes needed | in the referenced RFCs that originally described the changes needed | |||
| to earlier TCP specifications. Additionally, see RFC 6093 [38] for | to earlier TCP specifications. Additionally, see RFC 6093 [40] for | |||
| discussion of security considerations related to the urgent pointer | discussion of security considerations related to the urgent pointer | |||
| field, that has been deprecated. | field, that has been deprecated. | |||
| Since TCP is often used for bulk transfer flows, some attacks are | Since TCP is often used for bulk transfer flows, some attacks are | |||
| possible that abuse the TCP congestion control logic. An example is | possible that abuse the TCP congestion control logic. An example is | |||
| "ACK-division" attacks. Updates that have been made to the TCP | "ACK-division" attacks. Updates that have been made to the TCP | |||
| congestion control specifications include mechanisms like Appropriate | congestion control specifications include mechanisms like Appropriate | |||
| Byte Counting (ABC) [27] that act as mitigations to these attacks. | Byte Counting (ABC) [30] that act as mitigations to these attacks. | |||
| Other attacks are focused on exhausting the resources of a TCP | Other attacks are focused on exhausting the resources of a TCP | |||
| server. Examples include SYN flooding [30] or wasting resources on | server. Examples include SYN flooding [33] or wasting resources on | |||
| non-progressing connections [40]. Operating systems commonly | non-progressing connections [42]. Operating systems commonly | |||
| implement mitigations for these attacks. Some common defenses also | implement mitigations for these attacks. Some common defenses also | |||
| utilize proxies, stateful firewalls, and other technologies outside | utilize proxies, stateful firewalls, and other technologies outside | |||
| of the end-host TCP implementation. | the end-host TCP implementation. | |||
| The concept of a protocol's "wire image" is described in RFC 8546 | The concept of a protocol's "wire image" is described in RFC 8546 | |||
| [54], which describes how TCP's cleartext headers expose more | [56], which describes how TCP's cleartext headers expose more | |||
| metadata to nodes on the path than is strictly required to route the | metadata to nodes on the path than is strictly required to route the | |||
| packets to their destination. On-path adversaries may be able to | packets to their destination. On-path adversaries may be able to | |||
| leverage this metadata. Lessons learned in this respect from TCP | leverage this metadata. Lessons learned in this respect from TCP | |||
| have been applied in the design of newer transports like QUIC [58]. | have been applied in the design of newer transports like QUIC [60]. | |||
| Additionally, based partly on experiences with TCP and its | Additionally, based partly on experiences with TCP and its | |||
| extensions, there are considerations that might be applicable for | extensions, there are considerations that might be applicable for | |||
| future TCP extensions and other transports that the IETF has | future TCP extensions and other transports that the IETF has | |||
| documented in RFC 9065 [59], along with IAB recommendations in RFC | documented in RFC 9065 [61], along with IAB recommendations in RFC | |||
| 8558 [56] and [66]. | 8558 [58] and [68]. | |||
| There are also methods of "fingerprinting" that can be used to infer | ||||
| the host TCP implementation (operating system) version or platform | ||||
| information. These collect observations of several aspects such as | ||||
| the options present in segments, the ordering of options, the | ||||
| specific behaviors in the case of various conditions, packet timing, | ||||
| packet sizing, and other aspects of the protocol that are left to be | ||||
| determined by an implementer, and can use those observations to | ||||
| identify information about the host and implementation. | ||||
| 8. Acknowledgements | 8. Acknowledgements | |||
| This document is largely a revision of RFC 793, of which Jon Postel | This document is largely a revision of RFC 793, of which Jon Postel | |||
| was the editor. Due to his excellent work, it was able to last for | was the editor. Due to his excellent work, it was able to last for | |||
| three decades before we felt the need to revise it. | three decades before we felt the need to revise it. | |||
| Andre Oppermann was a contributor and helped to edit the first | Andre Oppermann was a contributor and helped to edit the first | |||
| revision of this document. | revision of this document. | |||
| skipping to change at page 98, line 41 | skipping to change at page 100, line 15 | |||
| During the discussions of this work on the TCPM mailing list, in | During the discussions of this work on the TCPM mailing list, in | |||
| working group meetings, and via area reviews, helpful comments, | working group meetings, and via area reviews, helpful comments, | |||
| critiques, and reviews were received from (listed alphabetically by | critiques, and reviews were received from (listed alphabetically by | |||
| last name): Praveen Balasubramanian, David Borman, Mohamed Boucadair, | last name): Praveen Balasubramanian, David Borman, Mohamed Boucadair, | |||
| Bob Briscoe, Neal Cardwell, Yuchung Cheng, Martin Duke, Francis | Bob Briscoe, Neal Cardwell, Yuchung Cheng, Martin Duke, Francis | |||
| Dupont, Ted Faber, Gorry Fairhurst, Fernando Gont, Rodney Grimes, Yi | Dupont, Ted Faber, Gorry Fairhurst, Fernando Gont, Rodney Grimes, Yi | |||
| Huang, Rahul Jadhav, Markku Kojo, Mike Kosek, Juhamatti Kuusisaari, | Huang, Rahul Jadhav, Markku Kojo, Mike Kosek, Juhamatti Kuusisaari, | |||
| Kevin Lahey, Kevin Mason, Matt Mathis, Stephen McQuistin, Jonathan | Kevin Lahey, Kevin Mason, Matt Mathis, Stephen McQuistin, Jonathan | |||
| Morton, Matt Olson, Tommy Pauly, Tom Petch, Hagen Paul Pfeifer, Kyle | Morton, Matt Olson, Tommy Pauly, Tom Petch, Hagen Paul Pfeifer, Kyle | |||
| Rose, Anthony Sabatini, Michael Scharf, Greg Skinner, Joe Touch, | Rose, Anthony Sabatini, Michael Scharf, Greg Skinner, Joe Touch, | |||
| Michael Tuexen, Reji Varghese, Tim Wicinski, Lloyd Wood, and Alex | Michael Tuexen, Reji Varghese, Bernie Volz, Tim Wicinski, Lloyd Wood, | |||
| Zimmermann. | and Alex Zimmermann. | |||
| Joe Touch provided additional help in clarifying the description of | Joe Touch provided additional help in clarifying the description of | |||
| segment size parameters and PMTUD/PLPMTUD recommendations. Markku | segment size parameters and PMTUD/PLPMTUD recommendations. Markku | |||
| Kojo helped put together the text in the section on TCP Congestion | Kojo helped put together the text in the section on TCP Congestion | |||
| Control. | Control. | |||
| This document includes content from errata that were reported by | This document includes content from errata that were reported by | |||
| (listed chronologically): Yin Shuming, Bob Braden, Morris M. Keesan, | (listed chronologically): Yin Shuming, Bob Braden, Morris M. Keesan, | |||
| Pei-chun Cheng, Constantin Hagemeier, Vishwas Manral, Mykyta | Pei-chun Cheng, Constantin Hagemeier, Vishwas Manral, Mykyta | |||
| Yevstifeyev, EungJun Yi, Botong Huang, Charles Deng, Merlin Buge. | Yevstifeyev, EungJun Yi, Botong Huang, Charles Deng, Merlin Buge. | |||
| skipping to change at page 99, line 28 | skipping to change at page 101, line 5 | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [4] Nichols, K., Blake, S., Baker, F., and D. Black, | [4] Nichols, K., Blake, S., Baker, F., and D. Black, | |||
| "Definition of the Differentiated Services Field (DS | "Definition of the Differentiated Services Field (DS | |||
| Field) in the IPv4 and IPv6 Headers", RFC 2474, | Field) in the IPv4 and IPv6 Headers", RFC 2474, | |||
| DOI 10.17487/RFC2474, December 1998, | DOI 10.17487/RFC2474, December 1998, | |||
| <https://www.rfc-editor.org/info/rfc2474>. | <https://www.rfc-editor.org/info/rfc2474>. | |||
| [5] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", | [5] Floyd, S., "Congestion Control Principles", BCP 41, | |||
| RFC 2675, DOI 10.17487/RFC2675, August 1999, | ||||
| <https://www.rfc-editor.org/info/rfc2675>. | ||||
| [6] Floyd, S., "Congestion Control Principles", BCP 41, | ||||
| RFC 2914, DOI 10.17487/RFC2914, September 2000, | RFC 2914, DOI 10.17487/RFC2914, September 2000, | |||
| <https://www.rfc-editor.org/info/rfc2914>. | <https://www.rfc-editor.org/info/rfc2914>. | |||
| [7] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [6] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
| RFC 3168, DOI 10.17487/RFC3168, September 2001, | RFC 3168, DOI 10.17487/RFC3168, September 2001, | |||
| <https://www.rfc-editor.org/info/rfc3168>. | <https://www.rfc-editor.org/info/rfc3168>. | |||
| [8] Floyd, S. and M. Allman, "Specifying New Congestion | [7] Floyd, S. and M. Allman, "Specifying New Congestion | |||
| Control Algorithms", BCP 133, RFC 5033, | Control Algorithms", BCP 133, RFC 5033, | |||
| DOI 10.17487/RFC5033, August 2007, | DOI 10.17487/RFC5033, August 2007, | |||
| <https://www.rfc-editor.org/info/rfc5033>. | <https://www.rfc-editor.org/info/rfc5033>. | |||
| [9] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [8] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
| Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | |||
| <https://www.rfc-editor.org/info/rfc5681>. | <https://www.rfc-editor.org/info/rfc5681>. | |||
| [9] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's | ||||
| Robustness to Blind In-Window Attacks", RFC 5961, | ||||
| DOI 10.17487/RFC5961, August 2010, | ||||
| <https://www.rfc-editor.org/info/rfc5961>. | ||||
| [10] Paxson, V., Allman, M., Chu, J., and M. Sargent, | [10] Paxson, V., Allman, M., Chu, J., and M. Sargent, | |||
| "Computing TCP's Retransmission Timer", RFC 6298, | "Computing TCP's Retransmission Timer", RFC 6298, | |||
| DOI 10.17487/RFC6298, June 2011, | DOI 10.17487/RFC6298, June 2011, | |||
| <https://www.rfc-editor.org/info/rfc6298>. | <https://www.rfc-editor.org/info/rfc6298>. | |||
| [11] Gont, F., "Deprecation of ICMP Source Quench Messages", | [11] Gont, F., "Deprecation of ICMP Source Quench Messages", | |||
| RFC 6633, DOI 10.17487/RFC6633, May 2012, | RFC 6633, DOI 10.17487/RFC6633, May 2012, | |||
| <https://www.rfc-editor.org/info/rfc6633>. | <https://www.rfc-editor.org/info/rfc6633>. | |||
| [12] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [12] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| skipping to change at page 100, line 38 | skipping to change at page 102, line 15 | |||
| [15] Allman, M., "Requirements for Time-Based Loss Detection", | [15] Allman, M., "Requirements for Time-Based Loss Detection", | |||
| BCP 233, RFC 8961, DOI 10.17487/RFC8961, November 2020, | BCP 233, RFC 8961, DOI 10.17487/RFC8961, November 2020, | |||
| <https://www.rfc-editor.org/info/rfc8961>. | <https://www.rfc-editor.org/info/rfc8961>. | |||
| 9.2. Informative References | 9.2. Informative References | |||
| [16] Postel, J., "Transmission Control Protocol", STD 7, | [16] Postel, J., "Transmission Control Protocol", STD 7, | |||
| RFC 793, DOI 10.17487/RFC0793, September 1981, | RFC 793, DOI 10.17487/RFC0793, September 1981, | |||
| <https://www.rfc-editor.org/info/rfc793>. | <https://www.rfc-editor.org/info/rfc793>. | |||
| [17] Nagle, J., "Congestion Control in IP/TCP Internetworks", | [17] Postel, J., "The TCP Maximum Segment Size and Related | |||
| Topics", RFC 879, DOI 10.17487/RFC0879, November 1983, | ||||
| <https://www.rfc-editor.org/info/rfc879>. | ||||
| [18] Nagle, J., "Congestion Control in IP/TCP Internetworks", | ||||
| RFC 896, DOI 10.17487/RFC0896, January 1984, | RFC 896, DOI 10.17487/RFC0896, January 1984, | |||
| <https://www.rfc-editor.org/info/rfc896>. | <https://www.rfc-editor.org/info/rfc896>. | |||
| [18] Braden, R., Ed., "Requirements for Internet Hosts - | [19] Reynolds, J. and J. Postel, "Official Internet protocols", | |||
| RFC 1011, DOI 10.17487/RFC1011, May 1987, | ||||
| <https://www.rfc-editor.org/info/rfc1011>. | ||||
| [20] Braden, R., Ed., "Requirements for Internet Hosts - | ||||
| Communication Layers", STD 3, RFC 1122, | Communication Layers", STD 3, RFC 1122, | |||
| DOI 10.17487/RFC1122, October 1989, | DOI 10.17487/RFC1122, October 1989, | |||
| <https://www.rfc-editor.org/info/rfc1122>. | <https://www.rfc-editor.org/info/rfc1122>. | |||
| [19] Almquist, P., "Type of Service in the Internet Protocol | [21] Almquist, P., "Type of Service in the Internet Protocol | |||
| Suite", RFC 1349, DOI 10.17487/RFC1349, July 1992, | Suite", RFC 1349, DOI 10.17487/RFC1349, July 1992, | |||
| <https://www.rfc-editor.org/info/rfc1349>. | <https://www.rfc-editor.org/info/rfc1349>. | |||
| [20] Braden, R., "T/TCP -- TCP Extensions for Transactions | [22] Braden, R., "T/TCP -- TCP Extensions for Transactions | |||
| Functional Specification", RFC 1644, DOI 10.17487/RFC1644, | Functional Specification", RFC 1644, DOI 10.17487/RFC1644, | |||
| July 1994, <https://www.rfc-editor.org/info/rfc1644>. | July 1994, <https://www.rfc-editor.org/info/rfc1644>. | |||
| [21] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | [23] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | |||
| Selective Acknowledgment Options", RFC 2018, | Selective Acknowledgment Options", RFC 2018, | |||
| DOI 10.17487/RFC2018, October 1996, | DOI 10.17487/RFC2018, October 1996, | |||
| <https://www.rfc-editor.org/info/rfc2018>. | <https://www.rfc-editor.org/info/rfc2018>. | |||
| [22] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, | [24] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, | |||
| J., Heavens, I., Lahey, K., Semke, J., and B. Volz, "Known | J., Heavens, I., Lahey, K., Semke, J., and B. Volz, "Known | |||
| TCP Implementation Problems", RFC 2525, | TCP Implementation Problems", RFC 2525, | |||
| DOI 10.17487/RFC2525, March 1999, | DOI 10.17487/RFC2525, March 1999, | |||
| <https://www.rfc-editor.org/info/rfc2525>. | <https://www.rfc-editor.org/info/rfc2525>. | |||
| [23] Xiao, X., Hannan, A., Paxson, V., and E. Crabbe, "TCP | [25] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", | |||
| RFC 2675, DOI 10.17487/RFC2675, August 1999, | ||||
| <https://www.rfc-editor.org/info/rfc2675>. | ||||
| [26] Xiao, X., Hannan, A., Paxson, V., and E. Crabbe, "TCP | ||||
| Processing of the IPv4 Precedence Field", RFC 2873, | Processing of the IPv4 Precedence Field", RFC 2873, | |||
| DOI 10.17487/RFC2873, June 2000, | DOI 10.17487/RFC2873, June 2000, | |||
| <https://www.rfc-editor.org/info/rfc2873>. | <https://www.rfc-editor.org/info/rfc2873>. | |||
| [24] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An | [27] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An | |||
| Extension to the Selective Acknowledgement (SACK) Option | Extension to the Selective Acknowledgement (SACK) Option | |||
| for TCP", RFC 2883, DOI 10.17487/RFC2883, July 2000, | for TCP", RFC 2883, DOI 10.17487/RFC2883, July 2000, | |||
| <https://www.rfc-editor.org/info/rfc2883>. | <https://www.rfc-editor.org/info/rfc2883>. | |||
| [25] Lahey, K., "TCP Problems with Path MTU Discovery", | [28] Lahey, K., "TCP Problems with Path MTU Discovery", | |||
| RFC 2923, DOI 10.17487/RFC2923, September 2000, | RFC 2923, DOI 10.17487/RFC2923, September 2000, | |||
| <https://www.rfc-editor.org/info/rfc2923>. | <https://www.rfc-editor.org/info/rfc2923>. | |||
| [26] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | [29] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | |||
| Sooriyabandara, "TCP Performance Implications of Network | Sooriyabandara, "TCP Performance Implications of Network | |||
| Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, | Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, | |||
| December 2002, <https://www.rfc-editor.org/info/rfc3449>. | December 2002, <https://www.rfc-editor.org/info/rfc3449>. | |||
| [27] Allman, M., "TCP Congestion Control with Appropriate Byte | [30] Allman, M., "TCP Congestion Control with Appropriate Byte | |||
| Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | |||
| 2003, <https://www.rfc-editor.org/info/rfc3465>. | 2003, <https://www.rfc-editor.org/info/rfc3465>. | |||
| [28] Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4, | [31] Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4, | |||
| ICMPv6, UDP, and TCP Headers", RFC 4727, | ICMPv6, UDP, and TCP Headers", RFC 4727, | |||
| DOI 10.17487/RFC4727, November 2006, | DOI 10.17487/RFC4727, November 2006, | |||
| <https://www.rfc-editor.org/info/rfc4727>. | <https://www.rfc-editor.org/info/rfc4727>. | |||
| [29] Mathis, M. and J. Heffner, "Packetization Layer Path MTU | [32] Mathis, M. and J. Heffner, "Packetization Layer Path MTU | |||
| Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, | Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, | |||
| <https://www.rfc-editor.org/info/rfc4821>. | <https://www.rfc-editor.org/info/rfc4821>. | |||
| [30] Eddy, W., "TCP SYN Flooding Attacks and Common | [33] Eddy, W., "TCP SYN Flooding Attacks and Common | |||
| Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, | Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, | |||
| <https://www.rfc-editor.org/info/rfc4987>. | <https://www.rfc-editor.org/info/rfc4987>. | |||
| [31] Touch, J., "Defending TCP Against Spoofing Attacks", | [34] Touch, J., "Defending TCP Against Spoofing Attacks", | |||
| RFC 4953, DOI 10.17487/RFC4953, July 2007, | RFC 4953, DOI 10.17487/RFC4953, July 2007, | |||
| <https://www.rfc-editor.org/info/rfc4953>. | <https://www.rfc-editor.org/info/rfc4953>. | |||
| [32] Culley, P., Elzur, U., Recio, R., Bailey, S., and J. | [35] Culley, P., Elzur, U., Recio, R., Bailey, S., and J. | |||
| Carrier, "Marker PDU Aligned Framing for TCP | Carrier, "Marker PDU Aligned Framing for TCP | |||
| Specification", RFC 5044, DOI 10.17487/RFC5044, October | Specification", RFC 5044, DOI 10.17487/RFC5044, October | |||
| 2007, <https://www.rfc-editor.org/info/rfc5044>. | 2007, <https://www.rfc-editor.org/info/rfc5044>. | |||
| [33] Gont, F., "TCP's Reaction to Soft Errors", RFC 5461, | [36] Gont, F., "TCP's Reaction to Soft Errors", RFC 5461, | |||
| DOI 10.17487/RFC5461, February 2009, | DOI 10.17487/RFC5461, February 2009, | |||
| <https://www.rfc-editor.org/info/rfc5461>. | <https://www.rfc-editor.org/info/rfc5461>. | |||
| [34] StJohns, M., Atkinson, R., and G. Thomas, "Common | [37] StJohns, M., Atkinson, R., and G. Thomas, "Common | |||
| Architecture Label IPv6 Security Option (CALIPSO)", | Architecture Label IPv6 Security Option (CALIPSO)", | |||
| RFC 5570, DOI 10.17487/RFC5570, July 2009, | RFC 5570, DOI 10.17487/RFC5570, July 2009, | |||
| <https://www.rfc-editor.org/info/rfc5570>. | <https://www.rfc-editor.org/info/rfc5570>. | |||
| [35] Sandlund, K., Pelletier, G., and L-E. Jonsson, "The RObust | [38] Sandlund, K., Pelletier, G., and L-E. Jonsson, "The RObust | |||
| Header Compression (ROHC) Framework", RFC 5795, | Header Compression (ROHC) Framework", RFC 5795, | |||
| DOI 10.17487/RFC5795, March 2010, | DOI 10.17487/RFC5795, March 2010, | |||
| <https://www.rfc-editor.org/info/rfc5795>. | <https://www.rfc-editor.org/info/rfc5795>. | |||
| [36] Touch, J., Mankin, A., and R. Bonica, "The TCP | [39] Touch, J., Mankin, A., and R. Bonica, "The TCP | |||
| Authentication Option", RFC 5925, DOI 10.17487/RFC5925, | Authentication Option", RFC 5925, DOI 10.17487/RFC5925, | |||
| June 2010, <https://www.rfc-editor.org/info/rfc5925>. | June 2010, <https://www.rfc-editor.org/info/rfc5925>. | |||
| [37] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's | [40] Gont, F. and A. Yourtchenko, "On the Implementation of the | |||
| Robustness to Blind In-Window Attacks", RFC 5961, | ||||
| DOI 10.17487/RFC5961, August 2010, | ||||
| <https://www.rfc-editor.org/info/rfc5961>. | ||||
| [38] Gont, F. and A. Yourtchenko, "On the Implementation of the | ||||
| TCP Urgent Mechanism", RFC 6093, DOI 10.17487/RFC6093, | TCP Urgent Mechanism", RFC 6093, DOI 10.17487/RFC6093, | |||
| January 2011, <https://www.rfc-editor.org/info/rfc6093>. | January 2011, <https://www.rfc-editor.org/info/rfc6093>. | |||
| [39] Gont, F., "Reducing the TIME-WAIT State Using TCP | [41] Gont, F., "Reducing the TIME-WAIT State Using TCP | |||
| Timestamps", BCP 159, RFC 6191, DOI 10.17487/RFC6191, | Timestamps", BCP 159, RFC 6191, DOI 10.17487/RFC6191, | |||
| April 2011, <https://www.rfc-editor.org/info/rfc6191>. | April 2011, <https://www.rfc-editor.org/info/rfc6191>. | |||
| [40] Bashyam, M., Jethanandani, M., and A. Ramaiah, "TCP Sender | [42] Bashyam, M., Jethanandani, M., and A. Ramaiah, "TCP Sender | |||
| Clarification for Persist Condition", RFC 6429, | Clarification for Persist Condition", RFC 6429, | |||
| DOI 10.17487/RFC6429, December 2011, | DOI 10.17487/RFC6429, December 2011, | |||
| <https://www.rfc-editor.org/info/rfc6429>. | <https://www.rfc-editor.org/info/rfc6429>. | |||
| [41] Gont, F. and S. Bellovin, "Defending against Sequence | [43] Gont, F. and S. Bellovin, "Defending against Sequence | |||
| Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February | Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February | |||
| 2012, <https://www.rfc-editor.org/info/rfc6528>. | 2012, <https://www.rfc-editor.org/info/rfc6528>. | |||
| [42] Borman, D., "TCP Options and Maximum Segment Size (MSS)", | [44] Borman, D., "TCP Options and Maximum Segment Size (MSS)", | |||
| RFC 6691, DOI 10.17487/RFC6691, July 2012, | RFC 6691, DOI 10.17487/RFC6691, July 2012, | |||
| <https://www.rfc-editor.org/info/rfc6691>. | <https://www.rfc-editor.org/info/rfc6691>. | |||
| [43] Touch, J., "Updated Specification of the IPv4 ID Field", | [45] Touch, J., "Updated Specification of the IPv4 ID Field", | |||
| RFC 6864, DOI 10.17487/RFC6864, February 2013, | RFC 6864, DOI 10.17487/RFC6864, February 2013, | |||
| <https://www.rfc-editor.org/info/rfc6864>. | <https://www.rfc-editor.org/info/rfc6864>. | |||
| [44] Touch, J., "Shared Use of Experimental TCP Options", | [46] Touch, J., "Shared Use of Experimental TCP Options", | |||
| RFC 6994, DOI 10.17487/RFC6994, August 2013, | RFC 6994, DOI 10.17487/RFC6994, August 2013, | |||
| <https://www.rfc-editor.org/info/rfc6994>. | <https://www.rfc-editor.org/info/rfc6994>. | |||
| [45] McPherson, D., Oran, D., Thaler, D., and E. Osterweil, | [47] McPherson, D., Oran, D., Thaler, D., and E. Osterweil, | |||
| "Architectural Considerations of IP Anycast", RFC 7094, | "Architectural Considerations of IP Anycast", RFC 7094, | |||
| DOI 10.17487/RFC7094, January 2014, | DOI 10.17487/RFC7094, January 2014, | |||
| <https://www.rfc-editor.org/info/rfc7094>. | <https://www.rfc-editor.org/info/rfc7094>. | |||
| [46] Borman, D., Braden, B., Jacobson, V., and R. | [48] Borman, D., Braden, B., Jacobson, V., and R. | |||
| Scheffenegger, Ed., "TCP Extensions for High Performance", | Scheffenegger, Ed., "TCP Extensions for High Performance", | |||
| RFC 7323, DOI 10.17487/RFC7323, September 2014, | RFC 7323, DOI 10.17487/RFC7323, September 2014, | |||
| <https://www.rfc-editor.org/info/rfc7323>. | <https://www.rfc-editor.org/info/rfc7323>. | |||
| [47] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP | [49] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP | |||
| Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, | Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, | |||
| <https://www.rfc-editor.org/info/rfc7413>. | <https://www.rfc-editor.org/info/rfc7413>. | |||
| [48] Duke, M., Braden, R., Eddy, W., Blanton, E., and A. | [50] Duke, M., Braden, R., Eddy, W., Blanton, E., and A. | |||
| Zimmermann, "A Roadmap for Transmission Control Protocol | Zimmermann, "A Roadmap for Transmission Control Protocol | |||
| (TCP) Specification Documents", RFC 7414, | (TCP) Specification Documents", RFC 7414, | |||
| DOI 10.17487/RFC7414, February 2015, | DOI 10.17487/RFC7414, February 2015, | |||
| <https://www.rfc-editor.org/info/rfc7414>. | <https://www.rfc-editor.org/info/rfc7414>. | |||
| [49] Black, D., Ed. and P. Jones, "Differentiated Services | [51] Black, D., Ed. and P. Jones, "Differentiated Services | |||
| (Diffserv) and Real-Time Communication", RFC 7657, | (Diffserv) and Real-Time Communication", RFC 7657, | |||
| DOI 10.17487/RFC7657, November 2015, | DOI 10.17487/RFC7657, November 2015, | |||
| <https://www.rfc-editor.org/info/rfc7657>. | <https://www.rfc-editor.org/info/rfc7657>. | |||
| [50] Fairhurst, G. and M. Welzl, "The Benefits of Using | [52] Fairhurst, G. and M. Welzl, "The Benefits of Using | |||
| Explicit Congestion Notification (ECN)", RFC 8087, | Explicit Congestion Notification (ECN)", RFC 8087, | |||
| DOI 10.17487/RFC8087, March 2017, | DOI 10.17487/RFC8087, March 2017, | |||
| <https://www.rfc-editor.org/info/rfc8087>. | <https://www.rfc-editor.org/info/rfc8087>. | |||
| [51] Fairhurst, G., Ed., Trammell, B., Ed., and M. Kuehlewind, | [53] Fairhurst, G., Ed., Trammell, B., Ed., and M. Kuehlewind, | |||
| Ed., "Services Provided by IETF Transport Protocols and | Ed., "Services Provided by IETF Transport Protocols and | |||
| Congestion Control Mechanisms", RFC 8095, | Congestion Control Mechanisms", RFC 8095, | |||
| DOI 10.17487/RFC8095, March 2017, | DOI 10.17487/RFC8095, March 2017, | |||
| <https://www.rfc-editor.org/info/rfc8095>. | <https://www.rfc-editor.org/info/rfc8095>. | |||
| [52] Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of | [54] Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of | |||
| Transport Features Provided by IETF Transport Protocols", | Transport Features Provided by IETF Transport Protocols", | |||
| RFC 8303, DOI 10.17487/RFC8303, February 2018, | RFC 8303, DOI 10.17487/RFC8303, February 2018, | |||
| <https://www.rfc-editor.org/info/rfc8303>. | <https://www.rfc-editor.org/info/rfc8303>. | |||
| [53] Chown, T., Loughney, J., and T. Winters, "IPv6 Node | [55] Chown, T., Loughney, J., and T. Winters, "IPv6 Node | |||
| Requirements", BCP 220, RFC 8504, DOI 10.17487/RFC8504, | Requirements", BCP 220, RFC 8504, DOI 10.17487/RFC8504, | |||
| January 2019, <https://www.rfc-editor.org/info/rfc8504>. | January 2019, <https://www.rfc-editor.org/info/rfc8504>. | |||
| [54] Trammell, B. and M. Kuehlewind, "The Wire Image of a | [56] Trammell, B. and M. Kuehlewind, "The Wire Image of a | |||
| Network Protocol", RFC 8546, DOI 10.17487/RFC8546, April | Network Protocol", RFC 8546, DOI 10.17487/RFC8546, April | |||
| 2019, <https://www.rfc-editor.org/info/rfc8546>. | 2019, <https://www.rfc-editor.org/info/rfc8546>. | |||
| [55] Bittau, A., Giffin, D., Handley, M., Mazieres, D., Slack, | [57] Bittau, A., Giffin, D., Handley, M., Mazieres, D., Slack, | |||
| Q., and E. Smith, "Cryptographic Protection of TCP Streams | Q., and E. Smith, "Cryptographic Protection of TCP Streams | |||
| (tcpcrypt)", RFC 8548, DOI 10.17487/RFC8548, May 2019, | (tcpcrypt)", RFC 8548, DOI 10.17487/RFC8548, May 2019, | |||
| <https://www.rfc-editor.org/info/rfc8548>. | <https://www.rfc-editor.org/info/rfc8548>. | |||
| [56] Hardie, T., Ed., "Transport Protocol Path Signals", | [58] Hardie, T., Ed., "Transport Protocol Path Signals", | |||
| RFC 8558, DOI 10.17487/RFC8558, April 2019, | RFC 8558, DOI 10.17487/RFC8558, April 2019, | |||
| <https://www.rfc-editor.org/info/rfc8558>. | <https://www.rfc-editor.org/info/rfc8558>. | |||
| [57] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., and C. | [59] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., and C. | |||
| Paasch, "TCP Extensions for Multipath Operation with | Paasch, "TCP Extensions for Multipath Operation with | |||
| Multiple Addresses", RFC 8684, DOI 10.17487/RFC8684, March | Multiple Addresses", RFC 8684, DOI 10.17487/RFC8684, March | |||
| 2020, <https://www.rfc-editor.org/info/rfc8684>. | 2020, <https://www.rfc-editor.org/info/rfc8684>. | |||
| [58] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | [60] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | |||
| Multiplexed and Secure Transport", RFC 9000, | Multiplexed and Secure Transport", RFC 9000, | |||
| DOI 10.17487/RFC9000, May 2021, | DOI 10.17487/RFC9000, May 2021, | |||
| <https://www.rfc-editor.org/info/rfc9000>. | <https://www.rfc-editor.org/info/rfc9000>. | |||
| [59] Fairhurst, G. and C. Perkins, "Considerations around | [61] Fairhurst, G. and C. Perkins, "Considerations around | |||
| Transport Header Confidentiality, Network Operations, and | Transport Header Confidentiality, Network Operations, and | |||
| the Evolution of Internet Transport Protocols", RFC 9065, | the Evolution of Internet Transport Protocols", RFC 9065, | |||
| DOI 10.17487/RFC9065, July 2021, | DOI 10.17487/RFC9065, July 2021, | |||
| <https://www.rfc-editor.org/info/rfc9065>. | <https://www.rfc-editor.org/info/rfc9065>. | |||
| [60] IANA, "Transmission Control Protocol (TCP) Parameters, | [62] IANA, "Transmission Control Protocol (TCP) Parameters, | |||
| https://www.iana.org/assignments/tcp-parameters/tcp- | https://www.iana.org/assignments/tcp-parameters/tcp- | |||
| parameters.xhtml", 2019. | parameters.xhtml", 2019. | |||
| [61] IANA, "Transmission Control Protocol (TCP) Header Flags, | [63] IANA, "Transmission Control Protocol (TCP) Header Flags, | |||
| https://www.iana.org/assignments/tcp-header-flags/tcp- | https://www.iana.org/assignments/tcp-header-flags/tcp- | |||
| header-flags.xhtml", 2019. | header-flags.xhtml", 2019. | |||
| [62] Gont, F., "Processing of IP Security/Compartment and | [64] Gont, F., "Processing of IP Security/Compartment and | |||
| Precedence Information by TCP", Work in Progress, | Precedence Information by TCP", Work in Progress, | |||
| Internet-Draft, draft-gont-tcpm-tcp-seccomp-prec-00, 29 | Internet-Draft, draft-gont-tcpm-tcp-seccomp-prec-00, 29 | |||
| March 2012, <http://www.ietf.org/internet-drafts/draft- | March 2012, <http://www.ietf.org/internet-drafts/draft- | |||
| gont-tcpm-tcp-seccomp-prec-00.txt>. | gont-tcpm-tcp-seccomp-prec-00.txt>. | |||
| [63] Gont, F. and D. Borman, "On the Validation of TCP Sequence | [65] Gont, F. and D. Borman, "On the Validation of TCP Sequence | |||
| Numbers", Work in Progress, Internet-Draft, draft-gont- | Numbers", Work in Progress, Internet-Draft, draft-gont- | |||
| tcpm-tcp-seq-validation-04, 11 March 2019, | tcpm-tcp-seq-validation-04, 11 March 2019, | |||
| <http://www.ietf.org/internet-drafts/draft-gont-tcpm-tcp- | <http://www.ietf.org/internet-drafts/draft-gont-tcpm-tcp- | |||
| seq-validation-04.txt>. | seq-validation-04.txt>. | |||
| [64] Touch, J. and W. Eddy, "TCP Extended Data Offset Option", | [66] Touch, J. and W. Eddy, "TCP Extended Data Offset Option", | |||
| Work in Progress, Internet-Draft, draft-ietf-tcpm-tcp-edo- | Work in Progress, Internet-Draft, draft-ietf-tcpm-tcp-edo- | |||
| 10, 19 July 2018, <http://www.ietf.org/internet-drafts/ | 10, 19 July 2018, <http://www.ietf.org/internet-drafts/ | |||
| draft-ietf-tcpm-tcp-edo-10.txt>. | draft-ietf-tcpm-tcp-edo-10.txt>. | |||
| [65] McQuistin, S., Band, V., Jacob, D., and C. Perkins, | [67] McQuistin, S., Band, V., Jacob, D., and C. Perkins, | |||
| "Describing Protocol Data Units with Augmented Packet | "Describing Protocol Data Units with Augmented Packet | |||
| Header Diagrams", Work in Progress, Internet-Draft, draft- | Header Diagrams", Work in Progress, Internet-Draft, draft- | |||
| mcquistin-augmented-ascii-diagrams-08, 5 May 2021, | mcquistin-augmented-ascii-diagrams-08, 5 May 2021, | |||
| <https://www.ietf.org/archive/id/draft-mcquistin- | <https://www.ietf.org/archive/id/draft-mcquistin- | |||
| augmented-ascii-diagrams-08.txt>. | augmented-ascii-diagrams-08.txt>. | |||
| [66] Thomson, M. and T. Pauly, "Long-term Viability of Protocol | [68] Thomson, M. and T. Pauly, "Long-term Viability of Protocol | |||
| Extension Mechanisms", Work in Progress, Internet-Draft, | Extension Mechanisms", Work in Progress, Internet-Draft, | |||
| draft-iab-use-it-or-lose-it-02, 23 August 2021, | draft-iab-use-it-or-lose-it-02, 23 August 2021, | |||
| <https://www.ietf.org/archive/id/draft-iab-use-it-or-lose- | <https://www.ietf.org/archive/id/draft-iab-use-it-or-lose- | |||
| it-02.txt>. | it-02.txt>. | |||
| [67] Minshall, G., "A Proposed Modification to Nagle's | [69] Minshall, G., "A Proposed Modification to Nagle's | |||
| Algorithm", Work in Progress, Internet-Draft, draft- | Algorithm", Work in Progress, Internet-Draft, draft- | |||
| minshall-nagle-01, June 1999, | minshall-nagle-01, June 1999, | |||
| <https://datatracker.ietf.org/doc/html/draft-minshall- | <https://datatracker.ietf.org/doc/html/draft-minshall- | |||
| nagle-01>. | nagle-01>. | |||
| [68] Dalal, Y. and C. Sunshine, "Connection Management in | [70] Dalal, Y. and C. Sunshine, "Connection Management in | |||
| Transport Protocols", Computer Networks Vol. 2, No. 6, pp. | Transport Protocols", Computer Networks Vol. 2, No. 6, pp. | |||
| 454-473, December 1978. | 454-473, December 1978. | |||
| [69] Faber, T., Touch, J., and W. Yui, "The TIME-WAIT state in | [71] Faber, T., Touch, J., and W. Yui, "The TIME-WAIT state in | |||
| TCP and Its Effect on Busy Servers", Proceedings of IEEE | TCP and Its Effect on Busy Servers", Proceedings of IEEE | |||
| INFOCOM pp. 1573-1583, March 1999. | INFOCOM pp. 1573-1583, March 1999. | |||
| | [72] Postel, J., "Comments on Action Items from the January | |||
| | Meeting", IEN 177, March 1981, | |||
| | <https://www.rfc-editor.org/ien/ien177.txt>. | |||
| | [73] "Segmentation Offloads", Linux Networking Documentation , | |||
| | <https://www.kernel.org/doc/html/latest/networking/ | |||
| | segmentation-offloads.html>. | |||
| Appendix A. Other Implementation Notes | Appendix A. Other Implementation Notes | |||
| This section includes additional notes and references on TCP | This section includes additional notes and references on TCP | |||
| implementation decisions that are currently not a part of the RFC | implementation decisions that are currently not a part of the RFC | |||
| series or included within the TCP standard. These items can be | series or included within the TCP standard. These items can be | |||
| considered by implementers, but there was not yet a consensus to | considered by implementers, but there was not yet a consensus to | |||
| include them in the standard. | include them in the standard. | |||
| A.1. IP Security Compartment and Precedence | A.1. IP Security Compartment and Precedence | |||
| The IPv4 specification [1] includes a precedence value in the (now | The IPv4 specification [1] includes a precedence value in the (now | |||
| obsoleted) Type of Service field (TOS) field. It was modified in | obsoleted) Type of Service field (TOS) field. It was modified in | |||
| [19], and then obsoleted by the definition of Differentiated Services | [21], and then obsoleted by the definition of Differentiated Services | |||
| (DiffServ) [4]. Setting and conveying TOS between the network layer, | (DiffServ) [4]. Setting and conveying TOS between the network layer, | |||
| TCP implementation, and applications is obsolete, and replaced by | TCP implementation, and applications is obsolete, and replaced by | |||
| DiffServ in the current TCP specification. | DiffServ in the current TCP specification. | |||
| RFC 793 requires checking the IP security compartment and precedence | RFC 793 required checking the IP security compartment and precedence | |||
| on incoming TCP segments for consistency within a connection, and | on incoming TCP segments for consistency within a connection, and | |||
| with application requests. Each of these aspects of IP have become | with application requests. Each of these aspects of IP have become | |||
| outdated, without specific updates to RFC 793. The issues with | outdated, without specific updates to RFC 793. The issues with | |||
| precedence were fixed by [23], which is Standards Track, and so this | precedence were fixed by [26], which is Standards Track, and so this | |||
| present TCP specification includes those changes. However, the state | present TCP specification includes those changes. However, the state | |||
| of IP security options that may be used by MLS systems is not as | of IP security options that may be used by MLS systems is not as | |||
| clean. | apparent in the IETF currently. | |||
| Resetting connections when incoming packets do not meet expected | Resetting connections when incoming packets do not meet expected | |||
| security compartment or precedence expectations has been recognized | security compartment or precedence expectations has been recognized | |||
| as a possible attack vector [62], and there has been discussion about | as a possible attack vector [64], and there has been discussion about | |||
| amending the TCP specification to prevent connections from being | amending the TCP specification to prevent connections from being | |||
| aborted due to non-matching IP security compartment and DiffServ | aborted due to non-matching IP security compartment and DiffServ | |||
| codepoint values. | codepoint values. | |||
| A.1.1. Precedence | A.1.1. Precedence | |||
| In DiffServ the former precedence values are treated as Class | In DiffServ the former precedence values are treated as Class | |||
| Selector codepoints, and methods for compatible treatment are | Selector codepoints, and methods for compatible treatment are | |||
| described in the DiffServ architecture. The RFC 793/1122 TCP | described in the DiffServ architecture. The RFC 793/1122 TCP | |||
| specification includes logic intending to have connections use the | specification includes logic intending to have connections use the | |||
| highest precedence requested by either endpoint application, and to | highest precedence requested by either endpoint application, and to | |||
| keep the precedence consistent throughout a connection. This logic | keep the precedence consistent throughout a connection. This logic | |||
| from the obsolete TOS is not applicable for DiffServ, and should not | from the obsolete TOS is not applicable for DiffServ, and should not | |||
| be included in TCP implementations, though changes to DiffServ values | be included in TCP implementations, though changes to DiffServ values | |||
| within a connection are discouraged. For discussion of this, see RFC | within a connection are discouraged. For discussion of this, see RFC | |||
| 7657 (sec 5.1, 5.3, and 6) [49]. | 7657 (sec 5.1, 5.3, and 6) [51]. | |||
| The obsoleted TOS processing rules in TCP assumed bidirectional (or | The obsoleted TOS processing rules in TCP assumed bidirectional (or | |||
| symmetric) precedence values used on a connection, but the DiffServ | symmetric) precedence values used on a connection, but the DiffServ | |||
| architecture is asymmetric. Problems with the old TCP logic in this | architecture is asymmetric. Problems with the old TCP logic in this | |||
| regard were described in [23] and the solution described is to ignore | regard were described in [26] and the solution described is to ignore | |||
| IP precedence in TCP. Since RFC 2873 is a Standards Track document | IP precedence in TCP. Since RFC 2873 is a Standards Track document | |||
| (although not marked as updating RFC 793), current implementations | (although not marked as updating RFC 793), current implementations | |||
| are expected to be robust to these conditions. Note that the | are expected to be robust to these conditions. Note that the | |||
| DiffServ field value used in each direction is a part of the | DiffServ field value used in each direction is a part of the | |||
| interface between TCP and the network layer, and values in use can be | interface between TCP and the network layer, and values in use can be | |||
| indicated both ways between TCP and the application. | indicated both ways between TCP and the application. | |||
| A.1.2. MLS Systems | A.1.2. MLS Systems | |||
| The IP security option (IPSO) and compartment defined in [1] was | The IP security option (IPSO) and compartment defined in [1] was | |||
| refined in RFC 1038 that was later obsoleted by RFC 1108. The | refined in RFC 1038 that was later obsoleted by RFC 1108. The | |||
| Commercial IP Security Option (CIPSO) is defined in FIPS-188, and is | Commercial IP Security Option (CIPSO) is defined in FIPS-188 | |||
| supported by some vendors and operating systems. RFC 1108 is now | (withdrawn by NIST in 2015), and is supported by some vendors and | |||
| Historic, though RFC 791 itself has not been updated to remove the IP | operating systems. RFC 1108 is now Historic, though RFC 791 itself | |||
| security option. For IPv6, a similar option (CALIPSO) has been | has not been updated to remove the IP security option. For IPv6, a | |||
| defined [34]. RFC 793 includes logic that includes the IP security/ | similar option (CALIPSO) has been defined [37]. RFC 793 includes | |||
| compartment information in treatment of TCP segments. References to | logic that includes the IP security/compartment information in | |||
| the IP "security/compartment" in this document may be relevant for | treatment of TCP segments. References to the IP "security/ | |||
| Multi-Level Secure (MLS) system implementers, but can be ignored for | compartment" in this document may be relevant for Multi-Level Secure | |||
| non-MLS implementations, consistent with running code on the | (MLS) system implementers, but can be ignored for non-MLS | |||
| Internet. See Appendix A.1 for further discussion. Note that RFC | implementations, consistent with running code on the Internet. See | |||
| 5570 describes some MLS networking scenarios where IPSO, CIPSO, or | Appendix A.1 for further discussion. Note that RFC 5570 describes | |||
| CALIPSO may be used. In these special cases, TCP implementers should | some MLS networking scenarios where IPSO, CIPSO, or CALIPSO may be | |||
| see section 7.3.1 of RFC 5570, and follow the guidance in that | used. In these special cases, TCP implementers should see section | |||
| document. | 7.3.1 of RFC 5570, and follow the guidance in that document. | |||
| A.2. Sequence Number Validation | A.2. Sequence Number Validation | |||
| There are cases where the TCP sequence number validation rules can | There are cases where the TCP sequence number validation rules can | |||
| prevent ACK fields from being processed. This can result in | prevent ACK fields from being processed. This can result in | |||
| connection issues, as described in [63], which includes descriptions | connection issues, as described in [65], which includes descriptions | |||
| of potential problems in conditions of simultaneous open, self- | of potential problems in conditions of simultaneous open, self- | |||
| connects, simultaneous close, and simultaneous window probes. The | connects, simultaneous close, and simultaneous window probes. The | |||
| document also describes potential changes to the TCP specification to | document also describes potential changes to the TCP specification to | |||
| mitigate the issue by expanding the acceptable sequence numbers. | mitigate the issue by expanding the acceptable sequence numbers. | |||
| In Internet usage of TCP, these conditions are rarely occurring. | In Internet usage of TCP, these conditions are rarely occurring. | |||
| Common operating systems include different alternative mitigations, | Common operating systems include different alternative mitigations, | |||
| and the standard has not been updated yet to codify one of them, but | and the standard has not been updated yet to codify one of them, but | |||
| implementers should consider the problems described in [63]. | implementers should consider the problems described in [65]. | |||
| A.3. Nagle Modification | A.3. Nagle Modification | |||
| In common operating systems, both the Nagle algorithm and delayed | In common operating systems, both the Nagle algorithm and delayed | |||
| acknowledgements are implemented and enabled by default. TCP is used | acknowledgements are implemented and enabled by default. TCP is used | |||
| by many applications that have a request-response style of | by many applications that have a request-response style of | |||
| communication, where the combination of the Nagle algorithm and | communication, where the combination of the Nagle algorithm and | |||
| delayed acknowledgements can result in poor application performance. | delayed acknowledgements can result in poor application performance. | |||
| A modification to the Nagle algorithm is described in [67] that | A modification to the Nagle algorithm is described in [69] that | |||
| improves the situation for these applications. | improves the situation for these applications. | |||
| This modification is implemented in some common operating systems, | This modification is implemented in some common operating systems, | |||
| and does not impact TCP interoperability. Additionally, many | and does not impact TCP interoperability. Additionally, many | |||
| applications simply disable Nagle, since this is generally supported | applications simply disable Nagle, since this is generally supported | |||
| by a socket option. The TCP standard has not been updated to include | by a socket option. The TCP standard has not been updated to include | |||
| this Nagle modification, but implementers may find it beneficial to | this Nagle modification, but implementers may find it beneficial to | |||
| consider. | consider. | |||
| A.4. Low Water Mark Settings | A.4. Low Watermark Settings | |||
| Some operating system kernel TCP implementations include socket | Some operating system kernel TCP implementations include socket | |||
| options that allow specifying the number of bytes in the buffer until | options that allow specifying the number of bytes in the buffer until | |||
| the socket layer will pass sent data to TCP (SO_SNDLOWAT) or to the | the socket layer will pass sent data to TCP (SO_SNDLOWAT) or to the | |||
| application on receiving (SO_RCVLOWAT). | application on receiving (SO_RCVLOWAT). | |||
| In addition, another socket option (TCP_NOTSENT_LOWAT) can be used to | In addition, another socket option (TCP_NOTSENT_LOWAT) can be used to | |||
| control the amount of unsent bytes in the write queue. This can help | control the amount of unsent bytes in the write queue. This can help | |||
| a sending TCP application to avoid creating large amounts of buffered | a sending TCP application to avoid creating large amounts of buffered | |||
| data (and corresponding latency). As an example, this may be useful | data (and corresponding latency). As an example, this may be useful | |||
| skipping to change at page 110, line 10 ¶ | skipping to change at page 111, line 48 ¶ | |||
| Implement sending & receiving MSS option | MUST-14|x| | | | | | Implement sending & receiving MSS option | MUST-14|x| | | | | | |||
| IPv4 Send MSS option unless 536 | SHLD-5 | |x| | | | | IPv4 Send MSS option unless 536 | SHLD-5 | |x| | | | | |||
| IPv6 Send MSS option unless 1220 | SHLD-5 | |x| | | | | IPv6 Send MSS option unless 1220 | SHLD-5 | |x| | | | | |||
| Send MSS option always | MAY-3 | | |x| | | | Send MSS option always | MAY-3 | | |x| | | | |||
| IPv4 Send-MSS default is 536 | MUST-15|x| | | | | | IPv4 Send-MSS default is 536 | MUST-15|x| | | | | | |||
| IPv6 Send-MSS default is 1220 | MUST-15|x| | | | | | IPv6 Send-MSS default is 1220 | MUST-15|x| | | | | | |||
| Calculate effective send seg size | MUST-16|x| | | | | | Calculate effective send seg size | MUST-16|x| | | | | | |||
| MSS accounts for varying MTU | SHLD-6 | |x| | | | | MSS accounts for varying MTU | SHLD-6 | |x| | | | | |||
| MSS not sent on non-SYN segments | MUST-65| | | | |x| | MSS not sent on non-SYN segments | MUST-65| | | | |x| | |||
| MSS value based on MMS_R | MUST-67|x| | | | | | MSS value based on MMS_R | MUST-67|x| | | | | | |||
| | Pad with zero | MUST-69|x| | | | | | |||
| | | | | | | | | | | | | | | | | |||
| TCP Checksums | | | | | | | | TCP Checksums | | | | | | | | |||
| Sender compute checksum | MUST-2 |x| | | | | | Sender compute checksum | MUST-2 |x| | | | | | |||
| Receiver check checksum | MUST-3 |x| | | | | | Receiver check checksum | MUST-3 |x| | | | | | |||
| | | | | | | | | | | | | | | | | |||
| ISN Selection | | | | | | | | ISN Selection | | | | | | | | |||
| Include a clock-driven ISN generator component | MUST-8 |x| | | | | | Include a clock-driven ISN generator component | MUST-8 |x| | | | | | |||
| Secure ISN generator with a PRF component | SHLD-1 | |x| | | | | Secure ISN generator with a PRF component | SHLD-1 | |x| | | | | |||
| PRF computable from outside the host | MUST-9 | | | | |x| | PRF computable from outside the host | MUST-9 | | | | |x| | |||
| | | | | | | | | | | | | | | | | |||
| skipping to change at page 111, line 4 ¶ | skipping to change at page 112, line 43 ¶ | |||
| Retransmit with same IP ident | MAY-4 | | |x| | | | Retransmit with same IP ident | MAY-4 | | |x| | | | |||
| Karn's algorithm | MUST-18|x| | | | | | Karn's algorithm | MUST-18|x| | | | | | |||
| | | | | | | | | | | | | | | | | |||
| Generating ACKs: | | | | | | | | Generating ACKs: | | | | | | | | |||
| Aggregate whenever possible | MUST-58|x| | | | | | Aggregate whenever possible | MUST-58|x| | | | | | |||
| Queue out-of-order segments | SHLD-31| |x| | | | | Queue out-of-order segments | SHLD-31| |x| | | | | |||
| Process all Q'd before send ACK | MUST-59|x| | | | | | Process all Q'd before send ACK | MUST-59|x| | | | | | |||
| Send ACK for out-of-order segment | MAY-13 | | |x| | | | Send ACK for out-of-order segment | MAY-13 | | |x| | | | |||
| Delayed ACKs | SHLD-18| |x| | | | | Delayed ACKs | SHLD-18| |x| | | | | |||
| Delay < 0.5 seconds | MUST-40|x| | | | | | Delay < 0.5 seconds | MUST-40|x| | | | | | |||
| Every 2nd full-sized segment or 2*RMSS ACK'd | SHLD-19|x| | | | | | Every 2nd full-sized segment or 2*RMSS ACK'd | SHLD-19| |x| | | | | |||
| Receiver SWS-Avoidance Algorithm | MUST-39|x| | | | | | Receiver SWS-Avoidance Algorithm | MUST-39|x| | | | | | |||
| | | | | | | | | | | | | | | | | |||
| Sending data | | | | | | | | Sending data | | | | | | | | |||
| Configurable TTL | MUST-49|x| | | | | | Configurable TTL | MUST-49|x| | | | | | |||
| Sender SWS-Avoidance Algorithm | MUST-38|x| | | | | | Sender SWS-Avoidance Algorithm | MUST-38|x| | | | | | |||
| Nagle algorithm | SHLD-7 | |x| | | | | Nagle algorithm | SHLD-7 | |x| | | | | |||
| Application can disable Nagle algorithm | MUST-17|x| | | | | | Application can disable Nagle algorithm | MUST-17|x| | | | | | |||
| | | | | | | | | | | | | | | | | |||
| Connection Failures: | | | | | | | | Connection Failures: | | | | | | | | |||
| Negative advice to IP on R1 retxs | MUST-20|x| | | | | | Negative advice to IP on R1 retxs | MUST-20|x| | | | | | |||
| Close connection on R2 retxs | MUST-20|x| | | | | | Close connection on R2 retxs | MUST-20|x| | | | | | |||
| ALP can set R2 | MUST-21|x| | | | |1 | ALP can set R2 | MUST-21|x| | | | |1 | |||
| Inform ALP of R1<=retxs<R2 | SHLD-9 | |x| | | |1 | Inform ALP of R1<=retxs<R2 | SHLD-9 | |x| | | |1 | |||
| Recommended value for R1 | SHLD-10| |x| | | | | Recommended value for R1 | SHLD-10| |x| | | | | |||
| Recommended value for R2 | SHLD-11| |x| | | | | Recommended value for R2 | SHLD-11| |x| | | | | |||
| Same mechanism for SYNs | MUST-22|x| | | | | | Same mechanism for SYNs | MUST-22|x| | | | | | |||
| R2 at least 3 minutes for SYN | MUST-23|x| | | | | | R2 at least 3 minutes for SYN | MUST-23|x| | | | | | |||
| | | | | | | | | | | | | | | | | |||
| skipping to change at page 111, line 45 ¶ | skipping to change at page 113, line 37 ¶ | |||
| Time Stamp support | MAY-10 | | |x| | | | Time Stamp support | MAY-10 | | |x| | | | |||
| Record Route support | MAY-11 | | |x| | | | Record Route support | MAY-11 | | |x| | | | |||
| Source Route: | | | | | | | | Source Route: | | | | | | | | |||
| ALP can specify | MUST-51|x| | | | |1 | ALP can specify | MUST-51|x| | | | |1 | |||
| Overrides src rt in datagram | MUST-52|x| | | | | | Overrides src rt in datagram | MUST-52|x| | | | | | |||
| Build return route from src rt | MUST-53|x| | | | | | Build return route from src rt | MUST-53|x| | | | | | |||
| Later src route overrides | SHLD-24| |x| | | | | Later src route overrides | SHLD-24| |x| | | | | |||
| | | | | | | | | | | | | | | | | |||
| Receiving ICMP Messages from IP | MUST-54|x| | | | | | Receiving ICMP Messages from IP | MUST-54|x| | | | | | |||
| Dest. Unreach (0,1,5) => inform ALP | SHLD-25| |x| | | | | Dest. Unreach (0,1,5) => inform ALP | SHLD-25| |x| | | | | |||
| Dest. Unreach (0,1,5) => abort conn | MUST-56| | | | |x| | Abort on Dest. Unreach (0,1,5) =>nn | MUST-56| | | | |x| | |||
| Dest. Unreach (2-4) => abort conn | SHLD-26| |x| | | | | Dest. Unreach (2-4) => abort conn | SHLD-26| |x| | | | | |||
| Source Quench => silent discard | MUST-55|x| | | | | | Source Quench => silent discard | MUST-55|x| | | | | | |||
| Time Exceeded => tell ALP, don't abort | MUST-56| | | | |x| | Abort on Time Exceeded => | MUST-56| | | | |x| | |||
| Param Problem => tell ALP, don't abort | MUST-56| | | | |x| | Abort on Param Problem => | MUST-56| | | | |x| | |||
| | | | | | | | | | | | | | | | | |||
| Address Validation | | | | | | | | Address Validation | | | | | | | | |||
| Reject OPEN call to invalid IP address | MUST-46|x| | | | | | Reject OPEN call to invalid IP address | MUST-46|x| | | | | | |||
| Reject SYN from invalid IP address | MUST-63|x| | | | | | Reject SYN from invalid IP address | MUST-63|x| | | | | | |||
| Silently discard SYN to bcast/mcast addr | MUST-57|x| | | | | | Silently discard SYN to bcast/mcast addr | MUST-57|x| | | | | | |||
| | | | | | | | | | | | | | | | | |||
| TCP/ALP Interface Services | | | | | | | | TCP/ALP Interface Services | | | | | | | | |||
| Error Report mechanism | MUST-47|x| | | | | | Error Report mechanism | MUST-47|x| | | | | | |||
| ALP can disable Error Report Routine | SHLD-20| |x| | | | | ALP can disable Error Report Routine | SHLD-20| |x| | | | | |||
| ALP can specify DiffServ field for sending | MUST-48|x| | | | | | ALP can specify DiffServ field for sending | MUST-48|x| | | | | | |||
| End of changes. 324 change blocks. | ||||
| 709 lines changed or deleted | 782 lines changed or added | |||