| draft-ietf-tcpm-accurate-ecn-17.txt | draft-ietf-tcpm-accurate-ecn-18.txt | |||
|---|---|---|---|---|
| TCP Maintenance & Minor Extensions (tcpm) B. Briscoe | TCP Maintenance & Minor Extensions (tcpm) B. Briscoe | |||
| Internet-Draft Independent | Internet-Draft Independent | |||
| Updates: 3168, 3449 (if approved) M. Kuehlewind | Updates: 3168, 3449 (if approved) M. Kuehlewind | |||
| Intended status: Standards Track Ericsson | Intended status: Standards Track Ericsson | |||
| Expires: September 8, 2022 R. Scheffenegger | Expires: September 23, 2022 R. Scheffenegger | |||
| NetApp | NetApp | |||
| March 7, 2022 | March 22, 2022 | |||
| More Accurate ECN Feedback in TCP | More Accurate ECN Feedback in TCP | |||
| draft-ietf-tcpm-accurate-ecn-17 | draft-ietf-tcpm-accurate-ecn-18 | |||
| Abstract | Abstract | |||
| Explicit Congestion Notification (ECN) is a mechanism where network | Explicit Congestion Notification (ECN) is a mechanism where network | |||
| nodes can mark IP packets instead of dropping them to indicate | nodes can mark IP packets instead of dropping them to indicate | |||
| incipient congestion to the end-points. Receivers with an ECN- | incipient congestion to the end-points. Receivers with an ECN- | |||
| capable transport protocol feed back this information to the sender. | capable transport protocol feed back this information to the sender. | |||
| ECN was originally specified for TCP in such a way that only one | ECN was originally specified for TCP in such a way that only one | |||
| feedback signal can be transmitted per Round-Trip Time (RTT). Recent | feedback signal can be transmitted per Round-Trip Time (RTT). Recent | |||
| new TCP mechanisms like Congestion Exposure (ConEx), Data Center TCP | new TCP mechanisms like Congestion Exposure (ConEx), Data Center TCP | |||
| skipping to change at page 2, line 7 | skipping to change at page 2, line 7 | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on September 8, 2022. | This Internet-Draft will expire on September 23, 2022. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2022 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 2, line 45 | skipping to change at page 2, line 45 | |||
| 2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 10 | 2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 10 | |||
| 2.5. Generic (Dumb) Reflector . . . . . . . . . . . . . . . . 11 | 2.5. Generic (Dumb) Reflector . . . . . . . . . . . . . . . . 11 | |||
| 3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 12 | 3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 12 | |||
| 3.1. Negotiating to use AccECN . . . . . . . . . . . . . . . . 12 | 3.1. Negotiating to use AccECN . . . . . . . . . . . . . . . . 12 | |||
| 3.1.1. Negotiation during the TCP handshake . . . . . . . . 12 | 3.1.1. Negotiation during the TCP handshake . . . . . . . . 12 | |||
| 3.1.2. Backward Compatibility . . . . . . . . . . . . . . . 13 | 3.1.2. Backward Compatibility . . . . . . . . . . . . . . . 13 | |||
| 3.1.3. Forward Compatibility . . . . . . . . . . . . . . . . 15 | 3.1.3. Forward Compatibility . . . . . . . . . . . . . . . . 15 | |||
| 3.1.4. Retransmission of the SYN . . . . . . . . . . . . . . 15 | 3.1.4. Retransmission of the SYN . . . . . . . . . . . . . . 15 | |||
| 3.1.5. Implications of AccECN Mode . . . . . . . . . . . . . 16 | 3.1.5. Implications of AccECN Mode . . . . . . . . . . . . . 16 | |||
| 3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 18 | 3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 3.2.1. Initialization of Feedback Counters . . . . . . . . . 18 | 3.2.1. Initialization of Feedback Counters . . . . . . . . . 19 | |||
| 3.2.2. The ACE Field . . . . . . . . . . . . . . . . . . . . 19 | 3.2.2. The ACE Field . . . . . . . . . . . . . . . . . . . . 19 | |||
| 3.2.2.1. ACE Field on the ACK of the SYN/ACK . . . . . . . 20 | 3.2.2.1. ACE Field on the ACK of the SYN/ACK . . . . . . . 20 | |||
| 3.2.2.2. Encoding and Decoding Feedback in the ACE Field . 21 | 3.2.2.2. Encoding and Decoding Feedback in the ACE Field . 21 | |||
| 3.2.2.3. Testing for Mangling of the IP/ECN Field . . . . 23 | 3.2.2.3. Testing for Mangling of the IP/ECN Field . . . . 23 | |||
| 3.2.2.4. Testing for Zeroing of the ACE Field . . . . . . 24 | 3.2.2.4. Testing for Zeroing of the ACE Field . . . . . . 25 | |||
| 3.2.2.5. Safety against Ambiguity of the ACE Field . . . . 25 | 3.2.2.5. Safety against Ambiguity of the ACE Field . . . . 26 | |||
| 3.2.3. The AccECN Option . . . . . . . . . . . . . . . . . . 28 | 3.2.3. The AccECN Option . . . . . . . . . . . . . . . . . . 28 | |||
| 3.2.3.1. Encoding and Decoding Feedback in the AccECN | 3.2.3.1. Encoding and Decoding Feedback in the AccECN | |||
| Option Fields . . . . . . . . . . . . . . . . . . 30 | Option Fields . . . . . . . . . . . . . . . . . . 30 | |||
| 3.2.3.2. Path Traversal of the AccECN Option . . . . . . . 30 | 3.2.3.2. Path Traversal of the AccECN Option . . . . . . . 31 | |||
| 3.2.3.3. Usage of the AccECN TCP Option . . . . . . . . . 34 | 3.2.3.3. Usage of the AccECN TCP Option . . . . . . . . . 35 | |||
| 3.3. AccECN Compliance Requirements for TCP Proxies, Offload | 3.3. AccECN Compliance Requirements for TCP Proxies, Offload | |||
| Engines and other Middleboxes . . . . . . . . . . . . . . 36 | Engines and other Middleboxes . . . . . . . . . . . . . . 37 | |||
| 3.3.1. Requirements for TCP Proxies . . . . . . . . . . . . 36 | 3.3.1. Requirements for TCP Proxies . . . . . . . . . . . . 37 | |||
| 3.3.2. Requirements for Transparent Middleboxes and TCP | 3.3.2. Requirements for Transparent Middleboxes and TCP | |||
| Normalizers . . . . . . . . . . . . . . . . . . . . . 36 | Normalizers . . . . . . . . . . . . . . . . . . . . . 37 | |||
| 3.3.3. Requirements for TCP ACK Filtering . . . . . . . . . 37 | 3.3.3. Requirements for TCP ACK Filtering . . . . . . . . . 38 | |||
| 3.3.4. Requirements for TCP Segmentation Offload . . . . . . 38 | 3.3.4. Requirements for TCP Segmentation Offload . . . . . . 39 | |||
| 4. Updates to RFC 3168 . . . . . . . . . . . . . . . . . . . . . 39 | 4. Updates to RFC 3168 . . . . . . . . . . . . . . . . . . . . . 40 | |||
| 5. Interaction with TCP Variants . . . . . . . . . . . . . . . . 40 | 5. Interaction with TCP Variants . . . . . . . . . . . . . . . . 41 | |||
| 5.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 40 | 5.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 41 | |||
| 5.2. Compatibility with TCP Experiments and Common TCP Options 41 | 5.2. Compatibility with TCP Experiments and Common TCP Options 42 | |||
| 5.3. Compatibility with Feedback Integrity Mechanisms . . . . 41 | 5.3. Compatibility with Feedback Integrity Mechanisms . . . . 42 | |||
| 6. Protocol Properties . . . . . . . . . . . . . . . . . . . . . 42 | 6. Protocol Properties . . . . . . . . . . . . . . . . . . . . . 43 | |||
| 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 | 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45 | |||
| 8. Security Considerations . . . . . . . . . . . . . . . . . . . 46 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 47 | |||
| 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 47 | 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 48 | |||
| 10. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 47 | 10. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 48 | |||
| 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 47 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 48 | |||
| 11.1. Normative References . . . . . . . . . . . . . . . . . . 47 | 11.1. Normative References . . . . . . . . . . . . . . . . . . 48 | |||
| 11.2. Informative References . . . . . . . . . . . . . . . . . 48 | 11.2. Informative References . . . . . . . . . . . . . . . . . 49 | |||
| Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 51 | Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 52 | |||
| A.1. Example Algorithm to Encode/Decode the AccECN Option . . 51 | A.1. Example Algorithm to Encode/Decode the AccECN Option . . 52 | |||
| A.2. Example Algorithm for Safety Against Long Sequences of | A.2. Example Algorithm for Safety Against Long Sequences of | |||
| ACK Loss . . . . . . . . . . . . . . . . . . . . . . . . 52 | ACK Loss . . . . . . . . . . . . . . . . . . . . . . . . 53 | |||
| A.2.1. Safety Algorithm without the AccECN Option . . . . . 52 | A.2.1. Safety Algorithm without the AccECN Option . . . . . 53 | |||
| A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 54 | A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 55 | |||
| A.3. Example Algorithm to Estimate Marked Bytes from Marked | A.3. Example Algorithm to Estimate Marked Bytes from Marked | |||
| Packets . . . . . . . . . . . . . . . . . . . . . . . . . 56 | Packets . . . . . . . . . . . . . . . . . . . . . . . . . 57 | |||
| A.4. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 57 | A.4. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 58 | |||
| Appendix B. Rationale for Usage of TCP Header Flags . . . . . . 57 | Appendix B. Rationale for Usage of TCP Header Flags . . . . . . 58 | |||
| B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake . . . 57 | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake . . . 58 | |||
| B.2. Four Codepoints in the SYN/ACK . . . . . . . . . . . . . 58 | B.2. Four Codepoints in the SYN/ACK . . . . . . . . . . . . . 59 | |||
| B.3. Space for Future Evolution . . . . . . . . . . . . . . . 59 | B.3. Space for Future Evolution . . . . . . . . . . . . . . . 60 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 60 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 61 | |||
| 1. Introduction | 1. Introduction | |||
| Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where | Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where | |||
| network nodes can mark IP packets instead of dropping them to | network nodes can mark IP packets instead of dropping them to | |||
| indicate incipient congestion to the end-points. Receivers with an | indicate incipient congestion to the end-points. Receivers with an | |||
| ECN-capable transport protocol feed back this information to the | ECN-capable transport protocol feed back this information to the | |||
| sender. In RFC 3168, ECN was specified for TCP in such a way that | sender. In RFC 3168, ECN was specified for TCP in such a way that | |||
| only one feedback signal could be transmitted per Round-Trip Time | only one feedback signal could be transmitted per Round-Trip Time | |||
| (RTT). Recently, proposed mechanisms like Congestion Exposure (ConEx | (RTT). Recently, proposed mechanisms like Congestion Exposure (ConEx | |||
| skipping to change at page 5, line 13 | skipping to change at page 5, line 13 | |||
| [RFC5681] to respond to the existence of at least one congestion | [RFC5681] to respond to the existence of at least one congestion | |||
| notification within a round trip. Or, unlike Reno, AccECN can be | notification within a round trip. Or, unlike Reno, AccECN can be | |||
| used to respond to the extent of congestion notification over a round | used to respond to the extent of congestion notification over a round | |||
| trip, as for example DCTCP does in controlled environments [RFC8257]. | trip, as for example DCTCP does in controlled environments [RFC8257]. | |||
| For congestion response, this specification refers to RFC 3168, or | For congestion response, this specification refers to RFC 3168, or | |||
| ECN experiments such as those referred to in [RFC8311], namely: a | ECN experiments such as those referred to in [RFC8311], namely: a | |||
| TCP-based Low Latency Low Loss Scalable (L4S) congestion control | TCP-based Low Latency Low Loss Scalable (L4S) congestion control | |||
| [I-D.ietf-tsvwg-l4s-arch]; or Alternative Backoff with ECN (ABE) | [I-D.ietf-tsvwg-l4s-arch]; or Alternative Backoff with ECN (ABE) | |||
| [RFC8511]. | [RFC8511]. | |||
| It is recommended that the AccECN protocol is implemented alongside | It is RECOMMENDED that the AccECN protocol is implemented alongside | |||
| SACK [RFC2018] and the experimental ECN++ protocol | SACK [RFC2018] and the experimental ECN++ protocol | |||
| [I-D.ietf-tcpm-generalized-ecn], which allows the ECN capability to | [I-D.ietf-tcpm-generalized-ecn], which allows the ECN capability to | |||
| be used on TCP control packets. Therefore, this specification does | be used on TCP control packets. Therefore, this specification does | |||
| not discuss implementing AccECN alongside [RFC5562], which was an | not discuss implementing AccECN alongside [RFC5562], which was an | |||
| earlier experimental protocol with narrower scope than ECN++. | earlier experimental protocol with narrower scope than ECN++. | |||
| 1.1. Document Roadmap | 1.1. Document Roadmap | |||
| The following introductory section outlines the goals of AccECN | The following introductory section outlines the goals of AccECN | |||
| (Section 1.2). Then terminology is defined (Section 1.3) and a recap | (Section 1.2). Then terminology is defined (Section 1.3) and a recap | |||
| skipping to change at page 9, line 50 | skipping to change at page 9, line 50 | |||
| packet-based CE counter using the ECN bits in the TCP header, now | packet-based CE counter using the ECN bits in the TCP header, now | |||
| renamed the Accurate ECN (ACE) field (see Figure 3 later). The 24 | renamed the Accurate ECN (ACE) field (see Figure 3 later). The 24 | |||
| LSBs of each byte counter are carried in the AccECN Option. | LSBs of each byte counter are carried in the AccECN Option. | |||
| 2.3. Delayed ACKs and Resilience Against ACK Loss | 2.3. Delayed ACKs and Resilience Against ACK Loss | |||
| With both the ACE and the AccECN Option mechanisms, the Data Receiver | With both the ACE and the AccECN Option mechanisms, the Data Receiver | |||
| continually repeats the current LSBs of each of its respective | continually repeats the current LSBs of each of its respective | |||
| counters. There is no need to acknowledge these continually repeated | counters. There is no need to acknowledge these continually repeated | |||
| counters, so the congestion window reduced (CWR) mechanism is no | counters, so the congestion window reduced (CWR) mechanism is no | |||
| longer used. Even if some ACKs are lost, the Data Sender should be | longer used. Even if some ACKs are lost, the Data Sender ought to be | |||
| able to infer how much to increment its own counters, even if the | able to infer how much to increment its own counters, even if the | |||
| protocol field has wrapped. | protocol field has wrapped. | |||
| The 3-bit ACE field can wrap fairly frequently. Therefore, even if | The 3-bit ACE field can wrap fairly frequently. Therefore, even if | |||
| it appears to have incremented by one (say), the field might have | it appears to have incremented by one (say), the field might have | |||
| actually cycled completely then incremented by one. The Data | actually cycled completely then incremented by one. The Data | |||
| Receiver is not allowed to delay sending an ACK to such an extent | Receiver is not allowed to delay sending an ACK to such an extent | |||
| that the ACE field would cycle. However cycling is still a | that the ACE field would cycle. However cycling is still a | |||
| possibility at the Data Sender because a whole sequence of ACKs | possibility at the Data Sender because a whole sequence of ACKs | |||
| carrying intervening values of the field might all be lost or delayed | carrying intervening values of the field might all be lost or delayed | |||
| skipping to change at page 10, line 51 | skipping to change at page 10, line 51 | |||
| The CE packet counter in the ACE field and the CE byte counter in the | The CE packet counter in the ACE field and the CE byte counter in the | |||
| AccECN Option both provide feedback on received CE-marks. The CE | AccECN Option both provide feedback on received CE-marks. The CE | |||
| packet counter includes control packets that do not have payload | packet counter includes control packets that do not have payload | |||
| data, while the CE byte counter solely includes marked payload bytes. | data, while the CE byte counter solely includes marked payload bytes. | |||
| If both are present, the byte counter in the option will provide the | If both are present, the byte counter in the option will provide the | |||
| more accurate information needed for modern congestion control and | more accurate information needed for modern congestion control and | |||
| policing schemes, such as L4S, DCTCP or ConEx. If the option is | policing schemes, such as L4S, DCTCP or ConEx. If the option is | |||
| stripped, a simple algorithm to estimate the number of marked bytes | stripped, a simple algorithm to estimate the number of marked bytes | |||
| from the ACE field is given in Appendix A.3. | from the ACE field is given in Appendix A.3. | |||
| Feedback in bytes is recommended in order to protect against the | Feedback in bytes is provided in order to protect against the | |||
| receiver using attacks similar to 'ACK-Division' to artificially | receiver using attacks similar to 'ACK-Division' to artificially | |||
| inflate the congestion window, which is why [RFC5681] now recommends | inflate the congestion window, which is why [RFC5681] now recommends | |||
| that TCP counts acknowledged bytes not packets. | that TCP counts acknowledged bytes not packets. | |||
| 2.5. Generic (Dumb) Reflector | 2.5. Generic (Dumb) Reflector | |||
| The ACE field provides feedback about CE markings in the IP-ECN field | The ACE field provides feedback about CE markings in the IP-ECN field | |||
| of both data and control packets. According to [RFC3168] the Data | of both data and control packets. According to [RFC3168] the Data | |||
| Sender is meant to set the IP-ECN field of control packets to Not- | Sender is meant to set the IP-ECN field of control packets to Not- | |||
| ECT. However, mechanisms in certain private networks (e.g. data | ECT. However, mechanisms in certain private networks (e.g. data | |||
| skipping to change at page 11, line 38 | skipping to change at page 11, line 38 | |||
| supports future scenarios in which SYNs might be ECN-enabled (without | supports future scenarios in which SYNs might be ECN-enabled (without | |||
| prejudging whether they ought to be). For instance, [RFC8311] | prejudging whether they ought to be). For instance, [RFC8311] | |||
| updates this aspect of RFC 3168 to allow experimentation with ECN- | updates this aspect of RFC 3168 to allow experimentation with ECN- | |||
| capable TCP control packets. | capable TCP control packets. | |||
| Even if the TCP client (or server) has set the SYN (or SYN/ACK) to | Even if the TCP client (or server) has set the SYN (or SYN/ACK) to | |||
| not-ECT in compliance with RFC 3168, feedback on the state of the IP- | not-ECT in compliance with RFC 3168, feedback on the state of the IP- | |||
| ECN field when it arrives at the receiver could still be useful, | ECN field when it arrives at the receiver could still be useful, | |||
| because middleboxes have been known to overwrite the IP-ECN field as | because middleboxes have been known to overwrite the IP-ECN field as | |||
| if it is still part of the old Type of Service (ToS) field | if it is still part of the old Type of Service (ToS) field | |||
| [Mandalari18]. If a TCP client has set the SYN to Not-ECT, but | [Mandalari18]. For example, if a TCP client has set the SYN to Not- | |||
| receives feedback that the IP-ECN field on the SYN arrived with a | ECT, but receives feedback that the IP-ECN field on the SYN arrived | |||
| different codepoint, it can detect such middlebox interference and | with a different codepoint, it can detect such middlebox | |||
| send Not-ECT for the rest of the connection. Previously, if a TCP | interference. Previously, neither end knew what IP-ECN field the | |||
| server received ECT or CE on a SYN, it could not know whether it was | other had sent. So, if a TCP server received ECT or CE on a SYN, it | |||
| invalid (or valid) because only the TCP client knew whether it | could not know whether it was invalid (or valid) because only the TCP | |||
| originally marked the SYN as Not-ECT (or ECT). Therefore, prior to | client knew whether it originally marked the SYN as Not-ECT (or ECT). | |||
| AccECN, the server's only safe course of action was to disable ECN | Therefore, prior to AccECN, the server's only safe course of action | |||
| for the connection. Instead, the AccECN protocol allows the server | in this example was to disable ECN for the connection. Instead, the | |||
| to feed back the received ECN field to the client, which then has all | AccECN protocol allows the server to feed back the received ECN field | |||
| the information to decide whether the connection has to fall-back | to the client, which then has all the information to decide whether | |||
| from supporting ECN (or not). | the connection has to fall-back from supporting ECN (or not). | |||
| 3. AccECN Protocol Specification | 3. AccECN Protocol Specification | |||
| 3.1. Negotiating to use AccECN | 3.1. Negotiating to use AccECN | |||
| 3.1.1. Negotiation during the TCP handshake | 3.1.1. Negotiation during the TCP handshake | |||
| Given the ECN Nonce [RFC3540] has been reclassified as historic | Given the ECN Nonce [RFC3540] has been reclassified as historic | |||
| [RFC8311], the present specification re-allocates the TCP flag at bit | [RFC8311], the present specification re-allocates the TCP flag at bit | |||
| 7 of the TCP header, which was previously called NS (Nonce Sum), as | 7 of the TCP header, which was previously called NS (Nonce Sum), as | |||
| skipping to change at page 15, line 18 | skipping to change at page 15, line 18 | |||
| Simultaneous Open: An originating AccECN Host (A), having sent a SYN | Simultaneous Open: An originating AccECN Host (A), having sent a SYN | |||
| with AE=1, CWR=1 and ECE=1, might receive another SYN from host B. | with AE=1, CWR=1 and ECE=1, might receive another SYN from host B. | |||
| Host A MUST then enter the same feedback mode as it would have | Host A MUST then enter the same feedback mode as it would have | |||
| entered had it been a responding host and received the same SYN. | entered had it been a responding host and received the same SYN. | |||
| Then host A MUST send the same SYN/ACK as it would have sent had | Then host A MUST send the same SYN/ACK as it would have sent had | |||
| it been a responding host. | it been a responding host. | |||
| In-window SYN during TIME-WAIT: Many TCP implementations create a | In-window SYN during TIME-WAIT: Many TCP implementations create a | |||
| new TCP connection if they receive an in-window SYN packet during | new TCP connection if they receive an in-window SYN packet during | |||
| TIME-WAIT state. When a TCP host enters TIME-WAIT or CLOSED | TIME-WAIT state. When a TCP host enters TIME-WAIT or CLOSED | |||
| state, it should ignore any previous state about the negotiation | state, it ought to ignore any previous state about the negotiation | |||
| of AccECN for that connection and renegotiate the feedback mode | of AccECN for that connection and renegotiate the feedback mode | |||
| according to Table 2. | according to Table 2. | |||
| 3.1.3. Forward Compatibility | 3.1.3. Forward Compatibility | |||
| If a TCP server that implements AccECN receives a SYN with the three | If a TCP server that implements AccECN receives a SYN with the three | |||
| TCP header flags (AE, CWR and ECE) set to any combination other than | TCP header flags (AE, CWR and ECE) set to any combination other than | |||
| 000, 011 or 111, it MUST negotiate the use of AccECN as if they had | 000, 011 or 111, it MUST negotiate the use of AccECN as if they had | |||
| been set to 111. This ensures that future uses of the other | been set to 111. This ensures that future uses of the other | |||
| combinations on a SYN can rely on consistent behaviour from the | combinations on a SYN can rely on consistent behaviour from the | |||
| skipping to change at page 16, line 13 | skipping to change at page 16, line 13 | |||
| e.g. congestion, wireless interference, etc. | e.g. congestion, wireless interference, etc. | |||
| Implementers MAY use other fall-back strategies if they are found to | Implementers MAY use other fall-back strategies if they are found to | |||
| be more effective (e.g. attempting to negotiate AccECN on the SYN | be more effective (e.g. attempting to negotiate AccECN on the SYN | |||
| only once or more than twice (most appropriate during high levels of | only once or more than twice (most appropriate during high levels of | |||
| congestion)). However, other fall-back strategies will need to follow | congestion)). However, other fall-back strategies will need to follow | |||
| all the rules in Section 3.1.5, which concern behaviour when SYNs or | all the rules in Section 3.1.5, which concern behaviour when SYNs or | |||
| SYN/ACKs negotiating different types of feedback have been sent | SYN/ACKs negotiating different types of feedback have been sent | |||
| within the same connection. | within the same connection. | |||
| Further it may make sense to also remove any other new or | Further it might make sense to also remove any other new or | |||
| experimental fields or options on the SYN in case a middlebox might | experimental fields or options on the SYN in case a middlebox might | |||
| be blocking them, although the required behaviour will depend on the | be blocking them, although the required behaviour will depend on the | |||
| specification of the other option(s) and any attempt to co-ordinate | specification of the other option(s) and any attempt to co-ordinate | |||
| fall-back between different modules of the stack. | fall-back between different modules of the stack. | |||
| Whichever fall-back strategy is used, the TCP initiator SHOULD cache | Whichever fall-back strategy is used, the TCP initiator SHOULD cache | |||
| failed connection attempts. If it does, it SHOULD NOT give up | failed connection attempts. If it does, it SHOULD NOT give up | |||
| attempting to negotiate AccECN on the SYN of subsequent connection | attempting to negotiate AccECN on the SYN of subsequent connection | |||
| attempts until it is clear that the blockage is persistently and | attempts until it is clear that the blockage is persistently and | |||
| specifically due to AccECN. The cache should be arranged to expire | specifically due to AccECN. The cache needs to be arranged to expire | |||
| so that the initiator will infrequently attempt to check whether the | so that the initiator will infrequently attempt to check whether the | |||
| problem has been resolved. | problem has been resolved. | |||
| The fall-back procedure if the TCP server receives no ACK to | The fall-back procedure if the TCP server receives no ACK to | |||
| acknowledge a SYN/ACK that tried to negotiate AccECN is specified in | acknowledge a SYN/ACK that tried to negotiate AccECN is specified in | |||
| Section 3.2.3.2. | Section 3.2.3.2. | |||
| 3.1.5. Implications of AccECN Mode | 3.1.5. Implications of AccECN Mode | |||
| Section 3.1.1 describes the only ways that a host can enter AccECN | Section 3.1.1 describes the only ways that a host can enter AccECN | |||
| skipping to change at page 17, line 29 | skipping to change at page 17, line 29 | |||
| rules, the two peers could end up using different feedback modes | rules, the two peers could end up using different feedback modes | |||
| without knowing it. | without knowing it. | |||
| o Congestion response: | o Congestion response: | |||
| * It is still obliged to respond appropriately to AccECN feedback | * It is still obliged to respond appropriately to AccECN feedback | |||
| that indicates there were ECN marks on packets it had | that indicates there were ECN marks on packets it had | |||
| previously sent, as defined in Section 6.1 of [RFC3168] and | previously sent, as defined in Section 6.1 of [RFC3168] and | |||
| updated by Sections 2.1 and 4.1 of [RFC8311]. | updated by Sections 2.1 and 4.1 of [RFC8311]. | |||
| In general, it is obliged to respond to congestion feedback | ||||
| even when it is solely sending non-ECN-capable packets (for | ||||
| rationale, some examples and some exceptions see | ||||
| Section 3.2.2.3, Section 3.2.2.4). | ||||
| * The commitment to respond appropriately to incoming indications | * The commitment to respond appropriately to incoming indications | |||
| of congestion remains even if it sends a SYN packet with | of congestion remains even if it sends a SYN packet with | |||
| AE=CWR=ECE=0, in a later transmission within the same TCP | AE=CWR=ECE=0, in a later transmission within the same TCP | |||
| connection. | connection. | |||
| * Unlike an RFC 3168 data sender, it MUST NOT set CWR to indicate | * Unlike an RFC 3168 data sender, it MUST NOT set CWR to indicate | |||
| it has received and responded to indications of congestion (for | it has received and responded to indications of congestion (for | |||
| the avoidance of doubt, this does not preclude it from setting | the avoidance of doubt, this does not preclude it from setting | |||
| the bits of the ACE counter field, which includes an overloaded | the bits of the ACE counter field, which includes an overloaded | |||
| use of the same bit). | use of the same bit). | |||
| skipping to change at page 23, line 37 | skipping to change at page 23, line 40 | |||
| of the SYN/ACK) that is delayed for longer than the server's | of the SYN/ACK) that is delayed for longer than the server's | |||
| retransmission timeout; or packet duplication by the network. And | retransmission timeout; or packet duplication by the network. And | |||
| the impact of any error in the feedback on such ACKs will only be | the impact of any error in the feedback on such ACKs will only be | |||
| temporary. | temporary. | |||
| 3.2.2.3. Testing for Mangling of the IP/ECN Field | 3.2.2.3. Testing for Mangling of the IP/ECN Field | |||
| The value of the ACE field on the SYN/ACK indicates the value of the | The value of the ACE field on the SYN/ACK indicates the value of the | |||
| IP/ECN field when the SYN arrived at the server. The client can | IP/ECN field when the SYN arrived at the server. The client can | |||
| compare this with how it originally set the IP/ECN field on the SYN. | compare this with how it originally set the IP/ECN field on the SYN. | |||
| If this comparison implies an unsafe transition (see below) of the | If this comparison implies an invalid transition (defined below) of | |||
| IP/ECN field, for the remainder of the connection the client MUST NOT | the IP/ECN field, for the remainder of the half-connection the client | |||
| send ECN-capable packets, but it MUST continue to feed back any ECN | is advised to send non-ECN-capable packets, but it still ought to | |||
| markings on arriving packets. | respond to any feedback of CE markings (explained below). However, | |||
| the client MUST remain in the AccECN feedback mode and it MUST | ||||
| continue to feed back any ECN markings on arriving packets (in its | ||||
| role as Data Receiver). | ||||
| The value of the ACE field on the last ACK of the 3WHS indicates the | The value of the ACE field on the last ACK of the 3WHS indicates the | |||
| value of the IP/ECN field when the SYN/ACK arrived at the client. | value of the IP/ECN field when the SYN/ACK arrived at the client. | |||
| The server can compare this with how it originally set the IP/ECN | The server can compare this with how it originally set the IP/ECN | |||
| field on the SYN/ACK. If this comparison implies an unsafe | field on the SYN/ACK. If this comparison implies an invalid | |||
| transition of the IP/ECN field, for the remainder of the connection | transition of the IP/ECN field, for the remainder of the half- | |||
| the server MUST NOT send ECN-capable packets, but it MUST continue to | connection the server is advised to send non-ECN-capable packets, but | |||
| feed back any ECN markings on arriving packets. | it still ought to respond to any feedback of CE markings (explained | |||
| below). However, the server MUST remain in the AccECN feedback mode | ||||
| and it MUST continue to feed back any ECN markings on arriving | ||||
| packets (in its role as Data Receiver). | ||||
| If a Data Sender in AccECN mode starts sending non-ECN-capable | ||||
| packets because it has detected mangling, it is still advised to | ||||
| respond to CE feedback. Reason: any CE-marking arriving at the Data | ||||
| Receiver could be due to something early in the path mangling the | ||||
| non-ECN-capable IP/ECN field into an ECN-capable codepoint and then, | ||||
| later in the path, a network bottleneck might be applying CE-markings | ||||
| to indicate genuine congestion. This argument applies whether the | ||||
| handshake packet originally sent by the client or server was non-ECN- | ||||
| capable or ECN-capable because, in either case, an unsafe transition | ||||
| could imply that future non-ECN-capable packets might get mangled. | ||||
| The above advice on switching to sending non-ECN-capable packets but | ||||
| still responding to CE-markings unless they become continuous is not | ||||
| stated normatively (in capitals), because the best strategy might | ||||
| depend on experience of the most likely types of mangling, which can | ||||
| only be known at the time of deployment. | ||||
| The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | |||
| count of CE marks is still eventually delivered reliably). If this | count of CE marks is still eventually delivered reliably). If this | |||
| ACK does not arrive, the server can continue to send ECN-capable | ACK does not arrive, the server is advised to continue to send ECN- | |||
| packets without having tested for mangling of the IP/ECN field on the | capable packets without having tested for mangling of the IP/ECN | |||
| SYN/ACK. | field on the SYN/ACK. | |||
| Invalid transitions of the IP/ECN field are defined in [RFC3168] and | Invalid transitions of the IP/ECN field are defined in section 18 of | |||
| repeated here for convenience: | [RFC3168] and repeated here for convenience: | |||
| o the not-ECT codepoint changes; | o the not-ECT codepoint changes; | |||
| o either ECT codepoint transitions to not-ECT; | o either ECT codepoint transitions to not-ECT; | |||
| o the CE codepoint changes. | o the CE codepoint changes. | |||
| RFC 3168 says that a router that changes ECT to not-ECT is invalid | RFC 3168 says that a router that changes ECT to not-ECT is invalid | |||
| but safe. However, from a host's viewpoint, this transition is | but safe. However, from a host's viewpoint, this transition is | |||
| unsafe because it could be the result of two transitions at different | unsafe because it could be the result of two transitions at different | |||
| routers on the path: ECT to CE (safe) then CE to not-ECT (unsafe). | routers on the path: ECT to CE (safe) then CE to not-ECT (unsafe). | |||
| This scenario could well happen where an ECN-enabled home router | This scenario could well happen where an ECN-enabled home router | |||
| congests its upstream mobile broadband bottleneck link, then the | congests its upstream mobile broadband bottleneck link, then the | |||
| ingress to the mobile network clears the ECN field [Mandalari18]. | ingress to the mobile network clears the ECN field [Mandalari18]. | |||
| Once a Data Sender has entered AccECN mode it SHOULD check whether it | Once a Data Sender has entered AccECN mode it is advised to check | |||
| is receiving continuous CE marking. Specifying exactly how to do | whether it is receiving continuous CE marking. Specifying exactly | |||
| this is beyond the scope of the present specification, but the sender | how to do this is beyond the scope of the present specification, but | |||
| might check whether the feedback for every packet it sends for the | the sender might check whether the feedback for every packet it sends | |||
| first three or four rounds indicates CE-marking. If continuous CE- | for the first three or four rounds indicates CE-marking. If | |||
| marking is detected, for the remainder of the connection the Data | continuous CE-marking is detected, for the remainder of the half- | |||
| Sender SHOULD NOT send ECN-capable packets and consequently it SHOULD | connection, the Data Sender ought to send non-ECN-capable packets and | |||
| NOT respond to any ECN feedback. The phrase 'MUST NOT' has been | it is advised not to respond to any feedback of CE markings. The | |||
| avoided to allow the sender to test whether it can resume sending | Data Sender might occasionally test whether it can resume sending | |||
| ECN-capable packets. Throughout, it MUST remain in the AccECN | ECN-capable packets. As always, once a host has entered AccECN mode, | |||
| feedback mode and it MUST continue to feed back any ECN markings on | it MUST remain in the same feedback mode and it MUST continue to feed | |||
| arriving packets (in its role as Data Receiver). | back any ECN markings on arriving packets. | |||
| All the fall-back behaviours in this section are necessary in case | All the fall-back behaviours in this section are necessary in case | |||
| mangling of the IP/ECN field is asymmetric, which is currently common | mangling of the IP/ECN field is asymmetric, which is currently common | |||
| over some mobile networks [Mandalari18]. Then one end might see no | over some mobile networks [Mandalari18]. Then one end might see no | |||
| unsafe transition and continue sending ECN-capable packets, while the | unsafe transition and continue sending ECN-capable packets, while the | |||
| other end sees an unsafe transition and stops sending ECN-capable | other end sees an unsafe transition and stops sending ECN-capable | |||
| packets. | packets. | |||
| 3.2.2.4. Testing for Zeroing of the ACE Field | 3.2.2.4. Testing for Zeroing of the ACE Field | |||
| Section 3.2.2 required the Data Receiver to initialize the r.cep | Section 3.2.2 required the Data Receiver to initialize the r.cep | |||
| counter to a non-zero value. Therefore, in either direction the | counter to a non-zero value. Therefore, in either direction the | |||
| initial value of the ACE counter ought to be non-zero. | initial value of the ACE counter ought to be non-zero. | |||
| If AccECN has been successfully negotiated, the Data Sender SHOULD | If AccECN has been successfully negotiated, the Data Sender SHOULD | |||
| check the value of the ACE counter in the first packet (with or | check the value of the ACE counter in the first packet (with or | |||
| without data) that arrives with SYN=0. If the value of this ACE | without data) that arrives with SYN=0. If the value of this ACE | |||
| field is zero (0b000), the Data Sender disables sending ECN-capable | field is zero (0b000), for the remainder of the half-connection the | |||
| packets for the remainder of the half-connection by setting the IP/ | Data Sender ought to send non-ECN-capable packets and it is advised | |||
| ECN field in all subsequent packets to Not-ECT. | not to respond to any feedback of CE markings. Reason: the symptoms | |||
| imply either potential mangling of the ECN fields in both the IP and | ||||
| TCP headers, or a broken remote TCP implementation. This advice is | ||||
| not stated normatively (in capitals), because the best strategy might | ||||
| depend on experience of the most likely types of mangling, which can | ||||
| only be known at the time of deployment. | ||||
| Usually, the server checks the ACK of the SYN/ACK from the client, | If reordering occurs, "the first packet ... that arrives" will not | |||
| while the client checks the first data segment from the server. | necessarily be the same as the first packet in sequence order. The | |||
| However, if reordering occurs, "the first packet ... that arrives" | test has been specified loosely like this to simplify implementation, | |||
| will not necessarily be the same as the first packet in sequence | and because it would not have been any more precise to have specified | |||
| order. The test has been specified loosely like this to simplify | the first packet in sequence order, which would not necessarily be | |||
| implementation, and because it would not have been any more precise | the first ACE counter that the Data Receiver fed back anyway, given | |||
| to have specified the first packet in sequence order, which would not | it might have been a retransmission. Usually, the server checks the | |||
| necessarily be the first ACE counter that the Data Receiver fed back | ACK of the SYN/ACK from the client, while the client checks the first | |||
| anyway, given it might have been a retransmission. | data segment from the server. | |||
| The possibility of re-ordering means that there is a small chance | The possibility of re-ordering means that there is a small chance | |||
| that the ACE field on the first packet to arrive is genuinely zero | that the ACE field on the first packet to arrive is genuinely zero | |||
| (without middlebox interference). This would cause a host to | (without middlebox interference). This would cause a host to | |||
| unnecessarily disable ECN for a half connection. Therefore, in | unnecessarily disable ECN for a half connection. Therefore, in | |||
| environments where there is no evidence of the ACE field being | environments where there is no evidence of the ACE field being | |||
| zeroed, implementations can skip this test. | zeroed, implementations can skip this test. | |||
| Note that the Data Sender MUST NOT test whether the arriving counter | Note that the Data Sender MUST NOT test whether the arriving counter | |||
| in the initial ACE field has been initialized to a specific valid | in the initial ACE field has been initialized to a specific valid | |||
| skipping to change at page 29, line 38 ¶ | skipping to change at page 30, line 31 ¶ | |||
| be able to read in AccECN Options of any of the above lengths. For | be able to read in AccECN Options of any of the above lengths. For | |||
| forward compatibility, if the AccECN Option is of any other length, | forward compatibility, if the AccECN Option is of any other length, | |||
| implementations MUST use those whole 3-octet fields that fit within | implementations MUST use those whole 3-octet fields that fit within | |||
| the length and ignore the remainder of the option, treating it as | the length and ignore the remainder of the option, treating it as | |||
| padding. | padding. | |||
| The AccECN Option has to be optional to implement, because both | The AccECN Option has to be optional to implement, because both | |||
| sender and receiver have to be able to cope without the option anyway | sender and receiver have to be able to cope without the option anyway | |||
| - in cases where it does not traverse a network path. It is | - in cases where it does not traverse a network path. It is | |||
| RECOMMENDED to implement both sending and receiving of the AccECN | RECOMMENDED to implement both sending and receiving of the AccECN | |||
| Option. If sending of the AccECN Option is implemented, the fall- | Option. Support for the AccECN Option is particularly valuable over | |||
| backs described in this document will need to be implemented as well | paths that introduce a high degree of ACK filtering, where the 3-bit | |||
| (unless solely for a controlled environment where path traversal is | ACE counter alone might sometimes be insufficient, when it is | |||
| not considered a problem). Even if a developer does not implement | ambiguous whether it has wrapped. If sending of the AccECN Option is | |||
| sending of the AccECN Option, it is RECOMMENDED that they still | implemented, the fall-backs described in this document will need to | |||
| implement logic to receive and understand any AccECN Options sent by | be implemented as well (unless solely for a controlled environment | |||
| remote peers. | where path traversal is not considered a problem). Even if a | |||
| developer does not implement sending of the AccECN Option, it is | ||||
| RECOMMENDED that they still implement logic to receive and understand | ||||
| any AccECN Options sent by remote peers. | ||||
| If a Data Receiver intends to send the AccECN Option at any time | If a Data Receiver intends to send the AccECN Option at any time | |||
| during the rest of the connection it is strongly recommended to also | during the rest of the connection it is strongly RECOMMENDED to also | |||
| test path traversal of the AccECN Option as specified in | test path traversal of the AccECN Option as specified in | |||
| Section 3.2.3.2. | Section 3.2.3.2. | |||
| 3.2.3.1. Encoding and Decoding Feedback in the AccECN Option Fields | 3.2.3.1. Encoding and Decoding Feedback in the AccECN Option Fields | |||
| Whenever the Data Receiver includes any of the counter fields (ECEB, | Whenever the Data Receiver includes any of the counter fields (ECEB, | |||
| EE0B, EE1B) in an AccECN Option, it MUST encode the 24 least | EE0B, EE1B) in an AccECN Option, it MUST encode the 24 least | |||
| significant bits of the current value of the associated counter into | significant bits of the current value of the associated counter into | |||
| the field (respectively r.ceb, r.e0b, r.e1b). | the field (respectively r.ceb, r.e0b, r.e1b). | |||
| skipping to change at page 31, line 42 ¶ | skipping to change at page 32, line 40 ¶ | |||
| Option is blocked for subsequent connections. [RFC9040] further | Option is blocked for subsequent connections. [RFC9040] further | |||
| discusses caching of TCP parameters and status information. | discusses caching of TCP parameters and status information. | |||
| If a host falls back to not sending the AccECN Option, it will | If a host falls back to not sending the AccECN Option, it will | |||
| continue to process any incoming AccECN Options as normal. | continue to process any incoming AccECN Options as normal. | |||
| Either host MAY include the AccECN Option in a subsequent segment to | Either host MAY include the AccECN Option in a subsequent segment to | |||
| retest whether the AccECN Option can traverse the path. | retest whether the AccECN Option can traverse the path. | |||
| If the TCP server receives a second SYN with a request for AccECN | If the TCP server receives a second SYN with a request for AccECN | |||
| support, it should resend the SYN/ACK, again confirming its support | support, it is advised to resend the SYN/ACK, again confirming its | |||
| for AccECN, but this time without the AccECN Option. This approach | support for AccECN, but this time without the AccECN Option. This | |||
| rules out any interference by middleboxes that may drop packets with | approach rules out any interference by middleboxes that might drop | |||
| unknown options, even though it is more likely that the SYN/ACK would | packets with unknown options, even though it is more likely that the | |||
| have been lost due to congestion. The TCP server MAY try to send | SYN/ACK would have been lost due to congestion. The TCP server MAY | |||
| another packet with the AccECN Option at a later point during the | try to send another packet with the AccECN Option at a later point | |||
| connection but should monitor if that packet got lost as well, in | during the connection but it ought to monitor if that packet got lost | |||
| which case it SHOULD disable the sending of the AccECN Option for | as well, in which case it SHOULD disable the sending of the AccECN | |||
| this half-connection. | Option for this half-connection. | |||
| Similarly, an AccECN end-point MAY separately memorize which data | Similarly, an AccECN end-point MAY separately memorize which data | |||
| packets carried an AccECN Option and disable the sending of AccECN | packets carried an AccECN Option and disable the sending of AccECN | |||
| Options if the loss probability of those packets is significantly | Options if the loss probability of those packets is significantly | |||
| higher than that of all other data packets in the same connection. | higher than that of all other data packets in the same connection. | |||
| 3.2.3.2.3. Testing for Absence of the AccECN Option | 3.2.3.2.3. Testing for Absence of the AccECN Option | |||
| If the TCP client has successfully negotiated AccECN but does not | If the TCP client has successfully negotiated AccECN but does not | |||
| receive an AccECN Option on the SYN/ACK (e.g. because it has been | receive an AccECN Option on the SYN/ACK (e.g. because it has been | |||
| skipping to change at page 37, line 34 ¶ | skipping to change at page 38, line 34 ¶ | |||
| TCP pure ACKs to be ECN-capable if AccECN has been negotiated | TCP pure ACKs to be ECN-capable if AccECN has been negotiated | |||
| [I-D.ietf-tcpm-generalized-ecn]). This heuristic is simple and | [I-D.ietf-tcpm-generalized-ecn]). This heuristic is simple and | |||
| stateless. However, it might omit some AccECN ACKs, because it is | stateless. However, it might omit some AccECN ACKs, because it is | |||
| only recommended but not obligatory to use ECN++ with AccECN - | only recommended but not obligatory to use ECN++ with AccECN - | |||
| only deployment experience will tell. Also, TCP ACKs might be | only deployment experience will tell. Also, TCP ACKs might be | |||
| ECN-capable owing to some scheme other than AccECN, e.g. [RFC5690] | ECN-capable owing to some scheme other than AccECN, e.g. [RFC5690] | |||
| or some future standards action. Again, only deployment | or some future standards action. Again, only deployment | |||
| experience will tell. | experience will tell. | |||
| o The main concern with preserving correct AccECN operation involves | o The main concern with preserving correct AccECN operation involves | |||
| leaving enough ACKs for the data sender to work out whether the | leaving enough ACKs for the Data Sender to work out whether the | |||
| 3-bit ACE field has wrapped. ACE field wrap is of less concern if | 3-bit ACE field has wrapped. ACE field wrap might be of less | |||
| packets also carry the AccECN TCP Option. | concern if packets also carry the AccECN TCP Option. | |||
| Note that the present specification of AccECN in TCP does not presume | Note that the present specification of AccECN in TCP does not presume | |||
| to rely on any of the above ACK filtering behaviour in the network | to rely on any of the above ACK filtering behaviour in the network | |||
| (hence the use of 'SHOULD' rather than 'MUST' above), because it has | (hence the use of 'SHOULD' rather than 'MUST' above), because it has | |||
| to be robust against pre-existing network nodes that do not | to be robust against pre-existing network nodes that do not | |||
| distinguish AccECN ACKs, and robust against ACK loss during overload | distinguish AccECN ACKs, and robust against ACK loss during overload | |||
| more generally. | more generally. | |||
| Section 5.2.1 of BCP 69 [RFC3449] gives best current practice on pure | Section 5.2.1 of BCP 69 [RFC3449] gives best current practice on pure | |||
| TCP ACK filtering. It gives no advice on ACKs carrying ECN feedback, | TCP ACK filtering. It gives no advice on ACKs carrying ECN feedback, | |||
| skipping to change at page 40, line 8 ¶ | skipping to change at page 41, line 8 ¶ | |||
| (not just data packets) in the present specification. | (not just data packets) in the present specification. | |||
| Specifically, in the normative specification of AccECN (Section 3) | Specifically, in the normative specification of AccECN (Section 3) | |||
| only 'Acceptable' packets contribute to the ECN counters at the | only 'Acceptable' packets contribute to the ECN counters at the | |||
| AccECN receiver and Section 1.3 defines an Acceptable packet as | AccECN receiver and Section 1.3 defines an Acceptable packet as | |||
| one that passes the acceptability tests in both [RFC0793] and | one that passes the acceptability tests in both [RFC0793] and | |||
| [RFC5961]. | [RFC5961]. | |||
| o Sections 5.2, 6.1.1, 6.1.4, 6.1.5 and 6.1.6 of [RFC3168] prohibit | o Sections 5.2, 6.1.1, 6.1.4, 6.1.5 and 6.1.6 of [RFC3168] prohibit | |||
| use of ECN on TCP control packets and retransmissions. The | use of ECN on TCP control packets and retransmissions. The | |||
| present specification does not update that aspect of RFC 3168, but | present specification does not update that aspect of RFC 3168, but | |||
| it does say what feedback an AccECN Data Receiver should provide | it does say what feedback an AccECN Data Receiver ought to provide | |||
| if it receives an ECN-capable control packet or retransmission. | if it receives an ECN-capable control packet or retransmission. | |||
| This ensures AccECN is forward compatible with any future scheme | This ensures AccECN is forward compatible with any future scheme | |||
| that allows ECN on these packets, as provided for in section 4.3 | that allows ECN on these packets, as provided for in section 4.3 | |||
| of [RFC8311] and as proposed in [I-D.ietf-tcpm-generalized-ecn]. | of [RFC8311] and as proposed in [I-D.ietf-tcpm-generalized-ecn]. | |||
| 5. Interaction with TCP Variants | 5. Interaction with TCP Variants | |||
| This section is informative, not normative. | This section is informative, not normative. | |||
| 5.1. Compatibility with SYN Cookies | 5.1. Compatibility with SYN Cookies | |||
| skipping to change at page 40, line 34 ¶ | skipping to change at page 41, line 34 ¶ | |||
| thread). Therefore it cannot record the fact that it entered AccECN | thread). Therefore it cannot record the fact that it entered AccECN | |||
| mode for both half-connections. Indeed, it cannot even remember | mode for both half-connections. Indeed, it cannot even remember | |||
| whether it negotiated the use of classic ECN [RFC3168]. | whether it negotiated the use of classic ECN [RFC3168]. | |||
| Nonetheless, such a server can determine that it negotiated AccECN as | Nonetheless, such a server can determine that it negotiated AccECN as | |||
| follows. If a TCP server using SYN Cookies supports AccECN and if it | follows. If a TCP server using SYN Cookies supports AccECN and if it | |||
| receives a pure ACK that acknowledges an ISN that is a valid SYN | receives a pure ACK that acknowledges an ISN that is a valid SYN | |||
| cookie, and if the ACK contains an ACE field with the value 0b010 to | cookie, and if the ACK contains an ACE field with the value 0b010 to | |||
| 0b111 (decimal 2 to 7), it can assume that: | 0b111 (decimal 2 to 7), it can assume that: | |||
| o the TCP client must have requested AccECN support on the SYN | o the TCP client has to have requested AccECN support on the SYN | |||
| o it (the server) must have confirmed that it supported AccECN | o it (the server) has to have confirmed that it supported AccECN | |||
| Therefore the server can switch itself into AccECN mode, and continue | Therefore the server can switch itself into AccECN mode, and continue | |||
| as if it had never forgotten that it switched itself into AccECN mode | as if it had never forgotten that it switched itself into AccECN mode | |||
| earlier. | earlier. | |||
| If the pure ACK that acknowledges a SYN cookie contains an ACE field | If the pure ACK that acknowledges a SYN cookie contains an ACE field | |||
| with the value 0b000 or 0b001, these values indicate that the client | with the value 0b000 or 0b001, these values indicate that the client | |||
| did not request support for AccECN and therefore the server does not | did not request support for AccECN and therefore the server does not | |||
| enter AccECN mode for this connection. Further, 0b001 on the ACK | enter AccECN mode for this connection. Further, 0b001 on the ACK | |||
| implies that the server sent an ECN-capable SYN/ACK, which was marked | implies that the server sent an ECN-capable SYN/ACK, which was marked | |||
| skipping to change at page 41, line 47 ¶ | skipping to change at page 42, line 47 ¶ | |||
| loss) feedback by occasionally setting the IP-ECN field to a value | loss) feedback by occasionally setting the IP-ECN field to a value | |||
| normally only set by the network (and/or deliberately leaving a | normally only set by the network (and/or deliberately leaving a | |||
| sequence number gap). Then it can test whether the Data | sequence number gap). Then it can test whether the Data | |||
| Receiver's feedback faithfully reports what it expects (similar to | Receiver's feedback faithfully reports what it expects (similar to | |||
| para 2 of Section 20.2 of [RFC3168]). Unlike the ECN Nonce | para 2 of Section 20.2 of [RFC3168]). Unlike the ECN Nonce | |||
| [RFC3540], this approach does not waste the ECT(1) codepoint in | [RFC3540], this approach does not waste the ECT(1) codepoint in | |||
| the IP header, it does not require standardization and it does not | the IP header, it does not require standardization and it does not | |||
| rely on misbehaving receivers volunteering to reveal feedback | rely on misbehaving receivers volunteering to reveal feedback | |||
| information that allows them to be detected. However, setting the | information that allows them to be detected. However, setting the | |||
| CE mark by the sender might conceal actual congestion feedback | CE mark by the sender might conceal actual congestion feedback | |||
| from the network and should therefore only be done sparingly. | from the network and therefore ought to only be done sparingly. | |||
| o Networks generate congestion signals when they are becoming | o Networks generate congestion signals when they are becoming | |||
| congested, so networks are more likely than Data Senders to be | congested, so networks are more likely than Data Senders to be | |||
| concerned about the integrity of the receiver's feedback of these | concerned about the integrity of the receiver's feedback of these | |||
| signals. A network can enforce a congestion response to its ECN | signals. A network can enforce a congestion response to its ECN | |||
| markings (or packet losses) using congestion exposure (ConEx) | markings (or packet losses) using congestion exposure (ConEx) | |||
| audit [RFC7713]. Whether the receiver or a downstream network is | audit [RFC7713]. Whether the receiver or a downstream network is | |||
| suppressing congestion feedback or the sender is unresponsive to | suppressing congestion feedback or the sender is unresponsive to | |||
| the feedback, or both, ConEx audit can neutralize any advantage | the feedback, or both, ConEx audit can neutralize any advantage | |||
| that any of these three parties would otherwise gain. | that any of these three parties would otherwise gain. | |||
| skipping to change at page 57, line 26 ¶ | skipping to change at page 58, line 26 ¶ | |||
| assumption is currently correct, given that RFC 3168 requires that | assumption is currently correct, given that RFC 3168 requires that | |||
| the Data Sender marks retransmitted segments as Not-ECT. However, | the Data Sender marks retransmitted segments as Not-ECT. However, | |||
| the converse is not true; necessary retransmissions will result in | the converse is not true; necessary retransmissions will result in | |||
| under-counting. | under-counting. | |||
| However, such precision is unlikely to be necessary. The only known | However, such precision is unlikely to be necessary. The only known | |||
| use of a count of Not-ECT marked bytes is to test whether equipment | use of a count of Not-ECT marked bytes is to test whether equipment | |||
| on the path is clearing the ECN field (perhaps due to an out-dated | on the path is clearing the ECN field (perhaps due to an out-dated | |||
| attempt to clear, or bleach, what used to be the ToS field). To | attempt to clear, or bleach, what used to be the ToS field). To | |||
| detect bleaching it will be sufficient to detect whether nearly all | detect bleaching it will be sufficient to detect whether nearly all | |||
| bytes arrive marked as Not-ECT. Therefore there should be no need to | bytes arrive marked as Not-ECT. Therefore there ought to be no need | |||
| keep track of the details of retransmissions. | to keep track of the details of retransmissions. | |||
| Appendix B. Rationale for Usage of TCP Header Flags | Appendix B. Rationale for Usage of TCP Header Flags | |||
| B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | |||
| AccECN uses a rather unorthodox approach to negotiate the highest | AccECN uses a rather unorthodox approach to negotiate the highest | |||
| version TCP ECN feedback scheme that both ends support, as justified | version TCP ECN feedback scheme that both ends support, as justified | |||
| below. It follows from the original TCP ECN capability negotiation | below. It follows from the original TCP ECN capability negotiation | |||
| [RFC3168], in which the client set the 2 least significant of the | [RFC3168], in which the client set the 2 least significant of the | |||
| original reserved flags in the TCP header, and fell back to no ECN | original reserved flags in the TCP header, and fell back to no ECN | |||
| End of changes. 35 change blocks. | ||||
| 120 lines changed or deleted | 156 lines changed or added | |||
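
The list of invalid IP/ECN transitions repeated in Section 3.2.2.3 can be read as a small predicate. The following is only a sketch of that list, not code from the draft: the function name and constant names are hypothetical, and the codepoint values are the standard 2-bit encodings from RFC 3168. It takes the codepoint a host originally set on a handshake packet and the codepoint the peer reports having received. How the received codepoint is decoded from the ACE field is outside this sketch.

```python
# IP/ECN codepoints as 2-bit values (RFC 3168 encodings).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def is_invalid_transition(sent: int, received: int) -> bool:
    """Return True if sent -> received matches one of the three
    invalid transitions listed in Section 3.2.2.3 (hypothetical helper)."""
    if sent == received:
        return False  # no change on the path
    if sent == NOT_ECT:
        return True  # "the not-ECT codepoint changes"
    if sent == CE:
        return True  # "the CE codepoint changes"
    # sent is ECT(0) or ECT(1): only the transition to not-ECT is listed.
    return received == NOT_ECT
```

Under this reading, ECT to CE is treated as a normal congestion mark, while ECT to not-ECT is flagged, matching the host's-viewpoint argument that it could hide an ECT to CE to not-ECT sequence. A change between ECT(0) and ECT(1) is not in the list, so the sketch does not flag it.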
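
The forward-compatibility rule for the AccECN Option (use the whole 3-octet fields that fit within the length and treat the rest as padding) and the 24-least-significant-bits encoding rule of Section 3.2.3.1 can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the function names and the placeholder kind value are hypothetical, the option is assumed to start with the usual kind and length octets, fields are assumed to be in network byte order, and the cap of three fields is an assumption based on only three counter fields (ECEB, EE0B, EE1B) being named. The sketch returns raw field values in wire order and does not map them to specific counters.

```python
PLACEHOLDER_KIND = 0xFD  # hypothetical value, for illustration only


def encode_counter(counter: int) -> bytes:
    """Encode the 24 least significant bits of a byte counter."""
    return (counter & 0xFFFFFF).to_bytes(3, "big")


def parse_accecn_fields(option: bytes) -> list:
    """Return the whole 3-octet fields that fit in the option.

    `option` is the full TCP option including kind and length octets.
    Trailing octets that do not form a whole field are ignored as padding.
    """
    if len(option) < 2 or option[1] != len(option):
        raise ValueError("malformed option")
    payload = option[2:]
    count = min(len(payload) // 3, 3)  # cap of 3 is an assumption
    return [int.from_bytes(payload[3 * i:3 * i + 3], "big")
            for i in range(count)]
```

For example, an option of length 9 carries two whole fields plus one octet that is ignored, and an option of length 2 carries no fields at all.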