idnits 2.17.1

draft-ietf-tcpm-generalized-ecn-09.txt:
  -(2150): Line appears to be too long, but this could be caused by
     non-ascii characters in UTF-8 encoding

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == There are 6 instances of lines with non-ascii characters in the
     document.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC5562, but
     the abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even
     if it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  == The document seems to contain a disclaimer for pre-RFC5378 work, but
     was first submitted on or after 10 November 2008. The disclaimer is
     usually necessary only for documents that revise or obsolete older
     RFCs, and that take significant amounts of text from those RFCs. If
     you can contact all authors of the source material and they are
     willing to grant the BCP78 rights to the IETF Trust, you can and
     should remove the disclaimer. Otherwise, the disclaimer is needed and
     you can ignore this comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (31 January 2022) is 816 days in the past. Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-28) exists of
     draft-ietf-tcpm-accurate-ecn-15

  ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)

  -- Obsolete informational reference (is this intentional?): RFC 2140
     (Obsoleted by RFC 9040)

  == Outdated reference: A later version (-29) exists of
     draft-ietf-tsvwg-ecn-l4s-id-23

  == Outdated reference: A later version (-20) exists of
     draft-ietf-tsvwg-l4s-arch-15

  == Outdated reference: A later version (-07) exists of
     draft-stewart-tsvwg-sctpecn-05

     Summary: 1 error (**), 0 flaws (~~), 8 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

--------------------------------------------------------------------------------

Network Working Group                                         M. Bagnulo
Internet-Draft                                                      UC3M
Obsoletes: 5562 (if approved)                                 B. Briscoe
Intended status: Experimental                                Independent
Expires: 4 August 2022                                   31 January 2022

  ECN++: Adding Explicit Congestion Notification (ECN) to TCP Control
                                Packets
                   draft-ietf-tcpm-generalized-ecn-09

Abstract

   This document describes an experimental modification to ECN when used
   with TCP. It allows the use of ECN on the following TCP packets:
   SYNs, pure ACKs, Window probes, FINs, RSTs and retransmissions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 4 August 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1. Introduction
     1.1. Motivation
     1.2. Experiment Goals
     1.3. Document Structure
   2. Terminology
   3. Specification
     3.1. Network (e.g. Firewall) Behaviour
     3.2. Sender Behaviour
       3.2.1. SYN (Send)
       3.2.2. SYN-ACK (Send)
       3.2.3. Pure ACK (Send)
       3.2.4. Window Probe (Send)
       3.2.5. FIN (Send)
       3.2.6. RST (Send)
       3.2.7. Retransmissions (Send)
       3.2.8. General Fall-back for any Control Packet or
              Retransmission
     3.3. Receiver Behaviour
       3.3.1. Receiver Behaviour for Any TCP Control Packet or
              Retransmission
       3.3.2. SYN (Receive)
       3.3.3. Pure ACK (Receive)
       3.3.4. FIN (Receive)
       3.3.5. RST (Receive)
       3.3.6. Retransmissions (Receive)
   4. Rationale
     4.1. The Reliability Argument
     4.2. SYNs
       4.2.1. Argument 1a: Unrecognized CE on the SYN
       4.2.2. Argument 1b: ECT Considered Invalid on the SYN
       4.2.3. Caching Strategies for ECT on SYNs
       4.2.4. Argument 2: DoS Attacks
     4.3. SYN-ACKs
       4.3.1. Possibility of Unrecognized CE on the SYN-ACK
       4.3.2. Response to Congestion on a SYN-ACK
       4.3.3. Fall-Back if ECT SYN-ACK Fails
     4.4. Pure ACKs
       4.4.1. Mechanisms to Respond to CE-Marked Pure ACKs
       4.4.2. Summary: Enabling ECN on Pure ACKs
     4.5. Window Probes
     4.6. FINs
     4.7. RSTs
     4.8. Retransmitted Packets.
     4.9. General Fall-back for any Control Packet
   5. Interaction with popular variants or derivatives of TCP
     5.1. IW10
     5.2. TFO
     5.3. L4S
     5.4. Other transport protocols
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgments
   9. References
     9.1. Normative References
     9.2. Informative References
   Authors' Addresses

1. Introduction

   RFC 3168 [RFC3168] specifies support of Explicit Congestion
   Notification (ECN) in IP (v4 and v6). By using the ECN capability,
   network elements (e.g. routers, switches) performing Active Queue
   Management (AQM) can use ECN marks instead of packet drops to signal
   congestion to the endpoints of a communication. This results in
   lower packet loss and increased performance.
   RFC 3168 also specifies support for ECN in TCP, but solely on data
   packets. For various reasons it precludes the use of ECN on TCP
   control packets (TCP SYN, TCP SYN-ACK, pure ACKs, Window probes) and
   on retransmitted packets. RFC 3168 is silent about the use of ECN on
   RST and FIN packets. RFC 5562 [RFC5562] is an experimental
   modification to ECN that enables ECN support for TCP SYN-ACK packets.

   This document defines an experimental modification to ECN [RFC3168]
   that shall be called ECN++. It enables ECN support on all the
   aforementioned types of TCP packet. RFC 5562 (which was called ECN+)
   is obsoleted by the present specification, because it has the same
   goal of enabling ECT, but on only one type of control packet. The
   mechanisms proposed in this document have been defined conservatively
   and with safety in mind, possibly in some cases at the expense of
   performance.

   ECN++ uses a sender-only deployment model. It works whether the two
   ends of the TCP connection use classic ECN feedback [RFC3168] or
   Accurate ECN feedback (AccECN [I-D.ietf-tcpm-accurate-ecn]), the two
   ECN feedback mechanisms for TCP being standardized at the time of
   writing.

   Using ECN on initial SYN packets provides significant benefits, as we
   describe in the next subsection. However, only AccECN provides a way
   to feed back whether the SYN was CE marked, and RFC 3168 does not.
   Therefore, implementers of ECN++ are RECOMMENDED to also implement
   AccECN. Conversely, if AccECN (or an equivalent safety mechanism) is
   not implemented with ECN++, this specification rules out ECN on the
   SYN.

   ECN++ is designed for compatibility with a number of latency
   improvements to TCP such as TCP Fast Open (TFO [RFC7413]), initial
   window of 10 SMSS (IW10 [RFC6928]) and Low latency Low Loss Scalable
   Transport (L4S [I-D.ietf-tsvwg-l4s-arch]), but they can all be
   implemented and deployed independently.
   [RFC8311] is a standards track procedural device that relaxes
   requirements in RFC 3168 and other standards track RFCs that would
   otherwise preclude the experimental modifications needed for ECN++
   and other ECN experiments.

1.1. Motivation

   The absence of ECN support on TCP control packets and retransmissions
   has a potentially harmful effect. In any ECN deployment,
   non-ECN-capable packets suffer a penalty when they traverse a
   congested bottleneck. For instance, with a drop probability of 1%,
   1% of connection attempts suffer a timeout of about 1 second before
   the SYN is retransmitted, which is highly detrimental to the
   performance of short flows. TCP control packets, particularly TCP
   SYNs and SYN-ACKs, are important for performance, so dropping them is
   best avoided.

   Not using ECN on control packets can be particularly detrimental to
   performance in environments where the ECN marking level is high. For
   example, [judd-nsdi] shows that in a controlled private data centre
   (DC) environment where ECN is used (in conjunction with DCTCP
   [RFC8257]), the probability of being able to establish a new
   connection using a non-ECN SYN packet drops to close to zero even
   when there are only 16 ongoing TCP flows transmitting at full speed.
   The issue is that DCTCP exhibits a much more aggressive response to
   packet marking (which is why it is only applicable in controlled
   environments). This leads to a high marking probability for
   ECN-capable packets, and in turn a high drop probability for non-ECN
   packets. Therefore non-ECN SYNs are dropped aggressively, rendering
   it nearly impossible to establish a new connection in the presence of
   even mild traffic load.

   Finally, there are ongoing experimental efforts to promote the
   adoption of a slightly modified variant of DCTCP (and similar
   congestion controls) over the Internet to achieve low latency, low
   loss and scalable throughput (L4S) for all communications
   [I-D.ietf-tsvwg-l4s-arch]. In such an approach, L4S packets identify
   themselves using an ECN codepoint [I-D.ietf-tsvwg-ecn-l4s-id]. With
   L4S, preventing TCP control packets from obtaining the benefits of
   ECN would not only expose them to the prevailing level of congestion
   loss, but it would also classify them into a different queue. Then
   only L4S data packets would be classified into the L4S queue that is
   expected to have lower latency, while the packets controlling and
   retransmitting these data packets would still get stuck behind the
   queue induced by non-L4S-enabled TCP traffic.

1.2. Experiment Goals

   The goal of the experimental modifications defined in this document
   is to allow the use of ECN on all TCP packets. Experiments are
   expected in the public Internet as well as in controlled environments
   to understand the following issues:

   *  How SYNs, Window probes, pure ACKs, FINs, RSTs and retransmissions
      that carry the ECT(0), ECT(1) or CE codepoints are processed by
      the TCP endpoints and the network (including routers, firewalls
      and other middleboxes). In particular we would like to learn if
      these packets are frequently blocked or if these packets are
      usually forwarded and processed.

   *  The scale of deployment of the different flavours of ECN,
      including [RFC3168], [RFC5562], [RFC3540] and
      [I-D.ietf-tcpm-accurate-ecn].

   *  How much the performance of TCP communications is improved by
      allowing ECN marking of each packet type.

   *  To identify any issues (including security issues) raised by
      enabling ECN marking of these packets.

   *  To conduct the specific experiments identified in the text by the
      strings "EXPERIMENTATION NEEDED" or "MEASUREMENTS NEEDED".

   The data gathered through the experiments described in this document,
   particularly under the first two bullets above, will help in the
   redesign of the final mechanism (if needed) for adding ECN support to
   the different packet types considered in this document.

   Success criteria: The experiment will be a success if we obtain
   enough data to have a clearer view of the deployability and benefits
   of enabling ECN on all TCP packets, as well as any issues. If the
   results of the experiment show that it is feasible to deploy such
   changes; that there are gains to be achieved through the changes
   described in this specification; and that no other major issues may
   interfere with the deployment of the proposed changes; then it would
   be reasonable to adopt the proposed changes in a standards track
   specification that would update RFC 3168.

1.3. Document Structure

   The remainder of this document is structured as follows. In
   Section 2, we present the terminology used in the rest of the
   document. In Section 3, we specify the modifications to provide ECN
   support to TCP SYNs, pure ACKs, Window probes, FINs, RSTs and
   retransmissions. We describe both the network behaviour and the
   endpoint behaviour. Section 5 discusses variations of the
   specification that will be necessary to interwork with a number of
   popular variants or derivatives of TCP. RFC 3168 provides a number
   of specific reasons why ECN support is not appropriate for each
   packet type. In Section 4, we revisit each of these arguments for
   each packet type to justify why it is reasonable to conduct this
   experiment.

2. Terminology

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL in this
   document are to be interpreted as described in BCP 14 [RFC2119] when
   and only when they appear in all capitals [RFC8174].

   Pure ACK: A TCP segment with the ACK flag set and no data payload.

   SYN: A TCP segment with the SYN (synchronize) flag set.

   Window probe: Defined in [RFC0793], a window probe is a TCP segment
   with only one byte of data sent to learn if the receive window is
   still zero.

   FIN: A TCP segment with the FIN (finish) flag set.

   RST: A TCP segment with the RST (reset) flag set.

   Retransmission: A TCP segment that has been retransmitted by the TCP
   sender.

   TCP client: The initiating end of a TCP connection. Also called the
   initiator.

   TCP server: The responding end of a TCP connection. Also called the
   responder.

   ECT: ECN-Capable Transport. One of the two codepoints ECT(0) or
   ECT(1) in the ECN field [RFC3168] of the IP header (v4 or v6). An
   ECN-capable sender sets one of these to indicate that both transport
   end-points support ECN. When this specification says the sender sets
   an ECT codepoint, by default it means ECT(0). Optionally, it could
   mean ECT(1), which is in the process of being redefined for use by
   L4S experiments [RFC8311] [I-D.ietf-tsvwg-ecn-l4s-id].

   Not-ECT: The ECN codepoint set by senders that indicates that the
   transport is not ECN-capable.

   CE: Congestion Experienced. The ECN codepoint that an intermediate
   node sets to indicate congestion [RFC3168]. A node sets an
   increasing proportion of ECT packets to CE as the level of congestion
   increases.

3. Specification

   The experimental ECN++ changes to the specification of TCP over ECN
   [RFC3168] defined here primarily alter the behaviour of the sending
   host for each half-connection.
   However, there are subsections for forwarding elements and receivers
   below, which recommend that they accept the new packets - they should
   do already, but might not. This will allow implementers to check the
   receive-side code while they are altering the send-side code. All
   changes can be deployed at each end-point independently of others and
   independently of any network behaviour.

   The feedback behaviour at the receiver depends on whether classic ECN
   TCP feedback [RFC3168] or Accurate ECN (AccECN) TCP feedback
   [I-D.ietf-tcpm-accurate-ecn] has been negotiated. Nonetheless,
   neither receiver feedback behaviour is altered by the present
   specification.

3.1. Network (e.g. Firewall) Behaviour

   Previously the specification of ECN for TCP [RFC3168] required the
   sender to set not-ECT on TCP control packets and retransmissions.
   Some readers of RFC 3168 might have erroneously interpreted this as a
   requirement for firewalls, intrusion detection systems, etc. to check
   and enforce this behaviour. Section 4.3 of [RFC8311] updates RFC
   3168 to remove this ambiguity. It requires firewalls or any
   intermediate nodes not to treat certain types of ECN-capable TCP
   segment differently (except potentially in one attack scenario).
   This is likely to only involve a firewall rule change in a fraction
   of cases (at most 0.4% of paths according to the tests reported in
   Section 4.2.2).

   In case a TCP sender encounters a middlebox blocking ECT on certain
   TCP segments, the specification below includes behaviour to fall back
   to non-ECN. However, this loses the benefit of ECN on control
   packets. So operators are RECOMMENDED to alter their firewall rules
   to comply with the requirement referred to above (Section 4.3 of
   [RFC8311]).

3.2. Sender Behaviour

   For each type of control packet or retransmission, the following
   sections detail changes to the sender's behaviour in two respects: i)
   whether it sets ECT; and ii) its response to congestion feedback.
   Table 1 summarises these two behaviours for each type of packet, but
   the relevant subsection below should be referred to for the detailed
   behaviour. The subsection on the SYN is more complex than the
   others, because it has to include fall-back behaviour if the ECT
   packet appears not to have got through, and caching of the outcome to
   detect persistent failures.

   +============+==============+==============+======================+
   | TCP packet | ECN field if | ECN field if | Congestion Response  |
   | type       | AccECN f/b   | RFC3168 f/b  |                      |
   |            | negotiated*  | negotiated*  |                      |
   +============+==============+==============+======================+
   | SYN        | ECT          | not-ECT      | If AccECN, reduce IW |
   +------------+--------------+--------------+----------------------+
   | SYN-ACK    | ECT          | ECT          | Reduce IW            |
   +------------+--------------+--------------+----------------------+
   | Pure ACK   | ECT          | not-ECT      | If AccECN, usual     |
   |            |              |              | cwnd response and    |
   |            |              |              | optionally [RFC5690] |
   +------------+--------------+--------------+----------------------+
   | W Probe    | ECT          | ECT          | Usual cwnd response  |
   +------------+--------------+--------------+----------------------+
   | FIN        | ECT          | ECT          | None or optionally   |
   |            |              |              | [RFC5690]            |
   +------------+--------------+--------------+----------------------+
   | RST        | ECT          | ECT          | N/A                  |
   +------------+--------------+--------------+----------------------+
   | Re-XMT     | ECT          | ECT          | Usual cwnd response  |
   +------------+--------------+--------------+----------------------+

     Table 1: Summary of sender behaviour. In each case the relevant
      section below should be referred to for the detailed behaviour

   Window probe and retransmission are abbreviated to W Probe and
   Re-XMT.
   * For a SYN, "negotiated" means "requested".

   It can be seen that we recommend against the sender setting ECT on
   the SYN if it is not requesting AccECN feedback. Therefore it is
   RECOMMENDED that the AccECN specification
   [I-D.ietf-tcpm-accurate-ecn] is implemented, along with the ECN++
   experiment, because it is expected that ECT on the SYN will give the
   most significant performance gain, particularly for short flows.

   Nonetheless, this specification also caters for the case where an
   ECN++ TCP sender is not using AccECN. This could be because it does
   not support AccECN or because the other end of the TCP connection
   does not (AccECN can only be used for a connection if both ends
   support it).

   Note that Table 1 does not imply any obligation to set any packet to
   ECT. ECN++ removes the restrictions that RFC 3168 places against
   setting ECT on these types of packets, and an implementation would
   normally be expected to take advantage of this, but it does not have
   to. Therefore, an implementation of the ECN++ experiment would be
   compliant if, for instance, it set ECT on some types of control
   packets but not others.

3.2.1. SYN (Send)

3.2.1.1. Setting ECT on the SYN

   With classic [RFC3168] ECN feedback, the SYN was not expected to be
   ECN-capable, so the flag provided to feed back congestion was put to
   another use (it is used in combination with other flags to indicate
   that the responder supports ECN). In contrast, Accurate ECN (AccECN)
   feedback [I-D.ietf-tcpm-accurate-ecn] provides a codepoint in the
   SYN-ACK for the responder to feed back whether the SYN arrived marked
   CE. Therefore the setting of the IP/ECN field on the SYN is
   specified separately for each case in the following two subsections.
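
   As a non-normative illustration, the ECN-field columns of Table 1
   (including the SYN row, whose two cases are detailed in the
   subsections that follow) can be written as a simple lookup. This is
   only a sketch: the names Feedback, TABLE_1 and ecn_field are
   hypothetical and do not come from this specification, and the table
   records what ECN++ permits rather than what a sender is obliged to
   do.

```python
from enum import Enum


class Feedback(Enum):
    """ECN feedback mode negotiated (or, for a SYN, requested)."""
    ACCECN = "AccECN"
    RFC3168 = "RFC 3168"


# Hypothetical encoding of Table 1:
# packet type -> (ECN field with AccECN f/b,
#                 ECN field with RFC 3168 f/b,
#                 congestion response)
TABLE_1 = {
    "SYN": ("ECT", "not-ECT", "If AccECN, reduce IW"),
    "SYN-ACK": ("ECT", "ECT", "Reduce IW"),
    "Pure ACK": ("ECT", "not-ECT",
                 "If AccECN, usual cwnd response and optionally [RFC5690]"),
    "W Probe": ("ECT", "ECT", "Usual cwnd response"),
    "FIN": ("ECT", "ECT", "None or optionally [RFC5690]"),
    "RST": ("ECT", "ECT", "N/A"),
    "Re-XMT": ("ECT", "ECT", "Usual cwnd response"),
}


def ecn_field(packet_type: str, feedback: Feedback) -> str:
    """Return the ECN field that Table 1 lists for this packet type."""
    with_accecn, with_rfc3168, _response = TABLE_1[packet_type]
    return with_accecn if feedback is Feedback.ACCECN else with_rfc3168


def congestion_response(packet_type: str) -> str:
    """Return the congestion response column of Table 1."""
    return TABLE_1[packet_type][2]
```

   For example, ecn_field("SYN", Feedback.RFC3168) yields "not-ECT",
   reflecting that a SYN not requesting AccECN is still sent as not-ECT.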

3.2.1.1.1. ECN++ TCP Client also Supports AccECN

   For the ECN++ experiment, if the SYN is requesting AccECN feedback,
   the TCP sender will also set ECT on the SYN. It can ignore the
   prohibition in Section 6.1.1 of RFC 3168 against setting ECT on such
   a SYN, as per Section 4.3 of [RFC8311].

3.2.1.1.2. ECN++ TCP Client does not Support AccECN

   If the SYN sent by a TCP initiator does not attempt to negotiate
   Accurate ECN feedback, or does not use an equivalent safety
   mechanism, it MUST still comply with RFC 3168, which says that a TCP
   initiator "MUST NOT set ECT on a SYN".

   The only envisaged examples of "equivalent safety mechanisms" are: a)
   some future TCP ECN feedback protocol, perhaps evolved from AccECN,
   that feeds back CE marking on a SYN; b) setting the initial window to
   1 SMSS. IW=1 is NOT RECOMMENDED because it could degrade
   performance, but might be appropriate for certain lightweight TCP
   implementations.

   See Section 4.2 for discussion and rationale.

   If the TCP initiator does not set ECT on the SYN, the rest of
   Section 3.2.1 does not apply.

3.2.1.2. Caching where to use ECT on SYNs

   This subsection only applies if the ECN++ TCP client sets ECT on the
   SYN and supports AccECN.

   Until AccECN servers become widely deployed, a TCP initiator that
   sets ECT on a SYN (which typically implies the same SYN also requests
   AccECN, as above) SHOULD also maintain a cache entry per server to
   record servers that it is not worth sending an ECT SYN to,
   e.g. because they do not support AccECN and therefore have no logic
   for congestion markings on the SYN. Mobile hosts MAY maintain a
   cache entry per access network to record 'non-ECT SYN' entries
   against proxies (see Section 4.2.3). This cache can be implemented
   as part of the shared state across multiple TCP connections,
   following [RFC2140].
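
   The per-server cache described above might be sketched as follows.
   This is a hedged illustration under stated assumptions, not part of
   the specification: the class and method names are hypothetical, the
   server key is an opaque string, and a plain least-recently-used
   eviction policy stands in for the size limit that this subsection
   goes on to discuss (which also weighs how much a server is used).

```python
from collections import OrderedDict


class NonEctSynCache:
    """Hypothetical bounded cache of servers not worth an ECT SYN."""

    def __init__(self, max_entries: int = 256):
        # max_entries is an arbitrary illustrative bound.
        self._max_entries = max_entries
        self._servers = OrderedDict()  # oldest-used entry first

    def should_set_ect_on_syn(self, server: str) -> bool:
        """Optimistic default: use ECT unless the server is cached."""
        if server in self._servers:
            self._servers.move_to_end(server)  # mark as recently used
            return False
        return True

    def record_syn_ack(self, server: str, supports_accecn: bool) -> None:
        """Update the cache from what the SYN-ACK revealed."""
        if supports_accecn:
            # Server has (perhaps newly) upgraded: forget the entry.
            self._servers.pop(server, None)
            return
        self._servers[server] = True
        self._servers.move_to_end(server)
        while len(self._servers) > self._max_entries:
            # Evicted servers simply get an ECT SYN again by default.
            self._servers.popitem(last=False)
```

   In this sketch the initiator would still request AccECN on every SYN;
   the cache only decides whether the IP/ECN field of the SYN is set to
   ECT.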

   Subsequently the initiator will not set ECT on a SYN to such a server
   or proxy, but it can still always request AccECN support (because the
   response will state any earlier stage of ECN evolution that the
   server supports with no performance penalty). If a server
   subsequently upgrades to support AccECN, the initiator will discover
   this as soon as it next connects, then it can remove the server from
   its cache and subsequently always set ECT for that server.

   The client can limit the size of its cache of 'non-ECT SYN' servers.
   Then, while AccECN is not widely deployed, it will only cache the
   'non-ECT SYN' servers that are most used and most recently used by
   the client. As the client accesses servers that have been expelled
   from its cache, it will simply use ECT on the SYN by default.

   Servers that do not support ECN as a whole do not need to be recorded
   separately from non-support of AccECN because the response to a
   request for AccECN immediately states which stage in the evolution of
   ECN the server supports (AccECN [I-D.ietf-tcpm-accurate-ecn], classic
   ECN [RFC3168] or no ECN).

   The above strategy is named "optimistic ECT and cache failures". It
   is believed to be sufficient based on three measurement studies and
   assumptions detailed in Section 4.2.3. However, Section 4.2.3 gives
   two other strategies and the choice between them depends on the
   implementer's goals and the deployment prevalence of ECN variants in
   the network and on servers, not to mention the prevalence of some
   significant bugs.

   If the initiator times out without seeing a SYN-ACK, it will
   separately cache this fact (see fall-back in Section 3.2.1.4 for
   details).

3.2.1.3. SYN Congestion Response

   As explained above, this subsection only applies if the ECN++ TCP
   client sets ECT on the initial SYN.
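
   As a rough, non-normative sketch of the initial window rule that the
   remainder of this subsection specifies, the function below returns
   the window (in units of SMSS) an initiator might use after receiving
   the SYN-ACK for an ECT SYN. The function and parameter names are
   hypothetical. It assumes the SHOULD value of 1 SMSS is chosen
   whenever a reduction is required, and it does not model simultaneous
   open or any timer behaviour.

```python
def initial_window_after_ect_syn(iw_segments: int,
                                 server_supports_accecn: bool,
                                 syn_was_ce_marked: bool) -> int:
    """Return the initial window, in SMSS, after an ECT SYN.

    Assumption: where a reduction is required, reduce to 1 SMSS (the
    SHOULD value), although a larger reduced window MAY be appropriate
    when the server cannot feed back CE on the SYN.
    """
    if iw_segments <= 1:
        # Nothing to reduce.
        return iw_segments
    if server_supports_accecn and not syn_was_ce_marked:
        # Feedback confirms the SYN was not CE-marked: keep the IW.
        return iw_segments
    # Either the SYN was CE-marked, or the server cannot say whether
    # it was, so the initiator has to be conservative.
    return 1
```

   For instance, with an initial window of 10 segments, a server without
   AccECN support leads to a window of 1 SMSS in this sketch, whatever
   value is passed for syn_was_ce_marked, because the SYN-ACK cannot
   convey it.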

   If the SYN-ACK returned to the TCP initiator confirms that the server
   supports AccECN, it will also be able to indicate whether or not the
   SYN was CE-marked. If the SYN was CE-marked, and if the initial
   window is greater than 1 MSS, then the initiator MUST reduce its
   Initial Window (IW) and SHOULD reduce it to 1 SMSS (sender maximum
   segment size). The rationale is the same as that for the response to
   CE on a SYN-ACK (Section 4.3.2).

   If the initiator has set ECT on the SYN and if the SYN-ACK shows that
   the server does not support feedback of a CE on the SYN (e.g. it does
   not support AccECN) and if the initial congestion window of the
   initiator is greater than 1 MSS, then the TCP initiator MUST
   conservatively reduce its Initial Window and SHOULD reduce it to 1
   SMSS. A reduction to greater than 1 SMSS MAY be appropriate (see
   Section 4.2.1). Conservatism is necessary because the SYN-ACK cannot
   show whether the SYN was CE-marked.

   If the TCP initiator (host A) receives a SYN from the remote end
   (host B) after it has sent a SYN to B, it indicates the (unusual)
   case of a simultaneous open. Host A will respond with a SYN-ACK.
   Host A will probably then receive a SYN-ACK in response to its own
   SYN, after which it can follow the appropriate one of the two
   paragraphs above.

   In all the above cases, the initiator does not have to back off its
   retransmission timer as it would in response to a timeout following
   no response to its SYN [RFC6298], because both the SYN and the
   SYN-ACK have been successfully delivered through the network. Also,
   the initiator does not need to exit slow start or reduce ssthresh,
   which is not even required when a SYN is lost [RFC5681].

   If an initial window of more than 3 segments is implemented
   (e.g. IW10 [RFC6928]), Section 5 gives additional recommendations.

3.2.1.4. Fall-Back Following No Response to an ECT SYN

   As explained above, this subsection only applies if the ECN++ TCP
   client also sets ECT on the initial SYN.

   An ECT SYN might be lost due to an over-zealous path element (or
   server) blocking ECT packets that do not conform to RFC 3168. Some
   evidence of this was found in a 2014 study [ecn-pam], but in a more
   recent study using 2017 data [Mandalari18] extensive measurements
   found no case where ECT on TCP control packets was treated any
   differently from ECT on TCP data packets. Loss is commonplace for
   numerous other reasons, e.g. congestion loss at a non-ECN queue on
   the forward or reverse path, transmission errors, etc.
   Alternatively, the cause of the loss might be the associated attempt
   to negotiate AccECN, or possibly other unrelated options on the SYN.

   Therefore, if the timer expires after the TCP initiator has sent the
   first ECT SYN, it SHOULD make one more attempt to retransmit the SYN
   with ECT set (backing off the timer as usual). If the retransmission
   timer expires again, it SHOULD retransmit the SYN with the not-ECT
   codepoint in the IP header, to expedite connection set-up. If other
   experimental fields or options were on the SYN, it will also be
   necessary to follow their specifications for fall-back. It would
   make sense to coordinate all the strategies for fall-back in order to
   isolate the specific cause of the problem.

   If the TCP initiator is caching failed connection attempts, it SHOULD
   NOT give up using ECT on the first SYN of subsequent connection
   attempts until it is clear that a blockage persistently and
   specifically affects ECT on SYNs. This is because loss is so
   commonplace for other reasons. Even if it does eventually decide to
   give up setting ECT on the SYN, it will probably not need to give up
   on AccECN on the SYN.
In any case, if a cache is used, it SHOULD be 571 arranged to expire so that the initiator will infrequently attempt to 572 check whether the problem has been resolved. 574 Other fall-back strategies MAY be adopted where applicable (see 575 Section 4.2.2 for suggestions, and the conditions under which they 576 would apply). 578 3.2.2. SYN-ACK (Send) 580 3.2.2.1. Setting ECT on the SYN-ACK 582 For the ECN++ experiment, the TCP implementation will set ECT on SYN- 583 ACKs. It can ignore the requirement in section 6.1.1 of RFC 3168 to 584 set not-ECT on a SYN-ACK, as per Section 4.3 of [RFC8311]. 586 3.2.2.2. SYN-ACK Congestion Response 588 A host that sets ECT on SYN-ACKs MUST reduce its initial window in 589 response to any congestion feedback, whether using classic ECN or 590 AccECN (see Section 4.3.1). It SHOULD reduce it to 1 SMSS. This is 591 different to the behaviour specified in an earlier experiment that 592 set ECT on the SYN-ACK [RFC5562]. This is justified in 593 Section 4.3.2. 595 The responder does not have to back off its retransmission timer 596 because the ECN feedback proves that the network is delivering 597 packets successfully and is not severely overloaded. Also the 598 responder does not have to leave slow start or reduce ssthresh, which 599 is not even required when a SYN-ACK has been lost. 601 The congestion response to CE-marking on a SYN-ACK for a server that 602 implements either the TCP Fast Open experiment (TFO [RFC7413]) or 603 experimentation with an initial window of more than 3 segments 604 (e.g. IW10 [RFC6928]) is discussed in Section 5. 606 3.2.2.3. Fall-Back Following No Response to an ECT SYN-ACK 608 After the responder sends a SYN-ACK with ECT set, if its 609 retransmission timer expires it SHOULD retransmit one more SYN-ACK 610 with ECT set (and back-off its timer as usual). If the timer expires 611 again, it SHOULD retransmit the SYN-ACK with not-ECT in the IP 612 header. 
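The fall-back schedule above is the same for an ECT SYN (Section 3.2.1.4) and an ECT SYN-ACK, and can be sketched as follows. This is an illustrative sketch only, not part of the specification. The names ECT_ATTEMPTS, handshake_codepoint and retransmission_timeout are hypothetical, and the 1 second initial timer with doubling on each expiry is assumed from the usual back-off of [RFC6298].

```python
from enum import Enum


class EcnCodepoint(Enum):
    """The four codepoints of the 2-bit IP/ECN field."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


# Hypothetical constant: the initial transmission plus one retransmission
# carry ECT; later retransmissions fall back to not-ECT.
ECT_ATTEMPTS = 2


def handshake_codepoint(transmission_index, ect=EcnCodepoint.ECT_0):
    """IP/ECN codepoint for the n-th transmission of a SYN or SYN-ACK.

    transmission_index 0 is the initial transmission, 1 is the first
    retransmission, and so on.
    """
    if transmission_index < 0:
        raise ValueError("transmission_index must be non-negative")
    if transmission_index < ECT_ATTEMPTS:
        return ect
    return EcnCodepoint.NOT_ECT


def retransmission_timeout(transmission_index, initial_rto=1.0):
    """Timer armed after the n-th transmission, backed off as usual.

    Assumes simple doubling from an initial value of 1 second.
    """
    return initial_rto * (2 ** transmission_index)


if __name__ == "__main__":
    for n in range(4):
        print(n, handshake_codepoint(n).name, retransmission_timeout(n))
```

The fall-back to not-ECT only changes the IP header of the retransmission. Fall-back for any other experimental fields or options on the same packet would follow their own specifications.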
If other experimental fields or options were on the initial 613 SYN-ACK, it will also be necessary to follow their specifications for 614 fall-back. It would make sense to co-ordinate all the strategies for 615 fall-back in order to isolate the specific cause of the problem. 617 This fall-back strategy attempts to use ECT one more time than the 618 strategy for ECT SYN-ACKs in [RFC5562] (which is made obsolete, being 619 superseded by the present specification). Other fall-back strategies 620 MAY be adopted if found to be more effective, e.g. fall-back to not- 621 ECT on the first retransmission attempt. 623 The server MAY cache failed connection attempts, e.g. per client 624 access network. A client-based alternative to caching at the server 625 is given in Section 4.3.3. If the TCP server is caching failed 626 connection attempts, it SHOULD NOT give up using ECT on the first 627 SYN-ACK of subsequent connection attempts until it is clear that the 628 blockage persistently and specifically affects ECT on SYN-ACKs. This 629 is because loss is so commonplace for other reasons (see 630 Section 3.2.1.4). If a cache is used, it SHOULD be arranged to 631 expire so that the server will infrequently attempt to check whether 632 the problem has been resolved. 634 3.2.3. Pure ACK (Send) 636 A Pure ACK is an ACK packet that does not carry data, which includes 637 the Pure ACK at the end of TCP's 3-way handshake. 639 For the ECN++ experiment, whether a TCP implementation sets ECT on a 640 Pure ACK depends on whether or not Accurate ECN TCP feedback 641 [I-D.ietf-tcpm-accurate-ecn] has been successfully negotiated for a 642 particular TCP connection, as specified in the following two 643 subsections. 645 3.2.3.1. Pure ACK without AccECN Feedback 647 If AccECN has not been successfully negotiated for a connection, ECT 648 MUST NOT be set on Pure ACKs by either end. 650 3.2.3.2. 
Pure ACK with AccECN Feedback 652 For the ECN++ experiment, if AccECN has been successfully negotiated, 653 either end of the connection will set ECT on Pure ACKs. They can 654 ignore the requirement in section 6.1.4 of RFC 3168 to set not-ECT on 655 a pure ACK, as per Section 4.3 of [RFC8311]. 657 MEASUREMENTS NEEDED: Measurements are needed to learn how the 658 deployed base of network elements and RFC 3168 servers react to 659 pure ACKs marked with the ECT(0)/ECT(1)/CE codepoints, 660 i.e. whether they are dropped, codepoint cleared or processed and 661 the congestion indication fed back on a subsequent packet. 663 See Section 3.3.3 for the implications if a host receives a CE-marked 664 Pure ACK. 666 3.2.3.2.1. Pure ACK Congestion Response 668 As explained above, this subsection only applies if AccECN has been 669 successfully negotiated for the TCP connection. 671 A host that sets ECT on pure ACKs SHOULD respond to the congestion 672 signal resulting from pure ACKs being marked with the CE codepoint. 673 The specific response will need to be defined as an update to each 674 congestion control specification. Possible responses to congestion 675 feedback include reducing the congestion window (CWND) and/or 676 regulating the pure ACK rate (see Section 4.4.1.1). 678 Note that, in comparison, TCP Congestion Control [RFC5681] does not 679 require a TCP to detect or respond to loss of pure ACKs at all; it 680 requires no reduction in congestion window or ACK rate. 682 3.2.4. Window Probe (Send) 684 For the ECN++ experiment, the TCP sender will set ECT on window 685 probes. It can ignore the prohibition in section 6.1.6 of RFC 3168 686 against setting ECT on a window probe, as per Section 4.3 of 687 [RFC8311]. 689 A window probe contains a single octet, so it is no different from a 690 regular TCP data segment. Therefore a TCP receiver will feed back 691 any CE marking on a window probe as normal (either using classic ECN 692 feedback or AccECN feedback). 
The sender of the probe will then 693 reduce its congestion window as normal. 695 A receive window of zero indicates that the application is not 696 consuming data fast enough and does not imply anything about network 697 congestion. Once the receive window opens, the congestion window 698 might become the limiting factor, so it is correct that CE-marked 699 probes reduce the congestion window. This complements cwnd 700 validation [RFC7661], which reduces cwnd as more time elapses without 701 having used available capacity. However, CE-marking on window probes 702 does not reduce the rate of the probes themselves. This is unlikely 703 to present a problem, given the duration between window probes 704 doubles [RFC1122] as long as the receiver is advertising a zero 705 window (currently minimum 1 second, maximum at least 1 minute 706 [RFC6298]). 708 MEASUREMENTS NEEDED: Measurements are needed to learn how the 709 deployed base of network elements and servers react to Window 710 probes marked with the ECT(0)/ECT(1)/CE codepoints, i.e. whether 711 they are dropped, codepoint cleared or processed. 713 3.2.5. FIN (Send) 715 A TCP implementation can set ECT on a FIN. 717 See Section 3.3.4 for the implications if a host receives a CE-marked 718 FIN. 720 A congestion response to a CE-marking on a FIN is not required. 722 After sending a FIN, the endpoint will not send any more data in the 723 connection. Therefore, even if the FIN-ACK indicates that the FIN 724 was CE-marked (whether using classic or AccECN feedback), reducing 725 the congestion window will not affect anything. 727 After sending a FIN, a host might send one or more pure ACKs. If it 728 is using one of the techniques in Section 3.2.3 to regulate the 729 delayed ACK ratio for pure ACKs, it could equally be applied after a 730 FIN. But this is not required. 
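The send-side rules of Sections 3.2.2 to 3.2.5 can be summarized in a small sketch. It is illustrative only and assumes that the connection is using ECN in the first place. The names PacketType, sets_ect and congestion_response_expected are hypothetical. For a FIN the text says only that ECT "can" be set, so always setting it is a choice made here, not a requirement.

```python
from enum import Enum, auto


class PacketType(Enum):
    SYN_ACK = auto()
    PURE_ACK = auto()
    WINDOW_PROBE = auto()
    FIN = auto()


def sets_ect(packet_type, accecn_negotiated):
    """Whether an ECN++ sender sets ECT on this type of packet.

    Pure ACKs carry ECT only if AccECN feedback has been negotiated.
    The other types covered here carry ECT with either classic ECN
    or AccECN feedback.
    """
    if packet_type is PacketType.PURE_ACK:
        return accecn_negotiated
    return True


def congestion_response_expected(packet_type):
    """Whether feedback of a CE mark on this packet calls for a response.

    A FIN needs no response because no more data follows it.
    """
    return packet_type is not PacketType.FIN
```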
732 MEASUREMENTS NEEDED: Measurements are needed to learn how the 733 deployed base of network elements and servers react to FIN packets 734 marked with the ECT(0)/ECT(1)/CE codepoints, i.e. whether they are 735 dropped, codepoint cleared or processed. 737 3.2.6. RST (Send) 739 A TCP implementation can set ECT on a RST. 741 See Section 3.3.5 for the implications if a host receives a CE-marked 742 RST. 744 A congestion response to a CE-marking on a RST is not required (and 745 actually not possible). 747 MEASUREMENTS NEEDED: Measurements are needed to learn how the 748 deployed base of network elements and servers react to RST packets 749 marked with the ECT(0)/ECT(1)/CE codepoints, i.e. whether they are 750 dropped, codepoint cleared or processed. 752 Implementers SHOULD ensure that RST packets (and control packets 753 generally) are always sent out with the same ECN field regardless of 754 the TCP state machine. Otherwise the ECN field could reveal internal 755 TCP state. For instance, the ECN field on a RST ought not to reveal 756 any distinction between a non-listening port, a recently in-use port, 757 and a closed session port. 759 3.2.7. Retransmissions (Send) 761 For the ECN++ experiment, the TCP sender will set ECT on 762 retransmitted segments. It can ignore the prohibition in section 763 6.1.5 of RFC 3168 against setting ECT on retransmissions, as per 764 Section 4.3 of [RFC8311]. 766 See Section 3.3.6 for the implications if a host receives a CE-marked 767 retransmission. 769 If the TCP sender receives feedback that a retransmitted packet was 770 CE-marked, it will react as it would to any feedback of CE-marking on 771 a data packet. 773 MEASUREMENTS NEEDED: Measurements are needed to learn how the 774 deployed base of network elements and servers react to 775 retransmissions marked with the ECT(0)/ECT(1)/CE codepoints, 776 i.e. whether they are dropped, codepoint cleared or processed. 778 3.2.8. 
General Fall-back for any Control Packet or Retransmission 780 Extensive measurements in fixed and mobile networks [Mandalari18] 781 have found no evidence of blockages due to ECT being set on any type 782 of TCP control packet. 784 In case traversal problems arise in future, fall-back measures have 785 been specified above, but only for the cases where ECT on the initial 786 packet of a half-connection (SYN or SYN-ACK) is persistently failing 787 to get through. 789 Fall-back measures for blockage of ECT on other TCP control packets 790 MAY be implemented. However they are not specified here given the 791 lack of any evidence they will be needed. Section 4.9 justifies this 792 advice in more detail. 794 3.3. Receiver Behaviour 796 The present ECN++ specification primarily concerns the behaviour for 797 sending TCP control packets or retransmissions. Below are a few 798 changes to the receive side of an implementation that are recommended 799 while updating its send side. Nonetheless, where deployment is 800 concerned, ECN++ is still a sender-only deployment, because it does 801 not depend on receivers complying with any of these recommendations. 803 3.3.1. Receiver Behaviour for Any TCP Control Packet or Retransmission 805 RFC8311 is a standards track update to RFC 3168 in order to (amongst 806 other things) "...allow the use of ECT codepoints on SYN packets, 807 pure acknowledgement packets, window probe packets, and 808 retransmissions of packets..., provided that the changes from RFC 809 3168 are documented in an Experimental RFC in the IETF document 810 stream." 812 Section 4.3 of RFC 8311 amends every statement in RFC 3168 that 813 precludes the use of ECT on control packets and retransmissions to 814 add "unless otherwise specified by an Experimental RFC in the IETF 815 document stream". The present specification is such an Experimental 816 RFC. 
Therefore, in order for the present RFC 8311 experiment to be 817 useful, TCP receivers will need to satisfy the following 818 requirements: 820 * Any TCP implementation SHOULD accept receipt of any valid TCP 821 control packet or retransmission irrespective of its IP/ECN field. 822 If any existing implementation does not, it SHOULD be updated to 823 do so. 825 * A TCP implementation taking part in the experiments proposed here 826 MUST accept receipt of any valid TCP control packet or 827 retransmission irrespective of its IP/ECN field. 829 The following sections give further requirements specific to each 830 type of control packet. 832 These measures are derived from the robustness principle of "... be 833 liberal in what you accept from others", not only to ensure 834 compatibility with the present experimental specification, but also 835 with any future protocol changes that allow ECT on any TCP packet. 837 3.3.2. SYN (Receive) 839 RFC 3168 negotiates the use of ECN for the connection end-to-end 840 using the ECN flags in the TCP header. RFC 3168 originally said that 841 "A host MUST NOT set ECT on SYN ... packets." but it was silent as to 842 what a TCP server ought to do if it receives a SYN packet with a non- 843 zero IP/ECN field anyway. 845 For the avoidance of doubt, the normative statements for all TCP 846 control packets in Section 3.3.1 are interpreted for the specific 847 case when a SYN is received as follows: 849 * Any TCP server implementation SHOULD accept receipt of a valid SYN 850 that requests ECN support for the connection, irrespective of the 851 IP/ECN field of the SYN. If any existing implementation does not, 852 it SHOULD be updated to do so. 854 * A TCP implementation taking part in the ECN++ experiment MUST 855 accept receipt of a valid SYN, irrespective of its IP/ECN field. 857 * If the SYN is CE-marked and the server has no logic to feed back a 858 CE mark on a SYN-ACK (e.g.
it does not support AccECN), it has to 859 ignore the CE-mark (the client detects this case and behaves 860 conservatively in mitigation - see Section 3.2.1.3). 862 Rationale: At the time of the writing, some implementations of TCP 863 servers (see Section 4.2.2.2) assume that, if a host receives a SYN 864 with a non-zero IP/ECN field, it must be due to network mangling, and 865 they disable ECN for the rest of the connection. Section 4.2.2.2 866 cites a measurement study run in 2017 that found no occurrence of 867 this type of network mangling. However, a year earlier, when ECN was 868 enabled on connections from Apple clients, there was a case of a 869 whole network that re-marked the ECN field of every packet to CE (it 870 was rapidly fixed). 872 When ECN was not allowed on SYNs, it made sense to look for a non- 873 zero ECN field on the SYN to detect this type of network mangling. 874 But now that ECN is being allowed on a SYN, detection needs to be 875 more nuanced. A server needs to disable the test on the SYN alone 876 for AccECN SYNs (which was done for Linux RFC 3168 servers in 2019 877 [relax-strict-ecn]) and for RFC 3168 SYNs it needs to watch for three 878 or four packets all set to CE at the start of a flow. If such 879 mangling is indeed now so rare, it would also be preferable to log 880 each case detected and manually report it to the responsible network, 881 so that the problem will eventually be eliminated. 883 3.3.3. Pure ACK (Receive) 885 For the avoidance of doubt, the normative statements for all TCP 886 control packets in Section 3.3.1 are interpreted for the specific 887 case when a Pure ACK is received as follows: 889 * Any TCP implementation SHOULD accept receipt of a pure ACK with a 890 non-zero ECN field, despite current RFCs precluding the sending of 891 such packets. 893 * A TCP implementation taking part in the ECN++ experiment MUST 894 accept receipt of a pure ACK with a non-zero ECN field. 
896 The question of whether and how the receiver of pure ACKs is required 897 to feed back any CE marks on them is outside the scope of the present 898 specification because it is a matter for the relevant feedback 899 specification ([RFC3168] or [I-D.ietf-tcpm-accurate-ecn]). AccECN 900 feedback is required to count CE marking of any control packet 901 including pure ACKs. RFC 3168, by contrast, is silent on this point, so 902 feedback of CE-markings might be implementation specific (see 903 Section 4.4.1.1). 905 3.3.4. FIN (Receive) 907 The TCP data receiver MUST ignore the CE codepoint on incoming FINs 908 that fail any validity check. The validity check in section 5.2 of 909 [RFC5961] is RECOMMENDED. 911 3.3.5. RST (Receive) 913 The "challenge ACK" approach to checking the validity of RSTs 914 (section 3.2 of [RFC5961]) is RECOMMENDED at the data receiver. 916 3.3.6. Retransmissions (Receive) 918 The TCP data receiver MUST ignore the CE codepoint on incoming 919 segments that fail any validity check. The validity check in section 920 5.2 of [RFC5961] is RECOMMENDED. This will effectively mitigate an 921 attack that uses spoofed data packets to fool the receiver into 922 feeding back spoofed congestion indications to the sender, which in 923 turn would be fooled into continually reducing its congestion window. 925 4. Rationale 927 This section is informative, not normative. It presents counter- 928 arguments against the justifications in the RFC series for disabling 929 ECN on TCP control segments and retransmissions. It also gives the 930 rationale for why ECT is safe on control segments that have not, so 931 far, been mentioned in the RFC series. First it addresses over- 932 arching arguments used for most packet types, then it addresses the 933 specific arguments for each packet type in turn. 935 4.1.
The Reliability Argument 937 Section 5.2 of RFC 3168 states: 939 "To ensure the reliable delivery of the congestion indication of 940 the CE codepoint, an ECT codepoint MUST NOT be set in a packet 941 unless the loss of that packet [at a subsequent node] in the 942 network would be detected by the end nodes and interpreted as an 943 indication of congestion." 945 We believe this argument is misplaced. TCP does not deliver most 946 control packets reliably. So it is more important to allow control 947 packets to be ECN-capable, which greatly improves reliable delivery 948 of the control packets themselves (see motivation in Section 1.1). 949 ECN also improves the reliability and latency of delivery of any 950 congestion notification on control packets, particularly because TCP 951 does not detect the loss of most types of control packet anyway. 952 Both these points outweigh by far the concern that a CE marking 953 applied to a control packet by one node might subsequently be dropped 954 by another node. 956 The principle to determine whether a packet can be ECN-capable ought 957 to be "do no extra harm", meaning that the reliability of a 958 congestion signal's delivery ought to be no worse with ECN than 959 without. In particular, setting the CE codepoint on the very same 960 packet that would otherwise have been dropped fulfills this 961 criterion, since either the packet is delivered and the CE signal is 962 delivered to the endpoint, or the packet is dropped and the original 963 congestion signal (packet loss) is delivered to the endpoint. 965 The concern about a CE marking being dropped at a subsequent node 966 might be motivated by the idea that ECN-marking a packet at the first 967 node does not remove the packet, so it could go on to worsen 968 congestion at a subsequent node. However, it is not useful to reason 969 about congestion by considering single packets. 
The departure rate 970 from the first node will generally be the same (fully utilized) with 971 or without ECN, so this argument does not apply. 973 4.2. SYNs 975 RFC 5562 presents two arguments against ECT marking of SYN packets 976 (quoted verbatim): 978 "First, when the TCP SYN packet is sent, there are no guarantees 979 that the other TCP endpoint (node B in Figure 2) is ECN-Capable, 980 or that it would be able to understand and react if the ECN CE 981 codepoint was set by a congested router. 983 Second, the ECN-Capable codepoint in TCP SYN packets could be 984 misused by malicious clients to "improve" the well-known TCP SYN 985 attack. By setting an ECN-Capable codepoint in TCP SYN packets, a 986 malicious host might be able to inject a large number of TCP SYN 987 packets through a potentially congested ECN-enabled router, 988 congesting it even further." 990 The first point actually describes two subtly different issues. So 991 below three arguments are countered in turn. 993 4.2.1. Argument 1a: Unrecognized CE on the SYN 995 This argument certainly applied at the time RFC 5562 was written, 996 when no ECN responder mechanism had any logic to recognize a CE 997 marking on a SYN and, even if logic were added, there was no field in 998 the SYN-ACK to feed it back. The problem was that, during the 3WHS, 999 the flag in the TCP header for ECN feedback (called Echo Congestion 1000 Experienced) had been overloaded to negotiate the use of ECN itself. 1002 The accurate ECN (AccECN) protocol [I-D.ietf-tcpm-accurate-ecn] has 1003 since been designed to solve this problem. Two features are 1004 important here: 1006 1. An AccECN server uses the 3 'ECN' bits in the TCP header of the 1007 SYN-ACK to respond to the client. 4 of the possible 8 codepoints 1008 provide enough space for the server to feed back which of the 4 1009 IP/ECN codepoints was on the incoming SYN (including CE of 1010 course). 1012 2. 
If any of these 4 codepoints are in the SYN-ACK, it confirms that 1013 the server supports AccECN and, if another codepoint is returned, 1014 it confirms that the server doesn't support AccECN. 1016 This still does not seem to allow a client to set ECT on a SYN, because 1017 it only finds out afterwards whether the server would have supported it. 1018 The trick the client uses for ECN++ is to set ECT on the SYN 1019 optimistically; then, if the SYN-ACK reveals that the server wouldn't 1020 have understood CE on the SYN, the client responds conservatively as 1021 if the SYN was marked with CE. 1023 The recommended conservative congestion response is to reduce the 1024 initial window, which does not affect the performance of very popular 1025 protocols such as HTTP, since it is extremely rare for an HTTP client 1026 to send more than one packet as its initial request anyway (for data 1027 on HTTP/1 & HTTP/2 request sizes see Fig 3 in [Manzoor17]). Any 1028 clients that do frequently use a larger initial window for their 1029 first message to the server can cache which servers will not 1030 understand ECT on a SYN (see Section 4.2.3 below). If caching is not 1031 practical, such clients could reduce the initial window to, say, IW2 or 1032 IW3. 1034 EXPERIMENTATION NEEDED: Experiments will be needed to determine 1035 any better strategy for reducing IW in response to congestion on a 1036 SYN, when the server does not support congestion feedback on the 1037 SYN-ACK (whether cached or discovered explicitly). 1039 4.2.2. Argument 1b: ECT Considered Invalid on the SYN 1041 Given that ECT-marked SYN packets have been prohibited until now, it 1042 cannot be assumed that TCP middleboxes or servers will accept 1043 them. 1045 4.2.2.1.
ECT on SYN Considered Invalid by Middleboxes 1047 According to a study using 2014 data [ecn-pam] from a limited range 1048 of fixed vantage points, for the top 1M Alexa web sites, adding the 1049 ECN capability to SYNs was increasing connection establishment 1050 failures by about 0.4%. 1052 From a wider range of fixed and mobile vantage points, a more recent 1053 study in Jan-May 2017 [Mandalari18] found no occurrences of blocking 1054 of ECT on SYNs. However, in more than half the mobile networks 1055 tested it found wiping of the ECN codepoint at the first hop. 1057 MEASUREMENTS NEEDED: As wiping at the first hop is remedied, 1058 measurements will be needed to check whether SYNs with ECT are 1059 sometimes blocked deeper into the path. 1061 Silent failures introduce a retransmission timeout delay (default 1 1062 second) at the initiator before it attempts any fall back strategy 1063 (whereas explicit RSTs can be dealt with immediately). Ironically, 1064 making SYNs ECN-capable is intended to avoid the timeout when a SYN 1065 is lost due to congestion. Fortunately, if there is any discard of 1066 ECN-capable SYNs due to policy, it will occur predictably, not 1067 randomly like congestion. So the initiator should be able to avoid 1068 it by caching those sites that do not support ECN-capable SYNs (see 1069 the last paragraph of Section 3.2.1.2). 1071 4.2.2.2. ECT on SYN Considered Invalid by Servers 1073 A study conducted in Nov 2017 [Kuehlewind18] found that, of the 82% 1074 of the Alexa top 50k web servers that supported ECN, 84% disabled ECN 1075 if the IP/ECN field on the SYN was ECT0, CE or either. Given most 1076 web servers use Linux, this behaviour can most likely be traced to a 1077 patch contributed in May 2012 that was first distributed in v3.5 of 1078 the Linux kernel [strict-ecn]. The comment says "RFC3168 : 6.1.1 SYN 1079 packets must not have ECT/ECN bits set. 
If we receive a SYN packet 1080 with these bits set, it means a network is playing bad games with TOS 1081 bits. In order to avoid possible false congestion notifications, we 1082 disable TCP ECN negociation." Of course, some of the 84% might be 1083 due to similar code in other OSs. 1085 For brevity we shall call this the "over-strict" ECN test, because it 1086 is over-conservative with what it accepts, contrary to Postel's 1087 robustness principle. A robust protocol will not usually assume 1088 network mangling without comparing with the value originally sent, 1089 and one packet is not sufficient to make an assumption with such 1090 irreversible consequences anyway. 1092 Ironically, networks rarely seem to alter the IP/ECN field on a SYN 1093 from zero to non-zero anyway. In a study conducted in Jan-May 2017 1094 over millions of paths from vantage points in a few dozen mobile and 1095 fixed networks [Mandalari18], no such transition was observed. With 1096 such a small or non-existent incidence of this sort of network 1097 mangling, it would be preferable to report any residual problem paths 1098 so that they can be fixed. 1100 In any case, the widespread presence of this 'over-strict' test proves 1101 that RFC 5562 was correct to expect that ECT would be considered 1102 invalid on SYNs. Nonetheless, it is not an insurmountable problem: 1103 the over-strict test in Linux was patched in Apr 2019 1104 [relax-strict-ecn] and caching can work round it where previous 1105 versions of Linux are running. The prevalence of these "over-strict" 1106 ECN servers makes it challenging to cache them all. However, 1107 Section 4.2.3 below explains how a cache of limited size can 1108 alleviate this problem for a client's most popular sites. 1110 For the future, [RFC8311] updates RFC 3168 to clarify that the IP/ECN 1111 field does not have to be zero on a SYN if documented in an 1112 experimental RFC such as the present ECN++ specification. 1114 4.2.3.
Caching Strategies for ECT on SYNs 1116 Given the server handling of ECN on SYNs outlined in Section 4.2.2.2 1117 above, an initiator might combine AccECN with three candidate caching 1118 strategies for setting ECT on a SYN: 1120 (S1): Pessimistic ECT and cache successes: The initiator always 1121 requests AccECN, but by default without ECT on the SYN. Then 1122 it caches those servers that confirm that they support AccECN 1123 as 'ECT SYN OK'. On a subsequent connection to any server 1124 that supports AccECN, the initiator can then set ECT on the 1125 SYN. When connecting to other servers (non-ECN or classic 1126 ECN) it will not set ECT on the SYN, so it will not fail the 1127 'over-strict' ECN test. 1129 Longer term, as servers upgrade to AccECN, the initiator is 1130 still requesting AccECN, so it will add them to the cache and 1131 use ECT on subsequent SYNs to those servers. However, 1132 assuming it has to cap the size of the cache, the client will 1133 not have the benefit of ECT SYNs to those less frequently used 1134 AccECN servers expelled from its cache. 1136 (S2): Optimistic ECT: The initiator always requests AccECN and by 1137 default sets ECT on the SYN. Then, if the server response 1138 shows it has no AccECN logic (so it cannot feed back a CE 1139 mark), the initiator conservatively behaves as if the SYN was 1140 CE-marked, by reducing its initial window. 1142 a. No cache. 1144 b. Cache failures: The optimistic ECT strategy can be 1145 improved by caching solely those servers that do not 1146 support AccECN as 'ECT SYN NOK'. This would include non- 1147 ECN servers and all Classic ECN servers whether 'over- 1148 strict' or not. On subsequent connections to these non- 1149 AccECN servers, the initiator will still request AccECN 1150 but not set ECT on the SYN. 
Then, the connection can 1151 still fall back to Classic ECN, if the server supports it, 1152 and the initiator can use its full initial window (if it 1153 has enough request data to need it). 1155 Longer term, as servers upgrade to AccECN, the initiator 1156 will remove them from the cache and use ECT on subsequent 1157 SYNs to that server. 1159 Where an access network operator mediates Internet access 1160 via a proxy that does not support AccECN, the optimistic 1161 ECT strategy will always fail. This scenario is more 1162 likely in mobile networks. Therefore, a mobile host could 1163 cache lack of AccECN support per attached access network 1164 operator. Whenever it attached to a new operator, it 1165 could check a well-known AccECN test server and, if it 1166 found no AccECN support, it would add a cache entry for 1167 the attached operator. It would only use ECT when neither 1168 network nor server were cached. It would only populate 1169 its per server cache when not attached to a non-AccECN 1170 proxy. 1172 (S3): ECT by configuration: In a controlled environment, the 1173 administrator can make sure that servers support ECN-capable 1174 SYN packets. Examples of controlled environments are single- 1175 tenant DCs, and possibly multi-tenant DCs if it is assumed 1176 that each tenant mostly communicates with its own VMs. 1178 For unmanaged environments like the public Internet, pragmatically 1179 the choice is between strategies (S1), (S2A) and (S2B). The 1180 normative specification for ECT on a SYN in Section 3.2.1 recommends 1181 the "optimistic ECT and cache failures" strategy (S2B) but the choice 1182 depends on the implementer's motivation for using ECN++, and the 1183 deployment prevalence of different technologies and bug-fixes. 
1185 * The "pessimistic ECT and cache successes" strategy (S1) suffers 1186 from exposing the initial SYN to the prevailing loss level, even 1187 if the server supports ECT on SYNs, but only on the first 1188 connection to each AccECN server. If AccECN becomes widely 1189 deployed on servers, SYNs to those AccECN servers that are less 1190 frequently used by the client and therefore don't fit in the cache 1191 will not benefit from ECN protection at all. 1193 * The "optimistic ECT without a cache" strategy (S2A) is the 1194 simplest. It would satisfy the goal of an implementer who is 1195 solely interested in low latency using AccECN and ECN++ and is not 1196 concerned about fall-back to Classic ECN. 1198 * The "optimistic ECT and cache failures" strategy (S2B) exploits 1199 ECT on SYNs from the very first attempt. But if the server turns 1200 out to be 'over-strict' it will disable ECN for the connection, 1201 but only for the first connection if it's one of the client's more 1202 popular servers that fits in the cache. If the server turns out 1203 not to support AccECN, the initiator has to conservatively limit 1204 its initial window, but again only for the first connection if 1205 it's one of the client's more popular servers (and anyway this 1206 rarely makes any difference when most client requests fit in a 1207 single packet). 1209 Note that, if AccECN deployment grows, caching successes (S1) starts 1210 off small then grows, while caching failures (S2B) becomes large at 1211 first, then shrinks. At half-way, the size of the cache has to be 1212 capped with either approach, so the default behaviour for all the 1213 servers that do not fit in the cache is as important as the behaviour 1214 for the popular servers that do fit. 
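As a rough illustration of the "optimistic ECT and cache failures" strategy (S2B) with a capped cache, the sketch below combines a least-recently-used cache of 'ECT SYN NOK' servers with the initial window reduction of Section 3.2.1.3. It is a sketch under stated assumptions, not a definitive implementation. The class and function names, the cache size of 256 entries, the one-day expiry and the per-server key are all hypothetical choices that this specification does not define.

```python
import time
from collections import OrderedDict


class EctSynFailureCache:
    """Capped, expiring cache of servers marked 'ECT SYN NOK' (S2B)."""

    def __init__(self, max_entries=256, ttl_seconds=86400.0,
                 clock=time.monotonic):
        self._entries = OrderedDict()  # server key -> expiry time
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def should_set_ect_on_syn(self, server):
        """Optimistic default: set ECT unless a live entry says otherwise."""
        expiry = self._entries.get(server)
        if expiry is None:
            return True
        if self._clock() >= expiry:
            # Entry expired, so check whether the problem has been resolved.
            del self._entries[server]
            return True
        self._entries.move_to_end(server)  # mark as recently used
        return False

    def record_syn_ack(self, server, accecn_supported):
        """Update the cache from what the SYN-ACK revealed."""
        if accecn_supported:
            self._entries.pop(server, None)  # server has upgraded
            return
        self._entries[server] = self._clock() + self._ttl
        self._entries.move_to_end(server)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used


def initial_window_segments(configured_iw, sent_ect_on_syn,
                            accecn_supported, syn_ce_marked):
    """Initial window in segments after the handshake (Section 3.2.1.3)."""
    if accecn_supported:
        # The server can report CE on the SYN, so respond only if it did.
        return min(configured_iw, 1) if syn_ce_marked else configured_iw
    # No CE feedback is possible: be conservative if the SYN carried ECT.
    return min(configured_iw, 1) if sent_ect_on_syn else configured_iw
```

Servers that are evicted or were never cached get the optimistic default, so the first connection to them sets ECT on the SYN and, if they turn out not to support AccECN, uses the reduced initial window.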

   MEASUREMENTS NEEDED: Measurements are needed to determine which strategy would be sufficient for any particular client, whether a particular client would need different strategies in different circumstances and how many occurrences of problems would be masked by how few cache entries.

Another strategy would be to send a not-ECT SYN a short delay (below the typical lowest RTT) after an ECT SYN and only accept the non-ECT connection if it returned first. This would reduce the performance penalty for those deploying ECT SYN support. However, this 'happy eyeballs' approach becomes complex when multiple optional features are all tried on the first SYN (or on multiple SYNs), so it is not recommended.

4.2.4. Argument 2: DoS Attacks

[RFC5562] says that ECT SYN packets could be misused by malicious clients to augment "the well-known TCP SYN attack". It goes on to say "a malicious host might be able to inject a large number of TCP SYN packets through a potentially congested ECN-enabled router, congesting it even further."

We assume this is a reference to the TCP SYN flood attack (see https://en.wikipedia.org/wiki/SYN_flood), which is an attack against a responder end point. We assume the idea of this attack is to use ECT to get more packets through an ECN-enabled router in preference to other non-ECN traffic so that they can go on to use the SYN flooding attack to inflict more damage on the responder end point. This argument could apply to flooding with any type of packet, but we assume SYNs are singled out because their source address is easier to spoof, whereas floods of other types of packets are easier to block.

Mandating Not-ECT in an RFC does not stop attackers using ECT for flooding. Nonetheless, if a standard says SYNs are not meant to be ECT it would make it legitimate for firewalls to discard them.
However, this would negate the considerable benefit of ECT SYNs for compliant transports and seems unnecessary because RFC 3168 already provides the means to address this concern. In section 7, RFC 3168 says "During periods where ... the potential packet marking rate would be high, our recommendation is that routers drop packets rather then set the CE codepoint..." and this advice is repeated in [RFC7567] (section 4.2.1). This makes it harder for flooding packets to gain from ECT.

[ecn-overload] showed that ECT can only slightly augment flooding attacks relative to a non-ECT attack. It was hard to overload the link without causing the queue to grow, which in turn caused the AQM to disable ECN and switch to drop, thus negating any advantage of using ECT. This was true even with the switch-over point set to 25% drop probability (i.e. the arrival rate was 133% of the link rate, because dropping 25% of arrivals leaves 75%, and 1/0.75 = 1.33).

4.3. SYN-ACKs

The proposed approach in Section 3.2.2 for experimenting with ECN-capable SYN-ACKs is effectively identical to the scheme called ECN+ [ECN-PLUS]. In 2005, the ECN+ paper demonstrated that it could reduce the average Web response time by an order of magnitude. It also argued that adding ECT to SYN-ACKs did not raise any new security vulnerabilities.

4.3.1. Possibility of Unrecognized CE on the SYN-ACK

The feedback behaviour by the initiator in response to a CE-marked SYN-ACK from the responder depends on whether classic ECN feedback [RFC3168] or AccECN feedback [I-D.ietf-tcpm-accurate-ecn] has been negotiated. In either case no change is required to RFC 3168 or the AccECN specification.

Some classic ECN client implementations might ignore a CE-mark on a SYN-ACK, or even ignore a SYN-ACK packet entirely if it is set to ECT or CE.
This is a possibility because an RFC 3168 implementation would not necessarily expect a SYN-ACK to be ECN-capable. This issue already came up when the IETF first decided to experiment with ECN on SYN-ACKs [RFC5562] and it was decided to go ahead without any extra precautionary measures. This was because the probability of encountering the problem was believed to be low and the harm if the problem arose was also low (see Appendix B of RFC 5562).

4.3.2. Response to Congestion on a SYN-ACK

The IETF has already specified an experiment with ECN-capable SYN-ACK packets [RFC5562]. It was inspired by the ECN+ paper, but it specified a much more conservative congestion response to a CE-marked SYN-ACK, called ECN+/TryOnce. This required the server to reduce its initial window to 1 segment (like ECN+), but then the server had to send a second SYN-ACK and wait for its ACK before it could continue with its initial window of 1 SMSS. The second SYN-ACK of this 5-way handshake had to carry no data, and had to disable ECN, but no justification was given for these last two aspects.

The present ECN++ experimental specification obsoletes RFC 5562 because it uses the ECN+ congestion response, not ECN+/TryOnce. First we argue against the rationale for ECN+/TryOnce given in sections 4.4 and 6.2 of [RFC5562]. It starts with a rather too literal interpretation of the requirement in RFC 3168 that says TCP's response to a single CE mark has to be "essentially the same as the congestion control response to a *single* dropped packet." TCP's response to a dropped initial (SYN or SYN-ACK) packet is to wait for the retransmission timer to expire (currently 1s). However, this long delay assumes the worst case between two possible causes of the loss: a) heavy overload; or b) the normal capacity-seeking behaviour of other TCP flows.
When the network is still delivering CE-marked packets, it implies that there is an AQM at the bottleneck and that it is not overloaded. This is because an AQM under overload will disable ECN (as recommended in section 7 of RFC 3168 and repeated in section 4.2.1 of RFC 7567). So scenario (a) can be ruled out. Therefore, TCP's response to a CE-marked SYN-ACK can be similar to its response to the loss of _any_ packet, rather than backing off as if the special _initial_ packet of a flow has been lost.

How TCP responds to the loss of any single packet depends on what it has just been doing. But there is not really a precedent for TCP's response when it experiences a CE mark having sent only one (small) packet. If TCP had been adding one segment per RTT, it would have halved its congestion window, but it hasn't established a congestion window yet. If it had been exponentially increasing it would have exited slow start, but it hasn't started exponentially increasing yet so it hasn't established a slow-start threshold.

Therefore, we have to work out a reasoned argument for what to do. If an AQM is CE-marking packets, it implies there is already a queue and it is probably already somewhere around the AQM's operating point - it is unlikely to be well below and it might be well above. So, the more data packets that the client sends in its IW, the more likely at least one will be CE marked, leading it to exit slow-start early. On the other hand, it is highly unlikely that the SYN-ACK itself pushed the AQM into congestion, so it will be safe to introduce another single segment immediately (1 RTT after the SYN-ACK). Therefore, starting to probe for capacity with a slow start from an initial window of 1 segment seems appropriate to the circumstances. This is the approach adopted in Section 3.2.2.
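
The reasoning above can be illustrated with a short, non-normative Python sketch. The function names and the default of 10 segments are assumptions for this example, and the doubling per round trip is an idealisation that ignores delayed ACKs and any further marks or losses.

```python
def responder_initial_window(syn_ack_ce_marked, default_iw_segments=10):
    """Initial window, in segments, that the responder starts with.

    default_iw_segments is an assumed configuration value, not a
    requirement of this specification.
    """
    return 1 if syn_ack_ce_marked else default_iw_segments


def slow_start_windows(iw_segments, rounds):
    """Window at the start of each round trip under idealised slow start."""
    return [iw_segments * 2 ** r for r in range(rounds)]
```

For example, slow_start_windows(responder_initial_window(True), 4) yields the windows 1, 2, 4 and 8 segments: the responder probes with a single segment one round trip after the SYN-ACK and then grows its window exponentially as usual.
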

   EXPERIMENTATION NEEDED: Experiments will be needed to check the above reasoning and determine any better strategy for reducing IW in response to congestion on a SYN-ACK (or a SYN).

4.3.3. Fall-Back if ECT SYN-ACK Fails

An alternative to the server caching failed connection attempts would be for the server to rely on the client caching failed attempts (on the basis that the client would cache a failure whether ECT was blocked on the SYN or the SYN-ACK). This strategy cannot be used if the SYN does not request AccECN support. It works as follows: if the server receives a SYN that requests AccECN support but is set to not-ECT, it replies with a SYN-ACK also set to not-ECT. If a middlebox only blocks ECT on SYNs, not SYN-ACKs, this strategy might disable ECN on a SYN-ACK when it did not need to, but at least it saves the server from maintaining a cache.

4.4. Pure ACKs

Section 5.2 of RFC 3168 gives the following arguments for not allowing the ECT marking of pure ACKs (ACKs not piggy-backed on data):

   "To ensure the reliable delivery of the congestion indication of the CE codepoint, an ECT codepoint MUST NOT be set in a packet unless the loss of that packet in the network would be detected by the end nodes and interpreted as an indication of congestion.

   Transport protocols such as TCP do not necessarily detect all packet drops, such as the drop of a "pure" ACK packet; for example, TCP does not reduce the arrival rate of subsequent ACK packets in response to an earlier dropped ACK packet. Any proposal for extending ECN-Capability to such packets would have to address issues such as the case of an ACK packet that was marked with the CE codepoint but was later dropped in the network.
   We believe that this aspect is still the subject of research, so this document specifies that at this time, "pure" ACK packets MUST NOT indicate ECN-Capability."

Later on, in section 6.1.4 it reads:

   "For the current generation of TCP congestion control algorithms, pure acknowledgement packets (e.g., packets that do not contain any accompanying data) MUST be sent with the not-ECT codepoint. Current TCP receivers have no mechanisms for reducing traffic on the ACK-path in response to congestion notification. Mechanisms for responding to congestion on the ACK-path are areas for current and future research. (One simple possibility would be for the sender to reduce its congestion window when it receives a pure ACK packet with the CE codepoint set). For current TCP implementations, a single dropped ACK generally has only a very small effect on the TCP's sending rate."

We next address each of the arguments presented above.

The first argument is a specific instance of the reliability argument for the case of pure ACKs. This has already been addressed by countering the general reliability argument in Section 4.1.

The second argument says that ECN ought not to be enabled unless there is a mechanism to respond to it. This argument actually comprises three sub-arguments:

Mechanism feasibility: If ECN is enabled on Pure ACKs, are there, or could there be, suitable mechanisms to detect, feed back and respond to ECN-marked Pure ACKs?

Do no extra harm: There has never been a mechanism to respond to loss of non-ECN Pure ACKs. So it seems that adding ECN without a response mechanism will do no extra harm to others, while improving a connection's own performance (because loss of an ACK holds back new data).
However, if the end systems have no response mechanism, ECN Pure ACKs do slightly more harm than non-ECN, because the AQM doesn't immediately clear ECT packets from the queue until it reaches overload and disables ECN.

Standards policy: Even if there were no harm to others, does it set an undesirable precedent to allow a flow to use ECN to protect its Pure ACKs from loss, when there is no mechanism to respond to ECN-marking?

The last two arguments involve value judgements, but they both depend on the concrete technical question of mechanism feasibility, which will therefore be addressed first in Section 4.4.1 below. Then Section 4.4.2 draws conclusions by addressing the value judgements in the other two questions.

4.4.1. Mechanisms to Respond to CE-Marked Pure ACKs

The question of whether the receiver of pure ACKs is required to detect and feed back any CE-marking is outside the scope of the present specification - it is a matter for the relevant feedback specification (classic ECN [RFC3168] and AccECN [I-D.ietf-tcpm-accurate-ecn]). The response to congestion feedback is also out of scope, because it would be defined in the base TCP congestion control specification [RFC5681] or its variants.

Nonetheless, in order to decide whether the present ECN++ experimental specification should require a host to set ECT on pure ACKs, we only need to know whether a response mechanism would be feasible - we do not have to standardize it. So the bullets below assess, for each type of feedback, whether the three stages of the congestion response mechanism could all work.

Detection: Can the receiver of a pure ACK detect a CE marking on it?:

* Classic feedback: RFC 3168 is silent on this point.
  The implementer of the receiver would not expect CE marks on pure ACKs, but the implementation might happen to check for CE marks before it looks for the data. So detection will be implementation-dependent.

* AccECN feedback: the AccECN specification requires the receiver of any TCP packets to count any CE marks on them (whether or not it sends ECN-capable control packets itself).

Feedback: As a general rule, TCP does not ACK a pure ACK. However, even if the receiver of a CE-mark on a pure ACK does not feed it back immediately, it could still include it within subsequent feedback, for instance when it later sends a data segment (if it ever does):

* Classic feedback: RFC 3168 is silent on this point, so feedback of CE-markings might be implementation specific. If the receiver (of the pure ACKs) did generate feedback, it would set the echo congestion experienced (ECE) flag in the TCP header of subsequent packets in the round, as it would to feed back CE on data packets.

* AccECN feedback: the receiver continually feeds back a count of the number of CE-marked packets that it has received and, optionally, a count of CE-marked bytes. For either metric, AccECN takes into account all types of packets, including pure ACKs. CE-marked pure ACKs will solely increment the packet counter; not any byte counter, because by definition they contain no bytes of data.

Congestion response: In either case (classic or AccECN feedback), if the TCP sender does receive feedback about CE-markings on pure ACKs, it will be able to reduce the congestion window (cwnd) and/or the ACK rate.

Therefore a congestion response mechanism is clearly feasible if AccECN has been negotiated, but the position is unknown for the installed base of classic ECN feedback.

4.4.1.1. Congestion Window Response to CE-Marked Pure ACKs

This subsection explores issues that congestion control designers will need to consider when defining a cwnd response to CE-marked Pure ACKs.

A CE-mark on a Pure ACK does not mean that only Pure ACKs are causing congestion. It only means that the marked Pure ACK is part of an aggregate that is collectively causing a bottleneck queue to randomly CE-mark a fraction of the packets. A CE-mark on a Pure ACK might be due to data packets in other flows through the same bottleneck, due to data packets interspersed between Pure ACKs in the same half-connection, or just due to the rate of Pure ACKs alone. (RFC 3168 only considered the last possibility, which led to the argument that ECN-enabled Pure ACKs had to be deferred, because ACK congestion control was a research issue.)

If a host has been sending a mix of Pure ACKs and data, it doesn't need to work out whether a particular CE mark was on a Pure ACK or not; it just needs to respond to congestion feedback as a whole by reducing its congestion window (cwnd), which limits the data it can launch into flight through the congested bottleneck. If it is purely receiving data and sending only Pure ACKs, reducing cwnd will have caused it no harm, having no effect on its ACK rate (the next subsection addresses that).

However, when a host is sending data as well as Pure ACKs, it would not be right for CE-marks on Pure ACKs and on data packets to induce the same reduction in cwnd. A possible way to address this issue would be to weight the response by the size of the marked packets (assuming the congestion control supports a weighted response, e.g. [RFC8257]).
For instance, one could calculate the fraction of CE-marked bytes (headers and data) over each round trip (say) as follows:

   (CE-marked header bytes + CE-marked data bytes) / (all header bytes + all data bytes)

Header bytes can be calculated by multiplying a packet count by a nominal header size, which is possible with AccECN feedback, because it gives a count of CE-marked packets (as well as CE-marked bytes). The above simple aggregate calculation caters for the full range of scenarios; from all Pure ACKs to just a few interspersed with data packets.

Note that any mechanism that reduces cwnd due to CE-marked Pure ACKs would need to be integrated with the congestion window validation mechanism [RFC7661], which already conservatively reduces cwnd over time because cwnd becomes stale if it is not used to fill the pipe.

4.4.1.2. ACK Rate Response to CE-Marked Pure ACKs

Reducing the congestion window will have no effect on the rate of pure ACKs. The worst case here is if the bottleneck is congested solely with pure ACKs, but it could also be problematic if a large fraction of the load was from unresponsive ACKs, leaving little or no capacity for the load from responsive data.

Since RFC 3168 was published, experimental Acknowledgement Congestion Control (AckCC) techniques have been documented in [RFC5690] (informational). So any pair of TCP end-points can choose to agree to regulate the delayed ACK ratio in response to lost or CE-marked pure ACKs. However, the protocol has a number of open issues concerning deployment (e.g. it requires support from both ends, it relies on two new TCP options, one of which is required on the SYN where option space is at a premium and, if either option is blocked by a middlebox, no fall-back behaviour is specified).

The new TCP options address two problems, namely that TCP had: i) no mechanism to allow ECT to be set on pure ACKs; and ii) no mechanism to feed back loss or CE-marking of pure ACKs. A combination of the present specification and AccECN addresses both these problems, at least for CE-marking. So it might now be possible to design an ECN-specific ACK congestion control scheme without the extra TCP options proposed in RFC 5690. However, such a mechanism is out of scope of the present document.

Setting aside the practicality of RFC 5690, the need for AckCC has not been conclusively demonstrated. It has been argued that the Internet has survived so far with no mechanism to even detect loss of pure ACKs. However, it has also been argued that ECN is not the same as loss. Packet discard can naturally thin the ACK load to whatever the bottleneck can support, whereas ECN marking does not (it queues the ACKs instead). Nonetheless, RFC 3168 (section 7) recommends that an AQM switches over from ECN marking to discard when the marking probability becomes high. Therefore discard can still be relied on to thin out ECN-enabled pure ACKs as a last resort.

4.4.2. Summary: Enabling ECN on Pure ACKs

In the case when AccECN has been negotiated, it provides a feasible congestion response mechanism, so the arguments for ECT on pure ACKs heavily outweigh those against. ECN is always more and never less reliable for delivery of congestion notification. A cwnd reduction needs to be considered by congestion control designers as a response to congestion on pure ACKs. Separately, AckCC (or an improved variant exploiting AccECN) could optionally be used to regulate the spacing between pure ACKs. However, it is not clear whether AckCC is justified.
If it is not, packet discard will still act as the "congestion response of last resort" by thinning out the traffic. In contrast, not setting ECT on pure ACKs is certainly detrimental to performance, because when a pure ACK is lost it can prevent the release of new data.

In the case when Classic ECN has been negotiated, the argument for ECT on pure ACKs is less clear-cut. Some of the installed base of RFC 3168 implementations might happen to (unintentionally) provide a feedback mechanism to support a cwnd response. For those that did not, setting ECT on pure ACKs would be better for the flow's own performance than not setting it. However, where there was no feedback mechanism, setting ECT could do slightly more harm than not setting it. AckCC could provide a complementary response mechanism, because it is designed to work with RFC 3168 ECN, but it has deployment challenges. In summary, a congestion response mechanism is unlikely to be feasible with the installed base of classic ECN.

This specification uses a safe approach. Allowing hosts to set ECT on Pure ACKs without a feasible response mechanism could be risky. It would certainly improve the flow's own performance, but it would slightly increase potential harm to others. Moreover, it would set an undesirable precedent for setting ECT on packets with no mechanism to respond to any resulting congestion signals. Therefore, Section 3.2.3 allows ECT on Pure ACKs if AccECN feedback has been negotiated, but not with classic RFC 3168 ECN feedback.

4.5. Window Probes

Section 6.1.6 of RFC 3168 presents only the reliability argument for prohibiting ECT on Window probes:

   "If a window probe packet is dropped in the network, this loss is not detected by the receiver. Therefore, the TCP data sender MUST NOT set either an ECT codepoint or the CWR bit on window probe packets.

   However, because window probes use exact sequence numbers, they cannot be easily spoofed in denial-of-service attacks. Therefore, if a window probe arrives with the CE codepoint set, then the receiver SHOULD respond to the ECN indications."

The reliability argument has already been addressed in Section 4.1.

Allowing ECT on window probes could considerably improve performance because, once the receive window has reopened, if a window probe is lost the sender will stall until the next window probe reaches the receiver, which might be after the maximum retransmission timeout (at least 1 minute [RFC6298]).

On the bright side, RFC 3168 at least specifies the receiver behaviour if a CE-marked window probe arrives, so changing the behaviour ought to be less painful than for other packet types.

4.6. FINs

RFC 3168 is silent on whether a TCP sender can set ECT on a FIN. A FIN is considered as part of the sequence of data, and the rate of pure ACKs sent after a FIN could be controlled by a CE marking on the FIN. Therefore there is no reason not to set ECT on a FIN.

4.7. RSTs

RFC 3168 is silent on whether a TCP sender can set ECT on a RST. The host generating the RST message does not have an open connection after sending it (either because there was no such connection when the packet that triggered the RST message was received or because the packet that triggered the RST message also triggered the closure of the connection).

Moreover, the receiver of a CE-marked RST message can either: i) accept the RST message and close the connection; ii) emit a so-called challenge ACK in response (with suitable throttling) [RFC5961] and otherwise ignore the RST (e.g. because the sequence number is in-window but not the precise number expected next); or iii) discard the RST message (e.g. because the sequence number is out-of-window).
In the first two cases there is no point in echoing any CE mark received because the sender closed its connection when it sent the RST. In the third case it makes sense to discard the CE signal as well as the RST.

Although a congestion response following a CE-marking on a RST does not appear to make sense, the following factors have been considered before deciding whether the sender ought to set ECT on a RST message:

* As explained above, a congestion response by the sender of a CE-marked RST message is not possible;

* So the only reason for the sender setting ECT on a RST would be to improve the reliability of the message's delivery;

* RST messages are used to both mount and mitigate attacks:

  - Spoofed RST messages are used by attackers to terminate ongoing connections, although the mitigations in RFC 5961 have considerably raised the bar against off-path RST attacks;

  - Legitimate RST messages allow endpoints to inform their peers to eliminate existing state that corresponds to non-existent connections, liberating resources, e.g. in DoS attack scenarios;

* AQMs are advised to disable ECN marking during persistent overload, so:

  - it is harder for an attacker to exploit ECN to intensify an attack;

  - it is harder for a legitimate user to exploit ECN to more reliably mitigate an attack;

* Prohibiting ECT on a RST would deny the benefit of ECN to legitimate RST messages, but not to attackers who can disregard RFCs;

* If ECT were prohibited on RSTs:

  - it would be easy for security middleboxes to discard all ECN-capable RSTs;

  - however, unlike a SYN flood, it is already easy for a security middlebox (or host) to distinguish a RST flood from legitimate traffic [RFC5961], and even if some legitimate RSTs are accidentally removed as well, legitimate connections still function.
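
For illustration only, the three receiver outcomes listed earlier in this section can be sketched in Python as follows. The function name, its return values and the simplified window test are assumptions for this example; RFC 5961 remains the authoritative description of RST validation, and challenge ACK throttling is omitted.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def handle_rst(seg_seq, rcv_nxt, rcv_wnd, ce_marked):
    """Return (action, echo_ce) for an incoming RST segment.

    action is "close", "challenge_ack" or "discard".
    ce_marked is deliberately unused: in every case the CE mark is
    not fed back, so echo_ce is always False.
    """
    offset = (seg_seq - rcv_nxt) % SEQ_MOD
    if offset == 0:
        action = "close"           # case (i): exactly the next expected number
    elif offset < rcv_wnd:
        action = "challenge_ack"   # case (ii): in-window but not exact
    else:
        action = "discard"         # case (iii): out-of-window
    return action, False
```

The sketch makes the point of the first bullet concrete: whichever branch is taken, a CE mark on the RST never leads to feedback, so it cannot trigger a congestion response at the sender of the RST.
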

So, on balance, it has been decided that it is worth experimenting with ECT on RSTs. During experiments, if the ECN capability on RSTs is found to open a vulnerability that is hard to close, this decision can be reversed, before it is specified for the standards track.

4.8. Retransmitted Packets

RFC 3168 says the sender "MUST NOT" set ECT on retransmitted packets. The rationale for this consumes nearly 2 pages of RFC 3168, so the reader is referred to section 6.1.5 of RFC 3168, rather than quoting it all here. There are essentially three arguments, namely: reliability; DoS attacks; and over-reaction to congestion. We address them in order below.

The reliability argument has already been addressed in Section 4.1.

Protection against DoS attacks is not afforded by prohibiting ECT on retransmitted packets. An attacker can set CE on spoofed retransmissions whether or not it is prohibited by an RFC. Protection against the DoS attack described in section 6.1.5 of RFC 3168 is solely afforded by the requirement that "the TCP data receiver SHOULD ignore the CE codepoint on out-of-window packets". Therefore in Section 3.2.7 the sender is allowed to set ECT on retransmitted packets, in order to reduce the chance of them being dropped. We also strengthen the receiver's requirement from "SHOULD ignore" to "MUST ignore". And we generalize the receiver's requirement to include failure of any validity check, not just out-of-window checks, in order to include the more stringent validity checks in RFC 5961 that have been developed since RFC 3168.

A consequence is that, for those retransmitted packets that arrive at the receiver after the original packet has been properly received (so-called spurious retransmissions), any CE marking will be ignored.
There is no problem with that because the fact that the original packet has been delivered implies that the sender's original congestion response (when it deemed the packet lost and retransmitted it) was unnecessary.

Finally, the third argument is about over-reacting to congestion. The argument goes that, if a retransmitted packet is dropped, the sender will not detect it, so it will not react again to congestion (it would have reduced its congestion window already when it retransmitted the packet). Whereas, if retransmitted packets can be CE-marked instead of dropped, senders could potentially react more than once to congestion. However, we argue that it is legitimate to respond again to congestion if it still persists in subsequent round trip(s).

Therefore, in all three cases, it is not incorrect to set ECT on retransmissions.

4.9. General Fall-back for any Control Packet

Extensive experiments have found no evidence of any traversal problems with ECT on any TCP control packet [Mandalari18]. Nonetheless, Sections 3.2.1.4 and 3.2.2.3 specify fall-back measures if ECT on the first packet of each half-connection (SYN or SYN-ACK) appears to be blocking progress. Here, the question of fall-back measures for ECT on other control packets is explored. It supports the advice given in Section 3.2.8; until there's evidence that something's broken, don't fix it.

If an implementation has had to disable ECT to ensure the first packet of a flow (SYN or SYN-ACK) gets through, the question arises whether it ought to disable ECT on all subsequent control packets within the same TCP connection. Without evidence of any such problems, this seems unnecessarily cautious. Particularly given it would be hard to detect loss of most other types of TCP control packets that are not ACK'd.
And particularly given that unnecessarily removing ECT from other control packets could lead to performance problems, e.g. by directing them into another queue [I-D.ietf-tsvwg-ecn-l4s-id] or over a different path, because some broken multipath equipment (erroneously) routes based on all 8 bits of the Diffserv field.

In the case where a connection starts without ECT on the SYN (perhaps because problems with previous connections had been cached), there will have been no test for ECT traversal in the client-server direction until the pure ACK that completes the handshake. It is possible that some middlebox might block ECT on this pure ACK or on later retransmissions of lost packets. Similarly, after a route change, the new path might include some middlebox that blocks ECT on some or all TCP control packets. However, without evidence of such problems, the complexity of a fix does not seem worthwhile.

   MORE MEASUREMENTS NEEDED (?): If further two-ended measurements do find evidence for these traversal problems, measurements would be needed to check for correlation of ECT traversal problems between different control packets. It might then be necessary to introduce a catch-all fall-back rule that disables ECT on certain subsequent TCP control packets based on some criteria developed from these measurements.

5. Interaction with popular variants or derivatives of TCP

The following subsections discuss any interactions between setting ECT on all packets and using the following popular variants of TCP: IW10 and TFO. It also briefly notes the possibility that the principles applied here should translate to protocols derived from TCP. This section is informative not normative, because no interactions have been identified that require any change to specifications.
The subsection on IW10 discusses potential changes 1819 to specifications but concludes that no changes are needed. 1821 The designs of the following TCP variants have also been assessed and 1822 found not to interact adversely with ECT on TCP control packets: SYN 1823 cookies (see Appendix A of [RFC4987] and Section 3.1 of [RFC5562]), 1824 TCP Fast Open (TFO [RFC7413]) and L4S [I-D.ietf-tsvwg-l4s-arch]. 1826 5.1. IW10 1828 IW10 is an experiment to determine whether it is safe for TCP to use 1829 an initial window of 10 SMSS [RFC6928]. 1831 This subsection does not recommend any additions to the present 1832 specification in order to interwork with IW10. The specifications as 1833 they stand are safe, and there is only a corner case with ECT on the 1834 SYN where performance could occasionally be improved, as explained 1835 below. 1837 As specified in Section 3.2.1.1, a TCP initiator will typically only 1838 set ECT on the SYN if it requests AccECN support. If, however, the 1839 SYN-ACK tells the initiator that the responder does not support 1840 AccECN, Section 3.2.1.1 advises the initiator to conservatively 1841 reduce its initial window, preferably to 1 SMSS because, if the SYN 1842 was CE-marked, the SYN-ACK has no way to feed that back. 1844 If the initiator implements IW10, it seems rather over-conservative 1845 to reduce IW from 10 to 1 just in case a congestion marking was 1846 missed. Nonetheless, a reduction to 1 SMSS will rarely harm 1847 performance, because: 1849 * as long as the initiator is caching failures to negotiate AccECN, 1850 subsequent attempts to access the same server will not use ECT on 1851 the SYN anyway, so there will no longer be any need to 1852 conservatively reduce IW; 1854 * currently, at least for web sessions, it is extremely rare for a 1855 TCP initiator (client) to have more than one data segment to send 1856 at the start of a TCP connection (see Fig 3 in [Manzoor17]); IW10 1857 is primarily exploited by TCP servers.
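The initiator behaviour discussed above can be illustrated with a minimal sketch. It is not taken from any TCP implementation: the class `Initiator`, its method names and the per-server failure cache keyed by a host name string are all hypothetical, and the window is counted in segments rather than bytes. It deliberately omits the separate case where AccECN feedback reports that the SYN was CE-marked.

```python
from dataclasses import dataclass, field

IW10_SEGMENTS = 10       # initial window under the IW10 experiment [RFC6928]
REDUCED_IW_SEGMENTS = 1  # conservative initial window of 1 SMSS


@dataclass
class Initiator:
    """Hypothetical initiator state, for illustration only."""
    # Servers for which AccECN negotiation has previously failed.
    accecn_failures: set = field(default_factory=set)

    def ect_on_syn(self, server: str) -> bool:
        # Set ECT on the SYN only if no AccECN failure is cached
        # for this server.
        return server not in self.accecn_failures

    def initial_window(self, server: str, syn_had_ect: bool,
                       responder_supports_accecn: bool) -> int:
        """Return the initial window, in segments, once the SYN-ACK arrives."""
        if not responder_supports_accecn:
            # Cache the failure so later SYNs to this server are not ECT.
            self.accecn_failures.add(server)
            if syn_had_ect:
                # Any CE mark on the SYN could not have been fed back,
                # so fall back to the conservative window.
                return REDUCED_IW_SEGMENTS
        return IW10_SEGMENTS
```

Under these assumptions, the reduction to 1 SMSS is applied at most once per server: after the first failed negotiation is cached, `ect_on_syn()` returns False for that server, so later connections start without ECT on the SYN and keep the full initial window.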
1859 If a responder receives feedback that the SYN-ACK was CE-marked, 1860 Section 3.2.2.2 recommends that it reduce its initial window, 1861 preferably to 1 SMSS. When the responder also implements IW10, it 1862 might again seem rather over-conservative to reduce IW from 10 to 1. 1863 But in this case the rationale is somewhat different: 1865 * Feedback that the SYN-ACK was CE-marked is an explicit indication 1866 that the queue has been building, not just uncertainty due to 1867 absence of feedback; 1869 * Given that it is now likely that a queue already exists, the more data 1870 packets that the server sends in its IW, the more likely at least 1871 one will be CE-marked, leading it to exit slow-start early. 1873 Experimentation will be needed to determine the best strategy. It 1874 should be noted that experience from recent congestion avoidance 1875 experiments where the window is reduced by less than half is not 1876 necessarily applicable to a flow start scenario. Reducing cwnd by 1877 less is one thing. Reducing an increase in cwnd by less is another. 1879 5.2. TFO 1881 TCP Fast Open (TFO [RFC7413]) is an experiment to remove the round 1882 trip delay of TCP's 3-way handshake (3WHS). A TFO initiator caches 1883 a cookie from a previous connection with a TFO-enabled server. Then, 1884 for subsequent connections to the same server, any data included on 1885 the SYN can be passed directly to the server application, which can 1886 then return up to an initial window of response data on the SYN-ACK 1887 and on data segments straight after it, without waiting for the ACK 1888 that completes the 3WHS. 1890 The TFO experiment and the present experiment to add ECN support for 1891 TCP control packets can be combined without altering either 1892 specification, which is justified as follows: 1894 * The handling of ECN marking on a SYN is no different whether or 1895 not it carries data.
1897 * In response to any CE-marking on the SYN-ACK, the responder adopts 1898 the normal response to congestion, as discussed in Section 7.2 of 1899 [RFC7413]. 1901 5.3. L4S 1903 A Low Latency Low Loss Scalable throughput (L4S) variant of TCP such 1904 as TCP Prague [PragueLinux] is mandated to negotiate AccECN feedback, 1905 and strongly recommended to use ECN++ [I-D.ietf-tsvwg-ecn-l4s-id]. 1907 The L4S experiment and the present ECN++ experiment can be combined 1908 without altering any of the specifications. The only difference 1909 would be in the recommendation of the best SYN cache strategy. 1911 The normative specification for ECT on a SYN in Section 3.2.1 1912 recommends the "optimistic ECT and cache failures" strategy (S2B 1913 defined in Section 4.2.3) for the general Internet. However, if a 1914 user's Internet access bottleneck supported L4S ECN but not Classic 1915 ECN, the "optimistic ECT without a cache" strategy (S2A) would make 1916 most sense, because there would be little point trying to avoid the 1917 'over-strict' test and negotiate Classic ECN, if L4S ECN but not 1918 Classic ECN was available on that user's access link (as is the case 1919 with Low Latency DOCSIS [DOCSIS3.1]). 1921 Strategy (S2A) is the simplest, because it requires no cache. It 1922 would satisfy the goal of an implementer who is solely interested in 1923 ultra-low latency using AccECN and ECN++ (e.g. accessing L4S servers) 1924 and is not concerned about fall-back to Classic ECN (e.g. when 1925 accessing other servers). 1927 5.4. Other transport protocols 1929 Experience from experiments on adding ECN support to all TCP packets 1930 ought to be directly transferable between TCP and other transport 1931 protocols, like SCTP or QUIC. 1933 Stream Control Transmission Protocol (SCTP [RFC4960]) is a standards 1934 track transport protocol derived from TCP. 
SCTP currently does not 1935 include ECN support, but Appendix A of RFC 4960 broadly describes how 1936 it would be supported and a (long-expired) draft on the addition of 1937 ECN to SCTP has been produced [I-D.stewart-tsvwg-sctpecn]. This 1938 draft avoided setting ECT on control packets and retransmissions, 1939 closely following the arguments in RFC 3168. 1941 QUIC [RFC9000] is another standards track transport protocol offering 1942 similar services to TCP but intended to exploit some of the benefits 1943 of running over UDP. Building on the arguments in the current draft, 1944 a QUIC sender sets ECT(0) on all packets. 1946 6. Security Considerations 1948 Section 3.2.6 considers the question of whether ECT on RSTs will 1949 allow RST attacks to be intensified. There are several security 1950 arguments presented in RFC 3168 for preventing the ECN marking of TCP 1951 control packets and retransmitted segments. We believe all of them 1952 have been properly addressed in Section 4, particularly Section 4.2.4 1953 and Section 4.8 on DoS attacks using spoofed ECT-marked SYNs and 1954 spoofed CE-marked retransmissions. 1956 Section 3.2.6 on sending TCP RSTs points out that implementers need 1957 to take care to ensure that the ECN field on a RST does not depend on 1958 TCP's state machine. Otherwise the internal information revealed 1959 could be of use to potential attackers. This point applies more 1960 generally to all control packets, not just RSTs. 1962 7. IANA Considerations 1964 There are no IANA considerations in this memo. 1966 8. Acknowledgments 1968 Thanks to Mirja Kuehlewind, David Black, Padma Bhooma, Gorry 1969 Fairhurst, Michael Scharf, Yuchung Cheng and Christophe Paasch for 1970 their useful reviews. Richard Scheffenegger provided useful advice 1971 gained from implementing ECN++ for FreeBSD. 1973 The work of Marcelo Bagnulo has been performed in the framework of 1974 the H2020-ICT-2014-2 project 5G NORMA. 
His contribution reflects the 1975 consortium's view, but the consortium is not liable for any use that 1976 may be made of any of the information contained therein. 1978 Bob Briscoe's contribution was partly funded by the Research Council 1979 of Norway through the TimeIn project, partly by CableLabs and partly 1980 by the Comcast Innovation Fund. The views expressed here are solely 1981 those of the authors. 1983 9. References 1985 9.1. Normative References 1987 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1988 Requirement Levels", BCP 14, RFC 2119, 1989 DOI 10.17487/RFC2119, March 1997, 1990 . 1992 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1993 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1994 May 2017, . 1996 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 1997 of Explicit Congestion Notification (ECN) to IP", 1998 RFC 3168, DOI 10.17487/RFC3168, September 2001, 1999 . 2001 [RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's 2002 Robustness to Blind In-Window Attacks", RFC 5961, 2003 DOI 10.17487/RFC5961, August 2010, 2004 . 2006 [I-D.ietf-tcpm-accurate-ecn] 2007 Briscoe, B., Kühlewind, M., and R. Scheffenegger, "More 2008 Accurate ECN Feedback in TCP", Work in Progress, Internet- 2009 Draft, draft-ietf-tcpm-accurate-ecn-15, 12 July 2021, 2010 . 2013 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 2014 Notification (ECN) Experimentation", RFC 8311, 2015 DOI 10.17487/RFC8311, January 2018, 2016 . 2018 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 2019 RFC 793, DOI 10.17487/RFC0793, September 1981, 2020 . 2022 9.2. Informative References 2024 [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - 2025 Communication Layers", STD 3, RFC 1122, 2026 DOI 10.17487/RFC1122, October 1989, 2027 . 2029 [RFC3540] Spring, N., Wetherall, D., and D. 
Ely, "Robust Explicit 2030 Congestion Notification (ECN) Signaling with Nonces", 2031 RFC 3540, DOI 10.17487/RFC3540, June 2003, 2032 . 2034 [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", 2035 RFC 4960, DOI 10.17487/RFC4960, September 2007, 2036 . 2038 [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common 2039 Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, 2040 . 2042 [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. 2043 Ramakrishnan, "Adding Explicit Congestion Notification 2044 (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, 2045 DOI 10.17487/RFC5562, June 2009, 2046 . 2048 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 2049 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 2050 . 2052 [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding 2053 Acknowledgement Congestion Control to TCP", RFC 5690, 2054 DOI 10.17487/RFC5690, February 2010, 2055 . 2057 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, 2058 "Computing TCP's Retransmission Timer", RFC 6298, 2059 DOI 10.17487/RFC6298, June 2011, 2060 . 2062 [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, 2063 "Increasing TCP's Initial Window", RFC 6928, 2064 DOI 10.17487/RFC6928, April 2013, 2065 . 2067 [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP 2068 Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, 2069 . 2071 [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF 2072 Recommendations Regarding Active Queue Management", 2073 BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, 2074 . 2076 [RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating 2077 TCP to Support Rate-Limited Traffic", RFC 7661, 2078 DOI 10.17487/RFC7661, October 2015, 2079 . 2081 [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., 2082 and G. Judd, "Data Center TCP (DCTCP): TCP Congestion 2083 Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, 2084 October 2017, . 
2086 [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140, 2087 DOI 10.17487/RFC2140, April 1997, 2088 . 2090 [I-D.ietf-tsvwg-ecn-l4s-id] 2091 Schepper, K. D. and B. Briscoe, "Explicit Congestion 2092 Notification (ECN) Protocol for Very Low Queuing Delay 2093 (L4S)", Work in Progress, Internet-Draft, draft-ietf- 2094 tsvwg-ecn-l4s-id-23, 24 December 2021, 2095 . 2098 [I-D.ietf-tsvwg-l4s-arch] 2099 Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White, 2100 "Low Latency, Low Loss, Scalable Throughput (L4S) Internet 2101 Service: Architecture", Work in Progress, Internet-Draft, 2102 draft-ietf-tsvwg-l4s-arch-15, 24 December 2021, 2103 . 2106 [I-D.stewart-tsvwg-sctpecn] 2107 Stewart, R. R., Tuexen, M., and X. Dong, "ECN for Stream 2108 Control Transmission Protocol (SCTP)", Work in Progress, 2109 Internet-Draft, draft-stewart-tsvwg-sctpecn-05, 15 January 2110 2014, . 2113 [RFC9000] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based 2114 Multiplexed and Secure Transport", RFC 9000, 2115 DOI 10.17487/RFC9000, May 2021, 2116 . 2118 [judd-nsdi] 2119 Judd, G.J., "Attaining the promise and avoiding the 2120 pitfalls of TCP in the Datacenter", USENIX Symposium on 2121 Networked Systems Design and Implementation 2122 (NSDI'15) pp.145-157, May 2015, 2123 . 2125 [ecn-pam] Trammell, B., Kühlewind, M., Boppart, D., Learmonth, I., 2126 Fairhurst, G., and R. Scheffenegger, "Enabling Internet- 2127 Wide Deployment of Explicit Congestion Notification", 2128 Int'l Conf. on Passive and Active Network Measurement 2129 (PAM'15) pp193-205, 2015, . 2132 [ECN-PLUS] Kuzmanovic, A., "The Power of Explicit Congestion 2133 Notification", ACM SIGCOMM 35(4):61--72, 2005, 2134 . 2136 [Mandalari18] 2137 Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. 2138 Alay, "Measuring ECN++: Good News for ++, Bad News for ECN 2139 over Mobile", IEEE Communications Magazine , March 2018, 2140 . 2142 [Manzoor17] 2143 Manzoor, J., Drago, I., and R. 
Sadre, "How HTTP/2 is 2144 changing Web traffic and how to detect it", In Proc: 2145 Network Traffic Measurement and Analysis Conference (TMA) 2146 2017 pp.1-9, June 2017, 2147 . 2149 [Kuehlewind18] 2150 Kühlewind, M., Walter, M., Learmonth, I., and B. Trammell, 2151 "Tracing Internet Path Transparency", In Proc: Network 2152 Traffic Measurement and Analysis Conference (TMA) 2018 , 2153 June 2018, . 2156 [strict-ecn] 2157 Dumazet, E., "tcp: be more strict before accepting ECN 2158 negociation", Linux netdev patch list , 4 May 2012, 2159 . 2161 [relax-strict-ecn] 2162 Tilmans, O., "tcp: Accept ECT on SYN in the presence of 2163 RFC8311", Linux netdev patch list , 3 April 2019, 2164 . 2166 [ecn-overload] 2167 Steen, H., "Destruction Testing: Ultra-Low Delay using 2168 Dual Queue Coupled Active Queue Management", Masters 2169 Thesis, Uni Oslo , May 2017, 2170 . 2173 [PragueLinux] 2174 Briscoe, B., De Schepper, K., Albisser, O., Misund, J., 2175 Tilmans, O., Kühlewind, M., and A.S. Ahmed, "Implementing 2176 the `TCP Prague' Requirements for Low Latency Low Loss 2177 Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 , 2178 March 2019, . 2181 [DOCSIS3.1] 2182 CableLabs, "MAC and Upper Layer Protocols Interface 2183 (MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable 2184 Service Interface Specifications DOCSIS® 3.1 Version i17 2185 or later, 21 January 2019, . 2188 Authors' Addresses 2190 Marcelo Bagnulo 2191 Universidad Carlos III de Madrid 2192 Av. Universidad 30 2193 28911 Leganes Madrid 2194 Spain 2196 Phone: 34 91 6249500 2197 Email: marcelo@it.uc3m.es 2198 URI: http://www.it.uc3m.es 2200 Bob Briscoe 2201 Independent 2202 United Kingdom 2204 Email: ietf@bobbriscoe.net 2205 URI: http://bobbriscoe.net/