Network Working Group                                         M. Bagnulo
Internet-Draft                                                      UC3M
Obsoletes: 5562 (if approved)                                 B. Briscoe
Intended status: Experimental                                  CableLabs
Expires: January 9, 2020                                    July 8, 2019


  ECN++: Adding Explicit Congestion Notification (ECN) to TCP Control
                                Packets
                   draft-ietf-tcpm-generalized-ecn-04

Abstract

   This document describes an experimental modification to ECN when used
   with TCP. It allows the use of ECN on the following TCP packets:
   SYNs, pure ACKs, Window probes, FINs, RSTs and retransmissions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1. Introduction
     1.1. Motivation
     1.2. Experiment Goals
     1.3. Document Structure
   2. Terminology
   3. Specification
     3.1. Network (e.g. Firewall) Behaviour
     3.2. Sender Behaviour
       3.2.1. SYN (Send)
       3.2.2. SYN-ACK (Send)
       3.2.3. Pure ACK (Send)
       3.2.4. Window Probe (Send)
       3.2.5. FIN (Send)
       3.2.6. RST (Send)
       3.2.7. Retransmissions (Send)
       3.2.8. General Fall-back for any Control Packet or
              Retransmission
     3.3. Receiver Behaviour
       3.3.1. Receiver Behaviour for Any TCP Control Packet or
              Retransmission
       3.3.2. SYN (Receive)
       3.3.3. Pure ACK (Receive)
       3.3.4. FIN (Receive)
       3.3.5. RST (Receive)
       3.3.6. Retransmissions (Receive)
   4. Rationale
     4.1. The Reliability Argument
     4.2. SYNs
       4.2.1. Argument 1a: Unrecognized CE on the SYN
       4.2.2. Argument 1b: ECT Considered Invalid on the SYN
       4.2.3. Caching Strategies for ECT on SYNs
       4.2.4. Argument 2: DoS Attacks
     4.3. SYN-ACKs
       4.3.1. Possibility of Unrecognized CE on the SYN-ACK
       4.3.2. Response to Congestion on a SYN-ACK
       4.3.3. Fall-Back if ECT SYN-ACK Fails
     4.4. Pure ACKs
       4.4.1. Mechanisms to Respond to CE-Marked Pure ACKs
       4.4.2. Summary: Enabling ECN on Pure ACKs
     4.5. Window Probes
     4.6. FINs
     4.7. RSTs
     4.8. Retransmitted Packets.
     4.9. General Fall-back for any Control Packet
   5. Interaction with popular variants or derivatives of TCP
     5.1. IW10
     5.2. TFO
     5.3. TCP Derivatives
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgments
   9. References
     9.1. Normative References
     9.2. Informative References
   Authors' Addresses

1. Introduction

   RFC 3168 [RFC3168] specifies support of Explicit Congestion
   Notification (ECN) in IP (v4 and v6). By using the ECN capability,
   network elements (e.g. routers, switches) performing Active Queue
   Management (AQM) can use ECN marks instead of packet drops to signal
   congestion to the endpoints of a communication. This results in
   lower packet loss and increased performance.
   RFC 3168 also specifies support for ECN in TCP, but solely on data
   packets. For various reasons it precludes the use of ECN on TCP
   control packets (TCP SYN, TCP SYN-ACK, pure ACKs, Window probes) and
   on retransmitted packets. RFC 3168 is silent about the use of ECN on
   RST and FIN packets. RFC 5562 [RFC5562] is an experimental
   modification to ECN that enables ECN support for TCP SYN-ACK packets.

   This document defines an experimental modification to ECN [RFC3168]
   that shall be called ECN++. It enables ECN support on all the
   aforementioned types of TCP packet.

   ECN++ uses a sender-only deployment model. It works whether the two
   ends of the TCP connection use classic ECN feedback [RFC3168] or
   experimental Accurate ECN feedback (AccECN
   [I-D.ietf-tcpm-accurate-ecn]). Nonetheless, if the client does not
   implement AccECN, it cannot use ECN++ on the one packet that offers
   most benefit from it - the initial SYN. Therefore, implementers of
   ECN++ are RECOMMENDED to also implement AccECN.

   ECN++ is designed for compatibility with a number of latency
   improvements to TCP such as TCP Fast Open (TFO [RFC7413]), initial
   window of 10 SMSS (IW10 [RFC6928]) and Low latency Low Loss Scalable
   Transport (L4S [I-D.ietf-tsvwg-l4s-arch]), but they can all be
   implemented and deployed independently. [RFC8311] is a standards
   track procedural device that relaxes requirements in RFC 3168 and
   other standards track RFCs that would otherwise preclude the
   experimental modifications needed for ECN++ and other ECN
   experiments.

1.1. Motivation

   The absence of ECN support on TCP control packets and retransmissions
   has a potentially harmful effect. In any ECN deployment,
   non-ECN-capable packets suffer a penalty when they traverse a
   congested bottleneck.
   For instance, with a drop probability of 1%, 1% of connection
   attempts suffer a timeout of about 1 second before the SYN is
   retransmitted, which is highly detrimental to the performance of
   short flows. TCP control packets, particularly TCP SYNs and
   SYN-ACKs, are important for performance, so dropping them is best
   avoided.

   Non-ECN control packets particularly harm performance in environments
   where the ECN marking level is high. For example, [judd-nsdi] shows
   that in a controlled private data centre (DC) environment where ECN
   is used (in conjunction with DCTCP [RFC8257]), the probability of
   being able to establish a new connection using a non-ECN SYN packet
   drops to close to zero even when there are only 16 ongoing TCP flows
   transmitting at full speed. The issue is that DCTCP exhibits a much
   more aggressive response to packet marking (which is why it is only
   applicable in controlled environments). This leads to a high marking
   probability for ECN-capable packets, and in turn a high drop
   probability for non-ECN packets. Therefore non-ECN SYNs are dropped
   aggressively, rendering it nearly impossible to establish a new
   connection in the presence of even mild traffic load.

   Finally, there are ongoing experimental efforts to promote the
   adoption of a slightly modified variant of DCTCP (and similar
   congestion controls) over the Internet to achieve low latency, low
   loss and scalable throughput (L4S) for all communications
   [I-D.ietf-tsvwg-l4s-arch]. In such an approach, L4S packets identify
   themselves using an ECN codepoint [I-D.ietf-tsvwg-ecn-l4s-id]. With
   L4S, preventing TCP control packets from obtaining the benefits of
   ECN would not only expose them to the prevailing level of congestion
   loss, but it would also classify them into a different queue.
   Then only L4S data packets would enjoy ultra-low latency forwarding,
   while the packets controlling and retransmitting these data packets
   would still get stuck behind the queue induced by legacy ('Classic')
   TCP traffic.

1.2. Experiment Goals

   The goal of the experimental modifications defined in this document
   is to allow the use of ECN on all TCP packets. Experiments are
   expected in the public Internet as well as in controlled environments
   to understand the following issues:

   o  How SYNs, Window probes, pure ACKs, FINs, RSTs and retransmissions
      that carry the ECT(0), ECT(1) or CE codepoints are processed by
      the TCP endpoints and the network (including routers, firewalls
      and other middleboxes). In particular we would like to learn if
      these packets are frequently blocked or if these packets are
      usually forwarded and processed.

   o  The scale of deployment of the different flavours of ECN,
      including [RFC3168], [RFC5562], [RFC3540] and
      [I-D.ietf-tcpm-accurate-ecn].

   o  How much the performance of TCP communications is improved by
      allowing ECN marking of each packet type.

   o  To identify any issues (including security issues) raised by
      enabling ECN marking of these packets.

   The data gathered through the experiments described in this document,
   particularly under the first 2 bullets above, will help in the
   redesign of the final mechanism (if needed) for adding ECN support to
   the different packet types considered in this document. Whenever
   data input is needed to assist in a design choice, it is spelled out
   throughout the document.

   Success criteria: The experiment will be a success if we obtain
   enough data to have a clearer view of the deployability and benefits
   of enabling ECN on all TCP packets, as well as any issues.
   If the results of the experiment show that it is feasible to deploy
   such changes; that there are gains to be achieved through the changes
   described in this specification; and that no other major issues may
   interfere with the deployment of the proposed changes; then it would
   be reasonable to adopt the proposed changes in a standards track
   specification that would update RFC 3168.

1.3. Document Structure

   The remainder of this document is structured as follows. In
   Section 2, we present the terminology used in the rest of the
   document. In Section 3, we specify the modifications to provide ECN
   support to TCP SYNs, pure ACKs, Window probes, FINs, RSTs and
   retransmissions. We describe both the network behaviour and the
   endpoint behaviour. Section 5 discusses variations of the
   specification that will be necessary to interwork with a number of
   popular variants or derivatives of TCP. RFC 3168 provides a number
   of specific reasons why ECN support is not appropriate for each
   packet type. In Section 4, we revisit each of these arguments for
   each packet type to justify why it is reasonable to conduct this
   experiment.

2. Terminology

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL in this
   document, are to be interpreted as described in BCP 14 [RFC2119] when
   and only when they appear in all capitals.

   Pure ACK: A TCP segment with the ACK flag set and no data payload.

   SYN: A TCP segment with the SYN (synchronize) flag set.

   Window probe: Defined in [RFC0793], a window probe is a TCP segment
   with only one byte of data sent to learn if the receive window is
   still zero.

   FIN: A TCP segment with the FIN (finish) flag set.

   RST: A TCP segment with the RST (reset) flag set.

   Retransmission: A TCP segment that has been retransmitted by the TCP
   sender.

   TCP client: The initiating end of a TCP connection. Also called the
   initiator.

   TCP server: The responding end of a TCP connection. Also called the
   responder.

   ECT: ECN-Capable Transport. One of the two codepoints ECT(0) or
   ECT(1) in the ECN field [RFC3168] of the IP header (v4 or v6). An
   ECN-capable sender sets one of these to indicate that both transport
   end-points support ECN. When this specification says the sender sets
   an ECT codepoint, by default it means ECT(0). Optionally, it could
   mean ECT(1), which is in the process of being redefined for use by
   L4S experiments [RFC8311] [I-D.ietf-tsvwg-ecn-l4s-id].

   Not-ECT: The ECN codepoint set by senders that indicates that the
   transport is not ECN-capable.

   CE: Congestion Experienced. The ECN codepoint that an intermediate
   node sets to indicate congestion [RFC3168]. A node sets an
   increasing proportion of ECT packets to CE as the level of congestion
   increases.

3. Specification

   The experimental ECN++ changes to the specification of TCP over ECN
   [RFC3168] defined here primarily alter the behaviour of the sending
   host for each half-connection. However, there are subsections for
   forwarding elements and receivers below, which recommend that they
   accept the new packets - they should do already, but might not. This
   will allow implementers to check the receive side code while they are
   altering the send-side code. All changes can be deployed at each
   end-point independently of others and independent of any network
   behaviour.

   The feedback behaviour at the receiver depends on whether classic ECN
   TCP feedback [RFC3168] or Accurate ECN (AccECN) TCP feedback
   [I-D.ietf-tcpm-accurate-ecn] has been negotiated. Nonetheless,
   neither receiver feedback behaviour is altered by the present
   specification.

3.1. Network (e.g. Firewall) Behaviour

   Previously the specification of ECN for TCP [RFC3168] required the
   sender to set not-ECT on TCP control packets and retransmissions.
   Some readers of RFC 3168 might have erroneously interpreted this as a
   requirement for firewalls, intrusion detection systems, etc. to check
   and enforce this behaviour. Section 4.3 of [RFC8311] updates RFC
   3168 to remove this ambiguity. It requires firewalls or any
   intermediate nodes not to treat certain types of ECN-capable TCP
   segment differently (except potentially in one attack scenario).
   This is likely to only involve a firewall rule change in a fraction
   of cases (at most 0.4% of paths according to the tests reported in
   Section 4.2.2).

   In case a TCP sender encounters a middlebox blocking ECT on certain
   TCP segments, the specification below includes behaviour to fall back
   to non-ECN. However, this loses the benefit of ECN on control
   packets. So operators are RECOMMENDED to alter their firewall rules
   to comply with the requirement referred to above (section 4.3 of
   [RFC8311]).

3.2. Sender Behaviour

   For each type of control packet or retransmission, the following
   sections detail changes to the sender's behaviour in two respects: i)
   whether it sets ECT; and ii) its response to congestion feedback.
   Table 1 summarises these two behaviours for each type of packet, but
   the relevant subsection below should be referred to for the detailed
   behaviour. The subsection on the SYN is more complex than the
   others, because it has to include fall-back behaviour if the ECT
   packet appears not to have got through, and caching of the outcome to
   detect persistent failures.

   +---------+----------------+-----------------+----------------------+
   | TCP     | ECN field if   | ECN field if    | Congestion Response  |
   | packet  | AccECN f/b     | RFC3168 f/b     |                      |
   | type    | negotiated*    | negotiated*     |                      |
   +---------+----------------+-----------------+----------------------+
   | SYN     | ECT            | not-ECT         | If AccECN, reduce IW |
   |         |                |                 |                      |
   | SYN-ACK | ECT            | ECT             | Reduce IW            |
   |         |                |                 |                      |
   | Pure    | ECT            | not-ECT         | If AccECN, usual     |
   | ACK     |                |                 | cwnd response and    |
   |         |                |                 | optionally [RFC5690] |
   |         |                |                 |                      |
   | W Probe | ECT            | ECT             | Usual cwnd response  |
   |         |                |                 |                      |
   | FIN     | ECT            | ECT             | None or optionally   |
   |         |                |                 | [RFC5690]            |
   |         |                |                 |                      |
   | RST     | ECT            | ECT             | N/A                  |
   |         |                |                 |                      |
   | Re-XMT  | ECT            | ECT             | Usual cwnd response  |
   +---------+----------------+-----------------+----------------------+

   Window probe and retransmission are abbreviated to W Probe and Re-XMT.
   * For a SYN, "negotiated" means "requested".

    Table 1: Summary of sender behaviour. In each case the relevant
     section below should be referred to for the detailed behaviour

   It can be seen that the sender cannot set ECT on the SYN if it is not
   requesting AccECN feedback. Therefore it is RECOMMENDED that the
   experimental AccECN specification [I-D.ietf-tcpm-accurate-ecn] is
   implemented (as well as the present specification), because it is
   expected that ECT on the SYN will give the most significant
   performance gain, particularly for short flows.

   Nonetheless, this specification also caters for the case where an
   ECN++ TCP sender is not using AccECN. This could be because it does
   not support AccECN or because the other end of the TCP connection
   does not (AccECN can only be used for a connection if both ends
   support it).

3.2.1. SYN (Send)

3.2.1.1. Setting ECT on the SYN

   With classic [RFC3168] ECN feedback, the SYN was not expected to be
   ECN-capable, so the flag provided to feed back congestion was put to
   another use (it is used in combination with other flags to indicate
   that the responder supports ECN). In contrast, Accurate ECN (AccECN)
   feedback [I-D.ietf-tcpm-accurate-ecn] provides a codepoint in the
   SYN-ACK for the responder to feed back whether the SYN arrived marked
   CE. Therefore the setting of the IP/ECN field on the SYN is
   specified separately for each case in the following two subsections.

3.2.1.1.1. ECN++ TCP Client also Supports AccECN

   For the ECN++ experiment, if the SYN is requesting AccECN feedback,
   the TCP sender will also set ECT on the SYN. It can ignore the
   prohibition in section 6.1.1 of RFC 3168 against setting ECT on such
   a SYN, as per Section 4.3 of [RFC8311].

3.2.1.1.2. ECN++ TCP Client does not Support AccECN

   A TCP initiator MUST NOT set ECT on a SYN if it does not also attempt
   to negotiate Accurate ECN feedback in the same SYN.

   If the TCP initiator does not support AccECN, the rest of
   Section 3.2.1 does not apply. It solely applies to the case where
   the TCP initiator supports AccECN as well as ECN++.

3.2.1.2. Caching where to use ECT on SYNs

   As explained above, this subsection only applies if the ECN++ TCP
   client also supports AccECN.

   Until AccECN servers become widely deployed, a TCP initiator that
   sets ECT on a SYN (which implies the same SYN also requests AccECN,
   as required above) SHOULD also maintain a cache entry per server to
   record servers that it is not worth sending an ECT SYN to, e.g.
   because they do not support AccECN and therefore have no logic for
   congestion markings on the SYN. Mobile hosts MAY maintain a cache
   entry per access network to record 'non-ECT SYN' entries against
   proxies (see Section 4.2.1).
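
   As an illustration only, the client-side rule for the SYN and the
   per-server cache described above might be sketched as follows. This
   is a minimal sketch under stated assumptions, not part of the
   specification: the class name SynEcnPolicy, its method names and the
   use of a plain set keyed by server address are all hypothetical. The
   two-bit codepoint values are those of the ECN field in RFC 3168.

```python
from dataclasses import dataclass, field

# ECN field codepoints from RFC 3168 (two bits in the IP header).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


@dataclass
class SynEcnPolicy:
    """Hypothetical helper deciding the ECN field of an initial SYN."""

    supports_accecn: bool
    # Servers recorded as not worth sending an ECT SYN to
    # (the 'non-ECT SYN' cache; the key type is an assumption).
    non_ect_syn_servers: set = field(default_factory=set)

    def syn_ecn_field(self, server: str) -> int:
        # A SYN that does not request AccECN MUST NOT carry ECT.
        if not self.supports_accecn:
            return NOT_ECT
        # Skip ECT for servers cached as 'non-ECT SYN'.
        if server in self.non_ect_syn_servers:
            return NOT_ECT
        # By default an ECT codepoint means ECT(0).
        return ECT_0

    def record_syn_ack(self, server: str, server_supports_accecn: bool) -> None:
        # The SYN still requests AccECN every time, so each SYN-ACK
        # reveals whether the server supports it.
        if server_supports_accecn:
            self.non_ect_syn_servers.discard(server)
        else:
            self.non_ect_syn_servers.add(server)
```

   In this sketch the cache only suppresses ECT on the SYN. The request
   for AccECN is assumed to be made on every SYN regardless of the cache
   contents.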

   Subsequently the initiator will not set ECT on a SYN to such a server
   or proxy, but it can still always request AccECN support (because the
   response will state any earlier stage of ECN evolution that the
   server supports with no performance penalty). The initiator will
   discover a server that has upgraded to support AccECN as soon as it
   next connects, then it can remove the server from its cache and
   subsequently always set ECT for that server.

   The client can limit the size of its cache of 'non-ECT SYN' servers.
   Then, while AccECN is not widely deployed, it will only cache the
   'non-ECT SYN' servers that are most used and most recently used by
   the client. As the client accesses servers that have been expelled
   from its cache, it will simply use ECT on the SYN by default.

   Servers that do not support ECN as a whole do not need to be recorded
   separately from non-support of AccECN because the response to a
   request for AccECN immediately states which stage in the evolution of
   ECN the server supports (AccECN [I-D.ietf-tcpm-accurate-ecn], classic
   ECN [RFC3168] or no ECN).

   The above strategy is named "optimistic ECT and cache failures". It
   is believed to be sufficient based on three measurement studies and
   assumptions detailed in Section 4.2.1. However, Section 4.2.1 gives
   two other strategies and the choice between them depends on the
   implementer's goals and the deployment prevalence of ECN variants in
   the network and on servers, not to mention the prevalence of some
   significant bugs.

   If the initiator times out without seeing a SYN-ACK, it will
   separately cache this fact (see fall-back in Section 3.2.1.4 for
   details).

3.2.1.3. SYN Congestion Response

   As explained above, this subsection only applies if the ECN++ TCP
   client also supports AccECN.

   If the SYN-ACK returned to the TCP initiator confirms that the server
   supports AccECN, it will also indicate whether or not the SYN was
   CE-marked. If the SYN was CE-marked, the initiator MUST reduce its
   Initial Window (IW) and SHOULD reduce it to 1 SMSS (sender maximum
   segment size).

   If the initiator has set ECT on the SYN and if the SYN-ACK shows that
   the server does not support AccECN, the TCP initiator MUST
   conservatively reduce its Initial Window and SHOULD reduce it to 1
   SMSS. A reduction to greater than 1 SMSS MAY be appropriate (see
   Section 4.2.1). Conservatism is necessary because a non-AccECN
   SYN-ACK cannot show whether the SYN was CE-marked.

   If the TCP initiator (host A) receives a SYN from the remote end
   (host B) after it has sent a SYN to B, it indicates the (unusual)
   case of a simultaneous open. Host A will respond with a SYN-ACK.
   Host A will probably then receive a SYN-ACK in response to its own
   SYN, after which it can follow the appropriate one of the two
   paragraphs above.

   In all the above cases, the initiator does not have to back off its
   retransmission timer as it would in response to a timeout following
   no response to its SYN [RFC6298], because both the SYN and the
   SYN-ACK have been successfully delivered through the network. Also,
   the initiator does not need to exit slow start or reduce ssthresh,
   which is not even required when a SYN is lost [RFC5681].

   If an initial window of 10 (IW10 [RFC6928]) is implemented, Section 5
   gives additional recommendations.

3.2.1.4. Fall-Back Following No Response to an ECT SYN

   As explained above, this subsection only applies if the ECN++ TCP
   client also supports AccECN.

   An ECT SYN might be lost due to an over-zealous path element (or
   server) blocking ECT packets that do not conform to RFC 3168.
   Some evidence of this was found in a 2014 study [ecn-pam], but in a
   more recent study using 2017 data [Mandalari18] extensive
   measurements found no case where ECT on TCP control packets was
   treated any differently from ECT on TCP data packets. Loss is
   commonplace for numerous other reasons, e.g. congestion loss at a
   non-ECN queue on the forward or reverse path, transmission errors,
   etc. Alternatively, the cause of the loss might be the attempt to
   negotiate AccECN, or possibly other unrelated options on the SYN.

   Therefore, if the timer expires after the TCP initiator has sent the
   first ECT SYN, it SHOULD make one more attempt to retransmit the SYN
   with ECT set (backing off the timer as usual). If the retransmission
   timer expires again, it SHOULD retransmit the SYN with the not-ECT
   codepoint in the IP header, to expedite connection set-up. If other
   experimental fields or options were on the SYN, it will also be
   necessary to follow their specifications for fall-back. It would
   make sense to coordinate all the strategies for fall-back in order to
   isolate the specific cause of the problem.

   If the TCP initiator is caching failed connection attempts, it SHOULD
   NOT give up using ECT on the first SYN of subsequent connection
   attempts until it is clear that a blockage persistently and
   specifically affects ECT on SYNs. This is because loss is so
   commonplace for other reasons. Even if it does eventually decide to
   give up setting ECT on the SYN, it will probably not need to give up
   on AccECN on the SYN. In any case, if a cache is used, it SHOULD be
   arranged to expire so that the initiator will infrequently attempt to
   check whether the problem has been resolved.

   Other fall-back strategies MAY be adopted where applicable (see
   Section 4.2.2 for suggestions, and the conditions under which they
   would apply).

3.2.2. SYN-ACK (Send)

3.2.2.1. Setting ECT on the SYN-ACK

   For the ECN++ experiment, the TCP implementation will set ECT on
   SYN-ACKs. It can ignore the requirement in section 6.1.1 of RFC 3168
   to set not-ECT on a SYN-ACK, as per Section 4.3 of [RFC8311].

3.2.2.2. SYN-ACK Congestion Response

   A host that sets ECT on SYN-ACKs MUST reduce its initial window in
   response to any congestion feedback, whether using classic ECN or
   AccECN (see Section 4.3.1). It SHOULD reduce it to 1 SMSS. This is
   different to the behaviour specified in an earlier experiment that
   set ECT on the SYN-ACK [RFC5562]. This is justified in Section 4.3.

   The responder does not have to back off its retransmission timer
   because the ECN feedback proves that the network is delivering
   packets successfully and is not severely overloaded. Also the
   responder does not have to leave slow start or reduce ssthresh, which
   is not even required when a SYN-ACK has been lost.

   The congestion response to CE-marking on a SYN-ACK for a server that
   implements either the TCP Fast Open experiment (TFO [RFC7413]) or the
   initial window of 10 experiment (IW10 [RFC6928]) is discussed in
   Section 5.

3.2.2.3. Fall-Back Following No Response to an ECT SYN-ACK

   After the responder sends a SYN-ACK with ECT set, if its
   retransmission timer expires it SHOULD retransmit one more SYN-ACK
   with ECT set (and back off its timer as usual). If the timer expires
   again, it SHOULD retransmit the SYN-ACK with not-ECT in the IP
   header. If other experimental fields or options were on the initial
   SYN-ACK, it will also be necessary to follow their specifications for
   fall-back. It would make sense to co-ordinate all the strategies for
   fall-back in order to isolate the specific cause of the problem.
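
   The SYN-ACK congestion response and fall-back sequence described
   above might be sketched as follows. This is only an illustrative
   sketch: the function names, the zero-based attempt counter and the
   choice to express the initial window in units of SMSS are
   assumptions made for the example, and the recommended (SHOULD)
   values are hard-coded where an implementation could choose
   otherwise.

```python
# ECN field codepoints from RFC 3168.
NOT_ECT = 0b00
ECT_0 = 0b10


def syn_ack_ecn_field(attempt: int) -> int:
    """ECN field for a SYN-ACK transmission.

    attempt 0 is the original SYN-ACK, attempt 1 the first
    retransmission, attempt 2 the second retransmission, and so on.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # The original and one retransmission carry ECT; after the timer
    # expires a second time the responder falls back to not-ECT.
    return ECT_0 if attempt <= 1 else NOT_ECT


def initial_window_segments(default_iw: int, congestion_feedback: bool) -> int:
    """Initial window, in SMSS, after the handshake completes.

    Any congestion feedback on the SYN-ACK MUST reduce the initial
    window; this sketch takes the recommended value of 1 SMSS.
    """
    return 1 if congestion_feedback else default_iw
```

   The retransmission timer back-off on each expiry is left to the
   normal TCP machinery and is not modelled here.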
578 This fall-back strategy attempts to use ECT one more time than the 579 strategy for ECT SYN-ACKs in [RFC5562] (which is made obsolete, being 580 superseded by the present specification). Other fall-back strategies 581 MAY be adopted if found to be more effective, e.g. fall-back to not- 582 ECT on the first retransmission attempt. 584 The server MAY cache failed connection attempts, e.g. per client 585 access network. A client-based alternative to caching at the server 586 is given in Section 4.3.3. If the TCP server is caching failed 587 connection attempts, it SHOULD NOT give up using ECT on the first 588 SYN-ACK of subsequent connection attempts until it is clear that the 589 blockage persistently and specifically affects ECT on SYN-ACKs. This 590 is because loss is so commonplace for other reasons (see 591 Section 3.2.1.4). If a cache is used, it SHOULD be arranged to 592 expire so that the server will infrequently attempt to check whether 593 the problem has been resolved. 595 3.2.3. Pure ACK (Send) 597 A Pure ACK is an ACK packet that does not carry data, which includes 598 the Pure ACK at the end of TCP's 3-way handshake. 600 For the ECN++ experiment, whether a TCP implementation sets ECT on a 601 Pure ACK depends on whether or not Accurate ECN TCP feedback 602 [I-D.ietf-tcpm-accurate-ecn] has been successfully negotiated for a 603 particular TCP connection, as specified in the following two 604 subsections. 606 3.2.3.1. Pure ACK without AccECN Feedback 608 If AccECN has not been successfully negotiated for a connection, ECT 609 MUST NOT be set on Pure ACKs by either end. 611 3.2.3.2. Pure ACK with AccECN Feedback 613 For the ECN++ experiment, if AccECN has been successfully negotiated, 614 either end of the connection will set ECT on Pure ACKs. They can 615 ignore the requirement in section 6.1.4 of RFC 3168 to set not-ECT on 616 a pure ACK, as per Section 4.3 of [RFC8311]. 
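As an illustration only, the rule in Sections 3.2.3.1 and 3.2.3.2 reduces to a single predicate on the negotiated feedback mode. The sketch below is not a definitive implementation; the Segment type and the function names are hypothetical, and a real stack would take these values from its connection state.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical, minimal view of an outgoing TCP segment."""
    ack: bool
    payload_len: int
    syn: bool = False
    fin: bool = False
    rst: bool = False


def is_pure_ack(seg: Segment) -> bool:
    """A Pure ACK is an ACK that carries no data (and is not a SYN, FIN or RST)."""
    return seg.ack and seg.payload_len == 0 and not (seg.syn or seg.fin or seg.rst)


def set_ect_on_pure_ack(seg: Segment, accecn_negotiated: bool) -> bool:
    """Return True if ECT is to be set on this Pure ACK.

    ECT is set only when AccECN feedback has been successfully negotiated;
    otherwise ECT MUST NOT be set on Pure ACKs.
    """
    if not is_pure_ack(seg):
        raise ValueError("segment is not a pure ACK")
    return accecn_negotiated
```

The final ACK of the 3-way handshake is also a Pure ACK, so the same predicate would apply to it.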
618 MEASUREMENTS NEEDED: Measurements are needed to learn how the 619 deployed base of network elements and RFC 3168 servers react to 620 pure ACKs marked with the ECT(0)/ECT(1)/CE codepoints, i.e. 621 whether they are dropped, codepoint cleared or processed and the 622 congestion indication fed back on a subsequent packet. 624 See Section 3.3.3 for the implications if a host receives a CE-marked 625 Pure ACK. 627 3.2.3.2.1. Pure ACK Congestion Response 629 As explained above, this subsection only applies if AccECN has been 630 successfully negotiated for the TCP connection. 632 A host that sets ECT on pure ACKs SHOULD respond to the congestion 633 signal resulting from pure ACKs being marked with the CE codepoint. 634 The specific response will need to be defined as an update to each 635 congestion control specification. Possible responses to congestion 636 feedback include reducing the congestion window (CWND) and/or 637 regulating the pure ACK rate (see Section 4.4.1.1). 639 Note that, in comparison, TCP Congestion Control [RFC5681] does not 640 require a TCP to detect or respond to loss of pure ACKs at all; it 641 requires no reduction in congestion window or ACK rate. 643 3.2.4. Window Probe (Send) 645 For the ECN++ experiment, the TCP sender will set ECT on window 646 probes. It can ignore the prohibition in section 6.1.6 of RFC 3168 647 against setting ECT on a window probe, as per Section 4.3 of 648 [RFC8311]. 650 A window probe contains a single octet, so it is no different from a 651 regular TCP data segment. Therefore a TCP receiver will feed back 652 any CE marking on a window probe as normal (either using classic ECN 653 feedback or AccECN feedback). The sender of the probe will then 654 reduce its congestion window as normal. 656 A receive window of zero indicates that the application is not 657 consuming data fast enough and does not imply anything about network 658 congestion. 
Once the receive window opens, the congestion window 659 might become the limiting factor, so it is correct that CE-marked 660 probes reduce the congestion window. This complements cwnd 661 validation [RFC7661], which reduces cwnd as more time elapses without 662 having used available capacity. However, CE-marking on window probes 663 does not reduce the rate of the probes themselves. This is unlikely 664 to present a problem, given the duration between window probes 665 doubles [RFC1122] as long as the receiver is advertising a zero 666 window (currently minimum 1 second, maximum at least 1 minute 667 [RFC6298]). 669 MEASUREMENTS NEEDED: Measurements are needed to learn how the 670 deployed base of network elements and servers react to Window 671 probes marked with the ECT(0)/ECT(1)/CE codepoints, i.e. whether 672 they are dropped, codepoint cleared or processed. 674 3.2.5. FIN (Send) 676 A TCP implementation can set ECT on a FIN. 678 See Section 3.3.4 for the implications if a host receives a CE-marked 679 FIN. 681 A congestion response to a CE-marking on a FIN is not required. 683 After sending a FIN, the endpoint will not send any more data in the 684 connection. Therefore, even if the FIN-ACK indicates that the FIN 685 was CE-marked (whether using classic or AccECN feedback), reducing 686 the congestion window will not affect anything. 688 After sending a FIN, a host might send one or more pure ACKs. If it 689 is using one of the techniques in Section 3.2.3 to regulate the 690 delayed ACK ratio for pure ACKs, it could equally be applied after a 691 FIN. But this is not required. 693 MEASUREMENTS NEEDED: Measurements are needed to learn how the 694 deployed base of network elements and servers react to FIN packets 695 marked with the ECT(0)/ECT(1)/CE codepoints, i.e. whether they 696 are dropped, codepoint cleared or processed. 698 3.2.6. RST (Send) 700 A TCP implementation can set ECT on a RST. 
702 See Section 3.3.5 for the implications if a host receives a CE-marked 703 RST. 705 A congestion response to a CE-marking on a RST is not required (and 706 actually not possible). 708 MEASUREMENTS NEEDED: Measurements are needed to learn how the 709 deployed base of network elements and servers react to RST packets 710 marked with the ECT(0)/ECT(1)/CE codepoints, i.e. whether they 711 are dropped, codepoint cleared or processed. 713 3.2.7. Retransmissions (Send) 715 For the ECN++ experiment, the TCP sender will set ECT on 716 retransmitted segments. It can ignore the prohibition in section 717 6.1.5 of RFC 3168 against setting ECT on retransmissions, as per 718 Section 4.3 of [RFC8311]. 720 See Section 3.3.6 for the implications if a host receives a CE-marked 721 retransmission. 723 If the TCP sender receives feedback that a retransmitted packet was 724 CE-marked, it will react as it would to any feedback of CE-marking on 725 a data packet. 727 MEASUREMENTS NEEDED: Measurements are needed to learn how the 728 deployed base of network elements and servers react to 729 retransmissions marked with the ECT(0)/ECT(1)/CE codepoints, i.e. 730 whether they are dropped, codepoint cleared or processed. 732 3.2.8. General Fall-back for any Control Packet or Retransmission 734 Extensive measurements in fixed and mobile networks [Mandalari18] 735 have found no evidence of blockages due to ECT being set on any type 736 of TCP control packet. 738 In case traversal problems arise in future, fall-back measures have 739 been specified above, but only for the cases where ECT on the initial 740 packet of a half-connection (SYN or SYN-ACK) is persistently failing 741 to get through. 743 Fall-back measures for blockage of ECT on other TCP control packets 744 MAY be implemented. However they are not specified here given the 745 lack of any evidence they will be needed. Section 4.9 justifies this 746 advice in more detail. 748 3.3. 
Receiver Behaviour 750 The present ECN++ specification primarily concerns the behaviour for 751 sending TCP control packets or retransmissions. Below are a few 752 changes to the receive side of an implementation that are recommended 753 while updating its send side. Nonetheless, where deployment is 754 concerned, ECN++ is still a sender-only deployment, because it does 755 not depend on receivers complying with any of these recommendations. 757 3.3.1. Receiver Behaviour for Any TCP Control Packet or Retransmission 759 RFC 8311 is a standards track update to RFC 3168 in order to (amongst 760 other things) "...allow the use of ECT codepoints on SYN packets, 761 pure acknowledgement packets, window probe packets, and 762 retransmissions of packets..., provided that the changes from RFC 763 3168 are documented in an Experimental RFC in the IETF document 764 stream." 766 Section 4.3 of RFC 8311 amends every statement in RFC 3168 that 767 precludes the use of ECT on control packets and retransmissions to 768 add "unless otherwise specified by an Experimental RFC in the IETF 769 document stream". The present specification is such an Experimental 770 RFC. Therefore, in order for this experiment to be useful, the 771 following requirements follow from RFC 8311: 773 o Any TCP implementation SHOULD accept receipt of any valid TCP 774 control packet or retransmission irrespective of its IP/ECN field. 775 If any existing implementation does not, it SHOULD be updated to 776 do so. 778 o A TCP implementation taking part in the experiments proposed here 779 MUST accept receipt of any valid TCP control packet or 780 retransmission irrespective of its IP/ECN field. 782 These measures are derived from the robustness principle of "... be 783 liberal in what you accept from others", in order to ensure 784 compatibility with any future protocol changes that allow ECT on any 785 TCP packet. 787 3.3.2.
SYN (Receive) 789 RFC 3168 negotiates the use of ECN for the connection end-to-end 790 using the ECN flags in the TCP header. When RFC3168 says that "A 791 host MUST NOT set ECT on SYN ... packets." it is silent as to what a 792 TCP server ought to do if it receives a SYN packet with a non-zero 793 IP/ECN field. 795 Some implementations of TCP servers (e.g. current Linux) assume that, 796 if a host receives a SYN with a non-zero IP/ECN field, it must be due 797 to network mangling, and they disable ECN for the rest of the 798 connection. Section 4.2.2.2 finds that this type of network mangling 799 seems to be virtually non-existent so it would be preferable to 800 report any such mangling so it can be fixed. 802 For the avoidance of doubt, the normative statements for all TCP 803 control packets in Section 3.3.1 are interpreted for the case when a 804 SYN is received as follows: 806 o Any TCP server implementation SHOULD accept receipt of a valid SYN 807 that requests ECN support for the connection, irrespective of the 808 IP/ECN field of the SYN. If any existing implementation does not, 809 it SHOULD be updated to do so. 811 o A TCP implementation taking part in the ECN++ experiment MUST 812 accept receipt of a valid SYN, irrespective of its IP/ECN field. 814 o If the SYN is CE-marked and the server has no logic to feed back a 815 CE mark on a SYN-ACK (e.g. it does not support AccECN), it has to 816 ignore the CE-mark (the client detects this case and behaves 817 conservatively in mitigation - see Section 3.2.1.3). 819 3.3.3. Pure ACK (Receive) 821 For the avoidance of doubt, the normative statements for all TCP 822 control packets in Section 3.3.1 are interpreted for the case when a 823 Pure ACK is received as follows: 825 o Any TCP implementation SHOULD accept receipt of a pure ACK with a 826 non-zero ECN field, despite current RFCs precluding the sending of 827 such packets. 
829 o A TCP implementation taking part in the ECN++ experiment MUST 830 accept receipt of a pure ACK with a non-zero ECN field. 832 The question of whether and how the receiver of pure ACKs is required 833 to feed back any CE marks on them is outside the scope of the present 834 specification because it is a matter for the relevant feedback 835 specification ([RFC3168] or [I-D.ietf-tcpm-accurate-ecn]). Currently 836 AccECN feedback is required to count CE marking of any control packet 837 including pure ACKs. In contrast, RFC 3168 is silent on this point, so 838 feedback of CE-markings might be implementation specific (see 839 Section 4.4.1.1). 841 3.3.4. FIN (Receive) 843 The TCP data receiver MUST ignore the CE codepoint on incoming FINs 844 that fail any validity check. The validity check in section 5.2 of 845 [RFC5961] is RECOMMENDED. 847 3.3.5. RST (Receive) 849 The "challenge ACK" approach to checking the validity of RSTs 850 (section 3.2 of [RFC5961]) is RECOMMENDED at the data receiver. 852 3.3.6. Retransmissions (Receive) 854 The TCP data receiver MUST ignore the CE codepoint on incoming 855 segments that fail any validity check. The validity check in section 856 5.2 of [RFC5961] is RECOMMENDED. This will effectively mitigate an 857 attack that uses spoofed data packets to fool the receiver into 858 feeding back spoofed congestion indications to the sender, which in 859 turn would be fooled into continually reducing its congestion window. 861 4. Rationale 863 This section is informative, not normative. It presents counter- 864 arguments against the justifications in the RFC series for disabling 865 ECN on TCP control segments and retransmissions. It also gives 866 rationale for why ECT is safe on control segments that have not, so 867 far, been mentioned in the RFC series. First it addresses over- 868 arching arguments used for most packet types, then it addresses the 869 specific arguments for each packet type in turn. 871 4.1.
The Reliability Argument 873 Section 5.2 of RFC 3168 states: 875 "To ensure the reliable delivery of the congestion indication of 876 the CE codepoint, an ECT codepoint MUST NOT be set in a packet 877 unless the loss of that packet [at a subsequent node] in the 878 network would be detected by the end nodes and interpreted as an 879 indication of congestion." 881 We believe this argument is misplaced. TCP does not deliver most 882 control packets reliably. So it is more important to allow control 883 packets to be ECN-capable, which greatly improves reliable delivery 884 of the control packets themselves (see motivation in Section 1.1). 885 ECN also improves the reliability and latency of delivery of any 886 congestion notification on control packets, particularly because TCP 887 does not detect the loss of most types of control packet anyway. 888 Both these points outweigh by far the concern that a CE marking 889 applied to a control packet by one node might subsequently be dropped 890 by another node. 892 The principle to determine whether a packet can be ECN-capable ought 893 to be "do no extra harm", meaning that the reliability of a 894 congestion signal's delivery ought to be no worse with ECN than 895 without. In particular, setting the CE codepoint on the very same 896 packet that would otherwise have been dropped fulfills this 897 criterion, since either the packet is delivered and the CE signal is 898 delivered to the endpoint, or the packet is dropped and the original 899 congestion signal (packet loss) is delivered to the endpoint. 901 The concern about a CE marking being dropped at a subsequent node 902 might be motivated by the idea that ECN-marking a packet at the first 903 node does not remove the packet, so it could go on to worsen 904 congestion at a subsequent node. However, it is not useful to reason 905 about congestion by considering single packets. 
The departure rate 906 from the first node will generally be the same (fully utilized) with 907 or without ECN, so this argument does not apply. 909 4.2. SYNs 911 RFC 5562 presents two arguments against ECT marking of SYN packets 912 (quoted verbatim): 914 "First, when the TCP SYN packet is sent, there are no guarantees 915 that the other TCP endpoint (node B in Figure 2) is ECN-Capable, 916 or that it would be able to understand and react if the ECN CE 917 codepoint was set by a congested router. 919 Second, the ECN-Capable codepoint in TCP SYN packets could be 920 misused by malicious clients to "improve" the well-known TCP SYN 921 attack. By setting an ECN-Capable codepoint in TCP SYN packets, a 922 malicious host might be able to inject a large number of TCP SYN 923 packets through a potentially congested ECN-enabled router, 924 congesting it even further." 926 The first point actually describes two subtly different issues. So 927 below three arguments are countered in turn. 929 4.2.1. Argument 1a: Unrecognized CE on the SYN 931 This argument certainly applied at the time RFC 5562 was written, 932 when no ECN responder mechanism had any logic to recognize a CE 933 marking on a SYN and, even if logic were added, there was no field in 934 the SYN-ACK to feed it back. The problem was that, during the 3WHS, 935 the flag in the TCP header for ECN feedback (called Echo Congestion 936 Experienced) had been overloaded to negotiate the use of ECN itself. 938 The accurate ECN (AccECN) protocol [I-D.ietf-tcpm-accurate-ecn] has 939 since been designed to solve this problem. Two features are 940 important here: 942 1. An AccECN server uses the 3 'ECN' bits in the TCP header of the 943 SYN-ACK to respond to the client. 4 of the possible 8 codepoints 944 provide enough space for the server to feed back which of the 4 945 IP/ECN codepoints was on the incoming SYN (including CE of 946 course). 948 2. 
If any of these 4 codepoints are in the SYN-ACK, it confirms that 949 the server supports AccECN and, if another codepoint is returned, 950 it confirms that the server doesn't support AccECN. 952 This still does not seem to allow a client to set ECT on a SYN; the client only 953 finds out afterwards whether the server would have supported it. The 954 trick the client uses for ECN++ is to set ECT on the SYN 955 optimistically then, if the SYN-ACK reveals that the server wouldn't 956 have understood CE on the SYN, the client responds conservatively as 957 if the SYN was marked with CE. 959 Happily, the appropriate conservative congestion response is to 960 reduce the initial window, and it is extremely rare for a TCP client 961 to send more than one packet as its initial request anyway. Any 962 clients that do frequently use a larger initial window for their 963 first message to the server can cache which servers will not 964 understand ECT on a SYN (see Section 4.2.3 below). 966 4.2.2. Argument 1b: ECT Considered Invalid on the SYN 968 Given that, until now, ECT-marked SYN packets have been prohibited, it 969 cannot be assumed they will be accepted by TCP middleboxes or 970 servers. 972 4.2.2.1. ECT on SYN Considered Invalid by Middleboxes 974 According to a study using 2014 data [ecn-pam] from a limited range 975 of fixed vantage points, for the top 1M Alexa web sites, adding the 976 ECN capability to SYNs was increasing connection establishment 977 failures by about 0.4%. 979 From a wider range of fixed and mobile vantage points, a more recent 980 study in Jan-May 2017 [Mandalari18] found no occurrences of blocking 981 of ECT on SYNs. However, in more than half the mobile networks 982 tested it found wiping of the ECN codepoint at the first hop. 984 MEASUREMENTS NEEDED: As wiping at the first hop is remedied, 985 measurements will be needed to check whether SYNs with ECT are 986 sometimes blocked deeper into the path.
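The optimistic client behaviour argued for in Section 4.2.1 (set ECT on the SYN, then respond as if it had been CE-marked when the SYN-ACK shows the server has no AccECN logic) can be sketched as below. This is a hedged illustration, not the normative behaviour: the function name and parameters are hypothetical, the AccECN handshake encoding is abstracted into two booleans, and the reduced window of 1 segment is an assumption chosen to mirror the SYN-ACK response in Section 3.2.2.2.

```python
def client_initial_window(
    default_iw_segments: int,
    server_supports_accecn: bool,
    syn_ce_fed_back: bool,
    reduced_iw_segments: int = 1,  # assumption: mirrors the 1 SMSS used for SYN-ACKs
) -> int:
    """Initial window (in segments) for a client that sent an ECT SYN.

    server_supports_accecn: the SYN-ACK carried one of the AccECN codepoints.
    syn_ce_fed_back: the AccECN SYN-ACK reported that the SYN arrived CE-marked.
    """
    if not server_supports_accecn:
        # The server could not have fed back a CE mark, so behave
        # conservatively, as if the SYN had been CE-marked.
        return min(default_iw_segments, reduced_iw_segments)
    if syn_ce_fed_back:
        # Explicit congestion feedback on the SYN.
        return min(default_iw_segments, reduced_iw_segments)
    # AccECN server and no CE mark reported: use the normal initial window.
    return default_iw_segments
```

For a client whose first request fits in a single packet, all three branches lead to the same behaviour on the wire, which is why the conservative response rarely costs anything in practice.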
988 Silent failures introduce a retransmission timeout delay (default 1 989 second) at the initiator before it attempts any fall back strategy 990 (whereas explicit RSTs can be dealt with immediately). Ironically, 991 making SYNs ECN-capable is intended to avoid the timeout when a SYN 992 is lost due to congestion. Fortunately, if there is any discard of 993 ECN-capable SYNs due to policy, it will occur predictably, not 994 randomly like congestion. So the initiator should be able to avoid 995 it by caching those sites that do not support ECN-capable SYNs (see 996 the last paragraph of Section 3.2.1.2). 998 4.2.2.2. ECT on SYN Considered Invalid by Servers 1000 A study conducted in Nov 2017 [Kuehlewind18] found that, of the 82% 1001 of the Alexa top 50k web servers that supported ECN, 84% disabled ECN 1002 if the IP/ECN field on the SYN was ECT0, CE or either. Given most 1003 web servers use Linux, this behaviour can most likely be traced to a 1004 patch contributed in May 2012 that was first distributed in v3.5 of 1005 the Linux kernel [strict-ecn]. The comment says "RFC3168 : 6.1.1 SYN 1006 packets must not have ECT/ECN bits set. If we receive a SYN packet 1007 with these bits set, it means a network is playing bad games with TOS 1008 bits. In order to avoid possible false congestion notifications, we 1009 disable TCP ECN negociation." Of course, some of the 84% might be 1010 due to similar code in other OSs. 1012 For brevity we shall call this the "over-strict" ECN test, because it 1013 is over-conservative with what it accepts, contrary to Postel's 1014 robustness principle. A robust protocol will not usually assume 1015 network mangling without comparing with the value originally sent, 1016 and one packet is not sufficient to make an assumption with such 1017 irreversible consequences anyway. 1019 Ironically, networks rarely seem to alter the IP/ECN field on a SYN 1020 from zero to non-zero anyway. 
In a study conducted in Jan-May 2017 1021 over millions of paths from vantage points in a few dozen mobile and 1022 fixed networks [Mandalari18], no such transition was observed. With 1023 such a small or non-existent incidence of this sort of network 1024 mangling, it would be preferable to report any residual problem paths 1025 so that they can be fixed. 1027 In any case, the widespread presence of this 'over-strict' test proves 1028 that RFC 5562 was correct to expect that ECT would be considered 1029 invalid on SYNs. Nonetheless, it is not an insurmountable problem: 1030 the over-strict test in Linux was patched in Apr 2019 1031 [relax-strict-ecn] and caching can work round it where previous 1032 versions of Linux are running. The prevalence of these "over-strict" 1033 ECN servers makes it challenging to cache them all. However, 1034 Section 4.2.3 below explains how a cache of limited size can 1035 alleviate this problem for a client's most popular sites. 1037 For the future, [RFC8311] updates RFC 3168 to clarify that the IP/ECN 1038 field does not have to be zero on a SYN if documented in an 1039 experimental RFC such as the present ECN++ specification. 1041 4.2.3. Caching Strategies for ECT on SYNs 1043 Given the server handling of ECN on SYNs outlined in Section 4.2.2.2 1044 above, an initiator might combine AccECN with three candidate caching 1045 strategies for setting ECT on a SYN: 1047 (S1): Pessimistic ECT and cache successes: The initiator always 1048 requests AccECN, but by default without ECT on the SYN. Then 1049 it caches those servers that confirm that they support AccECN 1050 as 'ECT SYN OK'. On a subsequent connection to any server 1051 that supports AccECN, the initiator can then set ECT on the 1052 SYN. When connecting to other servers (non-ECN or classic 1053 ECN) it will not set ECT on the SYN, so it will not fail the 1054 'over-strict' ECN test.
1056 Longer term, as servers upgrade to AccECN, the initiator is 1057 still requesting AccECN, so it will add them to the cache and 1058 use ECT on subsequent SYNs to those servers. However, 1059 assuming it has to cap the size of the cache, the client will 1060 not have the benefit of ECT SYNs to those less frequently used 1061 AccECN servers expelled from its cache. 1063 (S2): Optimistic ECT: The initiator always requests AccECN and by 1064 default sets ECT on the SYN. Then, if the server response 1065 shows it has no AccECN logic (so it cannot feed back a CE 1066 mark), the initiator conservatively behaves as if the SYN was 1067 CE-marked, by reducing its initial window. 1069 A. No cache. 1071 B. Cache failures: The optimistic ECT strategy can be 1072 improved by caching solely those servers that do not 1073 support AccECN as 'ECT SYN NOK'. This would include non- 1074 ECN servers and all Classic ECN servers whether 'over- 1075 strict' or not. On subsequent connections to these non- 1076 AccECN servers, the initiator will still request AccECN 1077 but not set ECT on the SYN. Then, the connection can 1078 still fall back to Classic ECN, if the server supports it, 1079 and the initiator can use its full initial window (if it 1080 has enough request data to need it). 1082 Longer term, as servers upgrade to AccECN, the initiator 1083 will remove them from the cache and use ECT on subsequent 1084 SYNs to that server. 1086 Where an access network operator mediates Internet access 1087 via a proxy that does not support AccECN, the optimistic 1088 ECT strategy will always fail. This scenario is more 1089 likely in mobile networks. Therefore, a mobile host could 1090 cache lack of AccECN support per attached access network 1091 operator. Whenever it attached to a new operator, it 1092 could check a well-known AccECN test server and, if it 1093 found no AccECN support, it would add a cache entry for 1094 the attached operator. 
It would only use ECT when neither 1095 network nor server were cached. It would only populate 1096 its per server cache when not attached to a non-AccECN 1097 proxy. 1099 (S3): ECT by configuration: In a controlled environment, the 1100 administrator can make sure that servers support ECN-capable 1101 SYN packets. Examples of controlled environments are single- 1102 tenant DCs, and possibly multi-tenant DCs if it is assumed 1103 that each tenant mostly communicates with its own VMs. 1105 For unmanaged environments like the public Internet, pragmatically 1106 the choice is between strategies (S1), (S2A) and (S2B). The 1107 normative specification for ECT on a SYN in Section 3.2.1 recommends 1108 the "optimistic ECT and cache failures" strategy (S2B) but the choice 1109 depends on the implementer's motivation for using ECN++, and the 1110 deployment prevalence of different technologies and bug-fixes. For 1111 instance, if a user's Internet access bottleneck supported L4S ECN 1112 but not Classic ECN, strategy (S2A) would make most sense and there 1113 would be no point trying to avoid the 'over-strict' test and 1114 negotiate Classic ECN. 1116 o The "pessimistic ECT and cache successes" strategy (S1) suffers 1117 from exposing the initial SYN to the prevailing loss level, even 1118 if the server supports ECT on SYNs, but only on the first 1119 connection to each AccECN server. If AccECN becomes widely 1120 deployed on servers, SYNs to those AccECN servers that are less 1121 frequently used by the client and therefore don't fit in the cache 1122 will not benefit from ECN protection at all. 1124 o The "optimistic ECT without a cache" strategy (S2A) is the 1125 simplest. It would satisfy the goal of an implementer who is 1126 solely interested in ultra-low latency using AccECN and ECN++ 1127 (e.g. accessing L4S servers) and is not concerned about fall-back 1128 to Classic ECN (e.g. when accessing other servers). 
1130 o The "optimistic ECT and cache failures" strategy (S2B) exploits 1131 ECT on SYNs from the very first attempt. But if the server turns 1132 out to be 'over-strict' it will disable ECN for the connection, 1133 but only for the first connection if it's one of the client's more 1134 popular servers that fits in the cache. If the server turns out 1135 not to support AccECN, the initiator has to conservatively limit 1136 its initial window, but again only for the first connection if 1137 it's one of the client's more popular servers (and anyway this 1138 rarely makes any difference when most client requests fit in a 1139 single packet). 1141 Note that, if AccECN deployment grows, caching successes (S1) starts 1142 off small then grows, while caching failures (S2B) becomes large at 1143 first, then shrinks. At half-way, the size of the cache has to be 1144 capped with either approach, so the default behaviour for all the 1145 servers that do not fit in the cache is as important as the behaviour 1146 for the popular servers that do fit. 1148 MEASUREMENTS NEEDED: Measurements are needed to determine which 1149 strategy would be sufficient for any particular client, whether a 1150 particular client would need different strategies in different 1151 circumstances and how many occurrences of problems would be masked 1152 by how few cache entries. 1154 Another strategy would be to send a not-ECT SYN a short delay (below 1155 the typical lowest RTT) after an ECT SYN and only accept the non-ECT 1156 connection if it returned first. This would reduce the performance 1157 penalty for those deploying ECT SYN support. However, this 'happy 1158 eyeballs' approach becomes complex when multiple optional features 1159 are all tried on the first SYN (or on multiple SYNs), so it is not 1160 recommended. 1162 4.2.4. Argument 2: DoS Attacks 1164 [RFC5562] says that ECT SYN packets could be misused by malicious 1165 clients to augment "the well-known TCP SYN attack". 
It goes on to 1166 say "a malicious host might be able to inject a large number of TCP 1167 SYN packets through a potentially congested ECN-enabled router, 1168 congesting it even further." 1170 We assume this is a reference to the TCP SYN flood attack (see 1171 https://en.wikipedia.org/wiki/SYN_flood), which is an attack against 1172 a responder end point. We assume the idea of this attack is to use 1173 ECT to get more packets through an ECN-enabled router in preference 1174 to other non-ECN traffic so that they can go on to use the SYN 1175 flooding attack to inflict more damage on the responder end point. 1176 This argument could apply to flooding with any type of packet, but we 1177 assume SYNs are singled out because their source address is easier to 1178 spoof, whereas floods of other types of packets are easier to block. 1180 Mandating Not-ECT in an RFC does not stop attackers using ECT for 1181 flooding. Nonetheless, if a standard says SYNs are not meant to be 1182 ECT it would make it legitimate for firewalls to discard them. 1183 However this would negate the considerable benefit of ECT SYNs for 1184 compliant transports and seems unnecessary because RFC 3168 already 1185 provides the means to address this concern. In section 7, RFC 3168 1186 says "During periods where ... the potential packet marking rate 1187 would be high, our recommendation is that routers drop packets rather 1188 then set the CE codepoint..." and this advice is repeated in 1189 [RFC7567] (section 4.2.1). This makes it harder for flooding packets 1190 to gain from ECT. 1192 [ecn-overload] showed that ECT can only slightly augment flooding 1193 attacks relative to a non-ECT attack. It was hard to overload the 1194 link without causing the queue to grow, which in turn caused the AQM 1195 to disable ECN and switch to drop, thus negating any advantage of 1196 using ECT. This was true even with the switch-over point set to 25% 1197 drop probability (i.e. 
the arrival rate was 133% of the link rate). 1199 4.3. SYN-ACKs 1201 The proposed approach in Section 3.2.2 for experimenting with ECN- 1202 capable SYN-ACKs is effectively identical to the scheme called ECN+ 1203 [ECN-PLUS]. In 2005, the ECN+ paper demonstrated that it could 1204 reduce the average Web response time by an order of magnitude. It 1205 also argued that adding ECT to SYN-ACKs did not raise any new 1206 security vulnerabilities. 1208 4.3.1. Possibility of Unrecognized CE on the SYN-ACK 1210 The feedback behaviour by the initiator in response to a CE-marked 1211 SYN-ACK from the responder depends on whether classic ECN feedback 1212 [RFC3168] or AccECN feedback [I-D.ietf-tcpm-accurate-ecn] has been 1213 negotiated. In either case no change is required to RFC 3168 or the 1214 AccECN specification. 1216 Some classic ECN client implementations might ignore a CE-mark on a 1217 SYN-ACK, or even ignore a SYN-ACK packet entirely if it is set to ECT 1218 or CE. This is a possibility because an RFC 3168 implementation 1219 would not necessarily expect a SYN-ACK to be ECN-capable. This issue 1220 already came up when the IETF first decided to experiment with ECN on 1221 SYN-ACKs [RFC5562] and it was decided to go ahead without any extra 1222 precautionary measures. This was because the probability of 1223 encountering the problem was believed to be low and the harm if the 1224 problem arose was also low (see Appendix B of RFC 5562). 1226 4.3.2. Response to Congestion on a SYN-ACK 1228 The IETF has already specified an experiment with ECN-capable SYN-ACK 1229 packets [RFC5562]. It was inspired by the ECN+ paper, but it 1230 specified a much more conservative congestion response to a CE-marked 1231 SYN-ACK, called ECN+/TryOnce. This required the server to reduce its 1232 initial window to 1 segment (like ECN+), but then the server had to 1233 send a second SYN-ACK and wait for its ACK before it could continue 1234 with its initial window of 1 SMSS. 
The second SYN-ACK of this 5-way 1235 handshake had to carry no data, and had to disable ECN, but no 1236 justification was given for these last two aspects. 1238 The present ECN++ experimental specification obsoletes RFC 5562 1239 because it uses the ECN+ congestion response, not ECN+/TryOnce. 1240 First we argue against the rationale for ECN+/TryOnce given in 1241 sections 4.4 and 6.2 of [RFC5562]. It starts with a rather too 1242 literal interpretation of the requirement in RFC 3168 that says TCP's 1243 response to a single CE mark has to be "essentially the same as the 1244 congestion control response to a *single* dropped packet." TCP's 1245 response to a dropped initial (SYN or SYN-ACK) packet is to wait for 1246 the retransmission timer to expire (currently 1s). However, this 1247 long delay assumes the worst case between two possible causes of the 1248 loss: a) heavy overload; or b) the normal capacity-seeking behaviour 1249 of other TCP flows. When the network is still delivering CE-marked 1250 packets, it implies that there is an AQM at the bottleneck and that 1251 it is not overloaded. This is because an AQM under overload will 1252 disable ECN (as recommended in section 7 of RFC 3168 and repeated in 1253 section 4.2.1 of RFC 7567). So scenario (a) can be ruled out. 1254 Therefore, TCP's response to a CE-marked SYN-ACK can be similar to 1255 its response to the loss of _any_ packet, rather than backing off as 1256 if the special _initial_ packet of a flow has been lost. 1258 How TCP responds to the loss of any single packet depends on what it 1259 has just been doing. But there is not really a precedent for TCP's 1260 response when it experiences a CE mark having sent only one (small) 1261 packet. If TCP had been adding one segment per RTT, it would have 1262 halved its congestion window, but it hasn't established a congestion 1263 window yet.
If it had been exponentially increasing it would have 1264 exited slow start, but it hasn't started exponentially increasing yet 1265 so it hasn't established a slow-start threshold. 1267 Therefore, we have to work out a reasoned argument for what to do. 1268 If an AQM is CE-marking packets, it implies there is already a queue 1269 and it is probably already somewhere around the AQM's operating point 1270 - it is unlikely to be well below and it might be well above. So, it 1271 does not seem sensible to add a number of packets at once. On the 1272 other hand, it is highly unlikely that the SYN-ACK itself pushed the 1273 AQM into congestion, so it will be safe to introduce another single 1274 segment immediately (1 RTT after the SYN-ACK). Therefore, starting 1275 to probe for capacity with a slow start from an initial window of 1 1276 segment seems appropriate to the circumstances. This is the approach 1277 adopted in Section 3.2.2. 1279 4.3.3. Fall-Back if ECT SYN-ACK Fails 1281 An alternative to the server caching failed connection attempts would 1282 be for the server to rely on the client caching failed attempts (on 1283 the basis that the client would cache a failure whether ECT was 1284 blocked on the SYN or the SYN-ACK). This strategy cannot be used if 1285 the SYN does not request AccECN support. It works as follows: if the 1286 server receives a SYN that requests AccECN support but is set to not- 1287 ECT, it replies with a SYN-ACK also set to not-ECT. If a middlebox 1288 only blocks ECT on SYNs, not SYN-ACKs, this strategy might disable 1289 ECN on a SYN-ACK when it did not need to, but at least it saves the 1290 server from maintaining a cache. 1292 4.4. 
Pure ACKs 1294 Section 5.2 of RFC 3168 gives the following arguments for not 1295 allowing the ECT marking of pure ACKs (ACKs not piggy-backed on 1296 data): 1298 "To ensure the reliable delivery of the congestion indication of 1299 the CE codepoint, an ECT codepoint MUST NOT be set in a packet 1300 unless the loss of that packet in the network would be detected by 1301 the end nodes and interpreted as an indication of congestion. 1303 Transport protocols such as TCP do not necessarily detect all 1304 packet drops, such as the drop of a "pure" ACK packet; for 1305 example, TCP does not reduce the arrival rate of subsequent ACK 1306 packets in response to an earlier dropped ACK packet. Any 1307 proposal for extending ECN-Capability to such packets would have 1308 to address issues such as the case of an ACK packet that was 1309 marked with the CE codepoint but was later dropped in the network. 1310 We believe that this aspect is still the subject of research, so 1311 this document specifies that at this time, "pure" ACK packets MUST 1312 NOT indicate ECN-Capability." 1314 Later on, in section 6.1.4 it reads: 1316 "For the current generation of TCP congestion control algorithms, 1317 pure acknowledgement packets (e.g., packets that do not contain 1318 any accompanying data) MUST be sent with the not-ECT codepoint. 1319 Current TCP receivers have no mechanisms for reducing traffic on 1320 the ACK-path in response to congestion notification. Mechanisms 1321 for responding to congestion on the ACK-path are areas for current 1322 and future research. (One simple possibility would be for the 1323 sender to reduce its congestion window when it receives a pure ACK 1324 packet with the CE codepoint set). For current TCP 1325 implementations, a single dropped ACK generally has only a very 1326 small effect on the TCP's sending rate." 1328 We next address each of the arguments presented above. 
1330 The first argument is a specific instance of the reliability argument 1331 for the case of pure ACKs. This has already been addressed by 1332 countering the general reliability argument in Section 4.1. 1334 The second argument says that ECN ought not to be enabled unless 1335 there is a mechanism to respond to it. This argument actually 1336 comprises three sub-arguments: 1338 Mechanism feasibility: If ECN is enabled on Pure ACKs, are there, or 1339 could there be, suitable mechanisms to detect, feed back and 1340 respond to ECN-marked Pure ACKs? 1342 Do no extra harm: There has never been a mechanism to respond to 1343 loss of non-ECN Pure ACKs. So it seems that adding ECN without a 1344 response mechanism will do no extra harm to others, while 1345 improving a connection's own performance (because loss of an ACK 1346 holds back new data). However, if the end systems have no 1347 response mechanism, ECN Pure ACKs do slightly more harm than non- 1348 ECN, because the AQM doesn't immediately clear ECT packets from 1349 the queue until it reaches overload and disables ECN. 1351 Standards policy: Even if there were no harm to others, does it set 1352 an undesirable precedent to allow a flow to use ECN to protect its 1353 Pure ACKs from loss, when there is no mechanism to respond to ECN- 1354 marking? 1356 The last two sub-arguments involve value judgements, but they both 1357 depend on the concrete technical question of mechanism feasibility, 1358 which will therefore be addressed first in Section 4.4.1 below. Then 1359 Section 4.4.2 draws conclusions by addressing the value judgements in 1360 the other two questions. 1362 4.4.1. Mechanisms to Respond to CE-Marked Pure ACKs 1364 The question of whether the receiver of pure ACKs is required to 1365 detect and feed back any CE-marking is outside the scope of the 1366 present specification - it is a matter for the relevant feedback 1367 specification (classic ECN [RFC3168] and AccECN 1368 [I-D.ietf-tcpm-accurate-ecn]).
The response to congestion feedback 1369 is also out of scope, because it would be defined in the base TCP 1370 congestion control specification [RFC5681] or its variants. 1372 Nonetheless, in order to decide whether the present ECN++ 1373 experimental specification should require a host to set ECT on pure 1374 ACKs, we only need to know whether a response mechanism would be 1375 feasible - we do not have to standardize it. So the bullets below 1376 assess, for each type of feedback, whether the three stages of the 1377 congestion response mechanism could all work. 1379 Detection: Can the receiver of a pure ACK detect a CE marking on 1380 it?: 1382 * Classic feedback: RFC 3168 is silent on this point. The 1383 implementer of the receiver would not expect CE marks on pure 1384 ACKs, but the implementation might happen to check for CE marks 1385 before it looks for the data. So detection will be 1386 implementation-dependent. 1388 * AccECN feedback: the AccECN specification requires the receiver 1389 of any TCP packets to count any CE marks on them (whether or 1390 not it sends ECN-capable control packets itself). 1392 Feedback: TCP never ACKs a pure ACK, but the receiver of a CE-mark 1393 on a pure ACK could feed it back when it sends a subsequent data 1394 segment (if it ever does): 1396 * Classic feedback: RFC 3168 is silent on this point, so feedback 1397 of CE-markings might be implementation specific. If the 1398 receiver (of the pure ACKs) did generate feedback, it would set 1399 the echo congestion experienced (ECE) flag in the TCP header of 1400 subsequent packets in the round, as it would to feed back CE on 1401 data packets. 1403 * AccECN feedback: the receiver continually feeds back a count of 1404 the number of CE-marked packets that it has received and, 1405 optionally, a count of CE-marked bytes. For either metric, 1406 AccECN includes pure ACKs and indeed all types of packets. 
1408 Congestion response: In either case (classic or AccECN feedback), if 1409 the TCP sender does receive feedback about CE-markings on pure 1410 ACKs, it will be able to reduce the congestion window (cwnd) and/ 1411 or the ACK rate. 1413 Therefore a congestion response mechanism is clearly feasible if 1414 AccECN has been negotiated, but the position is unknown for the 1415 installed base of classic ECN feedback. 1417 4.4.1.1. Congestion Window Response to CE-Marked Pure ACKs 1419 This subsection explores issues that congestion control designers 1420 will need to consider when defining a cwnd response to CE-marked Pure 1421 ACKs. 1423 A CE-mark on a Pure ACK does not mean that only Pure ACKs are causing 1424 congestion. It only means that the marked Pure ACK is part of an 1425 aggregate that is collectively causing a bottleneck queue to randomly 1426 CE-mark a fraction of the packets. A CE-mark on a Pure ACK might be 1427 due to data packets in other flows through the same bottleneck, due 1428 to data packets interspersed between Pure ACKs in the same half- 1429 connection, or just due to the rate of Pure ACKs alone. (RFC 3168 1430 only considered the last possibility, which led to the argument that 1431 ECN-enabled Pure ACKs had to be deferred, because ACK congestion 1432 control was a research issue.) 1434 If a host has been sending a mix of Pure ACKs and data, it doesn't 1435 need to work out whether a particular CE mark was on a Pure ACK or 1436 not; it just needs to respond to congestion feedback as a whole by 1437 reducing its congestion window (cwnd), which limits the data it can 1438 launch into flight through the congested bottleneck. If it is purely 1439 receiving data and sending only Pure ACKs, reducing cwnd will have 1440 caused it no harm, having no effect on its ACK rate (the next 1441 subsection addresses that). 
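The point made in the paragraph above, that the sender responds to congestion feedback as a whole without asking which packets carried the marks, can be sketched as follows. This is only an illustration under stated assumptions: the Sender class, its field names, the 1460-byte SMSS and the Reno-style halving applied at most once per round trip are all hypothetical choices made for the example, not something this specification defines.

```python
from dataclasses import dataclass

SMSS = 1460  # assumed sender maximum segment size, in bytes


@dataclass
class Sender:
    """Hypothetical sender state, for illustration only."""
    cwnd: int                        # congestion window, in bytes
    ssthresh: int                    # slow-start threshold, in bytes
    ce_seen_this_round: bool = False

    def on_ce_feedback(self, newly_marked_packets: int) -> None:
        # The feedback is taken as a whole: whether the marked packets
        # were pure ACKs or data segments is never examined.
        if newly_marked_packets > 0:
            self.ce_seen_this_round = True

    def end_of_round(self) -> None:
        # Assumed Reno-style response: halve cwnd at most once per
        # round trip, never going below two segments.
        if self.ce_seen_this_round:
            self.ssthresh = max(self.cwnd // 2, 2 * SMSS)
            self.cwnd = self.ssthresh
            self.ce_seen_this_round = False
```

A host that is only sending Pure ACKs would run the same logic; the smaller cwnd simply has no effect on its ACK rate.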
1443 However, when a host is sending data as well as Pure ACKs, it would 1444 not be right for CE-marks on Pure ACKs and on data packets to induce 1445 the same reduction in cwnd. A possible way to address this issue 1446 would be to weight the response by the size of the marked packets 1447 (assuming the congestion control supports a weighted response, e.g. 1448 [RFC8257]). For instance, one could calculate the fraction of CE- 1449 marked bytes (headers and data) over each round trip (say) as 1450 follows: 1452 (CE-marked header bytes + CE-marked data bytes) / (all header 1453 bytes + all data bytes) 1455 Header bytes can be calculated by multiplying a packet count by a 1456 nominal header size, which is possible with AccECN feedback, because 1457 it gives a count of CE-marked packets (as well as CE-marked bytes). 1458 The above simple aggregate calculation caters for the full range of 1459 scenarios; from all Pure ACKs to just a few interspersed with data 1460 packets. 1462 Note that any mechanism that reduces cwnd due to CE-marked Pure ACKs 1463 would need to be integrated with the congestion window validation 1464 mechanism [RFC7661], which already conservatively reduces cwnd over 1465 time because cwnd becomes stale if it is not used to fill the pipe. 1467 4.4.1.2. ACK Rate Response to CE-Marked Pure ACKs 1469 Reducing the congestion window will have no effect on the rate of 1470 pure ACKs. The worst case here is if the bottleneck is congested 1471 solely with pure ACKs, but it could also be problematic if a large 1472 fraction of the load was from unresponsive ACKs, leaving little or no 1473 capacity for the load from responsive data. 1475 Since RFC 3168 was published, experimental Acknowledgement Congestion 1476 Control (AckCC) techniques have been documented in [RFC5690] 1477 (informational). So any pair of TCP end-points can choose to agree 1478 to regulate the delayed ACK ratio in response to lost or CE-marked 1479 pure ACKs. 
However, the protocol has a number of open issues 1480 concerning deployment (e.g. it requires support from both ends, it 1481 relies on two new TCP options, one of which is required on the SYN 1482 where option space is at a premium and, if either option is blocked 1483 by a middlebox, no fall-back behaviour is specified). 1485 The new TCP options address two problems, namely that TCP had: i) no 1486 mechanism to allow ECT to be set on pure ACKs; and ii) no mechanism 1487 to feed back loss or CE-marking of pure ACKs. A combination of the 1488 present specification and AccECN addresses both these problems, at 1489 least for CE-marking. So it might now be possible to design an ECN- 1490 specific ACK congestion control scheme without the extra TCP options 1491 proposed in RFC 5690. However, such a mechanism is out of scope of 1492 the present document. 1494 Setting aside the practicality of RFC 5690, the need for AckCC has 1495 not been conclusively demonstrated. It has been argued that the 1496 Internet has survived so far with no mechanism to even detect loss of 1497 pure ACKs. However, it has also been argued that ECN is not the same 1498 as loss. Packet discard can naturally thin the ACK load to whatever 1499 the bottleneck can support, whereas ECN marking does not (it queues 1500 the ACKs instead). Nonetheless, RFC 3168 (section 7) recommends that 1501 an AQM switches over from ECN marking to discard when the marking 1502 probability becomes high. Therefore discard can still be relied on 1503 to thin out ECN-enabled pure ACKs as a last resort. 1505 4.4.2. Summary: Enabling ECN on Pure ACKs 1507 In the case when AccECN has been negotiated, it provides a feasible 1508 congestion response mechanism, so the arguments for ECT on pure ACKs 1509 heavily outweigh those against. ECN is always more and never less 1510 reliable for delivery of congestion notification. 
A cwnd reduction 1511 needs to be considered by congestion control designers as a response 1512 to congestion on pure ACKs. Separately, AckCC (or an improved 1513 variant exploiting AccECN) could optionally be used to regulate the 1514 spacing between pure ACKs. However, it is not clear whether AckCC is 1515 justified. If it is not, packet discard will still act as the 1516 "congestion response of last resort" by thinning out the traffic. In 1517 contrast, not setting ECT on pure ACKs is certainly detrimental to 1518 performance, because when a pure ACK is lost it can prevent the 1519 release of new data. 1521 In the case when Classic ECN has been negotiated, the argument for 1522 ECT on pure ACKs is less clear-cut. Some of the installed base of 1523 RFC 3168 implementations might happen to (unintentionally) provide a 1524 feedback mechanism to support a cwnd response. For those that did 1525 not, setting ECT on pure ACKs would be better for the flow's own 1526 performance than not setting it. However, where there was no 1527 feedback mechanism, setting ECT could do slightly more harm than not 1528 setting it. AckCC could provide a complementary response mechanism, 1529 because it is designed to work with RFC 3168 ECN, but it has 1530 deployment challenges. In summary, a congestion response mechanism 1531 is unlikely to be feasible with the installed base of classic ECN. 1533 During review of this specification, it was decided that allowing 1534 hosts to set ECT on Pure ACKs without a feasible response mechanism 1535 would set an undesirable precedent. It would certainly improve the 1536 flow's own performance, but it would slightly increase potential harm 1537 to others. Therefore, Section 3.2.3 allows ECT on Pure ACKs if 1538 AccECN feedback has been negotiated, but not with classic RFC 3168 1539 ECN feedback. 1541 4.5. 
Window Probes 1543 Section 6.1.6 of RFC 3168 presents only the reliability argument for 1544 prohibiting ECT on Window probes: 1546 "If a window probe packet is dropped in the network, this loss is 1547 not detected by the receiver. Therefore, the TCP data sender MUST 1548 NOT set either an ECT codepoint or the CWR bit on window probe 1549 packets. 1551 However, because window probes use exact sequence numbers, they 1552 cannot be easily spoofed in denial-of-service attacks. Therefore, 1553 if a window probe arrives with the CE codepoint set, then the 1554 receiver SHOULD respond to the ECN indications." 1556 The reliability argument has already been addressed in Section 4.1. 1558 Allowing ECT on window probes could considerably improve performance 1559 because, once the receive window has reopened, if a window probe is 1560 lost the sender will stall until the next window probe reaches the 1561 receiver, which might be after the maximum retransmission timeout (at 1562 least 1 minute [RFC6928]). 1564 On the bright side, RFC 3168 at least specifies the receiver 1565 behaviour if a CE-marked window probe arrives, so changing the 1566 behaviour ought to be less painful than for other packet types. 1568 4.6. FINs 1570 RFC 3168 is silent on whether a TCP sender can set ECT on a FIN. A 1571 FIN is considered as part of the sequence of data, and the rate of 1572 pure ACKs sent after a FIN could be controlled by a CE marking on the 1573 FIN. Therefore there is no reason not to set ECT on a FIN. 1575 4.7. RSTs 1577 RFC 3168 is silent on whether a TCP sender can set ECT on a RST. The 1578 host generating the RST message does not have an open connection 1579 after sending it (either because there was no such connection when 1580 the packet that triggered the RST message was received or because the 1581 packet that triggered the RST message also triggered the closure of 1582 the connection). 
1584 Moreover, the receiver of a CE-marked RST message can either: i) 1585 accept the RST message and close the connection; ii) emit a so-called 1586 challenge ACK in response (with suitable throttling) [RFC5961] and 1587 otherwise ignore the RST (e.g. because the sequence number is in- 1588 window but not the precise number expected next); or iii) discard the 1589 RST message (e.g. because the sequence number is out-of-window). In 1590 the first two cases there is no point in echoing any CE mark received 1591 because the sender closed its connection when it sent the RST. In 1592 the third case it makes sense to discard the CE signal as well as the 1593 RST. 1595 Although a congestion response following a CE-marking on a RST does 1596 not appear to make sense, the following factors have been considered 1597 before deciding whether the sender ought to set ECT on a RST message: 1599 o As explained above, a congestion response by the sender of a CE- 1600 marked RST message is not possible; 1602 o So the only reason for the sender setting ECT on a RST would be to 1603 improve the reliability of the message's delivery; 1605 o RST messages are used to both mount and mitigate attacks: 1607 * Spoofed RST messages are used by attackers to terminate ongoing 1608 connections, although the mitigations in RFC 5961 have 1609 considerably raised the bar against off-path RST attacks; 1611 * Legitimate RST messages allow endpoints to inform their peers 1612 to eliminate existing state that corresponds to non-existent 1613 connections, liberating resources e.g.
in DoS attack 1614 scenarios; 1616 o AQMs are advised to disable ECN marking during persistent 1617 overload, so: 1619 * it is harder for an attacker to exploit ECN to intensify an 1620 attack; 1622 * it is harder for a legitimate user to exploit ECN to more 1623 reliably mitigate an attack; 1625 o Prohibiting ECT on a RST would deny the benefit of ECN to 1626 legitimate RST messages, but not to attackers who can disregard 1627 RFCs; 1629 o If ECT were prohibited on RSTs: 1631 * it would be easy for security middleboxes to discard all ECN- 1632 capable RSTs; 1634 * However, unlike a SYN flood, it is already easy for a security 1635 middlebox (or host) to distinguish a RST flood from legitimate 1636 traffic [RFC5961], and even if some legitimate RSTs are 1637 accidentally removed as well, legitimate connections still 1638 function. 1640 So, on balance, it has been decided that it is worth experimenting 1641 with ECT on RSTs. During experiments, if the ECN capability on RSTs 1642 is found to open a vulnerability that is hard to close, this decision 1643 can be reversed before it is specified for the standards track. 1645 4.8. Retransmitted Packets 1647 RFC 3168 says the sender "MUST NOT" set ECT on retransmitted packets. 1648 The rationale for this consumes nearly 2 pages of RFC 3168, so the 1649 reader is referred to section 6.1.5 of RFC 3168, rather than quoting 1650 it all here. There are essentially three arguments, namely: 1651 reliability; DoS attacks; and over-reaction to congestion. We 1652 address them in order below. 1654 The reliability argument has already been addressed in Section 4.1. 1656 Protection against DoS attacks is not afforded by prohibiting ECT on 1657 retransmitted packets. An attacker can set CE on spoofed 1658 retransmissions whether or not it is prohibited by an RFC.
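Since an attacker chooses the bits on a spoofed packet, any protection has to sit in the receiver's handling of CE rather than in a rule for senders. The following is a minimal sketch of that idea, assuming a deliberately simplified in-window test on the segment's starting sequence number. The function names are hypothetical, and a real stack would apply its full set of validity checks rather than this single comparison.

```python
MOD = 1 << 32  # TCP sequence numbers wrap modulo 2**32


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def ce_to_count(ce_marked: bool, seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Act on a CE mark only if the segment passed the validity check."""
    return ce_marked and in_window(seq, rcv_nxt, rcv_wnd)
```

With this check, a CE mark on a segment whose sequence number falls outside the receive window is not counted, whatever ECN field the packet arrived with.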
1660 Protection against the DoS attack described in section 6.1.5 of RFC 1661 3168 is solely afforded by the requirement that "the TCP data 1662 receiver SHOULD ignore the CE codepoint on out-of-window packets". 1663 Therefore in Section 3.2.7 the sender is allowed to set ECT on 1664 retransmitted packets, in order to reduce the chance of them being 1665 dropped. We also strengthen the receiver's requirement from "SHOULD 1666 ignore" to "MUST ignore". And we generalize the receiver's 1667 requirement to include failure of any validity check, not just out- 1668 of-window checks, in order to include the more stringent validity 1669 checks in RFC 5961 that have been developed since RFC 3168. 1671 A consequence is that, for those retransmitted packets that arrive at 1672 the receiver after the original packet has been properly received 1673 (so-called spurious retransmissions), any CE marking will be ignored. 1674 There is no problem with that because the fact that the original 1675 packet has been delivered implies that the sender's original 1676 congestion response (when it deemed the packet lost and retransmitted 1677 it) was unnecessary. 1679 Finally, the third argument is about over-reacting to congestion. 1680 The argument goes that, if a retransmitted packet is dropped, the 1681 sender will not detect it, so it will not react again to congestion 1682 (it would have reduced its congestion window already when it 1683 retransmitted the packet). Whereas, if retransmitted packets can be 1684 CE tagged instead of dropped, senders could potentially react more 1685 than once to congestion. However, we argue that it is legitimate to 1686 respond again to congestion if it still persists in subsequent round 1687 trip(s). 1689 Therefore, in all three cases, it is not incorrect to set ECT on 1690 retransmissions. 1692 4.9. 
General Fall-back for any Control Packet 1694 Extensive experiments have found no evidence of any traversal 1695 problems with ECT on any TCP control packet [Mandalari18]. 1696 Nonetheless, Sections 3.2.1.4 and 3.2.2.3 specify fall-back measures 1697 if ECT on the first packet of each half-connection (SYN or SYN-ACK) 1698 appears to be blocking progress. Here, the question of fall-back 1699 measures for ECT on other control packets is explored. It supports 1700 the advice given in Section 3.2.8; until there's evidence that 1701 something's broken, don't fix it. 1703 If an implementation has had to disable ECT to ensure the first 1704 packet of a flow (SYN or SYN-ACK) gets through, the question arises 1705 whether it ought to disable ECT on all subsequent control packets 1706 within the same TCP connection. Without evidence of any such 1707 problems, this seems unnecessarily cautious. Particularly given it 1708 would be hard to detect loss of most other types of TCP control 1709 packets that are not ACK'd. And particularly given that 1710 unnecessarily removing ECT from other control packets could lead to 1711 performance problems, e.g. by directing them into an inferior queue 1712 [I-D.ietf-tsvwg-ecn-l4s-id] or over a different path, because some 1713 broken multipath equipment (erroneously) routes based on all 8 bits 1714 of the Diffserv field. 1716 In the case where a connection starts without ECT on the SYN (perhaps 1717 because problems with previous connections had been cached), there 1718 will have been no test for ECT traversal in the client-server 1719 direction until the pure ACK that completes the handshake. It is 1720 possible that some middlebox might block ECT on this pure ACK or on 1721 later retransmissions of lost packets. Similarly, after a route 1722 change, the new path might include some middlebox that blocks ECT on 1723 some or all TCP control packets. 
However, without evidence of such 1724 problems, the complexity of a fix does not seem worthwhile. 1726 MORE MEASUREMENTS NEEDED (?): If further two-ended measurements do 1727 find evidence for these traversal problems, measurements would be 1728 needed to check for correlation of ECT traversal problems between 1729 different control packets. It might then be necessary to 1730 introduce a catch-all fall-back rule that disables ECT on certain 1731 subsequent TCP control packets based on some criteria developed 1732 from these measurements. 1734 5. Interaction with popular variants or derivatives of TCP 1736 The following subsections discuss any interactions between setting 1737 ECT on all packets and using the following popular variants of TCP: 1738 IW10 and TFO. It also briefly notes the possibility that the 1739 principles applied here should translate to protocols derived from 1740 TCP. This section is informative not normative, because no 1741 interactions have been identified that require any change to 1742 specifications. The subsection on IW10 discusses potential changes 1743 to specifications but recommends that no changes are needed. 1745 The designs of the following TCP variants have also been assessed and 1746 found not to interact adversely with ECT on TCP control packets: SYN 1747 cookies (see Appendix A of [RFC4987] and section 3.1 of [RFC5562]), 1748 TCP Fast Open (TFO [RFC7413]) and L4S [I-D.ietf-tsvwg-l4s-arch]. 1750 5.1. IW10 1752 IW10 is an experiment to determine whether it is safe for TCP to use 1753 an initial window of 10 SMSS [RFC6928]. 1755 This subsection does not recommend any additions to the present 1756 specification in order to interwork with IW10. The specifications as 1757 they stand are safe, and there is only a corner-case with ECT on the 1758 SYN where performance could be occasionally improved, as explained 1759 below. 
1761 As specified in Section 3.2.1.1, a TCP initiator can only set ECT on 1762 the SYN if it requests AccECN support. If, however, the SYN-ACK 1763 tells the initiator that the responder does not support AccECN, 1764 Section 3.2.1.1 advises the initiator to conservatively reduce its 1765 initial window to 1 SMSS because, if the SYN was CE-marked, the SYN- 1766 ACK has no way to feed that back. 1768 If the initiator implements IW10, it seems rather over-conservative 1769 to reduce IW from 10 to 1 just in case a congestion marking was 1770 missed. Nonetheless, the reduction to 1 SMSS will rarely harm 1771 performance, because: 1773 o as long as the initiator is caching failures to negotiate AccECN, 1774 subsequent attempts to access the same server will not use ECT on 1775 the SYN anyway, so there will no longer be any need to 1776 conservatively reduce IW; 1778 o currently, at least for web sessions, it is extremely rare for a 1779 TCP initiator (client) to have more than one data segment to send 1780 at the start of a TCP connection [28; Fig 3] - IW10 is primarily 1781 exploited by TCP servers. 1783 If a responder receives feedback that the SYN-ACK was CE-marked, 1784 Section 3.2.2.2 mandates that it reduces its initial window to 1 1785 SMSS. When the responder also implements IW10, it is particularly 1786 important to adhere to this requirement in order to avoid overflowing 1787 a queue that is clearly already congested. 1789 5.2. TFO 1791 TCP Fast Open (TFO [RFC7413]) is an experiment to remove the round 1792 trip delay of TCP's 3-way hand-shake (3WHS). A TFO initiator caches 1793 a cookie from a previous connection with a TFO-enabled server. 
Then, 1794 for subsequent connections to the same server, any data included on 1795 the SYN can be passed directly to the server application, which can 1796 then return up to an initial window of response data on the SYN-ACK 1797 and on data segments straight after it, without waiting for the ACK 1798 that completes the 3WHS. 1800 The TFO experiment and the present experiment to add ECN-support for 1801 TCP control packets can be combined without altering either 1802 specification, which is justified as follows: 1804 o The handling of ECN marking on a SYN is no different whether or 1805 not it carries data. 1807 o In response to any CE-marking on the SYN-ACK, the responder adopts 1808 the normal response to congestion, as discussed in Section 7.2 of 1809 [RFC7413]. 1811 5.3. TCP Derivatives 1813 Experience from experiments on adding ECN support to all TCP packets 1814 ought to be directly transferable between TCP and derivatives of TCP, 1815 like SCTP or QUIC. 1817 Stream Control Transmission Protocol (SCTP [RFC4960]) is a standards 1818 track transport protocol derived from TCP. SCTP currently does not 1819 include ECN support, but Appendix A of RFC 4960 broadly describes how 1820 it would be supported and a (long-expired) draft on the addition of 1821 ECN to SCTP has been produced [I-D.stewart-tsvwg-sctpecn]. This 1822 draft avoided setting ECT on control packets and retransmissions, 1823 closely following the arguments in RFC 3168. 1825 QUIC [I-D.ietf-quic-transport] is another standards track transport 1826 protocol offering similar services to TCP but intended to exploit 1827 some of the benefits of running over UDP. Building on the arguments 1828 in the current draft, a QUIC sender sets ECT(0) on all packets. 1830 6. Security Considerations 1832 Section 3.2.6 considers the question of whether ECT on RSTs will 1833 allow RST attacks to be intensified. 
There are several security 1834 arguments presented in RFC 3168 for preventing the ECN marking of TCP 1835 control packets and retransmitted segments. We believe all of them 1836 have been properly addressed in Section 4, particularly Section 4.2.4 1837 and Section 4.8 on DoS attacks using spoofed ECT-marked SYNs and 1838 spoofed CE-marked retransmissions. 1840 7. IANA Considerations 1842 There are no IANA considerations in this memo. 1844 8. Acknowledgments 1846 Thanks to Mirja Kuehlewind, David Black, Padma Bhooma, Gorry 1847 Fairhurst, Michael Scharf, Yuchung Cheng and Christophe Paasch for 1848 their useful reviews. 1850 The work of Marcelo Bagnulo has been performed in the framework of 1851 the H2020-ICT-2014-2 project 5G NORMA. His contribution reflects the 1852 consortium's view, but the consortium is not liable for any use that 1853 may be made of any of the information contained therein. 1855 Bob Briscoe's contribution was partly funded by the Research Council 1856 of Norway through the TimeIn project. The views expressed here are 1857 solely those of the authors. 1859 9. References 1861 9.1. Normative References 1863 [I-D.ietf-tcpm-accurate-ecn] 1864 Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More 1865 Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate- 1866 ecn-08 (work in progress), March 2019. 1868 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1869 Requirement Levels", BCP 14, RFC 2119, 1870 DOI 10.17487/RFC2119, March 1997, 1871 . 1873 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 1874 of Explicit Congestion Notification (ECN) to IP", 1875 RFC 3168, DOI 10.17487/RFC3168, September 2001, 1876 . 1878 [RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's 1879 Robustness to Blind In-Window Attacks", RFC 5961, 1880 DOI 10.17487/RFC5961, August 2010, 1881 . 

   [RFC8311]  Black, D., "Relaxing Restrictions on Explicit Congestion
              Notification (ECN) Experimentation", RFC 8311,
              DOI 10.17487/RFC8311, January 2018.

9.2.  Informative References

   [ecn-overload]
              Steen, H., "Destruction Testing: Ultra-Low Delay using
              Dual Queue Coupled Active Queue Management", Masters
              Thesis, Uni Oslo, May 2017.

   [ecn-pam]  Trammell, B., Kuehlewind, M., Boppart, D., Learmonth, I.,
              Fairhurst, G., and R. Scheffenegger, "Enabling
              Internet-Wide Deployment of Explicit Congestion
              Notification", Int'l Conf. on Passive and Active Network
              Measurement (PAM'15) pp193-205, 2015.

   [ECN-PLUS]
              Kuzmanovic, A., "The Power of Explicit Congestion
              Notification", ACM SIGCOMM 35(4):61--72, 2005.

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-20 (work
              in progress), April 2019.

   [I-D.ietf-tsvwg-ecn-l4s-id]
              Schepper, K. and B. Briscoe, "Identifying Modified
              Explicit Congestion Notification (ECN) Semantics for
              Ultra-Low Queuing Delay (L4S)",
              draft-ietf-tsvwg-ecn-l4s-id-06 (work in progress),
              March 2019.

   [I-D.ietf-tsvwg-l4s-arch]
              Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency,
              Low Loss, Scalable Throughput (L4S) Internet Service:
              Architecture", draft-ietf-tsvwg-l4s-arch-03 (work in
              progress), October 2018.

   [I-D.stewart-tsvwg-sctpecn]
              Stewart, R., Tuexen, M., and X. Dong, "ECN for Stream
              Control Transmission Protocol (SCTP)",
              draft-stewart-tsvwg-sctpecn-05 (work in progress),
              January 2014.

   [judd-nsdi]
              Judd, G., "Attaining the promise and avoiding the pitfalls
              of TCP in the Datacenter", USENIX Symposium on Networked
              Systems Design and Implementation (NSDI'15) pp.145-157,
              May 2015.

   [Kuehlewind18]
              Kuehlewind, M., Walter, M., Learmonth, I., and B.
              Trammell, "Tracing Internet Path Transparency", In Proc:
              Network Traffic Measurement and Analysis Conference (TMA)
              2018, June 2018.

   [Mandalari18]
              Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Oe.
              Alay, "Measuring ECN++: Good News for ++, Bad News for ECN
              over Mobile", IEEE Communications Magazine, March 2018.

   [Manzoor17]
              Manzoor, J., Drago, I., and R. Sadre, "How HTTP/2 is
              changing Web traffic and how to detect it", In Proc:
              Network Traffic Measurement and Analysis Conference (TMA)
              2017 pp.1-9, June 2017.

   [relax-strict-ecn]
              Tilmans, O., "tcp: Accept ECT on SYN in the presence of
              RFC8311", Linux netdev patch list, April 2019.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122,
              DOI 10.17487/RFC1122, October 1989.

   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
              Congestion Notification (ECN) Signaling with Nonces",
              RFC 3540, DOI 10.17487/RFC3540, June 2003.

   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol",
              RFC 4960, DOI 10.17487/RFC4960, September 2007.

   [RFC4987]  Eddy, W., "TCP SYN Flooding Attacks and Common
              Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007.

   [RFC5562]  Kuzmanovic, A., Mondal, A., Floyd, S., and K.
              Ramakrishnan, "Adding Explicit Congestion Notification
              (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
              DOI 10.17487/RFC5562, June 2009.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [RFC5690]  Floyd, S., Arcia, A., Ros, D., and J.
              Iyengar, "Adding Acknowledgement Congestion Control to
              TCP", RFC 5690, DOI 10.17487/RFC5690, February 2010.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011.

   [RFC6928]  Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis,
              "Increasing TCP's Initial Window", RFC 6928,
              DOI 10.17487/RFC6928, April 2013.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
              Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014.

   [RFC7567]  Baker, F., Ed. and G. Fairhurst, Ed., "IETF
              Recommendations Regarding Active Queue Management",
              BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015.

   [RFC7661]  Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating
              TCP to Support Rate-Limited Traffic", RFC 7661,
              DOI 10.17487/RFC7661, October 2015.

   [RFC8257]  Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
              and G. Judd, "Data Center TCP (DCTCP): TCP Congestion
              Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257,
              October 2017.

   [strict-ecn]
              Dumazet, E., "tcp: be more strict before accepting ECN
              negociation", Linux netdev patch list, May 2012.

Authors' Addresses

   Marcelo Bagnulo
   Universidad Carlos III de Madrid
   Av. Universidad 30
   Leganes, Madrid  28911
   SPAIN

   Phone: 34 91 6249500
   Email: marcelo@it.uc3m.es
   URI:   http://www.it.uc3m.es

   Bob Briscoe
   CableLabs
   UK

   Email: ietf@bobbriscoe.net
   URI:   http://bobbriscoe.net/