Network Working Group                                         M. Bagnulo
Internet-Draft                                                      UC3M
Intended status: Experimental                                 B. Briscoe
Expires: October 27, 2017                            Simula Research Lab
                                                          April 25, 2017

Adding Explicit Congestion Notification (ECN) to TCP control packets and
                          TCP retransmissions
                 draft-bagnulo-tcpm-generalized-ecn-03

Abstract

   This document describes an experimental modification to ECN when used
   with TCP. It allows the use of ECN on the following TCP packets:
   SYNs, pure ACKs, Window probes, FINs, RSTs and retransmissions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 27, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Motivation
     1.2.  Experiment Goals
     1.3.  Document Structure
   2.  Terminology
   3.  Specification
     3.1.  Network (e.g. Firewall) Behaviour
     3.2.  Endpoint Behaviour
       3.2.1.  SYN
       3.2.2.  SYN-ACK
       3.2.3.  Pure ACK
       3.2.4.  Window Probe
       3.2.5.  FIN
       3.2.6.  RST
       3.2.7.  Retransmissions
   4.  Rationale
     4.1.  The Reliability Argument
     4.2.  SYNs
       4.2.1.  Argument 1a: Unrecognized CE on the SYN
       4.2.2.  Argument 1b: Unrecognized ECT on the SYN
       4.2.3.  Argument 2: DoS Attacks
     4.3.  SYN-ACKs
     4.4.  Pure ACKs
       4.4.1.  Cwnd Response to CE-Marked Pure ACKs
       4.4.2.  ACK Rate Response to CE-Marked Pure ACKs
       4.4.3.  Summary: Enabling ECN on Pure ACKs
     4.5.  Window Probes
     4.6.  FINs
     4.7.  RSTs
     4.8.  Retransmitted Packets
   5.  Interaction with popular variants or derivatives of TCP
     5.1.  SCTP
     5.2.  IW10
     5.3.  TFO
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgments
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   RFC 3168 [RFC3168] specifies support of Explicit Congestion
   Notification (ECN) in IP (v4 and v6). By using the ECN capability,
   switches performing Active Queue Management (AQM) can use ECN marks
   instead of packet drops to signal congestion to the endpoints of a
   communication. This results in lower packet loss and increased
   performance. RFC 3168 also specifies support for ECN in TCP, but
   solely on data packets. For various reasons it precludes the use of
   ECN on TCP control packets (TCP SYN, TCP SYN-ACK, pure ACKs, Window
   probes) and on retransmitted packets. RFC 3168 is silent about the
   use of ECN on RST and FIN packets. RFC 5562 [RFC5562] is an
   experimental modification to ECN that enables ECN support for TCP
   SYN-ACK packets.

   This document defines an experimental modification to ECN [RFC3168]
   that enables ECN support on all the aforementioned types of TCP
   packet. [I-D.ietf-tsvwg-ecn-experimentation] is a standards track
   procedural device that relaxes standards track requirements in RFC
   3168 that would otherwise preclude these experimental modifications.

   The present document also considers the implications for common
   derivatives and variants of TCP, such as SCTP [RFC4960], if the
   experiment is successful. One particular variant of TCP adds
   accurate ECN feedback (AccECN [I-D.ietf-tcpm-accurate-ecn]), without
   which ECN support cannot be added to SYNs. Nonetheless, ECN support
   can be added to all the other types of TCP packet whether or not
   AccECN is also supported.

1.1.  Motivation

   The absence of ECN support on TCP control packets and retransmissions
   has a potentially harmful effect. In any ECN deployment,
   non-ECN-capable packets suffer a penalty when they traverse a
   congested bottleneck. For instance, with a drop probability of 1%,
   1% of connection attempts suffer a timeout of about 1 second before
   the SYN is retransmitted, which is highly detrimental to the
   performance of short flows. TCP control packets, such as TCP SYNs and
   pure ACKs, are important for performance, so dropping them is best
   avoided.

   Non-ECN control packets particularly harm performance in environments
   where the ECN marking level is high. For example, [judd-nsdi] shows
   that in a data centre (DC) environment where ECN is used (in
   conjunction with DCTCP), the probability of being able to establish a
   new connection using a non-ECN SYN packet drops to close to zero even
   when there are only 16 ongoing TCP flows transmitting at full speed.
   In this data centre context, the issue is that DCTCP's aggressive
   response to packet marking leads to a high marking probability for
   ECN-capable packets, and in turn a high drop probability for non-ECN
   packets. Therefore non-ECN SYNs are dropped aggressively, rendering
   it nearly impossible to establish a new connection in the presence of
   even mild traffic load.
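
   To put the 1% example in Section 1.1 into numbers, the following
   Python sketch models the extra connection set-up delay caused by SYN
   loss. It is illustrative only and rests on assumptions that are not
   part of this specification: each SYN is lost independently with
   probability p, SYN-ACK loss is ignored, and the initiator's
   retransmission timer starts at 1 second and doubles after each
   timeout (as in [RFC6298]). The function names are hypothetical.

```python
"""Rough model of how SYN loss inflates TCP connection set-up time.

Assumptions (illustrative only): each SYN is lost independently with
probability p, SYN-ACK loss is ignored, and the retransmission timer
starts at initial_rto seconds and doubles after every timeout.
"""


def extra_setup_delay(timeouts: int, initial_rto: float = 1.0) -> float:
    """Extra delay after `timeouts` consecutive SYN losses: 1 + 2 + 4 + ... s."""
    return initial_rto * (2 ** timeouts - 1)


def prob_at_least_timeouts(p: float, k: int) -> float:
    """Probability that the first k SYN transmissions are all lost."""
    return p ** k


def mean_extra_delay(p: float, initial_rto: float = 1.0,
                     max_timeouts: int = 6) -> float:
    """Expected extra set-up delay, ignoring attempts beyond max_timeouts."""
    return sum(
        (p ** k) * (1 - p) * extra_setup_delay(k, initial_rto)
        for k in range(max_timeouts + 1)
    )


if __name__ == "__main__":
    p = 0.01  # 1% drop probability for non-ECN SYNs
    print(f"P(at least 1 s extra): {prob_at_least_timeouts(p, 1):.4f}")
    print(f"P(at least 3 s extra): {prob_at_least_timeouts(p, 2):.6f}")
    print(f"mean extra delay: {mean_extra_delay(p) * 1000:.1f} ms")
```

   With p = 0.01 the mean extra delay is only about 10 ms, but that
   average hides the fact that one connection in a hundred waits at
   least a full second, which dominates the completion time of a short
   flow.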

   Finally, there are ongoing experimental efforts to promote the
   adoption of a slightly modified variant of DCTCP (and similar
   congestion controls) over the Internet to achieve low latency, low
   loss and scalable throughput (L4S) for all communications
   [I-D.briscoe-tsvwg-l4s-arch]. In such an approach, L4S packets
   identify themselves using an ECN codepoint. With L4S and potentially
   other similar cases, preventing TCP control packets from obtaining
   the benefits of ECN would not only expose them to the prevailing
   level of congestion loss, but it would also classify control packets
   into a different queue with different network treatment, which may
   also lead to reordering, further degrading TCP performance.

1.2.  Experiment Goals

   The goal of the experimental modifications defined in this document
   is to allow the use of ECN on all TCP packets. Experiments are
   expected in the public Internet as well as in controlled environments
   to understand the following issues:

   o  How SYNs, Window probes, pure ACKs, FINs, RSTs and retransmissions
      that carry the ECT(0), ECT(1) or CE codepoints are processed by
      the TCP endpoints and the network (including routers, firewalls
      and other middleboxes). In particular we would like to learn if
      these packets are frequently blocked or if these packets are
      usually forwarded and processed.

   o  The scale of deployment of the different flavours of ECN,
      including [RFC3168], [RFC5562], [RFC3540] and
      [I-D.ietf-tcpm-accurate-ecn].

   o  How much the performance of TCP communications is improved by
      allowing ECN marking of each packet type.

   o  Any issues (including security issues) raised by enabling ECN
      marking of these packets.

   The data gathered through the experiments described in this document,
   particularly under the first 2 bullets above, will help in the design
   of the final mechanism (if any) for adding ECN support to the
   different packet types considered in this document. Whenever data
   input is needed to assist in a design choice, it is spelled out
   throughout the document.

   Success criteria: The experiment will be a success if we obtain
   enough data to have a clearer view of the deployability and benefits
   of enabling ECN on all TCP packets, as well as any issues. If the
   results of the experiment show that it is feasible to deploy such
   changes; that there are gains to be achieved through the changes
   described in this specification; and that no other major issues may
   interfere with the deployment of the proposed changes; then it would
   be reasonable to adopt the proposed changes in a standards track
   specification that would update RFC 3168.

1.3.  Document Structure

   The remainder of this document is structured as follows. In
   Section 2, we present the terminology used in the rest of the
   document. In Section 3, we specify the modifications to provide ECN
   support to TCP SYNs, pure ACKs, Window probes, FINs, RSTs and
   retransmissions. We describe both the network behaviour and the
   endpoint behaviour. RFC 3168 provides a number of specific reasons
   why ECN support is not appropriate for each packet type. In
   Section 4, we revisit each of these arguments for each packet type
   to justify why it is reasonable to conduct this experiment.
   Section 5 discusses variations of the specification that will be
   necessary to interwork with a number of popular variants or
   derivatives of TCP.

2.  Terminology

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].

   Pure ACK:  A TCP segment with the ACK flag set and no data payload.

   SYN:  A TCP segment with the SYN (synchronize) flag set.

   Window probe:  Defined in [RFC0793], a window probe is a TCP segment
      with only one byte of data sent to learn if the receive window is
      still zero.

   FIN:  A TCP segment with the FIN (finish) flag set.

   RST:  A TCP segment with the RST (reset) flag set.

   Retransmission:  A TCP segment that has been retransmitted by the TCP
      sender.

   ECT:  ECN-Capable Transport. One of the two codepoints ECT(0) or
      ECT(1) in the ECN field [RFC3168] of the IP header (v4 or v6). An
      ECN-capable sender sets one of these to indicate that both
      transport end-points support ECN. When this specification says
      the sender sets an ECT codepoint, by default it means ECT(0).
      Optionally, it could mean ECT(1), which is in the process of
      being redefined for use by L4S experiments
      [I-D.ietf-tsvwg-ecn-experimentation]
      [I-D.briscoe-tsvwg-ecn-l4s-id].

   Not-ECT:  The ECN codepoint set by senders that indicates that the
      transport is not ECN-capable.

   CE:  Congestion Experienced. The ECN codepoint that an intermediate
      node sets to indicate congestion [RFC3168]. A node sets an
      increasing proportion of ECT packets to CE as the level of
      congestion increases.

3.  Specification

3.1.  Network (e.g. Firewall) Behaviour

   Previously the specification of ECN for TCP [RFC3168] required the
   sender to set not-ECT on TCP control packets and retransmissions.
   Some readers of RFC 3168 might have erroneously interpreted this as a
   requirement for firewalls, intrusion detection systems, etc. to check
   and enforce this behaviour.
   Section 4.3 of [I-D.ietf-tsvwg-ecn-experimentation] updates RFC 3168
   to remove this ambiguity. It requires firewalls or any intermediate
   nodes not to treat certain types of ECN-capable TCP segment
   differently (except potentially in one attack scenario). This is
   likely to involve a firewall rule change in only a fraction of cases
   (at most 0.4% of paths according to the tests reported in
   Section 4.2.2).

   If a TCP sender encounters a middlebox blocking ECT on certain TCP
   segments, the specification below includes behaviour to fall back to
   non-ECN. However, this loses the benefit of ECN on control packets.
   Therefore, operators are RECOMMENDED to alter their firewall rules to
   comply with the requirement referred to above (section 4.3 of
   [I-D.ietf-tsvwg-ecn-experimentation]).

3.2.  Endpoint Behaviour

   The changes to the specification of TCP over ECN [RFC3168] defined
   here solely alter the behaviour of the sending host for each
   half-connection. All changes can be deployed at each end-point
   independently of others.

   The feedback behaviour at the receiver depends on whether classic ECN
   TCP feedback [RFC3168] or Accurate ECN (AccECN) TCP feedback
   [I-D.ietf-tcpm-accurate-ecn] has been negotiated. Nonetheless,
   neither receiver feedback behaviour is altered by the present
   specification.

   For each type of control packet or retransmission, the following
   sections detail changes to the sender's behaviour in two respects: i)
   whether it sets ECT; and ii) its response to congestion feedback.
   Table 1 summarises these two behaviours for each type of packet, but
   the relevant subsection below should be referred to for the detailed
   behaviour. The subsection on the SYN is more complex than the
   others, because it has to include fall-back behaviour if the ECT
   packet appears not to have got through, and caching of the outcome to
   detect persistent failures.

   +-----------+-----------------+-----------------+-------------------+
   | TCP       | ECN field if    | ECN field if    | Congestion        |
   | packet    | AccECN f/b      | RFC3168 f/b     | Response          |
   | type      | negotiated*     | negotiated*     |                   |
   +-----------+-----------------+-----------------+-------------------+
   | SYN       | ECT             | not-ECT         | Reduce IW         |
   |           |                 |                 |                   |
   | SYN-ACK   | ECT             | ECT             | Reduce IW as in   |
   | [RFC5562] |                 |                 | [RFC5562]         |
   |           |                 |                 |                   |
   | Pure ACK  | ECT             | ECT             | Usual cwnd        |
   |           |                 |                 | response and      |
   |           |                 |                 | optionally        |
   |           |                 |                 | [RFC5690]         |
   |           |                 |                 |                   |
   | W Probe   | ECT             | ECT             | Usual cwnd        |
   |           |                 |                 | response          |
   |           |                 |                 |                   |
   | FIN       | ECT             | ECT             | None or           |
   |           |                 |                 | optionally        |
   |           |                 |                 | [RFC5690]         |
   |           |                 |                 |                   |
   | RST       | ECT             | ECT             | N/A               |
   |           |                 |                 |                   |
   | Re-XMT    | ECT             | ECT             | Usual cwnd        |
   |           |                 |                 | response          |
   +-----------+-----------------+-----------------+-------------------+

   Window probe and retransmission are abbreviated to W Probe and
   Re-XMT.
   * For a SYN, "negotiated" means "requested".

   Table 1: Summary of sender behaviour. In each case the relevant
   section below should be referred to for the detailed behaviour.

   It can be seen that the sender can set ECT in all cases, except if it
   is not requesting AccECN feedback on the SYN. Therefore it is
   RECOMMENDED that the experimental AccECN specification
   [I-D.ietf-tcpm-accurate-ecn] is implemented (as well as the present
   specification), because it is expected that ECT on the SYN will give
   the most significant performance gain, particularly for short flows.
   Nonetheless, this specification also caters for the case where AccECN
   feedback is not implemented.

3.2.1.  SYN

3.2.1.1.  Setting ECT on the SYN

   With classic [RFC3168] ECN feedback, the SYN was never expected to be
   ECN-capable, so the flag provided to feed back congestion was put to
   another use (it is used in combination with other flags to indicate
   that the responder supports ECN). In contrast, Accurate ECN (AccECN)
   feedback [I-D.ietf-tcpm-accurate-ecn] provides two codepoints in the
   SYN-ACK for the responder to feed back whether or not the SYN arrived
   marked CE.

   Therefore, a TCP initiator MUST NOT set ECT on a SYN unless it also
   attempts to negotiate Accurate ECN feedback in the same SYN.

   For the experiments proposed here, if the SYN is requesting AccECN
   feedback, the TCP sender will also set ECT on the SYN. It can ignore
   the prohibition in section 6.1.1 of RFC 3168 against setting ECT on
   such a SYN.

   The following subsections about the SYN solely apply to this case
   where the initiator sent an ECT SYN.

3.2.1.2.  Caching Lack of Support for ECT on SYNs

   Until AccECN servers become widely deployed, a TCP initiator that
   sets ECT on a SYN (which implies the same SYN also requests AccECN,
   as required above) SHOULD also maintain a cache per server to record
   any failure of previous attempts.

   The initiator will record any server's SYN-ACK response that does not
   support AccECN. Subsequently the initiator will not set ECT on a SYN
   to such a server, but it can still always request AccECN support
   (because the response will state any earlier stage of ECN evolution
   that the server supports with no performance penalty). The initiator
   will discover a server that has upgraded to support AccECN as soon as
   it next connects, then it can remove the server from its cache and
   subsequently always set ECT for that server.

   If the initiator times out without seeing a SYN-ACK, it will also
   cache this fact (see fall-back in Section 3.2.1.4 for details).
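
   The sender's choice of ECN field in Table 1, combined with the SYN
   rules in Sections 3.2.1.1 and 3.2.1.2, can be sketched in Python as
   follows. This is a minimal illustration, not an implementation: the
   class and function names are hypothetical, "ECT" is taken to mean
   ECT(0) as in Section 2, the cache is assumed to be keyed by a server
   name string, and the handling of SYN timeouts (Section 3.2.1.4) is
   left out.

```python
from enum import Enum
from typing import Optional


class Ecn(Enum):
    """Values of the 2-bit ECN field in the IP header [RFC3168]."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


class PacketType(Enum):
    SYN = "SYN"
    SYN_ACK = "SYN-ACK"
    PURE_ACK = "pure ACK"
    WINDOW_PROBE = "window probe"
    FIN = "FIN"
    RST = "RST"
    RETRANSMISSION = "retransmission"


class NoAccEcnCache:
    """Hypothetical per-server cache of SYN-ACKs that did not confirm AccECN."""

    def __init__(self) -> None:
        self._servers: set[str] = set()

    def record_synack(self, server: str, accecn_confirmed: bool) -> None:
        if accecn_confirmed:
            # The server now supports AccECN: resume ECT on SYNs to it.
            self._servers.discard(server)
        else:
            self._servers.add(server)

    def allows_ect_syn(self, server: str) -> bool:
        return server not in self._servers


def sender_ecn_field(ptype: PacketType, accecn_requested: bool,
                     server: Optional[str] = None,
                     cache: Optional[NoAccEcnCache] = None) -> Ecn:
    """ECN field a sender sets on an outgoing segment, following Table 1.

    accecn_requested only matters for a SYN: it says whether this SYN
    also asks for AccECN feedback.
    """
    if ptype is PacketType.SYN:
        # MUST NOT set ECT on a SYN unless the same SYN requests AccECN.
        if not accecn_requested:
            return Ecn.NOT_ECT
        # Optimistic ECT, except towards servers cached as lacking AccECN.
        if (cache is not None and server is not None
                and not cache.allows_ect_syn(server)):
            return Ecn.NOT_ECT
        return Ecn.ECT_0
    # Every other packet type in Table 1 is ECT, whatever feedback
    # scheme was negotiated.
    return Ecn.ECT_0
```

   Because the initiator keeps requesting AccECN even towards cached
   servers, a single SYN-ACK that confirms AccECN is enough to remove a
   server from the cache, matching the upgrade behaviour described
   above.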

   There is no need to cache successful attempts, because the default
   ECT SYN behaviour performs optimally on success anyway. Servers that
   do not support ECN as a whole probably do not need to be recorded
   separately from non-support of AccECN because the response to a
   request for AccECN immediately states which stage in the evolution of
   ECN the server supports (AccECN [I-D.ietf-tcpm-accurate-ecn], classic
   ECN [RFC3168] or no ECN).

   The above strategy is named "optimistic ECT and cache failures". It
   is believed to be sufficient based on initial measurements and
   assumptions detailed in Section 4.2.1, which also gives alternative
   strategies in case larger scale measurements uncover different
   scenarios.

3.2.1.3.  SYN Congestion Response

   If the SYN-ACK returned to the TCP initiator confirms that the server
   supports AccECN, it will also indicate whether or not the SYN was
   CE-marked. If the SYN was CE-marked, the initiator MUST reduce its
   Initial Window (IW) and SHOULD reduce it to 1 SMSS (sender maximum
   segment size).

   If the SYN-ACK shows that the server does not support AccECN, the TCP
   initiator MUST conservatively reduce its Initial Window and SHOULD
   reduce it to 1 SMSS. A reduction to a value greater than 1 SMSS MAY
   be appropriate (see Section 4.2.1). Conservatism is necessary because
   a non-AccECN SYN-ACK cannot show whether the SYN was CE-marked.

   If the TCP initiator (host A) receives a SYN from the remote end
   (host B) after it has sent a SYN to B, it indicates the (unusual)
   case of a simultaneous open. Host A will respond with a SYN-ACK.
   Host A will probably then receive a SYN-ACK in response to its own
   SYN, after which it can follow the appropriate one of the two
   paragraphs above.
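
   The initial window rules of Section 3.2.1.3 can be summarised by a
   small Python function. It is a sketch only: windows are in bytes,
   the function and field names are hypothetical, and where the text
   says the initiator MUST reduce its IW and SHOULD reduce it to 1 SMSS
   the sketch simply follows the SHOULD. The MAY of a smaller reduction
   for non-AccECN servers, and the extra IW10 recommendations of
   Section 5, are not modelled.

```python
from dataclasses import dataclass


@dataclass
class SynAckInfo:
    accecn_confirmed: bool       # SYN-ACK confirms the server supports AccECN
    syn_ce_marked: bool = False  # only meaningful when accecn_confirmed


def initial_window_after_ect_syn(default_iw: int, smss: int,
                                 synack: SynAckInfo) -> int:
    """Initiator's IW in bytes after sending an ECT SYN (Section 3.2.1.3)."""
    if synack.accecn_confirmed and not synack.syn_ce_marked:
        return default_iw  # no congestion signalled: keep the usual IW
    # Either the SYN was CE-marked, or a non-AccECN SYN-ACK cannot say
    # whether it was, so reduce conservatively to 1 SMSS. This function
    # does not touch the retransmission timer or ssthresh; neither needs
    # to change in these cases.
    return min(default_iw, smss)
```

   With IW10 [RFC6928] and an SMSS of 1460 bytes, for example, a
   CE-marked SYN would shrink the first flight from 14600 bytes to 1460
   bytes.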

   In all the above cases, the initiator does not have to back off its
   retransmission timer as it would in response to a timeout following
   no response to its SYN [RFC6298], because both the SYN and the
   SYN-ACK have been successfully delivered through the network. Also,
   the initiator does not need to exit slow start or reduce ssthresh,
   which is not even required when a SYN is lost [RFC5681].

   If an initial window of 10 (IW10 [RFC6928]) is implemented, Section 5
   gives additional recommendations.

3.2.1.4.  Fall-Back Following No Response to an ECT SYN

   An ECT SYN might be lost due to an over-zealous path element (or
   server) blocking ECT packets that do not conform to RFC 3168.
   However, loss is commonplace for numerous other reasons, e.g.
   congestion loss at a non-ECN queue on the forward or reverse path,
   transmission errors, etc. Alternatively, the cause of the blockage
   might be the attempt to negotiate AccECN, or possibly other unrelated
   options on the SYN.

   To expedite connection set-up, if the retransmission timer expires
   after sending an ECT SYN, the TCP initiator SHOULD send a SYN with
   the not-ECT codepoint in the IP header. If other experimental fields
   or options were on the SYN, it will also be necessary to follow their
   specifications for fall-back. It would make sense to co-ordinate all
   the strategies for fall-back in order to isolate the specific cause
   of the problem.

   If the TCP initiator is caching failed connection attempts, it SHOULD
   NOT give up using ECT on the first SYN of subsequent connection
   attempts until it is clear that the blockage persistently and
   specifically affects ECT on SYNs. This is because loss is so
   commonplace for other reasons. Even if it does eventually decide to
   give up on ECT on the SYN, it will probably not need to give up on
   AccECN on the SYN.
   In any case, the cache should be arranged to expire so that the
   initiator will occasionally check whether the problem has been
   resolved.

   Other fall-back strategies MAY be adopted where applicable (see
   Section 4.2.2 for suggestions, and the conditions under which they
   would apply).

3.2.2.  SYN-ACK

3.2.2.1.  Setting ECT on the SYN-ACK

   For the experiments proposed here, the TCP implementation will set
   ECT on SYN-ACKs. It can ignore the requirement in section 6.1.1 of
   RFC 3168 to set not-ECT on a SYN-ACK.

   The feedback behaviour by the initiator in response to a CE-marked
   SYN-ACK from the responder depends on whether classic ECN feedback
   [RFC3168] or AccECN feedback [I-D.ietf-tcpm-accurate-ecn] has been
   negotiated. In either case no change is required to RFC 3168 or the
   AccECN specification.

   Some classic ECN implementations might ignore a CE-mark on a SYN-ACK,
   or even ignore a SYN-ACK packet entirely if it is set to ECT or CE.
   This is a possibility because an RFC 3168 implementation would not
   necessarily expect a SYN-ACK to be ECN-capable.

      FOR DISCUSSION: To eliminate this problem, the WG could decide to
      prohibit setting ECT on SYN-ACKs unless AccECN has been
      negotiated. However, this issue already came up when the IETF
      first decided to experiment with ECN on SYN-ACKs [RFC5562] and it
      was decided to go ahead without any extra precautionary measures
      because the risk was low. This was because the probability of
      encountering the problem was believed to be low and the harm if
      the problem arose was also low (see Appendix B of RFC 5562).

      MEASUREMENTS NEEDED: Server-side experiments could determine
      whether this specific problem is indeed rare across the current
      installed base of clients that support ECN.

3.2.2.2.  SYN-ACK Congestion Response

   A host that sets ECT on SYN-ACKs MUST reduce its initial window in
   response to any congestion feedback, whether using classic ECN or
   AccECN. It SHOULD reduce it to 1 SMSS. This is different to the
   behaviour specified in an earlier experiment that set ECT on the
   SYN-ACK [RFC5562]. This is justified in Section 4.3.

   The responder does not have to back off its retransmission timer
   because the ECN feedback proves that the network is delivering
   packets successfully and is not severely overloaded. Also the
   responder does not have to leave slow start or reduce ssthresh, which
   is not even required when a SYN-ACK has been lost.

   The congestion response to CE-marking on a SYN-ACK for a server that
   implements either the TCP Fast Open experiment (TFO [RFC7413]) or the
   initial window of 10 experiment (IW10 [RFC6928]) is discussed in
   Section 5.

3.2.2.3.  Fall-Back Following No Response to an ECT SYN-ACK

   After the responder sends a SYN-ACK with ECT set, if its
   retransmission timer expires it SHOULD resend a SYN-ACK with not-ECT
   set. If other experimental fields or options were on the SYN-ACK, it
   will also be necessary to follow their specifications for fall-back.
   It would make sense to co-ordinate all the strategies for fall-back
   in order to isolate the specific cause of the problem.

   The server MAY cache failed connection attempts, e.g. per client
   access network. If the TCP server is caching failed connection
   attempts, it SHOULD NOT give up using ECT on the first SYN-ACK of
   subsequent connection attempts until it is clear that the blockage
   persistently and specifically affects ECT on SYN-ACKs. This is
   because loss is so commonplace for other reasons (see
   Section 3.2.1.4). The cache should be arranged to expire so that the
   server will occasionally check whether the problem has been resolved.
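
   The fall-back and caching rules for ECT SYNs (Section 3.2.1.4) and
   ECT SYN-ACKs (Section 3.2.2.3) share the same shape, sketched below
   in Python. The specification does not say how "persistently and
   specifically" should be judged, so the sketch assumes a hypothetical
   rule: a failure is only counted when the ECT attempt timed out but
   the not-ECT retransmission then succeeded, and ECT is given up only
   after a configurable number of such failures. The threshold, expiry
   time, cache key and names are all illustrative assumptions.

```python
import time
from typing import Callable


def ecn_for_attempt(attempt: int, ect_allowed: bool) -> str:
    """Attempt 0 is the first SYN (or SYN-ACK); any retransmission is not-ECT."""
    return "ECT(0)" if ect_allowed and attempt == 0 else "not-ECT"


class EctFailureCache:
    """Hypothetical expiring cache keyed by peer (server or client network)."""

    def __init__(self, threshold: int = 3, ttl_s: float = 24 * 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold  # ECT-specific failures before giving up
        self.ttl_s = ttl_s          # entries expire so ECT is re-tried later
        self.clock = clock
        # key -> (ECT-specific failures, time of the most recent failure)
        self._entries: dict[str, tuple[int, float]] = {}

    def _failures(self, key: str) -> int:
        entry = self._entries.get(key)
        if entry is None:
            return 0
        count, last = entry
        if self.clock() - last > self.ttl_s:
            del self._entries[key]  # expired: check again whether it is fixed
            return 0
        return count

    def use_ect_on_first_attempt(self, key: str) -> bool:
        return self._failures(key) < self.threshold

    def record_outcome(self, key: str, ect_timed_out: bool,
                       fallback_succeeded: bool) -> None:
        """Call only for connections whose first attempt carried ECT."""
        if not ect_timed_out:
            self._entries.pop(key, None)  # ECT got through: forget history
        elif fallback_succeeded:
            self._entries[key] = (self._failures(key) + 1, self.clock())
        # Otherwise the not-ECT retry failed too: ordinary loss, so the
        # cache learns nothing about ECT.
```

   Requiring the not-ECT retry to succeed before counting a failure is
   one way to avoid blaming ECT for the ordinary congestion or
   transmission losses described in Section 3.2.1.4; other heuristics
   would fit the specification equally well.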

   This fall-back strategy is the same as that for ECT SYN-ACKs in
   [RFC5562]. Other fall-back strategies MAY be adopted if found to be
   more effective, e.g. one retransmission attempt using ECT before
   reverting to not-ECT.

3.2.3.  Pure ACK

   For the experiments proposed here, the TCP implementation will set
   ECT on pure ACKs. It can ignore the requirement in section 6.1.4 of
   RFC 3168 to set not-ECT on a pure ACK.

   A host that sets ECT on pure ACKs MUST reduce its congestion window
   in response to any congestion feedback, in order to regulate any data
   segments it might be sending amongst the pure ACKs. It MAY also
   implement AckCC [RFC5690] to regulate the pure ACK rate, but this is
   not required. Note that, in comparison, TCP Congestion Control
   [RFC5681] does not require a TCP to detect or respond to loss of pure
   ACKs at all; it requires no reduction in congestion window or ACK
   rate.

   The question of whether the receiver of pure ACKs is required to feed
   back any CE marks on them is a matter for the relevant feedback
   specification ([RFC3168] or [I-D.ietf-tcpm-accurate-ecn]). It is
   outside the scope of the present specification. Currently AccECN
   feedback is required to count CE marking of any control packet
   including pure ACKs. In contrast, RFC 3168 is silent on this point,
   so feedback of CE markings might be implementation-specific (see
   Section 4.4.1).

      DISCUSSION: An AccECN deployment or an implementation of RFC 3168
      that feeds back CE on pure ACKs will be at a disadvantage compared
      to an RFC 3168 implementation that does not. To solve this, the
      WG could decide to prohibit setting ECT on pure ACKs unless AccECN
      has been negotiated. If it does, the penultimate sentence of the
      Introduction will need to be modified.

3.2.4.  Window Probe

   For the experiments proposed here, the TCP sender will set ECT on
   window probes.
   It can ignore the prohibition in section 6.1.6 of RFC 3168 against
   setting ECT on a window probe.

   A window probe contains a single octet, so it is no different from a
   regular TCP data segment. Therefore a TCP receiver will feed back
   any CE marking on a window probe as normal (either using classic ECN
   feedback or AccECN feedback). The sender of the probe will then
   reduce its congestion window as normal.

   A receive window of zero indicates that the application is not
   consuming data fast enough and does not imply anything about network
   congestion. Once the receive window opens, the congestion window
   might become the limiting factor, so it is correct that CE-marked
   probes reduce the congestion window. However, CE-marking on window
   probes does not reduce the rate of the probes themselves. This is
   unlikely to present a problem, given that the interval between window
   probes doubles [RFC1122] as long as the receiver is advertising a
   zero window (currently minimum 1 second, maximum at least 1 minute
   [RFC6298]).

3.2.5.  FIN

   A TCP implementation can set ECT on a FIN.

   The TCP data receiver MUST ignore the CE codepoint on incoming FINs
   that fail any validity check. The validity check in section 5.2 of
   [RFC5961] is RECOMMENDED.

   A congestion response to a CE-marking on a FIN is not required.

   After sending a FIN, the endpoint will not send any more data in the
   connection. Therefore, even if the FIN-ACK indicates that the FIN
   was CE-marked (whether using classic or AccECN feedback), reducing
   the congestion window will not affect anything.

   After sending a FIN, a host might send one or more pure ACKs. If it
   is using one of the techniques in Section 3.2.3 to regulate the
   delayed ACK ratio for pure ACKs, that technique could equally be
   applied after a FIN. But this is not required.

3.2.6.  RST

   A TCP implementation can set ECT on a RST.
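
   Section 3.2.5 makes honouring a CE mark on an incoming FIN
   conditional on a validity check, and recommends the one in section
   5.2 of [RFC5961], which accepts a segment's acknowledgement number
   only if (SND.UNA - MAX.SND.WND) =< SEG.ACK =< SND.NXT. The Python
   sketch below applies that range test, using modulo 2^32
   sequence-number comparison, before a CE mark is counted; Section 3.2.7
   recommends the same check for retransmitted segments. The function
   names are hypothetical, and every other check a real TCP stack
   performs on a FIN is omitted.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence and acknowledgement numbers are 32-bit


def seq_leq(a: int, b: int) -> bool:
    """True if a is at or before b in circular sequence-number space."""
    return (b - a) % SEQ_SPACE < SEQ_SPACE // 2


def ack_acceptable(seg_ack: int, snd_una: int, snd_nxt: int,
                   max_snd_wnd: int) -> bool:
    """RFC 5961 section 5.2 test: SND.UNA - MAX.SND.WND <= SEG.ACK <= SND.NXT."""
    lower = (snd_una - max_snd_wnd) % SEQ_SPACE
    return seq_leq(lower, seg_ack) and seq_leq(seg_ack, snd_nxt)


def count_ce_on_fin(ce_marked: bool, seg_ack: int, snd_una: int,
                    snd_nxt: int, max_snd_wnd: int) -> bool:
    """Only a FIN that passes the validity check may contribute CE feedback."""
    return ce_marked and ack_acceptable(seg_ack, snd_una, snd_nxt, max_snd_wnd)
```

   Gating only the CE count, rather than the whole segment, keeps the
   sketch focused on the rule above; how a stack otherwise treats an
   unacceptable segment is governed by [RFC5961] itself.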
   The "challenge ACK" approach to checking the validity of RSTs (section 3.2 of [RFC5961]) is RECOMMENDED at the data receiver.

   A congestion response to a CE-marking on a RST is not required (and is not actually possible).

3.2.7. Retransmissions

   For the experiments proposed here, the TCP sender will set ECT on retransmitted segments. It can ignore the prohibition in section 6.1.5 of RFC 3168 against setting ECT on retransmissions.

   Nonetheless, the TCP data receiver MUST ignore the CE codepoint on incoming segments that fail any validity check. The validity check in section 5.2 of [RFC5961] is RECOMMENDED. This will effectively mitigate an attack that uses spoofed data packets to fool the receiver into feeding back spoofed congestion indications to the sender, which in turn would be fooled into continually halving its congestion window.

   If the TCP sender receives feedback that a retransmitted packet was CE-marked, it will react as it would to any feedback of CE-marking on a data packet.

4. Rationale

   This section is informative, not normative. It presents counter-arguments against the justifications in the RFC series for disabling ECN on TCP control segments and retransmissions. It also gives rationale for why ECT is safe on control segments that have not, so far, been mentioned in the RFC series. First it addresses over-arching arguments used for most packet types, then it addresses the specific arguments for each packet type in turn.

4.1. The Reliability Argument

   Section 5.2 of RFC 3168 states:

      "To ensure the reliable delivery of the congestion indication of the CE codepoint, an ECT codepoint MUST NOT be set in a packet unless the loss of that packet [at a subsequent node] in the network would be detected by the end nodes and interpreted as an indication of congestion."

   We believe this argument is misplaced.
   TCP does not deliver most control packets reliably. So it is more important to allow control packets to be ECN-capable, which greatly improves reliable delivery of the control packets themselves (see motivation in Section 1.1). ECN also improves the reliability and latency of delivery of any congestion notification on control packets, particularly because TCP does not detect the loss of most types of control packet anyway. Both these points outweigh by far the concern that a CE marking applied to a control packet by one node might subsequently be dropped by another node.

   The principle to determine whether a packet can be ECN-capable ought to be "do no extra harm", meaning that the reliability of a congestion signal's delivery ought to be no worse with ECN than without. In particular, setting the CE codepoint on the very same packet that would otherwise have been dropped fulfills this criterion, since either the packet is delivered and the CE signal is delivered to the endpoint, or the packet is dropped and the original congestion signal (packet loss) is delivered to the endpoint.

   The concern about a CE marking being dropped at a subsequent node might be motivated by the idea that ECN-marking a packet at the first node does not remove the packet, so it could go on to worsen congestion at a subsequent node. However, it is not useful to reason about congestion by considering single packets. The departure rate from the first node will generally be the same (fully utilized) with or without ECN, so this argument does not apply.

4.2. SYNs

   RFC 5562 presents two arguments against ECT marking of SYN packets (quoted verbatim):

      "First, when the TCP SYN packet is sent, there are no guarantees that the other TCP endpoint (node B in Figure 2) is ECN-Capable, or that it would be able to understand and react if the ECN CE codepoint was set by a congested router.

      Second, the ECN-Capable codepoint in TCP SYN packets could be misused by malicious clients to "improve" the well-known TCP SYN attack. By setting an ECN-Capable codepoint in TCP SYN packets, a malicious host might be able to inject a large number of TCP SYN packets through a potentially congested ECN-enabled router, congesting it even further."

   The first point actually describes two subtly different issues. So three arguments are countered in turn below.

4.2.1. Argument 1a: Unrecognized CE on the SYN

   This argument certainly applied at the time RFC 5562 was written, when no ECN responder mechanism had any logic to recognize or feed back a CE marking on a SYN. The problem was that, during the 3WHS, the flag in the TCP header for ECN feedback (called Echo Congestion Experienced) had been overloaded to negotiate the use of ECN itself. So there was no space for feedback in a SYN-ACK.

   The accurate ECN (AccECN) protocol [I-D.ietf-tcpm-accurate-ecn] has since been designed to solve this problem, using a two-pronged approach. First, AccECN uses the 3 ECN bits in the TCP header as 8 codepoints, so there is space for the responder to feed back whether there was CE on the SYN. Second, a TCP initiator can always request AccECN support on every SYN, and any responder reveals its level of ECN support: AccECN, classic ECN, or no ECN. Therefore, if a responder does indicate that it supports AccECN, the initiator can be sure that, if there is no CE feedback on the SYN-ACK, then there really was no CE on the SYN.
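   The inference described above can be sketched as follows. This is a minimal illustration, not part of the AccECN specification: the enum and the function `syn_ce_status` are hypothetical, and the SYN-ACK is assumed to have already been parsed into the responder's support level and a flag saying whether CE on the SYN was fed back.

```python
from enum import Enum
from typing import Optional


class ResponderSupport(Enum):
    """Hypothetical labels for the ECN support that a SYN-ACK reveals."""
    ACCECN = "AccECN"
    CLASSIC_ECN = "classic ECN"
    NO_ECN = "no ECN"


def syn_ce_status(support: ResponderSupport, ce_fed_back: bool) -> Optional[bool]:
    """Return what the initiator can conclude about CE on its SYN.

    True  -> the SYN was CE-marked
    False -> the SYN was definitely not CE-marked
    None  -> unknown: this SYN-ACK has no way to convey CE on the SYN
    """
    if support is ResponderSupport.ACCECN:
        # AccECN has room on the SYN-ACK to feed back CE on the SYN,
        # so the absence of CE feedback is meaningful.
        return ce_fed_back
    # Classic ECN uses the ECE flag on the SYN-ACK to negotiate ECN itself,
    # and a non-ECN responder gives no ECN feedback at all.
    return None
```

   Only the AccECN case lets an initiator that set ECT on its SYN rule out a missed CE mark; the strategies that follow differ in how they handle the unknown case.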
   An initiator can combine AccECN with three possible strategies for setting ECT on a SYN:

   (S1): Pessimistic ECT and cache successes: The initiator always requests AccECN in the SYN, but without setting ECT. Then it records those servers that confirm that they support AccECN in a cache. On a subsequent connection to any server that supports AccECN, the initiator can then set ECT on the SYN.

   (S2): Optimistic ECT: The initiator always sets ECT optimistically on the initial SYN and it always requests AccECN support. Then, if the server response shows it has no AccECN logic (so it cannot feed back a CE mark), the initiator conservatively behaves as if the SYN was CE-marked, by reducing its initial window.

         A. No cache: The optimistic ECT strategy ought to work fairly well without caching any responses.

         B. Cache failures: The optimistic ECT strategy can be improved by recording solely those servers that do not support AccECN. On subsequent connections to these non-AccECN servers, the initiator will still request AccECN but not set ECT on the SYN. Then, the initiator can use its full initial window (if it has enough request data to need it). Longer term, as servers upgrade to AccECN, the initiator will remove them from the cache and use ECT on subsequent SYNs to that server.

   (S3): ECT by configuration: In a controlled environment, the administrator can make sure that servers support ECN-capable SYN packets. Examples of controlled environments are single-tenant DCs, and possibly multi-tenant DCs if it is assumed that each tenant mostly communicates with its own VMs.
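   Strategy (S2B) can be sketched as a small state machine around a cache of failures. This is a hypothetical illustration, not a normative algorithm: the class and method names are invented, the cache is an unbounded set keyed by server name with no expiry, the full initial window is assumed to be 10 SMSS (IW10), and the response to CE fed back on the SYN is assumed to be an initial window of 1 SMSS, mirroring the conservative reduction described for the unknown case.

```python
class OptimisticEctInitiator:
    """Hypothetical sketch of strategy (S2B): optimistic ECT, cache failures."""

    def __init__(self, full_iw_smss: int = 10):
        self.full_iw = full_iw_smss          # assumed full initial window (IW10)
        self.non_accecn_servers = set()      # servers seen NOT to support AccECN

    def syn_settings(self, server: str) -> dict:
        """Always request AccECN; set ECT unless the server is a cached failure."""
        return {
            "request_accecn": True,
            "ect_on_syn": server not in self.non_accecn_servers,
        }

    def on_syn_ack(self, server: str, ect_was_set: bool,
                   responder_accecn: bool, syn_ce_fed_back: bool = False) -> int:
        """Update the failure cache and return the initial window in SMSS."""
        if responder_accecn:
            # The server supports AccECN (perhaps after an upgrade), so drop it
            # from the failure cache; ECT will be set on the next SYN to it.
            self.non_accecn_servers.discard(server)
            # AccECN reliably reports CE on the SYN, so only respond if it did.
            return 1 if (ect_was_set and syn_ce_fed_back) else self.full_iw
        # No AccECN: cache the failure so later SYNs to this server are not-ECT.
        self.non_accecn_servers.add(server)
        # If ECT was set, a CE mark on the SYN could have gone unreported,
        # so behave conservatively as if the SYN had been CE-marked.
        return 1 if ect_was_set else self.full_iw
```

   On the first connection to a server without AccECN, `on_syn_ack` returns 1 SMSS. Later SYNs to that server are sent not-ECT and can use the full window, until the server is seen to support AccECN and is removed from the cache.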
   For unmanaged environments like the public Internet, pragmatically the choice is between strategies (S1) and (S2B):

   o  The "pessimistic ECT and cache successes" strategy (S1) suffers from exposing the initial SYN to the prevailing loss level, even if the server supports ECT on SYNs, but only on the first connection to each AccECN server.

   o  The "optimistic ECT and cache failures" strategy (S2B) exploits a server's support for ECT on SYNs from the very first attempt. But if the server turns out not to support AccECN, the initiator has to conservatively limit its initial window - usually unnecessarily. Nonetheless, initiator request data (as opposed to server response data) is rarely larger than 1 SMSS anyway {ToDo: reference? (this information was given informally by Yuchung Cheng)}.

   The normative specification for ECT on a SYN in Section 3.2.1 uses the "optimistic ECT and cache failures" strategy (S2B) on the assumption that an initial window of 1 SMSS is usually sufficient for client requests anyway. Clients that often initially send more than 1 SMSS of data could use strategy (S1) during initial deployment, and strategy (S2B) later (when the probability of servers supporting AccECN and the likelihood of seeing some CE marking is higher). Also, as deployment proceeds, the cache of successes (S1) starts off small then grows, while the cache of failures (S2B) becomes large at first, then shrinks.

   MEASUREMENTS NEEDED: Measurements are needed to determine whether one or the other strategy would be sufficient for any particular client, or whether a particular client would need both strategies in different circumstances.

4.2.2. Argument 1b: Unrecognized ECT on the SYN

   Given that, until now, ECT-marked SYN packets have been prohibited, it cannot be assumed they will be accepted.
   According to a study using 2014 data [ecn-pam] from a limited range of vantage points, out of the top 1M Alexa web sites, 4791 (0.82%) IPv4 sites and 104 (0.61%) IPv6 sites failed to establish a connection when they received a TCP SYN with any ECN codepoint set in the IP header and the appropriate ECN flags in the TCP header. Of these, about 41% failed to establish a connection due to the ECN flags in the TCP header even with a Not-ECT ECN field in the IP header (i.e. despite full compliance with RFC 3168). Therefore adding the ECN-capability to SYNs was increasing connection establishment failures by about 0.4%.

   MEASUREMENTS NEEDED: In order to get these failures fixed, data will be needed on which of the possible causes below is behind them.

   RFC 3168 says "a host MUST NOT set ECT on SYN [...] packets", but it does not say what the responder should do if an ECN-capable SYN arrives. So perhaps some responder implementations are checking that the SYN complies with RFC 3168, then silently ignoring non-compliant SYNs (or perhaps returning a RST). Also some middleboxes (e.g. firewalls) might be discarding non-compliant SYNs. For the future, [I-D.ietf-tsvwg-ecn-experimentation] updates RFC 3168 to clarify that middleboxes "SHOULD NOT" do this, but that does not alter the past.

   Whereas RSTs can be dealt with immediately, silent failures introduce a retransmission timeout delay (default 1 second) at the initiator before it attempts any fall-back strategy. Ironically, making SYNs ECN-capable is intended to avoid the timeout when a SYN is lost due to congestion. Fortunately, where discard of ECN-capable SYNs is due to policy it will occur predictably, not randomly like congestion. So the initiator can avoid it by caching those sites that do not support ECN-capable SYNs.
   This further justifies the use of the "optimistic ECT and cache failures" strategy in Section 3.2.1.

   MEASUREMENTS NEEDED: Experiments are needed to determine whether blocking of ECT on SYNs is widespread, and how many occurrences of problems would be masked by how few cache entries.

   If blocking is too widespread for the "optimistic ECT and cache failures" strategy (S2B), the "pessimistic ECT and cache successes" strategy (Section 4.2.1) would be better.

   MEASUREMENTS NEEDED: Then measurements would be needed on whether failures were still widespread on the second connection attempt after the more careful ("pessimistic") first connection.

   If so, it might be necessary to send a not-ECT SYN soon after the first ECT SYN (possibly with a delay between them - effectively reducing the retransmission timeout) and only accept the not-ECT connection if it returned first. This would reduce the performance penalty for those deploying ECT SYN support.

   FOR DISCUSSION: If this becomes necessary, how much delay ought to be required before the second SYN? Certainly less than the standard RTO (1 second). But more or less than the maximum RTT expected over the surface of the earth (roughly 250ms)? Or even back-to-back?

   However, based on the data above from [ecn-pam], even a cache of a dozen or so sites ought to avoid all ECN-related performance problems with roughly the Alexa top thousand. So it is questionable whether sending two SYNs will be necessary, particularly given that failures at well-maintained sites could reduce further once ECT SYNs are standardized.

4.2.3. Argument 2: DoS Attacks

   [RFC5562] says that ECT SYN packets could be misused by malicious clients to augment "the well-known TCP SYN attack". It goes on to
It goes on to 867 say "a malicious host might be able to inject a large number of TCP 868 SYN packets through a potentially congested ECN-enabled router, 869 congesting it even further." 871 We assume this is a reference to the TCP SYN flood attack (see 872 https://en.wikipedia.org/wiki/SYN_flood), which is an attack against 873 a responder end point. We assume the idea of this attack is to use 874 ECT to get more packets through an ECN-enabled router in preference 875 to other non-ECN traffic so that they can go on to use the SYN 876 flooding attack to inflict more damage on the responder end point. 877 This argument could apply to flooding with any type of packet, but we 878 assume SYNs are singled out because their source address is easier to 879 spoof, whereas floods of other types of packets are easier to block. 881 Mandating Not-ECT in an RFC does not stop attackers using ECT for 882 flooding. Nonetheless, if a standard says SYNs are not meant to be 883 ECT it would make it legitimate for firewalls to discard them. 884 However this would negate the considerable benefit of ECT SYNs for 885 compliant transports and seems unnecessary because RFC 3168 already 886 provides the means to address this concern. In section 7, RFC 3168 887 says "During periods where ... the potential packet marking rate 888 would be high, our recommendation is that routers drop packets rather 889 then set the CE codepoint..." and this advice is repeated in 890 [RFC7567] (section 4.2.1). This makes it harder for flooding packets 891 to gain from ECT. 893 Further experiments are needed to test how much malicious hosts can 894 use ECT to augment flooding attacks without triggering AQMs to turn 895 off ECN support (flying "just under the radar"). If it is found that 896 ECT can only slightly augment flooding attacks, the risk of such 897 attacks will need to be weighed against the performance benefits of 898 ECT SYNs. 900 4.3. 
SYN-ACKs 902 The proposed approach in Section 3.2.2 for experimenting with ECN- 903 capable SYN-ACKs is identical to the scheme called ECN+ [ECN-PLUS]. 904 In 2005, the ECN+ paper demonstrated that it could reduce the average 905 Web response time by an order of magnitude. It also argued that 906 adding ECT to SYN-ACKs did not raise any new security 907 vulnerabilities. 909 The IETF has already specified an experiment with ECN-capable SYN-ACK 910 packets [RFC5562]. It was inspired by the ECN+ paper, but it 911 specified a much more conservative congestion response to a CE-marked 912 SYN-ACK, called ECN+/TryOnce. This required the server to reduce its 913 initial window to 1 segment (like ECN+), but then the server had to 914 send a second SYN-ACK and wait for its ACK before it could continue 915 with its initial window of 1 SMSS. The second SYN-ACK of this 5-way 916 handshake had to carry no data, and had to disable ECN, but no 917 justification was given for these last two aspects. 919 The present ECN experiment uses the ECN+ congestion response, not 920 ECN+/TryOnce. First we argue against the rationale for ECN+/TryOnce 921 given in sections 4.4 and 6.2 of [RFC5562]. It starts with a rather 922 too literal interpretation of the requirement in RFC 3168 that says 923 TCP's response to a single CE mark has to be "essentially the same as 924 the congestion control response to a *single* dropped packet." TCP's 925 response to a dropped initial (SYN or SYN-ACK) packet is to wait for 926 the retransmission timer to expire (currently 1s). However, this 927 long delay assumes the worst case between two possible causes of the 928 loss: a) heavy overload; or b) the normal capacity-seeking behaviour 929 of other TCP flows. When the network is still delivering CE-marked 930 packets, it implies that there is an AQM at the bottleneck and that 931 it is not overloaded. So scenario (a) can be ruled out. 
   Therefore, TCP's response to a CE-marked SYN-ACK can be similar to its response to the loss of _any_ packet, rather than backing off as if the special _initial_ packet of a flow has been lost.

   How TCP responds to the loss of any single packet depends on what it has just been doing. But there is not really a precedent for TCP's response when it experiences a CE mark having sent only one (small) packet. If TCP had been adding one segment per RTT, it would have halved its congestion window, but it hasn't established a congestion window yet. If it had been exponentially increasing it would have exited slow start, but it hasn't started exponentially increasing yet so it hasn't established a slow-start threshold.

   Therefore, we have to work out a reasoned argument for what to do. If an AQM is CE-marking packets, it implies the queue is somewhere around the AQM's operating point - it might be well above and it is less likely to be well below. So it does not seem sensible to add a number of packets at once. But it will be safe to introduce one segment again, after 1 RTT. Therefore, starting to probe for capacity with a slow start from an initial window of 1 segment seems appropriate to the circumstances. This is the approach adopted in Section 3.2.2.

4.4. Pure ACKs

   Section 5.2 of RFC 3168 gives the following arguments for not allowing the ECT marking of pure ACKs (ACKs not piggy-backed on data):

      "To ensure the reliable delivery of the congestion indication of the CE codepoint, an ECT codepoint MUST NOT be set in a packet unless the loss of that packet in the network would be detected by the end nodes and interpreted as an indication of congestion.
      Transport protocols such as TCP do not necessarily detect all packet drops, such as the drop of a "pure" ACK packet; for example, TCP does not reduce the arrival rate of subsequent ACK packets in response to an earlier dropped ACK packet. Any proposal for extending ECN-Capability to such packets would have to address issues such as the case of an ACK packet that was marked with the CE codepoint but was later dropped in the network. We believe that this aspect is still the subject of research, so this document specifies that at this time, "pure" ACK packets MUST NOT indicate ECN-Capability."

   Later on, in section 6.1.4 it reads:

      "For the current generation of TCP congestion control algorithms, pure acknowledgement packets (e.g., packets that do not contain any accompanying data) MUST be sent with the not-ECT codepoint. Current TCP receivers have no mechanisms for reducing traffic on the ACK-path in response to congestion notification. Mechanisms for responding to congestion on the ACK-path are areas for current and future research. (One simple possibility would be for the sender to reduce its congestion window when it receives a pure ACK packet with the CE codepoint set). For current TCP implementations, a single dropped ACK generally has only a very small effect on the TCP's sending rate."

   We next address each of the arguments presented above.

   The first argument is a specific instance of the reliability argument for the case of pure ACKs. This has already been addressed by countering the general reliability argument in Section 4.1.

   The second argument says that ECN ought not to be enabled unless there is a mechanism to respond to it. However, there _is_ a mechanism to respond to congestion on a pure ACK that RFC 3168 has overlooked - the congestion window mechanism.
   When data segments and pure ACKs are interspersed, congestion notifications ought to regulate the congestion window, whether they are on data segments or on pure ACKs. Otherwise, if ECN is disabled on pure ACKs, and if (say) 70% of the segments in one direction are pure ACKs, about 70% of the congestion notifications will be missed and the data segments will not be correctly regulated.

   So RFC 3168 ought to have considered two congestion response mechanisms - reducing the congestion window (cwnd) and reducing the ACK rate - and only the latter was missing. Further, RFC 3168 was incorrect to assume that, if one ACK was a pure ACK, all segments in the same direction would be pure ACKs. Admittedly a continual stream of pure ACKs in one direction is quite a common case (e.g. a file download). However, it is also common for the pure ACKs to be interspersed with data segments (e.g. HTTP/2 browser requests controlling a web application). Indeed, it is more likely that any congestion experienced by pure ACKs will be due to mixing with data segments, either within the same flow, or within competing flows.

   This insight swings the argument towards enabling ECN on pure ACKs so that CE marks can drive the cwnd response to congestion (whenever data segments are interspersed with the pure ACKs), and then separately deciding whether an ACK rate response is also required (when they are ECN-enabled). The two types of response are addressed separately in the following two subsections, then a final subsection draws conclusions.

4.4.1. Cwnd Response to CE-Marked Pure ACKs

   If the sender of pure ACKs sets them to ECT, the bullets below assess whether the three stages of the congestion response mechanism will all work for each type of congestion feedback (classic ECN [RFC3168] and AccECN [I-D.ietf-tcpm-accurate-ecn]):

   Detection: The receiver of a pure ACK can detect a CE marking on it:

      *  Classic feedback: the receiver will not expect CE marks on pure ACKs, so it will be implementation-dependent whether it happens to check for CE marks on all packets.

      *  AccECN feedback: the AccECN specification requires the receiver of any TCP packets to count any CE marks on them (whether or not control packets are ECN-capable).

   Feedback: TCP never ACKs a pure ACK, but the receiver of a CE-mark on a pure ACK can feed it back when it sends a subsequent data segment (if it ever does):

      *  Classic feedback: the receiver (of the pure ACKs) would set the echo congestion experienced (ECE) flag in the TCP header as normal.

      *  AccECN feedback: the receiver continually feeds back a count of the number of CE-marked packets that it has received (and, if possible, a count of CE-marked bytes).

   Congestion response: In either case (classic or AccECN feedback), if the TCP sender does receive feedback about CE-markings on pure ACKs, it will react in the usual way by reducing its congestion window accordingly. This will regulate the rate of any data packets it is sending amongst the pure ACKs.

4.4.2. ACK Rate Response to CE-Marked Pure ACKs

   Reducing the congestion window will have no effect on the rate of pure ACKs. The worst case here is if the bottleneck is congested solely with pure ACKs, but it could also be problematic if a large fraction of the load was from unresponsive ACKs, leaving little or no capacity for the load from responsive data.
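   The three stages assessed in Section 4.4.1 (detection, feedback, congestion response) can be illustrated for the AccECN case with a minimal sketch. It is a simplified model, not AccECN itself: the class names are hypothetical, the CE counter is an unbounded integer (real AccECN counters are finite and wrap), and the response is assumed to be a Reno-style halving of cwnd with a floor of 2 segments, applied whenever new CE marks are reported rather than at most once per round trip. As noted above, nothing in it changes the rate of the pure ACKs themselves.

```python
class PureAckReceiver:
    """Receives ECT pure ACKs and counts CE marks, as AccECN requires."""

    def __init__(self):
        self.ce_packets = 0  # models AccECN's count of CE-marked packets

    def on_packet(self, ip_ecn: str) -> None:
        # Detection: every arriving packet is checked, including pure ACKs.
        if ip_ecn == "CE":
            self.ce_packets += 1

    def feedback(self) -> int:
        # Feedback: the running count rides on the next segment sent back.
        return self.ce_packets


class PureAckSender:
    """Sends pure ACKs interspersed with data and reacts through cwnd."""

    def __init__(self, cwnd_segments: int):
        self.cwnd = cwnd_segments
        self.last_ce_count = 0

    def on_feedback(self, ce_count: int) -> int:
        # Congestion response: any increase in the count means new CE marks.
        new_marks = ce_count - self.last_ce_count
        self.last_ce_count = ce_count
        if new_marks > 0:
            # Assumed Reno-style reduction, never below 2 segments.
            self.cwnd = max(self.cwnd // 2, 2)
        return self.cwnd
```

   With classic ECN feedback, the same sender logic would instead be driven by the ECE flag, and only if the receiver happens to check pure ACKs for CE.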
   Since RFC 3168 was published, Acknowledgement Congestion Control (AckCC) techniques have been documented in [RFC5690] (informational). So any pair of TCP end-points can choose to agree to regulate the delayed ACK ratio in response to lost or CE-marked pure ACKs. However, the protocol has a number of open deployment issues (e.g. it relies on two new TCP options, one of which is required on the SYN where option space is at a premium and, if either option is blocked by a middlebox, no fall-back behaviour is specified). The new TCP options addressed two problems, namely that TCP had: i) no mechanism to allow ECT to be set on pure ACKs; and ii) no mechanism to feed back loss or CE-marking of pure ACKs. A combination of the present specification and AccECN addresses both these problems, at least for ECN marking. So it might now be possible to design an ECN-specific ACK congestion control scheme without the extra TCP options proposed in RFC 5690. However, such a mechanism is out of scope of the present document.

   Setting aside the practicality of RFC 5690, the need for AckCC has not been conclusively demonstrated. It has been argued that the Internet has survived so far with no mechanism to even detect loss of pure ACKs. However, it has also been argued that ECN is not the same as loss. Packet discard can naturally thin the ACK load to whatever the bottleneck can support, whereas ECN marking does not (it queues the ACKs instead). Nonetheless, RFC 3168 (section 7) recommends that an AQM switches over from ECN marking to discard when the marking probability becomes high. Therefore discard can still be relied on to thin out ECN-enabled pure ACKs as a last resort.

4.4.3. Summary: Enabling ECN on Pure ACKs

   In the case when AccECN has been negotiated, the arguments for ECT (and CE) on pure ACKs heavily outweigh those against.
   ECN is always more and never less reliable for delivery of congestion notification. The cwnd response has been overlooked as a mechanism for responding to congestion on pure ACKs, so it is incorrect not to set ECT on pure ACKs when they are interspersed with data segments. And when they are not, packet discard still acts as the "congestion response of last resort". In contrast, not setting ECT on pure ACKs is certainly detrimental to performance, because when a pure ACK is lost it can prevent the release of new data. Separately, AckCC (or perhaps an improved variant exploiting AccECN) could optionally be used to regulate the spacing between pure ACKs. However, it is not clear whether AckCC is justified.

   In the case when Classic ECN has been negotiated, there is still an argument for ECT (and CE) on pure ACKs, but it is less clear-cut. Some existing RFC 3168 implementations might happen to (unintentionally) provide the correct feedback to support a cwnd response. Even for those that do not, setting ECT on pure ACKs would still be better for performance than not setting it, and would do no extra harm. If AckCC were required, it is designed to work with RFC 3168 ECN.

4.5. Window Probes

   Section 6.1.6 of RFC 3168 presents only the reliability argument for prohibiting ECT on Window probes:

      "If a window probe packet is dropped in the network, this loss is not detected by the receiver. Therefore, the TCP data sender MUST NOT set either an ECT codepoint or the CWR bit on window probe packets.

      However, because window probes use exact sequence numbers, they cannot be easily spoofed in denial-of-service attacks. Therefore, if a window probe arrives with the CE codepoint set, then the receiver SHOULD respond to the ECN indications."

   The reliability argument has already been addressed in Section 4.1.
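   A rough model of the probe spacing described in Section 3.2.4 shows how long a sender can stall if a window probe is lost. This sketch assumes the first interval is exactly 1 second and the cap is exactly 60 seconds (the smallest maximum that [RFC6298] allows); real persist timers derive their intervals from the current RTO, so actual values will vary. The function names are hypothetical.

```python
def probe_intervals(n_probes: int, first_s: float = 1.0, max_s: float = 60.0) -> list:
    """Seconds from each window probe to the next, while the window stays zero.

    The interval doubles after every probe [RFC1122] until it reaches max_s.
    """
    intervals = []
    interval = first_s
    for _ in range(n_probes):
        intervals.append(interval)
        interval = min(interval * 2, max_s)
    return intervals


def stall_if_probe_lost(k: int, first_s: float = 1.0, max_s: float = 60.0) -> float:
    """Extra wait if probe k (0-based) is the first one sent after the window
    reopens and it is lost: the sender only learns of the open window from
    the response to probe k + 1."""
    return probe_intervals(k + 1, first_s, max_s)[k]
```

   For example, `stall_if_probe_lost(6)` is 60.0 seconds: six doublings from 1 s would give 64 s, which is capped at 60 s.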
   Allowing ECT on window probes could considerably improve performance because, once the receive window has reopened, if a window probe is lost the sender will stall until the next window probe reaches the receiver, which might be after the maximum retransmission timeout (at least 1 minute [RFC6298]).

   On the bright side, RFC 3168 at least specifies the receiver behaviour if a CE-marked window probe arrives, so changing the behaviour ought to be less painful than for other packet types.

4.6. FINs

   RFC 3168 is silent on whether a TCP sender can set ECT on a FIN. A FIN is considered as part of the sequence of data, and the rate of pure ACKs sent after a FIN could be controlled by a CE marking on the FIN. Therefore there is no reason not to set ECT on a FIN.

4.7. RSTs

   RFC 3168 is silent on whether a TCP sender can set ECT on a RST. The host generating the RST message does not have an open connection after sending it (either because there was no such connection when the packet that triggered the RST message was received or because the packet that triggered the RST message also triggered the closure of the connection).

   Moreover, the receiver of a CE-marked RST message can either: i) accept the RST message and close the connection; ii) emit a so-called challenge ACK in response (with suitable throttling) [RFC5961] and otherwise ignore the RST (e.g. because the sequence number is in-window but not the precise number expected next); or iii) discard the RST message (e.g. because the sequence number is out-of-window). In the first two cases there is no point in echoing any CE mark received because the sender closed its connection when it sent the RST. In the third case it makes sense to discard the CE signal as well as the RST.
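   The three cases above correspond to the RST checks in section 3.2 of [RFC5961]. The following sketch, with hypothetical names, classifies an incoming RST and shows that in none of the cases is a CE mark on it fed back. It assumes 32-bit sequence-number arithmetic and omits the rate limiting of challenge ACKs that [RFC5961] calls for.

```python
from enum import Enum

SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2^32


class RstAction(Enum):
    RESET = "accept the RST and close the connection"
    CHALLENGE_ACK = "send a (throttled) challenge ACK, ignore the RST"
    DISCARD = "silently discard the RST"


def in_window(seg_seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND, allowing for wraparound."""
    return (seg_seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def handle_rst(seg_seq: int, rcv_nxt: int, rcv_wnd: int,
               ce_marked: bool = False) -> tuple:
    """Return (action, feed_back_ce) for an incoming RST."""
    if seg_seq == rcv_nxt:
        action = RstAction.RESET
    elif in_window(seg_seq, rcv_nxt, rcv_wnd):
        action = RstAction.CHALLENGE_ACK
    else:
        action = RstAction.DISCARD
    # ce_marked is deliberately ignored: in the first two cases the RST's
    # sender has already closed its connection, so nobody would respond to
    # CE feedback; in the third case the CE mark is discarded with the RST.
    return action, False
```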
   Although a congestion response following a CE-marking on a RST does not appear to make sense, the following factors have been considered before deciding whether the sender ought to set ECT on a RST message:

   o  As explained above, a congestion response by the sender of a CE-marked RST message is not possible;

   o  So the only reason for the sender setting ECT on a RST would be to improve the reliability of the message's delivery;

   o  RST messages are used to both mount and mitigate attacks:

      *  Spoofed RST messages are used by attackers to terminate ongoing connections, although the mitigations in RFC 5961 have considerably raised the bar against off-path RST attacks;

      *  Legitimate RST messages allow endpoints to inform their peers to eliminate existing state that corresponds to non-existent connections, liberating resources, e.g. in DoS attack scenarios;

   o  AQMs are advised to disable ECN marking during persistent overload, so:

      *  it is harder for an attacker to exploit ECN to intensify an attack;

      *  it is harder for a legitimate user to exploit ECN to more reliably mitigate an attack;

   o  Prohibiting ECT on a RST would deny the benefit of ECN to legitimate RST messages, but not to attackers who can disregard RFCs;

   o  If ECT were prohibited on RSTs:

      *  it would be easy for security middleboxes to discard all ECN-capable RSTs;

      *  However, unlike a SYN flood, it is already easy for a security middlebox (or host) to distinguish a RST flood from legitimate traffic [RFC5961], and even if some legitimate RSTs are accidentally removed as well, legitimate connections still function.

   So, on balance, it has been decided that it is worth experimenting with ECT on RSTs.
   During experiments, if the ECN capability on RSTs is found to open a vulnerability that is hard to close, this decision can be reversed, before it is specified for the standards track.

4.8. Retransmitted Packets

   RFC 3168 says the sender "MUST NOT" set ECT on retransmitted packets. The rationale for this consumes nearly 2 pages of RFC 3168, so the reader is referred to section 6.1.5 of RFC 3168, rather than quoting it all here. There are essentially three arguments, namely: reliability; DoS attacks; and over-reaction to congestion. We address them in order below.

   The reliability argument has already been addressed in Section 4.1.

   Protection against DoS attacks is not afforded by prohibiting ECT on retransmitted packets. An attacker can set CE on spoofed retransmissions whether or not it is prohibited by an RFC. Protection against the DoS attack described in section 6.1.5 of RFC 3168 is solely afforded by the requirement that "the TCP data receiver SHOULD ignore the CE codepoint on out-of-window packets". Therefore in Section 3.2.7 the sender is allowed to set ECT on retransmitted packets, in order to reduce the chance of them being dropped. We also strengthen the receiver's requirement from "SHOULD ignore" to "MUST ignore". And we generalize the receiver's requirement to include failure of any validity check, not just out-of-window checks, in order to include the more stringent validity checks in RFC 5961 that have been developed since RFC 3168.

   A consequence is that, for those retransmitted packets that arrive at the receiver after the original packet has been properly received (so-called spurious retransmissions), any CE marking will be ignored.
   There is no problem with that, because the fact that the original packet has been delivered implies that the sender's original congestion response (when it deemed the packet lost and retransmitted it) was unnecessary.

   Finally, the third argument is about over-reacting to congestion. The argument goes that, if a retransmitted packet is dropped, the sender will not detect it, so it will not react again to congestion (it would have reduced its congestion window already when it retransmitted the packet). In contrast, if retransmitted packets can be CE-marked instead of dropped, senders could potentially react more than once to congestion. However, we argue that it is legitimate to respond again to congestion if it still persists in subsequent round trip(s).

   Therefore, in all three cases, it is not incorrect to set ECT on retransmissions.

5.  Interaction with popular variants or derivatives of TCP

   The following subsections discuss any interactions between setting ECT on all packets and using the following popular variants or derivatives of TCP: SCTP, IW10 and TFO. This section is informative, not normative, because no interactions have been identified that require any change to specifications. The subsection on IW10 discusses potential changes to specifications but concludes that no changes are needed.

   TCP variants that have been assessed and found not to interact adversely with ECT on TCP control packets are: SYN cookies (see Appendix A of [RFC4987] and section 3.1 of [RFC5562]), TCP Fast Open (TFO [RFC7413]) and L4S [I-D.briscoe-tsvwg-l4s-arch].

5.1.  SCTP

   Stream Control Transmission Protocol (SCTP [RFC4960]) is a standards track protocol derived from TCP.
   SCTP currently does not include ECN support, but Appendix A of RFC 4960 broadly describes how it would be supported, and a draft on the addition of ECN to SCTP has been produced [I-D.stewart-tsvwg-sctpecn]. That draft avoids setting ECT on control packets and retransmissions, closely following the arguments in RFC 3168. When ECN is finally added to SCTP, experience from experiments on adding ECN support to all TCP packets ought to be directly transferable to SCTP.

5.2.  IW10

   IW10 is an experiment to determine whether it is safe for TCP to use an initial window of 10 SMSS [RFC6928].

   This subsection does not recommend any additions to the present specification in order to interwork with IW10. The specifications as they stand are safe, and there is only a corner case with ECT on the SYN where performance could occasionally be improved, as explained below.

   As specified in Section 3.2.1.1, a TCP initiator can only set ECT on the SYN if it requests AccECN support. If, however, the SYN-ACK tells the initiator that the responder does not support AccECN, Section 3.2.1.1 advises the initiator to conservatively reduce its initial window to 1 SMSS because, if the SYN was CE-marked, the SYN-ACK has no way to feed that back.

   If the initiator implements IW10, it seems rather over-conservative to reduce IW from 10 to 1 just in case a congestion marking was missed. Nonetheless, the reduction to 1 SMSS will rarely harm performance, because:

   o  as long as the initiator is caching failures to negotiate AccECN, subsequent attempts to access the same server will not use ECT on the SYN anyway, so there will no longer be any need to conservatively reduce IW;

   o  currently it is not common for a TCP initiator (client) to have more than one data segment to send {ToDo: evidence/reference?}; IW10 is primarily exploited by TCP servers.
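
   The initiator-side rules above can be sketched as follows. This is a minimal Python model under stated assumptions, not a specified algorithm: the Initiator class, its method names, the hostname-keyed failure cache and the policy of requesting AccECN (and hence setting ECT on the SYN) whenever no failure is cached are all hypothetical. The case where AccECN is negotiated is deliberately not modelled.

```python
from dataclasses import dataclass, field

# Illustrative constants; 10 segments reflects the IW10 experiment [RFC6928].
IW10_SMSS = 10
FALLBACK_IW_SMSS = 1  # conservative initial window described in Section 3.2.1.1


@dataclass
class Initiator:
    """Hypothetical model of a TCP initiator's SYN/IW decisions (not a real API)."""

    default_iw: int = IW10_SMSS
    # Assumed per-server cache of failed AccECN negotiations, keyed by hostname.
    accecn_failures: set[str] = field(default_factory=set)

    def syn_sets_ect(self, server: str) -> bool:
        """Set ECT on the SYN only when requesting AccECN.

        Assumption: AccECN is requested unless a previous attempt to this
        server failed to negotiate it.
        """
        return server not in self.accecn_failures

    def initial_window(self, server: str, syn_had_ect: bool,
                       responder_supports_accecn: bool) -> int:
        """Return the initial window (in SMSS) once the SYN-ACK arrives."""
        if not responder_supports_accecn:
            # Cache the failure so later SYNs to this server omit ECT and
            # therefore never need the conservative fallback.
            self.accecn_failures.add(server)
            if syn_had_ect:
                # A CE mark on this SYN could not have been fed back on the
                # SYN-ACK, so fall back to 1 SMSS.
                return FALLBACK_IW_SMSS
        # With AccECN negotiated, any CE mark on the SYN is fed back and
        # handled by rules not modelled in this sketch.
        return self.default_iw
```

   For example, the first connection to a server that lacks AccECN yields an initial window of 1 SMSS, while a second connection to the same server sends its SYN without ECT and keeps the full IW10 window, which is why the reduction rarely harms performance.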

   If a responder receives feedback that the SYN-ACK was CE-marked, Section 3.2.2.2 mandates that it reduce its initial window to 1 SMSS. When the responder also implements IW10, it is particularly important to adhere to this requirement in order to avoid overflowing a queue that is clearly already congested.

5.3.  TFO

   TCP Fast Open (TFO [RFC7413]) is an experiment to remove the round-trip delay of TCP's three-way handshake (3WHS). A TFO initiator caches a cookie from a previous connection with a TFO-enabled server. Then, for subsequent connections to the same server, any data included on the SYN can be passed directly to the server application, which can then return up to an initial window of response data on the SYN-ACK and on data segments straight after it, without waiting for the ACK that completes the 3WHS.

   The TFO experiment and the present experiment to add ECN support for TCP control packets can be combined without altering either specification, which is justified as follows:

   o  The handling of ECN marking on a SYN is no different whether or not it carries data.

   o  In response to any CE marking on the SYN-ACK, the responder adopts the normal response to congestion, as discussed in Section 7.2 of [RFC7413].

6.  Security Considerations

   Section 3.2.6 considers the question of whether ECT on RSTs will allow RST attacks to be intensified. There are several security arguments presented in RFC 3168 for preventing the ECN marking of TCP control packets and retransmitted segments. We believe all of them have been properly addressed in Section 4, particularly Section 4.2.3 and Section 4.8 on DoS attacks using spoofed ECT-marked SYNs and spoofed CE-marked retransmissions.

7.  IANA Considerations

   There are no IANA considerations in this memo.

8.  Acknowledgments

   Thanks to Mirja Kuehlewind and David Black for their useful reviews.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001.

   [RFC5562]  Kuzmanovic, A., Mondal, A., Floyd, S., and K. Ramakrishnan, "Adding Explicit Congestion Notification (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, DOI 10.17487/RFC5562, June 2009.

   [RFC5961]  Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's Robustness to Blind In-Window Attacks", RFC 5961, DOI 10.17487/RFC5961, August 2010.

   [I-D.ietf-tcpm-accurate-ecn]  Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate-ecn-02 (work in progress), October 2016.

   [I-D.ietf-tsvwg-ecn-experimentation]  Black, D., "Explicit Congestion Notification (ECN) Experimentation", draft-ietf-tsvwg-ecn-experimentation-01 (work in progress), March 2017.

9.2.  Informative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793, DOI 10.17487/RFC0793, September 1981.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, DOI 10.17487/RFC1122, October 1989.

   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit Congestion Notification (ECN) Signaling with Nonces", RFC 3540, DOI 10.17487/RFC3540, June 2003.

   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol", RFC 4960, DOI 10.17487/RFC4960, September 2007.

   [RFC4987]  Eddy, W., "TCP SYN Flooding Attacks and Common Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [RFC5690]  Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding Acknowledgement Congestion Control to TCP", RFC 5690, DOI 10.17487/RFC5690, February 2010.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent, "Computing TCP's Retransmission Timer", RFC 6298, DOI 10.17487/RFC6298, June 2011.

   [RFC6928]  Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, "Increasing TCP's Initial Window", RFC 6928, DOI 10.17487/RFC6928, April 2013.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014.

   [RFC7567]  Baker, F., Ed. and G. Fairhurst, Ed., "IETF Recommendations Regarding Active Queue Management", BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015.

   [I-D.briscoe-tsvwg-ecn-l4s-id]  Schepper, K., Briscoe, B., and I. Tsang, "Identifying Modified Explicit Congestion Notification (ECN) Semantics for Ultra-Low Queuing Delay", draft-briscoe-tsvwg-ecn-l4s-id-02 (work in progress), October 2016.

   [I-D.briscoe-tsvwg-l4s-arch]  Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service: Architecture", draft-briscoe-tsvwg-l4s-arch-02 (work in progress), March 2017.

   [I-D.stewart-tsvwg-sctpecn]  Stewart, R., Tuexen, M., and X. Dong, "ECN for Stream Control Transmission Protocol (SCTP)", draft-stewart-tsvwg-sctpecn-05 (work in progress), January 2014.

   [judd-nsdi]  Judd, G., "Attaining the promise and avoiding the pitfalls of TCP in the Datacenter", USENIX Symposium on Networked Systems Design and Implementation (NSDI'15), pp. 145-157, May 2015.

   [ecn-pam]  Trammell, B., Kuehlewind, M., Boppart, D., Learmonth, I., Fairhurst, G., and R. Scheffenegger, "Enabling Internet-Wide Deployment of Explicit Congestion Notification", Int'l Conf. on Passive and Active Network Measurement (PAM'15), pp. 193-205, 2015.

   [ECN-PLUS]  Kuzmanovic, A., "The Power of Explicit Congestion Notification", ACM SIGCOMM 35(4):61-72, 2005.

Authors' Addresses

   Marcelo Bagnulo
   Universidad Carlos III de Madrid
   Av. Universidad 30
   Leganes, Madrid 28911
   SPAIN

   Phone: 34 91 6249500
   Email: marcelo@it.uc3m.es
   URI:   http://www.it.uc3m.es

   Bob Briscoe
   Simula Research Lab

   Email: ietf@bobbriscoe.net
   URI:   http://bobbriscoe.net/