idnits 2.17.1 draft-morton-tsvwg-sce-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The draft header indicates that this document updates RFC8311, but the abstract doesn't seem to mention this, which it should. -- The draft header indicates that this document updates RFC3168, but the abstract doesn't seem to directly say this. It does mention RFC3168 though, so this could be OK. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year (Using the creation date from RFC3168, updated by this document, for RFC5378 checks: 2000-11-17) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (17 May 2021) is 1069 days in the past. Is this intentional? 
  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-20) exists of
     draft-ietf-tsvwg-l4s-arch-08

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)

     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Transport Working Group                                        J. Morton
Internet-Draft
Updates: 3168, 8311 (if approved)                               P. Heist
Intended status: Experimental
Expires: 18 November 2021                               R.W. Grimes, Ed.
                                                             17 May 2021

             The Some Congestion Experienced ECN Codepoint
                       draft-morton-tsvwg-sce-03

Abstract

   This memo reclassifies ECT(1) to be an early notification of
   congestion on ECT(0) marked packets, which can be used by AQM
   algorithms and transports as an earlier signal of congestion than CE.
   It is a simple, transparent, and backward compatible upgrade to
   existing IETF-approved AQMs, RFC3168, and nearly all congestion
   control algorithms.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 18 November 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Terminology
   2.  Introduction
   3.  Background
   4.  Some Congestion Experienced
   5.  Design Rationale
     5.1.  Risks with ECN Signaling
     5.2.  Unresponsive Flows
     5.3.  Fairness
     5.4.  ECT(1) as SCE
   6.  Diffserv Usage
     6.1.  SCE Diffserv Codepoints (DSCPs)
       6.1.1.  SCE-CAPABLE
       6.1.2.  SCE-LOWDELAY
       6.1.3.  SCE-LOWCOST
     6.2.  Diffserv Codepoints for Experimental and Private Use
     6.3.  Diffserv Codepoints for Public Use
   7.  Examples of use
     7.1.  Codel-type AQMs
     7.2.  RED-type AQMs (including PIE)
     7.3.  Simple Two-Queue Middleboxes
     7.4.  TCP
     7.5.  Other
   8.  Compatibility
     8.1.  Existing ECN & AQM Deployments
     8.2.  L4S
   9.  Ongoing Research and Development
   10. Related Work
   11. IANA Considerations
   12. Security Considerations
   13. Acknowledgements
   14. Normative References
   15. Informative References
   Authors' Addresses

1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119] and [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Introduction

   Traditional TCP congestion control exhibits a "sawtooth" pattern
   which, in the most favourable cases, oscillates around the optimum
   operating point of maximum throughput and minimum delay, which exists
   at the point where the congestion window equals path BDP. The term
   "sawtooth" brings to mind the straight-edged graphs of TCP Reno, but
   the equally common TCP CUBIC is essentially similar in character, as
   are other AIMD-derived algorithms.

   A number of proposals have sought to improve this, but introduce
   various other tradeoffs in return.
   TCP Vegas is consistently
   outcompeted by standard TCPs, DCTCP proved to be too aggressive for
   deployment in the public Internet, and while BBR appears to have
   avoided both of these problems, its complexity makes it difficult to
   implement correctly. Each of these proposals is characterised by
   primarily changing only the endpoints, not the network nodes on the
   path between them; though DCTCP is intended for use with a specific
   style of AQM, it can work with standard AQMs as long as there is no
   competing non-DCTCP traffic.

   Some other proposals have attempted to convey information about the
   network path explicitly, by having network nodes inject data about
   link capacity and/or utilisation into passing traffic. These
   proposals have generally been unsuccessful due to the complex
   slow-path processing required in network nodes, and are not widely
   deployed. The only successful proposal of this type is Explicit
   Congestion Notification [RFC3168] which allows an AQM to signal
   congestion by marking packets with (essentially) a one-bit signal in
   preference to dropping them.

   ECN defines a two-bit field supporting four codepoints, of which
   three are in active use and the fourth is a semantic duplicate. It
   was explicitly suggested during ECN's development that new meaning
   could be given to this spare codepoint, including as a lesser
   indication of congestion in [RFC3168] (section 20.2). With an
   alternative use of this codepoint having fallen out of favour, the
   time is right to revisit this suggestion and propose a workable
   method of applying it.

   In so doing, care must be taken that backwards compatibility is
   maintained with existing traffic, endpoints and network nodes that
   are known or suspected to have been deployed. Keeping the changes to
   on-wire protocols minimal, and the complexity of implementation low,
   are also highly desirable.

   This memo reclassifies ECT(1) to be an early notification of
   congestion on ECT(0) marked packets, which can be used by AQM
   algorithms and transports as an earlier signal of congestion than CE
   ("Congestion Experienced").

   This memo also briefly discusses how transports should respond to
   ECT(1) marked packets. Detailed specifications of this behaviour are
   left to transport-specific memos.

3.  Background

   [RFC3168] defines the lower two bits of the (former) TOS byte in the
   IPv4/6 header as the ECN field. This may take four values: Not-ECT,
   ECT(0), ECT(1) or CE.

     +========+=====================================+============+
     | Binary | Keyword                             | References |
     +========+=====================================+============+
     | 00     | Not-ECT (Not ECN-Capable Transport) | [RFC3168]  |
     +--------+-------------------------------------+------------+
     | 01     | ECT(1) (ECN-Capable Transport(1))   | [RFC3168]  |
     +--------+-------------------------------------+------------+
     | 10     | ECT(0) (ECN-Capable Transport(0))   | [RFC3168]  |
     +--------+-------------------------------------+------------+
     | 11     | CE (Congestion Experienced)         | [RFC3168]  |
     +--------+-------------------------------------+------------+

                                 Table 1

   Research has shown that the ECT(1) codepoint goes essentially unused,
   with the "Nonce Sum" extension to ECN having not been implemented in
   practice and thus subsequently obsoleted by [RFC8311] (section 3).
   Additionally, known [RFC3168] compliant senders do not emit ECT(1),
   and compliant middleboxes do not alter the field to ECT(1), while
   compliant receivers all interpret ECT(1) identically to ECT(0).
   These are useful properties which represent an opportunity for
   improvement.
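
   As an illustration of Table 1, the following minimal sketch splits
   a (former) TOS byte into its DSCP and ECN parts. The names Ecn,
   ecn_from_tos and dscp_from_tos are hypothetical helpers chosen for
   this example and are not defined by [RFC3168] or by this memo.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """ECN field values as listed in Table 1 (RFC 3168 naming)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def ecn_from_tos(tos: int) -> Ecn:
    """Return the ECN codepoint held in the lower two bits of the TOS byte."""
    return Ecn(tos & 0b11)


def dscp_from_tos(tos: int) -> int:
    """Return the six-bit DSCP held in the upper six bits of the TOS byte."""
    return (tos >> 2) & 0x3F


if __name__ == "__main__":
    tos = 0b10111010  # hypothetical example byte
    print(dscp_from_tos(tos), ecn_from_tos(tos).name)
```

   The example byte is arbitrary; any value in the range 0-255 can be
   decoded the same way.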

   Experience gained with 7 years of [RFC8290] deployment in the field
   suggests that it remains difficult to maintain the desired 100% link
   utilisation, whilst simultaneously strictly minimising induced delay
   due to excess queue depth - irrespective of whether ECN is in use.

   This leads to a reluctance amongst hardware vendors to implement the
   most effective AQM schemes because their headline benchmarks are
   throughput-based.

   The underlying cause is the very sharp "multiplicative decrease"
   reaction required of transport protocols to congestion signalling
   (whether that be packet loss or CE marks), which tends to leave the
   congestion window significantly smaller than the ideal BDP when
   triggered at only slightly above the ideal value. The availability
   of this sharp response is required to assure network stability (AIMD
   principle), but there is presently no standardised and
   backwards-compatible means of providing a less drastic signal.

4.  Some Congestion Experienced

   As consensus has arisen that some form of ECN signaling should be an
   earlier signal than drop, this memo changes the meaning of ECT(1) to
   SCE, meaning "Some Congestion Experienced". Since there is no longer
   ambiguity between two ECT codepoints, ECT(0) is referred to as ECT.
   The ECN-field codepoint table then becomes:

    +========+=====================================+==============+
    | Binary | Keyword                             | References   |
    +========+=====================================+==============+
    | 00     | Not-ECT (Not ECN-Capable Transport) | [RFC3168]    |
    +--------+-------------------------------------+--------------+
    | 01     | SCE (Some Congestion Experienced)   | [This draft] |
    +--------+-------------------------------------+--------------+
    | 10     | ECT (ECN-Capable Transport)         | [RFC3168]    |
    +--------+-------------------------------------+--------------+
    | 11     | CE (Congestion Experienced)         | [RFC3168]    |
    +--------+-------------------------------------+--------------+

                                 Table 2

   This permits middleboxes implementing AQM to signal incipient
   congestion, below the threshold required to justify setting CE, by
   converting some proportion of ECT codepoints to SCE ("SCE marking").
   Existing [RFC3168] compliant receivers MUST transparently ignore this
   new signal with respect to congestion control, and both existing and
   SCE-aware middleboxes SHOULD convert SCE to CE in the same
   circumstances as for ECT, thus ensuring backwards compatibility with
   [RFC3168] ECN endpoints.

   The permitted ECN codepoint transitions by middleboxes are:

                      +=========+==================+
                      | From    | To               |
                      +=========+==================+
                      | Not-ECT | Not-ECT          |
                      +---------+------------------+
                      | ECT     | ECT or SCE or CE |
                      +---------+------------------+
                      | SCE     | SCE or CE        |
                      +---------+------------------+
                      | CE      | CE               |
                      +---------+------------------+

                                 Table 3

   Note that dropping a packet is an allowed action for any ECN
   codepoint. While that is the only way of indicating congestion with
   Not-ECT, it may also be used to both indicate and reduce congestion
   in any state.
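
   Table 3 can be expressed as a small check, sketched below under
   the assumption that codepoints are compared by congestion severity
   (ECT, then SCE, then CE) rather than by their numeric value. The
   names Codepoint and transition_allowed are hypothetical and exist
   only for this illustration. Dropping a packet is not modelled.

```python
from enum import Enum


class Codepoint(Enum):
    """ECN field values using the SCE naming of Table 2."""
    NOT_ECT = 0b00
    SCE = 0b01
    ECT = 0b10
    CE = 0b11


# Semantic ordering of the ECN-capable codepoints. Note that SCE (0b01)
# is numerically below ECT (0b10) but signals more congestion.
_SEVERITY = {Codepoint.ECT: 0, Codepoint.SCE: 1, Codepoint.CE: 2}


def transition_allowed(before: Codepoint, after: Codepoint) -> bool:
    """Return True if a middlebox may rewrite `before` to `after` (Table 3)."""
    # Not-ECT packets must stay Not-ECT, and ECN-capable packets must
    # never be rewritten to Not-ECT.
    if before is Codepoint.NOT_ECT or after is Codepoint.NOT_ECT:
        return before is after
    # Among ECN-capable codepoints, severity may rise but never fall.
    return _SEVERITY[after] >= _SEVERITY[before]


if __name__ == "__main__":
    for src in Codepoint:
        targets = [dst.name for dst in Codepoint if transition_allowed(src, dst)]
        print(src.name, "->", targets)
```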

   To re-state the allowed transitions another way: for ECN-aware flows,
   the ECN marking of an individual packet MAY be increased by a
   middlebox to signal congestion, but MUST NOT be decreased, and
   packets SHALL NOT be altered to appear to be ECN-aware if they were
   not originally, nor vice versa. Note however that SCE is numerically
   less than ECT, but semantically greater, and the latter definition
   applies for this rule.

   Receivers and transport protocols conforming to this specification
   SHALL continue to apply the [RFC3168] interpretation of the CE
   codepoint, that is, to signal the sender to back off send rate to the
   same extent as if a packet loss were detected. This maintains
   compatibility with existing middleboxes, senders and receivers.

   New SCE-aware receivers and transport protocols SHOULD interpret the
   SCE codepoint as an indication of mild congestion, and respond
   accordingly by applying send rates intermediate between those
   resulting from a continuous sequence of ECT codepoints, and those
   resulting from a CE codepoint. The ratio of ECT and SCE codepoints
   received indicates the relative severity of such congestion, with a
   higher proportion of SCE codepoints indicating more congestion.

   The intent of SCE marking is a "cruise control" signal which permits
   middleboxes to request relatively small reductions in send rate, or
   merely a slowing of send rate growth. Accordingly, SCE marks SHOULD
   progressively trigger exit from exponential slow-start growth, then
   reduction to Reno-linear growth (for congestion control algorithms
   which support higher growth rates in congestion-avoidance phase),
   then a halt to send rate growth, then a gradual reduction of send
   rate.
   For immediate large reductions of send rate, the CE mark MUST
   retain its original Multiplicative Decrease power as per [RFC8511],
   and compliant AQMs SHOULD retain the ability to employ it where
   appropriate.

   Details of how to implement SCE awareness at the transport layer are
   left to additional Internet Drafts. To ensure RTT-fair convergence
   with single-queue SCE AQMs, transports SHOULD stabilise at lower
   SCE-mark ratios for higher BDPs, and MAY reduce their response to CE
   marks IFF they are responding to SCE signals received at around the
   same time (e.g. within 1-2 RTTs) in the same flow.

   To maximise the benefit of SCE, middleboxes SHOULD begin to produce
   SCE marks at lower congestion levels than they begin to produce CE
   marks. This will usually ensure that SCE-aware flows avoid receiving
   CE marks. When a single-queue AQM is upgraded to SCE awareness, this
   will tend to cause SCE flows to give way to non-SCE flows; to avoid
   this behaviour, single-queue AQMs MAY be left as [RFC3168] compliant
   without SCE support.

   For the avoidance of doubt, a decision to mark CE or to drop a packet
   always takes precedence over SCE marking.

5.  Design Rationale

   The SCE design sees ECN as a "network feature". The risks with ECN
   signaling (Section 5.1), the need to handle unresponsive flows
   (Section 5.2), the utility of fairness (Section 5.3), and the
   availability of only one ECN codepoint all influenced the SCE
   signaling design. This section discusses these related concerns,
   along with what is needed from middleboxes to address them, and how
   that ultimately led to the selection of ECT(1) as an additional
   signal of lesser congestion (Section 5.4).

5.1.  Risks with ECN Signaling

   The safety and effectiveness of ECN signaling depends upon the
   unaltered transmission of the ECN bits, both for the indication of
   ECN support, and for ECN signaling.
   Unlike a drop, which is reliably
   and irrevocably signaled, ECN signals may be erased or manipulated.
   Specifically, any of the following results in the lack of a
   congestion response, which is likely to lead to the near starvation
   of competing flows:

   *  if transports indicate ECT(0) but do not respond to CE

   *  if packets are erroneously changed from Not-ECT to ECT(0) in the
      network

   *  if CE marks are erased after a bottleneck

   *  if ECE marks are erased post-negotiation

   Although the lack of a congestion response is similar to when
   transports do not respond appropriately to drop, the difference is
   that with ECN, the behavior can be brought about in the network,
   without changes to the endpoint. This may happen by accident, for
   example due to a broken network configuration or endpoint
   implementation, or on purpose, e.g. using a simple firewall rule.

   Unresponsive flow mitigation, discussed in the next section, deals
   with flows that are not responding to congestion signals, including
   for the reasons listed above.

5.2.  Unresponsive Flows

   A single unresponsive flow has the potential to nearly starve all
   other competing flows in a congested bottleneck, resulting in
   unacceptable network delays and collapses in throughput. The need to
   handle unresponsive flows is corroborated in [RFC7567] (section 4),
   stating:

   |  "Research, engineering, and measurement efforts are needed
   |  regarding the design of mechanisms to deal with flows that are
   |  unresponsive to congestion notification or are responsive, but are
   |  more aggressive than present TCP."

   The source language from [RFC2309] (section 5) is more direct:

   |  "It is urgent to begin or continue research, engineering, and
   |  measurement efforts contributing to the design of mechanisms to
   |  deal with flows that are unresponsive to congestion notification
   |  or are responsive but more aggressive than TCP."

   The [COBALT] AQM algorithm is one example of how unresponsive flows
   can be dealt with, using the [BLUE] algorithm to detect overload and
   trigger drops.

   Regardless of how exactly it is done, unresponsive flow mitigation is
   most effectively implemented with some level of flow awareness, so
   that drops may be directed to the offending flow(s). Once flow
   awareness is available, fairness steering becomes possible, as
   discussed further in the following section.

5.3.  Fairness

   In order for SCE flows to compete fairly with non-SCE flows, at least
   one of the following is required: some form of fairness steering, or
   some way of separating SCE and non-SCE flows. Following is a
   non-exhaustive list of options:

   *  FQ (fair queueing), to isolate and schedule flows fairly from
      separate queues

   *  AF (approximate fairness), so that SCE and non-SCE flows can share
      the same queue, e.g. [AFD], [I-D.morton-tsvwg-codel-approx-fair],
      [I-D.morton-tsvwg-lightweight-fair-queueing]

   *  DSCP [RFC2474], to explicitly separate SCE and non-SCE flows (see
      Section 6)

   When available, fairness is viewed as an advantage, in that it:

   *  controls aggressive flows

   *  prevents network bias

   *  promotes the fair interoperation between the ever-expanding matrix
      of new congestion control mechanisms

   The abundance of new and proposed congestion controls is making their
   fair competition across bandwidths, RTTs and network conditions more
   difficult if not impossible to ensure in the endpoint alone
   [CC-REVOLUTION] [CC-COMPAT]. Congestion control implementations may
   dominate one another under different conditions, e.g. [BBR-CUBIC],
   while the widespread deployment of potentially beneficial congestion
   controls that seek to minimize delay is discouraged by the fact that
   they are often out-competed in bottlenecks by standard TCP.
   Fairness
   in the network both improves these conditions and assists transports
   responding to SCE.

5.4.  ECT(1) as SCE

   With only a single ECN codepoint remaining, options are limited for
   how to signal congestion with high fidelity. Meanwhile, the recent
   rise in ECN signaling makes backwards compatibility with [RFC3168] a
   practical requirement.

   Fortunately, the same network technologies that mitigate the well
   recognized risks listed in Section 5 above, also make the use of
   ECT(1) as defined by SCE possible, without a separate traffic
   identifier. Where those technologies cannot be deployed, Diffserv
   may be used to identify SCE traffic (see Section 6), a purpose for
   which it was expressly designed. Where that is impossible, SCE
   allows a graceful fallback to [RFC3168] ECN. SCE's usage of ECT(1)
   provides a safe and solid foundation on which future innovations in
   the network can improve the availability and performance of
   high-fidelity congestion signaling.

6.  Diffserv Usage

   SCE is not dependent on Diffserv [RFC2474] for its signaling, but
   makes use of it in the following ways:

   *  to mark SCE traffic for experimental or private use

   *  to assist middleboxes in their operation

   *  to request special SCE treatment, such as low delay or low cost

6.1.  SCE Diffserv Codepoints (DSCPs)

   All SCE DSCPs indicate SCE support in the originating endpoint. This
   MAY assist SCE marking middleboxes in their operation, but MUST NOT
   be depended upon for effective congestion control. See Section 7.3
   for an example of such a usage.

   SCE middleboxes MUST retain any SCE DSCPs that arrive on incoming
   packets, and MUST NOT set them on packets that do not already have
   them.

   The SCE DSCPs MAY be set on TCP ACK and control packets which have
   the Not-ECT codepoint set in the ECN field, IFF the TCP connection as
   a whole is SCE capable (or in the process of being negotiated as
   such). This allows all packets relating to that connection to be
   treated equally by middleboxes which distinguish them. Should ECN
   negotiation fail, the DSCP should be changed to some non-SCE value
   for subsequent traffic on that connection.

6.1.1.  SCE-CAPABLE

   The SCE-CAPABLE DSCP indicates SCE support, with standard,
   best-effort service implied. This is the appropriate service for
   capacity-seeking traffic, for which latency is a secondary
   consideration.

6.1.2.  SCE-LOWDELAY

   The SCE-LOWDELAY DSCP is used to both indicate SCE support and
   request low-delay service. This MAY be used by AQMs to select a low
   delay queue with tighter marking parameters that reduce delay, at the
   possible expense of throughput.

6.1.3.  SCE-LOWCOST

   The SCE-LOWCOST DSCP is used to both indicate SCE support and request
   altruistic low-cost service. This MAY be used by AQMs to
   deprioritise this traffic in favour of low-delay and best-effort
   traffic, similar to the LE PHB [RFC8622].

6.2.  Diffserv Codepoints for Experimental and Private Use

   Prior to approval for public experiment, the SCE DSCPs are defined in
   the experimental pool xxxx11, and the following rules MUST be
   observed to contain SCE traffic within the experimental network:

   *  SCE senders SHOULD set one of the SCE DSCPs when participating in
      an SCE experimental network.

   *  SCE middleboxes MUST NOT mark SCE on packets lacking an SCE DSCP,
      or packets that may leave the experimental network.

   *  SCE receivers MUST check that one of the SCE DSCPs is present
      before returning SCE feedback.

   *  All SCE DSCPs MUST be bleached at the experimental network
      boundaries.

   The following values are proposed for guidance only.
   Because they
   are in the experimental pool, they may be changed to suit the
   environment:

          +==============+================+=================+
          | Name         | Value (Binary) | Value (Decimal) |
          +==============+================+=================+
          | SCE-CAPABLE  | 000111         | 7               |
          +--------------+----------------+-----------------+
          | SCE-LOWDELAY | 001011         | 11              |
          +--------------+----------------+-----------------+
          | SCE-LOWCOST  | 000011         | 3               |
          +--------------+----------------+-----------------+

                                 Table 4

6.3.  Diffserv Codepoints for Public Use

   In the event that SCE is approved for public experiment, the DSCPs
   will be allocated in an appropriate standards action pool, using a
   value that is intended to be treated as best-effort traffic by
   existing deployed devices.

   One of the SCE DSCPs SHOULD be set by sending endpoints on all SCE
   capable traffic. However, they neither need to be checked by
   middleboxes that do not require them before marking SCE, nor by
   receiving endpoints before returning SCE feedback. That way, they
   can serve as hints for middleboxes, but the SCE signaling mechanism
   is not dependent on end-to-end DSCP traversal.

   Unless and until a public experiment is approved, the guidance in
   Section 6.2 MUST be followed.

7.  Examples of use

7.1.  Codel-type AQMs

   A simple and natural way to implement SCE in a Codel-type AQM is to
   mark all ECT packets as SCE if they are over half the Codel target
   sojourn time, and not marked CE by Codel itself. This threshold
   function does not necessarily produce the best performance, but is
   very easy to implement and provides useful information to SCE-aware
   flows, often sufficient to avoid receiving CE marks whilst still
   efficiently using available capacity.
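
   The threshold function described above might be sketched as
   follows. This is an illustration only: the function name, its
   parameters and the 5 ms default target are assumptions made for
   the example, and the Codel CE decision itself is taken as an input
   rather than implemented.

```python
# ECN field values using the SCE naming of Table 2.
NOT_ECT, SCE, ECT, CE = 0b00, 0b01, 0b10, 0b11


def sce_threshold_mark(ecn, sojourn_ms, target_ms=5.0, codel_marks_ce=False):
    """Return the ECN codepoint a packet leaves the queue with.

    ecn            -- codepoint on arrival
    sojourn_ms     -- time the packet spent in the queue
    target_ms      -- Codel target sojourn time (assumed default: 5 ms)
    codel_marks_ce -- True if Codel itself decided to CE-mark this packet
    """
    # A CE decision always takes precedence over SCE marking.
    if codel_marks_ce and ecn in (ECT, SCE):
        return CE
    # Threshold rule: ECT packets over half the target become SCE.
    if ecn == ECT and sojourn_ms > target_ms / 2.0:
        return SCE
    # Not-ECT, SCE and CE packets are passed through unchanged.
    return ecn


if __name__ == "__main__":
    for delay in (1.0, 3.0, 6.0):
        print(delay, sce_threshold_mark(ECT, delay))
```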

   For a more sophisticated approach avoiding even small-scale
   oscillation, a stochastic ramp function may be implemented with 100%
   marking at the Codel target, falling to 0% marking at some lower
   sojourn time at or above zero. The lower point of the ramp should be
   chosen so that SCE is not accidentally signalled due to CPU
   scheduling latencies or serialisation delays of single packets.
   Absent rigorous analysis of these factors, setting the lower limit at
   half the Codel target should be safe in many cases.

   The default configuration of Codel is 100ms interval, 5ms target. A
   typical ramp function for these parameters might cease marking below
   2.5ms sojourn time, increase marking probability linearly to 100% at
   5ms, and mark at 100% for sojourn times above 5ms (in which CE
   marking is also possible).

   In single-queue AQMs, the above strategy will result in SCE flows
   yielding to pressure from non-SCE flows, since CE marks do not occur
   until SCE marking has reached 100%. A balance between smooth SCE
   behaviour and fairness versus non-SCE traffic can be found by having
   the marking ramp cross the Codel target at some lower SCE marking
   rate, perhaps even 0%. A two-part ramp, reaching 1/sqrt(X) at the
   Codel target (for some chosen X, a cwnd at which the crossover
   between smoothness and fairness occurs) and ramping up more steeply
   thereafter, has been implemented successfully for experimentation.

   The CNQ algorithm [I-D.morton-tsvwg-cheap-nasty-queueing] offers a
   relatively simple way to limit this yielding behaviour and ensure
   that, even in competition with non-SCE flows, SCE flows maintain a
   reasonable minimum throughput capability. This may be sufficient to
   avoid the need for the two-part ramp described above.
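
   The single linear ramp described earlier in this section for the
   default Codel parameters (no marking below 2.5ms, 100% marking at
   and above 5ms) might look like the sketch below. The function and
   parameter names are hypothetical, the two-part ramp is not
   covered, and the random source is injectable purely so that the
   behaviour can be checked deterministically.

```python
import random


def sce_ramp_probability(sojourn_ms, lower_ms=2.5, target_ms=5.0):
    """Return the SCE marking probability for a given sojourn time.

    0.0 at or below lower_ms, rising linearly to 1.0 at target_ms,
    and 1.0 for any sojourn time above target_ms.
    """
    if sojourn_ms <= lower_ms:
        return 0.0
    if sojourn_ms >= target_ms:
        return 1.0
    return (sojourn_ms - lower_ms) / (target_ms - lower_ms)


def should_mark_sce(sojourn_ms, rng=random.random):
    """Stochastically decide whether to convert an ECT packet to SCE.

    rng must return a float in [0.0, 1.0); random.random does so.
    """
    return rng() < sce_ramp_probability(sojourn_ms)


if __name__ == "__main__":
    for delay in (2.0, 3.0, 4.0, 5.0, 8.0):
        print(f"{delay} ms -> p = {sce_ramp_probability(delay):.2f}")
```

   As elsewhere in this memo, a decision to mark CE or to drop takes
   precedence over the outcome of this function.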

   Flow-isolating AQMs, including especially CNQ and DRR++ based
   algorithms, should avoid signalling SCE to flows classified as
   "sparse", in order to encourage the fastest possible convergence to
   the fair share.

7.2.  RED-type AQMs (including PIE)

   There are several reasonable methods of producing SCE signals in a
   RED-type AQM.

   The simplest would be a threshold function, giving a hard boundary in
   queue depth between 0% and 100% SCE marking. This could be a
   sensible option for limited hardware implementations. The threshold
   should be set below the point at which a growing queue might trigger
   CE marking or packet drops.

   Another option would be to implement a second marking probability
   function, occupying a queue-depth space just below that occupied by
   the main marking probability function. This should be arranged so
   that high marking rates (ideally 100%) are achieved at or before the
   point at which CE marking or packet drops begin.

   For PIE specifically, a second marking probability function could be
   added with the same parameters as the main marking probability
   function, except for a lower QDELAY_REF value. This would result in
   the SCE marking probability remaining strictly higher than the CE
   marking probability for ECT flows.

7.3.  Simple Two-Queue Middleboxes

   In high-capacity or resource constrained SCE marking middleboxes,
   DSCP may be used to select one of two queues, in lieu of implementing
   fairness steering. Packets marked with an SCE DSCP are placed in an
   SCE queue, where an AQM instance may mark congestion with either SCE
   or CE. Packets not marked with an SCE DSCP are placed in a second
   [RFC3168] queue, whose AQM instance may only mark congestion with CE.

   For approximate flow fairness, the queues may be scheduled in
   proportion to the number of flows they contain.
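
   A minimal sketch of the queue selection and scheduling just
   described is given below. It assumes the guidance DSCP values from
   Table 4, which may be changed in a given experimental network; the
   function names, the queue labels "sce" and "rfc3168", and the
   equal split used when both queues are empty are choices made for
   this example only.

```python
# Guidance values from Table 4 (experimental pool; may differ per network).
SCE_DSCPS = {
    "SCE-CAPABLE": 0b000111,
    "SCE-LOWDELAY": 0b001011,
    "SCE-LOWCOST": 0b000011,
}


def select_queue(dscp):
    """Place packets with an SCE DSCP in the SCE queue, all others in
    the RFC 3168 queue."""
    return "sce" if dscp in SCE_DSCPS.values() else "rfc3168"


def allowed_marks(queue):
    """Congestion marks the AQM instance on each queue may apply."""
    return ("SCE", "CE") if queue == "sce" else ("CE",)


def scheduling_shares(sce_flows, rfc3168_flows):
    """Return (sce_share, rfc3168_share) in proportion to flow counts."""
    total = sce_flows + rfc3168_flows
    if total == 0:
        # No active flows: the split is arbitrary, so use an equal one.
        return (0.5, 0.5)
    return (sce_flows / total, rfc3168_flows / total)


if __name__ == "__main__":
    print(select_queue(7), allowed_marks(select_queue(7)))
    print(select_queue(0), allowed_marks(select_queue(0)))
    print(scheduling_shares(3, 1))
```

   How the number of flows in each queue is counted is left open
   here; the sketch simply takes the two counts as inputs.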

   Note that as long as the SCE DSCP remains intact from the sending
   endpoint to the marking queue, the SCE queue may be used. If it has
   been erased or altered to a non-SCE DSCP, the packet will be placed
   in the [RFC3168] queue, and may still benefit from standard ECN.

   If this middlebox is to be used in public environments, some form of
   unresponsive flow mitigation is warranted to guard against flows that
   have incorrectly indicated support for either SCE or [RFC3168] ECN.
   If flows do not respond to the signals they advertise support for,
   they will dominate competing traffic in the same queue.

7.4.  TCP

   The proposed mechanism for TCP to feed back SCE signals to the sender
   is outlined in [I-D.grimes-tcpm-tcpsce]. Use is made of the
   redundant NS bit in the TCP header, which was formerly associated
   with ECT(1) in the Nonce Sum specification.

   The recommended response to each single segment marked with SCE is to
   reduce cwnd by an amortised 1/sqrt(cwnd) segments. Other responses,
   such as the 1/cwnd from DCTCP, are also acceptable but may perform
   less well.

7.5.  Other

   New transports under development, such as QUIC, may implement a
   fine-grained signal back to the sender based on SCE. QUIC itself
   appears to have this sort of feedback already (counting ECT(0),
   ECT(1) and CE packets received), and the data should be made
   available for congestion control.

8.  Compatibility

8.1.  Existing ECN & AQM Deployments

   SCE explicitly retains [RFC8511] compliant Multiplicative Decrease
   responses to CE marks, and conventional Multiplicative Decrease
   responses to packet loss. SCE senders' behaviour is thus naturally
   compliant with existing specifications when running over existing
   networks.
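
   To illustrate how the two kinds of sender response sit side by
   side, the sketch below combines the per-segment SCE reduction
   recommended in Section 7.4 with a Multiplicative Decrease on CE.
   It is not taken from [I-D.grimes-tcpm-tcpsce]: the function names,
   the floor of 2 segments and the 0.8 backoff factor are
   illustrative assumptions, and cwnd is modelled as a fractional
   number of segments.

```python
import math


def on_sce_segment(cwnd, min_cwnd=2.0):
    """Response to one SCE-marked segment: reduce cwnd (in segments)
    by 1/sqrt(cwnd), never going below min_cwnd (assumed floor)."""
    return max(min_cwnd, cwnd - 1.0 / math.sqrt(cwnd))


def on_ce_mark(cwnd, beta_ecn=0.8, min_cwnd=2.0):
    """Response to a CE mark: Multiplicative Decrease.

    beta_ecn is an illustrative placeholder for the backoff factor,
    not a value specified by this memo.
    """
    return max(min_cwnd, cwnd * beta_ecn)


if __name__ == "__main__":
    cwnd = 100.0
    print("after one SCE-marked segment:", on_sce_segment(cwnd))
    print("after one CE mark:", on_ce_mark(cwnd))
```

   With a window of 100 segments, a single SCE mark removes a tenth
   of a segment while a CE mark removes a fifth of the whole window,
   which shows the difference in scale between the two signals.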
   Existing endpoints, supporting Not-ECT or [RFC3168] compliant
   congestion control, are required to treat SCE marks (that is, ECT(1))
   as identical to ECT(0), and will thus transparently ignore SCE marks.
   This is allowed for in SCE's design, and allows SCE middleboxes to be
   deployed into a heterogeneous network.

   Hence the incremental deployability of SCE endpoints and middleboxes
   is good.

8.2. L4S

   L4S [I-D.ietf-tsvwg-l4s-arch] also claims the ECT(1) codepoint, with
   significantly different semantic meaning than SCE, so a discussion
   around the potential for L4S and SCE compatibility is warranted. In
   the L4S system, ECT(1) is used to identify L4S flows, to distinguish
   them from [RFC3168] flows - necessary since in L4S, the semantic
   meaning of CE marks is also changed.

   Since L4S connections are explicitly negotiated through support of
   AccECN, and AccECN doesn't support SCE, there is no ambiguity
   regarding the mode of the connection as far as endpoints are
   concerned.

   SCE middleboxes will treat L4S flows in the same way as [RFC3168]
   does. However, because SCE middleboxes are likely to upgrade ECT(1)
   marked packets to CE at a higher threshold than L4S middleboxes
   would, L4S flows will outcompete non-L4S flows in a single SCE-aware
   queue. This is the same known safety concern with L4S deployment
   with regard to existing [RFC3168] queues, resulting from the
   redefinition of CE in L4S. Fairness steering in SCE middleboxes
   could mitigate this.

   L4S middleboxes may interpret ECT packets which have received SCE
   markings at some other SCE-aware middlebox as though they were L4S
   traffic. This may result in a higher CE marking rate and/or
   different queuing behaviour. It may also result in the reordering of
   packets for both SCE and non-SCE aware flows through L4S middleboxes,
   as packets marked ECT(1) will on average traverse the bottleneck with
   lower delay than packets not marked ECT(1). Although this could be
   mitigated by [I-D.ietf-tcpm-rack], it may lead to reduced throughput
   and head-of-line blocking for flows that traverse both SCE and L4S
   bottlenecks.

   There are at least two secondary concerns brought about by the L4S
   use of ECT(1) as a traffic identifier:

   * If it is found necessary to firewall L4S traffic off from the
     general Internet, then SCE-marked packets are also likely to be
     dropped at this boundary. This could have a significantly
     detrimental effect on ECT traffic traversing both an SCE and an
     L4S enabled network, even if the endpoints are not explicitly SCE
     aware.

   * If it is found necessary to bleach ECT(1) in order to disable L4S
     in a network, this would erase SCE signals sent to endpoints.
     Although not ideal, SCE transports would still safely fall back to
     relying on CE for congestion notification.

   Lastly, an ambiguous definition of ECT(1) complicates network
   debugging with packet captures, since it would be unclear whether a
   packet was marked ECT(1) due to congestion at an SCE bottleneck, or
   because it is an L4S flow. Although examination of other packets in
   the flow could reduce this ambiguity, the necessity of observing flow
   state is generally discouraged for debugging purposes.

   Thus far, the working group is operating under the assumption that
   coexistence of SCE and L4S is not an option.

9. Ongoing Research and Development

   The SCE proposal is a work in progress, with ongoing or planned work
   in at least the following areas:

   * AQM strategies for a small number of FIFO queues

   * Tunnel traversal, with possible updates to [RFC3168] and [RFC6040]

   * Research ways of reducing RTT dependence (Prague requirement #5)

   * Performance in environments with jitter and burstiness

   * New testing tools that cover many short flows, and VBR UDP flows

   * Testing, with guidance from [RFC2914], [RFC7141] and [RFC5033]

10. Related Work

   [RFC8087] [RFC7567] [RFC7928] [RFC8290] [RFC8289] [RFC8033] [RFC8034]
   [I-D.morton-tsvwg-interflow-intraflow-delays]

11. IANA Considerations

   There are no IANA considerations.

12. Security Considerations

   An adversary could inappropriately set SCE marks at middleboxes they
   control to slow down SCE-aware flows, eventually reaching a minimum
   congestion window. However, the same threat already exists with
   respect to inappropriately setting CE marks on normal ECN flows,
   where each mark has a greater impact. Therefore no new threat
   is exposed by SCE in practice.

   An adversary could also simply ignore SCE marks at the receiver, or
   ignore SCE information fed back from the receiver to the sender, in
   an attempt to gain some advantage in throughput. Again, the same
   could be said about ignoring CE marks, so no truly new threat is
   exposed. Additionally, correctly implemented SCE detection may
   actually improve long-term goodput compared to ignoring SCE.

   An adversary could erase congestion information by converting SCE
   marks to ECT or Not-ECT codepoints, thus hiding it from the receiver.
   This has equivalent effects to ignoring SCE signals at the receiver.
   An identical threat already exists for erasing congestion information
   from CE marked packets, and may be mitigated by AQMs switching to
   dropping packets from flows observed to be non-responsive to CE.

   An adversary could drop SCE-marked packets, believing them to be
   bogons (see also L4S Compatibility, above). Endpoints should be able
   to recover from this through retransmission and a reduction of cwnd.
   However, it is possible for this to lead to a significant denial of
   service. A workaround is to disable ECN for connections over the
   affected path.

13. Acknowledgements

   Thanks to Dave Taht for his contributions to the SCE effort, and his
   work on writing the original draft-morton-taht-sce-00 that was
   submitted for IETF/104 on which this draft is based.

   Many thanks to John Gilmore, the members of the ecn-sane project and
   the cake@lists.bufferbloat.net mailing list, and the former IETF AQM
   working group.

14. Normative References

   [RFC8311]  Black, D., "Relaxing Restrictions on Explicit Congestion
              Notification (ECN) Experimentation", RFC 8311,
              DOI 10.17487/RFC8311, January 2018.

15. Informative References

   [AFD]      Pan, R., Breslau, L., Prabhakar, B., and S. Shenker,
              "Approximate fairness through differential dropping",
              in ACM SIGCOMM Computer Communication Review, April 2003.

   [BBR-CUBIC]
              Borgli, R.J. and J. Misund, "Comparing BBR and CUBIC
              Congestion Controls", in University of Oslo, INF5072,
              2018.

   [BLUE]     Feng, W., Kandlur, D.D., Saha, D., and K.G. Shin, "BLUE: A
              New Class of Active Queue Management Algorithms",
              in Computer Science Technical Report, April 1999.

   [CC-COMPAT]
              Fejes, F., Gombos, G., Laki, S., and S. Nadas,
              "Compatibility of Scalable Congestion Controls", in Second
              Workshop on the Future of Internet Transport - FIT 2020,
              Paris, France (Virtual), 2020.
   [CC-REVOLUTION]
              Fejes, F., Gombos, G., Laki, S., and S. Nadas, "Who will
              Save the Internet from the Congestion Control
              Revolution?", in Workshop on Buffer Sizing, Stanford
              University, 2019.

   [COBALT]   Palmei, J., Gupta, S., Imputato, P., Morton, J.,
              Tahiliani, M.P., Avallone, S., and D. Taht, "Design and
              Evaluation of COBALT Queue Discipline", in 2019 IEEE
              International Symposium on Local and Metropolitan Area
              Networks (LANMAN), September 2019.

   [I-D.grimes-tcpm-tcpsce]
              Grimes, R. W. and P. G. Heist, "Some Congestion
              Experienced in TCP", Work in Progress, Internet-Draft,
              draft-grimes-tcpm-tcpsce-01, 4 November 2019.

   [I-D.ietf-tcpm-rack]
              Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The
              RACK-TLP Loss Detection Algorithm for TCP", Work in
              Progress, Internet-Draft, draft-ietf-tcpm-rack-15, 22
              December 2020.

   [I-D.ietf-tsvwg-l4s-arch]
              Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White,
              "Low Latency, Low Loss, Scalable Throughput (L4S) Internet
              Service: Architecture", Work in Progress, Internet-Draft,
              draft-ietf-tsvwg-l4s-arch-08, 15 November 2020.

   [I-D.morton-tsvwg-cheap-nasty-queueing]
              Morton, J. and P. G. Heist, "Cheap Nasty Queueing", Work
              in Progress, Internet-Draft, draft-morton-tsvwg-cheap-
              nasty-queueing-01, 4 November 2019.

   [I-D.morton-tsvwg-codel-approx-fair]
              Morton, J. and P. G. Heist, "Controlled Delay Approximate
              Fairness AQM", Work in Progress, Internet-Draft, draft-
              morton-tsvwg-codel-approx-fair-01, 9 March 2020.

   [I-D.morton-tsvwg-interflow-intraflow-delays]
              Morton, J. and P. G. Heist, "Interflow vs Intraflow
              Delays", Work in Progress, Internet-Draft, draft-morton-
              tsvwg-interflow-intraflow-delays-00, 17 May 2021.

   [I-D.morton-tsvwg-lightweight-fair-queueing]
              Morton, J. and P. G. Heist, "Lightweight Fair Queueing",
              Work in Progress, Internet-Draft, draft-morton-tsvwg-
              lightweight-fair-queueing-00, 2 July 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
              S., Wroclawski, J., and L. Zhang, "Recommendations on
              Queue Management and Congestion Avoidance in the
              Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              DOI 10.17487/RFC2474, December 1998.

   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
              RFC 2914, DOI 10.17487/RFC2914, September 2000.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001.

   [RFC5033]  Floyd, S. and M. Allman, "Specifying New Congestion
              Control Algorithms", BCP 133, RFC 5033,
              DOI 10.17487/RFC5033, August 2007.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, DOI 10.17487/RFC6040, November
              2010.

   [RFC7141]  Briscoe, B. and J. Manner, "Byte and Packet Congestion
              Notification", BCP 41, RFC 7141, DOI 10.17487/RFC7141,
              February 2014.

   [RFC7567]  Baker, F., Ed. and G. Fairhurst, Ed., "IETF
              Recommendations Regarding Active Queue Management",
              BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015.

   [RFC7928]  Kuhn, N., Ed., Natarajan, P., Ed., Khademi, N., Ed., and
              D. Ros, "Characterization Guidelines for Active Queue
              Management (AQM)", RFC 7928, DOI 10.17487/RFC7928, July
              2016.

   [RFC8033]  Pan, R., Natarajan, P., Baker, F., and G. White,
              "Proportional Integral Controller Enhanced (PIE): A
              Lightweight Control Scheme to Address the Bufferbloat
              Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017.

   [RFC8034]  White, G. and R. Pan, "Active Queue Management (AQM) Based
              on Proportional Integral Controller Enhanced PIE) for
              Data-Over-Cable Service Interface Specifications (DOCSIS)
              Cable Modems", RFC 8034, DOI 10.17487/RFC8034, February
              2017.

   [RFC8087]  Fairhurst, G. and M. Welzl, "The Benefits of Using
              Explicit Congestion Notification (ECN)", RFC 8087,
              DOI 10.17487/RFC8087, March 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8289]  Nichols, K., Jacobson, V., McGregor, A., Ed., and J.
              Iyengar, Ed., "Controlled Delay Active Queue Management",
              RFC 8289, DOI 10.17487/RFC8289, January 2018.

   [RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
              J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler
              and Active Queue Management Algorithm", RFC 8290,
              DOI 10.17487/RFC8290, January 2018.

   [RFC8511]  Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst,
              "TCP Alternative Backoff with ECN (ABE)", RFC 8511,
              DOI 10.17487/RFC8511, December 2018.

   [RFC8622]  Bless, R., "A Lower-Effort Per-Hop Behavior (LE PHB) for
              Differentiated Services", RFC 8622, DOI 10.17487/RFC8622,
              June 2019.

Authors' Addresses

   Jonathan Morton
   Kokkonranta 21
   FI-31520 Pitkajarvi
   Finland
   Phone: +358 44 927 2377
   Email: chromatix99@gmail.com

   Peter G. Heist
   Redacted
   463 11 Liberec 30
   Czech Republic

   Email: pete@heistp.net

   Rodney W. Grimes (editor)
   Redacted
   Portland, OR 97217
   United States

   Email: rgrimes@freebsd.org