2 Transport Services (tsv) K. De Schepper 3 Internet-Draft Nokia Bell Labs 4 Intended status: Experimental B. Briscoe, Ed. 5 Expires: September 10, 2020 Independent 6 March 9, 2020 8 Identifying Modified Explicit Congestion Notification (ECN) Semantics 9 for Ultra-Low Queuing Delay (L4S) 10 draft-ietf-tsvwg-ecn-l4s-id-10 12 Abstract 14 This specification defines the identifier to be used on IP packets 15 for a new network service called low latency, low loss and scalable 16 throughput (L4S). It is similar to the original (or 'Classic') 17 Explicit Congestion Notification (ECN). 'Classic' ECN marking was 18 required to be equivalent to a drop, both when applied in the network 19 and when responded to by a transport. Unlike 'Classic' ECN marking, 20 for packets carrying the L4S identifier, the network applies marking 21 more immediately and more aggressively than drop, and the transport 22 response to each mark is reduced and smoothed relative to that for 23 drop.
The two changes counterbalance each other so that the 24 throughput of an L4S flow will be roughly the same as a non-L4S flow 25 under the same conditions. Nonetheless, the much more frequent 26 control signals and the finer responses to them result in much more 27 fine-grained adjustments, so that ultra-low and consistently low 28 queuing delay (typically sub-millisecond on average) becomes possible 29 for L4S traffic without compromising link utilization. Thus even 30 capacity-seeking (TCP-like) traffic can have high bandwidth and very 31 low delay at the same time, even during periods of high traffic load. 33 The L4S identifier defined in this document distinguishes L4S from 34 'Classic' (e.g. TCP-Reno-friendly) traffic. It gives an incremental 35 migration path so that suitably modified network bottlenecks can 36 distinguish and isolate existing traffic that still follows the 37 Classic behaviour, to prevent it degrading the low queuing delay and 38 loss of L4S traffic. This specification defines the rules that L4S 39 transports and network elements need to follow to ensure they neither 40 harm each other's performance nor that of Classic traffic. Examples 41 of new active queue management (AQM) marking algorithms and examples 42 of new transports (whether TCP-like or real-time) are specified 43 separately. 45 Status of This Memo 47 This Internet-Draft is submitted in full conformance with the 48 provisions of BCP 78 and BCP 79. 50 Internet-Drafts are working documents of the Internet Engineering 51 Task Force (IETF). Note that other groups may also distribute 52 working documents as Internet-Drafts. The list of current Internet- 53 Drafts is at https://datatracker.ietf.org/drafts/current/. 55 Internet-Drafts are draft documents valid for a maximum of six months 56 and may be updated, replaced, or obsoleted by other documents at any 57 time. It is inappropriate to use Internet-Drafts as reference 58 material or to cite them other than as "work in progress." 60 This Internet-Draft will expire on September 10, 2020. 62 Copyright Notice 64 Copyright (c) 2020 IETF Trust and the persons identified as the 65 document authors. All rights reserved. 67 This document is subject to BCP 78 and the IETF Trust's Legal 68 Provisions Relating to IETF Documents 69 (https://trustee.ietf.org/license-info) in effect on the date of 70 publication of this document. Please review these documents 71 carefully, as they describe your rights and restrictions with respect 72 to this document. Code Components extracted from this document must 73 include Simplified BSD License text as described in Section 4.e of 74 the Trust Legal Provisions and are provided without warranty as 75 described in the Simplified BSD License. 77 Table of Contents 79 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 80 1.1. Latency, Loss and Scaling Problems . . . . . . . . . . . 5 81 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 82 1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 9 83 2. Consensus Choice of L4S Packet Identifier: Requirements . . . 9 84 3. L4S Packet Identification at Run-Time . . . . . . . . . . . . 10 85 4. Prerequisite Transport Layer Behaviour . . . . . . . . . . . 11 86 4.1. Prerequisite Codepoint Setting . . . . . . . . . . . . . 11 87 4.2. Prerequisite Transport Feedback . . . . . . . . . . . . . 11 88 4.3. Prerequisite Congestion Response . . . . . . . . . . . . 12 89 5. Prerequisite Network Node Behaviour . . . . . . . . . . . . . 14 90 5.1. 
Prerequisite Classification and Re-Marking Behaviour . . 14 91 5.2. The Meaning of L4S CE Relative to Drop . . . . . . . . . 15 92 5.3. Exception for L4S Packet Identification by Network Nodes 93 with Transport-Layer Awareness . . . . . . . . . . . . . 15 94 5.4. Interaction of the L4S Identifier with other Identifiers 16 95 5.4.1. DualQ Examples of Other Identifiers Complementing L4S 96 Identifiers . . . . . . . . . . . . . . . . . . . . . 16 97 5.4.1.1. Inclusion of Additional Traffic with L4S . . . . 16 98 5.4.1.2. Exclusion of Traffic From L4S Treatment . . . . . 18 99 5.4.1.3. Generalized Combination of L4S and Other 100 Identifiers . . . . . . . . . . . . . . . . . . . 18 101 5.4.2. Per-Flow Queuing Examples of Other Identifiers 102 Complementing L4S Identifiers . . . . . . . . . . . . 19 103 6. L4S Experiments . . . . . . . . . . . . . . . . . . . . . . . 20 104 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 105 8. Security Considerations . . . . . . . . . . . . . . . . . . . 20 106 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 21 107 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 21 108 10.1. Normative References . . . . . . . . . . . . . . . . . . 21 109 10.2. Informative References . . . . . . . . . . . . . . . . . 22 110 Appendix A. The 'Prague L4S Requirements' . . . . . . . . . . . 28 111 A.1. Requirements for Scalable Transport Protocols . . . . . . 29 112 A.1.1. Use of L4S Packet Identifier . . . . . . . . . . . . 29 113 A.1.2. Accurate ECN Feedback . . . . . . . . . . . . . . . . 29 114 A.1.3. Fall back to Reno-friendly congestion control on 115 packet loss . . . . . . . . . . . . . . . . . . . . . 29 116 A.1.4. Fall back to Reno-friendly congestion control on 117 classic ECN bottlenecks . . . . . . . . . . . . . . . 30 118 A.1.5. Reduce RTT dependence . . . . . . . . . . . . . . . . 31 119 A.1.6. Scaling down to fractional congestion windows . . . . 31 120 A.1.7. Measuring Reordering Tolerance in Time Units . . . . 32 121 A.2. Scalable Transport Protocol Optimizations . . . . . . . . 35 122 A.2.1. Setting ECT in TCP Control Packets and 123 Retransmissions . . . . . . . . . . . . . . . . . . . 35 124 A.2.2. Faster than Additive Increase . . . . . . . . . . . . 35 125 A.2.3. Faster Convergence at Flow Start . . . . . . . . . . 36 126 Appendix B. Alternative Identifiers . . . . . . . . . . . . . . 36 127 B.1. ECT(1) and CE codepoints . . . . . . . . . . . . . . . . 37 128 B.2. ECN Plus a Diffserv Codepoint (DSCP) . . . . . . . . . . 39 129 B.3. ECN capability alone . . . . . . . . . . . . . . . . . . 42 130 B.4. Protocol ID . . . . . . . . . . . . . . . . . . . . . . . 42 131 B.5. Source or destination addressing . . . . . . . . . . . . 42 132 B.6. Summary: Merits of Alternative Identifiers . . . . . . . 43 133 Appendix C. Potential Competing Uses for the ECT(1) Codepoint . 43 134 C.1. Integrity of Congestion Feedback . . . . . . . . . . . . 43 135 C.2. Notification of Less Severe Congestion than CE . . . . . 44 136 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 45 138 1. Introduction 140 This specification defines the identifier to be used on IP packets 141 for a new network service called low latency, low loss and scalable 142 throughput (L4S). It is similar to the original (or 'Classic') 143 Explicit Congestion Notification (ECN [RFC3168]). RFC 3168 required 144 an ECN mark to be equivalent to a drop, both when applied in the 145 network and when responded to by a transport. 
Unlike Classic ECN 146 marking, the network applies L4S marking more immediately and more 147 aggressively than drop, and the transport response to each mark is 148 reduced and smoothed relative to that for drop. The two changes 149 counterbalance each other so that the throughput of an L4S flow will 150 be roughly the same as a non-L4S flow under the same conditions. 151 Nonetheless, the much more frequent control signals and the finer 152 responses to them result in ultra-low queuing delay without 153 compromising link utilization, and this low delay can be maintained 154 during high load. Ultra-low queuing delay means less than 1 155 millisecond (ms) on average and less than about 2 ms at the 99th 156 percentile. 158 An example of a scalable congestion control that would enable the L4S 159 service is Data Center TCP (DCTCP), which until now has been 160 applicable solely to controlled environments like data centres 161 [RFC8257], because it is too aggressive to co-exist with existing 162 TCP-Reno-friendly traffic. The DualQ Coupled AQM, which is defined 163 in a complementary experimental specification 164 [I-D.ietf-tsvwg-aqm-dualq-coupled], is an AQM framework that enables 165 scalable congestion controls like DCTCP to co-exist with existing 166 traffic, each getting roughly the same flow rate when they compete 167 under similar conditions. Note that a transport such as DCTCP is 168 still not safe to deploy on the Internet unless it satisfies the 169 requirements listed in Section 4. 171 L4S is not only for elastic (TCP-like) traffic - there are scalable 172 congestion controls for real-time media, such as the L4S variant of 173 the SCReAM [RFC8298] real-time media congestion avoidance technique 174 (RMCAT). The factor that distinguishes L4S from Classic traffic is 175 its behaviour in response to congestion. The transport wire 176 protocol, e.g. TCP, QUIC, SCTP, DCCP, RTP/RTCP, is orthogonal (and 177 therefore not suitable for distinguishing L4S from Classic packets). 179 The L4S identifier defined in this document is the key piece that 180 distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic. It 181 gives an incremental migration path so that suitably modified network 182 bottlenecks can distinguish and isolate existing Classic traffic from 183 L4S traffic to prevent it from degrading the ultra-low delay and loss 184 of the new scalable transports, without harming Classic performance. 186 Initial implementation of the separate parts of the system has been 187 motivated by the performance benefits. 189 1.1. Latency, Loss and Scaling Problems 191 Latency is becoming the critical performance factor for many (most?) 192 applications on the public Internet, e.g. interactive Web, Web 193 services, voice, conversational video, interactive video, interactive 194 remote presence, instant messaging, online gaming, remote desktop, 195 cloud-based applications, and video-assisted remote control of 196 machinery and industrial processes. In the 'developed' world, 197 further increases in access network bit-rate offer diminishing 198 returns, whereas latency is still a multi-faceted problem. In the 199 last decade or so, much has been done to reduce propagation time by 200 placing caches or servers closer to users. However, queuing remains 201 a major intermittent component of latency. 203 The Diffserv architecture provides Expedited Forwarding [RFC3246], so 204 that low latency traffic can jump the queue of other traffic. 
205 However, on access links dedicated to individual sites (homes, small 206 enterprises or mobile devices), often all traffic at any one time 207 will be latency-sensitive. Then, given nothing to differentiate 208 from, Diffserv makes no difference. Instead, we need to remove the 209 causes of any unnecessary delay. 211 The bufferbloat project has shown that excessively-large buffering 212 ('bufferbloat') has been introducing significantly more delay than 213 the underlying propagation time. These delays appear only 214 intermittently--only when a capacity-seeking (e.g. TCP) flow is long 215 enough for the queue to fill the buffer, making every packet in other 216 flows sharing the buffer sit through the queue. 218 Active queue management (AQM) was originally developed to solve this 219 problem (and others). Unlike Diffserv, which gives low latency to 220 some traffic at the expense of others, AQM controls latency for _all_ 221 traffic in a class. In general, AQM methods introduce an increasing 222 level of discard from the buffer the longer the queue persists above 223 a shallow threshold. This gives sufficient signals to capacity- 224 seeking (aka. greedy) flows to keep the buffer empty for its intended 225 purpose: absorbing bursts. However, RED [RFC2309] and other 226 algorithms from the 1990s were sensitive to their configuration and 227 hard to set correctly. So, this form of AQM was not widely deployed. 229 More recent state-of-the-art AQM methods, e.g. fq_CoDel [RFC8290], 230 PIE [RFC8033], Adaptive RED [ARED01], are easier to configure, 231 because they define the queuing threshold in time not bytes, so it is 232 invariant for different link rates. However, no matter how good the 233 AQM, the sawtoothing sending window of a Classic congestion control 234 will either cause queuing delay to vary or cause the link to be 235 under-utilized. Even with a perfectly tuned AQM, the additional 236 queuing delay will be of the same order as the underlying speed-of- 237 light delay across the network. 239 If a sender's own behaviour is introducing queuing delay variation, 240 no AQM in the network can "un-vary" the delay without significantly 241 compromising link utilization. Even flow-queuing (e.g. [RFC8290]), 242 which isolates one flow from another, cannot isolate a flow from the 243 delay variations it inflicts on itself. Therefore those applications 244 that need to seek out high bandwidth but also need low latency will 245 have to migrate to scalable congestion control. 247 Altering host behaviour is not enough on its own though. Even if 248 hosts adopt low latency behaviour (scalable congestion controls), 249 they need to be isolated from the behaviour of existing Classic 250 congestion controls that induce large queue variations. L4S enables 251 that migration by providing latency isolation in the network and 252 distinguishing the two types of packets that need to be isolated: L4S 253 and Classic. L4S isolation can be achieved with a queue per flow 254 (e.g. [RFC8290]) but a DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled] is 255 sufficient, and actually gives better tail latency. Both approaches 256 are addressed in this document. 258 The DualQ solution was developed to make ultra-low latency available 259 without requiring per-flow queues at every bottleneck. 
This was 260 because FQ has well-known downsides - not least the need to inspect 261 transport layer headers in the network, which makes it incompatible 262 with privacy approaches such as IPSec VPN tunnels, and incompatible 263 with link layer queue management, where transport layer headers can 264 be hidden, e.g. 5G. 266 Latency is not the only concern addressed by L4S: It was known when 267 TCP congestion avoidance was first developed that it would not scale 268 to high bandwidth-delay products (footnote 6 of Jacobson and Karels 269 [TCP-CA]). Given regular broadband bit-rates over WAN distances are 270 already [RFC3649] beyond the scaling range of Reno TCP, 'less 271 unscalable' Cubic [RFC8312] and Compound [I-D.sridharan-tcpm-ctcp] 272 variants of TCP have been successfully deployed. However, these are 273 now approaching their scaling limits. Unfortunately, fully scalable 274 congestion controls such as DCTCP [RFC8257] cause Classic ECN 275 congestion controls sharing the same queue to starve themselves, 276 which is why they have been confined to private data centres or 277 research testbeds (until now). 279 It turns out that a congestion control algorithm like DCTCP that 280 solves the latency problem also solves the scalability problem of 281 Classic congestion controls. The finer sawteeth in the congestion 282 window have low amplitude, so they cause very little queuing delay 283 variation and the average time to recover from one congestion signal 284 to the next (the average duration of each sawtooth) remains 285 invariant, which maintains constant tight control as flow-rate 286 scales. A background paper [DCttH15] gives the full explanation of 287 why the design solves both the latency and the scaling problems, both 288 in plain English and in more precise mathematical form. The 289 explanation is summarised without the maths in the L4S architecture 290 document [I-D.ietf-tsvwg-l4s-arch]. 292 1.2. Terminology 294 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 295 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 296 "OPTIONAL" in this document are to be interpreted as described in 297 [RFC2119]. In this document, these words will appear with that 298 interpretation only when in ALL CAPS. Lower case uses of these words 299 are not to be interpreted as carrying RFC-2119 significance. 301 Classic Congestion Control: A congestion control behaviour that can 302 co-exist with standard TCP Reno [RFC5681] without causing 303 significantly negative impact on its flow rate [RFC5033]. With 304 Classic congestion controls, as flow rate scales, the number of 305 round trips between congestion signals (losses or ECN marks) rises 306 with the flow rate. So it takes longer and longer to recover 307 after each congestion event. Therefore control of queuing and 308 utilization becomes very slack, and the slightest disturbance 309 prevents a high rate from being attained [RFC3649]. 311 For instance, with 1500 byte packets and an end-to-end round trip 312 time (RTT) of 36 ms, over the years, as Reno flow rate scales from 313 2 to 100 Mb/s the number of round trips taken to recover from a 314 congestion event rises proportionately, from 4 round trips to 200. 
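As a rough, non-normative cross-check of the Reno figures just given, the following sketch reproduces them from the standard idealized Reno sawtooth (an assumption of this sketch, not part of this specification): the quoted flow rate is taken as the average rate, the peak congestion window is then 4/3 of the average window, and the time from one congestion event to the next is half the peak window in round trips.

   # Non-normative check of the Reno recovery times quoted above.
   # Assumes an idealized Reno sawtooth: the quoted rate is the average
   # rate, the peak window is 4/3 of the average window, and recovery
   # (one congestion event to the next) takes half the peak window in RTTs.
   MSS_BITS = 1500 * 8   # segment size in bits
   RTT = 0.036           # round trip time in seconds

   def reno_recovery_rtts(rate_bps):
       avg_window = rate_bps * RTT / MSS_BITS   # average window (segments)
       peak_window = avg_window * 4 / 3         # peak of the sawtooth
       return peak_window / 2                   # RTTs between congestion events

   for rate in (2e6, 100e6):
       print(int(round(reno_recovery_rtts(rate))), "round trips at",
             rate / 1e6, "Mb/s")
   # -> 4 round trips at 2.0 Mb/s; 200 round trips at 100.0 Mb/s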
315 Cubic [RFC8312] was developed to be less unscalable, but it is 316 approaching its scaling limit; with the same RTT of 36ms, at 317 100Mb/s it takes about 106 round trips to recover, and at 800 Mb/s 318 its recovery time triples to over 340 round trips, or still more 319 than 12 seconds (Reno would take 57 seconds). Cubic only becomes 320 significantly better than Reno at high delay and rate 321 combinations, for example at 90 ms RTT and 800 Mb/s a Reno flow 322 takes 4000 RTTs or 6 minutes to recover, whereas Cubic 'only' 323 needs 188 RTTs, which is still 17 seconds (double its recovery 324 time at 100Mb/s). 326 Scalable Congestion Control: A congestion control where the average 327 time from one congestion signal to the next (the recovery time) 328 remains invariant as the flow rate scales, all other factors being 329 equal. This maintains the same degree of control over queueing 330 and utilization whatever the flow rate, as well as ensuring that 331 high throughput is robust to disturbances. For instance, DCTCP 332 averages 2 congestion signals per round-trip whatever the flow 333 rate. See Section 4.3 for more explanation. 335 Classic service: The Classic service is intended for all the 336 congestion control behaviours that co-exist with Reno [RFC5681] 337 (e.g. Reno itself, Cubic [RFC8312], Compound 338 [I-D.sridharan-tcpm-ctcp], TFRC [RFC5348]). The term 'Classic 339 queue' means a queue providing the Classic service. 341 Low-Latency, Low-Loss Scalable throughput (L4S) service: The 'L4S' 342 service is intended for traffic from scalable congestion control 343 algorithms, such as Data Center TCP [RFC8257]. The L4S service is 344 for more general traffic than just DCTCP--it allows the set of 345 congestion controls with similar scaling properties to DCTCP to 346 evolve (e.g. Relentless TCP [Mathis09], TCP Prague [PragueLinux] 347 and the L4S variant of SCREAM for real-time media [RFC8298]). The 348 term 'L4S queue' means a queue providing the L4S service. 350 The terms Classic or L4S can also qualify other nouns, such as 351 'queue', 'codepoint', 'identifier', 'classification', 'packet', 352 'flow'. For example: an L4S packet means a packet with an L4S 353 identifier sent from an L4S congestion control. 355 Both Classic and L4S services can cope with a proportion of 356 unresponsive or less-responsive traffic as well, as long as it 357 does not build a queue (e.g. DNS, VoIP, game sync datagrams, 358 etc). 360 Reno-friendly: The subset of Classic traffic that excludes 361 unresponsive traffic and excludes experimental congestion controls 362 intended to coexist with Reno but without always being strictly 363 friendly to Reno (as allowed by [RFC5033]). Reno-friendly is used 364 in place of 'TCP-friendly', given that the TCP protocol is used 365 with many different congestion control behaviours. 367 Classic ECN: The original Explicit Congestion Notification (ECN) 368 protocol [RFC3168], which requires ECN signals to be treated the 369 same as drops, both when generated in the network and when 370 responded to by the sender. The names used for the four 371 codepoints of the 2-bit IP-ECN field are as defined in [RFC3168]: 372 Not ECT, ECT(0), ECT(1) and CE, where ECT stands for ECN-Capable 373 Transport and CE stands for Congestion Experienced. 375 1.3. Scope 377 The new L4S identifier defined in this specification is applicable 378 for IPv4 and IPv6 packets (as for Classic ECN [RFC3168]). It is 379 applicable for the unicast, multicast and anycast forwarding modes. 
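For concreteness, the following non-normative sketch lists the four codepoints of the 2-bit IP-ECN field [RFC3168] and the default classification rule specified later in Section 5.1, under which ECT(1) and CE packets receive the L4S treatment and ECT(0) and Not-ECT packets receive the Classic treatment (subject to the exception in Section 5.3). The constant and function names are purely illustrative.

   # Illustrative only: IP-ECN codepoints [RFC3168] and the default
   # L4S classification rule of Section 5.1.
   NOT_ECT = 0b00   # Not ECN-Capable Transport
   ECT_1   = 0b01   # ECN-Capable Transport (1) -- the L4S identifier
   ECT_0   = 0b10   # ECN-Capable Transport (0) -- Classic ECN
   CE      = 0b11   # Congestion Experienced

   def l4s_classify(ecn_bits):
       """Return 'L4S' or 'Classic' for the 2-bit ECN field of a packet."""
       return "L4S" if ecn_bits in (ECT_1, CE) else "Classic"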
381 The L4S identifier is an orthogonal packet classification to the 382 Differentiated Services Code Point (DSCP) [RFC2474]. Section 5.4 383 explains what this means in practice. 385 This document is intended for experimental status, so it does not 386 update any standards track RFCs. Therefore it depends on [RFC8311], 387 which is a standards track specification that: 389 o updates the ECN proposed standard [RFC3168] to allow experimental 390 track RFCs to relax the requirement that an ECN mark must be 391 equivalent to a drop (when the network applies markings and/or 392 when the sender responds to them); 394 o changes the status of the experimental ECN nonce [RFC3540] to 395 historic; 397 o makes consequent updates to the following additional proposed 398 standard RFCs to reflect the above two bullets: 400 * ECN for RTP [RFC6679]; 402 * the congestion control specifications of various DCCP 403 congestion control identifier (CCID) profiles [RFC4341], 404 [RFC4342], [RFC5622]. 406 This document is about identifiers that are used for interoperation 407 between hosts and networks. So the audience is broad, covering 408 developers of host transports and network AQMs, as well as covering 409 how operators might wish to combine various identifiers, which would 410 require flexibility from equipment developers. 412 2. Consensus Choice of L4S Packet Identifier: Requirements 414 This subsection briefly records the process that led to a consensus 415 choice of L4S identifier, selected from all the alternatives in 416 Appendix B. 418 The identifier for packets using the Low Latency, Low Loss, Scalable 419 throughput (L4S) service needs to meet the following requirements: 421 o it SHOULD survive end-to-end between source and destination 422 applications: across the boundary between host and network, 423 between interconnected networks, and through middleboxes; 425 o it SHOULD be visible at the IP layer 427 o it SHOULD be common to IPv4 and IPv6 and transport-agnostic; 429 o it SHOULD be incrementally deployable; 431 o it SHOULD enable an AQM to classify packets encapsulated by outer 432 IP or lower-layer headers; 434 o it SHOULD consume minimal extra codepoints; 436 o it SHOULD be consistent on all the packets of a transport layer 437 flow, so that some packets of a flow are not served by a different 438 queue to others. 440 Whether the identifier would be recoverable if the experiment failed 441 is a factor that could be taken into account. However, this has not 442 been made a requirement, because that would favour schemes that would 443 be easier to fail, rather than those more likely to succeed. 445 It is recognised that the chosen identifier is unlikely to satisfy 446 all these requirements, particularly given the limited space left in 447 the IP header. Therefore a compromise will be necessary, which is 448 why all the above requirements are expressed with the word 'SHOULD' 449 not 'MUST'. Appendix B discusses the pros and cons of the 450 compromises made in various competing identification schemes against 451 the above requirements. 453 On the basis of this analysis, "ECT(1) and CE codepoints" is the best 454 compromise. Therefore this scheme is defined in detail in the 455 following sections, while Appendix B records the rationale for this 456 decision. 458 3. 
L4S Packet Identification at Run-Time 460 The L4S treatment is an experimental track alternative packet marking 461 treatment [RFC4774] to the Classic ECN treatment in [RFC3168], which 462 has been updated by [RFC8311] to allow experiments such as the one 463 defined in the present specification. Like Classic ECN, L4S ECN 464 identifies both network and host behaviour: it identifies the marking 465 treatment that network nodes are expected to apply to L4S packets, 466 and it identifies packets that have been sent from hosts that are 467 expected to comply with a broad type of sending behaviour. 469 For a packet to receive L4S treatment as it is forwarded, the sender 470 sets the ECN field in the IP header to the ECT(1) codepoint. See 471 Section 4 for full transport layer behaviour requirements, including 472 feedback and congestion response. 474 A network node that implements the L4S service normally classifies 475 arriving ECT(1) and CE packets for L4S treatment. See Section 5 for 476 full network element behaviour requirements, including 477 classification, ECN-marking and interaction of the L4S identifier 478 with other identifiers and per-hop behaviours. 480 4. Prerequisite Transport Layer Behaviour 482 4.1. Prerequisite Codepoint Setting 484 A sender that wishes a packet to receive L4S treatment as it is 485 forwarded, MUST set the ECN field in the IP header (v4 or v6) to the 486 ECT(1) codepoint. 488 4.2. Prerequisite Transport Feedback 490 For a transport protocol to provide scalable congestion control it 491 MUST provide feedback of the extent of CE marking on the forward 492 path. When ECN was added to TCP [RFC3168], the feedback method 493 reported no more than one CE mark per round trip. Some transport 494 protocols derived from TCP mimic this behaviour while others report 495 the accurate extent of ECN marking. This means that some transport 496 protocols will need to be updated as a prerequisite for scalable 497 congestion control. The position for a few well-known transport 498 protocols is given below. 500 TCP: Support for the accurate ECN feedback requirements [RFC7560] 501 (such as that provided by AccECN [I-D.ietf-tcpm-accurate-ecn]) by 502 both ends is a prerequisite for scalable congestion control in 503 TCP. Therefore, the presence of ECT(1) in the IP headers even in 504 one direction of a TCP connection will imply that both ends must 505 be supporting accurate ECN feedback. However, the converse does 506 not apply. So even if both ends support AccECN, either of the two 507 ends can choose not to use a scalable congestion control, whatever 508 the other end's choice. 510 SCTP: A suitable ECN feedback mechanism for SCTP could add a chunk 511 to report the number of received CE marks (e.g. 512 [I-D.stewart-tsvwg-sctpecn]), and update the ECN feedback protocol 513 sketched out in Appendix A of the standards track specification of 514 SCTP [RFC4960]. 516 RTP over UDP: A prerequisite for scalable congestion control is for 517 both (all) ends of one media-level hop to signal ECN support 518 [RFC6679] and use the new generic RTCP feedback format of 519 [I-D.ietf-avtcore-cc-feedback-message]. The presence of ECT(1) 520 implies that both (all) ends of that media-level hop support ECN. 521 However, the converse does not apply. So each end of a media- 522 level hop can independently choose not to use a scalable 523 congestion control, even if both ends support ECN. 
525 QUIC: Support for sufficiently fine-grained ECN feedback is provided 526 by the first IETF QUIC transport [I-D.ietf-quic-transport]. 528 DCCP: The ACK vector in DCCP [RFC4340] is already sufficient to 529 report the extent of CE marking as needed by a scalable congestion 530 control. 532 4.3. Prerequisite Congestion Response 534 As a condition for a host to send packets with the L4S identifier 535 (ECT(1)), it SHOULD implement a congestion control behaviour that 536 ensures that, in steady state, the average time from one ECN 537 congestion signal to the next (the 'recovery time') does not increase 538 as flow rate scales, all other factors being equal. This is termed a 539 scalable congestion control. This is necessary to ensure that queue 540 variations remain small as flow rate scales, without having to 541 sacrifice utilization. For instance, for DCTCP, the average recovery 542 time is always half a round trip, whatever the flow rate. 544 The condition 'all other factors being equal', allows the recovery 545 time to be different for different round trip times, as long as it 546 does not increase with flow rate for any particular RTT. 548 Saying that the recovery time remains roughly invariant is equivalent 549 to saying that the number of ECN CE marks per round trip remains 550 invariant as flow rate scales, all other factors being equal. For 551 instance, DCTCP's average recovery time of half of 1 RTT is 552 equivalent to 2 ECN marks per round trip. For those who understand 553 steady-state congestion response functions, it is also equivalent to 554 say that, the congestion window is inversely proportional to the 555 proportion of bytes in packets marked with the CE codepoint (see 556 section 2 of [PI2]). 558 As well as DCTCP, TCP Prague [PragueLinux] and the L4S variant of 559 SCReAM [RFC8298] are examples of scalable congestion controls. 561 As with all transport behaviours, a detailed specification (probably 562 an experimental RFC) will need to be defined for each congestion 563 control, following the guidelines for specifying new congestion 564 control algorithms in [RFC5033]. In addition it will need to 565 document these L4S-specific matters, specifically the timescale over 566 which the proportionality is averaged, and control of burstiness. 567 The recovery time requirement above is worded as a 'SHOULD' rather 568 than a 'MUST' to allow reasonable flexibility when defining these 569 specifications. 571 In order to coexist safely with other Internet traffic, a scalable 572 congestion control MUST NOT tag its packets with the ECT(1) codepoint 573 unless it complies with the following bulleted requirements. The 574 specification of a particular scalable congestion control MUST 575 describe in detail how it satisfies each requirement and, for any 576 non-mandatory requirements, it MUST justify why it does not comply: 578 o As well as responding to ECN markings, a scalable congestion 579 control MUST react to packet loss in a way that will coexist 580 safely with a TCP Reno congestion control [RFC5681] (see 581 Appendix A.1.3 for rationale). 583 o A scalable congestion control MUST react to ECN marking from a 584 non-L4S but ECN-capable bottleneck in a way that will coexist with 585 a TCP Reno congestion control [RFC5681] (see Appendix A.1.4 for 586 rationale). 588 Note that a scalable congestion control is not expected to change 589 to setting ECT(0) while it falls back to coexist with Reno. 
591 o A scalable congestion control MUST reduce or eliminate RTT bias 592 over as wide a range of RTTs as possible, or at least over the 593 typical range of RTTs that will interact in the intended 594 deployment scenario (see Appendix A.1.5 for rationale). 596 o A scalable congestion control SHOULD remain responsive to 597 congestion when typical RTTs over the public Internet are 598 significantly smaller because they are no longer inflated by 599 queuing delay (see Appendix A.1.6 for rationale). 601 o A scalable congestion control intended for reordering-prone 602 networks SHOULD detect loss by counting in time-based units, which 603 is scalable, as opposed to counting in units of packets (as in the 604 3 DupACK rule of RFC 5681 TCP), which is not scalable (see 605 Appendix A.1.7 for rationale). This requirement is scoped to 606 'reordering-prone networks' in order to exclude congestion 607 controls that are solely used in controlled environments where the 608 network introduces hardly any reordering. 610 Each sender in a session can use a scalable congestion control 611 independently of the congestion control used by the receiver(s) when 612 they send data. Therefore there might be ECT(1) packets in one 613 direction and ECT(0) or Not-ECT in the other. 615 As well as traffic controlled by a scalable congestion control, a 616 reasonable level of smooth unresponsive traffic at a low rate 617 relative to typical broadband capacities is likely to be acceptable 618 (see "'Safe' Unresponsive Traffic" in Section 5.4.1.1.1). 620 5. Prerequisite Network Node Behaviour 622 5.1. Prerequisite Classification and Re-Marking Behaviour 624 A network node that implements the L4S service MUST classify arriving 625 ECT(1) packets for L4S treatment and, other than in the exceptional 626 case referred to next, it MUST classify arriving CE packets for L4S 627 treatment as well. CE packets might have originated as ECT(1) or 628 ECT(0), but the above rule to classify them as if they originated as 629 ECT(1) is the safe choice (see Appendix B.1 for rationale). The 630 exception is where some flow-aware in-network mechanism happens to be 631 available for distinguishing CE packets that originated as ECT(0), as 632 described in Section 5.3, but there is no implication that such a 633 mechanism is necessary. 635 An L4S AQM treatment follows similar codepoint transition rules to 636 those in RFC 3168. Specifically, the ECT(1) codepoint MUST NOT be 637 changed to any other codepoint than CE, and CE MUST NOT be changed to 638 any other codepoint. An ECT(1) packet is classified as ECN-capable 639 and, if congestion increases, an L4S AQM algorithm will increasingly 640 mark the ECN field as CE, otherwise forwarding packets unchanged as 641 ECT(1). Necessary conditions for an L4S marking treatment are 642 defined in Section 5.2. Under persistent overload an L4S marking 643 treatment SHOULD turn off ECN marking, using drop as a congestion 644 signal until the overload episode has subsided, as recommended for 645 all AQM methods in [RFC7567] (Section 4.2.1), which follows the 646 similar advice in RFC 3168 (Section 7). 648 For backward compatibility in uncontrolled environments, a network 649 node that implements the L4S treatment MUST also implement an AQM 650 treatment for the Classic service as defined in Section 1.2. This 651 Classic AQM treatment need not mark ECT(0) packets, but if it does, 652 it will do so under the same conditions as it would drop Not-ECT 653 packets [RFC3168]. 
It MUST classify arriving ECT(0) and Not-ECT 654 packets for treatment by the Classic AQM (see the discussion of the 655 classifier for the dual-queue coupled AQM in 656 [I-D.ietf-tsvwg-aqm-dualq-coupled]). 658 5.2. The Meaning of L4S CE Relative to Drop 660 The likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST 661 be roughly proportional to the square of the likelihood that it would 662 have marked it if it had been an L4S packet (p_L). That is 664 p_C ~= (p_L / k)^2 666 The constant of proportionality (k) does not have to be standardised 667 for interoperability, but a value of 2 is RECOMMENDED. The term 668 'likelihood' is used above to allow for marking and dropping to be 669 either probabilistic or deterministic. 671 This formula ensures that Scalable and Classic flows will converge to 672 roughly equal congestion windows, for the worst case of Reno 673 congestion control. This is because the congestion windows of 674 Scalable and Classic congestion controls are inversely proportional 675 to p_L and sqrt(p_C) respectively. So squaring p_C in the above 676 formula counterbalances the square root that characterizes Reno- 677 friendly flows. 679 [I-D.ietf-tsvwg-aqm-dualq-coupled] specifies the essential aspects of 680 an L4S AQM, as well as recommending other aspects. It gives example 681 implementations in appendices. 683 Note that, contrary to RFC 3168, a Coupled Dual Queue AQM 684 implementing the L4S and Classic treatments does not mark an ECT(1) 685 packet under the same conditions that it would have dropped a Not-ECT 686 packet, as allowed by [RFC8311], which updates RFC 3168. However, if 687 it marks ECT(0) packets, it does so under the same conditions that it 688 would have dropped a Not-ECT packet. 690 5.3. Exception for L4S Packet Identification by Network Nodes with 691 Transport-Layer Awareness 693 To implement the L4S treatment, a network node does not need to 694 identify transport-layer flows. Nonetheless, if an implementer is 695 willing to identify transport-layer flows at a network node, and if 696 the most recent ECT packet in the same flow was ECT(0), the node MAY 697 classify CE packets for Classic ECN [RFC3168] treatment. In all 698 other cases, a network node MUST classify all CE packets for L4S 699 treatment. Examples of such other cases are: i) if no ECT packets 700 have yet been identified in a flow; ii) if it is not desirable for a 701 network node to identify transport-layer flows; or iii) if the most 702 recent ECT packet in a flow was ECT(1). 704 If an implementer uses flow-awareness to classify CE packets, to 705 determine whether the flow is using ECT(0) or ECT(1) it only uses the 706 most recent ECT packet of a flow (this advice will need to be 707 verified as part of L4S experiments). This is because a sender might 708 switch from sending ECT(1) (L4S) packets to sending ECT(0) (Classic 709 ECN) packets, or back again, in the middle of a transport-layer flow 710 (e.g. it might manually switch its congestion control module mid- 711 connection, or it might be deliberately attempting to confuse the 712 network). 714 5.4. Interaction of the L4S Identifier with other Identifiers 716 The examples in this section concern how additional identifiers might 717 complement the L4S identifier to classify packets between class-based 718 queues. Firstly considering two queues, L4S and Classic, as in the 719 Coupled DualQ AQM [I-D.ietf-tsvwg-aqm-dualq-coupled], then more 720 complex structures within a larger queuing hierarchy. 722 5.4.1. 
DualQ Examples of Other Identifiers Complementing L4S 723 Identifiers 725 5.4.1.1. Inclusion of Additional Traffic with L4S 727 In a typical case for the public Internet a network element that 728 implements L4S might want to classify some low-rate but unresponsive 729 traffic (e.g. DNS, LDAP, NTP, voice, game sync packets) into the low 730 latency queue to mix with L4S traffic. Such non-ECN-based packet 731 types MUST be safe to mix with L4S traffic without harming the low 732 latency service, where 'safe' is explained in Section 5.4.1.1.1 733 below. 735 In this case it would not be appropriate to call the queue an L4S 736 queue, because it is shared by L4S and non-L4S traffic. Instead it 737 will be called the low latency or L queue. The L queue then offers 738 two different treatments: 740 o The L4S treatment, which is a combination of the L4S AQM treatment 741 and a priority scheduling treatment; 743 o The low latency treatment, which is solely the priority scheduling 744 treatment, without ECN-marking by the AQM. 746 To identify packets for just the scheduling treatment, it would be 747 inappropriate to use the L4S ECT(1) identifier, because such traffic 748 is unresponsive to ECN marking. Therefore, a network element that 749 implements L4S MAY classify additional packets into the L queue if 750 they carry certain non-ECN identifiers. For instance: 752 o addresses of specific applications or hosts configured to be safe 753 (or perhaps they comply with L4S behaviour and can respond to ECN 754 feedback, but perhaps cannot set the ECN field for some reason); 756 o certain protocols that are usually lightweight (e.g. ARP, DNS); 758 o specific Diffserv codepoints that indicate traffic with limited 759 burstiness such as the EF (Expedited Forwarding [RFC3246]), Voice- 760 Admit [RFC5865] or proposed NQB (Non-Queue-Building 761 [I-D.ietf-tsvwg-nqb]) service classes or equivalent local-use 762 DSCPs (see [I-D.briscoe-tsvwg-l4s-diffserv]). 764 Of course, a packet that carried both the ECT(1) codepoint and a non- 765 ECN identifier associated with the L queue would be classified into 766 the L queue. 768 For clarity, non-ECN identifiers, such as the examples itemized 769 above, might be used by some network operators who believe they 770 identify non-L4S traffic that would be safe to mix with L4S traffic. 771 They are not alternative ways for a host to indicate that it is 772 sending L4S packets. Only the ECT(1) ECN codepoint indicates to a 773 network element that a host is sending L4S packets (and CE indicates 774 that it could have originated as ECT(1)). Specifically ECT(1) 775 indicates that the host claims its behaviour satisfies the 776 prerequisite transport requirements in Section 4. 778 To include additional traffic with L4S, a network element only reads 779 identifiers such as those itemized above. It MUST NOT alter these 780 non-ECN identifiers, so that they survive for any potential use later 781 on the network path. 783 5.4.1.1.1. 'Safe' Unresponsive Traffic 785 The above section requires unresponsive traffic to be 'safe' to mix 786 with L4S traffic. Ideally this means that the sender never sends any 787 sequence of packets at a rate that exceeds the available capacity of 788 the bottleneck link. However, typically an unresponsive transport 789 does not even know the bottleneck capacity of the path, let alone its 790 available capacity. 
Nonetheless, an application can be considered 791 safe enough if it paces packets out (not necessarily completely 792 regularly) such that its maximum instantaneous rate from packet to 793 packet stays well below a typical broadband access rate. 795 This is a vague but useful definition, because many low latency 796 applications of interest, such as DNS, voice, game sync packets, RPC, 797 ACKs, keep-alives, could match this description. 799 5.4.1.2. Exclusion of Traffic From L4S Treatment 801 To extend the above example, an operator might want to exclude some 802 traffic from the L4S treatment for a policy reason, e.g. security 803 (traffic from malicious sources) or commercial (e.g. initially the 804 operator may wish to confine the benefits of L4S to business 805 customers). 807 In this exclusion case, the operator MUST classify on the relevant 808 locally-used identifiers (e.g. source addresses) before classifying 809 the non-matching traffic on the end-to-end L4S ECN identifier. 811 The operator MUST NOT alter the end-to-end L4S ECN identifier from 812 L4S to Classic, because its decision to exclude certain traffic from 813 L4S treatment is local-only. The end-to-end L4S identifier then 814 survives for other operators to use, or indeed, they can apply their 815 own policy, independently based on their own choice of locally-used 816 identifiers. This approach also allows any operator to remove its 817 locally-applied exclusions in future, e.g. if it wishes to widen the 818 benefit of the L4S treatment to all its customers. 820 5.4.1.3. Generalized Combination of L4S and Other Identifiers 822 L4S concerns low latency, which it can provide for all traffic 823 without differentiation and without affecting bandwidth allocation. 824 Diffserv provides for differentiation of both bandwidth and low 825 latency, but its control of latency depends on its control of 826 bandwidth. The two can be combined if a network operator wants to 827 control bandwidth allocation but it also wants to provide low latency 828 - for any amount of traffic within one of these allocations of 829 bandwidth (rather than only providing low latency by limiting 830 bandwidth) [I-D.briscoe-tsvwg-l4s-diffserv]. 832 The DualQ examples so far have been framed in the context of 833 providing the default Best Efforts Per-Hop Behaviour (PHB) using two 834 queues - a Low Latency (L) queue and a Classic (C) Queue. This 835 single DualQ structure is expected to be the most common and useful 836 arrangement. But, more generally, an operator might choose to 837 control bandwidth allocation through a hierarchy of Diffserv PHBs at 838 a node, and to offer one (or more) of these PHBs with a low latency 839 and a Classic variant. 841 In the first case, if we assume that there are no other PHBs except 842 the DualQ, if a packet carries ECT(1) or CE, a network element would 843 classify it for the L4S treatment irrespective of its DSCP. And, if 844 a packet carried (say) the EF DSCP, the network element could 845 classify it into the L queue irrespective of its ECN codepoint. 846 However, where the DualQ is in a hierarchy of other PHBs, the 847 classifier would classify some traffic into other PHBs based on DSCP 848 before classifying between the low latency and Classic queues (based 849 on ECT(1), CE and perhaps also the EF DSCP or other identifiers as in 850 the above example). [I-D.briscoe-tsvwg-l4s-diffserv] gives a number 851 of examples of such arrangements to address various requirements. 
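To illustrate this ordering, the non-normative sketch below classifies a packet on its DSCP and ECN codepoints in the sequence just described. The queue names, the locally selected PHB mapping and the choice of complementary DSCPs are assumptions for illustration only (EF = 46 and Voice-Admit = 44 are the standard codepoint values).

   # Illustrative only: classification order for a DualQ within a
   # hierarchy of PHBs (Section 5.4.1.3).
   ECT_1, CE = 0b01, 0b11     # the L4S identifiers (Section 3)
   EF, VOICE_ADMIT = 46, 44   # DSCPs an operator might also map to L
                              # (Section 5.4.1.1)

   def select_treatment(dscp, ecn_bits, local_phbs=None):
       local_phbs = local_phbs or {}    # operator's DSCP -> PHB mapping
       if dscp in local_phbs:           # 1. other PHBs selected on DSCP first
           return local_phbs[dscp]
       if ecn_bits in (ECT_1, CE) or dscp in (EF, VOICE_ADMIT):
           return "L"                   # 2. Low Latency queue of the DualQ
       return "C"                       # 3. Classic queue of the DualQ

Note that, consistent with Sections 5.4.1.1 and 5.4.1.2, such classification reads but does not alter the end-to-end ECN and Diffserv identifiers.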
853 [I-D.briscoe-tsvwg-l4s-diffserv] describes how an operator might use 854 L4S to offer low latency for all L4S traffic as well as using 855 Diffserv for bandwidth differentiation. It identifies two main types 856 of approach, which can be combined: the operator might split certain 857 Diffserv PHBs between L4S and a corresponding Classic service. Or it 858 might split the L4S and/or the Classic service into multiple Diffserv 859 PHBs. In either of these cases, a packet would have to be classified 860 on its Diffserv and ECN codepoints. 862 In summary, there are numerous ways in which the L4S ECN identifier 863 (ECT(1) and CE) could be combined with other identifiers to achieve 864 particular objectives. The following categorization articulates 865 those that are valid, but it is not necessarily exhaustive. Those 866 tagged 'Recommended-standard-use' could be set by the sending host or 867 a network. Those tagged 'Local-use' would only be set by a network: 869 1. Identifiers Complementing the L4S Identifier 871 A. Including More Traffic in the L Queue 872 (Could use Recommended-standard-use or Local-use identifiers) 874 B. Excluding Certain Traffic from the L Queue 875 (Local-use only) 877 2. Identifiers to place L4S classification in a PHB Hierarchy 878 (Could use Recommended-standard-use or Local-use identifiers) 880 A. PHBs Before L4S ECN Classification 882 B. PHBs After L4S ECN Classification 884 5.4.2. Per-Flow Queuing Examples of Other Identifiers Complementing L4S 885 Identifiers 887 At a node with per-flow queueing (e.g. FQ-CoDel [RFC8290]), the L4S 888 identifier could complement the Layer-4 flow ID as a further level of 889 flow granularity (i.e. Not-ECT and ECT(0) queued separately from 890 ECT(1) and CE packets). "Risk of reordering Classic CE packets" in 891 Appendix B.1 discusses the resulting ambiguity if packets originally 892 marked ECT(0) are marked CE by an upstream AQM before they arrive at 893 a node that classifies CE as L4S. It argues that the risk of re- 894 ordering is vanishingly small and the consequence of such a low level 895 of re-ordering is minimal. 897 Alternatively, it could be assumed that it is not in a flow's own 898 interest to mix Classic and L4S identifiers. Then the AQM could use 899 the ECN field to switch itself between a Classic and an L4S AQM 900 behaviour within one per-flow queue. For instance, for ECN-capable 901 packets, the AQM might consist of a simple marking threshold and an 902 L4S ECN identifier might simply select a shallower threshold than a 903 Classic ECN identifier would. 905 6. L4S Experiments 907 [I-D.ietf-tsvwg-aqm-dualq-coupled] sets operational and management 908 requirements for experiments with DualQ Coupled AQMs. General 909 operational and management requirements for experiments with L4S 910 congestion controls are given in Section 4 and Section 5 above, e.g. 911 co-existence and scaling requirements, incremental deployment 912 arrangements. 914 The specification of each scalable congestion control will need to 915 include protocol-specific requirements for configuration and 916 monitoring performance during experiments. Appendix A of [RFC5706] 917 provides a helpful checklist. 919 Monitoring for harm to other traffic, specifically bandwidth 920 starvation or excess queuing delay, will need to be conducted 921 alongside all early L4S experiments. It is hard, if not impossible, 922 for an individual flow to measure its impact on other traffic. 
So 923 such monitoring will need to be conducted using bespoke monitoring 924 across flows and/or across classes of traffic. 926 7. IANA Considerations 928 This specification contains no IANA considerations. 930 8. Security Considerations 932 Approaches to assure the integrity of signals using the new identifier 933 are introduced in Appendix C.1. See the security considerations in 934 the L4S architecture [I-D.ietf-tsvwg-l4s-arch] for further discussion 935 of mis-use of the identifier. 937 The recommendation to detect loss in time units prevents the ACK- 938 splitting attacks described in [Savage-TCP]. 940 9. Acknowledgements 942 Thanks to Richard Scheffenegger, John Leslie, David Taeht, Jonathan 943 Morton, Gorry Fairhurst, Michael Welzl, Mikael Abrahamsson and Andrew 944 McGregor for the discussions that led to this specification. Ing-jyh 945 (Inton) Tsang was a contributor to the early drafts of this document. 946 And thanks to Mikael Abrahamsson, Lloyd Wood, Nicolas Kuhn, Greg 947 White, Tom Henderson, David Black, Gorry Fairhurst, Brian Carpenter, 948 Jake Holland, Rod Grimes and Richard Scheffenegger for providing help 949 and reviewing this draft and to Ingemar Johansson for reviewing and 950 providing substantial text. Appendix A listing the Prague L4S 951 Requirements is based on text authored by Marcelo Bagnulo Braun that 952 was originally an appendix to [I-D.ietf-tsvwg-l4s-arch]. That text 953 was in turn based on the collective output of the attendees listed in 954 the minutes of a 'bar BoF' on DCTCP Evolution during IETF-94 955 [TCPPrague]. 957 The authors' contributions were part-funded by the European Community 958 under its Seventh Framework Programme through the Reducing Internet 959 Transport Latency (RITE) project (ICT-317700). Bob Briscoe was also 960 funded partly by the Research Council of Norway through the TimeIn 961 project, partly by CableLabs and partly by the Comcast Innovation 962 Fund. The views expressed here are solely those of the authors. 964 10. References 966 10.1. Normative References 968 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 969 Requirement Levels", BCP 14, RFC 2119, 970 DOI 10.17487/RFC2119, March 1997, 971 . 973 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 974 of Explicit Congestion Notification (ECN) to IP", 975 RFC 3168, DOI 10.17487/RFC3168, September 2001, 976 . 978 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the 979 Explicit Congestion Notification (ECN) Field", BCP 124, 980 RFC 4774, DOI 10.17487/RFC4774, November 2006, 981 . 983 [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., 984 and K. Carlberg, "Explicit Congestion Notification (ECN) 985 for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 986 2012, . 988 10.2. Informative References 990 [A2DTCP] Zhang, T., Wang, J., Huang, J., Huang, Y., Chen, J., and 991 Y. Pan, "Adaptive-Acceleration Data Center TCP", IEEE 992 Transactions on Computers 64(6):1522-1533, June 2015, 993 . 996 [Ahmed19] Ahmed, A., "Extending TCP for Low Round Trip Delay", 997 Masters Thesis, Uni Oslo , August 2019, 998 . 1000 [Alizadeh-stability] 1001 Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis 1002 of DCTCP: Stability, Convergence, and Fairness", ACM 1003 SIGMETRICS 2011 , June 2011. 1005 [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An 1006 Algorithm for Increasing the Robustness of RED's Active 1007 Queue Management", ACIRI Technical Report , August 2001, 1008 .
1010 [DCttH15] De Schepper, K., Bondarenko, O., Briscoe, B., and I. 1011 Tsang, "'Data Centre to the Home': Ultra-Low Latency for 1012 All", RITE Project Technical Report , 2015, 1013 . 1015 [I-D.briscoe-tsvwg-l4s-diffserv] 1016 Briscoe, B., "Interactions between Low Latency, Low Loss, 1017 Scalable Throughput (L4S) and Differentiated Services", 1018 draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), 1019 November 2018. 1021 [I-D.ietf-avtcore-cc-feedback-message] 1022 Sarker, Z., Perkins, C., Singh, V., and M. Ramalho, "RTP 1023 Control Protocol (RTCP) Feedback for Congestion Control", 1024 draft-ietf-avtcore-cc-feedback-message-05 (work in 1025 progress), November 2019. 1027 [I-D.ietf-quic-transport] 1028 Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed 1029 and Secure Transport", draft-ietf-quic-transport-27 (work 1030 in progress), February 2020. 1032 [I-D.ietf-tcpm-accurate-ecn] 1033 Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More 1034 Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate- 1035 ecn-11 (work in progress), March 2020. 1037 [I-D.ietf-tcpm-generalized-ecn] 1038 Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit 1039 Congestion Notification (ECN) to TCP Control Packets", 1040 draft-ietf-tcpm-generalized-ecn-05 (work in progress), 1041 November 2019. 1043 [I-D.ietf-tcpm-rack] 1044 Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: 1045 a time-based fast loss detection algorithm for TCP", 1046 draft-ietf-tcpm-rack-07 (work in progress), January 2020. 1048 [I-D.ietf-tsvwg-aqm-dualq-coupled] 1049 Schepper, K., Briscoe, B., and G. White, "DualQ Coupled 1050 AQMs for Low Latency, Low Loss and Scalable Throughput 1051 (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-10 (work in 1052 progress), July 2019. 1054 [I-D.ietf-tsvwg-ecn-encap-guidelines] 1055 Briscoe, B., Kaippallimalil, J., and P. Thaler, 1056 "Guidelines for Adding Congestion Notification to 1057 Protocols that Encapsulate IP", draft-ietf-tsvwg-ecn- 1058 encap-guidelines-13 (work in progress), May 2019. 1060 [I-D.ietf-tsvwg-l4s-arch] 1061 Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low 1062 Latency, Low Loss, Scalable Throughput (L4S) Internet 1063 Service: Architecture", draft-ietf-tsvwg-l4s-arch-05 (work 1064 in progress), February 2020. 1066 [I-D.ietf-tsvwg-nqb] 1067 White, G. and T. Fossati, "A Non-Queue-Building Per-Hop 1068 Behavior (NQB PHB) for Differentiated Services", draft- 1069 ietf-tsvwg-nqb-00 (work in progress), November 2019. 1071 [I-D.sridharan-tcpm-ctcp] 1072 Sridharan, M., Tan, K., Bansal, D., and D. Thaler, 1073 "Compound TCP: A New TCP Congestion Control for High-Speed 1074 and Long Distance Networks", draft-sridharan-tcpm-ctcp-02 1075 (work in progress), November 2008. 1077 [I-D.stewart-tsvwg-sctpecn] 1078 Stewart, R., Tuexen, M., and X. Dong, "ECN for Stream 1079 Control Transmission Protocol (SCTP)", draft-stewart- 1080 tsvwg-sctpecn-05 (work in progress), January 2014. 1082 [LinuxPacedChirping] 1083 Misund, J. and B. Briscoe, "Paced Chirping - Rethinking 1084 TCP start-up", Proc. Linux Netdev 0x13 , March 2019, 1085 . 1087 [Mathis09] 1088 Mathis, M., "Relentless Congestion Control", PFLDNeT'09 , 1089 May 2009, . 1092 [Paced-Chirping] 1093 Misund, J., "Rapid Acceleration in TCP Prague", Masters 1094 Thesis , May 2018, 1095 . 1098 [PI2] De Schepper, K., Bondarenko, O., Tsang, I., and B. 1099 Briscoe, "PI^2 : A Linearized AQM for both Classic and 1100 Scalable TCP", Proc. ACM CoNEXT 2016 pp.105-119, December 1101 2016, 1102 . 
1104 [PragueLinux] 1105 Briscoe, B., De Schepper, K., Albisser, O., Misund, J., 1106 Tilmans, O., Kuehlewind, M., and A. Ahmed, "Implementing 1107 the `TCP Prague' Requirements for Low Latency Low Loss 1108 Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 , 1109 March 2019, . 1112 [QV] Briscoe, B. and P. Hurtig, "Up to Speed with Queue View", 1113 RITE Technical Report D2.3; Appendix C.2, August 2015, 1114 . 1117 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 1118 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 1119 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 1120 S., Wroclawski, J., and L. Zhang, "Recommendations on 1121 Queue Management and Congestion Avoidance in the 1122 Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, 1123 . 1125 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 1126 "Definition of the Differentiated Services Field (DS 1127 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1128 DOI 10.17487/RFC2474, December 1998, 1129 . 1131 [RFC2983] Black, D., "Differentiated Services and Tunnels", 1132 RFC 2983, DOI 10.17487/RFC2983, October 2000, 1133 . 1135 [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, 1136 J., Courtney, W., Davari, S., Firoiu, V., and D. 1137 Stiliadis, "An Expedited Forwarding PHB (Per-Hop 1138 Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, 1139 . 1141 [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit 1142 Congestion Notification (ECN) Signaling with Nonces", 1143 RFC 3540, DOI 10.17487/RFC3540, June 2003, 1144 . 1146 [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", 1147 RFC 3649, DOI 10.17487/RFC3649, December 2003, 1148 . 1150 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 1151 Congestion Control Protocol (DCCP)", RFC 4340, 1152 DOI 10.17487/RFC4340, March 2006, 1153 . 1155 [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion 1156 Control Protocol (DCCP) Congestion Control ID 2: TCP-like 1157 Congestion Control", RFC 4341, DOI 10.17487/RFC4341, March 1158 2006, . 1160 [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for 1161 Datagram Congestion Control Protocol (DCCP) Congestion 1162 Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, 1163 DOI 10.17487/RFC4342, March 2006, 1164 . 1166 [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", 1167 RFC 4960, DOI 10.17487/RFC4960, September 2007, 1168 . 1170 [RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion 1171 Control Algorithms", BCP 133, RFC 5033, 1172 DOI 10.17487/RFC5033, August 2007, 1173 . 1175 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 1176 Friendly Rate Control (TFRC): Protocol Specification", 1177 RFC 5348, DOI 10.17487/RFC5348, September 2008, 1178 . 1180 [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. 1181 Ramakrishnan, "Adding Explicit Congestion Notification 1182 (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, 1183 DOI 10.17487/RFC5562, June 2009, 1184 . 1186 [RFC5622] Floyd, S. and E. Kohler, "Profile for Datagram Congestion 1187 Control Protocol (DCCP) Congestion ID 4: TCP-Friendly Rate 1188 Control for Small Packets (TFRC-SP)", RFC 5622, 1189 DOI 10.17487/RFC5622, August 2009, 1190 . 1192 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1193 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 1194 . 
1196 [RFC5706] Harrington, D., "Guidelines for Considering Operations and 1197 Management of New Protocols and Protocol Extensions", 1198 RFC 5706, DOI 10.17487/RFC5706, November 2009, 1199 . 1201 [RFC5865] Baker, F., Polk, J., and M. Dolly, "A Differentiated 1202 Services Code Point (DSCP) for Capacity-Admitted Traffic", 1203 RFC 5865, DOI 10.17487/RFC5865, May 2010, 1204 . 1206 [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP 1207 Authentication Option", RFC 5925, DOI 10.17487/RFC5925, 1208 June 2010, . 1210 [RFC6077] Papadimitriou, D., Ed., Welzl, M., Scharf, M., and B. 1211 Briscoe, "Open Research Issues in Internet Congestion 1212 Control", RFC 6077, DOI 10.17487/RFC6077, February 2011, 1213 . 1215 [RFC6660] Briscoe, B., Moncaster, T., and M. Menth, "Encoding Three 1216 Pre-Congestion Notification (PCN) States in the IP Header 1217 Using a Single Diffserv Codepoint (DSCP)", RFC 6660, 1218 DOI 10.17487/RFC6660, July 2012, 1219 . 1221 [RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe, 1222 "Problem Statement and Requirements for Increased Accuracy 1223 in Explicit Congestion Notification (ECN) Feedback", 1224 RFC 7560, DOI 10.17487/RFC7560, August 2015, 1225 . 1227 [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF 1228 Recommendations Regarding Active Queue Management", 1229 BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, 1230 . 1232 [RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) 1233 Concepts, Abstract Mechanism, and Requirements", RFC 7713, 1234 DOI 10.17487/RFC7713, December 2015, 1235 . 1237 [RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, 1238 "Proportional Integral Controller Enhanced (PIE): A 1239 Lightweight Control Scheme to Address the Bufferbloat 1240 Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, 1241 . 1243 [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., 1244 and G. Judd, "Data Center TCP (DCTCP): TCP Congestion 1245 Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, 1246 October 2017, . 1248 [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, 1249 J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler 1250 and Active Queue Management Algorithm", RFC 8290, 1251 DOI 10.17487/RFC8290, January 2018, 1252 . 1254 [RFC8298] Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation 1255 for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December 1256 2017, . 1258 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 1259 Notification (ECN) Experimentation", RFC 8311, 1260 DOI 10.17487/RFC8311, January 2018, 1261 . 1263 [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and 1264 R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", 1265 RFC 8312, DOI 10.17487/RFC8312, February 2018, 1266 . 1268 [Savage-TCP] 1269 Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, 1270 "TCP Congestion Control with a Misbehaving Receiver", ACM 1271 SIGCOMM Computer Communication Review 29(5):71--78, 1272 October 1999. 1274 [sub-mss-prob] 1275 Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion 1276 Window for Small Round Trip Times", BT Technical Report 1277 TR-TUB8-2015-002, May 2015, 1278 . 1280 [TCP-CA] Jacobson, V. and M. Karels, "Congestion Avoidance and 1281 Control", Laurence Berkeley Labs Technical Report , 1282 November 1988, . 1284 [TCPPrague] 1285 Briscoe, B., "Notes: DCTCP evolution 'bar BoF': Tue 21 Jul 1286 2015, 17:40, Prague", tcpprague mailing list archive , 1287 July 2015, . 
1290 [VCP] Xia, Y., Subramanian, L., Stoica, I., and S. Kalyanaraman, 1291 "One more bit is enough", Proc. SIGCOMM'05, ACM CCR 1292 35(4)37--48, 2005, 1293 .

1295 Appendix A. The 'Prague L4S Requirements'

1297 This appendix is informative, not normative. It gives a list of 1298 modifications to current scalable congestion controls so that they 1299 can be deployed over the public Internet and coexist safely with 1300 existing traffic. The list complements the normative requirements in 1301 Section 4 that a sender has to comply with before it can set the L4S 1302 identifier in packets it sends into the Internet. As well as 1303 necessary safety improvements (requirements), this appendix also 1304 includes preferable performance improvements (optimizations).

1306 These recommendations have become known as the Prague L4S 1307 Requirements, because they were originally identified at an ad hoc 1308 meeting during IETF-94 in Prague [TCPPrague]. The wording has been 1309 generalized to apply to all scalable congestion controls, not just 1310 TCP congestion control specifically. They were originally called the 1311 'TCP Prague Requirements', but they are not solely applicable to TCP, 1312 so the name has been generalized, and TCP Prague is now used for a 1313 specific implementation of the requirements.

1315 At the time of writing, DCTCP [RFC8257] is the most widely used 1316 scalable transport protocol. In its current form, DCTCP is specified 1317 to be deployable only in controlled environments. Deploying it in 1318 the public Internet would lead to a number of issues, both from the 1319 safety and the performance perspective. The modifications and 1320 additional mechanisms listed in this section will be necessary for 1321 its deployment over the global Internet. Where an example is needed, 1322 DCTCP is used as a base, but it is likely that most of these 1323 requirements equally apply to other scalable congestion controls.

1325 A.1. Requirements for Scalable Transport Protocols

1327 A.1.1. Use of L4S Packet Identifier

1329 Description: A scalable congestion control needs to distinguish the 1330 packets it sends from those sent by Classic congestion controls.

1332 Motivation: It needs to be possible for a network node to classify 1333 L4S packets without flow state into a queue that applies an L4S ECN 1334 marking behaviour and isolates L4S packets from the queuing delay of 1335 Classic packets.

1337 A.1.2. Accurate ECN Feedback

1339 Description: The transport protocol for a scalable congestion control 1340 needs to provide timely, accurate feedback about the extent of ECN 1341 marking experienced by all packets.

1343 Motivation: Classic congestion controls only need feedback about the 1344 existence of a congestion episode within a round trip, not precisely 1345 how many packets were marked with ECN or dropped. Therefore, in 1346 2001, when ECN feedback was added to TCP [RFC3168], it could not 1347 inform the sender of more than one ECN mark per RTT. Since then, 1348 requirements for more accurate ECN feedback in TCP have been defined 1349 in [RFC7560], and [I-D.ietf-tcpm-accurate-ecn] specifies an 1350 experimental change to the TCP wire protocol to satisfy these 1351 requirements. Most other transport protocols already satisfy this 1352 requirement.
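To illustrate the principle of such feedback (though not the AccECN wire encoding, which uses limited-width, wrapping counters), the following sketch shows a receiver echoing a running count of CE-marked packets and a sender recovering the number of newly marked packets from each ACK. It is purely illustrative and the names are invented for this example.

   # Illustrative sketch of counter-based ECN feedback (not the AccECN
   # wire format).  The receiver keeps a running count of CE-marked
   # packets and echoes it with every ACK; the sender differences
   # successive echoes to learn exactly how many packets were marked.

   CE = 0b11                              # IP-ECN codepoint for CE

   class Receiver:
       def __init__(self):
           self.ce_count = 0

       def on_packet(self, ip_ecn):
           if ip_ecn == CE:
               self.ce_count += 1
           return self.ce_count           # value echoed in the ACK

   class Sender:
       def __init__(self):
           self.last_echo = 0

       def on_ack(self, echoed_ce_count):
           newly_marked = echoed_ce_count - self.last_echo
           self.last_echo = echoed_ce_count
           return newly_marked            # drives a per-mark response

A real protocol also has to cope with counter wrap and with ACK loss, which is part of what the AccECN design addresses.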
1354 A.1.3. Fall back to Reno-friendly congestion control on packet loss

1356 Description: As well as responding to ECN markings in a scalable way, 1357 a scalable congestion control needs to react to packet loss in a way 1358 that will coexist safely with a TCP Reno congestion control 1359 [RFC5681].

1361 Motivation: Part of the safety conditions for deploying a scalable 1362 congestion control on the public Internet is to make sure that it 1363 behaves properly when it builds a queue at a network bottleneck that 1364 has not been upgraded to support L4S. Packet loss can have many 1365 causes, but it usually has to be conservatively assumed that it is a 1366 sign of congestion. Therefore, on detecting packet loss, a scalable 1367 congestion control will need to fall back to Classic congestion 1368 control behaviour. If it does not comply with this requirement, it 1369 could starve Classic traffic.

1371 A scalable congestion control can be used for different types of 1372 transport, e.g. for real-time media or for reliable transport like 1373 TCP. Therefore, the particular Classic congestion control behaviour 1374 to fall back on will need to be part of the congestion control 1375 specification of the relevant transport. In the particular case of 1376 DCTCP, the DCTCP specification [RFC8257] states that "It is 1377 RECOMMENDED that an implementation deal with loss episodes in the 1378 same way as conventional TCP." For safe deployment of a scalable 1379 congestion control in the public Internet, the above requirement 1380 would need to be defined as a "MUST".

1382 Even though a bottleneck is L4S capable, it might still become 1383 overloaded and have to drop packets. In this case, the sender may 1384 receive a high proportion of packets marked with the CE bit set and 1385 also experience loss. Current DCTCP implementations react 1386 differently to this situation. At least one implementation reacts 1387 only to the drop signal (e.g. by halving the CWND), and at least 1388 one other reacts to both signals (e.g. by halving 1389 the CWND due to the drop and also further reducing the CWND based on 1390 the proportion of marked packets). A third approach for the public 1391 Internet has been proposed that adjusts the loss response to result 1392 in a halving when combined with the ECN response. We believe that 1393 further experimentation is needed to understand what is the best 1394 behaviour for the public Internet, which may or may not be one of these 1395 existing approaches.

1397 A.1.4. Fall back to Reno-friendly congestion control on classic ECN 1398 bottlenecks

1400 Description: A scalable congestion control needs to react to ECN 1401 marking from a non-L4S, but ECN-capable, bottleneck in a way that 1402 will coexist with a TCP Reno congestion control [RFC5681].

1404 Motivation: Similarly to the requirement in Appendix A.1.3, this 1405 requirement is a safety condition to ensure a scalable congestion 1406 control behaves properly when it builds a queue at a network 1407 bottleneck that has not been upgraded to support L4S. On detecting 1408 Classic ECN marking (see below), a scalable congestion control will 1409 need to fall back to Classic congestion control behaviour. If it 1410 does not comply with this requirement, it could starve Classic 1411 traffic.

1413 It would take time for endpoints to distinguish Classic and L4S ECN 1414 marking. An increase in queuing delay or in delay variation would be 1415 a tell-tale sign, but it is not yet clear where a line would be drawn 1416 between the two behaviours. It might be possible to cache what was 1417 learned about the path to help subsequent attempts to detect the type 1418 of marking.
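The shape of the two fallbacks described in Appendix A.1.3 and in this appendix can be sketched as follows. This is purely illustrative: the constants are placeholders rather than recommendations, and the test for suspecting a Classic ECN bottleneck is left abstract because, as noted above, no robust detection method has yet been defined.

   # Illustrative sketch of a per-round-trip window update for a
   # scalable sender that falls back to a Classic (Reno-like) response
   # on loss or on suspected Classic ECN marking.  'marked_fraction'
   # is a smoothed estimate of the fraction of packets carrying CE.
   # All constants are placeholders, not recommendations.

   MIN_CWND = 2.0                          # segments

   def update_cwnd(cwnd, marked_fraction, loss_detected,
                   classic_ecn_suspected):
       if loss_detected or classic_ecn_suspected:
           # Classic fallback: Reno-style multiplicative decrease
           return max(cwnd * 0.5, MIN_CWND)
       if marked_fraction > 0:
           # Scalable response: reduce in proportion to the extent of
           # marking (DCTCP-like)
           return max(cwnd * (1 - marked_fraction / 2), MIN_CWND)
       return cwnd + 1                     # otherwise, additive increase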
1420 A.1.5. Reduce RTT dependence

1422 Description: A scalable congestion control needs to reduce or 1423 eliminate RTT bias over as wide a range of RTTs as possible, or at 1424 least over the typical range of RTTs that will interact in the 1425 intended deployment scenario.

1427 Motivation: The throughput of Classic congestion controls is known to 1428 be inversely proportional to RTT, so one would expect flows over very 1429 low RTT paths to nearly starve flows over larger RTTs. However, 1430 Classic congestion controls have never allowed a very low RTT to 1431 persist, because they induce a large queue. For instance, consider two 1432 paths with base RTT 1ms and 100ms. If a Classic congestion control 1433 induces a 100ms queue, it turns these RTTs into 101ms and 200ms, 1434 leading to a throughput ratio of about 2:1. Whereas, if a scalable 1435 congestion control induces only a 1ms queue, the RTT ratio is 2:101, 1436 leading to a throughput ratio of about 50:1.

1438 Therefore, with very small queues, long RTT flows will essentially 1439 starve, unless scalable congestion controls comply with this 1440 requirement.

1442 A.1.6. Scaling down to fractional congestion windows

1444 Description: A scalable congestion control needs to remain responsive 1445 to congestion when typical RTTs over the public Internet are 1446 significantly smaller because they are no longer inflated by queuing 1447 delay.

1449 Motivation: As currently specified, the minimum required congestion 1450 window of TCP (and its derivatives) is set to 2 sender maximum 1451 segment sizes (SMSS) (see equation (4) in [RFC5681]). Once the 1452 congestion window reaches this minimum, all known window-based 1453 congestion control algorithms become unresponsive to congestion 1454 signals. No matter how much drop or ECN marking there is, the congestion 1455 window of all these algorithms no longer reduces. Instead, the 1456 sender's lack of any further congestion response forces the queue to 1457 grow, overriding any AQM and increasing queuing delay.

1459 L4S mechanisms significantly reduce queuing delay so, over the same 1460 path, the RTT becomes lower. Then this problem becomes surprisingly 1461 common [sub-mss-prob]. This is because, for the same link capacity, 1462 smaller RTT implies a smaller window. For instance, consider a 1463 residential setting with an upstream broadband Internet access of 8 1464 Mb/s, assuming a max segment size of 1500 B. Two upstream flows will 1465 each have the minimum window of 2 SMSS if the RTT is 6ms or less, 1466 which is quite common when accessing a nearby data centre. So, any 1467 more than two such parallel TCP flows will become unresponsive and 1468 increase queuing delay (the arithmetic behind this example is illustrated in the sketch below).

1470 Unless scalable congestion controls address this requirement from the 1471 start, they will frequently become unresponsive, negating the low 1472 latency benefit of L4S, for themselves and for others.

1474 That would seem to imply that scalable congestion controllers ought 1475 to be required to be able to work with a congestion window less than 2 1476 SMSS. For instance, one possible mechanism that can maintain a 1477 congestion window significantly less than 1 SMSS is described in 1478 [Ahmed19], and other approaches are likely to be feasible.
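The following sketch works through the arithmetic of the residential example above; the figures are the ones used in the text and the function is invented purely for illustration.

   # Worked illustration: the congestion window (in segments) that each
   # of n_flows needs in order to just fill a link is
   #     window = link_rate * RTT / (n_flows * MSS)
   # With 8 Mb/s, 1500 B segments and a 6 ms RTT, two flows need only
   # 2 segments each, i.e. they are already at TCP's minimum window,
   # so any further marking (or any further flows) cannot reduce the
   # load.

   def window_per_flow(link_rate_bps, rtt_secs, n_flows, mss_bytes=1500):
       bdp_bytes = link_rate_bps * rtt_secs / 8.0
       return bdp_bytes / (n_flows * mss_bytes)

   print(window_per_flow(8e6, 0.006, 2))   # -> 2.0 segments per flow
   print(window_per_flow(8e6, 0.006, 4))   # -> 1.0, below the 2 SMSS floor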
1480 However, the requirement in Section 4.3 is worded as a "SHOULD" 1481 because the existence of a minimum window is not all bad. When 1482 competing with an unresponsive flow, a minimum window naturally 1483 protects the flow from starvation by at least keeping some data 1484 flowing.

1486 By stating this requirement as a "SHOULD", specifications of scalable 1487 congestion controllers will be able to choose an appropriate minimum 1488 window, but they will at least have to justify the decision.

1490 A.1.7. Measuring Reordering Tolerance in Time Units

1492 Description: A scalable congestion control needs to detect loss by 1493 counting in time-based units, which is scalable, rather than counting 1494 in units of packets, which is not.

1496 Motivation: A primary purpose of L4S is scalable throughput (it's in 1497 the name). Scalability in all dimensions is, of course, also a goal 1498 of all IETF technology. The inverse linear congestion response in 1499 Section 4.3 is necessary, but not sufficient, to solve the congestion 1500 control scalability problem identified in [RFC3649]. As well as 1501 maintaining frequent ECN signals as rate scales, it is also important 1502 to ensure that a potentially false perception of loss does not limit 1503 throughput scaling.

1505 End-systems cannot know whether a missing packet is due to loss or 1506 reordering, except in hindsight - if it appears later. So they can 1507 only deem that there has been a loss if a gap in the sequence space 1508 has not been filled, either after a certain number of subsequent 1509 packets have arrived (e.g. the 3 DupACK rule of standard TCP 1510 congestion control [RFC5681]) or after a certain amount of time (e.g. 1511 the experimental RACK approach [I-D.ietf-tcpm-rack]).

1513 As we attempt to scale packet rate over the years:

1515 o Even if only _some_ sending hosts still deem that loss has 1516 occurred by counting reordered packets, _all_ networks will have 1517 to keep reducing the time over which they keep packets in order. 1518 If some link technologies keep the time within which reordering 1519 occurs roughly unchanged, then loss over these links, as perceived 1520 by these hosts, will appear to continually rise over the years.

1522 o In contrast, if all senders detect loss in units of time, the time 1523 over which the network has to keep packets in order stays roughly 1524 invariant.

1526 Therefore, hosts have an incentive to detect loss in time units (so as 1527 not to fool themselves too often into detecting losses when there are 1528 none). And for hosts that are changing their congestion control 1529 implementation to L4S, there is no downside to including time-based 1530 loss detection code in the change (loss recovery implemented in 1531 hardware is an exception, covered later). Therefore, requiring L4S 1532 hosts to detect loss in time-based units would not be a burden.

1534 If this requirement is not placed on L4S hosts, even though it would 1535 be no burden on them to do so, all networks will face unnecessary 1536 uncertainty over whether some L4S hosts might be detecting loss by 1537 counting packets. Then _all_ link technologies will have to 1538 unnecessarily keep reducing the time within which reordering occurs.
1539 That is not a problem for some link technologies, but it becomes 1540 increasingly challenging for other link technologies to continue to 1541 scale, particularly those relying on channel bonding for scaling, 1542 such as LTE, 5G and DOCSIS.

1544 Given that Internet paths traverse many link technologies, any scaling 1545 limit for these more challenging access link technologies would 1546 become a scaling limit for the Internet as a whole.

1548 It might be asked how it helps to place this loss detection 1549 requirement only on L4S hosts, because networks will still face 1550 uncertainty over whether non-L4S flows are detecting loss by counting 1551 DupACKs. The answer is that those link technologies for which it is 1552 challenging to keep squeezing the reordering time will only need to 1553 do so for non-L4S traffic (which they can do because the L4S 1554 identifier is visible at the IP layer). Therefore, they can focus 1555 their processing and memory resources on scaling non-L4S (Classic) 1556 traffic. Then, the higher the proportion of L4S traffic, the less of 1557 a scaling challenge they will have.

1559 To summarize, there is no reason for L4S hosts not to be part of the 1560 solution instead of part of the problem.

1562 Requirement ("MUST") or recommendation ("SHOULD")? As explained 1563 above, this is a subtle interoperability issue between hosts and 1564 networks, which seems to need a "MUST". Unless networks can be 1565 certain that all L4S hosts follow the time-based approach, they still 1566 have to cater for the worst case - continually squeeze reordering 1567 into a smaller and smaller duration - just for hosts that might be 1568 using the counting approach. However, it was decided to express this 1569 as a recommendation, using "SHOULD". The main justification was that 1570 networks can still be fairly certain that L4S hosts will follow this 1571 recommendation, because following it offers only gain and no pain.

1573 Details:

1575 The speed of loss recovery is much more significant for short flows 1576 than for long ones, so a good compromise is to adapt the reordering 1577 window: from a small fraction of the RTT at the start of a flow to a 1578 larger fraction of the RTT for flows that continue for many round 1579 trips.

1581 This is broadly the approach adopted by TCP RACK (Recent 1582 ACKnowledgements) [I-D.ietf-tcpm-rack]. However, RACK starts with 1583 the 3 DupACK approach, because the RTT estimate is not necessarily 1584 stable. As long as the initial window is paced, such initial use of 1585 3 DupACK counting would amount to time-based loss detection and 1586 therefore would satisfy the time-based loss detection recommendation 1587 of Section 4.3. This is because pacing of the initial window would 1588 ensure that 3 DupACKs early in the connection would be spread over a 1589 small fraction of the round trip.

1591 As mentioned above, hardware implementations of loss recovery using 1592 DupACK counting exist (e.g. some implementations of RoCEv2 for RDMA). 1593 For low latency, these implementations can change their congestion 1594 control to implement L4S, because the congestion control (as distinct 1595 from loss recovery) is implemented in software. But they cannot 1596 easily satisfy this loss recovery requirement. However, it is 1597 believed that they do not need to, because such 1598 implementations solely exist in controlled environments, where the 1599 network technology keeps reordering extremely low anyway. This is 1600 why the scope of the normative recommendation in Section 4.3 is 1601 limited to 'reordering-prone' networks.
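The difference between counting-based and time-based loss deeming can be sketched as follows. This is not the RACK algorithm itself (which keeps per-packet state and adapts its reordering window dynamically); it only illustrates the principle that a missing packet is deemed lost once a packet sent sufficiently later has been delivered, where 'sufficiently later' is measured in time rather than in packets. The names and the window fraction are invented for this illustration.

   # Illustrative sketch of time-based loss deeming (not the RACK
   # algorithm): a missing packet is deemed lost once a packet that was
   # sent more than a reordering window later has already been
   # delivered.  The reordering window is a fraction of the smoothed
   # RTT, so it scales with the path rather than with the packet rate.

   def deemed_lost(send_time_of_missing, latest_delivered_send_time,
                   smoothed_rtt, reo_window_fraction=0.25):
       reordering_window = reo_window_fraction * smoothed_rtt
       return (latest_delivered_send_time - send_time_of_missing
               > reordering_window)

   # With DupACK counting, by contrast, the equivalent test would be
   # 'have 3 later packets been delivered?', which represents an ever
   # smaller time as packet rates scale up.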
1603 Detecting loss in time units also prevents the ACK-splitting attacks 1604 described in [Savage-TCP].

1606 A.2. Scalable Transport Protocol Optimizations

1608 A.2.1. Setting ECT in TCP Control Packets and Retransmissions

1610 Description: This item only concerns TCP and its derivatives (e.g. 1611 SCTP), because the original specification of ECN for TCP precluded 1612 the use of ECN on control packets and retransmissions. To improve 1613 performance, scalable transport protocols ought to enable ECN at the 1614 IP layer in TCP control packets (SYN, SYN-ACK, pure ACKs, etc.) and 1615 in retransmitted packets. The same is true for derivatives of TCP, 1616 e.g. SCTP.

1618 Motivation: RFC 3168 prohibits the use of ECN on these types of TCP 1619 packet, based on a number of arguments. This means these packets are 1620 not protected from congestion loss by ECN, which considerably harms 1621 performance, particularly for short flows. 1622 [I-D.ietf-tcpm-generalized-ecn] counters each argument in RFC 3168 in 1623 turn, showing it was over-cautious. Instead, it proposes experimental 1624 use of ECN on all types of TCP packet as long as AccECN feedback 1625 [I-D.ietf-tcpm-accurate-ecn] is available (which is itself a 1626 prerequisite for using a scalable congestion control).

1628 A.2.2. Faster than Additive Increase

1630 Description: It would improve performance if scalable congestion 1631 controls did not limit their congestion window increase to the 1632 standard additive increase of 1 SMSS per round trip [RFC5681] during 1633 congestion avoidance. The same is true for derivatives of TCP 1634 congestion control, including similar approaches used for real-time 1635 media.

1637 Motivation: As currently defined [RFC8257], DCTCP uses the 1638 traditional TCP Reno additive increase in the congestion avoidance phase. 1639 When the available capacity suddenly increases (e.g. when another 1640 flow finishes, or if radio capacity increases), it can take very many 1641 round trips to take advantage of the new capacity. TCP Cubic was 1642 designed to solve this problem, but as flow rates have continued to 1643 increase, the delay in accelerating into available capacity has become 1644 prohibitive. See, for instance, the examples in Section 1.2. Even 1645 when out of its Reno-compatibility mode, every 8x scaling of Cubic's 1646 flow rate leads to 2x more acceleration delay.

1648 In the steady state, DCTCP induces about 2 ECN marks per round trip, 1649 so it is possible to quickly detect when these signals have 1650 disappeared and seek available capacity more rapidly, while 1651 minimizing the impact on other flows (Classic and scalable) 1652 [LinuxPacedChirping]. Alternatively, approaches such as Adaptive 1653 Acceleration (A2DTCP [A2DTCP]) have been proposed to address this 1654 problem in data centres, which might be deployable over the public 1655 Internet.

1657 A.2.3. Faster Convergence at Flow Start

1659 Description: Particularly when a flow starts, scalable congestion 1660 controls need to converge (reach their steady-state share of the 1661 capacity) at least as fast as Classic congestion controls and 1662 preferably faster. This affects the flow start behaviour of any L4S 1663 congestion control derived from a Classic transport that uses TCP 1664 slow start, including those for real-time media.
1666 Motivation: As an example, a new DCTCP flow takes longer than a 1667 Classic congestion control to obtain its share of the capacity of the 1668 bottleneck when there are already ongoing flows using the bottleneck 1669 capacity. In a data centre environment, DCTCP takes about a factor of 1670 1.5 to 2 longer to converge due to the much higher typical level of 1671 ECN marking that DCTCP background traffic induces, which causes new 1672 flows to exit slow start early [Alizadeh-stability]. In testing for 1673 use over the public Internet, the convergence time of DCTCP relative 1674 to a regular loss-based TCP slow start is even less favourable 1675 [Paced-Chirping] due to the shallow ECN marking threshold needed for 1676 L4S. It is exacerbated by the typically greater mismatch between the 1677 link rate of the sending host and typical Internet access 1678 bottlenecks. This problem is detrimental in general, but would 1679 particularly harm the performance of short flows relative to Classic 1680 congestion controls.

1682 Appendix B. Alternative Identifiers

1684 This appendix is informative, not normative. It records the pros and 1685 cons of various alternative ways to identify L4S packets, to explain 1686 the rationale for the choice of ECT(1) (Appendix B.1) as the L4S 1687 identifier. At the end, Appendix B.6 summarises the distinguishing 1688 features of the leading alternatives. It is intended to supplement, 1689 not replace, the detailed text.

1691 The leading solutions all use the ECN field, sometimes in combination 1692 with the Diffserv field. This is because L4S traffic has to indicate 1693 that it is ECN-capable anyway, because ECN is intrinsic to how L4S 1694 works. Both the ECN and Diffserv fields have the additional 1695 advantage that they are no different in either IPv4 or IPv6. A 1696 couple of alternatives that use other fields are mentioned at the 1697 end, but it is quickly explained why they are not serious contenders.

1699 B.1. ECT(1) and CE codepoints

1701 Definition:

1703 Packets with ECT(1) and conditionally packets with CE would 1704 signify L4S semantics as an alternative to the semantics of 1705 Classic ECN [RFC3168], specifically:

1707 * The ECT(1) codepoint would signify that the packet was sent by 1708 an L4S-capable sender.

1710 * Given the shortage of codepoints, both L4S and Classic ECN sides of 1711 an AQM would have to use the same CE codepoint to indicate that 1712 a packet had experienced congestion. If a packet that had 1713 already been marked CE in an upstream buffer arrived at a 1714 subsequent AQM, this AQM would then have to guess whether to 1715 classify CE packets as L4S or Classic ECN. Choosing the L4S 1716 treatment would be a safer choice, because then a few Classic 1717 packets might arrive early, rather than a few L4S packets 1718 arriving late.

1720 * Additional information might be available if the classifier 1721 were transport-aware. Then it could classify a CE packet for 1722 Classic ECN treatment if the most recent ECT packet in the same 1723 flow had been marked ECT(0). However, the L4S service ought 1724 not to need transport-layer awareness.

1726 Cons:

1728 Consumes the last ECN codepoint: The L4S service is intended to 1729 supersede the service provided by Classic ECN; therefore, using 1730 ECT(1) to identify L4S packets could ultimately mean that the 1731 ECT(0) codepoint was 'wasted' purely to distinguish one form of 1732 ECN from its successor.
1734 ECN hard in some lower layers: It is not always possible to support 1735 ECN in an AQM acting in a buffer below the IP layer 1736 [I-D.ietf-tsvwg-ecn-encap-guidelines]. In such cases, the L4S 1737 service would have to drop rather than mark frames even though 1738 they might encapsulate an ECN-capable packet. However, such cases 1739 would be unusual.

1741 Risk of reordering Classic CE packets: Classifying all CE packets 1742 into the L4S queue risks any CE packets that were originally 1743 ECT(0) being incorrectly classified as L4S. If there were delay 1744 in the Classic queue, these incorrectly classified CE packets 1745 would arrive early, which is a form of reordering. Reordering can 1746 cause TCP senders (and senders of similar transports) to 1747 retransmit spuriously. However, the risk of spurious 1748 retransmissions would be extremely low for the following reasons:

1750 1. It is quite unusual to experience queuing at more than one 1751 bottleneck on the same path (the available capacities have to 1752 be identical).

1754 2. In only a subset of these unusual cases would the first 1755 bottleneck support Classic ECN marking while the second 1756 supported L4S ECN marking. That would be the only scenario 1757 where some ECT(0) packets could be CE-marked by an AQM 1758 supporting Classic ECN and then the remainder could experience 1759 further delay through the Classic side of a subsequent L4S DualQ AQM.

1761 3. Even then, when a few packets are delivered early, it takes 1762 very unusual conditions to cause a spurious retransmission, in 1763 contrast to when some packets are delivered late. The first 1764 bottleneck has to apply CE-marks to at least N contiguous 1765 packets and the second bottleneck has to inject an 1766 uninterrupted sequence of at least N of these packets between 1767 two packets earlier in the stream (where N is the reordering 1768 window that the transport protocol allows before it considers 1769 a packet is lost).

1771 For example, consider N=3, and consider the sequence of 1772 packets 100, 101, 102, 103,... and imagine that packets 1773 150,151,152 from later in the flow are injected as follows: 1774 100, 150, 151, 101, 152, 102, 103... If this were late 1775 reordering, even one packet arriving 50 packets out of sequence 1776 would trigger a spurious retransmission, but there is no 1777 spurious retransmission here, with early reordering, 1778 because packet 101 moves the cumulative ACK counter forward 1779 before 3 packets have arrived out of order. Later, when 1780 packets 148, 149, 153... arrive, even though there is a 1781 3-packet hole, there will be no problem, because the 1782 packets to fill the hole are already in the receive buffer.

1784 4. Even with the current TCP recommendation of N=3 [RFC5681], 1785 spurious retransmissions will be unlikely for all the above 1786 reasons. As RACK [I-D.ietf-tcpm-rack] is becoming widely 1787 deployed, it tends to adapt its reordering window to a larger 1788 value of N, which will make the chance of a contiguous 1789 sequence of N early arrivals vanishingly small.

1791 5. Even a run of 2 CE marks within a Classic ECN flow is 1792 unlikely, given that FQ-CoDel is the only known widely deployed AQM 1793 that supports Classic ECN marking, and it takes great care to 1794 separate out flows and to space any markings evenly along each 1795 flow.

1797 It is extremely unlikely that the above set of 5 eventualities 1798 that are each unusual in themselves would all happen 1799 simultaneously.
But, even if they did, the consequences would 1800 hardly be dire: the odd spurious fast retransmission. Admittedly, 1801 TCP (and similar transports) reduce their congestion window when 1802 they deem there has been a loss, but even this can be recovered 1803 once the sender detects that the retransmission was spurious.

1805 Non-L4S service for control packets: The Classic ECN RFCs [RFC3168] 1806 and [RFC5562] require a sender to clear the ECN field to Not-ECT 1807 for retransmissions and certain control packets, specifically pure 1808 ACKs, window probes and SYNs. When L4S packets are classified by 1809 the ECN field alone, these control packets would not be classified 1810 into an L4S queue, and could therefore be delayed relative to the 1811 other packets in the flow. This would not cause re-ordering 1812 (because retransmissions are already out of order, and the control 1813 packets carry no data). However, it would make critical control 1814 packets more vulnerable to loss and delay. To address this 1815 problem, [I-D.ietf-tcpm-generalized-ecn] proposes an experiment in 1816 which all TCP control packets and retransmissions are ECN-capable 1817 as long as ECN feedback is available.

1819 Pros:

1821 Should work e2e: The ECN field generally works end-to-end across the 1822 Internet. Unlike the DSCP, the setting of the ECN field is at 1823 least forwarded unchanged by networks that do not support ECN, and 1824 networks rarely clear it to zero.

1826 Should work in tunnels: Unlike Diffserv, ECN is defined to always 1827 work across tunnels. However, tunnels do not always implement ECN 1828 processing as they should do, particularly because IPsec tunnels 1829 were defined differently for a few years.

1831 Could migrate to one codepoint: If all Classic ECN senders 1832 eventually evolve to use the L4S service, the ECT(0) codepoint 1833 could be reused for some future purpose, but only once use of 1834 ECT(0) packets had reduced to zero, or near-zero, which might 1835 never happen.

1837 B.2. ECN Plus a Diffserv Codepoint (DSCP)

1839 Definition:

1841 For packets with a defined DSCP, all codepoints of the ECN field 1842 (except Not-ECT) would signify alternative L4S semantics to those 1843 for Classic ECN [RFC3168], specifically:

1845 * The L4S DSCP would signify that the packet came from an L4S- 1846 capable sender.

1848 * ECT(0) and ECT(1) would both signify that the packet was 1849 travelling between transport endpoints that were both ECN- 1850 capable.

1852 * CE would signify that the packet had been marked by an AQM 1853 implementing the L4S service.

1855 Use of a DSCP is the only approach for alternative ECN semantics 1856 given as an example in [RFC4774]. However, it was perhaps considered 1857 more for controlled environments than for new end-to-end services.

1859 Cons:

1861 Consumes DSCP pairs: A DSCP is obviously not orthogonal to Diffserv. 1862 Therefore, wherever the L4S service is applied to multiple 1863 Diffserv scheduling behaviours, it would be necessary to replace 1864 each DSCP with a pair of DSCPs.

1866 Uses critical lower-layer header space: The resulting increased 1867 number of DSCPs might be hard to support for some lower layer 1868 technologies, e.g. 802.1p and MPLS both offer only 3 bits for a 1869 maximum of 8 traffic class identifiers.
Although L4S should 1870 reduce and possibly remove the need for some DSCPs intended for 1871 differentiated queuing delay, it will not remove the need for 1872 Diffserv entirely, because Diffserv is also used to allocate 1873 bandwidth, e.g. by prioritising some classes of traffic over 1874 others when traffic exceeds available capacity.

1876 Not end-to-end (host-network): Very few networks honour a DSCP set 1877 by a host. Typically a network will zero (bleach) the Diffserv 1878 field from all hosts. Sometimes networks will attempt to identify 1879 applications by some form of packet inspection and, based on 1880 network policy, they will set the DSCP considered appropriate for 1881 the identified application. Network-based application 1882 identification might use some combination of protocol ID, port 1883 number(s), application layer protocol headers, IP address(es), 1884 VLAN ID(s) and even packet timing.

1886 Not end-to-end (network-network): Very few networks honour a DSCP 1887 received from a neighbouring network. Typically a network will 1888 zero (bleach) the Diffserv field from all neighbouring networks at 1889 an interconnection point. Sometimes bilateral arrangements are 1890 made between networks, such that the receiving network remarks 1891 some DSCPs to those it uses for roughly equivalent services. The 1892 likelihood that a DSCP will be bleached or ignored depends on the 1893 type of DSCP:

1895 Local-use DSCP: These tend to be used to implement application- 1896 specific network policies, but a bilateral arrangement to 1897 remark certain DSCPs is often applied to DSCPs in the local-use 1898 range simply because it is easier not to change all of a 1899 network's internal configurations when a new arrangement is 1900 made with a neighbour.

1902 Recommended standard DSCP: These do not tend to be honoured 1903 across network interconnections any more than local-use DSCPs are. 1904 However, if two networks decide to honour certain of each 1905 other's DSCPs, the reconfiguration is a little easier if both 1906 of their globally recognised services are already represented 1907 by the relevant recommended standard DSCPs.

1909 Note that today a recommended standard DSCP gives little more 1910 assurance of end-to-end service than a local-use DSCP. In 1911 future the range recommended as standard might give more 1912 assurance of end-to-end service than local-use, but it is 1913 unlikely that either assurance will be high, particularly given 1914 that the hosts are included in the end-to-end path.

1916 Not all tunnels: Diffserv codepoints are often not propagated to the 1917 outer header when a packet is encapsulated by a tunnel header. 1918 DSCPs are propagated to the outer header of uniform mode tunnels, but not 1919 of pipe mode tunnels [RFC2983], and pipe mode is fairly common.

1921 ECN hard in some lower layers: Because this approach uses both the 1922 Diffserv and ECN fields, an AQM will only work at a lower layer if 1923 both can be supported. If individual network operators wished to 1924 deploy an AQM at a lower layer, they would usually propagate an IP 1925 Diffserv codepoint to the lower layer, using, for example, IEEE 1926 802.1p. However, the ECN capability is harder to propagate down 1927 to lower layers because few lower layers support it.
1929 Pros:

1931 Could migrate to e2e: If all usage of Classic ECN migrates to usage 1932 of L4S, the DSCP would become redundant, and the ECN capability 1933 alone could eventually identify L4S packets without the 1934 interconnection problems of Diffserv detailed above, and without 1935 having permanently consumed more than one codepoint in the IP 1936 header. Although the DSCP does not generally function as an end- 1937 to-end identifier (see above), it could be used initially by 1938 individual ISPs to introduce the L4S service for their own locally 1939 generated traffic.

1941 B.3. ECN capability alone

1943 This approach uses ECN capability alone as the L4S identifier. It 1944 would only have been feasible if RFC 3168 ECN had not been widely 1945 deployed. This was the case when the choice of L4S identifier was 1946 being made and this appendix was first written. Since then, RFC 3168 1947 ECN has been widely deployed and L4S did not take this approach 1948 anyway. So this approach is not discussed further, because it is no 1949 longer a feasible option.

1951 B.4. Protocol ID

1953 It has been suggested that a new ID in the IPv4 Protocol field or the 1954 IPv6 Next Header field could identify L4S packets. However, this 1955 approach is ruled out by numerous problems:

1957 o A new protocol ID would need to be paired with the old one for 1958 each transport (TCP, SCTP, UDP, etc.).

1960 o In IPv6, there can be a sequence of Next Header fields, and it 1961 would not be obvious which one would be expected to identify a 1962 network service like L4S.

1964 o A new protocol ID would rarely provide an end-to-end service, 1965 because it is well known that new protocol IDs are often blocked 1966 by numerous types of middlebox.

1968 o The approach is not a solution for AQM methods below the IP layer.

1970 B.5. Source or destination addressing

1972 Locally, a network operator could arrange for L4S service to be 1973 applied based on source or destination addressing, e.g. packets from 1974 its own data centre and/or CDN hosts, packets to its business 1975 customers, etc. It could use addressing at any layer, e.g. IP 1976 addresses, MAC addresses, VLAN IDs, etc. Although addressing might 1977 be a useful tactical approach for a single ISP, it would not be a 1978 feasible approach to identify an end-to-end service like L4S. Even 1979 for a single ISP, it would require packet classifiers in buffers to 1980 be dependent on changing topology and address allocation decisions 1981 elsewhere in the network. Therefore, this approach is not a feasible 1982 solution.

1984 B.6. Summary: Merits of Alternative Identifiers

1986 Table 1 provides a very high level summary of the pros and cons 1987 detailed against the schemes described respectively in Appendix B.2 1988 and Appendix B.1, for six issues that set them apart.

1990 +--------------+--------------------+--------------------+
1991 | Issue        | DSCP + ECN         | ECT(1) + CE        |
1992 +--------------+--------------------+--------------------+
1993 |              | initial  eventual  | initial  eventual  |
1994 |              |                    |                    |
1995 | end-to-end   | N  .  .  .  ?  .   | .  .  Y  .  .  Y   |
1996 | tunnels      | .  O  .  .  O  .   | .  .  ?  .  .  Y   |
1997 | lower layers | N  .  .  .  ?  .   | .  O  .  .  .  ?   |
1998 | codepoints   | N  .  .  .  .  ?   | N  .  .  .  .  ?   |
1999 | reordering   | .  .  Y  .  .  Y   | .  O  .  .  .  ?   |
2000 | ctrl pkts    | .  .  Y  .  .  Y   | .  O  .  .  .  ?   |
2001 |              |                    |                    |
2002 |              |                    |                    |
2003 +--------------+--------------------+--------------------+

2005 Table 1: Comparison of the Merits of Two Alternative Identifiers

2007 The schemes are scored based on both their capabilities now 2008 ('initial') and in the long term ('eventual'). The scores are one of 2009 'N, O, Y', meaning 'Poor', 'Ordinary', 'Good', respectively. The same 2010 scores are aligned vertically to aid the eye. A score of "?" in one 2011 of the positions means that this approach might optimistically become 2012 this good, given sufficient effort. The table summarises the text 2013 and is not meant to be understandable without having read the text.

2015 Appendix C. Potential Competing Uses for the ECT(1) Codepoint

2017 The ECT(1) codepoint of the ECN field has already been assigned once 2018 for the ECN nonce [RFC3540], which has now been categorized as 2019 historic [RFC8311]. ECN is probably the only remaining field in the 2020 Internet Protocol that is common to IPv4 and IPv6 and still has 2021 potential to work end-to-end, with tunnels and with lower layers. 2022 Therefore, ECT(1) should not be reassigned to a different 2023 experimental use (L4S) without carefully assessing competing 2024 potential uses. These fall into the following categories:

2026 C.1. Integrity of Congestion Feedback

2028 Receiving hosts can fool a sender into downloading faster by 2029 suppressing feedback of ECN marks (or of losses if retransmissions 2030 are not necessary or available otherwise).

2032 The historic ECN nonce protocol [RFC3540] proposed that a TCP sender 2033 could set either of ECT(0) or ECT(1) in each packet of a flow and 2034 remember the sequence it had set. If any packet was lost or 2035 congestion marked, the receiver would miss that bit of the sequence. 2036 An ECN Nonce receiver had to feed back the least significant bit of 2037 the sum, so it could not suppress feedback of a loss or mark without 2038 a 50-50 chance of guessing the sum incorrectly.

2040 It is highly unlikely that ECT(1) will be needed for integrity 2041 protection in future. The ECN Nonce RFC [RFC3540] has been 2042 reclassified as historic, partly because other ways have been 2043 developed to protect feedback integrity of TCP and other transports 2044 [RFC8311] that do not consume a codepoint in the IP header. For 2045 instance:

2047 o the sender can test the integrity of the receiver's feedback by 2048 occasionally setting the IP-ECN field to a value normally only set 2049 by the network. Then it can test whether the receiver's feedback 2050 faithfully reports what it expects (see para 2 of Section 20.2 of 2051 [RFC3168], and the illustrative sketch after this list). This works for loss and it will work for the accurate 2052 ECN feedback [RFC7560] intended for L4S.

2054 o A network can enforce a congestion response to its ECN markings 2055 (or packet losses) by auditing congestion exposure (ConEx) 2056 [RFC7713]. Whether the receiver or a downstream network is 2057 suppressing congestion feedback or the sender is unresponsive to 2058 the feedback, or both, ConEx audit can neutralise any advantage 2059 that any of these three parties would otherwise gain.

2061 o The TCP authentication option (TCP-AO [RFC5925]) can be used to 2062 detect any tampering with TCP congestion feedback (whether 2063 malicious or accidental). TCP's congestion feedback fields are 2064 immutable end-to-end, so they are amenable to TCP-AO protection, 2065 which covers the main TCP header and TCP options by default. 2066 However, TCP-AO is often too brittle to use on many end-to-end 2067 paths, where middleboxes can make verification fail in their 2068 attempts to improve performance or security, e.g. by 2069 resegmentation or shifting the sequence space.
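The first technique in the list above could be implemented along the following lines. The sketch is purely illustrative: the class, its parameters and the probe probability are invented for this example, and a real implementation would need to bound false positives, e.g. by requiring several missed reports before drawing any conclusion.

   # Illustrative sketch of sender-side feedback integrity testing:
   # occasionally send a packet with CE already set and check that the
   # receiver's feedback reports it.  Persistent failures to report
   # these 'test' marks suggest feedback is being suppressed.

   import random

   CE = 0b11                                  # IP-ECN codepoint for CE

   class FeedbackIntegrityTester:
       def __init__(self, probe_probability=0.001):
           self.p = probe_probability
           self.outstanding = set()            # seq numbers sent with CE

       def ecn_for_packet(self, seq, ip_ecn):
           if random.random() < self.p:
               self.outstanding.add(seq)
               return CE                        # pre-set CE as a test
           return ip_ecn

       def check_feedback(self, seq, receiver_reported_ce):
           if seq in self.outstanding:
               self.outstanding.discard(seq)
               return receiver_reported_ce      # False => suspicious
           return True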
2071 C.2. Notification of Less Severe Congestion than CE

2073 Various researchers have proposed to use ECT(1) as a less severe 2074 congestion notification than CE, particularly to enable flows to fill 2075 available capacity more quickly after an idle period, when another 2076 flow departs or when a flow starts, e.g. VCP [VCP], Queue View (QV) 2077 [QV].

2079 Before assigning ECT(1) as an identifier for L4S, we must carefully 2080 consider whether it might be better to hold ECT(1) in reserve for 2081 future standardisation of rapid flow acceleration, which is an 2082 important and enduring problem [RFC6077].

2084 Pre-Congestion Notification (PCN) is another scheme that assigns 2085 alternative semantics to the ECN field. It uses ECT(1) to signify a 2086 less severe level of pre-congestion notification than CE [RFC6660]. 2087 However, the ECN field only takes on the PCN semantics if packets 2088 carry a Diffserv codepoint defined to indicate PCN marking within a 2089 controlled environment. PCN is required to be applied solely to the 2090 outer header of a tunnel across the controlled region in order not to 2091 interfere with any end-to-end use of the ECN field. Therefore, a PCN 2092 region on the path would not interfere with any of the L4S service 2093 identifiers proposed in Appendix B.

2095 Authors' Addresses

2097 Koen De Schepper
2098 Nokia Bell Labs
2099 Antwerp
2100 Belgium

2102 Email: koen.de_schepper@nokia.com
2103 URI: https://www.bell-labs.com/usr/koen.de_schepper

2105 Bob Briscoe (editor)
2106 Independent
2107 UK

2109 Email: ietf@bobbriscoe.net
2110 URI: http://bobbriscoe.net/