2 Transport Services (tsv) K. De Schepper 3 Internet-Draft Nokia Bell Labs 4 Intended status: Experimental B. Briscoe, Ed. 5 Expires: January 27, 2022 Independent 6 July 26, 2021 8 Explicit Congestion Notification (ECN) Protocol for Very Low Queuing 9 Delay (L4S) 10 draft-ietf-tsvwg-ecn-l4s-id-19 12 Abstract 14 This specification defines the protocol to be used for a new network 15 service called low latency, low loss and scalable throughput (L4S). 16 L4S uses an Explicit Congestion Notification (ECN) scheme at the IP 17 layer that is similar to the original (or 'Classic') ECN approach, 18 except as specified within.
L4S uses 'scalable' congestion control, 19 which induces much more frequent control signals from the network and 20 it responds to them with much more fine-grained adjustments, so that 21 very low (typically sub-millisecond on average) and consistently low 22 queuing delay becomes possible for L4S traffic without compromising 23 link utilization. Thus even capacity-seeking (TCP-like) traffic can 24 have high bandwidth and very low delay at the same time, even during 25 periods of high traffic load. 27 The L4S identifier defined in this document distinguishes L4S from 28 'Classic' (e.g. TCP-Reno-friendly) traffic. It gives an incremental 29 migration path so that suitably modified network bottlenecks can 30 distinguish and isolate existing traffic that still follows the 31 Classic behaviour, to prevent it degrading the low queuing delay and 32 low loss of L4S traffic. This specification defines the rules that 33 L4S transports and network elements need to follow with the intention 34 that L4S flows neither harm each other's performance nor that of 35 Classic traffic. Examples of new active queue management (AQM) 36 marking algorithms and examples of new transports (whether TCP-like 37 or real-time) are specified separately. 39 Status of This Memo 41 This Internet-Draft is submitted in full conformance with the 42 provisions of BCP 78 and BCP 79. 44 Internet-Drafts are working documents of the Internet Engineering 45 Task Force (IETF). Note that other groups may also distribute 46 working documents as Internet-Drafts. The list of current Internet- 47 Drafts is at https://datatracker.ietf.org/drafts/current/. 49 Internet-Drafts are draft documents valid for a maximum of six months 50 and may be updated, replaced, or obsoleted by other documents at any 51 time. It is inappropriate to use Internet-Drafts as reference 52 material or to cite them other than as "work in progress." 54 This Internet-Draft will expire on January 27, 2022. 56 Copyright Notice 58 Copyright (c) 2021 IETF Trust and the persons identified as the 59 document authors. All rights reserved. 61 This document is subject to BCP 78 and the IETF Trust's Legal 62 Provisions Relating to IETF Documents 63 (https://trustee.ietf.org/license-info) in effect on the date of 64 publication of this document. Please review these documents 65 carefully, as they describe your rights and restrictions with respect 66 to this document. Code Components extracted from this document must 67 include Simplified BSD License text as described in Section 4.e of 68 the Trust Legal Provisions and are provided without warranty as 69 described in the Simplified BSD License. 71 Table of Contents 73 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 74 1.1. Latency, Loss and Scaling Problems . . . . . . . . . . . 5 75 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 76 1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 8 77 2. Choice of L4S Packet Identifier: Requirements . . . . . . . . 9 78 3. L4S Packet Identification . . . . . . . . . . . . . . . . . . 10 79 4. Transport Layer Behaviour (the 'Prague Requirements') . . . . 11 80 4.1. Codepoint Setting . . . . . . . . . . . . . . . . . . . . 11 81 4.2. Prerequisite Transport Feedback . . . . . . . . . . . . . 11 82 4.3. Prerequisite Congestion Response . . . . . . . . . . . . 12 83 4.4. Filtering or Smoothing of ECN Feedback . . . . . . . . . 15 84 5. Network Node Behaviour . . . . . . . . . . . . . . . . . . . 15 85 5.1. Classification and Re-Marking Behaviour . . 
. . . . . . . 15 86 5.2. The Strength of L4S CE Marking Relative to Drop . . . . . 16 87 5.3. Exception for L4S Packet Identification by Network Nodes 88 with Transport-Layer Awareness . . . . . . . . . . . . . 18 89 5.4. Interaction of the L4S Identifier with other Identifiers 18 90 5.4.1. DualQ Examples of Other Identifiers Complementing L4S 91 Identifiers . . . . . . . . . . . . . . . . . . . . . 18 92 5.4.1.1. Inclusion of Additional Traffic with L4S . . . . 18 93 5.4.1.2. Exclusion of Traffic From L4S Treatment . . . . . 20 94 5.4.1.3. Generalized Combination of L4S and Other 95 Identifiers . . . . . . . . . . . . . . . . . . . 21 96 5.4.2. Per-Flow Queuing Examples of Other Identifiers 97 Complementing L4S Identifiers . . . . . . . . . . . . 22 98 5.5. Limiting Packet Bursts from Links Supporting L4S AQMs . . 22 99 6. Behaviour of Tunnels and Encapsulations . . . . . . . . . . . 23 100 6.1. No Change to ECN Tunnels and Encapsulations in General . 23 101 6.2. VPN Behaviour to Avoid Limitations of Anti-Replay . . . . 24 102 7. L4S Experiments . . . . . . . . . . . . . . . . . . . . . . . 25 103 7.1. Open Questions . . . . . . . . . . . . . . . . . . . . . 25 104 7.2. Open Issues . . . . . . . . . . . . . . . . . . . . . . . 26 105 7.3. Future Potential . . . . . . . . . . . . . . . . . . . . 27 106 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27 107 9. Security Considerations . . . . . . . . . . . . . . . . . . . 28 108 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 28 109 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 29 110 11.1. Normative References . . . . . . . . . . . . . . . . . . 29 111 11.2. Informative References . . . . . . . . . . . . . . . . . 29 112 Appendix A. The 'Prague L4S Requirements' . . . . . . . . . . . 38 113 A.1. Requirements for Scalable Transport Protocols . . . . . . 38 114 A.1.1. Use of L4S Packet Identifier . . . . . . . . . . . . 38 115 A.1.2. Accurate ECN Feedback . . . . . . . . . . . . . . . . 39 116 A.1.3. Capable of Replacement by Classic Congestion Control 39 117 A.1.4. Fall back to Classic Congestion Control on Packet 118 Loss . . . . . . . . . . . . . . . . . . . . . . . . 39 119 A.1.5. Coexistence with Classic Congestion Control at 120 Classic ECN bottlenecks . . . . . . . . . . . . . . . 40 121 A.1.6. Reduce RTT dependence . . . . . . . . . . . . . . . . 43 122 A.1.7. Scaling down to fractional congestion windows . . . . 44 123 A.1.8. Measuring Reordering Tolerance in Time Units . . . . 45 124 A.2. Scalable Transport Protocol Optimizations . . . . . . . . 48 125 A.2.1. Setting ECT in Control Packets and Retransmissions . 48 126 A.2.2. Faster than Additive Increase . . . . . . . . . . . . 48 127 A.2.3. Faster Convergence at Flow Start . . . . . . . . . . 49 128 Appendix B. Compromises in the Choice of L4S Identifier . . . . 49 129 Appendix C. Potential Competing Uses for the ECT(1) Codepoint . 54 130 C.1. Integrity of Congestion Feedback . . . . . . . . . . . . 54 131 C.2. Notification of Less Severe Congestion than CE . . . . . 55 132 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 56 134 1. Introduction 136 This specification defines the protocol to be used for a new network 137 service called low latency, low loss and scalable throughput (L4S). 138 L4S uses an Explicit Congestion Notification (ECN) scheme at the IP 139 layer that is similar to the original (or 'Classic') Explicit 140 Congestion Notification (ECN [RFC3168]). 
RFC 3168 required an ECN 141 mark to be equivalent to a drop, both when applied in the network and 142 when responded to by a transport. Unlike Classic ECN marking, the 143 network applies L4S marking more immediately and more aggressively 144 than drop, and the transport response to each mark is reduced and 145 smoothed relative to that for drop. The two changes counterbalance 146 each other so that the throughput of an L4S flow will be roughly the 147 same as a comparable non-L4S flow under the same conditions. 148 Nonetheless, the much more frequent control signals and the finer 149 responses to them result in very low queuing delay without 150 compromising link utilization, and this low delay can be maintained 151 during high load. For instance, queuing delay under heavy and highly 152 varying load with the example DCTCP/DualQ solution cited below on a 153 DSL or Ethernet link is sub-millisecond on average and roughly 1 to 2 154 milliseconds at the 99th percentile without losing link 155 utilization [DualPI2Linux], [DCttH15]. Note that the inherent 156 queuing delay while waiting to acquire a discontinuous medium such as 157 WiFi has to be minimized in its own right, so it would be additional 158 to the above (see section 6.3 of [I-D.ietf-tsvwg-l4s-arch]). 160 L4S relies on 'scalable' congestion controls for these delay 161 properties and for preserving low delay as flow rate scales, hence 162 the name. The congestion control used in Data Center TCP (DCTCP) is 163 an example of a scalable congestion control, but DCTCP is applicable 164 solely to controlled environments like data centres [RFC8257], 165 because it is too aggressive to co-exist with existing TCP-Reno- 166 friendly traffic. The DualQ Coupled AQM, which is defined in a 167 complementary experimental specification 168 [I-D.ietf-tsvwg-aqm-dualq-coupled], is an AQM framework that enables 169 scalable congestion controls derived from DCTCP to co-exist with 170 existing traffic, each getting roughly the same flow rate when they 171 compete under similar conditions. Note that a scalable congestion 172 control is still not safe to deploy on the Internet unless it 173 satisfies the requirements listed in Section 4. 175 L4S is not only for elastic (TCP-like) traffic - there are scalable 176 congestion controls for real-time media, such as the L4S variant of 177 the SCReAM [RFC8298] real-time media congestion avoidance technique 178 (RMCAT). The factor that distinguishes L4S from Classic traffic is 179 its behaviour in response to congestion. The transport wire 180 protocol, e.g. TCP, QUIC, SCTP, DCCP, RTP/RTCP, is orthogonal (and 181 therefore not suitable for distinguishing L4S from Classic packets). 183 The L4S identifier defined in this document is the key piece that 184 distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic. It 185 gives an incremental migration path so that suitably modified network 186 bottlenecks can distinguish and isolate existing Classic traffic from 187 L4S traffic to prevent the former from degrading the very low delay 188 and loss of the new scalable transports, without harming Classic 189 performance at these bottlenecks. Initial implementation of the 190 separate parts of the system has been motivated by the performance 191 benefits. 193 1.1. Latency, Loss and Scaling Problems 195 Latency is becoming the critical performance factor for many (most?) 196 applications on the public Internet, e.g. 
interactive Web, Web 197 services, voice, conversational video, interactive video, interactive 198 remote presence, instant messaging, online gaming, remote desktop, 199 cloud-based applications, and video-assisted remote control of 200 machinery and industrial processes. In the 'developed' world, 201 further increases in access network bit-rate offer diminishing 202 returns, whereas latency is still a multi-faceted problem. In the 203 last decade or so, much has been done to reduce propagation time by 204 placing caches or servers closer to users. However, queuing remains 205 a major intermittent component of latency. 207 The Diffserv architecture provides Expedited Forwarding [RFC3246], so 208 that low latency traffic can jump the queue of other traffic. If 209 growth in high-throughput latency-sensitive applications continues, 210 periods with solely latency-sensitive traffic will become 211 increasingly common on links where traffic aggregation is low. For 212 instance, on the access links dedicated to individual sites (homes, 213 small enterprises or mobile devices). These links also tend to 214 become the path bottleneck under load. During these periods, if all 215 the traffic were marked for the same treatment at these bottlenecks, 216 Diffserv would make no difference. Instead, it becomes imperative to 217 remove the underlying causes of any unnecessary delay. 219 The bufferbloat project has shown that excessively-large buffering 220 ('bufferbloat') has been introducing significantly more delay than 221 the underlying propagation time. These delays appear only 222 intermittently--only when a capacity-seeking (e.g. TCP) flow is long 223 enough for the queue to fill the buffer, making every packet in other 224 flows sharing the buffer sit through the queue. 226 Active queue management (AQM) was originally developed to solve this 227 problem (and others). Unlike Diffserv, which gives low latency to 228 some traffic at the expense of others, AQM controls latency for _all_ 229 traffic in a class. In general, AQM methods introduce an increasing 230 level of discard from the buffer the longer the queue persists above 231 a shallow threshold. This gives sufficient signals to capacity- 232 seeking (aka. greedy) flows to keep the buffer empty for its intended 233 purpose: absorbing bursts. However, RED [RFC2309] and other 234 algorithms from the 1990s were sensitive to their configuration and 235 hard to set correctly. So, this form of AQM was not widely deployed. 237 More recent state-of-the-art AQM methods, e.g. FQ-CoDel [RFC8290], 238 PIE [RFC8033], Adaptive RED [ARED01], are easier to configure, 239 because they define the queuing threshold in time not bytes, so it is 240 invariant for different link rates. However, no matter how good the 241 AQM, the sawtoothing sending window of a Classic congestion control 242 will either cause queuing delay to vary or cause the link to be 243 under-utilized. Even with a perfectly tuned AQM, the additional 244 queuing delay will be of the same order as the underlying speed-of- 245 light delay across the network. 247 If a sender's own behaviour is introducing queuing delay variation, 248 no AQM in the network can 'un-vary' the delay without significantly 249 compromising link utilization. Even flow-queuing (e.g. [RFC8290]), 250 which isolates one flow from another, cannot isolate a flow from the 251 delay variations it inflicts on itself. 
Therefore those applications 252 that need to seek out high bandwidth but also need low latency will 253 have to migrate to scalable congestion control. 255 Altering host behaviour is not enough on its own though. Even if 256 hosts adopt low latency behaviour (scalable congestion controls), 257 they need to be isolated from the behaviour of existing Classic 258 congestion controls that induce large queue variations. L4S enables 259 that migration by providing latency isolation in the network and 260 distinguishing the two types of packets that need to be isolated: L4S 261 and Classic. L4S isolation can be achieved with a queue per flow 262 (e.g. [RFC8290]) but a DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled] is 263 sufficient, and actually gives better tail latency. Both approaches 264 are addressed in this document. 266 The DualQ solution was developed to make very low latency available 267 without requiring per-flow queues at every bottleneck. This was 268 because FQ has well-known downsides - not least the need to inspect 269 transport layer headers in the network, which makes it incompatible 270 with privacy approaches such as IPSec VPN tunnels, and incompatible 271 with link layer queue management, where transport layer headers can 272 be hidden, e.g. 5G. 274 Latency is not the only concern addressed by L4S: It was known when 275 TCP congestion avoidance was first developed that it would not scale 276 to high bandwidth-delay products (footnote 6 of Jacobson and Karels 277 [TCP-CA]). Given regular broadband bit-rates over WAN distances are 278 already [RFC3649] beyond the scaling range of Reno congestion 279 control, 'less unscalable' Cubic [RFC8312] and 280 Compound [I-D.sridharan-tcpm-ctcp] variants of TCP have been 281 successfully deployed. However, these are now approaching their 282 scaling limits. Unfortunately, fully scalable congestion controls 283 such as DCTCP [RFC8257] outcompete Classic ECN congestion controls 284 sharing the same queue, which is why they have been confined to 285 private data centres or research testbeds. 287 It turns out that these scalable congestion control algorithms that 288 solve the latency problem can also solve the scalability problem of 289 Classic congestion controls. The finer sawteeth in the congestion 290 window have low amplitude, so they cause very little queuing delay 291 variation and the average time to recover from one congestion signal 292 to the next (the average duration of each sawtooth) remains 293 invariant, which maintains constant tight control as flow-rate 294 scales. A background paper [DCttH15] gives the full explanation of 295 why the design solves both the latency and the scaling problems, both 296 in plain English and in more precise mathematical form. The 297 explanation is summarised without the maths in the L4S architecture 298 document [I-D.ietf-tsvwg-l4s-arch]. 300 1.2. Terminology 302 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 303 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 304 "OPTIONAL" in this document are to be interpreted as described in 305 [RFC2119]. In this document, these words will appear with that 306 interpretation only when in ALL CAPS. Lower case uses of these words 307 are not to be interpreted as carrying RFC-2119 significance. 309 Classic Congestion Control: A congestion control behaviour that can 310 co-exist with standard Reno [RFC5681] without causing 311 significantly negative impact on its flow rate [RFC5033]. 
With 312 Classic congestion controls, such as Reno or Cubic, because flow 313 rate has scaled since TCP congestion control was first designed in 314 1988, it now takes hundreds of round trips (and growing) to 315 recover after a congestion signal (whether a loss or an ECN mark) 316 as shown in the examples in section 5.1 of 317 [I-D.ietf-tsvwg-l4s-arch] and in [RFC3649]. Therefore control of 318 queuing and utilization becomes very slack, and the slightest 319 disturbances (e.g. from new flows starting) prevent a high rate 320 from being attained. 322 Scalable Congestion Control: A congestion control where the average 323 time from one congestion signal to the next (the recovery time) 324 remains invariant as the flow rate scales, all other factors being 325 equal. This maintains the same degree of control over queueing 326 and utilization whatever the flow rate, as well as ensuring that 327 high throughput is robust to disturbances. For instance, DCTCP 328 averages 2 congestion signals per round-trip whatever the flow 329 rate, as do other recently developed scalable congestion controls, 330 e.g. Relentless TCP [Mathis09], TCP Prague 331 [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], 332 BBRv2 [BBRv2] and the L4S variant of SCReAM for real-time 333 media [SCReAM], [RFC8298]. See Section 4.3 for more explanation. 335 Classic service: The Classic service is intended for all the 336 congestion control behaviours that co-exist with Reno [RFC5681] 337 (e.g. Reno itself, Cubic [RFC8312], Compound 338 [I-D.sridharan-tcpm-ctcp], TFRC [RFC5348]). The term 'Classic 339 queue' means a queue providing the Classic service. 341 Low-Latency, Low-Loss Scalable throughput (L4S) service: The 'L4S' 342 service is intended for traffic from scalable congestion control 343 algorithms, such as TCP Prague 344 [I-D.briscoe-iccrg-prague-congestion-control], which was derived 345 from DCTCP [RFC8257]. The L4S service is for more general traffic 346 than just TCP Prague--it allows the set of congestion controls 347 with similar scaling properties to Prague to evolve, such as the 348 examples listed above (Relentless, SCReAM). The term 'L4S queue' 349 means a queue providing the L4S service. 351 The terms Classic or L4S can also qualify other nouns, such as 352 'queue', 'codepoint', 'identifier', 'classification', 'packet', 353 'flow'. For example: an L4S packet means a packet with an L4S 354 identifier sent from an L4S congestion control. 356 Both Classic and L4S services can cope with a proportion of 357 unresponsive or less-responsive traffic as well, but in the L4S 358 case its rate has to be smooth enough or low enough not to build a 359 queue (e.g. DNS, VoIP, game sync datagrams, etc.). 361 Reno-friendly: The subset of Classic traffic that is friendly to the 362 standard Reno congestion control defined for TCP in [RFC5681]. 363 Reno-friendly is used in place of 'TCP-friendly', given the latter 364 has become imprecise, because the TCP protocol is now used with so 365 many different congestion control behaviours, and Reno is used in 366 non-TCP transports such as QUIC. 368 Classic ECN: The original Explicit Congestion Notification (ECN) 369 protocol [RFC3168], which requires ECN signals to be treated the 370 same as drops, both when generated in the network and when 371 responded to by the sender.
For L4S, the names used for the four 372 codepoints of the 2-bit IP-ECN field are unchanged from those 373 defined in [RFC3168]: Not ECT, ECT(0), ECT(1) and CE, where ECT 374 stands for ECN-Capable Transport and CE stands for Congestion 375 Experienced. A packet marked with the CE codepoint is termed 376 'ECN-marked' or sometimes just 'marked' where the context makes 377 ECN obvious. 379 1.3. Scope 381 The new L4S identifier defined in this specification is applicable 382 for IPv4 and IPv6 packets (as for Classic ECN [RFC3168]). It is 383 applicable for the unicast, multicast and anycast forwarding modes. 385 The L4S identifier is an orthogonal packet classification to the 386 Differentiated Services Code Point (DSCP) [RFC2474]. Section 5.4 387 explains what this means in practice. 389 This document is intended for experimental status, so it does not 390 update any standards track RFCs. Therefore it depends on [RFC8311], 391 which is a standards track specification that: 393 o updates the ECN proposed standard [RFC3168] to allow experimental 394 track RFCs to relax the requirement that an ECN mark must be 395 equivalent to a drop (when the network applies markings and/or 396 when the sender responds to them). For instance, in the ABE 397 experiment [RFC8511] this permits a sender to respond less to ECN 398 marks than to drops; 400 o changes the status of the experimental ECN nonce [RFC3540] to 401 historic; 403 o makes consequent updates to the following additional proposed 404 standard RFCs to reflect the above two bullets: 406 * ECN for RTP [RFC6679]; 408 * the congestion control specifications of various DCCP 409 congestion control identifier (CCID) profiles [RFC4341], 410 [RFC4342], [RFC5622]. 412 This document is about identifiers that are used for interoperation 413 between hosts and networks. So the audience is broad, covering 414 developers of host transports and network AQMs, as well as covering 415 how operators might wish to combine various identifiers, which would 416 require flexibility from equipment developers. 418 2. Choice of L4S Packet Identifier: Requirements 420 This subsection briefly records the process that led to the chosen 421 L4S identifier. 423 The identifier for packets using the Low Latency, Low Loss, Scalable 424 throughput (L4S) service needs to meet the following requirements: 426 o it SHOULD survive end-to-end between source and destination end- 427 points: across the boundary between host and network, between 428 interconnected networks, and through middleboxes; 430 o it SHOULD be visible at the IP layer; 432 o it SHOULD be common to IPv4 and IPv6 and transport-agnostic; 433 o it SHOULD be incrementally deployable; 435 o it SHOULD enable an AQM to classify packets encapsulated by outer 436 IP or lower-layer headers; 438 o it SHOULD consume minimal extra codepoints; 440 o it SHOULD be consistent on all the packets of a transport layer 441 flow, so that some packets of a flow are not served by a different 442 queue to others. 444 Whether the identifier would be recoverable if the experiment failed 445 is a factor that could be taken into account. However, this has not 446 been made a requirement, because that would favour schemes that would 447 be easier to fail, rather than those more likely to succeed. 449 It is recognised that any choice of identifier is unlikely to satisfy 450 all these requirements, particularly given the limited space left in 451 the IP header. 
Therefore a compromise will always be necessary, 452 which is why all the above requirements are expressed with the word 453 'SHOULD' not 'MUST'. 455 After extensive assessment of alternative schemes, "ECT(1) and CE 456 codepoints" was chosen as the best compromise. Therefore this scheme 457 is defined in detail in the following sections, while Appendix B 458 records its pros and cons against the above requirements. 460 3. L4S Packet Identification 462 The L4S treatment is an experimental track alternative packet marking 463 treatment to the Classic ECN treatment in [RFC3168], which has been 464 updated by [RFC8311] to allow experiments such as the one defined in 465 the present specification. [RFC4774] discusses some of the issues 466 and evaluation criteria when defining alternative ECN semantics. 467 Like Classic ECN, L4S ECN identifies both network and host behaviour: 468 it identifies the marking treatment that network nodes are expected 469 to apply to L4S packets, and it identifies packets that have been 470 sent from hosts that are expected to comply with a broad type of 471 sending behaviour. 473 For a packet to receive L4S treatment as it is forwarded, the sender 474 sets the ECN field in the IP header to the ECT(1) codepoint. See 475 Section 4 for full transport layer behaviour requirements, including 476 feedback and congestion response. 478 A network node that implements the L4S service always classifies 479 arriving ECT(1) packets for L4S treatment and by default classifies 480 CE packets for L4S treatment unless the heuristics described in 481 Section 5.3 are employed. See Section 5 for full network element 482 behaviour requirements, including classification, ECN-marking and 483 interaction of the L4S identifier with other identifiers and per-hop 484 behaviours. 486 4. Transport Layer Behaviour (the 'Prague Requirements') 488 4.1. Codepoint Setting 490 A sender that wishes a packet to receive L4S treatment as it is 491 forwarded, MUST set the ECN field in the IP header (v4 or v6) to the 492 ECT(1) codepoint. 494 4.2. Prerequisite Transport Feedback 496 For a transport protocol to provide scalable congestion control 497 (Section 4.3) it MUST provide feedback of the extent of CE marking on 498 the forward path. When ECN was added to TCP [RFC3168], the feedback 499 method reported no more than one CE mark per round trip. Some 500 transport protocols derived from TCP mimic this behaviour while 501 others report the accurate extent of ECN marking. This means that 502 some transport protocols will need to be updated as a prerequisite 503 for scalable congestion control. The position for a few well-known 504 transport protocols is given below. 506 TCP: Support for the accurate ECN feedback requirements [RFC7560] 507 (such as that provided by AccECN [I-D.ietf-tcpm-accurate-ecn]) by 508 both ends is a prerequisite for scalable congestion control in 509 TCP. Therefore, the presence of ECT(1) in the IP headers even in 510 one direction of a TCP connection will imply that both ends must 511 be supporting accurate ECN feedback. However, the converse does 512 not apply. So even if both ends support AccECN, either of the two 513 ends can choose not to use a scalable congestion control, whatever 514 the other end's choice. 516 SCTP: A suitable ECN feedback mechanism for SCTP could add a chunk 517 to report the number of received CE marks 518 (e.g. 
[I-D.stewart-tsvwg-sctpecn]), and update the ECN feedback 519 protocol sketched out in Appendix A of the standards track 520 specification of SCTP [RFC4960]. 522 RTP over UDP: A prerequisite for scalable congestion control is for 523 both (all) ends of one media-level hop to signal ECN support 524 [RFC6679] and use the new generic RTCP feedback format of 525 [RFC8888]. The presence of ECT(1) implies that both (all) ends of 526 that media-level hop support ECN. However, the converse does not 527 apply. So each end of a media-level hop can independently choose 528 not to use a scalable congestion control, even if both ends 529 support ECN. 531 QUIC: Support for sufficiently fine-grained ECN feedback is provided 532 by the v1 IETF QUIC transport [RFC9000]. 534 DCCP: The ACK vector in DCCP [RFC4340] is already sufficient to 535 report the extent of CE marking as needed by a scalable congestion 536 control. 538 4.3. Prerequisite Congestion Response 540 As a condition for a host to send packets with the L4S identifier 541 (ECT(1)), it SHOULD implement a congestion control behaviour that 542 ensures that, in steady state, the average duration between induced 543 ECN marks does not increase as flow rate scales up, all other factors 544 being equal. This is termed a scalable congestion control. This 545 invariant duration ensures that, as flow rate scales, the average 546 period with no feedback information about capacity does not become 547 excessive. It also ensures that queue variations remain small, 548 without having to sacrifice utilization. 550 With a congestion control that sawtooths to probe capacity, this 551 duration is called the recovery time, because each time the sawtooth 552 yields, on average it takes this time to recover to its previous high 553 point. A scalable congestion control does not have to sawtooth, but 554 it has to coexist with scalable congestion controls that do. 556 For instance, for DCTCP [RFC8257], TCP Prague 557 [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux] and the 558 L4S variant of SCReAM [RFC8298], the average recovery time is always 559 half a round trip (or half a reference round trip), whatever the flow 560 rate. 562 As with all transport behaviours, a detailed specification (probably 563 an experimental RFC) is expected for each congestion control, 564 following the guidelines for specifying new congestion control 565 algorithms in [RFC5033]. In addition, it is expected to document 566 these L4S-specific matters, specifically the timescale over which the 567 proportionality is averaged, and control of burstiness. The recovery 568 time requirement above is worded as a 'SHOULD' rather than a 'MUST' 569 to allow reasonable flexibility for such implementations. 571 The condition 'all other factors being equal' allows the recovery 572 time to be different for different round trip times, as long as it 573 does not increase with flow rate for any particular RTT. 575 Saying that the recovery time remains roughly invariant is equivalent 576 to saying that the number of ECN CE marks per round trip remains 577 invariant as flow rate scales, all other factors being equal. For 578 instance, an average recovery time of half of 1 RTT is equivalent to 579 2 ECN marks per round trip. For those familiar with steady-state 580 congestion response functions, it is also equivalent to say that the 581 congestion window is inversely proportional to the proportion of 582 bytes in packets marked with the CE codepoint (see section 2 of 583 [PI2]).
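As a non-normative illustration of this scaling property, the
following sketch (in Python) shows a DCTCP-style response in the
spirit of [RFC8257]. The once-per-round-trip structure and the names
(G, alpha, cwnd) are assumptions of this example, not requirements of
this specification: the sender itself smooths the fraction of
CE-marked segments (see Section 4.4) and reduces its window in
proportion to that fraction.

   # Non-normative sketch of a scalable (DCTCP-style) response.
   # It assumes the sender counts acknowledged and CE-marked segments
   # once per round trip; all names and constants are illustrative.

   G = 1.0 / 16  # EWMA gain, as recommended in RFC 8257

   class ScalableResponse:
       def __init__(self, cwnd):
           self.cwnd = float(cwnd)   # congestion window [segments]
           self.alpha = 1.0          # smoothed CE-marked fraction

       def per_rtt_update(self, acked, ce_marked):
           frac = ce_marked / max(acked, 1)
           # The sender, not the network, smooths the marking signal
           # (Section 4.4), over roughly 1/G round trips.
           self.alpha = (1 - G) * self.alpha + G * frac
           self.cwnd += 1.0          # additive increase per round trip
           if ce_marked:
               # Reduce in proportion to the marking level.
               self.cwnd = max(2.0, self.cwnd * (1 - self.alpha / 2))

In steady state such a response settles at roughly 2 marks per round
trip: the marked fraction is then about 2/cwnd, so the reduction of
roughly cwnd*alpha/2 ~= 1 segment per round trip balances the additive
increase of 1 segment, whatever the flow rate. A deployable
implementation would additionally need the loss response, RTT
independence, burst limits and the other safety requirements listed
below.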
585 In order to coexist safely with other Internet traffic, a scalable 586 congestion control MUST NOT tag its packets with the ECT(1) codepoint 587 unless it complies with the following bulleted requirements: 589 o A scalable congestion control MUST be capable of being replaced by 590 a Classic congestion control (by application and/or by 591 administrative control). If a Classic congestion control is 592 activated, it will not tag its packets with the ECT(1) codepoint 593 (see Appendix A.1.3 for rationale). 595 o As well as responding to ECN markings, a scalable congestion 596 control MUST react to packet loss in a way that will coexist 597 safely with Classic congestion controls such as standard Reno 598 [RFC5681], as required by [RFC5033] (see Appendix A.1.4 for 599 rationale). 601 o In uncontrolled environments, monitoring MUST be implemented to 602 support detection of problems with an ECN-capable AQM at the path 603 bottleneck that appears not to support L4S and might be in a 604 shared queue. Such monitoring SHOULD be applied to live traffic 605 that is using Scalable congestion control. Alternatively, 606 monitoring need not be applied to live traffic, if monitoring has 607 been arranged to cover the paths that live traffic takes through 608 uncontrolled environments. 610 The detection function SHOULD be capable of making the congestion 611 control adapt its ECN-marking response to coexist safely with 612 Classic congestion controls such as standard Reno [RFC5681], as 613 required by [RFC5033]. Alternatively, if adaptation is not 614 implemented and problems with such an AQM are detected, the 615 scalable congestion control MUST be replaced by a Classic 616 congestion control. 618 Note that a scalable congestion control is not expected to change 619 to setting ECT(0) while it transiently adapts to coexist with 620 Classic congestion controls. 622 See Appendix A.1.5 and [I-D.ietf-tsvwg-l4sops] for rationale. 624 o In the range between the minimum likely RTT and typical RTTs 625 expected in the intended deployment scenario, a scalable 626 congestion control MUST converge towards a rate that is as 627 independent of RTT as is possible without compromising stability 628 or efficiency (see Appendix A.1.6 for rationale). 630 o A scalable congestion control SHOULD remain responsive to 631 congestion when typical RTTs over the public Internet are 632 significantly smaller because they are no longer inflated by 633 queuing delay. It would be preferable for the minimum window of a 634 scalable congestion control to be lower than 1 segment rather than 635 use the timeout approach described for TCP in S.6.1.2 of [RFC3168] 636 (or an equivalent for other transports). However, a lower minimum 637 is not set as a formal requirement for L4S experiments (see 638 Appendix A.1.7 for rationale). 640 o A scalable congestion control's loss detection SHOULD be resilient 641 to reordering over an adaptive time interval that scales with 642 throughput and adapts to reordering (as in [RFC8985]), as opposed 643 to counting only in fixed units of packets (as in the 3 DupACK 644 rule of [RFC5681] and [RFC6675], which is not scalable). As data 645 rates increase (e.g., due to new and/or improved technology), 646 congestion controls that detect loss by counting in units of 647 packets become more likely to incorrectly treat reordering events 648 as congestion-caused loss events (see Appendix A.1.8 for further 649 rationale). 
This requirement does not apply to congestion 650 controls that are solely used in controlled environments where the 651 network introduces hardly any reordering. 653 o A scalable congestion control is expected to limit the queue 654 caused by bursts of packets. It would not seem necessary to set 655 the limit any lower than 10% of the minimum RTT expected in a 656 typical deployment (e.g. additional queuing of roughly 250 us for 657 the public Internet). This would be converted to a number of 658 packets under the worst-case assumption that the bottleneck link 659 capacity equals the current flow rate. No normative requirement 660 to limit bursts is given here and, until there is more industry 661 experience from the L4S experiment, it is not even known whether 662 one is needed - it seems to be in an L4S sender's self-interest to 663 limit bursts. 665 Each sender in a session can use a scalable congestion control 666 independently of the congestion control used by the receiver(s) when 667 they send data. Therefore there might be ECT(1) packets in one 668 direction and ECT(0) or Not-ECT in the other. 670 Later (Section 5.4.1.1) this document discusses the conditions for 671 mixing other "'Safe' Unresponsive Traffic" (e.g. DNS, LDAP, NTP, 672 voice, game sync packets) with L4S traffic. To be clear, although 673 such traffic can share the same queue as L4S traffic, it is not 674 appropriate for the sender to tag it as ECT(1), except in the 675 (unlikely) case that it satisfies the above conditions. 677 4.4. Filtering or Smoothing of ECN Feedback 679 Section 5.2 below specifies that an L4S AQM is expected to signal L4S 680 ECN without filtering or smoothing. This contrasts with a Classic 681 AQM, which filters out variations in the queue before signalling ECN 682 marking or drop. In the L4S architecture [I-D.ietf-tsvwg-l4s-arch], 683 responsibility for smoothing out these variations shifts to the 684 sender's congestion control. 686 This shift of responsibility has the advantage that each sender can 687 smooth variations over a timescale proportionate to its own RTT; 688 whereas, in the Classic approach, the network doesn't know the RTTs 689 of all the flows, so it has to smooth out variations for a worst-case 690 RTT to ensure stability. For all the typical flows with shorter RTT 691 than the worst-case, this makes congestion control unnecessarily 692 sluggish. 694 This also gives an L4S sender the choice not to smooth, depending on 695 its context (start-up, congestion avoidance, etc.). Therefore, this 696 document places no requirement on an L4S congestion control to smooth 697 out variations in any particular way. Implementers are encouraged to 698 openly publish the approach they take to smoothing, and the results 699 and experience they gain during the L4S experiment. 701 5. Network Node Behaviour 703 5.1. Classification and Re-Marking Behaviour 705 A network node that implements the L4S service: 707 o MUST classify arriving ECT(1) packets for L4S treatment, unless 708 overridden by another classifier (e.g., see Section 5.4.1.2); 710 o MUST classify arriving CE packets for L4S treatment as well, 711 unless overridden by another classifier or unless the exception 712 referred to next applies; 714 CE packets might have originated as ECT(1) or ECT(0), but the 715 above rule to classify them as if they originated as ECT(1) is the 716 safe choice (see Appendix B for rationale).
The exception is 717 where some flow-aware in-network mechanism happens to be available 718 for distinguishing CE packets that originated as ECT(0), as 719 described in Section 5.3, but there is no implication that such a 720 mechanism is necessary. 722 An L4S AQM treatment follows similar codepoint transition rules to 723 those in RFC 3168. Specifically, the ECT(1) codepoint MUST NOT be 724 changed to any other codepoint than CE, and CE MUST NOT be changed to 725 any other codepoint. An ECT(1) packet is classified as ECN-capable 726 and, if congestion increases, an L4S AQM algorithm will increasingly 727 mark the ECN field as CE, otherwise forwarding packets unchanged as 728 ECT(1). Necessary conditions for an L4S marking treatment are 729 defined in Section 5.2. 731 Under persistent overload, an L4S marking treatment MUST begin 732 applying drop to L4S traffic until the overload episode has subsided, 733 as recommended for all AQM methods in [RFC7567] (Section 4.2.1), 734 which follows the similar advice in RFC 3168 (Section 7). During 735 overload, it MUST apply the same drop probability to L4S traffic as 736 it would to Classic traffic. 738 Where an L4S AQM is transport-aware, this requirement could be 739 satisfied by using drop in only the most overloaded individual per- 740 flow AQMs. In a DualQ with flow-aware queue protection (e.g. 741 [I-D.briscoe-docsis-q-protection]), this could be achieved by 742 redirecting packets in those flows contributing most to the overload 743 out of the L4S queue so that they are subjected to drop in the 744 Classic queue. 746 For backward compatibility in uncontrolled environments, a network 747 node that implements the L4S treatment MUST also implement an AQM 748 treatment for the Classic service as defined in Section 1.2. This 749 Classic AQM treatment need not mark ECT(0) packets, but if it does, 750 see Section 5.2 for the strengths of the markings relative to drop. 751 It MUST classify arriving ECT(0) and Not-ECT packets for treatment by 752 this Classic AQM (for the DualQ Coupled AQM, see the extensive 753 discussion on classification in Sections 2.3 and 2.5.1.1 of 754 [I-D.ietf-tsvwg-aqm-dualq-coupled]). 756 In case unforeseen problems arise with the L4S experiment, it MUST be 757 possible to configure an L4S implementation to disable the L4S 758 treatment. Once disabled, all packets of all ECN codepoints will 759 receive Classic treatment and ECT(1) packets MUST be treated as if 760 they were Not-ECT. 762 5.2. The Strength of L4S CE Marking Relative to Drop 764 The relative strengths of L4S CE and drop are irrelevant where AQMs 765 are implemented in separate queues per-application-flow, which are 766 then explicitly scheduled (e.g. with an FQ scheduler as in 768 [RFC8290]). Nonetheless, the relationship between them needs to be 769 defined for the coupling between L4S and Classic congestion signals 770 in a DualQ Coupled AQM [I-D.ietf-tsvwg-aqm-dualq-coupled], as below. 772 Unless an AQM node schedules application flows explicitly, the 773 likelihood that the AQM drops a Not-ECT Classic packet (p_C) MUST be 774 roughly proportional to the square of the likelihood that it would 775 have marked it if it had been an L4S packet (p_L). That is 777 p_C ~= (p_L / k)^2 779 The constant of proportionality (k) does not have to be standardised 780 for interoperability, but a value of 2 is RECOMMENDED. The term 781 'likelihood' is used above to allow for marking and dropping to be 782 either probabilistic or deterministic.
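For illustration only, the following sketch (in Python) shows one way
an implementation might derive both likelihoods from a single
internal probability, in the style of the DualQ Coupled AQM
[I-D.ietf-tsvwg-aqm-dualq-coupled]. The names p_prime, p_l_native and
K are assumptions of this example, not part of this specification.

   # Non-normative sketch of the coupling between L4S CE marking and
   # Classic drop; names and structure are illustrative only.

   K = 2.0  # coupling factor k; a value of 2 is RECOMMENDED

   def signalling_likelihoods(p_prime, p_l_native):
       """p_prime is the output of a linear controller (e.g. a PI
       controller) acting on the Classic queue; p_l_native is any
       marking probability the L4S AQM computes natively, e.g. from
       a shallow ramp threshold on the L queue."""
       p_c = p_prime ** 2           # Classic drop (or mark) likelihood
       p_cl = K * p_prime           # L4S likelihood coupled from Classic
       p_l = max(p_l_native, p_cl)  # L queue signals at the higher of
       return p_c, p_l              #   the native and coupled levels

   # Whenever the coupled term dominates (p_l == p_cl), these
   # definitions give p_c == (p_l / K)**2, i.e. the relationship
   # required above.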
784 This formula ensures that Scalable and Classic flows will converge to 785 roughly equal congestion windows, for the worst case of Reno 786 congestion control. This is because the congestion windows of 787 Scalable and Classic congestion controls are inversely proportional 788 to p_L and sqrt(p_C) respectively. So squaring p_C in the above 789 formula counterbalances the square root that characterizes Reno- 790 friendly flows. 792 Note that, contrary to RFC 3168, an AQM implementing the L4S and 793 Classic treatments does not mark an ECT(1) packet under the same 794 conditions that it would have dropped a Not-ECT packet, as allowed by 795 [RFC8311], which updates RFC 3168. However, if it marks ECT(0) 796 packets, it does so under the same conditions that it would have 797 dropped a Not-ECT packet [RFC3168]. 799 Also, L4S CE marking needs to be interpreted as an unsmoothed signal, 800 in contrast to the Classic approach in which AQMs filter out 801 variations before signalling congestion. An L4S AQM SHOULD NOT 802 smooth or filter out variations in the queue before signalling 803 congestion. In the L4S architecture [I-D.ietf-tsvwg-l4s-arch], the 804 sender, not the network, is responsible for smoothing out variations. 806 This requirement is worded as 'SHOULD NOT' rather than 'MUST NOT' to 807 allow for the case where the signals from a Classic smoothed AQM are 808 coupled with those from an unsmoothed L4S AQM. Nonetheless, the 809 spirit of the requirement is for all systems to expect that L4S ECN 810 signalling is unsmoothed and unfiltered, which is important for 811 interoperability. 813 5.3. Exception for L4S Packet Identification by Network Nodes with 814 Transport-Layer Awareness 816 To implement the L4S treatment, a network node does not need to 817 identify transport-layer flows. Nonetheless, if an implementer is 818 willing to identify transport-layer flows at a network node, and if 819 the most recent ECT packet in the same flow was ECT(0), the node MAY 820 classify CE packets for Classic ECN [RFC3168] treatment. In all 821 other cases, a network node MUST classify all CE packets for L4S 822 treatment. Examples of such other cases are: i) if no ECT packets 823 have yet been identified in a flow; ii) if it is not desirable for a 824 network node to identify transport-layer flows; or iii) if the most 825 recent ECT packet in a flow was ECT(1). 827 If an implementer uses flow-awareness to classify CE packets, to 828 determine whether the flow is using ECT(0) or ECT(1) it only uses the 829 most recent ECT packet of a flow (this advice will need to be 830 verified as part of L4S experiments). This is because a sender might 831 switch from sending ECT(1) (L4S) packets to sending ECT(0) (Classic 832 ECN) packets, or back again, in the middle of a transport-layer flow 833 (e.g. it might manually switch its congestion control module mid- 834 connection, or it might be deliberately attempting to confuse the 835 network). 837 5.4. Interaction of the L4S Identifier with other Identifiers 839 The examples in this section concern how additional identifiers might 840 complement the L4S identifier to classify packets between class-based 841 queues. Firstly Section 5.4.1 considers two queues, L4S and Classic, 842 as in the Coupled DualQ AQM [I-D.ietf-tsvwg-aqm-dualq-coupled], 843 either alone (Section 5.4.1.1) or within a larger queuing hierarchy 844 (Section 5.4.1.2). Then Section 5.4.2 considers schemes that might 845 combine per-flow 5-tuples with other identifiers. 847 5.4.1. 
DualQ Examples of Other Identifiers Complementing L4S 848 Identifiers 850 5.4.1.1. Inclusion of Additional Traffic with L4S 852 In a typical case for the public Internet a network element that 853 implements L4S in a shared queue might want to classify some low-rate 854 but unresponsive traffic (e.g. DNS, LDAP, NTP, voice, game sync 855 packets) into the low latency queue to mix with L4S traffic. 857 In this case it would not be appropriate to call the queue an L4S 858 queue, because it is shared by L4S and non-L4S traffic. Instead it 859 will be called the low latency or L queue. The L queue then offers 860 two different treatments: 862 o The L4S treatment, which is a combination of the L4S AQM treatment 863 and a priority scheduling treatment; 865 o The low latency treatment, which is solely the priority scheduling 866 treatment, without ECN-marking by the AQM. 868 To identify packets for just the scheduling treatment, it would be 869 inappropriate to use the L4S ECT(1) identifier, because such traffic 870 is unresponsive to ECN marking. Examples of relevant non-ECN 871 identifiers are: 873 o address ranges of specific applications or hosts configured to be, 874 or known to be, safe, e.g. hard-coded IoT devices sending low 875 intensity traffic; 877 o certain low data-volume applications or protocols (e.g. ARP, DNS); 879 o specific Diffserv codepoints that indicate traffic with limited 880 burstiness such as the EF (Expedited Forwarding [RFC3246]), Voice- 881 Admit [RFC5865] or proposed NQB (Non-Queue-Building 882 [I-D.ietf-tsvwg-nqb]) service classes or equivalent local-use 883 DSCPs (see [I-D.briscoe-tsvwg-l4s-diffserv]). 885 In summary, a network element that implements L4S in a shared queue 886 MAY classify additional types of packets into the L queue based on 887 identifiers other than the ECN field, but the types SHOULD be 'safe' 888 to mix with L4S traffic, where 'safe' is explained in 889 Section 5.4.1.1.1. 891 A packet that carries one of these non-ECN identifiers to classify it 892 into the L queue would not be subject to the L4S ECN marking 893 treatment, unless it also carried an ECT(1) or CE codepoint. The 894 specification of an L4S AQM MUST define the behaviour for packets 895 with unexpected combinations of codepoints, e.g. a non-ECN-based 896 classifier for the L queue, but ECT(0) in the ECN field (for examples 897 see section 2.5.1.1 of [I-D.ietf-tsvwg-aqm-dualq-coupled]). 899 For clarity, non-ECN identifiers, such as the examples itemized 900 above, might be used by some network operators who believe they 901 identify non-L4S traffic that would be safe to mix with L4S traffic. 902 They are not alternative ways for a host to indicate that it is 903 sending L4S packets. Only the ECT(1) ECN codepoint indicates to a 904 network element that a host is sending L4S packets (and CE indicates 905 that it could have originated as ECT(1)). Specifically ECT(1) 906 indicates that the host claims its behaviour satisfies the 907 prerequisite transport requirements in Section 4. 909 To include additional traffic with L4S, a network element only reads 910 identifiers such as those itemized above. It MUST NOT alter these 911 non-ECN identifiers, so that they survive for any potential use later 912 on the network path. 914 5.4.1.1.1. 'Safe' Unresponsive Traffic 916 The above section requires unresponsive traffic to be 'safe' to mix 917 with L4S traffic. 
Ideally this means that the sender never sends any 918 sequence of packets at a rate that exceeds the available capacity of 919 the bottleneck link. However, typically an unresponsive transport 920 does not even know the bottleneck capacity of the path, let alone its 921 available capacity. Nonetheless, an application can be considered 922 safe enough if it paces packets out (not necessarily completely 923 regularly) such that its maximum instantaneous rate from packet to 924 packet stays well below a typical broadband access rate. 926 This is a vague but useful definition, because many low latency 927 applications of interest, such as DNS, voice, game sync packets, RPC, 928 ACKs, keep-alives, could match this description. 930 5.4.1.2. Exclusion of Traffic From L4S Treatment 932 To extend the above example, an operator might want to exclude some 933 traffic from the L4S treatment for a policy reason, e.g. security 934 (traffic from malicious sources) or commercial (e.g. initially the 935 operator may wish to confine the benefits of L4S to business 936 customers). 938 In this exclusion case, the operator MUST classify on the relevant 939 locally-used identifiers (e.g. source addresses) before classifying 940 the non-matching traffic on the end-to-end L4S ECN identifier. 942 The operator MUST NOT alter the end-to-end L4S ECN identifier from 943 L4S to Classic, because an operator decision to exclude certain 944 traffic from L4S treatment is local-only. The end-to-end L4S 945 identifier then survives for other operators to use, or indeed, they 946 can apply their own policy, independently based on their own choice 947 of locally-used identifiers. This approach also allows any operator 948 to remove its locally-applied exclusions in future, e.g. if it wishes 949 to widen the benefit of the L4S treatment to all its customers. 951 An operator that excludes traffic carrying the L4S identifier from 952 L4S treatment MUST NOT treat such traffic as if it carries the ECT(0) 953 codepoint, which could confuse the sender. 955 5.4.1.3. Generalized Combination of L4S and Other Identifiers 957 L4S concerns low latency, which it can provide for all traffic 958 without differentiation and without _necessarily_ affecting bandwidth 959 allocation. Diffserv provides for differentiation of both bandwidth 960 and low latency, but its control of latency depends on its control of 961 bandwidth. The two can be combined if a network operator wants to 962 control bandwidth allocation but it also wants to provide low latency 963 - for any amount of traffic within one of these allocations of 964 bandwidth (rather than only providing low latency by limiting 965 bandwidth) [I-D.briscoe-tsvwg-l4s-diffserv]. 967 The DualQ examples so far have been framed in the context of 968 providing the default Best Efforts Per-Hop Behaviour (PHB) using two 969 queues - a Low Latency (L) queue and a Classic (C) Queue. This 970 single DualQ structure is expected to be the most common and useful 971 arrangement. But, more generally, an operator might choose to 972 control bandwidth allocation through a hierarchy of Diffserv PHBs at 973 a node, and to offer one (or more) of these PHBs with a low latency 974 and a Classic variant. 976 In the first case, if we assume that a network element provides no 977 PHBs except the DualQ, if a packet carries ECT(1) or CE, the network 978 element would classify it for the L4S treatment irrespective of its 979 DSCP. 
And, if a packet carried (say) the EF DSCP, the network 980 element could classify it into the L queue irrespective of its ECN 981 codepoint. However, where the DualQ is in a hierarchy of other PHBs, 982 the classifier would classify some traffic into other PHBs based on 983 DSCP before classifying between the low latency and Classic queues 984 (based on ECT(1), CE and perhaps also the EF DSCP or other 985 identifiers as in the above example). 986 [I-D.briscoe-tsvwg-l4s-diffserv] gives a number of examples of such 987 arrangements to address various requirements. 989 [I-D.briscoe-tsvwg-l4s-diffserv] describes how an operator might use 990 L4S to offer low latency for all L4S traffic as well as using 991 Diffserv for bandwidth differentiation. It identifies two main types 992 of approach, which can be combined: the operator might split certain 993 Diffserv PHBs between L4S and a corresponding Classic service. Or it 994 might split the L4S and/or the Classic service into multiple Diffserv 995 PHBs. In either of these cases, a packet would have to be classified 996 on its Diffserv and ECN codepoints. 998 In summary, there are numerous ways in which the L4S ECN identifier 999 (ECT(1) and CE) could be combined with other identifiers to achieve 1000 particular objectives. The following categorization articulates 1001 those that are valid, but it is not necessarily exhaustive. Those 1002 tagged 'Recommended-standard-use' could be set by the sending host or 1003 a network. Those tagged 'Local-use' would only be set by a network: 1005 1. Identifiers Complementing the L4S Identifier 1007 A. Including More Traffic in the L Queue 1008 (Could use Recommended-standard-use or Local-use identifiers) 1010 B. Excluding Certain Traffic from the L Queue 1011 (Local-use only) 1013 2. Identifiers to place L4S classification in a PHB Hierarchy 1014 (Could use Recommended-standard-use or Local-use identifiers) 1016 A. PHBs Before L4S ECN Classification 1018 B. PHBs After L4S ECN Classification 1020 5.4.2. Per-Flow Queuing Examples of Other Identifiers Complementing L4S 1021 Identifiers 1023 At a node with per-flow queueing (e.g. FQ-CoDel [RFC8290]), the L4S 1024 identifier could complement the Layer-4 flow ID as a further level of 1025 flow granularity (i.e. Not-ECT and ECT(0) queued separately from 1026 ECT(1) and CE packets). "Risk of reordering Classic CE packets" in 1027 Appendix B discusses the resulting ambiguity if packets originally 1028 marked ECT(0) are marked CE by an upstream AQM before they arrive at 1029 a node that classifies CE as L4S. It argues that the risk of 1030 reordering is vanishingly small and the consequence of such a low 1031 level of reordering is minimal. 1033 Alternatively, it could be assumed that it is not in a flow's own 1034 interest to mix Classic and L4S identifiers. Then the AQM could use 1035 the ECN field to switch itself between a Classic and an L4S AQM 1036 behaviour within one per-flow queue. For instance, for ECN-capable 1037 packets, the AQM might consist of a simple marking threshold and an 1038 L4S ECN identifier might simply select a shallower threshold than a 1039 Classic ECN identifier would. 1041 5.5. Limiting Packet Bursts from Links Supporting L4S AQMs 1043 As well as senders needing to limit packet bursts (Section 4.3), 1044 links need to limit the degree of burstiness they introduce. In both 1045 cases (senders and links) this is a tradeoff, because batch-handling 1046 of packets is done for good reason, e.g. 
processing efficiency or to 1047 make efficient use of medium acquisition delay. Some take the 1048 attitude that there is no point reducing burst delay at the sender 1049 below that introduced by links (or vice versa). However, delay 1050 reduction proceeds by cutting down 'the longest pole in the tent', 1051 which turns the spotlight on the next longest, and so on.

1053 This document does not set any quantified requirements for links to 1054 limit burst delay, primarily because link technologies are outside 1055 the remit of L4S specifications. Nonetheless, it would not make 1056 sense to implement an L4S AQM that feeds into a particular link 1057 technology without also reviewing opportunities to reduce any form of 1058 burst delay introduced by that link technology. This would at least 1059 limit the bursts that the link would otherwise introduce into the 1060 onward traffic, which would cause jumpy feedback to the sender as 1061 well as potential extra queuing delay downstream. This document does 1062 not presume to even give guidance on an appropriate target for such 1063 burst delay until there is more industry experience of L4S. However, 1064 as suggested in Section 4.3, it would not seem necessary to limit 1065 bursts lower than roughly 10% of the minimum base RTT expected in the 1066 typical deployment scenario (e.g. 250 us burst duration for links 1067 within the public Internet).

1069 6. Behaviour of Tunnels and Encapsulations

1071 6.1. No Change to ECN Tunnels and Encapsulations in General

1073 The L4S identifier is expected to work through and within any tunnel 1074 without modification, as long as the tunnel propagates the ECN field 1075 in any of the ways that have been defined since the first variant in 1076 the year 2001 [RFC3168]. L4S will also work with (but does not rely 1077 on) any of the more recent updates to ECN propagation in [RFC4301], 1078 [RFC6040] or [I-D.ietf-tsvwg-rfc6040update-shim]. However, it is 1079 likely that some tunnels still do not implement ECN propagation at 1080 all. In these cases, L4S will work through such tunnels, but within 1081 them the outer header of L4S traffic will appear as Classic.

1083 AQMs are typically implemented where an IP-layer buffer feeds into a 1084 lower layer, so they are agnostic to link layer encapsulations. 1085 Where a bottleneck link is not IP-aware, the L4S identifier is still 1086 expected to work within any lower layer encapsulation without 1087 modification, as long as it propagates the ECN field as defined for the 1088 link technology, for example for MPLS [RFC5129] or TRILL 1089 [I-D.ietf-trill-ecn-support]. In some of these cases, e.g. layer-3 1090 Ethernet switches, the AQM accesses the IP layer header within the 1091 outer encapsulation, so again the L4S identifier is expected to work 1092 without modification. Nonetheless, the programme to define ECN for 1093 other lower layers is still in progress 1094 [I-D.ietf-tsvwg-ecn-encap-guidelines].

1096 6.2. VPN Behaviour to Avoid Limitations of Anti-Replay

1098 If a mix of L4S and Classic packets is sent into the same security 1099 association (SA) of a virtual private network (VPN), and if the VPN 1100 egress is employing the optional anti-replay feature, it could 1101 inappropriately discard Classic packets (or discard the records in 1102 Classic packets) by mistaking their greater queuing delay for a 1103 replay attack (see [Heist21] for the potential performance impact).
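As a non-normative illustration of why this happens, the following Python sketch models a simplified sliding anti-replay window (real IPsec and DTLS implementations use a bitmap, but the discard rule at the trailing edge is the same); the window size and sequence numbers here are hypothetical:

   # Simplified anti-replay check (non-normative sketch).
   class AntiReplayWindow:
       def __init__(self, size):
           self.size = size
           self.highest = 0          # highest sequence number seen so far
           self.seen = set()

       def accept(self, seq):
           if seq > self.highest:
               self.highest = seq    # window slides forward
           if seq <= self.highest - self.size:
               return False          # behind trailing edge: discarded
           if seq in self.seen:
               return False          # genuine replay
           self.seen.add(seq)
           return True

   # L4S packets (low delay) advance the window; Classic packets of the
   # same SA arrive later and, with a window of only 4, are discarded.
   w = AntiReplayWindow(size=4)
   arrivals = [1, 2, 10, 11, 12, 3, 4]   # 3 and 4 are delayed Classic packets
   print([(s, w.accept(s)) for s in arrivals])
   # 3 and 4 are rejected even though they are not replays.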
1104 This known problem is common to both IPsec [RFC4301] and DTLS 1105 [RFC6347] VPNs, given they use similar anti-replay window mechanisms. 1106 The mechanism used can only check for replay within its window, so if 1107 the window is smaller than the degree of reordering, it can only 1108 assume there might be a replay attack and discard all the packets 1109 behind the trailing edge of the window. The specifications of IPsec 1110 AH [RFC4302] and ESP [RFC4303] suggest that an implementer scales the 1111 size of the anti-replay window with interface speed, and the current 1112 draft of DTLS 1.3 [I-D.ietf-tls-dtls13] says "The receiver SHOULD 1113 pick a window large enough to handle any plausible reordering, which 1114 depends on the data rate." However, in practice, the size of a VPN's 1115 anti-replay window is not always scaled appropriately. 1117 If a VPN carrying traffic participating in the L4S experiment 1118 experiences inappropriate replay detection, the foremost remedy would 1119 be to ensure that the egress is configured to comply with the above 1120 window-sizing requirements. 1122 If an implementation of a VPN egress does not support a sufficiently 1123 large anti-replay window, e.g. due to hardware limitations, one of 1124 the temporary alternatives listed in order of preference below might 1125 be feasible instead: 1127 o If the VPN can be configured to classify packets into different 1128 SAs indexed by DSCP, apply the appropriate locally defined DSCPs 1129 to Classic and L4S packets. The DSCPs could be applied by the 1130 network (based on the least significant bit of the ECN field), or 1131 by the sending host. Such DSCPs would only need to survive as far 1132 as the VPN ingress. 1134 o If the above is not possible and it is necessary to use L4S, 1135 either of the following might be appropriate as a last resort: 1137 * disable anti-replay protection at the VPN egress, after 1138 considering the security implications (optional anti-replay is 1139 mandatory in both IPsec and DTLS); 1141 * configure the tunnel ingress not to propagate ECN to the outer, 1142 which would lose the benefits of L4S and Classic ECN over the 1143 VPN. 1145 Modification to VPN implementations is outside the present scope, 1146 which is why this section has so far focused on reconfiguration. 1147 Although this document does not define any requirements for VPN 1148 implementations, determining whether there is a need for such 1149 requirements could be one aspect of L4S experimentation. 1151 7. L4S Experiments 1153 This section describes open questions that L4S Experiments ought to 1154 focus on. This section also documents outstanding open issues that 1155 will need to be investigated as part of L4S experimentation, given 1156 they could not be fully resolved during the WG phase. It also lists 1157 metrics that will need to be monitored during experiments 1158 (summarizing text elsewhere in L4S documents) and finally lists some 1159 potential future directions that researchers might wish to 1160 investigate. 1162 In addition to this section, [I-D.ietf-tsvwg-aqm-dualq-coupled] sets 1163 operational and management requirements for experiments with DualQ 1164 Coupled AQMs; and General operational and management requirements for 1165 experiments with L4S congestion controls are given in Section 4 and 1166 Section 5 above, e.g. co-existence and scaling requirements, 1167 incremental deployment arrangements. 
1169 The specification of each scalable congestion control will need to 1170 include protocol-specific requirements for configuration and 1171 monitoring performance during experiments. Appendix A of [RFC5706] 1172 provides a helpful checklist.

1174 7.1. Open Questions

1176 L4S experiments would be expected to answer the following questions:

1178 o Have all the parts of L4S been deployed, and if so, what 1179 proportion of paths support it?

1181 o Does use of L4S over the Internet result in significantly improved 1182 user experience?

1184 o Has L4S enabled novel interactive applications?

1186 o Did use of L4S over the Internet result in improvements to the 1187 following metrics:

1191 * queue delay (mean and 99th percentile) under various loads; 1192 * utilization; 1194 * starvation / fairness; 1196 * scaling range of flow rates and RTTs?

1198 o How much does burstiness in the Internet affect L4S performance, 1199 and how much limitation of burstiness was needed and/or was 1200 realized - both at senders and at links, especially radio links?

1202 o Was per-flow queue protection typically (un)necessary?

1204 * How well did overload protection or queue protection work?

1206 o How well did L4S flows coexist with Classic flows when sharing a 1207 bottleneck?

1211 * How frequently did problems arise?

1213 * What caused any coexistence problems, and were any problems due 1214 to single-queue Classic ECN AQMs (this assumes single-queue 1215 Classic ECN AQMs can be distinguished from FQ ones)?

1217 o How prevalent were problems with the L4S service due to tunnels / 1218 encapsulations that do not support ECN decapsulation?

1220 o How easy was it to implement a fully compliant L4S congestion 1221 control, over various different transport protocols (TCP, QUIC, 1222 RMCAT, etc.)?

1224 Monitoring for harm to other traffic, specifically bandwidth 1225 starvation or excess queuing delay, will need to be conducted 1226 alongside all early L4S experiments. It is hard, if not impossible, 1227 for an individual flow to measure its impact on other traffic. So 1228 such monitoring will need to be conducted using bespoke monitoring 1229 across flows and/or across classes of traffic.

1231 7.2. Open Issues

1233 o What is the best way forward to deal with L4S over single-queue 1234 Classic ECN AQM bottlenecks, given current problems with 1235 misdetecting L4S AQMs as Classic ECN AQMs?

1237 o Fixing the poor interaction between current L4S congestion 1238 controls and CoDel with only Classic ECN support during flow 1239 startup.

1241 7.3. Future Potential

1243 Researchers might find that L4S opens up the following interesting 1244 areas for investigation:

1246 o Potential for faster convergence time and tracking of available 1247 capacity;

1249 o Potential for improvements to particular link technologies, and 1250 cross-layer interactions with them;

1252 o Potential for using virtual queues, e.g. to further reduce latency 1253 jitter, or to leave headroom for capacity variation in radio 1254 networks;

1256 o Development and specification of reverse path congestion control 1257 using L4S building blocks (e.g. AccECN, QUIC);

1259 o Once queuing delay is cut down, what becomes the 'second longest 1260 pole in the tent' (other than the speed of light)?

1262 o Novel alternatives to the existing set of L4S AQMs;

1264 o Novel applications enabled by L4S.

1266 8. IANA Considerations

1268 The 01 codepoint of the ECN Field of the IP header is specified by 1269 the present Experimental RFC.
The process for an experimental RFC to 1270 assign this codepoint in the IP header (v4 and v6) is documented in 1271 Proposed Standard [RFC8311], which updates the Proposed Standard 1272 [RFC3168]. 1274 When the present document is published as an RFC, IANA is asked to 1275 update the 01 entry in the registry, "ECN Field (Bits 6-7)" to the 1276 following (see https://www.iana.org/assignments/dscp-registry/dscp- 1277 registry.xhtml#ecn-field ): 1279 +--------+-----------------------------+----------------------------+ 1280 | Binary | Keyword | References | 1281 +--------+-----------------------------+----------------------------+ 1282 | 01 | ECT(1) (ECN-Capable | [RFC8311] | 1283 | | Transport(1))[1] | [RFC Errata 5399] | 1284 | | | [RFCXXXX] | 1285 +--------+-----------------------------+----------------------------+ 1287 [XXXX is the number that the RFC Editor assigns to the present 1288 document (this sentence to be removed by the RFC Editor)]. 1290 9. Security Considerations 1292 Approaches to assure the integrity of signals using the new 1293 identifier are introduced in Appendix C.1. See the security 1294 considerations in the L4S architecture [I-D.ietf-tsvwg-l4s-arch] for 1295 further discussion of mis-use of the identifier, as well as extensive 1296 discussion of policing rate and latency in regard to L4S. 1298 If the anti-replay window of a VPN egress is too small, it will 1299 mistake deliberate delay differences as a replay attack, and discard 1300 higher delay packets (e.g. Classic) carried within the same security 1301 association (SA) as low delay packets (e.g. L4S). Section 6.2 1302 recommends that VPNs used in L4S experiments are configured with a 1303 sufficiently large anti-replay window, as required by the relevant 1304 specifications. It also discusses other alternatives. 1306 If a user taking part in the L4S experiment sets up a VPN without 1307 being aware of the above advice, and if the user allows anyone to 1308 send traffic into their VPN, they would open up a DoS vulnerability 1309 in which an attacker could induce the VPN's anti-replay mechanism to 1310 discard enough of the user's Classic (C) traffic (if they are 1311 receiving any) to cause a significant rate reduction. While the user 1312 is actively downloading C traffic, the attacker sends C traffic into 1313 the VPN to fill the remainder of the bottleneck link, then sends 1314 intermittent L4S packets to maximize the chance of exceeding the 1315 VPN's replay window. The user can prevent this attack by following 1316 the recommendations in Section 6.2. 1318 The recommendation to detect loss in time units prevents the ACK- 1319 splitting attacks described in [Savage-TCP]. 1321 10. Acknowledgements 1323 Thanks to Richard Scheffenegger, John Leslie, David Taeht, Jonathan 1324 Morton, Gorry Fairhurst, Michael Welzl, Mikael Abrahamsson and Andrew 1325 McGregor for the discussions that led to this specification. Ing-jyh 1326 (Inton) Tsang was a contributor to the early drafts of this document. 1327 And thanks to Mikael Abrahamsson, Lloyd Wood, Nicolas Kuhn, Greg 1328 White, Tom Henderson, David Black, Gorry Fairhurst, Brian Carpenter, 1329 Jake Holland, Rod Grimes, Richard Scheffenegger, Sebastian Moeller, 1330 Neal Cardwell, Praveen Balasubramanian, Reza Marandian Hagh, Stuart 1331 Cheshire and Vidhi Goel for providing help and reviewing this draft 1332 and thanks to Ingemar Johansson for reviewing and providing 1333 substantial text. 
Thanks to Sebastian Moeller for identifying the 1334 interaction with VPN anti-replay and to Jonathan Morton for 1335 identifying the attack based on this. Particular thanks to tsvwg 1336 chairs Gorry Fairhurst, David Black and Wes Eddy for patiently 1337 helping this and the other L4S drafts through the IETF process. 1339 Appendix A listing the Prague L4S Requirements is based on text 1340 authored by Marcelo Bagnulo Braun that was originally an appendix to 1341 [I-D.ietf-tsvwg-l4s-arch]. That text was in turn based on the 1342 collective output of the attendees listed in the minutes of a 'bar 1343 BoF' on DCTCP Evolution during IETF-94 [TCPPrague]. 1345 The authors' contributions were part-funded by the European Community 1346 under its Seventh Framework Programme through the Reducing Internet 1347 Transport Latency (RITE) project (ICT-317700). Bob Briscoe was also 1348 funded partly by the Research Council of Norway through the TimeIn 1349 project, partly by CableLabs and partly by the Comcast Innovation 1350 Fund. The views expressed here are solely those of the authors. 1352 11. References 1354 11.1. Normative References 1356 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1357 Requirement Levels", BCP 14, RFC 2119, 1358 DOI 10.17487/RFC2119, March 1997, 1359 . 1361 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 1362 of Explicit Congestion Notification (ECN) to IP", 1363 RFC 3168, DOI 10.17487/RFC3168, September 2001, 1364 . 1366 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the 1367 Explicit Congestion Notification (ECN) Field", BCP 124, 1368 RFC 4774, DOI 10.17487/RFC4774, November 2006, 1369 . 1371 [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., 1372 and K. Carlberg, "Explicit Congestion Notification (ECN) 1373 for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 1374 2012, . 1376 11.2. Informative References 1378 [A2DTCP] Zhang, T., Wang, J., Huang, J., Huang, Y., Chen, J., and 1379 Y. Pan, "Adaptive-Acceleration Data Center TCP", IEEE 1380 Transactions on Computers 64(6):1522-1533, June 2015, 1381 . 1384 [Ahmed19] Ahmed, A., "Extending TCP for Low Round Trip Delay", 1385 Masters Thesis, Uni Oslo , August 2019, 1386 . 1388 [Alizadeh-stability] 1389 Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis 1390 of DCTCP: Stability, Convergence, and Fairness", ACM 1391 SIGMETRICS 2011 , June 2011. 1393 [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An 1394 Algorithm for Increasing the Robustness of RED's Active 1395 Queue Management", ACIRI Technical Report , August 2001, 1396 . 1398 [BBRv2] Cardwell, N., "TCP BBR v2 Alpha/Preview Release", github 1399 repository; Linux congestion control module, 1400 . 1402 [DCttH15] De Schepper, K., Bondarenko, O., Briscoe, B., and I. 1403 Tsang, "'Data Centre to the Home': Ultra-Low Latency for 1404 All", RITE Project Technical Report , 2015, 1405 . 1407 [DualPI2Linux] 1408 Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O., 1409 and H. Steen, "DUALPI2 - Low Latency, Low Loss and 1410 Scalable (L4S) AQM", Proc. Linux Netdev 0x13 , March 2019, 1411 . 1414 [ecn-fallback] 1415 Briscoe, B. and A. Ahmed, "TCP Prague Fall-back on 1416 Detection of a Classic ECN AQM", bobbriscoe.net Technical 1417 Report TR-BB-2019-002, April 2020, 1418 . 1420 [Heist21] Heist, P., "Dropped Packets for Tunnels with Replay 1421 Protection Enabled", github README, May 2021, 1422 . 1425 [I-D.briscoe-docsis-q-protection] 1426 Briscoe, B. and G. 
White, "Queue Protection to Preserve 1427 Low Latency", draft-briscoe-docsis-q-protection-00 (work 1428 in progress), July 2019. 1430 [I-D.briscoe-iccrg-prague-congestion-control] 1431 Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague 1432 Congestion Control", draft-briscoe-iccrg-prague- 1433 congestion-control-00 (work in progress), March 2021. 1435 [I-D.briscoe-tsvwg-l4s-diffserv] 1436 Briscoe, B., "Interactions between Low Latency, Low Loss, 1437 Scalable Throughput (L4S) and Differentiated Services", 1438 draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), 1439 November 2018. 1441 [I-D.ietf-tcpm-accurate-ecn] 1442 Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More 1443 Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate- 1444 ecn-15 (work in progress), July 2021. 1446 [I-D.ietf-tcpm-generalized-ecn] 1447 Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit 1448 Congestion Notification (ECN) to TCP Control Packets", 1449 draft-ietf-tcpm-generalized-ecn-07 (work in progress), 1450 February 2021. 1452 [I-D.ietf-tls-dtls13] 1453 Rescorla, E., Tschofenig, H., and N. Modadugu, "The 1454 Datagram Transport Layer Security (DTLS) Protocol Version 1455 1.3", draft-ietf-tls-dtls13-43 (work in progress), April 1456 2021. 1458 [I-D.ietf-trill-ecn-support] 1459 Eastlake, D. E. and B. Briscoe, "TRILL (TRansparent 1460 Interconnection of Lots of Links): ECN (Explicit 1461 Congestion Notification) Support", draft-ietf-trill-ecn- 1462 support-07 (work in progress), February 2018. 1464 [I-D.ietf-tsvwg-aqm-dualq-coupled] 1465 Schepper, K. D., Briscoe, B., and G. White, "DualQ Coupled 1466 AQMs for Low Latency, Low Loss and Scalable Throughput 1467 (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-16 (work in 1468 progress), July 2021. 1470 [I-D.ietf-tsvwg-ecn-encap-guidelines] 1471 Briscoe, B. and J. Kaippallimalil, "Guidelines for Adding 1472 Congestion Notification to Protocols that Encapsulate IP", 1473 draft-ietf-tsvwg-ecn-encap-guidelines-16 (work in 1474 progress), May 2021. 1476 [I-D.ietf-tsvwg-l4s-arch] 1477 Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White, 1478 "Low Latency, Low Loss, Scalable Throughput (L4S) Internet 1479 Service: Architecture", draft-ietf-tsvwg-l4s-arch-10 (work 1480 in progress), July 2021. 1482 [I-D.ietf-tsvwg-l4sops] 1483 White, G., "Operational Guidance for Deployment of L4S in 1484 the Internet", draft-ietf-tsvwg-l4sops-01 (work in 1485 progress), July 2021. 1487 [I-D.ietf-tsvwg-nqb] 1488 White, G. and T. Fossati, "A Non-Queue-Building Per-Hop 1489 Behavior (NQB PHB) for Differentiated Services", draft- 1490 ietf-tsvwg-nqb-06 (work in progress), July 2021. 1492 [I-D.ietf-tsvwg-rfc6040update-shim] 1493 Briscoe, B., "Propagating Explicit Congestion Notification 1494 Across IP Tunnel Headers Separated by a Shim", draft-ietf- 1495 tsvwg-rfc6040update-shim-14 (work in progress), May 2021. 1497 [I-D.sridharan-tcpm-ctcp] 1498 Sridharan, M., Tan, K., Bansal, D., and D. Thaler, 1499 "Compound TCP: A New TCP Congestion Control for High-Speed 1500 and Long Distance Networks", draft-sridharan-tcpm-ctcp-02 1501 (work in progress), November 2008. 1503 [I-D.stewart-tsvwg-sctpecn] 1504 Stewart, R. R., Tuexen, M., and X. Dong, "ECN for Stream 1505 Control Transmission Protocol (SCTP)", draft-stewart- 1506 tsvwg-sctpecn-05 (work in progress), January 2014. 1508 [LinuxPacedChirping] 1509 Misund, J. and B. Briscoe, "Paced Chirping - Rethinking 1510 TCP start-up", Proc. Linux Netdev 0x13 , March 2019, 1511 . 
1513 [Mathis09] 1514 Mathis, M., "Relentless Congestion Control", PFLDNeT'09 , 1515 May 2009, . 1518 [Paced-Chirping] 1519 Misund, J., "Rapid Acceleration in TCP Prague", Masters 1520 Thesis , May 2018, 1521 . 1524 [PI2] De Schepper, K., Bondarenko, O., Tsang, I., and B. 1525 Briscoe, "PI^2 : A Linearized AQM for both Classic and 1526 Scalable TCP", Proc. ACM CoNEXT 2016 pp.105-119, December 1527 2016, 1528 . 1530 [PragueLinux] 1531 Briscoe, B., De Schepper, K., Albisser, O., Misund, J., 1532 Tilmans, O., Kuehlewind, M., and A. Ahmed, "Implementing 1533 the `TCP Prague' Requirements for Low Latency Low Loss 1534 Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 , 1535 March 2019, . 1538 [QV] Briscoe, B. and P. Hurtig, "Up to Speed with Queue View", 1539 RITE Technical Report D2.3; Appendix C.2, August 2015, 1540 . 1543 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 1544 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 1545 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 1546 S., Wroclawski, J., and L. Zhang, "Recommendations on 1547 Queue Management and Congestion Avoidance in the 1548 Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, 1549 . 1551 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 1552 "Definition of the Differentiated Services Field (DS 1553 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1554 DOI 10.17487/RFC2474, December 1998, 1555 . 1557 [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, 1558 J., Courtney, W., Davari, S., Firoiu, V., and D. 1559 Stiliadis, "An Expedited Forwarding PHB (Per-Hop 1560 Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, 1561 . 1563 [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit 1564 Congestion Notification (ECN) Signaling with Nonces", 1565 RFC 3540, DOI 10.17487/RFC3540, June 2003, 1566 . 1568 [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", 1569 RFC 3649, DOI 10.17487/RFC3649, December 2003, 1570 . 1572 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 1573 Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, 1574 December 2005, . 1576 [RFC4302] Kent, S., "IP Authentication Header", RFC 4302, 1577 DOI 10.17487/RFC4302, December 2005, 1578 . 1580 [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", 1581 RFC 4303, DOI 10.17487/RFC4303, December 2005, 1582 . 1584 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 1585 Congestion Control Protocol (DCCP)", RFC 4340, 1586 DOI 10.17487/RFC4340, March 2006, 1587 . 1589 [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion 1590 Control Protocol (DCCP) Congestion Control ID 2: TCP-like 1591 Congestion Control", RFC 4341, DOI 10.17487/RFC4341, March 1592 2006, . 1594 [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for 1595 Datagram Congestion Control Protocol (DCCP) Congestion 1596 Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, 1597 DOI 10.17487/RFC4342, March 2006, 1598 . 1600 [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", 1601 RFC 4960, DOI 10.17487/RFC4960, September 2007, 1602 . 1604 [RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion 1605 Control Algorithms", BCP 133, RFC 5033, 1606 DOI 10.17487/RFC5033, August 2007, 1607 . 1609 [RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion 1610 Marking in MPLS", RFC 5129, DOI 10.17487/RFC5129, January 1611 2008, . 1613 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. 
Widmer, "TCP 1614 Friendly Rate Control (TFRC): Protocol Specification", 1615 RFC 5348, DOI 10.17487/RFC5348, September 2008, 1616 . 1618 [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. 1619 Ramakrishnan, "Adding Explicit Congestion Notification 1620 (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, 1621 DOI 10.17487/RFC5562, June 2009, 1622 . 1624 [RFC5622] Floyd, S. and E. Kohler, "Profile for Datagram Congestion 1625 Control Protocol (DCCP) Congestion ID 4: TCP-Friendly Rate 1626 Control for Small Packets (TFRC-SP)", RFC 5622, 1627 DOI 10.17487/RFC5622, August 2009, 1628 . 1630 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1631 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 1632 . 1634 [RFC5706] Harrington, D., "Guidelines for Considering Operations and 1635 Management of New Protocols and Protocol Extensions", 1636 RFC 5706, DOI 10.17487/RFC5706, November 2009, 1637 . 1639 [RFC5865] Baker, F., Polk, J., and M. Dolly, "A Differentiated 1640 Services Code Point (DSCP) for Capacity-Admitted Traffic", 1641 RFC 5865, DOI 10.17487/RFC5865, May 2010, 1642 . 1644 [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP 1645 Authentication Option", RFC 5925, DOI 10.17487/RFC5925, 1646 June 2010, . 1648 [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion 1649 Notification", RFC 6040, DOI 10.17487/RFC6040, November 1650 2010, . 1652 [RFC6077] Papadimitriou, D., Ed., Welzl, M., Scharf, M., and B. 1653 Briscoe, "Open Research Issues in Internet Congestion 1654 Control", RFC 6077, DOI 10.17487/RFC6077, February 2011, 1655 . 1657 [RFC6347] Rescorla, E. and N. Modadugu, "Datagram Transport Layer 1658 Security Version 1.2", RFC 6347, DOI 10.17487/RFC6347, 1659 January 2012, . 1661 [RFC6660] Briscoe, B., Moncaster, T., and M. Menth, "Encoding Three 1662 Pre-Congestion Notification (PCN) States in the IP Header 1663 Using a Single Diffserv Codepoint (DSCP)", RFC 6660, 1664 DOI 10.17487/RFC6660, July 2012, 1665 . 1667 [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., 1668 and Y. Nishida, "A Conservative Loss Recovery Algorithm 1669 Based on Selective Acknowledgment (SACK) for TCP", 1670 RFC 6675, DOI 10.17487/RFC6675, August 2012, 1671 . 1673 [RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe, 1674 "Problem Statement and Requirements for Increased Accuracy 1675 in Explicit Congestion Notification (ECN) Feedback", 1676 RFC 7560, DOI 10.17487/RFC7560, August 2015, 1677 . 1679 [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF 1680 Recommendations Regarding Active Queue Management", 1681 BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, 1682 . 1684 [RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) 1685 Concepts, Abstract Mechanism, and Requirements", RFC 7713, 1686 DOI 10.17487/RFC7713, December 2015, 1687 . 1689 [RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, 1690 "Proportional Integral Controller Enhanced (PIE): A 1691 Lightweight Control Scheme to Address the Bufferbloat 1692 Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, 1693 . 1695 [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., 1696 and G. Judd, "Data Center TCP (DCTCP): TCP Congestion 1697 Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, 1698 October 2017, . 1700 [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, 1701 J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler 1702 and Active Queue Management Algorithm", RFC 8290, 1703 DOI 10.17487/RFC8290, January 2018, 1704 . 
1706 [RFC8298] Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation 1707 for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December 1708 2017, .

1710 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 1711 Notification (ECN) Experimentation", RFC 8311, 1712 DOI 10.17487/RFC8311, January 2018, 1713 .

1715 [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and 1716 R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", 1717 RFC 8312, DOI 10.17487/RFC8312, February 2018, 1718 .

1720 [RFC8511] Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst, 1721 "TCP Alternative Backoff with ECN (ABE)", RFC 8511, 1722 DOI 10.17487/RFC8511, December 2018, 1723 .

1725 [RFC8888] Sarker, Z., Perkins, C., Singh, V., and M. Ramalho, "RTP 1726 Control Protocol (RTCP) Feedback for Congestion Control", 1727 RFC 8888, DOI 10.17487/RFC8888, January 2021, 1728 .

1730 [RFC8985] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The 1731 RACK-TLP Loss Detection Algorithm for TCP", RFC 8985, 1732 DOI 10.17487/RFC8985, February 2021, 1733 .

1735 [RFC9000] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based 1736 Multiplexed and Secure Transport", RFC 9000, 1737 DOI 10.17487/RFC9000, May 2021, 1738 .

1740 [Savage-TCP] 1741 Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, 1742 "TCP Congestion Control with a Misbehaving Receiver", ACM 1743 SIGCOMM Computer Communication Review 29(5):71--78, 1744 October 1999.

1746 [SCReAM] Johansson, I., "SCReAM", github repository; , 1747 .

1750 [sub-mss-prob] 1751 Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion 1752 Window for Small Round Trip Times", BT Technical Report 1753 TR-TUB8-2015-002, May 2015, 1754 .

1756 [TCP-CA] Jacobson, V. and M. Karels, "Congestion Avoidance and 1757 Control", Lawrence Berkeley Labs Technical Report , 1758 November 1988, .

1760 [TCPPrague] 1761 Briscoe, B., "Notes: DCTCP evolution 'bar BoF': Tue 21 Jul 1762 2015, 17:40, Prague", tcpprague mailing list archive , 1763 July 2015, .

1766 [VCP] Xia, Y., Subramanian, L., Stoica, I., and S. Kalyanaraman, 1767 "One more bit is enough", Proc. SIGCOMM'05, ACM CCR 1768 35(4)37--48, 2005, 1769 .

1771 Appendix A. The 'Prague L4S Requirements'

1773 This appendix is informative, not normative. It gives a list of 1774 modifications to current scalable congestion controls so that they 1775 can be deployed over the public Internet and coexist safely with 1776 existing traffic. The list complements the normative requirements in 1777 Section 4 that a sender has to comply with before it can set the L4S 1778 identifier in packets it sends into the Internet. As well as 1779 necessary safety improvements (requirements), this appendix also 1780 includes preferable performance improvements (optimizations).

1782 These recommendations have become known as the Prague L4S 1783 Requirements, because they were originally identified at an ad hoc 1784 meeting during IETF-94 in Prague [TCPPrague]. They were originally 1785 called the 'TCP Prague Requirements', but they are not solely 1786 applicable to TCP, so the name and wording have been generalized for 1787 all transport protocols, and the name 'TCP Prague' is now used for a 1788 specific implementation of the requirements.

1790 At the time of writing, DCTCP [RFC8257] is the most widely used 1791 scalable transport protocol. In its current form, DCTCP is specified 1792 to be deployable only in controlled environments.
Deploying it in 1793 the public Internet would lead to a number of issues, both from the 1794 safety and the performance perspective. The modifications and 1795 additional mechanisms listed in this section will be necessary for 1796 its deployment over the global Internet. Where an example is needed, 1797 DCTCP is used as a base, but it is likely that most of these 1798 requirements equally apply to other scalable congestion controls, 1799 covering adaptive real-time media, etc., not just capacity-seeking 1800 behaviours. 1802 A.1. Requirements for Scalable Transport Protocols 1804 A.1.1. Use of L4S Packet Identifier 1806 Description: A scalable congestion control needs to distinguish the 1807 packets it sends from those sent by Classic congestion controls (see 1808 the precise normative requirement wording in Section 4.1). 1810 Motivation: It needs to be possible for a network node to classify 1811 L4S packets without flow state into a queue that applies an L4S ECN 1812 marking behaviour and isolates L4S packets from the queuing delay of 1813 Classic packets. 1815 A.1.2. Accurate ECN Feedback 1817 Description: The transport protocol for a scalable congestion control 1818 needs to provide timely, accurate feedback about the extent of ECN 1819 marking experienced by all packets (see the precise normative 1820 requirement wording in Section 4.2). 1822 Motivation: Classic congestion controls only need feedback about the 1823 existence of a congestion episode within a round trip, not precisely 1824 how many packets were marked with ECN or dropped. Therefore, in 1825 2001, when ECN feedback was added to TCP [RFC3168], it could not 1826 inform the sender of more than one ECN mark per RTT. Since then, 1827 requirements for more accurate ECN feedback in TCP have been defined 1828 in [RFC7560] and [I-D.ietf-tcpm-accurate-ecn] specifies a change to 1829 the TCP protocol to satisfy these requirements. Most other transport 1830 protocols already satisfy this requirement (see Section 4.2). 1832 A.1.3. Capable of Replacement by Classic Congestion Control 1834 Description: It needs to be possible to replace the implementation of 1835 a scalable congestion control with a Classic control (see the precise 1836 normative requirement wording in Section 4.3). 1838 Motivation: L4S is an experimental protocol, therefore it seems 1839 prudent to be able to disable it at source in case of insurmountable 1840 problems, perhaps due to some unexpected interaction on a particular 1841 sender; over a particular path or network; with a particular receiver 1842 or even ultimately an insurmountable problem with the experiment as a 1843 whole. 1845 A.1.4. Fall back to Classic Congestion Control on Packet Loss 1847 Description: As well as responding to ECN markings in a scalable way, 1848 a scalable congestion control needs to react to packet loss in a way 1849 that will coexist safely with a Reno congestion control [RFC5681] 1850 (see the precise normative requirement wording in Section 4.3). 1852 Motivation: Part of the safety conditions for deploying a scalable 1853 congestion control on the public Internet is to make sure that it 1854 behaves properly when it builds a queue at a network bottleneck that 1855 has not been upgraded to support L4S. Packet loss can have many 1856 causes, but it usually has to be conservatively assumed that it is a 1857 sign of congestion. Therefore, on detecting packet loss, a scalable 1858 congestion control will need to fall back to Classic congestion 1859 control behaviour. 
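As a non-normative illustration of such a fall-back, the following Python sketch shows a per-round window update that responds to ECN marks in a scalable (DCTCP-like) way but halves the window, Reno-style, whenever loss is detected; the function and parameter names are hypothetical and the EWMA smoothing of the marking fraction used by DCTCP is omitted for brevity:

   # Non-normative sketch of the per-round window response: scalable
   # (DCTCP-like) reduction for ECN marks, Reno-like halving on loss.
   def end_of_round_cwnd(cwnd, marked_fraction, loss_detected):
       if loss_detected:
           # Fall back to a Classic (Reno-friendly) response.
           return max(cwnd / 2.0, 1.0)
       if marked_fraction > 0:
           # Scalable response: reduction proportional to the fraction
           # of ECN-marked packets in the round (smoothing omitted).
           return max(cwnd * (1 - marked_fraction / 2.0), 1.0)
       return cwnd + 1.0                  # additive increase otherwise

   print(end_of_round_cwnd(100.0, 0.1, False))   # 95.0 (ECN response)
   print(end_of_round_cwnd(100.0, 0.0, True))    # 50.0 (loss response)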
If it does not comply with this requirement it 1860 could starve Classic traffic.

1862 A scalable congestion control can be used for different types of 1863 transport, e.g. for real-time media or for reliable transport like 1864 TCP. Therefore, the particular Classic congestion control behaviour 1865 to fall back on will need to be dependent on the specific congestion 1866 control implementation. In the particular case of DCTCP, the DCTCP 1867 specification [RFC8257] states that "It is RECOMMENDED that an 1868 implementation deal with loss episodes in the same way as 1869 conventional TCP." For safe deployment of a scalable congestion 1870 control in the public Internet, the above requirement would need to 1871 be defined as a "MUST".

1873 Even if a bottleneck is L4S capable, it might still become 1874 overloaded and have to drop packets. In this case, the sender may 1875 receive a high proportion of packets marked with the CE bit set and 1876 also experience loss. Current DCTCP implementations each react 1877 differently to this situation. At least one implementation reacts 1878 only to the drop signal (e.g. by halving the CWND) and at least 1879 one other DCTCP implementation reacts to both signals (e.g. by halving 1880 the CWND due to the drop and also further reducing the CWND based on 1881 the proportion of marked packets). A third approach for the public 1882 Internet has been proposed that adjusts the loss response to result 1883 in a halving when combined with the ECN response. We believe that 1884 further experimentation is needed to understand what is the best 1885 behaviour for the public Internet, which may or may not be one of these 1886 existing approaches.

1888 A.1.5. Coexistence with Classic Congestion Control at Classic ECN 1889 bottlenecks

1891 Description: Monitoring has to be in place so that a non-L4S but ECN- 1892 capable AQM can be detected at path bottlenecks. This is in case 1893 such an AQM has been implemented in a shared queue, in which case any 1894 long-running scalable flow would predominate over any simultaneous 1895 long-running Classic flow sharing the queue. The requirement is 1896 written so that such a problem could either be resolved in real-time, 1897 or via administrative intervention (see the precise normative 1898 requirement wording in Section 4.3).

1900 Motivation: Similarly to the requirement in Appendix A.1.4, this 1901 requirement is a safety condition to ensure an L4S congestion control 1902 coexists well with Classic flows when it builds a queue at a shared 1903 network bottleneck that has not been upgraded to support L4S. 1904 Nonetheless, if necessary, it is considered reasonable to resolve 1905 such problems over management timescales (possibly involving human 1906 intervention) because:

1908 o although a Classic flow can considerably reduce its throughput in 1909 the face of a competing scalable flow, it still makes progress and 1910 does not starve;

1912 o implementations of a Classic ECN AQM in a queue that is intended 1913 to be shared are believed to be rare;

1915 o detection of such AQMs is not always clear-cut; so focused out-of- 1916 band testing (or even contacting the relevant network operator) 1917 would improve certainty.

1919 Therefore, the relevant normative requirement (Section 4.3) is 1920 divided into three stages: monitoring, detection and action:

1922 Monitoring: Monitoring involves collection of the measurement data 1923 to be analysed.
Monitoring is expressed as a 'MUST' for 1924 uncontrolled environments, although the placement of the 1925 monitoring function is left open. Whether monitoring has to be 1926 applied in real-time is expressed as a 'SHOULD'. This allows for 1927 the possibility that the operator of an L4S sender (e.g. a CDN) 1928 might prefer to test out-of-band for signs of Classic ECN AQMs, 1929 perhaps to avoid continually consuming resources to monitor live 1930 traffic. 1932 Detection: Detection involves analysis of the monitored data to 1933 detect the likelihood of a Classic ECN AQM. The requirements 1934 recommend that detection occurs live in real-time. However, 1935 detection is allowed to be deferred (e.g. it might involve further 1936 testing targeted at candidate AQMs); 1938 Action: This involves the act of switching the sender to a Classic 1939 congestion control. This might occur in real-time within the 1940 congestion control for the subsequent duration of a flow, or it 1941 might involve administrative action to switch to Classic 1942 congestion control for a specific interface or for a certain set 1943 of destination addresses. 1945 Instead of the sender taking action itself, the operator of the 1946 sender (e.g. a CDN) might prefer to ask the network operator to 1947 modify the Classic AQM's treatment of L4S packets; or to ensure 1948 L4S packets bypass the AQM; or to upgrade the AQM to support L4S. 1949 Once L4S flows no longer shared the Classic ECN AQM they would 1950 obviously no longer detect it, and the requirement to act on it 1951 would no longer apply. 1953 The whole set of normative requirements concerning Classic ECN AQMs 1954 does not apply in controlled environments, such as private networks 1955 or data centre networks. CDN servers placed within an access ISP's 1956 network can be considered as a single controlled environment, but any 1957 onward networks served by the access network, including all the 1958 attached customer networks, would be unlikely to fall under the same 1959 degree of coordinated control. Monitoring is expressed as a 'MUST' 1960 for these uncontrolled segments of paths (e.g. beyond the access ISP 1961 in a home network), because there is a possibility that there might 1962 be a shared queue Classic ECN AQM in that segment. Nonetheless, the 1963 intent is to only require occasional monitoring of these uncontrolled 1964 regions, and not to burden CDN operators if monitoring never uncovers 1965 any potential problems, given it is anyway in the CDN's own interests 1966 not to degrade the service of its own customers. 1968 More detailed discussion of all the above options and alternatives 1969 can be found in [I-D.ietf-tsvwg-l4sops]. 1971 Having said all the above, the approach recommended in the 1972 requirements is to monitor, detect and act in real-time on live 1973 traffic. A passive monitoring algorithm to detect a Classic ECN AQM 1974 at the bottleneck and fall back to Classic congestion control is 1975 described in an extensive technical report [ecn-fallback], which also 1976 provides a link to Linux source code, and a large online 1977 visualization of its evaluation results. Very briefly, the algorithm 1978 primarily monitors RTT variation using the same algorithm that 1979 maintains the mean deviation of TCP's smoothed RTT, but it smooths 1980 over a duration of the order of a Classic sawtooth. The outcome is 1981 also conditioned on other metrics such as the presence of CE marking 1982 and congestion avoidance phase having stabilized. 
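The following Python sketch is loosely based on the description above; it is not the actual algorithm, and the gain and threshold values are purely illustrative (the real algorithm, gains and conditions are described in [ecn-fallback]):

   # Non-normative sketch of a passive Classic-ECN-AQM detector based
   # on smoothed RTT variation (illustrative gain and threshold only).
   class ClassicAqmDetector:
       def __init__(self, gain=1/64.0, threshold=0.2):
           self.gain = gain            # small gain: smooths over roughly
                                       # the duration of a Classic sawtooth
           self.srtt = None            # smoothed RTT
           self.rttvar = 0.0           # smoothed mean deviation of the RTT
           self.threshold = threshold  # fraction of srtt deemed 'Classic-like'

       def on_ack(self, rtt_sample, ce_seen, in_stable_cong_avoid):
           if self.srtt is None:
               self.srtt = rtt_sample
           self.rttvar += self.gain * (abs(rtt_sample - self.srtt) - self.rttvar)
           self.srtt += self.gain * (rtt_sample - self.srtt)
           # Only infer a Classic ECN AQM if CE marks are being received,
           # congestion avoidance has stabilized and the smoothed RTT
           # variation is large relative to the smoothed RTT.
           return (ce_seen and in_stable_cong_avoid
                   and self.rttvar > self.threshold * self.srtt)

   # Usage: call on_ack() once per ACK; a True result suggests falling
   # back to Classic congestion control behaviour.
   detector = ClassicAqmDetector()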
The report also 1983 identifies further work to improve the approach, for instance 1984 improvements with low capacity links and combining the measurements 1985 with a cache of what had been learned about a path in previous 1986 connections. The report also suggests alternative approaches.

1988 Although using passive measurements within live traffic (as above) 1989 can detect a Classic ECN AQM, it is much harder (perhaps impossible) 1990 to determine whether or not the AQM is in a shared queue. 1991 Nonetheless, this is much easier using active test traffic out-of- 1992 band, because two flows can be used. Section 4 of the same report 1993 [ecn-fallback] describes a simple technique to detect a Classic ECN 1994 AQM and determine whether it is in a shared queue, summarized here.

1996 An L4S-enabled test server could be set up so that, when a test 1997 client accesses it, it serves a script that gets the client to open 1998 two parallel long-running flows. It could serve one with a Classic 1999 congestion control (C, that sets ECT(0)) and one with a scalable CC 2000 (L, that sets ECT(1)). If neither flow induces any ECN marks, it can 2001 be presumed the path does not contain a Classic ECN AQM. If either 2002 flow induces some ECN marks, the server could measure the relative 2003 flow rates and round trip times of the two flows. Table 1 shows the 2004 AQM that can be inferred for various cases.

2006 +--------+-------+------------------------+
2007 | Rate   | RTT   | Inferred AQM           |
2008 +--------+-------+------------------------+
2009 | L > C  | L = C | Classic ECN AQM (FIFO) |
2010 | L = C  | L = C | Classic ECN AQM (FQ)   |
2011 | L = C  | L < C | FQ-L4S AQM             |
2012 | L ~= C | L < C | Coupled DualQ AQM      |
2013 +--------+-------+------------------------+

2015 Table 1: Out-of-band testing with two parallel flows. L:=L4S, 2016 C:=Classic.

2018 Finally, we motivate the recommendation in Section 4.3 that a 2019 scalable congestion control is not expected to change to setting 2020 ECT(0) while it adapts its behaviour to coexist with Classic flows. 2021 This is because the sender needs to continue to check whether it made 2022 the right decision - and switch back if it was wrong, or if a 2023 different link becomes the bottleneck:

2025 o If, as recommended, the sender changes only its behaviour but not 2026 its codepoint to Classic, its codepoint will still be compatible 2027 with either an L4S or a Classic AQM. If the bottleneck does 2028 actually support both, it will still classify ECT(1) into the same 2029 L4S queue, where the sender can measure that switching to Classic 2030 behaviour was wrong, so that it can switch back.

2032 o In contrast, if the sender changes both its behaviour and its 2033 codepoint to Classic, even if the bottleneck supports both, it 2034 will classify ECT(0) into the Classic queue, reinforcing the 2035 sender's incorrect decision so that it never switches back.

2037 o Also, not changing codepoint avoids the risk of being flipped to a 2038 different path by a load balancer or multipath routing that hashes 2039 on the whole of the ex-ToS byte (unfortunately still a common 2040 pathology).

2042 Note that if a flow is configured to _only_ use a Classic congestion 2043 control, it is then entirely appropriate not to use ECT(1).

2045 A.1.6.
Reduce RTT dependence 2047 Description: A scalable congestion control needs to reduce RTT bias 2048 as much as possible at least over the low to typical range of RTTs 2049 that will interact in the intended deployment scenario (see the 2050 precise normative requirement wording in Section 4.3). 2052 Motivation: The throughput of Classic congestion controls is known to 2053 be inversely proportional to RTT, so one would expect flows over very 2054 low RTT paths to nearly starve flows over larger RTTs. However, 2055 Classic congestion controls have never allowed a very low RTT path to 2056 exist because they induce a large queue. For instance, consider two 2057 paths with base RTT 1 ms and 100 ms. If a Classic congestion control 2058 induces a 100 ms queue, it turns these RTTs into 101 ms and 200 ms 2059 leading to a throughput ratio of about 2:1. Whereas if a scalable 2060 congestion control induces only a 1 ms queue, the ratio is 2:101, 2061 leading to a throughput ratio of about 50:1. 2063 Therefore, with very small queues, long RTT flows will essentially 2064 starve, unless scalable congestion controls comply with this 2065 requirement. 2067 The RTT bias in current Classic congestion controls works 2068 satisfactorily when the RTT is higher than typical, and L4S does not 2069 change that. So, there is no additional requirement for high RTT L4S 2070 flows to remove RTT bias - they can but they don't have to. 2072 A.1.7. Scaling down to fractional congestion windows 2074 Description: A scalable congestion control needs to remain responsive 2075 to congestion when typical RTTs over the public Internet are 2076 significantly smaller because they are no longer inflated by queuing 2077 delay (see the precise normative requirement wording in Section 4.3). 2079 Motivation: As currently specified, the minimum congestion window of 2080 ECN-capable TCP (and its derivatives) is expected to be 2 sender 2081 maximum segment sizes (SMSS), or 1 SMSS after a retransmission 2082 timeout. Once the congestion window reaches this minimum, if there 2083 is further ECN-marking, TCP is meant to wait for a retransmission 2084 timeout before sending another segment (see section 6.1.2 of 2085 [RFC3168]). In practice, most known window-based congestion control 2086 algorithms become unresponsive to congestion signals at this point. 2087 No matter how much drop or ECN marking, the congestion window no 2088 longer reduces. Instead, the sender's lack of any further congestion 2089 response forces the queue to grow, overriding any AQM and increasing 2090 queuing delay (making the window large enough to become responsive 2091 again). 2093 Most congestion controls for other transport protocols have a similar 2094 minimum, albeit when measured in bytes for those that use smaller 2095 packets. 2097 L4S mechanisms significantly reduce queueing delay so, over the same 2098 path, the RTT becomes lower. Then this problem becomes surprisingly 2099 common [sub-mss-prob]. This is because, for the same link capacity, 2100 smaller RTT implies a smaller window. For instance, consider a 2101 residential setting with an upstream broadband Internet access of 8 2102 Mb/s, assuming a max segment size of 1500 B. Two upstream flows will 2103 each have the minimum window of 2 SMSS if the RTT is 6 ms or less, 2104 which is quite common when accessing a nearby data centre. So, any 2105 more than two such parallel TCP flows will become unresponsive and 2106 increase queuing delay. 
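The arithmetic behind this example can be checked with the following non-normative Python sketch (the helper function is purely illustrative):

   # Non-normative arithmetic behind the example above.
   def window_per_flow_segments(link_bps, rtt_s, smss_bytes, n_flows):
       bdp_bytes = link_bps * rtt_s / 8.0      # bandwidth-delay product
       return bdp_bytes / (smss_bytes * n_flows)

   # 8 Mb/s upstream, 6 ms RTT, 1500 B segments:
   print(window_per_flow_segments(8e6, 0.006, 1500, 2))  # 2.0 SMSS each
   print(window_per_flow_segments(8e6, 0.006, 1500, 4))  # 1.0 SMSS each
   # With more than two flows, each flow's fair share falls below the
   # 2-SMSS minimum, so the flows stop responding to further marking.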
2108 Unless scalable congestion controls address this requirement from the 2109 start, they will frequently become unresponsive, negating the low 2110 latency benefit of L4S, for themselves and for others.

2112 That would seem to imply that scalable congestion controllers ought 2113 to be required to be able to work with a congestion window less than 2114 1 SMSS. For instance, if an ECN-capable TCP gets an ECN-mark when it 2115 is already sitting at a window of 1 SMSS, RFC 3168 requires it to 2116 defer sending for a retransmission timeout. A less drastic but more 2117 complex mechanism can maintain a congestion window less than 1 SMSS 2118 (significantly less if necessary), as described in [Ahmed19]. Other 2119 approaches are likely to be feasible.

2121 However, the requirement in Section 4.3 is worded as a "SHOULD" 2122 because the existence of a minimum window is not all bad. When 2123 competing with an unresponsive flow, a minimum window naturally 2124 protects the flow from starvation by at least keeping some data 2125 flowing.

2127 By stating the requirement to go lower than 1 SMSS as a "SHOULD", 2128 while the requirement in RFC 3168 still stands as well, we shall be 2129 able to watch the choices of minimum window evolve in different 2130 scalable congestion controllers.

2132 A.1.8. Measuring Reordering Tolerance in Time Units

2134 Description: When detecting loss, a scalable congestion control needs 2135 to be tolerant to reordering over an adaptive time interval, which 2136 scales with throughput, rather than counting only in fixed units of 2137 packets, which does not scale (see the precise normative requirement 2138 wording in Section 4.3).

2140 Motivation: A primary purpose of L4S is scalable throughput (it's in 2141 the name). Scalability in all dimensions is, of course, also a goal 2142 of all IETF technology. The inverse linear congestion response in 2143 Section 4.3 is necessary, but not sufficient, to solve the congestion 2144 control scalability problem identified in [RFC3649]. As well as 2145 maintaining frequent ECN signals as rate scales, it is also important 2146 to ensure that a potentially false perception of loss does not limit 2147 throughput scaling.

2149 End-systems cannot know whether a missing packet is due to loss or 2150 reordering, except in hindsight - if it appears later. So they can 2151 only deem that there has been a loss if a gap in the sequence space 2152 has not been filled, either after a certain number of subsequent 2153 packets have arrived (e.g. the 3 DupACK rule of standard TCP 2154 congestion control [RFC5681]) or after a certain amount of time 2155 (e.g. the RACK approach [RFC8985]).

2157 As we attempt to scale packet rate over the years:

2159 o Even if only _some_ sending hosts still deem that loss has 2160 occurred by counting reordered packets, _all_ networks will have 2161 to keep reducing the time over which they keep packets in order. 2162 If some link technologies keep the time within which reordering 2163 occurs roughly unchanged, then loss over these links, as perceived 2164 by these hosts, will appear to continually rise over the years.

2166 o In contrast, if all senders detect loss in units of time, the time 2167 over which the network has to keep packets in order stays roughly 2168 invariant.

2170 Therefore hosts have an incentive to detect loss in time units (so as 2171 not to fool themselves too often into detecting losses when there are 2172 none).
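As a non-normative illustration of the difference between the two approaches, the following Python sketch contrasts a fixed packet-count rule with a time-based rule whose reordering window scales with the RTT (RACK [RFC8985] is the real time-based mechanism; the functions and the 25% reordering window here are purely illustrative):

   # Non-normative sketch contrasting the two rules for deeming that a
   # gap in the sequence space is a loss.
   def lost_by_packet_count(later_packets_acked, dupthresh=3):
       # Fixed unit of packets: the tolerated reordering time shrinks
       # as packet rates scale up.
       return later_packets_acked >= dupthresh

   def lost_by_time(time_since_gap_sent, srtt, reo_wnd_fraction=0.25):
       # Unit of time: the reordering window scales with the RTT, so it
       # stays roughly invariant as packet rates scale up.
       return time_since_gap_sent > (1 + reo_wnd_fraction) * srtt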
And for hosts that are changing their congestion control 2173 implementation to L4S, there is no downside to including time-based 2174 loss detection code in the change (loss recovery implemented in 2175 hardware is an exception, covered later). Therefore requiring L4S 2176 hosts to detect loss in time-based units would not be a burden. 2178 If this requirement is not placed on L4S hosts, even though it would 2179 be no burden on them to do so, all networks will face unnecessary 2180 uncertainty over whether some L4S hosts might be detecting loss by 2181 counting packets. Then _all_ link technologies will have to 2182 unnecessarily keep reducing the time within which reordering occurs. 2183 That is not a problem for some link technologies, but it becomes 2184 increasingly challenging for other link technologies to continue to 2185 scale, particularly those relying on channel bonding for scaling, 2186 such as LTE, 5G and DOCSIS. 2188 Given Internet paths traverse many link technologies, any scaling 2189 limit for these more challenging access link technologies would 2190 become a scaling limit for the Internet as a whole. 2192 It might be asked how it helps to place this loss detection 2193 requirement only on L4S hosts, because networks will still face 2194 uncertainty over whether non-L4S flows are detecting loss by counting 2195 DupACKs. The answer is that those link technologies for which it is 2196 challenging to keep squeezing the reordering time will only need to 2197 do so for non-L4S traffic (which they can do because the L4S 2198 identifier is visible at the IP layer). Therefore, they can focus 2199 their processing and memory resources into scaling non-L4S (Classic) 2200 traffic. Then, the higher the proportion of L4S traffic, the less of 2201 a scaling challenge they will have. 2203 To summarize, there is no reason for L4S hosts not to be part of the 2204 solution instead of part of the problem. 2206 Requirement ("MUST") or recommendation ("SHOULD")? As explained 2207 above, this is a subtle interoperability issue between hosts and 2208 networks, which seems to need a "MUST". Unless networks can be 2209 certain that all L4S hosts follow the time-based approach, they still 2210 have to cater for the worst case - continually squeeze reordering 2211 into a smaller and smaller duration - just for hosts that might be 2212 using the counting approach. However, it was decided to express this 2213 as a recommendation, using "SHOULD". The main justification was that 2214 networks can still be fairly certain that L4S hosts will follow this 2215 recommendation, because following it offers only gain and no pain. 2217 Details: 2219 The speed of loss recovery is much more significant for short flows 2220 than long, therefore a good compromise is to adapt the reordering 2221 window; from a small fraction of the RTT at the start of a flow, to a 2222 larger fraction of the RTT for flows that continue for many round 2223 trips. 2225 This is broadly the approach adopted by TCP RACK (Recent 2226 ACKnowledgements) [RFC8985]. However, RACK starts with the 3 DupACK 2227 approach, because the RTT estimate is not necessarily stable. As 2228 long as the initial window is paced, such initial use of 3 DupACK 2229 counting would amount to time-based loss detection and therefore 2230 would satisfy the time-based loss detection recommendation of 2231 Section 4.3. 
This is because pacing of the initial window would 2232 ensure that 3 DupACKs early in the connection would be spread over a 2233 small fraction of the round trip. 2235 As mentioned above, hardware implementations of loss recovery using 2236 DupACK counting exist (e.g. some implementations of RoCEv2 for RDMA). 2237 For low latency, these implementations can change their congestion 2238 control to implement L4S, because the congestion control (as distinct 2239 from loss recovery) is implemented in software. But they cannot 2240 easily satisfy this loss recovery requirement. However, it is 2241 believed they do not need to, because such implementations are 2242 believed to solely exist in controlled environments, where the 2243 network technology keeps reordering extremely low anyway. This is 2244 why controlled environments with hardly any reordering are excluded 2245 from the scope of the normative recommendation in Section 4.3. 2247 Detecting loss in time units also prevents the ACK-splitting attacks 2248 described in [Savage-TCP]. 2250 A.2. Scalable Transport Protocol Optimizations 2252 A.2.1. Setting ECT in Control Packets and Retransmissions 2254 Description: This item concerns TCP and its derivatives (e.g. SCTP) 2255 as well as RTP/RTCP [RFC6679]. The original specification of ECN for 2256 TCP precluded the use of ECN on control packets and retransmissions. 2257 To improve performance, scalable transport protocols ought to enable 2258 ECN at the IP layer in TCP control packets (SYN, SYN-ACK, pure ACKs, 2259 etc.) and in retransmitted packets. The same is true for derivatives 2260 of TCP, e.g. SCTP. Similarly [RFC6679] precludes the use of ECT on 2261 RTCP datagrams, in case the path changes after it has been checked 2262 for ECN traversal. 2264 Motivation (TCP): RFC 3168 prohibits the use of ECN on these types of 2265 TCP packet, based on a number of arguments. This means these packets 2266 are not protected from congestion loss by ECN, which considerably 2267 harms performance, particularly for short flows. 2268 [I-D.ietf-tcpm-generalized-ecn] proposes experimental use of ECN on 2269 all types of TCP packet as long as AccECN feedback 2270 [I-D.ietf-tcpm-accurate-ecn] is available (which itself satisfies the 2271 accurate feedback requirement in Section 4.2 for using a scalable 2272 congestion control). 2274 Motivation (RTCP): L4S experiments in general will need to observe 2275 the rule in [RFC6679] that precludes ECT on RTCP datagrams. 2276 Nonetheless, as ECN usage becomes more widespread, it would be useful 2277 to conduct specific experiments with ECN-capable RTCP to gather data 2278 on whether such caution is necessary. 2280 A.2.2. Faster than Additive Increase 2282 Description: It would improve performance if scalable congestion 2283 controls did not limit their congestion window increase to the 2284 standard additive increase of 1 SMSS per round trip [RFC5681] during 2285 congestion avoidance. The same is true for derivatives of TCP 2286 congestion control, including similar approaches used for real-time 2287 media. 2289 Motivation: As currently defined [RFC8257], DCTCP uses the 2290 traditional Reno additive increase in congestion avoidance phase. 2291 When the available capacity suddenly increases (e.g. when another 2292 flow finishes, or if radio capacity increases) it can take very many 2293 round trips to take advantage of the new capacity. 
TCP Cubic was
2294  designed to solve this problem, but as flow rates have continued to
2295  increase, the delay accelerating into available capacity has become
2296  prohibitive. See, for instance, the examples in Section 1.1. Even
2297  when out of its Reno-compatibility mode, every 8x scaling of Cubic's
2298  flow rate leads to 2x more acceleration delay.

2300  In the steady state, DCTCP induces about 2 ECN marks per round trip,
2301  so it is possible to quickly detect when these signals have
2302  disappeared and seek available capacity more rapidly, while
2303  minimizing the impact on other flows (Classic and scalable)
2304  [LinuxPacedChirping]. Alternatively, approaches such as Adaptive
2305  Acceleration (A2DTCP [A2DTCP]) have been proposed to address this
2306  problem in data centres, which might be deployable over the public
2307  Internet.

2309  A.2.3. Faster Convergence at Flow Start

2311  Description: It would improve performance if scalable congestion
2312  controls converged (reached their steady-state share of the capacity)
2313  faster than Classic congestion controls or at least no slower. This
2314  affects the flow start behaviour of any L4S congestion control
2315  derived from a Classic transport that uses TCP slow start, including
2316  those for real-time media.

2318  Motivation: As an example, a new DCTCP flow takes longer than a
2319  Classic congestion control to obtain its share of the capacity of the
2320  bottleneck when there are already ongoing flows using the bottleneck
2321  capacity. In a data centre environment DCTCP takes about a factor of
2322  1.5 to 2 longer to converge due to the much higher typical level of
2323  ECN marking that DCTCP background traffic induces, which causes new
2324  flows to exit slow start early [Alizadeh-stability]. In testing for
2325  use over the public Internet the convergence time of DCTCP relative
2326  to a regular loss-based TCP slow start is even less favourable
2327  [Paced-Chirping] due to the shallow ECN marking threshold needed for
2328  L4S. It is exacerbated by the typically greater mismatch between the
2329  link rate of the sending host and typical Internet access
2330  bottlenecks. This problem is detrimental in general, but would
2331  particularly harm the performance of short flows relative to Classic
2332  congestion controls.

2334  Appendix B. Compromises in the Choice of L4S Identifier

2336  This appendix is informative, not normative. As explained in
2337  Section 2, there is insufficient space in the IP header (v4 or v6) to
2338  fully accommodate every requirement. So the choice of L4S identifier
2339  involves tradeoffs. This appendix records the pros and cons of the
2340  choice that was made.

2342  Non-normative recap of the chosen codepoint scheme:

2344     Packets with ECT(1) and conditionally packets with CE signify L4S
2345     semantics as an alternative to the semantics of Classic ECN
2346     [RFC3168], specifically:

2348     *  The ECT(1) codepoint signifies that the packet was sent by an
2349        L4S-capable sender.

2351     *  Given shortage of codepoints, both L4S and Classic ECN sides of
2352        an AQM have to use the same CE codepoint to indicate that a
2353        packet has experienced congestion. If a packet that had
2354        already been marked CE in an upstream buffer arrived at a
2355        subsequent AQM, this AQM would then have to guess whether to
2356        classify CE packets as L4S or Classic ECN. Choosing the L4S
2357        treatment is a safer choice, because then a few Classic packets
2358        might arrive early, rather than a few L4S packets arriving
2359        late.
2361     *  Additional information might be available if the classifier
2362        were transport-aware. Then it could classify a CE packet for
2363        Classic ECN treatment if the most recent ECT packet in the same
2364        flow had been marked ECT(0). However, the L4S service ought
2365        not to need transport-layer awareness.

2367  Cons:

2369  Consumes the last ECN codepoint: The L4S service could potentially
2370     supersede the service provided by Classic ECN, therefore using
2371     ECT(1) to identify L4S packets could ultimately mean that the
2372     ECT(0) codepoint was 'wasted' purely to distinguish one form of
2373     ECN from its successor.

2375  ECN hard in some lower layers: It is not always possible to support
2376     the equivalent of an IP-ECN field in an AQM acting in a buffer
2377     below the IP layer [I-D.ietf-tsvwg-ecn-encap-guidelines]. Then,
2378     depending on the lower layer scheme, the L4S service might have to
2379     drop rather than mark frames even though they might encapsulate an
2380     ECN-capable packet.

2382  Risk of reordering Classic CE packets within a flow: Classifying all
2383     CE packets into the L4S queue risks any CE packets that were
2384     originally ECT(0) being incorrectly classified as L4S. If there
2385     were delay in the Classic queue, these incorrectly classified CE
2386     packets would arrive early, which is a form of reordering.

2388     Reordering within a microflow can cause TCP senders (and senders
2389     of similar transports) to retransmit spuriously. However, the
2390     risk of spurious retransmissions would be extremely low for the
2391     following reasons:

2393     1. It is quite unusual to experience queuing at more than one
2394        bottleneck on the same path (the available capacities have to
2395        be identical).

2397     2. In only a subset of these unusual cases would the first
2398        bottleneck support Classic ECN marking while the second
2399        supported L4S ECN marking, which would be the only scenario
2400        where some ECT(0) packets could be CE marked by an AQM
2401        supporting Classic ECN, then the remainder experienced further
2402        delay through the Classic side of a subsequent L4S DualQ AQM.

2404     3. Even then, when a few packets are delivered early, it takes
2405        very unusual conditions to cause a spurious retransmission, in
2406        contrast to when some packets are delivered late. The first
2407        bottleneck has to apply CE-marks to at least N contiguous
2408        packets and the second bottleneck has to inject an
2409        uninterrupted sequence of at least N of these packets between
2410        two packets earlier in the stream (where N is the reordering
2411        window that the transport protocol allows before it considers
2412        a packet is lost).

2414        For example, consider N=3, and consider the sequence of
2415        packets 100, 101, 102, 103,... and imagine that packets
2416        150,151,152 from later in the flow are injected as follows:
2417        100, 150, 151, 101, 152, 102, 103... If this were late
2418        reordering, even one packet arriving out of sequence would
2419        trigger a spurious retransmission, but there is no spurious
2420        retransmission here with early reordering, because packet
2421        101 moves the cumulative ACK counter forward before 3
2422        packets have arrived out of order. Later, when packets
2423        148, 149, 153... arrive, even though there is a 3-packet
2424        hole, there will be no problem, because the packets to fill
2425        the hole are already in the receive buffer.

2427     4. Even with the current TCP recommendation of N=3 [RFC5681],
2428        spurious retransmissions will be unlikely for all the above
2429        reasons.
As RACK [RFC8985] is becoming widely deployed, it 2430 tends to adapt its reordering window to a larger value of N, 2431 which will make the chance of a contiguous sequence of N early 2432 arrivals vanishingly small. 2434 5. Even a run of 2 CE marks within a Classic ECN flow is 2435 unlikely, given FQ-CoDel is the only known widely deployed AQM 2436 that supports Classic ECN marking and it takes great care to 2437 separate out flows and to space any markings evenly along each 2438 flow. 2440 It is extremely unlikely that the above set of 5 eventualities 2441 that are each unusual in themselves would all happen 2442 simultaneously. But, even if they did, the consequences would 2443 hardly be dire: the odd spurious fast retransmission. Whenever 2444 the traffic source (a Classic congestion control) mistakes the 2445 reordering of a string of CE marks for a loss, one might think 2446 that it will reduce its congestion window as well as emitting a 2447 spurious retransmission. However, it would have already reduced 2448 its congestion window when the CE markings arrived early. If it 2449 is using ABE [RFC8511], it might reduce cwnd a little more for a 2450 loss than for a CE mark. But it will revert that reduction once 2451 it detects that the retransmission was spurious. 2453 In conclusion, the impact of early reordering on spurious 2454 retransmissions due to CE being ambiguous will generally be 2455 vanishingly small. 2457 Insufficient anti-replay window in some pre-existing VPNs: If delay 2458 is reduced for a subset of the flows within a VPN, the anti-replay 2459 feature of some VPNs is known to potentially mistake the 2460 difference in delay for a replay attack. Section 6.2 recommends 2461 that the anti-replay window at the VPN egress is sufficiently 2462 sized, as required by the relevant specifications. However, in 2463 some VPN implementations the maximum anti-replay window is 2464 insufficient to cater for a large delay difference at prevailing 2465 packet rates. Section 6.2 suggests alternative work-rounds for 2466 such cases, but end-users using L4S over a VPN will need to be 2467 able to recognize the symptoms of this problem, in order to seek 2468 out these work-rounds. 2470 Hard to distinguish Classic ECN AQM: With this scheme, when a source 2471 receives ECN feedback, it is not explicitly clear which type of 2472 AQM generated the CE markings. This is not a problem for Classic 2473 ECN sources that send ECT(0) packets, because an L4S AQM will 2474 recognize the ECT(0) packets as Classic and apply the appropriate 2475 Classic ECN marking behaviour. 2477 However, in the absence of explicit disambiguation of the CE 2478 markings, an L4S source needs to use heuristic techniques to work 2479 out which type of congestion response to apply (see 2480 Appendix A.1.5). Otherwise, if long-running Classic flow(s) are 2481 sharing a Classic ECN AQM bottleneck with long-running L4S 2482 flow(s), which then apply an L4S response to Classic CE signals, 2483 the L4S flows would outcompete the Classic flow(s). Experiments 2484 have shown that L4S flows can take about 20 times more capacity 2485 share than equivalent Classic flows. Nonetheless, as link 2486 capacity reduces (e.g. to 4 Mb/s), the inequality reduces. So 2487 Classic flows always make progress and are not starved. 
2489     When L4S was first proposed (in 2015, 14 years after [RFC3168] was
2490     published), it was believed that Classic ECN AQMs had failed to be
2491     deployed, because research measurements had found little or no
2492     evidence of CE marking. In subsequent years Classic ECN was
2493     included in per-flow-queuing (FQ) deployments; however, an FQ
2494     scheduler stops an L4S flow outcompeting Classic, because it
2495     enforces equality between flow rates. It is not known whether
2496     there have been any non-FQ deployments of Classic ECN AQMs in the
2497     subsequent years, or whether there will be in future.

2499     An algorithm for detecting a Classic ECN AQM as soon as a flow
2500     stabilizes after start-up has been proposed [ecn-fallback] (see
2501     Appendix A.1.5 for a brief summary). Testbed evaluations of v2 of
2502     the algorithm have shown detection is reasonably good for Classic
2503     ECN AQMs, in a wide range of circumstances. However, although it
2504     can correctly detect an L4S ECN AQM in many circumstances, it is
2505     often incorrect at low link capacities and/or high RTTs. Although
2506     this is the safe way round, there is a danger that it will
2507     discourage use of the algorithm. A much-simplified sketch of this
         kind of fall-back heuristic is given at the end of this list of
         cons.

2509  Non-L4S service for control packets: Solely for the case of TCP, the
2510     Classic ECN RFCs [RFC3168] and [RFC5562] require a sender to clear
2511     the ECN field to Not-ECT on retransmissions and on certain control
2512     packets, specifically pure ACKs, window probes and SYNs. When L4S
2513     packets are classified by the ECN field, these TCP control packets
2514     would not be classified into an L4S queue, and could therefore be
2515     delayed relative to the other packets in the flow. This would not
2516     cause reordering (because retransmissions are already out of
2517     order, and these control packets typically carry no data).
2518     However, it would make critical TCP control packets more
2519     vulnerable to loss and delay. To address this problem,
2520     [I-D.ietf-tcpm-generalized-ecn] proposes an experiment in which
2521     all TCP control packets and retransmissions are ECN-capable as
2522     long as appropriate ECN feedback is available in each case.
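
   The following is a much-simplified Python sketch of the kind of
   sender-side fall-back referred to under 'Hard to distinguish Classic
   ECN AQM' above. It is not the algorithm of [ecn-fallback]; the
   function names, the thresholds and the DCTCP-like response are
   hypothetical placeholders used purely for illustration:

      # Hypothetical heuristic: an L4S AQM marks frequently (roughly a
      # couple of CE marks per round trip) while adding very little
      # queuing delay, whereas a Classic RFC 3168 AQM marks only once a
      # substantial queue has built up. Thresholds are arbitrary.
      def classic_ecn_suspected(marks_per_rtt, rtt_inflation_ms):
          return marks_per_rtt < 0.5 and rtt_inflation_ms > 10.0

      def reduce_cwnd(cwnd, ce_fraction, classic_aqm):
          if classic_aqm:
              return cwnd / 2.0                  # Classic (RFC 3168) response
          return cwnd * (1 - ce_fraction / 2.0)  # scalable (DCTCP-like) response

      # Example: sparse marks plus ~25 ms of extra delay trigger the
      # Classic response, halving a window of 100 segments to 50.
      print(reduce_cwnd(100.0, 0.1, classic_ecn_suspected(0.2, 25.0)))

   The real detection algorithm in [ecn-fallback] is considerably more
   involved; see Appendix A.1.5.
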
2524  Pros:

2526  Should work e2e: The ECN field generally propagates end-to-end
2527     across the Internet without being wiped or mangled, at least over
2528     fixed networks. Unlike the DSCP, the setting of the ECN field is
2529     at least meant to be forwarded unchanged by networks that do not
2530     support ECN.

2532  Should work in tunnels: The L4S identifiers work across and within
2533     any tunnel that propagates the ECN field in any of the variant
2534     ways it has been defined since ECN-tunneling was first specified
2535     in the year 2001 [RFC3168]. However, it is likely that some
2536     tunnels still do not implement ECN propagation at all.

2538  Should work for many link technologies: At most, but not all, path
2539     bottlenecks there is IP-awareness, so that L4S AQMs can be located
2540     where the IP-ECN field can be manipulated. Bottlenecks at lower
2541     layer nodes without IP-awareness either have to use drop to signal
2542     congestion or a specific congestion notification facility has to
2543     be defined for that link technology, including propagation to and
2544     from IP-ECN. The programme to define these is progressing, and in
2545     each case so far the scheme already defined for ECN inherently
2546     supports L4S as well (see Section 6.1).

2548  Could migrate to one codepoint: If all Classic ECN senders
2549     eventually evolve to use the L4S service, the ECT(0) codepoint
2550     could be reused for some future purpose, but only once use of
2551     ECT(0) packets had reduced to zero, or near-zero, which might
2552     never happen.

2554  L4 not required: Being based on the ECN field, this scheme does not
2555     need the network to access transport layer flow identifiers.
2556     Nonetheless, it does not preclude solutions that do.

2558  Appendix C. Potential Competing Uses for the ECT(1) Codepoint

2560  The ECT(1) codepoint of the ECN field has already been assigned once
2561  for the ECN nonce [RFC3540], which has now been categorized as
2562  historic [RFC8311]. ECN is probably the only remaining field in the
2563  Internet Protocol that is common to IPv4 and IPv6 and still has
2564  potential to work end-to-end, with tunnels and with lower layers.
2565  Therefore, ECT(1) should not be reassigned to a different
2566  experimental use (L4S) without carefully assessing competing
2567  potential uses. These fall into the following categories:

2569  C.1. Integrity of Congestion Feedback

2571  Receiving hosts can fool a sender into downloading faster by
2572  suppressing feedback of ECN marks (or of losses if retransmissions
2573  are not necessary or available otherwise).

2575  The historic ECN nonce protocol [RFC3540] proposed that a TCP sender
2576  could set either of ECT(0) or ECT(1) in each packet of a flow and
2577  remember the sequence it had set. If any packet was lost or
2578  congestion marked, the receiver would miss that bit of the sequence.
2579  An ECN Nonce receiver had to feed back the least significant bit of
2580  the sum, so it could not suppress feedback of a loss or mark without
2581  a 50-50 chance of guessing the sum incorrectly.

2583  It is highly unlikely that ECT(1) will be needed for integrity
2584  protection in future. The ECN Nonce RFC [RFC3540] has been
2585  reclassified as historic, partly because other ways have been
2586  developed to protect feedback integrity of TCP and other transports
2587  [RFC8311] that do not consume a codepoint in the IP header. For
2588  instance:

2590  o  The sender can test the integrity of the receiver's feedback by
2591     occasionally setting the IP-ECN field to a value normally only set
2592     by the network. Then it can test whether the receiver's feedback
2593     faithfully reports what it expects (see para 2 of Section 20.2 of
2594     [RFC3168]). This works for loss, and it will work for the accurate
2595     ECN feedback [RFC7560] intended for L4S.

2597  o  A network can enforce a congestion response to its ECN markings
2598     (or packet losses) by auditing congestion exposure (ConEx)
2599     [RFC7713]. Whether the receiver or a downstream network is
2600     suppressing congestion feedback or the sender is unresponsive to
2601     the feedback, or both, ConEx audit can neutralise any advantage
2602     that any of these three parties would otherwise gain.

2604  o  The TCP authentication option (TCP-AO [RFC5925]) can be used to
2605     detect any tampering with TCP congestion feedback (whether
2606     malicious or accidental). TCP's congestion feedback fields are
2607     immutable end-to-end, so they are amenable to TCP-AO protection,
2608     which covers the main TCP header and TCP options by default.
2609     However, TCP-AO is often too brittle to use on many end-to-end
2610     paths, where middleboxes can make verification fail in their
2611     attempts to improve performance or security, e.g. by
2612     resegmentation or shifting the sequence space.
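
   As a minimal Python illustration of the one-bit nonce sum described
   in this appendix (the helper names are hypothetical and the feedback
   encoding of [RFC3540] is omitted):

      import random

      # The sender picks a random nonce per packet: ECT(0) carries 0,
      # ECT(1) carries 1. A CE mark (or a loss) destroys that packet's
      # nonce, so a receiver hiding the mark can only guess it.
      def sender_nonces(num_packets):
          return [random.randint(0, 1) for _ in range(num_packets)]

      def nonce_sum(nonces):
          s = 0
          for n in nonces:
              s ^= n      # least significant bit of the running sum
          return s

      nonces = sender_nonces(10)
      expected = nonce_sum(nonces)   # what an honest receiver reports
      # A receiver concealing the mark on one packet must guess its
      # nonce, so its reported sum only matches 'expected' half the time.
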
2614  C.2. Notification of Less Severe Congestion than CE

2616  Various researchers have proposed to use ECT(1) as a less severe
2617  congestion notification than CE, particularly to enable flows to fill
2618  available capacity more quickly after an idle period, when another
2619  flow departs or when a flow starts, e.g. VCP [VCP], Queue View (QV)
2620  [QV].

2622  Before assigning ECT(1) as an identifier for L4S, we must carefully
2623  consider whether it might be better to hold ECT(1) in reserve for
2624  future standardisation of rapid flow acceleration, which is an
2625  important and enduring problem [RFC6077].

2627  Pre-Congestion Notification (PCN) is another scheme that assigns
2628  alternative semantics to the ECN field. It uses ECT(1) to signify a
2629  less severe level of pre-congestion notification than CE [RFC6660].
2630  However, the ECN field only takes on the PCN semantics if packets
2631  carry a Diffserv codepoint defined to indicate PCN marking within a
2632  controlled environment. PCN is required to be applied solely to the
2633  outer header of a tunnel across the controlled region in order not to
2634  interfere with any end-to-end use of the ECN field. Therefore a PCN
2635  region on the path would not interfere with the L4S service
2636  identifier defined in Section 3.

2638  Authors' Addresses

2640  Koen De Schepper
2641  Nokia Bell Labs
2642  Antwerp
2643  Belgium

2645  Email: koen.de_schepper@nokia.com
2646  URI: https://www.bell-labs.com/usr/koen.de_schepper

2648  Bob Briscoe (editor)
2649  Independent
2650  UK

2652  Email: ietf@bobbriscoe.net
2653  URI: http://bobbriscoe.net/