idnits 2.17.1

draft-ietf-tsvwg-ecn-l4s-id-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 2581 has weird spacing: '...initial even...'

  == Line 2600 has weird spacing: '...initial even...'

  -- The document date (February 22, 2021) is 1153 days in the past. Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '1' on line 1153

  == Missing Reference: 'RFCXXXX' is mentioned on line 1154, but not defined

  == Outdated reference: A later version (-07) exists of
     draft-briscoe-docsis-q-protection-00

  == Outdated reference: A later version (-28) exists of
     draft-ietf-tcpm-accurate-ecn-13

  == Outdated reference: A later version (-15) exists of
     draft-ietf-tcpm-generalized-ecn-06

  == Outdated reference: A later version (-25) exists of
     draft-ietf-tsvwg-aqm-dualq-coupled-13

  == Outdated reference: A later version (-22) exists of
     draft-ietf-tsvwg-ecn-encap-guidelines-14

  == Outdated reference: A later version (-20) exists of
     draft-ietf-tsvwg-l4s-arch-08

  == Outdated reference: A later version (-22) exists of
     draft-ietf-tsvwg-nqb-03

  == Outdated reference: A later version (-23) exists of
     draft-ietf-tsvwg-rfc6040update-shim-12

  == Outdated reference: A later version (-04) exists of
     draft-morton-tsvwg-sce-02

  == Outdated reference: A later version (-06) exists of
     draft-stewart-tsvwg-sctpecn-05

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)

  -- Obsolete informational reference (is this intentional?): RFC 8312
     (Obsoleted by RFC 9438)

     Summary: 0 errors (**), 0 flaws (~~), 15 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 Transport Services (tsv)                                  K. De Schepper
3 Internet-Draft                                           Nokia Bell Labs
4 Intended status: Experimental                            B. Briscoe, Ed.
5 Expires: August 26, 2021                                     Independent
6                                                        February 22, 2021

8  Identifying Modified Explicit Congestion Notification (ECN) Semantics
9                    for Ultra-Low Queuing Delay (L4S)
10                     draft-ietf-tsvwg-ecn-l4s-id-13

12 Abstract

14    This specification defines the identifier to be used on IP packets
15    for a new network service called low latency, low loss and scalable
16    throughput (L4S). L4S uses an Explicit Congestion Notification (ECN)
17    scheme that is similar to the original (or 'Classic') ECN approach.
18    'Classic' ECN marking was required to be equivalent to a drop, both
19    when applied in the network and when responded to by a transport.
20    Unlike 'Classic' ECN marking, for packets carrying the L4S
21    identifier, the network applies marking more immediately and more
22    aggressively than drop, and the transport response to each mark is
23    reduced and smoothed relative to that for drop. The two changes
24    counterbalance each other so that the throughput of an L4S flow will
25    be roughly the same as a non-L4S flow under the same conditions.
26    Nonetheless, the much more frequent control signals and the finer
27    responses to them result in much more fine-grained adjustments, so
28    that ultra-low and consistently low queuing delay (typically sub-
29    millisecond on average) becomes possible for L4S traffic without
30    compromising link utilization. Thus even capacity-seeking (TCP-like)
31    traffic can have high bandwidth and very low delay at the same time,
32    even during periods of high traffic load.

34    The L4S identifier defined in this document distinguishes L4S from
35    'Classic' (e.g. TCP-Reno-friendly) traffic. It gives an incremental
36    migration path so that suitably modified network bottlenecks can
37    distinguish and isolate existing traffic that still follows the
38    Classic behaviour, to prevent it degrading the low queuing delay and
39    low loss of L4S traffic.
This specification defines the rules that
40    L4S transports and network elements need to follow to ensure they
41    neither harm each other's performance nor that of Classic traffic.
42    Examples of new active queue management (AQM) marking algorithms and
43    examples of new transports (whether TCP-like or real-time) are
44    specified separately.

46 Status of This Memo

48    This Internet-Draft is submitted in full conformance with the
49    provisions of BCP 78 and BCP 79.

51    Internet-Drafts are working documents of the Internet Engineering
52    Task Force (IETF). Note that other groups may also distribute
53    working documents as Internet-Drafts. The list of current Internet-
54    Drafts is at https://datatracker.ietf.org/drafts/current/.

56    Internet-Drafts are draft documents valid for a maximum of six months
57    and may be updated, replaced, or obsoleted by other documents at any
58    time. It is inappropriate to use Internet-Drafts as reference
59    material or to cite them other than as "work in progress."

61    This Internet-Draft will expire on August 26, 2021.

63 Copyright Notice

65    Copyright (c) 2021 IETF Trust and the persons identified as the
66    document authors. All rights reserved.

68    This document is subject to BCP 78 and the IETF Trust's Legal
69    Provisions Relating to IETF Documents
70    (https://trustee.ietf.org/license-info) in effect on the date of
71    publication of this document. Please review these documents
72    carefully, as they describe your rights and restrictions with respect
73    to this document. Code Components extracted from this document must
74    include Simplified BSD License text as described in Section 4.e of
75    the Trust Legal Provisions and are provided without warranty as
76    described in the Simplified BSD License.

78 Table of Contents

80    1. Introduction
81      1.1. Latency, Loss and Scaling Problems
82      1.2. Terminology
83      1.3. Scope
84    2. Consensus Choice of L4S Packet Identifier: Requirements
85    3. L4S Packet Identification at Run-Time
86    4. Prerequisite Transport Layer Behaviour (the 'Prague
87       Requirements')
88      4.1. Prerequisite Codepoint Setting
89      4.2. Prerequisite Transport Feedback
90      4.3. Prerequisite Congestion Response
91      4.4. Filtering or Smoothing of ECN Feedback
92    5. Prerequisite Network Node Behaviour
93      5.1. Prerequisite Classification and Re-Marking Behaviour
94      5.2. The Meaning of L4S CE Relative to Drop
95      5.3. Exception for L4S Packet Identification by Network Nodes
96           with Transport-Layer Awareness
97      5.4. Interaction of the L4S Identifier with other Identifiers
98        5.4.1. DualQ Examples of Other Identifiers Complementing L4S
99               Identifiers
100         5.4.1.1. Inclusion of Additional Traffic with L4S
101         5.4.1.2. Exclusion of Traffic From L4S Treatment
102         5.4.1.3. Generalized Combination of L4S and Other
103                  Identifiers
104       5.4.2. Per-Flow Queuing Examples of Other Identifiers
105              Complementing L4S Identifiers
106     5.5. Limiting Packet Bursts from Links Supporting L4S AQMs
107   6. L4S Experiments
108     6.1. Open Questions
109     6.2. Open Issues
110     6.3. Future Potential
111   7. IANA Considerations
112   8. Security Considerations
113   9. Acknowledgements
114   10. References
115     10.1. Normative References
116     10.2. Informative References
117   Appendix A. The 'Prague L4S Requirements'
118     A.1. Requirements for Scalable Transport Protocols
119       A.1.1. Use of L4S Packet Identifier
120       A.1.2. Accurate ECN Feedback
121       A.1.3. Fall back to Reno-friendly congestion control on
122              packet loss
123       A.1.4. Fall back to Reno-friendly congestion control on
124              classic ECN bottlenecks
125       A.1.5. Reduce RTT dependence
126       A.1.6. Scaling down to fractional congestion windows
127       A.1.7. Measuring Reordering Tolerance in Time Units
128     A.2. Scalable Transport Protocol Optimizations
129       A.2.1. Setting ECT in TCP Control Packets and
130              Retransmissions
131       A.2.2. Faster than Additive Increase
132       A.2.3. Faster Convergence at Flow Start
133   Appendix B. Alternative Identifiers
134     B.1. ECT(1) and CE codepoints
135     B.2. ECN-DualQ-SCE1
136     B.3. ECN-DualQ-SCE0
137     B.4. ECN Plus a Diffserv Codepoint (DSCP)
138     B.5. ECN capability alone
139     B.6. Protocol ID
140     B.7. Source or destination addressing
141     B.8. Summary: Merits of Alternative Identifiers
143   Appendix C. Potential Competing Uses for the ECT(1) Codepoint
144     C.1. Integrity of Congestion Feedback
145     C.2. Notification of Less Severe Congestion than CE
146   Authors' Addresses

148 1. Introduction

150    This specification defines the identifier to be used on IP packets
151    for a new network service called low latency, low loss and scalable
152    throughput (L4S). L4S uses an Explicit Congestion Notification (ECN)
153    scheme similar to the original (or 'Classic') ECN [RFC3168]. RFC 3168 required
154    an ECN mark to be equivalent to a drop, both when applied in the
155    network and when responded to by a transport. Unlike Classic ECN
156    marking, the network applies L4S marking more immediately and more
157    aggressively than drop, and the transport response to each mark is
158    reduced and smoothed relative to that for drop. The two changes
159    counterbalance each other so that the throughput of an L4S flow will
160    be roughly the same as a non-L4S flow under the same conditions.
161    Nonetheless, the much more frequent control signals and the finer
162    responses to them result in ultra-low queuing delay without
163    compromising link utilization, and this low delay can be maintained
164    during high load. Ultra-low queuing delay means less than 1
165    millisecond (ms) on average and less than about 2 ms at the 99th
166    percentile.

168    An example of a scalable congestion control that would enable the L4S
169    service is Data Center TCP (DCTCP), which until now has been
170    applicable solely to controlled environments like data centres
171    [RFC8257], because it is too aggressive to co-exist with existing
172    TCP-Reno-friendly traffic. The DualQ Coupled AQM, which is defined
173    in a complementary experimental specification
174    [I-D.ietf-tsvwg-aqm-dualq-coupled], is an AQM framework that enables
175    scalable congestion controls like DCTCP to co-exist with existing
176    traffic, each getting roughly the same flow rate when they compete
177    under similar conditions.
Note that a transport such as DCTCP is
178    still not safe to deploy on the Internet unless it satisfies the
179    requirements listed in Section 4.

181    L4S is not only for elastic (TCP-like) traffic - there are scalable
182    congestion controls for real-time media, such as the L4S variant of
183    the SCReAM [RFC8298] real-time media congestion avoidance technique
184    (RMCAT). The factor that distinguishes L4S from Classic traffic is
185    its behaviour in response to congestion. The transport wire
186    protocol, e.g. TCP, QUIC, SCTP, DCCP, RTP/RTCP, is orthogonal (and
187    therefore not suitable for distinguishing L4S from Classic packets).

189    The L4S identifier defined in this document is the key piece that
190    distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic. It
191    gives an incremental migration path so that suitably modified network
192    bottlenecks can distinguish and isolate existing Classic traffic from
193    L4S traffic to prevent it from degrading the ultra-low delay and loss
194    of the new scalable transports, without harming Classic performance.
195    Initial implementation of the separate parts of the system has been
196    motivated by the performance benefits.

198 1.1. Latency, Loss and Scaling Problems

200    Latency is becoming the critical performance factor for many (most?)
201    applications on the public Internet, e.g. interactive Web, Web
202    services, voice, conversational video, interactive video, interactive
203    remote presence, instant messaging, online gaming, remote desktop,
204    cloud-based applications, and video-assisted remote control of
205    machinery and industrial processes. In the 'developed' world,
206    further increases in access network bit-rate offer diminishing
207    returns, whereas latency is still a multi-faceted problem. In the
208    last decade or so, much has been done to reduce propagation time by
209    placing caches or servers closer to users. However, queuing remains
210    a major intermittent component of latency.
212    The Diffserv architecture provides Expedited Forwarding [RFC3246], so
213    that low latency traffic can jump the queue of other traffic.
214    However, on access links dedicated to individual sites (homes, small
215    enterprises or mobile devices), often all traffic at any one time
216    will be latency-sensitive. Then, given nothing to differentiate
217    from, Diffserv makes no difference. Instead, we need to remove the
218    causes of any unnecessary delay.

220    The bufferbloat project has shown that excessively large buffering
221    ('bufferbloat') has been introducing significantly more delay than
222    the underlying propagation time. These delays appear only
223    intermittently--only when a capacity-seeking (e.g. TCP) flow is long
224    enough for the queue to fill the buffer, making every packet in other
225    flows sharing the buffer sit through the queue.

227    Active queue management (AQM) was originally developed to solve this
228    problem (and others). Unlike Diffserv, which gives low latency to
229    some traffic at the expense of others, AQM controls latency for _all_
230    traffic in a class. In general, AQM methods introduce an increasing
231    level of discard from the buffer the longer the queue persists above
232    a shallow threshold. This gives sufficient signals to capacity-
233    seeking (a.k.a. greedy) flows to keep the buffer empty for its intended
234    purpose: absorbing bursts. However, RED [RFC2309] and other
235    algorithms from the 1990s were sensitive to their configuration and
236    hard to set correctly. So, this form of AQM was not widely deployed.

238    More recent state-of-the-art AQM methods, e.g. FQ-CoDel [RFC8290],
239    PIE [RFC8033], Adaptive RED [ARED01], are easier to configure,
240    because they define the queuing threshold in time not bytes, so it is
241    invariant for different link rates.
However, no matter how good the
242    AQM, the sawtoothing sending window of a Classic congestion control
243    will either cause queuing delay to vary or cause the link to be
244    under-utilized. Even with a perfectly tuned AQM, the additional
245    queuing delay will be of the same order as the underlying speed-of-
246    light delay across the network.

248    If a sender's own behaviour is introducing queuing delay variation,
249    no AQM in the network can 'un-vary' the delay without significantly
250    compromising link utilization. Even flow-queuing (e.g. [RFC8290]),
251    which isolates one flow from another, cannot isolate a flow from the
252    delay variations it inflicts on itself. Therefore those applications
253    that need to seek out high bandwidth but also need low latency will
254    have to migrate to scalable congestion control.

256    Altering host behaviour is not enough on its own though. Even if
257    hosts adopt low latency behaviour (scalable congestion controls),
258    they need to be isolated from the behaviour of existing Classic
259    congestion controls that induce large queue variations. L4S enables
260    that migration by providing latency isolation in the network and
261    distinguishing the two types of packets that need to be isolated: L4S
262    and Classic. L4S isolation can be achieved with a queue per flow
263    (e.g. [RFC8290]) but a DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled] is
264    sufficient, and actually gives better tail latency. Both approaches
265    are addressed in this document.

267    The DualQ solution was developed to make ultra-low latency available
268    without requiring per-flow queues at every bottleneck. This was
269    because FQ has well-known downsides - not least the need to inspect
270    transport layer headers in the network, which makes it incompatible
271    with privacy approaches such as IPsec VPN tunnels, and incompatible
272    with link layer queue management, where transport layer headers can
273    be hidden, e.g. 5G.
275    Latency is not the only concern addressed by L4S: It was known when
276    TCP congestion avoidance was first developed that it would not scale
277    to high bandwidth-delay products (footnote 6 of Jacobson and Karels
278    [TCP-CA]). Given regular broadband bit-rates over WAN distances are
279    already [RFC3649] beyond the scaling range of Reno TCP, 'less
280    unscalable' Cubic [RFC8312] and Compound [I-D.sridharan-tcpm-ctcp]
281    variants of TCP have been successfully deployed. However, these are
282    now approaching their scaling limits. Unfortunately, fully scalable
283    congestion controls such as DCTCP [RFC8257] cause Classic ECN
284    congestion controls sharing the same queue to starve themselves,
285    which is why they have been confined to private data centres or
286    research testbeds (until now).

288    It turns out that a congestion control algorithm like DCTCP that
289    solves the latency problem also solves the scalability problem of
290    Classic congestion controls. The finer sawteeth in the congestion
291    window have low amplitude, so they cause very little queuing delay
292    variation and the average time to recover from one congestion signal
293    to the next (the average duration of each sawtooth) remains
294    invariant, which maintains constant tight control as flow-rate
295    scales. A background paper [DCttH15] gives the full explanation of
296    why the design solves both the latency and the scaling problems, both
297    in plain English and in more precise mathematical form. The
298    explanation is summarised without the maths in the L4S architecture
299    document [I-D.ietf-tsvwg-l4s-arch].

301 1.2. Terminology

303    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
304    "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
305    "OPTIONAL" in this document are to be interpreted as described in
306    [RFC2119]. In this document, these words will appear with that
307    interpretation only when in ALL CAPS.
Lower case uses of these words
308    are not to be interpreted as carrying RFC-2119 significance.

310    Classic Congestion Control: A congestion control behaviour that can
311       co-exist with standard TCP Reno [RFC5681] without causing
312       significantly negative impact on its flow rate [RFC5033]. With
313       Classic congestion controls, as flow rate scales, the number of
314       round trips between congestion signals (losses or ECN marks) rises
315       with the flow rate. So it takes longer and longer to recover
316       after each congestion event. Therefore control of queuing and
317       utilization becomes very slack, and the slightest disturbance
318       prevents a high rate from being attained [RFC3649].

320       For instance, with 1500 byte packets and an end-to-end round trip
321       time (RTT) of 36 ms, over the years, as Reno flow rate scales from
322       2 to 100 Mb/s the number of round trips taken to recover from a
323       congestion event rises proportionately, from 4 round trips to 200.
324       Cubic [RFC8312] was developed to be less unscalable, but it is
325       approaching its scaling limit; with the same RTT of 36 ms, at
326       100 Mb/s it takes about 106 round trips to recover, and at 800 Mb/s
327       its recovery time triples to over 340 round trips, or still more
328       than 12 seconds (Reno would take 57 seconds). Cubic only becomes
329       significantly better than Reno at high delay and rate
330       combinations, for example at 90 ms RTT and 800 Mb/s a Reno flow
331       takes 4000 RTTs or 6 minutes to recover, whereas Cubic 'only'
332       needs 188 RTTs, which is still 17 seconds (double its recovery
333       time at 100 Mb/s).

335    Scalable Congestion Control: A congestion control where the average
336       time from one congestion signal to the next (the recovery time)
337       remains invariant as the flow rate scales, all other factors being
338       equal. This maintains the same degree of control over queuing
339       and utilization whatever the flow rate, as well as ensuring that
340       high throughput is robust to disturbances.
For instance, DCTCP
341       averages 2 congestion signals per round-trip whatever the flow
342       rate, as do other recently developed scalable congestion controls,
343       e.g. Relentless TCP [Mathis09], TCP Prague [PragueLinux] and the
344       L4S variant of SCReAM for real-time media [RFC8298]. See
345       Section 4.3 for more explanation.

347    Classic service: The Classic service is intended for all the
348       congestion control behaviours that co-exist with Reno [RFC5681]
349       (e.g. Reno itself, Cubic [RFC8312], Compound
350       [I-D.sridharan-tcpm-ctcp], TFRC [RFC5348]). The term 'Classic
351       queue' means a queue providing the Classic service.

353    Low-Latency, Low-Loss Scalable throughput (L4S) service: The 'L4S'
354       service is intended for traffic from scalable congestion control
355       algorithms, such as Data Center TCP [RFC8257]. The L4S service is
356       for more general traffic than just DCTCP--it allows the set of
357       congestion controls with similar scaling properties to DCTCP to
358       evolve, such as the examples listed above (Relentless, Prague,
359       SCReAM). The term 'L4S queue' means a queue providing the L4S
360       service.

362       The terms Classic or L4S can also qualify other nouns, such as
363       'queue', 'codepoint', 'identifier', 'classification', 'packet',
364       'flow'. For example: an L4S packet means a packet with an L4S
365       identifier sent from an L4S congestion control.

367       Both Classic and L4S services can cope with a proportion of
368       unresponsive or less-responsive traffic as well, as long as it
369       does not build a queue (e.g. DNS, VoIP, game sync datagrams, etc.).

371    Reno-friendly: The subset of Classic traffic that excludes
372       unresponsive traffic and excludes experimental congestion controls
373       intended to coexist with Reno but without always being strictly
374       friendly to Reno (as allowed by [RFC5033]). Reno-friendly is used
375       in place of 'TCP-friendly', given that the TCP protocol is used
376       with many different congestion control behaviours.
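
The Reno recovery figures quoted under 'Classic Congestion Control'
above can be reproduced with a short calculation. The sketch below is
not taken from this specification: it assumes an idealised Reno
sawtooth in which the window halves on each congestion event and then
grows by one packet per round trip, so that the average window is 3/4
of the peak. The function name and parameters are illustrative only.

```python
def reno_recovery(rate_bps, rtt_s, packet_bytes=1500):
    """Estimate Reno's recovery time after one congestion event.

    Assumption (not from the draft): an idealised sawtooth where the
    window drops from its peak to peak/2 and then grows by one packet
    per round trip, giving an average window of 3/4 of the peak.
    Returns (round_trips, seconds).
    """
    # Average number of packets in flight needed to sustain the rate.
    avg_window = rate_bps * rtt_s / (8 * packet_bytes)
    # Peak of the sawtooth under the 3/4 assumption.
    peak_window = avg_window * 4 / 3
    # Climbing from peak/2 back to peak at one packet per round trip.
    round_trips = peak_window / 2
    return round_trips, round_trips * rtt_s


if __name__ == "__main__":
    for rate, rtt in [(2e6, 0.036), (100e6, 0.036),
                      (800e6, 0.036), (800e6, 0.090)]:
        rtts, secs = reno_recovery(rate, rtt)
        print(f"{rate / 1e6:.0f} Mb/s, {rtt * 1e3:.0f} ms RTT: "
              f"{rtts:.0f} round trips, {secs:.1f} s")
```

Under these assumptions the recovery time grows linearly with flow
rate, which is the scaling problem the definition describes. By
contrast, the scalable congestion controls defined next keep the
recovery time fixed (half a round trip for DCTCP) whatever the rate.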
378    Classic ECN: The original Explicit Congestion Notification (ECN)
379       protocol [RFC3168], which requires ECN signals to be treated the
380       same as drops, both when generated in the network and when
381       responded to by the sender. The names used for the four
382       codepoints of the 2-bit IP-ECN field are as defined in [RFC3168]:
383       Not-ECT, ECT(0), ECT(1) and CE, where ECT stands for ECN-Capable
384       Transport and CE stands for Congestion Experienced.

386 1.3. Scope

388    The new L4S identifier defined in this specification is applicable
389    for IPv4 and IPv6 packets (as for Classic ECN [RFC3168]). It is
390    applicable for the unicast, multicast and anycast forwarding modes.

392    The L4S identifier is an orthogonal packet classification to the
393    Differentiated Services Code Point (DSCP) [RFC2474]. Section 5.4
394    explains what this means in practice.

396    This document is intended for experimental status, so it does not
397    update any standards track RFCs. Therefore it depends on [RFC8311],
398    which is a standards track specification that:

400    o  updates the ECN proposed standard [RFC3168] to allow experimental
401       track RFCs to relax the requirement that an ECN mark must be
402       equivalent to a drop (when the network applies markings and/or
403       when the sender responds to them);

405    o  changes the status of the experimental ECN nonce [RFC3540] to
406       historic;

408    o  makes consequent updates to the following additional proposed
409       standard RFCs to reflect the above two bullets:

411       *  ECN for RTP [RFC6679];

413       *  the congestion control specifications of various DCCP
414          congestion control identifier (CCID) profiles [RFC4341],
415          [RFC4342], [RFC5622].

417    This document is about identifiers that are used for interoperation
418    between hosts and networks.
So the audience is broad, covering
419    developers of host transports and network AQMs, as well as covering
420    how operators might wish to combine various identifiers, which would
421    require flexibility from equipment developers.

423 2. Consensus Choice of L4S Packet Identifier: Requirements

425    This section briefly records the process that led to a consensus
426    choice of L4S identifier, selected from all the alternatives in
427    Appendix B.

429    The identifier for packets using the Low Latency, Low Loss, Scalable
430    throughput (L4S) service needs to meet the following requirements:

432    o  it SHOULD survive end-to-end between source and destination
433       applications: across the boundary between host and network,
434       between interconnected networks, and through middleboxes;

436    o  it SHOULD be visible at the IP layer;

438    o  it SHOULD be common to IPv4 and IPv6 and transport-agnostic;

440    o  it SHOULD be incrementally deployable;

442    o  it SHOULD enable an AQM to classify packets encapsulated by outer
443       IP or lower-layer headers;

445    o  it SHOULD consume minimal extra codepoints;

447    o  it SHOULD be consistent on all the packets of a transport layer
448       flow, so that some packets of a flow are not served by a different
449       queue to others.

451    Whether the identifier would be recoverable if the experiment failed
452    is a factor that could be taken into account. However, this has not
453    been made a requirement, because that would favour schemes that would
454    be easier to fail, rather than those more likely to succeed.

456    It is recognised that the chosen identifier is unlikely to satisfy
457    all these requirements, particularly given the limited space left in
458    the IP header. Therefore a compromise will be necessary, which is
459    why all the above requirements are expressed with the word 'SHOULD'
460    not 'MUST'. Appendix B discusses the pros and cons of the
461    compromises made in various competing identification schemes against
462    the above requirements.
464    On the basis of this analysis, "ECT(1) and CE codepoints" is the best
465    compromise. Therefore this scheme is defined in detail in the
466    following sections, while Appendix B records the rationale for this
467    decision.

469 3. L4S Packet Identification at Run-Time

471    The L4S treatment is an experimental track alternative packet marking
472    treatment [RFC4774] to the Classic ECN treatment in [RFC3168], which
473    has been updated by [RFC8311] to allow experiments such as the one
474    defined in the present specification. Like Classic ECN, L4S ECN
475    identifies both network and host behaviour: it identifies the marking
476    treatment that network nodes are expected to apply to L4S packets,
477    and it identifies packets that have been sent from hosts that are
478    expected to comply with a broad type of sending behaviour.

480    For a packet to receive L4S treatment as it is forwarded, the sender
481    sets the ECN field in the IP header to the ECT(1) codepoint. See
482    Section 4 for full transport layer behaviour requirements, including
483    feedback and congestion response.

485    A network node that implements the L4S service normally classifies
486    arriving ECT(1) and CE packets for L4S treatment. See Section 5 for
487    full network element behaviour requirements, including
488    classification, ECN-marking and interaction of the L4S identifier
489    with other identifiers and per-hop behaviours.

491 4. Prerequisite Transport Layer Behaviour (the 'Prague Requirements')

493 4.1. Prerequisite Codepoint Setting

495    A sender that wishes a packet to receive L4S treatment as it is
496    forwarded, MUST set the ECN field in the IP header (v4 or v6) to the
497    ECT(1) codepoint.

499 4.2. Prerequisite Transport Feedback

501    For a transport protocol to provide scalable congestion control it
502    MUST provide feedback of the extent of CE marking on the forward
503    path. When ECN was added to TCP [RFC3168], the feedback method
504    reported no more than one CE mark per round trip.
Some transport
505    protocols derived from TCP mimic this behaviour while others report
506    the accurate extent of ECN marking. This means that some transport
507    protocols will need to be updated as a prerequisite for scalable
508    congestion control. The position for a few well-known transport
509    protocols is given below.

511    TCP: Support for the accurate ECN feedback requirements [RFC7560]
512       (such as that provided by AccECN [I-D.ietf-tcpm-accurate-ecn]) by
513       both ends is a prerequisite for scalable congestion control in
514       TCP. Therefore, the presence of ECT(1) in the IP headers even in
515       one direction of a TCP connection will imply that both ends must
516       be supporting accurate ECN feedback. However, the converse does
517       not apply. So even if both ends support AccECN, either of the two
518       ends can choose not to use a scalable congestion control, whatever
519       the other end's choice.

521    SCTP: A suitable ECN feedback mechanism for SCTP could add a chunk
522       to report the number of received CE marks
523       (e.g. [I-D.stewart-tsvwg-sctpecn]), and update the ECN feedback
524       protocol sketched out in Appendix A of the standards track
525       specification of SCTP [RFC4960].

527    RTP over UDP: A prerequisite for scalable congestion control is for
528       both (all) ends of one media-level hop to signal ECN support
529       [RFC6679] and use the new generic RTCP feedback format of
530       [I-D.ietf-avtcore-cc-feedback-message]. The presence of ECT(1)
531       implies that both (all) ends of that media-level hop support ECN.
532       However, the converse does not apply. So each end of a media-
533       level hop can independently choose not to use a scalable
534       congestion control, even if both ends support ECN.

536    QUIC: Support for sufficiently fine-grained ECN feedback is provided
537       by the v1 IETF QUIC transport [I-D.ietf-quic-transport].

539    DCCP: The ACK vector in DCCP [RFC4340] is already sufficient to
540       report the extent of CE marking as needed by a scalable congestion
541       control.
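
The codepoint rules of Section 3 and Section 4.1 can be illustrated
with a few lines of code. This is a minimal sketch, not a normative
implementation: the codepoint values are those of the 2-bit ECN field
in [RFC3168], which occupies the two least significant bits of the
IPv4 TOS octet or IPv6 Traffic Class octet, while the names `ECN`,
`set_ect1` and `classify` are hypothetical. The classifier shows only
the default rule (ECT(1) and CE select L4S treatment) and ignores the
exceptions and additional identifiers covered in Section 5.

```python
from enum import IntEnum


class ECN(IntEnum):
    """The four codepoints of the 2-bit IP-ECN field [RFC3168]."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def ecn_of(tos_octet):
    """Extract the ECN codepoint from a TOS / Traffic Class octet."""
    return ECN(tos_octet & 0b11)


def set_ect1(tos_octet):
    """Sender side: keep the 6-bit DSCP, set the ECN field to ECT(1)."""
    return (tos_octet & 0xFC) | int(ECN.ECT_1)


def classify(tos_octet):
    """Network side, default rule only: ECT(1) and CE get L4S treatment."""
    if ecn_of(tos_octet) in (ECN.ECT_1, ECN.CE):
        return "L4S"
    return "Classic"


if __name__ == "__main__":
    octet = set_ect1(0xB8)  # 0xB8 is an arbitrary example DSCP, ECN bits clear
    print(hex(octet), ecn_of(octet).name, classify(octet))
```

Because the DSCP bits are left untouched, the sketch also reflects the
point made in Section 1.3 that the L4S identifier is orthogonal to the
Diffserv codepoint.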
543 4.3. Prerequisite Congestion Response 545 As a condition for a host to send packets with the L4S identifier 546 (ECT(1)), it SHOULD implement a congestion control behaviour that 547 ensures that, in steady state, the average time from one ECN 548 congestion signal to the next (the 'recovery time') does not increase 549 as flow rate scales, all other factors being equal. This is termed a 550 scalable congestion control. This is necessary to ensure that queue 551 variations remain small as flow rate scales, without having to 552 sacrifice utilization. 554 For instance, for DCTCP, TCP Prague [PragueLinux] and the L4S variant 555 of SCReAM [RFC8298], the average recovery time is always half a round 556 trip, whatever the flow rate. 558 As with all transport behaviours, a detailed specification (probably 559 an experimental RFC) will need to be defined for each congestion 560 control, following the guidelines for specifying new congestion 561 control algorithms in [RFC5033]. In addition it will need to 562 document these L4S-specific matters, specifically the timescale over 563 which the proportionality is averaged, and control of burstiness. 564 The recovery time requirement above is worded as a 'SHOULD' rather 565 than a 'MUST' to allow reasonable flexibility when defining these 566 specifications. 568 The condition 'all other factors being equal', allows the recovery 569 time to be different for different round trip times, as long as it 570 does not increase with flow rate for any particular RTT. 572 Saying that the recovery time remains roughly invariant is equivalent 573 to saying that the number of ECN CE marks per round trip remains 574 invariant as flow rate scales, all other factors being equal. For 575 instance, DCTCP's average recovery time of half of 1 RTT is 576 equivalent to 2 ECN marks per round trip. 
For those familiar with 577 steady-state congestion response functions, it is also equivalent to 578 say that the congestion window is inversely proportional to the 579 proportion of bytes in packets marked with the CE codepoint (see 580 section 2 of [PI2]). 582 In order to coexist safely with other Internet traffic, a scalable 583 congestion control MUST NOT tag its packets with the ECT(1) codepoint 584 unless it complies with the following bulleted requirements: 586 o As well as responding to ECN markings, a scalable congestion 587 control MUST react to packet loss in a way that will coexist 588 safely with a TCP Reno congestion control [RFC5681] (see 589 Section 1.2 on Terminology for definition of Reno-Friendly and 590 Appendix A.1.3 for rationale). 592 o A scalable congestion control MUST implement monitoring in order 593 to detect a likely non-L4S but ECN-capable AQM at the bottleneck. 594 On detection of a likely ECN-capable bottleneck it SHOULD be 595 capable (dependent on configuration) of automatically adapting its 596 congestion response to coexist with TCP Reno congestion controls 597 [RFC5681] (see Appendix A.1.4 for rationale and a referenced 598 algorithm). 600 Note that a scalable congestion control is not expected to change 601 to setting ECT(0) while it falls back to coexist with Reno. 603 o A scalable congestion control MUST eliminate RTT bias as much as 604 possible in the range between the minimum likely RTT and typical 605 RTTs expected in the intended deployment scenario (see 606 Appendix A.1.5 for rationale). 608 o A scalable congestion control SHOULD remain responsive to 609 congestion when typical RTTs over the public Internet are 610 significantly smaller because they are no longer inflated by 611 queuing delay. 
It would be preferable for the minimum window of a 612 scalable congestion control to be lower than the 2 segment minimum 613 of TCP Reno [RFC5681] but this is not set as a formal requirement 614 for L4S experiments (see Appendix A.1.6 for rationale). 616 o A scalable congestion control SHOULD detect loss by counting in 617 time-based units, which is scalable, as opposed to counting in 618 units of packets (as in the 3 DupACK rule of RFC 5681 TCP), which 619 is not scalable. As packet rates increase (e.g., due to new and/ 620 or improved technology), congestion controls that detect loss by 621 counting in units of packets become more likely to incorrectly 622 treat reordering events as congestion-caused loss events (see 623 Appendix A.1.7 for further rationale). This requirement does not 624 apply to congestion controls that are solely used in controlled 625 environments where the network introduces hardly any reordering. 627 o A scalable congestion control is expected to limit the queue 628 caused by bursts of packets. It would not seem necessary to set 629 the limit any lower than 10% of the minimum RTT expected in a 630 typical deployment (e.g. additional queuing of roughly 250 us for 631 the public Internet). This would be converted to a number of 632 packets under the worst-case assumption that the bottleneck link 633 capacity equals the current flow rate. No normative requirement 634 to limit bursts is given here and, until there is more industry 635 experience from the L4S experiment, it is not even known whether 636 one is needed - it seems to be in an L4S sender's self-interest to 637 limit bursts. 639 To participate in the L4S experiment, a scalable congestion control 640 MUST be capable of being replaced by a Classic congestion control (by 641 application and by administrative control). A purely Classic 642 congestion control will not tag its packets with the ECT(1) 643 codepoint. 
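The conversion of a burst-delay budget into a number of packets, as described in the burst-limiting bullet of Section 4.3, can be illustrated with a minimal sketch. It is not a normative algorithm. The function name, the 1500-byte packet size, the floor of one packet and the 2.5 ms minimum RTT (inferred from "10%" and "roughly 250 us") are assumptions made for illustration only.

```python
import math


def max_burst_packets(flow_rate_bps: float,
                      min_rtt_s: float,
                      packet_size_bytes: int = 1500,   # assumed packet size
                      rtt_fraction: float = 0.10) -> int:
    """Convert a burst-delay budget into a packet count.

    Uses the worst-case assumption from the text: the bottleneck link
    capacity equals the current flow rate, so a burst of N packets adds
    N * packet_bits / flow_rate seconds of queuing.
    """
    if flow_rate_bps <= 0 or min_rtt_s <= 0 or packet_size_bytes <= 0:
        raise ValueError("rate, RTT and packet size must be positive")
    budget_s = rtt_fraction * min_rtt_s          # e.g. 10% of 2.5 ms = 250 us
    packet_bits = packet_size_bytes * 8
    packets = math.floor(budget_s * flow_rate_bps / packet_bits)
    # Assumption of this sketch: a sender can always emit one packet.
    return max(1, packets)


# 250 us budget at 100 Mb/s is 25,000 bits, i.e. two 1500-byte packets.
print(max_burst_packets(100e6, 0.0025))
```

The sketch shows why the limit is expressed in time rather than packets: the same 250 us budget allows more packets per burst as the flow rate grows.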
645 Each sender in a session can use a scalable congestion control 646 independently of the congestion control used by the receiver(s) when 647 they send data. Therefore there might be ECT(1) packets in one 648 direction and ECT(0) or Not-ECT in the other. 650 Later (Section 5.4.1.1) this document discusses the conditions for 651 mixing other "'Safe' Unresponsive Traffic" (e.g. DNS, LDAP, NTP, 652 voice, game sync packets) with L4S traffic. To be clear, although 653 such traffic can share the same queue as L4S traffic, it is not 654 appropriate for the sender to tag it as ECT(1), except in the 655 (unlikely) case that it satisfies the above conditions. 657 4.4. Filtering or Smoothing of ECN Feedback 659 Section 5.2 below specifies that an L4S AQM is expected to signal L4S 660 ECN without filtering or smoothing. This contrasts with a Classic 661 AQM, which filters out variations in the queue before signalling ECN 662 marking or drop. In the L4S architecture [I-D.ietf-tsvwg-l4s-arch], 663 responsibility for smoothing out these variations shifts to the 664 sender's congestion control. 666 This shift of responsibility has the advantage that each sender can 667 smooth variations over a timescale proportionate to its own RTT. 669 Whereas, in the Classic approach, the network doesn't know the RTTs 670 of all the flows, so it has to smooth out variations for a worst-case 671 RTT to ensure stability. For all the typical flows with shorter RTT 672 than the worst-case, this makes congestion control unnecessarily 673 sluggish. 675 This also gives an L4S sender the choice not to smooth, depending on 676 its context (start-up, congestion avoidance, etc). Therefore, this 677 document places no requirement on an L4S congestion control to smooth 678 out variations in any particular way. Nonetheless, the specification 679 of a particular L4S congestion control SHOULD describe how it smooths 680 the L4S ECN signals fed back to it from the receiver. 682 5. 
Prerequisite Network Node Behaviour 684 5.1. Prerequisite Classification and Re-Marking Behaviour 686 A network node that implements the L4S service MUST classify arriving 687 ECT(1) packets for L4S treatment and, other than in the exceptional 688 case referred to next, it MUST classify arriving CE packets for L4S 689 treatment as well. CE packets might have originated as ECT(1) or 690 ECT(0), but the above rule to classify them as if they originated as 691 ECT(1) is the safe choice (see Appendix B.1 for rationale). The 692 exception is where some flow-aware in-network mechanism happens to be 693 available for distinguishing CE packets that originated as ECT(0), as 694 described in Section 5.3, but there is no implication that such a 695 mechanism is necessary. 697 An L4S AQM treatment follows similar codepoint transition rules to 698 those in RFC 3168. Specifically, the ECT(1) codepoint MUST NOT be 699 changed to any other codepoint than CE, and CE MUST NOT be changed to 700 any other codepoint. An ECT(1) packet is classified as ECN-capable 701 and, if congestion increases, an L4S AQM algorithm will increasingly 702 mark the ECN field as CE, otherwise forwarding packets unchanged as 703 ECT(1). Necessary conditions for an L4S marking treatment are 704 defined in Section 5.2. 706 Under persistent overload an L4S marking treatment MUST begin using 707 Classic drop until the overload episode has subsided, as recommended 708 for all AQM methods in [RFC7567] (Section 4.2.1), which follows the 709 similar advice in RFC 3168 (Section 7). Where an L4S AQM is 710 transport-aware, this requirement could be satisfied by using Classic 711 drop in only the most overloaded individual per-flow AQMs or in a 712 DualQ by redirecting packets in those flows contributing most to the 713 overload from the L4S queue so that they are subjected to drop in the 714 Classic queue [I-D.briscoe-docsis-q-protection]. 
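As an illustration only, the classification and codepoint transition rules of Section 5.1 could be sketched as follows for a node without transport-layer awareness. The names `ECN`, `ecn_from_tos`, `classify` and `l4s_remark` are hypothetical. The binary codepoint values are those of RFC 3168, and the Classic branch reflects the requirement in Section 5.1 that ECT(0) and Not-ECT packets are classified for the Classic treatment.

```python
from enum import IntEnum


class ECN(IntEnum):
    """ECN field values (two least significant bits of the IPv4 TOS /
    IPv6 Traffic Class octet), as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def ecn_from_tos(tos_octet: int) -> ECN:
    """Extract the ECN field from the TOS / Traffic Class octet."""
    return ECN(tos_octet & 0b11)


def classify(ecn: ECN) -> str:
    """ECT(1) and CE go to the L4S treatment; everything else is Classic."""
    return "L4S" if ecn in (ECN.ECT1, ECN.CE) else "Classic"


def l4s_remark(ecn: ECN, congestion_mark: bool) -> ECN:
    """Apply the L4S transition rule: ECT(1) may only become CE, and CE
    is never changed. Other codepoints are left untouched by this sketch."""
    if ecn is ECN.ECT1 and congestion_mark:
        return ECN.CE
    return ecn
```

The exception for flow-aware nodes (Section 5.3) and the fall-back to drop under persistent overload are deliberately left out of this sketch.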
716 For backward compatibility in uncontrolled environments, a network 717 node that implements the L4S treatment MUST also implement an AQM 718 treatment for the Classic service as defined in Section 1.2. This 719 Classic AQM treatment need not mark ECT(0) packets, but if it does, 720 it will do so under the same conditions as it would drop Not-ECT 721 packets [RFC3168]. It MUST classify arriving ECT(0) and Not-ECT 722 packets for treatment by this Classic AQM (for the DualQ Coupled AQM, 723 see the extensive discussion on classification in Sections 2.3 and 724 2.5.1.1 of [I-D.ietf-tsvwg-aqm-dualq-coupled]). 726 5.2. The Meaning of L4S CE Relative to Drop 728 The likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST 729 be roughly proportional to the square of the likelihood that it would 730 have marked it if it had been an L4S packet (p_L). That is: 732 p_C ~= (p_L / k)^2 734 The constant of proportionality (k) does not have to be standardised 735 for interoperability, but a value of 2 is RECOMMENDED. The term 736 'likelihood' is used above to allow for marking and dropping to be 737 either probabilistic or deterministic. 739 This formula ensures that Scalable and Classic flows will converge to 740 roughly equal congestion windows, for the worst case of Reno 741 congestion control. This is because the congestion windows of 742 Scalable and Classic congestion controls are inversely proportional 743 to p_L and sqrt(p_C) respectively. So squaring (p_L / k) in the above 744 formula counterbalances the square root that characterizes Reno- 745 friendly flows. 747 The relative strengths of L4S CE and drop are irrelevant in an AQM 748 that schedules application flows explicitly (e.g. an FQ scheduler). 749 Nonetheless, the above relationship defines the coupling between L4S 750 and Classic congestion signals in a DualQ Coupled AQM 751 [I-D.ietf-tsvwg-aqm-dualq-coupled].
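The coupling relationship p_C ~= (p_L / k)^2 can be checked numerically with a small sketch. The function names are hypothetical, and the constants of proportionality of the two congestion response functions are omitted because this section does not give them. The sketch therefore only shows that the ratio between the two windows stays constant as p_L varies, not that the windows are exactly equal.

```python
import math


def classic_drop_probability(p_l: float, k: float = 2.0) -> float:
    """Classic drop likelihood coupled to the L4S marking likelihood:
    p_C ~= (p_L / k)^2, with k = 2 as the RECOMMENDED value."""
    if not 0.0 <= p_l <= 1.0:
        raise ValueError("p_l must lie in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, (p_l / k) ** 2)


def window_ratio(p_l: float, k: float = 2.0) -> float:
    """Ratio of Scalable to Classic window, up to omitted constants.

    Scalable window is taken as proportional to 1 / p_L and the
    Classic (Reno-friendly) window as proportional to 1 / sqrt(p_C).
    """
    if p_l <= 0.0:
        raise ValueError("p_l must be positive to compute a window")
    p_c = classic_drop_probability(p_l, k)
    scalable = 1.0 / p_l
    classic = 1.0 / math.sqrt(p_c)
    return scalable / classic


# The ratio does not depend on the level of congestion.
for p in (0.02, 0.1, 0.2):
    print(p, classic_drop_probability(p), window_ratio(p))
```

Because the ratio is independent of p_L, a suitable choice of k can balance the two window sizes across the whole range of congestion levels, which is the property the formula is intended to provide.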
753 Note that, contrary to RFC 3168, a Dual Queue Coupled AQM 754 implementing the L4S and Classic treatments does not mark an ECT(1) 755 packet under the same conditions that it would have dropped a Not-ECT 756 packet, as allowed by [RFC8311], which updates RFC 3168. However, if 757 it marks ECT(0) packets, it does so under the same conditions that it 758 would have dropped a Not-ECT packet. 760 Also, L4S CE marking needs to be interpreted as an unsmoothed signal, 761 in contrast to the Classic approach in which AQMs filter out 762 variations before signalling congestion. An L4S AQM SHOULD NOT 763 smooth or filter out variations in the queue before signalling 764 congestion. In the L4S architecture [I-D.ietf-tsvwg-l4s-arch], the 765 sender, not the network, is responsible for smoothing out variations. 767 This requirement is worded as 'SHOULD NOT' rather than 'MUST NOT' to 768 allow for the case where the signals from a Classic smoothed AQM are 769 coupled with those from an unsmoothed L4S AQM. Nonetheless, the 770 spirit of the requirement is for all systems to expect that L4S ECN 771 signalling is unsmoothed and unfiltered, which is important for 772 interoperability. 774 5.3. Exception for L4S Packet Identification by Network Nodes with 775 Transport-Layer Awareness 777 To implement the L4S treatment, a network node does not need to 778 identify transport-layer flows. Nonetheless, if an implementer is 779 willing to identify transport-layer flows at a network node, and if 780 the most recent ECT packet in the same flow was ECT(0), the node MAY 781 classify CE packets for Classic ECN [RFC3168] treatment. In all 782 other cases, a network node MUST classify all CE packets for L4S 783 treatment. Examples of such other cases are: i) if no ECT packets 784 have yet been identified in a flow; ii) if it is not desirable for a 785 network node to identify transport-layer flows; or iii) if the most 786 recent ECT packet in a flow was ECT(1). 
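For a node that does choose to identify transport-layer flows, the CE classification exception could be sketched as below. This is a simplified illustration, not a specified algorithm: the class name, the use of an arbitrary hashable flow identifier and the unbounded per-flow table (no state expiry) are all assumptions of the sketch.

```python
from typing import Dict, Hashable

# ECN codepoints as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


class FlowAwareClassifier:
    """Classify packets, remembering the most recent ECT codepoint per flow."""

    def __init__(self) -> None:
        self._last_ect: Dict[Hashable, int] = {}

    def classify(self, flow_id: Hashable, ecn: int) -> str:
        if ecn in (ECT0, ECT1):
            # Only the most recent ECT packet of the flow is remembered.
            self._last_ect[flow_id] = ecn
            return "L4S" if ecn == ECT1 else "Classic"
        if ecn == CE:
            # CE is treated as Classic only if the most recent ECT packet
            # in the same flow was ECT(0); in all other cases it is L4S.
            if self._last_ect.get(flow_id) == ECT0:
                return "Classic"
            return "L4S"
        return "Classic"  # Not-ECT


clf = FlowAwareClassifier()
print(clf.classify("flow-a", CE))    # no ECT packet seen yet
print(clf.classify("flow-a", ECT0))
print(clf.classify("flow-a", CE))    # most recent ECT packet was ECT(0)
```

Keeping only the latest ECT codepoint means that a flow switching between ECT(0) and ECT(1) mid-connection changes how its subsequent CE packets are classified.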
788 If an implementer uses flow-awareness to classify CE packets, to 789 determine whether the flow is using ECT(0) or ECT(1) it only uses the 790 most recent ECT packet of a flow (this advice will need to be 791 verified as part of L4S experiments). This is because a sender might 792 switch from sending ECT(1) (L4S) packets to sending ECT(0) (Classic 793 ECN) packets, or back again, in the middle of a transport-layer flow 794 (e.g. it might manually switch its congestion control module mid- 795 connection, or it might be deliberately attempting to confuse the 796 network). 798 5.4. Interaction of the L4S Identifier with other Identifiers 800 The examples in this section concern how additional identifiers might 801 complement the L4S identifier to classify packets between class-based 802 queues. Firstly Section 5.4.1 considers two queues, L4S and Classic, 803 as in the Coupled DualQ AQM [I-D.ietf-tsvwg-aqm-dualq-coupled], 804 either alone (Section 5.4.1.1) or within a larger queuing hierarchy 805 (Section 5.4.1.2). Then Section 5.4.2 considers schemes that might 806 combine per-flow 5-tuples with other identifiers. 808 5.4.1. DualQ Examples of Other Identifiers Complementing L4S 809 Identifiers 811 5.4.1.1. Inclusion of Additional Traffic with L4S 813 In a typical case for the public Internet a network element that 814 implements L4S in a shared queue might want to classify some low-rate 815 but unresponsive traffic (e.g. DNS, LDAP, NTP, voice, game sync 816 packets) into the low latency queue to mix with L4S traffic. Such 817 non-ECN-based packet types MUST be safe to mix with L4S traffic 818 without harming the low latency service, where 'safe' is explained in 819 Section 5.4.1.1.1 below. 821 In this case it would not be appropriate to call the queue an L4S 822 queue, because it is shared by L4S and non-L4S traffic. Instead it 823 will be called the low latency or L queue. 
The L queue then offers 824 two different treatments: 826 o The L4S treatment, which is a combination of the L4S AQM treatment 827 and a priority scheduling treatment; 829 o The low latency treatment, which is solely the priority scheduling 830 treatment, without ECN-marking by the AQM. 832 To identify packets for just the scheduling treatment, it would be 833 inappropriate to use the L4S ECT(1) identifier, because such traffic 834 is unresponsive to ECN marking. Therefore, a network element that 835 implements L4S in a shared queue MAY classify additional packets into 836 the L queue if they carry certain non-ECN identifiers. For instance: 838 o addresses of specific applications or hosts configured to be safe 839 (or perhaps they comply with L4S behaviour and can respond to ECN 840 feedback, but perhaps cannot set the ECN field for some reason); 842 o certain protocols that are usually lightweight (e.g. ARP, DNS); 844 o specific Diffserv codepoints that indicate traffic with limited 845 burstiness such as the EF (Expedited Forwarding [RFC3246]), Voice- 846 Admit [RFC5865] or proposed NQB (Non-Queue-Building 847 [I-D.ietf-tsvwg-nqb]) service classes or equivalent local-use 848 DSCPs (see [I-D.briscoe-tsvwg-l4s-diffserv]). 850 Of course, a packet that carried both the ECT(1) codepoint and a non- 851 ECN identifier associated with the L queue would be classified into 852 the L queue. 854 For clarity, non-ECN identifiers, such as the examples itemized 855 above, might be used by some network operators who believe they 856 identify non-L4S traffic that would be safe to mix with L4S traffic. 857 They are not alternative ways for a host to indicate that it is 858 sending L4S packets. Only the ECT(1) ECN codepoint indicates to a 859 network element that a host is sending L4S packets (and CE indicates 860 that it could have originated as ECT(1)). 
Specifically ECT(1) 861 indicates that the host claims its behaviour satisfies the 862 prerequisite transport requirements in Section 4. 864 To include additional traffic with L4S, a network element only reads 865 identifiers such as those itemized above. It MUST NOT alter these 866 non-ECN identifiers, so that they survive for any potential use later 867 on the network path. 869 5.4.1.1.1. 'Safe' Unresponsive Traffic 871 The above section requires unresponsive traffic to be 'safe' to mix 872 with L4S traffic. Ideally this means that the sender never sends any 873 sequence of packets at a rate that exceeds the available capacity of 874 the bottleneck link. However, typically an unresponsive transport 875 does not even know the bottleneck capacity of the path, let alone its 876 available capacity. Nonetheless, an application can be considered 877 safe enough if it paces packets out (not necessarily completely 878 regularly) such that its maximum instantaneous rate from packet to 879 packet stays well below a typical broadband access rate. 881 This is a vague but useful definition, because many low latency 882 applications of interest, such as DNS, voice, game sync packets, RPC, 883 ACKs, keep-alives, could match this description. 885 5.4.1.2. Exclusion of Traffic From L4S Treatment 887 To extend the above example, an operator might want to exclude some 888 traffic from the L4S treatment for a policy reason, e.g. security 889 (traffic from malicious sources) or commercial (e.g. initially the 890 operator may wish to confine the benefits of L4S to business 891 customers). 893 In this exclusion case, the operator MUST classify on the relevant 894 locally-used identifiers (e.g. source addresses) before classifying 895 the non-matching traffic on the end-to-end L4S ECN identifier. 897 The operator MUST NOT alter the end-to-end L4S ECN identifier from 898 L4S to Classic, because its decision to exclude certain traffic from 899 L4S treatment is local-only. 
The end-to-end L4S identifier then 900 survives for other operators to use, or indeed, they can apply their 901 own policy, independently based on their own choice of locally-used 902 identifiers. This approach also allows any operator to remove its 903 locally-applied exclusions in future, e.g. if it wishes to widen the 904 benefit of the L4S treatment to all its customers. 906 5.4.1.3. Generalized Combination of L4S and Other Identifiers 908 L4S concerns low latency, which it can provide for all traffic 909 without differentiation and without _necessarily_ affecting bandwidth 910 allocation. Diffserv provides for differentiation of both bandwidth 911 and low latency, but its control of latency depends on its control of 912 bandwidth. The two can be combined if a network operator wants to 913 control bandwidth allocation but it also wants to provide low latency 914 - for any amount of traffic within one of these allocations of 915 bandwidth (rather than only providing low latency by limiting 916 bandwidth) [I-D.briscoe-tsvwg-l4s-diffserv]. 918 The DualQ examples so far have been framed in the context of 919 providing the default Best Efforts Per-Hop Behaviour (PHB) using two 920 queues - a Low Latency (L) queue and a Classic (C) Queue. This 921 single DualQ structure is expected to be the most common and useful 922 arrangement. But, more generally, an operator might choose to 923 control bandwidth allocation through a hierarchy of Diffserv PHBs at 924 a node, and to offer one (or more) of these PHBs with a low latency 925 and a Classic variant. 927 In the first case, if we assume that a network element provides no 928 PHBs except the DualQ, if a packet carries ECT(1) or CE, the network 929 element would classify it for the L4S treatment irrespective of its 930 DSCP. And, if a packet carried (say) the EF DSCP, the network 931 element could classify it into the L queue irrespective of its ECN 932 codepoint. 
However, where the DualQ is in a hierarchy of other PHBs, 933 the classifier would classify some traffic into other PHBs based on 934 DSCP before classifying between the low latency and Classic queues 935 (based on ECT(1), CE and perhaps also the EF DSCP or other 936 identifiers as in the above example). 937 [I-D.briscoe-tsvwg-l4s-diffserv] gives a number of examples of such 938 arrangements to address various requirements. 940 [I-D.briscoe-tsvwg-l4s-diffserv] describes how an operator might use 941 L4S to offer low latency for all L4S traffic as well as using 942 Diffserv for bandwidth differentiation. It identifies two main types 943 of approach, which can be combined: the operator might split certain 944 Diffserv PHBs between L4S and a corresponding Classic service. Or it 945 might split the L4S and/or the Classic service into multiple Diffserv 946 PHBs. In either of these cases, a packet would have to be classified 947 on its Diffserv and ECN codepoints. 949 In summary, there are numerous ways in which the L4S ECN identifier 950 (ECT(1) and CE) could be combined with other identifiers to achieve 951 particular objectives. The following categorization articulates 952 those that are valid, but it is not necessarily exhaustive. Those 953 tagged 'Recommended-standard-use' could be set by the sending host or 954 a network. Those tagged 'Local-use' would only be set by a network: 956 1. Identifiers Complementing the L4S Identifier 958 A. Including More Traffic in the L Queue 959 (Could use Recommended-standard-use or Local-use identifiers) 961 B. Excluding Certain Traffic from the L Queue 962 (Local-use only) 964 2. Identifiers to place L4S classification in a PHB Hierarchy 965 (Could use Recommended-standard-use or Local-use identifiers) 967 A. PHBs Before L4S ECN Classification 969 B. PHBs After L4S ECN Classification 971 5.4.2. Per-Flow Queuing Examples of Other Identifiers Complementing L4S 972 Identifiers 974 At a node with per-flow queueing (e.g. 
FQ-CoDel [RFC8290]), the L4S 975 identifier could complement the Layer-4 flow ID as a further level of 976 flow granularity (i.e. Not-ECT and ECT(0) queued separately from 977 ECT(1) and CE packets). "Risk of reordering Classic CE packets" in 978 Appendix B.1 discusses the resulting ambiguity if packets originally 979 marked ECT(0) are marked CE by an upstream AQM before they arrive at 980 a node that classifies CE as L4S. It argues that the risk of 981 reordering is vanishingly small and the consequence of such a low 982 level of reordering is minimal. 984 Alternatively, it could be assumed that it is not in a flow's own 985 interest to mix Classic and L4S identifiers. Then the AQM could use 986 the ECN field to switch itself between a Classic and an L4S AQM 987 behaviour within one per-flow queue. For instance, for ECN-capable 988 packets, the AQM might consist of a simple marking threshold and an 989 L4S ECN identifier might simply select a shallower threshold than a 990 Classic ECN identifier would. 992 5.5. Limiting Packet Bursts from Links Supporting L4S AQMs 994 As well as senders needing to limit packet bursts (Section 4.3), 995 links need to limit the degree of burstiness they introduce. In both 996 cases (senders and links) this is a tradeoff, because batch-handling 997 of packets is done for good reason, e.g. processing efficiency or to 998 make efficient use of medium acquisition delay. Some take the 999 attitude that there is no point reducing burst delay at the sender 1000 below that introduced by links (or vice versa). However, delay 1001 reduction proceeds by cutting down 'the longest pole in the tent', 1002 which turns the spotlight on the next longest, and so on. 1004 This document does not set any quantified requirements for links to 1005 limit burst delay, primarily because link technologies are outside 1006 the remit of L4S specifications. 
Nonetheless, it would not make 1007 sense to implement an L4S AQM that feeds into a particular link 1008 technology without also reviewing opportunities to reduce any form of 1009 burst delay introduced by that link technology. This would at least 1010 limit the bursts that the link would otherwise introduce into the 1011 onward traffic, which would cause jumpy feedback to the sender as 1012 well as potential extra queuing delay downstream. This document does 1013 not presume to even give guidance on an appropriate target for such 1014 burst delay until there is more industry experience of L4S. However, 1015 as suggested in Section 4.3, it would not seem necessary to limit 1016 bursts lower than roughly 10% of the minimum base RTT expected in the 1017 typical deployment scenario (e.g. 250 us burst duration for links 1018 within the public Internet). 1020 6. L4S Experiments 1022 This section describes open questions that L4S Experiments ought to 1023 focus on. This section also documents outstanding open issues that 1024 will need to be investigated as part of L4S experimentation, given 1025 they could not be fully resolved during the WG phase. It also lists 1026 metrics that will need to be monitored during experiments 1027 (summarizing text elsewhere in L4S documents) and finally lists some 1028 potential future directions that researchers might wish to 1029 investigate. 1031 In addition to this section, [I-D.ietf-tsvwg-aqm-dualq-coupled] sets 1032 operational and management requirements for experiments with DualQ 1033 Coupled AQMs; and general operational and management requirements for 1034 experiments with L4S congestion controls are given in Section 4 and 1035 Section 5 above, e.g. co-existence and scaling requirements, 1036 incremental deployment arrangements. 1038 The specification of each scalable congestion control will need to 1039 include protocol-specific requirements for configuration and 1040 monitoring performance during experiments.
Appendix A of [RFC5706] 1041 provides a helpful checklist. 1043 6.1. Open Questions 1045 L4S experiments would be expected to answer the following questions: 1047 o Have all the parts of L4S been deployed, and if so, what 1048 proportion of paths support it? 1050 o Does use of L4S over the Internet result in significantly improved 1051 user experience? 1053 o Has L4S enabled novel interactive applications? 1055 o Did use of L4S over the Internet result in improvements to the 1056 following metrics: 1060 * queue delay (mean and 99th percentile) under various loads 1062 * utilization 1064 * starvation / fairness 1066 * scaling range of flow rates and RTTs 1068 o How much does burstiness in the Internet affect L4S performance, 1069 and how much limitation of burstiness was needed and/or was 1070 realized - both at senders and at links, especially radio links? 1072 o Was per-flow queue protection typically (un)necessary? 1074 * How well did overload protection or queue protection work? 1076 o How well did L4S flows coexist with Classic flows when sharing a 1077 bottleneck? 1081 * How frequently did problems arise? 1083 * What caused any coexistence problems, and were any problems due 1084 to single-queue Classic ECN AQMs (this assumes single-queue 1085 Classic ECN AQMs can be distinguished from FQ ones)? 1087 o How prevalent were problems with the L4S service due to tunnels / 1088 encapsulations that do not support ECN decapsulation? 1090 o How easy was it to implement a fully compliant L4S congestion 1091 control, over various different transport protocols (TCP, QUIC, 1092 RMCAT, etc.)? 1094 Monitoring for harm to other traffic, specifically bandwidth 1095 starvation or excess queuing delay, will need to be conducted 1096 alongside all early L4S experiments. It is hard, if not impossible, 1097 for an individual flow to measure its impact on other traffic.
So 1098 such monitoring will need to be conducted using bespoke monitoring 1099 across flows and/or across classes of traffic. 1101 6.2. Open Issues 1103 o What is the best way forward to deal with L4S over single-queue 1104 Classic ECN AQM bottlenecks, given current problems with 1105 misdetecting L4S AQMs as Classic ECN AQMs? 1107 o Fixing the poor interaction between current L4S congestion 1108 controls and CoDel with only Classic ECN support during flow 1109 startup 1111 6.3. Future Potential 1113 Researchers might find that L4S opens up the following interesting 1114 areas for investigation: 1116 o Potential for faster convergence time and tracking of available 1117 capacity 1119 o Potential for improvements to particular link technologies, and 1120 cross-layer interactions with them 1122 o Potential for using virtual queues, e.g. to further reduce latency 1123 jitter, or to leave headroom for capacity variation in radio 1124 networks 1126 o Development and specification of reverse path congestion control 1127 using L4S building blocks (e.g. AccECN, QUIC) 1129 o Once queuing delay is cut down, what becomes the 'second longest 1130 pole in the tent' (other than the speed of light)? 1132 o Novel alternatives to the existing set of L4S AQMs 1134 o Novel applications enabled by L4S 1136 7. IANA Considerations 1138 The 01 codepoint of the ECN Field of the IP header is specified by 1139 the present Experimental RFC. The process for an experimental RFC to 1140 assign this codepoint in the IP header (v4 and v6) is documented in 1141 Proposed Standard [RFC8311], which updates the Proposed Standard 1142 [RFC3168].
1144 When the present document is published as an RFC, IANA is asked to 1145 update the 01 entry in the registry, "ECN Field (Bits 6-7)" to the 1146 following (see https://www.iana.org/assignments/dscp-registry/dscp- 1147 registry.xhtml#ecn-field ): 1149 +--------+-----------------------------+----------------------------+ 1150 | Binary | Keyword | References | 1151 +--------+-----------------------------+----------------------------+ 1152 | 01 | ECT(1) (ECN-Capable | [RFC8311] | 1153 | | Transport(1))[1] | [RFC Errata 5399] | 1154 | | | [RFCXXXX] | 1155 +--------+-----------------------------+----------------------------+ 1157 [XXXX is the number that the RFC Editor assigns to the present 1158 document (this sentence to be removed by the RFC Editor)]. 1160 8. Security Considerations 1162 Approaches to assure the integrity of signals using the new 1163 identifier are introduced in Appendix C.1. See the security 1164 considerations in the L4S architecture [I-D.ietf-tsvwg-l4s-arch] for 1165 further discussion of mis-use of the identifier, as well as extensive 1166 discussion of policing rate and latency in regard to L4S. 1168 The recommendation to detect loss in time units prevents the ACK- 1169 splitting attacks described in [Savage-TCP]. 1171 9. Acknowledgements 1173 Thanks to Richard Scheffenegger, John Leslie, David Taeht, Jonathan 1174 Morton, Gorry Fairhurst, Michael Welzl, Mikael Abrahamsson and Andrew 1175 McGregor for the discussions that led to this specification. Ing-jyh 1176 (Inton) Tsang was a contributor to the early drafts of this document. 1177 And thanks to Mikael Abrahamsson, Lloyd Wood, Nicolas Kuhn, Greg 1178 White, Tom Henderson, David Black, Gorry Fairhurst, Brian Carpenter, 1179 Jake Holland, Rod Grimes and Richard Scheffenegger for providing help 1180 and reviewing this draft and to Ingemar Johansson for reviewing and 1181 providing substantial text. 
Particular thanks to Wes Eddy for 1182 patiently shepherding this and the other L4S drafts through the IETF 1183 process. Appendix A listing the Prague L4S Requirements is based on 1184 text authored by Marcelo Bagnulo Braun that was originally an 1185 appendix to [I-D.ietf-tsvwg-l4s-arch]. That text was in turn based 1186 on the collective output of the attendees listed in the minutes of a 1187 'bar BoF' on DCTCP Evolution during IETF-94 [TCPPrague]. 1189 The authors' contributions were part-funded by the European Community 1190 under its Seventh Framework Programme through the Reducing Internet 1191 Transport Latency (RITE) project (ICT-317700). Bob Briscoe was also 1192 funded partly by the Research Council of Norway through the TimeIn 1193 project, partly by CableLabs and partly by the Comcast Innovation 1194 Fund. The views expressed here are solely those of the authors. 1196 10. References 1198 10.1. Normative References 1200 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1201 Requirement Levels", BCP 14, RFC 2119, 1202 DOI 10.17487/RFC2119, March 1997, 1203 . 1205 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 1206 of Explicit Congestion Notification (ECN) to IP", 1207 RFC 3168, DOI 10.17487/RFC3168, September 2001, 1208 . 1210 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the 1211 Explicit Congestion Notification (ECN) Field", BCP 124, 1212 RFC 4774, DOI 10.17487/RFC4774, November 2006, 1213 . 1215 [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., 1216 and K. Carlberg, "Explicit Congestion Notification (ECN) 1217 for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 1218 2012, . 1220 10.2. Informative References 1222 [A2DTCP] Zhang, T., Wang, J., Huang, J., Huang, Y., Chen, J., and 1223 Y. Pan, "Adaptive-Acceleration Data Center TCP", IEEE 1224 Transactions on Computers 64(6):1522-1533, June 2015, 1225 . 
1228 [Ahmed19] Ahmed, A., "Extending TCP for Low Round Trip Delay", 1229 Masters Thesis, Uni Oslo , August 2019, 1230 . 1232 [Alizadeh-stability] 1233 Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis 1234 of DCTCP: Stability, Convergence, and Fairness", ACM 1235 SIGMETRICS 2011 , June 2011. 1237 [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An 1238 Algorithm for Increasing the Robustness of RED's Active 1239 Queue Management", ACIRI Technical Report , August 2001, 1240 . 1242 [DCttH15] De Schepper, K., Bondarenko, O., Briscoe, B., and I. 1243 Tsang, "'Data Centre to the Home': Ultra-Low Latency for 1244 All", RITE Project Technical Report , 2015, 1245 . 1247 [ecn-fallback] 1248 Briscoe, B. and A. Ahmed, "TCP Prague Fall-back on 1249 Detection of a Classic ECN AQM", bobbriscoe.net Technical 1250 Report TR-BB-2019-002, April 2020, 1251 . 1253 [I-D.briscoe-docsis-q-protection] 1254 Briscoe, B. and G. White, "Queue Protection to Preserve 1255 Low Latency", draft-briscoe-docsis-q-protection-00 (work 1256 in progress), July 2019. 1258 [I-D.briscoe-tsvwg-l4s-diffserv] 1259 Briscoe, B., "Interactions between Low Latency, Low Loss, 1260 Scalable Throughput (L4S) and Differentiated Services", 1261 draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), 1262 November 2018. 1264 [I-D.ietf-avtcore-cc-feedback-message] 1265 Sarker, Z., Perkins, C., Singh, V., and M. Ramalho, "RTP 1266 Control Protocol (RTCP) Feedback for Congestion Control", 1267 draft-ietf-avtcore-cc-feedback-message-09 (work in 1268 progress), November 2020. 1270 [I-D.ietf-quic-transport] 1271 Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed 1272 and Secure Transport", draft-ietf-quic-transport-34 (work 1273 in progress), January 2021. 1275 [I-D.ietf-tcpm-accurate-ecn] 1276 Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More 1277 Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate- 1278 ecn-13 (work in progress), November 2020. 
1280 [I-D.ietf-tcpm-generalized-ecn] 1281 Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit 1282 Congestion Notification (ECN) to TCP Control Packets", 1283 draft-ietf-tcpm-generalized-ecn-06 (work in progress), 1284 October 2020. 1286 [I-D.ietf-tcpm-rack] 1287 Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The 1288 RACK-TLP loss detection algorithm for TCP", draft-ietf- 1289 tcpm-rack-15 (work in progress), December 2020. 1291 [I-D.ietf-tsvwg-aqm-dualq-coupled] 1292 Schepper, K., Briscoe, B., and G. White, "DualQ Coupled 1293 AQMs for Low Latency, Low Loss and Scalable Throughput 1294 (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-13 (work in 1295 progress), November 2020. 1297 [I-D.ietf-tsvwg-ecn-encap-guidelines] 1298 Briscoe, B. and J. Kaippallimalil, "Guidelines for Adding 1299 Congestion Notification to Protocols that Encapsulate IP", 1300 draft-ietf-tsvwg-ecn-encap-guidelines-14 (work in 1301 progress), November 2020. 1303 [I-D.ietf-tsvwg-l4s-arch] 1304 Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low 1305 Latency, Low Loss, Scalable Throughput (L4S) Internet 1306 Service: Architecture", draft-ietf-tsvwg-l4s-arch-08 (work 1307 in progress), November 2020. 1309 [I-D.ietf-tsvwg-nqb] 1310 White, G. and T. Fossati, "A Non-Queue-Building Per-Hop 1311 Behavior (NQB PHB) for Differentiated Services", draft- 1312 ietf-tsvwg-nqb-03 (work in progress), November 2020. 1314 [I-D.ietf-tsvwg-rfc6040update-shim] 1315 Briscoe, B., "Propagating Explicit Congestion Notification 1316 Across IP Tunnel Headers Separated by a Shim", draft-ietf- 1317 tsvwg-rfc6040update-shim-12 (work in progress), November 1318 2020. 1320 [I-D.morton-tsvwg-sce] 1321 Morton, J., Heist, P., and R. Grimes, "The Some Congestion 1322 Experienced ECN Codepoint", draft-morton-tsvwg-sce-02 1323 (work in progress), November 2020. 1325 [I-D.sridharan-tcpm-ctcp] 1326 Sridharan, M., Tan, K., Bansal, D., and D. 
Thaler, 1327 "Compound TCP: A New TCP Congestion Control for High-Speed 1328 and Long Distance Networks", draft-sridharan-tcpm-ctcp-02 1329 (work in progress), November 2008. 1331 [I-D.stewart-tsvwg-sctpecn] 1332 Stewart, R., Tuexen, M., and X. Dong, "ECN for Stream 1333 Control Transmission Protocol (SCTP)", draft-stewart- 1334 tsvwg-sctpecn-05 (work in progress), January 2014. 1336 [LinuxPacedChirping] 1337 Misund, J. and B. Briscoe, "Paced Chirping - Rethinking 1338 TCP start-up", Proc. Linux Netdev 0x13 , March 2019, 1339 . 1341 [Mathis09] 1342 Mathis, M., "Relentless Congestion Control", PFLDNeT'09 , 1343 May 2009, . 1346 [Paced-Chirping] 1347 Misund, J., "Rapid Acceleration in TCP Prague", Masters 1348 Thesis , May 2018, 1349 . 1352 [PI2] De Schepper, K., Bondarenko, O., Tsang, I., and B. 1353 Briscoe, "PI^2 : A Linearized AQM for both Classic and 1354 Scalable TCP", Proc. ACM CoNEXT 2016 pp.105-119, December 1355 2016, 1356 . 1358 [PragueLinux] 1359 Briscoe, B., De Schepper, K., Albisser, O., Misund, J., 1360 Tilmans, O., Kuehlewind, M., and A. Ahmed, "Implementing 1361 the `TCP Prague' Requirements for Low Latency Low Loss 1362 Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 , 1363 March 2019, . 1366 [QV] Briscoe, B. and P. Hurtig, "Up to Speed with Queue View", 1367 RITE Technical Report D2.3; Appendix C.2, August 2015, 1368 . 1371 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 1372 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 1373 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 1374 S., Wroclawski, J., and L. Zhang, "Recommendations on 1375 Queue Management and Congestion Avoidance in the 1376 Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, 1377 . 1379 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 1380 "Definition of the Differentiated Services Field (DS 1381 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1382 DOI 10.17487/RFC2474, December 1998, 1383 . 
1385 [RFC2983] Black, D., "Differentiated Services and Tunnels", 1386 RFC 2983, DOI 10.17487/RFC2983, October 2000, 1387 . 1389 [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, 1390 J., Courtney, W., Davari, S., Firoiu, V., and D. 1391 Stiliadis, "An Expedited Forwarding PHB (Per-Hop 1392 Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, 1393 . 1395 [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit 1396 Congestion Notification (ECN) Signaling with Nonces", 1397 RFC 3540, DOI 10.17487/RFC3540, June 2003, 1398 . 1400 [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", 1401 RFC 3649, DOI 10.17487/RFC3649, December 2003, 1402 . 1404 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 1405 Congestion Control Protocol (DCCP)", RFC 4340, 1406 DOI 10.17487/RFC4340, March 2006, 1407 . 1409 [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion 1410 Control Protocol (DCCP) Congestion Control ID 2: TCP-like 1411 Congestion Control", RFC 4341, DOI 10.17487/RFC4341, March 1412 2006, . 1414 [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for 1415 Datagram Congestion Control Protocol (DCCP) Congestion 1416 Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, 1417 DOI 10.17487/RFC4342, March 2006, 1418 . 1420 [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", 1421 RFC 4960, DOI 10.17487/RFC4960, September 2007, 1422 . 1424 [RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion 1425 Control Algorithms", BCP 133, RFC 5033, 1426 DOI 10.17487/RFC5033, August 2007, 1427 . 1429 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 1430 Friendly Rate Control (TFRC): Protocol Specification", 1431 RFC 5348, DOI 10.17487/RFC5348, September 2008, 1432 . 1434 [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. 
1435 Ramakrishnan, "Adding Explicit Congestion Notification 1436 (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, 1437 DOI 10.17487/RFC5562, June 2009, 1438 . 1440 [RFC5622] Floyd, S. and E. Kohler, "Profile for Datagram Congestion 1441 Control Protocol (DCCP) Congestion ID 4: TCP-Friendly Rate 1442 Control for Small Packets (TFRC-SP)", RFC 5622, 1443 DOI 10.17487/RFC5622, August 2009, 1444 . 1446 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1447 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 1448 . 1450 [RFC5706] Harrington, D., "Guidelines for Considering Operations and 1451 Management of New Protocols and Protocol Extensions", 1452 RFC 5706, DOI 10.17487/RFC5706, November 2009, 1453 . 1455 [RFC5865] Baker, F., Polk, J., and M. Dolly, "A Differentiated 1456 Services Code Point (DSCP) for Capacity-Admitted Traffic", 1457 RFC 5865, DOI 10.17487/RFC5865, May 2010, 1458 . 1460 [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP 1461 Authentication Option", RFC 5925, DOI 10.17487/RFC5925, 1462 June 2010, . 1464 [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion 1465 Notification", RFC 6040, DOI 10.17487/RFC6040, November 1466 2010, . 1468 [RFC6077] Papadimitriou, D., Ed., Welzl, M., Scharf, M., and B. 1469 Briscoe, "Open Research Issues in Internet Congestion 1470 Control", RFC 6077, DOI 10.17487/RFC6077, February 2011, 1471 . 1473 [RFC6660] Briscoe, B., Moncaster, T., and M. Menth, "Encoding Three 1474 Pre-Congestion Notification (PCN) States in the IP Header 1475 Using a Single Diffserv Codepoint (DSCP)", RFC 6660, 1476 DOI 10.17487/RFC6660, July 2012, 1477 . 1479 [RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe, 1480 "Problem Statement and Requirements for Increased Accuracy 1481 in Explicit Congestion Notification (ECN) Feedback", 1482 RFC 7560, DOI 10.17487/RFC7560, August 2015, 1483 . 1485 [RFC7567] Baker, F., Ed. and G. 
Fairhurst, Ed., "IETF 1486 Recommendations Regarding Active Queue Management", 1487 BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, 1488 . 1490 [RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) 1491 Concepts, Abstract Mechanism, and Requirements", RFC 7713, 1492 DOI 10.17487/RFC7713, December 2015, 1493 . 1495 [RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, 1496 "Proportional Integral Controller Enhanced (PIE): A 1497 Lightweight Control Scheme to Address the Bufferbloat 1498 Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, 1499 . 1501 [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., 1502 and G. Judd, "Data Center TCP (DCTCP): TCP Congestion 1503 Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, 1504 October 2017, . 1506 [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, 1507 J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler 1508 and Active Queue Management Algorithm", RFC 8290, 1509 DOI 10.17487/RFC8290, January 2018, 1510 . 1512 [RFC8298] Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation 1513 for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December 1514 2017, . 1516 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 1517 Notification (ECN) Experimentation", RFC 8311, 1518 DOI 10.17487/RFC8311, January 2018, 1519 . 1521 [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and 1522 R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", 1523 RFC 8312, DOI 10.17487/RFC8312, February 2018, 1524 . 1526 [RFC8511] Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst, 1527 "TCP Alternative Backoff with ECN (ABE)", RFC 8511, 1528 DOI 10.17487/RFC8511, December 2018, 1529 . 1531 [Savage-TCP] 1532 Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, 1533 "TCP Congestion Control with a Misbehaving Receiver", ACM 1534 SIGCOMM Computer Communication Review 29(5):71--78, 1535 October 1999. 1537 [sub-mss-prob] 1538 Briscoe, B. and K. 
De Schepper, "Scaling TCP's Congestion 1539 Window for Small Round Trip Times", BT Technical Report 1540 TR-TUB8-2015-002, May 2015, 1541 . 1543 [TCP-CA] Jacobson, V. and M. Karels, "Congestion Avoidance and 1544 Control", Lawrence Berkeley Labs Technical Report , 1545 November 1988, . 1547 [TCPPrague] 1548 Briscoe, B., "Notes: DCTCP evolution 'bar BoF': Tue 21 Jul 1549 2015, 17:40, Prague", tcpprague mailing list archive , 1550 July 2015, . 1553 [VCP] Xia, Y., Subramanian, L., Stoica, I., and S. Kalyanaraman, 1554 "One more bit is enough", Proc. SIGCOMM'05, ACM CCR 1555 35(4):37--48, 2005, 1556 . 1558 Appendix A. The 'Prague L4S Requirements' 1560 This appendix is informative, not normative. It gives a list of 1561 modifications to current scalable congestion controls so that they 1562 can be deployed over the public Internet and coexist safely with 1563 existing traffic. The list complements the normative requirements in 1564 Section 4 that a sender has to comply with before it can set the L4S 1565 identifier in packets it sends into the Internet. As well as 1566 necessary safety improvements (requirements) this appendix also 1567 includes preferable performance improvements (optimizations). 1569 These recommendations have become known as the Prague L4S 1570 Requirements, because they were originally identified at an ad hoc 1571 meeting during IETF-94 in Prague [TCPPrague]. They were originally 1572 called the 'TCP Prague Requirements', but they are not solely 1573 applicable to TCP, so the name and wording have been generalized for 1574 all transport protocols, and the name 'TCP Prague' is now used for a 1575 specific implementation of the requirements. 1577 At the time of writing, DCTCP [RFC8257] is the most widely used 1578 scalable transport protocol. In its current form, DCTCP is specified 1579 to be deployable only in controlled environments.
Deploying it in 1580 the public Internet would lead to a number of issues, both from the 1581 safety and the performance perspective. The modifications and 1582 additional mechanisms listed in this section will be necessary for 1583 its deployment over the global Internet. Where an example is needed, 1584 DCTCP is used as a base, but it is likely that most of these 1585 requirements equally apply to other scalable congestion controls, 1586 covering adaptive real-time media, etc., not just capacity-seeking 1587 behaviours. 1589 A.1. Requirements for Scalable Transport Protocols 1591 A.1.1. Use of L4S Packet Identifier 1593 Description: A scalable congestion control needs to distinguish the 1594 packets it sends from those sent by Classic congestion controls (see 1595 the precise normative requirement wording in Section 4.1). 1597 Motivation: It needs to be possible for a network node to classify 1598 L4S packets without flow state into a queue that applies an L4S ECN 1599 marking behaviour and isolates L4S packets from the queuing delay of 1600 Classic packets. 1602 A.1.2. Accurate ECN Feedback 1604 Description: The transport protocol for a scalable congestion control 1605 needs to provide timely, accurate feedback about the extent of ECN 1606 marking experienced by all packets (see the precise normative 1607 requirement wording in Section 4.2). 1609 Motivation: Classic congestion controls only need feedback about the 1610 existence of a congestion episode within a round trip, not precisely 1611 how many packets were marked with ECN or dropped. Therefore, in 1612 2001, when ECN feedback was added to TCP [RFC3168], it could not 1613 inform the sender of more than one ECN mark per RTT. Since then, 1614 requirements for more accurate ECN feedback in TCP have been defined 1615 in [RFC7560] and [I-D.ietf-tcpm-accurate-ecn] specifies an 1616 experimental change to the TCP wire protocol to satisfy these 1617 requirements. 
Most other transport protocols already satisfy this 1618 requirement (see Section 4.2). 1620 A.1.3. Fall back to Reno-friendly congestion control on packet loss 1622 Description: As well as responding to ECN markings in a scalable way, 1623 a scalable congestion control needs to react to packet loss in a way 1624 that will coexist safely with a TCP Reno congestion control [RFC5681] 1625 (see the precise normative requirement wording in Section 4.3). 1627 Motivation: Part of the safety conditions for deploying a scalable 1628 congestion control on the public Internet is to make sure that it 1629 behaves properly when it builds a queue at a network bottleneck that 1630 has not been upgraded to support L4S. Packet loss can have many 1631 causes, but it usually has to be conservatively assumed that it is a 1632 sign of congestion. Therefore, on detecting packet loss, a scalable 1633 congestion control will need to fall back to Classic congestion 1634 control behaviour. If it does not comply with this requirement it 1635 could starve Classic traffic. 1637 A scalable congestion control can be used for different types of 1638 transport, e.g. for real-time media or for reliable transport like 1639 TCP. Therefore, the particular Classic congestion control behaviour 1640 to fall back on will need to be part of the congestion control 1641 specification of the relevant transport. In the particular case of 1642 DCTCP, the DCTCP specification [RFC8257] states that "It is 1643 RECOMMENDED that an implementation deal with loss episodes in the 1644 same way as conventional TCP." For safe deployment of a scalable 1645 congestion control in the public Internet, the above requirement 1646 would need to be defined as a "MUST". 1648 Even though a bottleneck is L4S capable, it might still become 1649 overloaded and have to drop packets. In this case, the sender may 1650 receive a high proportion of packets marked with the CE bit set and 1651 also experience loss. 
Current DCTCP implementations each react 1652 differently to this situation. At least one implementation reacts 1653 only to the drop signal (e.g. by halving the CWND) and at least 1654 one other DCTCP implementation reacts to both signals (e.g. by halving 1655 the CWND due to the drop and also further reducing the CWND based on 1656 the proportion of marked packets). A third approach for the public 1657 Internet has been proposed that adjusts the loss response to result 1658 in a halving when combined with the ECN response. We believe that 1659 further experimentation is needed to understand what is the best 1660 behaviour for the public Internet, which may or may not be one of these 1661 existing approaches. 1663 A.1.4. Fall back to Reno-friendly congestion control on Classic ECN 1664 bottlenecks 1666 Description: A scalable congestion control needs to react to ECN 1667 marking from a non-L4S, but ECN-capable, bottleneck in a way that 1668 will coexist with a TCP Reno congestion control [RFC5681] (see the 1669 precise normative requirement wording in Section 4.3). 1671 Motivation: Similarly to the requirement in Appendix A.1.3, this 1672 requirement is a safety condition to ensure a scalable congestion 1673 control behaves properly when it builds a queue at a network 1674 bottleneck that has not been upgraded to support L4S. On detecting 1675 Classic ECN marking (see below), a scalable congestion control will 1676 need to fall back to Classic congestion control behaviour. If it 1677 does not comply with this requirement it could starve Classic 1678 traffic. 1680 A passive monitoring algorithm to detect a Classic ECN AQM at the 1681 bottleneck is provided in [ecn-fallback], which also provides a link 1682 to Linux source code. Very briefly, the algorithm primarily monitors 1683 RTT variation using the same algorithm that maintains the mean 1684 deviation of TCP's smoothed RTT, but it smooths over a duration of 1685 the order of a Classic sawtooth.
The outcome is also conditioned on 1686 other metrics such as the presence of CE marking and the congestion 1687 avoidance phase having stabilized. The report also identifies 1688 further work to improve the approach, for instance improvements with 1689 low capacity links and combining the measurements with a cache of 1690 what had been learned about a path in previous connections. 1692 The relevant normative requirement (Section 4.3) is expressed as a 1693 'SHOULD' to allow the possibility that the operator of the host knows 1694 that the network it serves has not deployed any single-queue Classic 1695 ECN AQM (e.g. a CDN might be testing out of band for signs of Classic 1696 ECN AQMs, or they might have manually checked that the ISPs they serve 1697 have not deployed Classic ECN AQMs). 1699 Nonetheless, monitoring is still expressed as a 'MUST' because there 1700 is still a possibility that there is a Classic ECN AQM somewhere else 1701 on the path (to continue the CDN example, perhaps beyond the ISP in a 1702 home network). Then, if the server operators have disabled fall-back 1703 for parts of their deployment, they can reconsider their policy or at 1704 least do more focused testing if in-band monitoring frequently 1705 detects single-queue Classic ECN AQMs. 1707 A.1.5. Reduce RTT dependence 1709 Description: A scalable congestion control needs to reduce or 1710 eliminate RTT bias at least over the low to typical range of RTTs 1711 that will interact in the intended deployment scenario (see the 1712 precise normative requirement wording in Section 4.3). 1714 Motivation: The throughput of Classic congestion controls is known to 1715 be inversely proportional to RTT, so one would expect flows over very 1716 low RTT paths to nearly starve flows over larger RTTs. However, 1717 Classic congestion controls have never allowed a very low RTT path to 1718 exist because they induce a large queue. For instance, consider two 1719 paths with base RTT 1ms and 100ms.
If a Classic congestion control 1720 induces a 100ms queue, it turns these RTTs into 101ms and 200ms, 1721 leading to a throughput ratio of about 2:1. Whereas if a scalable 1722 congestion control induces only a 1ms queue, the RTTs become 2ms and 101ms, 1723 leading to a throughput ratio of about 50:1. 1725 Therefore, with very small queues, long RTT flows will essentially 1726 starve, unless scalable congestion controls comply with this 1727 requirement. 1729 The RTT bias in current Classic congestion controls works 1730 satisfactorily when the RTT is higher than typical, and L4S does not 1731 change that. So, there is no additional requirement for high RTT L4S 1732 flows to remove RTT bias - they can but they don't have to. 1734 A.1.6. Scaling down to fractional congestion windows 1736 Description: A scalable congestion control needs to remain responsive 1737 to congestion when typical RTTs over the public Internet are 1738 significantly smaller because they are no longer inflated by queuing 1739 delay (see the precise normative requirement wording in Section 4.3). 1741 Motivation: As currently specified, the minimum required congestion 1742 window of TCP (and its derivatives) is set to 2 sender maximum 1743 segment sizes (SMSS) (see equation (4) in [RFC5681]). Once the 1744 congestion window reaches this minimum, all known window-based 1745 congestion control algorithms become unresponsive to congestion 1746 signals. No matter how much drop or ECN marking, the congestion 1747 window of all these algorithms no longer reduces. Instead, the 1748 sender's lack of any further congestion response forces the queue to 1749 grow, overriding any AQM and increasing queuing delay. 1751 L4S mechanisms significantly reduce queuing delay so, over the same 1752 path, the RTT becomes lower. Then this problem becomes surprisingly 1753 common [sub-mss-prob]. This is because, for the same link capacity, 1754 smaller RTT implies a smaller window.
For instance, consider a 1755 residential setting with an upstream broadband Internet access of 8 1756 Mb/s, assuming a max segment size of 1500 B. Two upstream flows will 1757 each have the minimum window of 2 SMSS if the RTT is 6ms or less 1758 (each flow's 4 Mb/s share carries 2 x 1500 B = 24,000 b in 6ms), which is quite common when accessing a nearby data centre. So, any 1759 more than two such parallel TCP flows will become unresponsive and 1760 increase queuing delay. 1762 Unless scalable congestion controls address this requirement from the 1763 start, they will frequently become unresponsive, negating the low 1764 latency benefit of L4S, for themselves and for others. 1766 That would seem to imply that scalable congestion controllers ought 1767 to be required to be able to work with a congestion window less than 2 1768 SMSS. For instance, one possible mechanism that can maintain a 1769 congestion window significantly less than 1 SMSS is described in 1770 [Ahmed19], and other approaches are likely to be feasible. 1772 However, the requirement in Section 4.3 is worded as a "SHOULD" 1773 because the existence of a minimum window is not all bad. When 1774 competing with an unresponsive flow, a minimum window naturally 1775 protects the flow from starvation by at least keeping some data 1776 flowing. 1778 By stating this requirement as a "SHOULD", specifications of scalable 1779 congestion controllers will be able to choose an appropriate minimum 1780 window, but they will at least have to justify the decision. 1782 A.1.7. Measuring Reordering Tolerance in Time Units 1784 Description: A scalable congestion control needs to detect loss by 1785 counting in time-based units, which is scalable, rather than counting 1786 in units of packets, which is not (see the precise normative 1787 requirement wording in Section 4.3). 1789 Motivation: A primary purpose of L4S is scalable throughput (it's in 1790 the name). Scalability in all dimensions is, of course, also a goal 1791 of all IETF technology.
The inverse linear congestion response in 1792 Section 4.3 is necessary, but not sufficient, to solve the congestion 1793 control scalability problem identified in [RFC3649]. As well as 1794 maintaining frequent ECN signals as rate scales, it is also important 1795 to ensure that a potentially false perception of loss does not limit 1796 throughput scaling. 1798 End-systems cannot know whether a missing packet is due to loss or 1799 reordering, except in hindsight - if it appears later. So they can 1800 only deem that there has been a loss if a gap in the sequence space 1801 has not been filled, either after a certain number of subsequent 1802 packets has arrived (e.g. the 3 DupACK rule of standard TCP 1803 congestion control [RFC5681]) or after a certain amount of time 1804 (e.g. the RACK approach [I-D.ietf-tcpm-rack]). 1806 As we attempt to scale packet rate over the years: 1808 o Even if only _some_ sending hosts still deem that loss has 1809 occurred by counting reordered packets, _all_ networks will have 1810 to keep reducing the time over which they keep packets in order. 1811 If some link technologies keep the time within which reordering 1812 occurs roughly unchanged, then loss over these links, as perceived 1813 by these hosts, will appear to continually rise over the years. 1815 o In contrast, if all senders detect loss in units of time, the time 1816 over which the network has to keep packets in order stays roughly 1817 invariant. 1819 Therefore hosts have an incentive to detect loss in time units (so as 1820 not to fool themselves too often into detecting losses when there are 1821 none). And for hosts that are changing their congestion control 1822 implementation to L4S, there is no downside to including time-based 1823 loss detection code in the change (loss recovery implemented in 1824 hardware is an exception, covered later). Therefore requiring L4S 1825 hosts to detect loss in time-based units would not be a burden. 
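The scaling argument above can be illustrated with a minimal sketch. It is not taken from any specification: the function names, the 1 ms reordering delay and the 5 ms reordering window (a quarter of an assumed 20 ms RTT) are all hypothetical values chosen for illustration. The sketch assumes that every packet sent while one packet is held back overtakes it, and compares a detector that counts overtaking packets against the 3 DupACK threshold with one that compares the hold-back time against a time window.

```python
"""Sketch: packet-count versus time-based loss detection under reordering.

All names and numbers are illustrative assumptions, not taken from any
specification.
"""

DUPACK_THRESHOLD = 3  # the 3 DupACK rule of standard TCP [RFC5681]


def overtaking_packets(rate_pps: int, reorder_delay_us: int) -> int:
    """Packets that arrive ahead of one packet held back by reorder_delay_us."""
    return rate_pps * reorder_delay_us // 1_000_000


def count_based_declares_loss(rate_pps: int, reorder_delay_us: int) -> bool:
    """Deem a loss once enough later packets have arrived (packet counting)."""
    return overtaking_packets(rate_pps, reorder_delay_us) >= DUPACK_THRESHOLD


def time_based_declares_loss(reorder_delay_us: int, reorder_window_us: int) -> bool:
    """Deem a loss once the gap has stayed unfilled for longer than the window."""
    return reorder_delay_us > reorder_window_us


REORDER_DELAY_US = 1_000   # assumed: the link reorders within 1 ms
REORDER_WINDOW_US = 5_000  # assumed: a quarter of a 20 ms RTT

# Scale the packet rate while the link's reordering time stays unchanged.
for rate_pps in (1_000, 10_000, 100_000):
    print(
        rate_pps,
        count_based_declares_loss(rate_pps, REORDER_DELAY_US),
        time_based_declares_loss(REORDER_DELAY_US, REORDER_WINDOW_US),
    )
```

Under these assumed figures, the packet-counting detector starts to report a spurious loss once the rate reaches 3,000 packets per second, whereas the time-based result does not depend on the rate at all.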
1827 If this requirement is not placed on L4S hosts, even though it would 1828 be no burden on them to do so, all networks will face unnecessary 1829 uncertainty over whether some L4S hosts might be detecting loss by 1830 counting packets. Then _all_ link technologies will have to 1831 unnecessarily keep reducing the time within which reordering occurs. 1832 That is not a problem for some link technologies, but it becomes 1833 increasingly challenging for other link technologies to continue to 1834 scale, particularly those relying on channel bonding for scaling, 1835 such as LTE, 5G and DOCSIS. 1837 Given Internet paths traverse many link technologies, any scaling 1838 limit for these more challenging access link technologies would 1839 become a scaling limit for the Internet as a whole. 1841 It might be asked how it helps to place this loss detection 1842 requirement only on L4S hosts, because networks will still face 1843 uncertainty over whether non-L4S flows are detecting loss by counting 1844 DupACKs. The answer is that those link technologies for which it is 1845 challenging to keep squeezing the reordering time will only need to 1846 do so for non-L4S traffic (which they can do because the L4S 1847 identifier is visible at the IP layer). Therefore, they can focus 1848 their processing and memory resources into scaling non-L4S (Classic) 1849 traffic. Then, the higher the proportion of L4S traffic, the less of 1850 a scaling challenge they will have. 1852 To summarize, there is no reason for L4S hosts not to be part of the 1853 solution instead of part of the problem. 1855 Requirement ("MUST") or recommendation ("SHOULD")? As explained 1856 above, this is a subtle interoperability issue between hosts and 1857 networks, which seems to need a "MUST". 
Unless networks can be 1858 certain that all L4S hosts follow the time-based approach, they still 1859 have to cater for the worst case - continually squeeze reordering 1860 into a smaller and smaller duration - just for hosts that might be 1861 using the counting approach. However, it was decided to express this 1862 as a recommendation, using "SHOULD". The main justification was that 1863 networks can still be fairly certain that L4S hosts will follow this 1864 recommendation, because following it offers only gain and no pain. 1866 Details: 1868 The speed of loss recovery is much more significant for short flows 1869 than long, therefore a good compromise is to adapt the reordering 1870 window; from a small fraction of the RTT at the start of a flow, to a 1871 larger fraction of the RTT for flows that continue for many round 1872 trips. 1874 This is broadly the approach adopted by TCP RACK (Recent 1875 ACKnowledgements) [I-D.ietf-tcpm-rack]. However, RACK starts with 1876 the 3 DupACK approach, because the RTT estimate is not necessarily 1877 stable. As long as the initial window is paced, such initial use of 1878 3 DupACK counting would amount to time-based loss detection and 1879 therefore would satisfy the time-based loss detection recommendation 1880 of Section 4.3. This is because pacing of the initial window would 1881 ensure that 3 DupACKs early in the connection would be spread over a 1882 small fraction of the round trip. 1884 As mentioned above, hardware implementations of loss recovery using 1885 DupACK counting exist (e.g. some implementations of RoCEv2 for RDMA). 1886 For low latency, these implementations can change their congestion 1887 control to implement L4S, because the congestion control (as distinct 1888 from loss recovery) is implemented in software. But they cannot 1889 easily satisfy this loss recovery requirement. However, it is 1890 believed they do not need to. 
It is believed that such 1891 implementations solely exist in controlled environments, where the 1892 network technology keeps reordering extremely low anyway. This is 1893 why controlled environments with hardly any reordering are excluded 1894 from the scope of the normative recommendation in Section 4.3. 1896 Detecting loss in time units also prevents the ACK-splitting attacks 1897 described in [Savage-TCP]. 1899 A.2. Scalable Transport Protocol Optimizations 1901 A.2.1. Setting ECT in TCP Control Packets and Retransmissions 1903 Description: This item only concerns TCP and its derivatives 1904 (e.g. SCTP), because the original specification of ECN for TCP 1905 precluded the use of ECN on control packets and retransmissions. To 1906 improve performance, scalable transport protocols ought to enable ECN 1907 at the IP layer in TCP control packets (SYN, SYN-ACK, pure ACKs, 1908 etc.) and in retransmitted packets. The same is true for derivatives 1909 of TCP, e.g. SCTP. 1911 Motivation: RFC 3168 prohibits the use of ECN on these types of TCP 1912 packet, based on a number of arguments. This means these packets are 1913 not protected from congestion loss by ECN, which considerably harms 1914 performance, particularly for short flows. 1915 [I-D.ietf-tcpm-generalized-ecn] counters each argument in RFC 3168 in 1916 turn, showing it was over-cautious. Instead it proposes experimental 1917 use of ECN on all types of TCP packet as long as AccECN feedback 1918 [I-D.ietf-tcpm-accurate-ecn] is available (which is itself a 1919 prerequisite for using a scalable congestion control). 1921 A.2.2. Faster than Additive Increase 1923 Description: It would improve performance if scalable congestion 1924 controls did not limit their congestion window increase to the 1925 standard additive increase of 1 SMSS per round trip [RFC5681] during 1926 congestion avoidance. 
The same is true for derivatives of TCP 1927 congestion control, including similar approaches used for real-time 1928 media. 1930 Motivation: As currently defined [RFC8257], DCTCP uses the 1931 traditional TCP Reno additive increase in congestion avoidance phase. 1932 When the available capacity suddenly increases (e.g. when another 1933 flow finishes, or if radio capacity increases) it can take very many 1934 round trips to take advantage of the new capacity. TCP Cubic was 1935 designed to solve this problem, but as flow rates have continued to 1936 increase, the delay accelerating into available capacity has become 1937 prohibitive. See, for instance, the examples in Section 1.2. Even 1938 when out of its Reno-compatibility mode, every 8x scaling of Cubic's 1939 flow rate leads to 2x more acceleration delay. 1941 In the steady state, DCTCP induces about 2 ECN marks per round trip, 1942 so it is possible to quickly detect when these signals have 1943 disappeared and seek available capacity more rapidly, while 1944 minimizing the impact on other flows (Classic and scalable) 1945 [LinuxPacedChirping]. Alternatively, approaches such as Adaptive 1946 Acceleration (A2DTCP [A2DTCP]) have been proposed to address this 1947 problem in data centres, which might be deployable over the public 1948 Internet. 1950 A.2.3. Faster Convergence at Flow Start 1952 Description: It would improve performance if scalable congestion 1953 controls converged (reached their steady-state share of the capacity) 1954 faster than Classic congestion controls or at least no slower. This 1955 affects the flow start behaviour of any L4S congestion control 1956 derived from a Classic transport that uses TCP slow start, including 1957 those for real-time media. 1959 Motivation: As an example, a new DCTCP flow takes longer than a 1960 Classic congestion control to obtain its share of the capacity of the 1961 bottleneck when there are already ongoing flows using the bottleneck 1962 capacity. 
In a data centre environment DCTCP takes about a factor of 1963 1.5 to 2 longer to converge due to the much higher typical level of 1964 ECN marking that DCTCP background traffic induces, which causes new 1965 flows to exit slow start early [Alizadeh-stability]. In testing for 1966 use over the public Internet the convergence time of DCTCP relative 1967 to a regular loss-based TCP slow start is even less favourable 1968 [Paced-Chirping] due to the shallow ECN marking threshold needed for 1969 L4S. It is exacerbated by the typically greater mismatch between the 1970 link rate of the sending host and typical Internet access 1971 bottlenecks. This problem is detrimental in general, but would 1972 particularly harm the performance of short flows relative to Classic 1973 congestion controls. 1975 Appendix B. Alternative Identifiers 1977 This appendix is informative, not normative. It records the pros and 1978 cons of various alternative ways to identify L4S packets to record 1979 the rationale for the choice of ECT(1) (Appendix B.1) as the L4S 1980 identifier. At the end, Appendix B.8 summarises the distinguishing 1981 features of the leading alternatives. It is intended to supplement, 1982 not replace the detailed text. 1984 The leading solutions all use the ECN field, sometimes in combination 1985 with the Diffserv field. This is because L4S traffic has to indicate 1986 that it is ECN-capable anyway, because ECN is intrinsic to how L4S 1987 works. Both the ECN and Diffserv fields have the additional 1988 advantage that they are no different in either IPv4 or IPv6. A 1989 couple of alternatives that use other fields are mentioned at the 1990 end, but it is quickly explained why they are not serious contenders. 1992 B.1. 
ECT(1) and CE codepoints 1994 Definition: 1996 Packets with ECT(1) and conditionally packets with CE would 1997 signify L4S semantics as an alternative to the semantics of 1998 Classic ECN [RFC3168], specifically: 2000 * The ECT(1) codepoint would signify that the packet was sent by 2001 an L4S-capable sender. 2003 * Given shortage of codepoints, both L4S and Classic ECN sides of 2004 an AQM would have to use the same CE codepoint to indicate that 2005 a packet had experienced congestion. If a packet that had 2006 already been marked CE in an upstream buffer arrived at a 2007 subsequent AQM, this AQM would then have to guess whether to 2008 classify CE packets as L4S or Classic ECN. Choosing the L4S 2009 treatment would be a safer choice, because then a few Classic 2010 packets might arrive early, rather than a few L4S packets 2011 arriving late. 2013 * Additional information might be available if the classifier 2014 were transport-aware. Then it could classify a CE packet for 2015 Classic ECN treatment if the most recent ECT packet in the same 2016 flow had been marked ECT(0). However, the L4S service ought 2017 not to need transport-layer awareness. 2019 Cons: 2021 Consumes the last ECN codepoint: The L4S service could potentially 2022 supersede the service provided by Classic ECN, therefore using 2023 ECT(1) to identify L4S packets could ultimately mean that the 2024 ECT(0) codepoint was 'wasted' purely to distinguish one form of 2025 ECN from its successor. 2027 ECN hard in some lower layers: It is not always possible to support 2028 ECN in an AQM acting in a buffer below the IP layer 2029 [I-D.ietf-tsvwg-ecn-encap-guidelines]. In such cases, the L4S 2030 service would have to drop rather than mark frames even though 2031 they might encapsulate an ECN-capable packet. 2033 Risk of reordering Classic CE packets: Classifying all CE packets 2034 into the L4S queue risks any CE packets that were originally 2035 ECT(0) being incorrectly classified as L4S.
If there were delay 2036 in the Classic queue, these incorrectly classified CE packets 2037 would arrive early, which is a form of reordering. Reordering can 2038 cause TCP senders (and senders of similar transports) to 2039 retransmit spuriously. However, the risk of spurious 2040 retransmissions would be extremely low for the following reasons: 2042 1. It is quite unusual to experience queuing at more than one 2043 bottleneck on the same path (the available capacities have to 2044 be identical). 2046 2. In only a subset of these unusual cases would the first 2047 bottleneck support Classic ECN marking while the second 2048 supported L4S ECN marking, which would be the only scenario 2049 where some ECT(0) packets could be CE marked by an AQM 2050 supporting Classic ECN then the remainder experienced further 2051 delay through the Classic side of a subsequent L4S DualQ AQM. 2053 3. Even then, when a few packets are delivered early, it takes 2054 very unusual conditions to cause a spurious retransmission, in 2055 contrast to when some packets are delivered late. The first 2056 bottleneck has to apply CE-marks to at least N contiguous 2057 packets and the second bottleneck has to inject an 2058 uninterrupted sequence of at least N of these packets between 2059 two packets earlier in the stream (where N is the reordering 2060 window that the transport protocol allows before it considers 2061 a packet is lost). 2063 For example consider N=3, and consider the sequence of 2064 packets 100, 101, 102, 103,... and imagine that packets 2065 150,151,152 from later in the flow are injected as follows: 2066 100, 150, 151, 101, 152, 102, 103... If this were late 2067 reordering, even one packet arriving out of sequence would 2068 trigger a spurious retransmission, but there is no spurious 2069 retransmission here with early reordering, because packet 2070 101 moves the cumulative ACK counter forward before 3 2071 packets have arrived out of order. 
Later, when packets 2072 148, 149, 153... arrive, even though there is a 3-packet 2073 hole, there will be no problem, because the packets to fill 2074 the hole are already in the receive buffer. 2076 4. Even with the current TCP recommendation of N=3 [RFC5681] 2077 spurious retransmissions will be unlikely for all the above 2078 reasons. As RACK [I-D.ietf-tcpm-rack] is becoming widely 2079 deployed, it tends to adapt its reordering window to a larger 2080 value of N, which will make the chance of a contiguous 2081 sequence of N early arrivals vanishingly small. 2083 5. Even a run of 2 CE marks within a Classic ECN flow is 2084 unlikely, given FQ-CoDel is the only known widely deployed AQM 2085 that supports Classic ECN marking and it takes great care to 2086 separate out flows and to space any markings evenly along each 2087 flow. 2089 It is extremely unlikely that the above set of 5 eventualities 2090 that are each unusual in themselves would all happen 2091 simultaneously. But, even if they did, the consequences would 2092 hardly be dire: the odd spurious fast retransmission. Whenever 2093 the traffic source (a Classic congestion control) mistakes the 2094 reordering of a string of CE marks for a loss, one might think 2095 that it will reduce its congestion window as well as emitting a 2096 spurious retransmission. However, it would have already reduced 2097 its congestion window when the CE markings arrived early. If it 2098 is using ABE [RFC8511], it might reduce cwnd a little more for a 2099 loss than for a CE mark. But it will revert that reduction once 2100 it detects that the retransmission was spurious. 2102 In conclusion, the impact of early reordering due to CE being 2103 ambiguous will generally be vanishingly small. 2105 Hard to distinguish Classic ECN AQM: With this scheme, when a source 2106 receives ECN feedback, it is not explicitly clear which type of 2107 AQM generated the CE markings. 
This is not a problem for Classic 2108 ECN sources that send ECT(0) packets, because an L4S AQM will 2109 recognize the ECT(0) packets as Classic and apply the appropriate 2110 Classic ECN marking behaviour. 2112 However, in the absence of explicit disambiguation of the CE 2113 markings, an L4S source needs to use heuristic techniques to work 2114 out which type of congestion response to apply (see 2115 Appendix A.1.4). Otherwise, if long-running Classic flow(s) are 2116 sharing a Classic ECN AQM bottleneck with long-running L4S 2117 flow(s), which then apply an L4S response to Classic CE signals, 2118 the L4S flows would outcompete the Classic flow(s). Experiments 2119 have shown that L4S flows can take about 20 times more capacity 2120 share than equivalent Classic flows. Nonetheless, as link 2121 capacity reduces (e.g. to 4 Mb/s), the inequality reduces. So 2122 Classic flows always make progress and are not starved. 2124 When L4S was first proposed (in 2015, 14 years after [RFC3168] was 2125 published), it was believed that Classic ECN AQMs had failed to be 2126 deployed, because research measurements had found little or no 2127 evidence of CE marking. In subsequent years Classic ECN was 2128 included in FQ-CoDel deployments; however, an FQ scheduler stops an 2129 L4S flow outcompeting Classic, because it enforces equality 2130 between flow rates. It is not known whether there have been any 2131 non-FQ deployments of Classic ECN AQMs in the subsequent years, or 2132 whether there will be in future. 2134 An algorithm for detecting a Classic ECN AQM as soon as a flow 2135 stabilizes after start-up has been proposed [ecn-fallback] (see 2136 Appendix A.1.4 for a brief summary). Testbed evaluations of v2 of 2137 the algorithm have shown detection is reasonably good for Classic 2138 ECN AQMs, in a wide range of circumstances.
However, although it 2139 can correctly detect an L4S ECN AQM in many circumstances, it is 2140 often incorrect at low link capacities and/or high RTTs. Although 2141 this is the safe way round, there is a danger that it will 2142 discourage use of the algorithm. 2144 Non-L4S service for control packets: The Classic ECN RFCs [RFC3168] 2145 and [RFC5562] require a sender to clear the ECN field to Not-ECT 2146 on retransmissions and on certain control packets, specifically 2147 pure ACKs, window probes and SYNs. When L4S packets are 2148 classified by the ECN field, these control packets would not be 2149 classified into an L4S queue, and could therefore be delayed 2150 relative to the other packets in the flow. This would not cause 2151 reordering (because retransmissions are already out of order, and 2152 these control packets typically carry no data). However, it would 2153 make critical control packets more vulnerable to loss and delay. 2154 To address this problem, [I-D.ietf-tcpm-generalized-ecn] proposes 2155 an experiment in which all TCP control packets and retransmissions 2156 are ECN-capable as long as appropriate ECN feedback is available 2157 in each case. 2159 Pros: 2161 Should work e2e: The ECN field generally works end-to-end across the 2162 Internet. Unlike the DSCP, the setting of the ECN field is at 2163 least forwarded unchanged by networks that do not support ECN, and 2164 networks rarely clear it to zero. 2166 Should work in tunnels: Unlike Diffserv, ECN is defined to always 2167 work across tunnels. This scheme works within a tunnel that 2168 propagates the ECN field in any of the variant ways it has been 2169 defined, from the year 2001 [RFC3168] onwards. However, it is 2170 likely that some tunnels still do not implement ECN propagation at 2171 all.
2173 Could migrate to one codepoint: If all Classic ECN senders 2174 eventually evolve to use the L4S service, the ECT(0) codepoint 2175 could be reused for some future purpose, but only once use of 2176 ECT(0) packets had reduced to zero, or near-zero, which might 2177 never happen. 2179 L4 not required: Being based on the ECN field, this scheme does not 2180 need the network to access transport layer flow identifiers. 2181 Nonetheless, it does not preclude solutions that do. 2183 B.2. ECN-DualQ-SCE1 2185 Definition: 2187 In this proposal, an L4S AQM would indicate congestion with ECT(1) 2188 in contrast to a Classic AQM, which indicates congestion with CE. 2189 More specifically: 2191 * Given shortage of codepoints, with this proposal L4S ECN hosts 2192 send packets as ECT(0), like Classic ECN hosts do by default 2193 [RFC8311]. 2195 * If the ECT(1) codepoint were used to indicate congestion in 2196 this way, it would signify a shallow queue AQM to the end-to- 2197 end transport. So those who proposed this approach called it 2198 'Some Congestion Experienced' (SCE) because of its similarity 2199 to [I-D.morton-tsvwg-sce]. It has also been described as 2200 'ECT(1) on output', in contrast to the 'ECT(1) on input' 2201 approach outlined in Appendix B.1. 2203 * The approach works best if the network is transport-aware and 2204 isolates each application flow in its own queue (per-flow 2205 queuing, or FQ). Two AQMs are implemented in each queue, one 2206 with a shallow target that marks selected ECT packets as 2207 ECT(1), the other with a deeper target that marks selected ECT 2208 packets as CE, or drops selected non-ECT packets. 2210 * A Classic congestion control would not have the logic to 2211 recognize ECT(1) as a congestion signal. So it would 2212 (correctly) drive the queue to the deeper threshold, responding 2213 only to CE markings.
An L4S congestion control that 2214 understands this scheme would respond to ECT(1) markings, which 2215 ought to therefore keep the queue close to the shallower 2216 threshold. 2218 * A dual queue approach has been informally proposed, with an L4S 2219 and a Classic queue and coupling similar to 2220 [I-D.ietf-tsvwg-aqm-dualq-coupled]. In an interim 2221 classification, all ECT packets would be classified into the 2222 low latency queue, and non-ECT packets into the Classic queue. 2223 But then, in front of the low latency queue, a stateful flow 2224 characterization function would maintain a queue occupancy 2225 metric. It would then redirect any high occupancy flows into 2226 the Classic queue. 2228 Cons: 2230 Network requires transport-layer awareness: There is no variant of 2231 this approach that works without network visibility of transport 2232 layer flow identifiers (the 5-tuple). Obviously the FQ variant 2233 needs to see 5-tuples, but so does the DualQ SCE1 variant (to 2234 redirect flows based on sparseness). So there is no arrangement 2235 of this approach that operators could choose if they could not 2236 access the transport layer, or did not want to (e.g. to support 2237 full end-to-end encryption above the IP layer). 2239 Incomplete isolation: When evaluated, the DualQ variant of ECN- 2240 DualQ-SCE1 introduced impairments to both L4S and Classic flows. 2241 The evaluation used the DOCSIS queue protection function 2242 [I-D.briscoe-docsis-q-protection] to maintain the per-flow 2243 sparseness metrics and redirect packets from non-sparse flows into 2244 the Classic queue. Unfortunately, it is impossible to determine 2245 non-sparseness until sufficient packets of each flow have been 2246 analyzed. Up to this point, all packets default to the L4S queue. 2247 Then: 2249 * Long-running Classic flows experience reordering during the 2250 transition to classifying them as Classic. 
Worse, the 2251 reordering occurs early in the flow when it is less robust to 2252 confusing RTT measurements; 2254 * Considerable numbers of Classic packets add to the L4S queue - 2255 from all the short flows and the start of long flows before the 2256 classifier can be certain enough to redirect them to the other 2257 (Classic) queue. So true L4S flows unavoidably experience a 2258 degree of extra delay. 2260 Consumes the last ECN codepoint: The L4S service could potentially 2261 supersede the service provided by Classic ECN, therefore using 2262 ECT(1) to indicate L4S congestion could ultimately mean that the 2263 CE codepoint was 'wasted' purely to distinguish one form of 2264 congestion from its successor. 2266 Only recently updated tunnels: If this scheme is applied to an outer 2267 header within a tunnel or lower layer encapsulation, the ECT(1) 2268 codepoint will be black-holed at decapsulation, unless the 2269 decapsulator complies with changes to IP-in-IP tunnels introduced 2270 in 2010 [RFC6040], or changes to other tunnels that are 2271 (currently) work in progress [I-D.ietf-tsvwg-rfc6040update-shim], 2272 [I-D.ietf-tsvwg-ecn-encap-guidelines]. 2274 Limited TCP support for feedback: This approach requires transport 2275 layer feedback of two congestion signals ECT(1) and CE. Recently 2276 developed protocols such as QUIC provide this by default. 2277 However, there is limited space in the main TCP header to feed 2278 back both signals reliably and accurately [RFC7560]. AccECN 2279 [I-D.ietf-tcpm-accurate-ecn] devotes the limited space in the main 2280 TCP header to CE feedback, and optionally feeds back ECT(1) in a 2281 new TCP option, which will have limited initial deployment 2282 support. 2284 Alters non-participating packets: An AQM following this approach 2285 alters some selected ECT(0) packets to ECT(1) irrespective of 2286 whether they are participating in the L4S experiment. 
Although 2287 ECT(0) and ECT(1) have historically been defined as equivalent, in 2288 practice ECT(1) packets have been extremely rare on the Internet. 2289 Therefore, in practice, there might be a risk that firewalls and 2290 other devices will block ECT(1) packets, or at least treat them 2291 with greater suspicion. 2293 ECN hard in some lower layers: Similarly to the 'Con' point in 2294 Appendix B.1, it is not always possible to support ECN in an AQM 2295 acting in a buffer below the IP layer 2296 [I-D.ietf-tsvwg-ecn-encap-guidelines]. However, adding support to 2297 lower layers would be even harder with this scheme, because it 2298 needs space for two severity levels of congestion, not one. 2299 Without lower layer ECN support, the L4S service would have to 2300 drop rather than mark frames even though they might encapsulate an 2301 ECN-capable packet. 2303 Non-L4S service for control packets: Identical to 'Con' point in 2304 Appendix B.1. 2306 Pros: 2308 Distinct indication of Classic ECN AQM: An AQM following the ECN- 2309 DualQ-SCE1 approach outputs distinctive signals (ECT(1)) compared 2310 to those output by a Classic ECN AQM. So an L4S congestion 2311 control using the SCE1 approach would inherently respond 2312 appropriately to a Classic AQM. 2314 Should work e2e: Identical to 'Pro' point in Appendix B.1. 2316 B.3. ECN-DualQ-SCE0 2318 Definition: 2320 This proposal is the inverse of the ECN-DualQ-SCE1 scheme (see 2321 Appendix B.2 above). L4S AQMs signal congestion with the 2322 transition ECT(1) -> ECT(0). More specifically: 2324 * L4S senders would send their packets as ECT(1), while Classic 2325 ECN senders would continue to send ECT(0) by default [RFC8311]. 2327 * FQ AQMs would work in a similar way to that described for ECN- 2328 DualQ-SCE1 in Appendix B.2 above, except that the shallow queue AQM 2329 would mark selected ECT packets with ECT(0), rather than 2330 ECT(1).
2332 It would seem possible to classify packets by both 5-tuple and 2333 ECT codepoint, so that each per-flow queue could instantiate 2334 just the one AQM appropriate to the ECT codepoint using it. In 2335 this case, CE and Not-ECT packets would be classified into the 2336 same queue as ECT(0). However, this would open up the risk of 2337 reordering explained below, so it is not considered further. 2339 * A Classic congestion control would only receive CE feedback, 2340 and it would have no logic to recognize ECT(0) as congestion 2341 markings, because it would send all its packets as ECT(0) 2342 anyway. So it would (correctly) drive the queue to the deeper 2343 threshold, responding only to CE markings. An L4S congestion 2344 control would understand ECT(0) markings as L4S congestion 2345 signals and therefore ought to keep the queue close to the 2346 shallower threshold. 2348 * Under the SCE0 scheme, a dual queue coupled AQM 2349 [I-D.ietf-tsvwg-aqm-dualq-coupled] would use ECT(1) as the L4S 2350 classifier in a very similar way to the 'ECT(1) and CE' scheme 2351 it was originally designed for. The one difference would be to 2352 classify CE packets into the Classic queue along with ECT(0) 2353 and Not-ECT. 2355 Cons: 2357 Consumes the last ECN codepoint: The L4S service could potentially 2358 supersede the service provided by Classic ECN, therefore using 2359 ECT(0) to indicate L4S congestion could ultimately mean that the 2360 CE codepoint was 'wasted' purely to distinguish one form of 2361 congestion from its successor. 2363 Incompatible with all ECN tunnels: The transition ECT(1) -> ECT(0) 2364 has never previously been recognized as valid. So, any ECT(0) 2365 marking applied to an ECT(1) outer header within a tunnel or lower 2366 layer encapsulation will be black-holed at decapsulation by any 2367 decapsulator whatever variant of ECN tunnel RFC it complies with. 
2369 Limited TCP support for feedback: Identical to 'Con' point in 2370 Appendix B.2 above except space would be needed for CE and ECT(0) 2371 rather than CE and ECT(1) feedback. 2373 Risk of reordering Classic CE packets: If an L4S flow traverses a 2374 path with two or more bottleneck AQMs that both support L4S, 2375 reordering is likely to occur. This is because the first 2376 bottleneck will re-mark some ECT(1) packets to ECT(0), which will 2377 then be classified into the Classic queue of the second AQM, even 2378 though they originated as L4S packets. 2380 In contrast to the 'ECT(1) and CE' scheme in Appendix B.1, the 2381 risk of impairment in the ECN-DualQ-SCE0 case is not vanishingly 2382 small: 2384 1. Certainly, queuing at more than one bottleneck on the same 2385 path would still be quite unusual. 2387 2. However, the ECN-DualQ-SCE0 case occurs if both bottlenecks 2388 support L4S ECN and the traffic is L4S. This contrasts with 2389 the "ECT(1) and CE" case, which solely occurs if the AQMs are 2390 in a certain order (Classic followed by L4S). 2392 3. When misclassification occurs, it is from L4S to Classic. So 2393 selected packets are delivered late, which in itself adds 2394 delay, and also increases the risk that each late delivery 2395 will be deemed a loss and cause a high level of spurious 2396 retransmissions. This contrasts with the "ECT(1) and CE" case 2397 where selected packets are delivered early, which is very 2398 unlikely to have any effect (as already explained in 2399 Appendix B.1). 2401 ECN hard in some lower layers: Identical to 'Con' point in 2402 Appendix B.2. 2404 Non-L4S service for control packets: Identical to 'Con' point in 2405 Appendix B.1. 2407 Pros: 2409 Distinct indication of Classic ECN AQM: An AQM following the ECN- 2410 DualQ-SCE0 approach outputs distinctive signals (ECT(0)) compared 2411 to those output by a Classic ECN AQM (CE). So an L4S congestion 2412 control can inherently respond appropriately to a Classic AQM. 
2414 Should work e2e: Identical to 'Pro' point in Appendix B.1. 2416 B.4. ECN Plus a Diffserv Codepoint (DSCP) 2418 Definition: 2420 For packets with a defined DSCP, all codepoints of the ECN field 2421 (except Not-ECT) would signify alternative L4S semantics to those 2422 for Classic ECN [RFC3168], specifically: 2424 * The L4S DSCP would signify that the packet came from an L4S- 2425 capable sender. 2427 * ECT(0) and ECT(1) would both signify that the packet was 2428 travelling between transport endpoints that were both ECN- 2429 capable. 2431 * CE would signify that the packet had been marked by an AQM 2432 implementing the L4S service. 2434 Use of a DSCP is the only approach for alternative ECN semantics 2435 given as an example in [RFC4774]. However, it was perhaps considered 2436 more for controlled environments than new end-to-end services. 2438 Cons: 2440 Consumes DSCP pairs: A DSCP is by definition not orthogonal to 2441 Diffserv. Therefore, wherever the L4S service is applied to 2442 multiple Diffserv scheduling behaviours, it would be necessary to 2443 replace each DSCP with a pair of DSCPs. 2445 Uses critical lower-layer header space: The resulting increased 2446 number of DSCPs might be hard to support for some lower layer 2447 technologies, e.g. 802.1Q and MPLS both offer only 3-bits for a 2448 maximum of 8 traffic class identifiers. Although L4S should 2449 reduce and possibly remove the need for some DSCPs intended for 2450 differentiated queuing delay, it will not remove the need for 2451 Diffserv entirely, because Diffserv is also used to allocate 2452 bandwidth, e.g. by prioritising some classes of traffic over 2453 others when traffic exceeds available capacity. 2455 Not end-to-end (host-network): Very few networks honour a DSCP set 2456 by a host. Typically a network will zero (bleach) the Diffserv 2457 field from all hosts. DSCP bleaching would turn an L4S ECN packet 2458 into a Classic ECN packet. 
2460 Not end-to-end (network-network): Very few networks honour a DSCP 2461 received from a neighbouring network. Typically a network will 2462 zero (bleach) the Diffserv field from all neighbouring networks at 2463 an interconnection point. Sometimes bilateral arrangements are 2464 made between networks, such that the receiving network remarks 2465 some DSCPs to those it uses for roughly equivalent services. The 2466 likelihood that a DSCP will be bleached or ignored depends on the 2467 type of DSCP: 2469 Local-use DSCP: These tend to be used to implement application- 2470 specific network policies, but a bilateral arrangement to 2471 remark certain DSCPs is often applied to DSCPs in the local-use 2472 range simply because it is easier not to change all of a 2473 network's internal configurations when a new arrangement is 2474 made with a neighbour. 2476 Recommended standard DSCP: These do not tend to be honoured 2477 across network interconnections more than local-use DSCPs. 2478 However, if two networks decide to honour certain of each 2479 other's DSCPs, the reconfiguration is a little easier if both 2480 of their globally recognised services are already represented 2481 by the relevant recommended standard DSCPs. 2483 Note that today a recommended standard DSCP gives little more 2484 assurance of end-to-end service than a local-use DSCP. In 2485 future the range recommended as standard might give more 2486 assurance of end-to-end service than local-use, but it is 2487 unlikely that either assurance will be high, particularly given 2488 the hosts are included in the end-to-end path. 2490 Whenever DSCP bleaching did occur, it would turn an L4S ECN packet 2491 into a Classic ECN packet. 2493 Not all tunnels: Diffserv codepoints are often not propagated to the 2494 outer header when a packet is encapsulated by a tunnel header. 2495 DSCPs are propagated to the outer of uniform mode tunnels, but not 2496 pipe mode [RFC2983], and pipe mode is fairly common. 
Whenever 2497 pipe mode was used, it would temporarily turn an L4S ECN packet 2498 into a Classic ECN packet. 2500 ECN hard in some lower layers: Because this approach uses both the 2501 Diffserv and ECN fields, an AQM will only work at a lower layer if 2502 both can be supported. If individual network operators wished to 2503 deploy an AQM at a lower layer, they would usually propagate an IP 2504 Diffserv codepoint to the lower layer, using for example IEEE 2505 802.1p. However, the ECN capability is harder to propagate down 2506 to lower layers because few lower layers support it. 2508 Hard to distinguish Classic ECN AQM: Defining a DSCP to indicate L4S 2509 is a way to help network nodes identify L4S packets (albeit 2510 unreliable due to the likelihood of bleaching - see above). 2511 However, it does not help hosts distinguish between ECN markings 2512 from L4S and Classic AQMs. This is because Classic AQMs would 2513 have been implemented without any logic to recognize an L4S DSCP 2514 or apply L4S marking behaviour. 2516 Pros: 2518 Could migrate to e2e: If all usage of Classic ECN migrates to usage 2519 of L4S, the DSCP would become redundant, and the ECN capability 2520 alone could eventually identify L4S packets without the 2521 interconnection problems of Diffserv detailed above, and without 2522 having permanently consumed more than one codepoint in the IP 2523 header. Although the DSCP does not generally function as an end- 2524 to-end identifier (see above), it could be used initially by 2525 individual ISPs to introduce the L4S service for their own locally 2526 generated traffic. 2528 B.5. ECN capability alone 2530 This approach uses ECN capability alone as the L4S identifier. It 2531 would only have been feasible if RFC 3168 ECN had not been widely 2532 deployed. This was the case when the choice of L4S identifier was 2533 being made and this appendix was first written.
Since then, RFC 3168 2534 ECN has been widely deployed and L4S did not take this approach 2535 anyway. So this approach is not discussed further, because it is no 2536 longer a feasible option. 2538 B.6. Protocol ID 2540 It has been suggested that a new Protocol ID in the IPv4 Protocol 2541 field or the IPv6 Next Header field could identify L4S packets. 2542 However this approach is ruled out by numerous problems: 2544 o A duplicate protocol ID would need to be created for each 2545 transport (TCP, SCTP, UDP, etc.). 2547 o In IPv6, there can be a sequence of Next Header fields, and it 2548 would not be obvious which one would be expected to identify a 2549 network service like L4S. 2551 o A new protocol ID would rarely provide an end-to-end service, 2552 because It is well-known that new protocol IDs are often blocked 2553 by numerous types of middlebox. 2555 o The approach is not a solution for AQM methods below the IP layer. 2557 B.7. Source or destination addressing 2559 Locally, a network operator could arrange for L4S service to be 2560 applied based on source or destination addressing, e.g. packets from 2561 its own data centre and/or CDN hosts, packets to its business 2562 customers, etc. It could use addressing at any layer, e.g. IP 2563 addresses, MAC addresses, VLAN IDs, etc. Although addressing might 2564 be a useful tactical approach for a single ISP, it would not be a 2565 feasible approach to identify an end-to-end service like L4S. Even 2566 for a single ISP, it would require packet classifiers in buffers to 2567 be dependent on changing topology and address allocation decisions 2568 elsewhere in the network. Therefore this approach is not a feasible 2569 solution. 2571 B.8. 
Summary: Merits of Alternative Identifiers 2573 Table 1 and Table 2 provide a very high level summary of the pros and 2574 cons detailed against the schemes described respectively in 2575 Appendix B.1, Appendix B.4, Appendix B.2 and Appendix B.3 for nine 2576 issues that set them apart. 2578 +-----------------+----------------------+--------------------+ 2579 | Issue | ECT(1) + CE (Chosen) | DSCP + ECN | 2580 +-----------------+----------------------+--------------------+ 2581 | | initial eventual | initial eventual | 2582 | | | | 2583 | end-to-end | . . Y . . Y | N . . . ? . | 2584 | tunnels | . . ? . . Y | . O . . O . | 2585 | lower layers | . O . . . ? | N . . . ? . | 2586 | spare codepoint | N . . . . ? | N . . . . ? | 2587 | reordering | . O . . . ? | . . Y . . Y | 2588 | Classic ECN AQM | . O . . . ? | . O . . . ? | 2589 | isolation | . . Y . . Y | . . Y . . Y | 2590 | poss w/o L4 IDs | . . Y . . Y | . . Y . . Y | 2591 | TCP feedback | . O . . . Y | . O . . . Y | 2592 | TCP ctrl pkts | . O . . . ? | . . Y . . Y | 2593 +-----------------+----------------------+--------------------+ 2595 Table 1: Merits of Alternative L4S Identifiers (pt 1) 2597 +-----------------+--------------------+--------------------+ 2598 | Issue | ECN-DualQ-SCE1 | ECN-DualQ-SCE0 | 2599 +-----------------+--------------------+--------------------+ 2600 | | initial eventual | initial eventual | 2601 | | | | 2602 | end-to-end | . . Y . . Y | . . Y . . Y | 2603 | tunnels | . ? . . . ? | N . . . ? . | 2604 | lower layers | N . . . ? . | N . . . ? . | 2605 | spare codepoint | N . . N . . | N . . N . . | 2606 | reordering | N . . N . . | N . . N . . | 2607 | Classic ECN AQM | . . Y . . Y | . . Y . . Y | 2608 | isolation | N . . N . . | . . Y . . Y | 2609 | poss w/o L4 IDs | N . . N . . | . . Y . . Y | 2610 | TCP feedback | N . . . O . | N . . . O . | 2611 | TCP ctrl pkts | . O . . . ? | . O . . . ? 
| 2612 +-----------------+--------------------+--------------------+ 2614 Table 2: Merits of Alternative L4S Identifiers (pt 2) 2616 The schemes are scored based on both their capabilities now 2617 ('initial') and in the long term ('eventual'). The scores are one of 2618 'N, O, Y', meaning 'Poor', 'Ordinary', 'Good' respectively. The same 2619 scores are aligned vertically to aid the eye. A score of "?" in one 2620 of the positions means that this approach might optimistically become 2621 this good, given sufficient effort. The tables summarize the text 2622 and are not meant to be understandable without having read the text. 2624 Appendix C. Potential Competing Uses for the ECT(1) Codepoint 2626 The ECT(1) codepoint of the ECN field has already been assigned once 2627 for the ECN nonce [RFC3540], which has now been categorized as 2628 historic [RFC8311]. ECN is probably the only remaining field in the 2629 Internet Protocol that is common to IPv4 and IPv6 and still has 2630 potential to work end-to-end, with tunnels and with lower layers. 2631 Therefore, ECT(1) should not be reassigned to a different 2632 experimental use (L4S) without carefully assessing competing 2633 potential uses. These fall into the following categories: 2635 C.1. Integrity of Congestion Feedback 2637 Receiving hosts can fool a sender into downloading faster by 2638 suppressing feedback of ECN marks (or of losses if retransmissions 2639 are not necessary or available otherwise). 2641 The historic ECN nonce protocol [RFC3540] proposed that a TCP sender 2642 could set either of ECT(0) or ECT(1) in each packet of a flow and 2643 remember the sequence it had set. If any packet was lost or 2644 congestion marked, the receiver would miss that bit of the sequence. 2645 An ECN Nonce receiver had to feed back the least significant bit of 2646 the sum, so it could not suppress feedback of a loss or mark without 2647 a 50-50 chance of guessing the sum incorrectly. 
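
As a rough illustration of the nonce mechanism described above, the following Python sketch models the nonce sum as a 1-bit XOR over the per-packet nonces. It is not taken from [RFC3540] or from this document: all function names are hypothetical, and it deliberately ignores loss, reordering and the resynchronization rules of the real protocol. It only aims to show why a receiver that conceals a CE mark has to guess the erased nonce bit, and so is caught about half the time.

```python
import random

# ECN field codepoints as defined in RFC 3168.
ECT0, ECT1, CE = 0b10, 0b01, 0b11


def send_flow(n, rng):
    """Sender: choose ECT(0) or ECT(1) at random per packet and remember the bits."""
    nonces = [rng.randrange(2) for _ in range(n)]
    packets = [ECT1 if bit else ECT0 for bit in nonces]
    return nonces, packets


def expected_sum(nonces):
    """Least significant bit of the sum of all nonces the sender set."""
    total = 0
    for bit in nonces:
        total ^= bit
    return total


def mark(packets, marked):
    """Network: overwrite the ECN field with CE, which erases the nonce bit."""
    return [CE if i in marked else p for i, p in enumerate(packets)]


def concealing_receiver_sum(packets, rng):
    """Receiver that hides CE marks: it must guess every erased nonce bit."""
    total = 0
    for p in packets:
        if p == CE:
            total ^= rng.randrange(2)  # blind guess at the erased bit
        else:
            total ^= 1 if p == ECT1 else 0
    return total


def undetected_rate(trials, rng):
    """Fraction of trials in which concealing one CE mark goes unnoticed."""
    undetected = 0
    for _ in range(trials):
        nonces, packets = send_flow(20, rng)
        received = mark(packets, {rng.randrange(20)})
        if concealing_receiver_sum(received, rng) == expected_sum(nonces):
            undetected += 1
    return undetected / trials


if __name__ == "__main__":
    print(undetected_rate(10000, random.Random(1)))
```

Run as a script, the sketch prints the fraction of trials in which a single concealed mark escapes detection, which should come out close to one half under these simplifying assumptions.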

2649     It is highly unlikely that ECT(1) will be needed for integrity
2650     protection in future.  The ECN Nonce RFC [RFC3540] has been
2651     reclassified as historic, partly because other ways have been
2652     developed to protect feedback integrity of TCP and other transports
2653     [RFC8311] that do not consume a codepoint in the IP header.  For
2654     instance:

2656     o  The sender can test the integrity of the receiver's feedback by
2657        occasionally setting the IP-ECN field to a value normally only set
2658        by the network.  Then it can test whether the receiver's feedback
2659        faithfully reports what it expects (see para 2 of Section 20.2 of
2660        [RFC3168]).  This works for loss and it will work for the accurate
2661        ECN feedback [RFC7560] intended for L4S.

2663     o  A network can enforce a congestion response to its ECN markings
2664        (or packet losses) by auditing congestion exposure (ConEx)
2665        [RFC7713].  Whether the receiver or a downstream network is
2666        suppressing congestion feedback or the sender is unresponsive to
2667        the feedback, or both, ConEx audit can neutralise any advantage
2668        that any of these three parties would otherwise gain.

2670     o  The TCP authentication option (TCP-AO [RFC5925]) can be used to
2671        detect any tampering with TCP congestion feedback (whether
2672        malicious or accidental).  TCP's congestion feedback fields are
2673        immutable end-to-end, so they are amenable to TCP-AO protection,
2674        which covers the main TCP header and TCP options by default.
2675        However, TCP-AO is often too brittle to use on many end-to-end
2676        paths, where middleboxes can make verification fail in their
2677        attempts to improve performance or security, e.g. by
2678        resegmentation or shifting the sequence space.

2680  C.2.  Notification of Less Severe Congestion than CE

2682     Various researchers have proposed to use ECT(1) as a less severe
2683     congestion notification than CE, particularly to enable flows to fill
2684     available capacity more quickly after an idle period, when another
2685     flow departs or when a flow starts, e.g. VCP [VCP], Queue View (QV)
2686     [QV].

2688     Before assigning ECT(1) as an identifier for L4S, we must carefully
2689     consider whether it might be better to hold ECT(1) in reserve for
2690     future standardisation of rapid flow acceleration, which is an
2691     important and enduring problem [RFC6077].

2693     Pre-Congestion Notification (PCN) is another scheme that assigns
2694     alternative semantics to the ECN field.  It uses ECT(1) to signify a
2695     less severe level of pre-congestion notification than CE [RFC6660].
2696     However, the ECN field only takes on the PCN semantics if packets
2697     carry a Diffserv codepoint defined to indicate PCN marking within a
2698     controlled environment.  PCN is required to be applied solely to the
2699     outer header of a tunnel across the controlled region in order not to
2700     interfere with any end-to-end use of the ECN field.  Therefore a PCN
2701     region on the path would not interfere with any of the L4S service
2702     identifiers proposed in Appendix B.

2704  Authors' Addresses
2705     Koen De Schepper
2706     Nokia Bell Labs
2707     Antwerp
2708     Belgium

2710     Email: koen.de_schepper@nokia.com
2711     URI:   https://www.bell-labs.com/usr/koen.de_schepper

2713     Bob Briscoe (editor)
2714     Independent
2715     UK

2717     Email: ietf@bobbriscoe.net
2718     URI:   http://bobbriscoe.net/