idnits 2.17.1

draft-ietf-tsvwg-ecn-l4s-id-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 2571 has weird spacing: '...initial even...'

  == Line 2589 has weird spacing: '...initial even...'

  -- The document date (November 2, 2020) is 1264 days in the past. Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Missing Reference: 'RFCXXXX' is mentioned on line 1147, but not defined

  -- Looks like a reference, but probably isn't: '1' on line 1150

  == Outdated reference: A later version (-07) exists of
     draft-briscoe-docsis-q-protection-00

  == Outdated reference: A later version (-09) exists of
     draft-ietf-avtcore-cc-feedback-message-08

  == Outdated reference: A later version (-34) exists of
     draft-ietf-quic-transport-32

  == Outdated reference: A later version (-28) exists of
     draft-ietf-tcpm-accurate-ecn-12

  == Outdated reference: A later version (-15) exists of
     draft-ietf-tcpm-generalized-ecn-06

  == Outdated reference: A later version (-15) exists of
     draft-ietf-tcpm-rack-11

  == Outdated reference: A later version (-25) exists of
     draft-ietf-tsvwg-aqm-dualq-coupled-12

  == Outdated reference: A later version (-22) exists of
     draft-ietf-tsvwg-ecn-encap-guidelines-13

  == Outdated reference: A later version (-20) exists of
     draft-ietf-tsvwg-l4s-arch-07

  == Outdated reference: A later version (-22) exists of
     draft-ietf-tsvwg-nqb-02

  == Outdated reference: A later version (-23) exists of
     draft-ietf-tsvwg-rfc6040update-shim-10

  == Outdated reference: A later version (-04) exists of
     draft-morton-tsvwg-sce-01

  == Outdated reference: A later version (-06) exists of
     draft-stewart-tsvwg-sctpecn-05

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)

  -- Obsolete informational reference (is this intentional?): RFC 8312
     (Obsoleted by RFC 9438)

  Summary: 0 errors (**), 0 flaws (~~), 18 warnings (==), 5 comments (--).

  Run idnits with the --verbose option for more detailed information about
  the items above.

--------------------------------------------------------------------------------

2 Transport Services (tsv) K. 
De Schepper 3 Internet-Draft Nokia Bell Labs 4 Intended status: Experimental B. Briscoe, Ed. 5 Expires: May 6, 2021 Independent 6 November 2, 2020 8 Identifying Modified Explicit Congestion Notification (ECN) Semantics 9 for Ultra-Low Queuing Delay (L4S) 10 draft-ietf-tsvwg-ecn-l4s-id-11 12 Abstract 14 This specification defines the identifier to be used on IP packets 15 for a new network service called low latency, low loss and scalable 16 throughput (L4S). It is similar to the original (or 'Classic') 17 Explicit Congestion Notification (ECN). 'Classic' ECN marking was 18 required to be equivalent to a drop, both when applied in the network 19 and when responded to by a transport. Unlike 'Classic' ECN marking, 20 for packets carrying the L4S identifier, the network applies marking 21 more immediately and more aggressively than drop, and the transport 22 response to each mark is reduced and smoothed relative to that for 23 drop. The two changes counterbalance each other so that the 24 throughput of an L4S flow will be roughly the same as a non-L4S flow 25 under the same conditions. Nonetheless, the much more frequent 26 control signals and the finer responses to them result in much more 27 fine-grained adjustments, so that ultra-low and consistently low 28 queuing delay (typically sub-millisecond on average) becomes possible 29 for L4S traffic without compromising link utilization. Thus even 30 capacity-seeking (TCP-like) traffic can have high bandwidth and very 31 low delay at the same time, even during periods of high traffic load. 33 The L4S identifier defined in this document distinguishes L4S from 34 'Classic' (e.g. TCP-Reno-friendly) traffic. It gives an incremental 35 migration path so that suitably modified network bottlenecks can 36 distinguish and isolate existing traffic that still follows the 37 Classic behaviour, to prevent it degrading the low queuing delay and 38 loss of L4S traffic. 
This specification defines the rules that L4S 39 transports and network elements need to follow to ensure they neither 40 harm each other's performance nor that of Classic traffic. Examples 41 of new active queue management (AQM) marking algorithms and examples 42 of new transports (whether TCP-like or real-time) are specified 43 separately. 45 Status of This Memo 47 This Internet-Draft is submitted in full conformance with the 48 provisions of BCP 78 and BCP 79. 50 Internet-Drafts are working documents of the Internet Engineering 51 Task Force (IETF). Note that other groups may also distribute 52 working documents as Internet-Drafts. The list of current Internet- 53 Drafts is at https://datatracker.ietf.org/drafts/current/. 55 Internet-Drafts are draft documents valid for a maximum of six months 56 and may be updated, replaced, or obsoleted by other documents at any 57 time. It is inappropriate to use Internet-Drafts as reference 58 material or to cite them other than as "work in progress." 60 This Internet-Draft will expire on May 6, 2021. 62 Copyright Notice 64 Copyright (c) 2020 IETF Trust and the persons identified as the 65 document authors. All rights reserved. 67 This document is subject to BCP 78 and the IETF Trust's Legal 68 Provisions Relating to IETF Documents 69 (https://trustee.ietf.org/license-info) in effect on the date of 70 publication of this document. Please review these documents 71 carefully, as they describe your rights and restrictions with respect 72 to this document. Code Components extracted from this document must 73 include Simplified BSD License text as described in Section 4.e of 74 the Trust Legal Provisions and are provided without warranty as 75 described in the Simplified BSD License. 77 Table of Contents 79 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 80 1.1. Latency, Loss and Scaling Problems . . . . . . . . . . . 5 81 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 82 1.3. Scope . . . . . . 
. . . . . . . . . . . . . . . . . . . . 9 83 2. Consensus Choice of L4S Packet Identifier: Requirements . . . 9 84 3. L4S Packet Identification at Run-Time . . . . . . . . . . . . 10 85 4. Prerequisite Transport Layer Behaviour . . . . . . . . . . . 11 86 4.1. Prerequisite Codepoint Setting . . . . . . . . . . . . . 11 87 4.2. Prerequisite Transport Feedback . . . . . . . . . . . . . 11 88 4.3. Prerequisite Congestion Response . . . . . . . . . . . . 12 89 4.4. Filtering or Smoothing of ECN Feedback . . . . . . . . . 14 90 5. Prerequisite Network Node Behaviour . . . . . . . . . . . . . 15 91 5.1. Prerequisite Classification and Re-Marking Behaviour . . 15 92 5.2. The Meaning of L4S CE Relative to Drop . . . . . . . . . 16 93 5.3. Exception for L4S Packet Identification by Network Nodes 94 with Transport-Layer Awareness . . . . . . . . . . . . . 17 95 5.4. Interaction of the L4S Identifier with other Identifiers 17 96 5.4.1. DualQ Examples of Other Identifiers Complementing L4S 97 Identifiers . . . . . . . . . . . . . . . . . . . . . 17 98 5.4.1.1. Inclusion of Additional Traffic with L4S . . . . 17 99 5.4.1.2. Exclusion of Traffic From L4S Treatment . . . . . 19 100 5.4.1.3. Generalized Combination of L4S and Other 101 Identifiers . . . . . . . . . . . . . . . . . . . 19 102 5.4.2. Per-Flow Queuing Examples of Other Identifiers 103 Complementing L4S Identifiers . . . . . . . . . . . . 21 104 5.5. Limiting Packet Bursts from Links Supporting L4S AQMs . . 21 105 6. L4S Experiments . . . . . . . . . . . . . . . . . . . . . . . 22 106 6.1. Open Questions . . . . . . . . . . . . . . . . . . . . . 22 107 6.2. Open Issues . . . . . . . . . . . . . . . . . . . . . . . 23 108 6.3. Future Potential . . . . . . . . . . . . . . . . . . . . 24 109 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 110 8. Security Considerations . . . . . . . . . . . . . . . . . . . 25 111 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 25 112 10. References . . 
. . . . . . . . . . . . . . . . . . . . . . . 25 113 10.1. Normative References . . . . . . . . . . . . . . . . . . 25 114 10.2. Informative References . . . . . . . . . . . . . . . . . 26 115 Appendix A. The 'Prague L4S Requirements' . . . . . . . . . . . 33 116 A.1. Requirements for Scalable Transport Protocols . . . . . . 34 117 A.1.1. Use of L4S Packet Identifier . . . . . . . . . . . . 34 118 A.1.2. Accurate ECN Feedback . . . . . . . . . . . . . . . . 34 119 A.1.3. Fall back to Reno-friendly congestion control on 120 packet loss . . . . . . . . . . . . . . . . . . . . . 35 121 A.1.4. Fall back to Reno-friendly congestion control on 122 classic ECN bottlenecks . . . . . . . . . . . . . . . 36 123 A.1.5. Reduce RTT dependence . . . . . . . . . . . . . . . . 37 124 A.1.6. Scaling down to fractional congestion windows . . . . 37 125 A.1.7. Measuring Reordering Tolerance in Time Units . . . . 38 126 A.2. Scalable Transport Protocol Optimizations . . . . . . . . 41 127 A.2.1. Setting ECT in TCP Control Packets and 128 Retransmissions . . . . . . . . . . . . . . . . . . . 41 129 A.2.2. Faster than Additive Increase . . . . . . . . . . . . 41 130 A.2.3. Faster Convergence at Flow Start . . . . . . . . . . 42 131 Appendix B. Alternative Identifiers . . . . . . . . . . . . . . 42 132 B.1. ECT(1) and CE codepoints . . . . . . . . . . . . . . . . 43 133 B.2. ECN-DualQ-SCE1 . . . . . . . . . . . . . . . . . . . . . 47 134 B.3. ECN-DualQ-SCE0 . . . . . . . . . . . . . . . . . . . . . 49 135 B.4. ECN Plus a Diffserv Codepoint (DSCP) . . . . . . . . . . 51 136 B.5. ECN capability alone . . . . . . . . . . . . . . . . . . 54 137 B.6. Protocol ID . . . . . . . . . . . . . . . . . . . . . . . 54 138 B.7. Source or destination addressing . . . . . . . . . . . . 54 139 B.8. Summary: Merits of Alternative Identifiers . . . . . . . 55 140 Appendix C. Potential Competing Uses for the ECT(1) Codepoint . 56 141 C.1. Integrity of Congestion Feedback . . . . . . . . . . . . 56 142 C.2. 
Notification of Less Severe Congestion than CE . . . . . 57 143 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 57 145 1. Introduction 147 This specification defines the identifier to be used on IP packets 148 for a new network service called low latency, low loss and scalable 149 throughput (L4S). It is similar to the original (or 'Classic') 150 Explicit Congestion Notification (ECN [RFC3168]). RFC 3168 required 151 an ECN mark to be equivalent to a drop, both when applied in the 152 network and when responded to by a transport. Unlike Classic ECN 153 marking, the network applies L4S marking more immediately and more 154 aggressively than drop, and the transport response to each mark is 155 reduced and smoothed relative to that for drop. The two changes 156 counterbalance each other so that the throughput of an L4S flow will 157 be roughly the same as a non-L4S flow under the same conditions. 158 Nonetheless, the much more frequent control signals and the finer 159 responses to them result in ultra-low queuing delay without 160 compromising link utilization, and this low delay can be maintained 161 during high load. Ultra-low queuing delay means less than 1 162 millisecond (ms) on average and less than about 2 ms at the 99th 163 percentile. 165 An example of a scalable congestion control that would enable the L4S 166 service is Data Center TCP (DCTCP), which until now has been 167 applicable solely to controlled environments like data centres 168 [RFC8257], because it is too aggressive to co-exist with existing 169 TCP-Reno-friendly traffic. The DualQ Coupled AQM, which is defined 170 in a complementary experimental specification 171 [I-D.ietf-tsvwg-aqm-dualq-coupled], is an AQM framework that enables 172 scalable congestion controls like DCTCP to co-exist with existing 173 traffic, each getting roughly the same flow rate when they compete 174 under similar conditions. 
Note that a transport such as DCTCP is 175 still not safe to deploy on the Internet unless it satisfies the 176 requirements listed in Section 4. 178 L4S is not only for elastic (TCP-like) traffic - there are scalable 179 congestion controls for real-time media, such as the L4S variant of 180 the SCReAM [RFC8298] real-time media congestion avoidance technique 181 (RMCAT). The factor that distinguishes L4S from Classic traffic is 182 its behaviour in response to congestion. The transport wire 183 protocol, e.g. TCP, QUIC, SCTP, DCCP, RTP/RTCP, is orthogonal (and 184 therefore not suitable for distinguishing L4S from Classic packets). 186 The L4S identifier defined in this document is the key piece that 187 distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic. It 188 gives an incremental migration path so that suitably modified network 189 bottlenecks can distinguish and isolate existing Classic traffic from 190 L4S traffic to prevent it from degrading the ultra-low delay and loss 191 of the new scalable transports, without harming Classic performance. 192 Initial implementation of the separate parts of the system has been 193 motivated by the performance benefits. 195 1.1. Latency, Loss and Scaling Problems 197 Latency is becoming the critical performance factor for many (most?) 198 applications on the public Internet, e.g. interactive Web, Web 199 services, voice, conversational video, interactive video, interactive 200 remote presence, instant messaging, online gaming, remote desktop, 201 cloud-based applications, and video-assisted remote control of 202 machinery and industrial processes. In the 'developed' world, 203 further increases in access network bit-rate offer diminishing 204 returns, whereas latency is still a multi-faceted problem. In the 205 last decade or so, much has been done to reduce propagation time by 206 placing caches or servers closer to users. However, queuing remains 207 a major intermittent component of latency. 
209 The Diffserv architecture provides Expedited Forwarding [RFC3246], so 210 that low latency traffic can jump the queue of other traffic. 211 However, on access links dedicated to individual sites (homes, small 212 enterprises or mobile devices), often all traffic at any one time 213 will be latency-sensitive. Then, given nothing to differentiate 214 from, Diffserv makes no difference. Instead, we need to remove the 215 causes of any unnecessary delay. 217 The bufferbloat project has shown that excessively-large buffering 218 ('bufferbloat') has been introducing significantly more delay than 219 the underlying propagation time. These delays appear only 220 intermittently--only when a capacity-seeking (e.g. TCP) flow is long 221 enough for the queue to fill the buffer, making every packet in other 222 flows sharing the buffer sit through the queue. 224 Active queue management (AQM) was originally developed to solve this 225 problem (and others). Unlike Diffserv, which gives low latency to 226 some traffic at the expense of others, AQM controls latency for _all_ 227 traffic in a class. In general, AQM methods introduce an increasing 228 level of discard from the buffer the longer the queue persists above 229 a shallow threshold. This gives sufficient signals to capacity- 230 seeking (aka. greedy) flows to keep the buffer empty for its intended 231 purpose: absorbing bursts. However, RED [RFC2309] and other 232 algorithms from the 1990s were sensitive to their configuration and 233 hard to set correctly. So, this form of AQM was not widely deployed. 235 More recent state-of-the-art AQM methods, e.g. FQ-CoDel [RFC8290], 236 PIE [RFC8033], Adaptive RED [ARED01], are easier to configure, 237 because they define the queuing threshold in time not bytes, so it is 238 invariant for different link rates. 
However, no matter how good the 239 AQM, the sawtoothing sending window of a Classic congestion control 240 will either cause queuing delay to vary or cause the link to be 241 under-utilized. Even with a perfectly tuned AQM, the additional 242 queuing delay will be of the same order as the underlying speed-of- 243 light delay across the network. 245 If a sender's own behaviour is introducing queuing delay variation, 246 no AQM in the network can 'un-vary' the delay without significantly 247 compromising link utilization. Even flow-queuing (e.g. [RFC8290]), 248 which isolates one flow from another, cannot isolate a flow from the 249 delay variations it inflicts on itself. Therefore those applications 250 that need to seek out high bandwidth but also need low latency will 251 have to migrate to scalable congestion control. 253 Altering host behaviour is not enough on its own though. Even if 254 hosts adopt low latency behaviour (scalable congestion controls), 255 they need to be isolated from the behaviour of existing Classic 256 congestion controls that induce large queue variations. L4S enables 257 that migration by providing latency isolation in the network and 258 distinguishing the two types of packets that need to be isolated: L4S 259 and Classic. L4S isolation can be achieved with a queue per flow 260 (e.g. [RFC8290]) but a DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled] is 261 sufficient, and actually gives better tail latency. Both approaches 262 are addressed in this document. 264 The DualQ solution was developed to make ultra-low latency available 265 without requiring per-flow queues at every bottleneck. This was 266 because FQ has well-known downsides - not least the need to inspect 267 transport layer headers in the network, which makes it incompatible 268 with privacy approaches such as IPSec VPN tunnels, and incompatible 269 with link layer queue management, where transport layer headers can 270 be hidden, e.g. 5G. 
272 Latency is not the only concern addressed by L4S: It was known when 273 TCP congestion avoidance was first developed that it would not scale 274 to high bandwidth-delay products (footnote 6 of Jacobson and Karels 275 [TCP-CA]). Given regular broadband bit-rates over WAN distances are 276 already [RFC3649] beyond the scaling range of Reno TCP, 'less 277 unscalable' Cubic [RFC8312] and Compound [I-D.sridharan-tcpm-ctcp] 278 variants of TCP have been successfully deployed. However, these are 279 now approaching their scaling limits. Unfortunately, fully scalable 280 congestion controls such as DCTCP [RFC8257] cause Classic ECN 281 congestion controls sharing the same queue to starve themselves, 282 which is why they have been confined to private data centres or 283 research testbeds (until now). 285 It turns out that a congestion control algorithm like DCTCP that 286 solves the latency problem also solves the scalability problem of 287 Classic congestion controls. The finer sawteeth in the congestion 288 window have low amplitude, so they cause very little queuing delay 289 variation and the average time to recover from one congestion signal 290 to the next (the average duration of each sawtooth) remains 291 invariant, which maintains constant tight control as flow-rate 292 scales. A background paper [DCttH15] gives the full explanation of 293 why the design solves both the latency and the scaling problems, both 294 in plain English and in more precise mathematical form. The 295 explanation is summarised without the maths in the L4S architecture 296 document [I-D.ietf-tsvwg-l4s-arch]. 298 1.2. Terminology 300 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 301 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 302 "OPTIONAL" in this document are to be interpreted as described in 303 [RFC2119]. In this document, these words will appear with that 304 interpretation only when in ALL CAPS. 
Lower case uses of these words 305 are not to be interpreted as carrying RFC-2119 significance. 307 Classic Congestion Control: A congestion control behaviour that can 308 co-exist with standard TCP Reno [RFC5681] without causing a 309 significantly negative impact on its flow rate [RFC5033]. With 310 Classic congestion controls, as flow rate scales, the number of 311 round trips between congestion signals (losses or ECN marks) rises 312 with the flow rate. So it takes longer and longer to recover 313 after each congestion event. Therefore control of queuing and 314 utilization becomes very slack, and the slightest disturbance 315 prevents a high rate from being attained [RFC3649]. 317 For instance, with 1500-byte packets and an end-to-end round trip 318 time (RTT) of 36 ms, over the years, as Reno flow rate scales from 319 2 to 100 Mb/s the number of round trips taken to recover from a 320 congestion event rises proportionately, from 4 round trips to 200. 321 Cubic [RFC8312] was developed to be less unscalable, but it is 322 approaching its scaling limit; with the same RTT of 36 ms, at 323 100 Mb/s it takes about 106 round trips to recover, and at 800 Mb/s 324 its recovery time triples to over 340 round trips, or still more 325 than 12 seconds (Reno would take 57 seconds). Cubic only becomes 326 significantly better than Reno at high delay and rate 327 combinations; for example, at 90 ms RTT and 800 Mb/s a Reno flow 328 takes 4000 RTTs or 6 minutes to recover, whereas Cubic 'only' 329 needs 188 RTTs, which is still 17 seconds (double its recovery 330 time at 100 Mb/s). 332 Scalable Congestion Control: A congestion control where the average 333 time from one congestion signal to the next (the recovery time) 334 remains invariant as the flow rate scales, all other factors being 335 equal. This maintains the same degree of control over queuing 336 and utilization whatever the flow rate, as well as ensuring that 337 high throughput is robust to disturbances. 
For instance, DCTCP 338 averages 2 congestion signals per round trip whatever the flow 339 rate, as do other recently developed scalable congestion controls, 340 e.g. Relentless TCP [Mathis09], TCP Prague [PragueLinux] and the 341 L4S variant of SCReAM for real-time media [RFC8298]. See 342 Section 4.3 for more explanation. 344 Classic service: The Classic service is intended for all the 345 congestion control behaviours that co-exist with Reno [RFC5681] 346 (e.g. Reno itself, Cubic [RFC8312], Compound 347 [I-D.sridharan-tcpm-ctcp], TFRC [RFC5348]). The term 'Classic 348 queue' means a queue providing the Classic service. 350 Low-Latency, Low-Loss Scalable throughput (L4S) service: The 'L4S' 351 service is intended for traffic from scalable congestion control 352 algorithms, such as Data Center TCP [RFC8257]. The L4S service is 353 for more general traffic than just DCTCP--it allows the set of 354 congestion controls with similar scaling properties to DCTCP to 355 evolve, such as the examples listed above (Relentless, Prague, 356 SCReAM). The term 'L4S queue' means a queue providing the L4S 357 service. 359 The terms Classic or L4S can also qualify other nouns, such as 360 'queue', 'codepoint', 'identifier', 'classification', 'packet', 361 'flow'. For example: an L4S packet means a packet with an L4S 362 identifier sent from an L4S congestion control. 364 Both Classic and L4S services can cope with a proportion of 365 unresponsive or less-responsive traffic as well, as long as it 366 does not build a queue (e.g. DNS, VoIP, game sync datagrams, etc.). 368 Reno-friendly: The subset of Classic traffic that excludes 369 unresponsive traffic and excludes experimental congestion controls 370 intended to coexist with Reno but without always being strictly 371 friendly to Reno (as allowed by [RFC5033]). Reno-friendly is used 372 in place of 'TCP-friendly', given that the TCP protocol is used 373 with many different congestion control behaviours. 
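The Reno recovery figures quoted in the definition of Classic Congestion Control can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions that are not stated in this document: an idealised Reno sawtooth in which the window halves on each congestion event and then grows by one packet per round trip, so that the average window is 3/4 of the peak. The function names are hypothetical. The Cubic figures are not reproduced, because they depend on details of the Cubic window function.

```python
"""Worked check of the Reno recovery figures quoted in the terminology.

Assumptions (not taken from the draft): an idealised Reno sawtooth in which
the window halves on each congestion event and then grows by one packet per
round trip, so the average window is 3/4 of the peak window.
"""

PACKET_BITS = 1500 * 8  # 1500-byte packets, as in the text


def reno_recovery_rounds(rate_bps: float, rtt_s: float) -> float:
    """Round trips needed to climb from half the peak window back to the peak."""
    avg_window = rate_bps * rtt_s / PACKET_BITS  # average packets in flight
    peak_window = avg_window * 4 / 3             # average is 3/4 of the peak
    return peak_window / 2                       # growth of 1 packet per round trip


def reno_recovery_seconds(rate_bps: float, rtt_s: float) -> float:
    """The same recovery time expressed in seconds."""
    return reno_recovery_rounds(rate_bps, rtt_s) * rtt_s


if __name__ == "__main__":
    for rate_mbps, rtt_ms in [(2, 36), (100, 36), (800, 36), (800, 90)]:
        rounds = reno_recovery_rounds(rate_mbps * 1e6, rtt_ms / 1e3)
        secs = reno_recovery_seconds(rate_mbps * 1e6, rtt_ms / 1e3)
        print(f"{rate_mbps:>4} Mb/s, {rtt_ms} ms RTT: "
              f"{rounds:7.0f} round trips, {secs:6.1f} s")
```

Under these assumptions the calculation gives 4 and 200 round trips at 2 and 100 Mb/s with a 36 ms RTT, about 57 seconds at 800 Mb/s with a 36 ms RTT, and 4000 round trips (6 minutes) at 800 Mb/s with a 90 ms RTT, which matches the Reno figures in the text.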
375 Classic ECN: The original Explicit Congestion Notification (ECN) 376 protocol [RFC3168], which requires ECN signals to be treated the 377 same as drops, both when generated in the network and when 378 responded to by the sender. The names used for the four 379 codepoints of the 2-bit IP-ECN field are as defined in [RFC3168]: 381 Not ECT, ECT(0), ECT(1) and CE, where ECT stands for ECN-Capable 382 Transport and CE stands for Congestion Experienced. 384 1.3. Scope 386 The new L4S identifier defined in this specification is applicable 387 for IPv4 and IPv6 packets (as for Classic ECN [RFC3168]). It is 388 applicable for the unicast, multicast and anycast forwarding modes. 390 The L4S identifier is an orthogonal packet classification to the 391 Differentiated Services Code Point (DSCP) [RFC2474]. Section 5.4 392 explains what this means in practice. 394 This document is intended for experimental status, so it does not 395 update any standards track RFCs. Therefore it depends on [RFC8311], 396 which is a standards track specification that: 398 o updates the ECN proposed standard [RFC3168] to allow experimental 399 track RFCs to relax the requirement that an ECN mark must be 400 equivalent to a drop (when the network applies markings and/or 401 when the sender responds to them); 403 o changes the status of the experimental ECN nonce [RFC3540] to 404 historic; 406 o makes consequent updates to the following additional proposed 407 standard RFCs to reflect the above two bullets: 409 * ECN for RTP [RFC6679]; 411 * the congestion control specifications of various DCCP 412 congestion control identifier (CCID) profiles [RFC4341], 413 [RFC4342], [RFC5622]. 415 This document is about identifiers that are used for interoperation 416 between hosts and networks. 
So the audience is broad, covering 417 developers of host transports and network AQMs, as well as 418 operators who might wish to combine various identifiers, which would 419 require flexibility from equipment developers. 421 2. Consensus Choice of L4S Packet Identifier: Requirements 423 This section briefly records the process that led to a consensus 424 choice of L4S identifier, selected from all the alternatives in 425 Appendix B. 427 The identifier for packets using the Low Latency, Low Loss, Scalable 428 throughput (L4S) service needs to meet the following requirements: 430 o it SHOULD survive end-to-end between source and destination 431 applications: across the boundary between host and network, 432 between interconnected networks, and through middleboxes; 434 o it SHOULD be visible at the IP layer; 436 o it SHOULD be common to IPv4 and IPv6 and transport-agnostic; 438 o it SHOULD be incrementally deployable; 440 o it SHOULD enable an AQM to classify packets encapsulated by outer 441 IP or lower-layer headers; 443 o it SHOULD consume minimal extra codepoints; 445 o it SHOULD be consistent on all the packets of a transport layer 446 flow, so that some packets of a flow are not served by a different 447 queue to others. 449 Whether the identifier would be recoverable if the experiment failed 450 is a factor that could be taken into account. However, this has not 451 been made a requirement, because that would favour schemes that would 452 be easier to fail, rather than those more likely to succeed. 454 It is recognised that the chosen identifier is unlikely to satisfy 455 all these requirements, particularly given the limited space left in 456 the IP header. Therefore a compromise will be necessary, which is 457 why all the above requirements are expressed with the word 'SHOULD', 458 not 'MUST'. Appendix B discusses the pros and cons of the 459 compromises made in various competing identification schemes against 460 the above requirements. 
462 On the basis of this analysis, "ECT(1) and CE codepoints" is the best 463 compromise. Therefore this scheme is defined in detail in the 464 following sections, while Appendix B records the rationale for this 465 decision. 467 3. L4S Packet Identification at Run-Time 469 The L4S treatment is an experimental track alternative packet marking 470 treatment [RFC4774] to the Classic ECN treatment in [RFC3168], which 471 has been updated by [RFC8311] to allow experiments such as the one 472 defined in the present specification. Like Classic ECN, L4S ECN 473 identifies both network and host behaviour: it identifies the marking 474 treatment that network nodes are expected to apply to L4S packets, 475 and it identifies packets that have been sent from hosts that are 476 expected to comply with a broad type of sending behaviour. 478 For a packet to receive L4S treatment as it is forwarded, the sender 479 sets the ECN field in the IP header to the ECT(1) codepoint. See 480 Section 4 for full transport layer behaviour requirements, including 481 feedback and congestion response. 483 A network node that implements the L4S service normally classifies 484 arriving ECT(1) and CE packets for L4S treatment. See Section 5 for 485 full network element behaviour requirements, including 486 classification, ECN-marking and interaction of the L4S identifier 487 with other identifiers and per-hop behaviours. 489 4. Prerequisite Transport Layer Behaviour 491 4.1. Prerequisite Codepoint Setting 493 A sender that wishes a packet to receive L4S treatment as it is 494 forwarded, MUST set the ECN field in the IP header (v4 or v6) to the 495 ECT(1) codepoint. 497 4.2. Prerequisite Transport Feedback 499 For a transport protocol to provide scalable congestion control it 500 MUST provide feedback of the extent of CE marking on the forward 501 path. When ECN was added to TCP [RFC3168], the feedback method 502 reported no more than one CE mark per round trip. 
Some transport 503 protocols derived from TCP mimic this behaviour while others report 504 the accurate extent of ECN marking. This means that some transport 505 protocols will need to be updated as a prerequisite for scalable 506 congestion control. The position for a few well-known transport 507 protocols is given below. 509 TCP: Support for the accurate ECN feedback requirements [RFC7560] 510 (such as that provided by AccECN [I-D.ietf-tcpm-accurate-ecn]) by 511 both ends is a prerequisite for scalable congestion control in 512 TCP. Therefore, the presence of ECT(1) in the IP headers even in 513 one direction of a TCP connection will imply that both ends must 514 be supporting accurate ECN feedback. However, the converse does 515 not apply. So even if both ends support AccECN, either of the two 516 ends can choose not to use a scalable congestion control, whatever 517 the other end's choice. 519 SCTP: A suitable ECN feedback mechanism for SCTP could add a chunk 520 to report the number of received CE marks 521 (e.g. [I-D.stewart-tsvwg-sctpecn]), and update the ECN feedback 522 protocol sketched out in Appendix A of the standards track 523 specification of SCTP [RFC4960]. 525 RTP over UDP: A prerequisite for scalable congestion control is for 526 both (all) ends of one media-level hop to signal ECN support 527 [RFC6679] and use the new generic RTCP feedback format of 528 [I-D.ietf-avtcore-cc-feedback-message]. The presence of ECT(1) 529 implies that both (all) ends of that media-level hop support ECN. 530 However, the converse does not apply. So each end of a media- 531 level hop can independently choose not to use a scalable 532 congestion control, even if both ends support ECN. 534 QUIC: Support for sufficiently fine-grained ECN feedback is provided 535 by the v1 IETF QUIC transport [I-D.ietf-quic-transport]. 537 DCCP: The ACK vector in DCCP [RFC4340] is already sufficient to 538 report the extent of CE marking as needed by a scalable congestion 539 control. 
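The codepoint setting of Section 4.1 and the default classification described in Section 3 can be sketched in a few lines. This is a minimal illustration, not a definitive implementation. The codepoint values are those of the 2-bit IP-ECN field in [RFC3168]. The helper names (with_ecn, is_l4s, mark_socket_ect1) are hypothetical, the classifier ignores the exceptions and additional identifiers discussed in Section 5, and whether an operating system honours ECN bits written through the IP_TOS socket option is platform-dependent and is assumed here.

```python
import socket

# IP-ECN codepoints from RFC 3168: the two least-significant bits of the
# IPv4 TOS byte (or IPv6 Traffic Class byte).
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11
ECN_MASK = 0b11


def with_ecn(tos_byte: int, codepoint: int) -> int:
    """Return tos_byte with its ECN field replaced, leaving the DSCP bits alone."""
    return (tos_byte & ~ECN_MASK & 0xFF) | (codepoint & ECN_MASK)


def is_l4s(tos_byte: int) -> bool:
    """Default classifier sketch: ECT(1) and CE packets get L4S treatment."""
    return (tos_byte & ECN_MASK) in (ECT_1, CE)


def mark_socket_ect1(sock: socket.socket) -> None:
    """Hypothetical helper: request ECT(1) on packets sent from an IPv4 socket.

    Assumption: the platform lets applications write the ECN bits via IP_TOS.
    Some stacks override or mask these bits, notably for TCP sockets.
    """
    tos = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, with_ecn(tos, ECT_1))
```

Keeping the DSCP bits untouched in with_ecn reflects the statement in Section 1.3 that the L4S identifier is orthogonal to the Diffserv codepoint.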
541 4.3. Prerequisite Congestion Response

543 As a condition for a host to send packets with the L4S identifier
544 (ECT(1)), it SHOULD implement a congestion control behaviour that
545 ensures that, in steady state, the average time from one ECN
546 congestion signal to the next (the 'recovery time') does not increase
547 as flow rate scales, all other factors being equal. This is termed a
548 scalable congestion control. This is necessary to ensure that queue
549 variations remain small as flow rate scales, without having to
550 sacrifice utilization.

552 For instance, for DCTCP, TCP Prague [PragueLinux] and the L4S variant
553 of SCReAM [RFC8298], the average recovery time is always half a round
554 trip, whatever the flow rate.

556 As with all transport behaviours, a detailed specification (probably
557 an experimental RFC) will need to be defined for each congestion
558 control, following the guidelines for specifying new congestion
559 control algorithms in [RFC5033]. In addition it will need to
560 document these L4S-specific matters, specifically the timescale over
561 which the proportionality is averaged, and control of burstiness.
562 The recovery time requirement above is worded as a 'SHOULD' rather
563 than a 'MUST' to allow reasonable flexibility when defining these
564 specifications.

566 The condition 'all other factors being equal' allows the recovery
567 time to be different for different round trip times, as long as it
568 does not increase with flow rate for any particular RTT.

570 Saying that the recovery time remains roughly invariant is equivalent
571 to saying that the number of ECN CE marks per round trip remains
572 invariant as flow rate scales, all other factors being equal. For
573 instance, DCTCP's average recovery time of half of 1 RTT is
574 equivalent to 2 ECN marks per round trip. For those familiar with
575 steady-state congestion response functions, it is also equivalent to
576 say that the congestion window is inversely proportional to the
577 proportion of bytes in packets marked with the CE codepoint (see
578 section 2 of [PI2]).

580 In order to coexist safely with other Internet traffic, a scalable
581 congestion control MUST NOT tag its packets with the ECT(1) codepoint
582 unless it complies with the following bulleted requirements. The
583 specification of a particular scalable congestion control MUST
584 describe in detail how it satisfies each requirement and, for any
585 non-mandatory requirements, it MUST justify why it does not comply:

587 o As well as responding to ECN markings, a scalable congestion
588   control MUST react to packet loss in a way that will coexist
589   safely with a TCP Reno congestion control [RFC5681] (see
590   Appendix A.1.3 for rationale).

592 o A scalable congestion control MUST implement monitoring in order
593   to detect a likely non-L4S but ECN-capable AQM at the bottleneck.
594   On detection of a likely ECN-capable bottleneck it SHOULD be
595   capable (dependent on configuration) of automatically adapting its
596   congestion response to coexist with TCP Reno congestion controls
597   [RFC5681] (see Appendix A.1.4 for rationale and a referenced
598   algorithm).

600   Note that a scalable congestion control is not expected to change
601   to setting ECT(0) while it falls back to coexist with Reno.

603 o A scalable congestion control MUST eliminate RTT bias as much as
604   possible in the range between the minimum likely RTT and typical
605   RTTs expected in the intended deployment scenario (see
606   Appendix A.1.5 for rationale).

608 o A scalable congestion control SHOULD remain responsive to
609   congestion when typical RTTs over the public Internet are
610   significantly smaller because they are no longer inflated by
611   queuing delay (see Appendix A.1.6 for rationale).

613 o A scalable congestion control intended for reordering-prone
614   networks SHOULD detect loss by counting in time-based units, which
615   is scalable, as opposed to counting in units of packets (as in the
616   3 DupACK rule of RFC 5681 TCP), which is not scalable (see
617   Appendix A.1.7 for rationale). This requirement is scoped to
618   'reordering-prone networks' in order to exclude congestion
619   controls that are solely used in controlled environments where the
620   network introduces hardly any reordering.

622 o A scalable congestion control is expected to limit the queue
623   caused by bursts of packets. It would not seem necessary to set
624   the limit any lower than 10% of the minimum RTT expected in a
625   typical deployment (e.g. additional queuing of roughly 250 us for
626   the public Internet). This would be converted to a number of
627   packets under the worst-case assumption that the bottleneck link
628   capacity equals the current flow rate. No normative requirement
629   to limit bursts is given here and, until there is more industry
630   experience from the L4S experiment, it is not even known whether
631   one is needed - it seems to be in an L4S sender's self-interest to
632   limit bursts. Instead, it is only required that the specification
633   of a particular scalable congestion control MUST define, quantify
634   and justify its approach to limiting bursts.

636 To participate in the L4S experiment, a scalable congestion control
637 MUST be capable of being replaced by a Classic congestion control (by
638 application and by administrative control). A purely Classic
639 congestion control will not tag its packets with the ECT(1)
640 codepoint.

642 Each sender in a session can use a scalable congestion control
643 independently of the congestion control used by the receiver(s) when
644 they send data. Therefore there might be ECT(1) packets in one
645 direction and ECT(0) or Not-ECT in the other.
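
The burst-limiting bullet above describes converting a time budget (about 10% of the minimum expected RTT) into a number of packets, under the worst-case assumption that the bottleneck capacity equals the current flow rate. A minimal Python sketch of that arithmetic follows. The function name burst_limit_packets, the 1500-byte default packet size and the floor of one packet are assumptions of this sketch, not requirements of this document.

```python
import math


def burst_limit_packets(flow_rate_bps: float,
                        min_rtt_s: float,
                        packet_size_bytes: int = 1500,  # assumed packet size
                        rtt_fraction: float = 0.10) -> int:
    """Convert a burst time budget into a packet count.

    Worst-case assumption taken from the text: the bottleneck link
    capacity equals the current flow rate, so a burst of N packets adds
    N * packet_size * 8 / flow_rate seconds of queuing.
    """
    if flow_rate_bps <= 0 or min_rtt_s <= 0 or packet_size_bytes <= 0:
        raise ValueError("rate, RTT and packet size must be positive")
    burst_time_s = rtt_fraction * min_rtt_s          # e.g. 250 us
    burst_bytes = flow_rate_bps / 8 * burst_time_s   # bytes sent in that time
    # Assumption of this sketch: a sender can always emit one packet.
    return max(1, math.floor(burst_bytes / packet_size_bytes))


# A 2.5 ms minimum RTT gives the 250 us budget used as an example above.
at_100_mbps = burst_limit_packets(100e6, 0.0025)
at_1_gbps = burst_limit_packets(1e9, 0.0025)
```

With these assumptions the same 250 us budget allows about 2 full-sized packets at 100 Mb/s and about 20 at 1 Gb/s, which shows why the limit is naturally expressed in time rather than in packets.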

647 Later (Section 5.4.1.1) this document discusses the conditions for
648 mixing other "'Safe' Unresponsive Traffic" (e.g. DNS, LDAP, NTP,
649 voice, game sync packets) with L4S traffic. To be clear, although
650 such traffic can share the same queue as L4S traffic, it is not
651 appropriate for the sender to tag it as ECT(1), except in the
652 (unlikely) case that it satisfies the above conditions.

654 4.4. Filtering or Smoothing of ECN Feedback

656 Section 5.2 below specifies that an L4S AQM is expected to signal L4S
657 ECN without filtering or smoothing. This contrasts with a Classic
658 AQM, which filters out variations in the queue before signalling ECN
659 marking or drop. In the L4S architecture [I-D.ietf-tsvwg-l4s-arch],
660 responsibility for smoothing out these variations shifts to the
661 sender's congestion control.

663 This shift of responsibility has the advantage that each sender can
664 smooth variations over a timescale proportionate to its own RTT.
665 Whereas, in the Classic approach, the network doesn't know the RTTs
666 of all the flows, so it has to smooth out variations for a worst-case
667 RTT to ensure stability. For all the typical flows with shorter RTT
668 than the worst-case, this makes congestion control unnecessarily
669 sluggish.

671 This also gives an L4S sender the choice not to smooth, depending on
672 its context (start-up, congestion avoidance, etc). Therefore, this
673 document places no requirement on an L4S congestion control to smooth
674 out variations in any particular way. Nonetheless, the specification
675 of a particular L4S congestion control SHOULD describe how it smooths
676 the L4S ECN signals fed back to it from the receiver.

678 5. Prerequisite Network Node Behaviour

680 5.1. Prerequisite Classification and Re-Marking Behaviour

682 A network node that implements the L4S service MUST classify arriving
683 ECT(1) packets for L4S treatment and, other than in the exceptional
684 case referred to next, it MUST classify arriving CE packets for L4S
685 treatment as well. CE packets might have originated as ECT(1) or
686 ECT(0), but the above rule to classify them as if they originated as
687 ECT(1) is the safe choice (see Appendix B.1 for rationale). The
688 exception is where some flow-aware in-network mechanism happens to be
689 available for distinguishing CE packets that originated as ECT(0), as
690 described in Section 5.3, but there is no implication that such a
691 mechanism is necessary.

693 An L4S AQM treatment follows similar codepoint transition rules to
694 those in RFC 3168. Specifically, the ECT(1) codepoint MUST NOT be
695 changed to any other codepoint than CE, and CE MUST NOT be changed to
696 any other codepoint. An ECT(1) packet is classified as ECN-capable
697 and, if congestion increases, an L4S AQM algorithm will increasingly
698 mark the ECN field as CE, otherwise forwarding packets unchanged as
699 ECT(1). Necessary conditions for an L4S marking treatment are
700 defined in Section 5.2.

702 Under persistent overload an L4S marking treatment SHOULD begin using
703 Classic drop until the overload episode has subsided, as recommended
704 for all AQM methods in [RFC7567] (Section 4.2.1), which follows the
705 similar advice in RFC 3168 (Section 7). Where an L4S AQM is
706 transport-aware, this requirement could be satisfied by redirecting
707 packets in those flows contributing most to the overload so that they
708 are subjected to drop in the Classic queue
709 [I-D.briscoe-docsis-q-protection].

711 For backward compatibility in uncontrolled environments, a network
712 node that implements the L4S treatment MUST also implement an AQM
713 treatment for the Classic service as defined in Section 1.2. This
714 Classic AQM treatment need not mark ECT(0) packets, but if it does,
715 it will do so under the same conditions as it would drop Not-ECT
716 packets [RFC3168]. It MUST classify arriving ECT(0) and Not-ECT
717 packets for treatment by this Classic AQM (for the DualQ Coupled AQM,
718 see the extensive discussion on classification in Sections 2.3 and
719 2.5.1.1 of [I-D.ietf-tsvwg-aqm-dualq-coupled]).

721 5.2. The Meaning of L4S CE Relative to Drop

723 The likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST
724 be roughly proportional to the square of the likelihood that it would
725 have marked it if it had been an L4S packet (p_L). That is

727    p_C ~= (p_L / k)^2

729 The constant of proportionality (k) does not have to be standardised
730 for interoperability, but a value of 2 is RECOMMENDED. The term
731 'likelihood' is used above to allow for marking and dropping to be
732 either probabilistic or deterministic.

734 This formula ensures that Scalable and Classic flows will converge to
735 roughly equal congestion windows, for the worst case of Reno
736 congestion control. This is because the congestion windows of
737 Scalable and Classic congestion controls are inversely proportional
738 to p_L and sqrt(p_C) respectively. So making p_C proportional to the
739 square of p_L in the above formula counterbalances the square root
740 that characterizes Reno-friendly flows.

742 The relative strengths of L4S CE and drop are irrelevant in an AQM
743 that schedules application flows explicitly (e.g. an FQ scheduler).
744 Nonetheless, the above relationship defines the coupling between L4S
745 and Classic congestion signals in a DualQ Coupled AQM
746 [I-D.ietf-tsvwg-aqm-dualq-coupled].

748 Note that, contrary to RFC 3168, a Dual Queue Coupled AQM
749 implementing the L4S and Classic treatments does not mark an ECT(1)
750 packet under the same conditions that it would have dropped a Not-ECT
751 packet, as allowed by [RFC8311], which updates RFC 3168. However, if
752 it marks ECT(0) packets, it does so under the same conditions that it
753 would have dropped a Not-ECT packet.

755 Also, L4S CE marking needs to be interpreted as an unsmoothed signal,
756 in contrast to the Classic approach in which AQMs filter out
757 variations before signalling congestion. An L4S AQM SHOULD NOT
758 smooth or filter out variations in the queue before signalling
759 congestion. In the L4S architecture [I-D.ietf-tsvwg-l4s-arch], the
760 sender, not the network, is responsible for smoothing out variations.

762 This requirement is worded as 'SHOULD NOT' rather than 'MUST NOT' to
763 allow for the case where the signals from a Classic smoothed AQM are
764 coupled with those from an unsmoothed L4S AQM. Nonetheless, the
765 spirit of the requirement is for all systems to expect that L4S ECN
766 signalling is unsmoothed and unfiltered, which is important for
767 interoperability.

769 5.3. Exception for L4S Packet Identification by Network Nodes with
770 Transport-Layer Awareness

772 To implement the L4S treatment, a network node does not need to
773 identify transport-layer flows. Nonetheless, if an implementer is
774 willing to identify transport-layer flows at a network node, and if
775 the most recent ECT packet in the same flow was ECT(0), the node MAY
776 classify CE packets for Classic ECN [RFC3168] treatment. In all
777 other cases, a network node MUST classify all CE packets for L4S
778 treatment. Examples of such other cases are: i) if no ECT packets
779 have yet been identified in a flow; ii) if it is not desirable for a
780 network node to identify transport-layer flows; or iii) if the most
781 recent ECT packet in a flow was ECT(1).

783 If an implementer uses flow-awareness to classify CE packets, to
784 determine whether the flow is using ECT(0) or ECT(1) it only uses the
785 most recent ECT packet of a flow (this advice will need to be
786 verified as part of L4S experiments). This is because a sender might
787 switch from sending ECT(1) (L4S) packets to sending ECT(0) (Classic
788 ECN) packets, or back again, in the middle of a transport-layer flow
789 (e.g. it might manually switch its congestion control module
790 mid-connection, or it might be deliberately attempting to confuse the
791 network).

793 5.4. Interaction of the L4S Identifier with other Identifiers

795 The examples in this section concern how additional identifiers might
796 complement the L4S identifier to classify packets between class-based
797 queues. Firstly Section 5.4.1 considers two queues, L4S and Classic,
798 as in the DualQ Coupled AQM [I-D.ietf-tsvwg-aqm-dualq-coupled],
799 either alone (Section 5.4.1.1) or within a larger queuing hierarchy
800 (Section 5.4.1.2). Then Section 5.4.2 considers schemes that might
801 combine per-flow 5-tuples with other identifiers.

803 5.4.1. DualQ Examples of Other Identifiers Complementing L4S
804 Identifiers

806 5.4.1.1. Inclusion of Additional Traffic with L4S

808 In a typical case for the public Internet a network element that
809 implements L4S in a shared queue might want to classify some low-rate
810 but unresponsive traffic (e.g. DNS, LDAP, NTP, voice, game sync
811 packets) into the low latency queue to mix with L4S traffic. Such
812 non-ECN-based packet types MUST be safe to mix with L4S traffic
813 without harming the low latency service, where 'safe' is explained in
814 Section 5.4.1.1.1 below.

816 In this case it would not be appropriate to call the queue an L4S
817 queue, because it is shared by L4S and non-L4S traffic. Instead it
818 will be called the low latency or L queue. The L queue then offers
819 two different treatments:

821 o The L4S treatment, which is a combination of the L4S AQM treatment
822   and a priority scheduling treatment;

824 o The low latency treatment, which is solely the priority scheduling
825   treatment, without ECN-marking by the AQM.
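
Looking back at the classification rules of Sections 5.1 and 5.3, the following minimal Python sketch shows one way a flow-aware node could apply them: ECT(1) always goes to L4S, ECT(0) and Not-ECT always go to Classic, and CE goes to L4S unless the most recent ECT packet seen in the same flow was ECT(0). The class name FlowAwareClassifier, the "L4S"/"Classic" return labels and the use of an arbitrary hashable flow identifier are assumptions of this sketch; flow-state expiry is deliberately left out.

```python
from typing import Dict, Hashable, Optional

# ECN field values (RFC 3168); 01 is the ECT(1) codepoint used by L4S.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


class FlowAwareClassifier:
    """Hypothetical classifier remembering the latest ECT codepoint per flow."""

    def __init__(self) -> None:
        self._last_ect: Dict[Hashable, int] = {}

    def classify(self, flow_id: Optional[Hashable], ecn: int) -> str:
        """Return "L4S" or "Classic"; flow_id=None means no flow awareness."""
        if ecn in (ECT0, ECT1):
            if flow_id is not None:
                # Only the most recent ECT packet of a flow is remembered.
                self._last_ect[flow_id] = ecn
            return "L4S" if ecn == ECT1 else "Classic"
        if ecn == CE:
            # Optional exception (MAY): the flow's latest ECT packet was ECT(0).
            if flow_id is not None and self._last_ect.get(flow_id) == ECT0:
                return "Classic"
            # All other cases, including unknown flows, MUST be L4S.
            return "L4S"
        return "Classic"  # Not-ECT


classifier = FlowAwareClassifier()
decisions = [
    classifier.classify("flow-a", CE),    # no ECT packet seen yet
    classifier.classify("flow-a", ECT0),
    classifier.classify("flow-a", CE),    # latest ECT packet was ECT(0)
    classifier.classify("flow-a", ECT1),  # sender switched mid-flow
    classifier.classify("flow-a", CE),    # latest ECT packet was ECT(1)
    classifier.classify(None, CE),        # node without flow awareness
]
```

Keeping only the latest ECT codepoint per flow follows the advice in Section 5.3 and handles a sender that switches between ECT(0) and ECT(1) in the middle of a flow.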

827 To identify packets for just the scheduling treatment, it would be
828 inappropriate to use the L4S ECT(1) identifier, because such traffic
829 is unresponsive to ECN marking. Therefore, a network element that
830 implements L4S in a shared queue MAY classify additional packets into
831 the L queue if they carry certain non-ECN identifiers. For instance:

833 o addresses of specific applications or hosts configured to be safe
834   (or perhaps they comply with L4S behaviour and can respond to ECN
835   feedback, but perhaps cannot set the ECN field for some reason);

837 o certain protocols that are usually lightweight (e.g. ARP, DNS);

839 o specific Diffserv codepoints that indicate traffic with limited
840   burstiness such as the EF (Expedited Forwarding [RFC3246]),
841   Voice-Admit [RFC5865] or proposed NQB (Non-Queue-Building
842   [I-D.ietf-tsvwg-nqb]) service classes or equivalent local-use
843   DSCPs (see [I-D.briscoe-tsvwg-l4s-diffserv]).

845 Of course, a packet that carried both the ECT(1) codepoint and a
846 non-ECN identifier associated with the L queue would be classified
847 into the L queue.

849 For clarity, non-ECN identifiers, such as the examples itemized
850 above, might be used by some network operators who believe they
851 identify non-L4S traffic that would be safe to mix with L4S traffic.
852 They are not alternative ways for a host to indicate that it is
853 sending L4S packets. Only the ECT(1) ECN codepoint indicates to a
854 network element that a host is sending L4S packets (and CE indicates
855 that it could have originated as ECT(1)). Specifically ECT(1)
856 indicates that the host claims its behaviour satisfies the
857 prerequisite transport requirements in Section 4.

859 To include additional traffic with L4S, a network element only reads
860 identifiers such as those itemized above. It MUST NOT alter these
861 non-ECN identifiers, so that they survive for any potential use later
862 on the network path.

864 5.4.1.1.1. 'Safe' Unresponsive Traffic

866 The above section requires unresponsive traffic to be 'safe' to mix
867 with L4S traffic. Ideally this means that the sender never sends any
868 sequence of packets at a rate that exceeds the available capacity of
869 the bottleneck link. However, typically an unresponsive transport
870 does not even know the bottleneck capacity of the path, let alone its
871 available capacity. Nonetheless, an application can be considered
872 safe enough if it paces packets out (not necessarily completely
873 regularly) such that its maximum instantaneous rate from packet to
874 packet stays well below a typical broadband access rate.

876 This is a vague but useful definition, because many low latency
877 applications of interest, such as DNS, voice, game sync packets, RPC,
878 ACKs, keep-alives, could match this description.

880 5.4.1.2. Exclusion of Traffic From L4S Treatment

882 To extend the above example, an operator might want to exclude some
883 traffic from the L4S treatment for a policy reason, e.g. security
884 (traffic from malicious sources) or commercial (e.g. initially the
885 operator may wish to confine the benefits of L4S to business
886 customers).

888 In this exclusion case, the operator MUST classify on the relevant
889 locally-used identifiers (e.g. source addresses) before classifying
890 the non-matching traffic on the end-to-end L4S ECN identifier.

892 The operator MUST NOT alter the end-to-end L4S ECN identifier from
893 L4S to Classic, because its decision to exclude certain traffic from
894 L4S treatment is local-only. The end-to-end L4S identifier then
895 survives for other operators to use, or indeed, they can apply their
896 own policy, independently based on their own choice of locally-used
897 identifiers. This approach also allows any operator to remove its
898 locally-applied exclusions in future, e.g. if it wishes to widen the
899 benefit of the L4S treatment to all its customers.

901 5.4.1.3. Generalized Combination of L4S and Other Identifiers

903 L4S concerns low latency, which it can provide for all traffic
904 without differentiation and without _necessarily_ affecting bandwidth
905 allocation. Diffserv provides for differentiation of both bandwidth
906 and low latency, but its control of latency depends on its control of
907 bandwidth. The two can be combined if a network operator wants to
908 control bandwidth allocation but it also wants to provide low latency
909 - for any amount of traffic within one of these allocations of
910 bandwidth (rather than only providing low latency by limiting
911 bandwidth) [I-D.briscoe-tsvwg-l4s-diffserv].

913 The DualQ examples so far have been framed in the context of
914 providing the default Best Efforts Per-Hop Behaviour (PHB) using two
915 queues - a Low Latency (L) queue and a Classic (C) Queue. This
916 single DualQ structure is expected to be the most common and useful
917 arrangement. But, more generally, an operator might choose to
918 control bandwidth allocation through a hierarchy of Diffserv PHBs at
919 a node, and to offer one (or more) of these PHBs with a low latency
920 and a Classic variant.

922 In the first case, if we assume that a network element provides no
923 PHBs except the DualQ, if a packet carries ECT(1) or CE, the network
924 element would classify it for the L4S treatment irrespective of its
925 DSCP. And, if a packet carried (say) the EF DSCP, the network
926 element could classify it into the L queue irrespective of its ECN
927 codepoint. However, where the DualQ is in a hierarchy of other PHBs,
928 the classifier would classify some traffic into other PHBs based on
929 DSCP before classifying between the low latency and Classic queues
930 (based on ECT(1), CE and perhaps also the EF DSCP or other
931 identifiers as in the above example).
932 [I-D.briscoe-tsvwg-l4s-diffserv] gives a number of examples of such
933 arrangements to address various requirements.
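
The classification order described in Sections 5.4.1.1 to 5.4.1.3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name classify_dualq, the "L"/"C" labels, the parameter names, and the relative order of the PHB-hierarchy step and the local-exclusion step are choices made for this sketch. The text only requires that both of those steps come before classification on the ECN field. The sketch reads the identifiers and never rewrites them.

```python
from typing import AbstractSet, Mapping, Optional

# ECN field values (RFC 3168); 01 is the ECT(1) codepoint used by L4S.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
EF_DSCP = 46  # Expedited Forwarding (RFC 3246)


def classify_dualq(dscp: int, ecn: int, src_addr: str = "", *,
                   other_phbs: Optional[Mapping[int, str]] = None,
                   excluded_sources: AbstractSet[str] = frozenset(),
                   l_queue_dscps: AbstractSet[int] = frozenset({EF_DSCP}),
                   ) -> str:
    """Return a PHB name, "L" (low latency queue) or "C" (Classic queue)."""
    # 1. PHB hierarchy: some DSCPs are served by other PHBs entirely.
    if other_phbs and dscp in other_phbs:
        return other_phbs[dscp]
    # 2. Local-use exclusion is checked before the end-to-end ECN identifier.
    if src_addr in excluded_sources:
        return "C"
    # 3. The L4S identifier: ECT(1) and CE, irrespective of DSCP.
    if ecn in (ECT1, CE):
        return "L"
    # 4. Non-ECN identifiers believed safe to mix with L4S traffic. Such
    #    packets get only the scheduling treatment, without ECN marking.
    if dscp in l_queue_dscps:
        return "L"
    return "C"


# Default DualQ with no other PHBs configured.
results = [
    classify_dualq(0, ECT1),             # L4S packet
    classify_dualq(EF_DSCP, NOT_ECT),    # EF traffic mixed into the L queue
    classify_dualq(0, ECT0),             # Classic ECN packet
    classify_dualq(0, ECT1, "192.0.2.1",
                   excluded_sources={"192.0.2.1"}),  # locally excluded source
    classify_dualq(34, ECT1, other_phbs={34: "AF41"}),  # hypothetical PHB
]
```

The address 192.0.2.1 is from a documentation range and the AF41 entry is a hypothetical example of a PHB outside the DualQ.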

935 [I-D.briscoe-tsvwg-l4s-diffserv] describes how an operator might use
936 L4S to offer low latency for all L4S traffic as well as using
937 Diffserv for bandwidth differentiation. It identifies two main types
938 of approach, which can be combined: the operator might split certain
939 Diffserv PHBs between L4S and a corresponding Classic service. Or it
940 might split the L4S and/or the Classic service into multiple Diffserv
941 PHBs. In either of these cases, a packet would have to be classified
942 on its Diffserv and ECN codepoints.

944 In summary, there are numerous ways in which the L4S ECN identifier
945 (ECT(1) and CE) could be combined with other identifiers to achieve
946 particular objectives. The following categorization articulates
947 those that are valid, but it is not necessarily exhaustive. Those
948 tagged 'Recommended-standard-use' could be set by the sending host or
949 a network. Those tagged 'Local-use' would only be set by a network:

951 1. Identifiers Complementing the L4S Identifier

953    A. Including More Traffic in the L Queue
954       (Could use Recommended-standard-use or Local-use identifiers)

956    B. Excluding Certain Traffic from the L Queue
957       (Local-use only)

959 2. Identifiers to place L4S classification in a PHB Hierarchy
960    (Could use Recommended-standard-use or Local-use identifiers)

962    A. PHBs Before L4S ECN Classification

964    B. PHBs After L4S ECN Classification

966 5.4.2. Per-Flow Queuing Examples of Other Identifiers Complementing L4S
967 Identifiers

969 At a node with per-flow queuing (e.g. FQ-CoDel [RFC8290]), the L4S
970 identifier could complement the Layer-4 flow ID as a further level of
971 flow granularity (i.e. Not-ECT and ECT(0) queued separately from
972 ECT(1) and CE packets). "Risk of reordering Classic CE packets" in
973 Appendix B.1 discusses the resulting ambiguity if packets originally
974 marked ECT(0) are marked CE by an upstream AQM before they arrive at
975 a node that classifies CE as L4S. It argues that the risk of
976 reordering is vanishingly small and the consequence of such a low
977 level of reordering is minimal.

979 Alternatively, it could be assumed that it is not in a flow's own
980 interest to mix Classic and L4S identifiers. Then the AQM could use
981 the ECN field to switch itself between a Classic and an L4S AQM
982 behaviour within one per-flow queue. For instance, for ECN-capable
983 packets, the AQM might consist of a simple marking threshold and an
984 L4S ECN identifier might simply select a shallower threshold than a
985 Classic ECN identifier would.

987 5.5. Limiting Packet Bursts from Links Supporting L4S AQMs

989 As well as senders needing to limit packet bursts (Section 4.3),
990 links need to limit the degree of burstiness they introduce. In both
991 cases (senders and links) this is a tradeoff, because batch-handling
992 of packets is done for good reason, e.g. processing efficiency or to
993 make efficient use of medium acquisition delay. Some take the
994 attitude that there is no point reducing burst delay at the sender
995 below that introduced by links (or vice versa). However, delay
996 reduction proceeds by cutting down 'the longest pole in the tent',
997 which turns the spotlight on the next longest, and so on.

999 This document does not set any quantified requirements for links to
1000 limit burst delay, primarily because link technologies are outside
1001 the remit of L4S specifications. Nonetheless, it would not make
1002 sense to implement an L4S AQM that feeds into a particular link
1003 technology without also reviewing opportunities to reduce any form of
1004 burst delay introduced by that link technology. This would at least
1005 limit the bursts that the link would otherwise introduce into the
1006 onward traffic, which would cause jumpy feedback to the sender as
1007 well as potential extra queuing delay downstream. This document does
1008 not presume to even give guidance on an appropriate target for such
1009 burst delay until there is more industry experience of L4S. However,
1010 as suggested in Section 4.3 it would not seem necessary to limit
1011 bursts lower than roughly 10% of the minimum base RTT expected in the
1012 typical deployment scenario (e.g. 250 us burst duration for links
1013 within the public Internet).

1015 6. L4S Experiments

1017 This section describes open questions that L4S Experiments ought to
1018 focus on. This section also documents outstanding open issues that
1019 will need to be investigated as part of L4S experimentation, given
1020 they could not be fully resolved during the WG phase. It also lists
1021 metrics that will need to be monitored during experiments
1022 (summarizing text elsewhere in L4S documents) and finally lists some
1023 potential future directions that researchers might wish to
1024 investigate.

1026 In addition to this section, [I-D.ietf-tsvwg-aqm-dualq-coupled] sets
1027 operational and management requirements for experiments with DualQ
1028 Coupled AQMs; and general operational and management requirements for
1029 experiments with L4S congestion controls are given in Section 4 and
1030 Section 5 above, e.g. co-existence and scaling requirements,
1031 incremental deployment arrangements.

1033 The specification of each scalable congestion control will need to
1034 include protocol-specific requirements for configuration and
1035 monitoring performance during experiments. Appendix A of [RFC5706]
1036 provides a helpful checklist.

1038 6.1. Open Questions

1040 L4S experiments would be expected to answer the following questions:

1042 o Have all the parts of L4S been deployed, and if so, what
1043   proportion of paths support it?

1045 o Does use of L4S over the Internet result in significantly improved
1046   user experience?

1048 o Has L4S enabled novel interactive applications?
1050 o Did use of L4S over the Internet result in improvements to the 1051 following metrics: 1053 o 1055 * queue delay (mean and 99th percentile) under various loads 1057 * utilization 1059 * starvation / fairness 1061 * scaling range of flow rates and RTTs 1063 o How much does burstiness in the Internet affect L4S performance, 1064 and how much limitation of bustiness was needed and/or was 1065 realized - both at senders and at links, especially radio links? 1067 o Was per-flow queue protection typically (un)necessary? 1069 * How well did overload protection or queue protection work? 1071 o How well did L4S flows coexist with Classic flows when sharing a 1072 bottleneck? 1074 o 1076 * How frequently did problems arise? 1078 * What caused any coexistence problems, and were any problems due 1079 to single-queue Classic ECN AQMs (this assumes single-queue 1080 Classic ECN AQMs can be distinguished from FQ ones)? 1082 o How prevalent were problems with the L4S service due to tunnels / 1083 encapsulations that do not support ECN decapsulation? 1085 o How easy was it to implement a fully compliant L4S congestion 1086 control, over various different transport protocols (TCP. QUIC, 1087 RMCAT, etc)? 1089 Monitoring for harm to other traffic, specifically bandwidth 1090 starvation or excess queuing delay, will need to be conducted 1091 alongside all early L4S experiments. It is hard, if not impossible, 1092 for an individual flow to measure its impact on other traffic. So 1093 such monitoring will need to be conducted using bespoke monitoring 1094 across flows and/or across classes of traffic. 1096 6.2. Open Issues 1098 o What is the best way forward to deal with L4S over single-queue 1099 Classic ECN AQM bottlenecks, given current problems with 1100 misdetecting L4S AQMs as Classic ECN AQMs? 1102 o Fixing the poor Interaction between current L4S congestion 1103 controls and CoDel with only Classic ECN support during flow 1104 startup 1106 6.3. 
Future Potential 1108 Researchers might find that L4S opens up the following interesting 1109 areas for investigation: 1111 o Potential for faster convergence time and tracking of available 1112 capacity 1114 o Potential for improvements to particular link technologies, and 1115 cross-layer interactions with them. 1117 o Potential for using virtual queues, e.g. to further reduce latency 1118 jitter, or to leave headroom for capacity variation in radio 1119 networks 1121 o Development and specification of reverse path congestion control 1122 using L4S building bocks (e.g. AccECN, QUIC) 1124 o Once queuing delay is cut down, what becomes the 'second longest 1125 pole in the tent' (other than the speed of light)? 1127 o Novel alternatives to the existing set of L4S AQMs 1129 o Novel applications enabled by L4S 1131 7. IANA Considerations 1133 The 01 codepoint of the ECN Field of the IP header is specified by 1134 the present Experimental RFC. The process for an experimental RFC to 1135 assign this codepoint in the IP header (v4 and v6) is documented in 1136 Proposed Standard [RFC8311], which updates the Proposed Standard 1137 [RFC3168]. 1139 When the present document is published as an RFC, IANA is asked to 1140 update the 01 entry in the registry, "ECN Field (Bits 6-7)" to the 1141 following (see https://www.iana.org/assignments/dscp-registry/dscp- 1142 registry.xhtml#ecn-field ): 1144 +--------+-----------------------------------+--------------------+ 1145 | Binary | Keyword | References | 1146 +--------+-----------------------------------+--------------------+ 1147 | 01 | ECT(1) (ECN-Capable Transport(1)) | [RFC8311][RFCXXXX] | 1148 +--------+-----------------------------------+--------------------+ 1150 and remove footnote [1]. 1152 [XXXX is the number that the RFC Editor assigns to the present 1153 document (this sentence to be removed by the RFC Editor)]. 1155 8. 
Security Considerations 1157 Approaches to assure the integrity of signals using the new 1158 identifier are introduced in Appendix C.1. See the security 1159 considerations in the L4S architecture [I-D.ietf-tsvwg-l4s-arch] for 1160 further discussion of mis-use of the identifier, as well as extensive 1161 discussion of policing rate and latency in regard to L4S. 1163 The recommendation to detect loss in time units prevents the ACK- 1164 splitting attacks described in [Savage-TCP]. 1166 9. Acknowledgements 1168 Thanks to Richard Scheffenegger, John Leslie, David Taeht, Jonathan 1169 Morton, Gorry Fairhurst, Michael Welzl, Mikael Abrahamsson and Andrew 1170 McGregor for the discussions that led to this specification. Ing-jyh 1171 (Inton) Tsang was a contributor to the early drafts of this document. 1172 And thanks to Mikael Abrahamsson, Lloyd Wood, Nicolas Kuhn, Greg 1173 White, Tom Henderson, David Black, Gorry Fairhurst, Brian Carpenter, 1174 Jake Holland, Rod Grimes and Richard Scheffenegger for providing help 1175 and reviewing this draft and to Ingemar Johansson for reviewing and 1176 providing substantial text. Particular thanks to Wes Eddy for 1177 patiently shepherding this and the other L4S drafts through the IETF 1178 process. Appendix A listing the Prague L4S Requirements is based on 1179 text authored by Marcelo Bagnulo Braun that was originally an 1180 appendix to [I-D.ietf-tsvwg-l4s-arch]. That text was in turn based 1181 on the collective output of the attendees listed in the minutes of a 1182 'bar BoF' on DCTCP Evolution during IETF-94 [TCPPrague]. 1184 The authors' contributions were part-funded by the European Community 1185 under its Seventh Framework Programme through the Reducing Internet 1186 Transport Latency (RITE) project (ICT-317700). Bob Briscoe was also 1187 funded partly by the Research Council of Norway through the TimeIn 1188 project, partly by CableLabs and partly by the Comcast Innovation 1189 Fund. 
The views expressed here are solely those of the authors.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001.

   [RFC4774]  Floyd, S., "Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field", BCP 124, RFC 4774, DOI 10.17487/RFC4774, November 2006.

   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., and K. Carlberg, "Explicit Congestion Notification (ECN) for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 2012.

10.2.  Informative References

   [A2DTCP]  Zhang, T., Wang, J., Huang, J., Huang, Y., Chen, J., and Y. Pan, "Adaptive-Acceleration Data Center TCP", IEEE Transactions on Computers 64(6):1522-1533, June 2015.

   [Ahmed19]  Ahmed, A., "Extending TCP for Low Round Trip Delay", Masters Thesis, Uni Oslo, August 2019.

   [Alizadeh-stability]  Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis of DCTCP: Stability, Convergence, and Fairness", ACM SIGMETRICS 2011, June 2011.

   [ARED01]  Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An Algorithm for Increasing the Robustness of RED's Active Queue Management", ACIRI Technical Report, August 2001.

   [DCttH15]  De Schepper, K., Bondarenko, O., Briscoe, B., and I. Tsang, "'Data Centre to the Home': Ultra-Low Latency for All", RITE Project Technical Report, 2015.

   [ecn-fallback]  Briscoe, B. and A. Ahmed, "TCP Prague Fall-back on Detection of a Classic ECN AQM", bobbriscoe.net Technical Report TR-BB-2019-002, April 2020.

   [I-D.briscoe-docsis-q-protection]  Briscoe, B. and G. White, "Queue Protection to Preserve Low Latency", draft-briscoe-docsis-q-protection-00 (work in progress), July 2019.

   [I-D.briscoe-tsvwg-l4s-diffserv]  Briscoe, B., "Interactions between Low Latency, Low Loss, Scalable Throughput (L4S) and Differentiated Services", draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), November 2018.

   [I-D.ietf-avtcore-cc-feedback-message]  Sarker, Z., Perkins, C., Singh, V., and M. Ramalho, "RTP Control Protocol (RTCP) Feedback for Congestion Control", draft-ietf-avtcore-cc-feedback-message-08 (work in progress), September 2020.

   [I-D.ietf-quic-transport]  Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed and Secure Transport", draft-ietf-quic-transport-32 (work in progress), October 2020.

   [I-D.ietf-tcpm-accurate-ecn]  Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate-ecn-12 (work in progress), October 2020.

   [I-D.ietf-tcpm-generalized-ecn]  Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit Congestion Notification (ECN) to TCP Control Packets", draft-ietf-tcpm-generalized-ecn-06 (work in progress), October 2020.

   [I-D.ietf-tcpm-rack]  Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The RACK-TLP loss detection algorithm for TCP", draft-ietf-tcpm-rack-11 (work in progress), September 2020.

   [I-D.ietf-tsvwg-aqm-dualq-coupled]  Schepper, K., Briscoe, B., and G. White, "DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-12 (work in progress), July 2020.

   [I-D.ietf-tsvwg-ecn-encap-guidelines]  Briscoe, B., Kaippallimalil, J., and P. Thaler, "Guidelines for Adding Congestion Notification to Protocols that Encapsulate IP", draft-ietf-tsvwg-ecn-encap-guidelines-13 (work in progress), May 2019.

   [I-D.ietf-tsvwg-l4s-arch]  Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service: Architecture", draft-ietf-tsvwg-l4s-arch-07 (work in progress), October 2020.

   [I-D.ietf-tsvwg-nqb]  White, G. and T. Fossati, "A Non-Queue-Building Per-Hop Behavior (NQB PHB) for Differentiated Services", draft-ietf-tsvwg-nqb-02 (work in progress), September 2020.

   [I-D.ietf-tsvwg-rfc6040update-shim]  Briscoe, B., "Propagating Explicit Congestion Notification Across IP Tunnel Headers Separated by a Shim", draft-ietf-tsvwg-rfc6040update-shim-10 (work in progress), March 2020.

   [I-D.morton-tsvwg-sce]  Morton, J. and R. Grimes, "The Some Congestion Experienced ECN Codepoint", draft-morton-tsvwg-sce-01 (work in progress), November 2019.

   [I-D.sridharan-tcpm-ctcp]  Sridharan, M., Tan, K., Bansal, D., and D. Thaler, "Compound TCP: A New TCP Congestion Control for High-Speed and Long Distance Networks", draft-sridharan-tcpm-ctcp-02 (work in progress), November 2008.

   [I-D.stewart-tsvwg-sctpecn]  Stewart, R., Tuexen, M., and X. Dong, "ECN for Stream Control Transmission Protocol (SCTP)", draft-stewart-tsvwg-sctpecn-05 (work in progress), January 2014.

   [LinuxPacedChirping]  Misund, J. and B. Briscoe, "Paced Chirping - Rethinking TCP start-up", Proc. Linux Netdev 0x13, March 2019.

   [Mathis09]  Mathis, M., "Relentless Congestion Control", PFLDNeT'09, May 2009.

   [Paced-Chirping]  Misund, J., "Rapid Acceleration in TCP Prague", Masters Thesis, May 2018.

   [PI2]  De Schepper, K., Bondarenko, O., Tsang, I., and B. Briscoe, "PI^2 : A Linearized AQM for both Classic and Scalable TCP", Proc. ACM CoNEXT 2016 pp.105-119, December 2016.

   [PragueLinux]  Briscoe, B., De Schepper, K., Albisser, O., Misund, J., Tilmans, O., Kuehlewind, M., and A. Ahmed, "Implementing the `TCP Prague' Requirements for Low Latency Low Loss Scalable Throughput (L4S)", Proc. Linux Netdev 0x13, March 2019.

   [QV]  Briscoe, B. and P. Hurtig, "Up to Speed with Queue View", RITE Technical Report D2.3; Appendix C.2, August 2015.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, DOI 10.17487/RFC2474, December 1998.

   [RFC2983]  Black, D., "Differentiated Services and Tunnels", RFC 2983, DOI 10.17487/RFC2983, October 2000.

   [RFC3246]  Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, J., Courtney, W., Davari, S., Firoiu, V., and D. Stiliadis, "An Expedited Forwarding PHB (Per-Hop Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002.

   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit Congestion Notification (ECN) Signaling with Nonces", RFC 3540, DOI 10.17487/RFC3540, June 2003.

   [RFC3649]  Floyd, S., "HighSpeed TCP for Large Congestion Windows", RFC 3649, DOI 10.17487/RFC3649, December 2003.

   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, DOI 10.17487/RFC4340, March 2006.

   [RFC4341]  Floyd, S. and E. Kohler, "Profile for Datagram Congestion Control Protocol (DCCP) Congestion Control ID 2: TCP-like Congestion Control", RFC 4341, DOI 10.17487/RFC4341, March 2006.

   [RFC4342]  Floyd, S., Kohler, E., and J. Padhye, "Profile for Datagram Congestion Control Protocol (DCCP) Congestion Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, DOI 10.17487/RFC4342, March 2006.

   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol", RFC 4960, DOI 10.17487/RFC4960, September 2007.

   [RFC5033]  Floyd, S. and M. Allman, "Specifying New Congestion Control Algorithms", BCP 133, RFC 5033, DOI 10.17487/RFC5033, August 2007.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 5348, DOI 10.17487/RFC5348, September 2008.

   [RFC5562]  Kuzmanovic, A., Mondal, A., Floyd, S., and K. Ramakrishnan, "Adding Explicit Congestion Notification (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, DOI 10.17487/RFC5562, June 2009.

   [RFC5622]  Floyd, S. and E. Kohler, "Profile for Datagram Congestion Control Protocol (DCCP) Congestion ID 4: TCP-Friendly Rate Control for Small Packets (TFRC-SP)", RFC 5622, DOI 10.17487/RFC5622, August 2009.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [RFC5706]  Harrington, D., "Guidelines for Considering Operations and Management of New Protocols and Protocol Extensions", RFC 5706, DOI 10.17487/RFC5706, November 2009.

   [RFC5865]  Baker, F., Polk, J., and M. Dolly, "A Differentiated Services Code Point (DSCP) for Capacity-Admitted Traffic", RFC 5865, DOI 10.17487/RFC5865, May 2010.

   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP Authentication Option", RFC 5925, DOI 10.17487/RFC5925, June 2010.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion Notification", RFC 6040, DOI 10.17487/RFC6040, November 2010.

   [RFC6077]  Papadimitriou, D., Ed., Welzl, M., Scharf, M., and B. Briscoe, "Open Research Issues in Internet Congestion Control", RFC 6077, DOI 10.17487/RFC6077, February 2011.

   [RFC6660]  Briscoe, B., Moncaster, T., and M. Menth, "Encoding Three Pre-Congestion Notification (PCN) States in the IP Header Using a Single Diffserv Codepoint (DSCP)", RFC 6660, DOI 10.17487/RFC6660, July 2012.

   [RFC7560]  Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe, "Problem Statement and Requirements for Increased Accuracy in Explicit Congestion Notification (ECN) Feedback", RFC 7560, DOI 10.17487/RFC7560, August 2015.

   [RFC7567]  Baker, F., Ed. and G. Fairhurst, Ed., "IETF Recommendations Regarding Active Queue Management", BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015.

   [RFC7713]  Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) Concepts, Abstract Mechanism, and Requirements", RFC 7713, DOI 10.17487/RFC7713, December 2015.

   [RFC8033]  Pan, R., Natarajan, P., Baker, F., and G. White, "Proportional Integral Controller Enhanced (PIE): A Lightweight Control Scheme to Address the Bufferbloat Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017.

   [RFC8257]  Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., and G. Judd, "Data Center TCP (DCTCP): TCP Congestion Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, October 2017.

   [RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler and Active Queue Management Algorithm", RFC 8290, DOI 10.17487/RFC8290, January 2018.

   [RFC8298]  Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December 2017.

   [RFC8311]  Black, D., "Relaxing Restrictions on Explicit Congestion Notification (ECN) Experimentation", RFC 8311, DOI 10.17487/RFC8311, January 2018.

   [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", RFC 8312, DOI 10.17487/RFC8312, February 2018.

   [RFC8511]  Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst, "TCP Alternative Backoff with ECN (ABE)", RFC 8511, DOI 10.17487/RFC8511, December 2018.

   [Savage-TCP]  Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, "TCP Congestion Control with a Misbehaving Receiver", ACM SIGCOMM Computer Communication Review 29(5):71--78, October 1999.

   [sub-mss-prob]  Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion Window for Small Round Trip Times", BT Technical Report TR-TUB8-2015-002, May 2015.

   [TCP-CA]  Jacobson, V. and M. Karels, "Congestion Avoidance and Control", Lawrence Berkeley Labs Technical Report, November 1988.

   [TCPPrague]  Briscoe, B., "Notes: DCTCP evolution 'bar BoF': Tue 21 Jul 2015, 17:40, Prague", tcpprague mailing list archive, July 2015.

   [VCP]  Xia, Y., Subramanian, L., Stoica, I., and S. Kalyanaraman, "One more bit is enough", Proc. SIGCOMM'05, ACM CCR 35(4):37--48, 2005.

Appendix A.  The 'Prague L4S Requirements'

This appendix is informative, not normative.
It gives a list of modifications to current scalable congestion controls so that they can be deployed over the public Internet and coexist safely with existing traffic. The list complements the normative requirements in Section 4 that a sender has to comply with before it can set the L4S identifier in packets it sends into the Internet. As well as necessary safety improvements (requirements) this appendix also includes preferable performance improvements (optimizations).

These recommendations have become known as the Prague L4S Requirements, because they were originally identified at an ad hoc meeting during IETF-94 in Prague [TCPPrague]. The wording has been generalized to apply to all scalable congestion controls, not just TCP congestion control specifically. They were originally called the 'TCP Prague Requirements', but they are not solely applicable to TCP, so the name has been generalized, and TCP Prague is now used for a specific implementation of the requirements.

At the time of writing, DCTCP [RFC8257] is the most widely used scalable transport protocol. In its current form, DCTCP is specified to be deployable only in controlled environments. Deploying it in the public Internet would lead to a number of issues, both from the safety and the performance perspective. The modifications and additional mechanisms listed in this section will be necessary for its deployment over the global Internet. Where an example is needed, DCTCP is used as a base, but it is likely that most of these requirements equally apply to other scalable congestion controls.

A.1.  Requirements for Scalable Transport Protocols

A.1.1.  Use of L4S Packet Identifier

Description: A scalable congestion control needs to distinguish the packets it sends from those sent by Classic congestion controls (see the precise normative requirement wording in Section 4.1).

Motivation: It needs to be possible for a network node to classify L4S packets without flow state into a queue that applies an L4S ECN marking behaviour and isolates L4S packets from the queuing delay of Classic packets.

A.1.2.  Accurate ECN Feedback

Description: The transport protocol for a scalable congestion control needs to provide timely, accurate feedback about the extent of ECN marking experienced by all packets (see the precise normative requirement wording in Section 4.2).

Motivation: Classic congestion controls only need feedback about the existence of a congestion episode within a round trip, not precisely how many packets were marked with ECN or dropped. Therefore, in 2001, when ECN feedback was added to TCP [RFC3168], it could not inform the sender of more than one ECN mark per RTT. Since then, requirements for more accurate ECN feedback in TCP have been defined in [RFC7560], and [I-D.ietf-tcpm-accurate-ecn] specifies an experimental change to the TCP wire protocol to satisfy these requirements. Most other transport protocols already satisfy this requirement.

A.1.3.  Fall back to Reno-friendly congestion control on packet loss

Description: As well as responding to ECN markings in a scalable way, a scalable congestion control needs to react to packet loss in a way that will coexist safely with a TCP Reno congestion control [RFC5681] (see the precise normative requirement wording in Section 4.3).

Motivation: Part of the safety conditions for deploying a scalable congestion control on the public Internet is to make sure that it behaves properly when it builds a queue at a network bottleneck that has not been upgraded to support L4S. Packet loss can have many causes, but it usually has to be conservatively assumed that it is a sign of congestion. Therefore, on detecting packet loss, a scalable congestion control will need to fall back to Classic congestion control behaviour. If it does not comply with this requirement it could starve Classic traffic.

A scalable congestion control can be used for different types of transport, e.g. for real-time media or for reliable transport like TCP. Therefore, the particular Classic congestion control behaviour to fall back on will need to be part of the congestion control specification of the relevant transport. In the particular case of DCTCP, the DCTCP specification [RFC8257] states that "It is RECOMMENDED that an implementation deal with loss episodes in the same way as conventional TCP." For safe deployment of a scalable congestion control in the public Internet, the above requirement would need to be defined as a "MUST".

Even though a bottleneck is L4S capable, it might still become overloaded and have to drop packets. In this case, the sender may receive a high proportion of packets marked with the CE bit set and also experience loss. Current DCTCP implementations react differently to this situation. At least one implementation reacts only to the drop signal (e.g. by halving the CWND) and at least one other DCTCP implementation reacts to both signals (e.g. by halving the CWND due to the drop and also further reducing the CWND based on the proportion of marked packets). A third approach for the public Internet has been proposed that adjusts the loss response to result in a halving when combined with the ECN response. We believe that further experimentation is needed to understand what is the best behaviour for the public Internet, which may or may not be one of these existing approaches.

A.1.4.  Fall back to Reno-friendly congestion control on classic ECN bottlenecks

Description: A scalable congestion control needs to react to ECN marking from a non-L4S, but ECN-capable, bottleneck in a way that will coexist with a TCP Reno congestion control [RFC5681] (see the precise normative requirement wording in Section 4.3).

Motivation: Similarly to the requirement in Appendix A.1.3, this requirement is a safety condition to ensure a scalable congestion control behaves properly when it builds a queue at a network bottleneck that has not been upgraded to support L4S. On detecting Classic ECN marking (see below), a scalable congestion control will need to fall back to Classic congestion control behaviour. If it does not comply with this requirement it could starve Classic traffic.

A passive monitoring algorithm to detect a Classic ECN AQM at the bottleneck is provided in [ecn-fallback], which also provides a link to Linux source code. Very briefly, the algorithm primarily monitors RTT variation using the same algorithm that maintains the mean deviation of TCP's smoothed RTT, but it smooths over a duration of the order of a Classic sawtooth. The outcome is also conditioned on other metrics such as the presence of CE marking and congestion avoidance phase having stabilized. The report also identifies further work to improve the approach, for instance improvements with low capacity links and combining the measurements with a cache of what had been learned about a path in previous connections.
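
As a rough illustration only, the mean-deviation tracking mentioned above can be sketched in Python. This is not the algorithm of [ecn-fallback]; it is the familiar exponentially weighted estimator in the style of RFC 6298's SRTT and RTTVAR, with the gains exposed as parameters. The class name, the method name and the smaller "slow" gains in the usage lines are assumptions made for this sketch, and the sketch omits the further conditioning on CE marking and congestion avoidance state that the text describes.

```python
class RttDeviationMonitor:
    """EWMA of RTT and of its mean deviation (RFC 6298 style).

    Hypothetical helper for illustration; smaller gains smooth over a
    longer duration.
    """

    def __init__(self, gain_srtt=1 / 8, gain_dev=1 / 4):
        self.gain_srtt = gain_srtt
        self.gain_dev = gain_dev
        self.srtt = None   # smoothed RTT, seconds
        self.mdev = None   # smoothed mean deviation of RTT, seconds

    def update(self, rtt_sample):
        """Fold one RTT sample (seconds) into the estimates."""
        if self.srtt is None:
            # First sample: initialise as RFC 6298 does.
            self.srtt = rtt_sample
            self.mdev = rtt_sample / 2
        else:
            # Deviation is updated first, using the previous SRTT.
            error = abs(self.srtt - rtt_sample)
            self.mdev = (1 - self.gain_dev) * self.mdev + self.gain_dev * error
            self.srtt = (1 - self.gain_srtt) * self.srtt + self.gain_srtt * rtt_sample
        return self.mdev


# Standard gains, and an assumed slower pair that averages over more samples.
fast = RttDeviationMonitor()
slow = RttDeviationMonitor(gain_srtt=1 / 64, gain_dev=1 / 32)
for sample in (0.100, 0.200, 0.150, 0.120):
    fast.update(sample)
    slow.update(sample)
print(f"fast mdev={fast.mdev:.4f}s  slow mdev={slow.mdev:.4f}s")
```

A large, slowly varying deviation is what a deep Classic queue sawtooth would be expected to produce, whereas a shallow L4S queue would be expected to keep it small; how to turn that into a decision is left to [ecn-fallback].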

The relevant normative requirement (Section 4.3) is expressed as a 'SHOULD' to allow the possibility that the operator of the host knows that the network it serves has not deployed any single-queue Classic ECN AQM (e.g. a CDN might be testing separately for signs of Classic ECN AQMs, or they might have checked which ISPs they serve have not deployed Classic ECN AQMs).

Nonetheless, monitoring is still expressed as a 'MUST' because there is still a possibility that there is a Classic ECN AQM somewhere else on the path (to continue the CDN example, perhaps beyond the ISP in a home network). Then, if the server operators have disabled fall-back for parts of their deployment, they can reconsider their policy or at least do more focused testing if in-band monitoring frequently detects single-queue Classic ECN AQMs.

A.1.5.  Reduce RTT dependence

Description: A scalable congestion control needs to reduce or eliminate RTT bias over as wide a range of RTTs as possible, or at least over the typical range of RTTs that will interact in the intended deployment scenario (see the precise normative requirement wording in Section 4.3).

Motivation: The throughput of Classic congestion controls is known to be inversely proportional to RTT, so one would expect flows over very low RTT paths to nearly starve flows over larger RTTs. However, Classic congestion controls have never allowed a very low RTT path to exist because they induce a large queue. For instance, consider two paths with base RTT 1ms and 100ms. If a Classic congestion control induces a 100ms queue, it turns these RTTs into 101ms and 200ms leading to a throughput ratio of about 2:1. Whereas if a scalable congestion control induces only a 1ms queue, the RTTs become 2ms and 101ms, leading to a throughput ratio of about 50:1.
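
The arithmetic in this example can be checked with a few lines of Python. The sketch below assumes only what the text states, namely that throughput is inversely proportional to the total RTT (base RTT plus queuing delay); the function name and its parameters are chosen for illustration.

```python
def throughput_ratio(base_rtt_short_ms, base_rtt_long_ms, queue_delay_ms):
    """Throughput of the short-RTT flow relative to the long-RTT flow.

    Assumes throughput is inversely proportional to total RTT, so the
    ratio of throughputs is the inverse ratio of total RTTs.
    """
    total_short = base_rtt_short_ms + queue_delay_ms
    total_long = base_rtt_long_ms + queue_delay_ms
    return total_long / total_short


# Base RTTs of 1 ms and 100 ms, as in the text.
classic = throughput_ratio(1, 100, queue_delay_ms=100)   # 200 ms / 101 ms
scalable = throughput_ratio(1, 100, queue_delay_ms=1)    # 101 ms / 2 ms
print(f"Classic, 100 ms queue: about {classic:.1f}:1")
print(f"Scalable, 1 ms queue:  about {scalable:.1f}:1")
```

The large queue of a Classic congestion control masks the difference in base RTT; removing the queue exposes it.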

Therefore, with very small queues, long RTT flows will essentially starve, unless scalable congestion controls comply with this requirement.

A.1.6.  Scaling down to fractional congestion windows

Description: A scalable congestion control needs to remain responsive to congestion when typical RTTs over the public Internet are significantly smaller because they are no longer inflated by queuing delay (see the precise normative requirement wording in Section 4.3).

Motivation: As currently specified, the minimum required congestion window of TCP (and its derivatives) is set to 2 sender maximum segment sizes (SMSS) (see equation (4) in [RFC5681]). Once the congestion window reaches this minimum, all known window-based congestion control algorithms become unresponsive to congestion signals. No matter how much drop or ECN marking, the congestion window of all these algorithms no longer reduces. Instead, the sender's lack of any further congestion response forces the queue to grow, overriding any AQM and increasing queuing delay.

L4S mechanisms significantly reduce queueing delay so, over the same path, the RTT becomes lower. Then this problem becomes surprisingly common [sub-mss-prob]. This is because, for the same link capacity, smaller RTT implies a smaller window. For instance, consider a residential setting with an upstream broadband Internet access of 8 Mb/s, assuming a max segment size of 1500 B. Two upstream flows will each have the minimum window of 2 SMSS if the RTT is 6ms or less, which is quite common when accessing a nearby data centre. So, any more than two such parallel TCP flows will become unresponsive and increase queuing delay.
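
The 6ms figure follows from the bandwidth-delay product. The Python sketch below reproduces it under the assumptions stated in the text (equal sharing of an 8 Mb/s link, 1500 B segments, a 2 SMSS minimum window); the function names are illustrative only.

```python
def rtt_at_min_window(link_rate_bps, n_flows, smss_bytes, min_window_segments=2):
    """RTT (seconds) at which each flow's fair window equals the minimum window."""
    per_flow_rate_bps = link_rate_bps / n_flows
    min_window_bits = min_window_segments * smss_bytes * 8
    return min_window_bits / per_flow_rate_bps


def fair_window_segments(link_rate_bps, n_flows, smss_bytes, rtt_s):
    """Window (in segments) each flow needs to fill its equal share of the link."""
    per_flow_rate_bps = link_rate_bps / n_flows
    return per_flow_rate_bps * rtt_s / (smss_bytes * 8)


# Two flows sharing 8 Mb/s with 1500 B segments: the 2 SMSS floor is hit at 6 ms.
rtt_two_flows = rtt_at_min_window(8_000_000, 2, 1500)
print(f"RTT at minimum window, two flows: {rtt_two_flows * 1000:.1f} ms")

# With a third flow at the same 6 ms RTT, the fair window falls below 2 SMSS.
window_three_flows = fair_window_segments(8_000_000, 3, 1500, rtt_s=0.006)
print(f"Fair window, three flows: {window_three_flows:.2f} SMSS")
```

Because a conventional sender cannot go below 2 SMSS, the difference between 2 SMSS and the fair window per flow has to sit in the queue.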

Unless scalable congestion controls address this requirement from the start, they will frequently become unresponsive, negating the low latency benefit of L4S, for themselves and for others.

That would seem to imply that scalable congestion controllers ought to be required to be able to work with a congestion window less than 2 SMSS. For instance, one possible mechanism that can maintain a congestion window significantly less than 1 SMSS is described in [Ahmed19], and other approaches are likely to be feasible.

However, the requirement in Section 4.3 is worded as a "SHOULD" because the existence of a minimum window is not all bad. When competing with an unresponsive flow, a minimum window naturally protects the flow from starvation by at least keeping some data flowing.

By stating this requirement as a "SHOULD", specifications of scalable congestion controllers will be able to choose an appropriate minimum window, but they will at least have to justify the decision.

A.1.7.  Measuring Reordering Tolerance in Time Units

Description: A scalable congestion control needs to detect loss by counting in time-based units, which is scalable, rather than counting in units of packets, which is not (see the precise normative requirement wording in Section 4.3).

Motivation: A primary purpose of L4S is scalable throughput (it's in the name). Scalability in all dimensions is, of course, also a goal of all IETF technology. The inverse linear congestion response in Section 4.3 is necessary, but not sufficient, to solve the congestion control scalability problem identified in [RFC3649]. As well as maintaining frequent ECN signals as rate scales, it is also important to ensure that a potentially false perception of loss does not limit throughput scaling.

End-systems cannot know whether a missing packet is due to loss or reordering, except in hindsight - if it appears later. So they can only deem that there has been a loss if a gap in the sequence space has not been filled, either after a certain number of subsequent packets has arrived (e.g. the 3 DupACK rule of standard TCP congestion control [RFC5681]) or after a certain amount of time (e.g. the RACK approach [I-D.ietf-tcpm-rack]).

As we attempt to scale packet rate over the years:

   o  Even if only _some_ sending hosts still deem that loss has occurred by counting reordered packets, _all_ networks will have to keep reducing the time over which they keep packets in order. If some link technologies keep the time within which reordering occurs roughly unchanged, then loss over these links, as perceived by these hosts, will appear to continually rise over the years.

   o  In contrast, if all senders detect loss in units of time, the time over which the network has to keep packets in order stays roughly invariant.

Therefore hosts have an incentive to detect loss in time units (so as not to fool themselves too often into detecting losses when there are none). And for hosts that are changing their congestion control implementation to L4S, there is no downside to including time-based loss detection code in the change (loss recovery implemented in hardware is an exception, covered later). Therefore requiring L4S hosts to detect loss in time-based units would not be a burden.

If this requirement is not placed on L4S hosts, even though it would be no burden on them to do so, all networks will face unnecessary uncertainty over whether some L4S hosts might be detecting loss by counting packets. Then _all_ link technologies will have to unnecessarily keep reducing the time within which reordering occurs.
That is not a problem for some link technologies, but it becomes increasingly challenging for other link technologies to continue to scale, particularly those relying on channel bonding for scaling, such as LTE, 5G and DOCSIS.

Given Internet paths traverse many link technologies, any scaling limit for these more challenging access link technologies would become a scaling limit for the Internet as a whole.

It might be asked how it helps to place this loss detection requirement only on L4S hosts, because networks will still face uncertainty over whether non-L4S flows are detecting loss by counting DupACKs. The answer is that those link technologies for which it is challenging to keep squeezing the reordering time will only need to do so for non-L4S traffic (which they can do because the L4S identifier is visible at the IP layer). Therefore, they can focus their processing and memory resources into scaling non-L4S (Classic) traffic. Then, the higher the proportion of L4S traffic, the less of a scaling challenge they will have.

To summarize, there is no reason for L4S hosts not to be part of the solution instead of part of the problem.

Requirement ("MUST") or recommendation ("SHOULD")? As explained above, this is a subtle interoperability issue between hosts and networks, which seems to need a "MUST". Unless networks can be certain that all L4S hosts follow the time-based approach, they still have to cater for the worst case - continually squeeze reordering into a smaller and smaller duration - just for hosts that might be using the counting approach. However, it was decided to express this as a recommendation, using "SHOULD". The main justification was that networks can still be fairly certain that L4S hosts will follow this recommendation, because following it offers only gain and no pain.
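
The scaling argument made earlier in this section can be illustrated numerically. The Python sketch below compares the time within which a network has to keep packets in order under a packet-counting rule with that under a time-based rule. The 1500 B packet size, the 20 ms RTT and the one-quarter-of-RTT reordering window are assumptions chosen for this illustration, not values taken from Section 4.3.

```python
def counting_reordering_budget_s(rate_bps, packet_bytes=1500, dupack_threshold=3):
    """Time for `dupack_threshold` back-to-back packets to arrive at `rate_bps`.

    A sender that counts packets declares loss once this many later packets
    have arrived, so reordering has to be resolved within this time.
    """
    return dupack_threshold * packet_bytes * 8 / rate_bps


def time_reordering_budget_s(rtt_s, fraction=0.25):
    """Reordering window for a sender that waits a fraction of the RTT."""
    return rtt_s * fraction


rtt_s = 0.020  # assumed round-trip time for the illustration
for rate_bps in (100e6, 1e9, 10e9, 100e9):
    by_count = counting_reordering_budget_s(rate_bps)
    by_time = time_reordering_budget_s(rtt_s)
    print(f"{rate_bps / 1e9:7.1f} Gb/s: counting {by_count * 1e6:9.3f} us, "
          f"time-based {by_time * 1e6:9.1f} us")
```

The counting budget shrinks in inverse proportion to the flow rate, whereas the time-based budget does not depend on the rate at all.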

Details:

The speed of loss recovery is much more significant for short flows than long, therefore a good compromise is to adapt the reordering window, from a small fraction of the RTT at the start of a flow, to a larger fraction of the RTT for flows that continue for many round trips.

This is broadly the approach adopted by TCP RACK (Recent ACKnowledgements) [I-D.ietf-tcpm-rack]. However, RACK starts with the 3 DupACK approach, because the RTT estimate is not necessarily stable. As long as the initial window is paced, such initial use of 3 DupACK counting would amount to time-based loss detection and therefore would satisfy the time-based loss detection recommendation of Section 4.3. This is because pacing of the initial window would ensure that 3 DupACKs early in the connection would be spread over a small fraction of the round trip.

As mentioned above, hardware implementations of loss recovery using DupACK counting exist (e.g. some implementations of RoCEv2 for RDMA). For low latency, these implementations can change their congestion control to implement L4S, because the congestion control (as distinct from loss recovery) is implemented in software. But they cannot easily satisfy this loss recovery requirement. However, it is believed they do not need to. It is believed that such implementations solely exist in controlled environments, where the network technology keeps reordering extremely low anyway. This is why the scope of the normative recommendation in Section 4.3 is limited to 'reordering-prone' networks.

Detecting loss in time units also prevents the ACK-splitting attacks described in [Savage-TCP].

A.2.  Scalable Transport Protocol Optimizations

A.2.1.  Setting ECT in TCP Control Packets and Retransmissions

Description: This item only concerns TCP and its derivatives (e.g. SCTP), because the original specification of ECN for TCP precluded the use of ECN on control packets and retransmissions. To improve performance, scalable transport protocols ought to enable ECN at the IP layer in TCP control packets (SYN, SYN-ACK, pure ACKs, etc.) and in retransmitted packets. The same is true for derivatives of TCP, e.g. SCTP.

Motivation: RFC 3168 prohibits the use of ECN on these types of TCP packet, based on a number of arguments. This means these packets are not protected from congestion loss by ECN, which considerably harms performance, particularly for short flows. [I-D.ietf-tcpm-generalized-ecn] counters each argument in RFC 3168 in turn, showing it was over-cautious. Instead it proposes experimental use of ECN on all types of TCP packet as long as AccECN feedback [I-D.ietf-tcpm-accurate-ecn] is available (which is itself a prerequisite for using a scalable congestion control).

A.2.2.  Faster than Additive Increase

Description: It would improve performance if scalable congestion controls did not limit their congestion window increase to the standard additive increase of 1 SMSS per round trip [RFC5681] during congestion avoidance. The same is true for derivatives of TCP congestion control, including similar approaches used for real-time media.

Motivation: As currently defined [RFC8257], DCTCP uses the traditional TCP Reno additive increase in congestion avoidance phase. When the available capacity suddenly increases (e.g. when another flow finishes, or if radio capacity increases) it can take very many round trips to take advantage of the new capacity. TCP Cubic was designed to solve this problem, but as flow rates have continued to increase, the delay accelerating into available capacity has become prohibitive. See, for instance, the examples in Section 1.2.
Even 1928 when out of its Reno-compatibility mode, every 8x scaling of Cubic's 1929 flow rate leads to 2x more acceleration delay. 1931 In the steady state, DCTCP induces about 2 ECN marks per round trip, 1932 so it is possible to quickly detect when these signals have 1933 disappeared and seek available capacity more rapidly, while 1934 minimizing the impact on other flows (Classic and scalable) 1935 [LinuxPacedChirping]. Alternatively, approaches such as Adaptive 1936 Acceleration (A2DTCP [A2DTCP]) have been proposed to address this 1937 problem in data centres, which might be deployable over the public 1938 Internet. 1940 A.2.3. Faster Convergence at Flow Start 1942 Description: Particularly when a flow starts, scalable congestion 1943 controls need to converge (reach their steady-state share of the 1944 capacity) at least as fast as Classic congestion controls and 1945 preferably faster. This affects the flow start behaviour of any L4S 1946 congestion control derived from a Classic transport that uses TCP 1947 slow start, including those for real-time media. 1949 Motivation: As an example, a new DCTCP flow takes longer than a 1950 Classic congestion control to obtain its share of the capacity of the 1951 bottleneck when there are already ongoing flows using the bottleneck 1952 capacity. In a data centre environment DCTCP takes about a factor of 1953 1.5 to 2 longer to converge due to the much higher typical level of 1954 ECN marking that DCTCP background traffic induces, which causes new 1955 flows to exit slow start early [Alizadeh-stability]. In testing for 1956 use over the public Internet, the convergence time of DCTCP relative 1957 to a regular loss-based TCP slow start is even less favourable 1958 [Paced-Chirping] due to the shallow ECN marking threshold needed for 1959 L4S. It is exacerbated by the typically greater mismatch between the 1960 link rate of the sending host and typical Internet access 1961 bottlenecks.
This problem is detrimental in general, but would 1962 particularly harm the performance of short flows relative to Classic 1963 congestion controls. 1965 Appendix B. Alternative Identifiers 1967 This appendix is informative, not normative. It records the pros and 1968 cons of various alternative ways to identify L4S packets to record 1969 the rationale for the choice of ECT(1) (Appendix B.1) as the L4S 1970 identifier. At the end, Appendix B.8 summarises the distinguishing 1971 features of the leading alternatives. It is intended to supplement, 1972 not replace the detailed text. 1974 The leading solutions all use the ECN field, sometimes in combination 1975 with the Diffserv field. This is because L4S traffic has to indicate 1976 that it is ECN-capable anyway, because ECN is intrinsic to how L4S 1977 works. Both the ECN and Diffserv fields have the additional 1978 advantage that they are no different in either IPv4 or IPv6. A 1979 couple of alternatives that use other fields are mentioned at the 1980 end, but it is quickly explained why they are not serious contenders. 1982 B.1. ECT(1) and CE codepoints 1984 Definition: 1986 Packets with ECT(1) and conditionally packets with CE would 1987 signify L4S semantics as an alternative to the semantics of 1988 Classic ECN [RFC3168], specifically: 1990 * The ECT(1) codepoint would signify that the packet was sent by 1991 an L4S-capable sender. 1993 * Given shortage of codepoints, both L4S and Classic ECN sides of 1994 an AQM would have to use the same CE codepoint to indicate that 1995 a packet had experienced congestion. If a packet that had 1996 already been marked CE in an upstream buffer arrived at a 1997 subsequent AQM, this AQM would then have to guess whether to 1998 classify CE packets as L4S or Classic ECN. Choosing the L4S 1999 treatment would be a safer choice, because then a few Classic 2000 packets might arrive early, rather than a few L4S packets 2001 arriving late. 
2003 * Additional information might be available if the classifier 2004 were transport-aware. Then it could classify a CE packet for 2005 Classic ECN treatment if the most recent ECT packet in the same 2006 flow had been marked ECT(0). However, the L4S service ought 2007 not to need transport-layer awareness. 2009 Cons: 2011 Consumes the last ECN codepoint: The L4S service could potentially 2012 supersede the service provided by Classic ECN, therefore using 2013 ECT(1) to identify L4S packets could ultimately mean that the 2014 ECT(0) codepoint was 'wasted' purely to distinguish one form of 2015 ECN from its successor. 2017 ECN hard in some lower layers: It is not always possible to support 2018 ECN in an AQM acting in a buffer below the IP layer 2019 [I-D.ietf-tsvwg-ecn-encap-guidelines]. In such cases, the L4S 2020 service would have to drop rather than mark frames even though 2021 they might encapsulate an ECN-capable packet. 2023 Risk of reordering Classic CE packets: Classifying all CE packets 2024 into the L4S queue risks any CE packets that were originally 2025 ECT(0) being incorrectly classified as L4S. If there were delay 2026 in the Classic queue, these incorrectly classified CE packets 2027 would arrive early, which is a form of reordering. Reordering can 2028 cause TCP senders (and senders of similar transports) to 2029 retransmit spuriously. However, the risk of spurious 2030 retransmissions would be extremely low for the following reasons: 2032 1. It is quite unusual to experience queuing at more than one 2033 bottleneck on the same path (the available capacities have to 2034 be identical). 2036 2.
In only a subset of these unusual cases would the first 2037 bottleneck support Classic ECN marking while the second 2038 supported L4S ECN marking, which would be the only scenario 2039 where some ECT(0) packets could be CE marked by an AQM 2040 supporting Classic ECN then the remainder experienced further 2041 delay through the Classic side of a subsequent L4S DualQ AQM. 2043 3. Even then, when a few packets are delivered early, it takes 2044 very unusual conditions to cause a spurious retransmission, in 2045 contrast to when some packets are delivered late. The first 2046 bottleneck has to apply CE-marks to at least N contiguous 2047 packets and the second bottleneck has to inject an 2048 uninterrupted sequence of at least N of these packets between 2049 two packets earlier in the stream (where N is the reordering 2050 window that the transport protocol allows before it considers 2051 a packet is lost). 2053 For example consider N=3, and consider the sequence of 2054 packets 100, 101, 102, 103,... and imagine that packets 2055 150,151,152 from later in the flow are injected as follows: 2056 100, 150, 151, 101, 152, 102, 103... If this were late 2057 reordering, even one packet arriving out of sequence would 2058 trigger a spurious retransmission, but there is no spurious 2059 retransmission here with early reordering, because packet 2060 101 moves the cumulative ACK counter forward before 3 2061 packets have arrived out of order. Later, when packets 2062 148, 149, 153... arrive, even though there is a 3-packet 2063 hole, there will be no problem, because the packets to fill 2064 the hole are already in the receive buffer. 2066 4. Even with the current TCP recommendation of N=3 [RFC5681] 2067 spurious retransmissions will be unlikely for all the above 2068 reasons. 
As RACK [I-D.ietf-tcpm-rack] is becoming widely 2069 deployed, it tends to adapt its reordering window to a larger 2070 value of N, which will make the chance of a contiguous 2071 sequence of N early arrivals vanishingly small. 2073 5. Even a run of 2 CE marks within a Classic ECN flow is 2074 unlikely, given FQ-CoDel is the only known widely deployed AQM 2075 that supports Classic ECN marking and it takes great care to 2076 separate out flows and to space any markings evenly along each 2077 flow. 2079 It is extremely unlikely that the above set of 5 eventualities 2080 that are each unusual in themselves would all happen 2081 simultaneously. But, even if they did, the consequences would 2082 hardly be dire: the odd spurious fast retransmission. Whenever 2083 the traffic source (a Classic congestion control) mistakes the 2084 reordering of a string of CE marks for a loss, one might think 2085 that it will reduce its congestion window as well as emitting a 2086 spurious retransmission. However, it would have already reduced 2087 its congestion window when the CE markings arrived early. If it 2088 is using ABE [RFC8511], it might reduce cwnd a little more for a 2089 loss than for a CE mark. But it will revert that reduction once 2090 it detects that the retransmission was spurious. 2092 In conclusion, the impact of early reordering due to CE being 2093 ambiguous will generally be vanishingly small. 2095 Hard to distinguish Classic ECN AQM: With this scheme, when a source 2096 receives ECN feedback, it is not explicitly clear which type of 2097 AQM generated the CE markings. This is not a problem for Classic 2098 ECN sources that send ECT(0) packets, because an L4S AQM will 2099 recognize the ECT(0) packets as Classic and apply the appropriate 2100 Classic ECN marking behaviour. 
2102 However, in the absence of explicit disambiguation of the CE 2103 markings, an L4S source needs to use heuristic techniques to work 2104 out which type of congestion response to apply (see 2105 Appendix A.1.4). Otherwise, if long-running Classic flow(s) are 2106 sharing a Classic ECN AQM bottleneck with long-running L4S 2107 flow(s), which then apply an L4S response to Classic CE signals, 2108 the L4S flows would outcompete the Classic flow(s). Experiments 2109 have shown that L4S flows can take about 20 times more capacity 2110 share than equivalent Classic flows. Nonetheless, as link 2111 capacity reduces (e.g. to 4 Mb/s), the inequality reduces. So 2112 Classic flows always make progress and are not starved. 2114 When L4S was first proposed (in 2015, 14 years after [RFC3168] was 2115 published), it was believed that Classic ECN AQMs had failed to be 2116 deployed, because research measurements had found little or no 2117 evidence of CE marking. In subsequent years Classic ECN was 2118 included in FQ-CoDel deployments; however, an FQ scheduler stops an 2119 L4S flow outcompeting Classic, because it enforces equality 2120 between flow rates. It is not known whether there have been any 2121 non-FQ deployments of Classic ECN AQMs in the subsequent years, or 2122 whether there will be in future. 2124 An algorithm for detecting a Classic ECN AQM as soon as a flow 2125 stabilizes after start-up has been proposed [ecn-fallback] (see 2126 Appendix A.1.4 for a brief summary). Testbed evaluations of v2 of 2127 the algorithm have shown detection is reasonably good for Classic 2128 ECN AQMs, in a wide range of circumstances. However, although it 2129 can correctly detect an L4S ECN AQM in many circumstances, it is 2130 often incorrect at low link capacities and/or high RTTs. Although 2131 this is the safe way round, there is a danger that it will 2132 discourage use of the algorithm.
2134 Non-L4S service for control packets: The Classic ECN RFCs [RFC3168] 2135 and [RFC5562] require a sender to clear the ECN field to Not-ECT 2136 on retransmissions and on certain control packets, specifically 2137 pure ACKs, window probes and SYNs. When L4S packets are 2138 classified by the ECN field, these control packets would not be 2139 classified into an L4S queue, and could therefore be delayed 2140 relative to the other packets in the flow. This would not cause 2141 reordering (because retransmissions are already out of order, and 2142 these control packets typically carry no data). However, it would 2143 make critical control packets more vulnerable to loss and delay. 2144 To address this problem, [I-D.ietf-tcpm-generalized-ecn] proposes 2145 an experiment in which all TCP control packets and retransmissions 2146 are ECN-capable as long as appropriate ECN feedback is available 2147 in each case. 2149 Pros: 2151 Should work e2e: The ECN field generally works end-to-end across the 2152 Internet. Unlike the DSCP, the setting of the ECN field is at 2153 least forwarded unchanged by networks that do not support ECN, and 2154 networks rarely clear it to zero. 2156 Should work in tunnels: Unlike Diffserv, ECN is defined to always 2157 work across tunnels. This scheme works within a tunnel that 2158 propagates the ECN field in any of the variant ways it has been 2159 defined, from the year 2001 [RFC3168] onwards. However, it is 2160 likely that some tunnels still do not implement ECN propagation at 2161 all. 2163 Could migrate to one codepoint: If all Classic ECN senders 2164 eventually evolve to use the L4S service, the ECT(0) codepoint 2165 could be reused for some future purpose, but only once use of 2166 ECT(0) packets had reduced to zero, or near-zero, which might 2167 never happen. 2169 L4 not required: Being based on the ECN field, this scheme does not 2170 need the network to access transport layer flow identifiers.
2171 Nonetheless, it does not preclude solutions that do. 2173 B.2. ECN-DualQ-SCE1 2175 Definition: 2177 In this proposal, an L4S AQM would indicate congestion with ECT(1) 2178 in contrast to a Classic AQM, which indicates congestion with CE. 2179 More specifically: 2181 * Given shortage of codepoints, with this proposal L4S ECN hosts 2182 send packets as ECT(0), as Classic ECN hosts do by default 2183 [RFC8311]. 2185 * If the ECT(1) codepoint were used to indicate congestion in 2186 this way, it would signify a shallow queue AQM to the end-to- 2187 end transport. So those who proposed this approach called it 2188 'Some Congestion Experienced' (SCE) because of its similarity 2189 to [I-D.morton-tsvwg-sce]. It has also been described as 2190 'ECT(1) on output', in contrast to the 'ECT(1) on input' 2191 approach outlined in Appendix B.1. 2193 * The approach works best if the network is transport-aware and 2194 isolates each application flow in its own queue (per-flow 2195 queuing, or FQ). Two AQMs are implemented in each queue, one 2196 with a shallow target that marks selected ECT packets as 2197 ECT(1), the other with a deeper target that marks selected ECT 2198 packets as CE, or drops selected non-ECT packets. 2200 * A Classic congestion control would not have the logic to 2201 recognize ECT(1) as a congestion signal. So it would 2202 (correctly) drive the queue to the deeper threshold, responding 2203 only to CE markings. An L4S congestion control that 2204 understands this scheme would respond to ECT(1) markings, which 2205 ought to therefore keep the queue close to the shallower 2206 threshold. 2208 * A dual queue approach has been informally proposed, with an L4S 2209 and a Classic queue and coupling similar to 2210 [I-D.ietf-tsvwg-aqm-dualq-coupled]. In an interim 2211 classification, all ECT packets would be classified into the 2212 low latency queue, and non-ECT packets into the Classic queue.
2213 But then, in front of the low latency queue, a stateful flow 2214 characterization function would maintain a queue occupancy 2215 metric. It would then redirect any high occupancy flows into 2216 the Classic queue. 2218 Cons: 2220 Network requires transport-layer awareness: There is no variant of 2221 this approach that works without network visibility of transport 2222 layer flow identifiers (the 5-tuple). Obviously the FQ variant 2223 needs to see 5-tuples, but so does the DualQ SCE1 variant (to 2224 redirect flows based on sparseness). So there is no arrangement 2225 of this approach that operators could choose if they could not 2226 access the transport layer, or did not want to (e.g. to support 2227 full end-to-end encryption above the IP layer). 2229 Incomplete isolation: When evaluated, the DualQ variant of ECN- 2230 DualQ-SCE1 introduced impairments to both L4S and Classic flows. 2231 The evaluation used the DOCSIS queue protection function 2232 [I-D.briscoe-docsis-q-protection] to maintain the per-flow 2233 sparseness metrics and redirect packets from non-sparse flows into 2234 the Classic queue. Unfortunately, it is impossible to determine 2235 non-sparseness until sufficient packets of each flow have been 2236 analyzed. Up to this point, all packets default to the L4S queue. 2237 Then: 2239 * Long-running Classic flows experience reordering during the 2240 transition to classifying them as Classic. Worse, the 2241 reordering occurs early in the flow when it is less robust to 2242 confusing RTT measurements; 2244 * Considerable numbers of Classic packets add to the L4S queue - 2245 from all the short flows and the start of long flows before the 2246 classifier can be certain enough to redirect them to the other 2247 (Classic) queue. So true L4S flows unavoidably experience a 2248 degree of extra delay. 
2250 Consumes the last ECN codepoint: The L4S service could potentially 2251 supersede the service provided by Classic ECN, therefore using 2252 ECT(1) to indicate L4S congestion could ultimately mean that the 2253 CE codepoint was 'wasted' purely to distinguish one form of 2254 congestion from its successor. 2256 Only recently updated tunnels: If this scheme is applied to an outer 2257 header within a tunnel or lower layer encapsulation, the ECT(1) 2258 codepoint will be black-holed at decapsulation, unless the 2259 decapsulator complies with changes to IP-in-IP tunnels introduced 2260 in 2010 [RFC6040], or changes to other tunnels that are 2261 (currently) work in progress [I-D.ietf-tsvwg-rfc6040update-shim], 2262 [I-D.ietf-tsvwg-ecn-encap-guidelines]. 2264 Limited TCP support for feedback: This approach requires transport 2265 layer feedback of two congestion signals ECT(1) and CE. Recently 2266 developed protocols such as QUIC provide this by default. 2267 However, there is limited space in the main TCP header to feed 2268 back both signals reliably and accurately [RFC7560]. AccECN 2269 [I-D.ietf-tcpm-accurate-ecn] devotes the limited space in the main 2270 TCP header to CE feedback, and optionally feeds back ECT(1) in a 2271 new TCP option, which will have limited initial deployment 2272 support. 2274 Alters non-participating packets: An AQM following this approach 2275 alters some selected ECT(0) packets to ECT(1) irrespective of 2276 whether they are participating in the L4S experiment. Although 2277 ECT(0) and ECT(1) have historically been defined as equivalent, in 2278 practice ECT(1) packets have been extremely rare on the Internet. 2279 Therefore, in practice, there might be a risk that firewalls and 2280 other devices will block ECT(1) packets, or at least treat them 2281 with greater suspicion. 
2283 ECN hard in some lower layers: Similarly to the 'Con' point in 2284 Appendix B.1, it is not always possible to support ECN in an AQM 2285 acting in a buffer below the IP layer 2286 [I-D.ietf-tsvwg-ecn-encap-guidelines]. However, adding support to 2287 lower layers would be even harder with this scheme, because it 2288 needs space for two severity levels of congestion, not one. 2289 Without lower layer ECN support, the L4S service would have to 2290 drop rather than mark frames even though they might encapsulate an 2291 ECN-capable packet. 2293 Non-L4S service for control packets: Identical to 'Con' point in 2294 Appendix B.1. 2296 Pros: 2298 Distinct indication of Classic ECN AQM: An AQM following the ECN- 2299 DualQ-SCE1 approach outputs distinctive signals (ECT(1)) compared 2300 to those output by a Classic ECN AQM. So an L4S congestion 2301 control using the SCE1 approach would inherently respond 2302 appropriately to a Classic AQM. 2304 Should work e2e: Identical to 'Pro' point in Appendix B.1. 2306 B.3. ECN-DualQ-SCE0 2308 Definition: 2310 This proposal is the inverse of the ECN-DualQ-SCE1 scheme (see 2311 Appendix B.2 above). L4S AQMs signal congestion with the 2312 transition ECT(1) -> ECT(0). More specifically: 2314 * L4S senders would send their packets as ECT(1), while Classic 2315 ECN senders would continue to send ECT(0) by default [RFC8311]. 2317 * FQ AQMs would work in a similar way to that described for ECN- 2318 DualQ-SCE1 in Appendix B.2 above. Except the shallow queue AQM 2319 would mark selected ECT packets with ECT(0), rather than 2320 ECT(1). 2322 It would seem possible to classify packets by both 5-tuple and 2323 ECT codepoint, so that each per-flow queue could instantiate 2324 just the one AQM appropriate to the ECT codepoint using it. In 2325 this case, CE and Not-ECT packets would be classified into the 2326 same queue as ECT(0). However, this would open up the risk of 2327 reordering explained below, so it is not considered further.
2329 * A Classic congestion control would only receive CE feedback, 2330 and it would have no logic to recognize ECT(0) as congestion 2331 markings, because it would send all its packets as ECT(0) 2332 anyway. So it would (correctly) drive the queue to the deeper 2333 threshold, responding only to CE markings. An L4S congestion 2334 control would understand ECT(0) markings as L4S congestion 2335 signals and therefore ought to keep the queue close to the 2336 shallower threshold. 2338 * Under the SCE0 scheme, a dual queue coupled AQM 2339 [I-D.ietf-tsvwg-aqm-dualq-coupled] would use ECT(1) as the L4S 2340 classifier in a very similar way to the 'ECT(1) and CE' scheme 2341 it was originally designed for. The one difference would be to 2342 classify CE packets into the Classic queue along with ECT(0) 2343 and Not-ECT. 2345 Cons: 2347 Consumes the last ECN codepoint: The L4S service could potentially 2348 supersede the service provided by Classic ECN, therefore using 2349 ECT(0) to indicate L4S congestion could ultimately mean that the 2350 CE codepoint was 'wasted' purely to distinguish one form of 2351 congestion from its successor. 2353 Incompatible with all ECN tunnels: The transition ECT(1) -> ECT(0) 2354 has never previously been recognized as valid. So, any ECT(0) 2355 marking applied to an ECT(1) outer header within a tunnel or lower 2356 layer encapsulation will be black-holed at decapsulation by any 2357 decapsulator whatever variant of ECN tunnel RFC it complies with. 2359 Limited TCP support for feedback: Identical to 'Con' point in 2360 Appendix B.2 above except space would be needed for CE and ECT(0) 2361 rather than CE and ECT(1) feedback. 2363 Risk of reordering Classic CE packets: If an L4S flow traverses a 2364 path with two or more bottleneck AQMs that both support L4S, 2365 reordering is likely to occur. 
This is because the first 2366 bottleneck will re-mark some ECT(1) packets to ECT(0), which will 2367 then be classified into the Classic queue of the second AQM, even 2368 though they originated as L4S packets. 2370 In contrast to the 'ECT(1) and CE' scheme in Appendix B.1, the 2371 risk of impairment in the ECN-DualQ-SCE0 case is not vanishingly 2372 small: 2374 1. Certainly, queuing at more than one bottleneck on the same 2375 path would still be quite unusual. 2377 2. However, the ECN-DualQ-SCE0 case occurs if both bottlenecks 2378 support L4S ECN and the traffic is L4S. This contrasts with 2379 the "ECT(1) and CE" case, which solely occurs if the AQMs are 2380 in a certain order (Classic followed by L4S). 2382 3. When misclassification occurs, it is from L4S to Classic. So 2383 selected packets are delivered late, which in itself adds 2384 delay, and also increases the risk that each late delivery 2385 will be deemed a loss and cause a high level of spurious 2386 retransmissions. This contrasts with the "ECT(1) and CE" case 2387 where selected packets are delivered early, which is very 2388 unlikely to have any effect (as already explained in 2389 Appendix B.1). 2391 ECN hard in some lower layers: Identical to 'Con' point in 2392 Appendix B.2. 2394 Non-L4S service for control packets: Identical to 'Con' point in 2395 Appendix B.1. 2397 Pros: 2399 Distinct indication of Classic ECN AQM: An AQM following the ECN- 2400 DualQ-SCE0 approach outputs distinctive signals (ECT(0)) compared 2401 to those output by a Classic ECN AQM (CE). So an L4S congestion 2402 control can inherently respond appropriately to a Classic AQM. 2404 Should work e2e: Identical to 'Pro' point in Appendix B.1. 2406 B.4. 
ECN Plus a Diffserv Codepoint (DSCP) 2408 Definition: 2410 For packets with a defined DSCP, all codepoints of the ECN field 2411 (except Not-ECT) would signify alternative L4S semantics to those 2412 for Classic ECN [RFC3168], specifically: 2414 * The L4S DSCP would signify that the packet came from an L4S- 2415 capable sender. 2417 * ECT(0) and ECT(1) would both signify that the packet was 2418 travelling between transport endpoints that were both ECN- 2419 capable. 2421 * CE would signify that the packet had been marked by an AQM 2422 implementing the L4S service. 2424 Use of a DSCP is the only approach for alternative ECN semantics 2425 given as an example in [RFC4774]. However, it was perhaps considered 2426 more for controlled environments than new end-to-end services. 2428 Cons: 2430 Consumes DSCP pairs: A DSCP is by definition not orthogonal to 2431 Diffserv. Therefore, wherever the L4S service is applied to 2432 multiple Diffserv scheduling behaviours, it would be necessary to 2433 replace each DSCP with a pair of DSCPs. 2435 Uses critical lower-layer header space: The resulting increased 2436 number of DSCPs might be hard to support for some lower layer 2437 technologies, e.g. 802.1Q and MPLS both offer only 3-bits for a 2438 maximum of 8 traffic class identifiers. Although L4S should 2439 reduce and possibly remove the need for some DSCPs intended for 2440 differentiated queuing delay, it will not remove the need for 2441 Diffserv entirely, because Diffserv is also used to allocate 2442 bandwidth, e.g. by prioritising some classes of traffic over 2443 others when traffic exceeds available capacity. 2445 Not end-to-end (host-network): Very few networks honour a DSCP set 2446 by a host. Typically a network will zero (bleach) the Diffserv 2447 field from all hosts. DSCP bleaching would turn an L4S ECN packet 2448 into a Classic ECN packet. 2450 Not end-to-end (network-network): Very few networks honour a DSCP 2451 received from a neighbouring network. 
Typically a network will 2452 zero (bleach) the Diffserv field from all neighbouring networks at 2453 an interconnection point. Sometimes bilateral arrangements are 2454 made between networks, such that the receiving network remarks 2455 some DSCPs to those it uses for roughly equivalent services. The 2456 likelihood that a DSCP will be bleached or ignored depends on the 2457 type of DSCP: 2459 Local-use DSCP: These tend to be used to implement application- 2460 specific network policies, but a bilateral arrangement to 2461 remark certain DSCPs is often applied to DSCPs in the local-use 2462 range simply because it is easier not to change all of a 2463 network's internal configurations when a new arrangement is 2464 made with a neighbour. 2466 Recommended standard DSCP: These do not tend to be honoured 2467 across network interconnections more than local-use DSCPs. 2468 However, if two networks decide to honour certain of each 2469 other's DSCPs, the reconfiguration is a little easier if both 2470 of their globally recognised services are already represented 2471 by the relevant recommended standard DSCPs. 2473 Note that today a recommended standard DSCP gives little more 2474 assurance of end-to-end service than a local-use DSCP. In 2475 future the range recommended as standard might give more 2476 assurance of end-to-end service than local-use, but it is 2477 unlikely that either assurance will be high, particularly given 2478 the hosts are included in the end-to-end path. 2480 Whenever DSCP bleaching did occur, it would turn an L4S ECN packet 2481 into a Classic ECN packet. 2483 Not all tunnels: Diffserv codepoints are often not propagated to the 2484 outer header when a packet is encapsulated by a tunnel header. 2485 DSCPs are propagated to the outer of uniform mode tunnels, but not 2486 pipe mode [RFC2983], and pipe mode is fairly common. Whenever 2487 pipe mode was used, it would temporarily turn an L4S ECN packet 2488 into a Classic ECN packet. 
2490 ECN hard in some lower layers: Because this approach uses both the 2491 Diffserv and ECN fields, an AQM will only work at a lower layer if 2492 both can be supported. If individual network operators wished to 2493 deploy an AQM at a lower layer, they would usually propagate an IP 2494 Diffserv codepoint to the lower layer, using for example IEEE 2495 802.1p. However, the ECN capability is harder to propagate down 2496 to lower layers because few lower layers support it. 2498 Hard to distinguish Classic ECN AQM: Defining a DSCP to indicate L4S 2499 is a way to help network nodes identify L4S packets (albeit 2500 unreliable due to the likelihood of bleaching - see above). 2501 However, it does not help hosts distinguish between ECN markings 2502 from L4S and Classic AQMs. This is because Classic AQMs would 2503 have been implemented without any logic to recognize an L4S DSCP 2504 or apply L4S marking behaviour. 2506 Pros: 2508 Could migrate to e2e: If all usage of Classic ECN migrates to usage 2509 of L4S, the DSCP would become redundant, and the ECN capability 2510 alone could eventually identify L4S packets without the 2511 interconnection problems of Diffserv detailed above, and without 2512 having permanently consumed more than one codepoint in the IP 2513 header. Although the DSCP does not generally function as an end- 2514 to-end identifier (see above), it could be used initially by 2515 individual ISPs to introduce the L4S service for their own locally 2516 generated traffic. 2518 B.5. ECN capability alone 2520 This approach uses ECN capability alone as the L4S identifier. It 2521 would only have been feasible if RFC 3168 ECN had not been widely 2522 deployed. This was the case when the choice of L4S identifier was 2523 being made and this appendix was first written. Since then, RFC 3168 2524 ECN has been widely deployed and L4S did not take this approach 2525 anyway.
   So this approach is not discussed further, because it is no longer
   a feasible option.

B.6.  Protocol ID

   It has been suggested that a new Protocol ID in the IPv4 Protocol
   field or the IPv6 Next Header field could identify L4S packets.
   However, this approach is ruled out by numerous problems:

   o  A duplicate protocol ID would need to be created for each
      transport (TCP, SCTP, UDP, etc.).

   o  In IPv6, there can be a sequence of Next Header fields, and it
      would not be obvious which one would be expected to identify a
      network service like L4S.

   o  A new protocol ID would rarely provide an end-to-end service,
      because it is well known that new protocol IDs are often blocked
      by numerous types of middlebox.

   o  The approach is not a solution for AQM methods below the IP layer.

B.7.  Source or destination addressing

   Locally, a network operator could arrange for L4S service to be
   applied based on source or destination addressing, e.g. packets from
   its own data centre and/or CDN hosts, packets to its business
   customers, etc.  It could use addressing at any layer, e.g. IP
   addresses, MAC addresses, VLAN IDs, etc.  Although addressing might
   be a useful tactical approach for a single ISP, it would not be a
   feasible approach to identify an end-to-end service like L4S.  Even
   for a single ISP, it would require packet classifiers in buffers to
   be dependent on changing topology and address allocation decisions
   elsewhere in the network.  Therefore this approach is not a feasible
   solution.

B.8.  Summary: Merits of Alternative Identifiers

   Table 1 and Table 2 provide a very high-level summary of the pros and
   cons detailed against the schemes described respectively in
   Appendix B.1, Appendix B.4, Appendix B.2 and Appendix B.3 for nine
   issues that set them apart.

   +----------------+----------------------+--------------------+
   | Issue          | ECT(1) + CE (Chosen) | DSCP + ECN         |
   +----------------+----------------------+--------------------+
   |                | initial    eventual  | initial   eventual |
   |                |                      |                    |
   | end-to-end     | . . Y      . . Y     | N . .     . ? .    |
   | tunnels        | . . ?      . . Y     | . O .     . O .    |
   | lower layers   | . O .      . . ?     | N . .     . ? .    |
   | codepoints     | N . .      . . ?     | N . .     . . ?    |
   | reordering     | . O .      . . ?     | . . Y     . . Y    |
   | identify C AQM | . O .      . . ?     | . O .     . . ?    |
   | L3-only poss.  | . . Y      . . Y     | . . Y     . . Y    |
   | TCP feedback   | . O .      . . Y     | . O .     . . Y    |
   | TCP ctrl pkts  | . O .      . . ?     | . . Y     . . Y    |
   +----------------+----------------------+--------------------+

          Table 1: Merits of Alternative L4S Identifiers (pt 1)

   +----------------+--------------------+--------------------+
   | Issue          | ECN-DualQ-SCE1     | ECN-DualQ-SCE0     |
   +----------------+--------------------+--------------------+
   |                | initial   eventual | initial   eventual |
   |                |                    |                    |
   | end-to-end     | . . Y     . . Y    | . . Y     . . Y    |
   | tunnels        | . ? .     . . ?    | N . .     ? . .    |
   | lower layers   | N . .     . ? .    | N . .     ? . .    |
   | codepoints     | N . .     . ? .    | N . .     . ? .    |
   | reordering     | N . .     N . .    | N . .     N . .    |
   | identify C AQM | . . Y     . . Y    | . . Y     . . Y    |
   | L3-only poss.  | N . .     N . .    | . . Y     . . Y    |
   | TCP feedback   | N . .     . O .    | N . .     . O .    |
   | TCP ctrl pkts  | . O .     . . ?    | . O .     . . ?    |
   +----------------+--------------------+--------------------+

          Table 2: Merits of Alternative L4S Identifiers (pt 2)

   The schemes are scored based on both their capabilities now
   ('initial') and in the long term ('eventual').  The scores are one of
   'N, O, Y', meaning 'Poor', 'Ordinary', 'Good' respectively.  The same
   scores are aligned vertically to aid the eye.  A score of "?"
   in one of the positions means that this approach might
   optimistically become this good, given sufficient effort.  The tables
   summarize the text and are not meant to be understandable without
   having read the text.

Appendix C.  Potential Competing Uses for the ECT(1) Codepoint

   The ECT(1) codepoint of the ECN field has already been assigned once
   for the ECN nonce [RFC3540], which has now been categorized as
   historic [RFC8311].  ECN is probably the only remaining field in the
   Internet Protocol that is common to IPv4 and IPv6 and still has
   potential to work end-to-end, with tunnels and with lower layers.
   Therefore, ECT(1) should not be reassigned to a different
   experimental use (L4S) without carefully assessing competing
   potential uses.  These fall into the following categories:

C.1.  Integrity of Congestion Feedback

   Receiving hosts can fool a sender into downloading faster by
   suppressing feedback of ECN marks (or of losses if retransmissions
   are not necessary or available otherwise).

   The historic ECN nonce protocol [RFC3540] proposed that a TCP sender
   could set either of ECT(0) or ECT(1) in each packet of a flow and
   remember the sequence it had set.  If any packet was lost or
   congestion marked, the receiver would miss that bit of the sequence.
   An ECN Nonce receiver had to feed back the least significant bit of
   the sum, so it could not suppress feedback of a loss or mark without
   a 50-50 chance of guessing the sum incorrectly.

   It is highly unlikely that ECT(1) will be needed for integrity
   protection in future.  The ECN Nonce RFC [RFC3540] has been
   reclassified as historic, partly because other ways have been
   developed to protect feedback integrity of TCP and other transports
   [RFC8311] that do not consume a codepoint in the IP header.  For
For 2642 instance: 2644 o the sender can test the integrity of the receiver's feedback by 2645 occasionally setting the IP-ECN field to a value normally only set 2646 by the network. Then it can test whether the receiver's feedback 2647 faithfully reports what it expects (see para 2 of Section 20.2 of 2648 [RFC3168]. This works for loss and it will work for the accurate 2649 ECN feedback [RFC7560] intended for L4S. 2651 o A network can enforce a congestion response to its ECN markings 2652 (or packet losses) by auditing congestion exposure (ConEx) 2653 [RFC7713]. Whether the receiver or a downstream network is 2654 suppressing congestion feedback or the sender is unresponsive to 2655 the feedback, or both, ConEx audit can neutralise any advantage 2656 that any of these three parties would otherwise gain. 2658 o The TCP authentication option (TCP-AO [RFC5925]) can be used to 2659 detect any tampering with TCP congestion feedback (whether 2660 malicious or accidental). TCP's congestion feedback fields are 2661 immutable end-to-end, so they are amenable to TCP-AO protection, 2662 which covers the main TCP header and TCP options by default. 2663 However, TCP-AO is often too brittle to use on many end-to-end 2664 paths, where middleboxes can make verification fail in their 2665 attempts to improve performance or security, e.g. by 2666 resegmentation or shifting the sequence space. 2668 C.2. Notification of Less Severe Congestion than CE 2670 Various researchers have proposed to use ECT(1) as a less severe 2671 congestion notification than CE, particularly to enable flows to fill 2672 available capacity more quickly after an idle period, when another 2673 flow departs or when a flow starts, e.g. VCP [VCP], Queue View (QV) 2674 [QV]. 

   Before assigning ECT(1) as an identifier for L4S, we must carefully
   consider whether it might be better to hold ECT(1) in reserve for
   future standardisation of rapid flow acceleration, which is an
   important and enduring problem [RFC6077].

   Pre-Congestion Notification (PCN) is another scheme that assigns
   alternative semantics to the ECN field.  It uses ECT(1) to signify a
   less severe level of pre-congestion notification than CE [RFC6660].
   However, the ECN field only takes on the PCN semantics if packets
   carry a Diffserv codepoint defined to indicate PCN marking within a
   controlled environment.  PCN is required to be applied solely to the
   outer header of a tunnel across the controlled region in order not to
   interfere with any end-to-end use of the ECN field.  Therefore a PCN
   region on the path would not interfere with any of the L4S service
   identifiers proposed in Appendix B.

Authors' Addresses

   Koen De Schepper
   Nokia Bell Labs
   Antwerp
   Belgium

   Email: koen.de_schepper@nokia.com
   URI:   https://www.bell-labs.com/usr/koen.de_schepper

   Bob Briscoe (editor)
   Independent
   UK

   Email: ietf@bobbriscoe.net
   URI:   http://bobbriscoe.net/