Transport Services (tsv)                                  K. De Schepper
Internet-Draft                                           Nokia Bell Labs
Intended status: Experimental                            B. Briscoe, Ed.
Expires: August 23, 2020                                     Independent
                                                       February 20, 2020

  Identifying Modified Explicit Congestion Notification (ECN) Semantics
                   for Ultra-Low Queuing Delay (L4S)
                      draft-ietf-tsvwg-ecn-l4s-id-09

Abstract

   This specification defines the identifier to be used on IP packets for a new network service called low latency, low loss and scalable throughput (L4S). It is similar to the original (or 'Classic') Explicit Congestion Notification (ECN). 'Classic' ECN marking was required to be equivalent to a drop, both when applied in the network and when responded to by a transport. Unlike 'Classic' ECN marking, for packets carrying the L4S identifier, the network applies marking more immediately and more aggressively than drop, and the transport response to each mark is reduced and smoothed relative to that for drop. The two changes counterbalance each other so that the throughput of an L4S flow will be roughly the same as a non-L4S flow under the same conditions. Nonetheless, the much more frequent control signals and the finer responses to them result in ultra-low queuing delay (sub-millisecond) for L4S traffic without compromising link utilization, and this low delay can be maintained during high traffic load.
   The L4S identifier defined in this document is the key piece that distinguishes L4S from TCP-Reno-Friendly (or 'Classic') traffic. It gives an incremental migration path so that suitably modified network bottlenecks can distinguish and isolate existing Classic traffic from L4S traffic, to prevent the former from degrading the ultra-low queuing delay and loss of the new scalable transports, without harming Classic performance. Examples of new active queue management (AQM) marking algorithms and examples of new transports (whether TCP-like or real-time) are specified separately.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 23, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
   Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Latency, Loss and Scaling Problems
      1.2. Terminology
      1.3. Scope
   2. Consensus Choice of L4S Packet Identifier: Requirements
   3. L4S Packet Identification at Run-Time
   4. Prerequisite Transport Layer Behaviour
      4.1. Prerequisite Codepoint Setting
      4.2. Prerequisite Transport Feedback
      4.3. Prerequisite Congestion Response
   5. Prerequisite Network Node Behaviour
      5.1. Prerequisite Classification and Re-Marking Behaviour
      5.2. The Meaning of L4S CE Relative to Drop
      5.3. Exception for L4S Packet Identification by Network Nodes with Transport-Layer Awareness
      5.4. Interaction of the L4S Identifier with other Identifiers
         5.4.1. DualQ Examples of Other Identifiers Complementing L4S Identifiers
            5.4.1.1. Inclusion of Additional Traffic with L4S
            5.4.1.2. Exclusion of Traffic From L4S Treatment
            5.4.1.3. Generalized Combination of L4S and Other Identifiers
         5.4.2. Per-Flow Queuing Examples of Other Identifiers Complementing L4S Identifiers
   6. L4S Experiments
   7. IANA Considerations
   8. Security Considerations
   9. Acknowledgements
   10. References
      10.1. Normative References
      10.2. Informative References
   Appendix A. The 'Prague L4S Requirements'
      A.1. Requirements for Scalable Transport Protocols
         A.1.1. Use of L4S Packet Identifier
         A.1.2. Accurate ECN Feedback
         A.1.3. Fall back to Reno-friendly congestion control on packet loss
         A.1.4. Fall back to Reno-friendly congestion control on classic ECN bottlenecks
         A.1.5. Reduce RTT dependence
         A.1.6. Scaling down to fractional congestion windows
         A.1.7. Measuring Reordering Tolerance in Time Units
      A.2. Scalable Transport Protocol Optimizations
         A.2.1. Setting ECT in TCP Control Packets and Retransmissions
         A.2.2. Faster than Additive Increase
         A.2.3. Faster Convergence at Flow Start
   Appendix B. Alternative Identifiers
      B.1. ECT(1) and CE codepoints
      B.2. ECN Plus a Diffserv Codepoint (DSCP)
      B.3. ECN capability alone
      B.4. Protocol ID
      B.5. Source or destination addressing
      B.6. Summary: Merits of Alternative Identifiers
   Appendix C. Potential Competing Uses for the ECT(1) Codepoint
      C.1. Integrity of Congestion Feedback
      C.2. Notification of Less Severe Congestion than CE
   Authors' Addresses

1. Introduction

   This specification defines the identifier to be used on IP packets for a new network service called low latency, low loss and scalable throughput (L4S). It is similar to the original (or 'Classic') Explicit Congestion Notification (ECN [RFC3168]). RFC 3168 required an ECN mark to be equivalent to a drop, both when applied in the network and when responded to by a transport. Unlike Classic ECN marking, the network applies L4S marking more immediately and more aggressively than drop, and the transport response to each mark is reduced and smoothed relative to that for drop. The two changes counterbalance each other so that the throughput of an L4S flow will be roughly the same as a non-L4S flow under the same conditions. Nonetheless, the much more frequent control signals and the finer responses to them result in ultra-low queuing delay without compromising link utilization, and this low delay can be maintained during high load. Ultra-low queuing delay means less than 1 millisecond (ms) on average and less than about 2 ms at the 99th percentile.

   An example of a scalable congestion control that would enable the L4S service is Data Center TCP (DCTCP), which until now has been applicable solely to controlled environments like data centres [RFC8257], because it is too aggressive to co-exist with existing TCP-Reno-Friendly traffic. The DualQ Coupled AQM, which is defined in a complementary experimental specification [I-D.ietf-tsvwg-aqm-dualq-coupled], is an AQM framework that enables scalable congestion controls like DCTCP to co-exist with existing traffic, each getting roughly the same flow rate when they compete under similar conditions.
   Note that a transport such as DCTCP is still not safe to deploy on the Internet unless it satisfies the requirements listed in Section 4.

   L4S is not only for elastic (TCP-like) traffic - there are scalable congestion controls for real-time media, such as the L4S variant of the SCReAM [RFC8298] real-time media congestion avoidance technique (RMCAT). The factor that distinguishes L4S from Classic traffic is its behaviour in response to congestion. The transport wire protocol, e.g. TCP, QUIC, SCTP, DCCP, RTP/RTCP, is orthogonal (and therefore not suitable for distinguishing L4S from Classic packets).

   The L4S identifier defined in this document is the key piece that distinguishes L4S from TCP-Reno-Friendly (or 'Classic') traffic. It gives an incremental migration path so that suitably modified network bottlenecks can distinguish and isolate existing Classic traffic from L4S traffic to prevent it from degrading the ultra-low delay and loss of the new scalable transports, without harming Classic performance. Initial implementation of the separate parts of the system has been motivated by the performance benefits.

1.1. Latency, Loss and Scaling Problems

   Latency is becoming the critical performance factor for many (most?) applications on the public Internet, e.g. interactive Web, Web services, voice, conversational video, interactive video, interactive remote presence, instant messaging, online gaming, remote desktop, cloud-based applications, and video-assisted remote control of machinery and industrial processes. In the 'developed' world, further increases in access network bit-rate offer diminishing returns, whereas latency is still a multi-faceted problem. In the last decade or so, much has been done to reduce propagation time by placing caches or servers closer to users. However, queuing remains a major intermittent component of latency.
   The Diffserv architecture provides Expedited Forwarding [RFC3246], so that low latency traffic can jump the queue of other traffic. However, on access links dedicated to individual sites (homes, small enterprises or mobile devices), often all the traffic at any one time will be latency-sensitive. Then, with no other traffic to differentiate from, Diffserv makes no difference. Instead, we need to remove the causes of any unnecessary delay.

   The bufferbloat project has shown that excessively-large buffering ('bufferbloat') has been introducing significantly more delay than the underlying propagation time. These delays appear only intermittently--only when a capacity-seeking (e.g. TCP) flow is long enough for the queue to fill the buffer, making every packet in other flows sharing the buffer sit through the queue.

   Active queue management (AQM) was originally developed to solve this problem (and others). Unlike Diffserv, which gives low latency to some traffic at the expense of others, AQM controls latency for _all_ traffic in a class. In general, AQM methods introduce an increasing level of discard from the buffer the longer the queue persists above a shallow threshold. This gives sufficient signals to capacity-seeking (a.k.a. greedy) flows to keep the buffer empty for its intended purpose: absorbing bursts. However, RED [RFC2309] and other algorithms from the 1990s were sensitive to their configuration and hard to set correctly. So, this form of AQM was not widely deployed.

   More recent state-of-the-art AQM methods, e.g. fq_CoDel [RFC8290], PIE [RFC8033] and Adaptive RED [ARED01], are easier to configure, because they define the queuing threshold in units of time, not bytes, so it is invariant for different link rates.
   However, no matter how good the AQM, the sawtoothing sending window of a Classic congestion control will either cause queuing delay to vary or cause the link to be under-utilized. Even with a perfectly tuned AQM, the additional queuing delay will be of the same order as the underlying speed-of-light delay across the network.

   If a sender's own behaviour is introducing queuing delay variation, no AQM in the network can "un-vary" the delay without significantly compromising link utilization. Even flow-queuing (e.g. [RFC8290]), which isolates one flow from another, cannot isolate a flow from the delay variations it inflicts on itself. Therefore those applications that need to seek out high bandwidth but also need low latency will have to migrate to scalable congestion control.

   Altering host behaviour is not enough on its own though. Even if hosts adopt low latency behaviour (scalable congestion controls), they need to be isolated from the behaviour of existing Classic congestion controls that induce large queue variations. L4S enables that migration by providing latency isolation in the network and distinguishing the two types of packets that need to be isolated: L4S and Classic. L4S isolation can be achieved with a queue per flow (e.g. [RFC8290]), but a DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled] is sufficient, and actually gives better tail latency. Both approaches are addressed in this document.

   The DualQ solution was developed to make ultra-low latency available without requiring per-flow queues at every bottleneck. This was because FQ has well-known downsides - not least the need to inspect transport layer headers in the network, which makes it incompatible with privacy approaches such as IPsec VPN tunnels, and with link layer queue management, where transport layer headers can be hidden, e.g. in 5G.
   Latency is not the only concern addressed by L4S: it was known when TCP congestion avoidance was first developed that it would not scale to high bandwidth-delay products (see footnote 6 of Jacobson and Karels [TCP-CA]). Given regular broadband bit-rates over WAN distances are already beyond the scaling range of Reno TCP [RFC3649], 'less unscalable' Cubic [RFC8312] and Compound [I-D.sridharan-tcpm-ctcp] variants of TCP have been successfully deployed. However, these are now approaching their scaling limits. Unfortunately, fully scalable congestion controls such as DCTCP [RFC8257] cause Classic ECN congestion controls sharing the same queue to starve themselves, which is why they have been confined to private data centres or research testbeds (until now).

   It turns out that a congestion control algorithm like DCTCP that solves the latency problem also solves the scalability problem of Classic congestion controls. The finer sawteeth in the congestion window have low amplitude, so they cause very little queuing delay variation, and the average time to recover from one congestion signal to the next (the average duration of each sawtooth) remains invariant, which maintains constant tight control as flow-rate scales. A background paper [DCttH15] gives the full explanation of why the design solves both the latency and the scaling problems, both in plain English and in more precise mathematical form. The explanation is summarised without the maths in the L4S architecture document [I-D.ietf-tsvwg-l4s-arch].

1.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. In this document, these words will appear with that interpretation only when in ALL CAPS.
   Lower case uses of these words are not to be interpreted as carrying RFC 2119 significance.

   Classic Congestion Control: A congestion control behaviour that can co-exist with standard TCP Reno [RFC5681] without causing flow rate starvation. With Classic congestion controls, as flow rate scales, the number of round trips between congestion signals (losses or ECN marks) rises with the flow rate. So it takes longer and longer to recover after each congestion event. Therefore control of queuing and utilization becomes very slack, and the slightest disturbance prevents a high rate from being attained [RFC3649].

      For instance, with 1500 byte packets and an end-to-end round trip time (RTT) of 36 ms, as Reno flow rate has scaled over the years from 2 to 100 Mb/s, the number of round trips taken to recover from a congestion event has risen proportionately, from 4 to 200. Cubic was developed to be less unscalable, but it is approaching its scaling limit; with the same RTT of 36 ms, at 100 Mb/s it takes over 300 round trips to recover, and at 800 Mb/s its recovery time doubles to over 600 round trips, or more than 20 seconds.

   Scalable Congestion Control: A congestion control where the average time from one congestion signal to the next (the recovery time) remains invariant as the flow rate scales, all other factors being equal. This maintains the same degree of control over queuing and utilization whatever the flow rate, as well as ensuring that high throughput is robust to disturbances. For instance, DCTCP averages 2 congestion signals per round trip whatever the flow rate. See Section 4.3 for more explanation.

   Classic service: The Classic service is intended for all the congestion control behaviours that co-exist with Reno [RFC5681] (e.g. Reno itself, Cubic [RFC8312], Compound [I-D.sridharan-tcpm-ctcp], TFRC [RFC5348]).
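   The Reno figures in the 'Classic Congestion Control' definition above can be reproduced with a short back-of-envelope calculation. This is an illustrative sketch, not part of the specification: it assumes Reno's average rate corresponds to 3/4 of its peak window, that a congestion event halves the window, and that additive increase regains one segment per round trip.

```python
# Illustrative sketch (not from the draft): reproduce the Reno
# recovery-time figures in the terminology section. Assumptions: the
# average sending rate corresponds to 3/4 of the peak window W, and
# recovery from a halved window takes W/2 round trips (one 1500 byte
# segment regained per round trip).

MSS_BITS = 1500 * 8      # 1500 byte packets
RTT = 0.036              # 36 ms end-to-end round trip time


def reno_recovery_rounds(rate_bps):
    """Round trips for Reno to recover after one congestion event."""
    peak_window = rate_bps * RTT / (0.75 * MSS_BITS)   # packets
    return peak_window / 2


print(reno_recovery_rounds(2e6))     # 2 Mb/s   -> 4 round trips
print(reno_recovery_rounds(100e6))   # 100 Mb/s -> 200 round trips
```

   Under these assumptions the recovery time rises proportionately with the flow rate, from 4 to 200 round trips, matching the figures quoted above.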
   Low-Latency, Low-Loss Scalable throughput (L4S) service: The 'L4S' service is intended for traffic from scalable congestion control algorithms, such as Data Center TCP [RFC8257]. The L4S service is for more general traffic than just DCTCP--it allows the set of congestion controls with similar scaling properties to DCTCP to evolve (e.g. Relentless TCP [Mathis09], TCP Prague [LinuxPrague] and the L4S variant of SCReAM for real-time media [RFC8298]).

      Both Classic and L4S services can cope with a proportion of unresponsive or less-responsive traffic as well, as long as it does not build a queue (e.g. DNS, VoIP, game sync datagrams, etc.).

   The terms Classic or L4S can also qualify other nouns, such as 'queue', 'codepoint', 'identifier', 'classification', 'packet' and 'flow'. For example, a 'Classic queue' means a queue providing the Classic service; an L4S packet means a packet with an L4S identifier sent from an L4S congestion control.

   Classic ECN: The original Explicit Congestion Notification (ECN) protocol [RFC3168], which requires ECN signals to be treated the same as drops, both when generated in the network and when responded to by the sender. The names used for the four codepoints of the 2-bit IP-ECN field are as defined in [RFC3168]: Not-ECT, ECT(0), ECT(1) and CE, where ECT stands for ECN-Capable Transport and CE stands for Congestion Experienced.

1.3. Scope

   The new L4S identifier defined in this specification is applicable for IPv4 and IPv6 packets (as for Classic ECN [RFC3168]). It is applicable for the unicast, multicast and anycast forwarding modes.

   The L4S identifier is an orthogonal packet classification to the Differentiated Services Code Point (DSCP) [RFC2474]. Section 5.4 explains what this means in practice.

   This document is intended for experimental status, so it does not update any standards track RFCs.
   Therefore it depends on [RFC8311], which is a standards track specification that:

   o  updates the ECN proposed standard [RFC3168] to allow experimental track RFCs to relax the requirement that an ECN mark must be equivalent to a drop (when the network applies markings and/or when the sender responds to them);

   o  changes the status of the experimental ECN nonce [RFC3540] to historic;

   o  makes consequent updates to the following additional proposed standard RFCs to reflect the above two bullets:

      *  ECN for RTP [RFC6679];

      *  the congestion control specifications of various DCCP congestion control identifier (CCID) profiles [RFC4341], [RFC4342], [RFC5622].

   This document is about identifiers that are used for interoperation between hosts and networks. So the audience is broad, covering developers of host transports and network AQMs, as well as covering how operators might wish to combine various identifiers, which would require flexibility from equipment developers.

2. Consensus Choice of L4S Packet Identifier: Requirements

   This section briefly records the process that led to a consensus choice of L4S identifier, selected from all the alternatives in Appendix B.
   The identifier for packets using the Low Latency, Low Loss, Scalable throughput (L4S) service needs to meet the following requirements:

   o  it SHOULD survive end-to-end between source and destination applications: across the boundary between host and network, between interconnected networks, and through middleboxes;

   o  it SHOULD be visible at the IP layer;

   o  it SHOULD be common to IPv4 and IPv6 and transport-agnostic;

   o  it SHOULD be incrementally deployable;

   o  it SHOULD enable an AQM to classify packets encapsulated by outer IP or lower-layer headers;

   o  it SHOULD consume minimal extra codepoints;

   o  it SHOULD be consistent on all the packets of a transport layer flow, so that some packets of a flow are not served by a different queue to others.

   Whether the identifier would be recoverable if the experiment failed is a factor that could be taken into account. However, this has not been made a requirement, because that would favour schemes that would be easier to fail, rather than those more likely to succeed.

   It is recognised that the chosen identifier is unlikely to satisfy all these requirements, particularly given the limited space left in the IP header. Therefore a compromise will be necessary, which is why all the above requirements are expressed with the word 'SHOULD', not 'MUST'. Appendix B discusses the pros and cons of the compromises made in various competing identification schemes against the above requirements.

   On the basis of this analysis, "ECT(1) and CE codepoints" is the best compromise. Therefore this scheme is defined in detail in the following sections, while Appendix B records the rationale for this decision.

3. L4S Packet Identification at Run-Time

   The L4S treatment is an experimental track alternative packet marking treatment [RFC4774] to the Classic ECN treatment in [RFC3168], which has been updated by [RFC8311] to allow experiments such as the one defined in the present specification. Like Classic ECN, L4S ECN identifies both network and host behaviour: it identifies the marking treatment that network nodes are expected to apply to L4S packets, and it identifies packets that have been sent from hosts that are expected to comply with a broad type of sending behaviour.

   For a packet to receive L4S treatment as it is forwarded, the sender sets the ECN field in the IP header to the ECT(1) codepoint. See Section 4 for the full transport layer behaviour requirements, including feedback and congestion response.

   A network node that implements the L4S service normally classifies arriving ECT(1) and CE packets for L4S treatment. See Section 5 for the full network element behaviour requirements, including classification, ECN-marking and the interaction of the L4S identifier with other identifiers and per-hop behaviours.

4. Prerequisite Transport Layer Behaviour

4.1. Prerequisite Codepoint Setting

   A sender that wishes a packet to receive L4S treatment as it is forwarded MUST set the ECN field in the IP header (v4 or v6) to the ECT(1) codepoint.

4.2. Prerequisite Transport Feedback

   For a transport protocol to provide scalable congestion control, it MUST provide feedback of the extent of CE marking on the forward path. When ECN was added to TCP [RFC3168], the feedback method reported no more than one CE mark per round trip. Some transport protocols derived from TCP mimic this behaviour while others report the accurate extent of ECN marking. This means that some transport protocols will need to be updated as a prerequisite for scalable congestion control.
   The position for a few well-known transport protocols is given below.

   TCP: Support for the accurate ECN feedback requirements [RFC7560] (such as that provided by AccECN [I-D.ietf-tcpm-accurate-ecn]) by both ends is a prerequisite for scalable congestion control in TCP. Therefore, the presence of ECT(1) in the IP headers, even in one direction of a TCP connection, implies that both ends support accurate ECN feedback. However, the converse does not apply. So even if both ends support AccECN, either of the two ends can choose not to use a scalable congestion control, whatever the other end's choice.

   SCTP: A suitable ECN feedback mechanism for SCTP could add a chunk to report the number of received CE marks (e.g. [I-D.stewart-tsvwg-sctpecn]), and update the ECN feedback protocol sketched out in Appendix A of the standards track specification of SCTP [RFC4960].

   RTP over UDP: A prerequisite for scalable congestion control is for both (all) ends of one media-level hop to signal ECN support [RFC6679] and use the new generic RTCP feedback format of [I-D.ietf-avtcore-cc-feedback-message]. The presence of ECT(1) implies that both (all) ends of that media-level hop support ECN. However, the converse does not apply. So each end of a media-level hop can independently choose not to use a scalable congestion control, even if both ends support ECN.

   QUIC: Support for sufficiently fine-grained ECN feedback is provided by the first IETF QUIC transport [I-D.ietf-quic-transport].

   DCCP: The ACK vector in DCCP [RFC4340] is already sufficient to report the extent of CE marking as needed by a scalable congestion control.

4.3. Prerequisite Congestion Response

   As a condition for a host to send packets with the L4S identifier (ECT(1)), it SHOULD implement a congestion control behaviour that ensures that, in steady state, the average time from one ECN congestion signal to the next (the 'recovery time') does not increase as flow rate scales, all other factors being equal. This is termed a scalable congestion control. This requirement is necessary to ensure that queue variations remain small as flow rate scales, without having to sacrifice utilization. For instance, for DCTCP, the average recovery time is always half a round trip, whatever the flow rate.

   The condition 'all other factors being equal' allows the recovery time to be different for different round trip times, as long as it does not increase with flow rate for any particular RTT.

   Saying that the recovery time remains roughly invariant is equivalent to saying that the number of ECN CE marks per round trip remains invariant as flow rate scales, all other factors being equal. For instance, DCTCP's average recovery time of half a round trip is equivalent to 2 ECN marks per round trip. For those who understand steady-state congestion response functions, it is also equivalent to saying that the congestion window is inversely proportional to the proportion of bytes in packets marked with the CE codepoint (see section 2 of [PI2]).

   As well as DCTCP, TCP Prague [LinuxPrague] and the L4S variant of SCReAM [RFC8298] are examples of scalable congestion controls.

   As with all transport behaviours, a detailed specification (probably an experimental RFC) will need to be defined for each type of transport or application, including the timescale over which the proportionality is averaged, and control of burstiness.
   The recovery time requirement above is worded as a 'SHOULD' rather than a 'MUST' to allow reasonable flexibility when defining these specifications.

   Each sender in a session can use a scalable congestion control independently of the congestion control used by the receiver(s) when they send data. Therefore there might be ECT(1) packets in one direction and ECT(0) or Not-ECT in the other.

   In order to coexist safely with other Internet traffic, a scalable congestion control MUST NOT tag its packets with the ECT(1) codepoint unless it complies with the following bulleted requirements. The specification of a particular scalable congestion control MUST describe in detail how it satisfies each requirement and, for any non-mandatory requirements, it MUST justify why it does not comply:

   o  As well as responding to ECN markings, a scalable congestion control MUST react to packet loss in a way that will coexist safely with a TCP Reno congestion control [RFC5681] (see Appendix A.1.3 for rationale).

   o  A scalable congestion control MUST react to ECN marking from a non-L4S but ECN-capable bottleneck in a way that will coexist with a TCP Reno congestion control [RFC5681] (see Appendix A.1.4 for rationale).

      Note that a scalable congestion control is not expected to change to setting ECT(0) while it falls back to coexist with Reno.

   o  A scalable congestion control MUST reduce or eliminate RTT bias over as wide a range of RTTs as possible, or at least over the typical range of RTTs that will interact in the intended deployment scenario (see Appendix A.1.5 for rationale).

   o  A scalable congestion control MUST remain responsive to congestion when the RTT is significantly smaller than in the public Internet (see Appendix A.1.6 for rationale).
582 o A scalable congestion control intended for reordering-prone 583 networks SHOULD detect loss by counting in time-based units, which 584 is scalable, as opposed to counting in units of packets (as in the 585 3 DupACK rule of RFC 5681 TCP), which is not scalable (see 586 Appendix A.1.7 for rationale). This requirement is scoped to 587 'reordering-prone networks' in order to exclude congestion 588 controls that are solely used in controlled environments where the 589 network introduces hardly any reordering. 591 As well as traffic controlled by a scalable congestion control, a 592 reasonable level of smooth unresponsive traffic at a low rate 593 relative to typical broadband capacities is likely to be acceptable 594 (see "'Safe' Unresponsive Traffic" in Section 5.4.1.1.1). 596 5. Prerequisite Network Node Behaviour 598 5.1. Prerequisite Classification and Re-Marking Behaviour 600 A network node that implements the L4S service MUST classify arriving 601 ECT(1) packets for L4S treatment and, other than in the exceptional 602 case referred to next, it MUST classify arriving CE packets for L4S 603 treatment as well. CE packets might have originated as ECT(1) or 604 ECT(0), but the above rule to classify them as if they originated as 605 ECT(1) is the safe choice (see Appendix B.1 for rationale). The 606 exception is where some flow-aware in-network mechanism happens to be 607 available for distinguishing CE packets that originated as ECT(0), as 608 described in Section 5.3, but there is no implication that such a 609 mechanism is necessary. 611 An L4S AQM treatment follows similar codepoint transition rules to 612 those in RFC 3168. Specifically, the ECT(1) codepoint MUST NOT be 613 changed to any other codepoint than CE, and CE MUST NOT be changed to 614 any other codepoint. 
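The codepoint transition rules just stated can be summarized as a small validity check. The sketch below is an illustration under the RFC 3168-style rules referenced above, not normative pseudocode; the function name is ours. It also permits ECT(0)-to-CE, since a Classic RFC 3168 AQM may ECN-mark ECT(0) packets.

```python
# Sketch (illustrative, not normative): which ECN field rewrites an
# L4S-aware node may perform, per the transition rules above.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # IP-ECN field values

def remark_allowed(before, after):
    """True if an AQM may rewrite the ECN field from 'before' to 'after'."""
    if before == after:
        return True       # forwarding unchanged is always allowed
    if before in (ECT0, ECT1) and after == CE:
        return True       # congestion-marking an ECN-capable packet
    return False          # CE and Not-ECT must not be changed otherwise

assert remark_allowed(ECT1, CE) and not remark_allowed(ECT1, ECT0)
assert not remark_allowed(CE, ECT1) and not remark_allowed(NOT_ECT, CE)
```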
An ECT(1) packet is classified as ECN-capable 615 and, if congestion increases, an L4S AQM algorithm will increasingly 616 mark the ECN field as CE, otherwise forwarding packets unchanged as 617 ECT(1). Necessary conditions for an L4S marking treatment are 618 defined in Section 5.2. Under persistent overload an L4S marking 619 treatment SHOULD turn off ECN marking, using drop as a congestion 620 signal until the overload episode has subsided, as recommended for 621 all AQM methods in [RFC7567] (Section 4.2.1), which follows the 622 similar advice in RFC 3168 (Section 7). 624 For backward compatibility in uncontrolled environments, a network 625 node that implements the L4S treatment MUST also implement an AQM 626 treatment for the Classic service as defined in Section 1.2. This 627 Classic AQM treatment need not mark ECT(0) packets, but if it does, 628 it will do so under the same conditions as it would drop Not-ECT 629 packets [RFC3168]. It MUST classify arriving ECT(0) and Not-ECT 630 packets for treatment by the Classic AQM (see the discussion of the 631 classifier for the dual-queue coupled AQM in 632 [I-D.ietf-tsvwg-aqm-dualq-coupled]). 634 5.2. The Meaning of L4S CE Relative to Drop 636 The likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST 637 be roughly proportional to the square of the likelihood that it would 638 have marked it if it had been an L4S packet (p_L). That is 640 p_C ~= (p_L / k)^2 642 The constant of proportionality (k) does not have to be standardised 643 for interoperability, but a value of 2 is RECOMMENDED. The term 644 'likelihood' is used above to allow for marking and dropping to be 645 either probabilistic or deterministic. 647 This formula ensures that Scalable and Classic flows will converge to 648 roughly equal congestion windows, for the worst case of Reno 649 congestion control. 
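The convergence claimed for the coupling formula above can be illustrated numerically, using the RECOMMENDED k = 2 and idealized textbook window models (W = 2/p_L for a DCTCP-like control, W = sqrt(1.5/p_C) for Reno); the models and constants are illustrative assumptions, not normative values.

```python
# Numerical sketch of the coupling formula p_C ~= (p_L / k)^2 with k = 2.
import math

K = 2.0  # RECOMMENDED coupling constant k

def classic_drop_prob(p_l):
    # Drop likelihood for Classic traffic coupled from the L4S marking level
    return (p_l / K) ** 2

def scalable_window(p_l):
    return 2.0 / p_l              # DCTCP-like idealization: W ~ 1/p_L

def classic_window(p_c):
    return math.sqrt(1.5 / p_c)   # Reno idealization: W ~ 1/sqrt(p_C)

for p_l in (0.05, 0.1, 0.2):
    p_c = classic_drop_prob(p_l)
    print(p_l, p_c, scalable_window(p_l) / classic_window(p_c))
```

The window ratio comes out the same (about 0.82 under these toy models) at every congestion level: the square in the coupling formula cancels Reno's square-root law, so the two windows stay in a fixed, roughly equal ratio.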
This is because the congestion windows of 650 Scalable and Classic congestion controls are inversely proportional 651 to p_L and sqrt(p_C) respectively. So squaring p_C in the above 652 formula counterbalances the square root that characterizes TCP-Reno- 653 friendly flows. 655 [I-D.ietf-tsvwg-aqm-dualq-coupled] specifies the essential aspects of 656 an L4S AQM, as well as recommending other aspects. It gives example 657 implementations in appendices. 659 Note that, contrary to RFC 3168, a Coupled Dual Queue AQM 660 implementing the L4S and Classic treatments does not mark an ECT(1) 661 packet under the same conditions that it would have dropped a Not-ECT 662 packet, as allowed by [RFC8311], which updates RFC 3168. However, if 663 it marks ECT(0) packets, it does so under the same conditions that it 664 would have dropped a Not-ECT packet. 666 5.3. Exception for L4S Packet Identification by Network Nodes with 667 Transport-Layer Awareness 669 To implement the L4S treatment, a network node does not need to 670 identify transport-layer flows. Nonetheless, if an implementer is 671 willing to identify transport-layer flows at a network node, and if 672 the most recent ECT packet in the same flow was ECT(0), the node MAY 673 classify CE packets for Classic ECN [RFC3168] treatment. In all 674 other cases, a network node MUST classify all CE packets for L4S 675 treatment. Examples of such other cases are: i) if no ECT packets 676 have yet been identified in a flow; ii) if it is not desirable for a 677 network node to identify transport-layer flows; or iii) if the most 678 recent ECT packet in a flow was ECT(1). 680 If an implementer uses flow-awareness to classify CE packets, it 681 determines whether a flow is using ECT(0) or ECT(1) solely from the 682 most recent ECT packet of that flow (this advice will need to be 683 verified as part of L4S experiments).
This is because a sender might 684 switch from sending ECT(1) (L4S) packets to sending ECT(0) (Classic 685 ECN) packets, or back again, in the middle of a transport-layer flow 686 (e.g. it might manually switch its congestion control module mid- 687 connection, or it might be deliberately attempting to confuse the 688 network). 690 5.4. Interaction of the L4S Identifier with other Identifiers 692 The examples in this section concern how additional identifiers might 693 complement the L4S identifier to classify packets between class-based 694 queues. It first considers two queues, L4S and Classic, as in the 695 Coupled DualQ AQM [I-D.ietf-tsvwg-aqm-dualq-coupled], then more 696 complex structures within a larger queuing hierarchy. 698 5.4.1. DualQ Examples of Other Identifiers Complementing L4S 699 Identifiers 701 5.4.1.1. Inclusion of Additional Traffic with L4S 703 In a typical case for the public Internet, a network element that 704 implements L4S might want to classify some low-rate but unresponsive 705 traffic (e.g. DNS, LDAP, NTP, voice, game sync packets) into the low 706 latency queue to mix with L4S traffic. Such non-ECN-based packet 707 types MUST be safe to mix with L4S traffic without harming the low 708 latency service, where 'safe' is explained in Section 5.4.1.1.1 709 below. 711 In this case it would not be appropriate to call the queue an L4S 712 queue, because it is shared by L4S and non-L4S traffic. Instead it 713 will be called the low latency or L queue. The L queue then offers 714 two different treatments: 716 o The L4S treatment, which is a combination of the L4S AQM treatment 717 and a priority scheduling treatment; 719 o The low latency treatment, which is solely the priority scheduling 720 treatment, without ECN-marking by the AQM. 722 To identify packets for just the scheduling treatment, it would be 723 inappropriate to use the L4S ECT(1) identifier, because such traffic 724 is unresponsive to ECN marking.
Therefore, a network element that 725 implements L4S MAY classify additional packets into the L queue if 726 they carry certain non-ECN identifiers. For instance: 728 o addresses of specific applications or hosts configured to be safe 729 (or perhaps they comply with L4S behaviour and can respond to ECN 730 feedback, but perhaps cannot set the ECN field for some reason); 732 o certain protocols that are usually lightweight (e.g. ARP, DNS); 734 o specific Diffserv codepoints that indicate traffic with limited 735 burstiness such as the EF (Expedited Forwarding [RFC3246]), Voice- 736 Admit [RFC5865] or proposed NQB (Non-Queue-Building 737 [I-D.ietf-tsvwg-nqb]) service classes or equivalent local-use 738 DSCPs (see [I-D.briscoe-tsvwg-l4s-diffserv]). 740 Of course, a packet that carried both the ECT(1) codepoint and a non- 741 ECN identifier associated with the L queue would be classified into 742 the L queue. 744 For clarity, non-ECN identifiers, such as the examples itemized 745 above, might be used by some network operators who believe they 746 identify non-L4S traffic that would be safe to mix with L4S traffic. 747 They are not alternative ways for a host to indicate that it is 748 sending L4S packets. Only the ECT(1) ECN codepoint indicates to a 749 network element that a host is sending L4S packets (and CE indicates 750 that it could have originated as ECT(1)). Specifically ECT(1) 751 indicates that the host claims its behaviour satisfies the 752 prerequisite transport requirements in Section 4. 754 To include additional traffic with L4S, a network element only reads 755 identifiers such as those itemized above. It MUST NOT alter these 756 non-ECN identifiers, so that they survive for any potential use later 757 on the network path. 759 5.4.1.1.1. 'Safe' Unresponsive Traffic 761 The above section requires unresponsive traffic to be 'safe' to mix 762 with L4S traffic. 
Ideally, this means that the sender never sends any 763 sequence of packets at a rate that exceeds the available capacity of 764 the bottleneck link. However, typically an unresponsive transport 765 does not even know the bottleneck capacity of the path, let alone its 766 available capacity. Nonetheless, an application can be considered 767 safe enough if it paces packets out (not necessarily completely 768 regularly) such that its maximum instantaneous rate from packet to 769 packet stays well below a typical broadband access rate. 771 This is a vague but useful definition, because many low latency 772 applications of interest, such as DNS, voice, game sync packets, RPC, 773 ACKs and keep-alives, could match this description. 775 5.4.1.2. Exclusion of Traffic From L4S Treatment 777 To extend the above example, an operator might want to exclude some 778 traffic from the L4S treatment for a policy reason, e.g. security 779 (traffic from malicious sources) or commercial (e.g. initially the 780 operator may wish to confine the benefits of L4S to business 781 customers). 783 In this exclusion case, the operator MUST classify on the relevant 784 locally-used identifiers (e.g. source addresses) before classifying 785 the non-matching traffic on the end-to-end L4S ECN identifier. 787 The operator MUST NOT alter the end-to-end L4S ECN identifier from 788 L4S to Classic, because its decision to exclude certain traffic from 789 L4S treatment is local-only. The end-to-end L4S identifier then 790 survives for other operators to use, or indeed, they can apply their 791 own policy, independently based on their own choice of locally-used 792 identifiers. This approach also allows any operator to remove its 793 locally-applied exclusions in future, e.g. if it wishes to widen the 794 benefit of the L4S treatment to all its customers. 796 5.4.1.3.
Generalized Combination of L4S and Other Identifiers 798 L4S concerns low latency, which it can provide for all traffic 799 without differentiation and without affecting bandwidth allocation. 800 Diffserv provides for differentiation of both bandwidth and low 801 latency, but its control of latency depends on its control of 802 bandwidth. The two can be combined if a network operator wants to 803 control bandwidth allocation but it also wants to provide low latency 804 - for any amount of traffic within one of these allocations of 805 bandwidth (rather than only providing low latency by limiting 806 bandwidth) [I-D.briscoe-tsvwg-l4s-diffserv]. 808 The DualQ examples so far have been framed in the context of 809 providing the default Best Efforts Per-Hop Behaviour (PHB) using two 810 queues - a Low Latency (L) queue and a Classic (C) Queue. This 811 single DualQ structure is expected to be the most common and useful 812 arrangement. But, more generally, an operator might choose to 813 control bandwidth allocation through a hierarchy of Diffserv PHBs at 814 a node, and to offer one (or more) of these PHBs with a low latency 815 and a Classic variant. 817 In the first case, if we assume that there are no other PHBs except 818 the DualQ, if a packet carries ECT(1) or CE, a network element would 819 classify it for the L4S treatment irrespective of its DSCP. And, if 820 a packet carried (say) the EF DSCP, the network element could 821 classify it into the L queue irrespective of its ECN codepoint. 822 However, where the DualQ is in a hierarchy of other PHBs, the 823 classifier would classify some traffic into other PHBs based on DSCP 824 before classifying between the low latency and Classic queues (based 825 on ECT(1), CE and perhaps also the EF DSCP or other identifiers as in 826 the above example). [I-D.briscoe-tsvwg-l4s-diffserv] gives a number 827 of examples of such arrangements to address various requirements. 
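The classification order just described (other PHBs matched on DSCP first, then L4S ECN classification within the DualQ) can be sketched as follows. This is a purely hypothetical illustration: the function, the PHB table and the queue names are ours, not from this specification; the EF DSCP is included as the example additional L-queue identifier from earlier.

```python
# Hypothetical classifier sketch for the DualQ arrangements described
# above: DSCP selects a PHB first; within the DualQ PHB, the ECN field
# (plus, optionally, the EF DSCP) selects the L or C queue.
ECT1, CE, EF_DSCP = 0b01, 0b11, 0b101110  # EF DSCP is 46 [RFC3246]

def classify(dscp, ecn, phb_table):
    phb = phb_table.get(dscp, "dualq")    # other PHBs are matched first
    if phb != "dualq":
        return phb
    if ecn in (ECT1, CE) or dscp == EF_DSCP:
        return "dualq:L"                  # low latency (L) queue
    return "dualq:C"                      # Classic (C) queue

# With no other PHBs configured, ECT(1)/CE selects L whatever the DSCP:
assert classify(0, ECT1, {}) == "dualq:L"
assert classify(EF_DSCP, 0b00, {}) == "dualq:L"   # EF example from above
assert classify(0, 0b10, {}) == "dualq:C"         # ECT(0) goes to Classic
```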
829 [I-D.briscoe-tsvwg-l4s-diffserv] describes how an operator might use 830 L4S to offer low latency for all L4S traffic as well as using 831 Diffserv for bandwidth differentiation. It identifies two main types 832 of approach, which can be combined: the operator might split certain 833 Diffserv PHBs between L4S and a corresponding Classic service. Or it 834 might split the L4S and/or the Classic service into multiple Diffserv 835 PHBs. In either of these cases, a packet would have to be classified 836 on its Diffserv and ECN codepoints. 838 In summary, there are numerous ways in which the L4S ECN identifier 839 (ECT(1) and CE) could be combined with other identifiers to achieve 840 particular objectives. The following categorization articulates 841 those that are valid, but it is not necessarily exhaustive. Those 842 tagged 'Recommended-standard-use' could be set by the sending host or 843 a network. Those tagged 'Local-use' would only be set by a network: 845 1. Identifiers Complementing the L4S Identifier 847 A. Including More Traffic in the L Queue 848 (Could use Recommended-standard-use or Local-use identifiers) 850 B. Excluding Certain Traffic from the L Queue 851 (Local-use only) 853 2. Identifiers to place L4S classification in a PHB Hierarchy 854 (Could use Recommended-standard-use or Local-use identifiers) 855 A. PHBs Before L4S ECN Classification 857 B. PHBs After L4S ECN Classification 859 5.4.2. Per-Flow Queuing Examples of Other Identifiers Complementing L4S 860 Identifiers 862 At a node with per-flow queueing (e.g. FQ-CoDel [RFC8290]), the L4S 863 identifier could complement the Layer-4 flow ID as a further level of 864 flow granularity (i.e. Not-ECT and ECT(0) queued separately from 865 ECT(1) and CE packets). "Risk of reordering Classic CE packets" in 866 Appendix B.1 discusses the resulting ambiguity if packets originally 867 marked ECT(0) are marked CE by an upstream AQM before they arrive at 868 a node that classifies CE as L4S. 
It argues that the risk of re- 869 ordering is vanishingly small and the consequence of such a low level 870 of re-ordering is minimal. 872 Alternatively, it could be assumed that it is not in a flow's own 873 interest to mix Classic and L4S identifiers. Then the AQM could use 874 the ECN field to switch itself between a Classic and an L4S AQM 875 behaviour within one per-flow queue. For instance, for ECN-capable 876 packets, the AQM might consist of a simple marking threshold and an 877 L4S ECN identifier might simply select a shallower threshold than a 878 Classic ECN identifier would. 880 6. L4S Experiments 882 [I-D.ietf-tsvwg-aqm-dualq-coupled] sets operational and management 883 requirements for experiments with DualQ Coupled AQMs. General 884 operational and management requirements for experiments with L4S 885 congestion controls are given in Section 4 and Section 5 above, e.g. 886 co-existence and scaling requirements, incremental deployment 887 arrangements. 889 The specification of each scalable congestion control will need to 890 include protocol-specific requirements for configuration and 891 monitoring performance during experiments. Appendix A of [RFC5706] 892 provides a helpful checklist. 894 Monitoring for harm to other traffic, specifically bandwidth 895 starvation or excess queuing delay, will need to be conducted 896 alongside all early L4S experiments. It is hard, if not impossible, 897 for an individual flow to measure its impact on other traffic. So 898 such monitoring will need to be conducted using bespoke monitoring 899 across flows and/or across classes of traffic. 901 7. IANA Considerations 903 This specification contains no IANA considerations. 905 8. Security Considerations 907 Approaches to assure the integrity of signals using the new identifier 908 are introduced in Appendix C.1. See the security considerations in 909 the L4S architecture [I-D.ietf-tsvwg-l4s-arch] for further discussion 910 of mis-use of the identifier.
912 The recommendation to detect loss in time units prevents the ACK- 913 splitting attacks described in [Savage-TCP]. 915 9. Acknowledgements 917 Thanks to Richard Scheffenegger, John Leslie, David Taeht, Jonathan 918 Morton, Gorry Fairhurst, Michael Welzl, Mikael Abrahamsson and Andrew 919 McGregor for the discussions that led to this specification. Ing-jyh 920 (Inton) Tsang was a contributor to the early drafts of this document. 921 And thanks to Mikael Abrahamsson, Lloyd Wood, Nicolas Kuhn, Greg 922 White, Tom Henderson, David Black, Gorry Fairhurst, Brian Carpenter, 923 Jake Holland, Rod Grimes and Richard Scheffenegger for providing help 924 and reviewing this draft and to Ingemar Johansson for reviewing and 925 providing substantial text. Appendix A listing the Prague L4S 926 Requirements is based on text authored by Marcelo Bagnulo Braun that 927 was originally an appendix to [I-D.ietf-tsvwg-l4s-arch]. That text 928 was in turn based on the collective output of the attendees listed in 929 the minutes of a 'bar BoF' on DCTCP Evolution during IETF-94 930 [TCPPrague]. 932 The authors' contributions were part-funded by the European Community 933 under its Seventh Framework Programme through the Reducing Internet 934 Transport Latency (RITE) project (ICT-317700). Bob Briscoe was also 935 funded partly by the Research Council of Norway through the TimeIn 936 project, partly by CableLabs and partly by the Comcast Innovation 937 Fund. The views expressed here are solely those of the authors. 939 10. References 941 10.1. Normative References 943 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 944 Requirement Levels", BCP 14, RFC 2119, 945 DOI 10.17487/RFC2119, March 1997, 946 . 948 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 949 of Explicit Congestion Notification (ECN) to IP", 950 RFC 3168, DOI 10.17487/RFC3168, September 2001, 951 . 
953 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the 954 Explicit Congestion Notification (ECN) Field", BCP 124, 955 RFC 4774, DOI 10.17487/RFC4774, November 2006, 956 . 958 [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., 959 and K. Carlberg, "Explicit Congestion Notification (ECN) 960 for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 961 2012, . 963 10.2. Informative References 965 [A2DTCP] Zhang, T., Wang, J., Huang, J., Huang, Y., Chen, J., and 966 Y. Pan, "Adaptive-Acceleration Data Center TCP", IEEE 967 Transactions on Computers 64(6):1522-1533, June 2015, 968 . 971 [Ahmed19] Ahmed, A., "Extending TCP for Low Round Trip Delay", 972 Masters Thesis, Uni Oslo , August 2019, 973 . 975 [Alizadeh-stability] 976 Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis 977 of DCTCP: Stability, Convergence, and Fairness", ACM 978 SIGMETRICS 2011 , June 2011. 980 [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An 981 Algorithm for Increasing the Robustness of RED's Active 982 Queue Management", ACIRI Technical Report , August 2001, 983 . 985 [DCttH15] De Schepper, K., Bondarenko, O., Briscoe, B., and I. 986 Tsang, "'Data Centre to the Home': Ultra-Low Latency for 987 All", RITE Project Technical Report , 2015, 988 . 990 [I-D.briscoe-tsvwg-l4s-diffserv] 991 Briscoe, B., "Interactions between Low Latency, Low Loss, 992 Scalable Throughput (L4S) and Differentiated Services", 993 draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), 994 November 2018. 996 [I-D.ietf-avtcore-cc-feedback-message] 997 Sarker, Z., Perkins, C., Singh, V., and M. Ramalho, "RTP 998 Control Protocol (RTCP) Feedback for Congestion Control", 999 draft-ietf-avtcore-cc-feedback-message-05 (work in 1000 progress), November 2019. 1002 [I-D.ietf-quic-transport] 1003 Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed 1004 and Secure Transport", draft-ietf-quic-transport-25 (work 1005 in progress), January 2020. 
1007 [I-D.ietf-tcpm-accurate-ecn] 1008 Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More 1009 Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate- 1010 ecn-09 (work in progress), July 2019. 1012 [I-D.ietf-tcpm-generalized-ecn] 1013 Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit 1014 Congestion Notification (ECN) to TCP Control Packets", 1015 draft-ietf-tcpm-generalized-ecn-05 (work in progress), 1016 November 2019. 1018 [I-D.ietf-tcpm-rack] 1019 Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: 1020 a time-based fast loss detection algorithm for TCP", 1021 draft-ietf-tcpm-rack-07 (work in progress), January 2020. 1023 [I-D.ietf-tsvwg-aqm-dualq-coupled] 1024 Schepper, K., Briscoe, B., and G. White, "DualQ Coupled 1025 AQMs for Low Latency, Low Loss and Scalable Throughput 1026 (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-10 (work in 1027 progress), July 2019. 1029 [I-D.ietf-tsvwg-ecn-encap-guidelines] 1030 Briscoe, B., Kaippallimalil, J., and P. Thaler, 1031 "Guidelines for Adding Congestion Notification to 1032 Protocols that Encapsulate IP", draft-ietf-tsvwg-ecn- 1033 encap-guidelines-13 (work in progress), May 2019. 1035 [I-D.ietf-tsvwg-l4s-arch] 1036 Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low 1037 Latency, Low Loss, Scalable Throughput (L4S) Internet 1038 Service: Architecture", draft-ietf-tsvwg-l4s-arch-04 (work 1039 in progress), July 2019. 1041 [I-D.ietf-tsvwg-nqb] 1042 White, G. and T. Fossati, "A Non-Queue-Building Per-Hop 1043 Behavior (NQB PHB) for Differentiated Services", draft- 1044 ietf-tsvwg-nqb-00 (work in progress), November 2019. 1046 [I-D.sridharan-tcpm-ctcp] 1047 Sridharan, M., Tan, K., Bansal, D., and D. Thaler, 1048 "Compound TCP: A New TCP Congestion Control for High-Speed 1049 and Long Distance Networks", draft-sridharan-tcpm-ctcp-02 1050 (work in progress), November 2008. 1052 [I-D.stewart-tsvwg-sctpecn] 1053 Stewart, R., Tuexen, M., and X. 
Dong, "ECN for Stream 1054 Control Transmission Protocol (SCTP)", draft-stewart- 1055 tsvwg-sctpecn-05 (work in progress), January 2014. 1057 [LinuxPacedChirping] 1058 Misund, J. and B. Briscoe, "Paced Chirping - Rethinking 1059 TCP start-up", Proc. Linux Netdev 0x13 , March 2019, 1060 . 1062 [LinuxPrague] 1063 Briscoe, B., De Schepper, K., Albisser, O., Misund, J., 1064 Tilmans, O., Kuehlewind, M., and A. Ahmed, "Implementing 1065 the `TCP Prague' Requirements for Low Latency Low Loss 1066 Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 , 1067 March 2019, . 1070 [Mathis09] 1071 Mathis, M., "Relentless Congestion Control", PFLDNeT'09 , 1072 May 2009, . 1075 [Paced-Chirping] 1076 Misund, J., "Rapid Acceleration in TCP Prague", Masters 1077 Thesis , May 2018, 1078 . 1081 [PI2] De Schepper, K., Bondarenko, O., Tsang, I., and B. 1082 Briscoe, "PI^2 : A Linearized AQM for both Classic and 1083 Scalable TCP", Proc. ACM CoNEXT 2016 pp.105-119, December 1084 2016, 1085 . 1087 [QV] Briscoe, B. and P. Hurtig, "Up to Speed with Queue View", 1088 RITE Technical Report D2.3; Appendix C.2, August 2015, 1089 . 1092 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 1093 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 1094 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 1095 S., Wroclawski, J., and L. Zhang, "Recommendations on 1096 Queue Management and Congestion Avoidance in the 1097 Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, 1098 . 1100 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 1101 "Definition of the Differentiated Services Field (DS 1102 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1103 DOI 10.17487/RFC2474, December 1998, 1104 . 1106 [RFC2983] Black, D., "Differentiated Services and Tunnels", 1107 RFC 2983, DOI 10.17487/RFC2983, October 2000, 1108 . 1110 [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, 1111 J., Courtney, W., Davari, S., Firoiu, V., and D. 
1112 Stiliadis, "An Expedited Forwarding PHB (Per-Hop 1113 Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, 1114 . 1116 [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit 1117 Congestion Notification (ECN) Signaling with Nonces", 1118 RFC 3540, DOI 10.17487/RFC3540, June 2003, 1119 . 1121 [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", 1122 RFC 3649, DOI 10.17487/RFC3649, December 2003, 1123 . 1125 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 1126 Congestion Control Protocol (DCCP)", RFC 4340, 1127 DOI 10.17487/RFC4340, March 2006, 1128 . 1130 [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion 1131 Control Protocol (DCCP) Congestion Control ID 2: TCP-like 1132 Congestion Control", RFC 4341, DOI 10.17487/RFC4341, March 1133 2006, . 1135 [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for 1136 Datagram Congestion Control Protocol (DCCP) Congestion 1137 Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, 1138 DOI 10.17487/RFC4342, March 2006, 1139 . 1141 [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", 1142 RFC 4960, DOI 10.17487/RFC4960, September 2007, 1143 . 1145 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 1146 Friendly Rate Control (TFRC): Protocol Specification", 1147 RFC 5348, DOI 10.17487/RFC5348, September 2008, 1148 . 1150 [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. 1151 Ramakrishnan, "Adding Explicit Congestion Notification 1152 (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, 1153 DOI 10.17487/RFC5562, June 2009, 1154 . 1156 [RFC5622] Floyd, S. and E. Kohler, "Profile for Datagram Congestion 1157 Control Protocol (DCCP) Congestion ID 4: TCP-Friendly Rate 1158 Control for Small Packets (TFRC-SP)", RFC 5622, 1159 DOI 10.17487/RFC5622, August 2009, 1160 . 1162 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1163 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 1164 . 
1166 [RFC5706] Harrington, D., "Guidelines for Considering Operations and 1167 Management of New Protocols and Protocol Extensions", 1168 RFC 5706, DOI 10.17487/RFC5706, November 2009, 1169 . 1171 [RFC5865] Baker, F., Polk, J., and M. Dolly, "A Differentiated 1172 Services Code Point (DSCP) for Capacity-Admitted Traffic", 1173 RFC 5865, DOI 10.17487/RFC5865, May 2010, 1174 . 1176 [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP 1177 Authentication Option", RFC 5925, DOI 10.17487/RFC5925, 1178 June 2010, . 1180 [RFC6077] Papadimitriou, D., Ed., Welzl, M., Scharf, M., and B. 1181 Briscoe, "Open Research Issues in Internet Congestion 1182 Control", RFC 6077, DOI 10.17487/RFC6077, February 2011, 1183 . 1185 [RFC6660] Briscoe, B., Moncaster, T., and M. Menth, "Encoding Three 1186 Pre-Congestion Notification (PCN) States in the IP Header 1187 Using a Single Diffserv Codepoint (DSCP)", RFC 6660, 1188 DOI 10.17487/RFC6660, July 2012, 1189 . 1191 [RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe, 1192 "Problem Statement and Requirements for Increased Accuracy 1193 in Explicit Congestion Notification (ECN) Feedback", 1194 RFC 7560, DOI 10.17487/RFC7560, August 2015, 1195 . 1197 [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF 1198 Recommendations Regarding Active Queue Management", 1199 BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, 1200 . 1202 [RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) 1203 Concepts, Abstract Mechanism, and Requirements", RFC 7713, 1204 DOI 10.17487/RFC7713, December 2015, 1205 . 1207 [RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, 1208 "Proportional Integral Controller Enhanced (PIE): A 1209 Lightweight Control Scheme to Address the Bufferbloat 1210 Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, 1211 . 1213 [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., 1214 and G. 
Judd, "Data Center TCP (DCTCP): TCP Congestion 1215 Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, 1216 October 2017, . 1218 [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, 1219 J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler 1220 and Active Queue Management Algorithm", RFC 8290, 1221 DOI 10.17487/RFC8290, January 2018, 1222 . 1224 [RFC8298] Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation 1225 for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December 1226 2017, . 1228 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 1229 Notification (ECN) Experimentation", RFC 8311, 1230 DOI 10.17487/RFC8311, January 2018, 1231 . 1233 [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and 1234 R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", 1235 RFC 8312, DOI 10.17487/RFC8312, February 2018, 1236 . 1238 [Savage-TCP] 1239 Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, 1240 "TCP Congestion Control with a Misbehaving Receiver", ACM 1241 SIGCOMM Computer Communication Review 29(5):71--78, 1242 October 1999. 1244 [sub-mss-prob] 1245 Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion 1246 Window for Small Round Trip Times", BT Technical Report 1247 TR-TUB8-2015-002, May 2015, 1248 . 1250 [TCP-CA] Jacobson, V. and M. Karels, "Congestion Avoidance and 1251 Control", Lawrence Berkeley Labs Technical Report , 1252 November 1988, . 1254 [TCPPrague] 1255 Briscoe, B., "Notes: DCTCP evolution 'bar BoF': Tue 21 Jul 1256 2015, 17:40, Prague", tcpprague mailing list archive , 1257 July 2015, . 1260 [VCP] Xia, Y., Subramanian, L., Stoica, I., and S. Kalyanaraman, 1261 "One more bit is enough", Proc. SIGCOMM'05, ACM CCR 1262 35(4)37--48, 2005, 1263 . 1265 Appendix A. The 'Prague L4S Requirements' 1267 This appendix is informative, not normative.
It gives a list of 1268 modifications to current scalable congestion controls so that they 1269 can be deployed over the public Internet and coexist safely with 1270 existing traffic. The list complements the normative requirements in 1271 Section 4 that a sender has to comply with before it can set the L4S 1272 identifier in packets it sends into the Internet. As well as 1273 necessary safety improvements (requirements), this appendix also 1274 includes preferable performance improvements (optimizations). 1276 These recommendations have become known as the Prague L4S 1277 Requirements, because they were originally identified at an ad hoc 1278 meeting during IETF-94 in Prague [TCPPrague]. They were originally 1279 called the 'TCP Prague Requirements', but they are not solely 1280 applicable to TCP, so both the name and the wording have been 1281 generalized to apply to all scalable congestion controls, and TCP 1282 Prague is now used for a specific implementation of the 1283 requirements. 1285 At the time of writing, DCTCP [RFC8257] is the most widely used 1286 scalable transport protocol. In its current form, DCTCP is specified 1287 to be deployable only in controlled environments. Deploying it in 1288 the public Internet would lead to a number of issues, both from the 1289 safety and the performance perspective. The modifications and 1290 additional mechanisms listed in this section will be necessary for 1291 its deployment over the global Internet. Where an example is needed, 1292 DCTCP is used as a base, but it is likely that most of these 1293 requirements equally apply to other scalable congestion controls. 1295 A.1. Requirements for Scalable Transport Protocols 1297 A.1.1. Use of L4S Packet Identifier 1299 Description: A scalable congestion control needs to distinguish the 1300 packets it sends from those sent by Classic congestion controls.
1302 Motivation: It needs to be possible for a network node to classify 1303 L4S packets without flow state into a queue that applies an L4S ECN 1304 marking behaviour and isolates L4S packets from the queuing delay of 1305 Classic packets. 1307 A.1.2. Accurate ECN Feedback 1309 Description: The transport protocol for a scalable congestion control 1310 needs to provide timely, accurate feedback about the extent of ECN 1311 marking experienced by all packets. 1313 Motivation: Classic congestion controls only need feedback about the 1314 existence of a congestion episode within a round trip, not precisely 1315 how many packets were marked with ECN or dropped. Therefore, in 1316 2001, when ECN feedback was added to TCP [RFC3168], it could not 1317 inform the sender of more than one ECN mark per RTT. Since then, 1318 requirements for more accurate ECN feedback in TCP have been defined 1319 in [RFC7560] and [I-D.ietf-tcpm-accurate-ecn] specifies an 1320 experimental change to the TCP wire protocol to satisfy these 1321 requirements. Most other transport protocols already satisfy this 1322 requirement. 1324 A.1.3. Fall back to Reno-friendly congestion control on packet loss 1326 Description: As well as responding to ECN markings in a scalable way, 1327 a scalable congestion control needs to react to packet loss in a way 1328 that will coexist safely with a TCP Reno congestion control 1329 [RFC5681]. 1331 Motivation: Part of the safety conditions for deploying a scalable 1332 congestion control on the public Internet is to make sure that it 1333 behaves properly when it builds a queue at a network bottleneck that 1334 has not been upgraded to support L4S. Packet loss can have many 1335 causes, but it usually has to be conservatively assumed that it is a 1336 sign of congestion. Therefore, on detecting packet loss, a scalable 1337 congestion control will need to fall back to Classic congestion 1338 control behaviour. 
If it does not comply with this requirement, it 1339 could starve Classic traffic. 1341 A scalable congestion control can be used for different types of 1342 transport, e.g. for real-time media or for reliable transport like 1343 TCP. Therefore, the particular Classic congestion control behaviour 1344 to fall back on will need to be part of the congestion control 1345 specification of the relevant transport. In the particular case of 1346 DCTCP, the DCTCP specification [RFC8257] states that "It is 1347 RECOMMENDED that an implementation deal with loss episodes in the 1348 same way as conventional TCP." For safe deployment of a scalable 1349 congestion control in the public Internet, the above requirement 1350 would need to be defined as a "MUST". 1352 Even if a bottleneck is L4S capable, it might still become 1353 overloaded and have to drop packets. In this case, the sender may 1354 receive a high proportion of packets marked with the CE bit set and 1355 also experience loss. Current DCTCP implementations react 1356 differently to this situation. At least one implementation reacts 1357 only to the drop signal (e.g. by halving the CWND) and at least 1358 another DCTCP implementation reacts to both signals (e.g. by halving 1359 the CWND due to the drop and also further reducing the CWND based on 1360 the proportion of marked packets). A third approach for the public 1361 Internet has been proposed that adjusts the loss response to result 1362 in a halving when combined with the ECN response. We believe that 1363 further experimentation is needed to determine the best 1364 behaviour for the public Internet, which may or may not be one of these 1365 existing approaches. 1367 A.1.4. Fall back to Reno-friendly congestion control on classic ECN 1368 bottlenecks 1370 Description: A scalable congestion control needs to react to ECN 1371 marking from a non-L4S, but ECN-capable, bottleneck in a way that 1372 will coexist with a TCP Reno congestion control [RFC5681].
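Taken together with the loss requirement of Appendix A.1.3, the fall-back behaviour can be pictured with the sketch below. This is only an illustration, not a specification: `classic_bottleneck` stands in for a detector of Classic ECN marking, whose design is still an open issue (see the Motivation below), and `alpha` is a DCTCP-style moving average of the fraction of marked packets.

```python
def cwnd_after_feedback(cwnd: float, loss: bool, ce_marked: bool,
                        alpha: float, classic_bottleneck: bool) -> float:
    """Sketch of a scalable sender's congestion window response.

    - On packet loss, fall back to a Reno-friendly multiplicative
      decrease, whatever kind of bottleneck caused it.
    - On CE marks believed to come from a Classic (RFC 3168) AQM,
      also halve, so as to coexist with Reno flows in that queue.
    - On CE marks from an L4S AQM, reduce in proportion to the
      fraction of marked packets (alpha), as DCTCP does.
    """
    if loss or (ce_marked and classic_bottleneck):
        return cwnd / 2.0                # Classic (Reno-friendly) response
    if ce_marked:
        return cwnd * (1 - alpha / 2.0)  # scalable response
    return cwnd
```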
1374 Motivation: Similarly to the requirement in Appendix A.1.3, this 1375 requirement is a safety condition to ensure a scalable congestion 1376 control behaves properly when it builds a queue at a network 1377 bottleneck that has not been upgraded to support L4S. On detecting 1378 Classic ECN marking (see below), a scalable congestion control will 1379 need to fall back to Classic congestion control behaviour. If it 1380 does not comply with this requirement it could starve Classic 1381 traffic. 1383 It would take time for endpoints to distinguish Classic and L4S ECN 1384 marking. An increase in queuing delay or in delay variation would be 1385 a tell-tale sign, but it is not yet clear where a line would be drawn 1386 between the two behaviours. It might be possible to cache what was 1387 learned about the path to help subsequent attempts to detect the type 1388 of marking. 1390 A.1.5. Reduce RTT dependence 1392 Description: A scalable congestion control needs to reduce or 1393 eliminate RTT bias over as wide a range of RTTs as possible, or at 1394 least over the typical range of RTTs that will interact in the 1395 intended deployment scenario. 1397 Motivation: The throughput of Classic congestion controls is known to 1398 be inversely proportional to RTT, so one would expect flows over very 1399 low RTT paths to nearly starve flows over larger RTTs. However, 1400 Classic congestion controls have never allowed a very low RTT path to 1401 exist because they induce a large queue. For instance, consider two 1402 paths with base RTT 1ms and 100ms. If a Classic congestion control 1403 induces a 100ms queue, it turns these RTTs into 101ms and 200ms 1404 leading to a throughput ratio of about 2:1. Whereas if a scalable 1405 congestion control induces only a 1ms queue, the ratio is 2:101, 1406 leading to a throughput ratio of about 50:1. 
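The arithmetic in this example is easy to reproduce, under the standard assumption that Classic throughput is inversely proportional to RTT (the helper name below is purely illustrative):

```python
def throughput_ratio(base_rtt_short: float, base_rtt_long: float,
                     queue_delay: float) -> float:
    """Rate of the short-RTT flow relative to the long-RTT flow,
    assuming rate is inversely proportional to total RTT."""
    return (base_rtt_long + queue_delay) / (base_rtt_short + queue_delay)

# Base RTTs of 1 ms and 100 ms, as in the example above:
classic = throughput_ratio(1e-3, 100e-3, 100e-3)   # 100 ms Classic queue
scalable = throughput_ratio(1e-3, 100e-3, 1e-3)    # 1 ms L4S queue
# classic is about 2 (the '2:1' ratio); scalable is 50.5 ('about 50:1')
```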
1408 Therefore, with very small queues, long RTT flows will essentially 1409 starve, unless scalable congestion controls comply with this 1410 requirement. 1412 A.1.6. Scaling down to fractional congestion windows 1414 Description: A scalable congestion control needs to remain responsive 1415 to congestion when RTTs are significantly smaller than in the current 1416 public Internet. 1418 Motivation: As currently specified, the minimum required congestion 1419 window of TCP (and its derivatives) is set to 2 sender maximum 1420 segment sizes (SMSS) (see equation (4) in [RFC5681]). Once the 1421 congestion window reaches this minimum, all known window-based 1422 congestion control algorithms become unresponsive to congestion 1423 signals. No matter how much drop or ECN marking, the congestion 1424 window no longer reduces. Instead, the sender's lack of any further 1425 congestion response forces the queue to grow, overriding any AQM and 1426 increasing queuing delay. 1428 L4S mechanisms significantly reduce queueing delay so, over the same 1429 path, the RTT becomes lower. Then this problem becomes surprisingly 1430 common [sub-mss-prob]. This is because, for the same link capacity, 1431 smaller RTT implies a smaller window. For instance, consider a 1432 residential setting with an upstream broadband Internet access of 8 1433 Mb/s, assuming a max segment size of 1500 B. Two upstream flows will 1434 each have the minimum window of 2 SMSS if the RTT is 6ms or less, 1435 which is quite common when accessing a nearby data centre. So, any 1436 more than two such parallel TCP flows will become unresponsive and 1437 increase queuing delay. 1439 Unless scalable congestion controls are required to comply with this 1440 requirement from the start, they will frequently become unresponsive, 1441 negating the low latency benefit of L4S, for themselves and for 1442 others. 
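The broadband example above can be checked from the bandwidth-delay product (the helper name is illustrative only):

```python
def window_segments(link_bps: float, rtt_s: float, mss_bytes: int,
                    flows: int) -> float:
    """Per-flow congestion window, in segments, that just fills the link."""
    bdp_bytes = link_bps * rtt_s / 8          # bits -> bytes
    return bdp_bytes / (mss_bytes * flows)

# 8 Mb/s upstream, 6 ms RTT, 1500 B segments, two flows:
w = window_segments(8e6, 6e-3, 1500, 2)
# w evaluates to 2 segments per flow: each flow is already at TCP's
# minimum window of 2 SMSS, so any further flow, or any congestion
# signal, cannot reduce the window; the queue grows instead.
```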
One possible sub-MSS window mechanism is described in 1443 [Ahmed19], and other approaches are likely to be feasible. 1445 A.1.7. Measuring Reordering Tolerance in Time Units 1447 Description: A scalable congestion control needs to detect loss by 1448 counting in time-based units, which is scalable, rather than counting 1449 in units of packets, which is not. 1451 Motivation: A primary purpose of L4S is scalable throughput (it's in 1452 the name). Scalability in all dimensions is, of course, also a goal 1453 of all IETF technology. The inverse linear congestion response in 1454 Section 4.3 is necessary, but not sufficient, to solve the congestion 1455 control scalability problem identified in [RFC3649]. As well as 1456 maintaining frequent ECN signals as rate scales, it is also important 1457 to ensure that a potentially false perception of loss does not limit 1458 throughput scaling. 1460 End-systems cannot know whether a missing packet is due to loss or 1461 reordering, except in hindsight - if it appears later. So they can 1462 only deem that there has been a loss if a gap in the sequence space 1463 has not been filled, either after a certain number of subsequent 1464 packets has arrived (e.g. the 3 DupACK rule of standard TCP 1465 congestion control [RFC5681]) or after a certain amount of time (e.g. 1466 the experimental RACK approach [I-D.ietf-tcpm-rack]). 1468 As we attempt to scale packet rate over the years: 1470 o Even if only _some_ sending hosts still deem that loss has 1471 occurred by counting reordered packets, _all_ networks will have 1472 to keep reducing the time over which they keep packets in order. 1473 If some link technologies keep the time within which reordering 1474 occurs roughly unchanged, then loss over these links, as perceived 1475 by these hosts, will appear to continually rise over the years. 
1477 o In contrast, if all senders detect loss in units of time, the time 1478 over which the network has to keep packets in order stays roughly 1479 invariant. 1481 Therefore hosts have an incentive to detect loss in time units (so as 1482 not to fool themselves too often into detecting losses when there are 1483 none). And for hosts that are changing their congestion control 1484 implementation to L4S, there is no downside to including time-based 1485 loss detection code in the change (loss recovery implemented in 1486 hardware is an exception, covered later). Therefore requiring L4S 1487 hosts to detect loss in time-based units would not be a burden. 1489 If this requirement is not placed on L4S hosts, even though it would 1490 be no burden on them to do so, all networks will face unnecessary 1491 uncertainty over whether some L4S hosts might be detecting loss by 1492 counting packets. Then _all_ link technologies will have to 1493 unnecessarily keep reducing the time within which reordering occurs. 1494 That is not a problem for some link technologies, but it becomes 1495 increasingly challenging for other link technologies to continue to 1496 scale, particularly those relying on channel bonding for scaling, 1497 such as LTE, 5G and DOCSIS. 1499 Given Internet paths traverse many link technologies, any scaling 1500 limit for these more challenging access link technologies would 1501 become a scaling limit for the Internet as a whole. 1503 It might be asked how it helps to place this loss detection 1504 requirement only on L4S hosts, because networks will still face 1505 uncertainty over whether non-L4S flows are detecting loss by counting 1506 DupACKs. The answer is that those link technologies for which it is 1507 challenging to keep squeezing the reordering time will only need to 1508 do so for non-L4S traffic (which they can do because the L4S 1509 identifier is visible at the IP layer). 
Therefore, they can focus 1510 their processing and memory resources into scaling non-L4S (Classic) 1511 traffic. Then, the higher the proportion of L4S traffic, the less of 1512 a scaling challenge they will have. 1514 To summarize, there is no reason for L4S hosts not to be part of the 1515 solution instead of part of the problem. 1517 Requirement ("MUST") or recommendation ("SHOULD")? As explained 1518 above, this is a subtle interoperability issue between hosts and 1519 networks, which seems to need a "MUST". Unless networks can be 1520 certain that all L4S hosts follow the time-based approach, they still 1521 have to cater for the worst case - continually squeeze reordering 1522 into a smaller and smaller duration - just for hosts that might be 1523 using the counting approach. However, it was decided to express this 1524 as a recommendation, using "SHOULD". The main justification was that 1525 networks can still be fairly certain that L4S hosts will follow this 1526 recommendation, because following it offers only gain and no pain. 1528 Details: 1530 The speed of loss recovery is much more significant for short flows 1531 than for long ones, so a good compromise is to adapt the reordering 1532 window: from a small fraction of the RTT at the start of a flow to a 1533 larger fraction of the RTT for flows that continue for many round 1534 trips. 1536 This is broadly the approach adopted by TCP RACK (Recent 1537 ACKnowledgements) [I-D.ietf-tcpm-rack]. However, RACK starts with 1538 the 3 DupACK approach, because the RTT estimate is not necessarily 1539 stable. As long as the initial window is paced, such initial use of 1540 3 DupACK counting would amount to time-based loss detection and 1541 therefore would satisfy the time-based loss detection recommendation 1542 of Section 4.3. This is because pacing of the initial window would 1543 ensure that 3 DupACKs early in the connection would be spread over a 1544 small fraction of the round trip.
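The adaptive reordering window described above can be sketched as follows. The specific fractions (1/8 of the smoothed RTT early in a flow, 1/2 later) and the four-round threshold are invented here purely for illustration; RACK [I-D.ietf-tcpm-rack] specifies the real mechanism.

```python
def deemed_lost(send_time: float, newest_acked_send_time: float,
                srtt: float, rounds_elapsed: int) -> bool:
    """Time-based loss detection sketch.

    A gap is deemed a loss once a packet sent sufficiently later than
    the missing one has been ACKed.  The reordering window grows from
    a small fraction of the RTT for young flows to a larger fraction
    for long-running ones, so short flows still recover quickly while
    long flows tolerate more reordering.
    """
    fraction = 0.125 if rounds_elapsed < 4 else 0.5  # illustrative values
    reordering_window = fraction * srtt
    return newest_acked_send_time - send_time > reordering_window
```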
1546 As mentioned above, hardware implementations of loss recovery using 1547 DupACK counting exist (e.g. some implementations of RoCEv2 for RDMA). 1548 For low latency, these implementations can change their congestion 1549 control to implement L4S, because the congestion control (as distinct 1550 from loss recovery) is implemented in software. But they cannot 1551 easily satisfy this loss recovery requirement. However, it is 1552 believed they do not need to. It is believed that such 1553 implementations solely exist in controlled environments, where the 1554 network technology keeps reordering extremely low anyway. This is 1555 why the scope of the normative recommendation in Section 4.3 is 1556 limited to 'reordering-prone' networks. 1558 Detecting loss in time units also prevents the ACK-splitting attacks 1559 described in [Savage-TCP]. 1561 A.2. Scalable Transport Protocol Optimizations 1563 A.2.1. Setting ECT in TCP Control Packets and Retransmissions 1565 Description: This item only concerns TCP and its derivatives (e.g. 1566 SCTP), because the original specification of ECN for TCP precluded 1567 the use of ECN on control packets and retransmissions. To improve 1568 performance, scalable transport protocols ought to enable ECN at the 1569 IP layer in TCP control packets (SYN, SYN-ACK, pure ACKs, etc.) and 1570 in retransmitted packets. The same is true for derivatives of TCP, 1571 e.g. SCTP. 1573 Motivation: RFC 3168 prohibits the use of ECN on these types of TCP 1574 packet, based on a number of arguments. This means these packets are 1575 not protected from congestion loss by ECN, which considerably harms 1576 performance, particularly for short flows. 1577 [I-D.ietf-tcpm-generalized-ecn] counters each argument in RFC 3168 in 1578 turn, showing it was over-cautious. 
Instead it proposes experimental 1579 use of ECN on all types of TCP packet as long as AccECN feedback 1580 [I-D.ietf-tcpm-accurate-ecn] is available (which is itself a 1581 prerequisite for using a scalable congestion control). 1583 A.2.2. Faster than Additive Increase 1585 Description: It would improve performance if scalable congestion 1586 controls did not limit their congestion window increase to the 1587 standard additive increase of 1 SMSS per round trip [RFC5681] during 1588 congestion avoidance. The same is true for derivatives of TCP 1589 congestion control, including similar approaches used for real-time 1590 media. 1592 Motivation: As currently defined [RFC8257], DCTCP uses the 1593 traditional TCP Reno additive increase in congestion avoidance phase. 1594 When the available capacity suddenly increases (e.g. when another 1595 flow finishes, or if radio capacity increases) it can take very many 1596 round trips to take advantage of the new capacity. TCP Cubic was 1597 designed to solve this problem, but as flow rates have continued to 1598 increase, the delay accelerating into available capacity has become 1599 prohibitive. For instance, with RTT=20 ms, to increase flow rate 1600 from 100Mb/s to 200Mb/s Cubic takes between 50 and 100 round trips. 1601 Every 8x increase in flow rate leads to 2x more acceleration delay. 1603 In the steady state, DCTCP induces about 2 ECN marks per round trip, 1604 so it is possible to quickly detect when these signals have 1605 disappeared and seek available capacity more rapidly, while 1606 minimizing the impact on other flows (Classic and scalable) 1607 [LinuxPacedChirping]. Alternatively, approaches such as Adaptive 1608 Acceleration (A2DTCP [A2DTCP]) have been proposed to address this 1609 problem in data centres, which might be deployable over the public 1610 Internet. 1612 A.2.3. 
Faster Convergence at Flow Start 1614 Description: Particularly when a flow starts, scalable congestion 1615 controls need to converge (reach their steady-state share of the 1616 capacity) at least as fast as Classic congestion controls and 1617 preferably faster. This affects the flow start behaviour of any L4S 1618 congestion control derived from a Classic transport that uses TCP 1619 slow start, including those for real-time media. 1621 Motivation: As an example, a new DCTCP flow takes longer than a 1622 Classic congestion control to obtain its share of the capacity of the 1623 bottleneck when there are already ongoing flows using the bottleneck 1624 capacity. In a data centre environment, DCTCP takes about a factor of 1625 1.5 to 2 longer to converge due to the much higher typical level of 1626 ECN marking that DCTCP background traffic induces, which causes new 1627 flows to exit slow start early [Alizadeh-stability]. In testing for 1628 use over the public Internet, the convergence time of DCTCP relative 1629 to a regular loss-based TCP slow start is even less favourable 1630 [Paced-Chirping] due to the shallow ECN marking threshold needed for 1631 L4S. It is exacerbated by the typically greater mismatch between the 1632 link rate of the sending host and typical Internet access 1633 bottlenecks. This problem is detrimental in general, but would 1634 particularly harm the performance of short flows relative to Classic 1635 congestion controls. 1637 Appendix B. Alternative Identifiers 1639 This appendix is informative, not normative. It records the pros and 1640 cons of various alternative ways to identify L4S packets, in order to 1641 explain the rationale for the choice of ECT(1) (Appendix B.1) as the L4S 1642 identifier. At the end, Appendix B.6 summarises the distinguishing 1643 features of the leading alternatives. It is intended to supplement, 1644 not replace, the detailed text.
1646 The leading solutions all use the ECN field, sometimes in combination 1647 with the Diffserv field. This is because L4S traffic has to indicate 1648 that it is ECN-capable anyway, given that ECN is intrinsic to how L4S 1649 works. Both the ECN and Diffserv fields have the additional 1650 advantage that they are the same in IPv4 and IPv6. A 1651 couple of alternatives that use other fields are mentioned at the 1652 end, but it is quickly explained why they are not serious contenders. 1654 B.1. ECT(1) and CE codepoints 1656 Definition: 1658 Packets with ECT(1) and conditionally packets with CE would 1659 signify L4S semantics as an alternative to the semantics of 1660 Classic ECN [RFC3168], specifically: 1662 * The ECT(1) codepoint would signify that the packet was sent by 1663 an L4S-capable sender. 1665 * Given the shortage of codepoints, both L4S and Classic ECN sides of 1666 an AQM would have to use the same CE codepoint to indicate that 1667 a packet had experienced congestion. If a packet that had 1668 already been marked CE in an upstream buffer arrived at a 1669 subsequent AQM, this AQM would then have to guess whether to 1670 classify CE packets as L4S or Classic ECN. Choosing the L4S 1671 treatment would be a safer choice, because then a few Classic 1672 packets might arrive early, rather than a few L4S packets 1673 arriving late. 1675 * Additional information might be available if the classifier 1676 were transport-aware. Then it could classify a CE packet for 1677 Classic ECN treatment if the most recent ECT packet in the same 1678 flow had been marked ECT(0). However, the L4S service ought 1679 not to need transport-layer awareness. 1681 Cons: 1683 Consumes the last ECN codepoint: The L4S service is intended to 1684 supersede the service provided by Classic ECN; therefore, using 1685 ECT(1) to identify L4S packets could ultimately mean that the 1686 ECT(0) codepoint was 'wasted' purely to distinguish one form of 1687 ECN from its successor.
1689 ECN hard in some lower layers: It is not always possible to support 1690 ECN in an AQM acting in a buffer below the IP layer 1691 [I-D.ietf-tsvwg-ecn-encap-guidelines]. In such cases, the L4S 1692 service would have to drop rather than mark frames even though 1693 they might encapsulate an ECN-capable packet. However, such cases 1694 would be unusual. 1696 Risk of reordering Classic CE packets: Classifying all CE packets 1697 into the L4S queue risks any CE packets that were originally 1698 ECT(0) being incorrectly classified as L4S. If there were delay 1699 in the Classic queue, these incorrectly classified CE packets 1700 would arrive early, which is a form of reordering. Reordering can 1701 cause TCP senders (and senders of similar transports) to 1702 retransmit spuriously. However, the risk of spurious 1703 retransmissions would be extremely low for the following reasons: 1705 1. It is quite unusual to experience queuing at more than one 1706 bottleneck on the same path (the available capacities have to 1707 be identical). 1709 2. In only a subset of these unusual cases would the first 1710 bottleneck support Classic ECN marking while the second 1711 supported L4S ECN marking, which would be the only scenario 1712 where some ECT(0) packets could be CE marked by an AQM 1713 supporting Classic ECN then the remainder experienced further 1714 delay through the Classic side of a subsequent L4S DualQ AQM. 1716 3. Even then, when a few packets are delivered early, it takes 1717 very unusual conditions to cause a spurious retransmission, in 1718 contrast to when some packets are delivered late. The first 1719 bottleneck has to apply CE-marks to at least N contiguous 1720 packets and the second bottleneck has to inject an 1721 uninterrupted sequence of at least N of these packets between 1722 two packets earlier in the stream (where N is the reordering 1723 window that the transport protocol allows before it considers 1724 a packet is lost). 
1726 For example consider N=3, and consider the sequence of 1727 packets 100, 101, 102, 103,... and imagine that packets 1728 150,151,152 from later in the flow are injected as follows: 1729 100, 150, 151, 101, 152, 102, 103... If this were late 1730 reordering, even one packet arriving 50 out of sequence 1731 would trigger a spurious retransmission, but there is no 1732 spurious retransmission here, with early reordering, 1733 because packet 101 moves the cumulative ACK counter forward 1734 before 3 packets have arrived out of order. Later, when 1735 packets 148, 149, 153... arrive, even though there is a 1736 3-packet hole, there will be no problem, because the 1737 packets to fill the hole are already in the receive buffer. 1739 4. Even with the current TCP recommendation of N=3 [RFC5681] 1740 spurious retransmissions will be unlikely for all the above 1741 reasons. As RACK [I-D.ietf-tcpm-rack] is becoming widely 1742 deployed, it tends to adapt its reordering window to a larger 1743 value of N, which will make the chance of a contiguous 1744 sequence of N early arrivals vanishingly small. 1746 5. Even a run of 2 CE marks within a Classic ECN flow is 1747 unlikely, given FQ-CoDel is the only known widely deployed AQM 1748 that supports Classic ECN marking and it takes great care to 1749 separate out flows and to space any markings evenly along each 1750 flow. 1752 It is extremely unlikely that the above set of 5 eventualities 1753 that are each unusual in themselves would all happen 1754 simultaneously. But, even if they did, the consequences would 1755 hardly be dire: the odd spurious fast retransmission. Admittedly 1756 TCP (and similar transports) reduce their congestion window when 1757 they deem there has been a loss, but even this can be recovered 1758 once the sender detects that the retransmission was spurious. 
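The early-reordering example above can be checked by simulating a greatly simplified receiver and DupACK counter (every arrival triggers an ACK; no delayed ACKs, SACK or RACK; `max_dupacks` is a hypothetical helper, not real TCP code):

```python
def max_dupacks(arrivals, first_expected=100):
    """Longest run of duplicate ACKs a cumulative-ACK receiver would
    generate for the given arrival order of sequence numbers."""
    received = set()
    cum = first_expected - 1   # highest in-order sequence number so far
    dupacks = worst = 0
    for seq in arrivals:
        received.add(seq)
        advanced = False
        while cum + 1 in received:
            cum += 1
            advanced = True
        if advanced:
            dupacks = 0        # cumulative ACK moves forward
        else:
            dupacks += 1       # duplicate ACK
            worst = max(worst, dupacks)
    return worst

# Early reordering from the example: 150-152 injected among 100-103.
early = max_dupacks([100, 150, 151, 101, 152, 102, 103])   # 2, below N=3
# Late arrival for contrast: packet 100 arrives after 101-103.
late = max_dupacks([101, 102, 103, 100])                   # 3, so N=3 fires
```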
1760 Non-L4S service for control packets: The Classic ECN RFCs [RFC3168] 1761 and [RFC5562] require a sender to clear the ECN field to Not-ECT 1762 for retransmissions and certain control packets, specifically pure 1763 ACKs, window probes and SYNs. When L4S packets are classified by 1764 the ECN field alone, these control packets would not be classified 1765 into an L4S queue, and could therefore be delayed relative to the 1766 other packets in the flow. This would not cause re-ordering 1767 (because retransmissions are already out of order, and the control 1768 packets carry no data). However, it would make critical control 1769 packets more vulnerable to loss and delay. To address this 1770 problem, [I-D.ietf-tcpm-generalized-ecn] proposes an experiment in 1771 which all TCP control packets and retransmissions are ECN-capable 1772 as long as ECN feedback is available. 1774 Pros: 1776 Should work e2e: The ECN field generally works end-to-end across the 1777 Internet. Unlike the DSCP, the setting of the ECN field is at 1778 least forwarded unchanged by networks that do not support ECN, and 1779 networks rarely clear it to zero. 1781 Should work in tunnels: Unlike Diffserv, ECN is defined to always 1782 work across tunnels. However, tunnels do not always implement ECN 1783 processing as they should, particularly because IPsec tunnels 1784 were defined differently for a few years. 1786 Could migrate to one codepoint: If all Classic ECN senders 1787 eventually evolve to use the L4S service, the ECT(0) codepoint 1788 could be reused for some future purpose, but only once use of 1789 ECT(0) packets had reduced to zero, or near-zero, which might 1790 never happen. 1792 B.2.
ECN Plus a Diffserv Codepoint (DSCP) 1794 Definition: 1796 For packets with a defined DSCP, all codepoints of the ECN field 1797 (except Not-ECT) would signify alternative L4S semantics to those 1798 for Classic ECN [RFC3168], specifically: 1800 * The L4S DSCP would signify that the packet came from an 1801 L4S-capable sender. 1803 * ECT(0) and ECT(1) would both signify that the packet was 1804 travelling between transport endpoints that were both 1805 ECN-capable. 1807 * CE would signify that the packet had been marked by an AQM 1808 implementing the L4S service. 1810 Use of a DSCP is the only approach for alternative ECN semantics 1811 given as an example in [RFC4774]. However, it was perhaps considered 1812 more for controlled environments than for new end-to-end services. 1814 Cons: 1816 Consumes DSCP pairs: A DSCP is obviously not orthogonal to Diffserv. 1817 Therefore, wherever the L4S service is applied to multiple 1818 Diffserv scheduling behaviours, it would be necessary to replace 1819 each DSCP with a pair of DSCPs. 1821 Uses critical lower-layer header space: The resulting increased 1822 number of DSCPs might be hard to support for some lower layer 1823 technologies, e.g. 802.1p and MPLS both offer only 3 bits for a 1824 maximum of 8 traffic class identifiers. Although L4S should 1825 reduce and possibly remove the need for some DSCPs intended for 1826 differentiated queuing delay, it will not remove the need for 1827 Diffserv entirely, because Diffserv is also used to allocate 1828 bandwidth, e.g. by prioritising some classes of traffic over 1829 others when traffic exceeds available capacity. 1831 Not end-to-end (host-network): Very few networks honour a DSCP set 1832 by a host. Typically, a network will zero (bleach) the Diffserv 1833 field from all hosts.
Sometimes networks will attempt to identify 1834 applications by some form of packet inspection and, based on 1835 network policy, they will set the DSCP considered appropriate for 1836 the identified application. Network-based application 1837 identification might use some combination of protocol ID, port 1838 number(s), application layer protocol headers, IP address(es), 1839 VLAN ID(s) and even packet timing. 1841 Not end-to-end (network-network): Very few networks honour a DSCP 1842 received from a neighbouring network. Typically, a network will 1843 zero (bleach) the Diffserv field from all neighbouring networks at 1844 an interconnection point. Sometimes bilateral arrangements are 1845 made between networks, such that the receiving network remarks 1846 some DSCPs to those it uses for roughly equivalent services. The 1847 likelihood that a DSCP will be bleached or ignored depends on the 1848 type of DSCP: 1850 Local-use DSCP: These tend to be used to implement 1851 application-specific network policies, but a bilateral arrangement to 1852 remark certain DSCPs is often applied to DSCPs in the local-use 1853 range simply because it is easier not to change all of a 1854 network's internal configurations when a new arrangement is 1855 made with a neighbour. 1857 Recommended standard DSCP: These do not tend to be honoured 1858 across network interconnections any more than local-use DSCPs. 1859 However, if two networks decide to honour certain of each 1860 other's DSCPs, the reconfiguration is a little easier if both 1861 of their globally recognised services are already represented 1862 by the relevant recommended standard DSCPs. 1864 Note that today a recommended standard DSCP gives little more 1865 assurance of end-to-end service than a local-use DSCP.
In 1866 future, the range recommended as standard might give more 1867 assurance of end-to-end service than the local-use range, but it is 1868 unlikely that either assurance will be high, particularly given 1869 that the hosts are included in the end-to-end path. 1871 Not all tunnels: Diffserv codepoints are often not propagated to the 1872 outer header when a packet is encapsulated by a tunnel header. 1873 DSCPs are propagated to the outer header of uniform mode tunnels, but not 1874 of pipe mode [RFC2983], and pipe mode is fairly common. 1876 ECN hard in some lower layers: Because this approach uses both the 1877 Diffserv and ECN fields, an AQM will only work at a lower layer if 1878 both can be supported. If individual network operators wished to 1879 deploy an AQM at a lower layer, they would usually propagate an IP 1880 Diffserv codepoint to the lower layer, using, for example, IEEE 1881 802.1p. However, the ECN capability is harder to propagate down 1882 to lower layers because few lower layers support it. 1884 Pros: 1886 Could migrate to e2e: If all usage of Classic ECN migrates to usage 1887 of L4S, the DSCP would become redundant, and the ECN capability 1888 alone could eventually identify L4S packets without the 1889 interconnection problems of Diffserv detailed above, and without 1890 having permanently consumed more than one codepoint in the IP 1891 header. Although the DSCP does not generally function as an 1892 end-to-end identifier (see above), it could be used initially by 1893 individual ISPs to introduce the L4S service for their own locally 1894 generated traffic. 1896 B.3. ECN capability alone 1898 This approach uses ECN capability alone as the L4S identifier. It 1899 would only have been feasible if RFC 3168 ECN had not been widely 1900 deployed. This was the case when the choice of L4S identifier was 1901 being made and this appendix was first written. Since then, RFC 3168 1902 ECN has been widely deployed, and L4S did not take this approach 1903 anyway.
So this approach is not discussed further, because it is no longer a feasible option.

B.4.  Protocol ID

It has been suggested that a new ID in the IPv4 Protocol field or the IPv6 Next Header field could identify L4S packets. However, this approach is ruled out by numerous problems:

o  A new protocol ID would need to be paired with the old one for each transport (TCP, SCTP, UDP, etc.).

o  In IPv6, there can be a sequence of Next Header fields, and it would not be obvious which one would be expected to identify a network service like L4S.

o  A new protocol ID would rarely provide an end-to-end service, because it is well known that new protocol IDs are often blocked by numerous types of middlebox.

o  The approach is not a solution for AQM methods below the IP layer.

B.5.  Source or destination addressing

Locally, a network operator could arrange for the L4S service to be applied based on source or destination addressing, e.g. packets from its own data centre and/or CDN hosts, packets to its business customers, etc. It could use addressing at any layer, e.g. IP addresses, MAC addresses, VLAN IDs, etc. Although addressing might be a useful tactical approach for a single ISP, it would not be a feasible approach to identify an end-to-end service like L4S. Even for a single ISP, it would require packet classifiers in buffers to depend on changing topology and address allocation decisions made elsewhere in the network. Therefore this approach is not a feasible solution.

B.6.  Summary: Merits of Alternative Identifiers

Table 1 provides a very high level summary of the pros and cons detailed against the schemes described respectively in Appendix B.2 and Appendix B.1, for six issues that set them apart.
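For concreteness, the packet classification that each of the two schemes implies at a network node can be sketched in a few lines. This is an illustrative sketch only, not part of this specification: the DSCP value is a hypothetical assumption (no codepoint has been assigned for this purpose), the ECN codepoint values follow RFC 3168, and classifying CE together with ECT(1) follows the approach of the coupled dual-queue AQM [I-D.ietf-tsvwg-aqm-dualq-coupled].

```python
# Illustrative sketch of the classification implied by the two schemes
# compared in Table 1. ECN codepoints per RFC 3168; the DSCP value is
# a hypothetical assumption, not an assigned codepoint.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

L4S_DSCP = 0x2A  # hypothetical codepoint agreed for the L4S service


def classify_dscp_plus_ecn(dscp: int, ecn: int) -> bool:
    """'DSCP + ECN' scheme (Appendix B.2): a packet is treated as L4S
    only if it carries both the agreed DSCP and an ECN-capable or CE
    codepoint, so the scheme depends on the DSCP surviving the path."""
    return dscp == L4S_DSCP and ecn != NOT_ECT


def classify_ect1_plus_ce(ecn: int) -> bool:
    """'ECT(1) + CE' scheme (Appendix B.1): ECT(1) identifies L4S, and
    CE is classified with L4S so that packets already marked by an
    upstream node stay in the low latency queue."""
    return ecn in (ECT1, CE)


# A Classic ECN packet (ECT(0)) matches neither L4S classifier.
assert not classify_ect1_plus_ce(ECT0)
assert classify_ect1_plus_ce(ECT1) and classify_ect1_plus_ce(CE)
```

Note how the first scheme needs the DSCP to survive interconnections and tunnels, whereas the second needs only the ECN field, which is the essence of the 'end-to-end' and 'tunnels' rows of Table 1.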
+--------------+--------------------+--------------------+
|    Issue     |     DSCP + ECN     |    ECT(1) + CE     |
+--------------+--------------------+--------------------+
|              |  initial eventual  |  initial eventual  |
|              |                    |                    |
| end-to-end   |   N . .    . ? .   |   . . Y    . . Y   |
| tunnels      |   . O .    . O .   |   . . ?    . . Y   |
| lower layers |   N . .    . ? .   |   . O .    . . ?   |
| codepoints   |   N . .    . . ?   |   N . .    . . ?   |
| reordering   |   . . Y    . . Y   |   . O .    . . ?   |
| ctrl pkts    |   . . Y    . . Y   |   . O .    . . ?   |
+--------------+--------------------+--------------------+

Table 1: Comparison of the Merits of Two Alternative Identifiers

The schemes are scored based on both their capabilities now ('initial') and in the long term ('eventual'). The scores are one of 'N, O, Y', meaning 'Poor', 'Ordinary', 'Good' respectively. The same scores are aligned vertically to aid the eye. A score of "?" in one of the positions means that this approach might optimistically become this good, given sufficient effort. The table summarises the text and is not meant to be understandable without having read the text.

Appendix C.  Potential Competing Uses for the ECT(1) Codepoint

The ECT(1) codepoint of the ECN field has already been assigned once for the ECN nonce [RFC3540], which has now been categorized as historic [RFC8311]. ECN is probably the only remaining field in the Internet Protocol that is common to IPv4 and IPv6 and still has potential to work end-to-end, with tunnels and with lower layers. Therefore, ECT(1) should not be reassigned to a different experimental use (L4S) without carefully assessing competing potential uses. These fall into the following categories:
C.1.  Integrity of Congestion Feedback

Receiving hosts can fool a sender into downloading faster by suppressing feedback of ECN marks (or of losses if retransmissions are not necessary or available otherwise).

The historic ECN nonce protocol [RFC3540] proposed that a TCP sender could set either of ECT(0) or ECT(1) in each packet of a flow and remember the sequence it had set. If any packet was lost or congestion marked, the receiver would miss that bit of the sequence. An ECN nonce receiver had to feed back the least significant bit of the sum, so it could not suppress feedback of a loss or mark without a 50-50 chance of guessing the sum incorrectly.

It is highly unlikely that ECT(1) will be needed for integrity protection in future. The ECN nonce RFC [RFC3540] has been reclassified as historic, partly because other ways have been developed to protect the feedback integrity of TCP and other transports [RFC8311] that do not consume a codepoint in the IP header. For instance:

o  The sender can test the integrity of the receiver's feedback by occasionally setting the IP-ECN field to a value normally only set by the network. Then it can test whether the receiver's feedback faithfully reports what it expects (see para 2 of Section 20.2 of [RFC3168]). This works for loss and it will work for the accurate ECN feedback [RFC7560] intended for L4S.

o  A network can enforce a congestion response to its ECN markings (or packet losses) by auditing congestion exposure (ConEx) [RFC7713]. Whether the receiver or a downstream network is suppressing congestion feedback, or the sender is unresponsive to the feedback, or both, ConEx audit can neutralise any advantage that any of these three parties would otherwise gain.
o  The TCP authentication option (TCP-AO [RFC5925]) can be used to detect any tampering with TCP congestion feedback (whether malicious or accidental). TCP's congestion feedback fields are immutable end-to-end, so they are amenable to TCP-AO protection, which covers the main TCP header and TCP options by default. However, TCP-AO is often too brittle to use on many end-to-end paths, where middleboxes can make verification fail in their attempts to improve performance or security, e.g. by resegmentation or shifting the sequence space.

C.2.  Notification of Less Severe Congestion than CE

Various researchers have proposed to use ECT(1) as a less severe congestion notification than CE, particularly to enable flows to fill available capacity more quickly after an idle period, when another flow departs or when a flow starts, e.g. VCP [VCP], Queue View (QV) [QV].

Before assigning ECT(1) as an identifier for L4S, we must carefully consider whether it might be better to hold ECT(1) in reserve for future standardisation of rapid flow acceleration, which is an important and enduring problem [RFC6077].

Pre-Congestion Notification (PCN) is another scheme that assigns alternative semantics to the ECN field. It uses ECT(1) to signify a less severe level of pre-congestion notification than CE [RFC6660]. However, the ECN field only takes on the PCN semantics if packets carry a Diffserv codepoint defined to indicate PCN marking within a controlled environment. PCN is required to be applied solely to the outer header of a tunnel across the controlled region in order not to interfere with any end-to-end use of the ECN field. Therefore a PCN region on the path would not interfere with any of the L4S service identifiers proposed in Appendix B.
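As a historical aside, the nonce-sum check described in Appendix C.1 can be sketched in a few lines. This is an illustrative model of the RFC 3540 mechanism, not an implementation: in the real protocol the one-bit nonces were carried in the choice between ECT(0) and ECT(1), and the sum ran over the whole flow. A CE mark (or a loss) destroys one nonce bit, so a receiver that conceals the event has to guess that bit, and guesses wrongly about half the time.

```python
# Illustrative model of the historic ECN nonce-sum check (RFC 3540).
# Each packet carries a one-bit nonce; the receiver must feed back the
# least significant bit of the sum of the nonces it received.
import random


def nonce_sum(nonces):
    """Least significant bit of the sum of one-bit nonces (mod 2)."""
    return sum(nonces) % 2


def cheating_report(nonces, delivered):
    """A receiver that conceals a loss or CE mark claims everything
    arrived, but the concealed event destroyed one nonce bit, so it
    must guess that bit when reporting the sum."""
    known = sum(n for n, ok in zip(nonces, delivered) if ok) % 2
    return (known + random.randint(0, 1)) % 2  # guess the hidden bit


random.seed(3)
nonces = [random.randint(0, 1) for _ in range(10)]
delivered = [True] * 9 + [False]   # one packet lost or CE-marked

# The cheat is caught whenever its guessed sum differs from the
# sender's record of the full sum -- roughly half the time.
caught = sum(cheating_report(nonces, delivered) != nonce_sum(nonces)
             for _ in range(1000))
print(f"cheat detected in {caught}/1000 trials")
```

An honest receiver simply reports the sum of the nonces it actually received, which always matches the sender's record for those packets; only concealment forces a guess.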
Authors' Addresses

   Koen De Schepper
   Nokia Bell Labs
   Antwerp
   Belgium

   Email: koen.de_schepper@nokia.com
   URI:   https://www.bell-labs.com/usr/koen.de_schepper

   Bob Briscoe (editor)
   Independent
   UK

   Email: ietf@bobbriscoe.net
   URI:   http://bobbriscoe.net/