Transport Area Working Group                             B. Briscoe, Ed.
Internet-Draft                                               Independent
Intended status: Informational                            K. De Schepper
Expires: January 2, 2022                                 Nokia Bell Labs
                                                        M. Bagnulo Braun
                                        Universidad Carlos III de Madrid
                                                                G. White
                                                               CableLabs
                                                            July 1, 2021

   Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service:
                              Architecture
                      draft-ietf-tsvwg-l4s-arch-10

Abstract

This document describes the L4S architecture, which enables Internet applications to achieve Low queuing Latency, Low Loss, and Scalable throughput (L4S). The insight on which L4S is based is that the root cause of queuing delay is in the congestion controllers of senders, not in the queue itself. The L4S architecture is intended to enable _all_ Internet applications to transition away from congestion control algorithms that cause queuing delay, to a new class of congestion controls that induce very little queuing, aided by explicit congestion signaling from the network. This new class of congestion control can provide low latency for capacity-seeking flows, so applications can achieve both high bandwidth and low latency.

The architecture primarily concerns incremental deployment. It defines mechanisms that allow the new class of L4S congestion controls to coexist with 'Classic' congestion controls in a shared network.
These mechanisms aim to ensure that the latency and throughput performance using an L4S-compliant congestion controller is usually much better (and never worse) than the performance would have been using a 'Classic' congestion controller, and that competing flows continuing to use 'Classic' controllers are typically not impacted by the presence of L4S. These characteristics are important to encourage adoption of L4S congestion control algorithms and L4S compliant network elements.

The L4S architecture consists of three components: network support to isolate L4S traffic from classic traffic; protocol features that allow network elements to identify L4S traffic; and host support for L4S congestion controls.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 2, 2022.

Copyright Notice

Copyright (c) 2021 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  L4S Architecture Overview
   3.  Terminology
   4.  L4S Architecture Components
     4.1.  Protocol Mechanisms
     4.2.  Network Components
     4.3.  Host Mechanisms
   5.  Rationale
     5.1.  Why These Primary Components?
     5.2.  What L4S adds to Existing Approaches
   6.  Applicability
     6.1.  Applications
     6.2.  Use Cases
     6.3.  Applicability with Specific Link Technologies
     6.4.  Deployment Considerations
       6.4.1.  Deployment Topology
       6.4.2.  Deployment Sequences
       6.4.3.  L4S Flow but Non-ECN Bottleneck
       6.4.4.  L4S Flow but Classic ECN Bottleneck
       6.4.5.  L4S AQM Deployment within Tunnels
   7.  IANA Considerations (to be removed by RFC Editor)
   8.  Security Considerations
     8.1.  Traffic Rate (Non-)Policing
     8.2.  'Latency Friendliness'
     8.3.  Interaction between Rate Policing and L4S
     8.4.  ECN Integrity
     8.5.  Privacy Considerations
   9.  Acknowledgements
   10. Informative References
   Appendix A.  Standardization items
   Authors' Addresses

1.  Introduction

It is increasingly common for _all_ of a user's applications at any one time to require low delay: interactive Web, Web services, voice, conversational video, interactive video, interactive remote presence, instant messaging, online gaming, remote desktop, cloud-based applications and video-assisted remote control of machinery and industrial processes.  In the last decade or so, much has been done to reduce propagation delay by placing caches or servers closer to users.  However, queuing remains a major, albeit intermittent, component of latency.  For instance, spikes of hundreds of milliseconds are common, even with state-of-the-art active queue management (AQM).  During a long-running flow, queuing is typically configured to cause overall network delay to roughly double relative to expected base (unloaded) path delay.  Low loss is also important because, for interactive applications, losses translate into even longer retransmission delays.

It has been demonstrated that, once access network bit rates reach levels now common in the developed world, increasing capacity offers diminishing returns if latency (delay) is not addressed.  Differentiated services (Diffserv) offers Expedited Forwarding (EF [RFC3246]) for some packets at the expense of others, but this is not sufficient when all (or most) of a user's applications require low latency.
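The diminishing returns of adding capacity without reducing latency can be seen with a toy page-load model.  This is a rough sketch with illustrative numbers (the round-trip count, page size, rates and RTT are assumptions for illustration, not taken from the cited measurements): once the sequential round trips of a transfer dominate, multiplying the link rate barely shortens completion time.

```python
def page_load_time(size_bits, rate_bps, rtt_s, round_trips=20):
    """Toy model: serialization time plus sequential protocol round trips.

    The fixed count of 20 sequential round trips (handshakes, dependent
    fetches) is an illustrative assumption.
    """
    return size_bits / rate_bps + round_trips * rtt_s

# A 2 MB (16 Mb) page over a path with 40 ms RTT:
t_25M = page_load_time(16e6, 25e6, 0.040)    # 0.64 s serialization + 0.8 s of RTTs
t_250M = page_load_time(16e6, 250e6, 0.040)  # 10x the rate, but the RTT term dominates
```

In this model, a tenfold capacity increase (25 to 250 Mb/s) cuts load time by well under half, whereas shaving the per-round-trip delay acts on the dominant term.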
Therefore, the goal is an Internet service with very Low queueing Latency, very Low Loss and Scalable throughput (L4S).  Very low queuing latency means less than 1 millisecond (ms) on average and less than about 2 ms at the 99th percentile.  L4S is potentially for _all_ traffic - a service for all traffic needs none of the configuration or management baggage (traffic policing, traffic contracts) associated with favouring some traffic over others.  This document describes the L4S architecture for achieving these goals.

It must be said that queuing delay only degrades performance infrequently [Hohlfeld14].  It only occurs when a large enough capacity-seeking (e.g. TCP) flow is running alongside the user's traffic in the bottleneck link, which is typically in the access network, or when the low latency application is itself a large capacity-seeking or adaptive rate (e.g. interactive video) flow.  At these times, the performance improvement from L4S must be sufficient that network operators will be motivated to deploy it.

Active Queue Management (AQM) is part of the solution to queuing under load.  AQM improves performance for all traffic, but there is a limit to how much queuing delay can be reduced solely by changing the network, without addressing the root of the problem.

The root of the problem is the presence of standard TCP congestion control (Reno [RFC5681]) or compatible variants (e.g. TCP Cubic [RFC8312]).  We shall use the term 'Classic' for these Reno-friendly congestion controls.  Classic congestion controls induce relatively large saw-tooth-shaped excursions up the queue and down again, which have been growing as flow rate scales [RFC3649].
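Why these sawtooth excursions grow with flow rate follows from standard Reno arithmetic.  The following back-of-envelope sketch (with an assumed 1500-byte packet size and illustrative rates) shows that, after halving its window, a Reno flow adds only one packet per round trip, so recovery takes half a window of round trips - and the window grows in proportion to rate:

```python
PKT_BITS = 1500 * 8  # assume 1500-byte packets

def reno_recovery_rounds(rate_bps, rtt_s):
    """Round trips for Reno to regain its full rate after one congestion signal."""
    window_pkts = rate_bps * rtt_s / PKT_BITS  # packets in flight at full rate
    return window_pkts / 2                     # +1 packet per round trip after halving

# At 100 Mb/s and 20 ms RTT: ~83 round trips (~1.7 s) between signals.
# Scale the same flow to 1 Gb/s and it needs ~833 round trips (~17 s).
```

Hundreds of round trips between congestion signals means hundreds of round trips of queue excursion and slack control, which is the scaling problem [RFC3649] that the rest of this document addresses.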
So if a network operator naively attempts to reduce queuing delay by configuring an AQM to operate at a shallower queue, a Classic congestion control will significantly underutilize the link at the bottom of every saw-tooth.

It has been demonstrated that if the sending host replaces a Classic congestion control with a 'Scalable' alternative, when a suitable AQM is deployed in the network the performance under load of all the above interactive applications can be significantly improved.  For instance, queuing delay under heavy load with the example DCTCP/DualQ solution cited below on a DSL or Ethernet link is roughly 1 to 2 milliseconds at the 99th percentile without losing link utilization [DualPI2Linux], [DCttH15] (for other link types, see Section 6.3).  This compares with 5-20 ms on _average_ with a Classic congestion control and current state-of-the-art AQMs such as FQ-CoDel [RFC8290], PIE [RFC8033] or DOCSIS PIE [RFC8034] and about 20-30 ms at the 99th percentile [DualPI2Linux].

It has also been demonstrated [DCttH15], [DualPI2Linux] that it is possible to deploy such an L4S service alongside the existing best efforts service so that all of a user's applications can shift to it when their stack is updated.  Access networks are typically designed with one link as the bottleneck for each site (which might be a home, small enterprise or mobile device), so deployment at each end of this link should give nearly all the benefit in each direction.  The L4S approach also requires component mechanisms at the endpoints to fulfill its goal.  This document presents the L4S architecture, by describing the different components and how they interact to provide the scalable, low latency, low loss Internet service.

2.  L4S Architecture Overview

There are three main components to the L4S architecture: the AQM in the network, the congestion control on the host, and the protocol between them:

1) Network: L4S traffic needs to be isolated from the queuing latency of Classic traffic.  One queue per application flow (FQ) is one way to achieve this, e.g. FQ-CoDel [RFC8290].  However, just two queues is sufficient and does not require inspection of transport layer headers in the network, which is not always possible (see Section 5.2).  With just two queues, it might seem impossible to know how much capacity to schedule for each queue without inspecting how many flows at any one time are using each.  And it would be undesirable to arbitrarily divide access network capacity into two partitions.  The Dual Queue Coupled AQM was developed as a minimal complexity solution to this problem.  It acts like a 'semi-permeable' membrane that partitions latency but not bandwidth.  As such, the two queues are for transition from Classic to L4S behaviour, not bandwidth prioritization.  Section 4 gives a high level explanation of how FQ and DualQ solutions work, and [I-D.ietf-tsvwg-aqm-dualq-coupled] gives a full explanation of the DualQ Coupled AQM framework.

2) Protocol: A host needs to distinguish L4S and Classic packets with an identifier so that the network can classify them into their separate treatments.  [I-D.ietf-tsvwg-ecn-l4s-id] concludes that all alternatives involve compromises, but the ECT(1) and CE codepoints of the ECN field represent a workable solution.

3) Host: Scalable congestion controls already exist.  They solve the scaling problem with Reno congestion control that was explained in [RFC3649].
The one used most widely (in controlled environments) is Data Center TCP (DCTCP [RFC8257]), which has been implemented and deployed in Windows Server Editions (since 2012), in Linux and in FreeBSD.  Although DCTCP as-is 'works' well over the public Internet, most implementations lack certain safety features that will be necessary once it is used outside controlled environments like data centres (see Section 6.4.3 and Appendix A).  Scalable congestion control will also need to be implemented in protocols other than TCP (QUIC, SCTP, RTP/RTCP, RMCAT, etc.).  Indeed, between the present document being drafted and published, the following scalable congestion controls were implemented: TCP Prague [PragueLinux], QUIC Prague, an L4S variant of the RMCAT SCReAM controller [SCReAM] and the L4S ECN part of BBRv2 [BBRv2] intended for TCP and QUIC transports.

3.  Terminology

Classic Congestion Control:  A congestion control behaviour that can co-exist with standard Reno [RFC5681] without causing significantly negative impact on its flow rate [RFC5033].  With Classic congestion controls, such as Reno or Cubic, because flow rate has scaled since TCP congestion control was first designed in 1988, it now takes hundreds of round trips (and growing) to recover after a congestion signal (whether a loss or an ECN mark) as shown in the examples in Section 5.1 and [RFC3649].  Therefore control of queuing and utilization becomes very slack, and the slightest disturbances (e.g. from new flows starting) prevent a high rate from being attained.

Scalable Congestion Control:  A congestion control where the average time from one congestion signal to the next (the recovery time) remains invariant as the flow rate scales, all other factors being equal.
This maintains the same degree of control over queueing and utilization whatever the flow rate, as well as ensuring that high throughput is more robust to disturbances.  For instance, DCTCP averages 2 congestion signals per round-trip whatever the flow rate, as do other recently developed scalable congestion controls, e.g. Relentless TCP [Mathis09], TCP Prague [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], BBRv2 [BBRv2] and the L4S variant of SCReAM for real-time media [SCReAM], [RFC8298].  See Section 4.3 of [I-D.ietf-tsvwg-ecn-l4s-id] for more explanation.

Classic service:  The Classic service is intended for all the congestion control behaviours that co-exist with Reno [RFC5681] (e.g. Reno itself, Cubic [RFC8312], Compound [I-D.sridharan-tcpm-ctcp], TFRC [RFC5348]).  The term 'Classic queue' means a queue providing the Classic service.

Low-Latency, Low-Loss Scalable throughput (L4S) service:  The 'L4S' service is intended for traffic from scalable congestion control algorithms, such as the Prague congestion control [I-D.briscoe-iccrg-prague-congestion-control], which was derived from DCTCP [RFC8257].  The L4S service is for more general traffic than just TCP Prague--it allows the set of congestion controls with similar scaling properties to Prague to evolve, such as the examples listed above (Relentless, SCReAM).  The term 'L4S queue' means a queue providing the L4S service.

The terms Classic or L4S can also qualify other nouns, such as 'queue', 'codepoint', 'identifier', 'classification', 'packet', 'flow'.  For example: an L4S packet means a packet with an L4S identifier sent from an L4S congestion control.

Both Classic and L4S services can cope with a proportion of unresponsive or less-responsive traffic as well, but in the L4S case its rate has to be smooth enough or low enough not to build a queue (e.g. 
DNS, VoIP, game sync datagrams, etc.).

Reno-friendly:  The subset of Classic traffic that is friendly to the standard Reno congestion control defined for TCP in [RFC5681].  'Reno-friendly' is used in place of 'TCP-friendly', given the latter has become imprecise, because the TCP protocol is now used with so many different congestion control behaviours, and Reno is used in non-TCP transports such as QUIC.

Classic ECN:  The original Explicit Congestion Notification (ECN) protocol [RFC3168], which requires ECN signals to be treated as equivalent to drops, both when generated in the network and when responded to by the sender.

For L4S, the names used for the four codepoints of the 2-bit IP-ECN field are unchanged from those defined in [RFC3168]: Not ECT, ECT(0), ECT(1) and CE, where ECT stands for ECN-Capable Transport and CE stands for Congestion Experienced.  A packet marked with the CE codepoint is termed 'ECN-marked' or sometimes just 'marked' where the context makes ECN obvious.

Site:  A home, mobile device, small enterprise or campus, where the network bottleneck is typically the access link to the site.  Not all network arrangements fit this model but it is a useful, widely applicable generalization.

4.  L4S Architecture Components

The L4S architecture is composed of the elements in the following three subsections.

4.1.  Protocol Mechanisms

The L4S architecture involves: a) unassignment of an identifier; b) reassignment of the same identifier; and c) optional further identifiers:

a.  An essential aspect of a scalable congestion control is the use of explicit congestion signals.  'Classic' ECN [RFC3168] requires an ECN signal to be treated as equivalent to drop, both when it is generated in the network and when it is responded to by hosts.
L4S needs networks and hosts to support a more fine-grained meaning for each ECN signal that is less severe than a drop, so that the L4S signals:

*  can be much more frequent;

*  can be signalled immediately, without the significant delay required to smooth out fluctuations in the queue.

To enable L4S, the standards track [RFC3168] has had to be updated to allow L4S packets to depart from the 'equivalent to drop' constraint.  [RFC8311] is a standards track update to relax specific requirements in RFC 3168 (and certain other standards track RFCs), which clears the way for the experimental changes proposed for L4S.  [RFC8311] also reclassifies the original experimental assignment of the ECT(1) codepoint as an ECN nonce [RFC3540] as historic.

b.  [I-D.ietf-tsvwg-ecn-l4s-id] recommends ECT(1) is used as the identifier to classify L4S packets into a separate treatment from Classic packets.  This satisfies the requirements for identifying an alternative ECN treatment in [RFC4774].

The CE codepoint is used to indicate Congestion Experienced by both L4S and Classic treatments.  This raises the concern that a Classic AQM earlier on the path might have marked some ECT(0) packets as CE.  Then these packets will be erroneously classified into the L4S queue.  Appendix B of [I-D.ietf-tsvwg-ecn-l4s-id] explains why five unlikely eventualities all have to coincide for this to have any detrimental effect, which even then would only involve a vanishingly small likelihood of a spurious retransmission.

c.  A network operator might wish to include certain unresponsive, non-L4S traffic in the L4S queue if it is deemed to be smoothly enough paced and low enough rate not to build a queue.  For instance, VoIP, low rate datagrams to sync online games, relatively low rate application-limited traffic, DNS, LDAP, etc.
This traffic would need to be tagged with specific identifiers, e.g. a low latency Diffserv Codepoint such as Expedited Forwarding (EF [RFC3246]), Non-Queue-Building (NQB [I-D.ietf-tsvwg-nqb]), or operator-specific identifiers.

4.2.  Network Components

The L4S architecture aims to provide low latency without the _need_ for per-flow operations in network components.  Nonetheless, the architecture does not preclude per-flow solutions--it encompasses the following combinations:

a.  The Dual Queue Coupled AQM (illustrated in Figure 1) achieves the 'semi-permeable' membrane property mentioned earlier as follows.  The obvious part is that using two separate queues isolates the queuing delay of one from the other.  The less obvious part is how the two queues act as if they are a single pool of bandwidth without the scheduler needing to decide between them.  This is achieved by having the Classic AQM provide a congestion signal to both queues in a manner that ensures a consistent response from the two types of congestion control.  In other words, the Classic AQM generates a drop/mark probability based on congestion in the Classic queue, uses this probability to drop/mark packets in that queue, and also uses this probability to affect the marking probability in the L4S queue.  This coupling of the congestion signaling between the two queues makes the L4S flows slow down to leave the right amount of capacity for the Classic traffic (as they would if they were the same type of traffic sharing the same queue).  Then the scheduler can serve the L4S queue with priority, because the L4S traffic isn't offering up enough traffic to use all the priority that it is given.
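The coupling just described can be sketched in a few lines.  This is a simplified illustration in the spirit of the pseudocode in [I-D.ietf-tsvwg-aqm-dualq-coupled], not a complete AQM: the controller (e.g. PI) that produces the base probability p' from the Classic queue's congestion is omitted, and the names are this sketch's own.

```python
K = 2  # coupling factor; k=2 is the value recommended in the DualQ draft

def coupled_probabilities(p_prime):
    """Derive both queues' signal probabilities from one base probability p'.

    p' comes from a controller (e.g. PI) watching the Classic queue.
    Squaring p' for the Classic queue while scaling it linearly for the
    L4S queue balances the two congestion-control responses.
    """
    p_classic = p_prime ** 2        # Classic drop/mark probability (PI2 squaring)
    p_l4s = min(K * p_prime, 1.0)   # coupled L4S ECN-marking probability
    return p_classic, p_l4s
```

The squaring works because a Classic flow's steady-state rate varies roughly as 1/sqrt(p) while a scalable flow's varies as 1/p; driving the former with p'^2 and the latter with k*p' therefore makes their rates respond consistently to the same level of congestion.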
Therefore, on short time-scales (sub-round-trip) the prioritization of the L4S queue protects its low latency by allowing bursts to dissipate quickly; but on longer time-scales (round-trip and longer) the Classic queue creates an equal and opposite pressure against the L4S traffic to ensure that neither has priority when it comes to bandwidth.  The tension between prioritizing L4S and coupling the marking from the Classic AQM results in approximate per-flow fairness.  To protect against unresponsive traffic in the L4S queue taking advantage of the prioritization and starving the Classic queue, it is advisable not to use strict priority, but instead to use a weighted scheduler (see Appendix A of [I-D.ietf-tsvwg-aqm-dualq-coupled]).

When there is no Classic traffic, the L4S queue's AQM comes into play.  It starts congestion marking with a very shallow queue, so L4S traffic maintains very low queuing delay.

The Dual Queue Coupled AQM has been specified as generically as possible [I-D.ietf-tsvwg-aqm-dualq-coupled] without specifying the particular AQMs to use in the two queues so that designers are free to implement diverse ideas.  Informational appendices in that draft give pseudocode examples of two different specific AQM approaches: one called DualPI2 (pronounced Dual PI Squared) [DualPI2Linux] that uses the PI2 variant of PIE, and a zero-config variant of RED called Curvy RED.  A DualQ Coupled AQM based on PIE has also been specified and implemented for Low Latency DOCSIS [DOCSIS3.1].

                        (2)                       (1)
               .-------^------.    .--------------^-------------------.
   ,-(3)-----.                          ______
  ;  ________  :            L4S --------.      |      |
  :|Scalable| :                _\       ||___\_| mark |
  :| sender | :  __________   /  /      ||  /  |______|\   _________
  :|________|\; |          |/ --------'           ^     \1|condit'nl|
   `---------'\_|  IP-ECN  |         Coupling :          \|priority |_\
    ________  / |Classifier|                  :           /|scheduler| /
   |Classic |/  |__________|\  --------.     ___:__      / |_________|
   | sender |                \_\       ||    |   |||___\_| mark/|/
   |________|                  /       ||    |   ||| /   | drop |
                   Classic --------'         |______|

   Figure 1: Components of an L4S Solution: 1) Isolation in separate
   network queues; 2) Packet Identification Protocol; and 3) Scalable
   Sending Host

b.  A scheduler with per-flow queues such as FQ-CoDel or FQ-PIE can be used for L4S.  For instance within each queue of an FQ-CoDel system, as well as a CoDel AQM, there is typically also ECN marking at an immediate (unsmoothed) shallow threshold to support use in data centres (see Sec.5.2.7 of [RFC8290]).  This can be modified so that the shallow threshold is solely applied to ECT(1) packets.  Then if there is a flow of non-ECN or ECT(0) packets in the per-flow-queue, the Classic AQM (e.g. CoDel) is applied; while if there is a flow of ECT(1) packets in the queue, the shallower (typically sub-millisecond) threshold is applied.  In addition, ECT(0) and not-ECT packets could potentially be classified into a separate flow-queue from ECT(1) and CE packets to avoid them mixing if they share a common flow-identifier (e.g. in a VPN).

c.  It should also be possible to use dual queues for isolation, but with per-flow marking to control flow-rates (instead of the coupled per-queue marking of the Dual Queue Coupled AQM).  One of the two queues would be for isolating L4S packets, which would be classified by the ECN codepoint.  Flow rates could be controlled by flow-specific marking.  The policy goal of the marking could be to differentiate flow rates (e.g. 
[Nadas20], which requires additional signalling of a per-flow 'value'), or to equalize flow-rates (perhaps in a similar way to Approx Fair CoDel [AFCD], [I-D.morton-tsvwg-codel-approx-fair], but with two queues not one).

Note that whenever the term 'DualQ' is used loosely without saying whether marking is per-queue or per-flow, it means a dual queue AQM with per-queue marking.

4.3.  Host Mechanisms

The L4S architecture includes two main mechanisms in the end host that we enumerate next:

a.  Scalable Congestion Control at the sender: Data Center TCP is the most widely used example.  It has been documented as an informational record of the protocol currently in use in controlled environments [RFC8257].  A draft list of safety and performance improvements for a scalable congestion control to be usable on the public Internet has been drawn up (the so-called 'Prague L4S requirements' in Appendix A of [I-D.ietf-tsvwg-ecn-l4s-id]).  The subset that involve risk of harm to others have been captured as normative requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id].  TCP Prague [I-D.briscoe-iccrg-prague-congestion-control] has been implemented in Linux as a reference implementation to address these requirements [PragueLinux].

Transport protocols other than TCP use various congestion controls that are designed to be friendly with Reno.  Before they can use the L4S service, they will need to be updated to implement a scalable congestion response, which they will have to indicate by using the ECT(1) codepoint.  Scalable variants are under consideration for more recent transport protocols, e.g. QUIC, and the L4S ECN part of BBRv2 [BBRv2] is a scalable congestion control intended for the TCP and QUIC transports, amongst others.  Also an L4S variant of the RMCAT SCReAM controller [RFC8298] has been implemented [SCReAM] for media transported over RTP.
b.  The ECN feedback in some transport protocols is already sufficiently fine-grained for L4S (specifically DCCP [RFC4340] and QUIC [RFC9000]).  But others either require update or are in the process of being updated:

*  For the case of TCP, the feedback protocol for ECN embeds the assumption from Classic ECN [RFC3168] that an ECN mark is equivalent to a drop, making it unusable for a scalable TCP.  Therefore, the implementation of TCP receivers will have to be upgraded [RFC7560].  Work to standardize and implement more accurate ECN feedback for TCP (AccECN) is in progress [I-D.ietf-tcpm-accurate-ecn], [PragueLinux].

*  ECN feedback is only roughly sketched in an appendix of the SCTP specification [RFC4960].  A fuller specification has been proposed in a long-expired draft [I-D.stewart-tsvwg-sctpecn], which would need to be implemented and deployed before SCTP could support L4S.

*  For RTP, sufficient ECN feedback was defined in [RFC6679], but [RFC8888] defines the latest standards track improvements.

5.  Rationale

5.1.  Why These Primary Components?

Explicit congestion signalling (protocol):  Explicit congestion signalling is a key part of the L4S approach.  In contrast, use of drop as a congestion signal creates a tension because drop is both an impairment (less would be better) and a useful signal (more would be better):

*  Explicit congestion signals can be used many times per round trip, to keep tight control, without any impairment.  Under heavy load, even more explicit signals can be applied so the queue can be kept short whatever the load.  In contrast, Classic AQMs have to introduce very high packet drop at high load to keep the queue short.  By using ECN, an L4S congestion control's sawtooth reduction can be smaller and therefore return to the operating point more often, without worrying that more sawteeth will cause more signals.
The consequent smaller 557 amplitude sawteeth fit between an empty queue and a very 558 shallow marking threshold (~1 ms in the public Internet), so 559 queue delay variation can be very low, without risk of under- 560 utilization. 562 * Explicit congestion signals can be emitted immediately to track 563 fluctuations of the queue. L4S shifts smoothing from the 564 network to the host. The network doesn't know the round trip 565 times of all the flows. So if the network is responsible for 566 smoothing (as in the Classic approach), it has to assume a 567 worst case RTT, otherwise long RTT flows would become unstable. 568 This delays Classic congestion signals by 100-200 ms. In 569 contrast, each host knows its own round trip time. So, in the 570 L4S approach, the host can smooth each flow over its own RTT, 571 introducing no more smoothing delay than strictly necessary 572 (usually only a few milliseconds). A host can also choose not 573 to introduce any smoothing delay if appropriate, e.g. during 574 flow start-up. 576 Neither of the above is feasible if explicit congestion 577 signalling has to be considered 'equivalent to drop' (as was 578 required with Classic ECN [RFC3168]), because drop is an 579 impairment as well as a signal. So drop cannot be excessively 580 frequent, and drop cannot be immediate, otherwise too many drops 581 would turn out to have been due to only a transient fluctuation in 582 the queue that would not have warranted dropping a packet in 583 hindsight. Therefore, in an L4S AQM, the L4S queue uses a new L4S 584 variant of ECN that is not equivalent to drop (see section 5.2 of 585 [I-D.ietf-tsvwg-ecn-l4s-id]), while the Classic queue uses either 586 Classic ECN [RFC3168] or drop, which are equivalent to each other. 588 Before Classic ECN was standardized, there were various proposals 589 to give an ECN mark a different meaning from drop.
However, there 590 was no particular reason to agree on any one of the alternative 591 meanings, so 'equivalent to drop' was the only compromise that 592 could be reached. RFC 3168 contains a statement that: 594 "An environment where all end nodes were ECN-Capable could 595 allow new criteria to be developed for setting the CE 596 codepoint, and new congestion control mechanisms for end-node 597 reaction to CE packets. However, this is a research issue, and 598 as such is not addressed in this document." 600 Latency isolation (network): L4S congestion controls keep queue 601 delay low whereas Classic congestion controls need a queue of the 602 order of the RTT to avoid under-utilization. One queue cannot 603 have two lengths; therefore, L4S traffic needs to be isolated in a 604 separate queue (e.g. DualQ) or queues (e.g. FQ). 606 Coupled congestion notification: Coupling the congestion 607 notification between two queues as in the DualQ Coupled AQM is not 608 necessarily essential, but it is a simple way to allow senders to 609 determine their rate, packet by packet, rather than be overridden 610 by a network scheduler. An alternative is for a network scheduler 611 to control the rate of each application flow (see discussion in 612 Section 5.2). 614 L4S packet identifier (protocol): Once there are at least two 615 treatments in the network, hosts need an identifier at the IP 616 layer to distinguish which treatment they intend to use. 618 Scalable congestion notification: A scalable congestion control in 619 the host keeps the signalling frequency from the network high 620 whatever the flow rate, so that queue delay variations can be 621 small when conditions are stable, and rate can track variations in 622 available capacity as rapidly as possible otherwise. 624 Low loss: Latency is not the only concern of L4S. The 'Low Loss' 625 part of the name denotes that L4S generally achieves zero 626 congestion loss due to its use of ECN.
Otherwise, loss would 627 itself cause delay, particularly for short flows, due to 628 retransmission delay [RFC2884]. 630 Scalable throughput: The "Scalable throughput" part of the name 631 denotes that the per-flow throughput of scalable congestion 632 controls should scale indefinitely, avoiding the imminent scaling 633 problems with Reno-friendly congestion control 634 algorithms [RFC3649]. It was known when TCP congestion avoidance 635 was first developed in 1988 that it would not scale to high 636 bandwidth-delay products (see footnote 6 in [TCP-CA]). Today, 637 regular broadband flow rates over WAN distances are already beyond 638 the scaling range of Classic Reno congestion control. So 'less 639 unscalable' Cubic [RFC8312] and Compound [I-D.sridharan-tcpm-ctcp] 640 variants of TCP have been successfully deployed. However, these 641 are now approaching their scaling limits. 643 For instance, consider a scenario with a maximum RTT of 644 30 ms at the peak of each sawtooth. As Reno packet rate scales 8x 645 from 1,250 to 10,000 packet/s (from 15 to 120 Mb/s with 1500 B 646 packets), the time to recover from a congestion event rises 647 proportionately by 8x as well, from 422 ms to 3.38 s. It is 648 clearly problematic for a congestion control to take multiple 649 seconds to recover from each congestion event. Cubic [RFC8312] 650 was developed to be less unscalable, but it is approaching its 651 scaling limit; with the same max RTT of 30 ms, at 120 Mb/s the 652 Linux implementation of Cubic is still in its Reno-friendly mode, 653 so it takes about 2.3 s to recover. However, once the flow rate 654 scales by 8x again to 960 Mb/s it enters true Cubic mode, with a 655 recovery time of 10.6 s. From then on, each further scaling by 8x 656 doubles Cubic's recovery time (because the cube root of 8 is 2), 657 e.g. at 7.68 Gb/s the recovery time is 21.3 s.
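The Reno figures above can be reproduced with a short calculation. The following sketch is illustrative and rests on assumptions beyond the text: the bottleneck buffer is one bandwidth-delay product, so the RTT sawtooths between 15 ms and the 30 ms maximum (averaging 22.5 ms), and Reno adds one packet to its window per round trip, so recovery from a halving takes W/2 round trips, where W is the window in packets at the sawtooth peak.

```python
# Sketch (not from the draft): Reno recovery time after a loss event.
# Assumptions: buffer = 1 BDP, so RTT varies linearly between
# rtt_max/2 and rtt_max, averaging 0.75 * rtt_max; Reno grows its
# window by 1 packet per round trip, so climbing from W/2 back to W
# takes W/2 round trips.

def reno_recovery_time(packet_rate, rtt_max):
    """Seconds for Reno to climb back from W/2 to W after one loss."""
    w_peak = packet_rate * rtt_max        # window in packets at the peak
    avg_rtt = 0.75 * rtt_max              # mid-point of the RTT sawtooth
    return (w_peak / 2) * avg_rtt         # W/2 round trips of avg_rtt each

# 15 Mb/s with 1500 B packets = 1,250 packet/s
print(round(reno_recovery_time(1250, 0.030), 3))   # 0.422
# 8x faster: 120 Mb/s = 10,000 packet/s
print(round(reno_recovery_time(10000, 0.030), 2))  # 3.38
```

The recovery time is proportional to the packet rate, hence the 8x rise quoted in the text. The Cubic figures follow a different law: recovery time grows with the cube root of the window, so each further 8x rate scaling doubles it, as the text describes.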
In contrast a 658 scalable congestion control like DCTCP or TCP Prague induces 2 659 congestion signals per round trip on average, which remains 660 invariant for any flow rate, keeping dynamic control very tight. 662 Although work on scaling congestion controls tends to start with 663 TCP as the transport, the above is not intended to exclude other 664 transports (e.g. SCTP, QUIC) or less elastic algorithms 665 (e.g. RMCAT), which all tend to adopt the same or similar 666 developments. 668 5.2. What L4S adds to Existing Approaches 670 All the following approaches address some part of the same problem 671 space as L4S. In each case, it is shown that L4S complements them or 672 improves on them, rather than being a mutually exclusive alternative: 674 Diffserv: Diffserv addresses the problem of bandwidth apportionment 675 for important traffic as well as queuing latency for delay- 676 sensitive traffic. Of these, L4S solely addresses the problem of 677 queuing latency. Diffserv will still be necessary where important 678 traffic requires priority (e.g. for commercial reasons, or for 679 protection of critical infrastructure traffic) - see 680 [I-D.briscoe-tsvwg-l4s-diffserv]. Nonetheless, the L4S approach 681 can provide low latency for _all_ traffic within each Diffserv 682 class (including the case where there is only the one default 683 Diffserv class). 685 Also, Diffserv only works for a small subset of the traffic on a 686 link. As already explained, it is not applicable when all the 687 applications in use at one time at a single site (home, small 688 business or mobile device) require low latency. In contrast, 689 because L4S is for all traffic, it needs none of the management 690 baggage (traffic policing, traffic contracts) associated with 691 favouring some packets over others. This baggage has probably 692 held Diffserv back from widespread end-to-end deployment. 
694 In particular, because networks tend not to trust end systems to 695 identify which packets should be favoured over others, where 696 networks assign packets to Diffserv classes they often use packet 697 inspection of application flow identifiers or deeper inspection of 698 application signatures. Thus, nowadays, Diffserv doesn't always 699 sit well with encryption of the layers above IP. So users have to 700 choose between privacy and QoS. 702 As with Diffserv, the L4S identifier is in the IP header. But, in 703 contrast to Diffserv, the L4S identifier does not convey a want or 704 a need for a certain level of quality. Rather, it promises a 705 certain behaviour (scalable congestion response), which networks 706 can objectively verify if they need to. This is because low delay 707 depends on collective host behaviour, whereas bandwidth priority 708 depends on network behaviour. 710 State-of-the-art AQMs: AQMs such as PIE and FQ-CoDel give a 711 significant reduction in queuing delay relative to no AQM at all. 712 L4S is intended to complement these AQMs, and should not distract 713 from the need to deploy them as widely as possible. Nonetheless, 714 AQMs alone cannot reduce queuing delay too far without 715 significantly reducing link utilization, because the root cause of 716 the problem is on the host - where Classic congestion controls use 717 large saw-toothing rate variations. The L4S approach resolves 718 this tension by ensuring hosts can minimize the size of their 719 sawteeth without appearing so aggressive to Classic flows that 720 they starve them. 722 Per-flow queuing or marking: Similarly, per-flow approaches such as 723 FQ-CoDel or Approx Fair CoDel [AFCD] are not incompatible with the 724 L4S approach. However, per-flow queuing alone is not enough - it 725 only isolates the queuing of one flow from others; not from 726 itself. 
Per-flow implementations still need to have support for 727 scalable congestion control added, which has already been done in 728 FQ-CoDel (see Sec.5.2.7 of [RFC8290]). Without this simple 729 modification, per-flow AQMs like FQ-CoDel would still not be able 730 to support applications that need both very low delay and high 731 bandwidth, e.g. video-based control of remote procedures, or 732 interactive cloud-based video (see Note 1 below). 734 Although per-flow techniques are not incompatible with L4S, it is 735 important to have the DualQ alternative. This is because handling 736 end-to-end (layer 4) flows in the network (layer 3 or 2) precludes 737 some important end-to-end functions. For instance: 739 A. Per-flow forms of L4S like FQ-CoDel are incompatible with full 740 end-to-end encryption of transport layer identifiers for 741 privacy and confidentiality (e.g. IPSec or encrypted VPN 742 tunnels), because they require packet inspection to access the 743 end-to-end transport flow identifiers. 745 In contrast, the DualQ form of L4S requires no deeper 746 inspection than the IP layer. So, as long as operators take 747 the DualQ approach, their users can have both very low queuing 748 delay and full end-to-end encryption [RFC8404]. 750 B. With per-flow forms of L4S, the network takes over control of 751 the relative rates of each application flow. Some see it as 752 an advantage that the network will prevent some flows running 753 faster than others. Others consider it an inherent part of 754 the Internet's appeal that applications can control their rate 755 while taking account of the needs of others via congestion 756 signals. They maintain that this has allowed applications 757 with interesting rate behaviours to evolve, for instance, 758 variable bit-rate video that varies around an equal share 759 rather than being forced to remain equal at every instant, or 760 scavenger services that use less than an equal share of 761 capacity [LEDBAT_AQM]. 
763 The L4S architecture does not require the IETF to commit to 764 one approach over the other, because it supports both, so that 765 the 'market' can decide. Nonetheless, in the spirit of 'Do 766 one thing and do it well' [McIlroy78], the DualQ option 767 provides low delay without prejudging the issue of flow-rate 768 control. Then, flow rate policing can be added separately if 769 desired. This allows application control up to a point, but 770 the network can still choose to set the point at which it 771 intervenes to prevent one flow completely starving another. 773 Note: 775 1. It might seem that self-inflicted queuing delay within a per- 776 flow queue should not be counted, because if the delay wasn't 777 in the network it would just shift to the sender. However, 778 modern adaptive applications, e.g. HTTP/2 [RFC7540] or some 779 interactive media applications (see Section 6.1), can keep low 780 latency objects at the front of their local send queue by 781 shuffling priorities of other objects dependent on the 782 progress of other transfers. They cannot shuffle objects once 783 they have released them into the network. 785 Alternative Back-off ECN (ABE): Here again, L4S is not an 786 alternative to ABE but a complement that introduces much lower 787 queuing delay. ABE [RFC8511] alters the host behaviour in 788 response to ECN marking to utilize a link better and give ECN 789 flows faster throughput. It uses ECT(0) and assumes the network 790 still treats ECN and drop the same. Therefore ABE exploits any 791 lower queuing delay that AQMs can provide. But as explained 792 above, AQMs still cannot reduce queuing delay too far without 793 losing link utilization (to allow for other, non-ABE, flows). 795 BBR: Bottleneck Bandwidth and Round-trip propagation time 796 (BBR [I-D.cardwell-iccrg-bbr-congestion-control]) controls queuing 797 delay end-to-end without needing any special logic in the network, 798 such as an AQM. 
So it works pretty much on any path (although it 799 has not been without problems, particularly capacity sharing in 800 BBRv1). BBR keeps queuing delay reasonably low, but perhaps not 801 quite as low as with state-of-the-art AQMs such as PIE or FQ- 802 CoDel, and certainly nowhere near as low as with L4S. Queuing 803 delay is also not consistently low, due to BBR's regular bandwidth 804 probing spikes and its aggressive flow start-up phase. 806 L4S complements BBR. Indeed BBRv2 [BBRv2] uses L4S ECN and a 807 scalable L4S congestion control behaviour in response to any ECN 808 signalling from the path. The L4S ECN signal complements the 809 delay based congestion control aspects of BBR with an explicit 810 indication that hosts can use, both to converge on a fair rate and 811 to keep below a shallow queue target set by the network. Without 812 L4S ECN, both these aspects need to be assumed or estimated. 814 6. Applicability 816 6.1. Applications 818 A transport layer that solves the current latency issues will provide 819 new service, product and application opportunities. 821 With the L4S approach, the following existing applications also 822 experience significantly better quality of experience under load: 824 o Gaming, including cloud based gaming; 826 o VoIP; 828 o Video conferencing; 830 o Web browsing; 832 o (Adaptive) video streaming; 834 o Instant messaging. 836 The significantly lower queuing latency also enables some interactive 837 application functions to be offloaded to the cloud that would hardly 838 even be usable today: 840 o Cloud based interactive video; 842 o Cloud based virtual and augmented reality. 844 The above two applications have been successfully demonstrated with 845 L4S, both running together over a 40 Mb/s broadband access link 846 loaded up with the numerous other latency sensitive applications in 847 the previous list as well as numerous downloads - all sharing the 848 same bottleneck queue simultaneously [L4Sdemo16].
For the former, a 849 panoramic video of a football stadium could be swiped and pinched so 850 that, on the fly, a proxy in the cloud could generate a sub-window of 851 the match video under the finger-gesture control of each user. For 852 the latter, a virtual reality headset displayed a viewport taken from 853 a 360 degree camera in a racing car. The user's head movements 854 controlled the viewport extracted by a cloud-based proxy. In both 855 cases, with 7 ms end-to-end base delay, the additional queuing delay 856 of roughly 1 ms was so low that it seemed the video was generated 857 locally. 859 Using a swiping finger gesture or head movement to pan a video is 860 an extremely latency-demanding action--far more demanding than 861 VoIP--because human vision can detect extremely low delays of the 862 order of single milliseconds when delay is translated into a visual 863 lag between a video and a reference point (the finger, or the 864 orientation of the head sensed by the balance system in the inner 865 ear --- the vestibular system). 867 Without the low queuing delay of L4S, cloud-based applications like 868 these would not be credible without significantly more access 869 bandwidth (to deliver all possible video that might be viewed) and 870 more local processing, which would increase the weight and power 871 consumption of head-mounted displays. When all interactive 872 processing can be done in the cloud, only the data to be rendered for 873 the end user needs to be sent. 875 Other low latency high bandwidth applications such as: 877 o Interactive remote presence; 879 o Video-assisted remote control of machinery or industrial 880 processes. 882 are not credible at all without very low queuing delay. No amount of 883 extra access bandwidth or local processing can make up for lost time. 885 6.2.
Use Cases 887 The following use-cases for L4S are being considered by various 888 interested parties: 890 o Where the bottleneck is one of various types of access network: 891 e.g. DSL, Passive Optical Networks (PON), DOCSIS cable, mobile, 892 satellite (see Section 6.3 for some technology-specific details) 894 o Private networks of heterogeneous data centres, where there is no 895 single administrator that can arrange for all the simultaneous 896 changes to senders, receivers and network needed to deploy DCTCP: 898 * a set of private data centres interconnected over a wide area 899 with separate administrations, but within the same company 901 * a set of data centres operated by separate companies 902 interconnected by a community of interest network (e.g. for the 903 finance sector) 905 * multi-tenant (cloud) data centres where tenants choose their 906 operating system stack (Infrastructure as a Service - IaaS) 908 o Different types of transport (or application) congestion control: 910 * elastic (TCP/SCTP); 912 * real-time (RTP, RMCAT); 914 * query (DNS/LDAP). 916 o Where low delay quality of service is required, but without 917 inspecting or intervening above the IP layer [RFC8404]: 919 * mobile and other networks have tended to inspect higher layers 920 in order to guess application QoS requirements. However, with 921 growing demand for support of privacy and encryption, L4S 922 offers an alternative. There is no need to select which 923 traffic to favour for queuing, when L4S gives favourable 924 queuing to all traffic. 926 o If queuing delay is minimized, applications with a fixed delay 927 budget can communicate over longer distances, or via a longer 928 chain of service functions [RFC7665] or onion routers. 930 o If delay jitter is minimized, it is possible to reduce the 931 dejitter buffers on the receive end of video streaming, which 932 should improve the interactive experience. 934 6.3.
Applicability with Specific Link Technologies 936 Certain link technologies aggregate data from multiple packets into 937 bursts, and buffer incoming packets while building each burst. WiFi, 938 PON and cable all involve such packet aggregation, whereas fixed 939 Ethernet and DSL do not. No sender, whether L4S or not, can do 940 anything to reduce the buffering needed for packet aggregation. So 941 an AQM should not count this buffering as part of the queue that it 942 controls, given that no amount of congestion signalling will reduce it. 944 Certain link technologies also add buffering for other reasons, 945 specifically: 947 o Radio links (cellular, WiFi, satellite) that are distant from the 948 source are particularly challenging. The radio link capacity can 949 vary rapidly by orders of magnitude, so it is considered desirable 950 to hold a standing queue that can utilize sudden increases of 951 capacity; 953 o Cellular networks are further complicated by a perceived need to 954 buffer in order to make hand-overs imperceptible. 956 L4S cannot remove the need for all these different forms of 957 buffering. However, by removing 'the longest pole in the tent' 958 (buffering for the large sawteeth of Classic congestion controls), 959 L4S exposes all these 'shorter poles' to greater scrutiny. 961 Until now, the buffering needed for these additional reasons tended 962 to be over-specified - with the excuse that none were 'the longest 963 pole in the tent'. But having removed the 'longest pole', it becomes 964 worthwhile to minimize them, for instance reducing packet aggregation 965 burst sizes and MAC scheduling intervals. 967 6.4. Deployment Considerations 969 L4S AQMs, whether DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled] or FQ, 970 e.g. [RFC8290], are, in themselves, an incremental deployment 971 mechanism for L4S - so that L4S traffic can coexist with existing 972 Classic (Reno-friendly) traffic.
Section 6.4.1 explains why only 973 deploying an L4S AQM in one node at each end of the access link will 974 realize nearly all the benefit of L4S. 976 L4S involves both end systems and the network, so Section 6.4.2 977 suggests some typical sequences to deploy each part, and why there 978 will be an immediate and significant benefit after deploying just one 979 part. 981 Section 6.4.3 and Section 6.4.4 describe the converse incremental 982 deployment case where there is no L4S AQM at the network bottleneck, 983 so any L4S flow traversing this bottleneck has to take care in case 984 it is competing with Classic traffic. 986 6.4.1. Deployment Topology 988 L4S AQMs will not have to be deployed throughout the Internet before 989 L4S will work for anyone. Operators of public Internet access 990 networks typically design their networks so that the bottleneck will 991 nearly always occur at one known (logical) link. This confines the 992 cost of queue management technology to one place. 994 The case of mesh networks is different and will be discussed later in 995 this section. But the known bottleneck case is generally true for 996 Internet access to all sorts of different 'sites', where the word 997 'site' includes home networks, small- to medium-sized campus or 998 enterprise networks and even cellular devices (Figure 2). Also, this 999 known-bottleneck case tends to be applicable whatever the access link 1000 technology; whether xDSL, cable, PON, cellular, line of sight 1001 wireless or satellite. 1003 Therefore, the full benefit of the L4S service should be available in 1004 the downstream direction when an L4S AQM is deployed at the ingress 1005 to this bottleneck link. And similarly, the full upstream service 1006 will be available once an L4S AQM is deployed at the ingress into the 1007 upstream link. (Of course, multi-homed sites would only see the full 1008 benefit once all their access links were covered.) 
1010 ______ 1011 ( ) 1012 __ __ ( ) 1013 |DQ\________/DQ|( enterprise ) 1014 ___ |__/ \__| ( /campus ) 1015 ( ) (______) 1016 ( ) ___||_ 1017 +----+ ( ) __ __ / \ 1018 | DC |-----( Core )|DQ\_______________/DQ|| home | 1019 +----+ ( ) |__/ \__||______| 1020 (_____) __ 1021 |DQ\__/\ __ ,===. 1022 |__/ \ ____/DQ||| ||mobile 1023 \/ \__|||_||device 1024 | o | 1025 `---' 1027 Figure 2: Likely location of DualQ (DQ) Deployments in common access 1028 topologies 1030 Deployment in mesh topologies depends on how over-booked the core is. 1031 If the core is non-blocking, or at least generously provisioned so 1032 that the edges are nearly always the bottlenecks, it would only be 1033 necessary to deploy an L4S AQM at the edge bottlenecks. For example, 1034 some data-centre networks are designed with the bottleneck in the 1035 hypervisor or host NICs, while others bottleneck at the top-of-rack 1036 switch (both the output ports facing hosts and those facing the 1037 core). 1039 An L4S AQM would eventually also need to be deployed at any other 1040 persistent bottlenecks such as network interconnections, e.g. some 1041 public Internet exchange points and the ingress and egress to WAN 1042 links interconnecting data-centres. 1044 6.4.2. Deployment Sequences 1046 For any one L4S flow to work, it requires 3 parts to have been 1047 deployed. This was the same deployment problem that ECN 1048 faced [RFC8170] so we have learned from that experience. 1050 Firstly, L4S deployment exploits the fact that DCTCP already exists 1051 on many Internet hosts (Windows, FreeBSD and Linux); both servers and 1052 clients. Therefore, just deploying an L4S AQM at a network 1053 bottleneck immediately gives a working deployment of all the L4S 1054 parts. 
DCTCP needs some safety concerns to be fixed for general use 1055 over the public Internet (see Section 4.3 of 1056 [I-D.ietf-tsvwg-ecn-l4s-id]), but DCTCP is not on by default, so 1057 these issues can be managed within controlled deployments or 1058 controlled trials. 1060 Secondly, the performance improvement with L4S is so significant that 1061 it enables new interactive services and products that were not 1062 previously possible. It is much easier for companies to initiate new 1063 work on deployment if there is budget for a new product trial. If, 1064 in contrast, there were only an incremental performance improvement 1065 (as with Classic ECN), spending on deployment tends to be much harder 1066 to justify. 1068 Thirdly, the L4S identifier is defined so that initially network 1069 operators can enable L4S exclusively for certain customers or certain 1070 applications. But this is carefully defined so that it does not 1071 compromise future evolution towards L4S as an Internet-wide service. 1072 This is because the L4S identifier is defined not only as the end-to- 1073 end ECN field, but it can also optionally be combined with any other 1074 packet header or some status of a customer or their access link (see 1075 section 5.4 of [I-D.ietf-tsvwg-ecn-l4s-id]). Operators could do this 1076 anyway, even if it were not blessed by the IETF. However, it is best 1077 for the IETF to specify that, if they use their own local identifier, 1078 it must be in combination with the IETF's identifier. Then, if an 1079 operator has opted for an exclusive local-use approach, later they 1080 only have to remove this extra rule to make the service work 1081 Internet-wide - it will already traverse middleboxes, peerings, etc. 
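The combination of the IETF's identifier with an optional local-use rule can be sketched as follows. This is an illustrative sketch, not from any L4S specification: the 2-bit ECN codepoint values follow RFC 3168, directing ECT(1) and CE packets to the L4S queue follows the DualQ approach, and the source-prefix check is a hypothetical example of an operator's exclusive-trial classifier.

```python
# Sketch (illustrative, not normative): a minimal DualQ classifier.
# ECT(1) or CE in the ECN field selects the L4S queue; during an
# exclusive trial, an operator could additionally require a local
# match (here a hypothetical source-prefix check) before granting
# L4S treatment.

from ipaddress import ip_address, ip_network

ECT1, CE = 0b01, 0b11   # ECN codepoints per RFC 3168

def classify(ecn_bits, src=None, trial_prefix=None):
    """Return 'L4S' or 'Classic' for one packet."""
    if ecn_bits not in (ECT1, CE):
        return "Classic"                      # Not-ECT or ECT(0)
    if trial_prefix is not None:              # optional local-use rule
        if src is None or ip_address(src) not in ip_network(trial_prefix):
            return "Classic"                  # not a trial customer
    return "L4S"

print(classify(ECT1))                                  # L4S
print(classify(0b10))                                  # Classic (ECT(0))
print(classify(CE, "192.0.2.7", "192.0.2.0/24"))       # L4S (trial match)
print(classify(ECT1, "198.51.100.9", "192.0.2.0/24"))  # Classic
```

Note that a non-matching packet keeps its L4S identifier and is merely queued as Classic; dropping the optional `trial_prefix` rule corresponds to the later step of making the service Internet-wide.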
1083 +-+--------------------+----------------------+---------------------+ 1084 | | Servers or proxies | Access link | Clients | 1085 +-+--------------------+----------------------+---------------------+ 1086 |0| DCTCP (existing) | | DCTCP (existing) | 1087 +-+--------------------+----------------------+---------------------+ 1088 |1| |Add L4S AQM downstream| | 1089 | | WORKS DOWNSTREAM FOR CONTROLLED DEPLOYMENTS/TRIALS | 1090 +-+--------------------+----------------------+---------------------+ 1091 |2| Upgrade DCTCP to | |Replace DCTCP feedb'k| 1092 | | TCP Prague | | with AccECN | 1093 | | FULLY WORKS DOWNSTREAM | 1094 +-+--------------------+----------------------+---------------------+ 1095 | | | | Upgrade DCTCP to | 1096 |3| | Add L4S AQM upstream | TCP Prague | 1097 | | | | | 1098 | | FULLY WORKS UPSTREAM AND DOWNSTREAM | 1099 +-+--------------------+----------------------+---------------------+ 1101 Figure 3: Example L4S Deployment Sequence 1103 Figure 3 illustrates some example sequences in which the parts of L4S 1104 might be deployed. It consists of the following stages: 1106 1. Here, the immediate benefit of a single AQM deployment can be 1107 seen, but limited to a controlled trial or controlled deployment. 1108 In this example downstream deployment is first, but in other 1109 scenarios the upstream might be deployed first. If no AQM at all 1110 was previously deployed for the downstream access, an L4S AQM 1111 greatly improves the Classic service (as well as adding the L4S 1112 service). If an AQM was already deployed, the Classic service 1113 will be unchanged (and L4S will add an improvement on top). 1115 2. In this stage, the name 'TCP 1116 Prague' [I-D.briscoe-iccrg-prague-congestion-control] is used to 1117 represent a variant of DCTCP that is safe to use in a production 1118 Internet environment. If the application is primarily 1119 unidirectional, 'TCP Prague' at one end will provide all the 1120 benefit needed. 
For TCP transports, Accurate ECN feedback 1121 (AccECN) [I-D.ietf-tcpm-accurate-ecn] is needed at the other end, 1122 but it is a generic ECN feedback facility that is already planned 1123 to be deployed for other purposes, e.g. DCTCP, BBR. The two ends 1124 can be deployed in either order, because, in TCP, an L4S 1125 congestion control only enables itself if it has negotiated the 1126 use of AccECN feedback with the other end during the connection 1127 handshake. Thus, deployment of TCP Prague on a server enables 1128 L4S trials to move to a production service in one direction, 1129 wherever AccECN is deployed at the other end. This stage might 1130 be further motivated by the performance improvements of TCP 1131 Prague relative to DCTCP (see Appendix A.2 of 1132 [I-D.ietf-tsvwg-ecn-l4s-id]). 1134 Unlike TCP, from the outset, QUIC ECN feedback [RFC9000] has 1135 supported L4S. Therefore, if the transport is QUIC, one-ended 1136 deployment of a Prague congestion control at this stage is simple 1137 and sufficient. 1139 3. This is a two-move stage to enable L4S upstream. An L4S AQM or 1140 TCP Prague can be deployed in either order as already explained. 1141 To motivate the first of two independent moves, the deferred 1142 benefit of enabling new services after the second move has to be 1143 worth it to cover the first mover's investment risk. As 1144 explained already, the potential for new interactive services 1145 provides this motivation. An L4S AQM also improves the upstream 1146 Classic service - significantly if no other AQM has already been 1147 deployed. 1149 Note that other deployment sequences might occur. For instance: the 1150 upstream might be deployed first; a non-TCP protocol might be used 1151 end-to-end, e.g. QUIC, RTP; a body such as the 3GPP might require L4S 1152 to be implemented in 5G user equipment, or other random acts of 1153 kindness. 1155 6.4.3. 
L4S Flow but Non-ECN Bottleneck 1157 If L4S is enabled between two hosts, the L4S sender is required to 1158 coexist safely with Reno in response to any drop (see Section 4.3 of 1159 [I-D.ietf-tsvwg-ecn-l4s-id]). 1161 Unfortunately, as well as protecting Classic traffic, this rule 1162 degrades the L4S service whenever there is any loss, even if the 1163 cause is not persistent congestion at a bottleneck, e.g.: 1165 o congestion loss at other transient bottlenecks, e.g. due to bursts 1166 in shallower queues; 1168 o transmission errors, e.g. due to electrical interference; 1170 o rate policing. 1172 Three complementary approaches are in progress to address this issue, 1173 but they are all currently research: 1175 o In Prague congestion control, ignore certain losses deemed 1176 unlikely to be due to congestion (using some ideas from 1177 BBR [I-D.cardwell-iccrg-bbr-congestion-control] regarding isolated 1178 losses). This could mask any of the above types of loss while 1179 still coexisting with drop-based congestion controls. 1181 o A combination of RACK, L4S and link retransmission without 1182 resequencing could repair transmission errors without the head of 1183 line blocking delay usually associated with link-layer 1184 retransmission [UnorderedLTE], [I-D.ietf-tsvwg-ecn-l4s-id]; 1186 o Hybrid ECN/drop rate policers (see Section 8.3). 1188 L4S deployment scenarios that minimize these issues (e.g. over 1189 wireline networks) can proceed in parallel to this research, in the 1190 expectation that research success could continually widen L4S 1191 applicability. 1193 6.4.4. L4S Flow but Classic ECN Bottleneck 1195 Classic ECN support is starting to materialize on the Internet as an 1196 increased level of CE marking. 
It is hard to detect how much of this is 1197 due to the addition of support for ECN in the Linux 1198 implementation of FQ-CoDel. That case is not problematic, because FQ 1199 inherently forces the throughput of each flow to be equal, 1200 irrespective of its aggressiveness. However, some of this Classic 1201 ECN marking might be due to single-queue ECN deployment. This case 1202 is discussed in Section 4.3 of [I-D.ietf-tsvwg-ecn-l4s-id]. 1204 6.4.5. L4S AQM Deployment within Tunnels 1206 An L4S AQM uses the ECN field to signal congestion. So, in common 1207 with Classic ECN, if the AQM is within a tunnel or at a lower layer, 1208 correct functioning of ECN signalling requires correct propagation of 1209 the ECN field up the layers [RFC6040], 1210 [I-D.ietf-tsvwg-rfc6040update-shim], 1211 [I-D.ietf-tsvwg-ecn-encap-guidelines]. 1213 7. IANA Considerations (to be removed by RFC Editor) 1215 This specification contains no IANA considerations. 1217 8. Security Considerations 1219 8.1. Traffic Rate (Non-)Policing 1221 Because the L4S service can serve all traffic that is using the 1222 capacity of a link, it should not be necessary to rate-police access 1223 to the L4S service. In contrast, Diffserv only works if some packets 1224 get less favourable treatment than others. So Diffserv has to use 1225 traffic rate policers to limit how much traffic can be favoured. In 1226 turn, traffic policers require traffic contracts between users and 1227 networks as well as pairwise between networks. Because L4S will lack 1228 all this management complexity, it is more likely to work end-to-end. 1230 During early deployment (and perhaps always), some networks will not 1231 offer the L4S service. In general, these networks should not need to 1232 police L4S traffic - they are required not to change the L4S 1233 identifier, merely treating the traffic as best efforts traffic, as 1234 they already treat traffic with ECT(1) today.
At a bottleneck, such 1235 networks will introduce some queuing and dropping. When a scalable 1236 congestion control detects a drop, it will have to respond safely with 1237 respect to Classic congestion controls (as required in Section 4.3 of 1238 [I-D.ietf-tsvwg-ecn-l4s-id]). This will degrade the L4S service to 1239 be no better (but never worse) than Classic best efforts, whenever a 1240 non-ECN bottleneck is encountered on a path (see Section 6.4.3). 1242 In some cases, networks that solely support Classic ECN [RFC3168] in 1243 a single-queue bottleneck might opt to police L4S traffic in order to 1244 protect competing Classic ECN traffic. 1246 Certain network operators might choose to restrict access to the L4S 1247 class, perhaps only to selected premium customers as a value-added 1248 service. Their packet classifier (item 2 in Figure 1) could identify 1249 such customers against some other field (e.g. source address range) 1250 as well as ECN. If only the ECN L4S identifier matched, but not the 1251 source address (say), the classifier could direct these packets (from 1252 non-premium customers) into the Classic queue. Explaining clearly 1253 how operators can use additional local classifiers (see Section 1254 5.4 of [I-D.ietf-tsvwg-ecn-l4s-id]) is intended to remove any 1255 motivation to bleach the L4S identifier. Then at least the L4S ECN 1256 identifier will be more likely to survive end-to-end even though the 1257 service may not be supported at every hop. Such local arrangements 1258 would only require simple registered/not-registered packet 1259 classification, rather than the managed, application-specific traffic 1260 policing against customer-specific traffic contracts that Diffserv 1261 uses. 1263 8.2. 'Latency Friendliness' 1265 Like the Classic service, the L4S service relies on self-constraint - 1266 limiting rate in response to congestion.
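The rate self-constraint just mentioned, together with the safe response to drop required by Section 4.3 of [I-D.ietf-tsvwg-ecn-l4s-id] (see Section 6.4.3), can be illustrated with a much-simplified sketch. The structure and the gain constant below are illustrative assumptions, not the Prague congestion control itself:

```python
# Illustrative sketch (not the Prague algorithm): a scalable sender
# reduces its window in proportion to the smoothed fraction of
# ECN-marked packets, but on loss it falls back to a Classic
# (Reno-like) multiplicative decrease.  The gain G and the per-round
# feedback model are assumptions for illustration only.

G = 1.0 / 16  # EWMA gain for the marking-fraction estimate (assumed)

class ScalableSender:
    def __init__(self, cwnd=100.0):
        self.cwnd = cwnd    # congestion window, in packets
        self.alpha = 0.0    # smoothed fraction of CE-marked packets

    def on_ack_round(self, acked, marked):
        """Feedback for one round trip: `acked` packets were
        acknowledged, `marked` of which carried a CE mark."""
        frac = marked / acked if acked else 0.0
        self.alpha += G * (frac - self.alpha)
        if marked:
            # Scalable response: back off in proportion to the
            # estimated marking level, keeping queuing delay low.
            self.cwnd *= 1 - self.alpha / 2
        else:
            self.cwnd += 1.0  # additive increase per round

    def on_loss(self):
        # Safety fall-back: respond to drop as a Classic sender
        # would, so the flow coexists with Classic traffic at a
        # non-ECN bottleneck (Section 6.4.3).
        self.cwnd *= 0.5
```

A real scalable transport would, among much else, pace its packets and feed back marks per ACK rather than per round; the point here is only the contrast between the proportionate ECN response and the Classic halving on loss.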
In addition, the L4S 1267 service requires self-constraint in terms of limiting latency 1268 (burstiness). It is hoped that self-interest and guidance on dynamic 1269 behaviour (especially flow start-up, which might need to be 1270 standardized) will be sufficient to prevent transports from sending 1271 excessive bursts of L4S traffic, given that the application's own 1272 latency will suffer most from such behaviour. 1274 Whether burst policing becomes necessary remains to be seen. Without 1275 it, there will be potential for attacks on the low latency of the L4S 1276 service. 1278 If needed, various arrangements could be used to address this 1279 concern: 1281 Local bottleneck queue protection: A per-flow (5-tuple) queue 1282 protection function [I-D.briscoe-docsis-q-protection] has been 1283 developed for the low latency queue in DOCSIS, which has adopted 1284 the DualQ L4S architecture. It protects the low latency service 1285 from any queue-building flows that accidentally or maliciously 1286 classify themselves into the low latency queue. It is designed to 1287 score flows based solely on their contribution to queuing (not 1288 flow rate in itself). Then, if the shared low latency queue is at 1289 risk of exceeding a threshold, the function redirects enough 1290 packets of the highest scoring flow(s) into the Classic queue to 1291 preserve low latency. 1293 Distributed traffic scrubbing: Rather than policing locally at each 1294 bottleneck, it may only be necessary to address problems 1295 reactively, e.g. punitively target any deployments of new bursty 1296 malware, in a similar way to how traffic from flooding attack 1297 sources is rerouted via scrubbing facilities. 1299 Local bottleneck per-flow scheduling: Per-flow scheduling should 1300 inherently isolate non-bursty flows from bursty ones (see Section 1301 5.2 for discussion of the merits of per-flow scheduling relative to 1302 per-flow policing).
1304 Distributed access subnet queue protection: Per-flow queue 1305 protection could be arranged for a queue structure distributed 1306 across a subnet inter-communicating using lower layer control 1307 messages (see Section 2.1.4 of [QDyn]). For instance, in a radio 1308 access network user equipment already sends regular buffer status 1309 reports to a radio network controller, which could use this 1310 information to remotely police individual flows. 1312 Distributed Congestion Exposure to Ingress Policers: The Congestion 1313 Exposure (ConEx) architecture [RFC7713] uses egress audit to 1314 motivate senders to truthfully signal path congestion in-band, 1315 where it can be used by ingress policers. An edge-to-edge variant 1316 of this architecture is also possible. 1318 Distributed Domain-edge traffic conditioning: An architecture 1319 similar to Diffserv [RFC2475] may be preferred, where traffic is 1320 proactively conditioned on entry to a domain, rather than 1321 reactively policed only if it leads to queuing once combined 1322 with other traffic at a bottleneck. 1324 Distributed core network queue protection: The policing function 1325 could be divided between per-flow mechanisms at the network 1326 ingress that characterize the burstiness of each flow into a 1327 signal carried with the traffic, and per-class mechanisms at 1328 bottlenecks that act on these signals if queuing actually occurs 1329 once the traffic converges. This would be somewhat similar to the 1330 idea behind core stateless fair queuing, which is in turn similar 1331 to [Nadas20]. 1333 None of these possible queue protection capabilities are considered a 1334 necessary part of the L4S architecture, which works without them (in 1335 a similar way to how the Internet works without per-flow rate 1336 policing). Indeed, under normal circumstances, latency policers 1337 would not intervene, and if operators found they were not necessary 1338 they could disable them.
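As an illustration of the first arrangement above ('Local bottleneck queue protection'), flow scoring and redirection might be sketched as follows. The threshold, the scoring input and the absence of score ageing are hypothetical simplifications relative to [I-D.briscoe-docsis-q-protection]:

```python
# Illustrative sketch (not the DOCSIS algorithm) of per-flow queue
# protection: each flow is scored by its contribution to queuing, and
# when the shared low latency queue risks exceeding a threshold,
# packets of the highest-scoring flow are redirected into the Classic
# queue to preserve low latency for the rest.
from collections import defaultdict

QUEUE_THRESHOLD = 50  # packets; hypothetical threshold

class QueueProtection:
    def __init__(self):
        self.scores = defaultdict(float)  # 5-tuple -> queuing score
        self.ll_queue_depth = 0           # low latency queue depth

    def classify(self, flow_id, queuing_contrib):
        """Return 'L4S' or 'Classic' for one arriving packet.

        `queuing_contrib` stands in for the packet's measured
        contribution to queuing; in the real function this is derived
        from the queue state and scores also age out over time."""
        self.scores[flow_id] += queuing_contrib
        if self.ll_queue_depth >= QUEUE_THRESHOLD:
            worst = max(self.scores, key=self.scores.get)
            if flow_id == worst:
                # Redirect the highest-scoring (queue-building) flow.
                return 'Classic'
        self.ll_queue_depth += 1
        return 'L4S'
```

In the DOCSIS function the score is derived from each flow's measured contribution to queuing delay and decays over time, so a well-paced flow is not penalized for a brief coincidence of bursts.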
Part of the L4S experiment will be to see 1339 whether such a function is necessary, and which arrangements are most 1340 appropriate to the size of the problem. 1342 8.3. Interaction between Rate Policing and L4S 1344 As mentioned in Section 5.2, L4S should remove the need for low 1345 latency Diffserv classes. However, those Diffserv classes that give 1346 certain applications or users priority over capacity would still be 1347 applicable in certain scenarios (e.g. corporate networks). Then, 1348 within such Diffserv classes, L4S would often be applicable to give 1349 traffic low latency and low loss as well. Within such a Diffserv 1350 class, the bandwidth available to a user or application is often 1351 limited by a rate policer. Similarly, in the default Diffserv class, 1352 rate policers are used to partition shared capacity. 1354 A classic rate policer drops any packets exceeding a set rate, 1355 usually also giving a burst allowance (variants exist where the 1356 policer re-marks non-compliant traffic to a discard-eligible Diffserv 1357 codepoint, so that it may be dropped elsewhere during contention). 1358 Whenever L4S traffic encounters one of these rate policers, it will 1359 experience drops and the source will have to fall back to a Classic 1360 congestion control, thus losing the benefits of L4S (Section 6.4.3). 1361 So, in networks that already use rate policers and plan to deploy 1362 L4S, it will be preferable to redesign these rate policers to be more 1363 friendly to the L4S service. 1365 L4S-friendly rate policing is currently a research area (note that 1366 this is not the same as latency policing). It might be achieved by 1367 setting a threshold where ECN marking is introduced, such that it is 1368 just under the policed rate or just under the burst allowance where 1369 drop is introduced. This could be applied to various types of rate 1370 policer, e.g.
[RFC2697], [RFC2698] or the 'local' (non-ConEx) variant 1371 of the ConEx congestion policer [I-D.briscoe-conex-policing]. It 1372 might also be possible to design scalable congestion controls to 1373 respond less catastrophically to loss that has not been preceded by a 1374 period of increasing delay. 1376 The design of L4S-friendly rate policers will require a separate 1377 dedicated document. For further discussion of the interaction 1378 between L4S and Diffserv, see [I-D.briscoe-tsvwg-l4s-diffserv]. 1380 8.4. ECN Integrity 1382 Receiving hosts can fool a sender into downloading faster by 1383 suppressing feedback of ECN marks (or of losses if retransmissions 1384 are not necessary or available otherwise). Various ways to protect 1385 transport feedback integrity have been developed. For instance: 1387 o The sender can test the integrity of the receiver's feedback by 1388 occasionally setting the IP-ECN field to the congestion 1389 experienced (CE) codepoint, which is normally only set by a 1390 congested link. Then the sender can test whether the receiver's 1391 feedback faithfully reports what it expects (see 2nd para of 1392 Section 20.2 of [RFC3168]). 1394 o A network can enforce a congestion response to its ECN markings 1395 (or packet losses) by auditing congestion exposure 1396 (ConEx) [RFC7713]. 1398 o The TCP authentication option (TCP-AO [RFC5925]) can be used to 1399 detect tampering with TCP congestion feedback. 1401 o The ECN Nonce [RFC3540] was proposed to detect tampering with 1402 congestion feedback, but it has been reclassified as 1403 historic [RFC8311]. 1405 Appendix C.1 of [I-D.ietf-tsvwg-ecn-l4s-id] gives more details of 1406 these techniques including their applicability and pros and cons. 1408 8.5. Privacy Considerations 1410 As discussed in Section 5.2, the L4S architecture does not preclude 1411 approaches that inspect end-to-end transport layer identifiers. 
For 1412 instance it is simple to add L4S support to FQ-CoDel, which 1413 classifies by application flow ID in the network. However, the main 1414 innovation of L4S is the DualQ AQM framework that does not need to 1415 inspect any deeper than the outermost IP header, because the L4S 1416 identifier is in the IP-ECN field. 1418 Thus, the L4S architecture enables very low queuing delay without 1419 _requiring_ inspection of information above the IP layer. This means 1420 that users who want to encrypt application flow identifiers, e.g. in 1421 IPSec or other encrypted VPN tunnels, don't have to sacrifice low 1422 delay [RFC8404]. 1424 Because L4S can provide low delay for a broad set of applications 1425 that choose to use it, there is no need for individual applications 1426 or classes within that broad set to be distinguishable in any way 1427 while traversing networks. This removes much of the ability to 1428 correlate between the delay requirements of traffic and other 1429 identifying features [RFC6973]. There may be some types of traffic 1430 that prefer not to use L4S, but the coarse binary categorization of 1431 traffic reveals very little that could be exploited to compromise 1432 privacy. 1434 9. Acknowledgements 1436 Thanks to Richard Scheffenegger, Wes Eddy, Karen Nielsen, David 1437 Black, Jake Holland and Vidhi Goel for their useful review comments. 1439 Bob Briscoe and Koen De Schepper were part-funded by the European 1440 Community under its Seventh Framework Programme through the Reducing 1441 Internet Transport Latency (RITE) project (ICT-317700). Bob Briscoe 1442 was also part-funded by the Research Council of Norway through the 1443 TimeIn project, partly by CableLabs and partly by the Comcast 1444 Innovation Fund. The views expressed here are solely those of the 1445 authors. 1447 10. Informative References 1449 [AFCD] Xue, L., Kumar, S., Cui, C., Kondikoppa, P., Chiu, C-H., 1450 and S-J. 
Park, "Towards fair and low latency next 1451 generation high speed networks: AFCD queuing", Journal of 1452 Network and Computer Applications 70:183--193, July 2016. 1454 [BBRv2] Cardwell, N., "TCP BBR v2 Alpha/Preview Release", github 1455 repository; Linux congestion control module, 1456 . 1458 [DCttH15] De Schepper, K., Bondarenko, O., Briscoe, B., and I. 1459 Tsang, "`Data Centre to the Home': Ultra-Low Latency for 1460 All", RITE project Technical Report , 2015, 1461 . 1463 [DOCSIS3.1] 1464 CableLabs, "MAC and Upper Layer Protocols Interface 1465 (MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable 1466 Service Interface Specifications DOCSIS(R) 3.1 Version i17 1467 or later, January 2019, . 1470 [DualPI2Linux] 1471 Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O., 1472 and H. Steen, "DUALPI2 - Low Latency, Low Loss and 1473 Scalable (L4S) AQM", Proc. Linux Netdev 0x13 , March 2019, 1474 . 1477 [Hohlfeld14] 1478 Hohlfeld, O., Pujol, E., Ciucu, F., Feldmann, A., and P. 1479 Barford, "A QoE Perspective on Sizing Network Buffers", 1480 Proc. ACM Internet Measurement Conf (IMC'14), November 1481 2014. 1483 [I-D.briscoe-conex-policing] 1484 Briscoe, B., "Network Performance Isolation using 1485 Congestion Policing", draft-briscoe-conex-policing-01 1486 (work in progress), February 2014. 1488 [I-D.briscoe-docsis-q-protection] 1489 Briscoe, B. and G. White, "Queue Protection to Preserve 1490 Low Latency", draft-briscoe-docsis-q-protection-00 (work 1491 in progress), July 2019. 1493 [I-D.briscoe-iccrg-prague-congestion-control] 1494 Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague 1495 Congestion Control", draft-briscoe-iccrg-prague- 1496 congestion-control-00 (work in progress), March 2021. 1498 [I-D.briscoe-tsvwg-l4s-diffserv] 1499 Briscoe, B., "Interactions between Low Latency, Low Loss, 1500 Scalable Throughput (L4S) and Differentiated Services", 1501 draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), 1502 November 2018.
1504 [I-D.cardwell-iccrg-bbr-congestion-control] 1505 Cardwell, N., Cheng, Y., Yeganeh, S. H., and V. Jacobson, 1506 "BBR Congestion Control", draft-cardwell-iccrg-bbr- 1507 congestion-control-00 (work in progress), July 2017. 1509 [I-D.ietf-tcpm-accurate-ecn] 1510 Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More 1511 Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate- 1512 ecn-14 (work in progress), February 2021. 1514 [I-D.ietf-tcpm-generalized-ecn] 1515 Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit 1516 Congestion Notification (ECN) to TCP Control Packets", 1517 draft-ietf-tcpm-generalized-ecn-07 (work in progress), 1518 February 2021. 1520 [I-D.ietf-tsvwg-aqm-dualq-coupled] 1521 Schepper, K. D., Briscoe, B., and G. White, "DualQ Coupled 1522 AQMs for Low Latency, Low Loss and Scalable Throughput 1523 (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-14 (work in 1524 progress), March 2021. 1526 [I-D.ietf-tsvwg-ecn-encap-guidelines] 1527 Briscoe, B. and J. Kaippallimalil, "Guidelines for Adding 1528 Congestion Notification to Protocols that Encapsulate IP", 1529 draft-ietf-tsvwg-ecn-encap-guidelines-15 (work in 1530 progress), March 2021. 1532 [I-D.ietf-tsvwg-ecn-l4s-id] 1533 Schepper, K. D. and B. Briscoe, "Explicit Congestion 1534 Notification (ECN) Protocol for Ultra-Low Queuing Delay 1535 (L4S)", draft-ietf-tsvwg-ecn-l4s-id-14 (work in progress), 1536 March 2021. 1538 [I-D.ietf-tsvwg-nqb] 1539 White, G. and T. Fossati, "A Non-Queue-Building Per-Hop 1540 Behavior (NQB PHB) for Differentiated Services", draft- 1541 ietf-tsvwg-nqb-05 (work in progress), March 2021. 1543 [I-D.ietf-tsvwg-rfc6040update-shim] 1544 Briscoe, B., "Propagating Explicit Congestion Notification 1545 Across IP Tunnel Headers Separated by a Shim", draft-ietf- 1546 tsvwg-rfc6040update-shim-13 (work in progress), March 1547 2021. 1549 [I-D.morton-tsvwg-codel-approx-fair] 1550 Morton, J. and P. G. 
Heist, "Controlled Delay Approximate 1551 Fairness AQM", draft-morton-tsvwg-codel-approx-fair-01 1552 (work in progress), March 2020. 1554 [I-D.sridharan-tcpm-ctcp] 1555 Sridharan, M., Tan, K., Bansal, D., and D. Thaler, 1556 "Compound TCP: A New TCP Congestion Control for High-Speed 1557 and Long Distance Networks", draft-sridharan-tcpm-ctcp-02 1558 (work in progress), November 2008. 1560 [I-D.stewart-tsvwg-sctpecn] 1561 Stewart, R. R., Tuexen, M., and X. Dong, "ECN for Stream 1562 Control Transmission Protocol (SCTP)", draft-stewart- 1563 tsvwg-sctpecn-05 (work in progress), January 2014. 1565 [L4Sdemo16] 1566 Bondarenko, O., De Schepper, K., Tsang, I., and B. 1567 Briscoe, "Ultra-Low Delay for All: Live Experience, Live 1568 Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, 1569 . 1573 [LEDBAT_AQM] 1574 Al-Saadi, R., Armitage, G., and J. But, "Characterising 1575 LEDBAT Performance Through Bottlenecks Using PIE, FQ-CoDel 1576 and FQ-PIE Active Queue Management", Proc. IEEE 42nd 1577 Conference on Local Computer Networks (LCN) 278--285, 1578 2017, . 1580 [Mathis09] 1581 Mathis, M., "Relentless Congestion Control", PFLDNeT'09 , 1582 May 2009, . 1587 [McIlroy78] 1588 McIlroy, M., Pinson, E., and B. Tague, "UNIX Time-Sharing 1589 System: Foreword", The Bell System Technical Journal 1590 57:6(1902--1903), July 1978, 1591 . 1593 [Nadas20] Nadas, S., Gombos, G., Fejes, F., and S. Laki, "A 1594 Congestion Control Independent L4S Scheduler", Proc. 1595 Applied Networking Research Workshop (ANRW '20) 45--51, 1596 July 2020. 1598 [NewCC_Proc] 1599 Eggert, L., "Experimental Specification of New Congestion 1600 Control Algorithms", IETF Operational Note ion-tsv-alt-cc, 1601 July 2007. 1603 [PragueLinux] 1604 Briscoe, B., De Schepper, K., Albisser, O., Misund, J., 1605 Tilmans, O., Kuehlewind, M., and A. Ahmed, "Implementing 1606 the `TCP Prague' Requirements for Low Latency Low Loss 1607 Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 , 1608 March 2019, . 
1611 [QDyn] Briscoe, B., "Rapid Signalling of Queue Dynamics", 1612 bobbriscoe.net Technical Report TR-BB-2017-001; 1613 arXiv:1904.07044 [cs.NI], September 2017, 1614 . 1616 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 1617 and W. Weiss, "An Architecture for Differentiated 1618 Services", RFC 2475, DOI 10.17487/RFC2475, December 1998, 1619 . 1621 [RFC2697] Heinanen, J. and R. Guerin, "A Single Rate Three Color 1622 Marker", RFC 2697, DOI 10.17487/RFC2697, September 1999, 1623 . 1625 [RFC2698] Heinanen, J. and R. Guerin, "A Two Rate Three Color 1626 Marker", RFC 2698, DOI 10.17487/RFC2698, September 1999, 1627 . 1629 [RFC2884] Hadi Salim, J. and U. Ahmed, "Performance Evaluation of 1630 Explicit Congestion Notification (ECN) in IP Networks", 1631 RFC 2884, DOI 10.17487/RFC2884, July 2000, 1632 . 1634 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 1635 of Explicit Congestion Notification (ECN) to IP", 1636 RFC 3168, DOI 10.17487/RFC3168, September 2001, 1637 . 1639 [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, 1640 J., Courtney, W., Davari, S., Firoiu, V., and D. 1641 Stiliadis, "An Expedited Forwarding PHB (Per-Hop 1642 Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, 1643 . 1645 [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit 1646 Congestion Notification (ECN) Signaling with Nonces", 1647 RFC 3540, DOI 10.17487/RFC3540, June 2003, 1648 . 1650 [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", 1651 RFC 3649, DOI 10.17487/RFC3649, December 2003, 1652 . 1654 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 1655 Congestion Control Protocol (DCCP)", RFC 4340, 1656 DOI 10.17487/RFC4340, March 2006, 1657 . 1659 [RFC4774] Floyd, S., "Specifying Alternate Semantics for the 1660 Explicit Congestion Notification (ECN) Field", BCP 124, 1661 RFC 4774, DOI 10.17487/RFC4774, November 2006, 1662 . 
1664 [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", 1665 RFC 4960, DOI 10.17487/RFC4960, September 2007, 1666 . 1668 [RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion 1669 Control Algorithms", BCP 133, RFC 5033, 1670 DOI 10.17487/RFC5033, August 2007, 1671 . 1673 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 1674 Friendly Rate Control (TFRC): Protocol Specification", 1675 RFC 5348, DOI 10.17487/RFC5348, September 2008, 1676 . 1678 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1679 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 1680 . 1682 [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP 1683 Authentication Option", RFC 5925, DOI 10.17487/RFC5925, 1684 June 2010, . 1686 [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion 1687 Notification", RFC 6040, DOI 10.17487/RFC6040, November 1688 2010, . 1690 [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., 1691 and K. Carlberg, "Explicit Congestion Notification (ECN) 1692 for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 1693 2012, . 1695 [RFC6973] Cooper, A., Tschofenig, H., Aboba, B., Peterson, J., 1696 Morris, J., Hansen, M., and R. Smith, "Privacy 1697 Considerations for Internet Protocols", RFC 6973, 1698 DOI 10.17487/RFC6973, July 2013, 1699 . 1701 [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext 1702 Transfer Protocol Version 2 (HTTP/2)", RFC 7540, 1703 DOI 10.17487/RFC7540, May 2015, 1704 . 1706 [RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe, 1707 "Problem Statement and Requirements for Increased Accuracy 1708 in Explicit Congestion Notification (ECN) Feedback", 1709 RFC 7560, DOI 10.17487/RFC7560, August 2015, 1710 . 1712 [RFC7665] Halpern, J., Ed. and C. Pignataro, Ed., "Service Function 1713 Chaining (SFC) Architecture", RFC 7665, 1714 DOI 10.17487/RFC7665, October 2015, 1715 . 1717 [RFC7713] Mathis, M. and B. 
Briscoe, "Congestion Exposure (ConEx) 1718 Concepts, Abstract Mechanism, and Requirements", RFC 7713, 1719 DOI 10.17487/RFC7713, December 2015, 1720 . 1722 [RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, 1723 "Proportional Integral Controller Enhanced (PIE): A 1724 Lightweight Control Scheme to Address the Bufferbloat 1725 Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, 1726 . 1728 [RFC8034] White, G. and R. Pan, "Active Queue Management (AQM) Based 1729 on Proportional Integral Controller Enhanced (PIE) for 1730 Data-Over-Cable Service Interface Specifications (DOCSIS) 1731 Cable Modems", RFC 8034, DOI 10.17487/RFC8034, February 1732 2017, . 1734 [RFC8170] Thaler, D., Ed., "Planning for Protocol Adoption and 1735 Subsequent Transitions", RFC 8170, DOI 10.17487/RFC8170, 1736 May 2017, . 1738 [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., 1739 and G. Judd, "Data Center TCP (DCTCP): TCP Congestion 1740 Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, 1741 October 2017, . 1743 [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, 1744 J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler 1745 and Active Queue Management Algorithm", RFC 8290, 1746 DOI 10.17487/RFC8290, January 2018, 1747 . 1749 [RFC8298] Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation 1750 for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December 1751 2017, . 1753 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 1754 Notification (ECN) Experimentation", RFC 8311, 1755 DOI 10.17487/RFC8311, January 2018, 1756 . 1758 [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and 1759 R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", 1760 RFC 8312, DOI 10.17487/RFC8312, February 2018, 1761 . 1763 [RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of 1764 Pervasive Encryption on Operators", RFC 8404, 1765 DOI 10.17487/RFC8404, July 2018, 1766 .
1768 [RFC8511] Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst, 1769 "TCP Alternative Backoff with ECN (ABE)", RFC 8511, 1770 DOI 10.17487/RFC8511, December 2018, 1771 . 1773 [RFC8888] Sarker, Z., Perkins, C., Singh, V., and M. Ramalho, "RTP 1774 Control Protocol (RTCP) Feedback for Congestion Control", 1775 RFC 8888, DOI 10.17487/RFC8888, January 2021, 1776 . 1778 [RFC9000] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based 1779 Multiplexed and Secure Transport", RFC 9000, 1780 DOI 10.17487/RFC9000, May 2021, 1781 . 1783 [SCReAM] Johansson, I., "SCReAM", github repository, 1784 . 1787 [TCP-CA] Jacobson, V. and M. Karels, "Congestion Avoidance and 1788 Control", Lawrence Berkeley Labs Technical Report , 1789 November 1988, . 1791 [TCP-sub-mss-w] 1792 Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion 1793 Window for Small Round Trip Times", BT Technical Report 1794 TR-TUB8-2015-002, May 2015, 1795 . 1798 [UnorderedLTE] 1799 Austrheim, M., "Implementing immediate forwarding for 4G 1800 in a network simulator", Masters Thesis, Uni Oslo , June 1801 2019. 1803 Appendix A. Standardization items 1805 The following table includes all the items that will need to be 1806 standardized to provide a full L4S architecture. 1808 The table is too wide for the ASCII draft format, so it has been 1809 split into two, with a common column of row index numbers on the 1810 left. 1812 The columns in the second part of the table have the following 1813 meanings: 1815 WG: The IETF WG most relevant to this requirement.
The "tcpm/iccrg" 1816 combination refers to the procedure typically used for congestion 1817 control changes, where tcpm owns the approval decision, but uses 1818 the iccrg for expert review [NewCC_Proc]; 1820 TCP: Applicable to all forms of TCP congestion control; 1822 DCTCP: Applicable to Data Center TCP as currently used (in 1823 controlled environments); 1825 DCTCP bis: Applicable to any future Data Center TCP congestion 1826 control intended for controlled environments; 1828 XXX Prague: Applicable to a Scalable variant of XXX (TCP/SCTP/RMCAT) 1829 congestion control. 1831 +-----+------------------------+------------------------------------+ 1832 | Req | Requirement | Reference | 1833 | # | | | 1834 +-----+------------------------+------------------------------------+ 1835 | 0 | ARCHITECTURE | | 1836 | 1 | L4S IDENTIFIER | [I-D.ietf-tsvwg-ecn-l4s-id] S.3 | 1837 | 2 | DUAL QUEUE AQM | [I-D.ietf-tsvwg-aqm-dualq-coupled] | 1838 | 3 | Suitable ECN Feedback | [I-D.ietf-tcpm-accurate-ecn] | 1839 | | | S.4.2, | 1840 | | | [I-D.stewart-tsvwg-sctpecn]. 
| 1841 | | | | 1842 | | SCALABLE TRANSPORT - | | 1843 | | SAFETY ADDITIONS | | 1844 | 4-1 | Fall back to | [I-D.ietf-tsvwg-ecn-l4s-id] S.4.3, | 1845 | | Reno/Cubic on loss | [RFC8257] | 1846 | 4-2 | Fall back to | [I-D.ietf-tsvwg-ecn-l4s-id] S.4.3 | 1847 | | Reno/Cubic if classic | | 1848 | | ECN bottleneck | | 1849 | | detected | | 1850 | | | | 1851 | 4-3 | Reduce RTT-dependence | [I-D.ietf-tsvwg-ecn-l4s-id] S.4.3 | 1852 | | | | 1853 | 4-4 | Scaling TCP's | [I-D.ietf-tsvwg-ecn-l4s-id] S.4.3, | 1854 | | Congestion Window for | [TCP-sub-mss-w] | 1855 | | Small Round Trip Times | | 1856 | | SCALABLE TRANSPORT - | | 1857 | | PERFORMANCE | | 1858 | | ENHANCEMENTS | | 1859 | 5-1 | Setting ECT in TCP | [I-D.ietf-tcpm-generalized-ecn] | 1860 | | Control Packets and | | 1861 | | Retransmissions | | 1862 | 5-2 | Faster-than-additive | [I-D.ietf-tsvwg-ecn-l4s-id] (Appx | 1863 | | increase | A.2.2) | 1864 | 5-3 | Faster Convergence at | [I-D.ietf-tsvwg-ecn-l4s-id] (Appx | 1865 | | Flow Start | A.2.2) | 1866 +-----+------------------------+------------------------------------+ 1867 +-----+--------+-----+-------+-----------+--------+--------+--------+ 1868 | # | WG | TCP | DCTCP | DCTCP-bis | TCP | SCTP | RMCAT | 1869 | | | | | | Prague | Prague | Prague | 1870 +-----+--------+-----+-------+-----------+--------+--------+--------+ 1871 | 0 | tsvwg | Y | Y | Y | Y | Y | Y | 1872 | 1 | tsvwg | | | Y | Y | Y | Y | 1873 | 2 | tsvwg | n/a | n/a | n/a | n/a | n/a | n/a | 1874 | | | | | | | | | 1875 | | | | | | | | | 1876 | | | | | | | | | 1877 | 3 | tcpm | Y | Y | Y | Y | n/a | n/a | 1878 | | | | | | | | | 1879 | 4-1 | tcpm | | Y | Y | Y | Y | Y | 1880 | | | | | | | | | 1881 | 4-2 | tcpm/ | | | | Y | Y | ? | 1882 | | iccrg? | | | | | | | 1883 | | | | | | | | | 1884 | | | | | | | | | 1885 | | | | | | | | | 1886 | | | | | | | | | 1887 | 4-3 | tcpm/ | | | Y | Y | Y | ? | 1888 | | iccrg? | | | | | | | 1889 | 4-4 | tcpm | Y | Y | Y | Y | Y | ? 
| 1890 | | | | | | | | | 1891 | | | | | | | | | 1892 | 5-1 | tcpm | Y | Y | Y | Y | n/a | n/a | 1893 | | | | | | | | | 1894 | 5-2 | tcpm/ | | | Y | Y | Y | ? | 1895 | | iccrg? | | | | | | | 1896 | 5-3 | tcpm/ | | | Y | Y | Y | ? | 1897 | | iccrg? | | | | | | | 1898 +-----+--------+-----+-------+-----------+--------+--------+--------+ 1900 Authors' Addresses 1902 Bob Briscoe (editor) 1903 Independent 1904 UK 1906 Email: ietf@bobbriscoe.net 1907 URI: http://bobbriscoe.net/ 1908 Koen De Schepper 1909 Nokia Bell Labs 1910 Antwerp 1911 Belgium 1913 Email: koen.de_schepper@nokia.com 1914 URI: https://www.bell-labs.com/usr/koen.de_schepper 1916 Marcelo Bagnulo 1917 Universidad Carlos III de Madrid 1918 Av. Universidad 30 1919 Leganes, Madrid 28911 1920 Spain 1922 Phone: 34 91 6249500 1923 Email: marcelo@it.uc3m.es 1924 URI: http://www.it.uc3m.es 1926 Greg White 1927 CableLabs 1928 US 1930 Email: G.White@CableLabs.com