Transport Area Working Group                               G. White, Ed.
Internet-Draft                                                 CableLabs
Intended status: Informational                              12 July 2021
Expires: 13 January 2022

Operational Guidance for Deployment of L4S in the Internet
draft-ietf-tsvwg-l4sops-01

Abstract

This document is intended to provide guidance in order to ensure successful deployment of Low Latency, Low Loss, Scalable throughput (L4S) in the Internet. Other L4S documents provide guidance for running an L4S experiment; this document focuses solely on potential interactions between L4S flows and flows using the original ('Classic') ECN over a Classic ECN bottleneck link. The document discusses the potential outcomes of these interactions, describes mechanisms to detect the presence of Classic ECN bottlenecks, and identifies opportunities to prevent and/or detect and resolve fairness problems in such networks. This guidance is aimed at operators of end-systems, operators of networks, and researchers.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 13 January 2022.

Copyright Notice

Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents

1.  Introduction
2.  Per-Flow Fairness
3.  Flow Queuing Systems
4.  Detection of Classic ECN Bottlenecks
  4.1.  Recent Studies
  4.2.  Future Experiments
5.  Operator of an L4S host
  5.1.  Server Type
    5.1.1.  General purpose servers (e.g. web servers)
    5.1.2.  Specialized servers handling long-running sessions (e.g. cloud gaming)
  5.2.  Server deployment environment
    5.2.1.  Edge Servers
    5.2.2.  Other hosts
6.  Operator of a Network Employing RFC3168 FIFO Bottlenecks
  6.1.  Preferred Options
    6.1.1.  Upgrade AQMs to an L4S-aware AQM
    6.1.2.  Configure Non-Coupled Dual Queue with Shallow Target
    6.1.3.  Approximate Fair Dropping
    6.1.4.  Replace RFC3168 FIFO with RFC3168 FQ
    6.1.5.  Do Nothing
  6.2.  Less Preferred Options
    6.2.1.  Configure Non-Coupled Dual Queue Treating ECT(1) as NotECT
    6.2.2.  WRED with ECT(1) Differentiation
    6.2.3.  Configure AQM to treat ECT(1) as NotECT
    6.2.4.  ECT(1) Tunnel Bypass
  6.3.  Last Resort Options
    6.3.1.  Disable RFC3168 Support
    6.3.2.  Re-mark ECT(1) to NotECT Prior to AQM
7.  Operator of a Network Employing RFC3168 FQ Bottlenecks
8.  Conclusion of the L4S experiment
  8.1.  Termination of a successful L4S experiment
  8.2.  Termination of an unsuccessful L4S experiment
9.  Contributors
10. IANA Considerations
11. Security Considerations
12. Informative References
Author's Address

1.  Introduction

Low-latency, low-loss, scalable throughput (L4S) [I-D.ietf-tsvwg-l4s-arch] traffic is designed to provide lower queuing delay than conventional traffic via a new network service based on a modified Explicit Congestion Notification (ECN) response from the network. L4S traffic is identified by the ECT(1) codepoint, and network bottlenecks that support L4S should congestion-mark ECT(1) packets to enable L4S congestion feedback. However, L4S traffic is also expected to coexist well with classic congestion controlled traffic even if the bottleneck queue does not support L4S. This includes paths where the bottleneck link utilizes packet drops in response to congestion (either due to buffer overrun or active queue management), as well as paths that implement a 'flow-queuing' scheduler such as fq_codel [RFC8290]. A potential area of poor interoperability lies in network bottlenecks employing a shared queue that implements an Active Queue Management (AQM) algorithm that provides Explicit Congestion Notification signaling according to [RFC3168].
RFC3168 has been updated (via [RFC8311]) to reserve ECT(1) for experimental use only (also see [IANA-ECN]), and its use for L4S has been specified in [I-D.ietf-tsvwg-ecn-l4s-id]. However, any deployed RFC3168 AQMs might not be updated, and RFC8311 still prefers that routers not involved in L4S experimentation treat ECT(1) and ECT(0) as equivalent. It has been demonstrated [Briscoe] that when a set of long-running flows comprising both classic congestion controlled flows and L4S-compliant congestion controlled flows compete for bandwidth in such a legacy shared RFC3168 queue, the classic congestion controlled flows may achieve lower throughput than they would have if all of the flows had been classic congestion controlled flows. This 'unfairness' between the two classes is more pronounced on longer RTT paths (e.g. 50 ms and above) and/or at higher link rates (e.g. 50 Mbps and above). The lower the capacity per flow, the less pronounced the problem becomes. Thus the imbalance is most significant when the slowest flow rate is still high in absolute terms.

The root cause of the unfairness is that the L4S architecture redefines the congestion signal (CE mark) and congestion response in the case of packets marked ECT(1) (used by L4S senders), whereas an RFC3168 queue does not differentiate between packets marked ECT(0) (used by classic senders) and those marked ECT(1), and provides CE marks identically to both types. The classic senders expect that CE marks are sent very rarely (e.g. approximately 1 CE mark every 200 round trips on a 50 Mbps x 50 ms path), while the L4S senders expect very frequent CE marking (e.g. approximately 2 CE marks per round trip). The result is that the classic senders respond to the CE marks provided by the bottleneck by yielding capacity to the L4S flows.
The resulting rate imbalance can be demonstrated, and could be a cause of concern in some cases.

This concern primarily relates to single-queue (FIFO) bottleneck links that implement RFC3168 ECN, but the situation can also potentially occur with per-flow queuing, e.g. fq_codel [RFC8290], when flow isolation is imperfect due to hash collisions or VPN tunnels.

While the above mentioned unfairness has been demonstrated in laboratory testing, it has not been observed in operational networks, in part because members of the Transport Working Group are not aware of any deployments of single-queue Classic ECN bottlenecks in the Internet.

This issue was considered in November 2015 (and reaffirmed in April 2020) when the WG decided on the identifier to use for L4S, as recorded in Appendix B.1 of [I-D.ietf-tsvwg-ecn-l4s-id]. It was recognized that compromises would have to be made because IP header space is extremely limited. A number of alternative codepoint schemes were compared for their ability to traverse most Internet paths, to work over tunnels, to work at lower layers, to work with TCP, etc. It was decided to progress on the basis that robust performance in the presence of these single-queue RFC3168 bottlenecks is not the most critical issue, since it was believed that they are rare.

Nonetheless, there is the possibility that such deployments exist, and there is the possibility that they could be deployed/enabled in the future. Since any negative impact of this coexistence issue would not be directly experienced by the party experimenting with L4S endpoints, but rather by the other users of the bottleneck, there is an interest in providing guidance to ensure that measures can be taken to address the potential issues, should they arise in practice.

2.  Per-Flow Fairness

There are a number of factors that influence the relative rates achieved by a set of users or a set of applications sharing a queue in a bottleneck link. Notably, the response that each application has to congestion signals (whether loss or explicit signaling) can play a large role in determining whether the applications share the bandwidth in an equitable manner. In the Internet, ISPs typically control capacity sharing between their customers using a scheduler at the access bottleneck rather than relying on the congestion responses of end-systems, so in that context this question primarily concerns capacity sharing between the applications used by one customer site. Nonetheless, there are many networks on the Internet where capacity sharing relies, at least to some extent, on congestion control in the end-systems. The traditional norm for congestion response has been that it is handled on a per-connection basis, and that (all else being equal) it results in each connection in the bottleneck achieving a data rate inversely proportional to the average RTT of the connection. The end result (in the case of steady-state behavior of a set of like connections) is that each user or application achieves a data rate proportional to N/RTT, where N is the number of simultaneous connections that the user or application creates, and RTT is the harmonic mean of the average round-trip times for those connections. Thus, users or applications that create a larger number of connections and/or that have a lower RTT achieve a larger share of the bottleneck link rate than others.

While this may not be considered fair by many, it nonetheless has been the typical starting point for discussions around fairness.
In fact, it has been common when evaluating new congestion responses to set aside N and RTT as variables in the equation, and simply compare per-flow rates between flows with the same RTT. For example, [RFC5348] defines the congestion response for a flow to be '"reasonably fair" if its sending rate is generally within a factor of two of the sending rate of a [Reno] TCP flow under the same conditions.' Given that RTTs can vary by roughly two orders of magnitude and flow counts can vary by at least an order of magnitude between applications, the accepted definition of reasonable fairness leaves quite a bit of room for different levels of performance between users or applications; it is perhaps not a gold standard, but rather a metric that is used because of its convenience.

In practice, the effect of this RTT dependence has historically been muted by the fact that many networks were deployed with very large ("bloated") drop-tail buffers that would introduce queuing delays well in excess of the base RTT of the flows utilizing the link, thus equalizing (to some degree) the effective RTTs of those flows.

Recently, as network equipment suppliers and operators have worked to improve the latency performance of the network by the use of smaller buffers and/or AQM algorithms, this has had the side-effect of uncovering the inherent RTT bias in classic congestion control algorithms.

The L4S architecture aims to significantly improve this situation, by requiring senders to adopt a congestion response that eliminates RTT bias as much as possible (see [I-D.ietf-tsvwg-ecn-l4s-id]). As a result, L4S promotes a level of per-flow fairness beyond what is ordinarily considered for classic senders, the RFC3168 issue notwithstanding.
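The N/RTT relationship described earlier in this section can be checked numerically. The sketch below is purely illustrative and not part of this document's guidance: the constant k stands in for the packet-size and loss-probability terms of a full steady-state model. It confirms that summing per-connection rates that are each inversely proportional to RTT is, by definition, N divided by the harmonic mean of those RTTs:

```python
from statistics import harmonic_mean

def conn_rate(rtt, k=1.0):
    # Idealized steady-state model from the text: each connection's
    # rate is inversely proportional to its average RTT.  The constant
    # k absorbs packet-size and loss-probability terms (illustrative).
    return k / rtt

def aggregate_rate(rtts, k=1.0):
    # Total rate achieved by a user or application running one
    # connection per entry in `rtts`.
    return sum(conn_rate(rtt, k) for rtt in rtts)

rtts = [0.010, 0.050, 0.200]  # average RTTs in seconds
agg = aggregate_rate(rtts)

# The text's N/RTT formulation: N connections divided by the harmonic
# mean of their RTTs.  The harmonic mean is N / sum(1/RTT_i), so
# N / harmonic_mean equals the sum of the individual 1/RTT rates.
n_over_hmean = len(rtts) / harmonic_mean(rtts)
assert abs(agg - n_over_hmean) < 1e-9
```

This makes concrete why a user that opens more connections, or has a lower RTT, obtains a larger share of a shared bottleneck under the traditional norm.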
It is also worth noting that the congestion control algorithms deployed currently on the Internet tend toward (RTT-weighted) fairness only over long timescales. For example, the CUBIC algorithm can take minutes to converge to fairness when a new flow joins an existing flow on a link [Ha]. Since the vast majority of TCP connections don't last for minutes, it is unclear to what degree per-flow, same-RTT fairness, even when demonstrated in the lab, translates to the real world.

So, in real networks, where per-application, per-end-host or per-customer fairness may be more important than long-term, same-RTT, per-flow fairness, it may not be that instructive to focus on the latter as being a necessary end goal.

Nonetheless, situations in which the presence of an L4S flow has the potential to cause harm [Ware] to classic flows need to be understood. Most importantly, if there are situations in which the introduction of L4S traffic would degrade both the absolute and relative performance of classic traffic significantly, i.e. to the point that it would be considered starvation while L4S was not starved, these situations need to be understood and either remedied or avoided.

Aligned with this context, the guidance provided in this document is aimed not at monitoring the relative performance of L4S senders compared against classic senders on a per-flow basis, but rather at identifying instances where RFC3168 bottlenecks are deployed, so that operators of L4S senders can have the opportunity to assess whether any actions need to be taken. Additionally, this document provides guidance for network operators around configuring any RFC3168 bottlenecks to minimize the potential for negative interactions between L4S and classic senders.

3.  Flow Queuing Systems

As noted above, the concern around RFC3168 coexistence mainly concerns single-queue systems where classic and L4S traffic are mixed. In a flow-queuing system, when flow isolation is successful, the FQ scheduling of such queues isolates classic congestion control traffic from L4S traffic, and thus eliminates the potential for unfairness. But these systems are known to sometimes result in imperfect isolation, either due to hash collisions (see Section 5.3 of [RFC8290]), because of VPN tunneling (see Section 6.2 of [RFC8290]), or due to deliberate configuration (see Section 7, Paragraph 5).

It is believed that the majority of FQ deployments in bottleneck links today (e.g. Cake [Hoiland-Jorgensen]) employ hashing algorithms that virtually eliminate the possibility of collisions, making this a non-issue for those deployments. But VPN tunnels remain an issue for FQ deployments, and the introduction of L4S traffic raises the possibility that tunnels containing mixed classic and L4S traffic would exist, in which case FQ implementations that have not been updated to be L4S-aware could exhibit similar unfairness properties as single queue AQMs. Section 7 discusses some remedies that can be implemented by operators of FQ equipment in order to minimize this risk. Additionally, end-host mitigations such as separating L4S and Classic traffic into distinct VPN tunnels could be employed.

4.  Detection of Classic ECN Bottlenecks

The IETF encourages researchers, end system deployers and network operators to conduct experiments to identify to what degree RFC3168 bottlenecks exist in networks.
These types of measurement campaigns, even if each is conducted over a limited set of paths, could be useful to further understand the scope of any potential issues, to guide end system deployers on where to examine performance more closely (or possibly delay L4S deployment), and to help network operators identify nodes where remediation may be necessary to provide the best performance.

4.1.  Recent Studies

A small number of recent studies have attempted to gauge the level of RFC3168 AQM deployment in the Internet.

In 2020, Akamai conducted a study (https://mailarchive.ietf.org/arch/msg/tsvwg/2tbRHphJ8K_CE6is9n7iQy-VAZM/) of "downstream" (server to client) CE marking broken out by ASN on two separate days, one in late March, the other in mid July [Holland]. They concluded that the prevalence of CE-marking was low across the ~800 ASNs observed (0.19% - 0.30% of ECT client IPs ever saw a CE mark), but it was growing, and that they could not determine whether the CE marking was due to a single queue or FQ. They also observed that RFC3168 AQMs are not uniformly distributed. There were three small ISPs where the prevalence of CE-marking was above ~70%, indicating a likely deployment by the ISP. There were another four small ASNs where the prevalence was between 10% and 20%, which may also indicate deployment by the ISP. There were also roughly six larger ASNs (and perhaps 20 small ASNs) where the prevalence was between 3% and 8%.

In 2017, Apple reported on their observations of ECN marking by networks, broken out by country [Bhooma]. They reported four countries that exceeded the global baseline seen by Akamai, but one of these (Argentine Republic) was later discovered to be due to a bug (https://datatracker.ietf.org/meeting/106/materials/slides-106-tsvwg-sessa-72-l4s-drafts-00#page=15), leaving three countries: China (1% of paths), Mexico (3.2% of paths) and France (6% of paths). The percentage in France appears consistent with reports (https://mailarchive.ietf.org/arch/msg/tsvwg/UyvpwUiNw0obd_EylBBV7kDRIHs/) that fq_codel has been implemented in DSL home routers deployed by Free.fr.

In December 2020 - January 2021, Pete Heist worked with a small cooperative WISP in the Czech Republic to collect data on CE-marking [I-D.heist-tsvwg-ecn-deployment-observations]. Overall, 18.6% of paths saw possible RFC3168 AQM activity, which appears to place this ISP in the small group with moderately high RFC3168 prevalence reported by Akamai. This ISP was known to have deployed RFC3168 fq_codel equipment in some of their subnets, and in other subnets there were 33 IPs where possible AQM activity was observed via CE-marks and/or ECE flags, corresponding to approximately 10% of paths. It was agreed (https://mailarchive.ietf.org/arch/msg/tsvwg/Rj7GylByZuFa3_LTCMvEfb-CYpw/) that these were likely to be due to fq_codel implementations in home routers deployed by members of the cooperative.

The interpretation of these studies seems to be that there are no known deployments of FIFO RFC3168, that all of the known RFC3168 deployments are fq_codel, that the majority of the currently unknown deployments are likely to be fq_codel, and that there may be a small number of networks where CE-marking is prevalent (and thus likely ISP-managed) where it is currently unknown whether the source is a FIFO or an FQ system.

Other studies (e.g. [Trammel], [Bauer], [Mandalari]) have examined ECN traversal, but have not reported data on the prevalence of CE-marking by networks. Another [Roddav] examined traces from a Tier 1 ISP link in 2018 and observed that 94% of the non-zero ECN marked packets were CE, which appears to reflect a misconfiguration of equipment using that link, as opposed to providing evidence of RFC3168 AQM deployment.

4.2.  Future Experiments

The design of future experiments should consider not only the detection of RFC3168 ECN marking, but also the determination of whether the bottleneck AQM is a single queue (FIFO) or a flow-queuing (FQ) system. It is believed that the vast majority, if not all, of the RFC3168 AQMs in use at bottleneck links are flow-queuing systems (e.g. fq_codel [RFC8290] or COBALT [Palmei]).

[Briscoe] contains recommendations on some of the mechanisms that can be used to detect RFC3168 bottlenecks. In particular, Section 4 of [Briscoe] outlines an approach for out-of-band detection of RFC3168 bottlenecks.

5.  Operator of an L4S host

From a host's perspective, support for L4S only involves the sender, via ECT(1) marking and L4S-compatible congestion control. The receiver is involved in ECN feedback but can generally be agnostic to whether ECN is being used for L4S [I-D.ietf-tsvwg-l4s-arch]. Between these two entities, it is primarily incumbent upon the sender to evaluate the potential for the presence of RFC3168 FIFO bottlenecks and make decisions whether or not to use L4S congestion control. While it is possible for a receiver to disable L4S functionality by not negotiating ECN, a general purpose receiver is not expected to perform any testing or monitoring for RFC3168, and is also not expected to invoke any active response in the case that such a bottleneck exists.
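To make the sender's role concrete, the sketch below stamps the ECT(1) codepoint into the ECN field (the low two bits of the IPv4 TOS byte) of a UDP socket. This is an illustration of codepoint marking only, not a recipe from this document: an actual L4S sender must also run a scalable congestion control, and setting the codepoint by itself does not make a transport L4S-compliant.

```python
import socket

ECT_1 = 0b01  # ECN codepoint that identifies L4S packets

def l4s_marked_udp_socket():
    # Stamp ECT(1) into the ECN bits (low two bits of the TOS byte)
    # of outgoing IPv4 datagrams.  Illustrative only: a real L4S
    # sender also needs an L4S-compatible (scalable) congestion
    # controller; the codepoint alone is not sufficient.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, ECT_1)
    return s

s = l4s_marked_udp_socket()
assert s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS) & 0b11 == ECT_1
s.close()
```

Note that for TCP sockets the kernel typically manages the ECN bits itself, which is why the illustration uses UDP.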
Prior to deployment of any new technology, it is commonplace for the parties involved in the deployment to validate the performance of the new technology via lab testing, limited field testing, large scale field testing, etc., usually in a progressive manner. The same is expected for deployers of L4S technology. As part of that validation, it is recommended that deployers consider the issue of RFC3168 FIFO bottlenecks and conduct experiments as described in the previous section, or otherwise assess the impact that the L4S technology will have in the networks in which it is to be deployed, and take action as described further in this section. This sort of progressive (incremental) deployment helps to ensure that any issues are discovered when the scale of those issues is relatively small.

TODO: discussion of risk of incorrectly classifying a path

5.1.  Server Type

If pre-deployment testing raises concerns about issues with RFC3168 bottlenecks, the actions taken may depend on the server type.

5.1.1.  General purpose servers (e.g. web servers)

   *  Out-of-band active testing could be performed by the server. For example, a javascript application could run simultaneous downloads (i.e. with and without L4S) during page reading time in order to survey for the presence of RFC3168 FIFO bottlenecks on paths to users (e.g. as described in Section 4 of [Briscoe]).

   *  In-band testing could be built in to the transport protocol implementation at the sender in order to perform detection (see Section 5 of [Briscoe], though note that this mechanism does not differentiate between FIFO and FQ).

   *  Discontinuing use of L4S based on the detection of RFC3168 FIFO bottlenecks is likely not needed for short transactional transfers (e.g. sub 10 seconds), since these are unlikely to achieve the steady-state conditions where unfairness has been observed.

   *  For longer file transfers, it may be possible to fall back to Classic behavior in real-time (i.e. when doing in-band testing), or to cache those destinations where RFC3168 has been detected and disable L4S for subsequent long file transfers to those destinations.

5.1.2.  Specialized servers handling long-running sessions (e.g. cloud gaming)

   *  Out-of-band active testing could be performed at each session startup.

   *  Out-of-band active testing could be integrated into a "pre-validation" of the service, done when the user signs up, and periodically thereafter.

   *  In-band detection as described in [Briscoe] could be performed during the session.

5.2.  Server deployment environment

The responsibilities of and actions taken by a sender may additionally depend on the environment in which it is deployed. The following sub-sections discuss two scenarios: senders serving a limited, known target audience and those that serve an unknown target audience.

5.2.1.  Edge Servers

Some hosts (such as CDN leaf nodes and servers internal to an ISP) are deployed in environments in which they serve content to a constrained set of networks or clients. The operator of such hosts may be able to determine whether there is the possibility of [RFC3168] FIFO bottlenecks being present, and utilize this information to make decisions on selectively deploying L4S and/or disabling it (e.g. bleaching ECN). Furthermore, such an operator may be able to determine the likelihood of an L4S bottleneck being present, and use this information as well.

For example, if a particular network is known to have deployed legacy [RFC3168] FIFO bottlenecks, usage of L4S for long capacity-seeking file transfers on that network could be delayed until those bottlenecks can be upgraded, to mitigate any potential issues as discussed in the next section.
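An edge-server operator with that kind of per-network knowledge could gate L4S usage per client network. The sketch below is a hypothetical illustration only; the prefix table, the function name, and the use of documentation prefixes are all invented for this example, and this document does not define such an interface:

```python
import ipaddress

# Hypothetical operator-maintained table of client prefixes known (or
# suspected) to sit behind legacy RFC3168 FIFO bottlenecks.  The
# entries are RFC 5737 documentation prefixes used as placeholders.
RFC3168_FIFO_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def use_l4s_for(client_ip: str) -> bool:
    # Enable L4S for long capacity-seeking transfers only when the
    # client is not behind a known legacy RFC3168 FIFO bottleneck.
    addr = ipaddress.ip_address(client_ip)
    return not any(addr in net for net in RFC3168_FIFO_PREFIXES)

assert use_l4s_for("203.0.113.5") is True    # no known bottleneck
assert use_l4s_for("192.0.2.10") is False    # delay L4S on this network
```

Such a table would need to be refreshed as bottlenecks are upgraded, for the same reasons the cache described in Section 5.2.2 needs entries to age out.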
491 Prior to deploying L4S on edge servers a server operator should: 493 * Consult with network operators on presence of legacy [RFC3168] 494 FIFO bottlenecks 496 * Consult with network operators on presence of L4S bottlenecks 498 * Perform pre-deployment testing per network 500 If a particular network offers connectivity to other networks (e.g. 501 in the case of an ISP offering service to their customer's networks), 502 the lack of RFC3168 FIFO bottleneck deployment in the ISP network 503 can't be taken as evidence that RFC3168 FIFO bottlenecks don't exist 504 end-to-end (because one may have been deployed by the end-user 505 network). In these cases, deployment of L4S will need to take 506 appropriate steps to detect the presence of such bottlenecks. At 507 present, it is believed that the vast majority of RFC3168 bottlenecks 508 in end-user networks are implementations that utilize fq_codel or 509 Cake, where the unfairness problem is less likely to be a concern. 510 While this doesn't completely eliminate the possibility that a legacy 511 [RFC3168] FIFO bottleneck could exist, it nonetheless provides useful 512 information that can be utilized in the decision making around the 513 potential risk for any unfairness to be experienced by end users. 515 5.2.2. Other hosts 517 Hosts that are deployed in locations that serve a wide variety of 518 networks face a more difficult prospect in terms of handling the 519 potential presence of RFC3168 FIFO bottlenecks. Nonetheless, the 520 steps listed in the ealier section (based on server type) can be 521 taken to minimize the risk of unfairness. 523 The interpretation of studies on ECN usage and their deployment 524 context (see Section 4.1) has so far concluded that RFC3168 FIFO 525 bottlenecks are likely to be rare, and so detections using these 526 techniques may also prove to be rare. 
Therefore, it may be possible 527 for a host to cache a list of end host ip addresses where a RFC3168 528 bottleneck has been detected. Entries in such a cache would need to 529 age-out after a period of time to account for IP address changes, 530 path changes, equipment upgrades, etc. [TODO: more info on ways to 531 cache/maintain such a list] 533 It has been suggested that a public block-list of domains that 534 implement RFC3168 FIFO bottlenecks could be maintained. There are a 535 number of significant issues that would seem to make this idea 536 infeasible, not the least of which is the fact that presence of 537 RFC3168 FIFO bottlenecks or L4S bottlenecks is not a property of a 538 domain, it is the property of a link, and therefore of the particular 539 current path between two endpoints. 541 It has also been suggested that a public allow-list of domains that 542 are participating in the L4S experiment could be maintained. This 543 approach would not be useful, given the presence of an L4S domain on 544 the path does not imply the absence of RFC3168 AQMs upstream or 545 downstream of that domain. Also, the approach cannot cater for 546 domains with a mix of L4S and RFC3168 AQMs. 548 6. Operator of a Network Employing RFC3168 FIFO Bottlenecks 550 While it is more preferable for L4S senders to detect problems 551 themselves, a network operator who has deployed equipment in a likely 552 bottleneck link location (i.e. a link that is expected to frequently 553 be fully saturated) that is configured with a legacy [RFC3168] FIFO 554 AQM can take certain steps in order to improve rate fairness between 555 classic traffic and L4S traffic, and thus enable L4S to be deployed 556 in a greater number of paths. 558 Some of the options listed in this section may not be feasible in all 559 networking equipment. 561 6.1. Preferred Options 563 6.1.1. 
Upgrade AQMs to an L4S-aware AQM 565 If the RFC3168 AQM implementation can be upgraded to enable support 566 for L4S, either via [I-D.ietf-tsvwg-aqm-dualq-coupled] or via an L4S- 567 aware FQ implementation, this is the preferred approach to addressing 568 potential unfairness, because it additionally enables all of the 569 benefits of L4S. 571 6.1.2. Configure Non-Coupled Dual Queue with Shallow Target 573 Equipment supporting [RFC3168] may be configurable to enable two 574 parallel queues for the same traffic class, with classification done 575 based on the ECN field. 577 * Configure 2 queues, both with ECN; 50:50 WRR scheduler 579 - Queue #1: ECT(1) & CE packets - Shallow immediate AQM target 581 - Queue #2: ECT(0) & NotECT packets - Classic AQM target 583 * Outcome in the case of n L4S flows and m long-running Classic 584 flows 586 - if m & n are non-zero, flows get 1/2n and 1/2m of the capacity, 587 otherwise 1/n or 1/m 589 - never < 1/2 each flow's rate if all had been Classic 591 This option would allow L4S flows to achieve low latency, low loss 592 and scalable throughput, but would sacrifice the more precise flow 593 balance offered by [I-D.ietf-tsvwg-aqm-dualq-coupled]. This option 594 would be expected to result in some reordering of previously CE 595 marked packets sent by Classic ECN senders, which is a trait shared 596 with [I-D.ietf-tsvwg-aqm-dualq-coupled]. As is discussed in 597 [I-D.ietf-tsvwg-ecn-l4s-id], this reordering would be either zero 598 risk or very low risk. 600 6.1.3. Approximate Fair Dropping 602 The Approximate Fair Dropping ([AFD]) algorithm tracks individual 603 flow rates and introduces either packet drops or CE-marks to each 604 flow in proportion to the amount by which the flow rate exceeds a 605 computed per-flow fair-share rate. 
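As a rough illustration of this proportional marking idea (the published AFD algorithm estimates per-flow rates using a shadow buffer; the linear marking function below is a simplifying assumption for illustration, not the algorithm from [AFD]):

```python
def afd_mark_probability(flow_rate_bps, fair_rate_bps):
    """Return a probability of CE-marking (or dropping) a packet of a
    flow, in proportion to the amount by which the flow's measured
    rate exceeds the computed per-flow fair-share rate.

    Flows at or below the fair share are left untouched; a flow
    running at twice the fair share has half its packets marked."""
    if flow_rate_bps <= fair_rate_bps:
        return 0.0
    return min(1.0, (flow_rate_bps - fair_rate_bps) / flow_rate_bps)
```

A significantly higher-rate flow thus receives correspondingly more congestion signals, nudging it back toward the fair share without per-flow queues.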
Where an implementation of AFD or 606 an equivalent algorithm is available, it could be enabled on an 607 interface with a single-queue RFC3168 AQM as a fairly lightweight way 608 to inject additional ECN marks into any significantly higher rate 609 flows. See also [Cisco-N9000]. 611 6.1.4. Replace RFC3168 FIFO with RFC3168 FQ 613 As discussed in Section XREF, implementations of RFC3168 with an FQ 614 scheduler (e.g. fq_codel or Cake) significantly reduce the likelihood 615 of experiencing any unfairness between Classic and L4S traffic. 617 6.1.5. Do Nothing 619 If it is infeasible to implement any of the above options, it may be 620 preferable for an operator of RFC3168 FIFO bottlenecks to leave them 621 unchanged. In many deployment situations the risk of fairness issues 622 may be very low, and the impact if they occur may not be particularly 623 troublesome. This could, for instance, be true in bottlenecks where 624 there is a high degree of flow aggregation or in high-speed 625 bottlenecks (e.g. greater than 100 Mbps). 627 6.2. Less Preferred Options 629 In the case that there is a concern about per-flow fairness between 630 L4S flows and Classic flows in an RFC3168 FIFO bottleneck, and none 631 of the remedies in the previous section can be implemented, the 632 options listed in this section could be considered. 634 6.2.1. Configure Non-Coupled Dual Queue Treating ECT(1) as NotECT 636 * Configure 2 queues, both with AQM; 50:50 WRR scheduler 638 - Queue #1: ECT(1) & NotECT packets - ECN disabled 640 - Queue #2: ECT(0) & CE packets - ECN enabled 642 * Outcome 643 - ECT(1) treated as NotECT 645 - Flow balance for the 2 queues is the same as in Section 6.1.2 647 This option would not allow L4S flows to achieve low latency, low 648 loss and scalable throughput in this bottleneck link. As a result it 649 is the less preferred option. 651 6.2.2. 
WRED with ECT(1) Differentiation 653 This configuration is similar to the option described in 654 Section 6.2.1, but uses a single queue with WRED functionality. 656 * Configure the queue with two WRED classes 658 - Class #1: ECT(1) & NotECT packets - ECN disabled 660 - Class #2: ECT(0) & CE packets - ECN enabled 662 6.2.3. Configure AQM to treat ECT(1) as NotECT 664 If equipment is configurable in such a way as to supply CE marks 665 only to ECT(0) packets and to treat ECT(1) packets identically to NotECT, or 666 is upgradable to support this capability, doing so will eliminate the 667 risk of unfairness. 669 6.2.4. ECT(1) Tunnel Bypass 671 Tunnel ECT(1) traffic through the RFC3168 bottleneck with the outer 672 header indicating Not-ECT, by using either an ECN tunnel ingress in 673 Compatibility Mode [RFC6040] or a Limited Functionality ECN tunnel 674 [RFC3168]. 676 Two variants exist for this approach: 678 1. per-domain: tunnel ECT(1) pkts to domain edge towards dst 680 2. per-dst: tunnel ECT(1) pkts to dst 682 6.3. Last Resort Options 684 If serious issues are detected, where the presence of L4S flows is 685 determined to be the likely cause, and none of the above options are 686 implementable, the options in this section can be considered as a 687 last resort. These options are not recommended. 689 6.3.1. Disable RFC3168 Support 691 Disabling CE marking of both ECT(0) traffic and 692 ECT(1) traffic in an [RFC3168] AQM eliminates the unfairness issue. A downside to this 693 approach is that Classic senders will no longer get the benefits of 694 Explicit Congestion Notification at this bottleneck link either. This 695 alternative is only mentioned in case there is no other way to 696 reconfigure an RFC3168 AQM. 698 6.3.2. Re-mark ECT(1) to NotECT Prior to AQM 700 Re-marking ECT(1) packets as NotECT (i.e. bleaching ECT(1)) ensures 701 that they are treated identically to classic NotECT senders.
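The re-marking operation itself is trivial; a minimal sketch, assuming direct access to the 2-bit ECN field carried in the low bits of the IPv4 TOS / IPv6 Traffic Class byte:

```python
# RFC 3168 ECN field codepoints (low two bits of the IPv4 TOS /
# IPv6 Traffic Class byte)
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

def bleach_ect1(tos_byte):
    """Re-mark ECT(1) packets as NotECT prior to the AQM, leaving
    every other codepoint - in particular CE - untouched, so that
    congestion indications are never black-holed."""
    if tos_byte & 0b11 == ECT1:
        return tos_byte & ~0b11  # clear the ECN field -> NotECT
    return tos_byte
```

Note that the DSCP bits (the upper six bits of the byte) are preserved; only the ECN field is rewritten, and only for the ECT(1) codepoint.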
702 However, this action is not recommended because a) it would also 703 prevent downstream L4S bottlenecks from providing high fidelity 704 congestion signals; b) it could lead to problems with future 705 experiments that use ECT(1) in alternative ways to L4S; and c) it 706 would violate requirements in [I-D.ietf-tsvwg-ecn-l4s-id]. This 707 alternative is mentioned as an absolute last resort in case there is 708 no other way to reconfigure an RFC3168 AQM. 710 Note that the CE codepoint must never be bleached, otherwise it would 711 black-hole congestion indications. 713 7. Operator of a Network Employing RFC3168 FQ Bottlenecks 715 A network operator who has deployed flow-queuing systems that 716 implement RFC3168 (e.g. fq_codel or CAKE using default hashing) at 717 network bottlenecks will likely see fewer potential issues when L4S 718 traffic is present on their network as compared to operators of 719 RFC3168 FIFOs. As discussed in Section 3, the flow queuing mechanism 720 will typically isolate L4S flows and Classic flows into separate 721 queues, and the scheduler will then enforce per-flow fairness. As a 722 result, the potential fairness issues between Classic and L4S traffic 723 that can occur in FIFOs will typically not occur in FQ systems. That 724 said, FQ systems commonly treat a tunneled traffic aggregate as a 725 single flow, and thus a tunneled traffic aggregate that contains a 726 mix of Classic and L4S traffic will utilize a single queue, and the 727 traffic within the tunnel could experience the same fairness issue as 728 has been described for RFC3168 FIFOs. This unfairness is compounded 729 by the fact that the FQ scheduler will already be causing unfairness 730 to flows within the tunnel relative to flows that are not tunneled 731 (each of which gets the same bandwidth share as does the tunnel). 
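The single-queue behaviour for tunnels follows directly from classifying on the outer header; a toy sketch (the CRC-based hash here is an arbitrary stand-in for whatever hash a given FQ implementation actually uses):

```python
import zlib

def fq_queue_index(src, dst, proto, sport, dport, n_queues=1024):
    """Toy outer-header 5-tuple flow hash of the kind used by FQ
    schedulers such as fq_codel [RFC8290]. Every packet in a tunnel
    carries the same outer 5-tuple, so all inner flows - Classic and
    L4S alike - map to one queue and share one bandwidth allocation."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % n_queues

# Two different inner flows carried in the same IPsec tunnel (IP
# protocol 50, no ports visible to the classifier) present identical
# outer headers:
queue_a = fq_queue_index("198.51.100.1", "203.0.113.1", 50, 0, 0)
queue_b = fq_queue_index("198.51.100.1", "203.0.113.1", 50, 0, 0)
# queue_a == queue_b: the whole tunnel occupies a single queue
```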
732 Additionally, many of the deployed RFC3168 FQ systems currently 733 implement an AQM algorithm (either CoDel or COBALT) that is designed 734 for Classic traffic and reacts sluggishly to L4S (or unresponsive) 735 traffic, with the result that L4S senders could in some cases 736 see worse latency performance than Classic senders. 738 While the potential unfairness result is arguably less impactful in 739 the case of RFC3168 FQ bottlenecks, it is believed that RFC3168 FQ 740 bottlenecks are currently more common than RFC3168 FIFO bottlenecks. 741 The most common deployments of RFC3168 FQ bottlenecks are in home 742 routers running OpenWrt firmware where the user has turned the 743 feature on. 745 As is the case with RFC3168 FIFOs, the preferred remedy for a network 746 operator that wishes to enable the best possible performance with 747 regard to L4S is to update RFC3168 FQ 748 bottlenecks to be L4S-aware. In cases where that is infeasible, 749 several of the remedies described in the previous section can be used 750 to reduce or eliminate these issues: 752 * Configure AQM to treat ECT(1) as NotECT 754 * Disable RFC3168 Support 756 * Re-mark ECT(1) to NotECT Prior to AQM 758 Note that some FQ schedulers can be configured to intentionally 759 aggregate multiple flows into each queue. This might be used, for 760 instance, to implement per-user or per-host fairness rather than per- 761 flow fairness. In this case, if the flow aggregates contain a mix of 762 Classic and L4S traffic, one would expect to see the same potential 763 unfairness as is seen in the FIFO case. The same remedies mentioned 764 above would apply in this case as well. 766 8. Conclusion of the L4S experiment 768 This section gives guidance on how L4S-deploying networks and 769 endpoints should respond to either of the two possible outcomes of 770 the IETF-supported L4S experiment. 772 8.1.
Termination of a successful L4S experiment 774 If the L4S experiment is deemed successful, the IETF would be 775 expected to move the L4S specifications to standards track. Networks 776 would then be encouraged to continue/begin deploying L4S-aware nodes 777 and to replace all non-L4S-aware RFC3168 AQMs already deployed as far 778 as feasible, or at least restrict RFC3168 AQM to interpret ECT(1) 779 equal to NotECT. Networks that participated in the experiment would 780 be expected to track the evolution of the L4S standards and adapt 781 their implementations accordingly (e.g. if as part of switching from 782 experimental to standards track, changes in the L4S RFCs become 783 necessary). 785 8.2. Termination of an unsuccessful L4S experiment 787 If the L4S experiment is deemed unsuccessful due to lack of 788 deployment of compliant end-systems or AQMs, it might need to be 789 terminated: any L4S network nodes should then be un-deployed and the 790 ECT(1) codepoint usage should be released/recycled as quickly as 791 possible, recognizing that this process may take some time. To 792 facilitate this potential outcome, [I-D.ietf-tsvwg-ecn-l4s-id] 793 requires L4S hosts to be configurable to revert to non-L4S congestion 794 control, and networks to be configurable to treat ECT(1) the same as 795 ECT(0). 797 9. Contributors 799 Thanks to Bob Briscoe, Jake Holland, Koen De Schepper, Olivier 800 Tilmans, Tom Henderson, Asad Ahmed, Gorry Fairhurst, Sebastian 801 Moeller, Pete Heist, and members of the TSVWG mailing list for their 802 contributions to this document. 804 10. IANA Considerations 806 None. 808 11. Security Considerations 810 For further study. 812 12. Informative References 814 [AFD] Pan, R., Breslau, L., Prabhakar, B., and S. Shenker, 815 "Approximate Fairness through Differential Dropping", 816 Computer Comm. Rev. vol.33, no.1, January 2003, 817 . 820 [Bauer] Bauer, S., Beverly, R., and A. 
Berger, "Measuring the 821 State of ECN Readiness in Servers, Clients, and Routers", 822 Proc ACM SIGCOMM Internet Measurement Conference IMC'11, 823 2011, 824 . 826 [Bhooma] Bhooma, P., "TCP ECN: Experience with enabling ECN on the 827 Internet", 98th IETF MAPRG Presentation , 2017, 828 . 832 [Briscoe] Briscoe, B. and A.S. Ahmed, "TCP Prague Fall-back on 833 Detection of a Classic ECN AQM", ArXiv , February 2021, 834 . 836 [Cisco-N9000] 837 Cisco, "Intelligent Buffer Management on Cisco Nexus 9000 838 Series Switches White Paper", Cisco Product 839 Document 1486580292771926, 6 June 2017, 840 . 844 [Ha] Ha, S., Rhee, I., and L. Xu, "CUBIC: A New TCP-Friendly 845 High-Speed TCP Variant", ACM SIGOPS Operating Systems 846 Review , 2008, 847 . 850 [Hoiland-Jorgensen] 851 Hoiland-Jorgensen, T., Taht, D., and J. Morton, "Piece of 852 CAKE: A Comprehensive Queue Management Solution for Home 853 Gateways", 2018, . 855 [Holland] Holland, J., "Latency & AQM Observations on the Internet", 856 IETF MAPRG interim-2020-maprg-01, August 2020, 857 . 861 [I-D.heist-tsvwg-ecn-deployment-observations] 862 Heist, P. and J. Morton, "Explicit Congestion Notification 863 (ECN) Deployment Observations", Work in Progress, 864 Internet-Draft, draft-heist-tsvwg-ecn-deployment- 865 observations-02, 8 March 2021, . 869 [I-D.ietf-tsvwg-aqm-dualq-coupled] 870 Schepper, K., Briscoe, B., and G. White, "DualQ Coupled 871 AQMs for Low Latency, Low Loss and Scalable Throughput 872 (L4S)", Work in Progress, Internet-Draft, draft-ietf- 873 tsvwg-aqm-dualq-coupled-13, 15 November 2020, 874 . 877 [I-D.ietf-tsvwg-ecn-l4s-id] 878 Schepper, K. and B. Briscoe, "Identifying Modified 879 Explicit Congestion Notification (ECN) Semantics for 880 Ultra-Low Queuing Delay (L4S)", Work in Progress, 881 Internet-Draft, draft-ietf-tsvwg-ecn-l4s-id-12, 15 882 November 2020, . 885 [I-D.ietf-tsvwg-l4s-arch] 886 Briscoe, B., Schepper, K., Bagnulo, M., and G. 
White, "Low 887 Latency, Low Loss, Scalable Throughput (L4S) Internet 888 Service: Architecture", Work in Progress, Internet-Draft, 889 draft-ietf-tsvwg-l4s-arch-08, 15 November 2020, 890 . 893 [IANA-ECN] Internet Assigned Numbers Authority, "IANA ECN Field 894 Assignments", 2018, . 897 [Mandalari] 898 Mandalari, AM., Lutu, A., Briscoe, B., Bagnulo, M., and O. 899 Alay, "Measuring ECN++: Good News for ++, Bad News for ECN 900 over Mobile", DOI 10.1109/MCOM.2018.1700739, IEEE 901 Communications Magazine vol. 56, no. 3, March 2018, 902 . 904 [Palmei] Palmei, J. and X. et al., "Design and Evaluation of COBALT 905 Queue Discipline", IEEE International Symposium on Local 906 and Metropolitan Area Networks 2019, 2019, 907 . 909 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 910 of Explicit Congestion Notification (ECN) to IP", 911 RFC 3168, DOI 10.17487/RFC3168, September 2001, 912 . 914 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 915 Friendly Rate Control (TFRC): Protocol Specification", 916 RFC 5348, DOI 10.17487/RFC5348, September 2008, 917 . 919 [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion 920 Notification", RFC 6040, DOI 10.17487/RFC6040, November 921 2010, . 923 [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, 924 J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler 925 and Active Queue Management Algorithm", RFC 8290, 926 DOI 10.17487/RFC8290, January 2018, 927 . 929 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 930 Notification (ECN) Experimentation", RFC 8311, 931 DOI 10.17487/RFC8311, January 2018, 932 . 934 [Roddav] Roddav, N., Streit, K., Rodosek, G.D., and A. Pras, "On 935 the Usage of DSCP and ECN Codepoints in Internet Backbone 936 Traffic Traces for IPv4 and IPv6", 937 DOI 10.1109/ISNCC.2019.8909187, ISNCC 2019, 2019, 938 . 940 [Trammel] Trammel, B., Kuehlewind, M., Boppart, D., Learmonth, I., 941 Fairhurst, G., and R. 
Scheffenegger, "Enabling Internet- 942 Wide Deployment of Explicit Congestion Notification", Proc 943 Passive & Active Measurement Conference PAM15, 2015, 944 . 947 [Ware] Ware, R., Mukerjee, M., Seshan, S., and J. Sherry, "Beyond 948 Jain's Fairness Index: Setting the Bar For The Deployment 949 of Congestion Control Algorithms", Hotnets'19 , 2019, 950 . 953 Author's Address 955 Greg White (editor) 956 CableLabs 958 Email: g.white@cablelabs.com