idnits 2.17.1

draft-ietf-tsvwg-l4sops-00.txt:

Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------

No issues found here.

Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------

No issues found here.

Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------

No issues found here.

Miscellaneous warnings:
----------------------------------------------------------------------------

== The copyright year in the IETF Trust and authors Copyright Line does not match the current year

-- The document date (5 May 2021) is 1084 days in the past.  Is this intentional?

Checking references for intended status: Informational
----------------------------------------------------------------------------

== Missing Reference: 'Reno' is mentioned on line 197, but not defined

== Outdated reference: A later version (-25) exists of draft-ietf-tsvwg-aqm-dualq-coupled-13

== Outdated reference: A later version (-29) exists of draft-ietf-tsvwg-ecn-l4s-id-12

== Outdated reference: A later version (-20) exists of draft-ietf-tsvwg-l4s-arch-08

Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

Run idnits with the --verbose option for more detailed information about the items above.

--------------------------------------------------------------------------------

Transport Area Working Group                                 G. White, Ed.
Internet-Draft                                                   CableLabs
Intended status: Informational                                  5 May 2021
Expires: 6 November 2021

        Operational Guidance for Deployment of L4S in the Internet
                        draft-ietf-tsvwg-l4sops-00

Abstract

This document is intended to provide guidance in order to ensure successful deployment of Low Latency Low Loss Scalable throughput (L4S) in the Internet.
Other L4S documents provide guidance for running an L4S experiment, but this document is focused solely on potential interactions between L4S flows and flows using the original ('Classic') ECN over a Classic ECN bottleneck link. The document discusses the potential outcomes of these interactions, describes mechanisms to detect the presence of Classic ECN bottlenecks, and identifies opportunities to prevent and/or detect and resolve fairness problems in such networks. This guidance is aimed at operators of end-systems, operators of networks, and researchers.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 6 November 2021.

Copyright Notice

Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents

1.  Introduction
2.  Per-Flow Fairness
3.  Detection of Classic ECN Bottlenecks
    3.1.  Recent Studies
    3.2.  Future Experiments
4.  Operator of an L4S host
    4.1.  Edge Servers
    4.2.  Other hosts
5.  Operator of a Network Employing RFC3168 FIFO Bottlenecks
    5.1.  Configure AQM to treat ECT(1) as NotECT
    5.2.  ECT(1) Tunnel Bypass
    5.3.  Configure Non-Coupled Dual Queue
    5.4.  WRED with ECT(1) Differentiation
    5.5.  Disable RFC3168 Support
    5.6.  Re-mark ECT(1) to NotECT Prior to AQM
6.  Operator of a Network Employing RFC3168 FQ Bottlenecks
7.  Conclusion of the L4S experiment
    7.1.  Successful termination of the L4S experiment
    7.2.  Unsuccessful termination of the L4S experiment
8.  Contributors
9.  IANA Considerations
10. Security Considerations
11. Informative References
Author's Address

1.  Introduction

Low-latency, low-loss, scalable throughput (L4S) [I-D.ietf-tsvwg-l4s-arch] traffic is designed to provide lower queuing delay than conventional traffic via a new network service based on a modified Explicit Congestion Notification (ECN) response from the network.
L4S traffic is identified by the ECT(1) codepoint, and network bottlenecks that support L4S should congestion-mark ECT(1) packets to enable L4S congestion feedback. However, L4S traffic is also expected to coexist well with classic congestion controlled traffic even if the bottleneck queue does not support L4S. This includes paths where the bottleneck link utilizes packet drops in response to congestion (either due to buffer overrun or active queue management), as well as paths that implement a 'flow-queuing' scheduler such as fq_codel [RFC8290]. A potential area of poor interoperability lies in network bottlenecks employing a shared queue that implements an Active Queue Management (AQM) algorithm that provides Explicit Congestion Notification signaling according to [RFC3168]. RFC3168 has been updated (via [RFC8311]) to reserve ECT(1) for experimental use only (also see [IANA-ECN]), and its use for L4S has been specified in [I-D.ietf-tsvwg-ecn-l4s-id]. However, any deployed RFC3168 AQMs might not be updated, and RFC8311 still prefers that routers not involved in L4S experimentation treat ECT(1) and ECT(0) as equivalent. It has been demonstrated ([Detection]) that when a set of long-running flows comprising both classic congestion controlled flows and L4S-compliant congestion controlled flows compete for bandwidth in such a legacy shared RFC3168 queue, the classic congestion controlled flows may achieve lower throughput than they would have if all of the flows had been classic congestion controlled flows. This 'unfairness' between the two classes is more pronounced on longer RTT paths (e.g. 50ms and above) and/or at higher link rates (e.g. 50 Mbps and above). The lower the capacity per flow, the less pronounced the problem becomes. Thus the imbalance is most significant when the slowest flow rate is still high in absolute terms.
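To illustrate why the imbalance grows with link rate and RTT, a rough sawtooth model of a classic sender can be sketched (an illustrative sketch under simplifying assumptions; the model and its constants are not taken from the draft's measurements):

```python
# Rough sawtooth model of a classic (Reno-style) sender: one CE mark per
# cycle, window halving on each mark, growth of 1 packet per RTT.  This
# is an illustrative sketch with assumed constants, not measurement data.
MSS_BYTES = 1500

def classic_rtts_per_mark(rate_mbps, rtt_s):
    """Approximate number of round trips between CE marks for one flow."""
    avg_window = rate_mbps * 1e6 * rtt_s / (8 * MSS_BYTES)  # packets
    peak_window = avg_window / 0.75   # sawtooth average is 0.75 * peak
    return peak_window / 2            # one cycle spans peak/2 round trips

# At 50 Mbps x 50 ms this model yields a CE mark only every ~140 round
# trips for a classic flow, versus roughly 2 marks per round trip expected
# by an L4S flow; at 5 Mbps the classic interval shrinks to ~14 round
# trips, so the mismatch fades as capacity per flow drops.
```

The larger the bandwidth-delay product, the rarer the marks a classic sender expects, and hence the larger the gap between the two congestion responses when both receive the same marking rate.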
The root cause of the unfairness is that the L4S architecture redefines the congestion signal (CE mark) and congestion response in the case of packets marked ECT(1) (used by L4S senders), whereas an RFC3168 queue does not differentiate between packets marked ECT(0) (used by classic senders) and those marked ECT(1), and provides identical CE marks to both types. The result is that the two classes respond differently to the CE congestion signal. The classic senders expect that CE marks are sent very rarely (e.g. approximately 1 CE mark every 200 round trips on a 50 Mbps x 50ms path) while the L4S senders expect very frequent CE marking (e.g. approximately 2 CE marks per round trip). As a consequence, the classic senders respond to the CE marks provided by the bottleneck by yielding capacity to the L4S flows. The resulting rate imbalance can be demonstrated, and could be a cause of concern in some cases.

This concern primarily relates to single-queue (FIFO) bottleneck links that implement RFC3168 ECN, but the situation can also potentially occur with per-flow queuing, e.g. fq_codel [RFC8290], when flow isolation is imperfect due to hash collisions or VPN tunnels.

While the above-mentioned unfairness has been demonstrated in laboratory testing, it has not been observed in operational networks, in part because members of the Transport Area Working Group are not aware of any deployments of single-queue Classic ECN bottlenecks in the Internet.

This issue was considered in November 2015 (and reaffirmed in April 2020) when the WG decided on the identifier to use for L4S, as recorded in Appendix B.1 of [I-D.ietf-tsvwg-ecn-l4s-id]. It was recognized that compromises would have to be made because IP header space is extremely limited.
A number of alternative codepoint schemes were compared for their ability to traverse most Internet paths, to work over tunnels, to work at lower layers, to work with TCP, etc. It was decided to progress on the basis that robust performance in the presence of these single-queue RFC3168 bottlenecks is not the most critical issue, since it was believed that they are rare. Nonetheless, there is the possibility that such deployments exist, and that more could be deployed or enabled in the future; hence there is an interest in providing guidance to ensure that measures can be taken to address the potential issues, should they arise in practice.

TODO: further discussion on severity and who might be impacted?

2.  Per-Flow Fairness

There are a number of factors that influence the relative rates achieved by a set of users or a set of applications sharing a queue in a bottleneck link. Notably, the response that each application has to congestion signals (whether loss or explicit signaling) can play a large role in determining whether the applications share the bandwidth in an equitable manner. In the Internet, ISPs typically control capacity sharing between their customers using a scheduler at the access bottleneck rather than relying on the congestion responses of end-systems, so in that context this question primarily concerns capacity sharing between the applications used by one customer site. Nonetheless, there are many networks on the Internet where capacity sharing relies, at least to some extent, on congestion control in the end-systems. The traditional norm for congestion response has been that it is handled on a per-connection basis, and that (all else being equal) it results in each connection in the bottleneck achieving a data rate inversely proportional to the average RTT of the connection.
The end result (in the case of steady-state behavior of a set of like connections) is that each user or application achieves a data rate proportional to N/RTT, where N is the number of simultaneous connections that the user or application creates, and RTT is the harmonic mean of the average round-trip times for those connections. Thus, users or applications that create a larger number of connections and/or that have a lower RTT achieve a larger share of the bottleneck link rate than others.

While this may not be considered fair by many, it nonetheless has been the typical starting point for discussions around fairness. In fact, it has been common when evaluating new congestion responses to set aside N and RTT as variables in the equation, and simply compare per-flow rates between flows with the same RTT. For example, [RFC5348] defines the congestion response for a flow to be '"reasonably fair" if its sending rate is generally within a factor of two of the sending rate of a [Reno] TCP flow under the same conditions.' Given that RTTs can vary by roughly two orders of magnitude and flow counts can vary by at least an order of magnitude between applications, it seems that the accepted definition of reasonable fairness leaves quite a bit of room for different levels of performance between users or applications, and so perhaps isn't the gold standard, but is rather a metric that is used because of its convenience.

In practice, the effect of this RTT dependence has historically been muted by the fact that many networks were deployed with very large ("bloated") drop-tail buffers that would introduce queuing delays well in excess of the base RTT of the flows utilizing the link, thus equalizing (to some degree) the effective RTTs of those flows.
Recently, as network equipment suppliers and operators have worked to improve the latency performance of the network by the use of smaller buffers and/or AQM algorithms, this has had the side-effect of uncovering the inherent RTT bias in classic congestion control algorithms.

The L4S architecture aims to significantly improve this situation by requiring senders to adopt a congestion response that eliminates RTT bias as much as possible (see [I-D.ietf-tsvwg-ecn-l4s-id]). As a result, L4S promotes a level of per-flow fairness beyond what is ordinarily considered for classic senders, the RFC3168 issue notwithstanding.

It is also worth noting that the congestion control algorithms currently deployed on the Internet tend toward (RTT-weighted) fairness only over long timescales. For example, the CUBIC algorithm can take minutes to converge to fairness when a new flow joins an existing flow on a link [Cubic]. Since the vast majority of TCP connections don't last for minutes, it is unclear to what degree per-flow, same-RTT fairness, even when demonstrated in the lab, translates to the real world.

So, in real networks, where per-application, per-end-host or per-customer fairness may be more important than long-term, same-RTT, per-flow fairness, it may not be that instructive to focus on the latter as a necessary end goal.

Nonetheless, situations in which the presence of an L4S flow has the potential to cause harm [Harm] to classic flows need to be understood. Most importantly, if there are situations in which the introduction of L4S traffic would significantly degrade both the absolute and relative performance of classic traffic, i.e. to the point that classic traffic would be considered starved while L4S traffic was not, these situations need to be understood and either remedied or avoided.
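For concreteness, the traditional N/RTT sharing model discussed earlier in this section can be sketched numerically (an illustrative model with arbitrary example values, not taken from the draft):

```python
# Illustrative sketch (assumed model, not from the draft): under the
# traditional per-connection norm, each connection's rate is inversely
# proportional to its RTT, so a user's aggregate share is proportional
# to N / harmonic_mean(RTTs), as described in Section 2.
from statistics import harmonic_mean

def user_rates(link_mbps, users):
    """users maps a user name to a list of per-connection RTTs (seconds)."""
    # Weight per user: N / harmonic-mean RTT (== sum of 1/RTT per connection).
    weights = {u: len(rtts) / harmonic_mean(rtts) for u, rtts in users.items()}
    total = sum(weights.values())
    return {u: link_mbps * w / total for u, w in weights.items()}

# User A: 4 connections at 50 ms; user B: 1 connection at 100 ms.
rates = user_rates(100.0, {"A": [0.050] * 4, "B": [0.100]})
# The weights are 80 vs 10, so A receives 8x B's rate on the shared link.
```

More connections and lower RTTs compound, which is why the "reasonably fair, same-RTT" comparison discussed above leaves so much room for divergence between users.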
Aligned with this context, the guidance provided in this document is aimed not at monitoring the relative performance of L4S senders compared against classic senders on a per-flow basis, but rather at identifying instances where RFC3168 bottlenecks are deployed, so that operators of L4S senders can have the opportunity to assess whether any actions need to be taken. Additionally, this document provides guidance for network operators around configuring any RFC3168 bottlenecks to minimize the potential for negative interactions between L4S and classic senders.

3.  Detection of Classic ECN Bottlenecks

The IETF encourages researchers, end system deployers and network operators to conduct experiments to identify to what degree RFC3168 bottlenecks exist in networks. These types of measurement campaigns, even if each is conducted over a limited set of paths, could be useful to further understand the scope of any potential issues, to guide end system deployers on where to examine performance more closely (or possibly delay L4S deployment), and to help network operators identify nodes where remediation may be necessary to provide the best performance.

3.1.  Recent Studies

A small number of recent studies have attempted to gauge the level of RFC3168 deployment in the Internet.

In 2020, Akamai conducted a study (https://mailarchive.ietf.org/arch/msg/tsvwg/2tbRHphJ8K_CE6is9n7iQy-VAZM/) of "downstream" (server to client) CE marking broken out by ASN on two separate days, one in late March, the other in mid July [Akamai]. They concluded that the prevalence of CE-marking was low across the ~800 ASNs observed, but that it was growing, and that they could not determine whether the CE marking was due to a single queue or FQ.
There were a small handful (5-7) of ASNs showing evidence of CE-marking across more than 10% of their client IPs, and the global baseline was CE-marking across 0.3% of IPs.

In 2017, Apple reported [TCPECN] on their observations of ECN marking by networks, broken out by country. They reported four countries that exceeded the global baseline seen by Akamai, but one of these (Argentine Republic) was later discovered to be due to a bug, leaving three countries: China (1% of paths), Mexico (3.2% of paths) and France (6% of paths). The percentage in France appears consistent with reports (https://mailarchive.ietf.org/arch/msg/tsvwg/UyvpwUiNw0obd_EylBBV7kDRIHs/) that fq_codel has been implemented in DSL home routers deployed by Free.fr.

In December 2020 - January 2021, Pete Heist worked with a small cooperative WISP in the Czech Republic to collect data on CE-marking [I-D.heist-tsvwg-ecn-deployment-observations]. This ISP had deployed RFC3168 fq_codel equipment in some of their subnets, but in other subnets there were 33 IPs where CE-marking was possibly observed, corresponding to approximately 10% of paths, significantly greater than the baseline reported by Akamai. It was agreed (https://mailarchive.ietf.org/arch/msg/tsvwg/Rj7GylByZuFa3_LTCMvEfb-CYpw/) that these were likely to be due to fq_codel implementations in home routers deployed by members of the cooperative.

The interpretation of these studies seems to be that all of the known RFC3168 deployments are fq_codel, that the majority of the currently unknown deployments are likely to be fq_codel, and that there may be a small number of networks where CE-marking is prevalent (and thus likely ISP-managed) where it is currently unknown whether the source is a FIFO or an FQ system.

Other studies (e.g.
[EnablingECN], [ECNreadiness], [MeasuringECN]) have examined ECN traversal, but have not reported data on the prevalence of CE-marking by networks.

3.2.  Future Experiments

The design of future experiments should consider not only the detection of RFC3168 ECN marking, but also the determination of whether the bottleneck AQM is a single queue (FIFO) or a flow-queuing (FQ) system. It is believed that the vast majority, if not all, of the RFC3168 AQMs in use at bottleneck links are flow-queuing systems (e.g. fq_codel [RFC8290] or [COBALT]). When flow isolation is successful, the FQ scheduling of such queues isolates classic congestion control traffic from L4S traffic, and thus eliminates the potential for unfairness. But these systems are known to sometimes result in imperfect isolation, either due to hash collisions (see Section 5.3 (https://datatracker.ietf.org/doc/html/rfc8290#section-5.3) of [RFC8290]) or because of VPN tunneling (see Section 6.2 (https://datatracker.ietf.org/doc/html/rfc8290#section-6.2) of [RFC8290]). It is believed that the majority of FQ deployments in bottleneck links today (e.g. [Cake]) employ hashing algorithms that virtually eliminate the possibility of collisions, making this a non-issue for those deployments. But VPN tunnels remain an issue for FQ deployments, and the introduction of L4S traffic raises the possibility that tunnels containing a mix of classic and L4S traffic would exist, in which case FQ implementations that have not been updated to be L4S-aware could exhibit unfairness properties similar to those of single-queue AQMs. Until such queues are upgraded to support L4S (see Section 6) or to treat ECT(1) as not-ECT traffic, end-host mitigations such as separating L4S and Classic traffic into distinct VPN tunnels could be employed.
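As an aside on the hash-collision point above, the likelihood that some pair of concurrent flows shares a queue can be estimated with a simple birthday-problem sketch (illustrative only; it assumes ideal uniform hashing and fq_codel's default of 1024 queues per [RFC8290], and is not an analysis from this draft):

```python
# Birthday-problem estimate: probability that at least two of
# `num_flows` concurrent flows hash into the same queue, assuming a
# uniform hash over `num_queues` queues (fq_codel's default is 1024).
def collision_probability(num_flows, num_queues=1024):
    p_no_collision = 1.0
    for i in range(num_flows):
        p_no_collision *= (num_queues - i) / num_queues
    return 1.0 - p_no_collision

# Two flows collide with probability 1/1024, but with ~38 concurrent
# flows the chance of at least one collision already approaches 50%.
```

This is why set-associative hashing schemes such as the one used by Cake, mentioned above, matter in practice: plain uniform hashing makes occasional collisions likely even at modest flow counts.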
[Detection] contains recommendations on some of the mechanisms that can be used to detect RFC3168 bottlenecks. In particular, Section 4 of [Detection] outlines an approach for out-of-band detection of RFC3168 bottlenecks.

4.  Operator of an L4S host

From a host's perspective, support for L4S only involves the sender via ECT(1) marking and L4S-compatible congestion control. The receiver is involved in ECN feedback but can generally be agnostic to whether ECN is being used for L4S [I-D.ietf-tsvwg-l4s-arch]. Between these two entities, it is primarily incumbent upon the sender to evaluate the potential for the presence of RFC3168 FIFO bottlenecks and make decisions about whether or not to use L4S congestion control. While it is possible for a receiver to disable L4S functionality by not negotiating ECN, a general purpose receiver is not expected to perform any testing or monitoring for RFC3168, and is also not expected to invoke any active response in the case that such a bottleneck exists.

Prior to deployment of any new technology, it is commonplace for the parties involved in the deployment to validate the performance of the new technology via lab testing, limited field testing, large-scale field testing, etc. The same is expected for deployers of L4S technology. As part of that validation, it is recommended that deployers consider the issue of RFC3168 FIFO bottlenecks and conduct experiments as described in the previous section, or otherwise assess the impact that the L4S technology will have in the networks in which it is to be deployed, and take action as described further in this section.

If pre-deployment testing raises concerns about issues with RFC3168 bottlenecks, the actions taken may depend on the server type:

*  General purpose servers (e.g. web servers)

   -  Out-of-band active testing could be performed by the server.
      For example, a JavaScript application could run simultaneous downloads (i.e. with and without L4S) during page reading time in order to survey for the presence of RFC3168 FIFO bottlenecks on paths to users (e.g. as described in Section 4 of [Detection]).

   -  In-band testing could be built into the transport protocol implementation at the sender in order to perform detection (see Section 5 of [Detection], though note that this mechanism does not differentiate between FIFO and FQ).

   -  Discontinuing use of L4S based on the detection of RFC3168 FIFO bottlenecks is likely not needed for short transactional transfers (e.g. sub 10 seconds), since these are unlikely to reach the steady-state conditions where unfairness has been observed.

   -  For longer file transfers, it may be possible to fall back to Classic behavior in real time (i.e. when doing in-band testing), or to cache those destinations where RFC3168 has been detected and disable L4S for subsequent long file transfers to those destinations.

*  Specialized servers handling long-running sessions (e.g. cloud gaming)

   -  Out-of-band active testing could be performed at each session startup.

   -  Out-of-band active testing could be integrated into a "pre-validation" of the service, done when the user signs up, and periodically thereafter.

   -  In-band detection as described in [Detection] could be performed during the session.

TODO: discussion of risk of incorrectly classifying a path

In addition, the responsibilities of and actions taken by a sender may depend on the environment in which it is deployed. The following sub-sections discuss two scenarios: senders serving a limited known target audience and those that serve an unknown target audience.

4.1.
Edge Servers

Some hosts (such as CDN leaf nodes and servers internal to an ISP) are deployed in environments in which they serve content to a constrained set of networks or clients. The operator of such hosts may be able to determine whether there is the possibility of [RFC3168] FIFO bottlenecks being present, and utilize this information to make decisions on selectively deploying L4S and/or disabling it (e.g. bleaching ECN). Furthermore, such an operator may be able to determine the likelihood of an L4S bottleneck being present, and use this information as well.

For example, if a particular network is known to have deployed legacy [RFC3168] FIFO bottlenecks, usage of L4S for long capacity-seeking file transfers on that network could be delayed until those bottlenecks can be upgraded to mitigate any potential issues, as discussed in the next section.

Prior to deploying L4S on edge servers, a server operator should:

*  Consult with network operators on the presence of legacy [RFC3168] FIFO bottlenecks

*  Consult with network operators on the presence of L4S bottlenecks

*  Perform pre-deployment testing per network

If a particular network offers connectivity to other networks (e.g. in the case of an ISP offering service to their customers' networks), the lack of RFC3168 FIFO bottleneck deployment in the ISP network can't be taken as evidence that RFC3168 FIFO bottlenecks don't exist end-to-end (because one may have been deployed by the end-user network). In these cases, deployment of L4S will need to take appropriate steps to detect the presence of such bottlenecks. At present, it is believed that the vast majority of RFC3168 bottlenecks in end-user networks are implementations that utilize fq_codel or Cake, where the unfairness problem is less likely to be a concern.
While this doesn't completely eliminate the possibility that a legacy [RFC3168] FIFO bottleneck could exist, it nonetheless provides useful information that can be utilized in decision making around the potential risk of any unfairness being experienced by end users.

4.2.  Other hosts

Hosts that are deployed in locations that serve a wide variety of networks face a more difficult prospect in terms of handling the potential presence of RFC3168 FIFO bottlenecks. Nonetheless, the steps listed in the earlier section (based on server type) can be taken to minimize the risk of unfairness.

The interpretation of studies on ECN usage and their deployment context (see Section 3.1) has so far concluded that RFC3168 FIFO bottlenecks are likely to be rare, and so detections using these techniques may also prove to be rare. Therefore, it may be possible for a host to cache a list of end host IP addresses where an RFC3168 bottleneck has been detected. Entries in such a cache would need to age out after a period of time to account for IP address changes, path changes, equipment upgrades, etc. [TODO: more info on ways to cache/maintain such a list]

It has been suggested that a public block-list of domains that implement RFC3168 FIFO bottlenecks could be maintained. There are a number of significant issues that would seem to make this idea infeasible, not the least of which is the fact that the presence of RFC3168 FIFO bottlenecks or L4S bottlenecks is not a property of a domain; it is a property of a link, and therefore of the particular current path between two endpoints.

It has also been suggested that a public allow-list of domains that are participating in the L4S experiment could be maintained.
This approach would not be useful, given that the presence of an L4S domain on the path does not imply the absence of RFC3168 AQMs upstream or downstream of that domain. Also, the approach cannot cater for domains with a mix of L4S and RFC3168 AQMs.

5.  Operator of a Network Employing RFC3168 FIFO Bottlenecks

While it is, of course, preferred for networks to deploy L4S-capable high fidelity congestion signaling, and while it would be preferable for L4S senders to detect problems themselves, a network operator who has deployed equipment in a likely bottleneck link location (i.e. a link that is expected to be fully saturated) that is configured with a legacy [RFC3168] FIFO AQM can take certain steps in order to improve rate fairness between classic traffic and L4S traffic, and thus enable L4S to be deployed in a greater number of paths.

Some of the options listed in this section may not be feasible in all networking equipment.

5.1.  Configure AQM to treat ECT(1) as NotECT

If equipment is configurable in such a way as to only supply CE marks to ECT(0) packets, and to treat ECT(1) packets identically to NotECT, or is upgradable to support this capability, doing so will eliminate the risk of unfairness.

5.2.  ECT(1) Tunnel Bypass

Tunnel ECT(1) traffic through the RFC3168 bottleneck with the outer header indicating Not-ECT, by using either an ECN tunnel ingress in Compatibility Mode [RFC6040] or a Limited Functionality ECN tunnel [RFC3168].

Two variants exist for this approach:

1.  per-domain: tunnel ECT(1) packets to the domain edge towards the destination

2.  per-destination: tunnel ECT(1) packets to the destination

5.3.  Configure Non-Coupled Dual Queue

Equipment supporting [RFC3168] may be configurable to enable two parallel queues for the same traffic class, with classification done based on the ECN field.
Option 1:

*  Configure 2 queues, both with ECN; 50:50 WRR scheduler

   -  Queue #1: ECT(1) & CE packets - shallow immediate AQM target

   -  Queue #2: ECT(0) & NotECT packets - Classic AQM target

*  Outcome in the case of n L4S flows and m long-running Classic flows

   -  if m and n are both non-zero, each L4S flow gets 1/(2n) of the capacity and each Classic flow gets 1/(2m); otherwise the flows get 1/n or 1/m

   -  each flow's rate is never less than 1/2 of the rate it would have achieved if all flows had been Classic

This option would allow L4S flows to achieve low latency, low loss and scalable throughput, but would sacrifice the more precise flow balance offered by [I-D.ietf-tsvwg-aqm-dualq-coupled]. This option would be expected to result in some reordering of previously CE-marked packets sent by Classic ECN senders, a trait shared with [I-D.ietf-tsvwg-aqm-dualq-coupled]. As discussed in [I-D.ietf-tsvwg-ecn-l4s-id], this reordering would be either zero risk or very low risk.

Option 2:

*  Configure 2 queues, both with AQM; 50:50 WRR scheduler

   -  Queue #1: ECT(1) & NotECT packets - ECN disabled

   -  Queue #2: ECT(0) & CE packets - ECN enabled

*  Outcome

   -  ECT(1) is treated as NotECT

   -  Flow balance for the 2 queues is the same as in Option 1

This option would not allow L4S flows to achieve low latency, low loss and scalable throughput in this bottleneck link. As a result, it is the less preferred option.

5.4.  WRED with ECT(1) Differentiation

This configuration is similar to Option 2 in the previous section, but uses a single queue with WRED functionality.

*  Configure the queue with two WRED classes

*  Class #1: ECT(1) & NotECT packets - ECN disabled

*  Class #2: ECT(0) & CE packets - ECN enabled

5.5.  Disable RFC3168 Support

Disabling CE-marking of both ECT(0) traffic and ECT(1) traffic in an [RFC3168] AQM eliminates the unfairness issue.
A downside to this approach is that Classic senders will no longer
get the benefits of Explicit Congestion Notification at this
bottleneck link.  This alternative is only mentioned in case there is
no other way to reconfigure an RFC3168 AQM.

5.6.  Re-mark ECT(1) to NotECT Prior to AQM

Re-marking ECT(1) packets as NotECT (i.e. bleaching ECT(1)) ensures
that they are treated identically to Classic NotECT senders.
However, this action is not recommended because a) it would also
prevent downstream L4S bottlenecks from providing high fidelity
congestion signals; and b) it could lead to problems with future
experiments that use ECT(1) in alternative ways to L4S.  This
alternative is only mentioned in case there is no other way to
reconfigure an RFC3168 AQM.

Note that the CE codepoint must never be bleached, otherwise it would
black-hole congestion indications.

6.  Operator of a Network Employing RFC3168 FQ Bottlenecks

A network operator who has deployed flow-queuing systems that
implement RFC3168 (e.g. fq_codel [RFC8290] or CAKE [Cake]) at
network bottlenecks will likely see fewer potential issues when L4S
traffic is present on their network as compared to operators of
RFC3168 FIFOs.  As discussed in previous sections, the flow queuing
mechanism will typically isolate L4S flows and Classic flows into
separate queues, and the scheduler will then enforce per-flow
fairness.  As a result, the potential fairness issues between Classic
and L4S traffic that can occur in FIFOs will typically not occur in
FQ systems.  That said, FQ systems commonly treat a tunneled traffic
aggregate as a single flow, so a tunneled aggregate that contains a
mix of Classic and L4S traffic will utilize a single queue, and could
experience the same fairness issue as has been described for RFC3168
FIFOs.
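The single-queue effect for tunnels can be illustrated with a toy
model of an FQ classifier that hashes the outer IP/transport 5-tuple
to select a queue.  The tuple layout, the addresses, and the use of
Python's built-in hash() are simplifying assumptions, not the actual
fq_codel or CAKE internals:

```python
# Toy flow-queuing classifier: pick a queue by hashing the *outer*
# 5-tuple (illustrative; real FQ systems use their own hash).
NUM_QUEUES = 1024

def queue_for(five_tuple):
    return hash(five_tuple) % NUM_QUEUES

# One tunnel carrying several distinct inner flows (hypothetical
# addresses).  Every inner packet presents the same outer header.
outer = ("198.51.100.1", "203.0.113.9", 4500, 4500, "UDP")
inner_flows = [("10.0.0.1", "10.0.0.2", 49152 + i, 443, "TCP")
               for i in range(4)]

# The classifier only sees the outer header, so the whole aggregate
# collapses into a single queue:
queues_used = {queue_for(outer) for _ in inner_flows}
```

A Classic flow and an L4S flow inside such a tunnel therefore share
one queue under one AQM, recreating the FIFO situation described
earlier.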
This unfairness is compounded by the fact that the FQ scheduler will
already be causing unfairness to flows within the tunnel relative to
flows that are not tunneled.  Additionally, many of the deployed
RFC3168 FQ systems currently implement an AQM algorithm (either CoDel
[RFC8290] or COBALT [COBALT]) that is designed for Classic traffic
and reacts sluggishly to L4S (or unresponsive) traffic, with the
result that L4S senders could in some cases see worse latency
performance than Classic senders.

While the potential unfairness is arguably less impactful in the case
of RFC3168 FQ bottlenecks, it is believed that RFC3168 FQ bottlenecks
are currently more common than RFC3168 FIFO bottlenecks.  The most
common deployments of RFC3168 FQ bottlenecks are in home routers
running OpenWRT firmware where the user has turned the feature on.

As is the case with RFC3168 FIFOs, the preferred remedy for a network
operator that wishes to enable the best possible performance with
regard to L4S is to update RFC3168 FQ bottlenecks to be L4S-aware.
In cases where that is infeasible, several of the remedies described
in the previous section can be used to reduce or eliminate these
issues:

*  Configure AQM to treat ECT(1) as NotECT

*  ECT(1) Tunnel Bypass

*  Disable RFC3168 Support

*  Re-mark ECT(1) to NotECT Prior to AQM

7.  Conclusion of the L4S experiment

This section gives guidance on how L4S-deploying networks and
endpoints should respond to either of the two possible outcomes of
the IETF-supported L4S experiment.

7.1.  Successful termination of the L4S experiment

If the L4S experiment is deemed successful, the IETF would be
expected to move the L4S specifications to the standards track.
Networks would then be encouraged to continue/begin deploying
L4S-aware nodes and to replace all non-L4S-aware RFC3168 AQMs already
deployed as far as feasible, or at least to restrict RFC3168 AQMs to
interpreting ECT(1) as equal to NotECT.  Networks that participated
in the experiment would be expected to track the evolution of the L4S
standards and adapt their implementations accordingly (e.g. if, as
part of switching from experimental to standards track, changes in
the L4S RFCs become necessary).

7.2.  Unsuccessful termination of the L4S experiment

If the L4S experiment is deemed unsuccessful due to lack of
deployment of compliant end-systems or AQMs, it might need to be
terminated: any L4S network nodes should then be un-deployed and the
ECT(1) codepoint usage should be released/recycled as quickly as
possible, recognizing that this process may take some time.  To
facilitate this potential outcome, [I-D.ietf-tsvwg-ecn-l4s-id]
requires L4S hosts to be configurable to revert to non-L4S congestion
control, and networks to be configurable to treat ECT(1) the same as
ECT(0).

8.  Contributors

Thanks to Bob Briscoe, Jake Holland, Koen De Schepper, Olivier
Tilmans, Tom Henderson, Asad Ahmed, Gorry Fairhurst, Sebastian
Moeller, and members of the TSVWG mailing list for their
contributions to this document.

9.  IANA Considerations

None.

10.  Security Considerations

For further study.

11.  Informative References

[Akamai]   Holland, J., "Latency & AQM Observations on the Internet",
           IETF MAPRG interim-2020-maprg-01, August 2020.

[Cake]     Hoiland-Jorgensen, T., Taht, D., and J. Morton, "Piece of
           CAKE: A Comprehensive Queue Management Solution for Home
           Gateways", 2018.

[COBALT]   Palmei, J., et al., "Design and Evaluation of COBALT Queue
           Discipline", IEEE International Symposium on Local and
           Metropolitan Area Networks 2019, 2019.
[Cubic]    Ha, S., Rhee, I., and L. Xu, "CUBIC: A New TCP-Friendly
           High-Speed TCP Variant", ACM SIGOPS Operating Systems
           Review, 2008.

[Detection]
           Briscoe, B. and A.S. Ahmed, "TCP Prague Fall-back on
           Detection of a Classic ECN AQM", ArXiv, February 2021.

[ECNreadiness]
           Bauer, S., Beverly, R., and A. Berger, "Measuring the
           State of ECN Readiness in Servers, Clients, and Routers",
           Proc ACM SIGCOMM Internet Measurement Conference IMC'11,
           2011.

[EnablingECN]
           Trammell, B., Kuehlewind, M., Boppart, D., Learmonth, I.,
           Fairhurst, G., and R. Scheffenegger, "Enabling Internet-
           Wide Deployment of Explicit Congestion Notification", Proc
           Passive & Active Measurement Conference PAM15, 2015.

[Harm]     Ware, R., Mukerjee, M., Seshan, S., and J. Sherry, "Beyond
           Jain's Fairness Index: Setting the Bar For The Deployment
           of Congestion Control Algorithms", Hotnets'19, 2019.

[I-D.heist-tsvwg-ecn-deployment-observations]
           Heist, P. and J. Morton, "Explicit Congestion Notification
           (ECN) Deployment Observations", Work in Progress,
           Internet-Draft, draft-heist-tsvwg-ecn-deployment-
           observations-02, 8 March 2021.

[I-D.ietf-tsvwg-aqm-dualq-coupled]
           Schepper, K., Briscoe, B., and G. White, "DualQ Coupled
           AQMs for Low Latency, Low Loss and Scalable Throughput
           (L4S)", Work in Progress, Internet-Draft, draft-ietf-
           tsvwg-aqm-dualq-coupled-13, 15 November 2020.

[I-D.ietf-tsvwg-ecn-l4s-id]
           Schepper, K. and B. Briscoe, "Identifying Modified
           Explicit Congestion Notification (ECN) Semantics for
           Ultra-Low Queuing Delay (L4S)", Work in Progress,
           Internet-Draft, draft-ietf-tsvwg-ecn-l4s-id-12, 15
           November 2020.

[I-D.ietf-tsvwg-l4s-arch]
           Briscoe, B., Schepper, K., Bagnulo, M., and G.
White, "Low
           Latency, Low Loss, Scalable Throughput (L4S) Internet
           Service: Architecture", Work in Progress, Internet-Draft,
           draft-ietf-tsvwg-l4s-arch-08, 15 November 2020.

[IANA-ECN] Internet Assigned Numbers Authority, "IANA ECN Field
           Assignments", 2018.

[MeasuringECN]
           Mandalari, AM., Lutu, A., Briscoe, B., Bagnulo, M., and O.
           Alay, "Measuring ECN++: Good News for ++, Bad News for ECN
           over Mobile", DOI 10.1109/MCOM.2018.1700739, IEEE
           Communications Magazine vol. 56, no. 3, March 2018.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
           of Explicit Congestion Notification (ECN) to IP",
           RFC 3168, DOI 10.17487/RFC3168, September 2001.

[RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
           Friendly Rate Control (TFRC): Protocol Specification",
           RFC 5348, DOI 10.17487/RFC5348, September 2008.

[RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
           Notification", RFC 6040, DOI 10.17487/RFC6040, November
           2010.

[RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
           J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler
           and Active Queue Management Algorithm", RFC 8290,
           DOI 10.17487/RFC8290, January 2018.

[RFC8311]  Black, D., "Relaxing Restrictions on Explicit Congestion
           Notification (ECN) Experimentation", RFC 8311,
           DOI 10.17487/RFC8311, January 2018.

[TCPECN]   Bhooma, P., "TCP ECN: Experience with enabling ECN on the
           Internet", 98th IETF MAPRG Presentation, 2017.

Author's Address

   Greg White (editor)
   CableLabs

   Email: g.white@cablelabs.com