idnits 2.17.1 

draft-ietf-tsvwg-ecn-l4s-id-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 2581 has weird spacing: '...initial   even...'

  == Line 2600 has weird spacing: '...initial   even...'

  -- The document date (February 22, 2021) is 1153 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '1' on line 1153

  == Missing Reference: 'RFCXXXX' is mentioned on line 1154, but not defined

  == Outdated reference: A later version (-07) exists of
     draft-briscoe-docsis-q-protection-00

  == Outdated reference: A later version (-28) exists of
     draft-ietf-tcpm-accurate-ecn-13

  == Outdated reference: A later version (-15) exists of
     draft-ietf-tcpm-generalized-ecn-06

  == Outdated reference: A later version (-25) exists of
     draft-ietf-tsvwg-aqm-dualq-coupled-13

  == Outdated reference: A later version (-22) exists of
     draft-ietf-tsvwg-ecn-encap-guidelines-14

  == Outdated reference: A later version (-20) exists of
     draft-ietf-tsvwg-l4s-arch-08

  == Outdated reference: A later version (-22) exists of
     draft-ietf-tsvwg-nqb-03

  == Outdated reference: A later version (-23) exists of
     draft-ietf-tsvwg-rfc6040update-shim-12

  == Outdated reference: A later version (-04) exists of
     draft-morton-tsvwg-sce-02

  == Outdated reference: A later version (-06) exists of
     draft-stewart-tsvwg-sctpecn-05

  -- Obsolete informational reference (is this intentional?): RFC 2309
     (Obsoleted by RFC 7567)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)

  -- Obsolete informational reference (is this intentional?): RFC 8312
     (Obsoleted by RFC 9438)


     Summary: 0 errors (**), 0 flaws (~~), 15 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Transport Services (tsv)                                  K. De Schepper
3	Internet-Draft                                           Nokia Bell Labs
4	Intended status: Experimental                            B. Briscoe, Ed.
5	Expires: August 26, 2021                                     Independent
6	                                                       February 22, 2021

8	 Identifying Modified Explicit Congestion Notification (ECN) Semantics
9	                   for Ultra-Low Queuing Delay (L4S)
10	                     draft-ietf-tsvwg-ecn-l4s-id-13

12	Abstract

14	   This specification defines the identifier to be used on IP packets
15	   for a new network service called low latency, low loss and scalable
16	   throughput (L4S).  L4S uses an Explicit Congestion Notification (ECN)
17	   scheme that is similar to the original (or 'Classic') ECN approach.
18	   'Classic' ECN marking was required to be equivalent to a drop, both
19	   when applied in the network and when responded to by a transport.
20	   Unlike 'Classic' ECN marking, for packets carrying the L4S
21	   identifier, the network applies marking more immediately and more
22	   aggressively than drop, and the transport response to each mark is
23	   reduced and smoothed relative to that for drop.  The two changes
24	   counterbalance each other so that the throughput of an L4S flow will
25	   be roughly the same as a non-L4S flow under the same conditions.
26	   Nonetheless, the much more frequent control signals and the finer
27	   responses to them result in much more fine-grained adjustments, so
28	   that ultra-low and consistently low queuing delay (typically sub-
29	   millisecond on average) becomes possible for L4S traffic without
30	   compromising link utilization.  Thus even capacity-seeking (TCP-like)
31	   traffic can have high bandwidth and very low delay at the same time,
32	   even during periods of high traffic load.

34	   The L4S identifier defined in this document distinguishes L4S from
35	   'Classic' (e.g. TCP-Reno-friendly) traffic.  It gives an incremental
36	   migration path so that suitably modified network bottlenecks can
37	   distinguish and isolate existing traffic that still follows the
38	   Classic behaviour, to prevent it degrading the low queuing delay and
39	   low loss of L4S traffic.  This specification defines the rules that
40	   L4S transports and network elements need to follow to ensure they
41	   neither harm each other's performance nor that of Classic traffic.
42	   Examples of new active queue management (AQM) marking algorithms and
43	   examples of new transports (whether TCP-like or real-time) are
44	   specified separately.

46	Status of This Memo

48	   This Internet-Draft is submitted in full conformance with the
49	   provisions of BCP 78 and BCP 79.

51	   Internet-Drafts are working documents of the Internet Engineering
52	   Task Force (IETF).  Note that other groups may also distribute
53	   working documents as Internet-Drafts.  The list of current Internet-
54	   Drafts is at https://datatracker.ietf.org/drafts/current/.

56	   Internet-Drafts are draft documents valid for a maximum of six months
57	   and may be updated, replaced, or obsoleted by other documents at any
58	   time.  It is inappropriate to use Internet-Drafts as reference
59	   material or to cite them other than as "work in progress."

61	   This Internet-Draft will expire on August 26, 2021.

63	Copyright Notice

65	   Copyright (c) 2021 IETF Trust and the persons identified as the
66	   document authors.  All rights reserved.

68	   This document is subject to BCP 78 and the IETF Trust's Legal
69	   Provisions Relating to IETF Documents
70	   (https://trustee.ietf.org/license-info) in effect on the date of
71	   publication of this document.  Please review these documents
72	   carefully, as they describe your rights and restrictions with respect
73	   to this document.  Code Components extracted from this document must
74	   include Simplified BSD License text as described in Section 4.e of
75	   the Trust Legal Provisions and are provided without warranty as
76	   described in the Simplified BSD License.

78	Table of Contents

80	   1.  Introduction
81	     1.1.  Latency, Loss and Scaling Problems
82	     1.2.  Terminology
83	     1.3.  Scope
84	   2.  Consensus Choice of L4S Packet Identifier: Requirements
85	   3.  L4S Packet Identification at Run-Time
86	   4.  Prerequisite Transport Layer Behaviour (the 'Prague
87	       Requirements')
88	     4.1.  Prerequisite Codepoint Setting
89	     4.2.  Prerequisite Transport Feedback
90	     4.3.  Prerequisite Congestion Response
91	     4.4.  Filtering or Smoothing of ECN Feedback
92	   5.  Prerequisite Network Node Behaviour
93	     5.1.  Prerequisite Classification and Re-Marking Behaviour
94	     5.2.  The Meaning of L4S CE Relative to Drop
95	     5.3.  Exception for L4S Packet Identification by Network Nodes
96	           with Transport-Layer Awareness
97	     5.4.  Interaction of the L4S Identifier with other Identifiers
98	       5.4.1.  DualQ Examples of Other Identifiers Complementing L4S
99	               Identifiers
100	         5.4.1.1.  Inclusion of Additional Traffic with L4S
101	         5.4.1.2.  Exclusion of Traffic From L4S Treatment
102	         5.4.1.3.  Generalized Combination of L4S and Other
103	                   Identifiers
104	       5.4.2.  Per-Flow Queuing Examples of Other Identifiers
105	               Complementing L4S Identifiers
106	     5.5.  Limiting Packet Bursts from Links Supporting L4S AQMs
107	   6.  L4S Experiments
108	     6.1.  Open Questions
109	     6.2.  Open Issues
110	     6.3.  Future Potential
111	   7.  IANA Considerations
112	   8.  Security Considerations
113	   9.  Acknowledgements
114	   10. References
115	     10.1.  Normative References
116	     10.2.  Informative References
117	   Appendix A.  The 'Prague L4S Requirements'
118	     A.1.  Requirements for Scalable Transport Protocols
119	       A.1.1.  Use of L4S Packet Identifier
120	       A.1.2.  Accurate ECN Feedback
121	       A.1.3.  Fall back to Reno-friendly congestion control on
122	               packet loss
123	       A.1.4.  Fall back to Reno-friendly congestion control on
124	               classic ECN bottlenecks
125	       A.1.5.  Reduce RTT dependence
126	       A.1.6.  Scaling down to fractional congestion windows
127	       A.1.7.  Measuring Reordering Tolerance in Time Units
128	     A.2.  Scalable Transport Protocol Optimizations
129	       A.2.1.  Setting ECT in TCP Control Packets and
130	               Retransmissions
131	       A.2.2.  Faster than Additive Increase
132	       A.2.3.  Faster Convergence at Flow Start
133	   Appendix B.  Alternative Identifiers
134	     B.1.  ECT(1) and CE codepoints
135	     B.2.  ECN-DualQ-SCE1
136	     B.3.  ECN-DualQ-SCE0
137	     B.4.  ECN Plus a Diffserv Codepoint (DSCP)
138	     B.5.  ECN capability alone
139	     B.6.  Protocol ID
140	     B.7.  Source or destination addressing
141	     B.8.  Summary: Merits of Alternative Identifiers
143	   Appendix C.  Potential Competing Uses for the ECT(1) Codepoint
144	     C.1.  Integrity of Congestion Feedback
145	     C.2.  Notification of Less Severe Congestion than CE
146	   Authors' Addresses

148	1.  Introduction

150	   This specification defines the identifier to be used on IP packets
151	   for a new network service called low latency, low loss and scalable
152	   throughput (L4S).  It is similar to the original (or 'Classic')
153	   Explicit Congestion Notification (ECN [RFC3168]).  RFC 3168 required
154	   an ECN mark to be equivalent to a drop, both when applied in the
155	   network and when responded to by a transport.  Unlike Classic ECN
156	   marking, the network applies L4S marking more immediately and more
157	   aggressively than drop, and the transport response to each mark is
158	   reduced and smoothed relative to that for drop.  The two changes
159	   counterbalance each other so that the throughput of an L4S flow will
160	   be roughly the same as a non-L4S flow under the same conditions.
161	   Nonetheless, the much more frequent control signals and the finer
162	   responses to them result in ultra-low queuing delay without
163	   compromising link utilization, and this low delay can be maintained
164	   during high load.  Ultra-low queuing delay means less than 1
165	   millisecond (ms) on average and less than about 2 ms at the 99th
166	   percentile.

168	   An example of a scalable congestion control that would enable the L4S
169	   service is Data Center TCP (DCTCP), which until now has been
170	   applicable solely to controlled environments like data centres
171	   [RFC8257], because it is too aggressive to co-exist with existing
172	   TCP-Reno-friendly traffic.  The DualQ Coupled AQM, which is defined
173	   in a complementary experimental specification
174	   [I-D.ietf-tsvwg-aqm-dualq-coupled], is an AQM framework that enables
175	   scalable congestion controls like DCTCP to co-exist with existing
176	   traffic, each getting roughly the same flow rate when they compete
177	   under similar conditions.  Note that a transport such as DCTCP is
178	   still not safe to deploy on the Internet unless it satisfies the
179	   requirements listed in Section 4.

181	   L4S is not only for elastic (TCP-like) traffic - there are scalable
182	   congestion controls for real-time media, such as the L4S variant of
183	   the SCReAM [RFC8298] real-time media congestion avoidance technique
184	   (RMCAT).  The factor that distinguishes L4S from Classic traffic is
185	   its behaviour in response to congestion.  The transport wire
186	   protocol, e.g. TCP, QUIC, SCTP, DCCP, RTP/RTCP, is orthogonal (and
187	   therefore not suitable for distinguishing L4S from Classic packets).

189	   The L4S identifier defined in this document is the key piece that
190	   distinguishes L4S from 'Classic' (e.g. Reno-friendly) traffic.  It
191	   gives an incremental migration path so that suitably modified network
192	   bottlenecks can distinguish and isolate existing Classic traffic from
193	   L4S traffic to prevent it from degrading the ultra-low delay and loss
194	   of the new scalable transports, without harming Classic performance.
195	   Initial implementation of the separate parts of the system has been
196	   motivated by the performance benefits.

198	1.1.  Latency, Loss and Scaling Problems

200	   Latency is becoming the critical performance factor for many (most?)
201	   applications on the public Internet, e.g. interactive Web, Web
202	   services, voice, conversational video, interactive video, interactive
203	   remote presence, instant messaging, online gaming, remote desktop,
204	   cloud-based applications, and video-assisted remote control of
205	   machinery and industrial processes.  In the 'developed' world,
206	   further increases in access network bit-rate offer diminishing
207	   returns, whereas latency is still a multi-faceted problem.  In the
208	   last decade or so, much has been done to reduce propagation time by
209	   placing caches or servers closer to users.  However, queuing remains
210	   a major intermittent component of latency.

212	   The Diffserv architecture provides Expedited Forwarding [RFC3246], so
213	   that low latency traffic can jump the queue of other traffic.
214	   However, on access links dedicated to individual sites (homes, small
215	   enterprises or mobile devices), often all traffic at any one time
216	   will be latency-sensitive.  Then, given nothing to differentiate
217	   from, Diffserv makes no difference.  Instead, we need to remove the
218	   causes of any unnecessary delay.

220	   The bufferbloat project has shown that excessively-large buffering
221	   ('bufferbloat') has been introducing significantly more delay than
222	   the underlying propagation time.  These delays appear only
223	   intermittently--only when a capacity-seeking (e.g. TCP) flow is long
224	   enough for the queue to fill the buffer, making every packet in other
225	   flows sharing the buffer sit through the queue.

227	   Active queue management (AQM) was originally developed to solve this
228	   problem (and others).  Unlike Diffserv, which gives low latency to
229	   some traffic at the expense of others, AQM controls latency for _all_
230	   traffic in a class.  In general, AQM methods introduce an increasing
231	   level of discard from the buffer the longer the queue persists above
232	   a shallow threshold.  This gives sufficient signals to capacity-
233	   seeking (aka greedy) flows to keep the buffer empty for its intended
234	   purpose: absorbing bursts.  However, RED [RFC2309] and other
235	   algorithms from the 1990s were sensitive to their configuration and
236	   hard to set correctly.  So, this form of AQM was not widely deployed.

238	   More recent state-of-the-art AQM methods, e.g. FQ-CoDel [RFC8290],
239	   PIE [RFC8033], Adaptive RED [ARED01], are easier to configure,
240	   because they define the queuing threshold in time not bytes, so it is
241	   invariant for different link rates.  However, no matter how good the
242	   AQM, the sawtoothing sending window of a Classic congestion control
243	   will either cause queuing delay to vary or cause the link to be
244	   under-utilized.  Even with a perfectly tuned AQM, the additional
245	   queuing delay will be of the same order as the underlying speed-of-
246	   light delay across the network.
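
	   To illustrate why a threshold defined in time is invariant across
	   link rates, the following minimal sketch converts one fixed
	   queuing-delay target into the equivalent queue depth in bytes at
	   several link rates.  The 5 ms target, the link rates and the
	   function name are illustrative assumptions, not values taken from
	   this document or from any particular AQM configuration.

```python
def backlog_bytes(link_rate_bps: float, delay_s: float) -> float:
    """Bytes of backlog that take delay_s to drain at link_rate_bps."""
    return link_rate_bps * delay_s / 8


TARGET_DELAY_S = 0.005  # assumed, illustrative delay target

for rate_mbps in (2, 100, 800):
    depth = backlog_bytes(rate_mbps * 1e6, TARGET_DELAY_S)
    print(f"{rate_mbps:>4} Mb/s: {TARGET_DELAY_S * 1e3:.0f} ms of queue "
          f"= {depth:,.0f} bytes")
```

	   A single time-based setting therefore corresponds to very
	   different byte thresholds, which is why a byte-based threshold
	   would need re-tuning for each link rate.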

248	   If a sender's own behaviour is introducing queuing delay variation,
249	   no AQM in the network can 'un-vary' the delay without significantly
250	   compromising link utilization.  Even flow-queuing (e.g. [RFC8290]),
251	   which isolates one flow from another, cannot isolate a flow from the
252	   delay variations it inflicts on itself.  Therefore those applications
253	   that need to seek out high bandwidth but also need low latency will
254	   have to migrate to scalable congestion control.

256	   Altering host behaviour is not enough on its own though.  Even if
257	   hosts adopt low latency behaviour (scalable congestion controls),
258	   they need to be isolated from the behaviour of existing Classic
259	   congestion controls that induce large queue variations.  L4S enables
260	   that migration by providing latency isolation in the network and
261	   distinguishing the two types of packets that need to be isolated: L4S
262	   and Classic.  L4S isolation can be achieved with a queue per flow
263	   (e.g. [RFC8290]) but a DualQ [I-D.ietf-tsvwg-aqm-dualq-coupled] is
264	   sufficient, and actually gives better tail latency.  Both approaches
265	   are addressed in this document.

267	   The DualQ solution was developed to make ultra-low latency available
268	   without requiring per-flow queues at every bottleneck.  This was
269	   because FQ has well-known downsides - not least the need to inspect
270	   transport layer headers in the network, which makes it incompatible
271	   with privacy approaches such as IPSec VPN tunnels, and incompatible
272	   with link layer queue management, where transport layer headers can
273	   be hidden, e.g. 5G.

275	   Latency is not the only concern addressed by L4S: It was known when
276	   TCP congestion avoidance was first developed that it would not scale
277	   to high bandwidth-delay products (footnote 6 of Jacobson and Karels
278	   [TCP-CA]).  Given regular broadband bit-rates over WAN distances are
279	   already [RFC3649] beyond the scaling range of Reno TCP, 'less
280	   unscalable' Cubic [RFC8312] and Compound [I-D.sridharan-tcpm-ctcp]
281	   variants of TCP have been successfully deployed.  However, these are
282	   now approaching their scaling limits.  Unfortunately, fully scalable
283	   congestion controls such as DCTCP [RFC8257] cause Classic ECN
284	   congestion controls sharing the same queue to starve themselves,
285	   which is why scalable controls have been confined to private data
286	   centres or research testbeds (until now).

288	   It turns out that a congestion control algorithm like DCTCP that
289	   solves the latency problem also solves the scalability problem of
290	   Classic congestion controls.  The finer sawteeth in the congestion
291	   window have low amplitude, so they cause very little queuing delay
292	   variation and the average time to recover from one congestion signal
293	   to the next (the average duration of each sawtooth) remains
294	   invariant, which maintains constant tight control as flow-rate
295	   scales.  A background paper [DCttH15] gives the full explanation of
296	   why the design solves both the latency and the scaling problems, both
297	   in plain English and in more precise mathematical form.  The
298	   explanation is summarised without the maths in the L4S architecture
299	   document [I-D.ietf-tsvwg-l4s-arch].
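
	   The invariance of the signalling rate can be sketched numerically.
	   The example below is only an illustration under two assumed
	   steady-state models that are not defined in this document: a
	   Reno-like control whose average window is W = sqrt(1.5 / p), and
	   a scalable control with W = 2 / p, chosen so that it yields the
	   2 congestion signals per round trip quoted for DCTCP in
	   Section 1.2.  Here p is the probability that a packet carries a
	   congestion signal, and all function names are hypothetical.

```python
def reno_signal_probability(window_pkts: float) -> float:
    """Signal probability at which the assumed Reno-like model
    W = sqrt(1.5 / p) settles at the given average window."""
    return 1.5 / window_pkts ** 2


def scalable_signal_probability(window_pkts: float) -> float:
    """Signal probability for the assumed scalable model W = 2 / p."""
    return 2.0 / window_pkts


def signals_per_round_trip(window_pkts: float, p: float) -> float:
    """Average signals per round trip: packets per round trip times p."""
    return window_pkts * p


for w in (10, 100, 1000):
    reno = signals_per_round_trip(w, reno_signal_probability(w))
    scal = signals_per_round_trip(w, scalable_signal_probability(w))
    print(f"window {w:>5} pkts: Reno-like {reno:.4f} signals/RTT, "
          f"scalable {scal:.1f} signals/RTT")
```

	   Under these assumptions the Reno-like control receives ever fewer
	   signals per round trip as the window grows, whereas the scalable
	   control receives the same number at every window size.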

301	1.2.  Terminology

303	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
304	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
305	   "OPTIONAL" in this document are to be interpreted as described in
306	   [RFC2119].  In this document, these words will appear with that
307	   interpretation only when in ALL CAPS.  Lower case uses of these words
308	   are not to be interpreted as carrying RFC-2119 significance.

310	   Classic Congestion Control:  A congestion control behaviour that can
311	      co-exist with standard TCP Reno [RFC5681] without causing
312	      significantly negative impact on its flow rate [RFC5033].  With
313	      Classic congestion controls, as flow rate scales, the number of
314	      round trips between congestion signals (losses or ECN marks) rises
315	      with the flow rate.  So it takes longer and longer to recover
316	      after each congestion event.  Therefore control of queuing and
317	      utilization becomes very slack, and the slightest disturbance
318	      prevents a high rate from being attained [RFC3649].

320	      For instance, with 1500 byte packets and an end-to-end round trip
321	      time (RTT) of 36 ms, over the years, as Reno flow rate scales from
322	      2 to 100 Mb/s the number of round trips taken to recover from a
323	      congestion event rises proportionately, from 4 round trips to 200.
324	      Cubic [RFC8312] was developed to be less unscalable, but it is
325	      approaching its scaling limit; with the same RTT of 36 ms, at
326	      100 Mb/s it takes about 106 round trips to recover, and at
327	      800 Mb/s its recovery time triples to over 340 round trips, or
328	      still more than 12 seconds (Reno would take 57 seconds).  Cubic
329	      only becomes significantly better than Reno at high delay and rate
330	      combinations, for example at 90 ms RTT and 800 Mb/s a Reno flow
331	      takes 4000 RTTs or 6 minutes to recover, whereas Cubic 'only'
332	      needs 188 RTTs, which is still 17 seconds (double its recovery
333	      time at 100 Mb/s).

335	   Scalable Congestion Control:  A congestion control where the average
336	      time from one congestion signal to the next (the recovery time)
337	      remains invariant as the flow rate scales, all other factors being
338	      equal.  This maintains the same degree of control over queuing
339	      and utilization whatever the flow rate, as well as ensuring that
340	      high throughput is robust to disturbances.  For instance, DCTCP
341	      averages 2 congestion signals per round-trip whatever the flow
342	      rate, as do other recently developed scalable congestion controls,
343	      e.g. Relentless TCP [Mathis09], TCP Prague [PragueLinux] and the
344	      L4S variant of SCReAM for real-time media [RFC8298].  See
345	      Section 4.3 for more explanation.

347	   Classic service:  The Classic service is intended for all the
348	      congestion control behaviours that co-exist with Reno [RFC5681]
349	      (e.g. Reno itself, Cubic [RFC8312], Compound
350	      [I-D.sridharan-tcpm-ctcp], TFRC [RFC5348]).  The term 'Classic
351	      queue' means a queue providing the Classic service.

353	   Low-Latency, Low-Loss Scalable throughput (L4S) service:  The 'L4S'
354	      service is intended for traffic from scalable congestion control
355	      algorithms, such as Data Center TCP [RFC8257].  The L4S service is
356	      for more general traffic than just DCTCP--it allows the set of
357	      congestion controls with similar scaling properties to DCTCP to
358	      evolve, such as the examples listed above (Relentless, Prague,
359	      SCReAM).  The term 'L4S queue' means a queue providing the L4S
360	      service.

362	      The terms Classic or L4S can also qualify other nouns, such as
363	      'queue', 'codepoint', 'identifier', 'classification', 'packet',
364	      'flow'.  For example: an L4S packet means a packet with an L4S
365	      identifier sent from an L4S congestion control.

367	      Both Classic and L4S services can cope with a proportion of
368	      unresponsive or less-responsive traffic as well, as long as it
369	      does not build a queue (e.g. DNS, VoIP, game sync datagrams, etc).

371	   Reno-friendly:  The subset of Classic traffic that excludes
372	      unresponsive traffic and excludes experimental congestion controls
373	      intended to coexist with Reno but without always being strictly
374	      friendly to Reno (as allowed by [RFC5033]).  Reno-friendly is used
375	      in place of 'TCP-friendly', given that the TCP protocol is used
376	      with many different congestion control behaviours.

378	   Classic ECN:  The original Explicit Congestion Notification (ECN)
379	      protocol [RFC3168], which requires ECN signals to be treated the
380	      same as drops, both when generated in the network and when
381	      responded to by the sender.  The names used for the four
382	      codepoints of the 2-bit IP-ECN field are as defined in [RFC3168]:
383	      Not ECT, ECT(0), ECT(1) and CE, where ECT stands for ECN-Capable
384	      Transport and CE stands for Congestion Experienced.
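
	   The Reno figures quoted in the definition of Classic Congestion
	   Control above can be reproduced with a short calculation.  The
	   sketch below assumes an idealised sawtooth, which this document
	   does not spell out: after a congestion event the window halves
	   from its peak, then grows by one packet per round trip, so the
	   average window is 3/4 of the peak.  The function name is
	   hypothetical, and the Cubic figures are not modelled here.

```python
from typing import Tuple

PACKET_BITS = 1500 * 8  # 1500-byte packets, as in the quoted example


def reno_recovery(rate_bps: float, rtt_s: float) -> Tuple[float, float]:
    """Return (round trips, seconds) for an idealised Reno flow to recover
    from one congestion event while averaging rate_bps.

    Assumed model: the window halves from w_max to w_max / 2, then grows
    by one packet per round trip, so its average is 3/4 of w_max.
    """
    avg_window = rate_bps * rtt_s / PACKET_BITS  # average packets in flight
    w_max = avg_window * 4 / 3                   # peak of the sawtooth
    round_trips = w_max / 2                      # one packet regained per RTT
    return round_trips, round_trips * rtt_s


for rate_mbps, rtt_ms in ((2, 36), (100, 36), (800, 36), (800, 90)):
    rounds, seconds = reno_recovery(rate_mbps * 1e6, rtt_ms / 1e3)
    print(f"{rate_mbps:>4} Mb/s, {rtt_ms} ms RTT: "
          f"{rounds:.0f} round trips, {seconds:.1f} s")
```

	   Under this assumed model the calculation gives 4 and 200 round
	   trips at 2 and 100 Mb/s with a 36 ms RTT, about 57 seconds at
	   800 Mb/s, and 4000 round trips (6 minutes) at 800 Mb/s with a
	   90 ms RTT, in line with the figures above.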

386	1.3.  Scope

388	   The new L4S identifier defined in this specification is applicable
389	   for IPv4 and IPv6 packets (as for Classic ECN [RFC3168]).  It is
390	   applicable for the unicast, multicast and anycast forwarding modes.

392	   The L4S identifier is an orthogonal packet classification to the
393	   Differentiated Services Code Point (DSCP) [RFC2474].  Section 5.4
394	   explains what this means in practice.

396	   This document is intended for experimental status, so it does not
397	   update any standards track RFCs.  Therefore it depends on [RFC8311],
398	   which is a standards track specification that:

400	   o  updates the ECN proposed standard [RFC3168] to allow experimental
401	      track RFCs to relax the requirement that an ECN mark must be
402	      equivalent to a drop (when the network applies markings and/or
403	      when the sender responds to them);

405	   o  changes the status of the experimental ECN nonce [RFC3540] to
406	      historic;

408	   o  makes consequent updates to the following additional proposed
409	      standard RFCs to reflect the above two bullets:

411	      *  ECN for RTP [RFC6679];

413	      *  the congestion control specifications of various DCCP
414	         congestion control identifier (CCID) profiles [RFC4341],
415	         [RFC4342], [RFC5622].

417	   This document is about identifiers that are used for interoperation
418	   between hosts and networks.  So the audience is broad, covering
419	   developers of host transports and network AQMs, as well as covering
420	   how operators might wish to combine various identifiers, which would
421	   require flexibility from equipment developers.

423	2.  Consensus Choice of L4S Packet Identifier: Requirements

425	   This section briefly records the process that led to a consensus
426	   choice of L4S identifier, selected from all the alternatives in
427	   Appendix B.

429	   The identifier for packets using the Low Latency, Low Loss, Scalable
430	   throughput (L4S) service needs to meet the following requirements:

432	   o  it SHOULD survive end-to-end between source and destination
433	      applications: across the boundary between host and network,
434	      between interconnected networks, and through middleboxes;

436	   o  it SHOULD be visible at the IP layer;

438	   o  it SHOULD be common to IPv4 and IPv6 and transport-agnostic;

440	   o  it SHOULD be incrementally deployable;

442	   o  it SHOULD enable an AQM to classify packets encapsulated by outer
443	      IP or lower-layer headers;

445	   o  it SHOULD consume minimal extra codepoints;

447	   o  it SHOULD be consistent on all the packets of a transport layer
448	      flow, so that some packets of a flow are not served by a different
449	      queue to others.

451	   Whether the identifier would be recoverable if the experiment failed
452	   is a factor that could be taken into account.  However, this has not
453	   been made a requirement, because that would favour schemes that would
454	   be easier to fail, rather than those more likely to succeed.

456	   It is recognised that the chosen identifier is unlikely to satisfy
457	   all these requirements, particularly given the limited space left in
458	   the IP header.  Therefore a compromise will be necessary, which is
459	   why all the above requirements are expressed with the word 'SHOULD'
460	   not 'MUST'.  Appendix B discusses the pros and cons of the
461	   compromises made in various competing identification schemes against
462	   the above requirements.

464	   On the basis of this analysis, "ECT(1) and CE codepoints" is the best
465	   compromise.  Therefore this scheme is defined in detail in the
466	   following sections, while Appendix B records the rationale for this
467	   decision.

469	3.  L4S Packet Identification at Run-Time

471	   The L4S treatment is an experimental track alternative packet marking
472	   treatment [RFC4774] to the Classic ECN treatment in [RFC3168], which
473	   has been updated by [RFC8311] to allow experiments such as the one
474	   defined in the present specification.  Like Classic ECN, L4S ECN
475	   identifies both network and host behaviour: it identifies the marking
476	   treatment that network nodes are expected to apply to L4S packets,
477	   and it identifies packets that have been sent from hosts that are
478	   expected to comply with a broad type of sending behaviour.

480	   For a packet to receive L4S treatment as it is forwarded, the sender
481	   sets the ECN field in the IP header to the ECT(1) codepoint.  See
482	   Section 4 for full transport layer behaviour requirements, including
483	   feedback and congestion response.

485	   A network node that implements the L4S service normally classifies
486	   arriving ECT(1) and CE packets for L4S treatment.  See Section 5 for
487	   full network element behaviour requirements, including
488	   classification, ECN-marking and interaction of the L4S identifier
489	   with other identifiers and per-hop behaviours.
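
	   As a rough illustration of the default classification described
	   above, the sketch below reads the 2-bit ECN field from the start
	   of an IPv4 or IPv6 header and selects a treatment.  The codepoint
	   values are those of [RFC3168]; the names Ecn, ecn_field and
	   default_treatment are hypothetical, and the sketch ignores the
	   exceptions and interactions with other identifiers that Section 5
	   covers.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """Codepoints of the 2-bit IP-ECN field, as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def ecn_field(ip_header: bytes) -> Ecn:
    """Extract the ECN field from the first two octets of an IP header."""
    version = ip_header[0] >> 4
    if version == 4:
        # IPv4: ECN is the low 2 bits of the second octet (former TOS).
        return Ecn(ip_header[1] & 0b11)
    if version == 6:
        # IPv6: Traffic Class straddles octets 0 and 1; its low 2 bits
        # (the ECN field) sit in bits 5..4 of the second octet.
        return Ecn((ip_header[1] >> 4) & 0b11)
    raise ValueError(f"not an IP header (version {version})")


def default_treatment(ecn: Ecn) -> str:
    """Default case only: ECT(1) and CE packets get L4S treatment."""
    return "L4S" if ecn in (Ecn.ECT_1, Ecn.CE) else "Classic"


print(default_treatment(ecn_field(bytes([0x45, 0x01]))))  # IPv4, ECT(1)
print(default_treatment(ecn_field(bytes([0x60, 0x20]))))  # IPv6, ECT(0)
```

	   Because the same two bits are used in both IP versions, the
	   classifier needs no knowledge of the transport protocol.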

491	4.  Prerequisite Transport Layer Behaviour (the 'Prague Requirements')

493	4.1.  Prerequisite Codepoint Setting

495	   A sender that wishes a packet to receive L4S treatment as it is
496	   forwarded MUST set the ECN field in the IP header (v4 or v6) to the
497	   ECT(1) codepoint.
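
   As an illustration only, the following minimal sketch shows how a
   sender-side helper might place the ECT(1) codepoint in the two
   least-significant bits of the IPv4 TOS (or IPv6 Traffic Class) byte
   while leaving the Diffserv bits untouched.  The codepoint values are
   those of RFC 3168; the helper names with_ecn() and ecn_of() are
   hypothetical and not part of any specification.

```python
# ECN codepoints: the two least-significant bits of the IPv4 TOS byte
# or IPv6 Traffic Class byte (values as defined in RFC 3168).
NOT_ECT = 0b00
ECT_1 = 0b01  # L4S identifier
ECT_0 = 0b10  # Classic ECN
CE = 0b11     # Congestion Experienced

ECN_MASK = 0b11


def with_ecn(tos_byte, codepoint):
    """Return tos_byte with its ECN field replaced; DSCP bits are kept."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("TOS / Traffic Class is a single byte")
    if codepoint not in (NOT_ECT, ECT_1, ECT_0, CE):
        raise ValueError("unknown ECN codepoint")
    return (tos_byte & ~ECN_MASK & 0xFF) | codepoint


def ecn_of(tos_byte):
    """Extract the ECN field from a TOS / Traffic Class byte."""
    return tos_byte & ECN_MASK
```

   Whether an operating system lets an application write the ECN bits
   (for example through a socket option), or overrides them in the
   transport stack, is platform-dependent and outside this sketch.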

499	4.2.  Prerequisite Transport Feedback

501	   For a transport protocol to provide scalable congestion control it
502	   MUST provide feedback of the extent of CE marking on the forward
503	   path.  When ECN was added to TCP [RFC3168], the feedback method
504	   reported no more than one CE mark per round trip.  Some transport
505	   protocols derived from TCP mimic this behaviour while others report
506	   the accurate extent of ECN marking.  This means that some transport
507	   protocols will need to be updated as a prerequisite for scalable
508	   congestion control.  The position for a few well-known transport
509	   protocols is given below.

511	   TCP:  Support for the accurate ECN feedback requirements [RFC7560]
512	      (such as that provided by AccECN [I-D.ietf-tcpm-accurate-ecn]) by
513	      both ends is a prerequisite for scalable congestion control in
514	      TCP.  Therefore, the presence of ECT(1) in the IP headers even in
515	      one direction of a TCP connection will imply that both ends must
516	      be supporting accurate ECN feedback.  However, the converse does
517	      not apply.  So even if both ends support AccECN, either of the two
518	      ends can choose not to use a scalable congestion control, whatever
519	      the other end's choice.

521	   SCTP:  A suitable ECN feedback mechanism for SCTP could add a chunk
522	      to report the number of received CE marks
523	      (e.g. [I-D.stewart-tsvwg-sctpecn]), and update the ECN feedback
524	      protocol sketched out in Appendix A of the standards track
525	      specification of SCTP [RFC4960].

527	   RTP over UDP:  A prerequisite for scalable congestion control is for
528	      both (all) ends of one media-level hop to signal ECN support
529	      [RFC6679] and use the new generic RTCP feedback format of
530	      [I-D.ietf-avtcore-cc-feedback-message].  The presence of ECT(1)
531	      implies that both (all) ends of that media-level hop support ECN.
532	      However, the converse does not apply.  So each end of a media-
533	      level hop can independently choose not to use a scalable
534	      congestion control, even if both ends support ECN.

536	   QUIC:  Support for sufficiently fine-grained ECN feedback is provided
537	      by the v1 IETF QUIC transport [I-D.ietf-quic-transport].

539	   DCCP:  The ACK vector in DCCP [RFC4340] is already sufficient to
540	      report the extent of CE marking as needed by a scalable congestion
541	      control.
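
   The difference between reporting "at most one CE mark per round
   trip" and reporting the extent of CE marking can be sketched with a
   receiver-side counter.  This is not the AccECN wire format or that
   of any other protocol; the class name CeFeedbackCounter and its
   methods are assumptions made purely for illustration.

```python
CE = 0b11  # Congestion Experienced codepoint (RFC 3168)


class CeFeedbackCounter:
    """Counts packets and CE marks seen since the last feedback report."""

    def __init__(self):
        self.packets = 0
        self.ce_packets = 0

    def on_packet(self, ecn):
        """Record one arriving packet with the given ECN codepoint."""
        self.packets += 1
        if ecn == CE:
            self.ce_packets += 1

    def report(self):
        """Return (ce_packets, packets) since the last report, then reset."""
        result = (self.ce_packets, self.packets)
        self.packets = 0
        self.ce_packets = 0
        return result

    @staticmethod
    def classic_style(report):
        """Collapse a report to the single bit a once-per-RTT scheme conveys."""
        ce_packets, _ = report
        return ce_packets > 0
```

   A scalable congestion control would consume the full (marked, total)
   pair, whereas the collapsed boolean loses the extent of marking.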

543	4.3.  Prerequisite Congestion Response

545	   As a condition for a host to send packets with the L4S identifier
546	   (ECT(1)), it SHOULD implement a congestion control behaviour that
547	   ensures that, in steady state, the average time from one ECN
548	   congestion signal to the next (the 'recovery time') does not increase
549	   as flow rate scales, all other factors being equal.  This is termed a
550	   scalable congestion control.  This is necessary to ensure that queue
551	   variations remain small as flow rate scales, without having to
552	   sacrifice utilization.

554	   For instance, for DCTCP, TCP Prague [PragueLinux] and the L4S variant
555	   of SCReAM [RFC8298], the average recovery time is always half a round
556	   trip, whatever the flow rate.

558	   As with all transport behaviours, a detailed specification (probably
559	   an experimental RFC) will need to be defined for each congestion
560	   control, following the guidelines for specifying new congestion
561	   control algorithms in [RFC5033].  In addition it will need to
562	   document these L4S-specific matters, specifically the timescale over
563	   which the proportionality is averaged, and control of burstiness.
564	   The recovery time requirement above is worded as a 'SHOULD' rather
565	   than a 'MUST' to allow reasonable flexibility when defining these
566	   specifications.

568	   The condition 'all other factors being equal' allows the recovery
569	   time to be different for different round trip times, as long as it
570	   does not increase with flow rate for any particular RTT.

572	   Saying that the recovery time remains roughly invariant is equivalent
573	   to saying that the number of ECN CE marks per round trip remains
574	   invariant as flow rate scales, all other factors being equal.  For
575	   instance, DCTCP's average recovery time of half of 1 RTT is
576	   equivalent to 2 ECN marks per round trip.  For those familiar with
577	   steady-state congestion response functions, it is also equivalent to
578	   say that the congestion window is inversely proportional to the
579	   proportion of bytes in packets marked with the CE codepoint (see
580	   section 2 of [PI2]).
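
   The equivalence stated above can be checked numerically.  The sketch
   below assumes, for illustration, a steady-state window law of
   W = 2/p for a DCTCP-like scalable control (consistent with the
   2 marks per round trip quoted above) and the commonly used
   approximation W = 1.22/sqrt(p) for Reno.  Both constants and all
   function names are assumptions of this example.

```python
import math


def scalable_window(p):
    """Assumed steady-state window (segments) of a scalable control."""
    return 2.0 / p


def reno_window(p):
    """Assumed steady-state window (segments) of a Reno control."""
    return 1.22 / math.sqrt(p)


def signals_per_rtt(window, p):
    """Average congestion signals per round trip: window times p."""
    return window * p


def recovery_time_in_rtts(window, p):
    """Average time between congestion signals, in round trips."""
    return 1.0 / signals_per_rtt(window, p)
```

   Under these assumptions the scalable control sees 2 marks per round
   trip (a recovery time of half a round trip) whatever the value of p,
   and hence whatever the window, whereas for Reno the signals per
   round trip fall, and the recovery time grows, as the window scales
   up.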

582	   In order to coexist safely with other Internet traffic, a scalable
583	   congestion control MUST NOT tag its packets with the ECT(1) codepoint
584	   unless it complies with the following bulleted requirements:

586	   o  As well as responding to ECN markings, a scalable congestion
587	      control MUST react to packet loss in a way that will coexist
588	      safely with a TCP Reno congestion control [RFC5681] (see
589	      Section 1.2 on Terminology for definition of Reno-Friendly and
590	      Appendix A.1.3 for rationale).

592	   o  A scalable congestion control MUST implement monitoring in order
593	      to detect a likely non-L4S but ECN-capable AQM at the bottleneck.
594	      On detection of a likely ECN-capable bottleneck it SHOULD be
595	      capable (dependent on configuration) of automatically adapting its
596	      congestion response to coexist with TCP Reno congestion controls
597	      [RFC5681] (see Appendix A.1.4 for rationale and a referenced
598	      algorithm).

600	      Note that a scalable congestion control is not expected to change
601	      to setting ECT(0) while it falls back to coexist with Reno.

603	   o  A scalable congestion control MUST eliminate RTT bias as much as
604	      possible in the range between the minimum likely RTT and typical
605	      RTTs expected in the intended deployment scenario (see
606	      Appendix A.1.5 for rationale).

608	   o  A scalable congestion control SHOULD remain responsive to
609	      congestion when typical RTTs over the public Internet are
610	      significantly smaller because they are no longer inflated by
611	      queuing delay.  It would be preferable for the minimum window of a
612	      scalable congestion control to be lower than the 2 segment minimum
613	      of TCP Reno [RFC5681] but this is not set as a formal requirement
614	      for L4S experiments (see Appendix A.1.6 for rationale).

616	   o  A scalable congestion control SHOULD detect loss by counting in
617	      time-based units, which is scalable, as opposed to counting in
618	      units of packets (as in the 3 DupACK rule of RFC 5681 TCP), which
619	      is not scalable.  As packet rates increase (e.g., due to new and/
620	      or improved technology), congestion controls that detect loss by
621	      counting in units of packets become more likely to incorrectly
622	      treat reordering events as congestion-caused loss events (see
623	      Appendix A.1.7 for further rationale).  This requirement does not
624	      apply to congestion controls that are solely used in controlled
625	      environments where the network introduces hardly any reordering.

627	   o  A scalable congestion control is expected to limit the queue
628	      caused by bursts of packets.  It would not seem necessary to set
629	      the limit any lower than 10% of the minimum RTT expected in a
630	      typical deployment (e.g. additional queuing of roughly 250 us for
631	      the public Internet).  This would be converted to a number of
632	      packets under the worst-case assumption that the bottleneck link
633	      capacity equals the current flow rate.  No normative requirement
634	      to limit bursts is given here and, until there is more industry
635	      experience from the L4S experiment, it is not even known whether
636	      one is needed - it seems to be in an L4S sender's self-interest to
637	      limit bursts.
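
   The conversion from a burst duration to a number of packets
   mentioned in the last bullet can be sketched as follows.  The
   function name, the 1500-byte packet size and the floor of one packet
   are assumptions of this example, not requirements.

```python
def burst_limit_packets(flow_rate_bps, burst_us=250, packet_bytes=1500):
    """Packets that fit in burst_us at the current flow rate.

    Uses the worst-case assumption that the bottleneck link capacity
    equals the current flow rate.  Integer arithmetic avoids rounding
    surprises.
    """
    if flow_rate_bps <= 0 or burst_us <= 0 or packet_bytes <= 0:
        raise ValueError("all arguments must be positive")
    burst_bits = flow_rate_bps * burst_us // 1_000_000
    packets = burst_bits // (8 * packet_bytes)
    # A sender can never send less than one whole packet at a time.
    return max(1, packets)
```

   For example, with these assumptions a flow running at 100 Mb/s would
   be limited to bursts of 2 full-sized packets, and one at 1 Gb/s to
   bursts of 20.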

639	   To participate in the L4S experiment, a scalable congestion control
640	   MUST be capable of being replaced by a Classic congestion control (by
641	   application and by administrative control).  A purely Classic
642	   congestion control will not tag its packets with the ECT(1)
643	   codepoint.

645	   Each sender in a session can use a scalable congestion control
646	   independently of the congestion control used by the receiver(s) when
647	   they send data.  Therefore there might be ECT(1) packets in one
648	   direction and ECT(0) or Not-ECT in the other.

650	   Later (Section 5.4.1.1) this document discusses the conditions for
651	   mixing other "'Safe' Unresponsive Traffic" (e.g. DNS, LDAP, NTP,
652	   voice, game sync packets) with L4S traffic.  To be clear, although
653	   such traffic can share the same queue as L4S traffic, it is not
654	   appropriate for the sender to tag it as ECT(1), except in the
655	   (unlikely) case that it satisfies the above conditions.

657	4.4.  Filtering or Smoothing of ECN Feedback

659	   Section 5.2 below specifies that an L4S AQM is expected to signal L4S
660	   ECN without filtering or smoothing.  This contrasts with a Classic
661	   AQM, which filters out variations in the queue before signalling ECN
662	   marking or drop.  In the L4S architecture [I-D.ietf-tsvwg-l4s-arch],
663	   responsibility for smoothing out these variations shifts to the
664	   sender's congestion control.

666	   This shift of responsibility has the advantage that each sender can
667	   smooth variations over a timescale proportionate to its own RTT.

669	   In contrast, in the Classic approach, the network doesn't know the
670	   RTTs of all the flows, so it has to smooth out variations for a
671	   worst-case RTT to ensure stability.  For all the typical flows with
672	   shorter RTT than the worst-case, this makes congestion control
673	   unnecessarily sluggish.

675	   This also gives an L4S sender the choice not to smooth, depending on
676	   its context (start-up, congestion avoidance, etc).  Therefore, this
677	   document places no requirement on an L4S congestion control to smooth
678	   out variations in any particular way.  Nonetheless, the specification
679	   of a particular L4S congestion control SHOULD describe how it smooths
680	   the L4S ECN signals fed back to it from the receiver.
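
   One possible way for a sender to smooth the fed-back signal, shown
   here only as a sketch, is an exponentially weighted moving average
   of the marked fraction per round trip, in the style used by DCTCP.
   The class name and the default gain of 1/16 are assumptions of this
   example; this document does not mandate any particular method.

```python
class MarkingFractionEwma:
    """Sender-side EWMA of the fraction of CE-marked packets."""

    def __init__(self, gain=1.0 / 16):
        if not 0.0 < gain <= 1.0:
            raise ValueError("gain must be in (0, 1]")
        self.gain = gain
        self.alpha = 0.0  # smoothed marking fraction

    def update(self, marked, total):
        """Fold in one round trip of feedback; return the new average."""
        if total <= 0:
            return self.alpha  # nothing acknowledged, nothing to learn
        fraction = marked / total
        self.alpha += self.gain * (fraction - self.alpha)
        return self.alpha
```

   Because each sender calls update() once per its own round trip, the
   smoothing timescale is automatically proportionate to that sender's
   RTT.  A sender could also bypass the average, e.g. during start-up.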

682	5.  Prerequisite Network Node Behaviour

684	5.1.  Prerequisite Classification and Re-Marking Behaviour

686	   A network node that implements the L4S service MUST classify arriving
687	   ECT(1) packets for L4S treatment and, other than in the exceptional
688	   case referred to next, it MUST classify arriving CE packets for L4S
689	   treatment as well.  CE packets might have originated as ECT(1) or
690	   ECT(0), but the above rule to classify them as if they originated as
691	   ECT(1) is the safe choice (see Appendix B.1 for rationale).  The
692	   exception is where some flow-aware in-network mechanism happens to be
693	   available for distinguishing CE packets that originated as ECT(0), as
694	   described in Section 5.3, but there is no implication that such a
695	   mechanism is necessary.

697	   An L4S AQM treatment follows similar codepoint transition rules to
698	   those in RFC 3168.  Specifically, the ECT(1) codepoint MUST NOT be
699	   changed to any other codepoint than CE, and CE MUST NOT be changed to
700	   any other codepoint.  An ECT(1) packet is classified as ECN-capable
701	   and, if congestion increases, an L4S AQM algorithm will increasingly
702	   mark the ECN field as CE, otherwise forwarding packets unchanged as
703	   ECT(1).  Necessary conditions for an L4S marking treatment are
704	   defined in Section 5.2.
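
   The two codepoint transition rules stated above can be expressed as
   a small check.  This is a sketch: the function name is hypothetical,
   and it deliberately covers only the ECT(1) and CE cases that this
   section states, leaving other codepoints to RFC 3168.

```python
ECT_1 = 0b01  # L4S identifier
CE = 0b11     # Congestion Experienced


def l4s_remark_allowed(before, after):
    """True if an L4S AQM may forward a packet with this ECN change."""
    if before == ECT_1:
        # ECT(1) is either forwarded unchanged or marked CE.
        return after in (ECT_1, CE)
    if before == CE:
        # CE is never changed to any other codepoint.
        return after == CE
    raise ValueError("only ECT(1) and CE are covered by this sketch")
```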

706	   Under persistent overload an L4S marking treatment MUST begin using
707	   Classic drop until the overload episode has subsided, as recommended
708	   for all AQM methods in [RFC7567] (Section 4.2.1), which follows the
709	   similar advice in RFC 3168 (Section 7).  Where an L4S AQM is
710	   transport-aware, this requirement could be satisfied by using Classic
711	   drop in only the most overloaded individual per-flow AQMs or in a
712	   DualQ by redirecting packets in those flows contributing most to the
713	   overload from the L4S queue so that they are subjected to drop in the
714	   Classic queue [I-D.briscoe-docsis-q-protection].

716	   For backward compatibility in uncontrolled environments, a network
717	   node that implements the L4S treatment MUST also implement an AQM
718	   treatment for the Classic service as defined in Section 1.2.  This
719	   Classic AQM treatment need not mark ECT(0) packets, but if it does,
720	   it will do so under the same conditions as it would drop Not-ECT
721	   packets [RFC3168].  It MUST classify arriving ECT(0) and Not-ECT
722	   packets for treatment by this Classic AQM (for the DualQ Coupled AQM,
723	   see the extensive discussion on classification in Sections 2.3 and
724	   2.5.1.1 of [I-D.ietf-tsvwg-aqm-dualq-coupled]).
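
   The default classification described in this section (without the
   flow-aware exception of Section 5.3) reduces to a lookup on the ECN
   field.  The sketch below is illustrative; the function name and the
   string labels are assumptions.

```python
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def classify_by_ecn(ecn):
    """Return the treatment selected by the ECN field alone."""
    if ecn in (ECT_1, CE):
        # CE might have started as ECT(0), but L4S is the safe choice.
        return "L4S"
    if ecn in (ECT_0, NOT_ECT):
        return "Classic"
    raise ValueError("the ECN field is only two bits wide")
```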

726	5.2.  The Meaning of L4S CE Relative to Drop

728	   The likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST
729	   be roughly proportional to the square of the likelihood that it would
730	   have marked it if it had been an L4S packet (p_L).  That is

732	      p_C ~= (p_L / k)^2

734	   The constant of proportionality (k) does not have to be standardised
735	   for interoperability, but a value of 2 is RECOMMENDED.  The term
736	   'likelihood' is used above to allow for marking and dropping to be
737	   either probabilistic or deterministic.

739	   This formula ensures that Scalable and Classic flows will converge to
740	   roughly equal congestion windows, for the worst case of Reno
741	   congestion control.  This is because the congestion windows of
742	   Scalable and Classic congestion controls are inversely proportional
743	   to p_L and sqrt(p_C) respectively.  So squaring p_L/k to obtain p_C
744	   in the above formula counterbalances the square root that
745	   characterizes Reno-friendly flows.
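
   A short numerical sketch may make the coupling concrete.  It assumes
   the steady-state window laws W = 2/p_L for a scalable control and
   W = 1.22/sqrt(p_C) for Reno; these constants, the function names and
   the clamp to 1 are assumptions of this example rather than part of
   the requirement.

```python
import math


def classic_drop_probability(p_l, k=2.0):
    """Couple the L4S marking likelihood to Classic drop: (p_L / k)^2."""
    if not 0.0 <= p_l <= 1.0:
        raise ValueError("p_l must be a probability")
    return min(1.0, (p_l / k) ** 2)


def window_ratio(p_l, k=2.0):
    """Assumed Reno window divided by assumed scalable window."""
    p_c = classic_drop_probability(p_l, k)
    reno_window = 1.22 / math.sqrt(p_c)
    scalable_window = 2.0 / p_l
    return reno_window / scalable_window
```

   With k = 2, an L4S marking likelihood of 10% couples to a Classic
   drop likelihood of 0.25%.  Under the assumed window laws the ratio
   of the two windows works out as 1.22 * k / 2, which does not depend
   on p_L, so the two kinds of flow stay in roughly the same proportion
   at any load.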

747	   The relative strengths of L4S CE and drop are irrelevant in an AQM
748	   that schedules application flows explicitly (e.g. an FQ scheduler).
749	   Nonetheless, the above relationship defines the coupling between L4S
750	   and Classic congestion signals in a DualQ Coupled AQM
751	   [I-D.ietf-tsvwg-aqm-dualq-coupled].

753	   Note that, contrary to RFC 3168, a Dual Queue Coupled AQM
754	   implementing the L4S and Classic treatments does not mark an ECT(1)
755	   packet under the same conditions that it would have dropped a Not-ECT
756	   packet, as allowed by [RFC8311], which updates RFC 3168.  However, if
757	   it marks ECT(0) packets, it does so under the same conditions that it
758	   would have dropped a Not-ECT packet.

760	   Also, L4S CE marking needs to be interpreted as an unsmoothed signal,
761	   in contrast to the Classic approach in which AQMs filter out
762	   variations before signalling congestion.  An L4S AQM SHOULD NOT
763	   smooth or filter out variations in the queue before signalling
764	   congestion.  In the L4S architecture [I-D.ietf-tsvwg-l4s-arch], the
765	   sender, not the network, is responsible for smoothing out variations.

767	   This requirement is worded as 'SHOULD NOT' rather than 'MUST NOT' to
768	   allow for the case where the signals from a Classic smoothed AQM are
769	   coupled with those from an unsmoothed L4S AQM.  Nonetheless, the
770	   spirit of the requirement is for all systems to expect that L4S ECN
771	   signalling is unsmoothed and unfiltered, which is important for
772	   interoperability.

774	5.3.  Exception for L4S Packet Identification by Network Nodes with
775	      Transport-Layer Awareness

777	   To implement the L4S treatment, a network node does not need to
778	   identify transport-layer flows.  Nonetheless, if an implementer is
779	   willing to identify transport-layer flows at a network node, and if
780	   the most recent ECT packet in the same flow was ECT(0), the node MAY
781	   classify CE packets for Classic ECN [RFC3168] treatment.  In all
782	   other cases, a network node MUST classify all CE packets for L4S
783	   treatment.  Examples of such other cases are: i) if no ECT packets
784	   have yet been identified in a flow; ii) if it is not desirable for a
785	   network node to identify transport-layer flows; or iii) if the most
786	   recent ECT packet in a flow was ECT(1).

788	   If an implementer uses flow-awareness to classify CE packets, to
789	   determine whether the flow is using ECT(0) or ECT(1) it only uses the
790	   most recent ECT packet of a flow (this advice will need to be
791	   verified as part of L4S experiments).  This is because a sender might
792	   switch from sending ECT(1) (L4S) packets to sending ECT(0) (Classic
793	   ECN) packets, or back again, in the middle of a transport-layer flow
794	   (e.g. it might manually switch its congestion control module mid-
795	   connection, or it might be deliberately attempting to confuse the
796	   network).
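
   The optional flow-aware rule can be sketched as a classifier that
   remembers only the most recent ECT codepoint seen in each flow.  The
   class name and the use of an opaque flow_id key are assumptions; a
   real node would also need to bound and expire this state.

```python
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


class FlowAwareCeClassifier:
    """Classifies CE packets using the latest ECT codepoint per flow."""

    def __init__(self):
        self._last_ect = {}  # flow_id -> ECT_0 or ECT_1

    def classify(self, flow_id, ecn):
        if ecn in (ECT_0, ECT_1):
            # Only the most recent ECT packet is remembered.
            self._last_ect[flow_id] = ecn
            return "Classic" if ecn == ECT_0 else "L4S"
        if ecn == CE:
            # Classic only if the latest ECT packet was ECT(0);
            # unknown flows and ECT(1) flows default to L4S.
            if self._last_ect.get(flow_id) == ECT_0:
                return "Classic"
            return "L4S"
        return "Classic"  # Not-ECT
```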

798	5.4.  Interaction of the L4S Identifier with other Identifiers

800	   The examples in this section concern how additional identifiers might
801	   complement the L4S identifier to classify packets between class-based
802	   queues.  Firstly Section 5.4.1 considers two queues, L4S and Classic,
803	   as in the Coupled DualQ AQM [I-D.ietf-tsvwg-aqm-dualq-coupled],
804	   either alone (Section 5.4.1.1) or within a larger queuing hierarchy
805	   (Section 5.4.1.2).  Then Section 5.4.2 considers schemes that might
806	   combine per-flow 5-tuples with other identifiers.

808	5.4.1.  DualQ Examples of Other Identifiers Complementing L4S
809	        Identifiers

811	5.4.1.1.  Inclusion of Additional Traffic with L4S

813	   In a typical case for the public Internet a network element that
814	   implements L4S in a shared queue might want to classify some low-rate
815	   but unresponsive traffic (e.g. DNS, LDAP, NTP, voice, game sync
816	   packets) into the low latency queue to mix with L4S traffic.  Such
817	   non-ECN-based packet types MUST be safe to mix with L4S traffic
818	   without harming the low latency service, where 'safe' is explained in
819	   Section 5.4.1.1.1 below.

821	   In this case it would not be appropriate to call the queue an L4S
822	   queue, because it is shared by L4S and non-L4S traffic.  Instead it
823	   will be called the low latency or L queue.  The L queue then offers
824	   two different treatments:

826	   o  The L4S treatment, which is a combination of the L4S AQM treatment
827	      and a priority scheduling treatment;

829	   o  The low latency treatment, which is solely the priority scheduling
830	      treatment, without ECN-marking by the AQM.

832	   To identify packets for just the scheduling treatment, it would be
833	   inappropriate to use the L4S ECT(1) identifier, because such traffic
834	   is unresponsive to ECN marking.  Therefore, a network element that
835	   implements L4S in a shared queue MAY classify additional packets into
836	   the L queue if they carry certain non-ECN identifiers.  For instance:

838	   o  addresses of specific applications or hosts configured to be safe
839	      (or perhaps they comply with L4S behaviour and can respond to ECN
840	      feedback, but perhaps cannot set the ECN field for some reason);

842	   o  certain protocols that are usually lightweight (e.g. ARP, DNS);

844	   o  specific Diffserv codepoints that indicate traffic with limited
845	      burstiness such as the EF (Expedited Forwarding [RFC3246]), Voice-
846	      Admit [RFC5865] or proposed NQB (Non-Queue-Building
847	      [I-D.ietf-tsvwg-nqb]) service classes or equivalent local-use
848	      DSCPs (see [I-D.briscoe-tsvwg-l4s-diffserv]).

850	   Of course, a packet that carried both the ECT(1) codepoint and a non-
851	   ECN identifier associated with the L queue would be classified into
852	   the L queue.
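
   The distinction between the two treatments offered by the L queue
   can be sketched as follows.  The choice of EF (DSCP 46) and
   Voice-Admit (DSCP 44) as the operator's configured non-ECN
   identifiers is purely an example, and the function name and labels
   are hypothetical.

```python
ECT_1 = 0b01
CE = 0b11

EF = 46           # Expedited Forwarding (RFC 3246)
VOICE_ADMIT = 44  # Voice-Admit (RFC 5865)

# Example operator policy: DSCPs believed safe to mix with L4S traffic.
LOW_LATENCY_DSCPS = frozenset({EF, VOICE_ADMIT})


def classify_shared_l_queue(dscp, ecn):
    """Return (queue, treatment) for a node with a shared L queue."""
    if ecn in (ECT_1, CE):
        # L4S identifier: L4S AQM marking plus priority scheduling.
        return ("L", "l4s-aqm-and-scheduling")
    if dscp in LOW_LATENCY_DSCPS:
        # Non-ECN identifier: scheduling only, no ECN-marking by the AQM.
        return ("L", "scheduling-only")
    return ("C", "classic-aqm")
```

   The function only reads the DSCP and ECN fields; it returns a
   decision and never rewrites either field.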

854	   For clarity, non-ECN identifiers, such as the examples itemized
855	   above, might be used by some network operators who believe they
856	   identify non-L4S traffic that would be safe to mix with L4S traffic.
857	   They are not alternative ways for a host to indicate that it is
858	   sending L4S packets.  Only the ECT(1) ECN codepoint indicates to a
859	   network element that a host is sending L4S packets (and CE indicates
860	   that it could have originated as ECT(1)).  Specifically ECT(1)
861	   indicates that the host claims its behaviour satisfies the
862	   prerequisite transport requirements in Section 4.

864	   To include additional traffic with L4S, a network element only reads
865	   identifiers such as those itemized above.  It MUST NOT alter these
866	   non-ECN identifiers, so that they survive for any potential use later
867	   on the network path.

869	5.4.1.1.1.  'Safe' Unresponsive Traffic

871	   The above section requires unresponsive traffic to be 'safe' to mix
872	   with L4S traffic.  Ideally this means that the sender never sends any
873	   sequence of packets at a rate that exceeds the available capacity of
874	   the bottleneck link.  However, typically an unresponsive transport
875	   does not even know the bottleneck capacity of the path, let alone its
876	   available capacity.  Nonetheless, an application can be considered
877	   safe enough if it paces packets out (not necessarily completely
878	   regularly) such that its maximum instantaneous rate from packet to
879	   packet stays well below a typical broadband access rate.

881	   This is a vague but useful definition, because many low latency
882	   applications of interest, such as DNS, voice, game sync packets, RPC,
883	   ACKs, keep-alives, could match this description.

885	5.4.1.2.  Exclusion of Traffic From L4S Treatment

887	   To extend the above example, an operator might want to exclude some
888	   traffic from the L4S treatment for a policy reason, e.g. security
889	   (traffic from malicious sources) or commercial (e.g. initially the
890	   operator may wish to confine the benefits of L4S to business
891	   customers).

893	   In this exclusion case, the operator MUST classify on the relevant
894	   locally-used identifiers (e.g. source addresses) before classifying
895	   the non-matching traffic on the end-to-end L4S ECN identifier.

897	   The operator MUST NOT alter the end-to-end L4S ECN identifier from
898	   L4S to Classic, because its decision to exclude certain traffic from
899	   L4S treatment is local-only.  The end-to-end L4S identifier then
900	   survives for other operators to use, or indeed, they can apply their
901	   own policy, independently based on their own choice of locally-used
902	   identifiers.  This approach also allows any operator to remove its
903	   locally-applied exclusions in future, e.g. if it wishes to widen the
904	   benefit of the L4S treatment to all its customers.
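
   The ordering described above (local identifiers first, then the
   end-to-end ECN identifier, with the ECN field left untouched) might
   look like the following sketch.  The excluded prefix is a
   documentation-only example address range, and the names used are
   hypothetical.

```python
import ipaddress

ECT_1 = 0b01
CE = 0b11

# Example local policy: sources excluded from the L4S treatment.
EXCLUDED_SOURCES = [ipaddress.ip_network("192.0.2.0/24")]


def classify_with_exclusion(src, ecn):
    """Return (queue, ecn); the ECN field is always passed through as-is."""
    addr = ipaddress.ip_address(src)
    # Step 1: locally-used identifiers are checked first.
    if any(addr in net for net in EXCLUDED_SOURCES):
        return ("C", ecn)
    # Step 2: non-matching traffic is classified on the L4S identifier.
    if ecn in (ECT_1, CE):
        return ("L", ecn)
    return ("C", ecn)
```

   Because the ECN codepoint is returned unchanged, a downstream
   operator can still apply its own policy to the same packet.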

906	5.4.1.3.  Generalized Combination of L4S and Other Identifiers

908	   L4S concerns low latency, which it can provide for all traffic
909	   without differentiation and without _necessarily_ affecting bandwidth
910	   allocation.  Diffserv provides for differentiation of both bandwidth
911	   and low latency, but its control of latency depends on its control of
912	   bandwidth.  The two can be combined if a network operator wants to
913	   control bandwidth allocation but it also wants to provide low latency
914	   - for any amount of traffic within one of these allocations of
915	   bandwidth (rather than only providing low latency by limiting
916	   bandwidth) [I-D.briscoe-tsvwg-l4s-diffserv].

918	   The DualQ examples so far have been framed in the context of
919	   providing the default Best Efforts Per-Hop Behaviour (PHB) using two
920	   queues - a Low Latency (L) queue and a Classic (C) Queue.  This
921	   single DualQ structure is expected to be the most common and useful
922	   arrangement.  But, more generally, an operator might choose to
923	   control bandwidth allocation through a hierarchy of Diffserv PHBs at
924	   a node, and to offer one (or more) of these PHBs with a low latency
925	   and a Classic variant.

927	   In the first case, if we assume that a network element provides no
928	   PHBs except the DualQ, if a packet carries ECT(1) or CE, the network
929	   element would classify it for the L4S treatment irrespective of its
930	   DSCP.  And, if a packet carried (say) the EF DSCP, the network
931	   element could classify it into the L queue irrespective of its ECN
932	   codepoint.  However, where the DualQ is in a hierarchy of other PHBs,
933	   the classifier would classify some traffic into other PHBs based on
934	   DSCP before classifying between the low latency and Classic queues
935	   (based on ECT(1), CE and perhaps also the EF DSCP or other
936	   identifiers as in the above example).
937	   [I-D.briscoe-tsvwg-l4s-diffserv] gives a number of examples of such
938	   arrangements to address various requirements.

940	   [I-D.briscoe-tsvwg-l4s-diffserv] describes how an operator might use
941	   L4S to offer low latency for all L4S traffic as well as using
942	   Diffserv for bandwidth differentiation.  It identifies two main types
943	   of approach, which can be combined: the operator might split certain
944	   Diffserv PHBs between L4S and a corresponding Classic service.  Or it
945	   might split the L4S and/or the Classic service into multiple Diffserv
946	   PHBs.  In either of these cases, a packet would have to be classified
947	   on its Diffserv and ECN codepoints.

949	   In summary, there are numerous ways in which the L4S ECN identifier
950	   (ECT(1) and CE) could be combined with other identifiers to achieve
951	   particular objectives.  The following categorization articulates
952	   those that are valid, but it is not necessarily exhaustive.  Those
953	   tagged 'Recommended-standard-use' could be set by the sending host or
954	   a network.  Those tagged 'Local-use' would only be set by a network:

956	   1.  Identifiers Complementing the L4S Identifier

958	       A.  Including More Traffic in the L Queue
959	           (Could use Recommended-standard-use or Local-use identifiers)

961	       B.  Excluding Certain Traffic from the L Queue
962	           (Local-use only)

964	   2.  Identifiers to place L4S classification in a PHB Hierarchy
965	       (Could use Recommended-standard-use or Local-use identifiers)

967	       A.  PHBs Before L4S ECN Classification

969	       B.  PHBs After L4S ECN Classification

971	5.4.2.  Per-Flow Queuing Examples of Other Identifiers Complementing L4S
972	        Identifiers

974	   At a node with per-flow queueing (e.g. FQ-CoDel [RFC8290]), the L4S
975	   identifier could complement the Layer-4 flow ID as a further level of
976	   flow granularity (i.e. Not-ECT and ECT(0) queued separately from
977	   ECT(1) and CE packets).  "Risk of reordering Classic CE packets" in
978	   Appendix B.1 discusses the resulting ambiguity if packets originally
979	   marked ECT(0) are marked CE by an upstream AQM before they arrive at
980	   a node that classifies CE as L4S.  It argues that the risk of
981	   reordering is vanishingly small and the consequence of such a low
982	   level of reordering is minimal.

984	   Alternatively, it could be assumed that it is not in a flow's own
985	   interest to mix Classic and L4S identifiers.  Then the AQM could use
986	   the ECN field to switch itself between a Classic and an L4S AQM
987	   behaviour within one per-flow queue.  For instance, for ECN-capable
988	   packets, the AQM might consist of a simple marking threshold and an
989	   L4S ECN identifier might simply select a shallower threshold than a
990	   Classic ECN identifier would.
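
   The threshold-switching idea in the last paragraph can be sketched
   as below.  The two threshold values are invented for illustration
   and are not recommendations; the function name is hypothetical, and
   the treatment of Not-ECT packets (drop) is out of scope here.

```python
ECT_1 = 0b01
ECT_0 = 0b10

# Hypothetical per-flow-queue marking thresholds, in microseconds.
L4S_THRESHOLD_US = 1000
CLASSIC_THRESHOLD_US = 5000


def should_mark_ce(ecn, queue_delay_us):
    """Decide whether to mark an ECN-capable packet as CE."""
    if ecn == ECT_1:
        # The L4S identifier selects the shallower threshold.
        return queue_delay_us > L4S_THRESHOLD_US
    if ecn == ECT_0:
        return queue_delay_us > CLASSIC_THRESHOLD_US
    # Already CE, or not ECN-capable: nothing to mark in this sketch.
    return False
```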

992	5.5.  Limiting Packet Bursts from Links Supporting L4S AQMs

994	   As well as senders needing to limit packet bursts (Section 4.3),
995	   links need to limit the degree of burstiness they introduce.  In both
996	   cases (senders and links) this is a tradeoff, because batch-handling
997	   of packets is done for good reason, e.g. processing efficiency or to
998	   make efficient use of medium acquisition delay.  Some take the
999	   attitude that there is no point reducing burst delay at the sender
1000	   below that introduced by links (or vice versa).  However, delay
1001	   reduction proceeds by cutting down 'the longest pole in the tent',
1002	   which turns the spotlight on the next longest, and so on.

1004	   This document does not set any quantified requirements for links to
1005	   limit burst delay, primarily because link technologies are outside
1006	   the remit of L4S specifications.  Nonetheless, it would not make
1007	   sense to implement an L4S AQM that feeds into a particular link
1008	   technology without also reviewing opportunities to reduce any form of
1009	   burst delay introduced by that link technology.  This would at least
1010	   limit the bursts that the link would otherwise introduce into the
1011	   onward traffic, which would cause jumpy feedback to the sender as
1012	   well as potential extra queuing delay downstream.  This document does
1013	   not presume to even give guidance on an appropriate target for such
1014	   burst delay until there is more industry experience of L4S.  However,
1015	   as suggested in Section 4.3 it would not seem necessary to limit
1016	   bursts lower than roughly 10% of the minimum base RTT expected in the
1017	   typical deployment scenario (e.g. 250 us burst duration for links
1018	   within the public Internet).

1020	6.  L4S Experiments

1022	   This section describes open questions that L4S Experiments ought to
1023	   focus on.  This section also documents outstanding open issues that
1024	   will need to be investigated as part of L4S experimentation, given
1025	   they could not be fully resolved during the WG phase.  It also lists
1026	   metrics that will need to be monitored during experiments
1027	   (summarizing text elsewhere in L4S documents) and finally lists some
1028	   potential future directions that researchers might wish to
1029	   investigate.

1031	   In addition to this section, [I-D.ietf-tsvwg-aqm-dualq-coupled] sets
1032	   operational and management requirements for experiments with DualQ
1033	   Coupled AQMs; and general operational and management requirements for
1034	   experiments with L4S congestion controls are given in Section 4 and
1035	   Section 5 above, e.g. co-existence and scaling requirements,
1036	   incremental deployment arrangements.

1038	   The specification of each scalable congestion control will need to
1039	   include protocol-specific requirements for configuration and
1040	   monitoring performance during experiments.  Appendix A of [RFC5706]
1041	   provides a helpful checklist.

1043	6.1.  Open Questions

1045	   L4S experiments would be expected to answer the following questions:

1047	   o  Have all the parts of L4S been deployed, and if so, what
1048	      proportion of paths support it?

1050	   o  Does use of L4S over the Internet result in significantly improved
1051	      user experience?

1053	   o  Has L4S enabled novel interactive applications?

1055	   o  Did use of L4S over the Internet result in improvements to the
1056	      following metrics:

1060	      *  queue delay (mean and 99th percentile) under various loads

1062	      *  utilization

1064	      *  starvation / fairness

1066	      *  scaling range of flow rates and RTTs

1068	   o  How much does burstiness in the Internet affect L4S performance,
1069	      and how much limitation of burstiness was needed and/or was
1070	      realized - both at senders and at links, especially radio links?

1072	   o  Was per-flow queue protection typically (un)necessary?

1074	      *  How well did overload protection or queue protection work?

1076	   o  How well did L4S flows coexist with Classic flows when sharing a
1077	      bottleneck?

1081	      *  How frequently did problems arise?

1083	      *  What caused any coexistence problems, and were any problems due
1084	         to single-queue Classic ECN AQMs (this assumes single-queue
1085	         Classic ECN AQMs can be distinguished from FQ ones)?

1087	   o  How prevalent were problems with the L4S service due to tunnels /
1088	      encapsulations that do not support ECN decapsulation?

1090	   o  How easy was it to implement a fully compliant L4S congestion
1091	      control, over various different transport protocols (TCP, QUIC,
1092	      RMCAT, etc)?
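
   As an illustration of how the queue delay metrics in the list above
   might be summarized during an experiment, the following Python sketch
   computes the mean and a nearest-rank 99th percentile from a set of
   per-packet queue delay samples.  The function name, the choice of
   the nearest-rank method and the sample values are assumptions made
   for this example; they are not specified by any L4S document.

```python
from statistics import mean


def delay_summary(delays_ms, pct=99):
    """Return (mean, pct-th percentile) of queue delay samples.

    Uses the nearest-rank percentile definition (an assumption; other
    definitions interpolate between samples).
    """
    if not delays_ms:
        raise ValueError("no delay samples")
    ordered = sorted(delays_ms)
    # Nearest rank: smallest rank r such that r/n >= pct/100,
    # computed with integer arithmetic as ceil(pct * n / 100).
    rank = max(1, -(-pct * len(ordered) // 100))
    return mean(ordered), ordered[rank - 1]


# Hypothetical samples: 1 ms, 2 ms, ..., 100 ms.
avg, p99 = delay_summary(range(1, 101))
```

   Reporting a high percentile alongside the mean matters because
   interactive applications tend to be sensitive to the tail of the
   delay distribution, not only its average.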

1094	   Monitoring for harm to other traffic, specifically bandwidth
1095	   starvation or excess queuing delay, will need to be conducted
1096	   alongside all early L4S experiments.  It is hard, if not impossible,
1097	   for an individual flow to measure its impact on other traffic.  So
1098	   such monitoring will need to be conducted using bespoke monitoring
1099	   across flows and/or across classes of traffic.

1101	6.2.  Open Issues

1103	   o  What is the best way forward to deal with L4S over single-queue
1104	      Classic ECN AQM bottlenecks, given current problems with
1105	      misdetecting L4S AQMs as Classic ECN AQMs?

1107	   o  Fixing the poor interaction between current L4S congestion
1108	      controls and CoDel with only Classic ECN support during flow
1109	      startup

1111	6.3.  Future Potential

1113	   Researchers might find that L4S opens up the following interesting
1114	   areas for investigation:

1116	   o  Potential for faster convergence time and tracking of available
1117	      capacity

1119	   o  Potential for improvements to particular link technologies, and
1120	      cross-layer interactions with them.

1122	   o  Potential for using virtual queues, e.g. to further reduce latency
1123	      jitter, or to leave headroom for capacity variation in radio
1124	      networks

1126	   o  Development and specification of reverse path congestion control
1127	      using L4S building blocks (e.g. AccECN, QUIC)

1129	   o  Once queuing delay is cut down, what becomes the 'second longest
1130	      pole in the tent' (other than the speed of light)?

1132	   o  Novel alternatives to the existing set of L4S AQMs

1134	   o  Novel applications enabled by L4S

1136	7.  IANA Considerations

1138	   The 01 codepoint of the ECN Field of the IP header is specified by
1139	   the present Experimental RFC.  The process for an experimental RFC to
1140	   assign this codepoint in the IP header (v4 and v6) is documented in
1141	   Proposed Standard [RFC8311], which updates the Proposed Standard
1142	   [RFC3168].

1144	   When the present document is published as an RFC, IANA is asked to
1145	   update the 01 entry in the registry, "ECN Field (Bits 6-7)" to the
1146	   following (see https://www.iana.org/assignments/dscp-registry/dscp-
1147	   registry.xhtml#ecn-field ):

1149	   +--------+-----------------------------+----------------------------+
1150	   | Binary | Keyword                     | References                 |
1151	   +--------+-----------------------------+----------------------------+
1152	   | 01     | ECT(1) (ECN-Capable         | [RFC8311]                  |
1153	   |        | Transport(1))[1]            | [RFC Errata 5399]          |
1154	   |        |                             | [RFCXXXX]                  |
1155	   +--------+-----------------------------+----------------------------+

1157	   [XXXX is the number that the RFC Editor assigns to the present
1158	   document (this sentence to be removed by the RFC Editor)].

1160	8.  Security Considerations

1162	   Approaches to assure the integrity of signals using the new
1163	   identifier are introduced in Appendix C.1.  See the security
1164	   considerations in the L4S architecture [I-D.ietf-tsvwg-l4s-arch] for
1165	   further discussion of mis-use of the identifier, as well as extensive
1166	   discussion of policing rate and latency in regard to L4S.

1168	   The recommendation to detect loss in time units prevents the ACK-
1169	   splitting attacks described in [Savage-TCP].

1171	9.  Acknowledgements

1173	   Thanks to Richard Scheffenegger, John Leslie, David Taeht, Jonathan
1174	   Morton, Gorry Fairhurst, Michael Welzl, Mikael Abrahamsson and Andrew
1175	   McGregor for the discussions that led to this specification.  Ing-jyh
1176	   (Inton) Tsang was a contributor to the early drafts of this document.
1177	   And thanks to Mikael Abrahamsson, Lloyd Wood, Nicolas Kuhn, Greg
1178	   White, Tom Henderson, David Black, Gorry Fairhurst, Brian Carpenter,
1179	   Jake Holland, Rod Grimes and Richard Scheffenegger for providing help
1180	   and reviewing this draft and to Ingemar Johansson for reviewing and
1181	   providing substantial text.  Particular thanks to Wes Eddy for
1182	   patiently shepherding this and the other L4S drafts through the IETF
1183	   process.  Appendix A listing the Prague L4S Requirements is based on
1184	   text authored by Marcelo Bagnulo Braun that was originally an
1185	   appendix to [I-D.ietf-tsvwg-l4s-arch].  That text was in turn based
1186	   on the collective output of the attendees listed in the minutes of a
1187	   'bar BoF' on DCTCP Evolution during IETF-94 [TCPPrague].

1189	   The authors' contributions were part-funded by the European Community
1190	   under its Seventh Framework Programme through the Reducing Internet
1191	   Transport Latency (RITE) project (ICT-317700).  Bob Briscoe was also
1192	   funded partly by the Research Council of Norway through the TimeIn
1193	   project, partly by CableLabs and partly by the Comcast Innovation
1194	   Fund.  The views expressed here are solely those of the authors.

1196	10.  References

1198	10.1.  Normative References

1200	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1201	              Requirement Levels", BCP 14, RFC 2119,
1202	              DOI 10.17487/RFC2119, March 1997,
1203	              <https://www.rfc-editor.org/info/rfc2119>.

1205	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
1206	              of Explicit Congestion Notification (ECN) to IP",
1207	              RFC 3168, DOI 10.17487/RFC3168, September 2001,
1208	              <https://www.rfc-editor.org/info/rfc3168>.

1210	   [RFC4774]  Floyd, S., "Specifying Alternate Semantics for the
1211	              Explicit Congestion Notification (ECN) Field", BCP 124,
1212	              RFC 4774, DOI 10.17487/RFC4774, November 2006,
1213	              <https://www.rfc-editor.org/info/rfc4774>.

1215	   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
1216	              and K. Carlberg, "Explicit Congestion Notification (ECN)
1217	              for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August
1218	              2012, <https://www.rfc-editor.org/info/rfc6679>.

1220	10.2.  Informative References

1222	   [A2DTCP]   Zhang, T., Wang, J., Huang, J., Huang, Y., Chen, J., and
1223	              Y. Pan, "Adaptive-Acceleration Data Center TCP", IEEE
1224	              Transactions on Computers 64(6):1522-1533, June 2015,
1225	              <http://ieeexplore.ieee.org/xpl/
1226	              articleDetails.jsp?arnumber=6871352>.

1228	   [Ahmed19]  Ahmed, A., "Extending TCP for Low Round Trip Delay",
1229	              Masters Thesis, Uni Oslo , August 2019,
1230	              <https://www.duo.uio.no/handle/10852/70966>.

1232	   [Alizadeh-stability]
1233	              Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis
1234	              of DCTCP: Stability, Convergence, and Fairness", ACM
1235	              SIGMETRICS 2011 , June 2011.

1237	   [ARED01]   Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An
1238	              Algorithm for Increasing the Robustness of RED's Active
1239	              Queue Management", ACIRI Technical Report , August 2001,
1240	              <http://www.icir.org/floyd/red.html>.

1242	   [DCttH15]  De Schepper, K., Bondarenko, O., Briscoe, B., and I.
1243	              Tsang, "'Data Centre to the Home': Ultra-Low Latency for
1244	              All", RITE Project Technical Report , 2015,
1245	              <http://riteproject.eu/publications/>.

1247	   [ecn-fallback]
1248	              Briscoe, B. and A. Ahmed, "TCP Prague Fall-back on
1249	              Detection of a Classic ECN AQM", bobbriscoe.net Technical
1250	              Report TR-BB-2019-002, April 2020,
1251	              <https://arxiv.org/abs/1911.00710>.

1253	   [I-D.briscoe-docsis-q-protection]
1254	              Briscoe, B. and G. White, "Queue Protection to Preserve
1255	              Low Latency", draft-briscoe-docsis-q-protection-00 (work
1256	              in progress), July 2019.

1258	   [I-D.briscoe-tsvwg-l4s-diffserv]
1259	              Briscoe, B., "Interactions between Low Latency, Low Loss,
1260	              Scalable Throughput (L4S) and Differentiated Services",
1261	              draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress),
1262	              November 2018.

1264	   [I-D.ietf-avtcore-cc-feedback-message]
1265	              Sarker, Z., Perkins, C., Singh, V., and M. Ramalho, "RTP
1266	              Control Protocol (RTCP) Feedback for Congestion Control",
1267	              draft-ietf-avtcore-cc-feedback-message-09 (work in
1268	              progress), November 2020.

1270	   [I-D.ietf-quic-transport]
1271	              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
1272	              and Secure Transport", draft-ietf-quic-transport-34 (work
1273	              in progress), January 2021.

1275	   [I-D.ietf-tcpm-accurate-ecn]
1276	              Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More
1277	              Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate-
1278	              ecn-13 (work in progress), November 2020.

1280	   [I-D.ietf-tcpm-generalized-ecn]
1281	              Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit
1282	              Congestion Notification (ECN) to TCP Control Packets",
1283	              draft-ietf-tcpm-generalized-ecn-06 (work in progress),
1284	              October 2020.

1286	   [I-D.ietf-tcpm-rack]
1287	              Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The
1288	              RACK-TLP loss detection algorithm for TCP", draft-ietf-
1289	              tcpm-rack-15 (work in progress), December 2020.

1291	   [I-D.ietf-tsvwg-aqm-dualq-coupled]
1292	              Schepper, K., Briscoe, B., and G. White, "DualQ Coupled
1293	              AQMs for Low Latency, Low Loss and Scalable Throughput
1294	              (L4S)", draft-ietf-tsvwg-aqm-dualq-coupled-13 (work in
1295	              progress), November 2020.

1297	   [I-D.ietf-tsvwg-ecn-encap-guidelines]
1298	              Briscoe, B. and J. Kaippallimalil, "Guidelines for Adding
1299	              Congestion Notification to Protocols that Encapsulate IP",
1300	              draft-ietf-tsvwg-ecn-encap-guidelines-14 (work in
1301	              progress), November 2020.

1303	   [I-D.ietf-tsvwg-l4s-arch]
1304	              Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low
1305	              Latency, Low Loss, Scalable Throughput (L4S) Internet
1306	              Service: Architecture", draft-ietf-tsvwg-l4s-arch-08 (work
1307	              in progress), November 2020.

1309	   [I-D.ietf-tsvwg-nqb]
1310	              White, G. and T. Fossati, "A Non-Queue-Building Per-Hop
1311	              Behavior (NQB PHB) for Differentiated Services", draft-
1312	              ietf-tsvwg-nqb-03 (work in progress), November 2020.

1314	   [I-D.ietf-tsvwg-rfc6040update-shim]
1315	              Briscoe, B., "Propagating Explicit Congestion Notification
1316	              Across IP Tunnel Headers Separated by a Shim", draft-ietf-
1317	              tsvwg-rfc6040update-shim-12 (work in progress), November
1318	              2020.

1320	   [I-D.morton-tsvwg-sce]
1321	              Morton, J., Heist, P., and R. Grimes, "The Some Congestion
1322	              Experienced ECN Codepoint", draft-morton-tsvwg-sce-02
1323	              (work in progress), November 2020.

1325	   [I-D.sridharan-tcpm-ctcp]
1326	              Sridharan, M., Tan, K., Bansal, D., and D. Thaler,
1327	              "Compound TCP: A New TCP Congestion Control for High-Speed
1328	              and Long Distance Networks", draft-sridharan-tcpm-ctcp-02
1329	              (work in progress), November 2008.

1331	   [I-D.stewart-tsvwg-sctpecn]
1332	              Stewart, R., Tuexen, M., and X. Dong, "ECN for Stream
1333	              Control Transmission Protocol (SCTP)", draft-stewart-
1334	              tsvwg-sctpecn-05 (work in progress), January 2014.

1336	   [LinuxPacedChirping]
1337	              Misund, J. and B. Briscoe, "Paced Chirping - Rethinking
1338	              TCP start-up", Proc. Linux Netdev 0x13 , March 2019,
1339	              <https://www.netdevconf.org/0x13/session.html?talk-chirp>.

1341	   [Mathis09]
1342	              Mathis, M., "Relentless Congestion Control", PFLDNeT'09 ,
1343	              May 2009, <http://www.hpcc.jp/pfldnet2009/
1344	              Program_files/1569198525.pdf>.

1346	   [Paced-Chirping]
1347	              Misund, J., "Rapid Acceleration in TCP Prague", Masters
1348	              Thesis , May 2018,
1349	              <https://riteproject.files.wordpress.com/2018/07/
1350	              misundjoakimmastersthesissubmitted180515.pdf>.

1352	   [PI2]      De Schepper, K., Bondarenko, O., Tsang, I., and B.
1353	              Briscoe, "PI^2 : A Linearized AQM for both Classic and
1354	              Scalable TCP", Proc. ACM CoNEXT 2016 pp.105-119, December
1355	              2016,
1356	              <http://dl.acm.org/citation.cfm?doid=2999572.2999578>.

1358	   [PragueLinux]
1359	              Briscoe, B., De Schepper, K., Albisser, O., Misund, J.,
1360	              Tilmans, O., Kuehlewind, M., and A. Ahmed, "Implementing
1361	              the `TCP Prague' Requirements for Low Latency Low Loss
1362	              Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 ,
1363	              March 2019, <https://www.netdevconf.org/0x13/
1364	              session.html?talk-tcp-prague-l4s>.

1366	   [QV]       Briscoe, B. and P. Hurtig, "Up to Speed with Queue View",
1367	              RITE Technical Report D2.3; Appendix C.2, August 2015,
1368	              <https://riteproject.files.wordpress.com/2015/12/rite-
1369	              deliverable-2-3.pdf>.

1371	   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
1372	              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
1373	              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
1374	              S., Wroclawski, J., and L. Zhang, "Recommendations on
1375	              Queue Management and Congestion Avoidance in the
1376	              Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998,
1377	              <https://www.rfc-editor.org/info/rfc2309>.

1379	   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
1380	              "Definition of the Differentiated Services Field (DS
1381	              Field) in the IPv4 and IPv6 Headers", RFC 2474,
1382	              DOI 10.17487/RFC2474, December 1998,
1383	              <https://www.rfc-editor.org/info/rfc2474>.

1385	   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
1386	              RFC 2983, DOI 10.17487/RFC2983, October 2000,
1387	              <https://www.rfc-editor.org/info/rfc2983>.

1389	   [RFC3246]  Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec,
1390	              J., Courtney, W., Davari, S., Firoiu, V., and D.
1391	              Stiliadis, "An Expedited Forwarding PHB (Per-Hop
1392	              Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002,
1393	              <https://www.rfc-editor.org/info/rfc3246>.

1395	   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
1396	              Congestion Notification (ECN) Signaling with Nonces",
1397	              RFC 3540, DOI 10.17487/RFC3540, June 2003,
1398	              <https://www.rfc-editor.org/info/rfc3540>.

1400	   [RFC3649]  Floyd, S., "HighSpeed TCP for Large Congestion Windows",
1401	              RFC 3649, DOI 10.17487/RFC3649, December 2003,
1402	              <https://www.rfc-editor.org/info/rfc3649>.

1404	   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram
1405	              Congestion Control Protocol (DCCP)", RFC 4340,
1406	              DOI 10.17487/RFC4340, March 2006,
1407	              <https://www.rfc-editor.org/info/rfc4340>.

1409	   [RFC4341]  Floyd, S. and E. Kohler, "Profile for Datagram Congestion
1410	              Control Protocol (DCCP) Congestion Control ID 2: TCP-like
1411	              Congestion Control", RFC 4341, DOI 10.17487/RFC4341, March
1412	              2006, <https://www.rfc-editor.org/info/rfc4341>.

1414	   [RFC4342]  Floyd, S., Kohler, E., and J. Padhye, "Profile for
1415	              Datagram Congestion Control Protocol (DCCP) Congestion
1416	              Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
1417	              DOI 10.17487/RFC4342, March 2006,
1418	              <https://www.rfc-editor.org/info/rfc4342>.

1420	   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol",
1421	              RFC 4960, DOI 10.17487/RFC4960, September 2007,
1422	              <https://www.rfc-editor.org/info/rfc4960>.

1424	   [RFC5033]  Floyd, S. and M. Allman, "Specifying New Congestion
1425	              Control Algorithms", BCP 133, RFC 5033,
1426	              DOI 10.17487/RFC5033, August 2007,
1427	              <https://www.rfc-editor.org/info/rfc5033>.

1429	   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
1430	              Friendly Rate Control (TFRC): Protocol Specification",
1431	              RFC 5348, DOI 10.17487/RFC5348, September 2008,
1432	              <https://www.rfc-editor.org/info/rfc5348>.

1434	   [RFC5562]  Kuzmanovic, A., Mondal, A., Floyd, S., and K.
1435	              Ramakrishnan, "Adding Explicit Congestion Notification
1436	              (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
1437	              DOI 10.17487/RFC5562, June 2009,
1438	              <https://www.rfc-editor.org/info/rfc5562>.

1440	   [RFC5622]  Floyd, S. and E. Kohler, "Profile for Datagram Congestion
1441	              Control Protocol (DCCP) Congestion ID 4: TCP-Friendly Rate
1442	              Control for Small Packets (TFRC-SP)", RFC 5622,
1443	              DOI 10.17487/RFC5622, August 2009,
1444	              <https://www.rfc-editor.org/info/rfc5622>.

1446	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
1447	              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
1448	              <https://www.rfc-editor.org/info/rfc5681>.

1450	   [RFC5706]  Harrington, D., "Guidelines for Considering Operations and
1451	              Management of New Protocols and Protocol Extensions",
1452	              RFC 5706, DOI 10.17487/RFC5706, November 2009,
1453	              <https://www.rfc-editor.org/info/rfc5706>.

1455	   [RFC5865]  Baker, F., Polk, J., and M. Dolly, "A Differentiated
1456	              Services Code Point (DSCP) for Capacity-Admitted Traffic",
1457	              RFC 5865, DOI 10.17487/RFC5865, May 2010,
1458	              <https://www.rfc-editor.org/info/rfc5865>.

1460	   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
1461	              Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
1462	              June 2010, <https://www.rfc-editor.org/info/rfc5925>.

1464	   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
1465	              Notification", RFC 6040, DOI 10.17487/RFC6040, November
1466	              2010, <https://www.rfc-editor.org/info/rfc6040>.

1468	   [RFC6077]  Papadimitriou, D., Ed., Welzl, M., Scharf, M., and B.
1469	              Briscoe, "Open Research Issues in Internet Congestion
1470	              Control", RFC 6077, DOI 10.17487/RFC6077, February 2011,
1471	              <https://www.rfc-editor.org/info/rfc6077>.

1473	   [RFC6660]  Briscoe, B., Moncaster, T., and M. Menth, "Encoding Three
1474	              Pre-Congestion Notification (PCN) States in the IP Header
1475	              Using a Single Diffserv Codepoint (DSCP)", RFC 6660,
1476	              DOI 10.17487/RFC6660, July 2012,
1477	              <https://www.rfc-editor.org/info/rfc6660>.

1479	   [RFC7560]  Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe,
1480	              "Problem Statement and Requirements for Increased Accuracy
1481	              in Explicit Congestion Notification (ECN) Feedback",
1482	              RFC 7560, DOI 10.17487/RFC7560, August 2015,
1483	              <https://www.rfc-editor.org/info/rfc7560>.

1485	   [RFC7567]  Baker, F., Ed. and G. Fairhurst, Ed., "IETF
1486	              Recommendations Regarding Active Queue Management",
1487	              BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015,
1488	              <https://www.rfc-editor.org/info/rfc7567>.

1490	   [RFC7713]  Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx)
1491	              Concepts, Abstract Mechanism, and Requirements", RFC 7713,
1492	              DOI 10.17487/RFC7713, December 2015,
1493	              <https://www.rfc-editor.org/info/rfc7713>.

1495	   [RFC8033]  Pan, R., Natarajan, P., Baker, F., and G. White,
1496	              "Proportional Integral Controller Enhanced (PIE): A
1497	              Lightweight Control Scheme to Address the Bufferbloat
1498	              Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017,
1499	              <https://www.rfc-editor.org/info/rfc8033>.

1501	   [RFC8257]  Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
1502	              and G. Judd, "Data Center TCP (DCTCP): TCP Congestion
1503	              Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257,
1504	              October 2017, <https://www.rfc-editor.org/info/rfc8257>.

1506	   [RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
1507	              J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler
1508	              and Active Queue Management Algorithm", RFC 8290,
1509	              DOI 10.17487/RFC8290, January 2018,
1510	              <https://www.rfc-editor.org/info/rfc8290>.

1512	   [RFC8298]  Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation
1513	              for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December
1514	              2017, <https://www.rfc-editor.org/info/rfc8298>.

1516	   [RFC8311]  Black, D., "Relaxing Restrictions on Explicit Congestion
1517	              Notification (ECN) Experimentation", RFC 8311,
1518	              DOI 10.17487/RFC8311, January 2018,
1519	              <https://www.rfc-editor.org/info/rfc8311>.

1521	   [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
1522	              R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
1523	              RFC 8312, DOI 10.17487/RFC8312, February 2018,
1524	              <https://www.rfc-editor.org/info/rfc8312>.

1526	   [RFC8511]  Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst,
1527	              "TCP Alternative Backoff with ECN (ABE)", RFC 8511,
1528	              DOI 10.17487/RFC8511, December 2018,
1529	              <https://www.rfc-editor.org/info/rfc8511>.

1531	   [Savage-TCP]
1532	              Savage, S., Cardwell, N., Wetherall, D., and T. Anderson,
1533	              "TCP Congestion Control with a Misbehaving Receiver", ACM
1534	              SIGCOMM Computer Communication Review 29(5):71--78,
1535	              October 1999.

1537	   [sub-mss-prob]
1538	              Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion
1539	              Window for Small Round Trip Times", BT Technical Report
1540	              TR-TUB8-2015-002, May 2015,
1541	              <https://arxiv.org/abs/1904.07598>.

1543	   [TCP-CA]   Jacobson, V. and M. Karels, "Congestion Avoidance and
1544	              Control", Lawrence Berkeley Labs Technical Report ,
1545	              November 1988, <http://ee.lbl.gov/papers/congavoid.pdf>.

1547	   [TCPPrague]
1548	              Briscoe, B., "Notes: DCTCP evolution 'bar BoF': Tue 21 Jul
1549	              2015, 17:40, Prague", tcpprague mailing list archive ,
1550	              July 2015, <https://www.ietf.org/mail-
1551	              archive/web/tcpprague/current/msg00001.html>.

1553	   [VCP]      Xia, Y., Subramanian, L., Stoica, I., and S. Kalyanaraman,
1554	              "One more bit is enough", Proc. SIGCOMM'05, ACM CCR
1555	              35(4):37--48, 2005,
1556	              <http://doi.acm.org/10.1145/1080091.1080098>.

1558	Appendix A.  The 'Prague L4S Requirements'

1560	   This appendix is informative, not normative.  It gives a list of
1561	   modifications to current scalable congestion controls so that they
1562	   can be deployed over the public Internet and coexist safely with
1563	   existing traffic.  The list complements the normative requirements in
1564	   Section 4 that a sender has to comply with before it can set the L4S
1565	   identifier in packets it sends into the Internet.  As well as
1566	   necessary safety improvements (requirements) this appendix also
1567	   includes preferable performance improvements (optimizations).

1569	   These recommendations have become known as the Prague L4S
1570	   Requirements, because they were originally identified at an ad hoc
1571	   meeting during IETF-94 in Prague [TCPPrague].  They were originally
1572	   called the 'TCP Prague Requirements', but they are not solely
1573	   applicable to TCP, so the name and wording have been generalized for
1574	   all transport protocols, and the name 'TCP Prague' is now used for a
1575	   specific implementation of the requirements.

1577	   At the time of writing, DCTCP [RFC8257] is the most widely used
1578	   scalable transport protocol.  In its current form, DCTCP is specified
1579	   to be deployable only in controlled environments.  Deploying it in
1580	   the public Internet would lead to a number of issues, both from the
1581	   safety and the performance perspective.  The modifications and
1582	   additional mechanisms listed in this section will be necessary for
1583	   its deployment over the global Internet.  Where an example is needed,
1584	   DCTCP is used as a base, but it is likely that most of these
1585	   requirements equally apply to other scalable congestion controls,
1586	   covering adaptive real-time media, etc., not just capacity-seeking
1587	   behaviours.

1589	A.1.  Requirements for Scalable Transport Protocols

1591	A.1.1.  Use of L4S Packet Identifier

1593	   Description: A scalable congestion control needs to distinguish the
1594	   packets it sends from those sent by Classic congestion controls (see
1595	   the precise normative requirement wording in Section 4.1).

1597	   Motivation: It needs to be possible for a network node to classify
1598	   L4S packets without flow state into a queue that applies an L4S ECN
1599	   marking behaviour and isolates L4S packets from the queuing delay of
1600	   Classic packets.
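
   A minimal sketch of such a stateless classifier is given below in
   Python.  It assumes the ECN field is taken from the two least
   significant bits of the IPv4 Type of Service or IPv6 Traffic Class
   octet [RFC3168], and that CE packets are classified together with
   ECT(1) packets by default.  The function names and the 'L4S' and
   'Classic' labels are hypothetical, chosen only for this example; a
   real node could be overridden by other classifiers.

```python
# ECN codepoints carried in the two least-significant bits of the
# IPv4 TOS / IPv6 Traffic Class octet (RFC 3168).
NOT_ECT = 0b00
ECT_1 = 0b01  # the codepoint used as the L4S identifier
ECT_0 = 0b10
CE = 0b11


def ecn_field(tos_octet: int) -> int:
    """Extract the 2-bit ECN field, ignoring the 6 Diffserv bits."""
    return tos_octet & 0b11


def classify(tos_octet: int) -> str:
    """Pick a queue from the ECN field alone, with no per-flow state.

    Assumption for this sketch: ECT(1) and CE both select the L4S
    queue; everything else selects the Classic queue.
    """
    if ecn_field(tos_octet) in (ECT_1, CE):
        return "L4S"
    return "Classic"
```

   Because the decision depends only on bits in each packet's own
   header, the classifier needs no memory of earlier packets in the
   flow.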

1602	A.1.2.  Accurate ECN Feedback

1604	   Description: The transport protocol for a scalable congestion control
1605	   needs to provide timely, accurate feedback about the extent of ECN
1606	   marking experienced by all packets (see the precise normative
1607	   requirement wording in Section 4.2).

1609	   Motivation: Classic congestion controls only need feedback about the
1610	   existence of a congestion episode within a round trip, not precisely
1611	   how many packets were marked with ECN or dropped.  Therefore, in
1612	   2001, when ECN feedback was added to TCP [RFC3168], it could not
1613	   inform the sender of more than one ECN mark per RTT.  Since then,
1614	   requirements for more accurate ECN feedback in TCP have been defined
1615	   in [RFC7560] and [I-D.ietf-tcpm-accurate-ecn] specifies an
1616	   experimental change to the TCP wire protocol to satisfy these
1617	   requirements.  Most other transport protocols already satisfy this
1618	   requirement (see Section 4.2).
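
   To illustrate why a count of marks is needed rather than a single
   indication per round trip, the following Python sketch shows the
   style of calculation that DCTCP [RFC8257] performs: it maintains a
   moving average of the fraction of marked packets.  The function
   name, the per-round packet counts and the gain of 1/16 are
   assumptions for this example, not requirements of this document.

```python
def update_alpha(alpha, marked, acked, g=1 / 16):
    """Update a moving average of the ECN marking fraction.

    alpha  -- previous estimate of the marking fraction (0.0 to 1.0)
    marked -- packets reported as CE-marked in the last round trip
    acked  -- packets acknowledged in the last round trip
    g      -- averaging gain (example value, an assumption here)
    """
    if acked == 0:
        return alpha  # no feedback this round, keep the old estimate
    fraction = marked / acked
    return (1 - g) * alpha + g * fraction


# Hypothetical round trip: 8 of 16 acknowledged packets were marked.
alpha = update_alpha(0.0, marked=8, acked=16)
```

   With feedback limited to one mark per RTT, 'marked' could never
   exceed 1, so the sender could not distinguish light marking from
   heavy marking.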

1620	A.1.3.  Fall back to Reno-friendly congestion control on packet loss

1622	   Description: As well as responding to ECN markings in a scalable way,
1623	   a scalable congestion control needs to react to packet loss in a way
1624	   that will coexist safely with a TCP Reno congestion control [RFC5681]
1625	   (see the precise normative requirement wording in Section 4.3).

1627	   Motivation: Part of the safety conditions for deploying a scalable
1628	   congestion control on the public Internet is to make sure that it
1629	   behaves properly when it builds a queue at a network bottleneck that
1630	   has not been upgraded to support L4S.  Packet loss can have many
1631	   causes, but it usually has to be conservatively assumed that it is a
1632	   sign of congestion.  Therefore, on detecting packet loss, a scalable
1633	   congestion control will need to fall back to Classic congestion
1634	   control behaviour.  If it does not comply with this requirement it
1635	   could starve Classic traffic.

1637	   A scalable congestion control can be used for different types of
1638	   transport, e.g. for real-time media or for reliable transport like
1639	   TCP.  Therefore, the particular Classic congestion control behaviour
1640	   to fall back on will need to be part of the congestion control
1641	   specification of the relevant transport.  In the particular case of
1642	   DCTCP, the DCTCP specification [RFC8257] states that "It is
1643	   RECOMMENDED that an implementation deal with loss episodes in the
1644	   same way as conventional TCP."  For safe deployment of a scalable
1645	   congestion control in the public Internet, the above requirement
1646	   would need to be defined as a "MUST".

1648	   Even though a bottleneck is L4S capable, it might still become
1649	   overloaded and have to drop packets.  In this case, the sender may
1650	   receive a high proportion of packets marked with the CE bit set and
1651	   also experience loss.  Current DCTCP implementations each react
1652	   differently to this situation.  At least one implementation reacts
1653	   only to the drop signal (e.g. by halving the CWND) and at least
1654	   another DCTCP implementation reacts to both signals (e.g. by halving
1655	   the CWND due to the drop and also further reducing the CWND based on
1656	   the proportion of marked packets).  A third approach for the public
1657	   Internet has been proposed that adjusts the loss response to result
1658	   in a halving when combined with the ECN response.  We believe that
1659	   further experimentation is needed to understand what is the best
1660	   behaviour for the public Internet, which may or may not be one of
1661	   these existing approaches.
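
   The three approaches described above can be compared with the
   following Python sketch.  It assumes a DCTCP-style ECN response that
   multiplies the congestion window by (1 - alpha/2), where alpha is
   the smoothed fraction of marked packets, and it assumes the ECN and
   loss reductions are applied multiplicatively.  The function names
   are hypothetical and no existing implementation is claimed to use
   exactly these formulae.

```python
def loss_only(cwnd, alpha):
    """React only to the drop: halve the window, ignore the marks."""
    return cwnd * 0.5


def both_signals(cwnd, alpha):
    """React to both: halve for the drop, then reduce again for marks."""
    return cwnd * 0.5 * (1 - alpha / 2)


def combined_halving(cwnd, alpha):
    """Scale the loss response so ECN plus loss together give a halving."""
    after_ecn = cwnd * (1 - alpha / 2)
    # alpha is at most 1, so (1 - alpha/2) is at least 0.5 and the
    # loss factor never exceeds 1 (it never increases the window).
    loss_factor = 0.5 / (1 - alpha / 2)
    return after_ecn * loss_factor


# Hypothetical state: window of 100 packets, 40% of packets marked.
results = [f(100.0, 0.4) for f in (loss_only, both_signals, combined_halving)]
```

   Under these assumptions the second approach always ends up with a
   window no larger than the other two, which is one reason the best
   behaviour remains a subject for experimentation.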

1663	A.1.4.  Fall back to Reno-friendly congestion control on Classic ECN
1664	        bottlenecks

1666	   Description: A scalable congestion control needs to react to ECN
1667	   marking from a non-L4S, but ECN-capable, bottleneck in a way that
1668	   will coexist with a TCP Reno congestion control [RFC5681] (see the
1669	   precise normative requirement wording in Section 4.3).

1671	   Motivation: Similarly to the requirement in Appendix A.1.3, this
1672	   requirement is a safety condition to ensure a scalable congestion
1673	   control behaves properly when it builds a queue at a network
1674	   bottleneck that has not been upgraded to support L4S.  On detecting
1675	   Classic ECN marking (see below), a scalable congestion control will
1676	   need to fall back to Classic congestion control behaviour.  If it
1677	   does not comply with this requirement it could starve Classic
1678	   traffic.

1680	   A passive monitoring algorithm to detect a Classic ECN AQM at the
1681	   bottleneck is provided in [ecn-fallback], which also provides a link
1682	   to Linux source code.  Very briefly, the algorithm primarily monitors
1683	   RTT variation using the same algorithm that maintains the mean
1684	   deviation of TCP's smoothed RTT, but it smooths over a duration of
1685	   the order of a Classic sawtooth.  The outcome is also conditioned on
1686	   other metrics such as the presence of CE marking and congestion
1687	   avoidance phase having stabilized.  The report also identifies
1688	   further work to improve the approach, for instance improvements with
1689	   low capacity links and combining the measurements with a cache of
1690	   what had been learned about a path in previous connections.
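
           As a rough illustration of the kind of estimator referred to
           above, the following Python sketch maintains the mean
           deviation of the RTT in the style of TCP's retransmission
           timer calculation, but with smaller gains so that it smooths
           over a longer duration.  It is not the algorithm of
           [ecn-fallback]; the class name, the gains of 1/64 and 1/32
           and the two RTT traces are assumptions chosen purely for
           illustration.

```python
class RttDeviation:
    """EWMA of the RTT and of its mean deviation (RFC 6298 style)."""

    def __init__(self, alpha=1 / 8, beta=1 / 4):
        self.alpha = alpha    # gain for the smoothed RTT
        self.beta = beta      # gain for the mean deviation
        self.srtt = None
        self.rttvar = None

    def update(self, rtt_ms):
        if self.srtt is None:
            # First sample: initialise as RFC 6298 does.
            self.srtt = rtt_ms
            self.rttvar = rtt_ms / 2
        else:
            # The deviation is updated before the smoothed RTT.
            self.rttvar = ((1 - self.beta) * self.rttvar
                           + self.beta * abs(self.srtt - rtt_ms))
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt_ms
        return self.rttvar


def final_deviation(samples, alpha=1 / 64, beta=1 / 32):
    """Feed all samples to a slow estimator; return the last deviation."""
    est = RttDeviation(alpha, beta)
    for rtt_ms in samples:
        est.update(rtt_ms)
    return est.rttvar


# Hypothetical traces: a steady 20 ms RTT, and a sawtooth that ramps
# from 20 ms to 39 ms and then collapses, repeated ten times.
steady = [20.0] * 200
sawtooth = [20.0 + (i % 20) for i in range(200)]

steady_dev = final_deviation(steady)
sawtooth_dev = final_deviation(sawtooth)
```

           With these traces the deviation decays towards zero for the
           steady RTT but remains well above zero for the sawtooth,
           which is the sort of contrast a detector could condition on.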

1692	   The relevant normative requirement (Section 4.3) is expressed as a
1693	   'SHOULD' to allow the possibility that the operator of the host knows
1694	   that the network it serves has not deployed any single-queue Classic
1695	   ECN AQM (e.g. a CDN might be testing out of band for signs of Classic
1696	   ECN AQMs, or they might have manually checked which ISPs they serve
1697	   have not deployed Classic ECN AQMs).

1699	   Nonetheless, monitoring is still expressed as a 'MUST' because there
1700	   is still a possibility that there is a Classic ECN AQM somewhere else
1701	   on the path (to continue the CDN example, perhaps beyond the ISP in a
1702	   home network).  Then, if the server operators have disabled fall-back
1703	   for parts of their deployment, they can reconsider their policy or at
1704	   least do more focused testing if in-band monitoring frequently
1705	   detects single-queue Classic ECN AQMs.

1707	A.1.5.  Reduce RTT dependence

1709	   Description: A scalable congestion control needs to reduce or
1710	   eliminate RTT bias at least over the low to typical range of RTTs
1711	   that will interact in the intended deployment scenario (see the
1712	   precise normative requirement wording in Section 4.3).

1714	   Motivation: The throughput of Classic congestion controls is known to
1715	   be inversely proportional to RTT, so one would expect flows over very
1716	   low RTT paths to nearly starve flows over larger RTTs.  However,
1717	   Classic congestion controls have never allowed a very low RTT path to
1718	   exist because they induce a large queue.  For instance, consider two
1719	   paths with base RTT 1ms and 100ms.  If a Classic congestion control
1720	   induces a 100ms queue, it turns these RTTs into 101ms and 200ms
1721	   leading to a throughput ratio of about 2:1.  Whereas if a scalable
1722	   congestion control induces only a 1ms queue, the RTT ratio is 2:101,
1723	   leading to a throughput ratio of about 50:1.
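
           The arithmetic in this example can be reproduced with the
           short Python sketch below.  It assumes only what the text
           states, namely that throughput is inversely proportional to
           the total RTT (base RTT plus queuing delay); the function
           name is illustrative.

```python
def throughput_ratio(base_rtt_a_ms, base_rtt_b_ms, queue_ms):
    """Throughput of flow A relative to flow B sharing one queue.

    Assumes throughput is inversely proportional to total RTT.
    """
    rtt_a = base_rtt_a_ms + queue_ms
    rtt_b = base_rtt_b_ms + queue_ms
    return rtt_b / rtt_a


# Base RTTs of 1 ms and 100 ms, as in the example above.
classic_ratio = throughput_ratio(1, 100, queue_ms=100)   # 200/101
scalable_ratio = throughput_ratio(1, 100, queue_ms=1)    # 101/2
```

           The first ratio is just under 2 and the second is 50.5,
           matching the approximate 2:1 and 50:1 figures above.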

1725	   Therefore, with very small queues, long RTT flows will essentially
1726	   starve, unless scalable congestion controls comply with this
1727	   requirement.

1729	   The RTT bias in current Classic congestion controls works
1730	   satisfactorily when the RTT is higher than typical, and L4S does not
1731	   change that.  So, there is no additional requirement for high RTT L4S
1732	   flows to remove RTT bias - they can but they don't have to.

1734	A.1.6.  Scaling down to fractional congestion windows

1736	   Description: A scalable congestion control needs to remain responsive
1737	   to congestion when typical RTTs over the public Internet are
1738	   significantly smaller because they are no longer inflated by queuing
1739	   delay (see the precise normative requirement wording in Section 4.3).

1741	   Motivation: As currently specified, the minimum required congestion
1742	   window of TCP (and its derivatives) is set to 2 sender maximum
1743	   segment sizes (SMSS) (see equation (4) in [RFC5681]).  Once the
1744	   congestion window reaches this minimum, all known window-based
1745	   congestion control algorithms become unresponsive to congestion
1746	   signals.  No matter how much drop or ECN marking, the congestion
1747	   window of all these algorithms no longer reduces.  Instead, the
1748	   sender's lack of any further congestion response forces the queue to
1749	   grow, overriding any AQM and increasing queuing delay.

1751	   L4S mechanisms significantly reduce queueing delay so, over the same
1752	   path, the RTT becomes lower.  Then this problem becomes surprisingly
1753	   common [sub-mss-prob].  This is because, for the same link capacity,
1754	   smaller RTT implies a smaller window.  For instance, consider a
1755	   residential setting with an upstream broadband Internet access of 8
1756	   Mb/s, assuming a max segment size of 1500 B.  Two upstream flows will
1757	   each have the minimum window of 2 SMSS if the RTT is 6ms or less,
1758	   which is quite common when accessing a nearby data centre.  So, any
1759	   more than two such parallel TCP flows will become unresponsive and
1760	   increase queuing delay.
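
           The figures in this example follow from dividing the
           bandwidth-delay product equally between the flows, as the
           Python sketch below shows.  Equal sharing and the function
           name are assumptions made for illustration.

```python
MIN_CWND_SMSS = 2   # minimum window cited in the text above


def window_per_flow_smss(link_bps, rtt_ms, n_flows, smss_bytes=1500):
    """Window of each flow, in SMSS, if n_flows share the link equally."""
    bdp_bytes = link_bps * rtt_ms / 1000 / 8
    return bdp_bytes / smss_bytes / n_flows


# 8 Mb/s upstream, 6 ms RTT, 1500 B segments, as in the example above.
two_flows = window_per_flow_smss(8_000_000, 6, n_flows=2)
three_flows = window_per_flow_smss(8_000_000, 6, n_flows=3)
```

           Two flows each need exactly 2 SMSS, whereas three flows
           would each need about 1.33 SMSS, which is below the minimum
           window.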

1762	   Unless scalable congestion controls address this requirement from the
1763	   start, they will frequently become unresponsive, negating the low
1764	   latency benefit of L4S, for themselves and for others.

1766	   That would seem to imply that scalable congestion controllers ought
1767	   to be required to be able to work with a congestion window less than
1768	   2 SMSS.  For instance, one possible mechanism that can maintain a
1769	   congestion window significantly less than 1 SMSS is described in
1770	   [Ahmed19], and other approaches are likely to be feasible.

1772	   However, the requirement in Section 4.3 is worded as a "SHOULD"
1773	   because the existence of a minimum window is not all bad.  When
1774	   competing with an unresponsive flow, a minimum window naturally
1775	   protects the flow from starvation by at least keeping some data
1776	   flowing.

1778	   By stating this requirement as a "SHOULD", specifications of scalable
1779	   congestion controllers will be able to choose an appropriate minimum
1780	   window, but they will at least have to justify the decision.

1782	A.1.7.  Measuring Reordering Tolerance in Time Units

1784	   Description: A scalable congestion control needs to detect loss by
1785	   counting in time-based units, which is scalable, rather than counting
1786	   in units of packets, which is not (see the precise normative
1787	   requirement wording in Section 4.3).

1789	   Motivation: A primary purpose of L4S is scalable throughput (it's in
1790	   the name).  Scalability in all dimensions is, of course, also a goal
1791	   of all IETF technology.  The inverse linear congestion response in
1792	   Section 4.3 is necessary, but not sufficient, to solve the congestion
1793	   control scalability problem identified in [RFC3649].  As well as
1794	   maintaining frequent ECN signals as rate scales, it is also important
1795	   to ensure that a potentially false perception of loss does not limit
1796	   throughput scaling.

1798	   End-systems cannot know whether a missing packet is due to loss or
1799	   reordering, except in hindsight - if it appears later.  So they can
1800	   only deem that there has been a loss if a gap in the sequence space
1801	   has not been filled, either after a certain number of subsequent
1802	   packets has arrived (e.g. the 3 DupACK rule of standard TCP
1803	   congestion control [RFC5681]) or after a certain amount of time
1804	   (e.g. the RACK approach [I-D.ietf-tcpm-rack]).

1806	   As we attempt to scale packet rate over the years:

1808	   o  Even if only _some_ sending hosts still deem that loss has
1809	      occurred by counting reordered packets, _all_ networks will have
1810	      to keep reducing the time over which they keep packets in order.
1811	      If some link technologies keep the time within which reordering
1812	      occurs roughly unchanged, then loss over these links, as perceived
1813	      by these hosts, will appear to continually rise over the years.

1815	   o  In contrast, if all senders detect loss in units of time, the time
1816	      over which the network has to keep packets in order stays roughly
1817	      invariant.
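
           The contrast between the two bullets above can be sketched
           numerically.  The Python below assumes a link that can hold
           a packet back for a fixed duration while later packets
           overtake it; the 1 ms reordering duration, the 5 ms
           reordering window, the 1500 B packet size and the function
           names are all hypothetical values chosen for illustration.

```python
DUPACK_THRESHOLD = 3   # packet-counting rule of standard TCP


def packets_overtaking(rate_mbps, reorder_us, pkt_bytes=1500):
    """Packets that arrive while one packet is held up for reorder_us."""
    bits = rate_mbps * reorder_us     # Mb/s x microseconds = bits
    return bits / (8 * pkt_bytes)


def counting_rule_fooled(rate_mbps, reorder_us):
    """True if enough packets overtake to reach the DupACK threshold."""
    return packets_overtaking(rate_mbps, reorder_us) >= DUPACK_THRESHOLD


def time_rule_fooled(reorder_us, reorder_window_us):
    """True if the hold-up lasts at least as long as the time window."""
    return reorder_us >= reorder_window_us


REORDER_US = 1000   # hypothetical: link reorders within 1 ms
WINDOW_US = 5000    # hypothetical: sender's reordering window of 5 ms

slow_fooled = counting_rule_fooled(12, REORDER_US)    # 1 packet overtakes
fast_fooled = counting_rule_fooled(120, REORDER_US)   # 10 packets overtake
time_fooled = time_rule_fooled(REORDER_US, WINDOW_US)
```

           Under these assumptions the counting rule is not triggered
           at 12 Mb/s but is at 120 Mb/s, even though the link behaves
           identically in time, whereas the outcome of the time-based
           rule does not depend on the rate.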

1819	   Therefore hosts have an incentive to detect loss in time units (so as
1820	   not to fool themselves too often into detecting losses when there are
1821	   none).  And for hosts that are changing their congestion control
1822	   implementation to L4S, there is no downside to including time-based
1823	   loss detection code in the change (loss recovery implemented in
1824	   hardware is an exception, covered later).  Therefore requiring L4S
1825	   hosts to detect loss in time-based units would not be a burden.

1827	   If this requirement is not placed on L4S hosts, even though it would
1828	   be no burden on them to do so, all networks will face unnecessary
1829	   uncertainty over whether some L4S hosts might be detecting loss by
1830	   counting packets.  Then _all_ link technologies will have to
1831	   unnecessarily keep reducing the time within which reordering occurs.
1832	   That is not a problem for some link technologies, but it becomes
1833	   increasingly challenging for other link technologies to continue to
1834	   scale, particularly those relying on channel bonding for scaling,
1835	   such as LTE, 5G and DOCSIS.

1837	   Given Internet paths traverse many link technologies, any scaling
1838	   limit for these more challenging access link technologies would
1839	   become a scaling limit for the Internet as a whole.

1841	   It might be asked how it helps to place this loss detection
1842	   requirement only on L4S hosts, because networks will still face
1843	   uncertainty over whether non-L4S flows are detecting loss by counting
1844	   DupACKs.  The answer is that those link technologies for which it is
1845	   challenging to keep squeezing the reordering time will only need to
1846	   do so for non-L4S traffic (which they can do because the L4S
1847	   identifier is visible at the IP layer).  Therefore, they can focus
1848	   their processing and memory resources into scaling non-L4S (Classic)
1849	   traffic.  Then, the higher the proportion of L4S traffic, the less of
1850	   a scaling challenge they will have.

1852	   To summarize, there is no reason for L4S hosts not to be part of the
1853	   solution instead of part of the problem.

1855	   Requirement ("MUST") or recommendation ("SHOULD")?  As explained
1856	   above, this is a subtle interoperability issue between hosts and
1857	   networks, which seems to need a "MUST".  Unless networks can be
1858	   certain that all L4S hosts follow the time-based approach, they still
1859	   have to cater for the worst case - continually squeeze reordering
1860	   into a smaller and smaller duration - just for hosts that might be
1861	   using the counting approach.  However, it was decided to express this
1862	   as a recommendation, using "SHOULD".  The main justification was that
1863	   networks can still be fairly certain that L4S hosts will follow this
1864	   recommendation, because following it offers only gain and no pain.

1866	   Details:

1868	   The speed of loss recovery is much more significant for short flows
1869	   than long, therefore a good compromise is to adapt the reordering
1870	   window; from a small fraction of the RTT at the start of a flow, to a
1871	   larger fraction of the RTT for flows that continue for many round
1872	   trips.

1874	   This is broadly the approach adopted by TCP RACK (Recent
1875	   ACKnowledgements) [I-D.ietf-tcpm-rack].  However, RACK starts with
1876	   the 3 DupACK approach, because the RTT estimate is not necessarily
1877	   stable.  As long as the initial window is paced, such initial use of
1878	   3 DupACK counting would amount to time-based loss detection and
1879	   therefore would satisfy the time-based loss detection recommendation
1880	   of Section 4.3.  This is because pacing of the initial window would
1881	   ensure that 3 DupACKs early in the connection would be spread over a
1882	   small fraction of the round trip.

1884	   As mentioned above, hardware implementations of loss recovery using
1885	   DupACK counting exist (e.g. some implementations of RoCEv2 for RDMA).
1886	   For low latency, these implementations can change their congestion
1887	   control to implement L4S, because the congestion control (as distinct
1888	   from loss recovery) is implemented in software.  But they cannot
1889	   easily satisfy this loss recovery requirement.  However, it is
1890	   believed they do not need to.  It is believed that such
1891	   implementations solely exist in controlled environments, where the
1892	   network technology keeps reordering extremely low anyway.  This is
1893	   why controlled environments with hardly any reordering are excluded
1894	   from the scope of the normative recommendation in Section 4.3.

1896	   Detecting loss in time units also prevents the ACK-splitting attacks
1897	   described in [Savage-TCP].

1899	A.2.  Scalable Transport Protocol Optimizations

1901	A.2.1.  Setting ECT in TCP Control Packets and Retransmissions

1903	   Description: This item only concerns TCP and its derivatives
1904	   (e.g. SCTP), because the original specification of ECN for TCP
1905	   precluded the use of ECN on control packets and retransmissions.  To
1906	   improve performance, scalable transport protocols ought to enable ECN
1907	   at the IP layer in TCP control packets (SYN, SYN-ACK, pure ACKs,
1908	   etc.) and in retransmitted packets.  The same is true for derivatives
1909	   of TCP, e.g. SCTP.

1911	   Motivation: RFC 3168 prohibits the use of ECN on these types of TCP
1912	   packet, based on a number of arguments.  This means these packets are
1913	   not protected from congestion loss by ECN, which considerably harms
1914	   performance, particularly for short flows.
1915	   [I-D.ietf-tcpm-generalized-ecn] counters each argument in RFC 3168 in
1916	   turn, showing it was over-cautious.  Instead it proposes experimental
1917	   use of ECN on all types of TCP packet as long as AccECN feedback
1918	   [I-D.ietf-tcpm-accurate-ecn] is available (which is itself a
1919	   prerequisite for using a scalable congestion control).

1921	A.2.2.  Faster than Additive Increase

1923	   Description: It would improve performance if scalable congestion
1924	   controls did not limit their congestion window increase to the
1925	   standard additive increase of 1 SMSS per round trip [RFC5681] during
1926	   congestion avoidance.  The same is true for derivatives of TCP
1927	   congestion control, including similar approaches used for real-time
1928	   media.

1930	   Motivation: As currently defined [RFC8257], DCTCP uses the
1931	   traditional TCP Reno additive increase in congestion avoidance phase.
1932	   When the available capacity suddenly increases (e.g. when another
1933	   flow finishes, or if radio capacity increases) it can take very many
1934	   round trips to take advantage of the new capacity.  TCP Cubic was
1935	   designed to solve this problem, but as flow rates have continued to
1936	   increase, the delay accelerating into available capacity has become
1937	   prohibitive.  See, for instance, the examples in Section 1.2.  Even
1938	   when out of its Reno-compatibility mode, every 8x scaling of Cubic's
1939	   flow rate leads to 2x more acceleration delay.
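
           One way to see the 8x to 2x relationship is through the
           period K of the Cubic window function, which grows with the
           cube root of the window before the last reduction.  The
           Python sketch below uses the formula and the constants C =
           0.4 and beta = 0.7 from the Cubic specification (RFC 8312),
           which is not cited in this section; treating K as the
           acceleration delay and scaling the window in proportion to
           flow rate (i.e. a fixed RTT) are assumptions of this sketch.

```python
C = 0.4      # Cubic scaling constant (RFC 8312)
BETA = 0.7   # Cubic multiplicative decrease factor (RFC 8312)


def cubic_k(w_max_segments):
    """Seconds Cubic takes to climb back to w_max after a reduction."""
    return (w_max_segments * (1 - BETA) / C) ** (1 / 3)


k_small = cubic_k(1000)    # hypothetical window of 1000 segments
k_large = cubic_k(8000)    # the same flow scaled up 8x
ratio = k_large / k_small  # cube root of 8
```

           Because the cube root of 8 is 2, the ratio is 2 to within
           floating point rounding, whatever starting window is chosen.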

1941	   In the steady state, DCTCP induces about 2 ECN marks per round trip,
1942	   so it is possible to quickly detect when these signals have
1943	   disappeared and seek available capacity more rapidly, while
1944	   minimizing the impact on other flows (Classic and scalable)
1945	   [LinuxPacedChirping].  Alternatively, approaches such as Adaptive
1946	   Acceleration (A2DTCP [A2DTCP]) have been proposed to address this
1947	   problem in data centres, which might be deployable over the public
1948	   Internet.

1950	A.2.3.  Faster Convergence at Flow Start

1952	   Description: It would improve performance if scalable congestion
1953	   controls converged (reached their steady-state share of the capacity)
1954	   faster than Classic congestion controls or at least no slower.  This
1955	   affects the flow start behaviour of any L4S congestion control
1956	   derived from a Classic transport that uses TCP slow start, including
1957	   those for real-time media.

1959	   Motivation: As an example, a new DCTCP flow takes longer than a
1960	   Classic congestion control to obtain its share of the capacity of the
1961	   bottleneck when there are already ongoing flows using the bottleneck
1962	   capacity.  In a data centre environment DCTCP takes about a factor of
1963	   1.5 to 2 longer to converge due to the much higher typical level of
1964	   ECN marking that DCTCP background traffic induces, which causes new
1965	   flows to exit slow start early [Alizadeh-stability].  In testing for
1966	   use over the public Internet the convergence time of DCTCP relative
1967	   to a regular loss-based TCP slow start is even less favourable
1968	   [Paced-Chirping] due to the shallow ECN marking threshold needed for
1969	   L4S.  It is exacerbated by the typically greater mismatch between the
1970	   link rate of the sending host and typical Internet access
1971	   bottlenecks.  This problem is detrimental in general, but would
1972	   particularly harm the performance of short flows relative to Classic
1973	   congestion controls.

1975	Appendix B.  Alternative Identifiers

1977	   This appendix is informative, not normative.  It records the pros and
1978	   cons of various alternative ways to identify L4S packets to record
1979	   the rationale for the choice of ECT(1) (Appendix B.1) as the L4S
1980	   identifier.  At the end, Appendix B.8 summarises the distinguishing
1981	   features of the leading alternatives.  It is intended to supplement,
1982	   not replace the detailed text.

1984	   The leading solutions all use the ECN field, sometimes in combination
1985	   with the Diffserv field.  This is because L4S traffic has to indicate
1986	   that it is ECN-capable anyway, because ECN is intrinsic to how L4S
1987	   works.  Both the ECN and Diffserv fields have the additional
1988	   advantage that they are no different in either IPv4 or IPv6.  A
1989	   couple of alternatives that use other fields are mentioned at the
1990	   end, but it is quickly explained why they are not serious contenders.

1992	B.1.  ECT(1) and CE codepoints

1994	   Definition:

1996	      Packets with ECT(1) and conditionally packets with CE would
1997	      signify L4S semantics as an alternative to the semantics of
1998	      Classic ECN [RFC3168], specifically:

2000	      *  The ECT(1) codepoint would signify that the packet was sent by
2001	         an L4S-capable sender.

2003	      *  Given shortage of codepoints, both L4S and Classic ECN sides of
2004	         an AQM would have to use the same CE codepoint to indicate that
2005	         a packet had experienced congestion.  If a packet that had
2006	         already been marked CE in an upstream buffer arrived at a
2007	         subsequent AQM, this AQM would then have to guess whether to
2008	         classify CE packets as L4S or Classic ECN.  Choosing the L4S
2009	         treatment would be a safer choice, because then a few Classic
2010	         packets might arrive early, rather than a few L4S packets
2011	         arriving late.

2013	      *  Additional information might be available if the classifier
2014	         were transport-aware.  Then it could classify a CE packet for
2015	         Classic ECN treatment if the most recent ECT packet in the same
2016	         flow had been marked ECT(0).  However, the L4S service ought
2017	         not to need transport-layer awareness.
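
              The classification rules in the bullets above can be
              summarised in the following Python sketch.  The ECN
              field values are those of [RFC3168]; the function name,
              the queue labels and the optional last_ect_in_flow
              argument (standing in for per-flow state held by a
              transport-aware classifier) are illustrative only.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11   # ECN field values


def classify(ecn, last_ect_in_flow=None):
    """Return "L4S" or "Classic" for a packet's ECN field.

    last_ect_in_flow is only available to a transport-aware
    classifier: the codepoint of the most recent ECT packet seen in
    the same flow, or None if unknown.
    """
    if ecn == ECT1:
        return "L4S"
    if ecn == CE:
        # CE is ambiguous; default to L4S unless the flow is known
        # to have been sending ECT(0).
        if last_ect_in_flow == ECT0:
            return "Classic"
        return "L4S"
    return "Classic"   # ECT(0) and Not-ECT
```

              Calling classify(CE) without flow state returns "L4S",
              which is the safer default described above.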

2019	   Cons:

2021	   Consumes the last ECN codepoint:  The L4S service could potentially
2022	      supersede the service provided by Classic ECN, therefore using
2023	      ECT(1) to identify L4S packets could ultimately mean that the
2024	      ECT(0) codepoint was 'wasted' purely to distinguish one form of
2025	      ECN from its successor.

2027	   ECN hard in some lower layers:  It is not always possible to support
2028	      ECN in an AQM acting in a buffer below the IP layer
2029	      [I-D.ietf-tsvwg-ecn-encap-guidelines].  In such cases, the L4S
2030	      service would have to drop rather than mark frames even though
2031	      they might encapsulate an ECN-capable packet.

2033	   Risk of reordering Classic CE packets:  Classifying all CE packets
2034	      into the L4S queue risks any CE packets that were originally
2035	      ECT(0) being incorrectly classified as L4S.  If there were delay
2036	      in the Classic queue, these incorrectly classified CE packets
2037	      would arrive early, which is a form of reordering.  Reordering can
2038	      cause TCP senders (and senders of similar transports) to
2039	      retransmit spuriously.  However, the risk of spurious
2040	      retransmissions would be extremely low for the following reasons:

2042	      1.  It is quite unusual to experience queuing at more than one
2043	          bottleneck on the same path (the available capacities have to
2044	          be identical).

2046	      2.  In only a subset of these unusual cases would the first
2047	          bottleneck support Classic ECN marking while the second
2048	          supported L4S ECN marking, which would be the only scenario
2049	          where some ECT(0) packets could be CE marked by an AQM
2050	          supporting Classic ECN then the remainder experienced further
2051	          delay through the Classic side of a subsequent L4S DualQ AQM.

2053	      3.  Even then, when a few packets are delivered early, it takes
2054	          very unusual conditions to cause a spurious retransmission, in
2055	          contrast to when some packets are delivered late.  The first
2056	          bottleneck has to apply CE-marks to at least N contiguous
2057	          packets and the second bottleneck has to inject an
2058	          uninterrupted sequence of at least N of these packets between
2059	          two packets earlier in the stream (where N is the reordering
2060	          window that the transport protocol allows before it considers
2061	          a packet is lost).

2063	             For example consider N=3, and consider the sequence of
2064	             packets 100, 101, 102, 103,... and imagine that packets
2065	             150,151,152 from later in the flow are injected as follows:
2066	             100, 150, 151, 101, 152, 102, 103...  If this were late
2067	             reordering, even one packet arriving out of sequence would
2068	             trigger a spurious retransmission, but there is no spurious
2069	             retransmission here with early reordering, because packet
2070	             101 moves the cumulative ACK counter forward before 3
2071	             packets have arrived out of order.  Later, when packets
2072	             148, 149, 153... arrive, even though there is a 3-packet
2073	             hole, there will be no problem, because the packets to fill
2074	             the hole are already in the receive buffer.

2076	      4.  Even with the current TCP recommendation of N=3 [RFC5681]
2077	          spurious retransmissions will be unlikely for all the above
2078	          reasons.  As RACK [I-D.ietf-tcpm-rack] is becoming widely
2079	          deployed, it tends to adapt its reordering window to a larger
2080	          value of N, which will make the chance of a contiguous
2081	          sequence of N early arrivals vanishingly small.

2083	      5.  Even a run of 2 CE marks within a Classic ECN flow is
2084	          unlikely, given FQ-CoDel is the only known widely deployed AQM
2085	          that supports Classic ECN marking and it takes great care to
2086	          separate out flows and to space any markings evenly along each
2087	          flow.

2089	      It is extremely unlikely that the above set of 5 eventualities
2090	      that are each unusual in themselves would all happen
2091	      simultaneously.  But, even if they did, the consequences would
2092	      hardly be dire: the odd spurious fast retransmission.  Whenever
2093	      the traffic source (a Classic congestion control) mistakes the
2094	      reordering of a string of CE marks for a loss, one might think
2095	      that it will reduce its congestion window as well as emitting a
2096	      spurious retransmission.  However, it would have already reduced
2097	      its congestion window when the CE markings arrived early.  If it
2098	      is using ABE [RFC8511], it might reduce cwnd a little more for a
2099	      loss than for a CE mark.  But it will revert that reduction once
2100	      it detects that the retransmission was spurious.

2102	      In conclusion, the impact of early reordering due to CE being
2103	      ambiguous will generally be vanishingly small.
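
              The N=3 example given under reason 3 above can be checked
              with a small simulation of duplicate ACK counting.  The
              Python sketch below is deliberately simplified: sequence
              numbers count whole packets, every arrival is ACKed
              immediately, SACK is ignored, and the function name is
              illustrative.

```python
def max_dupacks(arrivals, next_expected):
    """Longest run of duplicate ACKs that the arrivals would generate."""
    buffered = set()   # out-of-order packets held by the receiver
    dupacks = 0
    worst = 0
    for seq in arrivals:
        if seq == next_expected:
            # In-order arrival: the cumulative ACK advances past this
            # packet and any buffered packets that now follow it.
            next_expected += 1
            while next_expected in buffered:
                buffered.remove(next_expected)
                next_expected += 1
            dupacks = 0
        elif seq > next_expected:
            # Out-of-order arrival: buffer it and send a duplicate ACK.
            buffered.add(seq)
            dupacks += 1
            worst = max(worst, dupacks)
        # seq < next_expected would be a duplicate; ignored here.
    return worst


# Early reordering: 150, 151 and 152 are injected ahead of their turn,
# then the rest of the flow (104 to 149, then 153) arrives in order.
early = [100, 150, 151, 101, 152, 102, 103] + list(range(104, 150)) + [153]
# Late reordering: packet 100 alone is delayed behind three others.
late = [101, 102, 103, 100]

early_worst = max_dupacks(early, next_expected=100)
late_worst = max_dupacks(late, next_expected=100)
```

              In this model the early sequence never produces more than
              2 consecutive duplicate ACKs, so a threshold of 3 is not
              reached, whereas a single late packet produces 3.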

2105	   Hard to distinguish Classic ECN AQM:  With this scheme, when a source
2106	      receives ECN feedback, it is not explicitly clear which type of
2107	      AQM generated the CE markings.  This is not a problem for Classic
2108	      ECN sources that send ECT(0) packets, because an L4S AQM will
2109	      recognize the ECT(0) packets as Classic and apply the appropriate
2110	      Classic ECN marking behaviour.

2112	      However, in the absence of explicit disambiguation of the CE
2113	      markings, an L4S source needs to use heuristic techniques to work
2114	      out which type of congestion response to apply (see
2115	      Appendix A.1.4).  Otherwise, if long-running Classic flow(s) are
2116	      sharing a Classic ECN AQM bottleneck with long-running L4S
2117	      flow(s), which then apply an L4S response to Classic CE signals,
2118	      the L4S flows would outcompete the Classic flow(s).  Experiments
2119	      have shown that L4S flows can take about 20 times more capacity
2120	      share than equivalent Classic flows.  Nonetheless, as link
2121	      capacity reduces (e.g. to 4 Mb/s), the inequality reduces.  So
2122	      Classic flows always make progress and are not starved.

2124	      When L4S was first proposed (in 2015, 14 years after [RFC3168] was
2125	      published), it was believed that Classic ECN AQMs had failed to be
2126	      deployed, because research measurements had found little or no
2127	      evidence of CE marking.  In subsequent years Classic ECN was
2128	      included in FQ-CoDel deployments.  However, an FQ scheduler stops
2129	      an L4S flow outcompeting Classic, because it enforces equality
2130	      between flow rates.  It is not known whether there have been any
2131	      non-FQ deployments of Classic ECN AQMs in the subsequent years, or
2132	      whether there will be in future.

2134	      An algorithm for detecting a Classic ECN AQM as soon as a flow
2135	      stabilizes after start-up has been proposed [ecn-fallback] (see
2136	      Appendix A.1.4 for a brief summary).  Testbed evaluations of v2 of
2137	      the algorithm have shown detection is reasonably good for Classic
2138	      ECN AQMs, in a wide range of circumstances.  However, although it
2139	      can correctly detect an L4S ECN AQM in many circumstances, it is
2140	      often incorrect at low link capacities and/or high RTTs.  Although
2141	      this is the safe way round, there is a danger that it will
2142	      discourage use of the algorithm.

2144	   Non-L4S service for control packets:  The Classic ECN RFCs [RFC3168]
2145	      and [RFC5562] require a sender to clear the ECN field to Not-ECT
2146	      on retransmissions and on certain control packets, specifically
2147	      pure ACKs, window probes and SYNs.  When L4S packets are
2148	      classified by the ECN field, these control packets would not be
2149	      classified into an L4S queue, and could therefore be delayed
2150	      relative to the other packets in the flow.  This would not cause
2151	      reordering (because retransmissions are already out of order, and
2152	      these control packets typically carry no data).  However, it would
2153	      make critical control packets more vulnerable to loss and delay.
2154	      To address this problem, [I-D.ietf-tcpm-generalized-ecn] proposes
2155	      an experiment in which all TCP control packets and retransmissions
2156	      are ECN-capable as long as appropriate ECN feedback is available
2157	      in each case.

2159	   Pros:

2161	   Should work e2e:  The ECN field generally works end-to-end across the
2162	      Internet.  Unlike the DSCP, the setting of the ECN field is at
2163	      least forwarded unchanged by networks that do not support ECN, and
2164	      networks rarely clear it to zero.

2166	   Should work in tunnels:  Unlike Diffserv, ECN is defined to always
2167	      work across tunnels.  This scheme works within a tunnel that
2168	      propagates the ECN field in any of the variant ways it has been
2169	      defined, from the year 2001 [RFC3168] onwards.  However, it is
2170	      likely that some tunnels still do not implement ECN propagation at
2171	      all.

2173	   Could migrate to one codepoint:  If all Classic ECN senders
2174	      eventually evolve to use the L4S service, the ECT(0) codepoint
2175	      could be reused for some future purpose, but only once use of
2176	      ECT(0) packets had reduced to zero, or near-zero, which might
2177	      never happen.

2179	   L4 not required:  Being based on the ECN field, this scheme does not
2180	      need the network to access transport layer flow identifiers.
2181	      Nonetheless, it does not preclude solutions that do.

2183	B.2.  ECN-DualQ-SCE1

2185	   Definition:

2187	      In this proposal, an L4S AQM would indicate congestion with ECT(1)
2188	      in contrast to a Classic AQM, which indicates congestion with CE.
2189	      More specifically:

2191	      *  Given shortage of codepoints, with this proposal L4S ECN hosts
2192	         send packets as ECT(0), like Classic ECN hosts do by default
2193	         [RFC8311].

2195	      *  If the ECT(1) codepoint were used to indicate congestion in
2196	         this way, it would signify a shallow queue AQM to the end-to-
2197	         end transport.  So those who proposed this approach called it
2198	         'Some Congestion Experienced' (SCE) because of its similarity
2199	         to [I-D.morton-tsvwg-sce].  It has also been described as
2200	         'ECT(1) on output', in contrast to the 'ECT(1) on input'
2201	         approach outlined in Appendix B.1.

2203	      *  The approach works best if the network is transport-aware and
2204	         isolates each application flow in its own queue (per-flow
2205	         queuing, or FQ).  Two AQMs are implemented in each queue, one
2206	         with a shallow target that marks selected ECT packets as
2207	         ECT(1), the other with a deeper target that marks selected ECT
2208	         packets as CE, or drops selected non-ECT packets.

2210	      *  A Classic congestion control would not have the logic to
2211	         recognize ECT(1) as a congestion signal.  So it would
2212	         (correctly) drive the queue to the deeper threshold, responding
2213	         only to CE markings.  An L4S congestion control that
2214	         understands this scheme would respond to ECT(1) markings, which
2215	         ought to therefore keep the queue close to the shallower
2216	         threshold.

2218	      *  A dual queue approach has been informally proposed, with an L4S
2219	         and a Classic queue and coupling similar to
2220	         [I-D.ietf-tsvwg-aqm-dualq-coupled].  In an interim
2221	         classification, all ECT packets would be classified into the
2222	         low latency queue, and non-ECT packets into the Classic queue.
2223	         But then, in front of the low latency queue, a stateful flow
2224	         characterization function would maintain a queue occupancy
2225	         metric.  It would then redirect any high occupancy flows into
2226	         the Classic queue.
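
   The two-AQM behaviour of the FQ variant described above can be
   sketched as follows.  This is only an illustration under stated
   assumptions: the function name sce1_mark, the step thresholds
   SHALLOW_MS and DEEP_MS and their values are hypothetical, and a real
   AQM would normally mark probabilistically rather than at a hard step.

```python
from enum import Enum


class Ecn(Enum):
    """The four codepoints of the 2-bit IP-ECN field [RFC3168]."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


# Hypothetical step thresholds on queuing delay (milliseconds).
SHALLOW_MS = 1.0
DEEP_MS = 15.0


def sce1_mark(ecn, queue_delay_ms):
    """Return the outgoing codepoint, or None if the packet is dropped."""
    if queue_delay_ms >= DEEP_MS:
        # Deeper AQM: Classic signal (CE mark, or drop if not ECN-capable).
        return None if ecn is Ecn.NOT_ECT else Ecn.CE
    if queue_delay_ms >= SHALLOW_MS and ecn is Ecn.ECT0:
        # Shallow AQM: 'ECT(1) on output' as the less severe signal.
        return Ecn.ECT1
    return ecn
```

   A Classic congestion control would ignore the ECT(1) result and
   react only once the deeper threshold produced CE or a drop.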

2228	   Cons:

2230	   Network requires transport-layer awareness:  There is no variant of
2231	      this approach that works without network visibility of transport
2232	      layer flow identifiers (the 5-tuple).  Obviously the FQ variant
2233	      needs to see 5-tuples, but so does the DualQ SCE1 variant (to
2234	      redirect flows based on sparseness).  So there is no arrangement
2235	      of this approach that operators could choose if they could not
2236	      access the transport layer, or did not want to (e.g. to support
2237	      full end-to-end encryption above the IP layer).

2239	   Incomplete isolation:  When evaluated, the DualQ variant of ECN-
2240	      DualQ-SCE1 introduced impairments to both L4S and Classic flows.
2241	      The evaluation used the DOCSIS queue protection function
2242	      [I-D.briscoe-docsis-q-protection] to maintain the per-flow
2243	      sparseness metrics and redirect packets from non-sparse flows into
2244	      the Classic queue.  Unfortunately, it is impossible to determine
2245	      non-sparseness until sufficient packets of each flow have been
2246	      analyzed.  Up to this point, all packets default to the L4S queue.
2247	      Then:

2249	      *  Long-running Classic flows experience reordering during the
2250	         transition to classifying them as Classic.  Worse, the
2251	         reordering occurs early in the flow when it is less robust to
2252	         confusing RTT measurements;

2254	      *  Considerable numbers of Classic packets add to the L4S queue -
2255	         from all the short flows and the start of long flows before the
2256	         classifier can be certain enough to redirect them to the other
2257	         (Classic) queue.  So true L4S flows unavoidably experience a
2258	         degree of extra delay.
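
   The effect of defaulting to the L4S queue can be illustrated with a
   toy classifier.  It is not the DOCSIS queue protection algorithm:
   the class InterimClassifier, the constant MIN_PACKETS and the
   is_sparse flag (standing in for a real per-flow metric) are all
   assumptions made for this sketch.

```python
from collections import defaultdict

MIN_PACKETS = 4  # hypothetical number of packets needed to judge a flow


class InterimClassifier:
    """Toy classifier: every flow starts in the L4S queue by default."""

    def __init__(self):
        self.seen = defaultdict(int)

    def classify(self, flow_id, is_sparse):
        self.seen[flow_id] += 1
        if self.seen[flow_id] <= MIN_PACKETS:
            return "L4S"  # not enough evidence yet, so use the default
        return "L4S" if is_sparse else "Classic"


clf = InterimClassifier()
queues = [clf.classify("long-classic-flow", is_sparse=False)
          for _ in range(6)]
# queues == ['L4S', 'L4S', 'L4S', 'L4S', 'Classic', 'Classic']
```

   The first four packets of the Classic flow land in the L4S queue,
   and the switch of queue part-way through the flow is where
   reordering can arise.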

2260	   Consumes the last ECN codepoint:  The L4S service could potentially
2261	      supersede the service provided by Classic ECN.  Therefore, using
2262	      ECT(1) to indicate L4S congestion could ultimately mean that the
2263	      CE codepoint was 'wasted' purely to distinguish one form of
2264	      congestion from its successor.

2266	   Only recently updated tunnels:  If this scheme is applied to an outer
2267	      header within a tunnel or lower layer encapsulation, the ECT(1)
2268	      codepoint will be black-holed at decapsulation, unless the
2269	      decapsulator complies with changes to IP-in-IP tunnels introduced
2270	      in 2010 [RFC6040], or changes to other tunnels that are
2271	      (currently) work in progress [I-D.ietf-tsvwg-rfc6040update-shim],
2272	      [I-D.ietf-tsvwg-ecn-encap-guidelines].

2274	   Limited TCP support for feedback:  This approach requires transport
2275	      layer feedback of two congestion signals, ECT(1) and CE.  Recently
2276	      developed protocols such as QUIC provide this by default.
2277	      However, there is limited space in the main TCP header to feed
2278	      back both signals reliably and accurately [RFC7560].  AccECN
2279	      [I-D.ietf-tcpm-accurate-ecn] devotes the limited space in the main
2280	      TCP header to CE feedback, and optionally feeds back ECT(1) in a
2281	      new TCP option, which will have limited initial deployment
2282	      support.

2284	   Alters non-participating packets:  An AQM following this approach
2285	      alters some selected ECT(0) packets to ECT(1) irrespective of
2286	      whether they are participating in the L4S experiment.  Although
2287	      ECT(0) and ECT(1) have historically been defined as equivalent, in
2288	      practice ECT(1) packets have been extremely rare on the Internet.
2289	      Therefore, in practice, there might be a risk that firewalls and
2290	      other devices will block ECT(1) packets, or at least treat them
2291	      with greater suspicion.

2293	   ECN hard in some lower layers:  Similarly to the 'Con' point in
2294	      Appendix B.1, it is not always possible to support ECN in an AQM
2295	      acting in a buffer below the IP layer
2296	      [I-D.ietf-tsvwg-ecn-encap-guidelines].  However, adding support to
2297	      lower layers would be even harder with this scheme, because it
2298	      needs space for two severity levels of congestion, not one.
2299	      Without lower layer ECN support, the L4S service would have to
2300	      drop rather than mark frames even though they might encapsulate an
2301	      ECN-capable packet.

2303	   Non-L4S service for control packets:  Identical to 'Con' point in
2304	      Appendix B.1.

2306	   Pros:

2308	   Distinct indication of Classic ECN AQM:  An AQM following the ECN-
2309	      DualQ-SCE1 approach outputs distinctive signals (ECT(1)) compared
2310	      to those output by a Classic ECN AQM.  So an L4S congestion
2311	      control using the SCE1 approach would inherently respond
2312	      appropriately to a Classic AQM.

2314	   Should work e2e:  Identical to 'Pro' point in Appendix B.1.

2316	B.3.  ECN-DualQ-SCE0

2318	   Definition:

2320	      This proposal is the inverse of the ECN-DualQ-SCE1 scheme (see
2321	      Appendix B.2 above).  L4S AQMs signal congestion with the
2322	      transition ECT(1) -> ECT(0).  More specifically:

2324	      *  L4S senders would send their packets as ECT(1), while Classic
2325	         ECN senders would continue to send ECT(0) by default [RFC8311].

2327	      *  FQ AQMs would work in a similar way to that described for ECN-
2328	         DualQ-SCE1 in Appendix B.2 above.  Except the shallow queue AQM
2329	         would mark selected ECT packets with ECT(0), rather than
2330	         ECT(1).

2332	         It would seem possible to classify packets by both 5-tuple and
2333	         ECT codepoint, so that each per-flow queue could instantiate
2334	         just the one AQM appropriate to the ECT codepoint using it.  In
2335	         this case, CE and Not-ECT packets would be classified into the
2336	         same queue as ECT(0).  However, this would open up the risk of
2337	         reordering explained below, so it is not considered further.

2339	      *  A Classic congestion control would only receive CE feedback,
2340	         and it would have no logic to recognize ECT(0) as congestion
2341	         markings, because it would send all its packets as ECT(0)
2342	         anyway.  So it would (correctly) drive the queue to the deeper
2343	         threshold, responding only to CE markings.  An L4S congestion
2344	         control would understand ECT(0) markings as L4S congestion
2345	         signals and therefore ought to keep the queue close to the
2346	         shallower threshold.

2348	      *  Under the SCE0 scheme, a dual queue coupled AQM
2349	         [I-D.ietf-tsvwg-aqm-dualq-coupled] would use ECT(1) as the L4S
2350	         classifier in a very similar way to the 'ECT(1) and CE' scheme
2351	         it was originally designed for.  The one difference would be to
2352	         classify CE packets into the Classic queue along with ECT(0)
2353	         and Not-ECT.
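
   The classification difference in the last bullet can be written
   down as two lookup tables.  The dictionary names CHOSEN and SCE0 are
   hypothetical labels for this sketch; the mappings themselves follow
   the text above.

```python
# Queue selected for each IP-ECN codepoint by a dual queue classifier.
CHOSEN = {  # the 'ECT(1) and CE' scheme
    "Not-ECT": "Classic",
    "ECT(0)": "Classic",
    "ECT(1)": "L4S",
    "CE": "L4S",
}

# SCE0 is identical except that CE joins ECT(0) and Not-ECT.
SCE0 = {**CHOSEN, "CE": "Classic"}

differences = [cp for cp in CHOSEN if CHOSEN[cp] != SCE0[cp]]
# differences == ['CE']
```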

2355	   Cons:

2357	   Consumes the last ECN codepoint:  The L4S service could potentially
2358	      supersede the service provided by Classic ECN.  Therefore, using
2359	      ECT(0) to indicate L4S congestion could ultimately mean that the
2360	      CE codepoint was 'wasted' purely to distinguish one form of
2361	      congestion from its successor.

2363	   Incompatible with all ECN tunnels:  The transition ECT(1) -> ECT(0)
2364	      has never previously been recognized as valid.  So, any ECT(0)
2365	      marking applied to an ECT(1) outer header within a tunnel or lower
2366	      layer encapsulation will be black-holed at decapsulation by any
2367	      decapsulator whatever variant of ECN tunnel RFC it complies with.

2369	   Limited TCP support for feedback:  Identical to 'Con' point in
2370	      Appendix B.2 above except space would be needed for CE and ECT(0)
2371	      rather than CE and ECT(1) feedback.

2373	   Risk of reordering L4S packets:  If an L4S flow traverses a
2374	      path with two or more bottleneck AQMs that both support L4S,
2375	      reordering is likely to occur.  This is because the first
2376	      bottleneck will re-mark some ECT(1) packets to ECT(0), which will
2377	      then be classified into the Classic queue of the second AQM, even
2378	      though they originated as L4S packets.

2380	      In contrast to the 'ECT(1) and CE' scheme in Appendix B.1, the
2381	      risk of impairment in the ECN-DualQ-SCE0 case is not vanishingly
2382	      small:

2384	      1.  Certainly, queuing at more than one bottleneck on the same
2385	          path would still be quite unusual.

2387	      2.  However, the ECN-DualQ-SCE0 case occurs if both bottlenecks
2388	          support L4S ECN and the traffic is L4S.  This contrasts with
2389	          the "ECT(1) and CE" case, which solely occurs if the AQMs are
2390	          in a certain order (Classic followed by L4S).

2392	      3.  When misclassification occurs, it is from L4S to Classic.  So
2393	          selected packets are delivered late, which in itself adds
2394	          delay, and also increases the risk that each late delivery
2395	          will be deemed a loss and cause a high level of spurious
2396	          retransmissions.  This contrasts with the "ECT(1) and CE" case
2397	          where selected packets are delivered early, which is very
2398	          unlikely to have any effect (as already explained in
2399	          Appendix B.1).
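
   The misclassification at a second bottleneck can be traced with a
   minimal sketch.  It models only queue selection and re-marking, not
   queuing delay; the function l4s_aqm_sce0 and its mark_every
   parameter (a deterministic stand-in for a real marking probability)
   are assumptions made for illustration.

```python
def l4s_aqm_sce0(packets, mark_every):
    """Toy SCE0 bottleneck: classify, then re-mark every Nth L4S packet."""
    out = []
    l4s_count = 0
    for seq, ecn in packets:
        queue = "L4S" if ecn == "ECT(1)" else "Classic"
        if queue == "L4S":
            l4s_count += 1
            if l4s_count % mark_every == 0:
                ecn = "ECT(0)"  # the SCE0 congestion signal
        out.append((seq, ecn, queue))
    return out


# An L4S flow of six packets, all sent as ECT(1).
flow = [(seq, "ECT(1)") for seq in range(6)]
hop1 = l4s_aqm_sce0(flow, mark_every=3)
# The second bottleneck sees whatever the first one emitted.
hop2 = l4s_aqm_sce0([(s, e) for s, e, _ in hop1], mark_every=1000)
misclassified = [s for s, _, q in hop2 if q == "Classic"]
# misclassified == [2, 5]
```

   Packets 2 and 5 originated as L4S but queue behind Classic traffic
   at the second bottleneck, so they would be delivered late relative
   to the rest of their flow.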

2401	   ECN hard in some lower layers:  Identical to 'Con' point in
2402	      Appendix B.2.

2404	   Non-L4S service for control packets:  Identical to 'Con' point in
2405	      Appendix B.1.

2407	   Pros:

2409	   Distinct indication of Classic ECN AQM:  An AQM following the ECN-
2410	      DualQ-SCE0 approach outputs distinctive signals (ECT(0)) compared
2411	      to those output by a Classic ECN AQM (CE).  So an L4S congestion
2412	      control can inherently respond appropriately to a Classic AQM.

2414	   Should work e2e:  Identical to 'Pro' point in Appendix B.1.

2416	B.4.  ECN Plus a Diffserv Codepoint (DSCP)

2418	   Definition:

2420	      For packets with a defined DSCP, all codepoints of the ECN field
2421	      (except Not-ECT) would signify alternative L4S semantics to those
2422	      for Classic ECN [RFC3168], specifically:

2424	      *  The L4S DSCP would signify that the packet came from an L4S-
2425	         capable sender.

2427	      *  ECT(0) and ECT(1) would both signify that the packet was
2428	         travelling between transport endpoints that were both ECN-
2429	         capable.

2431	      *  CE would signify that the packet had been marked by an AQM
2432	         implementing the L4S service.
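
   A classifier for this scheme might look like the following sketch.
   The value of L4S_DSCP is purely hypothetical (no DSCP is assigned
   here), as are the function name and the returned labels.

```python
L4S_DSCP = 0x2A  # hypothetical value, chosen only for this example

NOT_ECT = 0b00   # IP-ECN codepoints [RFC3168]
ECT0 = 0b10
CE = 0b11


def dscp_ecn_classify(dscp, ecn_bits):
    """Select the treatment implied by the Diffserv and ECN fields."""
    if ecn_bits == NOT_ECT:
        return "Not-ECT"  # excluded from the alternative semantics
    return "L4S" if dscp == L4S_DSCP else "Classic ECN"


marked = dscp_ecn_classify(L4S_DSCP, CE)  # 'L4S'
zeroed = dscp_ecn_classify(0, ECT0)       # 'Classic ECN'
```

   The second call shows the dependency on the Diffserv field: the
   same ECN-capable packet with a zeroed DSCP is no longer identified
   as L4S.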

2434	   Use of a DSCP is the only approach for alternative ECN semantics
2435	   given as an example in [RFC4774].  However, it was perhaps considered
2436	   more for controlled environments than new end-to-end services.

2438	   Cons:

2440	   Consumes DSCP pairs:  A DSCP is by definition not orthogonal to
2441	      Diffserv.  Therefore, wherever the L4S service is applied to
2442	      multiple Diffserv scheduling behaviours, it would be necessary to
2443	      replace each DSCP with a pair of DSCPs.

2445	   Uses critical lower-layer header space:  The resulting increased
2446	      number of DSCPs might be hard to support for some lower layer
2447	      technologies, e.g. 802.1Q and MPLS both offer only 3 bits for a
2448	      maximum of 8 traffic class identifiers.  Although L4S should
2449	      reduce and possibly remove the need for some DSCPs intended for
2450	      differentiated queuing delay, it will not remove the need for
2451	      Diffserv entirely, because Diffserv is also used to allocate
2452	      bandwidth, e.g. by prioritising some classes of traffic over
2453	      others when traffic exceeds available capacity.

2455	   Not end-to-end (host-network):  Very few networks honour a DSCP set
2456	      by a host.  Typically a network will zero (bleach) the Diffserv
2457	      field from all hosts.  DSCP bleaching would turn an L4S ECN packet
2458	      into a Classic ECN packet.

2460	   Not end-to-end (network-network):  Very few networks honour a DSCP
2461	      received from a neighbouring network.  Typically a network will
2462	      zero (bleach) the Diffserv field from all neighbouring networks at
2463	      an interconnection point.  Sometimes bilateral arrangements are
2464	      made between networks, such that the receiving network remarks
2465	      some DSCPs to those it uses for roughly equivalent services.  The
2466	      likelihood that a DSCP will be bleached or ignored depends on the
2467	      type of DSCP:

2469	      Local-use DSCP:  These tend to be used to implement application-
2470	         specific network policies, but a bilateral arrangement to
2471	         remark certain DSCPs is often applied to DSCPs in the local-use
2472	         range simply because it is easier not to change all of a
2473	         network's internal configurations when a new arrangement is
2474	         made with a neighbour.

2476	      Recommended standard DSCP:  These do not tend to be honoured
2477	         across network interconnections more than local-use DSCPs.
2478	         However, if two networks decide to honour certain of each
2479	         other's DSCPs, the reconfiguration is a little easier if both
2480	         of their globally recognised services are already represented
2481	         by the relevant recommended standard DSCPs.

2483	         Note that today a recommended standard DSCP gives little more
2484	         assurance of end-to-end service than a local-use DSCP.  In
2485	         future the range recommended as standard might give more
2486	         assurance of end-to-end service than local-use, but it is
2487	         unlikely that either assurance will be high, particularly given
2488	         that hosts are included in the end-to-end path.

2490	      Whenever DSCP bleaching did occur, it would turn an L4S ECN packet
2491	      into a Classic ECN packet.

2493	   Not all tunnels:  Diffserv codepoints are often not propagated to the
2494	      outer header when a packet is encapsulated by a tunnel header.
2495	      DSCPs are propagated to the outer of uniform mode tunnels, but not
2496	      pipe mode [RFC2983], and pipe mode is fairly common.  Whenever
2497	      pipe mode was used, it would temporarily turn an L4S ECN packet
2498	      into a Classic ECN packet.

2500	   ECN hard in some lower layers:  Because this approach uses both the
2501	      Diffserv and ECN fields, an AQM will only work at a lower layer if
2502	      both can be supported.  If individual network operators wished to
2503	      deploy an AQM at a lower layer, they would usually propagate an IP
2504	      Diffserv codepoint to the lower layer, using for example IEEE
2505	      802.1p.  However, the ECN capability is harder to propagate down
2506	      to lower layers because few lower layers support it.

2508	   Hard to distinguish Classic ECN AQM:  Defining a DSCP to indicate L4S
2509	      is a way to help network nodes identify L4S packets (albeit
2510	      unreliable due to the likelihood of bleaching - see above).
2511	      However, it does not help hosts distinguish between ECN markings
2512	      from L4S and Classic AQMs.  This is because Classic AQMs would
2513	      have been implemented without any logic to recognize an L4S DSCP
2514	      or apply L4S marking behaviour.

2516	   Pros:

2518	   Could migrate to e2e:  If all usage of Classic ECN migrates to usage
2519	      of L4S, the DSCP would become redundant, and the ECN capability
2520	      alone could eventually identify L4S packets without the
2521	      interconnection problems of Diffserv detailed above, and without
2522	      having permanently consumed more than one codepoint in the IP
2523	      header.  Although the DSCP does not generally function as an end-
2524	      to-end identifier (see above), it could be used initially by
2525	      individual ISPs to introduce the L4S service for their own locally
2526	      generated traffic.

2528	B.5.  ECN capability alone

2530	   This approach uses ECN capability alone as the L4S identifier.  It
2531	   would only have been feasible if RFC 3168 ECN had not been widely
2532	   deployed.  This was the case when the choice of L4S identifier was
2533	   being made and this appendix was first written.  Since then, RFC 3168
2534	   ECN has been widely deployed and L4S did not take this approach
2535	   anyway.  So this approach is not discussed further, because it is no
2536	   longer a feasible option.

2538	B.6.  Protocol ID

2540	   It has been suggested that a new Protocol ID in the IPv4 Protocol
2541	   field or the IPv6 Next Header field could identify L4S packets.
2542	   However this approach is ruled out by numerous problems:

2544	   o  A duplicate protocol ID would need to be created for each
2545	      transport (TCP, SCTP, UDP, etc.).

2547	   o  In IPv6, there can be a sequence of Next Header fields, and it
2548	      would not be obvious which one would be expected to identify a
2549	      network service like L4S.

2551	   o  A new protocol ID would rarely provide an end-to-end service,
2552	      because it is well known that new protocol IDs are often blocked
2553	      by numerous types of middlebox.

2555	   o  The approach is not a solution for AQM methods below the IP layer.

2557	B.7.  Source or destination addressing

2559	   Locally, a network operator could arrange for L4S service to be
2560	   applied based on source or destination addressing, e.g. packets from
2561	   its own data centre and/or CDN hosts, packets to its business
2562	   customers, etc.  It could use addressing at any layer, e.g. IP
2563	   addresses, MAC addresses, VLAN IDs, etc.  Although addressing might
2564	   be a useful tactical approach for a single ISP, it would not be a
2565	   feasible approach to identify an end-to-end service like L4S.  Even
2566	   for a single ISP, it would require packet classifiers in buffers to
2567	   be dependent on changing topology and address allocation decisions
2568	   elsewhere in the network.  Therefore this approach is not a feasible
2569	   solution.

2571	B.8.  Summary: Merits of Alternative Identifiers

2573	   Table 1 and Table 2 provide a very high level summary of the pros and
2574	   cons detailed against the schemes described respectively in
2575	   Appendix B.1, Appendix B.4, Appendix B.2 and Appendix B.3 for ten
2576	   issues that set them apart.

2578	      +-----------------+----------------------+--------------------+
2579	      | Issue           | ECT(1) + CE (Chosen) |     DSCP + ECN     |
2580	      +-----------------+----------------------+--------------------+
2581	      |                 |  initial   eventual  | initial   eventual |
2582	      |                 |                      |                    |
2583	      | end-to-end      |   . . Y      . . Y   |  N . .      . ? .  |
2584	      | tunnels         |   . . ?      . . Y   |  . O .      . O .  |
2585	      | lower layers    |   . O .      . . ?   |  N . .      . ? .  |
2586	      | spare codepoint |   N . .      . . ?   |  N . .      . . ?  |
2587	      | reordering      |   . O .      . . ?   |  . . Y      . . Y  |
2588	      | Classic ECN AQM |   . O .      . . ?   |  . O .      . . ?  |
2589	      | isolation       |   . . Y      . . Y   |  . . Y      . . Y  |
2590	      | poss w/o L4 IDs |   . . Y      . . Y   |  . . Y      . . Y  |
2591	      | TCP feedback    |   . O .      . . Y   |  . O .      . . Y  |
2592	      | TCP ctrl pkts   |   . O .      . . ?   |  . . Y      . . Y  |
2593	      +-----------------+----------------------+--------------------+

2595	           Table 1: Merits of Alternative L4S Identifiers (pt 1)

2597	       +-----------------+--------------------+--------------------+
2598	       | Issue           |   ECN-DualQ-SCE1   |   ECN-DualQ-SCE0   |
2599	       +-----------------+--------------------+--------------------+
2600	       |                 | initial   eventual | initial   eventual |
2601	       |                 |                    |                    |
2602	       | end-to-end      |  . . Y      . . Y  |  . . Y      . . Y  |
2603	       | tunnels         |  . ? .      . . ?  |  N . .      . ? .  |
2604	       | lower layers    |  N . .      . ? .  |  N . .      . ? .  |
2605	       | spare codepoint |  N . .      N . .  |  N . .      N . .  |
2606	       | reordering      |  N . .      N . .  |  N . .      N . .  |
2607	       | Classic ECN AQM |  . . Y      . . Y  |  . . Y      . . Y  |
2608	       | isolation       |  N . .      N . .  |  . . Y      . . Y  |
2609	       | poss w/o L4 IDs |  N . .      N . .  |  . . Y      . . Y  |
2610	       | TCP feedback    |  N . .      . O .  |  N . .      . O .  |
2611	       | TCP ctrl pkts   |  . O .      . . ?  |  . O .      . . ?  |
2612	       +-----------------+--------------------+--------------------+

2614	           Table 2: Merits of Alternative L4S Identifiers (pt 2)

2616	   The schemes are scored based on both their capabilities now
2617	   ('initial') and in the long term ('eventual').  The scores are one of
2618	   'N, O, Y', meaning 'Poor', 'Ordinary', 'Good' respectively.  The same
2619	   scores are aligned vertically to aid the eye.  A score of "?" in one
2620	   of the positions means that this approach might optimistically become
2621	   this good, given sufficient effort.  The tables summarize the text
2622	   and are not meant to be understandable without having read the text.

2624	Appendix C.  Potential Competing Uses for the ECT(1) Codepoint

2626	   The ECT(1) codepoint of the ECN field has already been assigned once
2627	   for the ECN nonce [RFC3540], which has now been categorized as
2628	   historic [RFC8311].  ECN is probably the only remaining field in the
2629	   Internet Protocol that is common to IPv4 and IPv6 and still has
2630	   potential to work end-to-end, with tunnels and with lower layers.
2631	   Therefore, ECT(1) should not be reassigned to a different
2632	   experimental use (L4S) without carefully assessing competing
2633	   potential uses.  These fall into the following categories:

2635	C.1.  Integrity of Congestion Feedback

2637	   Receiving hosts can fool a sender into downloading faster by
2638	   suppressing feedback of ECN marks (or of losses if retransmissions
2639	   are not necessary or available otherwise).

2641	   The historic ECN nonce protocol [RFC3540] proposed that a TCP sender
2642	   could set either of ECT(0) or ECT(1) in each packet of a flow and
2643	   remember the sequence it had set.  If any packet was lost or
2644	   congestion marked, the receiver would miss that bit of the sequence.
2645	   An ECN Nonce receiver had to feed back the least significant bit of
2646	   the sum, so it could not suppress feedback of a loss or mark without
2647	   a 50-50 chance of guessing the sum incorrectly.
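
   The 50-50 property can be illustrated with a small sketch, treating
   ECT(0) as nonce 0 and ECT(1) as nonce 1.  The helper name nonce_sum,
   the flow length, the random seed and the position of the marked
   packet are arbitrary choices for this example, not part of the
   protocol.

```python
import random


def nonce_sum(nonces):
    """Least significant bit of the sum of the nonces (i.e. their XOR)."""
    total = 0
    for bit in nonces:
        total ^= bit
    return total


rng = random.Random(1)
sent = [rng.randint(0, 1) for _ in range(20)]  # sender remembers these
expected = nonce_sum(sent)

# An honest receiver that saw every nonce reports the right sum.
honest = nonce_sum(sent)

# One packet is CE-marked in the network, which overwrites its nonce.
received = sent[:7] + sent[8:]
partial = nonce_sum(received)

# A receiver hiding the mark must guess the missing bit.
guesses = [partial ^ g for g in (0, 1)]
```

   Exactly one of the two guesses equals the expected sum, so each
   concealed mark or loss is caught with probability one half.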

2649	   It is highly unlikely that ECT(1) will be needed for integrity
2650	   protection in future.  The ECN Nonce RFC [RFC3540] has been
2651	   reclassified as historic, partly because other ways have been
2652	   developed to protect feedback integrity of TCP and other transports
2653	   [RFC8311] that do not consume a codepoint in the IP header.  For
2654	   instance:

2656	   o  The sender can test the integrity of the receiver's feedback by
2657	      occasionally setting the IP-ECN field to a value normally only set
2658	      by the network.  Then it can test whether the receiver's feedback
2659	      faithfully reports what it expects (see para 2 of Section 20.2 of
2660	      [RFC3168]).  This works for loss and it will work for the accurate
2661	      ECN feedback [RFC7560] intended for L4S.

2663	   o  A network can enforce a congestion response to its ECN markings
2664	      (or packet losses) by auditing congestion exposure (ConEx)
2665	      [RFC7713].  Whether the receiver or a downstream network is
2666	      suppressing congestion feedback or the sender is unresponsive to
2667	      the feedback, or both, ConEx audit can neutralise any advantage
2668	      that any of these three parties would otherwise gain.

2670	   o  The TCP authentication option (TCP-AO [RFC5925]) can be used to
2671	      detect any tampering with TCP congestion feedback (whether
2672	      malicious or accidental).  TCP's congestion feedback fields are
2673	      immutable end-to-end, so they are amenable to TCP-AO protection,
2674	      which covers the main TCP header and TCP options by default.
2675	      However, TCP-AO is often too brittle to use on many end-to-end
2676	      paths, where middleboxes can make verification fail in their
2677	      attempts to improve performance or security, e.g. by
2678	      resegmentation or shifting the sequence space.

2680	C.2.  Notification of Less Severe Congestion than CE

2682	   Various researchers have proposed to use ECT(1) as a less severe
2683	   congestion notification than CE, particularly to enable flows to fill
2684	   available capacity more quickly after an idle period, when another
2685	   flow departs or when a flow starts, e.g. VCP [VCP], Queue View (QV)
2686	   [QV].

2688	   Before assigning ECT(1) as an identifier for L4S, we must carefully
2689	   consider whether it might be better to hold ECT(1) in reserve for
2690	   future standardisation of rapid flow acceleration, which is an
2691	   important and enduring problem [RFC6077].

2693	   Pre-Congestion Notification (PCN) is another scheme that assigns
2694	   alternative semantics to the ECN field.  It uses ECT(1) to signify a
2695	   less severe level of pre-congestion notification than CE [RFC6660].
2696	   However, the ECN field only takes on the PCN semantics if packets
2697	   carry a Diffserv codepoint defined to indicate PCN marking within a
2698	   controlled environment.  PCN is required to be applied solely to the
2699	   outer header of a tunnel across the controlled region in order not to
2700	   interfere with any end-to-end use of the ECN field.  Therefore a PCN
2701	   region on the path would not interfere with any of the L4S service
2702	   identifiers proposed in Appendix B.

2704	Authors' Addresses
2705	   Koen De Schepper
2706	   Nokia Bell Labs
2707	   Antwerp
2708	   Belgium

2710	   Email: koen.de_schepper@nokia.com
2711	   URI:   https://www.bell-labs.com/usr/koen.de_schepper

2713	   Bob Briscoe (editor)
2714	   Independent
2715	   UK

2717	   Email: ietf@bobbriscoe.net
2718	   URI:   http://bobbriscoe.net/