Transport Area Working Group                             B. Briscoe, Ed.
Internet-Draft                                        Simula Research Lab
Intended status: Informational                                        K.
                                                             De Schepper
Expires: September 14, 2017                              Nokia Bell Labs
                                                        M. Bagnulo Braun
                                      Universidad Carlos III de Madrid
                                                          March 13, 2017

   Low Latency, Low Loss, Scalable Throughput (L4S) Internet Service:
                              Architecture
                      draft-briscoe-tsvwg-l4s-arch-01

Abstract

   This document describes the L4S architecture for the provision of a
   new service that the Internet could provide to eventually replace
   best efforts for all traffic: Low Latency, Low Loss, Scalable
   throughput (L4S).  It is becoming common for _all_ (or most)
   applications being run by a user at any one time to require low
   latency.  However, the only solution the IETF can offer for ultra-low
   queuing delay is Diffserv, which only favours a minority of packets
   at the expense of others.  In extensive testing the new L4S service
   keeps average queuing delay under a millisecond for _all_
   applications even under very heavy load, without sacrificing
   utilization, and it keeps congestion loss to zero.  It is becoming
   widely recognized that adding more access capacity gives diminishing
   returns, because latency is becoming the critical problem.  Even with
   high capacity broadband access, the reduced latency of L4S remarkably
   and consistently improves performance under load for applications
   such as interactive video, conversational video, voice, Web, gaming,
   instant messaging, remote desktop and cloud-based apps (even when all
   are being used at once over the same access link).  The insight is
   that the root cause of queuing delay is in TCP, not in the queue.  By
   fixing the sending TCP (and other transports), queuing latency
   becomes so much better than today that operators will want to deploy
   the network part of L4S to enable new products and services.
   Further, the network part is simple to deploy - incrementally, with
   zero-config.  Both parts, sender and network, ensure coexistence with
   other legacy traffic.
   At the same time, L4S solves the long-recognized problem with the
   future scalability of TCP throughput.

   This document describes the L4S architecture, briefly describing the
   different components and how they work together to provide the
   aforementioned enhanced Internet service.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . .   3
   2. L4S architecture overview  . . . . . . . . . . . . . . . . .   4
   3. Terminology  . . . . . . . . . . . . . . . . . . . . . . . .   6
   4. L4S architecture components  . . . . . . . . . . . . . . . .   7
   5. Rationale  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     5.1. Why These Primary Components?  . . . . . . . . . . . . .   9
     5.2. Why Not Alternative Approaches?  . . . . . . . . . . . .  10
   6. Applicability  . . . . . . . . . . . . . . . . . . . . . . .  12
     6.1. Use Cases  . . . . . . . . . . . . . . . . . . . . . . .  13
     6.2. Deployment Considerations  . . . . . . . . . . . . . . .  14
   7. IANA Considerations  . . . . . . . . . . . . . . . . . . . .  14
   8. Security Considerations  . . . . . . . . . . . . . . . . . .  14
     8.1. Traffic (Non-)Policing . . . . . . . . . . . . . . . . .  15
     8.2. 'Latency Friendliness' . . . . . . . . . . . . . . . . .  15
     8.3. Policing Prioritized L4S Bandwidth . . . . . . . . . . .  16
     8.4. ECN Integrity  . . . . . . . . . . . . . . . . . . . . .  16
   9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . .  17
   10. References  . . . . . . . . . . . . . . . . . . . . . . . .  17
     10.1. Normative References  . . . . . . . . . . . . . . . . .  17
     10.2. Informative References  . . . . . . . . . . . . . . . .  18
   Appendix A. Required features for scalable transport protocols
               to be safely deployable in the Internet (a.k.a. TCP
               Prague requirements) . . . . . . . . . . . . . . . .  22
   Appendix B. Standardization items  . . . . . . . . . . . . . . .  26
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . .  28

1. Introduction

   It is increasingly common for _all_ of a user's applications at any
   one time to require low delay: interactive Web, Web services, voice,
   conversational video, interactive video, instant messaging, online
   gaming, remote desktop and cloud-based applications.  In the last
   decade or so, much has been done to reduce propagation delay by
   placing caches or servers closer to users.  However, queuing remains
   a major, albeit intermittent, component of latency.
   When present, it typically doubles the path delay relative to the
   base speed-of-light delay.  Low loss is also important because, for
   interactive applications, losses translate into even longer
   retransmission delays.

   It has been demonstrated that, once access network bit rates reach
   levels now common in the developed world, increasing capacity offers
   diminishing returns if latency (delay) is not addressed.
   Differentiated services (Diffserv) offers Expedited Forwarding
   [RFC3246] for some packets at the expense of others, but this is not
   applicable when all (or most) of a user's applications require low
   latency.

   Therefore, the goal is an Internet service with ultra-Low queueing
   Latency, ultra-Low Loss and Scalable throughput (L4S) - for _all_
   traffic.  A service for all traffic will need none of the
   configuration or management baggage (traffic policing, traffic
   contracts) associated with favouring some packets over others.  This
   document describes the L4S architecture for achieving that goal.

   It must be said that queuing delay only degrades performance
   infrequently [Hohlfeld14].  It only occurs when a large enough
   capacity-seeking (e.g. TCP) flow is running alongside the user's
   traffic in the bottleneck link, which is typically in the access
   network, or when the low latency application is itself a large
   capacity-seeking flow (e.g. interactive video).  At these times, the
   performance improvement must be so remarkable that network operators
   will be motivated to deploy it.

   Active Queue Management (AQM) is part of the solution to queuing
   under load.  AQM improves performance for all traffic, but there is a
   limit to how much queuing delay can be reduced by solely changing the
   network without addressing the root of the problem.

   The root of the problem is the presence of standard TCP congestion
   control (Reno [RFC5681]) or compatible variants (e.g.
   TCP Cubic [I-D.ietf-tcpm-cubic]).  We shall call this family of
   congestion controls 'Classic' TCP.  It has been demonstrated that if
   the sending host replaces Classic TCP with a 'Scalable' alternative,
   and a suitable AQM is deployed in the network, the performance under
   load of all the above interactive applications can be stunningly
   improved.  For instance, queuing delay under heavy load with the
   example DCTCP/DualQ solution cited below is roughly 1 millisecond
   (1 ms) at the 99th percentile, without losing link utilization.  This
   compares with 5 to 20 ms on _average_ with a Classic TCP and current
   state-of-the-art AQMs such as fq_CoDel [I-D.ietf-aqm-fq-codel] or PIE
   [RFC8033].  Also, with a Classic TCP, 5 ms of queuing is usually only
   possible by losing some utilization.

   It has been convincingly demonstrated [DCttH15] that it is possible
   to deploy such an L4S service alongside the existing best efforts
   service so that all of a user's applications can shift to it when
   their stack is updated.  Access networks are typically designed with
   one link as the bottleneck for each site (which might be a home,
   small enterprise or mobile device), so deployment at a single node
   should give nearly all the benefit.  The L4S approach requires a
   number of mechanisms in different parts of the Internet to fulfill
   its goal.  This document presents the L4S architecture, by describing
   the different components and how they interact to provide the
   scalable, low-latency, low-loss Internet service.

2. L4S architecture overview

   There are three main components to the L4S architecture (illustrated
   in Figure 1):

   1) Network: The L4S service traffic needs to be isolated from the
      queuing latency of the Classic service traffic.  However, the two
      should be able to freely share a common pool of capacity.
      This is because there is no way to predict how many flows at any
      one time might use each service, and capacity in access networks
      is too scarce to partition into two.  So a 'semi-permeable'
      membrane is needed that partitions latency but not bandwidth.  The
      Dual Queue Coupled AQM [I-D.briscoe-aqm-dualq-coupled] is an
      example of such a semi-permeable membrane.

      Per-flow queuing such as in [I-D.ietf-aqm-fq-codel] could be used,
      but it partitions both latency and bandwidth between every end-to-
      end flow.  So it is rather overkill, which brings disadvantages
      (see Section 5.2), not least that thousands of queues are needed
      when two are sufficient.

   2) Protocol: A host needs to distinguish L4S and Classic packets with
      an identifier so that the network can classify them into their
      separate treatments.  [I-D.briscoe-tsvwg-ecn-l4s-id] considers
      various alternative identifiers, and concludes that all
      alternatives involve compromises, but the ECT(1) codepoint of the
      ECN field is a workable solution.

   3) Host: Scalable congestion controls already exist.  They solve the
      scaling problem with TCP first pointed out in [RFC3649].  The one
      used most widely (in controlled environments) is Data Centre TCP
      (DCTCP [I-D.ietf-tcpm-dctcp]), which has been implemented and
      deployed in Windows Server Editions (since 2012), in Linux and in
      FreeBSD.  Although DCTCP as-is 'works' well over the public
      Internet, most implementations lack certain safety features that
      will be necessary once it is used outside controlled environments
      like data centres (see later).  A similar scalable congestion
      control will also need to be transplanted into protocols other
      than TCP (SCTP, RTP/RTCP, RMCAT, etc.).

   [Figure 1 is an ASCII-art diagram that did not survive extraction.
   Its recoverable content: a Scalable sender and a Classic sender feed
   an IP-ECN Classifier; L4S packets go to a shallow-threshold 'mark'
   AQM and Classic packets to a 'mark/drop' AQM; congestion signals are
   coupled between the two queues, which share capacity under a
   conditional priority scheduler.]

      Figure 1: Components of an L4S Solution: 1) Isolation in separate
        network queues; 2) Packet Identification Protocol; and
        3) Scalable Sending Host

3. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].  In this
   document, these words will appear with that interpretation only when
   in ALL CAPS.  Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.  [COMMENT: Since this
   will be an Informational document, this boilerplate should be
   removed.]

   Classic service:  The 'Classic' service is intended for all the
      congestion control behaviours that currently co-exist with TCP
      Reno (e.g. TCP Cubic, Compound, SCTP, etc.).

   Low-Latency, Low-Loss and Scalable (L4S) service:  The 'L4S' service
      is intended for traffic from scalable TCP algorithms such as Data
      Centre TCP.  But it is also more general - it will allow a set of
      congestion controls with similar scaling properties to DCTCP
      (e.g. Relentless [Mathis09]) to evolve.

      Both Classic and L4S services can cope with a proportion of
      unresponsive or less-responsive traffic as well (e.g. DNS, VoIP,
      etc.).

   Scalable Congestion Control:  A congestion control where the flow
      rate is inversely proportional to the level of congestion signals.
      Then, as the flow rate scales, the number of congestion signals
      per round trip remains invariant, maintaining the same degree of
      control.
      For instance, DCTCP averages 2 congestion signals per round-trip
      whatever the flow rate.

   Classic Congestion Control:  A congestion control with a flow rate
      compatible with standard TCP Reno [RFC5681].  With Classic
      congestion controls, as capacity increases enabling higher flow
      rates, the number of round trips between congestion signals
      (losses or ECN marks) rises in proportion to the flow rate.  So
      control of queuing and/or utilization becomes very slack.  For
      instance, with 1500 B packets and an RTT of 18 ms, as the TCP Reno
      flow rate increases from 2 to 100 Mb/s, the number of round trips
      between congestion signals rises proportionately, from 2 to 100.

      The default congestion control in Linux (TCP Cubic) is Reno-
      compatible for most scenarios expected for some years.  For
      instance, with a typical domestic round-trip time (RTT) of 18 ms,
      TCP Cubic only switches out of Reno-compatibility mode once the
      flow rate approaches 1 Gb/s.  For a typical data centre RTT of
      1 ms, the switch-over point is theoretically 1.3 Tb/s.  However,
      with a less common transcontinental RTT of 100 ms, it only remains
      Reno-compatible up to 13 Mb/s.  All examples assume 1,500 B
      packets.

   Classic ECN:  The original proposed standard Explicit Congestion
      Notification (ECN) protocol [RFC3168], which requires ECN signals
      to be treated the same as drops, both when generated in the
      network and when responded to by the sender.

   Site:  A home, mobile device, small enterprise or campus, where the
      network bottleneck is typically the access link to the site.  Not
      all network arrangements fit this model but it is a useful, widely
      applicable generalisation.

4. L4S architecture components

   The L4S architecture is composed of the following elements.

   Protocols: The L4S architecture encompasses the two protocol changes
   that we describe next:

   a.
      [I-D.briscoe-tsvwg-ecn-l4s-id] recommends that ECT(1) be used as
      the identifier to classify L4S and Classic packets into their
      separate treatments, as required by [RFC4774].

   b. An essential aspect of a scalable congestion control is the use of
      explicit congestion signals rather than losses, because the
      signals need to be sent immediately and frequently - too often to
      use drops.  'Classic' ECN [RFC3168] requires an ECN signal to be
      treated the same as a drop, both when it is generated in the
      network and when it is responded to by hosts.  L4S allows networks
      and hosts to support two separate meanings for ECN.  So the
      standards track [RFC3168] will need to be updated to allow ECT(1)
      packets to depart from the 'same as drop' constraint.

      [I-D.ietf-tsvwg-ecn-experimentation] has been prepared as a
      standards track update to relax specific requirements in RFC 3168
      (and certain other standards track RFCs), which clears the way for
      the above experimental changes proposed for L4S.
      [I-D.ietf-tsvwg-ecn-experimentation] also obsoletes the original
      experimental assignment of the ECT(1) codepoint as an ECN nonce
      [RFC3540] (it was never deployed, and it offers no security
      benefit now that deployment is optional).

   Network components: The Dual Queue Coupled AQM has been specified as
   generically as possible [I-D.briscoe-aqm-dualq-coupled] as a 'semi-
   permeable' membrane, without specifying the particular AQMs to use in
   the two queues.  An informational appendix of the draft provides
   pseudocode examples of different possible AQM approaches.

   Initially a zero-config variant of RED called Curvy RED was
   implemented, tested and documented.  The aim is for designers to be
   free to implement diverse ideas.  So the brief normative body of the
   draft only specifies the minimum constraints an AQM needs to comply
   with to ensure that the L4S and Classic services will coexist.
   For instance, a variant of PIE called Dual PI Squared [PI2] has been
   implemented and found to perform better over a wide range of
   conditions, so it has been documented in a second appendix of
   [I-D.briscoe-aqm-dualq-coupled].

   Host mechanisms: The L4S architecture includes a number of mechanisms
   in the end host that we enumerate next:

   a. Data Centre TCP is the most widely used example of a scalable
      congestion control.  It is being documented in the TCPM WG as an
      informational record of the protocol currently in use
      [I-D.ietf-tcpm-dctcp].  It will be necessary to define a number of
      safety features for a variant usable on the public Internet.  A
      draft list of these, known as the TCP Prague requirements, has
      been drawn up (see Appendix A).  The list also includes some
      optional performance improvements.

   b. Transport protocols other than TCP use various congestion controls
      designed to be friendly with Classic TCP.  Before they can use the
      L4S service, it will be necessary to implement scalable variants
      of each of these transport behaviours.  The following standards
      track RFCs currently define these protocols: ECN in TCP [RFC3168],
      in SCTP [RFC4960], in RTP [RFC6679], and in DCCP [RFC4340].  Not
      all are in widespread use, but those that are will eventually need
      to be updated to allow a different congestion response, which they
      will have to indicate by using the ECT(1) codepoint.  Scalable
      variants are under consideration for some new transport protocols
      that are themselves under development, e.g. QUIC
      [I-D.johansson-quic-ecn] and certain real-time media congestion
      avoidance techniques (RMCAT) protocols.

   c.
      ECN feedback is sufficient for L4S in some transport protocols
      (RTCP, DCCP) but not others:

      *  For the case of TCP, the feedback protocol for ECN embeds the
         assumption from Classic ECN that it is the same as drop, making
         it unusable for a scalable TCP.  Therefore, the implementation
         of TCP receivers will have to be upgraded [RFC7560].  Work to
         standardize more accurate ECN feedback for TCP (AccECN
         [I-D.ietf-tcpm-accurate-ecn]) is already in progress.

      *  ECN feedback is only roughly sketched in an appendix of the
         SCTP specification.  A fuller specification has been proposed
         [I-D.stewart-tsvwg-sctpecn], which would need to be implemented
         and deployed before SCTP could support L4S.

5. Rationale

5.1. Why These Primary Components?

   Explicit congestion signalling (protocol):  Explicit congestion
      signalling is a key part of the L4S approach.  In contrast, use of
      drop as a congestion signal creates a tension, because drop is
      both a useful signal (more drops would reduce queuing delay) and
      an impairment (fewer drops would reduce retransmission delay).
      Explicit congestion signals can be used many times per round trip,
      to keep tight control, without any impairment.  Under heavy load,
      even more explicit signals can be applied so the queue can be kept
      short whatever the load, whereas state-of-the-art AQMs have to
      introduce very high packet drop at high load to keep the queue
      short.  Further, TCP's sawtooth reduction can be smaller, and
      therefore return to the operating point more often, without
      worrying that this causes more signals (one at the top of each
      smaller sawtooth).  The consequent smaller amplitude sawteeth fit
      between a very shallow marking threshold and an empty queue, so
      delay variation can be very low, without risk of under-
      utilization.
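      The scaling contrast behind this argument can be made concrete
      with a small calculation, using the same numbers as the example in
      the Terminology section (1,500 B packets, 18 ms RTT).  This is an
      illustrative sketch only, not part of any specification; it uses
      the standard Reno sawtooth approximation (average window 3/4 of
      the peak Wmax, window growing by one packet per RTT from Wmax/2
      back to Wmax, one congestion signal per sawtooth):

```python
PKT_BITS = 1500 * 8   # packet size from the example above, in bits
RTT = 0.018           # 18 ms round-trip time

def reno_rtts_between_signals(avg_rate_bps):
    """Round trips between congestion signals for a Reno-like sawtooth.

    Approximation: average window = 3/4 * Wmax; the window climbs one
    packet per RTT from Wmax/2 back to Wmax, so each sawtooth lasts
    Wmax/2 round trips and carries exactly one signal (loss or mark).
    """
    avg_window = avg_rate_bps * RTT / PKT_BITS   # packets in flight
    w_max = avg_window / 0.75                    # peak of the sawtooth
    return w_max / 2

def scalable_rtts_between_signals(avg_rate_bps):
    """A DCTCP-style control sees ~2 marks per RTT at any flow rate."""
    return 0.5

for mbps in (2, 100):
    rate = mbps * 1e6
    print(f"{mbps:>3} Mb/s: Reno {reno_rtts_between_signals(rate):6.1f}"
          f" RTTs/signal, scalable {scalable_rtts_between_signals(rate):.1f}")
```

      At 2 Mb/s this gives 2 round trips between Reno signals; at
      100 Mb/s it gives 100, matching the Terminology example, while the
      scalable control's signalling interval stays constant.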
      All the above makes it clear that explicit congestion signalling
      is only advantageous for latency if it does not have to be
      considered 'the same as' drop (as required with Classic ECN
      [RFC3168]).  Therefore, in a DualQ AQM, the L4S queue uses a new
      L4S variant of ECN that is not equivalent to drop
      [I-D.briscoe-tsvwg-ecn-l4s-id], while the Classic queue uses
      either Classic ECN [RFC3168] or drop, which are equivalent.

      Before Classic ECN was standardized, there were various proposals
      to give an ECN mark a different meaning from drop.  However, there
      was no particular reason to agree on any one of the alternative
      meanings, so 'the same as drop' was the only compromise that could
      be reached.  RFC 3168 contains a statement that:

         "An environment where all end nodes were ECN-Capable could
         allow new criteria to be developed for setting the CE
         codepoint, and new congestion control mechanisms for end-node
         reaction to CE packets.  However, this is a research issue, and
         as such is not addressed in this document."

   Latency isolation with coupled congestion notification (network):
      Using just two queues is not essential to L4S (more would be
      possible), but it is the simplest way to isolate all the L4S
      traffic that keeps latency low from all the legacy Classic traffic
      that does not.

      Similarly, coupling the congestion notification between the queues
      is not necessarily essential, but it is a clever and simple way to
      allow senders to determine their rate, packet-by-packet, rather
      than have it overridden by a network scheduler.  Otherwise, a
      network scheduler would have to inspect at least transport layer
      headers, and it would have to continually assign a rate to each
      flow without any easy way to understand application intent.
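      To see why a simple coupling lets senders determine their own
      rates, consider the steady-state models behind the DualQ design: a
      Classic (Reno-like) flow's rate varies roughly as 1/sqrt(p), while
      a scalable flow's varies as 1/p.  If the AQM derives the Classic
      drop probability as the square of a base probability p' and the
      L4S marking probability as k * p', the two rates stay in a fixed
      ratio at any load, with no per-flow scheduling.  The sketch below
      is illustrative only (the constants 1.22, 2 and k = 2 are textbook
      approximations, not values taken from
      [I-D.briscoe-aqm-dualq-coupled], which derives k itself):

```python
import math

K = 2.0   # assumed coupling factor between the two queues' signals

def coupled_probs(p_base):
    """One base probability p' drives both signals: the Classic queue
    drops (or marks) with p'**2, while the L4S queue marks with K * p'."""
    return p_base ** 2, min(1.0, K * p_base)

def classic_rate(p_drop, rtt, pkt_bits):
    """Reno-like steady state: rate ~ 1.22 * pkt / (rtt * sqrt(p))."""
    return 1.22 * pkt_bits / (rtt * math.sqrt(p_drop))

def scalable_rate(p_mark, rtt, pkt_bits):
    """DCTCP-like steady state: rate ~ 2 * pkt / (rtt * p)."""
    return 2 * pkt_bits / (rtt * p_mark)

# The ratio of the two rates is independent of the base probability,
# so the flows share capacity in a fixed proportion at any load.
for p_base in (0.01, 0.05, 0.2):
    p_c, p_l = coupled_probs(p_base)
    ratio = classic_rate(p_c, 0.018, 12000) / scalable_rate(p_l, 0.018, 12000)
    print(f"p'={p_base}: classic/scalable rate ratio = {ratio:.3f}")
```

      Substituting the formulas shows the ratio collapses to
      1.22 * k / 2, independent of p', which is the sense in which the
      coupling balances the two services without a scheduler assigning
      rates.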
   L4S packet identifier (protocol):  Once there are at least two
      separate treatments in the network, hosts need an identifier at
      the IP layer to distinguish which treatment they intend to use.

   Scalable congestion notification (host):  A scalable congestion
      control keeps the signalling frequency high, so that rate
      variations can be small when signalling is stable, and rate can
      track variations in available capacity as rapidly as possible
      otherwise.

5.2. Why Not Alternative Approaches?

   All the following approaches address some part of the same problem
   space as L4S.  In each case, it is shown that L4S complements them or
   improves on them, rather than being a mutually exclusive alternative:

   Diffserv:  Diffserv addresses the problem of bandwidth apportionment
      for important traffic, as well as queuing latency for delay-
      sensitive traffic.  L4S solely addresses the problem of queuing
      latency (as well as loss and throughput scaling).  Diffserv will
      still be necessary where important traffic requires priority (e.g.
      for commercial reasons, or for protection of critical
      infrastructure traffic).  Nonetheless, if there are Diffserv
      classes for important traffic, the L4S approach can provide low
      latency for _all_ traffic within each Diffserv class (including
      the case where there is only one Diffserv class).

      Also, as already explained, Diffserv only works for a small subset
      of the traffic on a link.  It is not applicable when all the
      applications in use at one time at a single site (home, small
      business or mobile device) require low latency.  Also, because L4S
      is for all traffic, it needs none of the management baggage
      (traffic policing, traffic contracts) associated with favouring
      some packets over others.  This baggage has held Diffserv back
      from widespread end-to-end deployment.
   State-of-the-art AQMs:  AQMs such as PIE and fq_CoDel give a
      significant reduction in queuing delay relative to no AQM at all.
      The L4S work is intended to complement these AQMs, and we
      definitely do not want to distract from the need to deploy them as
      widely as possible.  Nonetheless, without addressing the large
      sawtoothing rate variations of Classic congestion controls, AQMs
      alone cannot reduce queuing delay very far without significantly
      reducing link utilization.  The L4S approach resolves this tension
      by ensuring hosts can minimize the size of their sawteeth without
      appearing so aggressive to legacy flows that they starve.

   Per-flow queuing:  Similarly, per-flow queuing is not incompatible
      with the L4S approach.  However, one queue for every flow can be
      thought of as overkill compared to the minimum of two queues for
      all traffic needed for the L4S approach.  The overkill of per-flow
      queuing has side-effects:

      A. fq makes high performance networking equipment costly
         (processing and memory) - in contrast, dual queue code can be
         very simple;

      B. fq requires packet inspection into the end-to-end transport
         layer, which doesn't sit well alongside encryption for privacy
         - in contrast, a dual queue only operates at the IP layer;

      C. fq isolates the queuing of each flow from the others, and it
         prevents any one flow from consuming more than 1/N of the
         capacity.  In contrast, all L4S flows are expected to keep the
         queue shallow, and policing of individual flows to enforce this
         may be applied separately, as a policy choice.

         An fq scheduler has to decide packet-by-packet which flow to
         schedule, without knowing application intent.  Whereas a
         separate policing function can be configured less strictly, so
         that senders can still control the instantaneous rate of each
         flow dependent on the needs of each application (e.g.
         variable rate video), giving more wriggle-room before a flow is
         deemed non-compliant.  Also, policing of queuing and of flow
         rates can be applied independently.

   Alternative Back-off ECN (ABE):  Yet again, L4S is not an alternative
      to ABE but a complement that introduces much lower queuing delay.
      ABE [I-D.khademi-tcpm-alternativebackoff-ecn] alters the host
      behaviour in response to ECN marking to utilize a link better and
      give ECN flows faster throughput, but it assumes the network still
      treats ECN and drop the same.  Therefore ABE exploits any lower
      queuing delay that AQMs can provide.  But, as explained above,
      AQMs still cannot reduce queuing delay very far without losing
      link utilization (for other, non-ABE flows).

6. Applicability

   A transport layer that solves the current latency issues will provide
   new service, product and application opportunities.

   With the L4S approach, the following existing applications will
   immediately experience significantly better quality of experience
   under load in the best effort class:

   o  Gaming

   o  VoIP

   o  Video conferencing

   o  Web browsing

   o  (Adaptive) video streaming

   o  Instant messaging

   The significantly lower queuing latency also enables some interactive
   application functions to be offloaded to the cloud that would hardly
   even be usable today:

   o  Cloud based interactive video

   o  Cloud based virtual and augmented reality

   The above two applications have been successfully demonstrated with
   L4S, both running together over a 40 Mb/s broadband access link
   loaded up with the numerous other latency sensitive applications in
   the previous list, as well as numerous downloads.  A panoramic video
   of a football stadium can be swiped and pinched so that, on the fly,
   a proxy in the cloud generates a sub-window of the match video under
   the finger-gesture control of each user.
   At the same time, a virtual reality headset fed from a 360 degree
   camera in a racing car has been demonstrated, where the user's head
   movements control the scene generated in the cloud.  In both cases,
   with 7 ms end-to-end base delay, the additional queuing delay of
   roughly 1 ms is so low that it seems the video is generated locally.
   See https://riteproject.eu/dctth/ for videos of these demonstrations.

   Using a swiping finger gesture or head movement to pan a video is
   extremely demanding - far more demanding than VoIP - because human
   vision can detect extremely low delays of the order of single
   milliseconds when delay is translated into a visual lag between a
   video and a reference point (the finger or the orientation of the
   head).

   If low network delay is not available, all fine interaction has to be
   done locally, and therefore much more redundant data has to be
   downloaded.  When all interactive processing can be done in the
   cloud, only the data to be rendered for the end user needs to be
   sent.  And once applications can rely on minimal queues in the
   network, they can focus on reducing their own latency by minimizing
   only the application send queue.

6.1. Use Cases

   The following use-cases for L4S are being considered by various
   interested parties:

   o  Where the bottleneck is one of various types of access network:
      DSL, cable, mobile, satellite.

      *  Radio links (cellular, WiFi) that are distant from the source
         are particularly challenging.
The radio link capacity can vary rapidly by orders of magnitude, so it is often desirable to hold a buffer to utilise sudden increases of capacity;

   *  Cellular networks are further complicated by a perceived need to buffer in order to make hand-overs imperceptible;

   *  Satellite networks generally have a very large base RTT, so even with minimal queuing, overall delay can never be extremely low;

   *  Nonetheless, it is certainly desirable not to hold a buffer purely because of the sawteeth of Classic TCP, when it is more than is needed for all the above reasons.

o  Private networks of heterogeneous data centres, where there is no single administrator that can arrange for all the simultaneous changes to senders, receivers and network needed to deploy DCTCP:

   *  a set of private data centres interconnected over a wide area with separate administrations, but within the same company

   *  a set of data centres operated by separate companies interconnected by a community of interest network (e.g. for the finance sector)

   *  multi-tenant (cloud) data centres where tenants choose their operating system stack (Infrastructure as a Service - IaaS)

o  Different types of transport (or application) congestion control:

   *  elastic (TCP/SCTP);

   *  real-time (RTP, RMCAT);

   *  query (DNS/LDAP).

o  Where low delay quality of service is required, but without inspecting or intervening above the IP layer [I-D.you-encrypted-traffic-management]:

   *  Mobile and other networks have tended to inspect higher layers in order to guess application QoS requirements. However, with growing demand for support of privacy and encryption, L4S offers an alternative. There is no need to select which traffic to favour for queuing, when L4S gives favourable queuing to all traffic.
o  If queuing delay is minimized, applications with a fixed delay budget can communicate over longer distances, or via a longer chain of service functions [RFC7665] or onion routers.

6.2. Deployment Considerations

{ToDo: This section TBA - currently, bullet points only.}

Incremental deployment parts.

Possible deployment sequences.

Prioritizing the most-likely bottlenecks in the various use-cases (access links, downstream and upstream, broadband, mobile, DC, etc).

Deployment incentives: Immediate vs. deferred benefits.

7. IANA Considerations

This specification contains no IANA considerations.

8. Security Considerations

8.1. Traffic (Non-)Policing

Because the L4S service can serve all traffic that is using the capacity of a link, it should not be necessary to police access to the L4S service. In contrast, Diffserv only works if some packets get less favourable treatment than others. So it has to use traffic policers to limit how much traffic can be favoured. In turn, traffic policers require traffic contracts between users and networks as well as pairwise between networks. Because L4S will lack all this management complexity, it is more likely to work end-to-end.

During early deployment (and perhaps always), some networks will not offer the L4S service. These networks do not need to police or re-mark L4S traffic - they just forward it unchanged as best efforts traffic, as they would already forward traffic with ECT(1) today. At a bottleneck, such networks will introduce some queuing and dropping. When a scalable congestion control detects a drop it will have to respond as if it is a Classic congestion control (see item 4-1 in Appendix A).
This will ensure safe interworking with other traffic at the 'legacy' bottleneck, but it will degrade the L4S service to no better (but never worse) than classic best efforts, whenever a legacy (non-L4S) bottleneck is encountered on a path.

Certain network operators might choose to restrict access to the L4S class, perhaps only to customers who have paid a premium. Their packet classifier (item 2 in Figure 1) could identify such customers against some other field (e.g. source address range) as well as ECN. If only the ECN L4S identifier matched, but not the source address (say), the classifier could direct these packets (from non-paying customers) into the Classic queue. Allowing operators to use an additional local classifier is intended to remove any incentive to bleach the L4S identifier. Then at least the L4S ECN identifier will be more likely to survive end-to-end even though the service may not be supported at every hop. Such arrangements would only require simple registered/not-registered packet classification, rather than the managed application-specific traffic policing against customer-specific traffic contracts that Diffserv requires.

8.2. 'Latency Friendliness'

The L4S service does rely on self-constraint - not in terms of limiting capacity usage, but in terms of limiting burstiness. It is hoped that standardisation of dynamic behaviour (cf. TCP slow-start) and self-interest will be sufficient to prevent transports from sending excessive bursts of L4S traffic, given the application's own latency will suffer most from such behaviour.

Whether burst policing becomes necessary remains to be seen. Without it, there will be potential for attacks on the low latency of the L4S service. However it may only be necessary to apply such policing reactively, e.g. punitively targeted at any deployments of new bursty malware.

8.3.
Policing Prioritized L4S Bandwidth

As mentioned in Section 5.2, L4S should remove the need for low latency Diffserv classes. However, those Diffserv classes that give certain applications or users priority over capacity would still be applicable. Then, within such Diffserv classes, L4S would often be applicable to give traffic low latency and low loss. Within such a class, the bandwidth available to a user or application is often limited by a rate policer. Similarly, in the default Diffserv class, rate policers are used to partition shared capacity.

A classic rate policer drops any packets exceeding a set rate, usually also giving a burst allowance (variants exist where the policer re-marks non-compliant traffic to a discard-eligible Diffserv codepoint, so that it may be dropped elsewhere during contention). In networks that deploy L4S and use rate policers, it will be preferable to deploy a policer designed to be more friendly to the L4S service.

This might be achieved by setting a threshold where ECN marking is introduced, such that it is just under the policed rate or just under the burst allowance where drop is introduced. This could be applied to various types of policer, e.g. [RFC2697], [RFC2698] or the local (non-ConEx) variant of the ConEx congestion policer [I-D.briscoe-conex-policing]. Otherwise, whenever L4S traffic encounters a rate policer, it will experience drops and the source will fall back to a Classic congestion control, thus losing all the benefits of L4S.

Further discussion of the applicability of L4S to the various Diffserv classes, and the design of suitable L4S rate policers, are left for further study.

8.4. ECN Integrity

Receiving hosts can fool a sender into downloading faster by suppressing feedback of ECN marks (or of losses if retransmissions are not necessary or available otherwise).
[RFC3540] proposes that a TCP sender could pseudorandomly set either of ECT(0) or ECT(1) in each packet of a flow and remember the sequence it had set, termed the ECN nonce. If the receiver supports the nonce, it can prove that it is not suppressing feedback by reflecting its knowledge of the sequence back to the sender. The nonce was proposed on the assumption that receivers might be more likely to cheat congestion control than senders (although senders also have a motive to cheat).

If L4S uses the ECT(1) codepoint of ECN for packet classification, it will have to obsolete the experimental nonce. As far as is known, the ECN nonce has never been deployed, and it was only implemented for a couple of testbed evaluations. It would be nearly impossible to deploy now, because any misbehaving receiver can simply opt out, which would be unremarkable given all receivers currently opt out.

Other ways to protect TCP feedback integrity have since been developed. For instance:

o  The sender can test the integrity of the receiver's feedback by occasionally setting the IP-ECN field to a value normally only set by the network. Then it can test whether the receiver's feedback faithfully reports what it expects [I-D.moncaster-tcpm-rcv-cheat]. This method consumes no extra codepoints. It works for loss and it will work for ECN feedback in any transport protocol suitable for L4S. However, it shares the same assumption as the nonce: that the sender is not cheating and is motivated to prevent the receiver cheating;

o  A network can enforce a congestion response to its ECN markings (or packet losses) by auditing congestion exposure (ConEx) [RFC7713].
Whether the receiver or a downstream network is suppressing congestion feedback or the sender is unresponsive to the feedback, or both, ConEx audit can neutralise any advantage that any of these three parties would otherwise gain. ConEx is only currently defined for IPv6 and consumes a destination option header. It has been implemented, but not deployed as far as is known.

9. Acknowledgements

Thanks to Wes Eddy, Karen Nielsen and David Black for their useful review comments.

10. References

10.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

10.2. Informative References

[Alizadeh-stability] Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis of DCTCP: Stability, Convergence, and Fairness", ACM SIGMETRICS 2011, June 2011.

[DCttH15] De Schepper, K., Bondarenko, O., Tsang, I., and B. Briscoe, "'Data Centre to the Home': Ultra-Low Latency for All", 2015. (Under submission)

[Hohlfeld14] Hohlfeld, O., Pujol, E., Ciucu, F., Feldmann, A., and P. Barford, "A QoE Perspective on Sizing Network Buffers", Proc. ACM Internet Measurement Conf (IMC'14), November 2014.

[I-D.briscoe-aqm-dualq-coupled] De Schepper, K., Briscoe, B., Bondarenko, O., and I. Tsang, "DualQ Coupled AQM for Low Latency, Low Loss and Scalable Throughput", draft-briscoe-aqm-dualq-coupled-01 (work in progress), March 2016.

[I-D.briscoe-conex-policing] Briscoe, B., "Network Performance Isolation using Congestion Policing", draft-briscoe-conex-policing-01 (work in progress), February 2014.

[I-D.briscoe-tsvwg-ecn-l4s-id] De Schepper, K., Briscoe, B., and I.
Tsang, "Identifying Modified Explicit Congestion Notification (ECN) Semantics for Ultra-Low Queuing Delay", draft-briscoe-tsvwg-ecn-l4s-id-02 (work in progress), October 2016.

[I-D.ietf-aqm-fq-codel] Hoeiland-Joergensen, T., McKenney, P., dave.taht@gmail.com, d., Gettys, J., and E. Dumazet, "The FlowQueue-CoDel Packet Scheduler and Active Queue Management Algorithm", draft-ietf-aqm-fq-codel-06 (work in progress), March 2016.

[I-D.ietf-tcpm-accurate-ecn] Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More Accurate ECN Feedback in TCP", draft-ietf-tcpm-accurate-ecn-02 (work in progress), October 2016.

[I-D.ietf-tcpm-cubic] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", draft-ietf-tcpm-cubic-04 (work in progress), February 2017.

[I-D.ietf-tcpm-dctcp] Bensley, S., Eggert, L., Thaler, D., Balasubramanian, P., and G. Judd, "Datacenter TCP (DCTCP): TCP Congestion Control for Datacenters", draft-ietf-tcpm-dctcp-04 (work in progress), February 2017.

[I-D.ietf-tsvwg-ecn-experimentation] Black, D., "Explicit Congestion Notification (ECN) Experimentation", draft-ietf-tsvwg-ecn-experimentation-01 (work in progress), March 2017.

[I-D.johansson-quic-ecn] Johansson, I., "ECN support in QUIC", draft-johansson-quic-ecn-01 (work in progress), February 2017.

[I-D.khademi-tcpm-alternativebackoff-ecn] Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst, "TCP Alternative Backoff with ECN (ABE)", draft-khademi-tcpm-alternativebackoff-ecn-01 (work in progress), October 2016.

[I-D.moncaster-tcpm-rcv-cheat] Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to Allow Senders to Identify Receiver Non-Compliance", draft-moncaster-tcpm-rcv-cheat-03 (work in progress), July 2014.

[I-D.stewart-tsvwg-sctpecn] Stewart, R., Tuexen, M., and X.
Dong, "ECN for Stream Control Transmission Protocol (SCTP)", draft-stewart-tsvwg-sctpecn-05 (work in progress), January 2014.

[I-D.you-encrypted-traffic-management] You, J. and C. Xiong, "The Effect of Encrypted Traffic on the QoS Mechanisms in Cellular Networks", draft-you-encrypted-traffic-management-00 (work in progress), October 2015.

[Mathis09] Mathis, M., "Relentless Congestion Control", PFLDNeT'09, May 2009.

[NewCC_Proc] Eggert, L., "Experimental Specification of New Congestion Control Algorithms", IETF Operational Note ion-tsv-alt-cc, July 2007.

[PI2] De Schepper, K., Bondarenko, O., Tsang, I., and B. Briscoe, "PI^2: A Linearized AQM for both Classic and Scalable TCP", Proc. ACM CoNEXT 2016 pp.105-119, December 2016.

[RFC2697] Heinanen, J. and R. Guerin, "A Single Rate Three Color Marker", RFC 2697, DOI 10.17487/RFC2697, September 1999.

[RFC2698] Heinanen, J. and R. Guerin, "A Two Rate Three Color Marker", RFC 2698, DOI 10.17487/RFC2698, September 1999.

[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001.

[RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, J., Courtney, W., Davari, S., Firoiu, V., and D. Stiliadis, "An Expedited Forwarding PHB (Per-Hop Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002.

[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit Congestion Notification (ECN) Signaling with Nonces", RFC 3540, DOI 10.17487/RFC3540, June 2003.

[RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", RFC 3649, DOI 10.17487/RFC3649, December 2003.

[RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, DOI 10.17487/RFC4340, March 2006.
[RFC4774] Floyd, S., "Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field", BCP 124, RFC 4774, DOI 10.17487/RFC4774, November 2006.

[RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", RFC 4960, DOI 10.17487/RFC4960, September 2007.

[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

[RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., and K. Carlberg, "Explicit Congestion Notification (ECN) for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 2012.

[RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe, "Problem Statement and Requirements for Increased Accuracy in Explicit Congestion Notification (ECN) Feedback", RFC 7560, DOI 10.17487/RFC7560, August 2015.

[RFC7665] Halpern, J., Ed. and C. Pignataro, Ed., "Service Function Chaining (SFC) Architecture", RFC 7665, DOI 10.17487/RFC7665, October 2015.

[RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) Concepts, Abstract Mechanism, and Requirements", RFC 7713, DOI 10.17487/RFC7713, December 2015.

[RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, "Proportional Integral Controller Enhanced (PIE): A Lightweight Control Scheme to Address the Bufferbloat Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017.

[TCP-sub-mss-w] Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion Window for Small Round Trip Times", BT Technical Report TR-TUB8-2015-002, May 2015.

[TCPPrague] Briscoe, B., "Notes: DCTCP evolution 'bar BoF': Tue 21 Jul 2015, 17:40, Prague", tcpprague mailing list archive, July 2015.

Appendix A. Required features for scalable transport protocols to be safely deployable in the Internet (a.k.a.
TCP Prague requirements)

This appendix contains a list of features, mechanisms and modifications to currently defined behaviour for scalable transport protocols, so that they can be safely deployed over the public Internet. This list of requirements was produced at an ad hoc meeting during IETF-94 in Prague [TCPPrague].

One such scalable transport protocol is DCTCP, currently specified in [I-D.ietf-tcpm-dctcp]. In its current form, DCTCP is specified to be deployable in controlled environments, and deploying it in the public Internet would lead to a number of issues, both from the safety and the performance perspective. In this section, we describe the modifications and additional mechanisms that are required for its deployment over the global Internet. We use DCTCP as a base, but it is likely that most of these requirements equally apply to other scalable transport protocols.

We next provide a brief description of each required feature.

Requirement #4.1: Fall back to Reno/Cubic congestion control on packet loss.

Description: In case of packet loss, the scalable transport MUST react as classic TCP would (whichever classic version of TCP is running in the host, e.g. Reno or Cubic).

Motivation: One of the safety conditions for deploying a scalable transport over the public Internet is to make sure that it behaves properly when some or all of the network devices connecting the two endpoints that implement the scalable transport have not been upgraded. In particular, it may be the case that some of the switches along the path between the two endpoints only react to congestion by dropping packets (i.e. no ECN marking). It is important that in these cases the scalable transport reacts to the congestion signal in the form of a packet drop similarly to classic TCP.
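As an illustration of this requirement only (the class, its method names and the EWMA gain below are assumptions for the sketch, not taken from any specification), a per-RTT congestion update that falls back to a classic response on loss might look like:

```python
# Illustrative sketch of Requirement #4.1 (assumed names, not from any
# spec): a scalable sender keeps a DCTCP-style EWMA of the CE-marking
# fraction, but on packet loss it reacts exactly as a classic sender.

class ScalableSender:
    def __init__(self, cwnd=10.0, g=1 / 16):
        self.cwnd = cwnd    # congestion window, in MSS
        self.alpha = 0.0    # EWMA of the fraction of CE-marked packets
        self.g = g          # EWMA gain (DCTCP typically uses 1/16)

    def on_round(self, acked, ce_marked, loss=False):
        """Apply one congestion update per RTT from that round's feedback."""
        frac = ce_marked / acked if acked else 0.0
        self.alpha += self.g * (frac - self.alpha)
        if loss:
            # MUST: fall back to the classic response on loss
            # (Reno-style halving here; Cubic's 0.7 factor also qualifies).
            self.cwnd = max(2.0, self.cwnd / 2)
        elif ce_marked:
            # Scalable (DCTCP-style) proportional response to ECN marks.
            self.cwnd = max(2.0, self.cwnd * (1 - self.alpha / 2))
        else:
            self.cwnd += 1.0    # additive increase
```

In this sketch, a round that contains a loss halves the window exactly as a classic sender would, however many CE marks were also received; the scalable reduction applies only in loss-free rounds.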
In the particular case of DCTCP, the current DCTCP specification states that "It is RECOMMENDED that an implementation deal with loss episodes in the same way as conventional TCP." For safe deployment in the public Internet of a scalable transport, the above requirement needs to be defined as a MUST.

Packet loss, while rare, may also occur in the case that the bottleneck is L4S capable. In this case, the sender may receive a high number of packets marked with the CE bit set and also experience a loss. Current DCTCP implementations react differently to this situation. At least one implementation reacts only to the drop signal (e.g. by halving the CWND) and at least another DCTCP implementation reacts to both signals (e.g. by halving the CWND due to the drop and also further reducing the CWND based on the proportion of marked packets). We believe that further experimentation is needed to understand what is the best behaviour for the public Internet, which may or may not be one of the existing implementations.

Requirement #4.2: Fall back to Reno/Cubic congestion control on classic ECN bottlenecks.

Description: The scalable transport protocol SHOULD/MAY? behave as classic TCP with classic ECN if the path contains a legacy bottleneck which marks both ECT(0) and ECT(1) in the same way as drop (a non-L4S, but ECN capable, bottleneck).

Motivation: Similarly to Requirement #4.1, this requirement is a safety condition in case L4S-capable endpoints are communicating over a path that contains one or more non-L4S but ECN capable switches and one of them happens to be the bottleneck. In this case, the scalable transport will attempt to fill the buffer of the bottleneck switch up to the marking threshold and produce a small sawtooth around that operating point.
The result is that the switch will set its operating point with the buffer full and all other non-scalable transports will be starved (as they will react by reducing their CWND more aggressively than the scalable transport).

Scalable transports then MUST be able to detect the presence of a classic ECN bottleneck and fall back to classic TCP/classic ECN behaviour in this case.

Discussion: It is not clear at this point if it is possible to design a mechanism that always detects the aforementioned cases. One possibility is to base the detection on a measured increase on top of a minimum RTT, but it is not yet clear which value should trigger this. Having a delay-based fall-back response in L4S may also be beneficial for preserving low latency even without legacy network nodes. Even if it is possible to design such a mechanism, it may well encompass additional complexity that implementers may consider unnecessary. The need for this mechanism depends on the extent of classic ECN deployment.

Requirement #4.3: Reduce RTT dependence.

Description: Scalable transport congestion control algorithms MUST reduce or eliminate the RTT bias within the range of RTTs available.

Motivation: Classic TCP's throughput is known to be inversely proportional to RTT. One would expect flows over very low RTT paths to nearly starve flows over larger RTTs. However, because Classic TCP induces a large queue, it has never allowed a very low RTT path to exist so far. For instance, consider two paths with base RTT 1 ms and 100 ms. If Classic TCP induces a 20 ms queue, it turns these RTTs into 21 ms and 120 ms, leading to a throughput ratio of about 1:6. Whereas if a Scalable TCP induces only a 1 ms queue, the ratio is 2:101. Therefore, with small queues, long RTT flows will essentially starve.
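The ratios quoted above follow directly from the rule of thumb that a flow's rate is inversely proportional to its RTT; a quick check (illustrative sketch only; the function name is an assumption):

```python
# Quick check of the throughput ratios in the motivation above, using
# the rule of thumb that classic TCP throughput is proportional to 1/RTT.

def throughput_ratio(base_rtt_a_ms, base_rtt_b_ms, queue_ms):
    """Rate of flow A relative to flow B when both share one bottleneck
    queue and each flow's rate is proportional to 1/RTT (delays in ms)."""
    return (base_rtt_b_ms + queue_ms) / (base_rtt_a_ms + queue_ms)

# Classic TCP, 20 ms induced queue: 120/21, i.e. roughly 6:1 (the "1:6"
# ratio seen from the long-RTT flow's side).
classic = throughput_ratio(1, 100, 20)

# Scalable TCP, 1 ms induced queue: 101/2 = 50.5, i.e. the 2:101 ratio.
scalable = throughput_ratio(1, 100, 1)
```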
Scalable transport protocols MUST then accommodate flows across the range of RTTs enabled by the deployment of the L4S service over the public Internet.

Requirement #4.4: Scaling down the congestion window.

Description: Scalable transports MUST be responsive to congestion when RTTs are significantly smaller than in the current public Internet.

Motivation: As currently specified, the minimum CWND of TCP (and of scalable extensions such as DCTCP) is set to 2 MSS. Once this minimum CWND is reached, the transport protocol ceases to react to congestion signals (the CWND is not further reduced beyond this minimum size).

L4S mechanisms significantly reduce queueing delay, achieving smaller RTTs over the Internet. For the same CWND, smaller RTTs imply higher transmission rates. The result is that, when scalable transports are used and small RTTs are achieved, the minimum CWND currently defined as 2 MSS may still result in a high transmission rate in a large number of common scenarios. For example, as described in [TCP-sub-mss-w], consider a residential setting with a broadband Internet access of 40 Mbps. Suppose a number of equal TCP flows run in parallel, with the Internet access link being the bottleneck, and suppose that for these flows the RTT is 6 ms and the MSS is 1500 B. The minimum transmission rate supported by TCP in this scenario occurs when the CWND is set to 2 MSS, which results in 4 Mbps for each flow. This means that in this scenario, if the number of flows is higher than 10, the congestion control ceases to be responsive and starts to build up a queue in the network.

In order to address this issue, the congestion control mechanism for scalable transports MUST be responsive over the new range of RTTs resulting from the decrease of queueing delay.

There are several ways in which this can be achieved.
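For instance (an illustrative sketch under assumed names, and deliberately not the specific mechanism of [TCP-sub-mss-w]), a sender could emulate a fractional window by keeping the real window at the floor but stretching the pacing interval:

```python
# Illustrative sketch (assumed names; NOT the mechanism of
# [TCP-sub-mss-w]): emulate a sub-MSS congestion window by pacing.
# With MSS = 1500 B and RTT = 6 ms, the 2 MSS floor corresponds to
# 2 * 1500 * 8 / 0.006 = 4 Mb/s, as in the example above.

MSS_BYTES = 1500

def floor_rate_bps(rtt_s, min_cwnd_mss=2, mss_bytes=MSS_BYTES):
    """Lowest rate reachable while cwnd is clamped at the minimum."""
    return min_cwnd_mss * mss_bytes * 8 / rtt_s

def pacing_interval_s(cwnd_frac_mss, rtt_s):
    """Inter-packet gap emulating a (possibly sub-MSS) virtual window:
    one MSS is sent every rtt/cwnd_frac seconds."""
    return rtt_s / cwnd_frac_mss

# floor_rate_bps(0.006) is about 4e6: more than 10 such flows overload
# a 40 Mb/s link even at the minimum window.
# pacing_interval_s(0.5, 0.006) is 0.012: a 0.5 MSS virtual window
# sends one 1500 B segment every 12 ms (1 Mb/s), so the sender stays
# responsive below the 2 MSS floor.
```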
One possible sub-MSS window mechanism is described in [TCP-sub-mss-w].

In addition to the safety requirements described above, there are some optimizations that, while not required for the safe deployment of scalable transports over the public Internet, would result in optimized performance. We describe them next.

Optimization #5.1: Setting ECT in SYN, SYN/ACK and pure ACK packets.

Description: Scalable transports SHOULD set the ECT bit in SYN, SYN/ACK and pure ACK packets.

Motivation: Failing to set the ECT bit in SYN, SYN/ACK or ACK packets results in these packets being more likely to be dropped during congestion events. Dropping SYN and SYN/ACK packets is particularly bad for performance, as the retransmission timers for these packets are large. [RFC3168] prohibits marking these packets for security reasons. The arguments provided there should be revisited in the context of L4S, to evaluate whether avoiding marking these packets is still the best approach.

Optimization #5.2: Faster than additive increase.

Description: Scalable transports MAY support faster than additive increase in the congestion avoidance phase.

Motivation: As currently defined, DCTCP uses additive increase in the congestion avoidance phase. It would be beneficial for performance to update the congestion control algorithm to increase the CWND by more than 1 MSS per RTT during the congestion avoidance phase. In the context of L4S, such a mechanism must also provide fairness with other classes of traffic, including classic TCP and possibly scalable TCP that uses additive increase.

Optimization #5.3: Faster convergence to fairness.

Description: Scalable transports SHOULD converge to a fair share allocation of the available capacity as fast as classic TCP or faster.
Motivation: The time required for a new flow to obtain its fair share of the capacity of the bottleneck, when there are already ongoing flows using up all the bottleneck capacity, is higher in the case of DCTCP than in the case of classic TCP (about a factor of 1.5 to 2 larger according to [Alizadeh-stability]). This is detrimental in general, but it is very harmful for short flows, whose performance can be worse than that obtained with classic TCP. For this reason it is desirable that scalable transports provide convergence times no larger than those of classic TCP.

Appendix B. Standardization items

The following table includes all the items that should be standardized to provide a full L4S architecture.

The table is too wide for the ASCII draft format, so it has been split into two, with a common column of row index numbers on the left.

The columns in the second part of the table have the following meanings:

WG: The IETF WG most relevant to this requirement. The "tcpm/iccrg" combination refers to the procedure typically used for congestion control changes, where tcpm owns the approval decision, but uses the iccrg for expert review [NewCC_Proc];

TCP: Applicable to all forms of TCP congestion control;

DCTCP: Applicable to Data Centre TCP as currently used (in controlled environments);

DCTCP-bis: Applicable to a future Data Centre TCP congestion control intended for controlled environments;

XXX Prague: Applicable to a Scalable variant of XXX (TCP/SCTP/RMCAT) congestion control.
+-----+-----------------------+-------------------------------------+
| Req | Requirement           | Reference                           |
| #   |                       |                                     |
+-----+-----------------------+-------------------------------------+
| 0   | ARCHITECTURE          |                                     |
| 1   | L4S IDENTIFIER        | [I-D.briscoe-tsvwg-ecn-l4s-id]      |
| 2   | DUAL QUEUE AQM        | [I-D.briscoe-aqm-dualq-coupled]     |
| 3   | Suitable ECN Feedback | [I-D.ietf-tcpm-accurate-ecn],       |
|     |                       | [I-D.stewart-tsvwg-sctpecn].        |
|     |                       |                                     |
|     | SCALABLE TRANSPORT -  |                                     |
|     | SAFETY ADDITIONS      |                                     |
| 4-1 | Fall back to          | [I-D.ietf-tcpm-dctcp]               |
|     | Reno/Cubic on loss    |                                     |
| 4-2 | Fall back to          |                                     |
|     | Reno/Cubic if classic |                                     |
|     | ECN bottleneck        |                                     |
|     | detected              |                                     |
|     |                       |                                     |
| 4-3 | Reduce RTT-dependence |                                     |
|     |                       |                                     |
| 4-4 | Scaling TCP's         | [TCP-sub-mss-w]                     |
|     | Congestion Window for |                                     |
|     | Small Round Trip      |                                     |
|     | Times                 |                                     |
|     | SCALABLE TRANSPORT -  |                                     |
|     | PERFORMANCE           |                                     |
|     | ENHANCEMENTS          |                                     |
| 5-1 | Setting ECT in SYN,   | draft-bagnulo-tsvwg-generalized-ECN |
|     | SYN/ACK and pure ACK  |                                     |
|     | packets               |                                     |
| 5-2 | Faster-than-additive  |                                     |
|     | increase              |                                     |
| 5-3 | Less drastic exit     |                                     |
|     | from slow-start       |                                     |
+-----+-----------------------+-------------------------------------+

+-----+--------+-----+-------+-----------+--------+--------+--------+
| #   | WG     | TCP | DCTCP | DCTCP-bis | TCP    | SCTP   | RMCAT  |
|     |        |     |       |           | Prague | Prague | Prague |
+-----+--------+-----+-------+-----------+--------+--------+--------+
| 0   | tsvwg? | Y   | Y     | Y         | Y      | Y      | Y      |
| 1   | tsvwg? |     |       | Y         | Y      | Y      | Y      |
| 2   | aqm?   | n/a | n/a   | n/a       | n/a    | n/a    | n/a    |
|     |        |     |       |           |        |        |        |
| 3   | tcpm   | Y   | Y     | Y         | Y      | n/a    | n/a    |
|     |        |     |       |           |        |        |        |
| 4-1 | tcpm   |     | Y     | Y         | Y      | Y      | Y      |
|     |        |     |       |           |        |        |        |
| 4-2 | tcpm/  |     |       |           | Y      | Y      | ?      |
|     | iccrg? |     |       |           |        |        |        |
|     |        |     |       |           |        |        |        |
| 4-3 | tcpm/  |     |       | Y         | Y      | Y      | ?      |
|     | iccrg? |     |       |           |        |        |        |
| 4-4 | tcpm   | Y   | Y     | Y         | Y      | Y      | ?      |
|     |        |     |       |           |        |        |        |
| 5-1 | tsvwg  | Y   | Y     | Y         | Y      | n/a    | n/a    |
|     |        |     |       |           |        |        |        |
| 5-2 | tcpm/  |     |       | Y         | Y      | Y      | ?      |
|     | iccrg? |     |       |           |        |        |        |
| 5-3 | tcpm/  |     |       | Y         | Y      | Y      | ?      |
|     | iccrg? |     |       |           |        |        |        |
+-----+--------+-----+-------+-----------+--------+--------+--------+

Authors' Addresses

Bob Briscoe (editor)
Simula Research Lab

Email: ietf@bobbriscoe.net
URI: http://bobbriscoe.net/

Koen De Schepper
Nokia Bell Labs
Antwerp
Belgium

Email: koen.de_schepper@nokia.com
URI: https://www.bell-labs.com/usr/koen.de_schepper

Marcelo Bagnulo
Universidad Carlos III de Madrid
Av. Universidad 30
Leganes, Madrid 28911
Spain

Phone: 34 91 6249500
Email: marcelo@it.uc3m.es
URI: http://www.it.uc3m.es