Transport Area Working Group                                    G. White
Internet-Draft                                                 CableLabs
Intended status: Informational                          October 22, 2018
Expires: April 25, 2019

 Identifying and Handling Non Queue Building Flows in a Bottleneck Link
                         draft-white-tsvwg-nqb-00

Abstract

   This draft discusses the potential to improve quality of experience
   for broadband internet applications by distinguishing between flows
   that cause queuing latency and flows that don't.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Non-Queue Building Flows
   3.  Identifying NQB traffic
     3.1.  Endpoint marking
     3.2.  Queuing behavior analysis
   4.  Non Queue Building PHB
   5.  End-to-end Support
   6.  Relationship to L4S
   7.  Comparison to Existing Approaches
   8.  Acknowledgements
   9.  IANA Considerations
   10. Security Considerations
   11. Informative References
   Author's Address

1. Introduction

   Residential broadband internet services are commonly configured with
   a single bottleneck link (the access network link) upon which the
   service definition is applied.  The service definition, typically an
   upstream/downstream data rate tuple, is implemented as a configured
   pair of rate shapers that are applied to the user's traffic.  In
   such networks, the quality of service that each application
   receives, and as a result the quality of experience that it
   generates for the user, is influenced by the characteristics of the
   access network link.

   The vast majority of packets carried by residential broadband access
   networks are managed by an end-to-end congestion control algorithm,
   such as Reno, Cubic or BBR.  These congestion control algorithms
   attempt to seek the available capacity of the end-to-end path
   (which, in the case of residential broadband networks, is frequently
   the access network link), and in doing so they generally overshoot
   the available capacity, causing a queue to build up at the
   bottleneck link.  This queue buildup results in queuing delay that
   the application experiences as variable latency.

   In contrast to congestion-controlled applications, there are a
   variety of relatively low data rate applications that do not
   materially contribute to queuing delay, but are nonetheless
   subjected to it by sharing the same bottleneck link in the access
   network.  Many of these applications may be sensitive to latency or
   latency variation, and thus produce a poor quality of experience in
   such conditions.

   Active Queue Management (AQM) mechanisms (such as PIE [RFC8033],
   DOCSIS-PIE [RFC8034], or CoDel [RFC8289]) can improve the quality of
   experience for latency-sensitive applications, but there are
   practical limits to the amount of improvement that can be achieved
   without impacting the throughput of capacity-seeking applications.

   This document considers differentiating between these two classes of
   traffic in bottleneck links so that both classes can deliver
   exceptional quality of experience for their applications, and it
   solicits discussion and feedback.

2. Non-Queue Building Flows

   There are many applications that send traffic at relatively low data
   rates and/or in a fairly smooth and consistent manner such that they
   are highly unlikely to exceed the available capacity of the network
   path between source and sink.  Such applications are ideal
   candidates to be queued separately from the capacity-seeking
   applications that cause queue buildups and latency.

   These Non-queue-building (NQB) flows are typically UDP flows that
   send traffic at a low data rate and do not seek the capacity of the
   link (examples: online games, voice chat, DNS lookups).  Here the
   data rate is essentially limited by the application itself.  In
   contrast, Queue-building (QB) flows include traffic that uses
   traditional TCP congestion control, QUIC, BBR, or other capacity-
   seeking congestion control algorithms.

   Many applications fall neatly into one of these two categories, but
   there are also application flows that fall into a gray area in
   between (e.g. flows that are NQB on high-speed links but QB on
   slow-speed links).

3. Identifying NQB traffic

   This memo is intended to seek feedback on mechanisms by which Non-
   Queue Building flows can be identified by the network in an
   application-neutral way.  Two mechanisms in particular seem
   feasible, and could (either alone or in concert) be used to
   differentiate between QB and NQB flows.

   o  Endpoint marking.  This mechanism would have application
      endpoints apply a marking (perhaps utilizing the Diffserv field
      of the IP header) to NQB flows that could then be used by the
      network to differentiate between QB and NQB flows.

   o  Queuing behavior analysis.  This mechanism would utilize real-
      time per-flow traffic statistics to identify whether a flow is
      sending traffic at a rate that exceeds the available capacity of
      the bottleneck link and hence is causing a queue to form.

3.1. Endpoint marking

   This mechanism would have application endpoints apply a marking
   (perhaps utilizing the Diffserv field of the IP header) to NQB flows
   that could then be used by the network to differentiate between QB
   and NQB flows.  It would be useful for such a marking to be
   universally agreed upon, rather than being locally defined by the
   network operator, so that applications could be written to apply the
   marking without regard to local network policies.
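   As a purely illustrative sketch of endpoint marking, the fragment
   below shows how a sending application could mark a UDP socket with
   an NQB codepoint, assuming the 0x2A codepoint discussed in Section 4
   were agreed upon.  The codepoint value, address and port are
   assumptions for illustration only; an IPv6 socket would use the
   IPV6_TCLASS option instead of IP_TOS.

      import socket

      NQB_DSCP = 0x2A            # provisional codepoint; see Section 4
      TOS_VALUE = NQB_DSCP << 2  # DSCP occupies the upper 6 bits of the
                                 # former TOS octet

      # Create a UDP socket and mark every packet it sends with the NQB
      # codepoint (works on platforms that honor IP_TOS, e.g. Linux).
      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)

      # The application then sends its low-rate, application-limited
      # traffic as usual; the marking is carried in each packet's IP
      # header.
      sock.sendto(b"hello", ("192.0.2.1", 5004))

   Whether such a marking survives the path to the bottleneck link is a
   separate question, discussed in Section 5.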
   Some questions that arise when considering endpoint marking are: How
   can an application determine whether it is queue-building or not,
   given that the sending application is generally not aware of the
   available capacity of the path to the receiving endpoint?  Even in
   cases where an application is aware of the capacity of the path, how
   can it be sure that the available capacity (considering other flows
   that may be sharing the path) would be sufficient to result in the
   application's traffic not causing a queue to form?  In an unmanaged
   environment, how can networks trust endpoint marking, and why
   wouldn't all applications simply mark their packets as NQB?

   In answer to the last question, it is worth noting that the NQB
   designation and marking would be intended to convey verifiable
   traffic behavior, not needs or wants.  Also, it would be important
   that incentives are aligned correctly, i.e. that there is a benefit
   to the application in marking its packets correctly, and no benefit
   to an application in intentionally mismarking its traffic.  Thus, a
   useful property of nodes that support separate queues for NQB and QB
   flows would be that, for NQB flows, the NQB queue provides better
   performance (considering latency, loss and throughput) than the QB
   queue, and that, for QB flows, the QB queue provides better
   performance than the NQB queue.

   Even so, it is possible that, due to an implementation error or
   misconfiguration, a QB flow would end up mismarked as NQB, or vice
   versa.  In the case of an NQB flow that isn't marked as NQB and ends
   up in the QB queue, the flow would only impact its own quality of
   service, and so this case seems to be of lesser concern.  However, a
   QB flow that is mismarked as NQB, whether due to error or due to the
   fact that the application developer cannot predict the data rate
   capabilities of the link, would cause queuing delays for all of the
   other flows that are sharing the NQB queue.

   To prevent this situation from harming the performance of the real
   NQB flows, it would likely be valuable to support a "queue
   protection" function that could identify QB flows that are mismarked
   as NQB and reclassify those flows/packets to the QB queue.  This
   would benefit the reclassified flow by giving it access to a large
   buffer (and thus a lower packet loss rate), and would benefit the
   actual NQB flows by preventing the harm (increased latency
   variability) that the mismarked flow would otherwise cause them.
   Some open questions around this function include: How could such a
   function be implemented in an objective and verifiable manner?  What
   other options might exist to serve this purpose in a dual-queue
   architecture?

3.2. Queuing behavior analysis

   Similar to the queue protection function outlined in the previous
   section, it may be feasible to devise a real-time flow analyzer for
   a node that would identify flows that are causing queue buildup and
   redirect those flows to the QB queue, leaving the remaining flows in
   the NQB queue.
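   As a rough illustration of the per-flow accounting that either the
   queue protection function or such a flow analyzer might perform, the
   sketch below scores each flow by the amount of queuing it would
   contribute and reclassifies a flow to the QB queue once its score
   exceeds a threshold.  This is a minimal sketch under assumed
   parameters (the drain rate, threshold and flow key are all
   assumptions), not a specification of either mechanism.

      import time
      from collections import defaultdict

      # Assumed parameters; a real implementation would derive these
      # from the link rate and the latency target of the NQB queue.
      DRAIN_RATE_BPS = 50_000_000   # rate at which a flow's score drains
      THRESHOLD_BYTES = 10_000      # score above which a flow is judged
                                    # to be queue-building

      class QueueProtection:
          """Toy queuing-score tracker for flows in the NQB queue."""

          def __init__(self):
              self.score = defaultdict(float)  # flow key -> bytes of score
              self.last = defaultdict(float)   # flow key -> last update time

          def on_packet(self, flow_key, size_bytes, now=None):
              """Return 'NQB' to keep the packet in the NQB queue, or
              'QB' to redirect it (and the flow) to the QB queue."""
              now = time.monotonic() if now is None else now
              elapsed = now - self.last[flow_key]
              self.last[flow_key] = now
              # Drain the score at the assumed rate, then add this packet.
              drained = self.score[flow_key] - elapsed * DRAIN_RATE_BPS / 8
              self.score[flow_key] = max(0.0, drained) + size_bytes
              return 'QB' if self.score[flow_key] > THRESHOLD_BYTES else 'NQB'

      # Example: a flow sending persistently faster than the drain rate
      # accumulates score and is eventually redirected to the QB queue.
      qp = QueueProtection()
      verdict = qp.on_packet(("192.0.2.1", 5004, "198.51.100.2", 443), 1200)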
4. Non Queue Building PHB

   This section uses the Diffserv nomenclature of per-hop behavior
   (PHB) to describe how a network node could provide better quality of
   service for NQB flows without reducing the performance of QB flows.

   A node supporting the NQB PHB would provide a separate queue for
   non-queue-building traffic.  This queue would support a latency-
   based queue protection mechanism that is able to identify queue-
   building behavior in flows that are classified into the queue, and
   to redirect flows causing queue buildup to a different queue.

   While there may be some similarities between the characteristics of
   NQB flows and flows marked with the Expedited Forwarding (EF) DSCP,
   the NQB PHB would differ from the Expedited Forwarding PHB in
   several important ways.

   o  NQB traffic is not rate limited or rate policed.  Rather, the NQB
      queue would be expected to support a latency-based queue
      protection mechanism that identifies NQB-marked flows that are
      beginning to cause latency, and redirects packets from those
      flows to the queue for QB flows.

   o  A node supporting the NQB PHB makes no guarantees on latency or
      data rate for NQB-marked flows; instead, it aims to provide sub-
      millisecond queuing delays for as many such flows as it can, and
      to shed load when needed.

   o  EF is commonly used exclusively for voice traffic, for which
      additional functions are applied, such as admission control,
      accounting, prioritized delivery, etc.

   In networks that support the NQB PHB, it may be preferred to also
   include traffic marked EF (101110b) in the NQB queue.  The choice of
   the 0x2A codepoint (101010b) for NQB would conveniently allow a node
   to select these two codepoints using a single mask pattern of
   101x10b.
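   A small worked example of that mask selection, assuming the 0x2A
   codepoint were adopted: the NQB and EF codepoints differ in only one
   bit, so a classifier can match both with a single mask.  The
   function name below is purely illustrative.

      NQB = 0b101010   # 0x2A, the codepoint proposed above
      EF  = 0b101110   # 0x2E, Expedited Forwarding
      MASK = 0b111011  # treat the one differing bit as "don't care"

      def selects_nqb_queue(dscp):
          """True for any DSCP matching the pattern 101x10b."""
          return (dscp & MASK) == (NQB & MASK)

      assert selects_nqb_queue(NQB) and selects_nqb_queue(EF)
      assert not selects_nqb_queue(0b101000)   # CS5 (101000b) does not match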
5. End-to-end Support

   In contrast to the existing standard DSCPs, which are typically only
   enforced within a Diffserv domain (e.g. an AS), this DSCP would be
   intended for end-to-end usage across the Internet.  Some access
   network service providers bleach the Diffserv field on ingress into
   their network, and in some cases apply their own DSCP for internal
   usage.  Access networks that support the NQB PHB would need to allow
   the NQB marking to pass through this bleaching operation so that the
   PHB can be provided at the access network link.

6. Relationship to L4S

   The dual-queue mechanism described in this draft is similar to, and
   is intended to be compatible with, the L4S architecture
   [I-D.ietf-tsvwg-l4s-arch].

7. Comparison to Existing Approaches

   Traditional QoS mechanisms focus on prioritization in an attempt to
   achieve two goals: reduced latency for "latency-sensitive" traffic,
   and increased bandwidth availability for "important" applications.
   Applications are generally given priority in proportion to some
   combination of latency-sensitivity and importance.

   Downsides to this approach include the difficulty of deciding what
   priority level each application should get (making the value
   judgement as to latency-sensitivity and importance), of associating
   packets with priority levels (either a lot of classifier state, or
   trust in endpoint markings and the value judgements that they
   convey), and of ensuring that high-priority traffic doesn't starve
   lower-priority traffic (admission control, weighted scheduling, etc.
   are possible solutions).  This approach can work in a managed
   network, where the network operator can control the usage of the QoS
   mechanisms, but it has not been adopted end-to-end across the
   internet.

   Flow queueing approaches (such as fq_codel [RFC8290]), on the other
   hand, achieve latency improvements by associating packets into
   "flow" queues and then prioritizing "sparse flows", i.e. packets
   that arrive to an empty flow queue.  Flow queueing does not attempt
   to differentiate between flows on the basis of value (importance or
   latency-sensitivity); it simply gives preference to sparse flows,
   and tries to guarantee that the non-sparse flows all get an equal
   share of the remaining channel capacity.  As a result, fq mechanisms
   could be considered more appropriate for unmanaged environments and
   general internet traffic.

   Downsides to this approach include the loss of low-latency
   performance due to hash collisions (where a sparse flow shares a
   queue with a bulk data flow), the complexity of managing a large
   number of queues, and the behavior of the scheduler itself
   (typically DRR): by enforcing that each non-sparse flow gets an
   equal fraction of link bandwidth, it causes problems with VPNs and
   other tunnels, and it exhibits poor behavior with less-aggressive
   congestion control algorithms (e.g. LEDBAT) and with RMCAT
   congestion control algorithms.  In effect, the network element is
   making a decision as to what constitutes a flow, and then forcing
   all such flows to take equal bandwidth at every instant.

   The dual-queue approach achieves the main benefit of fq_codel
   (latency improvement without value judgements) without these
   downsides.

   The distinction between NQB flows and QB flows is similar to the
   distinction made between "sparse flow queues" and "non-sparse flow
   queues" in fq_codel.  In fq_codel, a flow queue is considered sparse
   if it is drained completely by each packet transmission and remains
   empty for at least one cycle of the round robin over the active
   flows (this is approximately equivalent to saying that it utilizes
   less than its fair share of capacity).  While this definition is
   convenient to implement in fq_codel, it is not the only useful
   definition of sparse flows.
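   For readers less familiar with that behavior, the sketch below is a
   heavily simplified rendering of the two-list scheduler described in
   [RFC8290]; the quantum/deficit accounting and the CoDel AQM are
   omitted, so it is intended only to illustrate when a flow counts as
   sparse, not to reproduce fq_codel.

      from collections import deque, defaultdict

      class FqSketch:
          """Greatly simplified fq_codel-style scheduler (illustration only)."""

          def __init__(self):
              self.queues = defaultdict(deque)  # flow key -> packets
              self.new_flows = deque()          # flows treated as sparse
              self.old_flows = deque()          # flows with a recent backlog

          def enqueue(self, flow_key, packet):
              q = self.queues[flow_key]
              if not q and (flow_key not in self.new_flows
                            and flow_key not in self.old_flows):
                  # Packet arrived to an empty, inactive queue: the flow is
                  # treated as sparse and served ahead of the old flows.
                  self.new_flows.append(flow_key)
              q.append(packet)

          def dequeue(self):
              """Serve one packet, preferring flows on the new_flows list."""
              while self.new_flows or self.old_flows:
                  lst = self.new_flows if self.new_flows else self.old_flows
                  flow_key = lst.popleft()
                  q = self.queues[flow_key]
                  if not q:
                      # Visited while empty: the flow stayed empty for a full
                      # cycle, so forget it; its next packet makes it sparse
                      # again.
                      continue
                  pkt = q.popleft()
                  # After service the flow waits at the tail of old_flows (one
                  # packet per visit stands in for DRR's quantum accounting).
                  self.old_flows.append(flow_key)
                  return pkt
              return None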
8. Acknowledgements

   TBD

9. IANA Considerations

   TBD

10. Security Considerations

   TBD

11. Informative References

   [I-D.ietf-tsvwg-l4s-arch]
              Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency,
              Low Loss, Scalable Throughput (L4S) Internet Service:
              Architecture", draft-ietf-tsvwg-l4s-arch-02 (work in
              progress), March 2018.

   [RFC8033]  Pan, R., Natarajan, P., Baker, F., and G. White,
              "Proportional Integral Controller Enhanced (PIE): A
              Lightweight Control Scheme to Address the Bufferbloat
              Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017,
              <https://www.rfc-editor.org/info/rfc8033>.

   [RFC8034]  White, G. and R. Pan, "Active Queue Management (AQM)
              Based on Proportional Integral Controller Enhanced (PIE)
              for Data-Over-Cable Service Interface Specifications
              (DOCSIS) Cable Modems", RFC 8034, DOI 10.17487/RFC8034,
              February 2017, <https://www.rfc-editor.org/info/rfc8034>.

   [RFC8289]  Nichols, K., Jacobson, V., McGregor, A., Ed., and J.
              Iyengar, Ed., "Controlled Delay Active Queue Management",
              RFC 8289, DOI 10.17487/RFC8289, January 2018,
              <https://www.rfc-editor.org/info/rfc8289>.

   [RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys,
              J., and E. Dumazet, "The Flow Queue CoDel Packet
              Scheduler and Active Queue Management Algorithm",
              RFC 8290, DOI 10.17487/RFC8290, January 2018,
              <https://www.rfc-editor.org/info/rfc8290>.

Author's Address

   Greg White
   CableLabs
   858 Coal Creek Circle
   Louisville, CO 80027
   US

   Email: g.white@cablelabs.com