idnits 2.17.1 

draft-briscoe-docsis-q-protection-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == There are 4 instances of lines with non-ascii characters in the document.


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 859 has weird spacing: '... pkt_sz   prob...'

  -- The document date (31 January 2022) is 816 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '0' on line 682

  -- Looks like a reference, but probably isn't: '1' on line 682

  == Outdated reference: A later version (-29) exists of
     draft-ietf-tsvwg-ecn-l4s-id-23

  == Outdated reference: A later version (-22) exists of
     draft-ietf-tsvwg-nqb-08

  == Outdated reference: A later version (-03) exists of
     draft-briscoe-iccrg-prague-congestion-control-00

  == Outdated reference: A later version (-25) exists of
     draft-ietf-tsvwg-aqm-dualq-coupled-20

  == Outdated reference: A later version (-20) exists of
     draft-ietf-tsvwg-l4s-arch-15


     Summary: 0 errors (**), 0 flaws (~~), 8 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                    B. Briscoe, Ed.
3	Internet-Draft                                               Independent
4	Intended status: Informational                                  G. White
5	Expires: 4 August 2022                                         CableLabs
6	                                                         31 January 2022

8	     The DOCSIS® Queue Protection Algorithm to Preserve Low Latency
9	                  draft-briscoe-docsis-q-protection-02

11	Abstract

13	   This informational document explains the specification of the queue
14	   protection algorithm used in DOCSIS technology since version 3.1.  A
15	   shared low latency queue relies on the non-queue-building behaviour
16	   of every traffic flow using it.  However, some flows might not take
17	   such care, either accidentally or maliciously.  If a queue is about
18	   to exceed a threshold level of delay, the queue protection algorithm
19	   can rapidly detect the flows most likely to be responsible.  It can
20	   then prevent harm to other traffic in the low latency queue by
21	   ejecting selected packets (or all packets) of these flows.  The
22	   document is designed for three types of audience: a) congestion
23	   control designers who need to understand how to keep on the 'good'
24	   side of the algorithm; b) implementers of the algorithm who want to
25	   understand it in more depth; and c) designers of algorithms with
26	   similar goals, perhaps for non-DOCSIS scenarios.

28	Status of This Memo

30	   This Internet-Draft is submitted in full conformance with the
31	   provisions of BCP 78 and BCP 79.

33	   Internet-Drafts are working documents of the Internet Engineering
34	   Task Force (IETF).  Note that other groups may also distribute
35	   working documents as Internet-Drafts.  The list of current Internet-
36	   Drafts is at https://datatracker.ietf.org/drafts/current/.

38	   Internet-Drafts are draft documents valid for a maximum of six months
39	   and may be updated, replaced, or obsoleted by other documents at any
40	   time.  It is inappropriate to use Internet-Drafts as reference
41	   material or to cite them other than as "work in progress."

43	   This Internet-Draft will expire on 4 August 2022.

45	Copyright Notice

47	   Copyright (c) 2022 IETF Trust and the persons identified as the
48	   document authors.  All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
52	   license-info) in effect on the date of publication of this document.
53	   Please review these documents carefully, as they describe your rights
54	   and restrictions with respect to this document.  Code Components
55	   extracted from this document must include Revised BSD License text as
56	   described in Section 4.e of the Trust Legal Provisions and are
57	   provided without warranty as described in the Revised BSD License.

59	Table of Contents

61	   1.  Introduction
62	     1.1.  Document Roadmap
63	     1.2.  Terminology
64	     1.3.  Copyright Material
65	   2.  Approach - In Brief
66	     2.1.  Mechanism
67	     2.2.  Policy
68	   3.  Necessary Flow Behaviour
69	   4.  Pseudocode Walk-Through
70	     4.1.  Input Parameters, Constants and Variables
71	     4.2.  Queue Protection Data Path
72	       4.2.1.  The qprotect() function
73	       4.2.2.  The pick_bucket() function
74	       4.2.3.  The fill_bucket() function
75	       4.2.4.  The calcProbNative() function
76	   5.  Rationale
77	     5.1.  Rationale: Blame for Queuing, not for Rate in Itself
78	     5.2.  Rationale for Aging the Queuing Score
79	     5.3.  Rationale for Normalized Queuing Score
80	     5.4.  Rationale for Policy Conditions
81	     5.5.  Rationale for Reclassification as the Policy Action
82	   6.  Limitations
83	   7.  IANA Considerations (to be removed by RFC Editor)
84	   8.  Security Considerations
85	   9.  Acknowledgements
86	   10. References
87	     10.1.  Normative References
88	     10.2.  Informative References
89	   Authors' Addresses

91	1.  Introduction

93	   This informational document explains the specification of the queue
94	   protection (QProt) algorithm used in DOCSIS technology since version
95	   3.1 [DOCSIS].

97	   Although the algorithm is defined in annex P of [DOCSIS], it relies
98	   on cross-references to other parts of the set of specs.  This
99	   document pulls all the strands together into one self-contained
100	   document.  The core of the document is a similar pseudocode walk-
101	   through to that in the DOCSIS spec, but it also includes additional
102	   material: i) a brief overview; ii) a definition of how a data sender
103	   needs to behave to avoid triggering queue protection; and iii) a
104	   section giving the rationale for the design choices.

106	   Low queuing delay depends on hosts sending their data smoothly,
107	   either at low rate or responding to explicit congestion notifications
108	   (ECN).  So low queuing latency is something hosts create themselves,
109	   not something the network gives them.  Therefore, self-interest gives
110	   no incentive for flows to mis-mark their packets for the low latency
111	   queue.  However, traffic from an application that does not behave in
112	   a non-queue-building way might erroneously be classified into a low
113	   latency queue, whether accidentally or maliciously.  QProt protects
114	   other traffic in the low latency queue from the harm due to excess
115	   queuing that would otherwise be caused by such anomalous behaviour.

117	   In normal scenarios without misclassified traffic, QProt is not
118	   expected to intervene at all in the classification or forwarding of
119	   packets.

121	   An overview of how low latency support has been added to DOCSIS
122	   technology is given in [I-D.white-tsvwg-lld].  In each direction of a
123	   DOCSIS link (upstream and downstream), there are two queues: one for
124	   Low Latency and one for Classic traffic, in an arrangement similar to
125	   the IETF's Coupled DualQ AQM [I-D.ietf-tsvwg-aqm-dualq-coupled].  The
126	   two queues enable a transition from 'Classic' to 'Scalable'
127	   congestion control so that low latency can become the norm for any
128	   application, including ones seeking both full throughput and low
129	   latency, not just low-rate applications that have been more
130	   traditionally associated with a low latency service.  The Classic
131	   queue is only necessary for traffic such as traditional (Reno/Cubic)
132	   TCP that needs about a round trip of buffering to fully utilize the
133	   link, and therefore has no incentive to mismark itself as low
134	   latency.  The QProt function is located at the ingress to the Low
135	   Latency queue.  Therefore, in the upstream QProt is located on the
136	   cable modem (CM), and in the downstream it is located on the CM
137	   Termination System (CMTS).  If an arriving packet triggers queue
138	   protection, the QProt algorithm reclassifies the packet from the Low
139	   Latency queue into the Classic queue.

141	   If QProt is used in settings other than DOCSIS links, it would be a
142	   simple matter to detect queue-building flows by using slightly
143	   different conditions, and/or to trigger a different action as a
144	   consequence, as appropriate for the scenario, e.g., dropping instead
145	   of reclassifying packets or perhaps accumulating a second per-flow
146	   score to decide whether to redirect a whole flow rather than just
147	   certain packets.  Such work is for future study and out of scope of
148	   the present document.

150	   The algorithm is based on a rigorous approach to quantifying how much
151	   each flow contributes to congestion, which is used in economics to
152	   allocate responsibility for the cost of one party's behaviour on
153	   others (the economic externality).  Another important feature of the
154	   approach is that the metric used for the queuing score is based on
155	   the same variable that determines the level of ECN signalling seen by
156	   the sender [RFC8311], [I-D.ietf-tsvwg-ecn-l4s-id].  This makes the
157	   internal queuing score visible externally as ECN markings.  This
158	   transparency is necessary to be able to objectively state (in
159	   Section 3) how a flow can keep on the 'good' side of the algorithm.

161	1.1.  Document Roadmap

163	   The core of the document is the walk-through of the DOCSIS QProt
164	   algorithm's pseudocode in Section 4.

166	   Prior to that, two brief sections provide a "bluffer's guide to
167	   QProt" which should suffice for those who do not need the details or
168	   the insights.  Section 2 summarizes the approach used in the
169	   algorithm.  Then Section 3 considers QProt from the perspective of
170	   the end-system, by defining the behaviour that a flow needs to comply
171	   with to avoid the QProt algorithm ejecting its packets from the low
172	   latency queue.

174	   Section 5 gives deeper insight into the principles and rationale
175	   behind the algorithm.  Then Section 6 explains the limitations of the
176	   approach, followed by the usual closing sections.

178	1.2.  Terminology

180	   The normative language for the DOCSIS QProt algorithm is in the
181	   DOCSIS specs [DOCSIS], [DOCSIS-CM-OSS], [DOCSIS-CCAP-OSS] not in this
182	   informational guide.  If there is any inconsistency, the DOCSIS specs
183	   take precedence.

185	   The following terms and abbreviations are used:

187	   CM:  Cable Modem

189	   CMTS:  CM Termination System

191	   Congestion-rate:  The rate at which a flow induces ECN-marked (or
192	      dropped) bytes, where an ECN-mark on a packet is defined as
193	      marking all the packet's bytes.  Congestion-bit-rate and
194	      congestion-volume were introduced in [RFC7713] and [RFC6789].

196	   DOCSIS:  Data Over Cable System Interface Specification.  "DOCSIS" is
197	      a registered trademark of Cable Television Laboratories, Inc.
198	      ("CableLabs").

200	   Non-queue-building:  A flow that tends not to build a queue.

202	   Queue-building:  A flow that builds a queue.  If it is classified
203	      into the Low Latency queue, it is therefore a candidate for the
204	      queue protection algorithm to detect and sanction.

206	   ECN:  Explicit Congestion Notification

208	   QProt:  The Queue Protection function

210	1.3.  Copyright Material

212	   Parts of this document are reproduced from [DOCSIS] with kind
213	   permission of the copyright holder, Cable Television Laboratories,
214	   Inc. ("CableLabs").

216	2.  Approach - In Brief

218	   The algorithm is divided into mechanism and policy.  There is only a
219	   tiny amount of policy code, but policy might need to be changed in
220	   the future.  So, where hardware implementation is being considered,
221	   it would be advisable to implement the policy aspects in firmware or
222	   software:

224	   *  The mechanism aspects identify flows, maintain flow-state and
225	      accumulate per-flow queuing scores;

227	   *  The policy aspects can be divided into conditions and actions:

229	      -  The conditions are the logic that determines when action should
230	         be taken to avert the risk of queuing delay becoming excessive;

232	      -  The actions determine how this risk is averted, e.g., by
233	         redirecting packets from a flow into another queue, or by
234	         reclassifying a whole flow that seems to be misclassified.

236	2.1.  Mechanism

238	   The algorithm maintains per-flow-state, where 'flow' usually means an
239	   end-to-end (layer-4) 5-tuple.  The flow-state consists of a queuing
240	   score normalized to also represent the flow-state's own expiry time
241	   (explained in Section 5.3).  A higher queuing score pushes out the
242	   expiry time further.

244	   Non-queue-building flows tend to release their flow-state rapidly ---
245	   it usually expires reasonably early in the gap between the packets of
246	   a normal flow.  Then the memory can be recycled for packets from
247	   other flows that arrive in between.  So only queue-building flows
248	   hold flow state persistently.

250	   The simplicity and effectiveness of the algorithm are due to the
251	   definition of the queuing score.  It uses the internal variable from
252	   the native AQM, probNative, that determines the ECN marking
253	   probability of the low latency queue alone (Classic queuing is not
254	   relevant).  The actual DOCSIS QProt algorithm is defined using
255	   integer arithmetic, but in the floating point arithmetic used in this
256	   document, (0 <= probNative <= 1).  The queuing score algorithm
257	   accumulates the size of each arriving packet of a flow scaled by the
258	   value of probNative at the time.

260	   The algorithm as described so far would accumulate a number that
261	   would rise at the so-called congestion-rate of the flow, i.e., the
262	   rate at which the flow is contributing to congestion, or the rate at
263	   which the AQM is forwarding bytes of the flow that are ECN marked.
264	   However, rather than growing continually, the queuing score is also
265	   reduced (or 'aged') at a constant rate.  This gives each flow an
266	   allowed congestion rate, which is justified in Section 5.2.

268	   In practice, the queuing score is normalized into time units (so that
269	   it represents the expiry time of the flow state, as already discussed
270	   above).  Then it does not need to be explicitly aged, because the
271	   natural passage of time implicitly 'ages' an expiry time (explained
272	   in Section 5.3).
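
The two views of the queuing score described above (bytes that are aged explicitly, and an expiry time that ages implicitly) can be compared with a minimal Python sketch. It is an illustration only, not the DOCSIS definition: the function names, the use of seconds as the time unit and the floating point arithmetic are assumptions made for readability. The aging rate is the default of 2^19 B/s.

```python
AGING = 2 ** 19  # aging rate [B/s], from the default LG_AGING = 19

def age_and_fill(score_bytes, elapsed_s, pkt_size, prob_native):
    """Explicit form: drain the score at AGING, then add the packet's
    size scaled by the marking probability at its arrival."""
    aged = max(0.0, score_bytes - AGING * elapsed_s)
    return aged + prob_native * pkt_size

def push_expiry(t_exp, now, pkt_size, prob_native):
    """Normalized form: the same update expressed as an expiry time [s].
    The passage of time ages the score without any explicit step."""
    t_exp = max(t_exp, now)  # expired state restarts from 'now'
    return t_exp + prob_native * pkt_size / AGING

def demo():
    """Feed both forms the same packets; return both scores in bytes."""
    # (arrival time [s], packet size [B], probNative) -- hypothetical values
    arrivals = [(0.000, 1500, 0.5), (0.001, 1500, 0.5), (0.010, 1500, 0.1)]
    score, t_exp, last = 0.0, 0.0, 0.0
    for now, size, prob in arrivals:
        score = age_and_fill(score, now - last, size, prob)
        t_exp = push_expiry(t_exp, now, size, prob)
        last = now
    return score, (t_exp - last) * AGING
```

In this sketch the third packet arrives after the state left by the first two has expired, so both forms start again from zero. This mirrors the way a non-queue-building flow releases its flow-state in the gap between packets.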

274	2.2.  Policy

276	   The algorithm uses the queuing score to determine whether to eject
277	   each packet only at the time it first arrives.  This limits the
278	   policies available.  For instance, when queuing delay exceeds a
279	   threshold, it is not possible to eject a packet from the flow with
280	   the highest queuing score, because that would involve searching the
281	   queue for such a packet (if indeed one were still in the queue).
282	   Nonetheless, it is still possible to develop a policy that protects
283	   the low latency of the queue by making the queuing score threshold
284	   stricter the greater the excess of queuing delay relative to the
285	   threshold (explained in Section 5.4).

287	   In the DOCSIS QProt spec at the time of writing, when the policy
288	   conditions are met the action taken to protect the low latency queue
289	   is to reclassify a packet into the Classic queue (justified in
290	   Section 5.5).

292	3.  Necessary Flow Behaviour

294	   The QProt algorithm described here can be used for responsive and/or
295	   unresponsive flows.

297	   *  It is possible to objectively describe the least responsive way
298	      that a flow will need to respond to congestion signals in order to
299	      avoid triggering queue protection, no matter the link capacity and
300	      no matter how much other traffic there is.

302	   *  It is not possible to describe how fast or smooth an unresponsive
303	      flow should be to avoid queue protection, because this depends on
304	      how much other traffic there is and the capacity of the link,
305	      which an application is unable to know.  However, the more
306	      smoothly an unresponsive flow paces its packets and the lower its
307	      rate relative to typical broadband link capacities, the less
308	      likely it is to cause enough queuing to trigger queue
309	      protection.

311	   Responsive low latency flows can use an L4S ECN codepoint
312	   [I-D.ietf-tsvwg-ecn-l4s-id] to get classified into the low latency
313	   queue.

315	   A sender can arrange for flows that are smooth but do not respond to
316	   ECN marking to be classified into the low latency queue by using the
317	   Non-Queue-Building (NQB) Diffserv codepoint [I-D.ietf-tsvwg-nqb],
318	   which the DOCSIS specs support, or an operator can use various other
319	   local classifiers.

321	   The QProt algorithm is driven from the same variable that drives the
322	   ECN marking probability in the low latency queue (see the Immediate
323	   Active Queue Management Annex in [DOCSIS]).  The algorithm that
324	   calculates this internal variable is run on the arrival of every
325	   packet, whether it is ECN-capable or not, so that it can be used by
326	   the QProt algorithm.  But the variable is only used to ECN-mark
327	   packets that are ECN-capable.

329	   Not only does this dual use of the variable improve processing
330	   efficiency, but it also makes the basis of the QProt algorithm
331	   visible and transparent, at least for responsive ECN-capable flows.

333	   Then it is possible to state objectively that a flow can avoid
334	   triggering queue protection by keeping the bit rate of ECN marked
335	   packets (the congestion-rate) below AGING, which is a configured
336	   constant of the algorithm (default 2^19 B/s ~= 4.2 Mb/s).  Note that
337	   it is in a congestion controller's own interest to keep its average
338	   congestion-rate well below this level (e.g., ~1 Mb/s), to ensure that
339	   it does not trigger queue protection during transient dynamics.
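
The arithmetic behind the AGING default, and a rough check of a flow's average congestion-rate against it, can be sketched in Python as follows. The helper names and the example figures (a 50 Mb/s flow with 2% or 10% of its packets marked) are hypothetical. The check only compares long-run averages; it does not model the queue delay condition that the algorithm also applies.

```python
LG_AGING = 19                              # default from Section 4.1
AGING_BYTES_PER_S = 2 ** LG_AGING          # 524288 B/s
AGING_BITS_PER_S = 8 * AGING_BYTES_PER_S   # 4194304 b/s, about 4.2 Mb/s

def congestion_rate_bps(flow_rate_bps, mark_fraction):
    """Bit rate of the flow's ECN-marked packets (its congestion-rate)."""
    return flow_rate_bps * mark_fraction

def within_allowance(flow_rate_bps, mark_fraction):
    """True if the average congestion-rate stays below AGING."""
    return congestion_rate_bps(flow_rate_bps, mark_fraction) < AGING_BITS_PER_S
```

For example, a 50 Mb/s flow that sees 2% marking has a congestion-rate of 1 Mb/s, which is within the allowance, whereas the same flow at 10% marking reaches 5 Mb/s and exceeds it.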

341	   If the QProt algorithm is used in other settings, it would still need
342	   to be based on the visible level of congestion signalling, in a
343	   similar way to the DOCSIS approach.  Without transparency of the
344	   basis of the algorithm's decisions, end-systems would not be able to
345	   avoid triggering queue protection on an objective basis.

347	4.  Pseudocode Walk-Through

349	4.1.  Input Parameters, Constants and Variables

351	   The operator input parameters that set the parameters in the first
352	   two blocks of pseudocode below are defined for cable modems (CMs) in
353	   [DOCSIS-CM-OSS] and for CMTSs in [DOCSIS-CCAP-OSS].  Then, further
354	   constants are either derived from the input parameters or hard-coded.

356	   Defaults and units are shown in square brackets.  Defaults (or indeed
357	   any aspect of the algorithm) are subject to change, so the latest
358	   DOCSIS specs are the definitive references.  Also any operator might
359	   set certain parameters to non-default values.

361	   <CODE BEGINS>
362	   // Input Parameters
363	   QPROTECT_ON        // Queue Protection is enabled [Default: TRUE]
364	   CRITICALqL_us      // L queue threshold delay [us] Default: MAXTH_us
365	   CRITICALqLSCORE_us // The threshold queuing score [Default: 4000us]
366	   LG_AGING           // The aging rate of the q'ing score [Default: 19]
367	                      //  as log base 2 of the congestion-rate [lg(B/s)]

369	   // Input Parameters for the calcProbNative() algorithm:
370	   MAXTH_us           // Max IAQM marking threshold [Default: 1000us]
371	   LG_RANGE           // Log base 2 of the range of ramp [lg(ns)]
372	                      //  Default: 2^19 = 524288 ns (roughly 525 us)
373	   <CODE ENDS>
374	   <CODE BEGINS>
375	   // Constants, either derived from input parameters or hard-coded
376	   T_RES                                     // Resolution of t_exp [ns]
377	                                             // Convert units
378	   AGING = pow(2, (LG_AGING-30) ) * T_RES    // lg([B/s]) to [B/T_RES]
379	   CRITICALqL = CRITICALqL_us * 1000         // [us] to [ns]
380	   CRITICALqLSCORE = CRITICALqLSCORE_us * 1000 /T_RES // [us] to [T_RES]
381	   // Threshold for the q'ing score condition
382	   CRITICALqLPRODUCT = CRITICALqL * CRITICALqLSCORE
383	   qLSCORE_MAX = 5E9 / T_RES            // Max queuing score = 5 s

385	   ATTEMPTS = 2; // Max attempts to pick a bucket (vendor-specific)
386	   BI_SIZE = 5;  // Bit-width of index number for non-default buckets
387	   NBUCKETS = pow(2, BI_SIZE);  // No. of non-default buckets
388	   MASK = NBUCKETS-1;           // convenient constant, filled with ones

390	                          // Queue Protection exit states
391	   EXIT_SUCCESS  = 0;     // Forward the packet
392	   EXIT_SANCTION = 1;     // Redirect the packet

394	   MAX_PROB      = 1; // For integer arithmetic, would use a large int
395	                      //  e.g., 2^31, to allow space for overflow
396	   MAXTH = MAXTH_us * 1000;   // Max marking threshold [ns]
397	   // Minimum marking threshold of 2 MTU for slow links [ns]
398	   FLOOR =  2 * 8 * MAX_FRAME_SIZE * 10^9 / MAX_RATE;
399	   RANGE = (1 << LG_RANGE);      // Range of ramp [ns]
400	   MINTH = max ( MAXTH - RANGE, FLOOR);
401	   MAXTH = MINTH + RANGE;           // Max marking threshold [ns]
402	   <CODE ENDS>
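
As a worked instance of the unit conversions above, the following Python sketch derives the constants from the default input parameters. The values passed for T_RES, the maximum frame size and the maximum rate are hypothetical, because the pseudocode above does not fix them. Note that the exponent (LG_AGING-30) treats one second as 2^30 ns, which is about 1.07 s.

```python
def derive_constants(t_res_ns, max_frame_size_bytes, max_rate_bps,
                     critical_ql_us=1000, critical_ql_score_us=4000,
                     lg_aging=19, maxth_us=1000, lg_range=19):
    """Return the derived constants as a dict keyed by pseudocode name."""
    aging = 2.0 ** (lg_aging - 30) * t_res_ns         # [B/T_RES]
    critical_ql = critical_ql_us * 1000               # [ns]
    critical_ql_score = critical_ql_score_us * 1000 / t_res_ns  # [T_RES]
    range_ns = 1 << lg_range                          # range of ramp [ns]
    # Minimum marking threshold of 2 max-size frames at the link rate [ns]
    floor_ns = 2 * 8 * max_frame_size_bytes * 10 ** 9 / max_rate_bps
    minth = max(maxth_us * 1000 - range_ns, floor_ns)
    return {
        "AGING": aging,
        "CRITICALqL": critical_ql,
        "CRITICALqLSCORE": critical_ql_score,
        "CRITICALqLPRODUCT": critical_ql * critical_ql_score,
        "qLSCORE_MAX": 5e9 / t_res_ns,
        "MINTH": minth,
        "MAXTH": minth + range_ns,
    }

# Hypothetical link: T_RES = 1000 ns, 2000 B frames, 100 Mb/s
fast = derive_constants(1000, 2000, 100e6)
# Same settings on a slower 10 Mb/s link, where FLOOR takes over
slow = derive_constants(1000, 2000, 10e6)
```

On the 100 Mb/s example the floor (320000 ns) is below MAXTH - RANGE, so the ramp runs from 475712 ns to 1000000 ns. On the 10 Mb/s example the floor (3200000 ns) dominates and pushes MAXTH up to 3724288 ns.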

404	   The resolution for expressing time, T_RES, needs to be chosen to
405	   ensure that expiry times for buckets can represent times that are a
406	   fraction (e.g., 1/10) of the expected packet interarrival time for
407	   the system.

409	   The following definitions explain the purpose of important variables
410	   and functions.

412	   <CODE BEGINS>
413	   // Public variables:
414	   qdelay         // The current queuing delay of the LL queue [ns]
415	   probNative     // Native marking probability of LL queue within [0,1]

417	   // External variables
418	   packet             // The structure holding packet header fields
419	   packet.size        // The size of the current packet [B]
420	   packet.uflow       // The flow identifier of the current packet
421	                      //  (e.g., 5-tuple or 4-tuple if IPSec)

423	   // Irrelevant details of DOCSIS function to return qdelay are removed
424	   qdelayL(...)      // Returns current delay of the low latency Q [ns]
425	   <CODE ENDS>

427	   Pseudocode for how the algorithm categorizes packets by flow ID to
428	   populate the variable packet.uflow is not given in detail here.  The
429	   application's flow ID is usually defined by a common 5-tuple (or
430	   4-tuple) of:

432	   *  source and destination IP addresses of the innermost IP header
433	      found;

435	   *  the protocol (IPv4) or next header (IPv6) field in this IP header;

437	   *  either of:

439	      -  source and destination port numbers, for TCP, UDP, UDP-Lite,
440	         SCTP, DCCP, etc.

442	      -  Security Parameters Index (SPI) for IPSec Encapsulating
443	         Security Payload (ESP) [RFC4303].

445	   The Microflow Classification section of the Queue Protection Annex of
446	   the DOCSIS spec [DOCSIS] defines various strategies to find these
447	   headers by skipping extension headers or encapsulations.  If they
448	   cannot be found, the spec defines various less-specific 3-tuples
449	   that would be used.  The DOCSIS spec should be referred to for all
450	   these strategies, which are not repeated here.
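
The choice between a 5-tuple, a 4-tuple and a less-specific fallback can be illustrated with a short Python sketch. It assumes the headers have already been located and parsed. The function name, its arguments and the particular 3-tuple used as the fallback are assumptions for illustration; the DOCSIS spec remains the reference for the actual strategies.

```python
# IANA protocol numbers for transports that carry port numbers
PORT_PROTOCOLS = {6: "TCP", 17: "UDP", 33: "DCCP", 132: "SCTP", 136: "UDP-Lite"}
ESP = 50  # IPsec Encapsulating Security Payload

def flow_id(src, dst, proto, sport=None, dport=None, spi=None):
    """Build a flow identifier from already-parsed header fields."""
    if proto in PORT_PROTOCOLS and sport is not None and dport is not None:
        return (src, dst, proto, sport, dport)   # 5-tuple
    if proto == ESP and spi is not None:
        return (src, dst, proto, spi)            # 4-tuple
    return (src, dst, proto)                     # hypothetical 3-tuple fallback
```

For example, `flow_id("192.0.2.1", "198.51.100.7", 6, 443, 51000)` yields a 5-tuple, while the same addresses with protocol 50 and an SPI yield a 4-tuple.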

452	   The array of bucket structures defined below is used by all the Queue
453	   Protection functions:

455	   <CODE BEGINS>
456	   struct bucket { // The leaky bucket structure to hold per-flow state
457	      id;          // identifier (e.g., 5-tuple) of flow using bucket
458	      t_exp;       // expiry time in units of T_RES
459	                   // (t_exp - now) = flow's normalized q'ing score
460	   };
461	   struct bucket buckets[NBUCKETS+1];
462	   <CODE ENDS>

464	4.2.  Queue Protection Data Path

466	   All the functions of Queue Protection operate on the data path,
467	   driven by packet arrivals.

469	   The following functions that maintain per-flow queuing scores and
470	   manage per-flow state are considered primarily as mechanism:

472	      pick_bucket(uflow_id); // Returns bucket identifier

474	      fill_bucket(bucket_id, pkt_size, probNative); // Returns queuing
475	      score

477	      calcProbNative(qdelay) // Returns probability of ECN-marking

479	   The following function is primarily concerned with policy:

481	      qprotect(packet, ...); // Returns exit status to either forward or
482	      redirect the packet

484	   ('...' suppresses distracting detail.)

486	   Future modifications are more likely to affect policy aspects than
487	   mechanisms.  Therefore, policy aspects would be less appropriate
488	   candidates for any hardware acceleration.

490	   The entry point to these functions is qprotect(), which is called
491	   from packet classification before each packet is enqueued into the
492	   appropriate queue, queue_id, as follows:

494	   <CODE BEGINS>
495	   classifier(packet) {
496	      // Determine which queue using ECN, DSCP and any local-use fields
497	      queue_id = classify(packet);
498	      //  LQ & CQ are macros for valid queue IDs returned by classify()
499	      if (queue_id == LQ) {
500	         // if packet classified to Low Latency Service Flow
501	         if (QPROTECT_ON
502	             && qprotect(packet, ...) == EXIT_SANCTION) {
503	            // redirect packet to Classic Service Flow
504	            queue_id = CQ;
505	         }
506	      }
507	      return queue_id;
508	   }
509	   <CODE ENDS>

511	4.2.1.  The qprotect() function

513	   On each packet arrival, qprotect() measures the current queue delay
514	   and derives the native marking probability from it.  Then it uses
515	   pick_bucket to find the bucket already holding the flow's state, or
516	   to allocate a new bucket if the flow is new or its state has expired
517	   (the most likely case).  Then the queuing score is updated by the
518	   fill_bucket() function.  That completes the mechanism aspects.

520	   The comments against the subsequent policy conditions and actions
521	   should be self-explanatory at a superficial level.  The deeper
522	   rationale for these conditions is given in Section 5.4.

524	   <CODE BEGINS>
525	   // Per packet queue protection
526	   qprotect(packet, ...) {

528	      bckt_id;   // bucket index
529	      qLscore;   // queuing score of pkt's flow in units of T_RES

531	      qdelay = qdelayL(...);
532	      probNative = calcProbNative(qdelay);

534	      bckt_id = pick_bucket(packet.uflow);
535	      // if (buckets[bckt_id].t_exp risks overflow)  // Details not shown
536	      //    return EXIT_SANCTION;
537	      qLscore = fill_bucket(bckt_id, packet.size, probNative);

539	      // Determine whether to sanction packet
540	      if ( ( ( qdelay > CRITICALqL ) // Test if qdelay over threshold...
541	         // ...and if flow's q'ing score scaled by qdelay/CRITICALqL
542	         // ...exceeds CRITICALqLSCORE
543	         && ( qdelay * qLscore > CRITICALqLPRODUCT ) )
544	         // or qLSCORE_MAX reached
545	         || ( qLscore >= qLSCORE_MAX ) )

547	         return EXIT_SANCTION;

549	      else
550	         return EXIT_SUCCESS;
551	   }
552	   <CODE ENDS>
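
The effect of the product test in the policy condition can be made concrete with a small Python sketch. It shows that the queuing score a flow may hold without being sanctioned shrinks in proportion to the queuing delay once the delay exceeds CRITICALqL. The value of T_RES and the helper score_threshold() are assumptions for illustration; the other constants use the defaults from Section 4.1.

```python
T_RES = 1000                                   # hypothetical resolution [ns]
CRITICALqL = 1000 * 1000                       # [ns]
CRITICALqLSCORE = 4000 * 1000 / T_RES          # [T_RES]
CRITICALqLPRODUCT = CRITICALqL * CRITICALqLSCORE
qLSCORE_MAX = 5e9 / T_RES                      # [T_RES]

def score_threshold(qdelay):
    """Effective score threshold [T_RES] at queue delay qdelay [ns], or
    None while qdelay is not above CRITICALqL (only the cap applies)."""
    if qdelay <= CRITICALqL:
        return None
    return CRITICALqLPRODUCT / qdelay

def sanction(qdelay, qlscore):
    """The same condition as in qprotect(), as a boolean."""
    return ((qdelay > CRITICALqL and qdelay * qlscore > CRITICALqLPRODUCT)
            or qlscore >= qLSCORE_MAX)
```

With these values, a queue delay of 2 ms halves the score threshold to 2000 units of T_RES and a delay of 4 ms quarters it to 1000. A flow whose score has reached qLSCORE_MAX is sanctioned regardless of the delay.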

554	4.2.2.  The pick_bucket() function

556	   The pick_bucket() function is optimized for flow-state that will
557	   normally have expired from packet to packet of the same flow.  It is
558	   just one way of finding the bucket associated with the flow ID of
559	   each packet - it might be possible to develop more efficient
560	   alternatives.

562	   The algorithm is arranged so that the bucket holding any live (non-
563	   expired) flow-state associated with a packet will always be found
564	   before a new bucket is allocated.  The constant ATTEMPTS, defined
565	   earlier, determines how many hashes are used to find a bucket for
566	   each flow (actually, only one hash is generated; then, by default, 5
567	   bits of it at a time are used as the hash value, because by default
568	   there are 2^5 = 32 buckets).
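
The way a single hash can supply several bucket indices is sketched below in Python. Taking the least significant BI_SIZE bits follows the comment in the pseudocode; consuming further bits by shifting the hash right on each attempt is an assumption, and the helper name candidate_buckets() is hypothetical.

```python
BI_SIZE = 5                   # bit-width of a bucket index
NBUCKETS = 1 << BI_SIZE       # 32 non-default buckets
MASK = NBUCKETS - 1           # 0b11111
ATTEMPTS = 2

def candidate_buckets(h32, attempts=ATTEMPTS):
    """Return the bucket indices tried for a flow whose hash is h32."""
    indices = []
    for _ in range(attempts):
        indices.append(h32 & MASK)   # least significant BI_SIZE bits
        h32 >>= BI_SIZE              # expose the next BI_SIZE bits
    return indices
```

For instance, the hash 0b1010111011 gives index 0b11011 on the first attempt and 0b10101 on the second.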

570	   The algorithm stores the flow's own ID in its flow-state.  So, when a
571	   packet of a flow arrives, the algorithm tries up to ATTEMPTS times to
572	   hash to a bucket, looking for the flow's own ID.  If found, it uses
573	   that bucket, first resetting the expiry time to 'now' if it has
574	   expired.

576	   If it does not find the flow's ID, and the expiry time is still
577	   current, the algorithm can tell that another flow is using that
578	   bucket, and it continues to look for a bucket for the flow.  Even if
579	   it finds another flow's bucket where the expiry time has passed, it
580	   doesn't immediately use it.  It merely remembers it as the potential
581	   bucket to use.  But first it runs through all the ATTEMPTS hashes to
582	   look for a bucket assigned to the flow ID.  Then, if a live bucket is
583	   not already associated with the packet's flow, the algorithm should
584	   have already set aside an existing bucket with a score that has aged
585	   out.  Given this bucket is no longer necessary to hold state for its
586	   previous flow, it can be recycled for use by the present packet's
587	   flow.

589	   If all else fails, there is one additional bucket (called the dregs)
590	   that can be used.  If the dregs is still in live use by another flow,
591	   subsequent flows that cannot find a bucket of their own all share it,
592	   adding their score to the one in the dregs.  A flow might get away
593	   with using the dregs on its own, but when there are many mis-marked
594	   flows, multiple flows are more likely to collide in the dregs,
595	   including innocent flows.  The choice of number of buckets and number
596	   of hash attempts determines how likely it will be that this
597	   undesirable scenario will occur.

599	   <CODE BEGINS>
600	   // Pick the bucket associated with flow uflw
601	   pick_bucket(uflw) {

603	      now;                      // current time
604	      j;                        // loop counter
605	      h32;                      // holds hash of the packet's flow IDs
606	      h;                        // bucket index being checked
607	      hsav;                     // interim chosen bucket index

609	      h32   = hash32(uflw);     // 32-bit hash of flow ID
610	      hsav  = NBUCKETS;         // Default bucket
611	      now   = get_time_now();   // in units of T_RES

613	      // The for loop checks ATTEMPTS buckets for ownership by flow-ID
614	      // It also records the 1st bucket, if any, that could be recycled
615	      // because it's expired.
616	      // Must not recycle a bucket until all ownership checks completed
617	      for (j=0; j<ATTEMPTS; j++) {
618	         // Use least signif. BI_SIZE bits of hash for each attempt
619	         h = h32 & MASK;
620	         if (buckets[h].id == uflw) {    // Once uflw's bucket found...
621	            if (buckets[h].t_exp <= now) // ...if bucket has expired...
622	               buckets[h].t_exp = now;   // ...reset it
623	            return h;                    // Either way, use it
624	         }
625	         else if ( (hsav == NBUCKETS)  // If not seen expired bucket yet
626	                                       //  and this bucket has expired
627	              && (buckets[h].t_exp <= now) ) {
628	            hsav = h;                  // set it as the interim bucket
629	         }
630	         h32 >>= BI_SIZE;          // Bit-shift hash for next attempt
631	      }
632	      // If reached here, no tested bucket was owned by the flow-ID
633	      if (hsav != NBUCKETS) {
634	         // If here, found an expired bucket within the above for loop
635	         buckets[hsav].t_exp = now;              // Reset expired bucket
636	      } else {
637	         // If here, we're having to use the default bucket (the dregs)
638	         if (buckets[hsav].t_exp <= now) {   // If dregs has expired...
639	            buckets[hsav].t_exp = now;       // ...reset it
640	         }
641	      }
642	      buckets[hsav].id = uflw; // In either case, claim for recycling
643	      return hsav;
644	   }
645	   <CODE ENDS>
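
For readers who want to experiment, the pseudocode above might be rendered as compilable C along the following lines. This is a sketch, not the DOCSIS implementation: the 64-bit flow ID type, the placeholder hash32() function, the value of ATTEMPTS and passing 'now' as a parameter (instead of calling get_time_now()) are all assumptions made for illustration, and wrap-around of the time value is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define BI_SIZE   5
#define NBUCKETS  (1u << BI_SIZE)   /* 32 regular buckets */
#define MASK      (NBUCKETS - 1u)
#define ATTEMPTS  2                 /* assumed value, for illustration */

static_assert(ATTEMPTS * BI_SIZE <= 32, "hash too short for ATTEMPTS");

struct bucket {
    uint64_t id;      /* flow ID owning the bucket (assumed 64-bit) */
    int32_t  t_exp;   /* expiry time in units of T_RES */
};

/* NBUCKETS regular buckets plus the dregs at index NBUCKETS */
static struct bucket buckets[NBUCKETS + 1];

/* Placeholder hash (assumption): any 32-bit hash of the flow ID works */
static uint32_t hash32(uint64_t uflw)
{
    return (uint32_t)((uflw * 0x9E3779B97F4A7C15ull) >> 32);
}

static uint32_t pick_bucket(uint64_t uflw, int32_t now)
{
    uint32_t h32  = hash32(uflw);
    uint32_t hsav = NBUCKETS;            /* default: the dregs */

    for (int j = 0; j < ATTEMPTS; j++) {
        uint32_t h = h32 & MASK;
        if (buckets[h].id == uflw) {     /* flow's own bucket found */
            if (buckets[h].t_exp <= now)
                buckets[h].t_exp = now;  /* expired, so reset it */
            return h;
        }
        if (hsav == NBUCKETS && buckets[h].t_exp <= now)
            hsav = h;                    /* first expired bucket seen */
        h32 >>= BI_SIZE;
    }
    /* hsav is either a recyclable bucket (always expired) or the dregs
     * (reset only if expired), so a single test covers both cases */
    if (buckets[hsav].t_exp <= now)
        buckets[hsav].t_exp = now;
    buckets[hsav].id = uflw;             /* claim the bucket */
    return hsav;
}
```

Starting from zero-initialized state, the first packet of a flow claims one of its hashed buckets, and later packets of the same flow return the same index. If every hashed bucket is live and owned by another flow, the function returns NBUCKETS, i.e., the dregs.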

647	4.2.3.  The fill_bucket() function

649	   The fill_bucket() function both accumulates and ages the queuing
650	   score over time, as outlined in Section 2.1.  To make aging the score
651	   efficient, the increment of the queuing score is normalized to units
652	   of time by dividing by AGING, so that the result represents the new
653	   expiry time of the flow.

655	   Given that probNative is already used to select which packets to ECN-
656	   mark, it might be thought that the queuing score could just be
657	   incremented by the full size of each selected packet, instead of
658	   incrementing it by the product of every packet's size (pkt_sz) and
659	   probNative.  However, the unpublished experience of one of the
660	   authors with other congestion policers has found that the score then
661	   increments far too jumpily, particularly when probNative is low.

663	   A deeper explanation of the queuing score is given in Section 5.

665	   <CODE BEGINS>
666	   fill_bucket(bckt_id, pkt_sz, probNative) {
667	      now;                                       // current time
668	      now = get_time_now();                      // in units of T_RES
669	      // Add packet's queuing score
670	      // For integer arithmetic, a bit-shift can replace the division
671	      qLscore = min(buckets[bckt_id].t_exp - now
672	                    + probNative * pkt_sz / AGING, qLSCORE_MAX);
673	      buckets[bckt_id].t_exp = now + qLscore;
674	      return qLscore;
675	   }
676	   <CODE ENDS>
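
The comment about replacing the division with a bit-shift can be illustrated with the following integer sketch in C. The fixed-point format of probNative (16 fractional bits), the power-of-two value of AGING and the value of the cap are assumptions chosen only to make the arithmetic concrete; they are not taken from the DOCSIS specifications. As in the pseudocode, the caller is assumed to have ensured (via pick_bucket()) that the bucket's expiry time is not earlier than 'now'.

```c
#include <assert.h>
#include <stdint.h>

#define PROB_SHIFT   16    /* assumed: probNative has 16 fractional bits */
#define AGING_SHIFT  4     /* assumed: AGING = 2^4 congested-B per T_RES */
#define QLSCORE_MAX  5000  /* assumed cap, in units of T_RES */

struct bucket {
    int32_t t_exp;         /* expiry time in units of T_RES */
};

static int32_t fill_bucket(struct bucket *b, int32_t now,
                           uint32_t pkt_sz, uint32_t prob_native)
{
    assert(prob_native <= (1u << PROB_SHIFT));   /* at most 1.0 */

    /* probNative * pkt_sz / AGING: one shift removes both the
     * fixed-point scaling and the division by AGING */
    int64_t incr = (int64_t)(((uint64_t)prob_native * pkt_sz)
                             >> (PROB_SHIFT + AGING_SHIFT));
    int64_t qlscore = (int64_t)b->t_exp - now + incr;
    if (qlscore > QLSCORE_MAX)
        qlscore = QLSCORE_MAX;

    b->t_exp = now + (int32_t)qlscore;   /* new expiry time */
    return (int32_t)qlscore;
}
```

With these assumed constants, a 1024-byte packet arriving when probNative is 0.5 (0x8000) adds 0.5 * 1024 / 16 = 32 time units to the expiry time.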

678	4.2.4.  The calcProbNative() function

680	   To derive this queuing score, the QProt algorithm uses the linear
681	   ramp function calcProbNative() to normalize instantaneous queuing
682	   delay into a probability in the range [0,1], which it assigns to
683	   probNative.

685	   <CODE BEGINS>
686	   calcProbNative(qdelay){
687	         if ( qdelay >= MAXTH ) {
688	            probNative = MAX_PROB;
689	         } else if ( qdelay > MINTH ) {
690	            probNative = MAX_PROB * (qdelay - MINTH)/RANGE;
691	            // In practice, the * and the / would use a bit-shift
692	         } else {
693	            probNative = 0;
694	         }
695	         return probNative;
696	   }
697	   <CODE ENDS>
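
As a sketch of how the multiplication and division might collapse into a single bit-shift, the C fragment below assumes that MAX_PROB and RANGE are both powers of two. The particular values (a 16-bit fixed-point probability, and MINTH and RANGE of 512 microseconds each) are illustrative assumptions, not the DOCSIS defaults.

```c
#include <assert.h>
#include <stdint.h>

#define PROB_SHIFT   16                  /* MAX_PROB = 2^16 represents 1.0 */
#define MAX_PROB     (1u << PROB_SHIFT)
#define RANGE_SHIFT  9                   /* assumed: RANGE = 2^9 = 512 us */
#define RANGE        (1u << RANGE_SHIFT)
#define MINTH        512u                /* assumed, in microseconds */
#define MAXTH        (MINTH + RANGE)

static_assert(PROB_SHIFT >= RANGE_SHIFT, "shift count must not be negative");

/* Linear ramp from 0 at MINTH to MAX_PROB at MAXTH */
static uint32_t calc_prob_native(uint32_t qdelay)
{
    if (qdelay >= MAXTH)
        return MAX_PROB;
    if (qdelay > MINTH)
        /* MAX_PROB * (qdelay - MINTH) / RANGE as a single left shift */
        return (qdelay - MINTH) << (PROB_SHIFT - RANGE_SHIFT);
    return 0;
}
```

With these assumed thresholds, a queuing delay halfway up the ramp (768 microseconds) gives 32768, i.e., a probability of 0.5.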

699	5.  Rationale

701	5.1.  Rationale: Blame for Queuing, not for Rate in Itself

703	   Figure 1 shows the bit rates of two flows as stacked areas.  It poses
704	   the question of which flow is more to blame for queuing delay; the
705	   unresponsive constant bit rate flow (c) that is consuming about 80%
706	   of the capacity, or the flow sending regular short unresponsive
707	   bursts (b)?  The smoothness of c seems better for avoiding queuing,
708	   but its high rate does not.  However, if flow c was not there, or ran
709	   slightly more slowly, b would not cause any queuing.

711	   ^ bit rate (stacked areas)
712	   |  ,-.          ,-.          ,-.          ,-.          ,-.
713	   |--|b|----------|b|----------|b|----------|b|----------|b|---Capacity
714	   |__|_|__________|_|__________|_|__________|_|__________|_|_____
715	   |
716	   |                       c
717	   |
718	   |
719	   |
720	   +---------------------------------------------------------------->
721	                                                                 time

723	            Figure 1: Which is More to Blame for Queuing Delay?

725	   To explain queuing scores, in the following it will initially be
726	   assumed that the QProt algorithm is accumulating queuing scores, but
727	   not taking any action as a result.

729	   To quantify the responsibility that each flow bears for queuing
730	   delay, the QProt algorithm accumulates the product of the rate of
731	   each flow and the level of congestion, both measured at the instant
732	   each packet arrives.  The instantaneous flow rate is represented at
733	   each discrete event when a packet arrives by the packet's size, which
734	   accumulates faster the more packets arrive within each unit of time.
735	   The level of congestion is normalized to a dimensionless number
736	   between 0 and 1 (probNative).  This fractional congestion level is
737	   used in preference to a direct dependence on queuing delay for two
738	   reasons:

740	   *  to be able to ignore very low levels of queuing that contribute
741	      insignificantly to delay

743	   *  to be able to erect a steep barrier against excessive queuing
744	      delay

746	   The unit of the resulting queuing score is "congested-bytes" per
747	   second, which distinguishes it from just bytes per second.

749	   Then, during the periods between bursts (b), neither flow accumulates
750	   any queuing score - the high rate of c is benign.  But, during each
751	   burst, if we say the rates of c and b are 80% and 45% of capacity,
752	   thus causing 25% overload, they bear 80/125 and 45/125 of the
753	   responsibility for the queuing delay respectively (64% and 36%).  The
754	   algorithm does not explicitly calculate these percentages.  They are
755	   just the outcome of the number of packets arriving from each flow
756	   during the burst.
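
The following C sketch illustrates how these shares emerge without being calculated explicitly. It simply sums probNative * pkt_sz per flow over a trace of packet arrivals and deliberately omits aging. The struct pkt type, the accumulate() helper and the idea of a pre-recorded trace are hypothetical constructs for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
    int    flow;          /* index of the flow the packet belongs to */
    double size;          /* packet size in bytes */
    double prob_native;   /* congestion level when the packet arrived */
};

/* Sum probNative * pkt_sz per flow over a trace of n arrivals.
 * score[] must have room for nflows entries. */
static void accumulate(const struct pkt *trace, size_t n,
                       double score[], int nflows)
{
    for (int f = 0; f < nflows; f++)
        score[f] = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(trace[i].flow >= 0 && trace[i].flow < nflows);
        score[trace[i].flow] += trace[i].prob_native * trace[i].size;
    }
}
```

If, during a burst, flow c contributes 16 equal-sized packets for every 9 from flow b (the ratio 80:45) and all of them see the same congestion level, the accumulated scores are in the ratio 64:36. Packets that arrive while probNative is zero add nothing, however many there are.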

758	   To summarize, the queuing score never sanctions rate solely on its
759	   own account.  It only sanctions rate inasmuch as it causes queuing.

761	   ^ bit rate (stacked areas)                               ,
762	   |               ,-.                       |\           ,-
763	   |------Capacity-|b|----------,-.----------|b|----------|b\-----
764	   |             __|_|_______   |b|        /``\| _...-._-': | ,.--
765	   |  ,-.     __/            \__|_|_     _/    |/          \|/
766	   |  |b| ___/                      \___/   __       r
767	   |  |_|/                v             \__/  \_______    _/\____/
768	   | _/                                               \__/
769	   |
770	   +---------------------------------------------------------------->
771	                                                                 time

773	        Figure 2: Responsibility for Queuing: More Complex Scenario

775	   Figure 2 gives a more complex illustration of the way the queuing
776	   score assigns responsibility for queuing (limited to the precision
777	   that ASCII art can illustrate).  The figure shows the bit rates of
778	   three flows represented as stacked areas labelled b, v and r.  The
779	   unresponsive bursts (b) are the same as in the previous example, but
780	   a variable rate video (v) replaces flow c.  Its rate varies as the
781	   complexity of the video scene varies.  Also on a slower timescale, in
782	   response to the level of congestion, the video adapts its quality.
783	   However, on a short time-scale it appears to be unresponsive to small
784	   amounts of queuing.  Also, part-way through, a low latency responsive
785	   flow (r) joins in, aiming to fill the balance of capacity left by the
786	   other two.

788	   The combination of the first burst and the low application-limited
789	   rate of the video causes neither flow to accumulate queuing score.
790	   In contrast, the second burst causes similar excessive overload
791	   (125%) to the example in Figure 1.  Then, the video happens to reduce
792	   its rate (probably due to a less complex scene) so the third burst
793	   causes only a little congestion.  Let us assume the resulting queue
794	   causes probNative to rise to just 1%, then the queuing score will
795	   only accumulate 1% of the size of each packet of flows v and b during
796	   this burst.

798	   The fourth burst happens to arrive just as the new responsive flow
799	   (r) has filled the available capacity, so it leads to very rapid
800	   growth of the queue.  After a round trip the responsive flow rapidly
801	   backs off, and the adaptive video also backs off more rapidly than it
802	   would normally, because of the very high congestion level.  The rapid
803	   response to congestion of flow r reduces the queuing score that all
804	   three flows accumulate, but they each still bear the cost in
805	   proportion to the product of the rates at which their packets arrive
806	   at the queue and the value of probNative when they do so.  Thus,
807	   during the fifth burst, they all accumulate less score than the
808	   fourth, because the queuing delay is not as excessive.

810	5.2.  Rationale for Aging the Queuing Score

812	   Even well-behaved flows will not always be able to respond fast
813	   enough to dynamic events.  Also well-behaved flows, e.g., DCTCP
814	   [RFC8257], TCP Prague [I-D.briscoe-iccrg-prague-congestion-control],
815	   BBRv2 [BBRv2] or the L4S variant of SCReAM [SCReAM] for real-time
816	   media [RFC8298], can maintain a very shallow queue by continual
817	   careful probing for more capacity while also continually subtracting
818	   a little from their rate (or congestion window) in response to low
819	   levels of ECN signalling.  Therefore, the QProt algorithm needs to
820	   continually offer a degree of forgiveness to age out the queuing
821	   score as it accumulates.

823	   Scalable congestion controllers such as those above maintain their
824	   congestion window in inverse proportion to the congestion level,
825	   probNative.  That leads to the important property that on average a
826	   scalable flow holds the product of its congestion window and the
827	   congestion level constant, no matter the capacity of the link or how
828	   many other flows it competes with.  For instance, if the link
829	   capacity doubles, a scalable flow induces half the congestion
830	   probability.  Or if three scalable flows compete for the capacity,
831	   each flow will reduce to one third of the capacity they would use on
832	   their own and increase the congestion level by 3x.
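
The inverse proportionality can be made concrete with a small C sketch. It models only the steady-state relationship p = k / cwnd. The constant k and the function name congestion_level() are illustrative assumptions, not taken from any particular congestion controller.

```c
#include <assert.h>

/* Steady-state congestion level induced by a scalable flow whose
 * congestion window is cwnd, assuming cwnd * p == k on average. */
static double congestion_level(double k, double cwnd)
{
    assert(k > 0.0 && cwnd > 0.0);
    return k / cwnd;
}
```

Under this model, doubling the window (e.g., because the link capacity doubled) halves the congestion level, and cutting the window to one third (three flows sharing the link) triples it, while the product of window and congestion level stays at k.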

834	   This suggests that the QProt algorithm will not sanction a well-
835	   behaved scalable flow if it ages out the queuing score at a
836	   sufficient constant rate.  The constant will need to be somewhat
837	   above the average of a well-behaved scalable flow to allow for normal
838	   dynamics.

840	   Relating QProt's aging constant to a scalable flow does not mean that
841	   a flow has to behave like a scalable flow.  It can be less
842	   aggressive, but not more.  For instance, a longer RTT flow can run at
843	   a lower congestion-rate than the aging rate, but it can also increase
844	   its aggressiveness to equal the rate of short RTT scalable flows
845	   [ScalingCC].  The constant aging of QProt also means that a long-
846	   running unresponsive flow will be prone to trigger QProt if it runs
847	   faster than a competing responsive scalable flow would.  And, of
848	   course, if a flow causes excessive queuing in the short-term, its
849	   queuing score will still rise faster than the constant aging process
850	   will decrease it.  Then QProt will still eject the flow's packets
851	   before they harm the low latency of the shared queue.

853	5.3.  Rationale for Normalized Queuing Score

855	   The QProt algorithm holds a flow's queuing score state in a structure
856	   called a bucket, because of its similarity to a classic leaky bucket
857	   (except the contents of the bucket do not represent bytes).

859	   probNative * pkt_sz   probNative * pkt_sz / AGING
860	             |                        |
861	          |  V  |                  |  V  |
862	          |  :  |        ___       |  :  |
863	          |_____|        ___       |_____|
864	          |     |        ___       |     |
865	          |__ __|                  |__ __|
866	             |                        |
867	             V                        V
868	        AGING * Dt                    Dt

870	                  Figure 3: Normalization of Queuing Score

872	   The accumulation and aging of the queuing score are shown on the left
873	   of Figure 3 in token bucket form.  Dt is the difference between the
874	   times when the scores of the current and previous packets were
875	   processed.

877	   A normalized equivalent of this token bucket is shown on the right of
878	   Figure 3, dividing both the input and output by the constant AGING
879	   rate.  The result is a bucket-depth that represents time and it
880	   drains at the rate that time passes.

882	   As a further optimization, the time the bucket was last updated is
883	   not stored with the flow-state.  Instead, when the bucket is
884	   initialized the queuing score is added to the system time 'now' and
885	   the resulting expiry time is written into the bucket.  Subsequently,
886	   if the bucket has not expired, the incremental queuing score is added
887	   to the time already held in the bucket.  Then the queuing score
888	   always represents the expiry time of the flow-state itself.  This
889	   means that the queuing score does not need to be aged explicitly
890	   because it ages itself implicitly.
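
To make the equivalence concrete, the following C sketch places an explicit leaky bucket (left of Figure 3) next to the normalized, expiry-time form (right of Figure 3, with the optimization just described). The struct and function names are hypothetical, floating-point arithmetic is used purely for readability, and the cap on the score is omitted.

```c
#include <assert.h>

/* Explicit form: score in congested-bytes plus time of last update */
struct leaky {
    double score;
    double t_last;
};

static double leaky_add(struct leaky *b, double now,
                        double incr, double aging)
{
    assert(now >= b->t_last);
    double s = b->score - aging * (now - b->t_last);  /* drain AGING * Dt */
    if (s < 0.0)
        s = 0.0;                      /* bucket cannot go below empty */
    b->score  = s + incr;
    b->t_last = now;
    return b->score;                  /* in congested-bytes */
}

/* Normalized form: only the expiry time is stored */
struct expiry {
    double t_exp;
};

static double expiry_add(struct expiry *b, double now,
                         double incr, double aging)
{
    assert(aging > 0.0);
    if (b->t_exp <= now)
        b->t_exp = now;               /* expired: restart from 'now' */
    b->t_exp += incr / aging;         /* normalized increment */
    return b->t_exp - now;            /* score in units of time */
}
```

For the same sequence of arrivals, the value returned by expiry_add() multiplied by the aging rate equals the value returned by leaky_add(), yet the normalized form stores a single number per flow and needs no multiplication to age it.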

892	5.4.  Rationale for Policy Conditions

894	   Pseudocode for the QProt policy conditions is given in Section 4.2.1
895	   within the second half of the qprotect() function.  When each packet
896	   arrives, after finding its flow state and updating the queuing score
897	   of the packet's flow, the algorithm checks whether the shared queue
898	   delay exceeds a constant threshold CRITICALqL (e.g., 2 ms), as
899	   repeated below for convenience:

901	   <CODE BEGINS>
902	      if (  ( qdelay > CRITICALqL )  // Test if qdelay over threshold...
903	         // ...and if flow's q'ing score scaled by qdelay/CRITICALqL
904	         // ...exceeds CRITICALqLSCORE
905	         && ( qdelay * qLscore > CRITICALqLPRODUCT ) )
906	         // Recall that CRITICALqLPRODUCT = CRITICALqL * CRITICALqLSCORE
907	   <CODE ENDS>

909	   If the queue delay threshold is exceeded, the flow's queuing score is
910	   temporarily scaled up by the ratio of the current queue delay to the
911	   threshold queuing delay, CRITICALqL (the reason for the scaling is
912	   given next).  If this scaled up score exceeds another constant
913	   threshold CRITICALqLSCORE, the packet is ejected.  The actual last
914	   line of code above multiplies both sides of the second condition by
915	   CRITICALqL to avoid a costly division.
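
A self-contained C sketch of this test is given below. The function name should_eject() and the convention of passing the two thresholds as parameters are assumptions for illustration; in the pseudocode the product CRITICALqLPRODUCT is a precomputed constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the arriving packet should be ejected.
 * qdelay and critical_ql share one time unit; qlscore and
 * critical_qlscore share another. */
static bool should_eject(uint32_t qdelay, uint32_t qlscore,
                         uint32_t critical_ql, uint32_t critical_qlscore)
{
    assert(critical_ql > 0);
    /* Equivalent of CRITICALqLPRODUCT; 64 bits avoids overflow */
    uint64_t product = (uint64_t)critical_ql * critical_qlscore;

    /* qlscore * (qdelay / critical_ql) > critical_qlscore, with both
     * sides multiplied by critical_ql to avoid the division */
    return qdelay > critical_ql
        && (uint64_t)qdelay * qlscore > product;
}
```

For instance, taking CRITICALqL as 2000 microseconds (the 2 ms example above) and an assumed CRITICALqLSCORE of 4000, a flow with a score of 3000 is passed over at a queue delay of 2500 microseconds. Once the queue delay reaches 4000 microseconds, any score above 2000 is enough for ejection.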

917	   This approach allows each packet to be assessed once, as it arrives.
918	   Once queue delay exceeds the threshold, this has two implications:

920	   *  The current packet might be ejected even though there are packets
921	      already in the queue from flows with higher queuing scores.
922	      However, any flow that continues to contribute to the queue will
923	      have to send further packets, giving an opportunity to eject them
924	      as well, as they subsequently arrive.

926	   *  The next packets to arrive might not be ejected, because they
927	      might belong to flows with low queuing scores.  In this case,
928	      queue delay could continue to rise with no opportunity to eject a
929	      packet.  This is why the queuing score is scaled up by the current
930	      queue delay.  Then, the more the queue has grown without ejecting
931	      a packet, the more the algorithm 'raises the bar' to further
932	      packets.

934	   The above approach is preferred over the extra per-packet processing
935	   cost of searching the buckets for the flow with the highest queuing
936	   score and searching the queue for one of its packets to eject (if one
937	   is still in the queue).

939	   Note that by default CRITICALqL_us is set to the maximum threshold of
940	   the ramp marking algorithm, MAXTH_us.  However, there is some debate
941	   as to whether setting it to the minimum threshold instead would
942	   improve QProt performance.  This would roughly double the ratio of
943	   qdelay to CRITICALqL, which is compared against the CRITICALqLSCORE
944	   threshold.  So the threshold would have to be roughly doubled
945	   accordingly.

947	   Figure 4 explains this approach graphically.  On the horizontal axis
948	   it shows actual harm, meaning the queuing delay in the shared queue.
949	   On the vertical axis it shows the behaviour record of the flow
950	   associated with the currently arriving packet, represented in the
951	   algorithm by the flow's queuing score.  The shaded region represents
952	   the combination of actual harm and behaviour record that will lead to
953	   the packet being ejected.

955	   Behaviour Record:
956	   Queueing Score of
957	   Arriving Packet's Flow
958	   ^
959	   |   +          |/ / / / / / / / / / / / / / / / / / /
960	   |    +   N     | / / / / / / / / / / / / / / / / / / /
961	   |     +        |/ / / / /                   / / / / /
962	   |      +       | / / / /  E (Eject packet)   / / / / /
963	   |       +      |/ / / / /                   / / / / /
964	   |         +    | / / / / / / / / / / / / / / / / / / /
965	   |           +  |/ / / / / / / / / / / / / / / / / / /
966	   |             +| / / / / / / / / / / / / / / / / / / /
967	   |              |+ / / / / / / / / / / / / / / / / / /
968	   |    N         |   + / / / / / / / / / / / / / / / / /
969	   | (No actual   |       +/ / / / / / / / / / / / / / /
970	   |   harm)      |            +  / / / / / / / / / / / /
971	   |              | P (Pass over)   +   ,/ / / / / / / /
972	   |              |                           ^ + /./ /_/
973	   +--------------+------------------------------------------>
974	             CRITICALqL        Actual Harm: Shared Queue Delay

976	          Figure 4: Graphical Explanation of the Policy Conditions

978	   The regions labelled 'N' represent cases where the first condition is
979	   not met - no actual harm - queue delay is below the critical
980	   threshold, CRITICALqL.

982	   The region labelled 'E' represents cases where there is actual harm
983	   (queue delay exceeds CRITICALqL) and the queuing score associated
984	   with the arriving packet is high enough to be able to eject it with
985	   certainty.

987	   The region labelled 'P' represents cases where there is actual harm,
988	   but the queuing score of the arriving packet is insufficient to eject
989	   it, so it has to be Passed over.  This adds to queuing delay, but the
990	   alternative would be to sanction an innocent flow.  It can be seen
991	   that, as actual harm increases, the judgement of innocence becomes
992	   increasingly stringent; the behaviour record of the next packet's
993	   flow does not have to be as bad to eject it.

995	   Conditioning ejection on actual harm helps prevent VPN packets being
996	   ejected unnecessarily.  VPNs consisting of multiple flows can tend to
997	   accumulate queuing score faster than it is aged out, because the
998	   aging rate is intended for a single flow.  However, whether or not
999	   some traffic is in a VPN, the queue delay threshold (CRITICALqL) will
1000	   be no more likely to be exceeded.  So conditioning ejection on actual
1001	   harm helps reduce the chance that VPN traffic will be ejected by the
1002	   QProt function.

1004	5.5.  Rationale for Reclassification as the Policy Action

1006	   When the DOCSIS QProt algorithm deems that it is necessary to eject a
1007	   packet to protect the Low Latency queue, it redirects the packet to
1008	   the Classic queue.  In the Low Latency DOCSIS architecture (as in
1009	   Coupled DualQ AQMs generally), a scheduler is arranged to give the
1010	   Low Latency Queue conditional priority over the Classic queue, for
1011	   instance using weighted round robin (WRR) with a high weight in
1012	   favour of the Low Latency queue.

1014	   Therefore, typically, an ejected packet will experience higher
1015	   queuing delay than it would otherwise, and it could be re-ordered
1016	   within its flow (assuming QProt does not eject all packets of an
1017	   anomalous flow).  The mild harm caused to the performance of the
1018	   ejected packet's flow is deliberate.  It gives senders a slight
1019	   incentive to identify their packets correctly.

1021	   If there were no such harm, there would be nothing to prevent all
1022	   flows from identifying themselves as suitable for classification into
1023	   the low latency queue, and just letting QProt sort the resulting
1024	   aggregate into queue-building and non-queue-building flows.  This
1025	   might seem like a useful alternative to requiring senders to
1026	   correctly identify their flows.  However, handling of mis-classified
1027	   flows is not without a cost.  The more packets that have to be
1028	   reclassified, the more often the delay of the low latency queue would
1029	   exceed the threshold.  Also more memory would be required to hold the
1030	   extra flow state.

1032	   When a packet is redirected into the Classic queue, an operator might
1033	   want to alter the identifier(s) that originally caused it to be
1034	   classified into the Low Latency queue, so that the packet will not be
1035	   classified into another low latency queue further downstream.
1036	   However, redirection of occasional packets can be due to unusually
1037	   high transient load just at the specific bottleneck, not necessarily
1038	   at any other bottleneck, and not necessarily due to bad flow
1039	   behaviour.  Therefore, Section 5.4.1.2 of [I-D.ietf-tsvwg-ecn-l4s-id]
1040	   precludes a network node from altering the end-to-end ECN field to
1041	   exclude traffic from L4S treatment.  Instead a local-use identifier
1042	   ought to be used (e.g., Diffserv Codepoint or VLAN tag), so that each
1043	   operator can apply its own policy, without prejudging what other
1044	   operators ought to do.

1046	   Although not supported in the DOCSIS specs, QProt could be extended
1047	   to recognize that large numbers of redirected packets belong to the
1048	   same flow.  This might be detected when the bucket expiry time t_exp
1049	   exceeds a threshold, which could also conveniently prevent overflow
1050	   of the t_exp variable (see the qprotect() function in Section 4.2.1).
1051	   Depending on policy and implementation capabilities, QProt could then
1052	   install a classifier to redirect a whole flow into the Classic queue,
1053	   with an idle timeout to remove stale classifiers.  In these
1054	   'persistent offender' cases, QProt might also overwrite each
1055	   redirected packet's DSCP or clear its ECN field to Not-ECT, in order
1056	   to protect other potential L4S queues downstream.  The DOCSIS specs
1057	   do not discuss sanctioning whole flows, so further discussion is
1058	   beyond the scope of the present document.

1060	6.  Limitations

1062	   The QProt algorithm groups packets with common layer-4 flow
1063	   identifiers.  It then uses this grouping to accumulate queuing scores
1064	   and to sanction packets.

1066	   This choice of identifier for grouping is pragmatic with no
1067	   scientific basis.  All the packets of a flow certainly pass between
1068	   the same two endpoints.  But some applications might initiate
1069	   multiple flows between the same end-points, e.g., for media, control,
1070	   data, etc.  Others might use common flow identifiers for all these
1071	   streams.  Also, a user might group multiple application flows within
1072	   the same encrypted VPN between the same layer-4 tunnel end-points.
1073	   And even if there were a one-to-one mapping between flows and
1074	   applications, there is no reason to believe that the rate at which
1075	   congestion can be caused ought to be allocated on a per application
1076	   flow basis.

1078	   The use of a queuing score that excludes those aspects of flow rate
1079	   that do not contribute to queuing (Section 5.1) goes some way to
1080	   mitigating this limitation, because the algorithm does not judge
1081	   responsibility for queuing delay primarily on the combined rate of a
1082	   set of flows grouped under one flow ID.

1084	7.  IANA Considerations (to be removed by RFC Editor)

1086	   This specification contains no IANA considerations.

1088	8.  Security Considerations

1090	   The whole of this document concerns traffic security.  It considers
1091	   the security question of how to identify and eject traffic that does
1092	   not comply with the non-queue-building behaviour required to use a
1093	   shared low latency queue, whether accidentally or maliciously.

1095	   The algorithm has been designed to fail gracefully in the face of
1096	   traffic crafted to overrun the resources used for the algorithm's own
1097	   processing and flow state.  This means that non-queue-building flows
1098	   will always be less likely to be sanctioned than queue-building
1099	   flows.  But an attack could be contrived to deplete resources in such
1100	   a way that the proportion of innocent (non-queue-building) flows that
1101	   are incorrectly sanctioned could increase.

1103	   Section 8.2 of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]
1104	   introduces the problem of maintaining low latency by either self-
1105	   restraint or enforcement, and places DOCSIS queue protection in
1106	   context within a wider set of approaches to the problem.

1108	9.  Acknowledgements

1110	   Thanks to Tom Henderson and to Adrian Farrel for their reviews of
1111	   this document.  The design of the QProt algorithm and the settings of
1112	   the parameters benefited from discussion and critique from the
1113	   participants of the cable industry working group on Low Latency
1114	   DOCSIS.  CableLabs funded Bob Briscoe's initial work on this
1115	   document.

1117	10.  References

1119	10.1.  Normative References

1121	   [DOCSIS]   CableLabs, "MAC and Upper Layer Protocols Interface
1122	              (MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable
1123	              Service Interface Specifications DOCSIS® 3.1 Version I17
1124	              or later, 21 January 2019, <https://specification-
1125	              search.cablelabs.com/CM-SP-MULPIv3.1>.

1127	   [DOCSIS-CCAP-OSS]
1128	              CableLabs, "CCAP Operations Support System Interface
1129	              Spec", Data-Over-Cable Service Interface Specifications
1130	              DOCSIS® 3.1 Version I14 or later, 21 January 2019,
1131	              <https://specification-search.cablelabs.com/CM-SP-CM-
1132	              OSSIv3.1>.

1134	   [DOCSIS-CM-OSS]
1135	              CableLabs, "Cable Modem Operations Support System
1136	              Interface Spec", Data-Over-Cable Service Interface
1137	              Specifications DOCSIS® 3.1 Version I14 or later, 21
1138	              January 2019, <https://specification-search.cablelabs.com/
1139	              CM-SP-CM-OSSIv3.1>.

1141	   [I-D.ietf-tsvwg-ecn-l4s-id]
1142	              Schepper, K. D. and B. Briscoe, "Explicit Congestion
1143	              Notification (ECN) Protocol for Very Low Queuing Delay
1144	              (L4S)", Work in Progress, Internet-Draft, draft-ietf-
1145	              tsvwg-ecn-l4s-id-23, 24 December 2021,
1146	              <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-
1147	              ecn-l4s-id-23>.

1149	   [I-D.ietf-tsvwg-nqb]
1150	              White, G. and T. Fossati, "A Non-Queue-Building Per-Hop
1151	              Behavior (NQB PHB) for Differentiated Services", Work in
1152	              Progress, Internet-Draft, draft-ietf-tsvwg-nqb-08, 25
1153	              October 2021, <https://datatracker.ietf.org/doc/html/
1154	              draft-ietf-tsvwg-nqb-08>.

1156	   [RFC8311]  Black, D., "Relaxing Restrictions on Explicit Congestion
1157	              Notification (ECN) Experimentation", RFC 8311,
1158	              DOI 10.17487/RFC8311, January 2018,
1159	              <https://www.rfc-editor.org/info/rfc8311>.

1161	10.2.  Informative References

1163	   [BBRv2]    Cardwell, N., "TCP BBR v2 Alpha/Preview Release", github
1164	              repository; Linux congestion control module,
1165	              <https://github.com/google/bbr/blob/v2alpha/README.md>.

1167	   [I-D.briscoe-iccrg-prague-congestion-control]
1168	              Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague
1169	              Congestion Control", Work in Progress, Internet-Draft,
1170	              draft-briscoe-iccrg-prague-congestion-control-00, 9 March
1171	              2021, <https://datatracker.ietf.org/doc/html/draft-
1172	              briscoe-iccrg-prague-congestion-control-00>.

1174	   [I-D.ietf-tsvwg-aqm-dualq-coupled]
1175	              Schepper, K. D., Briscoe, B., and G. White, "DualQ Coupled
1176	              AQMs for Low Latency, Low Loss and Scalable Throughput
1177	              (L4S)", Work in Progress, Internet-Draft, draft-ietf-
1178	              tsvwg-aqm-dualq-coupled-20, 24 December 2021,
1179	              <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-
1180	              aqm-dualq-coupled-20>.

1182	   [I-D.ietf-tsvwg-l4s-arch]
1183	              Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White,
1184	              "Low Latency, Low Loss, Scalable Throughput (L4S) Internet
1185	              Service: Architecture", Work in Progress, Internet-Draft,
1186	              draft-ietf-tsvwg-l4s-arch-15, 24 December 2021,
1187	              <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-
1188	              l4s-arch-15>.

1190	   [I-D.white-tsvwg-lld]
1191	              White, G., Sundaresan, K., and B. Briscoe, "Low Latency
1192	              DOCSIS - Technology Overview", Work in Progress, Internet-
1193	              Draft, draft-white-tsvwg-lld-00, 11 March 2019,
1194	              <https://datatracker.ietf.org/doc/html/draft-white-tsvwg-
1195	              lld-00>.

1197	   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
1198	              RFC 4303, DOI 10.17487/RFC4303, December 2005,
1199	              <https://www.rfc-editor.org/info/rfc4303>.

1201	   [RFC6789]  Briscoe, B., Ed., Woundy, R., Ed., and A. Cooper, Ed.,
1202	              "Congestion Exposure (ConEx) Concepts and Use Cases",
1203	              RFC 6789, DOI 10.17487/RFC6789, December 2012,
1204	              <https://www.rfc-editor.org/info/rfc6789>.

1206	   [RFC7713]  Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx)
1207	              Concepts, Abstract Mechanism, and Requirements", RFC 7713,
1208	              DOI 10.17487/RFC7713, December 2015,
1209	              <https://www.rfc-editor.org/info/rfc7713>.

1211	   [RFC8257]  Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
1212	              and G. Judd, "Data Center TCP (DCTCP): TCP Congestion
1213	              Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257,
1214	              October 2017, <https://www.rfc-editor.org/info/rfc8257>.

1216	   [RFC8298]  Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation
1217	              for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December
1218	              2017, <https://www.rfc-editor.org/info/rfc8298>.

1220	   [ScalingCC]
1221	              Briscoe, B. and K. De Schepper, "Resolving Tensions
1222	              between Congestion Control Scaling Requirements", Simula
1223	              Technical Report TR-CS-2016-001 arXiv:1904.07605, July
1224	              2017, <https://arxiv.org/abs/1904.07605>.

1226	   [SCReAM]   Johansson, I., "SCReAM", github repository,
1227	              <https://github.com/EricssonResearch/scream/blob/master/
1228	              README.md>.

1230	Authors' Addresses

1232	   Bob Briscoe (editor)
1233	   Independent
1234	   United Kingdom

1236	   Email: ietf@bobbriscoe.net
1237	   URI:   http://bobbriscoe.net/
1238	   Greg White
1239	   CableLabs
1240	   United States of America

1242	   Email: G.White@CableLabs.com