As a first concrete example, the pseudocode below gives the DualPI2
algorithm. DualPI2 follows the structure of the DualQ Coupled AQM
framework in . A simple step
threshold (in units of queuing time) is used for the Native L4S AQM, but
a ramp is also described as an alternative. The PI2 algorithm is used for the Classic AQM. PI2 is an improved variant
of the PIE AQM .
We will introduce the pseudocode in two passes. The first pass
explains the core concepts, deferring handling of overload to the second
pass. To aid comparison, line numbers are kept in step between the two
passes by using letter suffixes where the longer code needs extra
lines.
A full open source implementation for Linux is available at:
https://github.com/olgabo/dualpi2.
The pseudocode manipulates three main structures of variables: the
packet (pkt), the L4S queue (lq) and the Classic queue (cq). The
pseudocode consists of the following five functions:
initialization code () that sets parameter
defaults (the API for setting non-default values is omitted for
brevity)
enqueue code ()
dequeue code ()
a ramp function ()
used to calculate the ECN-marking probability for the L4S
queue
code to regularly update the base probability (p) used in the
dequeue code ().

It also uses the following functions that are not shown in
full here:
scheduler(), which selects between the head packets of the two
queues; the choice of scheduler technology is discussed later;
cq.len() or lq.len() returns the current length (aka. backlog)
of the relevant queue in bytes;
cq.time() or lq.time() returns the current queuing delay (aka.
sojourn time or service time) of the relevant queue in units of
time;

Queuing delay could be measured directly by storing a per-packet
time-stamp as each packet is enqueued, and subtracting this from the
system time when the packet is dequeued. If time-stamping is not easy
to introduce with certain hardware, queuing delay could be predicted
indirectly by dividing the size of the queue by the predicted
departure rate, which might be known precisely for some link
technologies (see for example ).
In our experiments so far (building on experiments with PIE) on
broadband access links ranging from 4 Mb/s to 200 Mb/s with base RTTs
from 5 ms to 100 ms, DualPI2 achieves good results with the default
parameters in . The
parameters are categorised by whether they relate to the Base PI2 AQM,
the L4S AQM or the framework coupling them together. Variables derived
from these parameters are also included at the end of each category.
Each parameter is explained as it is encountered in the walk-through
of the pseudocode below.
For brevity the pseudocode shows some parameters in units of
microseconds (us), but a real implementation would probably use
nanoseconds.
The overall goal of the code is to maintain the base probability
(p), which is an internal variable from which the marking and dropping
probabilities for L4S and Classic traffic (p_L and p_C) are derived.
The variable named p in the pseudocode and in this walk-through is the
same as p' (p-prime) in . The
probabilities p_L and p_C are derived in lines 3, 4 and 5 of the
dualpi2_update() function ()
then used in the dualpi2_dequeue() function (). The code walk-through below
builds up to explaining that part of the code eventually, but it
starts from packet arrival.
When packets arrive, first a common queue limit is checked as shown
in line 2 of the enqueuing pseudocode in . Note that the limit is
deliberately tested before enqueue to avoid any bias against larger
packets (so, depending on whether the implementation stores a packet
while testing whether to drop it from the tail, the actual buffer
memory might need to be one MTU larger than limit).
Line 2 assumes an implementation where lq and cq share common
buffer memory. An alternative implementation could use separate
buffers for each queue, in which case the arriving packet would have
to be classified first to determine which buffer to check for
available space. The choice is a trade-off; a shared buffer can use
less memory whereas separate buffers isolate the L4S queue from
tail-drop due to large bursts of Classic traffic (e.g. a Classic TCP
during slow-start over a long RTT).
Returning to the shared buffer case, if limit is not exceeded, the
packet will be classified and enqueued to the Classic or L4S queue
dependent on the least significant bit of the ECN field in the IP
header (line 5). Packets with a codepoint having an LSB of 0 (Not-ECT
and ECT(0)) will be enqueued in the Classic queue. Otherwise, ECT(1)
and CE packets will be enqueued in the L4S queue. Optional additional
packet classification flexibility is omitted for brevity (see ).
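The classification rule described above can be sketched in Python. This is a hypothetical illustration; the constant and function names are not taken from the draft's pseudocode.

```python
# Sketch of the ECN-LSB classifier described above: packets whose ECN
# field has a least significant bit of 1 (ECT(1) or CE) go to the L4S
# queue; Not-ECT and ECT(0) go to the Classic queue.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # RFC 3168 codepoints

def classify(ecn_field):
    """Select the queue from the LSB of the 2-bit ECN field."""
    return 'L4S' if ecn_field & 1 else 'Classic'
```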
The dequeue pseudocode () is repeatedly called whenever
the lower layer is ready to forward a packet. It schedules one packet
for dequeuing (or zero if the queue is empty) then returns control to
the caller, so that it does not block while that packet is being
forwarded. While making this dequeue decision, it also makes the
necessary AQM decisions on dropping or marking. The alternative of
applying the AQMs at enqueue would shift some processing from the
critical time when each packet is dequeued. However, it would also add
a whole queue of delay to the control signals, making the control loop
very sloppy.
All the dequeue code is contained within a large while loop so that
if it decides to drop a packet, it will continue until it selects a
packet to schedule. Line 3 of the dequeue pseudocode is where the
scheduler chooses between the L4S queue (lq) and the Classic queue
(cq). Detailed implementation of the scheduler is not shown (see
discussion later).
If an L4S packet is scheduled, lines 7 and 8 ECN-mark the
packet if a random marking decision is drawn according to p_L.
Line 6 calculates p_L as the maximum of the coupled L4S
probability p_CL and the probability from the native L4S AQM p'_L.
This implements the max() function shown in to couple the outputs of the two
AQMs together. Of the two probabilities input to p_L in line
6:
p'_L is calculated per packet in line 5 by the laqm()
function (see ),
whereas p_CL is maintained by the dualpi2_update() function
which runs every Tupdate (default 16ms) (see ).

If a Classic packet is scheduled, lines 10 to 17 drop or mark
the packet based on the squared probability p_C.
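The probability coupling just described (p_L as the max of the native and coupled probabilities, and the squared Classic probability) can be sketched as follows. Function names are illustrative, not taken from the draft's pseudocode.

```python
def l4s_mark_probability(p, k, native_p_l):
    """Coupled L4S marking probability: p_L = max(p_CL, p'_L), where
    p_CL = k * p is the coupled probability from Equation (1) and
    native_p_l (p'_L) comes from the native L4S ramp AQM."""
    p_cl = k * p
    return min(1.0, max(p_cl, native_p_l))

def classic_drop_probability(p):
    """The Classic drop/mark probability is the square of the base
    probability: p_C = p^2."""
    return p * p
```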

The Native L4S AQM algorithm () is a ramp function, similar to
the RED algorithm, but simpler due to the following differences:
The min and max of the ramp are defined in units of queuing
delay, not bytes, so that configuration remains invariant as the
queue departure rate varies.
It uses instantaneous queueing delay to remove smoothing delay
(L4S senders smooth incoming ECN feedback when necessary).
The ramp rises linearly from 0 to 1, not to an
intermediate value of p'_L as RED would, because there is no need
to keep ECN marking probability low.
Marking does not have to be randomized. Determinism is being
experimented with instead of randomness, to reduce the delay
necessary to smooth out the noise of randomness from the signal.
In this case, for each packet, the algorithm would accumulate p_L
in a counter and mark the packet that took the counter over 1,
then subtract 1 from the counter and continue.
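The deterministic accumulate-and-mark variant described in the last bullet can be sketched as below (an illustrative helper, not part of the draft's pseudocode):

```python
def deterministic_marks(p_l_sequence):
    """Accumulate p_L per packet in a counter; mark the packet that
    takes the counter over 1, then subtract 1 and continue.  Returns
    the per-packet mark decisions."""
    counter = 0.0
    marks = []
    for p_l in p_l_sequence:
        counter += p_l
        if counter >= 1.0:
            marks.append(True)
            counter -= 1.0
        else:
            marks.append(False)
    return marks
```

Over the long run this marks the same fraction of packets as random marking with probability p_L, but without the noise of randomness.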

This ramp function requires two configuration parameters, the
minimum threshold (minTh) and the width of the ramp (range), both in
units of queuing time, as shown in the parameter initialization code
in . A minimum marking
threshold parameter (Th_len) in transmission units (default 2 MTU) is
also necessary to ensure that the ramp does not trigger excessive
marking on slow links. The code in lines 23-28 of converts 2 MTU into time
units and adjusts the ramp thresholds to be no shallower than this
floor.
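One plausible reading of that threshold-floor adjustment, sketched in Python (parameter names and the exact adjustment rule are assumptions for illustration):

```python
def adjust_ramp(min_th_us, range_us, mtu_bytes, link_rate_bps,
                th_len_pkts=2):
    """Convert Th_len (default 2 MTU) into serialization time at the
    link rate, and raise minTh so the ramp is no shallower than that
    floor.  Thresholds are in microseconds; returns (minTh, maxTh)."""
    floor_us = th_len_pkts * mtu_bytes * 8 * 1e6 / link_rate_bps
    if min_th_us < floor_us:
        min_th_us = floor_us
    return min_th_us, min_th_us + range_us
```

At 4 Mb/s with a 1500 B MTU the floor is 6 ms, so it dominates a sub-millisecond default minTh; at 1 Gb/s the floor (24 us) is negligible and the configured minTh stands.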
An operator can effectively turn the ramp into a step function, as
used by DCTCP, by setting the range to its minimum value (e.g. 1 ns).
Then the condition for the ramp calculation will hardly ever arise.
There is some concern that using the step function of DCTCP for the
Native L4S AQM requires end-systems to smooth the signal for an
unnecessarily large number of round trips to ensure sufficient
fidelity. A ramp seems to be no worse than a step in initial
experiments with existing DCTCP. Therefore, it is recommended that a
ramp is configured in place of a step, which will allow congestion
control algorithms to investigate faster smoothing algorithms.
p_CL depends on the base probability (p), which is kept up to date
by the core PI algorithm in
executed every Tupdate.
Note that p solely depends on the queuing time in the Classic
queue. In line 2, the current queuing delay (curq) is evaluated from
how long the head packet was in the Classic queue (cq). The function
cq.time() (not shown) subtracts the time stamped at enqueue from the
current time and implicitly takes the current queuing delay as 0 if
the queue is empty.
The algorithm centres on line 3, which is a classical
Proportional-Integral (PI) controller that alters p dependent on: a)
the error between the current queuing delay (curq) and the target
queuing delay ('target' - see ); and b) the
change in queuing delay since the last sample. The name 'PI'
represents the fact that the second factor (how fast the queue is
growing) is Proportional to load while the first is the Integral of
the load (so it removes any standing queue in excess of the
target).
The two 'gain factors' in line 3, alpha_U and beta_U, respectively
weight how strongly each of these elements ((a) and (b)) alters p.
They are in units of 'per second of delay' or Hz, because they
transform differences in queueing delay into changes in
probability.
alpha_U and beta_U are derived from the input parameters alpha and
beta (see lines 5 and 6 of ). These recommended values
of alpha and beta come from the stability analysis in so that the AQM can change p as fast as possible in
response to changes in load without over-compensating and therefore
causing oscillations in the queue.
alpha and beta determine how much p ought to change if it was
updated every second. It is best to update p as frequently as
possible, but the update interval (Tupdate) will probably be
constrained by hardware performance. For link rates from 4 - 200 Mb/s,
we found Tupdate=16ms (as recommended in ) is
sufficient. However small the chosen value of Tupdate, p should change
by the same amount per second, but in finer, more frequent steps. So
the gain factors used for updating p in need to be scaled by (Tupdate/1s),
which is done in lines 9 and 10 of . The suffix '_U' represents
'per update time' (Tupdate).
In corner cases, p can overflow the range [0,1] so the resulting
value of p has to be bounded (omitted from the pseudocode). Then, as
already explained, the coupled and Classic probabilities are derived
from the new p in lines 4 and 5 as p_CL = k*p and p_C = p^2.
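Putting the last few paragraphs together, one Tupdate step of the PI2 update can be sketched as below. The gain and target values here are illustrative placeholders, not normative defaults from the draft.

```python
def pi2_update(p, curq_us, prevq_us, target_us=15000,
               alpha_hz=0.16, beta_hz=3.2, tupdate_s=0.016, k=2):
    """One update of the base probability p:
    - alpha weights the error from target (the Integral element);
    - beta weights the change in queuing delay (the Proportional
      element);
    - both per-second gains are scaled by Tupdate to give the
      per-update gains alpha_U and beta_U;
    - p is bounded to [0,1], then p_CL = k*p and p_C = p^2."""
    alpha_u = alpha_hz * tupdate_s
    beta_u = beta_hz * tupdate_s
    delta = (alpha_u * (curq_us - target_us)
             + beta_u * (curq_us - prevq_us)) * 1e-6  # us -> s
    p = min(1.0, max(0.0, p + delta))
    p_cl = min(1.0, k * p)
    p_c = p * p
    return p, p_cl, p_c
```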
Because the coupled L4S marking probability (p_CL) is factored up
by k, the dynamic gain parameters alpha and beta are also inherently
factored up by k for the L4S queue, which is necessary to ensure that
Classic TCP and DCTCP controls have the same stability. So, if alpha
is 10 Hz^2, the effective gain factor for the L4S queue is k*alpha,
which is 20 Hz^2 with the default coupling factor of k=2.
Unlike in PIE , alpha_U and beta_U do not
need to be tuned every Tupdate dependent on p. Instead, in PI2,
alpha_U and beta_U are independent of p because the squaring applied
to Classic traffic tunes them inherently. This is explained in , which also explains why this more principled approach
removes the need for most of the heuristics that had to be added to
PIE.
{ToDo: Scaling beta with Tupdate and scaling both alpha & beta
with RTT}
repeats the
dequeue function of , but
with overload details added. Similarly repeats the core PI algorithm
of with overload details
added. The initialization, enqueue and L4S AQM functions are
unchanged.
In line 7 of the initialization function (), the default maximum
Classic drop probability p_Cmax = 1/4 or 25%. This is the point at
which it is deemed that the Classic queue has become persistently
overloaded, so it switches to using solely drop, even for ECN-capable
packets. This protects the queue against any unresponsive traffic that
falsely claims that it is responsive to ECN marking, as required by
and .
Line 22 of the initialization function translates this into a
maximum L4S marking probability (p_Lmax) by rearranging Equation (1).
With a coupling factor of k=2 (the default) or greater, this
translates to a maximum L4S marking probability of 1 (or 100%). This
is intended to ensure that the L4S queue starts to introduce dropping
once marking saturates and can rise no further. The 'TCP Prague'
requirements state that,
when an L4S congestion control detects a drop, it falls back to a
response that coexists with 'Classic' TCP. So it is correct that the
L4S queue drops packets proportional to p^2, as if they are Classic
packets.
Both these switch-overs are triggered by the tests for overload
introduced in lines 4b and 12b of the dequeue function (). Lines 8c to 8g drop L4S
packets with probability p^2. Lines 8h to 8i mark the remaining
packets with probability p_CL. If p_Lmax = 1, which is the suggested
default configuration, all remaining packets will be marked because,
to have reached the else block at line 8b, p_CL >= 1.
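The overload switch-over just described can be sketched as a per-packet decision (an illustrative helper; the `rand` parameter stands in for the random draw so the logic can be tested deterministically):

```python
import random

def l4s_overload_action(p, p_cl, p_lmax=1.0, rand=None):
    """Once the coupled marking probability p_CL saturates at p_Lmax,
    the L4S queue drops with the squared (Classic) probability p^2
    and marks the remaining packets; below saturation it marks with
    probability p_CL.  Returns 'drop', 'mark' or 'none'."""
    r = random.random() if rand is None else rand
    if p_cl >= p_lmax:                 # marking has saturated
        return 'drop' if r < p * p else 'mark'
    return 'mark' if r < p_cl else 'none'
```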
Lines 2c to 2d in the core PI algorithm () deal with overload of the L4S
queue when there is no Classic traffic. This is necessary, because the
core PI algorithm maintains the appropriate drop probability to
regulate overload, but it depends on the length of the Classic queue.
If there is no Classic queue the naive algorithm in drops nothing, even if the L4S
queue is overloaded - so tail drop would have to take over (lines 3
and 4 of ).
If the test at line 2a finds that the Classic queue is empty, line
2d measures the current queue delay using the L4S queue instead. While
the L4S queue is not overloaded, its delay will always be tiny
compared to the target Classic queue delay. So p_L will be driven to
zero, and the L4S queue will naturally be governed solely by threshold
marking (lines 5 and 6 of the dequeue algorithm in ). But, if unresponsive L4S
source(s) cause overload, the DualQ transitions smoothly to L4S
marking based on the PI algorithm. And as overload increases, it
naturally transitions from marking to dropping by the switch-over
mechanism already described.
The choice of scheduler technology is critical to overload
protection (see ).
A well-understood weighted scheduler such as weighted round
robin (WRR) is recommended. The scheduler weight for Classic
should be low, e.g. 1/16.
Alternatively, a time-shifted FIFO could be used. This is a
very simple scheduler, but it does not fully isolate latency in
the L4S queue from uncontrolled bursts in the Classic queue. It
works by selecting the head packet that has waited the longest,
biased against the Classic traffic by a time-shift of tshift. To
implement time-shifted FIFO, the "if (scheduler() == lq )" test in
line 3 of the dequeue code would simply be replaced by "if (
lq.time() + tshift >= cq.time() )". For the public Internet a
good value for tshift is 50ms. For private networks with smaller
diameter, about 4*target would be reasonable.
A strict priority scheduler would be inappropriate, because it
would starve Classic if L4S was overloaded.
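The time-shifted FIFO test quoted in the second bullet can be sketched directly (times in microseconds; the 50 ms tshift is the value suggested above for the public Internet):

```python
def tshift_fifo_selects_l4s(lq_time_us, cq_time_us, tshift_us=50000):
    """Serve the head packet that has waited longest, with the L4S
    queue's waiting time credited by tshift: equivalent to replacing
    the scheduler() test with 'lq.time() + tshift >= cq.time()'."""
    return lq_time_us + tshift_us >= cq_time_us
```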

As another example of a DualQ Coupled AQM algorithm, the pseudocode
below gives the Curvy RED based algorithm we used and tested. Although
we designed the AQM to be efficient in integer arithmetic, to aid
understanding it is first given using real-number arithmetic. Then, one
possible optimization for integer arithmetic is given, also in
pseudocode. To aid comparison, the line numbers are kept in step between
the two by using letter suffixes where the longer code needs extra
lines.
Packet classification code is not shown, as it is no different from
. Potential classification
schemes are discussed in . The
Curvy RED algorithm has not been maintained to the same degree as the
DualPI2 algorithm. Some ideas used in DualPI2 would need to be
translated into Curvy RED, such as i) the conditional priority scheduler
instead of strict priority ii) the time-based L4S threshold; iii)
turning off ECN as overload protection; iv) Classic ECN support. These
are not shown in the Curvy RED pseudocode, but would need to be
implemented for production. {ToDo}
At the outer level, the structure of dualq_dequeue() implements
strict priority scheduling. The code is written assuming the AQM is
applied on dequeue (Note ). Every time dualq_dequeue() is called,
the if-block in lines 2-6 determines whether there is an L4S packet to
dequeue by calling lq.dequeue(pkt), and otherwise the while-block in
lines 7-13 determines whether there is a Classic packet to dequeue, by
calling cq.dequeue(pkt). (Note )
In the lower priority Classic queue, a while loop is used so that, if
the AQM determines that a classic packet should be dropped, it continues
to test for classic packets deciding whether to drop each until it
actually forwards one. Thus, every call to dualq_dequeue() returns one
packet if at least one is present in either queue, otherwise it returns
NULL at line 14. (Note )
Within each queue, the decision whether to drop or mark is taken as
follows (to simplify the explanation, it is assumed that U=1):
If the test at line 2 determines there is an L4S
packet to dequeue, the tests at lines 3a and 3c determine whether to
mark it. The first is a simple test of whether the L4S queue
(lq.byt() in bytes) is greater than a step threshold T in bytes
(Note ). The second
test is similar to the random ECN marking in RED, but with the
following differences: i) the marking function does not start with a
plateau of zero marking until a minimum threshold, rather the
marking probability starts to increase as soon as the queue is
positive; ii) marking depends on queuing time, not bytes, in order
to scale for any link rate without being reconfigured; iii) marking
of the L4S queue does not depend on itself, it depends on the
queuing time of the other (Classic)
queue, where cq.sec() is the queuing time of the packet at the head
of the Classic queue (zero if empty); iv) marking depends on the
instantaneous queuing time (of the other Classic queue), not a
smoothed average; v) the queue is compared with the maximum of U
random numbers (but if U=1, this is the same as the single random
number used in RED). Specifically, in line 3a
the marking probability p_L is set to the Classic queueing time
qc.sec() in seconds divided by the L4S scaling parameter 2^S_L,
which represents the queuing time (in seconds) at which marking
probability would hit 100%. Then in line 3d (if U=1) the result is
compared with a uniformly distributed random number between 0 and 1,
which ensures that marking probability will linearly increase with
queueing time. The scaling parameter is expressed as a power of 2 so
that division can be implemented as a right bit-shift (>>) in
line 3 of the integer variant of the pseudocode ().
If the test at line 7 determines that there
is at least one Classic packet to dequeue, the test at line 9b
determines whether to drop it. But before that, line 8b updates Q_C,
which is an exponentially weighted moving average (Note ) of the queuing time
in the Classic queue, where pkt.sec() is the instantaneous queueing
time of the current Classic packet and alpha is the EWMA constant
for the classic queue. In line 8a, alpha is represented as an
integer power of 2, so that in line 8 of the integer code the
division needed to weight the moving average can be implemented by a
right bit-shift (>> f_C). Lines 9a and
9b implement the drop function. In line 9a the averaged queuing time
Q_C is divided by the Classic scaling parameter 2^S_C, in the same
way that queuing time was scaled for L4S marking. This scaled
queuing time is given the variable name sqrt_p_C because it will be
squared to compute Classic drop probability, so before it is squared
it is effectively the square root of the drop probability. The
squaring is done by comparing it with the maximum out of two random
numbers (assuming U=1). Comparing it with the maximum out of two is
the same as the logical `AND' of two tests, which ensures drop
probability rises with the square of queuing time (Note ). Again, the
scaling parameter is expressed as a power of 2 so that division can
be implemented as a right bit-shift in line 9 of the integer
pseudocode.
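The Curvy RED marking, smoothing and dropping steps walked through above can be sketched in real-number arithmetic as follows. These are illustrative helpers with assumed names; they mirror the description, not the draft's own pseudocode listing.

```python
import random

def maxrand(u, rng=random):
    """Maximum of u uniform random numbers in [0, 1)."""
    return max(rng.random() for _ in range(u))

def curvy_mark_l4s(cq_sec, s_l, u=1, rng=random):
    """L4S marking: the *Classic* queuing time cq.sec() scaled by
    2^S_L, compared against maxrand(U)."""
    p_l = cq_sec / (2 ** s_l)
    return p_l > maxrand(u, rng)

def ewma_update(q_c, pkt_sec, f_c=5):
    """EWMA of Classic queuing time with alpha = 2^-f_C:
    Q_C += (pkt.sec() - Q_C) * 2^-f_C."""
    return q_c + (pkt_sec - q_c) / (2 ** f_c)

def curvy_drop_classic(q_c_sec, s_c, u=1, rng=random):
    """Classic dropping: the smoothed queuing time Q_C scaled by
    2^S_C, compared against maxrand(2*U), which squares the
    effective drop probability."""
    sqrt_p_c = q_c_sec / (2 ** s_c)
    return sqrt_p_c > maxrand(2 * u, rng)
```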

The marking/dropping functions in each queue (lines 3 & 9) are
two cases of a new generalization of RED called Curvy RED, motivated as
follows. When we compared the performance of our AQM with fq_CoDel and
PIE, we came to the conclusion that their goal of holding queuing delay
to a fixed target is misguided . As the
number of flows increases, if the AQM does not allow TCP to increase
queuing delay, it has to introduce abnormally high levels of loss. Then
loss rather than queuing becomes the dominant cause of delay for short
flows, due to timeouts and tail losses.
Curvy RED constrains delay with a softened target that allows some
increase in delay as load increases. This is achieved by increasing drop
probability on a convex curve relative to queue growth (the square curve
in the Classic queue, if U=1). Like RED, the curve hugs the zero axis
while the queue is shallow. Then, as load increases, it introduces a
growing barrier to higher delay. But, unlike RED, it requires only one
parameter, the scaling, not three. The disadvantage of Curvy RED is
that it is not adapted to a wide range of RTTs. Curvy RED can be used
as-is when the RTT range to be supported is limited; otherwise an
adaptation mechanism is required.
There follows a summary listing of the two parameters used for each
of the two queues:
The scaling factor of the dropping function
scales Classic queuing times in the range [0, 2^(S_C)] seconds
into a dropping probability in the range [0,1]. To make division
efficient, it is constrained to be an integer power of two;
To smooth the queuing time of the Classic
queue and make multiplication efficient, we use a negative
integer power of two for the dimensionless EWMA constant, which
we define as alpha = 2^(-f_C).

As for the Classic queue, the
scaling factor of the L4S marking function scales Classic
queueing times in the range [0, 2^(S_L)] seconds into a
probability in the range [0,1]. Note that S_L = S_C + k', where
k' is the coupling between the queues. So S_L and k' count as
only one parameter; k' is related to k in Equation (1) () by k=2^k', where both k and k' are
constants. Then implementations can avoid costly division by
shifting p_L by k' bits to the right.
The queue size in bytes at which step
threshold marking starts in the L4S queue.

{ToDo: These are the raw parameters used within the algorithm.
A configuration front-end could accept more meaningful parameters and
convert them into these raw parameters.}
From our experiments so far, recommended values for these parameters
are: S_C = -1; f_C = 5; T = 5 * MTU for the range of base RTTs typical
on the public Internet. explains why
these parameters are applicable whatever rate link this AQM
implementation is deployed on and how the parameters would need to be
adjusted for a scenario with a different range of RTTs (e.g. a data
centre) {ToDo incorporate a summary of that report into this draft}. The
setting of k depends on policy (see and
respectively for its recommended
setting and guidance on alternatives).
There is also a cUrviness parameter, U, which is a small positive
integer. It is likely to take the same hard-coded value for all
implementations, once experiments have determined a good value. We have
solely used U=1 in our experiments so far, but results might be even
better with U=2 or higher.
Note that the dropping function at line 9 calls maxrand(2*U), which
gives twice as much curviness as the call to maxrand(U) in the marking
function at line 3. This is the trick that implements the square rule in
equation (1) (). This is based on the fact
that, given a number X from 1 to 6, the probability that two dice throws
will both be less than X is the square of the probability that one throw
will be less than X. So, when U=1, the L4S marking function is linear
and the Classic dropping function is squared. If U=2, L4S would be a
square function and Classic would be quartic. And so on.
The maxrand(u) function in lines 16-21 simply generates u random
numbers and returns the maximum (Note ). Typically, maxrand(u) could be
run in parallel out of band. For instance, if U=1, the Classic queue
would require the maximum of two random numbers. So, instead of calling
maxrand(2*U) in-band, the maximum of every pair of values from a
pseudorandom number generator could be generated out-of-band, and held
in a buffer ready for the Classic queue to consume.
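The dice argument behind maxrand() can be checked by exhaustive enumeration: for a number X from 1 to 6, the probability that two throws both land below X is exactly the square of the probability that one throw does.

```python
def prob_single_below(x, sides=6):
    """P(one die throw < x), by counting outcomes."""
    return sum(1 for a in range(1, sides + 1) if a < x) / sides

def prob_both_below(x, sides=6):
    """P(both of two throws < x), by enumerating all pairs; this is
    what comparing against the max of two random numbers computes."""
    outcomes = [(a, b) for a in range(1, sides + 1)
                       for b in range(1, sides + 1)]
    return sum(1 for a, b in outcomes if a < x and b < x) / len(outcomes)
```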
Notes:
The drain rate of the queue can vary
if it is scheduled relative to other queues, or to cater for
fluctuations in a wireless medium. To auto-adjust to changes in
drain rate, the queue must be measured in time, not bytes or packets
. In our Linux implementation, it was easiest
to measure queuing time at dequeue. Queuing time can be estimated
when a packet is enqueued by measuring the queue length in bytes and
dividing by the recent drain rate.
An implementation has to use
priority queueing, but it need not implement strict priority.
If packets can be enqueued while
processing dequeue code, an implementer might prefer to place the
while loop around both queues so that it goes back to test again
whether any L4S packets arrived while it was dropping a Classic
packet.
In order not to change too many factors
at once, for now, we keep the marking function for DCTCP-only
traffic as similar as possible to DCTCP. However, unlike DCTCP, all
processing is at dequeue, so we determine whether to mark a packet
at the head of the queue by the byte-length of the queue behind it. We plan to test whether using
queuing time will work in all circumstances, and if we find that the
step can cause oscillations, we will investigate replacing it with a
steep random marking curve.
An EWMA is only one possible way to
filter bursts; other more adaptive smoothing methods could be valid
and it might be appropriate to decrease the EWMA faster than it
increases.
In practice, at line 10 the
Classic queue would probably test for ECN capability on the packet
to determine whether to drop or mark the packet. However, for
brevity such detail is omitted. All packets classified into the L4S
queue have to be ECN-capable, so no dropping logic is necessary at
line 3. Nonetheless, L4S packets could be dropped by overload code
(see ).
In the integer variant of the
pseudocode () real numbers are
all represented as integers scaled up by 2^32. In lines 3 & 9
the function maxrand() is arranged to return an integer in the range
0 <= maxrand() < 2^32. Queuing times are also scaled up by
2^32, but in two stages: i) In lines 3 and 8 queuing times cq.ns()
and pkt.ns() are returned in integer nanoseconds, making the values
about 2^30 times larger than when the units were seconds, ii) then
in lines 3 and 9 an adjustment of -2 to the right bit-shift
multiplies the result by 2^2, to complete the scaling by 2^32.

   RTT_C / RTT_L    Reno    Cubic
   -------------------------------
         1          k'=1    k'=0
         2          k'=2    k'=1
         3          k'=2    k'=2
         4          k'=3    k'=2
         5          k'=3    k'=3

k' is related to k in Equation (1) ()
by k=2^k'.
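The table above can be expressed as a small lookup, together with the k = 2^k' conversion (the dictionary layout is an illustrative convenience, not part of the draft):

```python
# k' values from the table above, keyed by the Classic congestion
# control and the rounded RTT ratio RTT_C / RTT_L.
K_PRIME = {
    'Reno':  {1: 1, 2: 2, 3: 2, 4: 3, 5: 3},
    'Cubic': {1: 0, 2: 1, 3: 2, 4: 2, 5: 3},
}

def coupling_factor(k_prime):
    """k is related to k' by k = 2^k', so that the coupled marking
    can be implemented as a right bit-shift by k' bits."""
    return 2 ** k_prime
```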
To determine the appropriate policy, the operator first has to judge
whether it wants DCTCP flows to have roughly equal throughput with Reno
or with Cubic (because, even in its Reno-compatibility mode, Cubic is
about 1.4 times more aggressive than Reno). Then the operator needs to
decide at what ratio of RTTs it wants DCTCP and Classic flows to have
roughly equal throughput. For example, choosing k'=0 (equivalent to k=1)
will make DCTCP throughput roughly the same as Cubic, if their RTTs are the same .
However, even if the base RTTs are the same, the actual RTTs are
unlikely to be the same, because Classic (Cubic or Reno) traffic needs a
large queue to avoid under-utilization and excess drop, whereas L4S
(DCTCP) does not. The operator might still choose this policy if it
judges that DCTCP throughput should be rewarded for keeping its own
queue short.
On the other hand, the operator will choose one of the higher values
for k', if it wants to slow DCTCP down to roughly the same throughput as
Classic flows, to compensate for Classic flows slowing themselves down
by causing themselves extra queuing delay.
The values for k' in the table are derived from the formula
developed in :
For localized traffic from a particular ISP's data centre, we used
the measured RTTs to calculate that a value of k'=3 (equivalent to k=8)
would achieve throughput equivalence, and our experiments verified the
formula very closely.
For a typical mix of RTTs from local data centres and across the
general Internet, a value of k'=1 (equivalent to k=2) is recommended as
a good workable compromise.
Most of the following open issues are also tagged '{ToDo}' at the
appropriate point in the document:
PI2 appendix: scaling of alpha & beta, esp. dependence of
beta_U on Tupdate
Curvy RED appendix: complete the unfinished parts
