MulTFRC: TFRC with weighted fairness
University of Oslo, PO Box 1080 Blindern, Oslo, N-0316, Norway, +47 22 85 24 20, michawe@ifi.uio.no
University of Innsbruck, Technikerstr. 21 A, Innsbruck, A-6020, Austria, +43 512 507 96803, dragana.damjanovic@uibk.ac.at
University of Oslo, PO Box 1080 Blindern, Oslo, N-0316, Norway, +47 22 85 24 44, steing@ifi.uio.no
Transport
Internet Engineering Task Force
This document specifies the MulTFRC congestion control mechanism.
MulTFRC is derived from TFRC and can be
implemented by carrying out a few small and simple changes to
the original mechanism. Its behavior differs
from TFRC in that it emulates a number of TFRC flows with more
flexibility than what would be practical or
even possible using multiple real TFRC flows. Additionally, MulTFRC better
preserves the original features of TFRC than multiple real TFRCs do.
"TCP-friendliness", the requirement for a flow to behave under
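congestion like a flow produced by a conformant TCP (introduced
under the name "TCP-compatibility" in "Recommendations on Queue Management and Congestion Avoidance in the Internet"),
has been called into question in recent years (cf. "Flow rate fairness: dismantling a religion").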
As an illustrative example, consider the fact that not all
data transfers are of equal importance to a user. A user may
therefore want to assign different priorities to different flows
between two hosts, but TCP(-friendly) congestion control would
always let these flows use the same sending rate. Users and
their applications are now already bypassing TCP-friendliness in
practice: since multiple TCP flows can better saturate
a bottleneck than a single one, some applications open multiple
connections as a simple workaround. The "GridFTP"
protocol explicitly provides this function as a performance improvement.
Some research efforts were therefore carried out to develop
protocols where a weight can directly be applied to the congestion
control mechanism,
allowing a flow to be as aggressive as a number of parallel TCP flows at
the same time. The first, and best
known, such protocol is MulTCP, which
emulates N TCPs in a rather simple fashion. Improved versions were
later published, e.g. Stochastic TCP
and Probe-Aided (PA-)MulTCP. These protocols
could be called "N-TCP-friendly", i.e. as TCP-friendly as N TCPs.
MulTFRC, defined in this document, does with TFRC what MulTCP does
with TCP. In "MulTFRC: Providing Weighted Fairness for Multimedia Applications (and others too!)", it was shown that MulTFRC
achieves its goal of emulating N flows better than MulTCP (and
improved versions of it) and has a number of other benefits. For
instance, MulTFRC with N=2 is more
reactive than two real TFRC flows are, and it has a smoother sending
rate than two real TFRC flows do. Moreover, since it is only one
mechanism, a protocol that uses MulTFRC can send a single data stream
with the congestion control behavior of multiple data streams
without the need to split the data and spread it over separate
connections. Depending on the protocol in use, N real TFRC flows can also
be expected to have N times the overhead for, e.g., connection setup
and teardown, of a MulTFRC flow with the same value of N.
The core idea of TFRC is to achieve TCP-friendliness by explicitly
calculating an equation which approximates the steady-state throughput
of TCP and sending as much as the calculation says. The core idea of
MulTFRC is to replace this equation in TFRC with the algorithm
from "Extending the TCP Steady-State Throughput Equation for Parallel TCP Flows", which approximates the steady-state
throughput of N TCP flows. MulTFRC can be implemented via
a few simple changes to the TFRC code. It is therefore defined
here by specifying how it differs from the TFRC
specification.
This section lists the changes to the TFRC specification that must be
applied to turn TFRC into MulTFRC. The small number of
changes ensures that many original properties of a single TFRC flow are
preserved, which is often the most appropriate choice (e.g. it
would probably not make sense for a MulTFRC flow to detect a data-limited
interval differently than a single TFRC flow would). It also makes MulTFRC
easy to understand and implement. Experiments have shown that these
changes are enough to attain the desired effect.
While the TCP throughput equation requires
the loss event rate, round-trip time and segment size
as input, the algorithm to be used for MulTFRC
additionally needs the number of packets lost in
a loss event. The equation, specified in as
is replaced with the following algorithm, which returns
X_Bps, the average transmit rate of N TCPs in bytes per second:
Where:
s is the segment size in bytes (excluding IP and transport protocol headers).
R is the round-trip time in seconds.
b is the maximum number of packets acknowledged by a single TCP acknowledgement.
p is the loss event rate, between 0 and 1.0, of the number of loss events as a fraction of the number of packets transmitted.
j is the number of packets lost in a loss event.
t_RTO is the TCP retransmission timeout value in seconds.
N is the number of TFRC flows that MulTFRC should emulate. N is a positive rational number; a discussion of appropriate values for this parameter, and reasons for choosing them, is provided later in this document.
ceil(N) gives the smallest integer greater than or equal to N.
x, af, a, z and q are temporary floating point variables.
The appendix contains an argument that shows why the calculations in the algorithm will not overflow, underflow or produce significant rounding errors.
Section 3.1 of the TFRC specification contains a recommendation for setting the t_RTO parameter, and a simplification of the equation as a result of setting this parameter in a specific way. This part of the TFRC specification could be applied here too. Section 3.1 of the TFRC specification also contains a discussion of the parameter b for delayed acknowledgements and concludes that the use of b=1 is RECOMMENDED. This is also the case for MulTFRC.
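For orientation, the TCP throughput equation that the MulTFRC algorithm replaces can be sketched as follows. The formula is the one given in Section 3.1 of the TFRC specification (RFC 5348); the function name and argument names are illustrative assumptions and are not part of any specification.

```python
import math


def tfrc_x_bps(s, R, p, t_rto, b=1):
    """Sketch of the single-flow TCP throughput equation used by TFRC.

    s      segment size in bytes
    R      round-trip time in seconds
    p      loss event rate, 0 < p <= 1
    t_rto  TCP retransmission timeout in seconds
    b      packets acknowledged per ACK (b=1 is RECOMMENDED)

    Returns the average transmit rate in bytes per second.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Congestion-avoidance term plus retransmission-timeout term.
    denominator = (R * math.sqrt(2.0 * b * p / 3.0)
                   + t_rto * (3.0 * math.sqrt(3.0 * b * p / 8.0)
                              * p * (1.0 + 32.0 * p ** 2)))
    return s / denominator


# Illustrative values only: 1460-byte segments, 100 ms RTT, 1% loss events.
rate = tfrc_x_bps(s=1460, R=0.1, p=0.01, t_rto=0.4)
```

MulTFRC keeps the same inputs and adds N and j; the equation above is what a TFRC implementation would evaluate before the change.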
Section 3.2.2 of the TFRC specification specifies the contents of feedback packets. In addition to the information listed there, a MulTFRC feedback packet also carries j, the number of packets lost in a loss event.
The procedure for updating the allowed sending rate in section 4.3 of the TFRC specification ("action 4") contains the statement:
which is replaced with the statement:
Section 5.2 of the TFRC specification explains how a lost packet that
starts a new loss event should be distinguished from a lost packet that is
a part of the previous loss event interval. Here, additionally the number of
packets lost in a loss event is counted, and therefore this section is
extended with:
If S_new is a part of the current loss interval, LP_0 (the number of lost packets in
the current interval) is increased by 1. On the other hand, if S_new starts a new loss
event, LP_0 is set to 1.
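The counting rule above can be sketched as follows. This is a minimal illustration, not text from the specification: the function name and the list representation are hypothetical, and it is an assumption of this sketch that starting a new loss event shifts the previous counts so that the old LP_0 becomes LP_1.

```python
def record_lost_packet(lp_history, starts_new_loss_event):
    """Update per-loss-event lost-packet counts; lp_history[0] is LP_0.

    Assumption: a new loss event pushes the older counts back by one
    position, mirroring how loss intervals are shifted.
    """
    if starts_new_loss_event or not lp_history:
        lp_history.insert(0, 1)   # new loss event: LP_0 is set to 1
    else:
        lp_history[0] += 1        # same loss event: LP_0 is increased by 1
    return lp_history


history = []
record_lost_packet(history, True)    # first loss starts an event
record_lost_packet(history, False)   # second loss in the same event
record_lost_packet(history, True)    # a later loss starts a new event
```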
Section 5.4 of the TFRC specification contains the algorithm for calculating the average
loss interval that is needed for calculation of the loss event rate, p. MulTFRC also requires the number of lost
packets in a loss event, j. In the TFRC specification, the calculation of the average loss interval is done
using a filter that weights the n most recent loss event intervals, and setting n to 8 is RECOMMENDED.
The same algorithm is used here for calculating the average loss interval. For the number of lost packets in
a loss event interval, j, the weighted average number of lost packets in the n most recent loss intervals is taken and
the same filter is used.
For calculating the average number of packets lost in a loss event interval we use the same loss intervals
as for the p calculation.
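As an illustration of the filter described above, the following sketch computes a weighted average of lost-packet counts. The weights are those that the TFRC specification defines for n = 8; the function name is hypothetical, and the sketch deliberately ignores details such as whether the still-open interval is included, which the specification handles separately.

```python
# Filter weights from the TFRC specification for n = 8,
# index 0 being the most recent loss interval.
WEIGHTS = (1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2)


def weighted_lost_packets(lp_counts):
    """Weighted average of lost packets per loss event (a sketch of j).

    lp_counts[0] is LP_0, the count for the most recent loss interval.
    Only the len(WEIGHTS) most recent intervals are considered.
    """
    recent = lp_counts[:len(WEIGHTS)]
    if not recent:
        return 0.0  # no loss event seen yet
    weights = WEIGHTS[:len(recent)]
    return sum(lp * w for lp, w in zip(recent, weights)) / sum(weights)


j = weighted_lost_packets([3, 1, 1, 1, 1, 1, 1, 1])
```

Because the most recent intervals carry the largest weights, a burst of losses in the latest loss event raises j quickly, while older bursts fade out.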
Let LP_0 to LP_k be the number of lost packets in the k+1 most recent loss intervals. The algorithm for calculating
I_mean in Section 5.4 of the TFRC specification (page 23) is extended by adding, after the last line ("p = 1 / I_mean;"):
In section 5.5 of the TFRC specification (page 25), the algorithm that
ends with "p = min(W_tot0/I_tot0, W_tot1/I_tot1);" is extended by adding:
The steps to be carried out by the receiver when a packet is received in section 6.1 of the TFRC specification ("action 4") contain the statement:
which is replaced with the statement:
The steps to be carried out by the receiver upon expiration of the feedback timer
in section 6.2 of the TFRC specification ("action 1") contain the statement:
which is replaced with:
This statement is added at the beginning of the list of initial steps
to take when the first packet is received, in section 6.3 of
the TFRC specification:
Set j = 0.
Section 6.3.1 of the TFRC specification discusses how the loss history is initialized
after the first loss event. TFRC approximates the rate to be the maximum value of X_recv
so far, assuming that a higher rate introduces loss. Therefore j for this rate is
approximated by 1 and the number of packets lost in the first interval is set to 1. This is accomplished by the following change.
The first sentence of the fourth paragraph (in section 6.3.1) is:
which is replaced with:
The second-to-last paragraph in section 6.3.1 ends with:
which is replaced with:
Section 8.1 of the TFRC specification explains details about calculating the original TCP
throughput equation, which was replaced with a new algorithm
in this document. It is therefore obsolete.
This section provides a terminology list for TFRC, which
is extended as follows:
The "weighted fairness" service provided by a protocol
using MulTFRC is quite different from the service provided
by traditional Internet transport protocols. This section
intends to answer some questions that this new service
may raise.
Like TFRC, MulTFRC is suitable for applications
that require a smoother sending rate than standard TCP.
Since it is likely that these would be
multimedia applications, TFRC has largely been associated
with them (and the TFRC specification
mentions "streaming media" as an example). Since timely
transmission is often more important for them
than reliability, multimedia applications usually do not
keep retransmitting
packets until their successful delivery is ensured.
Accordingly, TFRC usage was specified for the Datagram
Congestion Control Protocol (DCCP),
but not for reliable data transfers.
MulTFRC, on the other hand, provides an altogether
different service. For some applications, a smoother
sending rate may not be particularly desirable but
might also not be considered harmful, while the ability
to emulate the congestion control of N flows may be
useful for them. This could include reliable transfers
such as the transmission of files. Possible reasons
to use MulTFRC for file transfers include the assignment
of priorities according to user preferences, increased
efficiency with N > 1, the implementation of low-priority
"scavenger" services and resource pooling.
N MUST be set at the beginning of a transfer; it
MUST NOT be changed while a transfer is ongoing.
The effects of changing N during the lifetime of
a MulTFRC session on the dynamics of the mechanism
are yet to be investigated; in particular, it is
unclear how often N could safely be changed, and
how "safely" should be defined in this context.
Further research is required to answer these questions.
N is a positive floating point number which can also
take values between 0 and 1, making MulTFRC
applicable as a mechanism for what has been called a
"Lower-than-Best-Effort" (LBE) service. Since it does
not reduce its sending rate early as delay increases,
as some alternative proposals for such a service do
(e.g. TCP-LP,
TCP Nice or
4CP), it can probably
be expected to
be more aggressive than these mechanisms if they share
a bottleneck at the same time. This, however, also means that
MulTFRC is less likely to be prone to starvation.
Values between 0 and 1 could also be useful if MulTFRC
is used across multiple paths to realize resource pooling.
Setting N to 1 is also possible. In this case, the
only difference between TFRC and MulTFRC is that the
underlying model of TFRC assumes that all remaining
packets following a dropped packet in a "round"
(less than one round-trip time apart) are
also dropped, whereas the underlying model of MulTFRC
does not make this assumption. Whether the assumption
is correct depends on the specific network
situation; large windows and queuing schemes other
than Drop-Tail make it less likely for
the assumption to match reality. This document does
not make any recommendation about which mechanism
to use if only one flow is desired.
Since TCP has been extensively studied, and the aggression
of its congestion control mechanism is emulated by TFRC,
we can look at the behavior of a TCP aggregate in order
to find a reasonable upper limit for N in MulTFRC.
According to a model from the literature on parallel TCP flows, N TCPs (assuming
non-synchronized loss events over connections) can saturate
a bottleneck link by roughly 100-100/(1+3N) percent. This means
that a single flow can only achieve 75% utilization, whereas
3 flows already achieve 90%. The theoretical gain that can
be achieved by adding a flow declines with the total number
of flows - e.g., while going from 1 to 2 flows is a 14.3%
performance gain, the gain becomes less than 1% beyond 6 flows
(which already achieve 95% link utilization). Since the
link utilization of MulTFRC can be expected to be roughly
the same as the link utilization of multiple TCPs, the
approximation above also holds for MulTFRC. Thus,
setting N to a much larger value than the values mentioned
above will only yield a marginal benefit in isolation but
can significantly affect other traffic. Therefore, the
maximum value that a user can set for MulTFRC SHOULD NOT
exceed 6.
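The figures quoted above follow directly from the approximation 100-100/(1+3N). The short sketch below recomputes them; the function names are illustrative only.

```python
def utilization_percent(n):
    """Approximate bottleneck utilization of N parallel TCP flows."""
    return 100.0 - 100.0 / (1.0 + 3.0 * n)


def relative_gain_percent(n):
    """Relative gain, in percent, of going from n to n + 1 flows."""
    return (utilization_percent(n + 1) / utilization_percent(n) - 1.0) * 100.0


for n in range(1, 8):
    print(n, round(utilization_percent(n), 1), round(relative_gain_percent(n), 2))
```

With these definitions, one flow yields 75%, three flows yield 90%, six flows yield just under 95%, the step from one to two flows is a gain of about 14.3%, and the step from six to seven flows is a gain below 1%.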
Note that the underlying model, and hence the above discussion,
considers the long-term steady-state behavior of TCP, which may not always be
seen when the bandwidth*delay product is very large.
This is due to TCP's slow congestion window growth in the congestion avoidance phase.
While a MulTFRC flow with N > 1 can generally be expected to outperform a single standard
TCP flow if N is large enough, such usage of MulTFRC is not recommended as
a fix to the problem of saturating very large bandwidth*delay product paths:
in order to always achieve good bottleneck utilization
with MulTFRC under such conditions, N would have to be a function of the bandwidth*delay
product. In other words, the mechanism does not scale with bandwidth and delay;
very large bandwidth or delay values may require very large values for N,
leading to a behavior which is overly aggressive but possibly worse in terms of
performance than mechanisms such as HighSpeed TCP, which
are specifically designed for such situations. The same argument applies
to running multiple TCP flows.
It is well known that a single uncontrolled UDP flow
can cause significant harm to a large number of TCP flows
that share the same bottleneck. This potential danger
is due to the total lack of congestion control in UDP.
Because this problem is well known, and because UDP is easy
to detect, UDP traffic will often be rate limited by
service providers.
If MulTFRC is used within a protocol such as DCCP,
which will normally not be considered harmful and
will therefore typically not be rate-limited, its
tunable aggression could theoretically make it
possible to use it for a Denial-of-Service (DoS)
attack. In order to avoid such usage, the maximum
value of N MUST be restricted. If, as recommended
in this document, the maximum value for N is
restricted to 6, the impact of MulTFRC on TCP is
roughly the same as the impact of 6 TCP flows would
be. It is clear that the conjoint congestion control
behavior of 6 TCPs is far from being such an attack.
With transport protocols such as TCP, SCTP or DCCP,
users can already be more aggressive than others by
opening multiple connections. If MulTFRC is used
within a transport protocol, this effect becomes
more pronounced - e.g., 2 connections with N set to
6 for each of them roughly exhibit the same congestion
control behavior as 12 TCP flows. The N limit SHOULD
therefore be implemented as a system wide parameter
such that the sum of the N values of all MulTFRC
connections does not exceed it. Alternatively, the
number of connections that can be opened could be
restricted.
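One possible way to enforce such a system-wide limit is sketched below. The class and its methods are hypothetical and are not defined by this document; the sketch only illustrates the rule that the sum of the N values of all MulTFRC connections does not exceed the limit, and that N is fixed for the lifetime of a transfer.

```python
class NBudget:
    """Hypothetical system-wide accounting of N across MulTFRC connections."""

    def __init__(self, limit=6.0):
        self.limit = limit   # maximum sum of N over all connections
        self.in_use = {}     # connection id -> N granted at setup

    def admit(self, conn_id, n):
        """Try to reserve N for a new connection; return True on success."""
        if n <= 0:
            raise ValueError("N must be positive")
        if conn_id in self.in_use:
            # N MUST NOT be changed while a transfer is ongoing.
            return False
        if sum(self.in_use.values()) + n > self.limit:
            return False
        self.in_use[conn_id] = n
        return True

    def release(self, conn_id):
        """Return the N of a finished connection to the budget."""
        self.in_use.pop(conn_id, None)


budget = NBudget(limit=6.0)
first_ok = budget.admit("conn-a", 4.0)    # fits within the limit
second_ok = budget.admit("conn-b", 3.0)   # would raise the sum to 7
```

Restricting the number of connections instead, as mentioned above, would replace the sum check with a simple counter.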
This work was partially funded by the EU IST project EC-GIN under the contract STREP FP6-2006-IST-045256.
The authors would like to thank the following people whose feedback
and comments contributed to this document (in alphabetical order): Lachlan Andrew, Dirceu Cavendish, Soo-Hyun Choi, Wes Eddy.
TCP Friendly Rate Control (TFRC): Protocol Specification. This document specifies TCP Friendly Rate Control (TFRC). TFRC is a congestion control mechanism for unicast flows operating in a best-effort Internet environment. It is reasonably fair when competing for bandwidth with TCP flows, but has a much lower variation of throughput over time compared with TCP, making it more suitable for applications such as streaming media where a relatively smooth sending rate is of importance. This document obsoletes RFC 3448 and updates RFC 4342. [STANDARDS TRACK]
MulTFRC: Providing Weighted Fairness for Multimedia Applications (and others too!)
Extending the TCP Steady-State Throughput Equation for Parallel TCP Flows
Differentiated end-to-end Internet services using a weighted proportional fair sharing TCP
Probe-Aided MulTCP: an aggregate congestion control mechanism
Flow rate fairness: dismantling a religion
Recommendations on Queue Management and Congestion Avoidance in the Internet.
This memo presents two recommendations to the Internet community
concerning measures to improve and preserve Internet performance.
It presents a strong recommendation for testing, standardization,
and widespread deployment of active queue management in routers,
to improve the performance of today's Internet. It also urges a
concerted effort of research, measurement, and ultimate deployment
of router mechanisms to protect the Internet from flows that are
not sufficiently responsive to congestion notification.
GridFTP: Protocol Extensions to FTP for the Grid
Profile for Datagram Congestion Control Protocol (DCCP) Congestion Control ID 3: TCP-Friendly Rate Control (TFRC). This document contains the profile for Congestion Control Identifier 3, TCP-Friendly Rate Control (TFRC), in the Datagram Congestion Control Protocol (DCCP). CCID 3 should be used by senders that want a TCP-friendly sending rate, possibly with Explicit Congestion Notification (ECN), while minimizing abrupt rate changes. [STANDARDS TRACK]
The Resource Pooling Principle
TCP-LP: low-priority service via end-point congestion control
TCP Nice: a mechanism for background transfers
Competitive and Considerate Congestion Control for Bulk Data Transfers
Parallel TCP Sockets: Simple Model, Throughput and Validation
Parallel TCP Data Transfers: A Practical Model and its Application
Improving Throughput and Maintaining Fairness using Parallel TCP
HighSpeed TCP for Large Congestion Windows. The proposals in this document are experimental. While they may be deployed in the current Internet, they do not represent a consensus that this is the best method for high-speed congestion control. In particular, we note that alternative experimental proposals are likely to be forthcoming, and it is not well understood how the proposals in this document will interact with such alternative proposals. This document proposes HighSpeed TCP, a modification to TCP's congestion control mechanism for use with TCP connections with large congestion windows. The congestion control mechanisms of the current Standard TCP constrains the congestion windows that can be achieved by TCP in realistic environments. For example, for a Standard TCP connection with 1500-byte packets and a 100 ms round-trip time, achieving a steady-state throughput of 10 Gbps would require an average congestion window of 83,333 segments, and a packet drop rate of at most one congestion event every 5,000,000,000 packets (or equivalently, at most one congestion event every 1 2/3 hours). This is widely acknowledged as an unrealistic constraint. To address this limitation of TCP, this document proposes HighSpeed TCP, and solicits experimentation and feedback from the wider community.
In this appendix we show why the algorithm for calculating X_Bps contains integer and floating
point arithmetic that will not give underflow, overflow or rounding errors that
will adversely affect the result of the algorithm.
Note that the algorithm is not invoked when p == 1. p is computed in section 5.4
of the TFRC specification.
If p is equal to 1 there is exactly one packet in each loss interval (and this will be an ECN-marked packet). Assuming that the rest of the parameters are not outside their conceivable values, the calculation of X_Bps could lead to arithmetic errors or large imprecision only if p is very close to 1. However this will never happen because p is calculated as 1/I_mean in section 5.4 of the TFRC specification. The algorithm that calculates I_mean does so by a weighted average of the number of packets in the last n loss events. If n == 8 and the weights are set as defined in
the TFRC specification, the smallest possible value of I_mean will be 1.033333. The reason for this is that the number of packets in a loss event is a positive integer and the smallest value (not equal to 1) is found when all recent intervals are of length 1, but the oldest interval (number 8 in this case) is of length two. All other sizes of loss intervals and smaller values of n will give higher values for I_mean and hence lower values for p.
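The value 1.033333 can be checked with a short calculation. The sketch below uses the weights that the TFRC specification defines for n = 8 and a plain weighted average, as described in this appendix; it does not model the additional steps of the full I_mean algorithm. Only interval lengths 1 and 2 are enumerated, since longer intervals can only increase the average.

```python
from itertools import product

# TFRC filter weights for n = 8; index 0 is the most recent interval.
WEIGHTS = (1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2)


def i_mean(intervals):
    """Plain weighted average of loss interval lengths."""
    return sum(w * i for w, i in zip(WEIGHTS, intervals)) / sum(WEIGHTS)


# All histories with interval lengths 1 or 2, except the all-ones history
# (which corresponds to p == 1 and is excluded from the algorithm).
candidates = [c for c in product((1, 2), repeat=8) if any(i != 1 for i in c)]
smallest = min(candidates, key=i_mean)
smallest_i_mean = i_mean(smallest)
largest_p = 1.0 / smallest_i_mean
```

The minimum is reached when only the oldest interval has length two, giving I_mean = 6.2/6 and therefore a largest possible p of about 0.968 whenever p is not equal to 1.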
Below we analyze the algorithm that calculates X_Bps, and see that it will always execute without any overflow, underflow or large rounding errors. In this analysis we assume that no parameter has an improper value. We say that a calculation is "sound" when no overflow, underflow or significant rounding may occur during the calculation. In the first lines of the algorithm, af is calculated, and it is easy to see that the calculation is sound and that the value of af will end up between 1 and N+1.
In the calculation of a, af may be equal to or close to N/2, and then the expression (p*b*af*(N-2*af)^2) may evaluate to a number very close to 0. However, (24*N^2) is added, and hence the calculation of a is sound, as the values of p, b and N are neither extremely small nor extremely large. Note that a will be a smaller value when p is a smaller value. When x is calculated there is no problem taking the square root of a, and then adding it to (af*p*b*(2*af-N)), which again might be a very small value. The final calculation of x involves a division by (6*p*N^2). However when p is small, the dividend is not very large (dominated by p and a), hence the calculation of x is sound and the result will neither be a very large nor a very small number.
However, we need to show that x will not be 0 or negative, otherwise the algorithm will try a division by 0, or, if x is negative, the calculation of X_Bps could turn out to be a negative value. We show that x is a positive rational number (and not 0) by defining W=p*b*af, Y=2*af-N and D=6*N^2*p. Then the assignment to x can be written as
x= (W*Y + sqrt(W*24*N^2 + W^2 * (-Y)^2 )) / D.
If we subtract W*24*N^2 from the argument to the sqrt-function, we get an expression that is obviously smaller, hence after the assignment to x, this inequality holds:
x > (W*Y + sqrt(W^2 * (-Y)^2) ) / D.
Since W is positive, sqrt(W^2 * (-Y)^2) = W*|Y| >= -W*Y, so the right hand side is bounded from below:
x > (W*Y - W*Y) / D,
that is: x > 0.
From this argument it is also clear that x is not close to 0, so that the divisions by x (or x*R) that we will see later in the algorithm will give a sound result. Then z is found by division by 1-p, which is sound when p is not close to 1 (which we have argued above it is not). Below we also need the fact that z is not close to 0, which is true because neither t_RTO nor p is close to 0.
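The positivity of x can also be checked numerically. The sketch below evaluates the expression for x exactly as written above in terms of W, Y and D; the function name and the chosen sample values are illustrative assumptions, and af is simply sampled at the ends of the range [1, ceil(N)] rather than derived from j.

```python
import math


def multfrc_x(p, b, af, N):
    """Evaluate x = (W*Y + sqrt(W*24*N^2 + W^2*Y^2)) / D."""
    W = p * b * af
    Y = 2.0 * af - N
    D = 6.0 * N ** 2 * p
    return (W * Y + math.sqrt(W * 24.0 * N ** 2 + W ** 2 * Y ** 2)) / D


def check_positive():
    """Return True if x > 0 for every sampled parameter combination."""
    for N in (0.5, 1.0, 2.5, 6.0):
        for af in (1.0, float(math.ceil(N))):
            for p in (1e-9, 1e-4, 0.01, 0.5, 0.9677):
                if multfrc_x(p, 1, af, N) <= 0.0:
                    return False
    return True


all_positive = check_positive()
```

For N = 1 and af = 1 with a small p, x is close to sqrt(2*b/(3*p)), which is the quantity that appears in the first term of the single-flow TCP throughput equation.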
Since neither R, j nor x is 0, or close to 0, the calculation of the first parameter to the min function is sound. The second parameter of the min function is sound because x*R is not a value close to 0, and hence the execution of all the three arguments to the min function is sound.
Finally X_Bps is calculated and since N, p, x, R, z are not close to zero, and p is not close to one (and hence 1-p is not close to 0), this final calculation is also sound.