LEDBAT WG                                                    S. Shalunov
Internet-Draft                                            BitTorrent Inc
Intended status: Experimental                             March 22, 2010
Expires: September 23, 2010

             Low Extra Delay Background Transport (LEDBAT)
                  draft-ietf-ledbat-congestion-01.txt

Abstract

   LEDBAT is an alternative, experimental congestion control algorithm.

   LEDBAT enables an advanced networking application to minimize the
   extra delay it induces in the bottleneck while still saturating the
   bottleneck.  It thus implements an end-to-end version of scavenger
   service.  LEDBAT has been implemented in BitTorrent DNA, as the
   exclusive congestion control mechanism, and in uTorrent, as an
   experimental mechanism, and deployed in the wild with favorable
   results.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 23, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Requirements notation
   2.  Introduction
   3.  LEDBAT design goals
   4.  LEDBAT motivation
     4.1.   Simplest network topology
     4.2.   Extra delay
     4.3.   Queuing delay target
     4.4.   Need to measure delay
     4.5.   Queuing delay estimate
     4.6.   Controller
     4.7.   Max rampup rate same as TCP
     4.8.   Halve on loss
     4.9.   Yield to TCP
     4.10.  Need for one-way delay
     4.11.  Measuring one-way delay
     4.12.  Route changes
     4.13.  Timestamp errors
       4.13.1.  Clock offset
       4.13.2.  Clock skew
     4.14.  Noise filtering
     4.15.  Non-bulk flows
     4.16.  LEDBAT framing and wire format
     4.17.  Fairness between LEDBAT flows
       4.17.1.  Late comers
     4.18.  Safety of LEDBAT
   5.  LEDBAT congestion control
   6.  Security Considerations
   7.  Normative References
   Author's Address

1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Introduction

   The standard congestion control in TCP is based on loss and has not
   been designed to drive delay to any given value.  Because TCP needs
   losses to back off, when a FIFO bottleneck lacks AQM, TCP fills the
   buffer, effectively maximizing the possible delay.  A large number of
   the thinnest links in the Internet, particularly most uplinks of home
   connections, lack AQM.  They also frequently contain enough buffer
   space to push delays into hundreds of milliseconds or even seconds.
   There is no benefit to delays this large, but there are very
   substantial drawbacks for interactive applications: games and VoIP
   become impossible, and even web browsing becomes very slow.

   While a number of delay-based congestion control mechanisms have been
   proposed, they were generally not designed to minimize the delay
   induced in the network.

   LEDBAT is designed to keep the latency across the congested
   bottleneck low even as the bottleneck is saturated.  This allows
   applications that send large amounts of data, particularly upstream
   on home connections, such as peer-to-peer applications, to operate
   without destroying the user experience in interactive applications.
   LEDBAT takes advantage of delay measurements and backs off before
   loss occurs.
   It has been deployed by BitTorrent in the wild with the
   BitTorrent DNA client and now, experimentally, with the uTorrent
   client.  This mechanism not only keeps the delay across a bottleneck
   low, but also yields quickly in the presence of competing traffic
   with loss-based congestion control.

   Beyond its utility for P2P, LEDBAT enables other advanced networking
   applications to better get out of the way of interactive
   applications.

   In addition to direct and immediate benefits for P2P and other
   applications that can benefit from scavenger service, LEDBAT could
   point the way for a possible future evolution of the Internet where
   loss is not part of the designed behavior and delay is minimized.

3.  LEDBAT design goals

   The LEDBAT design goals are:

   1.  saturate the bottleneck
   2.  keep delay low when no other traffic is present
   3.  quickly yield to traffic sharing the same bottleneck queue that
       uses standard TCP congestion control
   4.  add little to the queuing delays induced by TCP traffic
   5.  operate well in networks with FIFO queuing with drop-tail
       discipline
   6.  be deployable for popular applications that currently comprise
       noticeable fractions of Internet traffic
   7.  where available, use explicit congestion notification (ECN),
       active queue management (AQM), and/or end-to-end differentiated
       services (DiffServ)

4.  LEDBAT motivation

   This section describes LEDBAT informally and provides some
   motivation.  It is expected to be helpful for general understanding
   and useful in discussion of the properties of LEDBAT.

   Without loss of generality, we consider only one direction of the
   data transfer.  The opposite direction can be treated identically.

4.1.  Simplest network topology

   Consider first the desired behavior when there's only a single
   bottleneck and no competing traffic whatsoever, not even other LEDBAT
   connections.
   The design goals obviously need to be robustly met for
   this trivial case.

4.2.  Extra delay

   Consider the queuing delay on the bottleneck.  This delay is the
   extra delay induced by congestion control.  One of our design goals
   is to keep this delay low.  However, when this delay is zero, the
   queue is empty, so no data is being transmitted and the link is thus
   not saturated.  Hence, our design goal is to keep the queuing delay
   low, but non-zero.

4.3.  Queuing delay target

   How low do we want the queuing delay to be?  Because another design
   goal is to be deployable on networks with only simple FIFO queuing
   and drop-tail discipline, we can't rely on explicit signaling of the
   queuing delay.  So we're going to estimate it using external
   measurements.  The external measurements will have an error at least
   on the order of best-case scheduling delays in the operating systems.
   There's thus a good reason to make the queuing delay target larger
   than this error.  There's no reason to push the delay much further up
   than that.  Thus, we will have a delay target that we aim to
   maintain.

4.4.  Need to measure delay

   To maintain delay near the target, we have to use delay measurements.
   Lacking delay measurements, we'd have to go only by loss (when ECN is
   unavailable).  For loss to occur (on a FIFO link with drop-tail
   discipline), the buffer must first be filled.  This would drive the
   delay to the largest possible value for this link, thus violating our
   design goal of keeping delay low.

4.5.  Queuing delay estimate

   Since our goal is to control the queuing delay, it is natural to
   maintain an estimate of it.  Decompose the delay into propagation,
   serialization, processing, and queuing components.  All components
   but queuing are nearly constant; queuing is variable.
   Because queuing delay
   is always positive, the constant propagation+serialization+processing
   delay is no greater than the minimum delay observed.  Assuming that
   the queuing delay distribution has non-zero density in every
   sufficiently small interval above zero, the minimum is also an
   asymptotically consistent estimate of the constant fraction of the
   delay.  We can thus estimate the queuing delay, as usual, as the
   difference between the current delay and the base (minimum) delay.

4.6.  Controller

   When our estimate of the queuing delay is lower than the target, it's
   natural to send faster.  When our estimate is higher, it's natural to
   send slower.  To avoid trivial oscillations on the round-trip-time
   (RTT) scale, the response of the controller needs to be near zero
   when the estimate is near the target.  To converge faster, the
   response needs to increase as the difference increases.  The simplest
   controller with this property is the linear controller, where the
   response is proportional to the difference between the estimate and
   the target.  This controller happens to work well in practice,
   obviating the need for more complex ones.

4.7.  Max rampup rate same as TCP

   The maximum speed at which we increase the congestion window is
   reached when the queuing delay estimate is zero.  To be on the safe
   side, we'll make this speed equal to the rate at which TCP increases
   its sending speed.  Since the queuing delay estimate is always
   non-negative, this ensures we never ramp up faster than TCP would.

4.8.  Halve on loss

   Further, to deal with severe congestion when most packets are lost,
   and to provide a safety net against incorrect queuing delay
   estimates, we'll halve the window when a loss event is detected.
   We'll do so at most once per RTT.

4.9.  Yield to TCP

   Consider competition between a LEDBAT connection and a connection
   governed by loss-based congestion control (on a FIFO bottleneck with
   drop-tail discipline).  The loss-based connection will need to
   experience loss to back off.  Loss will only occur after the
   connection experiences the maximum possible delay.  LEDBAT will thus
   receive a congestion indication sooner than the loss-based
   connection.  If LEDBAT can ramp down faster than the loss-based
   connection ramps up, LEDBAT will yield.  LEDBAT ramps down when the
   queuing delay estimate exceeds the target: the greater the excess,
   the faster the ramp-down.  When the loss-based connection is standard
   TCP, LEDBAT will yield at precisely the same rate as TCP ramps up
   when the queuing delay is double the target.

4.10.  Need for one-way delay

   Now consider a case where one link direction is saturated with
   unrelated TCP traffic while the other direction is near-empty.
   Consider LEDBAT sending in the near-empty direction.  Our design goal
   is to saturate it.  However, if we pay attention to round-trip
   delays, we'll sense the delays on the reverse path and respond to
   them as described in the previous section.  We must, therefore,
   measure one-way delay and use that for our queuing delay estimate.

4.11.  Measuring one-way delay

   A special IETF protocol, the One-Way Active Measurement Protocol
   (OWAMP), exists for measuring one-way delay.  However, since LEDBAT
   will already be sending data, it is more efficient to add a timestamp
   to the packets in the data direction and a measurement result field
   in the acknowledgement direction.  This also prevents the danger of
   measurement packets being treated differently from the data packets.
   The failure case would be better treatment of measurement packets,
   where the data connection would be driven to losses.

4.12.  Route changes

   Routes can change.
   To deal with route changes, the base delay needs to
   be computed over a period of the last few minutes instead of since
   the start of the connection.  The tradeoff is: with longer intervals,
   the base is more accurate; with shorter intervals, reaction to route
   changes is faster.

   A convenient way to implement an approximate minimum over the last N
   minutes is to keep separate minima for the last N+1 minutes (the last
   one for the partial current minute).

4.13.  Timestamp errors

   One-way delay measurement needs to deal with timestamp errors.  We'll
   use the same locally linear clock model and the same terminology as
   the Network Time Protocol (NTP).  This model is valid for any
   differentiable clocks.  NTP uses the term "offset" to refer to the
   difference from true time and "skew" to refer to the difference of
   the clock rate from the true rate.  The clock will thus have a fixed
   offset from the true time and a skew.  We'll consider what we need to
   do about the offset and the skew separately.

4.13.1.  Clock offset

   First, consider the case of zero skew.  The offset of each of the two
   clocks shows up as a fixed error in the one-way delay measurement.
   The difference of the offsets is the absolute error of the one-way
   delay estimate.  We won't use this estimate directly, however.  We'll
   use the difference between it and a base delay.  Because the error
   (the difference of clock offsets) is the same for the current and
   base delay, it cancels from the queuing delay estimate, which is what
   we'll use.  Clock offset is thus irrelevant to the design.

4.13.2.  Clock skew

   Now consider the skew.  For a given clock, skew manifests as a
   linearly growing error in the time estimate.  For a given pair of
   clocks, the difference in skews is the skew of the one-way delay
   estimate.  Unlike the offset, this no longer cancels in the
   computation of the queuing delay estimate.
   On the other hand, while
   the offset could be huge, with some clocks off by minutes or even
   hours or more, the skew is typically not too bad.  For example, NTP
   is designed to work with most clocks, yet it gives up when the skew
   is more than 500 parts per million (PPM).  Typical skews of clocks
   that have never been trained seem to often be around 100-200 PPM.
   Previously trained clocks could have 10-20 PPM skew due to
   temperature changes.  A 100-PPM skew means accumulating 6
   milliseconds of error per minute.  The expiration of base delay
   described under route changes mostly takes care of clock skew.  A
   technique to specifically compute and cancel it is possible (it
   involves tracking the base delay skew over a number of minutes and
   then correcting for it), but it usually isn't necessary unless the
   target is unusually low, the skew is unusually high, or the base
   interval is unusually long.  It is not further described in this
   document.

4.14.  Noise filtering

   In addition to timestamp errors, the one-way delay estimate includes
   a measurement error when part of the time measured was spent inside
   the sending or receiving machines.  Different views are possible on
   the nature of this delay: one view holds that, to the extent this
   delay internal to a machine is not constant, it is a variety of
   queuing delay and nothing needs to be done to detect or eliminate it;
   another view holds that, since this delay does not have the same
   characteristics as queuing delay induced by a fixed-capacity
   bottleneck, it is more correctly classified as non-constant
   processing delay and should be filtered out.  In practice, this
   doesn't seem to matter very much one way or the other.  The way to
   filter the noise out is to observe, again, that the noise is always
   non-negative, so a good filter is the minimum of several recent delay
   measurements.

4.15.  Non-bulk flows

   Normally in transport, we're mainly concerned with bulk flows with
   infinite sources.  Sometimes this model needs modification.  For
   example, an application might be limiting the rate at which data is
   fed to the transport layer.  If the offered rate can be carried by
   the network without any sign of congestion, an adaptive congestion
   control loop will increase the rate allowed by congestion control.
   The LEDBAT mechanism in the form described above would keep
   increasing the congestion window linearly with time.  This is clearly
   wrong for non-bulk flows because it allows the congestion control to
   decide that it may send much faster than it has ever sent.
   Therefore, some special treatment is required to deal with the case
   of non-bulk flows.

   The same problem may also occur in some TCP implementations.  It was
   first noticed for TCP in situations where a connection has been idle
   for a substantial time, and the original workaround was to re-enter
   slow start after an idle period.  While re-entering slow start (or,
   more generally, returning to the original state of the connection) is
   a good idea after a long idle period, it does not solve the problem
   for connections that trickle data out at an application-determined
   rate.  In fact, even application-level keepalives or small metadata
   exchanges can be enough to keep the connection from becoming idle for
   long.  The fix for this problem is to limit the growth of the
   congestion window by a quantity related to the current flight size,
   i.e., the amount of outstanding unacknowledged data.

   The congestion window should not grow beyond a factor of the current
   flight size for sufficiently large values of the flight size.
   This
   provides leeway for the congestion window to grow and probe the
   network beyond the current rate, while at the same time keeping it on
   a tether that does not let it increase indefinitely.  A refinement of
   this rule is that the congestion window should not grow beyond a
   constant plus a factor of the flight size.  The additive constant is
   necessary because the congestion window needs to be able to increase
   by at least one packet even when it is small; otherwise no probing is
   possible.  The smallest value, one packet, would then define the
   minimum value of the factor: 2.  However, for larger values of the
   congestion window, a tighter bound than "double the flight size" may
   be desirable.  The additive constant makes it possible to use values
   between 1 and 2 for the factor.

4.16.  LEDBAT framing and wire format

   The actual framing and wire format of the protocol(s) using the
   LEDBAT congestion control mechanism are outside the scope of this
   document, which describes only the congestion control part.

   The sender needs to learn the one-way delay from the sender to the
   receiver.  An obvious way to support this is to use a framing that
   timestamps packets at the sender and conveys the measured one-way
   delay back to the sender in acknowledgement packets.  This is the
   method we'll keep in mind for the purposes of exposition.  Other
   methods are possible and valid.

   The protocols to which this congestion control mechanism is
   applicable, with possible appropriate extensions, are TCP, SCTP,
   DCCP, etc.  It is not a goal of this document to cover such
   applications.  The mechanism can also be used with proprietary
   transport protocols, e.g., those built over UDP for P2P applications.

4.17.  Fairness between LEDBAT flows

   The design goals of LEDBAT center around the aggregate behavior of
   LEDBAT flows when they compete with standard TCP.
   It is also
   interesting how LEDBAT flows share bottleneck bandwidth when they
   compete only among themselves.

   LEDBAT as described so far lacks a mechanism specifically designed to
   equalize utilization between such flows.  The observed behavior of
   existing implementations indicates that a rough equalization does, in
   fact, occur.

   The delay measurements used as control inputs by LEDBAT contain some
   amount of noise and errors.  The linear controller converts this
   input noise into the same amount of output noise.  The effect is that
   the uncorrelated component of the noise between flows serves to
   randomly shuffle some amount of bandwidth between flows.  The amount
   shuffled during each RTT is proportional to the noise divided by the
   target delay.  The random-walk trajectory of bandwidth utilized by
   each of the flows over time tends to the fair share.  The timescale
   on which the rates become comparable is proportional to the target
   delay multiplied by the RTT and divided by the noise.

   In complex real-life systems, the main concern is usually the
   reduction of the amount of noise, which is copious if not eliminated.
   In some circumstances, however, the measurements might be "too good":
   since the equalization timescale is inversely proportional to the
   noise, perfect measurements would result in a lack of convergence.

   Under these circumstances, it may be beneficial to introduce some
   artificial randomness into the inputs (or, equivalently, the outputs)
   of the controller.  Note that most systems should not require this
   and should be primarily concerned with reducing, not adding, noise.

4.17.1.  Late comers

   With delay-based congestion control systems, there's a concern about
   the ability of late comers to measure the base delay correctly.
   Suppose a LEDBAT flow saturates a bottleneck; another LEDBAT flow
   starts and proceeds to measure the base delay and the current delay
   and to estimate the queuing delay.  If the bottleneck always contains
   a target delay's worth of packets, the second flow would see the
   bottleneck as empty and start building a second target delay's worth
   of queue on top of the existing queue.  The concern ("late comers'
   advantage") is that the initial flow would now back off, because it
   sees the real delay, and the late comer would use the whole capacity.

   However, once the initial flow yields, the late comer immediately
   measures the true base delay and the two flows operate from the same
   (correct) inputs.

   Additionally, in practice this concern is further alleviated by the
   burstiness of network traffic: all that's needed to measure the base
   delay is one small gap.  These gaps can occur for a variety of
   reasons: the OS may delay the scheduling of the sending process until
   a time slice ends, the sending computer might be unusually busy for
   some number of milliseconds or tens of milliseconds, etc.  If such a
   gap occurs while the late comer is starting, the base delay is
   immediately correctly measured.  With a small number of flows, this
   appears to be the main mechanism regulating the late comers'
   advantage.

4.18.  Safety of LEDBAT

   LEDBAT is most aggressive when its queuing delay estimate is most
   wrong and is as low as it can be.  The queuing delay estimate is
   non-negative, so the worst possible case is when the estimate is
   somehow always returned as zero.  In this case, LEDBAT will ramp up
   as fast as TCP and halve its rate on loss.  Thus, in the case of the
   worst possible failure of the estimates, LEDBAT will behave
   identically to TCP.  This provides an extra safety net.

5.  LEDBAT congestion control

   Consider two parties, a sender and a receiver, with the sender having
   an unlimited source of data to send to the receiver and the receiver
   merely acknowledging the data.  (In an actual protocol, it's more
   convenient to have bidirectional connections, but the unidirectional
   abstraction suffices to describe the congestion control mechanism.)

   Consider a protocol that uses packets of equal size and acknowledges
   each of them separately.  (Variable-sized packets and delayed
   acknowledgements are possible and are being implemented, but they
   complicate the exposition.)

   Assume that each data packet contains a timestamp header field.  The
   sender puts a timestamp from its clock into this field.  Further
   assume that each acknowledgement packet contains a delay field.  How
   it is populated is shown below.

   Slow start behavior is unchanged in LEDBAT.  Note that rampup is
   faster in slow start than during congestion avoidance, so very
   conservative implementations MAY skip slow start altogether.

   As far as congestion control is concerned, the receiver is then very
   simple and operates as follows, in pseudocode:

   on data_packet:
     remote_timestamp = data_packet.timestamp
     acknowledgement.delay = local_timestamp() - remote_timestamp
     # fill in other fields of acknowledgement
     acknowledgement.send()

   The sender actually operates the congestion control algorithm and
   acts, in first approximation, as follows:

   on initialization:
     base_delay = +infinity

   on acknowledgement:
     current_delay = acknowledgement.delay
     base_delay = min(base_delay, current_delay)
     queuing_delay = current_delay - base_delay
     off_target = TARGET - queuing_delay
     cwnd += GAIN * off_target / cwnd

   The pseudocode above is a simplification and ignores noise filtering
   and base expiration.
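   As an informal cross-check, the simplified sender logic above can be
   sketched as runnable Python.  The class shape, the units (seconds for
   delays, packets for cwnd), the initial cwnd, and the one-packet floor
   are illustrative assumptions, not part of this specification; GAIN is
   chosen so that at zero queuing delay cwnd grows by one packet per
   window of acknowledgements, matching TCP congestion avoidance.

```python
# Illustrative sketch of the simplified LEDBAT sender pseudocode.
# Assumptions (not normative): delays in seconds, cwnd in packets.

TARGET = 0.025           # queuing delay target: 25 ms
GAIN = 1.0 / TARGET      # zero queuing delay => +1 packet per cwnd acks

class SimpleLedbatSender:
    def __init__(self):
        self.base_delay = float("inf")
        self.cwnd = 2.0  # congestion window in packets (assumed start)

    def on_acknowledgement(self, delay):
        """delay: one-way delay reported in the acknowledgement, seconds."""
        self.base_delay = min(self.base_delay, delay)
        queuing_delay = delay - self.base_delay
        off_target = TARGET - queuing_delay
        self.cwnd += GAIN * off_target / self.cwnd
        self.cwnd = max(self.cwnd, 1.0)  # assumed floor of one packet

sender = SimpleLedbatSender()
sender.on_acknowledgement(0.100)  # first ack establishes the base delay
# queuing delay estimate is zero, so cwnd grew at the maximum rate
sender.on_acknowledgement(0.160)  # 60 ms of queuing, above the target
# estimate exceeds the target, so the controller shrinks cwnd
```

   Note how the linear controller of Section 4.6 appears directly in the
   cwnd update: the adjustment is proportional to the distance of the
   queuing delay estimate from the target.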
   The more precise pseudocode that takes these
   factors into account is as follows and MUST be followed:

   on initialization:
     set all NOISE_FILTER delays used by current_delay() to +infinity
     set all BASE_HISTORY delays used by base_delay() to +infinity
     last_rollover = -infinity  # more than a minute in the past

   on acknowledgement:
     delay = acknowledgement.delay
     update_base_delay(delay)
     update_current_delay(delay)
     queuing_delay = current_delay() - base_delay()
     off_target = TARGET - queuing_delay + random_input()
     cwnd += GAIN * off_target / cwnd
     # flight_size() is the amount of currently unacknowledged data
     max_allowed_cwnd = ALLOWED_INCREASE + TETHER * flight_size()
     cwnd = min(cwnd, max_allowed_cwnd)

   random_input():
     # random() is a PRNG between 0.0 and 1.0
     # NB: RANDOMNESS_AMOUNT is normally 0
     return RANDOMNESS_AMOUNT * TARGET * ((random() - 0.5) * 2)

   update_current_delay(delay):
     # maintain a list of the NOISE_FILTER last delays observed
     forget the earliest of current_delays
     add delay to the end of current_delays

   current_delay():
     return min(the NOISE_FILTER delays stored by update_current_delay)

   update_base_delay(delay):
     # maintain BASE_HISTORY minimum delays; each represents a minute
     if round_to_minute(now) != round_to_minute(last_rollover):
       last_rollover = now
       forget the earliest of base_delays
       add delay to the end of base_delays
     else:
       last of base_delays = min(last of base_delays, delay)

   base_delay():
     return min(the BASE_HISTORY minimum delays stored by
                update_base_delay)

   The TARGET parameter MUST be set to 25 milliseconds, and GAIN MUST be
   set so that the maximum rampup rate is the same as for TCP.
   BASE_HISTORY SHOULD be 2; it MUST be no less than 2 and SHOULD NOT be
   more than 10.  NOISE_FILTER SHOULD be 1; it MAY be tuned so that it
   is at least 1 and no more than cwnd/2.
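   For concreteness, the minute-bucketed base-delay history and the
   noise-filter minimum above can be sketched in Python.  The class
   name, the clock parameter, and the use of deques are illustrative
   assumptions; only the bookkeeping mirrors the pseudocode.

```python
import math
import time
from collections import deque

BASE_HISTORY = 2    # minute buckets of base delay (the draft's SHOULD)
NOISE_FILTER = 1    # recent delays retained by the noise filter

class DelayEstimator:
    """Illustrative base_delay()/current_delay() bookkeeping."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        # deques with maxlen drop the earliest entry on append,
        # matching "forget the earliest ... add to the end"
        self.current_delays = deque([math.inf] * NOISE_FILTER,
                                    maxlen=NOISE_FILTER)
        self.base_delays = deque([math.inf] * BASE_HISTORY,
                                 maxlen=BASE_HISTORY)
        self.last_rollover = None

    def on_delay_sample(self, delay):
        now = self.clock()
        # Start a new minute bucket when the minute changes.
        if self.last_rollover is None or now // 60 != self.last_rollover // 60:
            self.last_rollover = now
            self.base_delays.append(delay)
        else:
            self.base_delays[-1] = min(self.base_delays[-1], delay)
        self.current_delays.append(delay)

    def base_delay(self):
        return min(self.base_delays)

    def current_delay(self):
        return min(self.current_delays)

    def queuing_delay(self):
        return self.current_delay() - self.base_delay()
```

   With BASE_HISTORY set to 2, a minimum observed in one minute remains
   usable throughout the next minute and then expires, which is the base
   expiration behavior described in Section 4.12.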
   ALLOWED_INCREASE SHOULD be 1 packet; it
   MUST be at least 1 packet and SHOULD NOT be more than 3 packets.
   TETHER SHOULD be 1.5; it MUST be greater than 1.  RANDOMNESS_AMOUNT
   SHOULD be 0; it MUST be between 0 and 0.1 inclusive.

6.  Security Considerations

   A network element on the path might induce higher delay measurements
   than the real queuing delay so that LEDBAT backs off even when no
   congestion is present.  Shaping of traffic into an artificially
   narrow bottleneck can't be counteracted, but faking of the timestamp
   and delay fields can be, and SHOULD be.  A protocol using the LEDBAT
   congestion control SHOULD authenticate the timestamp and delay
   fields, preferably as part of authenticating most of the rest of the
   packet, with the exception of volatile header fields.  The choice of
   an authentication mechanism that resists man-in-the-middle attacks is
   outside the scope of this document.

7.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

Author's Address

   Stanislav Shalunov
   BitTorrent Inc
   612 Howard St, Suite 400
   San Francisco, CA 94105
   USA

   Email: shalunov@bittorrent.com
   URI:   http://shlang.com