RTP Media Congestion Avoidance                                  M. Welzl
Techniques (rmcat)                                              S. Islam
Internet-Draft                                               S. Gjessing
Intended status: Experimental                        University of Oslo
Expires: April 27, 2015                                 October 24, 2014

               Coupled congestion control for RTP media
                    draft-welzl-rmcat-coupled-cc-04

Abstract

   When multiple congestion controlled RTP sessions traverse the same
   network bottleneck, it can be beneficial to combine their controls
   such that the total on-the-wire behavior is improved.  This document
   describes such a method for flows that have the same sender, in a
   way that is as flexible and simple as possible while minimizing the
   amount of changes needed to existing RTP applications.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 27, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
   2. Definitions . . . . . . . . . . . .
. . . . . . . . . . . . . 3 56 3. Limitations . . . . . . . . . . . . . . . . . . . . . . . . . 4 57 4. Architectural overview . . . . . . . . . . . . . . . . . . . . 5 58 5. Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 59 5.1. SBD . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 60 5.2. FSE . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 61 5.3. Flows . . . . . . . . . . . . . . . . . . . . . . . . . . 7 62 5.3.1. Example algorithm 1 - Active FSE . . . . . . . . . . . 8 63 5.3.2. Example algorithm 2 - Conservative Active FSE . . . . 9 64 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 65 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 66 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 67 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 68 9.1. Normative References . . . . . . . . . . . . . . . . . . . 11 69 9.2. Informative References . . . . . . . . . . . . . . . . . . 11 70 Appendix A. Example algorithm - Passive FSE . . . . . . . . . . . 12 71 A.1. Example operation (passive) . . . . . . . . . . . . . . . 14 72 Appendix B. Change log . . . . . . . . . . . . . . . . . . . . . 19 73 B.1. Changes from -00 to -01 . . . . . . . . . . . . . . . . . 19 74 B.2. Changes from -01 to -02 . . . . . . . . . . . . . . . . . 19 75 B.3. Changes from -02 to -03 . . . . . . . . . . . . . . . . . 19 76 B.4. Changes from -03 to -04 . . . . . . . . . . . . . . . . . 19 77 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 79 1. Introduction 81 When there is enough data to send, a congestion controller must 82 increase its sending rate until the path's capacity has been reached; 83 depending on the controller, sometimes the rate is increased further, 84 until packets are ECN-marked or dropped. This process inevitably 85 creates undesirable queuing delay -- an effect that is amplified when 86 multiple congestion controlled connections traverse the same network 87 bottleneck. When such connections originate from the same host, it 88 would therefore be ideal to use only one single sender-side 89 congestion controller which determines the overall allowed sending 90 rate, and then use a local scheduler to assign a proportion of this 91 rate to each RTP session. This way, priorities could also be 92 implemented quite easily, as a function of the scheduler; honoring 93 user-specified priorities is, for example, required by rtcweb 94 [rtcweb-usecases]. 96 The Congestion Manager (CM) [RFC3124] provides a single congestion 97 controller with a scheduling function just as described above. It is 98 hard to implement because it requires an additional congestion 99 controller and removes all per-connection congestion control 100 functionality, which is quite a significant change to existing RTP 101 based applications. This document presents a method that is easier 102 to implement than the CM and also requires less significant changes 103 to existing RTP based applications. It attempts to roughly 104 approximate the CM behavior by sharing information between existing 105 congestion controllers, akin to "Ensemble Sharing" in [RFC2140]. 107 2. Definitions 109 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 110 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 111 document are to be interpreted as described in RFC 2119 [RFC2119]. 
113 Available Bandwidth: 114 The available bandwidth is the nominal link capacity minus the 115 amount of traffic that traversed the link during a certain time 116 interval, divided by that time interval. 118 Bottleneck: 119 The first link with the smallest available bandwidth along the 120 path between a sender and receiver. 122 Flow: 123 A flow is the entity that congestion control is operating on. 124 It could, for example, be a transport layer connection, an RTP 125 session, or a subsession that is multiplexed onto a single RTP 126 session together with other subsessions. 128 Flow Group Identifier (FGI): 129 A unique identifier for each subset of flows that is limited by 130 a common bottleneck. 132 Flow State Exchange (FSE): 133 The entity that maintains information that is exchanged between 134 flows. 136 Flow Group (FG): 137 A group of flows having the same FGI. 139 Shared Bottleneck Detection (SBD): 140 The entity that determines which flows traverse the same 141 bottleneck in the network, or the process of doing so. 143 3. Limitations 145 Sender-side only: 146 Coupled congestion control as described here only operates 147 inside a single host on the sender side. This is because, 148 irrespective of where the major decisions for congestion 149 control are taken, the sender of a flow needs to eventually 150 decide the transmission rate. Additionally, the necessary 151 information about how much data an application can currently 152 send on a flow is typically only available at the sender side, 153 making the sender an obvious choice for placement of the 154 elements and mechanisms described here. 156 When implementing a sender-side change to a congestion control 157 mechanism such as TFRC [RFC5348], where receiver-side 158 calculations make assumptions about the rate of the sender, the 159 receiver also needs to be updated accordingly. Flows that have 160 different senders but the same receiver, or different senders 161 and different receivers can also share a bottleneck; such 162 scenarios have been omitted for simplicity, and could be 163 incorporated in future versions of this document. Note that 164 limiting the number of flows on which coupled congestion 165 control operates merely limits the benefits derived from the 166 mechanism. 168 Shared bottlenecks do not change quickly: 169 As per the definition above, a bottleneck depends on cross 170 traffic, and since such traffic can heavily fluctuate, 171 bottlenecks can change at a high frequency (e.g., there can be 172 oscillation between two or more links). This means that, when 173 flows are partially routed along different paths, they may 174 quickly change between sharing and not sharing a bottleneck. 175 For simplicity, here it is assumed that a shared bottleneck is 176 valid for a time interval that is significantly longer than the 177 interval at which congestion controllers operate. Note that, 178 for the only SBD mechanism defined in this document 179 (multiplexing on the same five-tuple), the notion of a shared 180 bottleneck stays correct even in the presence of fast traffic 181 fluctuations: since all flows that are assumed to share a 182 bottleneck are routed in the same way, if the bottleneck 183 changes, it will still be shared. 185 4. Architectural overview 187 Figure 1 shows the elements of the architecture for coupled 188 congestion control: the Flow State Exchange (FSE), Shared Bottleneck 189 Detection (SBD) and Flows. The FSE is a storage element that can be 190 implemented in two ways: active and passive. 
   In the active version, it initiates communication with flows and
   SBD.  In the passive version, it does not actively initiate
   communication; its only active role is internal state maintenance
   (e.g., an implementation could use soft state to remove a flow's
   data after long periods of inactivity).  Every time a flow's
   congestion control mechanism would normally update its sending
   rate, the flow instead updates information in the FSE and performs
   a query on the FSE, leading to a sending rate that can be different
   from what the congestion controller originally determined.  Using
   information about/from the currently active flows, SBD updates the
   FSE with the correct Flow Group Identifiers (FGIs).

             -------  <--- Flow 1
             | FSE |  <--- Flow 2 ..
             -------  <--- .. Flow N
                ^
                |           |
             -------        |
             | SBD | <------|
             -------

          Figure 1: Coupled congestion control architecture

   Since everything shown in Figure 1 is assumed to operate on a
   single host (the sender) only, this document only describes aspects
   that have an influence on the resulting on-the-wire behavior.  It
   does, for instance, not define how many bits must be used to
   represent FGIs, or in which way the entities communicate.
   Implementations can take various forms: for instance, all the
   elements in the figure could be implemented within a single
   application, thereby operating on flows generated by that
   application only.  Another alternative could be to implement both
   the FSE and SBD together in a separate process which different
   applications communicate with via some form of Inter-Process
   Communication (IPC).  Such an implementation would extend the scope
   to flows generated by multiple applications.  The FSE and SBD could
   also be included in the Operating System kernel.

5.  Roles

   This section gives an overview of the roles of the elements of
   coupled congestion control, and provides an example of how coupled
   congestion control can operate.

5.1.  SBD

   SBD uses knowledge about the flows to determine which flows belong
   in the same Flow Group (FG), and assigns FGIs accordingly.  This
   knowledge can be derived in three basic ways:

   1.  From multiplexing: it can be based on the simple assumption
       that packets sharing the same five-tuple (IP source and
       destination address, protocol, and transport layer port number
       pair) and having the same Differentiated Services Code Point
       (DSCP) in the IP header are typically treated in the same way
       along the path.  This method is the only one specified in this
       document: SBD MAY consider all flows that use the same five-
       tuple and DSCP to belong to the same FG.  This classification
       applies to certain tunnels, or RTP flows that are multiplexed
       over one transport (cf. [transport-multiplex]).  In one way or
       another, such multiplexing will probably be recommended for use
       with rtcweb [rtcweb-rtp-usage].

   2.  Via configuration: e.g., by assuming that a common wireless
       uplink is also a shared bottleneck.

   3.  From measurements: e.g., by considering correlations among
       measured delay and loss as an indication of a shared
       bottleneck.

   The methods above have some essential trade-offs: e.g.,
   multiplexing is a completely reliable measure; however, it is
   limited in scope to two end points (i.e., it cannot be applied to
   couple congestion controllers of one sender talking to multiple
   receivers).
   A measurement-based SBD mechanism is described in [sbd].
   Measurements can never be 100% reliable, in particular because they
   are based on the past, whereas applying coupled congestion control
   means making an assumption about the future; it is therefore
   recommended to implement cautionary measures, e.g., by disabling
   coupled congestion control if enabling it causes a significant
   increase in delay and/or packet loss.  Measurements also take time,
   which entails a certain delay for turning on coupling (refer to
   [sbd] for details).

5.2.  FSE

   The FSE contains a list of all flows that have registered with it.
   For each flow, it stores the following:

   o  a unique flow number to identify the flow

   o  the FGI of the FG that it belongs to (based on the definitions
      in this document, a flow has only one bottleneck, and can
      therefore be in only one FG)

   o  a priority P, which here is assumed to be represented as a
      floating point number in the range from 0.1 (unimportant) to 1
      (very important).  A negative value is used to indicate that a
      flow has terminated

   o  the rate used by the flow in bits per second, FSE_R

   The FSE can operate on window-based as well as rate-based
   congestion controllers (TEMPORARY NOTE: and probably -- not yet
   tested -- combinations thereof, with calculations to convert from
   one to the other).  In case of a window-based controller, FSE_R is
   a window, and all the text below should be considered to refer to
   windows, not rates.

   In the FSE, each FG contains one static variable S_CR which is
   meant to be the sum of the calculated rates of all flows in the
   same FG (including the flow itself).  This value is used to
   calculate the sending rate.

   The information listed here is enough to implement the sample flow
   algorithm given below.  FSE implementations could easily be
   extended to store, e.g., a flow's current sending rate for
   statistics gathering or future potential optimizations.

5.3.  Flows

   Flows register themselves with SBD and the FSE when they start,
   deregister from the FSE when they stop, and carry out an UPDATE
   function call every time their congestion controller calculates a
   new sending rate.  Via UPDATE, they provide the newly calculated
   rate and the desired rate (less than the calculated rate in case of
   application-limited flows, the same otherwise).

   Below, two example algorithms are described.  While other
   algorithms could be used instead, the same algorithm must be
   applied to all flows.

5.3.1.  Example algorithm 1 - Active FSE

   This algorithm was designed to be the simplest possible method to
   assign rates according to the priorities of flows.  Simulation
   results in [fse] indicate, however, that it does not significantly
   reduce queuing delay and packet loss.

   (1)  When a flow f starts, it registers itself with SBD and the
        FSE.  FSE_R is initialized with the congestion controller's
        initial rate.  SBD will assign the correct FGI.  When a flow
        is assigned an FGI, it adds its FSE_R to S_CR.

   (2)  When a flow f stops, its entry is removed from the list.

   (3)  Every time the congestion controller of the flow f determines
        a new sending rate CC_R, the flow calls UPDATE, which carries
        out the tasks listed below to derive the new sending rates for
        all the flows in the FG.  A flow's UPDATE function uses a
        local (i.e., per-flow) temporary variable S_P, which is the
        sum of all the priorities.
        (a)  It updates S_CR.

             S_CR = S_CR + CC_R - FSE_R(f)

        (b)  It calculates the sum of all the priorities, S_P.

             S_P = 0
             for all flows i in FG do
                 S_P = S_P + P(i)
             end for

        (c)  It calculates the sending rates for all the flows in the
             FG and distributes them.

             for all flows i in FG do
                 FSE_R(i) = (P(i)*S_CR)/S_P
                 send FSE_R(i) to the flow i
             end for

5.3.2.  Example algorithm 2 - Conservative Active FSE

   This algorithm extends algorithm 1 to conservatively emulate the
   behavior of a single flow by proportionally reducing the aggregate
   rate on congestion.  Simulation results in [fse] indicate that it
   can significantly reduce queuing delay and packet loss.

   (1)  When a flow f starts, it registers itself with SBD and the
        FSE.  FSE_R is initialized with the congestion controller's
        initial rate.  SBD will assign the correct FGI.  When a flow
        is assigned an FGI, it adds its FSE_R to S_CR.

   (2)  When a flow f stops, its entry is removed from the list.

   (3)  Every time the congestion controller of the flow f determines
        a new sending rate CC_R, the flow calls UPDATE, which carries
        out the tasks listed below to derive the new sending rates for
        all the flows in the FG.  A flow's UPDATE function uses a
        local (i.e., per-flow) temporary variable S_P, which is the
        sum of all the priorities, and a local variable DELTA, which
        is used to calculate the difference between CC_R and the
        previously stored FSE_R.  To prevent flows from either
        ignoring congestion or overreacting, a timer keeps them from
        changing their rates immediately after the common rate
        reduction that follows a congestion event.  This timer is set
        to 2 RTTs of the flow that experienced congestion because it
        is assumed that a congestion event can persist for up to one
        RTT of that flow, with another RTT added to compensate for
        fluctuations in the measured RTT value.

        (a)  It updates S_CR based on DELTA.

             if Timer has expired or is not set then
                 DELTA = CC_R - FSE_R(f)
                 if DELTA < 0 then  // Reduce S_CR proportionally
                     S_CR = S_CR * CC_R / FSE_R(f)
                     Set Timer for 2 RTTs
                 else
                     S_CR = S_CR + DELTA
                 end if
             end if

        (b)  It calculates the sum of all the priorities, S_P.

             S_P = 0
             for all flows i in FG do
                 S_P = S_P + P(i)
             end for

        (c)  It calculates the sending rates for all the flows in the
             FG and distributes them.

             for all flows i in FG do
                 FSE_R(i) = (P(i)*S_CR)/S_P
                 send FSE_R(i) to the flow i
             end for

6.  Acknowledgements

   This document has benefitted from discussions with and feedback
   from David Hayes, Andreas Petlund, and David Ros (who also gave the
   FSE its name).

   This work was partially funded by the European Community under its
   Seventh Framework Programme through the Reducing Internet Transport
   Latency (RITE) project (ICT-317700).

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Security Considerations

   In scenarios where the architecture described in this document is
   applied across applications, various cheating possibilities arise:
   e.g., reporting wrong values for the calculated rate, the desired
   rate, or the priority of a flow.  In the worst case, such cheating
   could either prevent other flows from sending or make them send at
   a rate that is unreasonably large.
The end result would be unfair 439 behavior at the network bottleneck, akin to what could be achieved 440 with any UDP based application. Hence, since this is no worse than 441 UDP in general, there seems to be no significant harm in using this 442 in the absence of UDP rate limiters. 444 In the case of a single-user system, it should also be in the 445 interest of any application programmer to give the user the best 446 possible experience by using reasonable flow priorities or even 447 letting the user choose them. In a multi-user system, this interest 448 may not be given, and one could imagine the worst case of an "arms 449 race" situation, where applications end up setting their priorities 450 to the maximum value. If all applications do this, the end result is 451 a fair allocation in which the priority mechanism is implicitly 452 eliminated, and no major harm is done. 454 9. References 456 9.1. Normative References 458 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 459 Requirement Levels", BCP 14, RFC 2119, March 1997. 461 [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140, 462 April 1997. 464 [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", 465 RFC 3124, June 2001. 467 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 468 Friendly Rate Control (TFRC): Protocol Specification", 469 RFC 5348, September 2008. 471 9.2. Informative References 473 [fse] Islam, S., Welzl, M., Gjessing, S., and N. Khademi, 474 "Coupled Congestion Control for RTP Media", ACM SIGCOMM 475 Capacity Sharing Workshop (CSWS 2014); extended version 476 available as a technical report from 477 http://safiquli.at.ifi.uio.no/paper/fse-tech-report.pdf , 478 2014. 480 [rtcweb-rtp-usage] 481 Perkins, C., Westerlund, M., and J. Ott, "Web Real-Time 482 Communication (WebRTC): Media Transport and Use of RTP", 483 draft-ietf-rtcweb-rtp-usage-18.txt (work in progress), 484 October 2014. 486 [rtcweb-usecases] 487 Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real- 488 Time Communication Use-cases and Requirements", 489 draft-ietf-rtcweb-use-cases-and-requirements-14.txt (work 490 in progress), February 2014. 492 [sbd] Hayes, D., Ferlin, S., and M. Welzl, "Shared Bottleneck 493 Detection for Coupled Congestion Control for RTP Media", 494 draft-hayes-rmcat-sbd-00.txt (work in progress), 495 October 2014. 497 [transport-multiplex] 498 Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a 499 Single Lower-Layer Transport", 500 draft-westerlund-avtcore-transport-multiplexing-07.txt 501 (work in progress), October 2013. 503 Appendix A. Example algorithm - Passive FSE 505 Active algorithms calculate the rates for all the flows in the FG and 506 actively distribute them. In a passive algorithm, UPDATE returns a 507 rate that should be used instead of the rate that the congestion 508 controller has determined. This can make a passive algorithm easier 509 to implement; however, the resulting dynamics are not fully 510 understood. The algorithm described below is to be considered as 511 highly experimental and did not perform as well as the active 512 variants in simulations. 514 This passive version of the FSE stores the following information in 515 addition to the variables described in Section 5.2: 517 o The desired rate DR. This can be smaller than the calculated rate 518 if the application feeding into the flow has less data to send 519 than the congestion controller would allow. 
   In case of a bulk transfer, DR must be set to CC_R received from
   the flow's congestion module.

   The passive version of the FSE contains one static variable per FG
   called TLO (Total Leftover Rate -- used to let a flow 'take'
   bandwidth from application-limited or terminated flows) which is
   initialized to 0.  For the passive version, S_CR is limited to
   increase or decrease as conservatively as a flow's congestion
   controller decides in order to prohibit sudden rate jumps.

   (1)  When a flow f starts, it registers itself with SBD and the
        FSE.  FSE_R and DR are initialized with the congestion
        controller's initial rate.  SBD will assign the correct FGI.
        When a flow is assigned an FGI, it adds its FSE_R to S_CR.

   (2)  When a flow f stops, it sets its DR to 0 and sets P to -1.

   (3)  Every time the congestion controller of the flow f determines
        a new sending rate CC_R, assuming the flow's new desired rate
        new_DR to be "infinity" in case of a bulk data transfer with
        an unknown maximum rate, the flow calls UPDATE, which carries
        out the tasks listed below to derive the flow's new sending
        rate, Rate.  A flow's UPDATE function uses a few local (i.e.,
        per-flow) temporary variables, which are all initialized to 0:
        DELTA, new_S_CR and S_P.

        (a)  For all the flows in its FG (including itself), it
             calculates the sum of all the calculated rates, new_S_CR.
             Then it calculates the difference between CC_R and
             FSE_R(f), DELTA.

             for all flows i in FG do
                 new_S_CR = new_S_CR + FSE_R(i)
             end for
             DELTA = CC_R - FSE_R(f)

        (b)  It updates S_CR, FSE_R(f) and DR(f).

             FSE_R(f) = CC_R
             if DELTA > 0 then  // the flow's rate has increased
                 S_CR = S_CR + DELTA
             else if DELTA < 0 then
                 S_CR = new_S_CR + DELTA
             end if
             DR(f) = min(new_DR, FSE_R(f))

        (c)  It calculates the leftover rate TLO, removes the
             terminated flows from the FSE and calculates the sum of
             all the priorities, S_P.

             for all flows i in FG do
                 if P(i) < 0 then
                     delete flow
                 else
                     S_P = S_P + P(i)
                 end if
             end for
             if DR(f) < FSE_R(f) then
                 TLO = TLO + ((P(f)/S_P) * S_CR - DR(f))
             end if

        (d)  It calculates the sending rate, Rate.

             Rate = min(new_DR, (P(f)*S_CR)/S_P + TLO)

             if Rate != new_DR and TLO > 0 then
                 TLO = 0  // f has 'taken' TLO
             end if

        (e)  It updates DR(f) and FSE_R(f) with Rate.

             if Rate > DR(f) then
                 DR(f) = Rate
             end if
             FSE_R(f) = Rate

   The goals of the flow algorithm are to achieve prioritization,
   improve network utilization in the face of application-limited
   flows, and impose limits on the increase behavior such that the
   negative impact of multiple flows trying to increase their rate
   together is minimized.  It does that by assigning a flow a sending
   rate that may not be what the flow's congestion controller
   expected.  It therefore builds on the assumption that no
   significant inefficiencies arise from temporary application-limited
   behavior or from quickly jumping to a rate that is higher than the
   congestion controller intended.  How problematic these issues
   really are depends on the controllers in use and requires careful
   per-controller experimentation.  The coupled congestion control
   mechanism described here also does not require all controllers to
   be equal; effects of heterogeneous controllers, or homogeneous
   controllers being in different states, are also subject to
   experimentation.
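   To make the preceding steps easier to follow, the sketch below
   models the passive FSE and its UPDATE function in Python, following
   steps (a)-(e) above.  It is purely illustrative and not part of the
   specification: the names Flow, PassiveFSE, register, stop and
   update are invented for this example, a single FG is assumed, and
   SBD, the congestion controllers themselves and any inter-process
   communication are omitted.

      # Illustrative sketch only (not part of this specification).
      INFINITY = float("inf")

      class Flow:
          def __init__(self, number, fgi, priority, initial_rate):
              self.number = number
              self.fgi = fgi
              self.P = priority          # priority; -1 marks termination
              self.FSE_R = initial_rate  # rate stored in the FSE (bits/s)
              self.DR = initial_rate     # desired rate (bits/s)

          def stop(self):
              # Step (2): a stopping flow sets its DR to 0 and P to -1.
              self.DR = 0.0
              self.P = -1

      class PassiveFSE:
          def __init__(self):
              self.flows = []   # all registered flows (one FG assumed)
              self.S_CR = 0.0   # sum of calculated rates in the FG
              self.TLO = 0.0    # total leftover rate

          def register(self, flow):
              # Step (1): a starting flow adds its initial FSE_R to S_CR.
              self.flows.append(flow)
              self.S_CR += flow.FSE_R

          def update(self, f, CC_R, new_DR=INFINITY):
              # Step (3): called whenever f's congestion controller
              # derives a new rate CC_R; new_DR is "infinity" for a
              # bulk transfer.  Assumes at least one active flow.
              # (a) sum of stored rates; difference CC_R - FSE_R(f)
              new_S_CR = sum(i.FSE_R for i in self.flows)
              DELTA = CC_R - f.FSE_R
              # (b) update S_CR, FSE_R(f) and DR(f)
              f.FSE_R = CC_R
              if DELTA > 0:
                  self.S_CR += DELTA
              elif DELTA < 0:
                  self.S_CR = new_S_CR + DELTA
              f.DR = min(new_DR, f.FSE_R)
              # (c) drop terminated flows, sum priorities, update TLO
              self.flows = [i for i in self.flows if i.P >= 0]
              S_P = sum(i.P for i in self.flows)
              if f.DR < f.FSE_R:
                  self.TLO += (f.P / S_P) * self.S_CR - f.DR
              # (d) the rate the flow should actually use
              rate = min(new_DR, (f.P * self.S_CR) / S_P + self.TLO)
              if rate != new_DR and self.TLO > 0:
                  self.TLO = 0.0  # f has 'taken' TLO
              # (e) store the result; the flow sends at this rate
              if rate > f.DR:
                  f.DR = rate
              f.FSE_R = rate
              return rate

   Replaying the two-flow example of Appendix A.1 with this sketch
   (priorities 1 and 0.5 in a common FG) yields the same FSE_R, DR and
   Rate values as the FSE diagrams shown there.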
   This algorithm gives all the leftover rate of application-limited
   flows to the first flow that updates its sending rate, provided
   that this flow needs it all (otherwise, its own leftover rate can
   be taken by the next flow that updates its rate).  Other policies
   could be applied, e.g., to divide the leftover rate of a flow
   equally among all other flows in the FG.

A.1.  Example operation (passive)

   In order to illustrate the operation of the passive coupled
   congestion control algorithm, this section presents a toy example
   of two flows that use it.  Let us assume that both flows traverse a
   common 10 Mbit/s bottleneck and use a simplistic congestion
   controller that starts out with 1 Mbit/s, increases its rate by
   1 Mbit/s in the absence of congestion and decreases it by 2 Mbit/s
   in the presence of congestion.  For simplicity, flows are assumed
   to always operate in a round-robin fashion.  Rate numbers below
   without units are assumed to be in Mbit/s.  For illustration
   purposes, the actual sending rate is also shown for every flow in
   the FSE diagrams even though it is not really stored in the FSE.

   Flow #1 begins.  It is a bulk data transfer and considers itself to
   have top priority.  This is the FSE after step (1) of the flow
   algorithm:

   ---------------------------------------
   | # | FGI |  P  | FSE_R |  DR  | Rate |
   |   |     |     |       |      |      |
   | 1 |  1  |  1  |   1   |   1  |   1  |
   ---------------------------------------
   S_CR = 1, TLO = 0

   Its congestion controller gradually increases its rate.
   Eventually, at some point, the FSE should look like this:

   ---------------------------------------
   | # | FGI |  P  | FSE_R |  DR  | Rate |
   |   |     |     |       |      |      |
   | 1 |  1  |  1  |  10   |  10  |  10  |
   ---------------------------------------
   S_CR = 10, TLO = 0

   Now another flow joins.  It is also a bulk data transfer, and has a
   lower priority (0.5):

   ---------------------------------------
   | # | FGI |  P  | FSE_R |  DR  | Rate |
   |   |     |     |       |      |      |
   | 1 |  1  |  1  |  10   |  10  |  10  |
   | 2 |  1  | 0.5 |   1   |   1  |   1  |
   ---------------------------------------
   S_CR = 11, TLO = 0

   Now assume that the first flow updates its rate to 8, because the
   total sending rate of 11 exceeds the total capacity.  Let us take a
   closer look at what happens in step (3) of the flow algorithm.

   CC_R = 8.  new_DR = infinity.
   3 a)  new_S_CR = 11; DELTA = 8 - 10 = -2.
   3 b)  FSE_R(f) = 8.  DELTA is negative, hence S_CR = 9;
         DR(f) = 8.
   3 c)  S_P = 1.5.
   3 d)  new sending rate = min(infinity, 1/1.5 * 9 + 0) = 6.
   3 e)  FSE_R(f) = 6.

   The resulting FSE looks as follows:

   ---------------------------------------
   | # | FGI |  P  | FSE_R |  DR  | Rate |
   |   |     |     |       |      |      |
   | 1 |  1  |  1  |   6   |   8  |   6  |
   | 2 |  1  | 0.5 |   1   |   1  |   1  |
   ---------------------------------------
   S_CR = 9, TLO = 0

   The effect is that flow #1 is sending at 6 Mbit/s instead of the
   8 Mbit/s that the congestion controller derived.  Let us now assume
   that flow #2 updates its rate.  Its congestion controller detects
   that the network is not fully saturated (the actual total sending
   rate is 6+1=7) and increases its rate.

   CC_R = 2.  new_DR = infinity.
   3 a)  new_S_CR = 7; DELTA = 2 - 1 = 1.
   3 b)  FSE_R(f) = 2.  DELTA is positive, hence S_CR = 9 + 1 = 10;
         DR(f) = 2.
   3 c)  S_P = 1.5.
   3 d)  new sending rate = min(infinity, 0.5/1.5 * 10 + 0) = 3.33.
   3 e)  DR(f) = FSE_R(f) = 3.33.
700 The resulting FSE looks as follows: 701 ------------------------------------------- 702 | # | FGI | P | FSE_R | DR | Rate | 703 | | | | | | | 704 | 1 | 1 | 1 | 6 | 8 | 6 | 705 | 2 | 1 | 0.5 | 3.33 | 3.33 | 3.33 | 706 ------------------------------------------- 707 S_CR = 10, TLO = 0 708 The effect is that flow #2 is now sending with 3.33 Mbit/s, which is 709 close to half of the rate of flow #1 and leads to a total utilization 710 of 6(#1) + 3.33(#2) = 9.33 Mbit/s. Flow #2's congestion controller 711 has increased its rate faster than the controller actually expected. 712 Now, flow #1 updates its rate. Its congestion controller detects 713 that the network is not fully saturated and increases its rate. 714 Additionally, the application feeding into flow #1 limits the flow's 715 sending rate to at most 2 Mbit/s. 717 CC_R=7. new_DR=2. 718 3 a) new_S_CR = 9.33; DELTA = 1. 719 3 b) FSE_R(f) = 7, DELTA is positive, hence S_CR = 10 + 1 = 11; 720 DR = min(2, 7) = 2. 721 3 c) S_P = 1.5; DR(f) < FSE_R(f), hence TLO = 1/1.5 * 11 - 2 = 5.33. 722 3 d) new sending rate = min(2, 1/1.5 * 11 + 5.33) = 2. 723 3 e) FSE_R(f) = 2. 725 The resulting FSE looks as follows: 726 ------------------------------------------- 727 | # | FGI | P | FSE_R | DR | Rate | 728 | | | | | | | 729 | 1 | 1 | 1 | 2 | 2 | 2 | 730 | 2 | 1 | 0.5 | 3.33 | 3.33 | 3.33 | 731 ------------------------------------------- 732 S_CR = 11, TLO = 5.33 734 Now, the total rate of the two flows is 2 + 3.33 = 5.33 Mbit/s, i.e. 735 the network is significantly underutilized due to the limitation of 736 flow #1. Flow #2 updates its rate. Its congestion controller 737 detects that the network is not fully saturated and increases its 738 rate. 740 CC_R=4.33. new_DR = infinity. 741 3 a) new_S_CR = 5.33; DELTA = 1. 742 3 b) FSE_R(f) = 4.33. DELTA is positive, hence S_CR = 12; 743 DR(f) = 4.33. 744 3 c) S_P = 1.5. 745 3 d) new sending rate: min(infinity, 0.5/1.5 * 12 + 5.33 ) = 9.33. 746 3 e) FSE_R(f) = 9.33, DR(f) = 9.33. 748 The resulting FSE looks as follows: 749 ------------------------------------------- 750 | # | FGI | P | FSE_R | DR | Rate | 751 | | | | | | | 752 | 1 | 1 | 1 | 2 | 2 | 2 | 753 | 2 | 1 | 0.5 | 9.33 | 9.33 | 9.33 | 754 ------------------------------------------- 755 S_CR = 12, TLO = 0 757 Now, the total rate of the two flows is 2 + 9.33 = 11.33 Mbit/s. 758 Finally, flow #1 terminates. It sets P to -1 and DR to 0. Let us 759 assume that it terminated late enough for flow #2 to still experience 760 the network in a congested state, i.e. flow #2 decreases its rate in 761 the next iteration. 763 CC_R = 7.33. new_DR = infinity. 764 3 a) new_S_CR = 11.33; DELTA = -2. 765 3 b) FSE_R(f) = 7.33. DELTA is negative, hence S_CR = 9.33; 766 DR(f) = 7.33. 767 3 c) Flow 1 has P = -1, hence it is deleted from the FSE. 768 S_P = 0.5. 769 3 d) new sending rate: min(infinity, 0.5/0.5*9.33 + 0) = 9.33. 770 3 e) FSE_R(f) = DR(f) = 9.33. 772 The resulting FSE looks as follows: 773 ------------------------------------------- 774 | # | FGI | P | FSE_R | DR | Rate | 775 | | | | | | | 776 | 2 | 1 | 0.5 | 9.33 | 9.33 | 9.33 | 777 ------------------------------------------- 778 S_CR = 9.33, TLO = 0 780 Appendix B. Change log 782 B.1. Changes from -00 to -01 784 o Added change log. 786 o Updated the example algorithm and its operation. 788 B.2. Changes from -01 to -02 790 o Included an active version of the algorithm which is simpler. 792 o Replaced "greedy flow" with "bulk data transfer" and "non-greedy" 793 with "application-limited". 
795 o Updated new_CR to CC_R, and CR to FSE_R for better understanding. 797 B.3. Changes from -02 to -03 799 o Included an active conservative version of the algorithm which 800 reduces queue growth and packet loss; added a reference to a 801 technical report that shows these benefits with simulations. 803 o Moved the passive variant of the algorithm to appendix. 805 B.4. Changes from -03 to -04 807 o Extended SBD section. 809 o Added a note about window-based controllers. 811 Authors' Addresses 813 Michael Welzl 814 University of Oslo 815 PO Box 1080 Blindern 816 Oslo, N-0316 817 Norway 819 Phone: +47 22 85 24 20 820 Email: michawe@ifi.uio.no 821 Safiqul Islam 822 University of Oslo 823 PO Box 1080 Blindern 824 Oslo, N-0316 825 Norway 827 Phone: +47 22 84 08 37 828 Email: safiquli@ifi.uio.no 830 Stein Gjessing 831 University of Oslo 832 PO Box 1080 Blindern 833 Oslo, N-0316 834 Norway 836 Phone: +47 22 85 24 44 837 Email: steing@ifi.uio.no