idnits 2.17.1 

draft-welzl-rmcat-coupled-cc-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (June 18, 2015) is 3235 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	RTP Media Congestion Avoidance                                  M. Welzl
3	Techniques (rmcat)                                              S. Islam
4	Internet-Draft                                               S. Gjessing
5	Intended status: Experimental                         University of Oslo
6	Expires: December 20, 2015                                 June 18, 2015

8	                Coupled congestion control for RTP media
9	                    draft-welzl-rmcat-coupled-cc-05

11	Abstract

13	   When multiple congestion controlled RTP sessions traverse the same
14	   network bottleneck, it can be beneficial to combine their controls
15	   such that the total on-the-wire behavior is improved.  This document
16	   describes such a method for flows that have the same sender, in a way
17	   that is as flexible and simple as possible while minimizing the
18	   amount of changes needed to existing RTP applications.  It specifies
19	   how to apply the method for the NADA congestion control algorithm.

21	Status of this Memo

23	   This Internet-Draft is submitted in full conformance with the
24	   provisions of BCP 78 and BCP 79.

26	   Internet-Drafts are working documents of the Internet Engineering
27	   Task Force (IETF).  Note that other groups may also distribute
28	   working documents as Internet-Drafts.  The list of current Internet-
29	   Drafts is at http://datatracker.ietf.org/drafts/current/.

31	   Internet-Drafts are draft documents valid for a maximum of six months
32	   and may be updated, replaced, or obsoleted by other documents at any
33	   time.  It is inappropriate to use Internet-Drafts as reference
34	   material or to cite them other than as "work in progress."

36	   This Internet-Draft will expire on December 20, 2015.

38	Copyright Notice

40	   Copyright (c) 2015 IETF Trust and the persons identified as the
41	   document authors.  All rights reserved.

43	   This document is subject to BCP 78 and the IETF Trust's Legal
44	   Provisions Relating to IETF Documents
45	   (http://trustee.ietf.org/license-info) in effect on the date of
46	   publication of this document.  Please review these documents
47	   carefully, as they describe your rights and restrictions with respect
48	   to this document.  Code Components extracted from this document must
49	   include Simplified BSD License text as described in Section 4.e of
50	   the Trust Legal Provisions and are provided without warranty as
51	   described in the Simplified BSD License.

53	Table of Contents

55	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
56	   2.  Definitions  . . . . . . . . . . . . . . . . . . . . . . . . .  3
57	   3.  Limitations  . . . . . . . . . . . . . . . . . . . . . . . . .  4
58	   4.  Architectural overview . . . . . . . . . . . . . . . . . . . .  5
59	   5.  Roles  . . . . . . . . . . . . . . . . . . . . . . . . . . . .  6
60	     5.1.  SBD  . . . . . . . . . . . . . . . . . . . . . . . . . . .  6
61	     5.2.  FSE  . . . . . . . . . . . . . . . . . . . . . . . . . . .  6
62	     5.3.  Flows  . . . . . . . . . . . . . . . . . . . . . . . . . .  7
63	       5.3.1.  Example algorithm 1 - Active FSE . . . . . . . . . . .  7
64	       5.3.2.  Example algorithm 2 - Conservative Active FSE  . . . .  8
65	   6.  Application  . . . . . . . . . . . . . . . . . . . . . . . . . 10
66	     6.1.  NADA . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
67	     6.2.  General recommendations  . . . . . . . . . . . . . . . . . 10
68	   7.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11
69	   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 11
70	   9.  Security Considerations  . . . . . . . . . . . . . . . . . . . 11
71	   10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
72	     10.1. Normative References . . . . . . . . . . . . . . . . . . . 11
73	     10.2. Informative References . . . . . . . . . . . . . . . . . . 12
74	   Appendix A.  Example algorithm - Passive FSE . . . . . . . . . . . 12
75	     A.1.  Example operation (passive)  . . . . . . . . . . . . . . . 15
76	   Appendix B.  Change log  . . . . . . . . . . . . . . . . . . . . . 19
77	     B.1.  Changes from -00 to -01  . . . . . . . . . . . . . . . . . 19
78	     B.2.  Changes from -01 to -02  . . . . . . . . . . . . . . . . . 19
79	     B.3.  Changes from -02 to -03  . . . . . . . . . . . . . . . . . 19
80	     B.4.  Changes from -03 to -04  . . . . . . . . . . . . . . . . . 20
81	     B.5.  Changes from -04 to -05  . . . . . . . . . . . . . . . . . 20
82	   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20

84	1.  Introduction

86	   When there is enough data to send, a congestion controller must
87	   increase its sending rate until the path's capacity has been reached;
88	   depending on the controller, sometimes the rate is increased further,
89	   until packets are ECN-marked or dropped.  This process inevitably
90	   creates undesirable queuing delay -- an effect that is amplified when
91	   multiple congestion controlled connections traverse the same network
92	   bottleneck.  When such connections originate from the same host, it
93	   would therefore be ideal to use only one single sender-side
94	   congestion controller which determines the overall allowed sending
95	   rate, and then use a local scheduler to assign a proportion of this
96	   rate to each RTP session.  This way, priorities could also be
97	   implemented quite easily, as a function of the scheduler; honoring
98	   user-specified priorities is, for example, required by rtcweb
99	   [rtcweb-usecases].

101	   The Congestion Manager (CM) [RFC3124] provides a single congestion
102	   controller with a scheduling function just as described above.  It is
103	   hard to implement because it requires an additional congestion
104	   controller and removes all per-connection congestion control
105	   functionality, which is quite a significant change to existing RTP
106	   based applications.  This document presents a method that is easier
107	   to implement than the CM and also requires less significant changes
108	   to existing RTP based applications.  It attempts to roughly
109	   approximate the CM behavior by sharing information between existing
110	   congestion controllers.

112	2.  Definitions

114	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
115	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
116	   document are to be interpreted as described in RFC 2119 [RFC2119].

118	   Available Bandwidth:
119	         The available bandwidth is the nominal link capacity minus the
120	         amount of traffic that traversed the link during a certain time
121	         interval, divided by that time interval.

123	   Bottleneck:
124	         The first link with the smallest available bandwidth along the
125	         path between a sender and receiver.

127	   Flow:
128	         A flow is the entity that congestion control is operating on.
129	         It could, for example, be a transport layer connection, an RTP
130	         session, or a subsession that is multiplexed onto a single RTP
131	         session together with other subsessions.

133	   Flow Group Identifier (FGI):
134	         A unique identifier for each subset of flows that is limited by
135	         a common bottleneck.

137	   Flow State Exchange (FSE):
138	         The entity that maintains information that is exchanged between
139	         flows.

141	   Flow Group (FG):
142	         A group of flows having the same FGI.

144	   Shared Bottleneck Detection (SBD):
145	         The entity that determines which flows traverse the same
146	         bottleneck in the network, or the process of doing so.

148	3.  Limitations

150	   Sender-side only:
151	         Coupled congestion control as described here only operates
152	         inside a single host on the sender side.  This is because,
153	         irrespective of where the major decisions for congestion
154	         control are taken, the sender of a flow needs to eventually
155	         decide the transmission rate.  Additionally, the necessary
156	         information about how much data an application can currently
157	         send on a flow is often only available at the sender side,
158	         making the sender an obvious choice for placement of the
159	         elements and mechanisms described here.

161	   Shared bottlenecks do not change quickly:
162	         As per the definition above, a bottleneck depends on cross
163	         traffic, and since such traffic can heavily fluctuate,
164	         bottlenecks can change at a high frequency (e.g., there can be
165	         oscillation between two or more links).  This means that, when
166	         flows are partially routed along different paths, they may
167	         quickly change between sharing and not sharing a bottleneck.
168	         For simplicity, here it is assumed that a shared bottleneck is
169	         valid for a time interval that is significantly longer than the
170	         interval at which congestion controllers operate.  Note that,
171	         for the only SBD mechanism defined in this document
172	         (multiplexing on the same five-tuple), the notion of a shared
173	         bottleneck stays correct even in the presence of fast traffic
174	         fluctuations: since all flows that are assumed to share a
175	         bottleneck are routed in the same way, if the bottleneck
176	         changes, it will still be shared.

178	4.  Architectural overview

180	   Figure 1 shows the elements of the architecture for coupled
181	   congestion control: the Flow State Exchange (FSE), Shared Bottleneck
182	   Detection (SBD) and Flows.  The FSE is a storage element that can be
183	   implemented in two ways: active and passive.  In the active version,
184	   it initiates communication with flows and SBD.  However, in the
185	   passive version, it does not actively initiate communication with
186	   flows and SBD; its only active role is internal state maintenance
187	   (e.g., an implementation could use soft state to remove a flow's data
188	   after long periods of inactivity).  Every time a flow's congestion
189	   control mechanism would normally update its sending rate, the flow
190	   instead updates information in the FSE and performs a query on the
191	   FSE, leading to a sending rate that can be different from what the
192	   congestion controller originally determined.  Using information
193	   about/from the currently active flows, SBD updates the FSE with the
194	   correct Flow State Identifiers (FSIs).

196	                          -------  <---  Flow 1
197	                          | FSE |  <---  Flow 2 ..
198	                          -------  <---  .. Flow N
199	                             ^
200	                             |             |
201	                          -------          |
202	                          | SBD |  <-------|
203	                          -------

205	             Figure 1: Coupled congestion control architecture

207	   Since everything shown in Figure 1 is assumed to operate on a single
208	   host (the sender) only, this document only describes aspects that
209	   have an influence on the resulting on-the-wire behavior.  It does,
210	   for instance, not define how many bits must be used to represent
211	   FSIs, or in which way the entities communicate.  Implementations can
212	   take various forms: for instance, all the elements in the figure
213	   could be implemented within a single application, thereby operating
214	   on flows generated by that application only.  Another alternative
215	   could be to implement both the FSE and SBD together in a separate
216	   process which different applications communicate with via some form
217	   of Inter-Process Communication (IPC).  Such an implementation would
218	   extend the scope to flows generated by multiple applications.  The
219	   FSE and SBD could also be included in the Operating System kernel.

221	5.  Roles

223	   This section gives an overview of the roles of the elements of
224	   coupled congestion control, and provides an example of how coupled
225	   congestion control can operate.

227	5.1.  SBD

229	   SBD uses knowledge about the flows to determine which flows belong in
230	   the same Flow Group (FG), and assigns FGIs accordingly.  This
231	   knowledge can be derived in three basic ways:

233	   1.  From multiplexing: it can be based on the simple assumption that
234	       packets sharing the same five-tuple (IP source and destination
235	       address, protocol, and transport layer port number pair) and
236	       having the same Differentiated Services Code Point (DSCP) in the
237	       IP header are typically treated in the same way along the path.
238	       The latter method is the only one specified in this document: SBD
239	       MAY consider all flows that use the same five-tuple and DSCP to
240	       belong to the same FG.  This classification applies to certain
241	       tunnels, or RTP flows that are multiplexed over one transport
242	       (cf. [transport-multiplex]).  In one way or another, such
243	       multiplexing will probably be recommended for use with rtcweb
244	       [rtcweb-rtp-usage].

246	   2.  Via configuration: e.g. by assuming that a common wireless uplink
247	       is also a shared bottleneck.

249	   3.  From measurements: e.g. by considering correlations among
250	       measured delay and loss as an indication of a shared bottleneck.

252	   The methods above have some essential trade-offs: e.g., multiplexing
253	   is a completely reliable measure, however it is limited in scope to
254	   two end points (i.e., it cannot be applied to couple congestion
255	   controllers of one sender talking to multiple receivers).  A
256	   measurement-based SBD mechanism is described in [sbd].  Measurements
257	   can never be 100% reliable, in particular because they are based on
258	   the past but applying coupled congestion control means to make an
259	   assumption about the future; it is therefore recommended to implement
260	   cautionary measures, e.g. by disabling coupled congestion control if
261	   enabling it causes a significant increase in delay and/or packet
262	   loss.  Measurements also take time, which entails a certain delay for
263	   turning on coupling (refer to [sbd] for details).

265	5.2.  FSE

267	   The FSE contains a list of all flows that have registered with it.
268	   For each flow, it stores the following:

270	   o  a unique flow number to identify the flow

272	   o  the FGI of the FG that it belongs to (based on the definitions in
273	      this document, a flow has only one bottleneck, and can therefore
274	      be in only one FG)

276	   o  a priority P, which here is assumed to be represented as a
277	      floating point number in the range from 0.1 (unimportant) to 1
278	      (very important).  A negative value is used to indicate that a
279	      flow has terminated

281	   o  The rate used by the flow in bits per second, FSE_R.

283	   The FSE can operate on window-based as well as rate-based congestion
284	   controllers (TEMPORARY NOTE: and probably -- not yet tested --
285	   combinations thereof, with calculations to convert from one to the
286	   other).  In case of a window-based controller, FSE_R is a window, and
287	   all the text below should be considered to refer to window, not
288	   rates.

290	   In the FSE, each FG contains one static variable S_CR which is meant
291	   to be the sum of the calculated rates of all flows in the same FG
292	   (including the flow itself).  This value is used to calculate the
293	   sending rate.

295	   The information listed here is enough to implement the sample flow
296	   algorithm given below.  FSE implementations could easily be extended
297	   to store, e.g., a flow's current sending rate for statistics
298	   gathering or future potential optimizations.

300	5.3.  Flows

302	   Flows register themselves with SBD and FSE when they start,
303	   deregister from the FSE when they stop, and carry out an UPDATE
304	   function call every time their congestion controller calculates a new
305	   sending rate.  Via UPDATE, they provide the newly calculated rate and
306	   optionally (if the algorithm supports it) the desired rate.  The
307	   desired rate is less than the calculated rate in case of application-
308	   limited flows; otherwise, it is the same as the calculated rate.

310	   Below, two example algorithms are described.  While other algorithms
311	   could be used instead, the same algorithm must be applied to all
312	   flows.

314	5.3.1.  Example algorithm 1 - Active FSE

316	   This algorithm was designed to be the simplest possible method to
317	   assign rates according to the priorities of flows.  Simulations
318	   results in [fse] indicate that it does however not significantly
319	   reduce queuing delay and packet loss.

321	   (1)  When a flow f starts, it registers itself with SBD and the FSE.
322	        FSE_R is initialized with the congestion controller's initial
323	        rate.  SBD will assign the correct FGI.  When a flow is assigned
324	        an FGI, it adds its FSE_R to S_CR.

326	   (2)  When a flow f stops, its entry is removed from the list.

328	   (3)  Every time the congestion controller of the flow f determines a
329	        new sending rate CC_R, the flow calls UPDATE, which carries out
330	        the tasks listed below to derive the new sending rates for all
331	        the flows in the FG.  A flow's UPDATE function uses a local
332	        (i.e. per-flow) temporary variable S_P, which is the sum of all
333	        the priorities.

335	        (a)  It updates S_CR.

337	               S_CR = S_CR + CC_R - FSE_R(f)

339	        (b)  It calculates the sum of all the priorities, S_P.

341	               S_P = 0
342	               for all flows i in FG do
343	                   S_P = S_P + P(i)
344	               end for

346	        (c)  It calculates the sending rates for all the flows in an FG
347	             and distributes them.

349	               for all flows i in FG do
350	                   FSE_R(i) = (P(i)*S_CR)/S_P
351	                   send FSE_R(i) to the flow i
352	               end for

354	5.3.2.  Example algorithm 2 - Conservative Active FSE

356	   This algorithm extends algorithm 1 to conservatively emulate the
357	   behavior of a single flow by proportionally reducing the aggregate
358	   rate on congestion.  Simulations results in [fse] indicate that it
359	   can significantly reduce queuing delay and packet loss.

361	   (1)  When a flow f starts, it registers itself with SBD and the FSE.
362	        FSE_R is initialized with the congestion controller's initial
363	        rate.  SBD will assign the correct FGI.  When a flow is assigned
364	        an FGI, it adds its FSE_R to S_CR.

366	   (2)  When a flow f stops, its entry is removed from the list.

368	   (3)  Every time the congestion controller of the flow f determines a
369	        new sending rate CC_R, the flow calls UPDATE, which carries out
370	        the tasks listed below to derive the new sending rates for all
371	        the flows in the FG.  A flow's UPDATE function uses a local
372	        (i.e. per-flow) temporary variable S_P, which is the sum of all
373	        the priorities, and a local variable DELTA, which is used to
374	        calculate the difference between CC_R and the previously stored
375	        FSE_R. To prevent flows from either ignoring congestion or
376	        overreacting, a timer keeps them from changing their rates
377	        immediately after the common rate reduction that follows a
378	        congestion event.  This timer is set to 2 RTTs of the flow that
379	        experienced congestion because it is assumed that a congestion
380	        event can persist for up to one RTT of that flow, with another
381	        RTT added to compensate for fluctuations in the measured RTT
382	        value.

384	        (a)  It updates S_CR based on DELTA.

386	               if Timer has expired or not set then
387	                 DELTA = CC_R - FSE_R(f)
388	                 if DELTA < 0 then  // Reduce S_CR proportionally
389	                   S_CR = S_CR * CC_R / FSE_R(f)
390	                   Set Timer for 2 RTTs
391	                 else
392	                   S_CR = S_CR + DELTA
393	                 end if
394	                end if

396	        (b)  It calculates the sum of all the priorities, S_P.

398	               S_P = 0
399	               for all flows i in FG do
400	                   S_P = S_P + P(i)
401	               end for

403	        (c)  It calculates the sending rates for all the flows in an FG
404	             and distributes them.

406	               for all flows i in FG do
407	                   FSE_R(i) = (P(i)*S_CR)/S_P
408	                   send FSE_R(i) to the flow i
409	               end for

411	6.  Application

413	   This section specifies how the FSE can be applied to specific
414	   congestion control mechanisms and makes general recommendations that
415	   facilitate applying the FSE to future congestion controls.

417	6.1.  NADA

419	   Network-Assisted Dynamic Adapation (NADA) [nada] is a congestion
420	   control scheme for rtcweb.  It calculates a reference rate R_n upon
421	   receiving an acknowledgment, and then, based on the reference rate,
422	   it calculates a video target rate R_v and a sending rate for the
423	   flows, R_s.

425	   When applying the FSE to NADA, the UPDATE function call described in
426	   Section 5.3 gives the FSE NADA's reference rate R_n.  The recommended
427	   algorithm for NADA is the Active FSE in Section 5.3.1.  In step 3
428	   (c), when the FSE_R(i) is "sent" to the flow i, this means updating
429	   R_v and R_s of flow i with the value of FSE_R(i).

431	   NADA simulation results are available from
432	   http://heim.ifi.uio.no/safiquli/coupled-cc/.  The next version of
433	   this document will refer to a technical report that will be made
434	   available at the same URL.

436	6.2.  General recommendations

438	   This section will provides general advice for applying the FSE to
439	   congestion control mechanisms.  TEMPORARY NOTE: Future versions of
440	   this document will contain a longer list.

442	   Receiver-side calculations:
443	         When receiver-side calculations make assumptions about the rate
444	         of the sender, the calculations need to be synchronized or the
445	         receiver needs to be updated accordingly.  This applies to TFRC
446	         [RFC5348], for example, where simulations showed somewhat less
447	         favorable results when using the FSE without a receiver-side
448	         change [fse].

450	7.  Acknowledgements

452	   This document has benefitted from discussions with and feedback from
453	   David Hayes, Mirja Kuehlewind, Andreas Petlund, David Ros (who also
454	   gave the FSE its name), Zaheduzzaman Sarker and Varun Singh.  The
455	   authors would like to thank Xiaoqing Zhu for helping with NADA.

457	   This work was partially funded by the European Community under its
458	   Seventh Framework Programme through the Reducing Internet Transport
459	   Latency (RITE) project (ICT-317700).

461	8.  IANA Considerations

463	   This memo includes no request to IANA.

465	9.  Security Considerations

467	   In scenarios where the architecture described in this document is
468	   applied across applications, various cheating possibilities arise:
469	   e.g., supporting wrong values for the calculated rate, the desired
470	   rate, or the priority of a flow.  In the worst case, such cheating
471	   could either prevent other flows from sending or make them send at a
472	   rate that is unreasonably large.  The end result would be unfair
473	   behavior at the network bottleneck, akin to what could be achieved
474	   with any UDP based application.  Hence, since this is no worse than
475	   UDP in general, there seems to be no significant harm in using this
476	   in the absence of UDP rate limiters.

478	   In the case of a single-user system, it should also be in the
479	   interest of any application programmer to give the user the best
480	   possible experience by using reasonable flow priorities or even
481	   letting the user choose them.  In a multi-user system, this interest
482	   may not be given, and one could imagine the worst case of an "arms
483	   race" situation, where applications end up setting their priorities
484	   to the maximum value.  If all applications do this, the end result is
485	   a fair allocation in which the priority mechanism is implicitly
486	   eliminated, and no major harm is done.

488	10.  References

490	10.1.  Normative References

492	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
493	              Requirement Levels", BCP 14, RFC 2119, March 1997.

495	   [RFC3124]  Balakrishnan, H. and S. Seshan, "The Congestion Manager",
496	              RFC 3124, June 2001.

498	   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
499	              Friendly Rate Control (TFRC): Protocol Specification",
500	              RFC 5348, September 2008.

502	10.2.  Informative References

504	   [fse]      Islam, S., Welzl, M., Gjessing, S., and N. Khademi,
505	              "Coupled Congestion Control for RTP Media", ACM SIGCOMM
506	              Capacity Sharing Workshop (CSWS 2014); extended version
507	              available as a technical report from
508	              http://safiquli.at.ifi.uio.no/paper/fse-tech-report.pdf ,
509	              2014.

511	   [nada]     Zhu, X., Pan, R., Ramalho, M., Mena, S., Ganzhorn, C.,
512	              Jones, P., and S. De Aronco, "NADA: A Unified Congestion
513	              Control Scheme for Real-Time Media",
514	              draft-ietf-rmcat-nada-00 (work in progress), April 2015.

516	   [rtcweb-rtp-usage]
517	              Perkins, C., Westerlund, M., and J. Ott, "Web Real-Time
518	              Communication (WebRTC): Media Transport and Use of RTP",
519	              draft-ietf-rtcweb-rtp-usage-18.txt (work in progress),
520	              October 2014.

522	   [rtcweb-usecases]
523	              Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real-
524	              Time Communication Use-cases and Requirements",
525	              draft-ietf-rtcweb-use-cases-and-requirements-14.txt (work
526	              in progress), February 2014.

528	   [sbd]      Hayes, D., Ferlin, S., and M. Welzl, "Shared Bottleneck
529	              Detection for Coupled Congestion Control for RTP Media",
530	              draft-ietf-rmcat-sbd-00.txt (work in progress), May 2015.

532	   [transport-multiplex]
533	              Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a
534	              Single Lower-Layer Transport",
535	              draft-westerlund-avtcore-transport-multiplexing-07.txt
536	              (work in progress), October 2013.

538	Appendix A.  Example algorithm - Passive FSE

540	   Active algorithms calculate the rates for all the flows in the FG and
541	   actively distribute them.  In a passive algorithm, UPDATE returns a
542	   rate that should be used instead of the rate that the congestion
543	   controller has determined.  This can make a passive algorithm easier
544	   to implement; however, when round-trip times of flows are unequal,
545	   shorter-RTT flows will update and react to the overall FSE state more
546	   often than longer-RTT flows, which can produce unwanted side effects.
547	   This problem is more significant when the congestion control
548	   convergence depends on the RTT.  While the passive algorithm works
549	   better for congestion controls with RTT-independent convergence, it
550	   can still produce oscillations on short time scales.  The algorithm
551	   described below is therefore considered as highly experimental.

553	   This passive version of the FSE stores the following information in
554	   addition to the variables described in Section 5.2:

556	   o  The desired rate DR.  This can be smaller than the calculated rate
557	      if the application feeding into the flow has less data to send
558	      than the congestion controller would allow.  In case of a bulk
559	      transfer, DR must be set to CC_R received from the flow's
560	      congestion module.

562	   The passive version of the FSE contains one static variable per FG
563	   called TLO (Total Leftover Rate -- used to let a flow 'take'
564	   bandwidth from application-limited or terminated flows) which is
565	   initialized to 0.  For the passive version, S_CR is limited to
566	   increase or decrease as conservatively as a flow's congestion
567	   controller decides in order to prohibit sudden rate jumps.

569	   (1)  When a flow f starts, it registers itself with SBD and the FSE.
570	        FSE_R and DR are initialized with the congestion controller's
571	        initial rate.  SBD will assign the correct FGI.  When a flow is
572	        assigned an FGI, it adds its FSE_R to S_CR.

574	   (2)  When a flow f stops, it sets its DR to 0 and sets P to -1.

576	   (3)  Every time the congestion controller of the flow f determines a
577	        new sending rate CC_R, assuming the flow's new desired rate
578	        new_DR to be "infinity" in case of a bulk data transfer with an
579	        unknown maximum rate, the flow calls UPDATE, which carries out
580	        the tasks listed below to derive the flow's new sending rate,
581	        Rate.  A flow's UPDATE function uses a few local (i.e. per-flow)
582	        temporary variables, which are all initialized to 0: DELTA,
583	        new_S_CR and S_P.

585	        (a)  For all the flows in its FG (including itself), it
586	             calculates the sum of all the calculated rates, new_S_CR.
587	             Then it calculates the difference between FSE_R(f) and
588	             CC_R, DELTA.

590	               for all flows i in FG do
591	                   new_S_CR = new_S_CR + FSE_R(i)
592	               end for
593	               DELTA =  CC_R - FSE_R(f)

595	        (b)  It updates S_CR, FSE_R(f) and DR(f).

597	               FSE_R(f) = CC_R
598	               if DELTA > 0 then  // the flow's rate has increased
599	                   S_CR = S_CR + DELTA
600	               else if DELTA < 0 then
601	                   S_CR = new_S_CR + DELTA
602	               end if
603	               DR(f) = min(new_DR,FSE_R(f))

605	        (c)  It calculates the leftover rate TLO, removes the terminated
606	             flows from the FSE and calculates the sum of all the
607	             priorities, S_P.

609	               for all flows i in FG do
610	                  if P(i)<0 then
611	                     delete flow
612	                  else
613	                     S_P = S_P + P(i)
614	                  end if
615	               end for
616	               if DR(f) < FSE_R(f) then
617	                  TLO = TLO + (P(f)/S_P) * S_CR - DR(f))
618	               end if

620	        (d)  It calculates the sending rate, Rate.

622	               Rate = min(new_DR, (P(f)*S_CR)/S_P + TLO)

624	               if Rate != new_DR and TLO > 0 then
625	                   TLO = 0  // f has 'taken' TLO
626	               end if

628	        (e)  It updates DR(f) and FSE_R(f) with Rate.

630	               if Rate > DR(f) then
631	                   DR(f) = Rate
632	               end if
633	               FSE_R(f)  = Rate

635	   The goals of the flow algorithm are to achieve prioritization,
636	   improve network utilization in the face of application-limited flows,
637	   and impose limits on the increase behavior such that the negative
638	   impact of multiple flows trying to increase their rate together is
639	   minimized.  It does that by assigning a flow a sending rate that may
640	   not be what the flow's congestion controller expected.  It therefore
641	   builds on the assumption that no significant inefficiencies arise
642	   from temporary application-limited behavior or from quickly jumping
643	   to a rate that is higher than the congestion controller intended.
644	   How problematic these issues really are depends on the controllers in
645	   use and requires careful per-controller experimentation.  The coupled
646	   congestion control mechanism described here also does not require all
647	   controllers to be equal; effects of heterogeneous controllers, or
648	   homogeneous controllers being in different states, are also subject
649	   to experimentation.

651	   This algorithm gives all the leftover rate of application-limited
652	   flows to the first flow that updates its sending rate, provided that
653	   this flow needs it all (otherwise, its own leftover rate can be taken
654	   by the next flow that updates its rate).  Other policies could be
655	   applied, e.g. to divide the leftover rate of a flow equally among all
656	   other flows in the FGI.

658	A.1.  Example operation (passive)

660	   In order to illustrate the operation of the passive coupled
661	   congestion control algorithm, this section presents a toy example of
662	   two flows that use it.  Let us assume that both flows traverse a
663	   common 10 Mbit/s bottleneck and use a simplistic congestion
664	   controller that starts out with 1 Mbit/s, increases its rate by 1
665	   Mbit/s in the absence of congestion and decreases it by 2 Mbit/s in
666	   the presence of congestion.  For simplicity, flows are assumed to
667	   always operate in a round-robin fashion.  Rate numbers below without
668	   units are assumed to be in Mbit/s.  For illustration purposes, the
669	   actual sending rate is also shown for every flow in FSE diagrams even
670	   though it is not really stored in the FSE.

672	   Flow #1 begins.  It is a bulk data transfer and considers itself to
673	   have top priority.  This is the FSE after the flow algorithm's step
674	   1:

676	   ----------------------------------------
677	   | # | FGI |  P  | FSE_R  |  DR  | Rate |
678	   |   |     |     |        |      |      |
679	   | 1 |  1  |  1  |   1    |   1  |   1  |
680	   ----------------------------------------
681	   S_CR = 1, TLO = 0
682	   Its congestion controller gradually increases its rate.  Eventually,
683	   at some point, the FSE should look like this:

685	   --------------------------------------
686	   | # | FGI |  P  |  FSE_R  |  DR  | Rate |
687	   |   |     |     |         |      |      |
688	   | 1 |  1  |  1  |   10    |  10  |  10  |
689	   -----------------------------------------
690	   S_CR = 10, TLO = 0

692	   Now another flow joins.  It is also a bulk data transfer, and has a
693	   lower priority (0.5):

695	   ----------------------------------------
696	   | # | FGI |   P   | FSE_R  |  DR  | Rate |
697	   |   |     |       |        |      |      |
698	   | 1 |  1  |   1   |   10   |  10  |  10  |
699	   | 2 |  1  |  0.5  |    1   |   1  |   1  |
700	   ------------------------------------------
701	   S_CR = 11, TLO = 0

703	   Now assume that the first flow updates its rate to 8, because the
704	   total sending rate of 11 exceeds the total capacity.  Let us take a
705	   closer look at what happens in step 3 of the flow algorithm.

707	   CC_R = 8. new_DR = infinity.
708	   3 a) new_S_CR = 11; DELTA = 8 - 10 = -2.
709	   3 b) FSE_Rf) = 8. DELTA is negative, hence S_CR = 9;
710	        DR(f) = 8.
711	   3 c) S_P = 1.5.
712	   3 d) new sending rate = min(infinity, 1/1.5 * 9 + 0) = 6.
713	   3 e) FSE_R(f) = 6.

715	   The resulting FSE looks as follows:
716	   ----------------------------------------
717	   | # | FGI |   P   |  FSE_R  |  DR  | Rate |
718	   |   |     |       |         |      |      |
719	   | 1 |  1  |   1   |    6    |   8  |   6  |
720	   | 2 |  1  |  0.5  |    1    |   1  |   1  |
721	   -------------------------------------------
722	   S_CR = 9, TLO = 0
723	   The effect is that flow #1 is sending with 6 Mbit/s instead of the 8
724	   Mbit/s that the congestion controller derived.  Let us now assume
725	   that flow #2 updates its rate.  Its congestion controller detects
726	   that the network is not fully saturated (the actual total sending
727	   rate is 6+1=7) and increases its rate.

729	   CC_R=2. new_DR = infinity.
730	   3 a) new_S_CR = 7; DELTA = 2 - 1 = 1.
731	   3 b) FSE_R(f) = 2. DELTA is positive, hence S_CR = 9 + 1 = 10;
732	        DR(f) = 2.
733	   3 c) S_P = 1.5.
734	   3 d) new sending rate = min(infinity, 0.5/1.5 * 10 + 0) = 3.33.
735	   3 e) DR(f) = FSE_R(f) = 3.33.

737	   The resulting FSE looks as follows:
738	   -------------------------------------------
739	   | # | FGI |   P   |  FSE_R  |  DR  | Rate |
740	   |   |     |       |         |      |      |
741	   | 1 |  1  |   1   |    6    |   8  |   6  |
742	   | 2 |  1  |  0.5  |   3.33  | 3.33 | 3.33 |
743	   -------------------------------------------
744	   S_CR = 10, TLO = 0

746	   The effect is that flow #2 is now sending with 3.33 Mbit/s, which is
747	   close to half of the rate of flow #1 and leads to a total utilization
748	   of 6(#1) + 3.33(#2) = 9.33 Mbit/s.  Flow #2's congestion controller
749	   has increased its rate faster than the controller actually expected.
750	   Now, flow #1 updates its rate.  Its congestion controller detects
751	   that the network is not fully saturated and increases its rate.
752	   Additionally, the application feeding into flow #1 limits the flow's
753	   sending rate to at most 2 Mbit/s.

755	   CC_R=7. new_DR=2.
756	   3 a) new_S_CR = 9.33; DELTA = 1.
757	   3 b) FSE_R(f) = 7, DELTA is positive, hence S_CR = 10 + 1 = 11;
758	        DR = min(2, 7) = 2.
759	   3 c) S_P = 1.5; DR(f) < FSE_R(f), hence TLO = 1/1.5 * 11 - 2 = 5.33.
760	   3 d) new sending rate = min(2, 1/1.5 * 11 + 5.33) = 2.
761	   3 e) FSE_R(f) = 2.

763	   The resulting FSE looks as follows:
764	   -------------------------------------------
765	   | # | FGI |   P   |  FSE_R  |  DR  | Rate |
766	   |   |     |       |         |      |      |
767	   | 1 |  1  |   1   |    2    |   2  |   2  |
768	   | 2 |  1  |  0.5  |   3.33  | 3.33 | 3.33 |
769	   -------------------------------------------
770	   S_CR = 11, TLO = 5.33

772	   Now, the total rate of the two flows is 2 + 3.33 = 5.33 Mbit/s, i.e.
773	   the network is significantly underutilized due to the limitation of
774	   flow #1.  Flow #2 updates its rate.  Its congestion controller
775	   detects that the network is not fully saturated and increases its
776	   rate.

778	   CC_R=4.33. new_DR = infinity.
779	   3 a) new_S_CR = 5.33; DELTA = 1.
780	   3 b) FSE_R(f) = 4.33. DELTA is positive, hence S_CR = 12;
781	        DR(f) = 4.33.
782	   3 c) S_P = 1.5.
783	   3 d) new sending rate: min(infinity, 0.5/1.5 * 12 + 5.33 ) = 9.33.
784	   3 e) FSE_R(f) = 9.33, DR(f) = 9.33.

786	   The resulting FSE looks as follows:
787	   -------------------------------------------
788	   | # | FGI |   P   |  FSE_R  |  DR  | Rate |
789	   |   |     |       |         |      |      |
790	   | 1 |  1  |   1   |    2    |   2  |   2  |
791	   | 2 |  1  |  0.5  |   9.33  | 9.33 | 9.33 |
792	   -------------------------------------------
793	   S_CR = 12, TLO = 0

795	   Now, the total rate of the two flows is 2 + 9.33 = 11.33 Mbit/s.
796	   Finally, flow #1 terminates.  It sets P to -1 and DR to 0.  Let us
797	   assume that it terminated late enough for flow #2 to still experience
798	   the network in a congested state, i.e. flow #2 decreases its rate in
799	   the next iteration.

801	   CC_R = 7.33. new_DR = infinity.
802	   3 a) new_S_CR = 11.33; DELTA = -2.
803	   3 b) FSE_R(f) = 7.33. DELTA is negative, hence S_CR = 9.33;
804	        DR(f) = 7.33.
805	   3 c) Flow 1 has P = -1, hence it is deleted from the FSE.
806	        S_P = 0.5.
807	   3 d) new sending rate: min(infinity, 0.5/0.5*9.33 + 0) = 9.33.
808	   3 e) FSE_R(f) = DR(f) = 9.33.

810	   The resulting FSE looks as follows:
811	   -------------------------------------------
812	   | # | FGI |   P   |  FSE_R  |  DR  | Rate |
813	   |   |     |       |         |      |      |
814	   | 2 |  1  |  0.5  |   9.33  | 9.33 | 9.33 |
815	   -------------------------------------------
816	   S_CR = 9.33, TLO = 0

818	Appendix B.  Change log

820	B.1.  Changes from -00 to -01

822	   o  Added change log.

824	   o  Updated the example algorithm and its operation.

826	B.2.  Changes from -01 to -02

828	   o  Included an active version of the algorithm which is simpler.

830	   o  Replaced "greedy flow" with "bulk data transfer" and "non-greedy"
831	      with "application-limited".

833	   o  Updated new_CR to CC_R, and CR to FSE_R for better understanding.

835	B.3.  Changes from -02 to -03

837	   o  Included an active conservative version of the algorithm which
838	      reduces queue growth and packet loss; added a reference to a
839	      technical report that shows these benefits with simulations.

841	   o  Moved the passive variant of the algorithm to appendix.

843	B.4.  Changes from -03 to -04

845	   o  Extended SBD section.

847	   o  Added a note about window-based controllers.

849	B.5.  Changes from -04 to -05

851	   o  Added a section about applying the FSE to specific congestion
852	      control algorithms, with a subsection specifying its use with
853	      NADA.

855	Authors' Addresses

857	   Michael Welzl
858	   University of Oslo
859	   PO Box 1080 Blindern
860	   Oslo,   N-0316
861	   Norway

863	   Phone: +47 22 85 24 20
864	   Email: michawe@ifi.uio.no

866	   Safiqul Islam
867	   University of Oslo
868	   PO Box 1080 Blindern
869	   Oslo,   N-0316
870	   Norway

872	   Phone: +47 22 84 08 37
873	   Email: safiquli@ifi.uio.no

875	   Stein Gjessing
876	   University of Oslo
877	   PO Box 1080 Blindern
878	   Oslo,   N-0316
879	   Norway

881	   Phone: +47 22 85 24 44
882	   Email: steing@ifi.uio.no