Internet Engineering Task Force                                C. Raiciu
Internet-Draft                                                M. Handley
Intended status: Experimental                                 D. Wischik
Expires: April 22, 2010                        University College London
                                                        October 19, 2009

              Coupled Multipath-Aware Congestion Control
                   draft-raiciu-mptcp-congestion-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 22, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.
Abstract

   Often endpoints are connected by multiple paths, but communications
   are usually restricted to a single path per socket.  Resource usage
   within the network would be more efficient were it possible for
   these multiple paths to be used concurrently.

   The use of multiple paths simultaneously, specifically within a
   Multipath TCP protocol, necessitates the development of new
   congestion control algorithms.  If existing algorithms such as TCP
   New Reno were run independently on each path, the multipath flow
   would take more than its fair share if there were a common
   bottleneck.  Further, it is desirable that a source with multiple
   paths available will transfer more traffic using the least
   congested of the paths, hence achieving resource pooling.  This
   would increase the overall utilization of the network and also its
   robustness to failure.

   This document presents a congestion control algorithm which couples
   the congestion control algorithms running on different subflows by
   linking their increase functions, and dynamically controls the
   overall aggressiveness of the multipath flow.  The result is a
   practical algorithm that is fair to TCP at bottlenecks while moving
   traffic away from congested links.

Table of Contents

   1.  Requirements Language
   2.  Introduction
   3.  Coupled Congestion Control Algorithm
   4.  Implementation Optimizations
     4.1.  Implementation Considerations when CWND is Expressed in
           Packets
   5.  Discussion
   6.  Security Considerations
   7.  Acknowledgements
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

2.  Introduction

   Multipath TCP (MPTCP, [I-D.ford-mptcp-multiaddressed]) is a set of
   extensions to regular TCP [RFC0793] that allow one TCP connection
   to be spread across multiple paths.  MPTCP distributes load through
   the creation of separate "subflows" across potentially disjoint
   paths.

   How should congestion control be performed for multipath TCP?
   First, each subflow must have its own congestion control state
   (i.e., cwnd) so that capacity on that path is matched by offered
   load.  The simplest way to achieve this goal is to run TCP New Reno
   congestion control [RFC5681] independently on each subflow.
   However, this solution is unsatisfactory, as it gives the multipath
   flow an unfair share when the paths taken by its different subflows
   share a common bottleneck.

   Bottleneck fairness is just one requirement multipath congestion
   control should meet.
   The following three goals capture the
   desirable properties of a practical multipath congestion control
   algorithm:

   o  Goal 1 (Improve Throughput): A multipath flow should perform at
      least as well as a single-path flow would on the best of the
      paths available to it.

   o  Goal 2 (Do no harm): A multipath flow should not take up more
      capacity on any one of its paths than if it were a single-path
      flow using only that route.  This guarantees it will not unduly
      harm other flows.

   o  Goal 3 (Balance congestion): A multipath flow should move as
      much traffic as possible off its most congested paths, subject
      to meeting the first two goals.

   Goals 1 and 2 together ensure fairness at the bottleneck.  Goal 3
   captures the concept of resource pooling [WISCHIK]: if each
   multipath flow sends more data through its least congested path,
   the traffic in the network will move away from congested areas.
   This improves robustness and overall throughput, among other
   things.  The way to achieve resource pooling is to effectively
   "couple" the congestion control loops for the different subflows.

   We propose an algorithm that couples only the additive increase
   function of the subflows, and uses unmodified TCP New Reno behavior
   in case of a drop.  The algorithm relies on the traditional TCP
   mechanisms to detect drops, to retransmit data, and so on.

   Detecting shared bottlenecks reliably is quite difficult, so our
   proposal always assumes there is a shared bottleneck and throttles
   the aggressiveness of the multipath flow such that its total
   throughput is no more than that of a regular TCP running on the
   best path available.

   It is intended that the algorithm presented here can be applied to
   other multipath transport protocols, such as alternative multipath
   extensions to TCP, or indeed any other congestion-aware transport
   protocols.  However, for the purposes of example this document
   will, where appropriate, refer to the MPTCP protocol.

   It is foreseeable that different congestion controllers will be
   implemented for multipath transport, each aiming to achieve
   different properties in the resource pooling/fairness/stability
   design space.  In particular, solutions that give better resource
   pooling may be proposed.  This algorithm is conservative from that
   point of view, sacrificing resource pooling for stability.

3.  Coupled Congestion Control Algorithm

   The algorithm we present applies only to the congestion avoidance
   state.  The slow start, fast retransmit, and fast recovery
   algorithms are the same as in [RFC5681].

   Let cwnd_i be the congestion window on subflow i, and assume there
   is always data to send.  Let tot_cwnd be the sum of the congestion
   windows of all subflows in the connection.  Let p_i, rtt_i, and
   mss_i be the drop probability, round trip time, and maximum segment
   size on subflow i.

   We assume throughout this document that the congestion window is
   maintained in bytes, unless otherwise specified.  We briefly
   describe the algorithm for packet-based implementations of cwnd in
   Section 4.1.

   Our proposed "Linked Increases" algorithm is:

   o  For each ack received on subflow i, increase cwnd_i by
      min(ceil(alfa * bytes_acked * mss_i / tot_cwnd),
      bytes_acked * mss_i / cwnd_i).

   o  For each drop event on subflow i, set cwnd_i to
      max(cwnd_i/2, 2).  A drop event is one or more packet drops
      experienced by a subflow within the same round trip time.
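   The following C sketch illustrates these two rules; it is a minimal
   illustration under stated assumptions, not a normative
   implementation.  The subflow and connection structures, all field
   names, and the routine names linked_increase and linked_decrease
   are hypothetical; alfa (the aggressiveness parameter defined later
   in this section) is held in floating point purely for clarity,
   since Section 4 describes how to keep the per-ack path in integer
   arithmetic; and the floor of 2 in the decrease rule is read as two
   segments, since windows here are kept in bytes.

   #include <math.h>
   #include <stdint.h>

   /* Hypothetical per-subflow and per-connection state.  All
    * windows are maintained in bytes. */
   struct subflow {
       uint64_t cwnd;      /* congestion window of subflow i */
       uint64_t mss;       /* maximum segment size of subflow i */
   };

   struct connection {
       uint64_t tot_cwnd;  /* sum of the subflows' windows */
       double   alfa;      /* aggressiveness parameter */
   };

   /* Per-ack processing in congestion avoidance: apply the minimum
    * of the coupled multipath increase and the increase TCP New
    * Reno would apply in the same situation, so the subflow is
    * never more aggressive than a regular TCP flow. */
   static void linked_increase(struct connection *c,
                               struct subflow *sf,
                               uint64_t bytes_acked)
   {
       uint64_t mp_incr = (uint64_t)
           ceil(c->alfa * (double)bytes_acked * (double)sf->mss /
                (double)c->tot_cwnd);
       uint64_t tcp_incr = bytes_acked * sf->mss / sf->cwnd;
       uint64_t incr = (mp_incr < tcp_incr) ? mp_incr : tcp_incr;

       sf->cwnd    += incr;
       c->tot_cwnd += incr;  /* keep the per-connection sum current */
   }

   /* Per-drop-event processing: unmodified TCP New Reno halving,
    * with the window never falling below two segments (assumes
    * cwnd_i is already at least two segments on entry). */
   static void linked_decrease(struct connection *c,
                               struct subflow *sf)
   {
       uint64_t new_cwnd = sf->cwnd / 2;

       if (new_cwnd < 2 * sf->mss)
           new_cwnd = 2 * sf->mss;
       c->tot_cwnd -= sf->cwnd - new_cwnd;
       sf->cwnd = new_cwnd;
   }

   Note that both routines adjust tot_cwnd in place, matching the
   per-connection maintenance rule described in Section 4.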
   The decrease function is the same as in TCP New Reno, so we will
   not discuss it further in the remainder of this document.

   The increase formula takes the minimum of the computed increase for
   the multipath subflow (the first argument to min) and the increase
   TCP would get in the same scenario (the second argument).  In this
   way, we ensure that a multipath subflow is NEVER more aggressive
   than a TCP flow, hence achieving Goal 2 (do no harm).

   ceil returns the smallest integer greater than or equal to its
   real-valued input.  Rounding up the computed value ensures that the
   increase is non-zero as long as bytes_acked is non-zero, and that
   multipath does not suffer more from rounding errors than TCP does
   (which it otherwise might, as tot_cwnd is always greater than the
   individual cwnds).  We discuss how to implement this formula in
   practice in the next section.

   We assume appropriate byte counting (ABC, [RFC3465]) is used, hence
   the bytes_acked variable records the number of bytes newly
   acknowledged.  If ABC is not used, bytes_acked SHOULD be set to
   mss_i.

   When computing tot_cwnd, it is an easy mistake to simply sum up
   cwnd_i across all subflows: when a subflow is in fast retransmit,
   its cwnd is typically inflated and no longer represents the real
   congestion window.  The correct behavior is to use the ssthresh
   value instead for subflows in fast retransmit when computing
   tot_cwnd.

   alfa is a parameter of the algorithm that describes the
   aggressiveness of the multipath flow.  To meet Goal 1 (improve
   throughput), the value of alfa is chosen such that the aggregate
   throughput of the multipath flow equals the rate a TCP flow would
   get if it ran on the best path.

   The total throughput of a multipath flow depends on the value of
   alfa and on the drop probabilities, maximum segment sizes, and
   round trip times of its paths.  Since we require that the total
   throughput be no worse than the throughput a single TCP would get
   on the fastest path, it is impossible to choose a priori a single
   value of alfa that achieves the desired throughput on every
   occasion.  Hence, alfa must be computed for each multipath flow,
   based on the observed properties of the paths.

   The formula to compute alfa is:

                                       2
                        cwnd_i * mss_i
                  max   ---------------
                   i             2
                            rtt_i
   alfa = tot_cwnd * ---------------------------
                     /      cwnd_i * mss_i  \ 2
                     |  sum ---------------  |
                     \   i       rtt_i      /

   The formula is derived by equating the rate of the multipath flow
   with the rate of a TCP running on the fastest path, and solving for
   alfa.
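   The C function below is a direct transcription of this formula; it
   is a sketch only.  The path structure and the name compute_alfa are
   illustrative, floating point is acceptable here because the
   computation is performed rarely (see Section 4), and rtt may be
   either the instantaneous or the averaged round trip time (Section 4
   recommends rtt_avg_i).

   /* Hypothetical flat view of a connection's paths; in a real
    * stack these values would come from per-subflow state. */
   struct path {
       double cwnd;  /* congestion window in bytes (use ssthresh
                      * if the subflow is in fast retransmit) */
       double mss;   /* maximum segment size, bytes */
       double rtt;   /* round trip time (or rtt_avg_i, Section 4) */
   };

   /* Compute alfa for a connection with n subflows:
    *
    *   alfa = tot_cwnd * max_i(cwnd_i * mss_i^2 / rtt_i^2)
    *                   / (sum_i cwnd_i * mss_i / rtt_i)^2
    */
   static double compute_alfa(const struct path *p, int n)
   {
       double tot_cwnd = 0.0, best_num = 0.0, denom_sum = 0.0;
       int i;

       for (i = 0; i < n; i++) {
           double term = p[i].cwnd * p[i].mss / p[i].rtt;
           double num  = term * p[i].mss / p[i].rtt;

           tot_cwnd  += p[i].cwnd;
           denom_sum += term;      /* sum_i cwnd_i*mss_i/rtt_i */
           if (num > best_num)     /* max_i cwnd_i*mss_i^2/rtt_i^2 */
               best_num = num;
       }
       return tot_cwnd * best_num / (denom_sum * denom_sum);
   }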
4.  Implementation Optimizations

   It is possible to implement our algorithm by calculating tot_cwnd
   on each ack; however, this would be costly, especially when the
   number of subflows is large.  To avoid this overhead, the
   implementation SHOULD maintain tot_cwnd per connection, and MUST
   update its value when the individual subflows' windows are updated.
   Updating only requires one more addition or subtraction operation
   compared to the regular, per-subflow congestion control code, so
   its performance impact should be minimal.

   Computing alfa per ack is also costly.  Simply maintaining alfa per
   connection does not help, as its value would still need to be
   updated per ack.  If we assume RTTs are constant, it is sufficient
   to compute alfa once per drop, as alfa does not change between
   drops (the insight here is that cwnd_i/cwnd_j is constant as long
   as both windows increase).

   Experimental results show that even if round trip times are not
   constant, using the average round trip time instead of the
   instantaneous round trip time gives good precision when computing
   alfa.  Hence, we recommend that alfa SHOULD be computed once per
   drop according to the formula above, replacing rtt_i with
   rtt_avg_i.

   rtt_avg_i is computed by sampling srtt_i whenever the window can
   accommodate one more packet, i.e., when cwnd / mss <
   (cwnd + increase) / mss.  The samples are averaged once per
   sawtooth into rtt_avg_i.  This sampling ensures that there is no
   sampling bias towards larger windows.

   Given tot_cwnd and alfa, the congestion control algorithm is run
   for each subflow independently.  The window increase per ack is
   alfa * mss_i * bytes_acked / tot_cwnd.  Compared to traditional
   increase code, this would require floating point operations to be
   performed on each ack.

   To avoid such costly operations, implementors SHOULD add to each
   subflow a state variable incr_i = mss_i * mss_i * alfa / tot_cwnd.
   When alfa or tot_cwnd changes, incr_i MUST be updated.  Since such
   changes are rare (once per sawtooth), the performance impact should
   be minimal.  With incr_i properly set, the increase per ack becomes
   bytes_acked * incr_i / mss_i.  As this requires only integer
   multiplication and division, the overhead is comparable to that of
   existing TCP implementations.
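   A sketch of this optimization in C follows; structure and field
   names are again hypothetical.  incr_i is computed in floating
   point, but only when alfa or tot_cwnd changes; the result is
   rounded up here (an assumption, mirroring the ceil of Section 3)
   so that the cached increment never rounds down to zero.  The
   per-ack path is pure integer arithmetic.

   #include <math.h>
   #include <stdint.h>

   struct subflow {
       uint64_t cwnd;  /* congestion window, bytes */
       uint64_t mss;   /* maximum segment size, bytes */
       uint64_t incr;  /* cached incr_i, bytes */
   };

   struct connection {
       uint64_t tot_cwnd;  /* maintained per connection, as above */
       double   alfa;      /* recomputed once per drop, as above */
   };

   /* Refresh the cached increment
    * incr_i = mss_i * mss_i * alfa / tot_cwnd.  MUST be called
    * whenever alfa or tot_cwnd changes, which happens only about
    * once per sawtooth. */
   static void update_incr(const struct connection *c,
                           struct subflow *sf)
   {
       sf->incr = (uint64_t)ceil((double)(sf->mss * sf->mss) *
                                 c->alfa / (double)c->tot_cwnd);
   }

   /* Per-ack fast path, bytes_acked * incr_i / mss_i: one integer
    * multiplication and one division, an overhead comparable to
    * existing TCP increase code. */
   static uint64_t increase_per_ack(const struct subflow *sf,
                                    uint64_t bytes_acked)
   {
       return bytes_acked * sf->incr / sf->mss;
   }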
4.1.  Implementation Considerations when CWND is Expressed in Packets

   When the congestion control algorithm maintains cwnd in packets
   rather than bytes, the code to compute tot_cwnd remains unchanged.

   To compute rtt_avg_i, srtt is sampled whenever cwnd_i is
   incremented.

   To compute the increase when an ack is received, the implementation
   for multipath congestion control is a simple extension of the TCP
   New Reno code.  In TCP New Reno, cwnd_cnt is an additional state
   variable that records the number of packets acked since the last
   cwnd increment; cwnd is incremented again only when cwnd_cnt >
   cwnd.

   In the multipath case, cwnd_cnt_i is maintained for each subflow as
   above, and cwnd_i is increased by 1 when cwnd_cnt_i > tot_cwnd /
   alfa.  For example, with alfa = 0.5 and tot_cwnd = 20 packets, the
   threshold is 40, so cwnd_i grows by one packet per 40 packets
   acknowledged on subflow i.  To avoid costly floating point
   operations, the right-hand side of the inequality can be stored as
   a per-connection state variable that is updated only when tot_cwnd
   or alfa changes.

5.  Discussion

   To achieve perfect resource pooling, one must couple both the
   increase and decrease of congestion windows across subflows, as in
   [KELLY].  Yet this tends to exhibit "flappiness": when the paths
   have similar levels of congestion, the congestion controller will
   tend to allocate all of the window to one random subflow and zero
   window to the other subflows, performing random flips between these
   stable points.  This seems undesirable in general, and is
   particularly bad when the achieved rates depend on the RTT (as in
   the current Internet): in such a case, the resulting rate will
   fluctuate unpredictably depending on which state the controller is
   in, hence violating Goal 1.

   By coupling only the increases, our proposal removes flappiness,
   but it also reduces the extent of resource pooling the protocol
   achieves.  The algorithm will allocate window to the subflows such
   that p_i * cwnd_i is constant for all i.  Thus, when the drop
   probabilities of the subflows are equal, each subflow will get an
   equal window, removing flappiness.  When they are different,
   progressively more window will be allocated to the subflow with the
   lower drop probability.  In contrast, perfect resource pooling
   requires that all of the window be allocated to the path with the
   lowest drop probability.

6.  Security Considerations

   None.

   Detailed security analysis for the Multipath TCP protocol itself is
   included in [I-D.ford-mptcp-multiaddressed] and [REF].

7.  Acknowledgements

   The authors are supported by Trilogy
   (http://www.trilogy-project.org), a research project (ICT-216372)
   partially funded by the European Community under its Seventh
   Framework Program.  The views expressed here are those of the
   author(s) only.  The European Commission is not liable for any use
   that may be made of the information in this document.

8.  IANA Considerations

   None.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [I-D.ford-mptcp-multiaddressed]
              Ford, A., Raiciu, C., Handley, M., and S. Barre, "TCP
              Extensions for Multipath Operation with Multiple
              Addresses", draft-ford-mptcp-multiaddressed-01 (work in
              progress), July 2009.

   [KELLY]    Kelly, F. and T. Voice, "Stability of end-to-end
              algorithms for joint routing and rate control", ACM
              SIGCOMM CCR vol. 35 num. 2, pp. 5-12, 2005.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC3465]  Allman, M., "TCP Congestion Control with Appropriate
              Byte Counting (ABC)", RFC 3465, February 2003.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [WISCHIK]  Wischik, D., Handley, M., and M. Bagnulo Braun, "The
              Resource Pooling Principle", ACM SIGCOMM CCR vol. 38
              num. 5, pp. 47-52, October 2008.

Authors' Addresses

   Costin Raiciu
   University College London
   Gower Street
   London WC1E 6BT
   UK

   Email: c.raiciu@cs.ucl.ac.uk

   Mark Handley
   University College London
   Gower Street
   London WC1E 6BT
   UK

   Email: m.handley@cs.ucl.ac.uk

   Damon Wischik
   University College London
   Gower Street
   London WC1E 6BT
   UK

   Email: d.wischik@cs.ucl.ac.uk