Internet Engineering Task Force                                C. Raiciu
Internet-Draft                                                M. Handley
Intended status: Experimental                                 D. Wischik
Expires: April 22, 2010                        University College London
                                                        October 19, 2009

              Coupled Multipath-Aware Congestion Control
                   draft-raiciu-mptcp-congestion-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 22, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.
Abstract

   Often endpoints are connected by multiple paths, but communications
   are usually restricted to a single path per socket.  Resource usage
   within the network would be more efficient were it possible for
   these multiple paths to be used concurrently.

   The use of multiple paths simultaneously, specifically within a
   Multipath TCP protocol, necessitates the development of new
   congestion control algorithms.  If existing algorithms such as TCP
   New Reno were run independently on each path, the multipath flow
   would take more than its fair share if there were a common
   bottleneck.  Further, it is desirable that a source with multiple
   paths available will transfer more traffic using the least
   congested of the paths, hence achieving resource pooling.  This
   would increase the overall utilization of the network and also its
   robustness to failure.

   This document presents a congestion control algorithm which couples
   the congestion control algorithms running on different subflows by
   linking their increase functions, and dynamically controls the
   overall aggressiveness of the multipath flow.  The result is a
   practical algorithm that is fair to TCP at bottlenecks while moving
   traffic away from congested links.

Table of Contents

   1.  Requirements Language
   2.  Introduction
   3.  Coupled Congestion Control Algorithm
   4.  Implementation Optimizations
     4.1.  Implementation Considerations when CWND is Expressed in
           Packets
   5.  Discussion
   6.  Security Considerations
   7.  Acknowledgements
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

2.  Introduction

   Multipath TCP (MPTCP, [I-D.ford-mptcp-multiaddressed]) is a set of
   extensions to regular TCP [RFC0793] that allow one TCP connection
   to be spread across multiple paths.  MPTCP distributes load through
   the creation of separate "subflows" across potentially disjoint
   paths.

   How should congestion control be performed for multipath TCP?
   First, each subflow must have its own congestion control state
   (i.e., cwnd) so that capacity on that path is matched by offered
   load.  The simplest way to achieve this goal is to run TCP New Reno
   congestion control [RFC5681] independently on each subflow.
   However, this solution is unsatisfactory, as it gives the multipath
   flow an unfair share when the paths taken by its different subflows
   share a common bottleneck.

   Bottleneck fairness is just one requirement multipath congestion
   control should meet.
   The following three goals capture the
   desirable properties of a practical multipath congestion control
   algorithm:

   o  Goal 1 (Improve Throughput): A multipath flow should perform at
      least as well as a single-path flow would on the best of the
      paths available to it.

   o  Goal 2 (Do no harm): A multipath flow should not take up more
      capacity on any one of its paths than if it were a single-path
      flow using only that route.  This guarantees it will not unduly
      harm other flows.

   o  Goal 3 (Balance congestion): A multipath flow should move as
      much traffic as possible off its most congested paths, subject
      to meeting the first two goals.

   Goals 1 and 2 together ensure fairness at the bottleneck.  Goal 3
   captures the concept of resource pooling [WISCHIK]: if each
   multipath flow sends more data through its least congested path,
   the traffic in the network will move away from congested areas.
   This improves robustness and overall throughput, among other
   things.  The way to achieve resource pooling is to effectively
   "couple" the congestion control loops for the different subflows.

   We propose an algorithm that couples only the additive increase
   function of the subflows, and uses unmodified TCP New Reno behavior
   in case of a drop.  The algorithm relies on the traditional TCP
   mechanisms to detect drops, to retransmit data, and so on.

   Detecting shared bottlenecks reliably is quite difficult, so our
   proposal always assumes there is a shared bottleneck and throttles
   the aggressiveness of the multipath flow such that its total
   throughput is no more than that of a regular TCP running on the
   best path available.

   It is intended that the algorithm presented here can be applied to
   other multipath transport protocols, such as alternative multipath
   extensions to TCP, or indeed any other congestion-aware transport
   protocols.  However, for the purposes of example this document
   will, where appropriate, refer to the MPTCP protocol.

   It is foreseeable that different congestion controllers will be
   implemented for multipath transport, each aiming to achieve
   different properties in the resource pooling/fairness/stability
   design space.  In particular, solutions that give better resource
   pooling may be proposed.  This algorithm is conservative from that
   point of view, sacrificing resource pooling for stability.

3.  Coupled Congestion Control Algorithm

   The algorithm we present applies only to the congestion avoidance
   state.  The slow start, fast retransmit, and fast recovery
   algorithms are the same as in [RFC5681].

   Let cwnd_i be the congestion window on subflow i, and assume there
   is always data to send.  Let tot_cwnd be the sum of the congestion
   windows of all subflows in the connection.  Let p_i, rtt_i, and
   mss_i be the drop probability, round trip time, and maximum segment
   size on subflow i.

   We assume throughout this document that the congestion window is
   maintained in bytes, unless otherwise specified.  We briefly
   describe the algorithm for packet-based implementations of cwnd in
   Section 4.1.

   Our proposed "Linked Increases" algorithm is:

   o  For each ack received on subflow i, increase cwnd_i by
      min(ceil(alfa * bytes_acked * mss_i / tot_cwnd),
      bytes_acked * mss_i / cwnd_i).

   o  For each drop event on subflow i, set cwnd_i to
      max(cwnd_i/2, 2).  A drop event is one or more packet drops
      experienced by a subflow within the same round trip time.
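   The following C sketch illustrates these two rules; it is a minimal
   illustration under stated assumptions, not a normative
   implementation.  The subflow and connection structures, all field
   names, and the routine names linked_increase and linked_decrease
   are hypothetical; alfa (the aggressiveness parameter defined later
   in this section) is held in floating point purely for clarity,
   since Section 4 describes how to keep the per-ack path in integer
   arithmetic; and the floor of 2 in the decrease rule is read as two
   segments, since windows here are kept in bytes.

   #include <math.h>
   #include <stdint.h>

   /* Hypothetical per-subflow and per-connection state.  All
    * windows are maintained in bytes. */
   struct subflow {
       uint64_t cwnd;      /* congestion window of subflow i */
       uint64_t mss;       /* maximum segment size of subflow i */
   };

   struct connection {
       uint64_t tot_cwnd;  /* sum of the subflows' windows */
       double   alfa;      /* aggressiveness parameter */
   };

   /* Per-ack processing in congestion avoidance: apply the minimum
    * of the coupled multipath increase and the increase TCP New
    * Reno would apply in the same situation, so the subflow is
    * never more aggressive than a regular TCP flow. */
   static void linked_increase(struct connection *c,
                               struct subflow *sf,
                               uint64_t bytes_acked)
   {
       uint64_t mp_incr = (uint64_t)
           ceil(c->alfa * (double)bytes_acked * (double)sf->mss /
                (double)c->tot_cwnd);
       uint64_t tcp_incr = bytes_acked * sf->mss / sf->cwnd;
       uint64_t incr = (mp_incr < tcp_incr) ? mp_incr : tcp_incr;

       sf->cwnd    += incr;
       c->tot_cwnd += incr;  /* keep the per-connection sum current */
   }

   /* Per-drop-event processing: unmodified TCP New Reno halving,
    * with the window never falling below two segments (assumes
    * cwnd_i is already at least two segments on entry). */
   static void linked_decrease(struct connection *c,
                               struct subflow *sf)
   {
       uint64_t new_cwnd = sf->cwnd / 2;

       if (new_cwnd < 2 * sf->mss)
           new_cwnd = 2 * sf->mss;
       c->tot_cwnd -= sf->cwnd - new_cwnd;
       sf->cwnd = new_cwnd;
   }

   Note that both routines adjust tot_cwnd in place, matching the
   per-connection maintenance rule described in Section 4.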
   The decrease function is the same as in TCP New Reno, so we will
   not discuss it further in the remainder of this document.

   The increase formula takes the minimum of the computed increase for
   the multipath subflow (the first argument to min) and the increase
   TCP would get in the same scenario (the second argument).  In this
   way, we ensure that a multipath subflow is NEVER more aggressive
   than a TCP flow, hence achieving Goal 2 (do no harm).

   ceil returns the smallest integer greater than or equal to its
   real-valued input.  Rounding up the computed value ensures that the
   increase is non-zero as long as bytes_acked is non-zero, and that
   multipath does not suffer more from rounding errors than TCP does
   (which it otherwise might, as tot_cwnd is always greater than the
   individual cwnds).  We discuss how to implement this formula in
   practice in the next section.

   We assume appropriate byte counting (ABC, [RFC3465]) is used, hence
   the bytes_acked variable records the number of bytes newly
   acknowledged.  If ABC is not used, bytes_acked SHOULD be set to
   mss_i.

   When computing tot_cwnd, it is an easy mistake to simply sum up
   cwnd_i across all subflows: when a subflow is in fast retransmit,
   its cwnd is typically inflated and no longer represents the real
   congestion window.  The correct behavior is to use the ssthresh
   value instead for subflows in fast retransmit when computing
   tot_cwnd.

   alfa is a parameter of the algorithm that describes the
   aggressiveness of the multipath flow.  To meet Goal 1 (improve
   throughput), the value of alfa is chosen such that the aggregate
   throughput of the multipath flow equals the rate a TCP flow would
   get if it ran on the best path.

   The total throughput of a multipath flow depends on the value of
   alfa and on the drop probabilities, maximum segment sizes, and
   round trip times of its paths.  Since we require that the total
   throughput be no worse than the throughput a single TCP would get
   on the fastest path, it is impossible to choose a priori a single
   value of alfa that achieves the desired throughput on every
   occasion.  Hence, alfa must be computed for each multipath flow,
   based on the observed properties of the paths.

   The formula to compute alfa is:

                                       2
                        cwnd_i * mss_i
                  max   ---------------
                   i             2
                            rtt_i
   alfa = tot_cwnd * ---------------------------
                     /      cwnd_i * mss_i  \ 2
                     |  sum ---------------  |
                     \   i       rtt_i      /

   The formula is derived by equating the rate of the multipath flow
   with the rate of a TCP running on the fastest path, and solving for
   alfa.
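   The C function below is a direct transcription of this formula; it
   is a sketch only.  The path structure and the name compute_alfa are
   illustrative, floating point is acceptable here because the
   computation is performed rarely (see Section 4), and rtt may be
   either the instantaneous or the averaged round trip time (Section 4
   recommends rtt_avg_i).

   /* Hypothetical flat view of a connection's paths; in a real
    * stack these values would come from per-subflow state. */
   struct path {
       double cwnd;  /* congestion window in bytes (use ssthresh
                      * if the subflow is in fast retransmit) */
       double mss;   /* maximum segment size, bytes */
       double rtt;   /* round trip time (or rtt_avg_i, Section 4) */
   };

   /* Compute alfa for a connection with n subflows:
    *
    *   alfa = tot_cwnd * max_i(cwnd_i * mss_i^2 / rtt_i^2)
    *                   / (sum_i cwnd_i * mss_i / rtt_i)^2
    */
   static double compute_alfa(const struct path *p, int n)
   {
       double tot_cwnd = 0.0, best_num = 0.0, denom_sum = 0.0;
       int i;

       for (i = 0; i < n; i++) {
           double term = p[i].cwnd * p[i].mss / p[i].rtt;
           double num  = term * p[i].mss / p[i].rtt;

           tot_cwnd  += p[i].cwnd;
           denom_sum += term;      /* sum_i cwnd_i*mss_i/rtt_i */
           if (num > best_num)     /* max_i cwnd_i*mss_i^2/rtt_i^2 */
               best_num = num;
       }
       return tot_cwnd * best_num / (denom_sum * denom_sum);
   }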
4.  Implementation Optimizations

   It is possible to implement our algorithm by calculating tot_cwnd
   on each ack; however, this would be costly, especially when the
   number of subflows is large.  To avoid this overhead, the
   implementation SHOULD maintain tot_cwnd per connection, and MUST
   update its value when the individual subflows' windows are updated.
   Updating only requires one more addition or subtraction operation
   compared to the regular, per-subflow congestion control code, so
   its performance impact should be minimal.

   Computing alfa per ack is also costly.  Simply maintaining alfa per
   connection does not help, as its value would still need to be
   updated per ack.  If we assume RTTs are constant, it is sufficient
   to compute alfa once per drop, as alfa does not change between
   drops (the insight here is that cwnd_i/cwnd_j is constant as long
   as both windows increase).

   Experimental results show that even if round trip times are not
   constant, using the average round trip time instead of the
   instantaneous round trip time gives good precision when computing
   alfa.  Hence, we recommend that alfa SHOULD be computed once per
   drop according to the formula above, replacing rtt_i with
   rtt_avg_i.

   rtt_avg_i is computed by sampling srtt_i whenever the window can
   accommodate one more packet, i.e., when cwnd / mss <
   (cwnd + increase) / mss.  The samples are averaged once per
   sawtooth into rtt_avg_i.  This sampling ensures that there is no
   sampling bias towards larger windows.

   Given tot_cwnd and alfa, the congestion control algorithm is run
   for each subflow independently.  The window increase per ack is
   alfa * mss_i * bytes_acked / tot_cwnd.  Compared to traditional
   increase code, this would require floating point operations to be
   performed on each ack.

   To avoid such costly operations, implementors SHOULD add to each
   subflow a state variable incr_i = mss_i * mss_i * alfa / tot_cwnd.
   When alfa or tot_cwnd changes, incr_i MUST be updated.  Since such
   changes are rare (once per sawtooth), the performance impact should
   be minimal.  With incr_i properly set, the increase per ack becomes
   bytes_acked * incr_i / mss_i.  As this requires only integer
   multiplication and division, the overhead is comparable to that of
   existing TCP implementations.
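   A sketch of this optimization in C follows; structure and field
   names are again hypothetical.  incr_i is computed in floating
   point, but only when alfa or tot_cwnd changes; the result is
   rounded up here (an assumption, mirroring the ceil of Section 3)
   so that the cached increment never rounds down to zero.  The
   per-ack path is pure integer arithmetic.

   #include <math.h>
   #include <stdint.h>

   struct subflow {
       uint64_t cwnd;  /* congestion window, bytes */
       uint64_t mss;   /* maximum segment size, bytes */
       uint64_t incr;  /* cached incr_i, bytes */
   };

   struct connection {
       uint64_t tot_cwnd;  /* maintained per connection, as above */
       double   alfa;      /* recomputed once per drop, as above */
   };

   /* Refresh the cached increment
    * incr_i = mss_i * mss_i * alfa / tot_cwnd.  MUST be called
    * whenever alfa or tot_cwnd changes, which happens only about
    * once per sawtooth. */
   static void update_incr(const struct connection *c,
                           struct subflow *sf)
   {
       sf->incr = (uint64_t)ceil((double)(sf->mss * sf->mss) *
                                 c->alfa / (double)c->tot_cwnd);
   }

   /* Per-ack fast path, bytes_acked * incr_i / mss_i: one integer
    * multiplication and one division, an overhead comparable to
    * existing TCP increase code. */
   static uint64_t increase_per_ack(const struct subflow *sf,
                                    uint64_t bytes_acked)
   {
       return bytes_acked * sf->incr / sf->mss;
   }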
4.1.  Implementation Considerations when CWND is Expressed in Packets

   When the congestion control algorithm maintains cwnd in packets
   rather than bytes, the code to compute tot_cwnd remains unchanged.

   To compute rtt_avg_i, srtt is sampled whenever cwnd_i is
   incremented.

   To compute the increase when an ack is received, the implementation
   for multipath congestion control is a simple extension of the TCP
   New Reno code.  In TCP New Reno, cwnd_cnt is an additional state
   variable that records the number of packets acked since the last
   cwnd increment; cwnd is incremented again only when cwnd_cnt >
   cwnd.

   In the multipath case, cwnd_cnt_i is maintained for each subflow as
   above, and cwnd_i is increased by 1 when cwnd_cnt_i > tot_cwnd /
   alfa.  For example, with alfa = 0.5 and tot_cwnd = 20 packets, the
   threshold is 40, so cwnd_i grows by one packet per 40 packets
   acknowledged on subflow i.  To avoid costly floating point
   operations, the right-hand side of the inequality can be stored as
   a per-connection state variable that is updated only when tot_cwnd
   or alfa changes.

5.  Discussion

   To achieve perfect resource pooling, one must couple both the
   increase and decrease of congestion windows across subflows, as in
   [KELLY].  Yet this tends to exhibit "flappiness": when the paths
   have similar levels of congestion, the congestion controller will
   tend to allocate all of the window to one random subflow and zero
   window to the other subflows, performing random flips between these
   stable points.  This seems undesirable in general, and is
   particularly bad when the achieved rates depend on the RTT (as in
   the current Internet): in such a case, the resulting rate will
   fluctuate unpredictably depending on which state the controller is
   in, hence violating Goal 1.

   By coupling only the increases, our proposal removes flappiness,
   but it also reduces the extent of resource pooling the protocol
   achieves.  The algorithm will allocate window to the subflows such
   that p_i * cwnd_i is constant for all i.  Thus, when the drop
   probabilities of the subflows are equal, each subflow will get an
   equal window, removing flappiness.  When they are different,
   progressively more window will be allocated to the subflow with the
   lower drop probability.  In contrast, perfect resource pooling
   requires that all of the window be allocated to the path with the
   lowest drop probability.

6.  Security Considerations

   None.

   Detailed security analysis for the Multipath TCP protocol itself is
   included in [I-D.ford-mptcp-multiaddressed] and [REF].

7.  Acknowledgements

   The authors are supported by Trilogy
   (http://www.trilogy-project.org), a research project (ICT-216372)
   partially funded by the European Community under its Seventh
   Framework Program.  The views expressed here are those of the
   author(s) only.  The European Commission is not liable for any use
   that may be made of the information in this document.

8.  IANA Considerations

   None.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [I-D.ford-mptcp-multiaddressed]
              Ford, A., Raiciu, C., Handley, M., and S. Barre, "TCP
              Extensions for Multipath Operation with Multiple
              Addresses", draft-ford-mptcp-multiaddressed-01 (work in
              progress), July 2009.

   [KELLY]    Kelly, F. and T. Voice, "Stability of end-to-end
              algorithms for joint routing and rate control", ACM
              SIGCOMM CCR vol. 35 num. 2, pp. 5-12, 2005.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC3465]  Allman, M., "TCP Congestion Control with Appropriate
              Byte Counting (ABC)", RFC 3465, February 2003.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [WISCHIK]  Wischik, D., Handley, M., and M. Bagnulo Braun, "The
              Resource Pooling Principle", ACM SIGCOMM CCR vol. 38
              num. 5, pp. 47-52, October 2008.

Authors' Addresses

   Costin Raiciu
   University College London
   Gower Street
   London WC1E 6BT
   UK

   Email: c.raiciu@cs.ucl.ac.uk

   Mark Handley
   University College London
   Gower Street
   London WC1E 6BT
   UK

   Email: m.handley@cs.ucl.ac.uk

   Damon Wischik
   University College London
   Gower Street
   London WC1E 6BT
   UK

   Email: d.wischik@cs.ucl.ac.uk