idnits 2.17.1 draft-khalili-mptcp-congestion-control-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == It seems as if not all pages are separated by form feeds - found 0 form feeds but 11 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The abstract seems to contain references ([5]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 259 has weird spacing: '...optimal resou...' == The document doesn't use any RFC 2119 keywords, yet seems to have RFC 2119 boilerplate text. -- The document date (February 17, 2013) is 4079 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '7' is defined on line 386, but no explicit reference was found in the text ** Downref: Normative reference to an Informational RFC: RFC 6182 (ref. '3') ** Downref: Normative reference to an Experimental RFC: RFC 6356 (ref. '4') == Outdated reference: A later version (-06) exists of draft-khalili-mptcp-performance-issues-02 Summary: 5 errors (**), 0 flaws (~~), 6 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Multipath TCP Ramin Khalili 3 INTERNET-DRAFT T-Labs/TU-Berlin 4 Intended Status: Standard Track Nicolas Gast 5 Expires: August 21, 2013 Miroslav Popovic 6 Jean-Yves Le Boudec 7 EPFL-LCA2 8 February 17, 2013 10 11 draft-khalili-mptcp-congestion-control-00 13 Abstract 15 This document describes the mechanism of OLIA, the "opportunistic 16 linked increases algorithm". OLIA is a congestion control algorithm 17 for MPTCP. The current congestion control algorithm of MPTCP, LIA, 18 forces a tradeoff between optimal congestion balancing and 19 responsiveness. OLIA's design departs from this tradeoff and provide 20 these properties simultaneously. Hence, it solves the identified 21 performance problems with LIA while retaining non-flappiness and 22 responsiveness behavior of LIA [5]. 24 Status of this Memo 26 This Internet-Draft is submitted to IETF in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF), its areas, and its working groups. Note that 31 other groups may also distribute working documents as 32 Internet-Drafts. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on August 22, 2013. 41 The list of current Internet-Drafts can be accessed at 42 http://www.ietf.org/1id-abstracts.html 44 The list of Internet-Draft Shadow Directories can be accessed at 45 http://www.ietf.org/shadow.html 47 Copyright and License Notice 49 Copyright (c) 2013 IETF Trust and the persons identified as the 50 document authors. All rights reserved. 52 This document is subject to BCP 78 and the IETF Trust's Legal 53 Provisions Relating to IETF Documents 54 (http://trustee.ietf.org/license-info) in effect on the date of 55 publication of this document. Please review these documents 56 carefully, as they describe your rights and restrictions with respect 57 to this document. Code Components extracted from this document must 58 include Simplified BSD License text as described in Section 4.e of 59 the Trust Legal Provisions and are provided without warranty as 60 described in the Simplified BSD License. 62 Table of Contents 64 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 3 65 1.1 Requirements Language . . . . . . . . . . . . . . . . . . . 4 66 1.2 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 4 67 2 The set of best paths and paths with maximum windows . . . . . . 5 68 3 Opportunistic Linked-Increases Algorithm . . . . . . . . . . . . 6 69 4 Practical considerations . . . . . . . . . . . . . . . . . . . . 8 70 5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 71 6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 72 6.1 Normative References . . . . . . . . . . . . . . . . . . . . 10 73 6.2 Informative References . . . . . . . . . . . . . . . . . . . 10 74 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11 76 1 Introduction 78 The current MPTCP implementation uses a congestion control algorithm 79 called LIA, the "Linked-Increases" algorithm [4]. However, MPTCP with 80 LIA suffers from important performance issues (see [5] and [6]): in 81 some scenarios upgrading TCP users to MPTCP can result in a 82 significant drop in the aggregate throughput in the network without 83 any benefit for anybody. We also show in [5] that MPTCP users could 84 be excessively aggressive toward TCP users. 86 The design of LIA forces a tradeoff between optimal congestion 87 balancing and responsiveness. Hence, to provide good responsiveness, 88 LIA's current implementation must depart from optimal congestion 89 balancing that leads to the identified problems. In this draft, we 90 introduce OLIA, the "opportunistic linked increases algorithm", as an 91 alternative to LIA. Contrary to LIA, OLIA's design is not based on a 92 trade-off between responsiveness and optimal congestion balancing; it 93 can provide both simultaneously. 95 Similarly to LIA, OLIA couples the additive increases and uses 96 unmodified TCP behavior in the case of a loss. The difference between 97 LIA and OLIA is in the increase part. OLIA's increase part, Equation 98 (1), has two terms: 100 - The first term is an adaptation of the increase term of Kelly and 101 Voice's algorithm [8]. This term is essential to provide optimal 102 resource pooling. 104 - The second term guarantees responsiveness and non-flappiness of 105 OLIA. By measuring the number of transmitted bytes since the last 106 loss, it reacts to events within the current window and adapts to 107 changes faster than the first term. 109 By adapting the window increases as a function of RTTs, OLIA also 110 compensates for different RTTs. As OLIA is rooted on the optimal 111 algorithm of [8], it provides fairness and optimal congestion 112 balancing. Because of the second term, it is responsive and non- 113 flappy. 115 OLIA is implemented in the Linux kernel and is now a part of MPTCP's 116 implementation. In [5], we study the performance of MPTCP with OLIA 117 over a testbed, by simulations and by theoretical analysis. We prove 118 theoretically that OLIA is Pareto-optimal and that it satisfies the 119 design goals of MPTCP described in [4]. Hence, it can provide optimal 120 congestion balancing and fairness in the network. Our measurements 121 and simulations indicate that MPTCP with OLIA is as responsive and 122 non-flappy as MPTCP with LIA and that it solves the identified 123 problems with LIA. 125 The rest of the document provides a description of OLIA. For an 126 analysis of its performance, we refer to [5]. 128 1.1 Requirements Language 130 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 131 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 132 document are to be interpreted as described in RFC 2119 [1]. 134 1.2 Terminology 136 Regular TCP: The standard version of TCP that operates between a 137 single pair of IP addresses and ports [2]. 139 Multipath TCP: A modified version of the regular TCP that allows a 140 user to spread its traffic across multiple paths. 142 MPTCP: The proposal for multipath TCP specified in [3]. 144 LIA: The Linked-Increases Algorithm of MPTCP (the congestion control 145 of MPTCP) [4]. 147 OLIA: The Opportunistic Linked-Increases Algorithm for MPTCP proposed 148 in [5]. 150 best_paths: The set of paths that are presumably the best paths for 151 the MPTCP connection. 153 max_w_paths: The set of paths with largest congestion windows. 155 collected_paths: The set of paths that are presumably the best paths 156 but do not have largest congestion window (i.e. the paths of 157 best_paths that are not in max_w_paths). 159 rtt_r: The Round-Trip Time on a path r. 161 MSS_r: The Maximum Segment Size that specifies the largest amount of 162 data can be transmitted by a TCP packet on the path r. 164 2 The set of best paths and paths with maximum windows 166 A MPTCP connection has access to one or more paths. Let r be one of 167 these paths. We denote by l_{1r} the number of bytes that were 168 successfully transmitted over path r between the last two losses seen 169 on r, and by l_{2r} the number of bytes that are successfully 170 transmitted over r after the last loss. We denote by 171 l_r=max{l_{1r},l_{2r}} the smoothed estimation of number of bytes 172 transmitted on path r between last two losses. 174 l_{1r} and l_{2r} can be measured by using information that is 175 already available to a regular TCP user: 177 - For each ACK on r: l_{2r} <- l_{2r} + (number of bytes that are 178 acknowledged by ACK), 180 - For each loss on r: l_{1r} <- l_{2r} and l_{2r} <- 0. 182 l_{1r} and l_{2r} are initially set to zero when the connection is 183 established. If no losses have been observed on r until now, then 184 l_{1r}=0 and l_{2r} is the total number of bytes transmitted on r. 186 Let rtt_r be the round-trip time observed on path r (e.g. the 187 smoothed round-trip time used by regular TCP). We denote by 188 best_paths the set of paths r that have the maximum value of 189 l_r*l_r/rtt_r and by max_w_paths the set of paths with largest 190 congestion window. Also, let collected_paths be the set of paths that 191 are in best_paths but not in max_w_paths. Note that l_{1r}, l_{2r}, 192 l_r, best_paths, max_w_paths and collected_paths are all functions of 193 time. 195 best_paths represents the set of paths that are presumably the best 196 paths for the user: 1/l_r can be considered as an estimate of byte 197 loss probability on path r, and hence the rate that path r can 198 provide to a TCP user can be estimated by (2*l_r)^{1/2}/rtt_r. 199 collected_paths is the set of "collected" paths: the paths that are 200 presumably the best paths for the user but that are not yet fully 201 used. 203 3 Opportunistic Linked-Increases Algorithm 205 In this section, we introduce OLIA. OLIA is a window-based 206 congestion-control algorithm. It couples the increase of congestion 207 windows and uses unmodified TCP behavior in the case of a loss. OLIA 208 is an alternative for LIA, the current congestion control algorithm 209 of MPTCP. 211 The algorithm only applies to the increase part of the congestion 212 avoidance phase. The fast retransmit and fast recovery algorithms, as 213 well as the multiplicative decrease of the congestion avoidance 214 phase, are the same as in TCP [2]. We also use a similar slow start 215 algorithm as in TCP, with the modification that we set the ssthresh 216 (slow start threshold) to be 1 MSS if multiple paths are established. 217 In the case of a single path flow, we use the same minimum ssthresh 218 as in TCP (i.e. 2 MSS). The purpose of this modification is to avoid 219 transmitting unnecessary traffic over congested paths when multiple 220 paths are available to a user. 222 For a path r, we denote by w_r the congestion windows on this path 223 (also called subflow). We denote by MSS_r be the maximum segment size 224 on the path r. We assume that w_r is maintained in bytes. 226 Our proposed "Opportunistic Linked-Increases Algorithm" (OLIA) must: 228 - For each ACK on path r, increase w_r by 230 w_r/rtt_r^2 alpha_r 231 ( --------------------- + --------- ) (1) 232 (SUM (w_p/rtt_p))^2 w_r 234 multiplied by MSS_r * bytes_acked. 236 The summation in the denominator of the first term is over all the 237 paths available to the user. 239 alpha_r is calculated as follows: 241 - If r is in collected_paths, then 242 1/number_of_paths 243 alpha_r = -------------------- 244 |collected_paths| 246 - If r is in max_w_paths and if collected_paths is not empty, then 248 1/number_of_paths 249 alpha_r = - ----------------- 250 |max_w_paths| 252 - Otherwise, alpha_r=0. 254 |collected_paths| and |max_w_paths| are the number of paths in 255 collected_paths and in max_w_paths. Note that the sum of all alpha_r 256 is equal to 0. 258 The first term in (1) is an adaptation of Kelly and Voice's increase 259 term [8] and provides the optimal resource pooling (Kelly and 260 Voice's algorithm is based on scalable TCP; the first term in (1) is 261 a TCP compatible version of their algorithm that compensates also for 262 different RTTs). The second term, with alpha_r, guarantees 263 responsiveness and non-flappiness of our algorithm. 265 By definition of alpha_r, if all the best paths have the largest 266 window size, then alpha_r=0 for any r. This is because we already use 267 the capacity available to the user by using all the best path. 269 If there is any best path with a small window size, i.e. if 270 collected_paths is not empty, then alpha_r is positive for all r in 271 collected_paths and negative for all r in max_w_paths. Hence, our 272 algorithm increases windows faster on the paths that are presumably 273 best but that have small windows. The increase will be slower on the 274 paths with maximum windows. In this case, OLIA re-forwards traffic 275 from fully used paths (i.e. paths in max_w_paths) to paths that have 276 free capacity available to the users (i.e. paths in collected_paths). 278 In [4], three goals have been proposed for the design of a practical 279 multipath congestion control algorithm : (1) Improve throughput: a 280 multipath TCP user should perform at least as well as a TCP user that 281 uses the best path available to it. (2) Do no harm: a multipath TCP 282 user should never take up more capacity from any of its paths than a 283 TCP user. And (3) balance congestion: a multipath TCP algorithm 284 should balance congestion in the network, subject to meeting the 285 first two goals. 287 Our theoretical results in [5] show that OLIA fully satisfies these 288 three goals. LIA, however, fails to fully satisfy the goal (3) as 289 discussed in [4] and [5]. Moreover, in [5], we show through 290 measurements and by simulation that our algorithm is as responsive 291 and non-flappy as LIA and that it can solve the identified problems 292 with LIA. 294 4 Practical considerations 296 Calculation of alpha requires performing costly floating point 297 operation whenever an ACK received over path r. In practice, however, 298 we can integrate calculation of alpha and Equation (1) together. Our 299 algorithm can be therefore simplified as the following. 301 For each ACK on the path r: 303 - If r is in collected_paths, increase w_r by 305 w_r/rtt_r^2 1 306 ------------------- + ----------------------- (2) 307 (SUM (w_p/rtt_p))^2 w_r * number_of_paths * |collected_paths| 309 multiplied by MSS_r * bytes_acked. 311 - If r is in max_w_paths and if collected_paths is not empty, 312 increase w_r by 314 w_r/rtt_r^2 1 315 -------------------- - ------------------------ (3) 316 (SUM (w_r/rtt_r))^2 w_r * number_of_paths * |max_w_paths| 318 multiplied by MSS_r * bytes_acked. 320 - Otherwise, increase w_r by 322 (w_r/rtt_r^2) 323 ---------------------------------- (4) 324 (SUM (w_r/rtt_r))^2 326 multiplied by MSS_r * bytes_acked. 328 Therefore, to compute the increase, we only need to determine the 329 sets collected_paths and max_w_paths when an ACK is received on the 330 path r. We can further simplify the algorithm by updating the sets 331 collected_paths and max_w_paths only once per round-trip time or 332 whenever there is a drop on the path. 334 We can see from above that in some cases (i.e. when r is max_w_paths 335 and collected_paths is not empty) the increase could be negative. 336 This is a property of our algorithm as in this case OLIA re-forwards 337 traffic from paths in max_w_paths to paths in collected_paths. It is 338 easy to show that using our algorithm, w_r >= 1 for any path r. 340 5 Discussion 342 Our results in [5] show that the identified problems with current 343 MPTCP implementation are not due to the nature of a window-based 344 multipath protocol, but rather to the design of LIA. OLIA shows that 345 it is possible to build an alternative to LIA that mitigates these 346 problems and that is as responsive and non-flappy as LIA. 348 Our proposed algorithm can provide similar resource pooling as Kelly 349 and Voice's algorithm [8] and fully satisfies the design goals of 350 MPTCP described in [4]. Hence, it can provide optimal congestion 351 balancing and fairness in the network. Moreover, it is as responsive 352 and non-flappy as LIA [5]. 354 We therefore believe that IETF should revisit the congestion control 355 part of MPTCP and that an alternative algorithm, such as OLIA, should 356 be considered. 358 6 References 360 6.1 Normative References 362 [1] Bradner, S., "Key words for use in RFCs to Indicate 363 Requirement Levels", BCP 14, RFC 2119, March 1997. 365 [2] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 366 Control", RFC 5681, September 2009. 368 [3] Ford, A., Raiciu, C., Greenhalgh, A., and M. Handley, 369 "Architectural Guidelines for Multipath TCP Development", 370 RFC 6182, March 2011. 372 [4] Raiciu, C., Handley, M., and D. Wischik, "Coupled 373 Congestion Control for Multipath Transport Protocols", 374 RFC 6356, October 2011. 376 6.2 Informative References 378 [5] R. Khalili, N. Gast, M. Popovic, U. Upadhyay, and J.-Y. Le 379 Boudec. "MPTCP is not Pareto-optimality: Performance 380 issues and a possible solution", ACM CoNext 2012. 382 [6] R. Khalili, N. Gast, M. Popovic, and J.-Y. Le Boudec. 383 "Performance Issues with MPTCP", draft-khalili-mptcp- 384 performance-issues-02. 386 [7] M. Handly, D. Wischik, C. Raiciu, "Coupled Congestion 387 Control for MPTCP", presented at 77th IETF meeting, 388 Anaheim, California. 389 . 391 [8] Kelly, F. and T. Voice, "Stability of end-to-end 392 algorithms for joint routing and rate control", ACM 393 SIGCOMM CCR vol. 35 num. 2, pp. 5-12, 2005. 395 Authors' Addresses 397 Ramin Khalili 398 T-Labs/TU-Berlin 399 TEL 3, Ernst-Reuter-Platz 7 400 10587 Berlin 401 Germany 403 Phone: +49 30 8353 58276 404 EMail: ramin@net.t-labs.tu-berlin.de 406 Nicolas Gast 407 EPFL IC ISC LCA2 408 Station 14 409 CH-1015 Lausanne 410 Switzerland 412 Phone: +41 21 693 1254 413 EMail: nicolas.gast@epfl.ch 415 Miroslav Popovic 416 EPFL IC ISC LCA2 417 Station 14 418 CH-1015 Lausanne 419 Switzerland 421 Phone: +41 21 693 6466 422 EMail: miroslav.popovic@epfl.ch 424 Jean-Yves Le Boudec 425 EPFL IC ISC LCA2 426 Station 14 427 CH-1015 Lausanne 428 Switzerland 430 Phone: +41 21 693 6631 431 EMail: jean-yves.leboudec@epfl.ch