MPTCP Working Group                                       O. Bonaventure
Internet-Draft                                                 C. Paasch
Intended status: Informational                                 UCLouvain
Expires: January 2, 2015                                   July 01, 2014

                     Experience with Multipath TCP
                 draft-bonaventure-mptcp-experience-00

Abstract

   This document discusses operational experience with Multipath TCP in
   real-world networks.  It lists several prominent use cases for which
   Multipath TCP has been considered and is being used.  It also gives
   insight into some heuristics and decisions that have helped to
   realize these use cases.  Further, it presents several open issues
   for which no clear solution exists yet.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 2, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Middlebox interference
   3.  Use cases
   4.  Congestion control
   5.  Subflow management
     5.1.  Implemented subflow managers
     5.2.  Subflow destination port
     5.3.  Closing subflows
   6.  Packet schedulers
   7.  Interactions with the Domain Name System
   8.  Captive portals
   9.  Conclusion
   10. Acknowledgements
   11. Informative References
   Authors' Addresses

1.  Introduction

   Multipath TCP was standardized in [RFC6824] and four implementations
   have been developed [I-D.eardley-mptcp-implementations-survey].
   Since the publication of [RFC6824], various network researchers and
   users have gathered experience about the issues that arise when
   Multipath TCP is used in the Internet.

   Most of the experience reported in this document comes from the
   utilization of the Multipath TCP implementation in the Linux kernel
   [MultipathTCP-Linux].  It has been downloaded and is used by
   thousands of users all over the world.  Many of these users have
   provided direct or indirect feedback by writing documents
   (scientific articles or blog posts) or by posting to the mptcp-dev
   mailing list (https://listes-2.sipr.ucl.ac.be/sympa/arc/mptcp-dev).
   This Multipath TCP implementation is actively maintained and
   continuously improved.  It is used on various types of hosts,
   ranging from smartphones and embedded systems to high-end servers.

   This is, however, far from being the most widespread deployment of
   Multipath TCP.  Since September 2013, Multipath TCP has also been
   supported on smartphones and tablets running iOS7 [IOS7].  There are
   likely hundreds of millions of Multipath TCP enabled devices.
   However, this particular Multipath TCP implementation is currently
   only used to support a single application.  Unfortunately, there is
   no public information about the lessons learned from this large
   scale deployment.

   This document is organized as follows.  Section 2 explains which
   types of middleboxes the Linux kernel implementation of Multipath
   TCP supports and how it reacts upon encountering them.  Next,
   Section 3 lists several use cases of Multipath TCP.  Section 4
   summarizes the MPTCP-specific congestion control schemes that have
   been implemented.  Sections 5 and 6 discuss heuristics and issues
   with respect to subflow management as well as the scheduling of
   data across the subflows.  Section 7 presents an issue with respect
   to content delivery networks and suggests a solution to it.
   Finally, Section 8 shows an issue with captive portals where MPTCP
   behaves suboptimally.
2.  Middlebox interference

   The interference caused by various types of middleboxes has been an
   important concern during the design of the Multipath TCP protocol.
   Three studies on the interactions between Multipath TCP and
   middleboxes are worth discussing here.

   The first analysis was described in [IMC11].  This paper was the
   main motivation for including in Multipath TCP various techniques
   to cope with middlebox interference.  More specifically, Multipath
   TCP has been designed to cope with middleboxes that:

   -  change source or destination addresses

   -  change source or destination port numbers

   -  change TCP sequence numbers

   -  split or coalesce segments

   -  remove TCP options

   -  modify the payload of TCP segments

   These middlebox interferences have all been included in the MBtest
   suite [MBTest].  This test suite has been used [HotMiddlebox13] to
   verify the reaction of the Multipath TCP implementation in the
   Linux kernel when faced with middlebox interference.  The test
   environment used for this evaluation is a dual-homed client
   connected to a single-homed server.  The middlebox behavior can be
   activated on any of the paths.  The main results of this analysis
   are:

   o  the Multipath TCP implementation in the Linux kernel is not
      affected by a middlebox that performs NAT or modifies TCP
      sequence numbers

   o  when a middlebox removes the MP_CAPABLE option from the initial
      SYN segment, the Multipath TCP implementation in the Linux
      kernel falls back correctly to regular TCP

   o  when a middlebox removes the DSS option from all data segments,
      the Multipath TCP implementation in the Linux kernel falls back
      correctly to regular TCP

   o  when a middlebox performs segment coalescing, the Multipath TCP
      implementation in the Linux kernel is still able to accurately
      extract the data corresponding to the indicated mapping

   o  when a middlebox performs segment splitting, the Multipath TCP
      implementation in the Linux kernel correctly reassembles the
      data corresponding to the indicated mapping.  [HotMiddlebox13]
      documents a corner case with segment splitting that may lead to
      desynchronisation between the two hosts.

   The interactions between Multipath TCP and real deployed
   middleboxes are also analyzed in [HotMiddlebox13], which describes
   a particular scenario involving an FTP application-level gateway
   running on a NAT.

   From an operational viewpoint, knowing that Multipath TCP can cope
   with various types of middlebox interference is important.
   However, there are situations where network operators need to
   gather information about where a particular middlebox interference
   occurs.  The tracebox software [tracebox] described in [IMC13a] is
   an extension of the popular traceroute software that enables
   network operators to check at which hop a particular field of the
   TCP header (including options) is modified.  It has been used by
   several network operators to debug various middlebox interference
   problems.  tracebox includes a scripting language that enables its
   user to specify precisely which packet is sent by the source.
   tracebox sends packets with an increasing TTL/HopLimit and compares
   the information returned in the ICMP messages with the packet that
   it sent.  This enables tracebox to detect any interference caused
   by middleboxes on a given path.  tracebox works better when routers
   implement the ICMP extension defined in [RFC1812].
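   The principle of operation of tracebox can be illustrated with a
   short script.  The sketch below is an illustration of that
   principle, not the actual tracebox code.  It assumes the scapy
   packet manipulation library (and root privileges), and it only
   compares the TCP sequence number, while the real tracebox compares
   every header field that the ICMP message quotes.

      # Sketch of the tracebox principle using scapy; illustration
      # only, not the actual tracebox code.
      from scapy.all import IP, TCP, ICMP, TCPerror, sr1

      def trace(dst, dport=80, max_ttl=30):
          for ttl in range(1, max_ttl + 1):
              sent = IP(dst=dst, ttl=ttl) / TCP(dport=dport, flags="S")
              reply = sr1(sent, timeout=2, verbose=0)
              if reply is None:
                  continue                  # silent hop, no information
              if reply.haslayer(TCP):
                  print(f"{ttl}: destination reached")
                  break
              if reply.haslayer(ICMP) and reply.haslayer(TCPerror):
                  # The router quoted (part of) the offending TCP
                  # segment inside the ICMP message; RFC 1812 routers
                  # quote as much of the original packet as possible.
                  quoted = reply[TCPerror]
                  if quoted.seq != sent[TCP].seq:
                      print(f"{ttl}: TCP sequence number modified")
                  else:
                      print(f"{ttl}: TCP sequence number preserved")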
3.  Use cases

   Multipath TCP has been tested in several use cases.  Several of the
   papers published in the scientific literature have identified
   possible improvements that are worth discussing here.

   A first, although initially unexpected, documented use case for
   Multipath TCP has been in datacenters [HotNets][SIGCOMM11].
   Today's datacenters are designed to provide several paths between
   single-homed servers.  The multiplicity of these paths comes from
   the utilization of Equal Cost Multipath (ECMP) and other load
   balancing techniques inside the datacenter.  Most of the load
   balancing techniques deployed in these datacenters rely on hashes
   computed over the five-tuple, to ensure that all packets from the
   same TCP connection follow the same path and thus avoid packet
   reordering.  The results presented in [HotNets] demonstrate through
   simulations that Multipath TCP can achieve a better utilization of
   the available network by using multiple subflows for each Multipath
   TCP session.  Although [RFC6182] assumes that at least one of the
   communicating hosts has several IP addresses, [HotNets]
   demonstrates that there are also benefits when both hosts are
   single-homed.  This idea was pursued further in [SIGCOMM11], where
   the Multipath TCP implementation in the Linux kernel was modified
   to be able to use several subflows from the same IP address.
   Measurements performed in a public datacenter showed performance
   improvements with Multipath TCP.

   Although ECMP is widely used inside datacenters, this is not the
   only environment where different paths exist between a pair of
   hosts.  ECMP and other load balancing techniques such as LAG are
   widely used in today's networks, and having multiple paths between
   a pair of single-homed hosts is becoming the norm instead of the
   exception.  Although these multiple paths often have the same cost
   (from an IGP metrics viewpoint), they do not necessarily have the
   same performance.  For example, [IMC13c] reports the results of a
   long measurement study showing that load balanced Internet paths
   between the same pair of hosts can have huge delay differences.

   A second use case that has been explored by several network
   researchers is the cellular/WiFi offload use case.  Smartphones and
   other mobile devices equipped with two wireless interfaces are a
   very common use case for Multipath TCP.  As of this writing, this
   is also the largest deployment of Multipath TCP enabled devices
   [IOS7].  Unfortunately, as there are no public measurements about
   this deployment, we can only rely on published papers that have
   mainly used the Multipath TCP implementation in the Linux kernel
   for their experiments.

   The performance of Multipath TCP in wireless networks was briefly
   evaluated in [NSDI12].  One experiment analyzes the performance of
   Multipath TCP on a client with two wireless interfaces.  This
   evaluation shows that when the receive window is large, Multipath
   TCP can efficiently use the two available links.  However, if the
   window becomes smaller, then packets sent on a slow path can block
   the transmission of packets on a faster path.  In some cases, the
   performance of Multipath TCP over two paths can become lower than
   the performance of regular TCP over the best performing path.  Two
   heuristics, reinjection and penalization, are proposed in [NSDI12]
   to solve this performance problem.  These two heuristics have since
   been used in the Multipath TCP implementation in the Linux kernel.
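   In outline, the two heuristics operate as follows.  The sketch
   below is a simplified illustration under an assumed object model
   (the subflow attributes and methods are hypothetical), not the code
   of the Linux kernel implementation.

      def handle_blocked_window(conn):
          # Simplified illustration of the reinjection and penalization
          # heuristics of [NSDI12]; the object model is assumed.
          slow = min(conn.subflows, key=lambda s: s.cwnd / s.srtt)
          fast = max(conn.subflows, key=lambda s: s.cwnd / s.srtt)
          if slow is fast:
              return
          # Reinjection: retransmit the segments that block the
          # connection-level window on the faster subflow.
          fast.enqueue(slow.unacked_segments())
          # Penalization: halve the congestion window of the slow
          # subflow so that the scheduler stops feeding it.
          slow.cwnd = max(slow.cwnd // 2, 1)
          slow.ssthresh = slow.cwnd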
   [CONEXT13] explored the problem in more detail and revealed some
   other scenarios where Multipath TCP can have difficulties in
   efficiently pooling the available paths.  Improvements to the
   Multipath TCP implementation in the Linux kernel are proposed in
   [CONEXT13] to cope with some of these problems.

   The first experimental analysis of Multipath TCP in a public
   wireless environment was presented in [Cellnet12].  These
   measurements explore the ability of Multipath TCP to use two
   wireless networks (real WiFi and 3G networks).  Three modes of
   operation are compared.  The first mode of operation is the
   simultaneous use of the two wireless networks.  In this mode,
   Multipath TCP pools the available resources and uses both wireless
   interfaces.  This mode provides fast handover from WiFi to cellular
   or vice versa when the user moves.  Measurements presented in
   [CACM14] show that the handover from one wireless network to
   another is not an abrupt process.  When a host moves, it does not
   experience either excellent connectivity or no connectivity at all.
   Instead, there are regions where the quality of one of the wireless
   networks is weaker than the other, but the host still considers
   this wireless network to be up.  When a mobile host enters such
   regions, its ability to send packets over another wireless network
   is important to ensure a smooth handover.  This is clearly
   illustrated by the packet trace discussed in [CACM14].

   Many cellular networks use volume-based pricing and users often
   prefer to use unmetered WiFi networks when available instead of
   metered cellular networks.  [Cellnet12] implements support for the
   MP_PRIO option to explore two other modes of operation.

   In the backup mode, Multipath TCP opens a TCP subflow over each
   interface, but the cellular interface is configured in backup mode.
   This implies that data flows only over the WiFi interface when both
   interfaces are considered to be active.  If the WiFi interface
   fails, the traffic switches quickly to the cellular interface,
   ensuring a smooth handover from the user's viewpoint [Cellnet12].
   The cost of this approach is that the WiFi and cellular interfaces
   likely remain active all the time, since all subflows are
   established over the two interfaces.

   The single-path mode is slightly different.  This mode benefits
   from the break-before-make capability of Multipath TCP.  When an
   MPTCP session is established, a subflow is created over the WiFi
   interface.  No packet is sent over the cellular interface as long
   as the WiFi interface remains up [Cellnet12].  This implies that
   the cellular interface can remain idle, which preserves battery
   capacity.  When the WiFi interface fails, new subflows are
   established over the cellular interface in order to preserve the
   established Multipath TCP sessions.  Compared to the backup mode
   described earlier, this mode of operation is characterized by a
   throughput drop while the cellular interface is brought up and the
   subflows are reestablished.  During this time, no data packet is
   transmitted.

   From a protocol viewpoint, [Cellnet12] discusses the problem posed
   by the unreliability of the ADD_ADDR option and proposes a small
   protocol extension to allow hosts to reliably exchange this option.
   It would be useful to analyze packet traces to understand whether
   the unreliability of the REMOVE_ADDR option poses an operational
   problem in real deployments.

   Another study of the performance of Multipath TCP in wireless
   networks was reported in [IMC13b].  This study uses laptops
   connected to various cellular ISPs and WiFi hotspots.  It compares
   various file transfer scenarios and concludes, based on
   measurements with the Multipath TCP implementation in the Linux
   kernel, that "MPTCP provides a robust data transport and reduces
   variations in download latencies".

   A different study of the performance of Multipath TCP with two
   wireless networks is presented in [INFOCOM14].  In this study, the
   two networks had different qualities: a good network and a lossy
   network.  When using two paths with different packet loss ratios,
   the Multipath TCP congestion control scheme moves traffic away from
   the lossy link that is considered to be congested.  However,
   [INFOCOM14] documents an interesting scenario that is summarised in
   the figure below.

           client ----------- path1 -------- server
              |                                 |
              +--------------- path2 -----------+

                  Figure 1: Simple network topology

   Initially, the two paths have the same quality and Multipath TCP
   distributes the load over both of them.  During the transfer, the
   second path becomes lossy, e.g. because the client moves.
   Multipath TCP detects the packet losses and they are retransmitted
   over the first path.  This enables the data transfer to continue
   over this path.  However, the subflow over the second path is still
   up and transmits one packet from time to time.  Although these
   packets have been acknowledged over the first subflow (at the MPTCP
   level), they have not been acknowledged at the TCP level over the
   second subflow.  To preserve the continuity of the sequence numbers
   over the second subflow, TCP will continue to retransmit these
   segments until either they are acknowledged or the maximum number
   of retransmissions is reached.  This behavior is clearly
   inefficient and may lead to blocking, since the second subflow will
   consume window space to be able to retransmit these packets.
   [INFOCOM14] proposes a new Multipath TCP option to solve this
   problem.  In practice, a new TCP option is probably not required.
   When the client detects that the data transmitted over the second
   subflow has been acknowledged over the first subflow, it could
   decide to terminate the second subflow by sending a RST segment.
   If the interface associated with this subflow is still up, a new
   subflow could be immediately reestablished.  It would then be
   immediately usable to send new data and would not be forced to
   first retransmit the previously transmitted data.  As of this
   writing, this dynamic management of the subflows is not yet
   implemented in the Multipath TCP implementation in the Linux
   kernel.
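   The suggested behaviour could be sketched as follows.  This is a
   hypothetical illustration of the suggestion above, under an assumed
   object model; as noted, nothing of this kind is implemented in the
   Linux kernel as of this writing.

      def prune_stale_subflow(conn, subflow):
          # Hypothetical sketch: if everything sent on this subflow
          # has already been acknowledged at the MPTCP level via
          # another subflow, its pending TCP-level retransmissions
          # only waste window space.
          if conn.mptcp_level_acked(subflow.outstanding_data()):
              subflow.send_rst()          # terminate the stale subflow
              conn.subflows.remove(subflow)
              if subflow.interface.is_up():
                  # The replacement subflow starts with empty queues
                  # and is immediately usable for new data.
                  conn.create_subflow(subflow.interface)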
   A third use case has been the coupling between software defined
   networking techniques such as OpenFlow and Multipath TCP.  OpenFlow
   can be used to configure different paths inside a network.  Using
   an international network, [TNC13] demonstrates that Multipath TCP
   can achieve high throughput in the wide area.  An interesting point
   to note about the measurements reported in [TNC13] is that the
   measurement setup used four paths through the WAN, of which only
   two were disjoint.  When Multipath TCP was used, the congestion
   control scheme ensured that only two of these paths were actually
   used.

4.  Congestion control

   Congestion control has been an important problem for Multipath TCP.
   The standardised congestion control scheme for Multipath TCP is
   defined in [RFC6356] and [NSDI11].  This congestion control scheme
   has been implemented in the Linux implementation of Multipath TCP.
   Linux uses a modular architecture to support various congestion
   control schemes.  This architecture is applicable to both regular
   TCP and Multipath TCP.  While the coupled congestion control scheme
   defined in [RFC6356] is the default congestion control scheme in
   the Linux implementation, other congestion control schemes have
   been added.  The second congestion control scheme is OLIA
   [CONEXT12].  This congestion control scheme is another adaptation
   of the NewReno single-path congestion control scheme to support
   multiple paths.  Simulations and measurements have shown that it
   provides some performance benefits compared to the default
   congestion control scheme [CONEXT12].  Measurements over a wide
   range of parameters reported in [CONEXT13] also indicate some
   benefits with the OLIA congestion control scheme.  Recently, a
   delay-based congestion control scheme has been ported to the
   Multipath TCP implementation in the Linux kernel.  This congestion
   control scheme has been evaluated by using simulations in [ICNP12].
   As of this writing, it has not yet been evaluated in large
   measurement campaigns.
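   For reference, the increase rule of the coupled congestion control
   scheme can be summarised with the short sketch below.  It follows
   the formulas of [RFC6356] (congestion windows expressed in bytes);
   the function names are ours, not those of the implementation.

      def lia_increase(subflows, i, bytes_acked, mss):
          # Linked Increases Algorithm of [RFC6356].  subflows is a
          # list of (cwnd_in_bytes, rtt_in_seconds) pairs; the return
          # value is the increase of the congestion window of subflow
          # i upon reception of an ACK.
          cwnd_total = sum(cwnd for cwnd, _ in subflows)
          alpha = (cwnd_total * max(c / (r * r) for c, r in subflows)
                   / sum(c / r for c, r in subflows) ** 2)
          coupled = alpha * bytes_acked * mss / cwnd_total
          newreno = bytes_acked * mss / subflows[i][0]
          # Taking the minimum ensures that a Multipath TCP connection
          # is never more aggressive than a single TCP connection.
          return min(coupled, newreno)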
5.  Subflow management

   The multipath capability of Multipath TCP comes from the
   utilization of one subflow per path.  The Multipath TCP
   architecture [RFC6182] and the protocol specification [RFC6824]
   define the basic usage of the subflows and the protocol mechanisms
   that are required to create and terminate them.  However, there are
   no guidelines on how subflows are used during the lifetime of a
   Multipath TCP session.  Most of the experiments with Multipath TCP
   have been performed in controlled environments.  Still, based on
   the experience of running them and on discussions on the mptcp-dev
   mailing list, interesting lessons have been learned about the
   management of these subflows.

   From a subflow viewpoint, the Multipath TCP protocol is completely
   symmetrical.  Both the client and the server have the capability to
   create subflows.  However, in practice the existing Multipath TCP
   implementations [I-D.eardley-mptcp-implementations-survey] have
   opted for a strategy where only the client creates new subflows.
   The main motivation for this strategy is that the client often
   resides behind a NAT or a firewall, preventing passive subflow
   openings on the client.  Although there are environments such as
   datacenters where this problem does not occur, as of this writing,
   no precise requirement has emerged for allowing the server to
   create new subflows.

5.1.  Implemented subflow managers

   The Multipath TCP implementation in the Linux kernel includes
   several strategies to manage the subflows that compose a Multipath
   TCP session.  The basic subflow manager is the full-mesh.  As the
   name implies, it creates a full mesh of subflows between the
   communicating hosts.

   The most frequent use case for this subflow manager is a multihomed
   client connected to a single-homed server.  In this case, one
   subflow is created for each interface on the client.  The current
   implementation of the full-mesh subflow manager is static.  The
   subflows are created immediately after the creation of the initial
   subflow.  If one subflow fails during the lifetime of the Multipath
   TCP session (e.g. due to excessive retransmissions, or the loss of
   the corresponding interface), it is not always reestablished.
   There is ongoing work to enhance the full-mesh path manager to deal
   with such events.

   When the server is multihomed, using the full-mesh subflow manager
   may lead to a large number of subflows being established.  For
   example, consider a dual-homed client connected to a server with
   three interfaces.  In this case, even if the subflows are only
   created by the client, 6 subflows will be established.  This may be
   excessive in some environments, in particular when the client
   and/or the server have a large number of interfaces.  It should be
   noted that there have been reports on the mptcp-dev mailing list
   indicating that users rely on Multipath TCP to aggregate more than
   four different interfaces.  Thus, there is a need for supporting
   many interfaces efficiently.

   It should be noted that creating subflows between multihomed
   clients and servers may sometimes lead to operational issues, as
   observed in discussions on the mptcp-dev mailing list.  In some
   cases, network operators would like to have better control over how
   the subflows are created by Multipath TCP.  This might require the
   definition of policy rules to control the operation of the subflow
   manager.  The two scenarios below illustrate some of these
   requirements.

           host1 ---------- switch1 ----- host2
             |                 |            |
             +-------------- switch2 -------+

             Figure 2: Simple switched network topology

   Consider the simple network topology shown in Figure 2.  From an
   operational viewpoint, a network operator could want to create two
   subflows between the communicating hosts.  From a bandwidth
   utilization viewpoint, the most natural paths are host1-switch1-
   host2 and host1-switch2-host2.  However, a Multipath TCP
   implementation running on these two hosts may sometimes have
   difficulties achieving this result.

   To understand the difficulty, let us consider different allocation
   strategies for the IP addresses.  A first strategy is to assign two
   subnets: subnetA (resp. subnetB) contains the IP addresses of
   host1's interface to switch1 (resp. switch2) and host2's interface
   to switch1 (resp. switch2).  In this case, a Multipath TCP subflow
   manager should only create one subflow per subnet.  To enforce the
   utilization of these paths, the network operator would have to
   specify a policy that prefers the subflows in the same subnet over
   subflows between addresses in different subnets.  It should be
   noted that the policy should probably also specify how the subflow
   manager should react when an interface or subflow fails.

   A second strategy is to use a single subnet for all IP addresses.
   In this case, it becomes more difficult to specify a policy that
   indicates which subflows should be established.

   The second subflow manager that is currently supported by the
   Multipath TCP implementation in the Linux kernel is the ndiffport
   subflow manager.  This manager was initially created to exploit the
   path diversity that exists between single-homed hosts due to the
   utilization of flow-based load balancing techniques.  This subflow
   manager creates N subflows between the same pair of IP addresses.
   The N subflows are created by the client and differ only in the
   source port selected by the client.
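   The principle of the ndiffport subflow manager can be approximated
   at the socket level as shown below.  This sketch is only an
   illustration: it opens N independent TCP connections that differ
   only in their source port, whereas the real subflow manager creates
   the subflows inside the kernel and joins them to the Multipath TCP
   session with MP_JOIN.

      import socket

      def ndiffport(dst_addr, dst_port, n):
          # N flows between the same pair of IP addresses differing
          # only in the client-selected source port, so that
          # flow-based load balancers may hash them onto different
          # paths.
          flows = []
          for _ in range(n):
              s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
              s.bind(("", 0))   # port 0: pick a fresh source port
              s.connect((dst_addr, dst_port))
              flows.append(s)
          return flows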
5.2.  Subflow destination port

   The Multipath TCP protocol relies on the token contained in the
   MP_JOIN option to associate a subflow with an existing Multipath
   TCP session.  This implies that there is no restriction on the
   source address, destination address and source or destination ports
   used for the new subflow.  The ability to use different source and
   destination addresses is key to supporting multihomed servers and
   clients.  The ability to use different destination port numbers is
   worth discussing because it has operational implications.

   For illustration, consider a dual-homed client that creates a
   second subflow to reach a single-homed server, as illustrated in
   Figure 3.

           client ------- r1 --- internet --- server
              |                     |
              +--------- r2 --------+

      Figure 3: Multihomed client connected to single-homed server

   When the Multipath TCP implementation in the Linux kernel creates
   the second subflow, it uses the same destination port as the
   initial subflow.  This choice is motivated by the fact that the
   server might be protected by a firewall and only accept TCP
   connections (including subflows) on the official port number.
   Using the same destination port for all subflows is also useful for
   operators that rely on the port numbers to track application usage
   in their network.

   There have been suggestions from Multipath TCP users to modify the
   implementation to allow the client to use different destination
   ports to reach the server.  This suggestion seems mainly motivated
   by traffic shaping middleboxes that are used in some wireless
   networks.  In networks where different shaping rates are associated
   with different destination port numbers, this could allow Multipath
   TCP to achieve a higher performance.  As of this writing, we are
   not aware of any implementation of this kind of tweaking.

   However, from an implementation point of view, supporting different
   destination ports for the same Multipath TCP connection introduces
   a new performance issue.  A legacy implementation of a TCP stack
   creates a listening socket to react upon incoming SYN segments.
   The listening socket handles the SYN segments that are sent to a
   specific port number.  Demultiplexing incoming segments can thus be
   done solely by looking at the IP addresses and the port numbers.
   With Multipath TCP, however, incoming SYN segments may carry an
   MP_JOIN option with a different destination port.  This means that
   all incoming SYN segments that did not match an existing listening
   socket or an already established socket must be parsed for a
   possible MP_JOIN option.  This imposes an additional cost on
   servers that does not exist in legacy TCP implementations.
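   The additional demultiplexing step can be summarised as follows.
   In this sketch, has_mp_join, mp_join_token and token_lookup are
   hypothetical helpers standing for the option parsing and the per-
   token session table that a Multipath TCP stack has to maintain.

      def demultiplex(segment, established, listeners, token_lookup):
          # Classic TCP demultiplexing: exact four-tuple match first,
          # then a listening socket for SYN segments on a known port.
          key = (segment.saddr, segment.sport,
                 segment.daddr, segment.dport)
          if key in established:
              return established[key]
          if segment.is_syn() and segment.dport in listeners:
              return listeners[segment.dport]
          # Additional Multipath TCP step: a SYN may target any port,
          # so its options must be parsed for MP_JOIN.  This is the
          # extra cost on servers mentioned above.
          if segment.is_syn() and has_mp_join(segment):
              return token_lookup(mp_join_token(segment))
          return None       # no match: drop or send a RST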
5.3.  Closing subflows

        client                                     server
           |                                          |
   MPTCP: established |                       | MPTCP: established
   Sub:   established |                       | Sub:   established
                      |                       |
                      |        DATA_FIN       |
   MPTCP: close-wait  | <-------------------- | close() (step 1)
   Sub:   established |        DATA_ACK       |
                      | --------------------> | MPTCP: fin-wait-2
                      |                       | Sub:   established
                      |                       |
                      | DATA_FIN + subflow-FIN|
   close()/shutdown() | --------------------> | MPTCP: time-wait
   (step 2)           |        DATA_ACK       | Sub:   close-wait
   MPTCP: closed      | <-------------------- |
   Sub:   fin-wait-2  |                       |
                      |                       |
                      |      subflow-FIN      |
   MPTCP: closed      | <-------------------- | subflow-close()
   Sub:   time-wait   |      subflow-ACK      |
   (step 3)           | --------------------> | MPTCP: time-wait
                      |                       | Sub:   closed
                      |                       |

    Figure 4: Multipath TCP may not be able to avoid time-wait state
              (even if enforced by the application)

   Figure 4 shows a very particular issue within Multipath TCP.  Many
   high-performance applications try to avoid Time-Wait state by
   deferring the closure of the connection until the peer has sent a
   FIN.  That way, the client on the left of Figure 4 does a passive
   closure of the connection, transitioning from Close-Wait to
   Last-ACK and finally freeing the resources after reception of the
   ACK of the FIN.  An application running on top of a Multipath TCP
   enabled Linux kernel might also use this approach.  The difference
   here is that the close() of the connection (step 1 in Figure 4)
   only triggers the sending of a DATA_FIN.  Nothing guarantees that
   the kernel combines the DATA_FIN with a subflow-FIN.  The reception
   of the DATA_FIN will make the application trigger the closure of
   the connection (step 2), trying to avoid Time-Wait state with this
   late closure.  This time, the kernel might decide to combine the
   DATA_FIN with a subflow-FIN.  This decision will be fatal, as the
   subflow's state machine will not transition from Close-Wait to
   Last-ACK, but rather go through Fin-Wait-2 into Time-Wait state.
   The Time-Wait state will consume resources on the host for at least
   2 MSL (Maximum Segment Lifetime).  Thus, a smart application that
   tries to avoid Time-Wait state by closing the connection late
   actually ends up with one of its subflows in Time-Wait state.  A
   high-performance Multipath TCP kernel implementation should honor
   the desire of the application to do passive closure of the
   connection and successfully avoid Time-Wait state - even on the
   subflows.

   The solution to this problem lies in an optimistic assumption that
   a host doing active closure of a Multipath TCP connection by
   sending a DATA_FIN will soon also send a FIN on all its subflows.
   Thus, the passive closer of the connection can simply wait for the
   peer to send exactly this FIN - enforcing passive closure even on
   the subflows.  Of course, to avoid consuming resources
   indefinitely, a timer must limit the time the implementation waits
   for the FIN.

6.  Packet schedulers

   In a Multipath TCP implementation, the packet scheduler is the
   algorithm that is executed for each packet to decide on which
   subflow it is transmitted.  The packet scheduler itself does not
   have any impact on the interoperability of Multipath TCP
   implementations.  However, it may clearly impact the performance of
   Multipath TCP sessions.  It is important to note that the problem
   of scheduling Multipath TCP packets among subflows is different
   from the problem of scheduling SCTP messages.  SCTP implementations
   also include schedulers, but these are used to schedule the
   different streams.  Multipath TCP uses a single data stream.

   Various researchers have explored, theoretically and through
   simulations, the problem of scheduling packets among Multipath TCP
   subflows [ICC14].  Unfortunately, none of the proposed techniques
   have been implemented and used in real deployments.  A detailed
   analysis of the impact of the packet scheduler will appear in
   [CSWS14].  This article proposes a pluggable architecture for the
   scheduler used by the Multipath TCP implementation in the Linux
   kernel.  This architecture allows researchers to experiment with
   different types of schedulers.  Two schedulers are compared in
   [CSWS14]: round-robin and lowest-rtt-first.  The experiments and
   measurements described in [CSWS14] show that the lowest-rtt-first
   scheduler appears to be the best compromise from a performance
   viewpoint.
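   In outline, the lowest-rtt-first scheduler can be expressed as in
   the sketch below, under an assumed object model; this is an
   illustration, not the code of the scheduler in the Linux kernel.

      def lowest_rtt_first(subflows, segment):
          # Among the subflows whose congestion window still has room,
          # select the one with the smallest smoothed RTT.  If every
          # subflow is cwnd-limited, the segment waits until an ACK
          # opens the window again.
          available = [s for s in subflows if s.inflight < s.cwnd]
          if not available:
              return None
          best = min(available, key=lambda s: s.srtt)
          best.transmit(segment)
          return best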
   Another study of packet schedulers is presented in [PAMS2014].
   This study relies on simulations with the Multipath TCP
   implementation in the Linux kernel.  The simulation scenarios
   discussed in [PAMS2014] confirm the impact of the packet scheduler
   on the performance of Multipath TCP.

7.  Interactions with the Domain Name System

   Multihomed clients such as smartphones can experience operational
   problems when interacting with the Domain Name System.  When a
   single-homed client performs a DNS query, it receives from its
   local resolver the best answer for its request.  If the client is
   multihomed, the answer returned to the DNS query may vary with the
   interface over which it has been sent.

                    cdn1
                      |
      client -- cellular -- internet -- cdn3
         |                     |
         +------- wifi --------+
                    |
                   cdn2

               Figure 5: Simple network topology

   If the client sends a DNS query over the WiFi interface, the answer
   will point to the cdn2 server, while the same request sent over the
   cellular interface will point to the cdn1 server.  This might cause
   problems for CDN providers that locate their servers inside ISP
   networks and have contracts that specify that the CDN server will
   only be accessed from within this particular ISP.  Assume now that
   both the client and the CDN servers support Multipath TCP.  In this
   case, a Multipath TCP session from cdn1 or cdn2 would potentially
   use both the cellular network and the WiFi network.  This would
   violate the contract between the CDN provider and the network
   operators.  A possible solution to prevent this problem would be to
   modify the DNS resolution on the client.  The client subnet EDNS
   extension defined in [I-D.vandergaast-edns-client-subnet] could be
   used for this purpose.  When the client sends a DNS query from its
   WiFi interface, it should also send the client subnet corresponding
   to the cellular interface in this request.  This would indicate to
   the resolver that the answer should be valid for both the WiFi and
   the cellular interfaces (e.g., the cdn3 server).
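   A rough sketch of this resolution strategy is shown below, using
   the dnspython library; all names, addresses and subnets are
   illustrative.  The query is sent over the WiFi interface but
   carries the subnet of the cellular interface in an EDNS Client
   Subnet option.

      import dns.edns
      import dns.message
      import dns.query

      # 203.0.113.0/24 stands for the subnet of the cellular
      # interface and 192.0.2.53 for the resolver reached over WiFi.
      ecs = dns.edns.ECSOption("203.0.113.0", 24)
      query = dns.message.make_query("www.example.org", "A",
                                     use_edns=0, options=[ecs])
      # The resolver should return an answer (e.g., cdn3) that is
      # valid for both the WiFi and the cellular subnets.
      response = dns.query.udp(query, "192.0.2.53", timeout=2)
      print(response.answer)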
8.  Captive portals

   Multipath TCP enables a host to use different interfaces to reach a
   server.  In theory, this should ensure connectivity when at least
   one of the interfaces is active.  In practice, however, there are
   some particular scenarios with captive portals that may cause
   operational problems.  The reference environment is the following:

           client ----- network1
              |
              +------- internet ------------- server

                Figure 6: Issue with captive portal

   The client is attached to two networks: network1, which provides
   limited connectivity, and the entire Internet through the second
   network interface.  In practice, this scenario corresponds to an
   open WiFi network with a captive portal for network1 and a cellular
   service for the second interface.  On many smartphones, the WiFi
   interface is preferred over the cellular interface.  If the
   smartphone learns a default route via both interfaces, it will
   typically prefer to use the WiFi interface to send its DNS request
   and to create the first subflow.  This is not optimal with
   Multipath TCP.  A better approach would probably be to make a few
   attempts over the WiFi interface and then try to use the second
   interface for the initial subflow as well.

9.  Conclusion

   In this document, we have documented a few years of experience with
   Multipath TCP.  The information presented in this document was
   gathered from scientific publications and from discussions with
   various users of the Multipath TCP implementation in the Linux
   kernel.

10.  Acknowledgements

   This work was partially supported by the FP7-Trilogy2 project.  We
   would like to thank all the implementers and users of the Multipath
   TCP implementation in the Linux kernel.

11.  Informative References

   [CACM14]   Paasch, C. and O. Bonaventure, "Multipath TCP",
              Communications of the ACM, 57(4):51-57, April 2014.

   [CONEXT12] Khalili, R., Gast, N., Popovic, M., Upadhyay, U., and J.
              Leboudec, "MPTCP is not Pareto-optimal: performance
              issues and a possible solution", Proceedings of the 8th
              international conference on Emerging networking
              experiments and technologies (CoNEXT12), 2012.

   [CONEXT13] Paasch, C., Khalili, R., and O. Bonaventure, "On the
              Benefits of Applying Experimental Design to Improve
              Multipath TCP", Conference on emerging Networking
              EXperiments and Technologies (CoNEXT), December 2013.

   [CSWS14]   Paasch, C., Ferlin, S., Alay, O., and O. Bonaventure,
              "Experimental Evaluation of Multipath TCP Schedulers",
              SIGCOMM CSWS2014 workshop, August 2014.

   [Cellnet12]
              Paasch, C., Detal, G., Duchene, F., Raiciu, C., and O.
              Bonaventure, "Exploring Mobile/WiFi Handover with
              Multipath TCP", ACM SIGCOMM workshop on Cellular
              Networks (Cellnet12), 2012.

   [HotMiddlebox13]
              Hesmans, B., Duchene, F., Paasch, C., Detal, G., and O.
              Bonaventure, "Are TCP Extensions Middlebox-proof?",
              CoNEXT workshop HotMiddlebox, December 2013.

   [HotNets]  Raiciu, C., Pluntke, C., Barre, S., Greenhalgh, A.,
              Wischik, D., and M. Handley, "Data center networking
              with multipath TCP", Proceedings of the 9th ACM SIGCOMM
              Workshop on Hot Topics in Networks (Hotnets-IX), 2010.

   [I-D.eardley-mptcp-implementations-survey]
              Eardley, P., "Survey of MPTCP Implementations", draft-
              eardley-mptcp-implementations-survey-02 (work in
              progress), July 2013.

   [I-D.vandergaast-edns-client-subnet]
              Contavalli, C., Gaast, W., Leach, S., and E. Lewis,
              "Client Subnet in DNS Requests", draft-vandergaast-edns-
              client-subnet-02 (work in progress), July 2013.
   [ICC14]    Kuhn, N., Lochin, E., Mifdaoui, A., Sarwar, G., Mehani,
              O., and R. Boreli, "DAPS: Intelligent Delay-Aware Packet
              Scheduling For Multipath Transport", IEEE ICC 2014,
              2014.

   [ICNP12]   Cao, Y., Xu, M., and X. Fu, "Delay-based congestion
              control for multipath TCP", 20th IEEE International
              Conference on Network Protocols (ICNP), 2012.

   [IMC11]    Honda, M., Nishida, Y., Raiciu, C., Greenhalgh, A.,
              Handley, M., and H. Tokuda, "Is it still possible to
              extend TCP?", Proceedings of the 2011 ACM SIGCOMM
              conference on Internet measurement conference (IMC '11),
              2011.

   [IMC13a]   Detal, G., Hesmans, B., Bonaventure, O., Vanaubel, Y.,
              and B. Donnet, "Revealing Middlebox Interference with
              Tracebox", Proceedings of the 2013 ACM SIGCOMM
              conference on Internet measurement conference, 2013.

   [IMC13b]   Chen, Y., Lim, Y., Gibbens, R., Nahum, E., Khalili, R.,
              and D. Towsley, "A measurement-based study of MultiPath
              TCP performance over wireless networks", Proceedings of
              the 2013 conference on Internet measurement conference
              (IMC '13), 2013.

   [IMC13c]   Pelsser, C., Cittadini, L., Vissicchio, S., and R. Bush,
              "From Paris to Tokyo: on the suitability of ping to
              measure latency", Proceedings of the 2013 conference on
              Internet measurement conference (IMC '13), 2013.

   [INFOCOM14]
              Lim, Y., Chen, Y., Nahum, E., Towsley, D., and K. Lee,
              "Cross-Layer Path Management in Multi-path Transport
              Protocol for Mobile Devices", IEEE INFOCOM'14, 2014.

   [IOS7]     "Multipath TCP Support in iOS 7", January 2014.

   [MBTest]   Hesmans, B., "MBTest", 2013.

   [MultipathTCP-Linux]
              Paasch, C., Barre, S., et al., "Multipath TCP
              implementation in the Linux kernel", n.d.

   [NSDI11]   Wischik, D., Raiciu, C., Greenhalgh, A., and M. Handley,
              "Design, implementation and evaluation of congestion
              control for Multipath TCP", Proceedings of the 8th
              USENIX conference on Networked systems design and
              implementation (NSDI11), 2011.

   [NSDI12]   Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
              Duchene, F., Bonaventure, O., and M. Handley, "How Hard
              Can It Be?  Designing and Implementing a Deployable
              Multipath TCP", USENIX Symposium of Networked Systems
              Design and Implementation (NSDI12), April 2012.

   [PAMS2014] Arzani, B., Gurney, A., Cheng, S., Guerin, R., and B.
              Loo, "Impact of Path Selection and Scheduling Policies
              on MPTCP Performance", PAMS2014, 2014.

   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
              RFC 1812, June 1995.

   [RFC6182]  Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
              Iyengar, "Architectural Guidelines for Multipath TCP
              Development", RFC 6182, March 2011.

   [RFC6356]  Raiciu, C., Handley, M., and D. Wischik, "Coupled
              Congestion Control for Multipath Transport Protocols",
              RFC 6356, October 2011.

   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, January 2013.

   [SIGCOMM11]
              Raiciu, C., Barre, S., Pluntke, C., Greenhalgh, A.,
              Wischik, D., and M. Handley, "Improving datacenter
              performance and robustness with multipath TCP",
              Proceedings of the ACM SIGCOMM 2011 conference, 2011.

   [TNC13]    van der Pol, R., Bredel, M., and A. Barczyk,
              "Experiences with MPTCP in an intercontinental
              multipathed OpenFlow network", TNC2013, 2013.
   [tracebox] Detal, G., "tracebox", 2013.

Authors' Addresses

   Olivier Bonaventure
   UCLouvain

   Email: Olivier.Bonaventure@uclouvain.be

   Christoph Paasch
   UCLouvain

   Email: Christoph.Paasch@uclouvain.be