MPTCP Working Group                                        O. Bonaventure
Internet-Draft                                                  C. Paasch
Intended status: Informational                                  UCLouvain
Expires: September 9, 2015                                       G. Detal
                                                    UCLouvain and Tessares
                                                            March 08, 2015

                      Experience with Multipath TCP
                     draft-ietf-mptcp-experience-01

Abstract

   This document discusses operational experience with Multipath TCP in
   real-world networks.  It lists several prominent use cases for which
   Multipath TCP has been considered and is being used.  It also gives
   insight into some heuristics and decisions that have helped to
   realize these use cases.  Further, it presents several open issues
   for which it is not yet clear how they can be solved.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 9, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Middlebox interference
   3.  Use cases
   4.  Congestion control
   5.  Subflow management
     5.1.  Implemented subflow managers
     5.2.  Subflow destination port
     5.3.  Closing subflows
   6.  Packet schedulers
   7.  Segment size selection
   8.  Interactions with the Domain Name System
   9.  Captive portals
   10. Conclusion
   11. Acknowledgements
   12. Changelog
   13. Informative References
   Authors' Addresses

1.  Introduction

   Multipath TCP was standardized in [RFC6824] and four implementations
   have been developed [I-D.eardley-mptcp-implementations-survey].
   Since the publication of [RFC6824], some experience has been gathered
   by various network researchers and users about the issues that arise
   when Multipath TCP is used in the Internet.

   Most of the experience reported in this document comes from the
   utilization of the Multipath TCP implementation in the Linux kernel
   [MultipathTCP-Linux].  It has been downloaded and is used by
   thousands of users all over the world.  Many of these users have
   provided direct or indirect feedback by writing documents (scientific
   articles or blog posts) or by posting to the mptcp-dev mailing list
   (https://listes-2.sipr.ucl.ac.be/sympa/arc/mptcp-dev).  This
   Multipath TCP implementation is actively maintained and continuously
   improved.  It is used on various types of hosts, ranging from
   smartphones or embedded systems to high-end servers.

   This is, however, by far not the most widespread deployment of
   Multipath TCP.  Since September 2013, Multipath TCP is also supported
   on smartphones and tablets running iOS7 [IOS7].  There are likely
   hundreds of millions of Multipath TCP enabled devices.  However, this
   particular Multipath TCP implementation is currently only used to
   support a single application.  Unfortunately, there is no public
   information about the lessons learned from this large scale
   deployment.

   This document is organized as follows.  Section 2 explains which
   types of middleboxes the Linux kernel implementation of Multipath TCP
   supports and how it reacts upon encountering them.  Next, Section 3
   lists several use cases of Multipath TCP.  Section 4 summarises the
   MPTCP-specific congestion control schemes that have been implemented.
   Sections 5 and 6 discuss heuristics and issues with respect to
   subflow management as well as the scheduling across the subflows.
   Section 7 explains some problems that occurred with subflows having
   different MSS values.  Section 8 presents issues with respect to
   content delivery networks and suggests a solution to this issue.
   Finally, Section 9 discusses an issue with captive portals where
   MPTCP behaves suboptimally.

2.  Middlebox interference

   The interference caused by various types of middleboxes has been an
   important concern during the design of the Multipath TCP protocol.
   Three studies on the interactions between Multipath TCP and
   middleboxes are worth being discussed.

   The first analysis was described in [IMC11].  This paper was the main
   motivation for including in Multipath TCP various techniques to cope
   with middlebox interference.  More specifically, Multipath TCP has
   been designed to cope with middleboxes that:

   -  change source or destination addresses

   -  change source or destination port numbers

   -  change TCP sequence numbers

   -  split or coalesce segments

   -  remove TCP options

   -  modify the payload of TCP segments

   These middlebox interferences have all been included in the MBtest
   suite [MBTest].  This test suite has been used in [HotMiddlebox13] to
   verify the reaction of the Multipath TCP implementation in the Linux
   kernel when faced with middlebox interference.  The test environment
   used for this evaluation is a dual-homed client connected to a
   single-homed server.  The middlebox behavior can be activated on any
   of the paths.  The main results of this analysis are:

   o  the Multipath TCP implementation in the Linux kernel is not
      affected by a middlebox that performs NAT or modifies TCP sequence
      numbers

   o  when a middlebox removes the MP_CAPABLE option from the initial
      SYN segment, the Multipath TCP implementation in the Linux kernel
      falls back correctly to regular TCP

   o  when a middlebox removes the DSS option from all data segments,
      the Multipath TCP implementation in the Linux kernel falls back
      correctly to regular TCP

   o  when a middlebox performs segment coalescing, the Multipath TCP
      implementation in the Linux kernel is still able to accurately
      extract the data corresponding to the indicated mapping

   o  when a middlebox performs segment splitting, the Multipath TCP
      implementation in the Linux kernel correctly reassembles the data
      corresponding to the indicated mapping.  [HotMiddlebox13]
      documents a corner case with segment splitting that may lead to
      desynchronisation between the two hosts.

   The interactions between Multipath TCP and real deployed middleboxes
   are also analyzed in [HotMiddlebox13], which describes a particular
   scenario involving the FTP application level gateway running on a
   NAT.

   From an operational viewpoint, knowing that Multipath TCP can cope
   with various types of middlebox interference is important.  However,
   there are situations where network operators need to gather
   information about where a particular middlebox interference occurs.
   The tracebox software [tracebox] described in [IMC13a] is an
   extension of the popular traceroute software that enables network
   operators to check at which hop a particular field of the TCP header
   (including options) is modified.  It has been used by several network
   operators to debug various middlebox interference problems.  tracebox
   includes a scripting language that enables its user to specify
   precisely which packet is sent by the source.
   tracebox sends packets with an increasing TTL/HopLimit and compares
   the information returned in the ICMP messages with the packet that it
   sends.  This enables tracebox to detect any interference caused by
   middleboxes on a given path.  tracebox works better when routers
   implement the ICMP extension defined in [RFC1812].

   Users of the Multipath TCP implementation have reported some
   experience with middlebox interference.  The strangest scenario has
   been a middlebox that accepts the Multipath TCP options in the SYN
   segment but later replaces Multipath TCP options with a TCP EOL
   option [StrangeMbox].  This causes Multipath TCP to perform a
   fallback to regular TCP without any impact on the application.

3.  Use cases

   Multipath TCP has been tested in several use cases.  Several of the
   papers published in the scientific literature have identified
   possible improvements that are worth being discussed here.

   A first, although initially unexpected, documented use case for
   Multipath TCP has been datacenters [HotNets][SIGCOMM11].  Today's
   datacenters are designed to provide several paths between single-
   homed servers.  The multiplicity of these paths comes from the
   utilization of Equal Cost Multipath (ECMP) and other load balancing
   techniques inside the datacenter.  Most of the load balancing
   techniques deployed in these datacenters rely on hashes computed over
   the five-tuple to ensure that all packets from the same TCP
   connection follow the same path and thus avoid packet reordering.
   The results presented in [HotNets] demonstrate by simulation that
   Multipath TCP can achieve a better utilization of the available
   network by using multiple subflows for each Multipath TCP session.
   Although [RFC6182] assumes that at least one of the communicating
   hosts has several IP addresses, [HotNets] demonstrates that there are
   also benefits when both hosts are single-homed.  This idea was
   pursued further in [SIGCOMM11] where the Multipath TCP implementation
   in the Linux kernel was modified to be able to use several subflows
   from the same IP address.  Measurements performed in a public
   datacenter showed performance improvements with Multipath TCP
   [SIGCOMM11].

   Although ECMP is widely used inside datacenters, this is not the only
   environment where there are different paths between a pair of hosts.
   ECMP and other load balancing techniques such as LAG are widely used
   in today's networks, and having multiple paths between a pair of
   single-homed hosts is becoming the norm instead of the exception.
   Although these multiple paths often have the same cost (from an IGP
   metrics viewpoint), they do not necessarily have the same
   performance.  For example, [IMC13c] reports the results of a long
   measurement study showing that load-balanced Internet paths between
   the same pair of hosts can have huge delay differences.

   A second use case that has been explored by several network
   researchers is the cellular/WiFi offload use case.  Smartphones or
   other mobile devices equipped with two wireless interfaces are a very
   common use case for Multipath TCP.  As of this writing, this is also
   the largest deployment of Multipath TCP enabled devices [IOS7].
230 Unfortunately, as there are no public measurements about this 231 deployment, we can only rely on published papers that have mainly 232 used the Multipath TCP implementation in the Linux kernel for their 233 experiment. 235 The performance of Multipath TCP in wireless networks was briefly 236 evaluated in [NSDI12]. One experiment analyzes the performance of 237 Multipath TCP on a client with two wireless interfaces. This 238 evaluation shows that when the receive window is large, Multipath TCP 239 can efficiently use the two available links. However, if the window 240 becomes smaller, then packets sent on a slow path can block the 241 transmission of packets on a faster path. In some cases, the 242 performance of Multipath TCP over two paths can become lower than the 243 performance of regular TCP over the best performing path. Two 244 heuristics, reinjection and penalization, are proposed in [NSDI12] to 245 solve this identified performance problem. These two heuristics have 246 since been used in the Multipath TCP implementation in the Linux 247 kernel. [CONEXT13] explored the problem in more details and revealed 248 some other scenarios where Multipath TCP can have difficulties in 249 efficiently pooling the available paths. Improvements to the 250 Multipath TCP implementation in the Linux kernel are proposed in 251 [CONEXT13] to cope with some of these problems. 253 The first experimental analysis of Multipath TCP in a public wireless 254 environment was presented in [Cellnet12]. These measurements explore 255 the ability of Multipath TCP to use two wireless networks (real WiFi 256 and 3G networks). Three modes of operation are compared. The first 257 mode of operation is the simultaneous use of the two wireless 258 networks. In this mode, Multipath TCP pools the available resources 259 and uses both wireless interfaces. This mode provides fast handover 260 from WiFi to cellular or the opposite when the user moves. 261 Measurements presented in [CACM14] show that the handover from one 262 wireless network to another is not an abrupt process. When a host 263 moves, it does not experience either excellent connectivity or no 264 connectivity at all. Instead, there are regions where the quality of 265 one of the wireless networks is weaker than the other, but the host 266 considers this wireless network to still be up. When a mobile host 267 enters such regions, its ability to send packets over another 268 wireless network is important to ensure a smooth handover. This is 269 clearly illustrated from the packet trace discussed in [CACM14]. 271 Many cellular networks use volume-based pricing and users often 272 prefer to use unmetered WiFi networks when available instead of 273 metered cellular networks. [Cellnet12] implements the support for 274 the MP_PRIO option to explore two other modes of operation. 276 In the backup mode, Multipath TCP opens a TCP subflow over each 277 interface, but the cellular interface is configured in backup mode. 278 This implies that data only flows over the WiFi interface when both 279 interfaces are considered to be active. If the WiFi interface fails, 280 then the traffic switches quickly to the cellular interface, ensuring 281 a smooth handover from the user's viewpoint [Cellnet12]. The cost of 282 this approach is that the WiFi and cellular interfaces likely remain 283 active all the time since all subflows are established over the two 284 interfaces. 286 The single-path mode is slightly different. 
This mode benefits from 287 the break-before-make capability of Multipath TCP. When an MPTCP 288 session is established, a subflow is created over the WiFi interface. 290 No packet is sent over the cellular interface as long as the WiFi 291 interface remains up [Cellnet12]. This implies that the cellular 292 interface can remain idle and battery capacity is preserved. When 293 the WiFi interface fails, new subflows are established over the 294 cellular interface in order to preserve the established Multipath TCP 295 sessions. Compared to the backup mode described earlier, this mode 296 of operation is characterized by a throughput drop while the cellular 297 interface is brought up and the subflows are reestablished. During 298 this time, no data packet is transmitted. 300 From a protocol viewpoint, [Cellnet12] discusses the problem posed by 301 the unreliability of the ADD_ADDR option and proposes a small 302 protocol extension to allow hosts to reliably exchange this option. 303 It would be useful to analyze packet traces to understand whether the 304 unreliability of the REMOVE_ADDR option poses an operational problem 305 in real deployments. 307 Another study of the performance of Multipath TCP in wireless 308 networks was reported in [IMC13b]. This study uses laptops connected 309 to various cellular ISPs and WiFi hotspots. It compares various file 310 transfer scenarios and concludes based on measurements with the 311 Multipath TCP implementation in the Linux kernel that "MPTCP provides 312 a robust data transport and reduces variations in download 313 latencies". 315 A different study of the performance of Multipath TCP with two 316 wireless networks is presented in [INFOCOM14]. In this study the two 317 networks had different qualities : a good network and a lossy 318 network. When using two paths with different packet loss ratios, the 319 Multipath TCP congestion control scheme moves traffic away from the 320 lossy link that is considered to be congested. However, [INFOCOM14] 321 documents an interesting scenario that is summarised in the figure 322 below. 324 client ----------- path1 -------- server 325 | | 326 +--------------- path2 ------------+ 328 Figure 1: Simple network topology 330 Initially, the two paths have the same quality and Multipath TCP 331 distributes the load over both of them. During the transfer, the 332 second path becomes lossy, e.g. because the client moves. Multipath 333 TCP detects the packet losses and they are retransmitted over the 334 first path. This enables the data transfer to continue over the 335 first path. However, the subflow over the second path is still up 336 and transmits one packet from time to time. Although the N packets 337 have been acknowledged over the first subflow (at the MPTCP level), 338 they have not been acknowledged at the TCP level over the second 339 subflow. To preserve the continuity of the sequence numbers over the 340 second subflow, TCP will continue to retransmit these segments until 341 either they are acknowledged or the maximum number of retransmissions 342 is reached. This behavior is clearly inefficient and may lead to 343 blocking since the second subflow will consume window space to be 344 able to retransmit these packets. [INFOCOM14] proposes a new 345 Multipath TCP option to solve this problem. In practice, a new TCP 346 option is probably not required. 
   When the client detects that the data transmitted over the second
   subflow has been acknowledged over the first subflow, it could decide
   to terminate the second subflow by sending a RST segment.  If the
   interface associated with this subflow is still up, a new subflow
   could be immediately reestablished.  It would then be immediately
   usable to send new data and would not be forced to first retransmit
   the previously transmitted data.  As of this writing, this dynamic
   management of the subflows is not yet implemented in the Multipath
   TCP implementation in the Linux kernel.

   A third use case has been the coupling between software defined
   networking techniques such as OpenFlow and Multipath TCP.  OpenFlow
   can be used to configure different paths inside a network.  Using an
   international network, [TNC13] demonstrates that Multipath TCP can
   achieve high throughput in the wide area.  An interesting point to
   note about the measurements reported in [TNC13] is that the
   measurement setup used four paths through the WAN.  Only two of these
   paths were disjoint.  When Multipath TCP was used, the congestion
   control scheme ensured that only two of these paths were actually
   used.

4.  Congestion control

   Congestion control has been an important problem for Multipath TCP.
   The standardised congestion control scheme for Multipath TCP is
   defined in [RFC6356] and [NSDI11].  This congestion control scheme
   has been implemented in the Linux implementation of Multipath TCP.
   Linux uses a modular architecture to support various congestion
   control schemes.  This architecture is applicable to both regular TCP
   and Multipath TCP.  While the coupled congestion control scheme
   defined in [RFC6356] is the default congestion control scheme in the
   Linux implementation, other congestion control schemes have been
   added.  The second congestion control scheme is OLIA [CONEXT12].
   This congestion control scheme is also an adaptation of the NewReno
   single path congestion control scheme to support multiple paths.
   Simulations and measurements have shown that it provides some
   performance benefits compared to the default congestion control
   scheme [CONEXT12].  Measurements over a wide range of parameters
   reported in [CONEXT13] also indicate some benefits with the OLIA
   congestion control scheme.  Recently, a delay-based congestion
   control scheme has been ported to the Multipath TCP implementation in
   the Linux kernel.  This congestion control scheme has been evaluated
   by using simulations in [ICNP12].
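   As an illustration, the sketch below restates the coupled increase of
   the congestion control scheme defined in [RFC6356], with the
   congestion windows expressed in bytes.  It is only a simplified
   rendering of the formula, not the code used in the Linux kernel; the
   structure and function names are hypothetical.

   #include <stdio.h>

   /* Illustrative sketch of the coupled increase of [RFC6356].
    * Windows and MSS are in bytes, RTTs in any consistent unit. */
   struct sf {
       double cwnd;   /* congestion window of the subflow (bytes) */
       double rtt;    /* smoothed round-trip time of the subflow  */
       double mss;    /* maximum segment size of the subflow      */
   };

   static double lia_alpha(const struct sf *s, int n)
   {
       double total = 0, best = 0, denom = 0;
       for (int i = 0; i < n; i++) {
           double r = s[i].cwnd / (s[i].rtt * s[i].rtt);
           if (r > best)
               best = r;
           denom += s[i].cwnd / s[i].rtt;
           total += s[i].cwnd;
       }
       return total * best / (denom * denom);
   }

   /* Increase applied to subflow i when bytes_acked are acked on it. */
   static double lia_increase(const struct sf *s, int n, int i,
                              double bytes_acked)
   {
       double total = 0, alpha = lia_alpha(s, n);
       for (int j = 0; j < n; j++)
           total += s[j].cwnd;
       double coupled   = alpha * bytes_acked * s[i].mss / total;
       double uncoupled = bytes_acked * s[i].mss / s[i].cwnd;
       return coupled < uncoupled ? coupled : uncoupled;
   }

   int main(void)
   {
       struct sf subflows[2] = { { 20000, 0.02, 1428 },
                                 { 40000, 0.10, 1428 } };
       printf("increase on subflow 0: %.1f bytes\n",
              lia_increase(subflows, 2, 0, 1428));
       return 0;
   }

   The min() with the uncoupled NewReno increase ensures that a
   Multipath TCP connection never takes more capacity on a single path
   than a regular TCP connection would.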
5.  Subflow management

   The multipath capability of Multipath TCP comes from the utilization
   of one subflow per path.  The Multipath TCP architecture [RFC6182]
   and the protocol specification [RFC6824] define the basic usage of
   the subflows and the protocol mechanisms that are required to create
   and terminate them.  However, there are no guidelines on how subflows
   are used during the lifetime of a Multipath TCP session.  Most of the
   experiments with Multipath TCP have been performed in controlled
   environments.  Still, based on the experience running them and
   discussions on the mptcp-dev mailing list, interesting lessons have
   been learned about the management of these subflows.

   From a subflow viewpoint, the Multipath TCP protocol is completely
   symmetrical.  Both the clients and the server have the capability to
   create subflows.  However, in practice the existing Multipath TCP
   implementations [I-D.eardley-mptcp-implementations-survey] have opted
   for a strategy where only the client creates new subflows.  The main
   motivation for this strategy is that often the client resides behind
   a NAT or a firewall, preventing passive subflow openings on the
   client.  Although there are environments such as datacenters where
   this problem does not occur, as of this writing, no precise
   requirement has emerged for allowing the server to create new
   subflows.

5.1.  Implemented subflow managers

   The Multipath TCP implementation in the Linux kernel includes several
   strategies to manage the subflows that compose a Multipath TCP
   session.  The basic subflow manager is the full-mesh.  As the name
   implies, it creates a full-mesh of subflows between the communicating
   hosts.

   The most frequent use case for this subflow manager is a multihomed
   client connected to a single-homed server.  In this case, one subflow
   is created for each interface on the client.  The current
   implementation of the full-mesh subflow manager is static.  The
   subflows are created immediately after the creation of the initial
   subflow.  If one subflow fails during the lifetime of the Multipath
   TCP session (e.g. due to excessive retransmissions, or the loss of
   the corresponding interface), it is not always reestablished.  There
   is ongoing work to enhance the full-mesh path manager to deal with
   such events.

   When the server is multihomed, using the full-mesh subflow manager
   may lead to a large number of subflows being established.  For
   example, consider a dual-homed client connected to a server with
   three interfaces.  In this case, even if the subflows are only
   created by the client, 6 subflows will be established.  This may be
   excessive in some environments, in particular when the client and/or
   the server have a large number of interfaces.  It should be noted
   that there have been reports on the mptcp-dev mailing list indicating
   that users rely on Multipath TCP to aggregate more than four
   different interfaces.  Thus, there is a need for supporting many
   interfaces efficiently.

   It should be noted that creating subflows between multihomed clients
   and servers may sometimes lead to operational issues as observed in
   discussions on the mptcp-dev mailing list.  In some cases, network
   operators would like to have better control over how the subflows are
   created by Multipath TCP.  This might require the definition of
   policy rules to control the operation of the subflow manager.  The
   two scenarios below illustrate some of these requirements.

      host1 ---------- switch1 ----- host2
        |                 |            |
        +-------------- switch2 -------+

          Figure 2: Simple switched network topology

   Consider the simple network topology shown in Figure 2.  From an
   operational viewpoint, a network operator could want to create two
   subflows between the communicating hosts.  From a bandwidth
   utilization viewpoint, the most natural paths are host1-switch1-host2
   and host1-switch2-host2.  However, a Multipath TCP implementation
   running on these two hosts may sometimes have difficulties achieving
   this result.

   To understand the difficulty, let us consider different allocation
   strategies for the IP addresses.  A first strategy is to assign two
   subnets: subnetA (resp. subnetB) contains the IP addresses of
   host1's interface to switch1 (resp.
switch2) and host2's interface to 472 switch1 (resp. switch2). In this case, a Multipath TCP subflow 473 manager should only create one subflow per subnet. To enforce the 474 utilization of these paths, the network operator would have to 475 specify a policy that prefers the subflows in the same subnet over 476 subflows between addresses in different subnets. It should be noted 477 that the policy should probably also specify how the subflow manager 478 should react when an interface or subflow fails. 480 A second strategy is to use a single subnet for all IP addresses. In 481 this case, it becomes more difficult to specify a policy that 482 indicates which subflows should be established. 484 The second subflow manager that is currently supported by the 485 Multipath TCP implementation in the Linux kernel is the ndiffport 486 subflow manager. This manager was initially created to exploit the 487 path diversity that exists between single-homed hosts due to the 488 utilization of flow-based load balancing techniques. This subflow 489 manager creates N subflows between the same pair of IP addresses. 490 The N subflows are created by the client and differ only in the 491 source port selected by the client. 493 5.2. Subflow destination port 495 The Multipath TCP protocol relies on the token contained in the 496 MP_JOIN option to associate a subflow to an existing Multipath TCP 497 session. This implies that there is no restriction on the source 498 address, destination address and source or destination ports used for 499 the new subflow. The ability to use different source and destination 500 addresses is key to support multihomed servers and clients. The 501 ability to use different destination port numbers is worth being 502 discussed because it has operational implications. 504 For illustration, consider a dual-homed client that creates a second 505 subflow to reach a single-homed server as illustrated in the 506 Figure 3. 508 client ------- r1 --- internet --- server 509 | | 510 +----------r2-------+ 512 Figure 3: Multihomed-client connected to single-homed server 514 When the Multipath TCP implementation in the Linux kernel creates the 515 second subflow it uses the same destination port as the initial 516 subflow. This choice is motivated by the fact that the server might 517 be protected by a firewall and only accept TCP connections (including 518 subflows) on the official port number. Using the same destination 519 port for all subflows is also useful for operators that rely on the 520 port numbers to track application usage in their network. 522 There have been suggestions from Multipath TCP users to modify the 523 implementation to allow the client to use different destination ports 524 to reach the server. This suggestion seems mainly motivated by 525 traffic shaping middleboxes that are used in some wireless networks. 526 In networks where different shaping rates are associated to different 527 destination port numbers, this could allow Multipath TCP to reach a 528 higher performance. As of this writing, we are not aware of any 529 implementation of this kind of tweaking. 531 However, from an implementation point-of-view supporting different 532 destination ports for the same Multipath TCP connection introduces a 533 new performance issue. A legacy implementation of a TCP stack 534 creates a listening socket to react upon incoming SYN segments. The 535 listening socket is handling the SYN segments that are sent on a 536 specific port number. 
   Demultiplexing incoming segments can thus be done solely by looking
   at the IP addresses and the port numbers.  With Multipath TCP
   however, incoming SYN segments may have an MP_JOIN option with a
   different destination port.  This means that all incoming segments
   that did not match on an existing listening socket or an already
   established socket must be parsed for a possible MP_JOIN option.
   This imposes an additional cost on servers that did not exist with
   legacy TCP implementations.

5.3.  Closing subflows

                       client                        server
                          |                             |
     MPTCP: established   |                             |  MPTCP: established
     Sub: established     |                             |  Sub: established
                          |                             |
                          |          DATA_FIN           |
     MPTCP: close-wait    | <------------------------   |  close() (step 1)
     Sub: established     |          DATA_ACK           |
                          | ------------------------>   |  MPTCP: fin-wait-2
                          |                             |  Sub: established
                          |                             |
                          |   DATA_FIN + subflow-FIN    |
     close()/shutdown()   | ------------------------>   |  MPTCP: time-wait
     (step 2)             |          DATA_ACK           |  Sub: close-wait
     MPTCP: closed        | <------------------------   |
     Sub: fin-wait-2      |                             |
                          |                             |
                          |         subflow-FIN         |
     MPTCP: closed        | <------------------------   |  subflow-close()
     Sub: time-wait       |         subflow-ACK         |
     (step 3)             | ------------------------>   |  MPTCP: time-wait
                          |                             |  Sub: closed
                          |                             |

      Figure 4: Multipath TCP may not be able to avoid time-wait state
                  (even if enforced by the application).

   Figure 4 shows a very particular issue within Multipath TCP.  Many
   high-performance applications try to avoid Time-Wait state by
   deferring the closure of the connection until the peer has sent a
   FIN.  That way, the client on the left of Figure 4 does a passive
   closure of the connection, transitioning from Close-Wait to Last-ACK
   and finally freeing the resources after reception of the ACK of the
   FIN.  An application running on top of a Multipath TCP enabled Linux
   kernel might also use this approach.  The difference here is that the
   close() of the connection (step 1 in Figure 4) only triggers the
   sending of a DATA_FIN.  Nothing guarantees that the kernel is ready
   to combine the DATA_FIN with a subflow-FIN.  The reception of the
   DATA_FIN will make the application trigger the closure of the
   connection (step 2), trying to avoid Time-Wait state with this late
   closure.  This time, the kernel might decide to combine the DATA_FIN
   with a subflow-FIN.  This decision will be fatal, as the subflow's
   state machine will not transition from Close-Wait to Last-ACK, but
   rather go through Fin-Wait-2 into Time-Wait state.  The Time-Wait
   state will consume resources on the host for at least 2 MSL (Maximum
   Segment Lifetime).  Thus, a smart application that tries to avoid
   Time-Wait state by doing a late closure of the connection actually
   ends up with one of its subflows in Time-Wait state.  A high-
   performance Multipath TCP kernel implementation should honor the
   desire of the application to do passive closure of the connection and
   successfully avoid Time-Wait state - even on the subflows.

   The solution to this problem lies in an optimistic assumption that a
   host doing active closure of a Multipath TCP connection by sending a
   DATA_FIN will soon also send a FIN on all its subflows.  Thus, the
   passive closer of the connection can simply wait for the peer to send
   exactly this FIN - enforcing passive closure even on the subflows.
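   A minimal sketch of this heuristic is shown below.  The data
   structure and helper names are hypothetical and do not correspond to
   the actual Linux kernel code; the sketch only shows the decision of
   never initiating the subflow closure on the passive-closer side.

   #include <stdio.h>

   /* Hypothetical per-subflow state after the connection-level close. */
   struct subflow {
       int peer_fin_received;  /* subflow-level FIN seen from the peer */
       int local_fin_sent;     /* our subflow-level FIN already sent   */
   };

   /* Stub standing in for the code that emits a FIN on the subflow. */
   static void subflow_send_fin(struct subflow *sf)
   {
       sf->local_fin_sent = 1;
       printf("sending subflow FIN (passive closure, no local Time-Wait)\n");
   }

   /* Called whenever the subflow state changes after the connection-
    * level DATA_FIN exchange: only answer the peer's FIN, never
    * initiate the subflow closure ourselves. */
   static void mptcp_passive_close_subflow(struct subflow *sf)
   {
       if (sf->peer_fin_received && !sf->local_fin_sent)
           subflow_send_fin(sf);
       /* otherwise keep waiting for the peer's FIN */
   }

   int main(void)
   {
       struct subflow sf = { .peer_fin_received = 1, .local_fin_sent = 0 };
       mptcp_passive_close_subflow(&sf);
       return 0;
   }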
   Of course, to avoid consuming resources indefinitely, a timer must
   limit the time our implementation waits for the FIN.

6.  Packet schedulers

   In a Multipath TCP implementation, the packet scheduler is the
   algorithm that is executed when transmitting each packet to decide on
   which subflow it needs to be transmitted.  The packet scheduler
   itself does not have any impact on the interoperability of Multipath
   TCP implementations.  However, it may clearly impact the performance
   of Multipath TCP sessions.  It is important to note that the problem
   of scheduling Multipath TCP packets among subflows is different from
   the problem of scheduling SCTP messages.  SCTP implementations also
   include schedulers, but these are used to schedule the different
   streams.  Multipath TCP uses a single data stream.

   Various researchers have explored, theoretically and by simulations,
   the problem of scheduling packets among Multipath TCP subflows
   [ICC14].  Unfortunately, none of the proposed techniques have been
   implemented and used in real deployments.  A detailed analysis of the
   impact of the packet scheduler will appear in [CSWS14].  This article
   proposes a pluggable architecture for the scheduler used by the
   Multipath TCP implementation in the Linux kernel.  This architecture
   allows researchers to experiment with different types of schedulers.
   Two schedulers are compared in [CSWS14]: round-robin and lowest-rtt-
   first.  The experiments and measurements described in [CSWS14] show
   that the lowest-rtt-first scheduler appears to be the best compromise
   from a performance viewpoint.

   Another study of the packet schedulers is presented in [PAMS2014].
   This study relies on simulations with the Multipath TCP
   implementation in the Linux kernel.  The simulation scenarios
   discussed in [PAMS2014] confirm the impact of the packet scheduler on
   the performance of Multipath TCP.

7.  Segment size selection

   When an application performs a write/send system call, the kernel
   allocates a packet buffer (sk_buff in Linux) to store the data the
   application wants to send.  The kernel will store at most one MSS
   (Maximum Segment Size) of data per buffer.  As the MSS can differ
   amongst subflows, an MPTCP implementation must carefully select the
   MSS used to generate application data.  The Linux kernel
   implementation had various ways of selecting the MSS: the minimum or
   the maximum amongst the different subflows.  However, these MSS
   selection heuristics can cause significant performance issues in some
   environments.  Consider the following example.  An MPTCP connection
   has two established subflows that use an MSS of 1420 and 1428 bytes,
   respectively.  If MPTCP selects the maximum, then the application
   will generate segments of 1428 bytes of data.  An MPTCP
   implementation will have to split the segment in two (a 1420-byte and
   an 8-byte segment) when pushing it on the subflow with the smallest
   MSS.  The latter segment introduces a large overhead, as a single
   data segment now uses 2 slots in the congestion window (counted in
   packets), therefore reducing by ~2 the potential throughput (in
   bytes/s) of this subflow.  Taking the smallest MSS does not solve the
   issue either, as there might be cases where the subflow with the
   smallest MSS only contributes marginally to the overall performance,
   and its MSS would then reduce the potential throughput of the other
   subflows.
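   The short program below illustrates this arithmetic for the example
   above; the congestion window of 10 packets is an arbitrary value
   chosen only for the illustration.

   #include <stdio.h>

   /* Worked example: cost of always generating segments with the
    * largest MSS when one subflow uses a smaller MSS. */
   int main(void)
   {
       const int mss_small = 1420;  /* MSS of the constrained subflow  */
       const int mss_large = 1428;  /* MSS used to generate the data   */
       const int cwnd_pkts = 10;    /* congestion window, in packets   */

       /* Each 1428-byte segment is split into 1420 + 8 bytes on the
        * subflow with the smaller MSS, so it consumes two cwnd slots. */
       int slots_per_segment =
           (mss_large + mss_small - 1) / mss_small;          /* = 2 */
       int bytes_in_flight =
           (cwnd_pkts / slots_per_segment) * mss_large;

       printf("bytes in flight: %d instead of %d\n",
              bytes_in_flight, cwnd_pkts * mss_small);
       /* prints 7140 instead of 14200: roughly half the potential
        * throughput of this subflow */
       return 0;
   }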
   The Linux implementation recently took another approach [DetalMSS].
   Instead of selecting the minimum or maximum value, it now dynamically
   adapts the MSS based on the contribution of all the subflows to the
   connection's throughput.  For this, it computes, for each subflow,
   the potential throughput achieved by selecting each MSS value, taking
   into account the lost space in the cwnd.  It then selects the MSS
   that achieves the highest potential throughput.

8.  Interactions with the Domain Name System

   Multihomed clients such as smartphones could lead to operational
   problems when interacting with the Domain Name System.  When a
   single-homed client performs a DNS query, it receives from its local
   resolver the best answer for its request.  If the client is
   multihomed, the answer returned to the DNS query may vary with the
   interface over which it has been sent.

                     cdn1
                       |
      client -- cellular -- internet -- cdn3
         |                     |
         +------- wifi --------+
                    |
                   cdn2

               Figure 5: Simple network topology

   If the client sends a DNS query over the WiFi interface, the answer
   will point to the cdn2 server while the same request sent over the
   cellular interface will point to the cdn1 server.  This might cause
   problems for CDN providers that locate their servers inside ISP
   networks and have contracts that specify that the CDN server will
   only be accessed from within this particular ISP.  Assume now that
   both the client and the CDN servers support Multipath TCP.  In this
   case, a Multipath TCP session from cdn1 or cdn2 would potentially use
   both the cellular network and the WiFi network.  This would violate
   the contract between the CDN provider and the network operators.  A
   possible solution to prevent this problem would be to modify the DNS
   resolution on the client.  The client subnet EDNS extension defined
   in [I-D.vandergaast-edns-client-subnet] could be used for this
   purpose.  When the client sends a DNS query from its WiFi interface,
   it should also send the client subnet corresponding to the cellular
   interface in this request.  This would indicate to the resolver that
   the answer should be valid for both the WiFi and the cellular
   interfaces (e.g., the cdn3 server).

9.  Captive portals

   Multipath TCP enables a host to use different interfaces to reach a
   server.  In theory, this should ensure connectivity when at least one
   of the interfaces is active.  In practice however, there are some
   particular scenarios with captive portals that may cause operational
   problems.  The reference environment is the following:

      client ----- network1
         |
         +------- internet ------------- server

             Figure 6: Issue with captive portal

   The client is attached to two networks: network1, which provides
   limited connectivity, and a second network interface that provides
   access to the entire Internet.  In practice, this scenario
   corresponds to an open WiFi network with a captive portal for
   network1 and a cellular service for the second interface.  On many
   smartphones, the WiFi interface is preferred over the cellular
   interface.  If the smartphone learns a default route via both
   interfaces, it will typically prefer to use the WiFi interface to
   send its DNS request and create the first subflow.  This is not
   optimal with Multipath TCP.
A better approach would probably be to try a few attempts on 736 the WiFi interface and then try to use the second interface for the 737 initial subflow as well. 739 10. Conclusion 741 In this document, we have documented a few years of experience with 742 Multipath TCP. The information presented in this document was 743 gathered from scientific publications and discussions with various 744 users of the Multipath TCP implementation in the Linux kernel. 746 11. Acknowledgements 748 This work was partially supported by the FP7-Trilogy2 project. We 749 would like to thank all the implementers and users of the Multipath 750 TCP implementation in the Linux kernel. 752 12. Changelog 754 o initial version : September 16th, 2014 : Added section Section 7 755 that discusses some performance problems that appeared with the 756 Linux implementation when using subflows having different MSS 757 values 759 o update with a description of the middlebox that replaces an 760 unknown TCP option with EOL [StrangeMbox] 762 13. Informative References 764 [CACM14] Paasch, C. and O. Bonaventure, "Multipath TCP", 765 Communications of the ACM, 57(4):51-57 , April 2014, 766 . 768 [CONEXT12] 769 Khalili, R., Gast, N., Popovic, M., Upadhyay, U., and J. 770 Leboudec, "MPTCP is not pareto-optimal performance issues 771 and a possible solution", Proceedings of the 8th 772 international conference on Emerging networking 773 experiments and technologies (CoNEXT12) , 2012. 775 [CONEXT13] 776 Paasch, C., Khalili, R., and O. Bonaventure, "On the 777 Benefits of Applying Experimental Design to Improve 778 Multipath TCP", Conference on emerging Networking 779 EXperiments and Technologies (CoNEXT) , December 2013, 780 . 783 [CSWS14] Paasch, C., Ferlin, S., Alay, O., and O. Bonaventure, 784 "Experimental Evaluation of Multipath TCP Schedulers", 785 SIGCOMM CSWS2014 workshop , August 2014. 787 [Cellnet12] 788 Paasch, C., Detal, G., Duchene, F., Raiciu, C., and O. 789 Bonaventure, "Exploring Mobile/WiFi Handover with 790 Multipath TCP", ACM SIGCOMM workshop on Cellular Networks 791 (Cellnet12) , 2012, 792 . 795 [DetalMSS] 796 Detal, G., "Adaptive MSS value", Post on the mptcp-dev 797 mailing list , September 2014, . 801 [HotMiddlebox13] 802 Hesmans, B., Duchene, F., Paasch, C., Detal, G., and O. 803 Bonaventure, "Are TCP Extensions Middlebox-proof?", CoNEXT 804 workshop HotMiddlebox , December 2013, 805 . 808 [HotNets] Raiciu, C., Pluntke, C., Barre, S., Greenhalgh, A., 809 Wischik, D., and M. Handley, "Data center networking with 810 multipath TCP", Proceedings of the 9th ACM SIGCOMM 811 Workshop on Hot Topics in Networks (Hotnets-IX) , 2010, 812 . 814 [I-D.eardley-mptcp-implementations-survey] 815 Eardley, P., "Survey of MPTCP Implementations", draft- 816 eardley-mptcp-implementations-survey-02 (work in 817 progress), July 2013. 819 [I-D.vandergaast-edns-client-subnet] 820 Contavalli, C., Gaast, W., Leach, S., and E. Lewis, 821 "Client Subnet in DNS Requests", draft-vandergaast-edns- 822 client-subnet-02 (work in progress), July 2013. 824 [ICC14] Kuhn, N., Lochin, E., Mifdaoui, A., Sarwar, G., Mehani, 825 O., and R. Boreli, "DAPS Intelligent Delay-Aware Packet 826 Scheduling For Multipath Transport", IEEE ICC 2014 , 2014. 828 [ICNP12] Cao, Y., Xu, M., and X. Fu, "Delay-based congestion 829 control for multipath TCP", 20th IEEE International 830 Conference on Network Protocols (ICNP) , 2012. 832 [IMC11] Honda, M., Nishida, Y., Raiciu, C., Greenhalgh, A., 833 Handley, M., and H. 
Tokuda, "Is it still possible to 834 extend TCP?", Proceedings of the 2011 ACM SIGCOMM 835 conference on Internet measurement conference (IMC '11) , 836 2011, . 838 [IMC13a] Detal, G., Hesmans, B., Bonaventure, O., Vanaubel, Y., and 839 B. Donnet, "Revealing Middlebox Interference with 840 Tracebox", Proceedings of the 2013 ACM SIGCOMM conference 841 on Internet measurement conference , 2013, 842 . 845 [IMC13b] Chen, Y., Lim, Y., Gibbens, R., Nahum, E., Khalili, R., 846 and D. Towsley, "A measurement-based study of MultiPath 847 TCP performance over wireless network", Proceedings of the 848 2013 conference on Internet measurement conference (IMC 849 '13) , n.d., . 851 [IMC13c] Pelsser, C., Cittadini, L., Vissicchio, S., and R. Bush, 852 "From Paris to Tokyo on the suitability of ping to measure 853 latency", Proceedings of the 2013 conference on Internet 854 measurement conference (IMC '13) , 2013, 855 . 857 [INFOCOM14] 858 Lim, Y., Chen, Y., Nahum, E., Towsley, D., and K. Lee, 859 "Cross-Layer Path Management in Multi-path Transport 860 Protocol for Mobile Devices", IEEE INFOCOM'14 , 2014. 862 [IOS7] "Multipath TCP Support in iOS 7", January 2014, 863 . 865 [MBTest] Hesmans, B., "MBTest", 2013, 866 . 868 [MultipathTCP-Linux] 869 Paasch, C., Barre, S., and . et al, "Multipath TCP 870 implementation in the Linux kernel", n.d., 871 . 873 [NSDI11] Wischik, D., Raiciu, C., Greenhalgh, A., and M. Handley, 874 "Design, implementation and evaluation of congestion 875 control for Multipath TCP", In Proceedings of the 8th 876 USENIX conference on Networked systems design and 877 implementation (NSDI11) , 2011. 879 [NSDI12] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., 880 Duchene, F., Bonaventure, O., and M. Handley, "How Hard 881 Can It Be? Designing and Implementing a Deployable 882 Multipath TCP", USENIX Symposium of Networked Systems 883 Design and Implementation (NSDI12) , April 2012, 884 . 887 [PAMS2014] 888 Arzani, B., Gurney, A., Cheng, S., Guerin, R., and B. Loo, 889 "Impact of Path Selection and Scheduling Policies on MPTCP 890 Performance", PAMS2014 , 2014. 892 [RFC1812] Baker, F., "Requirements for IP Version 4 Routers", RFC 893 1812, June 1995. 895 [RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. 896 Iyengar, "Architectural Guidelines for Multipath TCP 897 Development", RFC 6182, March 2011. 899 [RFC6356] Raiciu, C., Handley, M., and D. Wischik, "Coupled 900 Congestion Control for Multipath Transport Protocols", RFC 901 6356, October 2011. 903 [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, 904 "TCP Extensions for Multipath Operation with Multiple 905 Addresses", RFC 6824, January 2013. 907 [SIGCOMM11] 908 Raiciu, C., Barre, S., Pluntke, C., Greenhalgh, A., 909 Wischik, D., and M. Handley, "Improving datacenter 910 performance and robustness with multipath TCP", 911 Proceedings of the ACM SIGCOMM 2011 conference , n.d., 912 . 914 [StrangeMbox] 915 Bonaventure, O., "Multipath TCP through a strange 916 middlebox", Blog post , January 2015, 917 . 920 [TNC13] van der Pol, R., Bredel, M., and A. Barczyk, "Experiences 921 with MPTCP in an intercontinental multipathed OpenFlow 922 network", TNC2013 , 2013. 924 [tracebox] 925 Detal, G., "tracebox", 2013, . 927 Authors' Addresses 929 Olivier Bonaventure 930 UCLouvain 932 Email: Olivier.Bonaventure@uclouvain.be 934 Christoph Paasch 935 UCLouvain 937 Email: Christoph.Paasch@uclouvain.be 939 Gregory Detal 940 UCLouvain and Tessares 942 Email: Gregory.Detal@tessares.net