idnits 2.17.1 

draft-ietf-mptcp-experience-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (September 16, 2014) is 3509 days in the past.  Is
     this intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Obsolete informational reference (is this intentional?): RFC 6824
     (Obsoleted by RFC 8684)


     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	MPTCP Working Group                                       O. Bonaventure
3	Internet-Draft                                                 C. Paasch
4	Intended status: Informational                                  G. Detal
5	Expires: March 20, 2015                                        UCLouvain
6	                                                      September 16, 2014

8	                     Experience with Multipath TCP
9	                     draft-ietf-mptcp-experience-00

11	Abstract

13	   This document discusses operational experience with using Multipath
14	   TCP in real-world networks.  It lists several prominent use cases for
15	   which Multipath TCP has been considered and is being used.  It also
16	   gives insight into some heuristics and decisions that have helped to
17	   realize these use cases.  Further, it presents several open issues
18	   for which it is not yet clear how they can be solved.

20	Status of This Memo

22	   This Internet-Draft is submitted in full conformance with the
23	   provisions of BCP 78 and BCP 79.

25	   Internet-Drafts are working documents of the Internet Engineering
26	   Task Force (IETF).  Note that other groups may also distribute
27	   working documents as Internet-Drafts.  The list of current Internet-
28	   Drafts is at http://datatracker.ietf.org/drafts/current/.

30	   Internet-Drafts are draft documents valid for a maximum of six months
31	   and may be updated, replaced, or obsoleted by other documents at any
32	   time.  It is inappropriate to use Internet-Drafts as reference
33	   material or to cite them other than as "work in progress."

35	   This Internet-Draft will expire on March 20, 2015.

37	Copyright Notice

39	   Copyright (c) 2014 IETF Trust and the persons identified as the
40	   document authors.  All rights reserved.

42	   This document is subject to BCP 78 and the IETF Trust's Legal
43	   Provisions Relating to IETF Documents
44	   (http://trustee.ietf.org/license-info) in effect on the date of
45	   publication of this document.  Please review these documents
46	   carefully, as they describe your rights and restrictions with respect
47	   to this document.  Code Components extracted from this document must
48	   include Simplified BSD License text as described in Section 4.e of
49	   the Trust Legal Provisions and are provided without warranty as
50	   described in the Simplified BSD License.

52	Table of Contents

54	   1.  Introduction
55	   2.  Middlebox interference
56	   3.  Use cases
57	   4.  Congestion control
58	   5.  Subflow management
59	     5.1.  Implemented subflow managers
60	     5.2.  Subflow destination port
61	     5.3.  Closing subflows
62	   6.  Packet schedulers
63	   7.  Segment size selection
64	   8.  Interactions with the Domain Name System
65	   9.  Captive portals
66	   10. Conclusion
67	   11. Acknowledgements
68	   12. Changelog
69	   13. Informative References
70	   Authors' Addresses

72	1.  Introduction

74	   Multipath TCP was standardized in [RFC6824] and four implementations
75	   have been developed [I-D.eardley-mptcp-implementations-survey].
76	   Since the publication of [RFC6824], some experience has been gathered
77	   by various network researchers and users about the issues that arise
78	   when Multipath TCP is used in the Internet.

80	   Most of the experience reported in this document comes from the
81	   utilization of the Multipath TCP implementation in the Linux kernel
82	   [MultipathTCP-Linux].  It has been downloaded and is used by
83	   thousands of users all over the world.  Many of these users have
84	   provided direct or indirect feedback by writing documents (scientific
85	   articles or blog messages) or posting to the mptcp-dev mailing list
86	   (https://listes-2.sipr.ucl.ac.be/sympa/arc/mptcp-dev).  This
87	   Multipath TCP implementation is actively maintained and continuously
88	   improved.  It is used on various types of hosts, ranging from
89	   smartphones or embedded systems to high-end servers.

91	   This is not, by far, the most widespread deployment of Multipath TCP.
92	   Since September 2013, Multipath TCP is also supported on smartphones
93	   and tablets running iOS7 [IOS7].  There are likely hundreds of
94	   millions of Multipath TCP enabled devices.  However, this particular
95	   Multipath TCP implementation is currently only used to support a
96	   single application.  Unfortunately, there is no public information
97	   about the lessons learned from this large scale deployment.

99	   This document is organized as follows.  Section 2 explains which
100	   types of middleboxes the Linux kernel implementation of Multipath TCP
101	   supports and how it reacts upon encountering them.  Section 3 lists
102	   several use cases of Multipath TCP.  Section 4 summarises the
103	   MPTCP-specific congestion control schemes that have been implemented.
104	   Sections 5 and 6 discuss heuristics and issues with respect to
105	   subflow management and to scheduling across the subflows.  Section 7
106	   explains some problems that occurred with subflows having different
107	   MSS values.  Section 8 presents issues with respect to content
108	   delivery networks and suggests a solution.  Finally, Section 9
109	   describes an issue with captive portals, where MPTCP behaves
110	   suboptimally.

113	2.  Middlebox interference

115	   The interference caused by various types of middleboxes has been an
116	   important concern during the design of the Multipath TCP protocol.
117	   Three studies on the interactions between Multipath TCP and
118	   middleboxes are worth discussing.

120	   The first analysis was described in [IMC11].  This paper was the main
121	   motivation for including in Multipath TCP various techniques to cope
122	   with middlebox interference.  More specifically, Multipath TCP has
123	   been designed to cope with middleboxes that change source or
124	   destination addresses, change source or destination port numbers,
125	   change TCP sequence numbers, split or coalesce segments, remove TCP
126	   options, or modify the payload of TCP segments.

128	   These middlebox interferences have all been included in the MBtest
129	   suite [MBTest].  This test suite has been used [HotMiddlebox13] to
130	   verify the reaction of the Multipath TCP implementation in the Linux
131	   kernel when faced with middlebox interference.  The test environment
132	   used for this evaluation is a dual-homed client connected to a
133	   single-homed server.  The middlebox behavior can be activated on any
134	   of the paths.  The main results of this analysis are:

136	   o  the Multipath TCP implementation in the Linux kernel is not
137	      affected by a middlebox that performs NAT or modifies TCP sequence
138	      numbers

140	   o  when a middlebox removes the MP_CAPABLE option from the initial
141	      SYN segment, the Multipath TCP implementation in the Linux kernel
142	      falls back correctly to regular TCP

144	   o  when a middlebox removes the DSS option from all data segments,
145	      the Multipath TCP implementation in the Linux kernel falls back
146	      correctly to regular TCP

148	   o  when a middlebox performs segment coalescing, the Multipath TCP
149	      implementation in the Linux kernel is still able to accurately
150	      extract the data corresponding to the indicated mapping

152	   o  when a middlebox performs segment splitting, the Multipath TCP
153	      implementation in the Linux kernel correctly reassembles the data
154	      corresponding to the indicated mapping.  [HotMiddlebox13]
155	      documents a corner case with segment splitting that may lead to
156	      desynchronisation between the two hosts.

158	   The interactions between Multipath TCP and real deployed middleboxes
159	   are also analyzed in [HotMiddlebox13] and a particular scenario with
160	   the FTP application level gateway running on a NAT is described.

162	   From an operational viewpoint, knowing that Multipath TCP can cope
163	   with various types of middlebox interference is important.  However,
164	   there are situations where the network operators need to gather
165	   information about where a particular middlebox interference occurs.
166	   The tracebox software [tracebox] described in [IMC13a] is an
167	   extension of the popular traceroute software that enables network
168	   operators to check at which hop a particular field of the TCP header
169	   (including options) is modified.  It has been used by several network
170	   operators to debug various middlebox interference problems.  The
171	   tracebox tool includes a scripting language that enables its user to
172	   specify precisely which packet is sent by the source.  It sends
173	   packets with an increasing TTL/HopLimit and compares the information
174	   returned in the ICMP messages with the packet that it sends.  This
175	   enables tracebox to detect any interference caused by middleboxes on
176	   a given path.  The tool works better when routers implement the ICMP
177	   extension defined in [RFC1812].
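
   The hop-by-hop comparison described above can be sketched as follows.
   This is only an illustration and not the tracebox implementation: the
   function name first_modifying_hop, the dictionary representation of
   the header fields and the sample values are assumptions made for this
   sketch, and no packet is actually sent.

```python
def first_modifying_hop(sent, quoted_by_ttl):
    """Return {field: ttl}: the first TTL at which each field quoted in an
    ICMP message differs from the probe that was sent.

    sent          -- dict of header fields of the probe (hypothetical format)
    quoted_by_ttl -- dict mapping TTL -> dict of the fields quoted in the
                     ICMP message returned by the router at that TTL
    """
    changed = {}
    for ttl in sorted(quoted_by_ttl):
        quoted = quoted_by_ttl[ttl]
        for field, value in sent.items():
            # A field missing from the quote cannot be compared: routers may
            # quote only part of the original packet.
            if field not in quoted or field in changed:
                continue
            if quoted[field] != value:
                changed[field] = ttl
    return changed


probe = {"seq": 1000, "options": ("MSS", "MP_CAPABLE")}
replies = {
    1: {"seq": 1000, "options": ("MSS", "MP_CAPABLE")},
    2: {"seq": 1000},                        # partial quote, options unknown
    3: {"seq": 52344, "options": ("MSS",)},  # sequence rewritten, option removed
}
result = first_modifying_hop(probe, replies)
```

   In this example both modifications are first visible at TTL 3.  Since
   the router at TTL 2 quotes only part of the packet, the option could
   have been removed anywhere between hops 1 and 3, which illustrates
   why complete quotes make the tool more precise.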

179	3.  Use cases

181	   Multipath TCP has been tested in several use cases.  Several of the
182	   papers published in the scientific literature have identified
183	   possible improvements that are worth discussing here.

185	   A first, although initially unexpected, documented use case for
186	   Multipath TCP has been the datacenters [HotNets][SIGCOMM11].  Today's
187	   datacenters are designed to provide several paths between single-
188	   homed servers.  The multiplicity of these paths comes from the
189	   utilization of Equal Cost Multipath (ECMP) and other load balancing
190	   techniques inside the datacenter.  Most of the deployed load
191	   balancing techniques in these datacenters rely on hashes computed
192	   over the five-tuple to ensure that all packets from the same TCP
193	   connection will follow the same path to prevent packet reordering.
194	   The results presented in [HotNets] demonstrate by simulations that
195	   Multipath TCP can achieve a better utilization of the available
196	   network by using multiple subflows for each Multipath TCP session.
197	   Although [RFC6182] assumes that at least one of the communicating
198	   hosts has several IP addresses, [HotNets] demonstrates that there are
199	   also benefits when both hosts are single-homed.  This idea was
200	   pursued further in [SIGCOMM11] where the Multipath TCP implementation
201	   in the Linux kernel was modified to be able to use several subflows
202	   from the same IP address.  Measurements performed in a public
203	   datacenter showed performance improvements with Multipath TCP.
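
   The effect of flow-based load balancing on subflows can be sketched
   as follows.  The hash function, the helper name ecmp_path and the
   addresses are assumptions chosen for illustration; real routers and
   switches use their own, often proprietary, hash functions.

```python
import hashlib


def ecmp_path(src_ip, src_port, dst_ip, dst_port, proto, n_paths):
    """Pick a path index from a hash of the five-tuple (illustrative hash)."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


# All packets of one TCP connection share a five-tuple, hence a single path.
same = {ecmp_path("10.0.0.1", 40000, "10.0.1.1", 80, 6, 4) for _ in range(10)}

# Subflows between the same pair of addresses that differ only in their
# source port can be hashed onto different paths.
paths_used = {
    ecmp_path("10.0.0.1", port, "10.0.1.1", 80, 6, 4)
    for port in range(40000, 40064)
}
```

   This is why several subflows between two single-homed hosts can
   exploit path diversity that a single TCP connection cannot.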

205	   Although ECMP is widely used inside datacenters, this is not the only
206	   environment where there are different paths between a pair of hosts.
207	   ECMP and other load balancing techniques such as LAG are widely used
208	   in today's network and having multiple paths between a pair of
209	   single-homed hosts is becoming the norm instead of the exception.
210	   Although these multiple paths often have the same cost (from an IGP
211	   metrics viewpoint), they do not necessarily have the same
212	   performance.  For example, [IMC13c] reports the results of a long
213	   measurement study showing that load balanced Internet paths between
214	   the same pair of hosts can have huge delay differences.

216	   A second use case that has been explored by several network
217	   researchers is the cellular/WiFi offload use case.  Smartphones or
218	   other mobile devices equipped with two wireless interfaces are a very
219	   common use case for Multipath TCP.  As of this writing, this is also
220	   the largest deployment of Multipath-TCP enabled devices [IOS7].
221	   Unfortunately, as there are no public measurements about this
222	   deployment, we can only rely on published papers that have mainly
223	   used the Multipath TCP implementation in the Linux kernel for their
224	   experiment.

226	   The performance of Multipath TCP in wireless networks was briefly
227	   evaluated in [NSDI12].  One experiment analyzes the performance of
228	   Multipath TCP on a client with two wireless interfaces.  This
229	   evaluation shows that when the receive window is large, Multipath TCP
230	   can efficiently use the two available links.  However, if the window
231	   becomes smaller, then packets sent on a slow path can block the
232	   transmission of packets on a faster path.  In some cases, the
233	   performance of Multipath TCP over two paths can become lower than the
234	   performance of regular TCP over the best performing path.  Two
235	   heuristics, reinjection and penalization, are proposed in [NSDI12] to
236	   solve this identified performance problem.  These two heuristics have
237	   since been used in the Multipath TCP implementation in the Linux
238	   kernel.  [CONEXT13] explored the problem in more detail and revealed
239	   some other scenarios where Multipath TCP can have difficulties in
240	   efficiently pooling the available paths.  Improvements to the
241	   Multipath TCP implementation in the Linux kernel are proposed in
242	   [CONEXT13] to cope with some of these problems.

244	   The first experimental analysis of Multipath TCP in a public wireless
245	   environment was presented in [Cellnet12].  These measurements explore
246	   the ability of Multipath TCP to use two wireless networks (real WiFi
247	   and 3G networks).  Three modes of operation are compared.  The first
248	   mode of operation is the simultaneous use of the two wireless
249	   networks.  In this mode, Multipath TCP pools the available resources
250	   and uses both wireless interfaces.  This mode provides fast handover
251	   from WiFi to cellular or the opposite when the user moves.
252	   Measurements presented in [CACM14] show that the handover from one
253	   wireless network to another is not an abrupt process.  When a host
254	   moves, it does not experience either excellent connectivity or no
255	   connectivity at all.  Instead, there are regions where the quality of
256	   one of the wireless networks is weaker than the other, but the host
257	   considers this wireless network to still be up.  When a mobile host
258	   enters such regions, its ability to send packets over another
259	   wireless network is important to ensure a smooth handover.  This is
260	   clearly illustrated by the packet trace discussed in [CACM14].

262	   Many cellular networks use volume-based pricing and users often
263	   prefer to use unmetered WiFi networks when available instead of
264	   metered cellular networks.  [Cellnet12] implements the support for
265	   the MP_PRIO option to explore two other modes of operation.

267	   In the backup mode, Multipath TCP opens a TCP subflow over each
268	   interface, but the cellular interface is configured in backup mode.
269	   This implies that data only flows over the WiFi interface when both
270	   interfaces are considered to be active.  If the WiFi interface fails,
271	   then the traffic switches quickly to the cellular interface, ensuring
272	   a smooth handover from the user's viewpoint [Cellnet12].  The cost of
273	   this approach is that the WiFi and cellular interfaces likely remain
274	   active all the time since all subflows are established over the two
275	   interfaces.
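
   The selection of subflows in the backup mode can be sketched as
   follows.  The Subflow class and the function usable_subflows are
   hypothetical names introduced for this illustration; they do not
   correspond to structures of the Linux implementation.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    name: str
    up: bool
    backup: bool  # priority signalled with MP_PRIO in the backup mode


def usable_subflows(subflows):
    """Regular subflows are preferred; backup subflows carry data only
    when no regular subflow is up."""
    alive = [s for s in subflows if s.up]
    regular = [s for s in alive if not s.backup]
    return regular if regular else alive


wifi = Subflow("wifi", up=True, backup=False)
cell = Subflow("cellular", up=True, backup=True)

before = [s.name for s in usable_subflows([wifi, cell])]
wifi.up = False  # the WiFi interface fails
after = [s.name for s in usable_subflows([wifi, cell])]
```

   Because the cellular subflow is already established, the switch does
   not require a new handshake when the WiFi interface fails.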

277	   The single-path mode is slightly different.  This mode benefits from
278	   the break-before-make capability of Multipath TCP.  When an MPTCP
279	   session is established, a subflow is created over the WiFi interface.
280	   No packet is sent over the cellular interface as long as the WiFi
281	   interface remains up [Cellnet12].  This implies that the cellular
282	   interface can remain idle and battery capacity is preserved.  When
283	   the WiFi interface fails, new subflows are established over the
284	   cellular interface in order to preserve the established Multipath TCP
285	   sessions.  Compared to the backup mode described earlier, this mode
286	   of operation is characterized by a throughput drop while the cellular
287	   interface is brought up and the subflows are reestablished.  During
288	   this time, no data packet is transmitted.

290	   From a protocol viewpoint, [Cellnet12] discusses the problem posed by
291	   the unreliability of the ADD_ADDR option and proposes a small
292	   protocol extension to allow hosts to reliably exchange this option.
293	   It would be useful to analyze packet traces to understand whether the
294	   unreliability of the REMOVE_ADDR option poses an operational problem
295	   in real deployments.

297	   Another study of the performance of Multipath TCP in wireless
298	   networks was reported in [IMC13b].  This study uses laptops connected
299	   to various cellular ISPs and WiFi hotspots.  It compares various file
300	   transfer scenarios and concludes based on measurements with the
301	   Multipath TCP implementation in the Linux kernel that "MPTCP provides
302	   a robust data transport and reduces variations in download
303	   latencies".

305	   A different study of the performance of Multipath TCP with two
306	   wireless networks is presented in [INFOCOM14].  In this study the two
307	   networks had different qualities: a good network and a lossy
308	   network.  When using two paths with different packet loss ratios, the
309	   Multipath TCP congestion control scheme moves traffic away from the
310	   lossy link that is considered to be congested.  However, [INFOCOM14]
311	   documents an interesting scenario that is summarised in the figure
312	   below.

314	   client ----------- path1 -------- server
315	     |                                  |
316	     +--------------- path2 ------------+

318	                     Figure 1: Simple network topology

320	   Initially, the two paths have the same quality and Multipath TCP
321	   distributes the load over both of them.  During the transfer, the
322	   second path becomes lossy, e.g. because the client moves.  Multipath
323	   TCP detects the packet losses and they are retransmitted over the
324	   first path.  This enables the data transfer to continue over the
325	   first path.  However, the subflow over the second path is still up
326	   and occasionally transmits a packet.  Although the lost packets
327	   have been acknowledged over the first subflow (at the MPTCP level),
328	   they have not been acknowledged at the TCP level over the second
329	   subflow.  To preserve the continuity of the sequence numbers over the
330	   second subflow, TCP will continue to retransmit these segments until
331	   either they are acknowledged or the maximum number of retransmissions
332	   is reached.  This behavior is clearly inefficient and may lead to
333	   blocking since the second subflow will consume window space to be
334	   able to retransmit these packets.  [INFOCOM14] proposes a new
335	   Multipath TCP option to solve this problem.  In practice, a new TCP
336	   option is probably not required.  When the client detects that the
337	   data transmitted over the second subflow has been acknowledged over
338	   the first subflow, it could decide to terminate the second subflow by
339	   sending a RST segment.  If the interface associated to this subflow
340	   is still up, a new subflow could be immediately reestablished.  It
341	   would then be immediately usable to send new data and would not be
342	   forced to first retransmit the previously transmitted data.  As of
343	   this writing, this dynamic management of the subflows is not yet
344	   implemented in the Multipath TCP implementation in the Linux kernel.
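
   The condition under which a sender could decide to reset such a
   subflow can be sketched as follows.  This is a hypothetical check,
   not existing code: the function name only_stale_data and its
   arguments are assumptions, and sequence number wrap-around is ignored
   for simplicity.

```python
def only_stale_data(unacked_mappings, data_ack):
    """Return True if every segment that is still unacknowledged at the TCP
    level on a subflow has already been acknowledged at the MPTCP level.

    unacked_mappings -- list of (data_sequence_number, length) pairs that
                        were sent on the subflow and not yet acknowledged
                        by TCP
    data_ack         -- cumulative connection-level acknowledgement
    """
    return bool(unacked_mappings) and all(
        dsn + length <= data_ack for dsn, length in unacked_mappings
    )


# Both segments were retransmitted and acknowledged over the other subflow.
stale = only_stale_data([(1000, 100), (1100, 100)], data_ack=1200)

# The second segment has not yet been acknowledged at the MPTCP level.
still_needed = only_stale_data([(1000, 100), (1100, 100)], data_ack=1150)
```

   When the check returns True, retransmitting on the subflow brings no
   benefit to the connection, and terminating and reestablishing the
   subflow would free it for new data.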

346	   A third use case has been the coupling between software defined
347	   networking techniques such as OpenFlow and Multipath TCP.  OpenFlow
348	   can be used to configure different paths inside a network.  Using an
349	   international network, [TNC13] demonstrates that Multipath TCP can
350	   achieve high throughput in the wide area.  An interesting point to
351	   note about the measurements reported in [TNC13] is that the
352	   measurement setup used four paths through the WAN.  Only two of these
353	   paths were disjoint.  When Multipath TCP was used, the congestion
354	   control scheme ensured that only two of these paths were actually
355	   used.

357	4.  Congestion control

359	   Congestion control has been an important problem for Multipath TCP.
360	   The standardised congestion control scheme for Multipath TCP is
361	   defined in [RFC6356] and [NSDI11].  This congestion control scheme
362	   has been implemented in the Linux implementation of Multipath TCP.
363	   Linux uses a modular architecture to support various congestion
364	   control schemes.  This architecture is applicable for both regular
365	   TCP and Multipath TCP.  While the coupled congestion control scheme
366	   defined in [RFC6356] is the default congestion control scheme in the
367	   Linux implementation, other congestion control schemes have been
368	   added.  The second congestion control scheme is OLIA [CONEXT12].
369	   This congestion control scheme is also an adaptation of the NewReno
370	   single path congestion control scheme to support multiple paths.
371	   Simulations and measurements have shown that it provides some
372	   performance benefits compared to the default congestion control
373	   scheme [CONEXT12].  Measurements over a wide range of parameters
374	   reported in [CONEXT13] also indicate some benefits with the OLIA
375	   congestion control scheme.  Recently, a delay-based congestion
376	   control scheme has been ported to the Multipath TCP implementation in
377	   the Linux kernel.  This congestion control scheme has been evaluated
378	   by using simulations in [ICNP12].  As of this writing, it has not yet
379	   been evaluated by performing large measurement campaigns.
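
   As an illustration, the window increase of the coupled scheme of
   [RFC6356] can be sketched as follows.  The sketch assumes that
   windows are expressed in bytes and round-trip times in seconds; the
   function names are chosen for this example, and the code is not taken
   from the Linux implementation.

```python
def lia_alpha(cwnds, rtts):
    """Aggressiveness factor alpha computed over all subflows:
    cwnd_total * max(cwnd_i / rtt_i^2) / (sum(cwnd_i / rtt_i))^2
    """
    total = sum(cwnds)
    best = max(c / (r * r) for c, r in zip(cwnds, rtts))
    denom = sum(c / r for c, r in zip(cwnds, rtts)) ** 2
    return total * best / denom


def lia_increase(i, cwnds, rtts, mss, bytes_acked):
    """Increase of cwnds[i] (in bytes) for one ACK received on subflow i."""
    coupled = lia_alpha(cwnds, rtts) * bytes_acked * mss / sum(cwnds)
    uncoupled = bytes_acked * mss / cwnds[i]  # what regular TCP would do
    # A subflow is never more aggressive than a regular TCP connection.
    return min(coupled, uncoupled)


single = lia_alpha([14600], [0.1])
two_equal = lia_alpha([14600, 14600], [0.1, 0.1])
step = lia_increase(0, [14600, 14600], [0.1, 0.1], mss=1460, bytes_acked=1460)
```

   With a single subflow, alpha equals 1 and the increase is the same as
   for regular TCP.  With two identical subflows, alpha equals 0.5 and
   each subflow increases its window more slowly than a regular TCP
   connection would.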

381	5.  Subflow management

383	   The multipath capability of Multipath TCP comes from the utilization
384	   of one subflow per path.  The Multipath TCP architecture [RFC6182]
385	   and the protocol specification [RFC6824] define the basic usage of
386	   the subflows and the protocol mechanisms that are required to create
387	   and terminate them.  However, there are no guidelines on how subflows
388	   are used during the lifetime of a Multipath TCP session.  Most of the
389	   experiments with Multipath TCP have been performed in controlled
390	   environments.  Still, based on the experience running them and
391	   discussions on the mptcp-dev mailing list, interesting lessons have
392	   been learned about the management of these subflows.

394	   From a subflow viewpoint, the Multipath TCP protocol is completely
395	   symmetrical.  Both the client and the server have the capability to
396	   create subflows.  However, in practice, the existing Multipath TCP
397	   implementations [I-D.eardley-mptcp-implementations-survey] have opted
398	   for a strategy where only the client creates new subflows.  The main
399	   motivation for this strategy is that often the client resides behind
400	   a NAT or a firewall, preventing passive subflow openings on the
401	   client.  Although there are environments such as datacenters where
402	   this problem does not occur, as of this writing, no precise
403	   requirement has emerged for allowing the server to create new
404	   subflows.

406	5.1.  Implemented subflow managers

408	   The Multipath TCP implementation in the Linux kernel includes several
409	   strategies to manage the subflows that compose a Multipath TCP
410	   session.  The basic subflow manager is the full-mesh.  As the name
411	   implies, it creates a full-mesh of subflows between the communicating
412	   hosts.

414	   The most frequent use case for this subflow manager is a multihomed
415	   client connected to a single-homed server.  In this case, one subflow
416	   is created for each interface on the client.  The current
417	   implementation of the full-mesh subflow manager is static.  The
418	   subflows are created immediately after the creation of the initial
419	   subflow.  If one subflow fails during the lifetime of the Multipath
420	   TCP session (e.g. due to excessive retransmissions, or the loss of
421	   the corresponding interface), it is not always reestablished.  There
422	   is ongoing work to enhance the full-mesh path manager to deal with
423	   such events.

425	   When the server is multihomed, using the full-mesh subflow manager
426	   may lead to a large number of subflows being established.  For
427	   example, consider a dual-homed client connected to a server with
428	   three interfaces.  In this case, even if the subflows are only
429	   created by the client, 6 subflows will be established.  This may be
430	   excessive in some environments, in particular when the client and/or
431	   the server have a large number of interfaces.  It should be noted
432	   that there have been reports on the mptcp-dev mailing list indicating
433	   that users rely on Multipath TCP to aggregate more than four
434	   different interfaces.  Thus, there is a need for supporting many
435	   interfaces efficiently.
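
   The growth of the number of subflows with the full-mesh strategy can
   be sketched as follows.  The function name and the address labels are
   placeholders used for illustration only.

```python
from itertools import product


def full_mesh(client_addrs, server_addrs):
    """One subflow per (client address, server address) pair."""
    return list(product(client_addrs, server_addrs))


# Dual-homed client, server with three interfaces.
pairs = full_mesh(["c1", "c2"], ["s1", "s2", "s3"])
n_subflows = len(pairs)  # 2 * 3 = 6
```

   The number of subflows is the product of the number of addresses on
   each side, which explains why it quickly becomes large on hosts with
   many interfaces.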

437	   It should be noted that creating subflows between multihomed clients
438	   and servers may sometimes lead to operational issues as observed by
439	   discussions on the mptcp-dev mailing list.  In some cases the network
440	   operators would like to have better control over how the subflows
441	   are created by Multipath TCP.  This might require the definition of
442	   policy rules to control the operation of the subflow manager.  The
443	   two scenarios below illustrate some of these requirements.

445	           host1 ----------  switch1 ----- host2
446	             |                   |            |
447	             +--------------  switch2 --------+

449	                Figure 2: Simple switched network topology

451	   Consider the simple network topology shown in Figure 2.  From an
452	   operational viewpoint, a network operator could want to create two
453	   subflows between the communicating hosts.  From a bandwidth
454	   utilization viewpoint, the most natural paths are host1-switch1-host2
455	   and host1-switch2-host2.  However, a Multipath TCP implementation
456	   running on these two hosts may sometimes have difficulty achieving
457	   this result.

459	   To understand the difficulty, let us consider different allocation
460	   strategies for the IP addresses.  A first strategy is to assign two
461	   subnets: subnetA (resp. subnetB) contains the IP addresses of
462	   host1's interface to switch1 (resp. switch2) and host2's interface to
463	   switch1 (resp. switch2).  In this case, a Multipath TCP subflow
464	   manager should only create one subflow per subnet.  To enforce the
465	   utilization of these paths, the network operator would have to
466	   specify a policy that prefers the subflows in the same subnet over
467	   subflows between addresses in different subnets.  It should be noted
468	   that the policy should probably also specify how the subflow manager
469	   should react when an interface or subflow fails.
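
   A policy that prefers subflows within the same subnet could be
   sketched as follows.  The function same_subnet_pairs and the
   addresses are assumptions made for this example; no such policy
   interface is described for the existing subflow managers.

```python
import ipaddress
from itertools import product


def same_subnet_pairs(local_ifaces, remote_ifaces):
    """Keep only the address pairs whose two addresses share a subnet.

    Each interface is given as an "address/prefix-length" string.
    """
    pairs = []
    for local, remote in product(local_ifaces, remote_ifaces):
        li = ipaddress.ip_interface(local)
        ri = ipaddress.ip_interface(remote)
        if li.network == ri.network:
            pairs.append((str(li.ip), str(ri.ip)))
    return pairs


# subnetA is 10.0.1.0/24 (via switch1), subnetB is 10.0.2.0/24 (via switch2).
host1 = ["10.0.1.1/24", "10.0.2.1/24"]
host2 = ["10.0.1.2/24", "10.0.2.2/24"]
selected = same_subnet_pairs(host1, host2)
```

   With these addresses, only the two subflows that correspond to the
   paths through switch1 and switch2 are kept, instead of the four
   subflows of a full mesh.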

471	   A second strategy is to use a single subnet for all IP addresses.  In
472	   this case, it becomes more difficult to specify a policy that
473	   indicates which subflows should be established.

475	   The second subflow manager that is currently supported by the
476	   Multipath TCP implementation in the Linux kernel is the ndiffport
477	   subflow manager.  This manager was initially created to exploit the
478	   path diversity that exists between single-homed hosts due to the
479	   utilization of flow-based load balancing techniques.  This subflow
480	   manager creates N subflows between the same pair of IP addresses.
481	   The N subflows are created by the client and differ only in the
482	   source port selected by the client.

484	5.2.  Subflow destination port

486	   The Multipath TCP protocol relies on the token contained in the
487	   MP_JOIN option to associate a subflow to an existing Multipath TCP
488	   session.  This implies that there is no restriction on the source
489	   address, destination address and source or destination ports used for
490	   the new subflow.  The ability to use different source and destination
491	   addresses is key to support multihomed servers and clients.  The
492	   ability to use different destination port numbers is worth
493	   discussing because it has operational implications.

495	   For illustration, consider a dual-homed client that creates a second
496	   subflow to reach a single-homed server as illustrated in Figure 3.

499	           client ------- r1 --- internet --- server
500	               |                   |
501	               +----------r2-------+

503	       Figure 3: Multihomed-client connected to single-homed server

505	   When the Multipath TCP implementation in the Linux kernel creates the
506	   second subflow it uses the same destination port as the initial
507	   subflow.  This choice is motivated by the fact that the server might
508	   be protected by a firewall and only accept TCP connections (including
509	   subflows) on the official port number.  Using the same destination
510	   port for all subflows is also useful for operators that rely on the
511	   port numbers to track application usage in their network.

513	   There have been suggestions from Multipath TCP users to modify the
514	   implementation to allow the client to use different destination ports
515	   to reach the server.  This suggestion seems mainly motivated by
516	   traffic shaping middleboxes that are used in some wireless networks.
517	   In networks where different shaping rates are associated with
518	   different destination port numbers, this could allow Multipath TCP
519	   to achieve higher performance.  As of this writing, we are not aware
520	   of any implementation of this kind of tweaking.

522	   However, from an implementation point of view, supporting different
523	   destination ports for the same Multipath TCP connection introduces a
524	   new performance issue.  A legacy implementation of a TCP stack
525	   creates a listening socket to react upon incoming SYN segments.  The
526	   listening socket handles the SYN segments that are sent to a
527	   specific port number.  Demultiplexing incoming segments can thus be
528	   done solely by looking at the IP addresses and the port numbers.
529	   With Multipath TCP, however, incoming SYN segments may carry an
530	   MP_JOIN option with a different destination port.  This means that
531	   all incoming segments that did not match an existing listening
532	   socket or an already established socket must be parsed for a
533	   possible MP_JOIN option.  This imposes an additional cost on servers
534	   that did not exist in legacy TCP implementations.
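
   The extra lookup step can be sketched as follows.  This is a minimal
   illustration, not the Linux code: the established, listeners and
   tokens tables are hypothetical dictionaries, and the option layout
   (kind 30, MP_JOIN subtype 1, length 12 on a SYN, receiver's token at
   offset 4) follows our reading of [RFC6824].

```python
import struct

TCPOPT_EOL, TCPOPT_NOP, TCPOPT_MPTCP = 0, 1, 30   # TCP option kinds
MPTCP_SUB_JOIN = 1                                # MP_JOIN subtype

def find_mp_join_token(options: bytes):
    """Return the receiver's token of an MP_JOIN SYN option, or None."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                                  # malformed option list
        if (kind == TCPOPT_MPTCP and length == 12
                and options[i + 2] >> 4 == MPTCP_SUB_JOIN):
            return struct.unpack_from("!I", options, i + 4)[0]
        i += length
    return None

def demux(four_tuple, options, established, listeners, tokens):
    """Find the socket for a segment; four_tuple is (src, sport, dst, dport)."""
    if four_tuple in established:                  # legacy lookup
        return established[four_tuple]
    if four_tuple[3] in listeners:                 # legacy lookup
        return listeners[four_tuple[3]]
    # Extra cost: only on a miss do we parse the options for MP_JOIN.
    return tokens.get(find_mp_join_token(options))
```

   With a single destination port for all subflows, the first two
   lookups suffice and the option parsing on a miss is never needed.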

536	5.3.  Closing subflows

538	                    client                       server
539	                       |                           |
540	   MPTCP: established  |                           | MPTCP: established
541	   Sub: established    |                           | Sub: established
542	                       |                           |
543	                       |         DATA_FIN          |
544	   MPTCP: close-wait   | <------------------------ | close()   (step 1)
545	   Sub: established    |         DATA_ACK          |
546	                       | ------------------------> | MPTCP: fin-wait-2
547	                       |                           | Sub: established
548	                       |                           |
549	                       |  DATA_FIN + subflow-FIN   |
550	   close()/shutdown()  | ------------------------> | MPTCP: time-wait
551	   (step 2)            |        DATA_ACK           | Sub: close-wait
552	   MPTCP: closed       | <------------------------ |
553	   Sub: fin-wait-2     |                           |
554	                       |                           |
555	                       |        subflow-FIN        |
556	   MPTCP: closed       | <------------------------ | subflow-close()
557	   Sub: time-wait      |        subflow-ACK        |
558	   (step 3)            | ------------------------> | MPTCP: time-wait
559	                       |                           | Sub: closed
560	                       |                           |

562	     Figure 4: Multipath TCP may not be able to avoid time-wait state
563	                  (even if enforced by the application).

565	   Figure 4 shows a very particular issue within Multipath TCP.  Many
566	   high-performance applications try to avoid Time-Wait state by
567	   deferring the closure of the connection until the peer has sent a
568	   FIN.  That way, the client on the left of Figure 4 does a passive
569	   closure of the connection, transitioning from Close-Wait to Last-ACK
570	   and finally freeing the resources after reception of the ACK of the
571	   FIN.  An application running on top of a Multipath TCP enabled Linux
572	   kernel might also use this approach.  The difference here is that the
573	   close() of the connection (Step 1 in Figure 4) only triggers the
574	   sending of a DATA_FIN.  Nothing guarantees that the kernel is ready
575	   to combine the DATA_FIN with a subflow-FIN.  The reception of the
576	   DATA_FIN will make the application trigger the closure of the
577	   connection (step 2), trying to avoid Time-Wait state with this late
578	   closure.  This time, the kernel might decide to combine the DATA_FIN
579	   with a subflow-FIN.  This decision defeats the application's intent,
580	   as the subflow's state machine will not transition from Close-Wait to
581	   Last-Ack, but rather go through Fin-Wait-2 into Time-Wait state.  The
582	   Time-Wait state will consume resources on the host for at least 2 MSL
583	   (Maximum Segment Lifetime).  Thus, a smart application that tries to
584	   avoid Time-Wait state by doing late closure of the connection
585	   actually ends up with one of its subflows in Time-Wait state.  A
586	   high-performance Multipath TCP kernel implementation should honor the
587	   desire of the application to do passive closure of the connection and
588	   successfully avoid Time-Wait state, even on the subflows.

590	   The solution to this problem lies in an optimistic assumption that a
591	   host doing active-closure of a Multipath TCP connection by sending a
592	   DATA_FIN will soon also send a FIN on all its subflows.  Thus, the
593	   passive closer of the connection can simply wait for the peer to send
594	   exactly this FIN - enforcing passive closure even on the subflows.
595	   Of course, to avoid consuming resources indefinitely, a timer must
596	   limit the time our implementation waits for the FIN.
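
   The waiting behaviour described above can be sketched as a small
   state machine for one subflow of the passive closer.  This is an
   illustration under stated assumptions, not the kernel code: the class
   name, the method names, and the one-second FIN_WAIT_TIMEOUT are
   hypothetical.

```python
FIN_WAIT_TIMEOUT = 1.0  # seconds; hypothetical value, not from this document

class PassiveCloserSubflow:
    """Subflow of the host that received the peer's DATA_FIN first."""

    def __init__(self):
        self.state = "established"
        self.deadline = None   # set once the application has closed
        self.sent = []         # segments emitted, for illustration only

    def app_close(self, now):
        """Application closes after seeing the peer's DATA_FIN (step 2)."""
        self.sent.append("DATA_FIN")
        if self.state == "close-wait":
            self._send_fin("last-ack")
        else:
            self.deadline = now + FIN_WAIT_TIMEOUT   # wait for the peer's FIN

    def peer_fin(self):
        """The peer's subflow-FIN arrived."""
        if self.state != "established":
            return
        if self.deadline is None:
            self.state = "close-wait"      # application has not closed yet
        else:
            self._send_fin("last-ack")     # passive close, no Time-Wait

    def tick(self, now):
        """Timer check: stop waiting once the deadline has passed."""
        if (self.state == "established" and self.deadline is not None
                and now >= self.deadline):
            self._send_fin("fin-wait-1")   # active close, Time-Wait follows

    def _send_fin(self, next_state):
        self.sent.append("FIN")
        self.state = next_state
```

   If the peer's FIN arrives before the deadline, the subflow goes
   through Last-Ack and avoids Time-Wait; only when the timer expires
   does the host fall back to an active close of the subflow.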

598	6.  Packet schedulers

600	   In a Multipath TCP implementation, the packet scheduler is the
601	   algorithm that is executed when transmitting each packet to decide on
602	   which subflow it needs to be transmitted.  The packet scheduler
603	   itself does not have any impact on the interoperability of Multipath
604	   TCP implementations.  However, it may clearly impact the performance
605	   of Multipath TCP sessions.  It is important to note that the problem
606	   of scheduling Multipath TCP packets among subflows is different from
607	   the problem of scheduling SCTP messages.  SCTP implementations also
608	   include schedulers, but these are used to schedule the different
609	   streams.  Multipath TCP uses a single data stream.

611	   Various researchers have explored theoretically and by simulations
612	   the problem of scheduling packets among Multipath TCP subflows
613	   [ICC14].  Unfortunately, none of the proposed techniques has been
614	   implemented and used in real deployments.  A detailed analysis of the
615	   impact of the packet scheduler will appear in [CSWS14].  This article
616	   proposes a pluggable architecture for the scheduler used by the
617	   Multipath TCP implementation in the Linux kernel.  This architecture
618	   allows researchers to experiment with different types of schedulers.
619	   Two schedulers are compared in [CSWS14]: round-robin and
620	   lowest-rtt-first.  The experiments and measurements described in
621	   [CSWS14] show that the lowest-rtt-first scheduler appears to be the
622	   best compromise from a performance viewpoint.
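
   The two policies can be sketched as follows.  This is a simplified
   illustration, not the pluggable scheduler interface of the Linux
   implementation: the Subflow fields, the schedule helper, and the RTT
   and window values are hypothetical, and a subflow is considered
   usable only while its congestion window has room.

```python
from dataclasses import dataclass
from itertools import cycle

@dataclass
class Subflow:
    name: str
    srtt_ms: float        # smoothed round-trip time
    cwnd: int             # congestion window, in packets
    in_flight: int = 0

    def available(self):
        return self.in_flight < self.cwnd

def lowest_rtt_first(subflows):
    """Pick the available subflow with the smallest RTT, or None."""
    candidates = [s for s in subflows if s.available()]
    return min(candidates, key=lambda s: s.srtt_ms) if candidates else None

def make_round_robin(subflows):
    """Return a picker that cycles over the subflows, skipping full ones."""
    ring = cycle(subflows)
    def pick():
        for _ in range(len(subflows)):
            s = next(ring)
            if s.available():
                return s
        return None
    return pick

def schedule(n, pick):
    """Send up to n packets with the given picker; return the subflow names."""
    order = []
    for _ in range(n):
        s = pick()
        if s is None:
            break
        s.in_flight += 1
        order.append(s.name)
    return order

flows = [Subflow("wifi", 20.0, cwnd=2), Subflow("cell", 80.0, cwnd=10)]
order = schedule(5, lambda: lowest_rtt_first(flows))
```

   In this example the low-RTT subflow is filled first and the scheduler
   only spills onto the other subflow once that window is full.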

624	   Another study of the packet schedulers is presented in [PAMS2014].
625	   This study relies on simulations with the Multipath TCP
626	   implementation in the Linux kernel.  The simulation scenarios
627	   discussed in [PAMS2014] confirm the impact of the packet scheduler on
628	   the performance of Multipath TCP.

630	7.  Segment size selection

632	   When an application performs a write/send system call, the kernel
633	   allocates a packet buffer (sk_buff in Linux) to store the data the
634	   application wants to send.  The kernel will store at most one MSS
635	   (Maximum Segment Size) of data per buffer.  As the MSS can differ
636	   amongst subflows, an MPTCP implementation must carefully select the
637	   MSS used to generate application data.  The Linux kernel
638	   implementation had various ways of selecting the MSS: minimum or
639	   maximum amongst the different subflows.  However, these heuristics
640	   can cause significant performance issues in some environments.
641	   Consider the following example.  An MPTCP connection has two
642	   established subflows that respectively use an MSS of 1420 and 1428
643	   bytes.  If MPTCP selects the maximum, then the application will
644	   generate segments of 1428 bytes of data.  An MPTCP implementation
645	   will have to split each segment in two (a 1420-byte segment and an
646	   8-byte segment) when pushing on the subflow with the smallest MSS.
647	   The 8-byte segment introduces a large overhead: each data segment
648	   uses 2 slots in the congestion window (counted in packets), which
649	   reduces the potential throughput (in bytes/s) of this subflow by a
650	   factor of ~2.  Taking the smallest MSS does not solve the issue, as
651	   the subflow with the smallest MSS might contribute only marginally
652	   to the overall performance, in which case this choice reduces the
653	   potential throughput of the other subflows.

655	   The Linux implementation recently took another approach [DetalMSS].
656	   Instead of selecting the minimum or maximum value, it now
657	   dynamically adapts the MSS based on the contribution of all the
658	   subflows to the connection's throughput.  For this it computes, for
659	   each subflow, the potential throughput achieved by selecting each MSS
660	   value and by taking into account the lost space in the cwnd.  It then
661	   selects the MSS that achieves the highest potential
662	   throughput.
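
   The idea can be illustrated with a small calculation.  This sketch is
   not the algorithm of [DetalMSS]; it is a simplified model that
   assumes each subflow is described by its MSS, its congestion window
   in packets, and its RTT, and that a chunk larger than the subflow MSS
   is split into several packets.  The function names and the cwnd and
   RTT values are hypothetical.

```python
from math import ceil

def potential_throughput(data_mss, subflow_mss, cwnd_pkts, rtt_s):
    """Bytes/s a subflow can carry when data is cut into data_mss chunks."""
    pkts_per_chunk = ceil(data_mss / subflow_mss)  # 1 unless the chunk is split
    chunks_per_rtt = cwnd_pkts // pkts_per_chunk   # cwnd slots lost to splitting
    return chunks_per_rtt * data_mss / rtt_s

def select_mss(subflows):
    """subflows: list of (mss, cwnd_pkts, rtt_s).  Return the best candidate."""
    candidates = {mss for mss, _, _ in subflows}
    def total(data_mss):
        return sum(potential_throughput(data_mss, mss, cwnd, rtt)
                   for mss, cwnd, rtt in subflows)
    return max(candidates, key=total)

# The example from the text: two comparable subflows with MSS 1420 and 1428.
similar = [(1420, 10, 0.05), (1428, 10, 0.05)]
# A marginal small-MSS subflow next to a fast large-MSS subflow.
skewed = [(500, 2, 0.1), (1428, 100, 0.01)]
best_similar = select_mss(similar)   # 1420: avoids the 8-byte fragments
best_skewed = select_mss(skewed)     # 1428: the marginal subflow is ignored
```

   With comparable subflows the smaller MSS wins because splitting
   halves the useful window of the 1420-byte subflow; with a marginal
   small-MSS subflow the larger MSS wins, which matches the two failure
   cases of the fixed minimum and maximum heuristics.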

664	8.  Interactions with the Domain Name System

666	   Multihomed clients such as smartphones could lead to operational
667	   problems when interacting with the Domain Name System.  When a
668	   single-homed client performs a DNS query, it receives from its local
669	   resolver the best answer for its request.  If the client is
670	   multihomed, the answer returned to the DNS query may vary with the
671	   interface over which it has been sent.

673	                      cdn1
674	                       |
675	           client -- cellular -- internet -- cdn3
676	              |                   |
677	              +----- wifi --------+
678	                       |
679	                     cdn2

681	                     Figure 5: Simple network topology

683	   If the client sends a DNS query over the WiFi interface, the answer
684	   will point to the cdn2 server while the same request sent over the
685	   cellular interface will point to the cdn1 server.  This might cause
686	   problems for CDN providers that locate their servers inside ISP
687	   networks and have contracts that specify that the CDN server will
688	   only be accessed from within this particular ISP.  Assume now that
689	   both the client and the CDN servers support Multipath TCP.  In this
690	   case, a Multipath TCP session to cdn1 or cdn2 would potentially use
691	   both the cellular network and the WiFi network.  This would violate
692	   the contract between the CDN provider and the network operators.  A
693	   possible solution to prevent this problem would be to modify the DNS
694	   resolution on the client.  The client subnet EDNS extension defined
695	   in [I-D.vandergaast-edns-client-subnet] could be used for this
696	   purpose.  When the client sends a DNS query from its WiFi interface,
697	   it should also send the client subnet corresponding to the cellular
698	   interface in this request.  This would indicate to the resolver that
699	   the answer should be valid for both the WiFi and the cellular
700	   interfaces (e.g., the cdn3 server).
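
   As an illustration, the client subnet option that would accompany
   such a query can be built as follows.  This sketch assumes option
   code 8 and the wire layout later standardized in RFC 7871; the draft
   cited above may differ in details.  The function name and the example
   prefix are hypothetical.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8   # assumption: code point assigned by RFC 7871

def client_subnet_option(subnet: str) -> bytes:
    """Encode an EDNS client subnet option for the given prefix."""
    net = ipaddress.ip_network(subnet, strict=False)
    family = 1 if net.version == 4 else 2          # 1 = IPv4, 2 = IPv6
    prefix_len = net.prefixlen
    # Only the bytes covered by the prefix are sent.
    addr = net.network_address.packed[: (prefix_len + 7) // 8]
    # Scope prefix length is 0 in queries.
    payload = struct.pack("!HBB", family, prefix_len, 0) + addr
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload

# Subnet of the cellular interface, attached to a query sent over WiFi.
option = client_subnet_option("203.0.113.0/24")
```

   The resulting bytes would be placed in the RDATA of the OPT record of
   the query sent over the WiFi interface.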

702	9.  Captive portals

704	   Multipath TCP enables a host to use different interfaces to reach a
705	   server.  In theory, this should ensure connectivity when at least one
706	   of the interfaces is active.  In practice, however, there are some
707	   particular scenarios with captive portals that may cause operational
708	   problems.  The reference environment is the following:

710	           client -----  network1
711	                |
712	                +------- internet ------------- server

714	                    Figure 6: Issue with captive portal

716	   The client is attached to two networks: network1, which provides
717	   only limited connectivity, and a second network that provides access
718	   to the entire Internet.  In practice, this scenario corresponds to an
719	   open WiFi network with a captive portal for network1 and a cellular
720	   service for the second interface.  On many smartphones, the WiFi
721	   interface is preferred over the cellular interface.  If the
722	   smartphone learns a default route via both interfaces, it will
723	   typically prefer to use the WiFi interface to send its DNS request
724	   and create the first subflow.  This is not optimal with Multipath
725	   TCP.  A better approach would probably be to make a few attempts on
726	   the WiFi interface and then try to use the second interface for the
727	   initial subflow as well.
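
   The suggested fallback can be sketched as follows.  This is only an
   outline of the policy: the connect callable, the interface names, and
   the number of attempts are hypothetical, and a real implementation
   would also need timeouts between attempts.

```python
def open_initial_subflow(connect, interfaces, attempts_per_interface=3):
    """Try each interface in preference order; return the first that works.

    connect(ifname) is a caller-supplied function that returns True when
    the initial subflow could be established over that interface.
    """
    for ifname in interfaces:
        for _ in range(attempts_per_interface):
            if connect(ifname):
                return ifname
    return None

# Example: WiFi sits behind a captive portal, so every attempt on it fails.
tried = []

def fake_connect(ifname):
    tried.append(ifname)
    return ifname == "cellular"

chosen = open_initial_subflow(fake_connect, ["wifi", "cellular"])
```

   Here the initial subflow ends up on the cellular interface after the
   attempts on the preferred WiFi interface have failed.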

729	10.  Conclusion

731	   In this document, we have documented a few years of experience with
732	   Multipath TCP.  The information presented in this document was
733	   gathered from scientific publications and discussions with various
734	   users of the Multipath TCP implementation in the Linux kernel.

736	11.  Acknowledgements

738	   This work was partially supported by the FP7-Trilogy2 project.  We
739	   would like to thank all the implementers and users of the Multipath
740	   TCP implementation in the Linux kernel.

742	12.  Changelog

744	   o  initial version :

746	13.  Informative References

748	   [CACM14]   Paasch, C. and O. Bonaventure, "Multipath TCP",
749	              Communications of the ACM, 57(4):51-57 , April 2014,
750	              <http://inl.info.ucl.ac.be/publications/multipath-tcp>.

752	   [CONEXT12]
753	              Khalili, R., Gast, N., Popovic, M., Upadhyay, U., and J.
754	              Le Boudec, "MPTCP is not pareto-optimal: performance
755	              issues and a possible solution", Proceedings of the 8th
756	              international conference on Emerging networking
757	              experiments and technologies (CoNEXT12) , 2012.

759	   [CONEXT13]
760	              Paasch, C., Khalili, R., and O. Bonaventure, "On the
761	              Benefits of Applying Experimental Design to Improve
762	              Multipath TCP", Conference on emerging Networking
763	              EXperiments and Technologies (CoNEXT) , December 2013,
764	              <http://inl.info.ucl.ac.be/publications/benefits-applying-
765	              experimental-design-improve-multipath-tcp>.

767	   [CSWS14]   Paasch, C., Ferlin, S., Alay, O., and O. Bonaventure,
768	              "Experimental Evaluation of Multipath TCP Schedulers",
769	              SIGCOMM CSWS2014 workshop , August 2014.

771	   [Cellnet12]
772	              Paasch, C., Detal, G., Duchene, F., Raiciu, C., and O.
773	              Bonaventure, "Exploring Mobile/WiFi Handover with
774	              Multipath TCP", ACM SIGCOMM workshop on Cellular Networks
775	              (Cellnet12) , 2012,
776	              <http://inl.info.ucl.ac.be/publications/
777	              exploring-mobilewifi-handover-multipath-tcp>.

779	   [DetalMSS]
780	              Detal, G., "Adaptive MSS value", Post on the mptcp-dev
781	              mailing list , September 2014, <https://listes-
782	              2.sipr.ucl.ac.be/sympa/arc/mptcp-dev/2014-09/
783	              msg00130.html>.

785	   [HotMiddlebox13]
786	              Hesmans, B., Duchene, F., Paasch, C., Detal, G., and O.
787	              Bonaventure, "Are TCP Extensions Middlebox-proof?", CoNEXT
788	              workshop HotMiddlebox , December 2013,
789	              <http://inl.info.ucl.ac.be/publications/
790	              are-tcp-extensions-middlebox-proof>.

792	   [HotNets]  Raiciu, C., Pluntke, C., Barre, S., Greenhalgh, A.,
793	              Wischik, D., and M. Handley, "Data center networking with
794	              multipath TCP", Proceedings of the 9th ACM SIGCOMM
795	              Workshop on Hot Topics in Networks (Hotnets-IX) , 2010,
796	              <http://doi.acm.org/10.1145/1868447.1868457>.

798	   [I-D.eardley-mptcp-implementations-survey]
799	              Eardley, P., "Survey of MPTCP Implementations", draft-
800	              eardley-mptcp-implementations-survey-02 (work in
801	              progress), July 2013.

803	   [I-D.vandergaast-edns-client-subnet]
804	              Contavalli, C., Gaast, W., Leach, S., and E. Lewis,
805	              "Client Subnet in DNS Requests", draft-vandergaast-edns-
806	              client-subnet-02 (work in progress), July 2013.

808	   [ICC14]    Kuhn, N., Lochin, E., Mifdaoui, A., Sarwar, G., Mehani,
809	              O., and R. Boreli, "DAPS: Intelligent Delay-Aware Packet
810	              Scheduling For Multipath Transport", IEEE ICC 2014, 2014.

812	   [ICNP12]   Cao, Y., Xu, M., and X. Fu, "Delay-based congestion
813	              control for multipath TCP", 20th IEEE International
814	              Conference on Network Protocols (ICNP) , 2012.

816	   [IMC11]    Honda, M., Nishida, Y., Raiciu, C., Greenhalgh, A.,
817	              Handley, M., and H. Tokuda, "Is it still possible to
818	              extend TCP?", Proceedings of the 2011 ACM SIGCOMM
819	              conference on Internet measurement conference (IMC '11) ,
820	              2011, <http://doi.acm.org/10.1145/2068816.2068834>.

822	   [IMC13a]   Detal, G., Hesmans, B., Bonaventure, O., Vanaubel, Y., and
823	              B. Donnet, "Revealing Middlebox Interference with
824	              Tracebox", Proceedings of the 2013 ACM SIGCOMM conference
825	              on Internet measurement conference , 2013,
826	              <http://inl.info.ucl.ac.be/publications/
827	              revealing-middlebox-interference-tracebox>.

829	   [IMC13b]   Chen, Y., Lim, Y., Gibbens, R., Nahum, E., Khalili, R.,
830	              and D. Towsley, "A measurement-based study of MultiPath
831	              TCP performance over wireless network", Proceedings of the
832	              2013 conference on Internet measurement conference (IMC
833	              '13), 2013, <http://doi.acm.org/10.1145/2504730.2504751>.

835	   [IMC13c]   Pelsser, C., Cittadini, L., Vissicchio, S., and R. Bush,
836	              "From Paris to Tokyo on the suitability of ping to measure
837	              latency", Proceedings of the 2013 conference on Internet
838	              measurement conference (IMC '13) , 2013,
839	              <http://doi.acm.org/10.1145/2504730.2504765>.

841	   [INFOCOM14]
842	              Lim, Y., Chen, Y., Nahum, E., Towsley, D., and K. Lee,
843	              "Cross-Layer Path Management in Multi-path Transport
844	              Protocol for Mobile Devices", IEEE INFOCOM'14 , 2014.

846	   [IOS7]     "Multipath TCP Support in iOS 7", January 2014,
847	              <http://support.apple.com/kb/HT5977>.

849	   [MBTest]   Hesmans, B., "MBTest", 2013,
850	              <https://bitbucket.org/bhesmans/mbtest>.

852	   [MultipathTCP-Linux]
853	              Paasch, C., Barre, S., et al., "Multipath TCP
854	              implementation in the Linux kernel", n.d.,
855	              <http://www.multipath-tcp.org>.

857	   [NSDI11]   Wischik, D., Raiciu, C., Greenhalgh, A., and M. Handley,
858	              "Design, implementation and evaluation of congestion
859	              control for Multipath TCP", In Proceedings of the 8th
860	              USENIX conference on Networked systems design and
861	              implementation (NSDI11) , 2011.

863	   [NSDI12]   Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
864	              Duchene, F., Bonaventure, O., and M. Handley, "How Hard
865	              Can It Be? Designing and Implementing a Deployable
866	              Multipath TCP", USENIX Symposium of Networked Systems
867	              Design and Implementation (NSDI12) , April 2012,
868	              <http://inl.info.ucl.ac.be/publications/how-hard-can-it-
869	              be-designing-and-implementing-deployable-multipath-tcp>.

871	   [PAMS2014]
872	              Arzani, B., Gurney, A., Cheng, S., Guerin, R., and B. Loo,
873	              "Impact of Path Selection and Scheduling Policies on MPTCP
874	              Performance", PAMS2014 , 2014.

876	   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers", RFC
877	              1812, June 1995.

879	   [RFC6182]  Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
880	              Iyengar, "Architectural Guidelines for Multipath TCP
881	              Development", RFC 6182, March 2011.

883	   [RFC6356]  Raiciu, C., Handley, M., and D. Wischik, "Coupled
884	              Congestion Control for Multipath Transport Protocols", RFC
885	              6356, October 2011.

887	   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
888	              "TCP Extensions for Multipath Operation with Multiple
889	              Addresses", RFC 6824, January 2013.

891	   [SIGCOMM11]
892	              Raiciu, C., Barre, S., Pluntke, C., Greenhalgh, A.,
893	              Wischik, D., and M. Handley, "Improving datacenter
894	              performance and robustness with multipath TCP",
895	              Proceedings of the ACM SIGCOMM 2011 conference, 2011,
896	              <http://doi.acm.org/10.1145/2018436.2018467>.

898	   [TNC13]    van der Pol, R., Bredel, M., and A. Barczyk, "Experiences
899	              with MPTCP in an intercontinental multipathed OpenFlow
900	              network", TNC2013 , 2013.

902	   [tracebox]
903	              Detal, G., "tracebox", 2013, <http://www.tracebox.org>.

905	Authors' Addresses

907	   Olivier Bonaventure
908	   UCLouvain

910	   Email: Olivier.Bonaventure@uclouvain.be

912	   Christoph Paasch
913	   UCLouvain

915	   Email: Christoph.Paasch@uclouvain.be

917	   Gregory Detal
918	   UCLouvain

920	   Email: Gregory.Detal@uclouvain.be