idnits 2.17.1 

draft-bonaventure-mptcp-backup-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC6824]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 332: '...   value MUST be larger than the UPERF...'

  -- The draft header indicates that this document updates RFC6824, but the
     abstract doesn't seem to directly say this.  It does mention RFC6824
     though, so this could be OK.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 06, 2015) is 3211 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 6824 (Obsoleted by RFC 8684)

  -- Obsolete informational reference (is this intentional?): RFC  793
     (Obsoleted by RFC 9293)


     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	MPTCP Working Group                                       O. Bonaventure
3	Internet-Draft                                             Q. De Coninck
4	Updates: 6824 (if approved)                                    M. Baerts
5	Intended status: Experimental                                 F. Duchene
6	Expires: January 7, 2016                                      B. Hesmans
7	                                                               UCLouvain
8	                                                           July 06, 2015

10	                Improving Multipath TCP Backup Subflows
11	                   draft-bonaventure-mptcp-backup-00

13	Abstract

15	   This document describes some issues with the current definition of
16	   the backup subflows in [RFC6824].  The solution proposed in [RFC6824]
17	   works well when a subflow completely fails.  However, if a subflow
18	   suffers from huge packet losses, but still remains up, then the delay
19	   to switch to the backup subflow may be very long.  We propose to
20	   measure the evolution of the retransmission timer (RTO) to detect the
21	   bad performance of subflows.

23	Status of This Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on January 7, 2016.

40	Copyright Notice

42	   Copyright (c) 2015 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Introduction
58	   2.  What is a Subflow Failure ?
59	   3.  Detecting Underperforming Subflows
60	   4.  Security considerations
61	   5.  IANA considerations
62	   6.  Conclusion
63	   7.  Acknowledgements
64	   8.  References
65	     8.1.  Normative References
66	     8.2.  Informative References
67	   Authors' Addresses

69	1.  Introduction

71	   Multipath TCP is an extension to TCP [RFC0793] that was specified in
72	   [RFC6824].  A Multipath TCP connection is composed of one or more
73	   subflows.  Each subflow is a TCP connection that is established by
74	   using the classical TCP three-way handshake.  The subflows that
75	   compose a Multipath TCP connection are not all equal.  [RFC6824]
76	   defines two types of subflows:

78	   o  the regular subflows

80	   o  the backup subflows

82	   The regular subflows can be used to transport any data.  The backup
83	   subflows are intended to be used only when all the regular subflows
84	   have failed.  Section 2.5 of [RFC6824] defines them by using the
85	   following sentence: "Hosts can indicate at initial subflow setup
86	   whether they wish the subflow to be used as a regular or backup path
87	   - a backup path only being used if there are no regular paths
88	   available."

90	   Intuitively, a user expects that the backup subflow will be used when
91	   the regular subflow fails to continue the data transfer and minimize
92	   the impact of the failure on the Multipath TCP connection.

94	   In this document, we first describe in Section 2 how Multipath TCP
95	   operates when backup subflows are used and some of the operational
96	   problems that this causes.  Backup subflows work well when subflows
97	   completely fail due to, for example, the reception of a RST segment
98	   or the invalidity of the IP address associated to the subflow
99	   (expired lease time, de-attachment from network, etc.).  However,
100	   there are many practical situations where the failure of a regular
101	   subflow cannot be quickly detected and the user experience suffers.
102	   We then propose in Section 3 a slight modification to the handling of
103	   the backup subflows in Multipath TCP.

105	2.  What is a Subflow Failure ?

107	   Experience with Multipath TCP shows that backup subflows, which are
108	   only used when all the other subflows have failed, work well on fixed
109	   hosts where the loss of connectivity can be quickly detected by the
110	   affected host.  However, there are many situations where it can be
111	   difficult to detect the failure of a regular subflow.

113	                <-----  primary subflow  ----->

115	          +----link1----router1-------router2---link2---+
116	          |                                             |
117	       Client                                         Server
118	          |                                             |
119	          +----link3----router3-------router4---link4---+

121	                <-----  backup subflow  ----->

123	                         Figure 1: Simple network

125	   To understand the situation, let us consider the simple network shown
126	   in Figure 1.  In this network, the client has established two
127	   subflows:

129	   o  a regular subflow passing through router1 and router2

131	   o  a backup subflow passing through router3 and router4

133	   [RFC6824] supports two methods to signal that a subflow is a backup
134	   subflow:

136	   o  setting the B bit in the MP_JOIN option that is used to create the
137	      subflow

139	   o  sending the MP_PRIO option with the B bit set

141	   Note that in both cases, when a host sets the B bit in the MP_JOIN or
142	   sends an MP_PRIO option, it requests the other host to only use the
143	   subflow if the other regular subflows have failed.  Setting the B bit
144	   in the MP_JOIN option or sending the MP_PRIO option does not affect
145	   the data sent by the host that sends this option [RFC6824].

147	   Let us now consider three different failure scenarios.  For
148	   simplicity, we assume that all the data flows from the Server to the
149	   Client and that the top subflow is the primary subflow while the
150	   bottom subflow was signaled as a backup subflow.

152	   Our first failure scenario is the simplest one: the failure of link1.
153	   In this case, the Client detects the failure locally.  This detection
154	   can be fast with wired link layer technologies and slower with some
155	   wireless technologies.  Once the failure has been detected, the
156	   Client can either send a REMOVE_ADDR option to indicate the failure
157	   of its address attached to link1 or send an MP_PRIO option with the B
158	   bit reset over the backup subflow.  In both cases, a single segment
159	   sent over the backup subflow is sufficient to inform the Server of
160	   the failure of the primary subflow.  Note that the REMOVE_ADDR and
161	   the MP_PRIO options are sent unreliably.  This implies that any loss
162	   of these options will further delay the recovery on the Server.

164	   Our second failure scenario is the symmetric scenario: the failure of
165	   link2.  In this case, the Server will react by sending a REMOVE_ADDR
166	   option over the backup subflow to indicate the loss of the address
167	   attached to this link.  Since the Server knows that the primary
168	   subflow has failed, it can immediately start to use the backup
169	   subflow to send data to the Client.  Experiments show that these two
170	   failure scenarios work well [Cellnet12].

172	   The third failure scenario is a failure of the link between router1
173	   and router2.  Different types of failures are possible on this link.
174	   We consider two extreme cases.  The first case is a pure link failure
175	   that is detected by the two routers.  Since there is no alternate
176	   path between router1 and router2 in our example network, the Client
177	   cannot reach the Server anymore over the top path.  Once router1 and
178	   router2 have detected the failure, they will return ICMP destination
179	   unreachable messages to the Client and the Server.  This error
180	   message could suggest a failure of the primary subflow.  According to
181	   [RFC1122], this ICMP message should cause the termination of the top
182	   subflow.  However, according to [RFC5461], current TCP
183	   implementations do not follow this recommendation and ignore the
184	   received ICMP messages.  This is motivated by the risk of denial of
185	   service attacks that could disrupt existing TCP connections by
186	   sending spoofed ICMP messages.  A Multipath TCP implementation could
187	   react differently and for example consider the subflow over which the
188	   ICMP message was received as temporarily unusable to cause the
189	   utilization of other (possibly backup) subflows.

191	   If a Multipath TCP implementation does not react to ICMP messages,
192	   the last resort method to detect the failure of the top path is the
193	   retransmission timer (RTO).  TCP implementations apply an exponential
194	   backoff algorithm to the retransmission timeout [RFC6298].  If the
195	   primary path fails, the retransmission timeout associated to this
196	   path will double until it reaches the maximum value configured on the
197	   TCP stack.  On many stacks, this limit is in the order of tens of
198	   minutes which does not match the expectations of the Multipath TCP
199	   user who expects that her backup subflow will be used earlier than
200	   that.  A similar situation occurs when the link between the two
201	   routers remains up but is so congested that packets sent on the
202	   regular subflow rarely traverse the link [BD2015].  In this case, the
203	   user also expects to be able to quickly use the backup subflow to
204	   preserve the end-to-end connectivity.
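
   The effect of the exponential backoff can be illustrated with a
   minimal sketch.  The initial RTO, the cap and the number of
   retransmissions used below are hypothetical values chosen for
   illustration; they are not taken from [RFC6298] or from any
   particular stack.

```python
def backoff_schedule(initial_rto, max_rto, retransmissions):
    """Return the RTO (in seconds) armed before each retransmission."""
    schedule = []
    rto = initial_rto
    for _ in range(retransmissions):
        schedule.append(rto)
        # Exponential backoff: double the timer, up to the configured cap.
        rto = min(rto * 2, max_rto)
    return schedule


def time_to_give_up(initial_rto, max_rto, retransmissions):
    """Total time spent waiting on a dead path before the stack gives up."""
    return sum(backoff_schedule(initial_rto, max_rto, retransmissions))
```

   With an assumed initial RTO of 1 second, a cap of 120 seconds and 15
   retransmissions, time_to_give_up(1.0, 120.0, 15) returns 1087
   seconds, i.e. about 18 minutes spent on the failed path.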

206	3.  Detecting Underperforming Subflows

208	   As explained in the previous section, users cannot accept a long
209	   delay to detect the failure of a regular subflow and switch to an
210	   existing backup subflow.  [RFC6824] allows a host to specify that a
211	   subflow is a backup subflow, but there is no definition of
212	   underperforming subflows and no mechanism to allow applications to
213	   specify a switchover time to a backup subflow.

215	   Various techniques exist to detect failures.  Shim6 [RFC5533]
216	   includes the REAP protocol [RFC5534] to verify the reachability of
217	   addresses.  BFD [RFC5880] is used to detect link failures between
218	   routers and also over multihop paths [RFC5883].  Depending on the
219	   chosen parameters, these protocols can achieve fast detection and/or
220	   low overhead.  We do not believe that additional protocols are
221	   required to quickly detect the failure of a subflow.  With its
222	   retransmission timer that doubles after each unsuccessful
223	   retransmission, Multipath TCP already has the ability to detect
224	   underperforming subflows.  If data is transmitted over a broken
225	   subflow, the retransmission timer of this subflow will quickly
226	   increase.  These successive retransmissions are an appropriate
227	   mechanism to detect the failure of a subflow and switch to a backup
228	   one provided that the TCP retransmission timer does not become too
229	   high.

231	   [RFC0793] specifies an abstract API that allows user applications to
232	   indicate bounds on the retransmission timer.  [RFC5482] goes further
233	   by defining the User Timeout option, a TCP option that can be used
234	   to signal a proposed maximum value for the TCP retransmission
235	   timeout.  This option specifies the maximum time that some data can
236	   remain unacknowledged before the connection is considered to have
237	   failed.  In [RFC5482], the User Timeout is encoded as a 15-bit
238	   field that represents seconds or minutes.  This implies that the
239	   User Timeout option cannot be used to signal a bound smaller than
240	   1 second.
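
   The one-second floor follows directly from the encoding.  The sketch
   below packs a timeout into the 4-byte User Timeout option, assuming
   the layout of [RFC5482]: option kind 28, length 4, one granularity
   bit (0 for seconds, 1 for minutes) followed by the 15-bit value.
   The function name and the truncation of fractional values are
   choices made for this illustration only.

```python
import struct

UTO_KIND = 28    # TCP option kind of the User Timeout option
UTO_LENGTH = 4   # kind (1) + length (1) + granularity bit and value (2)
MAX_VALUE = 0x7FFF  # largest number that fits in 15 bits


def encode_uto(timeout_seconds):
    """Encode a timeout, given in seconds, as a User Timeout option."""
    if timeout_seconds < 1:
        # The smallest unit is one second, so sub-second bounds do not fit.
        raise ValueError("the option cannot express less than 1 second")
    if timeout_seconds <= MAX_VALUE:
        granularity, value = 0, int(timeout_seconds)        # seconds
    else:
        granularity, value = 1, int(timeout_seconds // 60)  # minutes
        if value > MAX_VALUE:
            raise ValueError("timeout too large for 15 bits of minutes")
    return struct.pack("!BBH", UTO_KIND, UTO_LENGTH,
                       (granularity << 15) | value)
```

   A request for 500 milliseconds, encode_uto(0.5), is rejected, which
   is the limitation discussed above.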

242	   With the User Timeout option, the TCP connection must be terminated
243	   once its RTO reaches the signaled maximum value.

245	   [RFC5482] defines the following parameters for the RTO:

247	   o  U_LIMIT: the upper limit on the USER TIMEOUT

249	   o  L_LIMIT: the lower limit on the USER TIMEOUT

251	   In addition, the application can specify, e.g. through a socket
252	   option, the USER TIMEOUT that it wishes to use and advertise to the
253	   peer: ADV_UTO.  Similarly, the REMOTE_UTO is the User Timeout option
254	   received from the peer.  Then, [RFC5482] defines the USER TIMEOUT
255	   with the following formula:

257	   USER_TIMEOUT = min(U_LIMIT, max(ADV_UTO, REMOTE_UTO, L_LIMIT))
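
   As a small worked example, the formula can be written as a function.
   This is only a transcription of the formula above; the numeric
   values in the comments are hypothetical and all arguments are
   assumed to be expressed in the same unit (seconds).

```python
def user_timeout(adv_uto, remote_uto, l_limit, u_limit):
    """USER_TIMEOUT = min(U_LIMIT, max(ADV_UTO, REMOTE_UTO, L_LIMIT))."""
    return min(u_limit, max(adv_uto, remote_uto, l_limit))


# The larger of the two advertised values wins when it lies within the limits.
assert user_timeout(adv_uto=30, remote_uto=60, l_limit=10, u_limit=600) == 60
# The upper limit caps a large request from either side.
assert user_timeout(adv_uto=30, remote_uto=60, l_limit=10, u_limit=45) == 45
# The lower limit applies when both advertised values are too small.
assert user_timeout(adv_uto=1, remote_uto=2, l_limit=10, u_limit=100) == 10
```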

259	   [RFC6824] does not discuss precisely how the User Timeout option
260	   should be handled if received over a Multipath TCP connection.  If
261	   this option is set through the regular socket API that does not
262	   expose any information about the subflows, it must apply on the
263	   overall Multipath TCP connection.

265	   In this document, we envision an API that exposes some parts of
266	   Multipath TCP to applications to enable them to make better use of
267	   the features of the protocol.  Such an API would expose some
268	   information about the subflows to the applications.

270	   A first possibility to control the performance of the subflows could
271	   be to specify a USER_TIMEOUT on a per subflow basis and terminate the
272	   subflows whose RTO has reached the USER_TIMEOUT.  However,
273	   terminating an underperforming subflow may be too severe in
274	   environments where there are transient losses such as wireless
275	   networks.  An alternative approach is to tag the subflow as
276	   underperforming and modify the operation of Multipath TCP.

278	   According to [RFC6824], an established subflow can operate in two
279	   modes:

281	   o  primary mode

283	   o  backup mode

285	   The initial subflow is always created in primary mode.  When a
286	   subflow is created, its mode depends on the B bit of the received
287	   MP_JOIN option.  The reception of the MP_PRIO option changes the mode
288	   of the corresponding subflow.  When a Multipath TCP implementation
289	   sends data, it always selects one of the available primary subflows
290	   to transmit the data.  The backup subflows are only selected if there
291	   is no established subflow in primary mode.

293	   We propose a new mode of operation: the underperforming mode.
294	   Subflows are still established in the primary or backup mode as
295	   explained above.  A subflow enters the underperforming mode as soon
296	   as its retransmission timer (RTO) reaches a configurable limit.  At
297	   this point, the subflow is considered to be underperforming.  An
298	   underperforming subflow cannot be selected for data transmission if
299	   there exists another subflow in primary or backup mode.  Once a
300	   subflow has been tagged as underperforming, it remains in this mode
301	   as long as there are unacknowledged data on this subflow.  Once all
302	   data has been acknowledged, it may return to the primary or backup
303	   mode.  Further experimentation is required to evaluate how quickly an
304	   underperforming subflow should leave the underperforming mode once
305	   all data has been acknowledged.
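
   The proposed behaviour can be sketched as a small scheduler model.
   This is an illustration under stated assumptions, not a description
   of an existing implementation: the Subflow class, its fields and the
   function names are hypothetical, the subflow leaves the
   underperforming mode immediately once all its data has been
   acknowledged (the draft leaves this timing open), and the choice of
   the lowest RTO among eligible subflows is an arbitrary policy.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    PRIMARY = "primary"
    BACKUP = "backup"


@dataclass
class Subflow:
    name: str
    mode: Mode                     # mode set through MP_JOIN or MP_PRIO
    rto: float = 1.0               # current retransmission timer (seconds)
    unacked: int = 0               # bytes sent but not yet acknowledged
    underperforming: bool = False  # tag proposed in this document


def update_state(sf, uperf_timeout):
    """Tag or untag a subflow after its RTO or ACK state has changed."""
    if sf.rto >= uperf_timeout:
        sf.underperforming = True
    elif sf.underperforming and sf.unacked == 0:
        # Assumption: return to the original mode as soon as all data is acked.
        sf.underperforming = False


def select_subflow(subflows):
    """Prefer primary, then backup; underperforming subflows come last."""
    healthy = [s for s in subflows if not s.underperforming]
    for mode in (Mode.PRIMARY, Mode.BACKUP):
        candidates = [s for s in healthy if s.mode is mode]
        if candidates:
            return min(candidates, key=lambda s: s.rto)
    # Only underperforming subflows remain (or none at all).
    return min(subflows, key=lambda s: s.rto) if subflows else None
```

   In this model, a primary subflow whose RTO has grown beyond the
   threshold is skipped in favour of an established backup subflow,
   and it is selected again once its outstanding data has been
   acknowledged and its RTO has dropped below the threshold.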

307	   System administrators and/or application developers (e.g. through a
308	   socket option) should be able to specify the maximum RTO that causes
309	   a Multipath TCP subflow to be tagged as underperforming.  For this,
310	   we propose two new parameters:

312	   o  UPERF_ADV_TO: the upper threshold on the RTO that forces the
313	      subflow to be considered as underperforming

315	   o  UPERF_REMOTE_TO: the upper threshold on the RTO received from the
316	      remote peer

318	   The UPERF_ADV_TO is configured locally on the host.  It could be
319	   configured globally or on a per connection basis.  The configuration
320	   applies to all subflows of a Multipath TCP connection.

322	   The UPERF_REMOTE_TO is received in a Multipath TCP option.  This
323	   value applies only on the subflow over which it has been received.

325	   The UPERF_TIMEOUT that is used to detect underperforming subflows is
326	   then computed by using the following formula:

328	   UPERF_TIMEOUT = min(U_LIMIT, max(UPERF_ADV_TO, UPERF_REMOTE_TO,
329	   L_LIMIT))

331	   If a USER_TIMEOUT is defined for the Multipath TCP connection, its
332	   value MUST be larger than the UPERF_TIMEOUT.
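
   The computation and the constraint on the USER_TIMEOUT can be
   sketched as follows.  The function names are hypothetical, and we
   assume here that all values, including the USER_TIMEOUT, have been
   converted to milliseconds before being compared.

```python
def uperf_timeout(uperf_adv_to, uperf_remote_to, l_limit, u_limit):
    """UPERF_TIMEOUT = min(U_LIMIT, max(UPERF_ADV_TO, UPERF_REMOTE_TO, L_LIMIT))."""
    return min(u_limit, max(uperf_adv_to, uperf_remote_to, l_limit))


def check_user_timeout(user_timeout, uperf_to):
    """Enforce that a configured USER_TIMEOUT is larger than UPERF_TIMEOUT.

    user_timeout is None when no USER_TIMEOUT is defined for the connection.
    """
    if user_timeout is not None and user_timeout <= uperf_to:
        raise ValueError("USER_TIMEOUT must be larger than UPERF_TIMEOUT")
    return True


# Hypothetical values in milliseconds: local 500, remote 800, limits 200..2000.
threshold = uperf_timeout(500, 800, l_limit=200, u_limit=2000)
check_user_timeout(30000, threshold)   # a 30 second USER_TIMEOUT is acceptable
```

   Without this constraint, the connection could be terminated by the
   USER_TIMEOUT before any subflow is tagged as underperforming.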

334	   The UPERF_REMOTE_TO can be signaled by using a Multipath TCP option
335	   to the remote peer.  This document proposes the following
336	   experimental option to encode this information (Figure 2):

338	                        1                   2                   3
339	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
340	    +---------------+---------------+-------+-----------------------+
341	    |     Kind      |    Length     |Subtype| Flags |  Experiment   |
342	    +---------------+---------------+-------+-------+---------------+
343	    | Id. (16 bits) |       Maximum  RTO  (milliseconds)            |
344	    +---------------------------------------------------------------+

346	     Figure 2: The UPERF Maximum RTO experimental Multipath TCP option
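
   A possible encoding of this option is sketched below.  Several
   points are assumptions made for the purpose of illustration: we read
   Figure 2 as an 8-byte option in which the 16-bit Experiment Id.
   spans the last byte of the first word and the first byte of the
   second word, leaving 24 bits for the Maximum RTO.  The subtype, the
   flags and the experiment identifier are supplied by the caller
   because their values are not fixed by this document.  Only the
   option kind (30, used by Multipath TCP in [RFC6824]) is taken from
   an existing specification.

```python
import struct

MPTCP_KIND = 30     # TCP option kind used by Multipath TCP
UPERF_LENGTH = 8    # assumed total length of the option in bytes


def encode_uperf(subtype, flags, experiment_id, max_rto_ms):
    """Pack the experimental UPERF option (assumed layout, see lead-in)."""
    if not 0 <= subtype <= 0xF or not 0 <= flags <= 0xF:
        raise ValueError("subtype and flags are 4-bit fields")
    if not 0 <= experiment_id <= 0xFFFF:
        raise ValueError("the experiment identifier is a 16-bit field")
    if not 0 <= max_rto_ms <= 0xFFFFFF:
        raise ValueError("the maximum RTO is assumed to be a 24-bit field")
    header = struct.pack("!BBB", MPTCP_KIND, UPERF_LENGTH,
                         (subtype << 4) | flags)
    # 16-bit experiment identifier followed by the 24-bit RTO in milliseconds.
    body = ((experiment_id << 24) | max_rto_ms).to_bytes(5, "big")
    return header + body
```

   Under this reading, a threshold of 500 milliseconds fits easily in
   the Maximum RTO field, whereas it cannot be expressed with the
   encoding of [RFC5482].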

348	   We do not use the same encoding as [RFC5482] because the encoding for
349	   the USER_TIMEOUT option cannot support maximum RTOs that are smaller
350	   than one second.  There are already use cases where users are not
351	   willing to wait that long before switching to a backup subflow.

353	   The Experiment Identifier should be TBD and the flags must be used as
354	   defined in [I-D.bonaventure-mptcp-exp-option].

356	   If experiments conducted with this option show positive results, it
357	   could be possible to update the MP_PRIO option to encode the maximum
358	   RTO information as shown in Figure 3.

360	                         1                   2                   3
361	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
362	    +---------------+---------------+-------+-----+-+--------------+
363	    |     Kind      |     Length    |Subtype|     |B| AddrID (opt) |
364	    +---------------+---------------+-------+-----+-+--------------+
365	    |            Maximum RTO  (milliseconds)        |
366	    +-----------------------------------------------+

368	           Figure 3: The UPERF Maximum RTO Multipath TCP option

370	4.  Security considerations

372	   This document does not modify the security considerations for
373	   Multipath TCP.

375	5.  IANA considerations

377	   This document proposes the UPERF experimental Multipath TCP option
378	   whose experiment identifier is TBD.

380	   If experiments are successful, an update to this document will
381	   propose a new format for the MP_PRIO option defined in [RFC6824].

383	6.  Conclusion

385	   In this document, we have first explained some issues with the
386	   handling of backup subflows by Multipath TCP.  Multipath TCP meets
387	   the expectations of its users when subflows fail completely.  In this
388	   case, Multipath TCP moves the traffic over the backup subflows.
389	   However, if the primary subflows underperform, Multipath TCP
390	   implementations may try to retransmit data over such subflows for a
391	   long period of time instead of switching quickly to the backup
392	   subflow.  We have then proposed to set an upper bound on the
393	   retransmission timer (RTO) to detect underperforming subflows.  This
394	   bound can be set locally or exchanged through the proposed UPERF
395	   Multipath TCP option.

397	7.  Acknowledgements

399	   This work was partially supported by the FP7-Trilogy2 project.  We
400	   would like to thank Mohamed Boucadair for his useful suggestions and
401	   comments on this document.

403	8.  References

405	8.1.  Normative References

407	   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
408	              "TCP Extensions for Multipath Operation with Multiple
409	              Addresses", RFC 6824, January 2013.

411	8.2.  Informative References

413	   [BD2015]   Baerts, M. and Q. De Coninck, "Multipath TCP with Real
414	              Smartphone Applications", Master Thesis, UCL, June 2015.

416	   [Cellnet12]
417	              Paasch, C., Detal, G., Duchene, F., Raiciu, C., and O.
418	              Bonaventure, "Exploring Mobile/WiFi Handover with
419	              Multipath TCP", ACM SIGCOMM workshop on Cellular Networks
420	              (Cellnet12), 2012,
421	              <http://inl.info.ucl.ac.be/publications/
422	              exploring-mobilewifi-handover-multipath-tcp>.

424	   [I-D.bonaventure-mptcp-exp-option]
425	              Bonaventure, O., Hesmans, B., and M.
426	              Boucadair, "Experimental Multipath TCP option", draft-
427	              bonaventure-mptcp-exp-option-00 (work in progress), June
428	              2015.

430	   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
431	              793, September 1981.

433	   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
434	              Communication Layers", STD 3, RFC 1122, October 1989.

436	   [RFC5461]  Gont, F., "TCP's Reaction to Soft Errors", RFC 5461,
437	              February 2009.

439	   [RFC5482]  Eggert, L. and F. Gont, "TCP User Timeout Option", RFC
440	              5482, March 2009.

442	   [RFC5533]  Nordmark, E. and M. Bagnulo, "Shim6: Level 3 Multihoming
443	              Shim Protocol for IPv6", RFC 5533, June 2009.

445	   [RFC5534]  Arkko, J. and I. van Beijnum, "Failure Detection and
446	              Locator Pair Exploration Protocol for IPv6 Multihoming",
447	              RFC 5534, June 2009.

449	   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
450	              (BFD)", RFC 5880, June 2010.

452	   [RFC5883]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
453	              (BFD) for Multihop Paths", RFC 5883, June 2010.

455	   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
456	              "Computing TCP's Retransmission Timer", RFC 6298, June
457	              2011.

459	Authors' Addresses

461	   Olivier Bonaventure
462	   UCLouvain

464	   Email: Olivier.Bonaventure@uclouvain.be

466	   Quentin De Coninck
467	   UCLouvain

469	   Email: Quentin.Deconinck@student.uclouvain.be
470	   Matthieu Baerts
471	   UCLouvain

473	   Email: Matthieu.Baerts@student.uclouvain.be

475	   Fabien Duchene
476	   UCLouvain

478	   Email: Fabien.Duchene@uclouvain.be

480	   Benjamin Hesmans
481	   UCLouvain

483	   Email: Benjamin.Hesmans@uclouvain.be