idnits 2.17.1

draft-bonaventure-mptcp-backup-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC6824]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 332: '... value MUST be larger than the UPERF...'

  -- The draft header indicates that this document updates RFC6824, but the
     abstract doesn't seem to directly say this. It does mention RFC6824
     though, so this could be OK.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 06, 2015) is 3211 days in the past. Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 6824 (Obsoleted by RFC 8684)

  -- Obsolete informational reference (is this intentional?): RFC 793
     (Obsoleted by RFC 9293)

     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

MPTCP Working Group                                       O. Bonaventure
Internet-Draft                                             Q. De Coninck
Updates: 6824 (if approved)                                    M. Baerts
Intended status: Experimental                                 F. Duchene
Expires: January 7, 2016                                      B. Hesmans
                                                               UCLouvain
                                                           July 06, 2015

                Improving Multipath TCP Backup Subflows
                   draft-bonaventure-mptcp-backup-00

Abstract

   This document describes some issues with the current definition of
   the backup subflows in [RFC6824]. The solution proposed in [RFC6824]
   works well when a subflow completely fails. However, if a subflow
   suffers from heavy packet loss but still remains up, then the delay
   to switch to the backup subflow may be very long. We propose to
   measure the evolution of the retransmission timer (RTO) to detect the
   bad performance of subflows.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 7, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  What is a Subflow Failure?
   3.  Detecting Underperforming Subflows
   4.  Security considerations
   5.  IANA considerations
   6.  Conclusion
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Multipath TCP is an extension to TCP [RFC0793] that was specified in
   [RFC6824]. A Multipath TCP connection is composed of one or more
   subflows. Each subflow is a TCP connection that is established by
   using the classical TCP three-way handshake. The subflows that
   compose a Multipath TCP connection are not all equal. [RFC6824]
   defines two types of subflows:

   o  the regular subflows

   o  the backup subflows

   The regular subflows can be used to transport any data. The backup
   subflows are intended to be used only when all the regular subflows
   have failed. Section 2.5 of [RFC6824] defines them by using the
   following sentence: "Hosts can indicate at initial subflow setup
   whether they wish the subflow to be used as a regular or backup path
   - a backup path only being used if there are no regular paths
   available."

   Intuitively, a user expects that, when the regular subflow fails, the
   backup subflow will be used to continue the data transfer and
   minimize the impact of the failure on the Multipath TCP connection.

   In this document, we first describe in Section 2 how Multipath TCP
   operates when backup subflows are used and some of the operational
   problems that this causes. Backup subflows work well when subflows
   completely fail due to, for example, the reception of a RST segment
   or the invalidity of the IP address associated with the subflow
   (expired lease time, detachment from the network, etc.). However,
   there are many practical situations where the failure of a regular
   subflow cannot be quickly detected and the user experience suffers.
   We then propose in Section 3 a slight modification to the handling of
   the backup subflows in Multipath TCP.

2.  What is a Subflow Failure?

   Experience with Multipath TCP shows that backup subflows, which are
   only used when all the other subflows have failed, work well on fixed
   hosts where the loss of connectivity can be quickly detected by the
   affected host. However, there are many situations where it can be
   difficult to detect the failure of a regular subflow.

               <----- primary subflow ----->

        +----link1----router1-------router2---link2---+
        |                                             |
      Client                                        Server
        |                                             |
        +----link3----router3-------router4---link4---+

               <----- backup subflow ----->

                      Figure 1: Simple network

   To understand the situation, let us consider the simple network shown
   in Figure 1.
   In this network, the client has established two
   subflows:

   o  a regular subflow passing through router1 and router2

   o  a backup subflow passing through router3 and router4

   [RFC6824] supports two methods to signal that a subflow is a backup
   subflow:

   o  setting the B bit in the MP_JOIN option that is used to create the
      subflow

   o  sending the MP_PRIO option with the B bit set

   Note that in both cases, when a host sets the B bit in the MP_JOIN or
   sends an MP_PRIO option, it requests the other host to only use the
   subflow if the other regular subflows have failed. Setting the B bit
   in the MP_JOIN option or sending the MP_PRIO option does not affect
   the data sent by the host that sends this option [RFC6824].

   Let us now consider three different failure scenarios. For
   simplicity, we assume that all the data flows from the Server to the
   Client and that the top subflow is the primary subflow while the
   bottom subflow was signaled as a backup subflow.

   Our first failure scenario is the simplest one: the failure of link1.
   In this case, the Client detects the failure locally. This detection
   can be fast with wired link layer technologies and slower with some
   wireless technologies. Once the failure has been detected, the
   Client can either send a REMOVE_ADDR option to indicate the failure
   of its address attached to link1 or send an MP_PRIO option with the B
   bit reset over the backup subflow. In both cases, a single segment
   sent over the backup subflow is sufficient to inform the Server of
   the failure of the primary subflow. Note that the REMOVE_ADDR and
   the MP_PRIO options are sent unreliably. This implies that any loss
   of these options will further delay the recovery on the Server.

   Our second failure scenario is the symmetric scenario: the failure of
   link2.
   In this case, the Server will react by sending a REMOVE_ADDR
   option over the backup subflow to indicate the loss of the address
   attached to this link. Since the Server knows that the primary
   subflow has failed, it can immediately start to use the backup
   subflow to send data to the Client. Experiments show that these two
   failure scenarios work well [Cellnet12].

   The third failure scenario is a failure of the link between router1
   and router2. Different types of failures are possible on this link.
   We consider two extreme cases. The first case is a pure link failure
   that is detected by the two routers. Since there is no alternate
   path between router1 and router2 in our example network, the Client
   cannot reach the Server anymore over the top path. Once router1 and
   router2 have detected the failure, they will return ICMP destination
   unreachable messages to the Client and the Server. This error
   message could suggest a failure of the primary subflow. According to
   [RFC1122], this ICMP message should cause the termination of the top
   subflow. However, according to [RFC5461], current TCP
   implementations do not follow this recommendation and ignore the
   received ICMP messages. This is motivated by the risk of denial of
   service attacks that could disrupt existing TCP connections by
   sending spoofed ICMP messages. A Multipath TCP implementation could
   react differently and for example consider the subflow over which the
   ICMP message was received as temporarily unusable to cause the
   utilization of other (possibly backup) subflows.

   If a Multipath TCP implementation does not react to ICMP messages,
   the last resort method to detect the failure of the top path is the
   retransmission timer (RTO). TCP implementations apply an exponential
   backoff algorithm to the retransmission timeout [RFC6298].
   If the primary path fails, the retransmission timeout associated with
   this path will double after each expiration until it reaches the
   maximum value configured on the TCP stack. On many stacks, this
   limit is on the order of tens of minutes, which does not match the
   expectations of the Multipath TCP user, who expects that her backup
   subflow will be used earlier than that. A similar situation occurs
   when the link between the two routers remains up but is so congested
   that packets sent on the regular subflow rarely traverse the link
   [BD2015]. In this case, the user also expects to be able to quickly
   use the backup subflow to preserve the end-to-end connectivity.

3.  Detecting Underperforming Subflows

   As explained in the previous section, users cannot accept an overly
   long delay in detecting the failure of a regular subflow and
   switching to an existing backup subflow. [RFC6824] allows a host to
   specify that a subflow is a backup subflow, but there is no
   definition of underperforming subflows and no mechanism to allow
   applications to specify a switchover time to a backup subflow.

   Various techniques exist to detect failures. Shim6 [RFC5533]
   includes the REAP protocol [RFC5534] to verify the reachability of
   addresses. BFD [RFC5880] is used to detect link failures between
   routers and also over multihop paths [RFC5883]. Depending on the
   chosen parameters, these protocols can achieve fast detection and/or
   low overhead. We do not believe that additional protocols are
   required to quickly detect the failure of a subflow. With its
   retransmission timer that doubles after each unsuccessful
   retransmission, Multipath TCP already has the ability to detect
   underperforming subflows. If data is transmitted over a broken
   subflow, the retransmission timer of this subflow will quickly
   increase.
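
   The effect of this exponential backoff can be illustrated with a
   short Python sketch. It models a subflow on which every
   retransmission is lost and reports how many retransmissions occur,
   and how much time elapses, before the RTO first reaches a given
   threshold. The function name, the 1-second initial RTO and the
   120-second cap are assumptions chosen for the illustration; real
   stacks use their own values.

```python
def backoff_until(initial_rto, max_rto, threshold):
    """Return (retransmissions, elapsed_seconds) when RTO reaches threshold.

    Hypothetical helper: models a subflow on which every retransmission
    is lost, so the RTO doubles after each expiry, capped at max_rto.
    """
    rto = initial_rto
    elapsed = 0.0
    retransmissions = 0
    while rto < threshold:
        elapsed += rto                 # the timer expires after rto seconds
        retransmissions += 1           # the segment is retransmitted
        rto = min(rto * 2, max_rto)    # exponential backoff with a cap
        if rto == max_rto and rto < threshold:
            raise ValueError("threshold is above max_rto and is never reached")
    return retransmissions, elapsed


# Assumed values: 1 s initial RTO, 120 s cap, 8 s threshold.
print(backoff_until(1.0, 120.0, 8.0))  # (3, 7.0)
```

   Under these assumptions, an 8-second threshold is reached after three
   lost retransmissions, 7 seconds after the original transmission. A
   lower threshold shortens this delay; a threshold above the cap is
   never reached.
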
   These successive retransmissions are an appropriate
   mechanism to detect the failure of a subflow and switch to a backup
   one, provided that the TCP retransmission timer does not become too
   high.

   [RFC0793] specifies an abstract API that allows user applications to
   indicate bounds on the retransmission timer. [RFC5482] goes further
   by defining the User Timeout option, a TCP option that can be used to
   signal a proposed maximum value for the TCP retransmission timeout.
   This option specifies the maximum time that some data can remain
   unacknowledged before considering the connection to have failed. In
   [RFC5482], the User Timeout is encoded as a 15-bit field that
   represents seconds or minutes. This implies that the User Timeout
   option cannot be used to signal a bound smaller than 1 second.

   With the User Timeout option, the TCP connection must be terminated
   once its RTO reaches the signaled maximum value.

   [RFC5482] defines the following parameters:

   o  U_LIMIT: the upper limit on the USER TIMEOUT

   o  L_LIMIT: the lower limit on the USER TIMEOUT

   In addition, the application can specify, e.g. through a socket
   option, the USER TIMEOUT that it wishes to use and advertise to the
   peer: ADV_UTO. Similarly, the REMOTE_UTO is the User Timeout option
   received from the peer. Then, [RFC5482] defines the USER TIMEOUT
   with the following formula:

   USER_TIMEOUT = min(U_LIMIT, max(ADV_UTO, REMOTE_UTO, L_LIMIT))

   [RFC6824] does not discuss precisely how the User Timeout option
   should be handled if received over a Multipath TCP connection. If
   this option is set through the regular socket API that does not
   expose any information about the subflows, it must apply to the
   overall Multipath TCP connection.
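
   The USER_TIMEOUT formula quoted earlier in this section can be
   written as a one-line function. The sketch below is in Python; the
   function name and the numeric values (in seconds) are illustrative
   assumptions, not recommended settings.

```python
def user_timeout(adv_uto, remote_uto, l_limit, u_limit):
    """Apply USER_TIMEOUT = min(U_LIMIT, max(ADV_UTO, REMOTE_UTO, L_LIMIT))."""
    return min(u_limit, max(adv_uto, remote_uto, l_limit))


# Illustrative limits, in seconds (assumed values).
L_LIMIT, U_LIMIT = 5, 600

print(user_timeout(30, 10, L_LIMIT, U_LIMIT))   # 30: larger of the two UTOs
print(user_timeout(1, 2, L_LIMIT, U_LIMIT))     # 5: raised to L_LIMIT
print(user_timeout(900, 10, L_LIMIT, U_LIMIT))  # 600: capped at U_LIMIT
```

   The inner max() keeps the larger of the local and remote values and
   enforces the lower limit; the outer min() enforces the upper limit.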

   In this document, we envision an API that exposes some parts of
   Multipath TCP to the application to enable it to make better use of
   the features of the protocol. Such an API would expose some
   information about the subflows to the applications.

   A first possibility to control the performance of the subflows could
   be to specify a USER_TIMEOUT on a per subflow basis and terminate the
   subflows whose RTO has reached the USER_TIMEOUT. However,
   terminating an underperforming subflow may be too severe in
   environments where there are transient losses such as wireless
   networks. An alternative approach is to tag the subflow as
   underperforming and modify the operation of Multipath TCP.

   According to [RFC6824], an established subflow can operate in two
   modes:

   o  primary mode

   o  backup mode

   The initial subflow is always created in primary mode. When a
   subflow is created, its mode depends on the B bit of the received
   MP_JOIN option. The reception of the MP_PRIO option changes the mode
   of the corresponding subflow. When a Multipath TCP implementation
   sends data, it always selects one of the available primary subflows
   to transmit the data. The backup subflows are only selected if there
   is no established subflow in primary mode.

   We propose a new mode of operation: the underperforming mode.
   Subflows are still established in the primary or backup mode as
   explained above. A subflow enters the underperforming mode as soon
   as its retransmission timer (RTO) reaches a configurable limit. At
   this point, the subflow is considered to be underperforming. An
   underperforming subflow cannot be selected for data transmission if
   there exists another subflow in primary or backup mode. Once a
   subflow has been tagged as underperforming, it remains in this mode
   as long as there is unacknowledged data on this subflow.
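
   The selection rule described above can be sketched in Python as
   follows. This is only an illustration under stated assumptions: the
   Subflow class, its fields, the rto_limit parameter and both function
   names are hypothetical. Picking the first candidate in each class
   and leaving the underperforming mode as soon as no data is
   outstanding are choices made for the sketch, not behaviour defined
   by [RFC6824] or by this document.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    name: str
    backup: bool                   # True if signaled as a backup subflow
    rto: float                     # current retransmission timer, seconds
    unacked: int = 0               # bytes sent here, not yet acknowledged
    underperforming: bool = False


def update_mode(sf, rto_limit):
    """Tag or untag a subflow as underperforming."""
    if sf.rto >= rto_limit:
        sf.underperforming = True
    elif sf.unacked == 0:
        # Assumption: leave the mode as soon as nothing is outstanding.
        sf.underperforming = False


def select_subflow(subflows):
    """Pick a subflow: primary first, then backup, then any that is left."""
    usable = [sf for sf in subflows if not sf.underperforming]
    primary = [sf for sf in usable if not sf.backup]
    backup = [sf for sf in usable if sf.backup]
    for candidates in (primary, backup, subflows):
        if candidates:
            return candidates[0]   # arbitrary choice among equals
    return None


wifi = Subflow("wifi", backup=False, rto=0.3)
lte = Subflow("lte", backup=True, rto=0.4)

wifi.rto, wifi.unacked = 2.0, 1000      # losses inflate the RTO
update_mode(wifi, rto_limit=1.0)
print(select_subflow([wifi, lte]).name)  # lte
```

   In this example the primary subflow is still established, yet the
   backup subflow is selected because the primary one has been tagged
   as underperforming.
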
   Once all data has been acknowledged, it may return to the primary or
   backup mode. Further experimentation is required to evaluate how
   quickly an underperforming subflow should leave the underperforming
   mode once all data has been acknowledged.

   System administrators and/or application developers (e.g. through a
   socket option) should be able to specify the maximum RTO that causes
   a Multipath TCP subflow to be tagged as underperforming. For this,
   we propose two new parameters:

   o  UPERF_ADV_TO: the upper threshold on the RTO that forces the
      subflow to be considered as underperforming

   o  UPERF_REMOTE_TO: the upper threshold on the RTO received from the
      remote peer

   The UPERF_ADV_TO is configured locally on the host. It could be
   configured globally or on a per connection basis. The configuration
   applies to all subflows of a Multipath TCP connection.

   The UPERF_REMOTE_TO is received in a Multipath TCP option. This
   value applies only to the subflow over which it has been received.

   The UPERF_TIMEOUT that is used to detect underperforming subflows is
   then computed by using the following formula:

   UPERF_TIMEOUT = min(U_LIMIT, max(UPERF_ADV_TO, UPERF_REMOTE_TO,
   L_LIMIT))

   If a USER_TIMEOUT is defined for the Multipath TCP connection, its
   value MUST be larger than the UPERF_TIMEOUT.

   The UPERF_REMOTE_TO can be signaled by using a Multipath TCP option
   to the remote peer. This document proposes the following
   experimental option to encode this information (Figure 2):

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-----------------------+
   |     Kind      |    Length     |Subtype| Flags |  Experiment   |
   +---------------+---------------+-------+-------+---------------+
   |   Id. (16 bits)               |   Maximum RTO (milliseconds)  |
   +---------------------------------------------------------------+

     Figure 2: The UPERF Maximum RTO experimental Multipath TCP option

   We do not use the same encoding as [RFC5482] because the encoding for
   the USER_TIMEOUT option cannot support maximum RTOs that are smaller
   than one second. There are already use cases where users are not
   willing to wait that long before switching to a backup subflow.

   The Experiment Identifier is TBD, and the flags must be used as
   defined in [I-D.bonaventure-mptcp-exp-option].

   If experiments conducted with this option show positive results, it
   could be possible to update the MP_PRIO option to encode the maximum
   RTO information as shown in Figure 3.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-----+-+--------------+
   |     Kind      |    Length     |Subtype|     |B| AddrID (opt) |
   +---------------+---------------+-------+-----+-+--------------+
   |          Maximum RTO (milliseconds)           |
   +-----------------------------------------------+

          Figure 3: The UPERF Maximum RTO Multipath TCP option

4.  Security considerations

   This document does not modify the security considerations for
   Multipath TCP.

5.  IANA considerations

   This document proposes the UPERF experimental Multipath TCP option
   whose experiment identifier is TBD.

   If experiments are successful, an update to this document will
   propose a new format for the MP_PRIO option defined in [RFC6824].

6.  Conclusion

   In this document, we have first explained some issues with the
   handling of backup subflows by Multipath TCP. Multipath TCP meets
   the expectations of its users when subflows fail completely. In this
   case, Multipath TCP moves the traffic over the backup subflows.
   However, if the primary subflows underperform, Multipath TCP
   implementations may try to retransmit data over such subflows for a
   long period of time instead of switching quickly to the backup
   subflow. We have then proposed to set an upper bound on the
   retransmission timer (RTO) to detect underperforming subflows. This
   bound can be set locally or exchanged through the proposed UPERF
   Multipath TCP option.

7.  Acknowledgements

   This work was partially supported by the FP7-Trilogy2 project. We
   would like to thank Mohamed Boucadair for his useful suggestions and
   comments on this document.

8.  References

8.1.  Normative References

   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, January 2013.

8.2.  Informative References

   [BD2015]   Baerts, M. and Q. De Coninck, "Multipath TCP with Real
              Smartphone Applications", Master Thesis, UCL, June 2015.

   [Cellnet12]
              Paasch, C., Detal, G., Duchene, F., Raiciu, C., and O.
              Bonaventure, "Exploring Mobile/WiFi Handover with
              Multipath TCP", ACM SIGCOMM workshop on Cellular Networks
              (Cellnet12), 2012.

   [I-D.bonaventure-mptcp-exp-option]
              Bonaventure, O., Hesmans, B., and M. Boucadair,
              "Experimental Multipath TCP option", draft-bonaventure-
              mptcp-exp-option-00 (work in progress), June 2015.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
              793, September 1981.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC5461]  Gont, F., "TCP's Reaction to Soft Errors", RFC 5461,
              February 2009.

   [RFC5482]  Eggert, L. and F. Gont, "TCP User Timeout Option", RFC
              5482, March 2009.

   [RFC5533]  Nordmark, E. and M. Bagnulo, "Shim6: Level 3 Multihoming
              Shim Protocol for IPv6", RFC 5533, June 2009.

   [RFC5534]  Arkko, J. and I. van Beijnum, "Failure Detection and
              Locator Pair Exploration Protocol for IPv6 Multihoming",
              RFC 5534, June 2009.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, June 2010.

   [RFC5883]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD) for Multihop Paths", RFC 5883, June 2010.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298, June
              2011.

Authors' Addresses

   Olivier Bonaventure
   UCLouvain

   Email: Olivier.Bonaventure@uclouvain.be

   Quentin De Coninck
   UCLouvain

   Email: Quentin.Deconinck@student.uclouvain.be

   Matthieu Baerts
   UCLouvain

   Email: Matthieu.Baerts@student.uclouvain.be

   Fabien Duchene
   UCLouvain

   Email: Fabien.Duchene@uclouvain.be

   Benjamin Hesmans
   UCLouvain

   Email: Benjamin.Hesmans@uclouvain.be