MPTCP Working Group                                           B. Hesmans
Internet-Draft                                            O. Bonaventure
Intended status: Informational                                 UCLouvain
Expires: January 7, 2017                                   July 06, 2016


                 A socket API to control Multipath TCP
                     draft-hesmans-mptcp-socket-00

Abstract

   This document proposes an enhanced socket API to allow applications
   to control the operation of a Multipath TCP stack.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 7, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Basic operation
   3.  Multipath TCP Socket API
     3.1.  Subflow list
     3.2.  Open subflow
     3.3.  Close subflow
     3.4.  Get subflow tuple
     3.5.  Subflow socket option
   4.  IANA considerations
   5.  Security considerations
   6.  Conclusion
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Multipath TCP [RFC6824] was designed as an incrementally deployable
   [RFC6182] extension to TCP [RFC0793].  One of its design objectives
   was to remain backward compatible with the traditional socket API to
   enable applications to benefit from Multipath TCP without requiring
   any modification.  This solution has been adopted by the Multipath
   TCP implementation in the Linux kernel [MultipathTCP-Linux].  In this
   implementation, once Multipath TCP has been enabled, all TCP
   applications automatically use it.  It is possible to turn Multipath
   TCP off on a per-socket basis, but this is rarely used.  The
   Multipath TCP stack contains a module, called the path manager, that
   controls the utilisation of the different paths.  Three path managers
   have been implemented:

   o  the "full mesh" path manager, which is the default one, tries to
      create subflows in full mesh among all the client addresses and
      all addresses advertised by the server.  All subflows are created
      by the client because the server assumes that the client is often
      behind a NAT or firewall.

   o  the "ndiffports" path manager was designed for single-homed hosts.
      It creates n parallel subflows between the client and the server.
      It has been defined notably for datacenters [SIGCOMM11].

   o  the "user space" path manager [CONEXT15] uses Netlink to expose
      events to specific applications and enables them to control the
      operation of the underlying MPTCP stack.

   However, discussions with users of the Multipath TCP implementation
   in the Linux kernel indicate that they would often want finer control
   over the underlying stack, and more precisely over the utilisation of
   the different subflows.  Smartphone applications are a typical
   example.  Measurements indicate that with the default path manager,
   many subflows are created without ever being used [PAM2016]
   [COMMAG2016].  This increases energy consumption and could be avoided
   by Multipath TCP-aware applications.

   The Multipath TCP implementation used in Apple smartphones, tablets
   and laptops [Apple-MPTCP] took a different approach.  This MPTCP
   stack is not exposed by default to the applications.  To use MPTCP,
   they need to use a specific address family and special system calls
   [ANRW2016].

   Using a new address family and new system calls is a major
   modification, and application developers may be unwilling to maintain
   different versions of their applications that run above regular TCP
   and Multipath TCP.  In this document, we propose a simple but
   powerful API that relies only on socket options and the existing
   system calls to interact with the MPTCP stack.  Application
   developers are already used to manipulating socket options and could
   thus easily extend their applications to better utilise the
   underlying MPTCP stack when available.  This approach is similar to
   the API outlined in [RFC6897], but to our knowledge, that API has
   never been implemented.  We also note that during the last decade the
   socket API exposed by SCTP evolved to use more socket options
   [RFC6458].

   This document is organised as follows.
   We first describe the basic operation of our enhanced API in
   Section 2.  We then show in Section 3 how the "getsockopt" and
   "setsockopt" system calls can be used to control the underlying
   Multipath TCP stack.  In this first version of the document, we focus
   on basic operations such as retrieving the list of subflows that
   compose a Multipath TCP connection, establishing a new subflow or
   terminating an existing subflow.  The next revision of this document
   will address more advanced topics such as non-blocking I/O and the
   utilisation of the "recvmsg" and "sendmsg" system calls.

2.  Basic operation

   In this section, we briefly describe the basic utilisation of the
   enhanced socket API for Multipath TCP.  As an illustration, we
   consider a dual-homed smartphone with a WiFi and a cellular
   interface that interacts with a single-homed server.

   We assume for simplicity in this example that the server is passive.
   It creates a listening socket and accepts incoming connections
   through the following system calls:

   o  "socket()"

   o  "bind()"

   o  "listen()"

   o  "accept()"

   Then data can be sent (resp. received) with the "send()" (resp.
   "recv()") system calls and the connection can be terminated by using
   the "close()" or "shutdown()" system calls.

   On the client side, the following system calls are used to create a
   Multipath TCP connection:

   o  "socket()"

   o  "connect()"

   The "connect()" system call succeeds once the initial subflow of the
   Multipath TCP connection has been established.  We assume here that
   Multipath TCP has been negotiated successfully.  The client can then
   send and receive data by using the "send()" and "recv()" system
   calls.

   The enhanced socket API enables the client (and also the server since
   the protocol is symmetrical, but we ignore this in this section) to
   control the utilisation of the different subflows.
   This control is performed by setting and retrieving socket options
   through the "setsockopt()" and "getsockopt()" system calls.  Four
   main socket options are defined to control the subflows used by the
   underlying Multipath TCP connection:

   o  "MPTCP_GET_SUB_IDS" can only be used by "getsockopt()".  It is
      used to retrieve the current list of the subflows that compose the
      underlying Multipath TCP connection.  In this list, one identifier
      is associated with each subflow.

   o  "MPTCP_GET_SUB_TUPLE".  This socket option is the per-subflow
      counterpart of the "getsockname()" and "getpeername()" system
      calls used with regular TCP.  When used with "getsockopt()", it
      retrieves the IP addresses and ports of the two endpoints of a
      particular subflow.

   o  "MPTCP_OPEN_SUB_TUPLE".  This socket option is the equivalent of
      the "connect()" system call, but it operates on subflows.  It
      attempts to establish a new subflow given its endpoints (the
      remote one and, optionally, the local one).

   o  "MPTCP_CLOSE_SUB_ID".  This socket option closes a specific
      subflow.

   As an example, consider a smartphone application that creates a
   Multipath TCP connection.  This connection is established by using
   the "connect()" system call.  The MPTCP stack selects the outgoing
   interface based on its routing table.  Let us assume that the initial
   subflow is established over the cellular interface.  This is the only
   subflow used for this connection at this time.  To perform a
   handover, the smartphone application would use "MPTCP_OPEN_SUB_TUPLE"
   to create a new subflow over the WiFi interface.  It can then use
   "MPTCP_GET_SUB_TUPLE" to retrieve the local and remote addresses of
   this subflow.  Now that the WiFi subflow is active, the application
   can use "MPTCP_CLOSE_SUB_ID" to close the cellular subflow.

3.  Multipath TCP Socket API

   From an application viewpoint, the interaction with the underlying
   stack is always performed through a single socket.  This unique
   socket is used even if a Multipath TCP stack is used and many
   subflows have been established.  This single socket abstraction is
   important because the applications exchange data through a bytestream
   with both TCP and Multipath TCP.  We preserve this abstraction in the
   proposed enhanced socket API but expose some details of the
   underlying MPTCP stack to the application.

   For all the socket options presented below, we assume that the
   underlying connection is still a Multipath TCP connection.  Otherwise
   (e.g. after a fallback to regular TCP), they return an error and set
   errno to "EOPNOTSUPP".

3.1.  Subflow list

   The first important piece of information that a stack can expose is
   the set of subflows that are combined within a given Multipath TCP
   connection.  For this, we need a data structure that represents the
   different subflows that compose a connection.  The "mptcp_sub_ids"
   structure shown in Figure 1 contains an array with the status of the
   different subflows that compose a given connection.  The actual size
   of the array depends on the number of subflows and is given by the
   "sub_count" field.  The "mptcp_sub_status" structure reflects the
   status of each subflow.  A subflow is identified by its "id".  In
   addition to the "id" of the subflow, the "mptcp_sub_status" structure
   contains one flag: the "low_prio" flag.  It is set to 1 when the
   subflow is defined as a back-up subflow.  Other flags could be
   exposed through this structure in the future.

   struct mptcp_sub_status {
       __u8    id;
       __u16   low_prio:1;
   };

   struct mptcp_sub_ids {
       __u8    sub_count;
       struct mptcp_sub_status sub_status[];
   };

       Figure 1: The mptcp_sub_ids and mptcp_sub_status structures

   This structure is used by the "MPTCP_GET_SUB_IDS" socket option.
   More precisely, "getsockopt()", when used with the
   "MPTCP_GET_SUB_IDS" socket option, retrieves the "mptcp_sub_ids" of
   the underlying Multipath TCP connection.  This call may return an
   empty array if the connection does not contain any subflow.  This can
   happen with Multipath TCP when the last subflow composing the
   connection has been terminated abruptly.

   The "id" that is returned in the "mptcp_sub_ids" structure is
   important because it identifies the subflow and is used as an
   identifier by the other socket options.

   The call may return the error "EINVAL" if the buffer passed by the
   application is too small to hold the array of subflow statuses.

   A simple example of its utilisation is presented in Figure 2.

   int i;
   socklen_t optlen;
   struct mptcp_sub_ids *ids;

   /* Room for 8 subflows (arbitrary); EINVAL if this is too small. */
   optlen = sizeof(struct mptcp_sub_ids) +
            8 * sizeof(struct mptcp_sub_status);
   ids = malloc(optlen);

   if (ids != NULL &&
       getsockopt(sockfd, IPPROTO_TCP, MPTCP_GET_SUB_IDS,
                  ids, &optlen) == 0) {
       /* Each id identifies one subflow for the other options. */
       for (i = 0; i < ids->sub_count; i++) {
           printf("Subflow id : %i\n", ids->sub_status[i].id);
       }
   }
   free(ids);

     Figure 2: Sample code for the utilisation of MPTCP_GET_SUB_IDS

3.2.  Open subflow

   Another important part of the API is to enable an application to open
   new subflows.  This is possible through the "MPTCP_OPEN_SUB_TUPLE"
   socket option.  This option uses the "mptcp_sub_tuple" structure
   shown in Figure 3 to pass the priority and the local and remote
   endpoints of the new subflow.

   struct mptcp_sub_tuple {
       __u8    id;
       __u8    prio;
       __u8    addrs[0];
   };

                Figure 3: The mptcp_sub_tuple structure

   The "id" field is an output.  This is the "id" of the created
   subflow.
   The "prio" field indicates whether the new subflow should be
   considered as a back-up subflow or not.  The "addrs" field must hold
   an array of two addresses.  The first address must be the address of
   the source and the second address must be the address of the
   destination.  The actual structure passed must be either
   "sockaddr_in" or "sockaddr_in6", and the two elements of the array
   must be of the same type.  The address family field of the generic
   "sockaddr" structure can be used to determine which one is actually
   passed.

   The caller can also set the source address to be either "INADDR_ANY"
   for IPv4 or "in6addr_any" for IPv6.  In this case, the kernel chooses
   the source address to be used for the new subflow.

   If an error occurs while the subflow is being created, the error that
   "bind()" or "connect()" would have reported is returned.

   An example is provided in Figure 4.

   socklen_t optlen;
   struct mptcp_sub_tuple *sub_tuple;
   struct sockaddr_in *addr;
   int error;

   optlen = sizeof(struct mptcp_sub_tuple) +
            2 * sizeof(struct sockaddr_in);
   sub_tuple = calloc(1, optlen);   /* zeroed: no stray bytes */

   sub_tuple->id = 0;               /* output: set by the stack */
   sub_tuple->prio = 0;

   /* First address: source endpoint of the new subflow. */
   addr = (struct sockaddr_in*) &sub_tuple->addrs[0];

   addr->sin_family = AF_INET;
   addr->sin_port = htons(12345);
   inet_pton(AF_INET, "10.0.0.1", &addr->sin_addr);

   /* Second address: destination endpoint. */
   addr++;

   addr->sin_family = AF_INET;
   addr->sin_port = htons(1234);
   inet_pton(AF_INET, "10.1.0.1", &addr->sin_addr);

   /* getsockopt() is used because "id" is returned to the caller. */
   error = getsockopt(sockfd, IPPROTO_TCP, MPTCP_OPEN_SUB_TUPLE,
                      sub_tuple, &optlen);

        Figure 4: Sample code to establish an additional subflow

3.3.  Close subflow

   To close a subflow, the socket option "MPTCP_CLOSE_SUB_ID" is used.
   This option uses the "mptcp_close_sub_id" structure defined in
   Figure 5.

   struct mptcp_close_sub_id {
       __u8    id;
       int     how;
   };

               Figure 5: The mptcp_close_sub_id structure

   In the above structure, "id" is the identifier of the subflow that
   needs to be closed.
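
   This section gives no sample code for closing a subflow, so the
   following is a minimal sketch of how an application might fill the
   structure of Figure 5 and issue the request.  Several points are
   assumptions rather than part of this document: the numeric value of
   "MPTCP_CLOSE_SUB_ID" is a placeholder, the use of "setsockopt()"
   (rather than "getsockopt()") is a guess, "uint8_t" stands in for
   "__u8", and the helper names "mptcp_close_req_init()" and
   "mptcp_close_subflow()" are hypothetical.  The "how" field takes one
   of the "shutdown()" constants, as described in this section.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* ASSUMPTION: placeholder option number, not assigned by this document.
 * Use the value from the MPTCP stack's own header when one is available. */
#ifndef MPTCP_CLOSE_SUB_ID
#define MPTCP_CLOSE_SUB_ID 67
#endif

/* Mirrors Figure 5, with uint8_t standing in for the kernel type __u8. */
struct mptcp_close_sub_id {
    uint8_t id;
    int     how;
};

/* Hypothetical helper: fill a close request.
 * Returns 0 on success, or -1 with errno set to EINVAL when "how" is not
 * one of the shutdown() constants.  The additional "RST" value mentioned in
 * the text is left out because its numeric value is not given. */
int mptcp_close_req_init(struct mptcp_close_sub_id *req, uint8_t id, int how)
{
    assert(req != NULL);
    if (how != SHUT_RD && how != SHUT_WR && how != SHUT_RDWR) {
        errno = EINVAL;
        return -1;
    }
    memset(req, 0, sizeof(*req));   /* also clears padding bytes */
    req->id = id;
    req->how = how;
    return 0;
}

/* Hypothetical helper: ask the stack to close subflow "id" on "sockfd".
 * ASSUMPTION: the option is issued with setsockopt().
 * Returns 0 on success, or -1 with errno set by the helper or the kernel. */
int mptcp_close_subflow(int sockfd, uint8_t id, int how)
{
    struct mptcp_close_sub_id req;

    if (mptcp_close_req_init(&req, id, how) < 0)
        return -1;
    return setsockopt(sockfd, IPPROTO_TCP, MPTCP_CLOSE_SUB_ID,
                      &req, sizeof(req));
}
```

   With these helpers, the handover scenario of Section 2 would end with
   a call such as "mptcp_close_subflow(sockfd, cellular_id, SHUT_RDWR)",
   where "cellular_id" is the identifier previously obtained through
   "MPTCP_GET_SUB_IDS".  On a stack that does not implement the option,
   the call is expected to fail and the application can simply keep the
   subflow.
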
   If the "id" is invalid, "EINVAL" is returned.

   The "how" field is used to define how the subflow should be
   terminated.  It recognises the same set of constants that are used by
   "shutdown()".  In addition to this set, "RST" can be used to indicate
   that the subflow should be terminated by sending an "RST".

3.4.  Get subflow tuple

   An application may also be interested in the addresses and ports that
   are used by a given subflow.  To retrieve this information, the
   socket option "MPTCP_GET_SUB_TUPLE" is used in combination with the
   "mptcp_sub_tuple" structure shown in Figure 6.

   struct mptcp_sub_tuple {
       __u8    id;
       __u8    addrs[0];
   };

                Figure 6: The mptcp_sub_tuple structure

   This is the same structure as the one used to open a subflow but in
   this context, "id" is the input and "addrs" is the output.

   Sample code is provided in Figure 7.

   socklen_t optlen;
   struct mptcp_sub_tuple *sub_tuple;
   struct sockaddr_in *sin;

   optlen = 100;   /* large enough for two IPv4 or IPv6 addresses */

   sub_tuple = malloc(optlen);

   sub_tuple->id = sub_id;   /* input: subflow to query */
   getsockopt(sockfd, IPPROTO_TCP, MPTCP_GET_SUB_TUPLE, sub_tuple,
              &optlen);

   /* First address: source.  This example assumes IPv4. */
   sin = (struct sockaddr_in*) &sub_tuple->addrs[0];

   printf("\tip src : %s src port : %hu\n", inet_ntoa(sin->sin_addr),
          ntohs(sin->sin_port));

   /* Second address: destination. */
   sin++;

   printf("\tip dst : %s dst port : %hu\n", inet_ntoa(sin->sin_addr),
          ntohs(sin->sin_port));

       Figure 7: Sample code using the MPTCP_GET_SUB_TUPLE option

3.5.  Subflow socket option

   TCP/IP implementations support different socket options.  Some of
   them can be applied to the TCP layer while others can be applied to
   the IP layer.  To be able to issue a socket option on a specific
   subflow, we define the "MPTCP_SUB_GETSOCKOPT" and
   "MPTCP_SUB_SETSOCKOPT" options.  These two socket options use,
   respectively, the two structures presented in Figure 8.

   struct mptcp_sub_getsockopt {
       __u8    id;
       int     level;
       int     optname;
       char    __user *optval;
       unsigned int    __user *optlen;
   };

   struct mptcp_sub_setsockopt {
       __u8    id;
       int     level;
       int     optname;
       char    __user *optval;
       unsigned int    optlen;
   };

       Figure 8: Structures used by the "MPTCP_SUB_GETSOCKOPT" and
                     "MPTCP_SUB_SETSOCKOPT" options

   In the two structures, "id" indicates to which subflow the socket
   option should be redirected.  The remaining fields of each structure
   contain the information needed to perform the socket option call on
   the subflow.

   Figure 9 illustrates how the "IP_TOS" socket option can be applied on
   a particular subflow.

   unsigned int optlen, sub_optlen;
   struct mptcp_sub_setsockopt sub_sso;
   int val = 12;

   optlen = sizeof(struct mptcp_sub_setsockopt);
   sub_optlen = sizeof(int);
   sub_sso.id = sub_id;             /* subflow to configure */
   sub_sso.level = IPPROTO_IP;      /* level of the inner option */
   sub_sso.optname = IP_TOS;
   sub_sso.optlen = sub_optlen;
   sub_sso.optval = (char *) &val;

   setsockopt(sockfd, IPPROTO_TCP, MPTCP_SUB_SETSOCKOPT, &sub_sso,
              optlen);

                    Figure 9: Example socket option

4.  IANA considerations

   There are no IANA considerations in this document.

5.  Security considerations

   TCP and UDP implementations usually reserve port numbers below 1024
   for privileged users.  On such implementations, Multipath TCP should
   restrict the ability of the users to create subflows on privileged
   ports through the "MPTCP_OPEN_SUB_TUPLE" socket option.

   For similar reasons, the "MPTCP_SUB_GETSOCKOPT" and
   "MPTCP_SUB_SETSOCKOPT" socket options should not enable an
   unprivileged user to retrieve or modify a socket option on a subflow
   if that user is not allowed to perform such actions on a regular TCP
   connection.

   Applications requiring strong security should implement cryptographic
   protocols such as TLS [RFC5246] or ssh [RFC4251].
   The proposed API enables such applications to better control their
   utilisation of the underlying interfaces by managing the different
   subflows.

6.  Conclusion

   In this document, we have documented an enhanced socket API that
   enables applications to control the creation and the release of
   subflows by the underlying Multipath TCP stack.  We expect that a
   standardised API supported by different implementations will be an
   important step for the deployment of Multipath TCP-aware applications
   both on multihomed hosts such as smartphones and on servers.  This
   enhanced API has already been implemented on the Multipath TCP
   implementation in the Linux kernel.  Future versions of this document
   will address more advanced utilisations of the socket API such as
   non-blocking I/O and the "sendmsg()" and "recvmsg()" system calls.

7.  Acknowledgements

   We would like to thank Christoph Paasch, Quentin De Coninck and Rao
   Shoaib for their comments on an early version of this document.

8.  References

8.1.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013.

8.2.  Informative References

   [ANRW2016]
              Hesmans, B. and O. Bonaventure, "An enhanced socket API
              for Multipath TCP", 2016.

   [Apple-MPTCP]
              Apple, Inc., "iOS - Multipath TCP Support in iOS 7", n.d.

   [COMMAG2016]
              De Coninck, Q., Baerts, M., Hesmans, B., and O.
              Bonaventure, "Observing Real Smartphone Applications over
              Multipath TCP", IEEE Communications Magazine, March 2016.

   [CONEXT15]
              Hesmans, B., Detal, G., Barre, S., Bauduin, R., and O.
              Bonaventure, "SMAPP - Towards Smart Multipath TCP-enabled
              APPlications", Proc. Conext 2015, Heidelberg, Germany,
              December 2015.

   [MultipathTCP-Linux]
              Paasch, C., Barre, S., et al., "Multipath TCP
              implementation in the Linux kernel", n.d.

   [PAM2016]  De Coninck, Q., Baerts, M., Hesmans, B., and O.
              Bonaventure, "A First Analysis of Multipath TCP on
              Smartphones", 17th International Passive and Active
              Measurements Conference (PAM2016), March 2016.

   [RFC4251]  Ylonen, T. and C. Lonvick, Ed., "The Secure Shell (SSH)
              Protocol Architecture", RFC 4251, DOI 10.17487/RFC4251,
              January 2006.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246,
              DOI 10.17487/RFC5246, August 2008.

   [RFC6182]  Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
              Iyengar, "Architectural Guidelines for Multipath TCP
              Development", RFC 6182, DOI 10.17487/RFC6182, March 2011.

   [RFC6458]  Stewart, R., Tuexen, M., Poon, K., Lei, P., and V.
              Yasevich, "Sockets API Extensions for the Stream Control
              Transmission Protocol (SCTP)", RFC 6458,
              DOI 10.17487/RFC6458, December 2011.

   [RFC6897]  Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application
              Interface Considerations", RFC 6897, DOI 10.17487/RFC6897,
              March 2013.

   [SIGCOMM11]
              Raiciu, C., Barre, S., Pluntke, C., Greenhalgh, A.,
              Wischik, D., and M. Handley, "Improving datacenter
              performance and robustness with multipath TCP",
              Proceedings of the ACM SIGCOMM 2011 conference, 2011.

Authors' Addresses

   Benjamin Hesmans
   UCLouvain

   Email: Benjamin.Hesmans@uclouvain.be


   Olivier Bonaventure
   UCLouvain

   Email: Olivier.Bonaventure@uclouvain.be