MPTCP Working Group                                           B. Hesmans
Internet-Draft                                            O. Bonaventure
Intended status: Informational                                 UCLouvain
Expires: July 13, 2017                                  January 09, 2017

                 A socket API to control Multipath TCP
                     draft-hesmans-mptcp-socket-01

Abstract

   This document proposes an enhanced socket API to allow applications
   to control the operation of a Multipath TCP stack.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 13, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Basic operation
   3.  Multipath TCP Socket API
     3.1.  Subflow list
     3.2.  Open subflow
     3.3.  Close subflow
     3.4.  Get subflow tuple
     3.5.  Subflow socket option
   4.
       IANA considerations
   5.  Security considerations
   6.  Conclusion
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Multipath TCP [RFC6824] was designed as an incrementally deployable
   [RFC6182] extension to TCP [RFC0793].  One of its design objectives
   was to remain backward compatible with the traditional socket API to
   enable applications to benefit from Multipath TCP without requiring
   any modification.  This solution has been adopted by the Multipath
   TCP implementation in the Linux kernel [MultipathTCP-Linux].  In this
   implementation, once Multipath TCP has been enabled, all TCP
   applications automatically use it.  It is possible to turn Multipath
   TCP off on a per-socket basis, but this is rarely used.  The
   Multipath TCP stack contains a module, called the path manager, that
   controls the utilisation of the different paths.  Three path managers
   have been implemented:

   o  the "full mesh" path manager, which is the default one, tries to
      create a full mesh of subflows between all the client addresses
      and all the addresses advertised by the server.  All subflows are
      created by the client because the server assumes that the client
      is often behind a NAT or firewall.

   o  the "ndiffports" path manager was designed for single-homed hosts.
      It creates n parallel subflows between the client and the server.
      It has been defined notably for datacenters [SIGCOMM11].

   o  the "user space" path manager [CONEXT15] uses Netlink to expose
      events to specific applications and enables them to control the
      operation of the underlying MPTCP stack.

   However, discussions with users of the Multipath TCP implementation
   in the Linux kernel indicate that they often want finer control over
   the underlying stack and, more precisely, over the utilisation of
   the different subflows.  Smartphone applications are a typical
   example.  Measurements indicate that, with the default path manager,
   many subflows are created without being used [PAM2016] [COMMAG2016].
   This increases energy consumption and could be avoided by Multipath
   TCP-aware applications.

   The Multipath TCP implementation used in Apple smartphones, tablets
   and laptops [Apple-MPTCP] took a different approach.  This MPTCP
   stack is not exposed by default to the applications.  To use MPTCP,
   they need to use a specific address family and special system calls
   [ANRW2016].

   Using a new address family and new system calls is a major
   modification, and application developers may not agree to maintain
   different versions of their applications that run above regular TCP
   and Multipath TCP.  In this document, we propose a simple but
   powerful API that relies only on socket options and the existing
   system calls to interact with the MPTCP stack.  Application
   developers are already used to manipulating socket options and could
   thus easily extend their applications to better utilize the
   underlying MPTCP stack when available.  This approach is similar to
   the API outlined in [RFC6897], but to our knowledge, that API has
   never been implemented.  We also note that during the last decade
   the socket API exposed by SCTP evolved to use more socket options
   [RFC6458].

   This document is organised as follows.
   We first describe the basic operation of our enhanced API in
   Section 2.  We then show in Section 3 how the "getsockopt" and
   "setsockopt" system calls can be used to control the underlying
   Multipath TCP stack.  In this first version of the document, we focus
   on basic operations such as retrieving the list of subflows that
   compose a Multipath TCP connection, establishing a new subflow or
   terminating an existing subflow.  We will address in the next
   revision of this document more advanced topics such as non-blocking
   I/O and the utilisation of the "recvmsg" and "sendmsg" system calls.

2.  Basic operation

   In this section, we briefly describe the basic utilisation of the
   enhanced socket API for Multipath TCP.  As an illustration, we
   consider a dual-homed smartphone having a WiFi and a cellular
   interface that interacts with a single-homed server.

   We assume for simplicity in this example that the server is passive.
   It creates a listening socket and accepts incoming connections
   through the following system calls:

   o  "socket()"

   o  "bind()"

   o  "listen()"

   o  "accept()"

   Then data can be sent (resp. received) with the "send()" (resp.
   "recv()") system calls and the connection can be terminated by using
   the "close()" or "shutdown()" system calls.

   On the client side, the following system calls are used to create a
   Multipath TCP connection:

   o  "socket()"

   o  "connect()"

   The "connect()" system call succeeds once the initial subflow of the
   Multipath TCP connection has been established.  We assume here that
   Multipath TCP has been negotiated successfully.  The client can then
   send and receive data by using the "send()" and "recv()" system
   calls.

   The enhanced socket API enables the client (and also the server since
   the protocol is symmetrical, but we ignore this in this section) to
   control the utilisation of the different subflows.
   This control is performed by setting and retrieving socket options
   through the "setsockopt()" and "getsockopt()" system calls.  Four
   main socket options are defined to control the subflows used by the
   underlying Multipath TCP connection:

   o  "MPTCP_GET_SUB_IDS" can only be used by "getsockopt()".  It is
      used to retrieve the current list of the subflows that compose the
      underlying Multipath TCP connection.  In this list, one identifier
      is associated with each subflow.

   o  "MPTCP_GET_SUB_TUPLE".  This socket option is equivalent to the
      "getpeername()" system call with regular TCP, but on a per-subflow
      basis.  When used with "getsockopt()", it retrieves the IP
      addresses and ports of the two endpoints of a particular subflow.

   o  "MPTCP_OPEN_SUB_TUPLE".  This socket option is the equivalent of
      the "connect()" system call, but it operates on subflows.  It
      attempts to establish a new subflow whose (remote and optionally
      local) endpoints are specified by the application.

   o  "MPTCP_CLOSE_SUB_ID".  This socket option closes a specific
      subflow.

   As an example, consider a smartphone application that creates a
   Multipath TCP connection.  This connection is established by using
   the "connect()" system call.  The MPTCP stack selects the outgoing
   interface based on its routing table.  Let us assume that the initial
   subflow is established over the cellular interface.  This is the only
   subflow used for this connection at this time.  To perform a
   handover, the smartphone application would use "MPTCP_OPEN_SUB_TUPLE"
   to create a new subflow over the WiFi interface.  It can then use
   "MPTCP_GET_SUB_TUPLE" to retrieve the local and remote addresses of
   this subflow.  Now that the WiFi subflow is active, the application
   can use "MPTCP_CLOSE_SUB_ID" to close the cellular subflow.

3.
    Multipath TCP Socket API

   From an application viewpoint, the interaction with the underlying
   stack is always performed through a single socket.  This unique
   socket is used even if a Multipath TCP stack is used and many
   subflows have been established.  This single socket abstraction is
   important because the applications exchange data through a bytestream
   with both TCP and Multipath TCP.  We preserve this abstraction in the
   proposed enhanced socket API but expose some details of the
   underlying MPTCP stack to the application.

   For all the socket options presented below, we assume that the
   underlying connection is still a Multipath TCP connection.
   Otherwise (e.g. after a fallback), they return an error and set
   errno to "EOPNOTSUPP".

3.1.  Subflow list

   The first important piece of information that a stack can expose is
   the set of subflows that are combined within a given Multipath TCP
   connection.  For this, we need a data structure that represents the
   different subflows that compose a connection.  The "mptcp_sub_ids"
   structure shown in Figure 1 contains an array with the status of the
   different subflows that compose a given connection.  The actual size
   of the array depends on the number of subflows and is given by the
   "sub_count" field.  The "mptcp_sub_status" structure reflects the
   status of each subflow.  A subflow is identified by its "id".  In
   addition to the "id" of the subflow, the "mptcp_sub_status" structure
   contains one flag: the "low_prio" flag.  It is set to 1 when the
   subflow is defined as a back-up subflow.  Other flags could be
   exposed through this structure in the future.
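
   As a minimal sketch of the layout just described, and not part of the
   proposed API, the following C fragment mirrors the two structures
   with fixed-width userspace types and computes the buffer size needed
   for a given number of subflows.  The helper name "mptcp_sub_ids_size"
   and the use of "uint8_t"/"uint16_t" in place of the kernel
   "__u8"/"__u16" types are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the per-subflow status (assumption: uint8_t and
 * uint16_t stand in for the kernel __u8 and __u16 types). */
struct mptcp_sub_status {
    uint8_t  id;          /* subflow identifier */
    uint16_t low_prio:1;  /* 1 when the subflow is a back-up subflow */
};

/* Userspace mirror of the subflow list: a count followed by a
 * variable-length array of statuses (C99 flexible array member). */
struct mptcp_sub_ids {
    uint8_t sub_count;
    struct mptcp_sub_status sub_status[];
};

/* Hypothetical helper: number of bytes needed to hold the header plus
 * up to max_subflows status entries. */
static size_t mptcp_sub_ids_size(size_t max_subflows)
{
    /* sub_count is an 8-bit field, so more entries cannot be reported */
    assert(max_subflows <= UINT8_MAX);
    return sizeof(struct mptcp_sub_ids)
         + max_subflows * sizeof(struct mptcp_sub_status);
}
```

   An application that expects at most four subflows could, for
   instance, allocate "mptcp_sub_ids_size(4)" bytes and pass that value
   as the option length to "getsockopt()".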

   struct mptcp_sub_status {
       __u8    id;
       __u16   low_prio:1;
   };

   struct mptcp_sub_ids {
       __u8    sub_count;
       struct mptcp_sub_status sub_status[];
   };

       Figure 1: The mptcp_sub_ids and mptcp_sub_status structures

   This structure is used by the "MPTCP_GET_SUB_IDS" socket option.
   More precisely, "getsockopt()", when used with the
   "MPTCP_GET_SUB_IDS" socket option, retrieves the "mptcp_sub_ids"
   structure of the underlying Multipath TCP connection.  This call may
   return an empty array if the connection does not contain any subflow.
   This can happen with Multipath TCP when the last subflow composing
   the connection has been terminated abruptly.

   The "id" that is returned in the "mptcp_sub_ids" structure is
   important because it identifies the subflow and is used as an
   identifier by the other socket options.

   The call may return the error "EINVAL" if the buffer passed by the
   application is too small to hold the array of subflow statuses.

   A simple example of its utilisation is presented in Figure 2.

   int i;
   socklen_t optlen;
   struct mptcp_sub_ids *ids;

   /* Buffer size chosen by the caller; too small yields EINVAL */
   optlen = 42;

   ids = malloc(optlen);

   /* Retrieve the list of subflows and print their identifiers */
   if (getsockopt(sockfd, IPPROTO_TCP, MPTCP_GET_SUB_IDS, ids,
                  &optlen) < 0) {
       perror("getsockopt(MPTCP_GET_SUB_IDS)");
   } else {
       for (i = 0; i < ids->sub_count; i++) {
           printf("Subflow id : %i\n", ids->sub_status[i].id);
       }
   }

   free(ids);

      Figure 2: Sample code for the utilisation of MPTCP_GET_SUB_IDS

3.2.  Open subflow

   Another important part of the API is to enable an application to open
   new subflows.  This is possible through the "MPTCP_OPEN_SUB_TUPLE"
   socket option.  This option uses the "mptcp_sub_tuple" structure
   shown in Figure 3 to pass the priority and the local and remote
   endpoints of the new subflow.

   struct mptcp_sub_tuple {
       __u8    id;
       __u8    prio;
       __u8    addrs[0];
       int     if_idx;
   };

                 Figure 3: The mptcp_sub_tuple structure

   The "id" field is an output: it holds the "id" of the created
   subflow.
   The "prio" field indicates whether the new subflow should be
   considered as a back-up subflow or not.  The "addrs" field must hold
   an array of two addresses.  The first address must be the address of
   the source and the second address must be the address of the
   destination.  The actual structure passed must be either
   "sockaddr_in" or "sockaddr_in6", and the two elements of the array
   must be of the same type.  The address family field of the generic
   "sockaddr" structure can be used to determine which one is actually
   passed.

   The caller can also set the source address to be either "INADDR_ANY"
   for IPv4 or "in6addr_any" for IPv6.  In this case, the kernel chooses
   the source address to be used for the new subflow.

   If a single source address is used for multiple interfaces, the
   caller may choose the interface to be used by setting the "if_idx"
   field.  If this field is set to zero, the kernel will choose the
   default interface.

   If an error occurs during this process, the error that "bind()" or
   "connect()" would have produced is returned.

   An example is provided in Figure 4.

   socklen_t optlen;
   struct mptcp_sub_tuple *sub_tuple;
   struct sockaddr_in *addr;
   int error;

   /* Room for the header followed by two IPv4 socket addresses */
   optlen = sizeof(struct mptcp_sub_tuple) +
            2 * sizeof(struct sockaddr_in);
   /* calloc() zeroes the buffer so that unset fields start at zero */
   sub_tuple = calloc(1, optlen);

   sub_tuple->id = 0;      /* output: filled in with the new subflow id */
   sub_tuple->prio = 0;

   /* First address: source endpoint of the new subflow */
   addr = (struct sockaddr_in*) &sub_tuple->addrs[0];

   addr->sin_family = AF_INET;
   addr->sin_port = htons(12345);
   inet_pton(AF_INET, "10.0.0.1", &addr->sin_addr);

   /* Second address: destination endpoint of the new subflow */
   addr++;

   addr->sin_family = AF_INET;
   addr->sin_port = htons(1234);
   inet_pton(AF_INET, "10.1.0.1", &addr->sin_addr);

   error = getsockopt(sockfd, IPPROTO_TCP, MPTCP_OPEN_SUB_TUPLE,
                      sub_tuple, &optlen);
   if (error < 0)
       perror("getsockopt(MPTCP_OPEN_SUB_TUPLE)");

         Figure 4: Sample code to establish an additional subflow

3.3.  Close subflow

   To close a subflow, the socket option "MPTCP_CLOSE_SUB_ID" is used.
   This option uses the "mptcp_close_sub_id" structure defined in
   Figure 5.
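
   The text does not show how an application issues this option.  The
   following C fragment is a hedged sketch under stated assumptions: the
   option is assumed to be passed through "setsockopt()" at the
   "IPPROTO_TCP" level, the option name is taken as a parameter because
   its numeric value is not defined in this document, and the helper
   names "make_close_request" and "close_subflow" are hypothetical.  The
   structure mirrors the one in Figure 5, with "uint8_t" standing in for
   the kernel "__u8" type.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* setsockopt(), SHUT_RD, SHUT_WR, SHUT_RDWR */

/* Userspace mirror of the close request (assumption: uint8_t replaces
 * the kernel __u8 type). */
struct mptcp_close_sub_id {
    uint8_t id;   /* identifier of the subflow to close */
    int     how;  /* termination mode, e.g. SHUT_RDWR */
};

/* Hypothetical helper: build a zero-initialised close request. */
static struct mptcp_close_sub_id make_close_request(uint8_t id, int how)
{
    struct mptcp_close_sub_id req;

    memset(&req, 0, sizeof(req));   /* also clears padding bytes */
    req.id = id;
    req.how = how;
    return req;
}

/* Hypothetical helper: ask the stack to close subflow "id".
 * close_optname is the implementation-defined value of the close
 * option; using setsockopt() here is an assumption.
 * Returns 0 on success and -1 with errno set on failure. */
static int close_subflow(int sockfd, int close_optname,
                         uint8_t id, int how)
{
    struct mptcp_close_sub_id req = make_close_request(id, how);

    return setsockopt(sockfd, IPPROTO_TCP, close_optname,
                      &req, sizeof(req));
}
```

   In the handover scenario of Section 2, an application could call
   such a helper with the "id" of the cellular subflow, as reported by
   "MPTCP_GET_SUB_IDS", once the WiFi subflow is active.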

   struct mptcp_close_sub_id {
       __u8    id;
       int     how;
   };

               Figure 5: The mptcp_close_sub_id structure

   In the above structure, "id" is the identifier of the subflow that
   needs to be closed.  If the "id" is invalid, "EINVAL" is returned.

   The "how" field is used to define how the subflow should be
   terminated.  It recognises the same set of constants that are used by
   "shutdown()".  In addition to this set, "RST" can be used to
   indicate that the subflow should be terminated by sending an "RST".

3.4.  Get subflow tuple

   An application may also be interested in the addresses and ports that
   are used by a given subflow.  To retrieve this information, the
   socket option "MPTCP_GET_SUB_TUPLE" is used in combination with the
   "mptcp_sub_tuple" structure shown in Figure 6.

   struct mptcp_sub_tuple {
       __u8    id;
       __u8    addrs[0];
   };

                 Figure 6: The mptcp_sub_tuple structure

   This is the same structure as the one used to open a subflow but in
   this context, "id" is the input and "addrs" is the output.

   Sample code is provided in Figure 7.

   socklen_t optlen;
   struct mptcp_sub_tuple *sub_tuple;
   struct sockaddr_in *sin;

   /* Buffer size chosen by the caller; large enough for two addresses */
   optlen = 100;

   sub_tuple = malloc(optlen);

   /* Input: identifier of the subflow, e.g. from MPTCP_GET_SUB_IDS */
   sub_tuple->id = sub_id;
   getsockopt(sockfd, IPPROTO_TCP, MPTCP_GET_SUB_TUPLE, sub_tuple,
              &optlen);

   /* Output: the source address comes first (IPv4 assumed here) */
   sin = (struct sockaddr_in*) &sub_tuple->addrs[0];

   printf("\tip src : %s src port : %hu\n", inet_ntoa(sin->sin_addr),
          ntohs(sin->sin_port));

   /* The destination address follows the source address */
   sin++;

   printf("\tip dst : %s dst port : %hu\n", inet_ntoa(sin->sin_addr),
          ntohs(sin->sin_port));

        Figure 7: Sample code using the MPTCP_GET_SUB_TUPLE option

3.5.  Subflow socket option

   TCP/IP implementations support different socket options.  Some of
   them can be applied to the TCP layer while others can be applied to
   the IP layer.  To be able to issue a socket option on a specific
   subflow, we define the "MPTCP_SUB_GETSOCKOPT" and
   "MPTCP_SUB_SETSOCKOPT" options.
   These two socket options use, respectively, the
   "mptcp_sub_getsockopt" and "mptcp_sub_setsockopt" structures
   presented in Figure 8.

   struct mptcp_sub_getsockopt {
       __u8    id;
       int     level;
       int     optname;
       char    __user *optval;
       unsigned int __user *optlen;
   };

   struct mptcp_sub_setsockopt {
       __u8    id;
       int     level;
       int     optname;
       char    __user *optval;
       unsigned int optlen;
   };

       Figure 8: Structures used by the "MPTCP_SUB_GETSOCKOPT" and
                     "MPTCP_SUB_SETSOCKOPT" options

   In the two structures, "id" indicates to which subflow the socket
   option should be redirected.  The remaining fields of each structure
   contain the information needed to perform the socket option call on
   the subflow.

   Figure 9 illustrates how the IP_TOS socket option can be applied to
   a particular subflow.

   unsigned int optlen, sub_optlen;
   struct mptcp_sub_setsockopt sub_sso;
   int val = 12;                     /* value for IP_TOS */

   optlen = sizeof(struct mptcp_sub_setsockopt);
   sub_optlen = sizeof(int);
   sub_sso.id = sub_id;              /* subflow that receives the option */
   sub_sso.level = IPPROTO_IP;
   sub_sso.optname = IP_TOS;
   sub_sso.optlen = sub_optlen;
   sub_sso.optval = (char *) &val;

   /* The outer call targets the MPTCP socket, the inner one the subflow */
   setsockopt(sockfd, IPPROTO_TCP, MPTCP_SUB_SETSOCKOPT, &sub_sso,
              optlen);

                     Figure 9: Example socket option

4.  IANA considerations

   There are no IANA considerations in this document.

5.  Security considerations

   TCP and UDP implementations usually reserve port numbers below 1024
   for privileged users.  On such implementations, Multipath TCP should
   restrict the ability of the users to create subflows on privileged
   ports through the "MPTCP_OPEN_SUB_TUPLE" socket option.

   For similar reasons, the "MPTCP_SUB_GETSOCKOPT" and
   "MPTCP_SUB_SETSOCKOPT" socket options should not enable an
   unprivileged user to retrieve or modify a socket option on a subflow
   if that user is not allowed to perform such actions on a regular TCP
   connection.

   Applications requiring strong security should implement cryptographic
   protocols such as TLS [RFC5246] or ssh [RFC4251].
   The proposed API enables such applications to better control their
   utilisation of the underlying interfaces by managing the different
   subflows.

6.  Conclusion

   In this document, we have documented an enhanced socket API that
   enables applications to control the creation and the release of
   subflows by the underlying Multipath TCP stack.  We expect that a
   standardised API supported by different implementations will be an
   important step for the deployment of Multipath TCP-aware applications
   on both multihomed hosts such as smartphones and on servers.  This
   enhanced API has already been implemented in the Multipath TCP
   implementation in the Linux kernel.  Future versions of this document
   will address more advanced utilisations of the socket API such as
   non-blocking I/O and the "sendmsg()" and "recvmsg()" system calls.

7.  Acknowledgements

   We would like to thank Christoph Paasch, Quentin De Coninck and Rao
   Shoaib for their comments on an early version of this document.

8.  References

8.1.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013.

8.2.  Informative References

   [ANRW2016]
              Hesmans, B. and O. Bonaventure, "An enhanced socket API
              for Multipath TCP", 2016.

   [Apple-MPTCP]
              Apple, Inc., "iOS - Multipath TCP Support in iOS 7", n.d.

   [COMMAG2016]
              De Coninck, Q., Baerts, M., Hesmans, B., and O.
              Bonaventure, "Observing Real Smartphone Applications over
              Multipath TCP", IEEE Communications Magazine, March 2016.

   [CONEXT15]
              Hesmans, B., Detal, G., Barre, S., Bauduin, R., and O.
              Bonaventure, "SMAPP - Towards Smart Multipath TCP-enabled
              APPlications", Proc.
              Conext 2015, Heidelberg, Germany, December 2015.

   [MultipathTCP-Linux]
              Paasch, C., Barre, S., et al., "Multipath TCP
              implementation in the Linux kernel", n.d.

   [PAM2016]  De Coninck, Q., Baerts, M., Hesmans, B., and O.
              Bonaventure, "A First Analysis of Multipath TCP on
              Smartphones", 17th International Passive and Active
              Measurements Conference (PAM2016), March 2016.

   [RFC4251]  Ylonen, T. and C. Lonvick, Ed., "The Secure Shell (SSH)
              Protocol Architecture", RFC 4251, DOI 10.17487/RFC4251,
              January 2006.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246,
              DOI 10.17487/RFC5246, August 2008.

   [RFC6182]  Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
              Iyengar, "Architectural Guidelines for Multipath TCP
              Development", RFC 6182, DOI 10.17487/RFC6182, March 2011.

   [RFC6458]  Stewart, R., Tuexen, M., Poon, K., Lei, P., and V.
              Yasevich, "Sockets API Extensions for the Stream Control
              Transmission Protocol (SCTP)", RFC 6458,
              DOI 10.17487/RFC6458, December 2011.

   [RFC6897]  Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application
              Interface Considerations", RFC 6897, DOI 10.17487/RFC6897,
              March 2013.

   [SIGCOMM11]
              Raiciu, C., Barre, S., Pluntke, C., Greenhalgh, A.,
              Wischik, D., and M. Handley, "Improving datacenter
              performance and robustness with multipath TCP",
              Proceedings of the ACM SIGCOMM 2011 conference, 2011.

Authors' Addresses

   Benjamin Hesmans
   UCLouvain

   Email: Benjamin.Hesmans@uclouvain.be

   Olivier Bonaventure
   UCLouvain

   Email: Olivier.Bonaventure@uclouvain.be