Internet Engineering Task Force                                M. Scharf
Internet-Draft                                  Alcatel-Lucent Bell Labs
Intended status: Informational                                   A. Ford
Expires: April 23, 2011                              Roke Manor Research
                                                        October 20, 2010


               MPTCP Application Interface Considerations
                       draft-scharf-mptcp-api-03

Abstract

   Multipath TCP (MPTCP) adds the capability of using multiple paths to a regular TCP session. Even though it is designed to be fully backward compatible with applications, its data transport differs from that of regular TCP, and there are several additional degrees of freedom that applications may wish to exploit. This document summarizes the impact that MPTCP may have on applications, such as changes in performance. Furthermore, it discusses compatibility issues of MPTCP in combination with non-MPTCP-aware applications. Finally, the document describes a basic application interface for MPTCP-aware applications that provides access to multipath address information and a level of control equivalent to regular TCP.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 23, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Comparison of MPTCP and Regular TCP
     3.1.  Performance Impact
       3.1.1.  Throughput
       3.1.2.  Delay
       3.1.3.  Resilience
     3.2.  Potential Problems
       3.2.1.  Impact of Middleboxes
       3.2.2.  Outdated Implicit Assumptions
       3.2.3.  Security Implications
   4.  Operation of MPTCP with Legacy Applications
     4.1.  Overview of the MPTCP Network Stack
     4.2.  Address Issues
       4.2.1.  Specification of Addresses by Applications
       4.2.2.  Querying of Addresses by Applications
     4.3.  Socket Option Issues
       4.3.1.  General Guideline
       4.3.2.  Disabling of the Nagle Algorithm
       4.3.3.  Buffer Sizing
       4.3.4.  Other Socket Options
     4.4.  Default Enabling of MPTCP
     4.5.  Summary of Advice to Application Developers
   5.  Basic API for MPTCP-aware Applications
     5.1.  Design Considerations
     5.2.  Requirements on the Basic MPTCP API
     5.3.  Sockets Interface Extensions by the Basic MPTCP API
       5.3.1.  Overview
       5.3.2.  Enabling and Disabling of MPTCP
       5.3.3.  Binding MPTCP to Specified Addresses
       5.3.4.  Querying the MPTCP Subflow Addresses
       5.3.5.  Getting a Unique Connection Identifier
     5.4.  Usage Examples
   6.  Other Compatibility Issues
     6.1.  Usage of the SCTP Socket API
     6.2.  Incompatibilities with other Multihoming Solutions
     6.3.  Interactions with DNS
   7.  Security Considerations
   8.  IANA Considerations
   9.  Conclusion
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Requirements on a Future Advanced MPTCP API
     A.1.  Design Considerations
     A.2.  MPTCP Usage Scenarios and Application Requirements
     A.3.  Potential Requirements on an Advanced MPTCP API
   Appendix B.  Change History of the Document

1.  Introduction

   Multipath TCP adds the capability of using multiple paths to a regular TCP session [1]. The motivations for this extension include increasing throughput, overall resource utilisation, and resilience to network failure, and these motivations are discussed, along with high-level design decisions, as part of the Multipath TCP architecture [4]. The MPTCP protocol [5] offers the same reliable, in-order, byte-stream transport as TCP, and is designed to be backward compatible with both applications and the network layer. It requires support inside the network stack of both endpoints.

   This document first presents the impacts that MPTCP may have on applications, such as performance changes compared to regular TCP. Second, it defines the interoperation of MPTCP and applications that are unaware of the multipath transport. MPTCP is designed to be usable without any application changes, but some compatibility issues have to be taken into account. Third, this memo specifies a basic Application Programming Interface (API) for MPTCP-aware applications. The API presented here is an extension to the regular TCP API that gives an MPTCP-aware application a level of control over, and access to information about, an MPTCP connection equivalent to what the standard TCP API provides for a regular TCP connection.

   An advanced API for MPTCP is outside the scope of this document. Such an advanced API could offer more fine-grained control over multipath transport functions and policies. The appendix includes a brief, non-compulsory list of potential features of such an advanced API.

   The de facto standard API for TCP/IP applications is the "sockets" interface. This document defines experimental MPTCP-specific extensions, using additional socket options.
   It is up to the applications, high-level programming languages, or libraries to decide whether to use these optional extensions. For instance, an application may want to turn the MPTCP mechanism on or off for certain data transfers, or limit its use to certain interfaces. The syntax and semantics of the specification are in line with the Posix standard [8] as much as possible.

   There are also various related extensions of the sockets interface: [12] specifies sockets API extensions for a multihoming shim layer. The API enables interactions between applications and the multihoming shim layer for advanced locator management and for access to information about failure detection and path exploration. Experimental extensions to the sockets API are also defined for the Host Identity Protocol (HIP) [13] in order to manage the bindings of identifiers and locators. Further related API extensions exist for IPv6 [10], Mobile IP [11], and SCTP [14]. These APIs may interact with, or be incompatible with, MPTCP; such issues are discussed later in this document.

   Some network stack implementations, especially on mobile devices, have centralized connection managers or other higher-level APIs to solve multi-interface issues, as surveyed in [16]. Their interaction with MPTCP is outside the scope of this note.

   The target readers of this document are application developers whose software may benefit significantly from MPTCP. This document also provides the necessary information for developers of MPTCP to implement the API in a TCP/IP network stack.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [3].

   This document uses the terminology introduced in [5].

3.  Comparison of MPTCP and Regular TCP

   This section discusses the impact that the use of MPTCP will have on applications, in comparison to what may be expected from the use of regular TCP.

3.1.  Performance Impact

   One of the key goals of adding multipath capability to TCP is to improve the performance of a transport connection by distributing load over separate subflows across potentially disjoint paths. Furthermore, it is an explicit goal of MPTCP that it should not provide a worse-performing connection than would have existed through the use of single-path TCP. A corresponding congestion control algorithm is described in [7]. The following sections summarize the performance impact of MPTCP as seen by an application.

3.1.1.  Throughput

   The most obvious performance improvement that will be gained with the use of MPTCP is an increase in throughput, since MPTCP will pool more than one path (where available) between two endpoints. This will provide greater bandwidth for an application. If there are shared bottlenecks between the flows, then the congestion control algorithms will ensure that load is evenly spread amongst regular and multipath TCP sessions, so that no end user receives worse performance than single-path TCP.

   This performance increase additionally means that an MPTCP session could achieve throughput that is greater than the capacity of a single interface on the device. If an application infers which interface is in use from the observed throughput (or infers the expected throughput from the interface), it must take this into account (although an MPTCP implementation must always respect an application's request for a particular interface).

   The transport of MPTCP signaling information results in a small overhead. If multiple subflows share the same bottleneck, this overhead slightly reduces the capacity that is available for data transport.
   Yet, this potential reduction of throughput will be negligible in many usage scenarios, and the protocol design contains optimisations that keep this overhead minimal.

3.1.2.  Delay

   If the delays on the constituent subflows of an MPTCP connection differ, the jitter perceivable to an application may appear higher as the data is spread across the subflows. Although MPTCP will ensure in-order delivery to the application, the application must be able to cope with the data delivery being burstier than may be usual with single-path TCP. Since burstiness is commonplace on the Internet today, it is unlikely that applications will suffer from such an impact on the traffic profile, but application authors may wish to consider this in future development.

   In addition, applications that make round trip time (RTT) estimates at the application level may have some issues. Whilst the average delay calculated will be accurate, it is an average over subflows that may have quite different delays, and whether it is useful for an application will depend on what it requires this information for. If a new application wishes to derive such information, it should consider how multiple subflows may affect its measurements, and thus how it may wish to respond. In such a case, an application may wish to express its scheduling preferences, as described later in this document.

3.1.3.  Resilience

   The use of multiple subflows simultaneously means that, if one should fail, all traffic will move to the remaining subflow(s), and additionally any lost packets can be retransmitted on these subflows.

   Subflow failure may be caused by issues within the network, which an application would be unaware of, or by interface failure on the node. An application may, under certain circumstances, be in a position to be aware of such failure (e.g.,
   by radio signal strength, or simply an interface enabled flag), and so must not make assumptions about the stability of an MPTCP connection based on this. However, an MPTCP implementation must never override an application's request for a given interface, so the cases where this issue may be applicable are limited.

3.2.  Potential Problems

3.2.1.  Impact of Middleboxes

   MPTCP has been designed in order to pass through the majority of middleboxes. Empirical evidence suggests that new TCP options can successfully be used on most paths in the Internet. Nevertheless, some middleboxes may still refuse to pass MPTCP messages due to the presence of TCP options, or they may strip TCP options. If this is the case, MPTCP should fall back to regular TCP. Although this will not create a problem for the application (its communication will be set up either way), there may be additional (and indeed, user-perceivable) delay while the first handshake fails. A detailed discussion of the various fallback mechanisms, for failures occurring at different points in the connection, is presented in [5].

   There may also be middleboxes that transparently change the length of content. If such middleboxes are present, MPTCP's reassembly of the byte stream in the receiver is difficult. Still, MPTCP can detect such middleboxes and then fall back to regular TCP. An overview of the impact of middleboxes is presented in [4], and MPTCP's mechanisms to work around these are presented and discussed in [5].

   MPTCP can also have other unexpected implications. For instance, intrusion detection systems could be triggered. A full analysis of MPTCP's impact on such middleboxes is for further study after deployment experiments.

3.2.2.  Outdated Implicit Assumptions

   In regular TCP, there is a one-to-one mapping of the socket interface to a flow through a network.
   Since MPTCP can make use of multiple flows, applications cannot implicitly rely on this one-to-one mapping any more. Applications that require transport along a single path can disable the use of MPTCP as described later in this document. Examples include monitoring tools that want to measure the available bandwidth on a path, or routing protocols such as BGP that require the use of a specific link.

3.2.3.  Security Implications

   The support for multiple IP addresses within one MPTCP connection can result in additional security vulnerabilities, such as possibilities for attackers to hijack connections. The protocol design of MPTCP minimizes this risk. An attacker on one of the paths can cause harm, but this is hardly an additional security risk compared to single-path TCP, which is vulnerable to man-in-the-middle attacks, too. A detailed threat analysis of MPTCP is published in [6].

4.  Operation of MPTCP with Legacy Applications

4.1.  Overview of the MPTCP Network Stack

   MPTCP is an extension of TCP, but it is designed to be backward compatible for legacy applications. TCP interacts with other parts of the network stack through different interfaces. The de facto standard API between TCP and applications is the sockets interface. The position of MPTCP in the protocol stack is illustrated in Figure 1.

          +-------------------------------+
          |          Application          |
          +-------------------------------+
                  ^                 |
       ~~~~~~~~~~~|~Socket Interface|~~~~~~~~~~~
                  |                 v
          +-------------------------------+
          |             MPTCP             |
          + - - - - - - - + - - - - - - - +
          | Subflow (TCP) | Subflow (TCP) |
          +-------------------------------+
          |       IP      |      IP       |
          +-------------------------------+

                   Figure 1: MPTCP protocol stack

   In general, MPTCP can affect all interfaces that make assumptions about the coupling of a TCP connection to a single IP address and TCP port pair, to one sockets endpoint, to one network interface, or to a given path through the network.

   This means that there are two classes of applications:

   o  Legacy applications: These applications are unaware of MPTCP and use the existing API towards TCP without any changes. This is the default case.

   o  MPTCP-aware applications: These applications indicate support for an enhanced MPTCP interface. This document specifies a minimum set of API extensions for such applications.

   The following discusses the extent to which MPTCP affects legacy applications using the existing sockets API. The existing sockets API implies that applications deal with data structures that store, amongst others, the IP addresses and TCP port numbers of a TCP connection. A design objective of MPTCP is that legacy applications can continue to use the established sockets API without any changes. However, in MPTCP there is a one-to-many mapping between the socket endpoint and the subflows. This has several subtle implications for legacy applications using sockets API functions.

4.2.  Address Issues

4.2.1.  Specification of Addresses by Applications

   During binding, an application can either select a specific address or bind to INADDR_ANY.
   Furthermore, on some systems other socket options (e.g., SO_BINDTODEVICE) can be used to bind to a specific interface. If an application uses a specific address or binds to a specific interface, then MPTCP MUST respect this and not interfere in the application's choices. If an application binds to INADDR_ANY, it is assumed that the application does not care which addresses to use locally. In this case, a local policy MAY allow MPTCP to automatically set up multiple subflows on such a connection.

   The basic sockets API for MPTCP-aware applications allows them to express further preferences in an MPTCP-compatible way (e.g., to bind to a subset of interfaces only).

4.2.2.  Querying of Addresses by Applications

   Applications can use the getpeername() or getsockname() functions in order to retrieve the IP address of the peer or of the local socket. These functions can be used for various purposes, including security mechanisms, geo-location, or interface checks. The socket API was designed with the assumption that a socket uses just one address, and since this address is visible to the application, the application may assume that the information provided by the functions is the same during the lifetime of a connection. However, in MPTCP, unlike in TCP, there is a one-to-many mapping of a connection to subflows, and subflows can be added and removed while the connection continues to exist. Therefore, MPTCP cannot expose addresses by getpeername() or getsockname() that are both valid and constant during the connection's lifetime.

   This problem is addressed as follows: If used by a legacy application, the MPTCP stack MUST always return the addresses of the first subflow of an MPTCP connection, in all circumstances, even if that particular subflow is no longer in use.
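
   The address handling described in Sections 4.2.1 and 4.2.2 uses only the ordinary POSIX calls. The following C sketch, which is illustrative and not part of this specification, binds a TCP socket either to a specific IPv4 address or to INADDR_ANY and then reads the local address back with getsockname(). Under the rules above, binding to a specific address means an MPTCP stack must not add subflows on other local addresses, and on an established MPTCP connection getsockname() would keep reporting the first subflow's address. The helper name bind_and_query() is hypothetical, and the code itself only exercises plain TCP.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a TCP socket, binds it to `ip` (or to
 * INADDR_ANY if `ip` is NULL) with port 0 so that the stack picks a
 * port, and stores the resulting local address in `out`.
 * Returns the socket descriptor, or -1 on error. */
int bind_and_query(const char *ip, struct sockaddr_in *out)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);
    if (ip == NULL) {
        /* Wildcard: the application does not care about local addresses. */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
    } else if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    /* On an established MPTCP connection, getsockname() would report
     * the local address of the first subflow. */
    socklen_t len = sizeof *out;
    if (getsockname(fd, (struct sockaddr *)out, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

   Passing NULL corresponds to the INADDR_ANY case, in which a local policy MAY let MPTCP open additional subflows; passing a specific address corresponds to the case that MPTCP must leave untouched.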

   As this address may no longer be valid once the first subflow is closed, the MPTCP stack MAY close the whole MPTCP connection if the first subflow is closed (i.e., fate sharing between the initial subflow and the MPTCP connection as a whole). Whether to close the whole MPTCP connection by default SHOULD be controlled by a local policy. Further experiments are needed to investigate its implications.

   The functions getpeername() and getsockname() SHOULD also always return the addresses of the first subflow if the socket is used by an MPTCP-aware application, in order to be consistent with MPTCP-unaware applications and, e.g., also with SCTP. Instead of getpeername() or getsockname(), MPTCP-aware applications can use new API calls, documented later, in order to retrieve the full list of address pairs for the subflows in use.

4.3.  Socket Option Issues

4.3.1.  General Guideline

   The existing sockets API includes options that modify the behavior of sockets and their underlying communications protocols. Various socket options exist at the socket, TCP, and IP levels. The value of an option can usually be set by the setsockopt() system function and read by the getsockopt() function. In general, the existing sockets interface functions cannot configure each MPTCP subflow individually. In order to be backward compatible, existing APIs therefore SHOULD apply to all subflows within one connection, as far as possible.

4.3.2.  Disabling of the Nagle Algorithm

   One commonly used TCP socket option (TCP_NODELAY) disables the Nagle algorithm as described in [2]. This option is also specified in the Posix standard [8]. Applications can use this option in combination with MPTCP exactly in the same way. It then SHOULD disable the Nagle algorithm for the MPTCP connection, i.e., for all subflows.
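
   Because applications use TCP_NODELAY with MPTCP exactly as with regular TCP, the standard POSIX calls apply unchanged; per the text above, an MPTCP stack would then apply the setting to all subflows of the connection. The following is a minimal C sketch; the helper names set_nodelay() and get_nodelay() are illustrative and not part of any API.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Illustrative helper: disables the Nagle algorithm when on != 0 and
 * re-enables it when on == 0. Returns 0 on success, -1 on error
 * (errno is set by setsockopt()). */
int set_nodelay(int fd, int on)
{
    int val = on ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, sizeof val);
}

/* Illustrative helper: returns 1 if Nagle is disabled, 0 if it is
 * enabled, or -1 if the option cannot be read. */
int get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

   Nothing in this code is MPTCP-specific, which is the point: a legacy application keeps working, and the per-subflow propagation happens inside the stack.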

   In addition, the MPTCP protocol instance MAY use a different path scheduler algorithm if TCP_NODELAY is present. For instance, it could use an algorithm that is optimized for latency-sensitive traffic. Specific algorithms are outside the scope of this document.

4.3.3.  Buffer Sizing

   Applications can explicitly configure send and receive buffer sizes through the sockets API (SO_SNDBUF, SO_RCVBUF). These socket options can also be used in combination with MPTCP and then affect the buffer size of the MPTCP connection. However, when defining buffer sizes, application programmers should take into account that the transport over several subflows requires a certain amount of buffer for resequencing in the receiver. MPTCP may also require more storage space in the sender, in particular if retransmissions are sent over more than one path. In addition, very small send buffers may prevent MPTCP from efficiently scheduling data over different subflows. Therefore, it does not make sense to use MPTCP in combination with small send or receive buffers.

   An MPTCP implementation MAY set a lower bound for send and receive buffers and treat a small buffer size request as an implicit request not to use MPTCP.

4.3.4.  Other Socket Options

   Some network stacks also provide other implementation-specific socket options or interfaces that affect TCP's behavior. If a network stack supports MPTCP, it must be ensured that these options do not interfere with MPTCP's operation.

4.4.  Default Enabling of MPTCP

   It is up to a local policy at the end system whether a network stack should automatically enable MPTCP for sockets even if there is no explicit sign of MPTCP awareness of the corresponding application. Such a choice may be under the control of the user through system preferences.
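
   The local decisions described in Sections 4.2.1, 4.3.3, and 4.4 could be combined inside a stack roughly as in the following C sketch. All structure and function names are hypothetical, and the precedence order (explicit application choices first, then the optional buffer lower bound, then the system default) is one possible reading of this document rather than a specified algorithm. Treating a bind to a specific address as ruling out MPTCP follows the wording of REQ1 in Section 5.2.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical system-wide knobs (Sections 4.4 and 4.3.3). */
struct mptcp_policy {
    bool enable_by_default;   /* user or system preference */
    size_t min_buffer_bytes;  /* optional lower bound; 0 means none */
};

/* Hypothetical per-socket facts a stack might track. */
struct socket_state {
    bool explicitly_enabled;     /* application asked for MPTCP */
    bool explicitly_disabled;    /* application asked for regular TCP */
    bool bound_to_specific_addr; /* bind() to one address or device */
    size_t sndbuf;               /* SO_SNDBUF request; 0 means default */
    size_t rcvbuf;               /* SO_RCVBUF request; 0 means default */
};

static bool below_bound(size_t requested, size_t minimum)
{
    return minimum > 0 && requested > 0 && requested < minimum;
}

/* Returns true if the stack may *attempt* MPTCP on this socket. MPTCP
 * is still only used if the peer supports it and at least one endpoint
 * has multiple addresses. */
bool mptcp_may_attempt(const struct mptcp_policy *p,
                       const struct socket_state *s)
{
    /* Explicit opt-out, or binding to one address, rules MPTCP out. */
    if (s->explicitly_disabled || s->bound_to_specific_addr)
        return false;
    if (s->explicitly_enabled)
        return true;
    /* Small buffers treated as an implicit request for regular TCP. */
    if (below_bound(s->sndbuf, p->min_buffer_bytes) ||
        below_bound(s->rcvbuf, p->min_buffer_bytes))
        return false;
    return p->enable_by_default;
}
```

   Letting an explicit enable request override the buffer heuristic reflects the idea that an explicit signal from the application is stronger than an implicit one; a stack could reasonably choose otherwise.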

   Enabling MPTCP, either by the application or by system defaults, does not necessarily mean that MPTCP will always be used. Both endpoints must support MPTCP, and at least one endpoint must have multiple addresses, for MPTCP to be used. Even if those requirements are met, however, MPTCP may not be immediately used on a connection. It may make sense for multiple paths to be brought into operation only after a given period of time, or if the connection is saturated.

4.5.  Summary of Advice to Application Developers

   o  Using the default MPTCP configuration: Like TCP, MPTCP is designed to be efficient and robust in the default configuration. Application developers should not explicitly configure TCP (or MPTCP) features unless this is really needed.

   o  Socket buffer dimensioning: Multipath transport requires larger buffers in the receiver for resequencing, as already explained. Applications should use reasonable buffer sizes (such as the operating system default values) in order to fully benefit from MPTCP. A full discussion of buffer sizing issues is given in [5].

   o  Facilitating stack-internal heuristics: The path management and data scheduling by MPTCP are realized by stack-internal algorithms that may implicitly try to self-optimize their behavior according to assumed application needs. For instance, an MPTCP implementation may use heuristics to determine whether an application requires delay-sensitive or bulk data transport, using for instance port numbers, the TCP_NODELAY socket option, or the application's read/write patterns as input parameters. An application developer can facilitate the operation of such heuristics by avoiding atypical interface use cases.
      For instance, for long bulk data transfers, it makes sense neither to enable the TCP_NODELAY socket option nor to use many small consecutive socket "send()" calls with small amounts of data only.

5.  Basic API for MPTCP-aware Applications

5.1.  Design Considerations

   While applications can use MPTCP with the unmodified sockets API, multipath transport results in many degrees of freedom. MPTCP manages the data transport over different subflows automatically. By default, this is transparent to the application, but an application could use an additional API to interface with the MPTCP layer and to control important aspects of the MPTCP implementation's behaviour.

   This document describes a basic MPTCP API. The API uses non-mandatory socket options and only includes a minimum set of functions that provide a level of control and information equivalent to what exists for regular TCP. It maintains backward compatibility with legacy applications.

   An advanced MPTCP API is outside the scope of this document. The basic API does not allow a sender or a receiver to express preferences about the management of paths or the scheduling of data, even though this can have a significant performance impact and an MPTCP implementation could benefit from additional guidance by applications. A list of potential further API extensions is provided in the appendix. The specification of such an advanced API is for further study and may partly be implementation-specific.

   MPTCP mainly affects the sending of data. Therefore, the basic API only affects the sender side of a data transfer. A receiver may also have preferences about data transfer choices, and it may have performance requirements, too. Yet, the signaling of the receiver's needs is outside of the scope of this document.

   As this document specifies sockets API extensions, it is written so that the syntax and semantics are in line with the Posix standard [8] as much as possible.

5.2.  Requirements on the Basic MPTCP API

   Because of the importance of the sockets interface, there are several fundamental design objectives for the basic interface between MPTCP and applications:

   o  Consistency with existing sockets APIs must be maintained as far as possible. In order to support the large base of applications using the original API, a legacy application must be able to continue to use standard socket interface functions when run on a system supporting MPTCP. Also, MPTCP-aware applications should be able to access the socket without any major changes.

   o  Sockets API extensions must be minimized and independent of an implementation.

   o  The interface should handle both IPv4 and IPv6.

   The following is a list of the core requirements for the basic API:

   REQ1:  Turn on/off MPTCP: An application should be able to request that the usage of MPTCP be turned on or off. This means that an application should be able to explicitly request the use of MPTCP if this is possible. Applications should also be able to request not to enable MPTCP and to use regular TCP transport instead. This can be implicit in many cases, since MPTCP must be disabled by the use of binding to a specific address. MPTCP may also be enabled if an application uses a dedicated multipath address family (such as AF_MULTIPATH, [9]).

   REQ2:  An application should be able to restrict MPTCP to binding to a given set of addresses.

   REQ3:  An application should be able to obtain information on the addresses used by the MPTCP subflows.

   REQ4:  An application should be able to extract a unique identifier for the connection (per endpoint).
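
   REQ3, together with the objective of handling both IPv4 and IPv6, implies that subflow information has to be returned as a list of address pairs of either family. Table 1 in Section 5.3.1 only states that TCP_MULTIPATH_SUBFLOWS returns a list of pairs of "struct sockaddr". The C sketch below shows one hypothetical in-memory layout based on struct sockaddr_storage, plus helpers that print an endpoint and a subflow; struct subflow_addrs, format_endpoint(), and format_subflow() are illustrative names not defined by this document.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical layout for one subflow: sockaddr_storage is large enough
 * to hold either an IPv4 or an IPv6 address. */
struct subflow_addrs {
    struct sockaddr_storage local;
    struct sockaddr_storage remote;
};

/* Writes "a.b.c.d:port" or "[v6addr]:port" into buf.
 * Returns 0 on success, -1 for other address families or errors. */
int format_endpoint(const struct sockaddr_storage *ss, char *buf, size_t len)
{
    char host[INET6_ADDRSTRLEN];

    if (ss->ss_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)ss;
        if (inet_ntop(AF_INET, &sin->sin_addr, host, sizeof host) == NULL)
            return -1;
        snprintf(buf, len, "%s:%u", host, (unsigned)ntohs(sin->sin_port));
    } else if (ss->ss_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)ss;
        if (inet_ntop(AF_INET6, &sin6->sin6_addr, host, sizeof host) == NULL)
            return -1;
        snprintf(buf, len, "[%s]:%u", host, (unsigned)ntohs(sin6->sin6_port));
    } else {
        return -1;
    }
    return 0;
}

/* Writes "local -> remote" for one subflow. Returns 0 on success,
 * -1 on error or if buf is too small. */
int format_subflow(const struct subflow_addrs *sf, char *buf, size_t len)
{
    /* Room for the address, brackets, ":65535", and the terminator. */
    char l[INET6_ADDRSTRLEN + 8], r[INET6_ADDRSTRLEN + 8];

    if (format_endpoint(&sf->local, l, sizeof l) < 0 ||
        format_endpoint(&sf->remote, r, sizeof r) < 0)
        return -1;
    int n = snprintf(buf, len, "%s -> %s", l, r);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

   Using sockaddr_storage rather than a fixed-size sockaddr_in keeps one array type usable for both address families, which matches the IPv4/IPv6 objective above.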
   The first requirement is the most important one, since some applications could benefit a lot from MPTCP, but there are also cases in which it hardly makes sense. The existing sockets API provides similar mechanisms to enable or disable advanced TCP features. The second requirement corresponds to the binding of addresses with the bind() socket call, or, e.g., explicit device bindings with a SO_BINDTODEVICE option. The third requirement ensures that there is an equivalent to getpeername() or getsockname() that is able to deal with more than one subflow. Finally, it should be possible for the application to retrieve a unique connection identifier (local to the endpoint on which it is running) for the MPTCP connection. In single-path TCP, the (address, port) pairs serve as such an identifier, but in MPTCP they are no longer static.

   An application can continue to use getpeername() or getsockname() in addition to the basic MPTCP API. In that case, both functions return the corresponding addresses of the first subflow, as already explained.

5.3.  Sockets Interface Extensions by the Basic MPTCP API

5.3.1.  Overview

   The basic MPTCP API consists of four new socket options that are specific to MPTCP. All of these socket options are defined at the TCP level (IPPROTO_TCP).

   o  TCP_MULTIPATH_ENABLE: Enable/disable MPTCP

   o  TCP_MULTIPATH_BIND: Bind MPTCP to a set of given local addresses

   o  TCP_MULTIPATH_SUBFLOWS: Get the addresses currently used by the MPTCP subflows

   o  TCP_MULTIPATH_CONNID: Get the local connection identifier for this MPTCP connection

   Table 1 shows a list of the socket options for the general configuration of MPTCP. The first column gives the name of the option. The second and third columns indicate whether the option can be handled by the getsockopt() system call and/or by the setsockopt() system call. The fourth column lists the type of data structure specified along with the socket option.

   +------------------------+-----+-----+------------------------------+
   | Option name            | Get | Set | Data type                    |
   +------------------------+-----+-----+------------------------------+
   | TCP_MULTIPATH_ENABLE   |  o  |  o  | int                          |
   | TCP_MULTIPATH_BIND     |     |  o  | list of "struct sockaddr"    |
   | TCP_MULTIPATH_SUBFLOWS |  o  |     | list of pairs of "struct     |
   |                        |     |     | sockaddr"                    |
   | TCP_MULTIPATH_CONNID   |  o  |     | uint32                       |
   +------------------------+-----+-----+------------------------------+

                     Table 1: Socket options for MPTCP

   There are restrictions on when these new socket options can be used:

   o  TCP_MULTIPATH_ENABLE: This option SHOULD only be set before the establishment of a TCP connection. Its value SHOULD only be read after the establishment of a connection.

   o  TCP_MULTIPATH_BIND: This option MAY be applied both before connection setup and during a connection. In the latter case, it allows MPTCP to use a new address, if there has been a restriction before connection setup.

   o  TCP_MULTIPATH_SUBFLOWS: This option is read-only and SHOULD only be used after connection setup.

   o  TCP_MULTIPATH_CONNID: This option is read-only and SHOULD only be used after connection setup.

5.3.2.  Enabling and Disabling of MPTCP

   An application can explicitly indicate multipath capability by setting the TCP_MULTIPATH_ENABLE option to a value larger than 0. In this case, the MPTCP implementation SHOULD try to negotiate MPTCP for that connection. Note that multipath transport will not necessarily be enabled, as it requires multiple addresses, support in the other end system, and potentially also support on middleboxes.

   An application can disable MPTCP by setting the option to a value of 0. In that case, MPTCP MUST NOT be used on that connection.
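   The following C fragment is a minimal sketch of how an application might use TCP_MULTIPATH_ENABLE, assuming a stack that implements the option as described in this section. The numeric value of TCP_MULTIPATH_ENABLE is a hypothetical placeholder, since this document assigns no option numbers, and the helpers mptcp_set_enabled(), mptcp_try_enable() and mptcp_is_enabled() are illustrative names only. On a stack without the option, setsockopt() is expected to fail (typically with ENOPROTOOPT), and the application can simply continue with regular TCP.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* HYPOTHETICAL option number: this draft assigns no value. */
#ifndef TCP_MULTIPATH_ENABLE
#define TCP_MULTIPATH_ENABLE 0x4d01
#endif

/* Illustrative helpers, not a standard API. */

/* Request (enable != 0) or forbid (enable == 0) MPTCP on a TCP socket.
 * Call before connect() or listen(). Returns 0 on success, or -1 with
 * errno set by setsockopt(). */
int mptcp_set_enabled(int fd, int enable)
{
    int val = enable ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_MULTIPATH_ENABLE,
                      &val, (socklen_t)sizeof(val));
}

/* Ask for MPTCP but treat a stack that lacks the option as plain TCP.
 * Returns 1 if the stack accepted the request (negotiation may still
 * fall back to TCP), 0 if the option is unknown (ENOPROTOOPT), and -1
 * for any other error, e.g. a bad descriptor. */
int mptcp_try_enable(int fd)
{
    if (mptcp_set_enabled(fd, 1) == 0)
        return 1;
    return errno == ENOPROTOOPT ? 0 : -1;
}

/* Read back after connection setup: 1 if MPTCP is in use (value >= 1),
 * 0 if the connection uses regular TCP, -1 if the query failed. */
int mptcp_is_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, IPPROTO_TCP, TCP_MULTIPATH_ENABLE, &val, &len) < 0)
        return -1;
    return val >= 1;
}
```

   A typical sequence would be socket(), mptcp_try_enable(), connect(), and then mptcp_is_enabled() to learn whether MPTCP was actually negotiated; a result of 0 from the last call means that the connection uses regular TCP.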
   After connection establishment, an application can get the value of the TCP_MULTIPATH_ENABLE option. A value of 0 then indicates a lack of MPTCP support; any value equal to or larger than 1 means that MPTCP is supported.

   As an alternative to setting a socket option, an application can also use a new, separate address family called AF_MULTIPATH [9]. This separate address family can be used to exchange multiple addresses between an application and the standard sockets API, and additionally acts as an explicit indication that an application is MPTCP-aware, i.e., that it can deal with the semantic changes of the sockets API, in particular concerning getpeername() and getsockname(). AF_MULTIPATH is also more flexible with respect to the address families used for multipath transport: IPv4, IPv6, or both in parallel [9].

5.3.3.  Binding MPTCP to Specified Addresses

   An application can set the TCP_MULTIPATH_BIND socket option to announce a set of local IP addresses that MPTCP may bind to. The parameter of the option is a list of data structures of type "sockaddr". An MPTCP implementation must iterate over this list, since the length of each structure may vary and is determined by its address family. By extension, this option also controls the list of addresses that can be advertised to the peer via MPTCP signalling.

   If used during the lifetime of a connection, an application MUST always provide the full list of addresses that MPTCP is allowed to use. If the option is set, MPTCP MUST only establish subflows that use one of the addresses in that list as the source address. MPTCP MUST also use the list as the only set of addresses it can signal to its peer. It should be noted that this signal is only a hint, and an MPTCP implementation may only use a subset of the addresses.
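   Because the option value is a list of variable-length structures, both the application building it and the implementation parsing it have to derive the size of each entry from its address family. The C sketch below shows one way to do this for IPv4 and IPv6 only. The packing layout (entries placed back to back without padding), the helper names, and the value of TCP_MULTIPATH_BIND are assumptions for illustration and are not defined by this specification. The same kind of walk applies to the list of address pairs returned by TCP_MULTIPATH_SUBFLOWS.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* HYPOTHETICAL option number: this draft assigns no value. */
#ifndef TCP_MULTIPATH_BIND
#define TCP_MULTIPATH_BIND 0x4d02
#endif

/* Length of one address structure, derived from its address family.
 * Returns 0 for families this sketch does not handle. */
size_t mptcp_sockaddr_len(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}

/* Pack n addresses back to back into buf (assumed layout). Returns the
 * number of bytes used, or 0 if a family is unknown or buf is too small. */
size_t mptcp_pack_addrs(const struct sockaddr *const *addrs, size_t n,
                        unsigned char *buf, size_t cap)
{
    size_t off = 0;
    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t len = mptcp_sockaddr_len(addrs[i]);
        if (len == 0 || len > cap - off)
            return 0;
        memcpy(buf + off, addrs[i], len);
        off += len;
    }
    return off;
}

/* Walk a packed list the way the receiving side must: read the family
 * of each entry to learn its length. Returns the number of entries, or
 * -1 if the buffer is malformed. */
int mptcp_count_addrs(const unsigned char *buf, size_t used)
{
    int count = 0;
    size_t off = 0;
    while (off < used) {
        struct sockaddr_storage ss; /* aligned scratch copy of the entry */
        size_t avail = used - off;
        size_t len;
        memset(&ss, 0, sizeof(ss));
        memcpy(&ss, buf + off, avail < sizeof(ss) ? avail : sizeof(ss));
        len = mptcp_sockaddr_len((const struct sockaddr *)&ss);
        if (len == 0 || len > avail)
            return -1;
        off += len;
        count++;
    }
    return count;
}

/* Hand the full list to the stack (hypothetical option). */
int mptcp_bind_addrs(int fd, const unsigned char *buf, size_t used)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_MULTIPATH_BIND, buf,
                      (socklen_t)used);
}
```

   Copying each entry into a struct sockaddr_storage before reading sa_family avoids unaligned access, because entries packed back to back are not guaranteed to be suitably aligned. To drop an address during a connection, the application would pack and set the reduced full list again, in line with the full-list rule above.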
   If an address is not present in the new list, MPTCP MUST close any corresponding subflows (i.e., those using the local address that is no longer present) and signal the removal of the address to the peer. If alternative paths are available using the supplied address list but MPTCP is not currently using them, an MPTCP implementation SHOULD establish alternative subflows before undertaking the address removal.

   TBD: If it is unreasonable or difficult for an application to keep track of addresses in order to provide the full list every time TCP_MULTIPATH_BIND is set, we could also provide separate TCP_MULTIPATH_ADDR_ADD and TCP_MULTIPATH_ADDR_REMOVE options. Would this be preferable? (The ADD option would provide the same functionality as bind() before connection setup.)

5.3.4.  Querying the MPTCP Subflow Addresses

   An application can get a list of the addresses used by the currently established subflows by means of the TCP_MULTIPATH_SUBFLOWS option, which cannot be set. The return value is a list of pairs of "sockaddr" data structures. In each pair, the first data structure refers to the local IP address and the second one to the remote IP address used by the subflow. The list MUST only include established subflows.

   The length of the returned data depends on the number of subflows, so an application must iterate over the list for its length, determining the length of each "sockaddr" data structure by its address family.

5.3.5.  Getting a Unique Connection Identifier

   An application that wants a unique identifier for the connection, analogous to an (address, port) pair in regular TCP, can use the TCP_MULTIPATH_CONNID option to get a local connection identifier for the MPTCP connection.

   This is a 32-bit number, and SHOULD be the same as the local connection identifier sent in the MPTCP handshake.

5.4.  Usage Examples

   TODO: Example C code for the API functions

6.  Other Compatibility Issues

6.1.  Usage of the SCTP Socket API

   For dealing with multi-homing, several socket API extensions have been defined for SCTP [14]. As MPTCP realizes multipath transport from and to multi-homed end systems, some of these interface function calls are actually applicable to MPTCP in a similar way.

   The following functions that are defined for SCTP have similar functionality to the MPTCP API extensions defined earlier:

   o  sctp_bindx()

   o  sctp_connectx()

   o  sctp_getladdrs()

   o  sctp_getpaddrs()

   The syntax and semantics of these functions are described in [14].

   API developers MAY wish to integrate SCTP and MPTCP calls to provide a consistent interface to the application. Yet, it must be emphasized that the transport service provided by MPTCP is different from that of SCTP, and this is why not all SCTP API functions can be mapped directly to MPTCP. Furthermore, a network stack implementing MPTCP does not necessarily support SCTP and its specific socket interface extensions. This is why the basic MPTCP API defines only additional socket options, which are a backward-compatible extension of TCP's application interface.

6.2.  Incompatibilities with other Multihoming Solutions

   The use of MPTCP can interact with various related sockets API extensions. The use of a multihoming shim layer conflicts with multipath transport such as MPTCP or SCTP [12]. Care should be taken not to confuse the usage of MPTCP with the overlapping features of other APIs:

   o  SHIM API [12]: This API specifies sockets API extensions for the multihoming shim layer.

   o  HIP API [13]: The Host Identity Protocol (HIP) also results in a new API.

   o  API for Mobile IPv6 [11]: For Mobile IPv6, a significantly extended socket API exists as well.
   In order to avoid any conflict, multiaddressed MPTCP SHOULD NOT be enabled if a network stack uses SHIM6, HIP, or Mobile IPv6. Furthermore, applications should not try to use both the MPTCP API and another multihoming or mobility layer API.

   It is possible, however, that some of the MPTCP functionality, such as congestion control, could be used in a SHIM6 or HIP environment. Such operation is outside the scope of this document.

6.3.  Interactions with DNS

   In multihomed or multiaddressed environments, there are various issues that are not specific to MPTCP but have to be considered, too. These problems are summarized in [15].

   Specifically, there can be interactions with DNS. Whilst it is expected that an application will iterate over the list of addresses returned from a call such as getaddrinfo(), MPTCP itself MUST NOT make any assumptions about multiple A or AAAA records from the same DNS query referring to the same host, as it is very likely that multiple addresses refer to multiple servers for load balancing purposes.

   TODO: Elaborate on DNS

7.  Security Considerations

   Will be added in a later version of this document.

8.  IANA Considerations

   No IANA considerations.

9.  Conclusion

   This document discusses MPTCP's application implications and specifies a basic MPTCP API. For legacy applications, the existing sockets API continues to work. MPTCP-aware applications can use the basic MPTCP API, which provides a level of control over the transport layer equivalent to that of regular TCP. A finer-grained interaction between applications and MPTCP requires an advanced MPTCP API, which is not specified in this document.

10.  Acknowledgments

   The authors sincerely thank the following people for their helpful comments on the document: Costin Raiciu

   Michael Scharf is supported by the German-Lab project (http://www.german-lab.de/) funded by the German Federal Ministry of Education and Research (BMBF). Alan Ford is supported by Trilogy (http://www.trilogy-project.org/), a research project (ICT-216372) partially funded by the European Community under its Seventh Framework Program. The views expressed here are those of the author(s) only. The European Commission is not liable for any use that may be made of the information in this document.

11.  References

11.1.  Normative References

   [1]   Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

   [2]   Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

   [3]   Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [4]   Ford, A., Raiciu, C., Barre, S., and J. Iyengar, "Architectural Guidelines for Multipath TCP Development", draft-ietf-mptcp-architecture-01 (work in progress), June 2010.

   [5]   Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for Multipath Operation with Multiple Addresses", draft-ietf-mptcp-multiaddressed-01 (work in progress), July 2010.

   [6]   Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path TCP", draft-ietf-mptcp-threat-03 (work in progress), October 2010.

   [7]   Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-Aware Congestion Control", draft-ietf-mptcp-congestion-00 (work in progress), July 2010.

   [8]   "IEEE Std. 1003.1-2008 Standard for Information Technology -- Portable Operating System Interface (POSIX)", Open Group Technical Standard: Base Specifications, Issue 7, 2008.

11.2.  Informative References

   [9]   Sarolahti, P., "Multi-address Interface in the Socket API", draft-sarolahti-mptcp-af-multipath-01 (work in progress), March 2010.

   [10]  Stevens, W., Thomas, M., Nordmark, E., and T. Jinmei, "Advanced Sockets Application Program Interface (API) for IPv6", RFC 3542, May 2003.

   [11]  Chakrabarti, S. and E. Nordmark, "Extension to Sockets API for Mobile IPv6", RFC 4584, July 2006.

   [12]  Komu, M., Bagnulo, M., Slavov, K., and S. Sugimoto, "Socket Application Program Interface (API) for Multihoming Shim", draft-ietf-shim6-multihome-shim-api-13 (work in progress), February 2010.

   [13]  Komu, M. and T. Henderson, "Basic Socket Interface Extensions for Host Identity Protocol (HIP)", draft-ietf-hip-native-api-12 (work in progress), January 2010.

   [14]  Stewart, R., Poon, K., Tuexen, M., Yasevich, V., and P. Lei, "Sockets API Extensions for Stream Control Transmission Protocol (SCTP)", draft-ietf-tsvwg-sctpsocket-23 (work in progress), July 2010.

   [15]  Blanchet, M. and P. Seite, "Multiple Interfaces Problem Statement", draft-ietf-mif-problem-statement-04 (work in progress), May 2010.

   [16]  Wasserman, M. and P. Seite, "Current Practices for Multiple Interface Hosts", draft-ietf-mif-current-practices-01 (work in progress), June 2010.

Appendix A.  Requirements on a Future Advanced MPTCP API

A.1.  Design Considerations

   Multipath transport results in many degrees of freedom. The basic MPTCP API only defines a minimum set of sockets API extensions for the interface between the MPTCP layer and applications, which does not offer much control of the MPTCP implementation's behaviour. A future, advanced API could address further features of MPTCP and provide more control.

   Applications that use TCP may have different requirements on the transport layer.
   While developers have become used to the characteristics of regular TCP, new opportunities created by MPTCP could allow the service provided to be optimised further. An advanced API could enable MPTCP-aware applications to specify preferences and control certain aspects of the behavior, in addition to the simple control provided by the basic interface. An advanced API could also address aspects that are completely out of scope for the basic API, for example, the question of whether a receiving application could influence the sending policy.

   Furthermore, an advanced MPTCP API could be part of a new overall interface between the network stack and applications that addresses other issues as well, such as the split between identifiers and locators. An API that does not use IP addresses (but instead, e.g., a connectbyname() function) would be useful for numerous purposes, independent of MPTCP.

   This appendix documents a list of potential usage scenarios and requirements for the advanced API. The specification and implementation of a corresponding API is outside the scope of this document.

A.2.  MPTCP Usage Scenarios and Application Requirements

   There are different MPTCP usage scenarios. An application that wishes to transmit bulk data will want MPTCP to provide a high-throughput service immediately, through creating and maximising utilisation of all available subflows. This is the default MPTCP use case.

   At the other extreme, there are applications that are highly interactive but require only a small amount of throughput; these are best served by low latency and jitter stability. In such a situation, it would be preferable for the traffic to use only the lowest-latency subflow (assuming it has sufficient capacity), possibly with one or two additional subflows for resilience and recovery purposes.
   The key challenge for such a strategy is that the delay on a path may fluctuate significantly, so always selecting the path with the smallest delay might result in instability.

   The choice between bulk data transport and latency-sensitive transport affects the scheduler in terms of whether traffic should, by default, be sent on one subflow or across several. Even if the total bandwidth required is less than that available on an individual path, it is desirable to spread this load to reduce stress on potential bottlenecks, which is why this method should be the default for bulk data transport. However, it may not be optimal for applications that require latency/jitter stability.

   In the case of the latter option, a further question arises: Should additional subflows be used whenever the primary subflow is overloaded, or only when the primary path fails (hot standby)? In other words, is latency stability or bandwidth more important to the application? This results in two different options: firstly, a single path that can overflow into an additional subflow; and secondly, a single path with hot standby, whereby an application may want an alternative backup subflow in order to improve resilience. If data delivery on the first subflow fails, the data transport could immediately continue on the second subflow, which is otherwise idle.

   A further, mostly orthogonal question is whether data should be duplicated over the different subflows, in particular if there is spare capacity. This could improve both the timeliness and the reliability of data delivery.

   In summary, there are at least three possible performance objectives for multipath transport (not necessarily disjoint):

   1.  High bandwidth

   2.  Low latency and jitter stability

   3.  High reliability

   In an advanced API, applications could provide high-level guidance to the MPTCP implementation concerning these performance requirements, for instance, which one is considered to be the most important. The MPTCP stack would then use internal mechanisms to fulfill this abstract indication of a desired service, as far as possible. This would affect both the assignment of data (including retransmissions) to existing subflows (e.g., 'use all in parallel', 'use as overflow', 'hot standby', 'duplicate traffic') and the decisions about when to set up additional subflows, and to which addresses. In both cases, different policies can exist, which can be expected to be implementation-specific.

   Therefore, an advanced API could provide a mechanism by which applications can specify their high-level requirements in an implementation-independent way. One possibility would be to select one "application profile" out of a number of choices that characterize typical applications. Yet, as applications today do not have to inform TCP about their communication requirements, it requires further study whether such an approach would be realistic.

   Of course, independent of an advanced API, such functionality could also partly be achieved by MPTCP-internal heuristics that infer some application preferences, e.g., from existing socket options such as TCP_NODELAY. Whether this would be reliable, and indeed appropriate, is for further study, too.

A.3.  Potential Requirements on an Advanced MPTCP API

   The following is a list of potential requirements for an advanced MPTCP API beyond the features of the basic API. It is included here for information only:

   REQ5:  An application should be able to establish MPTCP connections without using IP addresses as locators.
   REQ6:  An application should be able to obtain usage information and statistics about all subflows (e.g., the ratio of traffic sent via each subflow).

   REQ7:  An application should be able to request a change in the number of subflows in use, thus triggering removal or addition of subflows. An even finer control granularity would be a request for the establishment of a new subflow to a provided destination, or a request for the termination of a specified, existing subflow.

   REQ8:  An application should be able to inform the MPTCP implementation about its high-level performance requirements, e.g., in the form of a profile.

   REQ9:  An application should be able to control the automatic establishment/termination of subflows. This would imply a selection among different heuristics of the path manager, e.g., 'try as soon as possible', 'wait until there is a bunch of data', etc.

   REQ10: An application should be able to set preferred subflows or subflow usage policies. This would result in a selection among different configurations of the multipath scheduler.

   REQ11: An application should be able to control the level of redundancy by specifying whether segments should be sent on more than one path in parallel.

   An advanced API fulfilling these requirements would allow application developers to configure MPTCP more specifically. It could avoid suboptimal decisions by internal, implicit heuristics. However, it is unclear whether all of these requirements would have a significant benefit to applications, since they go above and beyond what the existing API to regular TCP provides.

Appendix B.  Change History of the Document

   Changes compared to version 02:

   o  Definition of the behavior of getpeername() and getsockname() when being called by an MPTCP-aware application.
   o  Discussion of the possibility that an MPTCP implementation could support the SCTP API, as far as it is applicable to MPTCP.

   o  Various editorial fixes.

   Changes compared to version 01:

   o  Second half of the document completely restructured

   o  Separation between a basic API and an advanced API: The focus of the document is the basic API only; all text concerning a potential extended API is moved to the appendix

   o  Several clarifications, e.g., concerning buffer sizing and the use of different scheduling strategies triggered by TCP_NODELAY

   o  Additional references

   Changes compared to version 00:

   o  Distinction between legacy and MPTCP-aware applications

   o  Guidance concerning default enabling, reaction to the shutdown of the first subflow, etc.

   o  Reference to a potential use of AF_MULTIPATH

   o  Additional references to related work

Authors' Addresses

   Michael Scharf
   Alcatel-Lucent Bell Labs
   Lorenzstrasse 10
   70435 Stuttgart
   Germany

   EMail: michael.scharf@alcatel-lucent.com

   Alan Ford
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire SO51 0ZN
   UK

   Phone: +44 1794 833 465
   EMail: alan.ford@roke.co.uk