Internet Engineering Task Force                                M. Scharf
Internet-Draft                                  Alcatel-Lucent Bell Labs
Intended status: Informational                                   A. Ford
Expires: January 10, 2011                            Roke Manor Research
                                                            July 9, 2010

              MPTCP Application Interface Considerations
                       draft-scharf-mptcp-api-02

Abstract

   Multipath TCP (MPTCP) adds the capability of using multiple paths to
   a regular TCP session. Even though it is designed to be totally
   backward compatible to applications, the data transport differs
   compared to regular TCP, and there are several additional degrees of
   freedom that applications may wish to exploit. This document
   summarizes the impact that MPTCP may have on applications, such as
   changes in performance. Furthermore, it discusses compatibility
   issues of MPTCP in combination with legacy applications. Finally,
   the document describes a basic application interface for MPTCP-aware
   applications that provides access to multipath address information
   and a level of control equivalent to regular TCP.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 10, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Comparison of MPTCP and Regular TCP
     3.1.  Performance Impact
       3.1.1.  Throughput
       3.1.2.  Delay
       3.1.3.  Resilience
     3.2.  Potential Problems
       3.2.1.  Impact of Middleboxes
       3.2.2.  Outdated Implicit Assumptions
       3.2.3.  Security Implications
   4.  Operation of MPTCP with Legacy Applications
     4.1.  Overview of the MPTCP Network Stack
     4.2.  Address Issues
       4.2.1.  Specification of Addresses by Applications
       4.2.2.  Querying of Addresses by Applications
     4.3.  Socket Option Issues
       4.3.1.  General Guideline
       4.3.2.  Disabling of the Nagle Algorithm
       4.3.3.  Buffer Sizing
       4.3.4.  Other Socket Options
     4.4.  Default Enabling of MPTCP
     4.5.  Summary of Advices to Application Developers
   5.  Basic API for MPTCP-aware Applications
     5.1.  Design Considerations
     5.2.  Requirements on the Basic MPTCP API
     5.3.  Sockets Interface Extensions by the Basic MPTCP API
       5.3.1.  Overview
       5.3.2.  Enabling and Disabling of MPTCP
       5.3.3.  Binding MPTCP to Specified Addresses
       5.3.4.  Querying the MPTCP Subflow Addresses
       5.3.5.  Getting a Unique Connection Identifier
     5.4.  Usage Examples
   6.  Other Compatibility Issues
     6.1.  Incompatibilities with other Multihoming Solutions
     6.2.  Interactions with DNS
   7.  Security Considerations
   8.  IANA Considerations
   9.  Conclusion
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Requirements on a Future Advanced MPTCP API
     A.1.  Design Considerations
     A.2.  MPTCP Usage Scenarios and Application Requirements
     A.3.  Potential Requirements on an Advanced MPTCP API
   Appendix B.  Change History of the Document

1.  Introduction

   Multipath TCP (MPTCP) adds the capability of using multiple paths to
   a regular TCP session [1]. The motivations for this extension
   include increasing throughput, overall resource utilisation, and
   resilience to network failure, and these motivations are discussed,
   along with high-level design decisions, as part of the MPTCP
   architecture [4]. MPTCP [5] offers the same reliable, in-order,
   byte-stream transport as TCP, and is designed to be backward
   compatible with both applications and the network layer. It requires
   support inside the network stack of both endpoints.

   This document first presents the impacts that MPTCP may have on
   applications, such as performance changes compared to regular TCP.
   Second, it defines the interoperation of MPTCP and legacy
   applications that are unaware of the multipath transport. MPTCP is
   designed to be usable without any application changes, but some
   compatibility issues have to be taken into account. Third, this memo
   specifies a basic Application Programming Interface (API) for
   MPTCP-aware applications. The API presented here is an extension to
   the regular TCP API to allow an MPTCP-aware application the same
   level of control and access to information of an MPTCP connection
   that would be possible with the standard TCP API on a regular TCP
   connection.

   An advanced API for MPTCP is outside the scope of this document.
   Such an advanced API could offer a more fine-grained control over
   multipath transport functions and policies. The appendix includes a
   brief, non-compulsory list of potential features of such an advanced
   API.

   The de facto standard API for TCP/IP applications is the "sockets"
   interface. This document defines experimental MPTCP-specific
   extensions, using additional socket options.
   It is up to the applications, high-level programming languages, or
   libraries to decide whether to use these optional extensions. For
   instance, an application may want to turn on or off the MPTCP
   mechanism for certain data transfers, or limit its use to certain
   interfaces. The syntax and semantics of the specification are in
   line with the POSIX standard [8] as much as possible.

   There are also various related extensions of the sockets interface:
   [12] specifies sockets API extensions for a multihoming shim layer.
   The API enables interactions between applications and the multihoming
   shim layer for advanced locator management and for access to
   information about failure detection and path exploration.
   Experimental extensions to the sockets API are also defined for the
   Host Identity Protocol (HIP) [13] in order to manage the bindings of
   identifiers and locators. Further related API extensions exist for
   IPv6 [10], Mobile IP [11], and SCTP [14]. There can be interactions
   or incompatibilities of these APIs with MPTCP, which are discussed
   later in this document.

   Some network stack implementations, especially on mobile devices,
   have centralized connection managers or other higher-level APIs to
   solve multi-interface issues, as surveyed in [16]. Their interaction
   with MPTCP is outside the scope of this note.

   The target readers of this document are application programmers who
   develop application software that may benefit significantly from
   MPTCP. This document also provides the necessary information for
   developers of MPTCP to implement the API in a TCP/IP network stack.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [3].

   This document uses the terminology introduced in [5].

3.  Comparison of MPTCP and Regular TCP

   This section discusses the impact that the use of MPTCP will have on
   applications, in comparison to what may be expected from the use of
   regular TCP.

3.1.  Performance Impact

   One of the key goals of adding multipath capability to TCP is to
   improve the performance of a transport connection by load
   distribution over separate subflows across potentially disjoint
   paths. Furthermore, it is an explicit goal of MPTCP that it should
   not provide a worse performing connection than would have existed
   through the use of legacy, single-path TCP. A corresponding
   congestion control algorithm is described in [7]. The following
   sections summarize the performance impact of MPTCP as seen by an
   application.

3.1.1.  Throughput

   The most obvious performance improvement that will be gained with the
   use of MPTCP is an increase in throughput, since MPTCP will pool more
   than one path (where available) between two endpoints. This will
   provide greater bandwidth for an application. If there are shared
   bottlenecks between the flows, then the congestion control algorithms
   will ensure that load is evenly spread amongst regular and multipath
   TCP sessions, so that no end user receives worse performance than
   single-path TCP.

   Furthermore, this means that an MPTCP session could achieve
   throughput that is greater than the capacity of a single interface on
   the device. If any applications make assumptions about interfaces
   due to throughput (or vice versa), they must take this into account
   (although MPTCP will always respect an application's request for a
   particular interface).

   The transport of MPTCP signaling information results in a small
   overhead. If multiple subflows share the same bottleneck, this
   overhead slightly reduces the capacity that is available for data
   transport.
   Yet, this potential reduction of throughput will be negligible in
   many usage scenarios, and the protocol contains optimisations in its
   design so that this overhead is minimal.

3.1.2.  Delay

   If the delays on the constituent subflows of an MPTCP connection
   differ, the jitter perceivable to an application may appear higher as
   the data is striped across the subflows. Although MPTCP will ensure
   in-order delivery to the application, the application must be able to
   cope with the data delivery being burstier than may be usual with
   single-path TCP. Since burstiness is commonplace on the Internet
   today, it is unlikely that applications will suffer from such an
   impact on the traffic profile, but application authors may wish to
   consider this in future development.

   In addition, applications that make round trip time (RTT) estimates
   at the application level may have some issues. Whilst the average
   delay calculated will be accurate, whether this is useful for an
   application will depend on what it requires this information for. If
   a new application wishes to derive such information, it should
   consider how multiple subflows may affect its measurements, and thus
   how it may wish to respond. In such a case, an application may wish
   to express its scheduling preferences, as described later in this
   document.

3.1.3.  Resilience

   The use of multiple subflows simultaneously means that, if one should
   fail, all traffic will move to the remaining subflow(s), and
   additionally any lost packets can be retransmitted on these subflows.

   Subflow failure may be caused by issues within the network, which an
   application would be unaware of, or interface failure on the node.
   An application may, under certain circumstances, be in a position to
   be aware of such failure (e.g.
   by radio signal strength, or simply an interface enabled flag), and
   so must not make assumptions about an MPTCP flow's stability based on
   this. MPTCP will never override an application's request for a given
   interface, however, so the cases where this issue may be applicable
   are limited.

3.2.  Potential Problems

3.2.1.  Impact of Middleboxes

   MPTCP has been designed in order to pass through the majority of
   middleboxes. Empirical evidence suggests that new TCP options can
   successfully be used on most paths in the Internet. Nevertheless,
   some middleboxes may still refuse to pass MPTCP messages due to the
   presence of TCP options, or they may strip TCP options. If this is
   the case, MPTCP should fall back to regular TCP. Although this will
   not create a problem for the application (its communication will be
   set up either way), there may be additional (and indeed,
   user-perceivable) delay while the first handshake fails. A detailed
   discussion of the various fallback mechanisms, for failures occurring
   at different points in the connection, is presented in [5].

   There may also be middleboxes that transparently change the length of
   content. If such middleboxes are present, MPTCP's reassembly of the
   byte stream in the receiver is difficult. Still, MPTCP can detect
   such middleboxes and then fall back to regular TCP. An overview of
   the impact of middleboxes is presented in [4] and MPTCP's mechanisms
   to work around these are presented and discussed in [5].

   MPTCP can also have other unexpected implications. For instance,
   intrusion detection systems could be triggered. A full analysis of
   MPTCP's impact on such middleboxes is for further study after
   deployment experiments.

3.2.2.  Outdated Implicit Assumptions

   In regular TCP, there is a one-to-one mapping of the socket interface
   to a flow through a network.
   Since MPTCP can make use of multiple flows, applications cannot
   implicitly rely on this one-to-one mapping any more. Applications
   that require the transport along a single path can disable the use of
   MPTCP as described later in this document. Examples include
   monitoring tools that want to measure the available bandwidth on a
   path, or routing protocols such as BGP that require the use of a
   specific link.

3.2.3.  Security Implications

   The support for multiple IP addresses within one MPTCP connection can
   result in additional security vulnerabilities, such as possibilities
   for attackers to hijack connections. The protocol design of MPTCP
   minimizes this risk. An attacker on one of the paths can cause harm,
   but this is hardly an additional security risk compared to
   single-path TCP, which is vulnerable to man-in-the-middle attacks,
   too. A detailed threat analysis of MPTCP is published in [6].

4.  Operation of MPTCP with Legacy Applications

4.1.  Overview of the MPTCP Network Stack

   MPTCP is an extension of TCP, but it is designed to be backward
   compatible for legacy applications. TCP interacts with other parts
   of the network stack by different interfaces. The de facto standard
   API between TCP and applications is the sockets interface. The
   position of MPTCP in the protocol stack is illustrated in Figure 1.

               +-------------------------------+
               |          Application          |
               +-------------------------------+
                     ^                 |
          ~~~~~~~~~~~|~Socket Interface|~~~~~~~~~~~
                     |                 v
               +-------------------------------+
               |             MPTCP             |
               + - - - - - - - + - - - - - - - +
               | Subflow (TCP) | Subflow (TCP) |
               +-------------------------------+
               |      IP       |      IP       |
               +-------------------------------+

                    Figure 1: MPTCP protocol stack

   In general, MPTCP can affect all interfaces that make assumptions
   about the coupling of a TCP connection to a single IP address and TCP
   port pair, to one sockets endpoint, to one network interface, or to a
   given path through the network.

   This means that there are two classes of applications:

   o  Legacy applications: These applications use the existing API
      towards TCP without any changes. This is the default case.

   o  MPTCP-aware applications: These applications indicate support for
      an enhanced MPTCP interface. This document specifies a minimum
      set of API extensions for such applications.

   The following sections discuss to what extent MPTCP affects legacy
   applications using the existing sockets API. The existing sockets
   API implies that applications deal with data structures that store,
   amongst others, the IP addresses and TCP port numbers of a TCP
   connection. A design objective of MPTCP is that legacy applications
   can continue to use the established sockets API without any changes.
   However, in MPTCP there is a one-to-many mapping between the socket
   endpoint and the subflows. This has several subtle implications for
   legacy applications using sockets API functions.

4.2.  Address Issues

4.2.1.  Specification of Addresses by Applications

   During binding, an application can either select a specific address,
   or bind to INADDR_ANY. Furthermore, on some systems other socket
   options (e.g., SO_BINDTODEVICE) can be used to bind to a specific
   interface.
   If an application uses a specific address or binds to a specific
   interface, then MPTCP MUST respect this and not interfere in the
   application's choices. If an application binds to INADDR_ANY, it is
   assumed that the application does not care which addresses to use
   locally. In this case, a local policy MAY allow MPTCP to
   automatically set up multiple subflows on such a connection.

   The basic sockets API of MPTCP-aware applications allows applications
   to express further preferences in an MPTCP-compatible way (e.g. bind
   to a subset of interfaces only).

4.2.2.  Querying of Addresses by Applications

   Applications can use the getpeername() or getsockname() functions in
   order to retrieve the IP address of the peer or of the local socket.
   These functions can be used for various purposes, including security
   mechanisms, geo-location, or interface checks. The socket API was
   designed with an assumption that a socket is using just one address,
   and since this address is visible to the application, the application
   may assume that the information provided by the functions is the same
   during the lifetime of a connection. However, in MPTCP, unlike in
   TCP, there is a one-to-many mapping of a connection to subflows, and
   subflows can be added and removed while the connection continues to
   exist. Therefore, MPTCP cannot expose addresses by getpeername() or
   getsockname() that are both valid and constant during the
   connection's lifetime.

   This problem is addressed as follows: If used by a legacy
   application, the MPTCP stack MUST always return the addresses of the
   first subflow of an MPTCP connection, in all circumstances, even if
   that particular subflow is no longer in use.

   As this address may not be valid any more if the first subflow is
   closed, the MPTCP stack MAY close the whole MPTCP connection if the
   first subflow is closed (i.e.
   fate sharing between the initial subflow and the MPTCP connection as
   a whole). Whether to close the whole MPTCP connection by default
   SHOULD be controlled by a local policy. Further experiments are
   needed to investigate its implications.

   Instead of getpeername() or getsockname(), MPTCP-aware applications
   can use new API calls, documented later, in order to retrieve the
   full list of address pairs for the subflows in use.

   TBD: If a socket is used by an MPTCP-aware application and thus does
   not use the backward compatibility mode, the functions getpeername()
   and getsockname() could fail with a new error code EMULTIPATH. The
   motivation would be that an MPTCP-aware application should not use
   these two functions due to their ambiguity. Instead, the information
   about the addresses in use should be accessed by the basic MPTCP
   sockets API, if needed. The alternative would be to always return
   the addresses of the first subflow. Which option is best is
   currently unspecified, and may be left to the implementation.

4.3.  Socket Option Issues

4.3.1.  General Guideline

   The existing sockets API includes options that modify the behavior of
   sockets and their underlying communications protocols. Various
   socket options exist at the socket, TCP, and IP levels. The value of
   an option can usually be set by the setsockopt() system function.
   The getsockopt() function gets information. In general, the existing
   sockets interface functions cannot configure each MPTCP subflow
   individually. In order to be backward compatible, existing APIs
   therefore SHOULD apply to all subflows within one connection, as far
   as possible.

4.3.2.  Disabling of the Nagle Algorithm

   One commonly used TCP socket option (TCP_NODELAY) disables the Nagle
   algorithm as described in [2]. This option is also specified in the
   POSIX standard [8].
   Applications can use this option in combination with MPTCP in exactly
   the same way. It then SHOULD disable the Nagle algorithm for the
   MPTCP connection, i.e., all subflows.

   In addition, the MPTCP protocol instance MAY use a different path
   scheduler algorithm if TCP_NODELAY is present. For instance, it
   could use an algorithm that is optimized for latency-sensitive
   traffic. Specific algorithms are outside the scope of this document.

4.3.3.  Buffer Sizing

   Applications can explicitly configure send and receive buffer sizes
   by the sockets API (SO_SNDBUF, SO_RCVBUF). These socket options can
   also be used in combination with MPTCP and then affect the buffer
   size of the MPTCP connection. However, when defining buffer sizes,
   application programmers should take into account that the transport
   over several subflows requires a certain amount of buffer for
   resequencing in the receiver. MPTCP may also require more storage
   space in the sender, in particular, if retransmissions are sent over
   more than one path. In addition, very small send buffers may prevent
   MPTCP from efficiently scheduling data over different subflows.
   Therefore, it does not make sense to use MPTCP in combination with
   small send or receive buffers.

   An MPTCP implementation MAY set a lower bound for send and receive
   buffers and treat a small buffer size request as an implicit request
   not to use MPTCP.

4.3.4.  Other Socket Options

   Some network stacks also provide other implementation-specific socket
   options or interfaces that affect TCP's behavior. If a network stack
   supports MPTCP, it must be ensured that these options do not
   interfere.

4.4.  Default Enabling of MPTCP

   It is up to a local policy at the end system whether a network stack
   should automatically enable MPTCP for sockets even if there is no
   explicit sign of MPTCP awareness of the corresponding application.
   Such a choice may be under the control of the user through system
   preferences.

   The enabling of MPTCP, either by application or by system defaults,
   does not necessarily mean that MPTCP will always be used. Both
   endpoints must support MPTCP, and there must be multiple addresses at
   at least one endpoint, for MPTCP to be used. Even if those
   requirements are met, however, MPTCP may not be immediately used on a
   connection. It may make sense for multiple paths to be brought into
   operation only after a given period of time, or if the connection is
   saturated.

4.5.  Summary of Advices to Application Developers

   o  Using the default MPTCP configuration: Like TCP, MPTCP is designed
      to be efficient and robust in the default configuration.
      Application developers should not explicitly configure TCP (or
      MPTCP) features unless this is really needed.

   o  Socket buffer dimensioning: Multipath transport requires larger
      buffers in the receiver for resequencing, as already explained.
      Applications should use reasonable buffer sizes (such as the
      operating system default values) in order to fully benefit from
      MPTCP. A full discussion of buffer sizing issues is given in [5].

   o  Facilitating stack-internal heuristics: The path management and
      data scheduling by MPTCP is realized by stack-internal algorithms
      that may implicitly try to self-optimize their behavior according
      to assumed application needs. For instance, an MPTCP
      implementation may use heuristics to determine whether an
      application requires delay-sensitive or bulk data transport, using
      for instance port numbers, the TCP_NODELAY socket option, or the
      application's read/write patterns as input parameters. An
      application developer can facilitate the operation of such
      heuristics by avoiding atypical interface use cases. For
For 512 instance, for long bulk data transfers, it does neither make sense 513 to enable the TCP_NODELAY socket option, nor is it reasonable to 514 use many small subsequent socket "send()" calls with small amounts 515 of data only. 517 5. Basic API for MPTCP-aware Applications 519 5.1. Design Considerations 521 While applications can use MPTCP with the unmodified sockets API, 522 multipath transport results in many degrees of freedom. MPTCP 523 manages the data transport over different subflows automatically. By 524 default, this is transparent to the application, but an application 525 could use an additional API to interface with the MPTCP layer and to 526 control important aspects of the MPTCP implementation's behaviour. 528 This document describes a basic MPTCP API. The API uses non- 529 mandatory socket options and only includes a minimum set of functions 530 that provide an equivalent level of control and information as exists 531 for regular TCP. It maintains backward compatibility with legacy 532 applications. 534 An advanced MPTCP API is outside the scope of this document. The 535 basic API does not allow a sender or a receiver to express 536 preferences about the management of paths or the scheduling of data, 537 even if this can have a significant performance impact and if an 538 MPTCP implementation could benefit from additional guidance by 539 applications. A list of potential further API extensions is provided 540 in the appendix. The specification of such an advanced API is for 541 further study and may partly be implementation-specific. 543 MPTCP mainly affects the sending of data. Therefore, the basic API 544 only affects the sender side of a data transfer. A receiver may also 545 have preferences about data transfer choices, and it may have 546 performance requirements, too. Yet, the signaling of the receiver's 547 needs is outside of the scope of this document. 

   As this document specifies sockets API extensions, it is written so
   that the syntax and semantics are in line with the POSIX standard [8]
   as much as possible.

5.2.  Requirements on the Basic MPTCP API

   Because of the importance of the sockets interface there are several
   fundamental design objectives for the basic interface between MPTCP
   and applications:

   o  Consistency with existing sockets APIs must be maintained as far
      as possible. In order to support the large base of applications
      using the original API, a legacy application must be able to
      continue to use standard socket interface functions when run on a
      system supporting MPTCP. Also, MPTCP-aware applications should be
      able to access the socket without any major changes.

   o  Sockets API extensions must be minimized and independent of an
      implementation.

   o  The interface should handle both IPv4 and IPv6.

   The following is a list of the core requirements for the basic API:

   REQ1:  Turn on/off MPTCP: An application should be able to request to
          turn on or turn off the usage of MPTCP. This means that an
          application should be able to explicitly request the use of
          MPTCP if this is possible. Applications should also be able
          to request not to enable MPTCP and to use regular TCP
          transport instead. This can be implicit in many cases, since
          MPTCP must be disabled when an application binds to a specific
          address. MPTCP may also be enabled if an application uses
          AF_MULTIPATH.

   REQ2:  An application should be able to restrict MPTCP to binding to
          a given set of addresses.

   REQ3:  An application should be able to obtain information on the
          addresses used by the MPTCP subflows.

   REQ4:  An application should be able to extract a unique identifier
          for the connection (per endpoint).
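
   REQ1 notes that the choice can be implicit in how an application
   binds its socket (see Section 4.2.1). As a minimal sketch using only
   standard sockets calls, the two binding styles look as follows. The
   helper names bind_wildcard_v4() and bind_specific_v4() are
   illustrative assumptions and are not part of any API defined in this
   document; the sketch is limited to IPv4 for brevity.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Illustrative helper (the name is an assumption): bind to INADDR_ANY.
 * Per Section 4.2.1, a local policy MAY then allow MPTCP to set up
 * multiple subflows.  Returns 0 on success, -1 on error. */
int bind_wildcard_v4(int fd, uint16_t port)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    return bind(fd, (const struct sockaddr *)&sa, sizeof(sa));
}

/* Illustrative helper (the name is an assumption): bind to one specific
 * local IPv4 address given in dotted-decimal text form.  Per Section
 * 4.2.1, MPTCP MUST respect this choice.  Returns 0 on success, -1 if
 * the address text is invalid or bind() fails. */
int bind_specific_v4(int fd, const char *addr, uint16_t port)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &sa.sin_addr) != 1)
        return -1;  /* not a valid IPv4 address string */
    return bind(fd, (const struct sockaddr *)&sa, sizeof(sa));
}
```

   A legacy application that calls the second helper implicitly asks for
   single-address transport, whereas the first leaves the decision to
   the local policy of the stack.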

   The first requirement is the most important one, since some
   applications could benefit a lot from MPTCP, but there are also
   cases in which it hardly makes sense.  The existing sockets API
   provides similar mechanisms to enable or disable advanced TCP
   features.  The second requirement corresponds to the binding of
   addresses with the bind() socket call, or, e.g., explicit device
   bindings with a SO_BINDTODEVICE option.  The third requirement
   ensures that there is an equivalent to getpeername() or
   getsockname() that is able to deal with more than one subflow.
   Finally, it should be possible for the application to retrieve a
   unique connection identifier (local to the endpoint on which it is
   running) for the MPTCP connection.  This is equivalent to using the
   (address, port) pair as a connection identifier in legacy TCP, which
   is no longer static in MPTCP.

5.3.  Sockets Interface Extensions by the Basic MPTCP API

5.3.1.  Overview

   The basic MPTCP API consists of four new socket options that are
   specific to MPTCP.  All of these socket options are defined at TCP
   level (IPPROTO_TCP).

   o  TCP_MULTIPATH_ENABLE: Enable/disable MPTCP

   o  TCP_MULTIPATH_BIND: Bind MPTCP to a set of given local addresses

   o  TCP_MULTIPATH_SUBFLOWS: Get the addresses currently used by the
      MPTCP subflows

   o  TCP_MULTIPATH_CONNID: Get the local connection identifier for
      this MPTCP connection

   Table 1 shows a list of the socket options for the general
   configuration of MPTCP.  The first column gives the name of the
   option.  The second and third columns indicate whether the option
   can be handled by the getsockopt() system call and/or by the
   setsockopt() system call.  The fourth column lists the type of data
   structure specified along with the socket option.

   +------------------------+-----+-----+------------------------------+
   | Option name            | Get | Set | Data type                    |
   +------------------------+-----+-----+------------------------------+
   | TCP_MULTIPATH_ENABLE   |  o  |  o  | int                          |
   | TCP_MULTIPATH_BIND     |     |  o  | list of "struct sockaddr"    |
   | TCP_MULTIPATH_SUBFLOWS |  o  |     | list of pairs of "struct     |
   |                        |     |     | sockaddr"                    |
   | TCP_MULTIPATH_CONNID   |  o  |     | uint32                       |
   +------------------------+-----+-----+------------------------------+

                   Table 1: Socket options for MPTCP

   There are restrictions on when these new socket options can be used:

   o  TCP_MULTIPATH_ENABLE: This option SHOULD only be set before the
      establishment of a TCP connection.  Its value SHOULD only be read
      after the establishment of a connection.

   o  TCP_MULTIPATH_BIND: This option MAY be applied either before
      connection setup or during a connection.  In the latter case, it
      allows MPTCP to use a new address, if there has been a
      restriction before connection setup.

   o  TCP_MULTIPATH_SUBFLOWS: This option is read-only and SHOULD only
      be used after connection setup.

   o  TCP_MULTIPATH_CONNID: This option is read-only and SHOULD only be
      used after connection setup.

5.3.2.  Enabling and Disabling of MPTCP

   An application can explicitly indicate multipath capability by
   setting the TCP_MULTIPATH_ENABLE option with a value larger than 0.
   In this case, the MPTCP implementation SHOULD try to negotiate MPTCP
   for that connection.  Note that multipath transport will not
   necessarily be enabled, as it requires multiple addresses and
   support in the other end-system and potentially also on middleboxes.

   An application can disable MPTCP by setting the option to a value of
   0.  In that case, MPTCP MUST NOT be used on that connection.

   After connection establishment, an application can get the value of
   the TCP_MULTIPATH_ENABLE option.  A value of 0 then means lack of
   MPTCP support.
   Any value equal to or larger than 1 means that MPTCP is supported.

   TBD: In case of success, the value could return the current number
   of subflows.

   As an alternative to setting a socket option, an application can
   also use a new, separate address family called AF_MULTIPATH [9].
   This separate address family can be used to exchange multiple
   addresses between an application and the standard sockets API, and
   additionally acts as an explicit indication that an application is
   MPTCP-aware, i.e., that it can deal with the semantic changes of the
   sockets API, in particular concerning getpeername() and
   getsockname().  The usage of AF_MULTIPATH is also more flexible with
   respect to multipath transport over IPv4, IPv6, or both in parallel
   [9].

5.3.3.  Binding MPTCP to Specified Addresses

   An application can set the TCP_MULTIPATH_BIND socket option to
   announce a set of local IP addresses that MPTCP may bind to.  The
   parameter of the option is a list of data structures of type
   "sockaddr".  An MPTCP implementation must iterate over this list
   since the length of the structures may vary and will be determined
   by the address families.

   If used, an application SHOULD always provide the full list of
   addresses that MPTCP is allowed to use.  If the option is set, MPTCP
   MUST only establish additional subflows using one of the addresses
   in that list as source address.  Of course, MPTCP may also use only
   a subset of the addresses.

   The option may be set repeatedly.  In that case, an updated list of
   addresses SHOULD only affect the establishment of new subflows.  In
   addition, MPTCP MAY close the corresponding subflows if an address
   is no longer present in an updated list, but it is also allowed to
   keep these subflows open.  The basic API does not provide a
   mechanism to explicitly close a subflow.

   TBD: Are these the best heuristics?
   Is it reasonable to expect an application to keep track of all
   addresses if it wants to make changes?  Should it be stronger than a
   MAY for address removal?

5.3.4.  Querying the MPTCP Subflow Addresses

   An application can get a list of the addresses used by the currently
   established subflows by means of the TCP_MULTIPATH_SUBFLOWS option,
   which cannot be set.  The return value is a list of pairs of
   "sockaddr" data structures.  In one pair, the first data structure
   refers to the local IP address and the second one to the remote IP
   address used by the subflow.  The list MUST only include established
   subflows.

   The length of the returned data depends on the number of subflows,
   and so an application must iterate over the whole list, determining
   the length of each "sockaddr" data structure by its address family.

5.3.5.  Getting a Unique Connection Identifier

   An application that wants a unique identifier for the connection,
   analogous to an (address, port) pair in regular TCP, can use the
   TCP_MULTIPATH_CONNID option to get a local connection identifier for
   the MPTCP connection.

   This is a 32-bit number, and SHOULD be the same as the local
   connection identifier sent in the MPTCP handshake.

5.4.  Usage Examples

   TODO: Example C code for the API functions

6.  Other Compatibility Issues

6.1.  Incompatibilities with other Multihoming Solutions

   The use of MPTCP can interact with various related sockets API
   extensions.  The use of a multihoming shim layer conflicts with
   multipath transport such as MPTCP or SCTP [12].  Care should be
   taken not to confuse the MPTCP API with the overlapping features of
   other APIs:

   o  SHIM API [12]: This API specifies sockets API extensions for the
      multihoming shim layer.

   o  HIP API [13]: The Host Identity Protocol (HIP) also results in a
      new API.

   o  API for Mobile IPv6 [11]: For Mobile IPv6, a significantly
      extended socket API exists as well.

   In order to avoid any conflict, multiaddressed MPTCP SHOULD NOT be
   enabled if a network stack uses SHIM6, HIP, or Mobile IPv6.
   Furthermore, applications should not try to use both the MPTCP API
   and another multihoming or mobility layer API.

   It is possible, however, that some of the MPTCP functionality, such
   as congestion control, could be used in a SHIM6 or HIP environment.
   Such operation is outside the scope of this document.

6.2.  Interactions with DNS

   In multihomed or multiaddressed environments, there are various
   issues that are not specific to MPTCP, but have to be considered,
   too.  These problems are summarized in [15].

   Specifically, there can be interactions with DNS.  Whilst it is
   expected that an application will iterate over the list of addresses
   returned from a call such as getaddrinfo(), MPTCP itself MUST NOT
   make any assumptions about multiple A or AAAA records from the same
   DNS query referring to the same host, as it is very likely that
   multiple addresses refer to multiple servers for load balancing
   purposes.

   TODO: Elaborate on DNS

7.  Security Considerations

   Will be added in a later version of this document.

8.  IANA Considerations

   No IANA considerations.

9.  Conclusion

   This document discusses MPTCP's application implications and
   specifies a basic MPTCP API.  For legacy applications, it is ensured
   that the existing sockets API continues to work.  MPTCP-aware
   applications can use the basic MPTCP API, which provides some
   control over the transport layer equivalent to regular TCP.  A more
   fine-grained interaction between applications and MPTCP requires an
   advanced MPTCP API, which is not specified in this document.

10.  Acknowledgments

   The authors sincerely thank the following people for their helpful
   comments on the document: Costin Raiciu.

   Michael Scharf is supported by the German-Lab project
   (http://www.german-lab.de/) funded by the German Federal Ministry of
   Education and Research (BMBF).  Alan Ford is supported by Trilogy
   (http://www.trilogy-project.org/), a research project (ICT-216372)
   partially funded by the European Community under its Seventh
   Framework Program.  The views expressed here are those of the
   author(s) only.  The European Commission is not liable for any use
   that may be made of the information in this document.

11.  References

11.1.  Normative References

   [1]   Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
         September 1981.

   [2]   Braden, R., "Requirements for Internet Hosts - Communication
         Layers", STD 3, RFC 1122, October 1989.

   [3]   Bradner, S., "Key words for use in RFCs to Indicate
         Requirement Levels", BCP 14, RFC 2119, March 1997.

   [4]   Ford, A., Raiciu, C., Barre, S., and J. Iyengar,
         "Architectural Guidelines for Multipath TCP Development",
         draft-ietf-mptcp-architecture-01 (work in progress),
         June 2010.

   [5]   Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for
         Multipath Operation with Multiple Addresses",
         draft-ietf-mptcp-multiaddressed-00 (work in progress),
         June 2010.

   [6]   Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path
         TCP", draft-ietf-mptcp-threat-02 (work in progress),
         March 2010.

   [7]   Raiciu, C., Handley, M., and D. Wischik, "Coupled
         Multipath-Aware Congestion Control",
         draft-raiciu-mptcp-congestion-00 (work in progress),
         October 2009.

   [8]   "IEEE Std. 1003.1-2008 Standard for Information Technology --
         Portable Operating System Interface (POSIX)", Open Group
         Technical Standard: Base Specifications, Issue 7, 2008.

11.2.  Informative References

   [9]   Sarolahti, P., "Multi-address Interface in the Socket API",
         draft-sarolahti-mptcp-af-multipath-01 (work in progress),
         March 2010.

   [10]  Stevens, W., Thomas, M., Nordmark, E., and T. Jinmei,
         "Advanced Sockets Application Program Interface (API) for
         IPv6", RFC 3542, May 2003.

   [11]  Chakrabarti, S. and E. Nordmark, "Extension to Sockets API for
         Mobile IPv6", RFC 4584, July 2006.

   [12]  Komu, M., Bagnulo, M., Slavov, K., and S. Sugimoto, "Socket
         Application Program Interface (API) for Multihoming Shim",
         draft-ietf-shim6-multihome-shim-api-13 (work in progress),
         February 2010.

   [13]  Komu, M. and T. Henderson, "Basic Socket Interface Extensions
         for Host Identity Protocol (HIP)",
         draft-ietf-hip-native-api-12 (work in progress), January 2010.

   [14]  Stewart, R., Poon, K., Tuexen, M., Yasevich, V., and P. Lei,
         "Sockets API Extensions for Stream Control Transmission
         Protocol (SCTP)", draft-ietf-tsvwg-sctpsocket-21 (work in
         progress), February 2010.

   [15]  Blanchet, M. and P. Seite, "Multiple Interfaces Problem
         Statement", draft-ietf-mif-problem-statement-04 (work in
         progress), May 2010.

   [16]  Wasserman, M. and P. Seite, "Current Practices for Multiple
         Interface Hosts", draft-ietf-mif-current-practices-01 (work in
         progress), June 2010.

Appendix A.  Requirements on a Future Advanced MPTCP API

A.1.  Design Considerations

   Multipath transport results in many degrees of freedom.  The basic
   MPTCP API only defines a minimum set of sockets API extensions for
   the interface between the MPTCP layer and applications, which does
   not offer much control of the MPTCP implementation's behaviour.  A
   future, advanced API could address further features of MPTCP and
   provide more control.

   Applications that use TCP may have different requirements on the
   transport layer.
   While developers have become used to the characteristics of regular
   TCP, new opportunities created by MPTCP could allow the service
   provided to be optimised further.  An advanced API could enable
   MPTCP-aware applications to specify preferences and control certain
   aspects of the behaviour, in addition to the simple control provided
   by the basic interface.  An advanced API could also address aspects
   that are completely out of scope of the basic API, for example, the
   question of whether a receiving application could influence the
   sending policy.

   Furthermore, an advanced MPTCP API could be part of a new overall
   interface between the network stack and applications that addresses
   other issues as well, such as the split between identifiers and
   locators.  An API that does not use IP addresses (but instead, e.g.,
   a connectbyname() function) would be useful for numerous purposes,
   independent of MPTCP.

   This appendix documents a list of potential usage scenarios and
   requirements for the advanced API.  The specification and
   implementation of a corresponding API is outside the scope of this
   document.

A.2.  MPTCP Usage Scenarios and Application Requirements

   There are different MPTCP usage scenarios.  An application that
   wishes to transmit bulk data will want MPTCP to provide a high
   throughput service immediately, through creating and maximising
   utilisation of all available subflows.  This is the default MPTCP
   use case.

   But at the other extreme, there are applications that are highly
   interactive, but require only a small amount of throughput, and
   these are optimally served by low latency and jitter stability.  In
   such a situation, it would be preferable for the traffic to use only
   the lowest latency subflow (assuming it has sufficient capacity),
   maybe with one or two additional subflows for resilience and
   recovery purposes.
   The key challenge for such a strategy is that the delay on a path
   may fluctuate significantly and that always selecting the path with
   the smallest delay might result in instability.

   The choice between bulk data transport and latency-sensitive
   transport affects the scheduler in terms of whether traffic should
   be, by default, sent on one subflow or across several ones.  Even if
   the total bandwidth required is less than that available on an
   individual path, it is desirable to spread this load to reduce
   stress on potential bottlenecks, and this is why this method should
   be the default for bulk data transport.  However, that may not be
   optimal for applications that require latency/jitter stability.

   In the case of the latter option, a further question arises: Should
   additional subflows be used whenever the primary subflow is
   overloaded, or only when the primary path fails (hot-standby)?  In
   other words, is latency stability or bandwidth more important to the
   application?  This results in two different options: Firstly, there
   is the single path which can overflow into an additional subflow;
   and secondly, there is single-path with hot-standby, whereby an
   application may want an alternative backup subflow in order to
   improve resilience.  If data delivery on the first subflow fails,
   the data transport could immediately be continued on the second
   subflow, which is idle otherwise.

   A further, mostly orthogonal question is whether data should be
   duplicated over the different subflows, in particular if there is
   spare capacity.  This could improve both the timeliness and
   reliability of data delivery.

   In summary, there are at least three possible performance objectives
   for multipath transport (not necessarily disjoint):

   1.  High bandwidth

   2.  Low latency and jitter stability

   3.  High reliability

   In an advanced API, applications could provide high-level guidance
   to the MPTCP implementation concerning these performance
   requirements, for instance, by indicating which one is considered
   the most important.  The MPTCP stack would then use internal
   mechanisms to fulfill this abstract indication of a desired service,
   as far as possible.  This would affect both the assignment of data
   (including retransmissions) to existing subflows (e.g., 'use all in
   parallel', 'use as overflow', 'hot standby', 'duplicate traffic')
   and the decisions about when to set up additional subflows and to
   which addresses.  In both cases different policies can exist, which
   can be expected to be implementation-specific.

   Therefore, an advanced API could provide a mechanism by which
   applications can specify their high-level requirements in an
   implementation-independent way.  One possibility would be to select
   one "application profile" out of a number of choices that
   characterize typical applications.  Yet, as applications today do
   not have to inform TCP about their communication requirements,
   further study is required to determine whether such an approach
   would be realistic.

   Of course, independent of an advanced API, such functionality could
   also partly be achieved by MPTCP-internal heuristics that infer some
   application preferences, e.g., from existing socket options such as
   TCP_NODELAY.  Whether this would be reliable, and indeed
   appropriate, is for further study, too.

A.3.  Potential Requirements on an Advanced MPTCP API

   The following is a list of potential requirements for an advanced
   MPTCP API beyond the features of the basic API.  It is included here
   for information only:

   REQ5:   An application should be able to establish MPTCP connections
           without using IP addresses as locators.

   REQ6:   An application should be able to obtain usage information
           and statistics about all subflows (e.g., ratio of traffic
           sent via this subflow).

   REQ7:   An application should be able to request a change in the
           number of subflows in use, thus triggering removal or
           addition of subflows.  An even finer control granularity
           would be a request for the establishment of a new subflow to
           a provided destination, or a request for the termination of
           a specified, existing subflow.

   REQ8:   An application should be able to inform the MPTCP
           implementation about its high-level performance
           requirements, e.g., in the form of a profile.

   REQ9:   An application should be able to control the automatic
           establishment/termination of subflows.  This would imply a
           selection among different heuristics of the path manager,
           e.g., 'try as soon as possible', 'wait until there is a
           bunch of data', etc.

   REQ10:  An application should be able to set preferred subflows or
           subflow usage policies.  This would result in a selection
           among different configurations of the multipath scheduler.

   REQ11:  An application should be able to control the level of
           redundancy by telling whether segments should be sent on
           more than one path in parallel.

   An advanced API fulfilling these requirements would allow
   application developers to configure MPTCP more specifically.  It
   could avoid suboptimal decisions of internal, implicit heuristics.
   However, it is unclear whether all of these requirements would have
   a significant benefit to applications, since they go above and
   beyond what the existing API to regular TCP provides.

Appendix B.  Change History of the Document

   Changes compared to version 01:

   o  Second half of the document completely restructured

   o  Separation between a basic API and an advanced API: The focus of
      the document is the basic API only; all text concerning a
      potential extended API is moved to the appendix

   o  Several clarifications, e.g., concerning buffer sizing and the
      use of different scheduling strategies triggered by TCP_NODELAY

   o  Additional references

   Changes compared to version 00:

   o  Distinction between legacy and MPTCP-aware applications

   o  Guidance concerning default enabling, reaction to the shutdown of
      the first subflow, etc.

   o  Reference to a potential use of AF_MULTIPATH

   o  Additional references to related work

Authors' Addresses

   Michael Scharf
   Alcatel-Lucent Bell Labs
   Lorenzstrasse 10
   70435 Stuttgart
   Germany

   EMail: michael.scharf@alcatel-lucent.com

   Alan Ford
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 1794 833 465
   EMail: alan.ford@roke.co.uk