2	Internet Engineering Task Force                                M. Scharf
3	Internet-Draft                                  Alcatel-Lucent Bell Labs
4	Intended status: Informational                                   A. Ford
5	Expires: June 2, 2012                                  November 30, 2011

7	               MPTCP Application Interface Considerations
8	                        draft-ietf-mptcp-api-03

10	Abstract

12	   Multipath TCP (MPTCP) adds the capability of using multiple paths to
13	   a regular TCP session.  Even though it is designed to be totally
14	   backward compatible with applications, the data transport differs
15	   compared to regular TCP, and there are several additional degrees of
16	   freedom that applications may wish to exploit.  This document
17	   summarizes the impact that MPTCP may have on applications, such as
18	   changes in performance.  Furthermore, it discusses compatibility
19	   issues of MPTCP in combination with non-MPTCP-aware applications.
20	   Finally, the document describes a basic application interface for
21	   MPTCP-aware applications that provides access to multipath address
22	   information and a level of control equivalent to regular TCP.

24	Status of This Memo

26	   This Internet-Draft is submitted in full conformance with the
27	   provisions of BCP 78 and BCP 79.

29	   Internet-Drafts are working documents of the Internet Engineering
30	   Task Force (IETF).  Note that other groups may also distribute
31	   working documents as Internet-Drafts.  The list of current Internet-
32	   Drafts is at http://datatracker.ietf.org/drafts/current/.

34	   Internet-Drafts are draft documents valid for a maximum of six months
35	   and may be updated, replaced, or obsoleted by other documents at any
36	   time.  It is inappropriate to use Internet-Drafts as reference
37	   material or to cite them other than as "work in progress."

39	   This Internet-Draft will expire on June 2, 2012.

41	Copyright Notice

43	   Copyright (c) 2011 IETF Trust and the persons identified as the
44	   document authors.  All rights reserved.

46	   This document is subject to BCP 78 and the IETF Trust's Legal
47	   Provisions Relating to IETF Documents
48	   (http://trustee.ietf.org/license-info) in effect on the date of
49	   publication of this document.  Please review these documents
50	   carefully, as they describe your rights and restrictions with respect
51	   to this document.  Code Components extracted from this document must
52	   include Simplified BSD License text as described in Section 4.e of
53	   the Trust Legal Provisions and are provided without warranty as
54	   described in the Simplified BSD License.

56	Table of Contents

58	   1.  Introduction
59	   2.  Terminology
60	   3.  Comparison of MPTCP and Regular TCP
61	     3.1.  Performance Impact
62	       3.1.1.  Throughput
63	       3.1.2.  Delay
64	       3.1.3.  Resilience
65	     3.2.  Potential Problems
66	       3.2.1.  Impact of Middleboxes
67	       3.2.2.  Outdated Implicit Assumptions
68	       3.2.3.  Security Implications
69	   4.  Operation of MPTCP with Legacy Applications
70	     4.1.  Overview of the MPTCP Network Stack
71	     4.2.  Address Issues
72	       4.2.1.  Specification of Addresses by Applications
73	       4.2.2.  Querying of Addresses by Applications
74	     4.3.  Socket Option Issues
75	       4.3.1.  General Guideline
76	       4.3.2.  Disabling of the Nagle Algorithm
77	       4.3.3.  Buffer Sizing
78	       4.3.4.  Other Socket Options
79	     4.4.  Default Enabling of MPTCP
80	     4.5.  Summary of Advices to Application Developers
81	   5.  Basic API for MPTCP-aware Applications
82	     5.1.  Design Considerations
83	     5.2.  Requirements on the Basic MPTCP API
84	     5.3.  Sockets Interface Extensions by the Basic MPTCP API
85	       5.3.1.  Overview
86	       5.3.2.  Enabling and Disabling of MPTCP
87	       5.3.3.  Binding MPTCP to Specified Addresses
88	       5.3.4.  Querying the MPTCP Subflow Addresses
89	       5.3.5.  Getting a Unique Connection Identifier
90	   6.  Other Compatibility Issues
91	     6.1.  Usage of the SCTP Socket API
92	     6.2.  Incompatibilities with other Multihoming Solutions
93	     6.3.  Interactions with DNS
94	   7.  Security Considerations
95	   8.  IANA Considerations
96	   9.  Conclusion
97	   10. Acknowledgments
98	   11. References
99	     11.1. Normative References
100	     11.2. Informative References
101	   Appendix A.  Requirements on a Future Advanced MPTCP API
102	     A.1.  Design Considerations
103	     A.2.  MPTCP Usage Scenarios and Application Requirements
104	     A.3.  Potential Requirements on an Advanced MPTCP API
105	     A.4.  Integration with the SCTP Socket API
106	   Appendix B.  Change History of the Document

108	1.  Introduction

110	   Multipath TCP adds the capability of using multiple paths to a
111	   regular TCP session [1].  The motivations for this extension include
112	   increasing throughput, overall resource utilisation, and resilience
113	   to network failure, and these motivations are discussed, along with
114	   high-level design decisions, as part of the Multipath TCP
115	   architecture [4].  The MPTCP protocol [5] offers the same reliable,
116	   in-order, byte-stream transport as TCP, and is designed to be
117	   backward compatible with both applications and the network layer.  It
118	   requires support inside the network stack of both endpoints.

120	   This document first presents the impacts that MPTCP may have on
121	   applications, such as performance changes compared to regular TCP.
122	   Second, it defines the interoperation of MPTCP and applications that
123	   are unaware of the multipath transport.  MPTCP is designed to be
124	   usable without any application changes, but some compatibility issues
125	   have to be taken into account.  Third, this memo specifies a basic
126	   Application Programming Interface (API) for MPTCP-aware applications.
127	   The API presented here is an extension to the regular TCP API to
128	   allow an MPTCP-aware application the equivalent level of control and
129	   access to information of an MPTCP connection that would be possible
130	   with the standard TCP API on a regular TCP connection.

132	   An advanced API for MPTCP is outside the scope of this document.
133	   Such an advanced API could offer a more fine-grained control over
134	   multipath transport functions and policies.  The appendix includes a
135	   brief, non-compulsory list of potential features of such an advanced
136	   API.

138	   The de facto standard API for TCP/IP applications is the "sockets"
139	   interface.  This document provides an abstract definition of MPTCP-
140	   specific extensions to this interface.  These are operations that can
141	   be used by an application to get or set additional MPTCP-specific
142	   information on a socket, in order to provide an equivalent level of
143	   information and control over MPTCP as exists for an application using
144	   regular TCP.  It is up to the applications, high-level programming
145	   languages, or libraries to decide whether to use these optional
146	   extensions.  For instance, an application may want to turn on or off
147	   the MPTCP mechanism for certain data transfers, or limit its use to
148	   certain interfaces.  The abstract specification is in line with the
149	   POSIX standard [8] as much as possible.

151	   There are also various related extensions of the sockets interface:
152	   [12] specifies sockets API extensions for a multihoming shim layer.
153	   The API enables interactions between applications and the multihoming
154	   shim layer for advanced locator management and for access to
155	   information about failure detection and path exploration.

157	   Experimental extensions to the sockets API are also defined for the
158	   Host Identity Protocol (HIP) [13] in order to manage the bindings of
159	   identifiers and locators.  Further related API extensions exist for
160	   IPv6 [10], Mobile IP [11], and SCTP [14].  There can be interactions
161	   or incompatibilities of these APIs with MPTCP, which are discussed
162	   later in this document.

164	   Some network stack implementations, especially on mobile devices,
165	   have centralized connection managers or other higher-level APIs to
166	   solve multi-interface issues, as surveyed in [16].  Their interaction
167	   with MPTCP is outside the scope of this note.

169	   The target readers of this document are application developers whose
170	   software may benefit significantly from MPTCP.  This document also
171	   provides the necessary information for developers of MPTCP to
172	   implement the API in a TCP/IP network stack.

174	2.  Terminology

176	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
177	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
178	   document are to be interpreted as described in [3].

180	   This document uses the MPTCP terminology introduced in [5].

182	   Concerning the API towards applications, the following terms are
183	   distinguished:

185	   o  Legacy API: The interface towards TCP that is currently used by
186	      applications.  This document explains the impact of MPTCP for such
187	      applications, as well as resulting issues.

189	   o  Basic API: A simple extension of TCP's interface for applications
190	      that are aware of MPTCP.  This document abstractly describes this
191	      interface, which provides access to multipath address information
192	      and a level of control equivalent to regular TCP.

194	   o  Advanced API: An API that offers more fine-grained control over
195	      the MPTCP behaviour.  Its detailed specification is outside the
196	      scope of this document.

198	3.  Comparison of MPTCP and Regular TCP

200	   This section discusses the impact that the use of MPTCP will have on
201	   applications, in comparison to what may be expected from the use of
202	   regular TCP.

204	3.1.  Performance Impact

206	   One of the key goals of adding multipath capability to TCP is to
207	   improve the performance of a transport connection by load
208	   distribution over separate subflows across potentially disjoint
209	   paths.  Furthermore, it is an explicit goal of MPTCP that it should
210	   not provide a worse performing connection than would have existed
211	   through the use of single-path TCP.  A corresponding congestion
212	   control algorithm is described in [7].  The following sections
213	   summarize the performance impact of MPTCP as seen by an application.

215	3.1.1.  Throughput

217	   The most obvious performance improvement that will be gained with the
218	   use of MPTCP is an increase in throughput, since MPTCP will pool more
219	   than one path (where available) between two endpoints.  This will
220	   provide greater bandwidth for an application.  If there are shared
221	   bottlenecks between the flows, then the congestion control algorithms
222	   will ensure that load is evenly spread amongst regular and multipath
223	   TCP sessions, so that no end user receives worse performance than
224	   single-path TCP.

226	   This performance increase additionally means that an MPTCP session
227	   could achieve throughput that is greater than the capacity of a
228	   single interface on the device.  If any applications make assumptions
229	   about interfaces due to throughput (or vice versa), they must take
230	   this into account (although an MPTCP implementation must always
231	   respect an application's request for a particular interface).

233	   Furthermore, the flexibility of MPTCP to add and remove subflows as
234	   paths change availability could lead to a greater variation, and more
235	   frequent change, in connection bandwidth.  Applications that adapt to
236	   available bandwidth (such as video and audio streaming) may need to
237	   adjust some of their assumptions to most effectively take this into
238	   account.

240	   The transport of MPTCP signaling information results in a small
241	   overhead.  If multiple subflows share the same bottleneck, this
242	   overhead slightly reduces the capacity that is available for data
243	   transport.  Yet, this potential reduction of throughput will be
244	   negligible in many usage scenarios, and the protocol contains
245	   optimisations in its design so that this overhead is minimal.

247	3.1.2.  Delay

249	   If the delays on the constituent subflows of an MPTCP connection
250	   differ, the jitter perceivable to an application may appear higher as
251	   the data is spread across the subflows.  Although MPTCP will ensure
252	   in-order delivery to the application, the application must be able to
253	   cope with the data delivery being burstier than may be usual with
254	   single-path TCP.  Since burstiness is commonplace on the Internet
255	   today, it is unlikely that applications will suffer from such an
256	   impact on the traffic profile, but application authors may wish to
257	   consider this in future development.

259	   In addition, applications that make round trip time (RTT) estimates
260	   at the application level may have some issues.  Whilst the average
261	   delay calculated will be accurate, whether this is useful for an
262	   application will depend on what it requires this information for.  If
263	   a new application wishes to derive such information, it should
264	   consider how multiple subflows may affect its measurements, and thus
265	   how it may wish to respond.  In such a case, an application may wish
266	   to express its scheduling preferences, as described later in this
267	   document.
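
   As a hedged illustration of the point above, the following C sketch
   summarizes application-level RTT samples.  The structure and
   function names (rtt_summary, rtt_stats) and the sample values used
   in the note below are hypothetical and are not part of any MPTCP
   API.

```c
#include <stddef.h>

/* Summary of application-level RTT samples (all values in ms). */
struct rtt_summary {
    double mean;
    double min;
    double max;
};

/* Computes mean, minimum, and maximum of n samples; n must be > 0. */
struct rtt_summary rtt_stats(const double *samples, size_t n)
{
    struct rtt_summary s = { 0.0, samples[0], samples[0] };
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
        if (samples[i] < s.min)
            s.min = samples[i];
        if (samples[i] > s.max)
            s.max = samples[i];
    }
    s.mean = sum / (double)n;
    return s;
}
```

   If the samples alternate between an assumed 20 ms subflow and an
   assumed 100 ms subflow, the mean is 60 ms although no single
   exchange takes 60 ms.  An application that only tracks the mean
   would therefore miss the spread between 20 ms and 100 ms.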

269	3.1.3.  Resilience

271	   The use of multiple subflows simultaneously means that, if one should
272	   fail, all traffic will move to the remaining subflow(s), and
273	   additionally any lost packets can be retransmitted on these subflows.

275	   Subflow failure may be caused by issues within the network, which an
276	   application would be unaware of, or interface failure on the node.
277	   An application may, under certain circumstances, be in a position to
278	   be aware of such failure (e.g. by radio signal strength, or simply an
279	   interface enabled flag), and so must not make assumptions of an MPTCP
280	   flow's stability based on this.  An MPTCP implementation must never
281	   override an application's request for a given interface, however, so
282	   the cases where this issue may be applicable are limited.

284	3.2.  Potential Problems

286	3.2.1.  Impact of Middleboxes

288	   MPTCP has been designed in order to pass through the majority of
289	   middleboxes.  Empirical evidence suggests that new TCP options can
290	   successfully be used on most paths in the Internet.  Nevertheless
291	   some middleboxes may still refuse to pass MPTCP messages due to the
292	   presence of TCP options, or they may strip TCP options.  If this is
293	   the case, MPTCP should fall back to regular TCP.  Although this will
294	   not create a problem for the application (its communication will be
295	   set up either way), there may be additional (and indeed, user-
296	   perceivable) delay while the first handshake fails.  Therefore, an
297	   alternative approach could be to try both MPTCP and regular TCP
298	   connection attempts at the same time, and respond to whichever
299	   replies first (or apply a timeout on the MPTCP attempt, while having
300	   TCP SYN/ACK ready to reply to, thus reducing the setup delay by a
301	   RTT) in a similar fashion to the "Happy Eyeballs" proposal for IPv6
302	   [17].

304	   An MPTCP implementation can learn the rate of MPTCP connection
305	   attempt successes or failures to particular hosts or networks, and on
306	   particular interfaces, and could therefore learn heuristics of when
307	   and when not to use MPTCP.  A detailed discussion of the various
308	   fallback mechanisms, for failures occurring at different points in
309	   the connection, is presented in [5].
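
   This document does not specify such a heuristic.  Purely as a
   sketch of what "learning" could mean, the following C fragment
   keeps a per-destination success counter.  All names
   (mptcp_history, mptcp_history_record, mptcp_history_should_try),
   the 50% threshold, and the minimum number of observations are
   assumptions made for illustration only.

```c
/* Hypothetical per-destination record kept by an MPTCP stack. */
struct mptcp_history {
    unsigned attempts;   /* MPTCP connection attempts made */
    unsigned successes;  /* attempts that did not fall back to TCP */
};

/* Records the outcome of one MPTCP connection attempt. */
void mptcp_history_record(struct mptcp_history *h, int success)
{
    h->attempts++;
    if (success)
        h->successes++;
}

/*
 * Returns 1 if MPTCP should be tried for this destination.
 * With fewer than min_attempts observations MPTCP is always tried;
 * afterwards it is tried only if at least half of the attempts
 * succeeded (an arbitrary threshold chosen for this sketch).
 */
int mptcp_history_should_try(const struct mptcp_history *h,
                             unsigned min_attempts)
{
    if (h->attempts < min_attempts)
        return 1;
    return 2 * h->successes >= h->attempts;
}
```

   A real implementation would probably also age out old
   observations, since middleboxes on a path can change over time.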

311	   There may also be middleboxes that transparently change the length of
312	   content.  If such middleboxes are present, MPTCP's reassembly of the
313	   byte stream in the receiver is difficult.  Still, MPTCP can detect
314	   such middleboxes and then fall back to regular TCP.  An overview of
315	   the impact of middleboxes is presented in [4] and MPTCP's mechanisms
316	   to work around these are presented and discussed in [5].

318	   MPTCP can also have other unexpected implications.  For instance,
319	   intrusion detection systems could be triggered.  A full analysis of
320	   MPTCP's impact on such middleboxes is for further study after
321	   deployment experiments.

323	3.2.2.  Outdated Implicit Assumptions

325	   In regular TCP, there is a one-to-one mapping of the socket interface
326	   to a flow through a network.  Since MPTCP can make use of multiple
327	   subflows, applications cannot implicitly rely on this one-to-one
328	   mapping any more.  Applications that require the transport along a
329	   single path can disable the use of MPTCP as described later in this
330	   document.  Examples include monitoring tools that want to measure the
331	   available bandwidth on a path, or routing protocols such as BGP that
332	   require the use of a specific link.

334	   Furthermore, an implementation may choose to persist an MPTCP
335	   connection even if an IP address is not allocated any more to a host,
336	   depending on the policy concerning the first subflow (fate-sharing,
337	   see Section 4.2.2).  In this case, the IP address exposed to an
338	   MPTCP-unaware application can differ from the addresses actually
339	   being used by MPTCP.  It is even possible that an IP address gets
340	   assigned to another host during the lifetime of an MPTCP connection.

342	3.2.3.  Security Implications

344	   The support for multiple IP addresses within one MPTCP connection can
345	   result in additional security vulnerabilities, such as possibilities
346	   for attackers to hijack connections.  The protocol design of MPTCP
347	   minimizes this risk.  An attacker on one of the paths can cause harm,
348	   but this is hardly an additional security risk compared to single-
349	   path TCP, which is vulnerable to man-in-the-middle attacks, too.  A
350	   detailed threat analysis of MPTCP is published in [6].

352	4.  Operation of MPTCP with Legacy Applications

354	4.1.  Overview of the MPTCP Network Stack

356	   MPTCP is an extension of TCP, but it is designed to be backward
357	   compatible for legacy applications.  TCP interacts with other parts
358	   of the network stack by different interfaces.  The de facto standard
359	   API between TCP and applications is the sockets interface.  The
360	   position of MPTCP in the protocol stack is illustrated in
361	   Figure 1.

363	                      +-------------------------------+
364	                      |           Application         |
365	                      +-------------------------------+
366	                             ^                 |
367	                  ~~~~~~~~~~~|~Socket Interface|~~~~~~~~~~~
368	                             |                 v
369	                     +-------------------------------+
370	                     |             MPTCP             |
371	                     + - - - - - - - + - - - - - - - +
372	                     | Subflow (TCP) | Subflow (TCP) |
373	                     +-------------------------------+
374	                     |       IP      |      IP       |
375	                     +-------------------------------+

377	                      Figure 1: MPTCP protocol stack

379	   In general, MPTCP can affect all interfaces that make assumptions
380	   about the coupling of a TCP connection to a single IP address and TCP
381	   port pair, to one sockets endpoint, to one network interface, or to a
382	   given path through the network.

384	   This means that there are two classes of applications:

386	   o  Legacy applications: These applications are unaware of MPTCP and
387	      use the existing API towards TCP without any changes.  This is the
388	      default case.

390	   o  MPTCP-aware applications: These applications indicate support for
391	      an enhanced MPTCP interface.  This document specifies a minimum
392	      set of API extensions for such applications.

394	   The following text discusses to what extent MPTCP affects legacy
395	   applications using the existing sockets API.  The existing
396	   sockets API implies that applications deal with data structures that
397	   store, amongst others, the IP addresses and TCP port numbers of a TCP
398	   connection.  A design objective of MPTCP is that legacy applications
399	   can continue to use the established sockets API without any changes.
400	   However, in MPTCP there is a one-to-many mapping between the socket
401	   endpoint and the subflows.  This has several subtle implications for
402	   legacy applications using sockets API functions.

404	4.2.  Address Issues

406	4.2.1.  Specification of Addresses by Applications

408	   During binding, an application can either select a specific address,
409	   or bind to INADDR_ANY.  Furthermore, on some systems other socket
410	   options (e.g., SO_BINDTODEVICE) can be used to bind to a specific
411	   interface.  If an application uses a specific address or binds to a
412	   specific interface, then MPTCP MUST respect this and not interfere in
413	   the application's choices.  The binding to a specific address or
414	   interface implies that the application is not aware of MPTCP and will
415	   disable the use of MPTCP on this connection.  An application that
416	   wishes to bind to a specific set of addresses with MPTCP must use
417	   multipath-aware calls to achieve this (as described in
418	   Section 5.3.3).

420	   If an application binds to INADDR_ANY, it is assumed that the
421	   application does not care which addresses to use locally.  In this
422	   case, a local policy MAY allow MPTCP to automatically set up multiple
423	   subflows on such a connection.
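
   The two cases above can be sketched in C for IPv4 as follows.  The
   helper wildcard_addr() shows the application side, using only
   standard sockets definitions.  The function
   legacy_bind_permits_mptcp() is a hypothetical model of the
   stack-side decision and is not an existing interface; the
   local_policy_allows parameter stands for the local policy mentioned
   above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Application side: builds an IPv4 wildcard address for bind(). */
struct sockaddr_in wildcard_addr(unsigned short port)
{
    struct sockaddr_in a;

    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_port = htons(port);
    a.sin_addr.s_addr = htonl(INADDR_ANY);
    return a;
}

/* Returns 1 if the IPv4 bind address is the wildcard INADDR_ANY. */
int is_wildcard_bind(const struct sockaddr_in *addr)
{
    return addr->sin_addr.s_addr == htonl(INADDR_ANY);
}

/*
 * Hypothetical stack-side check for a legacy application: MPTCP may
 * be used only if the application bound to INADDR_ANY and the local
 * policy allows it.  A bind to a specific address disables MPTCP.
 */
int legacy_bind_permits_mptcp(const struct sockaddr_in *addr,
                              int local_policy_allows)
{
    return is_wildcard_bind(addr) && local_policy_allows;
}
```

   An application would pass the result of wildcard_addr() to bind()
   as usual.  Binding to a specific interface, for example by
   SO_BINDTODEVICE, is not modelled in this sketch.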

425	   The basic sockets API for MPTCP-aware applications allows them to
426	   express further preferences in an MPTCP-compatible way (e.g., to bind
427	   to a subset of interfaces only).

429	4.2.2.  Querying of Addresses by Applications

431	   Applications can use the getpeername() or getsockname() functions in
432	   order to retrieve the IP address of the peer or of the local socket.
433	   These functions can be used for various purposes, including security
434	   mechanisms, geo-location, or interface checks.  The socket API was
435	   designed with an assumption that a socket is using just one address,
436	   and since this address is visible to the application, the application
437	   may assume that the information provided by the functions is the same
438	   during the lifetime of a connection.  However, in MPTCP, unlike in
439	   TCP, there is a one-to-many mapping of a connection to subflows, and
440	   subflows can be added and removed while the connection continues to
441	   exist.  Therefore, MPTCP cannot expose addresses by getpeername() or
442	   getsockname() that are both valid and constant during the
443	   connection's lifetime.

445	   This problem is addressed as follows: If used by a legacy
446	   application, the MPTCP stack MUST always return the addresses of the
447	   first subflow of an MPTCP connection, in all circumstances, even if
448	   that particular subflow is no longer in use.

450	   As this address may not be valid any more if the first subflow is
451	   closed, the MPTCP stack MAY close the whole MPTCP connection if the
452	   first subflow is closed (i.e. fate sharing between the initial
453	   subflow and the MPTCP connection as a whole).  Whether to close the
454	   whole MPTCP connection by default SHOULD be controlled by a local
455	   policy.  Further experiments are needed to investigate its
456	   implications.
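
   The address rule and the optional fate sharing can be modelled by
   the following C sketch.  It is a simplified, hypothetical model of
   stack-internal state, not an interface defined by this document;
   the types subflow and mptcp_conn, the functions legacy_local_addr()
   and close_subflow(), and the limit MAX_SUBFLOWS are assumptions
   made for illustration.

```c
#include <stddef.h>

#define MAX_SUBFLOWS 8

/* Hypothetical, simplified model of one subflow. */
struct subflow {
    const char *local_addr;  /* textual address, illustration only */
    int open;                /* 1 while the subflow is usable */
};

/* Hypothetical, simplified model of an MPTCP connection. */
struct mptcp_conn {
    struct subflow subflows[MAX_SUBFLOWS];  /* [0] is the first subflow */
    size_t count;            /* number of entries in use */
    int fate_sharing;        /* local policy: close all with the first */
    int open;                /* 1 while the MPTCP connection exists */
};

/* Address reported to a legacy application: always the first subflow's,
 * even if that subflow is no longer in use. */
const char *legacy_local_addr(const struct mptcp_conn *c)
{
    return c->subflows[0].local_addr;
}

/* Marks subflow idx as closed and applies the fate-sharing policy. */
void close_subflow(struct mptcp_conn *c, size_t idx)
{
    c->subflows[idx].open = 0;
    if (idx == 0 && c->fate_sharing)
        c->open = 0;
}
```

   Without fate sharing, the connection in this model survives the
   loss of its first subflow and keeps reporting that subflow's
   address.  With fate sharing, closing the first subflow closes the
   whole connection.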

458	   The functions getpeername() and getsockname() SHOULD also always
459	   return the addresses of the first subflow if the socket is used by an
460	   MPTCP-aware application, in order to be consistent with MPTCP-unaware
461	   applications, and, e.g., also with SCTP.  Instead of getpeername()
462	   or getsockname(), MPTCP-aware applications can use new API calls,
463	   documented later, in order to retrieve the full list of address pairs
464	   for the subflows in use.

466	4.3.  Socket Option Issues

468	4.3.1.  General Guideline

470	   The existing sockets API includes options that modify the behavior of
471	   sockets and their underlying communications protocols.  Various
472	   socket options exist at the socket, TCP, and IP levels.  The value of
473	   an option can usually be set by the setsockopt() system function.
474	   The getsockopt() function gets information.  In general, the existing
475	   sockets interface functions cannot configure each MPTCP subflow
476	   individually.  In order to be backward compatible, existing APIs
477	   therefore SHOULD apply to all subflows within one connection, as far
478	   as possible.

480	4.3.2.  Disabling of the Nagle Algorithm

482	   One commonly used TCP socket option (TCP_NODELAY) disables the Nagle
483	   algorithm as described in [2].  This option is also specified in the
484	   POSIX standard [8].  Applications can use this option in combination
485	   with MPTCP exactly in the same way.  It then SHOULD disable the Nagle
486	   algorithm for the MPTCP connection, i.e., all subflows.

488	   In addition, the MPTCP protocol instance MAY use a different path
489	   scheduler algorithm if TCP_NODELAY is present.  For instance, it
490	   could use an algorithm that is optimized for latency-sensitive
491	   traffic.  Specific algorithms are outside the scope of this document.
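
   From the application's point of view, the call is the same as with
   regular TCP.  A minimal C sketch follows; setsockopt(),
   getsockopt(), IPPROTO_TCP, and TCP_NODELAY are standard sockets
   definitions, while the helper names disable_nagle() and
   nagle_disabled() are chosen here for illustration.  Whether the
   option reaches all subflows, and whether it influences the path
   scheduler, depends on the MPTCP implementation.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disables the Nagle algorithm on a TCP socket; returns 0 on success. */
int disable_nagle(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Returns 1 if TCP_NODELAY is set, 0 if it is not, -1 on error. */
int nagle_disabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof value;

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0)
        return -1;
    return value != 0;
}
```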

493	4.3.3.  Buffer Sizing

495	   Applications can explicitly configure send and receive buffer sizes
496	   by the sockets API (SO_SNDBUF, SO_RCVBUF).  These socket options can
497	   also be used in combination with MPTCP and then affect the buffer
498	   size of the MPTCP connection.  However, when defining buffer sizes,
499	   application programmers should take into account that the transport
500	   over several subflows requires a certain amount of buffer for
501	   resequencing in the receiver.  MPTCP may also require more storage
502	   space in the sender, in particular, if retransmissions are sent over
503	   more than one path.  In addition, very small send buffers may prevent
504	   MPTCP from efficiently scheduling data over different subflows.
505	   Therefore, it does not make sense to use MPTCP in combination with
506	   small send or receive buffers.

508	   An MPTCP implementation MAY set a lower bound for send and receive
509	   buffers and treat a small buffer size request as an implicit request
510	   not to use MPTCP.
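
   An application that has to set a buffer size explicitly can at
   least avoid shrinking the system default.  The following C sketch
   raises SO_RCVBUF only if the current value is below a minimum.
   The helper name ensure_min_rcvbuf() and the choice of a minimum
   are assumptions; this document does not define a concrete lower
   bound.

```c
#include <sys/socket.h>

/*
 * Raises SO_RCVBUF to at least min_bytes if the current value is
 * smaller.  Returns the buffer size reported by the system after the
 * call, or -1 on error.  The reported size may differ from the
 * requested one, because systems may round or cap the value.
 */
int ensure_min_rcvbuf(int fd, int min_bytes)
{
    int cur = 0;
    socklen_t len = sizeof cur;

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &cur, &len) != 0)
        return -1;
    if (cur >= min_bytes)
        return cur;   /* default is already large enough: leave it */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &min_bytes, sizeof min_bytes) != 0)
        return -1;
    len = sizeof cur;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &cur, &len) != 0)
        return -1;
    return cur;
}
```

   Leaving the default untouched when it is large enough is
   deliberate: on some systems an explicit setting disables automatic
   buffer tuning.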

512	4.3.4.  Other Socket Options

514	   Some network stacks also provide other implementation-specific socket
515	   options or interfaces that affect TCP's behavior.  If a network stack
516	   supports MPTCP, it must be ensured that these options do not
517	   interfere.

519	4.4.  Default Enabling of MPTCP

521	   It is up to a local policy at the end system whether a network stack
522	   should automatically enable MPTCP for sockets even if there is no
523	   explicit sign of MPTCP awareness of the corresponding application.
524	   Such a choice may be under the control of the user through system
525	   preferences.

527	   The enabling of MPTCP, either by application or by system defaults,
528	   does not necessarily mean that MPTCP will always be used.  Both
529	   endpoints must support MPTCP, and there must be multiple addresses at
530	   at least one endpoint, for MPTCP to be used.  Even if those
531	   requirements are met, however, MPTCP may not be immediately used on a
532	   connection.  It may make sense for multiple paths to be brought into
533	   operation only after a given period of time, or if the connection is
534	   saturated.

536	4.5.  Summary of Advices to Application Developers

538	   o  Using the default MPTCP configuration: Like TCP, MPTCP is designed
539	      to be efficient and robust in the default configuration.
540	      Application developers should not explicitly configure TCP (or
541	      MPTCP) features unless this is really needed.

543	   o  Socket buffer dimensioning: Multipath transport requires larger
544	      buffers in the receiver for resequencing, as already explained.
545	      Applications should use reasonable buffer sizes (such as the
546	      operating system default values) in order to fully benefit from
547	      MPTCP.  A full discussion of buffer sizing issues is given in [5].

549	   o  Facilitating stack-internal heuristics: The path management and
550	      data scheduling by MPTCP is realized by stack-internal algorithms
551	      that may implicitly try to self-optimize their behavior according
552	      to assumed application needs.  For instance, an MPTCP
553	      implementation may use heuristics to determine whether an
554	      application requires delay-sensitive or bulk data transport, using
555	      for instance port numbers, the TCP_NODELAY socket option, or the
556	      application's read/write patterns as input parameters.  An
557	      application developer can facilitate the operation of such
558	      heuristics by avoiding atypical interface use cases.  For
559	      instance, for long bulk data transfers, it makes sense neither to
560	      enable the TCP_NODELAY socket option nor to issue many consecutive
561	      socket "send()" calls that each carry only a small amount of
562	      data.
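
   For the last item, the following Python sketch shows one way an
   application could coalesce many small writes into fewer, larger
   "send()" calls during a bulk transfer.  The helper name and the
   batch size are assumptions chosen for illustration; they are not
   recommendations of this document.

```python
import socket


def send_coalesced(sock, chunks, batch_bytes=64 * 1024):
    """Send an iterable of small byte strings using few large writes.

    Returns the number of sendall() calls that were issued.
    """
    pending = bytearray()
    calls = 0
    for chunk in chunks:
        pending += chunk
        if len(pending) >= batch_bytes:
            sock.sendall(pending)
            calls += 1
            pending.clear()
    if pending:
        # Flush whatever is left at the end of the transfer.
        sock.sendall(pending)
        calls += 1
    return calls
```

   TCP_NODELAY is left untouched in this sketch, in line with the
   advice to keep the default configuration for bulk transfers.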

564	5.  Basic API for MPTCP-aware Applications

566	5.1.  Design Considerations

568	   While applications can use MPTCP with the unmodified sockets API,
569	   multipath transport results in many degrees of freedom.  MPTCP
570	   manages the data transport over different subflows automatically.  By
571	   default, this is transparent to the application, but an application
572	   could use an additional API to interface with the MPTCP layer and to
573	   control important aspects of the MPTCP implementation's behaviour.

575	   This document describes a basic MPTCP API.  The API contains a
576	   minimum set of functions that provide an equivalent level of control
577	   and information as exists for regular TCP.  It maintains backward
578	   compatibility with legacy applications.

580	   An advanced MPTCP API is outside the scope of this document.  The
581	   basic API does not allow a sender or a receiver to express
582	   preferences about the management of paths or the scheduling of data,
583	   even if this can have a significant performance impact and if an
584	   MPTCP implementation could benefit from additional guidance by
585	   applications.  A list of potential further API extensions is provided
586	   in the appendix.  The specification of such an advanced API is for
587	   further study and may partly be implementation-specific.

589	   MPTCP mainly affects the sending of data.  Therefore, the basic API
590	   only affects the sender side of a data transfer.  A receiver may also
591	   have preferences about data transfer choices, and it may have
592	   performance requirements, too.  Yet, the configuration of such
593	   preferences is outside of the scope of the basic API.

597	5.2.  Requirements on the Basic MPTCP API

599	   Because of the importance of the sockets interface, there are several
600	   fundamental design objectives for the basic interface between MPTCP
601	   and applications:

603	   o  Consistency with existing sockets APIs must be maintained as far
604	      as possible.  In order to support the large base of applications
605	      using the original API, a legacy application must be able to
606	      continue to use standard socket interface functions when run on a
607	      system supporting MPTCP.  Also, MPTCP-aware applications should be
608	      able to access the socket without any major changes.

610	   o  Sockets API extensions must be minimized and independent of an
611	      implementation.

613	   o  The interface should handle both IPv4 and IPv6.

615	   The following is a list of the core requirements for the basic API:

617	   REQ1:  Turn on/off MPTCP: An application should be able to request to
618	          turn on or turn off the usage of MPTCP.  This means that an
619	          application should be able to explicitly request the use of
620	          MPTCP if this is possible.  Applications should also be able
621	          to request not to enable MPTCP and to use regular TCP
622	          transport instead.  This can be implicit in many cases, since
623	          MPTCP must be disabled when an application binds to a specific
624	          address.  MPTCP may also be enabled if an application uses a
625	          dedicated multipath address family (such as AF_MULTIPATH,
626	          [9]).

628	   REQ2:  An application should be able to restrict MPTCP to binding to
629	          a given set of addresses.

631	   REQ3:  An application should be able to obtain information on the
632	          addresses used by the MPTCP subflows.

634	   REQ4:  An application should be able to extract a unique identifier
635	          for the connection (per endpoint).

637	   The first requirement is the most important one, since some
638	   applications could benefit a lot from MPTCP, but there are also cases
639	   in which it hardly makes sense.  The existing sockets API provides
640	   similar mechanisms to enable or disable advanced TCP features.  The
641	   second requirement corresponds to the binding of addresses with the
642	   bind() socket call, or, e.g., explicit device bindings with a
643	   SO_BINDTODEVICE option.  The third requirement ensures that there is
644	   an equivalent to getpeername() or getsockname() that is able to deal
645	   with more than one subflow.  Finally, it should be possible for the
646	   application to retrieve a unique connection identifier (local to the
647	   endpoint on which it is running) for the MPTCP connection.  This is
648	   equivalent to using the (address, port) pair for a connection
649	   identifier in single-path TCP, which is no longer static in MPTCP.

651	   An application can continue to use getpeername() or getsockname() in
652	   addition to the basic MPTCP API.  In that case, both functions return
653	   the corresponding addresses of the first subflow, as already
654	   explained.

656	5.3.  Sockets Interface Extensions by the Basic MPTCP API

658	5.3.1.  Overview

660	   The abstract, basic MPTCP API consists of a set of new values that
661	   are associated with an MPTCP socket.  Such values may be used for
662	   changing properties of an MPTCP connection, or retrieving
663	   information.  These values could be accessed by new symbols on
664	   existing calls such as setsockopt() and getsockopt(), or could be
665	   implemented as entirely new function calls.  This implementation
666	   decision is out of scope for this document.  The following list
667	   presents symbolic names for these MPTCP socket settings.

669	   o  TCP_MULTIPATH_ENABLE: Enable/disable MPTCP

671	   o  TCP_MULTIPATH_ADD: Bind MPTCP to a set of given local addresses,
672	      or add a new local address to an existing MPTCP connection

674	   o  TCP_MULTIPATH_REMOVE: Remove a local address from an MPTCP
675	      connection

677	   o  TCP_MULTIPATH_SUBFLOWS: Get the pairs of addresses currently used
678	      by the MPTCP subflows

680	   o  TCP_MULTIPATH_CONNID: Get the local connection identifier for this
681	      MPTCP connection

683	   Table 1 shows a list of the abstract socket operations for the
684	   basic configuration of MPTCP.  The first column gives the symbolic
685	   name of the operation.  The second and third columns indicate whether
686	   the operation provides values to be read ("Get") or takes values to
687	   configure ("Set").  The fourth column lists the type of data
688	   associated with this operation.

690	    +------------------------+-----+-----+----------------------------+
691	    | Name                   | Get | Set |          Data type         |
692	    +------------------------+-----+-----+----------------------------+
693	    | TCP_MULTIPATH_ENABLE   |  o  |  o  |           boolean          |
694	    | TCP_MULTIPATH_ADD      |     |  o  |      list of addresses     |
695	    | TCP_MULTIPATH_REMOVE   |     |  o  |      list of addresses     |
696	    | TCP_MULTIPATH_SUBFLOWS |  o  |     | list of pairs of addresses |
697	    | TCP_MULTIPATH_CONNID   |  o  |     |       32-bit integer       |
698	    +------------------------+-----+-----+----------------------------+

700	                     Table 1: MPTCP Socket Operations
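
   Table 1 can be captured directly as a lookup structure.  The Python
   sketch below is one possible encoding; the type and function names
   are assumptions for illustration, and only the symbolic operation
   names and the Get/Set/data type columns come from the table.

```python
from typing import NamedTuple


class OpInfo(NamedTuple):
    readable: bool    # "Get" column
    settable: bool    # "Set" column
    data_type: str


MPTCP_OPERATIONS = {
    "TCP_MULTIPATH_ENABLE": OpInfo(True, True, "boolean"),
    "TCP_MULTIPATH_ADD": OpInfo(False, True, "list of addresses"),
    "TCP_MULTIPATH_REMOVE": OpInfo(False, True, "list of addresses"),
    "TCP_MULTIPATH_SUBFLOWS": OpInfo(True, False,
                                     "list of pairs of addresses"),
    "TCP_MULTIPATH_CONNID": OpInfo(True, False, "32-bit integer"),
}


def check_access(name, mode):
    """Raise ValueError unless Table 1 permits mode ("get" or "set")."""
    if mode not in ("get", "set"):
        raise ValueError(f"unknown access mode: {mode}")
    info = MPTCP_OPERATIONS[name]
    allowed = info.readable if mode == "get" else info.settable
    if not allowed:
        raise ValueError(f"{name} does not support {mode}")
```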

702	   The use of these new socket operations is subject to restrictions:

704	   o  TCP_MULTIPATH_ENABLE: This value SHOULD only be set before the
705	      establishment of a TCP connection.  Its value SHOULD only be read
706	      after the establishment of a connection.

708	   o  TCP_MULTIPATH_ADD: This operation can be applied either before
709	      connection setup or during a connection.  If used before, it
710	      controls the local addresses that an MPTCP connection can use.  In
711	      the latter case, it allows MPTCP to use an additional local
712	      address, if there has been a restriction before connection setup.

714	   o  TCP_MULTIPATH_REMOVE: This operation can be applied either before
715	      connection setup or during a connection.  In both cases, it
716	      removes an address from the list of local addresses that may be
717	      used by subflows.

719	   o  TCP_MULTIPATH_SUBFLOWS: This value is read-only and SHOULD only be
720	      used after connection setup.

722	   o  TCP_MULTIPATH_CONNID: This value is read-only and SHOULD only be
723	      used after connection setup.
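
   The timing restrictions listed above can be summarized in a small
   table-driven check.  The following Python sketch is illustrative
   only; the names are assumptions, and because the restrictions are
   expressed with SHOULD, a real implementation may choose to be more
   permissive than this model.

```python
# For each (operation, access mode): is the call appropriate
# before and/or after connection establishment?  (before, after)
TIMING = {
    ("TCP_MULTIPATH_ENABLE", "set"): (True, False),
    ("TCP_MULTIPATH_ENABLE", "get"): (False, True),
    ("TCP_MULTIPATH_ADD", "set"): (True, True),
    ("TCP_MULTIPATH_REMOVE", "set"): (True, True),
    ("TCP_MULTIPATH_SUBFLOWS", "get"): (False, True),
    ("TCP_MULTIPATH_CONNID", "get"): (False, True),
}


def timing_ok(name, mode, connected):
    """Return True if the operation fits the current connection state."""
    before, after = TIMING[(name, mode)]
    return after if connected else before
```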

725	5.3.2.  Enabling and Disabling of MPTCP

727	   An application can explicitly indicate multipath capability by
728	   setting TCP_MULTIPATH_ENABLE to a value larger than 0.  In this case,
729	   the MPTCP implementation SHOULD try to negotiate MPTCP for that
730	   connection.  Note that multipath transport will not necessarily be
731	   enabled, as it requires multiple addresses and support in the other
732	   end-system and potentially also on middleboxes.

734	   An application can disable MPTCP by setting TCP_MULTIPATH_ENABLE to a
735	   value of 0.  In that case, MPTCP MUST NOT be used on that connection.

737	   After connection establishment, an application can get the value of
738	   TCP_MULTIPATH_ENABLE.  A value of 0 then means lack of MPTCP support.
739	   Any value equal to or larger than 1 means that MPTCP is supported.
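
   The semantics of setting and reading TCP_MULTIPATH_ENABLE can be
   illustrated with a toy model.  The Python class below is not a real
   socket; its name, its methods, and the "default_policy" parameter
   (standing in for the local system default) are assumptions for
   illustration.  The model also ignores the requirement for multiple
   addresses and the possible influence of middleboxes.

```python
class EnableModel:
    """Toy model of TCP_MULTIPATH_ENABLE; not a real socket."""

    def __init__(self):
        self.requested = None    # None: no explicit preference
        self.connected = False
        self.in_use = False

    def set_enable(self, value):
        # The value SHOULD only be set before connection establishment;
        # this model chooses to reject later attempts.
        if self.connected:
            raise RuntimeError("set before connecting")
        self.requested = int(value)

    def connect(self, peer_supports_mptcp, default_policy=True):
        if self.requested is None:
            wanted = default_policy
        else:
            # 0 means MPTCP MUST NOT be used; > 0 means try MPTCP.
            wanted = self.requested > 0
        self.in_use = bool(wanted and peer_supports_mptcp)
        self.connected = True

    def get_enable(self):
        """Value read after establishment: 0 means no MPTCP."""
        return 1 if self.in_use else 0
```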

741	   As an alternative to setting an explicit value, an application could
742	   also use a new, separate address family called AF_MULTIPATH [9].
743	   This separate address family can be used to exchange multiple
744	   addresses between an application and the standard sockets API, and
745	   additionally acts as an explicit indication that an application is
746	   MPTCP-aware, i.e., that it can deal with the semantic changes of the
747	   sockets API, in particular concerning getpeername() and
748	   getsockname().  The usage of AF_MULTIPATH is also more flexible with
749	   respect to multipath transport, since it can use IPv4, IPv6, or both
750	   in parallel [9].

752	5.3.3.  Binding MPTCP to Specified Addresses

754	   Before connection establishment, an application can use the
755	   TCP_MULTIPATH_ADD socket option to indicate a set of local IP
756	   addresses that MPTCP may bind to.  The parameter of the function is a
757	   list of addresses in a corresponding data structure.  By extension,
758	   this operation will also control the list of addresses that can be
759	   advertised to the peer via MPTCP signalling.

761	   An application MAY also indicate a TCP port number that MPTCP should
762	   bind to for a given address.  The port number MAY be different to the
763	   one used by existing subflows.  If no port number is provided by the
764	   application, the port number is automatically selected by the MPTCP
765	   implementation, and will usually be the same across all subflows.

767	   This operation can also be used to modify the address list in use
768	   during the lifetime of an MPTCP connection.  In this case, it is used
769	   to indicate a set of additional local addresses that the MPTCP
770	   connection can make use of, and which can be signalled to the peer.
771	   It should be noted that this signal is only a hint, and an MPTCP
772	   implementation MAY only use a subset of the addresses.

774	   The TCP_MULTIPATH_REMOVE operation can be used to remove a (set of)
775	   local addresses from an MPTCP connection.  MPTCP MUST close any
776	   corresponding subflows (i.e. those using the local address that is no
777	   longer present), and signal the removal of the address to the peer.
778	   If alternative paths are available using the supplied address list
779	   but MPTCP is not currently using them, an MPTCP implementation SHOULD
780	   establish alternative subflows before undertaking the address
781	   removal.

783	   It should be remembered that these operations SHOULD support both
784	   IPv4 and IPv6 addresses, potentially in the same call.
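
   The effect of the two operations on the local address list can be
   modelled as follows.  This Python sketch is a toy model under stated
   assumptions: the class and method names are hypothetical, port
   numbers are omitted, and the signalling towards the peer is not
   represented.  The addresses in the usage below are documentation
   addresses.

```python
import ipaddress


class LocalAddressSet:
    """Toy model of the address list managed by ADD and REMOVE."""

    def __init__(self):
        self.addresses = set()    # empty: no restriction requested

    def add(self, addresses):
        """Model TCP_MULTIPATH_ADD; IPv4 and IPv6 may be mixed."""
        self.addresses |= {ipaddress.ip_address(a) for a in addresses}

    def remove(self, addresses, subflows):
        """Model TCP_MULTIPATH_REMOVE.

        subflows is a list of (local, remote) address strings.  The
        return value lists the subflows that survive; those using a
        removed local address would be closed by the stack.
        """
        gone = {ipaddress.ip_address(a) for a in addresses}
        self.addresses -= gone
        return [(local, remote) for (local, remote) in subflows
                if ipaddress.ip_address(local) not in gone]
```

   For example, after add(["192.0.2.1", "2001:db8::1"]), removing
   "192.0.2.1" leaves only the subflows whose local address is
   2001:db8::1.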

786	5.3.4.  Querying the MPTCP Subflow Addresses

788	   An application can get a list of the addresses used by the currently
789	   established subflows by means of the read-only TCP_MULTIPATH_SUBFLOWS
790	   operation.  The return value is a list of pairs of tuples of IP
791	   address and TCP port number.  In one pair, the first tuple refers to
792	   the local IP address and the local TCP port, and the second one to
793	   the remote IP address and remote TCP port used by the subflow.  The
794	   list MUST only include established subflows.  Both addresses in each
795	   pair MUST be of the same address family (either IPv4 or IPv6).
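
   One possible in-memory representation of the returned list, together
   with a consistency check, is sketched below in Python.  The type and
   function names are assumptions; this document only specifies that
   the result is a list of pairs of (IP address, TCP port) tuples.

```python
import ipaddress
from typing import NamedTuple, Tuple

Endpoint = Tuple[str, int]    # (IP address, TCP port)


class Subflow(NamedTuple):
    local: Endpoint
    remote: Endpoint


def check_subflows(subflows):
    """Raise ValueError on mixed address families or invalid ports."""
    for flow in subflows:
        (local_ip, local_port), (remote_ip, remote_port) = flow
        local_version = ipaddress.ip_address(local_ip).version
        remote_version = ipaddress.ip_address(remote_ip).version
        if local_version != remote_version:
            raise ValueError(f"mixed address families in {flow}")
        for port in (local_port, remote_port):
            if not 0 < port <= 65535:
                raise ValueError(f"invalid TCP port in {flow}")
```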

797	5.3.5.  Getting a Unique Connection Identifier

799	   An application that wants a unique identifier for the connection,
800	   analogous to an (address, port) pair in regular TCP, can query the
801	   TCP_MULTIPATH_CONNID value to get a local connection identifier for
802	   the MPTCP connection.

804	   This is a 32-bit number, and SHOULD be the same as the local
805	   connection identifier sent in the MPTCP handshake.
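
   An application might use the identifier as a key for per-connection
   state, since the (address, port) pairs of the subflows can change
   over time.  The following Python sketch assumes a hypothetical
   session table; only the 32-bit range of the identifier is taken from
   this document.

```python
class SessionTable:
    """Per-connection application state keyed by connection ID."""

    def __init__(self):
        self._by_connid = {}

    def register(self, connid, state):
        # The identifier is a 32-bit number.
        if not 0 <= connid <= 0xFFFFFFFF:
            raise ValueError("connection identifier must fit in 32 bits")
        self._by_connid[connid] = state

    def lookup(self, connid):
        """Return the stored state, or None for an unknown identifier."""
        return self._by_connid.get(connid)
```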

807	6.  Other Compatibility Issues

809	6.1.  Usage of the SCTP Socket API

811	   For dealing with multi-homing, several socket API extensions have
812	   been defined for SCTP [14].  As MPTCP realizes multipath transport
813	   from and to multi-homed endsystems, some of these interface function
814	   calls are actually applicable to MPTCP in a similar way.

816	   API developers MAY wish to integrate SCTP and MPTCP calls to provide
817	   a consistent interface to the application.  Yet, it must be
818	   emphasized that the transport service provided by MPTCP is different
819	   to SCTP, and this is why not all SCTP API functions can be mapped
820	   directly to MPTCP.  Furthermore, a network stack implementing MPTCP
821	   does not necessarily support SCTP and its specific socket interface
822	   extensions.  This is why the basic API of MPTCP defines additional
823	   socket options only, which are a backward compatible extension of
824	   TCP's application interface.  An integration with the SCTP API is
825	   outside the scope of the basic API.

827	6.2.  Incompatibilities with other Multihoming Solutions

829	   The use of MPTCP can interact with various related sockets API
830	   extensions.  The use of a multihoming shim layer conflicts with
831	   multipath transport such as MPTCP or SCTP [12].  Care should be taken
832	   not to confuse the MPTCP API with the overlapping features of other
833	   APIs:

835	   o  SHIM API [12]: This API specifies sockets API extensions for the
836	      multihoming shim layer.

838	   o  HIP API [13]: The Host Identity Protocol (HIP) also results in a
839	      new API.

841	   o  API for Mobile IPv6 [11]: For Mobile IPv6, a significantly
842	      extended socket API exists as well.

844	   In order to avoid any conflict, multiaddressed MPTCP SHOULD NOT be
845	   enabled if a network stack uses SHIM6, HIP, or Mobile IPv6.
846	   Furthermore, applications should not try to use both the MPTCP API
847	   and another multihoming or mobility layer API.

849	   It is possible, however, that some of the MPTCP functionality, such
850	   as congestion control, could be used in a SHIM6 or HIP environment.
851	   Such operation is outside the scope of this document.

853	6.3.  Interactions with DNS

855	   In multihomed or multiaddressed environments, there are various
856	   issues that are not specific to MPTCP, but have to be considered,
857	   too.  These problems are summarized in [15].

859	   Specifically, there can be interactions with DNS.  Whilst it is
860	   expected that an application will iterate over the list of addresses
861	   returned from a call such as getaddrinfo(), MPTCP itself MUST NOT
862	   make any assumptions about multiple A or AAAA records from the same
863	   DNS query referring to the same host, as it is possible that multiple
864	   addresses refer to multiple servers for load balancing purposes.
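
   The application-side iteration mentioned above can be sketched in
   Python as follows.  The helper name and the timeout are assumptions
   for illustration.  Each returned address is treated as an
   independent candidate, and nothing in the sketch assumes that
   different addresses belong to the same host.

```python
import socket


def connect_first(host, port, timeout=5.0):
    """Try each address from getaddrinfo() until one connects."""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as exc:
            # For example, an address family the host does not support.
            last_error = exc
            continue
        try:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError("no addresses returned")
```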

866	7.  Security Considerations

868	   Will be added in a later version of this document.

870	8.  IANA Considerations

872	   No IANA considerations.

874	9.  Conclusion

876	   This document discusses MPTCP's application implications and
877	   specifies a basic MPTCP API.  For legacy applications, it is ensured
878	   that the existing sockets API continues to work.  MPTCP-aware
879	   applications can use the basic MPTCP API that provides some control
880	   over the transport layer equivalent to regular TCP.  A more fine-
881	   granular interaction between applications and MPTCP requires an
882	   advanced MPTCP API, which is not specified in this document.

884	10.  Acknowledgments

886	   The authors sincerely thank the following people for their helpful
887	   comments and reviews of the document: Costin Raiciu, Philip Eardley,
888	   Javier Ubillos, and Michael Tuexen.

890	   Michael Scharf is supported by the German-Lab project
891	   (http://www.german-lab.de/) funded by the German Federal Ministry of
892	   Education and Research (BMBF).  Alan Ford was supported by Roke Manor
893	   Research and by Trilogy (http://www.trilogy-project.org/), a research
894	   project (ICT-216372) partially funded by the European Community under
895	   its Seventh Framework Program.  The views expressed here are those of
896	   the author(s) only.  The European Commission is not liable for any
897	   use that may be made of the information in this document.

899	11.  References

901	11.1.  Normative References

903	   [1]   Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
904	         September 1981.

906	   [2]   Braden, R., "Requirements for Internet Hosts - Communication
907	         Layers", STD 3, RFC 1122, October 1989.

909	   [3]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
910	         Levels", BCP 14, RFC 2119, March 1997.

912	   [4]   Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar,
913	         "Architectural Guidelines for Multipath TCP Development",
914	         RFC 6182, March 2011.

916	   [5]   Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, "TCP
917	         Extensions for Multipath Operation with Multiple Addresses",
918	         draft-ietf-mptcp-multiaddressed-04 (work in progress),
919	         July 2011.

921	   [6]   Bagnulo, M., "Threat Analysis for TCP Extensions for Multipath
922	         Operation with Multiple Addresses", RFC 6181, March 2011.

924	   [7]   Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion
925	         Control for Multipath Transport Protocols", RFC 6356,
926	         October 2011.

928	   [8]   "IEEE Std. 1003.1-2008 Standard for Information Technology --
929	         Portable Operating System Interface (POSIX). Open Group
930	         Technical Standard: Base Specifications, Issue 7, 2008.".

932	11.2.  Informative References

934	   [9]   Sarolahti, P., "Multi-address Interface in the Socket API",
935	         draft-sarolahti-mptcp-af-multipath-01 (work in progress),
936	         March 2010.

938	   [10]  Stevens, W., Thomas, M., Nordmark, E., and T. Jinmei, "Advanced
939	         Sockets Application Program Interface (API) for IPv6",
940	         RFC 3542, May 2003.

942	   [11]  Chakrabarti, S. and E. Nordmark, "Extension to Sockets API for
943	         Mobile IPv6", RFC 4584, July 2006.

945	   [12]  Komu, M., Bagnulo, M., Slavov, K., and S. Sugimoto, "Sockets
946	         Application Program Interface (API) for Multihoming Shim",
947	         RFC 6316, July 2011.

949	   [13]  Komu, M. and T. Henderson, "Basic Socket Interface Extensions
950	         for the Host Identity Protocol (HIP)", RFC 6317, July 2011.

952	   [14]  Stewart, R., Tuexen, M., Poon, K., Lei, P., and V. Yasevich,
953	         "Sockets API Extensions for Stream Control Transmission
954	         Protocol (SCTP)", draft-ietf-tsvwg-sctpsocket-32 (work in
955	         progress), October 2011.

957	   [15]  Blanchet, M. and P. Seite, "Multiple Interfaces and
958	         Provisioning Domains Problem Statement",
959	         draft-ietf-mif-problem-statement-15 (work in progress),
960	         May 2011.

962	   [16]  Wasserman, M. and P. Seite, "Current Practices for Multiple
963	         Interface Hosts", draft-ietf-mif-current-practices-12 (work in
964	         progress), July 2011.

966	   [17]  Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with
967	         Dual-Stack Hosts", draft-ietf-v6ops-happy-eyeballs-05 (work in
968	         progress), October 2011.

970	Appendix A.  Requirements on a Future Advanced MPTCP API

972	A.1.  Design Considerations

974	   Multipath transport results in many degrees of freedom.  The basic
975	   MPTCP API only defines a minimum set of the API extensions for the
976	   interface between the MPTCP layer and applications, which does not
977	   offer much control of the MPTCP implementation's behaviour.  A
978	   future, advanced API could address further features of MPTCP and
979	   provide more control.

981	   Applications that use TCP may have different requirements on the
982	   transport layer.  While developers have become used to the
983	   characteristics of regular TCP, new opportunities created by MPTCP
984	   could allow the service provided to be optimised further.  An
985	   advanced API could enable MPTCP-aware applications to specify
986	   preferences and control certain aspects of the behavior, in addition
987	   to the simple control provided by the basic interface.  An advanced
988	   API could also address aspects that are completely out-of-scope of
989	   the basic API, for example, the question whether a receiving
990	   application could influence the sending policy.

992	   Furthermore, an advanced MPTCP API could be part of a new overall
993	   interface between the network stack and applications that addresses
994	   other issues as well, such as the split between identifiers and
995	   locators.  An API that does not use IP addresses (but, instead e.g. a
996	   connectbyname() function) would be useful for numerous purposes,
997	   independent of MPTCP.

999	   This appendix documents a list of potential usage scenarios and
1000	   requirements for the advanced API.  The specification and
1001	   implementation of a corresponding API is outside the scope of this
1002	   document.

1004	A.2.  MPTCP Usage Scenarios and Application Requirements

1006	   There are different MPTCP usage scenarios.  An application that
1007	   wishes to transmit bulk data will want MPTCP to provide a high
1008	   throughput service immediately, through creating and maximising
1009	   utilisation of all available subflows.  This is the default MPTCP use
1010	   case.

1012	   But at the other extreme, there are applications that are highly
1013	   interactive, but require only a small amount of throughput, and these
1014	   are optimally served by low latency and jitter stability.  In such a
1015	   situation, it would be preferable for the traffic to use only the
1016	   lowest latency subflow (assuming it has sufficient capacity), maybe
1017	   with one or two additional subflows for resilience and recovery
1018	   purposes.  The key challenge for such a strategy is that the delay on
1019	   a path may fluctuate significantly and that just always selecting the
1020	   path with the smallest delay might result in instability.

1022	   The choice between bulk data transport and latency-sensitive
1023	   transport affects the scheduler in terms of whether traffic should
1024	   be, by default, sent on one subflow or across several ones.  Even if
1025	   the total bandwidth required is less than that available on an
1026	   individual path, it is desirable to spread this load to reduce stress
1027	   on potential bottlenecks, and this is why this method should be the
1028	   default for bulk data transport.  However, that may not be optimal
1029	   for applications that require latency/jitter stability.

1031	   In the case of the latter option, a further question arises: Should
1032	   additional subflows be used whenever the primary subflow is
1033	   overloaded, or only when the primary path fails (hot-standby)?  In
1034	   other words, is latency stability or bandwidth more important to the
1035	   application?  This results in two different options: Firstly, there
1036	   is the single path which can overflow into an additional subflow; and
1037	   secondly there is single-path with hot-standby, whereby an
1038	   application may want an alternative backup subflow in order to
1039	   improve resilience.  In case that data delivery on the first subflow
1040	   fails, the data transport could immediately be continued on the
1041	   second subflow, which is idle otherwise.

1043	   Yet another complication is introduced with the potential that MPTCP
1044	   introduces for changes in available bandwidth as the number of
1045	   available subflows changes.  Such jitter in bandwidth may prove
1046	   confusing for some applications such as video or audio streaming that
1047	   dynamically adapt codecs based on available bandwidth.  Such
1048	   applications may prefer MPTCP to attempt to provide a consistent
1049	   bandwidth as far as is possible, and avoid maximising the use of all
1050	   subflows.

1052	   A further, mostly orthogonal question is whether data should be
1053	   duplicated over the different subflows, in particular if there is
1054	   spare capacity.  This could improve both the timeliness and
1055	   reliability of data delivery.

1057	   In summary, there are at least three possible performance objectives
1058	   for multipath transport (not necessarily disjoint):

1060	   1.  High bandwidth

1062	   2.  Low latency and jitter stability

1064	   3.  High reliability

1065	   In an advanced API, applications could provide high-level guidance to
1066	   the MPTCP implementation concerning these performance requirements,
1067	   for instance, which is considered to be the most important one.  The
1068	   MPTCP stack would then use internal mechanisms to fulfill this
1069	   abstract indication of a desired service, as far as possible.  This
1070	   would both affect the assignment of data (including retransmissions)
1071	   to existing subflows (e.g., 'use all in parallel', 'use as overflow',
1072	   'hot standby', 'duplicate traffic') as well as the decisions about
1073	   when to set up additional subflows, and to which addresses.  In both
1074	   cases different policies can exist, which can be expected to be
1075	   implementation-specific.
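
   To make the four example policies more concrete, the following
   Python sketch selects the subflows that may carry new data under
   each of them.  It is a deliberately simplified model: the function
   name, the subflow attributes, and the interface names used in the
   example ("wifi", "lte") are assumptions, and real schedulers are
   expected to be implementation-specific.

```python
def eligible_subflows(policy, subflows):
    """Return (names, duplicate) for the given scheduling policy.

    subflows is a list of dicts ordered by preference, each with the
    keys "name", "up" and "saturated".  duplicate is True when every
    segment would be copied onto all returned subflows.
    """
    alive = [s for s in subflows if s["up"]]
    if policy == "use all in parallel":
        return [s["name"] for s in alive], False
    if policy == "duplicate traffic":
        return [s["name"] for s in alive], True
    if policy == "hot standby":
        # Only the most preferred working subflow carries data.
        return [s["name"] for s in alive[:1]], False
    if policy == "use as overflow":
        # Add less preferred subflows only while earlier ones are full.
        chosen = []
        for s in alive:
            chosen.append(s["name"])
            if not s["saturated"]:
                break
        return chosen, False
    raise ValueError(f"unknown policy: {policy}")
```

   With a saturated "wifi" subflow and an idle "lte" subflow, 'use as
   overflow' selects both, whereas 'hot standby' selects only "wifi"
   for as long as it is up.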

1077	   Therefore, an advanced API could provide a mechanism by which
1078	   applications can specify their high-level requirements in an
1079	   implementation-independent way.  One possibility would be to select
1080	   one "application profile" out of a number of choices that
1081	   characterize typical applications.  Yet, as applications today do not
1082	   have to inform TCP about their communication requirements, whether
1083	   such an approach would be realistic requires further study.
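
   One conceivable shape of such a profile selection is sketched below
   in Python.  The profile names are purely hypothetical and are not
   proposed by this document; only the three performance objectives
   are taken from the list above.  The fallback to high bandwidth
   reflects the statement that bulk transfer is the default MPTCP use
   case.

```python
from enum import Enum


class Objective(Enum):
    HIGH_BANDWIDTH = "high bandwidth"
    LOW_LATENCY = "low latency and jitter stability"
    HIGH_RELIABILITY = "high reliability"


# Hypothetical application profiles mapped to their main objective.
PROFILES = {
    "bulk_transfer": Objective.HIGH_BANDWIDTH,
    "interactive": Objective.LOW_LATENCY,
    "critical": Objective.HIGH_RELIABILITY,
}


def objective_for(profile, default=Objective.HIGH_BANDWIDTH):
    """Map a profile name to an objective; unknown names use default."""
    return PROFILES.get(profile, default)
```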

1085	   Of course, independent of an advanced API, such functionality could
1086	   also partly be achieved by MPTCP-internal heuristics that infer some
1087	   application preferences e.g. from existing socket options, such as
1088	   TCP_NODELAY.  Whether this would be reliable, and indeed appropriate,
1089	   is for further study, too.

1091	A.3.  Potential Requirements on an Advanced MPTCP API

1093	   The following is a list of potential requirements for an advanced
1094	   MPTCP API beyond the features of the basic API.  It is included here
1095	   for information only:

1097	   REQ5:   An application should be able to establish MPTCP connections
1098	           without using IP addresses as locators.

1100	   REQ6:   An application should be able to obtain usage information and
1101	           statistics about all subflows (e.g., ratio of traffic sent
1102	           via this subflow).

1104	   REQ7:   An application should be able to request a change in the
1105	           number of subflows in use, thus triggering removal or
1106	           addition of subflows.  An even finer control granularity
1107	           would be a request for the establishment of a new subflow to
1108	           a provided destination, or a request for the termination of a
1109	           specified, existing subflow.

1111	   REQ8:   An application should be able to inform the MPTCP
1112	           implementation about its high-level performance requirements,
1113	           e.g., in form of a profile.

1115	   REQ9:   An application should be able to indicate communication
1116	           characteristics, e.g., the expected amount of data to be
1117	           sent, the expected duration of the connection, or the
1118	           expected rate at which data is provided.  Applications may in
1119	           some cases be able to forecast such properties.  If so, such
1120	           information could be an additional input parameter for
1121	           heuristics inside the MPTCP implementation, which could be
1122	           useful for example to decide when to set up additional
1123	           subflows.

1125	   REQ10:  An application should be able to control the automatic
1126	           establishment/termination of subflows.  This would imply a
1127	           selection among different heuristics of the path manager,
1128	           e.g., 'try as soon as possible', 'wait until there is a bunch
1129	           of data', etc.

1131	   REQ11:  An application should be able to set preferred subflows or
1132	           subflow usage policies.  This would result in a selection
1133	           among different configurations of the multipath scheduler.
1134	           For instance, an application might want to use certain
1135	           subflows as backup only.

1137	   REQ12:  An application should be able to control the level of
1138	           redundancy by telling whether segments should be sent on more
1139	           than one path in parallel.

1141	   An advanced API fulfilling these requirements would allow application
1142	   developers to more specifically configure MPTCP.  It could avoid
1143	   suboptimal decisions of internal, implicit heuristics.  However, it
1144	   is unclear whether all of these requirements would have a significant
1145	   benefit to applications, since they are going above and beyond what
1146	   the existing API to regular TCP provides.

1148	   A subset of these functions might also be implemented system-wide or
1149	   by other configuration mechanisms.  These implementation details are
1150	   left for further study.

1152	A.4.  Integration with the SCTP Socket API

1154	   The advanced API may also integrate or use the SCTP Socket API.  The
1155	   following functions that are defined for SCTP have functionality
1156	   similar to that of the basic MPTCP API:

1158	   o  sctp_bindx()

1160	   o  sctp_connectx()

1162	   o  sctp_getladdrs()

1164	   o  sctp_getpaddrs()

1166	   o  sctp_freeladdrs()

1168	   o  sctp_freepaddrs()

1170	   The syntax and semantics of these functions are described in [14].

1172	   A potential objective for the advanced API is to provide a consistent
1173	   MPTCP and SCTP interface to the application.  This is left for
1174	   further study.

1176	Appendix B.  Change History of the Document

1178	   Changes compared to version draft-ietf-mptcp-api-02:

1180	   o  Updated references

1182	   o  Editorial changes

1184	   Changes compared to version draft-ietf-mptcp-api-01:

1186	   o  Additional text on outdated assumptions if an MPTCP application
1187	      does not use fate sharing.

1189	   o  The appendix explicitly mentions an integration of the advanced
1190	      MPTCP API and the SCTP API as a potential objective, which is left
1191	      for further study for the basic API.

1193	   o  A short additional explanation of the parameters of the abstract
1194	      functions TCP_MULTIPATH_ADD and TCP_MULTIPATH_REMOVE.

1196	   o  Better explanation when TCP_MULTIPATH_REMOVE may be used.

1198	   Changes compared to version draft-ietf-mptcp-api-00:

1200	   o  Explicitly specify that the TCP_MULTIPATH_SUBFLOWS function
1201	      returns port numbers, too.  Furthermore, add a new comment that
1202	      TCP_MULTIPATH_ADD permits the specification of a port number.

1204	   o  Mention possible additional extended API functions for the
1205	      indication of application characteristics and for backup paths,
1206	      based on comments received from the community.

1208	   o  Mention alternative approaches for avoiding non-MPTCP-capable
1209	      paths to reduce impact on applications.

1211	   Changes compared to version draft-scharf-mptcp-api-03:

1213	   o  Removal of explicit references to "socket options" and getsockopt/
1214	      setsockopt.

1216	   o  Change of TCP_MULTIPATH_BIND to TCP_MULTIPATH_ADD and
1217	      TCP_MULTIPATH_REMOVE.

1219	   o  Mention of stability of bandwidth as another potential QoS
1220	      parameter for the advanced API.

1222	   o  Address comments received from Philip Eardley: Explanation of the
1223	      API terminology, more explicit statement concerning applications
1224	      that bind to a specific address, and some smaller editorial fixes.

1226	   Changes compared to version draft-scharf-mptcp-api-02:

1228	   o  Definition of the behavior of getpeername() and getsockname() when
1229	      being called by an MPTCP-aware application.

1231	   o  Discussion of the possibility that an MPTCP implementation could
1232	      support the SCTP API, as far as it is applicable to MPTCP.

1234	   o  Various editorial fixes.

1236	   Changes compared to version draft-scharf-mptcp-api-01:

1238	   o  Second half of the document completely restructured

1240	   o  Separation between a basic API and an advanced API: The focus of
1241	      the document is the basic API only; all text concerning a
1242	      potential extended API is moved to the appendix

1244	   o  Several clarifications, e.g., concerning buffer sizing and the
1245	      use of different scheduling strategies triggered by TCP_NODELAY

1247	   o  Additional references

1249	   Changes compared to version draft-scharf-mptcp-api-00:

1251	   o  Distinction between legacy and MPTCP-aware applications
1252	   o  Guidance concerning default enabling, reaction to the shutdown of
1253	      the first subflow, etc.

1255	   o  Reference to a potential use of AF_MULTIPATH

1257	   o  Additional references to related work

1259	Authors' Addresses

1261	   Michael Scharf
1262	   Alcatel-Lucent Bell Labs
1263	   Lorenzstrasse 10
1264	   70435 Stuttgart
1265	   Germany

1267	   EMail: michael.scharf@alcatel-lucent.com

1269	   Alan Ford

1271	   EMail: alan.ford@gmail.com