idnits 2.17.1 

draft-ietf-mptcp-architecture-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     As a corollary to both network and application compatibility, the
     architecture must enable new Multipath TCP flows to coexist gracefully
     with existing legacy TCP flows, competing for bandwidth neither unduly
     aggressively or unduly timidly (unless low-precedence operation is
     specifically requested by the application, such as with LEDBAT).  The use
     of multiple paths MUST not unduly harm users using single path TCP at
     shared bottlenecks, beyond the impact that would occur from another
     single legacy TCP flow.

  -- The document date (June 22, 2010) is 5056 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: '9' is defined on line 884, but no explicit reference
     was found in the text

  == Outdated reference: A later version (-12) exists of
     draft-ietf-mptcp-multiaddressed-00

  -- Obsolete informational reference (is this intentional?): RFC  793 (ref.
     '4') (Obsoleted by RFC 9293)

  -- Obsolete informational reference (is this intentional?): RFC 4960 (ref.
     '5') (Obsoleted by RFC 9260)

  == Outdated reference: A later version (-04) exists of
     draft-scharf-mptcp-api-01

  == Outdated reference: A later version (-08) exists of
     draft-ietf-mptcp-threat-02


     Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Internet Engineering Task Force                             A. Ford, Ed.
3	Internet-Draft                                       Roke Manor Research
4	Intended status: Informational                                 C. Raiciu
5	Expires: December 24, 2010                     University College London
6	                                                                S. Barre
7	                                                Universite catholique de
8	                                                                 Louvain
9	                                                              J. Iyengar
10	                                           Franklin and Marshall College
11	                                                           June 22, 2010

13	         Architectural Guidelines for Multipath TCP Development
14	                    draft-ietf-mptcp-architecture-01

16	Abstract

18	   Endpoints are often connected by multiple paths, but TCP restricts
19	   communications to a single path per transport connection.  Resource
20	   usage within the network would be more efficient were these multiple
21	   paths able to be used concurrently.  This should enhance user
22	   experience through improved resilience to network failure and higher
23	   throughput.

25	   This document outlines architectural guidelines for the development
26	   of a Multipath Transport Protocol, with references to how these
27	   architectural components come together in the Multipath TCP (MPTCP)
28	   protocol.  This document also lists certain high level design
29	   decisions that provide foundations for the MPTCP design, based upon
30	   these architectural requirements.

32	Status of this Memo

34	   This Internet-Draft is submitted in full conformance with the
35	   provisions of BCP 78 and BCP 79.

37	   Internet-Drafts are working documents of the Internet Engineering
38	   Task Force (IETF).  Note that other groups may also distribute
39	   working documents as Internet-Drafts.  The list of current Internet-
40	   Drafts is at http://datatracker.ietf.org/drafts/current/.

42	   Internet-Drafts are draft documents valid for a maximum of six months
43	   and may be updated, replaced, or obsoleted by other documents at any
44	   time.  It is inappropriate to use Internet-Drafts as reference
45	   material or to cite them other than as "work in progress."

47	   This Internet-Draft will expire on December 24, 2010.

49	Copyright Notice

51	   Copyright (c) 2010 IETF Trust and the persons identified as the
52	   document authors.  All rights reserved.

54	   This document is subject to BCP 78 and the IETF Trust's Legal
55	   Provisions Relating to IETF Documents
56	   (http://trustee.ietf.org/license-info) in effect on the date of
57	   publication of this document.  Please review these documents
58	   carefully, as they describe your rights and restrictions with respect
59	   to this document.  Code Components extracted from this document must
60	   include Simplified BSD License text as described in Section 4.e of
61	   the Trust Legal Provisions and are provided without warranty as
62	   described in the Simplified BSD License.

64	Table of Contents

66	   1.  Introduction
67	     1.1.  Requirements Language
68	     1.2.  Terminology
69	     1.3.  Reference Scenario
70	   2.  Goals
71	     2.1.  Functional Goals
72	     2.2.  Compatibility Goals
73	       2.2.1.  Application Compatibility
74	       2.2.2.  Network Compatibility
75	       2.2.3.  Compatibility with other network users
76	   3.  An Architectural Basis For MPTCP
77	   4.  A Functional Decomposition of MPTCP
78	   5.  High-Level Design Decisions
79	     5.1.  Sequence Numbering
80	     5.2.  Reliability
81	     5.3.  Buffers
82	     5.4.  Signalling
83	     5.5.  Path Management
84	     5.6.  Connection Identification
85	     5.7.  Network Layer Compatibility
86	     5.8.  Congestion Control
87	   6.  Summary
88	   7.  Security Considerations
89	   8.  Interactions with Applications
90	   9.  Interactions with Middleboxes
91	   10. Acknowledgements
92	   11. Contributors
93	   12. IANA Considerations
94	   13. References
95	     13.1. Normative References
96	     13.2. Informative References
97	   Appendix A.  Implementation Architecture
98	     A.1.  Functional Separation
99	       A.1.1.  Application to default MPTCP protocol
100	       A.1.2.  Generic architecture for MPTCP
101	     A.2.  PM/MPS interface
102	   Appendix B.  Changelog
103	     B.1.  Changes since draft-ietf-mptcp-architecture-00
104	   Authors' Addresses

106	1.  Introduction

108	   As the Internet evolves, demands on Internet resources are ever-
109	   increasing, but often these resources (in particular, bandwidth)
110	   cannot be fully utilised due to protocol constraints both on the end-
111	   systems and within the network.  If these resources could instead be
112	   used concurrently, end user experience could be greatly improved.
113	   Such enhancements would also reduce the necessary expenditure on
114	   network infrastructure which would otherwise be needed to create an
115	   equivalent improvement in user experience.

117	   By the application of resource pooling[2], these available resources
118	   can be 'pooled' such that they appear as a single logical resource to
119	   the user.  The purpose of a multipath transport, therefore, is to
120	   make use of multiple available paths, through resource pooling, to
121	   bring two key benefits:

123	   o  To increase the resilience of the connectivity by providing
124	      multiple paths, protecting end hosts from the failure of one.

126	   o  To increase the efficiency of the resource usage, and thus
127	      increase the network capacity available to end hosts.

129	   Multipath TCP (MPTCP)[3] is a set of extensions for TCP[4] that
130	   implements a multipath transport and achieves these goals by pooling
131	   multiple paths within a transport connection, transparent to the
132	   application.  While multihoming and multipath functions have been
133	   implemented in transport protocols previously, notably SCTP[5], MPTCP
134	   is distinct in recognizing application and network compatibility
135	   goals that we believe are important for deployability of a multipath
136	   transport; we discuss these goals in more detail later in Section 2.

138	   This document makes three contributions: (i) it describes goals for a
139	   multipath transport - goals that MPTCP is designed to meet; (ii) it
140	   lays out an architectural basis for MPTCP's design - a discussion
141	   that applies to other multipath transports as well; and (iii) it
142	   discusses and documents high-level design decisions made in MPTCP's
143	   development, and considers their implications.

145	   Companion documents to this architectural overview are those which
146	   provide details of the protocol extensions[3], congestion control
147	   algorithms[6], and application-level considerations[7].  Put
148	   together, these components specify a complete Multipath TCP design.
149	   We note that specific components are replaceable with other protocols
150	   in accordance with the layer and functional decompositions discussed
151	   in this document.

153	   Please note this document is a work-in-progress and covers several
154	   topics, some of which may be more appropriately moved to separate
155	   documents as this work evolves.

157	1.1.  Requirements Language

159	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
160	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
161	   document are to be interpreted as described in RFC 2119 [1].

163	1.2.  Terminology

165	   Path:  A sequence of links between a sender and a receiver, defined
166	      in this context by a source and destination address pair.

168	   Endpoint:  A host either initiating or terminating an MPTCP
169	      connection.

171	   Multipath TCP (MPTCP):  A modified version of the TCP [4] protocol
172	      that supports the simultaneous use of multiple paths between
173	      endpoints.

175	   Subflow:  A flow of TCP packets operating over an individual path,
176	      which forms part of a larger MPTCP connection.

178	   MPTCP Connection:  A set of one or more subflows combined to provide
179	      a single Multipath TCP service to an application at an endpoint.

181	1.3.  Reference Scenario

183	   The diagram shown in Figure 1 illustrates a typical usage scenario
184	   for MPTCP.  Two hosts, A and B, are communicating with each other.
185	   These endpoints are multi-homed and multi-addressed, providing two
186	   disjoint connections to the Internet.  The addresses on each endpoint
187	   are referred to as A1, A2, B1 and B2.  There are therefore up to four
188	   different paths between the two endpoints: A1-B1, A1-B2, A2-B1,
189	   A2-B2.

191	     +------+           __________           +------+
192	     |      |A1 ______ (          ) ______ B1|      |
193	     | Host |--/      (            )      \--| Host |
194	     |      |        (   Internet   )        |      |
195	     |  A   |--\______(            )______/--|   B  |
196	     |      |A2        (__________)        B2|      |
197	     +------+                                +------+

199	                   Figure 1: Simple MPTCP Usage Scenario

201	   The scenario could have any number of addresses (1 or more) on each
202	   endpoint, so long as the number of paths available between the two
203	   endpoints is 2 or more (i.e. num_addr(A) * num_addr(B) > 1).  The
204	   paths created by these address combinations through the Internet need
205	   not be entirely disjoint - shared bottlenecks will be addressed by
206	   the MPTCP congestion controller.  Furthermore, the paths through the
207	   Internet may be interrupted by any number of middleboxes including
208	   NATs and Firewalls.  Finally, although the diagram refers to the
209	   Internet, MPTCP may be used over any network where there are multiple
210	   paths that could be used concurrently.
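
   The path count in this scenario can be illustrated with a short
   sketch.  It simply enumerates source/destination address pairs, as
   in the definition of "Path" in Section 1.2.  The function names are
   hypothetical and are not part of any MPTCP specification.

```python
from itertools import product


def candidate_paths(addrs_a, addrs_b):
    """Return every (source, destination) address pair between two hosts."""
    return list(product(addrs_a, addrs_b))


def has_multiple_paths(addrs_a, addrs_b):
    """True when num_addr(A) * num_addr(B) > 1, i.e. two or more paths exist."""
    return len(addrs_a) * len(addrs_b) > 1


# The reference scenario: two addresses per host gives up to four paths.
paths = candidate_paths(["A1", "A2"], ["B1", "B2"])
```

   Note that address pairs are only candidate paths: as stated above,
   they need not be disjoint inside the network.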

212	   TBD - what further detail here would be useful?

214	2.  Goals

216	   This section outlines primary goals that Multipath TCP aims to meet.
217	   These are broadly broken down into functional goals, which steer
218	   services and features that MPTCP must provide, and compatibility
219	   goals, which determine how MPTCP should appear to entities that
220	   interact with it.

222	2.1.  Functional Goals

224	   In providing the use of multiple paths, MPTCP has the following two
225	   functional goals.

227	   o  Improve Throughput: MPTCP MUST support the concurrent use of
228	      multiple paths.  To meet the minimum performance incentives for
229	      deployment, an MPTCP connection over multiple paths SHOULD achieve
230	      no lesser throughput than a single TCP connection over the best
231	      constituent path.

233	   o  Improve Resilience: MPTCP MUST support the use of multiple paths
234	      interchangeably for resilience purposes, by permitting packets to
235	      be sent and re-sent on any available path.  It follows that, in
236	      the worst case, the protocol MUST be no less resilient than legacy
237	      TCP.

239	   As distribution of traffic among available paths and responses to
240	   congestion are done in accordance with resource pooling
241	   principles[2], a secondary effect of meeting these goals is that
242	   widespread use of MPTCP over the Internet should optimize overall
243	   network utility by shifting load away from congested bottlenecks and
244	   by taking advantage of spare capacity wherever possible.

246	   Furthermore, MPTCP SHOULD feature automatic negotiation of its use.
247	   A host supporting Multipath TCP that requires the other endpoint to
248	   do so too must be able to detect reliably whether this endpoint does
249	   in fact support the next-generation protocol, using it if so, and
250	   otherwise automatically falling back to the legacy protocol.
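
   The fallback behaviour described above can be sketched as a small
   decision function.  This is only an illustration: the function name
   and both boolean inputs are hypothetical, and the signalling used to
   detect peer support is a matter for the protocol document [3], not
   this architecture.

```python
def select_transport(local_mptcp_enabled: bool, peer_confirmed_mptcp: bool) -> str:
    """Pick the protocol for a new connection.

    Both arguments are hypothetical inputs: the first reflects local
    configuration, the second whether the peer was reliably detected as
    MPTCP-capable during connection setup.
    """
    if local_mptcp_enabled and peer_confirmed_mptcp:
        return "mptcp"
    # Any doubt about peer support results in automatic fallback.
    return "tcp"
```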

252	2.2.  Compatibility Goals

254	   In addition to the functional goals listed above, a Multipath TCP
255	   must meet a number of compatibility goals in order to support
256	   deployment in today's Internet.  These goals fall into the following
257	   categories:

259	2.2.1.  Application Compatibility

261	   Application compatibility refers to the appearance of MPTCP to the
262	   application both in terms of the API that can be used and the
263	   expected service model that is provided.

265	   MPTCP MUST follow the same service model as TCP [4]: in-order,
266	   reliable, and byte-oriented delivery.  Furthermore, an MPTCP
267	   connection SHOULD provide the application with no worse throughput
268	   than it would expect from running a single TCP connection over any
269	   one of its available paths.

271	   A multipath-capable equivalent of TCP SHOULD retain backward
272	   compatibility with existing TCP APIs, so that existing applications
273	   can use the newer transport merely by upgrading the operating systems
274	   of the end-hosts.  This does not preclude the use of an advanced API
275	   to permit multipath-aware applications to specify preferences, nor
276	   for users to configure their systems in a different way from the
277	   default, for example switching on or off the automatic use of MPTCP.

279	2.2.2.  Network Compatibility

281	   Traditional Internet architecture slots network devices in the
282	   network layer and lower layers of the OSI 7-layer stack, where the
283	   layers above the network layer - the transport layer and upper layers
284	   - are instantiated only at the end-hosts.  While this architecture,
285	   shown in Figure 2, was largely adhered to earlier, this layering no
286	   longer reflects the "ground truth" in the Internet with the
287	   proliferation of middleboxes[8].  Middleboxes routinely interpose on
288	   the transport layer; sometimes even completely terminating transport
289	   connections, thus leaving the application layer as the first real
290	   end-to-end layer, as shown in Figure 3.

292	   +-------------+                                       +-------------+
293	   | Application |<------------ end-to-end ------------->| Application |
294	   +-------------+                                       +-------------+
295	   |  Transport  |<------------ end-to-end ------------->|  Transport  |
296	   +-------------+   +-------------+   +-------------+   +-------------+
297	   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
298	   +-------------+   +-------------+   +-------------+   +-------------+
299	      End Host           Router             Router          End Host

301	                Figure 2: Traditional Internet Architecture

303	   +-------------+                                       +-------------+
304	   | Application |<------------ end-to-end ------------->| Application |
305	   +-------------+                     +-------------+   +-------------+
306	   |  Transport  |<------------------->|  Transport  |<->|  Transport  |
307	   +-------------+   +-------------+   +-------------+   +-------------+
308	   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
309	   +-------------+   +-------------+   +-------------+   +-------------+
310	                                          Firewall,
311	      End Host           Router         NAT, or Proxy      End Host

313	                        Figure 3: Internet Reality

315	   Middleboxes that interpose on the transport layer result in loss of
316	   "fate-sharing"[9], that is, they often hold "hard" state that, when
317	   lost or corrupted, results in loss or corruption of the end-to-end
318	   transport connection.

320	   MPTCP MUST remain backward compatible with the Internet as it exists
321	   today, including being able to traverse predominant middleboxes such
322	   as firewalls, NATs, and performance enhancing proxies[8].  This
323	   requirement comes from recognizing middleboxes as a significant
324	   deployment bottleneck for any transport that is not TCP, and
325	   constrains MPTCP to appear as TCP does on the wire and to use
326	   established TCP extensions where necessary.  To ensure end-to-endness
327	   of the transport, we further require MPTCP to preserve fate-sharing
328	   without making any assumptions about middlebox behavior.

330	2.2.3.  Compatibility with other network users

332	   As a corollary to both network and application compatibility, the
333	   architecture must enable new Multipath TCP flows to coexist
334	   gracefully with existing legacy TCP flows, competing for bandwidth
335	   neither unduly aggressively nor unduly timidly (unless low-precedence
336	   operation is specifically requested by the application, such as with
337	   LEDBAT).  The use of multiple paths MUST NOT unduly harm users using
338	   single path TCP at shared bottlenecks, beyond the impact that would
339	   occur from another single legacy TCP flow.

341	3.  An Architectural Basis For MPTCP

343	   We now present one possible transport architecture that we believe
344	   can effectively support MPTCP's goals.  The new Internet model
345	   described here is based on ideas proposed earlier in Tng ("Transport
346	   next-generation") [10].  While by no means the only possible
347	   architecture supporting multipath transport, Tng incorporates many
348	   lessons learned from previous transport research and development
349	   practice, and offers a strong starting point from which to consider
350	   the extant Internet architecture and its bearing on the design of any
351	   new Internet transports or transport extensions.

353	          +------------------+
354	          |    Application   |
355	          +------------------+  ^ Application-oriented transport
356	          |                  |  | functions (Semantic Layer)
357	          + - - Transport - -+ ----------------------------------
358	          |                  |  | Network-oriented transport
359	          +------------------+  v functions (Flow+Endpoint Layer)
360	          |      Network     |
361	          +------------------+
362	            Existing Layers             Tng Decomposition

364	              Figure 4: Decomposition of Transport Functions

366	   Tng loosely splits the transport layer into "application-oriented"
367	   and "network-oriented" layers, as shown in Figure 4.  The
368	   application-oriented "Semantic" layer implements functions driven
369	   primarily by concerns of supporting and protecting the application's
370	   end-to-end communication, while the network-oriented "Flow+Endpoint"
371	   layer implements functions such as endpoint identification (using
372	   port numbers) and congestion control.  These network-oriented
373	   functions, while traditionally located in the ostensibly "end-to-end"
374	   Transport layer, have proven in practice to be of great concern to
375	   network operators and the middleboxes they deploy in the network to
376	   enforce network usage policies[11] [12] or optimize communication
377	   performance[13].  Figure 5 shows how middleboxes interact with
378	   different layers in this decomposed model of the transport layer: the
379	   application-oriented layer operates end-to-end, while the network-
380	   oriented layer operates "segment-by-segment" and can be interposed
381	   upon by middleboxes.

383	   +-------------+                                       +-------------+
384	   | Application |<------------ end-to-end ------------->| Application |
385	   +-------------+                                       +-------------+
386	   |  Semantic   |<------------ end-to-end ------------->|  Semantic   |
387	   +-------------+   +-------------+   +-------------+   +-------------+
388	   |Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|
389	   +-------------+   +-------------+   +-------------+   +-------------+
390	   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
391	   +-------------+   +-------------+   +-------------+   +-------------+
392	                        Firewall         Performance
393	      End Host           or NAT        Enhancing Proxy      End Host

395	              Figure 5: Middleboxes in the new Internet model

397	   MPTCP's architectural design follows Tng's decomposition as shown in
398	   Figure 6.  The MPTCP component, which provides application
399	   compatibility through the preservation of TCP-like semantics of
400	   global ordering of application data and reliability, is an
401	   instantiation of the "application-oriented" Semantic layer; whereas
402	   the legacy-TCP component, which provides network compatibility by
403	   appearing and behaving as a TCP flow in the network, is an
404	   instantiation of the "network-oriented" Flow+Endpoint layer.

406	        +--------------------------+    +-------------------------+
407	        |      Application         |    |      Application        |
408	        +--------------------------+    +-------------------------+
409	        |        Semantic          |    |         MPTCP           |
410	        |--------------------------|    + - - - - -  +  - - - - - +
411	        | Flow+Endpt | Flow+Endpt  |    |    TCP     |     TCP    |
412	        +--------------------------+    +-------------------------+
413	        |   Network  |   Network   |    |     IP     |     IP     |
414	        +--------------------------+    +-------------------------+

416	                      Figure 6: MPTCP mapping to Tng

418	   As a protocol extension to TCP, MPTCP thus explicitly acknowledges
419	   middleboxes in its design, and specifies a protocol that operates at
420	   two scales: the MPTCP component operates end-to-end, while it allows
421	   the TCP component to operate segment-by-segment.

423	4.  A Functional Decomposition of MPTCP

425	   Having laid out the goals to be met and the architectural basis for
426	   MPTCP, we now provide a functional decomposition of MPTCP's design.

428	   The MPTCP component relies upon (what appear to the network to be)
429	   standard TCP sessions, termed "subflows", to provide the underlying
430	   transport per path, and as such these retain the network
431	   compatibility desired.  MPTCP as described in [3] carries MPTCP-
432	   specific information in a TCP-compatible manner, although this
433	   mechanism is separate from the actual information being transferred
434	   so could evolve in future revisions.  Figure 7 illustrates the
435	   layered architecture.

437	                                   +-------------------------------+
438	                                   |           Application         |
439	      +---------------+            +-------------------------------+
440	      |  Application  |            |             MPTCP             |
441	      +---------------+            + - - - - - - - + - - - - - - - +
442	      |      TCP      |            | Subflow (TCP) | Subflow (TCP) |
443	      +---------------+            +-------------------------------+
444	      |      IP       |            |       IP      |      IP       |
445	      +---------------+            +-------------------------------+

447	      Figure 7: Comparison of Standard TCP and MPTCP Protocol Stacks

449	   Situated below the application, the MPTCP extension manages multiple
450	   TCP subflows below it and must implement the following functions:

452	   o  Path Management: This is the function to detect and use multiple
453	      paths between two endpoints.  In the case of the MPTCP design [3],
454	      this feature is implemented using multiple IP addresses on at
455	      least one of the endpoints.  Although this does not guarantee path
456	      diversity, and there may be shared bottlenecks, this is a simple
457	      mechanism that can be used with no additional features in the
458	      network.  The path management features of the MPTCP protocol are
459	      the mechanisms to signal alternative addresses to endpoints, and
460	      mechanisms to set up new subflows attached to an existing MPTCP
461	      connection.

463	   o  Packet Scheduling: This function breaks the bytestream received
464	      from the application into segments which are transmitted on one of
465	      the available lower subflows.  The MPTCP design makes use of a
466	      data sequence mapping, associating packets sent on different
467	      subflows to a connection-level sequence numbering, thus allowing
468	      packets sent on different subflows to be correctly re-ordered at
469	      the receiver.  The packet scheduler is dependent upon information
470	      about the availability of paths exposed by the path management
471	      component, and then makes use of the subflows to transmit these
472	      packets.

474	   o  Subflow (single-path TCP) Interface: A subflow component takes
475	      segments from the packet-scheduling component and transmits them
476	      over the specified path, ensuring detectable delivery to the
477	      endpoint.  Detection of delivery is necessary to allow the
478	      congestion control protocol to attribute packet delivery or loss
479	      to the right path.  Note that the packet scheduling component does
480	      not embed enough information in packets to allow this to happen:
481	      segments with the same connection-level sequence number can be
482	      transmitted over multiple paths, e.g. as retransmissions or just
483	      to increase redundancy.  MPTCP uses TCP underneath for network
484	      compatibility; TCP ensures in-order, reliable delivery.  TCP adds
485	      its own sequence numbers to the segments; these are used to detect
486	      and retransmit lost packets.

488	   o  Congestion Control: This function manages congestion control
489	      across the subflows.  As specified, this congestion control
490	      algorithm must ensure that an MPTCP connection does not unfairly
491	      take more bandwidth than a single path TCP flow would take at a
492	      shared bottleneck.  An algorithm to support this is specified in
493	      [6].

495	   These functions fit together as follows.  The Path Management looks
496	   after the discovery (and if necessary, initialisation) of multiple
497	   paths between two endpoints.  The Packet Scheduler then receives
498	   packets from the application for the network and does the necessary
499	   operations on them (such as adding a data-level sequence number)
500	   before sending to a subflow.  The subflow then adds its own sequence
501	   number, acks, and passes them to the network.  The receiving subflow
502	   re-orders data and passes it to the MPTCP component, which performs
503	   connection-level re-ordering, removes the segment boundaries and
504	   sends it to the application.  Finally, the congestion control
505	   component exists as part of the packet scheduling, in order to
506	   schedule which packets should be sent at what rate on which subflow.
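
   The flow of data through these functions can be sketched as follows.
   This is a toy model under stated assumptions, not the MPTCP
   implementation: the class and function names are hypothetical, the
   scheduler is plain round-robin (no congestion control), and the
   "subflow" only tracks its own sequence space rather than running TCP.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    data_seq: int     # connection-level sequence number of the first byte
    subflow_seq: int  # subflow-level sequence number of the first byte
    payload: bytes


class Subflow:
    """Stand-in for one TCP subflow; it only numbers what it is given."""

    def __init__(self, name):
        self.name = name
        self.next_seq = 0

    def send(self, data_seq, payload):
        # The subflow adds its own sequence number to the segment.
        seg = Segment(data_seq, self.next_seq, payload)
        self.next_seq += len(payload)
        return seg


def schedule(stream, subflows, mss):
    """Cut the bytestream into segments of at most mss bytes, tag each
    with a data-level sequence number, and hand them to the subflows in
    round-robin order (an assumption made for this sketch)."""
    segments = []
    for i, offset in enumerate(range(0, len(stream), mss)):
        chunk = stream[offset:offset + mss]
        segments.append(subflows[i % len(subflows)].send(offset, chunk))
    return segments


def reassemble(segments):
    """Connection-level re-ordering: sort by data_seq and drop the
    segment boundaries before delivery to the application."""
    ordered = sorted(segments, key=lambda s: s.data_seq)
    return b"".join(s.payload for s in ordered)
```

   For example, scheduling ten bytes over two subflows with mss=4
   places the first and third segments on one subflow and the second on
   the other; reassemble() restores the original bytestream whatever
   order the segments arrive in.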

508	5.  High-Level Design Decisions

510	   There is seemingly a wide range of choices when designing a multipath
511	   extension to TCP.  However, the goals as discussed earlier in this
512	   document constrain the possible solutions, leaving relatively little
513	   choice in many areas.  Here, we outline high-level design choices
514	   that draw from the architectural basis discussed earlier in
515	   Section 3, and their implications for the MPTCP design.

517	5.1.  Sequence Numbering

519	   MPTCP uses two levels of sequence space: a connection-level sequence
520	   number, and another sequence number for each subflow.  This permits
521	   connection-level segmentation and reassembly, and retransmission of
522	   the same part of connection-level sequence space on different
523	   subflow-level sequence spaces.

525	   The alternative approach would be to use a single connection level
526	   sequence number, which gets sent on multiple subflows.  This has two
527	   problems: first, the individual subflows will appear to the network
528	   as TCP sessions with gaps in the sequence space; this in turn may
529	   upset certain middleboxes such as intrusion detection systems, or
530	   certain transparent proxies, and would go against the network
531	   compatibility goal.  Second, the sender cannot attribute packet
532	   losses or receptions to the correct path when the same packet is sent
533	   on multiple paths, in the case of retransmissions.

535	   The sender must be able to tell the receiver how to reorder the data,
536	   for delivery to the application.  The sender does so by telling the
537	   receiver how subflow-level data (carrying subflow sequence numbers)
538	   maps at connection level, which we refer to as Data Sequence Mapping.
539	   This mapping takes the form (data seq, subflow seq, length), i.e. for
540	   a given number of bytes (the length), the subflow sequence space
541	   beginning at the given sequence number maps to the connection-level
542	   sequence space (beginning at the given data seq number).
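
   As an illustration only, the (data seq, subflow seq, length) mapping
   can be sketched as follows.  The class and function names are
   hypothetical and are not taken from any MPTCP specification; the
   sketch also ignores sequence number wrap-around.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSequenceMapping:
    """One (data seq, subflow seq, length) mapping (hypothetical type)."""
    data_seq: int      # first connection-level sequence number covered
    subflow_seq: int   # first subflow-level sequence number covered
    length: int        # number of bytes covered by this mapping

    def to_data_seq(self, subflow_seq: int) -> Optional[int]:
        """Translate a subflow sequence number; None if it is not covered."""
        offset = subflow_seq - self.subflow_seq
        if 0 <= offset < self.length:
            return self.data_seq + offset
        return None


def reassemble(chunks):
    """Order (mapping, payload) pairs from any subflow by data sequence."""
    ordered = sorted(chunks, key=lambda item: item[0].data_seq)
    return b"".join(payload for _, payload in ordered)


# Two subflows with independent sequence spaces carry one byte stream.
m_a = DataSequenceMapping(data_seq=1000, subflow_seq=1, length=5)
m_b = DataSequenceMapping(data_seq=1005, subflow_seq=7001, length=6)
stream = reassemble([(m_b, b" world"), (m_a, b"hello")])
```

   Because each payload carries its own mapping, the receiver can place
   data correctly regardless of the subflow on which it arrived.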

544	   This architecture does not mandate a mechanism for signalling such
545	   information, and it could conceivably have various sources.

547	   One option would be to use existing fields in the TCP segment (such
548	   as subflow seqno, length) and only add the data sequence number to
549	   each segment, for instance as a TCP option.  This is, however,
550	   vulnerable to middleboxes that resegment or assemble data, since
551	   there is no specified behaviour for coalescing TCP options.  If one
552	   signalled (data seqno, length), this would still be vulnerable to
553	   middleboxes that coalesce segments and do not correctly coalesce the
554	   options.  Because of these potential issues, the current
555	   specification of MPTCP mandates that the full mapping should be sent
556	   to the other end.

558	   To reduce the overhead, it would be permissible for the mapping to be
559	   sent periodically and cover more than a single segment.  It could
560	   also be excluded entirely in the case of a connection before more
561	   than one subflow is used, where the data-level and subflow-level
562	   sequence space is the same.

564	5.2.  Reliability

566	   Under normal behaviour, MPTCP can use the data sequence mapping and
567	   subflow ACKs to decide when a connection-level segment was received.
568	   This has certain implications on end-to-end semantics.  It means that
569	   once a packet is acked at subflow level it cannot be discarded in the
570	   re-order buffer at the connection level.  Secondly, unlike in
571	   standard TCP, a receiver cannot simply drop out-of-order segments if
572	   needed (for instance, due to memory pressure).

574	   Furthermore, it is possible to conceive of some cases where
575	   connection-level acknowledgements could improve robustness.  Consider
576	   a subflow traversing a transparent proxy: if the proxy acks a segment
577	   and then crashes, the sender will not retransmit the lost segment on
578	   another subflow, as it thinks the segment has been received.  The
579	   connection grinds to a halt despite having other working subflows,
580	   and the sender would be unable to determine the cause of the problem.
581	   Finally, as an optimisation, it may be feasible for a connection-
582	   level acknowledgement to be transmitted over the shortest RTT path,
583	   potentially reducing send buffer requirements (see Section 5.3).

585	   Therefore, to provide a fully robust multipath TCP solution, MPTCP
586	   SHOULD feature explicit connection-level acknowledgements.

588	   Regarding retransmissions, it MUST be possible for a packet to be
589	   retransmitted on a different subflow to that on which it was
590	   originally sent.  This is one of MPTCP's core goals, in order to
591	   maintain integrity during temporary or permanent subflow failure, and
592	   this is enabled by the dual sequence number space.

594	   The scheduling of retransmissions will have significant impact on
595	   MPTCP user experience.  The current MPTCP specification suggests that
596	   data outstanding on subflows that have timed out should be
597	   rescheduled for transmission on different subflows.  This behaviour
598	   aims to minimize disruption when a path breaks, and uses the first
599	   timeout as the indicator.  More conservative versions would use the
600	   second or third timeout for the same packet.

602	   When packet loss is detected and corrected with fast retransmit,
603	   retransmission on different subflows may still be desirable in
604	   certain cases, for instance to reduce the receive buffer
605	   requirements.  However, in all cases with retransmissions on
606	   different subflows, the lost packets SHOULD still be sent on the path
607	   that lost them.  This is currently believed to be necessary to
608	   maintain subflow integrity, as per the network compatibility goal.
609	   By doing this, throughput will be wasted, and it is unclear at this
610	   point what the optimal retransmit strategy is.

612	5.3.  Buffers

614	   Receive Buffer: ideally, a subflow failing should not affect the
615	   throughput of other working subflows.  However, the receive buffer
616	   has limited size: if a flow times out, the other subflows will
617	   quickly fill the receive buffer with out-of-order data, and will
618	   stall.  Hence, receive buffer sizing is important for both robustness
619	   and throughput.

621	   The smallest receive buffer we need to avoid stalling under any
622	   circumstances is max(RTO)*sum(BW).  This is, for most multipath
623	   connections, too expensive.  A more reasonable size is proportional
624	   to max(RTT)*sum(BW) which ensures subflows don't stall when fast
625	   retransmit works.  Also, depending on how the implementation behaves,
626	   an additional sum(RTT*BW) might be needed for the individual re-order
627	   buffers of the TCP subflows.

629	   Send Buffer: the smallest send buffer we need is sum(BDP) across all
630	   paths; this is to hold data until it's acked at subflow level.  If we
631	   didn't use a subflow level ack, and relied on a data-level ack, the
632	   send buffer would need to be as big as the receive buffer of the
633	   connection, max(RTT)*sum(BW).  In practice, the senders will be web
634	   servers and receivers will be desktops or mobile servers.  The send
635	   buffer size matters particularly for servers, which must be able to
636	   maintain a large number of ongoing connections.
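
   The buffer bounds above can be worked through numerically.  The
   following is a rough sketch only: the function names and the example
   path figures are assumptions made for illustration, and the
   proportionality constant for the max(RTT)*sum(BW) bound is assumed to
   be 1.  Times are in milliseconds and bandwidth in bytes per
   millisecond, so results are in bytes.

```python
def receive_buffer_bounds(paths):
    """paths: list of dicts with 'rtt', 'rto' (ms) and 'bw' (bytes/ms)."""
    total_bw = sum(p["bw"] for p in paths)
    return {
        # max(RTO)*sum(BW): never stalls, but usually too expensive
        "no_stall": max(p["rto"] for p in paths) * total_bw,
        # max(RTT)*sum(BW): no stall as long as fast retransmit works
        "fast_retransmit": max(p["rtt"] for p in paths) * total_bw,
        # sum(RTT*BW): possible extra space for per-subflow re-ordering
        "subflow_reorder": sum(p["rtt"] * p["bw"] for p in paths),
    }


def send_buffer_bounds(paths):
    """Send buffer needed with and without subflow-level ACKs."""
    total_bw = sum(p["bw"] for p in paths)
    return {
        # sum(BDP): data is held until acked at subflow level
        "subflow_ack": sum(p["rtt"] * p["bw"] for p in paths),
        # max(RTT)*sum(BW): data is held until a data-level ACK arrives
        "data_ack_only": max(p["rtt"] for p in paths) * total_bw,
    }


# Hypothetical example: a fast short path and a slow long path.
example_paths = [
    {"rtt": 10, "rto": 200, "bw": 1000},
    {"rtt": 100, "rto": 1000, "bw": 250},
]
rx = receive_buffer_bounds(example_paths)
tx = send_buffer_bounds(example_paths)
```

   With these example figures, the no-stall receive buffer is ten times
   larger than the fast-retransmit bound, which illustrates why the
   former is considered too expensive for most connections.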

638	5.4.  Signalling

640	   Since MPTCP will use regular TCP streams as its transport mechanism,
641	   an MPTCP connection will also begin as a single TCP stream.
642	   Nevertheless, it must signal to the peer that it supports MPTCP and
643	   wishes to use it on this connection.  As such, a TCP Option will be
644	   used to transmit this information, since this is the established
645	   mechanism for indicating additional functionality on a TCP session.

647	   Beyond this, however, signalling is also required during the
648	   operation of an MPTCP session, such as that for reassembly across
649	   multiple subflows, and for informing the other endpoint about other
650	   potentially available addresses.  The architecture does not mandate
651	   the format in which this signalling is transmitted.

653	   The current MPTCP protocol proposal suggests the use of TCP options
654	   for this signalling; an alternative approach would be to embed such
655	   information in the payload, and use type-length-value (TLV) encoding
656	   to separate signalling and payload data.
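
   To make the payload-embedding alternative concrete, the sketch below
   shows one possible TLV framing.  It is an assumption for illustration
   only: the one-byte type, two-byte length layout and the type codes
   are hypothetical and are not defined by any MPTCP document.

```python
import struct

# Hypothetical type codes, chosen only for this sketch.
TYPE_PAYLOAD = 0
TYPE_SIGNAL = 1

# Assumed header layout: 1-byte type, 2-byte length, network byte order.
_HEADER = struct.Struct("!BH")


def encode(records):
    """Serialise (type, value) pairs into a single byte string."""
    out = bytearray()
    for rtype, value in records:
        out += _HEADER.pack(rtype, len(value))
        out += value
    return bytes(out)


def decode(stream):
    """Split a byte string back into (type, value) pairs."""
    records = []
    offset = 0
    while offset < len(stream):
        rtype, length = _HEADER.unpack_from(stream, offset)
        offset += _HEADER.size
        value = stream[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV record")
        records.append((rtype, value))
        offset += length
    return records


wire = encode([(TYPE_SIGNAL, b"add-address"), (TYPE_PAYLOAD, b"hello")])
```

   The receiver dispatches on the type field, so signalling records can
   be interleaved with application data on the same subflow.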

658	5.5.  Path Management

660	   Currently, the network does not expose multiple paths between
661	   endpoints.  Multipath TCP will use multiple addresses at one or both
662	   endpoints to get different paths to the destination.  The hope is
663	   that these paths, whilst not necessarily entirely non-overlapping,
664	   will be sufficiently disjoint to allow multipath to achieve improved
665	   throughput and robustness.

667	   Multiple different (source, destination) address pairs will thus be
668	   used as path selectors.  Each path will be identified by a TCP
669	   4-tuple (i.e. source address, destination address, source port,
670	   destination port), thus allowing the extension of MPTCP to use such
671	   4-tuples as path selectors if the network will route different ports
672	   over different paths (which may be the case with technologies such as
673	   ECMP).

675	   For increased chance of successfully setting up additional subflows
676	   (such as when one end is behind a firewall, NAT, or other restrictive
677	   middlebox), either endpoint should be able to add new subflows to an
678	   MPTCP connection.

680	   The modularity of path management will permit alternative mechanisms
681	   to be employed if appropriate in the future.

683	5.6.  Connection Identification

685	   Therefore, each MPTCP connection should have a connection identifier
686	   at each endpoint, which is locally unique within that endpoint.  In
687	   many ways, this is analogous to a port number in regular TCP.  The
688	   manifestation and purpose of such an identifier is out of the scope
689	   of this architecture document.

691	   Legacy applications will not, however, have access to this identifier
692	   and in such cases an MPTCP connection will be identified by the
693	   5-tuple of the first TCP subflow.  It is out of the scope of this
694	   document, however, to define the behaviour of the MPTCP
695	   implementation if the first TCP subflow later fails.  If there are
696	   legacy applications that make assumptions about continued existence
697	   of the initial address pair, their behaviour could be disrupted by
698	   carrying on regardless.  It is expected that this is a very small,
699	   possibly negligible, set of applications, however.  In the case of
700	   applications that have specifically asked to be bound to a particular
701	   address or interface, MPTCP will not be used.

703	   Since the requirements of applications are not clear at this stage,
704	   it is as yet unconfirmed what the best behaviour is.  It will be an
705	   implementation-specific solution, and as such the behaviour is
706	   expected to be chosen by implementors once more research has been
707	   undertaken to determine its impact.

709	5.7.  Network Layer Compatibility

711	   MPTCP's modifications remain at the transport layer, although some
712	   knowledge of the underlying network layer is required.  MPTCP MUST
713	   work with IPv4 and IPv6 interchangeably, i.e. one MPTCP connection
714	   may operate over both IPv4 and IPv6 networks.

716	5.8.  Congestion Control

718	   As already documented in network-layer compatibility requirements,
719	   the congestion control algorithms used by an MPTCP implementation
720	   must not harm other legacy users on shared bottlenecks.  To achieve
721	   this, the congestion control algorithms in use on each subflow must
722	   be coupled in some way - a proposal for this is given in [6].

724	6.  Summary

726	   This document has provided a summary of the components that have been
727	   identified to provide a Multipath TCP solution, and described the
728	   high-level design decisions that have been used as a basis of the
729	   MPTCP specification.

731	   The suite of drafts that specify a complete MPTCP implementation, on
732	   top of this architectural overview, are as follows:

734	   o  A specification of the MPTCP protocol [3], describing the on- and
735	      off-the-wire differences to regular TCP.

737	   o  A specification of a coupled congestion control algorithm [6],
738	      that can be applied to the above protocol while meeting the goals
739	      for such an algorithm as specified in this document.

741	   o  A document [7] that builds upon the application compatibility
742	      issues discussed in this document, explaining in more detail what
743	      if any changes an application may experience through the use of
744	      MPTCP.  This document also provides a proposed API through which
745	      an application can influence the behaviour of the MPTCP protocol,
746	      as specified in the above drafts.

748	7.  Security Considerations

750	   Please see [14] for a threat analysis of Multipath TCP.  The threats
751	   analysed in this companion document are addressed as appropriate in
752	   the protocol design [3].

754	8.  Interactions with Applications

756	   Interactions with applications, including, but not limited to,
757	   performance changes that may be expected, semantic changes, and new
758	   features that may be requested of an API, are presented in [7].

760	9.  Interactions with Middleboxes

762	   As discussed in Section 2.2, it is a goal of MPTCP to be deployable
763	   today and thus compatible with the majority of middleboxes.  This
764	   section summarises the issues that may arise with NATs, firewalls,
765	   proxies, intrusion detection systems, and other middleboxes that, if
766	   not considered in the protocol design, may hinder its deployment.

768	   This section is intended primarily as a description of options and
769	   considerations only.  Protocol-specific solutions to these issues
770	   will be given in the companion documents.

772	   Multipath TCP will be deployed in a network that no longer provides
773	   just basic datagram delivery.  A myriad of middleboxes are deployed
774	   to optimize various perceived problems with the Internet protocols:
775	   NATs primarily address space shortage [11], Performance Enhancing
776	   Proxies (PEPs) optimize TCP for different link characteristics [13],
777	   firewalls [12] and intrusion detection systems try to block malicious
778	   content from reaching a host, and traffic normalizers [15] ensure a
779	   consistent view of the traffic stream to IDSes and hosts.

781	   All these middleboxes optimize current applications at the expense of
782	   future applications.  In effect, future applications must mimic
783	   existing ones if they want to be deployed.  Further, the precise
784	   behaviour of all these middleboxes is not clearly specified, and
785	   implementation errors make matters worse, raising the bar for the
786	   deployment of new technologies.

788	   The following list of middlebox classes documents behaviour that
789	   could impact the use of MPTCP.  This list is used in [3] to describe
790	   the features of the MPTCP protocol that are used to mitigate the
791	   impact of these middlebox behaviours.

793	   o  NATs: Network Address Translators decouple the endpoint's local IP
794	      address from that which is seen in the wider Internet when the
795	      packets are transmitted through a NAT.  This adds complexity, and
796	      reduces the chances of success, when signalling IP addresses.

798	   o  PEPs: Performance Enhancing Proxies, which aim to improve the
799	      performance of protocols over low-performance (e.g. high latency
800	      or high error rate) links.  As such, they may "split" a TCP
801	      connection and behaviour such as proactive ACKing may occur.  As
802	      with NATs, it is no longer guaranteed that one endpoint is
803	      communicating directly with another.

805	   o  Traffic Normalizers: These aim to eliminate ambiguities and
806	      potential attacks at the network level, and amongst other things
807	      are unlikely to permit holes in sequence space.

809	   o  TCP Options: many middleboxes are in a position to drop packets
810	      with unknown TCP options, or strip those options from the packets.

812	   o  Segmentation/Coalescing: middleboxes (or even something as close
813	      to the end host as TCP Segmentation Offloading) may change the
814	      packet boundaries from those which the sender intended.  They may
815	      do this by splitting packets, or coalescing them together.  This
816	      leads to two major impacts: we cannot guarantee where a packet
817	      boundary will be, and we cannot say for sure what a middlebox will
818	      do with TCP options in these cases (they may be repeated, dropped,
819	      or sent only once).

821	   o  Firewalls: on top of preventing incoming connections, firewalls
822	      may also attempt additional protection such as sequence number
823	      randomization.

825	   o  Intrusion Detection Systems: IDSs may look for traffic patterns to
826	      protect a network, and may have false positives with MPTCP and
827	      drop the connections during normal operation.  Future MPTCP-aware
828	      middleboxes will require the ability to correlate the various
829	      paths in use.

831	10.  Acknowledgements

833	   Alan Ford, Costin Raiciu and Sebastien Barre are supported by Trilogy
834	   (http://www.trilogy-project.org), a research project (ICT-216372)
835	   partially funded by the European Community under its Seventh
836	   Framework Program.  The views expressed here are those of the
837	   author(s) only.  The European Commission is not liable for any use
838	   that may be made of the information in this document.

840	11.  Contributors

842	   The authors would like to acknowledge the contributions of Mark
843	   Handley and Bryan Ford to this document.

845	12.  IANA Considerations

847	   None.

849	13.  References
850	13.1.  Normative References

852	   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
853	         Levels", BCP 14, RFC 2119, March 1997.

855	13.2.  Informative References

857	   [2]   Wischik, D., Handley, M., and M. Bagnulo Braun, "The Resource
858	         Pooling Principle", ACM SIGCOMM CCR vol. 38 num. 5, pp. 47-52,
859	         October 2008,
860	         <http://ccr.sigcomm.org/online/files/p47-handleyA4.pdf>.

862	   [3]   Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for
863	         Multipath Operation with Multiple Addresses",
864	         draft-ietf-mptcp-multiaddressed-00 (work in progress),
865	         June 2010.

867	   [4]   Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
868	         September 1981.

870	   [5]   Stewart, R., "Stream Control Transmission Protocol", RFC 4960,
871	         September 2007.

873	   [6]   Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-
874	         Aware Congestion Control", draft-raiciu-mptcp-congestion-01
875	         (work in progress), March 2010.

877	   [7]   Scharf, M. and A. Ford, "MPTCP Application Interface
878	         Considerations", draft-scharf-mptcp-api-01 (work in progress),
879	         March 2010.

881	   [8]   Carpenter, B. and S. Brim, "Middleboxes: Taxonomy and Issues",
882	         RFC 3234, February 2002.

884	   [9]   Carpenter, B., "Internet Transparency", RFC 2775,
885	         February 2000.

887	   [10]  Ford, B. and J. Iyengar, "Breaking Up the Transport Logjam",
888	         ACM HotNets, October 2008.

890	   [11]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address
891	         Translator (Traditional NAT)", RFC 3022, January 2001.

893	   [12]  Freed, N., "Behavior of and Requirements for Internet
894	         Firewalls", RFC 2979, October 2000.

896	   [13]  Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
897	         Shelby, "Performance Enhancing Proxies Intended to Mitigate
898	         Link-Related Degradations", RFC 3135, June 2001.

900	   [14]  Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path
901	         TCP", draft-ietf-mptcp-threat-02 (work in progress),
902	         March 2010.

904	   [15]  Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion
905	         Detection: Evasion, Traffic Normalization, and End-to-End
906	         Protocol Semantics", Usenix Security 2001, 2001, <http://
907	         www.usenix.org/events/sec01/full_papers/handley/handley.pdf>.

909	Appendix A.  Implementation Architecture

911	   This section provides suggestions for an architecture to implement an
912	   extensible, modular multipath transport protocol.

914	A.1.  Functional Separation

916	   This section describes a generic view of the internal implementation
917	   of a Multipath TCP, through which the technical components specified
918	   in the companion documents can fit together.  It shows how an
919	   implementation could be built that permits extensibility between
920	   components without changing the external representation.

922	   We first show the functional decomposition of an MPTCP solution that
923	   is completely contained in the transport layer.  That solution is
924	   described in more detail in [3].  Then we generalize the approach to
925	   allow good extensibility of that solution.

927	A.1.1.  Application to default MPTCP protocol

929	   Although, in the default approach, MPTCP is fully contained in the
930	   transport layer, it can still be divided into two main modules.  One
931	   manages the scheduling of packets as well as congestion control.  The
932	   other one manages the control of paths.  The interface between the
933	   two is handled by means of a Path Index.  As shown in Figure 8, the
934	   Path Manager announces to the MultiPath Scheduler what paths can be
935	   used through path indices, and maintains the mapping between that
936	   value and the particular action that it must apply to use the path
937	   (an example of such a mapping is in Table 1).  In the case of the
938	   built-in Path Manager, the action is to replace an address/port pair
939	   with another one, in such a way that another path is used across the
940	   Internet to forward that packet.

942	            Control plane    <--     |     -->    Data plane
943	   +---------------------------------------------------------------+
944	   |                     Multipath Scheduler (MPS)                 |
945	   +---------------------------------------------------------------+
946	                ^                    |          |
947	                |                    |   [A1,B1,|pA1,pB1]
948	                |For conn_id         |          |
949	                |<A1,B1,pA1,pB1>     |   +-------------+
950	                |Paths 1->4 can be   |   | Data packet |<--Path idx:3
951	                |used.               |   +-------------+   attached
952	                |                    |          |          by MPS
953	                |                    |          V
954	   +--------------------------------------------\------------------+
955	   |                         Path Manager (PM)   \[A1,B1]->[A1,B2] |
956	   +--------------------------------------------------\------------+
957	      /                           \  |                 \
958	     /-----------------------------\ |   /"\    /"\    /"\   /"\
959	     | rewriting table:             ||   | |    | |    | |   | |
960	     | Subflow id  <-->  network_id ||   | |    | |    | |   | |
961	     |                              ||   | |    | |    | |   | |
962	     |    [see table below]         ||   | |    | |    | |   | |
963	     |                              ||   \./    \./    \./   \./
964	     +------------------------------+|  path1  path2  path3 path4

966	      Figure 8: Functional separation of MPTCP in the transport layer

968	   The MultiPath Scheduler only deals with abstract paths, represented
969	   by numbers.  It only sees one address pair throughout the
970	   communication, that we call the connection identifier.  However, the
971	   MultiPath Scheduler must be able to perform per-subflow congestion
972	   control, and thus to distinguish between the subflows.  This leads us
973	   to define a subflow identifier, which consists of the usual transport
974	   identifier extended with the path index:
975	   <addr_src,psrc,addr_dst,pdst,path_index>.  The following options,
976	   described in [3], are managed by the MultiPath Scheduler.

978	   o  MULTIPATH CAPABLE (MPC): Tell the peer that we support MPTCP.
979	      Note that the MPC option also holds a token, which is necessary
980	      only if the built-in Path Manager is used.  In the next section we
981	      describe the generalized case, where the token can be ignored by
982	      the receiver if another path manager is used.

984	   o  DATA SEQUENCE NUMBER (DSN): Identify the position of a set of
985	      bytes in the meta-flow.

987	   o  DATA FIN (DFIN): Terminate a meta-flow.

989	   An implementation MUST use those options even if another Path Manager
990	   than the default one is implemented.

992	   The Path manager applies a particular technology to give the MPS the
993	   possibility to use several paths.  The built-in MPTCP Path Manager
994	   uses multiple IPv4 addresses as its means to influence the forwarding
995	   of packets through the Internet.

997	   When the MPS starts a new connection, the PM chooses a token that
998	   will be used to identify the connection.  This is necessary to allow
999	   the PM to apply the correct path index to incoming packets.  An
1000	   example mapping table is given hereafter:

1002	      +-----------------+---------------+---------+-----------------+
1003	      |  connection id  |   subflow id  |  token  |    Network id   |
1004	      +-----------------+---------------+---------+-----------------+
1005	      | <A1,B1,pA1,pB1> | <conn_id,pi1> | token_1 | <A1,B1,pA1,pB1> |
1006	      | <A1,B1,pA1,pB1> | <conn_id,pi2> | token_1 | <A2,B2,pA1,pB2> |
1007	      | <A1,B1,pA1,pB1> | <conn_id,pi3> | token_1 | <A1,B2,pA1,pB2> |
1008	      | <A1,B1,pA1,pB1> | <conn_id,pi4> | token_1 | <A2,B1,pA1,pB1> |
1009	      | <A1,B1,pA1,pB3> | <conn_id,pi1> | token_2 | <A1,B1,pA1,pB3> |
1010	      | <A1,B1,pA1,pB3> | <conn_id,pi2> | token_2 | <A2,B1,pA1,pB3> |
1011	      +-----------------+---------------+---------+-----------------+

1013	              Table 1: Example mapping table for built-in PM
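
   The rows of Table 1 can be sketched as a pair of lookup tables.  This
   is only an illustration: the type and function names are
   hypothetical, and for simplicity the incoming lookup uses the same
   tuple orientation as the table rather than the swapped
   source/destination of a real incoming packet.

```python
from typing import Dict, NamedTuple, Tuple


class Tuple4(NamedTuple):
    """<source address, destination address, source port, dest port>."""
    src: str
    dst: str
    sport: str
    dport: str


CONN_1 = Tuple4("A1", "B1", "pA1", "pB1")
CONN_2 = Tuple4("A1", "B1", "pA1", "pB3")

# (connection id, path index) -> network id, copied from Table 1.
REWRITE: Dict[Tuple[Tuple4, int], Tuple4] = {
    (CONN_1, 1): Tuple4("A1", "B1", "pA1", "pB1"),
    (CONN_1, 2): Tuple4("A2", "B2", "pA1", "pB2"),
    (CONN_1, 3): Tuple4("A1", "B2", "pA1", "pB2"),
    (CONN_1, 4): Tuple4("A2", "B1", "pA1", "pB1"),
    (CONN_2, 1): Tuple4("A1", "B1", "pA1", "pB3"),
    (CONN_2, 2): Tuple4("A2", "B1", "pA1", "pB3"),
}
TOKENS: Dict[Tuple4, str] = {CONN_1: "token_1", CONN_2: "token_2"}

# Reverse view, used to tag incoming packets with the right path index.
INCOMING = {network_id: key for key, network_id in REWRITE.items()}


def outgoing(conn_id: Tuple4, path_index: int) -> Tuple4:
    """Network id to write into a packet sent on the given path."""
    return REWRITE[(conn_id, path_index)]


def incoming(network_id: Tuple4) -> Tuple[Tuple4, int]:
    """Connection id and path index for a packet seen on the wire."""
    return INCOMING[network_id]
```

   The Multipath Scheduler would only ever see the connection id and the
   path index; the network id stays inside the Path Manager.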

1015	   Table 1 shows an example where two connections are ongoing.  One is
1016	   identified by token_1, the other one with token_2.  Since addresses
1017	   are rewritten by the path manager, the attachment to the right
1018	   connection is achieved thanks to the token, which is used at
1019	   connection establishment and subflow establishment.  It is then
1020	   remembered.  The first column holds the information that is exposed
1021	   to the applications, while the last column shows the information that
1022	   is actually written in packets that will fly through the network.  We
1023	   note that, in addition to the addresses, ports can be rewritten,
1024	   which contributes to supporting NATs.  The table also shows the role
1025	   of the token, which is to attach various combinations of ports and
1026	   addresses to a single connection.  The token is specific to the
1027	   built-in path manager, and can be ignored if another path manager is
1028	   used.  An implementation of the built-in path manager MUST implement
1029	   the following options (defined in more detail in [3]):

1031	   o  Add Address (ADDR): Announce a new address we own

1033	   o  Remove Address (REMADDR): Withdraw a previously announced address

1035	   o  Join Connection (JOIN): Attach a new subflow to the current
1036	      connection

1038	   Those options form the default MPTCP Path Manager, which is based on
1039	   declaring IP addresses and carries control information in TCP
1040	   options.  An implementation of Multipath TCP can use any Path
1041	   Manager, but it MUST be able to fall back to the default PM in case
1042	   the other end does not support the custom PM.  Alternative Path
1043	   Managers may be specified in separate documents in the future.

1045	A.1.2.  Generic architecture for MPTCP

1047	   Now that the functional decomposition has been shown for MPTCP with
1048	   the built-in Path Manager, we show how that architecture can be
1049	   generalized to allow the implementation of other Path Managers for
1050	   MPTCP.  A general overview of the architecture is provided in
1051	   Figure 9.  The Multipath Scheduler (MPS) learns about the number of
1052	   available paths through notifications received from the Path Manager
1053	   (PM).  From the point of view of the Multipath Scheduler, a path is
1054	   just a number, called a Path Index.  Notifications from the PM to the
1055	   MPS MAY contain supporting information about the paths, if relevant,
1056	   so that the MPS can make more intelligent decisions about where to
1057	   route traffic.  When the Multipath Scheduler initiates a
1058	   communication to a new host, it can only send the packets to the
1059	   default path.  But since the Path manager is layered below the MPS,
1060	   it can detect that a new communication is happening, and tell the MPS
1061	   about the other paths it knows about.

1063	            Control plane    <--     |     -->    Data plane
1064	   +---------------------------------------------------------------+
1065	   |                     Multipath Scheduler (MPS)                 |
1066	   +---------------------------------------------------------------+
1067	                ^                    |          |
1068	                |                    |   [A1,B1,|pA1,pB1]
1069	                |                    |          |
1070	                |Announcing new      |   +-------------+
1071	                |paths. (referred    |   | Data packet |<--Path idx:3
1072	                |to as path indices) |   +-------------+   attached
1073	                |                    |          |          by MPS
1074	                |                    |          V
1075	   +--------------------------------------------\------------------+
1076	   |                         Path Manager (PM)   \__________zzzzz  |
1077	   +--------------------------------------------------------\------+
1078	      /                         \    |                       \
1079	     /---------------------------\   |   /"\       /"\       /"\
1080	     | subflow_id        Action  |   |   | |       | |       | |
1081	     |<A1,B1,pA1,pB1,1>  xxxxx   |   |   | |       | |       | |
1082	     |<A1,B1,pA1,pB1,2>  yyyyy   |   |   \./       \./       \./
1083	     |<A1,B1,pA1,pB1,3>  zzzzz   |   |  path1     path2     path3
1084	     +---------------------------+

1086	                 Figure 9: Overview of MPTCP architecture

1088	   From then on, it is possible for the MPS to associate a Path Index
1089	   with its packets, so that the Path Manager can map this Path Index to
1090	   a particular action (see table in the lower left part of Figure 9).
1091	   The particular action depends on the network mechanism used to select
1092	   a path.  Examples are address rewriting, tunnelling or setting a path
1093	   selector value inside the packet.  Note that the Path Index is not
1094	   supposed to be written inside the packet, but instead associated with
1095	   it, internally to the implementation.
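
   A minimal sketch of this Path Index to action mapping follows.  All
   names are hypothetical, packets are modelled as plain dictionaries,
   and address rewriting is used as the only example action.  The Path
   Index is passed alongside the packet rather than stored in it,
   mirroring the note above.

```python
from typing import Callable, Dict, List, Optional

Packet = Dict[str, str]
Action = Callable[[Packet], Packet]


def rewrite_addresses(src: str, dst: str) -> Action:
    """Build an action that replaces the packet's address pair."""
    def action(packet: Packet) -> Packet:
        return {**packet, "src": src, "dst": dst}
    return action


class PathManager:
    """Maps path indices to actions (illustrative sketch only)."""

    def __init__(self) -> None:
        self._actions: Dict[int, Action] = {}

    def add_path(self, path_index: int, action: Action) -> None:
        self._actions[path_index] = action

    def available_paths(self) -> List[int]:
        """Path indices that would be announced to the scheduler."""
        return sorted(self._actions)

    def send(self, packet: Packet, path_index: Optional[int] = None) -> Packet:
        """Return the packet as it would appear on the wire."""
        if path_index is None:
            # Untagged: forwarded as if no multipath support existed.
            return dict(packet)
        return self._actions[path_index](packet)


pm = PathManager()
pm.add_path(1, lambda packet: dict(packet))      # default path
pm.add_path(3, rewrite_addresses("A1", "B2"))    # as drawn in Figure 8
pkt = {"src": "A1", "dst": "B1", "payload": "x"}
on_wire = pm.send(pkt, path_index=3)
```

   Other actions, such as tunnelling or setting a path selector value,
   could be registered in the same way without changing the scheduler.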

1097	   The applicability of the architecture is not limited to the MPTCP
1098	   protocol.  While we define in this document an MPTCP MPS (MPTCP
1099	   Multipath Scheduler), other Multipath Schedulers can be defined.  For
1100	   example, if an appropriate socket interface is designed, applications
1101	   could behave as a Multipath Scheduler and decide where to send any
1102	   particular data.  In this document we concentrate on the MPTCP case,
1103	   however.

1105	A.2.  PM/MPS interface

1107	   The minimal set of requirements for a Path Manager is as follows:

1109	   o  Outgoing untagged packets: Any outgoing packet flowing through the
1110	      Path Manager is either tagged with a path index (by the MPS) or
1111	      untagged.  If it is untagged, the packet is sent normally to the
1112	      Internet, as if no multi-path support were present.  Untagged
1113	      packets can be used to trigger a path discovery procedure; that
1114	      is, a Path Manager can observe untagged packets and decide at
1115	      some point to check whether any path other than the default one
1116	      is usable for the corresponding host pair.  Note that any other
1117	      criteria could be used to decide when to start discovering
1118	      available paths.  Note also that MPS scheduling is not possible
1119	      until the Path Manager has notified the MPS of the available
1120	      paths.  The PM is thus the first entity to come into action.

1122	   o  Outgoing tagged packets: The Path Manager maintains a table
1123	      mapping path indices to actions.  The action is the operation that
1124	      allows using a particular path.  Examples of possible actions are
1125	      route selection, interface selection or packet transformation.
1126	      When the PM sees a packet tagged with a path index, it looks up
1127	      its table to find the appropriate action for that packet.  The tag
1128	      is purely local.  It is removed before the packet is transmitted.

1130	   o  Incoming packets: A Path Manager MUST ensure that each incoming
1131	      path is mapped unambiguously to exactly one outgoing path.  Note
1132	      that this requirement implies that the same number of incoming
1133	      and outgoing paths must be established.  Moreover, a PM MUST tag
1134	      any incoming packet with the same Path Index as the one used for
1135	      the corresponding outgoing path.  This is necessary for MPTCP to
1136	      know which outgoing path is acknowledged by an incoming packet.

1138	   o  Module interface: A PM MUST be able to notify the MPS about the
1139	      number of available paths.  Such notifications MUST contain the
1140	      path indices that are legal for use by the MPS.  If the PM
1141	      decides to stop providing service for one path, it MUST notify the
1142	      MPS about path removal.  Additionally, a PM MAY provide
1143	      complementary path information when available, such as link
1144	      quality or preference level.
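
   The incoming-packet and module-interface requirements above can be
   sketched as follows.  This is one possible reading under stated
   assumptions, not a definitive implementation: the class name
   PathManagerSketch, the callback signature, and the use of an
   opaque "incoming key" to identify an incoming path are all
   hypothetical.  Outgoing dispatch is deliberately left out.

```python
from typing import Callable, Dict, Hashable, List, Optional

# Hypothetical callback type: receives the path indices that are legal
# for use by the MPS, plus optional per-path information.
PathsChanged = Callable[[List[int], Dict[int, dict]], None]


class PathManagerSketch:
    """Illustrative only; the draft does not define this interface."""

    def __init__(self, on_paths_changed: PathsChanged) -> None:
        self._on_paths_changed = on_paths_changed
        self._incoming_to_index: Dict[Hashable, int] = {}
        self._info: Dict[int, dict] = {}

    def add_path(self, index: int, incoming_key: Hashable,
                 info: Optional[dict] = None) -> None:
        # Each incoming path maps to exactly one outgoing path.
        if index in self._info or incoming_key in self._incoming_to_index:
            raise ValueError("path mapping must stay one-to-one")
        self._info[index] = dict(info or {})
        self._incoming_to_index[incoming_key] = index
        self._notify()

    def remove_path(self, index: int) -> None:
        # Stop serving a path and tell the MPS about the removal.
        del self._info[index]
        self._incoming_to_index = {
            key: idx for key, idx in self._incoming_to_index.items()
            if idx != index
        }
        self._notify()

    def tag_incoming(self, incoming_key: Hashable) -> Optional[int]:
        # Same Path Index as the corresponding outgoing path, if known.
        return self._incoming_to_index.get(incoming_key)

    def _notify(self) -> None:
        # Notification carries the currently legal path indices and
        # any complementary information (e.g. a preference level).
        self._on_paths_changed(sorted(self._info), dict(self._info))
```

   Because every notification carries the complete list of legal path
   indices, the MPS in this sketch can derive the number of available
   paths and detect removals from the same callback.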

1146	Appendix B.  Changelog

1148	B.1.  Changes since draft-ietf-mptcp-architecture-00

1150	   o  Added middlebox compatibility discussion (Section 9).

1152	   o  Clarified path identification (TCP 4-tuple) in Section 5.5.

1154	   o  Added brief scenario and diagram to Section 1.3.

1156	Authors' Addresses

1158	   Alan Ford (editor)
1159	   Roke Manor Research
1160	   Old Salisbury Lane
1161	   Romsey, Hampshire  SO51 0ZN
1162	   UK

1164	   Phone: +44 1794 833 465
1165	   Email: alan.ford@roke.co.uk

1167	   Costin Raiciu
1168	   University College London
1169	   Gower Street
1170	   London  WC1E 6BT
1171	   UK

1173	   Email: c.raiciu@cs.ucl.ac.uk

1175	   Sebastien Barre
1176	   Universite catholique de Louvain
1177	   Pl. Ste Barbe, 2
1178	   Louvain-la-Neuve  1348
1179	   Belgium

1181	   Phone: +32 10 47 91 03
1182	   Email: sebastien.barre@uclouvain.be

1184	   Janardhan Iyengar
1185	   Franklin and Marshall College
1186	   Mathematics and Computer Science
1187	   PO Box 3003
1188	   Lancaster, PA  17604-3003
1189	   USA

1191	   Phone: 717-358-4774
1192	   Email: jiyengar@fandm.edu