Internet Engineering Task Force                             A. Ford, Ed.
Internet-Draft                                       Roke Manor Research
Intended status: Informational                                 C. Raiciu
Expires: August 7, 2010                        University College London
                                                                S. Barre
                                                Universite catholique de
                                                                 Louvain
                                                              J. Iyengar
                                           Franklin and Marshall College
                                                                 B. Ford
                                       Max Planck Institute for Software
                                                                 Systems
                                                        February 3, 2010


         Architectural Guidelines for Multipath TCP Development
                    draft-ford-mptcp-architecture-01

Abstract

   Often endpoints are connected by multiple paths, but the nature of
   TCP/IP restricts communications to a single path per socket.
   Resource usage within the network would be more efficient were these
   multiple paths able to be used concurrently.  This should enhance
   user experience through improved resilience to network failure and
   higher throughput.

   This document outlines architectural guidelines for the development
   of a Multipath Transport Protocol, with references to how these
   architectural components come together in the Multipath TCP (MPTCP)
   protocol.  This document also lists certain high-level design
   decisions that provide foundations for the MPTCP design, based upon
   these architectural requirements.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 7, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction
     1.1.  Motivation
     1.2.  Requirements Language
     1.3.  Terminology
     1.4.  Reference Scenario
   2.  Goals
     2.1.  Functional Goals
     2.2.  Compatibility Goals
       2.2.1.  Application Compatibility
       2.2.2.  Network Compatibility
       2.2.3.  Compatibility with other network users
   3.  Multipath Architecture
     3.1.  Decomposing Transport Functions
   4.  High-Level Design Decisions
     4.1.  Sequence Numbering
     4.2.  Reliability
     4.3.  Buffers
     4.4.  Signalling
     4.5.  Path Management
     4.6.  Connection Identification
     4.7.  Network Layer Compatibility
     4.8.  Congestion Control
   5.  Summary
   6.  Security Considerations
   7.  Interactions with Applications
   8.  Interactions with Middleboxes
   9.  Acknowledgements
   10. IANA Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Implementation Architecture
     A.1.  Functional Separation
       A.1.1.  Application to default MPTCP protocol
       A.1.2.  Generic architecture for MPTCP
     A.2.  PM/MPS interface
   Authors' Addresses

1.  Introduction

   Multipath TCP (MPTCP) is a set of extensions to regular TCP [2] that
   allow one TCP connection to be spread across multiple paths.  This
   section describes the motivation behind the design of Multipath TCP.

   Companion documents to this architectural overview are those which
   provide details of the protocol extensions [3], congestion control
   algorithms [4], and application-level considerations [5].  Put
   together, these components build a complete Multipath TCP
   implementation.  Other components, however, could be introduced in
   place of these, in accordance with the architecture specified in this
   document.

   Please note that this document is a work in progress and covers
   several topics, some of which may be more appropriately moved to
   separate documents as this work evolves.

1.1.  Motivation

   As the Internet evolves, demands on Internet resources are
   ever-increasing, but often these resources (in particular, bandwidth)
   cannot be fully utilised due to protocol constraints both on the
   end-systems and within the network.  If these resources could instead
   be used concurrently, end user experience could be greatly improved.
   Such enhancements would also reduce the expenditure on network
   infrastructure that would otherwise be needed to create an equivalent
   improvement in user experience.

   By applying resource pooling [6], these available resources can be
   'pooled' such that they appear as a single logical resource to the
   user.  The purpose of Multipath TCP, therefore, is to provide a TCP
   to the user that is able to make use of multiple available paths.

   The achievement of resource pooling through Multipath TCP brings two
   key benefits:

   o  To increase the resilience of the connectivity by providing
      multiple paths, protecting end hosts from the failure of one.

   o  To increase the efficiency of the resource usage, and thus
      increase the network capacity available to end hosts.

   Multipath TCP as presented in [3] addresses these aims by achieving
   resource pooling through splitting a TCP session to run over multiple
   paths, and presenting it as a single TCP connection to the
   application.  This is not the only way of creating a Multipath TCP,
   however, and as such this architecture is designed so that other
   components can be used to create an alternative solution, while still
   achieving the goals of resource pooling.

1.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

1.3.  Terminology

   Path:  A sequence of links between a sender and a receiver, defined
      in this context by a source and destination address pair.

   Endpoint:  A host either initiating or terminating an MPTCP
      connection.

   Multipath TCP (MPTCP):  A modified version of the TCP [2] protocol
      that supports the simultaneous use of multiple paths between
      endpoints.

   Subflow:  A flow of TCP packets operating over an individual path,
      which forms part of a larger MPTCP connection.

   MPTCP Connection:  A set of one or more subflows combined to provide
      a single Multipath TCP service to an application at an endpoint.
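
   The relationships among these terms can be sketched as a small data
   model.  This Python sketch is illustrative only: the class, field and
   method names are hypothetical, addresses are plain strings, and the
   example addresses come from the documentation ranges.  It encodes only
   what the definitions above state: a path is identified by a source and
   destination address pair, a subflow runs over one path, and an MPTCP
   connection combines one or more subflows.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Path:
    """A path, identified (as in the terminology) by its address pair."""
    src_addr: str
    dst_addr: str


@dataclass
class Subflow:
    """A flow of TCP packets operating over one individual path."""
    path: Path


@dataclass
class MPTCPConnection:
    """One or more subflows offered to the application as one service."""
    subflows: List[Subflow] = field(default_factory=list)

    def add_subflow(self, src_addr: str, dst_addr: str) -> Subflow:
        # Hypothetical helper: open a subflow over the given address pair.
        subflow = Subflow(Path(src_addr, dst_addr))
        self.subflows.append(subflow)
        return subflow

    def paths(self) -> Set[Path]:
        """Distinct paths in use (frozen dataclasses are hashable)."""
        return {sf.path for sf in self.subflows}

    def is_valid(self) -> bool:
        """A connection is a set of one or more subflows."""
        return len(self.subflows) >= 1


# A host with two local addresses reaching a single server address.
conn = MPTCPConnection()
conn.add_subflow("192.0.2.10", "203.0.113.5")
conn.add_subflow("198.51.100.7", "203.0.113.5")
print(conn.is_valid(), len(conn.paths()))  # True 2
```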

1.4.  Reference Scenario

   TBD - would this be useful?

   Endpoints, routes.  Addresses/path selection mechanisms?

2.  Goals

   This section outlines key goals for Multipath TCP.  These are
   separated into functional goals, i.e. the behaviour that MPTCP must
   provide, and compatibility goals, i.e. the constraints on the impact
   MPTCP may have on other entities.

2.1.  Functional Goals

   The fundamental goal of MPTCP is to use multiple paths (which are not
   necessarily entirely disjoint) between two endpoints.  There are two
   primary motivations for this goal, which themselves provide
   functional goals for the design.  These are:

   o  Improve Throughput: To do this, MPTCP MUST support the use of
      multiple paths simultaneously.  MPTCP SHOULD NOT reduce the
      throughput below that of legacy TCP operating on any one of the
      paths.

   o  Improve Resilience: MPTCP MUST support the use of multiple paths
      interchangeably for resilience purposes, by permitting packets to
      be sent and re-sent on any available path.  It follows that, in
      the worst case, the protocol MUST be no less resilient than legacy
      TCP.

   A secondary benefit of resource pooling is that, since MPTCP should
   be able to balance traffic among available paths and respond
   appropriately to congestion, network utility should be optimized in a
   global sense by shifting load away from congested bottlenecks and
   taking advantage of spare capacity wherever it may be located.

   To support the goal of resource pooling as presented above, an MPTCP
   host must be able to detect and utilise multiple paths.  The
   implications for the design of such functions are derived in
   Section 3.

2.2.  Compatibility Goals

   In addition to the functional goals listed above, a Multipath TCP
   must meet a number of compatibility goals in order to support
   deployment in today's Internet.  These goals fall into the following
   categories:

2.2.1.  Application Compatibility

   Application compatibility refers to the appearance of MPTCP to the
   application, both in terms of the API that can be used and the
   expected service model that is provided.

   A multipath-capable equivalent of TCP SHOULD retain backward
   compatibility with existing APIs, so that existing applications can
   use the newer transport merely by upgrading the operating systems of
   the end hosts.  This does not preclude the use of an advanced API to
   permit multipath-aware applications to specify preferences, nor does
   it prevent users from configuring their systems differently from the
   default, for example switching the automatic use of MPTCP on or off.

   A Multipath TCP MUST follow the same service model as TCP:
   byte-oriented, in-order, reliable delivery.  To have a deployable
   protocol, MPTCP SHOULD adhere to the following "do no harm"
   philosophy: Multipath TCP SHOULD behave no worse (throughput-wise)
   than running a single TCP connection over any of its paths.

2.2.2.  Network Compatibility

   In terms of compatibility with the network layer, and devices that
   operate at the network layer, Multipath TCP MUST remain backward
   compatible with the Internet as it exists today, including being able
   to traverse predominant existing middleboxes such as firewalls, NATs,
   and performance enhancing proxies [7].  This affects protocol design
   in two ways.  First, MPTCP must still look like TCP on the wire and
   use established TCP extensions where appropriate.  Second, the
   protocol extensions may need to include functionality to detect and
   traverse such established middleboxes.

2.2.3.  Compatibility with other network users

   As a corollary to both network and application compatibility, the
   architecture must enable new Multipath TCP flows to coexist
   gracefully with existing legacy TCP flows, competing for bandwidth
   neither unduly aggressively nor unduly timidly (unless low-precedence
   operation is specifically requested by the application, such as with
   LEDBAT).  The use of multiple paths MUST NOT significantly harm users
   using single-path TCP at shared bottlenecks, beyond the impact that
   would occur from another single legacy TCP flow.

   Furthermore, MPTCP SHOULD feature automatic negotiation of its use.
   Since Multipath TCP requires support at both endpoints, a host
   supporting it must be able to detect reliably whether the other
   endpoint also supports it, using Multipath TCP if so and otherwise
   automatically falling back to legacy TCP.

3.  Multipath Architecture

   Here we present an architectural view of Multipath TCP.  The
   architecture follows directly from the protocol goals presented
   above, and identifies the practical impact that these functional and
   compatibility goals will have on the design of the MPTCP solution.

   Multipath TCP operates at the transport layer, and its existence
   should be transparent to both higher and lower layers.  It is a set
   of additional features on top of standard TCP, and as such its impact
   on applications should be minimal or non-existent (application
   considerations are discussed in detail in [5]).  Although the
   standard TCP API will still be provided to the application layer,
   multipath-aware applications would be able to use an extended sockets
   API, also specified in [5], to have further influence on the
   behaviour of MPTCP.
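
   The automatic negotiation with fallback to legacy TCP required in
   Section 2.2.3 is what lets MPTCP stay transparent to applications
   when the peer, or the path, does not support it.  From the
   initiator's point of view it can be sketched as the following
   simplified Python model of the opening handshake.  This is not the
   MPTCP protocol: the option label MP_CAPABLE, the representation of
   TCP options as a set of strings, and the strip_options flag (standing
   in for a middlebox that removes unknown options) are assumptions made
   for illustration; the actual signalling is left to the protocol
   specification [3].

```python
from enum import Enum
from typing import FrozenSet

# Placeholder label for the capability option (hypothetical here; the
# real signalling is left to the protocol specification [3]).
MP_OPTION = "MP_CAPABLE"


class Mode(Enum):
    MPTCP = "mptcp"
    LEGACY_TCP = "legacy-tcp"


def syn_options(supports_mptcp: bool) -> FrozenSet[str]:
    """Options the initiator places on its SYN."""
    return frozenset({MP_OPTION}) if supports_mptcp else frozenset()


def synack_options(received: FrozenSet[str], supports_mptcp: bool) -> FrozenSet[str]:
    """The responder echoes the option only if it supports MPTCP too."""
    if supports_mptcp and MP_OPTION in received:
        return frozenset({MP_OPTION})
    return frozenset()


def negotiate(initiator: bool, responder: bool,
              strip_options: bool = False) -> Mode:
    """Mode chosen by the initiator after the SYN / SYN-ACK exchange.

    strip_options models a middlebox that removes unknown TCP options
    in both directions.
    """
    sent = syn_options(initiator)
    at_responder = frozenset() if strip_options else sent
    reply = synack_options(at_responder, responder)
    at_initiator = frozenset() if strip_options else reply
    if initiator and MP_OPTION in at_initiator:
        return Mode.MPTCP
    # Peer or path lacks support: fall back to ordinary single-path TCP.
    return Mode.LEGACY_TCP


for case in [(True, True, False), (True, False, False), (True, True, True)]:
    print(case, negotiate(*case).value)
```

   The design choice illustrated is that MPTCP is used only when the
   capability is positively confirmed; every other outcome, including
   interference by a middlebox, degrades to plain TCP rather than
   failing the connection.  The sketch does not model the responder's
   own fallback decision.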

   The MPTCP layer relies upon (what appear to the network to be)
   standard TCP sessions, termed "subflows", to provide the underlying
   transport per path, and these therefore retain the desired network
   compatibility.  MPTCP as described in [3] carries MPTCP-specific
   information in a TCP-compatible manner, although this mechanism is
   separate from the information being transferred and so could evolve
   in future revisions.  Figure 1 illustrates the layered architecture.

                             +-------------------------------+
                             |          Application          |
   +---------------+         +-------------------------------+
   |  Application  |         |             MPTCP             |
   +---------------+         + - - - - - - - + - - - - - - - +
   |      TCP      |         | Subflow (TCP) | Subflow (TCP) |
   +---------------+         +-------------------------------+
   |      IP       |         |       IP      |      IP       |
   +---------------+         +-------------------------------+

      Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks

   Within the new MPTCP layer, a number of functions are provided that
   can be identified and, if necessary, implemented separately within a
   modular architecture.  These functions are:

   o  Path Management: This is the function to detect and use multiple
      paths between two endpoints.  In the case of the MPTCP design [3],
      this feature is implemented using multiple IP addresses on at
      least one of the endpoints.  Although this does not guarantee path
      diversity, and there may be shared bottlenecks, this is a simple
      mechanism that can be used with no additional features in the
      network.  The path management features of the MPTCP protocol are
      the mechanisms to signal alternative addresses to endpoints, and
      the mechanisms to set up new subflows attached to an existing
      MPTCP connection.

   o  Packet Scheduling: This function breaks the bytestream received
      from the application layer into segments, each of which is
      transmitted on one of the available lower (subflow) layers.
      The MPTCP design makes use of a data sequence mapping, associating
      packets sent on different subflows with a connection-level
      sequence number, thus allowing packets sent on different subflows
      to be correctly re-ordered at the receiver.  The packet scheduler
      depends upon information about the availability of paths exposed
      by the path management component, and then makes use of the
      subflow layers to transmit these packets.

   o  Subflow (single-path TCP) Interface: The subflow layer takes
      segments from the packet scheduling component and transmits them
      over the specified path, ensuring detectable delivery to the
      endpoint.  Detection of delivery is necessary to allow the
      congestion control protocol to attribute packet delivery or loss
      to the right path.  Note that the packet scheduling layer does not
      embed enough information in packets to allow this to happen:
      segments with the same connection-level sequence number can be
      transmitted over multiple paths, e.g. as retransmissions or simply
      to increase redundancy.  MPTCP uses TCP at this layer for network
      compatibility; TCP ensures in-order, reliable delivery.  TCP adds
      its own sequence numbers to the segments; these are used to detect
      and retransmit lost packets.

   o  Congestion Control: This function manages congestion control
      across the subflows.  As specified, this congestion control
      algorithm must ensure that an MPTCP connection does not unfairly
      take more bandwidth than a single-path TCP flow would take at a
      shared bottleneck.  An algorithm to support this is specified in
      [4].

   These functions fit together as follows.  Path Management looks
   after the discovery (and if necessary, initialisation) of multiple
   paths between two endpoints.
   The Packet Scheduler then receives data from the application and
   performs the necessary operations on it (such as adding a data-level
   sequence number) before passing segments to the subflow layer.  The
   subflow layer adds its own sequence numbers, handles
   acknowledgements, and passes the segments to the network.  At the
   receiver, each subflow re-orders its data and passes it to the
   multipath layer, which performs connection-level re-ordering,
   removes the segment boundaries, and delivers the data to the
   application.  Finally, the congestion control component exists as
   part of the packet scheduling, in order to determine which packets
   should be sent at what rate on which subflow.

3.1.  Decomposing Transport Functions

   This section provides a generic view of the above functional
   separation, presenting an extensible model by which transport layer
   functions can be analysed and developed in a modular fashion.

   As shown in Figure 2, we first loosely separate functions within
   transports into "application-oriented" and "network-oriented" parts.
   We use this separation of functions as an architectural framework
   that a multipath transport must recognize, primarily to maintain
   backward compatibility with applications and with the network.  The
   desire for network compatibility will affect design choices at the
   subflow level, while the need for application compatibility will
   primarily affect design choices at the higher, application-facing
   MPTCP layer.

   The upper, application-oriented "Semantic" functions are whatever
   communication abstractions are to be made available to applications,
   including the end-to-end reliability and ordering properties of
   abstractions such as TCP's byte streams or SCTP's message-based
   multi-streams; these functions essentially deal with concerns of
   application-visible semantics.
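
   In MPTCP, the Semantic ordering function is the connection-level
   re-ordering performed by the multipath layer at the receiver: data
   handed up by any subflow is re-ordered by its data-level sequence
   number, segment boundaries are removed, and a contiguous byte stream
   is delivered to the application.  A minimal Python sketch follows,
   assuming each segment arrives already tagged with its data-level
   sequence number (how that number is signalled is left open by this
   architecture); the class name is hypothetical, and buffer limits are
   not modelled.

```python
import heapq
from typing import List, Tuple


class ConnectionReassembler:
    """Connection-level re-ordering at an MPTCP receiver (illustrative).

    Segments handed up by any subflow are keyed by their data-level
    sequence number; contiguous bytes are released to the application
    with the segment boundaries removed.
    """

    def __init__(self, initial_data_seq: int = 0) -> None:
        self.next_data_seq = initial_data_seq        # next byte owed to the app
        self.pending: List[Tuple[int, bytes]] = []   # min-heap of (data_seq, payload)

    def receive(self, data_seq: int, payload: bytes) -> bytes:
        """Accept one segment from any subflow; return newly in-order bytes."""
        heapq.heappush(self.pending, (data_seq, payload))
        out = bytearray()
        while self.pending and self.pending[0][0] <= self.next_data_seq:
            seq, data = heapq.heappop(self.pending)
            end = seq + len(data)
            if end > self.next_data_seq:
                # Deliver only bytes not yet delivered (this also absorbs
                # duplicate copies retransmitted on another subflow).
                out += data[self.next_data_seq - seq:]
                self.next_data_seq = end
        return bytes(out)


r = ConnectionReassembler()
print(r.receive(5, b" world"))  # arrives first, e.g. on subflow B -> b''
print(r.receive(0, b"hello"))   # fills the gap, e.g. on subflow A -> b'hello world'
print(r.receive(0, b"hello"))   # duplicate copy on another subflow -> b''
```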

   We consider the lower part "network-oriented" because it comprises
   functions that, while traditionally located in the ostensibly
   "end-to-end" Transport Layer, have proven in practice to be of great
   concern to network operators and the middleboxes they deploy in the
   network to enforce network usage policies [8][9] or optimize
   communication performance [10].  The network-oriented functions
   include congestion control and other performance-management functions
   ("Flow" performance functions), and endpoint/service identification
   functions (e.g., port numbers) that network operators and their
   middleboxes require to enforce network access and security policies
   ("Endpoint" functions).  These network-oriented transport functions
   are collectively labeled in Figure 2 as "Flow/Endpoint" functions.

                            +-----------------+
                            |   Application   |
   +---------------+   ---> +-----------------+
   |  Application  |  /     |    Semantic     | (Application-Oriented
   +---------------+ <--    |    Functions    |  Functions)
   |   Transport   |        |- - - - - - - - -|
   +---------------+ <--    | Flow / Endpoint | (Network-Oriented
   |    Network    |  \     |    Functions    |  Functions)
   +---------------+   ---> +-----------------+
                            |     Network     |
                            +-----------------+

             Figure 2: Decomposition of Transport Functions

   Following from the discussion above, a multipath transport has to
   manage Flow/Endpoint functions for every path in an end-to-end
   connection, while providing a single transparent interface to the
   application.  In keeping with this architectural worldview, MPTCP
   divides the Transport Layer into two components: the MPTCP part,
   which is responsible for the Semantic functions of global ordering of
   application data and reliability; and the "legacy TCP" part, which
   implements the Flow/Endpoint functions.
   Figure 3 shows how MPTCP implements this architecture:

   +--------------------------+     +-------------------------+
   |        Application       |     |       Application       |
   +--------------------------+     +-------------------------+
   |         Semantic         |     |          MPTCP          |
   | - - - - - - - - - - - - -|     + - - - - - - + - - - - - +
   | Flow/Endpt | Flow/Endpt  |     |     TCP     |    TCP    |
   +--------------------------+     +-------------------------+
   |  Network   |   Network   |     |     IP      |     IP    |
   +--------------------------+     +-------------------------+

           Figure 3: Mapping Transport Architecture to MPTCP

4.  High-Level Design Decisions

   There is seemingly a wide range of choices when designing a multipath
   extension to TCP.  However, the goals discussed earlier in this
   document constrain the possible solutions, leaving relatively little
   choice in many areas.  Here, we outline high-level design choices
   derived from the architectural requirements, and their implications
   for the complete protocol design.

4.1.  Sequence Numbering

   MPTCP uses two layers of sequence spaces: a connection-level sequence
   number, and another sequence number for each subflow.  This permits
   connection-level segmentation and reassembly, and allows the same
   part of the connection-level sequence space to be retransmitted on a
   different subflow, using that subflow's own sequence space.

   The alternative approach would be to use a single connection-level
   sequence number, which gets sent on multiple subflows.  This has two
   problems.  First, the individual subflows will appear to the network
   as TCP sessions with gaps in the sequence space; this in turn may
   upset certain middleboxes such as intrusion detection systems, or
   certain transparent proxies, and would go against the network
   compatibility goal.  Second, when the same packet is sent on multiple
   paths (as happens with retransmissions), the sender cannot attribute
   packet losses or receptions to the correct path.
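
   The first problem, and how two levels of sequence numbers avoid it,
   can be seen in a small simulation.  The Python sketch below is
   illustrative only: the segment plan, the 1000-byte segment size, and
   the class and function names are hypothetical.  Each record it keeps
   holds the same three values as the Data Sequence Mapping described
   below (subflow seq, data seq, length).  With dual sequence spaces, a
   retransmission on subflow A takes the next contiguous subflow
   sequence number, so A looks like an ordinary gap-free TCP flow;
   reusing the connection-level number on the wire leaves a visible
   hole on A.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One record per segment on a subflow: (subflow_seq, data_seq, length).
Record = Tuple[int, int, int]


@dataclass
class Subflow:
    """A subflow with its own sequence space (illustrative model)."""
    next_seq: int = 0
    sent: List[Record] = field(default_factory=list)

    def send(self, data_seq: int, length: int) -> None:
        # Dual-space design: use the next contiguous subflow sequence
        # number, whatever connection-level bytes the segment carries.
        self.sent.append((self.next_seq, data_seq, length))
        self.next_seq += length


def schedule(plan: List[Tuple[int, int, str]], dual_space: bool) -> Dict[str, Subflow]:
    """Send (data_seq, length, subflow_name) segments in the given order."""
    subflows: Dict[str, Subflow] = {}
    for data_seq, length, name in plan:
        sf = subflows.setdefault(name, Subflow())
        if dual_space:
            sf.send(data_seq, length)
        else:
            # Single-space alternative: the connection-level number is
            # reused directly as the on-the-wire sequence number.
            sf.sent.append((data_seq, data_seq, length))
    return subflows


def has_gaps(sent: List[Record]) -> bool:
    """True if consecutive segments on one subflow skip sequence numbers."""
    return any(b[0] != a[0] + a[2] for a, b in zip(sent, sent[1:]))


# Three 1000-byte segments; data 1000-1999 is first sent on subflow B,
# which then fails, so it is retransmitted on subflow A.
plan = [(0, 1000, "A"), (1000, 1000, "B"), (2000, 1000, "A"), (1000, 1000, "A")]
for dual in (True, False):
    a = schedule(plan, dual_space=dual)["A"]
    print("dual" if dual else "single", a.sent, "gaps:", has_gaps(a.sent))
```

   Because the retransmitted bytes also receive a distinct subflow
   sequence number on A, an acknowledgement on A refers unambiguously to
   the copy sent on A, which addresses the second (loss attribution)
   problem as well.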

   The sender must be able to tell the receiver how to reorder the data
   for delivery to the application.  It does so by telling the receiver
   how subflow-level data (carrying subflow sequence numbers) maps to
   the connection level; we refer to this as the Data Sequence Mapping.
   This mapping takes the form (data seq, subflow seq, length), i.e. for
   a given number of bytes (the length), the subflow sequence space
   beginning at the given subflow sequence number maps to the
   connection-level sequence space beginning at the given data sequence
   number.

   This architecture does not mandate a mechanism for signalling such
   information, and it could conceivably be conveyed in various ways.

   One option would be to use existing fields in the TCP segment (such
   as the subflow sequence number and length) and only add the data
   sequence number to each segment, for instance as a TCP option.  This
   is, however, vulnerable to middleboxes that resegment or coalesce
   data, since there is no specified behaviour for coalescing TCP
   options.  Signalling (data seq, length) would still be vulnerable to
   middleboxes that coalesce segments and do not correctly coalesce the
   options.  Because of these potential issues, the current
   specification of MPTCP mandates that the full mapping be sent to the
   other end.

   To reduce the overhead, it would be permissible for the mapping to be
   sent periodically and cover more than a single segment.  It could
   also be omitted entirely while a connection has not yet used more
   than one subflow, since the data-level and subflow-level sequence
   spaces are then the same.

4.2.  Reliability

   MPTCP uses the data sequence mapping and subflow ACKs to decide when
   a connection-level segment has been received.  There are currently
   no connection-level acks; this decision was made to reduce network
   overheads.  This has certain implications for end-to-end semantics.
   In particular, once a packet is acked at subflow level, it cannot be
   discarded from the re-order buffer at the connection level.  In
   effect, acknowledgement at the connection level is not cumulative,
   unlike in TCP.  As such, the emergent behaviour differs from standard
   TCP, where the receiver can simply drop out-of-order segments if
   needed (for instance, due to memory pressure).

   One can conceive of cases where the absence of data-level acks could
   be detrimental to robustness.  Consider a subflow traversing a
   transparent proxy: if the proxy acks a segment and then crashes, the
   sender will not retransmit the lost segment on another subflow, as it
   believes the segment has been received.  The connection grinds to a
   halt despite having other working subflows, and the sender would be
   unable to determine the cause of the problem.  To deal with this
   case, we are considering adding "informative" data-level acks.

   Regarding retransmissions, it must be possible for a packet to be
   retransmitted on a different subflow from the one on which it was
   originally sent.  This is one of MPTCP's core goals, in order to
   maintain integrity during temporary or permanent subflow failure, and
   is enabled by the dual sequence number space.

   The scheduling of retransmissions will have a significant impact on
   MPTCP user experience.  The current MPTCP specification suggests that
   data outstanding on subflows that have timed out should be
   rescheduled for transmission on different subflows.  This behaviour
   aims to minimize disruption when a path breaks, and uses the first
   timeout as the indicator.  A more conservative approach would be to
   wait for the second or third timeout of the same packet.

   When packet loss is detected and corrected with fast retransmit,
   retransmission on different subflows may still be desirable in
   certain cases, for instance to reduce the receive buffer
   requirements.
   However, the lost packets MUST still be sent on the path that lost
   them (this is dictated by the network compatibility goal, since a
   subflow must not show gaps in its sequence space), so some throughput
   will be wasted on duplicate transmissions.  It is unclear at this
   point what the optimal retransmit strategy is.

4.3.  Buffers

   Receive Buffer: ideally, the failure of one subflow should not affect
   the throughput of other working subflows.  However, the receive
   buffer has limited size: if a subflow times out, the other subflows
   will quickly fill the receive buffer with out-of-order data, and the
   connection will stall.  Hence, receive buffer sizing is important for
   both robustness and throughput.

   The smallest receive buffer needed to avoid stalling under any
   circumstances is max(RTO)*sum(BW).  For most multipath connections,
   this is too expensive.  A more reasonable size is proportional to
   max(RTT)*sum(BW), which ensures that subflows do not stall when fast
   retransmit works.  Also, depending on how the implementation behaves,
   an additional sum(RTT*BW) might be needed for the individual re-order
   buffers of the TCP subflows.

   Send Buffer: the smallest send buffer needed is sum(BDP) across all
   paths (i.e. the sum of the bandwidth-delay products, sum(RTT*BW));
   this is to hold data until it is acked at subflow level.  If we did
   not use a subflow-level ack, and relied on a data-level ack instead,
   the send buffer would need to be as big as the receive buffer of the
   connection, max(RTT)*sum(BW).  In practice, senders will often be web
   servers and receivers will often be desktop or mobile devices.  The
   send buffer size matters particularly for servers, which must be able
   to maintain a large number of ongoing connections.

4.4.  Signalling

   Since MPTCP will use regular TCP streams as its transport mechanism,
   an MPTCP connection will also begin as a single TCP stream.
   Nevertheless, it must signal to the peer that it supports MPTCP and
   wishes to use it on this connection.
As such, a TCP Option will be 589 used to transmit this information, since this is the established 590 mechanism for indicating additional functionality on a TCP session. 592 On top of this, however, is signalling required during the operation 593 of an MPTCP session, such as that for reassembly for multiple 594 subflows, and for informing the other endpoint about potential other 595 available addresses. It is not mandated by the architecture in what 596 format this signalling should be transmitted. 598 The current MPTCP protocol proposal suggests the use of TCP options 599 for this signalling, however another approach would be to embed such 600 information in the payload, and use type-length-value (TLV) encoding 601 to separate signalling and payload data. 603 4.5. Path Management 605 Currently, the network does not expose multiple paths between 606 endpoints. Multipath TCP will use multiple addresses at one or both 607 endpoints to get different paths to the destination. The hope is 608 that these paths, whilst not necesarily entirely non-overlapping, 609 will be sufficiently disjoint to allow multipath achieve improved 610 throughput and robustness. 612 Multiple different (source, destination) address pairs will thus be 613 used as path selectors. 615 For increased chance of successfully setting up additional subflows 616 (such as when one end is behind a firewall, NAT, or other restrictive 617 middlebox), either endpoint should be able to add new subflows to a 618 MPTCP connection. 620 The modularity of path management will permit alternative mechanisms 621 to be employed if appropriate in the future. 623 4.6. Connection Identification 625 Since an MPTCP connection may not be bound to a traditional 5-tuple 626 (source addr and port, destination addr and port, protocol number) 627 for the entirity of its existance, it is desirable to provide a new 628 mechanism for connection identification. 
Such an identifier will be useful for MPTCP-aware applications, and will give the MPTCP implementation (and MPTCP-aware middleboxes) a unique identifier with which to associate the multiple subflows.

Therefore, each MPTCP connection should have a connection identifier at each endpoint, which is locally unique within that endpoint. This is analogous to a port number in regular TCP. The manifestation and purpose of such an identifier are outside the scope of this architecture document.

For legacy applications, however, an MPTCP connection will be identified by the 5-tuple of the first TCP subflow. [TBD: This will continue to be the case even if that subflow closes / even if an address disappears / the connection will close in that case unless the extended API has been used / etc].

4.7. Network Layer Compatibility

MPTCP's modifications remain at the TCP layer, although some knowledge of the underlying IP layer is required. MPTCP MUST work with IPv4 and IPv6 interchangeably, i.e. one MPTCP connection may operate over both IPv4 and IPv6 networks.

4.8. Congestion Control

As already stated in the network-layer compatibility requirements, the congestion control algorithms used by an MPTCP implementation must not harm other legacy users on shared bottlenecks. To achieve this, the congestion control algorithms in use on each subflow must be coupled in some way; a proposal for this is given in [4].

5. Summary

This document has provided a summary of the components that have been identified to provide a Multipath TCP solution, and described the high-level design decisions that have been used as the basis of the MPTCP specification.

In addition to this architectural overview, the suite of drafts that specify a complete MPTCP implementation is as follows:

o A specification of the MPTCP protocol [3], describing the on- and off-the-wire differences from regular TCP.

o A specification of a coupled congestion control algorithm [4] that can be applied to the above protocol while meeting the goals for such an algorithm as specified in this document.

o A document [5] that builds upon the application compatibility issues discussed in this document, explaining in more detail what changes, if any, an application may experience through the use of MPTCP. That document also proposes an API through which an application can influence the behaviour of the MPTCP protocol, as specified in the above drafts.

6. Security Considerations

Please see [11] for a threat analysis of Multipath TCP. The threats analysed in this companion document are addressed as appropriate in the protocol design [3].

7. Interactions with Applications

Interactions with applications, including (but not limited to) the performance changes that may be expected, semantic changes, and new features that may be requested of an API, are presented in [5].

8. Interactions with Middleboxes

TBD?

List of issues that may arise with NATs, firewalls, proxies, etc?

This will be an overview only, and protocol-specific solutions to this will be given in the companion documents.

(Not sure we really need this section any more)

9. Acknowledgements

Alan Ford, Costin Raiciu and Sebastien Barre are supported by Trilogy (http://www.trilogy-project.org), a research project (ICT-216372) partially funded by the European Community under its Seventh Framework Program. The views expressed here are those of the author(s) only. The European Commission is not liable for any use that may be made of the information in this document.

10. IANA Considerations

None.

11. References

11.1. Normative References

[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

11.2. Informative References

[2] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[3] Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for Multipath Operation with Multiple Addresses", draft-ford-mptcp-multiaddressed-02 (work in progress), October 2009.

[4] Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-Aware Congestion Control", draft-raiciu-mptcp-congestion-00 (work in progress), October 2009.

[5] Scharf, M. and A. Ford, "MPTCP Application Interface Considerations", draft-scharf-mptcp-api-00 (work in progress), October 2009.

[6] Wischik, D., Handley, M., and M. Bagnulo Braun, "The Resource Pooling Principle", ACM SIGCOMM CCR vol. 38 num. 5, pp. 47-52, October 2008, .

[7] Carpenter, B. and S. Brim, "Middleboxes: Taxonomy and Issues", RFC 3234, February 2002.

[8] Srisuresh, P. and K. Egevang, "Traditional IP Network Address Translator (Traditional NAT)", RFC 3022, January 2001.

[9] Freed, N., "Behavior of and Requirements for Internet Firewalls", RFC 2979, October 2000.

[10] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. Shelby, "Performance Enhancing Proxies Intended to Mitigate Link-Related Degradations", RFC 3135, June 2001.

[11] Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path TCP", draft-bagnulo-mptcp-threat-00 (work in progress), October 2009.

[12] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control", RFC 2581, April 1999.

Appendix A. Implementation Architecture

This section provides suggestions for an architecture to implement an extensible, modular multipath transport protocol.

A.1. Functional Separation

This section describes a generic view of the internal implementation of Multipath TCP, through which the technical components specified in the companion documents can fit together.
It shows how an implementation could be built that permits extensibility between components without changing the external representation.

We first show the functional decomposition of an MPTCP solution that is completely contained in the transport layer. That solution is described in more detail in [3]. We then generalize the approach to make that solution more easily extensible.

A.1.1. Application to default MPTCP protocol

Although MPTCP is fully contained in the transport layer in the default approach, it can still be divided into two main modules. One manages packet scheduling as well as congestion control; the other manages paths. The interface between the two is a Path Index. As shown in Figure 4, the Path Manager announces to the Multipath Scheduler which paths can be used through path indices, and maintains the mapping between each index and the particular action it must apply to use the corresponding path (an example of such a mapping is given in Table 1). In the case of the built-in Path Manager, the action is to replace an address/port pair with another one, so that a different path is used across the Internet to forward the packet.

              Control plane <-- | --> Data plane
+---------------------------------------------------------------+
|                   Multipath Scheduler (MPS)                   |
+---------------------------------------------------------------+
     ^                          |           |
     |                          |    [A1,B1,|pA1,pB1]
     |For conn_id               |           |
     |                          |    +-------------+
     |Paths 1->4 can be         |    | Data packet |<--Path idx:3
     |used.                     |    +-------------+  attached
     |                          |           |         by MPS
     |                          |           V
+-------------------------------------------\-------------------+
|            Path Manager (PM)               \[A1,B1]->[A1,B2]  |
+---------------------------------------------\-----------------+
 /                            \ |              \
/------------------------------\|    /"\   /"\   /"\   /"\
| rewriting table:             ||    | |   | |   | |   | |
| Subflow id <--> network_id   ||    | |   | |   | |   | |
|                              ||    | |   | |   | |   | |
| [see table below]            ||    | |   | |   | |   | |
|                              ||    \./   \./   \./   \./
+------------------------------+|   path1 path2 path3 path4

Figure 4: Functional separation of MPTCP in the transport layer

The Multipath Scheduler only deals with abstract paths, represented by numbers. It sees only one address pair throughout the communication, which we call the connection identifier. However, the Multipath Scheduler must be able to perform per-subflow congestion control, and thus must distinguish between the subflows. This leads us to define a subflow identifier, which consists of the usual transport identifier extended with the path index: . The following options, described in [3], are managed by the Multipath Scheduler.

o MULTIPATH CAPABLE (MPC): Tell the peer that we support MPTCP. Note that the MPC option also holds a token, which is necessary only if the built-in Path Manager is used. In the next section we describe the generalized case, where the token can be ignored by the receiver if another Path Manager is used.

o DATA SEQUENCE NUMBER (DSN): Identify the position of a set of bytes in the meta-flow.

o DATA FIN (DFIN): Terminate a meta-flow.

An implementation MUST use these options even if a Path Manager other than the default one is implemented.

The Path Manager applies a particular technology to give the MPS the ability to use several paths.
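To make the Path Index interface of Figure 4 concrete, the following is a minimal Python sketch, not taken from any MPTCP specification, of how the built-in Path Manager might apply its per-path action to outgoing packets. It assumes the action is a plain address/port rewrite as described above; the class names, fields and port numbers are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Packet:
    # Hypothetical internal packet: addresses/ports plus the purely local
    # path index attached by the MPS (None means "untagged").
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    path_index: Optional[int] = None


@dataclass(frozen=True)
class Rewrite:
    # Action for one path in the built-in PM: the address/port pair
    # that is actually written into packets sent on that path.
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


class BuiltinPathManager:
    def __init__(self) -> None:
        self.actions: Dict[int, Rewrite] = {}  # path index -> action

    def add_path(self, index: int, action: Rewrite) -> None:
        self.actions[index] = action

    def output(self, pkt: Packet) -> Packet:
        if pkt.path_index is None:
            return pkt  # untagged: sent unchanged on the default path
        a = self.actions[pkt.path_index]
        # Apply the rewrite and strip the local tag before transmission.
        return replace(pkt, src_addr=a.src_addr, src_port=a.src_port,
                       dst_addr=a.dst_addr, dst_port=a.dst_port,
                       path_index=None)


# Path 3 rewrites [A1,B1] to [A1,B2], as in Figure 4 (ports hypothetical).
pm = BuiltinPathManager()
pm.add_path(3, Rewrite("A1", 5000, "B2", 80))
sent = pm.output(Packet("A1", 5000, "B1", 80, path_index=3))
print(sent.dst_addr, sent.path_index)  # B2 None
```

The MPS never sees "B2": it only ever handles the connection identifier and the index 3, which is what allows the Path Manager to be swapped without touching the scheduler.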
The built-in MPTCP Path Manager uses multiple IPv4 addresses as its means of influencing the forwarding of packets through the Internet.

When the MPS starts a new connection, the PM chooses a token that will be used to identify the connection. This is necessary to allow the PM to apply the correct path index to incoming packets. An example mapping table is given below:

+-----------------+---------------+---------+-----------------+
| connection id   | subflow id    | token   | Network id      |
+-----------------+---------------+---------+-----------------+
|                 |               | token_1 |                 |
|                 |               | token_1 |                 |
|                 |               | token_1 |                 |
|                 |               | token_1 |                 |
|                 |               | token_2 |                 |
|                 |               | token_2 |                 |
+-----------------+---------------+---------+-----------------+

Table 1: Example mapping table for built-in PM

Table 1 shows an example with two ongoing connections, one identified by token_1 and the other by token_2. Since addresses are rewritten by the Path Manager, each subflow is attached to the right connection by means of the token, which is used at connection establishment and at subflow establishment and is then remembered. The first column holds the information that is exposed to applications, while the last column shows the information that is actually written in packets sent through the network. Note that ports, in addition to addresses, can be rewritten, which helps in supporting NATs. The table also shows the role of the token, which is to attach various combinations of ports and addresses to a single connection. The token is specific to the built-in Path Manager, and can be ignored if another Path Manager is used.
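The following minimal Python sketch illustrates how a mapping table like Table 1 might let the built-in Path Manager attach subflows to connections. It assumes that the token is presented once at connection establishment and once per additional subflow establishment, and that path indices are allocated sequentially from 1; the class, method and connection names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical: a network id is the (local addr, local port, remote addr,
# remote port) tuple actually carried in packets of one subflow.
NetworkId = Tuple[str, int, str, int]


class TokenTable:
    def __init__(self) -> None:
        self.conn_by_token: Dict[str, str] = {}  # token -> connection id
        self.next_index: Dict[str, int] = {}     # token -> next path index
        self.subflows: Dict[NetworkId, Tuple[str, int]] = {}

    def on_new_connection(self, token: str, conn_id: str,
                          net_id: NetworkId) -> int:
        # First subflow: remember the token; it becomes path index 1.
        self.conn_by_token[token] = conn_id
        self.next_index[token] = 2
        self.subflows[net_id] = (conn_id, 1)
        return 1

    def on_new_subflow(self, token: str, net_id: NetworkId) -> int:
        # Additional subflow: the token, not the (rewritten) addresses,
        # identifies the connection the subflow belongs to.
        conn_id = self.conn_by_token[token]
        index = self.next_index[token]
        self.next_index[token] = index + 1
        self.subflows[net_id] = (conn_id, index)
        return index

    def classify(self, net_id: NetworkId) -> Tuple[str, int]:
        # Later packets carry no token: the remembered mapping gives the
        # connection and the path index to tag the incoming packet with.
        return self.subflows[net_id]


table = TokenTable()
table.on_new_connection("token_1", "conn-1", ("A1", 5000, "B1", 80))
table.on_new_subflow("token_1", ("A2", 5001, "B1", 80))
table.on_new_connection("token_2", "conn-2", ("A1", 5002, "C1", 80))
print(table.classify(("A2", 5001, "B1", 80)))  # ('conn-1', 2)
```

The token is consulted only when a subflow is set up; afterwards the per-subflow network id alone is enough, which matches the "it is then remembered" behaviour described above.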
An implementation of the built-in Path Manager MUST implement the following options (defined in more detail in [3]):

o Add Address (ADDR): Announce a new address we own.

o Remove Address (REMADDR): Withdraw a previously announced address.

o Join Connection (JOIN): Attach a new subflow to the current connection.

These options form the default MPTCP Path Manager, which is based on declaring IP addresses and carries control information in TCP options. An implementation of Multipath TCP can use any Path Manager, but it MUST be able to fall back to the default PM if the other end does not support the custom PM. Alternative Path Managers may be specified in separate documents in the future.

A.1.2. Generic architecture for MPTCP

Now that the functional decomposition has been shown for MPTCP with the built-in Path Manager, we show how that architecture can be generalized to allow the implementation of other Path Managers for MPTCP. A general overview of the architecture is provided in Figure 5. The Multipath Scheduler (MPS) learns about the number of available paths through notifications received from the Path Manager (PM). From the point of view of the Multipath Scheduler, a path is just a number, called a Path Index. Notifications from the PM to the MPS MAY contain supporting information about the paths, if relevant, so that the MPS can make more intelligent decisions about where to route traffic. When the Multipath Scheduler initiates a communication to a new host, it can only send packets on the default path. However, since the Path Manager is layered below the MPS, it can detect that a new communication is taking place and tell the MPS about the other paths it knows of.
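As an illustration of the interplay just described, here is a minimal Python sketch in which the Multipath Scheduler sends untagged packets on the default path until the Path Manager, triggered by that untagged traffic, announces the path indices it knows for the host pair. The class names, the static table of known paths and the round-robin choice among paths are hypothetical simplifications; a real MPS would schedule according to its coupled congestion control.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

HostPair = Tuple[str, str]  # hypothetical: (local address, remote address)


class PathManager:
    def __init__(self, known: Dict[HostPair, List[int]],
                 notify: Callable[[HostPair, List[int]], None]) -> None:
        self.known = known        # paths this PM can offer, per host pair
        self.notify = notify      # callback into the MPS
        self.announced: Set[HostPair] = set()

    def output(self, pair: HostPair, path_index: Optional[int]) -> str:
        if path_index is None:
            # Untagged packet: use the default path, and treat it as the
            # trigger to announce any other paths known for this pair.
            if pair in self.known and pair not in self.announced:
                self.announced.add(pair)
                self.notify(pair, self.known[pair])
            return "default"
        return f"path{path_index}"


class Scheduler:
    def __init__(self) -> None:
        self.paths: Dict[HostPair, List[int]] = {}
        self.sent = 0

    def on_paths(self, pair: HostPair, indices: List[int]) -> None:
        self.paths[pair] = list(indices)  # a path is just an index

    def pick(self, pair: HostPair) -> Optional[int]:
        indices = self.paths.get(pair)
        if not indices:
            return None  # no paths announced yet: leave untagged
        self.sent += 1
        return indices[self.sent % len(indices)]  # round-robin stand-in


mps = Scheduler()
pm = PathManager({("A1", "B1"): [1, 2, 3]}, mps.on_paths)
pair = ("A1", "B1")
print(pm.output(pair, mps.pick(pair)))  # default
print(pm.output(pair, mps.pick(pair)))  # path2
```

Because the PM sits below the MPS, the first untagged packet is what reveals the new communication; the MPS needs no knowledge of how paths are discovered.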
              Control plane <-- | --> Data plane
+---------------------------------------------------------------+
|                   Multipath Scheduler (MPS)                   |
+---------------------------------------------------------------+
     ^                          |           |
     |                          |    [A1,B1,|pA1,pB1]
     |                          |           |
     |Announcing new            |    +-------------+
     |paths. (referred          |    | Data packet |<--Path idx:3
     |to as path indices)       |    +-------------+  attached
     |                          |           |         by MPS
     |                          |           V
+-------------------------------------------\-------------------+
|            Path Manager (PM)               \__________zzzzz   |
+---------------------------------------------------------\-----+
 /                         \    |                          \
/---------------------------\   |               /"\   /"\   /"\
| subflow_id   Action       |   |               | |   | |   | |
|              xxxxx        |   |               | |   | |   | |
|              yyyyy        |   |               \./   \./   \./
|              zzzzz        |   |              path1 path2 path3
+---------------------------+

Figure 5: Overview of MPTCP architecture

From then on, the MPS can attach a Path Index to the control structure of its packets (internal to the MPTCP implementation), so that the Path Manager can map this Path Index to the corresponding action (see the table in the lower left part of Figure 5). The particular action depends on the network mechanism used to select a path. Examples are address rewriting, tunnelling, or setting a path selector value inside the packet.

The applicability of the architecture is not limited to the MPTCP protocol. While this document defines an MPTCP MPS (MPTCP Multipath Scheduler), other Multipath Schedulers can be defined. For example, if an appropriate socket interface is designed, applications could act as a Multipath Scheduler and decide where to send any particular data. In this document, however, we concentrate on the MPTCP case.

A.2. PM/MPS interface

The minimal set of requirements for a Path Manager is as follows:

o Outgoing untagged packets: Any outgoing packet flowing through the Path Manager is either tagged with a path index by the MPS or untagged. If it is untagged, the packet is sent normally to the Internet, as if no multipath support were present. Untagged packets can be used to trigger a path discovery procedure; that is, a Path Manager can listen to untagged packets and decide at some point to determine whether any path other than the default one is usable for the corresponding host pair. Note that any other criteria could be used to decide when to start discovering available paths. Note also that MPS scheduling will not be possible until the Path Manager has notified the MPS of the available paths. The PM is thus the first entity to come into action.

o Outgoing tagged packets: The Path Manager maintains a table mapping path indices to actions. The action is the operation that enables the use of a particular path. Examples of possible actions are route selection, interface selection, or packet transformation. When the PM sees a packet tagged with a path index, it looks up its table to find the appropriate action for that packet. The tag is purely local and is removed before the packet is transmitted.

o Incoming packets: A Path Manager MUST ensure that each incoming path is mapped unambiguously to exactly one outgoing path. Note that this requirement implies that the same number of incoming and outgoing paths must be established. Moreover, a PM MUST tag any incoming path with the same Path Index as the one used for the corresponding outgoing path. This is necessary for MPTCP to know which outgoing path is acknowledged by an incoming packet.

o Module interface: A PM MUST be able to notify the MPS about the number of available paths. Such notifications MUST contain the path indices that are legal for use by the MPS. If the PM decides to stop providing service for a path, it MUST notify the MPS about the path removal. Additionally, a PM MAY provide complementary path information when available, such as link quality or preference level.

Authors' Addresses

Alan Ford (editor)
Roke Manor Research
Old Salisbury Lane
Romsey, Hampshire SO51 0ZN
UK

Phone: +44 1794 833 465
Email: alan.ford@roke.co.uk

Costin Raiciu
University College London
Gower Street
London WC1E 6BT
UK

Email: c.raiciu@cs.ucl.ac.uk

Sebastien Barre
Universite catholique de Louvain
Pl. Ste Barbe, 2
Louvain-la-Neuve 1348
Belgium

Phone: +32 10 47 91 03
Email: sebastien.barre@uclouvain.be

Janardhan Iyengar
Franklin and Marshall College
Mathematics and Computer Science
PO Box 3003
Lancaster, PA 17604-3003
USA

Phone: 717-358-4774
Email: jiyengar@fandm.edu

Bryan Ford
Max Planck Institute for Software Systems
Saarbrucken
Germany

Email: baford@mpi-sws.org