Internet Engineering Task Force                             A. Ford, Ed.
Internet-Draft                                       Roke Manor Research
Intended status: Informational                                 C. Raiciu
Expires: April 19, 2011                                       M. Handley
                                               University College London
                                                              J. Iyengar
                                           Franklin and Marshall College
                                                        October 16, 2010


         Architectural Guidelines for Multipath TCP Development
                    draft-ietf-mptcp-architecture-02

Abstract

   Hosts are often connected by multiple paths, but TCP restricts
   communications to a single path per transport connection. Resource
   usage within the network would be more efficient were these multiple
   paths able to be used concurrently. This should enhance user
   experience through improved resilience to network failure and higher
   throughput.

   This document outlines architectural guidelines for the development
   of a Multipath Transport Protocol, with references to how these
   architectural components come together in the development of a
   Multipath TCP protocol. This document lists certain high level
   design decisions that provide foundations for the design of the MPTCP
   protocol, based upon these architectural requirements.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 19, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
     1.2.  Terminology
     1.3.  Reference Scenario
   2.  Goals
     2.1.  Functional Goals
     2.2.  Compatibility Goals
       2.2.1.  Application Compatibility
       2.2.2.  Network Compatibility
       2.2.3.  Compatibility with other network users
       2.2.4.  Security Goals
   3.  An Architectural Basis For MPTCP
   4.  A Functional Decomposition of MPTCP
   5.  High-Level Design Decisions
     5.1.  Sequence Numbering
     5.2.  Reliability and Retransmissions
     5.3.  Buffers
     5.4.  Signalling
     5.5.  Path Management
     5.6.  Connection Identification
     5.7.  Congestion Control
     5.8.  Security
   6.  Interactions with Applications
   7.  Interactions with Middleboxes
   8.  Contributors
   9.  Acknowledgements
   10. IANA Considerations
   11. Security Considerations
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Changelog
     A.1.  Changes since draft-ietf-mptcp-architecture-01
     A.2.  Changes since draft-ietf-mptcp-architecture-00
   Authors' Addresses

1. Introduction

   As the Internet evolves, demands on Internet resources are ever-
   increasing, but often these resources (in particular, bandwidth)
   cannot be fully utilised due to protocol constraints both on the end-
   systems and within the network. If these resources could instead be
   used concurrently, end user experience could be greatly improved.
   Such enhancements would also reduce the necessary expenditure on
   network infrastructure that would otherwise be needed to create an
   equivalent improvement in user experience.

   By the application of resource pooling[3], these available resources
   can be 'pooled' such that they appear as a single logical resource to
   the user. The purpose of a multipath transport, therefore, is to
   make use of multiple available paths, through resource pooling, to
   bring two key benefits:

   o  To increase the resilience of the connectivity by providing
      multiple paths, protecting end hosts from the failure of one.

   o  To increase the efficiency of the resource usage, and thus
      increase the network capacity available to end hosts.

   MPTCP [4] is a set of extensions for TCP[1] that implements a
   multipath transport and achieves these goals by pooling multiple
   paths within a transport connection, transparent to the application.
   Although multihoming and multipath functions are not new to transport
   protocols, MPTCP aims to gain wide-scale deployment by recognising
   the importance of application and network compatibility goals. These
   goals, discussed in detail in Section 2, relate to the appearance of
   MPTCP to the network (so non-MPTCP-aware entities see it as TCP) and
   to the application (through providing an equivalent service to TCP to
   non-MPTCP-aware applications).

   This document has three key purposes: (i) it describes goals for a
   multipath transport - goals that MPTCP is designed to meet; (ii) it
   lays out an architectural basis for MPTCP's design - a discussion
   that applies to other multipath transports as well; and (iii) it
   discusses and documents high-level design decisions made in MPTCP's
   development, and considers their implications.

   Companion documents to this architectural overview are those which
   provide details of the protocol extensions[4], congestion control
   algorithms[5], and application-level considerations[6]. Put
   together, these components specify a complete Multipath TCP design.
   We note that specific components are replaceable with other protocols
   in accordance with the layer and functional decompositions discussed
   in this document.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [2].

1.2. Terminology

   Path:  A sequence of links between a sender and a receiver, defined
      in this context by a source and destination address pair.

   Path Identifier:  Within the context of a multi-addressed multipath
      TCP, a path is defined by the source and destination (address,
      port) pairs (i.e. a 4-tuple).

   Host:  An end host either initiating or terminating an MPTCP
      connection.

   Multipath TCP:  A modified version of the TCP [1] protocol that
      supports the simultaneous use of multiple paths between hosts.

   MPTCP:  The proposed protocol extensions specified in [4] to provide
      a Multipath TCP implementation.

   Subflow:  A flow of TCP packets operating over an individual path,
      which forms part of a larger MPTCP connection.

   MPTCP Connection:  A set of one or more subflows combined to provide
      a single Multipath TCP service to an application at a host.

1.3. Reference Scenario

   The diagram shown in Figure 1 illustrates a typical usage scenario
   for MPTCP. Two hosts, A and B, are communicating with each other.
   These hosts are multi-homed and multi-addressed, providing two
   disjoint connections to the Internet. The addresses on each host are
   referred to as A1, A2, B1 and B2. There are therefore up to four
   different paths between the two hosts: A1-B1, A1-B2, A2-B1, A2-B2.

       +------+           __________           +------+
       |      |A1 ______ (          ) ______ B1|      |
       | Host |--/      (            )      \--| Host |
       |      |        (   Internet   )        |      |
       |  A   |--\______(            )______/--|  B   |
       |      |A2        (__________)        B2|      |
       +------+                                +------+

                  Figure 1: Simple MPTCP Usage Scenario

   The scenario could have any number of addresses (1 or more) on each
   host, so long as the number of paths available between the two hosts
   is 2 or more (i.e. num_addr(A) * num_addr(B) > 1). The paths created
   by these address combinations through the Internet need not be
   entirely disjoint - shared bottlenecks will be addressed by the MPTCP
   congestion controller. Furthermore, the paths through the Internet
   may be interrupted by any number of middleboxes including NATs and
   firewalls.
   Finally, although the diagram refers to the Internet, MPTCP may be
   used over any network where there are multiple paths that could be
   used concurrently.

2. Goals

   This section outlines primary goals that Multipath TCP aims to meet.
   These are broadly broken down into: functional goals, which steer
   services and features that Multipath TCP must provide; and
   compatibility goals, which determine how Multipath TCP should appear
   to entities that interact with it.

2.1. Functional Goals

   In supporting the use of multiple paths, Multipath TCP has the
   following two functional goals.

   o  Improve Throughput: Multipath TCP MUST support the concurrent use
      of multiple paths. To meet the minimum performance incentives for
      deployment, a Multipath TCP connection over multiple paths SHOULD
      achieve no lesser throughput than a single TCP connection over the
      best constituent path.

   o  Improve Resilience: Multipath TCP MUST support the use of multiple
      paths interchangeably for resilience purposes, by permitting
      packets to be sent and re-sent on any available path. It follows
      that, in the worst case, the protocol MUST be no less resilient
      than regular single-path TCP.

   As distribution of traffic among available paths and responses to
   congestion are done in accordance with resource pooling
   principles[3], a secondary effect of meeting these goals is that
   widespread use of Multipath TCP over the Internet should optimize
   overall network utility by shifting load away from congested
   bottlenecks and by taking advantage of spare capacity wherever
   possible.

   Furthermore, Multipath TCP SHOULD feature automatic negotiation of
   its use. A host that supports Multipath TCP, and that requires the
   other host to support it too, must be able to detect reliably whether
   the other host does in fact support the required extensions, using
   them if so, and otherwise automatically falling back to single-path
   TCP.

2.2. Compatibility Goals

   In addition to the functional goals listed above, a Multipath TCP
   must meet a number of compatibility goals in order to support
   deployment in today's Internet. These goals fall into the following
   categories:

2.2.1. Application Compatibility

   Application compatibility refers to the appearance of Multipath TCP
   to the application both in terms of the API that can be used and the
   expected service model that is provided.

   Multipath TCP MUST follow the same service model as TCP [1]: in-
   order, reliable, and byte-oriented delivery. Furthermore, a
   Multipath TCP connection SHOULD provide the application with no worse
   throughput than it would expect from running a single TCP connection
   over any one of its available paths.

   A multipath-capable equivalent of TCP SHOULD retain backward
   compatibility with existing TCP APIs, so that existing applications
   can use the newer transport merely by upgrading the operating systems
   of the end-hosts. This does not preclude the use of an advanced API
   to permit multipath-aware applications to specify preferences, nor
   for users to configure their systems in a different way from the
   default, for example switching on or off the automatic use of
   multipath extensions.

2.2.2. Network Compatibility

   Traditional Internet architecture slots network devices in the
   network layer and lower layers, where the layers above the network
   layer are instantiated only at the end-hosts. While this
   architecture, shown in Figure 2, was initially largely adhered to,
   this layering no longer reflects the "ground truth" in the Internet
   with the proliferation of middleboxes[7]. Middleboxes routinely
   interpose on the transport layer; sometimes even completely
   terminating transport connections, thus leaving the application layer
   as the first real end-to-end layer, as shown in Figure 3.

   +-------------+                                       +-------------+
   | Application |<------------ end-to-end ------------->| Application |
   +-------------+                                       +-------------+
   |  Transport  |<------------ end-to-end ------------->|  Transport  |
   +-------------+   +-------------+   +-------------+   +-------------+
   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
   +-------------+   +-------------+   +-------------+   +-------------+
      End Host           Router            Router           End Host

               Figure 2: Traditional Internet Architecture

   +-------------+                                       +-------------+
   | Application |<------------ end-to-end ------------->| Application |
   +-------------+                     +-------------+   +-------------+
   |  Transport  |<------------------->|  Transport  |<->|  Transport  |
   +-------------+   +-------------+   +-------------+   +-------------+
   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
   +-------------+   +-------------+   +-------------+   +-------------+
                                          Firewall,
      End Host           Router         NAT, or Proxy       End Host

                       Figure 3: Internet Reality

   Middleboxes that interpose on the transport layer result in loss of
   "fate-sharing"[8], that is, they often hold "hard" state that, when
   lost or corrupted, results in loss or corruption of the end-to-end
   transport connection.

   The network compatibility goal requires that the multipath extension
   to TCP retains compatibility with the Internet as it exists today,
   including making reasonable efforts to be able to traverse
   predominant middleboxes such as firewalls, NATs, and performance
   enhancing proxies[7]. This requirement comes from recognizing
   middleboxes as a significant deployment bottleneck for any transport
   that is not TCP, and constrains Multipath TCP to appear as TCP does
   on the wire and to use established TCP extensions where necessary.
   To ensure end-to-endness of the transport, we further require
   Multipath TCP to preserve fate-sharing without making any assumptions
   about middlebox behavior.

   A detailed analysis of middlebox behaviour and the impact on the
   Multipath TCP architecture is presented in Section 7. In addition,
   network compatibility must be retained to the extent that Multipath
   TCP MUST fall back to regular TCP if there are insurmountable
   incompatibilities for the multipath extension on a path.

   MPTCP's modifications remain at the transport layer, although some
   knowledge of the underlying network layer is required. MPTCP SHOULD
   work with IPv4 and IPv6 interchangeably, i.e. one MPTCP connection
   may operate over both IPv4 and IPv6 networks.

2.2.3. Compatibility with other network users

   As a corollary to both network and application compatibility, the
   architecture must enable new Multipath TCP flows to coexist
   gracefully with existing single-path TCP flows, competing for
   bandwidth neither unduly aggressively nor unduly timidly (unless low-
   precedence operation is specifically requested by the application,
   such as with LEDBAT). The use of multiple paths MUST NOT unduly harm
   users using single-path TCP at shared bottlenecks, beyond the impact
   that would occur from another single-path TCP flow. Multiple
   Multipath TCP flows on a shared bottleneck MUST share bandwidth
   between each other with fairness similar to that which occurs
   between single-path TCP flows at a shared bottleneck.

2.2.4. Security Goals

   The extension of TCP with multipath capabilities will bring with it a
   number of new threats, analysed in detail in [9]. The security goal
   for Multipath TCP is to provide a service no less secure than
   regular, single-path TCP. This will be achieved through a
   combination of existing TCP security mechanisms (potentially modified
   to align with the Multipath TCP extensions) and of protection against
   the new multipath threats identified. The design decisions derived
   from this goal are presented in Section 5.8.

3. An Architectural Basis For MPTCP

   We now present one possible transport architecture that we believe
   can effectively support MPTCP's goals. The new Internet model
   described here is based on ideas proposed earlier in Tng ("Transport
   next-generation") [10]. While by no means the only possible
   architecture supporting multipath transport, Tng incorporates many
   lessons learned from previous transport research and development
   practice, and offers a strong starting point from which to consider
   the extant Internet architecture and its bearing on the design of any
   new Internet transports or transport extensions.

      +------------------+
      |   Application    |
      +------------------+  ^ Application-oriented transport
      |                  |  |   functions (Semantic Layer)
      + - - Transport - -+ ----------------------------------
      |                  |  | Network-oriented transport
      +------------------+  v functions (Flow+Endpoint Layer)
      |     Network      |
      +------------------+
        Existing Layers             Tng Decomposition

             Figure 4: Decomposition of Transport Functions

   Tng loosely splits the transport layer into "application-oriented"
   and "network-oriented" layers, as shown in Figure 4. The
   application-oriented "Semantic" layer implements functions driven
   primarily by concerns of supporting and protecting the application's
   end-to-end communication, while the network-oriented "Flow+Endpoint"
   layer implements functions such as endpoint identification (using
   port numbers) and congestion control. These network-oriented
   functions, while traditionally located in the ostensibly "end-to-end"
   Transport layer, have proven in practice to be of great concern to
   network operators and the middleboxes they deploy in the network to
   enforce network usage policies[11] [12] or optimize communication
   performance[13].
   Figure 5 shows how middleboxes interact with different layers in this
   decomposed model of the transport layer: the application-oriented
   layer operates end-to-end, while the network-oriented layer operates
   "segment-by-segment" and can be interposed upon by middleboxes.

   +-------------+                                       +-------------+
   | Application |<------------ end-to-end ------------->| Application |
   +-------------+                                       +-------------+
   |  Semantic   |<------------ end-to-end ------------->|  Semantic   |
   +-------------+   +-------------+   +-------------+   +-------------+
   |Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|
   +-------------+   +-------------+   +-------------+   +-------------+
   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
   +-------------+   +-------------+   +-------------+   +-------------+
                        Firewall         Performance
      End Host           or NAT        Enhancing Proxy      End Host

             Figure 5: Middleboxes in the new Internet model

   MPTCP's architectural design follows Tng's decomposition as shown in
   Figure 6. MPTCP, which provides application compatibility through
   the preservation of TCP-like semantics of global ordering of
   application data and reliability, is an instantiation of the
   "application-oriented" Semantic layer; whereas the subflow TCP
   component, which provides network compatibility by appearing and
   behaving as a TCP flow in the network, is an instantiation of the
   "network-oriented" Flow+Endpoint layer.

   +--------------------------+    +-------------------------------+
   |       Application        |    |          Application          |
   +--------------------------+    +-------------------------------+
   |         Semantic         |    |             MPTCP             |
   |------------+-------------|    + - - - - - - - + - - - - - - - +
   | Flow+Endpt | Flow+Endpt  |    | Subflow (TCP) | Subflow (TCP) |
   +------------+-------------+    +---------------+---------------+
   |  Network   |   Network   |    |      IP       |      IP       |
   +------------+-------------+    +---------------+---------------+

                     Figure 6: MPTCP mapping to Tng

   As a protocol extension to TCP, MPTCP thus explicitly acknowledges
   middleboxes in its design, and specifies a protocol that operates at
   two scales: the MPTCP component operates end-to-end, while it allows
   the TCP component to operate segment-by-segment.

4. A Functional Decomposition of MPTCP

   MPTCP, as described in [4], makes use of (what appear to the network
   to be) standard TCP sessions, termed "subflows", to provide the
   underlying transport per path, and as such these retain the network
   compatibility desired. MPTCP-specific information is carried in a
   TCP-compatible manner, although this mechanism is separate from the
   actual information being transferred so could evolve in future
   revisions. Figure 7 illustrates the layered architecture.

                                   +-------------------------------+
                                   |          Application          |
      +---------------+            +-------------------------------+
      |  Application  |            |             MPTCP             |
      +---------------+            + - - - - - - - + - - - - - - - +
      |      TCP      |            | Subflow (TCP) | Subflow (TCP) |
      +---------------+            +-------------------------------+
      |      IP       |            |      IP       |      IP       |
      +---------------+            +-------------------------------+

     Figure 7: Comparison of Standard TCP and MPTCP Protocol Stacks

   Situated below the application, the MPTCP extension in turn manages
   multiple TCP subflows below it.
   In order to do this, it must implement the following functions:

   o  Path Management: This is the function to detect and use multiple
      paths between two hosts. In the case of the MPTCP design [4],
      this feature is implemented using multiple IP addresses at one or
      both of the hosts. The path management features of the MPTCP
      protocol are the mechanisms to signal alternative addresses to
      hosts, and mechanisms to set up new subflows joined to an existing
      MPTCP connection.

   o  Packet Scheduling: This function breaks the bytestream received
      from the application into segments to be transmitted on one of the
      available lower subflows. The MPTCP design makes use of a data
      sequence mapping, associating segments sent on different subflows
      to a connection-level sequence numbering, thus allowing segments
      sent on different subflows to be correctly re-ordered at the
      receiver. The packet scheduler is dependent upon information
      about the availability of paths exposed by the path management
      component, and then makes use of the subflows to transmit queued
      segments.

   o  Subflow (single-path TCP) Interface: A subflow component takes
      segments from the packet-scheduling component and transmits them
      over the specified path, ensuring detectable delivery to the host.
      MPTCP uses TCP underneath for network compatibility; TCP ensures
      in-order, reliable delivery. TCP adds its own sequence numbers to
      the segments; these are used to detect and retransmit lost packets
      at the subflow layer. The connection-level sequence numbering
      from the packet scheduling component allows re-ordering of the
      entire bytestream.

   o  Congestion Control: This function coordinates congestion control
      across the subflows. As specified, this congestion control
      algorithm MUST ensure that an MPTCP connection does not unfairly
      take more bandwidth than a single-path TCP flow would take at a
      shared bottleneck.
      An algorithm to support this is specified in [5].

   These functions fit together as follows. The Path Management
   function looks after the discovery (and if necessary, initialisation)
   of multiple paths between two hosts. The Packet Scheduler then
   receives a stream of data from the application destined for the
   network, and undertakes the necessary operations on it (such as
   segmenting the data into connection-level segments, and adding a
   connection-level sequence number) before sending it on to a subflow.
   The subflow then adds its own sequence number and acks to each
   segment, and passes it to the network. The receiving subflow
   re-orders data (if necessary) and passes it to the packet scheduling
   component, which performs connection-level re-ordering, and sends the
   data stream to the application. Finally, the congestion control
   component exists as part of the packet scheduling, in order to
   schedule which packets should be sent at what rate on which subflow.

5. High-Level Design Decisions

   There is seemingly a wide range of choices when designing a multipath
   extension to TCP. However, the goals as discussed earlier in this
   document constrain the possible solutions, leaving relatively little
   choice in many areas. Here, we outline high-level design choices
   that draw from the architectural basis discussed earlier in
   Section 3, and their implications for the MPTCP design [4].

5.1. Sequence Numbering

   MPTCP uses two levels of sequence spaces: a connection-level sequence
   number, and another sequence number for each subflow. This permits
   connection-level segmentation and reassembly, and retransmission of
   the same part of connection-level sequence space on different
   subflow-level sequence space.

   The alternative approach would be to use a single connection-level
   sequence number, which gets sent on multiple subflows.
   This has two problems: first, the individual subflows will appear to
   the network as TCP sessions with gaps in the sequence space; this in
   turn may upset certain middleboxes such as intrusion detection
   systems, or certain transparent proxies, and would thus go against
   the network compatibility goal. Second, the sender would not be able
   to attribute packet losses or receptions to the correct path when the
   same packet is sent on multiple paths (i.e. in the case of
   retransmissions).

   The sender must be able to tell the receiver how to reassemble the
   data, for delivery to the application. In order to achieve this, the
   receiver must determine how subflow-level data (carrying subflow
   sequence numbers) maps at the connection level. We refer to this as
   the Data Sequence Mapping. This mapping takes the form (data seq,
   subflow seq, length), i.e. for a given number of bytes (the length),
   the subflow sequence space beginning at the given sequence number
   maps to the connection-level sequence space (beginning at the given
   data seq number).

   This architecture does not mandate a mechanism for signalling the
   Data Sequence Mapping, and it could conceivably have various sources.

   One option would be to use existing fields in the TCP segment (such
   as subflow seqno, length) and only add the data sequence number to
   each segment, for instance as a TCP option. This is, however,
   vulnerable to middleboxes that resegment or assemble data, since
   there is no specified behaviour for coalescing TCP options. If one
   signalled (data seqno, length), this would still be vulnerable to
   middleboxes that coalesce segments and do not understand MPTCP
   signalling so do not correctly rewrite the options.
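
   As an illustration of the (data seq, subflow seq, length) form
   described above, the following is a minimal sketch of how a receiver
   might translate subflow-level data into connection-level order. It
   is not taken from the protocol specification [4]: the names
   DataSequenceMapping, to_data_seq and reassemble are hypothetical, and
   sequence-number wraparound and gaps in the received data are ignored
   for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSequenceMapping:
    """Hypothetical (data seq, subflow seq, length) triple."""
    data_seq: int     # start of the range in connection-level space
    subflow_seq: int  # start of the range in subflow-level space
    length: int       # number of bytes the mapping covers

    def to_data_seq(self, subflow_seq: int) -> int:
        """Translate one subflow sequence number to connection level."""
        offset = subflow_seq - self.subflow_seq
        if not 0 <= offset < self.length:
            raise ValueError("sequence number outside this mapping")
        return self.data_seq + offset


def reassemble(received):
    """Rebuild the bytestream from (mapping, subflow_seq, payload) tuples.

    Segments may arrive in any order and from any subflow. This sketch
    assumes the received data leaves no gaps at the connection level.
    """
    placed = {}
    for mapping, subflow_seq, payload in received:
        if not payload:
            continue
        start = mapping.to_data_seq(subflow_seq)
        # Check that the last payload byte is also covered by the mapping.
        mapping.to_data_seq(subflow_seq + len(payload) - 1)
        for i, byte in enumerate(payload):
            placed[start + i] = byte
    return bytes(placed[k] for k in sorted(placed))
```

   Because each mapping carries its own subflow starting point and
   length, one mapping can cover several TCP segments, and a segment
   that has been split in transit can still be placed correctly as long
   as its subflow sequence number falls inside the mapped range.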

   Because of these potential issues, the design decision taken in the
   MPTCP protocol [4] is that whenever a mapping for subflow data needs
   to be conveyed to the other host, all three pieces of data (data seq,
   subflow seq, length) must be sent. To reduce the overhead, it would
   be permissible for the mapping to be sent periodically and cover more
   than a single segment. Further experimentation is required to
   determine what tradeoffs exist regarding the frequency at which
   mappings should be sent. The mapping could also be omitted entirely
   before a connection uses more than one subflow, since the data-level
   and subflow-level sequence spaces are then the same.

5.2. Reliability and Retransmissions

   MPTCP features acknowledgements at connection-level as well as
   subflow-level acknowledgements, in order to provide a robust service
   to the application.

   Under normal behaviour, MPTCP can use the data sequence mapping and
   subflow ACKs to decide when a connection-level segment was received.
   The transmission of TCP ACKs for a subflow is handled entirely at
   the subflow level, in order to maintain TCP semantics and trigger
   subflow-level retransmissions. This has certain implications on end-
   to-end semantics. First, once a packet is acked at the subflow
   level, it cannot be discarded in the re-order buffer at the
   connection level. Secondly, unlike in standard TCP, a receiver
   cannot simply drop out-of-order segments if needed (for instance, due
   to memory pressure). Under certain circumstances, therefore, it may
   be desirable to be able to drop packets after acknowledgement on the
   subflow but before delivery to the application, and this can be
   facilitated by a connection-level acknowledgement.

   Furthermore, it is possible to conceive of some cases where
   connection-level acknowledgements could improve robustness.
Consider a subflow traversing a transparent proxy: if the proxy acks a
segment and then crashes, the sender will not retransmit the lost
segment on another subflow, as it thinks the segment has been received.
The connection grinds to a halt despite having other working subflows,
and the sender would be unable to determine the cause of the problem.
An example situation where this may occur would be mobility between
wireless access points, each of which operates a transport-level proxy.
Finally, as an optimisation, it may be feasible for a connection-level
acknowledgement to be transmitted over the shortest RTT path,
potentially reducing send buffer requirements (see Section 5.3).

Therefore, to provide a fully robust multipath TCP solution, MPTCP
SHOULD feature explicit connection-level acknowledgements, in addition
to subflow-level acknowledgements. A connection-level acknowledgement
would only be required in order to signal when the receive window moves
forward; the heuristics for using such a signal are discussed in more
detail in the protocol specification [4].

Regarding retransmissions, it MUST be possible for a packet to be
retransmitted on a different subflow to that on which it was originally
sent. This is one of MPTCP's core goals, in order to maintain integrity
during temporary or permanent subflow failure, and this is enabled by
the dual sequence number space.

The scheduling of retransmissions will have significant impact on MPTCP
user experience. The current MPTCP specification suggests that data
outstanding on subflows that have timed out should be rescheduled for
transmission on different subflows. This behaviour aims to minimize
disruption when a path breaks, and uses the first timeout as an
indicator. More conservative versions would be to use second or third
timeouts for the same packet.
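
The rescheduling behaviour described above can be sketched as follows.
This is a minimal illustration under stated assumptions, not the
behaviour of any implementation: the Subflow type, the on_timeout
function, the subflow names, and the "pick the first alternative"
policy are all hypothetical. The threshold parameter models the choice
between reacting to the first timeout (1) and the more conservative
second or third timeout (2 or 3).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Subflow:
    """Hypothetical per-subflow state: timeouts seen per outstanding data seq."""
    name: str
    timeouts: Dict[int, int] = field(default_factory=dict)


def on_timeout(subflow: Subflow, data_seq: int, others: List[Subflow],
               threshold: int = 1) -> Optional[Subflow]:
    """Record a retransmission timeout for data_seq on subflow.

    Returns the subflow on which the data should additionally be sent once
    the timeout count has reached threshold, or None if it is not
    rescheduled. The data stays outstanding on the original subflow.
    """
    count = subflow.timeouts.get(data_seq, 0) + 1
    subflow.timeouts[data_seq] = count
    if count < threshold or not others:
        return None
    # Placeholder policy (an assumption): pick the first alternative subflow.
    target = others[0]
    target.timeouts.setdefault(data_seq, 0)  # now outstanding there as well
    return target


# Hypothetical subflows; with threshold=2 only the second timeout reschedules.
wifi, cellular = Subflow("wifi"), Subflow("cellular")
first = on_timeout(wifi, data_seq=20000, others=[cellular], threshold=2)
second = on_timeout(wifi, data_seq=20000, others=[cellular], threshold=2)
print(first, second.name if second else None)
```

A lower threshold reacts faster to a broken path at the cost of more
duplicate transmissions when the timeout was spurious.
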

Typically, fast retransmit on an individual subflow will not trigger
retransmission on another subflow, although this may still be desirable
in certain cases, for instance to reduce the receive buffer
requirements. However, in all cases with retransmissions on different
subflows, the lost packets SHOULD still be sent on the path that lost
them. This is currently believed to be necessary to maintain subflow
integrity, as per the network compatibility goal. By doing this,
throughput will be wasted, and it is unclear at this point what the
optimal retransmit strategy is.

Large-scale experiments are therefore required in order to determine
the most appropriate retransmission strategy, and recommendations will
be refined once more information is available.

5.3. Buffers

To ensure in-order delivery, Multipath TCP must use a connection-level
receive buffer, where segments are placed until they are in order and
can be read by the application.

In regular, single-path TCP, it is usually recommended to set the
receive buffer to 2*BDP (Bandwidth-Delay Product, i.e. BDP = BW*RTT,
where BW = Bandwidth and RTT = Round-Trip Time). One BDP allows for
reordering of segments by the network. The other BDP allows the
connection to continue during fast retransmit: when a segment is fast
retransmitted, the receiver must be able to store incoming data during
one more RTT.

For Multipath TCP, the story is a bit more complicated. The ultimate
goal is that a subflow packet loss or subflow failure should not affect
the throughput of other working subflows; the receiver should have
enough buffering to store all data until the missing packet is
re-transmitted and reaches the destination.
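
As a rough illustration of the sizing arithmetic in this section, the
sketch below computes the single-path 2*BDP rule together with the two
multipath bounds that the following paragraphs derive,
sum(BW_i)*RTO_max and 2*sum(BW_i)*RTT_max. The function names and the
choice of units (bits per second, milliseconds, bytes) are assumptions
made for the example, and the RTO value used is hypothetical.

```python
from typing import List, Tuple

# (bandwidth in bits/s, RTT in milliseconds); units are an assumption
Path = Tuple[int, int]


def single_path_buffer(bw_bps: int, rtt_ms: int) -> float:
    """2*BDP for one TCP connection, in bytes."""
    return 2 * bw_bps * rtt_ms / 8000


def recommended_mptcp_buffer(paths: List[Path]) -> float:
    """2*sum(BW_i)*RTT_max, in bytes (avoids stalls during fast retransmit)."""
    total_bw = sum(bw for bw, _ in paths)
    rtt_max = max(rtt for _, rtt in paths)
    return 2 * total_bw * rtt_max / 8000


def timeout_safe_mptcp_buffer(paths: List[Path], rto_max_ms: int) -> float:
    """sum(BW_i)*RTO_max, in bytes (avoids stalls even on a subflow timeout)."""
    total_bw = sum(bw for bw, _ in paths)
    return total_bw * rto_max_ms / 8000


# The two example paths used in this section: 100 Mb/s at 10 ms RTT and
# 1 Mb/s at 1000 ms RTT. The 3000 ms RTO is a hypothetical value.
paths = [(100_000_000, 10), (1_000_000, 1000)]
print(single_path_buffer(100_000_000, 10))
print(recommended_mptcp_buffer(paths))
print(timeout_safe_mptcp_buffer(paths, rto_max_ms=3000))
```

Taken literally, 2*sum(BW_i)*RTT_max gives roughly 25 MB for these two
paths, against 250 kB for the fast path on its own; the 12.5 MB figure
quoted later in this section is close to sum(BW_i)*RTT_max without the
factor of two. Either way, the slow path's RTT dominates the result.
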

The worst case scenario would be when the subflow with the highest
RTT/RTO (Round-Trip Time or Retransmission TimeOut) experiences a
timeout; in that case the receiver has to buffer data from all subflows
for the duration of the RTO. Thus, the smallest connection-level
receive buffer that would be needed to avoid stalling with subflow
failures is sum(BW_i)*RTO_max, where BW_i = Bandwidth for each subflow
and RTO_max is the largest RTO across all subflows.

This is an order of magnitude more than the receive buffer required for
a single connection, and is probably too expensive for practical
purposes. A more sensible requirement is to avoid stalls in the absence
of timeouts. Therefore, the RECOMMENDED receive buffer is
2*sum(BW_i)*RTT_max, where RTT_max is the largest RTT across all
subflows. This buffer sizing ensures subflows do not stall when fast
retransmit is triggered on any subflow.

The resulting buffer size should be small enough for practical use.
However, there may be extreme cases where fast, high throughput paths
(e.g. 100Mb/s, 10ms RTT) are used in conjunction with slow paths (e.g.
1Mb/s, 1000ms RTT). In that case the required receive buffer would be
12.5MB, which is likely too big. In these cases a Multipath TCP
scheduler SHOULD use only the fast path, potentially falling back to
the slow path if the fast path fails.

Send Buffer: The RECOMMENDED send buffer is the same size as the
recommended receive buffer, i.e. 2*sum(BW_i)*RTT_max. This is because
the sender must store locally the segments sent but unacknowledged by
the connection-level ACK. The send buffer size matters particularly for
hosts that maintain a large number of ongoing connections. If the
required send buffer is too large, a host can choose to only send data
on the fast subflows, using the slow subflows only in cases of failure.

5.4. Signalling

Since MPTCP uses TCP as its subflow transport mechanism, an MPTCP
connection will also begin as a single TCP connection. Nevertheless, it
must signal to the peer that it supports MPTCP and wishes to use it on
this connection. As such, a TCP Option will be used to transmit this
information, since this is the established mechanism for indicating
additional functionality on a TCP session.

In addition, further signalling is required during the operation of an
MPTCP session, such as that for reassembly for multiple subflows, and
for informing the other host about potential other available addresses.
The architecture does not mandate the format in which this signalling
is transmitted.

The MPTCP protocol design [4] continues to use TCP Options for this
signalling. This has been chosen as the mechanism most fitting in with
the goals as specified in Section 2. With this mechanism, the
signalling required to operate MPTCP is transported separately from the
data, allowing it to be created and processed separately from the data
stream, and retaining architectural compatibility with network
entities.

5.5. Path Management

Currently, the network does not expose multiple paths between hosts.
Multipath TCP will use multiple addresses at one or both hosts to infer
different paths across the network. The hope is that these paths,
whilst not necessarily entirely non-overlapping, will be sufficiently
disjoint to allow multipath to achieve improved throughput and
robustness. The use of multiple IP addresses is a simple mechanism that
requires no additional features in the network.

Multiple different (source, destination) address pairs will thus be
used as path selectors. Each path will be identified by a TCP 4-tuple
(i.e.
source address, destination address, source port, destination port),
thus allowing the extension of MPTCP to use such 4-tuples as path
selectors if the network will route different ports over different
paths (which may be the case with technologies such as Equal Cost
MultiPath (ECMP) routing, e.g. [14]).

For increased chance of successfully setting up additional subflows
(such as when one end is behind a firewall, NAT, or other restrictive
middlebox), either host SHOULD be able to add new subflows to an MPTCP
connection. MPTCP MUST be able to handle paths that appear and
disappear during the lifetime of a connection (for example, through the
activation of an additional network interface).

The modularity of path management will permit alternative mechanisms to
be employed if appropriate in the future.

5.6. Connection Identification

Since an MPTCP connection may not be bound to a traditional 5-tuple
(source addr and port, destination addr and port, protocol number) for
the entirety of its existence, it is desirable to provide a new
mechanism for connection identification. This will be useful for
MPTCP-aware applications, and for the MPTCP implementation (and
MPTCP-aware middleboxes) to have a unique identifier with which to
associate the multiple subflows.

Therefore, each MPTCP connection requires a connection identifier at
each host, which is locally unique within that host. In many ways, this
is analogous to a port number in regular TCP. The manifestation and
purpose of such an identifier is out of the scope of this architecture
document.

Legacy applications will not, however, have access to this identifier
and in such cases an MPTCP connection will be identified by the 5-tuple
of the first TCP subflow. It is out of the scope of this document,
however, to define the behaviour of the MPTCP implementation if the
first TCP subflow later fails.
If there are MPTCP-unaware applications that make assumptions about
continued existence of the initial address pair, their behaviour could
be disrupted by carrying on regardless. It is expected that this is a
very small, possibly negligible, set of applications, however. In the
case of applications that have used an existing API call to bind to a
specific address or interface, the MPTCP extension MUST NOT be used,
since the applications are indicating a clear choice of path to use and
thus will have expectations of behaviour that must be maintained, in
order to adhere to the application compatibility goals.

Since the requirements of applications are not clear at this stage,
however, it is as yet unconfirmed what the best behaviour is. It will
be an implementation-specific solution, however, and as such the
behaviour is expected to be chosen by implementors once more research
has been undertaken to determine its impact.

5.7. Congestion Control

As discussed in the network-layer compatibility requirements
(Section 2.2.3), there are three goals for the congestion control
algorithms used by an MPTCP implementation: improve throughput (at
least as well as a single-path TCP connection would perform); do no
harm to other network users (do not take up more capacity on any one
path than if it were a single path flow using only that route - this is
particularly relevant for shared bottlenecks); and balance congestion
by moving traffic away from the most congested paths. To achieve these
goals, the congestion control algorithms in use on each subflow must be
coupled in some way. A proposal for a suitable congestion control
algorithm is given in [5].

5.8. Security

A detailed threat analysis for Multipath TCP is presented in a separate
document [9]. This focuses on flooding attacks and hijacking attacks
that can be launched against a Multipath TCP connection.

The basic security goal of Multipath TCP, as introduced in
Section 2.2.4, can be stated as: "provide a solution that is no worse
than standard TCP".

From the threat analysis, and with this goal in mind, three key
security requirements can be identified. A multi-addressed Multipath
TCP SHOULD be able to:

o  Provide a mechanism to confirm that the parties in a subflow
   handshake are the same as in the original connection setup (e.g.
   require use of a key exchanged in the initial handshake in the
   subflow handshake, to limit the scope for hijacking attacks).

o  Provide verification that the peer can receive traffic at a new
   address before adding it (i.e. verify that the address belongs to
   the other host, to prevent flooding attacks).

o  Provide replay protection, i.e. ensure that a request to add/remove
   a subflow is 'fresh'.

Additional mechanisms have been deployed as part of standard TCP stacks
to provide resistance to Denial-of-Service attacks. For example, there
are various mechanisms to protect against TCP reset attacks [15], and
Multipath TCP should continue to support similar protection. In
addition, TCP SYN Cookies [16] were developed to allow a TCP server to
defer the creation of session state in the SYN_RCVD state, and remain
stateless until the ESTABLISHED state had been reached. Multipath TCP
should, ideally, continue to provide such functionality and, at a
minimum, avoid significant computational burden prior to reaching the
ESTABLISHED state (of the Multipath TCP connection as a whole).

It should be noted that aspects of the Multipath TCP design space place
constraints on the security solution:

o  The use of TCP options significantly limits the amount of
   information that can be carried in the handshake.

o  The need to work through middleboxes results in the need to handle
   mutability of packets.

o  The desire to support a 'break-before-make' approach to adding
   subflows removes the ability to actively use a pre-existing subflow
   to support the addition of a new one.

The MPTCP protocol design [4] aims to meet these security requirements,
and the protocol specification will document how these are met.

6. Interactions with Applications

Interactions with applications, including but not limited to expected
performance changes, semantic changes, and new features that may be
requested of an API, are presented in [6].

7. Interactions with Middleboxes

As discussed in Section 2.2, it is a goal of MPTCP to be deployable
today and thus compatible with the majority of middleboxes. This
section summarises the issues that may arise with NATs, firewalls,
proxies, intrusion detection systems, and other middleboxes that, if
not considered in the protocol design, may hinder its deployment.

This section is intended primarily as a description of options and
considerations only. Protocol-specific solutions to these issues will
be given in the companion documents.

Multipath TCP will be deployed in a network that no longer provides
just basic datagram delivery. A myriad of middleboxes are deployed to
address various perceived problems with the Internet protocols: NATs
primarily address the shortage of address space [11], Performance
Enhancing Proxies (PEPs) optimize TCP for different link
characteristics [13], firewalls [12] and intrusion detection systems
try to block malicious content from reaching a host, and traffic
normalizers [17] ensure a consistent view of the traffic stream to
IDSes and hosts.

All these middleboxes optimize current applications at the expense of
future applications. In effect, future applications will often need to
behave in a similar fashion to existing ones, in order to increase the
chances of successful deployment.
Further, the precise behaviour of all these middleboxes is not clearly
specified, and implementation errors make matters worse, raising the
bar for the deployment of new technologies.

The following list of middlebox classes documents behaviour that could
impact the use of MPTCP. This list is used in [4] to describe the
features of the MPTCP protocol that are used to mitigate the impact of
these middlebox behaviours.

o  NATs: Network Address Translators decouple the host's local IP
   address from that which is seen in the wider Internet when the
   packets are transmitted through a NAT. This adds complexity, and
   reduces the chances of success, when signalling IP addresses.

o  PEPs: Performance Enhancing Proxies aim to improve the performance
   of protocols over low-performance (e.g. high latency or high error
   rate) links. As such, they may "split" a TCP connection and
   behaviour such as proactive ACKing may occur. As with NATs, it is no
   longer guaranteed that one host is communicating directly with
   another.

o  Traffic Normalizers: These aim to eliminate ambiguities and
   potential attacks at the network level, and amongst other things are
   unlikely to permit holes in TCP-level sequence space.

o  Firewalls: on top of preventing incoming connections, firewalls may
   also attempt additional protection such as sequence number
   randomization.

o  Intrusion Detection Systems: IDSs may look for traffic patterns to
   protect a network, and may have false positives with MPTCP and drop
   the connections during normal operation. Future MPTCP-aware
   middleboxes will require the ability to correlate the various paths
   in use.

In addition, all classes of middleboxes may affect TCP traffic in the
following ways:

o  TCP Options: many middleboxes are in a position to drop packets with
   unknown TCP options, or strip those options from the packets.

o  Segmentation/Coalescing: middleboxes (or even something as close to
   the end host as TCP Segmentation Offloading) may change the packet
   boundaries from those which the sender intended. They may do this by
   splitting packets, or coalescing them together. This leads to two
   major impacts: we cannot guarantee where a packet boundary will be,
   and we cannot say for sure what a middlebox will do with TCP options
   in these cases (they may be repeated, dropped, or sent only once).

8. Contributors

The authors would like to acknowledge the contributions of Sebastien
Barre, Andrew McDonald, and Bryan Ford to this document.

The authors would also like to thank the following people for detailed
reviews: Olivier Bonaventure, Gorry Fairhurst, Iljitsch van Beijnum,
and Philip Eardley.

9. Acknowledgements

Alan Ford, Costin Raiciu and Mark Handley are supported by Trilogy
(http://www.trilogy-project.org), a research project (ICT-216372)
partially funded by the European Community under its Seventh Framework
Program. The views expressed here are those of the author(s) only. The
European Commission is not liable for any use that may be made of the
information in this document.

10. IANA Considerations

None.

11. Security Considerations

This informational document provides an architectural overview for
Multipath TCP and so does not, in itself, raise any security issues. A
separate threat analysis [9] lists threats that can exist with a
Multipath TCP. However, a protocol based on the architecture in this
document will have a number of security requirements. The high level
goals for such a protocol are identified in Section 2.2.4, whilst
Section 5.8 provides more detailed discussion of security requirements
and design decisions which are applied in the MPTCP protocol design
[4].

12. References

12.1. Normative References

[1]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
     September 1981.

[2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
     Levels", BCP 14, RFC 2119, March 1997.

12.2. Informative References

[3]  Wischik, D., Handley, M., and M. Bagnulo Braun, "The Resource
     Pooling Principle", ACM SIGCOMM CCR vol. 38 num. 5, pp. 47-52,
     October 2008.

[4]  Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for
     Multipath Operation with Multiple Addresses",
     draft-ietf-mptcp-multiaddressed-01 (work in progress), July 2010.

[5]  Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-Aware
     Congestion Control", draft-ietf-mptcp-congestion-00 (work in
     progress), July 2010.

[6]  Scharf, M. and A. Ford, "MPTCP Application Interface
     Considerations", draft-scharf-mptcp-api-02 (work in progress),
     July 2010.

[7]  Carpenter, B. and S. Brim, "Middleboxes: Taxonomy and Issues",
     RFC 3234, February 2002.

[8]  Carpenter, B., "Internet Transparency", RFC 2775, February 2000.

[9]  Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path TCP",
     draft-ietf-mptcp-threat-02 (work in progress), March 2010.

[10] Ford, B. and J. Iyengar, "Breaking Up the Transport Logjam", ACM
     HotNets, October 2008.

[11] Srisuresh, P. and K. Egevang, "Traditional IP Network Address
     Translator (Traditional NAT)", RFC 3022, January 2001.

[12] Freed, N., "Behavior of and Requirements for Internet Firewalls",
     RFC 2979, October 2000.

[13] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. Shelby,
     "Performance Enhancing Proxies Intended to Mitigate Link-Related
     Degradations", RFC 3135, June 2001.

[14] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm",
     RFC 2992, November 2000.

[15] Ramaiah, A., Stewart, R., and M.
     Dalal, "Improving TCP's Robustness to Blind In-Window Attacks",
     RFC 5961, August 2010.

[16] Eddy, W., "TCP SYN Flooding Attacks and Common Mitigations",
     RFC 4987, August 2007.

[17] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion
     Detection: Evasion, Traffic Normalization, and End-to-End Protocol
     Semantics", Usenix Security 2001, 2001.

Appendix A. Changelog

(For removal by the RFC Editor)

A.1. Changes since draft-ietf-mptcp-architecture-01

o  Responded to review comments.

o  Added security sections.

A.2. Changes since draft-ietf-mptcp-architecture-00

o  Added middlebox compatibility discussion (Section 7).

o  Clarified path identification (TCP 4-tuple) in Section 5.5.

o  Added brief scenario and diagram to Section 1.3.

Authors' Addresses

Alan Ford (editor)
Roke Manor Research
Old Salisbury Lane
Romsey, Hampshire SO51 0ZN
UK

Phone: +44 1794 833 465
Email: alan.ford@roke.co.uk

Costin Raiciu
University College London
Gower Street
London WC1E 6BT
UK

Email: c.raiciu@cs.ucl.ac.uk

Mark Handley
University College London
Gower Street
London WC1E 6BT
UK

Email: m.handley@cs.ucl.ac.uk

Janardhan Iyengar
Franklin and Marshall College
Mathematics and Computer Science
PO Box 3003
Lancaster, PA 17604-3003
USA

Phone: 717-358-4774
Email: jiyengar@fandm.edu