idnits 2.17.1 draft-ietf-mptcp-architecture-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: As a corollary to both network and application compatibility, the architecture must enable new Multipath TCP flows to coexist gracefully with existing legacy TCP flows, competing for bandwidth neither unduly aggressively or unduly timidly (unless low-precedence operation is specifically requested by the application, such as with LEDBAT). The use of multiple paths MUST not unduly harm users using single path TCP at shared bottlenecks, beyond the impact that would occur from another single legacy TCP flow. -- The document date (June 22, 2010) is 5056 days in the past. Is this intentional? 
Checking references for intended status: Informational ---------------------------------------------------------------------------- == Unused Reference: '9' is defined on line 884, but no explicit reference was found in the text == Outdated reference: A later version (-12) exists of draft-ietf-mptcp-multiaddressed-00 -- Obsolete informational reference (is this intentional?): RFC 793 (ref. '4') (Obsoleted by RFC 9293) -- Obsolete informational reference (is this intentional?): RFC 4960 (ref. '5') (Obsoleted by RFC 9260) == Outdated reference: A later version (-04) exists of draft-scharf-mptcp-api-01 == Outdated reference: A later version (-08) exists of draft-ietf-mptcp-threat-02 Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force A. Ford, Ed. 3 Internet-Draft Roke Manor Research 4 Intended status: Informational C. Raiciu 5 Expires: December 24, 2010 University College London 6 S. Barre 7 Universite catholique de 8 Louvain 9 J. Iyengar 10 Franklin and Marshall College 11 June 22, 2010 13 Architectural Guidelines for Multipath TCP Development 14 draft-ietf-mptcp-architecture-01 16 Abstract 18 Endpoints are often connected by multiple paths, but TCP restricts 19 communications to a single path per transport connection. Resource 20 usage within the network would be more efficient were these multiple 21 paths able to be used concurrently. This should enhance user 22 experience through improved resilience to network failure and higher 23 throughput. 25 This document outlines architectural guidelines for the development 26 of a Multipath Transport Protocol, with references to how these 27 architectural components come together in the Multipath TCP (MPTCP) 28 protocol. 
This document also lists certain high level design 29 decisions that provide foundations for the MPTCP design, based upon 30 these architectural requirements. 32 Status of this Memo 34 This Internet-Draft is submitted in full conformance with the 35 provisions of BCP 78 and BCP 79. 37 Internet-Drafts are working documents of the Internet Engineering 38 Task Force (IETF). Note that other groups may also distribute 39 working documents as Internet-Drafts. The list of current Internet- 40 Drafts is at http://datatracker.ietf.org/drafts/current/. 42 Internet-Drafts are draft documents valid for a maximum of six months 43 and may be updated, replaced, or obsoleted by other documents at any 44 time. It is inappropriate to use Internet-Drafts as reference 45 material or to cite them other than as "work in progress." 47 This Internet-Draft will expire on December 24, 2010. 49 Copyright Notice 51 Copyright (c) 2010 IETF Trust and the persons identified as the 52 document authors. All rights reserved. 54 This document is subject to BCP 78 and the IETF Trust's Legal 55 Provisions Relating to IETF Documents 56 (http://trustee.ietf.org/license-info) in effect on the date of 57 publication of this document. Please review these documents 58 carefully, as they describe your rights and restrictions with respect 59 to this document. Code Components extracted from this document must 60 include Simplified BSD License text as described in Section 4.e of 61 the Trust Legal Provisions and are provided without warranty as 62 described in the Simplified BSD License. 64 Table of Contents 66 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 67 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 5 68 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 69 1.3. Reference Scenario . . . . . . . . . . . . . . . . . . . . 5 70 2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 71 2.1. Functional Goals . . . . . . . . . . . . . . . . . . . . 
. 6 72 2.2. Compatibility Goals . . . . . . . . . . . . . . . . . . . 7 73 2.2.1. Application Compatibility . . . . . . . . . . . . . . 7 74 2.2.2. Network Compatibility . . . . . . . . . . . . . . . . 7 75 2.2.3. Compatibility with other network users . . . . . . . . 8 76 3. An Architectural Basis For MPTCP . . . . . . . . . . . . . . . 9 77 4. A Functional Decomposition of MPTCP . . . . . . . . . . . . . 10 78 5. High-Level Design Decisions . . . . . . . . . . . . . . . . . 12 79 5.1. Sequence Numbering . . . . . . . . . . . . . . . . . . . . 12 80 5.2. Reliability . . . . . . . . . . . . . . . . . . . . . . . 13 81 5.3. Buffers . . . . . . . . . . . . . . . . . . . . . . . . . 14 82 5.4. Signalling . . . . . . . . . . . . . . . . . . . . . . . . 15 83 5.5. Path Management . . . . . . . . . . . . . . . . . . . . . 15 84 5.6. Connection Identification . . . . . . . . . . . . . . . . 16 85 5.7. Network Layer Compatibility . . . . . . . . . . . . . . . 16 86 5.8. Congestion Control . . . . . . . . . . . . . . . . . . . . 17 87 6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 88 7. Security Considerations . . . . . . . . . . . . . . . . . . . 17 89 8. Interactions with Applications . . . . . . . . . . . . . . . . 17 90 9. Interactions with Middleboxes . . . . . . . . . . . . . . . . 18 91 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19 92 11. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 19 93 12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 94 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20 95 13.1. Normative References . . . . . . . . . . . . . . . . . . . 20 96 13.2. Informative References . . . . . . . . . . . . . . . . . . 20 97 Appendix A. Implementation Architecture . . . . . . . . . . . . . 21 98 A.1. Functional Separation . . . . . . . . . . . . . . . . . . 21 99 A.1.1. Application to default MPTCP protocol . . . . . . . . 21 100 A.1.2. 
Generic architecture for MPTCP . . . . . . . . . . . . 24 101 A.2. PM/MPS interface . . . . . . . . . . . . . . . . . . . . . 25 102 Appendix B. Changelog . . . . . . . . . . . . . . . . . . . . . . 26 103 B.1. Changes since draft-ietf-mptcp-architecture-00 . . . . . . 26 104 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27 106 1. Introduction 108 As the Internet evolves, demands on Internet resources are ever- 109 increasing, but often these resources (in particular, bandwidth) 110 cannot be fully utilised due to protocol constraints both on the end- 111 systems and within the network. If these resources could instead be 112 used concurrently, end user experience could be greatly improved. 113 Such enhancements would also reduce the necessary expenditure on 114 network infrastructure which would otherwise be needed to create an 115 equivalent improvement in user experience. 117 By the application of resource pooling[2], these available resources 118 can be 'pooled' such that they appear as a single logical resource to 119 the user. The purpose of a multipath transport, therefore, is to 120 make use of multiple available paths, through resource pooling, to 121 bring two key benefits: 123 o To increase the resilience of the connectivity by providing 124 multiple paths, protecting end hosts from the failure of one. 126 o To increase the efficiency of the resource usage, and thus 127 increase the network capacity available to end hosts. 129 Multipath TCP (MPTCP)[3] is a set of extensions for TCP[4] that 130 implements a multipath transport and achieves these goals by pooling 131 multiple paths within a transport connection, transparent to the 132 application. 
While multihoming and multipath functions have been 133 implemented in transport protocols previously, notably SCTP[5], MPTCP 134 is distinct in recognizing application and network compatibility 135 goals that we believe are important for deployability of a multipath 136 transport; we discuss these goals in more detail later in Section 2. 138 This document makes three contributions: (i) it describes goals for a 139 multipath transport - goals that MPTCP is designed to meet; (ii) it 140 lays out an architectural basis for MPTCP's design - a discussion 141 that applies to other multipath transports as well; and (iii) it 142 discusses and documents high-level design decisions made in MPTCP's 143 development, and considers their implications. 145 Companion documents to this architectural overview are those which 146 provide details of the protocol extensions[3], congestion control 147 algorithms[6], and application-level considerations[7]. Put 148 together, these components specify a complete Multipath TCP design. 149 We note that specific components are replaceable with other protocols 150 in accordance with the layer and functional decompositions discussed 151 in this document. 153 Please note this document is a work-in-progress and covers several 154 topics, some of which may be more appropriately moved to separate 155 documents as this work evolves. 157 1.1. Requirements Language 159 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 160 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 161 document are to be interpreted as described in RFC 2119 [1]. 163 1.2. Terminology 165 Path: A sequence of links between a sender and a receiver, defined 166 in this context by a source and destination address pair. 168 Endpoint: A host either initiating or terminating a MPTCP 169 connection. 171 Multipath TCP (MPTCP): A modified version of the TCP [4] protocol 172 that supports the simultaneous use of multiple paths between 173 endpoints. 
175 Subflow: A flow of TCP packets operating over an individual path, 176 which forms part of a larger MPTCP connection. 178 MPTCP Connection: A set of one or more subflows combined to provide 179 a single Multipath TCP service to an application at an endpoint. 181 1.3. Reference Scenario 183 The diagram shown in Figure 1 illustrates a typical usage scenario 184 for MPTCP. Two hosts, A and B, are communicating with each other. 185 These endpoints are multi-homed and multi-addressed, providing two 186 disjoint connections to the Internet. The addresses on each endpoint 187 are referred to as A1, A2, B1 and B2. There are therefore up to four 188 different paths between the two endpoints: A1-B1, A1-B2, A2-B1, 189 A2-B2. 191 +------+ __________ +------+ 192 | |A1 ______ ( ) ______ B1| | 193 | Host |--/ ( ) \--| Host | 194 | | ( Internet ) | | 195 | A |--\______( )______/--| B | 196 | |A2 (__________) B2| | 197 +------+ +------+ 199 Figure 1: Simple MPTCP Usage Scenario 201 The scenario could have any number of addresses (1 or more) on each 202 endpoint, so long as the number of paths available between the two 203 endpoints is 2 or more (i.e. num_addr(A) * num_addr(B) > 1). The 204 paths created by these address combinations through the Internet need 205 not be entirely disjoint - shared bottlenecks will be addressed by 206 the MPTCP congestion controller. Furthermore, the paths through the 207 Internet may be interrupted by any number of middleboxes including 208 NATs and Firewalls. Finally, although the diagram refers to the 209 Internet, MPTCP may be used over any network where there are multiple 210 paths that could be used concurrently. 212 TBD - what further detail here would be useful? 214 2. Goals 216 This section outlines primary goals that Multipath TCP aims to meet. 
217 These are broadly broken down into functional goals, which steer 218 services and features that MPTCP must provide, and compatibility 219 goals, which determine how MPTCP should appear to entities that 220 interact with it. 222 2.1. Functional Goals 224 In providing the use of multiple paths, MPTCP has the following two 225 functional goals. 227 o Improve Throughput: MPTCP MUST support the concurrent use of 228 multiple paths. To meet the minimum performance incentives for 229 deployment, an MPTCP connection over multiple paths SHOULD achieve 230 no lesser throughput than a single TCP connection over the best 231 constituent path. 233 o Improve Resilience: MPTCP MUST support the use of multiple paths 234 interchangeably for resilience purposes, by permitting packets to 235 be sent and re-sent on any available path. It follows that, in 236 the worst case, the protocol MUST be no less resilient than legacy 237 TCP. 239 As distribution of traffic among available paths and responses to 240 congestion are done in accordance with resource pooling 241 principles[2], a secondary effect of meeting these goals is that 242 widespread use of MPTCP over the Internet should optimize overall 243 network utility by shifting load away from congested bottlenecks and 244 by taking advantage of spare capacity wherever possible. 246 Furthermore, MPTCP SHOULD feature automatic negotiation of its use. 247 A host supporting Multipath TCP that requires the other endpoint to 248 do so too must be able to detect reliably whether this endpoint does 249 in fact support the next-generation protocol, using it if so, and 250 otherwise automatically falling back to the legacy protocol. 252 2.2. Compatibility Goals 254 In addition to the functional goals listed above, a Multipath TCP 255 must meet a number of compatibility goals in order to support 256 deployment in today's Internet. These goals fall into the following 257 categories: 259 2.2.1. 
Application Compatibility 261 Application compatibility refers to the appearance of MPTCP to the 262 application both in terms of the API that can be used and the 263 expected service model that is provided. 265 MPTCP MUST follow the same service model as TCP [4]: in-order, 266 reliable, and byte-oriented delivery. Furthermore, an MPTCP 267 connection SHOULD provide the application with no worse throughput 268 than it would expect from running a single TCP connection over any 269 one of its available paths. 271 A multipath-capable equivalent of TCP SHOULD retain backward 272 compatibility with existing TCP APIs, so that existing applications 273 can use the newer transport merely by upgrading the operating systems 274 of the end-hosts. This does not preclude the use of an advanced API 275 to permit multipath-aware applications to specify preferences, nor 276 for users to configure their systems in a different way from the 277 default, for example switching on or off the automatic use of MPTCP. 279 2.2.2. Network Compatibility 281 Traditional Internet architecture slots network devices in the 282 network layer and lower layers of the OSI 7-layer stack, where the 283 layers above the network layer - the transport layer and upper layers 284 - are instantiated only at the end-hosts. While this architecture, 285 shown in Figure 2, was largely adhered to earlier, this layering no 286 longer reflects the "ground truth" in the Internet with the 287 proliferation of middleboxes[8]. Middleboxes routinely interpose on 288 the transport layer; sometimes even completely terminating transport 289 connections, thus leaving the application layer as the first real 290 end-to-end layer, as shown in Figure 3. 
292 +-------------+ +-------------+ 293 | Application |<------------ end-to-end ------------->| Application | 294 +-------------+ +-------------+ 295 | Transport |<------------ end-to-end ------------->| Transport | 296 +-------------+ +-------------+ +-------------+ +-------------+ 297 | Network |<->| Network |<->| Network |<->| Network | 298 +-------------+ +-------------+ +-------------+ +-------------+ 299 End Host Router Router End Host 301 Figure 2: Traditional Internet Architecture 303 +-------------+ +-------------+ 304 | Application |<------------ end-to-end ------------->| Application | 305 +-------------+ +-------------+ +-------------+ 306 | Transport |<------------------->| Transport |<->| Transport | 307 +-------------+ +-------------+ +-------------+ +-------------+ 308 | Network |<->| Network |<->| Network |<->| Network | 309 +-------------+ +-------------+ +-------------+ +-------------+ 310 Firewall, 311 End Host Router NAT, or Proxy End Host 313 Figure 3: Internet Reality 315 Middleboxes that interpose on the transport layer result in loss of 316 "fate-sharing"[9], that is, they often hold "hard" state that, when 317 lost or corrupted, results in loss or corruption of the end-to-end 318 transport connection. 320 MPTCP MUST remain backward compatible with the Internet as it exists 321 today, including being able to traverse predominant middleboxes such 322 as firewalls, NATs, and performance enhancing proxies[8]. This 323 requirement comes from recognizing middleboxes as a significant 324 deployment bottleneck for any transport that is not TCP, and 325 constrains MPTCP to appear as TCP does on the wire and to use 326 established TCP extensions where necessary. To ensure end-to-endness 327 of the transport, we further require MPTCP to preserve fate-sharing 328 without making any assumptions about middlebox behavior. 330 2.2.3. 
Compatibility with other network users 332 As a corollary to both network and application compatibility, the 333 architecture must enable new Multipath TCP flows to coexist 334 gracefully with existing legacy TCP flows, competing for bandwidth 335 neither unduly aggressively nor unduly timidly (unless low-precedence 336 operation is specifically requested by the application, such as with 337 LEDBAT). The use of multiple paths MUST NOT unduly harm users using 338 single path TCP at shared bottlenecks, beyond the impact that would 339 occur from another single legacy TCP flow. 341 3. An Architectural Basis For MPTCP 343 We now present one possible transport architecture that we believe 344 can effectively support MPTCP's goals. The new Internet model 345 described here is based on ideas proposed earlier in Tng ("Transport 346 next-generation") [10]. While by no means the only possible 347 architecture supporting multipath transport, Tng incorporates many 348 lessons learned from previous transport research and development 349 practice, and offers a strong starting point from which to consider 350 the extant Internet architecture and its bearing on the design of any 351 new Internet transports or transport extensions. 353 +------------------+ 354 | Application | 355 +------------------+ ^ Application-oriented transport 356 | | | functions (Semantic Layer) 357 + - - Transport - -+ ---------------------------------- 358 | | | Network-oriented transport 359 +------------------+ v functions (Flow+Endpoint Layer) 360 | Network | 361 +------------------+ 362 Existing Layers Tng Decomposition 364 Figure 4: Decomposition of Transport Functions 366 Tng loosely splits the transport layer into "application-oriented" 367 and "network-oriented" layers, as shown in Figure 4. 
The 368 application-oriented "Semantic" layer implements functions driven 369 primarily by concerns of supporting and protecting the application's 370 end-to-end communication, while the network-oriented "Flow+Endpoint" 371 layer implements functions such as endpoint identification (using 372 port numbers) and congestion control. These network-oriented 373 functions, while traditionally located in the ostensibly "end-to-end" 374 Transport layer, have proven in practice to be of great concern to 375 network operators and the middleboxes they deploy in the network to 376 enforce network usage policies[11] [12] or optimize communication 377 performance[13]. Figure 5 shows how middleboxes interact with 378 different layers in this decomposed model of the transport layer: the 379 application-oriented layer operates end-to-end, while the network- 380 oriented layer operates "segment-by-segment" and can be interposed 381 upon by middleboxes. 383 +-------------+ +-------------+ 384 | Application |<------------ end-to-end ------------->| Application | 385 +-------------+ +-------------+ 386 | Semantic |<------------ end-to-end ------------->| Semantic | 387 +-------------+ +-------------+ +-------------+ +-------------+ 388 |Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint| 389 +-------------+ +-------------+ +-------------+ +-------------+ 390 | Network |<->| Network |<->| Network |<->| Network | 391 +-------------+ +-------------+ +-------------+ +-------------+ 392 Firewall Performance 393 End Host or NAT Enhancing Proxy End Host 395 Figure 5: Middleboxes in the new Internet model 397 MPTCP's architectural design follows Tng's decomposition as shown in 398 Figure 6. 
The MPTCP component, which provides application 399 compatibility through the preservation of TCP-like semantics of 400 global ordering of application data and reliability, is an 401 instantiation of the "application-oriented" Semantic layer; whereas 402 the legacy-TCP component, which provides network compatibility by 403 appearing and behaving as a TCP flow in the network, is an instantiation 404 of the "network-oriented" Flow+Endpoint layer. 406 +--------------------------+ +-------------------------+ 407 | Application | | Application | 408 +--------------------------+ +-------------------------+ 409 | Semantic | | MPTCP | 410 |--------------------------| + - - - - - + - - - - - + 411 | Flow+Endpt | Flow+Endpt | | TCP | TCP | 412 +--------------------------+ +-------------------------+ 413 | Network | Network | | IP | IP | 414 +--------------------------+ +-------------------------+ 416 Figure 6: MPTCP mapping to Tng 418 As a protocol extension to TCP, MPTCP thus explicitly acknowledges 419 middleboxes in its design, and specifies a protocol that operates at 420 two scales: the MPTCP component operates end-to-end, while it allows 421 the TCP component to operate segment-by-segment. 423 4. A Functional Decomposition of MPTCP 425 Having laid out the goals to be met and the architectural basis for 426 MPTCP, we now provide a functional decomposition of MPTCP's design. 428 The MPTCP component relies upon (what appear to the network to be) 429 standard TCP sessions, termed "subflows", to provide the underlying 430 transport per path, and as such these retain the network 431 compatibility desired. MPTCP as described in [3] carries MPTCP- 432 specific information in a TCP-compatible manner, although this 433 mechanism is separate from the actual information being transferred 434 and so could evolve in future revisions. Figure 7 illustrates the 435 layered architecture. 
437 +-------------------------------+ 438 | Application | 439 +---------------+ +-------------------------------+ 440 | Application | | MPTCP | 441 +---------------+ + - - - - - - - + - - - - - - - + 442 | TCP | | Subflow (TCP) | Subflow (TCP) | 443 +---------------+ +-------------------------------+ 444 | IP | | IP | IP | 445 +---------------+ +-------------------------------+ 447 Figure 7: Comparison of Standard TCP and MPTCP Protocol Stacks 449 Situated below the application, the MPTCP extension manages multiple 450 TCP subflows below it and must implement the following functions: 452 o Path Management: This is the function to detect and use multiple 453 paths between two endpoints. In the case of the MPTCP design [3], 454 this feature is implemented using multiple IP addresses on at least 455 one of the endpoints. Although this does not guarantee path 456 diversity, and there may be shared bottlenecks, this is a simple 457 mechanism that can be used with no additional features in the 458 network. The path management features of the MPTCP protocol are 459 the mechanisms to signal alternative addresses to endpoints, and 460 mechanisms to set up new subflows attached to an existing MPTCP 461 connection. 463 o Packet Scheduling: This function breaks the bytestream received 464 from the application into segments which are transmitted on one of 465 the available lower subflows. The MPTCP design makes use of a 466 data sequence mapping, associating packets sent on different 467 subflows to a connection-level sequence numbering, thus allowing 468 packets sent on different subflows to be correctly re-ordered at 469 the receiver. The packet scheduler is dependent upon information 470 about the availability of paths exposed by the path management 471 component, and then makes use of the subflows to transmit these 472 packets. 
474 o Subflow (single-path TCP) Interface: A subflow component takes 475 segments from the packet-scheduling component and transmits them 476 over the specified path, ensuring detectable delivery to the 477 endpoint. Detection of delivery is necessary to allow the 478 congestion control protocol to attribute packet delivery or loss 479 to the right path. Note that the packet scheduling component does 480 not embed enough information in packets to allow this to happen: 481 segments with the same connection-level sequence number can be 482 transmitted over multiple paths, i.e. as retransmissions or just 483 to increase redundancy. MPTCP uses TCP underneath for network 484 compatibility; TCP ensures in-order, reliable delivery. TCP adds 485 its own sequence numbers to the segments; these are used to detect 486 and retransmit lost packets. 488 o Congestion Control: This function manages congestion control 489 across the subflows. As specified, this congestion control 490 algorithm must ensure that an MPTCP connection does not unfairly 491 take more bandwidth than a single path TCP flow would take at a 492 shared bottleneck. An algorithm to support this is specified in 493 [6]. 495 These functions fit together as follows. The Path Management function looks 496 after the discovery (and if necessary, initialisation) of multiple 497 paths between two endpoints. The Packet Scheduler then receives 498 packets from the application for the network and does the necessary 499 operations on them (such as adding a data-level sequence number) 500 before sending to a subflow. The subflow then adds its own sequence 501 number, acks, and passes them to the network. The receiving subflow re- 502 orders data and passes it to the MPTCP component, which performs 503 connection level re-ordering, removes the segment boundaries and 504 sends it to the application. 
Finally, the congestion control 505 component exists as part of the packet scheduling, in order to 506 schedule which packets should be sent at what rate on which subflow. 508 5. High-Level Design Decisions 510 There is seemingly a wide range of choices when designing a multipath 511 extension to TCP. However, the goals as discussed earlier in this 512 document constrain the possible solutions, leaving relatively little 513 choice in many areas. Here, we outline high-level design choices 514 that draw from the architectural basis discussed earlier in 515 Section 3, and their implications for the MPTCP design. 517 5.1. Sequence Numbering 519 MPTCP uses two levels of sequence spaces: a connection level sequence 520 number, and another sequence number for each subflow. This permits 521 connection-level segmentation and reassembly, and retransmission of 522 the same part of connection-level sequence space on different 523 subflow-level sequence spaces. 525 The alternative approach would be to use a single connection level 526 sequence number, which gets sent on multiple subflows. This has two 527 problems: first, the individual subflows will appear to the network 528 as TCP sessions with gaps in the sequence space; this in turn may 529 upset certain middleboxes such as intrusion detection systems, or 530 certain transparent proxies, and would go against the network 531 compatibility goal. Second, the sender cannot attribute packet 532 losses or receptions to the correct path when the same packet is sent 533 on multiple paths, in the case of retransmissions. 535 The sender must be able to tell the receiver how to reorder the data, 536 for delivery to the application. The sender does so by telling the 537 receiver how subflow-level data (carrying subflow sequence numbers) 538 maps to the connection level, which we refer to as the Data Sequence Mapping. 539 This mapping takes the form (data seq, subflow seq, length), i.e. 
for 540 a given number of bytes (the length), the subflow sequence space 541 beginning at the given sequence number maps to the connection-level 542 sequence space (beginning at the given data seq number). 544 This architecture does not mandate a mechanism for signalling such 545 information, and it could conceivably have various sources. 547 One option would be to use existing fields in the TCP segment (such 548 as subflow seqno, length) and only add the data sequence number to 549 each segment, for instance as a TCP option. This is, however, 550 vulnerable to middleboxes that resegment or assemble data, since 551 there is no specified behaviour for coalescing TCP options. If one 552 signalled (data seqno, length), this would still be vulnerable to 553 middleboxes that coalesce segments and do not correctly coalesce the 554 options. Because of these potential issues, the current 555 specification of MPTCP mandates that the full mapping should be sent 556 to the other end. 558 To reduce the overhead, it would be permissible for the mapping to be 559 sent periodically and cover more than a single segment. It could 560 also be excluded entirely in the case of a connection before more 561 than one subflow is used, where the data-level and subflow-level 562 sequence spaces are the same. 564 5.2. Reliability 566 Under normal behaviour, MPTCP can use the data sequence mapping and 567 subflow ACKs to decide when a connection-level segment was received. 568 This has certain implications for end-to-end semantics. First, it means that 569 once a packet is acked at subflow level it cannot be discarded in the 570 re-order buffer at the connection level. Secondly, unlike in 571 standard TCP, a receiver cannot simply drop out-of-order segments if 572 needed (for instance, due to memory pressure). 574 Furthermore, it is possible to conceive of some cases where 575 connection-level acknowledgements could improve robustness. 
Consider 576 a subflow traversing a transparent proxy: if the proxy acks a segment 577 and then crashes, the sender will not retransmit the lost segment on 578 another subflow, as it thinks the segment has been received. The 579 connection grinds to a halt despite having other working subflows, 580 and the sender would be unable to determine the cause of the problem. 581 Finally, as an optimisation, it may be feasible for a connection- 582 level acknowledgement to be transmitted over the shortest RTT path, 583 potentially reducing send buffer requirements (see Section 5.3). 585 Therefore, to provide a fully robust multipath TCP solution, MPTCP 586 SHOULD feature explicit connection-level acknowledgements. 588 Regarding retransmissions, it MUST be possible for a packet to be 589 retransmitted on a different subflow to that on which it was 590 originally sent. This is one of MPTCP's core goals, in order to 591 maintain integrity during temporary or permanent subflow failure, and 592 this is enabled by the dual sequence number space. 594 The scheduling of retransmissions will have a significant impact on 595 MPTCP user experience. The current MPTCP specification suggests that 596 data outstanding on subflows that have timed out should be 597 rescheduled for transmission on different subflows. This behaviour 598 aims to minimize disruption when a path breaks, and uses the first 599 timeout as the indicator. More conservative variants would wait for the 600 second or third timeout for the same packet. 602 When packet loss is detected and corrected with fast retransmit, 603 retransmission on different subflows may still be desirable in 604 certain cases, for instance to reduce the receive buffer 605 requirements. However, in all cases with retransmissions on 606 different subflows, the lost packets SHOULD still be sent on the path 607 that lost them. This is currently believed to be necessary to 608 maintain subflow integrity, as per the network compatibility goal. 
   By doing this, some throughput will be wasted, and it is unclear at
   this point what the optimal retransmit strategy is.

5.3.  Buffers

   Receive Buffer: ideally, a subflow failing should not affect the
   throughput of other working subflows. However, the receive buffer
   has limited size: if a subflow times out, the other subflows will
   quickly fill the receive buffer with out-of-order data, and will
   stall. Hence, receive buffer sizing is important for both robustness
   and throughput.

   The smallest receive buffer needed to avoid stalling under any
   circumstances is max(RTO)*sum(BW). This is, for most multipath
   connections, too expensive. A more reasonable size is proportional
   to max(RTT)*sum(BW), which ensures subflows do not stall when fast
   retransmit works. Also, depending on how the implementation behaves,
   an additional sum(RTT*BW) might be needed for the individual
   re-order buffers of the TCP subflows.

   Send Buffer: the smallest send buffer needed is sum(BDP) across all
   paths; this is to hold data until it is acked at subflow level. If
   subflow-level acks were not used, and a data-level ack were relied
   upon instead, the send buffer would need to be as big as the receive
   buffer of the connection, max(RTT)*sum(BW). In practice, the senders
   will be web servers and receivers will be desktops or mobile
   devices. The send buffer size matters particularly for servers,
   which must be able to maintain a large number of ongoing
   connections.

5.4.  Signalling

   Since MPTCP will use regular TCP streams as its transport mechanism,
   an MPTCP connection will also begin as a single TCP stream.
   Nevertheless, it must signal to the peer that it supports MPTCP and
   wishes to use it on this connection. As such, a TCP Option will be
   used to transmit this information, since this is the established
   mechanism for indicating additional functionality on a TCP session.
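
   As a rough illustration of signalling a capability in a TCP option,
   the following Python sketch builds and parses options in the
   standard kind/length/value layout. The option kind used here (253,
   a value reserved for experiments), the function names and the
   4-byte payload are assumptions made for this example only; the
   actual MPTCP option format is defined in [3].

```python
import struct

# Hypothetical option kind: 253 is reserved for experimentation and is NOT
# the value used by MPTCP; the real option format is specified in [3].
HYPOTHETICAL_MPC_KIND = 253


def build_option(kind: int, payload: bytes) -> bytes:
    """Encode a TCP option as kind (1 byte), length (1 byte), payload."""
    length = 2 + len(payload)  # the length also covers the kind and length bytes
    if length > 40:
        raise ValueError("TCP option space is limited to 40 bytes")
    return struct.pack("!BB", kind, length) + payload


def find_option(options: bytes, kind: int):
    """Return the payload of the first option of the given kind, or None."""
    i = 0
    while i < len(options):
        k = options[i]
        if k == 0:  # End of Option List
            break
        if k == 1:  # No-Operation occupies a single byte
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option, stop parsing
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed option, stop parsing
        if k == kind:
            return options[i + 2:i + length]
        i += length
    return None


def peer_supports_multipath(syn_options: bytes) -> bool:
    """True if the (hypothetical) capability option is present."""
    return find_option(syn_options, HYPOTHETICAL_MPC_KIND) is not None
```

   In this sketch, a receiver that does not find the option treats the
   connection as a regular TCP stream, which is consistent with an
   MPTCP connection beginning as a single TCP stream.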

   On top of this, however, signalling is required during the operation
   of an MPTCP session, such as that for reassembly across multiple
   subflows, and for informing the other endpoint about other
   potentially available addresses. The architecture does not mandate
   the format in which this signalling should be transmitted.

   The current MPTCP protocol proposal suggests the use of TCP options
   for this signalling; however, another approach would be to embed
   such information in the payload, and use type-length-value (TLV)
   encoding to separate signalling and payload data.

5.5.  Path Management

   Currently, the network does not expose multiple paths between
   endpoints. Multipath TCP will use multiple addresses at one or both
   endpoints to get different paths to the destination. The hope is
   that these paths, whilst not necessarily entirely non-overlapping,
   will be sufficiently disjoint to allow multipath to achieve improved
   throughput and robustness.

   Multiple different (source, destination) address pairs will thus be
   used as path selectors. Each path will be identified by a TCP
   4-tuple (i.e. source address, destination address, source port,
   destination port), thus allowing the extension of MPTCP to use such
   4-tuples as path selectors if the network will route different ports
   over different paths (which may be the case with technologies such
   as ECMP).

   For increased chance of successfully setting up additional subflows
   (such as when one end is behind a firewall, NAT, or other
   restrictive middlebox), either endpoint should be able to add new
   subflows to an MPTCP connection.

   The modularity of path management will permit alternative mechanisms
   to be employed if appropriate in the future.

5.6.  Connection Identification

   Because an MPTCP connection may be carried by several subflows, no
   single subflow identifies the connection for its whole lifetime.
   Therefore, each MPTCP connection should have a connection identifier
   at each endpoint, which is locally unique within that endpoint.
   In many ways, this is analogous to a port number in regular TCP. The
   manifestation and purpose of such an identifier is out of the scope
   of this architecture document.

   Legacy applications will not, however, have access to this
   identifier, and in such cases an MPTCP connection will be identified
   by the 5-tuple of the first TCP subflow. It is out of the scope of
   this document, however, to define the behaviour of the MPTCP
   implementation if the first TCP subflow later fails. If there are
   legacy applications that make assumptions about the continued
   existence of the initial address pair, their behaviour could be
   disrupted by carrying on regardless. It is expected, however, that
   this is a very small, possibly negligible, set of applications. In
   the case of applications that have specifically asked to be bound to
   a particular address or interface, MPTCP will not be used.

   Since the requirements of applications are not clear at this stage,
   it is as yet unconfirmed what the best behaviour is. It will be an
   implementation-specific solution, and as such the behaviour is
   expected to be chosen by implementors once more research has been
   undertaken to determine its impact.

5.7.  Network Layer Compatibility

   MPTCP's modifications remain at the transport layer, although some
   knowledge of the underlying network layer is required. MPTCP MUST
   work with IPv4 and IPv6 interchangeably, i.e. one MPTCP connection
   may operate over both IPv4 and IPv6 networks.

5.8.  Congestion Control

   As already documented in the network-layer compatibility
   requirements, the congestion control algorithms used by an MPTCP
   implementation must not harm other legacy users on shared
   bottlenecks. To achieve this, the congestion control algorithms in
   use on each subflow must be coupled in some way; a proposal for this
   is given in [6].

6.  Summary

   This document has provided a summary of the components that have
   been identified to provide a Multipath TCP solution, and described
   the high-level design decisions that have been used as a basis of
   the MPTCP specification.

   The suite of drafts that specifies a complete MPTCP implementation,
   on top of this architectural overview, is as follows:

   o  A specification of the MPTCP protocol [3], describing the on- and
      off-the-wire differences to regular TCP.

   o  A specification of a coupled congestion control algorithm [6],
      that can be applied to the above protocol while meeting the goals
      for such an algorithm as specified in this document.

   o  A document [7] that builds upon the application compatibility
      issues discussed in this document, explaining in more detail what
      changes, if any, an application may experience through the use of
      MPTCP. This document also provides a proposed API through which
      an application can influence the behaviour of the MPTCP protocol,
      as specified in the above drafts.

7.  Security Considerations

   Please see [14] for a threat analysis of Multipath TCP. The threats
   analysed in this companion document are addressed as appropriate in
   the protocol design [3].

8.  Interactions with Applications

   Interactions with applications (including, but not limited to,
   performance changes that may be expected, semantic changes, and new
   features that may be requested of an API) are presented in [7].

9.  Interactions with Middleboxes

   As discussed in Section 2.2, it is a goal of MPTCP to be deployable
   today and thus compatible with the majority of middleboxes. This
   section summarises the issues that may arise with NATs, firewalls,
   proxies, intrusion detection systems, and other middleboxes that, if
   not considered in the protocol design, may hinder its deployment.

   This section is intended primarily as a description of options and
   considerations only. Protocol-specific solutions to these issues
   will be given in the companion documents.

   Multipath TCP will be deployed in a network that no longer provides
   just basic datagram delivery. A myriad of middleboxes are deployed
   to address various perceived problems with the Internet protocols:
   NATs primarily address space shortage [11], Performance Enhancing
   Proxies (PEPs) optimize TCP for different link characteristics [13],
   firewalls [12] and intrusion detection systems try to block
   malicious content from reaching a host, and traffic normalizers [15]
   ensure a consistent view of the traffic stream to IDSes and hosts.

   All these middleboxes optimize current applications at the expense
   of future applications. In effect, future applications must mimic
   existing ones if they want to be deployed. Further, the precise
   behaviour of all these middleboxes is not clearly specified, and
   implementation errors make matters worse, raising the bar for the
   deployment of new technologies.

   The following list of middlebox classes documents behaviour that
   could impact the use of MPTCP. This list is used in [3] to describe
   the features of the MPTCP protocol that are used to mitigate the
   impact of these middlebox behaviours.

   o  NATs: Network Address Translators decouple the endpoint's local
      IP address from that which is seen in the wider Internet when the
      packets are transmitted through a NAT. This adds complexity, and
      reduces the chances of success, when signalling IP addresses.

   o  PEPs: Performance Enhancing Proxies aim to improve the
      performance of protocols over low-performance (e.g. high latency
      or high error rate) links. As such, they may "split" a TCP
      connection, and behaviour such as proactive ACKing may occur.
      As with NATs, it is no longer guaranteed that one endpoint is
      communicating directly with another.

   o  Traffic Normalizers: These aim to eliminate ambiguities and
      potential attacks at the network level, and amongst other things
      are unlikely to permit holes in sequence space.

   o  TCP Options: many middleboxes are in a position to drop packets
      with unknown TCP options, or strip those options from the
      packets.

   o  Segmentation/Coalescing: middleboxes (or even something as close
      to the end host as TCP Segmentation Offloading) may change the
      packet boundaries from those which the sender intended. They may
      do this by splitting packets, or coalescing them together. This
      leads to two major impacts: we cannot guarantee where a packet
      boundary will be, and we cannot say for sure what a middlebox
      will do with TCP options in these cases (they may be repeated,
      dropped, or sent only once).

   o  Firewalls: on top of preventing incoming connections, firewalls
      may also attempt additional protection such as sequence number
      randomization.

   o  Intrusion Detection Systems: IDSs may look for traffic patterns
      to protect a network, and may have false positives with MPTCP and
      drop the connections during normal operation. Future MPTCP-aware
      middleboxes will require the ability to correlate the various
      paths in use.

10.  Acknowledgements

   Alan Ford, Costin Raiciu and Sebastien Barre are supported by
   Trilogy (http://www.trilogy-project.org), a research project
   (ICT-216372) partially funded by the European Community under its
   Seventh Framework Program. The views expressed here are those of the
   author(s) only. The European Commission is not liable for any use
   that may be made of the information in this document.

11.  Contributors

   The authors would like to acknowledge the contributions of Mark
   Handley and Bryan Ford to this document.

12.  IANA Considerations

   None.

13.  References

13.1.  Normative References

   [1]   Bradner, S., "Key words for use in RFCs to Indicate
         Requirement Levels", BCP 14, RFC 2119, March 1997.

13.2.  Informative References

   [2]   Wischik, D., Handley, M., and M. Bagnulo Braun, "The Resource
         Pooling Principle", ACM SIGCOMM CCR vol. 38 num. 5, pp. 47-52,
         October 2008.

   [3]   Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for
         Multipath Operation with Multiple Addresses",
         draft-ietf-mptcp-multiaddressed-00 (work in progress),
         June 2010.

   [4]   Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
         September 1981.

   [5]   Stewart, R., "Stream Control Transmission Protocol", RFC 4960,
         September 2007.

   [6]   Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-
         Aware Congestion Control", draft-raiciu-mptcp-congestion-01
         (work in progress), March 2010.

   [7]   Scharf, M. and A. Ford, "MPTCP Application Interface
         Considerations", draft-scharf-mptcp-api-01 (work in progress),
         March 2010.

   [8]   Carpenter, B. and S. Brim, "Middleboxes: Taxonomy and Issues",
         RFC 3234, February 2002.

   [9]   Carpenter, B., "Internet Transparency", RFC 2775,
         February 2000.

   [10]  Ford, B. and J. Iyengar, "Breaking Up the Transport Logjam",
         ACM HotNets, October 2008.

   [11]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address
         Translator (Traditional NAT)", RFC 3022, January 2001.

   [12]  Freed, N., "Behavior of and Requirements for Internet
         Firewalls", RFC 2979, October 2000.

   [13]  Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
         Shelby, "Performance Enhancing Proxies Intended to Mitigate
         Link-Related Degradations", RFC 3135, June 2001.

   [14]  Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path
         TCP", draft-ietf-mptcp-threat-02 (work in progress),
         March 2010.

   [15]  Handley, M., Paxson, V., and C.
         Kreibich, "Network Intrusion Detection: Evasion, Traffic
         Normalization, and End-to-End Protocol Semantics", Usenix
         Security 2001, 2001.

Appendix A.  Implementation Architecture

   This section provides suggestions for an architecture to implement
   an extensible, modular multipath transport protocol.

A.1.  Functional Separation

   This section describes a generic view of the internal implementation
   of a Multipath TCP, through which the technical components specified
   in the companion documents can fit together. It shows how an
   implementation could be built that permits extensibility between
   components without changing the external representation.

   We first show the functional decomposition of an MPTCP solution that
   is completely contained in the transport layer. That solution is
   described in more detail in [3]. Then we generalize the approach to
   allow good extensibility of that solution.

A.1.1.  Application to default MPTCP protocol

   Although, in the default approach, MPTCP is fully contained in the
   transport layer, it can still be divided into two main modules. One
   manages the scheduling of packets as well as congestion control. The
   other one manages the control of paths. The interface between the
   two is based on a Path Index. As shown in Figure 8, the Path Manager
   announces to the Multipath Scheduler which paths can be used,
   expressed as path indices, and maintains the mapping between each
   path index and the particular action that it must apply to use the
   path (an example of such a mapping is in Table 1). In the case of
   the built-in Path Manager, the action is to replace an address/port
   pair with another one, in such a way that another path is used
   across the Internet to forward that packet.
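
   The division of labour described above can be sketched in a few
   lines of Python. This is a minimal illustration under stated
   assumptions, not the implementation described in [3]: the class and
   method names (Packet, BuiltInPathManager, add_path, send) and the
   example addresses are hypothetical. The scheduler tags a packet with
   a path index, and the path manager looks up that index and rewrites
   the address/port pairs accordingly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Endpoint = Tuple[str, int]  # (address, port)


@dataclass(frozen=True)
class Packet:
    src: Endpoint
    dst: Endpoint
    payload: bytes
    # Local tag attached by the scheduler; it is never sent on the wire.
    path_index: Optional[int] = None


class BuiltInPathManager:
    """Hypothetical path manager mapping path indices to rewriting actions."""

    def __init__(self) -> None:
        # path index -> (source endpoint, destination endpoint) to write
        self._table: Dict[int, Tuple[Endpoint, Endpoint]] = {}

    def add_path(self, path_index: int, src: Endpoint, dst: Endpoint) -> None:
        self._table[path_index] = (src, dst)

    def usable_paths(self) -> List[int]:
        """Path indices that would be announced to the scheduler."""
        return sorted(self._table)

    def send(self, packet: Packet) -> Packet:
        """Apply the action for the packet's path index, if it has one."""
        if packet.path_index is None:
            return packet  # untagged packets are forwarded unchanged
        src, dst = self._table[packet.path_index]
        # Rewrite the addresses and drop the purely local tag.
        return Packet(src, dst, packet.payload)
```

   With a table entry such as add_path(3, ("A1", 1000), ("B2", 80)), a
   packet addressed from A1 to B1 and tagged with path index 3 leaves
   the path manager addressed from A1 to B2, which mirrors the
   [A1,B1]->[A1,B2] rewriting shown in Figure 8.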

           Control plane <--    |   --> Data plane
   +---------------------------------------------------------------+
   |                   Multipath Scheduler (MPS)                   |
   +---------------------------------------------------------------+
       ^                        |              |
       |                        |       [A1,B1,|pA1,pB1]
       |For conn_id             |              |
       |                        |       +-------------+
       |Paths 1->4 can be       |       | Data packet |<--Path idx:3
       |used.                   |       +-------------+   attached
       |                        |              |          by MPS
       |                        |              V
   +--------------------------------------------\------------------+
   |           Path Manager (PM)                 \[A1,B1]->[A1,B2] |
   +--------------------------------------------------\------------+
    /                            \ |                   \
   /-----------------------------\ |      /"\   /"\   /"\   /"\
   | rewriting table:             ||      | |   | |   | |   | |
   | Subflow id <--> network_id   ||      | |   | |   | |   | |
   |                              ||      | |   | |   | |   | |
   | [see table below]            ||      | |   | |   | |   | |
   |                              ||      \./   \./   \./   \./
   +------------------------------+|     path1 path2 path3 path4

      Figure 8: Functional separation of MPTCP in the transport layer

   The Multipath Scheduler only deals with abstract paths, represented
   by numbers. It only sees one address pair throughout the
   communication, which we call the connection identifier. However, the
   Multipath Scheduler must be able to perform per-subflow congestion
   control, and thus to distinguish between the subflows. This leads us
   to define a subflow identifier, which consists of the usual
   transport identifier extended with the path index. The following
   options, described in [3], are managed by the Multipath Scheduler.

   o  MULTIPATH CAPABLE (MPC): Tell the peer that we support MPTCP.
      Note that the MPC option also holds a token, which is necessary
      only if the built-in Path Manager is used. In the next section we
      describe the generalized case, where the token can be ignored by
      the receiver if another path manager is used.

   o  DATA SEQUENCE NUMBER (DSN): Identify the position of a set of
      bytes in the meta-flow.

   o  DATA FIN (DFIN): Terminate a meta-flow.

   An implementation MUST use those options even if a Path Manager
   other than the default one is implemented.

   The Path Manager applies a particular technology to give the MPS the
   possibility to use several paths. The built-in MPTCP Path Manager
   uses multiple IPv4 addresses as its means to influence the
   forwarding of packets through the Internet.

   When the MPS starts a new connection, the PM chooses a token that
   will be used to identify the connection. This is necessary to allow
   the PM to apply the correct path index to incoming packets. An
   example mapping table is given hereafter:

   +-----------------+---------------+---------+-----------------+
   | connection id   | subflow id    | token   | Network id      |
   +-----------------+---------------+---------+-----------------+
   |                 |               | token_1 |                 |
   |                 |               | token_1 |                 |
   |                 |               | token_1 |                 |
   |                 |               | token_1 |                 |
   |                 |               | token_2 |                 |
   |                 |               | token_2 |                 |
   +-----------------+---------------+---------+-----------------+

              Table 1: Example mapping table for built-in PM

   Table 1 shows an example where two connections are ongoing. One is
   identified by token_1, the other by token_2. Since addresses are
   rewritten by the path manager, the attachment to the right
   connection is achieved by means of the token, which is used at
   connection establishment and subflow establishment, and is then
   remembered. The first column holds the information that is exposed
   to the applications, while the last column shows the information
   that is actually written in packets sent through the network. We
   note that, in addition to the addresses, ports can be rewritten,
   which contributes to supporting NATs. The table also shows the role
   of the token, which is to attach various combinations of ports and
   addresses to a single connection.
   The token is specific to the built-in path manager, and can be
   ignored if another path manager is used. An implementation of the
   built-in path manager MUST implement the following options (defined
   in more detail in [3]):

   o  Add Address (ADDR): Announce a new address we own

   o  Remove Address (REMADDR): Withdraw a previously announced address

   o  Join Connection (JOIN): Attach a new subflow to the current
      connection

   Those options form the default MPTCP Path Manager, which is based on
   declaring IP addresses and carries control information in TCP
   options. An implementation of Multipath TCP can use any Path
   Manager, but it MUST be able to fall back to the default PM in case
   the other end does not support the custom PM. Alternative Path
   Managers may be specified in separate documents in the future.

A.1.2.  Generic architecture for MPTCP

   Now that the functional decomposition has been shown for MPTCP with
   the built-in Path Manager, we show how that architecture can be
   generalized to allow the implementation of other Path Managers for
   MPTCP. A general overview of the architecture is provided in
   Figure 9. The Multipath Scheduler (MPS) learns about the number of
   available paths through notifications received from the Path Manager
   (PM). From the point of view of the Multipath Scheduler, a path is
   just a number, called a Path Index. Notifications from the PM to the
   MPS MAY contain supporting information about the paths, if relevant,
   so that the MPS can make more intelligent decisions about where to
   route traffic. When the Multipath Scheduler initiates a
   communication to a new host, it can only send the packets to the
   default path. But since the Path Manager is layered below the MPS,
   it can detect that a new communication is happening, and tell the
   MPS about the other paths it knows about.
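
   The notification flow described above can be sketched as follows.
   This is only an illustration under stated assumptions: the class and
   method names are hypothetical, the "preference" key stands in for
   whatever supporting information a PM might supply, and round-robin
   selection is used purely as a placeholder scheduling policy that
   this document does not prescribe.

```python
from typing import Dict, Iterable, Optional

PathInfo = Dict[str, float]  # optional supporting information (hypothetical keys)


class MultipathScheduler:
    """Hypothetical MPS that only knows paths as integer path indices."""

    def __init__(self) -> None:
        self.paths: Dict[int, PathInfo] = {}
        self._next = 0

    def on_paths_available(
        self,
        indices: Iterable[int],
        info: Optional[Dict[int, PathInfo]] = None,
    ) -> None:
        """Called by the PM to announce path indices the MPS may use."""
        for idx in indices:
            self.paths[idx] = (info or {}).get(idx, {})

    def on_path_removed(self, idx: int) -> None:
        """Called by the PM when it stops providing service for a path."""
        self.paths.pop(idx, None)

    def pick_path(self) -> Optional[int]:
        """Return a path index, or None to send untagged on the default path."""
        if not self.paths:
            return None  # no notification yet: only the default path exists
        indices = sorted(self.paths)
        idx = indices[self._next % len(indices)]  # placeholder: round robin
        self._next += 1
        return idx
```

   Before any notification arrives, pick_path() returns None, which
   corresponds to sending on the default path; once the PM has
   announced indices, the scheduler can start spreading packets across
   them.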

           Control plane <--    |   --> Data plane
   +---------------------------------------------------------------+
   |                   Multipath Scheduler (MPS)                   |
   +---------------------------------------------------------------+
       ^                        |              |
       |                        |       [A1,B1,|pA1,pB1]
       |                        |              |
       |Announcing new          |       +-------------+
       |paths. (referred        |       | Data packet |<--Path idx:3
       |to as path indices)     |       +-------------+   attached
       |                        |              |          by MPS
       |                        |              V
   +--------------------------------------------\------------------+
   |           Path Manager (PM)                 \__________zzzzz  |
   +--------------------------------------------------------\------+
    /                          \ |                           \
   /---------------------------\ |              /"\   /"\   /"\
   | subflow_id       Action   | |              | |   | |   | |
   |                  xxxxx    | |              | |   | |   | |
   |                  yyyyy    | |              \./   \./   \./
   |                  zzzzz    | |             path1 path2 path3
   +---------------------------+

                Figure 9: Overview of MPTCP architecture

   From then on, it is possible for the MPS to associate a Path Index
   with its packets, so that the Path Manager can map this Path Index
   to a particular action (see the table in the lower left part of
   Figure 9). The particular action depends on the network mechanism
   used to select a path. Examples are address rewriting, tunnelling or
   setting a path selector value inside the packet. Note that the Path
   Index is not supposed to be written inside the packet, but instead
   associated with it, internally to the implementation.

   The applicability of the architecture is not limited to the MPTCP
   protocol. While we define in this document an MPTCP MPS (MPTCP
   Multipath Scheduler), other Multipath Schedulers can be defined. For
   example, if an appropriate socket interface is designed,
   applications could behave as a Multipath Scheduler and decide where
   to send any particular data. In this document we concentrate on the
   MPTCP case, however.

A.2.  PM/MPS interface

   The minimal set of requirements for a Path Manager is as follows:

   o  Outgoing untagged packets: Any outgoing packet flowing through
      the Path Manager is either tagged with a path index by the MPS or
      left untagged. If it is untagged, the packet is sent normally to
      the Internet, as if no multipath support were present. Untagged
      packets can be used to trigger a path discovery procedure; that
      is, a Path Manager can listen to untagged packets and decide at
      some time to find out whether any path other than the default one
      is usable for the corresponding host pair. Note that any other
      criteria could be used to decide when to start discovering
      available paths. Note also that MPS scheduling will not be
      possible until the Path Manager has notified the available paths.
      The PM is thus the first entity coming into action.

   o  Outgoing tagged packets: The Path Manager maintains a table
      mapping path indices to actions. The action is the operation that
      allows a particular path to be used. Examples of possible actions
      are route selection, interface selection or packet
      transformation. When the PM sees a packet tagged with a path
      index, it looks up its table to find the appropriate action for
      that packet. The tag is purely local. It is removed before the
      packet is transmitted.

   o  Incoming packets: A Path Manager MUST ensure that each incoming
      path is mapped unambiguously to exactly one outgoing path. Note
      that this requirement implies that the same number of incoming
      and outgoing paths must be established. Moreover, a PM MUST tag
      any incoming path with the same Path Index as the one used for
      the corresponding outgoing path. This is necessary for MPTCP to
      know which outgoing path is acknowledged by an incoming packet.

   o  Module interface: A PM MUST be able to notify the MPS about the
      number of available paths.
      Such notifications MUST contain the path indices that are legal
      for use by the MPS. In case the PM decides to stop providing
      service for one path, it MUST notify the MPS about path removal.
      Additionally, a PM MAY provide complementary path information
      when available, such as link quality or preference level.

Appendix B.  Changelog

B.1.  Changes since draft-ietf-mptcp-architecture-00

   o  Added middlebox compatibility discussion (Section 9).

   o  Clarified path identification (TCP 4-tuple) in Section 5.5.

   o  Added brief scenario and diagram to Section 1.3.

Authors' Addresses

   Alan Ford (editor)
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk

   Costin Raiciu
   University College London
   Gower Street
   London WC1E 6BT
   UK

   Email: c.raiciu@cs.ucl.ac.uk

   Sebastien Barre
   Universite catholique de Louvain
   Pl. Ste Barbe, 2
   Louvain-la-Neuve 1348
   Belgium

   Phone: +32 10 47 91 03
   Email: sebastien.barre@uclouvain.be

   Janardhan Iyengar
   Franklin and Marshall College
   Mathematics and Computer Science
   PO Box 3003
   Lancaster, PA 17604-3003
   USA

   Phone: 717-358-4774
   Email: jiyengar@fandm.edu