Internet Engineering Task Force                                  A. Ford
Internet-Draft                                       Roke Manor Research
Intended status: Experimental                                  C. Raiciu
Expires: April 29, 2010                                       M. Handley
                                               University College London
                                                        October 26, 2009

     TCP Extensions for Multipath Operation with Multiple Addresses
                   draft-ford-mptcp-multiaddressed-02

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 29, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   TCP/IP communication is currently restricted to a single path per
   connection, yet multiple paths often exist between peers.
   The simultaneous use of these multiple paths for a TCP/IP session
   would improve resource usage within the network, and thus improve
   user experience through higher throughput and improved resilience to
   network failure.

   Multipath TCP provides the ability to simultaneously use multiple
   paths between peers.  This document presents a set of extensions to
   traditional TCP to support multipath operation.  The protocol offers
   the same type of service to applications as TCP (i.e. a reliable
   bytestream), and provides the components necessary to establish and
   use multiple TCP flows across potentially disjoint paths.

Table of Contents

   1.  Introduction
     1.1.  Design Assumptions
     1.2.  Layered Representation
     1.3.  Operation Summary
     1.4.  Open Issues
     1.5.  Requirements Language
   2.  Terminology
   3.  Semantic Issues
   4.  MPTCP Protocol
     4.1.  Session Initiation
     4.2.  Starting a New Subflow
     4.3.  Address Knowledge Exchange (Path Management)
       4.3.1.  Adding Addresses
       4.3.2.  Remove Address
     4.4.  General MPTCP Operation
       4.4.1.  Receive Window Considerations
       4.4.2.  Congestion Control Considerations
       4.4.3.  Subflow Policy
       4.4.4.  Retransmissions
     4.5.  Closing a Connection
     4.6.  Error Handling
   5.  Security Considerations
   6.  Interactions with Middleboxes
   7.  Interfaces
   8.  Acknowledgements
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Notes on use of TCP Options
   Appendix B.  Resync Packet
   Authors' Addresses

1.  Introduction

   Multipath TCP (henceforth referred to as MPTCP) is a set of
   extensions to regular TCP [2] to allow a transport connection to
   operate across multiple paths simultaneously.  This document presents
   the protocol changes required by Multipath TCP, specifically those
   for signalling and setting up multiple paths ("subflows"), managing
   these subflows, reassembly of data, and termination of sessions.
   This is not the only information required to create a Multipath TCP
   implementation, however.  This document is complemented by several
   others:

   o  Architecture [3], which explains the motivations behind Multipath
      TCP and a functional separation through which an extensible MPTCP
      implementation can be developed.

   o  Congestion Control [4], presenting a safe congestion control
      algorithm for coupling the behaviour of the multiple paths in
      order to "do no harm" to other network users.

   o  Application Considerations [5], discussing what impact MPTCP will
      have on applications, what applications will want to do with
      MPTCP, and as a consequence of these factors, what API extensions
      an MPTCP implementation should present.

1.1.  Design Assumptions

   In order to limit the potentially huge design space, the authors
   imposed two key constraints on the multipath TCP design presented in
   this document:

   o  It must be backwards-compatible with current, regular TCP, to
      increase its chances of deployment.

   o  It can be assumed that one or both endpoints are multihomed and
      multiaddressed.

   To simplify the design, we assume that the presence of multiple
   addresses at an endpoint is sufficient to indicate the existence of
   multiple paths.  These paths need not be entirely disjoint: they may
   share one or many routers between them.  Even in such a situation,
   making use of multiple paths is beneficial, improving resource
   utilisation and resilience to a subset of node failures.  The
   congestion control algorithms discussed in [4] ensure this does not
   act detrimentally.

   There are three aspects to the backwards-compatibility listed above:

   External Constraints:  The protocol must function through the vast
      majority of existing middleboxes such as NATs, firewalls and
      proxies, and as such must resemble existing TCP as far as possible
      on the wire.  Furthermore, the protocol must not assume the
      segments it sends on the wire arrive unmodified at the
      destination: they may be split or coalesced; options may be
      removed or duplicated.

   Application Constraints:  The protocol must be usable with no change
      to existing applications that use the standard TCP API (although
      it is reasonable that not all features would be available to such
      legacy applications).

   Fall-back:  The protocol should be able to fall back to standard TCP
      with no interference from the user, to be able to communicate with
      legacy hosts.

   Areas for further study:

   o  In theory, since this is purely a TCP extension, it should be
      possible to use MPTCP with both IPv4 and IPv6 on dual-stack hosts,
      thus having the additional possible benefit of aiding transition.

   o  Some features of the design presented here could be extended to
      work with non-multi-addressed hosts by using other packet metadata
      (such as ports or flow label), packet marking, or partial
      (potentially proxied) multipath.

1.2.  Layered Representation

   MPTCP operates at the transport layer and aims to be transparent to
   both higher and lower layers.  It is a set of additional features on
   top of standard TCP, and as such MPTCP is designed to be usable by
   legacy applications with no changes.  A possible implementation
   would be for such a feature to be a system-wide setting: "Use
   multipath TCP by default? Y/N".  Multipath-aware applications would
   be able to use an extended sockets API to have further influence on
   the behaviour of MPTCP.  Figure 1 illustrates this layering.

                                   +-------------------------------+
                                   |          Application          |
      +---------------+            +-------------------------------+
      |  Application  |            |             MPTCP             |
      +---------------+            + - - - - - - - + - - - - - - - +
      |      TCP      |            | Subflow (TCP) | Subflow (TCP) |
      +---------------+            +-------------------------------+
      |      IP       |            |       IP      |      IP       |
      +---------------+            +-------------------------------+

     Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks

   Detailed discussion of an architecture for developing a multipath TCP
   implementation, especially regarding the functional separation by
   which different components should be developed, is given in [3].

1.3.  Operation Summary

   This section provides a high-level summary of normal operation of
   MPTCP, and is illustrated by the scenario shown in Figure 2.  A
   detailed description of operation is given in Section 4.

   o  To a non-MPTCP-aware application, MPTCP will be indistinguishable
      from normal TCP.  All MPTCP operation is handled by the MPTCP
      implementation, although extended APIs could provide additional
      control and influence [5].  An application begins by opening a TCP
      socket in the normal way.

   o  An MPTCP connection begins as a single TCP session.  This is
      illustrated in Figure 2 as being between Addresses A1 and B1 on
      Hosts A and B respectively.

   o  If extra paths are available, additional TCP sessions are created
      on these paths, and are combined with the existing session, which
      continues to appear as a single connection to the applications at
      both ends.  The creation of the additional TCP session is
      illustrated between Address A2 on Host A and Address B1 on Host B.

   o  MPTCP identifies multiple paths by the presence of multiple
      addresses at endpoints.  Combinations of these multiple addresses
      equate to the additional paths.  In the example, other potential
      paths that could be set up are A1<->B2 and A2<->B2.  Although this
      additional session is shown as being initiated from A2, it could
      equally have been initiated from B1.

   o  The discovery and setup of additional TCP sessions (termed
      'subflows') will be achieved through a path management method.
      This document describes a mechanism by which an endpoint can
      initiate new subflows by using its additional addresses, or by
      signalling its available addresses to the other endpoint.

   o  MPTCP adds connection-level sequence numbers in order to
      reassemble the data stream in order from multiple subflows.
      Connections are terminated by connection-level FIN packets as well
      as those relating to the individual subflows.
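The statement that combinations of addresses equate to paths can be illustrated with a short Python sketch. It is not part of the protocol: the function name `candidate_paths` and the string labels standing in for addresses are hypothetical, and a real implementation would apply local policy when deciding which of these pairs to actually try.

```python
from itertools import product


def candidate_paths(local_addrs, remote_addrs, in_use):
    """Return (local, remote) address pairs that do not yet carry a subflow.

    Each combination of one local and one remote address is treated as a
    potential path, following the assumption that multiple addresses
    imply multiple paths.  Pairs already present in `in_use` are skipped.
    """
    return [pair for pair in product(local_addrs, remote_addrs)
            if pair not in in_use]


# Host A (A1, A2) talking to Host B (B1, B2): the initial subflow uses
# A1<->B1 and the additional one A2<->B1, as in the example scenario.
existing = {("A1", "B1"), ("A2", "B1")}
print(candidate_paths(["A1", "A2"], ["B1", "B2"], existing))
# -> [('A1', 'B2'), ('A2', 'B2')]
```

The result is only a set of candidates: as the list above notes, a subflow over a given pair could equally be initiated by either endpoint.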

            Host A                                Host B
   ------------------------              ------------------------
   Address A1    Address A2              Address B1    Address B2
   ----------    ----------              ----------    ----------
       |             |                       |             |
       |     (initial connection setup)      |             |
       |------------------------------------>|             |
       |<------------------------------------|             |
       |             |                       |             |
       |               (additional subflow setup)          |
       |             |---------------------->|             |
       |             |<----------------------|             |
       |             |                       |             |
       |             |                       |             |

                 Figure 2: Example MPTCP Usage Scenario

1.4.  Open Issues

   This specification is a work-in-progress, and as such there are many
   issues that are still to be resolved.  This section lists many of the
   key open issues within this specification; these are discussed in
   more detail in the appropriate sections throughout this document.

   o  Best handshake mechanisms (Section 4.1).  This document contains a
      proposed scheme by which connections and subflows can be set up.
      It is felt that, although this is "no worse than regular TCP",
      there could be opportunities for significant improvements in
      security that could be included (potentially optionally) within
      this protocol.

   o  Issues around simultaneous opens, where both ends attempt to
      create a new subflow simultaneously, need to be investigated and
      behaviour specified.

   o  Appropriate mechanisms for controlling policy/priority of subflow
      usage (specifically regarding controlling incoming traffic,
      Section 4.4.3).  The ECN signal is currently proposed, but other
      alternatives, including per-subflow receive windows or path
      property options, could be employed instead.

   o  How much control do we want over subflows from other subflows
      (e.g. closing when an interface has failed)?  Do we want to
      differentiate between subflows and addresses (Section 4.2)?

   o  Do we want a connection identifier in every packet?  For example,
      would this make implementation of an IDS much easier?

   o  Best way of ensuring data/subflow sequence numbering mapping
      through middleboxes (Section 4.4)?

   o  Is there any benefit to a data-level acknowledgement?

1.5.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

2.  Terminology

   Path:  A sequence of links between a sender and a receiver, defined
      in this context by a source and destination address pair.

   Subflow:  A stream of TCP packets sent over a path.  A subflow is a
      component part of a connection between two endpoints.

   Connection:  A collection of one or more subflows, over which an
      application can communicate between two endpoints.  There is a
      one-to-one mapping between a connection and a socket.

   Token:  A unique identifier given to a multipath connection by an
      endpoint.  May also be referred to as a "Connection ID".

   Endpoint:  A host operating an MPTCP implementation, and either
      initiating or terminating an MPTCP connection.

3.  Semantic Issues

   In order to support multipath operation, the semantics of some TCP
   components have changed.  To aid clarity, this section collects these
   semantic changes as a reference.

   Sequence Number:  The (in-header) TCP sequence number is
      subflow-specific.  To allow the receiver to reorder application
      data, an additional data-level sequence space is used.  In this
      space, the initial SYN and the final DATA FIN each occupy one
      octet.  There is an explicit mapping of data sequence space to
      subflow sequence space, which is signalled through TCP options in
      data packets.

   Receive Window:  The receive window exists at the connection level,
      rather than at the subflow level, as it tries to regulate the
      sending rate of the sender to a slower receiver.
      With multipath TCP, each subflow MUST report the same global
      receive window, describing the per-connection receive buffer.

   FIN:  The FIN only applies to a subflow, not to a connection.  For a
      connection-level FIN, use the DATA FIN option.

   ACK:  The ACK acknowledges the subflow sequence number only, and the
      mapping to the data sequence number is handled out-of-band.

   RST:  The RST only applies to a subflow.  There is no
      connection-level RST, since it would be impossible to distinguish
      the two, i.e. if there is no state about a subflow, the host
      cannot know to what connection the subflow is related.  A
      connection is considered reset if every subflow sends a RST in
      response.

   Address List:  The address management is handled per-connection to
      permit the application of per-connection local policy.

   5-tuple:  The 5-tuple (protocol, local address, local port, remote
      address, remote port) presented to the application layer in a
      non-multipath-aware application is that of the first subflow, even
      if the subflow has since been closed and removed from the
      connection.  These API issues are discussed in more detail in [5].

4.  MPTCP Protocol

   This section describes the operation of the MPTCP protocol, and is
   subdivided into sections for each key part of the protocol operation.

   All MPTCP operations are signalled using optional TCP header fields.
   These TCP Options will have option numbers allocated by IANA, as
   listed in Section 9, and are defined throughout the following
   subsections.

4.1.  Session Initiation

   Session Initiation begins with a SYN, SYN/ACK exchange on a single
   path.  Each of these packets will additionally feature the Multipath
   Capable TCP option (Figure 3), which declares the sender's locally
   unique 32-bit token for this connection, and contains a version
   field.
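The locally unique 32-bit token mentioned above could be managed with a small per-host table, sketched here in Python under stated assumptions: the class `TokenTable` and its methods are hypothetical, tokens are drawn from the standard `secrets` module so that they are hard to guess (a requirement stated in the following paragraphs), and a token is re-drawn on collision so that it stays unique for the sender.

```python
import secrets


class TokenTable:
    """Hypothetical per-host registry of locally unique 32-bit tokens."""

    def __init__(self):
        self._connections = {}  # token -> connection state

    def allocate(self, connection):
        """Pick a random, currently unused token and bind it to `connection`."""
        while True:
            token = secrets.randbits(32)  # unpredictable 32-bit value
            if token not in self._connections:  # unique for this host
                self._connections[token] = connection
                return token

    def lookup(self, token):
        """Return the connection owning `token`, or None if it is unknown."""
        return self._connections.get(token)

    def release(self, token):
        """Forget `token` once its connection has closed."""
        self._connections.pop(token, None)


table = TokenTable()
token = table.allocate("connection-1")
print(f"token 0x{token:08x} -> {table.lookup(token)}")
```

Under this sketch, a Join option arriving on a new subflow (Section 4.2) would be demultiplexed by calling `lookup` with the token it carries.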

   The "Multipath Capable" option declares an endpoint to be capable of
   operating Multipath TCP (or, more accurately, declares a desire to
   operate Multipath TCP on this particular connection).  As well as
   this declaration, the option presents a token, which is used when
   adding additional subflows to this connection.

   This token is generated by the sender and has local meaning only,
   hence it MUST be unique for the sender.  The token MUST be difficult
   for an attacker to guess, and thus it SHOULD be generated randomly.
   (However, see further discussions about security in Section 5,
   including the possibility of 64-bit tokens.)

   This option is only present in packets with the SYN flag set.  It is
   only used in the first TCP session of a connection, in order to
   identify the connection; all subsequent subflows will use path
   management options (see Section 4.2) to join the existing connection.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------------------------------+
   | Kind=OPT_MPC  |  Length = 7   |(resvd)|Version| Sender Token  :
   +---------------+---------------+-------------------------------+
   :   Sender Token (continued - 4 octets total)   |
   +-----------------------------------------------+

                   Figure 3: Multipath Capable option

   The version field represents the version of MPTCP in use.  The
   version specified in this document is 0.  The reserved bits may be
   used for connection-specific flags in later versions, or may be used
   to indicate an authentication method.

   If a SYN contains a "Multipath Capable" option but the SYN/ACK does
   not, it is assumed that the recipient is not multipath capable and
   thus the MPTCP session will operate as regular, single-path TCP.  If
   a SYN does not contain a "Multipath Capable" option, the SYN/ACK MUST
   NOT contain one in response.
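The wire format in Figure 3 and the negotiation rule above can be sketched in Python with the standard `struct` module. This document leaves the option kind to IANA, so `OPT_MPC = 253` is an assumed placeholder (253 is one of the experimental TCP option kinds reserved by RFC 4727); the function names are likewise hypothetical.

```python
import struct

OPT_MPC = 253      # assumed placeholder; the real kind is allocated by IANA
MPC_LENGTH = 7     # Kind + Length + reserved/version octet + 4-octet token
MPTCP_VERSION = 0  # version defined by this specification


def encode_mpc(token, version=MPTCP_VERSION):
    """Build the Multipath Capable option laid out in Figure 3."""
    if not (0 <= token < 2**32 and 0 <= version < 16):
        raise ValueError("token must fit in 32 bits, version in 4 bits")
    # Reserved bits (high nibble) are zero; version is the low nibble.
    # "!" selects network byte order with no padding: 1+1+1+4 = 7 octets.
    return struct.pack("!BBBI", OPT_MPC, MPC_LENGTH, version, token)


def decode_mpc(option):
    """Return (version, token) from an encoded Multipath Capable option."""
    if len(option) < MPC_LENGTH:
        raise ValueError("option too short")
    kind, length, flags, token = struct.unpack("!BBBI", option[:MPC_LENGTH])
    if kind != OPT_MPC or length != MPC_LENGTH:
        raise ValueError("not a Multipath Capable option")
    return flags & 0x0F, token  # ignore the reserved high nibble


def synack_carries_mpc(syn_has_mpc, responder_wants_mptcp):
    """Responder: a SYN/ACK MUST NOT carry the option unless the SYN did."""
    return syn_has_mpc and responder_wants_mptcp


def use_mptcp(syn_has_mpc, synack_has_mpc):
    """Initiator: without the option in the SYN/ACK, use regular TCP."""
    return syn_has_mpc and synack_has_mpc


option = encode_mpc(0xDEADBEEF)
print(option.hex())        # fd0700deadbeef
print(decode_mpc(option))  # (0, 3735928559)
```

Masking off the reserved nibble on decode leaves room for the connection-specific flags or authentication indicators that later versions may define there.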

   If these packets are unacknowledged, it is up to local policy to
   decide how to respond.  It is expected that a sender will eventually
   fall back to single-path TCP (i.e. without the Multipath Capable
   option), in order to work around middleboxes that may drop packets
   with unknown options; however, the number of multipath-capable
   attempts that are made first will be up to local policy.  In the case
   of out-of-order packets, i.e. if a multipath-capable SYN/ACK is
   received in response to a multipath-capable SYN after a standard SYN
   has been sent, it is once again up to the initiator to choose how to
   behave.  For example, it could respond to new connections using the
   previously declared token, or it could simply drop any new multipath
   options within the flow.

   If an endpoint is known to be multiaddressed (e.g. through multiple
   addresses returned in a DNS lookup), alternative destination
   addresses SHOULD be tried first, before falling back to regular TCP.

   In addition to this option, a Data Sequence Number option (discussed
   in Section 4.4) is included to provide an initial data-level sequence
   number (and this initial SYN counts as one octet in this space, as
   for a regular SYN in single-path TCP).  This could also have some
   (minor) security benefits, discussed in Section 5.

4.2.  Starting a New Subflow

   Endpoints have knowledge of their own address(es), and can become
   aware of the other endpoint's addresses through signalling exchanges
   as described in Section 4.3.  Using this knowledge, an endpoint will
   initiate a new subflow over a currently unused pair of addresses.

   A new subflow is started as a normal TCP SYN/ACK exchange.  The
   "Join" TCP option (Figure 4) is used to identify the connection of
   which the new subflow should become a part.
   The token used is the locally unique token of the destination for the
   subflow, as defined by the Multipath Capable option received in the
   first SYN/ACK exchange.

   It should be noted that, in theory, additional subflows can exist
   between any pair of ports; no explicit accept calls or bind calls are
   required to open additional subflows.  To associate a new subflow
   with an existing connection, the token supplied in the subflow's SYN
   exchange is used for demultiplexing.  This means that port numbers on
   subflow SYN exchanges are not important, and any values can be used,
   as long as the 5-tuple is unique for each host.  In practice, it is
   envisaged that most new subflows will connect to a port that is
   already in use as the source or destination port of an existing
   subflow, in order to have a greater chance of getting through
   firewalls and other middleboxes, and to support traffic engineering
   of the flows.

   Demultiplexing subflow SYNs MUST be done using the token; this is
   unlike traditional TCP, where the destination port is used for
   demultiplexing SYN packets.  Once a subflow is set up, demultiplexing
   packets is done using the five-tuple, as in traditional TCP.

   The "Join" option includes an "Address ID".  This is an identifier,
   locally unique to the sender of this option and with only
   per-connection relevance, which identifies the source address of this
   packet.  This serves two purposes.  Firstly, if an address becomes
   unexpectedly unavailable on the sender, it can signal this to the
   receiver via a Remove Address option (Section 4.3.2) without needing
   to know what the source address actually is (thus allowing the use of
   NATs).  Secondly, it allows correlation between new connection
   attempts and address signalling (Section 4.3.1), to prevent duplicate
   subflow initiation.

   TBD: Instead of an Address ID, are there any cases where a Subflow ID
   (i.e. unique to the subflow) would be useful instead?  For example,
   two addresses which become NATted to the same address?

   This option can only be present when the SYN flag is set.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------------------------------+
   | Kind=OPT_JOIN |  Length = 7   |Receiver Token (4 octets total):
   +---------------+---------------+-------------------------------+
   :  Receiver Token (continued)   |  Address ID   |
   +-------------------------------+---------------+

                    Figure 4: Join Connection option

4.3.  Address Knowledge Exchange (Path Management)

   We use the term "path management" to refer to the exchange of
   information about additional paths between endpoints, which in this
   design is managed by multiple addresses at endpoints.  For more
   detail of the architectural thinking behind this design, see the
   separate document [3].

   This design makes use of two methods of sharing such information,
   used simultaneously.  The first is the direct setup of new subflows,
   already described in Section 4.2, where the initiator has an
   additional address.  The second method is described in the following
   subsections, whereby addresses are signalled explicitly to the other
   endpoint, to allow it to initiate new connections.  This approach, of
   two complementary mechanisms, has been chosen to allow addresses to
   change in flight, and thus support operation through NATs, whilst
   also allowing the signalling of previously unknown addresses, such as
   those belonging to other address families (e.g. IPv4 and IPv6).

   Here is an example of typical operation of the protocol:

   o  An endpoint that is multihomed starts an additional TCP session to
      an address/port pair that is already in use on the other endpoint,
      using a token to identify the flow (Section 4.2).
      (A multihomed destination may open a new subflow from its new
      address to an existing subflow's source address and port, or a
      multihomed source may open a new subflow from its new address to
      an existing subflow's destination address and port.)

   o  To expand upon this, say a connection is initiated from host "A"
      on (address, port) combination A1 to destination (address, port)
      B1 on host "B".  If host A is multihomed, it starts an additional
      connection from new (address, port) A2 to B1, using B's previously
      declared token.  Alternatively, if B is multihomed, it will try to
      set up a new TCP connection from B2 to A1, using A's previously
      declared token.

   o  Simultaneously (or after a timeout), an "Add Address" option
      (Section 4.3.1) is sent on an existing subflow, informing the
      receiver of the sender's alternative address(es).  The recipient
      can use this information to open a new subflow to the sender's
      additional address.  Using the previous notation, this would be an
      Add Address packet sent from A1 to B1, informing B of address A2.

   o  The mix of using the SYN-based option and the Add Address option,
      including timeouts, is implementation-specific and can be tailored
      to agree with local policy.

   o  If host B successfully receives the first SYN, starting a new
      subflow, it can use the Address ID in the Join option to correlate
      this with the Add Address option that will also arrive on an
      existing subflow.  Assuming the endpoint has already responded to
      the SYN with a SYN/ACK, it will know to ignore the Add Address
      option.  Otherwise, if it has not received such a SYN, it will try
      to initiate a new subflow from one or more of its addresses to
      address A2 (triggered by the Add Address option).  This is
      intended to permit new sessions to be opened if one endpoint is
      behind a NAT.
A slight security improvement can be gained if a 552 host ensures there is a correlated Add Address option before 553 responding to the SYN. 555 Other scenarios are valid, however, such as those where entirely new 556 addresses are signalled, e.g. to allow an IPv6 and an IPv4 path to be 557 used simultaneously. 559 4.3.1. Adding Addresses 561 The Add Address TCP Option announces additional addresses on which an 562 endpoint can be reached (Figure 5), which allows several (ID, 563 address) pairs to be announced to the other endpoint. Multiple 564 addresses can be added if there is sufficient TCP option space, 565 otherwise multiple TCP messages containing this option will be sent. 566 This option can be used at any time during a connection, depending on 567 when the sender wishes to enable multiple paths and/or when paths 568 become available. 570 Every address has an ID which can be used for address removal, and 571 therefore endpoints must cache the mapping between ID and address. 572 This is also used to identify Join Connection options (Section 4.2) 573 relating to the same address, even when address translators are in 574 use. The ID must be unique to the sender and connection, per 575 address, but its mechanism for allocating such IDs is implementation- 576 specific. 578 This option is shown for IPv4. For IPv6, the IPVer field will read 579 6, and the length of the address will be 16 octets not 4, and thus 580 the length of the option will be 2 + (18 * number_of_entries). If 581 there is sufficient TCP option space, multiple addresses can be 582 included, with an ID following on immediately from the previous 583 address, and their existance can be inferred through the option 584 length and version fields. 586 NB: by having a IPVer field, we get four free reserved bits. These 587 could be used in later versions of this protocol for expressing 588 sender policy, e.g. 
one bit for "use now" or similar, to 589 differentiate between subflows for backup purposes and those for 590 throughput. 592 1 2 3 593 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 594 +---------------+---------------+---------------+-------+-------+ 595 | Kind=OPT_ADDR | Length | Address ID | IPVer |(resvd)| 596 +---------------+---------------+---------------+-------+-------+ 597 | Address (IPv4 - 4 octets) | 598 +---------------------------------------------------------------+ 599 ( ... further ID/Version/Address fields as required ... ) 601 Figure 5: Add Address option (for IPv4) 603 4.3.2. Remove Address 605 If, during the lifetime of a MPTCP connection, a previously-announced 606 address becomes invalid (e.g. if the interface disappears), the 607 affected endpoint should announce this so that the other endpoint can 608 remove subflows related to this address. 610 This is achieved through the Remove Address option (Figure 6), which 611 will remove a previously-added address (or list of addresses) from a 612 connection and terminate any subflows currently using that address. 614 The sending and receipt of this message should trigger the sending of 615 FINs by both endpoints on the affected subflow(s) (if possible), as a 616 courtesy to cleaning up middlebox state, but endpoints may clean up 617 their internal state without a long timeout. 619 Address removal is undertaken by ID, so as to permit the use of NATs 620 and other middleboxes. If there is no address at the requested ID, 621 the receiver will silently ignore the request. 623 The standard way to close a subflow (so long as it is still 624 functioning) is to use a FIN exchange as in regular TCP - for more 625 information, see Section 4.5. 627 1 2 3 628 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 629 +---------------+---------------+---------------+ 630 |Kind=OPT_REMADR| Length = 2+n | Address ID | ... 
   +---------------+---------------+---------------+

                     Figure 6: Remove Address option

4.4. General MPTCP Operation

This section discusses operation of MPTCP for data transfer. At a high level, an MPTCP implementation will take one input data stream from an application and split it across one or more subflows. The data stream as a whole can be reassembled through the use of the Data Sequence Mapping option (Figure 7), which defines the mapping from the data sequence number to the subflow sequence number. This is used by the receiver to ensure in-order delivery to the application layer. Meanwhile, the subflow-level sequence numbers (i.e. the regular sequence numbers in the TCP header) have subflow-only relevance.

The only acknowledgements are those at the subflow level, so the sender must be able to map these acknowledgements to the data sequence numbers that were contained in the relevant packets. If subflow data goes unacknowledged, the sender thus knows which part of the original data stream it corresponds to, and hence what data must be retransmitted. It is expected (but not mandated) that SACK [6] is used at the subflow level to improve efficiency.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------------------------------+
   | Kind=OPT_DSN  |    Length     | Data Sequence Number ...      :
   +---------------+---------------+-------------------------------+
   :  ... ( (length-8) octets )    | Data-level Length (2 octets)  |
   +-------------------------------+-------------------------------+
   |              Subflow Sequence Number (4 octets)               |
   +---------------------------------------------------------------+

                 Figure 7: Data Sequence Mapping option

This option specifies a full mapping from data sequence number to subflow sequence number, informing the receiver that there is a one-to-one correspondence between the two sequence spaces for the specified length. The purpose of the explicit mapping is to remain compatible with situations where TCP/IP segmentation or coalescing is undertaken separately from the stack that is generating the data flow (e.g. through the use of TCP segmentation offloading on network interface cards, or by middleboxes such as performance enhancing proxies).

The data sequence number specified in this option is absolute, whereas the subflow sequence numbering is relative (the SYN at the start of the subflow has subflow sequence number 1).

A mapping is unique, in that the subflow sequence number is bound to the data sequence number after the mapping has been processed. It is not possible to change this mapping afterwards; however, the same data sequence number can be mapped on different subflows for retransmission purposes (see Section 4.4.4).

A receiver MUST NOT accept data for which it does not have a mapping to the data sequence space. To enforce this, the receiver will not acknowledge the unmapped data at subflow level: it is better to have a subflow fail than to accept data in the wrong order. However, if there was a lost packet in the subflow, the receiver SHOULD wait for it to be retransmitted before closing the subflow, since the lost packet may contain the necessary mapping information.

Although it is expected that initial implementations will use 32-bit data sequence numbers (i.e. 4 octets, so a length field of 12), setting the length field to 16 and including a 64-bit sequence number (8 octets) MUST be considered valid and processed appropriately. This may also have useful security implications, discussed in Section 5.

As with the standard TCP sequence number, the data sequence number should not start at zero, but at a random value to make session hijacking harder. This is done by including a Data Sequence Number option along with the Multipath Capable option in the initial SYN (the SYN occupies one octet of data sequence space; see Section 4.1). In this case, to save option space, neither the data-level length nor the subflow sequence number fields are present in this option, so the Length field will be the length of the Data Sequence Number plus two octets.

The Data Sequence Mapping does not need to be included in every MPTCP packet, as long as the subflow sequence space in that packet is covered by a mapping known to the receiver. This can be used to reduce overhead in cases where the mapping is known in advance; one such case is when there is a single subflow between the endpoints, another is when data is scheduled in chunks larger than a single packet.

The MPTCP data-level and subflow-level sequence numbering could be seen as analogous to that used in SACK, but there are subtle differences. The key similarity is that there can be temporary "holes" in the received data sequence space: later data may have arrived earlier (most likely on a different subflow), and does not need to be retransmitted; the holes are filled in later. The key difference is that, while SACK can rely on the regular TCP cumulative acknowledgement to indicate how much data has been successfully received with no holes, MPTCP has no equivalent data-level cumulative acknowledgement. Instead, the sender must keep track of the subflow-level acknowledgements to derive what data has been successfully received. This leads to some oddities, especially with session termination (see Section 4.5).

4.4.1. Receive Window Considerations

Regular TCP advertises a receive window in each packet, telling the sender how much data the receiver is willing to accept past the cumulative ack. The receive window is used to implement flow control, throttling down fast senders when receivers cannot keep up.

MPTCP also uses a single receive window, shared between all subflows. The idea is to allow any subflow to send data as long as the receiver is willing to accept it; the alternative, maintaining per-subflow receive windows, could end up stalling some subflows while others do not use up their windows.

An issue will arise regarding how large a receive buffer to implement. The lower bound would be the maximum bandwidth-delay product of all paths; however, this could easily fill when a packet is lost on a slower subflow and needs to be retransmitted (see Section 4.4.4). The upper bound would be the maximum RTT multiplied by the maximum total bandwidth available. This will cover most eventualities, but could easily become very large. The best approach is for further study (FFS).

4.4.2. Congestion Control Considerations

Different subflows in an MPTCP connection have different congestion windows. To achieve resource pooling, it is necessary to couple the congestion windows in use on each subflow, in order to push most traffic to uncongested links. One algorithm for achieving this is presented in [4]; the algorithm does not achieve perfect resource pooling but is "safe" in that it is readily deployable in the current Internet.
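To make the idea of coupling more concrete, the following Python sketch shows one way a per-ACK window increase can be linked across subflows. It follows the "linked increases" rule later published from this line of work as RFC 6356; the algorithm in [4] may differ in detail. The `Subflow` class, the function names, and measuring windows in packets are assumptions of this sketch, not part of this specification. Each subflow's increase is capped at what a regular TCP flow would do (1/cwnd_i) and is otherwise scaled by a shared factor, alpha, computed from all subflows' windows and RTTs.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    """Hypothetical per-subflow state; windows are counted in packets."""
    cwnd: float  # congestion window (packets)
    rtt: float   # smoothed round-trip time (seconds)


def alpha(subflows: list[Subflow]) -> float:
    """Shared aggressiveness factor, in the style of RFC 6356:
    alpha = cwnd_total * max_i(cwnd_i / rtt_i^2) / (sum_i cwnd_i / rtt_i)^2
    """
    cwnd_total = sum(s.cwnd for s in subflows)
    best = max(s.cwnd / s.rtt ** 2 for s in subflows)
    denom = sum(s.cwnd / s.rtt for s in subflows) ** 2
    return cwnd_total * best / denom


def on_ack(subflows: list[Subflow], i: int) -> float:
    """Coupled increase for one ACKed packet on subflow i.

    The increase is min(alpha / cwnd_total, 1 / cwnd_i), so no subflow
    grows faster than a regular TCP flow on the same path would.
    Returns the increment that was applied.
    """
    cwnd_total = sum(s.cwnd for s in subflows)
    inc = min(alpha(subflows) / cwnd_total, 1.0 / subflows[i].cwnd)
    subflows[i].cwnd += inc
    return inc


def on_loss(subflows: list[Subflow], i: int) -> None:
    """Only the increase is coupled: on loss, subflow i halves its own
    window as regular TCP would (the one-packet floor is an assumption)."""
    subflows[i].cwnd = max(subflows[i].cwnd / 2.0, 1.0)
```

With a single subflow, alpha evaluates to 1 and the rule reduces to regular TCP's 1/cwnd increase per ACK. With two identical subflows (cwnd 10, RTT 0.5 s), alpha is 0.5 and each ACK adds 0.025 packets rather than 0.1, so the pair together grows more gently than two independent TCP flows would.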
It is likely that different congestion controllers will be implemented for MPTCP, each aiming to achieve different properties in the resource pooling/fairness/stability design space. Much research is expected in this area in the near future.

Regardless of the algorithm used, the design of the MPTCP protocol aims to provide congestion control implementations with sufficient information to make the right decisions; this information includes, for each subflow, which packets were lost and when.

4.4.3. Subflow Policy

Within a local MPTCP implementation, a host may use any local policy it wishes to decide how to share the traffic to be sent over the available paths.

In the typical use case, where the goal is to maximise throughput, all available paths will be used simultaneously for data transfer, using coupled congestion control as described in [4]. It is expected, however, that other use cases will appear.

For instance, one possibility is an 'all-or-nothing' approach, i.e. keeping a second path ready for use in the event of failure of the first path; another is to saturate one path entirely before using an additional path (the 'overflow' case). Such choices would most likely be based on the monetary cost of links, but may also be based on properties such as the delay or jitter of links, where stability is more important than throughput. Application requirements such as these are discussed in detail in [5].

Making effective choices at the sender requires full knowledge of the path "cost", which the sender is unlikely to have. MPTCP has no mechanism for a receiver to signal its own preferences for paths, yet such a feature is necessary, since receivers will often be the multihomed party and may have to pay for metered incoming bandwidth. Instead of incorporating complex signalling, it is proposed to use existing TCP features to signal priority implicitly. If a receiver wishes to keep a path active as a backup but to prevent data being sent on it, it could stop sending ACKs for any data it receives on that path; the sender would interpret this as severe congestion or a broken path and stop using it. We do not advocate this method, however, since it is brutal and naive, and will result in unnecessary retransmissions.

Therefore, a proposal is to use ECN [7] to provide fake congestion signals on paths that a receiver wishes to stop being used for data. This has the benefit of causing the sender to back off without needing to retransmit data unnecessarily, unlike the approach of withholding ACKs. This should be sufficient to allow a receiver to express its policy, although it does not permit a rapid increase in throughput when switching to such a path.

TBD: This is clearly an overload of the ECN signal, so other solutions, such as explicitly signalling path preferences (e.g. in the reserved bits of certain TCP options, or through entirely new options), may be preferable.

4.4.4. Retransmissions

This protocol specification does not mandate any mechanisms for handling retransmissions in the event of path failures, and much will depend upon local policy (as discussed in Section 4.4.3). The data sequence number, as given in a TCP option, is used to reassemble the incoming streams before presentation to the application layer, so a sender is free to re-send data with the same data sequence number on a different subflow. When doing this, an endpoint must still retransmit the original data on the original subflow in order to preserve subflow integrity (middleboxes could replay old data, and/or could reject holes in subflows), and a receiver will ignore these retransmissions. While this is clearly suboptimal, for compatibility reasons we feel this is the best behaviour. Optimisations could be negotiated in future versions of this protocol.

Of course, retransmissions on alternative subflows will only occur if this is what local policy suggests. Indeed, it may be equally valid to retransmit on the same subflow if alternative paths have considerably worse quality of service, or are only kept for backup purposes. Additionally, some implementations may be able to receive signals from lower layers about problems with the paths, allowing more appropriate responses.

4.5. Closing a Connection

Under single-path TCP, a FIN signifies that the sender has no more data to send. In order to allow subflows to operate independently, however, and with as little change from regular TCP as possible, a FIN in MPTCP will only affect the subflow on which it is sent. This allows nodes to exercise considerable freedom over which paths are in use at any one time. The semantics of a FIN remain as for regular TCP, i.e. the subflow is not fully closed until both sides have ACKed each other's FINs.

When an application calls close() on a socket, this indicates that it has no more data to send, and for regular TCP this would result in a FIN on the connection. For MPTCP, an equivalent mechanism is needed, and this is the DATA FIN. This option, shown in Figure 8, is attached to a segment with the FIN flag set on a subflow.

A DATA FIN indicates that the sender has no more data to send, and as such can be used as a rapid indication of the end of data from that sender. As with the FIN on a regular TCP connection, a DATA FIN is a unidirectional signal.

The DATA FIN is an optimisation to rapidly indicate the end of a data stream and clean up state associated with an MPTCP connection, especially when some subflows may have failed. Specifically, when a DATA FIN has been received, if all data has been successfully received, timeouts on all subflows MAY be reduced. Similarly, when sending a DATA FIN, once all data (including the DATA FIN, since it occupies one octet of data sequence space) has been acknowledged, FINs must be sent on every subflow. This applies to both endpoints, and is required in order to clean up state in middleboxes.

There are complex interactions, however, between a DATA FIN and subflow properties:

o A DATA FIN MUST only be sent on a packet which also has the FIN flag set.

o A DATA FIN occupies one octet (the final octet) of Data Sequence Number space. Therefore, even if there is no user data, a Data Sequence Number option MUST be added to a packet containing the DATA FIN option. This allows the receiver to easily determine the last data sequence number that should have been received.

o There is a one-to-one mapping between the DATA FIN and the subflow's FIN flag (and its associated sequence space and thus its acknowledgement). In other words, when a subflow's FIN flag has been acknowledged, the associated DATA FIN is also acknowledged.

o As such, the acknowledgement of a FIN and DATA FIN DOES NOT indicate that all data has been successfully received. Because the data-level ack is inferred from subflow acks, an endpoint must use subflow acks to discover when all data up to and including the DATA FIN has been received.

It should be noted that an endpoint may also send a FIN on an individual subflow to shut it down, but its impact is limited to the subflow in question. If all subflows have been closed with a FIN, that is equivalent to having closed the connection with a DATA FIN.

                        1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +---------------+---------------+
   | Kind=OPT_DFIN |  Length = 2   |
   +---------------+---------------+

                     Figure 8: DATA FIN option

4.6. Error Handling

TBD

An unknown token in an MPTCP SYN should perhaps be treated like an unknown port, i.e. elicit a TCP reset? We should make this as silent and tolerant as possible. Where possible, we should keep this close to the semantics of TCP. However, some MPTCP-specific issues, such as a data sequence number missing from a subflow, will definitely need MPTCP-specific error handling.

5. Security Considerations

TBD

(Token generation, handshake mechanisms, new subflow authentication, etc.)

A generic threat analysis for the addition of multipath capabilities to TCP is presented in [8]. The protocol presented here has been designed to minimise or eliminate these identified threats. (A future version of this document will explicitly address these threats.)

The development of a TCP extension such as this will bring with it many additional security concerns. We have set out here to produce a solution that is "no worse" than current TCP, with the possibility that more secure extensions could be proposed later.

The primary area of concern will be around the handshake to start new subflows which join existing connections. The proposal set out in Section 4.1 and Section 4.2 is for the initiator of the new subflow to include the token of the other endpoint in the handshake. The purpose of this is to indicate that the sender of this token was the same entity that received this token at the initial handshake.

One concern is that the token could simply be brute-forced. The token must therefore be hard to guess, and so could be randomly generated. This may still not be strong enough, however, and using 64 bits for the token would alleviate this somewhat.

The use of these tokens only provides an indication that the token is the same as at the initial handshake, and says nothing about the current sender of the token. Therefore, another approach would be to bring a new measure of freshness into the handshake: instead of using the initial token, a sender could request a new token from the receiver to use in the next handshake. Hash chains could also be used for this purpose.

Yet another alternative would be for all SYN packets to include a data sequence number. This could be used either as a passive identifier to indicate awareness of the current data sequence number (although a reasonable window would have to be allowed for delays), or by making the SYN part of the data sequence space; the latter, however, would cause issues in the event of lost SYNs (if a new subflow is never established), causing unnecessary delays for retransmissions.

6. Interactions with Middleboxes

TBD

How to get around NATs and firewalls. Problems with TCP proxies. How to make an MPTCP-aware middlebox, ...

7. Interfaces

TBD

Interface with applications, interface with TCP, interface with lower layers...

Interactions with applications (both in terms of how MPTCP will affect an application's assumptions about the transport layer, and what API extensions an application may wish to use with MPTCP) are discussed in [5].

8. Acknowledgements

The authors are supported by Trilogy (http://www.trilogy-project.org), a research project (ICT-216372) partially funded by the European Community under its Seventh Framework Program. The views expressed here are those of the author(s) only. The European Commission is not liable for any use that may be made of the information in this document.
The authors gratefully acknowledge significant input into this document from many members of the Trilogy project, notably Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo Braun, Robert Hancock, Pasi Sarolahti, Olivier Bonaventure, Toby Moncaster, Philip Eardley and Andrew McDonald.

9. IANA Considerations

This document will make a request to IANA to allocate new values for TCP Option identifiers, as follows:

   +------------+----------------------+---------------+-------+
   | Symbol     | Name                 | Ref           | Value |
   +------------+----------------------+---------------+-------+
   | OPT_MPC    | Multipath Capable    | Section 4.1   | (tbc) |
   | OPT_ADDR   | Add Address          | Section 4.3.1 | (tbc) |
   | OPT_REMADR | Remove Address       | Section 4.3.2 | (tbc) |
   | OPT_JOIN   | Join Connection      | Section 4.2   | (tbc) |
   | OPT_DSN    | Data Sequence Number | Section 4.4   | (tbc) |
   | OPT_DFIN   | DATA FIN             | Section 4.5   | (tbc) |
   +------------+----------------------+---------------+-------+

                   Table 1: TCP Options for MPTCP

10. References

10.1. Normative References

[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2. Informative References

[2] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[3] Ford, A., Raiciu, C., Barre, S., Iyengar, J., and B. Ford, "Architectural Guidelines for Multipath TCP Development", draft-ford-mptcp-architecture-00 (work in progress), October 2009.

[4] Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-Aware Congestion Control", draft-raiciu-mptcp-congestion-00 (work in progress), October 2009.

[5] Scharf, M. and A. Ford, "MPTCP Application Interface Considerations", draft-scharf-mptcp-api-00 (work in progress), October 2009.

[6] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.

[7] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

[8] Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path TCP", draft-bagnulo-mptcp-threat-00 (work in progress), October 2009.

[9] Eddy, W. and A. Langley, "Extending the Space Available for TCP Options", draft-eddy-tcp-loo-04 (work in progress), July 2008.

Appendix A. Notes on use of TCP Options

The TCP option space is limited by the length of the Data Offset field in the TCP header (4 bits), which defines the TCP header length in 32-bit words. A 4-bit field allows at most 15 words, i.e. a 60-byte header; with the standard TCP header being 20 bytes, this leaves a maximum of 40 bytes for options, and much of this may already be used by options such as timestamp and SACK.

As such, when doing address list manipulation, not all data may fit. This can be mitigated in one of two ways:

o Using an option to extend the option space, such as that proposed in [9], which defines an option providing a 16-bit header length field. Such an option could only be used between nodes that support it, however, and so long options could not be used until a handshake is complete.

o Alternatively, since at least one address entry should be able to fit in each packet, address list manipulation can be undertaken with one address per packet. One method could be to wait for data to send, and then append one new address to each data packet. This would seem reasonable if the TCP session begins rapidly, but if the multipath session must be ready before the first data is sent, address list manipulation would be required on empty data (signalling-only) packets. Issues may arise regarding acknowledged delivery of signalling versus data; this is discussed in Section 3.

Appendix B. Resync Packet

In earlier versions of this draft, we proposed the use of a "re-sync" option that would be used in certain circumstances when a sender needs to instruct the receiver to skip over certain subflow sequence numbers (i.e. to treat the specified sequence space as having been received and acknowledged).

The typical use of this option would be when packets are retransmitted on different subflows after failing to be acknowledged on the original subflow. In such a case, it becomes necessary to move the original subflow's sequence numbering forward so as not to later transmit different data with a previously used sequence number (i.e. when more data comes to be transmitted on the original subflow, it would be different data, and so must not be sent with previously-used, but unacknowledged, sequence numbering).

The rationale for needing to do this is two-fold. Firstly, ACKs apply to the subflow only, and the sender infers from them which data was received; if the same sequence space could be occupied by different data, the sender would not know whether the intended data was received. Secondly, certain classes of middlebox may cache data and not forward new data sent with a previously-seen sequence number.

This option was dropped, however, since some middleboxes may get confused when they meet a hole in the sequence space and do not understand the resync option. It is therefore felt that the same data must continue to be retransmitted on a subflow even if it has already been received after being retransmitted on another. This should not cause a significant performance hit, since the amount of data that needs to be retransmitted multiple times will be relatively small.

Under the earlier proposal, it was therefore necessary to 're-sync' the expected sequence numbering at the receiving end of a subflow using the following TCP option (Figure 9). This option declares an inclusive range of sequence numbers which the receiving node should skip over: if the receiver's next expected sequence number was previously within the range start_seq_num to end_seq_num, it is moved forward to end_seq_num + 1.

This option would be used on the first new packet on the subflow that needs its sequence numbering re-synchronised. It would continue to be included on every packet sent on this subflow until a packet containing this option has been acknowledged (i.e. once subflow acknowledgements exist for packets beyond the end sequence number). If the end sequence number is earlier than the current expected sequence number (i.e. if a resync packet has already been received), this option should be ignored.
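The receiver-side handling of the (now dropped) Resync option described above can be sketched in Python as follows. No Kind value was ever allocated for OPT_RESYNC, so the value 253 used here (one of the TCP option kinds reserved for experiments) is purely a placeholder; the function names are hypothetical, and for brevity the range comparison does not handle 32-bit sequence number wraparound.

```python
import struct

OPT_RESYNC = 253  # placeholder Kind; no value was ever assigned to this option


def parse_resync(option: bytes) -> tuple[int, int]:
    """Decode a Resync option laid out as in Figure 9: Kind (1 octet),
    Length = 10 (1 octet), then start and end sequence numbers
    (4 octets each, network byte order)."""
    if len(option) != 10:
        raise ValueError("Resync option must be 10 octets")
    kind, length, start, end = struct.unpack("!BBII", option)
    if kind != OPT_RESYNC or length != 10:
        raise ValueError("not a Resync option")
    return start, end


def apply_resync(next_expected: int, start: int, end: int) -> int:
    """Return the receiver's new next-expected subflow sequence number.

    If next_expected lies within the inclusive range [start, end], skip to
    end + 1. Otherwise (e.g. an earlier copy of the option has already been
    processed, so the range is behind us) the option is ignored.
    The range comparison does not handle sequence number wraparound.
    """
    if start <= next_expected <= end:
        return (end + 1) % 2**32
    return next_expected
```

For example, a receiver expecting sequence number 150 that receives a Resync option covering 100 to 199 moves on to 200, while a receiver already expecting 250 leaves its state unchanged.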
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------------------------------+
   |Kind=OPT_RESYNC|  Length = 10  |     Start Sequence Number     :
   +---------------+---------------+-------------------------------+
   :          (4 octets)           |      End Sequence Number      :
   +-------------------------------+-------------------------------+
   :          (4 octets)           |
   +-------------------------------+

                        Figure 9: Resync option

Authors' Addresses

Alan Ford
Roke Manor Research
Old Salisbury Lane
Romsey, Hampshire SO51 0ZN
UK

Phone: +44 1794 833 465
Email: alan.ford@roke.co.uk

Costin Raiciu
University College London
Gower Street
London WC1E 6BT
UK

Email: c.raiciu@cs.ucl.ac.uk

Mark Handley
University College London
Gower Street
London WC1E 6BT
UK

Email: m.handley@cs.ucl.ac.uk