Multipath TCP BOF
IETF75
30 July 2009

Chairs: Mark Handley, UCL; Scott Brim, Cisco

Agenda:
Introduction, Scope, Intention, Process (Scott Brim, 5 minutes)
Goals and Background (Mark Handley, 25 minutes)
Protocol Design (Alan Ford, 25 minutes)
Linked Congestion Control (Costin Raiciu, 20 minutes)
Rethinking the Transport Layer and Impact on Multipath TCP (Janardhan Iyengar and Bryan Ford, 10 minutes)
Technology Discussion (Led by Mark Handley, 30 minutes)
WG Chartering Discussion (Led by Scott Brim, 30 minutes)

Goals and Background (Mark Handley)
-----------------------------------

Extensions to TCP to enable it to work over multiple simultaneous paths.

TCP congestion control has done the job of keeping the Internet from collapsing, even though it has some problems. TCP CC does not make traffic go away, it just spreads it over a longer period of time. MPTCP actually DOES make traffic go away from congested paths.

Today's demands for reliability are higher than can be delivered over concatenated paths across the Internet, and applications are much more demanding of network resources. Further, the Internet is unpredictable (failures, DDoS attacks, etc.), so there is a need to provide higher reliability for tomorrow's services.

Redundancy mechanisms have historically been the method for offering robustness, such as routing failures, multihoming, traffic engineering, and increasingly traffic shaping (via DPI).

Wireless links (the vast majority of devices will be wireless) are also unpredictable. Wireless devices do have multiple radios, though they cannot currently use these multiple radios to provide redundancy.
Example: Blackberry - seamless mobility across access technologies, 3G as backup/baseline interface, WiFi as primary when available.

MP Transport basic principle - stop hiding the multi-homing. Give all network downlinks addresses and make them available to the transport protocol. Congestion control works across multiple interfaces, not just on a single interface.

Example: Multi-home server.
Client can connect to the server over multiple paths if the server is multi-homed, but this is considered a single TCP connection. The congestion control mechanism is linked to both interfaces. Cannot do this with BGP multihoming. Link failure does not impact the flow, as the alternate path remains available.

Example: Multi-home server + Legacy Client. The legacy client consumes resources on one of the available links. The MP TCP connection balances more traffic to the alternate path to distribute traffic evenly.

MP Transport goal is "resource pooling" - all network resources behave as a single pooled resource, and load shifts between various parts of the network.
Example: Multiple 12MB links appear and act as a single 36MB link.

Resource pooling is not a new phenomenon - it is just being extended across multiple physical paths. The more MP clients are enabled, the more possible usage matrices exist, and therefore the better the distribution of traffic.

Both clients and servers have different reasons for sending traffic over multiple paths at the same time, but both have the motivation.

MP Transport protocols are key to making the Internet more robust/responsive. In addition, inherent redundancy and responses to unpredictable events are possible (which is not suitable for the underlying routing architecture).

Belief that it is not terribly difficult to add multipath behavior to TCP.

What about "over TCP" versus "in TCP"? Bittorrent is an example. You only fully get the congestion benefits if you link the response of one path to the congestion of the other path. Want to extend to ALL applications.
SCTP? Already has the mechanism, just not implemented at the congestion control function.
DCCP? Demand?
UDP? No single approach.

MP-TCP has been proposed several times and shot down (apologies to Christian Huitema, who was the original proposer).
New understanding that MP-TCP can solve network-wide traffic engineering problems.

Q (Joe Touch): Three things we are dealing with: (1) TCP dealing with multiple paths.
(2) striping. (3) multiple endpoints. Does the process of building this mechanism break "virtualization"? What if there are multiple addresses on the same interface? Need to ensure via an "acid test", with everything running through a single virtual interface, that MP-TCP at worst operates no worse than TCP today.
Answer: Need to make sure we cover these three aspects without breaking virtualization - full agreement. To be covered, to some degree, in Alan's presentation.

Protocol Design (Alan Ford)
---------------------------

Scope - To build TCP modifications to support multipath. Multiple implementations are already available, but details need to be discussed to meet Internet implementation criteria.

Usage - a stream of packets sent by the source gets to the destination. The MP protocol determines which path each packet takes.

Criteria - path discovery, sequence numbering, scheduling, SYN meanings, etc.

Scenarios - most obvious is bulk client/server transfers. Also short transactions (HTTP), P2P, interactive and streaming services. Where MP-TCP needs to be deployed differs based on the scenario (especially with P2P and streaming).

Compatibility - no worse than TCP over the best path. Unaware boxes (including NATs) must see it as standard TCP. It should also appear as standard TCP to applications (API compatibility).

Scheduling - the key part of an MP-TCP implementation, since it determines how to distribute data along multiple paths. The scheduler is also responsible for retransmission (over the same OR an alternative path). Scheduling logic may depend on path properties (cost, for instance).

Signaling - how to do it? Chunking or TCP options? Chunking is application layer, whereas TCP options are available and are the standardized way to add information to TCP headers. Either way, signaling needs to be kept to a minimum.

Sequence space - 2 ways: single sequence space or data sequence space. The former sends each TCP segment on one of the paths. The latter has a clear distinction between path and data.
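The "data sequence space" option above can be illustrated with a small receiver-side sketch. This is not taken from the presentation or from any draft: the `Segment` and `Reassembler` names and fields are hypothetical, and the sketch assumes each segment carries a connection-level data sequence number alongside its own per-path sequence number. It only shows why separating the two lets data arrive out of order across paths and still be delivered in order. Partial overlaps between segments are deliberately not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # All field names are illustrative assumptions, not protocol fields.
    subflow_id: int   # which path carried the segment
    subflow_seq: int  # per-path sequence number (what on-path boxes see)
    data_seq: int     # connection-level offset of the first payload byte
    payload: bytes


@dataclass
class Reassembler:
    next_data_seq: int = 0                        # next byte owed to the app
    pending: dict = field(default_factory=dict)   # data_seq -> payload

    def receive(self, seg: Segment) -> bytes:
        """Buffer a segment and return any bytes now deliverable in order."""
        # Data already delivered (e.g. a retransmission sent on another
        # path) is ignored; everything else is buffered by data_seq.
        if seg.data_seq >= self.next_data_seq:
            self.pending.setdefault(seg.data_seq, seg.payload)
        out = bytearray()
        while self.next_data_seq in self.pending:
            chunk = self.pending.pop(self.next_data_seq)
            out += chunk
            self.next_data_seq += len(chunk)
        return bytes(out)
```

In this sketch the per-path number is never consulted for ordering; it is kept only to show that each path can number its segments independently of the data stream.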
Two proposals - "One Ended" and "Two Ended".

One-ended - multihomed host with provider-independent addressing. Only the sender is modified, and there is only one source/destination address. Per-path congestion information is provided by SACK.

Two-ended - a single TCP "subflow" begins, and additional subflows (new source/destination address pairs) are created and merged under a single identifier with the existing flow(s).

Signaling additional addresses to the other endpoint provides workarounds for NATs/firewalls. It also allows simultaneous IPv4/IPv6 use.

Subflows look like regular TCP with additional options. The data-level sequence number is carried in a TCP option. Each subflow operates as an individual TCP session (SYN to FIN).

Security - no worse than TCP today, and quite possibly a mechanism towards stronger security (opportunistic encryption), or another mechanism for binding sessions together. The two-ended solution must deal with the same considerations as SHIM6 and mobility.

Q (Joe Touch): Endpoints start off with an association based on a socket pair, then you can add more socket pairs for better performance. What happens if the first socket pair shuts down due to ICMP errors?
Answer - Good point. Current thinking is that the initial socket pair stays the connection identifier for the entire MP-TCP session.
Iljitsch van Beijnum: ICMP doesn't make TCP go away.
Joe: "hard errors".
Mark: Regardless, you don't have to send anything over a path that is not actually working, even if you keep state around. So the state of the initial pair will always anchor the association.

Q: One or more addresses, but what about port numbers?
Answer - One or more port numbers. So, 2 addresses, 2 port numbers = 4 paths. But you don't have to initiate all paths.

Q (Dave Thaler): One-ended model. Multiple paths and multiple addresses are not the only use case. It can be a single address with multiple gateways as well.
Answer - Agreement.

Question - Detailed technical work at this point already. Use-cases are less clear, corner cases.
"I disable multiple interfaces because I don't want them both active."
Answer - "Would you say that the reason you don't use both is that you can't keep session continuity via multipath?"

Q (Dave Oran): What about extra complexity around PMTU?
Answer - Good question, and no answer at present.

Q: An address/port pair corresponds to one path. How do you detect and test multiple paths from the endpoint without knowing the network topology? Can't detect a shared bottleneck.
Answer - Requires proper congestion control.

Question - Thoughts regarding MPTCP over MPTCP (i.e., VPNs)?
Answer - Great thing to consider. No answer at present.

Question - No notification to the application risks a new application relying on a socket that is not there anymore.
Answer - Not a concern, since applications do not see it.

Q (Dave Craig): NAT bindings and connection re-use?
A: MPTCP doesn't change addresses.

Linked Congestion Control (Costin Raiciu)
-----------------------------------------

Demonstration of Multipath TCP. 2 interfaces, both shaped to 10Mbps. Killed interfaces, re-enabled interfaces. Showed how bandwidth went down, up, re-discovery, etc.

MPTCP - how do you allocate traffic to each path? Depends on goals:
1) Improve throughput over the best single path versus TCP
2) Any path should not be more aggressive than TCP (i.e., take at most the same throughput)
3) Balance congestion away from bad links to good links. This implies resource pooling

1) Use independent CC on each path
problem = bottleneck fairness (i.e., you get double the throughput of TCP)
solution = couple the congestion controllers (see algorithm in slides). The aggregate flows will then behave like a single TCP flow.
BUT, fully coupled is "flappy" - it flips between two extremes.

2) Further modification for "linked algorithm"
problem = "flappy"
solution = change the algorithm slightly to remove the problem (see change in algorithm in slides). The algorithm grows windows on each interface at the same rate.
Windowing on one interface causes 50% of that interface load to move to the other interface.

3) Effect on RTT
problem = equal window on both subflows if equal RTT and same drop rates. If RTT on one interface grows, and drop rates on both remain the same, then MP-TCP will be doing worse than standard TCP.
problem = RTT1TCP2 - windowing and throughput need to be above the minimum throughput line of an individual TCP connection.
solution = related to RTT (see slides), but results show that MP-TCP works in this scenario (improved performance, actually).

Linked provided more fairness than uncoupled and coupled. The algorithm is simple, and working. But other (less conservative) solutions are possible.

Q (jabber via Iljitsch): Clipping problem. With coupled windows, is there a possibility that you need to decrease windows below zero?
Answer - No. The fully coupled algorithm has a clipping problem, but the linked algorithm never halves to something negative (not possible).

Q: Operators will want to control sub-flow switching. Need to look at "switching" issues - when you switch, how much you send over a specific interface, etc. May need an additional goal: "no worse than TCP in perturbation situations".
Answer - No reply.

Q: What is a path? MPLS?
A: Source/destination pair.
Q: So, you can only "fork" at the source? Can't take advantage of additional paths in the core.
Answer - Not trying to address these cases at this time (i.e., how to signal to the routing system). Iljitsch is interested in this question.
Costin: MPTCP is a first step; if the routing system gave choice ...
Iljitsch: There were discussions on the mailing list about a path selector to signal to the routing system. The list concluded not to do this now.

Q (Gorry Fairhurst): RTT. Worried that RTT plays a part in this, when RTT varies significantly in access links. Is this something that can be addressed?
Answer - The experiment changes RTT every ACK (very dynamic). This was taken into account, and Costin shows the graphs for comparison to standard TCP.
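The three window-update variants discussed in this session (uncoupled, fully coupled, linked) can be contrasted in a short sketch. The actual algorithms were shown on slides that these minutes do not reproduce, so everything below is an assumption: the per-ACK and per-loss rules, the parameter `a`, the one-packet `floor`, and the function names are all hypothetical and chosen only to match the behaviour described in the discussion (independent controllers per path, a coupled controller that can clip, and a linked controller that halves only the window of the path that saw the loss).

```python
def on_ack(windows, r, mode, a=1.0):
    """Grow the window of path r after one ACK (assumed rules)."""
    total = sum(windows)
    if mode == "uncoupled":
        # Each path behaves like an independent TCP flow.
        windows[r] += 1.0 / windows[r]
    elif mode == "fully_coupled":
        # The aggregate grows like a single TCP flow.
        windows[r] += 1.0 / total
    elif mode == "linked":
        # Same growth on every path, scaled by a hypothetical factor a.
        windows[r] += a / total
    else:
        raise ValueError(f"unknown mode: {mode}")


def on_loss(windows, r, mode, floor=1.0):
    """Shrink the window of path r after one loss (assumed rules)."""
    total = sum(windows)
    if mode == "fully_coupled":
        # Subtracting half the TOTAL window can go to zero or below,
        # so the result has to be clipped: the "clipping problem".
        windows[r] = max(windows[r] - total / 2.0, floor)
    elif mode in ("uncoupled", "linked"):
        # Halving only this path's window can never go negative.
        windows[r] = max(windows[r] / 2.0, floor)
    else:
        raise ValueError(f"unknown mode: {mode}")
```

Starting from two equal windows of 10 packets, a single loss on one path leaves that path at 5 under the linked rule but clips it to the floor under the fully coupled rule, which is one way to picture the "flappy" behaviour described above.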
Q: Technical comparisons were to two examples - uncoupled and fully coupled. Support standard TCP (weighted, statically).
Answer - This would give bottleneck fairness but not resource pooling.
Q: Some resource pooling, but admittedly not as good as this.
Answer - Absolutely right. 2 types of resource pooling - (1) for a single session at any time, (2) for many sessions at any time.

Q (Remi Despre): Congratulations. This is the most innovative and promising idea in a long time. Decoupled TCP is a great solution.

Q (Gorry Fairhurst): MTU?
Answer - When applied to MTU in IPv6 with MTU 1280, should work immediately.
Answer - You have to do MTU on each path. This would require changes to the implementation stack.

Rethinking the transport layer (Janardhan Iyengar, Bryan Ford)
--------------------------------------------------------------

Quite different - an architectural perspective based on the next-gen transport architecture presented in TSVArea at IETF75, Monday session (Tng draft).

Transport layer defactored into four layers - Endpoint (naming, port numbers), Flow Regulation (performance concerns), Isolation (separates network functions from application functions, end-to-end security), Semantic (API-related functions, end-to-end reliability).

The Semantic Layer is where MP-TCP functionality fits. The Semantic Layer would provide multipath capabilities. It manages end-to-end state across flows, and bundles them together for shared congestion control.

This cleanly separates functional units in MPTCP, and opens the exploration of new protocols in the multipath architecture.

Design implications -
1) No E2E ACK, just flow-level ACK, but middleboxes may ACK optimistically. MPTCP cannot recover from a failure after the ACK and before successful transmission.
2) Number of TCP flows multiplies quickly. MP-TCP will consume port numbers very quickly, and increased state could be problematic.
CC behavior gets erratic as the number of synched TCP connections increases (SCTP or SST might provide a clue for how to resolve this at the Semantic Layer).

Security - where does security belong? The proposal is TLS above TCP subflows (each subflow appears as TLS over TCP). The advantage is that if any middlebox messes with TLS, MPTCP just drops the path, thus protecting MPTCP state.

Q (Joe Touch): If you propose TLS per flow, there may be TLS over TLS. Also, what about TCP-AO? TLS only protects data, not the TCP connection.
Answer - Agreement. Could use IPsec if desired.

Q (Tim Shephard): TLS today checks certificates. Are we proposing to not check certs?
Answer - Applications can run TLS on top of the semantic layer for E2E security.
Answer - We are not wedded to TLS (S/TLS/TCP-AO mechanism). The concern is considering the architectural implications of where you put security, and the benefits of having security at this particular layer (between semantic and flow layer).

Q (Andrew Yourtchenko): Since TLS does not protect TCP, there is the possibility of having a single TLS above the multiplexed MPTCP flows.
Answer - Agree, but you lose the ability to drop a single path.

Separating compatibility requirements with TCP at the API AND network layer has implications:
SCTP is multi-home capable and can be used at the Semantic layer with TCP subflows at the Flow layer
SST (Structured Stream Transport) and SCTP have richer APIs
DCCP is better at the flow layer than MP TCP
*This raises the question of the right evolution of protocols

An architectural document should be part of the WG efforts - describe the architecture, design choices, and how other protocols fit in this architecture. The starting point should be the MP-TCP design draft and the Tng draft.

Chair comment: The decision on an architecture document will be made after determining whether a WG is created.

Technology Discussion
-------------------

Q (Marc Blanchet): At the socket level, MPTCP is using the same address family? Can this use 1 v4 stream and 1 v6 stream?
But a socket is bound to an address family, so this would require a change in the API.
Answer - Yes, everything so far has used the same address family. It is technically feasible to use multiple AFs in parallel. Is this a prudent thing to do? This may also be a way to do v4 to v6 transition.

Question - v4 and v6 - something to be aware of. Ports are part of the v4 address family in many proposals. Concern about port consumption. It would be great if additional ports were not needed.
Answer - An identifier is used for determining which subflows to bind to the initial flow. Ports are actually only used for binding and for rendezvous for the initial connection.
Answer from chair - There are great benefits to working through the proposal in the IETF to understand implications on the Internet.

Q (Dino Farinacci): MP-TCP cannot choose a path, it chooses the exit interface from the host. The network does destination routing. Example - a site that is multi-homed to SP A and SP B, with a unique address. A packet could take an exit that didn't match its source address, and get dropped due to uRPF. It is very difficult to decide on a source address without considering decisions on the routing side. Keeping it loosely coupled breaks MP-TCP. This questions the usefulness of load-splitting this close to the host. You cannot load-split across a converged network (i.e., the Internet). A clear distinction needs to be made that MP-TCP cannot choose paths, just the interface, and the fate of the packet is tied to that decision. Having multiple IPs on the source is a very difficult problem to solve.
Answer - Need to be aware of how to get traffic out of the link, and this might require additional methods.
Answer - This is not an MPTCP problem, since that's a misconfigured network with or without MPTCP.
Answer - The network/datacenter architecture would evolve differently if these capabilities existed.

Q (Tim Shephard): In response to the previous question/comment - there are ways to expose multiple paths in the network to the end systems. Cha Wei Yang's paper.
Remi's presentation in IntArea. Doesn't necessarily agree with Dino.

Q (Tim Shephard): The most important thing here is the congestion work. Don't get lost in the details. HIP/SHIM6 needs a multi-path TCP technology to be successful. Need to manage sets of addresses/locators. Need to manage security for those sets. Would be best to be compatible with HIP and SHIM6.
Answer - SHIM6 folks are involved in the draft, so the architectural considerations involved in using SHIM6 and MP-TCP together are being explored.
Scott: Have looked at HIP and SHIM6 and believe MPTCP has advantages in the area of address-pair liveness detection.
Tim: Probably agree.
Iljitsch: Could make one-ended on TCP and then use SHIM6.
Answer from chair - This should be discussed in a WG structure to decide which way to go forward. Feedback from the community is the best way.

Question - Different view on the relationship between MP-TCP and IPv4 and IPv6. Need to start by developing a complete solution in IPv6. MP on IPv6 is very interesting, as a solution to the routing problem is compelling and can be designed very quickly.
Answer - Agreement. From the Tng architecture, the flow layer picked in the initial prototype (existing TCP) is about maintaining compatibility with the legacy interface. With IPv6, there is an opportunity to not have to deal with the compatibility problems and design limitations (port numbers) of legacy TCP. The point is that only in the v4 case is it necessary to be compatible with existing infrastructure. In v6, it is more feasible to consider having MP-TCP default operation use a different flow layer (i.e., DCCP).
Answer from chair - Need to work in IPv4 and IPv6. That's the intent of the work.

Question - Aren't mobility, multi-homing, etc. already clearly differentiated in the charter of what we are doing?
Answer - Next topic of conversation - WG charter.

Question - Please send out mail as to why port problems are alleviated in v6.
Answer - The v6 assumption is that there is no v6 port address translation.

Question - Port selection and concerns are a function of the transport protocol selection.

Question - Ports in v4 and v6 - no port NAT on the path. Not that we don't need or shouldn't have - just don't. IPv6 NAT will not share addresses, versus v4 where you share a single address.
Answer from chair - NAT will be moved to mailing list conversation.

WG Charter Discussion
-------------------

Q (Gabriel Montenegro): Is it well-defined? Still questions on v4 or v4/v6?
Answer - The intent is both.

Question - Vote for WG charter.

Q (Joe Touch): The set of problems is reasonably well defined and would like it expressed that way. Hesitant to create a WG, but if LISP is a WG, this should be a WG. Should go to experimental, and if well-received, move back to proposed. Experimental frees people outside of the IETF process.
Answer - Avoided presenting a proposal, so this would not be design-by-committee. The need for a WG is to help nail down all questions/details.

Question - Support for a working group; load-sharing across the Internet is an important subject, and MP-TCP is interesting by its simplicity. It would be useful if the group does not exclude a pure IPv6 solution before a v4/v6 solution. Experimental or proposed standard? Possible for the WG to be open to either?

Q (Pete Resnick): As an applications guy, this is "delightful" and addresses a host of issues that Apps have dealt with forever. Apps would like to see the charter.

Question - Application feedback seconded.

Question for audience - Create WG?
Show of hands suggested overwhelming support (vast majority) for formation of a working group. No objections. Charter is the next step. Target September.

Question for audience - Experimental or proposed?
Answer - Experimental to proposed migration is easy if the design works. Experimental is easier to work with from a WG perspective.
A good argument from the group is needed to go to proposed standard.

Q (Joe Touch): The charter should call out separation of work, or building a sequence of steps that add value independently and are integrated into a single document at the end. This might simplify things. Would like to see the ability to switch to a different IP address separated out.
Answer - Point taken; will take a detailed look as to whether this is possible. More work here.

Question - Does MPTCP assume multiple interfaces?
Answer - No.

Q (Marc Blanchet): Note re: overlap with MIF.
Answer - Noted. Assumption that the output of MIF influences and directs the MP WG.