Multipath TCP BOF
IETF75
30 July 2009

  Chairs: Mark Handley, UCL; Scott Brim, Cisco
 

Agenda:
  
  Introduction, Scope, Intention, Process (Scott Brim, 5 minutes) 
  Goals and Background (Mark Handley, 25 minutes)
  Protocol Design (Alan Ford, 25 minutes)
  Linked Congestion Control (Costin Raiciu, 20 minutes)
  Rethinking the Transport Layer and Impact on Multipath TCP
       (Janardhan Iyengar and Bryan Ford, 10 minutes)
  Technology Discussion (Led by Mark Handley, 30 minutes)
  WG Chartering Discussion (Led by Scott Brim, 30 minutes)

Goals and Background (Mark Handley)
-----------------------------------

  Extensions to TCP to enable it to work over multiple simultaneous
  paths.
  
  TCP congestion control has done the job of keeping the Internet from
  collapsing, even though it has some problems.  TCP CC does not make
  traffic go away; it just spreads it over a longer period of time.
  MPTCP actually DOES make traffic go away from congested paths.
  
  Today's demands for reliability are higher than can be delivered
  over concatenated paths across the Internet, and applications are
  much more demanding of network resources.  Further, the Internet is
  unpredictable (failures, DDoS attacks, etc.), so we need to provide
  higher reliability for tomorrow's services.
  
  Redundancy mechanisms have historically been the method for offering
  robustness: routing around failures, multihoming, traffic
  engineering, and increasingly traffic shaping (via DPI).
  
  Wireless links (the vast majority of devices will be wireless) are
  also unpredictable.  Wireless devices do have multiple radios, though
  they cannot currently use these multiple radios to provide
  redundancy.  Example: Blackberry - seamless mobility across access
  technologies, 3G as backup/baseline interface, WiFi as primary when
  available.
   
  MP Transport basic principle - stop hiding the multi-homing.  Give
  all network downlinks addresses and make them available to the
  transport protocol.  Congestion control works across multiple
  interfaces, not just on single interface.
  
  Example:  Multi-homed server.  Client can connect to server over
  multiple paths if the server is multi-homed, but this is considered
  a single TCP connection.  Congestion control mechanism is linked to
  both interfaces.  Cannot do this with BGP multihoming.  Link failure
  does not impact flow - as alternate path remains available.
  
  Example:  Multi-homed server + Legacy Client.  Legacy client consumes
  resources on one of available links.  MP TCP connection balances
  more traffic to alternate path to distribute traffic evenly.
  
  MP Transport goal is "resource pooling" - all network resources
  behave as single pooled resource, and shift load between various
  parts of network.  Example: Multiple 12MB links appear and act as a
  single 36MB link.  Resource pooling is not a new phenomenon - just
  being extended across multiple physical paths.  The more MP clients
  are enabled, the more possible usage matrices exist, and therefore
  the better the distribution of traffic.
  
  Both clients and servers have different reasons for sending traffic
  over multiple paths at the same time - but both have the motivation.
  MP Transport protocols are key to making the Internet more
  robust/responsive.  In addition, inherent redundancy and responses
  to unpredictable events become possible (something for which the
  underlying routing architecture is not suitable).
  
  Belief that it is not terribly difficult to add multipath behavior
  to TCP.
  
  What about "over TCP" versus "in TCP"?  Bittorrent is an example.
  
    You only fully get the congestion benefits if you link the
    response of one path to the congestion of the other path.
    
    Want to extend to ALL applications.
    
  SCTP?  Already has the mechanism, just not implemented at the
  congestion control function.
  
  DCCP? demand?
  
  UDP?  no single approach.

  MP-TCP has been proposed several times and shot down (apologies to
  Christian Huitema, who was original proposer)
  
  New understanding that MP-TCP can solve network-wide traffic
  engineering problems

Q (Joe Touch): Three things we are dealing with: (1) TCP dealing with
multiple paths.  (2) striping.  (3) multiple endpoints.  Does the
process of building this mechanism break "virtualization"?  What if
there are multiple addresses on the same interface?  Need an "acid
test": with everything running through a single virtual interface,
MP-TCP must, at worst, operate no worse than TCP today.
  
  Answer: Need to make sure we cover these three aspects without
  breaking virtualization - full agreement.  To be covered, to some
  degree, in Alan's presentation

  
Protocol Design (Alan Ford)
---------------------------

  Scope - To build TCP modifications to support multipath.  Multiple
  implementations already available, but need to discuss details to
  solve Internet implementation criteria
  
  Usage - a stream of packets sent by source gets to destination.  MP
  protocol determines which path each packet takes.
  
  Criteria - path discovery, sequence numbering, scheduling, SYN
  meanings, etc.
  
  Scenarios - most obvious - bulk client/server transfers.  Short
  transactions (HTTP), P2P, interactive and streaming services.
  
    Determining where MP-TCP needs to be deployed differs based on
    scenario (especially with P2P and streaming)
    
  Compatibility - no worse than TCP over best path.  Unaware boxes
  (including NATs) must see as standard TCP.  Should also appear as
  standard TCP to applications (API compatibility)
  
  Scheduling - the key part of MP-TCP implementation, since it
  determines how to distribute traffic along multiple paths.  The
  scheduler is also responsible for retransmission (over same OR
  alternative path).  Scheduling logic may be dependent on path
  properties (cost, for instance)
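
  As a rough illustration of the scheduling role described above, the
  following Python sketch picks the lowest-cost path that still has
  window space, and lets a retransmission go to an alternative path
  when one is available.  The `Path` fields, the cost values, and the
  fallback policy are all assumptions made for illustration; no
  specific scheduler was presented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    name: str          # hypothetical label, e.g. "wifi"
    cost: float        # assumed path property; lower is preferred
    cwnd: int          # congestion window, in segments
    in_flight: int = 0
    up: bool = True

    def has_room(self) -> bool:
        return self.up and self.in_flight < self.cwnd


def pick_path(paths: List[Path], avoid: Optional[str] = None) -> Optional[Path]:
    """Return the cheapest usable path, preferring ones other than `avoid`."""
    usable = [p for p in paths if p.has_room()]
    if not usable:
        return None
    # For a retransmission, `avoid` names the path the segment was lost on.
    # Fall back to that same path only if nothing else is usable.
    preferred = [p for p in usable if p.name != avoid] or usable
    return min(preferred, key=lambda p: p.cost)


def send(paths: List[Path], avoid: Optional[str] = None) -> Optional[str]:
    """Schedule one segment; return the chosen path name, or None if blocked."""
    path = pick_path(paths, avoid)
    if path is None:
        return None
    path.in_flight += 1
    return path.name
```

  With a "wifi" path of cost 1 and a "3g" path of cost 5, new segments
  go to "wifi" until its window is full and then spill to "3g"; a
  retransmission called with avoid="wifi" goes to "3g" if it is up.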
  
  Signaling - how to do?  Chunking or as TCP options?  Chunking is
  application layer, whereas TCP options are an available,
  standardized way to add info to TCP headers.  Either way, signaling
  needs to be kept to a minimum.
  
  Sequence space - 2 ways - single sequence space or data sequence
  space.  Former sends each TCP segment on one of the paths.  Latter
  has a clear distinction between path and data.
  
  Two proposals - "One-Ended" and "Two-Ended"

    One-ended - multihomed host with provider-independent addressing.
    Only sender modified, and only one source/destination address -
    per-path congestion provided by SACK
    
    Two-ended - single TCP "subflow" begins, and additional subflows
    (new source/destination address pair) created, and merged under
    single identifier with existing flow(s)
    
      Doing signaling of additional addresses to other endpoint
      provides workarounds for NAT/firewalls.  Also allows
      simultaneous IPv4/IPv6 use
      
      Subflows look like regular TCP with additional options.
      Data-level sequence number in TCP option.
      
      Each subflow operates as individual TCP session (SYN to FIN)
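
      To make the two-level numbering concrete, here is a minimal
      Python sketch of a receiver that merges segments from several
      subflows using a data-level sequence number.  The `Segment`
      fields and the `Receiver` class are hypothetical names, the
      option encoding is not modeled, and partially overlapping
      segments are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    subflow: int      # which subflow carried the segment
    subflow_seq: int  # ordinary per-subflow TCP sequence number
    data_seq: int     # data-level sequence number (carried in an option)
    payload: bytes


class Receiver:
    """Reassembles the connection byte stream by data-level sequence number."""

    def __init__(self, initial_data_seq: int = 0) -> None:
        self.next_data_seq = initial_data_seq
        self.pending = {}             # data_seq -> payload, not yet in order
        self.delivered = bytearray()  # bytes handed to the application

    def receive(self, seg: Segment) -> None:
        if seg.data_seq < self.next_data_seq:
            return  # already delivered, e.g. retransmitted on another subflow
        self.pending.setdefault(seg.data_seq, seg.payload)
        # Deliver every chunk that is now contiguous with the stream.
        while self.next_data_seq in self.pending:
            chunk = self.pending.pop(self.next_data_seq)
            self.delivered += chunk
            self.next_data_seq += len(chunk)
```

      Each subflow keeps its own sequence space, so it still looks
      like regular TCP on the wire; only the data-level number decides
      where a payload lands in the stream seen by the application.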

  Security - no worse than TCP today, and quite possibly a mechanism
  towards stronger security (opportunistic encryption), or other
  mechanisms for binding sessions together.
  
  Two-ended solution must deal with same considerations as SHIM6 and mobility

Q (Joe Touch): Endpoints start off with association based on a socket
pair, then you can add more socket pairs for better performance.  What
happens if first socket pair shuts down due to ICMP errors?

  Answer - Good point.  Current thinking is that the initial socket
  pair stays the connection identifier for the entire MP-TCP session.
  Iljitsch van Beijnum: ICMP doesn't make TCP go away.  Joe: "hard
  errors".  Mark: Regardless, you don't have to send anything over a
  path that is not actually working even if you keep state around.  So
  the state of the initial pair will always anchor the association.
  
Q: One or more Addresses, but what about port numbers?

  Answer - One or more port numbers.  So, 2 addresses, 2 port numbers =
  4 paths.  But you don't have to initiate all paths.
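
  One reading of that arithmetic, sketched in Python: every
  combination of an address and a port number is a candidate path
  identifier, and the host may open only some of them.  The addresses
  (documentation prefixes) and port values below are made up for
  illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def candidate_paths(addresses: Sequence[str],
                    ports: Sequence[int]) -> List[Tuple[str, int]]:
    """Enumerate every (address, port) combination available to a host."""
    return list(product(addresses, ports))


# Hypothetical values: two addresses and two port numbers.
paths = candidate_paths(["192.0.2.1", "198.51.100.1"], [5000, 5001])
print(len(paths))  # 4
```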
  
Q (Dave Thaler): One-ended model.  Multiple paths with multiple
addresses is not the only use-case.  There can be a single address
with multiple gateways, as well.

  Answer - Agreement

Question - Detailed technical work at this point already.  Use-cases
and corner cases are less clear.  "I disable multiple interfaces
because I don't want them both active"

  Answer - "would you say that the reason you don't use both is that
  you can't keep session continuity via multipath?"
  
Q (Dave Oran): What about extra complexity around PMTU?

  Answer - Good question - and no answer at present.

Q: Address/Port pair corresponds to one path.  How do you
detect and test multiple paths via endpoint without knowing network
topology?  Can't detect shared bottleneck.

  Answer - Requires proper congestion control.
  
Question - Thoughts regarding MPTCP over MPTCP (ie, VPNs)?

  Answer - Great thing to consider.  No answer at present.

Question - No notification to application risks a new application
relying on a socket that is not there anymore.

  Answer - Not a concern since applications do not see the subflows.

Q (Dave Craig): NAT bindings and connection re-use?

  A: MPTCP doesn't change addresses.


Linked Congestion Control (Costin Raiciu)
-----------------------------------------

  Demonstration of Multipath TCP.  2 interfaces, both shaped to
  10Mbps.  Killed interfaces, re-enabled interfaces.  Showed how
  bandwidth went down, up, re-discovery, etc.

  MPTCP - how do you allocate traffic to each path?  Depends on
  goals:

  1) Improve throughput: do at least as well as TCP over the best
     single path

  2) Any path should not be more aggressive than TCP (ie, take at most
     the same throughput)

  3) Balance congestion away from bad links to good links.  This
     implies resource pooling

  1) Use independent CC on each path
     problem = bottleneck fairness (ie, you get double the throughput
     of TCP)
     solution = couple the congestion controllers (see algorithm in
     slides).  The aggregate flows will then behave like a single TCP
     flow

  BUT, fully coupled is "flappy" - flips between two extremes

  2) Further modification for "linked algorithm"
     problem = "flappy"
     solution = change algorithm slightly to remove problem (see
     change in algorithm in slides).  Algorithm grows windows on each
     interface at the same rate.  Windowing on one interface causes 50%
     of that interface load to move to the other interface.

  3) Effect on RTT
     problem = equal window on both subflows if equal RTT and same
     droprates.  If RTT on one interface grows, and droprates on both
     remain the same, then MP-TCP will be doing worse than standard
     TCP

     problem = RTT1<RTT2, TCP1>TCP2 - windowing and throughput need to
     be above minimum throughput line of individual TCP connection

     solution = related to RTT (see slides), but results show that
     MP-TCP works in this scenario (improved performance, actually).
     Linked provided more fairness than uncoupled and coupled.
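
  The RTT problem can be illustrated with simple arithmetic.  This
  Python sketch assumes the window sizes are fixed inputs (10 packets
  per subflow, 14 packets for a single TCP flow); those numbers are
  invented for illustration and are not results from the
  presentation.  The only model used is that a flow sends roughly one
  window per RTT.

```python
from typing import Sequence


def rate_pkts_per_s(window_pkts: float, rtt_ms: float) -> float:
    """Approximate sending rate: one window of packets per round trip."""
    return window_pkts * 1000.0 / rtt_ms


def mptcp_rate(window_pkts: float, rtts_ms: Sequence[float]) -> float:
    """Aggregate rate when every subflow holds the same window."""
    return sum(rate_pkts_per_s(window_pkts, rtt) for rtt in rtts_ms)


SUBFLOW_WINDOW = 10.0     # assumed: equal window on both subflows
SINGLE_TCP_WINDOW = 14.0  # assumed: window of one TCP flow on the best path

equal = mptcp_rate(SUBFLOW_WINDOW, [10, 10])       # both paths at 10 ms
skewed = mptcp_rate(SUBFLOW_WINDOW, [10, 100])     # second path grows to 100 ms
best_tcp = rate_pkts_per_s(SINGLE_TCP_WINDOW, 10)  # TCP on the 10 ms path

print(equal, skewed, best_tcp)  # 2000.0 1100.0 1400.0
```

  With equal RTTs the two subflows together beat the single flow; once
  one RTT grows and the windows stay equal, the aggregate falls below
  TCP on the best path, which is the case the RTT-related fix targets.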


  Algorithm is simple, and working.  But other solutions (less
  conservative) are possible.
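
  The slides are not reproduced in these notes.  As a hedged sketch,
  the three schemes compared are commonly described by the
  per-subflow window rules below; treat the exact rules, the function
  names, and the `MIN_WINDOW` floor as assumptions rather than the
  algorithm that was presented.

```python
from typing import List

MIN_WINDOW = 1.0  # assumed floor, in packets


def on_ack(windows: List[float], r: int, mode: str) -> None:
    """Grow subflow r's window after one ACK arrives on it."""
    if mode == "uncoupled":
        windows[r] += 1.0 / windows[r]      # independent TCP per path
    else:
        # "coupled" and "linked": increase is shared across all subflows
        windows[r] += 1.0 / sum(windows)


def on_loss(windows: List[float], r: int, mode: str) -> None:
    """Shrink subflow r's window after a loss on it."""
    if mode == "coupled":
        # Cut by half of the TOTAL window; this can overshoot the
        # subflow's own window, so the result has to be clipped.
        windows[r] = max(windows[r] - sum(windows) / 2.0, MIN_WINDOW)
    else:
        # "uncoupled" and "linked": halve only this subflow's window,
        # which can never go negative.
        windows[r] = max(windows[r] / 2.0, MIN_WINDOW)
```

  For windows of 4 and 12 packets, a loss on the first subflow gives
  4 - 16/2 = -4 before clipping under the fully coupled rule, but 2
  under the linked rule.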

Q (jabber via Iljitsch): Clipping problem.  With coupled windows, is
there a possibility that you need to decrease windows below zero?

  Answer - no.  Fully coupled algorithm has clipping problem, but
  linked algorithm never halves to something negative (not possible)
  
Q: Operators will want to control sub-flow switching.  Need to look at
"Switching" issues - when you switch, how much you send over specific
interface, etc.  May need additional goal: "no worse than TCP in
perturbation situations".

  Answer - No reply

Q: what is a path?  MPLS?  A: source/destination pair.  Q: so, you can only
"fork" at source?  Can't take advantage of additional paths in core.

  Answer - Not trying to address these cases at this time (i.e. how
  to signal to routing system).  Iljitsch is interested in this
  question.  Costin: MPTCP is first step; if routing system gave
  choice ...  Iljitsch: discussions on mailing list about path
  selector to signal to routing system.  List concluded not to do this
  now.  
  
Q (Gorry Fairhurst): RTT.  Worried that RTT plays a part in this, when
RTT varies significantly in access links.  Is this something that can
be addressed?

  Answer - the experiment changes RTT every ACK (very dynamic).  This
  was taken into account, and Costin shows the graphs for comparison
  to standard TCP.
  
Q: Technical comparisons were to two examples - uncoupled and fully
coupled.  Support standard TCP (weighted, statically)

  Answer - this would give bottleneck fairness but not resource
  pooling
  
Q: Some resource pooling, but admittedly not as good as this.

  Answer - absolutely right. 2 types of resource pooling - (1) for
  single session at any time.  (2) for many sessions at any time.
  
Q (Remi Despre): Congratulations.  This is the most innovative and
promising idea in a long time.  Decoupled TCP is a great solution.

Q (Gorry Fairhurst): MTU?

  Answer - When applied to MTU in IPv6 with MTU 1280, should work
  immediately.

  Answer - you have to do MTU on each path.  This would require
  changes to implementation stack
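
  A small Python sketch of what "MTU on each path" implies for segment
  sizing: each subflow derives its own maximum segment size from its
  path MTU.  The header sizes are the fixed IPv4, IPv6, and TCP header
  lengths; the function name and the `tcp_options_len` parameter are
  illustrative assumptions, and any option space used by multipath
  signaling would reduce the result further.

```python
IPV4_HEADER = 20  # bytes, without IP options
IPV6_HEADER = 40  # bytes, fixed header only
TCP_HEADER = 20   # bytes, without TCP options


def mss_for_path(path_mtu: int, ipv6: bool, tcp_options_len: int = 0) -> int:
    """Largest TCP payload that fits in one packet on this path."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return path_mtu - ip_header - TCP_HEADER - tcp_options_len


# Hypothetical subflows: one IPv4 path at 1500, one IPv6 path at 1280.
print(mss_for_path(1500, ipv6=False))  # 1460
print(mss_for_path(1280, ipv6=True))   # 1220
```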
  

Rethinking the transport layer (Janardhan Iyengar, Bryan Ford)
--------------------------------------------------------------

  Quite different - architectural perspective based on next-gen
  transport architecture presented in TSVArea at IETF75, Monday
  session (Tng draft)

  Transport layer defactored into four layers - Endpoint (naming, port
  numbers), Flow Regulation (performance concerns), Isolation
  (separates network functions from application functions, End-to-End
  security), Semantic (API-related functions, end-to-end reliability)

  Semantic Layer is where MP-TCP functionality fits.  Semantic Layer
  would provide multipath capabilities.  Semantic layer manages
  end-to-end state across flows, and bundles together for shared
  congestion control

    This cleanly separates functional units in MPTCP, and opens the
    exploration of new protocols in the multipath architecture
  
  Design implications -

  1) No E2E ACK, just flow-level ACK
  
     but middleboxes may ACK optimistically
  
     MPTCP cannot recover from a failure that occurs after the ACK and
     before successful transmission
   
  2) Number of TCP flows multiplies quickly
  
     MP-TCP will consume port numbers very quickly, and increased
     state could be problematic.  CC behavior gets erratic as the
     number of synched TCP connections increases
   
     (SCTP or SST might provide a clue for how to resolve at Semantic
     Layer)

  Security - where does security belong?

    Proposal is TLS above TCP subflows (each subflow appears as TLS
    over TCP - advantage, since if any middlebox messes with TLS,
    MPTCP just drops the path, thus protecting MPTCP state)
  
Q (Joe Touch): if propose TLS per flow, there may be TLS over TLS.
Also what about TCPAO?  TLS only protects data, not the TCP
connection.

  Answer - agreement.  Could use IPsec if desired.

Q (Tim Shephard): TLS today checks certificates.  Are we proposing to
not check certs?

  Answer - applications can run TLS on top of semantic layer E2E
  security.
  
  Answer - we are not wedded to TLS (S/TLS/TCPAO mechanism).  Concern
  is considering architectural implications of where you put security,
  and benefits of having security at this particular layer (between
  semantic and flow layer)
  
Q (Andrew Yourtchenko): since TLS does not protect TCP, possibility of
having single TLS above the multiplexed MPTCP flows.

  Answer - agree, but you lose the ability to drop a single path.

Separating compatibility requirements with TCP at API AND network
layer has implications:

  SCTP is multi-home capable and can be used at Semantic layer with
  TCP subflows at Flow layer
  
  SST (Structured Stream Transport) and SCTP have richer APIs
  
  DCCP is better at flow layer than MP TCP
  
  *This raises the question of the right evolution of protocols

Architectural document should be part of the WG efforts - describe
architecture, design choices, and how other protocols fit in this
architecture

  Starting point should be MP-TCP design draft and Tng draft

Chair comment:  Decision on architecture document will be determined
after determining if WG is created


Technology Discussion
---------------------

Q (Marc Blanchet): at socket level, MPTCP is using same address
family?  Can this use 1 v4 stream and 1 v6 stream?  But a socket is
bound to address family, so would require change in API

  Answer - yes, everything so far has used same address family.
  Technically feasible to use multiple AFs in parallel.  Is this a
  prudent thing to do?  This may also be a way to do v4 to v6
  transition

Question - v4 and v6 - something to be aware of.  Ports are part of
the v4 address family in many proposals.  Concern about port
consumption.  Would be great if additional ports were not needed.

  Answer - identifier is used for determining which subflows to bind
  to initial flow.  Ports are actually only used for binding and used
  for rendezvous for initial connection.  
  
  Answer from chair - there are great benefits to working through
  proposal in IETF to understand implications on the Internet.

Q (Dino Farinacci): MP-TCP cannot choose path, it chooses exit
interface from host.  Network does destination routing.  Example -
site that is multi-homed to SP A and SP B, with unique address.  A
packet could take an exit that didn't match its source address, and
get dropped due to uRPF.  Very difficult to decide on source address
without considering decisions on the routing side.  Keeping it loosely
coupled breaks MP-TCP.  This questions usefulness of load-splitting
this close to the host.  You cannot load-split across converged
network (ie, Internet).  Clear distinction needs to be made that
MP-TCP cannot choose paths, just interface, and the fate of the packet
is tied to that decision.  Having multiple IPs on the source is a very
difficult problem to solve

  Answer - Need to be aware of how to get traffic out of link, and
  might require additional methods.

  Answer - This is not a MPTCP problem, since that's a misconfigured
  network with or without MPTCP.

  Answer - The network/datacenter architecture would evolve
  differently if these capabilities existed.
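
  The failure mode in this exchange can be sketched as a source
  address check at each provider's ingress.  This is a deliberately
  simplified model, not a full uRPF implementation; the provider
  names and prefixes (documentation ranges) are invented for
  illustration.

```python
import ipaddress

# Hypothetical: the site holds one prefix from each of its two providers.
PROVIDER_PREFIXES = {
    "sp_a": ipaddress.ip_network("192.0.2.0/24"),
    "sp_b": ipaddress.ip_network("198.51.100.0/24"),
}


def passes_source_check(src: str, exit_provider: str) -> bool:
    """True if the provider would accept a packet with this source address."""
    return ipaddress.ip_address(src) in PROVIDER_PREFIXES[exit_provider]


# The host picks the source address; routing picks the exit.
print(passes_source_check("192.0.2.10", "sp_a"))  # True
print(passes_source_check("192.0.2.10", "sp_b"))  # False
```

  The second case is the one raised in the question: the subflow's
  source address was assigned by SP A, the site routes the packet out
  via SP B, and the packet is dropped regardless of what the transport
  intended.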

Q (Tim Shephard): In response to previous question/comment - there are
ways to expose multiple paths in the network to the end systems.  Cha
Wei Yang's paper.  Remi's preso in IntArea.  Doesn't necessarily agree
with Dino.

Q (Tim Shephard): The most important thing here is congestion work.
Don't get lost in the details.  HIP/SHIM6 needs a multi-path TCP
technology to be successful.  Need to manage sets of
addresses/locators.  Need to manage security for those sets.  Would be
best to be compatible with HIP and SHIM6.

  Answer - SHIM6 folks involved in draft, so architectural
  considerations involved in using SHIM6 and MP-TCP together are being
  explored

  Scott: Have looked at HIP and SHIM6 and believe MPTCP has advantages
  in the area of address-pair liveness detection.  Tim: probably
  agree.

  Iljitsch: could make one-ended on TCP and then use SHIM6.
  
  Answer from Chair - this should be discussed in a WG structure and
  decide which way to go forward.  Feedback from community is best
  way.
  
Question - Different view on relationship between MP-TCP and IPv4 and
IPv6.  Need to start by developing a complete solution in IPv6.  MP on
IPv6 is very interesting: as a solution to the routing problem it is
compelling, and it can be designed very quickly.

  Answer - Agreement.  From Tng architecture, the flow layer picked in
  initial prototype (existing TCP) was chosen to maintain compatibility
  with legacy interface.  With IPv6, there is opportunity to not have
  to deal with compatibility problems and design limitations (port
  numbers) of legacy TCP.  Point is - only in v4 case is it necessary
  to be compatible with existing infrastructure.  In v6, it is more
  feasible to consider having MP-TCP default operation use a different
  flow layer (ie, DCCP)
  
  Answer from chair - need to work in IPv4 and IPv6.  That's the
  intent of the work.

Question - aren't mobility, multi-homing, etc already clearly
differentiated in charter of what we are doing?

  Answer - next topic of conversation - WG charter

Question - please send out mail explaining why port problems are
alleviated in v6

  Answer - the assumption is that there is no port address translation
  in v6

Question - port selection and concerns are a function of the transport
protocol selection

Question - ports in v4 and v6 - no port NAT on the path.  Not that we
don't need or shouldn't have - just don't.  IPv6 NAT will not share
addresses, versus v4 where you share a single address.

  Answer from chair - NAT will be moved to mailing list conversation


WG Charter Discussion
---------------------

Q (Gabriel Montenegro): Is it well-defined?  Still questions on v4 or
v4/v6?

  Answer - intent is both
  
Question - Vote for WG charter

Q (Joe Touch): set of problems is reasonably well defined and would
like it expressed that way.  Hesitant to create WG, but if LISP is a
WG, this should be a WG.  Should go to experimental, and if
well-received, move back to proposed.  Experimental frees people
outside of the IETF process

  Answer - avoided presenting proposal, so this would not be
  design-by-committee.  Need for WG is to help nail all
  questions/details.
  
Question - support of working group; load-sharing across Internet is
an important subject, and MP-TCP is interesting for its simplicity.
It would be useful if the group does not exclude a pure IPv6 solution
ahead of a v4/v6 solution.  Experimental or proposed standard?
Possible for WG to be open to either?

Q (Pete Resnick): as applications guy, this is "delightful" and
addresses a host of issues that Apps have dealt with forever.  Apps
would like to see charter

Question - application feedback seconded.

Question for audience - create WG?

  Show of hands indicated overwhelming support (vast majority) for
  formation of a working group.  No objections.
  
  Charter is next step.  Target September.

Question for audience - experimental or proposed?
 
  Answer - experimental to proposed migration is easy if design works.
  Experimental is easier to work with from a WG perspective.  A good
  argument from the group is needed to go to proposed standard
  
Q (Joe Touch): charter should call out separation of work or building a
sequence of steps that add value independently, and integrated into
single document at the end.  This might simplify things.  Would like
to see ability to switch to a different IP address separated out.

  Answer - point taken; will take a detailed look at whether this is
  possible.  More work here.
  
Question - Does MPTCP assume multiple interfaces?

  Answer - no.
  
Q (Marc Blanchet): Note re: overlap with MIF

  Answer - noted.  Assumption that output of MIF influences and
  directs MP WG