idnits 2.17.1 

draft-duchene-mptcp-load-balancing-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 135: '...hat this address MUST NOT be used to c...'
     RFC 2119 keyword, line 138: '... with the "B" set to 1 MUST NOT try to...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 03, 2017) is 2488 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Unused Reference: 'I-D.ietf-mptcp-rfc6824bis' is defined on line 489,
     but no explicit reference was found in the text

  == Unused Reference: 'RFC1323' is defined on line 499, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6182' is defined on line 503, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC7430' is defined on line 508, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 6824 (Obsoleted by RFC 8684)

  == Outdated reference: A later version (-18) exists of
     draft-ietf-mptcp-rfc6824bis-07

  -- Obsolete informational reference (is this intentional?): RFC  793
     (Obsoleted by RFC 9293)

  -- Obsolete informational reference (is this intentional?): RFC 1323
     (Obsoleted by RFC 7323)


     Summary: 2 errors (**), 0 flaws (~~), 6 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	MPTCP Working Group                                           F. Duchene
3	Internet-Draft                                                 UCLouvain
4	Intended status: Experimental                                 V. Olteanu
5	Expires: January 4, 2018             University Politehnica of Bucharest
6	                                                          O. Bonaventure
7	                                                               UCLouvain
8	                                                               C. Raiciu
9	                                     University Politehnica of Bucharest
10	                                                                 A. Ford
11	                                                                   Pexip
12	                                                           July 03, 2017

14	                      Multipath TCP Load Balancing
15	                 draft-duchene-mptcp-load-balancing-01

17	Abstract

19	   In this document we propose several solutions to allow Multipath TCP
20	   to better work behind load balancers.

22	Status of This Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at http://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on January 4, 2018.

39	Copyright Notice

41	   Copyright (c) 2017 IETF Trust and the persons identified as the
42	   document authors.  All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (http://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document.  Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.  Code Components extracted from this document must
50	   include Simplified BSD License text as described in Section 4.e of
51	   the Trust Legal Provisions and are provided without warranty as
52	   described in the Simplified BSD License.

54	Table of Contents

56	   1.  Introduction
57	   2.  Proposed solutions
58	     2.1.  Per-server addresses
59	     2.2.  Embedding Extra Information in Packets
60	       2.2.1.  Proposal 1
61	       2.2.2.  Proposal 2
62	     2.3.  Application Layer Authentication
63	   3.  Comparison of the solutions
64	   4.  Recommendations
65	   5.  IANA considerations
66	   6.  Security considerations
67	   7.  Conclusion
68	   8.  References
69	     8.1.  Normative References
70	     8.2.  Informative References
71	   Authors' Addresses

73	1.  Introduction

75	   Multipath TCP is an extension to TCP [RFC0793] that was specified in
76	   [RFC6824].  Multipath TCP allows hosts to use multiple paths to send
77	   and receive the data belonging to one connection.  For this, a
78	   Multipath TCP connection is composed of several TCP connections that
79	   are called subflows.

81	   Many large web sites are served by servers that are behind a load
82	   balancer.  The load balancer receives the connection establishment
83	   attempts and forwards them to the actual servers that serve the
84	   requests.  One issue for the end-to-end deployment of Multipath TCP
85	   is its ability to work behind load balancers.  Several types of load
86	   balancers are possible.  We consider a simple but important load
87	   balancer that does not maintain any per-flow state.  This load
88	   balancer is illustrated in Figure 1.  A stateless load balancer can
89	   be implemented by hashing the five-tuple (IP addresses, port numbers
90	   and protocol) of each incoming packet and forwarding it to one of the
91	   servers based on the hash value computed.  With TCP, this load
92	   balancer ensures that all the packets that belong to one TCP
93	   connection are sent to the same server since each packet contains the
94	   five-tuple used by the hash function.

96	      +--+---- S1
97	   ---|LB|---- S2
98	      +--+---- S3

100	                     Figure 1: Stateless load balancer

102	   With Multipath TCP, this approach cannot be used anymore when
103	   subflows are created by the clients.  Such subflows can contain any
104	   five-tuple and thus packets belonging to them can be load-balanced
105	   to any server, not necessarily the one that was selected by the
106	   hashing function for the initial subflow.
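
The problem can be reproduced with a minimal Python sketch.  The hash
function (SHA-256 over a textual five-tuple), the server names and the
addresses below are illustrative assumptions and not part of this
proposal; a deployed load balancer would likely use a cheaper hash.

```python
import hashlib

SERVERS = ["S1", "S2", "S3"]  # hypothetical pool, as in Figure 1


def pick_server(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Stateless choice: hash the five-tuple and index into the pool."""
    five_tuple = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(five_tuple).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]


# Every packet of the initial subflow carries the same five-tuple,
# so it always reaches the same server.
initial = pick_server("198.51.100.7", 40000, "203.0.113.1", 80)

# Additional subflows use other source ports (or addresses); their
# five-tuples are hashed independently of the initial subflow.
others = {pick_server("198.51.100.7", port, "203.0.113.1", 80)
          for port in range(40001, 40101)}
```

With three servers, `others` ends up containing more than one server,
so some of the additional subflows would reach a server that holds no
state for the connection.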

108	   In this document, we propose several solutions to allow Multipath TCP
109	   to work behind load balancers.

111	2.  Proposed solutions

113	2.1.  Per-server addresses

115	   A first solution is to use two types of public addresses.  The load
116	   balancer uses a public address that is advertised in the DNS.  This
117	   address is used to establish the initial subflow of all Multipath TCP
118	   connections.  In addition to this address, a pool of addresses is
119	   used for the servers behind the load balancer.  One address of this
120	   pool is assigned to each server behind the load balancer.  This
121	   server address is not announced in the DNS and only advertised by the
122	   servers through the ADD_ADDR option.

124	   The additional per-server address is used by the clients when they
125	   wish to create additional subflows.  Since each server has its own
126	   public address, this ensures that the additional subflows are
127	   directed to the corresponding server.  For this solution, we need to
128	   ensure that the client never uses the public address of the load
129	   balancer to initiate subflows.  This can be achieved by a slight
130	   modification to the MP_CAPABLE option described below.

132	   To allow Multipath TCP to work for servers behind layer 4 load
133	   balancers, we propose to use the reserved "B" flag in the MP_CAPABLE
134	   option (see Figure 2) sent in the SYN+ACK.  This flag informs the
135	   other host that this address MUST NOT be used to create additional
136	   subflows.

138	   A host receiving an MP_CAPABLE with the "B" flag set to 1 MUST NOT
139	   try to establish a subflow to the source address of the MP_CAPABLE.
140	   This bit can also be used in the MP_CAPABLE option sent in the SYN by
141	   a client that resides behind a NAT or firewall or does not accept
142	   server-initiated subflows.

144	                        1                   2                   3
145	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
146	   +---------------+---------------+-------+-------+---------------+
147	   |     Kind      |    Length     |Subtype|Version|A|B|C|D|E|F|G|H|
148	   +---------------+---------------+-------+-------+---------------+
149	   |                   Option Sender's Key (64 bits)               |
150	   |                      (if option Length > 4)                   |
151	   |                                                               |
152	   +---------------------------------------------------------------+
153	   |                  Option Receiver's Key (64 bits)              |
154	   |                      (if option Length > 12)                  |
155	   |                                                               |
156	   +-------------------------------+-------------------------------+
157	   |  Data-Level Length (16 bits)  |  Checksum (16 bits, optional) |
158	   +-------------------------------+-------------------------------+

160	              Figure 2: Multipath Capable (MP_CAPABLE) Option

162	   This bit can be used by servers behind a stateless load balancer.
163	   The servers set the "B" flag in the MP_CAPABLE option that they
164	   return and advertise their own address by using the ADD_ADDR option.
165	   Upon reception of this option, the clients can create the additional
166	   subflows towards these addresses.  Compared with current stateless
167	   load balancers, an advantage of this approach is that the packets
168	   belonging to the additional subflows do not need to pass through the
169	   load balancer.
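
The receiver-side rule can be sketched as follows.  The option kind
(30) and the MP_CAPABLE subtype (0) are those of [RFC6824]; the mask
0x40 follows from Figure 2, where "B" is the second bit of the flags
octet.  The function names and the way advertised addresses are passed
in are assumptions made for this illustration.

```python
MPTCP_KIND = 30            # TCP option kind used by MPTCP [RFC6824]
MP_CAPABLE_SUBTYPE = 0x0   # subtype of the MP_CAPABLE option
FLAG_B = 0x40              # flags octet is A|B|C|D|E|F|G|H, A being 0x80


def b_flag_set(option: bytes) -> bool:
    """Return True if `option` is an MP_CAPABLE option with "B" set to 1."""
    if len(option) < 4 or option[0] != MPTCP_KIND:
        return False
    if option[2] >> 4 != MP_CAPABLE_SUBTYPE:
        return False
    return bool(option[3] & FLAG_B)


def candidate_addresses(peer_addr, b_flag, advertised):
    """Destination addresses a host may use for additional subflows."""
    candidates = list(advertised)        # learned through ADD_ADDR
    if not b_flag:
        # Without "B", the source address of the MP_CAPABLE is usable too.
        candidates.insert(0, peer_addr)
    return candidates
```

A client that sees "B" set on the SYN+ACK therefore waits for an
ADD_ADDR before opening further subflows, since its candidate list is
empty until then.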

171	   To demonstrate the principle of an off-path load balancer, let us
172	   consider a server behind a load balancer.

174	            +-- net1 --+  +-- Load Balancer --+--- ADDR 1 ---+
175	            |          |  |                                  |
176	   client --+          +--+                                  +--- Server
177	            |          |  |                                  |
178	            +-- net2 --+  +------------- ADDR 2 -------------+

180	                    Figure 3: A server with 2 addresses

182	   As shown in Figure 3, this server has 2 IP addresses: 1 behind
183	   the load balancer and 1 directly connected to the Internet.  The
184	   client sends a SYN containing an MP_CAPABLE option, and the server
185	   answers with a SYN+ACK containing an MP_CAPABLE with the "B" flag set
186	   to 1.  Upon reception of the SYN+ACK, the client knows that it
187	   cannot establish any additional subflow towards this IP address.  The
188	   server then advertises its secondary address with an ADD_ADDR.  Once
189	   the client has established at least one subflow to the secondary
190	   IP address, the server could elect to close the primary subflow or to
191	   put it in backup mode.

193	2.2.  Embedding Extra Information in Packets

195	   Under some circumstances, addressing the individual servers via their
196	   individual IPs is not desirable or feasible.  To work around this
197	   issue, we propose two mutually exclusive solutions.  They rely to
198	   varying degrees on getting the client to embed connection or server-
199	   identifying information in the packets that it sends out.  This extra
200	   information can be used statelessly by the load balancers.

202	   Both solutions require modifications only to the server stack and
203	   work well with existing MPTCP clients.

205	2.2.1.  Proposal 1

207	   Our first proposal revolves around controlling the destination port
208	   that the client uses in all subflows aside from the initial one.  It
209	   is possible for the server to advertise an additional port via the
210	   ADD_ADDR option [RFC6824].  This informs the client that it can send
211	   an MP_JOIN to this new port and initiate a new subflow.

213	   To take advantage of this, each server is assigned a unique 16-bit
214	   ID, which must be different from the port on which the service is
215	   being hosted (e.g. 80).  As soon as a connection is initiated, the
216	   server sends an ADD_ADDR to the client advertising a new port equal
217	   to said ID.

219	   Packets that arrive at the load balancer are treated as follows:

221	   o  Packets destined to the port that the service is being hosted on
222	      will be forwarded to a server based on a hash of the 5-tuple.

224	   o  Packets destined to any other port are forwarded to the server
225	      whose ID matches the destination port.
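
The two forwarding rules above can be sketched in a few lines of
Python.  The server IDs, the service port and the hash function are
assumptions chosen for the example.

```python
import hashlib

SERVICE_PORT = 80
# Hypothetical 16-bit server IDs, each advertised as a port via ADD_ADDR.
SERVERS_BY_ID = {20001: "S1", 20002: "S2", 20003: "S3"}
POOL = sorted(SERVERS_BY_ID.values())


def forward(src_ip, src_port, dst_ip, dst_port):
    """Return the server for a packet, or None if no server has that ID."""
    if dst_port == SERVICE_PORT:
        # Rule 1: service port, fall back to 5-tuple hashing.
        key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|tcp".encode()
        h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
        return POOL[h % len(POOL)]
    # Rule 2: any other port is interpreted as a server ID.
    return SERVERS_BY_ID.get(dst_port)
```

Note that the second rule needs no per-connection state: the
destination port alone identifies the server.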

227	   This approach has two drawbacks:

229	   o  The client will most likely also try to initiate subflows using
230	      the server's original port.  Because these subflows are
231	      load-balanced based on a hash of their 5-tuple, they will almost
232	      certainly reach a different server and break.  (Using REMOVE_ADDR
233	      to prevent the creation of these subflows would entail the
234	      destruction of the original subflow.)  This issue can be solved by
235	      the adoption of the protocol modifications outlined in
236	      Section 2.1.

238	   o  If the client is behind a firewall that restricts access to
239	      certain destination ports, it might not succeed in establishing
240	      any new subflows.

242	2.2.2.  Proposal 2

244	   Our second proposal is to load-balance packets based on the server's
245	   token.

247	   The token's most significant 14 bits are treated as a hash value for
248	   the connection.  They are embedded in all outgoing TCP timestamps,
249	   and subsequently echoed back by the client.  Incoming packets that do
250	   not contain timestamps (such as FINs) are dealt with via redirection
251	   between the servers.

253	2.2.2.1.  Connection Initiation

255	   The client initiates an MPTCP connection by sending a SYN with the
256	   MP_CAPABLE option.  Under normal operation, the server then picks a
257	   random 64-bit key for the connection, and uses it to compute its
258	   token.

260	   To forward the packet appropriately, the load balancer must know the
261	   token before deciding what server to send it to.  To accomplish this,
262	   we move the key generation to the load balancer.  The connection's
263	   token can be computed based on the generated key.

265	   The load balancer places the generated key, along with the IP address
266	   of the server that would be responsible for the subflow under normal
267	   5-tuple hashing (which we call the alternate server IP) in an IP
268	   option and forwards the SYN to the server.

270	                             1                   2                   3
271	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
272	       +---------------+---------------+---------------+---------------+
273	       |   Type = 96   |  Length = 16  |             Unused            |
274	       +---------------+---------------+---------------+---------------+
275	       |                                                               |
276	       +                          Server Key                           +
277	       |                                                               |
278	       +---------------+---------------+---------------+---------------+
279	       |                      Alternate Server IP                      |
280	       +---------------+---------------+---------------+---------------+

282	              Figure 4: IP Option Used for MP_CAPABLE packets

284	   The figure above depicts the IP option that is inserted into the
285	   MP_CAPABLE packet before it is sent to the server.  We have chosen an
286	   IP option despite the fact that the data contained therein pertains
287	   to the transport layer, because TCP option space is very limited.  IP
288	   option type 96 is currently classified as reserved [RFC0791].

290	   Upon receipt of the packet, the server uses the key provided to
291	   compute the token for the connection.  If no connection with the same
292	   token exists, the server uses the key provided.  Otherwise, it takes
293	   a brute-force approach and randomly generates multiple keys and
294	   selects one that yields a token with the same 14 highest-order bits.
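
A minimal sketch of this key selection is given below.  The token
derivation (most significant 32 bits of the SHA-1 hash of the key) is
the one defined in [RFC6824]; the function names and the set used to
track tokens in use are assumptions.

```python
import hashlib
import random


def token_from_key(key: bytes) -> int:
    """MPTCP token: most significant 32 bits of SHA-1(key) [RFC6824]."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")


def top14(token: int) -> int:
    """The 14 highest-order bits of a 32-bit token."""
    return token >> 18


def choose_key(lb_key: bytes, tokens_in_use: set, rng=None) -> bytes:
    """Keep the load balancer's key unless its token is already taken."""
    rng = rng or random.SystemRandom()
    token = token_from_key(lb_key)
    if token not in tokens_in_use:
        return lb_key
    wanted = top14(token)
    while True:
        # Brute force: draw random 64-bit keys until the 14 high bits match.
        key = rng.getrandbits(64).to_bytes(8, "big")
        candidate = token_from_key(key)
        if top14(candidate) == wanted and candidate not in tokens_in_use:
            return key
```

Each random key matches the wanted 14 bits with probability 2^-14, so
the loop needs about 16384 SHA-1 computations on average.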

296	   The use of the alternate server IP will be discussed in a later
297	   section.

299	2.2.2.2.  Handling MP_JOIN packets

301	   Additional subflows are initiated by the client by sending MP_JOIN
302	   packets.  These packets contain the server's token.

304	   Similarly to how MP_CAPABLE packets are treated, the load balancer
305	   uses an IP option to inform the server about which other server would
306	   be responsible for the subflow under normal 5-tuple hashing.

308	                             1                   2                   3
309	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
310	       +---------------+---------------+---------------+---------------+
311	       |   Type = 97   |   Length = 8  |             Unused            |
312	       +---------------+---------------+---------------+---------------+
313	       |                      Alternate Server IP                      |
314	       +---------------+---------------+---------------+---------------+

316	               Figure 5: IP Option Used for MP_JOIN packets

318	   IP option type 97 is also classified as reserved [RFC0791].
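
The layouts of Figures 4 and 5 can be encoded and decoded as sketched
below.  The sketch assumes IPv4 addresses, as the 32-bit Alternate
Server IP field suggests; the function names are hypothetical.

```python
import socket
import struct

OPT_MP_CAPABLE = 96   # option type of Figure 4
OPT_MP_JOIN = 97      # option type of Figure 5


def build_capable_option(server_key: bytes, alt_server_ip: str) -> bytes:
    """Type, Length = 16, 16 unused bits, 64-bit key, IPv4 address."""
    if len(server_key) != 8:
        raise ValueError("the server key is 64 bits long")
    return struct.pack("!BBH8s4s", OPT_MP_CAPABLE, 16, 0,
                       server_key, socket.inet_aton(alt_server_ip))


def build_join_option(alt_server_ip: str) -> bytes:
    """Type, Length = 8, 16 unused bits, IPv4 address."""
    return struct.pack("!BBH4s", OPT_MP_JOIN, 8, 0,
                       socket.inet_aton(alt_server_ip))


def parse_option(option: bytes):
    """Return (server key or None, alternate server IP as a string)."""
    opt_type, length = option[0], option[1]
    if opt_type == OPT_MP_CAPABLE and length == 16 and len(option) == 16:
        key, ip = struct.unpack("!4x8s4s", option)
        return key, socket.inet_ntoa(ip)
    if opt_type == OPT_MP_JOIN and length == 8 and len(option) == 8:
        (ip,) = struct.unpack("!4x4s", option)
        return None, socket.inet_ntoa(ip)
    raise ValueError("not one of the two options described above")
```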

320	2.2.2.3.  Embedding the token in the timestamp

322	   The TCP timestamp option [RFC7323] is present in most packets and is
323	   comprised of two fields: the TSval, which is set by the packet's
324	   sender, and TSecr, which contains a timestamp recently received from
325	   the other end.

327	   Taking advantage of the fact that timestamps set by the server are
328	   echoed back by the client, the server shifts its timestamp clock left
329	   by 14 bits, and embeds the 14 highest-order bits of the token into
330	   the 14 lowest-order bits of the TSval.  When a packet with the ACK
331	   flag set and with the TS option present arrives at the load balancer,
332	   it is forwarded based on the 14 least significant bits of the TSecr
333	   field.
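
The encoding and decoding steps can be sketched as follows.  The
millisecond clock value is passed in explicitly, and the function
names are assumptions.

```python
HASH_BITS = 14
HASH_MASK = (1 << HASH_BITS) - 1


def make_tsval(clock_ms: int, token: int) -> int:
    """Server side: clock in the upper 18 bits, token hash in the lower 14."""
    token_hash = token >> (32 - HASH_BITS)     # 14 highest-order bits
    return ((clock_ms << HASH_BITS) | token_hash) & 0xFFFFFFFF


def hash_from_tsecr(tsecr: int) -> int:
    """Load balancer side: recover the hash from an echoed timestamp."""
    return tsecr & HASH_MASK
```

Because the client echoes TSval unchanged in TSecr,
`hash_from_tsecr(make_tsval(t, token))` equals the 14 highest-order
bits of the token for any clock value `t`.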

335	2.2.2.3.1.  Impact on PAWS

337	   Timestamps supplied by the server are used by the client for
338	   protection against wrapped sequence numbers (PAWS).  Note that for
339	   Multipath TCP, the use of the 64-bit DSN already protects against
340	   wrapped sequence numbers at the connection level.

342	   We assume that the server uses a timestamp clock frequency of 1 tick
343	   per ms, which is the highest frequency recommended by [RFC7323].  The
344	   recycling time of the timestamp clock is required to be greater than
345	   the Maximum Segment Lifetime (MSL) of 255 seconds.  Given that the
346	   clock ticks once every ms in increments of 2^14, the 32-bit TSval
347	   wraps after 2^18 ms, i.e. roughly 262 s, which is within the bounds
348	   set by the standard.
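
The figure of 262 s can be checked with a short calculation, written
here in Python.  It assumes the 32-bit TSval field and the 1 ms tick
stated above; the variable names are ours.

```python
TS_BITS = 32             # width of the TSval field
HASH_BITS = 14           # low-order bits reserved for the token
TICK_MS = 1              # assumed clock period: one tick per millisecond
WORST_CASE_MSL_S = 255

ticks_before_wrap = 2 ** (TS_BITS - HASH_BITS)      # 2^18 clock ticks
wrap_seconds = ticks_before_wrap * TICK_MS / 1000   # 262.144 s
margin_seconds = wrap_seconds - WORST_CASE_MSL_S    # about 7 s of headroom
```

The margin is small: reserving one more bit for the token, or halving
the tick period, would bring the wrap time below 255 s.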

350	   While the quickly-increasing timestamp is benign to active subflows,
351	   PAWS will still cause segments to be dropped if the subflow in
352	   question has been idle for a period longer than the clock's recycling
353	   time.  To solve this, the server periodically sends keepalive
354	   messages during idle periods.

356	2.2.2.4.  Redirecting packets without timestamps

358	   Some packets (most notably FINs) do not contain timestamps or any
359	   other connection-identifying information.  As such, they are
360	   forwarded to a server based on a hash of the 5-tuple.

362	   As seen in Section 2.2.2.1 and Section 2.2.2.2, whenever a new
363	   subflow is set up, the server responsible for it (A) also knows which
364	   other server (B) would receive its packets if 5-tuple hashing were
365	   used.

367	   A will use a simple peer-to-peer protocol to ask B to set up a
368	   redirection rule for the 5-tuple in question.  The redirection rule
369	   will be deleted by B either at A's request, after the subflow has
370	   finished, or after a timeout.  We do not discuss the specifics of the
371	   protocol in this document.

373	   Redirection of a packet is performed using IP-in-IP encapsulation.
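
Since the peer-to-peer protocol is left unspecified, the following is
only a sketch of the state that B could keep.  The class, its method
names and the 60-second default timeout are all assumptions.

```python
import time


class RedirectionTable:
    """Rules kept by server B, keyed by 5-tuple (hypothetical sketch)."""

    def __init__(self, timeout_s=60.0, clock=time.monotonic):
        self._rules = {}            # 5-tuple -> (owner, expiry time)
        self._timeout = timeout_s   # assumed value, not from this document
        self._clock = clock

    def install(self, five_tuple, owner):
        """Called when A asks B to redirect this 5-tuple to A."""
        self._rules[five_tuple] = (owner, self._clock() + self._timeout)

    def remove(self, five_tuple):
        """Called at A's request or once the subflow has finished."""
        self._rules.pop(five_tuple, None)

    def lookup(self, five_tuple):
        """Return the server to encapsulate towards, or None."""
        rule = self._rules.get(five_tuple)
        if rule is None:
            return None
        owner, expiry = rule
        if self._clock() >= expiry:
            del self._rules[five_tuple]   # rule timed out
            return None
        return owner
```

When `lookup` returns a server, B would encapsulate the packet
(IP-in-IP) towards it; when it returns None, B processes the packet
locally.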

375	2.3.  Application Layer Authentication

377	   With similar motivations to Section 2.2, this proposal
378	   [I-D.paasch-mptcp-application-authentication] decouples the token
379	   signalled in the TCP options from the key used in authentication,
380	   allowing the token to carry arbitrary information.  By allowing the
381	   token to be arbitrarily assigned by the sender, a load balancer could
382	   embed routing information so it knows which server to forward the
383	   packets of the TCP session to.

385	   For example, the token could carry a server identifier, a port
386	   number, and a signature based on a known secret.  Furthermore, by
387	   generating tokens directly there is no risk of hash collisions in
388	   token generation.  By allowing the token to be arbitrarily assigned,
389	   decoupled from the keys, the authentication of additional subflows is
390	   delegated to the application layer.  A proposal for the use of TLS
391	   for this is defined in [I-D.paasch-mptcp-tls-authentication], whereby
392	   keys can be extracted from a TLS session and used to set up
393	   additional subflows.
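
As an illustration only, a 32-bit token could be structured as shown
below.  The field widths (8-bit server identifier, 12-bit connection
identifier, 12-bit truncated HMAC), the secret and the function names
are assumptions; neither this document nor the referenced drafts
define such a layout.

```python
import hashlib
import hmac

SECRET = b"shared-by-servers-and-load-balancer"   # placeholder secret
ID_BITS, CONN_BITS, SIG_BITS = 8, 12, 12          # assumed split of 32 bits


def _signature(server_id: int, conn_id: int) -> int:
    msg = bytes([server_id]) + conn_id.to_bytes(2, "big")
    mac = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:2], "big") >> (16 - SIG_BITS)


def make_token(server_id: int, conn_id: int) -> int:
    """Server side: pack identifier, connection number and signature."""
    if not 0 <= server_id < (1 << ID_BITS):
        raise ValueError("server_id out of range")
    if not 0 <= conn_id < (1 << CONN_BITS):
        raise ValueError("conn_id out of range")
    sig = _signature(server_id, conn_id)
    return (server_id << (CONN_BITS + SIG_BITS)) | (conn_id << SIG_BITS) | sig


def server_for_token(token: int):
    """Load balancer side: return the server ID, or None if the check fails."""
    server_id = token >> (CONN_BITS + SIG_BITS)
    conn_id = (token >> SIG_BITS) & ((1 << CONN_BITS) - 1)
    sig = token & ((1 << SIG_BITS) - 1)
    return server_id if sig == _signature(server_id, conn_id) else None
```

The load balancer needs only the shared secret, and no per-connection
state, to map an MP_JOIN to its server.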

395	3.  Comparison of the solutions

397	   Per-server addresses:

399	   o  Requires individual public addresses for each of the servers,
400	      making IPv6 almost mandatory.

402	   o  Requires modifications to the client and server stacks.

404	   o  Is transparent and works with today's load balancers.

406	   o  Doesn't need any modification to the applications.

408	   o  Discloses the real IP address of the servers.

410	   o  Allows the load balancer to be placed off-path.

412	   Extra Information in Packets:

414	   o  Doesn't require an individual public address for each of the
415	      servers.

417	   o  Requires modifications to the load balancers and the server stack.

419	   o  Could be broken by a firewall blocking certain destination ports
420	      (proposal 1) or changing the value of the timestamps (proposal 2).

422	   o  Doesn't need any modification to the applications.

424	   o  Doesn't disclose the real IP address of the servers.

426	   Application Layer Authentication:

428	   o  Doesn't require public IP addresses

430	   o  Requires support at clients and load balancers

432	   o  Doesn't disclose IP addresses

434	   o  No greater risk of middle box interference than MPTCP today

436	   o  Additional security through no key exchange in the clear

438	4.  Recommendations

440	5.  IANA considerations

442	   This document proposes some modifications to the Multipath TCP
443	   options defined in [RFC6824].  These modifications do not require any
444	   specific action from IANA.

446	6.  Security considerations

448	   Security considerations will be discussed in the next version of this
449	   draft.

451	7.  Conclusion

453	   In this document, we have described and compared several solutions to
454	   load balance Multipath TCP connections.  We showed that these
455	   solutions have advantages and drawbacks and cover different network
456	   configurations.  Future versions of this draft will discuss security
457	   considerations.

459	8.  References

461	8.1.  Normative References

463	   [I-D.paasch-mptcp-application-authentication]
464	              Paasch, C. and A. Ford, "Application Layer Authentication
465	              for MPTCP", draft-paasch-mptcp-application-
466	              authentication-00 (work in progress), May 2016.

468	   [I-D.paasch-mptcp-tls-authentication]
469	              Paasch, C. and A. Ford, "TLS Authentication for MPTCP",
470	              draft-paasch-mptcp-tls-authentication-00 (work in
471	              progress), May 2016.

473	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
474	              DOI 10.17487/RFC0791, September 1981,
475	              <http://www.rfc-editor.org/info/rfc791>.

477	   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
478	              "TCP Extensions for Multipath Operation with Multiple
479	              Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013,
480	              <http://www.rfc-editor.org/info/rfc6824>.

482	   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
483	              Scheffenegger, Ed., "TCP Extensions for High Performance",
484	              RFC 7323, DOI 10.17487/RFC7323, September 2014,
485	              <http://www.rfc-editor.org/info/rfc7323>.

487	8.2.  Informative References

489	   [I-D.ietf-mptcp-rfc6824bis]
490	              Ford, A., Raiciu, C., Handley, M., Bonaventure, O., and C.
491	              Paasch, "TCP Extensions for Multipath Operation with
492	              Multiple Addresses", draft-ietf-mptcp-rfc6824bis-07 (work
493	              in progress), October 2016.

495	   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
496	              RFC 793, DOI 10.17487/RFC0793, September 1981,
497	              <http://www.rfc-editor.org/info/rfc793>.

499	   [RFC1323]  Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
500	              for High Performance", RFC 1323, DOI 10.17487/RFC1323, May
501	              1992, <http://www.rfc-editor.org/info/rfc1323>.

503	   [RFC6182]  Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
504	              Iyengar, "Architectural Guidelines for Multipath TCP
505	              Development", RFC 6182, DOI 10.17487/RFC6182, March 2011,
506	              <http://www.rfc-editor.org/info/rfc6182>.

508	   [RFC7430]  Bagnulo, M., Paasch, C., Gont, F., Bonaventure, O., and C.
509	              Raiciu, "Analysis of Residual Threats and Possible Fixes
510	              for Multipath TCP (MPTCP)", RFC 7430,
511	              DOI 10.17487/RFC7430, July 2015,
512	              <http://www.rfc-editor.org/info/rfc7430>.

514	Authors' Addresses

516	   Fabien Duchene
517	   UCLouvain

519	   Email: fabien.duchene@uclouvain.be

520	   Vladimir Olteanu
521	   University Politehnica of Bucharest

523	   Email: vladimir.olteanu@cs.pub.ro

525	   Olivier Bonaventure
526	   UCLouvain

528	   Email: Olivier.Bonaventure@uclouvain.be

530	   Costin Raiciu
531	   University Politehnica of Bucharest

533	   Email: costin.raiciu@cs.pub.ro

535	   Alan Ford
536	   Pexip

538	   Email: alan.ford@gmail.com