MPTCP Working Group                                           C. Paasch
Internet-Draft                                              G. Greenway
Intended status: Experimental                               Apple, Inc.
Expires: March 10, 2016                                         A. Ford
                                                                  Pexip
                                                      September 7, 2015

             Multipath TCP behind Layer-4 loadbalancers
                 draft-paasch-mptcp-loadbalancer-00

Abstract

Large webserver farms consist of thousands of frontend proxies that
serve as endpoints for the TCP and TLS connections and relay traffic
to the (sometimes distant) backend servers.
Load-balancing across those servers is done by layer-4 loadbalancers
that ensure that a TCP flow will always reach the same server.

Multipath TCP's use of multiple TCP subflows for the transmission of
the data stream requires those loadbalancers to be aware of MPTCP to
ensure that all subflows belonging to the same MPTCP connection reach
the same frontend proxy.  In this document we analyze the challenges
related to this and suggest a simple modification to the generation
of the MPTCP-token to overcome those challenges.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on March 10, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

1.  Introduction
2.  Problem statement
3.  Proposals
  3.1.  Explicitly announcing the token
  3.2.  Changing the token generation
4.  Conclusion
5.  IANA Considerations
6.  References
  6.1.  Normative References
  6.2.  Informative References
Authors' Addresses

1.  Introduction

Internet services rely on large server farms to deliver content to
the end-user.  In order to cope with the load on those server farms,
they rely on a large, distributed load-balancing architecture at
different layers.  Backend servers serve the content from within the
data center to the frontend proxies.  These frontend proxies are the
ones terminating the TCP connections from the clients.  A server farm
relies on a large number of these frontend proxies to provide
sufficient capacity.  In order to balance the load on those frontend
proxies, layer-4 loadbalancers are installed in front of them.  Those
loadbalancers ensure that a TCP flow will always be routed to the
same frontend proxy.  For resilience and capacity reasons, the data
center typically deploys multiple of these loadbalancers [Shuff13]
[Patel13].

These layer-4 loadbalancers rely on consistent hashing algorithms to
ensure that a TCP flow is routed to the appropriate frontend proxy.
The consistent hashing algorithm avoids state-synchronization across
the loadbalancers, making sure that in case a TCP flow gets routed to
a different loadbalancer (e.g., due to a change in routing), the TCP
flow will still be sent to the appropriate frontend proxy
[Greenberg13].

Multipath TCP uses different TCP flows and spreads the application's
data stream across these [RFC6824].  These TCP subflows use a
different 4-tuple in order to be routed on a different path on the
Internet.  However, legacy layer-4 loadbalancers are not aware that
these different TCP flows actually belong to the same MPTCP
connection.

The remainder of this document explains the issues that arise from
this and suggests a possible change to MPTCP's token-generation
algorithm to overcome these issues.

2.  Problem statement

In an architecture with a single layer-4 loadbalancer but multiple
frontend proxies, the layer-4 loadbalancer has to make sure that the
different TCP subflows that belong to the same MPTCP connection are
routed to the same frontend proxy.  In order to achieve this, the
loadbalancer has to be made "MPTCP-aware", tracking the keys
exchanged in the MP_CAPABLE handshake.  This state-tracking allows
the loadbalancer to also calculate the token associated with the
MPTCP connection.  The loadbalancer thus creates a mapping (token,
frontend proxy), stored in memory for the lifetime of the MPTCP
connection.  As new TCP subflows are created by the client, the token
included in the SYN+MP_JOIN message allows the loadbalancer to ensure
that each subflow is routed to the appropriate frontend proxy.

However, as soon as the data center employs multiple of these layer-4
loadbalancers, TCP subflows that belong to the same MPTCP connection
may be routed to different loadbalancers.
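To make the failure mode concrete, the following sketch models a
layer-4 balancer in a few lines of Python.  It is purely illustrative
(the proxy names, the use of SHA-256 as the hash, and the modulo-N
placement are all assumptions for the example, not a real consistent-
hashing implementation): every balancer computes the same pure
function of the 4-tuple, so no state needs to be shared -- but a
second MPTCP subflow carries a new source port and may therefore be
mapped to a different frontend proxy.

```python
import hashlib

PROXIES = ["proxy-a", "proxy-b", "proxy-c", "proxy-d"]

def route(src_ip, src_port, dst_ip, dst_port):
    """Deterministically map a TCP 4-tuple to a frontend proxy.

    Every loadbalancer runs the same pure function, so no
    state-synchronization is needed between them.  (Modulo-N is a
    simplification of a real consistent-hashing scheme.)
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return PROXIES[int.from_bytes(digest[:4], "big") % len(PROXIES)]

# Initial subflow of an MPTCP connection:
first = route("10.0.0.1", 45000, "192.0.2.1", 443)
# A later MP_JOIN subflow from the same client uses a new source port:
second = route("10.0.0.1", 45001, "192.0.2.1", 443)

# Both subflows belong to one MPTCP connection, but nothing in the
# 4-tuple tells the balancer that -- the two flows may well hash to
# different proxies.
print(first, second)
```

Note that the routing is stable for a given 4-tuple no matter which
balancer handles the packet; it is only the MPTCP-level grouping of
subflows that the hash cannot see.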
This implies that the loadbalancers need to share the mapping-state
created for all MPTCP connections among each other, to ensure that
all loadbalancers route the subflows of an MPTCP connection to the
same frontend proxy.  This is substantially more complicated to
implement, and would suffer from latency issues.

Another issue when MPTCP is used in a large server farm is that
different frontend proxies may generate the same token for different
MPTCP connections.  This may happen because the token is a truncated
hash of the key, and hash collisions may occur.  A server farm
handling millions of MPTCP connections actually has a very high
chance of generating such token-collisions.  A loadbalancer will thus
no longer be able to accurately send the SYN+MP_JOIN to the correct
frontend proxy when a token-collision has happened for an MPTCP
connection.

3.  Proposals

The issues described in Section 2 have their origin in the
non-deterministic nature of the token generation.  Indeed, if it
becomes possible for the loadbalancer to infer from the token which
frontend proxy to forward a flow to, MPTCP becomes deployable in such
environments.

The suggested solutions are based on a token from which a
loadbalancer can glean routing information in a stateless manner.  To
allow the loadbalancer to infer the proxy based on the token, the
proxies each need to be assigned a range of unique integers.  When
the token falls within a certain range, the loadbalancer knows to
which proxy to forward the subflow.  Using a contiguous range of
integers would make the frontend very vulnerable to attackers.  Thus,
a reversible function is needed that makes the token random-looking.
A 32-bit block-cipher (e.g., RC5) provides this random-looking
reversible function.
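As a sketch of such a random-looking reversible function, the
following uses a small balanced Feistel network over 32-bit values.
This is a toy stand-in, not a proposal: the working group would pick
a real cipher such as RC5, and the SHA-256-based round function here
is purely an assumption for illustration.  A proxy encrypts an
integer X from its assigned range under a secret Y shared with the
loadbalancers; a loadbalancer decrypts the token to recover X.

```python
import hashlib

def _round(half: int, secret: int, rnd: int) -> int:
    """Arbitrary 16-bit round function derived from the shared secret."""
    data = half.to_bytes(2, "big") + secret.to_bytes(4, "big") + bytes([rnd])
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def encrypt32(x: int, secret: int, rounds: int = 8) -> int:
    """Toy 32-bit Feistel cipher: maps X to a random-looking token."""
    left, right = x >> 16, x & 0xFFFF
    for rnd in range(rounds):
        # Classic Feistel step: swap halves, XOR in the round function.
        left, right = right, left ^ _round(right, secret, rnd)
    return (left << 16) | right

def decrypt32(token: int, secret: int, rounds: int = 8) -> int:
    """Inverse of encrypt32: recovers X from the token."""
    left, right = token >> 16, token & 0xFFFF
    for rnd in reversed(range(rounds)):
        # Undo the Feistel steps in reverse order.
        left, right = right ^ _round(left, secret, rnd), left
    return (left << 16) | right

SECRET_Y = 0x5EC2E7AA   # known only inside the data center
proxy_id = 7            # integer from the range assigned to this proxy

token = encrypt32(proxy_id, SECRET_Y)           # announced by the proxy
recovered = decrypt32(token, SECRET_Y)          # done by the balancer
assert recovered == proxy_id
```

Because encryption is a bijection on 32-bit values, distinct integers
X always yield distinct tokens, while the tokens themselves look
random to anyone without Y.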
Thus, for both proposals we assume that the frontend proxies and the
layer-4 loadbalancers share a local secret Y of size 32 bits.  This
secret is only known to the server-side data-center infrastructure.
If X is an integer from within the range assigned to the proxy, the
proxy will generate the token by encrypting X with secret Y.  The
loadbalancer simply decrypts the token with the secret Y, which
yields the value of X, allowing it to forward the TCP flow to the
appropriate proxy.

This approach also ensures that the tokens generated by different
servers are unique to each server, eliminating the token-collision
issue outlined in the previous section.

In the following we outline two different approaches, based on this
scheme, to handle the problems described above.  The two proposals
provide different ways of communicating the token to the peer during
the MP_CAPABLE handshake.  We would like these proposals to serve as
a basis for discussing the design of the definite solution.

3.1.  Explicitly announcing the token

One way of communicating the token is to simply announce it in
plaintext within the MP_CAPABLE handshake.  To allow this, however,
the wire-format of the MP_CAPABLE handshake needs to change.

One solution would be to simply increase the size of the MP_CAPABLE
option by 4 bytes, giving space for the token to be included in the
SYN and SYN/ACK, as well as adding it to the third ACK.  However, due
to the scarce TCP-option space, this solution would suffer deployment
difficulties.

If the solution proposed in [I-D.paasch-mptcp-syncookies] is
deployed, the MP_CAPABLE option in the SYN-segment is reduced to
4 bytes.  This frees space within the option-space of the
SYN-segment, allowing the client to announce its token within the
SYN-segment.
To allow the server to announce its token in the SYN/ACK without
bumping the option-size up to 16 bytes, we reduce the size of the
server's key down to 32 bits, which gives space for the server's
token.  To avoid introducing security risks by reducing the size of
the server's key, we suggest bumping the client's key up to 96 bits.
This still provides a total of 128 bits of entropy for the HMAC
computation.  The suggested handshake is outlined in Figure 1.

   SYN + MP_CAPABLE_SYN (Token_A)
   ------------------------------------->
   (the client announces the 4-byte locally
   unique token to the server in the
   SYN-segment)

   SYN/ACK + MP_CAPABLE_SYNACK (Token_B, Key_B)
   <-------------------------------------
   (the server replies with a SYN/ACK, announcing
   as well a 4-byte locally unique token and a 4-byte key)

   ACK + MP_CAPABLE_ACK (Key_A, Key_B)
   -------------------------------------->
   (third ACK, the client replies with a 12-byte Key_A
   and echoes the 4-byte Key_B as well)

   The suggested handshake explicitly announces the token.

                              Figure 1

Reducing the size of the server's key down to 32 bits might be
considered a security risk.  However, one might argue that neither
party involved in the handshake (client and server) has an interest
in compromising the connection.  Thus, the server can have confidence
that the client will generate a 96-bit key with sufficient entropy,
and the server can therefore safely reduce its key-size down to
32 bits.

However, this would require the server to act statefully in the SYN
exchange if it wanted to be able to open connections back to the
client, since the token never appears again in the handshake.

3.2.  Changing the token generation

Another suggestion is based on a less drastic change to the
MP_CAPABLE handshake.  We suggest inferring the token from the key
provided by the host.
However, in contrast to [RFC6824], the token is not a truncated hash
of the keys.  The token-generation instead uses the following scheme:
if we define Z as the 32 high-order bits and K as the 32 low-order
bits of the MPTCP-key generated by a host, we suggest generating the
token as the encryption of Z with key K using a 32-bit block-cipher
(the block-cipher may for example be RC5; it remains to be defined by
the working group which block-cipher is appropriate for this case).
The size of the MPTCP-key remains unchanged and is simply the
concatenation of Z with K.  Both K and Z are different for each and
every connection, thus the MPTCP-key still provides 64 bits of
randomness.

Using this approach, a frontend proxy can make sure that a
loadbalancer can derive the identity of the backend server solely
from the token in the SYN-segment of the MP_JOIN exchange, without
the need to track any MPTCP-related state.  To achieve this, the
frontend proxy needs to generate K and Z in a specific way.
Basically, the proxy derives the token through the method described
at the beginning of Section 3.  This gives us the following relation:

   token = block_cipher(proxy_id, Y)     (Y is the local secret)

However, as described above, we enforce at the same time:

   token = block_cipher(Z, K)

Thus, the proxy simply generates a random number K, and can then
generate Z by decrypting the token with key K.  It is TBD what number
of bits of a token could be used for conveying routing information.
Excluding those bits, the token is random, and since the key K is
random as well, Z will also be random.  An attacker eavesdropping on
the token cannot infer anything about Z or K.  However, prolonged
gathering of token data could lead to building up some data about the
key K.

4.  Conclusion

In order to be deployable at large scale, Multipath TCP has to evolve
to accommodate the use-case of distributed layer-4 loadbalancers.  In
this document we explained the different problems that arise when one
wants to deploy MPTCP in a large server farm.  We followed up with
two possible approaches to solve the issues around the
non-deterministic nature of the token.  We argue that it is important
that the working group consider this problem and strive to find a
solution.

5.  IANA Considerations

No IANA considerations.

6.  References

6.1.  Normative References

[I-D.paasch-mptcp-syncookies]
           Paasch, C., Biswas, A., and D. Haas, "Making Multipath TCP
           robust for stateless webservers", draft-paasch-mptcp-
           syncookies-00 (work in progress), April 2015.

[RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
           "TCP Extensions for Multipath Operation with Multiple
           Addresses", RFC 6824, January 2013.

6.2.  Informative References

[Greenberg13]
           Greenberg, A., Lahiri, P., Maltz, D., Patel, P., and S.
           Sengupta, "Towards a Next Generation Data Center
           Architecture: Scalability and Commoditization", 2008.

[Patel13]  Patel, P., Bansal, D., Yuan, L., Murthy, A., Maltz, D.,
           Kern, R., Kumar, H., Zikos, M., Wu, H., Kim, C., and N.
           Karri, "Ananta: Cloud Scale Load Balancing", 2013.

[Shuff13]  Shuff, P., "Building A Billion User Load Balancer", 2013.

Authors' Addresses

Christoph Paasch
Apple, Inc.
Cupertino
US

Email: cpaasch@apple.com

Greg Greenway
Apple, Inc.
Cupertino
US

Email: ggreenway@apple.com

Alan Ford
Pexip

Email: alan.ford@gmail.com