MPTCP Working Group                                           C. Paasch
Internet-Draft                                              G. Greenway
Intended status: Experimental                               Apple, Inc.
Expires: March 10, 2016                                         A. Ford
                                                                  Pexip
                                                      September 7, 2015

             Multipath TCP behind Layer-4 loadbalancers
                 draft-paasch-mptcp-loadbalancer-00

Abstract

Large webserver farms consist of thousands of frontend proxies that
serve as endpoints for the TCP and TLS connections and relay traffic
to the (sometimes distant) backend servers.
Load-balancing across those servers is done by layer-4 loadbalancers
that ensure that a TCP flow will always reach the same server.

Multipath TCP's use of multiple TCP subflows for the transmission of
the data stream requires those loadbalancers to be aware of MPTCP to
ensure that all subflows belonging to the same MPTCP connection reach
the same frontend proxy.  In this document we analyze the challenges
related to this and suggest a simple modification to the generation
of the MPTCP-token to overcome those challenges.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on March 10, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

1.  Introduction
2.  Problem statement
3.  Proposals
  3.1.  Explicitly announcing the token
  3.2.  Changing the token generation
4.  Conclusion
5.  IANA Considerations
6.  References
  6.1.  Normative References
  6.2.  Informative References
Authors' Addresses

1.  Introduction

Internet services rely on large server farms to deliver content to
the end-user.  In order to cope with the load on those server farms,
they rely on a large, distributed load-balancing architecture at
different layers.  Backend servers serve the content from within the
data center to the frontend proxies.  These frontend proxies are the
ones terminating the TCP connections from the clients.  A server farm
relies on a large number of these frontend proxies to provide
sufficient capacity.  In order to balance the load on those frontend
proxies, layer-4 loadbalancers are installed in front of them.  Those
loadbalancers ensure that a TCP flow will always be routed to the
same frontend proxy.  For resilience and capacity reasons, the data
center typically deploys multiple of these loadbalancers [Shuff13]
[Patel13].

These layer-4 loadbalancers rely on consistent hashing algorithms to
ensure that a TCP flow is routed to the appropriate frontend proxy.
The consistent hashing algorithm avoids state-synchronization across
the loadbalancers, making sure that in case a TCP flow gets routed to
a different loadbalancer (e.g., due to a change in routing), the TCP
flow will still be sent to the appropriate frontend proxy
[Greenberg13].

Multipath TCP uses different TCP flows and spreads the application's
data stream across these [RFC6824].  These TCP subflows use a
different 4-tuple in order to be routed on a different path on the
Internet.  However, legacy layer-4 loadbalancers are not aware that
these different TCP flows actually belong to the same MPTCP
connection.

The remainder of this document explains the issues that arise from
this and suggests a possible change to MPTCP's token-generation
algorithm to overcome these issues.

2.  Problem statement

In an architecture with a single layer-4 loadbalancer but multiple
frontend proxies, the layer-4 loadbalancer has to make sure that the
different TCP subflows that belong to the same MPTCP connection are
routed to the same frontend proxy.  In order to achieve this, the
loadbalancer has to be made "MPTCP-aware", tracking the keys
exchanged in the MP_CAPABLE handshake.  This state-tracking allows
the loadbalancer to also calculate the token associated with the
MPTCP connection.  The loadbalancer thus creates a mapping (token,
frontend proxy), stored in memory for the lifetime of the MPTCP
connection.  As new TCP subflows are created by the client, the token
included in the SYN+MP_JOIN message allows the loadbalancer to ensure
that each subflow is routed to the appropriate frontend proxy.

However, as soon as the data center employs multiple of these layer-4
loadbalancers, TCP subflows that belong to the same MPTCP connection
may be routed to different loadbalancers.
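To make the failure mode concrete, the following sketch models a
layer-4 balancer in a few lines of Python.  It is purely illustrative
(the proxy names, the use of SHA-256 as the hash, and the modulo-N
placement are all assumptions for the example, not a real consistent-
hashing implementation): every balancer computes the same pure
function of the 4-tuple, so no state needs to be shared -- but a
second MPTCP subflow carries a new source port and may therefore be
mapped to a different frontend proxy.

```python
import hashlib

PROXIES = ["proxy-a", "proxy-b", "proxy-c", "proxy-d"]

def route(src_ip, src_port, dst_ip, dst_port):
    """Deterministically map a TCP 4-tuple to a frontend proxy.

    Every loadbalancer runs the same pure function, so no
    state-synchronization is needed between them.  (Modulo-N is a
    simplification of a real consistent-hashing scheme.)
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return PROXIES[int.from_bytes(digest[:4], "big") % len(PROXIES)]

# Initial subflow of an MPTCP connection:
first = route("10.0.0.1", 45000, "192.0.2.1", 443)
# A later MP_JOIN subflow from the same client uses a new source port:
second = route("10.0.0.1", 45001, "192.0.2.1", 443)

# Both subflows belong to one MPTCP connection, but nothing in the
# 4-tuple tells the balancer that -- the two flows may well hash to
# different proxies.
print(first, second)
```

Note that the routing is stable for a given 4-tuple no matter which
balancer handles the packet; it is only the MPTCP-level grouping of
subflows that the hash cannot see.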
This implies that the loadbalancers need to share the mapping-state
created for all MPTCP connections among each other, to ensure that
all loadbalancers route the subflows of an MPTCP connection to the
same frontend proxy.  This is substantially more complicated to
implement, and would suffer from latency issues.

Another issue when MPTCP is used in a large server farm is that
different frontend proxies may generate the same token for different
MPTCP connections.  This may happen because the token is a truncated
hash of the key, and hash collisions may occur.  A server farm
handling millions of MPTCP connections actually has a very high
chance of generating such token-collisions.  A loadbalancer will thus
no longer be able to accurately send the SYN+MP_JOIN to the correct
frontend proxy when a token-collision has happened for an MPTCP
connection.

3.  Proposals

The issues described in Section 2 have their origin in the
non-deterministic nature of the token generation.  Indeed, if it
becomes possible for the loadbalancer to infer from the token which
frontend proxy to forward a flow to, MPTCP becomes deployable in such
environments.

The suggested solutions are based on a token from which a
loadbalancer can glean routing information in a stateless manner.  To
allow the loadbalancer to infer the proxy based on the token, the
proxies each need to be assigned a range of unique integers.  When
the token falls within a certain range, the loadbalancer knows to
which proxy to forward the subflow.  Using a contiguous range of
integers would make the frontend very vulnerable to attackers.  Thus,
a reversible function is needed that makes the token random-looking.
A 32-bit block-cipher (e.g., RC5) provides this random-looking
reversible function.
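As a sketch of such a random-looking reversible function, the
following uses a small balanced Feistel network over 32-bit values.
This is a toy stand-in, not a proposal: the working group would pick
a real cipher such as RC5, and the SHA-256-based round function here
is purely an assumption for illustration.  A proxy encrypts an
integer X from its assigned range under a secret Y shared with the
loadbalancers; a loadbalancer decrypts the token to recover X.

```python
import hashlib

def _round(half: int, secret: int, rnd: int) -> int:
    """Arbitrary 16-bit round function derived from the shared secret."""
    data = half.to_bytes(2, "big") + secret.to_bytes(4, "big") + bytes([rnd])
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def encrypt32(x: int, secret: int, rounds: int = 8) -> int:
    """Toy 32-bit Feistel cipher: maps X to a random-looking token."""
    left, right = x >> 16, x & 0xFFFF
    for rnd in range(rounds):
        # Classic Feistel step: swap halves, XOR in the round function.
        left, right = right, left ^ _round(right, secret, rnd)
    return (left << 16) | right

def decrypt32(token: int, secret: int, rounds: int = 8) -> int:
    """Inverse of encrypt32: recovers X from the token."""
    left, right = token >> 16, token & 0xFFFF
    for rnd in reversed(range(rounds)):
        # Undo the Feistel steps in reverse order.
        left, right = right ^ _round(left, secret, rnd), left
    return (left << 16) | right

SECRET_Y = 0x5EC2E7AA   # known only inside the data center
proxy_id = 7            # integer from the range assigned to this proxy

token = encrypt32(proxy_id, SECRET_Y)           # announced by the proxy
recovered = decrypt32(token, SECRET_Y)          # done by the balancer
assert recovered == proxy_id
```

Because encryption is a bijection on 32-bit values, distinct integers
X always yield distinct tokens, while the tokens themselves look
random to anyone without Y.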
Thus, for both proposals we assume that the frontend proxies and the
layer-4 loadbalancers share a local secret Y of size 32 bits.  This
secret is only known to the server-side data-center infrastructure.
If X is an integer from within the range assigned to the proxy, the
proxy will generate the token by encrypting X with secret Y.  The
loadbalancer simply decrypts the token with the secret Y, which
yields the value of X, allowing it to forward the TCP flow to the
appropriate proxy.

This approach also ensures that the tokens generated by different
servers are unique to each server, eliminating the token-collision
issue outlined in the previous section.

In the following we outline two different approaches, based on this
scheme, to handle the problems described above.  The two proposals
provide different ways of communicating the token to the peer during
the MP_CAPABLE handshake.  We would like these proposals to serve as
a basis for discussing the design of the definite solution.

3.1.  Explicitly announcing the token

One way of communicating the token is to simply announce it in
plaintext within the MP_CAPABLE handshake.  To allow this, however,
the wire-format of the MP_CAPABLE handshake needs to change.

One solution would be to simply increase the size of the MP_CAPABLE
option by 4 bytes, giving space for the token to be included in the
SYN and SYN/ACK, as well as adding it to the third ACK.  However, due
to the scarce TCP-option space, this solution would suffer deployment
difficulties.

If the solution proposed in [I-D.paasch-mptcp-syncookies] is
deployed, the MP_CAPABLE option in the SYN-segment is reduced to
4 bytes.  This frees space within the option-space of the
SYN-segment, allowing the client to announce its token within the
SYN-segment.
To allow the server to announce its token in the SYN/ACK without
bumping the option-size up to 16 bytes, we reduce the size of the
server's key down to 32 bits, which gives space for the server's
token.  To avoid introducing security risks by reducing the size of
the server's key, we suggest bumping the client's key up to 96 bits.
This still provides a total of 128 bits of entropy for the HMAC
computation.  The suggested handshake is outlined in Figure 1.

   SYN + MP_CAPABLE_SYN (Token_A)
   ------------------------------------->
   (the client announces the 4-byte locally
   unique token to the server in the
   SYN-segment)

   SYN/ACK + MP_CAPABLE_SYNACK (Token_B, Key_B)
   <-------------------------------------
   (the server replies with a SYN/ACK, announcing
   as well a 4-byte locally unique token and a 4-byte key)

   ACK + MP_CAPABLE_ACK (Key_A, Key_B)
   -------------------------------------->
   (third ACK, the client replies with a 12-byte Key_A
   and echoes the 4-byte Key_B as well)

   The suggested handshake explicitly announces the token.

                              Figure 1

Reducing the size of the server's key down to 32 bits might be
considered a security risk.  However, one might argue that neither
party involved in the handshake (client and server) has an interest
in compromising the connection.  Thus, the server can have confidence
that the client will generate a 96-bit key with sufficient entropy,
and the server can therefore safely reduce its key-size down to
32 bits.

However, this would require the server to act statefully in the SYN
exchange if it wanted to be able to open connections back to the
client, since the token never appears again in the handshake.

3.2.  Changing the token generation

Another suggestion is based on a less drastic change to the
MP_CAPABLE handshake.  We suggest inferring the token from the key
provided by the host.
However, in contrast to [RFC6824], the token is not a truncated hash
of the keys.  The token-generation instead uses the following scheme:
if we define Z as the 32 high-order bits and K as the 32 low-order
bits of the MPTCP-key generated by a host, we suggest generating the
token as the encryption of Z with key K using a 32-bit block-cipher
(the block-cipher may for example be RC5; it remains to be defined by
the working group which block-cipher is appropriate for this case).
The size of the MPTCP-key remains unchanged and is simply the
concatenation of Z with K.  Both K and Z are different for each and
every connection, thus the MPTCP-key still provides 64 bits of
randomness.

Using this approach, a frontend proxy can make sure that a
loadbalancer can derive the identity of the backend server solely
from the token in the SYN-segment of the MP_JOIN exchange, without
the need to track any MPTCP-related state.  To achieve this, the
frontend proxy needs to generate K and Z in a specific way.
Basically, the proxy derives the token through the method described
at the beginning of Section 3.  This gives us the following relation:

   token = block_cipher(proxy_id, Y)     (Y is the local secret)

However, as described above, we enforce at the same time:

   token = block_cipher(Z, K)

Thus, the proxy simply generates a random number K, and can then
generate Z by decrypting the token with key K.  It is TBD what number
of bits of a token could be used for conveying routing information.
Excluding those bits, the token is random, and since the key K is
random as well, Z will also be random.  An attacker eavesdropping on
the token cannot infer anything about Z or K.  However, prolonged
gathering of token data could lead to building up some data about the
key K.

4.  Conclusion

In order to be deployable at large scale, Multipath TCP has to evolve
to accommodate the use-case of distributed layer-4 loadbalancers.  In
this document we explained the different problems that arise when one
wants to deploy MPTCP in a large server farm.  We followed up with
two possible approaches to solve the issues around the
non-deterministic nature of the token.  We argue that it is important
that the working group consider this problem and strive to find a
solution.

5.  IANA Considerations

No IANA considerations.

6.  References

6.1.  Normative References

[I-D.paasch-mptcp-syncookies]
           Paasch, C., Biswas, A., and D. Haas, "Making Multipath TCP
           robust for stateless webservers", draft-paasch-mptcp-
           syncookies-00 (work in progress), April 2015.

[RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
           "TCP Extensions for Multipath Operation with Multiple
           Addresses", RFC 6824, January 2013.

6.2.  Informative References

[Greenberg13]
           Greenberg, A., Lahiri, P., Maltz, D., Patel, P., and S.
           Sengupta, "Towards a Next Generation Data Center
           Architecture: Scalability and Commoditization", 2008.

[Patel13]  Patel, P., Bansal, D., Yuan, L., Murthy, A., Maltz, D.,
           Kern, R., Kumar, H., Zikos, M., Wu, H., Kim, C., and N.
           Karri, "Ananta: Cloud Scale Load Balancing", 2013.

[Shuff13]  Shuff, P., "Building A Billion User Load Balancer", 2013.

Authors' Addresses

Christoph Paasch
Apple, Inc.
Cupertino
US

Email: cpaasch@apple.com

Greg Greenway
Apple, Inc.
Cupertino
US

Email: ggreenway@apple.com

Alan Ford
Pexip

Email: alan.ford@gmail.com