MPTCP Working Group                                           F. Duchene
Internet-Draft                                                 UCLouvain
Intended status: Experimental                                 V. Olteanu
Expires: May 4, 2017                 University Politehnica of Bucharest
                                                          O. Bonaventure
                                                               UCLouvain
                                                               C. Raiciu
                                     University Politehnica of Bucharest
                                                        October 31, 2016

                      Multipath TCP Load Balancing
                 draft-duchene-mptcp-load-balancing-00

Abstract

   In this document we propose several solutions to allow Multipath TCP
   to work better behind load balancers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 4, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Proposed solutions
     2.1.  Per-server addresses
     2.2.  Embedding Extra Information in Packets
       2.2.1.  Proposal 1
       2.2.2.  Proposal 2
   3.  Comparison of the solutions
   4.  Recommendations
   5.  IANA considerations
   6.  Security considerations
   7.  Conclusion
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   Multipath TCP is an extension to TCP [RFC0793] that was specified in
   [RFC6824].  Multipath TCP allows hosts to use multiple paths to send
   and receive the data belonging to one connection.  For this, a
   Multipath TCP connection is composed of several TCP connections that
   are called subflows.

   Many large web sites are served by servers that are behind a load
   balancer.  The load balancer receives the connection establishment
   attempts and forwards them to the actual servers that serve the
   requests.  One issue for the end-to-end deployment of Multipath TCP
   is whether it can be used behind load balancers.  Different types of
   load balancers are possible.  We consider a simple but important load
   balancer that does not maintain any per-flow state.  This load
   balancer is illustrated in Figure 1.  A stateless load balancer can
   be implemented by hashing the five-tuple (IP addresses and port
   numbers) of each incoming packet and forwarding it to one of the
   servers based on the computed hash value.  With TCP, this load
   balancer ensures that all the packets that belong to one TCP
   connection are sent to the same server, since each packet contains
   the five-tuple used by the hash function.

                              +--+---- S1
                           ---|LB|---- S2
                              +--+---- S3

                    Figure 1: Stateless load balancer

   With Multipath TCP, this approach can no longer be used when
   subflows are created by the clients.  Such subflows can have any
   five-tuple, and thus packets belonging to them can be load-balanced
   to any server, not necessarily the one that was selected by the
   hashing function for the initial subflow.

   In this document, we propose several solutions to allow Multipath TCP
   to work behind load balancers.

2.  Proposed solutions

2.1.  Per-server addresses

   A first solution is to use two types of public addresses.
   The load balancer uses a public address that is advertised in the
   DNS.  This address is used to establish the initial subflow of all
   Multipath TCP connections.  In addition to this address, a pool of
   addresses is used for the servers behind the load balancer.  One
   address of this pool is assigned to each server behind the load
   balancer.  This server address is not announced in the DNS and is
   only advertised by the servers through the ADD_ADDR option.

   The additional per-server address is used by the clients when they
   wish to create additional subflows.  Since each server has its own
   public address, this ensures that the additional subflows are
   directed to the corresponding server.  For this solution, we need to
   ensure that the client never uses the public address of the load
   balancer to initiate subflows.  This can be achieved by a slight
   modification to the MP_CAPABLE option described below.

   To allow Multipath TCP to work for servers behind layer 4 load
   balancers, we propose to use the reserved "B" flag in the MP_CAPABLE
   option (shown in Figure 2) sent in the SYN+ACK.  This flag informs
   the other host that this address MUST NOT be used to create
   additional subflows.

   A host receiving an MP_CAPABLE with the "B" flag set to 1 MUST NOT
   try to establish a subflow to the source address of the MP_CAPABLE.
   This bit can also be used in the MP_CAPABLE option sent in the SYN by
   a client that resides behind a NAT or firewall or does not accept
   server-initiated subflows.
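
   As a non-normative illustration, the receiver-side rule for the "B"
   flag could be sketched in Python as follows.  The function names and
   the address lists are hypothetical and do not come from any MPTCP
   implementation; only the position of "B" as the second flag bit of
   the MP_CAPABLE flags octet is taken from Figure 2.

```python
# Non-normative sketch. Helper names are hypothetical; the bit positions
# follow the A..H flag layout of the MP_CAPABLE option (A is the leftmost bit).
MPC_FLAG_A = 0x80  # first flag bit of the flags octet
MPC_FLAG_B = 0x40  # second flag bit, the one this proposal uses


def may_join_peer_source_address(mp_capable_flags: int) -> bool:
    """Return False when the peer set "B": its source address MUST NOT
    be used as the destination of additional subflows."""
    return (mp_capable_flags & MPC_FLAG_B) == 0


def candidate_addresses(peer_source, advertised, mp_capable_flags):
    """List the addresses a host may target with MP_JOIN.

    peer_source: source address of the received MP_CAPABLE.
    advertised:  addresses learned through ADD_ADDR.
    """
    addresses = list(advertised)
    if may_join_peer_source_address(mp_capable_flags):
        addresses.insert(0, peer_source)
    return addresses
```

   With "B" set, only the addresses learned through ADD_ADDR remain
   candidates for new subflows, which is the behaviour the per-server
   address solution relies on.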

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-------+---------------+
   |     Kind      |    Length     |Subtype|Version|A|B|C|D|E|F|G|H|
   +---------------+---------------+-------+-------+---------------+
   |                 Option Sender's Key (64 bits)                 |
   |                    (if option Length > 4)                     |
   |                                                               |
   +---------------------------------------------------------------+
   |                Option Receiver's Key (64 bits)                |
   |                    (if option Length > 12)                    |
   |                                                               |
   +-------------------------------+-------------------------------+
   |  Data-Level Length (16 bits)  |  Checksum (16 bits, optional) |
   +-------------------------------+-------------------------------+

             Figure 2: Multipath Capable (MP_CAPABLE) Option

   This bit can be used by servers behind a stateless load balancer.
   The servers set the "B" flag in the MP_CAPABLE option that they
   return and advertise their own address by using the ADD_ADDR option.
   Upon reception of this option, the clients can create the additional
   subflows towards these addresses.  Compared with current stateless
   load balancers, an advantage of this approach is that the packets
   belonging to the additional subflows do not need to pass through the
   load balancer.

   To demonstrate the principle of an off-path load balancer, let us
   consider a server behind a load balancer.

            +-- net1 --+  +-- Load Balancer --+--- ADDR 1 ---+
            |          |  |                                  |
   client --+          +--+                                  +--- Server
            |          |  |                                  |
            +-- net2 --+  +------------- ADDR 2 -------------+

                  Figure 3: A server with 2 addresses

   As shown in Figure 3, this server has 2 IP addresses: 1 behind the
   load balancer and 1 directly connected to the Internet.  The client
   sends a SYN containing an MP_CAPABLE option, and the server answers
   with a SYN+ACK containing an MP_CAPABLE with the "B" flag set to 1.
   Upon reception of the SYN+ACK, the client will know that it cannot
   establish any more subflows towards this IP address.  The server
   will then advertise its secondary address with an ADD_ADDR.  Once
   the client has established at least one subflow to the secondary
   IP address, the server could elect to close the primary subflow or to
   put it in backup mode.

2.2.  Embedding Extra Information in Packets

   Under some circumstances, addressing the individual servers via their
   individual IPs is not desirable or feasible.  To work around this
   issue, we propose two mutually-exclusive solutions.  They rely to
   varying degrees on getting the client to embed connection or server-
   identifying information in the packets that it sends out.  This extra
   information can be used statelessly by the load balancers.

   Both solutions require modifications only to the server stack and
   work well with existing MPTCP clients.

2.2.1.  Proposal 1

   Our first proposal revolves around controlling the destination port
   that the client uses in all subflows aside from the initial one.  It
   is possible for the server to advertise an additional port via the
   ADD_ADDR option [RFC6824].  This informs the client that it can send
   an MP_JOIN to this new port and initiate a new subflow.

   To take advantage of this, each server is assigned a unique 16-bit
   ID, which must be different from the port on which the service is
   being hosted (e.g. 80).  As soon as a connection is initiated, the
   server sends an ADD_ADDR to the client advertising a new port equal
   to said ID.

   Packets that arrive at the load balancer are treated as follows:

   o  Packets destined to the port that the service is being hosted on
      will be forwarded to a server based on a hash of the 5-tuple.

   o  Packets destined to any other port are forwarded to the server
      whose ID matches the destination port.
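
   The two forwarding rules above can be sketched, non-normatively, in
   Python.  The packet representation (a dictionary), the choice of
   SHA-256 as the hash, the modulo mapping onto servers and all names
   below are assumptions made for illustration; this document does not
   specify them.

```python
import hashlib

SERVICE_PORT = 80  # assumed service port, as in the example in the text


def five_tuple_hash(src_ip, src_port, dst_ip, dst_port, proto=6):
    """Stateless hash of the 5-tuple (the hash function is an assumption)."""
    data = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def pick_server(pkt, servers_by_id):
    """Select the server for a packet.

    pkt:           dict with src_ip, src_port, dst_ip and dst_port.
    servers_by_id: maps each server's unique 16-bit ID to the server.
    """
    if pkt["dst_port"] == SERVICE_PORT:
        # Initial subflow: ordinary 5-tuple hashing.
        ids = sorted(servers_by_id)
        h = five_tuple_hash(pkt["src_ip"], pkt["src_port"],
                            pkt["dst_ip"], pkt["dst_port"])
        return servers_by_id[ids[h % len(ids)]]
    # Any other port: the destination port is the server's ID.
    return servers_by_id.get(pkt["dst_port"])  # None if no server has this ID
```

   For example, with servers_by_id = {5001: "S1", 5002: "S2"}, a packet
   whose destination port is 5002 always reaches "S2" regardless of its
   source address and port, while packets to port 80 are spread by the
   hash.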

   This approach has two drawbacks:

   o  The client will most likely also try to initiate subflows using
      the server's original port.  Because these subflows are
      load-balanced based on a hash of their 5-tuple, they will almost
      certainly reach a different server and break.  (Using REMOVE_ADDR
      to prevent the creation of these subflows would entail the
      destruction of the original subflow.)  This issue can be solved by
      the adoption of the protocol modifications outlined in
      Section 2.1.

   o  If the client is behind a firewall that restricts access to
      certain destination ports, it might not succeed in establishing
      any new subflows.

2.2.2.  Proposal 2

   Our second proposal is to load-balance packets based on the server's
   token.

   The token's most significant 14 bits are treated as a hash value for
   the connection.  They are embedded in all outgoing TCP timestamps,
   and subsequently echoed back by the client.  Incoming packets that do
   not contain timestamps (such as FINs) are dealt with via redirection
   between the servers.

2.2.2.1.  Connection Initiation

   The client initiates an MPTCP connection by sending a SYN with the
   MP_CAPABLE option.  Under normal operation, the server then picks a
   random 64-bit key for the connection, and uses it to compute its
   token.

   To forward the packet appropriately, the load balancer must know the
   token before deciding which server to send it to.  To accomplish
   this, we move the key generation to the load balancer.  The
   connection's token can be computed based on the generated key.

   The load balancer places the generated key, along with the IP address
   of the server that would be responsible for the subflow under normal
   5-tuple hashing (which we call the alternate server IP), in an IP
   option and forwards the SYN to the server.
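
   A minimal, non-normative Python sketch of the load balancer's
   handling of an MP_CAPABLE SYN is given below.  The token derivation
   (most significant 32 bits of the SHA-1 hash of the key) follows
   [RFC6824], and the option layout follows Figure 4.  The function
   names, and in particular the modulo mapping from the 14-bit hash to
   a server, are assumptions; this document does not specify how the
   hash value is mapped to a server.

```python
import hashlib
import os
import socket
import struct


def token_from_key(key: bytes) -> int:
    """Token as in RFC 6824: most significant 32 bits of SHA-1(key)."""
    if len(key) != 8:
        raise ValueError("MPTCP keys are 64 bits long")
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")


def hash14(token: int) -> int:
    """The 14 most significant bits of the 32-bit token."""
    return token >> 18


def build_mp_capable_ip_option(key: bytes, alternate_server_ip: str) -> bytes:
    """Type 96, length 16, 16 unused bits, 64-bit key, IPv4 address."""
    return struct.pack("!BBH8s4s", 96, 16, 0, key,
                       socket.inet_aton(alternate_server_ip))


def handle_mp_capable_syn(servers, alternate_server_ip):
    """Generate the key on the load balancer and pick the target server.

    servers is a list of server identifiers; indexing it with the
    14-bit hash modulo its length is an illustrative assumption.
    """
    key = os.urandom(8)
    target = servers[hash14(token_from_key(key)) % len(servers)]
    return target, build_mp_capable_ip_option(key, alternate_server_ip)
```

   The returned option would be added to the SYN before it is forwarded
   to the selected server.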

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+---------------+---------------+
   |   Type = 96   |  Length = 16  |            Unused             |
   +---------------+---------------+---------------+---------------+
   |                                                               |
   +                          Server Key                           +
   |                                                               |
   +---------------+---------------+---------------+---------------+
   |                      Alternate Server IP                      |
   +---------------+---------------+---------------+---------------+

            Figure 4: IP Option Used for MP_CAPABLE packets

   The figure above depicts the IP option that is inserted into the
   MP_CAPABLE packet before it is sent to the server.  We have chosen an
   IP option despite the fact that the data contained therein pertains
   to the transport layer, because TCP option space is very limited.  IP
   option type 96 is currently classified as reserved [RFC0791].

   Upon receipt of the packet, the server uses the key provided to
   compute the token for the connection.  If no connection with the same
   token exists, the server uses the key provided.  Otherwise, it takes
   a brute-force approach and randomly generates multiple keys and
   selects one that yields a token with the same 14 highest-order bits.

   The use of the alternate server IP will be discussed in a later
   section.

2.2.2.2.  Handling MP_JOIN packets

   Additional subflows are initiated by the client by sending MP_JOIN
   packets.  These packets contain the server's token.

   Similarly to how MP_CAPABLE packets are treated, the load balancer
   uses an IP option to inform the server about which other server would
   be responsible for the subflow under normal 5-tuple hashing.
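
   On the server side, the handling of the two IP options (Figure 4 for
   MP_CAPABLE, Figure 5 for MP_JOIN) and the key selection rule of
   Section 2.2.2.1 could look as follows.  This is a non-normative
   Python sketch: the function names and the set of tokens in use are
   hypothetical, and the token derivation is the one of [RFC6824].

```python
import hashlib
import os
import socket
import struct


def token_from_key(key: bytes) -> int:
    """Token as in RFC 6824: most significant 32 bits of SHA-1(key)."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")


def parse_lb_option(opt: bytes):
    """Return (key or None, alternate server IP) from a type 96/97 option."""
    opt_type, length = opt[0], opt[1]
    if opt_type == 96 and length == 16:      # MP_CAPABLE option
        _, _, _, key, alt = struct.unpack("!BBH8s4s", opt[:16])
        return key, socket.inet_ntoa(alt)
    if opt_type == 97 and length == 8:       # MP_JOIN option
        _, _, _, alt = struct.unpack("!BBH4s", opt[:8])
        return None, socket.inet_ntoa(alt)
    raise ValueError("not a load balancer option")


def choose_server_key(lb_key: bytes, tokens_in_use) -> bytes:
    """Keep the load balancer's key unless its token is already taken.

    Otherwise draw random keys until one yields an unused token with
    the same 14 highest-order bits (about 2**14 draws on average).
    """
    wanted = token_from_key(lb_key) >> 18
    key = lb_key
    while True:
        token = token_from_key(key)
        if token >> 18 == wanted and token not in tokens_in_use:
            return key
        key = os.urandom(8)
```

   Because only the 14 highest-order bits of the token must be
   preserved, the brute-force search terminates quickly in practice
   while keeping the connection mapped to the same server.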

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+---------------+---------------+
   |   Type = 97   |  Length = 8   |            Unused             |
   +---------------+---------------+---------------+---------------+
   |                      Alternate Server IP                      |
   +---------------+---------------+---------------+---------------+

              Figure 5: IP Option Used for MP_JOIN packets

   IP option type 97 is also classified as reserved [RFC0791].

2.2.2.3.  Embedding the token in the timestamp

   The TCP timestamp option [RFC7323] is present in most packets and
   consists of two fields: the TSval, which is set by the packet's
   sender, and the TSecr, which contains a timestamp recently received
   from the other end.

   Taking advantage of the fact that timestamps set by the server are
   echoed back by the client, the server shifts its timestamp clock left
   by 14 bits, and embeds the 14 highest-order bits of the token into
   the 14 lowest-order bits of the TSval.  When a packet with the ACK
   flag set and with the TS option present arrives at the load balancer,
   it is forwarded based on the 14 least significant bits of the TSecr
   field.

2.2.2.3.1.  Impact on PAWS

   Timestamps supplied by the server are used by the client for
   protection against wrapped sequence numbers (PAWS).  Note that for
   Multipath TCP, the use of the 64-bit DSN already provides protection
   against wrapped sequence numbers.

   We assume that the server uses a timestamp clock frequency of 1 tick
   per ms, which is the highest frequency recommended by [RFC7323].  The
   recycling time of the timestamp clock's sign bit is required to be
   greater than the Maximum Segment Lifetime of 255 seconds.  Given that
   the clock ticks once every ms in increments of 2 ^ 14, its recycling
   time is roughly 262 s, which is within the bounds set by the
   standard.
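
   The timestamp encoding and the associated arithmetic can be checked
   with the following non-normative Python sketch.  The function and
   constant names are hypothetical; the 1 ms tick and the 14-bit shift
   are the assumptions stated in the text.  The sketch computes the time
   after which the full 32-bit TSval wraps (2 ^ 18 ms, the figure of
   roughly 262 s quoted above) and, for comparison, the time after which
   its most significant bit changes (2 ^ 17 ms).

```python
HASH_BITS = 14
HASH_MASK = (1 << HASH_BITS) - 1
TICK_MS = 1  # assumed clock period: 1 tick per millisecond


def encode_tsval(clock_ticks: int, token: int) -> int:
    """Shift the clock left by 14 bits and place the token's 14
    highest-order bits in the 14 lowest-order bits of the TSval."""
    top14 = token >> (32 - HASH_BITS)
    return ((clock_ticks << HASH_BITS) | top14) & 0xFFFFFFFF


def hash_from_tsecr(tsecr: int) -> int:
    """What the load balancer reads from an echoed timestamp."""
    return tsecr & HASH_MASK


# Only 32 - 14 = 18 bits are left for the clock itself.
FULL_WRAP_SECONDS = (2 ** (32 - HASH_BITS)) * TICK_MS / 1000      # 262.144
TOP_BIT_FLIP_SECONDS = (2 ** (31 - HASH_BITS)) * TICK_MS / 1000   # 131.072
```

   The load balancer only needs hash_from_tsecr() on the forwarding
   path; the server applies encode_tsval() to every TSval it sends.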

   While the quickly-increasing timestamp is benign to active subflows,
   PAWS will still cause segments to be dropped if the subflow in
   question has been idle for a period longer than the clock's recycling
   time.  To solve this, the server periodically sends keepalive
   messages during idle periods.

2.2.2.4.  Redirecting packets without timestamps

   Some packets (most notably FINs) do not contain timestamps or any
   other connection-identifying information.  As such, they are
   forwarded to a server based on a hash of the 5-tuple.

   As seen in Section 2.2.2.1 and Section 2.2.2.2, whenever a new
   subflow is set up, the server responsible for it (A) also knows which
   other server (B) would receive the packets if 5-tuple hashing were
   used.

   A will use a simple peer-to-peer protocol to ask B to set up a
   redirection rule for the 5-tuple in question.  The redirection rule
   will be deleted by B either at A's request, after the subflow has
   finished, or after a timeout.  We do not discuss the specifics of the
   protocol in this document.

   Redirection of a packet is performed using IP-in-IP encapsulation.

3.  Comparison of the solutions

   Per-server addresses:

   o  Requires individual public addresses for each of the servers,
      making IPv6 almost mandatory.

   o  Requires modifications to the client and server stacks.

   o  Is transparent and works with today's load balancers.

   o  Doesn't need any modification to the applications.

   o  Discloses the real IP addresses of the servers.

   o  Allows the load balancer to be placed off-path.

   Extra Information in Packets:

   o  Doesn't require individual public addresses for each of the
      servers.

   o  Requires modifications to the load balancers and to the server
      stack.

   o  Could be broken by a firewall blocking certain destination ports
      (proposal 1) or changing the value of the timestamps (proposal 2).

   o  Doesn't need any modification to the applications.

   o  Doesn't disclose the real IP addresses of the servers.

4.  Recommendations

5.  IANA considerations

   This document proposes some modifications to the Multipath TCP
   options defined in [RFC6824].  These modifications do not require any
   specific action from IANA.

6.  Security considerations

   Security considerations will be discussed in the next version of this
   draft.

7.  Conclusion

   In this document, we have described and compared two solutions to
   load balance Multipath TCP connections.  We showed that these two
   solutions have advantages and drawbacks and cover different network
   configurations.  Future versions of this draft will include more
   solutions, such as Application Layer Authentication, and will discuss
   security considerations.

8.  References

8.1.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013.

   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
              Scheffenegger, Ed., "TCP Extensions for High Performance",
              RFC 7323, DOI 10.17487/RFC7323, September 2014.

8.2.  Informative References

   [I-D.ietf-mptcp-rfc6824bis]
              Ford, A., Raiciu, C., Handley, M., Bonaventure, O., and C.
              Paasch, "TCP Extensions for Multipath Operation with
              Multiple Addresses", draft-ietf-mptcp-rfc6824bis-07 (work
              in progress), October 2016.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

   [RFC1323]  Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, DOI 10.17487/RFC1323,
              May 1992.

   [RFC6182]  Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
              Iyengar, "Architectural Guidelines for Multipath TCP
              Development", RFC 6182, DOI 10.17487/RFC6182, March 2011.

   [RFC7430]  Bagnulo, M., Paasch, C., Gont, F., Bonaventure, O., and C.
              Raiciu, "Analysis of Residual Threats and Possible Fixes
              for Multipath TCP (MPTCP)", RFC 7430,
              DOI 10.17487/RFC7430, July 2015.

Authors' Addresses

   Fabien Duchene
   UCLouvain

   Email: fabien.duchene@uclouvain.be


   Vladimir Olteanu
   University Politehnica of Bucharest

   Email: vladimir.olteanu@cs.pub.ro


   Olivier Bonaventure
   UCLouvain

   Email: Olivier.Bonaventure@uclouvain.be


   Costin Raiciu
   University Politehnica of Bucharest

   Email: costin.raiciu@cs.pub.ro