idnits 2.17.1

draft-duchene-mptcp-load-balancing-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 135: '...hat this address MUST NOT be used to c...'

     RFC 2119 keyword, line 138: '... with the "B" set to 1 MUST NOT try to...'

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 03, 2017) is 2488 days in the past.  Is this
     intentional?
  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Unused Reference: 'I-D.ietf-mptcp-rfc6824bis' is defined on line 489, but
     no explicit reference was found in the text

  == Unused Reference: 'RFC1323' is defined on line 499, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6182' is defined on line 503, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC7430' is defined on line 508, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 6824 (Obsoleted by RFC 8684)

  == Outdated reference: A later version (-18) exists of
     draft-ietf-mptcp-rfc6824bis-07

  -- Obsolete informational reference (is this intentional?): RFC 793
     (Obsoleted by RFC 9293)

  -- Obsolete informational reference (is this intentional?): RFC 1323
     (Obsoleted by RFC 7323)

     Summary: 2 errors (**), 0 flaws (~~), 6 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    MPTCP Working Group                                           F. Duchene
3    Internet-Draft                                                 UCLouvain
4    Intended status: Experimental                                 V. Olteanu
5    Expires: January 4, 2018             University Politehnica of Bucharest
6                                                              O. Bonaventure
7                                                                   UCLouvain
8                                                                   C. Raiciu
9                                         University Politehnica of Bucharest
10                                                                    A. Ford
11                                                                      Pexip
12                                                              July 03, 2017

14                         Multipath TCP Load Balancing
15                    draft-duchene-mptcp-load-balancing-01

17   Abstract

19      In this document we propose several solutions to allow Multipath TCP
20      to better work behind load balancers.

22   Status of This Memo

24      This Internet-Draft is submitted in full conformance with the
25      provisions of BCP 78 and BCP 79.

27      Internet-Drafts are working documents of the Internet Engineering
28      Task Force (IETF).  Note that other groups may also distribute
29      working documents as Internet-Drafts.
        The list of current Internet-
30      Drafts is at http://datatracker.ietf.org/drafts/current/.

32      Internet-Drafts are draft documents valid for a maximum of six months
33      and may be updated, replaced, or obsoleted by other documents at any
34      time.  It is inappropriate to use Internet-Drafts as reference
35      material or to cite them other than as "work in progress."

37      This Internet-Draft will expire on January 4, 2018.

39   Copyright Notice

41      Copyright (c) 2017 IETF Trust and the persons identified as the
42      document authors.  All rights reserved.

44      This document is subject to BCP 78 and the IETF Trust's Legal
45      Provisions Relating to IETF Documents
46      (http://trustee.ietf.org/license-info) in effect on the date of
47      publication of this document.  Please review these documents
48      carefully, as they describe your rights and restrictions with respect
49      to this document.  Code Components extracted from this document must
50      include Simplified BSD License text as described in Section 4.e of
51      the Trust Legal Provisions and are provided without warranty as
52      described in the Simplified BSD License.

54   Table of Contents

56      1.  Introduction
57      2.  Proposed solutions
58        2.1.  Per-server addresses
59        2.2.  Embedding Extra Information in Packets
60          2.2.1.  Proposal 1
61          2.2.2.  Proposal 2
62        2.3.  Application Layer Authentication
63      3.  Comparison of the solutions
64      4.  Recommendations
65      5.  IANA considerations
66      6.  Security considerations
67      7.  Conclusion
68      8.  References
69        8.1.  Normative References
70        8.2.  Informative References
71      Authors' Addresses

73   1.  Introduction

75      Multipath TCP is an extension to TCP [RFC0793] that was specified in
76      [RFC6824].  Multipath TCP allows hosts to use multiple paths to send
77      and receive the data belonging to one connection.  For this, a
78      Multipath TCP connection is composed of several TCP connections that
79      are called subflows.

81      Many large web sites are served by servers that are behind a load
82      balancer.  The load balancer receives the connection establishment
83      attempts and forwards them to the actual servers that serve the
84      requests.  One issue for the end-to-end deployment of Multipath TCP
85      is its ability to work behind load balancers.  Different types of load
86      balancers are possible.  We consider a simple but important load
87      balancer that does not maintain any per-flow state.  This load
88      balancer is illustrated in Figure 1.  A stateless load balancer can
89      be implemented by hashing the five-tuple (IP addresses and port
90      numbers) of each incoming packet and forwarding it to one of the
91      servers based on the hash value computed.  With TCP, this load
92      balancer ensures that all the packets that belong to one TCP
93      connection are sent to the same server since each packet contains the
94      five-tuple used by the hash function.

96                                 +--+---- S1
97                              ---|LB|---- S2
98                                 +--+---- S3

100                     Figure 1: Stateless load balancer

102     With Multipath TCP, this approach cannot be used anymore when
103     subflows are created by the clients.  Such subflows can contain any
104     five-tuple and thus packets belonging to them will be load-balanced
105     to any server, not necessarily the one that was selected by the
106     hashing function for the initial subflow.

108     In this document, we propose several solutions to allow Multipath TCP
109     to work behind load balancers.
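As a rough illustration of the stateless forwarding described in the Introduction, the following Python sketch hashes the five-tuple of a packet and picks one of three servers. The server names, the use of SHA-256, and the modulo mapping are assumptions made for this example only; the document does not prescribe a particular hash function.

```python
import hashlib
import ipaddress
import struct

# Hypothetical backend names, mirroring S1..S3 in Figure 1.
SERVERS = ["S1", "S2", "S3"]

def pick_server(src_ip, src_port, dst_ip, dst_port, proto=6):
    """Map a five-tuple to a server without keeping any per-flow state."""
    material = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!HHB", src_port, dst_port, proto)
    )
    digest = hashlib.sha256(material).digest()
    # Assumption: reduce the first 32 bits of the digest modulo the pool size.
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]

# Every packet of one TCP connection carries the same five-tuple,
# so it always maps to the same server.
first = pick_server("198.51.100.7", 40000, "203.0.113.1", 80)
again = pick_server("198.51.100.7", 40000, "203.0.113.1", 80)
print(first == again)
```

A second subflow of the same Multipath TCP connection normally uses a different source address or port, so nothing in this function ties its result to the server chosen for the initial subflow.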
111  2.  Proposed solutions

113  2.1.  Per-server addresses

115     A first solution is to use two types of public addresses.  The load
116     balancer uses a public address that is advertised in the DNS.  This
117     address is used to establish the initial subflow of all Multipath TCP
118     connections.  In addition to this address, a pool of addresses is
119     used for the servers behind the load balancer.  One address of this
120     pool is assigned to each server behind the load balancer.  This
121     server address is not announced in the DNS and is only advertised by
122     the servers through the ADD_ADDR option.

124     The additional per-server address is used by the clients when they
125     wish to create additional subflows.  Since each server has its own
126     public address, this ensures that the additional subflows are
127     directed to the corresponding server.  For this solution, we need to
128     ensure that the client never uses the public address of the load
129     balancer to initiate subflows.  This can be achieved by a slight
130     modification to the MP_CAPABLE option described below.

132     To allow Multipath TCP to work for servers behind layer 4 load
133     balancers, we propose to use the reserved "B" flag in the MP_CAPABLE
134     option (shown in Figure 2) sent in the SYN+ACK.  This flag informs the
135     other host that this address MUST NOT be used to create additional
136     subflows.

138     A host receiving an MP_CAPABLE with the "B" set to 1 MUST NOT try to
139     establish a subflow to the source address of the MP_CAPABLE.  This
140     bit can also be used in the MP_CAPABLE option sent in the SYN by a
141     client that resides behind a NAT or firewall or does not accept
142     server-initiated subflows.
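The rule for the "B" flag can be sketched from the receiving host's point of view as follows. This is only an illustration: the function name and the address lists are hypothetical, and the bit position assumes that the flags A to H occupy one byte with A as the most significant bit, as drawn in the MP_CAPABLE option layout.

```python
FLAG_B = 0x40  # assumption: A is 0x80, so B is the next bit of the flags byte

def join_targets(flags, peer_source_addr, advertised_addrs):
    """Return the addresses a host may open additional subflows towards."""
    targets = list(advertised_addrs)      # addresses learned via ADD_ADDR
    if not flags & FLAG_B:
        # Without "B", the source address of the MP_CAPABLE is usable too.
        targets.insert(0, peer_source_addr)
    return targets

# With "B" set, only the per-server address remains eligible.
print(join_targets(FLAG_B, "203.0.113.1", ["2001:db8::10"]))
```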
144                          1                   2                   3
145      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
146     +---------------+---------------+-------+-------+---------------+
147     |     Kind      |    Length     |Subtype|Version|A|B|C|D|E|F|G|H|
148     +---------------+---------------+-------+-------+---------------+
149     |                 Option Sender's Key (64 bits)                 |
150     |                    (if option Length > 4)                     |
151     |                                                               |
152     +---------------------------------------------------------------+
153     |                Option Receiver's Key (64 bits)                |
154     |                    (if option Length > 12)                    |
155     |                                                               |
156     +-------------------------------+-------------------------------+
157     |  Data-Level Length (16 bits)  |  Checksum (16 bits, optional) |
158     +-------------------------------+-------------------------------+

160              Figure 2: Multipath Capable (MP_CAPABLE) Option

162     This bit can be used by servers behind a stateless load balancer.
163     The servers set the "B" flag in the MP_CAPABLE option that they
164     return and advertise their own address by using the ADD_ADDR option.
165     Upon reception of this option, the clients can create the additional
166     subflows towards these addresses.  Compared with current stateless
167     load balancers, an advantage of this approach is that the packets
168     belonging to the additional subflows do not need to pass through the
169     load balancer.

171     To demonstrate the principle of an off-path load balancer, let's
172     consider a server behind a load balancer.

174              +-- net1 --+  +-- Load Balancer --+--- ADDR 1 ---+
175              |          |  |                                  |
176     client --+          +--+                                  +--- Server
177              |          |  |                                  |
178              +-- net2 --+  +------------- ADDR 2 -------------+

180                   Figure 3: A server with 2 addresses

182     As shown in Figure 3, this server has 2 IP addresses: 1 behind
183     the load balancer and 1 directly connected to the Internet.  The
184     client sends a SYN containing an MP_CAPABLE option, the server
185     answers with a SYN+ACK containing an MP_CAPABLE with the "B" flag set
186     to 1.
        Upon reception of the SYN+ACK, the client will know that it
187     cannot establish any additional subflow towards this IP address.  The
188     server will then advertise its secondary address with an ADD_ADDR.
189     Once the client has established at least one subflow to the secondary
190     IP address, the server could elect to close the primary subflow or to
191     put it in backup mode.

193  2.2.  Embedding Extra Information in Packets

195     Under some circumstances, addressing the individual servers via their
196     individual IPs is not desirable or feasible.  To work around this
197     issue, we propose two mutually-exclusive solutions.  They rely to
198     varying degrees on getting the client to embed connection or server-
199     identifying information in the packets that it sends out.  This extra
200     information can be used statelessly by the load balancers.

202     Both solutions require modifications only to the server stack and
203     work well with existing MPTCP clients.

205  2.2.1.  Proposal 1

207     Our first proposal revolves around controlling the destination port
208     that the client uses in all subflows aside from the initial one.  It
209     is possible for the server to advertise an additional port via the
210     ADD_ADDR option [RFC6824].  This informs the client that it can send
211     an MP_JOIN to this new port and initiate a new subflow.

213     To take advantage of this, each server is assigned a unique 16-bit
214     ID, which must be different from the port on which the service is
215     being hosted (e.g. 80).  As soon as a connection is initiated, the
216     server sends an ADD_ADDR to the client advertising a new port equal
217     to said ID.

219     Packets that arrive at the load balancer are treated as follows:

221     o  Packets destined to the port that the service is being hosted on
222        will be forwarded to a server based on a hash of the 5-tuple.

224     o  Packets destined to any other port are forwarded to the server
225        whose ID matches the destination port.
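The two forwarding rules of Proposal 1 can be sketched as follows. The server IDs, the CRC-32 hash, and the handling of unknown ports (returning None, i.e. dropping the packet) are assumptions made for this illustration; the text above only fixes the two rules themselves.

```python
import zlib

SERVICE_PORT = 80
# Hypothetical 16-bit server IDs, each different from the service port.
SERVERS_BY_ID = {10001: "S1", 10002: "S2", 10003: "S3"}

def forward(src_ip, src_port, dst_ip, dst_port):
    """Pick the server for a packet arriving at the load balancer."""
    if dst_port == SERVICE_PORT:
        # Rule 1: the service port is load-balanced on a hash of the 5-tuple
        # (the protocol is implicitly TCP in this sketch).
        servers = sorted(SERVERS_BY_ID.values())
        flow = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
        return servers[zlib.crc32(flow) % len(servers)]
    # Rule 2: any other port is read as a server ID.
    # Assumption: a port that matches no ID yields None (packet dropped).
    return SERVERS_BY_ID.get(dst_port)

# An MP_JOIN sent to the advertised port 10002 always reaches S2,
# whatever the client's source address and port are.
print(forward("198.51.100.7", 40001, "203.0.113.1", 10002))  # S2
```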
227     This approach has two drawbacks:

229     o  The client will most likely also try to initiate subflows using
230        the server's original port.  Because these subflows are
231        load-balanced based on a hash of their 5-tuple, they will almost
232        certainly reach a different server and break.  (Using REMOVE_ADDR
233        to prevent the creation of these subflows would entail the
234        destruction of the original subflow.)  This issue can be solved by
235        the adoption of the protocol modifications outlined in
236        Section 2.1.

238     o  If the client is behind a firewall that restricts access to
239        certain destination ports, it might not succeed in establishing
240        any new subflows.

242  2.2.2.  Proposal 2

244     Our second proposal is to load-balance packets based on the server's
245     token.

247     The token's most significant 14 bits are treated as a hash value for
248     the connection.  They are embedded in all outgoing TCP timestamps,
249     and subsequently echoed back by the client.  Incoming packets that do
250     not contain timestamps (such as FINs) are dealt with via redirection
251     between the servers.

253  2.2.2.1.  Connection Initiation

255     The client initiates an MPTCP connection by sending a SYN with the
256     MP_CAPABLE option.  Under normal operation, the server then picks a
257     random 64-bit key for the connection, and uses it to compute its
258     token.

260     To forward the packet appropriately, the load balancer must know the
261     token before deciding which server to send it to.  To accomplish this,
262     we move the key generation to the load balancer.  The connection's
263     token can be computed based on the generated key.

265     The load balancer places the generated key, along with the IP address
266     of the server that would be responsible for the subflow under normal
267     5-tuple hashing (which we call the alternate server IP) in an IP
268     option and forwards the SYN to the server.
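The key and token handling on the load balancer can be sketched as follows. In [RFC6824] the token is the most significant 32 bits of the SHA-1 hash of the key; the function names and the modulo mapping from the 14-bit hash value to a server are assumptions made for this example.

```python
import hashlib
import secrets

HASH_BITS = 14  # number of high-order token bits used as the hash value

def token_from_key(key):
    """Token as in RFC 6824: the most significant 32 bits of SHA-1(key)."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")

def connection_hash(token):
    """The 14 most significant bits of the 32-bit token."""
    return token >> (32 - HASH_BITS)

def accept_syn(servers):
    """Generate a key for a new connection and choose its server."""
    key = secrets.token_bytes(8)              # random 64-bit key
    token = token_from_key(key)
    # Assumption: the hash value is reduced modulo the pool size.
    server = servers[connection_hash(token) % len(servers)]
    return key, token, server

key, token, server = accept_syn(["S1", "S2", "S3"])
print(key.hex(), hex(token), server)
```

Because the server is derived from the token alone, any later packet that carries the token, or its 14 high-order bits, can be forwarded to the same server without per-connection state on the load balancer.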
270                          1                   2                   3
271      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
272     +---------------+---------------+---------------+---------------+
273     |   Type = 96   |  Length = 16  |            Unused             |
274     +---------------+---------------+---------------+---------------+
275     |                                                               |
276     +                          Server Key                           +
277     |                                                               |
278     +---------------+---------------+---------------+---------------+
279     |                      Alternate Server IP                      |
280     +---------------+---------------+---------------+---------------+

282              Figure 4: IP Option Used for MP_CAPABLE packets

284     The figure above depicts the IP option that is inserted into the
285     MP_CAPABLE packet before it is sent to the server.  We have chosen an
286     IP option despite the fact that the data contained therein pertains
287     to the transport layer, because TCP option space is very limited.  IP
288     option type 96 is currently classified as reserved [RFC0791].

290     Upon receipt of the packet, the server uses the key provided to
291     compute the token for the connection.  If no connection with the same
292     token exists, the server uses the key provided.  Otherwise, it takes
293     a brute-force approach: it randomly generates keys until it finds one
294     that yields a token with the same 14 highest-order bits.

296     The use of the alternate server IP will be discussed in a later
297     section.

299  2.2.2.2.  Handling MP_JOIN packets

301     Additional subflows are initiated by the client by sending MP_JOIN
302     packets.  These packets contain the server's token.

304     Similarly to how MP_CAPABLE packets are treated, the load balancer
305     uses an IP option to inform the server about which other server would
306     be responsible for the subflow under normal 5-tuple hashing.
308                          1                   2                   3
309      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
310     +---------------+---------------+---------------+---------------+
311     |   Type = 97   |  Length = 8   |            Unused             |
312     +---------------+---------------+---------------+---------------+
313     |                      Alternate Server IP                      |
314     +---------------+---------------+---------------+---------------+

316               Figure 5: IP Option Used for MP_JOIN packets

318     IP option type 97 is also classified as reserved [RFC0791].

320  2.2.2.3.  Embedding the token in the timestamp

322     The TCP timestamp option [RFC7323] is present in most packets and
323     consists of two fields: the TSval, which is set by the packet's
324     sender, and TSecr, which contains a timestamp recently received from
325     the other end.

327     Taking advantage of the fact that timestamps set by the server are
328     echoed back by the client, the server shifts its timestamp clock left
329     by 14 bits, and embeds the 14 highest-order bits of the token into
330     the 14 lowest-order bits of the TSval.  When a packet with the ACK
331     flag set and with the TS option present arrives at the load balancer,
332     it is forwarded based on the 14 least significant bits of the TSecr
333     field.

335  2.2.2.3.1.  Impact on PAWS

337     Timestamps supplied by the server are used by the client for
338     protection against wrapped sequence numbers (PAWS).  Note that for
339     Multipath TCP, the use of the 64-bit DSN already protects against
340     wrapped sequence numbers.

342     We assume that the server uses a timestamp clock frequency of 1 tick
343     per ms, which is the highest frequency recommended by [RFC7323].  The
344     recycling time of the timestamp clock's sign bit is required to be
345     greater than the Maximum Segment Lifetime of 255 seconds.  Given that
346     the clock ticks once every ms in increments of 2^14, its recycling
347     time is roughly 262 s, which is within the bounds set by the
348     standard.
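The timestamp encoding and the 262 s figure can be checked with a short Python sketch. The function names are hypothetical; the only inputs taken from the text are the 14-bit shift, the 32-bit TSval, and the 1 ms tick. The last function computes the time after which the 32-bit TSval wraps, which is where the value of roughly 262 s comes from.

```python
TOKEN_BITS = 14
TS_MASK = 0xFFFFFFFF  # TSval and TSecr are 32-bit fields

def make_tsval(clock_ms, token):
    """Shift the clock left by 14 bits and embed the top 14 token bits."""
    top14 = token >> (32 - TOKEN_BITS)
    return ((clock_ms << TOKEN_BITS) | top14) & TS_MASK

def hash_from_tsecr(tsecr):
    """What the load balancer reads: the 14 least significant bits."""
    return tsecr & ((1 << TOKEN_BITS) - 1)

def wrap_period_seconds(tick_ms=1):
    """Time until the 32-bit TSval wraps when each tick adds 2^14."""
    ticks = 2 ** (32 - TOKEN_BITS)   # 2^18 ticks fit in the remaining bits
    return ticks * tick_ms / 1000

token = 0xABCD1234                    # arbitrary example token
tsval = make_tsval(123456, token)     # the client echoes this value as TSecr
print(hash_from_tsecr(tsval) == token >> 18)  # True
print(wrap_period_seconds())                  # 262.144
```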
350     While the quickly-increasing timestamp is benign to active subflows,
351     PAWS will still cause segments to be dropped if the subflow in
352     question had been idle for a period longer than the clock's recycling
353     time.  To solve this, the server periodically sends keepalive
354     messages during idle periods.

356  2.2.2.4.  Redirecting packets without timestamps

358     Some packets (most notably FINs) do not contain timestamps or any
359     other connection-identifying information.  As such, they are
360     forwarded to a server based on a hash of the 5-tuple.

362     As seen in Section 2.2.2.1 and Section 2.2.2.2, whenever a new
363     subflow is set up, the server responsible for it (A) also knows which
364     other server (B) would be hit by the packets if 5-tuple hashing
365     were used.

367     A will use a simple peer-to-peer protocol to inform B to set up a
368     redirection rule for the 5-tuple in question.  The redirection rule
369     will be deleted by B either at A's request, after the subflow has
370     finished, or after a timeout.  We do not discuss the specifics of the
371     protocol in this document.

373     Redirection of a packet is performed using IP-in-IP encapsulation.

375  2.3.  Application Layer Authentication

377     With similar motivations to Section 2.2, this proposal
378     [I-D.paasch-mptcp-application-authentication] decouples the token
379     signalled in the TCP options from the key used in authentication,
380     allowing the token to carry arbitrary information.  By allowing the
381     token to be arbitrarily assigned by the sender, a load balancer could
382     embed routing information so it knows which server to forward the
383     packets on the TCP session towards.

385     For example, the token could carry a server identifier, a port
386     number, and a signature based on a known secret.  Furthermore, by
387     generating tokens directly there is no risk of hash collisions in
388     token generation.
        By allowing the token to be arbitrarily assigned,
389     decoupled from the keys, the authentication of additional subflows is
390     delegated to the application layer.  A proposal for the use of TLS
391     for this is defined in [I-D.paasch-mptcp-tls-authentication], whereby
392     keys can be extracted from a TLS session and used to set up
393     additional subflows.

395  3.  Comparison of the solutions

397     Per-server addresses:

399     o  Requires individual public addresses for each of the servers,
400        making IPv6 almost mandatory.

402     o  Requires modifications to the client and server stacks.

404     o  Is transparent and works with today's load balancers.

406     o  Doesn't need any modification to the applications.

408     o  Discloses the real IP address of the servers.

410     o  Allows the load balancer to be placed off-path.

412     Extra Information in Packets:

414     o  Doesn't require an individual public address for each of the
415        servers.

417     o  Requires modifications to the load balancers and the server stack.

419     o  Could be broken by a firewall blocking certain destination ports
420        (proposal 1) or changing the value of the timestamps (proposal 2).

422     o  Doesn't need any modification to the applications.

424     o  Doesn't disclose the real IP address of the servers.

426     Application Layer Authentication:

428     o  Doesn't require public IP addresses

430     o  Requires support at clients and load balancers

432     o  Doesn't disclose IP addresses

434     o  No greater risk of middlebox interference than MPTCP today

436     o  Additional security through no key exchange in the clear

438  4.  Recommendations

440  5.  IANA considerations

442     This document proposes some modifications to the Multipath TCP
443     options defined in [RFC6824].  These modifications do not require any
444     specific action from IANA.

446  6.  Security considerations

448     Security considerations will be discussed in the next version of this
449     draft.

451  7.
Conclusion

453     In this document, we have described and compared two solutions to
454     load balance Multipath TCP connections.  We showed that these two
455     solutions have advantages and drawbacks and cover different network
456     configurations.  Future versions of this draft will discuss security
457     considerations.

459  8.  References

461  8.1.  Normative References

463     [I-D.paasch-mptcp-application-authentication]
464                Paasch, C. and A. Ford, "Application Layer Authentication
465                for MPTCP", draft-paasch-mptcp-application-
466                authentication-00 (work in progress), May 2016.

468     [I-D.paasch-mptcp-tls-authentication]
469                Paasch, C. and A. Ford, "TLS Authentication for MPTCP",
470                draft-paasch-mptcp-tls-authentication-00 (work in
471                progress), May 2016.

473     [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
474                DOI 10.17487/RFC0791, September 1981.

477     [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
478                "TCP Extensions for Multipath Operation with Multiple
479                Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013.

482     [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
483                Scheffenegger, Ed., "TCP Extensions for High Performance",
484                RFC 7323, DOI 10.17487/RFC7323, September 2014.

487  8.2.  Informative References

489     [I-D.ietf-mptcp-rfc6824bis]
490                Ford, A., Raiciu, C., Handley, M., Bonaventure, O., and C.
491                Paasch, "TCP Extensions for Multipath Operation with
492                Multiple Addresses", draft-ietf-mptcp-rfc6824bis-07 (work
493                in progress), October 2016.

495     [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
496                RFC 793, DOI 10.17487/RFC0793, September 1981.

499     [RFC1323]  Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
500                for High Performance", RFC 1323, DOI 10.17487/RFC1323, May
501                1992.

503     [RFC6182]  Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
504                Iyengar, "Architectural Guidelines for Multipath TCP
505                Development", RFC 6182, DOI 10.17487/RFC6182, March 2011.
508     [RFC7430]  Bagnulo, M., Paasch, C., Gont, F., Bonaventure, O., and C.
509                Raiciu, "Analysis of Residual Threats and Possible Fixes
510                for Multipath TCP (MPTCP)", RFC 7430,
511                DOI 10.17487/RFC7430, July 2015.

514  Authors' Addresses

516     Fabien Duchene
517     UCLouvain

519     Email: fabien.duchene@uclouvain.be

520     Vladimir Olteanu
521     University Politehnica of Bucharest

523     Email: vladimir.olteanu@cs.pub.ro

525     Olivier Bonaventure
526     UCLouvain

528     Email: Olivier.Bonaventure@uclouvain.be

530     Costin Raiciu
531     University Politehnica of Bucharest

533     Email: costin.raiciu@cs.pub.ro

535     Alan Ford
536     Pexip

538     Email: alan.ford@gmail.com