IntArea                                                  B. E. Carpenter
Internet-Draft                                         Univ. of Auckland
Intended status: Informational                                  S. Jiang
Expires: November 26, 2013                  Huawei Technologies Co., Ltd
                                                              W. Tarreau
                                                              Exceliance
                                                            May 25, 2013

          Using the IPv6 Flow Label for Server Load Balancing
               draft-ietf-intarea-flow-label-balancing-01

Abstract

   This document describes how the IPv6 flow label as currently
   specified can be used to enhance layer 3/4 load distribution and
   balancing for large server farms.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 26, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Summary of Flow Label Specification
   3.  Summary of Load Balancing Techniques
   4.  Applying the Flow Label to L3/L4 Load Balancing
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  Change log [RFC Editor: Please remove]
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   The IPv6 flow label has been redefined [RFC6437] and is now a
   recommended IPv6 node requirement [RFC6434]. Its use for load
   sharing in multipath routing has been specified [RFC6438]. Another
   scenario in which the flow label could be used is in load
   distribution for large server farms. Load distribution is a slightly
   more general term than load balancing, but the latter is more
   commonly used. This document starts with brief introductions to the
   flow label and to load balancing techniques, and then describes how
   the flow label can be used to enhance layer 3/4 load balancers in
   particular.

   The motivation for this approach is to improve the performance of
   most types of layer 3/4 load balancers, especially for traffic
   including multiple IPv6 extension headers and in particular for
   fragmented packets. Fragmented packets, often the result of
   customers reaching the load balancer via a VPN with a limited MTU,
   are a common performance problem.

2.  Summary of Flow Label Specification

   The IPv6 flow label is a 20 bit field included in every IPv6 header
   [RFC2460]. It is recommended to be supported in all IPv6 nodes by
   [RFC6434] and it is defined in [RFC6437]. There is additional
   background material in [RFC6436] and [RFC6294]. According to its
   definition, the flow label should be set to a constant value for a
   given traffic flow (such as an HTTP connection), and that value will
   belong to a uniform statistical distribution, making it potentially
   valuable for load balancing purposes.

   Any device that has access to the IPv6 header has access to the flow
   label, and it is at a fixed position in every IPv6 packet.
   In contrast, transport layer information, such as the port numbers,
   is not always in a fixed position, since it follows any IPv6
   extension headers that may be present. In fact, the logic of finding
   the transport header is always more complex for IPv6 than for IPv4,
   due to the absence of an Internet Header Length field in IPv6.
   Additionally, if packets are fragmented, the flow label will be
   present in all fragments, but the transport header will only be in
   one packet. Therefore, within the lifetime of a given transport
   layer connection, the flow label can be a more convenient "handle"
   than the port number for identifying that particular connection.

   According to RFC 6437, source hosts should set the flow label, but,
   if they do not (i.e., its value is zero), forwarding nodes (such as
   the first-hop router) may set it instead. In both cases, the flow
   label value must be constant for a given transport session, normally
   identified by the IPv6 and Transport header 5-tuple. By default, the
   flow label value should be calculated by a stateless algorithm. The
   resulting value should form part of a statistically uniform
   distribution, regardless of which node sets it.

   It is recognised that at the time of writing, very few traffic flows
   include a non-zero flow label value. The mechanism described below
   is one that can be added to existing load balancing mechanisms, so
   that it will become effective as more and more flows contain a non-
   zero label. If the flow label is in fact set to zero, it will not
   affect the information entropy of the IPv6 header. Even if the flow
   label is chosen from an imperfectly uniform distribution, it will
   nevertheless increase the header entropy. These facts allow for
   progressive introduction of load balancing based on the flow label.
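
   The fixed position of the flow label can be illustrated with a short
   sketch. The Python below is illustrative only and is not part of any
   specification: the function names flow_label() and build_header()
   are hypothetical, and the code assumes it is handed a raw IPv6
   packet that starts at the 40-byte fixed header. It reads the label
   from the first 32-bit word (4 bits of version, 8 bits of traffic
   class, 20 bits of flow label) without walking any extension headers.

```python
import struct


def flow_label(packet: bytes) -> int:
    """Return the 20-bit flow label of a raw IPv6 packet (hypothetical helper)."""
    if len(packet) < 40:
        raise ValueError("shorter than the 40-byte IPv6 fixed header")
    # First 32-bit word, network byte order: version | traffic class | flow label.
    (first_word,) = struct.unpack("!I", packet[:4])
    if first_word >> 28 != 6:
        raise ValueError("not an IPv6 packet")
    return first_word & 0xFFFFF


def build_header(label: int) -> bytes:
    """Build a minimal 40-byte IPv6 fixed header for demonstration only."""
    first_word = (6 << 28) | (label & 0xFFFFF)  # traffic class left at zero
    # Payload length 0, next header 59 (No Next Header), hop limit 64,
    # followed by all-zero source and destination addresses.
    return struct.pack("!IHBB", first_word, 0, 59, 64) + bytes(32)


if __name__ == "__main__":
    print(hex(flow_label(build_header(0xABCDE))))
```

   A result of zero corresponds to the unset flow label discussed
   above.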

   A careful reading of RFC 6437 shows that for a given source accessing
   a well-known TCP port at a given destination, the flow label is, in
   effect, a substitute for the source port number, found at a fixed
   position in the layer 3 header.

   The flow label is defined as an end-to-end component of the IPv6
   header, but there are three qualifications to this:

   1.  Until the RFC 6437 standard is widely implemented as recommended
       by RFC 6434, the flow label will often be set to the default
       value of zero.

   2.  Because of the recommendation to use a stateless algorithm to
       calculate the label, there is a low (but non-zero) probability
       that two simultaneous flows from the same source to the same
       destination have the same flow label value despite having
       different transport protocol port numbers.

   3.  The flow label field is in an unprotected part of the IPv6
       header, which means that intentional or unintentional changes to
       its value cannot be easily detected by a receiver.

   The first two points are addressed below in Section 4 and the third
   in Section 5.

3.  Summary of Load Balancing Techniques

   Load balancing for server farms is achieved by a variety of methods,
   often used in combination [Tarreau]. This section gives a general
   overview of common methods, although the flow label is not relevant
   to all of them. The actual load balancing algorithm (the choice of
   which server to use for a new client session) is irrelevant to this
   discussion.

   o  The simplest method is to use the DNS to return different
      server addresses for a single name such as www.example.com to
      different users. This is typically done by rotating the order in
      which different addresses within the server site are listed by the
      relevant authoritative DNS server, on the assumption that the
      client will pick the first one.
      Routing may be configured such that the different addresses are
      handled by different ingress routers. Several variants of this
      load balancing mechanism exist, such as expecting some clients to
      use all the advertised addresses when multiple connections are
      involved, or directing the traffic to multiple sites, also known
      as global load balancing. None of these mechanisms are in the
      scope of this document, and what this document proposes neither
      affects their usability nor aims to replace them, so they will
      not be discussed further.

   o  Another method, for HTTP servers, is to operate a layer 7 reverse
      proxy in front of the server farm. The reverse proxy will present
      a single IP address to the world, communicated to clients by a
      single AAAA record. For each new client session (an incoming TCP
      connection and HTTP request), it will pick a particular server and
      proxy the session to it. The act of proxying should be more
      efficient and less resource-intensive than the act of serving the
      required content. The proxy must retain TCP state and proxy state
      for the duration of the session. This TCP state could,
      potentially, include the incoming flow label value.

   o  A component of some load balancing systems is an SSL reverse proxy
      farm. The individual SSL proxies handle all cryptographic aspects
      and exchange raw HTTP with the actual servers. Thus, from the
      load balancing point of view, this really looks just like a server
      farm, except that it's specialised for HTTPS. Each proxy will
      retain SSL and TCP and maybe HTTP state for the duration of the
      session, and the TCP state could potentially include the flow
      label.

   o  Finally the "front end" of many load balancing systems is a layer
      3/4 load balancer. While it can be a dedicated device, it is also
      a standard function of some network switches or routers (e.g.,
      using ECMP [RFC2991]).
      In this case, it is the layer 3/4 load balancer whose IP address
      is published as the primary AAAA record for the service. All
      client sessions will pass through this device. According to the
      specific scenario, it will spread new sessions across the actual
      application servers, across an SSL proxy farm, or across a set of
      layer 7 proxies. In all cases, the layer 3/4 load balancer has to
      recognize incoming packets as belonging to new or existing client
      sessions, and choose the target server or proxy so as to ensure
      persistence. 'Persistence' is defined as guaranteeing that a
      given session will run to completion on a single server. The
      layer 3/4 load balancer therefore needs to inspect each incoming
      packet to identify the session. There are two common types of
      layer 3/4 load balancers. Totally stateless ones act only on
      individual packets, generally hashing easy-to-find information
      such as the source address and/or port into a server number for
      each packet. Stateful ones take the routing decision on the very
      first packets of a session and maintain the same direction for
      all packets belonging to the same session. Clearly, both types of
      layer 3/4 balancers could inspect and make use of the flow label
      value.

      Our focus is on how the balancer identifies a particular flow.
      For clarity, note that two aspects of layer 3/4 load balancers are
      not affected by use of the flow label to identify sessions:

      1.  Balancers use various techniques to redirect traffic to a
          specific target server.

          -  All servers are configured with the same IP address, they
             are all on the same LAN, and the load balancer sends
             directly to their individual MAC addresses.
             In this case, return packets from the server to the client
             are sent back without passing through the balancer, a
             technique known as direct server return, but we are not
             concerned here with the return packets.
          -  Each server has its own IP address, and the balancer uses
             an IP-in-IP tunnel to reach it.
          -  Each server has its own IP address, and the balancer
             performs NAPT (network address and port translation) to
             deliver the client's packets to that address.

          The choice between these methods is not affected by use of the
          flow label.

      2.  A layer 3/4 balancer must correctly handle Path MTU Discovery
          by forwarding relevant ICMPv6 packets in both directions.
          This too is not affected by use of the flow label.

   The following diagram, inspired by [Tarreau], shows a maximum layout
   with various methods in use together.

      ___________________________________________
     (                                           )
     (          Clients in the Internet          )
     (___________________________________________)
             |                            |
        ------------                 ------------
        | Ingress  |                 | Ingress  |
        | router   |                 | router   |
        ------------                 ------------
          ___|_______DNS-based____________|___
             |     load splitting         |
             |    (if used) occurs        |
             |          here              |
        ------------               ------------
        | L3/4 ASIC|               | L3/4 ASIC|
        | balancer |               | balancer |
        ------------               ------------
             |        load            |
             |      spreading         |
   __________|________________________|___________
       |             |             |         |
   ------------   ------------   --------   --------
   |HTTP proxy|...|HTTP proxy|   | SSL  |...| SSL  |
   | balancer |   | balancer |   | proxy|   | proxy|
   ------------   ------------   --------   --------
   ____|_____________|_____________|_________|_____
      |         |         |         |         |
   --------  --------  --------  --------  --------
   |HTTP  |  |HTTP  |  |HTTP  |  |HTTP  |  |HTTP  |
   |server|  |server|  |server|  |server|  |server|
   --------  --------  --------  --------  --------

   From the previous paragraphs, we can identify several points in this
   diagram where
   the flow label might be relevant:

   1.  Layer 3/4 load balancers.

   2.  SSL proxies.

   3.  HTTP proxies.

   However, usage by the proxies seems unlikely to be cost-effective, so
   in this document we focus only on layer 3/4 balancers.

4.  Applying the Flow Label to L3/L4 Load Balancing

   The suggested model for using the flow label to enhance an L3/L4 load
   balancing mechanism is as follows:

   o  We are only concerned with IPv6 traffic in which the flow label
      value has been set at or near the source according to [RFC6437].
      If the flow label of an incoming packet is zero, load balancers
      will continue to use the transport header in the traditional way.
      As the use of the flow label becomes more prevalent according to
      RFC 6434, load balancers, and therefore users, will reap a growing
      performance benefit.

   o  If the flow label of an incoming packet is non-zero, layer 3/4
      load balancers can use the 2-tuple {source address, flow label} as
      the session key for whatever load distribution algorithm they
      support. If any IPv6 extension headers, including fragment
      headers, are present, this will be significantly quicker than
      searching for the transport port numbers later in the packet.
      Moreover, the transport layer information such as the source port
      is not repeated in fragments, which generally prevents stateless
      load balancers from supporting fragmented traffic since they
      generally cannot reassemble fragments.

      A stateless layer 3/4 load balancer would simply apply a hash
      algorithm to the 2-tuple {source address, flow label} on all
      packets, in order to select the same target server consistently
      for a given flow. Needless to say, the hash algorithm has to be
      well chosen for its purpose, but this problem is common to several
      forms of stateless load balancing. The discussion in [RFC6438]
      applies.
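
      The stateless case just described can be sketched as follows.
      This is a minimal illustration under stated assumptions, not a
      recommended implementation: the function name select_server(),
      the use of SHA-256, and the modulo mapping onto a server list are
      all choices made for this example and are not prescribed by this
      document or by RFC 6437. A packet with a zero flow label is
      rejected here, standing in for the fallback to the transport
      header, which is not shown.

```python
import hashlib
import ipaddress


def select_server(src_addr: str, flow_label: int, servers: list[str]) -> str:
    """Map the 2-tuple {source address, flow label} to one server.

    Hypothetical helper: the hash function and the modulo step are
    assumptions chosen only for illustration.
    """
    if not 0 < flow_label <= 0xFFFFF:
        # Zero means "not set"; a real balancer would fall back to the
        # transport header in that case (not shown here).
        raise ValueError("flow label must be a non-zero 20-bit value")
    # Use the packed 16-byte address so that different textual forms of
    # the same address produce the same key.
    key = ipaddress.IPv6Address(src_addr).packed + flow_label.to_bytes(3, "big")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    farm = ["server-a", "server-b", "server-c"]
    # Every packet (or fragment) of the same flow yields the same choice.
    print(select_server("2001:db8::1", 0x12345, farm))
```

      Because the key uses only fields of the fixed IPv6 header, the
      same choice is made for every fragment of a flow. Note that a
      plain modulo mapping reshuffles most flows whenever the server
      list changes; that trade-off is outside the scope of this sketch.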

      A stateful layer 3/4 load balancer would apply its usual load
      distribution algorithm to the first packet of a session, and store
      the {2-tuple, server} association in a table so that subsequent
      packets belonging to the same session are forwarded to the same
      server. Thus, for all subsequent packets of the session, it can
      ignore all IPv6 extension headers, which should lead to a
      performance benefit. Whether this benefit is valuable will depend
      on engineering details of the specific load balancer.

      Layer 3/4 balancers that redirect the incoming packets by NAPT are
      not expected to obtain any saving of time by using the flow label,
      because they have no choice but to follow the extension header
      chain, in order to locate and modify the port number and transport
      checksum. The same would apply to balancers that perform TCP
      state tracking for any reason.

   o  Note that correct handling of ICMPv6 for Path MTU Discovery
      requires the layer 3/4 balancer to keep state for the client
      source address, independently of either the port numbers or the
      flow label.

   o  SSL and HTTP proxies, if present, should forward the flow label
      value towards the server. This has no performance benefit, but is
      consistent with the general RFC 6437 model for the flow label.

   It should be noted that the performance benefit, if any, depends
   entirely on engineering trade-offs in the design of the L3/L4
   balancer. An extra test is needed (is the label non-zero?), but all
   logic for handling extension headers can be omitted except for the
   first packet of a new flow. Since the only state to be stored is the
   2-tuple and the server identifier, storage requirements will be
   reduced. Additionally, the method will work for fragmented traffic
   and for flows where the transport information is missing (unknown
   transport protocol) or obfuscated (e.g., IPsec).
   Traffic reaching the load balancer via a VPN is particularly prone
   to fragmentation, because of the reduced MTU. For some load balancer
   designs, these are very significant advantages.

   In the unlikely event of two simultaneous flows from the same source
   address having the same flow label value, the two flows would end up
   assigned to the same server, where they would be distinguished as
   normal by their port numbers. There are approximately one million
   (2^20 = 1,048,576) possible flow label values, and if the rules for
   flow label generation [RFC6437] are followed, this would be a
   statistically rare event, and would not damage the overall load
   balancing effect.

   Moreover, with a million possible label values, it is very likely
   that there will be many more flow label values than servers at most
   sites, so it is already expected that multiple flow label values will
   end up on the same server for a given client IP address.

   In the case that many thousands of clients are hidden behind the same
   large-scale NAPT (network address and port translator) with a single
   shared IP address, the assumption of low probability of conflicts
   might become incorrect, unless flow label values are random enough to
   avoid following similar sequences for all clients. This is not
   expected to be a factor for IPv6 anyway, since there is no need to
   implement large-scale NAPT with address sharing [RFC4864]. The
   statistical assumption is valid for sites that implement network
   prefix translation [RFC6296], since this technique provides a
   different address for each client.

5.  Security Considerations

   Security aspects of the flow label are discussed in [RFC6437]. As
   noted there, a malicious source or man-in-the-middle could disturb
   load balancing by manipulating flow labels.
   This risk already exists today where the source address and port are
   used as the hashing key in layer 3/4 load balancers, as well as where
   a persistence cookie is used in HTTP to designate a server. It even
   exists on layer 3 components which only rely on the source address
   to select a destination, making them more DDoS-prone. Nevertheless,
   all these methods are currently used because the benefits for load
   balancing and persistence hugely outweigh the risks. The flow label
   does not significantly alter this situation.

   Specifically, the specification [RFC6437] states that "stateless
   classifiers should not use the flow label alone to control load
   distribution, and stateful classifiers should include explicit
   methods to detect and ignore suspect flow label values." The former
   point is answered by also using the source address. The latter point
   is more complex. If the risk is considered serious, the site ingress
   router or the layer 3/4 balancer should use a suitable heuristic to
   verify incoming flows with non-zero flow label values. If a flow
   from a given source address and port number does not have a constant
   flow label value, it is suspect and should be dropped. This would
   deal with both intentional and accidental changes to the flow label.

   RFC 6437 notes in its Security Considerations that if the covert
   channel risk is considered significant, a firewall might rewrite non-
   zero flow labels. As long as this is done as described in RFC 6437,
   it will not invalidate the mechanisms described above.

   The flow label may be of use in protecting against distributed denial
   of service (DDoS) attacks against servers. As noted in RFC 6437, a
   source should generate flow label values that are hard to predict,
   most likely by including a secret nonce in the hash used to generate
   each label.
   The attacker does not know the nonce and therefore has no way to
   invent flow labels which will all target the same server, even with
   knowledge of both the hash algorithm and the load balancing
   algorithm. Still, it is important to understand that it is always
   trivial to force a load balancer to stick to the same server during
   an attack, so the security of the whole solution must not rely on the
   unpredictability of the flow label values alone, but should include
   defensive measures like most load balancers already have against
   abnormal use of source address or session cookies.

   New flows are assigned to a server according to any of the usual
   algorithms available on the load balancer (e.g., least connections,
   round robin, etc.). The association between the flow label value and
   the server is stored in a table (often called stick table) so that
   future connections using the same flow label can be sent to the same
   server. This method is more robust against a loss of server and also
   makes it harder for an attacker to target a specific server, because
   the association between a flow label value and a server is not known
   externally.

   In the case that a stateless hash function is used to assign client
   packets to specific servers, it may be advisable to use a
   cryptographic hash function of some kind, to ensure that an attacker
   cannot predict the behaviour of the load balancer.

6.  IANA Considerations

   This document requests no action by IANA.

7.  Acknowledgements

   Valuable comments and contributions were made by Fred Baker, Olivier
   Bonaventure, Lorenzo Colitti, Linda Dunbar, Donald Eastlake, Joel
   Jaeggli, Gurudeep Kamat, Warren Kumari, Julia Renouard, Julius Volz,
   and others.

   This document was produced using the xml2rfc tool [RFC2629].

8.  Change log [RFC Editor: Please remove]

   draft-ietf-intarea-flow-label-balancing-01: clarifications based on
   WG comments, 2013-05-25.

   draft-ietf-intarea-flow-label-balancing-00: WG adoption, minor WG
   comments, 2013-01-15.

   draft-carpenter-flow-label-balancing-02: updates based on external
   review, 2012-12-05.

   draft-carpenter-flow-label-balancing-01: update following comments,
   2012-06-12.

   draft-carpenter-flow-label-balancing-00: restructured after IETF83,
   2012-05-08.

   draft-carpenter-v6ops-label-balance-02: clarified after WG
   discussions, 2012-03-06.

   draft-carpenter-v6ops-label-balance-01: updated with community
   comments, additional author, 2012-01-17.

   draft-carpenter-v6ops-label-balance-00: original version, 2011-10-13.

9.  References

9.1.  Normative References

   [RFC2460]  Deering, S.E. and R.M. Hinden, "Internet Protocol, Version
              6 (IPv6) Specification", RFC 2460, December 1998.

   [RFC6434]  Jankiewicz, E., Loughney, J., and T. Narten, "IPv6 Node
              Requirements", RFC 6434, December 2011.

   [RFC6437]  Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme,
              "IPv6 Flow Label Specification", RFC 6437, November 2011.

9.2.  Informative References

   [RFC2629]  Rose, M.T., "Writing I-Ds and RFCs using XML", RFC 2629,
              June 1999.

   [RFC2991]  Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
              Multicast Next-Hop Selection", RFC 2991, November 2000.

   [RFC4864]  Van de Velde, G., Hain, T., Droms, R., Carpenter, B., and
              E. Klein, "Local Network Protection for IPv6", RFC 4864,
              May 2007.

   [RFC6294]  Hu, Q. and B. Carpenter, "Survey of Proposed Use Cases for
              the IPv6 Flow Label", RFC 6294, June 2011.

   [RFC6296]  Wasserman, M. and F. Baker, "IPv6-to-IPv6 Network Prefix
              Translation", RFC 6296, June 2011.

   [RFC6436]  Amante, S., Carpenter, B., and S. Jiang, "Rationale for
              Update to the IPv6 Flow Label Specification", RFC 6436,
              November 2011.

   [RFC6438]  Carpenter, B. and S. Amante, "Using the IPv6 Flow Label
              for Equal Cost Multipath Routing and Link Aggregation in
              Tunnels", RFC 6438, November 2011.

   [Tarreau]  Tarreau, W., "Making applications scalable with load
              balancing", 2006.

Authors' Addresses

   Brian Carpenter
   Department of Computer Science
   University of Auckland
   PB 92019
   Auckland 1142
   New Zealand

   Email: brian.e.carpenter@gmail.com

   Sheng Jiang
   Huawei Technologies Co., Ltd
   Q14, Huawei Campus
   No.156 Beiqing Road
   Hai-Dian District, Beijing 100095
   P.R. China

   Email: jiangsheng@huawei.com

   Willy Tarreau
   Exceliance
   R&D Produits reseau
   3 rue du petit Robinson
   78350 Jouy-en-Josas
   France

   Email: w@1wt.eu