Network Working Group                                       B. Carpenter
Internet-Draft                                         Univ. of Auckland
Intended status: Informational                                  S. Jiang
Expires: December 14, 2012                  Huawei Technologies Co., Ltd
                                                              W. Tarreau
                                                              Exceliance
                                                           June 12, 2012

          Using the IPv6 Flow Label for Server Load Balancing
                draft-carpenter-flow-label-balancing-01

Abstract

   This document describes how the IPv6 flow label as currently
   specified can be used to enhance layer 3/4 load distribution and
   balancing for large server farms.
Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 14, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Summary of Flow Label Specification
   3. Summary of Load Balancing Techniques
   4. Applying the Flow Label to L3/L4 Load Balancing
   5. Security Considerations
   6. IANA Considerations
   7. Acknowledgements
   8. Change log [RFC Editor: Please remove]
   9. References
      9.1. Normative References
      9.2. Informative References
   Authors' Addresses

1. Introduction

   The IPv6 flow label has been redefined [RFC6437] and is now a
   recommended IPv6 node requirement [RFC6434]. Its use for load
   sharing in multipath routing has been specified [RFC6438]. Another
   scenario in which the flow label could be used is in load
   distribution for large server farms. Load distribution is a slightly
   more general term than load balancing, but the latter is more
   commonly used. This document starts with brief introductions to the
   flow label and to load balancing techniques, and then describes how
   the flow label can be used to enhance layer 3/4 load balancers in
   particular.

   The motivation for this approach is to improve the performance of
   most types of layer 3/4 load balancers, especially for traffic
   including multiple IPv6 extension headers and in particular for
   fragmented packets. Fragmented packets, often the result of
   customers reaching the load balancer via a VPN with a limited MTU,
   are a common performance problem.

2. Summary of Flow Label Specification

   The IPv6 flow label is a 20-bit field included in every IPv6 header
   [RFC2460]. [RFC6434] recommends that all IPv6 nodes support it, and
   [RFC6437] defines it. According to this definition, the flow label
   should be set to a constant value for a given traffic flow (such as
   an HTTP connection).

   Any device that has access to the IPv6 header has access to the flow
   label, and it is at a fixed position in every IPv6 packet. In
   contrast, transport layer information, such as the port numbers, is
   not always in a fixed position, since it follows any IPv6 extension
   headers that may be present.
   In fact, the logic of finding the transport header is always more
   complex for IPv6 than for IPv4, due to the absence of an Internet
   Header Length field in IPv6. Therefore, within the lifetime of a
   given transport layer connection, the flow label can be a more
   convenient "handle" than the port number for identifying that
   particular connection.

   According to RFC 6437, source hosts should set the flow label, but if
   they do not (i.e., its value is zero), forwarding nodes (such as the
   first-hop router) may set it instead. In both cases, the flow label
   value must be constant for a given transport session, normally
   identified by the IPv6 and Transport header 5-tuple. By default, the
   flow label value should be calculated by a stateless algorithm. The
   resulting value should form part of a statistically uniform
   distribution.

   A careful reading of RFC 6437 shows that for a given source accessing
   a well-known TCP port at a given destination, the flow label is in
   effect a substitute for the source port number, found at a fixed
   position in the layer 3 header.

   The flow label is defined as an end-to-end component of the IPv6
   header, but there are three qualifications to this:

   1. Until the RFC 6437 standard is widely implemented as recommended
      by RFC 6434, the flow label will often be set to the default
      value of zero.
   2. Because of the recommendation to use a stateless algorithm to
      calculate the label, there is a low (but non-zero) probability
      that two simultaneous flows from the same source to the same
      destination have the same flow label value despite having
      different transport protocol port numbers.
   3. The flow label field is in an unprotected part of the IPv6
      header, which means that intentional or unintentional changes to
      its value cannot be trivially detected by a receiver.

   The first two points are addressed below in Section 4 and the third
   in Section 5.
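   As an illustration of the fixed-position property described in this
   section, the following Python sketch reads the flow label and the
   source address directly from the 40-octet base header, without
   walking any extension headers. It is a minimal sketch and not part
   of any specification: the names parse_fixed_header and build_header
   are hypothetical, and the layout assumed is the one given in
   [RFC2460] (4-bit version, 8-bit traffic class, 20-bit flow label,
   source address in octets 8 to 23).

```python
import ipaddress
import struct

IPV6_HEADER_LEN = 40  # fixed size of the base IPv6 header, in octets


def parse_fixed_header(packet: bytes):
    """Return (source address, flow label) read from fixed offsets."""
    if len(packet) < IPV6_HEADER_LEN:
        raise ValueError("truncated IPv6 header")
    # First 32 bits: version (4), traffic class (8), flow label (20).
    (first_word,) = struct.unpack_from("!I", packet, 0)
    if first_word >> 28 != 6:
        raise ValueError("not an IPv6 packet")
    flow_label = first_word & 0xFFFFF
    # The source address occupies octets 8..23 of the base header.
    source = ipaddress.IPv6Address(packet[8:24])
    return source, flow_label


def build_header(source: str, destination: str, flow_label: int) -> bytes:
    """Build a bare base header for demonstration (hypothetical helper)."""
    first_word = (6 << 28) | (flow_label & 0xFFFFF)
    # Payload length 0, next header 59 (No Next Header), hop limit 64.
    fixed = struct.pack("!IHBB", first_word, 0, 59, 64)
    return (
        fixed
        + ipaddress.IPv6Address(source).packed
        + ipaddress.IPv6Address(destination).packed
    )


if __name__ == "__main__":
    header = build_header("2001:db8::1", "2001:db8::2", 0xABCDE)
    print(parse_fixed_header(header))
```

   Locating the port numbers would, by contrast, require a loop over
   the Next Header chain, which is the cost the flow label avoids.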
3. Summary of Load Balancing Techniques

   Load balancing for server farms is achieved by a variety of methods,
   often used in combination [Tarreau]. The flow label is not relevant
   to all of them, and the actual load balancing algorithm (the choice
   of which server to use for a new client session) is irrelevant to
   this discussion.

   o  The simplest method is to use the DNS to return different server
      addresses for a single name such as www.example.com to different
      users. Typically this is done by rotating the order in which
      different addresses are listed by the relevant authoritative DNS
      server, assuming that the client will pick the first one.
      Routing may be configured such that the different addresses are
      handled by different ingress routers. The flow label can have no
      impact on this method and it is not discussed further.
   o  Another method, for HTTP servers, is to operate a layer 7 reverse
      proxy in front of the server farm. The reverse proxy will present
      a single IP address to the world, communicated to clients by a
      single AAAA record. For each new client session (an incoming TCP
      connection and HTTP request), it will pick a particular server and
      proxy the session to it. Ideally, the act of proxying is cheap
      compared to the act of serving the required content. The proxy
      must retain TCP state and proxy state for the duration of the
      session. This TCP state could, potentially, include the incoming
      flow label value.
   o  A component of some load balancing systems is an SSL reverse proxy
      farm. The individual SSL proxies handle all cryptographic aspects
      and exchange raw HTTP with the actual servers. Thus, from the
      load balancing point of view, this really looks just like a server
      farm, except that it's specialised for HTTPS.
      Each proxy will retain SSL, TCP, and possibly HTTP state for the
      duration of the session, and the TCP state could potentially
      include the flow label.
   o  Finally, the "front end" of many load balancing systems is a layer
      3/4 load balancer. While it can sometimes be dedicated hardware,
      it is also a standard function of some network switches or
      routers (e.g., using ECMP [RFC2991]). In this case, it is the
      layer 3/4 load balancer whose IP address is published as the
      primary AAAA record for the service. All client sessions will
      pass through this device. According to the precise scenario, it
      will spread new sessions across the actual application servers,
      across an SSL proxy farm, or across a set of layer 7 proxies. In
      all cases, the layer 3/4 load balancer has to recognize incoming
      packets as belonging to new or existing client sessions, and
      choose the target server or proxy so as to ensure persistence.
      'Persistence' is defined as guaranteeing that a given session
      will run to completion on a single server. The layer 3/4 load
      balancer therefore needs to inspect each incoming packet to
      identify the session. There are two common types of layer 3/4
      load balancers. Totally stateless ones act only on individual
      packets, generally hashing easy-to-find information such as the
      source address and/or port into a server number. Stateful ones
      take the routing decision on the very first packets of a session
      and maintain the same direction for all packets belonging to the
      same session. Clearly, both types of layer 3/4 balancers could
      inspect and make use of the flow label value.

      Our focus is on how the balancer identifies a particular flow.
      For clarity, note that two aspects of layer 3/4 load balancers
      are not affected by use of the flow label to identify sessions:

      1.
          Balancers use various techniques to redirect traffic to a
          specific target server.

          -  All servers are configured with the same IP address, they
             are all on the same LAN, and the load balancer sends
             directly to their individual MAC addresses.
          -  Each server has its own IP address, and the balancer uses
             an IP-in-IP tunnel to reach it.
          -  Each server has its own IP address, and the balancer
             performs NAPT (network address and port translation) to
             deliver the client's packets to that address.

          The choice between these methods is not affected by use of the
          flow label.

      2.  A layer 3/4 balancer must correctly handle Path MTU Discovery
          by forwarding relevant ICMPv6 packets in both directions.
          This too is not affected by use of the flow label.

   The following diagram, inspired by [Tarreau], shows a maximal layout.

     ___________________________________________
    (                                           )
    (          Clients in the Internet          )
    (___________________________________________)
              |                       |
         ------------            ------------
         | Ingress  |            | Ingress  |
         | router   |            | router   |
         ------------            ------------
           ___|_______DNS-based_______|___
              |    load splitting     |
              |                       |
              |                       |
         ------------            ------------
         | L3/4 ASIC|            | L3/4 ASIC|
         | balancer |            | balancer |
         ------------            ------------
              |         load          |
              |       spreading       |
        ______|_______________________|_________
        |              |            |          |
   ------------   ------------   --------   --------
   |HTTP proxy|...|HTTP proxy|   | SSL  |...| SSL  |
   | balancer |   | balancer |   | proxy|   | proxy|
   ------------   ------------   --------   --------
    ____|______________|____________|__________|__
      |         |         |         |         |
   --------  --------  --------  --------  --------
   |HTTP  |  |HTTP  |  |HTTP  |  |HTTP  |  |HTTP  |
   |server|  |server|  |server|  |server|  |server|
   --------  --------  --------  --------  --------

   From the previous paragraphs, we can identify several points in this
   diagram where the flow label might be relevant:
   1. Layer 3/4 load balancers.
   2. SSL proxies.
   3. HTTP proxies.

   However, usage by the proxies seems unlikely to be cost-effective, so
   in this document we focus only on layer 3/4 balancers.

4. Applying the Flow Label to L3/L4 Load Balancing

   The suggested model for using the flow label in a load balancing
   mechanism is as follows:

   o  We are only concerned with IPv6 traffic in which the flow label
      value has been set at or near the source according to [RFC6437].
      If the flow label of an incoming packet is zero, load balancers
      will continue to use the transport header in the traditional way.
      As the use of the flow label becomes more prevalent according to
      RFC 6434, load balancers, and therefore users, will reap a growing
      performance benefit.
   o  If the flow label of an incoming packet is non-zero, layer 3/4
      load balancers can use the 2-tuple {source address, flow label} as
      the session key for whatever load distribution algorithm they
      support. If any IPv6 extension headers, including fragment
      headers, are present, this will be significantly quicker than
      searching for the transport port numbers later in the packet.
      Moreover, transport layer information such as the source port is
      not repeated in fragments. Since stateless load balancers
      generally cannot reassemble fragments, this usually prevents them
      from supporting fragmented traffic.

      Note that balancers usually do not need to consider the
      destination address as it is always the same, i.e., the server
      address.

      A stateless layer 3/4 load balancer would simply apply a hash
      algorithm to the 2-tuple {source address, flow label} on all
      packets, in order to select the same target server consistently
      for a given flow.
      A stateful layer 3/4 load balancer would apply its usual load
      distribution algorithm to the first packet of a session, and store
      the {2-tuple, server} association in a table so that subsequent
      packets belonging to the same session are forwarded to the same
      server. Thus, for all subsequent packets of the session, it can
      ignore all IPv6 extension headers, which should lead to a
      performance benefit. Whether this benefit is valuable will depend
      on engineering details of the specific load balancer.

      Layer 3/4 balancers that redirect the incoming packets by NAPT are
      not expected to obtain any saving of time by using the flow label,
      because they must in any case follow the extension header chain in
      order to locate and modify the port number and transport checksum.
      The same would apply to balancers that perform TCP state tracking
      for any reason.
   o  Note that correct handling of ICMPv6 for Path MTU Discovery
      requires the layer 3/4 balancer to keep state for the client
      source address, independently of either the port numbers or the
      flow label.
   o  SSL and HTTP proxies, if present, should forward the flow label
      value towards the server. This has no performance benefit, but is
      consistent with the general RFC 6437 model for the flow label.

   It should be noted that the performance benefit, if any, depends
   entirely on engineering trade-offs in the design of the L3/L4
   balancer. An extra test is needed (is the label non-zero?), but all
   logic for handling extension headers can be omitted except for the
   first packet of a new flow. Since the only state to be stored is the
   2-tuple and the server identifier, storage requirements will be
   reduced. Additionally, the method will work for fragmented traffic
   and for flows where the transport information is missing (unknown
   transport protocol) or obfuscated (e.g., IPsec).
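   The stateless and stateful models described in this section can be
   sketched in Python as follows. This is an illustration under stated
   assumptions, not a description of any real balancer: the names
   session_key, stateless_pick, and StatefulBalancer are hypothetical,
   SHA-256 stands in for whatever hash a given design uses, and round
   robin stands in for the balancer's "usual load distribution
   algorithm". A return value of None signals that the label is zero
   and the balancer should fall back to the transport header.

```python
import hashlib
import ipaddress


def session_key(source: str, flow_label: int) -> bytes:
    """Encode the 2-tuple {source address, flow label} as bytes."""
    return ipaddress.IPv6Address(source).packed + flow_label.to_bytes(3, "big")


def stateless_pick(source: str, flow_label: int, servers: list):
    """Hash the 2-tuple on every packet to choose a server."""
    if flow_label == 0:
        return None  # label not set: use the transport header instead
    digest = hashlib.sha256(session_key(source, flow_label)).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


class StatefulBalancer:
    """Decide once per session and remember {2-tuple: server}."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.table = {}       # session key -> chosen server
        self.next_index = 0   # round-robin cursor (assumed algorithm)

    def pick(self, source: str, flow_label: int):
        if flow_label == 0:
            return None  # label not set: use the transport header instead
        key = session_key(source, flow_label)
        if key not in self.table:
            # First packet of a session: run the distribution algorithm.
            self.table[key] = self.servers[self.next_index % len(self.servers)]
            self.next_index += 1
        # Later packets: table lookup only, no extension header parsing.
        return self.table[key]


if __name__ == "__main__":
    farm = ["server-a", "server-b", "server-c"]
    print(stateless_pick("2001:db8::1", 0x12345, farm))
    balancer = StatefulBalancer(farm)
    print(balancer.pick("2001:db8::1", 0x12345))
    print(balancer.pick("2001:db8::1", 0x12345))  # same server again
```

   In both variants the lookup touches only fields at fixed offsets in
   the base header, so fragments of the same flow map to the same
   server.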
   Traffic reaching the load balancer via a VPN is particularly prone
   to fragmentation because of its reduced MTU. For some load balancer
   designs, these are very significant advantages.

   In the unlikely event of two simultaneous flows from the same source
   address having the same flow label value, the two flows would end up
   assigned to the same server, where they would be distinguished as
   normal by their port numbers. Since this would be a statistically
   rare event, it would not damage the overall load balancing effect.
   Moreover, it is very likely that there will be many more flow label
   values than servers at most sites (about 1 million possible label
   values), so it is already expected that multiple flow label values
   will end up on the same server for a given IP address. In the case
   where many thousands of clients are hidden behind the same
   large-scale NAT with a single IP address, the assumption of low
   probability of conflicts might become incorrect unless flow label
   values are random enough to avoid following similar sequences for
   all clients. This is not expected to be a factor for IPv6 anyway,
   since there is no valid reason to implement NAT [RFC4864]. The
   statistical assumption is valid for sites that implement network
   prefix translation [RFC6296], since this technique provides a
   different address for each client.

5. Security Considerations

   Security aspects of the flow label are discussed in [RFC6437]. As
   noted there, a malicious source or man-in-the-middle could disturb
   load balancing by manipulating flow labels. This risk already exists
   today where the source address and port are used as the hashing key
   in layer 3/4 load balancers, as well as where a persistence cookie is
   used in HTTP to designate a server. It even exists on layer 3
   components which only rely on the source address to select a
   destination, making them more DDoS-prone.
   Nevertheless, all these methods are currently used because the
   benefits for load balancing and persistence hugely outweigh the
   risks.

   Specifically, [RFC6437] states that "stateless classifiers should not
   use the flow label alone to control load distribution, and stateful
   classifiers should include explicit methods to detect and ignore
   suspect flow label values." The former point is answered by also
   using the source address. The latter point is more complex. If the
   risk is considered serious, the site ingress router or the layer 3/4
   balancer should verify incoming flows with non-zero flow label
   values. If a flow from a given source address and port number does
   not have a constant flow label value, it is suspect and should be
   dropped. This would deal with both intentional and accidental
   changes to the flow label.

   RFC 6437 notes in its Security Considerations that if the covert
   channel risk is considered significant, a firewall might rewrite
   non-zero flow labels. As long as this is done as described in RFC
   6437, it will not invalidate the mechanisms described above.

   The flow label may be of use in protecting against distributed denial
   of service (DDoS) attacks against servers. As noted in RFC 6437, a
   source should generate flow label values that are hard to predict,
   most likely by including a secret nonce in the hash used to generate
   each label. The attacker does not know the nonce and therefore has
   no way to invent flow labels which will all target the same server,
   even with knowledge of both the hash algorithm and the load balancing
   algorithm.
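   The nonce-based label generation mentioned above might look like the
   following Python sketch. It is one possible construction, offered
   only as an illustration: the name make_flow_label, the use of
   HMAC-SHA-256, the 16-octet nonce, and the mapping of a zero result
   to 1 (so that the label is never read as "not set") are all
   assumptions made here, not requirements taken from RFC 6437.

```python
import hashlib
import hmac
import ipaddress
import os

# Hypothetical per-host secret, generated once at start-up.
SECRET_NONCE = os.urandom(16)


def make_flow_label(src: str, dst: str, protocol: int,
                    src_port: int, dst_port: int,
                    nonce: bytes = SECRET_NONCE) -> int:
    """Derive a 20-bit flow label from the 5-tuple and a secret nonce."""
    material = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + bytes([protocol])
        + src_port.to_bytes(2, "big")
        + dst_port.to_bytes(2, "big")
    )
    # Keyed hash: without the nonce, the output cannot be predicted.
    digest = hmac.new(nonce, material, hashlib.sha256).digest()
    label = int.from_bytes(digest[:4], "big") & 0xFFFFF  # keep 20 bits
    # Zero means "no label set", so it is avoided (assumption of this sketch).
    return label or 1


if __name__ == "__main__":
    # TCP (protocol 6) from an ephemeral port to port 80.
    print(hex(make_flow_label("2001:db8::1", "2001:db8::2", 6, 49152, 80)))
```

   The label is stateless in the sense that it is recomputed from the
   5-tuple and the nonce, so it stays constant for the life of the
   session without any per-flow storage at the source.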
   Still, it is important to understand that it is always trivial to
   force a load balancer to stick to the same server during an attack,
   so the security of the whole solution must not rely on the
   unpredictability of the flow label values alone, but should include
   defensive measures like those most load balancers already have
   against abnormal use of source addresses or session cookies.

   New flows are assigned to a server according to any of the usual
   algorithms available on the load balancer (e.g., least connections,
   round robin, etc.). The association between the flow label value and
   the server is stored in a table (often called a stick table) so that
   future connections using the same flow label can be sent to the same
   server. This method is more robust against a loss of server and also
   makes it harder for an attacker to target a specific server, because
   the association between a flow label value and a server is not known
   externally.

6. IANA Considerations

   This document requests no action by IANA.

7. Acknowledgements

   Valuable comments and contributions were made by Fred Baker, Lorenzo
   Colitti, Joel Jaeggli, Gurudeep Kamat, Julius Volz, and others.

   This document was produced using the xml2rfc tool [RFC2629].

8. Change log [RFC Editor: Please remove]

   draft-carpenter-flow-label-balancing-01: update following comments,
   2012-06-12.

   draft-carpenter-flow-label-balancing-00: restructured after IETF83,
   2012-05-08.

   draft-carpenter-v6ops-label-balance-02: clarified after WG
   discussions, 2012-03-06.

   draft-carpenter-v6ops-label-balance-01: updated with community
   comments, additional author, 2012-01-17.

   draft-carpenter-v6ops-label-balance-00: original version, 2011-10-13.

9. References

9.1. Normative References

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.
   [RFC6434]  Jankiewicz, E., Loughney, J., and T. Narten, "IPv6 Node
              Requirements", RFC 6434, December 2011.

   [RFC6437]  Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme,
              "IPv6 Flow Label Specification", RFC 6437, November 2011.

9.2. Informative References

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              June 1999.

   [RFC2991]  Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
              Multicast Next-Hop Selection", RFC 2991, November 2000.

   [RFC4864]  Van de Velde, G., Hain, T., Droms, R., Carpenter, B., and
              E. Klein, "Local Network Protection for IPv6", RFC 4864,
              May 2007.

   [RFC6296]  Wasserman, M. and F. Baker, "IPv6-to-IPv6 Network Prefix
              Translation", RFC 6296, June 2011.

   [RFC6438]  Carpenter, B. and S. Amante, "Using the IPv6 Flow Label
              for Equal Cost Multipath Routing and Link Aggregation in
              Tunnels", RFC 6438, November 2011.

   [Tarreau]  Tarreau, W., "Making applications scalable with load
              balancing", 2006.

Authors' Addresses

   Brian Carpenter
   Department of Computer Science
   University of Auckland
   PB 92019
   Auckland, 1142
   New Zealand

   Email: brian.e.carpenter@gmail.com

   Sheng Jiang
   Huawei Technologies Co., Ltd
   Q14, Huawei Campus
   No.156 Beiqing Road
   Hai-Dian District, Beijing 100095
   P.R. China

   Email: jiangsheng@huawei.com

   Willy Tarreau
   Exceliance
   R&D Produits reseau
   3 rue du petit Robinson
   78350 Jouy-en-Josas
   France

   Email: w@1wt.eu