2	IntArea                                                     B. Carpenter
3	Internet-Draft                                         Univ. of Auckland
4	Intended status: Informational                                  S. Jiang
5	Expires: July 19, 2013                      Huawei Technologies Co., Ltd
6	                                                              W. Tarreau
7	                                                              Exceliance
8	                                                        January 15, 2013

10	          Using the IPv6 Flow Label for Server Load Balancing
11	               draft-ietf-intarea-flow-label-balancing-00

13	Abstract

15	   This document describes how the IPv6 flow label as currently
16	   specified can be used to enhance layer 3/4 load distribution and
17	   balancing for large server farms.

19	Status of this Memo

21	   This Internet-Draft is submitted in full conformance with the
22	   provisions of BCP 78 and BCP 79.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF).  Note that other groups may also distribute
26	   working documents as Internet-Drafts.  The list of current Internet-
27	   Drafts is at http://datatracker.ietf.org/drafts/current/.

29	   Internet-Drafts are draft documents valid for a maximum of six months
30	   and may be updated, replaced, or obsoleted by other documents at any
31	   time.  It is inappropriate to use Internet-Drafts as reference
32	   material or to cite them other than as "work in progress."

34	   This Internet-Draft will expire on July 19, 2013.

36	Copyright Notice

38	   Copyright (c) 2013 IETF Trust and the persons identified as the
39	   document authors.  All rights reserved.

41	   This document is subject to BCP 78 and the IETF Trust's Legal
42	   Provisions Relating to IETF Documents
43	   (http://trustee.ietf.org/license-info) in effect on the date of
44	   publication of this document.  Please review these documents
45	   carefully, as they describe your rights and restrictions with respect
46	   to this document.  Code Components extracted from this document must
47	   include Simplified BSD License text as described in Section 4.e of
48	   the Trust Legal Provisions and are provided without warranty as
49	   described in the Simplified BSD License.

51	Table of Contents

53	   1.  Introduction
54	   2.  Summary of Flow Label Specification
55	   3.  Summary of Load Balancing Techniques
56	   4.  Applying the Flow Label to L3/L4 Load Balancing
57	   5.  Security Considerations
58	   6.  IANA Considerations
59	   7.  Acknowledgements
60	   8.  Change log [RFC Editor: Please remove]
61	   9.  References
62	     9.1.  Normative References
63	     9.2.  Informative References
64	   Authors' Addresses

66	1.  Introduction

68	   The IPv6 flow label has been redefined [RFC6437] and is now a
69	   recommended IPv6 node requirement [RFC6434].  Its use for load
70	   sharing in multipath routing has been specified [RFC6438].  Another
71	   scenario in which the flow label could be used is in load
72	   distribution for large server farms.  Load distribution is a slightly
73	   more general term than load balancing, but the latter is more
74	   commonly used.  This document starts with brief introductions to the
75	   flow label and to load balancing techniques, and then describes how
76	   the flow label can be used to enhance layer 3/4 load balancers in
77	   particular.

79	   The motivation for this approach is to improve the performance of
80	   most types of layer 3/4 load balancers, especially for traffic
81	   including multiple IPv6 extension headers and in particular for
82	   fragmented packets.  Fragmented packets, often the result of
83	   customers reaching the load balancer via a VPN with a limited MTU,
84	   are a common performance problem.

86	2.  Summary of Flow Label Specification

88	   The IPv6 flow label is a 20-bit field included in every IPv6 header
89	   [RFC2460].  It is recommended to be supported in all IPv6 nodes by
90	   [RFC6434] and it is defined in [RFC6437].  There is additional
91	   background material in [RFC6436] and [RFC6294].  According to its
92	   definition, the flow label should be set to a constant value for a
93	   given traffic flow (such as an HTTP connection), and that value will
94	   belong to a uniform statistical distribution, making it potentially
95	   valuable for load balancing purposes.

97	   Any device that has access to the IPv6 header has access to the flow
98	   label, and it is at a fixed position in every IPv6 packet.  In
99	   contrast, transport layer information, such as the port numbers, is
100	   not always in a fixed position, since it follows any IPv6 extension
101	   headers that may be present.  In fact, the logic of finding the
102	   transport header is always more complex for IPv6 than for IPv4, due
103	   to the absence of an Internet Header Length field in IPv6.
104	   Additionally, if packets are fragmented, the flow label will be
105	   present in all fragments, but the transport header will be only in
106	   the first fragment.  Therefore, within the lifetime of a given
107	   transport layer connection, the flow label can be a more convenient
108	   "handle" than the port number for identifying that connection.
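
   The contrast between the two lookups can be sketched as follows.
   This is an illustration only, not part of any specification: the
   function names and the build_header() helper are hypothetical, and
   the chain walker is deliberately simplified (it ignores AH and ESP,
   and does not check the fragment offset).

```python
import struct
from typing import Tuple

IPV6_HEADER_LEN = 40
# Extension headers that carry a Hdr Ext Len octet:
# Hop-by-Hop Options (0), Routing (43), Destination Options (60).
VARIABLE_LENGTH_EXT = {0, 43, 60}
FRAGMENT_EXT = 44  # the Fragment header is always 8 octets


def flow_label(packet: bytes) -> int:
    """Read the 20-bit flow label from its fixed position."""
    if len(packet) < IPV6_HEADER_LEN:
        raise ValueError("truncated IPv6 header")
    # First 32 bits: version (4 bits), traffic class (8), flow label (20).
    (first_word,) = struct.unpack("!I", packet[:4])
    if first_word >> 28 != 6:
        raise ValueError("not an IPv6 packet")
    return first_word & 0xFFFFF


def transport_offset(packet: bytes) -> Tuple[int, int]:
    """Walk the extension header chain; return (protocol, offset).

    Simplified: ignores AH/ESP and does not check whether a fragment is
    the first one, so the offset is only meaningful for first fragments.
    """
    next_header = packet[6]
    offset = IPV6_HEADER_LEN
    while True:
        if next_header in VARIABLE_LENGTH_EXT:
            # Length is in 8-octet units, not counting the first 8 octets.
            ext_len = (packet[offset + 1] + 1) * 8
        elif next_header == FRAGMENT_EXT:
            ext_len = 8
        else:
            return next_header, offset
        next_header = packet[offset]
        offset += ext_len


def build_header(label: int, next_header: int = 6) -> bytes:
    """Hypothetical helper: a minimal IPv6 header with zeroed addresses."""
    first_word = (6 << 28) | (label & 0xFFFFF)
    return struct.pack("!IHBB", first_word, 0, next_header, 64) + bytes(32)
```

   Reading the flow label is a single masked load, whereas locating the
   transport header needs a loop whose length depends on the packet.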

110	   According to RFC 6437, source hosts should set the flow label, but,
111	   if they do not (i.e. its value is zero), forwarding nodes (such as
112	   the first-hop router) may set it instead.  In both cases, the flow
113	   label value must be constant for a given transport session, normally
114	   identified by the IPv6 and Transport header 5-tuple.  By default, the
115	   flow label value should be calculated by a stateless algorithm.  The
116	   resulting value should form part of a statistically uniform
117	   distribution, regardless of which node sets it.
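
   As a rough illustration of such a stateless algorithm, the sketch
   below hashes the 5-tuple together with a local secret.  The choice
   of keyed BLAKE2b, the function name make_flow_label(), and the
   mapping of a zero result to 1 are assumptions made for this example;
   RFC 6437 does not mandate any particular hash.

```python
import hashlib
import ipaddress
import os

# Assumption: a per-host secret chosen at start-up, never disclosed.
SECRET_NONCE = os.urandom(16)


def make_flow_label(src: str, dst: str, proto: int,
                    sport: int, dport: int,
                    nonce: bytes = SECRET_NONCE) -> int:
    """Derive a 20-bit flow label from the 5-tuple, with no stored state."""
    material = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + bytes([proto])
        + sport.to_bytes(2, "big")
        + dport.to_bytes(2, "big")
    )
    digest = hashlib.blake2b(material, key=nonce, digest_size=4).digest()
    label = int.from_bytes(digest, "big") & 0xFFFFF
    # Zero means "unlabelled", so this sketch avoids emitting it.
    return label or 1
```

   Because the label depends only on the 5-tuple and the secret, every
   packet of a given transport session receives the same value.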

119	   It is recognised that at the time of writing, very few traffic flows
120	   include a non-zero flow label value.  The mechanism described below
121	   is one that can be added to existing load balancing mechanisms, so
122	   that it will become effective as more and more flows contain a non-
123	   zero label.  If the flow label is in fact set to zero, it will not
124	   affect the information entropy of the IPv6 header.  Even if the flow
125	   label is chosen from an imperfectly uniform distribution, it will
126	   nevertheless increase the header entropy.  These facts allow for
127	   progressive introduction of load balancing based on the flow label.

129	   A careful reading of RFC 6437 shows that for a given source accessing
130	   a well-known TCP port at a given destination, the flow label is, in
131	   effect, a substitute for the source port number, found at a fixed
132	   position in the layer 3 header.

134	   The flow label is defined as an end-to-end component of the IPv6
135	   header, but there are three qualifications to this:

137	   1.  Until the RFC 6437 standard is widely implemented as recommended
138	       by RFC 6434, the flow label will often be set to the default
139	       value of zero.
140	   2.  Because of the recommendation to use a stateless algorithm to
141	       calculate the label, there is a low (but non-zero) probability
142	       that two simultaneous flows from the same source to the same
143	       destination have the same flow label value despite having
144	       different transport protocol port numbers.
145	   3.  The flow label field is in an unprotected part of the IPv6
146	       header, which means that intentional or unintentional changes to
147	       its value cannot be easily detected by a receiver.

149	   The first two points are addressed below in Section 4 and the third
150	   in Section 5.

152	3.  Summary of Load Balancing Techniques

154	   Load balancing for server farms is achieved by a variety of methods,
155	   often used in combination [Tarreau].  The flow label is not relevant
156	   to all of them, and the actual load balancing algorithm (the choice
157	   of which server to use for a new client session) is irrelevant to
158	   this discussion.

160	   o  The simplest method is to use the DNS to return different
161	      server addresses for a single name such as www.example.com to
162	      different users.  This is typically done by rotating the order in
163	      which different addresses are listed by the relevant authoritative
164	      DNS server, on the assumption that the client will pick the first
165	      one.  Routing may be configured such that the different addresses
166	      are handled by different ingress routers.  The flow label can have
167	      no impact on this method and it is not discussed further.
168	   o  Another method, for HTTP servers, is to operate a layer 7 reverse
169	      proxy in front of the server farm.  The reverse proxy will present
170	      a single IP address to the world, communicated to clients by a
171	      single AAAA record.  For each new client session (an incoming TCP
172	      connection and HTTP request), it will pick a particular server and
173	      proxy the session to it.  The act of proxying should be more
174	      efficient and less resource-intensive than the act of serving the
175	      required content.  The proxy must retain TCP state and proxy state
176	      for the duration of the session.  This TCP state could,
177	      potentially, include the incoming flow label value.
178	   o  A component of some load balancing systems is an SSL reverse proxy
179	      farm.  The individual SSL proxies handle all cryptographic aspects
180	      and exchange raw HTTP with the actual servers.  Thus, from the
181	      load balancing point of view, this really looks just like a server
182	      farm, except that it's specialised for HTTPS.  Each proxy will
183	      retain SSL and TCP and maybe HTTP state for the duration of the
184	      session, and the TCP state could potentially include the flow
185	      label.
186	   o  Finally, the "front end" of many load balancing systems is a layer
187	      3/4 load balancer.  While it can be a dedicated device, it is also
188	      a standard function of some network switches or routers (e.g.
189	      using ECMP, [RFC2991]).  In this case, it is the layer 3/4 load
190	      balancer whose IP address is published as the primary AAAA record
191	      for the service.  All client sessions will pass through this
192	      device.  According to the specific scenario, it will spread new
193	      sessions across the actual application servers, across an SSL
194	      proxy farm, or across a set of layer 7 proxies.  In all cases, the
195	      layer 3/4 load balancer has to recognize incoming packets as
196	      belonging to new or existing client sessions, and choose the
197	      target server or proxy so as to ensure persistence.  'Persistence'
198	      is defined as guaranteeing that a given session will run to
199	      completion on a single server.  The layer 3/4 load balancer
200	      therefore needs to inspect each incoming packet to identify the
201	      session.  There are two common types of layer 3/4 load balancer.
202	      Stateless balancers act on each packet independently, generally
203	      by hashing easy-to-find information such as the source address
204	      and/or port into a server number.  Stateful balancers take the
205	      routing decision on the very first packets of a session and
206	      maintain the same direction for all packets belonging to the
207	      same session.  Clearly, both types of layer 3/4 balancer could
208	      inspect and make use of the flow label value.

211	      Our focus is on how the balancer identifies a particular flow.
212	      For clarity, note that two aspects of layer 3/4 load balancers
213	      are not affected by use of the flow label to identify
214	      sessions:

216	      1.  Balancers use various techniques to redirect traffic to a
217	          specific target server.

219	          - All servers are configured with the same IP address, they
220	          are all on the same LAN, and the load balancer sends directly
221	          to their individual MAC addresses.
222	          - Each server has its own IP address, and the balancer uses an
223	          IP-in-IP tunnel to reach it.
224	          - Each server has its own IP address, and the balancer
225	          performs NAPT (network address and port translation) to
226	          deliver the client's packets to that address.

228	          The choice between these methods is not affected by use of the
229	          flow label.

231	      2.  A layer 3/4 balancer must correctly handle Path MTU Discovery
232	          by forwarding relevant ICMPv6 packets in both directions.
233	          This too is not affected by use of the flow label.

235	   The following diagram, inspired by [Tarreau], shows a maximal layout.

237	        ___________________________________________
238	       (                                           )
239	       (          Clients in the Internet          )
240	       (___________________________________________)
241	              |                            |
242	         ------------                ------------
243	         | Ingress  |                | Ingress  |
244	         | router   |                | router   |
245	         ------------                ------------
246	           ___|_______DNS-based____________|___
247	                |     load splitting     |
248	                |                        |
249	                |                        |
250	           ------------             ------------
251	           | L3/4 ASIC|             | L3/4 ASIC|
252	           | balancer |             | balancer |
253	           ------------             ------------
254	                |          load          |
255	                |        spreading       |
256	      __________|________________________|___________
257	          |              |            |          |
258	    ------------   ------------   --------   --------
259	    |HTTP proxy|...|HTTP proxy|   | SSL  |...| SSL  |
260	    | balancer |   | balancer |   | proxy|   | proxy|
261	    ------------   ------------   --------   --------
262	      ____|_____________|_____________|_________|_____
263	        |          |          |          |          |
264	    --------   --------   --------   --------   --------
265	    |HTTP  |   |HTTP  |   |HTTP  |   |HTTP  |   |HTTP  |
266	    |server|   |server|   |server|   |server|   |server|
267	    --------   --------   --------   --------   --------

269	   From the previous paragraphs, we can identify several points in this
270	   diagram where the flow label might be relevant:

272	   1.  Layer 3/4 load balancers.
273	   2.  SSL proxies.
274	   3.  HTTP proxies.

276	   However, usage by the proxies seems unlikely to be cost-effective, so
277	   in this document we focus only on layer 3/4 balancers.

279	4.  Applying the Flow Label to L3/L4 Load Balancing

281	   The suggested model for using the flow label in a load balancing
282	   mechanism is as follows:

284	   o  We are only concerned with IPv6 traffic in which the flow label
285	      value has been set at or near the source according to [RFC6437].
286	      If the flow label of an incoming packet is zero, load balancers
287	      will continue to use the transport header in the traditional way.
288	      As the use of the flow label becomes more prevalent according to
289	      RFC 6434, load balancers, and therefore users, will reap a growing
290	      performance benefit.
291	   o  If the flow label of an incoming packet is non-zero, layer 3/4
292	      load balancers can use the 2-tuple {source address, flow label} as
293	      the session key for whatever load distribution algorithm they
294	      support.  If any IPv6 extension headers, including fragment
295	      headers, are present, this will be significantly quicker than
296	      searching for the transport port numbers later in the packet.
297	      Moreover, the transport layer information such as the source port
298	      is not repeated in fragments, which generally prevents stateless
299	      load balancers from supporting fragmented traffic since they
300	      generally cannot reassemble fragments.

302	      A stateless layer 3/4 load balancer would simply apply a hash
303	      algorithm to the 2-tuple {source address, flow label} on all
304	      packets, in order to select the same target server consistently
305	      for a given flow.

307	      A stateful layer 3/4 load balancer would apply its usual load
308	      distribution algorithm to the first packet of a session, and store
309	      the {2-tuple, server} association in a table so that subsequent
310	      packets belonging to the same session are forwarded to the same
311	      server.  Thus, for all subsequent packets of the session, it can
312	      ignore all IPv6 extension headers, which should lead to a
313	      performance benefit.  Whether this benefit is valuable will depend
314	      on engineering details of the specific load balancer.

316	      Layer 3/4 balancers that redirect the incoming packets by NAPT are
317	      not expected to obtain any saving of time by using the flow label,
318	      because they have no choice but to follow the extension header
319	      chain, in order to locate and modify the port number and transport
320	      checksum.  The same would apply to balancers that perform TCP
321	      state tracking for any reason.
322	   o  Note that correct handling of ICMPv6 for Path MTU Discovery
323	      requires the layer 3/4 balancer to keep state for the client
324	      source address, independently of either the port numbers or the
325	      flow label.
326	   o  SSL and HTTP proxies, if present, should forward the flow label
327	      value towards the server.  This has no performance benefit, but is
328	      consistent with the general RFC 6437 model for the flow label.
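
   The stateless case in the model above might look like the following
   sketch.  The function name select_server(), the use of SHA-256, and
   the caller-supplied transport_key fallback are assumptions made for
   illustration; a real balancer would normally use a much cheaper hash
   implemented in hardware.

```python
import hashlib
import ipaddress
from typing import Optional, Sequence


def select_server(src: str, label: int, servers: Sequence[str],
                  transport_key: Optional[bytes] = None) -> str:
    """Map a packet to a server using only per-packet information."""
    addr = ipaddress.IPv6Address(src).packed
    if label != 0:
        # Labelled flow: hash the 2-tuple {source address, flow label}.
        key = b"L" + addr + label.to_bytes(3, "big")
    else:
        # Unlabelled flow: fall back to the traditional transport key,
        # which the caller must have located behind any extension headers.
        if transport_key is None:
            raise ValueError("unlabelled packet needs a transport key")
        key = b"T" + addr + transport_key
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

   Since the result depends only on the source address and the flow
   label, every fragment of a labelled flow reaches the same server
   without reassembly.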

330	   It should be noted that the performance benefit, if any, depends
331	   entirely on engineering trade-offs in the design of the L3/L4
332	   balancer.  An extra test is needed (is the label non-zero?), but all
333	   logic for handling extension headers can be omitted except for the
334	   first packet of a new flow.  Since the only state to be stored is the
335	   2-tuple and the server identifier, storage requirements will be
336	   reduced.  Additionally, the method will work for fragmented traffic
337	   and for flows where the transport information is missing (unknown
338	   transport protocol) or obfuscated (e.g., IPsec).  Traffic reaching
339	   the load balancer via a VPN is particularly prone to fragmentation
340	   because of the reduced MTU.  For some load balancer designs, these
341	   are very significant advantages.
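
   The state kept by a stateful balancer under this model can be
   pictured with the sketch below.  The class name StickTable, the
   round-robin choice for new flows, and the absence of entry expiry
   are simplifications assumed for the example.

```python
from typing import Dict, List, Optional, Tuple


class StickTable:
    """Remember which server each {source address, flow label} uses."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.table: Dict[Tuple[str, int], str] = {}
        self._next = 0  # round-robin cursor for new flows

    def route(self, src: str, label: int) -> Optional[str]:
        # The extra test: an unlabelled packet is left to the
        # traditional transport-header path (signalled here by None).
        if label == 0:
            return None
        key = (src, label)
        server = self.table.get(key)
        if server is None:
            # First packet of a new flow: pick a server and record it.
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            self.table[key] = server
        return server
```

   Only the 2-tuple and the server identifier are stored per flow; no
   port numbers are needed for labelled traffic.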

343	   In the unlikely event of two simultaneous flows from the same source
344	   address having the same flow label value, the two flows would end up
345	   assigned to the same server, where they would be distinguished as
346	   normal by their port numbers.  Since this would be a statistically
347	   rare event, it would not damage the overall load balancing effect.
348	   Moreover, it is very likely that there will be many more flow label
349	   values than servers at most sites (2^20, about 1 million, possible
350	   label values), so it is already expected that multiple flow label
351	   values will end up on the same server for a given IP address.
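
   To give a feel for how rare such a coincidence is, the sketch below
   computes the probability that at least two of n simultaneous flows
   from one source share a label.  It assumes labels are drawn
   independently and uniformly from all 2^20 values, which is an
   idealisation; the function name is hypothetical.

```python
def collision_probability(n_flows: int, label_space: int = 2**20) -> float:
    """Birthday-style estimate for n flows sharing one source address."""
    p_all_distinct = 1.0
    for i in range(n_flows):
        # The (i+1)-th flow must avoid the i labels already in use.
        p_all_distinct *= (label_space - i) / label_space
    return 1.0 - p_all_distinct
```

   For two flows the result is exactly 1/2^20, i.e. slightly less than
   one in a million; the probability grows with the number of
   simultaneous flows from the same address.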

353	   In the case that many thousands of clients are hidden behind the same
354	   large-scale NAPT (network address and port translator) with a single
355	   shared IP address, the assumption of low probability of conflicts
356	   might become incorrect, unless flow label values are random enough to
357	   avoid following similar sequences for all clients.  This is not
358	   expected to be a factor for IPv6 anyway, since there is no need to
359	   implement large-scale NAPT with address sharing [RFC4864].  The
360	   statistical assumption is valid for sites that implement network
361	   prefix translation [RFC6296], since this technique provides a
362	   different address for each client.

364	5.  Security Considerations

366	   Security aspects of the flow label are discussed in [RFC6437].  As
367	   noted there, a malicious source or man-in-the-middle could disturb
368	   load balancing by manipulating flow labels.  This risk already exists
369	   today where the source address and port are used as the hashing key
370	   in layer 3/4 load balancers, as well as where a persistence cookie is
371	   used in HTTP to designate a server.  It even exists on layer 3
372	   components which only rely on the source address to select a
373	   destination, making them more DDoS-prone.  Nevertheless, all these
374	   methods are currently used because the benefits for load balancing
375	   and persistence hugely outweigh the risks.  The flow label does not
376	   significantly alter this situation.

378	   Specifically, the specification [RFC6437] states that "stateless
379	   classifiers should not use the flow label alone to control load
380	   distribution, and stateful classifiers should include explicit
381	   methods to detect and ignore suspect flow label values."  The former
382	   point is answered by also using the source address.  The latter point
383	   is more complex.  If the risk is considered serious, the site ingress
384	   router or the layer 3/4 balancer should use a suitable heuristic to
385	   verify incoming flows with non-zero flow label values.  If a flow
386	   from a given source address and port number does not have a constant
387	   flow label value, it is suspect and should be dropped.  This would
388	   deal with both intentional and accidental changes to the flow label.
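
   One possible shape for such a heuristic is sketched below.  The
   class name LabelConsistencyChecker and its interface are assumptions
   for illustration, and entry expiry is omitted.  Note that the check
   needs the source port, so it can only be applied to packets in which
   the transport header has been located.

```python
from typing import Dict, Tuple


class LabelConsistencyChecker:
    """Flag flows whose flow label changes for a fixed (address, port)."""

    def __init__(self) -> None:
        self.first_seen: Dict[Tuple[str, int], int] = {}

    def accept(self, src: str, sport: int, label: int) -> bool:
        if label == 0:
            # Unlabelled traffic is handled in the traditional way.
            return True
        # Record the first label seen for this source address and port,
        # then require all later packets to carry the same value.
        expected = self.first_seen.setdefault((src, sport), label)
        return expected == label
```

   A packet for which accept() returns False belongs to a suspect flow
   and, following the text above, would be dropped.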

390	   RFC 6437 notes in its Security Considerations that if the covert
391	   channel risk is considered significant, a firewall might rewrite non-
392	   zero flow labels.  As long as this is done as described in RFC 6437,
393	   it will not invalidate the mechanisms described above.

395	   The flow label may be of use in protecting against distributed denial
396	   of service (DDoS) attacks against servers.  As noted in RFC 6437, a
397	   source should generate flow label values that are hard to predict,
398	   most likely by including a secret nonce in the hash used to generate
399	   each label.  The attacker does not know the nonce and therefore has
400	   no way to invent flow labels which will all target the same server,
401	   even with knowledge of both the hash algorithm and the load balancing
402	   algorithm.  Still, it is important to understand that it is always
403	   trivial to force a load balancer to stick to the same server during
404	   an attack, so the security of the whole solution must not rely on the
405	   unpredictability of the flow label values alone, but should include
406	   defensive measures like those that most load balancers already have
407	   against abnormal use of source addresses or session cookies.

409	   New flows are assigned to a server according to any of the usual
410	   algorithms available on the load balancer (e.g., least connections,
411	   round robin, etc.).  The association between the flow label value and
412	   the server is stored in a table (often called a stick table) so that
413	   future connections using the same flow label can be sent to the same
414	   server.  This method is more robust against a loss of server and also
415	   makes it harder for an attacker to target a specific server, because
416	   the association between a flow label value and a server is not known
417	   externally.

419	6.  IANA Considerations

421	   This document requests no action by IANA.

423	7.  Acknowledgements

425	   Valuable comments and contributions were made by Fred Baker, Lorenzo
426	   Colitti, Joel Jaeggli, Gurudeep Kamat, Warren Kumari, Julia Renouard,
427	   Julius Volz, and others.

429	   This document was produced using the xml2rfc tool [RFC2629].

431	8.  Change log [RFC Editor: Please remove]

433	   draft-ietf-intarea-flow-label-balancing-00: WG adoption, minor WG
434	   comments, 2013-01-15.

436	   draft-carpenter-flow-label-balancing-02: updates based on external
437	   review, 2012-12-05.

439	   draft-carpenter-flow-label-balancing-01: update following comments,
440	   2012-06-12.

442	   draft-carpenter-flow-label-balancing-00: restructured after IETF83,
443	   2012-05-08.

445	   draft-carpenter-v6ops-label-balance-02: clarified after WG
446	   discussions, 2012-03-06.

448	   draft-carpenter-v6ops-label-balance-01: updated with community
449	   comments, additional author, 2012-01-17.

451	   draft-carpenter-v6ops-label-balance-00: original version, 2011-10-13.

453	9.  References

455	9.1.  Normative References

457	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
458	              (IPv6) Specification", RFC 2460, December 1998.

460	   [RFC6434]  Jankiewicz, E., Loughney, J., and T. Narten, "IPv6 Node
461	              Requirements", RFC 6434, December 2011.

463	   [RFC6437]  Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme,
464	              "IPv6 Flow Label Specification", RFC 6437, November 2011.

466	9.2.  Informative References

468	   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
469	              June 1999.

471	   [RFC2991]  Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
472	              Multicast Next-Hop Selection", RFC 2991, November 2000.

474	   [RFC4864]  Van de Velde, G., Hain, T., Droms, R., Carpenter, B., and
475	              E. Klein, "Local Network Protection for IPv6", RFC 4864,
476	              May 2007.

478	   [RFC6294]  Hu, Q. and B. Carpenter, "Survey of Proposed Use Cases for
479	              the IPv6 Flow Label", RFC 6294, June 2011.

481	   [RFC6296]  Wasserman, M. and F. Baker, "IPv6-to-IPv6 Network Prefix
482	              Translation", RFC 6296, June 2011.

484	   [RFC6436]  Amante, S., Carpenter, B., and S. Jiang, "Rationale for
485	              Update to the IPv6 Flow Label Specification", RFC 6436,
486	              November 2011.

488	   [RFC6438]  Carpenter, B. and S. Amante, "Using the IPv6 Flow Label
489	              for Equal Cost Multipath Routing and Link Aggregation in
490	              Tunnels", RFC 6438, November 2011.

492	   [Tarreau]  Tarreau, W., "Making applications scalable with load
493	              balancing", 2006, <http://1wt.eu/articles/2006_lb/>.

495	Authors' Addresses

497	   Brian Carpenter
498	   Department of Computer Science
499	   University of Auckland
500	   PB 92019
501	   Auckland,   1142
502	   New Zealand

504	   Email: brian.e.carpenter@gmail.com

505	   Sheng Jiang
506	   Huawei Technologies Co., Ltd
507	   Q14, Huawei Campus
508	   No.156 Beiqing Road
509	   Hai-Dian District, Beijing  100095
510	   P.R. China

512	   Email: jiangsheng@huawei.com

514	   Willy Tarreau
515	   Exceliance
516	   R&D Produits reseau
517	   3 rue du petit Robinson
518	   78350 Jouy-en-Josas
519	   France

521	   Email: w@1wt.eu