idnits 2.17.1 

draft-carpenter-flow-label-balancing-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (June 12, 2012) is 4335 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  ** Obsolete normative reference: RFC 6434 (Obsoleted by RFC 8504)

  -- Obsolete informational reference (is this intentional?): RFC 2629
     (Obsoleted by RFC 7749)


     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                       B. Carpenter
3	Internet-Draft                                         Univ. of Auckland
4	Intended status: Informational                                  S. Jiang
5	Expires: December 14, 2012                  Huawei Technologies Co., Ltd
6	                                                              W. Tarreau
7	                                                              Exceliance
8	                                                           June 12, 2012

10	          Using the IPv6 Flow Label for Server Load Balancing
11	                draft-carpenter-flow-label-balancing-01

13	Abstract

15	   This document describes how the IPv6 flow label as currently
16	   specified can be used to enhance layer 3/4 load distribution and
17	   balancing for large server farms.

19	Status of this Memo

21	   This Internet-Draft is submitted in full conformance with the
22	   provisions of BCP 78 and BCP 79.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF).  Note that other groups may also distribute
26	   working documents as Internet-Drafts.  The list of current Internet-
27	   Drafts is at http://datatracker.ietf.org/drafts/current/.

29	   Internet-Drafts are draft documents valid for a maximum of six months
30	   and may be updated, replaced, or obsoleted by other documents at any
31	   time.  It is inappropriate to use Internet-Drafts as reference
32	   material or to cite them other than as "work in progress."

34	   This Internet-Draft will expire on December 14, 2012.

36	Copyright Notice

38	   Copyright (c) 2012 IETF Trust and the persons identified as the
39	   document authors.  All rights reserved.

41	   This document is subject to BCP 78 and the IETF Trust's Legal
42	   Provisions Relating to IETF Documents
43	   (http://trustee.ietf.org/license-info) in effect on the date of
44	   publication of this document.  Please review these documents
45	   carefully, as they describe your rights and restrictions with respect
46	   to this document.  Code Components extracted from this document must
47	   include Simplified BSD License text as described in Section 4.e of
48	   the Trust Legal Provisions and are provided without warranty as
49	   described in the Simplified BSD License.

51	Table of Contents

53	   1.  Introduction
54	   2.  Summary of Flow Label Specification
55	   3.  Summary of Load Balancing Techniques
56	   4.  Applying the Flow Label to L3/L4 Load Balancing
57	   5.  Security Considerations
58	   6.  IANA Considerations
59	   7.  Acknowledgements
60	   8.  Change log [RFC Editor: Please remove]
61	   9.  References
62	     9.1.  Normative References
63	     9.2.  Informative References
64	   Authors' Addresses

66	1.  Introduction

68	   The IPv6 flow label has been redefined [RFC6437] and is now a
69	   recommended IPv6 node requirement [RFC6434].  Its use for load
70	   sharing in multipath routing has been specified [RFC6438].  Another
71	   scenario in which the flow label could be used is in load
72	   distribution for large server farms.  Load distribution is a slightly
73	   more general term than load balancing, but the latter is more
74	   commonly used.  This document starts with brief introductions to the
75	   flow label and to load balancing techniques, and then describes how
76	   the flow label can be used to enhance layer 3/4 load balancers in
77	   particular.

79	   The motivation for this approach is to improve the performance of
80	   most types of layer 3/4 load balancers, especially for traffic
81	   including multiple IPv6 extension headers and in particular for
82	   fragmented packets.  Fragmented packets, often the result of
83	   customers reaching the load balancer via a VPN with a limited MTU,
84	   are a common performance problem.

86	2.  Summary of Flow Label Specification

88	   The IPv6 flow label is a 20-bit field included in every IPv6 header
89	   [RFC2460].  It is recommended to be supported in all IPv6 nodes by
90	   [RFC6434] and it is defined in [RFC6437].  According to this
91	   definition, the flow label should be set to a constant value for a
92	   given traffic flow (such as an HTTP connection).

94	   Any device that has access to the IPv6 header has access to the flow
95	   label, and it is at a fixed position in every IPv6 packet.  In
96	   contrast, transport layer information, such as the port numbers, is
97	   not always in a fixed position, since it follows any IPv6 extension
98	   headers that may be present.  In fact, the logic of finding the
99	   transport header is always more complex for IPv6 than for IPv4, due
100	   to the absence of an Internet Header Length field in IPv6.
101	   Therefore, within the lifetime of a given transport layer connection,
102	   the flow label can be a more convenient "handle" than the port number
103	   for identifying that particular connection.
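
   The difference between the two lookups can be sketched as follows.
   This is an illustrative Python fragment, not part of any
   specification: the function names are hypothetical, and the walk
   handles only the Hop-by-Hop, Routing, Fragment and Destination
   Options headers, ignoring AH, ESP, truncated packets and non-first
   fragments (which carry no transport header at all).

```python
import struct
from typing import Tuple

# Extension header types handled by this sketch (IANA protocol numbers).
HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS = 0, 43, 44, 60
EXTENSION_HEADERS = (HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS)
FIXED_HEADER_LEN = 40


def flow_label(packet: bytes) -> int:
    """Read the flow label: the low 20 bits of the first 32-bit word."""
    (first_word,) = struct.unpack_from("!I", packet, 0)
    return first_word & 0xFFFFF


def find_transport_header(packet: bytes) -> Tuple[int, int]:
    """Walk the extension header chain; return (protocol, byte offset)."""
    next_header = packet[6]          # Next Header field of the fixed header
    offset = FIXED_HEADER_LEN
    while next_header in EXTENSION_HEADERS:
        current = next_header
        next_header = packet[offset]
        if current == FRAGMENT:
            length = 8               # the Fragment header has a fixed size
        else:
            # Hdr Ext Len counts 8-octet units, not including the first 8.
            length = (packet[offset + 1] + 1) * 8
        offset += length
    return next_header, offset
```

   The first function is a single fixed-offset read, whereas the second
   needs one iteration per extension header before the port numbers can
   even be located.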

105	   According to RFC 6437, source hosts should set the flow label, but if
106	   they do not (i.e., its value is zero), forwarding nodes (such as the
107	   first-hop router) may set it instead.  In both cases, the flow label
108	   value must be constant for a given transport session, normally
109	   identified by the IPv6 and Transport header 5-tuple.  By default, the
110	   flow label value should be calculated by a stateless algorithm.  The
111	   resulting value should form part of a statistically uniform
112	   distribution.
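
   As an illustration only, a stateless label generator of this kind
   might look like the following Python sketch.  The function name, the
   choice of keyed BLAKE2b as the hash, the per-host secret, and the
   mapping of a zero result to 1 (so that the label is not mistaken for
   "unset") are all assumptions made for this example; RFC 6437 does not
   mandate any particular algorithm.

```python
import hashlib
import ipaddress
import os

# Hypothetical per-host secret, generated once at start-up.
SECRET = os.urandom(16)


def make_flow_label(src: str, dst: str, proto: int,
                    sport: int, dport: int,
                    secret: bytes = SECRET) -> int:
    """Derive a 20-bit label from the 5-tuple without keeping any state."""
    material = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + bytes([proto])
        + sport.to_bytes(2, "big")
        + dport.to_bytes(2, "big")
    )
    digest = hashlib.blake2b(material, key=secret, digest_size=8).digest()
    label = int.from_bytes(digest, "big") & 0xFFFFF
    # Assumption of this sketch: avoid zero, which means "no label set".
    return label or 1
```

   Because the label depends only on the 5-tuple and the secret, every
   packet of a given transport session receives the same value without
   any per-flow table at the sender.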

114	   A careful reading of RFC 6437 shows that for a given source accessing
115	   a well-known TCP port at a given destination, the flow label is in
116	   effect a substitute for the source port number, found at a fixed
117	   position in the layer 3 header.

119	   The flow label is defined as an end-to-end component of the IPv6
120	   header, but there are three qualifications to this:

122	   1.  Until the RFC 6437 standard is widely implemented as recommended
123	       by RFC 6434, the flow label will often be set to the default
124	       value of zero.
125	   2.  Because of the recommendation to use a stateless algorithm to
126	       calculate the label, there is a low (but non-zero) probability
127	       that two simultaneous flows from the same source to the same
128	       destination have the same flow label value despite having
129	       different transport protocol port numbers.
130	   3.  The flow label field is in an unprotected part of the IPv6
131	       header, which means that intentional or unintentional changes to
132	       its value cannot be trivially detected by a receiver.

134	   The first two points are addressed below in Section 4 and the third
135	   in Section 5.

137	3.  Summary of Load Balancing Techniques

139	   Load balancing for server farms is achieved by a variety of methods,
140	   often used in combination [Tarreau].  The flow label is not relevant
141	   to all of them, and the actual load balancing algorithm (the choice
142	   of which server to use for a new client session) is irrelevant to
143	   this discussion.

145	   o  The simplest method is to use the DNS to return different
146	      server addresses for a single name such as www.example.com to
147	      different users.  Typically this is done by rotating the order in
148	      which different addresses are listed by the relevant authoritative
149	      DNS server, assuming that the client will pick the first one.
150	      Routing may be configured such that the different addresses are
151	      handled by different ingress routers.  The flow label can have no
152	      impact on this method and it is not discussed further.
153	   o  Another method, for HTTP servers, is to operate a layer 7 reverse
154	      proxy in front of the server farm.  The reverse proxy will present
155	      a single IP address to the world, communicated to clients by a
156	      single AAAA record.  For each new client session (an incoming TCP
157	      connection and HTTP request), it will pick a particular server and
158	      proxy the session to it.  Hopefully the act of proxying will be
159	      cheap compared to the act of serving the required content.  The
160	      proxy must retain TCP state and proxy state for the duration of
161	      the session.  This TCP state could, potentially, include the
162	      incoming flow label value.
163	   o  A component of some load balancing systems is an SSL reverse proxy
164	      farm.  The individual SSL proxies handle all cryptographic aspects
165	      and exchange raw HTTP with the actual servers.  Thus, from the
166	      load balancing point of view, this really looks just like a server
167	      farm, except that it's specialised for HTTPS.  Each proxy will
168	      retain SSL and TCP and maybe HTTP state for the duration of the
169	      session, and the TCP state could potentially include the flow
170	      label.
171	   o  Finally, the "front end" of many load balancing systems is a layer
172	      3/4 load balancer.  While this is sometimes a dedicated hardware
173	      device, it is also a standard function of some network switches
174	      or routers (e.g., using ECMP [RFC2991]).  In this
175	      case, it is the layer 3/4 load balancer whose IP address is
176	      published as the primary AAAA record for the service.  All client
177	      sessions will pass through this device.  According to the precise
178	      scenario, it will spread new sessions across the actual
179	      application servers, across an SSL proxy farm, or across a set of
180	      layer 7 proxies.  In all cases, the layer 3/4 load balancer has to
181	      recognize incoming packets as belonging to new or existing client
182	      sessions, and choose the target server or proxy so as to ensure
183	      persistence.  'Persistence' is defined as guaranteeing that a
184	      given session will run to completion on a single server.  The
185	      layer 3/4 load balancer therefore needs to inspect each incoming
186	      packet to identify the session.  There are two common types of
187	      layer 3/4 load balancers.  Totally stateless balancers act only
188	      on individual packets, generally by hashing easy-to-find
189	      information such as the source address and/or port into a server
190	      number.  Stateful balancers take the routing decision on the
191	      first packets of a session and maintain the same direction for
192	      all packets belonging to the same session.
193	      Clearly, both types of layer 3/4 balancers could inspect and make
194	      use of the flow label value.

196	      Our focus is on how the balancer identifies a particular flow.
197	      For clarity, note that two aspects of layer 3/4 load balancers
198	      are not affected by use of the flow label to identify
199	      sessions:

201	      1.  Balancers use various techniques to redirect traffic to a
202	          specific target server.

204	          - All servers are configured with the same IP address and are
205	          all on the same LAN, and the load balancer sends directly to
206	          their individual MAC addresses.
207	          - Each server has its own IP address, and the balancer uses an
208	          IP-in-IP tunnel to reach it.

210	          - Each server has its own IP address, and the balancer
211	          performs NAPT (network address and port translation) to
212	          deliver the client's packets to that address.

214	          The choice between these methods is not affected by use of the
215	          flow label.

217	      2.  A layer 3/4 balancer must correctly handle Path MTU Discovery
218	          by forwarding relevant ICMPv6 packets in both directions.
219	          This too is not affected by use of the flow label.

221	   The following diagram, inspired by [Tarreau], shows a maximum layout.

223	        ___________________________________________
224	       (                                           )
225	       (          Clients in the Internet          )
226	       (___________________________________________)
227	              |                            |
228	         ------------                ------------
229	         | Ingress  |                | Ingress  |
230	         | router   |                | router   |
231	         ------------                ------------
232	           ___|_______DNS-based____________|___
233	                |     load splitting     |
234	                |                        |
235	                |                        |
236	           ------------             ------------
237	           | L3/4 ASIC|             | L3/4 ASIC|
238	           | balancer |             | balancer |
239	           ------------             ------------
240	                |          load          |
241	                |        spreading       |
242	      __________|________________________|___________
243	          |              |            |          |
244	    ------------   ------------   --------   --------
245	    |HTTP proxy|...|HTTP proxy|   | SSL  |...| SSL  |
246	    | balancer |   | balancer |   | proxy|   | proxy|
247	    ------------   ------------   --------   --------
248	      ____|_____________|_____________|_________|_____
249	        |          |          |          |          |
250	    --------   --------   --------   --------   --------
251	    |HTTP  |   |HTTP  |   |HTTP  |   |HTTP  |   |HTTP  |
252	    |server|   |server|   |server|   |server|   |server|
253	    --------   --------   --------   --------   --------

255	   From the previous paragraphs, we can identify several points in this
256	   diagram where the flow label might be relevant:

258	   1.  Layer 3/4 load balancers.
259	   2.  SSL proxies.
260	   3.  HTTP proxies.

262	   However, usage by the proxies seems unlikely to be cost-effective, so
263	   in this document we focus only on layer 3/4 balancers.

265	4.  Applying the Flow Label to L3/L4 Load Balancing

267	   The suggested model for using the flow label in a load balancing
268	   mechanism is as follows:

270	   o  We are only concerned with IPv6 traffic in which the flow label
271	      value has been set at or near the source according to [RFC6437].
272	      If the flow label of an incoming packet is zero, load balancers
273	      will continue to use the transport header in the traditional way.
274	      As the use of the flow label becomes more prevalent according to
275	      RFC 6434, load balancers, and therefore users, will reap a growing
276	      performance benefit.
277	   o  If the flow label of an incoming packet is non-zero, layer 3/4
278	      load balancers can use the 2-tuple {source address, flow label} as
279	      the session key for whatever load distribution algorithm they
280	      support.  If any IPv6 extension headers, including fragment
281	      headers, are present, this will be significantly quicker than
282	      searching for the transport port numbers later in the packet.
283	      Moreover, the transport layer information such as the source port
284	      is not repeated in fragments, which generally prevents stateless
285	      load balancers from supporting fragmented traffic since they
286	      generally cannot reassemble fragments.

288	      Note that balancers usually do not need to consider the
289	      destination address as it is always the same, i.e., the server
290	      address.

292	      A stateless layer 3/4 load balancer would simply apply a hash
293	      algorithm to the 2-tuple {source address, flow label} on all
294	      packets, in order to select the same target server consistently
295	      for a given flow.

297	      A stateful layer 3/4 load balancer would apply its usual load
298	      distribution algorithm to the first packet of a session, and store
299	      the {2-tuple, server} association in a table so that subsequent
300	      packets belonging to the same session are forwarded to the same
301	      server.  Thus, for all subsequent packets of the session, it can
302	      ignore all IPv6 extension headers, which should lead to a
303	      performance benefit.  Whether this benefit is valuable will depend
304	      on engineering details of the specific load balancer.

306	      Layer 3/4 balancers that redirect the incoming packets by NAPT are
307	      not expected to obtain any saving of time by using the flow label,
308	      because they must in any case follow the extension header chain in
309	      order to locate and modify the port number and transport checksum.
310	      The same would apply to balancers that perform TCP state tracking
311	      for any reason.
312	   o  Note that correct handling of ICMPv6 for Path MTU Discovery
313	      requires the layer 3/4 balancer to keep state for the client
314	      source address, independently of either the port numbers or the
315	      flow label.
316	   o  SSL and HTTP proxies, if present, should forward the flow label
317	      value towards the server.  This has no performance benefit, but is
318	      consistent with the general RFC 6437 model for the flow label.
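
   The model in the list above can be sketched in Python as follows.
   This is a minimal illustration under stated assumptions, not a
   reference implementation: the names session_key, stateless_pick and
   StatefulBalancer are hypothetical, SHA-256 stands in for whatever
   hash a real balancer would use, and round robin stands in for its
   "usual load distribution algorithm".

```python
import hashlib
import ipaddress
from typing import Dict, List, Optional, Tuple

SessionKey = Tuple[bytes, int]   # {source address, flow label}


def session_key(src: str, label: int) -> Optional[SessionKey]:
    """Build the 2-tuple key, or return None when the label is zero.

    None tells the caller to fall back to the transport header in the
    traditional way.
    """
    if label == 0:
        return None
    return (ipaddress.IPv6Address(src).packed, label)


def stateless_pick(key: SessionKey, servers: List[str]) -> str:
    """Stateless balancer: hash the 2-tuple into a server index."""
    addr, label = key
    digest = hashlib.sha256(addr + label.to_bytes(3, "big")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


class StatefulBalancer:
    """Stateful balancer: remember the {2-tuple, server} association."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.table: Dict[SessionKey, str] = {}
        self._next = 0

    def pick(self, key: SessionKey) -> str:
        server = self.table.get(key)
        if server is None:
            # First packet of a session: round robin (an assumption).
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            self.table[key] = server
        return server
```

   In both variants, every packet after the key has been built is
   handled without looking past the fixed IPv6 header, which is where
   any saving would come from.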

320	   It should be noted that the performance benefit, if any, depends
321	   entirely on engineering trade-offs in the design of the L3/L4
322	   balancer.  An extra test is needed (is the label non-zero?), but all
323	   logic for handling extension headers can be omitted except for the
324	   first packet of a new flow.  Since the only state to be stored is the
325	   2-tuple and the server identifier, storage requirements will be
326	   reduced.  Additionally, the method will work for fragmented traffic
327	   and for flows where the transport information is missing (unknown
328	   transport protocol) or obfuscated (e.g., IPsec).  Traffic reaching
329	   the load balancer via a VPN is particularly prone to the
330	   fragmentation issue, due to MTU size issues.  For some load balancer
331	   designs, these are very significant advantages.

333	   In the unlikely event of two simultaneous flows from the same source
334	   address having the same flow label value, the two flows would end up
335	   assigned to the same server, where they would be distinguished as
336	   normal by their port numbers.  Since this would be a statistically
337	   rare event, it would not damage the overall load balancing effect.
338	   Moreover, it is very likely that there will be many more flow label
339	   values than servers at most sites (2^20, or about 1 million, label
340	   values), so multiple flow label values are already expected to end up
341	   on the same server for a given IP address.  In the case where many
342	   thousands of clients are hidden behind the same large-scale NAT with
343	   a single IP address, the assumption of low probability of conflicts
344	   might become incorrect unless flow label values are random enough to
345	   avoid following similar sequences for all clients.  This is not
346	   expected to be a factor for IPv6 anyway, since there is no valid
347	   reason to implement NAT [RFC4864].  The statistical assumption is
348	   valid for sites that implement network prefix translation [RFC6296],
349	   since this technique provides a different address for each client.
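
   The statistical argument can be checked numerically.  The following
   Python sketch computes the probability that at least two of a given
   number of simultaneous flows from one source address share a label,
   under the assumption (made only for this example) that labels are
   independent and uniformly distributed over all 2^20 values.  The
   function name is hypothetical.

```python
LABEL_SPACE = 2 ** 20   # number of possible 20-bit flow label values


def collision_probability(flows: int, space: int = LABEL_SPACE) -> float:
    """Probability that at least two of `flows` random labels coincide."""
    p_all_distinct = 1.0
    for i in range(flows):
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct
```

   For two simultaneous flows the result is 2^-20, about one in a
   million.  For a thousand simultaneous flows behind a single address
   it rises to roughly 0.38, which illustrates why the low-probability
   assumption may fail behind a large-scale NAT.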

351	5.  Security Considerations

353	   Security aspects of the flow label are discussed in [RFC6437].  As
354	   noted there, a malicious source or man-in-the-middle could disturb
355	   load balancing by manipulating flow labels.  This risk already exists
356	   today where the source address and port are used as the hashing key in
357	   layer 3/4 load balancers, as well as where a persistence cookie is
358	   used in HTTP to designate a server.  It even exists on layer 3
359	   components which only rely on the source address to select a
360	   destination, making them more DDoS-prone.  Nevertheless, all these
361	   methods are currently used because the benefits for load balancing
362	   and persistence hugely outweigh the risks.

364	   Specifically, [RFC6437] states that "stateless classifiers should not
365	   use the flow label alone to control load distribution, and stateful
366	   classifiers should include explicit methods to detect and ignore
367	   suspect flow label values."  The former point is answered by also
368	   using the source address.  The latter point is more complex.  If the
369	   risk is considered serious, the site ingress router or the layer 3/4
370	   balancer should verify incoming flows with non-zero flow label
371	   values.  If a flow from a given source address and port number does
372	   not have a constant flow label value, it is suspect and should be
373	   dropped.  This would deal with both intentional and accidental
374	   changes to the flow label.
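
   A verification step of this kind could be sketched as follows.  The
   class and method names are hypothetical, and the sketch makes several
   simplifying assumptions: entries never expire, unlabelled packets are
   passed through unchecked, and the caller has already located the
   source port, which itself requires following the extension header
   chain.

```python
from typing import Dict, Tuple

FlowId = Tuple[str, int]   # (source address, source port)


class FlowLabelChecker:
    """Flag flows whose non-zero flow label changes over time."""

    def __init__(self) -> None:
        self.seen: Dict[FlowId, int] = {}

    def accept(self, src: str, sport: int, label: int) -> bool:
        """Return False when the packet is suspect and should be dropped."""
        if label == 0:
            return True   # assumption: unlabelled traffic is not checked
        # Record the first label seen for this flow, then compare.
        known = self.seen.setdefault((src, sport), label)
        return known == label
```

   Since this check needs the port number, it gives up the fast path
   for the packets it inspects, so a site would presumably enable it
   only where the risk is considered serious.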

376	   RFC 6437 notes in its Security Considerations that if the covert
377	   channel risk is considered significant, a firewall might rewrite non-
378	   zero flow labels.  As long as this is done as described in RFC 6437,
379	   it will not invalidate the mechanisms described above.

381	   The flow label may be of use in protecting against distributed denial
382	   of service (DDoS) attacks against servers.  As noted in RFC 6437, a
383	   source should generate flow label values that are hard to predict,
384	   most likely by including a secret nonce in the hash used to generate
385	   each label.  The attacker does not know the nonce and therefore has
386	   no way to invent flow labels which will all target the same server,
387	   even with knowledge of both the hash algorithm and the load balancing
388	   algorithm.  Still, it is important to understand that it is always
389	   trivial to force a load balancer to stick to the same server during
390	   an attack, so the security of the whole solution must not rely on the
391	   unpredictability of the flow label values alone, but should include
392	   defensive measures like most load balancers already have against
393	   abnormal use of source address or session cookies.

395	   New flows are assigned to a server according to any of the usual
396	   algorithms available on the load balancer (e.g., least connections,
397	   round robin, etc.).  The association between the flow label value and
398	   the server is stored in a table (often called a stick table) so that
399	   future connections using the same flow label can be sent to the same
400	   server.  This method is more robust against the loss of a server and also
401	   makes it harder for an attacker to target a specific server, because
402	   the association between a flow label value and a server is not known
403	   externally.

405	6.  IANA Considerations

407	   This document requests no action by IANA.

409	7.  Acknowledgements

411	   Valuable comments and contributions were made by Fred Baker, Lorenzo
412	   Colitti, Joel Jaeggli, Gurudeep Kamat, Julius Volz, and others.

414	   This document was produced using the xml2rfc tool [RFC2629].

416	8.  Change log [RFC Editor: Please remove]

418	   draft-carpenter-flow-label-balancing-01: update following comments,
419	   2012-06-12.

421	   draft-carpenter-flow-label-balancing-00: restructured after IETF83,
422	   2012-05-08.

424	   draft-carpenter-v6ops-label-balance-02: clarified after WG
425	   discussions, 2012-03-06.

427	   draft-carpenter-v6ops-label-balance-01: updated with community
428	   comments, additional author, 2012-01-17.

430	   draft-carpenter-v6ops-label-balance-00: original version, 2011-10-13.

432	9.  References

434	9.1.  Normative References

436	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
437	              (IPv6) Specification", RFC 2460, December 1998.

439	   [RFC6434]  Jankiewicz, E., Loughney, J., and T. Narten, "IPv6 Node
440	              Requirements", RFC 6434, December 2011.

442	   [RFC6437]  Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme,
443	              "IPv6 Flow Label Specification", RFC 6437, November 2011.

445	9.2.  Informative References

447	   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
448	              June 1999.

450	   [RFC2991]  Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
451	              Multicast Next-Hop Selection", RFC 2991, November 2000.

453	   [RFC4864]  Van de Velde, G., Hain, T., Droms, R., Carpenter, B., and
454	              E. Klein, "Local Network Protection for IPv6", RFC 4864,
455	              May 2007.

457	   [RFC6296]  Wasserman, M. and F. Baker, "IPv6-to-IPv6 Network Prefix
458	              Translation", RFC 6296, June 2011.

460	   [RFC6438]  Carpenter, B. and S. Amante, "Using the IPv6 Flow Label
461	              for Equal Cost Multipath Routing and Link Aggregation in
462	              Tunnels", RFC 6438, November 2011.

464	   [Tarreau]  Tarreau, W., "Making applications scalable with load
465	              balancing", 2006, <http://1wt.eu/articles/2006_lb/>.

467	Authors' Addresses

469	   Brian Carpenter
470	   Department of Computer Science
471	   University of Auckland
472	   PB 92019
473	   Auckland,   1142
474	   New Zealand

476	   Email: brian.e.carpenter@gmail.com

478	   Sheng Jiang
479	   Huawei Technologies Co., Ltd
480	   Q14, Huawei Campus
481	   No.156 Beiqing Road
482	   Hai-Dian District, Beijing  100095
483	   P.R. China

485	   Email: jiangsheng@huawei.com

486	   Willy Tarreau
487	   Exceliance
488	   R&D Produits reseau
489	   3 rue du petit Robinson
490	   78350 Jouy-en-Josas
491	   France

493	   Email: w@1wt.eu