Network Working Group                                      J. Sigurdsson
Internet-Draft                                                    Google
Intended status: Standards Track                            October 2011
Expires: April 6, 2012

          Anti-DDoS Throttling of HTTP Requests by User-Agent
             draft-sigurdsson-anti-ddos-http-throttling-00

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 6, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This document describes a throttling mechanism that User-Agents can
   implement to limit the ability of websites and browser extensions to
   perpetrate a DDoS (Distributed Denial of Service) attack.


Sigurdsson                Expires April 6, 2012                 [Page 1]

Internet-Draft            Anti-DDoS Throttling              October 2011

1. Introduction

   This Internet-Draft specifies an anti-DDoS mechanism that can be
   built into the HTTP stack of User-Agents and is intended to limit the
   impact of DDoS attacks perpetrated by web pages or browser
   extensions.

   The anti-DDoS mechanism primarily makes the User-Agent throttle
   requests using randomized exponential backoff when a web server
   sends status codes that indicate overload. This causes the aggregate
   traffic from a multitude of User-Agents that have this mechanism to
   decrease to a level the web server can sustain. Additionally, we have
   implemented a couple of custom HTTP response headers that help
   servers further control their traffic. The goal is to provide DDoS
   protection for all web servers on the Internet without requiring them
   to be modified in any way, but allowing for better DDoS prevention
   and traffic management for web servers that are aware of the anti-
   DDoS mechanism.

   In an existing implementation, discrete-time simulation was used to
   validate the approach before rolling out experiments to a user
   population. Those experiments confirmed our beliefs about how the
   approach would behave in the wild. At the time of writing, the
   approach is live for a large user population without issues.

2. Requirement Levels

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

3. Background

   The rationale for adding an anti-DDoS mechanism to the HTTP stack
   is as follows:

   a) DDoS attacks have become common.

   b) More and more application software (e.g. HTML5 applications) is
      being written using HTTP as its lingua franca, the only networking
      protocol available to browser-based applications without
      requesting non-default privileges.

   c) Web-only computing devices have started to appear, and it's
      reasonable to expect that low-level networking will be completely
      unavailable to application software on an increasing number of
      devices, and therefore less available to malicious attackers.

   d) Since only the User-Agent and not the application software running
      on top of it can affect how the HTTP stack is implemented,
      building anti-DDoS behaviors into the User-Agent makes sense.

   Intentional, malicious attacks are often discussed. Another type of
   attack that we want to protect against, that may be becoming more
   frequent, is the unintentional DDoS attack. Consider for example a
   hypothetical web site A using Cross-Origin Resource Sharing (CORS) to
   communicate with web site B. Now imagine that a new version of web
   site A is rolled out, which accidentally increases the frequency of
   requests to site B from once every 5 minutes to once every 5 seconds.
   Assuming that site A has a non-trivial number of users compared to
   site B, what you could have on your hands is an unintentional DDoS
   attack, one where you may not want to completely block the attackers
   but rather rate-limit them.

   Other potential attackers in this kind of scenario could be popular
   browser extensions that are updated to a version with a similar bug,
   or your own website making AJAX requests to itself. The anti-DDoS
   mechanism described in this document can help deal with these
   unintentional attacks in addition to malicious attacks.

   It should be noted that DDoS attacks by software not running on top
   of a User-Agent are not affected by this mechanism; native software
   or other application software with low-level network access could
   always bypass any mechanisms put in place by the User-Agent. However,
   reducing the number of avenues available to an attacker still
   provides defense in depth.

4. Overview of Anti-DDoS Throttling Mechanism

   The anti-DDoS mechanism described here has these basic components:

   o It observes HTTP response codes on a per throttling target basis. A
     throttling target is demarcated as a URL minus its query
     parameters.

   o Once a few HTTP response codes in a row for a given throttling
     target have indicated that the web server is too busy (e.g. a 503
     response), block off an exponentially increasing (with some
     randomization) "no contact" period for this throttling target.

   o Disallow further requests made for the throttling target during the
     "no contact" period.

   o Use exponential backoff parameters that are fairly conservative, so
     as not to block contact with a web server for too long, but at the
     same time are proven to reduce traffic to an overloaded web server
     quite quickly.
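
   As an illustration of how a throttling target might be derived from
   a request URL, the following minimal sketch strips the query
   parameters using Python's standard library. The function name
   throttling_target is hypothetical, and dropping the fragment as well
   is an assumption (fragments are not sent to the server in any case).

```python
from urllib.parse import urlsplit, urlunsplit


def throttling_target(url: str) -> str:
    """Return the URL minus its query parameters (hypothetical helper).

    Assumption: the fragment is dropped too, since it never reaches
    the server and so cannot distinguish throttling targets.
    """
    parts = urlsplit(url)
    # Keep scheme, host[:port] and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

   Under this sketch, two requests that differ only in their query
   parameters map to the same throttling target and therefore share one
   failure history.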

   Backing off exponentially, with some randomization to avoid large
   numbers of clients retrying at the same time, causes the aggregate
   traffic of clients to fairly quickly decrease to the threshold where
   the web server is not overloaded, or only slightly overloaded, and so
   is able to mostly stop responding with 5xx status codes. This is easy
   to demonstrate through simulation. However, it is important that a
   large percentage of clients perform exponential backoff, and for this
   reason we hope that other browser projects will adopt the mechanism.


   In addition to exponential backoff, there are various behaviors that
   make this mechanism less likely to cause false positives for web
   developers, less likely to block requests explicitly made by a user,
   provide more control to web servers that are aware of the mechanism,
   and provide better DDoS prevention in the face of a malicious attack:

   o Requests to localhost are never blocked.

   o Requests are not blocked if they occur in the first 3.5 seconds
     after an explicit user gesture, i.e. clicking with the mouse or
     typing on the keyboard and thereby causing input to a web page. In
     a DDoS situation, this will tend to let some malicious requests
     through, but this should be worth it to favor actual user actions.
     Malicious requests would get through 0% of the time while the user
     is not using the browser, and considerably less than 100% of the
     time while the user is active.

   o An HTTP response header lets web servers opt out of the anti-DDoS
     mechanism. The name and value for this header are "Exponential-
     Throttling: disable" and its effect is to opt out the host the
     request was sent to. The opt-out must last for the remainder of the
     browsing session, and repeated opt-outs must be idempotent.

   o The "Retry-After" response header gets slightly different
     semantics, in that it can now be used for any status code, not just
     a 503 status code, e.g. on a 200 success response. It could
     therefore be used e.g. to optimize AJAX applications, where the
     client by default pings the server very frequently, and the server
     then controls the actual rate of pings by using this header.

   o An HTTP response header is provided that lets the web server
     indicate how to bucket the current throttling target in the anti-
     DDoS mechanism. By default, the mechanism observes HTTP response
     codes on a per throttling target basis, i.e. by looking at the URL
     being requested, minus the query parameters. This header could be
     used by the web server to state that all URLs on the server
     starting with a given path (the current response being a match for
     this path) should be observed and blocked together as a single
     throttling target. The header name is "DDoS-Bucket-With" and the
     value is the same format as the value of the path=foo part of a
     Set-Cookie response header. The header need only be set on
     responses that may be considered an indication of a DDoS attack,
     e.g. 500, 503 and 509 responses.
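
   The exemptions in the list above (localhost, recent user gesture,
   and the "Exponential-Throttling: disable" opt-out) could be checked
   before consulting the backoff state, roughly as sketched below. This
   is a sketch under stated assumptions, not the existing
   implementation: the class and method names are hypothetical,
   "localhost" is assumed to mean the literal name or any loopback IP
   address, and opt-outs are assumed to be keyed by hostname without the
   port.

```python
import ipaddress
from urllib.parse import urlsplit

GESTURE_GRACE_SECONDS = 3.5                # window stated in the text above
OPT_OUT_HEADER = "exponential-throttling"  # compared case-insensitively


def is_localhost(host):
    """Assumed definition: the name "localhost" or a loopback IP literal."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # not an IP literal (or no host at all)
        return False


class ThrottleExemptions:
    """Hypothetical per-session bookkeeping for the exemptions above."""

    def __init__(self):
        self.opted_out_hosts = set()  # a set keeps repeated opt-outs idempotent
        self.last_gesture = None      # time of the last user gesture, seconds

    def note_user_gesture(self, now):
        self.last_gesture = now

    def note_response(self, url, headers):
        """Record an opt-out for the host the request was sent to."""
        lowered = {name.lower(): value for name, value in headers.items()}
        if lowered.get(OPT_OUT_HEADER, "").strip().lower() == "disable":
            self.opted_out_hosts.add(urlsplit(url).hostname)

    def is_exempt(self, url, now):
        """True if a request to url must not be blocked at time now."""
        host = urlsplit(url).hostname
        if is_localhost(host) or host in self.opted_out_hosts:
            return True
        return (self.last_gesture is not None
                and 0 <= now - self.last_gesture <= GESTURE_GRACE_SECONDS)
```

   Because the opt-out is stored in a set that is never cleared during
   the session, receiving the header repeatedly has no further effect,
   which matches the idempotence requirement.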

   It should be noted that even with the anti-DDoS mechanism in place, a
   malicious DDoS attack could be perpetrated on a web server not using
   the "DDoS-Bucket-With" header, since the malicious attack could craft
   multiple URLs that each are considered a new throttling target. This
   could have been avoided by using the hostname and port number as the
   throttling target, rather than the URL minus query parameters, but
   this would have made it hard to roll the mechanism out to the entire
   web without opt-in, as there are many web servers that operate many
   different independent sites or services off the same host name. As it
   stands, without opt-in, some malicious attacks and all or most non-
   malicious (unintentional) attacks are mitigated, and with opt-in via
   the "DDoS-Bucket-With" header, all are mitigated. Future research may
   look at whether automatic clustering of throttling targets could be
   done, but this is outside of the scope of the current document.
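
   To make the effect of the "DDoS-Bucket-With" header concrete, the
   sketch below shows one way a User-Agent might map URLs to throttling
   targets once a server has declared a bucket path. It is illustrative
   only. The BucketRegistry class is hypothetical, the path comparison
   is assumed to follow cookie-style path matching, and the choice of
   the shortest matching path when several buckets apply is an
   assumption that this document does not settle.

```python
from urllib.parse import urlsplit


def path_matches(request_path, bucket_path):
    """Cookie-style path match (assumed): equal, or a prefix ending at "/"."""
    if request_path == bucket_path:
        return True
    if request_path.startswith(bucket_path):
        return (bucket_path.endswith("/")
                or request_path[len(bucket_path)] == "/")
    return False


class BucketRegistry:
    """Hypothetical store of bucket paths announced by servers."""

    def __init__(self):
        self._paths = {}  # (scheme, host[:port]) -> set of bucket paths

    def note_response(self, url, headers):
        lowered = {name.lower(): value for name, value in headers.items()}
        bucket = lowered.get("ddos-bucket-with", "").strip()
        parts = urlsplit(url)
        # Only honour the header when the responding URL itself matches it.
        if bucket and path_matches(parts.path or "/", bucket):
            key = (parts.scheme, parts.netloc)
            self._paths.setdefault(key, set()).add(bucket)

    def target_for(self, url):
        """Return the throttling target for url as a string."""
        parts = urlsplit(url)
        path = parts.path or "/"
        known = self._paths.get((parts.scheme, parts.netloc), ())
        candidates = [p for p in known if path_matches(path, p)]
        if candidates:
            path = min(candidates, key=len)  # assumption: broadest bucket wins
        # Default: the URL minus its query parameters.
        return f"{parts.scheme}://{parts.netloc}{path}"
```

   With a bucket of "/api" registered, requests to /api/a and /api/b
   share a single throttling target, so an attacker cannot evade
   throttling merely by varying the path below /api.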

5. Details

   There are two parts to exponential backoff - the backoff algorithm
   and the backoff policy. The components of the backoff policy are as
   follows:

   o num_errors_to_ignore: The number of successive failures to ignore
     before starting to back off. The suggested value is 2, to be
     conservative and avoid false positives caused by very short server
     downtime or by the typical build/debug/restart server cycle of a
     web developer.

   o initial_backoff_ms: The initial length of the backoff period.
     Suggested value: 700 ms.

   o multiply_factor: The exponential factor that the backoff period is
     increased by on each successive failure. Suggested value: 1.4.

   o jitter_factor: The jitter, or randomization, factor. A factor of
     0.1 (10%) will cause the backoff period to be chosen uniformly at
     random between 90% and 100% of the calculated backoff period.
     Suggested value: 0.1.

   o maximum_backoff_ms: The maximum length of the backoff period.
     Suggested value: 15 minutes.

   o Which status codes we consider indicators of a server being
     possibly under a DDoS attack, or at least benefiting from a
     reduction in aggregate traffic. Status code 503 MUST be interpreted
     as an indicator, and status codes 500 and 509 MAY be interpreted as
     an indicator, for the following reasons:

      o 500 is the generic error when no better message is suitable, and
        as such does not necessarily indicate a temporary state, but
        other status codes cover most of the permanent error states so
        it's fairly reasonable to consider this a temporary error.

      o 503 is explicitly documented as a temporary state where the
        server is either overloaded or down for maintenance.

      o 509 is the (non-standard) Bandwidth Limit Exceeded status code,
        which might indicate DDoS.
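
   The policy components above can be gathered into a single immutable
   record, as in the following sketch. The parameter names and suggested
   values are taken from the list above. The class name, the two
   status-code field names, and the treat_optional_as_failure switch are
   hypothetical and exist only to show how the MUST (503) and MAY (500,
   509) indicators might be kept apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackoffPolicy:
    """Suggested backoff policy values; field layout is illustrative."""

    num_errors_to_ignore: int = 2
    initial_backoff_ms: float = 700.0
    multiply_factor: float = 1.4
    jitter_factor: float = 0.1
    maximum_backoff_ms: float = 15 * 60 * 1000.0  # 15 minutes
    # 503 MUST be treated as an overload indicator.
    required_failure_codes: frozenset = frozenset({503})
    # 500 and 509 MAY be treated as overload indicators.
    optional_failure_codes: frozenset = frozenset({500, 509})
    treat_optional_as_failure: bool = True  # hypothetical switch

    def is_failure(self, status: int) -> bool:
        """True if this status code counts as a failure under the policy."""
        if status in self.required_failure_codes:
            return True
        return (self.treat_optional_as_failure
                and status in self.optional_failure_codes)
```

   A User-Agent that prefers to count only the mandatory indicator
   could construct BackoffPolicy(treat_optional_as_failure=False).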

   Details of the algorithm are as follows:

   o For each throttling target, a failure_count and a release_time are
     maintained.


   o failure_count is increased by one for each failure (as defined by
     the backoff policy) and decreased by one (to no less than 0) for
     each success.

   o release_time is the earliest absolute time when a request to the
     throttling target should be allowed. In the default state when a
     throttling target does not have errors, this time is either past or
     exactly present, but in an error state it is a future time.

   o When a response is received, and failure_count has been
     established by either incrementing it or decrementing it (if
     greater than zero), an effective failure count is calculated:

        effective_failure_count = failure_count - num_errors_to_ignore

   o From this, release_time is calculated in steps as follows:

      a) delay =
         initial_backoff_ms*multiply_factor^effective_failure_count

      b) delay -= rand[0,1) * jitter_factor * delay

      c) delay = min(delay, maximum_backoff_ms)

      d) release_time = max(now + delay, release_time)

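   o A special case makes delay == 0 when effective_failure_count is
     not greater than zero, i.e. while the failures seen so far are
     still within num_errors_to_ignore.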

   o Additional logic ensures that release_time is never set to a time
     earlier than its previous value. This is done to never override a
     release_time set using the Retry-After header.
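
   The algorithm above can be sketched as follows. This is a minimal
   illustration rather than the Chromium code: the BackoffEntry class
   and its method names are hypothetical, times are plain millisecond
   values supplied by the caller, and the special case is assumed to
   apply whenever effective_failure_count is zero or negative. The
   constants are the suggested policy values from this section.

```python
import random

NUM_ERRORS_TO_IGNORE = 2
INITIAL_BACKOFF_MS = 700.0
MULTIPLY_FACTOR = 1.4
JITTER_FACTOR = 0.1
MAXIMUM_BACKOFF_MS = 15 * 60 * 1000.0


class BackoffEntry:
    """Per-throttling-target state: failure_count and release_time."""

    def __init__(self, rng=random.random):
        self.failure_count = 0
        self.release_time_ms = 0.0
        self._rng = rng  # returns a float in [0, 1); injectable for tests

    def update(self, failed, now_ms):
        """Account for one response and recompute release_time."""
        if failed:
            self.failure_count += 1
        else:
            self.failure_count = max(0, self.failure_count - 1)
        effective = self.failure_count - NUM_ERRORS_TO_IGNORE
        if effective <= 0:
            delay = 0.0  # assumed reading of the special case
        else:
            delay = INITIAL_BACKOFF_MS * MULTIPLY_FACTOR ** effective  # a)
            delay -= self._rng() * JITTER_FACTOR * delay               # b)
            delay = min(delay, MAXIMUM_BACKOFF_MS)                     # c)
        # d) never move release_time earlier than its previous value
        self.release_time_ms = max(now_ms + delay, self.release_time_ms)

    def apply_retry_after(self, seconds, now_ms):
        """Honour a Retry-After value without shortening existing backoff."""
        self.release_time_ms = max(now_ms + seconds * 1000.0,
                                   self.release_time_ms)

    def should_block(self, now_ms):
        return now_ms < self.release_time_ms
```

   With the suggested values and jitter ignored, the nominal delays for
   effective failure counts of 1, 2 and 3 are approximately 980 ms,
   1372 ms and 1921 ms, and the 15 minute cap is first reached at an
   effective failure count of 22.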

6. Simulations and Experiments Performed

   An initial implementation of most of this mechanism was undertaken
   in the Chromium project and rolled out in the Google Chrome browser.
   This has helped work out many practical issues.

   The existing implementation of both the policy and the algorithm for
   exponential backoff may be found in the Chromium project in the
   source files net/base/backoff_entry.cc and
   net/url_request/url_request_throttler_entry.cc.

   An initial attempt to turn an early version of the mechanism on in
   the Google Chrome "dev" (early adopter) channel was met with a
   backlash from web developers, who were getting false positives fairly
   frequently when testing their websites. After this, we proceeded with
   much more caution as the desire was to be able to roll the mechanism
   out to all users with practically zero disruption, without requiring
   sites to opt in to protection.

   A couple of discrete-time simulations were written. The goals for
   these simulations were to validate that the mechanism would be likely
   to behave the way we expected, and to help decide on an optimal set
   of parameters for the backoff algorithm that would prevent as many
   false positives as possible and keep the perceived downtime of
   servers as close as possible to what it would be without this
   mechanism, while at the same time providing actual benefits in a DDoS
   scenario.

   The simulations are available in the Chromium source file
   net/url_request/url_request_throttler_simulation_unittest.cc. They
   validate the following assumptions:

   o That a server experiencing overload will actually benefit from the
     anti-DDoS throttling logic, i.e. that its traffic spike will
     subside and be distributed over a longer period of time;

   o That "well-behaved" clients of a server under DDoS attack actually
     benefit from the anti-DDoS throttling logic in that they are more
     likely to receive service; and

   o That the approximate increase in perceived downtime introduced by
     anti-DDoS throttling for various different actual downtimes is what
     we expect it to be, i.e. 8-15% on average for a mixture of
     scenarios.
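
   A toy discrete-time simulation in the spirit of the first two
   assumptions is sketched below. It is not the Chromium simulation
   referenced above and makes strong simplifying assumptions: every
   client wants to send one request per 100 ms tick, the server answers
   the first "capacity" requests in each tick successfully and returns
   503 to the rest, and the suggested policy values are hard-coded. The
   function name simulate and all of its parameters are hypothetical.

```python
import random


def simulate(throttled, clients=50, capacity=10, ticks=600,
             tick_ms=100.0, seed=1):
    """Return (requests_sent, requests_served) over the whole run."""
    rng = random.Random(seed)
    failure_count = [0] * clients
    release_ms = [0.0] * clients
    sent = served = 0
    for tick in range(ticks):
        now = tick * tick_ms
        # Throttled clients stay silent during their "no contact" period.
        active = [c for c in range(clients)
                  if not throttled or now >= release_ms[c]]
        rng.shuffle(active)  # arrival order within a tick is random
        sent += len(active)
        for position, c in enumerate(active):
            if position < capacity:  # served successfully
                served += 1
                failure_count[c] = max(0, failure_count[c] - 1)
            else:                    # server overloaded: 503
                failure_count[c] += 1
            effective = failure_count[c] - 2  # num_errors_to_ignore = 2
            if effective > 0:
                delay = 700.0 * 1.4 ** effective
                delay -= rng.random() * 0.1 * delay
                delay = min(delay, 900_000.0)
                release_ms[c] = max(now + delay, release_ms[c])
    return sent, served


if __name__ == "__main__":
    for mode in (False, True):
        sent, served = simulate(throttled=mode)
        print(f"throttled={mode}: sent={sent} served={served}")
```

   In this model the unthrottled clients send 50 requests in every
   tick, of which only 10 can be served, whereas throttled clients send
   fewer requests in total and a request that is sent is at least as
   likely to be served.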

   Following the simulations, we pushed the mechanism as an experiment
   to a portion of the Google Chrome "dev" channel. This experiment
   showed that anti-DDoS throttling blocked around 1 out of every
   40,000-50,000 requests, and that the increase in perceived downtime
   was within the noise of the experiment.

   The mechanism has since been rolled out to a significant portion of
   the Google Chrome user population, all of the "dev" channel and the
   "beta" channel, and is expected to go live soon as part of release 15
   of Chrome on the "stable" channel (the bulk of the user population).

7. Acknowledgments

   Thanks to Adam Barth for reviewing and handholding.

8. Security Considerations

   Security should not be impacted by this mechanism. The worst an
   attacker could do in a hypothetical attack on the mechanism would be
   to cause it to falsely believe that a web server was under DDoS
   attack and requests to it should be throttled, effectively blocking
   the user or e.g. a JavaScript client from communicating with the
   server. To fool the mechanism in this way, the attacker would need to
   be able to modify HTTP responses coming from the web server. An
   attacker with that capability could already block users or other
   clients from communicating with the server, even without the
   anti-DDoS mechanism in place.

9. Internationalization Considerations

   This memo raises no new internationalization considerations.

10. IANA Considerations

   This memo adds no new IANA considerations.

Author's Address

   Joi Sigurdsson
   Google
   Stigahlid 68a
   105 Reykjavik
   Iceland

   Telephone: +354 897-9781
   Fax: +1 866 336-5958
   Email: joi@google.com
   URL: http://www.google.com/

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.


Table of Contents

   1. Introduction
   2. Requirement Levels
   3. Background
   4. Overview of Anti-DDoS Throttling Mechanism
   5. Details
   6. Simulations and Experiments Performed
   7. Acknowledgments
   8. Security Considerations
   9. Internationalization Considerations
   10. IANA Considerations
   Author's Address
