idnits 2.17.1 

draft-polli-ratelimit-headers-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 6 instances of too long lines in the document, the longest one
     being 38 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (26 November 2020) is 1239 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'UNIX' is defined on line 1022, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 7234 (ref. 'CACHING') (Obsoleted by
     RFC 9111)

  ** Obsolete normative reference: RFC 7230 (ref. 'MESSAGING') (Obsoleted by
     RFC 9110, RFC 9112)

  ** Obsolete normative reference: RFC 7231 (ref. 'SEMANTICS') (Obsoleted by
     RFC 9110)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNIX'

  -- Duplicate reference: RFC7234, mentioned in 'RFC7234', was also mentioned
     in 'CACHING'.

  -- Obsolete informational reference (is this intentional?): RFC 7234
     (Obsoleted by RFC 9111)


     Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	HTTP                                                            R. Polli
3	Internet-Draft                         Team Digitale, Italian Government
4	Intended status: Standards Track                             A. Martinez
5	Expires: 30 May 2021                                             Red Hat
6	                                                        26 November 2020

8	                    RateLimit Header Fields for HTTP
9	                    draft-polli-ratelimit-headers-05

11	Abstract

13	   This document defines the RateLimit-Limit, RateLimit-Remaining,
14	   RateLimit-Reset fields for HTTP, thus allowing servers to publish
15	   current request quotas and clients to shape their request policy and
16	   avoid being throttled out.

18	Note to Readers

20	   _RFC EDITOR: please remove this section before publication_

22	   Discussion of this draft takes place on the HTTP working group
23	   mailing list (ietf-http-wg@w3.org), which is archived at
24	   https://lists.w3.org/Archives/Public/ietf-http-wg/.

27	   The source code and issues list for this draft can be found at
28	   https://github.com/ioggstream/draft-polli-ratelimit-headers.

31	Status of This Memo

33	   This Internet-Draft is submitted in full conformance with the
34	   provisions of BCP 78 and BCP 79.

36	   Internet-Drafts are working documents of the Internet Engineering
37	   Task Force (IETF).  Note that other groups may also distribute
38	   working documents as Internet-Drafts.  The list of current Internet-
39	   Drafts is at https://datatracker.ietf.org/drafts/current/.

41	   Internet-Drafts are draft documents valid for a maximum of six months
42	   and may be updated, replaced, or obsoleted by other documents at any
43	   time.  It is inappropriate to use Internet-Drafts as reference
44	   material or to cite them other than as "work in progress."

46	   This Internet-Draft will expire on 30 May 2021.

48	Copyright Notice

50	   Copyright (c) 2020 IETF Trust and the persons identified as the
51	   document authors.  All rights reserved.

53	   This document is subject to BCP 78 and the IETF Trust's Legal
54	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
55	   license-info) in effect on the date of publication of this document.
56	   Please review these documents carefully, as they describe your rights
57	   and restrictions with respect to this document.  Code Components
58	   extracted from this document must include Simplified BSD License text
59	   as described in Section 4.e of the Trust Legal Provisions and are
60	   provided without warranty as described in the Simplified BSD License.

62	Table of Contents

64	   1.  Introduction
65	     1.1.  Rate-limiting and quotas
66	     1.2.  Current landscape of rate-limiting headers
67	       1.2.1.  Interoperability issues
68	     1.3.  This proposal
69	     1.4.  Goals
70	     1.5.  Notational Conventions
71	   2.  Expressing rate-limit policies
72	     2.1.  Time window
73	     2.2.  Request quota
74	     2.3.  Quota policy
75	   3.  Header Specifications
76	     3.1.  RateLimit-Limit
77	     3.2.  RateLimit-Remaining
78	     3.3.  RateLimit-Reset
79	   4.  Providing RateLimit headers
80	   5.  Intermediaries
81	   6.  Caching
82	   7.  Receiving RateLimit headers
83	   8.  Examples
84	     8.1.  Unparameterized responses
85	       8.1.1.  Throttling informations in responses
86	       8.1.2.  Use in conjunction with custom headers
87	       8.1.3.  Use for limiting concurrency
88	       8.1.4.  Use in throttled responses
89	     8.2.  Parameterized responses
90	       8.2.1.  Throttling window specified via parameter
91	       8.2.2.  Dynamic limits with parameterized windows
92	       8.2.3.  Dynamic limits for pushing back and slowing down
93	     8.3.  Dynamic limits for pushing back with Retry-After and slow
94	           down
95	       8.3.1.  Missing Remaining informations
96	       8.3.2.  Use with multiple windows
97	   9.  Security Considerations
98	     9.1.  Throttling does not prevent clients from issuing
99	           requests
100	     9.2.  Information disclosure
101	     9.3.  Remaining quota-units are not granted requests
102	     9.4.  Reliability of RateLimit-Reset
103	     9.5.  Resource exhaustion
104	     9.6.  Denial of Service
105	   10. IANA Considerations
106	     10.1.  RateLimit-Limit Field Registration
107	     10.2.  RateLimit-Remaining Field Registration
108	     10.3.  RateLimit-Reset Field Registration
109	   11. References
110	     11.1.  Normative References
111	     11.2.  Informative References
112	   Appendix A.  Change Log
113	   Appendix B.  Acknowledgements
114	   Appendix C.  RateLimit headers currently used on the web
115	   Appendix D.  FAQ
116	   Authors' Addresses

118	1.  Introduction

120	   The widespread use of HTTP as a distributed computation protocol
121	   requires an explicit way of communicating service status and usage
122	   quotas.

124	   This was partially addressed with the "Retry-After" header field
125	   defined in [SEMANTICS] to be returned in "429 Too Many Requests" or
126	   "503 Service Unavailable" responses.

128	   Still, there is no standard way to communicate service quotas so
129	   that the client can throttle its requests and avoid 4xx or 5xx
130	   responses.

132	1.1.  Rate-limiting and quotas

134	   Servers use quota mechanisms to avoid system overload, to ensure an
135	   equitable distribution of computational resources, or to enforce
136	   other policies (e.g. monetization).

138	   A basic quota mechanism limits the number of acceptable requests in a
139	   given time window, e.g. 10 requests per second.

141	   When the quota is exceeded, servers usually do not serve the request,
142	   replying instead with a "4xx" HTTP status code (e.g. 429 or 403), or
143	   adopt more aggressive policies such as dropping connections.

145	   Quotas may be enforced on different bases (e.g. per user, per IP, per
146	   geographic area) and at different levels.  For example, a user may
147	   be allowed to issue:

149	   *  10 requests per second;

151	   *  limited to 60 requests per minute;

153	   *  limited to 1000 requests per hour.

155	   Moreover, system metrics, statistics and heuristics can be used to
156	   implement more complex policies, where the number of acceptable
157	   requests and the time window are computed dynamically.
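
   The layered limits listed above could be evaluated as in the
   following minimal sketch.  It assumes fixed windows aligned to
   multiples of the window length; the class name "LayeredQuota" and
   its methods are hypothetical and are not defined by this document.

```python
from collections import defaultdict


class LayeredQuota:
    """Fixed-window counters for several (limit, window-seconds) pairs."""

    def __init__(self, limits):
        # limits: iterable of (max_requests, window_seconds)
        self.limits = list(limits)
        # (window_seconds, window_index) -> requests counted so far
        self.counters = defaultdict(int)

    def allow(self, now):
        """Return True and count the request if every level has room left."""
        keys = [(window, int(now // window)) for _, window in self.limits]
        for (limit, _), key in zip(self.limits, keys):
            if self.counters[key] >= limit:
                return False
        for key in keys:
            self.counters[key] += 1
        return True


quota = LayeredQuota([(10, 1), (60, 60), (1000, 3600)])
# Twelve requests within the same second: only the first ten pass.
burst = [quota.allow(now=0.5) for _ in range(12)]
```

   The sketch never purges counters of past windows; a real server
   would need to expire them.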

159	1.2.  Current landscape of rate-limiting headers

161	   To help clients throttle their requests, servers may expose the
162	   counters used to evaluate quota policies via HTTP header fields.

164	   Those response headers may be added by HTTP intermediaries such as
165	   API gateways and reverse proxies.

167	   On the web we can find many different rate-limit headers, usually
168	   containing the number of allowed requests in a given time window, and
169	   when the window is reset.

171	   The common choice is to return three headers containing:

173	   *  the maximum number of allowed requests in the time window;

175	   *  the number of remaining requests in the current window;

177	   *  the time remaining in the current window expressed in seconds or
178	      as a timestamp.

180	1.2.1.  Interoperability issues

182	   A major interoperability issue in throttling is the lack of standard
183	   headers, because:

185	   *  each implementation associates different semantics with the same
186	      header field names;

188	   *  header field names proliferate.

190	   Client applications interfacing with different servers may thus need
191	   to process different headers, or the very same application interface
192	   that sits behind different reverse proxies may reply with different
193	   throttling headers.

195	1.3.  This proposal

197	   This proposal defines syntax and semantics for the following fields:

199	   *  "RateLimit-Limit": containing the request quota in the time
200	      window;

202	   *  "RateLimit-Remaining": containing the remaining request quota in
203	      the current window;

205	   *  "RateLimit-Reset": containing the time remaining in the current
206	      window, specified in seconds.

208	   The behavior of "RateLimit-Reset" is compatible with the "delta-
209	   seconds" notation of "Retry-After".

211	   The field definitions allow describing complex policies, including
212	   the ones using multiple and variable time windows and dynamic quotas,
213	   or implementing concurrency limits.

215	1.4.  Goals

217	   The goals of this proposal are:

219	   1.  Standardize the names and semantics of rate-limit headers;

221	   2.  Improve the resiliency of HTTP infrastructures by simplifying the
222	       enforcement and the adoption of rate-limit headers;

224	   3.  Simplify API documentation by removing the need to spell out the
225	       semantics of rate-limit fields.

227	   The goals do not include:

229	   Authorization:  The rate-limit headers described here are not meant
230	      to support authorization or other kinds of access controls.

232	   Throttling scope:  This specification does not cover the throttling
233	      scope, which may be the given resource-target, its parent path or
234	      the whole Origin [RFC6454] section 7.

236	   Response status code:  The rate-limit headers may be returned in both
237	      Successful and non-Successful responses.  This specification does
238	      not cover whether non-Successful responses count towards quota
239	      usage.

240	   Throttling policy:  This specification does not mandate a specific
241	      throttling policy.  The values published in the headers, including
242	      the window size, can be statically or dynamically evaluated.

244	   Service Level Agreement:  Conveyed quota hints do not imply any
245	      service guarantee.  The server is free to throttle respectful
246	      clients under certain circumstances.

248	1.5.  Notational Conventions

250	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
251	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
252	   "OPTIONAL" in this document are to be interpreted as described in
253	   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
254	   capitals, as shown here.

256	   This document uses the Augmented BNF defined in [RFC5234] and updated
257	   by [RFC7405] along with the "#rule" extension defined in Section 7 of
258	   [MESSAGING].

260	   The term Origin is to be interpreted as described in [RFC6454]
261	   section 7.

263	   The "delta-seconds" rule is defined in [CACHING] section 1.2.1.

265	2.  Expressing rate-limit policies

267	2.1.  Time window

269	   Rate limit policies limit the number of acceptable requests in a
270	   given time window.

272	   A time window is expressed in seconds, using the following syntax:

274	   time-window = delta-seconds

276	   Subsecond precision is not supported.

278	2.2.  Request quota

280	   The request-quota is a value associated with the maximum number of
281	   requests that the server is willing to accept from one or more
282	   clients on a given basis (e.g. originating IP, authenticated user,
283	   geographical area) during a "time-window" as defined in Section 2.1.

285	   The "request-quota" is expressed in "quota-units" and has the
286	   following syntax:

288	      request-quota = quota-units
289	      quota-units = 1*DIGIT

291	   The "request-quota" SHOULD match the maximum number of acceptable
292	   requests.

294	   The "request-quota" MAY differ from the total number of acceptable
295	   requests when weight mechanisms, bursts, or other server policies are
296	   implemented.

298	   If the "request-quota" does not match the maximum number of
299	   acceptable requests, the relation between the two SHOULD be
300	   communicated out-of-band.

302	   Example: A server could

304	   *  count requests like "/books/{id}" once;

306	   *  count search requests like "/books?author=Camilleri" twice;

308	   so that we have the following counters:

310	GET /books/123                  ; request-quota=4, remaining: 3, status=200
311	GET /books?author=Camilleri     ; request-quota=4, remaining: 1, status=200
312	GET /books?author=Eco           ; request-quota=4, remaining: 0, status=429
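
   The counters above can be reproduced with the following minimal
   sketch.  The function names "serve" and "weight_of" are
   hypothetical, and the rule that a rejected request still drains the
   leftover quota-units is an assumption chosen only to match the
   "remaining: 0, status=429" line; the specification does not mandate
   it.

```python
def weight_of(target):
    """Search requests (those with a query string) count twice."""
    return 2 if "?" in target else 1


def serve(requests, request_quota, weight_of):
    """Yield (target, remaining, status) for each request in order.

    Assumption: a rejected request still drains whatever is left,
    so the counter drops to zero on the first 429.
    """
    remaining = request_quota
    for target in requests:
        cost = weight_of(target)
        status = 200 if cost <= remaining else 429
        remaining = max(0, remaining - cost)
        yield target, remaining, status


trace = list(serve(
    ["/books/123", "/books?author=Camilleri", "/books?author=Eco"],
    request_quota=4,
    weight_of=weight_of,
))
```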

314	2.3.  Quota policy

316	   This specification allows describing a quota policy with the
317	   following syntax:

319	      quota-policy = request-quota ";" "w" "=" time-window
320	                     *( OWS ";" OWS quota-comment)
321	      quota-comment = token "=" (token / quoted-string)

323	   quota-policy parameters like "w" and quota-comment tokens MUST NOT
324	   occur multiple times within the same quota-policy.

326	   An example policy of 100 quota-units per minute:

328	      100;w=60

330	   Two examples of providing further details via custom
331	   "quota-comment" parameters:

333	      100;w=60;comment="fixed window"
334	      12;w=1;burst=1000;policy="leaky bucket"
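
   A minimal sketch of a parser for single "quota-policy" items such as
   the ones above.  The function name "parse_quota_policy" is
   hypothetical.  The sketch is more lenient than the grammar in two
   ways that are assumptions of this example: it tolerates optional
   whitespace around ";" everywhere and accepts the "w" parameter in
   any position.

```python
import re

TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
PARAM = re.compile(
    r'\s*;\s*(?P<name>%s)=(?:(?P<tok>%s)|"(?P<quoted>(?:[^"\\]|\\.)*)")'
    % (TOKEN, TOKEN)
)


def parse_quota_policy(text):
    """Parse e.g. '100;w=60;comment="fixed window"'.

    Returns (request_quota, window_seconds, comments_dict).
    """
    text = text.strip()
    head = re.match(r"[0-9]+", text)
    if not head:
        raise ValueError("missing request-quota")
    quota, pos, params = int(head.group(0)), head.end(), {}
    while pos < len(text):
        param = PARAM.match(text, pos)
        if not param:
            raise ValueError("malformed parameter at offset %d" % pos)
        name = param.group("name")
        if name in params:
            # Parameters MUST NOT occur multiple times.
            raise ValueError("duplicate parameter %r" % name)
        if param.group("tok") is not None:
            params[name] = param.group("tok")
        else:
            # Undo backslash escapes inside the quoted-string.
            params[name] = re.sub(r"\\(.)", r"\1", param.group("quoted"))
        pos = param.end()
    window = params.pop("w", "")
    if not re.fullmatch(r"[0-9]+", window):
        raise ValueError("missing or non-numeric 'w' parameter")
    return quota, int(window), params
```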

336	3.  Header Specifications

338	   The following "RateLimit" response fields are defined.

340	3.1.  RateLimit-Limit

342	   The "RateLimit-Limit" response field indicates the "request-quota"
343	   associated with the client in the current "time-window".

345	   If the client exceeds that limit, it might not be served.

347	   The header value is

349	      RateLimit-Limit = expiring-limit [ OWS "," OWS 1#quota-policy ]
350	      expiring-limit = request-quota

352	   The "expiring-limit" value MUST be set to the "request-quota" that is
353	   closest to reaching its limit.

355	   The "quota-policy" is defined in Section 2.3, and its values are
356	   informative.

358	      RateLimit-Limit: 100

360	   A "time-window" associated with "expiring-limit" can be communicated
361	   via an optional "quota-policy" value, as shown in the following
362	   example:

364	      RateLimit-Limit: 100, 100;w=10

366	   If the "expiring-limit" is not associated with a "time-window", the
367	   "time-window" MUST either be:

369	   *  inferred from the value of "RateLimit-Reset" at the moment of the
370	      reset, or

372	   *  communicated out-of-band (e.g. in the documentation).

374	   Policies using multiple quota limits MAY be returned using multiple
375	   "quota-policy" items, as shown in the following two examples:

377	      RateLimit-Limit: 10, 10;w=1, 50;w=60, 1000;w=3600, 5000;w=86400
378	      RateLimit-Limit: 10, 10;w=1;burst=1000, 1000;w=3600

380	   This header MUST NOT occur multiple times and can be sent in a
381	   trailer section.
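
   As an illustration, a server could assemble the field value from an
   "expiring-limit" and a list of policies as in the minimal sketch
   below.  The helper names "format_param" and "format_ratelimit_limit"
   and the (quota, window, parameters) tuple layout are assumptions of
   this example.

```python
import re

TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def format_param(name, value):
    """Render name=value, quoting the value when it is not a plain token."""
    if TOKEN.fullmatch(value):
        return "%s=%s" % (name, value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '%s="%s"' % (name, escaped)


def format_ratelimit_limit(expiring_limit, policies=()):
    """Build a RateLimit-Limit field value.

    policies: iterable of (request_quota, window_seconds, params_dict).
    """
    items = [str(int(expiring_limit))]
    for quota, window, params in policies:
        parts = ["%d" % quota, "w=%d" % window]
        parts += [format_param(n, v) for n, v in params.items()]
        items.append(";".join(parts))
    return ", ".join(items)


value = format_ratelimit_limit(
    10, [(10, 1, {"burst": "1000"}), (1000, 3600, {})]
)
```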

383	3.2.  RateLimit-Remaining

385	   The "RateLimit-Remaining" response field indicates the remaining
386	   "quota-units" defined in Section 2.2 associated with the client.

388	   The header value is

390	      RateLimit-Remaining = quota-units

392	   This header MUST NOT occur multiple times and can be sent in a
393	   trailer section.

395	   Clients MUST NOT assume that a positive "RateLimit-Remaining" value
396	   is a guarantee of being served.

398	   A low "RateLimit-Remaining" value is like a yellow traffic-light: the
399	   red light may arrive suddenly.

401	   One example of "RateLimit-Remaining" use is below.

403	      RateLimit-Remaining: 50

405	3.3.  RateLimit-Reset

407	   The "RateLimit-Reset" response field indicates the number of seconds
409	   until the quota resets.

411	   The header value is

413	      RateLimit-Reset = delta-seconds

415	   The delta-seconds format is used because:

417	   *  it does not rely on clock synchronization and is resilient to
418	      clock adjustment and clock skew between client and server (see
419	      [SEMANTICS] Section 4.1.1.1);

421	   *  it mitigates the risk related to thundering herd when too many
422	      clients are serviced with the same timestamp.

424	   This header MUST NOT occur multiple times and can be sent in a
425	   trailer section.

427	   An example of "RateLimit-Reset" use is below.

429	      RateLimit-Reset: 50

431	   The client MUST NOT assume that all its "request-quota" will be
432	   restored after the moment referenced by "RateLimit-Reset".  The
433	   server MAY arbitrarily alter the "RateLimit-Reset" value between
434	   subsequent requests, e.g. in case of resource saturation or to
435	   implement sliding window policies.

437	4.  Providing RateLimit headers

439	   A server MAY use one or more "RateLimit" response fields defined in
440	   this document to communicate its quota policies.

442	   The returned values refer to the metrics used to evaluate whether
443	   the current request respects the quota policy and might not apply to
444	   subsequent requests.

446	   Example: a successful response with the following fields

448	      RateLimit-Limit: 10
449	      RateLimit-Remaining: 1
450	      RateLimit-Reset: 7

452	   does not guarantee that the next request will be successful.  Server
453	   metrics may be subject to other conditions like the one shown in the
454	   example from Section 2.2.

456	   A server MAY return "RateLimit" response fields independently of the
457	   response status code.  This includes throttled responses.

459	   If a response contains both the "Retry-After" and the "RateLimit-
460	   Reset" fields, the value of "RateLimit-Reset" SHOULD reference the
461	   same point in time as "Retry-After".

463	   When using a policy involving more than one "time-window", the server
464	   MUST reply with the "RateLimit" headers related to the window with
465	   the lowest "RateLimit-Remaining" value.
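
   The selection rule for multiple windows could be sketched as
   follows.  The function name "ratelimit_fields", the dictionary
   layout, the tie-breaking rule and the hourly counter values in the
   usage lines are assumptions made for illustration only.

```python
def ratelimit_fields(windows):
    """Pick the window with the lowest remaining quota-units.

    windows: list of dicts with integer 'limit', 'remaining' and
    'reset' (seconds).  Ties are broken in favour of the later reset,
    which is an assumption of this sketch.
    """
    tightest = min(windows, key=lambda w: (w["remaining"], -w["reset"]))
    return {
        "RateLimit-Limit": str(tightest["limit"]),
        "RateLimit-Remaining": str(tightest["remaining"]),
        "RateLimit-Reset": str(tightest["reset"]),
    }


fields = ratelimit_fields([
    {"limit": 1000, "remaining": 700, "reset": 1200},    # hourly window
    {"limit": 5000, "remaining": 100, "reset": 36000},   # daily window
])
```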

467	   Under certain conditions, a server MAY artificially lower "RateLimit"
468	   field values between subsequent requests, e.g. to respond to Denial
469	   of Service attacks or in case of resource saturation.

471	   Servers usually establish whether the request is in-quota before
472	   creating a response, so the RateLimit field values should already be
473	   available at that moment.  Nonetheless, servers MAY decide to send
474	   the "RateLimit" fields in a trailer section.

476	5.  Intermediaries

478	   This section documents the considerations advised in Section 15.3.3
479	   of [SEMANTICS].

481	   An intermediary that is not part of the originating service
482	   infrastructure and is not aware of the quota-policy semantic used by
483	   the Origin Server SHOULD NOT alter the RateLimit fields' values in
484	   such a way as to communicate a more permissive quota-policy; this
485	   includes removing the RateLimit fields.

487	   An intermediary MAY alter the RateLimit fields in such a way as to
488	   communicate a more restrictive quota-policy when:

490	   *  it is aware of the quota-unit semantic used by the Origin Server;

492	   *  it implements this specification and enforces a quota-policy which
493	      is more restrictive than the one conveyed in the fields.

495	   An intermediary SHOULD forward a request even when presuming that it
496	   might not be serviced; the service returning the RateLimit fields is
497	   solely responsible for enforcing the communicated quota-policy, and
498	   it is always free to service incoming requests.

500	   This specification does not mandate any behavior on intermediaries
501	   with respect to retries, nor does it require that intermediaries have
502	   any role in respecting quota-policies.  For example, it is legitimate
503	   for a proxy to retransmit a request without notifying the client,
504	   thus consuming quota-units.

506	6.  Caching

508	   As is the ordinary case for HTTP caching ([RFC7234]), a response with
509	   RateLimit fields might be cached and re-used for subsequent requests.
510	   A cached RateLimit response does not modify quota counters but could
511	   contain stale information.  Clients interested in determining the
512	   freshness of the RateLimit fields could rely on fields such as "Date"
513	   and on the "time-window" value of a "quota-policy".
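
   One possible freshness check is sketched below.  It treats the
   RateLimit fields as stale once the moment "Date" plus
   "RateLimit-Reset" has passed.  The function name
   "ratelimit_is_stale" is hypothetical, and the sketch assumes that
   the client clock is roughly aligned with the server clock.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def ratelimit_is_stale(date_header, reset_seconds, now):
    """True when the window described by the response has already reset.

    date_header: value of the Date field of the (possibly cached) response.
    now: timezone-aware datetime of the moment the fields are consulted.
    """
    generated = parsedate_to_datetime(date_header)
    return now >= generated + timedelta(seconds=reset_seconds)


cached_date = "Mon, 05 Aug 2019 09:27:00 GMT"
fresh = ratelimit_is_stale(
    cached_date, 5, datetime(2019, 8, 5, 9, 27, 4, tzinfo=timezone.utc)
)
stale = ratelimit_is_stale(
    cached_date, 5, datetime(2019, 8, 5, 9, 27, 5, tzinfo=timezone.utc)
)
```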

515	7.  Receiving RateLimit headers

517	   A client MUST process the received "RateLimit" headers.

519	   A client MUST validate the values received in the "RateLimit" headers
520	   before using them and check if there are significant discrepancies
521	   with the expected ones.  This includes a "RateLimit-Reset" moment too
522	   far in the future or a "request-quota" too high.

524	   Malformed "RateLimit" headers MAY be ignored.
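
   A minimal sketch of such a validation step follows.  The function
   name "validate_ratelimit" and the thresholds "max_reset" and
   "max_quota" are assumptions; this document does not define what
   counts as "too far" or "too high".  The sketch also requires all
   three fields to be present, which a real client may want to relax.

```python
import re


def _number(value):
    """Parse a 1*DIGIT value, rejecting signs and non-ASCII digits."""
    value = value.strip()
    if not re.fullmatch(r"[0-9]+", value):
        raise ValueError("not a 1*DIGIT value: %r" % value)
    return int(value)


def validate_ratelimit(headers, max_reset=86400, max_quota=1_000_000):
    """Return (limit, remaining, reset) or None when the fields are unusable.

    headers: mapping from lower-cased field names to raw field values.
    """
    try:
        # Only the expiring-limit (first list item) is needed here.
        limit = _number(headers["ratelimit-limit"].split(",", 1)[0])
        remaining = _number(headers["ratelimit-remaining"])
        reset = _number(headers["ratelimit-reset"])
    except (KeyError, ValueError):
        return None  # missing or malformed: ignore the fields
    if reset > max_reset or limit > max_quota:
        return None  # implausible values: do not act on them
    return limit, remaining, reset


checked = validate_ratelimit({
    "ratelimit-limit": "100, 100;w=60",
    "ratelimit-remaining": "99",
    "ratelimit-reset": "50",
})
```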

526	   A client SHOULD NOT exceed the "quota-units" expressed in "RateLimit-
527	   Remaining" before the "time-window" expressed in "RateLimit-Reset".
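
   One way for a client to stay within the advertised quota is to
   spread its requests evenly over the time left, as in this minimal
   sketch.  The function name "pacing_delay" is hypothetical, and the
   sketch assumes that every request costs exactly one quota-unit.

```python
def pacing_delay(remaining, reset_seconds):
    """Seconds to wait between requests so the quota lasts until the reset."""
    if remaining <= 0:
        # Nothing left: wait for the reset before sending again.
        return float(reset_seconds)
    return reset_seconds / remaining


delay_when_exhausted = pacing_delay(0, 50)
delay_with_quota = pacing_delay(20, 56)
```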

529	   A client MAY still probe the server if the "RateLimit-Reset" is
530	   considered too high.

532	   The value of "RateLimit-Reset" is generated at response time: a
533	   client aware of a significant network latency MAY behave accordingly
534	   and use other information (e.g. the "Date" response header, or
535	   otherwise gathered metrics) to better estimate the "RateLimit-Reset"
536	   moment intended by the server.

538	   The "quota-policy" values and comments provided in "RateLimit-Limit"
539	   are informative and MAY be ignored.

541	   If a response contains both the "RateLimit-Reset" and "Retry-After"
542	   fields, the "Retry-After" header field MUST take precedence and the
543	   "RateLimit-Reset" field MAY be ignored.
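
   The precedence rule can be sketched as follows.  The function name
   "wait_seconds" is hypothetical; the sketch assumes that the field
   values were already validated, and it handles both the
   delta-seconds and the HTTP-date forms of "Retry-After".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def wait_seconds(headers, now):
    """Seconds to wait before the next request.

    headers: mapping from lower-cased field names to raw values.
    now: timezone-aware datetime at which the response was received.
    Retry-After, when present, wins over RateLimit-Reset.
    """
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        value = retry_after.strip()
        if value.isascii() and value.isdigit():
            return float(value)  # delta-seconds form
        # HTTP-date form: convert to a delay relative to `now`.
        moment = parsedate_to_datetime(value)
        return max(0.0, (moment - now).total_seconds())
    reset = headers.get("ratelimit-reset")
    return float(reset) if reset is not None else 0.0


received = datetime(2019, 8, 5, 9, 27, 0, tzinfo=timezone.utc)
from_date = wait_seconds(
    {"retry-after": "Mon, 05 Aug 2019 09:27:05 GMT", "ratelimit-reset": "5"},
    received,
)
from_delta = wait_seconds(
    {"retry-after": "20", "ratelimit-reset": "40"}, received
)
```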

545	8.  Examples

547	8.1.  Unparameterized responses

549	8.1.1.  Throttling informations in responses

551	   The client has exhausted its request-quota for the next 50 seconds.
552	   The "time-window" is communicated out-of-band or inferred from the
553	   header values.

555	   Request:

557	   GET /items/123

559	   Response:

561	   HTTP/1.1 200 OK
562	   Content-Type: application/json
563	   RateLimit-Limit: 100
564	   RateLimit-Remaining: 0
565	   RateLimit-Reset: 50

567	   {"hello": "world"}

569	8.1.2.  Use in conjunction with custom headers

571	   The server uses two custom headers, namely "acme-RateLimit-DayLimit"
572	   and "acme-RateLimit-HourLimit" to expose the following policy:

574	   *  5000 daily quota-units;

576	   *  1000 hourly quota-units.

578	   The client consumed 4900 quota-units in the first 14 hours.

580	   Despite the next hourly limit of 1000 quota-units, the closest limit
581	   to reach is the daily one.

583	   The server then exposes the "RateLimit-*" headers to inform the
584	   client that:

586	   *  it has only 100 quota-units left;

588	   *  the window will reset in 10 hours.

590	   Request:

592	   GET /items/123

594	   Response:

596	   HTTP/1.1 200 OK
597	   Content-Type: application/json
598	   acme-RateLimit-DayLimit: 5000
599	   acme-RateLimit-HourLimit: 1000
600	   RateLimit-Limit: 5000
601	   RateLimit-Remaining: 100
602	   RateLimit-Reset: 36000

604	   {"hello": "world"}

606	8.1.3.  Use for limiting concurrency

608	   Throttling headers may be used to limit concurrency, advertising
609	   limits that are lower than the usual ones in case of saturation, thus
610	   increasing availability.

612	   The server adopted a basic policy of 100 quota-units per minute, and
613	   in case of resource exhaustion adapts the returned values reducing
614	   both "RateLimit-Limit" and "RateLimit-Remaining".

616	   After 2 seconds the client consumed 40 quota-units.

617	   Request:

619	   GET /items/123

621	   Response:

623	   HTTP/1.1 200 OK
624	   Content-Type: application/json
625	   RateLimit-Limit: 100
626	   RateLimit-Remaining: 60
627	   RateLimit-Reset: 58

629	   {"elapsed": 2, "issued": 40}

631	   At the subsequent request - due to resource exhaustion - the server
632	   advertises only "RateLimit-Remaining: 20".

634	   Request:

636	   GET /items/123

638	   Response:

640	   HTTP/1.1 200 OK
641	   Content-Type: application/json
642	   RateLimit-Limit: 100
643	   RateLimit-Remaining: 20
644	   RateLimit-Reset: 56

646	   {"elapsed": 4, "issued": 41}

648	8.1.4.  Use in throttled responses

650	   A client exhausted its quota and the server throttles the request,
651	   sending the "Retry-After" response header field.

653	   In this example, the values of "Retry-After" and "RateLimit-Reset"
654	   reference the same moment, but this is not a requirement.

656	   The "429 Too Many Requests" HTTP status code is just used as an
657	   example.

659	   Request:

661	   GET /items/123

663	   Response:

665	   HTTP/1.1 429 Too Many Requests
666	   Content-Type: application/json
667	   Date: Mon, 05 Aug 2019 09:27:00 GMT
668	   Retry-After: Mon, 05 Aug 2019 09:27:05 GMT
669	   RateLimit-Reset: 5
670	   RateLimit-Limit: 100
671	   RateLimit-Remaining: 0

673	   {
674	   "title": "Too Many Requests",
675	   "status": 429,
676	   "detail": "You have exceeded your quota"
677	   }

679	8.2.  Parameterized responses

681	8.2.1.  Throttling window specified via parameter

683	   The client has 99 "quota-units" left for the next 50 seconds.  The
684	   "time-window" is communicated by the "w" parameter, so we know the
685	   throughput is 100 "quota-units" per minute.

687	   Request:

689	   GET /items/123

691	   Response:

693	   HTTP/1.1 200 OK
694	   Content-Type: application/json
695	   RateLimit-Limit: 100, 100;w=60
696	   RateLimit-Remaining: 99
697	   RateLimit-Reset: 50

699	   {"hello": "world"}

701	8.2.2.  Dynamic limits with parameterized windows

703	   The policy conveyed by "RateLimit-Limit" states that the server
704	   accepts 100 quota-units per minute.

706	   To avoid resource exhaustion, the server artificially lowers the
707	   actual limits returned in the throttling headers.

709	   The "RateLimit-Remaining" then advertises only 9 quota-units for the
710	   next 50 seconds to slow down the client.

712	   Note that the server could have lowered even the other values in
713	   "RateLimit-Limit": this specification does not mandate any relation
714	   between the field values contained in subsequent responses.

716	   Request:

718	   GET /items/123

720	   Response:

722	   HTTP/1.1 200 OK
723	   Content-Type: application/json
724	   RateLimit-Limit: 10, 100;w=60
725	   RateLimit-Remaining: 9
726	   RateLimit-Reset: 50

728	   {
729	     "status": 200,
730	     "detail": "Just slow down without waiting."
731	   }

733	8.2.3.  Dynamic limits for pushing back and slowing down

735	   Continuing the previous example, let's say the client waits 10
736	   seconds and performs a new request which, due to resource exhaustion,
737	   the server rejects and pushes back, advertising "RateLimit-Remaining:
738	   0" for the next 20 seconds.

740	   The server advertises a smaller window with a lower limit to slow
741	   down the client for the rest of its original window after the 20
742	   seconds elapse.

744	   Request:

746	   GET /items/123

748	   Response:

750	   HTTP/1.1 429 Too Many Requests
751	   Content-Type: application/json
752	   RateLimit-Limit: 0, 15;w=20
753	   RateLimit-Remaining: 0
754	   RateLimit-Reset: 20

756	   {
757	     "status": 429,
758	     "detail": "Wait 20 seconds, then slow down!"
759	   }

761	8.3.  Dynamic limits for pushing back with Retry-After and slow down

763	   Alternatively, given the same context where the previous example
764	   starts, we can convey the same information to the client via the
765	   Retry-After header.  The advantage is that the server can now
766	   specify the policy's nominal limit and window that will apply after
767	   the reset, i.e. assuming the resource exhaustion is likely to be gone
768	   by then.  The advertised policy does not need to be adjusted, yet
769	   the server manages to stop requests for a while and to slow down the
770	   rest of the current window.

772	   Request:

774	   GET /items/123

776	   Response:

778	   HTTP/1.1 429 Too Many Requests
779	   Content-Type: application/json
780	   Retry-After: 20
781	   RateLimit-Limit: 15, 100;w=60
782	   RateLimit-Remaining: 15
783	   RateLimit-Reset: 40

785	   {
786	     "status": 429,
787	     "detail": "Wait 20 seconds, then slow down!"
788	   }

790	   Note that in this last response the client is expected to honor the
791	   "Retry-After" header and perform no requests for the specified amount
792	   of time, whereas the previous example would not force the client to
793	   stop requests before the reset time has elapsed: it would still be
794	   free to query the server again, even if the request is likely to be
795	   rejected.
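
   A cooperative client could implement the distinction above roughly as
   follows.  This is a minimal sketch, not part of this specification:
   the names "delta_seconds" and "seconds_to_wait" are hypothetical,
   only the delta-seconds form of "Retry-After" is handled, and waiting
   for "RateLimit-Reset" when no quota-units remain is an assumed client
   policy rather than a requirement.

```python
from typing import Mapping, Optional


def delta_seconds(value: Optional[str]) -> Optional[int]:
    """Parse a non-negative integer field value; None if absent or malformed."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return int(value)
    return None


def seconds_to_wait(headers: Mapping[str, str]) -> int:
    """Return how long a cooperative client pauses before its next request."""
    # Field names are case-insensitive, so normalize them before lookup.
    fields = {name.lower(): value for name, value in headers.items()}

    # Retry-After takes precedence: the client is expected to honor it.
    retry_after = delta_seconds(fields.get("retry-after"))
    if retry_after is not None:
        return retry_after

    # Assumed policy: with no quota-units left, wait for the reset even
    # though the headers alone do not force the client to stop.
    remaining = delta_seconds(fields.get("ratelimit-remaining"))
    reset = delta_seconds(fields.get("ratelimit-reset"))
    if remaining == 0 and reset is not None:
        return reset
    return 0
```

   With the headers of the last response this returns 20 (from
   "Retry-After"), not 40 (from "RateLimit-Reset").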

797	8.3.1.  Missing Remaining information

799	   The server does not expose "RateLimit-Remaining" values, but resets
800	   the limit counter every second.

802	   It communicates to the client the limit of 10 quota-units per second
803	   by always returning the pair "RateLimit-Limit" and "RateLimit-Reset".

805	   Request:

807	   GET /items/123
808	   Response:

810	   HTTP/1.1 200 OK
811	   Content-Type: application/json
812	   RateLimit-Limit: 10
813	   RateLimit-Reset: 1

815	   {"first": "request"}

817	   Request:

819	   GET /items/123

821	   Response:

823	   HTTP/1.1 200 OK
824	   Content-Type: application/json
825	   RateLimit-Limit: 10
826	   RateLimit-Reset: 1

828	   {"second": "request"}
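
   When "RateLimit-Remaining" is absent, a client may keep its own
   estimate of the remaining quota-units.  The following is a minimal
   sketch under several assumptions that are not stated by the server:
   a fixed window, one quota-unit per request, and no other client
   sharing the quota.  The class name "LocalQuotaTracker" is
   hypothetical.

```python
from typing import Optional


class LocalQuotaTracker:
    """Client-side estimate of remaining quota-units when the server omits them."""

    def __init__(self) -> None:
        self.used = 0
        self.window_ends_at: Optional[float] = None

    def record_response(self, limit: int, reset: int, now: float) -> int:
        """Count one served request and return the estimated remaining units.

        `now` is a monotonic clock reading in seconds, e.g. time.monotonic().
        """
        if self.window_ends_at is None or now >= self.window_ends_at:
            # A new window started: forget the old count.
            self.used = 0
            self.window_ends_at = now + reset
        self.used += 1
        return max(limit - self.used, 0)
```

   The estimate is only a hint: the server remains the authority on
   whether a request is served.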

830	8.3.2.  Use with multiple windows

832	   This is a standardized way of describing the policy detailed in
833	   Section 8.1.2:

835	   *  5000 daily quota-units;

837	   *  1000 hourly quota-units.

839	   The client consumed 4900 quota-units in the first 14 hours.

841	   Despite the next hourly limit of 1000 quota-units, the closest limit
842	   to reach is the daily one.

844	   The server then exposes the "RateLimit" headers to inform the client
845	   that:

847	   *  it has only 100 quota-units left;

849	   *  the window will reset in 10 hours;

851	   *  the "expiring-limit" is 5000.

853	   Request:

855	   GET /items/123
856	   Response:

858	   HTTP/1.1 200 OK
859	   Content-Type: application/json
860	   RateLimit-Limit: 5000, 1000;w=3600, 5000;w=86400
861	   RateLimit-Remaining: 100
862	   RateLimit-Reset: 36000

864	   {"hello": "world"}
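
   A server could select the window to advertise roughly as sketched
   below.  This is an illustration only: the names "WindowState" and
   "ratelimit_fields" are hypothetical, and the hourly usage and hourly
   reset used in the usage note are assumed values, since the example
   above only states the daily figures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WindowState:
    limit: int   # quota-units allowed in the window
    window: int  # window size in seconds
    used: int    # quota-units consumed so far in the window
    reset: int   # seconds until the window resets


def ratelimit_fields(states: List[WindowState]) -> Dict[str, str]:
    """Build the RateLimit fields for the window closest to being exhausted."""
    closest = min(states, key=lambda s: s.limit - s.used)
    policies = ", ".join(f"{s.limit};w={s.window}" for s in states)
    return {
        # First item is the expiring-limit, followed by every quota-policy.
        "RateLimit-Limit": f"{closest.limit}, {policies}",
        "RateLimit-Remaining": str(max(closest.limit - closest.used, 0)),
        "RateLimit-Reset": str(closest.reset),
    }
```

   Called with an hourly state of (1000, 3600, used 0, reset 3600) and a
   daily state of (5000, 86400, used 4900, reset 36000), it yields the
   three field values shown in the response above.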

866	9.  Security Considerations

868	9.1.  Throttling does not prevent clients from issuing requests

870	   This specification does not prevent clients from making over-quota
871	   requests.

873	   Servers should always implement mechanisms to prevent resource
874	   exhaustion.

876	9.2.  Information disclosure

878	   Servers should not disclose operational capacity information that
879	   can be used to saturate their resources.

881	   While this specification does not mandate whether non-2xx responses
882	   consume quota, if 401 and 403 responses count against the quota, a
883	   malicious client could probe the endpoint to obtain traffic
884	   information about another user.

886	   As intermediaries might retransmit requests and consume quota-units
887	   without prior knowledge of the User Agent, RateLimit headers might
888	   reveal the existence of an intermediary to the User Agent.

890	9.3.  Remaining quota-units are not granted requests

892	   "RateLimit-*" headers convey hints from the server to clients, to
893	   help them avoid being throttled out.

895	   Clients MUST NOT consider the "quota-units" returned in "RateLimit-
896	   Remaining" as a service level agreement.

898	   In case of resource saturation, the server MAY artificially lower the
899	   returned values or not serve the request anyway.

901	9.4.  Reliability of RateLimit-Reset

903	   Consider that "request-quota" may not be restored after the moment
904	   referenced by "RateLimit-Reset", and the "RateLimit-Reset" value
905	   should be considered neither fixed nor constant.

907	   Subsequent requests may return a higher "RateLimit-Reset" value to
908	   limit concurrency or implement dynamic or adaptive throttling
909	   policies.

911	9.5.  Resource exhaustion

913	   When returning "RateLimit-Reset" you must be aware that many
914	   throttled clients may come back at the very moment specified.

916	   This is true for "Retry-After" too.

918	   For example, if the quota resets every day at "18:00:00" and your
919	   server returns the "RateLimit-Reset" accordingly

921	      Date: Tue, 15 Nov 1994 08:00:00 GMT
922	      RateLimit-Reset: 36000

924	   there's a high probability that all clients will show up at
925	   "18:00:00".

927	   This could be mitigated by adding some jitter to the field-value.
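
   One possible way to add jitter is sketched below.  The function name
   "reset_with_jitter" and the "max_jitter" parameter are hypothetical,
   and the choice of a uniform distribution is an assumption, not a
   recommendation of this document.

```python
import random


def reset_with_jitter(reset: int, max_jitter: int, rng: random.Random) -> int:
    """Return `reset` delayed by a random number of seconds in [0, max_jitter]."""
    # Jitter is only added, never subtracted, so no client is told to
    # come back before the quota is actually restored.
    return reset + rng.randint(0, max_jitter)
```

   For instance, "reset_with_jitter(36000, 60, random.Random())" spreads
   the returning clients over the minute following "18:00:00".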

929	9.6.  Denial of Service

931	   "RateLimit" fields may carry unexpected values, by accident or by
932	   design.  An excessively high "RateLimit-Remaining" value may be:

934	   *  used by a malicious intermediary to trigger a Denial of Service
935	      attack, or to consume client resources by boosting its requests;

937	   *  passed by a misconfigured server;

939	   or a high "RateLimit-Reset" value could prevent clients from
940	   contacting the server.

942	   Clients MUST validate the received values to mitigate those risks.
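
   A client-side validation step could look like the following sketch.
   The function names and the "max_remaining" and "max_reset" caps are
   hypothetical: this document does not define specific bounds, so each
   client would have to choose values that fit its own use case.

```python
from typing import Optional, Tuple


def parse_quota_value(value: str) -> Optional[int]:
    """Parse a non-negative integer field value; None if malformed."""
    value = value.strip()
    if value.isascii() and value.isdigit():
        return int(value)
    return None


def validate_ratelimit(
    remaining: str, reset: str, max_remaining: int, max_reset: int
) -> Optional[Tuple[int, int]]:
    """Return (remaining, reset) clamped to client-chosen caps.

    Returns None when either value is malformed, so that the caller can
    ignore the fields instead of acting on them.
    """
    parsed_remaining = parse_quota_value(remaining)
    parsed_reset = parse_quota_value(reset)
    if parsed_remaining is None or parsed_reset is None:
        return None
    # Clamp instead of trusting arbitrarily large values.
    return min(parsed_remaining, max_remaining), min(parsed_reset, max_reset)
```

   Clamping keeps an inflated "RateLimit-Remaining" from boosting the
   request rate and keeps an inflated "RateLimit-Reset" from silencing
   the client indefinitely.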

944	10.  IANA Considerations
945	10.1.  RateLimit-Limit Field Registration

947	   This section registers the "RateLimit-Limit" field in the "Hypertext
948	   Transfer Protocol (HTTP) Field Name Registry" registry ([SEMANTICS]).

950	   Field name: "RateLimit-Limit"

952	   Status: permanent

954	   Specification document(s): Section 3.1 of this document

956	10.2.  RateLimit-Remaining Field Registration

958	   This section registers the "RateLimit-Remaining" field in the
959	   "Hypertext Transfer Protocol (HTTP) Field Name Registry" registry
960	   ([SEMANTICS]).

962	   Field name: "RateLimit-Remaining"

964	   Status: permanent

966	   Specification document(s): Section 3.2 of this document

968	10.3.  RateLimit-Reset Field Registration

970	   This section registers the "RateLimit-Reset" field in the "Hypertext
971	   Transfer Protocol (HTTP) Field Name Registry" registry ([SEMANTICS]).

973	   Field name: "RateLimit-Reset"

975	   Status: permanent

977	   Specification document(s): Section 3.3 of this document

979	11.  References

981	11.1.  Normative References

983	   [CACHING]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
984	              Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching",
985	              RFC 7234, DOI 10.17487/RFC7234, June 2014,
986	              <https://www.rfc-editor.org/info/rfc7234>.

988	   [MESSAGING]
989	              Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
990	              Protocol (HTTP/1.1): Message Syntax and Routing",
991	              RFC 7230, DOI 10.17487/RFC7230, June 2014,
992	              <https://www.rfc-editor.org/info/rfc7230>.

994	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
995	              Requirement Levels", BCP 14, RFC 2119,
996	              DOI 10.17487/RFC2119, March 1997,
997	              <https://www.rfc-editor.org/info/rfc2119>.

999	   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
1000	              Specifications: ABNF", STD 68, RFC 5234,
1001	              DOI 10.17487/RFC5234, January 2008,
1002	              <https://www.rfc-editor.org/info/rfc5234>.

1004	   [RFC6454]  Barth, A., "The Web Origin Concept", RFC 6454,
1005	              DOI 10.17487/RFC6454, December 2011,
1006	              <https://www.rfc-editor.org/info/rfc6454>.

1008	   [RFC7405]  Kyzivat, P., "Case-Sensitive String Support in ABNF",
1009	              RFC 7405, DOI 10.17487/RFC7405, December 2014,
1010	              <https://www.rfc-editor.org/info/rfc7405>.

1012	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
1013	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
1014	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

1016	   [SEMANTICS]
1017	              Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
1018	              Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
1019	              DOI 10.17487/RFC7231, June 2014,
1020	              <https://www.rfc-editor.org/info/rfc7231>.

1022	   [UNIX]     The Open Group, "The Single UNIX Specification, Version
1023	              2 - 6 Vol Set for UNIX 98", February 1997.

1025	11.2.  Informative References

1027	   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
1028	              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002,
1029	              <https://www.rfc-editor.org/info/rfc3339>.

1031	   [RFC6585]  Nottingham, M. and R. Fielding, "Additional HTTP Status
1032	              Codes", RFC 6585, DOI 10.17487/RFC6585, April 2012,
1033	              <https://www.rfc-editor.org/info/rfc6585>.

1035	   [RFC7234]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
1036	              Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching",
1037	              RFC 7234, DOI 10.17487/RFC7234, June 2014,
1038	              <https://www.rfc-editor.org/info/rfc7234>.

1040	Appendix A.  Change Log

1042	   RFC EDITOR PLEASE DELETE THIS SECTION.

1044	Appendix B.  Acknowledgements

1046	   Thanks to Willi Schoenborn, Alejandro Martinez Ruiz, Alessandro
1047	   Ranellucci, Amos Jeffries, Martin Thomson, Erik Wilde and Mark
1048	   Nottingham for being the initial contributors of these
1049	   specifications.  Kudos to the first community implementors: Aapo
1050	   Talvensaari, Nathan Friedly and Sanyam Dogra.

1052	Appendix C.  RateLimit headers currently used on the web

1054	   RFC EDITOR PLEASE DELETE THIS SECTION.

1056	   Commonly used header field names are:

1058	   *  "X-RateLimit-Limit", "X-RateLimit-Remaining", "X-RateLimit-Reset";

1060	   *  "X-Rate-Limit-Limit", "X-Rate-Limit-Remaining", "X-Rate-Limit-
1061	      Reset".

1063	   There are variants too, where the window is specified in the header
1064	   field name, e.g.:

1066	   *  "x-ratelimit-limit-minute", "x-ratelimit-limit-hour", "x-
1067	      ratelimit-limit-day"

1069	   *  "x-ratelimit-remaining-minute", "x-ratelimit-remaining-hour", "x-
1070	      ratelimit-remaining-day"

1072	   Here are some interoperability issues:

1074	   *  "X-RateLimit-Reset" references different values, depending on
1075	      the implementation:

1077	      -  seconds remaining to the window expiration

1079	      -  milliseconds remaining to the window expiration

1081	      -  seconds since the UNIX epoch (a UNIX timestamp)

1083	      -  a datetime, either "IMF-fixdate" [SEMANTICS] or [RFC3339]

1085	   *  different headers, with the same semantics, are used by different
1086	      implementers:

1088	      -  X-RateLimit-Limit and X-Rate-Limit-Limit

1090	      -  X-RateLimit-Remaining and X-Rate-Limit-Remaining

1092	      -  X-RateLimit-Reset and X-Rate-Limit-Reset

1094	   The semantics of RateLimit-Remaining depend on the windowing
1095	   algorithm.  A sliding window policy, for example, may result in
1096	   a RateLimit-Remaining value related to the ratio between the current
1097	   and the maximum throughput, e.g.:

1099	RateLimit-Limit: 12, 12;w=1
1100	RateLimit-Remaining: 6          ; using 50% of throughput, that is 6 units/s
1101	RateLimit-Reset: 1

1103	   If this is the case, the optimal solution is to achieve

1105	RateLimit-Limit: 12, 12;w=1
1106	RateLimit-Remaining: 1          ; using 100% of throughput, that is 12 units/s
1107	RateLimit-Reset: 1

1109	   At this point you should stop increasing your request rate.

1111	Appendix D.  FAQ

1113	   1.  Why define standard headers for throttling?

1115	       To simplify enforcement of throttling policies.

1117	   2.  Can I use RateLimit-* in throttled responses (e.g. with status
1118	       code 429)?

1120	       Yes, you can.

1122	   3.  Are those specs tied to RFC 6585?

1124	       No.  [RFC6585] defines the "429" status code and we use it just
1125	       as an example of a throttled request, which could instead use
1126	       403 or any other status code.

1128	   4.  Why not pass the throttling scope as a parameter?

1130	       After a discussion on a similar thread
1131	       (https://github.com/httpwg/http-core/pull/317#issuecomment-
1132	       585868767) we will probably add a new "RateLimit-Scope" header to
1133	       this spec.

1135	       I'm open to suggestions: comment on this issue
1136	       (https://github.com/ioggstream/draft-polli-ratelimit-headers/
1137	       issues/70)

1139	   5.  Why use delta-seconds instead of a UNIX Timestamp?  Why not use
1140	       subsecond precision?

1142	       Using delta-seconds aligns with "Retry-After", which is returned
1143	       in similar contexts, e.g. on 429 responses.

1145	       delta-seconds as defined in [CACHING] section 1.2.1 clarifies
1146	       some parsing rules too.

1148	       Timestamps require a clock synchronization protocol (see
1149	       [SEMANTICS] section 4.1.1.1).  This may be problematic (e.g.
1150	       clock adjustment, clock skew, failure of hardcoded clock
1151	       synchronization servers, IoT devices).  Moreover, timestamps
1152	       may not be monotonically increasing due to clock adjustment.  See
1153	       "Another NTP client failure story"
1154	       (https://community.ntppool.org/t/another-ntp-client-failure-
1155	       story/1014/).

1157	       We did not use subsecond precision because:

1159	       *  that is more subject to system clock correction like the one
1160	          implemented via the adjtimex() Linux system call;

1162	       *  response-time latency may make it not worthwhile.  A brief
1163	          discussion on the subject is on the httpwg mailing list
1164	          (https://lists.w3.org/Archives/Public/ietf-http-
1165	          wg/2019JulSep/0202.html)

1167	       *  almost all rate-limit header implementations do not use it.

1169	   6.  Why not support multiple quota remaining?

1171	       While this might be of some value, my experience suggests that
1172	       overly complex quota implementations result in lower
1173	       effectiveness of this policy.  This spec allows the client to
1174	       easily focus on RateLimit-Remaining and RateLimit-Reset.

1176	   7.  Shouldn't I limit concurrency instead of request rate?

1178	       You can use this specification to limit concurrency at the HTTP
1179	       level (see {#use-for-limiting-concurrency}) and help clients to
1180	       shape their requests avoiding being throttled out.

1182	       A problematic way to limit concurrency is connection dropping,
1183	       especially when connections are multiplexed (eg.  HTTP/2) because
1184	       this results in unserviced client requests, which is something we
1185	       want to avoid.

1187	       A semantic way to limit concurrency is to return 503 + Retry-
1188	       After in case of resource saturation (e.g. thrashing, connection
1189	       queues too long, Service Level Objectives not met).
1190	       Saturation conditions can be either dynamic or static: all this
1191	       is out of scope for the current document.

1193	   8.  Does a positive value of "RateLimit-Remaining" imply any service
1194	       guarantee for my future requests to be served?

1196	       No.  The returned values were used to decide whether or not to
1197	       serve _the current request_ and do not imply any guarantee that
1198	       future requests will be successful.

1200	       Instead they help to understand when future requests will
1201	       probably be throttled.  A low value for "RateLimit-Remaining"
1202	       should be interpreted as a yellow traffic-light for either the
1203	       number of requests issued in the "time-window" or the request
1204	       throughput.

1206	   9.  Is the quota-policy definition Section 2.3 too complex?

1208	       You can always return the simplest form of the three headers:

1210	   RateLimit-Limit: 100
1211	   RateLimit-Remaining: 50
1212	   RateLimit-Reset: 60

1214	   The key runtime value is the first element of the list: "expiring-
1215	   limit"; the other "quota-policy" items are informative.  So for the
1216	   following header:

1218	RateLimit-Limit: 100, 100;w=60;burst=1000;comment="sliding window", 5000;w=3600;burst=0;comment="fixed window"

1220	   the key value is the one referencing the lowest limit: "100".
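
   A client could extract the "expiring-limit" and the informative
   "quota-policy" items from the header above along these lines.  This
   is a minimal sketch, not a complete parser: the function names are
   hypothetical, parameter values are kept as strings, and escaped
   quotes inside quoted strings are assumed not to occur.

```python
from typing import Dict, List, Tuple


def split_outside_quotes(value: str, sep: str) -> List[str]:
    """Split on `sep`, ignoring separators inside double-quoted strings."""
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_ratelimit_limit(value: str) -> Tuple[int, List[Dict[str, str]]]:
    """Return (expiring-limit, list of quota-policy dictionaries)."""
    items = split_outside_quotes(value, ",")
    expiring_limit = int(items[0])  # the key runtime value
    policies = []
    for item in items[1:]:
        limit, *params = split_outside_quotes(item, ";")
        policy = {"limit": limit}
        for param in params:
            name, _, raw = param.partition("=")
            policy[name.strip()] = raw.strip().strip('"')
        policies.append(policy)
    return expiring_limit, policies
```

   A client that only needs the runtime value can read the first list
   element and ignore the rest.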

1222	   10. Can we use shorter names?  Why not put everything in one
1223	       header?

1225	       The most common syntax we found on the web is "X-RateLimit-*"
1226	       and when starting this I-D we opted for it
1227	       (https://github.com/ioggstream/draft-polli-ratelimit-headers/
1228	       issues/34#issuecomment-519366481).
1229	       The basic form of those headers is easily parseable, even by
1230	       implementors processing responses using technologies like
1231	       dynamic interpreters with limited syntax.

1233	       A single header would complicate parsing and depart significantly
1234	       from the existing approaches: this can limit adoption.

1236	   11. Why not mention connections?

1238	       Beware of the term "connection":
1239	       *  it is just _one_ possible saturation cause.  Once you go
1240	          that path you will expose other infrastructural details
1241	          (bandwidth, CPU, etc.; see Section 9.2) and complicate
1242	          client compliance;
1243	       *  it is an infrastructural detail defined in terms of server
1244	          and network rather than the consumed service.  This
1245	          specification protects the services first, and then the
1246	          infrastructures through client cooperation (see Section 9.1).
1247	       RateLimit headers enable sending _on the same connection_
1248	       different limit values on each response, depending on the
1249	       policy scope (e.g. per-user, per-custom-key).

1251	   12. Can intermediaries alter RateLimit fields?

1253	       Generally, they should not because it might result in unserviced
1254	       requests.  There are reasonable use cases for intermediaries
1255	       mangling RateLimit fields though, e.g. when they enforce stricter
1256	       quota-policies, or when they are an active component of the
1257	       service.  In those cases we will consider them as part of the
1258	       originating infrastructure.

1260	Authors' Addresses

1262	   Roberto Polli
1263	   Team Digitale, Italian Government

1265	   Email: robipolli@gmail.com

1267	   Alejandro Martinez Ruiz
1268	   Red Hat

1270	   Email: amr@redhat.com