Internet Engineering Task Force                             M. Sivaraman
Internet-Draft                             Akira Systems Private Limited
Intended status: Experimental                                     C. Liu
Expires: December 27, 2020                                      Infoblox
                                                           June 25, 2020


                    The DNS thundering herd problem
                draft-muks-dnsop-dns-thundering-herd-00

Abstract

   This document describes an observed regular pattern of spikes in
   queries that affects caching resolvers, and recommends software
   mitigations for it.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts.  The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 27, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Problem Description
   2.  Requirements Notation
   3.  Mitigations
     3.1.  Combine identical queries to upstream nameservers
     3.2.  Include noise in response TTLs from caching resolvers
     3.3.  Other mitigations
   4.  Security Considerations
   5.  IANA considerations
   6.  Acknowledgements
   7.  References
     7.1.  Normative references
     7.2.  Informative references
   Appendix A.  Change history (to be removed before publication)
   Authors' Addresses

1.  Problem Description

   Typically, DNS caching resolvers prepare answers for multiple clients
   from a single cached RRset [RFC1034].  Depending on when the clients
   make their queries, a caching resolver replies with progressively
   lower TTLs until the cached RRset from which the answers are prepared
   expires.  Clients may themselves cache and use their copies of the
   RRset until the TTL in the resolver's reply expires.  The key
   property is that all these copies, and the cached RRset from which
   they were prepared, expire at the same absolute time.

   As an example, consider the following sequence of queries received
   by a resolver from ten clients, all asking for the popular
   www.example.org./A RRset.  We use this example to illustrate two
   kinds of query spikes.

   +--------+------------+--------+------------------------------------+
   | Client | Query time | Answer | Notes                              |
   |        | (seconds   | RRset  |                                    |
   |        | since      | TTL    |                                    |
   |        | epoch)     | (s)    |                                    |
   +--------+------------+--------+------------------------------------+
   | C1     | 1591441620 | 600    | Answer was not found in cache.     |
   |        |            |        | The resolver performs a resolution |
   |        |            |        | and caches the authoritative       |
   |        |            |        | answer with TTL=600.               |
   | C2     | 1591441626 | 594    | Answered from cache.               |
   | C3     | 1591441713 | 507    | Answered from cache.               |
   | C4     | 1591441780 | 440    | Answered from cache.               |
   | C5     | 1591441866 | 354    | Answered from cache.               |
   | C6     | 1591442006 | 214    | Answered from cache.               |
   | C7     | 1591442070 | 150    | Answered from cache.               |
   | C8     | 1591442070 | 150    | Answered from cache.               |
   | C9     | 1591442213 | 7      | Answered from cache.               |
   | C3     | 1591442220 | 600    | Previously cached answer has       |
   |        |            |        | expired in the resolver's cache.   |
   |        |            |        | The resolver performs a fresh      |
   |        |            |        | resolution and caches the          |
   |        |            |        | authoritative answer with TTL=600. |
   | C5     | 1591442220 | 600    | Same as above, unless combined     |
   |        |            |        | with the previous query.           |
   | C2     | 1591442220 | 600    | Same as above, unless combined     |
   |        |            |        | with the previous query.           |
   | C6     | 1591442220 | 600    | Same as above, unless combined     |
   |        |            |        | with the previous query.           |
   | C1     | 1591442221 | 599    | Answered from cache.               |
   | C9     | 1591442221 | 599    | Answered from cache.               |
   | C4     | 1591442221 | 599    | Answered from cache.               |
   | C8     | 1591442221 | 599    | Answered from cache.               |
   | C7     | 1591442221 | 599    | Answered from cache.               |
   | C10    | 1591442227 | 593    | Answered from cache.               |
   | C7     | 1591442820 | 600    | Previously cached answer has       |
   |        |            |        | expired in the resolver's cache.   |
   |        |            |        | The resolver performs a fresh      |
   |        |            |        | resolution and caches the          |
   |        |            |        | authoritative answer with TTL=600. |
   | C4     | 1591442820 | 600    | Same as above, unless combined     |
   |        |            |        | with the previous query.           |
   | C1     | 1591442820 | 600    | Same as above, unless combined     |
   |        |            |        | with the previous query.           |
   | C2     | 1591442820 | 600    | Same as above, unless combined     |
   |        |            |        | with the previous query.           |
   | C10    | 1591442820 | 600    | Same as above, unless combined     |
   |        |            |        | with the previous query.           |
   | C8     | 1591442820 | 600    | Same as above, unless combined     |
   |        |            |        | with the previous query.           |
   | C3     | 1591442821 | 599    | Answered from cache.               |
   | C9     | 1591442821 | 599    | Answered from cache.               |
   | C5     | 1591442821 | 599    | Answered from cache.               |
   | C6     | 1591442821 | 599    | Answered from cache.               |
   +--------+------------+--------+------------------------------------+

   In this example, every copy of the answer handed out from the first
   cached RRset expires at 1591442220, so client queries arrive at the
   resolver in a burst at about that time, and the pattern repeats at
   about 1591442820.  Unless the resolver combines identical queries,
   each such burst of client queries also becomes a burst of identical
   resolutions sent to upstream nameservers.

2.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119] [RFC8174] when, and only when, they appear in all capitals,
   as shown here.

3.  Mitigations

3.1.  Combine identical queries to upstream nameservers

   When multiple queries asking the same question arrive at a resolver
   at about the same time and there is no unexpired cached answer, the
   resolver has to perform DNS resolution to answer them.
   De-duplicating these multiple resolutions into a single DNS
   resolution is RECOMMENDED where possible.

   If such de-duplication is not performed, the resolver effectively
   forwards the client queries 1:1 to upstream nameservers, producing
   sharp spikes in the upstream nameservers' query rate.  Some
   nameserver operators deploy measures such as response rate limiting
   [RRL] or other IP-address-based rate limiting, which may then deny
   service to the resolver because of these spikes of identical
   queries.

3.2.  Include noise in response TTLs from caching resolvers

   Caching resolvers are permitted to lower the TTLs of RRsets in their
   answers at their discretion [RFC2181].  This can be used to spread
   the times at which the clients' copies of an RRset expire over an
   interval, instead of a single absolute time.  However, this has to
   be done carefully, so that the thundering herd does not re-converge
   at the expiry time of the cached RRset from which the answers to the
   clients are prepared.

   TBD.

3.3.  Other mitigations

   With very low authoritative RRset TTLs (for example, under 60
   seconds) for popular questions, the thundering herd recurs more
   often, and including noise in response TTLs is less effective
   because the maximum TTL available to work with is low.  In other
   words, the interval over which the thundering herd can be spread by
   adding noise is shorter.  Some implementations let an operator
   configure a minimum TTL, so that authoritative RRset TTLs below that
   value are raised and clamped to it.
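   To make the effect of such clamping concrete, the following minimal
   Python sketch shows a minimum-TTL clamp and how often a herd can
   recur for a given TTL, assuming, as in the example in Section 1,
   that the cached RRset is re-fetched as soon as it expires.  The
   function names and the 30-second and 300-second values are
   hypothetical, chosen only for illustration, and are not taken from
   any particular resolver implementation.

```python
# Hypothetical helpers for illustration; not from any real resolver.
SECONDS_PER_HOUR = 3600


def clamp_ttl(authoritative_ttl: int, min_ttl: int) -> int:
    """Raise an authoritative TTL that is below min_ttl up to min_ttl."""
    return max(authoritative_ttl, min_ttl)


def herds_per_hour(ttl: int) -> float:
    """Herds per hour if the cached RRset is re-fetched as soon as it
    expires: all copies re-expire together once every `ttl` seconds."""
    return SECONDS_PER_HOUR / ttl


if __name__ == "__main__":
    authoritative_ttl, min_ttl = 30, 300  # hypothetical values
    served_ttl = clamp_ttl(authoritative_ttl, min_ttl)
    print(served_ttl)                         # 300
    print(herds_per_hour(authoritative_ttl))  # 120.0
    print(herds_per_hour(served_ttl))         # 12.0
```

   With these values the 30-second RRset is served with a TTL of 300
   seconds, which cuts the number of herds per hour from 120 to 12, but
   caches then keep the RRset ten times longer than its owner specified.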
   This breaks the currently accepted DNS protocol, and hence this
   document does not make any recommendation about it.

4.  Security Considerations

   There are no security considerations.

5.  IANA considerations

   There are no IANA considerations.

6.  Acknowledgements

   This document was prepared from thundering herd client query patterns
   observed at the resolvers of ISPs and large institutions, which
   resulted in traffic spikes that caused performance issues and lookup
   failures.  The authors acknowledge the contribution of Ramesh
   Damodaran, who participated in the analysis of these patterns.

7.  References

7.1.  Normative references

   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
              STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS
              Specification", RFC 2181, DOI 10.17487/RFC2181, July 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

7.2.  Informative references

   [RRL]      Vixie, P. and V. Schryver, "DNS Response Rate Limiting
              (DNS RRL)", 2012.

Appendix A.  Change history (to be removed before publication)

   o  draft-muks-dnsop-dns-thundering-herd-00

      *  Initial draft.

Authors' Addresses

   Mukund Sivaraman
   Akira Systems Private Limited
   1 Coleman Street, #05-05 The Adelphi
   Singapore  179803
   SG

   Email: muks@akira.org
   URI:   https://akira.org/


   Cricket Liu
   Infoblox
   3111 Coronado Drive
   Santa Clara  95054
   US

   Email: cricket@infoblox.com
   URI:   http://www.infoblox.com/