Network Working Group                                          W. Kumari
Internet-Draft                                                    Google
Intended status: Informational                                 R. Arends
Expires: December 29, 2017                                         ICANN
                                                                S. Woolf

                                                              D. Migault
                                                                Ericsson
                                                           June 27, 2017


        Highly Automated Method for Maintaining Expiring Records
                     draft-wkumari-dnsop-hammer-03

Abstract

   This document describes a simple DNS cache optimization which keeps
   the most popular Resource Record sets (RRsets) in the DNS cache:
   Highly Automated Method for Maintaining Expiring Records (HAMMER).

   The principle is that popular RRsets in the cache are fetched, that
   is to say resolved, before their TTL expires and they are flushed.
   By fetching RRsets before they are queried by an end user, that is
   to say by prefetching them, HAMMER is expected to improve the
   quality of experience of the end users as well as to optimize the
   resources involved in large DNSSEC resolving platforms.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 29, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements notation
   2.  Terminology
   3.  Motivations
     3.1.
           Improving browsing Quality of Experience by reducing
           response time
     3.2.  Optimize the resources involved in large DNSSEC resolving
           platforms
   4.  Protocol Description
   5.  Configuration Variables
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Known implementations
     A.1.  Unbound (NLnet Labs)
     A.2.  OpenDNS
     A.3.  ISC BIND
   Appendix B.  Changes / Author Notes.
   Authors' Addresses

1.  Introduction

   A recursive DNS resolver may cache a Resource Record set (RRset) for,
   at most, the Time To Live (TTL) associated with that RRset.  While
   the TTL is greater than zero, the resolver may respond to queries
   from its cache; but once the TTL has reached zero, the resolver
   flushes the RRset.  When the resolver gets another query for that
   RRset, the RRset is no longer in the cache, so the resolver needs to
   perform a new resolution for that RRset, with its associated latency
   and processing.  The resolved RRset is then cached and returned to
   the original querying client.
   This document discusses an optimization, Highly Automated Method for
   Maintaining Expiring Records (HAMMER), also known as "prefetch", that
   helps keep popular responses in the cache by fetching (or resolving)
   resources before their TTL expires.

   In this document, a resolver implementing HAMMER (HAMMER resolver)
   prefetches a RRset that is a candidate for HAMMER (HAMMER RRset) when
   it receives a query for that RRset and the RRset's remaining TTL is
   lower than HAMMER TIME.

   Note that [RFC4035] assumes that all RRs of a RRset have the same
   TTL, while [RFC2181] allows the TTLs of the RRs of a RRset to be
   different.  When the RRset does not follow [RFC4035], the TTL of the
   RRset that is considered is the minimum value of those TTLs.

1.1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Terminology

   HAMMER resolver:  A DNS resolver that implements the HAMMER
      mechanism.

   HAMMER RRset:  A RRset that is a candidate for the HAMMER process.

   HAMMER TIME:  A number of seconds that indicates the period preceding
      the TTL expiration time.  When a query for a HAMMER RRset is
      received during that period, the HAMMER resolver prefetches the
      HAMMER RRset by initiating a resolution.

3.  Motivations

   When a recursive resolver responds to a client, it either responds
   from cache, or it initiates an iterative query to resolve the answer,
   caches the answer, and then responds with that answer.

3.1.  Improving browsing Quality of Experience by reducing response time

   Any end user querying a fetched RRset will get the response from the
   cache of the resolver.  This provides faster responses, thus
   improving the end user experience for browsing and other
   applications/activities.

   Popular RRsets are highly queried, and end users have high
   expectations in terms of application responsiveness for these RRsets.
   With regular DNS rules, once the RRset has been flushed from the
   cache, the resolver waits for the next end user to request the RRset
   before initiating a resolution for this RRset with iterative queries.
   This results in at least one end user waiting for this resolution to
   be performed over the Internet before the response is sent to them.
   This may result in a poor user experience, since DNS response times
   over the Internet are unpredictable at best and the response takes
   longer than usual.  Note that the response time may also be increased
   by the use of DNSSEC, since DNSSEC may involve additional
   resolutions, larger payloads, and signature checks.

   In addition to the first end user querying the RRset after it has
   been flushed, end users querying the RRset during its resolution or
   fetching phase are also impacted.  As a result, especially for
   popular RRsets, multiple end users are likely to be impacted and to
   have a poor user experience.

   The impact on users also depends on the architecture of the resolving
   platform.  In some cases, a centralized resolver is implemented as a
   farm of independent resolving nodes, and the traffic is split between
   the nodes according to the IP addresses and ports.  In such
   architectures, the number of affected end users is proportional to
   the number of resolving nodes, as each query is pseudo-randomly
   associated with one of the resolving nodes.  Similarly, some global
   resolving platforms use anycast to steer the queries to the resolving
   node associated with the shortest path.  Unless all queries come from
   a single region, such architectures are also expected to impact a
   number of users proportional to the number of resolving nodes.

3.2.
      Optimize the resources involved in large DNSSEC resolving platforms

   As mentioned in Section 3.1, large resolving platforms are often
   composed of a set of independent resolving nodes in order to
   distribute the traffic between these nodes.  Traffic can be
   distributed using various forms of load balancing between the
   resolving nodes.  This includes, for example, a pseudo-random
   distribution when load balancing is based on the hash of the IP
   addresses and ports, or shortest path when anycast is deployed.  Such
   distributions split the traffic independently of the queried RRset.
   Not coordinating the resolutions implicitly assumes that the
   resources needed to perform the resolutions are negligible compared
   to those needed to handle the queries of the end users.

   As a result, such platforms perform multiple parallel resolutions on
   their various nodes.  With plain DNS, the resources needed for a
   resolution were in fact minimal, so little effort has been made to
   synchronize these nodes in order to reduce the number of resolutions.
   On the other hand, DNSSEC resolutions involve additional resolutions,
   larger payloads, and signature checks.  The resulting increase in the
   resources needed for DNSSEC resolutions compared to DNS resolutions
   makes parallel resolutions a non-negligible waste of resources and
   leaves room for synchronization mechanisms.

   One way to reduce the number of DNSSEC resolutions is to prefetch (or
   provision) the nodes with the most popular RRsets before their TTL
   expires.  Note that in this case, the resolution is not performed by
   the resolving node.  At the node level, prefetching increases the
   node's availability.  At the platform level, synchronizing the
   resolving nodes' resolutions globally reduces the number of
   resolutions and so the overall resources used by the platform.

   Synchronization of the resolution may be performed by configuring
   each node as a forwarder for these RRsets.  This avoids parallel
   resolutions and reduces overall cost, because signature checks are
   not performed by each resolving node.  In this case, prefetching
   still benefits from the existing load-balancing architecture that
   splits the end users' query traffic between the nodes.  Note that the
   advantages of synchronizing the resolutions between the resolving
   nodes may depend on the popularity of the RRsets.  This architecture
   takes advantage of the Zipf [ZIPF] distribution of the RRsets'
   popularity.  In fact, only a small number of RRsets (a few thousand)
   need to be cached to address most of the traffic (up to 70%)
   [PREFETCH].

4.  Protocol Description

   This section describes HAMMER.  This section is not normative, and
   implementations may implement this mechanism in their own way.

   When a recursive resolver that implements HAMMER receives a query for
   a HAMMER RRset that it has in the cache, it responds from the cache.

   If the queried RRset is a HAMMER RRset, the HAMMER resolver compares
   the remaining TTL to the HAMMER TIME and checks whether the RRset is
   already being (pre)fetched.

   If the HAMMER RRset has a TTL greater than the HAMMER TIME, nothing
   is done.

   If the HAMMER RRset has a TTL less than the HAMMER TIME, the HAMMER
   resolver starts a resolution for the RRset in order to fill the
   cache, just as if the TTL had expired.  The HAMMER RRset is
   prefetched.  Note that during the resolution, the HAMMER RRset is
   still cached, and queries are answered from the cache until the TTL
   expires.  Once the resolution is performed, the freshly resolved
   RRset replaces the existing cached RRset.  This ensures the cache has
   fresh data for subsequent queries.

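
   The decision made on each cache hit can be illustrated with a short,
   non-normative sketch.  It is only one possible reading of the steps
   above, not the behavior of any particular resolver.  The names
   CacheEntry, rrset_ttl, and should_prefetch are hypothetical and were
   chosen for this illustration; the default values are the ones
   recommended in Section 5, and the STOP check follows the rule given
   later in this section.  The sketch assumes that a remaining TTL
   exactly equal to HAMMER TIME does not trigger a prefetch, a case the
   text leaves open.

```python
from dataclasses import dataclass

HAMMER_TIME = 2  # seconds; default recommended in Section 5
STOP = 3         # multiplier; default recommended in Section 5


@dataclass
class CacheEntry:
    """Hypothetical cache record for one HAMMER RRset (illustration only)."""
    original_ttl: int          # TTL in seconds as received with the RRset
    expires_at: float          # absolute time at which the TTL reaches zero
    prefetching: bool = False  # True while a (pre)fetch is already running


def rrset_ttl(rr_ttls):
    """TTL considered for an RRset whose RRs carry different TTLs:
    the minimum value, as stated in Section 1."""
    return min(rr_ttls)


def should_prefetch(entry, now, hammer_time=HAMMER_TIME, stop=STOP):
    """Return True if a query arriving at time `now` should start a prefetch.

    The query itself is still answered from the cache; this function only
    decides whether a background resolution is started as well.
    """
    remaining = entry.expires_at - now
    if remaining <= 0:
        # Already expired: this is a regular resolution, not a prefetch.
        return False
    if entry.prefetching:
        # A (pre)fetch is already in progress; do not start another one.
        return False
    if entry.original_ttl < stop * hammer_time:
        # "Can't touch this": the original TTL is too short for HAMMER.
        return False
    # Assumption: strict comparison, so remaining == hammer_time does nothing.
    return remaining < hammer_time
```

   With the defaults above, an RRset received with a 300 second TTL is
   prefetched only when a query arrives during the last 2 seconds of its
   lifetime, while an RRset received with a 5 second TTL is never
   prefetched because 5 is less than STOP * HAMMER TIME = 6.
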
   Since prefetching is initiated before the existing cached entry
   expires (and is flushed), responses will come from the cache more
   often.  This decreases the client resolution latency and improves the
   user experience.

   Prefetching is triggered by an incoming query (and only if that query
   arrives shortly before the record would expire anyway).  This
   effectively keeps the most popular, uniformly queried RRsets in the
   cache, without having to maintain counters in the cache or
   proactively resolve responses that are not likely to be needed as
   often.  This is purely an implementation optimization: resolvers
   always have the option to cache records for less than the TTL (for
   example, when running low on cache space); this simply triggers a
   refresh of the RRset before it expires.

   Note that non-uniformly queried RRsets may be popular and may not
   benefit from the HAMMER mechanism.  For example, a RRset with a 30
   minute TTL may be heavily queried during the first 10 minutes of
   every hour.  In that case, DNS queries are not expected to arrive
   between TTL - HAMMER TIME and TTL.

   HAMMER RRsets with a small TTL may trigger prefetching even though
   they are not very popular.  Suppose an end user is setting up a
   specific session which requires multiple DNS resolutions on a given
   FQDN.  These resolutions are necessary for a short period of time,
   i.e., the time needed to establish the session.  If these RRsets have
   been set with a small TTL (on the order of the session establishment
   time), the multiple queries to a HAMMER resolver may trigger an
   unnecessary resolution.  HAMMER would therefore not scale to
   thousands of these RRsets.  More generally, if the original TTL of
   the RRset is less than (or close to) HAMMER TIME, the described
   method could cause excessive prefetching queries to occur.
   In order to prevent this, an additional variable named STOP
   (described below) is introduced.  If the original TTL of the RRset is
   less than STOP * HAMMER TIME, then the cache entry should be marked
   with a "Can't touch this" flag, and the described method should not
   be used.

5.  Configuration Variables

   These are the mandatory variables:

   HAMMER TIME:  a number of seconds that indicates the period preceding
      the TTL expiration time.  When a query for a HAMMER RRset is
      received during that period, the HAMMER resolver prefetches the
      HAMMER RRset by initiating a resolution.  A default of 2 seconds
      is RECOMMENDED.

   STOP:  a multiplier applied to HAMMER TIME.  RRsets whose original
      TTL is less than STOP * HAMMER TIME are not prefetched.  It should
      be a user configurable variable.  A default of 3 is recommended.

   Implementations may consider additional variables.  These are not
   mandatory but would address specific uses of HAMMER.

   HAMMER MATCH:  should be a user configurable variable.  It defines
      the RRsets to which HAMMER applies.  This rule can be expressed in
      different ways.  It can be a list of RRsets, or a number
      indicating how many of the most popular RRsets need to be
      considered.  How HAMMER MATCH is expressed is implementation
      dependent.  Some implementations can use a list of FQDNs, others
      can use a matching rule on the RRsets, or define the HAMMER RRsets
      as the X most popular RRsets.

   HAMMER FORWARDER:  should be a user configurable variable.  It is
      optional and designates the DNS server the resolver forwards the
      request to.

6.  IANA Considerations

   This document makes no request of the IANA.

7.  Security Considerations

   This technique leverages existing protocols, and should not introduce
   any new risks, other than a slight increase in traffic.

   By initiating cache fill entries before the existing RR has expired,
   this technique will slightly increase the number of queries seen by
   authoritative servers.
   This increase will be inversely proportional to the average TTL of
   the records that they serve.

   It is unlikely, but possible, that this increase could cause a denial
   of service condition.

8.  Acknowledgements

   The authors wish to thank Tony Finch and MC Hammer.  We also wish to
   thank Brian Somers and Wouter Wijngaards for telling us that they
   already do this :-) (They should probably be co-authors, but I left
   this too close to the draft cutoff time to confirm with them that
   they are willing to have their names on this).

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS
              Specification", RFC 2181, DOI 10.17487/RFC2181, July 1997.

   [RFC4035]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "Protocol Modifications for the DNS Security
              Extensions", RFC 4035, DOI 10.17487/RFC4035, March 2005.

9.2.  Informative References

   [PREFETCH]
              Migault, D., Francfort, S., Senecal, S., Herbert, E., and
              M. Laurent, "PREFETCHing to optimize DNSSEC deployment
              over large Resolving Platforms", Jul 2013.

   [ZIPF]     Powers, D., "Applications and Explanations of Zipf's Law",
              Jan 1998.

Appendix A.  Known implementations

   [RFC Editor: Please remove this section before publication ]

   [Ed: Well, this is kinda embarrassing.  This idea occurred to us one
   day while sitting around a pool in New Hampshire.  It then took a
   while before I wrote it down, mostly because I *really* wanted to get
   "Stop!  Hammer Time!" into a draft.  Anyway, we presented it in
   Berlin, and Wouter Wijngaards stood up and mentioned that Unbound
   already does this (they use a percentage of TTL, instead of a number
   of seconds).
   Then we heard from OpenDNS that they *also* implement something
   similar.  Then we had a number of discussions, then got sidetracked
   into other things.  Anyway, BIND as of 9.10, around Feb 2014, now
   implements something like this
   (https://deepthought.isc.org/article/AA-01122/0/Early-refresh-of-cache-records-cache-prefetch-in-BIND-9.10.html),
   and enables it by default.  Unfortunately, while BIND uses the
   time-based approach, they named their parameters "trigger" and
   "eligibility" - and shouting "Eligibility!  Trigger time!" simply
   isn't funny (unless you have a very odd sense of humor...).  So, we
   are now documenting implementations that existed before this was
   published and an implementation that we think was based on this.  We
   think that this has value to the community.  I'm also leaving in the
   HAMMER TIME bit, because it makes me giggle.  This below section
   should be filled out with more detail, in collaboration with the
   implementors, but this is being written *just* before the draft
   cutoff.].

   A number of recursive resolvers implement techniques similar to the
   techniques described in this document.  This section documents some
   of these and the tradeoffs they make in picking their techniques.

A.1.  Unbound (NLnet Labs)

   The Unbound validating, recursive, and caching DNS resolver
   implements a HAMMER type feature, called "prefetch".  This feature
   can be enabled or disabled through the configuration option
   "prefetch: ".  When enabled, Unbound will fetch expiring records when
   their remaining TTL is less than 10% of their original TTL.

   [Ed: Unbound's "prefetch" function was developed independently,
   before this draft was written.  The authors were unaware of it when
   writing the document.]

A.2.  OpenDNS

   The public DNS resolver OpenDNS implements a prefetch-like solution.

   [Ed: Will work with OpenDNS to get more details.]

A.3.
      ISC BIND

   As of version 9.10, Internet Systems Consortium's BIND implements the
   HAMMER functionality.  This feature is enabled by default.

   The functionality is configured using the "prefetch" options
   statement, with two parameters:

   Trigger  This is equivalent to the HAMMER TIME parameter described in
      Section 5.

   Eligibility  This is equivalent to the STOP parameter described in
      Section 5.

Appendix B.  Changes / Author Notes.

   [RFC Editor: Please remove this section before publication ]

   From -01 to -02:

   o  Readability / cleanup.

   o  Tried to make it more clear that most implementations now support
      this (although they call it "prefetch").

   From -00 to -01:

   o  Fairly large rewrite.

   o  Added text on the fact that there are implementations that do
      this.

   o  Added the "prefetch" name, cleaned up some readability.

   o  Daniel's test (Section 3.2) added.

   From -template to -00:

   o  Wrote some text.

   o  Changed the name.

Authors' Addresses

   Warren Kumari
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA  94043
   US

   Email: warren@kumari.net


   Roy Arends
   ICANN

   Email: roy.arends@icann.org


   Suzanne Woolf
   39 Dodge St. #317
   Beverly, MA  01915
   US

   Email: suzworldwide@gmail.com


   Daniel Migault
   Ericsson
   2039 Rue Cohen
   Saint-Laurent  H4R 2A4
   Canada

   Email: daniel.migaultf@ericsson.com