DNSOP Working Group                                             G. Moura
Internet-Draft                                         SIDN Labs/TU Delft
Intended status: Informational                                W. Hardaker
Expires: December 13, 2019                                   J. Heidemann
                                      USC/Information Sciences Institute
                                                                M. Davids
                                                                SIDN Labs
                                                            June 11, 2019


        Considerations for Large Authoritative DNS Server Operators
           draft-moura-dnsop-authoritative-recommendations-04

Abstract

   This document summarizes recent research work exploring DNS configurations and offers specific, tangible considerations to operators for configuring authoritative servers.

   This document is not an Internet Standards Track specification; it is published for informational purposes.

Ed note

   This draft will be renamed to draft-moura-dnsop-large-authoritative-considerations in case it is adopted by the WG, to reflect the new title.

   Text inside square brackets ([RF:ABC]) refers to:

   o  individual comments we have received about the draft, and

   o  issues listed on our GitHub repository.

   Both types will be removed before publication.

   This draft is being hosted on GitHub, where the most recent version of the document and open issues can be found.  The authors gratefully accept pull requests.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 13, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the document authors.
   All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
   3.  C1: Use equally strong IP anycast in every authoritative server (NS) for better load distribution
   4.  C2: Routing Can Matter More Than Locations
   5.  C3: Collecting Detailed Anycast Catchment Maps Ahead of Actual Deployment Can Improve Engineering Designs
   6.  C4: When under stress, employ two strategies
   7.  C5: Consider longer time-to-live values whenever possible
   8.  Security considerations
   9.  Privacy Considerations
   10. IANA considerations
   11. Acknowledgements
   12. References
       12.1.  Normative References
       12.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document summarizes recent research work exploring DNS configurations and offers specific, tangible considerations to DNS authoritative server operators (DNS operators hereafter) [RF:JAb2], [RF:MSJ1], [RF:DW2].  The considerations (C1-C5) presented in this document are backed by previous research work that drew its conclusions from wide-scale Internet measurements.  This document describes the key engineering options and points readers to the pertinent papers for details, as well as [RF:Issue15] to other research works related to each consideration presented here.

   [RF:JAb1, Issue#2, SJa-02].  These considerations are designed for operators of "large" authoritative servers.  In this context, "large" authoritative servers refers to those with a significant global user population, such as TLDs, run by either a single operator or multiple operators.  These considerations may not be appropriate for smaller domains, such as those used by an organization with users in one city or region, where goals such as uniform low latency are less strict.

   These considerations are likely to be useful in a wider context, such as for any stateless/short-duration, anycasted service.  However, because the conclusions of the underlying studies do not verify this, the wording in this document discusses DNS authoritative services only ([RF:Issue13]).

2.  Background

   The Domain Name System (DNS) has two main types of DNS servers: authoritative servers and recursive resolvers.  Figure 1 shows their relationship.
   An authoritative server (ATn in Figure 1) knows the content of a DNS zone from local knowledge, and thus can answer queries about that zone without needing to query other servers [RFC2181].  A recursive resolver (Re_n) is a program that extracts information from name servers in response to client requests [RFC1034].  A client (stub in Figure 1) refers to a stub resolver [RFC1034], which is typically located within the client software.

    +-----+   +-----+   +-----+   +-----+
    | AT1 |   | AT2 |   | AT3 |   | AT4 |
    +-----+   +-----+   +-----+   +-----+
       ^         ^         ^         ^
       |         |         |         |
       |      +-----+      |         |
       +------|Re_1 |------+         |
       |      +-----+                |
       |         ^                   |
       |         |                   |
       |      +-----+   +-----+      |
       +------|Re_2 |   |Re_3 |------+
              +-----+   +-----+
                 ^         ^
                 |         |
                 |  +------+
                 +--| stub |
                    +------+

        Figure 1: Relationship between recursive resolvers (Re_n) and
                      authoritative name servers (ATn)

   DNS queries and responses contribute to users' perceived latency and affect user experience [Sigla2014], and the DNS system has been subject to repeated Denial of Service (DoS) attacks (for example, in November 2015 [Moura16b]) in order to degrade user experience.

   To reduce latency and improve resiliency against DoS attacks, DNS uses several types of server replication.  Replication at the authoritative server level can be achieved with (i) the deployment of multiple servers for the same zone [RFC1035] (AT1-AT4 in Figure 1), (ii) the use of IP anycast [RFC1546][RFC4786][RFC7094], which allows the same IP address to be announced from multiple locations (each of them referred to as an anycast instance [RFC8499]), and (iii) the use of load balancers to support multiple servers inside a single (potentially anycasted) instance.  As a consequence, there are many possible ways an authoritative DNS provider can engineer its production authoritative server network, with multiple viable choices and no single optimal design.

   In the next sections we cover specific considerations (C1-C5) for large authoritative DNS server operators.

3.  C1: Use equally strong IP anycast in every authoritative server (NS) for better load distribution

   Authoritative DNS server operators announce their authoritative servers in the form of NS records [RFC1034].  Different authoritative servers for a given zone should return the same content; typically they stay synchronized using DNS zone transfers (AXFR [RFC5936] and IXFR [RFC1995]), so that they return the same authoritative zone data to their clients.

   DNS relies heavily upon replication to support high reliability and capacity and to reduce latency [Moura16b].  DNS has two complementary mechanisms to replicate the service.  First, the protocol itself supports nameserver replication of DNS service for a DNS zone through the use of multiple nameservers that each operate on different IP addresses, listed by a zone's NS records.  Second, each of these network addresses can run from multiple physical locations through the use of IP anycast [RFC1546][RFC4786][RFC7094], by announcing the same IP address from each instance and allowing Internet routing (BGP [RFC4271]) to associate clients with their topologically nearest anycast instance.  Outside the DNS protocol, replication can be achieved by deploying load balancers at each physical location.
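   As a rough illustration of the replication a resolver sees, the sketch below (a minimal example written for this document, not taken from any of the referenced studies) uses the Python "dnspython" library to list a zone's NS records and to time one direct query to each nameserver address; the zone name is only a hypothetical placeholder.

      # Minimal sketch: enumerate a zone's NS records and time one direct
      # UDP query to each nameserver address, roughly as a recursive
      # resolver might observe.  Assumes the "dnspython" (2.x) library;
      # the zone name below is a hypothetical placeholder.
      import time
      import dns.message
      import dns.query
      import dns.resolver

      ZONE = "example.nl."  # hypothetical zone

      def nameserver_addresses(zone):
          """Return {nameserver name: [IPv4 addresses]} from the zone's NS RRset."""
          result = {}
          for ns in dns.resolver.resolve(zone, "NS"):
              name = ns.target.to_text()
              result[name] = [a.to_text() for a in dns.resolver.resolve(name, "A")]
          return result

      def rtt_ms(zone, address):
          """Round-trip time (in ms) of one SOA query sent directly to one address."""
          query = dns.message.make_query(zone, "SOA")
          start = time.monotonic()
          dns.query.udp(query, address, timeout=2)
          return (time.monotonic() - start) * 1000.0

      for ns_name, addresses in nameserver_addresses(ZONE).items():
          for address in addresses:
              print(ns_name, address, round(rtt_ms(ZONE, address), 1), "ms")

   Per-nameserver latency observations of this kind are the sort of measurement behind this consideration: as discussed below, resolvers query every listed nameserver at least occasionally, so a consistently slower NS still affects users.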
   Nameserver replication is recommended for all zones (multiple NS records), and IP anycast is used by most large zones such as the DNS Root, most top-level domains [Moura16b], and large commercial enterprises, governments, and other organizations.

   Most DNS operators strive to reduce latency for users of their service.  However, because they control only their authoritative servers, and not the recursive resolvers communicating with those servers, it is difficult to ensure that recursives will be served by the closest authoritative server.  Server selection is up to the recursive resolver's software implementation, and different software vendors and releases employ different criteria to choose the authoritative servers with which to communicate.

   Knowing how recursives choose authoritative servers is a key step in better engineering the deployment of authoritative servers.  [Mueller17b] evaluates this with a measurement study in which they deployed seven unicast authoritative name servers in different global locations and queried these authoritative servers from more than 9k RIPE Atlas probes and their respective recursive resolvers.

   In the wild, [Mueller17b] found that recursives query all available authoritative servers, regardless of the observed latency.  However, the distribution of queries tends to be skewed towards authoritatives with lower latency: the lower the latency between a recursive resolver and an authoritative server, the more often the recursive will send queries to that authoritative.  These results were obtained by aggregating results from all vantage points and are not specific to any vendor or version.

   The hypothesis is that this behavior is a consequence of two main criteria employed by resolvers when choosing authoritatives: performance (lower latency) and diversity of authoritatives, where a resolver checks all authoritative servers to determine which is closest and to provide alternatives if one is unavailable.

   For a DNS operator, this policy means that the latency of all authoritatives (NS records [RF:SJa-01]) matters, so all must be similarly capable, since all available authoritatives will be queried by most recursive resolvers.  Since unicast cannot deliver good latency worldwide (a unicast authoritative server in Europe will always have high latency to resolvers in California, for example, given the geographical distance), [Mueller17b] recommends that DNS operators deploy equally strong IP anycast for every authoritative server (i.e., on each NS record [RF:SJa-01]), in terms of number of instances and peering, and, consequently, phase out unicast, so they can deliver good latency values to global clients.  However, [Mueller17b] notes that DNS operators should also take architectural considerations into account when planning to deploy anycast [RFC1546].

   This consideration was deployed at the ".nl" TLD zone, which originally had seven authoritative servers (a mixed unicast/anycast setup).  In early 2018, .nl moved to a setup with four anycast authoritative name servers.  This is not to say that .nl was the first: other zones have been running anycast-only authoritatives for some time (e.g., .be since 2013).  The contribution of [Mueller17b] is to show that unicast cannot deliver good latency worldwide and that anycast therefore has to be deployed to do so.
4.  C2: Routing Can Matter More Than Locations

   A common metric when choosing an anycast DNS provider or setting up an anycast service is the number of anycast instances [RFC4786], i.e., the number of global locations from which the same address is announced with BGP.  Intuitively, one could think that more instances will lead to shorter response times.

   However, this is not necessarily true.  In fact, [Schmidt17a] found that routing can matter more than the total number of locations.  They analyzed the relationship between the number of anycast instances and the performance of a service (in terms of latency, RTT) and measured the overall performance of four DNS Root servers, namely C, F, K and L, from more than 7.9k RIPE Atlas probes.

   [Schmidt17a] found that C-Root, a smaller anycast deployment consisting of only 8 instances (they refer to an anycast instance as an anycast site), provided overall performance very similar to that of the much larger deployments of K and L, with 33 and 144 instances respectively.  The median RTT for C-, K- and L-Root was between 30 and 32 ms.

   Given that Atlas has better coverage in Europe than in other regions, the authors specifically analyzed results per region and per country (Figure 5 in [Schmidt17a]), and show that the Atlas bias towards Europe does not change the conclusion that the location of anycast instances dominates latency.  [RF:Issue12]

   The consideration from [Schmidt17a] for DNS operators when engineering anycast services is to consider factors other than just the number of instances (such as local routing connectivity) when designing for performance.  They showed that 12 instances can provide reasonable latency, provided they are globally distributed and have good local interconnectivity.  However, more instances can be useful for other reasons, such as when handling DDoS attacks [Moura16b].

5.  C3: Collecting Detailed Anycast Catchment Maps Ahead of Actual Deployment Can Improve Engineering Designs

   An anycast DNS service may have several dozen or even more than one hundred instances (as L-Root does).  Anycast leverages Internet routing to distribute the incoming queries to a service's distributed anycast instances; in theory, BGP (the Internet's de facto routing protocol) forwards incoming queries to a nearby anycast instance (in terms of BGP distance).  However, queries are usually not evenly distributed across all anycast instances, as found in the case of L-Root [IcannHedge18].

   Adding new instances to an anycast service may change the load distribution across all instances, leading to suboptimal usage of the service or even stressing some instances while others remain underutilized.  This is a scenario that operators constantly face when expanding an anycast service.  Moreover, when setting up a new anycast service instance, operators cannot directly estimate the query distribution among the instances in advance of enabling the new instance.

   To estimate the query loads across instances of an expanding service or when setting up an entirely new service, operators need detailed anycast maps and catchment estimates (i.e., operators need to know which prefixes will be mapped to which anycast instance).  To do that, [Vries17b] developed a new technique enabling operators to carry out active measurements, using an open-source tool called Verfploeter (available at [VerfSrc]).
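   The core idea can be illustrated with the toy sketch below (using the Python "scapy" library; this is only meant to convey the mechanism and is not the actual tool, which is available at [VerfSrc]): probes are sent with the anycast service address as their source, so each reply is itself routed by BGP to whichever anycast instance serves the probed network, and every instance simply records which networks reach it.

      # Toy illustration of the Verfploeter idea (not the real tool; see
      # [VerfSrc]).  Requires raw-socket privileges; the addresses below
      # are documentation prefixes used as hypothetical examples.
      from scapy.all import IP, ICMP, send, sniff

      ANYCAST_ADDR = "192.0.2.1"   # hypothetical anycast service address

      def probe(targets):
          """Send one ICMP echo request per target, sourced from the
          anycast address (run from within the anycast deployment)."""
          for dst in targets:
              send(IP(src=ANYCAST_ADDR, dst=dst) / ICMP(), verbose=False)

      def collect():
          """Run at every anycast instance: log which probed networks
          answer here; BGP delivers each reply to the instance that
          serves that network, revealing its catchment."""
          def log_reply(pkt):
              print("catchment of this instance includes", pkt[IP].src)
          sniff(filter="icmp[icmptype] = icmp-echoreply and dst host " + ANYCAST_ADDR,
                prn=log_reply, store=False)

      # Example use: probe one representative address per /24 of interest.
      # probe(["198.51.100.1", "203.0.113.1"])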
   Verfploeter maps a large portion of the IPv4 address space, allowing DNS operators to predict both query distribution and client catchment before deploying new anycast instances.

   [Vries17b] shows how this technique was used to predict both the catchment and the query load distribution for the new anycast service of B-Root.  Using two anycast instances in Miami (MIA) and Los Angeles (LAX) from the operational B-Root server, they sent ICMP echo packets to an IP address in each IPv4 /24 on the Internet, using a source address within the anycast prefix.  Then, they recorded at which instance the ICMP echo replies arrived, as determined by the Internet's BGP routing.  This analysis resulted in an Internet-wide catchment map.  Weighting was then applied to the incoming traffic prefixes based on one day of B-Root traffic (2017-04-12, DITL datasets [Ditl17]).  The combination of the catchment mapping and the load per prefix produced an estimate predicting that 81.6% of the traffic would go to the LAX instance.  The actual value was 81.4% of traffic going to LAX, showing that the estimate was very close and that the Verfploeter technique is an excellent method of predicting traffic loads in advance of a new anycast instance deployment ([Vries17b] also uses the term anycast site to refer to an anycast instance).

   Besides that, Verfploeter can also be used to estimate how traffic shifts among instances when BGP manipulations are executed, such as AS Path prepending, which is frequently used by production networks during DDoS attacks.  A new catchment mapping was created for each prepending configuration: no prepending, and prepending with 1, 2 or 3 hops at each instance.  [Vries17b] then shows that this mapping can accurately estimate the load distribution for each configuration.

   An important operational takeaway from [Vries17b] is that DNS operators can make informed choices when engineering new anycast instances or when expanding existing ones by carrying out active measurements using Verfploeter in advance of operationally enabling the full anycast service.  Operators can spot sub-optimal routing situations early, with a fine granularity, and with significantly better coverage than when using traditional measurement platforms such as RIPE Atlas.

   To date, Verfploeter has been deployed on B-Root [Vries17b], on an operational testbed (Anycast testbed) [AnyTest], and on a large unnamed operator.

   The consideration is therefore that deploying a small Verfploeter-enabled test platform at a potential anycast instance in advance may reveal the realizable benefits of using that location as an anycast instance, potentially saving the significant financial and labor costs of deploying hardware to a new instance that turns out to be less effective than had been hoped.
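   The load-estimation step described above amounts to a simple weighted aggregation, sketched below with made-up numbers (the prefixes, instance names, and query counts are hypothetical, not the B-Root data): each probed /24 is assigned to the instance that received its reply, and the per-prefix query volumes from a traffic sample are then summed per instance.

      # Hypothetical illustration of the estimate in [Vries17b]: combine a
      # catchment map (prefix -> anycast instance, from the ping survey)
      # with per-prefix query counts (e.g., from a DITL-like traffic
      # sample) to predict the query share of each instance.  All values
      # below are made up.
      from collections import defaultdict

      catchment = {            # prefix -> instance that received its reply
          "198.51.100.0/24": "LAX",
          "203.0.113.0/24": "LAX",
          "192.0.2.0/24": "MIA",
      }
      queries_per_prefix = {   # prefix -> queries seen in the traffic sample
          "198.51.100.0/24": 53000,
          "203.0.113.0/24": 27000,
          "192.0.2.0/24": 20000,
      }

      predicted_load = defaultdict(int)
      for prefix, count in queries_per_prefix.items():
          predicted_load[catchment.get(prefix, "unmapped")] += count

      total = sum(predicted_load.values())
      for instance, count in sorted(predicted_load.items()):
          print(instance, round(100.0 * count / total, 1), "% of predicted load")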
6.  C4: When under stress, employ two strategies

   DDoS attacks are becoming bigger, cheaper, and more frequent [Moura16b].  The most powerful recorded DDoS attack against DNS servers to date reached 1.2 Tbps, by using IoT devices [Perlroth16].  Such attacks call for an answer to the following question: how should a DNS operator engineer its anycast authoritative DNS service to react to the stress of a DDoS attack?  This question is investigated in [Moura16b], in which empirical observations are grounded in a theoretical evaluation of the options.

   An authoritative DNS server deployed using anycast will have many server instances distributed over many networks.  Ultimately, the relationship between the DNS provider's network and a client's ISP will determine which anycast instance will answer queries for a given client, given that BGP is the protocol that maps clients to specific anycast instances by using routing information [RF:KDar02].  As a consequence, when an anycast authoritative server is under attack, the load that each anycast instance receives is likely to be unevenly distributed (a function of the source of the attacks); thus, some instances may be more overloaded than others, which is what was observed in the analysis of the Root DNS events of November 2015 [Moura16b].  Given that different instances may have different capacity (bandwidth, CPU, etc.), making a decision about how to react to stress becomes even more difficult.

   In practice, an anycast instance under stress, overloaded with incoming traffic, has two options:

   o  It can withdraw or prepend its route to some or to all of its neighbors, ([RF:Issue3]) perform other traffic-shifting tricks (such as reducing the propagation of its announcements using BGP communities [RFC1997], which shrinks portions of its catchment), or use FlowSpec [RFC5575] or other upstream communication mechanisms to deploy upstream filtering.  The goal of these techniques is to perform some combination of shifting both legitimate and attack traffic to other anycast instances (with hopefully greater capacity) or blocking the traffic entirely.

   o  Alternatively, it can become a degraded absorber, continuing to operate, but with overloaded ingress routers, dropping some incoming legitimate requests due to queue overflow.  However, continued operation will also absorb traffic from attackers in its catchment, protecting the other anycast instances.

   [Moura16b] saw both of these behaviors in practice in the Root DNS events, observed through instance reachability and round-trip times (RTTs).  These options represent different uses of an anycast deployment.  The withdrawal strategy causes anycast to respond as a waterbed, with stress displacing queries from one instance to others.  The absorption strategy behaves as a conventional mattress, compressing under load, with some queries getting delayed or dropped.

   Although described as strategies and policies, these outcomes are the result of several factors: the combination of operator and host ISP routing policies, routing implementations withdrawing under load, the nature of the attack, and the locations of the instances and the attackers.  Some policies are explicit, such as the choice of local-only anycast instances, or operators removing an instance for maintenance or modifying routing to manage load.  However, under stress, the choices of withdrawal and absorption can also be results that emerge from a mix of explicit choices and implementation details, such as BGP timeout values.

   [Moura16b] speculates that more careful, explicit, and automated management of policies may provide stronger defenses to overload, an area currently under study.  For DNS operators, this means that, besides traditional filtering, two other options are available (withdraw/prepend/communities, or isolate instances), and the best choice depends on the specifics of the attack.
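   As a back-of-the-envelope illustration of this trade-off, the toy model below (with hypothetical capacities and attack volumes; it is not a result from [Moura16b]) compares the fraction of legitimate queries answered when a stressed instance absorbs the attack versus when it withdraws and its entire catchment, attack included, shifts to the remaining instances.

      # Toy model (hypothetical numbers, not data from [Moura16b]):
      # compare the "degraded absorber" and "withdraw" strategies for one
      # attacked anycast instance.  Each instance answers at most its
      # capacity; legitimate queries are assumed to be dropped in
      # proportion to the overload.

      def answered_legit(legit, attack, capacity):
          """Legitimate queries/s answered at one instance under random drop."""
          offered = legit + attack
          if offered <= capacity:
              return legit
          return legit * capacity / offered

      # queries/s per instance: [legitimate, attack, capacity] -- all made up
      instances = {"A": [10000, 90000, 20000],  # A's catchment is attacked
                   "B": [10000, 0, 60000],
                   "C": [10000, 0, 60000]}

      # Strategy 1: instance A stays up and absorbs the attack.
      absorb = sum(answered_legit(l, a, c) for l, a, c in instances.values())

      # Strategy 2: instance A withdraws; its whole catchment (legitimate
      # and attack traffic alike) is assumed to split evenly over B and C.
      shifted_legit = instances["A"][0] / 2
      shifted_attack = instances["A"][1] / 2
      withdraw = sum(answered_legit(l + shifted_legit, a + shifted_attack, c)
                     for name, (l, a, c) in instances.items() if name != "A")

      total_legit = sum(l for l, a, c in instances.values())
      print("absorb:  ", round(100 * absorb / total_legit), "% of legitimate queries answered")
      print("withdraw:", round(100 * withdraw / total_legit), "% of legitimate queries answered")

      # With these numbers, withdrawing helps; with little spare capacity
      # at B and C, absorbing would be the better choice.  The point is
      # that the best strategy depends on the specifics of the attack.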
   Note that this consideration refers to the operation of one anycast service, i.e., one anycast NS record.  However, DNS zones with multiple anycast NS services may expect load to spill from one anycast server to another, as resolvers switch from authoritative to authoritative when attempting to resolve a name [Mueller17b].

7.  C5: Consider longer time-to-live values whenever possible

   [RF:Issue7]: this section has been completely rewritten.

   Caching is the cornerstone of good DNS performance and reliability.  A 15 ms response to a new DNS query is fast, but a 1 ms cache hit to a repeat query is far faster.  Caching also protects users from short outages and can mute even significant DDoS attacks [Moura18b].

   DNS record TTLs (time-to-live values) directly control cache durations [RFC1034][RFC1035] and, therefore, affect latency, resilience, and the role of DNS in CDN server selection.  Some early work modeled caches as a function of their TTLs [Jung03a], and recent work examined their interaction with DNS [Moura18b], but no research provides considerations about what TTL values are good.  With this goal, Moura et al. [Moura19a] carried out a measurement study investigating TTL choices and their impact on user experience.

   First, they identified several reasons why operators and zone owners may want to choose longer or shorter TTLs:

   o  Longer caching results in faster responses, given that cache hits are faster than cache misses in resolvers.  [Moura19a] shows that changing the TTL for the .uy TLD from 5 minutes to 1 day significantly reduced the RTT observed from 15k Atlas vantage points: the median was reduced from 28.7 ms to 8 ms, while the 75th percentile decreased from 183 ms to 21 ms.

   o  Longer caching results in lower DNS traffic: authoritative servers will experience less traffic if TTLs are extended, given that repeated queries will be answered by resolver caches.

   o  Longer caching results in lower cost if DNS is metered: some DNS-as-a-service providers' charges are metered, with a per-query cost (often added to a fixed monthly cost).

   o  Longer caching is more robust to DDoS attacks on DNS: DDoS attacks on a DNS service provider harmed several prominent websites [Perlroth16].  Recent work has shown that DNS caching can greatly reduce the effects of DDoS on DNS, provided that caches last longer than the attack [Moura18b].

   o  Shorter caching supports operational changes: an easy way to transition from an old server to a new one is to change the DNS records.  Since there is no method to remove cached DNS records, the TTL duration represents a necessary transition delay to fully shift to a new server, so low TTLs allow a more rapid transition.  However, when deployments are planned in advance (that is, longer than the TTL), TTLs can be lowered just before a major operational change and raised again once it is accomplished.

   o  Shorter caching can help with DNS-based load balancing: some DDoS-scrubbing services use DNS to redirect traffic during an attack.  Since DDoS attacks arrive unannounced, DNS-based traffic redirection requires the TTL to be kept quite low at all times to be ready to respond to a potential attack.

   As such, the choice of TTL depends in part on external factors, so no single recommendation is appropriate for all.  Organizations must weigh these trade-offs to find a good balance.
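   One way to reason about the latency side of these trade-offs is a simple cache model in the spirit of [Jung03a], sketched below (the query rate, RTT values, and cache-hit latency are hypothetical, and real resolver behavior such as prefetching or TTL capping is ignored): with independent query arrivals at rate lambda at a single resolver, a record with TTL T is answered from cache roughly lambda*T out of every lambda*T + 1 queries, since each cache miss is followed by about lambda*T hits before the entry expires again.

      # Rough model (in the spirit of [Jung03a]; all numbers hypothetical):
      # expected response latency seen by clients of one resolver, as a
      # function of the record's TTL.  hit_ratio = lam*ttl / (1 + lam*ttl)
      # for query arrivals at rate lam (queries/s).

      def expected_latency_ms(ttl_s, lam_qps, rtt_auth_ms, rtt_cache_ms=1.0):
          hit_ratio = (lam_qps * ttl_s) / (1.0 + lam_qps * ttl_s)
          return hit_ratio * rtt_cache_ms + (1.0 - hit_ratio) * rtt_auth_ms

      # Example: one query per minute on average, 100 ms to a distant
      # authoritative server, about 1 ms for a local cache hit.
      for ttl in (300, 3600, 86400):          # 5 minutes, 1 hour, 1 day
          print(ttl, "s TTL ->",
                round(expected_latency_ms(ttl, 1 / 60.0, 100.0), 1), "ms expected")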
   Still, some guidelines can be used when choosing TTLs:

   o  For general users, [Moura19a] recommends longer TTLs, of at least one hour, and ideally 4, 8, 12, or 24 hours.  Assuming planned maintenance can be scheduled at least a day in advance, long TTLs have little cost.

   o  For TLD operators: TLD operators that allow public registration of domains (such as most ccTLDs and .com, .net, .org) host, in their zone files, the NS records (and glue records if in-bailiwick) of their respective domains.  [Moura19a] shows that most resolvers will use the TTL values provided by the child delegations, but some will choose the TTL provided by the parents.  As such, similarly to general users, [Moura19a] recommends longer TTLs for the NS records of their delegations (at least one hour, preferably more).

   o  Users of DNS-based load balancing or DDoS-prevention services may require short TTLs: TTLs may be as short as 5 minutes, although 15 minutes may provide sufficient agility for many operators.  Shorter TTLs here help agility; they are an exception to the consideration for longer TTLs.

   o  A/AAAA and NS records: the TTLs of A/AAAA records should be shorter than or equal to the TTL of the NS records for in-bailiwick authoritative DNS servers, given that [Moura19a] found that, in such scenarios, once an NS record expires, its associated A/AAAA records will also be updated (glue is sent by the parents).  For out-of-bailiwick servers, A and NS records are usually cached independently, so different TTLs, if desired, will be effective.  In either case, short TTLs on A and AAAA records may be desired if DDoS-mitigation services are an option.

8.  Security considerations

   This document suggests the use of [I-D.ietf-dnsop-serve-stale].  It should be noted that the use of such methods may affect the data integrity of DNS information.  This document describes methods for mitigating the effects of denial-of-service threats against a DNS service.

   As this document discusses research, there are no further security considerations other than the ones mentioned in the normative references.

9.  Privacy Considerations

   This document does not add any practical new privacy issues.

10.  IANA considerations

   This document has no IANA actions.

11.  Acknowledgements

   This document is a summary of the main considerations of six research works referred to in this document.  As such, it was only possible thanks to the hard work of the authors of those research works.

   The authors of this document are also co-authors of these research works.  However, not all thirteen authors of these research papers are also authors of this document.  We would like to thank those not included in this document's author list for their work: Ricardo de O. Schmidt, Wouter B. de Vries, Moritz Mueller, Lan Wei, Cristian Hesselman, Jan Harm Kuipers, Pieter-Tjerk de Boer and Aiko Pras.

   We would also like to thank the various reviewers of different versions of this draft: Duane Wessels, Joe Abley, Toema Gavrichenkov, John Levine, Michael StJohns, Kristof Tuyteleers, Stefan Ubbink, Klaus Darilion and Samir Jafferali, as well as those who provided comments at the IETF DNSOP session (IETF 104).
   Besides those, we would like to thank those who have been individually thanked in each research work, RIPE NCC and DNS OARC for their tools and datasets used in this research, as well as the funding agencies sponsoring the individual research works.

12.  References

12.1.  Normative References

   [I-D.ietf-dnsop-serve-stale]
              Lawrence, D., Kumari, W., and P. Sood, "Serving Stale Data to Improve DNS Resiliency", draft-ietf-dnsop-serve-stale-05 (work in progress), April 2019.

   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, November 1987.

   [RFC1546]  Partridge, C., Mendez, T., and W. Milliken, "Host Anycasting Service", RFC 1546, DOI 10.17487/RFC1546, November 1993.

   [RFC1995]  Ohta, M., "Incremental Zone Transfer in DNS", RFC 1995, DOI 10.17487/RFC1995, August 1996.

   [RFC1997]  Chandra, R., Traina, P., and T. Li, "BGP Communities Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996.

   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS Specification", RFC 2181, DOI 10.17487/RFC2181, July 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271, January 2006.

   [RFC4786]  Abley, J. and K. Lindqvist, "Operation of Anycast Services", BCP 126, RFC 4786, DOI 10.17487/RFC4786, December 2006.

   [RFC5575]  Marques, P., Sheth, N., Raszuk, R., Greene, B., Mauch, J., and D. McPherson, "Dissemination of Flow Specification Rules", RFC 5575, DOI 10.17487/RFC5575, August 2009.

   [RFC5936]  Lewis, E. and A. Hoenes, Ed., "DNS Zone Transfer Protocol (AXFR)", RFC 5936, DOI 10.17487/RFC5936, June 2010.

   [RFC7094]  McPherson, D., Oran, D., Thaler, D., and E. Osterweil, "Architectural Considerations of IP Anycast", RFC 7094, DOI 10.17487/RFC7094, January 2014.

   [RFC8499]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS Terminology", BCP 219, RFC 8499, DOI 10.17487/RFC8499, January 2019.

12.2.  Informative References

   [AnyTest]  Schmidt, R., "Anycast Testbed", December 2018.

   [Ditl17]   OARC, D., "2017 DITL data", October 2018.

   [IcannHedge18]
              ICANN, "DNS-STATS - Hedgehog 2.4.1", October 2018.

   [Jung03a]  Jung, J., Berger, A., and H. Balakrishnan, "Modeling TTL-based Internet caches", ACM 2003 IEEE INFOCOM, DOI 10.1109/INFCOM.2003.1208693, July 2003.

   [Moura16b] Moura, G., Schmidt, R., Heidemann, J., Mueller, M., Wei, L., and C. Hesselman, "Anycast vs. DDoS: Evaluating the November 2015 Root DNS Events", ACM 2016 Internet Measurement Conference, DOI 10.1145/2987443.2987446, October 2016.

   [Moura18b] Moura, G., Heidemann, J., Mueller, M., Schmidt, R., and M. Davids, "When the Dike Breaks: Dissecting DNS Defenses During DDoS", ACM 2018 Internet Measurement Conference, DOI 10.1145/3278532.3278534, October 2018.

   [Moura19a] Moura, G., Heidemann, J., Schmidt, R., and W. Hardaker, "TBA", June 2019.

   [Mueller17b]
              Mueller, M., Moura, G., Schmidt, R., and J. Heidemann, "Recursives in the Wild: Engineering Authoritative DNS Servers", ACM 2017 Internet Measurement Conference, DOI 10.1145/3131365.3131366, October 2017.
   [Perlroth16]
              Perlroth, N., "Hackers Used New Weapons to Disrupt Major Websites Across U.S.", October 2016.

   [Schmidt17a]
              Schmidt, R., Heidemann, J., and J. Kuipers, "Anycast Latency: How Many Sites Are Enough?", PAM Passive and Active Measurement Conference, March 2017.

   [Sigla2014]
              Singla, A., Chandrasekaran, B., Godfrey, P., and B. Maggs, "The Internet at the speed of light", ACM Workshop on Hot Topics in Networks, October 2014.

   [VerfSrc]  Vries, W., "Verfploeter source code", November 2018.

   [Vries17b] Vries, W., Schmidt, R., Hardaker, W., Heidemann, J., Boer, P., and A. Pras, "Verfploeter: Broad and Load-Aware Anycast Mapping", ACM 2017 Internet Measurement Conference, DOI 10.1145/3131365.3131371, October 2017.

Authors' Addresses

   Giovane C. M. Moura
   SIDN Labs/TU Delft
   Meander 501
   Arnhem  6825 MD
   The Netherlands

   Phone: +31 26 352 5500
   Email: giovane.moura@sidn.nl


   Wes Hardaker
   USC/Information Sciences Institute
   PO Box 382
   Davis  95617-0382
   U.S.A.

   Phone: +1 (530) 404-0099
   Email: ietf@hardakers.net


   John Heidemann
   USC/Information Sciences Institute
   4676 Admiralty Way
   Marina Del Rey  90292-6695
   U.S.A.

   Phone: +1 (310) 448-8708
   Email: johnh@isi.edu


   Marco Davids
   SIDN Labs
   Meander 501
   Arnhem  6825 MD
   The Netherlands

   Phone: +31 26 352 5500
   Email: marco.davids@sidn.nl