DNSOP Working Group                                             G. Moura
Internet-Draft                                        SIDN Labs/TU Delft
Intended status: Informational                               W. Hardaker
Expires: June 23, 2019                                      J. Heidemann
                                     USC/Information Sciences Institute
                                                               M. Davids
                                                                SIDN Labs
                                                       December 20, 2018

          Recommendations for Authoritative Server Operators
           draft-moura-dnsop-authoritative-recommendations-01

Abstract

   This document summarizes recent research work exploring DNS
   configurations and offers specific, tangible recommendations to
   operators for configuring authoritative servers.

   This document is not an Internet Standards Track specification; it
   is published for informational purposes.

Ed note

   Text inside square brackets ([RF:ABC]) refers to individual comments
   we have received about the draft.  They will be removed before
   publication.

   This draft is being hosted on GitHub, where the most recent version
   of the document and open issues can be found.  The authors
   gratefully accept pull requests.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 23, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  R1: Use equally strong IP anycast in every authoritative server
       to achieve even load distribution
   3.  R2: Routing Can Matter More Than Locations
   4.  R3: Collecting Detailed Anycast Catchment Maps Ahead of Actual
       Deployment Can Improve Engineering Designs
   5.  R4: When under stress, employ two strategies
   6.  R5: Consider longer time-to-live values whenever possible
   7.  R6: Shared Infrastructure Risks Collateral Damage During Attacks
   8.  Security considerations
   9.  IANA considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   The domain name system (DNS) has two main types of DNS servers:
   authoritative servers and recursive resolvers.  Figure 1 shows their
   relationship.  An authoritative server knows the content of a DNS
   zone from local knowledge, and thus can answer queries about that
   zone without needing to query other servers [RFC2181].  A recursive
   resolver is a program that extracts information from name servers in
   response to client requests [RFC1034].  A client, shown as "stub" in
   Figure 1, is shorthand for a stub resolver [RFC1034], which is
   typically located within the client software.

     +-----+   +-----+   +-----+   +-----+
     | AT1 |   | AT2 |   | AT3 |   | AT4 |
     +--+--+   +--+--+   +---+-+   +--+--+
        ^         ^          ^        ^
        |         |          |        |
        |      +--+--+       |        |
        +------+ Rn  +-------+        |
        |      +--^--+                |
        |         |                   |
        |      +--+--+   +-----+      |
        +------+R1_1 |   |R1_2 +------+
               +-+---+   +----+
                 ^           ^
                 |           |
                 |  +------+ |
                 +--+ stub +-+
                    +------+

       Figure 1: Relationship between recursive resolvers (R) and
                    authoritative name servers (AT)

   DNS queries contribute to users' perceived latency and affect user
   experience [Sigla2014], and the DNS system has been subject to
   repeated Denial of Service (DoS) attacks (for example, in November
   2015 [Moura16b]) in order to degrade user experience.  To reduce
   latency and improve resiliency against DoS attacks, DNS uses several
   types of server replication.  Replication at the authoritative
   server level can be achieved through the deployment of multiple
   servers for the same zone [RFC1035] (AT1-AT4 in Figure 1), through
   IP anycast [RFC1546][RFC4786][RFC7094], and through load balancers
   that support multiple servers inside a single (potentially
   anycasted) site.  As a consequence, there are many possible ways a
   DNS provider can engineer its production authoritative server
   network, with multiple viable choices and no single optimal design.

   This document summarizes recent research work exploring DNS
   configurations and offers specific, tangible recommendations to DNS
   authoritative server operators (DNS operators hereafter) [RF:JAb2],
   [RF:MSJ1], [RF:DW2].  The recommendations (R1-R6) presented in this
   document are backed by previous research work, which drew its
   conclusions from wide-scale Internet measurements.  This document
   describes the key engineering options and points readers to the
   pertinent papers for details.

   [RF:JAb1, Issue#2].  These recommendations are designed for
   operators of "large" authoritative servers for domains such as TLDs.
   "Large" authoritative servers here means those with a significant
   global user population.  These recommendations may not be
   appropriate for smaller domains, such as those used by an
   organization with users in one city or region, where goals such as
   uniform low latency are less strict.

   These recommendations are likely to be useful in a wider context,
   such as for any stateless or short-duration anycasted service.
   Because the underlying studies did not verify this, however, the
   wording in this document discusses DNS authoritative services only.

2.  R1: Use equally strong IP anycast in every authoritative server to
    achieve even load distribution

   Authoritative DNS server operators announce their authoritative
   servers in the form of Name Server (NS) records [RFC1034].
   Different authoritative servers for a given zone should return the
   same content, typically by staying synchronized using DNS zone
   transfers (AXFR [RFC5936] and IXFR [RFC1995]) to coordinate the
   authoritative zone data they return to their clients.

   DNS relies heavily upon replication to support high reliability and
   capacity and to reduce latency [Moura16b].  DNS has two
   complementary mechanisms to replicate the service.  First, the
   protocol itself supports nameserver replication of DNS service for a
   DNS zone through the use of multiple nameservers that each operate
   on different IP addresses, listed by a zone's NS records.  Second,
   each of these network addresses can run from multiple physical
   locations through the use of IP anycast [RFC1546][RFC4786][RFC7094],
   by announcing the same IP address from each site and allowing
   Internet routing (BGP [RFC4271]) to associate clients with their
   topologically nearest anycast site.  Outside the DNS protocol,
   replication can be achieved by deploying load balancers at each
   physical location.  Nameserver replication is recommended for all
   zones (multiple NS records), and IP anycast is used by most large
   zones such as the DNS Root, most top-level domains [Moura16b], and
   large commercial enterprises, governments, and other organizations.

   Most DNS operators strive to reduce latency for users of their
   service.  However, because they control only their authoritative
   servers, and not the recursive resolvers communicating with those
   servers, it is difficult to ensure that recursives will be served by
   the closest authoritative server.  Server selection is up to the
   recursive resolver's software implementation, and different software
   vendors and releases employ different criteria to choose which
   authoritative servers to communicate with.

   Knowing how recursives choose authoritative servers is a key step in
   better engineering the deployment of authoritative servers.
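
   As a simple illustration of the kind of per-authoritative latency
   measurement this line of work relies on, the sketch below queries
   every listed authoritative server of a zone from a single vantage
   point and reports the observed RTTs.  It is not taken from
   [Mueller17b]; it assumes the Python "dnspython" library, and the
   zone name is a placeholder.

      # Illustrative sketch: measure the RTT from one vantage point to
      # every authoritative server of a zone, as a rough way to see why
      # the latency of all listed authoritatives matters.
      import time
      import dns.message
      import dns.query
      import dns.resolver

      ZONE = "example.nl"  # hypothetical zone name

      # Resolve the NS set, then an address for each authoritative.
      ns_names = [r.target.to_text()
                  for r in dns.resolver.resolve(ZONE, "NS")]

      for ns in ns_names:
          addr = dns.resolver.resolve(ns, "A")[0].to_text()
          query = dns.message.make_query(ZONE, "SOA")
          start = time.monotonic()
          dns.query.udp(query, addr, timeout=3)
          rtt_ms = (time.monotonic() - start) * 1000
          print(f"{ns} ({addr}): {rtt_ms:.1f} ms")

   A single run of such a script only shows one vantage point's view;
   studies such as [Mueller17b] rely on thousands of vantage points to
   draw conclusions about resolver behavior.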

   [Mueller17b] evaluates this with a measurement study in which the
   authors deployed seven unicast authoritative name servers in
   different global locations and queried these authoritative servers
   from more than 9,000 RIPE Atlas probes (vantage points, VPs) and
   their respective recursive resolvers.

   In the wild, [Mueller17b] found that recursives query all available
   authoritative servers, regardless of the observed latency.  However,
   the distribution of queries tends to be skewed towards
   authoritatives with lower latency: the lower the latency between a
   recursive resolver and an authoritative server, the more often the
   recursive will send queries to that authoritative.  These results
   were obtained by aggregating results from all vantage points and are
   not specific to any vendor or version.

   The hypothesis is that this behavior is a consequence of two main
   criteria employed by resolvers when choosing authoritatives:
   performance (lower latency) and diversity of authoritatives, where a
   resolver checks all authoritatives to determine which is closer and
   to provide alternatives if one is unavailable.

   For a DNS operator, this policy means that the latency of all
   authoritatives matters, so all must be similarly capable, since all
   available authoritatives will be queried by most recursive
   resolvers.  Since unicast cannot deliver good latency worldwide (a
   site in Europe will always have high latency to resolvers in
   California, for example), [Mueller17b] recommends that DNS operators
   deploy equally strong IP anycast in every authoritative server (and,
   consequently, phase out unicast), so they can deliver good latency
   to clients worldwide.  However, [Mueller17b] also notes that DNS
   operators should take architectural considerations into account when
   planning for deploying anycast [RFC1546].

   This recommendation was deployed at the ".nl" TLD zone, which
   originally had a mixed unicast/anycast setup; since early 2018 it
   has four anycast authoritative name servers.

3.  R2: Routing Can Matter More Than Locations

   A common metric when choosing an anycast DNS provider or setting up
   an anycast service is the number of anycast sites, i.e., the number
   of global locations from which the same address is announced with
   BGP.  Intuitively, one could think that more sites will lead to
   shorter response times.

   However, this is not necessarily true.  In fact, [Schmidt17a] found
   that routing can matter more than the total number of locations.
   The authors analyzed the relationship between the number of anycast
   sites and the performance of a service (latency-wise, RTT) and
   measured the overall performance of four DNS Root servers, namely C,
   F, K, and L, from more than 7,900 RIPE Atlas probes.

   [Schmidt17a] found that C-Root, a smaller anycast deployment
   consisting of only 8 sites, provided overall performance very
   similar to that of the much larger deployments of K and L, with 33
   and 144 sites respectively.  The median RTT measured was between
   30 ms and 32 ms for the C, K, and L roots, and 25 ms for F.

   [Schmidt17a]'s recommendation for DNS operators when engineering
   anycast services is to consider factors other than just the number
   of sites (such as local routing connectivity) when designing for
   performance.  They showed that 12 sites can provide reasonable
   latency, given that they are globally distributed and have good
   local interconnectivity.
   However, more sites can be useful for other reasons, such as when
   handling DDoS attacks [Moura16b].

4.  R3: Collecting Detailed Anycast Catchment Maps Ahead of Actual
    Deployment Can Improve Engineering Designs

   An anycast DNS service may have several dozen or even hundreds of
   sites (as L-Root does).  Anycast leverages Internet routing to
   distribute incoming queries to a service's distributed anycast
   sites; in theory, BGP (the Internet's de facto routing protocol)
   forwards incoming queries to a nearby anycast site (in terms of BGP
   distance).  However, queries are usually not evenly distributed
   across all anycast sites, as found in the case of L-Root
   [IcannHedge18].

   Adding new sites to an anycast service may change the load
   distribution across all sites, leading to suboptimal usage of the
   service or even stressing some sites while others remain
   underutilized.  This is a scenario that operators constantly face
   when expanding an anycast service.  Moreover, when setting up a new
   anycast service instance, operators cannot directly estimate the
   query distribution among the sites in advance of enabling the site.

   To estimate the query loads across sites of an expanding service or
   when setting up an entirely new service, operators need detailed
   anycast maps and catchment estimates (i.e., operators need to know
   which prefixes will be matched to which anycast site).  To do that,
   [Vries17b] developed a new technique enabling operators to carry out
   active measurements, using an open-source tool called Verfploeter
   (available at [VerfSrc]).  Verfploeter maps a large portion of the
   IPv4 address space, allowing DNS operators to predict both query
   distribution and client catchment before deploying new anycast
   sites.

   [Vries17b] shows how this technique was used to predict both the
   catchment and the query load distribution for the new anycast
   service of B-Root.  Using two anycast sites in Miami (MIA) and Los
   Angeles (LAX) of the operational B-Root server, they sent ICMP echo
   packets to IP addresses from each IPv4 /24 on the Internet using a
   source address within the anycast prefix.  Then, they recorded which
   site the ICMP echo replies arrived at, based on the Internet's BGP
   routing.  This analysis resulted in an Internet-wide catchment map.
   Weighting was then applied to the prefixes based on one day of
   B-Root traffic (2017-04-12, DITL datasets [Ditl17]).  The
   combination of the catchment mapping and the load per prefix
   produced an estimate predicting that 81.6% of the traffic would go
   to the LAX site.  The actual value was 81.4% of traffic going to
   LAX, showing that the estimate was very close and that the
   Verfploeter technique is an effective method of predicting traffic
   loads in advance of a new anycast instance deployment.

   Verfploeter can also be used to estimate how traffic shifts among
   sites when BGP manipulations are carried out, such as the AS-path
   prepending that is frequently used by production networks during
   DDoS attacks.  A new catchment mapping was created for each
   prepending configuration: no prepending, and prepending with 1, 2,
   or 3 hops at each site.  [Vries17b] shows that these mappings can
   accurately estimate the load distribution for each configuration.
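
   The load-estimation step described above can be illustrated with a
   short sketch.  The following is not the Verfploeter implementation
   (which is available at [VerfSrc]); it is a minimal Python sketch of
   how a measured catchment map (/24 prefix to anycast site) can be
   combined with per-prefix query counts to predict the fraction of
   traffic each site would receive.  The file names and formats are
   hypothetical.

      import csv
      from collections import defaultdict

      def estimate_site_load(catchment_csv, load_csv):
          # catchment_csv rows: "192.0.2.0/24,LAX"  (prefix -> site)
          # load_csv rows:      "192.0.2.0/24,1234" (queries per prefix)
          site_of = {}
          with open(catchment_csv) as f:
              for prefix, site in csv.reader(f):
                  site_of[prefix] = site

          load = defaultdict(int)
          total = 0
          with open(load_csv) as f:
              for prefix, count in csv.reader(f):
                  site = site_of.get(prefix)
                  if site is None:
                      continue  # prefix not covered by the catchment map
                  load[site] += int(count)
                  total += int(count)

          # Fraction of the observed load predicted to land on each site
          return {site: queries / total
                  for site, queries in load.items()}

      # Example: might print {'LAX': 0.816, 'MIA': 0.184}
      print(estimate_site_load("catchment.csv", "queries-per-prefix.csv"))

   In the B-Root case described above, the equivalent computation
   predicted the 81.6% LAX share that closely matched the 81.4%
   observed in practice.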

   An important operational takeaway from [Vries17b] is that DNS
   operators can make informed choices when engineering new anycast
   sites or when expanding existing ones by carrying out active
   measurements with Verfploeter in advance of operationally enabling
   the full anycast service.  Operators can spot suboptimal routing
   situations early, with a fine granularity, and with significantly
   better coverage than when using traditional measurement platforms
   such as RIPE Atlas.

   To date, Verfploeter has been deployed on B-Root [Vries17b], on an
   operational testbed (Anycast testbed) [AnyTest], and on a large
   unnamed operator.

   The recommendation is therefore to deploy a small Verfploeter-
   enabled test platform at a potential anycast site in advance; this
   may reveal the realizable benefits of using that site as an anycast
   instance, potentially saving the significant financial and labor
   costs of deploying hardware to a new site that turns out to be less
   effective than had been hoped.

5.  R4: When under stress, employ two strategies

   DDoS attacks are becoming bigger, cheaper, and more frequent
   [Moura16b].  The most powerful DDoS attack against DNS servers
   recorded to date reached 1.2 Tbps, using IoT devices [Perlroth16].
   Such attacks call for an answer to the following question: how
   should a DNS operator engineer its anycast authoritative DNS service
   to react to the stress of a DDoS attack?  This question is
   investigated in [Moura16b], in which empirical observations are
   grounded in the following evaluation of options.

   An authoritative DNS server deployed using anycast will have many
   server instances distributed over many networks and sites.
   Ultimately, the relationship between the DNS provider's network and
   a client's ISP will determine which anycast site will answer queries
   for a given client.  As a consequence, when an anycast authoritative
   server is under attack, the load that each anycast site receives is
   likely to be unevenly distributed (a function of the source of the
   attacks); thus some sites may be more overloaded than others, which
   is what was observed in the analysis of the Root DNS events of
   November 2015 [Moura16b].  Given that different sites may have
   different capacities (bandwidth, CPU, etc.), deciding how to react
   to stress becomes even more difficult.

   In practice, an anycast site under stress, overloaded with incoming
   traffic, has two options (a toy numerical sketch contrasting them
   follows the list):

   o  It can withdraw or prepend its route to some or all of its
      neighbors ([RF:Issue3]), perform other traffic-shifting tricks
      (such as reducing the propagation of its announcements using BGP
      communities [RFC1997], which shrinks portions of its catchment),
      or use FlowSpec or other upstream communication mechanisms to
      deploy upstream filtering.  The goal of these techniques is to
      perform some combination of shifting both legitimate and attack
      traffic to other anycast sites (with hopefully greater capacity)
      or to block the traffic entirely.

   o  Alternatively, it can become a degraded absorber, continuing to
      operate but with overloaded ingress routers, dropping some
      incoming legitimate requests due to queue overflow.  However,
      continued operation will also absorb traffic from attackers in
      its catchment, protecting the other anycast sites.
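
   The following toy sketch (not from [Moura16b]; all loads,
   capacities, and site names are hypothetical, in queries per second)
   contrasts the two options for a single overloaded site, assuming for
   simplicity that a withdrawn site's entire catchment, attack traffic
   included, shifts to one fallback site.

      capacity = {"SiteA": 100_000, "SiteB": 300_000}
      offered  = {"SiteA": 250_000, "SiteB":  50_000}  # legit + attack

      def absorb(site):
          # Degraded absorber: keep the catchment, drop the excess.
          served = min(offered[site], capacity[site])
          return (site, served, offered[site] - served)

      def withdraw(site, fallback):
          # Withdrawal ("waterbed"): the site's whole catchment shifts
          # to the fallback site, which may or may not cope with it.
          shifted = offered[fallback] + offered[site]
          dropped = max(0, shifted - capacity[fallback])
          return (fallback, shifted - dropped, dropped)

      print(absorb("SiteA"))             # ('SiteA', 100000, 150000)
      print(withdraw("SiteA", "SiteB"))  # ('SiteB', 300000, 0)

   In this hypothetical example the fallback site happens to have
   enough spare capacity, so withdrawal serves more queries overall;
   with a smaller fallback site the same attack load would simply be
   dropped elsewhere, which is the trade-off discussed below.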

   [Moura16b] saw both of these behaviors in practice in the Root DNS
   events, observed through site reachability and round-trip times
   (RTTs).  These options represent different uses of an anycast
   deployment.  The withdrawal strategy causes anycast to respond as a
   waterbed, with stress displacing queries from one site to others.
   The absorption strategy behaves as a conventional mattress,
   compressing under load, with some queries getting delayed or
   dropped.

   Although described as strategies and policies, these outcomes are
   the result of several factors: the combination of operator and host
   ISP routing policies, routing implementations withdrawing under
   load, the nature of the attack, and the locations of the sites and
   the attackers.  Some policies are explicit, such as the choice of
   local-only anycast sites, or operators removing a site for
   maintenance or modifying routing to manage load.  However, under
   stress, the choices of withdrawal and absorption can also be results
   that emerge from a mix of explicit choices and implementation
   details, such as BGP timeout values.

   [Moura16b] speculates that more careful, explicit, and automated
   management of policies may provide stronger defenses against
   overload, an area currently under study.  For DNS operators, this
   means that besides traditional filtering, two other options are
   available (shifting traffic away via withdrawal, prepending, or
   communities, or letting a site absorb the attack in isolation), and
   the best choice depends on the specifics of the attack.

6.  R5: Consider longer time-to-live values whenever possible

   In a DNS response, each resource record is accompanied by a
   time-to-live (TTL) value, which "describes how long a RR can be
   cached before it should be discarded" [RFC1034].  TTL values are set
   by zone owners in their zone files, either specifically per record
   or by using default values for the entire zone.  Sometimes the same
   resource record may have different TTL values, one from the parent
   and one from the child DNS server.  In such cases, resolvers are
   expected to prioritize the answer according to Section 5.4.1 of
   [RFC2181].

   While set by authoritative server operators (labeled "AT" in
   Figure 1), the TTL value in fact influences the behavior of
   recursive resolvers (and their operators, "Rn" in the same figure),
   by setting an upper limit on how long a record should be cached
   before being discarded.  In this sense, caching can be seen as a
   sort of "ephemeral replication", i.e., the contents of an
   authoritative server are placed in a recursive resolver's cache for
   a period of time up to the TTL value.  Caching improves response
   times by avoiding repeated queries between recursive resolvers and
   authoritative servers.

   Besides improving performance, it has been argued that caching plays
   a significant role in protecting users during DDoS attacks against
   authoritative servers.  To investigate this, [Moura18b] evaluates
   the role of caching (and retries) in DNS resilience to DDoS attacks.
   Two authoritative servers were configured for a newly registered
   domain, and a series of experiments were carried out using various
   TTL values (60, 1800, 3600, and 86400 seconds) for records.  Unique
   DNS queries were sent from roughly 15,000 vantage points, using RIPE
   Atlas.

   [Moura18b] found that, under normal operations, caching works as
   expected 70% of the time in the wild.
   It is believed that complex recursive infrastructure (such as
   anycast recursives with fragmented caches), besides cache flushing
   and cache hierarchies, explains the other 30% of non-cached records.
   The results from the experiments were confirmed by analyzing
   authoritative traffic for the .nl TLD, which showed similar figures.

   [Moura18b] also emulated DDoS attacks on the authoritative servers
   by dropping all incoming packets, for various TTL values.  For
   experiments in which all authoritative servers were completely
   unreachable, they found that the TTL value of the DNS records,
   together with the state of the caches at the time of the attack,
   determined how long clients kept receiving responses.  Because the
   TTL value decreases as time passes in the cache, it protected
   clients for up to its remaining value in the cache.  Once the TTL
   expired, there was some evidence of recursives serving stale content
   [I-D.ietf-dnsop-terminology-bis].  Serving stale is the only viable
   option when TTL values expire in recursive caches and authoritative
   servers become completely unavailable.

   Partial-failure DDoS attacks (similar to Dyn 2016 [Perlroth16]) were
   also emulated by dropping packets at rates of 50-90%, for various
   TTL values.  They found that:

   o  Caching was a key component in the success of queries.  For
      example, with a 50% packet drop rate at the authoritatives, most
      clients eventually got an answer.

   o  Recursive retries were also a key part of resilience: when
      caching could not help (in a scenario with a TTL of 60s and 10
      minutes between probes), recursive servers kept retrying queries
      to the authoritatives.  With a 90% packet drop rate on both
      authoritatives (and a TTL of 60s), 27% of clients still got an
      answer due to retries, at the price of increased response times.
      However, this came at a price for authoritative servers: an
      8.1-fold increase over normal traffic during a 90% packet drop
      with a TTL of 60s, as recursives kept attempting to resolve
      queries, effectively creating "friendly fire".

   Altogether, these results help to explain why previous attacks
   against the Roots were not noticed by most users [Moura18b] and why
   other attacks (such as Dyn 2016 [Perlroth16]) had a significant
   impact on users' experience: records in the Root zone have TTL
   values ranging from 1 to 6 days, while some of the unreachable Dyn
   customers had TTL values ranging from 120 to 300 seconds, which
   limits how long records can be cached.

   Therefore, given the important role of the TTL in users' experience
   during a DDoS attack (and in reducing "friendly fire"), it is
   recommended that DNS zone owners set their TTL values carefully,
   using reasonable values (at least 1 hour) whenever possible, given
   the TTL's role in DNS resilience against DDoS attacks.  However, the
   choice of value depends on the specifics of each operator (CDNs are
   known for using TTL values in the range of a few minutes).  The
   drawback of setting larger TTL values is that changes to the
   authoritative infrastructure (e.g., adding a new authoritative
   server or changing an IP address) will take at least as long as the
   TTL to propagate to clients.
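
   To make the role of the TTL during a total outage concrete, the
   following minimal sketch (not from [Moura18b]) estimates what
   fraction of clients a recursive cache can still answer some time
   into an attack that makes all authoritative servers unreachable.  It
   assumes, purely for illustration, that cache-entry ages are
   uniformly spread over [0, TTL] and ignores retries and serving
   stale.

      def fraction_served(ttl_seconds, seconds_into_attack):
          # An entry cached "age" seconds before the attack keeps
          # answering for (ttl - age) more seconds; with uniformly
          # spread ages, the answerable fraction decays linearly until
          # the TTL runs out.
          return max(0.0, 1.0 - seconds_into_attack / ttl_seconds)

      for ttl in (60, 1800, 3600, 86400):
          served = fraction_served(ttl, seconds_into_attack=1800)
          print(f"TTL {ttl:>6}s: ~{served:.0%} of cached entries still "
                "usable 30 minutes into a total outage")

   Under these assumptions, a 60-second or 30-minute TTL offers no
   protection half an hour into the attack, a 1-hour TTL still covers
   about half of the cached entries, and a 1-day TTL covers nearly all
   of them, which matches the intuition behind the recommendation
   above.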

7.  R6: Shared Infrastructure Risks Collateral Damage During Attacks

   Co-locating services, such as authoritative servers, creates some
   degree of shared risk, in that stress on one service may spill over
   into another, resulting in collateral damage.  Collateral damage is
   a common side effect of DDoS, and data centers and operators strive
   to minimize it through redundancy, overcapacity, and isolation.

   This has been seen in practice during the DDoS attack against the
   Root DNS system in November 2015 [Moura16b].  That study showed that
   two services not directly targeted by the attack, namely D-Root and
   the .nl TLD, suffered collateral damage.  These services showed
   reduced end-to-end performance (i.e., higher latency and reduced
   reachability) with timing consistent with the DDoS event, strongly
   suggesting a resource shared with the original targets of the
   attack.

   Another example of collateral damage was the 1.2 Tbps attack against
   Dyn, a major DNS provider, in October 2016 [Perlroth16].  As a
   result, many of its customers, including Airbnb, HBO, Netflix, and
   Twitter, experienced issues with clients failing to resolve their
   domains, since their DNS servers partially shared the same
   infrastructure.

   It is therefore recommended that, when choosing third-party DNS
   providers, operators be aware of shared infrastructure risks: by
   sharing infrastructure, there is an increased attack surface.

8.  Security considerations

   o  to be added

9.  IANA considerations

   This document has no IANA actions.

10.  Acknowledgements

   This document is a summary of the main lessons of the research works
   mentioned in each recommendation provided here.  As such, the
   authors of each paper have made clear contributions.

   Here we mention the papers' co-authors and thank them for their
   work: Ricardo de O. Schmidt, Wouter B. de Vries, Moritz Mueller, Lan
   Wei, Cristian Hesselman, Jan Harm Kuipers, Pieter-Tjerk de Boer, and
   Aiko Pras.

   Besides those, we would like to thank those who have been
   individually thanked in each research work, RIPE NCC and DNS OARC
   for their tools and datasets used in this research, as well as the
   funding agencies sponsoring the individual research works.

11.  References

11.1.  Normative References

   [I-D.ietf-dnsop-terminology-bis]
              Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
              Terminology", draft-ietf-dnsop-terminology-bis-14 (work
              in progress), September 2018.

   [RFC1034]  Mockapetris, P., "Domain names - concepts and
              facilities", STD 13, RFC 1034, DOI 10.17487/RFC1034,
              November 1987.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
              November 1987.

   [RFC1546]  Partridge, C., Mendez, T., and W. Milliken, "Host
              Anycasting Service", RFC 1546, DOI 10.17487/RFC1546,
              November 1993.

   [RFC1995]  Ohta, M., "Incremental Zone Transfer in DNS", RFC 1995,
              DOI 10.17487/RFC1995, August 1996.

   [RFC1997]  Chandra, R., Traina, P., and T. Li, "BGP Communities
              Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996.

   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS
              Specification", RFC 2181, DOI 10.17487/RFC2181, July
              1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC4786]  Abley, J. and K. Lindqvist, "Operation of Anycast
              Services", BCP 126, RFC 4786, DOI 10.17487/RFC4786,
              December 2006.

   [RFC5936]  Lewis, E. and A. Hoenes, Ed., "DNS Zone Transfer Protocol
              (AXFR)", RFC 5936, DOI 10.17487/RFC5936, June 2010.

   [RFC7094]  McPherson, D., Oran, D., Thaler, D., and E. Osterweil,
              "Architectural Considerations of IP Anycast", RFC 7094,
              DOI 10.17487/RFC7094, January 2014.

11.2.  Informative References

   [AnyTest]  Schmidt, R., "Anycast Testbed", December 2018.

   [Ditl17]   DNS OARC, "2017 DITL data", October 2018.

   [IcannHedge18]
              ICANN, "DNS-STATS - Hedgehog 2.4.1", October 2018.

   [Moura16b]
              Moura, G., Schmidt, R., Heidemann, J., Mueller, M., Wei,
              L., and C. Hesselman, "Anycast vs. DDoS: Evaluating the
              November 2015 Root DNS Events", ACM 2016 Internet
              Measurement Conference, DOI 10.1145/2987443.2987446,
              October 2016.

   [Moura18b]
              Moura, G., Heidemann, J., Mueller, M., Schmidt, R., and
              M. Davids, "When the Dike Breaks: Dissecting DNS Defenses
              During DDoS", ACM 2018 Internet Measurement Conference,
              DOI 10.1145/3278532.3278534, October 2018.

   [Mueller17b]
              Mueller, M., Moura, G., Schmidt, R., and J. Heidemann,
              "Recursives in the Wild: Engineering Authoritative DNS
              Servers", ACM 2017 Internet Measurement Conference,
              DOI 10.1145/3131365.3131366, October 2017.

   [Perlroth16]
              Perlroth, N., "Hackers Used New Weapons to Disrupt Major
              Websites Across U.S.", October 2016.

   [Schmidt17a]
              Schmidt, R., Heidemann, J., and J. Kuipers, "Anycast
              Latency: How Many Sites Are Enough", Proceedings of the
              Passive and Active Measurement (PAM) Conference, March
              2017.

   [Sigla2014]
              Singla, A., Chandrasekaran, B., Godfrey, P., and B.
              Maggs, "The Internet at the Speed of Light", Proceedings
              of the 13th ACM Workshop on Hot Topics in Networks
              (HotNets), October 2014.

   [VerfSrc]  Vries, W., "Verfploeter source code", November 2018.

   [Vries17b]
              Vries, W., Schmidt, R., Hardaker, W., Heidemann, J.,
              Boer, P., and A. Pras, "Verfploeter: Broad and Load-Aware
              Anycast Mapping", ACM 2017 Internet Measurement
              Conference, DOI 10.1145/3131365.3131371, October 2017.

Authors' Addresses

   Giovane C. M. Moura
   SIDN Labs/TU Delft
   Meander 501
   Arnhem  6825 MD
   The Netherlands

   Phone: +31 26 352 5500
   Email: giovane.moura@sidn.nl


   Wes Hardaker
   USC/Information Sciences Institute
   PO Box 382
   Davis  95617-0382
   U.S.A.

   Phone: +1 (530) 404-0099
   Email: ietf@hardakers.net


   John Heidemann
   USC/Information Sciences Institute
   4676 Admiralty Way
   Marina Del Rey  90292-6695
   U.S.A.

   Phone: +1 (310) 448-8708
   Email: johnh@isi.edu


   Marco Davids
   SIDN Labs
   Meander 501
   Arnhem  6825 MD
   The Netherlands

   Phone: +31 26 352 5500
   Email: marco.davids@sidn.nl