DNSOP Working Group                                             G. Moura
Internet-Draft                                        SIDN Labs/TU Delft
Intended status: Informational                               W. Hardaker
Expires: April 4, 2020                                      J. Heidemann
                                     USC/Information Sciences Institute
                                                               M. Davids
                                                               SIDN Labs
                                                        October 02, 2019


      Considerations for Large Authoritative DNS Server Operators
           draft-moura-dnsop-authoritative-recommendations-06

Abstract

   This document summarizes recent research work exploring Domain Name System (DNS) configurations and offers specific, tangible considerations to operators for configuring authoritative servers.

   It is possible that the considerations presented in this document could be applicable in a wider context, such as for any stateless/short-duration, anycasted service.

   This document is not an IETF consensus document: it is published for informational purposes.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 4, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
   Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
   3.  C1: Use anycast in every authoritative for better load distribution
   4.  C2: Routing can matter more than locations
   5.  C3: Collecting anycast catchment maps to improve design
   6.  C4: When under stress, employ two strategies
   7.  C5: Consider longer time-to-live values whenever possible
   8.  Security considerations
   9.  Privacy Considerations
   10. IANA considerations
   11. Acknowledgements
   12. References
       12.1.  Normative References
       12.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document summarizes recent research work exploring DNS configurations and offers specific, tangible considerations to DNS authoritative server operators (DNS operators hereafter).  The considerations (C1-C5) presented in this document are backed by previous research work that drew its conclusions from wide-scale Internet measurements.  This document describes the key engineering options and points readers to the pertinent papers for details and for other research work related to each consideration presented here.

   These considerations are designed for operators of "large" authoritative servers.  In this context, "large" authoritative servers refers to those with a significant global user population, such as top-level domain (TLD) operators, whether run by a single operator or by multiple operators.  These considerations may not be appropriate for smaller domains, such as those used by an organization with users in one city or region, where goals such as uniform low latency are less strict.

   It is possible that these considerations are useful in a wider context, such as for any stateless/short-duration, anycasted service.  Because the studies from which these conclusions are drawn did not verify this, the wording in this document discusses DNS authoritative services only.  This document is not an IETF consensus document: it is published for informational purposes.

2.  Background

   The DNS has two main types of servers: authoritative servers and recursive resolvers.  Figure 1 shows their relationship.  An authoritative server (ATn in Figure 1) knows the content of a DNS zone from local knowledge, and thus can answer queries about that zone without needing to query other servers [RFC2181].  A recursive resolver (Re_n) is a program that extracts information from name servers in response to client requests [RFC1034].  A client (stub in Figure 1) refers to a stub resolver [RFC1034], which is typically located within the client software.

        +-----+   +-----+   +-----+   +-----+
        | AT1 |   | AT2 |   | AT3 |   | AT4 |
        +-----+   +-----+   +-----+   +-----+
           ^         ^         ^         ^
           |         |         |         |
           |      +-----+      |         |
           +------|Re_1 |------+         |
           |      +-----+                |
           |         ^                   |
           |         |                   |
           |      +-----+   +-----+      |
           +------|Re_2 |   |Re_3 |------+
                  +-----+   +-----+
                     ^         ^
                     |         |
                     | +------+|
                     +-| stub |+
                       +------+

        Figure 1: Relationship between recursive resolvers (Re_n)
                  and authoritative name servers (ATn)

   DNS queries and responses contribute to a user's perceived latency and affect the user experience [Sigla2014], and the DNS system has been subject to repeated Denial-of-Service (DoS) attacks (for example, in November 2015 [Moura16b]) that aim to degrade the user experience.

   To reduce latency and improve resiliency against DoS attacks, DNS uses several types of server replication.  Replication at the authoritative server level can be achieved with (i) the deployment of multiple servers for the same zone [RFC1035] (AT1-AT4 in Figure 1), (ii) the use of IP anycast [RFC1546][RFC4786][RFC7094], which allows the same IP address to be announced from multiple locations (each of them referred to as an anycast instance [RFC8499]), and (iii) the use of load balancers to support multiple servers inside a single (potentially anycasted) instance.  As a consequence, there are many possible ways an authoritative DNS provider can engineer its production authoritative server network, with multiple viable choices and no single optimal design.

   In the next sections we cover specific considerations (C1-C5) for large authoritative DNS server operators.

3.  C1: Use anycast in every authoritative for better load distribution

   Authoritative DNS server operators announce their authoritative servers as NS records [RFC1034].  Different authoritatives for a given zone should return the same content, typically by staying synchronized using DNS zone transfers (AXFR [RFC5936] and IXFR [RFC1995]) so that they serve the same authoritative zone data to their clients.

   DNS relies heavily upon replication to support high reliability, provide capacity, and reduce latency [Moura16b].  DNS has two complementary mechanisms for replicating the service.  First, the protocol itself supports nameserver replication of DNS service for a DNS zone through the use of multiple nameservers that each operate on different IP addresses, listed by a zone's NS records.  Second, each of these network addresses can run from multiple physical locations through the use of IP anycast [RFC1546][RFC4786][RFC7094], by announcing the same IP address from each instance and allowing Internet routing (BGP [RFC4271]) to associate clients with their topologically nearest anycast instance.  Outside the DNS protocol, replication can be achieved by deploying load balancers at each physical location.  Nameserver replication is recommended for all zones (multiple NS records), and IP anycast is used by most large zones such as the DNS Root, most top-level domains [Moura16b], and large commercial enterprises, governments, and other organizations.
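
   As a simple illustration of nameserver replication, the sketch below enumerates a zone's NS records and measures the round-trip time to each authoritative server from a single vantage point.  It is only a minimal sketch under assumptions: it relies on the third-party dnspython library (version 2.0 or later), uses "example.nl" as a placeholder zone, and is not part of any of the studies referenced in this document.

      # Sketch: list a zone's NS records and time one UDP query to each.
      # Assumes dnspython >= 2.0 is installed; the zone name is a
      # placeholder.
      import time
      import dns.message
      import dns.query
      import dns.resolver

      ZONE = "example.nl."

      for ns in dns.resolver.resolve(ZONE, "NS"):
          ns_name = str(ns.target)
          ns_addr = dns.resolver.resolve(ns_name, "A")[0].address
          query = dns.message.make_query(ZONE, "SOA")
          start = time.perf_counter()
          dns.query.udp(query, ns_addr, timeout=3)
          rtt_ms = (time.perf_counter() - start) * 1000
          print(f"{ns_name} ({ns_addr}): {rtt_ms:.1f} ms")

   Because most resolvers will, over time, query all of a zone's authoritative servers, a large gap between the fastest and slowest NS in such a measurement hints that some fraction of queries is being served with avoidably high latency.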

   Most DNS operators strive to reduce latency for users of their service.  However, because they control only their authoritative servers, and not the recursive resolvers communicating with those servers, it is difficult to ensure that recursives will be served by the closest authoritative server.  Server selection is ultimately up to the recursive resolver's software implementation, and different software vendors and releases employ different criteria to choose the authoritative servers with which to communicate.

   Knowing how recursives choose authoritative servers is a key step in better engineering the deployment of authoritative servers.  [Mueller17b] evaluates this with a measurement study in which the authors deployed seven unicast authoritative name servers in different global locations and queried them from more than 9,000 RIPE Atlas vantage points and their respective recursive resolvers.

   In the wild, [Mueller17b] found that recursives query all available authoritative servers, regardless of the observed latency.  But the distribution of queries tends to be skewed towards authoritatives with lower latency: the lower the latency between a recursive resolver and an authoritative server, the more often the recursive will send queries to that authoritative.  These results were obtained by aggregating results from all vantage points and are not specific to any vendor or version.

   The hypothesis is that this behavior is a consequence of two main criteria employed by resolvers when choosing authoritatives: performance (lower latency) and diversity of authoritatives, where a resolver checks all authoritative servers to determine which is closest and to have alternatives available if one becomes unavailable.

   For a DNS operator, this policy means that the latency of all authoritatives (NS records) matters, so all must be similarly capable, since most recursive resolvers will query all available authoritatives.  Since unicast cannot deliver good latency worldwide (a unicast authoritative server in Europe will always have high latency to resolvers in California, for example, given the geographical distance), [Mueller17b] recommends that DNS operators deploy equally strong IP anycast for every authoritative server (i.e., for each NS record), in terms of number of instances and peering, and, consequently, phase out unicast, so that they can deliver good latency to clients worldwide.  However, [Mueller17b] also notes that DNS operators should take architectural considerations into account when planning an anycast deployment [RFC1546].

   This consideration was deployed at the ".nl" TLD zone, which originally had seven authoritative servers (a mixed unicast/anycast setup).  In early 2018, .nl moved to a setup with four anycast authoritative name servers.  This is not to say that .nl was the first: other zones have been running anycast-only authoritatives for longer (e.g., .be since 2013).  The contribution of [Mueller17b] is to show that unicast cannot deliver good latency worldwide and that anycast therefore has to be deployed to achieve it.

4.  C2: Routing can matter more than locations

   A common metric when choosing an anycast DNS provider or setting up an anycast service is the number of anycast instances [RFC4786], i.e., the number of global locations from which the same address is announced with BGP.  Intuitively, one could think that more instances will lead to shorter response times.

   However, this is not necessarily true.  In fact, [Schmidt17a] found that routing can matter more than the total number of locations.

   They analyzed the relationship between the number of anycast instances and the performance of a service (latency, measured as round-trip time (RTT)) by measuring the overall performance of four DNS Root servers.  The Root DNS is implemented by 13 separate DNS services, each running on a different IP address but sharing a common master data source: the root DNS zone.  These are called the 13 DNS Root Letter Services (or just the "Root Letters" for short), since each is assigned a letter from A to M and is identified as $letter.root-servers.net.

   Specifically, [Schmidt17a] measured the performance of the C, F, K, and L root letters from more than 7.9k RIPE Atlas probes.  (RIPE Atlas is a measurement platform with more than 12,000 global devices - Atlas probes - that provide vantage points for conducting Internet measurements; it is regularly used by researchers and operators [RipeAtlas15a][RipeAtlas19a].)

   [Schmidt17a] found that C-Root, a smaller anycast deployment consisting of only 8 instances (they refer to an anycast instance as an anycast site), provided an overall performance very similar to that of the much larger deployments of K and L, with 33 and 144 instances respectively.  The median RTT for C, K, and L Root was between 30-32ms.

   Given that Atlas has better coverage in Europe than in other regions, the authors specifically analyzed results per region and per country (Figure 5 in [Schmidt17a]) and show that the Atlas bias towards Europe does not change the conclusion that the location of anycast instances dominates latency.

   The consideration from [Schmidt17a] for DNS operators engineering anycast services is to consider factors other than just the number of instances (such as local routing connectivity) when designing for performance.  They showed that 12 instances can provide reasonable latency, provided they are globally distributed and have good local interconnectivity.  However, more instances can be useful for other reasons, such as when handling Denial-of-Service (DoS) attacks [Moura16b].
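
   The comparison in [Schmidt17a] boils down to aggregating per-vantage-point RTTs for each service and comparing distribution statistics such as the median.  The sketch below illustrates only that aggregation step; the RTT values are invented placeholders, not measurement results from the study, which used RIPE Atlas data.

      # Sketch: compare anycast services by median and 75th-percentile
      # RTT across vantage points.  The sample values are made up.
      from statistics import median, quantiles

      rtts_ms = {
          "C-Root": [28.0, 31.5, 45.2, 19.8, 88.0],
          "K-Root": [30.1, 25.7, 52.3, 33.9, 71.4],
          "L-Root": [29.5, 40.2, 22.1, 35.8, 64.9],
      }

      for service, samples in rtts_ms.items():
          p75 = quantiles(samples, n=4)[-1]   # 75th percentile
          print(f"{service}: median={median(samples):.1f} ms, "
                f"75th percentile={p75:.1f} ms")

   Comparing only global medians can hide regional differences, which is why [Schmidt17a] also breaks the results down per region and per country.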

5.  C3: Collecting anycast catchment maps to improve design

   An anycast DNS service may have several dozen or even more than one hundred locations (as L-Root does).  Anycast leverages Internet routing to distribute incoming queries across a service's distributed anycast locations; in theory, BGP (the Internet's de facto routing protocol) forwards incoming queries to a nearby anycast location (in terms of BGP distance).  However, queries are usually not evenly distributed across all anycast locations, as found in the case of L-Root [IcannHedge18].

   Adding locations to an anycast service may change the load distribution across all locations.  Given that BGP maps clients to locations, whenever a new location is announced, this new location may receive more or less traffic than it was engineered for, leading to suboptimal use of the service or even stressing the new location while leaving others underutilized.  This is a scenario that operators constantly face when expanding an anycast service.  Moreover, when setting up a new anycast service location, operators cannot directly estimate the query distribution among the locations in advance of enabling the new location.

   To estimate the query load across the locations of an expanding service, or when setting up an entirely new service, operators need detailed anycast maps and catchment estimates (i.e., operators need to know which prefixes will be mapped to which anycast instance).  To do that, [Vries17b] developed a new technique enabling operators to carry out active measurements, using an open-source tool called Verfploeter (available at [VerfSrc]).  Verfploeter maps a large portion of the IPv4 address space, allowing DNS operators to predict both query distribution and client catchments before deploying new anycast instances.  At the time of this writing, Verfploeter does not yet support IPv6.

   [Vries17b] shows how this technique was used to predict both the catchment and the query load distribution for the new anycast service of B-Root.  Using two anycast instances in Miami (MIA) and Los Angeles (LAX) of the operational B-Root server, they sent ICMP echo packets to an IP address in each IPv4 /24 on the Internet, using a source address within the anycast prefix.  Then, they recorded which instance each ICMP echo reply arrived at, based on the Internet's BGP routing.  This analysis resulted in an Internet-wide catchment map.  Weighting was then applied to the incoming traffic prefixes based on one day of B-Root traffic (2017-04-12, DITL datasets [Ditl17]).  The combination of the catchment mapping and the load per prefix produced an estimate predicting that 81.6% of the traffic would go to the LAX location.  The actual value was 81.4% of traffic going to LAX, showing that the estimate was very close and that the Verfploeter technique is an excellent method of predicting traffic loads in advance of a new anycast instance deployment ([Vries17b] also uses the term "anycast site" to refer to an anycast location).

   Besides that, Verfploeter can also be used to estimate how traffic shifts among locations when BGP manipulations are executed, such as AS Path prepending, which is frequently used by production networks during DDoS attacks.  A new catchment mapping was produced for each prepending configuration: no prepending, and prepending with 1, 2, or 3 hops at each instance.  [Vries17b] shows that these mappings can accurately estimate the load distribution for each configuration.

   An important operational takeaway from [Vries17b] is that DNS operators can make informed choices when engineering new anycast locations, or when expanding existing ones, by carrying out active measurements with Verfploeter in advance of operationally enabling the full anycast service.  Operators can spot suboptimal routing situations early, with a fine granularity, and with significantly better coverage than traditional measurement platforms such as RIPE Atlas provide.

   To date, Verfploeter has been deployed on B-Root [Vries17b], on an operational testbed (Anycast testbed) [AnyTest], and on a large unnamed operator.

   The consideration is, therefore, that deploying a small Verfploeter-enabled test platform at a potential anycast location in advance may reveal the realizable benefits of using that location as an anycast instance, potentially saving the significant financial and labor costs of deploying hardware to a new location that turns out to be less effective than had been hoped.
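
   The prediction step described above is essentially a weighted sum: each /24 is assigned to the instance that answered its probe, and that assignment is weighted by the query volume the prefix contributed in the traffic dataset.  The sketch below illustrates only this combination step; the prefix-to-instance map and the query counts are invented placeholders standing in for real Verfploeter and DITL data.

      # Sketch: combine a catchment map (prefix -> instance) with
      # per-prefix query counts to predict each instance's traffic
      # share.  Both inputs are made-up placeholders.
      from collections import defaultdict

      catchment = {            # instance that answered the probe per /24
          "192.0.2.0/24": "LAX",
          "198.51.100.0/24": "LAX",
          "203.0.113.0/24": "MIA",
      }
      queries_per_prefix = {   # queries seen from each /24 in one day
          "192.0.2.0/24": 52_000,
          "198.51.100.0/24": 9_500,
          "203.0.113.0/24": 14_200,
      }

      load = defaultdict(int)
      for prefix, instance in catchment.items():
          load[instance] += queries_per_prefix.get(prefix, 0)

      total = sum(load.values())
      for instance, queries in sorted(load.items()):
          print(f"{instance}: {100 * queries / total:.1f}% of predicted load")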

6.  C4: When under stress, employ two strategies

   DDoS attacks are becoming bigger, cheaper, and more frequent [Moura16b].  The most powerful DDoS attack against DNS servers recorded to date reached 1.2 Tbps, by using IoT devices [Perlroth16].  Such attacks call for an answer to the following question: how should a DNS operator engineer its anycast authoritative DNS server to react to the stress of a DDoS attack?  This question is investigated in [Moura16b], in which empirical observations are grounded with a theoretical evaluation of the options.

   An authoritative DNS server deployed using anycast will have many server instances distributed over many networks.  Ultimately, the relationship between the DNS provider's network and a client's ISP will determine which anycast instance will answer queries for a given client, given that BGP is the protocol that maps clients to specific anycast instances by using routing information [RF:KDar02].  As a consequence, when an anycast authoritative server is under attack, the load that each anycast instance receives is likely to be unevenly distributed (a function of the source of the attacks); thus, some instances may be more overloaded than others, which is what was observed in the analysis of the Root DNS events of November 2015 [Moura16b].  Given that different instances may have different capacity (bandwidth, CPU, etc.), making a decision about how to react to stress becomes even more difficult.

   In practice, an anycast instance under stress, overloaded with incoming traffic, has two options:

   o  It can withdraw or prepend its route to some or all of its neighbors, perform other traffic-shifting tricks (such as reducing the propagation of its announcements using BGP communities [RFC1997]), which shrink portions of its catchment, or use FlowSpec [RFC5575] or other upstream communication mechanisms to deploy upstream filtering.  The goal of these techniques is to perform some combination of shifting both legitimate and attack traffic to other anycast instances (with hopefully greater capacity) and blocking the traffic entirely.

   o  Alternatively, it can become a degraded absorber, continuing to operate, but with overloaded ingress routers, dropping some incoming legitimate requests due to queue overflow.  However, continued operation will also absorb traffic from attackers in its catchment, protecting the other anycast instances.

   [Moura16b] saw both of these behaviors in practice in the Root DNS events, observed through instance reachability and round-trip times (RTTs).  These options represent different uses of an anycast deployment.  The withdrawal strategy causes anycast to respond as a waterbed, with stress displacing queries from one instance to others.  The absorption strategy behaves as a conventional mattress, compressing under load, with some queries getting delayed or dropped.
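
   To make the waterbed/mattress intuition concrete, the toy model below compares the two reactions for a single overloaded instance under stated, simplified assumptions: the per-instance capacities and attack-time loads are invented numbers, and a withdrawal is assumed to shift the withdrawn instance's entire load to the remaining instances in proportion to their capacity.  Real BGP catchments are far less predictable, which is exactly why the catchment measurements of C3 matter; this is not a model from [Moura16b].

      # Toy model (not from [Moura16b]): compare "absorb" vs. "withdraw"
      # for one overloaded anycast instance.  Capacities and loads in
      # queries/second are invented; withdrawal is assumed to spread the
      # withdrawn load proportionally to the remaining capacity.
      capacity = {"AMS": 100_000, "LAX": 150_000, "SIN": 80_000}
      load     = {"AMS": 220_000, "LAX":  60_000, "SIN": 40_000}  # AMS attacked

      def dropped(cap, ld):
          return sum(max(ld[i] - cap[i], 0) for i in cap)

      # Strategy 1: AMS absorbs the attack (degraded absorber).
      absorb_drop = dropped(capacity, load)

      # Strategy 2: AMS withdraws; its load shifts to LAX and SIN.
      remaining = {i: c for i, c in capacity.items() if i != "AMS"}
      shifted = dict(load)
      moved = shifted.pop("AMS")
      total_cap = sum(remaining.values())
      for i in remaining:
          shifted[i] += moved * remaining[i] / total_cap
      withdraw_drop = dropped(remaining, shifted)

      total = sum(load.values())
      print(f"absorb:   {100 * absorb_drop / total:.1f}% of queries dropped")
      print(f"withdraw: {100 * withdraw_drop / total:.1f}% of queries dropped")

   In this toy setting, withdrawal happens to drop fewer queries, but with a different attack distribution or capacity mix the absorber could do better; as discussed below, the best choice depends on the specifics of the attack.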

   Although described as strategies and policies, these outcomes are the result of several factors: the combination of operator and host ISP routing policies, routing implementations withdrawing under load, the nature of the attack, and the locations of the instances and the attackers.  Some policies are explicit, such as the choice of local-only anycast instances, or operators removing an instance for maintenance or modifying routing to manage load.  However, under stress, the choices of withdrawal and absorption can also be results that emerge from a mix of explicit choices and implementation details, such as BGP timeout values.

   [Moura16b] speculates that more careful, explicit, and automated management of policies may provide stronger defenses against overload.  For DNS operators, this means that besides traditional filtering, two other options are available (withdraw/prepend/communities, or isolate instances), and the best choice depends on the specifics of the attack.

   Note that this consideration refers to the operation of one anycast service, i.e., one anycast NS record.  However, DNS zones with multiple authoritative anycast servers may expect load to spill from one anycast server to another, as resolvers switch from authoritative to authoritative when attempting to resolve a name [Mueller17b].

7.  C5: Consider longer time-to-live values whenever possible

   Caching is the cornerstone of good DNS performance and reliability.  A 15 ms response to a new DNS query is fast, but a 1 ms cache hit to a repeated query is far faster.  Caching also protects users from short outages and can mute even significant DDoS attacks [Moura18b].

   DNS record TTLs (time-to-live values) directly control cache duration [RFC1034][RFC1035] and, therefore, affect latency, resilience, and the role of DNS in CDN server selection.  Some early work modeled caches as a function of their TTLs [Jung03a], and recent work examined their interaction with DNS [Moura18b], but no research provides considerations about what TTL values are good.  With this goal, Moura et al. [Moura19a] carried out a measurement study investigating TTL choices and their impact on user experience in the wild, rather than focusing on specific resolvers (and their caching architectures), vendors, or setups.

   First, they identified several reasons why operators and zone owners may want to choose longer or shorter TTLs:

   o  Longer TTLs lead to longer caching, which results in faster responses, given that cache hits are faster than cache misses in resolvers.  [Moura19a] shows that increasing the TTL for the .uy TLD from 5 minutes (300s) to 1 day (86400s) significantly reduced the latency measured from 15k Atlas vantage points: the median RTT went from 28.7ms to 8ms, while the 75th percentile decreased from 183ms to 21ms.

   o  Longer caching results in lower DNS traffic: authoritative servers will experience less traffic if TTLs are extended, given that repeated queries will be answered by resolver caches.

   o  Longer caching results in lower cost if DNS is metered: some DNS-as-a-service providers' charges are metered, with a per-query cost (often added to a fixed monthly cost).

   o  Longer caching is more robust to DDoS attacks on DNS: DDoS attacks on a DNS service provider have harmed several prominent websites [Perlroth16].  Recent work has shown that DNS caching can greatly reduce the effects of DDoS on DNS, provided caches last longer than the attack [Moura18b].

   o  Shorter caching supports operational changes: An easy way to transition from an old server to a new one is to change the DNS records.  Since there is no method to remove cached DNS records, the TTL duration represents a necessary transition delay to fully shift to a new server, so low TTLs allow a more rapid transition.  However, when deployments are planned in advance (that is, longer than the TTL), TTLs can be lowered "just before" a major operational change and raised again once it is accomplished.

   o  Shorter caching can help with a DNS-based response to DDoS attacks: Some DDoS-scrubbing services use DNS to redirect traffic during an attack.  Since DDoS attacks arrive unannounced, DNS-based traffic redirection requires the TTL to be kept quite low at all times to be ready to respond to a potential attack.

   o  Shorter caching helps DNS-based load balancing: Many large services use DNS-based load balancing.  Each arriving DNS request provides an opportunity to adjust load, so short TTLs may be desired to react more quickly to traffic dynamics.  (Although many recursive resolvers have minimum caching times of tens of seconds, placing a limit on agility.)

   As such, the choice of TTL depends in part on external factors, so no single recommendation is appropriate for all.  Organizations must weigh these trade-offs to find a good balance.  Still, some guidelines can be used when choosing TTLs:

   o  For general users, [Moura19a] recommends longer TTLs, of at least one hour, and ideally 4, 8, 12, or 24 hours.  Assuming planned maintenance can be scheduled at least a day in advance, long TTLs have little cost.

   o  For TLD operators: TLD operators that allow public registration of domains (such as most ccTLDs and .com, .net, .org) host, in their zone files, the NS records (and glue records if in-bailiwick) of their respective domains.  [Moura19a] shows that most resolvers will use the TTL values provided by the child delegations, but some will choose the TTL provided by the parents.  As such, similarly to general users, [Moura19a] recommends longer TTLs for the NS records of their delegations (at least one hour, preferably more).  (The sketch after this list shows one way to compare the parent-side and child-side TTLs of a delegation.)

   o  Users of DNS-based load balancing or DDoS prevention may require short TTLs: TTLs may be as short as 5 minutes, although 15 minutes may provide sufficient agility for many operators.  Shorter TTLs here help agility; they are an exception to the consideration for longer TTLs.

   o  Use A/AAAA and NS records: TTLs of A/AAAA records should be shorter than or equal to the TTL of the NS records for in-bailiwick authoritative DNS servers, given that the authors of [Moura19a] found that, for such scenarios, once the NS record expires, the associated A/AAAA records will also be updated (glue is sent by the parents).  For out-of-bailiwick servers, A and NS records are usually cached independently, so different TTLs, if desired, will be effective.  In either case, short TTLs on A and AAAA records may be desired if DDoS-mitigation services are an option.
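
   The following sketch compares the NS TTL served by a zone's parent (in the delegation) with the NS TTL served by the child's own authoritative server, the two values that resolvers may end up caching according to [Moura19a].  It is a minimal sketch under assumptions: it relies on the third-party dnspython library (version 2.0 or later), uses "example.nl" and its parent "nl" as placeholders, and queries over UDP without handling truncation or IPv6.

      # Sketch: compare the delegation (parent-side) NS TTL with the
      # child-side NS TTL for one zone.  Assumes dnspython >= 2.0; the
      # zone and parent names are placeholders.
      import dns.message
      import dns.query
      import dns.rdatatype
      import dns.resolver

      CHILD, PARENT = "example.nl.", "nl."

      def first_addr(name):
          # Resolve a nameserver hostname to one IPv4 address.
          return dns.resolver.resolve(name, "A")[0].address

      def ns_ttl(server_ip, zone):
          # Ask one server for the zone's NS records and return the TTL,
          # whether it appears in the answer section (authoritative) or
          # in the authority section (delegation/referral).
          query = dns.message.make_query(zone, "NS")
          resp = dns.query.udp(query, server_ip, timeout=3)
          for section in (resp.answer, resp.authority):
              for rrset in section:
                  if rrset.rdtype == dns.rdatatype.NS:
                      return rrset.ttl
          return None

      parent_ns = str(dns.resolver.resolve(PARENT, "NS")[0].target)
      child_ns = str(dns.resolver.resolve(CHILD, "NS")[0].target)

      print("parent-side NS TTL:", ns_ttl(first_addr(parent_ns), CHILD))
      print("child-side  NS TTL:", ns_ttl(first_addr(child_ns), CHILD))

   A difference between the two values means that different resolvers may cache the delegation for different durations, which is the behavior [Moura19a] observed.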

8.  Security considerations

   As this document discusses applying research results to operational deployments, there are no further security considerations other than the ones mentioned in the normative references.  Most of the considerations affect operational practice, though a few have security-related impacts, which we summarize at a high level.

   Specifically, C4 discusses a few strategies to employ when a service is under stress, providing operators with additional guidance when handling denial-of-service attacks.

   Similarly, C5 identifies both the operational and security benefits of using longer time-to-live values.

9.  Privacy Considerations

   This document does not add any practical new privacy issues, aside from possible benefits in deploying longer TTLs as suggested in C5.  Longer TTLs may help preserve a user's privacy by reducing the number of requests that get transmitted in both the client-to-resolver and resolver-to-authoritative cases.

   DNS privacy is currently under active study, and future research efforts by multiple organizations may produce more guidance in this area.

10.  IANA considerations

   This document has no IANA actions.

11.  Acknowledgements

   This document is a summary of the main considerations of six research works referred to in this document.  As such, the considerations were only possible thanks to the hard work of the authors of these research works:

   o  Ricardo de O. Schmidt

   o  Wouter B de Vries

   o  Moritz Mueller

   o  Lan Wei

   o  Cristian Hesselman

   o  Jan Harm Kuipers

   o  Pieter-Tjerk de Boer

   o  Aiko Pras

   We would also like to thank the various reviewers of different versions of this draft: Duane Wessels, Joe Abley, Toema Gavrichenkov, John Levine, Michael StJohns, Kristof Tuyteleers, Stefan Ubbink, Klaus Darilion, and Samir Jafferali, as well as those who provided comments at the IETF DNSOP session (IETF 104).

   Besides those, we would like to thank those who have been individually thanked in each research work, RIPE NCC and DNS OARC for their tools and datasets used in this research, as well as the funding agencies sponsoring the individual research works.

12.  References

12.1.  Normative References

   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, November 1987.

   [RFC1546]  Partridge, C., Mendez, T., and W. Milliken, "Host Anycasting Service", RFC 1546, DOI 10.17487/RFC1546, November 1993.

   [RFC1995]  Ohta, M., "Incremental Zone Transfer in DNS", RFC 1995, DOI 10.17487/RFC1995, August 1996.

   [RFC1997]  Chandra, R., Traina, P., and T. Li, "BGP Communities Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996.

   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS Specification", RFC 2181, DOI 10.17487/RFC2181, July 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271, January 2006.

   [RFC4786]  Abley, J. and K. Lindqvist, "Operation of Anycast Services", BCP 126, RFC 4786, DOI 10.17487/RFC4786, December 2006.

   [RFC5575]  Marques, P., Sheth, N., Raszuk, R., Greene, B., Mauch, J., and D. McPherson, "Dissemination of Flow Specification Rules", RFC 5575, DOI 10.17487/RFC5575, August 2009.

   [RFC5936]  Lewis, E. and A. Hoenes, Ed., "DNS Zone Transfer Protocol (AXFR)", RFC 5936, DOI 10.17487/RFC5936, June 2010.

   [RFC7094]  McPherson, D., Oran, D., Thaler, D., and E. Osterweil, "Architectural Considerations of IP Anycast", RFC 7094, DOI 10.17487/RFC7094, January 2014.

   [RFC8499]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS Terminology", BCP 219, RFC 8499, DOI 10.17487/RFC8499, January 2019.

12.2.  Informative References

   [AnyTest]  Schmidt, R., "Anycast Testbed", December 2018.

   [Ditl17]   OARC, D., "2017 DITL data", October 2018.

   [IcannHedge18]
              ICANN, "DNS-STATS - Hedgehog 2.4.1", October 2018.

   [Jung03a]  Jung, J., Berger, A., and H. Balakrishnan, "Modeling TTL-based Internet caches", ACM 2003 IEEE INFOCOM, DOI 10.1109/INFCOM.2003.1208693, July 2003.

   [Moura16b]
              Moura, G., Schmidt, R., Heidemann, J., Mueller, M., Wei, L., and C. Hesselman, "Anycast vs. DDoS: Evaluating the November 2015 Root DNS Events", ACM 2016 Internet Measurement Conference, DOI 10.1145/2987443.2987446, October 2016.

   [Moura18b]
              Moura, G., Heidemann, J., Mueller, M., Schmidt, R., and M. Davids, "When the Dike Breaks: Dissecting DNS Defenses During DDoS", ACM 2018 Internet Measurement Conference, DOI 10.1145/3278532.3278534, October 2018.

   [Moura19a]
              Moura, G., Heidemann, J., Schmidt, R., and W. Hardaker, "Cache Me If You Can: Effects of DNS Time-to-Live", ACM 2019 Internet Measurement Conference, DOI 10.1145/3355369.3355568, October 2019.

   [Mueller17b]
              Mueller, M., Moura, G., Schmidt, R., and J. Heidemann, "Recursives in the Wild: Engineering Authoritative DNS Servers", ACM 2017 Internet Measurement Conference, DOI 10.1145/3131365.3131366, October 2017.

   [Perlroth16]
              Perlroth, N., "Hackers Used New Weapons to Disrupt Major Websites Across U.S.", October 2016.

   [RipeAtlas15a]
              RIPE NCC Staff, "RIPE Atlas: A Global Internet Measurement Network", September 2015.

   [RipeAtlas19a]
              RIPE NCC, "RIPE Atlas - RIPE Network Coordination Centre", September 2019.

   [Schmidt17a]
              Schmidt, R., Heidemann, J., and J. Kuipers, "Anycast Latency: How Many Sites Are Enough?", PAM Passive and Active Measurement Conference, March 2017.

   [Sigla2014]
              Singla, A., Chandrasekaran, B., Godfrey, P., and B. Maggs, "The Internet at the Speed of Light", 13th ACM Workshop on Hot Topics in Networks, October 2014.

   [VerfSrc]  Vries, W., "Verfploeter source code", November 2018.

   [Vries17b]
              Vries, W., Schmidt, R., Hardaker, W., Heidemann, J., Boer, P., and A. Pras, "Verfploeter - Broad and Load-Aware Anycast Mapping", ACM 2017 Internet Measurement Conference, DOI 10.1145/3131365.3131371, October 2017.

Authors' Addresses

   Giovane C. M. Moura
   SIDN Labs/TU Delft
   Meander 501
   Arnhem  6825 MD
   The Netherlands

   Phone: +31 26 352 5500
   Email: giovane.moura@sidn.nl


   Wes Hardaker
   USC/Information Sciences Institute
   PO Box 382
   Davis  95617-0382
   U.S.A.

   Phone: +1 (530) 404-0099
   Email: ietf@hardakers.net


   John Heidemann
   USC/Information Sciences Institute
   4676 Admiralty Way
   Marina Del Rey  90292-6695
   U.S.A.

   Phone: +1 (310) 448-8708
   Email: johnh@isi.edu


   Marco Davids
   SIDN Labs
   Meander 501
   Arnhem  6825 MD
   The Netherlands

   Phone: +31 26 352 5500
   Email: marco.davids@sidn.nl