dprive                                                     S. Bortzmeyer
Internet-Draft                                                     AFNIC
Obsoletes: 7626 (if approved)                               S. Dickinson
Intended status: Informational                                Sinodun IT
Expires: January 9, 2020                                    July 8, 2019


                       DNS Privacy Considerations
                    draft-ietf-dprive-rfc7626-bis-00

Abstract

   This document describes the privacy issues associated with the use of
   the DNS by Internet users.  It is intended to be an analysis of the
   present situation and does not prescribe solutions.  This document
   obsoletes RFC 7626.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Risks
     2.1.  The Alleged Public Nature of DNS Data
     2.2.  Data in the DNS Request
       2.2.1.  Data in the DNS payload
     2.3.  Cache Snooping
     2.4.  On the Wire
       2.4.1.  Unencrypted Transports
       2.4.2.  Encrypted Transports
     2.5.  In the Servers
       2.5.1.  In the Recursive Resolvers
       2.5.2.  In the Authoritative Name Servers
       2.5.3.  Rogue Servers
       2.5.4.  Authentication of servers
       2.5.5.  Blocking of services
     2.6.  Re-identification and Other Inferences
     2.7.  More Information
   3.  Actual "Attacks"
   4.  Legalities
   5.  Security Considerations
   6.  Acknowledgments
   7.  Changelog
   8.  References
     8.1.  Normative References
     8.2.  Informative References
     8.3.  URIs
   Authors' Addresses

1.  Introduction

   This document is an analysis of the DNS privacy issues, in the spirit
   of Section 8 of [RFC6973].

   The Domain Name System is specified in [RFC1034], [RFC1035], and many
   later RFCs, which have never been consolidated.  It is one of the
   most important infrastructure components of the Internet and often
   ignored or misunderstood by Internet users (and even by many
   professionals).  Almost every activity on the Internet starts with a
   DNS query (and often several).  Its use has many privacy implications
   and this is an attempt at a comprehensive and accurate list.

   Let us begin with a simplified reminder of how the DNS works.  (See
   also [RFC8499].)  A client, the stub resolver, issues a DNS query to
   a server, called the recursive resolver (also called caching resolver
   or full resolver or recursive name server).  Let's use the query
   "What are the AAAA records for www.example.com?" as an example.  AAAA
   is the QTYPE (Query Type), and www.example.com is the QNAME (Query
   Name).
   (The description that follows assumes a cold cache, for
   instance, because the server just started.)  The recursive resolver
   will first query the root name servers.  In most cases, the root name
   servers will send a referral.  In this example, the referral will be
   to the .com name servers.  The resolver repeats the query to one of
   the .com name servers.  The .com name servers, in turn, will refer to
   the example.com name servers.  The example.com name server will then
   return the answer.  The root name servers, the name servers of .com,
   and the name servers of example.com are called authoritative name
   servers.  It is important, when analyzing the privacy issues, to
   remember that the question asked to all these name servers is always
   the original question, not a derived question.  The question sent to
   the root name servers is "What are the AAAA records for
   www.example.com?", not "What are the name servers of .com?".  By
   repeating the full question, instead of just the relevant part of the
   question to the next in line, the DNS provides more information than
   necessary to the name server.

   Because DNS relies on caching heavily, the algorithm described just
   above is actually a bit more complicated, and not all questions are
   sent to the authoritative name servers.  If a few seconds later the
   stub resolver asks the recursive resolver, "What are the SRV records
   of _xmpp-server._tcp.example.com?", the recursive resolver will
   remember that it knows the name servers of example.com and will just
   query them, bypassing the root and .com.  Because there is typically
   no caching in the stub resolver, the recursive resolver, unlike the
   authoritative servers, sees all the DNS traffic.  (Applications, like
   web browsers, may have some form of caching that does not follow DNS
   rules, for instance, because it may ignore the TTL.
   So, the
   recursive resolver does not see all the name resolution activity.)

   It should be noted that DNS recursive resolvers sometimes forward
   requests to other recursive resolvers, typically bigger machines,
   with a larger and more shared cache (and the query hierarchy can be
   even deeper, with more than two levels of recursive resolvers).  From
   the point of view of privacy, these forwarders are like resolvers,
   except that they do not see all of the requests being made (due to
   caching in the first resolver).

   At the time of writing, almost all of this DNS traffic is sent in
   the clear (unencrypted).  However, there is increasing deployment of
   DNS-over-TLS (DoT) [RFC7858] and DNS-over-HTTPS (DoH) [RFC8484],
   particularly in mobile devices, browsers and by providers of anycast
   recursive DNS resolution services.  There are a few cases where there
   is some alternative channel encryption, for instance, in an IPsec
   VPN, at least between the stub resolver and the resolver.

   Today, almost all DNS queries are sent over UDP [thomas-ditl-tcp].
   This has practical consequences when considering encryption of the
   traffic as a possible privacy technique.  Some encryption solutions
   are only designed for TCP, not UDP.

   Another important point to keep in mind when analyzing the privacy
   issues of DNS is the fact that DNS requests received by a server are
   triggered for different reasons.  Let's assume an eavesdropper wants
   to know which web page is viewed by a user.  For a typical web page,
   there are three sorts of DNS requests being issued:

   o  Primary request: this is the domain name in the URL that the user
      typed, selected from a bookmark, or chose by clicking on a
      hyperlink.  Presumably, this is what is of interest for the
      eavesdropper.

   o  Secondary requests: these are the additional requests performed by
      the user agent (here, the web browser) without any direct
      involvement or knowledge of the user.  For the Web, they are
      triggered by embedded content, Cascading Style Sheets (CSS),
      JavaScript code, embedded images, etc.  In some cases, there can
      be dozens of domain names in different contexts on a single web
      page.

   o  Tertiary requests: these are the additional requests performed by
      the DNS itself.  For instance, if the answer to a query is
      a referral to a set of name servers, and the glue records are not
      returned, the resolver will have to do additional requests to turn
      the name servers' names into IP addresses.  Similarly, even if
      glue records are returned, a careful recursive server will do
      tertiary requests to verify the IP addresses of those records.

   It can be noted also that, in the case of a typical web browser, more
   DNS requests than strictly necessary are sent, for instance, to
   prefetch resources that the user may query later or when
   autocompleting the URL in the address bar.  Both are a big privacy
   concern since they may leak information even about non-explicit
   actions.  For instance, just reading a local HTML page, even without
   selecting the hyperlinks, may trigger DNS requests.

   For privacy-related terms, we will use the terminology from
   [RFC6973].

2.  Risks

   This document focuses mostly on the study of privacy risks for the
   end user (the one performing DNS requests).  We consider the risks of
   pervasive surveillance [RFC7258] as well as risks coming from a more
   focused surveillance.  Privacy risks for the holder of a zone (the
   risk that someone gets the data) are discussed in [RFC5936] and
   [RFC5155].  Non-privacy risks (such as cache poisoning) are out of
   scope.

2.1.  The Alleged Public Nature of DNS Data

   It has long been claimed that "the data in the DNS is public".  While
   this sentence makes sense for an Internet-wide lookup system, there
   are multiple facets to the data and metadata involved that deserve a
   more detailed look.  First, access control lists and private
   namespaces notwithstanding, the DNS operates under the assumption
   that public-facing authoritative name servers will respond to "usual"
   DNS queries for any zone they are authoritative for without further
   authentication or authorization of the client (resolver).  Due to the
   lack of search capabilities, only a given QNAME will reveal the
   resource records associated with that name (or that name's
   non-existence).  In other words: one needs to know what to ask for,
   in order to receive a response.  The zone transfer QTYPE [RFC5936] is
   often blocked or restricted to authenticated/authorized access to
   enforce this difference (and maybe for other reasons).

   Another differentiation to be considered is between the DNS data
   itself and a particular transaction (i.e., a DNS name lookup).  DNS
   data and the results of a DNS query are public, within the boundaries
   described above, and may not have any confidentiality requirements.
   However, the same is not true of a single transaction or a sequence
   of transactions; those transactions are not, and should not be,
   public.  A typical example from outside the DNS world is: the web
   site of Alcoholics Anonymous is public; the fact that you visit it
   should not be.

2.2.  Data in the DNS Request

   The DNS request includes many fields, but two of them seem
   particularly relevant for the privacy issues: the QNAME and the
   source IP address.  "source IP address" is used in a loose sense of
   "source IP address + maybe source port", because the port is also in
   the request and can be used to differentiate between several users
   sharing an IP address (behind a Carrier-Grade NAT (CGN), for instance
   [RFC6269]).

   The QNAME is the full name sent by the user.  It gives information
   about what the user does ("What are the MX records of example.net?"
   means the user probably wants to send email to someone at
   example.net, which may be a domain used by only a few persons and is
   therefore very revealing about communication relationships).  Some
   QNAMEs are more sensitive than others.  For instance, querying the A
   record of a well-known web statistics domain reveals very little
   (everybody visits web sites that use this analytics service), but
   querying the A record of www.verybad.example where verybad.example is
   the domain of an organization that some people find offensive or
   objectionable may create more problems for the user.  Also,
   sometimes, the QNAME embeds the software one uses, which could be a
   privacy issue.  One example is the name
   _ldap._tcp.Default-First-Site-Name._sites.gc._msdcs.example.org.
   There are also some BitTorrent clients that query an SRV record for
   _bittorrent-tracker._tcp.domain.example.

   Another important thing about the privacy of the QNAME is the future
   usages.  Today, the lack of privacy is an obstacle to putting
   potentially sensitive or personally identifiable data in the DNS.  At
   the moment, your DNS traffic might reveal that you are doing email
   but not with whom.  If your Mail User Agent (MUA) starts looking up
   Pretty Good Privacy (PGP) keys in the DNS [RFC7929], then privacy
   becomes a lot more important.  And email is just an example; there
   would be other really interesting uses for a more privacy-friendly
   DNS.

   For the communication between the stub resolver and the recursive
   resolver, the source IP address is the address of the user's machine.
   Therefore, all the issues and warnings about collection of IP
   addresses apply here.  For the communication between the recursive
   resolver and the authoritative name servers, the source IP address
   has a different meaning; it does not have the same status as the
   source address in an HTTP connection.  It is now the IP address of
   the recursive resolver that, in a way, "hides" the real user.
   However, hiding does not always work.  Sometimes EDNS(0) Client
   Subnet (ECS) [RFC7871] is used (see its privacy analysis in
   [denis-edns-client-subnet]).  Sometimes the end user has a personal
   recursive resolver on their machine.  In both cases, the IP address
   is as sensitive as it is for HTTP [sidn-entrada].

   A note about IP addresses: there is currently no IETF document that
   describes in detail all the privacy issues around IP addressing.  In
   the meantime, the discussion here is intended to include both IPv4
   and IPv6 source addresses.  For a number of reasons, their assignment
   and utilization characteristics are different, which may have
   implications for details of information leakage associated with the
   collection of source addresses.  (For example, a specific IPv6 source
   address seen on the public Internet is less likely than an IPv4
   address to originate behind a CGN or other NAT.)  However, for both
   IPv4 and IPv6 addresses, it's important to note that source addresses
   are propagated with queries and comprise metadata about the host,
   user, or application that originated them.

2.2.1.  Data in the DNS payload

   At the time of writing there are no standardized client identifiers
   contained in the DNS payload itself (ECS [RFC7871], while widely
   used, is only of Category Informational).
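
   As a rough illustration of the request data discussed in
   Section 2.2, the following Python sketch builds an unencrypted query
   by hand and optionally attaches an ECS option.  It is a minimal
   sketch, not a reference implementation: the function names, the
   1232-byte UDP payload size, and the 192.0.2.130 documentation
   address are assumptions made for this example.  It shows that the
   QNAME travels as readable labels and that ECS adds a truncated
   client address to the payload.

```python
import ipaddress
import struct

QTYPE_AAAA = 28       # AAAA record type
QTYPE_OPT = 41        # EDNS(0) OPT pseudo-record type
QCLASS_IN = 1
ECS_OPTION_CODE = 8   # EDNS option code for Client Subnet


def encode_qname(name: str) -> bytes:
    """Encode a name as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_ecs_option(client_ip: str, prefix_len: int) -> bytes:
    """Build an ECS option carrying only the first prefix_len bits."""
    addr = ipaddress.ip_address(client_ip)
    family = 1 if addr.version == 4 else 2
    # Zero the host bits, then keep only the bytes covered by the prefix.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    n_bytes = (prefix_len + 7) // 8
    addr_bytes = network.network_address.packed[:n_bytes]
    # FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH (0 in queries)
    body = struct.pack("!HBB", family, prefix_len, 0) + addr_bytes
    return struct.pack("!HH", ECS_OPTION_CODE, len(body)) + body


def build_query(name: str, qtype: int, ecs: bytes = b"") -> bytes:
    """Build a query; if ecs is given, append an OPT record carrying it."""
    arcount = 1 if ecs else 0
    # ID=0, flags=0x0100 (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, arcount)
    question = encode_qname(name) + struct.pack("!HH", qtype, QCLASS_IN)
    opt = b""
    if ecs:
        # Root name, TYPE=OPT, CLASS=UDP payload size, TTL=0, RDLENGTH
        opt = b"\x00" + struct.pack("!HHIH", QTYPE_OPT, 1232, 0, len(ecs)) + ecs
    return header + question + opt


plain = build_query("www.example.com", QTYPE_AAAA)
with_ecs = build_query(
    "www.example.com", QTYPE_AAAA, ecs=build_ecs_option("192.0.2.130", 24)
)
print(plain.hex())
print(with_ecs.hex())
```

   Anyone who can read either message sees the labels "www", "example",
   and "com" directly.  In the second message the authoritative server
   also learns the client's /24, even though the query arrives from the
   recursive resolver's address.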

   DNS Cookies [RFC7873] are a lightweight DNS transaction security
   mechanism that provides limited protection against a variety of
   increasingly common denial-of-service and amplification/forgery or
   cache poisoning attacks by off-path attackers.  It is noted, however,
   that they are designed to just verify IP addresses (and should change
   once a client's IP address changes); they are not designed to
   actively track users (like HTTP cookies).

   There are anecdotal accounts of MAC addresses [1] and even user names
   being inserted in non-standard EDNS(0) options for stub to resolver
   communications to support proprietary functionality implemented at
   the resolver (e.g., parental filtering).

2.3.  Cache Snooping

   The content of recursive resolvers' caches can reveal data about the
   clients using them (the privacy risks depend on the number of
   clients).  This information can sometimes be examined by sending DNS
   queries with RD=0 to inspect cache content, particularly looking at
   the DNS TTLs [grangeia.snooping].  Since this also is a
   reconnaissance technique for subsequent cache poisoning attacks, some
   countermeasures have already been developed and deployed.

2.4.  On the Wire

2.4.1.  Unencrypted Transports

   For unencrypted transports, DNS traffic can be seen by an
   eavesdropper like any other traffic.  (DNSSEC, specified in
   [RFC4033], explicitly excludes confidentiality from its goals.)  So,
   if an initiator starts an HTTPS communication with a recipient, while
   the HTTP traffic will be encrypted, the DNS exchange prior to it will
   not be.  As other protocols become more and more privacy-aware
   and secured against surveillance (e.g., [RFC8446],
   [I-D.ietf-quic-transport]), the use of unencrypted transports for DNS
   may become "the weakest link" in privacy.
   It is noted that at the
   time of writing there is ongoing work attempting to encrypt the SNI
   in the TLS handshake [I-D.ietf-tls-sni-encryption].

   An important peculiarity of the DNS traffic is that it may take a
   different path than the communication between the initiator and the
   recipient.  For instance, an eavesdropper may be unable to tap the
   wire between the initiator and the recipient but may have access to
   the wire going to the recursive resolver, or to the authoritative
   name servers.

   The best place to tap, from an eavesdropper's point of view, is
   clearly between the stub resolvers and the recursive resolvers,
   because traffic is not limited by DNS caching.

   The attack surface between the stub resolver and the rest of the
   world can vary widely depending upon how the end user's computer is
   configured.  In order of increasing attack surface:

      The recursive resolver can be on the end user's computer.  In
      (currently) a small number of cases, individuals may choose to
      operate their own DNS resolver on their local machine.  In this
      case, the attack surface for the connection between the stub
      resolver and the caching resolver is limited to that single
      machine.

      The recursive resolver may be at the local network edge.  For
      many/most enterprise networks and for some residential users, the
      caching resolver may exist on a server at the edge of the local
      network.  In this case, the attack surface is the local network.
      Note that in large enterprise networks, the DNS resolver may not
      be located at the edge of the local network but rather at the edge
      of the overall enterprise network.  In this case, the enterprise
      network could be thought of as similar to the Internet Access
      Provider (IAP) network referenced below.

      The recursive resolver can be in the IAP premises.
      For most
      residential users and potentially other networks, the typical case
      is for the end user's computer to be configured (typically
      automatically through DHCP) with the addresses of the DNS
      recursive resolvers at the IAP.  The attack surface for
      on-the-wire attacks is therefore from the end-user system across
      the local network and across the IAP network to the IAP's
      recursive resolvers.

      The recursive resolver can be a public DNS service.  Some machines
      may be configured to use public DNS resolvers such as those
      operated today by Google Public DNS or OpenDNS.  The end user may
      have configured their machine to use these DNS recursive resolvers
      themselves -- or their IAP may have chosen to use the public DNS
      resolvers rather than operating their own resolvers.  In this
      case, the attack surface is the entire public Internet between the
      end user's connection and the public DNS service.

2.4.2.  Encrypted Transports

   The use of encrypted transports directly mitigates passive
   surveillance of the DNS payload; however, some privacy attacks are
   still possible.

   These are cases where user identification, fingerprinting or
   correlations may be possible due to the use of certain transport
   layers or clear text/observable features.  These issues are not
   specific to DNS, but DNS traffic is susceptible to these attacks when
   using specific transports.

   There are some general examples: certain studies have highlighted
   that IP TTL or TCP window size values (os-fingerprint [2]) can be
   used to fingerprint client OSes, or that various techniques can be
   used to de-NAT DNS queries (dns-de-nat [3]).

   The use of clear text transport options to decrease latency may also
   identify a user, e.g., using TCP Fast Open [RFC7413].
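
   The IP TTL observation mentioned above can be illustrated with a
   small Python sketch.  It is a simplification under stated
   assumptions, not the method of the cited studies: it assumes senders
   start from one of three commonly cited initial TTL values (64, 128,
   255) and that each router on the path decrements the TTL by one.
   The function names and the sample observations are hypothetical.

```python
# Assumed set of common initial TTL values, for illustration only.
COMMON_INITIAL_TTLS = (64, 128, 255)


def guess_initial_ttl(observed_ttl: int) -> int:
    """Return the smallest assumed initial TTL not below the observed one."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("IP TTL must be between 1 and 255")
    return next(t for t in COMMON_INITIAL_TTLS if observed_ttl <= t)


def estimate_hops(observed_ttl: int) -> int:
    """Estimate how many routers the packet crossed before observation."""
    return guess_initial_ttl(observed_ttl) - observed_ttl


# Hypothetical TTLs seen on packets carrying encrypted DNS traffic.
observations = {"client-a": 57, "client-b": 116}
for client, ttl in observations.items():
    print(client, guess_initial_ttl(ttl), estimate_hops(ttl))
```

   Encrypting the DNS payload does not hide these header fields, so an
   observer can still split clients into coarse groups by the guessed
   initial value.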

   More specifically, since the deployment of encrypted transports is
   not widespread at the time of writing, users wishing to use encrypted
   transports for DNS may in practice be limited in the resolver
   services available.  Given this, the choice of a user to configure a
   single resolver (or a fixed set of resolvers) and an encrypted
   transport to use in all network environments can actually serve to
   identify the user as one that desires privacy and can provide an
   added mechanism to track them as they move across network
   environments.

   Users of encrypted transports are also highly likely to re-use
   sessions for multiple DNS queries to optimize performance (e.g., via
   DNS pipelining or HTTPS multiplexing).  Certain configuration options
   for encrypted transports could also in principle fingerprint a user,
   for example session resumption, the maximum number of messages to
   send, or a maximum connection time before closing a connection and
   re-opening.

   Whilst there are known attacks on older versions of TLS, the most
   recent recommendations [RFC7525] and developments [RFC8446] in this
   area largely mitigate those.

   Traffic analysis of unpadded encrypted traffic is also possible
   [pitfalls-of-dns-encrption] because the sizes and timing of encrypted
   DNS requests and responses can be correlated to unencrypted DNS
   requests upstream of a recursive resolver.

2.5.  In the Servers

   Using the terminology of [RFC6973], the DNS servers (recursive
   resolvers and authoritative servers) are enablers: they facilitate
   communication between an initiator and a recipient without being
   directly in the communications path.  As a result, they are often
   forgotten in risk analysis.  But, to quote again [RFC6973], "Although
   [...] enablers may not generally be considered as attackers, they may
   all pose privacy threats (depending on the context) because they are
   able to observe, collect, process, and transfer privacy-relevant
   data."  In [RFC6973] parlance, enablers become observers when they
   start collecting data.

   Many programs exist to collect and analyze DNS data at the servers --
   from the "query log" of some programs like BIND to tcpdump and more
   sophisticated programs like PacketQ [packetq] [packetq-list] and
   DNSmezzo [dnsmezzo].  The organization managing the DNS server can
   use this data itself, or it can be part of a surveillance program
   like PRISM [prism] and pass data to an outside observer.

   Sometimes, this data is kept for a long time and/or distributed to
   third parties for research purposes [ditl] [day-at-root], security
   analysis, or surveillance tasks.  These uses are sometimes under some
   sort of contract, with various limitations, for instance, on
   redistribution, given the sensitive nature of the data.  Also, there
   are observation points in the network that gather DNS data and then
   make it accessible to third parties for research or security purposes
   ("passive DNS" [passive-dns]).

2.5.1.  In the Recursive Resolvers

   Recursive Resolvers see all the traffic since there is typically no
   caching before them.  To summarize: your recursive resolver knows a
   lot about you.  The resolver of a large IAP, or a large public
   resolver, can collect data from many users.  You may get an idea of
   the data collected by reading the privacy policy of a big public
   resolver, e.g., .

2.5.1.1.  Encrypted transports

   Use of encrypted transports does not reduce the data available in the
   recursive resolver and ironically can actually expose more
   information about users to operators.
   As mentioned in Section 2.4,
   use of session-based encrypted transports (TCP/TLS) can expose
   correlation data about users.  Such concerns in the TCP/TLS layers
   apply equally to DoT and DoH, which both use TLS as the underlying
   transport.

2.5.1.2.  DoH vs DoT

   The proposed specification for DoH [RFC8484] includes a Privacy
   Considerations section which highlights some of the differences
   between HTTP and DNS.  As a deliberate design choice, DoH inherits
   the privacy properties of the HTTPS stack and as a consequence
   introduces new privacy concerns when compared with DNS over UDP, TCP
   or TLS [RFC7858].  The rationale for this decision is that retaining
   the ability to leverage the full functionality of the HTTP ecosystem
   is more important than placing specific constraints on this new
   protocol based on privacy considerations (modulo limiting the use of
   HTTP cookies).

   In analyzing the new issues introduced by DoH it is helpful to
   recognize that there exists a natural tension between

   o  the wide practice in HTTP to use various headers to optimize HTTP
      connections, functionality and behaviour (which can facilitate
      user identification and tracking)

   o  and the fact that the DNS payload is currently very tightly
      encoded and contains no standardized user identifiers.

   DoT, for example, would normally contain no client identifiers above
   the TLS layer and a resolver would see only a stream of DNS query
   payloads originating within one or more connections from a client IP
   address.  By contrast, if DoH clients commonly send several headers
   with a DNS message (e.g., user-agent and accept-language), this could
   lead to the DoH server being able to identify the source of
   individual DNS requests not only to a specific end user device but to
   a specific application.
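
   To make the contrast concrete, the Python sketch below frames the
   same DNS query the way DoT carries it (a two-byte length prefix, as
   in DNS over TCP) and the way a DoH GET request carries it (base64url
   without padding in a "dns" parameter).  This is a sketch under
   stated assumptions: the /dns-query path is only a commonly used
   example, and the helper names and the "ExampleBrowser/1.0" header
   value are hypothetical.  No TLS or HTTP connection is made.

```python
import base64
import struct

# Wire-format query for "www.example.com", type A, with ID 0.
QUERY = (
    struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    + b"\x03www\x07example\x03com\x00"
    + struct.pack("!HH", 1, 1)
)


def dot_frame(dns_message: bytes) -> bytes:
    """DoT payload inside TLS: two-byte length, then the DNS message."""
    return struct.pack("!H", len(dns_message)) + dns_message


def doh_get_request(dns_message: bytes, extra_headers=None):
    """Return (path, headers) for a DoH GET carrying dns_message."""
    encoded = base64.urlsafe_b64encode(dns_message).rstrip(b"=").decode("ascii")
    headers = {"accept": "application/dns-message"}
    # Anything added here is visible to the DoH server with each query.
    headers.update(extra_headers or {})
    return "/dns-query?dns=" + encoded, headers


frame = dot_frame(QUERY)
path, minimal = doh_get_request(QUERY)
_, chatty = doh_get_request(
    QUERY,
    {"user-agent": "ExampleBrowser/1.0", "accept-language": "fr-FR"},
)

print(len(frame), path)
print(sorted(set(chatty) - set(minimal)))
```

   The DoT frame contains nothing beyond the DNS message and its
   length, whereas the second DoH request carries two extra headers
   that say something about the application and its user.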

   Additionally, depending on the client architecture, isolation of DoH
   queries from other HTTP traffic may or may not be feasible or
   desirable.  Depending on the use case, isolation of DoH queries from
   other HTTP traffic may or may not increase privacy.

   The picture for privacy considerations and user expectations for DoH
   with respect to what additional data may be available to the DoH
   server compared to DNS over UDP, TCP or TLS is complex and requires a
   detailed analysis for each use case.  In particular the choice of
   HTTPS functionality vs privacy is specifically made an implementation
   choice in DoH and users may well have differing privacy expectations
   depending on the DoH use case and implementation.

   At the extremes, there may be implementations that attempt to achieve
   parity with DoT from a privacy perspective at the cost of using no
   identifiable headers; there might be others that provide
   feature-rich data flows where the low-level origin of the DNS query
   is easily identifiable.

   Privacy-focused users should be aware of the potential for
   additional client identifiers in DoH compared to DoT and may want to
   only use DoH implementations that provide clear guidance on what
   identifiers they add.

2.5.2.  In the Authoritative Name Servers

   Unlike what happens for recursive resolvers, observation capabilities
   of authoritative name servers are limited by caching; they see only
   the requests for which the answer was not in the cache.  For
   aggregated statistics ("What is the percentage of LOC queries?"),
   this is sufficient, but it prevents an observer from seeing
   everything.  Still, the authoritative name servers see a part of the
   traffic, and this subset may be sufficient to violate some privacy
   expectations.
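
   The effect of caching on what each party observes can be shown with
   a toy model in Python, following the simplified resolution walk in
   Section 1.  It is a sketch, not a real resolver: it assumes a single
   fixed delegation chain (root, .com, example.com), never expires
   cache entries, and sends the full original question at every step.
   The class and attribute names are made up for this example.

```python
from collections import defaultdict

# Assumed delegation chain for names under example.com, root first.
ZONES = [".", "com.", "example.com."]


class ToyResolver:
    """Toy recursive resolver: no TTL expiry, one fixed delegation chain."""

    def __init__(self):
        self.known_zones = set()           # zones whose servers are cached
        self.answers = {}                  # (qname, qtype) -> cached answer
        self.client_queries = 0            # everything the resolver sees
        self.seen_by = defaultdict(list)   # zone -> questions its servers saw

    def resolve(self, qname, qtype):
        self.client_queries += 1
        key = (qname, qtype)
        if key in self.answers:
            return self.answers[key]       # cache hit: nothing goes upstream
        # Start at the deepest zone whose name servers are already cached.
        start = max(
            (i for i, zone in enumerate(ZONES) if zone in self.known_zones),
            default=0,
        )
        for zone in ZONES[start:]:
            # The full original question is sent at every step.
            self.seen_by[zone].append(key)
            self.known_zones.add(zone)
        self.answers[key] = f"answer({qname}, {qtype})"
        return self.answers[key]


resolver = ToyResolver()
resolver.resolve("www.example.com.", "AAAA")               # cold cache
resolver.resolve("_xmpp-server._tcp.example.com.", "SRV")  # delegation cached
resolver.resolve("www.example.com.", "AAAA")               # answer cached

print("resolver saw", resolver.client_queries, "queries")
for zone in ZONES:
    print(zone, "servers saw", resolver.seen_by[zone])
```

   In this run the recursive resolver sees all three client queries,
   the example.com servers see two questions, and the root and .com
   servers see only the first one, but with its full QNAME.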

   Also, the end user typically has some legal/contractual link with the
   recursive resolver (the user has chosen the IAP, or has chosen to use
   a given public resolver), while having no control and perhaps no
   awareness of the role of the authoritative name servers and their
   observation abilities.

   As noted before, using a local resolver or a resolver close to the
   machine decreases the attack surface for an on-the-wire eavesdropper.
   But it may decrease privacy against an observer located on an
   authoritative name server.  This authoritative name server will see
   the IP address of the end client instead of the address of a big
   recursive resolver shared by many users.

   This "protection", when using a large resolver with many clients, is
   no longer present if ECS [RFC7871] is used because, in this case, the
   authoritative name server sees the original IP address (or prefix,
   depending on the setup).

   As of today, all the instances of one root name server, L-root,
   receive together around 50,000 queries per second.  While most of it
   is "junk" (errors on the Top-Level Domain (TLD) name), it gives an
   idea of the amount of big data that pours into name servers.  (And
   even "junk" can leak information; for instance, if there is a typing
   error in the TLD, the user will send data to a TLD that is not the
   usual one.)

   Many domains, including TLDs, are partially hosted by third-party
   servers, sometimes in a different country.  The contracts between the
   domain manager and these servers may or may not take privacy into
   account.  Whatever the contract, the third-party hoster may be honest
   or not but, in any case, it will have to follow its local laws.  So,
   requests to a given ccTLD may go to servers managed by organizations
   outside of the ccTLD's country.  End users may not anticipate that,
   when doing a security analysis.
582 Also, it seems (see the survey described in [aeris-dns]) that there 583 is a strong concentration of authoritative name servers among 584 "popular" domains (such as the Alexa Top N list). For instance, 585 among the Alexa Top 100K, one DNS provider today hosts 10% of the 586 domains. The ten most important DNS providers together host one 587 third of the domains. With the control (or the ability to sniff the 588 traffic) of a few name servers, you can gather a lot of information. 590 2.5.3. Rogue Servers 592 The previous paragraphs discussed DNS privacy, assuming that all the 593 traffic was directed to the intended servers and that the potential 594 attacker was purely passive. But, in reality, we can have active 595 attackers redirecting the traffic, not to change it but just to 596 observe it. 598 For instance, a rogue DHCP server, or a trusted DHCP server that has 599 had its configuration altered by malicious parties, can direct you to 600 a rogue recursive resolver. Most of the time, it seems to be done to 601 divert traffic by providing lies for some domain names. But it could 602 be used just to capture the traffic and gather information about you. 603 Other attacks, besides using DHCP, are possible. The traffic from a 604 DNS client to a DNS server can be intercepted along its way from 605 originator to intended destination, for instance, by transparent DNS 606 proxies in the network that will divert the traffic intended for a 607 legitimate DNS server. This rogue server can masquerade as the 608 intended server and respond with data to the client. (Rogue servers 609 that inject malicious data are possible, but it is a separate problem 610 not relevant to privacy.) A rogue server may respond correctly for a 611 long period of time, thereby avoiding detection. This may be done 612 for what could be claimed to be good reasons, such as optimization or 613 caching, but it leads to a reduction of privacy compared to the case 614 where no attacker is present.
Also, malware like DNSchanger [dnschanger] 615 can change the recursive resolver in the machine's configuration, or 616 the routing itself can be subverted (for instance, 617 [ripe-atlas-turkey]). 619 2.5.4. Authentication of servers 621 Both DoH and Strict mode for DoT require authentication of the server; 622 therefore, as long as the authentication credentials are obtained 623 over a secure channel, using either of these transports defeats 624 the attack of redirecting traffic to rogue servers. Of course, 625 attacks on these secure channels are also possible, but they are out of the 626 scope of this document. 628 2.5.5. Blocking of services 630 User privacy can also be at risk if there is blocking (by local 631 network operators or more general mechanisms) of access to recursive 632 servers that offer encrypted transports. For example, active blocking 633 of port 853 for DoT or of specific IP addresses (e.g., 1.1.1.1 or 634 2606:4700:4700::1111) could restrict the resolvers available to the 635 client. Similarly, attacks on such services (e.g., DDoS) could force 636 users to switch to other services that do not offer encrypted 637 transports for DNS. 639 2.6. Re-identification and Other Inferences 641 An observer has access not only to the data they directly collect 642 but also to the results of various inferences about this data. 644 For instance, a user can be re-identified via DNS queries. If the 645 adversary knows a user's identity and can watch their DNS queries for 646 a period, then that same adversary may be able to re-identify the 647 user solely based on their pattern of DNS queries later on, regardless 648 of the location from which the user makes those queries. For 649 example, one study [herrmann-reidentification] found that such re- 650 identification is possible so that "73.1% of all day-to-day links 651 were correctly established, i.e.
user u was either re-identified 652 unambiguously (1) or the classifier correctly reported that u was not 653 present on day t+1 any more (2)." While that study related to web 654 browsing behavior, equally characteristic patterns may be produced 655 even in machine-to-machine communications or without a user taking 656 specific actions, e.g., at reboot time if a characteristic set of 657 services is accessed by the device. 659 For instance, one could imagine that an intelligence agency 660 identifies people going to a site by putting in a very long DNS name 661 and looking for queries of a specific length. Such traffic analysis 662 could weaken some privacy solutions. 664 The IAB privacy and security program also has a document 665 [RFC7624] that considers such inference-based attacks in a more 666 general framework. 668 2.7. More Information 670 Useful background information can also be found in [tor-leak] (about 671 the risk of privacy leaks through DNS) and in a few academic papers: 672 [yanbin-tsudik], [castillo-garcia], [fangming-hori-sakurai], and 673 [federrath-fuchs-herrmann-piosecny]. 675 3. Actual "Attacks" 677 A very quick examination of DNS traffic may lead to the false 678 conclusion that extracting the needle from the haystack is difficult. 679 "Interesting" primary DNS requests are mixed with useless (for the 680 eavesdropper) secondary and tertiary requests (see the terminology in 681 Section 1). But, in this time of "big data" processing, powerful 682 techniques now exist to get from the raw data to what the 683 eavesdropper is actually interested in. 685 Many research papers about malware detection use DNS traffic to 686 detect "abnormal" behavior that can be traced back to the activity of 687 malware on infected machines. This research was done for good 688 purposes, but technically it is a privacy attack, and it demonstrates the 689 power of the observation of DNS traffic. See [dns-footprint], 690 [dagon-malware], and [darkreading-dns].
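The kind of inference discussed in Section 2.6 (re-identifying a user from their pattern of DNS queries) can be sketched in a few lines of Python. This is not the classifier used in [herrmann-reidentification]; it substitutes a simple Jaccard similarity over sets of queried names, and the function names, the 0.5 threshold, and the sample data are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of query names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reidentify(known_profiles, observed_queries, threshold=0.5):
    """Return the known user whose past queries best match the observation.

    known_profiles: dict mapping a user label to the names they queried before.
    observed_queries: names seen later, possibly from another location.
    Returns None when no profile reaches the (assumed) threshold.
    """
    best_user, best_score = None, 0.0
    for user, names in known_profiles.items():
        score = jaccard(names, observed_queries)
        if score > best_score:
            best_user, best_score = user, score
    return best_user if best_score >= threshold else None


if __name__ == "__main__":
    profiles = {
        "u1": {"a.example", "b.example", "c.example"},
        "u2": {"x.example", "y.example"},
    }
    later = {"a.example", "b.example", "d.example"}
    print(reidentify(profiles, later))
```

The sketch uses no IP address at all: the match relies only on which names are queried, which is why changing network location does not by itself prevent this kind of linkage.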
692 Passive DNS systems [passive-dns] sometimes allow reconstruction of the 693 data of an entire zone. They are used for many reasons -- some 694 good, some bad. Well-known passive DNS systems keep only the DNS 695 responses, and not the source IP address of the client, precisely for 696 privacy reasons. Other passive DNS systems may not be so careful. 697 And there are still potential problems with revealing QNAMEs. 699 The revelations (from the Edward Snowden documents, which were leaked 700 from the National Security Agency (NSA)) of the MORECOWBELL 701 surveillance program [morecowbell], which uses the DNS, both 702 passively and actively, to surreptitiously gather information about 703 the users, are another good example showing that the lack of privacy 704 protections in the DNS is actively exploited. 706 4. Legalities 708 To our knowledge, there are no specific privacy laws for DNS data, in 709 any country. Interpreting general privacy laws like 710 [data-protection-directive] or GDPR [4] applicable in the European 711 Union in the context of DNS traffic data is not an easy task, and we 712 do not know of a court precedent here. See an interesting analysis in 713 [sidn-entrada]. 715 5. Security Considerations 717 This document is entirely about security, more precisely privacy. It 718 just lays out the problem; it does not try to set requirements (with 719 the choices and compromises they imply), much less define solutions. 720 Possible solutions to the issues described here are discussed in 721 other documents (currently too many to all be mentioned); see, for 722 instance, 'Recommendations for DNS Privacy Service Operators' 723 [I-D.ietf-dprive-bcp-op]. 725 6. Acknowledgments 727 Thanks to Nathalie Boulvard and to the CENTR members for the original 728 work that led to this document. Thanks to Ondrej Sury for the 729 interesting discussions.
Thanks to Mohsen Souissi and John Heidemann 730 for proofreading and to Paul Hoffman, Matthijs Mekking, Marcos Sanz, 731 Tim Wicinski, Francis Dupont, Allison Mankin, and Warren Kumari for 732 proofreading, providing technical remarks, and making many 733 readability improvements. Thanks to Dan York, Suzanne Woolf, Tony 734 Finch, Stephen Farrell, Peter Koch, Simon Josefsson, and Frank Denis 735 for good written contributions. And thanks to the IESG members for 736 the last remarks. 738 7. Changelog 740 draft-ietf-dprive-rfc7626-bis-00 742 o Rename after WG adoption 744 o Use DoT acronym throughout 746 o Minor updates to status of deployment and other drafts 748 draft-bortzmeyer-dprive-rfc7626-bis-02 750 o Update various references and fix some nits. 752 draft-bortzmeyer-dprive-rfc7626-bis-01 754 o Update reference for dickinson-bcp-op to draft-dickinson-dprive- 755 bcp-op 757 draft-bortzmeyer-dprive-rfc7626-bis-00: 759 Initial commit. Differences to RFC7626: 761 o Update many references 762 o Add discussions of encrypted transports including DoT and DoH 764 o Add section on DNS payload 766 o Add section on authentication of servers 768 o Add section on blocking of services 770 8. References 772 8.1. Normative References 774 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", 775 STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987, 776 . 778 [RFC1035] Mockapetris, P., "Domain names - implementation and 779 specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, 780 November 1987, . 782 [RFC6973] Cooper, A., Tschofenig, H., Aboba, B., Peterson, J., 783 Morris, J., Hansen, M., and R. Smith, "Privacy 784 Considerations for Internet Protocols", RFC 6973, 785 DOI 10.17487/RFC6973, July 2013, . 788 [RFC7258] Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an 789 Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May 790 2014, . 792 8.2. Informative References 794 [aeris-dns] 795 Vinot, N., "Vie privee: et le DNS alors?", (In French), 796 2015, .
799 [castillo-garcia] 800 Castillo-Perez, S. and J. Garcia-Alfaro, "Anonymous 801 Resolution of DNS Queries", 2008, 802 . 804 [dagon-malware] 805 Dagon, D., "Corrupted DNS Resolution Paths: The Rise of a 806 Malicious Resolution Authority", ISC/OARC Workshop, 2007, 807 . 810 [darkreading-dns] 811 Lemos, R., "Got Malware? Three Signs Revealed In DNS 812 Traffic", InformationWeek Dark Reading, May 2013, 813 . 817 [data-protection-directive] 818 European Parliament, "Directive 95/46/EC of the European 819 Parliament and of the Council on the protection of 820 individuals with regard to the processing of personal data 821 and on the free movement of such data", Official Journal L 822 281, pp. 0031 - 0050, November 1995, . 826 [day-at-root] 827 Castro, S., Wessels, D., Fomenkov, M., and K. Claffy, "A 828 Day at the Root of the Internet", ACM SIGCOMM Computer 829 Communication Review, Vol. 38, Number 5, 830 DOI 10.1145/1452335.1452341, October 2008, 831 . 834 [denis-edns-client-subnet] 835 Denis, F., "Security and privacy issues of edns-client- 836 subnet", August 2013, . 839 [ditl] CAIDA, "A Day in the Life of the Internet (DITL)", 2002, 840 . 842 [dns-footprint] 843 Stoner, E., "DNS Footprint of Malware", OARC Workshop, 844 October 2010, . 847 [dnschanger] 848 Wikipedia, "DNSChanger", October 2013, 849 . 852 [dnsmezzo] 853 Bortzmeyer, S., "DNSmezzo", 2009, 854 . 856 [fangming-hori-sakurai] 857 Fangming, Z., Hori, Y., and K. Sakurai, "Analysis of 858 Privacy Disclosure in DNS Query", 2007 International 859 Conference on Multimedia and Ubiquitous Engineering (MUE 860 2007), Seoul, Korea, ISBN: 0-7695-2777-9, pp. 952-957, 861 DOI 10.1109/MUE.2007.84, April 2007, 862 . 864 [federrath-fuchs-herrmann-piosecny] 865 Federrath, H., Fuchs, K., Herrmann, D., and C. Piosecny, 866 "Privacy-Preserving DNS: Analysis of Broadcast, Range 867 Queries and Mix-based Protection Methods", Computer 868 Security ESORICS 2011, Springer, page(s) 665-683, 869 ISBN 978-3-642-23821-5, 2011, .
873 [grangeia.snooping] 874 Grangeia, L., "DNS Cache Snooping or Snooping the Cache 875 for Fun and Profit", February 2004, 876 . 879 [herrmann-reidentification] 880 Herrmann, D., Gerber, C., Banse, C., and H. Federrath, 881 "Analyzing Characteristic Host Access Patterns for Re- 882 Identification of Web User Sessions", 883 DOI 10.1007/978-3-642-27937-9_10, 2012, . 886 [I-D.ietf-dprive-bcp-op] 887 Dickinson, S., Overeinder, B., Rijswijk-Deij, R., and A. 888 Mankin, "Recommendations for DNS Privacy Service 889 Operators", draft-ietf-dprive-bcp-op-02 (work in 890 progress), March 2019. 892 [I-D.ietf-quic-transport] 893 Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed 894 and Secure Transport", draft-ietf-quic-transport-20 (work 895 in progress), April 2019. 897 [I-D.ietf-tls-sni-encryption] 898 Huitema, C. and E. Rescorla, "Issues and Requirements for 899 SNI Encryption in TLS", draft-ietf-tls-sni-encryption-04 900 (work in progress), November 2018. 902 [morecowbell] 903 Grothoff, C., Wachs, M., Ermert, M., and J. Appelbaum, 904 "NSA's MORECOWBELL: Knell for DNS", GNUnet e.V., January 905 2015, . 907 [packetq] Dot SE, "PacketQ, a simple tool to make SQL-queries 908 against PCAP-files", 2011, 909 . 911 [packetq-list] 912 PacketQ, "PacketQ Mailing List", 913 . 915 [passive-dns] 916 Weimer, F., "Passive DNS Replication", April 2005, 917 . 919 [pitfalls-of-dns-encrption] 920 Shulman, H., "Pretty Bad Privacy:Pitfalls of DNS 921 Encryption", . 924 [prism] Wikipedia, "PRISM (surveillance program)", July 2015, 925 . 928 [RFC4033] Arends, R., Austein, R., Larson, M., Massey, D., and S. 929 Rose, "DNS Security Introduction and Requirements", 930 RFC 4033, DOI 10.17487/RFC4033, March 2005, 931 . 933 [RFC5155] Laurie, B., Sisson, G., Arends, R., and D. Blacka, "DNS 934 Security (DNSSEC) Hashed Authenticated Denial of 935 Existence", RFC 5155, DOI 10.17487/RFC5155, March 2008, 936 . 938 [RFC5936] Lewis, E. and A. 
Hoenes, Ed., "DNS Zone Transfer Protocol 939 (AXFR)", RFC 5936, DOI 10.17487/RFC5936, June 2010, 940 . 942 [RFC6269] Ford, M., Ed., Boucadair, M., Durand, A., Levis, P., and 943 P. Roberts, "Issues with IP Address Sharing", RFC 6269, 944 DOI 10.17487/RFC6269, June 2011, . 947 [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP 948 Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, 949 . 951 [RFC7525] Sheffer, Y., Holz, R., and P. Saint-Andre, 952 "Recommendations for Secure Use of Transport Layer 953 Security (TLS) and Datagram Transport Layer Security 954 (DTLS)", BCP 195, RFC 7525, DOI 10.17487/RFC7525, May 955 2015, . 957 [RFC7624] Barnes, R., Schneier, B., Jennings, C., Hardie, T., 958 Trammell, B., Huitema, C., and D. Borkmann, 959 "Confidentiality in the Face of Pervasive Surveillance: A 960 Threat Model and Problem Statement", RFC 7624, 961 DOI 10.17487/RFC7624, August 2015, . 964 [RFC7858] Hu, Z., Zhu, L., Heidemann, J., Mankin, A., Wessels, D., 965 and P. Hoffman, "Specification for DNS over Transport 966 Layer Security (TLS)", RFC 7858, DOI 10.17487/RFC7858, May 967 2016, . 969 [RFC7871] Contavalli, C., van der Gaast, W., Lawrence, D., and W. 970 Kumari, "Client Subnet in DNS Queries", RFC 7871, 971 DOI 10.17487/RFC7871, May 2016, . 974 [RFC7873] Eastlake 3rd, D. and M. Andrews, "Domain Name System (DNS) 975 Cookies", RFC 7873, DOI 10.17487/RFC7873, May 2016, 976 . 978 [RFC7929] Wouters, P., "DNS-Based Authentication of Named Entities 979 (DANE) Bindings for OpenPGP", RFC 7929, 980 DOI 10.17487/RFC7929, August 2016, . 983 [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol 984 Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, 985 . 987 [RFC8484] Hoffman, P. and P. McManus, "DNS Queries over HTTPS 988 (DoH)", RFC 8484, DOI 10.17487/RFC8484, October 2018, 989 . 991 [RFC8499] Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS 992 Terminology", BCP 219, RFC 8499, DOI 10.17487/RFC8499, 993 January 2019, . 
995 [ripe-atlas-turkey] 996 Aben, E., "A RIPE Atlas View of Internet Meddling in 997 Turkey", March 2014, 998 . 1001 [sidn-entrada] 1002 Hesselman, C., Jansen, J., Wullink, M., Vink, K., and M. 1003 Simon, "A privacy framework for 'DNS big data' 1004 applications", November 2014, 1005 . 1008 [thomas-ditl-tcp] 1009 Thomas, M. and D. Wessels, "An Analysis of TCP Traffic in 1010 Root Server DITL Data", DNS-OARC 2014 Fall Workshop, 1011 October 2014, . 1015 [tor-leak] 1016 Tor, "DNS leaks in Tor", 2013, 1017 . 1020 [yanbin-tsudik] 1021 Yanbin, L. and G. Tsudik, "Towards Plugging Privacy Leaks 1022 in the Domain Name System", October 2009, 1023 . 1025 8.3. URIs 1027 [1] https://lists.dns-oarc.net/pipermail/dns- 1028 operations/2016-January/014141.html 1030 [2] http://netres.ec/?b=11B99BD 1032 [3] https://www.researchgate.net/publication/320322146_DNS-DNS_DNS- 1033 based_De-NAT_Scheme 1035 [4] https://www.eugdpr.org/the-regulation.html 1037 Authors' Addresses 1038 Stephane Bortzmeyer 1039 AFNIC 1040 1, rue Stephenson 1041 Montigny-le-Bretonneux 1042 France 78180 1044 Email: bortzmeyer+ietf@nic.fr 1046 Sara Dickinson 1047 Sinodun IT 1048 Magdalen Centre 1049 Oxford Science Park 1050 Oxford OX4 4GA 1051 United Kingdom 1053 Email: sara@sinodun.com