dprive                                                     S. Bortzmeyer
Internet-Draft                                                     AFNIC
Obsoletes: 7626 (if approved)                               S. Dickinson
Intended status: Informational                                Sinodun IT
Expires: January 17, 2019                                  July 16, 2018

                       DNS Privacy Considerations
                 draft-bortzmeyer-dprive-rfc7626-bis-01

Abstract

   This document describes the privacy issues associated with the use of
   the DNS by Internet users. It is intended to be an analysis of the
   present situation and does not prescribe solutions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 17, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Risks
     2.1. The Alleged Public Nature of DNS Data
     2.2. Data in the DNS Request
       2.2.1. Data in the DNS payload
     2.3. Cache Snooping
     2.4. On the Wire
       2.4.1. Unencrypted Transports
       2.4.2. Encrypted Transports
     2.5. In the Servers
       2.5.1. In the Recursive Resolvers
       2.5.2. In the Authoritative Name Servers
       2.5.3. Rogue Servers
       2.5.4. Authentication of servers
       2.5.5. Blocking of services
     2.6. Re-identification and Other Inferences
     2.7. More Information
   3. Actual "Attacks"
   4. Legalities
   5. Security Considerations
   6. Acknowledgments
   7. Changelog
   8. References
     8.1. Normative References
     8.2. Informative References
     8.3. URIs
   Authors' Addresses

1. Introduction

   This document is an analysis of the DNS privacy issues, in the spirit
   of Section 8 of [RFC6973].

   The Domain Name System is specified in [RFC1034], [RFC1035], and many
   later RFCs, which have never been consolidated. It is one of the
   most important infrastructure components of the Internet and often
   ignored or misunderstood by Internet users (and even by many
   professionals). Almost every activity on the Internet starts with a
   DNS query (and often several). Its use has many privacy implications
   and this is an attempt at a comprehensive and accurate list.

   Let us begin with a simplified reminder of how the DNS works. (See
   also [I-D.ietf-dnsop-terminology-bis].) A client, the stub resolver,
   issues a DNS query to a server, called the recursive resolver (also
   called caching resolver or full resolver or recursive name server).
   Let's use the query "What are the AAAA records for www.example.com?"
   as an example. AAAA is the QTYPE (Query Type), and www.example.com
   is the QNAME (Query Name).
   (The description that follows assumes a
   cold cache, for instance, because the server just started.) The
   recursive resolver will first query the root name servers. In most
   cases, the root name servers will send a referral. In this example,
   the referral will be to the .com name servers. The resolver repeats
   the query to one of the .com name servers. The .com name servers, in
   turn, will refer to the example.com name servers. The example.com
   name server will then return the answer. The root name servers, the
   name servers of .com, and the name servers of example.com are called
   authoritative name servers. It is important, when analyzing the
   privacy issues, to remember that the question asked to all these name
   servers is always the original question, not a derived question. The
   question sent to the root name servers is "What are the AAAA records
   for www.example.com?", not "What are the name servers of .com?". By
   repeating the full question, instead of just the relevant part of the
   question to the next in line, the DNS provides more information than
   necessary to the name server.

   Because DNS relies on caching heavily, the algorithm described just
   above is actually a bit more complicated, and not all questions are
   sent to the authoritative name servers. If a few seconds later the
   stub resolver asks the recursive resolver, "What are the SRV records
   of _xmpp-server._tcp.example.com?", the recursive resolver will
   remember that it knows the name servers of example.com and will just
   query them, bypassing the root and .com. Because there is typically
   no caching in the stub resolver, the recursive resolver, unlike the
   authoritative servers, sees all the DNS traffic. (Applications, like
   web browsers, may have some form of caching that does not follow DNS
   rules, for instance, because it may ignore the TTL.
   So, the
   recursive resolver does not see all the name resolution activity.)

   It should be noted that DNS recursive resolvers sometimes forward
   requests to other recursive resolvers, typically bigger machines,
   with a larger and more shared cache (and the query hierarchy can be
   even deeper, with more than two levels of recursive resolvers). From
   the point of view of privacy, these forwarders are like resolvers,
   except that they do not see all of the requests being made (due to
   caching in the first resolver).

   Almost all this DNS traffic is currently sent in the clear
   (unencrypted). At the time of writing, there is increasing deployment
   of DNS-over-TLS [RFC7858] and work underway on DoH
   [I-D.ietf-doh-dns-over-https]. There are a few cases where there is
   some alternative channel encryption, for instance, in an IPsec VPN,
   at least between the stub resolver and the resolver.

   Today, almost all DNS queries are sent over UDP [thomas-ditl-tcp].
   This has practical consequences when considering encryption of the
   traffic as a possible privacy technique. Some encryption solutions
   are only designed for TCP, not UDP.

   Another important point to keep in mind when analyzing the privacy
   issues of DNS is the fact that DNS requests received by a server are
   triggered by different reasons. Let's assume an eavesdropper wants
   to know which web page is viewed by a user. For a typical web page,
   there are three sorts of DNS requests being issued:

   Primary request: this is the domain name in the URL that the user
   typed, selected from a bookmark, or chose by clicking on a
   hyperlink. Presumably, this is what is of interest for the
   eavesdropper.

   Secondary requests: these are the additional requests performed by
   the user agent (here, the web browser) without any direct involvement
   or knowledge of the user.
   For the Web, they are triggered by
   embedded content, Cascading Style Sheets (CSS), JavaScript code,
   embedded images, etc. In some cases, there can be dozens of domain
   names in different contexts on a single web page.

   Tertiary requests: these are the additional requests performed by the
   DNS system itself. For instance, if the answer to a query is a
   referral to a set of name servers, and the glue records are not
   returned, the resolver will have to do additional requests to turn
   the name servers' names into IP addresses. Similarly, even if glue
   records are returned, a careful recursive server will do tertiary
   requests to verify the IP addresses of those records.

   It can be noted also that, in the case of a typical web browser, more
   DNS requests than strictly necessary are sent, for instance, to
   prefetch resources that the user may query later or when
   autocompleting the URL in the address bar. Both are a big privacy
   concern since they may leak information even about non-explicit
   actions. For instance, just reading a local HTML page, even without
   selecting the hyperlinks, may trigger DNS requests.

   For privacy-related terms, we will use the terminology from
   [RFC6973].

2. Risks

   This document focuses mostly on the study of privacy risks for the
   end user (the one performing DNS requests). We consider the risks of
   pervasive surveillance [RFC7258] as well as risks coming from a more
   focused surveillance. Privacy risks for the holder of a zone (the
   risk that someone gets the data) are discussed in [RFC5936] and
   [RFC5155]. Non-privacy risks (such as cache poisoning) are out of
   scope.

2.1. The Alleged Public Nature of DNS Data

   It has long been claimed that "the data in the DNS is public". While
   this sentence makes sense for an Internet-wide lookup system, there
   are multiple facets to the data and metadata involved that deserve a
   more detailed look.
   First, access control lists and private
   namespaces notwithstanding, the DNS operates under the assumption
   that public-facing authoritative name servers will respond to "usual"
   DNS queries for any zone they are authoritative for without further
   authentication or authorization of the client (resolver). Due to the
   lack of search capabilities, only a given QNAME will reveal the
   resource records associated with that name (or that name's
   non-existence). In other words: one needs to know what to ask for, in
   order to receive a response. The zone transfer QTYPE [RFC5936] is
   often blocked or restricted to authenticated/authorized access to
   enforce this difference (and maybe for other reasons).

   Another differentiation to be considered is between the DNS data
   itself and a particular transaction (i.e., a DNS name lookup). DNS
   data and the results of a DNS query are public, within the boundaries
   described above, and may not have any confidentiality requirements.
   However, the same is not true of a single transaction or a sequence
   of transactions; that transaction is not / should not be public. A
   typical example from outside the DNS world is: the web site of
   Alcoholics Anonymous is public; the fact that you visit it should not
   be.

2.2. Data in the DNS Request

   The DNS request includes many fields, but two of them seem
   particularly relevant for the privacy issues: the QNAME and the
   source IP address. "source IP address" is used in a loose sense of
   "source IP address + maybe source port", because the port is also in
   the request and can be used to differentiate between several users
   sharing an IP address (behind a Carrier-Grade NAT (CGN), for instance
   [RFC6269]).

   The QNAME is the full name sent by the user. It gives information
   about what the user does ("What are the MX records of example.net?"
   means the user probably wants to send email to someone at
   example.net, which may be a domain used by only a few persons and is
   therefore very revealing about communication relationships). Some
   QNAMEs are more sensitive than others. For instance, querying the A
   record of a well-known web statistics domain reveals very little
   (everybody visits web sites that use this analytics service), but
   querying the A record of www.verybad.example where verybad.example is
   the domain of an organization that some people find offensive or
   objectionable may create more problems for the user. Also, sometimes,
   the QNAME embeds the software one uses, which could be a privacy
   issue. For instance,
   _ldap._tcp.Default-First-Site-Name._sites.gc._msdcs.example.org.
   There are also some BitTorrent clients that query an SRV record for
   _bittorrent-tracker._tcp.domain.example.

   Another important thing about the privacy of the QNAME is the future
   usages. Today, the lack of privacy is an obstacle to putting
   potentially sensitive or personally identifiable data in the DNS. At
   the moment, your DNS traffic might reveal that you are doing email
   but not with whom. If your Mail User Agent (MUA) starts looking up
   Pretty Good Privacy (PGP) keys in the DNS [RFC7929], then privacy
   becomes a lot more important. And email is just an example; there
   would be other really interesting uses for a more privacy-friendly
   DNS.

   For the communication between the stub resolver and the recursive
   resolver, the source IP address is the address of the user's machine.
   Therefore, all the issues and warnings about collection of IP
   addresses apply here. For the communication between the recursive
   resolver and the authoritative name servers, the source IP address
   has a different meaning; it does not have the same status as the
   source address in an HTTP connection.
   It is now the IP address of
   the recursive resolver that, in a way, "hides" the real user.
   However, hiding does not always work. Sometimes EDNS(0) Client
   Subnet [RFC7871] is used (see its privacy analysis in
   [denis-edns-client-subnet]). Sometimes the end user has a personal
   recursive resolver on her machine. In both cases, the IP address is
   as sensitive as it is for HTTP [sidn-entrada].

   A note about IP addresses: there is currently no IETF document that
   describes in detail all the privacy issues around IP addressing. In
   the meantime, the discussion here is intended to include both IPv4
   and IPv6 source addresses. For a number of reasons, their assignment
   and utilization characteristics are different, which may have
   implications for details of information leakage associated with the
   collection of source addresses. (For example, a specific IPv6 source
   address seen on the public Internet is less likely than an IPv4
   address to originate behind a CGN or other NAT.) However, for both
   IPv4 and IPv6 addresses, it's important to note that source addresses
   are propagated with queries and comprise metadata about the host,
   user, or application that originated them.

2.2.1. Data in the DNS payload

   At the time of writing, there are no standardized client identifiers
   contained in the DNS payload itself (ECS [RFC7871], while widely
   used, is only of Category Informational).

   DNS Cookies [RFC7873] are a lightweight DNS transaction security
   mechanism that provides limited protection against a variety of
   increasingly common denial-of-service and amplification/forgery or
   cache poisoning attacks by off-path attackers. It is noted, however,
   that they are designed to just verify IP addresses (and should change
   once a client's IP address changes); they are not designed to
   actively track users (like HTTP cookies).
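
   As an illustration of why such a cookie is tied to addresses rather
   than to a user, the following minimal sketch derives an 8-octet
   client cookie from the client address, the server address, and a
   local secret. The function name client_cookie, the use of
   HMAC-SHA-256 as the pseudorandom function, the secret, and the
   example addresses are all assumptions made for this sketch;
   [RFC7873] leaves the exact function to the implementation.

```python
import hashlib
import hmac
import ipaddress


def client_cookie(client_ip: str, server_ip: str, secret: bytes) -> bytes:
    """Return an 8-octet client cookie bound to both addresses.

    HMAC-SHA-256 is an assumed stand-in for the pseudorandom function;
    it is not claimed to be what any given implementation uses.
    """
    # Use the packed binary form so that equivalent textual spellings of
    # an address produce the same cookie.
    material = (ipaddress.ip_address(client_ip).packed
                + ipaddress.ip_address(server_ip).packed)
    return hmac.new(secret, material, hashlib.sha256).digest()[:8]


if __name__ == "__main__":
    secret = b"hypothetical-local-secret"
    before = client_cookie("192.0.2.10", "198.51.100.53", secret)
    after = client_cookie("203.0.113.7", "198.51.100.53", secret)
    # The cookie follows the address: a new client address gives a new value.
    print(before.hex(), after.hex(), before != after)
```

   Because the client address is an input, the value changes whenever
   the client's address changes, which is the property noted above that
   distinguishes it from a long-lived HTTP cookie.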

   There are anecdotal accounts of MAC addresses [1] and even user names
   being inserted in non-standard EDNS(0) options for stub to resolver
   communications to support proprietary functionality implemented at
   the resolver (e.g., parental filtering).

2.3. Cache Snooping

   The content of recursive resolvers' caches can reveal data about the
   clients using them (the privacy risks depend on the number of
   clients). This information can sometimes be examined by sending DNS
   queries with RD=0 to inspect cache content, particularly looking at
   the DNS TTLs [grangeia.snooping]. Since this also is a reconnaissance
   technique for subsequent cache poisoning attacks, some
   countermeasures have already been developed and deployed.

2.4. On the Wire

2.4.1. Unencrypted Transports

   For unencrypted transports, DNS traffic can be seen by an
   eavesdropper like any other traffic. (DNSSEC, specified in
   [RFC4033], explicitly excludes confidentiality from its goals.) So,
   if an initiator starts an HTTPS communication with a recipient, while
   the HTTP traffic will be encrypted, the DNS exchange prior to it will
   not be. As other protocols become more and more privacy-aware
   and secured against surveillance (e.g., [I-D.draft-ietf-tls-tls13],
   [I-D.draft-ietf-quic-transport]), the use of unencrypted transports
   for DNS may become "the weakest link" in privacy. It is noted that
   there is ongoing work attempting to encrypt the SNI in the TLS
   handshake but that this is a non-trivial problem
   [I-D.ietf-tls-sni-encryption].

   An important specificity of the DNS traffic is that it may take a
   different path than the communication between the initiator and the
   recipient. For instance, an eavesdropper may be unable to tap the
   wire between the initiator and the recipient but may have access to
   the wire going to the recursive resolver, or to the authoritative
   name servers.
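
   To make the exposure on an unencrypted path concrete, the sketch
   below encodes the example query from Section 1 in the [RFC1035] wire
   format and then recovers the QNAME and QTYPE from the raw octets, as
   any observer on the path could. The helper names build_query and
   observe and the query ID are hypothetical, and the parser assumes a
   single question with no name compression.

```python
import struct

QTYPE_AAAA = 28
QCLASS_IN = 1


def build_query(qname: str, qtype: int, query_id: int = 0x1234,
                rd: bool = True) -> bytes:
    """Encode a one-question DNS query (RFC 1035, Section 4.1)."""
    flags = 0x0100 if rd else 0x0000  # RD is the only flag bit set here
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, QCLASS_IN)


def observe(packet: bytes):
    """Recover (QNAME, QTYPE) from a captured query, as a passive
    observer could; assumes one question and no name compression."""
    offset = 12  # the fixed-size header is 12 octets
    labels = []
    while packet[offset] != 0:
        length = packet[offset]
        labels.append(packet[offset + 1:offset + 1 + length].decode("ascii"))
        offset += 1 + length
    (qtype,) = struct.unpack_from("!H", packet, offset + 1)
    return ".".join(labels), qtype


if __name__ == "__main__":
    wire = build_query("www.example.com", QTYPE_AAAA)
    print(observe(wire))
```

   Nothing in the encoding hides the name: the labels appear as plain
   ASCII in the packet. Calling build_query with rd=False yields the
   kind of non-recursive probe mentioned in Section 2.3.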

   The best place to tap, from an eavesdropper's point of view, is
   clearly between the stub resolvers and the recursive resolvers,
   because traffic is not limited by DNS caching.

   The attack surface between the stub resolver and the rest of the
   world can vary widely depending upon how the end user's computer is
   configured. In order of increasing attack surface:

      The recursive resolver can be on the end user's computer. In
      (currently) a small number of cases, individuals may choose to
      operate their own DNS resolver on their local machine. In this
      case, the attack surface for the connection between the stub
      resolver and the caching resolver is limited to that single
      machine.

      The recursive resolver may be at the local network edge. For
      many or most enterprise networks and for some residential users,
      the caching resolver may exist on a server at the edge of the
      local network. In this case, the attack surface is the local
      network. Note that in large enterprise networks, the DNS resolver
      may not be located at the edge of the local network but rather at
      the edge of the overall enterprise network. In this case, the
      enterprise network could be thought of as similar to the Internet
      Access Provider (IAP) network referenced below.

      The recursive resolver can be in the IAP premises. For most
      residential users and potentially other networks, the typical case
      is for the end user's computer to be configured (typically
      automatically through DHCP) with the addresses of the DNS
      recursive resolvers at the IAP. The attack surface for
      on-the-wire attacks is therefore from the end-user system across
      the local network and across the IAP network to the IAP's
      recursive resolvers.

      The recursive resolver can be a public DNS service. Some machines
      may be configured to use public DNS resolvers such as those
      operated today by Google Public DNS or OpenDNS.
      The end user may
      have configured their machine to use these DNS recursive resolvers
      themselves -- or their IAP may have chosen to use the public DNS
      resolvers rather than operating their own resolvers. In this
      case, the attack surface is the entire public Internet between the
      end user's connection and the public DNS service.

2.4.2. Encrypted Transports

   The use of encrypted transports directly mitigates passive
   surveillance of the DNS payload; however, some privacy attacks
   remain possible.

   These are cases where user identification, fingerprinting or
   correlations may be possible due to the use of certain transport
   layers or clear text/observable features. These issues are not
   specific to DNS, but DNS traffic is susceptible to these attacks when
   using specific transports.

   There are some general examples: certain studies have highlighted
   that IP TTL or TCP window size values can be used to fingerprint
   client operating systems (os-fingerprint [2]), or that various
   techniques can be used to de-NAT DNS queries (dns-de-nat [3]).

   The use of clear text transport options to decrease latency may also
   identify a user, e.g., using TCP Fast Open [RFC7413].

   More specifically, since the deployment of encrypted transports is
   not widespread at the time of writing, users wishing to use encrypted
   transports for DNS may in practice be limited in the resolver
   services available. Given this, the choice of a user to configure a
   single resolver (or a fixed set of resolvers) and an encrypted
   transport to use in all network environments can actually serve to
   identify the user as one that desires privacy and can provide an
   added mechanism to track them as they move across network
   environments.

   Users of encrypted transports are also highly likely to re-use
   sessions for multiple DNS queries to optimize performance (e.g., via
   DNS pipelining or HTTPS multiplexing).
   Certain configuration options
   for encrypted transports could also in principle fingerprint a user,
   for example session resumption, the maximum number of messages to
   send or a maximum connection time before closing a connection and
   re-opening it.

   Whilst there are known attacks on older versions of TLS, the most
   recent recommendations [RFC7525] and developments
   [I-D.draft-ietf-tls-tls13] in this area largely mitigate those.

   Traffic analysis of unpadded encrypted traffic is also possible
   [pitfalls-of-dns-encrption] because the sizes and timing of encrypted
   DNS requests and responses can be correlated to unencrypted DNS
   requests upstream of a recursive resolver.

2.5. In the Servers

   Using the terminology of [RFC6973], the DNS servers (recursive
   resolvers and authoritative servers) are enablers: they facilitate
   communication between an initiator and a recipient without being
   directly in the communications path. As a result, they are often
   forgotten in risk analysis. But, to quote again [RFC6973], "Although
   [...] enablers may not generally be considered as attackers, they may
   all pose privacy threats (depending on the context) because they are
   able to observe, collect, process, and transfer privacy-relevant
   data." In [RFC6973] parlance, enablers become observers when they
   start collecting data.

   Many programs exist to collect and analyze DNS data at the servers --
   from the "query log" of some programs like BIND to tcpdump and more
   sophisticated programs like PacketQ [packetq] [packetq-list] and
   DNSmezzo [dnsmezzo]. The organization managing the DNS server can
   use this data itself, or it can be part of a surveillance program
   like PRISM [prism] and pass data to an outside observer.

   Sometimes, this data is kept for a long time and/or distributed to
   third parties for research purposes [ditl] [day-at-root], security
   analysis, or surveillance tasks.
   These uses are sometimes under some
   sort of contract, with various limitations, for instance, on
   redistribution, given the sensitive nature of the data. Also, there
   are observation points in the network that gather DNS data and then
   make it accessible to third parties for research or security purposes
   ("passive DNS" [passive-dns]).

2.5.1. In the Recursive Resolvers

   Recursive resolvers see all the traffic since there is typically no
   caching before them. To summarize: your recursive resolver knows a
   lot about you. The resolver of a large IAP, or a large public
   resolver, can collect data from many users. You may get an idea of
   the data collected by reading the privacy policy of a big public
   resolver, e.g., .

2.5.1.1. Encrypted transports

   Use of encrypted transports does not reduce the data available in the
   recursive resolver and ironically can actually expose more
   information about users to operators. As mentioned in Section 2.4,
   use of session-based encrypted transports (TCP/TLS) can expose
   correlation data about users. Such concerns in the TCP/TLS layers
   apply equally to DNS-over-TLS and DoH, which both use TLS as the
   underlying transport.

2.5.1.2. DoH vs DNS-over-TLS

   The proposed specification for DoH [I-D.ietf-doh-dns-over-https]
   includes a Privacy Considerations section which highlights some of
   the differences between HTTP and DNS. As a deliberate design choice,
   DoH inherits the privacy properties of the HTTPS stack and as a
   consequence introduces new privacy concerns when compared with DNS
   over UDP, TCP or TLS [RFC7858]. The rationale for this decision is
   that retaining the ability to leverage the full functionality of the
   HTTP ecosystem is more important than placing specific constraints on
   this new protocol based on privacy considerations (modulo limiting
   the use of HTTP cookies).

   In analyzing the new issues introduced by DoH, it is helpful to
   recognize that there exists a natural tension between

   o the wide practice in HTTP to use various headers to optimize HTTP
     connections, functionality and behaviour (which can facilitate
     user identification and tracking)

   o and the fact that the DNS payload is currently very tightly
     encoded and contains no standardized user identifiers.

   DNS-over-TLS, for example, would normally contain no client
   identifiers above the TLS layer and a resolver would see only a
   stream of DNS query payloads originating within one or more
   connections from a client IP address. By contrast, if DoH clients
   commonly include several headers with a DNS message (e.g.,
   user-agent and accept-language), this could lead to the DoH server
   being able to attribute individual DNS requests not only to a
   specific end user device but to a specific application.

   Additionally, depending on the client architecture, isolation of DoH
   queries from other HTTP traffic may or may not be feasible or
   desirable. Depending on the use case, isolation of DoH queries from
   other HTTP traffic may or may not increase privacy.

   The picture for privacy considerations and user expectations for DoH
   with respect to what additional data may be available to the DoH
   server compared to DNS over UDP, TCP or TLS is complex and requires a
   detailed analysis for each use case. In particular, the choice of
   HTTPS functionality vs privacy is specifically made an implementation
   choice in DoH and users may well have differing privacy expectations
   depending on the DoH use case and implementation.
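
   The contrast between DNS-over-TLS and DoH described above can be
   sketched as follows. The code builds a DoH GET request in which the
   DNS message is base64url-encoded without padding and carried in a
   "dns" query parameter, with the "application/dns-message" media type
   (names taken from the published DoH specification, RFC 8484), and
   then lists which request headers go beyond that. The resolver URL
   https://doh.example/dns-query, the function names, the header values,
   and the choice to treat only "Accept" as necessary are assumptions
   made for illustration, not part of any specification.

```python
import base64
import struct

# Headers treated here as needed for the exchange itself (an assumption).
MINIMAL_HEADERS = {"accept"}


def encode_query(qname: str, qtype: int) -> bytes:
    """Encode a one-question DNS query with ID 0 and RD set."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in qname.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def doh_get_request(server_url: str, message: bytes, extra_headers=None):
    """Return (url, headers) for a DoH GET carrying `message`."""
    encoded = base64.urlsafe_b64encode(message).rstrip(b"=").decode("ascii")
    headers = {"Accept": "application/dns-message"}
    headers.update(extra_headers or {})
    return f"{server_url}?dns={encoded}", headers


def additional_identifiers(headers: dict) -> list:
    """List header names beyond the assumed minimal set, i.e. data that
    a DNS-over-TLS query would not carry above the TLS layer."""
    return sorted(h for h in headers if h.lower() not in MINIMAL_HEADERS)


if __name__ == "__main__":
    query = encode_query("www.example.com", 28)
    url, headers = doh_get_request(
        "https://doh.example/dns-query", query,
        {"User-Agent": "ExampleBrowser/1.0", "Accept-Language": "fr-FR"},
    )
    print(url)
    print(additional_identifiers(headers))
```

   The DNS payload is identical in both transports; what differs is the
   set of HTTP headers a given client chooses to send alongside it,
   which is why the outcome depends on the implementation.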

   At the extremes, there may be implementations that attempt to achieve
   parity with DNS-over-TLS from a privacy perspective at the cost of
   using no identifiable headers; there might be others that provide
   feature-rich data flows where the low-level origin of the DNS query
   is easily identifiable.

   Privacy-focused users should be aware of the potential for
   additional client identifiers in DoH compared to DNS-over-TLS and may
   want to only use DoH implementations that provide clear guidance on
   what identifiers they add.

2.5.2. In the Authoritative Name Servers

   Unlike what happens for recursive resolvers, observation capabilities
   of authoritative name servers are limited by caching; they see only
   the requests for which the answer was not in the cache. For
   aggregated statistics ("What is the percentage of LOC queries?"),
   this is sufficient, but it prevents an observer from seeing
   everything. Still, the authoritative name servers see a part of the
   traffic, and this subset may be sufficient to violate some privacy
   expectations.

   Also, the end user typically has some legal/contractual link with the
   recursive resolver (having chosen the IAP, or having chosen to use a
   given public resolver), while having no control and perhaps no
   awareness of the role of the authoritative name servers and their
   observation abilities.

   As noted before, using a local resolver or a resolver close to the
   machine decreases the attack surface for an on-the-wire eavesdropper.
   But it may decrease privacy against an observer located on an
   authoritative name server. This authoritative name server will see
   the IP address of the end client instead of the address of a big
   recursive resolver shared by many users.
558 This "protection", when using a large resolver with many clients, is 559 no longer present if ECS [RFC7871] is used because, in this case, the 560 authoritative name server sees the original IP address (or prefix, 561 depending on the setup). 563 As of today, all the instances of one root name server, L-root, 564 together receive around 50,000 queries per second. While most of it 565 is "junk" (errors on the Top-Level Domain (TLD) name), it gives an 566 idea of the amount of big data that pours into name servers. (And 567 even "junk" can leak information; for instance, if there is a typing 568 error in the TLD, the user will send data to a TLD that is not the 569 usual one.) 571 Many domains, including TLDs, are partially hosted by third-party 572 servers, sometimes in a different country. The contracts between the 573 domain manager and these servers may or may not take privacy into 574 account. Whatever the contract, the third-party hoster may be honest 575 or not but, in any case, it will have to follow its local laws. So, 576 requests to a given ccTLD may go to servers managed by organizations 577 outside of the ccTLD's country. End users may not anticipate that 578 when doing a security analysis. 580 Also, it seems (see the survey described in [aeris-dns]) that there 581 is a strong concentration of authoritative name servers among 582 "popular" domains (such as the Alexa Top N list). For instance, 583 among the Alexa Top 100K, one DNS provider today hosts 10% of the 584 domains. The ten largest DNS providers together host one 585 third of the domains. With the control (or the ability to sniff the 586 traffic) of a few name servers, you can gather a lot of information. 588 2.5.3. Rogue Servers 590 The previous paragraphs discussed DNS privacy, assuming that all the 591 traffic was directed to the intended servers and that the potential 592 attacker was purely passive.
But, in reality, we can have active 593 attackers redirecting the traffic, not to change it but just to 594 observe it. 596 For instance, a rogue DHCP server, or a trusted DHCP server that has 597 had its configuration altered by malicious parties, can direct you to 598 a rogue recursive resolver. Most of the time, it seems to be done to 599 divert traffic by providing lies for some domain names. But it could 600 be used just to capture the traffic and gather information about you. 601 Other attacks, besides using DHCP, are possible. The traffic from a 602 DNS client to a DNS server can be intercepted along its way from 603 originator to intended server, for instance, by transparent DNS 604 proxies in the network that will divert the traffic intended for a 605 legitimate DNS server. This rogue server can masquerade as the 606 intended server and respond with data to the client. (Rogue servers 607 that inject malicious data are possible, but it is a separate problem 608 not relevant to privacy.) A rogue server may respond correctly for a 609 long period of time, thereby evading detection. This may be done 610 for what could be claimed to be good reasons, such as optimization or 611 caching, but it leads to a reduction of privacy compared to a situation 612 with no attacker present. Also, malware like DNSChanger [dnschanger] 613 can change the recursive resolver in the machine's configuration, or 614 the routing itself can be subverted (for instance, 615 [ripe-atlas-turkey]). 617 2.5.4. Authentication of Servers 619 Both Strict mode for DNS-over-TLS and DoH require authentication of 620 the server. Therefore, as long as the authentication credentials 621 are obtained over a secure channel, using either of these 622 transports defeats the attack of redirecting traffic to rogue 623 servers. Of course, attacks on these secure channels are also 624 possible, but they are out of the scope of this document. 626 2.5.5.
Blocking of Services 628 User privacy can also be at risk if there is blocking (by local 629 network operators or more general mechanisms) of access to recursive 630 servers that offer encrypted transports. For example, active blocking 631 of port 853 for DNS-over-TLS or of specific IP addresses (e.g., 632 1.1.1.1) could restrict the resolvers available to the client. 633 Similarly, attacks on such services (e.g., DDoS) could force users to 634 switch to other services that do not offer encrypted transports for 635 DNS. 637 2.6. Re-identification and Other Inferences 639 An observer has access not only to the data they directly collect 640 but also to the results of various inferences about this data. 642 For instance, a user can be re-identified via DNS queries. If the 643 adversary knows a user's identity and can watch their DNS queries for 644 a period, then that same adversary may be able to re-identify the 645 user later on, solely based on their pattern of DNS queries, regardless 646 of the location from which the user makes those queries. For 647 example, one study [herrmann-reidentification] found that such re- 648 identification is possible so that "73.1% of all day-to-day links 649 were correctly established, i.e. user u was either re-identified 650 unambiguously (1) or the classifier correctly reported that u was not 651 present on day t+1 any more (2)." While that study related to web 652 browsing behavior, equally characteristic patterns may be produced 653 even in machine-to-machine communications or without a user taking 654 specific actions, e.g., at reboot time if a characteristic set of 655 services is accessed by the device. 657 As another example, one could imagine that an intelligence agency 658 identifies people going to a site by putting in a very long DNS name 659 and looking for queries of a specific length. Such traffic analysis 660 could weaken some privacy solutions.
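The re-identification scenario described in this section can be illustrated with a deliberately simple sketch. It is not the classifier used in [herrmann-reidentification]; it substitutes a plain Jaccard similarity between sets of queried names, and the function names, the example profiles, and the 0.5 threshold are all assumptions made for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets of queried names (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reidentify(known_profiles: dict, observed: set, threshold: float = 0.5):
    """Return the known user whose earlier query set best matches the
    observed one, or None if no profile reaches the (assumed) threshold."""
    best_user, best_score = None, 0.0
    for user, names in known_profiles.items():
        score = jaccard(names, observed)
        if score > best_score:
            best_user, best_score = user, score
    return best_user if best_score >= threshold else None


if __name__ == "__main__":
    # Hypothetical profiles collected on day t.
    profiles = {
        "u1": {"a.example", "b.example", "c.example"},
        "u2": {"x.example", "y.example"},
    }
    # Queries seen on day t+1 from a new IP address.
    session = {"a.example", "b.example", "c.example", "d.example"}
    print(reidentify(profiles, session))
```

Even this naive matching needs no IP address: the set of names queried is the identifier. A real adversary would presumably use a more robust classifier and far more data.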
662 The IAB privacy and security program has also published 663 [RFC7624], which considers such inference-based attacks in a more 664 general framework. 666 2.7. More Information 668 Useful background information can also be found in [tor-leak] (about 669 the risk of privacy leaks through DNS) and in a few academic papers: 670 [yanbin-tsudik], [castillo-garcia], [fangming-hori-sakurai], and 671 [federrath-fuchs-herrmann-piosecny]. 673 3. Actual "Attacks" 675 A very quick examination of DNS traffic may lead to the false 676 conclusion that extracting the needle from the haystack is difficult. 677 "Interesting" primary DNS requests are mixed with useless (for the 678 eavesdropper) secondary and tertiary requests (see the terminology in 679 Section 1). But, in this time of "big data" processing, powerful 680 techniques now exist to get from the raw data to what the 681 eavesdropper is actually interested in. 683 Many research papers about malware detection use DNS traffic to 684 detect "abnormal" behavior that can be traced back to the activity of 685 malware on infected machines. Yes, this research was done for 686 good purposes, but technically it is a privacy attack and it demonstrates the 687 power of the observation of DNS traffic. See [dns-footprint], 688 [dagon-malware], and [darkreading-dns]. 690 Passive DNS systems [passive-dns] allow reconstruction of the data of 691 sometimes an entire zone. They are used for many reasons -- some 692 good, some bad. Well-known passive DNS systems keep only the DNS 693 responses, and not the source IP address of the client, precisely for 694 privacy reasons. Other passive DNS systems may not be so careful. 695 And there are still potential problems with revealing QNAMEs.
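The data minimization attributed above to well-known passive DNS systems can be sketched as follows. This is an illustrative model only, not the design of any particular system; the record types, field names, and sample addresses are assumptions.

```python
from typing import Iterable, NamedTuple, Set


class CapturedResponse(NamedTuple):
    """What a sensor could see on the wire (hypothetical fields)."""
    client_ip: str  # source address of the querying client
    qname: str
    rrtype: str
    rdata: str


class StoredRecord(NamedTuple):
    """What is kept: response data only, with no client address."""
    qname: str
    rrtype: str
    rdata: str


def minimize(captured: CapturedResponse) -> StoredRecord:
    """Drop the client address and keep only the DNS response data."""
    return StoredRecord(captured.qname, captured.rrtype, captured.rdata)


def build_database(captures: Iterable[CapturedResponse]) -> Set[StoredRecord]:
    """Deduplicate minimized records across all observed clients."""
    return {minimize(c) for c in captures}


if __name__ == "__main__":
    captures = [
        CapturedResponse("192.0.2.1", "www.example.com", "A", "203.0.113.10"),
        CapturedResponse("198.51.100.7", "www.example.com", "A", "203.0.113.10"),
    ]
    for record in build_database(captures):
        print(record)
```

Two clients asking the same question collapse into one stored record, so the database no longer says who asked. The QNAME is still stored, which is why revealing QNAMEs remains a potential problem.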
697 The revelations (from the Edward Snowden documents, which were leaked 698 from the National Security Agency (NSA)) of the MORECOWBELL 699 surveillance program [morecowbell], which uses the DNS, both 700 passively and actively, to surreptitiously gather information about 701 the users, are another good example showing that the lack of privacy 702 protections in the DNS is actively exploited. 704 4. Legalities 706 To our knowledge, there are no specific privacy laws for DNS data in 707 any country. Interpreting general privacy laws like 708 [data-protection-directive] or GDPR [4], applicable in the European 709 Union, in the context of DNS traffic data is not an easy task, and we 710 do not know of any court precedent here. See an interesting analysis in 711 [sidn-entrada]. 713 5. Security Considerations 715 This document is entirely about security, more precisely privacy. It 716 just lays out the problem; it does not try to set requirements (with 717 the choices and compromises they imply), much less define solutions. 718 Possible solutions to the issues described here are discussed in 719 other documents (currently too many to all be mentioned); see, for 720 instance, 'Recommendations for DNS Privacy Operators' 721 [I-D.dickinson-dprive-bcp-op]. 723 6. Acknowledgments 725 Thanks to Nathalie Boulvard and to the CENTR members for the original 726 work that led to this document. Thanks to Ondrej Sury for the 727 interesting discussions. Thanks to Mohsen Souissi and John Heidemann 728 for proofreading and to Paul Hoffman, Matthijs Mekking, Marcos Sanz, 729 Tim Wicinski, Francis Dupont, Allison Mankin, and Warren Kumari for 730 proofreading, providing technical remarks, and making many 731 readability improvements. Thanks to Dan York, Suzanne Woolf, Tony 732 Finch, Stephen Farrell, Peter Koch, Simon Josefsson, and Frank Denis 733 for good written contributions. And thanks to the IESG members for 734 the last remarks. 736 7.
Changelog 738 draft-bortzmeyer-dprive-rfc7626-bis-01: 740 o Update reference for dickinson-bcp-op to draft-dickinson-dprive- 741 bcp-op 743 draft-bortzmeyer-dprive-rfc7626-bis-00: 745 Initial commit. Differences from RFC 7626: 747 o Update many references 749 o Add discussions of encrypted transports including DNS-over-TLS and 750 DoH 752 o Add section on DNS payload 754 o Add section on authentication of servers 756 o Add section on blocking of services 758 8. References 760 8.1. Normative References 762 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", 763 STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987, 764 . 766 [RFC1035] Mockapetris, P., "Domain names - implementation and 767 specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, 768 November 1987, . 770 [RFC6973] Cooper, A., Tschofenig, H., Aboba, B., Peterson, J., 771 Morris, J., Hansen, M., and R. Smith, "Privacy 772 Considerations for Internet Protocols", RFC 6973, 773 DOI 10.17487/RFC6973, July 2013, . 776 [RFC7258] Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an 777 Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May 778 2014, . 780 8.2. Informative References 782 [aeris-dns] 783 Vinot, N., "Vie privee: et le DNS alors?", (In French), 784 2015, . 787 [castillo-garcia] 788 Castillo-Perez, S. and J. Garcia-Alfaro, "Anonymous 789 Resolution of DNS Queries", 2008, 790 . 792 [dagon-malware] 793 Dagon, D., "Corrupted DNS Resolution Paths: The Rise of a 794 Malicious Resolution Authority", ISC/OARC Workshop, 2007, 795 . 798 [darkreading-dns] 799 Lemos, R., "Got Malware? Three Signs Revealed In DNS 800 Traffic", InformationWeek Dark Reading, May 2013, 801 . 805 [data-protection-directive] 806 European Parliament, "Directive 95/46/EC of the European 807 Parliament and of the Council on the protection of 808 individuals with regard to the processing of personal data 809 and on the free movement of such data", Official Journal L 810 281, pp. 0031 - 0050, November 1995, .
814 [day-at-root] 815 Castro, S., Wessels, D., Fomenkov, M., and K. Claffy, "A 816 Day at the Root of the Internet", ACM SIGCOMM Computer 817 Communication Review, Vol. 38, Number 5, 818 DOI 10.1145/1452335.1452341, October 2008, 819 . 822 [denis-edns-client-subnet] 823 Denis, F., "Security and privacy issues of edns-client- 824 subnet", August 2013, . 827 [ditl] CAIDA, "A Day in the Life of the Internet (DITL)", 2002, 828 . 830 [dns-footprint] 831 Stoner, E., "DNS Footprint of Malware", OARC Workshop, 832 October 2010, . 835 [dnschanger] 836 Wikipedia, "DNSChanger", October 2013, 837 . 840 [dnsmezzo] 841 Bortzmeyer, S., "DNSmezzo", 2009, 842 . 844 [fangming-hori-sakurai] 845 Fangming, Z., Hori, Y., and K. Sakurai, "Analysis of 846 Privacy Disclosure in DNS Query", 2007 International 847 Conference on Multimedia and Ubiquitous Engineering (MUE 848 2007), Seoul, Korea, ISBN: 0-7695-2777-9, pp. 952-957, 849 DOI 10.1109/MUE.2007.84, April 2007, 850 . 852 [federrath-fuchs-herrmann-piosecny] 853 Federrath, H., Fuchs, K., Herrmann, D., and C. Piosecny, 854 "Privacy-Preserving DNS: Analysis of Broadcast, Range 855 Queries and Mix-based Protection Methods", Computer 856 Security ESORICS 2011, Springer, page(s) 665-683, 857 ISBN 978-3-642-23821-5, 2011, . 861 [grangeia.snooping] 862 Grangeia, L., "DNS Cache Snooping or Snooping the Cache 863 for Fun and Profit", February 2004, 864 . 867 [herrmann-reidentification] 868 Herrmann, D., Gerber, C., Banse, C., and H. Federrath, 869 "Analyzing Characteristic Host Access Patterns for Re- 870 Identification of Web User Sessions", 871 DOI 10.1007/978-3-642-27937-9_10, 2012, . 874 [I-D.dickinson-dprive-bcp-op] 875 Dickinson, S., Overeinder, B., Rijswijk-Deij, R., and A. 876 Mankin, "Recommendations for DNS Privacy Service 877 Operators", draft-dickinson-dprive-bcp-op-00 (work in 878 progress), July 2018. 880 [I-D.ietf-dnsop-terminology-bis] 881 Hoffman, P., Sullivan, A., and K. 
Fujiwara, "DNS 882 Terminology", draft-ietf-dnsop-terminology-bis-11 (work in 883 progress), July 2018. 885 [I-D.ietf-doh-dns-over-https] 886 Hoffman, P. and P. McManus, "DNS Queries over HTTPS 887 (DoH)", draft-ietf-doh-dns-over-https-12 (work in 888 progress), June 2018. 890 [morecowbell] 891 Grothoff, C., Wachs, M., Ermert, M., and J. Appelbaum, 892 "NSA's MORECOWBELL: Knell for DNS", GNUnet e.V., January 893 2015, . 895 [packetq] Dot SE, "PacketQ, a simple tool to make SQL-queries 896 against PCAP-files", 2011, 897 . 899 [packetq-list] 900 PacketQ, "PacketQ Mailing List", 901 . 903 [passive-dns] 904 Weimer, F., "Passive DNS Replication", April 2005, 905 . 907 [pitfalls-of-dns-encrption] 908 Shulman, H., "Pretty Bad Privacy: Pitfalls of DNS 909 Encryption", . 912 [prism] Wikipedia, "PRISM (surveillance program)", July 2015, 913 . 916 [RFC4033] Arends, R., Austein, R., Larson, M., Massey, D., and S. 917 Rose, "DNS Security Introduction and Requirements", 918 RFC 4033, DOI 10.17487/RFC4033, March 2005, 919 . 921 [RFC5155] Laurie, B., Sisson, G., Arends, R., and D. Blacka, "DNS 922 Security (DNSSEC) Hashed Authenticated Denial of 923 Existence", RFC 5155, DOI 10.17487/RFC5155, March 2008, 924 . 926 [RFC5936] Lewis, E. and A. Hoenes, Ed., "DNS Zone Transfer Protocol 927 (AXFR)", RFC 5936, DOI 10.17487/RFC5936, June 2010, 928 . 930 [RFC6269] Ford, M., Ed., Boucadair, M., Durand, A., Levis, P., and 931 P. Roberts, "Issues with IP Address Sharing", RFC 6269, 932 DOI 10.17487/RFC6269, June 2011, . 935 [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP 936 Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, 937 . 939 [RFC7525] Sheffer, Y., Holz, R., and P. Saint-Andre, 940 "Recommendations for Secure Use of Transport Layer 941 Security (TLS) and Datagram Transport Layer Security 942 (DTLS)", BCP 195, RFC 7525, DOI 10.17487/RFC7525, May 943 2015, . 945 [RFC7624] Barnes, R., Schneier, B., Jennings, C., Hardie, T., 946 Trammell, B., Huitema, C., and D.
Borkmann, 947 "Confidentiality in the Face of Pervasive Surveillance: A 948 Threat Model and Problem Statement", RFC 7624, 949 DOI 10.17487/RFC7624, August 2015, . 952 [RFC7858] Hu, Z., Zhu, L., Heidemann, J., Mankin, A., Wessels, D., 953 and P. Hoffman, "Specification for DNS over Transport 954 Layer Security (TLS)", RFC 7858, DOI 10.17487/RFC7858, May 955 2016, . 957 [RFC7871] Contavalli, C., van der Gaast, W., Lawrence, D., and W. 958 Kumari, "Client Subnet in DNS Queries", RFC 7871, 959 DOI 10.17487/RFC7871, May 2016, . 962 [RFC7873] Eastlake 3rd, D. and M. Andrews, "Domain Name System (DNS) 963 Cookies", RFC 7873, DOI 10.17487/RFC7873, May 2016, 964 . 966 [RFC7929] Wouters, P., "DNS-Based Authentication of Named Entities 967 (DANE) Bindings for OpenPGP", RFC 7929, 968 DOI 10.17487/RFC7929, August 2016, . 971 [ripe-atlas-turkey] 972 Aben, E., "A RIPE Atlas View of Internet Meddling in 973 Turkey", March 2014, 974 . 977 [sidn-entrada] 978 Hesselman, C., Jansen, J., Wullink, M., Vink, K., and M. 979 Simon, "A privacy framework for 'DNS big data' 980 applications", November 2014, 981 . 984 [thomas-ditl-tcp] 985 Thomas, M. and D. Wessels, "An Analysis of TCP Traffic in 986 Root Server DITL Data", DNS-OARC 2014 Fall Workshop, 987 October 2014, . 991 [tor-leak] 992 Tor, "DNS leaks in Tor", 2013, 993 . 996 [yanbin-tsudik] 997 Yanbin, L. and G. Tsudik, "Towards Plugging Privacy Leaks 998 in the Domain Name System", October 2009, 999 . 1001 8.3. 
URIs 1003 [1] https://lists.dns-oarc.net/pipermail/dns- 1004 operations/2016-January/014141.html 1006 [2] http://netres.ec/?b=11B99BD 1008 [3] https://www.researchgate.net/publication/320322146_DNS-DNS_DNS- 1009 based_De-NAT_Scheme 1011 [4] https://www.eugdpr.org/the-regulation.html 1013 Authors' Addresses 1015 Stephane Bortzmeyer 1016 AFNIC 1017 1, rue Stephenson 1018 Montigny-le-Bretonneux 1019 France 78180 1021 Email: bortzmeyer+ietf@nic.fr 1023 Sara Dickinson 1024 Sinodun IT 1025 Magdalen Centre 1026 Oxford Science Park 1027 Oxford OX4 4GA 1028 United Kingdom 1030 Email: sara@sinodun.com