idnits 2.17.1 

draft-ietf-dprive-problem-statement-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (June 15, 2015) is 3231 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '1' on line 762

  == Outdated reference: A later version (-08) exists of
     draft-ietf-dnsop-edns-client-subnet-01

  == Outdated reference: A later version (-01) exists of
     draft-ietf-dprive-start-tls-for-dns-00

  == Outdated reference: A later version (-09) exists of
     draft-ietf-dnsop-qname-minimisation-03

  == Outdated reference: A later version (-05) exists of
     draft-ietf-dnsop-dns-terminology-02


     Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	DNS PRIVate Exchange (dprive) Working Group                S. Bortzmeyer
3	Internet-Draft                                                     AFNIC
4	Intended status: Informational                             June 15, 2015
5	Expires: December 17, 2015

7	                       DNS privacy considerations
8	                 draft-ietf-dprive-problem-statement-06

10	Abstract

12	   This document describes the privacy issues associated with the use of
13	   the DNS by Internet users.  It is intended to be an analysis of the
14	   present situation and does not prescribe solutions.

16	Status of This Memo

18	   This Internet-Draft is submitted in full conformance with the
19	   provisions of BCP 78 and BCP 79.

21	   Internet-Drafts are working documents of the Internet Engineering
22	   Task Force (IETF).  Note that other groups may also distribute
23	   working documents as Internet-Drafts.  The list of current Internet-
24	   Drafts is at http://datatracker.ietf.org/drafts/current/.

26	   Internet-Drafts are draft documents valid for a maximum of six months
27	   and may be updated, replaced, or obsoleted by other documents at any
28	   time.  It is inappropriate to use Internet-Drafts as reference
29	   material or to cite them other than as "work in progress."

31	   This Internet-Draft will expire on December 17, 2015.

33	Copyright Notice

35	   Copyright (c) 2015 IETF Trust and the persons identified as the
36	   document authors.  All rights reserved.

38	   This document is subject to BCP 78 and the IETF Trust's Legal
39	   Provisions Relating to IETF Documents
40	   (http://trustee.ietf.org/license-info) in effect on the date of
41	   publication of this document.  Please review these documents
42	   carefully, as they describe your rights and restrictions with respect
43	   to this document.  Code Components extracted from this document must
44	   include Simplified BSD License text as described in Section 4.e of
45	   the Trust Legal Provisions and are provided without warranty as
46	   described in the Simplified BSD License.

48	Table of Contents

50	   1.  Introduction
51	   2.  Risks
52	     2.1.  The alleged public nature of DNS data
53	     2.2.  Data in the DNS request
54	     2.3.  Cache snooping
55	     2.4.  On the wire
56	     2.5.  In the servers
57	       2.5.1.  In the recursive resolvers
58	       2.5.2.  In the authoritative name servers
59	       2.5.3.  Rogue servers
60	     2.6.  Re-identification and other inferences
61	   3.  Actual "attacks"
62	   4.  Legalities
63	   5.  Security considerations
64	   6.  Acknowledgments
65	   7.  IANA considerations
66	   8.  References
67	     8.1.  Normative References
68	     8.2.  Informative References
69	     8.3.  URIs
70	   Author's Address

72	1.  Introduction

74	   This document is an analysis of the DNS privacy issues, in the spirit
75	   of section 8 of [RFC6973].

77	   The Domain Name System is specified in [RFC1034] and [RFC1035] and
78	   many later RFCs, which have never been consolidated.  It is one of
79	   the most important infrastructure components of the Internet and
80	   often ignored or misunderstood by Internet users (and even by many
81	   professionals).  Almost every activity on the Internet starts with a
82	   DNS query (and often several).  Its use has many privacy implications
83	   and this is an attempt at a comprehensive and accurate list.

85	   Let us begin with a simplified reminder of how the DNS works.  (See
86	   also [I-D.ietf-dnsop-dns-terminology].)  A client, the stub resolver,
87	   issues a DNS query to a server, called the recursive resolver (also
88	   called caching resolver or full resolver or recursive name server).
89	   Let's use the query "What are the AAAA records for www.example.com?"
90	   as an example.  AAAA is the QTYPE (Query Type), and www.example.com
91	   is the QNAME (Query Name).  (The description which follows assumes a
92	   cold cache, for instance because the server just started.)  The
93	   recursive resolver will first query the root nameservers.  In most
94	   cases, the root nameservers will send a referral.  In this example,
95	   the referral will be to the .com nameservers.  The resolver repeats
96	   the query to one of the .com nameservers.  The .com nameservers, in
97	   turn, will refer to the example.com nameservers.  The example.com
98	   nameserver will then return the answer.  The root name servers, the
99	   name servers of .com and the name servers of example.com are called
100	   authoritative name servers.  It is important, when analyzing the
101	   privacy issues, to remember that the question asked to all these name
102	   servers is always the original question, not a derived question.  The
103	   question sent to the root name servers is "What are the AAAA records
104	   for www.example.com?", not "What are the name servers of .com?".  By
105	   repeating the full question, instead of just the relevant part of the
106	   question to the next in line, the DNS provides more information than
107	   necessary to the nameserver.
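
   The walk described above can be sketched in a few lines of Python.
   This is only a toy model, not a resolver: the DELEGATIONS table and
   the resolve() function are hypothetical names introduced for
   illustration, and no network traffic is generated.  It shows that,
   with a cold cache, every server in the chain is handed the same
   QNAME and QTYPE.

```python
# Toy model of iterative resolution with a cold cache.
# The delegation data is hypothetical and stands in for real name servers.

# zone -> child zone it refers to (None means it answers authoritatively)
DELEGATIONS = {
    ".": "com.",             # the root refers to .com
    "com.": "example.com.",  # .com refers to example.com
    "example.com.": None,    # example.com returns the answer
}


def resolve(qname, qtype):
    """Walk the delegation chain and record the question each server sees."""
    seen = []
    zone = "."
    while zone is not None:
        # The resolver repeats the original question unchanged at each step.
        seen.append((zone, qname, qtype))
        zone = DELEGATIONS[zone]
    return seen


observations = resolve("www.example.com.", "AAAA")
for zone, qname, qtype in observations:
    print(f"name servers of {zone!r} see: {qname} {qtype}")
```

   The three recorded observations, one per zone, all carry the full
   name www.example.com, although the root name servers only needed
   the last label to produce their referral.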

109	   Because DNS relies on caching heavily, the algorithm described just
110	   above is actually a bit more complicated, and not all questions are
111	   sent to the authoritative name servers.  If a few seconds later the
112	   stub resolver asks the recursive resolver, "What are the SRV
113	   records of _xmpp-server._tcp.example.com?", the recursive resolver
114	   will remember that it knows the name servers of example.com and will
115	   just query them, bypassing the root and .com.  Because there is
116	   typically no caching in the stub resolver, the recursive resolver,
117	   unlike the authoritative servers, sees all the DNS traffic.
118	   (Applications, like Web browsers, may have some form of caching which
119	   does not follow DNS rules, for instance because it may ignore the TTL.
120	   So, the recursive resolver does not see all the name resolution
121	   activity.)
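
   The effect of caching on what each authoritative server observes
   can be illustrated with a toy model.  The names CHAIN, encloses()
   and ToyResolver are hypothetical, TTLs are ignored, and nothing is
   sent on the network; this is a sketch of the reasoning, not of any
   real resolver implementation.

```python
# Toy model of a caching recursive resolver (no real DNS traffic).
# CHAIN is a hypothetical delegation path: root -> .com -> example.com.
CHAIN = [".", "com.", "example.com."]


def encloses(zone, qname):
    """True if qname is at or below zone."""
    return zone == "." or qname == zone or qname.endswith("." + zone)


class ToyResolver:
    def __init__(self):
        self.known_zones = {"."}  # a cold cache only knows the root
        self.log = []             # (zone queried, qname, qtype)

    def resolve(self, qname, qtype):
        # Start from the deepest enclosing zone whose servers are cached.
        start = max(
            i for i, zone in enumerate(CHAIN)
            if zone in self.known_zones and encloses(zone, qname)
        )
        for i in range(start, len(CHAIN)):
            if not encloses(CHAIN[i], qname):
                break
            self.log.append((CHAIN[i], qname, qtype))
            if i + 1 < len(CHAIN) and encloses(CHAIN[i + 1], qname):
                # The referral teaches the resolver about the child zone.
                self.known_zones.add(CHAIN[i + 1])


resolver = ToyResolver()
resolver.resolve("www.example.com.", "AAAA")
resolver.resolve("_xmpp-server._tcp.example.com.", "SRV")

seen_by = {}
for zone, qname, qtype in resolver.log:
    seen_by.setdefault(zone, []).append((qname, qtype))
for zone, questions in seen_by.items():
    print(zone, questions)
```

   In this model the root and .com servers each see only the first
   question, while the example.com servers see both.  The recursive
   resolver itself, of course, saw both questions from the stub.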

123	   It should be noted that DNS recursive resolvers sometimes forward
124	   requests to other recursive resolvers, typically bigger machines,
125	   with a larger and more shared cache (and the query hierarchy can be
126	   even deeper, with more than two levels of recursive resolvers).  From
127	   the point of view of privacy, these forwarders are like resolvers,
128	   except that they do not see all of the requests being made (due to
129	   caching in the first resolver).

131	   Almost all this DNS traffic is currently sent in the clear
132	   (unencrypted).  There are a few cases where there is some channel
133	   encryption, for instance in an IPsec VPN, at least between the stub
134	   resolver and the resolver.

136	   Today, almost all DNS queries are sent over UDP [thomas-ditl-tcp].
137	   This has practical consequences when considering encryption of the
138	   traffic as a possible privacy technique.  Some encryption solutions
139	   are only designed for TCP, not UDP.

141	   Another important point to keep in mind when analyzing the privacy
142	   issues of DNS is the fact that DNS requests received by a server are
143	   triggered for different reasons.  Let's assume an eavesdropper wants
144	   to know which Web page is viewed by a user.  For a typical Web page,
145	   there are three sorts of DNS requests being issued:

147	      Primary request: this is the domain name in the URL that the user
148	      typed, selected from a bookmark or chose by clicking on a
149	      hyperlink.  Presumably, this is what is of interest for the
150	      eavesdropper.

152	      Secondary requests: these are the additional requests performed by
153	      the user agent (here, the Web browser) without any direct
154	      involvement or knowledge of the user.  For the Web, they are
155	      triggered by embedded content, CSS sheets, JavaScript code,
156	      embedded images, etc.  In some cases, there can be dozens of
157	      domain names in different contexts on a single Web page.

159	      Tertiary requests: these are the additional requests performed by
160	      the DNS itself.  For instance, if the answer to a query is
161	      a referral to a set of name servers, and the glue records are not
162	      returned, the resolver will have to do additional requests to turn
163	      name servers' names into IP addresses.  Similarly, even if glue
164	      records are returned, a careful recursive server will do tertiary
165	      requests to verify the IP addresses of those records.

167	   It can be noted also that, in the case of a typical Web browser, more
168	   DNS requests than strictly necessary are sent, for instance to
169	   prefetch resources that the user may query later, or when
170	   autocompleting the URL in the address bar.  Both are a big privacy
171	   concern since they may leak information even about non-explicit
172	   actions.  For instance, just reading a local HTML page, even without
173	   selecting the hyperlinks, may trigger DNS requests.

175	   For privacy-related terms, we will use here the terminology of
176	   [RFC6973].

178	2.  Risks

180	   This document focuses mostly on the study of privacy risks for the
181	   end-user (the one performing DNS requests).  We consider the risks of
182	   pervasive surveillance ([RFC7258]) as well as risks coming from a
183	   more focused surveillance.  Privacy risks for the holder of a zone
184	   (the risk that someone gets the data) are discussed in [RFC5936] and
185	   [RFC5155].  Non-privacy risks (such as cache poisoning) are out of
186	   scope.

188	2.1.  The alleged public nature of DNS data

190	   It has long been claimed that "the data in the DNS is public".  While
191	   this sentence makes sense for an Internet-wide lookup system, there
192	   are multiple facets to the data and metadata involved that deserve a
193	   more detailed look.  First, access control lists and private
194	   namespaces notwithstanding, the DNS operates under the assumption
195	   that public facing authoritative name servers will respond to "usual"
196	   DNS queries for any zone they are authoritative for without further
197	   authentication or authorization of the client (resolver).  Due to the
198	   lack of search capabilities, only a given QNAME will reveal the
199	   resource records associated with that name (or that name's non-
200	   existence).  In other words: one needs to know what to ask for, in
201	   order to receive a response.  The zone transfer QTYPE [RFC5936] is
202	   often blocked or restricted to authenticated/authorized access to
203	   enforce this difference (and maybe for other reasons).

205	   Another differentiation to be considered is between the DNS data
206	   itself and a particular transaction (i.e., a DNS name lookup).  DNS
207	   data and the results of a DNS query are public, within the boundaries
208	   described above, and may not have any confidentiality requirements.
209	   However, the same is not true of a single transaction or sequence of
210	   transactions; these are not, and should not be, public.  A
211	   typical example from outside the DNS world is: the Web site of
212	   Alcoholics Anonymous is public; the fact that you visit it should not
213	   be.

215	2.2.  Data in the DNS request

217	   The DNS request includes many fields but two of them seem
218	   particularly relevant for the privacy issues: the QNAME and the
219	   source IP address.  "Source IP address" is used in a loose sense of
220	   "source IP address + maybe source port", because the port is also in
221	   the request and can be used to differentiate between several users
222	   sharing an IP address (behind a CGN for instance [RFC6269]).

224	   The QNAME is the full name sent by the user.  It gives information
225	   about what the user does ("What are the MX records of example.net?"
226	   means he probably wants to send email to someone at example.net,
227	   which may be a domain used by only a few persons and therefore very
228	   revealing about communication relationships).  Some QNAMEs are more
229	   sensitive than others.  For instance, querying the A record of a
230	   well-known Web statistics domain reveals very little (everybody
231	   visits Web sites which use this analytics service) but querying the A
232	   record of www.verybad.example where verybad.example is the domain of
233	   an organization that some people find offensive or objectionable, may
234	   create more problems for the user.  Also, sometimes, the QNAME embeds
235	   the software one uses, which could be a privacy issue.  For instance,
236	   _ldap._tcp.Default-First-Site-Name._sites.gc._msdcs.example.org.
237	   There are also some BitTorrent clients that query a SRV record for
238	   _bittorrent-tracker._tcp.domain.example.

240	   Another important point about the privacy of the QNAME concerns
241	   future usages.  Today, the lack of privacy is an obstacle to putting
242	   potentially sensitive or personally identifiable data in the DNS.  At
243	   the moment your DNS traffic might reveal that you are doing email but
244	   not with whom.  If your MUA starts looking up PGP keys in the DNS
245	   [I-D.wouters-dane-openpgp] then privacy becomes a lot more important.
246	   And email is just an example; there would be other really interesting
247	   uses for a more privacy-friendly DNS.

249	   For the communication between the stub resolver and the recursive
250	   resolver, the source IP address is the address of the user's machine.
251	   Therefore, all the issues and warnings about collection of IP
252	   addresses apply here.  For the communication between the recursive
253	   resolver and the authoritative name servers, the source IP address
254	   has a different meaning; it does not have the same status as the
255	   source address in an HTTP connection.  It is now the IP address of
256	   the recursive resolver which, in a way, "hides" the real user.
257	   However, hiding does not always work.  Sometimes
258	   [I-D.ietf-dnsop-edns-client-subnet] is used (see its privacy analysis
259	   in [denis-edns-client-subnet]).  Sometimes the end user has a
260	   personal recursive resolver on her machine.  In both cases, the IP
261	   address is as sensitive as it is for HTTP [sidn-entrada].

263	   A note about IP addresses: there is currently no IETF document which
264	   describes in detail all the privacy issues around IP addressing.  In
265	   the meantime, the discussion here is intended to include both IPv4
266	   and IPv6 source addresses.  For a number of reasons their assignment
267	   and utilization characteristics are different, which may have
268	   implications for details of information leakage associated with the
269	   collection of source addresses.  (For example, a specific IPv6 source
270	   address seen on the public Internet is less likely than an IPv4
271	   address to originate behind a CGN or other NAT.)  However, for both
272	   IPv4 and IPv6 addresses, it's important to note that source addresses
273	   are propagated with queries and comprise metadata about the host,
274	   user, or application that originated them.

276	2.3.  Cache snooping

278	   The content of recursive resolvers' caches can reveal data about the
279	   clients using them (the privacy risks depend on the number of
280	   clients).  This information can sometimes be examined by sending DNS
281	   queries with RD=0 to inspect cache content, particularly looking at
282	   the DNS TTLs [grangeia.snooping].  Since this is also a
283	   reconnaissance technique for subsequent cache poisoning attacks, some
284	   countermeasures have already been developed and deployed.
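
   As an illustration, such a probe can be built by hand from the
   message layout of [RFC1035].  The sketch below only constructs the
   bytes of a query with the RD (Recursion Desired) bit cleared; it
   does not send anything.  build_query() and its parameters are
   hypothetical names, and a real probe would also need to examine the
   TTLs in the response, which is not shown here.

```python
import struct

RD_FLAG = 0x0100  # Recursion Desired bit in the DNS header flags word


def build_query(qname, qtype=1, rd=False, query_id=0x1234):
    """Return the wire format of a one-question DNS query (class IN)."""
    flags = RD_FLAG if rd else 0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by the root label
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


probe = build_query("www.example.com", qtype=1, rd=False)
flags_word = struct.unpack("!H", probe[2:4])[0]
print("RD bit set:", bool(flags_word & RD_FLAG))
print("query length in octets:", len(probe))
```

   With RD=0, the query asks the resolver not to recurse on the
   client's behalf, which is why the answer (or its absence) can
   reflect what is already in the cache.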

286	2.4.  On the wire

288	   DNS traffic can be seen by an eavesdropper like any other traffic.
289	   It is typically not encrypted.  (DNSSEC, specified in [RFC4033]
290	   explicitly excludes confidentiality from its goals.)  So, if an
291	   initiator starts an HTTPS communication with a recipient, while the
292	   HTTP traffic will be encrypted, the DNS exchange prior to it will not
293	   be.  As other protocols become more and more privacy-aware and
294	   secured against surveillance, the DNS may become "the weakest link"
295	   in privacy.

297	   An important specificity of the DNS traffic is that it may take a
298	   different path than the communication between the initiator and the
299	   recipient.  For instance, an eavesdropper may be unable to tap the
300	   wire between the initiator and the recipient but may have access to
301	   the wire going to the recursive resolver, or to the authoritative
302	   name servers.

304	   The best place to tap, from an eavesdropper's point of view, is
305	   clearly between the stub resolvers and the recursive resolvers,
306	   because traffic is not limited by DNS caching.

308	   The attack surface between the stub resolver and the rest of the
309	   world can vary widely depending upon how the end user's computer is
310	   configured.  In order of increasing attack surface:

312	      The recursive resolver can be on the end user's computer.  In
313	      (currently) a small number of cases, individuals may choose to
314	      operate their own DNS resolver on their local machine.  In this
315	      case the attack surface for the connection between the stub
316	      resolver and the caching resolver is limited to that single
317	      machine.

319	      The recursive resolver may be at the local network edge.  For
320	      many/most enterprise networks and for some residential users the
321	      caching resolver may exist on a server at the edge of the local
322	      network.  In this case the attack surface is the local network.
323	      Note that in large enterprise networks the DNS resolver may not be
324	      located at the edge of the local network but rather at the edge of
325	      the overall enterprise network.  In this case the enterprise
326	      network could be thought of as similar to the IAP (Internet Access
327	      Provider) network referenced below.

329	      The recursive resolver can be in the IAP (Internet Access
330	      Provider) premises.  For most residential users and potentially
331	      other networks the typical case is for the end user's computer to
332	      be configured (typically automatically through DHCP) with the
333	      addresses of the DNS recursive resolvers at the IAP.  The attack
334	      surface for on-the-wire attacks is therefore from the end user
335	      system across the local network and across the IAP network to the
336	      IAP's recursive resolvers.

338	      The recursive resolver can be a public DNS service.  Some machines
339	      may be configured to use public DNS resolvers such as those
340	      operated today by Google Public DNS or OpenDNS.  The end user may
341	      have configured their machine to use these DNS recursive resolvers
342	      themselves - or their IAP may have chosen to use the public DNS
343	      resolvers rather than operating their own resolvers.  In this case
344	      the attack surface is the entire public Internet between the end
345	      user's connection and the public DNS service.

347	2.5.  In the servers

349	   Using the terminology of [RFC6973], the DNS servers (recursive
350	   resolvers and authoritative servers) are enablers: they facilitate
351	   communication between an initiator and a recipient without being
352	   directly in the communications path.  As a result, they are often
353	   forgotten in risk analysis.  But, to quote again [RFC6973], "Although
354	   [...] enablers may not generally be considered as attackers, they may
355	   all pose privacy threats (depending on the context) because they are
356	   able to observe, collect, process, and transfer privacy-relevant
357	   data."  In [RFC6973] parlance, enablers become observers when they
358	   start collecting data.

360	   Many programs exist to collect and analyze DNS data at the servers,
361	   from the "query log" of some programs like BIND, to tcpdump and more
362	   sophisticated programs like PacketQ [packetq] and DNSmezzo
363	   [dnsmezzo].  The organization managing the DNS server can use these
364	   data itself or it can be part of a surveillance program like PRISM
365	   [prism] and pass data to an outside observer.

367	   Sometimes, these data are kept for a long time and/or distributed to
368	   third parties, for research purposes [ditl] [day-at-root], for
369	   security analysis, or for surveillance tasks.  These uses are
370	   sometimes under some sort of contract, with various limitations, for
371	   instance on redistribution, given the sensitive nature of the data.
372	   Also, there are observation points in the network which gather DNS
373	   data and then make it accessible to third parties for research or
374	   security purposes ("passive DNS" [passive-dns]).

376	2.5.1.  In the recursive resolvers

378	   Recursive resolvers see all the traffic since there is typically no
379	   caching before them.  To summarize: your recursive resolver knows a
380	   lot about you.  The resolver of a large IAP, or a large public
381	   resolver can collect data from many users.  You may get an idea of
382	   the data collected by reading the privacy policy of a big public
383	   resolver [1].

385	2.5.2.  In the authoritative name servers

387	   Unlike what happens for recursive resolvers, observation capabilities
388	   of authoritative name servers are limited by caching; they see only
389	   the requests for which the answer was not in the cache.  For
390	   aggregated statistics ("What is the percentage of LOC queries?"),
391	   this is sufficient; but it prevents an observer from seeing
392	   everything.  Still, the authoritative name servers see a part of the
393	   traffic, and this subset may be sufficient to violate some privacy
394	   expectations.

396	   Also, the end user typically has some legal/contractual link with the
397	   recursive resolver (he has chosen the IAP, or he has chosen to use a
398	   given public resolver), while having no control and perhaps no
399	   awareness of the role of the authoritative name servers and their
400	   observation abilities.

402	   As noted before, using a local resolver or a resolver close to the
403	   machine decreases the attack surface for an on-the-wire eavesdropper.
404	   But it may decrease privacy against an observer located on an
405	   authoritative name server.  This authoritative name server will see
406	   the IP address of the end client, instead of the address of a big
407	   recursive resolver shared by many users.

409	   This "protection", when using a large resolver with many clients, is
410	   no longer present if [I-D.ietf-dnsop-edns-client-subnet] is used
411	   because, in this case, the authoritative name server sees the
412	   original IP address (or prefix, depending on the setup).
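
   The difference between revealing an address and revealing a prefix
   can be illustrated with Python's standard ipaddress module.  The
   prefix lengths used below (24 and 56) are assumptions chosen for the
   example, not values taken from this document, and visible_prefix()
   is a hypothetical helper.  The addresses come from the
   documentation ranges.

```python
import ipaddress


def visible_prefix(client_address, prefix_length):
    """Return the network a server would learn for a given prefix length."""
    # strict=False lets us pass a host address and have it truncated.
    network = ipaddress.ip_network(
        f"{client_address}/{prefix_length}", strict=False
    )
    return str(network)


v4 = visible_prefix("192.0.2.57", 24)
v6 = visible_prefix("2001:db8:1234:5678::1", 56)
print(v4)
print(v6)
```

   The shorter the prefix, the larger the set of clients that share
   it, and so the less the authoritative name server learns about any
   one of them.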

414	   As of today, all the instances of one root name server, L-root,
415	   receive together around 50,000 queries per second.  While most of it
416	   is "junk" (errors on the TLD name), it gives an idea of the amount of
417	   big data which pours into name servers.  (And even "junk" can leak
418	   information, for instance if there is a typing error in the TLD, the
419	   user will send data to a TLD which is not the usual one.)

421	   Many domains, including TLDs, are partially hosted by third-party
422	   servers, sometimes in a different country.  The contracts between the
423	   domain manager and these servers may or may not take privacy into
424	   account.  Whatever the contract, the third-party hoster may be honest
425	   or not but, in any case, it will have to follow its local laws.  So,
426	   requests to a given ccTLD may go to servers managed by organizations
427	   outside of the ccTLD's country.  End-users may not anticipate that,
428	   when doing a security analysis.

430	   Also, it seems [aeris-dns] that there is a strong concentration of
431	   authoritative name servers among "popular" domains (such as the Alexa
432	   Top N list).  For instance, among the Alexa Top 100k, one DNS
433	   provider hosts today 10% of the domains.  The ten most important DNS
434	   providers host together one third of the domains.  With the control
435	   (or the ability to sniff the traffic) of a few name servers, you can
436	   gather a lot of information.

438	2.5.3.  Rogue servers

440	   The previous paragraphs discussed DNS privacy, assuming that all the
441	   traffic was directed to the intended servers, and that the potential
442	   attacker was purely passive.  But, in reality, we can have active
443	   attackers, redirecting the traffic, not for changing it but just to
444	   observe it.

446	   For instance, a rogue DHCP server, or a trusted DHCP server that has
447	   had its configuration altered by malicious parties, can direct you to
448	   a rogue recursive resolver.  Most of the time, it seems to be done to
449	   divert traffic, by providing lies for some domain names.  But it
450	   could be used just to capture the traffic and gather information
451	   about you.  Other attacks, besides using DHCP, are possible.  The
452	   traffic from a DNS client to a DNS server can be intercepted along
453	   its way from originator to intended destination; for instance by
454	   transparent DNS proxies in the network that will divert the traffic
455	   intended for a legitimate DNS server.  This rogue server can
456	   masquerade as the intended server and respond with data to the
457	   client.  (Rogue servers that inject malicious data are possible, but
458	   that is a separate problem not relevant to privacy.)  A rogue server
459	   may respond correctly for a long period of time, thereby avoiding
460	   detection.  This may be done for what could be claimed to be good
461	   reasons, such as optimization or caching, but it leads to a reduction
462	   of privacy compared to if there were no attacker present.  Also,
463	   malware like DNSchanger [dnschanger] can change the recursive
464	   resolver in the machine's configuration, or the routing itself can be
465	   subverted (for instance [turkey-googledns]).

467	   A practical consequence of this section is that solutions for DNS
468	   privacy may have to address authentication of the server, not just
469	   passive sniffing.

471	2.6.  Re-identification and other inferences

473	   An observer has access not only to the data he/she directly collects
474	   but also to the results of various inferences about these data.

476	   For instance, a user can be re-identified via DNS queries.  If the
477	   adversary knows a user's identity and can watch their DNS queries for
478	   a period, then that same adversary may be able to re-identify the
479	   user solely based on their pattern of DNS queries later on regardless
480	   of the location from which the user makes those queries.  For
481	   example, one study [herrmann-reidentification] found that such re-
482	   identification is possible so that "73.1% of all day-to-day links
483	   were correctly established, i.e.  user u was either re-identified
484	   unambiguously (1) or the classifier correctly reported that u was not
485	   present on day t+1 any more (2)".  While that study related to web
486	   browsing behaviour, equally characteristic patterns may be produced
487	   even in machine-to-machine communications or without a user taking
488	   specific actions, e.g. at reboot time if a characteristic set of
489	   services are accessed by the device.
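
   The principle of re-identification can be sketched with a much
   simpler technique than the classifier used in the cited study: a
   plain set-overlap (Jaccard) score between the names queried on two
   days.  All data below is invented for illustration, and jaccard()
   and reidentify() are hypothetical helpers; the sketch says nothing
   about the accuracy achievable in practice.

```python
def jaccard(a, b):
    """Overlap between two sets of queried names, between 0.0 and 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Day t: the observer knows which user issued which queries (invented data).
profiles = {
    "alice": {"mail.example.org", "news.example.net", "vpn.example.com"},
    "bob": {"games.example.com", "video.example.net"},
}

# Day t+1: same users, new source addresses, identity unknown to the observer.
sessions = {
    "192.0.2.10": {"mail.example.org", "news.example.net", "shop.example.com"},
    "198.51.100.7": {"video.example.net", "games.example.com", "cdn.example.org"},
}


def reidentify(profiles, sessions):
    """Link each new session to the known profile with the highest overlap."""
    return {
        source: max(profiles, key=lambda user: jaccard(profiles[user], names))
        for source, names in sessions.items()
    }


links = reidentify(profiles, sessions)
for source, user in links.items():
    print(source, "->", user)
```

   Even this naive score links each new address to the right profile
   in the invented data, because a user's set of habitual names
   changes little from one day to the next.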

491	   For instance, one could imagine an intelligence agency identifying
492	   people going to a site by putting in a very long DNS name and looking
493	   for queries of a specific length.  Such traffic analysis could weaken
494	   some privacy solutions.
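
   A minimal sketch of this length-based inference follows.  It
   assumes plain queries with a single question and no EDNS option, so
   that the message size is 12 octets of header, plus the encoded
   QNAME, plus 4 octets of QTYPE and QCLASS.  The names, including the
   long "marker" name, are invented, and query_wire_length() is a
   hypothetical helper.

```python
def query_wire_length(qname):
    """Size in octets of a one-question DNS query for qname (no EDNS)."""
    labels = qname.rstrip(".").split(".")
    # Each label costs one length octet plus its characters; the name
    # is terminated by a single zero octet for the root label.
    name_length = sum(1 + len(label) for label in labels) + 1
    return 12 + name_length + 4


# A deliberately long name planted by the observer (invented).
MARKER_NAME = "a" * 60 + ".tracking.example"
target = query_wire_length(MARKER_NAME)

# Names behind the queries an eavesdropper sees; only sizes are used.
observed_names = ["www.example.com", MARKER_NAME, "mail.example.net"]
matches = [
    name for name in observed_names
    if query_wire_length(name) == target
]
print("target length:", target)
print("queries with that length:", len(matches))
```

   The observer never needs to read the name: an unusual size is
   enough to single out the query, which is why message length matters
   when evaluating privacy solutions.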

496	   The IAB privacy and security programme also has a work in progress
497	   [I-D.iab-privsec-confidentiality-threat] that considers such
498	   inference based attacks in a more general framework.

500	3.  Actual "attacks"

502	   A very quick examination of DNS traffic may lead to the false
503	   conclusion that extracting the needle from the haystack is difficult.
504	   "Interesting" primary DNS requests are mixed with useless (for the
505	   eavesdropper) secondary and tertiary requests (see the terminology in
506	   Section 1).  But, in this time of "big data" processing, powerful
507	   techniques now exist to get from the raw data to what the
508	   eavesdropper is actually interested in.

510	   Many research papers about malware detection use DNS traffic to
511	   detect "abnormal" behaviour that can be traced back to the activity
512	   of malware on infected machines.  Yes, this research was done for the
513	   good; but, technically, it is a privacy attack and it demonstrates
514	   the power of the observation of DNS traffic.  See [dns-footprint],
515	   [dagon-malware] and [darkreading-dns].

517	   Passive DNS systems [passive-dns] allow reconstruction of the data of
518	   sometimes an entire zone.  They are used for many reasons, some good,
519	   some bad.  Well-known passive DNS systems keep only the DNS
520	   responses, and not the source IP address of the client, precisely for
521	   privacy reasons.  Other passive DNS systems may not be so careful.
522	   And there are still potential problems with revealing QNAMEs.
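
	   The difference between keeping and dropping the client address can
	   be sketched as follows.  This is a minimal Python illustration, not
	   the design of any actual passive DNS system: the record layout and
	   the names Observation, RecordKey and build_passive_dns are
	   assumptions made for the example.

```python
from collections import namedtuple

# What a sensor sees on the wire (hypothetical layout).
Observation = namedtuple("Observation",
                         "client_ip qname rrtype rdata timestamp")

# What a privacy-conscious passive DNS database stores: response data only.
RecordKey = namedtuple("RecordKey", "qname rrtype rdata")


def build_passive_dns(observations):
    """Aggregate responses into {RecordKey: (first_seen, last_seen, count)}.

    The client IP address is deliberately never copied into the database.
    """
    db = {}
    for obs in observations:
        key = RecordKey(obs.qname.lower(), obs.rrtype, obs.rdata)
        if key in db:
            first, last, count = db[key]
            db[key] = (min(first, obs.timestamp),
                       max(last, obs.timestamp),
                       count + 1)
        else:
            db[key] = (obs.timestamp, obs.timestamp, 1)
    return db


observations = [
    Observation("192.0.2.1", "www.example.com.", "A", "203.0.113.5", 100),
    Observation("198.51.100.7", "WWW.example.com.", "A", "203.0.113.5", 250),
]
for key, (first, last, count) in build_passive_dns(observations).items():
    print(key.qname, key.rrtype, key.rdata, first, last, count)
```

	   Even in this reduced form, the stored QNAME remains visible to
	   anyone who can query the database.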

524	   The revelations (from the Edward Snowden documents, leaked from the
525	   NSA) of the MORECOWBELL surveillance program [morecowbell], which
526	   uses the DNS, both passively and actively, to surreptitiously gather
527	   information about the users, are another good example showing that
528	   the lack of privacy protections in the DNS is actively exploited.

530	4.  Legalities

532	   To our knowledge, there are no specific privacy laws for DNS data, in
533	   any country.  Interpreting general privacy laws like
534	   [data-protection-directive] (European Union) in the context of DNS
535	   traffic data is not an easy task and we do not know a court precedent
536	   here.  An interesting analysis is [sidn-entrada].

538	5.  Security considerations

540	   This document is entirely about security, more precisely privacy.  It
541	   just lays out the problem; it does not try to set requirements (with
542	   the choices and compromises they imply), much less to define
543	   solutions.  Possible solutions to the issues described here are
544	   discussed in other documents (currently too many to all be
545	   mentioned), see for instance [I-D.ietf-dnsop-qname-minimisation] for
546	   the minimisation of data, or [I-D.ietf-dprive-start-tls-for-dns]
547	   about encryption.

549	6.  Acknowledgments

551	   Thanks to Nathalie Boulvard and to the CENTR members for the original
552	   work which led to this document.  Thanks to Ondrej Sury for the
553	   interesting discussions.  Thanks to Mohsen Souissi and John Heidemann
554	   for proofreading, to Paul Hoffman, Matthijs Mekking, Marcos Sanz, Tim
555	   Wicinski, Francis Dupont, Allison Mankin and Warren Kumari for
556	   proofreading, technical remarks, and many readability improvements.
557	   Thanks to Dan York, Suzanne Woolf, Tony Finch, Stephen Farrell, Peter
558	   Koch, Simon Josefsson and Frank Denis for good written contributions.
559	   And thanks to the IESG members for the last remarks.

561	7.  IANA considerations

563	   This document has no actions for IANA.

565	8.  References

567	8.1.  Normative References

569	   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
570	              STD 13, RFC 1034, November 1987.

572	   [RFC1035]  Mockapetris, P., "Domain names - implementation and
573	              specification", STD 13, RFC 1035, November 1987.

575	   [RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
576	              Morris, J., Hansen, M., and R. Smith, "Privacy
577	              Considerations for Internet Protocols", RFC 6973, July
578	              2013.

580	   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
581	              Attack", BCP 188, RFC 7258, May 2014.

583	8.2.  Informative References

585	   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
586	              Rose, "DNS Security Introduction and Requirements", RFC
587	              4033, March 2005.

589	   [RFC5155]  Laurie, B., Sisson, G., Arends, R., and D. Blacka, "DNS
590	              Security (DNSSEC) Hashed Authenticated Denial of
591	              Existence", RFC 5155, March 2008.

593	   [RFC5936]  Lewis, E. and A. Hoenes, "DNS Zone Transfer Protocol
594	              (AXFR)", RFC 5936, June 2010.

596	   [RFC6269]  Ford, M., Boucadair, M., Durand, A., Levis, P., and P.
597	              Roberts, "Issues with IP Address Sharing", RFC 6269, June
598	              2011.

600	   [I-D.ietf-dnsop-edns-client-subnet]
601	              Contavalli, C., Gaast, W., Lawrence, D., and W. Kumari,
602	              "Client Subnet in DNS Querys", draft-ietf-dnsop-edns-
603	              client-subnet-01 (work in progress), May 2015.

605	   [I-D.iab-privsec-confidentiality-threat]
606	              Barnes, R., Schneier, B., Jennings, C., Hardie, T.,
607	              Trammell, B., Huitema, C., and D. Borkmann,
608	              "Confidentiality in the Face of Pervasive Surveillance: A
609	              Threat Model and Problem Statement", draft-iab-privsec-
610	              confidentiality-threat-07 (work in progress), May 2015.

612	   [I-D.wouters-dane-openpgp]
613	              Wouters, P., "Using DANE to Associate OpenPGP public keys
614	              with email addresses", draft-wouters-dane-openpgp-02 (work
615	              in progress), February 2014.

617	   [I-D.ietf-dprive-start-tls-for-dns]
618	              Zi, Z., Zhu, L., Heidemann, J., Mankin, A., Wessels, D.,
619	              and P. Hoffman, "TLS for DNS: Initiation and Performance
620	              Considerations", draft-ietf-dprive-start-tls-for-dns-00
621	              (work in progress), May 2015.

623	   [I-D.ietf-dnsop-qname-minimisation]
624	              Bortzmeyer, S., "DNS query name minimisation to improve
625	              privacy", draft-ietf-dnsop-qname-minimisation-03 (work in
626	              progress), June 2015.

628	   [I-D.ietf-dnsop-dns-terminology]
629	              Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
630	              Terminology", draft-ietf-dnsop-dns-terminology-02 (work in
631	              progress), May 2015.

633	   [denis-edns-client-subnet]
634	              Denis, F., "Security and privacy issues of edns-client-
635	              subnet", August 2013, <https://00f.net/2013/08/07/edns-
636	              client-subnet/>.

638	   [dagon-malware]
639	              Dagon, D., "Corrupted DNS Resolution Paths: The Rise of a
640	              Malicious Resolution Authority", 2007, <https://www.dns-
641	              oarc.net/files/workshop-2007/Dagon-Resolution-
642	              corruption.pdf>.

644	   [dns-footprint]
645	              Stoner, E., "DNS footprint of malware", October 2010,
646	              <https://www.dns-oarc.net/files/workshop-201010/OARC-ers-
647	              20101012.pdf>.

649	   [morecowbell]
650	              Grothoff, C., Wachs, M., Ermert, M., and J. Appelbaum,
651	              "NSA's MORECOWBELL: Knell for DNS", January 2015,
652	              <https://gnunet.org/morecowbell>.

654	   [darkreading-dns]
655	              Lemos, R., "Got Malware? Three Signs Revealed In DNS
656	              Traffic", May 2013,
657	              <http://www.darkreading.com/monitoring/
658	              got-malware-three-signs-revealed-in-dns/240154181>.

660	   [dnschanger]
661	              Wikipedia, "DNSchanger", November 2011,
662	              <http://en.wikipedia.org/wiki/DNSChanger>.

664	   [packetq]  Dot SE, "PacketQ, a simple tool to make SQL-queries
665	              against PCAP-files", 2011,
666	              <https://github.com/dotse/packetq/wiki>.

668	   [dnsmezzo]
669	              Bortzmeyer, S., "DNSmezzo", 2009,
670	              <http://www.dnsmezzo.net/>.

672	   [prism]    NSA, "PRISM", 2007, <http://en.wikipedia.org/wiki/
673	              PRISM_%28surveillance_program%29>.

675	   [grangeia.snooping]
676	              Grangeia, L., "DNS Cache Snooping or Snooping the Cache
677	              for Fun and Profit", 2004,
678	              <http://www.msit2005.mut.ac.th/msit_media/1_2551/nete4630/
679	              materials/20080718130017Hc.pdf>.

681	   [ditl]     CAIDA, "A Day in the Life of the Internet (DITL)", 2002,
682	              <http://www.caida.org/projects/ditl/>.

684	   [day-at-root]
685	              Castro, S., Wessels, D., Fomenkov, M., and K. Claffy, "A
686	              Day at the Root of the Internet", 2008,
687	              <http://www.sigcomm.org/sites/default/files/ccr/
688	              papers/2008/October/1452335-1452341.pdf>.

690	   [turkey-googledns]
691	              Bortzmeyer, S., "Hijacking of public DNS servers in
692	              Turkey, through routing", 2014,
693	              <http://www.bortzmeyer.org/
694	              dns-routing-hijack-turkey.html>.

696	   [data-protection-directive]
697	              Europe, "European directive 95/46/EC on the protection
698	              of individuals with regard to the processing of personal
699	              data and on the free movement of such data", November
700	              1995, <http://eur-lex.europa.eu/LexUriServ/
701	              LexUriServ.do?uri=CELEX:31995L0046:EN:HTML>.

703	   [passive-dns]
704	              Weimer, F., "Passive DNS Replication", April 2005,
705	              <http://www.enyo.de/fw/software/dnslogger/#2>.

707	   [tor-leak]
708	              Tor, "DNS leaks in Tor", 2013,
709	              <https://trac.torproject.org/projects/tor/wiki/doc/TorFAQ#
710	              IkeepseeingthesewarningsaboutSOCKSandDNSandinformationleak
711	              s.ShouldIworry>.

713	   [yanbin-tsudik]
714	              Yanbin, L. and G. Tsudik, "Towards Plugging Privacy Leaks
715	              in the Domain Name System", 2009,
716	              <http://arxiv.org/abs/0910.2472>.

718	   [castillo-garcia]
719	              Castillo-Perez, S. and J. Garcia-Alfaro, "Anonymous
720	              Resolution of DNS Queries", 2008,
721	              <http://deic.uab.es/~joaquin/papers/is08.pdf>.

723	   [fangming-hori-sakurai]
724	              Fangming, Hori, Y., and K. Sakurai, "Analysis of Privacy
725	              Disclosure in DNS Query", 2007,
726	              <http://dl.acm.org/citation.cfm?id=1262690.1262986>.

728	   [thomas-ditl-tcp]
729	              Thomas, M. and D. Wessels, "An Analysis of TCP Traffic in
730	              Root Server DITL Data", 2014, <https://indico.dns-
731	              oarc.net/event/20/session/2/contribution/15/material/
732	              slides/1.pdf>.

734	   [federrath-fuchs-herrmann-piosecny]
735	              Federrath, H., Fuchs, K., Herrmann, D., and C. Piosecny,
736	              "Privacy-Preserving DNS: Analysis of Broadcast, Range
737	              Queries and Mix-Based Protection Methods", 2011,
738	              <https://svs.informatik.uni-hamburg.de/publications/2011/2
739	              011-09-14_FFHP_PrivacyPreservingDNS_ESORICS2011.pdf>.

741	   [aeris-dns]
742	              Vinot, N., "[In French] Vie privee : et le DNS alors ?",
743	              2015, <https://blog.imirhil.fr/vie-privee-et-le-dns-
744	              alors.html>.

746	   [herrmann-reidentification]
747	              Herrmann, D., Gerber, C., Banse, C., and H. Federrath,
748	              "Analyzing characteristic host access patterns for re-
749	              identification of web user sessions", 2012,
750	              <http://epub.uni-regensburg.de/21103/1/
751	              Paper_PUL_nordsec_published.pdf>.

753	   [sidn-entrada]
754	              Hesselman, C., Jansen, J., Wullink, M., Vink, K., and M.
755	              Simon, "A privacy framework for 'DNS big data'
756	              applications", 2014,
757	              <https://www.sidnlabs.nl/uploads/tx_sidnpublications/
758	              SIDN_Labs_Privacyraamwerk_Position_Paper_V1.4_ENG.pdf>.

760	8.3.  URIs

762	   [1] https://developers.google.com/speed/public-dns/privacy

764	Author's Address

766	   Stephane Bortzmeyer
767	   AFNIC
768	   1, rue Stephenson
769	   Montigny-le-Bretonneux  78180
770	   France

772	   Phone: +33 1 39 30 83 46
773	   Email: bortzmeyer+ietf@nic.fr
774	   URI:   http://www.afnic.fr/