<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc1034 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.1034.xml">
<!ENTITY rfc1035 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.1035.xml">
<!ENTITY rfc2119 SYSTEM
"http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc2181 SYSTEM
"http://xml.resource.org/public/rfc/bibxml/reference.RFC.2181.xml">
<!ENTITY rfc4033 SYSTEM
"http://xml.resource.org/public/rfc/bibxml/reference.RFC.4033.xml">
<!ENTITY rfc5246 SYSTEM
"http://xml.resource.org/public/rfc/bibxml/reference.RFC.5246.xml">
<!ENTITY rfc5936 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5936.xml">
<!ENTITY rfc6347 SYSTEM
"http://xml.resource.org/public/rfc/bibxml/reference.RFC.6347.xml">
<!ENTITY rfc6973 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6973.xml">
<!ENTITY I-D.koch-perpass-dns-confidentiality SYSTEM
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.koch-perpass-dns-confidentiality.xml">
<!ENTITY I-D.bortzmeyer-dnsop-privacy-sol SYSTEM
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.bortzmeyer-dnsop-privacy-sol.xml">
<!ENTITY I-D.vandergaast-edns-client-subnet SYSTEM
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.vandergaast-edns-client-subnet.xml">
<!ENTITY I-D.wijngaards-dnsop-confidentialdns SYSTEM
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.wijngaards-dnsop-confidentialdns.xml">
]>

<rfc docName="draft-bortzmeyer-dnsop-dns-privacy-01" category="info" ipr="trust200902">
<?rfc toc="yes"?>
<?rfc strict="yes"?> 
<front>
<title abbrev="DNS privacy">DNS privacy problem statement</title>
<author fullname="Stephane Bortzmeyer" initials="S." surname="Bortzmeyer">
<organization>AFNIC</organization>
<address><postal><street>Immeuble International</street><code>78181</code><city>Saint-Quentin-en-Yvelines</city><country>France</country></postal> <phone>+33 1 39 30 83 46</phone><email>bortzmeyer+ietf@nic.fr</email><uri>http://www.afnic.fr/</uri></address>
</author>
<date month="December" year="2013"/>
<abstract>
<t>This document describes the privacy issues associated with the use
of the
DNS by Internet users. It is intended to be mostly a problem statement and it does not
prescribe solutions.</t>
<t>Discussions of the document should take place on the <xref target="dnsop">dnsop
mailing list</xref>.</t>
</abstract>
</front>

<middle>
<section anchor="introduction" title="Introduction">
<t>The Domain Name System is specified in <xref
target="RFC1034"/> and <xref target="RFC1035"/>. It is one of the
most important infrastructure components of the Internet and one of
the most often ignored or misunderstood. Almost every activity on the
Internet starts with a DNS query (and often several). Its use has many privacy
implications and we try to give here a comprehensive and accurate
list.</t>
<t>Let us start with a small reminder of the way the DNS works (with
some simplifications). A client, the stub resolver, issues a DNS query
to a server, the resolver (also called caching resolver or full resolver or recursive name server). For instance, the query is "What are the
AAAA records for www.example.com?". AAAA is the qtype (Query Type)
and www.example.com the qname (Query Name). To get the answer, the resolver
will first query the root name servers, which will, most of the time,
send a referral. Here, the referral will be to the .com name servers. In
turn, they will send a referral to the example.com name servers, which
will provide the answer. The root name servers, the name servers of
.com and those of example.com are called authoritative name servers. It is important, when analyzing the privacy
issues, to remember that the question put to all these name servers
is always the original question, not a derived question. Contrary to what
many "DNS for dummies" articles say, the question sent to the root
name servers is "What are the
AAAA records for www.example.com?", not "What are the name servers
of .com?". So, the DNS leaks more information than it should.</t>
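<t>The resolution path described above can be sketched as a toy model. This is
only an illustration, not a real resolver: it performs no network access, and the
names REFERRALS, AUTHORITATIVE_ZONE and resolve are hypothetical. It shows
that every set of authoritative name servers on the path receives the
full original question.</t>
<figure><artwork><![CDATA[
```python
# Toy model of iterative resolution (no network access).
# The delegation data below is made up for illustration.
REFERRALS = {
    ".": "com.",
    "com.": "example.com.",
}
AUTHORITATIVE_ZONE = "example.com."


def resolve(qname, qtype):
    """Return the (zone, qname, qtype) questions sent by the resolver."""
    sent = []
    zone = "."
    while True:
        # The resolver always sends the original question, unchanged.
        sent.append((zone, qname, qtype))
        if zone == AUTHORITATIVE_ZONE:
            return sent
        zone = REFERRALS[zone]  # follow the referral


for zone, qname, qtype in resolve("www.example.com.", "AAAA"):
    print(f"name servers of {zone!r} see: {qname} {qtype}")
```
]]></artwork></figure>
<t>In this sketch, the root and .com name servers only need to know the TLD or the
second-level domain to produce their referral, yet they are shown the complete qname
and the qtype.</t>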
<t>Because the DNS uses caching heavily, not all questions are sent to
the authoritative name servers. If the stub resolver, a few seconds
later, asks the resolver "What are the SRV records of
_xmpp-server._tcp.example.com?", the resolver will remember that it
knows the name servers of example.com and will just query them,
bypassing the root and .com. Because there is typically no caching in
the stub resolver, the resolver, unlike the authoritative servers, sees everything.
</t>
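<t>The effect of caching on what each party observes can be illustrated with a
small simulation. It is a sketch under simplifying assumptions: only
delegations are cached (not answers), the class ToyResolver and its attributes
are hypothetical, and no real DNS traffic is involved.</t>
<figure><artwork><![CDATA[
```python
class ToyResolver:
    """Resolver with a delegation cache; all data here is made up."""

    DELEGATIONS = [".", "com.", "example.com."]  # path from the root

    def __init__(self):
        self.known_zones = {"."}  # the root name servers are always known
        self.log = {zone: [] for zone in self.DELEGATIONS}
        self.received = []        # every question sent by the stub resolver

    def query(self, qname, qtype):
        self.received.append((qname, qtype))
        # Start at the deepest zone whose name servers are already cached.
        start = max(
            index
            for index, zone in enumerate(self.DELEGATIONS)
            if zone in self.known_zones
        )
        for zone in self.DELEGATIONS[start:]:
            self.log[zone].append((qname, qtype))
            self.known_zones.add(zone)


resolver = ToyResolver()
resolver.query("www.example.com.", "AAAA")
resolver.query("_xmpp-server._tcp.example.com.", "SRV")

print("resolver saw", len(resolver.received), "questions")
for zone, questions in resolver.log.items():
    print(f"name servers of {zone!r} saw {len(questions)} question(s)")
```
]]></artwork></figure>
<t>With these two queries, the resolver sees both questions while the root and
.com name servers see only the first one, because the second query goes
directly to the example.com name servers.</t>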
<t>Almost all DNS queries are sent over UDP today, and this has
practical consequences for anyone who thinks of encrypting this traffic
(some encryption solutions are
typically designed for TCP, not UDP).</t>
<t>It should also be noted that DNS resolvers sometimes forward requests to
bigger machines, with a larger and more shared cache, the
forwarders. From the point of view of privacy, forwarders are like
resolvers, except that the caching in the resolver before them
decreases the amount of data they can see.</t>
<t>Another important point to keep in mind when analyzing the privacy
issues of DNS is the mix of many sorts of DNS requests received by a
server. Let's assume the eavesdropper wants to know which Web page is
visited by a user. For a typical Web page displayed by the user,
there are three sorts of DNS requests:
<list>
<t>Primary request: this is the domain name that the user typed or
selected from a bookmark or chose by clicking on a
hyperlink. Presumably, this is what is of interest to the
eavesdropper.</t>
<t>Secondary requests: these are the requests performed by the user
agent (here, the Web browser) without any direct involvement or
knowledge of the user. For the Web, they are triggered by included
content, CSS sheets, JavaScript code, embedded images, etc. In some
cases, there can be dozens of domain names in a single page.</t>
<t>Tertiary requests: these are the requests performed by the DNS
system itself. For instance, if the answer to a query is a referral to
a set of name servers, and the glue is not returned, the resolver will
have to do tertiary requests to turn name servers' names into IP addresses.</t>
</list></t>
<t>For privacy-related terms, we will use here the terminology of <xref target="RFC6973"/>.</t>
</section>

<section title="Risks">

<t>This draft is limited to the study of privacy risks for the end-user (the one performing DNS
requests). Privacy risks for the holder of a zone (the risk that
someone gets the data) are discussed in <xref target="RFC5936"/> and
in <xref target="I-D.koch-perpass-dns-confidentiality"/>. Non-privacy
risks (such as cache poisoning) are out of scope.</t>

<section title="Data in the DNS request">
<t>The DNS request includes many fields but two of them seem especially
relevant for the privacy issues: the qname and the source IP
address. "Source IP address" is used here in a loose sense of "source IP address
and possibly source port", because the port is also in the request and can be used to tell apart several users
sharing an IP address (CGN for instance).</t>
<t>The qname is the full name sent by the original user. It gives
information about what the user does ("What are the MX records of
example.net?" means he probably wants to send email to someone at
example.net, which may be a domain used by only a few persons and
therefore very revealing). Some qnames are more sensitive than
others. For instance, querying the A record of google-analytics.com
reveals very little (everybody visits Web sites which use Google
Analytics) but querying the A record of www.verybad.example where
verybad.example is the domain of an illegal or very offensive
organization may create more problems for the user. Another example is when
the qname embeds the software one uses. For instance, some
BitTorrent clients query a SRV record for _bittorrent-tracker._tcp.domain.example.</t>
<t>For the communication between the stub resolver and the resolver,
the source IP address is the one of the user's machine. Therefore, all
the issues and warnings about collection of IP addresses apply
here. For the communication between the resolver and the authoritative
name servers, the source IP address has a different meaning: it does not have the same status as the source
address in an HTTP connection. It is now the IP address of the resolver
which, in a way, "hides" the real user. However, it does not always
work. Sometimes <xref
target="I-D.vandergaast-edns-client-subnet"/> is used. Sometimes the end user has a personal resolver on her
machine. In that case, the IP address is as sensitive as it is for
HTTP.</t>
<t>A note about IP addresses: there is currently no IETF document
which describes in detail the privacy issues of IP addressing. In the
meantime, the discussion here is intended to include both IPv4 and IPv6 source
addresses. For a number of reasons their assignment and utilization characteristics
are different, which may have implications for details of information leakage
associated with the collection of source addresses. (For example, a specific IPv6
source address seen on the public Internet is less likely than an IPv4 address to
originate behind a CGN or other NAT.) However, for both IPv4 and IPv6 addresses,
it's important to note that source addresses are propagated with queries and
comprise metadata about the host, user, or application that originated them.</t>
</section>

<section anchor="risks-on-wire" title="On the wire">
<t>DNS traffic can be seen by an eavesdropper like any other
traffic. It is typically not encrypted. (DNSSEC, specified in <xref
target="RFC4033"/>, explicitly excludes confidentiality from its
goals.) So, if an initiator starts an HTTPS communication with a
recipient, while the HTTP traffic will be encrypted, the DNS exchange
prior to it will not be. As other protocols become more and
more privacy-aware and secured against surveillance, the DNS risks
becoming "the weakest link" in privacy.</t>
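<t>To make the absence of encryption concrete, the following sketch builds a
minimal DNS query in the wire format of <xref target="RFC1035"/> and then reads
the question back from the raw bytes, as a passive eavesdropper could. The
function names build_query and sniff_question and the query ID are
hypothetical; the parser assumes an uncompressed question section, which is
the usual case for queries.</t>
<figure><artwork><![CDATA[
```python
import struct

QTYPE_AAAA = 28  # AAAA record type
QCLASS_IN = 1    # Internet class


def build_query(qname, qtype, query_id=0x1234):
    """Encode a minimal DNS query (RFC 1035, section 4.1)."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, other counts are 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is sent as a length byte followed by plain ASCII.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question


def sniff_question(packet):
    """Recover (qname, qtype) from the raw bytes of a query."""
    offset = 12  # skip the fixed-size header
    labels = []
    while packet[offset] != 0:
        length = packet[offset]
        start = offset + 1
        labels.append(packet[start:start + length].decode("ascii"))
        offset = start + length
    (qtype,) = struct.unpack_from("!H", packet, offset + 1)
    return ".".join(labels), qtype


packet = build_query("www.example.com", QTYPE_AAAA)
print(sniff_question(packet))
```
]]></artwork></figure>
<t>No key or secret is needed at any step: the qname and the qtype are carried
in the clear, so anyone who can copy the packet can read them.</t>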
<t>What also makes the DNS traffic different is that it may take a
different path than the communication between the initiator and the
recipient. For instance, an eavesdropper may be unable to tap the wire
between the initiator and the recipient but may have access to the
wire going to the resolver, or to the authoritative name servers.</t>
<t>The best place, from an eavesdropper's point of view, is clearly
between the stub resolvers and the resolvers, because he is not
limited by DNS caching.</t>
<t>The attack surface between the stub resolver and the rest of the world
can vary widely depending upon how the end user's computer is
configured. By order of increasing attack surface:</t>
<t>The resolver can be on the end user's computer. In (currently) a small number of cases,
individuals may choose to operate their own DNS resolver on their local
machine. In this case the attack surface for the stub resolver to caching
resolver connection is limited to that single machine.
</t>
<t>The resolver can be in the IAP (Internet Access Provider) premises.
For most residential users and potentially other
networks the typical case is for the end user's computer to be configured
(typically automatically through DHCP) with the addresses of the DNS
resolver at the IAP.  The attack surface for on-the-wire attacks is
therefore from the end user system across the local network and across the
IAP
network to the IAP's resolvers.</t>
<t>The resolver may also be at the local network edge. For many/most enterprise networks
and for some residential users the caching resolver may exist on a server
at the edge of the local network.  In this case the attack surface is the
local network.  Note that in large enterprise networks the DNS resolver
may not be located at the edge of the local network but rather at the edge
of the overall enterprise network. In this case the enterprise network
could be thought of as similar to the IAP network referenced above.</t>
<t>The resolver can be a public DNS service. Some end users may be configured to
use public DNS resolvers such as those operated by Google Public DNS or
OpenDNS. The end user may have configured their machine to use
these DNS resolvers themselves - or their IAP may choose to use the public
DNS resolvers rather than operating their own resolvers.  In this case the
attack surface is the entire public Internet between the end user's
connection and the public DNS service.</t>
</section>

<section title="In the servers">
<t>Using the terminology of <xref target="RFC6973"/>, the DNS servers
(resolvers and authoritative servers) are enablers: they facilitate communication between
an initiator and a recipient without being directly in the
communications path. As a result, they are often forgotten in risk
analysis. But, to quote again <xref target="RFC6973"/>, "Although [...] enablers may not generally
be considered as attackers, they may all pose privacy threats
(depending on the context) because they are able to observe, collect,
process, and transfer privacy-relevant data." In <xref
target="RFC6973"/> parlance, enablers become observers when they start
collecting data.</t>
<t>Many programs exist to collect and analyze DNS data at the servers. They range from the
"query log" of some programs like BIND, to tcpdump and more
sophisticated programs like <xref target="packetq">PacketQ</xref>
and <xref target="dnsmezzo">DNSmezzo</xref>. The organization
managing the DNS server can use this data itself or it can be
part of a surveillance program like <xref target="prism">PRISM</xref> and
pass data to an outside attacker.</t>
<t>Sometimes, these data are kept for a long time and/or
distributed to third parties, for research purposes <xref target="ditl"/>, for
security analysis, or for surveillance tasks. Also, there are
observation points in the network which gather DNS data and then make
it accessible to third parties for research or security purposes
("<xref target="passive-dns">passive DNS</xref>").</t>

<section title="In the resolvers">
<t>The resolvers see the entire traffic since there is typically no
caching before them. They are therefore well situated to observe the
traffic. To summarize: your resolver knows a lot about you. The resolver
of a large IAP, or a large public resolver can collect data from many
users. You may get an idea of the data collected by reading <eref
target="https://developers.google.com/speed/public-dns/privacy">the
privacy policy of a big public resolver</eref>. <!-- TODO published
policies of OpenDNS: nothing found on their Web site, only for the
Web, not for the DNS service, question sent, indirect reply received
TODO summarize --></t>
</section>

<section title="In the authoritative name servers">
<t>Unlike the resolvers, the authoritative name servers are limited by caching. They see only a
part of the requests. This is sufficient for aggregated statistics ("what is the
percentage of LOC queries?"), but it prevents an
observer from seeing everything. Nevertheless, the authoritative name
servers see a part of the traffic and this sample may be sufficient to
defeat some privacy expectations.</t>
<t>Also, the end user has typically some legal/contractual link with
the resolver (he has chosen the IAP, or he has chosen to use a given public
resolver) while he is often not even aware of the role of the
authoritative name servers and their observation abilities.</t>
<t>It is an interesting question whether the privacy issues are bigger
in the root or in a large TLD. The root sees the traffic for all the TLDs (and the huge
amount of traffic for non-existing TLDs) but a large TLD has less caching
before it.</t>
<t>As noted before, using a local resolver or a resolver close to the
machine decreases the attack surface for an on-the-wire
eavesdropper. But it may decrease privacy against an observer located
on an authoritative name server since the authoritative name server
will see the IP address of the end client, and not the address of a
big resolver shared by many users. The protection offered by a big shared resolver is lost if <xref
target="I-D.vandergaast-edns-client-subnet"/> is used because, in this
case, the authoritative name server sees the original IP prefix or address (depending on the setup).</t>
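<t>The difference between "prefix" and "address" can be illustrated with a
short sketch of the truncation a resolver might apply before forwarding client
information. The function name client_subnet and the default prefix lengths
(24 bits for IPv4, 56 bits for IPv6) are assumptions chosen for illustration;
the actual lengths depend on the resolver's configuration.</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def client_subnet(address, prefix_v4=24, prefix_v6=56):
    """Truncate a client address to the prefix that would be forwarded.

    The default prefix lengths are illustrative assumptions only.
    """
    ip = ipaddress.ip_address(address)
    prefix = prefix_v4 if ip.version == 4 else prefix_v6
    # strict=False zeroes the host bits instead of raising an error.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


print(client_subnet("192.0.2.77"))
print(client_subnet("2001:db8:1234:5678::1"))
print(client_subnet("192.0.2.77", prefix_v4=32))
```
]]></artwork></figure>
<t>The shorter the forwarded prefix, the more clients share it; with a
full-length prefix, the authoritative name server learns the exact client
address.</t>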
<t>As of today, all the instances of one root name server, L-root,
receive together around 20 000 queries per second<!-- http://dns.icann.org/cgi-bin/dsc-grapher.pl?plot=bynode&server=L-root -->. While most of it is junk (errors on
the TLD name), it gives an idea of the amount of big data which pours
into name servers.</t>
<t>Many domains, including TLDs, are partially hosted by third-party
servers, sometimes in a different country. The contracts between the
domain manager and these servers may or may not take privacy into
account. But it may be surprising for an end-user that requests to a
given ccTLD may go to servers managed by organisations outside of the country.</t>
</section>

<section title="Rogue servers">
<t>A rogue DHCP server can direct you to a rogue resolver. Most of the
time, this seems to be done to divert traffic, by providing lies for
some domain names. But it could be used just to
capture the traffic and gather information about you. The same applies to
malware like DNSchanger <xref target="dnschanger"/>, which changes the
resolver in the machine's configuration.</t>
</section>

</section>

</section>

<section title="Actual &quot;attacks&quot;">
<t>A very quick examination of DNS traffic may lead to the false
conclusion that extracting the needle from the haystack is
difficult. "Interesting" primary DNS requests are mixed with useless
(for the eavesdropper) secondary and
tertiary requests (see the terminology in <xref
target="introduction"/>). But, in this time of "big data" processing,
powerful techniques now exist to get from the raw data to what you're
actually interested in.</t>
<t>Many research papers about malware detection use DNS traffic to
detect "abnormal" behaviour that can be traced back to the activity of
malware on infected machines. Yes, this research was done for good purposes but,
technically, it is a privacy attack and it demonstrates the power of
the observation of DNS traffic. See <xref target="dns-footprint"/>,
<xref target="dagon-malware"/> and <xref target="darkreading-dns"/>.</t>
</section>

<section title="Legalities">
<t>To our knowledge, there are no specific privacy laws for DNS
data. Interpreting general privacy laws like
<xref target="data-protection-directive"/> (European Union) in the context of DNS traffic data is not
an easy task and it seems there is no court precedent here.</t>
</section>

<section title="Security considerations">
<t>This document is entirely about security, more precisely
privacy. Possible solutions to the issues described here are discussed
in <xref target="I-D.bortzmeyer-dnsop-privacy-sol"/> or in <xref target="I-D.wijngaards-dnsop-confidentialdns"/>.</t>

</section>

<section title="Acknowledgments">
<t>Thanks to Nathalie Boulvard and to the CENTR members for the
original work which led to this draft. Thanks to Ondrej Sury for the
interesting discussions. Thanks to Mohsen Souissi for
proofreading. Thanks to Dan York, Suzanne Woolf and Frank Denis for good written contributions.</t>
</section>

</middle>

<back>

<references title='Normative References'>
&rfc1034;
&rfc1035;
&rfc2119;
&rfc6973;
</references>

<references title='Informative References'>
&rfc2181;
&rfc4033;
&rfc5246;
&rfc5936;
&rfc6347;
&I-D.koch-perpass-dns-confidentiality;
&I-D.vandergaast-edns-client-subnet;
&I-D.bortzmeyer-dnsop-privacy-sol;
&I-D.wijngaards-dnsop-confidentialdns;

<reference anchor="dnsop">
<front>
<title>The dnsop mailing list</title>
<author fullname="IETF" surname="IETF"/>
<date month="October" year="2013"/>
<abstract>
<t>DNSOP is the IETF Working Group tasked with all the DNS operations issues <eref target="http://www.ietf.org/mail-archive/web/dnsop/"/></t>
</abstract>
</front>
</reference>

<reference anchor="dagon-malware">
<front>
<title>Corrupted DNS Resolution Paths: The Rise of a Malicious
Resolution Authority</title>
<author surname="Dagon" initials="D." fullname="David Dagon"/>
<date year="2007"/>
<abstract>
<t>Presented at the DNS-OARC meeting in Atlanta. <eref target="https://www.dns-oarc.net/files/workshop-2007/Dagon-Resolution-corruption.pdf"/></t>
</abstract>
</front>
</reference>

<reference anchor="dns-footprint">
<front>
<title>DNS footprint of malware</title>
<author fullname="Ed Stoner" surname="Stoner" initials="E."/>
<date day="13" month="October" year="2010"/>
<abstract>
<t>Finding Malicious Activity Using Network Flow Data. Presented at the DNS-OARC meeting in Denver. <eref target="https://www.dns-oarc.net/files/workshop-201010/OARC-ers-20101012.pdf"/></t>
</abstract>
</front>
</reference>

<reference anchor="darkreading-dns">
<front>
<title>Got Malware? Three Signs Revealed In DNS Traffic</title>
<author fullname="Robert Lemos" surname="Lemos" initials="R."/>
<date month="May" year="2013" day="3"/>
<abstract>
<t>Monitoring your network's requests for domain lookups can reveal
network problems and potential malware infections. <eref target="http://www.darkreading.com/monitoring/got-malware-three-signs-revealed-in-dns/240154181"/></t>
</abstract>
</front>
</reference>

<reference anchor="dnschanger">
<front>
<title>DNSchanger</title>
<author fullname="Wikipedia" surname="Wikipedia"/>
<date month="November" year="2011"/>
<abstract>
<t><eref target="http://en.wikipedia.org/wiki/DNSChanger"/></t>
</abstract>
</front>
</reference>

<reference anchor="dnscrypt">
<front>
<title>DNSCrypt</title>
<author fullname="Frank Denis" surname="Denis" initials="F."/>
<date/>
<abstract>
<t>A tool for securing communications between a client [stub resolver]
and a DNS resolver. <eref target="http://dnscrypt.org/"/></t>
</abstract>
</front>
</reference>

<reference anchor="dnscurve">
<front>
<title>DNScurve</title>
<author fullname="Dan Bernstein" surname="Bernstein" initials="D."/>
<date/>
<abstract>
<t>DNSCurve uses high-speed high-security elliptic-curve cryptography
and claims 
to "drastically improve every dimension of DNS security". <eref target="http://dnscurve.org/"/><!-- The text in
http://dnscurve.org/espionage.html mentions only wire sniffing and
copmpletey forgets PRISM-style attacks. --></t></abstract>
</front>
</reference>

<reference anchor="packetq">
<front>
<title>PacketQ, a simple tool to make SQL-queries against PCAP-files</title>
<author fullname="Dot SE"/>
<date year="2011"/>
<abstract><t>A tool that provides a basic SQL-frontend to
PCAP-files. Outputs JSON, CSV and XML and includes a build-in
webserver with JSON-api and a nice looking AJAX GUI <eref target="https://github.com/dotse/packetq/wiki"/></t></abstract>
</front>
</reference>

<reference anchor="dnsmezzo">
<front>
<title>DNSmezzo</title>
<author fullname="Stéphane Bortzmeyer" surname="Bortzmeyer" initials="S."/>
<date year="2009"/>
<abstract><t>DNSmezzo is a framework for the capture and analysis of DNS packets. It allows the manager of a DNS name server to get information such as the top N domains requests, the percentage of IPv6 queries, the most talkative clients, etc. It is part of the broader program DNSwitness. <eref target="http://www.dnsmezzo.net/"/></t></abstract>
</front>
</reference>

<reference anchor="prism">
<front>
<title>PRISM</title>
<author fullname="National Security Agency" surname="NSA"/>
<date year="2007"/>
<abstract>
<t><eref target="http://en.wikipedia.org/wiki/PRISM_%28surveillance_program%29"/></t>
</abstract>
</front>
</reference>

<reference anchor="crime">
<front>
<title>The CRIME attack against TLS</title>
<author surname="Rizzo" initials="J." fullname="Juliano Rizzo"/>
<author surname="Duong" initials="T." fullname="Thai Duong"/>
<date year="2012"/>
<abstract>
<t><eref target="http://en.wikipedia.org/wiki/CRIME_(security_exploit)"/></t>
</abstract>
</front>
</reference>

<reference anchor="ditl">
<front>
<title>A Day in the Life of the Internet (DITL)</title>
<author fullname="CAIDA"/>
<date year="2002"/>
<abstract>
<t>CAIDA, ISC, and DNS-OARC work with many partnering root nameserver operators
and other organizations to coordinate and conduct large-scale,
simultaneous traffic data collection events with the goal of capturing
datasets of strategic interest to researchers. Over the last several
years, we have come to refer to this project and related activities as
"A Day in the Life of the Internet" (DITL). <eref target="http://www.caida.org/projects/ditl/"/></t>
</abstract>
</front>
</reference>

<reference anchor="data-protection-directive">
<front>
<title>European directive 95/46/EC on the protection of individuals
with regard to the processing of personal data and on the free
movement of such data</title>
<author fullname="European Parliament"/>
<date day="23" month="November" year="1995"/>
<abstract>
<t><eref target="http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31995L0046:EN:HTML"/></t>
</abstract>
</front>
</reference>

<reference anchor="passive-dns">
<front>
<title>Passive DNS Replication</title>
<author fullname="Florian Weimer" initials="F." surname="Weimer"/>
<date month="April"  year="2005"/>
<abstract>
<t>FIRST 17 <eref target="http://www.enyo.de/fw/software/dnslogger/#2"/></t>
</abstract>
</front>
</reference>

<reference anchor="tor-leak">
<front>
<title>DNS leaks in Tor</title>
<author fullname="Tor Project"/>
<date year="2013"/>
<abstract><t><eref target="https://trac.torproject.org/projects/tor/wiki/doc/TorFAQ#IkeepseeingthesewarningsaboutSOCKSandDNSandinformationleaks.ShouldIworry"/></t></abstract>
</front>
</reference>

</references>

</back>

</rfc>



