dnsop                                                         D. Migault
Internet-Draft                                                  Ericsson
Intended status: Informational                         November 04, 2019
Expires: May 7, 2020


                  A privacy analysis on DoH deployment
                draft-mglt-abcd-doh-privacy-analysis-00

Abstract

   This document provides an analysis of the impact of DoH on privacy.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 7, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Requirements Notation
   2.  Introduction
   3.  DNS traffic and privacy
   4.  Privacy impact of DoH
     4.1.  DNS system policies: loss of control versus independence
   5.  Privacy impact related to the choice of the DNS resolver
   6.  Privacy impact of concentration
     6.1.  Acknowledgment
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Author's Address

1.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Introduction

   DNS Queries over HTTPS (DoH) [RFC8484] differs from traditional DNS
   [RFC1035] in that DNS exchanges between the DNS client and the
   resolver are encrypted and that DNS traffic is no longer signaled as
   DNS traffic (with port 53) but instead uses the HTTPS port (443).

   Such an approach could enhance the end user's privacy by preventing
   any on-path party from inferring DNS-related information from the
   observed traffic.  However, such an enhancement may also have
   adverse effects, such as the loss of control of the DNS traffic by
   the end user.

   This draft aims to provide an analysis of the impact of the
   deployment of DoH on the current Internet.

   Section 3 details the privacy-sensitive information carried by the
   DNS traffic and evaluates to what extent this information is
   specific to DNS or could be inferred from other traffic, such as web
   traffic, depending on Internet concentration.

   Section 4 exposes the privacy implications of possible usages of DoH,
   and more precisely the ability to circumvent or enforce the end
   user's policies.

   While encrypting the DNS traffic enables the selection of a DNS
   resolver, Section 5 exposes the privacy implications associated with
   the selection of a resolver and shows that choosing a resolver
   outside the boundaries of an ISP in fact provides limited protection
   against that ISP.

   Finally, Section 6 shows that despite the advantages that
   concentration could provide by obfuscating the IP address, the
   overall picture is that concentration represents a threat to the end
   user's privacy.

3.  DNS traffic and privacy

   DNS data are public data available to everyone.  As a result, the
   value associated with a DNS exchange is mostly carried by the DNS
   request, which answers "What is this specific end user interested
   in?" or "Which end users contact this site?", rather than by the DNS
   information provided by the response.  Such information is obtained
   by associating the IP address of the end user (in the IP header)
   with the DNS query field.  There are good and bad reasons for
   monitoring the sites as well as the end users that connect to them.
   Typically, a network administrator may prevent the end user from
   connecting to malicious web sites, and may also monitor the sites
   the end user is connecting to.  However, how this information is
   used is beyond the control of any protocol.

   In most cases, DNS exchanges are followed by a web connection.  If
   the web session is not encrypted, observation of the web traffic
   provides the same information as that carried in the DNS traffic.
   This information would even be richer and more accurate, as web
   traffic really reflects the web sites the end user is accessing.
   However, the amount of web traffic is huge compared to the DNS
   traffic, and the DNS traffic was clearly distinct from the HTTP
   traffic, using a different port from HTTP(S) and a distinct
   termination point (the DNS resolver).

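   To illustrate why unencrypted DNS traffic is so easy to exploit, the
   following minimal sketch encodes a query in the wire format of
   [RFC1035] and then recovers the queried name the way an on-path
   observer could.  The function names are illustrative only and are
   not part of any specification.

```python
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0) -> bytes:
    """Encode a minimal DNS query in RFC 1035 wire format."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, then three zero counters.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A by default) followed by QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def observed_name(message: bytes) -> str:
    """Recover the queried name from a cleartext DNS query."""
    labels = []
    offset = 12  # the question section follows the 12-byte header
    while message[offset] != 0:
        length = message[offset]
        start = offset + 1
        labels.append(message[start:start + length].decode("ascii"))
        offset = start + length
    return ".".join(labels)


query = build_query("www.example.com")
print(observed_name(query))  # prints "www.example.com"
```

   No key or session state is needed to read the name: the query field
   sits at a fixed position in a small datagram sent to a well-known
   port.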
   With an increasing share of web traffic being encrypted, analysis of
   the HTTP traffic is no longer possible, as it is protected by TLS.
   However, HTTPS traffic still reveals the destination IP address and
   the domain name within a TLS field designated as SNI.  As mentioned
   earlier, analysis of the HTTPS traffic remains a challenge in itself
   due to the volumes involved.  Moreover, the encryption of the SNI, as
   well as the fact that one IP address provided by a cloud provider can
   be shared by multiple web sites, clearly limits the meaning of the
   information provided by an analysis of the HTTPS traffic.  As a
   result, in addition to being more convenient to analyse, the
   information revealed by the DNS traffic may not be inferable from
   other traffic.

   As a result, the information carried by the DNS traffic has the
   following characteristics:

   o  The DNS traffic is a good representation of the web traffic of one
      end user.

   o  When not carried over HTTP, the DNS traffic is by construction
      logically separate from the web traffic.

   o  The DNS traffic is terminated at one point, while web traffic is
      generally terminated at multiple destinations.

   The privacy-sensitive information carried by the DNS traffic is the
   IP addresses that "identify" the end user and the content of the DNS
   query, which reflects the activity of the end user.  When the DNS
   traffic is not encrypted, this information is limited to the
   administrative domain the DNS traffic is steered to.  When the DNS
   traffic is encrypted, this information is limited to the two end
   points, that is, the end user and the DNS resolver.

   The same activity can be inferred from the encrypted web traffic
   unless ESNI is used together with a high concentration of web sites
   behind a limited number of IP addresses.
   In that sense, web site concentration and ESNI add boundaries to the
   information associated with the DNS traffic, which could enhance
   privacy against on-path monitoring.  However, concentration of the
   web traffic transfers the information from the Internet providers to
   large cloud providers.  Section 6 further details how concentration
   represents a direct threat to privacy.

   As a result, the privacy-sensitive information carried by the DNS
   traffic is shared between the DNS client and the DNS resolver.
   Similar information is provided by the web traffic, which is shared
   between the HTTP client, the Internet service provider and major
   cloud providers.  The balance between these two depends on the level
   of concentration.

4.  Privacy impact of DoH

   The use of DoH to perform DNS exchanges has the following impacts on
   the DNS traffic:

   o  DNS traffic is encrypted

   o  DNS traffic is no different from the encrypted web traffic

   As mentioned in Section 3, since DNS traffic is encrypted, the
   privacy-sensitive information of the DNS is exchanged only between
   the DNS client and the DNS resolver.  As per the Internet threat
   model of [RFC3552], it is expected that "the end-systems engaging in
   a protocol exchange have not themselves been compromised.
   Protecting against an attack when one of the end-systems has been
   compromised is extraordinarily difficult.".  The purpose of the
   protection is to protect against an attacker that may have complete
   control of the network.  With that threat model in mind, encryption
   protects the DNS data exchanged between the DNS client and the DNS
   resolver and as such improves the end user's privacy.  In
   particular, it protects against pervasive monitoring attacks
   [RFC7258].

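   The two impacts listed above can be made concrete with a minimal
   sketch of the GET form of a DoH request as described in [RFC8484]:
   the DNS message is base64url-encoded without padding and carried in
   the "dns" parameter of an ordinary HTTPS URL.  The resolver URL
   below is hypothetical, and the helper names are illustrative only.

```python
import base64
import struct
from urllib.parse import urlsplit


def dns_query(name: str, qtype: int = 1) -> bytes:
    """Encode a DNS query in wire format, using ID 0 as RFC 8484 suggests."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(resolver_url: str, name: str) -> str:
    """Build the URL of a DoH GET request for the given name."""
    # base64url encoding with the trailing padding removed.
    raw = base64.urlsafe_b64encode(dns_query(name))
    return f"{resolver_url}?dns={raw.rstrip(b'=').decode('ascii')}"


# Hypothetical resolver URL, used only for illustration.
url = doh_get_url("https://dns.example/dns-query", "www.example.com")
print(url)

# An on-path party only sees a TLS connection to this host and port; the
# path and the "dns" parameter travel inside the encrypted channel.
parts = urlsplit(url)
print(parts.hostname, parts.port or 443)
```

   From the network, such a request looks like any other HTTPS request
   to port 443, which is what makes the DNS traffic both protected and
   indistinguishable from web traffic.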
   However, as mentioned in [RFC6973], a privacy analysis needs to
   question the assumption of [RFC3552] on end-systems "since systems
   are often compromised for the purpose of obtaining personal data".
   In addition, privacy also includes the ability of end users to
   control and protect their information.

   Until today, the ability to enforce policies for the DNS traffic has
   been provided by having the DNS client centralized in the system of
   the end user.  The configuration at the operating system level
   ensured that all applications were aligned with the end user's
   policy.  A policy typically includes the domains that need to be
   resolved, the interface to be used, the DNS resolver to contact, and
   so on.

   DoH changes this paradigm in that an application can circumvent the
   policy set by the end user without the end user being aware of it.
   Firstly, the encryption is performed by the application and as such
   does not provide any visibility to the operating system.  Secondly,
   the use of HTTPS makes DNS traffic indistinguishable from web
   traffic.  By comparison, DNS over TLS (DoT), which runs on a
   dedicated port, would signal to the system that some encrypted DNS
   traffic is being handled by the application.  The end user may
   accept or refuse such traffic depending on their policy.  DoH does
   not provide such a capability.

   Another way to see this issue is to consider that the communication
   between the DNS client and the DNS resolver is secured between the
   two application end-points.  Policy enforcement is performed on-path
   inside the end user's system, but encryption prevents it from being
   applied.

   In a nutshell, DoH encrypts DNS traffic and makes it undetectable.
   This provides the ability for an application to circumvent the
   policies defined by the system and can be seen as a loss of control.
   The alignment with the policies of the system then relies on
   explicit policies from the application and on trusting that the
   application enforces the policies it claims.

   The impact on privacy needs to balance the DNS policies provided by
   the system against those provided by the application, and more
   explicitly to determine which of these policies better protects the
   end user.

4.1.  DNS system policies: loss of control versus independence

   The DNS system policies may or may not reflect the end user's
   preferences.  However, they are part of the configuration parameters
   of the system, and end users can at least be aware of the policies
   of their system.

   There are cases where the DNS policies in the system express the end
   user's policies.  This typically includes the choice of a specific
   DNS resolver or the subscription to parental control.  For such end
   users, the ability of an application to circumvent the policies of
   the system represents a threat to their ability to control their DNS
   traffic.

   Similarly, there are cases where the DNS policies are not explicitly
   specified by the end user, but the end user has agreed to these
   policies.  This typically includes corporate users who have agreed
   to comply with corporate policies, under which some web sites
   potentially cannot be accessed.  For these end users, the ability of
   an application to circumvent the DNS policies of the company exposes
   the end user to risks they may not want to take.

   For these two categories of users, the ability for each application
   to have specific DNS policies presents the following drawbacks:

   1)  Per-application control results in the DNS policies being
       defined in multiple places.  At the very least, this can confuse
       the end user, makes configuration error-prone and eventually
       makes debugging harder.

   2)  While some applications may have clear and explicit DNS
       policies that the end user could in principle check or configure
       against the policies they are enforcing, these policies are
       subject to change over time and without notice, typically
       during updates.  Constantly checking the policies is not
       something that can be relied on, so the end user or company may
       delay application updates, which adds a further risk to the end
       user's privacy.

   There are cases where the DNS policies are imposed on end users
   against their will and without their agreement.  A motivation for
   such policies could be to enforce surveillance of the end user.  In
   such situations, the ability of an application to circumvent the DNS
   policies improves the end user's privacy.  It is also safer for DNS
   policies to be enforced by the application, as in these situations
   the application is the system trusted by the end user.

   As a result, whether the ability of an application to enforce its
   own policies improves or reduces the end user's control of the DNS
   traffic depends on what the end user's trusted system is.  If the
   system trusted by the end user is the application, this ability
   clearly improves control; otherwise, it may represent a threat.  In
   the latter case, applications should follow the configuration of the
   system.

5.  Privacy impact related to the choice of the DNS resolver

   As mentioned in Section 3, DoH provides end-to-end encryption and as
   such provides the ability for the end user to choose a specific DNS
   resolver and share the DNS data only with that resolver.  One
   motivation to choose a specific DNS resolver is to move to a DNS
   resolver that considers the end user's privacy with more attention.
   This includes, among other things, not profiling end users, not
   selling users' information, or in some places not tracking specific
   end users.
   In that sense, the ability for an end user to choose a DNS resolver
   represents a major improvement.  When the DNS resolvers are not
   on-path, and the end user changes from one DNS resolver to the
   other, encryption does not provide additional protection.  In fact,
   encryption clearly aims at protecting against an attacker that would
   be on-path.

   On the other hand, as mentioned in Section 3, web traffic also leaks
   similar information, unless more advanced IP routing such as Tor is
   used.  Though gathering the information from the web traffic instead
   of the DNS traffic raises the bar, it represents a major improvement
   only if the bar remains sufficiently high.  Unfortunately, the bar
   is nonexistent for tracking a specific user, and remains low for
   generalizing the tracking to all users.  As such, the decision lies
   more on the network side, which weighs the value associated with the
   DNS traffic, or the legal requirements, against the cost of putting
   the necessary infrastructure in place.  It does not seem to be
   entirely in the end user's hands.  As a result, encryption provides
   limited protection against on-path parties such as an ISP.  Unless
   combined with Tor, moving to a DNS resolver that is not managed by
   the ISP does not hide much information from the ISP.

   The remainder of this section assumes that DNS information is not
   inferred from the web traffic and analyses how moving from a DNS
   resolver hosted in the ISP network to a DNS resolver outside the ISP
   network impacts the end user's privacy.  The perspective considers
   the information shared with the DNS resolver; motivations such as
   poor privacy protection, better latency, DNSSEC resolution, or
   parental filtering are out of scope of this analysis.

   Note that when one is connected to an ISP and the DNS resolver is
   provided by the ISP, it could be interpreted that encryption is not
   necessary.
   This is not the case, especially as wireless communication might be
   unprotected or allow man-in-the-middle attacks.  As such, we assume
   in this section that the channel between the end user and the DNS
   resolver is encrypted.

   In general, a DNS resolver can be seen as an anonymizer.  It
   receives a DNS request from a specific end user, resolves the
   request under the resolver's identity and sends the response back to
   the end user.  Local ISPs are believed to be fairly close to the end
   user, and as such the IP address of the resolver can be used as a
   fairly good approximation of the location of the end user without
   revealing information about the end user's identity.

   When the end user is using a DNS resolver that is not located in the
   ISP network, the end user is clearly providing this information to
   another entity that did not previously have it and that cannot infer
   it from observing the traffic.  As with the ISP, the level of
   information depends on what is already shared with that entity.  If
   the end user was not sharing any information with that entity, the
   end user may provide sufficient information to be profiled by an
   additional entity.  If that DNS resolver already has a significant
   amount of data on that user, the DNS data may remove the little
   privacy that remained, but has a much smaller impact.  For example,
   in a highly concentrated Internet with one cloud provider for all
   services, the end user's traffic would use one, or a few,
   destination IP addresses.  That cloud provider would have access to
   the whole history of the end user, while the ISP would have little
   information from the IP addresses or the encrypted SNI.  In this
   specific situation, the end user may choose that cloud provider for
   DNS resolution in order to minimize information leakage.  Such a
   scenario is currently believed to be purely hypothetical.

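   The sharing argument of this section can be summarised with a small
   model.  This is only a sketch under the assumption stated earlier in
   this section, namely that DNS information is not inferred from the
   web traffic; the party names and the function are hypothetical.

```python
def parties_seeing_queries(resolver_operator: str, encrypted: bool,
                           isp: str = "ISP") -> set:
    """Return the parties able to read the end user's DNS queries."""
    # The two end points of the DNS exchange always see the queries.
    parties = {"end user", resolver_operator}
    if not encrypted:
        # Cleartext DNS is readable on the access network.
        parties.add(isp)
    return parties


for operator in ("ISP", "public resolver"):
    for encrypted in (False, True):
        seen_by = sorted(parties_seeing_queries(operator, encrypted))
        print(operator, encrypted, seen_by)
```

   In this model, encrypting the channel to the ISP resolver does not
   change who reads the queries, while moving to an encrypted public
   resolver replaces the ISP with another entity instead of reducing
   the number of parties involved.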
   As a result, using a public resolver rather than a local resolver
   can be seen as sharing the web history of the end user.  The balance
   between partially sharing that history and completely transferring
   it depends on the level of concentration of the Internet and on the
   ability of the resolver not to share that information further.
   Using a public resolver also means that the user has to trust the
   public resolver to handle the information according to the user's
   wishes.

6.  Privacy impact of concentration

   Section 3 pointed out that concentration is one factor that could
   contribute to enforcing boundaries between the information carried
   by the DNS traffic and the information provided by observing the web
   traffic.  This section analyses how concentration impacts privacy.

   At first sight, the ability of concentration to 'hide' multiple web
   sites behind a single IP address has to be balanced against the fact
   that a significant amount of the end user's traffic is going to one
   place, or at least to a single actor.  In other words, concentration
   represents a direct threat to privacy, with all of the end user's
   data being provided to one party.  The cost of hiding the meaning of
   the IP address is too high, and even a higher trust in one cloud
   provider than in the ISP could hardly justify such an approach.

   Firstly, in order to hide the meaning of the destination IP address,
   mechanisms such as Tor should be used instead.  Secondly, trust may
   change over time, but data that has been provided can hardly be
   withdrawn.  As such, privacy should be designed in a way that does
   not depend on one or a few players.  Ideally, the data should be
   sufficiently spread among the various players so that none of them
   could exploit it.  This can only be enforced by a *large number of
   players*.
   Typically, as soon as the fraction of data shared with one player is
   sufficient to start being analysed, privacy is at risk.

   To balance that risk, it matters to reduce the amount of data shared
   as well as to minimize the level of information associated with that
   data.  Typically, suppose that a cloud provider offers a DNS
   resolution service and also hosts the web server www.example.com.
   Performing the DNS resolution for www.example.com through that cloud
   provider provides limited additional information, as the DNS
   resolution will be followed by a web connection to that same
   provider.

   Note that concentration here includes access to the data.  In other
   words, a cloud provider hosting various web servers without possible
   access to that data does not fall under the concept of concentration
   described in this section.

   While concentration represents a threat to privacy, the remainder of
   this section analyses the impact of a cloud provider providing both
   a DNS resolution service and a hosting service, and exposes how this
   could contribute to balkanizing the Internet, or more precisely to
   capturing end users in networks close to walled gardens.

   Typically, a cloud provider hosting a DNS resolver is likely to
   direct the end user's traffic to its own data center rather than to
   the data center of a competitor.  Note that such a choice may be
   appropriate given the location of the DNS resolver.  The problem
   arises when the end user would benefit from better connectivity by
   accessing the web site instantiated in the cloud of another cloud
   provider.  In this case, the choice made by the DNS resolver may be
   motivated by its own interest rather than the interest of the end
   user, thus capturing the end user.  Furthermore, the former
   optimization for the data center of the DNS resolver might itself
   lead to capturing the end user.
   Here, capturing means that the cloud provider keeps the end user, as
   much as it can, within its borders.  Such capture represents a major
   threat to privacy, as the end user is literally kept within one
   entity, regardless of the end user's wishes.

   The ability to capture an end user is problematic, as it might
   become a means to bring the end user into a jurisdiction different
   from the local jurisdiction.  This may represent a direct threat to
   the end user's private information, as some jurisdictions provide
   little protection regarding privacy.  The comparison of local
   jurisdictions versus other jurisdictions is not the topic of this
   document.  We do not ignore that certain jurisdictions represent a
   permanent threat to privacy.  However, those jurisdictions aside, it
   is also worth noting that the local jurisdiction is probably the one
   best understood by the end user, and that bringing the end user's
   data into another jurisdiction may go against the end user's
   beliefs.  Similarly, some aspects of jurisdictions may also reflect
   the choices of societies, such as the protection of their weakest
   members [IWF].

6.1.  Acknowledgment

   We would like to thank Bengt Sahlin, Christian Schaefer and Mirja
   Kuhlewind for their feedback.

7.  References

7.1.  Normative References

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
              November 1987.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8484]  Hoffman, P. and P. McManus, "DNS Queries over HTTPS
              (DoH)", RFC 8484, DOI 10.17487/RFC8484, October 2018.

7.2.  Informative References

   [IWF]      "DNS over HTTPS Why we're saying DoH could be
              catastrophic", n.d.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              DOI 10.17487/RFC3552, July 2003.

   [RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
              Morris, J., Hansen, M., and R. Smith, "Privacy
              Considerations for Internet Protocols", RFC 6973,
              DOI 10.17487/RFC6973, July 2013.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
              Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
              2014.

Author's Address

   Daniel Migault
   Ericsson
   8275 Trans Canada Route
   Saint Laurent, QC 4S 0B6
   Canada

   EMail: mglt.ietf@gmail.com