idnits 2.17.1 draft-ietf-dprive-bcp-op-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([RFC6841]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There are 9 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. == There are 4 instances of lines with non-RFC3849-compliant IPv6 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (August 8, 2018) is 2081 days in the past. Is this intentional? 
Checking references for intended status: Best Current Practice ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 1136 -- Looks like a reference, but probably isn't: '2' on line 1138 -- Looks like a reference, but probably isn't: '3' on line 1140 -- Looks like a reference, but probably isn't: '4' on line 1142 -- Looks like a reference, but probably isn't: '5' on line 1144 -- Looks like a reference, but probably isn't: '6' on line 1147 -- Looks like a reference, but probably isn't: '7' on line 1150 -- Looks like a reference, but probably isn't: '8' on line 1152 -- Looks like a reference, but probably isn't: '9' on line 1155 -- Looks like a reference, but probably isn't: '10' on line 1303 -- Looks like a reference, but probably isn't: '11' on line 1310 -- Looks like a reference, but probably isn't: '12' on line 1320 -- Looks like a reference, but probably isn't: '13' on line 1333 -- Looks like a reference, but probably isn't: '14' on line 1350 -- Looks like a reference, but probably isn't: '15' on line 1351 -- Looks like a reference, but probably isn't: '16' on line 1354 -- Looks like a reference, but probably isn't: '17' on line 1381 -- Looks like a reference, but probably isn't: '18' on line 1381 -- Looks like a reference, but probably isn't: '19' on line 1385 -- Looks like a reference, but probably isn't: '20' on line 1387 -- Looks like a reference, but probably isn't: '21' on line 1394 == Outdated reference: A later version (-14) exists of draft-ietf-dnsop-terminology-bis-11 == Outdated reference: A later version (-14) exists of draft-ietf-doh-dns-over-https-12 ** Downref: Normative reference to an Experimental draft: draft-ietf-dprive-padding-policy (ref. 
'I-D.ietf-dprive-padding-policy') ** Obsolete normative reference: RFC 5077 (Obsoleted by RFC 8446) ** Downref: Normative reference to an Informational RFC: RFC 6973 ** Obsolete normative reference: RFC 7525 (Obsoleted by RFC 9325) ** Obsolete normative reference: RFC 7816 (Obsoleted by RFC 9156) == Outdated reference: A later version (-02) exists of draft-bortzmeyer-dprive-rfc7626-bis-01 == Outdated reference: A later version (-10) exists of draft-ietf-dnsop-dns-capture-format-07 == Outdated reference: A later version (-15) exists of draft-ietf-dnsop-dns-tcp-requirements-02 == Outdated reference: A later version (-20) exists of draft-ietf-dnsop-session-signal-14 -- Obsolete informational reference (is this intentional?): RFC 7706 (Obsoleted by RFC 8806) Summary: 6 errors (**), 0 flaws (~~), 9 warnings (==), 23 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 dprive S. Dickinson 3 Internet-Draft Sinodun IT 4 Intended status: Best Current Practice B. Overeinder 5 Expires: February 9, 2019 NLnet Labs 6 R. van Rijswijk-Deij 7 SURFnet bv 8 A. Mankin 9 Salesforce 10 August 8, 2018 12 Recommendations for DNS Privacy Service Operators 13 draft-ietf-dprive-bcp-op-00 15 Abstract 17 This document presents operational, policy and security 18 considerations for DNS operators who choose to offer DNS Privacy 19 services. With the recommendations, the operator can make deliberate 20 decisions which services to provide, and how the decisions and 21 alternatives impact the privacy of users. 23 This document also presents a framework to assist writers of DNS 24 Privacy Policy and Practices Statements (analogous to DNS Security 25 Extensions (DNSSEC) Policies and DNSSEC Practice Statements described 26 in [RFC6841]). 28 Status of This Memo 30 This Internet-Draft is submitted in full conformance with the 31 provisions of BCP 78 and BCP 79. 
33 Internet-Drafts are working documents of the Internet Engineering 34 Task Force (IETF). Note that other groups may also distribute 35 working documents as Internet-Drafts. The list of current Internet- 36 Drafts is at http://datatracker.ietf.org/drafts/current/. 38 Internet-Drafts are draft documents valid for a maximum of six months 39 and may be updated, replaced, or obsoleted by other documents at any 40 time. It is inappropriate to use Internet-Drafts as reference 41 material or to cite them other than as "work in progress." 43 This Internet-Draft will expire on February 9, 2019. 45 Copyright Notice 47 Copyright (c) 2018 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents 52 (http://trustee.ietf.org/license-info) in effect on the date of 53 publication of this document. Please review these documents 54 carefully, as they describe your rights and restrictions with respect 55 to this document. Code Components extracted from this document must 56 include Simplified BSD License text as described in Section 4.e of 57 the Trust Legal Provisions and are provided without warranty as 58 described in the Simplified BSD License.
60 Table of Contents
62 1. Introduction
63 2. Scope
64 3. Privacy related documents
65 4. Terminology
66 5. Recommendations for DNS privacy services
67 5.1. On the wire between client and server
68 5.1.1. Transport recommendations
69 5.1.2. Authentication of DNS privacy services
70 5.1.3. Protocol recommendations
71 5.1.4. Availability
72 5.1.5. Service options
73 5.1.6. Limitations of using a pure TLS proxy
74 5.2. Data at rest on the server
75 5.2.1. Data handling
76 5.2.2. Data minimization of network traffic
77 5.2.3. IP address pseudonymization and anonymization methods
78 5.2.4. Pseudonymization, anonymization or discarding of
79 other correlation data
80 5.2.5. Cache snooping
81 5.3. Data sent onwards from the server
82 5.3.1. Protocol recommendations
83 5.3.2. Client query obfuscation
84 5.3.3. Data sharing
85 6. DNS privacy policy and practice statement
86 6.1. Recommended contents of a DPPPS
87 6.2. Current policy and privacy statements
88 6.2.1. Quad9
89 6.2.2. Cloudflare
90 6.2.3. Google
91 6.2.4. OpenDNS
92 6.2.5. Comparison
94 6.3. Enforcement/accountability
95 7. IANA considerations
96 8. Security considerations
97 9. Acknowledgements
98 10. Contributors
99 11. Changelog
100 12. References
101 12.1. Normative References
102 12.2. Informative References
103 12.3. URIs
104 Appendix A. Documents
105 A.1. Potential increases in DNS privacy
106 A.2. Potential decreases in DNS privacy
107 A.3. Related operational documents
108 Appendix B. IP address techniques
109 B.1. Google Analytics non-prefix filtering
110 B.2. dnswasher
111 B.3. Prefix-preserving map
112 B.4. Cryptographic Prefix-Preserving Pseudonymisation
113 B.5. Top-hash Subtree-replicated Anonymisation
114 B.6. ipcipher
115 B.7. Bloom filters
116 Authors' Addresses
118 1. Introduction 120 [NOTE: This document is submitted to the IETF for initial review and 121 for feedback on the best forum for future versions of this document. 122 Initial considerations for DoH [I-D.ietf-doh-dns-over-https] are 123 included here in anticipation of that draft progressing to be an RFC 124 but further analysis is required.] 126 The Domain Name System (DNS) is at the core of the Internet; almost 127 every activity on the Internet starts with a DNS query (and often 128 several). However the DNS was not originally designed with strong 129 security or privacy mechanisms. A number of developments have taken 130 place in recent years which aim to increase the privacy of the DNS 131 system and these are now seeing some deployment. This latest 132 evolution of the DNS presents new challenges to operators and this 133 document attempts to provide an overview of considerations for 134 privacy focussed DNS services.
136 In recent years there has also been an increase in the availability 137 of "open resolvers" [I-D.ietf-dnsop-terminology-bis] which users may 138 prefer to use instead of the default network resolver because they 139 offer a specific feature (e.g. good reachability, encrypted 140 transport, strong privacy policy, filtering (or lack of), etc.). 141 These open resolvers have tended to be at the forefront of adoption 142 of privacy related enhancements but it is anticipated that operators 143 of other resolver services will follow. 145 Whilst protocols that encrypt DNS messages on the wire provide 146 protection against certain attacks, the resolver operator still has 147 (in principle) full visibility of the query data and transport 148 identifiers for each user. Therefore, a trust relationship exists. 149 The ability of the operator to provide a transparent, well 150 documented, and secure privacy service will likely serve as a major 151 differentiating factor for privacy conscious users if they make an 152 active selection of which resolver to use. 154 It should also be noted that the choice of a user to configure a 155 single resolver (or a fixed set of resolvers) and an encrypted 156 transport to use in all network environments has both advantages and 157 disadvantages. For example, the user has a clear expectation of which 158 resolvers have visibility of their query data; however, this resolver/ 159 transport selection may provide an added mechanism to track them as 160 they move across network environments. Commitments from operators to 161 minimize such tracking are also likely to play a role in users' 162 selection of resolver. 164 More recently, the global legislative landscape with regard to 165 personal data collection, retention, and pseudonymization has seen 166 significant activity with differing requirements active in different 167 jurisdictions. 
For example, the user of a service and the service 168 itself may be in jurisdictions with conflicting legislation. It is 169 untested whether simply using a DNS resolution service 170 constitutes consent from the user for the operator to process their 171 query data. The impact of recent legislative changes on data 172 pertaining to the users of both Internet Service Providers and DNS 173 open resolvers is not fully understood at the time of writing. 175 This document has two main goals: 177 o To provide operational and policy guidance related to DNS over 178 encrypted transports and to outline recommendations for data 179 handling for operators of DNS privacy services. 181 o To introduce the DNS Privacy Policy and Practice Statement (DPPPS) 182 and present a framework to assist writers of such a statement. A 183 DPPPS is a document that an operator can publish outlining their 184 operational practices and commitments with regard to privacy, 185 thereby providing a means for clients to evaluate the privacy 186 properties of a given DNS privacy service. In particular, the 187 framework identifies the elements that should be considered in 188 formulating a DPPPS. This document does not, however, define a 189 particular Policy or Practice Statement, nor does it seek to 190 provide legal advice or recommendations as to the contents. 192 Community insight [or judgment?] about operational practices can 193 change quickly, and experience shows that a Best Current Practice 194 (BCP) document about privacy and security is a point-in-time 195 statement. Readers are advised to seek out any errata or updates 196 that apply to this document. 198 2. Scope 200 "DNS Privacy Considerations" [I-D.bortzmeyer-dprive-rfc7626-bis] 201 describes the general privacy issues and threats associated with the 202 use of the DNS by Internet users and much of the threat analysis here 203 is lifted from that document and from [RFC6973]. 
However this 204 document is limited in scope to best practice considerations for the 205 provision of DNS privacy services by servers (recursive resolvers) to 206 clients (stub resolvers or forwarders). Privacy considerations 207 specifically from the perspective of an end user, or those for 208 operators of authoritative nameservers are out of scope. 210 This document includes (but is not limited to) considerations in the 211 following areas (taken from [I-D.bortzmeyer-dprive-rfc7626-bis]): 213 1. Data "on the wire" between a client and a server 215 2. Data "at rest" on a server (e.g. in logs) 217 3. Data "sent onwards" from the server (either on the wire or shared 218 with a third party) 220 Whilst the issues raised here are targeted at those operators who 221 choose to offer a DNS privacy service, considerations for areas 2 and 222 3 could equally apply to operators who only offer DNS over 223 unencrypted transports but who would like to align with privacy best 224 practice. 226 3. Privacy related documents 228 There are various documents that describe protocol changes that have 229 the potential to either increase or decrease the privacy of the DNS. 230 Note this does not imply that some documents are good or bad, better 231 or worse, just that (for example) some features may bring functional 232 benefits at the price of a reduction in privacy and conversely some 233 features increase privacy with an accompanying increase in 234 complexity. A selection of the most relevant documents are listed in 235 Appendix A for reference. 237 4. Terminology 239 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 240 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 241 "OPTIONAL" in this document are to be interpreted as described in BCP 242 14 [RFC2119] [RFC8174] when, and only when, they appear in all 243 capitals, as shown here. 245 Privacy terminology is as described in Section 3 of [RFC6973]. 
247 DNS terminology is as described in [I-D.ietf-dnsop-terminology-bis] 248 with one modification: we use the definition of Privacy-enabling DNS 249 server taken from [RFC8310]: 251 o Privacy-enabling DNS server: A DNS server (most likely a full- 252 service resolver) that implements DNS-over-TLS [RFC7858], and may 253 optionally implement DNS-over-DTLS [RFC8094]. The server should 254 also offer at least one of the credentials described in Section 8 of [RFC8310] 255 and implement the (D)TLS profile described in Section 9 of [RFC8310]. 257 TODO: Update the definition of Privacy-enabling DNS server in 258 [I-D.ietf-dnsop-terminology-bis] to be complete and also include DoH, 259 then reference that here. 261 o DPPPS: DNS Privacy Policy and Practice Statement, see Section 6. 263 o DNS privacy service: The service that is offered via a privacy- 264 enabling DNS server and is documented either in an informal 265 statement of policy and practice with regard to users' privacy or a 266 formal DPPPS. 268 5. Recommendations for DNS privacy services 270 We describe three classes of actions that operators of DNS privacy 271 services can take: 273 o Threat mitigation for well understood and documented privacy 274 threats to the users of the service and in some cases to the 275 operators of the service. 
277 o Optimization of privacy services from an operational or management 278 perspective 280 o Additional options that could further enhance the privacy and 281 usability of the service 283 This document does not specify policy, only best practice. However, for 284 DNS Privacy services to be considered compliant with these best 285 practice guidelines they SHOULD implement (where appropriate) all: 287 o Threat mitigations to be minimally compliant 289 o Optimizations to be moderately compliant 291 o Additional options to be maximally compliant 293 TODO: Some of the threats listed in the following sections are taken 294 directly from Section 5 of RFC6973, some are just standalone 295 descriptions, we need to go through all of them and see if we can use 296 the RFC6973 threats where possible and make them consistent. 298 5.1. On the wire between client and server 300 In this section we consider both data on the wire and the service 301 provided to the client. 303 5.1.1. Transport recommendations 305 Threats: 307 o Surveillance: Passive surveillance of traffic on the wire 309 o Intrusion: Active injection of spurious data or traffic 311 Mitigations: 313 A DNS privacy service can mitigate these threats by providing service 314 over one or more of the following transports: 316 o DNS-over-TLS [RFC7858] 318 o DoH [I-D.ietf-doh-dns-over-https] 320 Additional options: 322 o A DNS privacy service can also be provided over DNS-over-DTLS 323 [RFC8094]; however, note that this is an Experimental 324 specification. 326 It is noted that a DNS privacy service might be provided over IPsec, 327 DNSCrypt or VPNs. However, use of these transports for DNS is not 328 standardized and any discussion of best practice for providing such 329 service is out of scope for this document. 331 5.1.2. 
Authentication of DNS privacy services 333 Threats: 335 o Surveillance and Intrusion: Active attacks that can redirect 336 traffic to rogue servers 338 Mitigations: 340 DNS privacy services should ensure clients can authenticate the 341 server. Note that this, in effect, commits the DNS privacy service 342 to a public identity users will trust. 344 When using DNS-over-TLS, clients that select a 'Strict Privacy' usage 345 profile [RFC8310] (to mitigate the threat of active attack on the 346 client) require the ability to authenticate the DNS server. To 347 enable this, DNS privacy services that offer DNS-over-TLS should 348 provide credentials in the form of either X.509 certificates, SPKI 349 pinsets or TLSA records. 351 When offering DoH [I-D.ietf-doh-dns-over-https], HTTPS requires 352 authentication of the server as part of the protocol. 354 Optimizations: 356 DNS privacy services can also consider the following capabilities/ 357 options: 359 o As recommended in [RFC8310], providing DANE TLSA records for the 360 nameserver 362 * In particular, the service could provide TLSA records such that 363 authenticating solely via the PKIX infrastructure can be 364 avoided. 366 o Implementing [I-D.ietf-tls-dnssec-chain-extension] 368 * This can decrease the latency of connection setup to the server 369 and remove the need for the client to perform meta-queries to 370 obtain and validate the DANE records. 372 5.1.2.1. Certificate management 374 Anecdotal evidence to date highlights the management of certificates 375 as one of the more challenging aspects for operators of traditional 376 DNS resolvers that choose to additionally provide a DNS privacy 377 service, as management of such credentials is new to those DNS 378 operators. 380 It is noted that SPKI pinset management is described in [RFC7858] but 381 that key pinning mechanisms in general have fallen out of favour 382 operationally for various reasons. 
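As an illustration of what an SPKI pin is, the following minimal sketch (Python, standard library only) computes a base64-encoded SHA-256 digest over DER-encoded SubjectPublicKeyInfo bytes and checks it against a configured pinset. It assumes the SHA-256/base64 fingerprint format commonly used for such pins; the function names spki_pin and pin_matches are hypothetical, and extracting the SubjectPublicKeyInfo bytes from a certificate is left to the operator's own TLS tooling.

```python
import base64
import hashlib
import hmac
from typing import Iterable


def spki_pin(spki_der: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of DER-encoded SPKI bytes."""
    digest = hashlib.sha256(spki_der).digest()
    return base64.b64encode(digest).decode("ascii")


def pin_matches(spki_der: bytes, pinset: Iterable[str]) -> bool:
    """Return True if the presented key's pin appears in the pinset."""
    presented = spki_pin(spki_der)
    # compare_digest avoids leaking how much of a pin matched via timing.
    return any(hmac.compare_digest(presented, pin) for pin in pinset)


if __name__ == "__main__":
    # Placeholder bytes stand in for a real SubjectPublicKeyInfo structure.
    current_key = b"placeholder-current-spki"
    backup_key = b"placeholder-backup-spki"
    published = [spki_pin(current_key), spki_pin(backup_key)]
    print(pin_matches(backup_key, published))
```

Publishing a pin for a backup key alongside the current one, as in the usage lines above, is one way an operator might roll keys without locking out clients that pin; whether that is appropriate depends on the service.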
384 Threats: 386 o Invalid certificates, resulting in an unavailable service. 388 o Mis-identification of a server by a client, e.g. typos in URLs or 389 authentication domain names 391 Mitigations: 393 It is recommended that operators: 395 o Choose a short, memorable authentication name for their service 397 o Automate the generation and publication of certificates 399 o Monitor certificates to prevent accidental expiration of 400 certificates 402 TODO: Could we provide references for certificate management best 403 practice, for example Section 6.5 of RFC7525? 405 5.1.3. Protocol recommendations 407 5.1.3.1. DNS-over-TLS 409 Threats: 411 o Known attacks on TLS (TODO: add a reference) 413 o Traffic analysis (TODO: add a reference) 415 o Potential for client tracking via transport identifiers 417 o Blocking of well-known ports (e.g. 853 for DNS-over-TLS) 419 Mitigations: 421 In the case of DNS-over-TLS, TLS profiles from Section 9 and the 422 Countermeasures to DNS Traffic Analysis from Section 11.1 of 423 [RFC8310] provide strong mitigations. These include but are not 424 limited to: 426 o Adhering to [RFC7525] 427 o Implementing only (D)TLS 1.2 or later as specified in [RFC8310] 429 o Implementing EDNS(0) Padding [RFC7830] using the guidelines in 430 [I-D.ietf-dprive-padding-policy] 432 o Clients should not be required to use TLS session resumption 433 [RFC5077] or Domain Name System (DNS) Cookies [RFC7873]. 435 o Offering a DNS-over-TLS privacy service on both port 853 and 443. We note 436 that this practice may require revision when DoH becomes more 437 widely deployed, because of the potential use of the same ports 438 for two incompatible types of service. 440 Optimizations: 442 o Concurrent processing of pipelined queries, returning responses as 443 soon as available, potentially out of order as specified in 444 [RFC7766]. This is often called 'OOOR' - out-of-order responses. 
445 (Providing processing performance similar to HTTP multiplexing) 447 o Management of TLS connections to optimize performance for clients 448 using either 450 * [RFC7766] and EDNS(0) Keepalive [RFC7828] and/or 452 * DNS Stateful Operations [I-D.ietf-dnsop-session-signal] 454 Additional options that providers may consider: 456 o Offer a .onion [RFC7686] service endpoint 458 5.1.3.2. DoH 460 TODO: Fill this in, a lot of overlap with DNS-over-TLS but we need to 461 address DoH specific ones if possible. 463 Mitigations: 465 o Clients should not be required to use HTTP Cookies [RFC6265]. 467 o Clients should not be required to include any headers beyond the 468 absolute minimum to obtain service from a DoH server. 470 5.1.4. Availability 472 Threats: 474 o A failed DNS privacy service could force the user to switch 475 providers, fall back to cleartext or accept no DNS service for the 476 outage. 478 Mitigations: 480 A DNS privacy service must be engineered for high availability. 481 Particular care should be taken to protect DNS privacy services 482 against denial-of-service attacks, as experience has shown that 483 unavailability of DNS resolution because of attacks is a significant 484 motivation for users to switch services. 486 TODO: Add reference to ongoing research on this topic. 488 5.1.5. Service options 490 Threats: 492 o Unfairly disadvantaging users of the privacy service with respect 493 to the services available. This could force the user to switch 494 providers, fall back to cleartext or accept no DNS service for the 495 outage. 497 Mitigations: 499 A DNS privacy service should deliver the same level of service as is 500 offered on unencrypted channels in terms of such options as 501 filtering (or lack of), DNSSEC validation, etc. 503 5.1.6. Limitations of using a pure TLS proxy 505 Optimization: 507 Some operators may choose to implement DNS-over-TLS using a TLS proxy 508 (e.g. 
nginx [1], haproxy [2] or stunnel [3]) in front of a DNS 509 nameserver because of proven robustness and capacity when handling 510 large numbers of client connections, load balancing capabilities and 511 good tooling. Currently, however, because such proxies typically 512 have no specific handling of DNS as a protocol over TLS or DTLS, using 513 them can restrict traffic management at the proxy layer and at the 514 DNS server. For example, all traffic received by a nameserver behind 515 such a proxy will appear to originate from the proxy and DNS 516 techniques such as ACLs, RRL or DNS64 will be hard or impossible to 517 implement in the nameserver. 519 Operators may choose to use a DNS-aware proxy such as dnsdist. 521 5.2. Data at rest on the server 523 5.2.1. Data handling 525 Threats: 527 o Surveillance 529 o Stored data compromise 531 o Correlation 533 o Identification 535 o Secondary use 537 o Disclosure 539 o Contravention of legal requirements not to process user data? 541 Mitigations: 543 The following are common activities for DNS service operators and in 544 all cases should be minimized or completely avoided if possible for 545 DNS privacy services. If data is retained it should be encrypted and 546 either aggregated, pseudonymized or anonymized whenever possible. In 547 general the principle of data minimization described in [RFC6973] 548 should be applied. 550 o Transient data (e.g. data that is used for real-time monitoring and 551 threat analysis, which might be held only in memory) should be 552 retained for the shortest possible period deemed operationally 553 feasible. 555 o The retention period of DNS traffic logs should be only that 556 required to sustain operation of the service and, to the extent 557 that such exists, meet regulatory requirements. 559 o DNS privacy services should not track users except for the 560 particular purpose of detecting and remedying technically 561 malicious (e.g. DoS) or anomalous use of the service. 
563 o Data access should be minimized to only those personnel who require 564 access to perform operational duties. 566 5.2.2. Data minimization of network traffic 568 Data minimization refers to collecting, using, disclosing, and 569 storing the minimal data necessary to perform a task, and this can be 570 achieved by removing or obfuscating privacy-sensitive information in 571 network traffic logs. This is typically personal data, or data that 572 can be used to link a record to an individual, but may also include 573 other confidential information, for example on the 574 structure of an internal corporate network. 576 The problem of effectively ensuring that DNS traffic logs contain no 577 or minimal privacy-sensitive information is not one that currently 578 has a generally agreed solution or any Standards to inform this 579 discussion. This section presents an overview of current techniques 580 simply to provide a reference on the current status of this work. 582 Research into data minimization techniques (and particularly IP 583 address pseudonymization/anonymization) was sparked in the late 584 1990s/early 2000s, partly driven by the desire to share significant 585 corpuses of traffic captures for research purposes. Several 586 techniques reflecting different requirements in this area and 587 different performance/resource tradeoffs emerged over the course of 588 the decade. Developments over the last decade have been both a 589 blessing and a curse; the large increase in size between an IPv4 and 590 an IPv6 address, for example, renders some techniques impractical, 591 but also makes available a much larger amount of input entropy, the 592 better to resist brute force re-identification attacks that have 593 grown in practicality over the period. 595 Techniques employed may be broadly categorized as either 596 anonymization or pseudonymization. 
The following discussion uses the 597 definitions from [RFC6973] Section 3, with additional observations 598 from van Dijkhuizen et al. [4]. 600 o Anonymization. To enable anonymity of an individual, there must 601 exist a set of individuals that appear to have the same 602 attribute(s) as the individual. To the attacker or the observer, 603 these individuals must appear indistinguishable from each other. 605 o Pseudonymization. The true identity is deterministically replaced 606 with an alternate identity (a pseudonym). When the 607 pseudonymization schema is known, the process can be reversed, so 608 the original identity becomes known again. 610 In practice there is a fine line between the two; for example, how to 611 categorize a deterministic algorithm for data minimization of IP 612 addresses that produces a group of pseudonyms for a single given 613 address. 615 5.2.3. IP address pseudonymization and anonymization methods 617 As [I-D.bortzmeyer-dprive-rfc7626-bis] makes clear, the big privacy 618 risk in DNS is connecting DNS queries to an individual and the major 619 vector for this in DNS traffic is the client IP address. 621 There is active discussion in the space of effective pseudonymization 622 of IP addresses in DNS traffic logs; however, there seems to be no 623 single solution that is widely recognized as suitable for all or most 624 use cases. There are also as yet no standards for this that are 625 unencumbered by patents. The following table presents a high-level 626 comparison of various techniques employed or under development today 627 and classifies them according to categorization of technique and 628 other properties. The list of techniques includes the main 629 techniques in current use, but does not claim to be comprehensive. 630 Appendix B provides a more detailed survey of these techniques and 631 definitions for the categories and properties listed below. 
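To make the simplest class of technique concrete, the sketch below truncates a client address to a fixed prefix by zeroing its low-order bits before it is written to a log. This is only an illustration: the prefix lengths (/24 for IPv4, /48 for IPv6) and the function name truncate_ip are assumptions chosen for the example, not recommendations of this document, and truncation on its own reduces precision without guaranteeing anonymity.

```python
import ipaddress


def truncate_ip(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an address, keeping only the leading prefix."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us build the enclosing network from a host address.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    # Documentation-range addresses are used as stand-ins for client IPs.
    for client in ("192.0.2.123", "2001:db8:1234:5678::1"):
        print(client, "->", truncate_ip(client))
```

Every client inside the same prefix maps to the same output, so the shorter the retained prefix, the larger the set of clients that become indistinguishable in the log, at the cost of less precise operational data.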
633 Figure showing comparison of IP address techniques (SVG) [5] 635 The choice of which method to use for a particular application will 636 depend on the requirements of that application and consideration of 637 the threat analysis of the particular situation. 639 For example, a common goal is that distributed packet captures must 640 be in an existing data format such as PCAP [pcap] or C-DNS 641 [I-D.ietf-dnsop-dns-capture-format] that can be used as input to 642 existing analysis tools. In that case, use of a format-preserving 643 technique is essential. This, though, is not cost-free - several 644 authors (e.g. Brenker & Arnes [6]) have observed that, as the 645 entropy in an IPv4 address is limited, given a de-identified log from 646 a target, if an attacker is capable of ensuring packets are captured 647 by the target and the attacker can send forged traffic with arbitrary 648 source and destination addresses to that target, any format- 649 preserving pseudonymization is vulnerable to an attack along the 650 lines of a cryptographic chosen-plaintext attack. 652 5.2.4. Pseudonymization, anonymization or discarding of other 653 correlation data 655 Threats: 657 o IP TTL/Hoplimit can be used to fingerprint client OS 659 o Tracking of TCP sessions 661 o Tracking of TLS sessions and session resumption mechanisms 662 o Resolvers _might_ receive client identifiers, e.g. MAC addresses 663 in EDNS(0) options - some CPE devices are known to add them. 665 o HTTP headers 667 Mitigations: 669 o Data minimization or discarding of such correlation data 671 TODO: More analysis here. 673 5.2.5. Cache snooping 675 Threats: 677 o Profiling of client queries by malicious third parties 679 Mitigations: 681 TODO: Describe techniques to defend against cache snooping 683 5.3. Data sent onwards from the server 685 In this section we consider both data sent on the wire in upstream 686 queries and data shared with third parties. 688 5.3.1. 
Protocol recommendations 690 Threats: 692 o Transmission of identifying data upstream. 694 Mitigations: 696 As specified in [RFC8310] for DNS-over-TLS but applicable to any DNS 697 Privacy services the server should: 699 o Implement QNAME minimization [RFC7816] 701 o Honour a SOURCE PREFIX-LENGTH set to 0 in a query containing the 702 EDNS(0) Client Subnet (ECS) option and not send an ECS option in 703 upstream queries. 705 Optimizations: 707 o The server should either 709 * not use the ECS option in upstream queries at all, or 710 * offer alternative services, one that sends ECS and one that 711 does not. 713 If operators do offer a service that sends the ECS options upstream 714 they should use the shortest prefix that is operationally feasible 715 (NOTE: the authors believe they will be able to add a reference for 716 advice here soon) and ideally use a policy of whitelisting upstream 717 servers to send ECS to in order to minimize data leakage. Operators 718 should make clear in any policy statement what prefix length they 719 actually send and the specific policy used. 721 Additional options: 723 o Aggressive Use of DNSSEC-Validated Cache [RFC8198] to reduce the 724 number of queries to authoritative servers to increase privacy. 726 o Run a copy of the root zone on loopback [RFC7706] to avoid making 727 queries to the root servers that might leak information. 729 5.3.2. Client query obfuscation 731 Additional options: 733 Since queries from recursive resolvers to authoritative servers are 734 performed using cleartext (at the time of writing), resolver services 735 need to consider the extent to which they may be directly leaking 736 information about their client community via these upstream queries 737 and what they can do to mitigate this further. Note, that even when 738 all the relevant techniques described above are employed there may 739 still be attacks possible, e.g. [Pitfalls-of-DNS-Encryption]. 
   For example, a resolver with a very small community of users risks
   exposing data in this way and ought to obfuscate this traffic by
   mixing it with 'generated' traffic to make client characterization
   harder.  The resolver could also employ aggressive pre-fetch
   techniques as a further measure to counter traffic analysis.

   At the time of writing there are no standardized or widely recognized
   techniques to perform such obfuscation or bulk pre-fetches.

   Another technique that particularly small operators may consider is
   forwarding local traffic to a larger resolver (with a privacy policy
   that aligns with their own practices) over an encrypted protocol so
   that the upstream queries are obfuscated among those of the large
   resolver.

5.3.3.  Data sharing

   Threats:

   o  Surveillance

   o  Stored data compromise

   o  Correlation

   o  Identification

   o  Secondary use

   o  Disclosure

   o  Contravention of legal requirements not to process user data?

   Mitigations:

   Operators should not provide identifiable data to third parties
   without explicit consent from clients (we take the stance here that
   simply using the resolution service itself does not constitute
   consent).

   Even when consent is granted, operators should employ data
   minimization techniques such as those described in Section 5.2.1 if
   data is shared with third parties.

   Operators should consider including specific guidelines for the
   collection of aggregated and/or anonymized data for research
   purposes, within or outside of their own organization.

   TODO: More on data for research vs operations... how to still
   motivate operators to share anonymized data?

   TODO: Guidelines for when consent is granted?

   TODO: Applies to server data handling too.. could operators offer
   alternative services, one that implies consent for data processing,
   one that doesn't?

6.  DNS privacy policy and practice statement

6.1.  Recommended contents of a DPPPS

   1  Policy

   1.1  Recommendations.  This section should explain, with reference to
   Section 5 of this document, which recommendations the DNS privacy
   service employs.

   1.2  Data handling.  This section should explain, with reference to
   Section 5.2 of this document, the policy for gathering and
   disseminating information collected by the DNS privacy service.

   1.2.1  Specify clearly what data (including whether it is aggregated,
   pseudonymized or anonymized) is:

   1.2.1.1  Collected and retained by the operator (and for how long)

   1.2.1.2  Shared with partners

   1.2.1.3  Shared, sold or rented to third parties

   1.2.2  Specify any exceptions to the above, for example technically
   malicious or anomalous behaviour

   1.2.3  Declare any partners, third-party affiliations or sources of
   funding

   1.2.4  State whether user DNS data is correlated or combined with any
   other personal information held by the operator

   2  Practice.  This section should explain the current operational
   practices of the service.

   2.1  Specify any temporary or permanent deviations from the policy for
   operational reasons

   2.2  With reference to Section 5.1, provide specific details of which
   capabilities are provided on which addresses and ports

   2.3  With reference to Section 5.3, provide specific details of which
   capabilities are employed for upstream traffic from the server

   2.4  Specify the authentication name to be used (if any) and whether
   TLSA records are published (including options used in the TLSA
   records)

   2.5  Specify the SPKI pinsets to be used (if any) and the policy for
   rolling keys

   2.6  Provide a contact email address for the service

6.2.  Current policy and privacy statements

   NOTE: An analysis of these statements will clearly only provide a
   snapshot at the time of writing.
   It is included in this version of the draft to provide a basis for
   the assessment of the contents of the DPPPS and is expected to be
   removed or substantially re-worked in a future version.

6.2.1.  Quad9

   UDP/TCP and TLS (port 853) service provided on two sets of addresses:

   o  'Secure': 9.9.9.9, 149.112.112.112, 2620:fe::fe, 2620:fe::9

   o  'Unsecured': 9.9.9.10, 149.112.112.10, 2620:fe::10

   Policy:

   o

   o

   o

6.2.2.  Cloudflare

   UDP/TCP and TLS (port 853) service provided on 1.1.1.1, 1.0.0.1,
   2606:4700:4700::1111 and 2606:4700:4700::1001.

   Policy:

   o

   DoH provided on:

   Policy:

   o

   Tor endpoint: .

6.2.3.  Google

   UDP/TCP service provided on 8.8.8.8, 8.8.4.4, 2001:4860:4860::8888
   and 2001:4860:4860::8844.

   Policy:

6.2.4.  OpenDNS

   UDP/TCP service provided on 208.67.222.222 and 208.67.220.220 (no
   IPv6).

   We could find no specific privacy policy for the DNS resolution
   service, only a general one from Cisco that seems focussed on
   websites.

   Policy:

6.2.5.  Comparison

   The following tables provide a high-level comparison of the policy
   and practice statements above and also some observations of practice
   measured at dnsprivacy.org [7].  The data is not exhaustive and has
   not been reviewed or confirmed by the operators.

   A question mark indicates no clear statement or data could be located
   on the issue.  A dash indicates the category is not applicable to the
   service.

   Table showing comparison of operators policies [8]

   Table showing comparison of operators practices [9]

   NOTE: Review and correction of any inaccuracies in the table would be
   much appreciated.

6.3.  Enforcement/accountability

   Transparency reports may help with building user trust that operators
   adhere to their policies and practices.

   Independent monitoring should be performed where possible of:

   o  ECS, QNAME minimization, EDNS(0) padding, etc.
   o  Filtering

   o  Uptime

7.  IANA considerations

   None

8.  Security considerations

   TODO: e.g. New issues for DoS defence, server admin policies

9.  Acknowledgements

   Many thanks to Amelia Andersdotter for a very thorough review of the
   first draft of this document.  Thanks also to John Todd for
   discussions on this topic, and to Stephane Bortzmeyer for review.

   Sara Dickinson thanks the Open Technology Fund for a grant to support
   the work on this document.

10.  Contributors

   The below individuals contributed significantly to the document:

   John Dickinson
   Sinodun Internet Technologies
   Magdalen Centre
   Oxford Science Park
   Oxford OX4 4GA
   United Kingdom

   Jim Hague
   Sinodun Internet Technologies
   Magdalen Centre
   Oxford Science Park
   Oxford OX4 4GA
   United Kingdom

11.  Changelog

   draft-ietf-dprive-bcp-op-00

   o  Initial commit of re-named document after adoption to replace
      draft-dickinson-dprive-bcp-op-01

12.  References

12.1.  Normative References

   [I-D.ietf-dnsop-terminology-bis]
              Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
              Terminology", draft-ietf-dnsop-terminology-bis-11 (work in
              progress), July 2018.

   [I-D.ietf-doh-dns-over-https]
              Hoffman, P. and P. McManus, "DNS Queries over HTTPS
              (DoH)", draft-ietf-doh-dns-over-https-12 (work in
              progress), June 2018.

   [I-D.ietf-dprive-padding-policy]
              Mayrhofer, A., "Padding Policy for EDNS(0)",
              draft-ietf-dprive-padding-policy-06 (work in progress),
              July 2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5077]  Salowey, J., Zhou, H., Eronen, P., and H. Tschofenig,
              "Transport Layer Security (TLS) Session Resumption without
              Server-Side State", RFC 5077, DOI 10.17487/RFC5077,
              January 2008.
   [RFC6265]  Barth, A., "HTTP State Management Mechanism", RFC 6265,
              DOI 10.17487/RFC6265, April 2011.

   [RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
              Morris, J., Hansen, M., and R. Smith, "Privacy
              Considerations for Internet Protocols", RFC 6973,
              DOI 10.17487/RFC6973, July 2013.

   [RFC7525]  Sheffer, Y., Holz, R., and P. Saint-Andre,
              "Recommendations for Secure Use of Transport Layer
              Security (TLS) and Datagram Transport Layer Security
              (DTLS)", BCP 195, RFC 7525, DOI 10.17487/RFC7525, May
              2015.

   [RFC7816]  Bortzmeyer, S., "DNS Query Name Minimisation to Improve
              Privacy", RFC 7816, DOI 10.17487/RFC7816, March 2016.

   [RFC7830]  Mayrhofer, A., "The EDNS(0) Padding Option", RFC 7830,
              DOI 10.17487/RFC7830, May 2016.

   [RFC7858]  Hu, Z., Zhu, L., Heidemann, J., Mankin, A., Wessels, D.,
              and P. Hoffman, "Specification for DNS over Transport
              Layer Security (TLS)", RFC 7858, DOI 10.17487/RFC7858, May
              2016.

   [RFC7873]  Eastlake 3rd, D. and M. Andrews, "Domain Name System (DNS)
              Cookies", RFC 7873, DOI 10.17487/RFC7873, May 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8310]  Dickinson, S., Gillmor, D., and T. Reddy, "Usage Profiles
              for DNS over TLS and DNS over DTLS", RFC 8310,
              DOI 10.17487/RFC8310, March 2018.

12.2.  Informative References

   [I-D.bortzmeyer-dprive-rfc7626-bis]
              Bortzmeyer, S. and S. Dickinson, "DNS Privacy
              Considerations", draft-bortzmeyer-dprive-rfc7626-bis-01
              (work in progress), July 2018.

   [I-D.ietf-dnsop-dns-capture-format]
              Dickinson, J., Hague, J., Dickinson, S., Manderson, T.,
              and J. Bond, "C-DNS: A DNS Packet Capture Format",
              draft-ietf-dnsop-dns-capture-format-07 (work in progress),
              May 2018.

   [I-D.ietf-dnsop-dns-tcp-requirements]
              Kristoff, J. and D. Wessels, "DNS Transport over TCP -
              Operational Requirements",
              draft-ietf-dnsop-dns-tcp-requirements-02 (work in
              progress), May 2018.

   [I-D.ietf-dnsop-session-signal]
              Bellis, R., Cheshire, S., Dickinson, J., Dickinson, S.,
              Lemon, T., and T. Pusateri, "DNS Stateful Operations",
              draft-ietf-dnsop-session-signal-14 (work in progress),
              August 2018.

   [I-D.ietf-tls-dnssec-chain-extension]
              Shore, M., Barnes, R., Huque, S., and W. Toorop, "A DANE
              Record and DNSSEC Authentication Chain Extension for TLS",
              draft-ietf-tls-dnssec-chain-extension-07 (work in
              progress), March 2018.

   [pcap]     tcpdump.org, "PCAP", 2016.

   [Pitfalls-of-DNS-Encryption]
              Shulman, H., "Pretty Bad Privacy: Pitfalls of DNS
              Encryption", 2014.

   [RFC6235]  Boschi, E. and B. Trammell, "IP Flow Anonymization
              Support", RFC 6235, DOI 10.17487/RFC6235, May 2011.

   [RFC6841]  Ljunggren, F., Eklund Lowinder, AM., and T. Okubo, "A
              Framework for DNSSEC Policies and DNSSEC Practice
              Statements", RFC 6841, DOI 10.17487/RFC6841, January 2013.

   [RFC6873]  Salgueiro, G., Gurbani, V., and A. Roach, "Format for the
              Session Initiation Protocol (SIP) Common Log Format
              (CLF)", RFC 6873, DOI 10.17487/RFC6873, February 2013.

   [RFC7686]  Appelbaum, J. and A. Muffett, "The ".onion" Special-Use
              Domain Name", RFC 7686, DOI 10.17487/RFC7686, October
              2015.

   [RFC7706]  Kumari, W. and P. Hoffman, "Decreasing Access Time to Root
              Servers by Running One on Loopback", RFC 7706,
              DOI 10.17487/RFC7706, November 2015.

   [RFC7766]  Dickinson, J., Dickinson, S., Bellis, R., Mankin, A., and
              D. Wessels, "DNS Transport over TCP - Implementation
              Requirements", RFC 7766, DOI 10.17487/RFC7766, March 2016.

   [RFC7828]  Wouters, P., Abley, J., Dickinson, S., and R. Bellis, "The
              edns-tcp-keepalive EDNS0 Option", RFC 7828,
              DOI 10.17487/RFC7828, April 2016.
   [RFC7871]  Contavalli, C., van der Gaast, W., Lawrence, D., and W.
              Kumari, "Client Subnet in DNS Queries", RFC 7871,
              DOI 10.17487/RFC7871, May 2016.

   [RFC8094]  Reddy, T., Wing, D., and P. Patil, "DNS over Datagram
              Transport Layer Security (DTLS)", RFC 8094,
              DOI 10.17487/RFC8094, February 2017.

   [RFC8198]  Fujiwara, K., Kato, A., and W. Kumari, "Aggressive Use of
              DNSSEC-Validated Cache", RFC 8198, DOI 10.17487/RFC8198,
              July 2017.

12.3.  URIs

   [1] https://nginx.org/

   [2] https://www.haproxy.org/

   [3] https://kb.isc.org/article/AA-01386/0/DNS-over-TLS.html

   [4] https://doi.org/10.1145/3182660

   [5] https://github.com/Sinodun/draft-dprive-bcp-op/blob/master/draft-00/ip_techniques_table.svg

   [6] https://pdfs.semanticscholar.org/7b34/12c951cebe71cd2cddac5fda164fb2138a44.pdf

   [7] https://dnsprivacy.org/jenkins/job/dnsprivacy-monitoring/

   [8] https://github.com/Sinodun/draft-dprive-bcp-op/blob/master/draft-00/policy_table.svg

   [9] https://github.com/Sinodun/draft-dprive-bcp-op/blob/master/draft-00/practice_table.svg

   [10] https://support.google.com/analytics/answer/2763052?hl=en

   [11] https://www.conversionworks.co.uk/blog/2017/05/19/anonymize-ip-geo-impact-test/

   [12] https://github.com/edmonds/pdns/blob/master/pdns/dnswasher.cc

   [13] http://ita.ee.lbl.gov/html/contrib/tcpdpriv.html

   [14] http://an.kaist.ac.kr/~sbmoon/paper/intl-journal/2004-cn-anon.pdf

   [15] https://www.cc.gatech.edu/computing/Telecomm/projects/cryptopan/

   [16] http://mharvan.net/talks/noms-ip_anon.pdf

   [17] https://medium.com/@bert.hubert/on-ip-address-encryption-security-analysis-with-respect-for-privacy-dabe1201b476

   [18] https://github.com/PowerDNS/ipcipher

   [19] https://github.com/veorq/ipcrypt

   [20] https://www.ietf.org/mail-archive/web/cfrg/current/msg09494.html

   [21] https://tnc18.geant.org/core/presentation/127
Appendix A.  Documents

   This section provides an overview of some DNS privacy-related
   documents; however, it is neither an exhaustive list nor a
   definitive statement on the characteristics of the documents.

A.1.  Potential increases in DNS privacy

   These documents are limited in scope to communications between stub
   clients and recursive resolvers:

   o  'Specification for DNS over Transport Layer Security (TLS)'
      [RFC7858], referred to here as 'DNS-over-TLS'.

   o  'DNS over Datagram Transport Layer Security (DTLS)' [RFC8094],
      referred to here as 'DNS-over-DTLS'.  Note that this document has
      the Category of Experimental.

   o  'DNS Queries over HTTPS (DoH)' [I-D.ietf-doh-dns-over-https],
      referred to here as DoH.

   o  'Usage Profiles for DNS over TLS and DNS over DTLS' [RFC8310]

   o  'The EDNS(0) Padding Option' [RFC7830] and 'Padding Policy for
      EDNS(0)' [I-D.ietf-dprive-padding-policy]

   These documents apply to recursive to authoritative DNS but are
   relevant when considering the operation of a recursive server:

   o  'DNS Query Name Minimisation to Improve Privacy' [RFC7816],
      referred to here as 'QNAME minimization'

A.2.  Potential decreases in DNS privacy

   These documents relate to functionality that could provide increased
   tracking of user activity as a side effect:

   o  'Client Subnet in DNS Queries' [RFC7871]

   o  'Domain Name System (DNS) Cookies' [RFC7873]

   o  'Transport Layer Security (TLS) Session Resumption without
      Server-Side State' [RFC5077], referred to here as simply TLS
      session resumption.

   o  'C-DNS: A DNS Packet Capture Format'
      [I-D.ietf-dnsop-dns-capture-format]

   o  Passive DNS [I-D.ietf-dnsop-terminology-bis]

   Note that depending on the specifics of the implementation
   [I-D.ietf-doh-dns-over-https] may also provide increased tracking.

A.3.  Related operational documents

   o  'DNS Transport over TCP - Implementation Requirements' [RFC7766]

   o  'DNS Transport over TCP - Operational Requirements'
      [I-D.ietf-dnsop-dns-tcp-requirements]

   o  'The edns-tcp-keepalive EDNS0 Option' [RFC7828]

   o  'DNS Stateful Operations' [I-D.ietf-dnsop-session-signal]

Appendix B.  IP address techniques

   Data minimization methods may be categorized by the processing used
   and the properties of their outputs.  The following builds on the
   categorization employed in [RFC6235]:

   o  Format-preserving.  Normally when encrypting, the original data
      length and patterns in the data should be hidden from an attacker.
      Some applications of de-identification, such as network capture
      de-identification, require that the de-identified data is of the
      same form as the original data, to allow the data to be parsed in
      the same way as the original.

   o  Prefix preservation.  Values such as IP addresses and MAC
      addresses contain prefix information that can be valuable in
      analysis, e.g. manufacturer ID in MAC addresses, subnet in IP
      addresses.  Prefix preservation ensures that prefixes are
      de-identified consistently; e.g. if two IP addresses are from the
      same subnet, a prefix-preserving de-identification will ensure
      that their de-identified counterparts will also share a subnet.
      Prefix preservation may be fixed (i.e. based on a user-selected
      prefix length identified in advance to be preserved) or general.

   o  Replacement.  A one-to-one replacement of a field with a new value
      of the same type, for example using a regular expression.

   o  Filtering.  Removing (and thus truncating) or replacing data in a
      field.  Field data can be overwritten, often with zeros, either
      partially (grey marking) or completely (black marking).

   o  Generalization.  Data is replaced by more general data with
      reduced specificity.
      One example would be to replace all TCP/UDP port numbers with one
      of two fixed values indicating whether the original port was
      ephemeral (>=1024) or non-ephemeral (<1024).  Another example,
      precision degradation, reduces the accuracy of e.g. a numeric
      value or a timestamp.

   o  Enumeration.  With data from a well-ordered set, replace the first
      data item using a random initial value and then allocate ordered
      values for subsequent data items.  When used with timestamp data,
      this preserves ordering but loses precision and distance.

   o  Reordering/shuffling.  Preserving the original data, but
      rearranging its order, often in a random manner.

   o  Random substitution.  As replacement, but using randomly generated
      replacement values.

   o  Cryptographic permutation.  Using a permutation function, such as
      a hash function or cryptographic block cipher, to generate a
      replacement de-identified value.

B.1.  Google Analytics non-prefix filtering

   Since May 2010, Google Analytics has provided a facility [10] that
   allows website owners to request that all their users' IP addresses
   are anonymized within Google Analytics processing.  This very basic
   anonymization simply sets to zero the least significant 8 bits of
   IPv4 addresses, and the least significant 80 bits of IPv6 addresses.
   The level of anonymization this produces is perhaps questionable.

   There are some analysis results [11] which suggest that the impact of
   this on reducing the accuracy of determining the user's location from
   their IP address is less than might be hoped; the average discrepancy
   in identification of the user city for UK users is no more than 17%.

   Anonymization: Format-preserving, Filtering (grey marking).

B.2.  dnswasher

   Since 2006, PowerDNS have included a de-identification tool dnswasher
   [12] with their PowerDNS product.
   This is a PCAP filter that maps each end-user IP address one-to-one
   to an anonymized address.  A table of user IP addresses and their
   de-identified counterparts is kept; the first IPv4 user address is
   translated to 0.0.0.1, the second to 0.0.0.2 and so on.  The
   de-identified address therefore depends on the order in which
   addresses arrive in the input, and when running over a large amount
   of data the address translation tables can grow to a significant
   size.

   Anonymization: Format-preserving, Enumeration.

B.3.  Prefix-preserving map

   Used in TCPdpriv [13], this algorithm stores a set of original and
   anonymized IP address pairs.  When a new IP address arrives, it is
   compared with previous addresses to determine the longest prefix
   match.  The new address is anonymized by using the same prefix, with
   the remainder of the address anonymized with a random value.  The use
   of a random value means that TCPdpriv is not deterministic; different
   anonymized values will be generated on each run.  The need to store
   previous addresses means that TCPdpriv has significant and unbounded
   memory requirements, and because of the need to allocate anonymized
   addresses sequentially it cannot be used in parallel processing.

   Anonymization: Format-preserving, prefix preservation (general).

B.4.  Cryptographic Prefix-Preserving Pseudonymisation

   Cryptographic prefix-preserving pseudonymisation was originally
   proposed as an improvement to the prefix-preserving map implemented
   in TCPdpriv, described in Xu et al. [14] and implemented in the
   Crypto-PAn tool [15].  Crypto-PAn is now frequently used as an
   acronym for the algorithm.  Initially it was described for IPv4
   addresses only; extension for IPv6 addresses was proposed in Harvan &
   Schoenwaelder [16] and implemented in snmpdump.
   This uses a cryptographic algorithm rather than a random value, and
   thus pseudonymity is determined uniquely by the encryption key, and
   is deterministic.  It requires a separate AES encryption for each
   output bit, so has a non-trivial calculation overhead.  This can be
   mitigated to some extent (for IPv4, at least) by pre-calculating
   results for some number of prefix bits.

   Pseudonymization: Format-preserving, prefix preservation (general).

B.5.  Top-hash Subtree-replicated Anonymisation

   Proposed in Ramaswamy & Wolf, Top-hash Subtree-replicated
   Anonymisation (TSA) originated in response to the requirement for
   faster processing than Crypto-PAn.  It uses hashing for the most
   significant byte of an IPv4 address, and a pre-calculated binary tree
   structure for the remainder of the address.  To save memory space,
   replication is used within the tree structure, reducing the size of
   the pre-calculated structures to a few megabytes for IPv4 addresses.
   Address pseudonymization is done via hash and table lookup, and so
   requires minimal computation.  However, due to the much larger
   address space of IPv6, TSA is not memory efficient for IPv6.

   Pseudonymization: Format-preserving, prefix preservation (general).

B.6.  ipcipher

   A recently released proposal from PowerDNS [17], ipcipher [18] is a
   simple pseudonymization technique for IPv4 and IPv6 addresses.  IPv6
   addresses are encrypted directly with AES-128 using a key (which may
   be derived from a passphrase).  IPv4 addresses are similarly
   encrypted, but using a recently proposed encryption, ipcrypt [19],
   suitable for 32-bit block lengths.  However, the author of ipcrypt
   has since indicated [20] that it has low security, and further
   analysis has revealed it is vulnerable to attack.

   Pseudonymization: Format-preserving, cryptographic permutation.

B.7.  Bloom filters

   van Rijswijk-Deij et al.
   [21] have recently described work using Bloom filters to categorize
   query traffic and record the traffic as the state of multiple
   filters.  The goal of this work is to allow operators to identify
   so-called Indicators of Compromise (IOCs) originating from specific
   subnets without storing information about, or being able to monitor,
   the DNS queries of an individual user.  By using a Bloom filter, it
   is possible to determine with a high probability if, for example, a
   particular query was made, but the set of queries made cannot be
   recovered from the filter.  Similarly, by mixing queries from a
   sufficient number of users in a single filter, it becomes practically
   impossible to determine if a particular user performed a particular
   query.  Large numbers of queries can be tracked in a memory-efficient
   way.  As filter status is stored, this approach cannot be used to
   regenerate traffic, and so cannot be used with tools used to process
   live traffic.

   Anonymization: Generalization.

Authors' Addresses

   Sara Dickinson
   Sinodun IT
   Magdalen Centre
   Oxford Science Park
   Oxford OX4 4GA
   United Kingdom

   Email: sara@sinodun.com

   Benno J. Overeinder
   NLnet Labs
   Science Park 400
   Amsterdam 1098 XH
   The Netherlands

   Email: benno@nlnetLabs.nl

   Roland M. van Rijswijk-Deij
   SURFnet bv
   PO Box 19035
   Utrecht 3501 DA Utrecht
   The Netherlands

   Email: roland.vanrijswijk@surfnet.nl

   Allison Mankin
   Salesforce

   Email: allison.mankin@gmail.com