privsec                                                      I. Goldberg
Internet-Draft                                    University of Waterloo
Intended status: Experimental                                    T. Wang
Expires: November 16, 2019       HK University of Science and Technology
                                                                 C. Wood
                                                             Apple, Inc.
                                                            May 15, 2019


                  Network-Based Website Fingerprinting
                    draft-wood-privsec-wfattacks-00

Abstract

   The IETF is well on its way to protecting connection metadata with
   protocols such as DNS-over-TLS and DNS-over-HTTPS, and with work in
   progress towards encrypting the TLS SNI.  However, more work is
   needed to protect traffic metadata, especially in the context of web
   traffic.  In this document, we survey Website Fingerprinting attacks,
   which are a class of attacks that use machine learning techniques to
   attack web privacy, and highlight metadata leaks used by said
   attacks.  We also survey proposed mitigations for such leakage and
   discuss their applicability to IETF protocols such as TLS, QUIC, and
   HTTP.  We endeavor to show that Website Fingerprinting attacks are a
   serious problem that affects all Internet users, and we pose open
   problems and directions for future research in this area.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 16, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
   3.  Website Fingerprinting
   4.  Attacks
   5.  Defenses
   6.  Open Problems and Directions
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Acknowledgements
   Authors' Addresses

1.  Introduction

   Internet protocols such as TLS 1.3 [RFC8446] and QUIC
   [I-D.ietf-quic-transport] bring substantial improvements to
   end-users.
   The IETF engineered these with security and privacy in mind by
   encrypting more protocol messages using modern cryptographic
   primitives and algorithms, and by engineering against flaws found in
   previous protocols.  The result is several desirable security
   properties, including forward-secure session key secrecy, downgrade
   protection, key compromise impersonation resistance, and protection
   of endpoint identities.  Combined, these two protocols are set to
   protect a significant amount of Internet data.  However, significant
   metadata leaks still exist for users of these protocols.  Examples
   include the plaintext TLS SNI and application-layer protocol
   negotiation (ALPN) extensions, as well as DNS queries.  This
   information can be used by a passive attacker to learn information
   about the contents of an otherwise encrypted network connection.
   Recently, such information has also been studied as a means of
   building unique user profiles [li2018can].  It has also been used to
   build flow classifiers that aid network management
   [foremski2014dns].

   In the context of Tor, a popular low-latency anonymity network, a
   common class of attacks that use metadata for such inference is
   called Website Fingerprinting (WF).  These attacks use machine
   learning techniques built with features extracted from metadata such
   as traffic patterns to attack web (browsing) privacy.  Miller et al.
   [miller2014know] show how these attacks can be applied to web
   browsing traffic protected with HTTPS to reveal private information
   about users.  Pironti et al. [pironti2012identifying] use similar
   attacks based on data sizes to identify individual social media
   clients using encrypted connections.  Fingerprinting attacks using
   encrypted traffic analysis are also applicable to encrypted media
   streams, such as Netflix videos.  (See work from Reed et al.
   [reed2017identifying] and Schuster et al. [schuster2017beauty] for
   examples of these attacks.)
   WF attacks have also been applied to other IETF protocols such as
   encrypted DNS, including dnscrypt, DNS-over-TLS, and DNS-over-HTTPS
   [siby2018dns][shulman2014pretty].  In the past, they have also been
   conducted remotely [gong2010fingerprinting], using buffer-based side
   channels in a victim's home router.

   Protocols such as DNS-over-TLS and DNS-over-HTTPS [RFC8484], and
   work in progress towards encrypting the TLS SNI extension
   [I-D.ietf-tls-esni], help minimize metadata sent in cleartext on the
   wire.  However, regardless of protocol and even network-layer
   fingerprinting mitigations, application-layer specifics, e.g., web
   page sizes and client request patterns, reveal a noticeable amount of
   information to attackers.  We argue that much more work is needed to
   protect encrypted connection metadata, especially in the context of
   web traffic.

   In this document, we describe WF attacks in the context of IETF
   protocols such as TLS and QUIC.  We survey WF attacks and highlight
   metadata features and classification techniques used to conduct said
   attacks.  We also describe proposed mitigations for these attacks and
   discuss their applicability to IETF protocols.  We conclude with a
   discussion of open problems and directions for future research and
   advocate for more work in this area.

2.  Background

   In this section we review how most secure Internet connections are
   made today.  We omit custom configurations such as those using VPNs
   and proxies since they do not represent the common case for most
   Internet users.  The following steps briefly describe the sequence of
   events that normally occur when a web client, e.g., a browser or
   curl, connects to a website and obtains some resource.  First, an
   unencrypted DNS query is sent to an untrusted DNS recursive resolver
   to resolve a name to an IP address.
   Upon receipt of the response, clients open a TCP and TLS connection
   to the destination address.  During this stage, metadata such as the
   TLS SNI and ALPN values are sent in cleartext.  The SNI is used to
   denote the destination application or endpoint to which clients want
   to connect.  Servers use this for several purposes, including
   selecting an appropriate certificate (one with the SNI name in the
   SubjectAlternativeName list) or routing to a different backend
   terminator.  ALPN values are used to negotiate which
   application-layer protocol will be used on top of the TLS
   connection.  Common values include "http/1.1", "h2", and (soon) "h3".
   Upon connection, clients then send HTTP messages to obtain the
   desired resource.

   Connections look different (on the wire) with TLS 1.3, encrypted DNS
   via DNS-over-TLS (DoT) or DNS-over-HTTPS (DoH), and encrypted SNI.
   DNS queries are encrypted to a (trusted) recursive resolver, and TLS
   metadata such as the SNI is encrypted in transit to the terminator.
   Despite the reduction in cleartext metadata sent over the wire, there
   still remain several sources of information that an adversary may use
   for malicious purposes, including: size and timing of DNS queries and
   responses, size and timing of application traffic, and connection
   attempts induced while loading a web resource, e.g., JavaScript
   files.  So while technologies such as Encrypted SNI, DoT, and DoH
   help protect some metadata, they are not complete solutions to the
   larger problem.  In the following section, we discuss this
   overarching problem in detail.

3.  Website Fingerprinting

   Website Fingerprinting (WF) is a class of attacks that exploit
   metadata leakage to attack end-user privacy on the Internet.  In the
   WF threat model, the adversary (Adv) is assumed to be a passive and
   local attacker.  Local means that Adv can associate traffic with a
   given client.  Examples include proxies to which clients directly
   connect.
   Passive means that Adv can only view traffic in transit.  It cannot
   add, drop, or otherwise modify packets between the victim client and
   server(s).  Use of reliable and encrypted transport protocols such as
   TLS limits on-path attackers to eavesdropping on encrypted packets.
   (In QUIC, however, reordering packets is possible.)

   Traffic features used for classification include properties such as
   packet size, timing, direction, interarrival times, and burstiness,
   among many others [wang2016website].  Normally, features are
   restricted to those which are extractable as a passive eavesdropper,
   and not those which are viewable by modifying client or server
   behavior.  Specifically, this means that attacks such as CRIME and
   TIME, which rely on an attacker abusing TLS-layer compression to leak
   contents of an encrypted connection, are out of scope.

   Website Fingerprinting attacks have evolved over the years through
   three phases: (1) Closed-world WF on SSL/TLS, (2) Closed-world WF on
   Tor, and (3) Open-world WF on Tor.

   1.  In the closed-world model, clients are assumed to only visit a
       small set of pages monitored by Adv.  This is less realistic but
       easier to analyze than the open-world model discussed below, and
       so the earliest results achieved success on SSL/TLS in this
       model.  (For a realistic attack, Adv would need to monitor every
       possible page of interest to each client, which is impractical.)
       Attacks against proxy-based privacy technologies such as VPNs and
       SSH tunneling, which have almost no effect on the network, fall
       under this category as well.

   2.  Tor, an anonymity network built on onion routing, is harder to
       attack than SSL for several reasons; successful results on Tor
       thus came later.  First, Tor pads all cells (Tor's
       application-layer datagrams) to the same constant size, removing
       unique packet lengths as a powerful feature for the attacker.
       Second, Tor imposes random network conditions upon the client due
       to random selection of proxies, so packet sequences are less
       likely to be consistent.

   3.  In the open-world model, Adv wishes to learn whenever a victim
       client visits one of a select number of monitored pages
       [wang2016website].  Adversaries train classifiers in this model
       using monitored and non-monitored websites of their choosing.  By
       definition, Adv cannot train using client-chosen pages.  Clients
       then visit pages at will and Adv attempts to learn whenever a
       monitored page is visited, if any are at all.  This is a
       realistic model capturing the fact that the set of pages any
       attacker would be interested in must necessarily be a small
       subset of the set of all pages.  As this is a harder model to
       attack, successful results on this model came later.

4.  Attacks

   1.  Closed-world WF on TLS: WF attacks date back to applications on
       SSL first inspired by Wagner and Schneier [wagner1996analysis],
       in which the authors observed that packet lengths reveal
       information about the underlying data.  Subsequent attacks
       carried out by Cheng et al. [cheng1998traffic], Sun et al.
       [sun2002statistical], and Hintz [hintz2002fingerprinting]
       continued to show success.  These attacks assume Adv has
       knowledge of the target resource length(s), which is not always
       possible with techniques such as padding.

   Bissias et al. [bissias2005privacy] use cross correlation of
   inter-packet times in one-second time windows as a WF attack.
   Liberatore and Levine [liberatore2006inferring] proposed two WF
   attacks using the Jaccard coefficient and the Naive Bayes classifier.
   Herrmann et al. [herrmann2009website] extended the work of Liberatore
   and Levine with a multinomial Naive Bayes classifier computed using
   three input frequency transformations.  Results yielded higher
   accuracy than that of Liberatore and Levine.
   Herrmann's attack is the best in this category, but the authors
   assume packets which do not fill an MTU represent packet trailers.
   Therefore, uniqueness is only accurate modulo the MTU.  Efficacy is
   limited if endpoints pad packets to the MTU or another fixed length.
   Modern protocols such as HTTP/2, QUIC, and TLS 1.3 all provide some
   form of application-controlled padding.  (Note: These attacks are not
   successful on Tor.)

   2.  Closed-world WF on Tor: Shmatikov and Wang [shmatikov2006timing]
       presented a WF attack that exploits cross correlation of arrival
       packet counts in one-second time windows.  Lu et al.
       [lu2010website] developed a classifier based on the Levenshtein
       distance between strings of ingress and egress packet lengths
       extracted from packet sequences.  The training packet sequence
       with the closest distance to the testing packet sequence is
       deemed the match.  Dyer et al. [dyer2012peek] used a Naive Bayes
       classifier trained with a reduced set of features, including
       total response transmission time, length of packets (in each
       direction), and burst lengths.  (Wang [wang2016website] notes
       that measuring burst lengths in Tor is difficult given the
       presence of SENDME cells for flow control.)  This approach did
       not yield any measurable improvements over the SVM classifier
       from Panchenko et al. [panchenko2011website].  Cai et al.
       [cai2012touching] extend the work of Lu et al. by adding
       transpositions to the Levenshtein distance computation and
       normalizing the result, yielding what the authors refer to as the
       Optimal String Alignment Distance (OSAD).  Before feature
       extraction, the authors round TCP packet lengths to the nearest
       multiple of 600B as an estimate of the number of Tor cells.

   Wang et al. [wang2013improved] tuned the OSAD-based attack to improve
   its accuracy.
   Specific changes include use of Tor cells instead of TCP packets for
   packet and burst lengths, as well as heuristics to remove SENDME
   cells (those not carrying application data) from flows to recover
   true burst lengths.  The authors also modified the distance
   computation by removing substitutions, increasing the weight for
   egress packets, and varying the transposition cost across the packet
   sequence (large weights at the beginning of a trace, and smaller
   weights near the end, where variations are expected across repeated
   page loads).  Wang et al. also developed an alternate classifier with
   lower accuracy yet superior performance (linear rather than quadratic
   time complexity).  It works by minimizing the sum of two costs:
   sequence transpositions and sequence deletions or insertions.  These
   two costs are computed separately, in contrast to the first approach
   which computes them simultaneously.

   3.  Open-world WF on Tor and TLS: Panchenko et al.
       [panchenko2011website] were the first to use a support vector
       machine (SVM) classifier trained with web domain-specific
       features, such as HTML document sizes, as well as packet lengths.
       Wang et al. [wang2014effective] also developed an attack using a
       k-Nearest Neighbors (k-NN) classifier, which is a supervised
       machine learning algorithm, targeting the open-world setting.
       The classifier extracts a large number of features from packet
       sequences, including raw (ingress and egress) packet counts,
       unique packet lengths, direction, burst lengths, and inter-packet
       times, among others.  (There are 4226 features in total.)  The
       k-NN distance metric is computed as the sum of weighted feature
       differences.

   Abe and Goto [abe2016fingerprinting] were the first to use Deep
   Learning (DL) methods based on Stacked Denoising Autoencoders for WF
   attacks.  (Autoencoders reduce feature input dimensions when
   stacked.)  They form input vectors from Tor cell directions (+1 or
   -1) and use no other features.  Using a (small) data set from Wang
   [wang2016website], the classifier achieves an 86% true positive rate
   and a 2% false positive rate in the open-world model.  Rimmer et al.
   [rimmer2018automated] applied DL for automated feature generation and
   classifier construction.  Trained with 2,500 traces per website,
   their system achieves 96.3% accuracy in the open-world model.
   Recently, Bhat et al. [bhat2018var], Oh et al. [oh2017pfp], and
   Sirinam et al. [sirinam2018deep] used Convolutional Neural Networks
   (CNNs) and Deep Neural Networks (DNNs) for WF attacks.  Sirinam et
   al. report the best results: 98% accuracy on Tor without recent
   defenses (see Section 5), while also performing favorably when select
   defenses are used, in both open- and closed-world models.

   Yan et al. [yan2018feature] studied manual high-information feature
   extraction from packet traces.  They "exhaustively" examined
   different levels of features, including packet, burst, TCP, port, and
   IP address, summing to 35,683 in total, and distilled them into a
   diverse set of uncorrelated features for eight different
   communication scenarios.  Rahman [rahman2018using] studied the
   utility of features derived from packet interarrival times,
   including: median interarrival time (per burst), burst packet arrival
   time variance, cross-burst interarrival median differences, and
   others.  Using a CNN, results show that these features yield a
   non-negligible increase in WF attack accuracy.

   For all WF attacks, one limitation worth highlighting is the base
   rate fallacy.  This can be summarized as follows: even highly
   accurate classifiers with a stable false positive rate (FPR) decrease
   in efficacy as the world size increases.  Juarez et al.
   [juarez2014critical] studied its impact by measuring the Bayesian
   detection rate (BDR) in comparison to the FPR as a function of world
   size.  As the world size increases, the BDR approaches 0 while the
   FPR remains stable, meaning that the probability of incorrect
   classifier results increases as well.  Juarez et al. partially
   address the base rate fallacy problem by adding a confirmation step
   to their classifier.  Another problem is that web content is
   (increasingly) dynamic.  Most WF attacks, especially those in
   closed-world models, assume that traces are static.  However, Juarez
   et al. [juarez2014critical] show this is not the case even for
   "simple" pages such as google.com.  Thus, due to the base rate
   fallacy and the dynamic nature of content, classifiers require
   continual retraining in order to ensure accuracy.

5.  Defenses

   WF defenses are deterministic or randomized algorithms that take as
   input application data or packet sequences and return modified
   application data or packet sequences.  Viable defenses seek to
   minimize both the transformation cost and the maximum (theoretical
   and perfect) attacker accuracy.  Naive defenses such as sending a
   constant stream of (possibly random) bytes between client and server
   may be effective though clearly not viable from a cost perspective.
   Relevant cost metrics include bandwidth overhead, added time or
   latency (and its impact on related metrics such as page load time),
   and even CPU cost, though the latter is often ignored in favor of the
   former two.  Wang [wang2016website] describes defenses as either
   limited or general.  A limited defense is one which only helps
   mitigate specific WF attacks by transforming packets in a way to
   obviate a particular (set of) feature(s) used by said attacks.  In
   contrast, general defenses help mitigate a variety of attacks.
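
   To make the notion of a defense as a trace transformation concrete,
   the following is a minimal sketch, not taken from any of the cited
   works.  It assumes a trace is represented as a list of signed packet
   lengths (positive for egress, negative for ingress), and it uses a
   hypothetical fixed-length padding function together with a simple
   bandwidth overhead metric.  The function names, the trace encoding,
   and the 1500-byte target length are all illustrative assumptions.

```python
from typing import List

# Assumed encoding: a trace is a list of signed packet lengths in bytes.
# Positive values are egress (client to server), negative are ingress.
Trace = List[int]

TARGET_LENGTH = 1500  # illustrative fixed packet length, in bytes


def pad_to_fixed_length(trace: Trace, length: int = TARGET_LENGTH) -> Trace:
    """Hypothetical defense: emit only packets of exactly `length` bytes.

    Each packet is padded up to `length`; a packet larger than `length`
    is replaced by ceil(size / length) full-length packets.
    """
    defended: Trace = []
    for pkt in trace:
        sign = 1 if pkt > 0 else -1
        count = -(-abs(pkt) // length)  # ceiling division
        defended.extend([sign * length] * count)
    return defended


def bandwidth_overhead(original: Trace, defended: Trace) -> float:
    """Extra bytes sent by the defense, as a fraction of original bytes.

    Assumes the original trace carries at least one byte.
    """
    before = sum(abs(p) for p in original)
    after = sum(abs(p) for p in defended)
    return (after - before) / before


# Example: a short page load with mixed packet sizes.
trace = [300, -1500, -1500, -700, 200, -1000]
defended = pad_to_fixed_length(trace)
overhead = bandwidth_overhead(trace, defended)
print(defended)
print(f"bandwidth overhead: {overhead:.1%}")
```

   After the transformation every packet has the same length, so unique
   packet lengths no longer distinguish traces, but direction, packet
   counts, and timing remain observable.  The overhead value shows the
   cost side of the trade-off that a viable defense must balance.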

   Several general defenses have been proposed, including BuFLO
   [dyer2012peek], which pads packets to a fixed length of 1500B (the
   normal MTU) and schedules packets for transmission at fixed
   intervals (and sends fake data if nothing is yet available).  Tamaraw
   [wang2014comparing] is an improvement over BuFLO that uses two
   different fixed lengths for packet transmission, rather than one, to
   save on bandwidth overhead.  Tamaraw also uses two different
   scheduling rates for ingress and egress packets.  The authors chose
   to make the ingress packet period smaller than the egress packet
   period since HTTP responses are often larger in size, and in count if
   HTTP Push is used, than requests.  While provably correct, both BuFLO
   and Tamaraw limit the rate at which clients send traffic and require
   all clients to send at a uniform rate.  Both requirements make them
   difficult to apply as generic defenses in IETF protocols.

   Wang et al. also developed Supersequence [wang2016website], which
   attempts to approximate a bandwidth-optimal deterministic defense.
   This is done by casting the padding and flow control problem as the
   shortest common supersequence (SCS) of the transformed packet trace.
   Supersequence approximates the solution by learning the optimal
   packet scheduling rate; it uses the same padding scheme as Tamaraw.

   Walkie-Talkie [wang2015walkie] is a collection of mechanisms for WF
   defense.  It includes running the client (browser) in half-duplex
   mode to batch requests and responses together, as well as randomly
   padding traffic so as to mimic traffic of benign websites.  It
   assumes knowledge of traffic patterns for benign websites, which can
   be information learned over time or provided by a cooperating peer.
   Goldberg and Wang also propose a "randomized" variant that pads real
   bursts of requests and generates random request bursts according to a
   uniform distribution.  The half-duplex mode could be implemented as
   an extension to a protocol such as HTTP/2, QUIC, or TLS.

   Many limited defenses have also been proposed.  We list prominent
   works below.

   o  Shmatikov and Wang [shmatikov2006timing] developed adaptive
      padding, which adds packets to mask inter-packet times.  (This
      mechanism never delays application data being sent, in contrast
      to other padding mechanisms such as BuFLO, described above.)
      Juarez et al. [juarez2015wtf][juarez2016toward] also created a WF
      defense based on adaptive padding called WTF-PAD.  This variant
      uses application data and "gap" distributions to generate padding
      for delays.  Specifically, when not sending application data,
      senders use the gap distribution to drive fake packet
      transmission.  WTF-PAD can be run by a single endpoint, though it
      is assumed that both client and server participate.  As mentioned
      above, protocols such as HTTP/2, QUIC, and TLS 1.3 offer a
      mechanism by which applications can send padding.  WTF-PAD could
      therefore be implemented as an extension to any of these
      protocols, either by applications supplying padding distributions
      or by the system learning them over time.

   o  Wright et al. [wright2009traffic] developed traffic morphing,
      which pads packets so as to make the sequence from one page have
      characteristics of another (non-monitored or benign) page.  This
      technique requires application-specific knowledge about benign
      pages and is therefore best implemented outside of the transport
      layer.

   o  Nithyanand et al. [nithyanand2014glove] developed a mechanism
      called Glove, wherein traces are first clustered and then morphed
      (via dummy insertion, packet merging, splitting, and delaying) to
      look indistinguishable within clusters.  When used to protect the
      Alexa top 500 domains, Glove performs well with respect to
      bandwidth overhead when compared to BuFLO and CS-BuFLO.  Varying
      the cluster size can tune Glove's bandwidth overhead.

   o  Pironti et al. [pironti2012identifying] developed a TLS-based
      fragmentation and padding scheme designed to hide the length of
      application data within a certain range with record padding.  The
      mechanism works by iteratively splitting application data into
      variable-sized segments.  Applications can guide the range of
      viable lengths provided such information is available.

   o  Luo et al. [luo2011httpos] created HTTPS with Obfuscation
      (HTTPOS), which is a client-side mechanism for obfuscating HTTP
      traffic.  It uses the HTTP Range header to receive resources in
      chunks, the TCP MSS to limit the size of individual chunks, and
      the advertised window size to control the flow of chunks in
      transmission.

   o  Panchenko et al. [panchenko2011website] developed Decoy, which is
      a simple mechanism that loads a benign page alongside a real page.
      This seeks to mask the real page load by properties of the "decoy"
      page.  As with morphing, this defense requires
      application-specific knowledge about benign pages and is best
      implemented outside of the transport layer.

   o  The Tor project implemented HTTP pipelining
      [perry2011experimental], which bundles egress HTTP/1.1 requests
      into batches of varying sizes with random orders.  Batching
      requests to mask request and response sizes could be made easier
      with HTTP/2 [RFC7540], HTTP/3, and QUIC, since these protocols
      naturally support multiplexing.
      However, pipelining and batching may introduce latency that
      negatively impacts the user experience.

   o  Cherubin et al. [cherubin2017website] design two application-layer
      defenses called Application Layer Padding Concerns Adversaries
      (ALPaCA) and Lightweight application-Layer Masquerading Add-on
      (LLaMA).  ALPaCA is a server-side defense that pads first-party
      content (deterministically or probabilistically) according to a
      known distribution.  (Deterministic padding similar to Tamaraw
      performs worse than probabilistic padding.)  LLaMA is similar to
      randomized pipelining, yet differs in that requests are also
      delayed (if necessary) and spurious requests are generated
      according to some probability distribution.  Comparatively, ALPaCA
      yields a greater reduction in WF attack accuracy than LLaMA.

   o  Lu et al. [lu2018dynaflow] designed DynaFlow, which is a defense
      that dynamically adjusts traffic flows using a combination of
      burst pattern morphing, constant traffic flow with flexible
      intervals, and burst padding.  DynaFlow overhead is 40% less than
      that of Tamaraw and was shown to have similar benefits.

6.  Open Problems and Directions

   To date, WF attacks target clients running over Tor or some other
   anonymizing service, meaning that WF attacks are likely more accurate
   on normal TLS-protected connections.  Moreover, attacks normally
   assume clients use HTTP/1.1 with parallel connections for parallel
   resource fetches.  In recent years, however, protocols such as SPDY,
   HTTP/2, and QUIC, with built-in padding support and multiplexed
   stream-based connections, should make existing attacks more difficult
   to carry out.  That said, it is unclear how exactly these protocol
   design trends will impact WF attacks.  A non-exhaustive list of
   questions that warrant further research is below:

   1.  How do connection coalescing and consolidation affect WF
       attacks?
       Technologies such as DNS-over-HTTPS and ESNI favor architectures
       wherein a single network or connection can serve multiple origins
       or resources.  With connection coalescing, traffic for multiple
       resources is sent on the same connection, thereby adding effects
       similar to those of the Decoy defense mechanism described in
       Section 5.

   2.  To what extent does protocol multiplexing increase WF attack
       difficulty?  Using a single connection with multiple streams to
       avoid head-of-line (HoL) blocking saves on connection startup and
       bandwidth costs while simultaneously mixing information from
       multiple requests and resources on the same connection.

   3.  How can protocol features such as HTTP Push be used to improve WF
       defense efficacy?  Defenses without cooperative peer support
       often induce suboptimal bandwidth or latency costs.  If both
       endpoints of a connection participate in the defense, even
       proactively with Push, perhaps this could be improved.

   4.  Can connection bootstrapping techniques such as those used by
       ESNI be used to distribute WF defense information?  One possible
       approach is to distribute client padding profiles derived from
       CDN knowledge of serviced resources.

   5.  How can clients build, use, and possibly share WF defense
       information to benefit others?

   6.  How can applications package websites and subresources in such a
       way that limits unique information?  For example, websites link
       to third-party resources in an ad hoc fashion, causing the
       subsequent trace of browser fetches to possibly uniquely identify
       the website.

   Research into the above questions will help the IETF community better
   understand the extent to which WF attacks are a problem for Internet
   users in general.

   It is worth mentioning that traffic-based WF attacks may not be
   required to achieve the desired goal of learning a connection's
   destination.
   Network connections by nature reveal information about
   endpoint behavior.  For example, a connection to 8.8.8.8 indicates
   usage of Google's DNS service.  Likewise, a connection to any address
   in a Cloudflare IP address block indicates use of a service hosted by
   Cloudflare.  The relationship between network addresses and domains,
   especially when stable and unique, is a strong signal for website
   fingerprinting.  Trevisan et al. [trevisan2016towards] explored use
   of this signal as a reliable mechanism for website fingerprinting.
   They found that most major services (domains) have clearly associated
   IP address(es), though these addresses may change over time.  Jiang
   et al. [jiang2007lightweight] and Tammaro et al.
   [tammaro2012exploiting] previously reached the same conclusion.
   Thus, classifiers that rely solely on network addresses may be used
   to aid website fingerprinting attacks.

7.  Security Considerations

   This document surveys security and privacy attacks and defenses on
   encrypted TLS connections.  It does not introduce, specify, or
   recommend any particular mitigation for the aforementioned attacks.

8.  IANA Considerations

   This document makes no IANA requests.

9.  References

9.1.  Normative References

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
              November 1987.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6234]  Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms
              (SHA and SHA-based HMAC and HKDF)", RFC 6234,
              DOI 10.17487/RFC6234, May 2011.

9.2.  Informative References

   [abe2016fingerprinting]
              "Fingerprinting attack on tor anonymity using deep
              learning", Asia-Pacific Advanced Network, 2016.

   [backes2013preventing]
              "Preventing Side-Channel Leaks in Web Traffic -- A Formal
              Approach", NDSS, 2013.

   [bhat2018var]
              "Var-CNN and DynaFlow -- Improved Attacks and Defenses for
              Website Fingerprinting", arXiv preprint arXiv:1802.10215,
              n.d.

   [bissias2005privacy]
              "Privacy vulnerabilities in encrypted HTTP streams",
              International Workshop on Privacy Enhancing Technologies,
              2005.

   [cai2012touching]
              "Touching from a distance -- Website fingerprinting
              attacks and defenses", ACM conference on Computer and
              communications security, 2012.

   [cheng1998traffic]
              "Traffic analysis of SSL encrypted web browsing", n.d.

   [cherubin2017website]
              "Website fingerprinting defenses at the application
              layer", Privacy Enhancing Technologies, 2017.

   [coull2007web]
              "On Web Browsing Privacy in Anonymized NetFlows", USENIX
              Security Symposium, n.d.

   [dyer2012peek]
              "Peek-a-boo, i still see you -- Why efficient traffic
              analysis countermeasures fail", IEEE Symposium on Security
              and Privacy, 2012.

   [foremski2014dns]
              "DNS-Class -- immediate classification of IP flows using
              DNS", International Journal of Network Management, n.d.

   [gong2010fingerprinting]
              "Fingerprinting websites using remote traffic analysis",
              Proceedings of the 17th ACM conference on Computer and
              communications security, n.d.

   [hayes2016k]
              "k-fingerprinting -- A Robust Scalable Website
              Fingerprinting Technique", USENIX Security Symposium,
              2016.

   [herrmann2009website]
              "Website fingerprinting -- attacking popular privacy
              enhancing technologies with the multinomial naive-bayes
              classifier", ACM workshop on Cloud computing security,
              2009.

   [hintz2002fingerprinting]
              "Fingerprinting websites using traffic analysis",
              International Workshop on Privacy Enhancing Technologies,
              2002.

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-20 (work
              in progress), April 2019.

   [I-D.ietf-tls-esni]
              Rescorla, E., Oku, K., Sullivan, N., and C. Wood,
              "Encrypted Server Name Indication for TLS 1.3",
              draft-ietf-tls-esni-03 (work in progress), March 2019.

   [jiang2007lightweight]
              "Lightweight application classification for network
              management", SIGCOMM workshop on Internet network
              management, 2007.

   [juarez2014critical]
              "A critical evaluation of website fingerprinting attacks",
              ACM SIGSAC Conference on Computer and Communications
              Security, 2014.

   [juarez2015wtf]
              "WTF-PAD -- toward an efficient website fingerprinting
              defense for tor", CoRR, abs/1512.00524, n.d.

   [juarez2016toward]
              "Toward an efficient website fingerprinting defense",
              European Symposium on Research in Computer Security,
              2016.

   [li2018can]
              "Can We Learn What People Are Doing from Raw DNS
              Queries?", IEEE INFOCOM 2018-IEEE Conference on Computer
              Communications.

   [liberatore2006inferring]
              "Inferring the source of encrypted HTTP connections", ACM
              Conference on Computer and Communications Security, 2006.

   [lu2010website]
              "Website fingerprinting and identification using ordered
              feature sequences", European Symposium on Research in
              Computer Security, 2010.

   [lu2018dynaflow]
              "DynaFlow -- An Efficient Website Fingerprinting Defense
              Based on Dynamically-Adjusting Flows", Workshop on Privacy
              in the Electronic Society, 2018.

   [luo2011httpos]
              "HTTPOS -- Sealing Information Leaks with Browser-side
              Obfuscation of Encrypted Flows", NDSS, 2011.

   [miller2014know]
              "I know why you went to the clinic -- Risks and
              realization of https traffic analysis", International
              Symposium on Privacy Enhancing Technologies Symposium,
              2014.

   [nithyanand2014glove]
              "Glove -- A bespoke website fingerprinting defense",
              Proceedings of the 13th Workshop on Privacy in the
              Electronic Society, n.d.

   [oh2017pfp]
              "p-FP -- Extraction, Classification, and Prediction of
              Website Fingerprints with Deep Learning", n.d.

   [panchenko2011website]
              "Website fingerprinting in onion routing based
              anonymization networks", ACM workshop on Privacy in the
              electronic society, 2011.

   [perry2011experimental]
              "Experimental defense for website traffic fingerprinting",
              n.d.

   [pironti2012identifying]
              "Identifying website users by TLS traffic analysis -- New
              attacks and effective countermeasures", n.d.

   [rahman2018using]
              "Using Packet Timing Information in Website
              Fingerprinting", n.d.

   [reed2017identifying]
              "Identifying https-protected netflix videos in real-time",
              ACM on Conference on Data and Application Security and
              Privacy, 2017.

   [RFC7540]  Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
              Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
              DOI 10.17487/RFC7540, May 2015.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018.

   [RFC8484]  Hoffman, P. and P. McManus, "DNS Queries over HTTPS
              (DoH)", RFC 8484, DOI 10.17487/RFC8484, October 2018.

   [rimmer2018automated]
              "Automated website fingerprinting through deep learning",
              Network & Distributed System Security Symposium (NDSS),
              2018.

   [schuster2017beauty]
              "Beauty and the burst -- Remote identification of
              encrypted video streams", USENIX Security, 2017.

   [shmatikov2006timing]
              "Timing analysis in low-latency mix networks -- Attacks
              and defenses", European Symposium on Research in Computer
              Security, 2006.

   [shulman2014pretty]
              "Pretty bad privacy -- Pitfalls of DNS encryption",
              Workshop on Privacy in the Electronic Society, 2014.

   [siby2018dns]
              "DNS Privacy not so private -- the traffic analysis
              perspective", n.d.

   [sirinam2018deep]
              "Deep fingerprinting -- Undermining website fingerprinting
              defenses with deep learning", arXiv preprint
              arXiv:1801.02265, n.d.

   [sun2002statistical]
              "Statistical identification of encrypted web browsing
              traffic", IEEE, 2002.

   [tammaro2012exploiting]
              "Exploiting packet-sampling measurements for traffic
              characterization and classification", International
              Journal of Network Management, 2012.

   [trevisan2016towards]
              "Towards web service classification using addresses and
              DNS", 2016 International Wireless Communications and
              Mobile Computing Conference (IWCMC), IEEE, 2016.

   [wagner1996analysis]
              "Analysis of the SSL 3.0 protocol", USENIX Workshop on
              Electronic Commerce Proceedings, 1996.

   [wang2013improved]
              "Improved website fingerprinting on tor", Workshop on
              privacy in the electronic society, 2013.

   [wang2014comparing]
              "Comparing website fingerprinting attacks and defenses",
              Technical Report 2013-30, CACR, 2013.

   [wang2014effective]
              "Effective Attacks and Provable Defenses for Website
              Fingerprinting", USENIX Security Symposium, 2014.

   [wang2015walkie]
              "Walkie-talkie -- An effective and efficient defense
              against website fingerprinting", n.d.

   [wang2016website]
              "Website fingerprinting -- Attacks and defenses",
              University of Waterloo, n.d.

   [wright2009traffic]
              "Traffic Morphing -- An Efficient Defense Against
              Statistical Traffic Analysis", NDSS, 2009.

   [yan2018feature]
              "Feature selection for website fingerprinting",
              Proceedings on Privacy Enhancing Technologies, 2018.

Appendix A.  Acknowledgements

   The authors would like to thank Frederic Jacobs and Tim Taubert for
   feedback on earlier versions of this document.

Authors' Addresses

   Ian Goldberg
   University of Waterloo

   Email: iang@uwaterloo.ca


   Tao Wang
   HK University of Science and Technology

   Email: taow@cse.ust.hk


   Christopher A. Wood
   Apple, Inc.

   Email: cawood@apple.com