Network Working Group                                          R. Barnes
Internet-Draft
Intended status: Informational                               B. Schneier
Expires: November 12, 2015                                   C. Jennings
                                                                T. Hardie
                                                              B. Trammell
                                                               C. Huitema
                                                              D. Borkmann
                                                              May 11, 2015

  Confidentiality in the Face of Pervasive Surveillance: A Threat Model
                           and Problem Statement
                draft-iab-privsec-confidentiality-threat-06

Abstract

Since the initial revelations of pervasive surveillance in 2013, several classes of attacks on Internet communications have been discovered. In this document we develop a threat model that describes these attacks on Internet confidentiality. We assume an attacker that is interested in undetected, indiscriminate eavesdropping. The threat model is based on published, verified attacks.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on November 12, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

Starting in June 2013, documents released to the press by Edward Snowden have revealed several operations undertaken by intelligence agencies to exploit Internet communications for intelligence purposes. These attacks were largely based on protocol vulnerabilities that were already known to exist. The attacks were nonetheless striking in their pervasive nature, both in terms of the amount of Internet communications targeted and in terms of the diversity of attack techniques employed.

To ensure that the Internet can be trusted by users, it is necessary for the Internet technical community to address the vulnerabilities exploited in these attacks [RFC7258]. The goal of this document is to describe more precisely the threats posed by these pervasive attacks and, based on those threats, lay out the problems that need to be solved in order to secure the Internet in the face of those threats.

The remainder of this document is structured as follows. In Section 3, we describe an idealized passive pervasive attacker, one which could compromise communications at Internet scale while remaining completely undetectable. In Section 4, we provide a brief summary of some attacks that have been disclosed, and use these to expand the assumed capabilities of our idealized attacker. Note that we do not attempt to describe all possible attacks, but focus on those which result in undetected eavesdropping.
Section 5 describes a threat model based on these attacks, focusing on classes of attack that have not been a focus of Internet engineering to date.

2. Terminology

This document makes extensive use of standard security and privacy terminology; see [RFC4949] and [RFC6973]. Terms used from [RFC6973] include Eavesdropper, Observer, Initiator, Intermediary, Recipient, Attack (in a privacy context), Correlation, Fingerprint, Traffic Analysis, and Identifiability (and related terms). In addition, we use a few terms that are specific to the attacks discussed in this document. Note especially that "passive" and "active" below do not refer to the effort used to mount the attack; a "passive attack" is any attack that accesses a flow but does not modify it, while an "active attack" is any attack that modifies a flow. Some passive attacks involve active interception and modifications of devices, rather than simple access to the medium. The introduced terms are:

Pervasive Attack: An attack on Internet communications that makes use of access at a large number of points in the network, or otherwise provides the attacker with access to a large amount of Internet traffic; see [RFC7258].

Passive Pervasive Attack: An eavesdropping attack undertaken by a pervasive attacker, in which the packets in a traffic stream between two endpoints are intercepted, but in which the attacker does not modify the packets in the traffic stream between two endpoints, modify the treatment of packets in the traffic stream (e.g., delay, routing), or add or remove packets in the traffic stream. Passive pervasive attacks are undetectable from the endpoints. Equivalent to passive wiretapping as defined in [RFC4949]; we use an alternate term here since the methods employed are wider than those implied by the word "wiretapping", including the active compromise of intermediate systems.

Active Pervasive Attack: An attack undertaken by a pervasive attacker, which in addition to the elements of a passive pervasive attack, also includes modification, addition, or removal of packets in a traffic stream, or modification of treatment of packets in the traffic stream. Active pervasive attacks provide more capabilities to the attacker at the risk of possible detection at the endpoints. Equivalent to active wiretapping as defined in [RFC4949].

Observation: Information collected directly from communications by an eavesdropper or observer. For example, the knowledge that <alice@example.com> sent a message to <bob@example.com> via SMTP, taken from the headers of an observed SMTP message, would be an observation.

Inference: Information extracted from analysis of information collected directly from communications by an eavesdropper or observer. For example, the knowledge that a given web page was accessed by a given IP address, by comparing the size in octets of measured network flow records to fingerprints derived from known sizes of linked resources on the web servers involved, would be an inference.

Collaborator: An entity that is a legitimate participant in a communication, but who deliberately provides information about that interaction to an attacker.
Unwitting Collaborator: An entity that is a legitimate participant in a communication, and who is the source of information obtained by the attacker without the entity's consent or intention, because the attacker has exploited some technology used by the entity.

Key Exfiltration: The transmission of cryptographic keying material for an encrypted communication from a collaborator, deliberately or unwittingly, to an attacker.

Content Exfiltration: The transmission of the content of a communication from a collaborator, deliberately or unwittingly, to an attacker.

3. An Idealized Passive Pervasive Attacker

In considering the threat posed by pervasive surveillance, we begin by defining an idealized passive pervasive attacker. While this attacker is less capable than those which we now know from press reports to have compromised the Internet, as elaborated in Section 4, it does set a lower bound on the capabilities of an attacker interested in indiscriminate passive surveillance while remaining undetectable. We note that, prior to the Snowden revelations in 2013, the assumptions of attacker capability presented here would be considered on the border of paranoia outside the network security community.

Our idealized attacker is an indiscriminate eavesdropper on an Internet-attached computer network that:

o  can observe every packet of all communications at any hop in any network path between an initiator and a recipient;

o  can observe data at rest in any intermediate system between the endpoints controlled by the initiator and recipient; and

o  can share information with other such attackers; but

o  takes no other action with respect to these communications (i.e., blocking, modification, injection, etc.).

The techniques available to our ideal attacker are direct observation and inference. Direct observation involves taking information directly from eavesdropped communications, such as URLs identifying content or email addresses identifying individuals from application-layer headers. Inference, on the other hand, involves analyzing observed information to derive new information, such as searching for application or behavioral fingerprints in observed traffic to derive information about the observed individual. The use of encryption is generally sufficient to provide confidentiality by preventing direct observation of content, assuming, of course, uncompromised encryption implementations and cryptographic keying material. However, encryption provides less complete protection against inference, especially inferences based only on plaintext portions of communications, such as IP and TCP headers for TLS-protected traffic [RFC5246].

3.1. Information subject to direct observation

Protocols which do not encrypt their payload make the entire content of the communication available to the idealized attacker along their path. Following the advice in [RFC3365], most such protocols have a secure variant which encrypts payload for confidentiality, and these secure variants are seeing ever-wider deployment. A noteworthy exception is DNS [RFC1035], as DNSSEC [RFC4033] does not have confidentiality as a requirement.
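As a purely illustrative sketch of how directly such cleartext protocols expose their content, the following Python fragment records every DNS question name visible at a capture point. It assumes the third-party "scapy" packet library and sufficient capture privileges; it is not part of any system described in this document.

   # Illustrative only: log DNS query names visible to a passive
   # observer on a tapped link (cleartext UDP port 53 traffic).
   from scapy.all import sniff, DNS, DNSQR, IP

   def log_query(pkt):
       # DNS queries have qr == 0; the question name travels in clear.
       if pkt.haslayer(DNS) and pkt[DNS].qr == 0 and pkt.haslayer(DNSQR):
           src = pkt[IP].src if pkt.haslayer(IP) else "?"
           qname = pkt[DNSQR].qname.decode(errors="replace")
           print(src, "asked for", qname)

   sniff(filter="udp port 53", prn=log_query, store=False)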
This implies that, in the absence of changes to the protocol as presently under development in the IETF's DNS Private Exchange (DPRIVE) working group [I-D.ietf-dprive-problem-statement], all DNS queries and answers generated by the activities of any protocol are available to the attacker.

When store-and-forward protocols are used (e.g., SMTP [RFC5321]), intermediaries leave this data subject to observation by an attacker that has compromised these intermediaries, unless the data is encrypted end-to-end by the application-layer protocol, or the implementation uses an encrypted store for this data.

3.2. Information useful for inference

Inference is information extracted from later analysis of an observed or eavesdropped communication, and/or correlation of observed or eavesdropped information with information available from other sources. Indeed, most useful inference performed by the attacker falls under the rubric of correlation. The simplest example of this is to observe the DNS queries and answers from and to a source and correlate those with the IP addresses with which that source communicates. This can give access to information otherwise not available from encrypted application payloads (e.g., the Host: HTTP/1.1 request header when HTTP is used with TLS).

Protocols which encrypt their payload using an application- or transport-layer encryption scheme (e.g., TLS) still expose all the information in their network and transport layer headers to the attacker, including source and destination addresses and ports. IPsec ESP [RFC4303] further encrypts the transport-layer headers, but still leaves IP address information unencrypted; in tunnel mode, these addresses correspond to the tunnel endpoints. Features of the security protocols themselves, e.g., the TLS session identifier, may leak information that can be used for correlation and inference. While this information is much less semantically rich than the application payload, it can still be useful for inferring an individual's activities.

Inference can also leverage information obtained from sources other than direct traffic observation. Geolocation databases, for example, have been developed that map IP addresses to a location, in order to provide location-aware services such as targeted advertising. This location information is often of sufficient resolution that it can be used to draw further inferences toward identifying or profiling an individual.

Social media provide another source of more or less publicly accessible information. This information can be extremely semantically rich, including information about an individual's location, associations with other individuals and groups, and activities. Further, this information is generally contributed and curated voluntarily by the individuals themselves: it represents information which the individuals are not necessarily interested in protecting for privacy reasons. However, correlation of this social networking data with information available from direct observation of network traffic allows the creation of a much richer picture of an individual's activities than either alone.
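To make the DNS-based correlation described above concrete, the following sketch (illustrative only; the record format and function names are hypothetical, not drawn from any real monitoring system) labels an observed flow with the hostnames that the same client previously resolved to the flow's destination address, recovering application-level names even when the flow itself is encrypted:

   # Illustrative only: correlate passively observed DNS answers with
   # later flows from the same client.
   resolved = {}   # (client_ip, answer_ip) -> set of hostnames

   def saw_dns_answer(client_ip, hostname, answer_ip):
       resolved.setdefault((client_ip, answer_ip), set()).add(hostname)

   def label_flow(client_ip, server_ip):
       return sorted(resolved.get((client_ip, server_ip), {"(unknown)"}))

   # One observed DNS answer, then an encrypted flow to the same address.
   saw_dns_answer("192.0.2.33", "mail.example.com", "198.51.100.7")
   print(label_flow("192.0.2.33", "198.51.100.7"))  # ['mail.example.com']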
We note with some alarm that there is little that can be done at protocol design time to limit such correlation by the attacker, and that the existence of such data sources in many cases greatly complicates the problem of protecting privacy by hardening protocols alone.

3.3. An illustration of an ideal passive pervasive attack

To illustrate how capable the idealized attacker is even given its limitations, we explore the non-anonymity of encrypted IP traffic in this section. Here we examine in detail some inference techniques for associating a set of addresses with an individual, in order to illustrate the difficulty of defending communications against our idealized attacker. The basic problem is that information radiated even from protocols which have no obvious connection with personal data can be correlated with other information to paint a very rich behavioral picture; it then takes only one unprotected link in the chain to associate that picture with an identity.

3.3.1. Analysis of IP headers

Internet traffic can be monitored by tapping Internet links, or by installing monitoring tools in Internet routers. Of course, a single link or a single router only provides access to a fraction of the global Internet traffic. However, monitoring a number of high-capacity links or a set of routers placed at strategic locations provides access to a good sampling of Internet traffic.

Tools like IPFIX [RFC7011] allow administrators to acquire statistics about sequences of packets with some common properties that pass through a network device. The most common set of properties used in flow measurement is the "five-tuple" of source and destination addresses, protocol type, and source and destination ports. These statistics are commonly used for network engineering, but could certainly be used for other purposes.

Let's assume for a moment that IP addresses can be correlated to specific services or specific users. Analysis of the sequences of packets will quickly reveal which users use what services, and also which users engage in peer-to-peer connections with other users. Analysis of traffic variations over time can be used to detect increased activity by particular users, or in the case of peer-to-peer connections, increased activity within groups of users.

3.3.2. Correlation of IP addresses to user identities

The correlation of IP addresses with specific users can be done in various ways. For example, tools like reverse DNS lookup can be used to retrieve the DNS names of servers. Since the addresses of servers tend to be quite stable and since servers are relatively less numerous than users, an attacker could easily maintain its own copy of the DNS for well-known or popular servers, to accelerate such lookups.

On the other hand, the reverse lookup of IP addresses of users is generally less informative. For example, a lookup of the address currently used by one author's home network returns a name of the form "c-192-000-002-033.hsd1.wa.comcast.net". This particular type of reverse DNS lookup generally reveals only coarse-grained location or provider information, equivalent to that available from geolocation databases.
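A minimal sketch of such a lookup, using only the Python standard library and an IPv4 documentation address (illustrative only; actual results depend on the resolver's view of the reverse zone):

   # Illustrative only: reverse DNS lookup of an observed address.
   # Server addresses often yield stable, descriptive names; user
   # addresses typically yield only coarse provider or location strings.
   import socket

   def reverse_lookup(addr):
       try:
           name, _aliases, _addrs = socket.gethostbyaddr(addr)
           return name
       except OSError:
           return None

   print(reverse_lookup("192.0.2.33"))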
In many jurisdictions, Internet Service Providers (ISPs) are required to provide identification on a case-by-case basis of the "owner" of a specific IP address for law enforcement purposes. This is a reasonably expedient process for targeted investigations, but pervasive surveillance requires something more efficient. This provides an incentive for the attacker to secure the cooperation of the ISP in order to automate this correlation.

3.3.3. Monitoring messaging clients for IP address correlation

Even if the ISP does not cooperate, user identity can often be obtained via inference. POP3 [RFC1939] and IMAP [RFC3501] are used to retrieve mail from mail servers, while a variant of SMTP is used to submit messages through mail servers. IMAP connections originate from the client, and typically start with an authentication exchange in which the client proves its identity by answering a password challenge. The same holds for the SIP protocol [RFC3261] and many instant messaging services operating over the Internet using proprietary protocols.

The username is directly observable if any of these protocols operate in cleartext; the username can then be directly associated with the source address.

3.3.4. Retrieving IP addresses from mail headers

SMTP [RFC5321] requires that each successive SMTP relay adds a "Received" header to the mail headers. The purpose of these headers is to enable audit of mail transmission, and perhaps to distinguish between regular mail and spam. Here is an extract from the headers of a message recently received from the "perpass" mailing list:

   "Received: from 192-000-002-044.zone13.example.org (HELO
   ?192.168.1.100?) (xxx.xxx.xxx.xxx) by lvps192-000-002-219.example.net
   with ESMTPSA (DHE-RSA-AES256-SHA encrypted, authenticated);
   27 Oct 2013 21:47:14 +0100
   Message-ID: <526D7BD2.7070908@example.org>
   Date: Sun, 27 Oct 2013 20:47:14 +0000
   From: Some One <some.one@example.org>"

This is the first "Received" header attached to the message by the first SMTP relay; for privacy reasons, the field values have been anonymized. We learn here that the message was submitted by "Some One" on October 27, from a host behind a NAT (192.168.1.100) [RFC1918] that used the IP address 192.0.2.44. The information remained in the message, and is accessible by all recipients of the "perpass" mailing list, or indeed by any attacker that sees at least one copy of the message.

An attacker that can observe sufficient email traffic can regularly update the mapping between public IP addresses and individual email identities. Even if the SMTP traffic was encrypted on submission and relaying, the attacker can still receive a copy of public mailing lists like "perpass".

3.3.5. Tracking address usage with web cookies

Many web sites only encrypt a small fraction of their transactions. A popular pattern is to use HTTPS for the login information, and then use a "cookie" to associate following clear-text transactions with the user's identity. Cookies are also used by various advertisement services to quickly identify the users and serve them with "personalized" advertisements. Such cookies are particularly useful if the advertisement services want to keep tracking the user across multiple sessions that may use different IP addresses.
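As an illustration of the tracking value of such cookies, the following minimal sketch (the cookie and identity values are hypothetical) shows how an observer of cleartext HTTP can accumulate a mapping from each cookie value to the set of source addresses that have presented it and, where learned, the associated user identity:

   # Illustrative only: link cleartext cookies to source addresses and,
   # where known, to a user identity.
   cookie_to_identity = {}    # cookie value -> identity, once learned
   cookie_to_addresses = {}   # cookie value -> set of source addresses

   def observe_http(cookie, src_addr, identity=None):
       cookie_to_addresses.setdefault(cookie, set()).add(src_addr)
       if identity is not None:
           cookie_to_identity[cookie] = identity
       # Every address that ever presented this cookie inherits the
       # identity associated with it.
       return cookie_to_identity.get(cookie), cookie_to_addresses[cookie]

   observe_http("session=abc123", "192.0.2.33",
                identity="some.one@example.org")
   print(observe_http("session=abc123", "198.51.100.7"))
   # ('some.one@example.org', {'192.0.2.33', '198.51.100.7'})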
As cookies are sent in clear text, an attacker can build a database that associates cookies to IP addresses for non-HTTPS traffic. If the IP address is already identified, the cookie can be linked to the user identity. After that, if the same cookie appears on a new IP address, the new IP address can be immediately associated with the pre-determined identity.

3.3.6. Graph-based approaches to address correlation

An attacker can track traffic from an IP address not yet associated with an individual to various public services (e.g., websites, mail servers, game servers), and exploit patterns in the observed traffic to correlate this address with other addresses that show similar patterns. For example, any two addresses that show connections to the same IMAP or webmail services, the same set of favorite websites, and game servers at similar times of day may be associated with the same individual. Correlated addresses can then be tied to an individual through one of the techniques above, walking the "network graph" to expand the set of attributable traffic.

3.3.7. Tracking of Link Layer Identifiers

Moving back down the stack, technologies like Ethernet or Wi-Fi use MAC Addresses to identify link-level destinations. MAC Addresses assigned according to IEEE-802 standards are globally unique identifiers for the device. If the link is publicly accessible, an attacker can eavesdrop and perform tracking. For example, the attacker can track the wireless traffic at publicly accessible Wi-Fi networks. Simple devices can monitor the traffic, and reveal which MAC Addresses are present. Also, devices do not need to be connected to a network to expose link-layer identifiers. Active service discovery always discloses the MAC address of the user, and sometimes the SSIDs of previously visited networks. For instance, certain techniques such as the use of "hidden SSIDs" require the mobile device to broadcast the network identifier together with the device identifier. This combination can further expose the user to inference attacks, as more information can be derived from the combination of MAC address, SSID being probed, time, and current location. For example, a user actively probing for a semi-unique SSID on a flight out of a certain city can imply that the user is no longer at the physical location of the corresponding AP. Given that large-scale databases of the MAC addresses of wireless access points for geolocation purposes have been known to exist for some time, the attacker could easily build a database linking link-layer identifiers, time, and device or user identities, and use it to track the movement of devices and of their owners.

On the other hand, if the network does not use some form of Wi-Fi encryption, or if the attacker can access the decrypted traffic, the analysis will also provide the correlation between link-layer identifiers such as MAC Addresses and IP addresses. Additional monitoring using techniques exposed in the previous sections will reveal the correlation between MAC addresses, IP addresses, and user identity. For instance, similarly to the use of web cookies, MAC addresses provide identity information that can be used to associate a user to different IP addresses.
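As an illustrative sketch of how little equipment such link-layer tracking requires, the following fragment records which devices are probing nearby and for which SSIDs. It assumes the third-party "scapy" library and a wireless interface already placed in monitor mode; the interface name is a hypothetical example.

   # Illustrative only: observe 802.11 probe requests in monitor mode
   # and count (MAC address, probed SSID) pairs.
   from scapy.all import sniff, Dot11, Dot11ProbeReq, Dot11Elt

   seen = {}   # (MAC address, SSID) -> number of probe requests

   def log_probe(pkt):
       if pkt.haslayer(Dot11ProbeReq) and pkt.haslayer(Dot11Elt):
           mac = pkt[Dot11].addr2
           ssid = pkt[Dot11Elt].info.decode(errors="replace") or "(broadcast)"
           seen[(mac, ssid)] = seen.get((mac, ssid), 0) + 1

   sniff(iface="wlan0mon", prn=log_probe, store=False, timeout=60)
   print(seen)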
4. Reported Instances of Large-Scale Attacks

The situation in reality is more bleak than that suggested by an analysis of our idealized attacker. Through revelations of sensitive documents in several media outlets, the Internet community has been made aware of several intelligence activities conducted by US and UK national intelligence agencies, particularly the US National Security Agency (NSA) and the UK Government Communications Headquarters (GCHQ). These documents have revealed methods that these agencies use to attack Internet applications and obtain sensitive user information. We note that these reports are primarily useful as an illustration of the types of capabilities fielded by pervasive attackers as of the date of the Snowden leaks in 2013.

First, they confirm the deployment of large-scale passive collection of Internet traffic, demonstrating the existence of pervasive passive attackers with at least the capabilities of our idealized attacker. For example [pass1][pass2][pass3][pass4]:

o  NSA's XKEYSCORE system accesses data from multiple access points and searches for "selectors" such as email addresses, at the scale of tens of terabytes of data per day.

o  GCHQ's Tempora system appears to have access to around 1,500 major cables passing through the UK.

o  NSA's MUSCULAR program has tapped cables between data centers belonging to major service providers.

o  Several programs appear to perform wide-scale collection of cookies in web traffic and location data from location-aware portable devices such as smartphones.

However, the capabilities described by these reports go beyond those of our idealized attacker. They include the compromise of cryptographic protocols, including decryption of TLS-protected Internet sessions [dec1][dec2][dec3]. For example, the NSA BULLRUN project worked to undermine encryption through multiple approaches, including covert modifications to cryptographic software on end systems.

Reported capabilities include the direct compromise of intermediate systems and arrangements with service providers for bulk data and metadata access [dir1][dir2][dir3], bypassing the need to capture traffic on the wire. For example, the NSA PRISM program provides the agency with access to many types of user data (e.g., email, chat, VoIP).

The reported capabilities also include elements of active pervasive attack, including:

o  Insertion of devices as a man-in-the-middle of Internet transactions [TOR1][TOR2]. For example, NSA's QUANTUM system appears to use several different techniques to hijack HTTP connections, ranging from DNS response injection to HTTP 302 redirects.

o  Use of implants on end systems to undermine security and anonymity features [dec2][TOR1][TOR2]. For example, QUANTUM is used to direct users to a FOXACID server, which in turn delivers an implant to compromise browsers of Tor users.

o  Use of implants on network elements from many major equipment providers, including Cisco, Juniper, Huawei, Dell, and HP, as provided by the NSA's Advanced Network Technology group [spiegel1].

o  Use of botnet-scale collections of compromised hosts [spiegel3].

The scale of the compromise extends beyond the network to include subversion of the technical standards process itself.
For example, there is suspicion that NSA modifications to the DUAL_EC_DRBG random number generator were made to ensure that keys generated using that generator could be predicted by NSA. This RNG was made part of NIST's SP 800-90A, for which NIST acknowledges NSA's assistance. There have also been reports that the NSA paid RSA Security for a related contract, with the result that DUAL_EC_DRBG became the default random number generator in the RSA BSAFE product line.

We use the term "pervasive attack" [RFC7258] to collectively describe these operations. The term "pervasive" is used because the attacks are designed to indiscriminately gather as much data as possible and to apply selective analysis on targets after the fact. This means that all, or nearly all, Internet communications are targets for these attacks. To achieve this scale, the attacks are physically pervasive; they affect a large number of Internet communications. They are pervasive in content, consuming and exploiting any information revealed by the protocol. And they are pervasive in technology, exploiting many different vulnerabilities in many different protocols.

It is important to note that although the attacks mentioned above were executed by NSA and GCHQ, there are many other organizations that can mount pervasive surveillance attacks. Because of the resources required to achieve pervasive scale, these attacks are most commonly undertaken by nation-state actors. For example, the Chinese Internet filtering system known as the "Great Firewall of China" uses several techniques that are similar to the QUANTUM program, and which have a high degree of pervasiveness with regard to the Internet in China.

5. Threat Model

Given these disclosures, we must consider a broader threat model.

Pervasive surveillance aims to collect information across a large number of Internet communications, analyzing the collected communications to identify information of interest within individual communications, or inferring information from correlated communications. This analysis sometimes benefits from decryption of encrypted communications and deanonymization of anonymized communications. As a result, these attackers desire both access to the bulk of Internet traffic and to the keying material required to decrypt any traffic that has been encrypted. Note that even if the attacker cannot decrypt a communication, the presence of that communication and the fact that it is encrypted may both be inputs to an analysis.

The attacks listed above highlight new avenues both for access to traffic and for access to relevant encryption keys. They further indicate that the scale of surveillance is sufficient to provide a general capability to cross-correlate communications, a threat not previously thought to be relevant at the scale of the Internet.
5.1. Attacker Capabilities

   +--------------------------+-------------------------------------+
   | Attack Class             | Capability                          |
   +--------------------------+-------------------------------------+
   | Passive observation      | Directly capture data in transit    |
   |                          |                                     |
   | Passive inference        | Infer from reduced/encrypted data   |
   |                          |                                     |
   | Active                   | Manipulate / inject data in transit |
   |                          |                                     |
   | Static key exfiltration  | Obtain key material once / rarely   |
   |                          |                                     |
   | Dynamic key exfiltration | Obtain per-session key material     |
   |                          |                                     |
   | Content exfiltration     | Access data at rest                 |
   +--------------------------+-------------------------------------+

Security analyses of Internet protocols commonly consider two classes of attacker: passive pervasive attackers, who can simply listen in on communications as they transit the network, and active pervasive attackers, who can modify or delete packets in addition to simply collecting them.

In the context of pervasive passive surveillance, these attacks take on an even greater significance. In the past, these attackers were often assumed to operate near the edge of the network, where attacks can be simpler. For example, in some LANs, it is simple for any node to engage in passive listening to other nodes' traffic or inject packets to accomplish active pervasive attacks. However, as we now know, both passive and active pervasive attacks are undertaken by pervasive attackers closer to the core of the network, greatly expanding the scope and capability of the attacker.

Eavesdropping and observation at a larger scale make passive inference attacks easier to carry out: a passive pervasive attacker with access to a large portion of the Internet can analyze collected traffic to create a much more detailed view of individual behavior than an attacker that collects at a single point. Even the usual claim that encryption defeats passive pervasive attackers is weakened, since a pervasive flow access attacker can infer relationships from correlations over large numbers of sessions, e.g., pairing encrypted sessions with unencrypted sessions from the same host, or performing traffic fingerprinting between known and unknown encrypted sessions. Reports on the NSA XKEYSCORE system would indicate it is an example of such an attacker.

An active pervasive attacker likewise has capabilities beyond those of a localized active attacker. Flow modification attacks are often limited by network topology, for example by a requirement that the attacker be able to see a targeted session as well as inject packets into it. A pervasive flow modification attacker with access at multiple points within the core of the Internet is able to overcome these topological limitations and perform attacks over a much broader scope. Being positioned in the core of the network rather than the edge can also enable an active pervasive attacker to reroute targeted traffic, amplifying the ability to perform both eavesdropping and traffic injection. Active pervasive attackers can also benefit from passive pervasive collection to identify vulnerable hosts.

While not directly related to pervasiveness, attackers that are in a position to mount an active pervasive attack are also often in a position to subvert authentication, a traditional protection against such attacks.
Authentication in the Internet is often achieved via trusted third-party authorities such as the Certificate Authorities (CAs) that provide web sites with authentication credentials. An attacker with sufficient resources may also be able to induce an authority to grant credentials for an identity of the attacker's choosing. If the parties to a communication will trust multiple authorities to certify a specific identity, this attack may be mounted by suborning any one of the authorities (the proverbial "weakest link"). Subversion of authorities in this way can allow an active attack to succeed in spite of an authentication check.

Beyond these three classes (observation, inference, and active), reports on the BULLRUN effort to defeat encryption and the PRISM effort to obtain data from service providers suggest three more classes of attack:

o  Static key exfiltration

o  Dynamic key exfiltration

o  Content exfiltration

These attacks all rely on a collaborator providing the attacker with some information, either keys or data. These attacks have not traditionally been considered in scope for the Security Considerations sections of IETF protocols, as they occur outside the protocol.

The term "key exfiltration" refers to the transfer of keying material for an encrypted communication from the collaborator to the attacker. By "static", we mean that the transfer of keys happens once, or rarely, typically involving a long-lived key. For example, this case would cover a web site operator that provides the private key corresponding to its HTTPS certificate to an intelligence agency.

"Dynamic" key exfiltration, by contrast, refers to attacks in which the collaborator delivers keying material to the attacker frequently, e.g., on a per-session basis. This does not necessarily imply frequent communications with the attacker; the transfer of keying material may be virtual. For example, if an endpoint were modified in such a way that the attacker could predict the state of its pseudorandom number generator, then the attacker would be able to derive per-session keys even without per-session communications.

Finally, content exfiltration is the attack in which the collaborator simply provides the attacker with the desired data or metadata. Unlike the key exfiltration cases, this attack does not require the attacker to capture the desired data as it flows through the network. The exfiltration is of data at rest, rather than data in transit. This increases the scope of data that the attacker can obtain, since the attacker can access historical data - the attacker does not have to be listening at the time the communication happens.

Exfiltration attacks can be accomplished via attacks against one of the parties to a communication, i.e., by the attacker stealing the keys or content rather than the party providing them willingly. In these cases, the party may not be aware that they are collaborating, at least at a human level. Rather, the subverted technical assets are "collaborating" with the attacker (by providing keys/content) without their owner's knowledge or consent.

Any party that has access to encryption keys or unencrypted data can be a collaborator.
While collaborators are typically the endpoints of a communication (with encryption securing the links), intermediaries in an unencrypted communication can also facilitate content exfiltration attacks as collaborators by providing the attacker access to those communications. For example, documents describing the NSA PRISM program claim that NSA is able to access user data directly from servers, where it is stored unencrypted. In these cases, the operator of the server would be a collaborator, if an unwitting one. By contrast, in the NSA MUSCULAR program, a set of collaborators enabled attackers to access the cables connecting data centers used by service providers such as Google and Yahoo. Because communications among these data centers were not encrypted, the collaboration by an intermediate entity allowed NSA to collect unencrypted user data.

5.2. Attacker Costs

   +--------------------------+-----------------------------------+
   | Attack Class             | Cost / Risk to Attacker           |
   +--------------------------+-----------------------------------+
   | Passive observation      | Passive data access               |
   |                          |                                   |
   | Passive inference        | Passive data access + processing  |
   |                          |                                   |
   | Active                   | Active data access + processing   |
   |                          |                                   |
   | Static key exfiltration  | One-time interaction              |
   |                          |                                   |
   | Dynamic key exfiltration | Ongoing interaction / code change |
   |                          |                                   |
   | Content exfiltration     | Ongoing, bulk interaction         |
   +--------------------------+-----------------------------------+

Each of the attack types discussed in the previous section entails certain costs and risks. These costs differ by attack, and can be helpful in guiding response to pervasive attack.

Depending on the attack, the attacker may be exposed to several types of risk, ranging from simply losing access to arrest or prosecution. In order for any of these negative consequences to occur, however, the attacker must first be discovered and identified. So the primary risk we focus on here is the risk of discovery and attribution.

A passive pervasive attack is the simplest to mount in some ways. The base requirement is that the attacker obtain physical access to a communications medium and extract communications from it. For example, the attacker might tap a fiber-optic cable, acquire a mirror port on a switch, or listen to a wireless signal. The need for these taps to have physical access or proximity to a link exposes the attacker to the risk that the taps will be discovered. For example, a fiber tap or mirror port might be discovered by network operators noticing increased attenuation in the fiber or a change in switch configuration. Of course, passive pervasive attacks may be accomplished with the cooperation of the network operator, in which case there is a risk that the attacker's interactions with the network operator will be exposed.

In many ways, the costs and risks for an active pervasive attack are similar to those for a passive pervasive attack, with a few additions. An active attacker requires more robust network access than a passive attacker, since, for example, they will often need to transmit data as well as receive it. In the wireless example above, the attacker would need to act as a transmitter as well as a receiver, greatly increasing the probability the attacker will be discovered (e.g., using direction-finding technology).
Active attacks are also much more observable at higher layers of the network. For example, an active attacker that attempts to use a mis-issued certificate could be detected via Certificate Transparency [RFC6962].

In terms of raw implementation complexity, passive pervasive attacks require only enough processing to extract information from the network and store it. Active pervasive attacks, by contrast, often depend on winning race conditions to inject packets into active connections. So active pervasive attacks in the core of the network require processing hardware that can operate at line speed (roughly 100 Gbps to 1 Tbps in the core) to identify opportunities for attack and insert attack traffic into a high-volume traffic stream. Key exfiltration attacks rely on passive pervasive attack for access to encrypted data, with the collaborator providing keys to decrypt the data. So the attacker undertakes the cost and risk of a passive pervasive attack, as well as the additional risk of discovery via the interactions that the attacker has with the collaborator.

Some active attacks are more expensive than others. For example, active man-in-the-middle (MITM) attacks require access to one or more points on a communication's network path that allow visibility of the entire session and the ability to modify or drop legitimate packets in favor of the attacker's packets. A similar but weaker form of attack, called an active man-on-the-side (MOTS), requires access to only part of the session. In an active MOTS attack, the attacker need only be able to inject or modify traffic on the network element the attacker has access to. While this may not allow for full control of a communication session (as in an MITM attack), the attacker can perform a number of powerful attacks, including but not limited to: injecting packets that could terminate the session (e.g., TCP RST packets), sending a fake DNS reply to redirect ensuing TCP connections to an address of the attacker's choice (i.e., winning a "DNS response race"), and mounting an HTTP redirect attack by observing a TCP/HTTP connection to a target address and injecting a TCP data packet containing an HTTP redirect. For example, the system dubbed by researchers as China's "Great Cannon" [great-cannon] can operate in full MITM mode to accomplish very complex attacks that can modify content in transit, while the well-known Great Firewall of China is a MOTS system that focuses on blocking access to certain kinds of traffic and destinations via TCP RST packet injection.

In this sense, static exfiltration has a lower risk profile than dynamic. In the static case, the attacker need only interact with the collaborator a small number of times, possibly only once, say to exchange a private key. In the dynamic case, the attacker must have continuing interactions with the collaborator. As noted above, these interactions may be real, such as in-person meetings, or virtual, such as software modifications that render keys available to the attacker. Both of these types of interactions introduce a risk that they will be discovered, e.g., by employees of the collaborator organization noticing suspicious meetings or suspicious code changes.

Content exfiltration has a similar risk profile to dynamic key exfiltration.
In a content exfiltration attack, the attacker saves the cost and risk of conducting a passive pervasive attack. The risk of discovery through interactions with the collaborator, however, is still present, and may be higher. The content of a communication is obviously larger than the key used to encrypt it, often by several orders of magnitude. So in the content exfiltration case, the interactions between the collaborator and the attacker need to be much higher-bandwidth than in the key exfiltration cases, with a corresponding increase in the risk that this high-bandwidth channel will be discovered.

It should also be noted that in these latter three exfiltration cases, the collaborator also undertakes a risk that its collaboration with the attacker will be discovered. Thus the attacker may have to incur additional cost in order to convince the collaborator to participate in the attack. Likewise, the scope of these attacks is limited to cases where the attacker can convince a collaborator to participate. If the attacker is a national government, for example, it may be able to compel participation within its borders, but have a much more difficult time recruiting foreign collaborators.

As noted above, the collaborator in an exfiltration attack can be unwitting; the attacker can steal keys or data to enable the attack. In some ways, the risks of this approach are similar to the case of an active collaborator. In the static case, the attacker needs to steal information from the collaborator once; in the dynamic case, the attacker needs continued presence inside the collaborator's systems. The main difference is that the risk in this case is of automated discovery (e.g., by intrusion detection systems) rather than discovery by humans.

6. Security Considerations

This document describes a threat model for pervasive surveillance attacks. Mitigations are to be given in a future document.

7. IANA Considerations

This document has no actions for IANA.

8. Acknowledgements

Thanks to Dave Thaler for the list of attacks and taxonomy; to Security Area Directors Stephen Farrell, Sean Turner, and Kathleen Moriarty for starting and managing the IETF's discussion on pervasive attack; and to Stephan Neuhaus, Mark Townsley, Chris Inacio, Evangelos Halepilidis, Bjoern Hoehrmann, Aziz Mohaisen, Russ Housley, and the IAB Privacy and Security Program for their input.

9. References

9.1. Normative References

[RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J., Morris, J., Hansen, M., and R. Smith, "Privacy Considerations for Internet Protocols", RFC 6973, July 2013.

9.2. Informative References

[pass1]    The Guardian, "How the NSA is still harvesting your online data", 2013.

[pass2]    The Guardian, "NSA's Prism surveillance program: how it works and what it can do", 2013.

[pass3]    The Guardian, "XKeyscore: NSA tool collects 'nearly everything a user does on the internet'", 2013.

[pass4]    The Guardian, "How does GCHQ's internet surveillance work?", n.d.

[dec1]     The New York Times, "N.S.A. Able to Foil Basic Safeguards of Privacy on Web", 2013.

[dec2]     The Guardian, "Project Bullrun - classification guide to the NSA's decryption program", 2013.
[dec3]     The Guardian, "Revealed: how US and UK spy agencies defeat internet privacy and security", 2013.

[TOR]      The Tor Project, "Tor", 2013.

[TOR1]     Schneier, B., "How the NSA Attacks Tor/Firefox Users With QUANTUM and FOXACID", 2013.

[TOR2]     The Guardian, "'Tor Stinks' presentation - read the full document", 2013.

[dir1]     The Guardian, "NSA collecting phone records of millions of Verizon customers daily", 2013.

[dir2]     The Guardian, "NSA Prism program taps in to user data of Apple, Google and others", 2013.

[dir3]     The Guardian, "Sigint - how the NSA collaborates with technology companies", 2013.

[secure]   Schneier, B., "NSA surveillance: A guide to staying secure", 2013.

[snowden]  Technology Review, "NSA Leak Leaves Crypto-Math Intact but Highlights Known Workarounds", 2013.

[spiegel1] Stocker, C., "NSA's Secret Toolbox: Unit Offers Spy Gadgets for Every Need", December 2013.

[spiegel3] Schmundt, H., "The Digital Arms Race: NSA Preps America for Future Battle", January 2014.

[key-recovery]
           Golle, P., "The Design and Implementation of Protocol-Based Hidden Key Recovery", 2003.

[great-cannon]
           Paxson, V., "China's Great Cannon", 2015.

[RFC1035]  Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, November 1987.

[RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E. Lear, "Address Allocation for Private Internets", BCP 5, RFC 1918, February 1996.

[RFC1939]  Myers, J. and M. Rose, "Post Office Protocol - Version 3", STD 53, RFC 1939, May 1996.

[RFC2015]  Elkins, M., "MIME Security with Pretty Good Privacy (PGP)", RFC 2015, October 1996.

[RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821, April 2001.

[RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[RFC3365]  Schiller, J., "Strong Security Requirements for Internet Engineering Task Force Standard Protocols", BCP 61, RFC 3365, August 2002.

[RFC3501]  Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1", RFC 3501, March 2003.

[RFC3851]  Ramsdell, B., "Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.1 Message Specification", RFC 3851, July 2004.

[RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose, "DNS Security Introduction and Requirements", RFC 4033, March 2005.

[RFC4301]  Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

[RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)", RFC 4303, December 2005.

[RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", RFC 4306, December 2005.

[RFC4949]  Shirey, R., "Internet Security Glossary, Version 2", RFC 4949, August 2007.

[RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008.

[RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, October 2008.

[RFC5655]  Trammell, B., Boschi, E., Mark, L., Zseby, T., and A. Wagner, "Specification of the IP Flow Information Export (IPFIX) File Format", RFC 5655, October 2009.

[RFC5750]  Ramsdell, B. and S.
Turner, "Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 Certificate Handling", RFC 5750, January 2010.

[RFC6120]  Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Core", RFC 6120, March 2011.

[RFC6698]  Hoffman, P. and J. Schlyter, "The DNS-Based Authentication of Named Entities (DANE) Transport Layer Security (TLS) Protocol: TLSA", RFC 6698, August 2012.

[RFC6962]  Laurie, B., Langley, A., and E. Kasper, "Certificate Transparency", RFC 6962, June 2013.

[RFC7011]  Claise, B., Trammell, B., and P. Aitken, "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information", STD 77, RFC 7011, September 2013.

[RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an Attack", BCP 188, RFC 7258, May 2014.

[I-D.ietf-dprive-problem-statement]
           Bortzmeyer, S., "DNS privacy considerations", draft-ietf-dprive-problem-statement-02 (work in progress), February 2015.

Authors' Addresses

Richard Barnes

   Email: rlb@ipv.sx

Bruce Schneier

   Email: schneier@schneier.com

Cullen Jennings

   Email: fluffy@cisco.com

Ted Hardie

   Email: ted.ietf@gmail.com

Brian Trammell

   Email: ietf@trammell.ch

Christian Huitema

   Email: huitema@huitema.net

Daniel Borkmann

   Email: dborkman@iogearbox.net