Network Working Group                                          R. Barnes
Internet-Draft
Intended status: Informational                               B. Schneier
Expires: September 12, 2015
                                                             C. Jennings

                                                               T. Hardie

                                                             B. Trammell

                                                              C. Huitema

                                                             D. Borkmann

                                                          March 11, 2015

  Confidentiality in the Face of Pervasive Surveillance: A Threat Model
                         and Problem Statement
              draft-iab-privsec-confidentiality-threat-04

Abstract

   Documents published since initial revelations in 2013 have revealed
   several classes of pervasive surveillance attack on Internet
   communications. In this document we develop a threat model that
   describes these pervasive attacks. We start by assuming an attacker
   with an interest in undetected, indiscriminate eavesdropping, then
   expand the threat model with a set of verified attacks that have been
   published.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 12, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1. Introduction

   Starting in June 2013, documents released to the press by Edward
   Snowden have revealed several operations undertaken by intelligence
   agencies to exploit Internet communications for intelligence
   purposes. These attacks were largely based on protocol
   vulnerabilities that were already known to exist. The attacks were
   nonetheless striking in their pervasive nature, both in terms of the
   amount of Internet communications targeted, and in terms of the
   diversity of attack techniques employed.

   To ensure that the Internet can be trusted by users, it is necessary
   for the Internet technical community to address the vulnerabilities
   exploited in these attacks [RFC7258]. The goal of this document is
   to describe more precisely the threats posed by these pervasive
   attacks, and based on those threats, lay out the problems that need
   to be solved in order to secure the Internet in the face of those
   threats.

   The remainder of this document is structured as follows. In
   Section 3, we describe an idealized passive pervasive attacker, one
   that could compromise communications at Internet scale without
   detection. In Section 4, we provide a brief summary of some
   attacks that have been disclosed, and use these to expand the assumed
   capabilities of our idealized attacker.
   Note that we do not attempt
   to describe all possible attacks, but focus on those which result in
   undetected eavesdropping. Section 5 describes a threat model based
   on these attacks, focusing on classes of attack that have not been a
   focus of Internet engineering to date.

2. Terminology

   This document makes extensive use of standard security and privacy
   terminology; see [RFC4949] and [RFC6973]. Terms used from [RFC6973]
   include Eavesdropper, Observer, Initiator, Intermediary, Recipient,
   Attack (in a privacy context), Correlation, Fingerprint, Traffic
   Analysis, and Identifiability (and related terms). In addition, we
   use a few terms that are specific to the attacks discussed in this
   document. Note especially that "passive" and "active" below do not
   refer to the effort used to mount the attack; a "passive attack" is
   any attack that accesses a flow but does not modify it, while an
   "active attack" is any attack that modifies a flow. Some passive
   attacks involve active interception and modifications of devices,
   rather than simple access to the medium. The introduced terms are:

   Pervasive Attack: An attack on Internet communications that makes
      use of access at a large number of points in the network, or
      otherwise provides the attacker with access to a large amount of
      Internet traffic; see [RFC7258].

   Passive Pervasive Attack: An eavesdropping attack undertaken by a
      pervasive attacker, in which the packets in a traffic stream
      between two endpoints are eavesdropped upon, but in which the
      attacker does not modify the packets in the traffic stream between
      two endpoints, modify the treatment of packets in the traffic
      stream (e.g., delay, routing), or add or remove packets in the
      traffic stream. Passive pervasive attacks are undetectable from
      the endpoints.
      Equivalent to passive wiretapping as defined in
      [RFC4949]; we use an alternate term here since the methods
      employed are wider than those implied by the word "wiretapping",
      including the active compromise of intermediate systems.

   Active Pervasive Attack: An attack undertaken by a pervasive
      attacker, which in addition to the elements of a passive pervasive
      attack, also includes modification, addition, or removal of
      packets in a traffic stream, or modification of treatment of
      packets in the traffic stream. Active pervasive attacks provide
      more capabilities to the attacker at the cost of possible
      detection at the endpoints. Equivalent to active wiretapping as
      defined in [RFC4949].

   Observation: Information collected directly from communications by
      an eavesdropper or observer. For example, the knowledge that
      a given sender address sent a message to a given recipient
      address via SMTP, taken from the headers of an observed SMTP
      message, would be an observation.

   Inference: Information extracted from analysis of information
      collected directly from communications by an eavesdropper or
      observer. For example, the knowledge that a given web page was
      accessed by a given IP address, by comparing the size in octets of
      measured network flow records to fingerprints derived from known
      sizes of linked resources on the web servers involved, would be an
      inference.

   Collaborator: An entity that is a legitimate participant in a
      communication, but who deliberately provides information about
      that interaction to an attacker.

   Unwitting Collaborator: An entity that is a legitimate participant
      in a communication, and who is the source of information obtained
      by the attacker without the entity's consent or intention, because
      the attacker has exploited some technology used by the entity.

   Key Exfiltration: The transmission of keying material for an
      encrypted communication from a collaborator, deliberately or
      unwittingly, to an attacker.

   Content Exfiltration: The transmission of the content of a
      communication from a collaborator, deliberately or unwittingly, to
      an attacker.

3. An Idealized Passive Pervasive Attacker

   In considering the threat posed by pervasive surveillance, we begin
   by defining an idealized passive pervasive attacker. While this
   attacker is less capable than those which we now know from press
   reports to have compromised the Internet, as elaborated in
   Section 4, it does set a lower bound on the capabilities of an
   attacker interested in indiscriminate passive surveillance who also
   wants to remain undetectable. We note that, prior to the
   Snowden revelations in 2013, the assumptions of attacker capability
   presented here would have been considered on the border of paranoia
   outside the network security community.

   Our idealized attacker is an indiscriminate eavesdropper on an
   Internet-attached computer network that:

   o  can observe every packet of all communications at any hop in any
      network path between an initiator and a recipient;

   o  can observe data at rest in any intermediate system between the
      endpoints controlled by the initiator and recipient; and

   o  can share information with other such attackers; but

   o  takes no other action with respect to these communications (i.e.,
      blocking, modification, injection, etc.).

   The techniques available to our ideal attacker are direct observation
   and inference. Direct observation involves taking information
   directly from eavesdropped communications, e.g., URLs identifying
   content or email addresses identifying individuals from
   application-layer headers.
   Inference, on the other hand, involves analyzing
   eavesdropped information to derive new information from it; e.g.,
   searching for application or behavioral fingerprints in observed
   traffic to derive information about the observed individual from
   them, in the absence of directly-observed sources of the same
   information. The use of encryption to protect confidentiality is
   generally enough to prevent direct observation of content,
   assuming uncompromised encryption implementations and key
   material. However, it provides less complete protection against
   inference, especially inference based only on unprotected portions of
   communications (e.g., IP and TCP headers for TLS [RFC5246]).

3.1. Information subject to direct observation

   Protocols which do not encrypt their payload make the entire content
   of the communication available to the idealized attacker along their
   path. Following the advice in [RFC3365], most such protocols have a
   secure variant which encrypts payload for confidentiality, and these
   secure variants are seeing ever-wider deployment. A noteworthy
   exception is DNS [RFC1035], as DNSSEC [RFC4033] does not have
   confidentiality as a requirement.

   This implies that, in the absence of changes to the protocol as
   presently under development in the IETF's DNS Private Exchange
   (DPRIVE) working group [I-D.ietf-dprive-problem-statement], all DNS
   queries and answers generated by the activities of any protocol are
   available to the attacker.

   Protocols which imply the storage of some data at rest in
   intermediaries (e.g., SMTP [RFC5321]) leave this data subject to
   observation by an attacker that has compromised these intermediaries,
   unless the data is encrypted end-to-end by the application layer
   protocol, or the implementation uses an encrypted store for this
   data.

3.2. Information useful for inference

   Inference is information extracted from later analysis of an observed
   or eavesdropped communication, and/or correlation of observed or
   eavesdropped information with information available from other
   sources. Indeed, most useful inference performed by the attacker
   falls under the rubric of correlation. The simplest example of this
   is observing DNS queries and answers from and to a source
   and correlating them with the IP addresses with which that source
   communicates. This can give access to information otherwise not
   available from encrypted application payloads (e.g., the Host:
   HTTP/1.1 request header when HTTP is used with TLS).

   Protocols which encrypt their payload using an application- or
   transport-layer encryption scheme (e.g., TLS) still expose all the
   information in their network and transport layer headers to the
   attacker, including source and destination addresses and ports.
   IPsec ESP [RFC4303] further encrypts the transport-layer headers, but
   still leaves IP address information unencrypted; in tunnel mode,
   these addresses correspond to the tunnel endpoints. Features of the
   cryptographic protocols themselves, e.g., the TLS session identifier,
   may leak information that can be used for correlation and inference.
   While this information is much less semantically rich than the
   application payload, it can still be useful for inferring an
   individual's activities.

   Inference can also leverage information obtained from sources other
   than direct traffic observation. Geolocation databases, for example,
   have been developed to map IP addresses to a location, in order to
   provide location-aware services such as targeted advertising. This
   location information is often of sufficient resolution that it can be
   used to draw further inferences toward identifying or profiling an
   individual.
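
   As a rough illustration of the DNS correlation described earlier in
   this section, the following Python sketch labels observed flows with
   the name a source most recently resolved to the flow's destination
   address. The function name label_flows and the record layouts are
   assumptions made for this example only; they do not describe any
   reported system, and a real attacker would also have to deal with
   caching, shared resolvers, and addresses that host many names.

```python
from collections import defaultdict


def label_flows(dns_answers, flows):
    """Attach the most recently resolved DNS name to each observed flow.

    Hypothetical record layouts (assumptions for this sketch):
      dns_answers: iterable of (timestamp, client_ip, qname, answer_ip)
      flows:       iterable of (timestamp, src_ip, dst_ip)
    Returns a list of (timestamp, src_ip, dst_ip, qname_or_None).
    """
    # (client, answer address) -> [(timestamp, qname), ...] in time order
    resolved = defaultdict(list)
    for ts, client, qname, answer in sorted(dns_answers):
        resolved[(client, answer)].append((ts, qname))

    labeled = []
    for ts, src, dst in sorted(flows):
        name = None
        # Keep the latest answer seen at or before the flow's timestamp.
        for seen_at, qname in resolved.get((src, dst), []):
            if seen_at > ts:
                break
            name = qname
        labeled.append((ts, src, dst, name))
    return labeled


if __name__ == "__main__":
    answers = [(1.0, "198.51.100.7", "www.example.org", "192.0.2.10")]
    observed = [(2.0, "198.51.100.7", "192.0.2.10")]
    for row in label_flows(answers, observed):
        print(row)
```

   Even though the flow payload itself may be encrypted, the label
   recovered here plays the same role as the Host: header that TLS
   hides.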

   Social media provide another source of more or less publicly
   accessible information. This information can be extremely
   semantically rich, including information about an individual's
   location, associations with other individuals and groups, and
   activities. Further, this information is generally contributed and
   curated voluntarily by the individuals themselves: it represents
   information which the individuals are not necessarily interested in
   protecting for privacy reasons. However, correlation of this social
   networking data with information available from direct observation of
   network traffic allows the creation of a much richer picture of an
   individual's activities than either alone.

   We note with some alarm that there is little that can be done at
   protocol design time to limit such correlation by the attacker, and
   that the existence of such data sources in many cases greatly
   complicates the problem of protecting privacy by hardening protocols
   alone.

3.3. An illustration of an ideal passive pervasive attack

   To illustrate how capable the idealized attacker is even given its
   limitations, we explore the non-anonymity of encrypted IP traffic in
   this section. Here we examine in detail some inference techniques
   for associating a set of addresses with an individual, in order to
   illustrate the difficulty of defending communications against our
   idealized attacker. The basic problem is that information
   radiated even from protocols which have no obvious connection with
   personal data can be correlated with other information to
   paint a very rich behavioral picture, and it takes only one
   unprotected link in the chain to associate that picture with an
   identity.

3.3.1. Analysis of IP headers

   Internet traffic can be monitored by tapping Internet links, or by
   installing monitoring tools in Internet routers.
   Of course, a single
   link or a single router only provides access to a fraction of the
   global Internet traffic. However, monitoring a number of high
   capacity links or a set of routers placed at strategic locations
   provides access to a good sampling of Internet traffic.

   Tools like IPFIX [RFC7011] allow administrators to acquire statistics
   about sequences of packets with some common properties that pass
   through a network device. The most common set of properties used in
   flow measurement is the "five-tuple" of source and destination
   addresses, protocol type, and source and destination ports. These
   statistics are commonly used for network engineering, but could
   certainly be used for other purposes.

   Let's assume for a moment that IP addresses can be correlated to
   specific services or specific users. Analysis of the sequences of
   packets will quickly reveal which users use what services, and also
   which users engage in peer-to-peer connections with other users.
   Analysis of traffic variations over time can be used to detect
   increased activity by particular users, or in the case of
   peer-to-peer connections increased activity within groups of users.

3.3.2. Correlation of IP addresses to user identities

   The correlation of IP addresses with specific users can be done in
   various ways. For example, tools like reverse DNS lookup can be used
   to retrieve the DNS names of servers. Since the addresses of servers
   tend to be quite stable and since servers are relatively less
   numerous than users, an attacker could easily maintain its own copy
   of the DNS for well-known or popular servers, to accelerate such
   lookups.

   On the other hand, the reverse lookup of IP addresses of users is
   generally less informative. For example, a lookup of the address
   currently used by one author's home network returns a name of the
   form "c-192-000-002-033.hsd1.wa.comcast.net".
   This particular type
   of reverse DNS lookup generally reveals only coarse-grained location
   or provider information, equivalent to that available from
   geolocation databases.

   In many jurisdictions, Internet Service Providers (ISPs) are required
   to provide identification on a case-by-case basis of the "owner" of a
   specific IP address for law enforcement purposes. This is a
   reasonably expedient process for targeted investigations, but
   pervasive surveillance requires something more efficient. This
   provides an incentive for the attacker to secure the cooperation of
   the ISP in order to automate this correlation.

3.3.3. Monitoring messaging clients for IP address correlation

   Even if the ISP does not cooperate, user identity can often be
   obtained via inference. POP3 [RFC1939] and IMAP [RFC3501] are used
   to retrieve mail from mail servers, while a variant of SMTP is used
   to submit messages through mail servers. IMAP connections originate
   from the client, and typically start with an authentication exchange
   in which the client proves its identity by answering a password
   challenge. The same holds for the SIP protocol [RFC3261] and many
   instant messaging services operating over the Internet using
   proprietary protocols.

   The username is directly observable if any of these protocols operate
   in cleartext; the username can then be directly associated with the
   source address.

3.3.4. Retrieving IP addresses from mail headers

   SMTP [RFC5321] requires that each successive SMTP relay add a
   "Received" header to the mail headers. The purpose of these headers
   is to enable audit of mail transmission, and perhaps to distinguish
   between regular mail and spam. Here is an extract from the headers
   of a message recently received from the "perpass" mailing list:

   "Received: from 192-000-002-044.zone13.example.org (HELO
      ?192.168.1.100?) (xxx.xxx.xxx.xxx) by
      lvps192-000-002-219.example.net with ESMTPSA
      (DHE-RSA-AES256-SHA encrypted, authenticated);
      27 Oct 2013 21:47:14 +0100
   Message-ID: <526D7BD2.7070908@example.org>
   Date: Sun, 27 Oct 2013 20:47:14 +0000
   From: Some One
   "

   This is the first "Received" header attached to the message by the
   first SMTP relay; for privacy reasons, the field values have been
   anonymized. We learn here that the message was submitted by "Some
   One" on October 27, from a host behind a NAT (192.168.1.100)
   [RFC1918] that used the IP address 192.0.2.44. The information
   remained in the message, and is accessible by all recipients of the
   "perpass" mailing list, or indeed by any attacker that sees at least
   one copy of the message.

   An attacker that can observe sufficient email traffic can regularly
   update the mapping between public IP addresses and individual email
   identities. Even if the SMTP traffic was encrypted on submission and
   relaying, the attacker can still receive a copy of public mailing
   lists like "perpass".

3.3.5. Tracking address usage with web cookies

   Many web sites only encrypt a small fraction of their transactions.
   A popular pattern is to use HTTPS for the login information, and then
   use a "cookie" to associate subsequent cleartext transactions with
   the user's identity. Cookies are also used by various advertisement
   services to quickly identify the users and serve them with
   "personalized" advertisements. Such cookies are particularly useful
   if the advertisement services want to keep tracking the user across
   multiple sessions that may use different IP addresses.

   As cookies are sent in cleartext, an attacker can build a database
   that associates cookies to IP addresses for non-HTTPS traffic. If
   the IP address is already identified, the cookie can be linked to the
   user identity.
   After that, if the same cookie appears on a new IP
   address, the new IP address can be immediately associated with the
   pre-determined identity.

3.3.6. Graph-based approaches to address correlation

   An attacker can track traffic from an IP address not yet associated
   with an individual to various public services (e.g., websites, mail
   servers, game servers), and exploit patterns in the observed traffic
   to correlate this address with other addresses that show similar
   patterns. For example, any two addresses that show connections to
   the same IMAP or webmail services, the same set of favorite websites,
   and game servers at similar times of day may be associated with the
   same individual. Correlated addresses can then be tied to an
   individual through one of the techniques above, walking the "network
   graph" to expand the set of attributable traffic.

3.3.7. Tracking of MAC Addresses

   Moving back down the stack, technologies like Ethernet or Wi-Fi use
   MAC Addresses to identify link-level destinations. MAC Addresses
   assigned according to IEEE-802 standards are unique to the device.
   If the link is publicly accessible, an attacker can track it. For
   example, the attacker can track the wireless traffic at public Wi-Fi
   networks. Simple devices can monitor the traffic, and reveal which
   MAC Addresses are present. If the network does not use some form of
   Wi-Fi encryption, or if the attacker can access the decrypted
   traffic, the analysis will also provide the correlation between MAC
   Addresses and IP addresses. Additional monitoring using techniques
   described in the previous sections will reveal the correlation
   between MAC Addresses, IP Addresses, and user identity.
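
   The chain from MAC Address to IP address to user identity can be
   sketched as a simple join. The Python below is only an illustration
   under assumed inputs: the function name link_macs_to_identities, the
   "lease" intervals during which a MAC Address was seen using an IP
   address, and the timestamped identity sightings (e.g., a cleartext
   username or an already-identified cookie) are all hypothetical and
   are not drawn from any reported system.

```python
def link_macs_to_identities(mac_ip_leases, ip_identity_sightings):
    """Join MAC/IP observations with IP/identity observations.

    Hypothetical record layouts (assumptions for this sketch):
      mac_ip_leases:         list of (mac, ip, start_time, end_time)
      ip_identity_sightings: list of (ip, identity, timestamp)
    Returns a dict mapping each MAC Address to the set of identities
    seen on one of its IP addresses while it held that address.
    """
    linked = {}
    for mac, ip, start, end in mac_ip_leases:
        for seen_ip, identity, ts in ip_identity_sightings:
            # Only count sightings that fall inside the lease interval,
            # since the same IP address may later be used by another
            # device.
            if seen_ip == ip and start <= ts <= end:
                linked.setdefault(mac, set()).add(identity)
    return linked


if __name__ == "__main__":
    leases = [("00:00:5e:00:53:01", "192.0.2.44", 100, 200)]
    sightings = [("192.0.2.44", "some.one", 150)]
    print(link_macs_to_identities(leases, sightings))
```

   The time-window check matters because the mapping between IP
   addresses and devices is not stable; without it, an identity would
   be attributed to every device that ever used the address.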

   Given that large-scale databases of the MAC addresses of wireless
   access points for geolocation purposes have been known to exist for
   some time, the attacker could easily build a database linking MAC
   Addresses and device or user identities, and use it to track the
   movement of devices and of their owners.

4. Reported Instances of Large-Scale Attacks

   The situation in reality is more bleak than that suggested by an
   analysis of our idealized attacker. Through revelations of sensitive
   documents in several media outlets, the Internet community has been
   made aware of several intelligence activities conducted by US and UK
   national intelligence agencies, particularly the US National Security
   Agency (NSA) and the UK Government Communications Headquarters
   (GCHQ). These documents have revealed methods that these agencies
   use to attack Internet applications and obtain sensitive user
   information. We note that these reports are primarily useful as an
   illustration of the types of capabilities fielded by pervasive
   attackers as of the date of the Snowden leaks in 2013.

   First, they confirm the deployment of large-scale passive collection
   of Internet traffic, and thus the existence of pervasive
   passive attackers with at least the capabilities of our idealized
   attacker. For example [pass1][pass2][pass3][pass4]:

   o  NSA's XKEYSCORE system accesses data from multiple access points
      and searches for "selectors" such as email addresses, at the scale
      of tens of terabytes of data per day.

   o  GCHQ's Tempora system appears to have access to around 1,500 major
      cables passing through the UK.

   o  NSA's MUSCULAR program has tapped cables between data centers
      belonging to major service providers.

   o  Several programs appear to perform wide-scale collection of
      cookies in web traffic and location data from location-aware
      portable devices such as smartphones.

   However, the capabilities described by these reports go beyond those
   of our idealized attacker. They include the compromise of
   cryptographic protocols, including decryption of TLS-protected
   Internet sessions [dec1][dec2][dec3]. For example, the NSA BULLRUN
   project worked to undermine encryption through multiple approaches,
   including covert modifications to cryptographic software on end
   systems.

   They also include the direct compromise of intermediate systems and
   arrangements with service providers for bulk data and metadata access
   [dir1][dir2][dir3], bypassing the need to capture traffic on the
   wire. For example, the NSA PRISM program provides the agency with
   access to many types of user data (e.g., email, chat, VoIP).

   The reported capabilities also include elements of active pervasive
   attack, including:

   o  Insertion of devices as a man-in-the-middle of Internet
      transactions [TOR1][TOR2]. For example, NSA's QUANTUM system
      appears to use several different techniques to hijack HTTP
      connections, ranging from DNS response injection to HTTP 302
      redirects.

   o  Use of implants on end systems to undermine security and anonymity
      features [dec2][TOR1][TOR2]. For example, QUANTUM is used to
      direct users to a FOXACID server, which in turn delivers an
      implant to compromise browsers of Tor users.

   o  Use of implants on network elements from many major equipment
      providers, including Cisco, Juniper, Huawei, Dell, and HP, as
      provided by the NSA's Advanced Network Technology group
      [spiegel1].

   o  Use of botnet-scale collections of compromised hosts [spiegel3].

   The scale of the compromise extends beyond the network to include
   subversion of the technical standards process itself. For example,
   there is suspicion that NSA modifications to the DUAL_EC_DRBG random
   number generator were made to ensure that keys generated using that
   generator could be predicted by NSA.
   This RNG was made part of
   NIST's SP 800-90A, for which NIST acknowledges NSA's assistance.

   There have also been reports that the NSA paid RSA Security for a
   related contract with the result that the generator became the
   default in the RSA BSAFE product line.

   We use the term "pervasive attack" [RFC7258] to collectively describe
   these operations. The term "pervasive" is used because the attacks
   are designed to indiscriminately gather as much data as possible and
   to apply selective analysis on targets after the fact. This means
   that all, or nearly all, Internet communications are targets for
   these attacks. To achieve this scale, the attacks are physically
   pervasive; they affect a large number of Internet communications.
   They are pervasive in content, consuming and exploiting any
   information revealed by the protocol. And they are pervasive in
   technology, exploiting many different vulnerabilities in many
   different protocols.

   It is important to note that although the attacks mentioned above
   were executed by NSA and GCHQ, there are many other organizations
   that can mount pervasive surveillance attacks. Because of the
   resources required to achieve pervasive scale, these attacks are most
   commonly undertaken by nation-state actors. For example, the Chinese
   Internet filtering system known as the "Great Firewall of China" uses
   several techniques that are similar to the QUANTUM program, and which
   have a high degree of pervasiveness with regard to the Internet in
   China.

5. Threat Model

   Given these disclosures, we must consider a broader threat model.

   Pervasive surveillance aims to collect information across a large
   number of Internet communications, analyzing the collected
   communications to identify information of interest within individual
   communications, or inferring information from correlated
   communications.
   This analysis sometimes benefits from decryption of
   encrypted communications and deanonymization of anonymized
   communications. As a result, these attackers desire access both to
   the bulk of Internet traffic and to the keying material required to
   decrypt any traffic that has been encrypted. Note that even if keys
   are not available, the presence of a communication and the fact
   that it is encrypted may both be inputs to an analysis.

   The attacks listed above highlight new avenues both for access to
   traffic and for access to relevant encryption keys. They further
   indicate that the scale of surveillance is sufficient to provide a
   general capability to cross-correlate communications, a threat not
   previously thought to be relevant at the scale of the Internet.

5.1. Attacker Capabilities

   +--------------------------+-------------------------------------+
   | Attack Class             | Capability                          |
   +--------------------------+-------------------------------------+
   | Passive observation      | Directly capture data in transit    |
   |                          |                                     |
   | Passive inference        | Infer from reduced/encrypted data   |
   |                          |                                     |
   | Active                   | Manipulate / inject data in transit |
   |                          |                                     |
   | Static key exfiltration  | Obtain key material once / rarely   |
   |                          |                                     |
   | Dynamic key exfiltration | Obtain per-session key material     |
   |                          |                                     |
   | Content exfiltration     | Access data at rest                 |
   +--------------------------+-------------------------------------+

   Security analyses of Internet protocols commonly consider two classes
   of attacker: passive pervasive attackers, who can simply listen in on
   communications as they transit the network, and active pervasive
   attackers, who can modify or delete packets in addition to simply
   collecting them.

   In the context of pervasive passive surveillance, these attacks take
   on an even greater significance.
   In the past, these attackers were
   often assumed to operate near the edge of the network, where attacks
   can be simpler. For example, in some LANs, it is simple for any node
   to engage in passive listening to other nodes' traffic or to inject
   packets to accomplish active pervasive attacks. However, as we now
   know, both passive and active pervasive attacks are undertaken by
   pervasive attackers closer to the core of the network, greatly
   expanding the scope and capability of the attacker.

   Eavesdropping and observation at a larger scale make passive
   inference attacks easier to carry out: a passive pervasive attacker
   with access to a large portion of the Internet can analyze collected
   traffic to create a much more detailed view of individual behavior
   than an attacker that collects at a single point. Even the usual
   claim that encryption defeats passive pervasive attackers is
   weakened, since a pervasive flow access attacker can infer
   relationships from correlations over large numbers of sessions, e.g.,
   pairing encrypted sessions with unencrypted sessions from the same
   host, or performing traffic fingerprinting between known and unknown
   encrypted sessions. Reports on the NSA XKEYSCORE system indicate
   that it is an example of such an attacker.

   An active pervasive attacker likewise has capabilities beyond those
   of a localized active attacker. Flow modification attacks are often
   limited by network topology, for example by a requirement that the
   attacker be able to see a targeted session as well as inject packets
   into it. A pervasive flow modification attacker with access at
   multiple points within the core of the Internet is able to overcome
   these topological limitations and perform attacks over a much broader
   scope.
   Being positioned in the core of the network rather than the
   edge can also enable an active pervasive attacker to reroute targeted
   traffic, amplifying the ability to perform both eavesdropping and
   traffic injection. Active pervasive attackers can also benefit from
   passive pervasive collection to identify vulnerable hosts.

   While not directly related to pervasiveness, attackers that are in a
   position to mount an active pervasive attack are also often in a
   position to subvert authentication, a traditional protection against
   such attacks. Authentication in the Internet is often achieved via
   trusted third party authorities such as the Certificate Authorities
   (CAs) that provide web sites with authentication credentials. An
   attacker with sufficient resources may also be able to induce an
   authority to grant credentials for an identity of the attacker's
   choosing. If the parties to a communication will trust multiple
   authorities to certify a specific identity, this attack may be
   mounted by suborning any one of the authorities (the proverbial
   "weakest link"). Subversion of authorities in this way can allow an
   active attack to succeed in spite of an authentication check.

   Beyond these three classes (observation, inference, and active),
   reports on the BULLRUN effort to defeat encryption and the PRISM
   effort to obtain data from service providers suggest three more
   classes of attack:

   o  Static key exfiltration

   o  Dynamic key exfiltration

   o  Content exfiltration

   These attacks all rely on a collaborator providing the attacker with
   some information, either keys or data. These attacks have not
   traditionally been considered in scope for the Security
   Considerations sections of IETF protocols, as they occur outside the
   protocol.

   The term "key exfiltration" refers to the transfer of keying material
   for an encrypted communication from the collaborator to the attacker.
   By "static", we mean that the transfer of keys happens once, or
   rarely, typically of a long-lived key. For example, this case would
   cover a web site operator that provides the private key corresponding
   to its HTTPS certificate to an intelligence agency.

   "Dynamic" key exfiltration, by contrast, refers to attacks in which
   the collaborator delivers keying material to the attacker frequently,
   e.g., on a per-session basis. This does not necessarily imply
   frequent communications with the attacker; the transfer of keying
   material may be virtual. For example, if an endpoint were modified
   in such a way that the attacker could predict the state of its
   pseudorandom number generator, then the attacker would be able to
   derive per-session keys even without per-session communications.

   Finally, content exfiltration is the attack in which the collaborator
   simply provides the attacker with the desired data or metadata.
   Unlike the key exfiltration cases, this attack does not require the
   attacker to capture the desired data as it flows through the network.
   The risk is to data at rest as opposed to data in transit. This
   increases the scope of data that the attacker can obtain, since the
   attacker can access historical data; the attacker does not have to
   be listening at the time the communication happens.

   Exfiltration attacks can be accomplished via attacks against one of
   the parties to a communication, i.e., by the attacker stealing the
   keys or content rather than the party providing them willingly. In
   these cases, the party may not be aware that they are collaborating,
   at least at a human level. Rather, the subverted technical assets
   are "collaborating" with the attacker (by providing keys/content)
   without their owner's knowledge or consent.

   Any party that has access to encryption keys or unencrypted data can
   be a collaborator.
   While collaborators are typically the endpoints
   of a communication (with encryption securing the links),
   intermediaries in an unencrypted communication can also facilitate
   content exfiltration attacks as collaborators by providing the
   attacker access to those communications. For example, documents
   describing the NSA PRISM program claim that NSA is able to access
   user data directly from servers, where it is stored unencrypted. In
   these cases, the operator of the server would be a collaborator, if
   an unwitting one. By contrast, in the NSA MUSCULAR program, a set of
   collaborators enabled attackers to access the cables connecting data
   centers used by service providers such as Google and Yahoo. Because
   communications among these data centers were not encrypted, the
   collaboration by an intermediate entity allowed NSA to collect
   unencrypted user data.

5.2.  Attacker Costs

   +--------------------------+-----------------------------------+
   | Attack Class             | Cost / Risk to Attacker           |
   +--------------------------+-----------------------------------+
   | Passive observation      | Passive data access               |
   |                          |                                   |
   | Passive inference        | Passive data access + processing  |
   |                          |                                   |
   | Active                   | Active data access + processing   |
   |                          |                                   |
   | Static key exfiltration  | One-time interaction              |
   |                          |                                   |
   | Dynamic key exfiltration | Ongoing interaction / code change |
   |                          |                                   |
   | Content exfiltration     | Ongoing, bulk interaction         |
   +--------------------------+-----------------------------------+

   Each of the attack types discussed in the previous section entails
   certain costs and risks. These costs differ by attack, and can be
   helpful in guiding response to pervasive attack.

   Depending on the attack, the attacker may be exposed to several types
   of risk, ranging from simply losing access to arrest or prosecution.
   In order for any of these negative consequences to occur, however,
   the attacker must first be discovered and identified. So the primary
   risk we focus on here is the risk of discovery and attribution.

   A passive pervasive attack is the simplest to mount in some ways.
   The base requirement is that the attacker obtain physical access to a
   communications medium and extract communications from it. For
   example, the attacker might tap a fiber-optic cable, acquire a mirror
   port on a switch, or listen to a wireless signal. The need for these
   taps to have physical access or proximity to a link exposes the
   attacker to the risk that the taps will be discovered. For example,
   a fiber tap or mirror port might be discovered by network operators
   noticing increased attenuation in the fiber or a change in switch
   configuration. Of course, passive pervasive attacks may be
   accomplished with the cooperation of the network operator, in which
   case there is a risk that the attacker's interactions with the
   network operator will be exposed.

   In many ways, the costs and risks for an active pervasive attack are
   similar to those for a passive pervasive attack, with a few
   additions. An active attacker requires more robust network access
   than a passive attacker, since, for example, they will often need to
   transmit data as well as receive it. In the wireless example
   above, the attacker would need to act as a transmitter as well as a
   receiver, greatly increasing the probability the attacker will be
   discovered (e.g., using direction-finding technology). Active
   attacks are also much more observable at higher layers of the
   network. For example, an active attacker that attempts to use a
   mis-issued certificate could be detected via Certificate Transparency
   [RFC6962].
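
   The detection idea in the preceding paragraph can be illustrated
   with a small sketch. It is not the RFC 6962 protocol or any real
   log client API: the LogEntry record, the unexpected_certificates
   helper, and all names and fingerprints below are hypothetical, and
   the public log is modeled as a plain in-memory list. The sketch
   only assumes that a domain owner knows which certificates it
   requested and can see every certificate logged for its name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical, simplified record of one certificate seen in a public log."""
    dns_name: str      # name the certificate asserts
    issuer: str        # authority that issued it
    fingerprint: str   # digest identifying the certificate


def unexpected_certificates(entries, domain, expected_fingerprints):
    """Return logged certificates for `domain` that the owner did not request."""
    return [
        entry for entry in entries
        if entry.dns_name == domain
        and entry.fingerprint not in expected_fingerprints
    ]


# Illustrative data only; the names and fingerprints are made up.
log = [
    LogEntry("www.example.com", "CA One", "aa11"),
    LogEntry("www.example.com", "CA Two", "bb22"),
    LogEntry("mail.example.net", "CA One", "cc33"),
]

# The owner of www.example.com only ever requested certificate "aa11".
suspects = unexpected_certificates(log, "www.example.com", {"aa11"})
for entry in suspects:
    print(f"unexpected certificate {entry.fingerprint} issued by {entry.issuer}")
```

   Under these assumptions, a certificate obtained by suborning a
   second authority appears in the log beside the legitimate one,
   which is what exposes the active attacker to discovery.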

   In terms of raw implementation complexity, passive pervasive attacks
   require only enough processing to extract information from the
   network and store it. Active pervasive attacks, by contrast, often
   depend on winning race conditions to inject packets into active
   connections. So active pervasive attacks in the core of the network
   require processing hardware that can operate at line speed
   (roughly 100Gbps to 1Tbps in the core) to identify opportunities for
   attack and insert attack traffic into high-volume traffic. Key
   exfiltration attacks rely on passive pervasive attack for access to
   encrypted data, with the collaborator providing keys to decrypt the
   data. So the attacker undertakes the cost and risk of a passive
   pervasive attack, as well as additional risk of discovery via the
   interactions that the attacker has with the collaborator.

   In this sense, static exfiltration has a lower risk profile than
   dynamic. In the static case, the attacker need only interact with
   the collaborator a small number of times, possibly only once, say to
   exchange a private key. In the dynamic case, the attacker must have
   continuing interactions with the collaborator. As noted above, these
   interactions may be real, such as in-person meetings, or virtual,
   such as software modifications that render keys available to the
   attacker. Both of these types of interactions introduce a risk that
   they will be discovered, e.g., by employees of the collaborator
   organization noticing suspicious meetings or suspicious code changes.

   Content exfiltration has a similar risk profile to dynamic key
   exfiltration. In a content exfiltration attack, the attacker saves
   the cost and risk of conducting a passive pervasive attack. The risk
   of discovery through interactions with the collaborator, however, is
   still present, and may be higher.
   The content of a communication is
   typically larger than the key used to encrypt it, often by several
   orders of magnitude. So in the content exfiltration case, the
   interactions between the collaborator and the attacker need to be
   much higher-bandwidth than in the key exfiltration cases, with a
   corresponding increase in the risk that this high-bandwidth channel
   will be discovered.

   It should also be noted that in these latter three exfiltration
   cases, the collaborator also undertakes a risk that its collaboration
   with the attacker will be discovered. Thus the attacker may have to
   incur additional cost in order to convince the collaborator to
   participate in the attack. Likewise, the scope of these attacks is
   limited to cases where the attacker can convince a collaborator to
   participate. If the attacker is a national government, for example,
   it may be able to compel participation within its borders, but have a
   much more difficult time recruiting foreign collaborators.

   As noted above, the collaborator in an exfiltration attack can be
   unwitting; the attacker can steal keys or data to enable the attack.
   In some ways, the risks of this approach are similar to the case of
   an active collaborator. In the static case, the attacker needs to
   steal information from the collaborator once; in the dynamic case,
   the attacker needs a continued presence inside the collaborator's
   systems. The main difference is that the risk in this case is of
   automated discovery (e.g., by intrusion detection systems) rather
   than discovery by humans.

6.  Security Considerations

   This document describes a threat model for pervasive surveillance
   attacks. Mitigations are to be given in a future document.

7.  IANA Considerations

   This document has no actions for IANA.

8.  Acknowledgements

   Thanks to Dave Thaler for the list of attacks and taxonomy; to
   Security Area Directors Stephen Farrell, Sean Turner, and Kathleen
   Moriarty for starting and managing the IETF's discussion on pervasive
   attack; and to Stephan Neuhaus, Mark Townsley, Chris Inacio,
   Evangelos Halepilidis, Bjoern Hoehrmann, Aziz Mohaisen, Russ Housley,
   and the IAB Privacy and Security Program for their input.

9.  References

9.1.  Normative References

   [RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
              Morris, J., Hansen, M., and R. Smith, "Privacy
              Considerations for Internet Protocols", RFC 6973, July
              2013.

9.2.  Informative References

   [pass1]    The Guardian, "How the NSA is still harvesting your online
              data", 2013.

   [pass2]    The Guardian, "NSA's Prism surveillance program: how it
              works and what it can do", 2013.

   [pass3]    The Guardian, "XKeyscore: NSA tool collects 'nearly
              everything a user does on the internet'", 2013.

   [pass4]    The Guardian, "How does GCHQ's internet surveillance
              work?", n.d.

   [dec1]     The New York Times, "N.S.A. Able to Foil Basic Safeguards
              of Privacy on Web", 2013.

   [dec2]     The Guardian, "Project Bullrun - classification guide to
              the NSA's decryption program", 2013.

   [dec3]     The Guardian, "Revealed: how US and UK spy agencies defeat
              internet privacy and security", 2013.

   [TOR]      The Tor Project, "Tor", 2013.

   [TOR1]     Schneier, B., "How the NSA Attacks Tor/Firefox Users With
              QUANTUM and FOXACID", 2013.

   [TOR2]     The Guardian, "'Tor Stinks' presentation - read the full
              document", 2013.

   [dir1]     The Guardian, "NSA collecting phone records of millions of
              Verizon customers daily", 2013.

   [dir2]     The Guardian, "NSA Prism program taps in to user data of
              Apple, Google and others", 2013.

   [dir3]     The Guardian, "Sigint - how the NSA collaborates with
              technology companies", 2013.

   [secure]   Schneier, B., "NSA surveillance: A guide to staying
              secure", 2013.

   [snowden]  Technology Review, "NSA Leak Leaves Crypto-Math Intact but
              Highlights Known Workarounds", 2013.

   [spiegel1]
              Stocker, C., "NSA's Secret Toolbox: Unit Offers Spy
              Gadgets for Every Need", December 2013.

   [spiegel3]
              Schmundt, H., "The Digital Arms Race: NSA Preps America
              for Future Battle", January 2014.

   [key-recovery]
              Golle, P., "The Design and Implementation of Protocol-
              Based Hidden Key Recovery", 2003.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

   [RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and
              E. Lear, "Address Allocation for Private Internets", BCP
              5, RFC 1918, February 1996.

   [RFC1939]  Myers, J. and M. Rose, "Post Office Protocol - Version 3",
              STD 53, RFC 1939, May 1996.

   [RFC2015]  Elkins, M., "MIME Security with Pretty Good Privacy
              (PGP)", RFC 2015, October 1996.

   [RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
              April 2001.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3365]  Schiller, J., "Strong Security Requirements for Internet
              Engineering Task Force Standard Protocols", BCP 61, RFC
              3365, August 2002.

   [RFC3501]  Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
              4rev1", RFC 3501, March 2003.

   [RFC3851]  Ramsdell, B., "Secure/Multipurpose Internet Mail
              Extensions (S/MIME) Version 3.1 Message Specification",
              RFC 3851, July 2004.

   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "DNS Security Introduction and Requirements", RFC
              4033, March 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)", RFC
              4303, December 2005.

   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", RFC
              4306, December 2005.

   [RFC4949]  Shirey, R., "Internet Security Glossary, Version 2", RFC
              4949, August 2007.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
              October 2008.

   [RFC5655]  Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
              Wagner, "Specification of the IP Flow Information Export
              (IPFIX) File Format", RFC 5655, October 2009.

   [RFC5750]  Ramsdell, B. and S. Turner, "Secure/Multipurpose Internet
              Mail Extensions (S/MIME) Version 3.2 Certificate
              Handling", RFC 5750, January 2010.

   [RFC6120]  Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Core", RFC 6120, March 2011.

   [RFC6962]  Laurie, B., Langley, A., and E. Kasper, "Certificate
              Transparency", RFC 6962, June 2013.

   [RFC6698]  Hoffman, P. and J. Schlyter, "The DNS-Based Authentication
              of Named Entities (DANE) Transport Layer Security (TLS)
              Protocol: TLSA", RFC 6698, August 2012.

   [RFC7011]  Claise, B., Trammell, B., and P. Aitken, "Specification of
              the IP Flow Information Export (IPFIX) Protocol for the
              Exchange of Flow Information", STD 77, RFC 7011, September
              2013.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
              Attack", BCP 188, RFC 7258, May 2014.

   [I-D.ietf-dprive-problem-statement]
              Bortzmeyer, S., "DNS privacy considerations", draft-ietf-
              dprive-problem-statement-03 (work in progress), March
              2015.

Authors' Addresses

   Richard Barnes

   Email: rlb@ipv.sx


   Bruce Schneier

   Email: schneier@schneier.com


   Cullen Jennings

   Email: fluffy@cisco.com


   Ted Hardie

   Email: ted.ietf@gmail.com


   Brian Trammell

   Email: ietf@trammell.ch


   Christian Huitema

   Email: huitema@huitema.net


   Daniel Borkmann

   Email: dborkman@redhat.com