Network Working Group                                          R. Barnes
Internet-Draft
Intended status: Informational                               B. Schneier
Expires: August 10, 2015                                     C. Jennings
                                                               T. Hardie
                                                             B. Trammell
                                                              C. Huitema
                                                             D. Borkmann
                                                       February 06, 2015

 Confidentiality in the Face of Pervasive Surveillance: A Threat Model
                         and Problem Statement
              draft-iab-privsec-confidentiality-threat-02

Abstract

Documents published in 2013 revealed several classes of pervasive surveillance attack on Internet communications. In this document we develop a threat model that describes these pervasive attacks. We start by assuming a completely passive attacker with an interest in undetected, indiscriminate eavesdropping, then expand the threat model with a set of verified attacks that have been published. Based on this threat model, we discuss the techniques that can be employed in Internet protocol design to increase the protocols' robustness to pervasive surveillance.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 10, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

Starting in June 2013, documents released to the press by Edward Snowden have revealed several operations undertaken by intelligence agencies to exploit Internet communications for intelligence purposes. These attacks were largely based on protocol vulnerabilities that were already known to exist. The attacks were nonetheless striking in their pervasive nature, both in terms of the amount of Internet communications targeted and in terms of the diversity of attack techniques employed.

To ensure that the Internet can be trusted by users, it is necessary for the Internet technical community to address the vulnerabilities exploited in these attacks [RFC7258]. The goal of this document is to describe more precisely the threats posed by these pervasive attacks and, based on those threats, to lay out the problems that need to be solved in order to secure the Internet in the face of them.

The remainder of this document is structured as follows. In Section 3, we describe an idealized passive attacker, one that could compromise communications at Internet scale while remaining completely undetectable. In Section 4, we provide a brief summary of some attacks that have been disclosed, and use these to expand the assumed capabilities of our idealized attacker.
Section 5 describes a threat model based on these attacks, focusing on classes of attack that have not been a focus of Internet engineering to date.

2. Terminology

This document makes extensive use of standard security and privacy terminology; see [RFC4949] and [RFC6973]. Terms used from [RFC6973] include Eavesdropper, Observer, Initiator, Intermediary, Recipient, Attack (in a privacy context), Correlation, Fingerprint, Traffic Analysis, and Identifiability (and related terms). In addition, we use a few terms that are specific to the attacks discussed here:

Passive Attack: In this document, the term passive attack is used with respect to the traffic stream: a passive attack does not modify the packets in the traffic stream between two endpoints, modify the treatment of packets in the traffic stream (e.g., delay, routing), or add or remove packets in the traffic stream. Passive attacks are undetectable from the endpoints.

Active Attack: In contrast to a passive attack, an active attack may modify a traffic stream, at the cost of possible detection at the endpoints.

Pervasive Attack: An attack on Internet communications that makes use of access at a large number of points in the network, or otherwise provides the attacker with access to a large amount of Internet traffic; see [RFC7258].

Observation: Information collected directly from communications by an eavesdropper or observer. For example, the knowledge that <alice@example.com> sent a message to <bob@example.com> via SMTP, taken from the headers of an observed SMTP message, would be an observation.

Inference: Information extracted from analysis of information collected directly from communications by an eavesdropper or observer.
For example, the knowledge that a given web page was accessed by a given IP address, obtained by comparing the size in octets of measured network flow records to fingerprints derived from known sizes of linked resources on the web servers involved, would be an inference.

Collaborator: An entity that is a legitimate participant in a communication, but who deliberately provides information about that interaction to an attacker.

Unwitting Collaborator: An entity that is a legitimate participant in a communication, and who is the source of information obtained by the attacker without the entity's consent or intention, because the attacker has exploited some technology used by the entity.

Key Exfiltration: The transmission of keying material for an encrypted communication from a collaborator, deliberately or unwittingly, to an attacker.

Content Exfiltration: The transmission of the content of a communication from a collaborator, deliberately or unwittingly, to an attacker.

3. An Idealized Pervasive Passive Attacker

In considering the threat posed by pervasive surveillance, we begin by defining an idealized pervasive passive attacker. While this attacker is less capable than those which press reports now show to have compromised the Internet, as elaborated in Section 4, it does set a lower bound on the capabilities of an attacker interested in indiscriminate passive surveillance while remaining undetectable. We note that, prior to the Snowden revelations in 2013, the assumptions of attacker capability presented here would have been considered on the border of paranoia outside the network security community.
Our idealized attacker is an indiscriminate eavesdropper on an Internet-attached computer network that:

o  can observe every packet of all communications at any hop in any network path between an initiator and a recipient;

o  can observe data at rest in any intermediate system between the endpoints controlled by the initiator and recipient; and

o  can share information with other such attackers; but

o  takes no other action with respect to these communications (i.e., blocking, modification, injection, etc.).

The techniques available to our ideal attacker are direct observation and inference. Direct observation involves taking information directly from eavesdropped communications - e.g., URLs identifying content or email addresses identifying individuals, taken from application-layer headers. Inference, on the other hand, involves analyzing eavesdropped information to derive new information from it; e.g., searching for application or behavioral fingerprints in observed traffic to derive information about the observed individual, in the absence of directly observed sources of the same information. The use of encryption to protect confidentiality is generally enough to prevent direct observation of unencrypted content, assuming uncompromised encryption implementations and key material. However, it provides less complete protection against inference, especially inference based only on unprotected portions of communications (e.g., IP and TCP headers for TLS [RFC5246]).

3.1. Information subject to direct observation

Protocols which do not encrypt their payload make the entire content of the communication available to the idealized attacker along their path. Following the advice in [RFC3365], most such protocols have a secure variant which encrypts payload for confidentiality, and these secure variants are seeing ever-wider deployment.
A noteworthy exception is DNS [RFC1035], as DNSSEC [RFC4033] does not have confidentiality as a requirement. This implies that, in the absence of changes to the protocol as presently under development in the DPRIVE working group, all DNS queries and answers generated by the activities of any protocol are available to the attacker.

Protocols which imply the storage of some data at rest in intermediaries (e.g., SMTP [RFC5321]) leave this data subject to observation by an attacker that has compromised these intermediaries, unless the data is encrypted end-to-end by the application-layer protocol or the implementation uses an encrypted store for this data.

3.2. Information useful for inference

Inference is information extracted from later analysis of an observed or eavesdropped communication, and/or correlation of observed or eavesdropped information with information available from other sources. Indeed, most useful inference performed by the attacker falls under the rubric of correlation. The simplest example of this is observing the DNS queries and answers from and to a source and correlating those with the IP addresses with which that source communicates. This can give access to information otherwise not available from encrypted application payloads (e.g., the Host: request header when HTTP/1.1 is used with TLS).

Protocols which encrypt their payload using an application- or transport-layer encryption scheme (e.g., TLS) still expose all the information in their network- and transport-layer headers to the attacker, including source and destination addresses and ports. IPsec ESP [RFC4303] further encrypts the transport-layer headers, but still leaves IP address information unencrypted; in tunnel mode, these addresses correspond to the tunnel endpoints. Features of the cryptographic protocols themselves, e.g.,
the TLS session identifier, may leak information that can be used for correlation and inference. While this information is much less semantically rich than the application payload, it can still be useful for inferring an individual's activities.

Inference can also leverage information obtained from sources other than direct traffic observation. Geolocation databases, for example, have been developed to map IP addresses to a location, in order to provide location-aware services such as targeted advertising. This location information is often of sufficient resolution that it can be used to draw further inferences toward identifying or profiling an individual.

Social media provide another source of more or less publicly accessible information. This information can be extremely semantically rich, including information about an individual's location, associations with other individuals and groups, and activities. Further, this information is generally contributed and curated voluntarily by the individuals themselves: it represents information which the individuals are not necessarily interested in protecting for privacy reasons. However, correlation of this social networking data with information available from direct observation of network traffic allows the creation of a much richer picture of an individual's activities than either source alone.

We note with some alarm that there is little that can be done at protocol design time to limit such correlation by the attacker, and that the existence of such data sources in many cases greatly complicates the problem of protecting privacy by hardening protocols alone.

3.3. An illustration of an ideal passive attack

To illustrate how capable the idealized attacker is even given its limitations, we explore the non-anonymity of encrypted IP traffic in this section.
Here we examine in detail some inference techniques for associating a set of addresses with an individual, in order to illustrate the difficulty of defending communications against our idealized attacker. The basic problem is that information radiated even by protocols which have no obvious connection with personal data can be correlated with other information to paint a very rich behavioral picture; it takes only one unprotected link in the chain to associate that picture with an identity.

3.3.1. Analysis of IP headers

Internet traffic can be monitored by tapping Internet links, or by installing monitoring tools in Internet routers. Of course, a single link or a single router only provides access to a fraction of the global Internet traffic. However, monitoring a number of high-capacity links or a set of routers placed at strategic locations provides access to a good sampling of Internet traffic.

Tools like IPFIX [RFC7011] allow administrators to acquire statistics about sequences of packets with some common properties that pass through a network device. The most common set of properties used in flow measurement is the "five-tuple" of source and destination addresses, protocol type, and source and destination ports. These statistics are commonly used for network engineering, but could certainly be used for other purposes.

Let's assume for a moment that IP addresses can be correlated to specific services or specific users. Analysis of the sequences of packets will quickly reveal which users use what services, and also which users engage in peer-to-peer connections with other users. Analysis of traffic variations over time can be used to detect increased activity by particular users or, in the case of peer-to-peer connections, increased activity within groups of users.
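To make the flow-measurement idea concrete, the following is a minimal sketch (plain Python, with entirely hypothetical packet data using IPv4 documentation addresses) of how per-flow statistics keyed on the five-tuple might be aggregated, in the style of an IPFIX meter:

```python
from collections import defaultdict

# Hypothetical observed packets: (src, dst, proto, sport, dport, size in bytes).
# Addresses are drawn from the RFC 6890 documentation ranges.
packets = [
    ("192.0.2.33", "198.51.100.7", "tcp", 52100, 443, 1400),
    ("192.0.2.33", "198.51.100.7", "tcp", 52100, 443, 600),
    ("192.0.2.33", "203.0.113.5", "udp", 40000, 53, 80),
]

# Aggregate packet and byte counts per five-tuple.
flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
for src, dst, proto, sport, dport, size in packets:
    key = (src, dst, proto, sport, dport)
    flows[key]["packets"] += 1
    flows[key]["bytes"] += size

for key, stats in flows.items():
    print(key, stats)
```

Even this toy meter already separates the host's web traffic from its DNS traffic; at scale, the same aggregation reveals who talks to which services, and when.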
3.3.2. Correlation of IP addresses to user identities

The correlation of IP addresses with specific users can be done in various ways. For example, tools like reverse DNS lookup can be used to retrieve the DNS names of servers. Since the addresses of servers tend to be quite stable, and since servers are relatively less numerous than users, an attacker could easily maintain its own copy of the DNS for well-known or popular servers in order to accelerate such lookups.

On the other hand, the reverse lookup of the IP addresses of users is generally less informative. For example, a lookup of the address currently used by one author's home network returns a name of the form "c-192-000-002-033.hsd1.wa.comcast.net". This particular type of reverse DNS lookup generally reveals only coarse-grained location or provider information, equivalent to that available from geolocation databases.

In many jurisdictions, Internet Service Providers (ISPs) are required to provide identification on a case-by-case basis of the "owner" of a specific IP address for law enforcement purposes. This is a reasonably expedient process for targeted investigations, but pervasive surveillance requires something more efficient. This provides an incentive for the attacker to secure the cooperation of the ISP in order to automate this correlation.

3.3.3. Monitoring messaging clients for IP address correlation

Even if the ISP does not cooperate, user identity can often be obtained via inference. POP3 [RFC1939] and IMAP [RFC3501] are used to retrieve mail from mail servers, while a variant of SMTP is used to submit messages through mail servers. IMAP connections originate from the client, and typically start with an authentication exchange in which the client proves its identity by answering a password challenge.
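As a sketch of how little effort such an observation requires, the following extracts the username from a cleartext IMAP LOGIN command and ties it to the packet's source address. The captured payload, address, and credentials here are invented for illustration:

```python
import re

# Hypothetical cleartext IMAP payload observed from a documentation address.
src_addr = "192.0.2.33"
payload = b"a001 LOGIN alice@example.com hunter2\r\n"

# The eavesdropper only needs the username to tie the source address
# to an identity; the password is incidental.
m = re.match(rb"\S+ LOGIN (\S+) \S+", payload)
if m:
    username = m.group(1).decode()
    print(f"{src_addr} -> {username}")
```

The same one-line pattern match applies to POP3 USER commands or any other protocol that carries credentials in the clear.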
The same holds for the SIP protocol [RFC3261] and many instant messaging services operating over the Internet using proprietary protocols.

The username is directly observable if any of these protocols operate in cleartext; the username can then be directly associated with the source address.

3.3.4. Retrieving IP addresses from mail headers

SMTP [RFC5321] requires that each successive SMTP relay adds a "Received" header to the mail headers. The purpose of these headers is to enable audit of mail transmission, and perhaps to distinguish between regular mail and spam. Here is an extract from the headers of a message recently received from the "perpass" mailing list:

   "Received: from 192-000-002-044.zone13.example.org (HELO
   ?192.168.1.100?) (xxx.xxx.xxx.xxx) by lvps192-000-002-219.example.net
   with ESMTPSA (DHE-RSA-AES256-SHA encrypted, authenticated); 27 Oct
   2013 21:47:14 +0100 Message-ID: <526D7BD2.7070908@example.org> Date:
   Sun, 27 Oct 2013 20:47:14 +0000 From: Some One
   <some.one@example.org>"

This is the first "Received" header attached to the message by the first SMTP relay; for privacy reasons, the field values have been anonymized. We learn here that the message was submitted by "Some One" on October 27, from a host behind a NAT (192.168.1.100) [RFC1918] that used the IP address 192.0.2.44. The information remained in the message, and is accessible by all recipients of the "perpass" mailing list, or indeed by any attacker that sees at least one copy of the message.

An attacker that can observe sufficient email traffic can regularly update the mapping between public IP addresses and individual email identities. Even if the SMTP traffic were encrypted on submission and relaying, the attacker can still receive a copy of public mailing lists like "perpass".
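A few lines of code suffice to automate this kind of extraction. The sketch below operates on the anonymized header quoted above and recovers both the RFC 1918 address behind the NAT and the public address encoded in the relay-generated host name:

```python
import re

# The anonymized "Received" header quoted above, as a single string.
received = ("Received: from 192-000-002-044.zone13.example.org "
            "(HELO ?192.168.1.100?) (xxx.xxx.xxx.xxx) "
            "by lvps192-000-002-219.example.net with ESMTPSA; "
            "27 Oct 2013 21:47:14 +0100")

# Literal dotted quads in the header; here, the RFC 1918 address of
# the submitting host behind the NAT.
addrs = re.findall(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", received)

# The submitter's public address, encoded with dashes in the host
# name generated by the relay.
m = re.match(r"Received: from (\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})", received)
submitter = ".".join(str(int(g)) for g in m.groups())

print("literal addresses:", addrs)
print("public submitter address:", submitter)
```

Run over a mail archive instead of one header, the same two regular expressions yield an up-to-date mapping from email identities to the addresses they submit from.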
3.3.5. Tracking address usage with web cookies

Many web sites only encrypt a small fraction of their transactions. A popular pattern is to use HTTPS for the login information, and then use a "cookie" to associate subsequent clear-text transactions with the user's identity. Cookies are also used by various advertisement services to quickly identify users and serve them with "personalized" advertisements. Such cookies are particularly useful if the advertisement services want to keep tracking the user across multiple sessions that may use different IP addresses.

As cookies are sent in clear text, an attacker can build a database that associates cookies to IP addresses for non-HTTPS traffic. If the IP address is already identified, the cookie can be linked to the user's identity. After that, if the same cookie appears on a new IP address, the new IP address can be immediately associated with the pre-determined identity.

3.3.6. Graph-based approaches to address correlation

An attacker can track traffic from an IP address not yet associated with an individual to various public services (e.g., websites, mail servers, game servers), and exploit patterns in the observed traffic to correlate this address with other addresses that show similar patterns. For example, any two addresses that show connections to the same IMAP or webmail services, the same set of favorite websites, and game servers at similar times of day may be associated with the same individual. Correlated addresses can then be tied to an individual through one of the techniques above, walking the "network graph" to expand the set of attributable traffic.

3.3.7. Tracking of MAC Addresses

Moving back down the stack, technologies like Ethernet or Wi-Fi use MAC Addresses to identify link-level destinations. MAC Addresses assigned according to IEEE-802 standards are unique to the device.
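Whether an observed address is such a stable, globally unique identifier is visible in the address itself: in IEEE 802 addressing, the locally-administered bit of the first octet distinguishes IEEE-assigned addresses from locally assigned ones. A minimal sketch, using hypothetical addresses:

```python
def mac_is_universal(mac: str) -> bool:
    """True if the locally-administered bit (0x02 of the first octet)
    is clear, i.e. the address is a globally unique, IEEE-assigned
    identifier that can serve as a stable tracking key."""
    first_octet = int(mac.split(":")[0], 16)
    return (first_octet & 0x02) == 0

# Hypothetical addresses observed on a public Wi-Fi network.
for mac in ("00:1b:63:84:45:e6",   # universal: stable, trackable
            "da:a1:19:44:51:b2"):  # locally administered
    kind = "trackable" if mac_is_universal(mac) else "locally assigned"
    print(mac, kind)
```

An observer collecting frames on a public network can therefore tell at a glance which devices are radiating a long-lived identifier.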
If the link is publicly accessible, an attacker can track it. For example, the attacker can track the wireless traffic at public Wi-Fi networks. Simple devices can monitor the traffic and reveal which MAC Addresses are present. If the network does not use some form of Wi-Fi encryption, or if the attacker can access the decrypted traffic, the analysis will also provide the correlation between MAC Addresses and IP addresses. Additional monitoring, using the techniques exposed in the previous sections, will reveal the correlation between MAC Addresses, IP addresses, and user identity.

Given that large-scale databases of the MAC addresses of wireless access points for geolocation purposes have been known to exist for some time, the attacker could easily build a database linking MAC Addresses and device or user identities, and use it to track the movement of devices and of their owners.

4. Reported Instances of Large-Scale Attacks

The situation in reality is more bleak than that suggested by an analysis of our idealized attacker. Through revelations of sensitive documents in several media outlets, the Internet community has been made aware of several intelligence activities conducted by US and UK national intelligence agencies, particularly the US National Security Agency (NSA) and the UK Government Communications Headquarters (GCHQ). These documents have revealed methods that these agencies use to attack Internet applications and obtain sensitive user information.

First, they have confirmed that these agencies have capabilities in line with those of our idealized attacker, through the large-scale passive collection of Internet traffic [pass1][pass2][pass3][pass4]. For example:

o  The NSA XKEYSCORE system accesses data from multiple access points and searches for "selectors" such as email addresses, at the scale of tens of terabytes of data per day.
o  The GCHQ Tempora system appears to have access to around 1,500 major cables passing through the UK.

o  The NSA MUSCULAR program tapped cables between data centers belonging to major service providers.

o  Several programs appear to perform wide-scale collection of cookies in web traffic and location data from location-aware portable devices such as smartphones.

However, the capabilities described go beyond those available to our idealized attacker, including:

o  Decryption of TLS-protected Internet sessions [dec1][dec2][dec3]. For example, the NSA BULLRUN project appears to have had a budget of around $250M per year to undermine encryption through multiple approaches.

o  Insertion of NSA devices as a man-in-the-middle of Internet transactions [TOR1][TOR2]. For example, the NSA QUANTUM system appears to use several different techniques to hijack HTTP connections, ranging from DNS response injection to HTTP 302 redirects.

o  Direct acquisition of bulk data and metadata from service providers [dir1][dir2][dir3]. For example, the NSA PRISM program provides the agency with access to many types of user data (e.g., email, chat, VoIP).

o  Use of implants (covert modifications or malware) to undermine security and anonymity features [dec2][TOR1][TOR2]. For example:

   *  NSA appears to use the QUANTUM man-in-the-middle system to direct users to a FOXACID server, which delivers an implant to compromise the browser of a user of the Tor anonymous communications network.

   *  The BULLRUN program mentioned above includes the addition of covert modifications to software as one means to undermine encryption.

   *  There is also some suspicion that NSA modifications to the DUAL_EC_DRBG random number generator were made to ensure that keys generated using that generator could be predicted by NSA.
These suspicions have been reinforced by reports that RSA Security was paid roughly $10M to make DUAL_EC_DRBG the default in their products.

We use the term "pervasive attack" [RFC7258] to collectively describe these operations. The term "pervasive" is used because the attacks are designed to indiscriminately gather as much data as possible and to apply selective analysis on targets after the fact. This means that all, or nearly all, Internet communications are targets for these attacks. To achieve this scale, the attacks are physically pervasive; they affect a large number of Internet communications. They are pervasive in content, consuming and exploiting any information revealed by the protocol. And they are pervasive in technology, exploiting many different vulnerabilities in many different protocols.

It is important to note that although the attacks mentioned above were executed by NSA and GCHQ, there are many other organizations that can mount pervasive surveillance attacks. Because of the resources required to achieve pervasive scale, these attacks are most commonly undertaken by nation-state actors. For example, the Chinese Internet filtering system known as the "Great Firewall of China" uses several techniques that are similar to the QUANTUM program, and has a high degree of pervasiveness with regard to the Internet in China.

5. Threat Model

Given these disclosures, we must consider a broader threat model.

Pervasive surveillance aims to collect information across a large number of Internet communications, analyzing the collected communications to identify information of interest within individual communications, or to infer information from correlated communications. This analysis sometimes benefits from decryption of encrypted communications and deanonymization of anonymized communications.
As a result, these attackers desire both access to the bulk of Internet traffic and access to the keying material required to decrypt any traffic that has been encrypted. Note that the presence of a communication and the fact that it is encrypted may both be inputs to an analysis, even if the attacker cannot decrypt the communication.

The attacks listed above highlight new avenues both for access to traffic and for access to relevant encryption keys. They further indicate that the scale of surveillance is sufficient to provide a general capability to cross-correlate communications, a threat not previously thought to be relevant at the scale of the Internet.

5.1. Attacker Capabilities

   +--------------------------+-------------------------------------+
   | Attack Class             | Capability                          |
   +--------------------------+-------------------------------------+
   | Passive observation      | Directly capture data in transit    |
   |                          |                                     |
   | Passive inference        | Infer from reduced/encrypted data   |
   |                          |                                     |
   | Active                   | Manipulate / inject data in transit |
   |                          |                                     |
   | Static key exfiltration  | Obtain key material once / rarely   |
   |                          |                                     |
   | Dynamic key exfiltration | Obtain per-session key material     |
   |                          |                                     |
   | Content exfiltration     | Access data at rest                 |
   +--------------------------+-------------------------------------+

Security analyses of Internet protocols commonly consider two classes of attacker: passive attackers, who can simply listen in on communications as they transit the network, and active attackers, who can modify or delete packets in addition to simply collecting them.

In the context of pervasive passive surveillance, these attacks take on an even greater significance. In the past, these attackers were often assumed to operate near the edge of the network, where attacks can be simpler.
For example, in some LANs, it is simple for any node to engage in passive listening to other nodes' traffic or to inject packets to accomplish active attacks. However, as we now know, both passive and active attacks are undertaken by pervasive attackers closer to the core of the network, greatly expanding the scope and capability of the attacker.

Eavesdropping and observation at a larger scale make passive inference attacks easier to carry out: a passive attacker with access to a large portion of the Internet can analyze collected traffic to create a much more detailed view of individual behavior than an attacker that collects at a single point. Even the usual claim that encryption defeats passive attackers is weakened, since a pervasive passive attacker can infer relationships from correlations over large numbers of sessions, e.g., pairing encrypted sessions with unencrypted sessions from the same host, or performing traffic fingerprinting between known and unknown encrypted sessions. Reports on the NSA XKEYSCORE system indicate that it is an example of such an attacker.

A pervasive active attacker likewise has capabilities beyond those of a localized active attacker. Active attacks are often limited by network topology, for example by a requirement that the attacker be able to see a targeted session as well as inject packets into it. A pervasive active attacker with access at multiple points within the core of the Internet is able to overcome these topological limitations and perform attacks over a much broader scope. Being positioned in the core of the network rather than at the edge can also enable a pervasive active attacker to reroute targeted traffic, amplifying the ability to perform both eavesdropping and traffic injection. Pervasive active attackers can also benefit from pervasive passive collection to identify vulnerable hosts.
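The session-pairing inference described above can be sketched in a few lines. The session records here are hypothetical (documentation addresses, toy timestamps); a real attacker would of course work with flow logs at vastly larger scale:

```python
# Hypothetical session records: (src_addr, start_time_s, protocol).
sessions = [
    ("192.0.2.33", 100.0, "tls"),
    ("192.0.2.33", 101.5, "dns"),     # cleartext: reveals the queried name
    ("198.51.100.7", 300.0, "tls"),
]

# Pair each encrypted session with cleartext sessions from the same
# host that started within a small time window of it; the cleartext
# side then labels the encrypted side.
WINDOW = 5.0
pairs = [
    (enc, clear)
    for enc in sessions if enc[2] == "tls"
    for clear in sessions
    if clear[2] != "tls" and clear[0] == enc[0]
    and abs(clear[1] - enc[1]) <= WINDOW
]
print(pairs)
```

Here the encrypted session from 192.0.2.33 is tagged by the DNS lookup made moments before it, illustrating how cleartext side channels erode the protection that encryption provides against a pervasive observer.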
While not directly related to pervasiveness, attackers that are in a position to mount a pervasive active attack are also often in a position to subvert authentication, a traditional protection against such attacks.  Authentication in the Internet is often achieved via trusted third-party authorities such as the Certificate Authorities (CAs) that provide web sites with authentication credentials.  An attacker with sufficient resources may also be able to induce an authority to grant credentials for an identity of the attacker's choosing.  If the parties to a communication will trust multiple authorities to certify a specific identity, this attack may be mounted by suborning any one of those authorities (the proverbial "weakest link").  Subversion of authorities in this way can allow an active attack to succeed in spite of an authentication check.

Beyond these three classes (observation, inference, and active), reports on the BULLRUN effort to defeat encryption and the PRISM effort to obtain data from service providers suggest three more classes of attack:

   o  Static key exfiltration

   o  Dynamic key exfiltration

   o  Content exfiltration

These attacks all rely on a collaborator providing the attacker with some information, either keys or data.  These attacks have not traditionally been considered in scope for the Security Considerations sections of IETF protocols, as they occur outside the protocol.

The term "key exfiltration" refers to the transfer of keying material for an encrypted communication from the collaborator to the attacker.  By "static", we mean that the transfer of keys happens once, or rarely, typically of a long-lived key.  For example, this case would cover a web site operator that provides the private key corresponding to its HTTPS certificate to an intelligence agency.
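The power of static key exfiltration can be shown with a deliberately toy model.  The XOR "cipher" and key-wrapping scheme below are not real cryptography and are invented purely to illustrate the structure of the attack: when every session key is wrapped under one long-lived key, a single disclosure of that key decrypts all recorded traffic.

```python
# Toy model (NOT real cryptography) of static key exfiltration: each
# per-session key is wrapped under a long-lived server key, so a
# one-time disclosure of that key unlocks every recorded session.
import os

def xor(a, b):
    # XOR two equal-length byte strings (toy stand-in for encryption).
    return bytes(x ^ y for x, y in zip(a, b))

long_lived_key = os.urandom(16)   # disclosed to the attacker once

def record_session(plaintext):
    session_key = os.urandom(16)
    wrapped = xor(session_key, long_lived_key)  # key "encrypted" to server
    ciphertext = xor(plaintext, session_key)    # toy stream cipher
    return wrapped, ciphertext

# A passive attacker records many sessions over time ...
recorded = [record_session(b"hello world 0001"),
            record_session(b"hello world 0002")]

# ... and later decrypts all of them with the single exfiltrated key.
for wrapped, ct in recorded:
    session_key = xor(wrapped, long_lived_key)
    print(xor(ct, session_key))
```

This is also why forward secrecy matters: if session keys are instead derived from ephemeral exchanges, a leaked long-lived key no longer decrypts recorded traffic, pushing the attacker toward the costlier dynamic exfiltration described next.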
"Dynamic" key exfiltration, by contrast, refers to attacks in which the collaborator delivers keying material to the attacker frequently, e.g., on a per-session basis.  This does not necessarily imply frequent communications with the attacker; the transfer of keying material may be virtual.  For example, if an endpoint were modified in such a way that the attacker could predict the state of its pseudorandom number generator, then the attacker would be able to derive per-session keys even without per-session communications.

Finally, content exfiltration is the attack in which the collaborator simply provides the attacker with the desired data or metadata.  Unlike the key exfiltration cases, this attack does not require the attacker to capture the desired data as it flows through the network.  The risk is to data at rest as opposed to data in transit.  This increases the scope of data that the attacker can obtain, since the attacker can access historical data; the attacker does not have to be listening at the time the communication happens.

Exfiltration attacks can be accomplished via attacks against one of the parties to a communication, i.e., by the attacker stealing the keys or content rather than the party providing them willingly.  In these cases, the party may not be aware that they are collaborating, at least at a human level.  Rather, the subverted technical assets are "collaborating" with the attacker (by providing keys/content) without their owner's knowledge or consent.

Any party that has access to encryption keys or unencrypted data can be a collaborator.  While collaborators are typically the endpoints of a communication (with encryption securing the links), intermediaries in an unencrypted communication can also facilitate content exfiltration attacks as collaborators by providing the attacker access to those communications.
For example, documents describing the NSA PRISM program claim that the NSA is able to access user data directly from servers, where it is stored unencrypted.  In these cases, the operator of the server would be a collaborator, if an unwitting one.  By contrast, in the NSA MUSCULAR program, a set of collaborators enabled attackers to access the cables connecting data centers used by service providers such as Google and Yahoo.  Because communications among these data centers were not encrypted, the collaboration by an intermediate entity allowed the NSA to collect unencrypted user data.

5.2.  Attacker Costs

   +--------------------------+-----------------------------------+
   | Attack Class             | Cost / Risk to Attacker           |
   +--------------------------+-----------------------------------+
   | Passive observation      | Passive data access               |
   |                          |                                   |
   | Passive inference        | Passive data access + processing  |
   |                          |                                   |
   | Active                   | Active data access + processing   |
   |                          |                                   |
   | Static key exfiltration  | One-time interaction              |
   |                          |                                   |
   | Dynamic key exfiltration | Ongoing interaction / code change |
   |                          |                                   |
   | Content exfiltration     | Ongoing, bulk interaction         |
   +--------------------------+-----------------------------------+

Each of the attack types discussed in the previous section entails certain costs and risks.  These costs differ by attack and can be helpful in guiding the response to pervasive attack.

Depending on the attack, the attacker may be exposed to several types of risk, ranging from simply losing access to arrest or prosecution.  In order for any of these negative consequences to occur, however, the attacker must first be discovered and identified.  So the primary risk we focus on here is the risk of discovery and attribution.

A passive attack is the simplest to mount in some ways.
The base requirement is that the attacker obtain physical access to a communications medium and extract communications from it.  For example, the attacker might tap a fiber-optic cable, acquire a mirror port on a switch, or listen to a wireless signal.  The need for these taps to have physical access or proximity to a link exposes the attacker to the risk that the taps will be discovered.  For example, a fiber tap or mirror port might be discovered by network operators noticing increased attenuation in the fiber or a change in switch configuration.  Of course, passive attacks may be accomplished with the cooperation of the network operator, in which case there is a risk that the attacker's interactions with the network operator will be exposed.

In many ways, the costs and risks for an active attack are similar to those for a passive attack, with a few additions.  An active attacker requires more robust network access than a passive attacker, since, for example, it will often need to transmit data as well as receive it.  In the wireless example above, the attacker would need to act as a transmitter as well as a receiver, greatly increasing the probability that the attacker will be discovered (e.g., using direction-finding technology).  Active attacks are also much more observable at higher layers of the network.  For example, an active attacker that attempts to use a mis-issued certificate could be detected via Certificate Transparency [RFC6962].

In terms of raw implementation complexity, passive attacks require only enough processing to extract information from the network and store it.  Active attacks, by contrast, often depend on winning race conditions to inject packets into active connections.
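The race the injecting attacker must win can be modeled abstractly.  In this illustrative sketch (sequence numbers and payloads are invented), a receiver accepts the first segment carrying the expected sequence number, so a forged segment is honored only if it beats the legitimate one to the receiver:

```python
# Toy model of the injection race: whichever segment with the correct
# sequence number arrives first is accepted; later duplicates
# (including the legitimate one) are ignored.
def receiver(expected_seq=1000):
    state = {"accepted": None}
    def deliver(seq, payload):
        # Accept only the first segment bearing the expected sequence
        # number; everything after that has no effect.
        if state["accepted"] is None and seq == expected_seq:
            state["accepted"] = payload
        return state["accepted"]
    return deliver

deliver = receiver()
deliver(1000, "forged")       # attacker's forged segment arrives first
deliver(1000, "legitimate")   # legitimate segment now has no effect
```

An attacker who guesses the sequence number wrong, or whose segment arrives late, loses the race outright, which is why such attacks demand both good visibility into the target session and fast injection.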
So active attacks in the core of the network require processing hardware that can operate at line speed (roughly 100 Gbps to 1 Tbps in the core) to identify opportunities for attack and insert attack traffic into a high-volume traffic stream.

Key exfiltration attacks rely on passive attack for access to encrypted data, with the collaborator providing keys to decrypt the data.  So the attacker undertakes the cost and risk of a passive attack, as well as the additional risk of discovery via the interactions that the attacker has with the collaborator.

In this sense, static exfiltration has a lower risk profile than dynamic.  In the static case, the attacker need only interact with the collaborator a small number of times, possibly only once, say to exchange a private key.  In the dynamic case, the attacker must have continuing interactions with the collaborator.  As noted above, these interactions may be real, such as in-person meetings, or virtual, such as software modifications that render keys available to the attacker.  Both of these types of interactions introduce a risk that they will be discovered, e.g., by employees of the collaborator organization noticing suspicious meetings or suspicious code changes.

Content exfiltration has a similar risk profile to dynamic key exfiltration.  In a content exfiltration attack, the attacker saves the cost and risk of conducting a passive attack.  The risk of discovery through interactions with the collaborator, however, is still present, and may be higher.  The content of a communication is obviously larger than the key used to encrypt it, often by several orders of magnitude.
So in the content exfiltration case, the interactions between the collaborator and the attacker need to be much higher-bandwidth than in the key exfiltration cases, with a corresponding increase in the risk that this high-bandwidth channel will be discovered.

It should also be noted that in these latter three exfiltration cases, the collaborator also undertakes a risk that its collaboration with the attacker will be discovered.  Thus, the attacker may have to incur additional cost in order to convince the collaborator to participate in the attack.  Likewise, the scope of these attacks is limited to cases where the attacker can convince a collaborator to participate.  If the attacker is a national government, for example, it may be able to compel participation within its borders but have a much more difficult time recruiting foreign collaborators.

As noted above, the collaborator in an exfiltration attack can be unwitting; the attacker can steal keys or data to enable the attack.  In some ways, the risks of this approach are similar to the case of an active collaborator.  In the static case, the attacker needs to steal information from the collaborator once; in the dynamic case, the attacker needs a continued presence inside the collaborator's systems.  The main difference is that the risk in this case is of automated discovery (e.g., by intrusion detection systems) rather than discovery by humans.

6.  Security Considerations

This document describes a threat model for pervasive surveillance attacks.  Mitigations are to be given in a future document.

7.  IANA Considerations

This document has no actions for IANA.

8.  Acknowledgements

Thanks to Dave Thaler for the list of attacks and taxonomy; to Security Area Directors Stephen Farrell, Sean Turner, and Kathleen Moriarty for starting and managing the IETF's discussion on pervasive attack; and to Stephan Neuhaus, Mark Townsley, Chris Inacio, Evangelos Halepilidis, Bjoern Hoehrmann, and Aziz Mohaisen, as well as the IAB Privacy and Security Program, for their input.

9.  References

9.1.  Normative References

   [RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
              Morris, J., Hansen, M., and R. Smith, "Privacy
              Considerations for Internet Protocols", RFC 6973,
              July 2013.

9.2.  Informative References

   [pass1]    The Guardian, "How the NSA is still harvesting your
              online data", 2013.

   [pass2]    The Guardian, "NSA's Prism surveillance program: how it
              works and what it can do", 2013.

   [pass3]    The Guardian, "XKeyscore: NSA tool collects 'nearly
              everything a user does on the internet'", 2013.

   [pass4]    The Guardian, "How does GCHQ's internet surveillance
              work?", n.d.

   [dec1]     The New York Times, "N.S.A. Able to Foil Basic Safeguards
              of Privacy on Web", 2013.

   [dec2]     The Guardian, "Project Bullrun - classification guide to
              the NSA's decryption program", 2013.

   [dec3]     The Guardian, "Revealed: how US and UK spy agencies
              defeat internet privacy and security", 2013.

   [TOR]      The Tor Project, "Tor", 2013.

   [TOR1]     Schneier, B., "How the NSA Attacks Tor/Firefox Users With
              QUANTUM and FOXACID", 2013.

   [TOR2]     The Guardian, "'Tor Stinks' presentation - read the full
              document", 2013.

   [dir1]     The Guardian, "NSA collecting phone records of millions
              of Verizon customers daily", 2013.

   [dir2]     The Guardian, "NSA Prism program taps in to user data of
              Apple, Google and others", 2013.
   [dir3]     The Guardian, "Sigint - how the NSA collaborates with
              technology companies", 2013.

   [secure]   Schneier, B., "NSA surveillance: A guide to staying
              secure", 2013.

   [snowden]  Technology Review, "NSA Leak Leaves Crypto-Math Intact
              but Highlights Known Workarounds", 2013.

   [key-recovery]
              Golle, P., "The Design and Implementation of Protocol-
              Based Hidden Key Recovery", 2003.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

   [RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G.,
              and E. Lear, "Address Allocation for Private Internets",
              BCP 5, RFC 1918, February 1996.

   [RFC1939]  Myers, J. and M. Rose, "Post Office Protocol -
              Version 3", STD 53, RFC 1939, May 1996.

   [RFC2015]  Elkins, M., "MIME Security with Pretty Good Privacy
              (PGP)", RFC 2015, October 1996.

   [RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
              April 2001.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3365]  Schiller, J., "Strong Security Requirements for Internet
              Engineering Task Force Standard Protocols", BCP 61,
              RFC 3365, August 2002.

   [RFC3501]  Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL -
              VERSION 4rev1", RFC 3501, March 2003.

   [RFC3851]  Ramsdell, B., "Secure/Multipurpose Internet Mail
              Extensions (S/MIME) Version 3.1 Message Specification",
              RFC 3851, July 2004.

   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "DNS Security Introduction and Requirements",
              RFC 4033, March 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, December 2005.
   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
              RFC 4306, December 2005.

   [RFC4949]  Shirey, R., "Internet Security Glossary, Version 2",
              RFC 4949, August 2007.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
              October 2008.

   [RFC5655]  Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
              Wagner, "Specification of the IP Flow Information Export
              (IPFIX) File Format", RFC 5655, October 2009.

   [RFC5750]  Ramsdell, B. and S. Turner, "Secure/Multipurpose Internet
              Mail Extensions (S/MIME) Version 3.2 Certificate
              Handling", RFC 5750, January 2010.

   [RFC6120]  Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Core", RFC 6120, March 2011.

   [RFC6698]  Hoffman, P. and J. Schlyter, "The DNS-Based
              Authentication of Named Entities (DANE) Transport Layer
              Security (TLS) Protocol: TLSA", RFC 6698, August 2012.

   [RFC6962]  Laurie, B., Langley, A., and E. Kasper, "Certificate
              Transparency", RFC 6962, June 2013.

   [RFC7011]  Claise, B., Trammell, B., and P. Aitken, "Specification
              of the IP Flow Information Export (IPFIX) Protocol for
              the Exchange of Flow Information", STD 77, RFC 7011,
              September 2013.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is
              an Attack", BCP 188, RFC 7258, May 2014.

Authors' Addresses

   Richard Barnes

   Email: rlb@ipv.sx


   Bruce Schneier

   Email: schneier@schneier.com


   Cullen Jennings

   Email: fluffy@cisco.com


   Ted Hardie

   Email: ted.ietf@gmail.com


   Brian Trammell

   Email: ietf@trammell.ch


   Christian Huitema

   Email: huitema@huitema.net


   Daniel Borkmann

   Email: dborkman@redhat.com