Network Working Group                                          R. Barnes
Internet-Draft
Intended status: Informational                               B. Schneier
Expires: August 24, 2015                                     C. Jennings
                                                               T. Hardie
                                                             B. Trammell
                                                              C. Huitema
                                                             D. Borkmann
                                                       February 20, 2015

  Confidentiality in the Face of Pervasive Surveillance: A Threat Model
                         and Problem Statement
                draft-iab-privsec-confidentiality-threat-03

Abstract

   Documents published since the initial revelations in 2013 have
   revealed several classes of pervasive surveillance attack on
   Internet communications.  In this document we develop a threat model
   that describes these pervasive attacks.  We start by assuming an
   attacker with an interest in undetected, indiscriminate
   eavesdropping, then expand the threat model with a set of verified
   attacks that have been published.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 24, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

1.  Introduction

   Starting in June 2013, documents released to the press by Edward
   Snowden have revealed several operations undertaken by intelligence
   agencies to exploit Internet communications for intelligence
   purposes.  These attacks were largely based on protocol
   vulnerabilities that were already known to exist.  The attacks were
   nonetheless striking in their pervasive nature, both in terms of the
   amount of Internet communications targeted and in terms of the
   diversity of attack techniques employed.

   To ensure that the Internet can be trusted by users, it is necessary
   for the Internet technical community to address the vulnerabilities
   exploited in these attacks [RFC7258].  The goal of this document is
   to describe more precisely the threats posed by these pervasive
   attacks and, based on those threats, to lay out the problems that
   need to be solved in order to secure the Internet in the face of
   them.

   The remainder of this document is structured as follows.  In
   Section 3, we describe an idealized flow access attacker, one which
   could compromise communications at Internet scale while remaining
   completely undetectable.  In Section 4, we provide a brief summary
   of some attacks that have been disclosed, and use these to expand
   the assumed capabilities of our idealized attacker.  Note that we do
   not attempt to describe all possible attacks, but focus on those
   which result in undetected eavesdropping.  Section 5 describes a
   threat model based on these attacks, focusing on classes of attack
   that have not been a focus of Internet engineering to date.

2.  Terminology

   This document makes extensive use of standard security and privacy
   terminology; see [RFC4949] and [RFC6973].  Terms used from [RFC6973]
   include Eavesdropper, Observer, Initiator, Intermediary, Recipient,
   Attack (in a privacy context), Correlation, Fingerprint, Traffic
   Analysis, and Identifiability (and related terms).  In addition, we
   use a few terms that are specific to the attacks discussed here:

   Flow Access Attack:  An eavesdropping attack in which the packets in
      a traffic stream between two endpoints are eavesdropped upon, but
      in which the attacker does not modify the packets in the traffic
      stream, modify the treatment of packets in the traffic stream
      (e.g., delay, routing), or add or remove packets in the traffic
      stream.  Flow access attacks are undetectable from the endpoints.

   Flow Modification Attack:  An attack which includes both
      eavesdropping (as in a flow access attack) and modification,
      addition, or removal of packets in a traffic stream, or
      modification of the treatment of packets in the traffic stream.
      Flow modification attacks provide more capabilities to the
      attacker, at the cost of possible detection at the endpoints.

   Pervasive Attack:  An attack on Internet communications that makes
      use of access at a large number of points in the network, or
      otherwise provides the attacker with access to a large amount of
      Internet traffic; see [RFC7258].

   Observation:  Information collected directly from communications by
      an eavesdropper or observer.  For example, the knowledge that
      <alice@example.com> sent a message to <bob@example.com>, taken
      from the headers of an observed SMTP message, would be an
      observation.

   Inference:  Information extracted from analysis of information
      collected directly from communications by an eavesdropper or
      observer.
      For example, the knowledge that a given web page was accessed by
      a given IP address, by comparing the size in octets of measured
      network flow records to fingerprints derived from known sizes of
      linked resources on the web servers involved, would be an
      inference.

   Collaborator:  An entity that is a legitimate participant in a
      communication, but who deliberately provides information about
      that interaction to an attacker.

   Unwitting Collaborator:  An entity that is a legitimate participant
      in a communication, and who is the source of information obtained
      by the attacker without the entity's consent or intention,
      because the attacker has exploited some technology used by the
      entity.

   Key Exfiltration:  The transmission of keying material for an
      encrypted communication from a collaborator, deliberately or
      unwittingly, to an attacker.

   Content Exfiltration:  The transmission of the content of a
      communication from a collaborator, deliberately or unwittingly,
      to an attacker.

3.  An Idealized Pervasive Flow Access Attacker

   In considering the threat posed by pervasive surveillance, we begin
   by defining an idealized pervasive flow access attacker.  While this
   attacker is less capable than those which, according to press
   reports, have actually compromised the Internet, as elaborated in
   Section 4, it does set a lower bound on the capabilities of an
   attacker interested in indiscriminate passive surveillance while
   remaining undetectable.  We note that, prior to the Snowden
   revelations in 2013, the assumptions of attacker capability
   presented here would have been considered on the border of paranoia
   outside the network security community.
   Our idealized attacker is an indiscriminate eavesdropper on an
   Internet-attached computer network that:

   o  can observe every packet of all communications at any hop in any
      network path between an initiator and a recipient;

   o  can observe data at rest in any intermediate system between the
      endpoints controlled by the initiator and recipient; and

   o  can share information with other such attackers; but

   o  takes no other action with respect to these communications (i.e.,
      blocking, modification, injection, etc.).

   The techniques available to our ideal attacker are direct
   observation and inference.  Direct observation involves taking
   information directly from eavesdropped communications, e.g., URLs
   identifying content or email addresses identifying individuals taken
   from application-layer headers.  Inference, on the other hand,
   involves analyzing eavesdropped information to derive new
   information from it, e.g., searching for application or behavioral
   fingerprints in observed traffic to derive information about the
   observed individual, in the absence of directly observed sources of
   the same information.  The use of encryption to protect
   confidentiality is generally enough to prevent direct observation of
   unencrypted content, assuming uncompromised encryption
   implementations and key material.  However, it provides less
   complete protection against inference, especially inference based
   only on unprotected portions of communications (e.g., IP and TCP
   headers for TLS [RFC5246]).

3.1.  Information subject to direct observation

   Protocols which do not encrypt their payload make the entire content
   of the communication available to the idealized attacker along their
   path.  Following the advice in [RFC3365], most such protocols have a
   secure variant which encrypts payload for confidentiality, and these
   secure variants are seeing ever-wider deployment.
   A noteworthy exception is DNS [RFC1035], as DNSSEC [RFC4033] does
   not have confidentiality as a requirement.  This implies that, in
   the absence of the changes to the protocol presently under
   development in the DPRIVE working group, all DNS queries and answers
   generated by the activities of any protocol are available to the
   attacker.

   Protocols which imply the storage of some data at rest in
   intermediaries (e.g., SMTP [RFC5321]) leave this data subject to
   observation by an attacker that has compromised these
   intermediaries, unless the data is encrypted end-to-end by the
   application-layer protocol or the implementation uses an encrypted
   store for this data.

3.2.  Information useful for inference

   Inference is information extracted from later analysis of an
   observed or eavesdropped communication, and/or correlation of
   observed or eavesdropped information with information available from
   other sources.  Indeed, most useful inference performed by the
   attacker falls under the rubric of correlation.  The simplest
   example of this is the observation of DNS queries and answers from
   and to a source and the correlation of those with the IP addresses
   with which that source communicates.  This can give access to
   information otherwise not available from encrypted application
   payloads (e.g., the Host: HTTP/1.1 request header when HTTP is used
   with TLS).

   Protocols which encrypt their payload using an application- or
   transport-layer encryption scheme (e.g., TLS) still expose all the
   information in their network- and transport-layer headers to the
   attacker, including source and destination addresses and ports.
   IPsec ESP [RFC4303] further encrypts the transport-layer headers,
   but still leaves IP address information unencrypted; in tunnel mode,
   these addresses correspond to the tunnel endpoints.  Features of the
   cryptographic protocols themselves, e.g., the TLS session
   identifier, may leak information that can be used for correlation
   and inference.  While this information is much less semantically
   rich than the application payload, it can still be useful for
   inferring an individual's activities.

   Inference can also leverage information obtained from sources other
   than direct traffic observation.  Geolocation databases, for
   example, have been developed to map IP addresses to a location, in
   order to provide location-aware services such as targeted
   advertising.  This location information is often of sufficient
   resolution that it can be used to draw further inferences toward
   identifying or profiling an individual.

   Social media provide another source of more or less publicly
   accessible information.  This information can be extremely
   semantically rich, including information about an individual's
   location, associations with other individuals and groups, and
   activities.  Further, this information is generally contributed and
   curated voluntarily by the individuals themselves: it represents
   information which the individuals are not necessarily interested in
   protecting for privacy reasons.  However, correlation of this social
   networking data with information available from direct observation
   of network traffic allows the creation of a much richer picture of
   an individual's activities than either alone.

   We note with some alarm that there is little that can be done at
   protocol design time to limit such correlation by the attacker, and
   that the existence of such data sources in many cases greatly
   complicates the problem of protecting privacy by hardening protocols
   alone.

3.3.  An illustration of an ideal flow access attack

   To illustrate how capable the idealized attacker is even given its
   limitations, we explore the non-anonymity of encrypted IP traffic in
   this section.
   Here we examine in detail some inference techniques for associating
   a set of addresses with an individual, in order to illustrate the
   difficulty of defending communications against our idealized
   attacker.  The basic problem is that information radiated even by
   protocols which have no obvious connection with personal data can be
   correlated with other information to paint a very rich behavioral
   picture, and it takes only one unprotected link in the chain to
   associate that picture with an identity.

3.3.1.  Analysis of IP headers

   Internet traffic can be monitored by tapping Internet links, or by
   installing monitoring tools in Internet routers.  Of course, a
   single link or a single router only provides access to a fraction of
   the global Internet traffic.  However, monitoring a number of high-
   capacity links or a set of routers placed at strategic locations
   provides access to a good sampling of Internet traffic.

   Tools like IPFIX [RFC7011] allow administrators to acquire
   statistics about sequences of packets with some common properties
   that pass through a network device.  The most common set of
   properties used in flow measurement is the "five-tuple" of source
   and destination addresses, protocol type, and source and destination
   ports.  These statistics are commonly used for network engineering,
   but could certainly be used for other purposes.

   Let's assume for a moment that IP addresses can be correlated to
   specific services or specific users.  Analysis of the sequences of
   packets will quickly reveal which users use what services, and also
   which users engage in peer-to-peer connections with other users.
   Analysis of traffic variations over time can be used to detect
   increased activity by particular users or, in the case of peer-to-
   peer connections, increased activity within groups of users.

3.3.2.  Correlation of IP addresses to user identities

   The correlation of IP addresses with specific users can be done in
   various ways.  For example, tools like reverse DNS lookup can be
   used to retrieve the DNS names of servers.  Since the addresses of
   servers tend to be quite stable, and since servers are relatively
   less numerous than users, an attacker could easily maintain its own
   copy of the DNS for well-known or popular servers, to accelerate
   such lookups.

   On the other hand, the reverse lookup of the IP addresses of users
   is generally less informative.  For example, a lookup of the address
   currently used by one author's home network returns a name of the
   form "c-192-000-002-033.hsd1.wa.comcast.net".  This particular type
   of reverse DNS lookup generally reveals only coarse-grained location
   or provider information, equivalent to that available from
   geolocation databases.

   In many jurisdictions, Internet Service Providers (ISPs) are
   required to provide identification on a case-by-case basis of the
   "owner" of a specific IP address for law enforcement purposes.  This
   is a reasonably expedient process for targeted investigations, but
   pervasive surveillance requires something more efficient.  This
   provides an incentive for the attacker to secure the cooperation of
   the ISP in order to automate this correlation.

3.3.3.  Monitoring messaging clients for IP address correlation

   Even if the ISP does not cooperate, user identity can often be
   obtained via inference.  POP3 [RFC1939] and IMAP [RFC3501] are used
   to retrieve mail from mail servers, while a variant of SMTP is used
   to submit messages through mail servers.  IMAP connections originate
   from the client, and typically start with an authentication exchange
   in which the client proves its identity by answering a password
   challenge.
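   As a minimal sketch of this inference, a passive observer that sees
   a cleartext IMAP LOGIN command ("<tag> LOGIN <user> <password>" per
   RFC 3501) can bind the username to the client's source address.
   The observed lines and addresses below are invented for
   illustration; a real attacker would feed in captured packets:

```python
# Sketch: bind usernames seen in cleartext IMAP LOGIN commands
# ("<tag> LOGIN <user> <password>", RFC 3501) to source IP addresses.
# The observed lines and addresses are invented for illustration.

def record_login(src_ip, line, identities):
    """Record a username seen in a cleartext IMAP LOGIN command."""
    parts = line.split()
    if len(parts) >= 3 and parts[1].upper() == "LOGIN":
        identities[src_ip] = parts[2].strip('"')

identities = {}
observed = [
    ("192.0.2.44", 'a001 LOGIN "alice@example.com" "hunter2"'),
    ("192.0.2.57", "a002 CAPABILITY"),
]
for src_ip, line in observed:
    record_login(src_ip, line, identities)

print(identities)  # {'192.0.2.44': 'alice@example.com'}
```

   The same pattern applies to cleartext POP3 USER commands and SIP
   REGISTER requests.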
   The same holds for the SIP protocol [RFC3261] and for many instant
   messaging services operating over the Internet using proprietary
   protocols.

   The username is directly observable if any of these protocols
   operate in cleartext; the username can then be directly associated
   with the source address.

3.3.4.  Retrieving IP addresses from mail headers

   SMTP [RFC5321] requires that each successive SMTP relay adds a
   "Received" header to the mail headers.  The purpose of these headers
   is to enable audit of mail transmission, and perhaps to distinguish
   between regular mail and spam.  Here is an extract from the headers
   of a message recently received from the "perpass" mailing list:

      Received: from 192-000-002-044.zone13.example.org (HELO
         ?192.168.1.100?) (xxx.xxx.xxx.xxx) by
         lvps192-000-002-219.example.net with ESMTPSA
         (DHE-RSA-AES256-SHA encrypted, authenticated);
         27 Oct 2013 21:47:14 +0100
      Message-ID: <526D7BD2.7070908@example.org>
      Date: Sun, 27 Oct 2013 20:47:14 +0000
      From: Some One <some.one@example.org>

   This is the first "Received" header attached to the message by the
   first SMTP relay; for privacy reasons, the field values have been
   anonymized.  We learn here that the message was submitted by "Some
   One" on October 27, from a host behind a NAT (192.168.1.100)
   [RFC1918] that used the IP address 192.0.2.44.  The information
   remained in the message, and is accessible by all recipients of the
   "perpass" mailing list, or indeed by any attacker that sees at least
   one copy of the message.

   An attacker that can observe sufficient email traffic can regularly
   update the mapping between public IP addresses and individual email
   identities.  Even if the SMTP traffic were encrypted on submission
   and relaying, the attacker can still receive a copy of public
   mailing lists like "perpass".

3.3.5.  Tracking address usage with web cookies

   Many web sites only encrypt a small fraction of their transactions.
   A popular pattern is to use HTTPS for the login information, and
   then use a "cookie" to associate following clear-text transactions
   with the user's identity.  Cookies are also used by various
   advertisement services to quickly identify users and serve them with
   "personalized" advertisements.  Such cookies are particularly useful
   if the advertisement services want to keep tracking the user across
   multiple sessions that may use different IP addresses.

   As cookies are sent in clear text, an attacker can build a database
   that associates cookies to IP addresses for non-HTTPS traffic.  If
   the IP address is already identified, the cookie can be linked to
   the user identity.  After that, if the same cookie appears on a new
   IP address, the new IP address can be immediately associated with
   the pre-determined identity.

3.3.6.  Graph-based approaches to address correlation

   An attacker can track traffic from an IP address not yet associated
   with an individual to various public services (e.g., websites, mail
   servers, game servers), and exploit patterns in the observed traffic
   to correlate this address with other addresses that show similar
   patterns.  For example, any two addresses that show connections to
   the same IMAP or webmail services, the same set of favorite
   websites, and game servers at similar times of day may be associated
   with the same individual.  Correlated addresses can then be tied to
   an individual through one of the techniques above, walking the
   "network graph" to expand the set of attributable traffic.

3.3.7.  Tracking of MAC Addresses

   Moving back down the stack, technologies like Ethernet or Wi-Fi use
   MAC addresses to identify link-level destinations.  MAC addresses
   assigned according to IEEE 802 standards are unique to the device.
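   As an aside on how much a single MAC address reveals, the following
   sketch (the address is made up for illustration) splits a MAC
   address into its IEEE-assigned vendor prefix (OUI) and its device-
   specific part, and checks the "locally administered" bit that
   distinguishes randomized addresses from globally unique ones:

```python
# Sketch: split a MAC address into the IEEE-assigned OUI (vendor
# prefix) and the device-specific part, and test the "locally
# administered" bit (bit 1 of the first octet), which is set on
# randomized addresses.  The address below is made up.

def parse_mac(mac):
    octets = [int(part, 16) for part in mac.split(":")]
    oui = ":".join(f"{octet:02x}" for octet in octets[:3])
    device_part = ":".join(f"{octet:02x}" for octet in octets[3:])
    locally_administered = bool(octets[0] & 0x02)
    return oui, device_part, locally_administered

print(parse_mac("00:1b:63:aa:bb:cc"))  # ('00:1b:63', 'aa:bb:cc', False)
```

   An observer with a copy of the public OUI registry can thus learn
   the device vendor from passive observation alone, before any
   higher-layer correlation.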
   If the link is publicly accessible, an attacker can track it.  For
   example, the attacker can track the wireless traffic at public Wi-Fi
   networks.  Simple devices can monitor the traffic and reveal which
   MAC addresses are present.  If the network does not use some form of
   Wi-Fi encryption, or if the attacker can access the decrypted
   traffic, the analysis will also provide the correlation between MAC
   addresses and IP addresses.  Additional monitoring, using the
   techniques exposed in the previous sections, will reveal the
   correlation between MAC addresses, IP addresses, and user identity.

   Given that large-scale databases of the MAC addresses of wireless
   access points, built for geolocation purposes, have been known to
   exist for some time, the attacker could easily build a database
   linking MAC addresses and device or user identities, and use it to
   track the movement of devices and of their owners.

4.  Reported Instances of Large-Scale Attacks

   The situation in reality is more bleak than that suggested by an
   analysis of our idealized attacker.  Through revelations of
   sensitive documents in several media outlets, the Internet community
   has been made aware of several intelligence activities conducted by
   US and UK national intelligence agencies, particularly the US
   National Security Agency (NSA) and the UK Government Communications
   Headquarters (GCHQ).  These documents have revealed methods that
   these agencies use to attack Internet applications and obtain
   sensitive user information.

   First, they have confirmed that these agencies have capabilities in
   line with those of our idealized attacker, through the large-scale
   passive collection of Internet traffic [pass1][pass2][pass3][pass4].
   For example:

   o  The NSA XKEYSCORE system accesses data from multiple access
      points and searches for "selectors" such as email addresses, at
      the scale of tens of terabytes of data per day.

   o  The GCHQ Tempora system appears to have access to around 1,500
      major cables passing through the UK.

   o  The NSA MUSCULAR program tapped cables between data centers
      belonging to major service providers.

   o  Several programs appear to perform wide-scale collection of
      cookies in web traffic and of location data from location-aware
      portable devices such as smartphones.

   However, the capabilities described go beyond those available to our
   idealized attacker, including:

   o  Decryption of TLS-protected Internet sessions [dec1][dec2][dec3].
      For example, the NSA BULLRUN project appears to have had a budget
      of around $250M per year to undermine encryption through multiple
      approaches.

   o  Insertion of NSA devices as a man-in-the-middle of Internet
      transactions [TOR1][TOR2].  For example, the NSA QUANTUM system
      appears to use several different techniques to hijack HTTP
      connections, ranging from DNS response injection to HTTP 302
      redirects.

   o  Direct acquisition of bulk data and metadata from service
      providers [dir1][dir2][dir3].  For example, the NSA PRISM program
      provides the agency with access to many types of user data (e.g.,
      email, chat, VoIP).

   o  Use of implants (covert modifications or malware) to undermine
      security and anonymity features [dec2][TOR1][TOR2].  For example:

      *  The NSA appears to use the QUANTUM man-in-the-middle system to
         direct users to a FOXACID server, which delivers an implant to
         compromise the browser of a user of the Tor anonymous
         communications network.

      *  Implants are apparently available for Cisco, Juniper, Huawei,
         Dell, and HP network elements, provided by the NSA Advanced
         Network Technology group [spiegel1].

      *  Hosts are compromised at botnet scale, using tools built by
         the NSA's Remote Operations Center [spiegel3].

      *  The BULLRUN program mentioned above includes the addition of
         covert modifications to software as one means to undermine
         encryption.

      *  There is also some suspicion that NSA modifications to the
         DUAL_EC_DRBG random number generator were made to ensure that
         keys generated using that generator could be predicted by the
         NSA.  These suspicions have been reinforced by reports that
         RSA Security was paid roughly $10M to make DUAL_EC_DRBG the
         default in their products.

   We use the term "pervasive attack" [RFC7258] to collectively
   describe these operations.  The term "pervasive" is used because the
   attacks are designed to indiscriminately gather as much data as
   possible and to apply selective analysis on targets after the fact.
   This means that all, or nearly all, Internet communications are
   targets for these attacks.  To achieve this scale, the attacks are
   physically pervasive; they affect a large number of Internet
   communications.  They are pervasive in content, consuming and
   exploiting any information revealed by the protocol.  And they are
   pervasive in technology, exploiting many different vulnerabilities
   in many different protocols.

   It's important to note that although the attacks mentioned above
   were executed by the NSA and GCHQ, there are many other
   organizations that can mount pervasive surveillance attacks.
   Because of the resources required to achieve pervasive scale, these
   attacks are most commonly undertaken by nation-state actors.
   For example, the Chinese Internet filtering system known as the
   "Great Firewall of China" uses several techniques that are similar
   to the QUANTUM program, and which have a high degree of
   pervasiveness with regard to the Internet in China.

5.  Threat Model

   Given these disclosures, we must consider a broader threat model.

   Pervasive surveillance aims to collect information across a large
   number of Internet communications, analyzing the collected
   communications to identify information of interest within individual
   communications, or to infer information from correlated
   communications.  This analysis sometimes benefits from decryption of
   encrypted communications and deanonymization of anonymized
   communications.  As a result, these attackers desire both access to
   the bulk of Internet traffic and access to the keying material
   required to decrypt any traffic that has been encrypted.  Note that
   even if the attacker cannot decrypt a communication, its presence
   and the fact that it is encrypted may both be inputs to an analysis.

   The attacks listed above highlight new avenues both for access to
   traffic and for access to relevant encryption keys.  They further
   indicate that the scale of surveillance is sufficient to provide a
   general capability to cross-correlate communications, a threat not
   previously thought to be relevant at the scale of the Internet.

5.1.  Attacker Capabilities

   +--------------------------+-------------------------------------+
   | Attack Class             | Capability                          |
   +--------------------------+-------------------------------------+
   | Passive observation      | Directly capture data in transit    |
   |                          |                                     |
   | Passive inference        | Infer from reduced/encrypted data   |
   |                          |                                     |
   | Active                   | Manipulate / inject data in transit |
   |                          |                                     |
   | Static key exfiltration  | Obtain key material once / rarely   |
   |                          |                                     |
   | Dynamic key exfiltration | Obtain per-session key material     |
   |                          |                                     |
   | Content exfiltration     | Access data at rest                 |
   +--------------------------+-------------------------------------+

   Security analyses of Internet protocols commonly consider two
   classes of attacker: flow access attackers, who can simply listen in
   on communications as they transit the network, and flow modification
   attackers, who can modify or delete packets in addition to simply
   collecting them.

   In the context of pervasive surveillance, these attacks take on an
   even greater significance.  In the past, these attackers were often
   assumed to operate near the edge of the network, where attacks can
   be simpler.  For example, in some LANs, it is simple for any node to
   engage in passive listening to other nodes' traffic or to inject
   packets to accomplish flow modification attacks.  However, as we now
   know, both passive and flow modification attacks are undertaken by
   pervasive attackers closer to the core of the network, greatly
   expanding the scope and capability of the attacker.

   Eavesdropping and observation at a larger scale make passive
   inference attacks easier to carry out: a flow access attacker with
   access to a large portion of the Internet can analyze collected
   traffic to create a much more detailed view of individual behavior
   than an attacker that collects at a single point.
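   A toy sketch of this kind of cross-correlation (the flow records
   below are invented for illustration): an attacker that collects
   flows at several vantage points can attribute an encrypted session
   by pairing it with a cleartext session from the same host observed
   elsewhere:

```python
# Toy sketch of cross-correlation over collected flow records: link an
# encrypted session to an unencrypted one from the same host observed
# at a different vantage point.  Records are invented for illustration.

from collections import defaultdict

records = [
    # (vantage_point, src_ip, dst, encrypted)
    ("tap-1", "192.0.2.44", "mail.example.net:143", False),  # cleartext IMAP
    ("tap-2", "192.0.2.44", "site.example.org:443", True),   # TLS
    ("tap-2", "192.0.2.57", "site.example.org:443", True),
]

by_host = defaultdict(list)
for vantage, src, dst, enc in records:
    by_host[src].append((vantage, dst, enc))

# Any host seen with both encrypted and unencrypted sessions lets the
# attacker attribute the encrypted traffic via the cleartext one.
linkable = {src for src, flows in by_host.items()
            if any(enc for _, _, enc in flows)
            and any(not enc for _, _, enc in flows)}
print(linkable)  # {'192.0.2.44'}
```

   A single-point collector would see only one of the two sessions for
   each host and could not make this link.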
Even the usual claim that encryption defeats flow access attackers is
weakened, since a pervasive flow access attacker can infer
relationships from correlations over large numbers of sessions, e.g.,
pairing encrypted sessions with unencrypted sessions from the same
host, or performing traffic fingerprinting between known and unknown
encrypted sessions.  Reports on the NSA XKEYSCORE system indicate
that it is an example of such an attacker.

A pervasive flow modification attacker likewise has capabilities
beyond those of a localized flow modification attacker.  Flow
modification attacks are often limited by network topology, for
example by a requirement that the attacker be able to see a targeted
session as well as inject packets into it.  A pervasive flow
modification attacker with access at multiple points within the core
of the Internet is able to overcome these topological limitations and
perform attacks over a much broader scope.  Being positioned in the
core of the network rather than at the edge can also enable a
pervasive flow modification attacker to reroute targeted traffic,
amplifying the ability to perform both eavesdropping and traffic
injection.  Pervasive flow modification attackers can also benefit
from pervasive passive collection to identify vulnerable hosts.

While not directly related to pervasiveness, attackers that are in a
position to mount a pervasive flow modification attack are also often
in a position to subvert authentication, a traditional protection
against such attacks.  Authentication in the Internet is often
achieved via trusted third-party authorities such as the Certificate
Authorities (CAs) that provide web sites with authentication
credentials.  An attacker with sufficient resources may also be able
to induce an authority to grant credentials for an identity of the
attacker's choosing.
If the parties to a communication will trust multiple authorities to
certify a specific identity, this attack may be mounted by suborning
any one of the authorities (the proverbial "weakest link").
Subversion of authorities in this way can allow a flow modification
attack to succeed in spite of an authentication check.

Beyond these three classes (observation, inference, and active),
reports on the BULLRUN effort to defeat encryption and the PRISM
effort to obtain data from service providers suggest three more
classes of attack:

o  Static key exfiltration

o  Dynamic key exfiltration

o  Content exfiltration

These attacks all rely on a collaborator providing the attacker with
some information, either keys or data.  These attacks have not
traditionally been considered in scope for the Security
Considerations sections of IETF protocols, as they occur outside the
protocol.

The term "key exfiltration" refers to the transfer of keying material
for an encrypted communication from the collaborator to the attacker.
By "static", we mean that the transfer of keys happens once, or
rarely, typically of a long-lived key.  For example, this case would
cover a web site operator that provides the private key corresponding
to its HTTPS certificate to an intelligence agency.

"Dynamic" key exfiltration, by contrast, refers to attacks in which
the collaborator delivers keying material to the attacker frequently,
e.g., on a per-session basis.  This does not necessarily imply
frequent communications with the attacker; the transfer of keying
material may be virtual.  For example, if an endpoint were modified
in such a way that the attacker could predict the state of its
pseudorandom number generator, then the attacker would be able to
derive per-session keys even without per-session communications.
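The predictable-generator case above can be sketched as follows.  This is a deliberately toy construction (the generator, seed, and key-derivation step are all invented for illustration, not taken from any real implementation): once the attacker knows the generator's state, every per-session key follows without any further contact with the collaborator.

```python
import hashlib

class PredictablePRNG:
    """A toy deterministic generator: anyone who knows the seed can
    reproduce every output it will ever emit."""

    def __init__(self, seed: bytes):
        self.state = seed

    def next_bytes(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            # Advance the state by hashing it; emit the new state.
            self.state = hashlib.sha256(self.state).digest()
            out += self.state
        return out[:n]

def session_key(prng: PredictablePRNG) -> bytes:
    # The subverted endpoint draws 16 bytes per session for its key.
    return prng.next_bytes(16)

# The endpoint and the attacker run the same construction from the
# same seed (obtained once, or predicted) ...
endpoint = PredictablePRNG(seed=b"state-known-to-attacker")
attacker = PredictablePRNG(seed=b"state-known-to-attacker")

# ... so the attacker derives each per-session key independently,
# with no per-session communication: a "virtual" key transfer.
for _ in range(3):
    assert session_key(endpoint) == session_key(attacker)
```

The sketch shows why this counts as dynamic key exfiltration even though no keys cross the wire: the single act of subverting the generator stands in for an ongoing stream of key deliveries.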
Finally, content exfiltration is the attack in which the collaborator
simply provides the attacker with the desired data or metadata.
Unlike the key exfiltration cases, this attack does not require the
attacker to capture the desired data as it flows through the network.
The risk is to data at rest as opposed to data in transit.  This
increases the scope of data that the attacker can obtain, since the
attacker can access historical data; the attacker does not have to
be listening at the time the communication happens.

Exfiltration attacks can be accomplished via attacks against one of
the parties to a communication, i.e., by the attacker stealing the
keys or content rather than the party providing them willingly.  In
these cases, the party may not be aware that they are collaborating,
at least at a human level.  Rather, the subverted technical assets
are "collaborating" with the attacker (by providing keys/content)
without their owner's knowledge or consent.

Any party that has access to encryption keys or unencrypted data can
be a collaborator.  While collaborators are typically the endpoints
of a communication (with encryption securing the links),
intermediaries in an unencrypted communication can also facilitate
content exfiltration attacks as collaborators by providing the
attacker access to those communications.  For example, documents
describing the NSA PRISM program claim that the NSA is able to access
user data directly from servers, where it is stored unencrypted.  In
these cases, the operator of the server would be a collaborator, if
an unwitting one.  By contrast, in the NSA MUSCULAR program, a set of
collaborators enabled attackers to access the cables connecting data
centers used by service providers such as Google and Yahoo.
Because communications among these data centers were not encrypted,
the collaboration by an intermediate entity allowed the NSA to
collect unencrypted user data.

5.2.  Attacker Costs

   +--------------------------+-----------------------------------+
   | Attack Class             | Cost / Risk to Attacker           |
   +--------------------------+-----------------------------------+
   | Passive observation      | Passive data access               |
   |                          |                                   |
   | Passive inference        | Passive data access + processing  |
   |                          |                                   |
   | Active                   | Active data access + processing   |
   |                          |                                   |
   | Static key exfiltration  | One-time interaction              |
   |                          |                                   |
   | Dynamic key exfiltration | Ongoing interaction / code change |
   |                          |                                   |
   | Content exfiltration     | Ongoing, bulk interaction         |
   +--------------------------+-----------------------------------+

Each of the attack types discussed in the previous section entails
certain costs and risks.  These costs differ by attack, and can be
helpful in guiding responses to pervasive attack.

Depending on the attack, the attacker may be exposed to several types
of risk, ranging from simply losing access to arrest or prosecution.
In order for any of these negative consequences to occur, however,
the attacker must first be discovered and identified.  So the primary
risk we focus on here is the risk of discovery and attribution.

A flow access attack is the simplest to mount in some ways.  The base
requirement is that the attacker obtain physical access to a
communications medium and extract communications from it.  For
example, the attacker might tap a fiber-optic cable, acquire a mirror
port on a switch, or listen to a wireless signal.  The need for these
taps to have physical access or proximity to a link exposes the
attacker to the risk that the taps will be discovered.
For example, a fiber tap or mirror port might be discovered by
network operators noticing increased attenuation in the fiber or a
change in switch configuration.  Of course, flow access attacks may
be accomplished with the cooperation of the network operator, in
which case there is a risk that the attacker's interactions with the
network operator will be exposed.

In many ways, the costs and risks for a flow modification attack are
similar to those for a flow access attack, with a few additions.  A
flow modification attacker requires more robust network access than a
flow access attacker, since for example they will often need to
transmit data as well as receive it.  In the wireless example above,
the attacker would need to act as a transmitter as well as a
receiver, greatly increasing the probability that the attacker will
be discovered (e.g., using direction-finding technology).  Flow
modification attacks are also much more observable at higher layers
of the network.  For example, a flow modification attacker that
attempts to use a mis-issued certificate could be detected via
Certificate Transparency [RFC6962].

In terms of raw implementation complexity, flow access attacks
require only enough processing to extract information from the
network and store it.  Flow modification attacks, by contrast, often
depend on winning race conditions to inject packets into active
connections.  So flow modification attacks in the core of the network
require processing hardware that can operate at line speed (roughly
100 Gbps to 1 Tbps in the core) to identify opportunities for attack
and insert attack traffic into high-volume traffic.  Key exfiltration
attacks rely on a flow access attack for access to encrypted data,
with the collaborator providing keys to decrypt the data.
So the attacker undertakes the cost and risk of a flow access attack,
as well as the additional risk of discovery via the interactions that
the attacker has with the collaborator.

In this sense, static exfiltration has a lower risk profile than
dynamic.  In the static case, the attacker need only interact with
the collaborator a small number of times, possibly only once, say to
exchange a private key.  In the dynamic case, the attacker must have
continuing interactions with the collaborator.  As noted above, these
interactions may be real, such as in-person meetings, or virtual,
such as software modifications that render keys available to the
attacker.  Both of these types of interactions introduce a risk that
they will be discovered, e.g., by employees of the collaborator
organization noticing suspicious meetings or suspicious code changes.

Content exfiltration has a similar risk profile to dynamic key
exfiltration.  In a content exfiltration attack, the attacker saves
the cost and risk of conducting a flow access attack.  The risk of
discovery through interactions with the collaborator, however, is
still present, and may be higher.  The content of a communication is
obviously larger than the key used to encrypt it, often by several
orders of magnitude.  So in the content exfiltration case, the
interactions between the collaborator and the attacker need to be
much higher-bandwidth than in the key exfiltration cases, with a
corresponding increase in the risk that this high-bandwidth channel
will be discovered.

It should also be noted that in these latter three exfiltration
cases, the collaborator also undertakes a risk that their
collaboration with the attacker will be discovered.  Thus the
attacker may have to incur additional cost in order to convince the
collaborator to participate in the attack.
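The bandwidth asymmetry between key and content exfiltration noted above can be made concrete with back-of-envelope arithmetic.  The session count and sizes below are assumptions chosen only to illustrate the orders of magnitude involved:

```python
# Illustrative arithmetic only; all quantities are assumed, not
# measured.  One day of exfiltration from a single collaborator.
sessions_per_day = 1_000_000
key_size = 32                 # bytes per exfiltrated session key
avg_content_size = 1_000_000  # bytes of content per session (~1 MB)

# Dynamic key exfiltration moves only keys; content exfiltration
# moves the sessions themselves.
key_channel = sessions_per_day * key_size
content_channel = sessions_per_day * avg_content_size

# The content channel exceeds the key channel by the content/key
# ratio, several orders of magnitude under these assumptions.
ratio = content_channel / key_channel
assert ratio == avg_content_size / key_size
```

With these (hypothetical) numbers the content channel carries tens of thousands of times more data than the key channel, which is the sense in which its discovery risk is correspondingly higher.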
Likewise, the scope of these attacks is limited to cases where the
attacker can convince a collaborator to participate.  If the attacker
is a national government, for example, it may be able to compel
participation within its borders, but have a much more difficult time
recruiting foreign collaborators.

As noted above, the collaborator in an exfiltration attack can be
unwitting; the attacker can steal keys or data to enable the attack.
In some ways, the risks of this approach are similar to the case of
an active collaborator.  In the static case, the attacker needs to
steal information from the collaborator once; in the dynamic case,
the attacker needs a continued presence inside the collaborator's
systems.  The main difference is that the risk in this case is of
automated discovery (e.g., by intrusion detection systems) rather
than discovery by humans.

6.  Security Considerations

This document describes a threat model for pervasive surveillance
attacks.  Mitigations are to be given in a future document.

7.  IANA Considerations

This document has no actions for IANA.

8.  Acknowledgements

Thanks to Dave Thaler for the list of attacks and taxonomy; to
Security Area Directors Stephen Farrell, Sean Turner, and Kathleen
Moriarty for starting and managing the IETF's discussion on pervasive
attack; and to Stephan Neuhaus, Mark Townsley, Chris Inacio,
Evangelos Halepilidis, Bjoern Hoehrmann, Aziz Mohaisen, as well as
the IAB Privacy and Security Program, for their input.

9.  References

9.1.  Normative References

[RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
           Morris, J., Hansen, M., and R. Smith, "Privacy
           Considerations for Internet Protocols", RFC 6973, July
           2013.

9.2.  Informative References

[pass1]    The Guardian, "How the NSA is still harvesting your online
           data", 2013.
[pass2]    The Guardian, "NSA's Prism surveillance program: how it
           works and what it can do", 2013.

[pass3]    The Guardian, "XKeyscore: NSA tool collects 'nearly
           everything a user does on the internet'", 2013.

[pass4]    The Guardian, "How does GCHQ's internet surveillance
           work?", n.d.

[dec1]     The New York Times, "N.S.A. Able to Foil Basic Safeguards
           of Privacy on Web", 2013.

[dec2]     The Guardian, "Project Bullrun - classification guide to
           the NSA's decryption program", 2013.

[dec3]     The Guardian, "Revealed: how US and UK spy agencies defeat
           internet privacy and security", 2013.

[TOR]      The Tor Project, "Tor", 2013.

[TOR1]     Schneier, B., "How the NSA Attacks Tor/Firefox Users With
           QUANTUM and FOXACID", 2013.

[TOR2]     The Guardian, "'Tor Stinks' presentation - read the full
           document", 2013.

[dir1]     The Guardian, "NSA collecting phone records of millions of
           Verizon customers daily", 2013.

[dir2]     The Guardian, "NSA Prism program taps in to user data of
           Apple, Google and others", 2013.

[dir3]     The Guardian, "Sigint - how the NSA collaborates with
           technology companies", 2013.

[secure]   Schneier, B., "NSA surveillance: A guide to staying
           secure", 2013.

[snowden]  Technology Review, "NSA Leak Leaves Crypto-Math Intact but
           Highlights Known Workarounds", 2013.

[spiegel1] Stocker, C., "NSA's Secret Toolbox: Unit Offers Spy
           Gadgets for Every Need", December 2013.

[spiegel3] Schmundt, H., "The Digital Arms Race: NSA Preps America
           for Future Battle", January 2014.

[key-recovery]
           Golle, P., "The Design and Implementation of Protocol-
           Based Hidden Key Recovery", 2003.

[RFC1035]  Mockapetris, P., "Domain names - implementation and
           specification", STD 13, RFC 1035, November 1987.
[RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and
           E. Lear, "Address Allocation for Private Internets", BCP
           5, RFC 1918, February 1996.

[RFC1939]  Myers, J. and M. Rose, "Post Office Protocol - Version 3",
           STD 53, RFC 1939, May 1996.

[RFC2015]  Elkins, M., "MIME Security with Pretty Good Privacy
           (PGP)", RFC 2015, October 1996.

[RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
           April 2001.

[RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
           A., Peterson, J., Sparks, R., Handley, M., and E.
           Schooler, "SIP: Session Initiation Protocol", RFC 3261,
           June 2002.

[RFC3365]  Schiller, J., "Strong Security Requirements for Internet
           Engineering Task Force Standard Protocols", BCP 61, RFC
           3365, August 2002.

[RFC3501]  Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
           4rev1", RFC 3501, March 2003.

[RFC3851]  Ramsdell, B., "Secure/Multipurpose Internet Mail
           Extensions (S/MIME) Version 3.1 Message Specification",
           RFC 3851, July 2004.

[RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
           Rose, "DNS Security Introduction and Requirements", RFC
           4033, March 2005.

[RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
           Internet Protocol", RFC 4301, December 2005.

[RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)", RFC
           4303, December 2005.

[RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", RFC
           4306, December 2005.

[RFC4949]  Shirey, R., "Internet Security Glossary, Version 2", RFC
           4949, August 2007.

[RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
           (TLS) Protocol Version 1.2", RFC 5246, August 2008.

[RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
           October 2008.

[RFC5655]  Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
           Wagner, "Specification of the IP Flow Information Export
           (IPFIX) File Format", RFC 5655, October 2009.

[RFC5750]  Ramsdell, B. and S. Turner, "Secure/Multipurpose Internet
           Mail Extensions (S/MIME) Version 3.2 Certificate
           Handling", RFC 5750, January 2010.

[RFC6120]  Saint-Andre, P., "Extensible Messaging and Presence
           Protocol (XMPP): Core", RFC 6120, March 2011.

[RFC6962]  Laurie, B., Langley, A., and E. Kasper, "Certificate
           Transparency", RFC 6962, June 2013.

[RFC6698]  Hoffman, P. and J. Schlyter, "The DNS-Based Authentication
           of Named Entities (DANE) Transport Layer Security (TLS)
           Protocol: TLSA", RFC 6698, August 2012.

[RFC7011]  Claise, B., Trammell, B., and P. Aitken, "Specification of
           the IP Flow Information Export (IPFIX) Protocol for the
           Exchange of Flow Information", STD 77, RFC 7011, September
           2013.

[RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
           Attack", BCP 188, RFC 7258, May 2014.

Authors' Addresses

Richard Barnes

Email: rlb@ipv.sx

Bruce Schneier

Email: schneier@schneier.com

Cullen Jennings

Email: fluffy@cisco.com

Ted Hardie

Email: ted.ietf@gmail.com

Brian Trammell

Email: ietf@trammell.ch

Christian Huitema

Email: huitema@huitema.net

Daniel Borkmann

Email: dborkman@redhat.com