idnits 2.17.1 draft-iab-privsec-confidentiality-threat-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The document has examples using IPv4 documentation addresses according to RFC6890, but does not use any IPv6 documentation addresses. Maybe there should be IPv6 examples, too? Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (May 28, 2015) is 3256 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Obsolete informational reference (is this intentional?): RFC 3501 (Obsoleted by RFC 9051) -- Obsolete informational reference (is this intentional?): RFC 5246 (Obsoleted by RFC 8446) -- Obsolete informational reference (is this intentional?): RFC 6962 (Obsoleted by RFC 9162) == Outdated reference: A later version (-06) exists of draft-ietf-dprive-problem-statement-05 Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group R. Barnes 3 Internet-Draft 4 Intended status: Informational B. Schneier 5 Expires: November 29, 2015 6 C. Jennings 8 T. Hardie 10 B. Trammell 12 C. Huitema 14 D. Borkmann 16 May 28, 2015 18 Confidentiality in the Face of Pervasive Surveillance: A Threat Model 19 and Problem Statement 20 draft-iab-privsec-confidentiality-threat-07 22 Abstract 24 Since the initial revelations of pervasive surveillance in 2013, 25 several classes of attacks on Internet communications have been 26 discovered. In this document we develop a threat model that 27 describes these attacks on Internet confidentiality. We assume an 28 attacker that is interested in undetected, indiscriminate 29 eavesdropping. The threat model is based on published, verified 30 attacks. 32 Status of This Memo 34 This Internet-Draft is submitted in full conformance with the 35 provisions of BCP 78 and BCP 79. 37 Internet-Drafts are working documents of the Internet Engineering 38 Task Force (IETF). Note that other groups may also distribute 39 working documents as Internet-Drafts. The list of current Internet- 40 Drafts is at http://datatracker.ietf.org/drafts/current/. 42 Internet-Drafts are draft documents valid for a maximum of six months 43 and may be updated, replaced, or obsoleted by other documents at any 44 time. It is inappropriate to use Internet-Drafts as reference 45 material or to cite them other than as "work in progress." 47 This Internet-Draft will expire on November 29, 2015. 49 Copyright Notice 51 Copyright (c) 2015 IETF Trust and the persons identified as the 52 document authors. All rights reserved. 
54 This document is subject to BCP 78 and the IETF Trust's Legal 55 Provisions Relating to IETF Documents 56 (http://trustee.ietf.org/license-info) in effect on the date of 57 publication of this document. Please review these documents 58 carefully, as they describe your rights and restrictions with respect 59 to this document. Code Components extracted from this document must 60 include Simplified BSD License text as described in Section 4.e of 61 the Trust Legal Provisions and are provided without warranty as 62 described in the Simplified BSD License. 64 Table of Contents 66 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 67 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 68 3. An Idealized Passive Pervasive Attacker . . . . . . . . . . . 5 69 3.1. Information subject to direct observation . . . . . . . . 6 70 3.2. Information useful for inference . . . . . . . . . . . . 6 71 3.3. An illustration of an ideal passive pervasive attack . . 7 72 3.3.1. Analysis of IP headers . . . . . . . . . . . . . . . 7 73 3.3.2. Correlation of IP addresses to user identities . . . 8 74 3.3.3. Monitoring messaging clients for IP address 75 correlation . . . . . . . . . . . . . . . . . . . . . 8 76 3.3.4. Retrieving IP addresses from mail headers . . . . . . 9 77 3.3.5. Tracking address usage with web cookies . . . . . . . 9 78 3.3.6. Graph-based approaches to address correlation . . . . 10 79 3.3.7. Tracking of Link Layer Identifiers . . . . . . . . . 10 80 4. Reported Instances of Large-Scale Attacks . . . . . . . . . . 11 81 5. Threat Model . . . . . . . . . . . . . . . . . . . . . . . . 13 82 5.1. Attacker Capabilities . . . . . . . . . . . . . . . . . . 14 83 5.2. Attacker Costs . . . . . . . . . . . . . . . . . . . . . 17 84 6. Security Considerations . . . . . . . . . . . . . . . . . . . 19 85 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 86 8. IAB Members at the Time of Approval . . . . . . . . . . . . . 20 87 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 20 88 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 20 89 10.1. Normative References . . . . . . . . . . . . . . . . . . 20 90 10.2. Informative References . . . . . . . . . . . . . . . . . 20 91 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23 93 1. Introduction 95 Starting in June 2013, documents released to the press by Edward 96 Snowden have revealed several operations undertaken by intelligence 97 agencies to exploit Internet communications for intelligence 98 purposes. These attacks were largely based on protocol 99 vulnerabilities that were already known to exist. The attacks were 100 nonetheless striking in their pervasive nature, both in terms of the 101 volume of Internet traffic targeted, and in terms of the diversity of 102 attack techniques employed. 104 To ensure that the Internet can be trusted by users, it is necessary 105 for the Internet technical community to address the vulnerabilities 106 exploited in these attacks [RFC7258]. The goal of this document is 107 to describe more precisely the threats posed by these pervasive 108 attacks, and based on those threats, lay out the problems that need 109 to be solved in order to secure the Internet in the face of those 110 threats. 112 The remainder of this document is structured as follows. In 113 Section 3, we describe an idealized passive pervasive attacker, one 114 which could completely undetectably compromise communications at 115 Internet scale. 
In Section 4, we provide a brief summary of some 116 attacks that have been disclosed, and use these to expand the assumed 117 capabilities of our idealized attacker. Note that we do not attempt 118 to describe all possible attacks, but focus on those which result in 119 undetected eavesdropping. Section 5 describes a threat model based 120 on these attacks, focusing on classes of attack that have not been a 121 focus of Internet engineering to date. 123 2. Terminology 125 This document makes extensive use of standard security and privacy 126 terminology; see [RFC4949] and [RFC6973]. Terms used from [RFC6973] 127 include Eavesdropper, Observer, Initiator, Intermediary, Recipient, 128 Attack (in a privacy context), Correlation, Fingerprint, Traffic 129 Analysis, and Identifiability (and related terms). In addition, we 130 use a few terms that are specific to the attacks discussed in this 131 document. Note especially that "passive" and "active" below do not 132 refer to the effort used to mount the attack; a "passive attack" is 133 any attack that accesses a flow but does not modify it, while an 134 "active attack" is any attack that modifies a flow. Some passive 135 attacks involve active interception and modification of devices, 136 rather than simple access to the medium. The introduced terms are: 138 Pervasive Attack: An attack on Internet communications that makes 139 use of access at a large number of points in the network, or 140 otherwise provides the attacker with access to a large amount of 141 Internet traffic; see [RFC7258]. 143 Passive Pervasive Attack: An eavesdropping attack undertaken by a 144 pervasive attacker, in which the packets in a traffic stream 145 between two endpoints are intercepted, but in which the attacker 146 does not modify the packets in the traffic stream between two 147 endpoints, modify the treatment of packets in the traffic stream 148 (e.g., delay, routing), or add or remove packets in the traffic 149 stream. Passive pervasive attacks are undetectable from the 150 endpoints. Equivalent to passive wiretapping as defined in 151 [RFC4949]; we use an alternate term here since the methods 152 employed are wider than those implied by the word "wiretapping", 153 including the active compromise of intermediate systems. 155 Active Pervasive Attack: An attack undertaken by a pervasive 156 attacker, which, in addition to the elements of a passive pervasive 157 attack, also includes modification, addition, or removal of 158 packets in a traffic stream, or modification of treatment of 159 packets in the traffic stream. Active pervasive attacks provide 160 more capabilities to the attacker at the risk of possible 161 detection at the endpoints. Equivalent to active wiretapping as 162 defined in [RFC4949]. 164 Observation: Information collected directly from communications by 165 an eavesdropper or observer. For example, the knowledge that 166 <alice@example.com> sent a message to <bob@example.org> via SMTP, 167 taken from the headers of an observed SMTP message, would be an 168 observation. 170 Inference: Information derived from analysis of information 171 collected directly from communications by an eavesdropper or 172 observer. For example, the knowledge that a given web page was 173 accessed by a given IP address, by comparing the size in octets of 174 measured network flow records to fingerprints derived from known 175 sizes of linked resources on the web servers involved, would be an 176 inference.
178 Collaborator: An entity that is a legitimate participant in a 179 communication, and provides information about that communication 180 to an attacker. Collaborators may either deliberately or 181 unwittingly cooperate with the attacker, in the latter case 182 because the attacker has subverted the collaborator through 183 technical, social, or other means. 185 Key Exfiltration: The transmission of cryptographic keying material 186 for an encrypted communication from a collaborator, deliberately 187 or unwittingly, to an attacker. 189 Content Exfiltration: The transmission of the content of a 190 communication from a collaborator, deliberately or unwittingly, to 191 an attacker. 193 3. An Idealized Passive Pervasive Attacker 195 In considering the threat posed by pervasive surveillance, we begin 196 by defining an idealized passive pervasive attacker. While this 197 attacker is less capable than those which we now know from press 198 reports to have compromised the Internet, as elaborated in 199 Section 4, it does set a lower bound on the capabilities of an 200 attacker interested in indiscriminate passive surveillance while 201 remaining undetectable. We note that, prior to the 202 Snowden revelations in 2013, the assumptions of attacker capability 203 presented here would have been considered on the border of paranoia outside 204 the network security community. 206 Our idealized attacker is an indiscriminate eavesdropper on an 207 Internet-attached computer network that: 209 o can observe every packet of all communications at any hop in any 210 network path between an initiator and a recipient; 212 o can observe data at rest in any intermediate system between the 213 endpoints controlled by the initiator and recipient; and 215 o can share information with other such attackers; but 217 o takes no other action with respect to these communications (i.e., 218 blocking, modification, injection, etc.). 220 The techniques available to our ideal attacker are direct 221 observation and inference. Direct observation involves taking 222 information directly from eavesdropped communications, such as 223 URLs identifying content or email addresses identifying 224 individuals from application-layer headers. Inference, on the 225 other hand, involves analyzing observed information to derive new 226 information, such as searching for application or behavioral 227 fingerprints in observed traffic to derive information about the 228 observed individual. The use of encryption is generally 229 sufficient to provide confidentiality by preventing direct 230 observation of content, assuming, of course, uncompromised 231 encryption implementations and cryptographic keying material. 232 However, encryption provides less complete protection against 233 inference, especially inferences based only on plaintext portions 234 of communications, such as IP and TCP headers for TLS-protected 235 traffic [RFC5246]. 237 3.1. Information subject to direct observation 239 Protocols which do not encrypt their payload make the entire content 240 of the communication available to the idealized attacker along their 241 path. Following the advice in [RFC3365], most such protocols have a 242 secure variant which encrypts payload for confidentiality, and these 243 secure variants are seeing ever-wider deployment. A noteworthy 244 exception is DNS [RFC1035], as DNSSEC [RFC4033] does not have 245 confidentiality as a requirement.
247 This implies that, in the absence of changes to the protocol as 248 presently under development in the IETF's DNS Private Exchange 249 (DPRIVE) working group [I-D.ietf-dprive-problem-statement], all DNS 250 queries and answers generated by the activities of any protocol are 251 available to the attacker. 253 When store-and-forward protocols (e.g., SMTP [RFC5321]) are used, 254 intermediaries leave this data subject to observation by an attacker 255 that has compromised these intermediaries, unless the data is 256 encrypted end-to-end by the application layer protocol, or the 257 implementation uses an encrypted store for this data. 259 3.2. Information useful for inference 261 Inference is information extracted from later analysis of an observed 262 or eavesdropped communication, and/or correlation of observed or 263 eavesdropped information with information available from other 264 sources. Indeed, most useful inference performed by the attacker 265 falls under the rubric of correlation. The simplest example of this 266 is the observation of DNS queries and answers from and to a source 267 and correlating those with IP addresses with which that source 268 communicates. This can give access to information otherwise not 269 available from encrypted application payloads (e.g., the Host: 270 HTTP/1.1 request header when HTTP is used with TLS). 272 Protocols which encrypt their payload using an application- or 273 transport-layer encryption scheme (e.g., TLS) still expose all the 274 information in their network and transport layer headers to the 275 attacker, including source and destination addresses and ports. 276 IPsec ESP [RFC4303] further encrypts the transport-layer headers, but 277 still leaves IP address information unencrypted; in tunnel mode, 278 these addresses correspond to the tunnel endpoints. Features of the 279 security protocols themselves, e.g., the TLS session identifier, may 280 leak information that can be used for correlation and inference. 281 While this information is much less semantically rich than the 282 application payload, it can still be useful for inferring an 283 individual's activities. 285 Inference can also leverage information obtained from sources other 286 than direct traffic observation. Geolocation databases, for example, 287 have been developed that map IP addresses to a location, in order to 288 provide location-aware services such as targeted advertising. This 289 location information is often of sufficient resolution that it can be 290 used to draw further inferences toward identifying or profiling an 291 individual. 293 Social media provide another source of more or less publicly 294 accessible information. This information can be extremely 295 semantically rich, including information about an individual's 296 location, associations with other individuals and groups, and 297 activities. Further, this information is generally contributed and 298 curated voluntarily by the individuals themselves: it represents 299 information which the individuals are not necessarily interested in 300 protecting for privacy reasons. However, correlation of this social 301 networking data with information available from direct observation of 302 network traffic allows the creation of a much richer picture of an 303 individual's activities than either alone.
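As a rough illustration of the DNS-based correlation described earlier in this section, the short sketch below shows how an observer who has captured cleartext DNS answers and ordinary flow records could label otherwise opaque TLS connections with the server names a client resolved. The addresses, names, and tuples are invented for illustration; a real observer would feed such a script from packet capture or IPFIX-style flow export.

   # Sketch: correlate observed DNS answers with later flow records so that
   # encrypted (e.g., TLS) connections can be tagged with the server name the
   # client looked up.  All values below are illustrative placeholders.
   from collections import defaultdict

   dns_answers = [
       # (client address, queried name, address returned in the answer)
       ("198.51.100.7", "mail.example.org", "192.0.2.10"),
       ("198.51.100.7", "news.example.com", "192.0.2.20"),
   ]

   tls_flows = [
       # (source address, destination address, destination port)
       ("198.51.100.7", "192.0.2.10", 443),
       ("198.51.100.7", "192.0.2.20", 443),
   ]

   # Per-client map from server address back to the name that resolved to it.
   resolved_name = defaultdict(dict)
   for client, name, answer in dns_answers:
       resolved_name[client][answer] = name

   # The TLS payload is opaque, but each flow can still be labeled with the
   # service the client resolved immediately beforehand.
   for src, dst, port in tls_flows:
       print(src, "->", dst, port, resolved_name[src].get(dst, "unknown"))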
305 We note with some alarm that there is little that can be done at 306 protocol design time to limit such correlation by the attacker, and 307 that the existence of such data sources in many cases greatly 308 complicates the problem of protecting privacy by hardening protocols 309 alone. 311 3.3. An illustration of an ideal passive pervasive attack 313 To illustrate how capable the idealized attacker is even given its 314 limitations, we explore in this section the non-anonymity of encrypted 315 IP traffic. We examine in detail some inference techniques 316 for associating a set of addresses with an individual, in order to 317 illustrate the difficulty of defending communications against our 318 idealized attacker. The basic problem is that information 319 radiated even from protocols which have no obvious connection with 320 personal data can be correlated with other information to 321 paint a very rich behavioral picture; it takes only one unprotected 322 link in the chain to associate that picture with an identity. 324 3.3.1. Analysis of IP headers 326 Internet traffic can be monitored by tapping Internet links, or by 327 installing monitoring tools in Internet routers. Of course, a single 328 link or a single router only provides access to a fraction of the 329 global Internet traffic. However, monitoring a number of high- 330 capacity links or a set of routers placed at strategic locations 331 provides access to a good sampling of Internet traffic. 333 Tools like IPFIX [RFC7011] allow administrators to acquire statistics 334 about sequences of packets with some common properties that pass 335 through a network device. The most common set of properties used in 336 flow measurement is the "five-tuple" of source and destination 337 addresses, protocol type, and source and destination ports. These 338 statistics are commonly used for network engineering, but could 339 certainly be used for other purposes. 341 Let's assume for a moment that IP addresses can be correlated to 342 specific services or specific users. Analysis of the sequences of 343 packets will quickly reveal which users use what services, and also 344 which users engage in peer-to-peer connections with other users. 345 Analysis of traffic variations over time can be used to detect 346 increased activity by particular users, or, in the case of peer-to- 347 peer connections, increased activity within groups of users. 349 3.3.2. Correlation of IP addresses to user identities 351 The correlation of IP addresses with specific users can be done in 352 various ways. For example, tools like reverse DNS lookup can be used 353 to retrieve the DNS names of servers. Since the addresses of servers 354 tend to be quite stable and since servers are relatively less 355 numerous than users, an attacker could easily maintain its own copy 356 of the DNS for well-known or popular servers, to accelerate such 357 lookups. 359 On the other hand, the reverse lookup of IP addresses of users is 360 generally less informative. For example, a lookup of the address 361 currently used by one author's home network returns a name of the 362 form "c-192-000-002-033.hsd1.wa.comcast.net". This particular type 363 of reverse DNS lookup generally reveals only coarse-grained location 364 or provider information, equivalent to that available from 365 geolocation databases.
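The reverse-lookup step described above is straightforward to automate. The following minimal sketch uses only the Python standard library; the addresses come from the 192.0.2.0/24 documentation range, so they are placeholders and any real output depends entirely on what the local resolver returns.

   # Sketch: bulk reverse DNS lookup of observed addresses.  Server addresses
   # often have stable, descriptive PTR records; residential addresses usually
   # return only a coarse provider-assigned name, as noted above.
   import socket

   def reverse_lookup(address):
       """Return the PTR name for an address, or None if none is published."""
       try:
           hostname, _aliases, _addresses = socket.gethostbyaddr(address)
           return hostname
       except (socket.herror, socket.gaierror):
           return None

   # Documentation addresses used purely as placeholders.
   for addr in ["192.0.2.33", "192.0.2.44"]:
       print(addr, "->", reverse_lookup(addr) or "no PTR record")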
367 In many jurisdictions, Internet Service Providers (ISPs) are required 368 to provide identification on a case-by-case basis of the "owner" of a 369 specific IP address for law enforcement purposes. This is a 370 reasonably expedient process for targeted investigations, but 371 pervasive surveillance requires something more efficient. This 372 provides an incentive for the attacker to secure the cooperation of 373 the ISP in order to automate this correlation. 375 3.3.3. Monitoring messaging clients for IP address correlation 377 Even if the ISP does not cooperate, user identity can often be 378 obtained via inference. POP3 [RFC1939] and IMAP [RFC3501] are used 379 to retrieve mail from mail servers, while a variant of SMTP is used 380 to submit messages through mail servers. IMAP connections originate 381 from the client, and typically start with an authentication exchange 382 in which the client proves its identity by answering a password 383 challenge. The same holds for the SIP protocol [RFC3261] and many 384 instant messaging services operating over the Internet using 385 proprietary protocols. 387 The username is directly observable if any of these protocols operate 388 in cleartext; the username can then be directly associated with the 389 source address. 391 3.3.4. Retrieving IP addresses from mail headers 393 SMTP [RFC5321] requires that each successive SMTP relay add a 394 "Received" header to the mail headers. The purpose of these headers 395 is to enable audit of mail transmission, and perhaps to distinguish 396 between regular mail and spam. Here is an extract from the headers 397 of a message recently received from the "perpass" mailing list: 399 "Received: from 192-000-002-044.zone13.example.org (HELO 400 ?192.168.1.100?) (xxx.xxx.xxx.xxx) by lvps192-000-002-219.example.net 401 with ESMTPSA (DHE-RSA-AES256-SHA encrypted, authenticated); 27 Oct 402 2013 21:47:14 +0100 Message-ID: <526D7BD2.7070908@example.org> Date: 403 Sun, 27 Oct 2013 20:47:14 +0000 From: Some One 404 <some.one@example.org>" 406 This is the first "Received" header attached to the message by the 407 first SMTP relay; for privacy reasons, the field values have been 408 anonymized. We learn here that the message was submitted by "Some 409 One" on October 27, from a host behind a NAT (192.168.1.100) 410 [RFC1918] that used the IP address 192.0.2.44. The information 411 remained in the message, and is accessible by all recipients of the 412 "perpass" mailing list, or indeed by any attacker that sees at least 413 one copy of the message. 415 An attacker that can observe sufficient email traffic can regularly 416 update the mapping between public IP addresses and individual email 417 identities. Even if the SMTP traffic were encrypted on submission and 418 relaying, the attacker can still receive a copy of public mailing 419 lists like "perpass". 421 3.3.5. Tracking address usage with web cookies 423 Many web sites only encrypt a small fraction of their transactions. 424 A popular pattern is to use HTTPS for the login information, and then 425 use a "cookie" to associate following clear-text transactions with 426 the user's identity. Cookies are also used by various advertisement 427 services to quickly identify the users and serve them with 428 "personalized" advertisements. Such cookies are particularly useful 429 if the advertisement services want to keep tracking the user across 430 multiple sessions that may use different IP addresses.
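A minimal sketch of this kind of cookie-based re-identification is shown below; the request tuples and cookie values are invented, standing in for what an observer would extract from captured cleartext HTTP headers.

   # Sketch: a cleartext tracking cookie lets an observer carry an identity
   # learned once (e.g., from an earlier identified session) across the new
   # IP addresses from which the same cookie later appears.  All values are
   # illustrative placeholders.
   cookie_to_identity = {}   # cookie value -> identity, learned earlier
   address_to_identity = {}  # IP address  -> identity, inferred here

   observed_requests = [
       # (source address, Cookie header value, identity if independently known)
       ("198.51.100.7", "session=abc123", "alice"),  # identified once
       ("203.0.113.9",  "session=abc123", None),     # same cookie, new address
   ]

   for source, cookie, identity in observed_requests:
       if identity is not None:
           cookie_to_identity[cookie] = identity
       if cookie in cookie_to_identity:
           # The cookie ties the new address back to the known identity.
           address_to_identity[source] = cookie_to_identity[cookie]

   print(address_to_identity)
   # {'198.51.100.7': 'alice', '203.0.113.9': 'alice'}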
432 As cookies are sent in clear text, an attacker can build a database 433 that associates cookies to IP addresses for non-HTTPS traffic. If 434 the IP address is already identified, the cookie can be linked to the 435 user identity. After that, if the same cookie appears on a new IP 436 address, the new IP address can be immediately associated with the 437 pre-determined identity. 439 3.3.6. Graph-based approaches to address correlation 441 An attacker can track traffic from an IP address not yet associated 442 with an individual to various public services (e.g., websites, mail 443 servers, game servers), and exploit patterns in the observed traffic 444 to correlate this address with other addresses that show similar 445 patterns. For example, any two addresses that show connections to 446 the same IMAP or webmail services, the same set of favorite websites, 447 and game servers at similar times of day may be associated with the 448 same individual. Correlated addresses can then be tied to an 449 individual through one of the techniques above, walking the "network 450 graph" to expand the set of attributable traffic. 452 3.3.7. Tracking of Link Layer Identifiers 454 Moving back down the stack, technologies like Ethernet or Wi-Fi use 455 MAC Addresses to identify link-level destinations. MAC Addresses 456 assigned according to IEEE-802 standards are globally-unique 457 identifiers for the device. If the link is publicly accessible, an 458 attacker can eavesdrop and perform tracking. For example, the 459 attacker can track the wireless traffic at publicly accessible Wi-Fi 460 networks. Simple devices can monitor the traffic, and reveal which 461 MAC Addresses are present. Also, devices do not need to be connected 462 to a network to expose link-layer identifiers. Active service 463 discovery always discloses the MAC address of the user, and sometimes 464 the SSIDs of previously visited networks. For instance, certain 465 techniques such as the use of "hidden SSIDs" require the mobile 466 device to broadcast the network identifier together with the device 467 identifier. This combination can further expose the user to 468 inference attacks, as more information can be derived from the 469 combination of MAC address, SSID being probed, time, and current 470 location. For example, a user actively probing for a semi-unique 471 SSID on a flight out of a certain city can imply that the user is no 472 longer at the physical location of the corresponding AP. Given that 473 large-scale databases of the MAC addresses of wireless access points 474 for geolocation purposes have been known to exist for some time, the 475 attacker could easily build a database linking link-layer 476 identifiers, time, and device or user identities, and use it to track 477 the movement of devices and of their owners. On the other hand, if 478 the network does not use some form of Wi-Fi encryption, or if the 479 attacker can access the decrypted traffic, the analysis will also 480 provide the correlation between link-layer identifiers such as MAC 481 Addresses and IP addresses. Additional monitoring using techniques 482 exposed in the previous sections will reveal the correlation between 483 MAC addresses, IP addresses, and user identity. For instance, 484 similarly to the use of web cookies, MAC addresses provide identity 485 information that can be used to associate a user to different IP 486 addresses. 488 4.
Reported Instances of Large-Scale Attacks 490 The situation in reality is more bleak than that suggested by an 491 analysis of our idealized attacker. Through revelations of sensitive 492 documents in several media outlets, the Internet community has been 493 made aware of several intelligence activities conducted by US and UK 494 national intelligence agencies, particularly the US National Security 495 Agency (NSA) and the UK Government Communications Headquarters 496 (GCHQ). These documents have revealed methods that these agencies 497 use to attack Internet applications and obtain sensitive user 498 information. There is little reason to suppose that only the US or 499 UK governments are involved in these sorts of activities; the 500 examples are just ones that were disclosed. We note that these 501 reports are primarily useful as an illustration of the types of 502 capabilities fielded by pervasive attackers as of the date of the 503 Snowden leaks in 2013. 505 First, they confirm the deployment of large-scale passive collection 506 of Internet traffic, which confirms the existence of pervasive 507 passive attackers with at least the capabilities of our idealized 508 attacker. For example [pass1][pass2][pass3][pass4]: 510 o NSA's XKEYSCORE system accesses data from multiple access points 511 and searches for "selectors" such as email addresses, at the scale 512 of tens of terabytes of data per day. 514 o GCHQ's Tempora system appears to have access to around 1,500 major 515 cables passing through the UK. 517 o NSA's MUSCULAR program has tapped cables between data centers 518 belonging to major service providers. 520 o Several programs appear to perform wide-scale collection of 521 cookies in web traffic and location data from location-aware 522 portable devices such as smartphones. 524 However, the capabilities described by these reports go beyond those 525 of our idealized attacker. They include the compromise of 526 cryptographic protocols, including decryption of TLS-protected 527 Internet sessions [dec1][dec2][dec3]. For example, the NSA BULLRUN 528 project worked to undermine encryption through multiple approaches, 529 including covert modifications to cryptographic software on end 530 systems. 532 Reported capabilities include the direct compromise of intermediate 533 systems and arrangements with service providers for bulk data and 534 metadata access [dir1][dir2][dir3], bypassing the need to capture 535 traffic on the wire. For example, the NSA PRISM program provides the 536 agency with access to many types of user data (e.g., email, chat, 537 VoIP). 539 The reported capabilities also include elements of active pervasive 540 attack, including: 542 o Insertion of devices as a man-in-the-middle of Internet 543 transactions [TOR1][TOR2]. For example, NSA's QUANTUM system 544 appears to use several different techniques to hijack HTTP 545 connections, ranging from DNS response injection to HTTP 302 546 redirects. 548 o Use of implants on end systems to undermine security and anonymity 549 features [dec2][TOR1][TOR2]. For example, QUANTUM is used to 550 direct users to a FOXACID server, which in turn delivers an 551 implant to compromise browsers of Tor users. 553 o Use of implants on network elements from many major equipment 554 providers, including Cisco, Juniper, Huawei, Dell, and HP, as 555 provided by the NSA's Advanced Network Technology group. 556 [spiegel1] 558 o Use of botnet-scale collections of compromised hosts [spiegel3]. 
560 The scale of the compromise extends beyond the network to include 561 subversion of the technical standards process itself. For example, 562 there is suspicion that NSA modifications to the DUAL_EC_DRBG random 563 number generator were made to ensure that keys generated using that 564 generator could be predicted by NSA. This RNG was made part of 565 NIST's SP 800-90A, for which NIST acknowledges NSA's assistance. 566 There have also been reports that the NSA paid RSA Security for a 567 related contract with the result that the curve became the default in 568 the RSA BSAFE product line. 570 We use the term "pervasive attack" [RFC7258] to collectively describe 571 these operations. The term "pervasive" is used because the attacks 572 are designed to indiscriminately gather as much data as possible and 573 to apply selective analysis on targets after the fact. This means 574 that all, or nearly all, Internet communications are targets for 575 these attacks. To achieve this scale, the attacks are physically 576 pervasive; they affect a large number of Internet communications. 577 They are pervasive in content, consuming and exploiting any 578 information revealed by the protocol. And they are pervasive in 579 technology, exploiting many different vulnerabilities in many 580 different protocols. 582 Again, it's important to note that, although the attacks mentioned 583 above were executed by NSA and GCHQ, there are many other 584 organizations that can mount pervasive surveillance attacks. Because 585 of the resources required to achieve pervasive scale, these attacks 586 are most commonly undertaken by nation-state actors. For example, 587 the Chinese Internet filtering system known as the "Great Firewall of 588 China" uses several techniques that are similar to the QUANTUM 589 program, and which have a high degree of pervasiveness with regard to 590 the Internet in China. Therefore, legal restrictions in any one 591 jurisdiction on pervasive monitoring activities cannot eliminate the 592 risk of pervasive attack to the Internet as a whole. 594 5. Threat Model 596 Given these disclosures, we must consider a broader threat model. 598 Pervasive surveillance aims to collect information across a large 599 number of Internet communications, analyzing the collected 600 communications to identify information of interest within individual 601 communications, or inferring information from correlated 602 communications. This analysis sometimes benefits from decryption of 603 encrypted communications and deanonymization of anonymized 604 communications. As a result, these attackers desire both access to 605 the bulk of Internet traffic and to the keying material required to 606 decrypt any traffic that has been encrypted. Even if keys are not 607 available, note that the presence of a communication and the fact 608 that it is encrypted may both be inputs to an analysis, even if the 609 attacker cannot decrypt the communication. 611 The attacks listed above highlight new avenues both for access to 612 traffic and for access to relevant encryption keys. They further 613 indicate that the scale of surveillance is sufficient to provide a 614 general capability to cross-correlate communications, a threat not 615 previously thought to be relevant at the scale of the Internet. 617 5.1. 
Attacker Capabilities 619 +--------------------------+-------------------------------------+ 620 | Attack Class | Capability | 621 +--------------------------+-------------------------------------+ 622 | Passive observation | Directly capture data in transit | 623 | | | 624 | Passive inference | Infer from reduced/encrypted data | 625 | | | 626 | Active | Manipulate / inject data in transit | 627 | | | 628 | Static key exfiltration | Obtain key material once / rarely | 629 | | | 630 | Dynamic key exfiltration | Obtain per-session key material | 631 | | | 632 | Content exfiltration | Access data at rest | 633 +--------------------------+-------------------------------------+ 635 Security analyses of Internet protocols commonly consider two classes 636 of attacker: passive pervasive attackers, who can simply listen in on 637 communications as they transit the network, and active pervasive 638 attackers, who can modify or delete packets in addition to simply 639 collecting them. 641 In the context of pervasive passive surveillance, these attacks take 642 on an even greater significance. In the past, these attackers were 643 often assumed to operate near the edge of the network, where attacks 644 can be simpler. For example, in some LANs, it is simple for any node 645 to engage in passive listening to other nodes' traffic or to inject 646 packets to accomplish active pervasive attacks. However, as we now 647 know, both passive and active pervasive attacks are undertaken by 648 pervasive attackers closer to the core of the network, greatly 649 expanding the scope and capability of the attacker. 651 Eavesdropping and observation at a larger scale make passive 652 inference attacks easier to carry out: a passive pervasive attacker 653 with access to a large portion of the Internet can analyze collected 654 traffic to create a much more detailed view of individual behavior 655 than an attacker that collects at a single point. Even the usual 656 claim that encryption defeats passive pervasive attackers is 657 weakened, since a pervasive flow access attacker can infer 658 relationships from correlations over large numbers of sessions, e.g., 659 pairing encrypted sessions with unencrypted sessions from the same 660 host, or performing traffic fingerprinting between known and unknown 661 encrypted sessions. Reports on the NSA XKEYSCORE system would 662 indicate it is an example of such an attacker. 664 An active pervasive attacker likewise has capabilities beyond those 665 of a localized active attacker. Flow modification attacks are often 666 limited by network topology, for example by a requirement that the 667 attacker be able to see a targeted session as well as inject packets 668 into it. A pervasive flow modification attacker with access at 669 multiple points within the core of the Internet is able to overcome 670 these topological limitations and perform attacks over a much broader 671 scope. Being positioned in the core of the network rather than the 672 edge can also enable an active pervasive attacker to reroute targeted 673 traffic, amplifying the ability to perform both eavesdropping and 674 traffic injection. Active pervasive attackers can also benefit from 675 passive pervasive collection to identify vulnerable hosts. 677 While not directly related to pervasiveness, attackers that are in a 678 position to mount an active pervasive attack are also often in a 679 position to subvert authentication, a traditional protection against 680 such attacks.
Authentication in the Internet is often achieved via 681 trusted third-party authorities such as the Certificate Authorities 682 (CAs) that provide web sites with authentication credentials. An 683 attacker with sufficient resources may also be able to induce an 684 authority to grant credentials for an identity of the attacker's 685 choosing. If the parties to a communication trust multiple 686 authorities to certify a specific identity, this attack may be 687 mounted by suborning any one of the authorities (the proverbial 688 "weakest link"). Subversion of authorities in this way can allow an 689 active attack to succeed in spite of an authentication check. 691 Beyond these three classes (observation, inference, and active), 692 reports on the BULLRUN effort to defeat encryption and the PRISM 693 effort to obtain data from service providers suggest three more 694 classes of attack: 696 o Static key exfiltration 698 o Dynamic key exfiltration 700 o Content exfiltration 702 These attacks all rely on a collaborator providing the attacker with 703 some information, either keys or data. These attacks have not 704 traditionally been considered in scope for the Security 705 Considerations sections of IETF protocols, as they occur outside the 706 protocol. 708 The term "key exfiltration" refers to the transfer of keying material 709 for an encrypted communication from the collaborator to the attacker. 710 By "static", we mean that the transfer of keys happens once or 711 rarely, and typically involves a long-lived key. For example, this case would 712 cover a web site operator that provides the private key corresponding 713 to its HTTPS certificate to an intelligence agency. 715 "Dynamic" key exfiltration, by contrast, refers to attacks in which 716 the collaborator delivers keying material to the attacker frequently, 717 e.g., on a per-session basis. This does not necessarily imply 718 frequent communications with the attacker; the transfer of keying 719 material may be virtual. For example, if an endpoint were modified 720 in such a way that the attacker could predict the state of its 721 pseudorandom number generator, then the attacker would be able to 722 derive per-session keys even without per-session communications. 724 Finally, content exfiltration is the attack in which the collaborator 725 simply provides the attacker with the desired data or metadata. 726 Unlike the key exfiltration cases, this attack does not require the 727 attacker to capture the desired data as it flows through the network. 728 The exfiltration is of data at rest, rather than data in transit. 729 This increases the scope of data that the attacker can obtain, since 730 the attacker can access historical data - the attacker does not have 731 to be listening at the time the communication happens. 733 Exfiltration attacks can be accomplished via attacks against one of 734 the parties to a communication, i.e., by the attacker stealing the 735 keys or content rather than the party providing them willingly. In 736 these cases, the party may not be aware that they are collaborating, 737 at least at a human level. Rather, the subverted technical assets 738 are "collaborating" with the attacker (by providing keys/content) 739 without their owner's knowledge or consent. 741 Any party that has access to encryption keys or unencrypted data can 742 be a collaborator.
While collaborators are typically the endpoints 743 of a communication (with encryption securing the links), 744 intermediaries in an unencrypted communication can also facilitate 745 content exfiltration attacks as collaborators by providing the 746 attacker access to those communications. For example, documents 747 describing the NSA PRISM program claim that NSA is able to access 748 user data directly from servers, where it is stored unencrypted. In 749 these cases, the operator of the server would be a collaborator, if 750 an unwitting one. By contrast, in the NSA MUSCULAR program, a set of 751 collaborators enabled attackers to access the cables connecting data 752 centers used by service providers such as Google and Yahoo. Because 753 communications among these data centers were not encrypted, the 754 collaboration by an intermediate entity allowed NSA to collect 755 unencrypted user data. 757 5.2. Attacker Costs 759 +--------------------------+-----------------------------------+ 760 | Attack Class | Cost / Risk to Attacker | 761 +--------------------------+-----------------------------------+ 762 | Passive observation | Passive data access | 763 | | | 764 | Passive inference | Passive data access + processing | 765 | | | 766 | Active | Active data access + processing | 767 | | | 768 | Static key exfiltration | One-time interaction | 769 | | | 770 | Dynamic key exfiltration | Ongoing interaction / code change | 771 | | | 772 | Content exfiltration | Ongoing, bulk interaction | 773 +--------------------------+-----------------------------------+ 775 Each of the attack types discussed in the previous section entails 776 certain costs and risks. These costs differ by attack, and 777 understanding them can help guide responses to pervasive attack. 779 Depending on the attack, the attacker may be exposed to several types 780 of risk, ranging from simply losing access to arrest or prosecution. 781 In order for any of these negative consequences to occur, however, 782 the attacker must first be discovered and identified. So the primary 783 risk we focus on here is the risk of discovery and attribution. 785 A passive pervasive attack is the simplest to mount in some ways. 786 The base requirement is that the attacker obtain physical access to a 787 communications medium and extract communications from it. For 788 example, the attacker might tap a fiber-optic cable, acquire a mirror 789 port on a switch, or listen to a wireless signal. The need for these 790 taps to have physical access or proximity to a link exposes the 791 attacker to the risk that the taps will be discovered. For example, 792 a fiber tap or mirror port might be discovered by network operators 793 noticing increased attenuation in the fiber or a change in switch 794 configuration. Of course, passive pervasive attacks may be 795 accomplished with the cooperation of the network operator, in which 796 case there is a risk that the attacker's interactions with the 797 network operator will be exposed. 799 In many ways, the costs and risks for an active pervasive attack are 800 similar to those for a passive pervasive attack, with a few 801 additions. An active attacker requires more robust network access 802 than a passive attacker, since, for example, they will often need to 803 transmit data as well as receive it. In the wireless example above, 804 the attacker would need to act as a transmitter as well as a receiver, 805 greatly increasing the probability that the attacker will be discovered 806 (e.g., using direction-finding technology).
Active attacks are also 807 much more observable at higher layers of the network. For example, 808 an active attacker that attempts to use a mis-issued certificate 809 could be detected via Certificate Transparency [RFC6962]. 811 In terms of raw implementation complexity, passive pervasive attacks 812 require only enough processing to extract information from the 813 network and store it. Active pervasive attacks, by contrast, often 814 depend on winning race conditions to inject packets into active 815 connections. So active pervasive attacks in the core of the network 816 require processing hardware that can operate at line speed 817 (roughly 100Gbps to 1Tbps in the core) to identify opportunities for 818 attack and insert attack traffic into a high-volume traffic stream. Key 819 exfiltration attacks rely on passive pervasive attack for access to 820 encrypted data, with the collaborator providing keys to decrypt the 821 data. So the attacker undertakes the cost and risk of a passive 822 pervasive attack, as well as additional risk of discovery via the 823 interactions that the attacker has with the collaborator. 825 Some active attacks are more expensive than others. For example, 826 active man-in-the-middle (MITM) attacks require access to one or more 827 points on a communication's network path that allow visibility of the 828 entire session and the ability to modify or drop legitimate packets 829 in favor of the attacker's packets. A similar but weaker form of 830 attack, called an active man-on-the-side (MOTS), requires access to 831 only part of the session. In an active MOTS attack, the attacker 832 need only be able to inject or modify traffic on the network element 833 the attacker has access to. While this may not allow for full 834 control of a communication session (as in an MITM attack), the 835 attacker can perform a number of powerful attacks, including but not 836 limited to: injecting packets that could terminate the session (e.g., 837 TCP RST packets), sending a fake DNS reply to redirect ensuing TCP 838 connections to an address of the attacker's choice (i.e., winning a 839 "DNS response race"), and mounting an HTTP Redirect attack by 840 observing a TCP/HTTP connection to a target address and injecting a 841 TCP data packet containing an HTTP redirect. For example, the system 842 dubbed by researchers as China's "Great Cannon" [great-cannon] can 843 operate in full MITM mode to accomplish very complex attacks that can 844 modify content in transit, while the well-known Great Firewall of 845 China is a MOTS system that focuses on blocking access to certain 846 kinds of traffic and destinations via TCP RST packet injection. 848 In this sense, static exfiltration has a lower risk profile than 849 dynamic. In the static case, the attacker need only interact with 850 the collaborator a small number of times, possibly only once, say to 851 exchange a private key. In the dynamic case, the attacker must have 852 continuing interactions with the collaborator. As noted above, these 853 interactions may be real, such as in-person meetings, or virtual, 854 such as software modifications that render keys available to the 855 attacker. Both of these types of interactions introduce a risk that 856 they will be discovered, e.g., by employees of the collaborator 857 organization noticing suspicious meetings or suspicious code changes. 859 Content exfiltration has a similar risk profile to dynamic key 860 exfiltration.
In a content exfiltration attack, the attacker saves 861 the cost and risk of conducting a passive pervasive attack. The risk 862 of discovery through interactions with the collaborator, however, is 863 still present, and may be higher. The content of a communication is 864 obviously larger than the key used to encrypt it, often by several 865 orders of magnitude. So in the content exfiltration case, the 866 interactions between the collaborator and the attacker need to be 867 much higher-bandwidth than in the key exfiltration cases, with a 868 corresponding increase in the risk that this high-bandwidth channel 869 will be discovered. 871 It should also be noted that in these latter three exfiltration 872 cases, the collaborator also undertakes a risk that its collaboration 873 with the attacker will be discovered. Thus the attacker may have to 874 incur additional cost in order to convince the collaborator to 875 participate in the attack. Likewise, the scope of these attacks is 876 limited to cases where the attacker can convince a collaborator to 877 participate. If the attacker is a national government, for example, 878 it may be able to compel participation within its borders, but have a 879 much more difficult time recruiting foreign collaborators. 881 As noted above, the collaborator in an exfiltration attack can be 882 unwitting; the attacker can steal keys or data to enable the attack. 883 In some ways, the risks of this approach are similar to the case of 884 an active collaborator. In the static case, the attacker needs to 885 steal information from the collaborator once; in the dynamic case, 886 the attacker needs a continued presence inside the collaborator's 887 systems. The main difference is that the risk in this case is of 888 automated discovery (e.g., by intrusion detection systems) rather 889 than discovery by humans. 891 6. Security Considerations 893 This document describes a threat model for pervasive surveillance 894 attacks. Mitigations are to be given in a future document. 896 7. IANA Considerations 898 This document has no actions for IANA. 900 8. IAB Members at the Time of Approval 902 Jari Arkko (IETF Chair) 903 Mary Barnes 904 Marc Blanchet 905 Ralph Droms 906 Ted Hardie 907 Joe Hildebrand 908 Russ Housley 909 Erik Nordmark 910 Robert Sparks 911 Andrew Sullivan 912 Dave Thaler 913 Brian Trammell 914 Suzanne Woolf 916 9. Acknowledgements 918 Thanks to Dave Thaler for the list of attacks and taxonomy; to 919 Security Area Directors Stephen Farrell, Sean Turner, and Kathleen 920 Moriarty for starting and managing the IETF's discussion on pervasive 921 attack; and to Stephan Neuhaus, Mark Townsley, Chris Inacio, 922 Evangelos Halepilidis, Bjoern Hoehrmann, Aziz Mohaisen, Russ Housley, 923 Joe Hall, Andrew Sullivan, the IEEE 802 Privacy Executive Committee 924 SG, and the IAB Privacy and Security Program for their input. 926 10. References 928 10.1. Normative References 930 [RFC6973] Cooper, A., Tschofenig, H., Aboba, B., Peterson, J., 931 Morris, J., Hansen, M., and R. Smith, "Privacy 932 Considerations for Internet Protocols", RFC 6973, July 933 2013. 935 10.2. Informative References 937 [pass1] The Guardian, "How the NSA is still harvesting your online 938 data", 2013, 939 . 942 [pass2] The Guardian, "NSA's Prism surveillance program: how it 943 works and what it can do", 2013, 944 . 947 [pass3] The Guardian, "XKeyscore: NSA tool collects 'nearly 948 everything a user does on the internet'", 2013, 949 .
952 [pass4] The Guardian, "How does GCHQ's internet surveillance 953 work?", n.d., . 956 [dec1] The New York Times, "N.S.A. Able to Foil Basic Safeguards 957 of Privacy on Web", 2013, 958 . 961 [dec2] The Guardian, "Project Bullrun - classification guide to 962 the NSA's decryption program", 2013, 963 . 966 [dec3] The Guardian, "Revealed: how US and UK spy agencies defeat 967 internet privacy and security", 2013, 968 . 971 [TOR1] Schneier, B., "How the NSA Attacks Tor/Firefox Users With 972 QUANTUM and FOXACID", 2013, 973 . 976 [TOR2] The Guardian, "'Tor Stinks' presentation - read the full 977 document", 2013, 978 . 981 [dir1] The Guardian, "NSA collecting phone records of millions of 982 Verizon customers daily", 2013, 983 . 986 [dir2] The Guardian, "NSA Prism program taps in to user data of 987 Apple, Google and others", 2013, 988 . 991 [dir3] The Guardian, "Sigint - how the NSA collaborates with 992 technology companies", 2013, 993 . 996 [spiegel1] 997 C Stocker, ., "NSA's Secret Toolbox: Unit Offers Spy 998 Gadgets for Every Need", December 2013, 999 . 1003 [spiegel3] 1004 H Schmundt, ., "The Digital Arms Race: NSA Preps America 1005 for Future Battle", January 2014, 1006 . 1010 [great-cannon] 1011 Paxson, V., "China's Great Cannon", 2015, 1012 . 1014 [RFC1035] Mockapetris, P., "Domain names - implementation and 1015 specification", STD 13, RFC 1035, November 1987. 1017 [RFC1918] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and 1018 E. Lear, "Address Allocation for Private Internets", BCP 1019 5, RFC 1918, February 1996. 1021 [RFC1939] Myers, J. and M. Rose, "Post Office Protocol - Version 3", 1022 STD 53, RFC 1939, May 1996. 1024 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1025 A., Peterson, J., Sparks, R., Handley, M., and E. 1026 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1027 June 2002. 1029 [RFC3365] Schiller, J., "Strong Security Requirements for Internet 1030 Engineering Task Force Standard Protocols", BCP 61, RFC 1031 3365, August 2002. 1033 [RFC3501] Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION 1034 4rev1", RFC 3501, March 2003. 1036 [RFC4033] Arends, R., Austein, R., Larson, M., Massey, D., and S. 1037 Rose, "DNS Security Introduction and Requirements", RFC 1038 4033, March 2005. 1040 [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", RFC 1041 4303, December 2005. 1043 [RFC4949] Shirey, R., "Internet Security Glossary, Version 2", RFC 1044 4949, August 2007. 1046 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security 1047 (TLS) Protocol Version 1.2", RFC 5246, August 2008. 1049 [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, 1050 October 2008. 1052 [RFC6962] Laurie, B., Langley, A., and E. Kasper, "Certificate 1053 Transparency", RFC 6962, June 2013. 1055 [RFC7011] Claise, B., Trammell, B., and P. Aitken, "Specification of 1056 the IP Flow Information Export (IPFIX) Protocol for the 1057 Exchange of Flow Information", STD 77, RFC 7011, September 1058 2013. 1060 [RFC7258] Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an 1061 Attack", BCP 188, RFC 7258, May 2014. 1063 [I-D.ietf-dprive-problem-statement] 1064 Bortzmeyer, S., "DNS privacy considerations", draft-ietf- 1065 dprive-problem-statement-05 (work in progress), May 2015. 
1067 Authors' Addresses 1069 Richard Barnes 1071 Email: rlb@ipv.sx 1073 Bruce Schneier 1075 Email: schneier@schneier.com 1077 Cullen Jennings 1079 Email: fluffy@cisco.com 1081 Ted Hardie 1083 Email: ted.ietf@gmail.com 1084 Brian Trammell 1086 Email: ietf@trammell.ch 1088 Christian Huitema 1090 Email: huitema@huitema.net 1092 Daniel Borkmann 1094 Email: dborkman@iogearbox.net