Network Working Group                                          R. Barnes
Internet-Draft
Intended status: Informational                               B. Schneier
Expires: August 10, 2015                                     C. Jennings
                                                               T. Hardie
                                                             B. Trammell
                                                              C. Huitema
                                                             D. Borkmann
                                                       February 06, 2015

 Confidentiality in the Face of Pervasive Surveillance: A Threat Model
                         and Problem Statement
              draft-iab-privsec-confidentiality-threat-02

Abstract

Documents published in 2013 revealed several classes of pervasive surveillance attack on Internet communications. In this document we develop a threat model that describes these pervasive attacks. We start by assuming a completely passive attacker with an interest in undetected, indiscriminate eavesdropping, then expand the threat model with a set of verified attacks that have been published. Based on this threat model, we discuss the techniques that can be employed in Internet protocol design to increase the protocols' robustness to pervasive surveillance.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 10, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

Starting in June 2013, documents released to the press by Edward Snowden have revealed several operations undertaken by intelligence agencies to exploit Internet communications for intelligence purposes. These attacks were largely based on protocol vulnerabilities that were already known to exist. The attacks were nonetheless striking in their pervasive nature, both in terms of the amount of Internet communications targeted and in terms of the diversity of attack techniques employed.

To ensure that the Internet can be trusted by users, it is necessary for the Internet technical community to address the vulnerabilities exploited in these attacks [RFC7258]. The goal of this document is to describe more precisely the threats posed by these pervasive attacks and, based on those threats, to lay out the problems that need to be solved in order to secure the Internet in the face of them.

The remainder of this document is structured as follows. In Section 3, we describe an idealized passive attacker, one that could compromise communications at Internet scale while remaining completely undetectable. In Section 4, we provide a brief summary of some attacks that have been disclosed, and use these to expand the assumed capabilities of our idealized attacker.
Section 5 describes a threat model based on these attacks, focusing on classes of attack that have not been a focus of Internet engineering to date.

2. Terminology

This document makes extensive use of standard security and privacy terminology; see [RFC4949] and [RFC6973]. Terms used from [RFC6973] include Eavesdropper, Observer, Initiator, Intermediary, Recipient, Attack (in a privacy context), Correlation, Fingerprint, Traffic Analysis, and Identifiability (and related terms). In addition, we use a few terms that are specific to the attacks discussed here:

Passive Attack: In this document, the term passive attack is used with respect to the traffic stream: a passive attack does not modify the packets in the traffic stream between two endpoints, modify the treatment of packets in the traffic stream (e.g., delay, routing), or add or remove packets in the traffic stream. Passive attacks are undetectable from the endpoints.

Active Attack: In contrast to a passive attack, an active attack may modify a traffic stream, at the cost of possible detection at the endpoints.

Pervasive Attack: An attack on Internet communications that makes use of access at a large number of points in the network, or otherwise provides the attacker with access to a large amount of Internet traffic; see [RFC7258].

Observation: Information collected directly from communications by an eavesdropper or observer. For example, the knowledge that <alice@example.com> sent a message to <bob@example.com> via SMTP, taken from the headers of an observed SMTP message, would be an observation.

Inference: Information extracted from analysis of information collected directly from communications by an eavesdropper or observer.
For example, the knowledge that a given web page was accessed by a given IP address, obtained by comparing the size in octets of measured network flow records to fingerprints derived from known sizes of linked resources on the web servers involved, would be an inference.

Collaborator: An entity that is a legitimate participant in a communication, but who deliberately provides information about that interaction to an attacker.

Unwitting Collaborator: An entity that is a legitimate participant in a communication, and who is the source of information obtained by the attacker without the entity's consent or intention, because the attacker has exploited some technology used by the entity.

Key Exfiltration: The transmission of keying material for an encrypted communication from a collaborator, deliberately or unwittingly, to an attacker.

Content Exfiltration: The transmission of the content of a communication from a collaborator, deliberately or unwittingly, to an attacker.

3. An Idealized Pervasive Passive Attacker

In considering the threat posed by pervasive surveillance, we begin by defining an idealized pervasive passive attacker. While this attacker is less capable than those which press reports now show to have compromised the Internet, as elaborated in Section 4, it does set a lower bound on the capabilities of an attacker interested in indiscriminate passive surveillance while remaining undetectable. We note that, prior to the Snowden revelations in 2013, the assumptions of attacker capability presented here would have been considered on the border of paranoia outside the network security community.
Our idealized attacker is an indiscriminate eavesdropper on an Internet-attached computer network that:

o  can observe every packet of all communications at any hop in any network path between an initiator and a recipient;

o  can observe data at rest in any intermediate system between the endpoints controlled by the initiator and recipient; and

o  can share information with other such attackers; but

o  takes no other action with respect to these communications (i.e., blocking, modification, injection, etc.).

The techniques available to our ideal attacker are direct observation and inference. Direct observation involves taking information directly from eavesdropped communications - e.g., URLs identifying content or email addresses identifying individuals, taken from application-layer headers. Inference, on the other hand, involves analyzing eavesdropped information to derive new information from it; e.g., searching for application or behavioral fingerprints in observed traffic to derive information about the observed individual, in the absence of directly observed sources of the same information. The use of encryption to protect confidentiality is generally enough to prevent direct observation of unencrypted content, assuming uncompromised encryption implementations and key material. However, it provides less complete protection against inference, especially inference based only on unprotected portions of communications (e.g., IP and TCP headers for TLS [RFC5246]).

3.1. Information subject to direct observation

Protocols which do not encrypt their payload make the entire content of the communication available to the idealized attacker along their path. Following the advice in [RFC3365], most such protocols have a secure variant which encrypts payload for confidentiality, and these secure variants are seeing ever-wider deployment.
A noteworthy exception is DNS [RFC1035], as DNSSEC [RFC4033] does not have confidentiality as a requirement. This implies that, in the absence of changes to the protocol as presently under development in the DPRIVE working group, all DNS queries and answers generated by the activities of any protocol are available to the attacker.

Protocols which imply the storage of some data at rest in intermediaries (e.g., SMTP [RFC5321]) leave this data subject to observation by an attacker that has compromised these intermediaries, unless the data is encrypted end-to-end by the application-layer protocol or the implementation uses an encrypted store for this data.

3.2. Information useful for inference

Inference is information extracted from later analysis of an observed or eavesdropped communication, and/or correlation of observed or eavesdropped information with information available from other sources. Indeed, most useful inference performed by the attacker falls under the rubric of correlation. The simplest example of this is observing the DNS queries and answers from and to a source and correlating those with the IP addresses with which that source communicates. This can give access to information otherwise not available from encrypted application payloads (e.g., the Host: request header when HTTP/1.1 is used with TLS).

Protocols which encrypt their payload using an application- or transport-layer encryption scheme (e.g., TLS) still expose all the information in their network- and transport-layer headers to the attacker, including source and destination addresses and ports. IPsec ESP [RFC4303] further encrypts the transport-layer headers, but still leaves IP address information unencrypted; in tunnel mode, these addresses correspond to the tunnel endpoints. Features of the cryptographic protocols themselves, e.g.,
the TLS session identifier, may leak information that can be used for correlation and inference. While this information is much less semantically rich than the application payload, it can still be useful for inferring an individual's activities.

Inference can also leverage information obtained from sources other than direct traffic observation. Geolocation databases, for example, have been developed to map IP addresses to a location, in order to provide location-aware services such as targeted advertising. This location information is often of sufficient resolution that it can be used to draw further inferences toward identifying or profiling an individual.

Social media provide another source of more or less publicly accessible information. This information can be extremely semantically rich, including information about an individual's location, associations with other individuals and groups, and activities. Further, this information is generally contributed and curated voluntarily by the individuals themselves: it represents information which the individuals are not necessarily interested in protecting for privacy reasons. However, correlation of this social networking data with information available from direct observation of network traffic allows the creation of a much richer picture of an individual's activities than either source alone.

We note with some alarm that there is little that can be done at protocol design time to limit such correlation by the attacker, and that the existence of such data sources in many cases greatly complicates the problem of protecting privacy by hardening protocols alone.

3.3. An illustration of an ideal passive attack

To illustrate how capable the idealized attacker is even given its limitations, we explore the non-anonymity of encrypted IP traffic in this section.
Here we examine in detail some inference techniques for associating a set of addresses with an individual, in order to illustrate the difficulty of defending communications against our idealized attacker. The basic problem is that information radiated even by protocols which have no obvious connection with personal data can be correlated with other information to paint a very rich behavioral picture; it takes only one unprotected link in the chain to associate that picture with an identity.

3.3.1. Analysis of IP headers

Internet traffic can be monitored by tapping Internet links, or by installing monitoring tools in Internet routers. Of course, a single link or a single router only provides access to a fraction of the global Internet traffic. However, monitoring a number of high-capacity links or a set of routers placed at strategic locations provides access to a good sampling of Internet traffic.

Tools like IPFIX [RFC7011] allow administrators to acquire statistics about sequences of packets with some common properties that pass through a network device. The most common set of properties used in flow measurement is the "five-tuple" of source and destination addresses, protocol type, and source and destination ports. These statistics are commonly used for network engineering, but could certainly be used for other purposes.

Let's assume for a moment that IP addresses can be correlated to specific services or specific users. Analysis of the sequences of packets will quickly reveal which users use what services, and also which users engage in peer-to-peer connections with other users. Analysis of traffic variations over time can be used to detect increased activity by particular users or, in the case of peer-to-peer connections, increased activity within groups of users.
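To make the flow-measurement idea concrete, the following is a minimal sketch (plain Python, with entirely hypothetical packet data using IPv4 documentation addresses) of how per-flow statistics keyed on the five-tuple might be aggregated, in the style of an IPFIX meter:

```python
from collections import defaultdict

# Hypothetical observed packets: (src, dst, proto, sport, dport, size in bytes).
# Addresses are drawn from the RFC 6890 documentation ranges.
packets = [
    ("192.0.2.33", "198.51.100.7", "tcp", 52100, 443, 1400),
    ("192.0.2.33", "198.51.100.7", "tcp", 52100, 443, 600),
    ("192.0.2.33", "203.0.113.5", "udp", 40000, 53, 80),
]

# Aggregate packet and byte counts per five-tuple.
flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
for src, dst, proto, sport, dport, size in packets:
    key = (src, dst, proto, sport, dport)
    flows[key]["packets"] += 1
    flows[key]["bytes"] += size

for key, stats in flows.items():
    print(key, stats)
```

Even this toy meter already separates the host's web traffic from its DNS traffic; at scale, the same aggregation reveals who talks to which services, and when.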
3.3.2. Correlation of IP addresses to user identities

The correlation of IP addresses with specific users can be done in various ways. For example, tools like reverse DNS lookup can be used to retrieve the DNS names of servers. Since the addresses of servers tend to be quite stable, and since servers are relatively less numerous than users, an attacker could easily maintain its own copy of the DNS for well-known or popular servers in order to accelerate such lookups.

On the other hand, the reverse lookup of the IP addresses of users is generally less informative. For example, a lookup of the address currently used by one author's home network returns a name of the form "c-192-000-002-033.hsd1.wa.comcast.net". This particular type of reverse DNS lookup generally reveals only coarse-grained location or provider information, equivalent to that available from geolocation databases.

In many jurisdictions, Internet Service Providers (ISPs) are required to provide identification on a case-by-case basis of the "owner" of a specific IP address for law enforcement purposes. This is a reasonably expedient process for targeted investigations, but pervasive surveillance requires something more efficient. This provides an incentive for the attacker to secure the cooperation of the ISP in order to automate this correlation.

3.3.3. Monitoring messaging clients for IP address correlation

Even if the ISP does not cooperate, user identity can often be obtained via inference. POP3 [RFC1939] and IMAP [RFC3501] are used to retrieve mail from mail servers, while a variant of SMTP is used to submit messages through mail servers. IMAP connections originate from the client, and typically start with an authentication exchange in which the client proves its identity by answering a password challenge.
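As a sketch of how little effort such an observation requires, the following extracts the username from a cleartext IMAP LOGIN command and ties it to the packet's source address. The captured payload, address, and credentials here are invented for illustration:

```python
import re

# Hypothetical cleartext IMAP payload observed from a documentation address.
src_addr = "192.0.2.33"
payload = b"a001 LOGIN alice@example.com hunter2\r\n"

# The eavesdropper only needs the username to tie the source address
# to an identity; the password is incidental.
m = re.match(rb"\S+ LOGIN (\S+) \S+", payload)
if m:
    username = m.group(1).decode()
    print(f"{src_addr} -> {username}")
```

The same one-line pattern match applies to POP3 USER commands or any other protocol that carries credentials in the clear.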
The same holds for the SIP protocol [RFC3261] and many instant messaging services operating over the Internet using proprietary protocols.

The username is directly observable if any of these protocols operate in cleartext; the username can then be directly associated with the source address.

3.3.4. Retrieving IP addresses from mail headers

SMTP [RFC5321] requires that each successive SMTP relay adds a "Received" header to the mail headers. The purpose of these headers is to enable audit of mail transmission, and perhaps to distinguish between regular mail and spam. Here is an extract from the headers of a message recently received from the "perpass" mailing list:

   "Received: from 192-000-002-044.zone13.example.org (HELO
   ?192.168.1.100?) (xxx.xxx.xxx.xxx) by lvps192-000-002-219.example.net
   with ESMTPSA (DHE-RSA-AES256-SHA encrypted, authenticated); 27 Oct
   2013 21:47:14 +0100 Message-ID: <526D7BD2.7070908@example.org> Date:
   Sun, 27 Oct 2013 20:47:14 +0000 From: Some One
   <some.one@example.org>"

This is the first "Received" header attached to the message by the first SMTP relay; for privacy reasons, the field values have been anonymized. We learn here that the message was submitted by "Some One" on October 27, from a host behind a NAT (192.168.1.100) [RFC1918] that used the IP address 192.0.2.44. The information remained in the message, and is accessible by all recipients of the "perpass" mailing list, or indeed by any attacker that sees at least one copy of the message.

An attacker that can observe sufficient email traffic can regularly update the mapping between public IP addresses and individual email identities. Even if the SMTP traffic were encrypted on submission and relaying, the attacker can still receive a copy of public mailing lists like "perpass".
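A few lines of code suffice to automate this kind of extraction. The sketch below operates on the anonymized header quoted above and recovers both the RFC 1918 address behind the NAT and the public address encoded in the relay-generated host name:

```python
import re

# The anonymized "Received" header quoted above, as a single string.
received = ("Received: from 192-000-002-044.zone13.example.org "
            "(HELO ?192.168.1.100?) (xxx.xxx.xxx.xxx) "
            "by lvps192-000-002-219.example.net with ESMTPSA; "
            "27 Oct 2013 21:47:14 +0100")

# Literal dotted quads in the header; here, the RFC 1918 address of
# the submitting host behind the NAT.
addrs = re.findall(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", received)

# The submitter's public address, encoded with dashes in the host
# name generated by the relay.
m = re.match(r"Received: from (\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})", received)
submitter = ".".join(str(int(g)) for g in m.groups())

print("literal addresses:", addrs)
print("public submitter address:", submitter)
```

Run over a mail archive instead of one header, the same two regular expressions yield an up-to-date mapping from email identities to the addresses they submit from.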
3.3.5. Tracking address usage with web cookies

Many web sites only encrypt a small fraction of their transactions. A popular pattern is to use HTTPS for the login information, and then use a "cookie" to associate subsequent clear-text transactions with the user's identity. Cookies are also used by various advertisement services to quickly identify users and serve them with "personalized" advertisements. Such cookies are particularly useful if the advertisement services want to keep tracking the user across multiple sessions that may use different IP addresses.

As cookies are sent in clear text, an attacker can build a database that associates cookies to IP addresses for non-HTTPS traffic. If the IP address is already identified, the cookie can be linked to the user's identity. After that, if the same cookie appears on a new IP address, the new IP address can be immediately associated with the pre-determined identity.

3.3.6. Graph-based approaches to address correlation

An attacker can track traffic from an IP address not yet associated with an individual to various public services (e.g., websites, mail servers, game servers), and exploit patterns in the observed traffic to correlate this address with other addresses that show similar patterns. For example, any two addresses that show connections to the same IMAP or webmail services, the same set of favorite websites, and game servers at similar times of day may be associated with the same individual. Correlated addresses can then be tied to an individual through one of the techniques above, walking the "network graph" to expand the set of attributable traffic.

3.3.7. Tracking of MAC Addresses

Moving back down the stack, technologies like Ethernet or Wi-Fi use MAC Addresses to identify link-level destinations. MAC Addresses assigned according to IEEE-802 standards are unique to the device.
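Whether an observed address is such a stable, globally unique identifier is visible in the address itself: in IEEE 802 addressing, the locally-administered bit of the first octet distinguishes IEEE-assigned addresses from locally assigned ones. A minimal sketch, using hypothetical addresses:

```python
def mac_is_universal(mac: str) -> bool:
    """True if the locally-administered bit (0x02 of the first octet)
    is clear, i.e. the address is a globally unique, IEEE-assigned
    identifier that can serve as a stable tracking key."""
    first_octet = int(mac.split(":")[0], 16)
    return (first_octet & 0x02) == 0

# Hypothetical addresses observed on a public Wi-Fi network.
for mac in ("00:1b:63:84:45:e6",   # universal: stable, trackable
            "da:a1:19:44:51:b2"):  # locally administered
    kind = "trackable" if mac_is_universal(mac) else "locally assigned"
    print(mac, kind)
```

An observer collecting frames on a public network can therefore tell at a glance which devices are radiating a long-lived identifier.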
If the link is publicly accessible, an attacker can track it. For example, the attacker can track the wireless traffic at public Wi-Fi networks. Simple devices can monitor the traffic and reveal which MAC Addresses are present. If the network does not use some form of Wi-Fi encryption, or if the attacker can access the decrypted traffic, the analysis will also provide the correlation between MAC Addresses and IP addresses. Additional monitoring, using the techniques exposed in the previous sections, will reveal the correlation between MAC Addresses, IP addresses, and user identity.

Given that large-scale databases of the MAC addresses of wireless access points for geolocation purposes have been known to exist for some time, the attacker could easily build a database linking MAC Addresses and device or user identities, and use it to track the movement of devices and of their owners.

4. Reported Instances of Large-Scale Attacks

The situation in reality is more bleak than that suggested by an analysis of our idealized attacker. Through revelations of sensitive documents in several media outlets, the Internet community has been made aware of several intelligence activities conducted by US and UK national intelligence agencies, particularly the US National Security Agency (NSA) and the UK Government Communications Headquarters (GCHQ). These documents have revealed methods that these agencies use to attack Internet applications and obtain sensitive user information.

First, they have confirmed that these agencies have capabilities in line with those of our idealized attacker, through the large-scale passive collection of Internet traffic [pass1][pass2][pass3][pass4]. For example:

o  The NSA XKEYSCORE system accesses data from multiple access points and searches for "selectors" such as email addresses, at the scale of tens of terabytes of data per day.
o  The GCHQ Tempora system appears to have access to around 1,500 major cables passing through the UK.

o  The NSA MUSCULAR program tapped cables between data centers belonging to major service providers.

o  Several programs appear to perform wide-scale collection of cookies in web traffic and location data from location-aware portable devices such as smartphones.

However, the capabilities described go beyond those available to our idealized attacker, including:

o  Decryption of TLS-protected Internet sessions [dec1][dec2][dec3]. For example, the NSA BULLRUN project appears to have had a budget of around $250M per year to undermine encryption through multiple approaches.

o  Insertion of NSA devices as a man-in-the-middle of Internet transactions [TOR1][TOR2]. For example, the NSA QUANTUM system appears to use several different techniques to hijack HTTP connections, ranging from DNS response injection to HTTP 302 redirects.

o  Direct acquisition of bulk data and metadata from service providers [dir1][dir2][dir3]. For example, the NSA PRISM program provides the agency with access to many types of user data (e.g., email, chat, VoIP).

o  Use of implants (covert modifications or malware) to undermine security and anonymity features [dec2][TOR1][TOR2]. For example:

   *  NSA appears to use the QUANTUM man-in-the-middle system to direct users to a FOXACID server, which delivers an implant to compromise the browser of a user of the Tor anonymous communications network.

   *  The BULLRUN program mentioned above includes the addition of covert modifications to software as one means to undermine encryption.

   *  There is also some suspicion that NSA modifications to the DUAL_EC_DRBG random number generator were made to ensure that keys generated using that generator could be predicted by NSA.
These suspicions have been reinforced by reports that RSA Security was paid roughly $10M to make DUAL_EC_DRBG the default in their products.

We use the term "pervasive attack" [RFC7258] to collectively describe these operations. The term "pervasive" is used because the attacks are designed to indiscriminately gather as much data as possible and to apply selective analysis on targets after the fact. This means that all, or nearly all, Internet communications are targets for these attacks. To achieve this scale, the attacks are physically pervasive; they affect a large number of Internet communications. They are pervasive in content, consuming and exploiting any information revealed by the protocol. And they are pervasive in technology, exploiting many different vulnerabilities in many different protocols.

It is important to note that although the attacks mentioned above were executed by NSA and GCHQ, there are many other organizations that can mount pervasive surveillance attacks. Because of the resources required to achieve pervasive scale, these attacks are most commonly undertaken by nation-state actors. For example, the Chinese Internet filtering system known as the "Great Firewall of China" uses several techniques that are similar to the QUANTUM program, and has a high degree of pervasiveness with regard to the Internet in China.

5. Threat Model

Given these disclosures, we must consider a broader threat model.

Pervasive surveillance aims to collect information across a large number of Internet communications, analyzing the collected communications to identify information of interest within individual communications, or to infer information from correlated communications. This analysis sometimes benefits from decryption of encrypted communications and deanonymization of anonymized communications.
As a result, these attackers desire both access to the bulk of Internet traffic and access to the keying material required to decrypt any traffic that has been encrypted. Note that the presence of a communication and the fact that it is encrypted may both be inputs to an analysis, even if the attacker cannot decrypt the communication.

The attacks listed above highlight new avenues both for access to traffic and for access to relevant encryption keys. They further indicate that the scale of surveillance is sufficient to provide a general capability to cross-correlate communications, a threat not previously thought to be relevant at the scale of the Internet.

5.1. Attacker Capabilities

   +--------------------------+-------------------------------------+
   | Attack Class             | Capability                          |
   +--------------------------+-------------------------------------+
   | Passive observation      | Directly capture data in transit    |
   |                          |                                     |
   | Passive inference        | Infer from reduced/encrypted data   |
   |                          |                                     |
   | Active                   | Manipulate / inject data in transit |
   |                          |                                     |
   | Static key exfiltration  | Obtain key material once / rarely   |
   |                          |                                     |
   | Dynamic key exfiltration | Obtain per-session key material     |
   |                          |                                     |
   | Content exfiltration     | Access data at rest                 |
   +--------------------------+-------------------------------------+

Security analyses of Internet protocols commonly consider two classes of attacker: passive attackers, who can simply listen in on communications as they transit the network, and active attackers, who can modify or delete packets in addition to simply collecting them.

In the context of pervasive passive surveillance, these attacks take on an even greater significance. In the past, these attackers were often assumed to operate near the edge of the network, where attacks can be simpler.
For example, in some LANs, it is simple for any node to engage in passive listening to other nodes' traffic or to inject packets to accomplish active attacks. However, as we now know, both passive and active attacks are undertaken by pervasive attackers closer to the core of the network, greatly expanding the scope and capability of the attacker.

Eavesdropping and observation at a larger scale make passive inference attacks easier to carry out: a passive attacker with access to a large portion of the Internet can analyze collected traffic to create a much more detailed view of individual behavior than an attacker that collects at a single point. Even the usual claim that encryption defeats passive attackers is weakened, since a pervasive passive attacker can infer relationships from correlations over large numbers of sessions, e.g., pairing encrypted sessions with unencrypted sessions from the same host, or performing traffic fingerprinting between known and unknown encrypted sessions. Reports on the NSA XKEYSCORE system indicate that it is an example of such an attacker.

A pervasive active attacker likewise has capabilities beyond those of a localized active attacker. Active attacks are often limited by network topology, for example by a requirement that the attacker be able to see a targeted session as well as inject packets into it. A pervasive active attacker with access at multiple points within the core of the Internet is able to overcome these topological limitations and perform attacks over a much broader scope. Being positioned in the core of the network rather than at the edge can also enable a pervasive active attacker to reroute targeted traffic, amplifying the ability to perform both eavesdropping and traffic injection. Pervasive active attackers can also benefit from pervasive passive collection to identify vulnerable hosts.
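The session-pairing inference described above can be sketched in a few lines. The session records here are hypothetical (documentation addresses, toy timestamps); a real attacker would of course work with flow logs at vastly larger scale:

```python
# Hypothetical session records: (src_addr, start_time_s, protocol).
sessions = [
    ("192.0.2.33", 100.0, "tls"),
    ("192.0.2.33", 101.5, "dns"),     # cleartext: reveals the queried name
    ("198.51.100.7", 300.0, "tls"),
]

# Pair each encrypted session with cleartext sessions from the same
# host that started within a small time window of it; the cleartext
# side then labels the encrypted side.
WINDOW = 5.0
pairs = [
    (enc, clear)
    for enc in sessions if enc[2] == "tls"
    for clear in sessions
    if clear[2] != "tls" and clear[0] == enc[0]
    and abs(clear[1] - enc[1]) <= WINDOW
]
print(pairs)
```

Here the encrypted session from 192.0.2.33 is tagged by the DNS lookup made moments before it, illustrating how cleartext side channels erode the protection that encryption provides against a pervasive observer.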
While not directly related to pervasiveness, attackers that are in a position to mount a pervasive active attack are also often in a position to subvert authentication, a traditional protection against such attacks.  Authentication in the Internet is often achieved via trusted third-party authorities such as the Certificate Authorities (CAs) that provide web sites with authentication credentials.  An attacker with sufficient resources may also be able to induce an authority to grant credentials for an identity of the attacker's choosing.  If the parties to a communication will trust multiple authorities to certify a specific identity, this attack may be mounted by suborning any one of those authorities (the proverbial "weakest link").  Subversion of authorities in this way can allow an active attack to succeed in spite of an authentication check.

Beyond these three classes (observation, inference, and active), reports on the BULLRUN effort to defeat encryption and the PRISM effort to obtain data from service providers suggest three more classes of attack:

   o  Static key exfiltration

   o  Dynamic key exfiltration

   o  Content exfiltration

These attacks all rely on a collaborator providing the attacker with some information, either keys or data.  These attacks have not traditionally been considered in scope for the Security Considerations sections of IETF protocols, as they occur outside the protocol.

The term "key exfiltration" refers to the transfer of keying material for an encrypted communication from the collaborator to the attacker.  By "static", we mean that the transfer of keys happens once, or rarely, typically of a long-lived key.  For example, this case would cover a web site operator that provides the private key corresponding to its HTTPS certificate to an intelligence agency.
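The power of static key exfiltration can be shown with a deliberately toy model.  The XOR "cipher" and key-wrapping scheme below are not real cryptography and are invented purely to illustrate the structure of the attack: when every session key is wrapped under one long-lived key, a single disclosure of that key decrypts all recorded traffic.

```python
# Toy model (NOT real cryptography) of static key exfiltration: each
# per-session key is wrapped under a long-lived server key, so a
# one-time disclosure of that key unlocks every recorded session.
import os

def xor(a, b):
    # XOR two equal-length byte strings (toy stand-in for encryption).
    return bytes(x ^ y for x, y in zip(a, b))

long_lived_key = os.urandom(16)   # disclosed to the attacker once

def record_session(plaintext):
    session_key = os.urandom(16)
    wrapped = xor(session_key, long_lived_key)  # key "encrypted" to server
    ciphertext = xor(plaintext, session_key)    # toy stream cipher
    return wrapped, ciphertext

# A passive attacker records many sessions over time ...
recorded = [record_session(b"hello world 0001"),
            record_session(b"hello world 0002")]

# ... and later decrypts all of them with the single exfiltrated key.
for wrapped, ct in recorded:
    session_key = xor(wrapped, long_lived_key)
    print(xor(ct, session_key))
```

This is also why forward secrecy matters: if session keys are instead derived from ephemeral exchanges, a leaked long-lived key no longer decrypts recorded traffic, pushing the attacker toward the costlier dynamic exfiltration described next.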
"Dynamic" key exfiltration, by contrast, refers to attacks in which the collaborator delivers keying material to the attacker frequently, e.g., on a per-session basis.  This does not necessarily imply frequent communications with the attacker; the transfer of keying material may be virtual.  For example, if an endpoint were modified in such a way that the attacker could predict the state of its pseudorandom number generator, then the attacker would be able to derive per-session keys even without per-session communications.

Finally, content exfiltration is the attack in which the collaborator simply provides the attacker with the desired data or metadata.  Unlike the key exfiltration cases, this attack does not require the attacker to capture the desired data as it flows through the network.  The risk is to data at rest as opposed to data in transit.  This increases the scope of data that the attacker can obtain, since the attacker can access historical data; the attacker does not have to be listening at the time the communication happens.

Exfiltration attacks can be accomplished via attacks against one of the parties to a communication, i.e., by the attacker stealing the keys or content rather than the party providing them willingly.  In these cases, the party may not be aware that they are collaborating, at least at a human level.  Rather, the subverted technical assets are "collaborating" with the attacker (by providing keys/content) without their owner's knowledge or consent.

Any party that has access to encryption keys or unencrypted data can be a collaborator.  While collaborators are typically the endpoints of a communication (with encryption securing the links), intermediaries in an unencrypted communication can also facilitate content exfiltration attacks as collaborators by providing the attacker access to those communications.
For example, documents describing the NSA PRISM program claim that the NSA is able to access user data directly from servers, where it is stored unencrypted.  In these cases, the operator of the server would be a collaborator, if an unwitting one.  By contrast, in the NSA MUSCULAR program, a set of collaborators enabled attackers to access the cables connecting data centers used by service providers such as Google and Yahoo.  Because communications among these data centers were not encrypted, the collaboration by an intermediate entity allowed the NSA to collect unencrypted user data.

5.2.  Attacker Costs

   +--------------------------+-----------------------------------+
   | Attack Class             | Cost / Risk to Attacker           |
   +--------------------------+-----------------------------------+
   | Passive observation      | Passive data access               |
   |                          |                                   |
   | Passive inference        | Passive data access + processing  |
   |                          |                                   |
   | Active                   | Active data access + processing   |
   |                          |                                   |
   | Static key exfiltration  | One-time interaction              |
   |                          |                                   |
   | Dynamic key exfiltration | Ongoing interaction / code change |
   |                          |                                   |
   | Content exfiltration     | Ongoing, bulk interaction         |
   +--------------------------+-----------------------------------+

Each of the attack types discussed in the previous section entails certain costs and risks.  These costs differ by attack and can be helpful in guiding the response to pervasive attack.

Depending on the attack, the attacker may be exposed to several types of risk, ranging from simply losing access to arrest or prosecution.  In order for any of these negative consequences to occur, however, the attacker must first be discovered and identified.  So the primary risk we focus on here is the risk of discovery and attribution.

A passive attack is the simplest to mount in some ways.
The base requirement is that the attacker obtain physical access to a communications medium and extract communications from it.  For example, the attacker might tap a fiber-optic cable, acquire a mirror port on a switch, or listen to a wireless signal.  The need for these taps to have physical access or proximity to a link exposes the attacker to the risk that the taps will be discovered.  For example, a fiber tap or mirror port might be discovered by network operators noticing increased attenuation in the fiber or a change in switch configuration.  Of course, passive attacks may be accomplished with the cooperation of the network operator, in which case there is a risk that the attacker's interactions with the network operator will be exposed.

In many ways, the costs and risks for an active attack are similar to those for a passive attack, with a few additions.  An active attacker requires more robust network access than a passive attacker, since, for example, it will often need to transmit data as well as receive it.  In the wireless example above, the attacker would need to act as a transmitter as well as a receiver, greatly increasing the probability that the attacker will be discovered (e.g., using direction-finding technology).  Active attacks are also much more observable at higher layers of the network.  For example, an active attacker that attempts to use a mis-issued certificate could be detected via Certificate Transparency [RFC6962].

In terms of raw implementation complexity, passive attacks require only enough processing to extract information from the network and store it.  Active attacks, by contrast, often depend on winning race conditions to inject packets into active connections.
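The race the injecting attacker must win can be modeled abstractly.  In this illustrative sketch (sequence numbers and payloads are invented), a receiver accepts the first segment carrying the expected sequence number, so a forged segment is honored only if it beats the legitimate one to the receiver:

```python
# Toy model of the injection race: whichever segment with the correct
# sequence number arrives first is accepted; later duplicates
# (including the legitimate one) are ignored.
def receiver(expected_seq=1000):
    state = {"accepted": None}
    def deliver(seq, payload):
        # Accept only the first segment bearing the expected sequence
        # number; everything after that has no effect.
        if state["accepted"] is None and seq == expected_seq:
            state["accepted"] = payload
        return state["accepted"]
    return deliver

deliver = receiver()
deliver(1000, "forged")       # attacker's forged segment arrives first
deliver(1000, "legitimate")   # legitimate segment now has no effect
```

An attacker who guesses the sequence number wrong, or whose segment arrives late, loses the race outright, which is why such attacks demand both good visibility into the target session and fast injection.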
So active attacks in the core of the network require processing hardware that can operate at line speed (roughly 100 Gbps to 1 Tbps in the core) to identify opportunities for attack and insert attack traffic into a high-volume traffic stream.

Key exfiltration attacks rely on passive attack for access to encrypted data, with the collaborator providing keys to decrypt the data.  So the attacker undertakes the cost and risk of a passive attack, as well as the additional risk of discovery via the interactions that the attacker has with the collaborator.

In this sense, static exfiltration has a lower risk profile than dynamic.  In the static case, the attacker need only interact with the collaborator a small number of times, possibly only once, say to exchange a private key.  In the dynamic case, the attacker must have continuing interactions with the collaborator.  As noted above, these interactions may be real, such as in-person meetings, or virtual, such as software modifications that render keys available to the attacker.  Both of these types of interactions introduce a risk that they will be discovered, e.g., by employees of the collaborator organization noticing suspicious meetings or suspicious code changes.

Content exfiltration has a similar risk profile to dynamic key exfiltration.  In a content exfiltration attack, the attacker saves the cost and risk of conducting a passive attack.  The risk of discovery through interactions with the collaborator, however, is still present, and may be higher.  The content of a communication is obviously larger than the key used to encrypt it, often by several orders of magnitude.
So in the content exfiltration case, the interactions between the collaborator and the attacker need to be much higher-bandwidth than in the key exfiltration cases, with a corresponding increase in the risk that this high-bandwidth channel will be discovered.

It should also be noted that in these latter three exfiltration cases, the collaborator also undertakes a risk that its collaboration with the attacker will be discovered.  Thus, the attacker may have to incur additional cost in order to convince the collaborator to participate in the attack.  Likewise, the scope of these attacks is limited to cases where the attacker can convince a collaborator to participate.  If the attacker is a national government, for example, it may be able to compel participation within its borders but have a much more difficult time recruiting foreign collaborators.

As noted above, the collaborator in an exfiltration attack can be unwitting; the attacker can steal keys or data to enable the attack.  In some ways, the risks of this approach are similar to the case of an active collaborator.  In the static case, the attacker needs to steal information from the collaborator once; in the dynamic case, the attacker needs a continued presence inside the collaborator's systems.  The main difference is that the risk in this case is of automated discovery (e.g., by intrusion detection systems) rather than discovery by humans.

6.  Security Considerations

This document describes a threat model for pervasive surveillance attacks.  Mitigations are to be given in a future document.

7.  IANA Considerations

This document has no actions for IANA.

8.  Acknowledgements

Thanks to Dave Thaler for the list of attacks and taxonomy; to Security Area Directors Stephen Farrell, Sean Turner, and Kathleen Moriarty for starting and managing the IETF's discussion on pervasive attack; and to Stephan Neuhaus, Mark Townsley, Chris Inacio, Evangelos Halepilidis, Bjoern Hoehrmann, and Aziz Mohaisen, as well as the IAB Privacy and Security Program, for their input.

9.  References

9.1.  Normative References

   [RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
              Morris, J., Hansen, M., and R. Smith, "Privacy
              Considerations for Internet Protocols", RFC 6973,
              July 2013.

9.2.  Informative References

   [pass1]    The Guardian, "How the NSA is still harvesting your
              online data", 2013.

   [pass2]    The Guardian, "NSA's Prism surveillance program: how it
              works and what it can do", 2013.

   [pass3]    The Guardian, "XKeyscore: NSA tool collects 'nearly
              everything a user does on the internet'", 2013.

   [pass4]    The Guardian, "How does GCHQ's internet surveillance
              work?", n.d.

   [dec1]     The New York Times, "N.S.A. Able to Foil Basic Safeguards
              of Privacy on Web", 2013.

   [dec2]     The Guardian, "Project Bullrun - classification guide to
              the NSA's decryption program", 2013.

   [dec3]     The Guardian, "Revealed: how US and UK spy agencies
              defeat internet privacy and security", 2013.

   [TOR]      The Tor Project, "Tor", 2013.

   [TOR1]     Schneier, B., "How the NSA Attacks Tor/Firefox Users With
              QUANTUM and FOXACID", 2013.

   [TOR2]     The Guardian, "'Tor Stinks' presentation - read the full
              document", 2013.

   [dir1]     The Guardian, "NSA collecting phone records of millions
              of Verizon customers daily", 2013.

   [dir2]     The Guardian, "NSA Prism program taps in to user data of
              Apple, Google and others", 2013.
   [dir3]     The Guardian, "Sigint - how the NSA collaborates with
              technology companies", 2013.

   [secure]   Schneier, B., "NSA surveillance: A guide to staying
              secure", 2013.

   [snowden]  Technology Review, "NSA Leak Leaves Crypto-Math Intact
              but Highlights Known Workarounds", 2013.

   [key-recovery]
              Golle, P., "The Design and Implementation of Protocol-
              Based Hidden Key Recovery", 2003.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

   [RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G.,
              and E. Lear, "Address Allocation for Private Internets",
              BCP 5, RFC 1918, February 1996.

   [RFC1939]  Myers, J. and M. Rose, "Post Office Protocol -
              Version 3", STD 53, RFC 1939, May 1996.

   [RFC2015]  Elkins, M., "MIME Security with Pretty Good Privacy
              (PGP)", RFC 2015, October 1996.

   [RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
              April 2001.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3365]  Schiller, J., "Strong Security Requirements for Internet
              Engineering Task Force Standard Protocols", BCP 61,
              RFC 3365, August 2002.

   [RFC3501]  Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL -
              VERSION 4rev1", RFC 3501, March 2003.

   [RFC3851]  Ramsdell, B., "Secure/Multipurpose Internet Mail
              Extensions (S/MIME) Version 3.1 Message Specification",
              RFC 3851, July 2004.

   [RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "DNS Security Introduction and Requirements",
              RFC 4033, March 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, December 2005.
   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
              RFC 4306, December 2005.

   [RFC4949]  Shirey, R., "Internet Security Glossary, Version 2",
              RFC 4949, August 2007.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
              October 2008.

   [RFC5655]  Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
              Wagner, "Specification of the IP Flow Information Export
              (IPFIX) File Format", RFC 5655, October 2009.

   [RFC5750]  Ramsdell, B. and S. Turner, "Secure/Multipurpose Internet
              Mail Extensions (S/MIME) Version 3.2 Certificate
              Handling", RFC 5750, January 2010.

   [RFC6120]  Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Core", RFC 6120, March 2011.

   [RFC6698]  Hoffman, P. and J. Schlyter, "The DNS-Based
              Authentication of Named Entities (DANE) Transport Layer
              Security (TLS) Protocol: TLSA", RFC 6698, August 2012.

   [RFC6962]  Laurie, B., Langley, A., and E. Kasper, "Certificate
              Transparency", RFC 6962, June 2013.

   [RFC7011]  Claise, B., Trammell, B., and P. Aitken, "Specification
              of the IP Flow Information Export (IPFIX) Protocol for
              the Exchange of Flow Information", STD 77, RFC 7011,
              September 2013.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is
              an Attack", BCP 188, RFC 7258, May 2014.

Authors' Addresses

   Richard Barnes

   Email: rlb@ipv.sx


   Bruce Schneier

   Email: schneier@schneier.com


   Cullen Jennings

   Email: fluffy@cisco.com


   Ted Hardie

   Email: ted.ietf@gmail.com


   Brian Trammell

   Email: ietf@trammell.ch


   Christian Huitema

   Email: huitema@huitema.net


   Daniel Borkmann

   Email: dborkman@redhat.com