idnits 2.17.1 

draft-boschi-ipfix-anon-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.ii or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 4 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 10, 2009) is 5398 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Missing Reference: 'CITE' is mentioned on line 524, but not defined

  == Unused Reference: 'I-D.ietf-ipfix-mediators-problem-statement' is
     defined on line 1288, but no explicit reference was found in the text

  ** Obsolete normative reference: RFC 5101 (Obsoleted by RFC 7011)

  ** Obsolete normative reference: RFC 5102 (Obsoleted by RFC 7012)

  == Outdated reference: A later version (-05) exists of
     draft-ietf-ipfix-file-04

  == Outdated reference: A later version (-09) exists of
     draft-ietf-ipfix-mediators-framework-02

  == Outdated reference: A later version (-09) exists of
     draft-ietf-ipfix-mediators-problem-statement-03


     Summary: 4 errors (**), 0 flaws (~~), 7 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	IPFIX Working Group                                            E. Boschi
3	Internet-Draft                                               B. Trammell
4	Intended status: Experimental                             Hitachi Europe
5	Expires: January 11, 2010                                  July 10, 2009

7	                     IP Flow Anonymisation Support
8	                     draft-boschi-ipfix-anon-04.txt

10	Status of this Memo

12	   This Internet-Draft is submitted to IETF in full conformance with the
13	   provisions of BCP 78 and BCP 79.

15	   Internet-Drafts are working documents of the Internet Engineering
16	   Task Force (IETF), its areas, and its working groups.  Note that
17	   other groups may also distribute working documents as Internet-
18	   Drafts.

20	   Internet-Drafts are draft documents valid for a maximum of six months
21	   and may be updated, replaced, or obsoleted by other documents at any
22	   time.  It is inappropriate to use Internet-Drafts as reference
23	   material or to cite them other than as "work in progress."

25	   The list of current Internet-Drafts can be accessed at
26	   http://www.ietf.org/ietf/1id-abstracts.txt.

28	   The list of Internet-Draft Shadow Directories can be accessed at
29	   http://www.ietf.org/shadow.html.

31	   This Internet-Draft will expire on January 11, 2010.

33	Copyright Notice

35	   Copyright (c) 2009 IETF Trust and the persons identified as the
36	   document authors.  All rights reserved.

38	   This document is subject to BCP 78 and the IETF Trust's Legal
39	   Provisions Relating to IETF Documents in effect on the date of
40	   publication of this document (http://trustee.ietf.org/license-info).
41	   Please review these documents carefully, as they describe your rights
42	   and restrictions with respect to this document.

44	Abstract

46	   This document describes anonymisation techniques for IP flow data and
47	   the export of anonymised data using the IPFIX protocol.  It provides
48	   a categorization of common anonymisation schemes and defines the
49	   parameters needed to describe them.  It provides guidelines for the
50	   implementation of anonymised data export and storage over IPFIX, and
51	   describes an Options-based method for anonymisation metadata export
52	   within the IPFIX protocol, providing the basis for the definition of
53	   information models for configuring anonymisation techniques within an
54	   IPFIX Metering or Exporting Process, and for reporting the technique
55	   in use to an IPFIX Collecting Process.

57	Table of Contents

59	   1.  Introduction
60	     1.1.  IPFIX Protocol Overview
61	     1.2.  IPFIX Documents Overview
62	     1.3.  Anonymisation within the IPFIX Architecture
63	   2.  Terminology
64	   3.  Categorisation of Anonymisation Techniques
65	   4.  Anonymisation of IP Flow Data
66	     4.1.  IP Address Anonymisation
67	       4.1.1.  Truncation
68	       4.1.2.  Random Permutation
69	       4.1.3.  Prefix-preserving Pseudonymisation
70	     4.2.  Hardware Address Anonymisation
71	       4.2.1.  Random Permutation
72	       4.2.2.  Structured Pseudonymisation
73	     4.3.  Timestamp Anonymisation
74	       4.3.1.  Precision Degradation
75	       4.3.2.  Enumeration
76	       4.3.3.  Random Time Shifts
77	     4.4.  Counter Anonymisation
78	       4.4.1.  Precision Degradation
79	       4.4.2.  Binning
80	       4.4.3.  Random Noise Addition
81	     4.5.  Anonymisation of Other Flow Fields
82	       4.5.1.  Binning
83	       4.5.2.  Random Permutation
84	   5.  Parameters for the Description of Anonymisation Techniques
85	     5.1.  Stability
86	     5.2.  Truncation Length
87	     5.3.  Bin Map
88	     5.4.  Permutation
89	     5.5.  Shift Amount
90	   6.  Anonymisation Export Support in IPFIX
91	     6.1.  Anonymisation Options Template
92	     6.2.  Recommended Information Elements for Anonymisation
93	           Metadata
94	       6.2.1.  anonymisationStability
95	       6.2.2.  anonymisationTechnique
96	       6.2.3.  informationElementIndex
97	   7.  Applying Anonymisation Techniques to IPFIX Export and
98	       Storage
99	     7.1.  Arrangement of Processes in IPFIX Anonymisation
100	     7.2.  IPFIX-Specific Anonymisation Guidelines
101	       7.2.1.  Appropriate Use of Information Elements for
102	               Anonymised Data
103	       7.2.2.  Anonymisation of Header Data
104	       7.2.3.  Anonymisation of Options Data
105	   8.  Examples
106	   9.  Security Considerations
107	   10. IANA Considerations
108	   11. Acknowledgments
109	   12. References
110	     12.1. Normative References
111	     12.2. Informative References
112	   Authors' Addresses

114	1.  Introduction

116	   The standardisation of an IP flow information export protocol
117	   [RFC5101] and associated representations removes a technical barrier
118	   to the sharing of IP flow data across organizational boundaries and
119	   with network operations, security, and research communities for a
120	   wide variety of purposes.  However, with wider dissemination come
121	   greater risks to the privacy of the users of networks under
122	   measurement, and to the security of those networks.  While it is not
123	   a complete solution to the issues posed by distribution of IP flow
124	   information, anonymisation (i.e., the deletion or transformation of
125	   information that is considered sensitive and could be used to reveal
126	   the identity of subjects involved in a communication) is an important
127	   tool for the protection of privacy within network measurement
128	   infrastructures.

130	   This document presents a mechanism for representing anonymised data
131	   within IPFIX and guidelines for using it.  It begins with a
132	   categorization of anonymisation techniques.  It then describes
133	   applicability of each technique to commonly anonymisable fields of IP
134	   flow data, organized by information element data type and semantics
135	   as in [RFC5102]; enumerates the parameters required by each of the
136	   applicable anonymisation techniques; and provides guidelines for the
137	   use of each of these techniques in accordance with best practices in
138	   data protection.  Finally, it specifies a mechanism for exporting
139	   anonymised data and binding anonymisation metadata to templates using
140	   IPFIX Options.

142	1.1.  IPFIX Protocol Overview

144	   In the IPFIX protocol, { type, length, value } tuples are expressed
145	   in templates containing { type, length } pairs, specifying which {
146	   value } fields are present in data records conforming to the
147	   Template, giving great flexibility as to what data is transmitted.
148	   Since Templates are sent very infrequently compared with Data
149	   Records, this results in significant bandwidth savings.  Various
150	   different data formats may be transmitted simply by sending new
151	   Templates specifying the { type, length } pairs for the new data
152	   format.  See [RFC5101] for more information.

154	   The IPFIX information model [RFC5102] defines a large number of
155	   standard Information Elements which provide the necessary { type }
156	   information for Templates.  The use of standard elements enables
157	   interoperability among different vendors' implementations.
158	   Additionally, non-standard enterprise-specific elements may be
159	   defined for private use.
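
   As a minimal, non-normative sketch of the Template mechanism
   described above, the following Python fragment splits one Data Record
   into its { value } fields using a list of { type, length } pairs.  It
   assumes fixed-length Information Elements only and omits the Message
   and Set headers defined in [RFC5101]; the name decode_record and the
   example addresses and counter value are illustrative assumptions, not
   part of the protocol.

```python
import struct

# Information Element numbers from the IPFIX information model:
# 8 = sourceIPv4Address, 12 = destinationIPv4Address, 1 = octetDeltaCount.
TEMPLATE = [(8, 4), (12, 4), (1, 8)]  # { type, length } pairs


def decode_record(template, data):
    """Split one Data Record into {type: raw value bytes} per the Template."""
    record, offset = {}, 0
    for ie_id, length in template:
        record[ie_id] = data[offset:offset + length]
        offset += length
    return record


# Hypothetical record: 192.0.2.1 -> 198.51.100.7, 1500 octets.
DATA = (bytes([192, 0, 2, 1]) + bytes([198, 51, 100, 7])
        + struct.pack("!Q", 1500))
RECORD = decode_record(TEMPLATE, DATA)
```

   Because the Data Record carries no type or length information of its
   own, a reader that lacks the Template cannot interpret it; this is
   also why anonymisation metadata can be bound to Templates.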

161	1.2.  IPFIX Documents Overview

163	   "Specification of the IPFIX Protocol for the Exchange of IP Traffic
164	   Flow Information" [RFC5101] and its associated documents define the
165	   IPFIX Protocol, which provides network engineers and administrators
166	   with access to IP traffic flow information.

168	   "Architecture for IP Flow Information Export" [RFC5470] defines the
169	   architecture for the export of measured IP flow information out of an
170	   IPFIX Exporting Process to an IPFIX Collecting Process, and the basic
171	   terminology used to describe the elements of this architecture, per
172	   the requirements defined in "Requirements for IP Flow Information
173	   Export" [RFC3917].  The IPFIX Protocol document [RFC5101] then covers
174	   the details of the method for transporting IPFIX Data Records and
175	   Templates via a congestion-aware transport protocol from an IPFIX
176	   Exporting Process to an IPFIX Collecting Process.

178	   "Information Model for IP Flow Information Export" [RFC5102]
179	   describes the Information Elements used by IPFIX, including details
180	   on Information Element naming, numbering, and data type encoding.
181	   Finally, "IPFIX Applicability" [RFC5472] describes the various
182	   applications of the IPFIX protocol and their use of information
183	   exported via IPFIX, and relates the IPFIX architecture to other
184	   measurement architectures and frameworks.

186	   Additionally, the "Specification of the IPFIX File Format"
187	   [I-D.ietf-ipfix-file] describes a file format based upon the IPFIX
188	   Protocol for the storage of flow data.

190	   This document references the Protocol and Architecture documents for
191	   terminology, and extends the IPFIX Information Model to provide new
192	   Information Elements for anonymisation metadata.  The anonymisation
193	   techniques described herein are equally applicable to the IPFIX
194	   Protocol and data stored in IPFIX Files.

196	1.3.  Anonymisation within the IPFIX Architecture

198	   "Architecture for IP Flow Information Export" [RFC5470] defines the
199	   functions performed in sequence by the various functional blocks in
200	   an IPFIX Device as in the figure below.

202	                    Packet(s) coming into Observation Point(s)
203	                      |                                   |
204	                      v                                   v
205	     +----------------+-------------------------+   +-----+-------+
206	     |          Metering Process on an          |   |             |
207	     |             Observation Point            |   |             |
208	     |                                          |   |             |
209	     |   packet header capturing                |   |             |
210	     |        |                                 |...| Metering    |
211	     |   timestamping                           |   | Process N   |
212	     |        |                                 |   |             |
213	     | +----->+                                 |   |             |
214	     | |      |                                 |   |             |
215	     | |   sampling Si (1:1 in case of no       |   |             |
216	     | |      |          sampling)              |   |             |
217	     | |   filtering Fi (select all when        |   |             |
218	     | |      |          no criteria)           |   |             |
219	     | +------+                                 |   |             |
220	     |        |                                 |   |             |
221	     |        |        Timing out Flows         |   |             |
222	     |        |    Handle resource overloads    |   |             |
223	     +--------|---------------------------------+   +-----|-------+
224	              |                                           |
225	      Flow Records (identified by Observation Domain)  Flow Records
226	              |                                           |
227	              +---------+---------------------------------+
228	                        |
229	   +--------------------|----------------------------------------------+
230	   |                    |     Exporting Process                        |
231	   |+-------------------|-------------------------------------------+  |
232	   ||                   v       IPFIX Protocol                      |  |
233	   ||+-----------------------------+  +----------------------------+|  |
234	   |||Rules for                    |  |Functions                   ||  |
235	   ||| Picking/sending Templates   |  |-Packetise selected Control ||  |
236	   ||| Picking/sending Flow Records|->|  & data Information into   ||  |
237	   ||| Encoding Template & data    |  |  IPFIX export packets.     ||  |
238	   ||| Selecting Flows to export(*)|  |-Handle export errors       ||  |
239	   ||+-----------------------------+  +----------------------------+|  |
240	   |+----------------------------+----------------------------------+  |
241	   |                             |                                     |
242	   |                    exported IPFIX Messages                        |
243	   |                             |                                     |
244	   |                +------------+-----------------+                   |
245	   |                |  Anonymise export packet(*)  |                   |
246	   |                +------------+-----------------+                   |
247	   |                             |                                     |
248	   |                +------------+-----------------+                   |
249	   |                |       Transport  Protocol    |                   |
250	   |                +------------+-----------------+                   |
251	   |                             |                                     |
252	   +-----------------------------+-------------------------------------+
253	                                 |
254	                                 v
255	                    IPFIX export packet to Collector

257	   (*) indicates that the block is optional.

259	                 Figure 1: IPFIX Device functional blocks

261	   Note that, according to the original architecture specification,
262	   IPFIX Message anonymisation is optionally performed as the final
263	   operation before handing the Message to the transport protocol for
264	   export.  While no provision is made in the architecture for
265	   anonymisation metadata as in Section 6, this arrangement does allow
266	   for the message rewriting necessary for comprehensive anonymisation
267	   of IPFIX export as in Section 7.  The development of the IPFIX
268	   Mediation framework [I-D.ietf-ipfix-mediators-framework] and the
269	   IPFIX File Format [I-D.ietf-ipfix-file] expands upon this initial
270	   architectural allowance for anonymisation by adding to the list of
271	   places that anonymisation may be applied.  The former specifies IPFIX
272	   Mediators, which rewrite existing IPFIX messages, and the latter
273	   specifies a method for storage of IPFIX data in files.

275	   More detail on the applicable architectural arrangements of
276	   anonymisation can be found in Section 7.1.

278	2.  Terminology

280	   Terms used in this document that are defined in the Terminology
281	   section of the IPFIX Protocol [RFC5101] document are to be
282	   interpreted as defined there.

284	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
285	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
286	   document are to be interpreted as described in RFC 2119 [RFC2119].

288	3.  Categorisation of Anonymisation Techniques

290	   Anonymisation modifies a data set in order to protect the identity of
291	   the people or entities described by the data set from disclosure.
292	   With respect to network traffic data, anonymisation generally
293	   attempts to preserve some set of properties of the network traffic
294	   useful for a given application or applications, while ensuring the
295	   data cannot be traced back to the specific networks, hosts, or users
296	   generating the traffic.

298	   Anonymisation may be broadly classified according to two properties:
299	   recoverability and countability.  All anonymisation techniques map
300	   the real space of identifiers or values into a separate, anonymised
301	   space, according to some function.  A technique is said to be
302	   recoverable when the function used is invertible or can otherwise be
303	   reversed and a real identifier can be recovered from a given
304	   replacement identifier.

306	   Countability compares the dimension of the anonymised space (N) to
307	   the dimension of the real space (M), and denotes how the count of
308	   unique values is preserved by the anonymisation function.  If the
309	   anonymised space is smaller than the real space, then the function is
310	   said to generalise the input, mapping more than one input point to
311	   each anonymous value (e.g., as with aggregation).  By definition,
312	   generalisation is not recoverable.

314	   If the dimensions of the anonymised and real spaces are the same,
315	   such that the count of unique values is preserved, then the function
316	   is said to be a direct substitution function.  If the dimension of
317	   the anonymised space is larger, such that each real value maps to a
318	   set of anonymised values, then the function is said to be a set
319	   substitution function.  Note that with set substitution functions,
320	   the sets of anonymised values are not necessarily disjoint.  Either
321	   direct or set substitution functions are said to be one-way if there
322	   exists no method for recovering the real data point from an
323	   anonymised one.

325	   This classification is summarised in the table below.

327	   +------------------------+-----------------+------------------------+
328	   | Recoverability /       | Recoverable     | Non-recoverable        |
329	   | Countability           |                 |                        |
330	   +------------------------+-----------------+------------------------+
331	   | N < M                  | N.A.            | Generalisation         |
332	   | N = M                  | Direct          | One-way Direct         |
333	   |                        | Substitution    | Substitution           |
334	   | N > M                  | Set             | One-way Set            |
335	   |                        | Substitution    | Substitution           |
336	   +------------------------+-----------------+------------------------+
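
   The table can also be read as a small decision procedure.  The
   following Python sketch is illustrative only; the function name
   classify and its parameters are assumptions introduced here, with
   n_anon and m_real standing for the dimensions N and M above.

```python
def classify(n_anon, m_real, recoverable):
    """Return the table's category for an anonymisation function.

    n_anon is the dimension N of the anonymised space, m_real the
    dimension M of the real space.
    """
    if n_anon < m_real:
        # Several real values share one anonymised value, so the
        # original cannot be recovered (the "N.A." cell of the table).
        if recoverable:
            raise ValueError("generalisation is not recoverable")
        return "Generalisation"
    if n_anon == m_real:
        kind = "Direct Substitution"
    else:
        kind = "Set Substitution"
    return kind if recoverable else "One-way " + kind
```

   For example, truncating IPv4 addresses to /24 networks maps a real
   space of 2^32 values onto 2^24 values, which this sketch classifies
   as generalisation.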

338	4.  Anonymisation of IP Flow Data

340	   Due to the restricted semantics of IP flow data, there is a
341	   relatively limited set of specific anonymisation techniques
342	   applicable to flow data, each of which falls into one of the broad
343	   categories above.  Each type of field that may commonly appear in a
344	   flow record may have its own applicable specific techniques.

346	   While anonymisation is generally applied at the resolution of single
347	   fields within a flow record, attacks against anonymisation use entire
348	   flows and relationships between hosts and flows within a given data
349	   set.  Therefore, fields which may not necessarily be identifying by
350	   themselves may be anonymised in order to increase the anonymity of
351	   the data set as a whole.

353	   Of all the fields in an IP flow record, only IP addresses directly
354	   identify entities in the real world.  Each IP address is associated
355	   with an interface on a network host, and can potentially be
356	   identified with a single user.  Additionally, IP addresses are
357	   structured identifiers; that is, partial IP address prefixes may be
358	   used to identify networks just as full IP addresses identify hosts.
359	   This makes anonymisation of IP addresses particularly important.

361	   Hardware addresses uniquely identify devices on the network; while
362	   they are not often available in traffic data collected at Layer 3,
363	   and cannot be used to locate devices within the network, some traces
364	   may contain sub-IP data including hardware address data.  Hardware
365	   addresses may be mappable to device serial numbers, and to the
366	   entities or individuals who purchased the devices, when combined with
367	   external databases.  They may also leak via IPv6 addresses in certain
368	   circumstances.  Therefore, hardware address anonymisation is also
369	   important.

371	   Port numbers identify abstract entities (applications) as opposed to
372	   real-world entities, but they can be used to classify hosts and user
373	   behavior.  Passive port fingerprinting, both of well-known and
374	   ephemeral ports, can be used to determine the operating system
375	   running on a host.  Relative data volumes by port can also be used to
376	   determine the host's function (workstation, web server, etc.); this
377	   information can be used to identify hosts and users.

379	   While not identifiers in and of themselves, timestamps and counters
380	   can reveal the behavior of the hosts and users on a network.  Any
381	   given network activity is recognizable by a pattern of relative time
382	   differences and data volumes in the associated sequence of flows,
383	   even without host address information.  These fields can therefore be
384	   used to identify hosts and users.  Timestamps and counters are also
385	   vulnerable to traffic injection attacks, where traffic with a known
386	   pattern is injected into a network under measurement, and this
387	   pattern is later identified in the anonymised data set.

389	   The simplest and most extreme form of anonymisation, which can be
390	   applied to any field of a flow record, is black-marker anonymisation,
391	   or complete deletion of a given field.  Note that black-marker
392	   anonymisation is equivalent to simply not exporting the field(s) in
393	   question.

395	   While black-marker anonymisation completely protects the data in the
396	   deleted fields from the risk of disclosure, it also reduces the
397	   utility of the anonymised data set as a whole.  Techniques that
398	   retain some information while reducing (though not eliminating) the
399	   disclosure risk are discussed in the following sections, with
400	   separate sections for the techniques specifically applicable to IP
401	   addresses, hardware addresses, timestamps, counters, and other flow
402	   fields (including ports).

404	4.1.  IP Address Anonymisation

406	   Since IP addresses are the most common identifiers within flow data
407	   that can be used to directly identify a person, organization, or
408	   host, most of the work on flow and trace data anonymisation has gone
409	   into IP address anonymisation techniques.  Indeed, the aim of most
410	   attacks against anonymisation is to recover the map from anonymised
411	   IP addresses to original IP addresses, thereby identifying the hosts
412	   behind them.  There is therefore a wide range of IP address
413	   anonymisation schemes, which fit into the following categories.

415	       +------------------------------------+---------------------+
416	       | Scheme                             | Action              |
417	       +------------------------------------+---------------------+
418	       | Truncation                         | Generalisation      |
419	       | Random Permutation                 | Direct Substitution |
420	       | Prefix-preserving Pseudonymisation | Direct Substitution |
421	       +------------------------------------+---------------------+

423	4.1.1.  Truncation

425	   Truncation removes "n" of the least significant bits from an IP
426	   address, replacing them with zeroes.  In effect, it replaces a host
427	   address with a network address for some fixed netblock; for IPv4
428	   addresses, 8-bit truncation corresponds to replacement with a /24
429	   network address.  Truncation is a non-reversible generalisation
430	   scheme.  Note that while truncation is effective for making hosts
431	   non-identifiable, it preserves information which can be used to
432	   identify an organization, a geographic region, a country, or a
433	   continent (or RIR region of responsibility).

435	   Truncation to an address length of 0 is equivalent to black-marker
436	   anonymisation.  Removal of IP address information is only recommended
437	   for analysis tasks which have no need to separate flow data by host
438	   or network; e.g. as a first stage to per-application (port) or time-
439	   series total volume analyses.
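
   A minimal sketch of truncation in Python, using the standard
   ipaddress module, is given below.  The function name truncate is an
   assumption made for illustration; "n" is the number of least
   significant bits to zero, as in the text above.

```python
import ipaddress


def truncate(address, n):
    """Zero the n least significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    if not 0 <= n <= ip.max_prefixlen:
        raise ValueError("n must be between 0 and the address length")
    # Build a mask with the n low-order bits cleared.
    mask = ((1 << ip.max_prefixlen) - 1) >> n << n
    # Rebuild with the same class so the address family is preserved.
    return str(type(ip)(int(ip) & mask))
```

   Under these assumptions, truncate("192.0.2.77", 8) returns
   "192.0.2.0", the /24 network address, and n = 32 zeroes an IPv4
   address entirely, which corresponds to black-marker anonymisation.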

441	4.1.2.  Random Permutation

443	   Random permutation is a direct substitution technique, replacing each
444	   IP address with an address randomly selected from the set of possible
445	   IP addresses, guaranteeing that each anonymised address represents a
446	   unique original address.  The random permutation does not preserve
447	   any structural information about a network, but it does preserve the
448	   unique count of IP addresses.  Any application that requires more
449	   structure than host-uniqueness will not be able to use randomly
450	   permuted IP addresses.
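
   One possible way to realise a random permutation, sketched here in
   Python for IPv4 only, is to draw a fresh random address the first
   time each real address is seen and to reject values already in use.
   The class and method names are assumptions for illustration.  Note
   that the mapping table held by the object allows the real addresses
   to be recovered and would itself need to be protected.

```python
import ipaddress
import secrets


class RandomPermutation:
    """Lazily built random one-to-one mapping over the IPv4 space."""

    def __init__(self):
        self._forward = {}  # real address (int) -> anonymised (int)
        self._used = set()  # anonymised addresses already assigned

    def anonymise(self, address):
        real = int(ipaddress.IPv4Address(address))
        if real not in self._forward:
            candidate = secrets.randbits(32)
            while candidate in self._used:  # retry on collision
                candidate = secrets.randbits(32)
            self._forward[real] = candidate
            self._used.add(candidate)
        return str(ipaddress.IPv4Address(self._forward[real]))
```

   Each real address always maps to the same anonymised address within
   one object, and no two real addresses share an anonymised address, so
   the unique count of addresses is preserved while subnet structure is
   not.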

452	4.1.3.  Prefix-preserving Pseudonymisation

454	   Prefix-preserving pseudonymisation is a direct substitution
455	   technique, further restricted such that the structure of subnets is
456	   preserved at each level while anonymising IP addresses.  If two real
457	   IP addresses match on a prefix of "n" bits, the two anonymised IP
458	   addresses will match on a prefix of "n" bits as well.  This is useful
459	   when relationships among networks must be preserved for a given
460	   analysis task, but introduces structure into the anonymised data
461	   which can be exploited in attacks against the anonymisation
462	   technique.
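
   The prefix-preserving property can be obtained by deciding whether to
   flip each bit of the address from a keyed pseudorandom function of
   the bits that precede it.  The Python sketch below follows that
   general construction for IPv4, using HMAC-SHA256 as the pseudorandom
   function; this choice, the function names, and the key handling are
   assumptions for illustration, not a scheme specified by this
   document.

```python
import hashlib
import hmac
import ipaddress


def prefix_preserving(address, key):
    """Flip bit i of an IPv4 address based on a PRF of bits 0..i-1."""
    bits = format(int(ipaddress.IPv4Address(address)), "032b")
    out = 0
    for i in range(32):
        # The PRF input is the i-bit real prefix, so two addresses that
        # share a prefix receive identical flips over that prefix.
        digest = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        out = (out << 1) | (int(bits[i]) ^ flip)
    return str(ipaddress.IPv4Address(out))


def common_prefix_len(a, b):
    """Number of leading bits on which two IPv4 addresses agree."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

   With any fixed key, two real addresses that agree on exactly "n"
   leading bits yield anonymised addresses that also agree on exactly
   "n" leading bits.  The same structure is what an attacker can
   exploit: learning one real address reveals prefix information about
   its neighbours.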

464	4.2.  Hardware Address Anonymisation

466	   Flow data containing sub-IP information can also contain identifying
467	   information in the form of the hardware (MAC) address.  While
468	   hardware address information cannot be used to locate a node within a
469	   network, it can be used to directly uniquely identify a specific
470	   device.  Vendors or organizations within the supply chain may then
471	   have the information necessary to identify the entity or individual
472	   that purchased the device.

474	   Hardware address information is not as structured as IP address
475	   information.  EUI-48 and EUI-64 hardware addresses contain an
476	   Organizationally Unique Identifier (OUI) in the three most
477	   significant bytes of the address; this OUI additionally contains bits
478	   noting whether the address is locally or globally administered.
479	   Beyond this, the address is unstructured, and there is no particular
480	   relationship among the OUIs assigned to a given vendor.

482	   Note that hardware address information also appears within IPv6
483	   addresses, as the EUI-64 address, or EUI-48 address encoded as an
484	   EUI-64 address, is used as the least significant 64 bits of the IPv6
485	   address in the case of link local addressing or stateless
486	   autoconfiguration; the considerations and techniques in this section
487	   may then apply to such IPv6 addresses as well.

489	           +-----------------------------+---------------------+
490	           | Scheme                      | Action              |
491	           +-----------------------------+---------------------+
492	           | Random Permutation          | Direct Substitution |
493	           | Structured Pseudonymisation | Direct Substitution |
494	           +-----------------------------+---------------------+

496	4.2.1.  Random Permutation

498	   Random permutation is a direct substitution technique, replacing each
499	   hardware address with an address randomly selected from the set of
500	   possible hardware addresses, guaranteeing that each anonymised
501	   address represents a unique original address.  The random permutation
502	   does not preserve any structural information, such as the OUI, but
503	   it does preserve the unique count of hardware addresses.  Any
504	   application that requires more structure than device-uniqueness will
505	   not be able to use randomly permuted hardware addresses.

507	4.2.2.  Structured Pseudonymisation

509	   Structured pseudonymisation for MAC addresses is a direct
510	   substitution technique, like random permutation, but restricted such
511	   that the OUI (the most significant three bytes) is permuted
512	   separately from the node identifier, the remainder.  This is useful
513	   when the uniqueness of OUIs must be preserved for a given analysis
514	   task, but introduces structure into the anonymised data which can be
515	   exploited in attacks against the anonymisation technique.
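
   A minimal Python sketch of this technique follows.  It assumes that
   the permutations are built lazily, assigning a fresh random value the
   first time each input is seen; LazyPermutation, oui_map, node_map and
   pseudonymise_mac are hypothetical names.  The same helper applied to
   the full 48-bit space would be one possible realisation of the random
   permutation of Section 4.2.1.

```python
import secrets


class LazyPermutation:
    """Injective random mapping over range(size), built on demand."""

    def __init__(self, size: int):
        self.size = size
        self.forward = {}
        self.used = set()

    def __call__(self, value: int) -> int:
        if not 0 <= value < self.size:
            raise ValueError("value out of range")
        if value not in self.forward:
            # Draw until an output not yet assigned is found.
            candidate = secrets.randbelow(self.size)
            while candidate in self.used:
                candidate = secrets.randbelow(self.size)
            self.forward[value] = candidate
            self.used.add(candidate)
        return self.forward[value]


oui_map = LazyPermutation(1 << 24)   # permutes the OUI
node_map = LazyPermutation(1 << 24)  # permutes the node identifier


def pseudonymise_mac(mac: str) -> str:
    """Permute the OUI and the node identifier independently."""
    raw = int(mac.replace(":", ""), 16)
    oui, node = raw >> 24, raw & 0xFFFFFF
    out = (oui_map(oui) << 24) | node_map(node)
    return out.to_bytes(6, "big").hex(":")
```

   Addresses sharing an OUI keep a shared (pseudonymous) OUI, which is
   exactly the structure that remains available to an attacker.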

517	4.3.  Timestamp Anonymisation

519	   The particular time at which a flow began or ended is not in itself
520	   identifying information, but it can be used as part of attacks
521	   against other anonymisation techniques or for user profiling.
522	   Precise timestamps can be used in injected-traffic fingerprinting
523	   attacks [CITE] as well as to identify certain activity by response
524	   delay and size fingerprinting [CITE].  Therefore, timestamp
525	   information may be anonymised in order to ensure the protection of
526	   the entire dataset.

528	          +-----------------------+----------------------------+
529	          | Scheme                | Action                     |
530	          +-----------------------+----------------------------+
531	          | Precision Degradation | Generalisation             |
532	          | Enumeration           | Direct or Set Substitution |
533	          | Random Time Shifts    | Direct Substitution        |
534	          +-----------------------+----------------------------+

536	4.3.1.  Precision Degradation

538	   Precision Degradation is a generalisation technique that removes the
539	   most precise components of a timestamp, treating all events
540	   occurring in each given interval (e.g. one millisecond for
541	   millisecond-level degradation) as simultaneous.  This has the effect
542	   of potentially collapsing many timestamps into one.  With this
543	   technique time precision is reduced, and sequencing may be lost, but
544	   the approximate time at which each event occurred is preserved.  The
545	   anonymised data may not be generally useful for applications which
546	   require strict sequencing of flows.
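
   As an illustration, precision degradation of integer timestamps can
   be sketched as rounding down to the start of the enclosing interval.
   The function name, the millisecond unit and the sample values are
   assumptions made for this example.

```python
def degrade_timestamp(timestamp_ms: int, interval_ms: int) -> int:
    """Round a timestamp down to the start of its interval."""
    return timestamp_ms - (timestamp_ms % interval_ms)


# Three flow start times in milliseconds, degraded to second precision.
flow_starts_ms = [1247215845123, 1247215845987, 1247215846001]
degraded = [degrade_timestamp(t, 1000) for t in flow_starts_ms]
# The first two flows now appear simultaneous; their order is lost.
```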

548	   Note that flow meters with low time precision (e.g. second precision,
549	   or millisecond precision on high-capacity networks) perform the
550	   equivalent of precision degradation anonymisation by their design.

552	   Note also that degradation to a very low precision (e.g. on the order
553	   of minutes, hours, or days) is commonly used in analyses operating on
554	   time-series aggregated data, and may also be described as binning;
555	   though the time scales are longer and applicability more restricted,
556	   this is in principle the same operation.

558	   Precision degradation to infinitely low precision is equivalent to
559	   black-marker anonymisation.  Removal of timestamp information is only
560	   recommended for analysis tasks which have no need to separate flows
561	   in time, for example for counting total volumes or unique occurrences
562	   of other flow keys in an entire dataset.

564	4.3.2.  Enumeration

566	   Enumeration is a substitution function that retains the chronological
567	   order in which events occurred while eliminating time information.
568	   Timestamps are substituted by equidistant timestamps (or numbers)
569	   starting from a randomly chosen start value.  The resulting data is
570	   useful for applications requiring strict sequencing, but not for
571	   those requiring good timing information (e.g. delay- or jitter-
572	   measurement for QoS applications or SLA validation).
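
   A minimal sketch of enumeration follows.  The function name, the
   step parameter and the 32-bit range of the random start value are
   assumptions made for this example.  Events with equal timestamps
   receive distinct consecutive values in input order, which is one
   possible way of handling ties.

```python
import secrets


def enumerate_timestamps(timestamps, step=1):
    """Replace timestamps by equidistant values, keeping their order."""
    start = secrets.randbelow(1 << 32)  # randomly chosen start value
    # Indices of the events in chronological order (stable for ties).
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    result = [0] * len(timestamps)
    for rank, index in enumerate(order):
        result[index] = start + rank * step
    return result
```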

574	4.3.3.  Random Time Shifts

576	   Random time shifts add a random offset to every timestamp within a
577	   dataset.  This reversible substitution technique therefore retains
578	   duration and inter-event interval information as well as
579	   chronological order of flows.  It is primarily intended to defeat
580	   traffic injection fingerprinting attacks.
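
   The following sketch assumes that a single offset is drawn per
   dataset and applied to every timestamp, since this is what keeps
   durations and inter-event intervals intact.  The function name and
   the default bound of one day are hypothetical.

```python
import secrets


def shift_timestamps(flows, max_shift_ms=86_400_000):
    """Shift every (start, end) pair by one random offset."""
    # One offset for the whole dataset, uniform in [-max, +max].
    offset = secrets.randbelow(2 * max_shift_ms + 1) - max_shift_ms
    return [(start + offset, end + offset) for start, end in flows]
```

   Anyone who learns the offset can reverse the shift, so the offset
   needs the same protection as the original data.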

582	4.4.  Counter Anonymisation

584	   Counters (such as packet and octet volumes per flow) can, like
585	   timestamps, be used in fingerprinting and injection attacks against
586	   anonymisation, or for user profiling.  Counter anonymisation can help
587	   defeat these attacks, but is only usable for analysis tasks for
588	   which relative or imprecise magnitudes of activity are useful.

590	          +-----------------------+----------------------------+
591	          | Scheme                | Action                     |
592	          +-----------------------+----------------------------+
593	          | Precision Degradation | Generalisation             |
594	          | Binning               | Generalisation             |
595	          | Random noise addition | Direct or Set Substitution |
596	          +-----------------------+----------------------------+

598	4.4.1.  Precision Degradation

600	   As with precision degradation in timestamps, precision degradation of
601	   counters removes lower-order bits of the counters, treating all the
602	   counters in a given range as having the same value.  Depending on the
603	   precision reduction, this loses information about the relationships
604	   between sizes of similarly-sized flows, but keeps relative magnitude
605	   information.
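
   For example, removing lower-order bits can be sketched as a shift
   down followed by a shift back up.  The function name and the sample
   octet counts are assumptions made for this example.

```python
def degrade_counter(value: int, dropped_bits: int) -> int:
    """Clear the given number of low-order bits of a counter."""
    return (value >> dropped_bits) << dropped_bits


octet_counts = [1480, 1500, 64000]
degraded_counts = [degrade_counter(v, 8) for v in octet_counts]
# 1480 and 1500 become indistinguishable; 64000 stays far larger.
```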

607	4.4.2.  Binning

609	   Binning can be seen as a special case of precision degradation; the
610	   operation is identical, except that in precision degradation the
611	   counter ranges are uniform, and in binning they need not be.  For
612	   example, a common counter binning scheme for packet counters could be
613	   to bin values 1-2 together, and 3-infinity together, thereby
614	   separating potentially completely-opened TCP connections from
615	   unopened ones.  Binning schemes are generally chosen to keep
616	   precisely the amount of information required in a counter for a given
617	   analysis task.  Note that, also unlike precision degradation, the bin
618	   label need not be within the bin's range.
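
   The packet counter example above can be sketched as a lookup over
   sorted lower bounds.  The names and the bin labels 0 and 1 are
   hypothetical; label 0 deliberately lies outside the range 1-2 to
   show that a label need not fall within its bin.

```python
import bisect

BIN_LOWER_BOUNDS = [1, 3]  # bins: 1-2 and 3-infinity
BIN_LABELS = [0, 1]        # arbitrary labels, one per bin


def bin_packet_count(packets: int) -> int:
    """Map a packet count to the label of the bin containing it."""
    if packets < BIN_LOWER_BOUNDS[0]:
        raise ValueError("packet count below the lowest bin")
    index = bisect.bisect_right(BIN_LOWER_BOUNDS, packets) - 1
    return BIN_LABELS[index]
```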

620	   Binning counters to a single bin 0-infinity, or alternately precision
621	   degradation to infinitely low precision, is equivalent to black-
622	   marker anonymisation.  Removal of counter information is only
623	   recommended for analysis tasks which have no need to evaluate the
624	   removed counter, for example for counting only unique occurrences of
625	   other flow keys.

627	4.4.3.  Random Noise Addition

629	   Random noise addition adds a random amount to a counter in each flow;
630	   this is used to keep relative magnitude information and minimise the
631	   disruption to size relationship information while avoiding
632	   fingerprinting attacks against anonymisation.  Note that there is no
633	   guarantee that random noise addition will maintain ranking order by a
634	   counter among members of a set.  Random noise addition is
635	   particularly useful when the derived analysis data will not be
636	   presented in such a way as to require the lower-order bits of the
637	   counters.
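
   A minimal sketch follows, assuming uniform integer noise within a
   fixed bound and clamping at zero so that counters stay non-negative.
   Neither the noise distribution nor the clamping is prescribed by
   this document, and the function name is hypothetical.

```python
import random


def add_noise(counter: int, max_noise: int, rng: random.Random) -> int:
    """Add uniform noise in [-max_noise, +max_noise] to a counter."""
    return max(0, counter + rng.randint(-max_noise, max_noise))
```

   Two counters that differ by less than twice the noise bound may swap
   rank after this operation, which is why ranking order is not
   guaranteed.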

639	4.5.  Anonymisation of Other Flow Fields

641	   Other fields, particularly port numbers and protocol numbers, can be
642	   used to partially identify the applications that generated the
643	   traffic in a given flow trace.  This information can be used in
644	   fingerprinting attacks, and may be of interest on its own (e.g., to
645	   reveal that a certain application with suspected vulnerabilities is
646	   running on a given network).  These fields are generally anonymised
647	   using one of two techniques.

649	               +--------------------+---------------------+
650	               | Scheme             | Action              |
651	               +--------------------+---------------------+
652	               | Binning            | Generalisation      |
653	               | Random Permutation | Direct Substitution |
654	               +--------------------+---------------------+

656	4.5.1.  Binning

658	   Binning is a generalisation technique mapping a set of potentially
659	   non-uniform ranges into a set of arbitrarily labeled bins.  Common
660	   bin arrangements depend on the field type and the analysis
661	   application.  For example, an IP protocol bin arrangement may
662	   preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all
663	   other protocols into a single bin, to mitigate the use of uncommon
664	   protocols in fingerprinting attacks.  Another example arrangement may
665	   bin source and destination ports into low (0-1023) and high (1024-
666	   65535) bins in order to tell service from ephemeral ports without
667	   identifying individual applications.
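
   Both example arrangements can be sketched as follows.  The bin
   labels (255 for "other protocol", 0 and 1024 for the two port bins)
   and the function names are assumptions made for this example; any
   labels agreed between producer and consumer would serve.

```python
KEPT_PROTOCOLS = {1, 6, 17}  # ICMP, TCP, UDP
OTHER_PROTOCOL = 255         # hypothetical label for all other protocols


def bin_protocol(protocol: int) -> int:
    """Keep common protocols; collapse all others into one bin."""
    return protocol if protocol in KEPT_PROTOCOLS else OTHER_PROTOCOL


def bin_port(port: int) -> int:
    """Label a port by the lower bound of its bin (low or high)."""
    return 0 if port <= 1023 else 1024
```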

669	   Binning other flow key fields to a single bin is equivalent to black-
670	   marker anonymisation.  Removal of other flow key information is only
671	   recommended for analysis tasks which have no need to differentiate
672	   flows on the removed keys, for example for total traffic counts or
673	   unique counts of other flow keys.

675	4.5.2.  Random Permutation

677	   Random permutation is a direct substitution technique, replacing each
678	   value with a value randomly selected from the set of possible values,
679	   guaranteeing that each anonymised value represents a unique original
680	   value.  This is used to preserve the count of unique values without
681	   preserving information about, or the ordering of, the values
682	   themselves.
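
   For a small value space such as 16-bit port numbers, the whole
   permutation can be materialised as a shuffled lookup table, as in
   the sketch below.  The function name is hypothetical, and the use of
   the operating system's random source is a choice made for this
   example.

```python
import random


def build_port_permutation(rng=None):
    """Return a table mapping each port to a unique anonymised port."""
    rng = rng or random.SystemRandom()
    table = list(range(65536))
    rng.shuffle(table)  # table[p] is the anonymised value of port p
    return table
```

   The table is itself the permutation function, so it must not be
   disclosed together with the anonymised data.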

684	5.  Parameters for the Description of Anonymisation Techniques

686	   This section details the abstract parameters used to describe the
687	   anonymisation techniques examined in the previous section, on a per-
688	   parameter basis.  These parameters and their export safety inform the
689	   design of the IPFIX anonymisation metadata export specified in the
690	   following section.

692	5.1.  Stability

694	   Any given anonymisation technique may be applied with a varying range
695	   of stability.  Stability is important for assessing the comparability
696	   of anonymised information in different data sets, or in the same data
697	   set over different time periods.  In general, stability ranges from
698	   completely stable to completely unstable; however, note that the
699	   completely unstable case is indistinguishable from black-marker
700	   anonymisation.  A completely stable anonymisation will always map a
701	   given value in the real space to the same value in the anonymised
702	   space.  In practice, an anonymisation may also be stable for every
703	   data set published by a particular producer to a particular
704	   consumer, stable for a stated time period within a dataset or across
705	   datasets, or stable only for a single data set.

707	   If no information about stability is available, users of anonymised
708	   data may assume that the techniques used are stable across the entire
709	   dataset, but unstable across datasets.  Note that stability presents
710	   a risk-utility tradeoff, as completely stable anonymisation can be
711	   used for longer-term trend analysis tasks but also presents more risk
712	   of attack given the stable mapping.

714	5.2.  Truncation Length

716	   Truncation and precision degradation are described by the truncation
717	   length, or the amount of data still remaining in the anonymised field
718	   after anonymisation.

720	   Truncation length can be inferred from a given data set, and need not
721	   be specially exported or protected.

723	5.3.  Bin Map

725	   Binning is described by the specification of a bin mapping function.
726	   This function can be generally expressed in terms of an associative
727	   array that maps each point in the original space to a bin, although
728	   from an implementation standpoint most bin functions are much simpler
729	   and more efficient.

731	   Since knowledge of the bin mapping function can be used to partially
732	   deanonymise binned data, depending on the degree of generalisation,
733	   no information about the bin mapping function should be exported.

735	5.4.  Permutation

737	   Like binning, permutation is described by the specification of a
738	   permutation function.  In the general case, this can be expressed in
739	   terms of an associative array that maps each point in the original
740	   space to a point in the anonymised space.  Unlike binning, each point
741	   in the anonymised space must correspond to a single, unique point in
742	   the original space.

744	   Since knowledge of the permutation function can be used to completely
745	   deanonymise permuted data, no information about the permutation
746	   function or its parameters should be exported.

748	5.5.  Shift Amount

750	   Shifting requires an amount to shift each value by.  Since the shift
751	   amount can be used to deanonymise data protected by shifting, no
752	   information about the shift amount should be exported.

754	6.  Anonymisation Export Support in IPFIX

756	   Anonymised data exported via IPFIX SHOULD be annotated with
757	   anonymisation metadata, which details which fields described by which
758	   Templates are anonymised, and provides appropriate information on the
759	   anonymisation techniques used.  This metadata SHOULD be exported in
760	   Data Records described by the recommended Options Templates described
761	   in this section; these Options Templates use the additional
762	   Information Elements described in the following subsection.

764	   Note that fields anonymised using the black-marker (removal)
765	   technique do not require any special metadata support.  Black-marker
766	   anonymised fields SHOULD NOT be exported at all; the absence of the
767	   field in a given Data Set is implicitly declared by not including the
768	   corresponding Information Element in the Template describing that
769	   Data Set; exporting "empty" data elements is inefficient and in the
770	   general case impossible, as many non-counter Information Elements do
771	   not have semantically distinct null values.

773	6.1.  Anonymisation Options Template

775	   The Anonymisation Options Template describes anonymisation records,
776	   which allow anonymisation metadata to be exported inline over IPFIX
777	   or stored in an IPFIX File, by binding information about
778	   anonymisation techniques to Information Elements within defined
779	   Templates.  IPFIX Exporting Processes SHOULD export anonymisation
780	   records for any Template describing exported anonymised Data Records;
781	   IPFIX Collecting Processes and processes downstream from them MAY use
782	   anonymisation records to treat anonymised data differently depending
783	   on the applied technique.

785	   An Exporting Process SHOULD export anonymisation records after the
786	   Templates they describe have been exported, and SHOULD export
787	   anonymisation records reliably.

789	   Anonymisation records, like Templates, MUST be handled by Collecting
790	   Processes as scoped to the Transport Session in which they are sent.
791	   While the anonymisationStability IE can be used to declare that a
792	   given anonymisation technique's mapping will remain stable across
793	   multiple sessions, each session MUST re-export the anonymisation
794	   records along with the Templates.

796	   [EDITOR'S NOTE: Multiple anon. techniques applied on an IE at the
797	   same time is indicated with multiple elements of the same type (in
798	   application order as in PSAMP).  Need to verify this is actually
799	   useful given the defined techniques.]

801	   +-------------------------+-----------------------------------------+
802	   | IE                      | Description                             |
803	   +-------------------------+-----------------------------------------+
804	   | templateId [scope]      | The Template ID of the Template         |
805	   |                         | containing the Information Element      |
806	   |                         | described by this anonymisation record. |
807	   |                         | This Information Element MUST be        |
808	   |                         | defined as a Scope Field.               |
809	   | informationElementId    | The Information Element identifier of   |
810	   | [scope]                 | the Information Element described by    |
811	   |                         | this anonymisation record.  This        |
812	   |                         | Information Element MUST be defined as  |
813	   |                         | a Scope Field.                          |
814	   | informationElementIndex | The Information Element index of the    |
815	   | [scope] [optional]      | instance of the Information Element     |
816	   |                         | described by this anonymisation record  |
817	   |                         | identified by the informationElementId  |
818	   |                         | within the Template.  Optional; need    |
819	   |                         | only be present when describing         |
820	   |                         | Templates that have multiple instances  |
821	   |                         | of the same Information Element.  This  |
822	   |                         | Information Element MUST be defined as  |
823	   |                         | a Scope Field if present.  This         |
824	   |                         | Information Element is defined in       |
825	   |                         | Section 6.2, below.                     |
826	   | anonymisationStability  | The stability class of the anonymised   |
827	   |                         | data.  MUST be present.  This           |
828	   |                         | Information Element is defined in       |
829	   |                         | Section 6.2, below.                     |
830	   | anonymisationTechnique  | The technique used to anonymise the     |
831	   |                         | data.  MUST be present.  This           |
832	   |                         | Information Element is defined in       |
833	   |                         | Section 6.2, below.                     |
834	   +-------------------------+-----------------------------------------+

836	6.2.  Recommended Information Elements for Anonymisation Metadata

838	6.2.1.  anonymisationStability

840	   Description:   A description of the stability class of the
841	      anonymisation technique applied to a referenced Information
842	      Element within a referenced Template.  Stability classes refer to
843	      the stability of the parameters of the anonymisation technique,
844	      and therefore the comparability of the mapping between the real
845	      and anonymised values over time.  This determines which anonymised
846	      datasets may be compared with each other.

848	   +-------+-----------------------------------------------------------+
849	   | Value | Description                                               |
850	   +-------+-----------------------------------------------------------+
851	   | 0     | Undefined: the Exporting Process makes no representation  |
852	   |       | as to how stable the mapping is, or over what time period |
853	   |       | values of this field will remain comparable; while the    |
854	   |       | Collecting Process MAY assume Session level stability,    |
855	   |       | Session level stability is not guaranteed.  This is       |
856	   |       | equivalent to 0x01 Session level stability while advising |
857	   |       | the Collecting Process that no special effort has been    |
858	   |       | made to ensure stability.  Collecting Processes SHOULD    |
859	   |       | assume this is the case in the absence of stability class |
860	   |       | information; this is the default stability class.         |
861	   | 1     | Session: the Exporting Process will ensure that the       |
862	   |       | parameters of the anonymisation technique are stable      |
863	   |       | during the Transport Session.  All the values of the      |
864	   |       | described Information Element for each Record described   |
865	   |       | by the referenced Template within the Transport Session   |
866	   |       | are comparable.  The Exporting Process SHOULD endeavour   |
867	   |       | to ensure at least this stability class.                  |
868	   | 2     | Exporter-Collector Pair: the Exporting Process will       |
869	   |       | ensure that the parameters of the anonymisation technique |
870	   |       | are stable across Transport Sessions over time with the   |
871	   |       | given Collecting Process, but may use different           |
872	   |       | parameters for different Collecting Processes.  Data      |
873	   |       | exported to different Collecting Processes is not         |
874	   |       | comparable.                                               |
875	   | 3     | Stable: the Exporting Process will ensure that the        |
876	   |       | parameters of the anonymisation technique are stable      |
877	   |       | across Transport Sessions over time, regardless of the    |
878	   |       | Collecting Process to which it is sent.                   |
879	   +-------+-----------------------------------------------------------+

881	   Abstract Data Type:   unsigned8

883	   ElementId:   TBD1

885	   Status:   Proposed
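
   One possible reading of the stability classes above, from the point
   of view of a Collecting Process deciding whether two sets of values
   may be compared, is sketched below.  The enumeration member names
   and the comparable() helper are hypothetical.  For class 0 the
   sketch applies the Session level assumption that the table permits
   but does not guarantee.

```python
from enum import IntEnum


class AnonymisationStability(IntEnum):
    UNDEFINED = 0
    SESSION = 1
    EXPORTER_COLLECTOR_PAIR = 2
    STABLE = 3


def comparable(stability, same_session: bool, same_collector: bool) -> bool:
    """Decide whether anonymised values from two records are comparable."""
    stability = AnonymisationStability(stability)
    if stability is AnonymisationStability.STABLE:
        return True
    if stability is AnonymisationStability.EXPORTER_COLLECTOR_PAIR:
        return same_collector
    # SESSION, and UNDEFINED under the permitted Session assumption.
    return same_session
```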

887	6.2.2.  anonymisationTechnique

889	   Description:   A description of the anonymisation technique applied
890	      to a referenced Information Element within a referenced Template.
891	      Each technique may be applicable only to certain Information
892	      Elements and recommended only for certain Infomation Elements;
893	      these restrictions are noted in the table below.

895	   +-------+--------------------------------+------------+-------------+
896	   | Value | Description                    | Applicable | Recommended |
897	   |       |                                | to         | for         |
898	   +-------+--------------------------------+------------+-------------+
899	   | 0     | Undefined: the Exporting       | all        | all         |
900	   |       | Process makes no               |            |             |
901	   |       | representation as to whether   |            |             |
902	   |       | the defined field is           |            |             |
903	   |       | anonymised or not.  While the  |            |             |
904	   |       | Collecting Process MAY assume  |            |             |
905	   |       | that the field is not          |            |             |
906	   |       | anonymised, it is not          |            |             |
907	   |       | guaranteed not to be.  This is |            |             |
908	   |       | the default anonymisation      |            |             |
909	   |       | technique.                     |            |             |
910	   | 1     | None: the values exported are  | all        | all         |
911	   |       | real.                          |            |             |
912	   | 2     | Precision                      | all        | all         |
913	   |       | Degradation/Truncation: the    |            |             |
914	   |       | values exported are anonymised |            |             |
915	   |       | using simple precision         |            |             |
916	   |       | degradation or truncation.     |            |             |
917	   |       | The new precision is implicit  |            |             |
918	   |       | in the exported data, and can  |            |             |
919	   |       | be deduced by the Collecting   |            |             |
920	   |       | Process.                       |            |             |
921	   | 3     | Binning: the values exported   | all        | all         |
922	   |       | are anonymised into bins.      |            |             |
923	   | 4     | Enumeration: the values        | all        | timestamps  |
924	   |       | exported are anonymised by     |            |             |
925	   |       | enumeration.                   |            |             |
926	   | 5     | Permutation: the values        | all        | identifiers |
927	   |       | exported are anonymised by     |            |             |
928	   |       | random permutation.            |            |             |
929	   | 6     | Structured Permutation: the    | addresses  |             |
930	   |       | values exported are anonymised |            |             |
931	   |       | by random permutation,         |            |             |
932	   |       | preserving bit-level structure |            |             |
933	   |       | as appropriate; this           |            |             |
934	   |       | represents prefix-preserving   |            |             |
935	   |       | IP address anonymisation or    |            |             |
936	   |       | structured MAC address         |            |             |
937	   |       | anonymisation.                 |            |             |
938	   +-------+--------------------------------+------------+-------------+
939	   Abstract Data Type:   unsigned8

941	   ElementId:   TBD2

943	   Status:   Proposed
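
   As a non-normative sketch, the technique values above can be
   captured in an enumeration and packed, together with the other
   mandatory fields of the Anonymisation Options Template, into the
   body of one anonymisation record.  The sketch assumes that fields
   appear in the order of the table in Section 6.1, that the optional
   informationElementIndex is absent, and that templateId and
   informationElementId are 16-bit values; Message and Set headers are
   omitted.  All names are hypothetical.

```python
import struct
from enum import IntEnum


class AnonymisationTechnique(IntEnum):
    UNDEFINED = 0
    NONE = 1
    PRECISION_DEGRADATION = 2
    BINNING = 3
    ENUMERATION = 4
    PERMUTATION = 5
    STRUCTURED_PERMUTATION = 6


def encode_anonymisation_record(template_id: int,
                                information_element_id: int,
                                stability: int,
                                technique: int) -> bytes:
    """Pack one anonymisation record body in network byte order."""
    if not 0 <= stability <= 3:
        raise ValueError("unknown stability class")
    technique = AnonymisationTechnique(technique)  # rejects unknown values
    # Two unsigned16 scope fields, then two unsigned8 fields.
    return struct.pack("!HHBB", template_id, information_element_id,
                       stability, int(technique))
```

   For instance, a record stating that Information Element 8 in
   Template 256 was anonymised by structured permutation with Session
   stability encodes to the six octets 01 00 00 08 01 06.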

945	6.2.3.  informationElementIndex

947	   Description:   A zero-based index of an Information Element
948	      referenced by informationElementId within a Template referenced by
949	      templateId; used to disambiguate scope for templates containing
950	      multiple identical Information Elements.

952	   Abstract Data Type:   unsigned16

954	   ElementId:   TBD3

956	   Status:   Proposed

958	7.  Applying Anonymisation Techniques to IPFIX Export and Storage

960	   When exporting or storing anonymised flow data using IPFIX, certain
961	   interactions between the IPFIX Protocol and the anonymisation
962	   techniques in use must be considered; these are treated in the
963	   subsections below.

965	7.1.  Arrangement of Processes in IPFIX Anonymisation

967	   Anonymisation may be applied to IPFIX data at three stages within
968	   the collection infrastructure: on initial export, at a mediator, or
969	   after collection, as shown in Figure 2.  Each of these locations has
970	   specific considerations and applicability.

972	               +==========================================+
973	               | Exporting Process                        |
974	               +==========================================+
975	                 |                                      |
976	                 |    (Anonymised at Original Exporter) |
977	                 V                                      |
978	               +=============================+          |
979	               | Mediator                    |          |
980	               +=============================+          |
981	                 |                                      |
982	                 | (Anonymising Mediator)               |
983	                 V                                      V
984	               +==========================================+
985	               | Collecting Process                       |
986	               +==========================================+
987	                       |
988	                       | (Anonymising CP/File Writer)
989	                       V
990	               +--------------------+
991	               | IPFIX File Storage |
992	               +--------------------+

994	                Figure 2: Potential Anonymisation Locations

996	   Anonymisation is generally performed before the wider dissemination
997	   or repurposing of a flow data set, e.g., adapting operational
998	   measurement data for research.  Therefore, direct anonymisation of
999	   flow data on initial export is only applicable in certain restricted
1000	   circumstances: when the Exporting Process is "publishing" data to a
1001	   Collecting Process directly, and the Exporting Process and Collecting
1002	   Process are operated by different entities.  Note that certain
1003	   guidelines in Section 7.2.2 with respect to timestamp anonymisation
1004	   may not apply in this case, as the Collecting Process may be able to
1005	   deduce certain timing information from the time at which each Message
1006	   is received.

1008	   A much more flexible arrangement is to anonymise data within a
1009	   Mediator [I-D.ietf-ipfix-mediators-framework].  Here, original data
1010	   is sent to a Mediator, which performs the anonymisation function and
1011	   re-exports the anonymised data.  Such a Mediator could be located at
1012	   the administrative domain boundary of the initial Exporting Process
1013	   operator, exporting anonymised data to other consumers outside the
1014	   organisation.  In this case, the original Exporter SHOULD use TLS as
1015	   specified in [RFC5101] to secure the channel to the Mediator, and the
1016	   Mediator should follow the guidelines in Section 7.2, to mitigate the
1017	   risk of original data disclosure.

1019	   When data is to be published as an anonymised data set in an IPFIX
1020	   File [I-D.ietf-ipfix-file], the anonymisation may be done at the
1021	   final Collecting Process before storage and dissemination, as well.
1022	   In this case, the Collector should follow the guidelines in
1023	   Section 7.2, especially as regards File-specific Options in
1024	   Section 7.2.3.

1026	   In each of these data flows, the anonymisation of records is
1027	   undertaken by an Intermediate Anonymisation Process (IAP); the data
1028	   flows into and out of this IAP are shown in Figure 3 below.

1030	   packets --+                     +- IPFIX Messages -+
1031	             |                     |                  |
1032	             V                     V                  V
1033	   +==================+ +====================+ +=============+
1034	   | Metering Process | | Collecting Process | | File Reader |
1035	   +==================+ +====================+ +=============+
1036	             |      Non-anonymised | Records          |
1037	             V                     V                  V
1038	   +=========================================================+
1039	   |          Intermediate Anonymisation Process (IAP)       |
1040	   +=========================================================+
1041	             | Anonymised     ^            Anonymised |
1042	             | Records        |               Records |
1043	             V                |                       V
1044	   +===================+    Anonymisation      +=============+
1045	   | Exporting Process |<--- Parameters ------>| File Writer |
1046	   +===================+                       +=============+
1047	             |                                        |
1048	             +------------> IPFIX Messages <----------+

1050	          Figure 3: Data flows through the anonymisation process

1052	   Anonymisation parameters must also be available to the Exporting
1053	   Process and/or File Writer in order to ensure header data is also
1054	   appropriately anonymised as in Section 7.2.2.

1056	   Following each of the data flows through the IAP, we describe five
1057	   basic types of anonymisation arrangements within this framework in
1058	   Figure 4.  In addition to the three arrangements described in detail
1059	   above, anonymisation can also be done at a collocated Metering
1060	   Process and File Writer (see section 7.3.2 of [I-D.ietf-ipfix-file]),
1061	   or at a file manipulator (see section 7.3.7 of
1062	   [I-D.ietf-ipfix-file]).

1064	         +----+  +-----+  +----+
1065	 pkts -> | MP |->| IAP |->| EP |-> anonymisation on Original Exporter
1066	         +----+  +-----+  +----+
1067	         +----+  +-----+  +----+
1068	 pkts -> | MP |->| IAP |->| FW |-> Anonymising collocated MP/File Writer
1069	         +----+  +-----+  +----+
1070	         +----+  +-----+  +----+
1071	IPFIX -> | CP |->| IAP |->| EP |-> Anonymising Mediator (Masquerading Proxy)
1072	         +----+  +-----+  +----+
1073	         +----+  +-----+  +----+
1074	IPFIX -> | CP |->| IAP |->| FW |-> Anonymising collocated CP/File Writer
1075	         +----+  +-----+  +----+
1076	         +----+  +-----+  +----+
1077	IPFIX -> | FR |->| IAP |->| FW |-> Anonymising file manipulator
1078	 File    +----+  +-----+  +----+

1080	        Figure 4: Possible anonymisation arrangements in the IPFIX
1081	                               architecture

1083	   Note that anonymisation may occur at more than one location within a
1084	   given collection infrastructure, to provide varying levels of
1085	   anonymisation, disclosure risk, or data utility for specific
1086	   purposes.

1088	7.2.  IPFIX-Specific Anonymisation Guidelines

1090	   In implementing and deploying the anonymisation techniques described
1091	   in this document, implementors should note that IPFIX already
1092	   provides features that support anonymised data export, and use these
1093	   where appropriate.  Care must also be taken that data structures
1094	   supporting the operation of the protocol itself do not leak data that
1095	   could be used to reverse the anonymisation applied to the flow data.
1096	   Such data structures may appear in the header, or within the data
1097	   stream itself, especially as options data.  Each of these and their
1098	   impact on specific anonymisation techniques is noted in a separate
1099	   subsection below.

1101	7.2.1.  Appropriate Use of Information Elements for Anonymised Data

1103	   Note, as in Section 6 above, that black-marker anonymised fields
1104	   SHOULD NOT be exported at all; the absence of the field in a given
1105	   Data Set is implicitly declared by not including the corresponding
1106	   Information Element in the Template describing that Data Set.
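
   As a minimal sketch of this rule, the following Python fragment
   removes black-marked fields from both a Template and its records, so
   that nothing at all is exported for them.  The list-of-names
   Template representation and the function name strip_black_marked are
   assumptions made for illustration; they are not part of IPFIX or of
   any particular implementation.

```python
def strip_black_marked(template, records, black_marked):
    """Drop black-marked Information Elements from a Template and its records.

    template     -- ordered list of Information Element names (an
                    illustrative in-memory form, not the wire encoding)
    records      -- list of dicts mapping Information Element name to value
    black_marked -- set of Information Element names to suppress
    """
    kept = [ie for ie in template if ie not in black_marked]
    stripped = [{ie: rec[ie] for ie in kept} for rec in records]
    return kept, stripped


template = ["sourceIPv4Address", "destinationIPv4Address",
            "flowStartSeconds", "octetDeltaCount"]
records = [{"sourceIPv4Address": "192.0.2.10",
            "destinationIPv4Address": "198.51.100.7",
            "flowStartSeconds": 1247220000,
            "octetDeltaCount": 4096}]

# The source address is black-marked, so it disappears from the
# Template as well as from every record described by that Template.
new_template, new_records = strip_black_marked(
    template, records, {"sourceIPv4Address"})
```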

1108	   When using precision degradation of timestamps, Exporting Processes
1109	   SHOULD export timing information using Information Elements of an
1110	   appropriate precision, as explained in Section 4.5 of [RFC5153].  For
1111	   example, timestamps measured in millisecond-level precision and
1112	   degraded to second-level precision should use flowStartSeconds and
1113	   flowEndSeconds, not flowStartMilliseconds and flowEndMilliseconds.
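
   The timestamp example above could be sketched as follows.  The
   record is modelled as a plain dictionary and degradation is done by
   truncation; both choices, and the function name degrade_to_seconds,
   are assumptions for illustration only.

```python
# Millisecond-precision Information Elements and the second-precision
# Information Elements that replace them after degradation.
MS_TO_SECONDS = (
    ("flowStartMilliseconds", "flowStartSeconds"),
    ("flowEndMilliseconds", "flowEndSeconds"),
)


def degrade_to_seconds(record):
    """Return a copy of record with millisecond timestamps truncated to
    seconds and re-keyed to the matching second-precision element."""
    out = dict(record)
    for ms_name, s_name in MS_TO_SECONDS:
        if ms_name in out:
            out[s_name] = out.pop(ms_name) // 1000
    return out


degraded = degrade_to_seconds({
    "flowStartMilliseconds": 1247220000123,
    "flowEndMilliseconds": 1247220004999,
    "octetDeltaCount": 4096,
})
```

   Re-keying the values, rather than exporting zero-padded millisecond
   fields, avoids implying a precision the data no longer has.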

1115	   When exporting anonymised data and anonymisation metadata, Exporting
1116	   Processes SHOULD ensure that each Information Element and its
1117	   declared anonymisation technique are compatible.  Specifically,
1118	   the applicable and recommended Information Element types and
1119	   semantics for each technique are noted in the description of the
1120	   anonymisationTechnique Information Element in Section 6.2.2.  In this
1121	   description, a timestamp is an Information Element with the data type
1122	   dateTimeSeconds, dateTimeMilliseconds, dateTimeMicroseconds, or
1123	   dateTimeNanoseconds; an address is an Information Element with the
1124	   data type ipv4Address, ipv6Address, or macAddress; and an identifier
1125	   is an Information Element with identifier data type semantics.
1126	   Exporting Processes MUST NOT export Anonymisation Options records
1127	   binding techniques to Information Elements to which they are not
1128	   applicable, and SHOULD NOT export Anonymisation Options records
1129	   binding techniques to Information Elements for which they are not
1130	   recommended.
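
   One possible way to enforce these two rules is sketched below.  The
   classification of data types follows the definitions in the
   paragraph above.  The applicability and recommendation tables are
   passed in by the caller; the entries shown here, including the
   technique name "technique-x", are placeholders and do not reproduce
   the table in Section 6.2.2.

```python
TIMESTAMP_TYPES = {"dateTimeSeconds", "dateTimeMilliseconds",
                   "dateTimeMicroseconds", "dateTimeNanoseconds"}
ADDRESS_TYPES = {"ipv4Address", "ipv6Address", "macAddress"}


def classify(data_type, identifier_semantics=False):
    """Map an Information Element data type to the class used when
    checking technique applicability."""
    if data_type in TIMESTAMP_TYPES:
        return "timestamp"
    if data_type in ADDRESS_TYPES:
        return "address"
    if identifier_semantics:
        return "identifier"
    return "other"


def check_binding(technique, ie_class, applicable, recommended):
    """Return "MUST NOT", "SHOULD NOT" or "OK" for exporting a record
    that binds technique to an Information Element of class ie_class."""
    if ie_class not in applicable.get(technique, ()):
        return "MUST NOT"
    if ie_class not in recommended.get(technique, ()):
        return "SHOULD NOT"
    return "OK"


# Placeholder tables, for illustration only.
applicable = {"technique-x": {"timestamp", "address"}}
recommended = {"technique-x": {"timestamp"}}

status = check_binding("technique-x", classify("ipv4Address"),
                       applicable, recommended)
```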

1132	7.2.2.  Anonymisation of Header Data

1134	   Each IPFIX Message contains a Message Header, which includes two
1135	   fields that may be used to break certain anonymisation techniques:
1136	   the Export Time and the Observation Domain ID.

1139	   Exporting Processes exporting IPFIX Messages that contain anonymised
1140	   timestamp data SHOULD, where the original Export Time has some
1141	   relationship to the anonymised timestamps, anonymise the Export Time
1142	   Message Header field using an equivalent technique, if possible.
1143	   Otherwise, relationships between export and flow time could be used
1144	   to partially or totally reverse timestamp anonymisation.
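
   The following sketch illustrates applying an equivalent technique to
   the Export Time, assuming the flow timestamps are anonymised by
   precision degradation into fixed-size bins.  The function names and
   the dictionary form of the records are hypothetical; no wire
   encoding is shown.

```python
TIMESTAMP_FIELDS = ("flowStartSeconds", "flowEndSeconds")


def degrade(seconds, bin_size):
    """Precision degradation: round a timestamp down to a bin boundary."""
    return seconds - (seconds % bin_size)


def anonymise_message(export_time, records, bin_size):
    """Degrade the flow timestamps and the Export Time identically, so
    that the header reveals no more precision than the records do."""
    anon_records = []
    for rec in records:
        anon = dict(rec)
        for name in TIMESTAMP_FIELDS:
            if name in anon:
                anon[name] = degrade(anon[name], bin_size)
        anon_records.append(anon)
    return degrade(export_time, bin_size), anon_records


anon_export_time, anon_records = anonymise_message(
    1247220075,
    [{"flowStartSeconds": 1247220001, "flowEndSeconds": 1247220059}],
    bin_size=60)
```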

1146	   The similarity in size between an Observation Domain ID and an IPv4
1147	   address (32 bits) may lead to a temptation to use an IPv4 interface
1148	   address on the Metering or Exporting Process as the Observation
1149	   Domain ID.  If this address bears some relation to the IP addresses
1150	   in the flow data (e.g., shares a network prefix with internal
1151	   addresses) and the IP addresses in the flow data are anonymised in a
1152	   structure-preserving way, then the Observation Domain ID may be used
1153	   to break the IP address anonymisation.  Use of an IPv4 interface
1154	   address on the Metering or Exporting Process as the Observation
1155	   Domain ID is NOT RECOMMENDED in this case.
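
   A deployment could check for this condition before export.  The
   sketch below interprets a 32-bit Observation Domain ID as an IPv4
   address and tests whether it falls within any internal network; the
   function name and the list of internal networks are assumptions
   supplied by the operator, not something IPFIX defines.

```python
import ipaddress


def odid_is_internal_address(odid, internal_networks):
    """Return True if the Observation Domain ID, read as an IPv4
    address, lies inside one of the given internal networks."""
    addr = ipaddress.IPv4Address(odid)
    return any(addr in ipaddress.ip_network(net)
               for net in internal_networks)


internal = ["192.0.2.0/24", "198.51.100.0/24"]

# 0xC0000201 is 192.0.2.1, which shares a prefix with internal hosts.
risky = odid_is_internal_address(0xC0000201, internal)

# A small arbitrary identifier does not.
safe = odid_is_internal_address(1, internal)
```

   A check of this kind only catches the obvious case; choosing
   Observation Domain IDs unrelated to any address avoids the problem
   altogether.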

1157	7.2.3.  Anonymisation of Options Data

1159	   IPFIX uses the Options mechanism to export, among other things,
1160	   metadata about exported flows and the flow collection infrastructure.
1161	   As with the IPFIX Message Header, certain Options recommended in
1162	   [RFC5101] and the IPFIX File Format [I-D.ietf-ipfix-file] containing
1163	   flow timestamps and network addresses of Exporting and Collecting
1164	   Processes may be used to break certain anonymisation techniques; care
1165	   should be taken while using them with anonymised data export and
1166	   storage.

1168	   The Exporting Process Reliability Statistics Options Template,
1169	   recommended in [RFC5101], contains an Exporting Process ID field,
1170	   which may be an exportingProcessIPv4Address Information Element or an
1171	   exportingProcessIPv6Address Information Element.  If the Exporting
1172	   Process address bears some relation to the IP addresses in the flow
1173	   data (e.g., shares a network prefix with internal addresses) and the
1174	   IP addresses in the flow data are anonymised in a structure-
1175	   preserving way, then the Exporting Process address may be used to
1176	   break the IP address anonymisation.  Exporting Processes exporting
1177	   anonymised data in this situation SHOULD mitigate the risk of attack
1178	   either by omitting Options described by the Exporting Process
1179	   Reliability Statistics Options Template, or by anonymising the
1180	   Exporting Process address using a similar technique to that used to
1181	   anonymise the IP addresses in the exported data.

1183	   Similarly, the Export Session Details Options Template and Message
1184	   Details Options Template specified for the IPFIX File Format
1185	   [I-D.ietf-ipfix-file] may contain the exportingProcessIPv4Address
1186	   Information Element or the exportingProcessIPv6Address Information
1187	   Element to identify an Exporting Process from which a flow record was
1188	   received, and the collectingProcessIPv4Address Information Element or
1189	   the collectingProcessIPv6Address Information Element to identify the
1190	   Collecting Process which received it.  If the Exporting Process or
1191	   Collecting Process address bears some relation to the IP addresses in
1192	   the flow data (e.g., shares a network prefix with internal addresses)
1193	   and the IP addresses in the flow data are anonymised in a structure-
1194	   preserving way, then the Exporting Process or Collecting Process
1195	   address may be used to break the IP address anonymisation.  Since
1196	   these Options Templates are primarily intended for storing IPFIX
1197	   Transport Session data for auditing, replay, and testing purposes,
1198	   including them in stored anonymised data is NOT RECOMMENDED;
1199	   omitting them mitigates the risk of attack.

1201	   The Message Details Options Template specified for the IPFIX File
1202	   Format [I-D.ietf-ipfix-file] also contains the
1203	   collectionTimeMilliseconds Information Element.  As with the Export
1204	   Time Message Header field, if the exported flow data contains
1205	   anonymised timestamp information, and the collectionTimeMilliseconds
1206	   Information Element in a given Message has some relationship to the
1207	   anonymised timestamp information, then this relationship can be
1208	   exploited to reverse the timestamp anonymisation.  Since this Options
1209	   Template is primarily intended for storing IPFIX Transport Session
1210	   data for auditing, replay, and testing purposes, including it in
1211	   stored anonymised data is NOT RECOMMENDED; omitting it mitigates the
1212	   risk of attack.
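
   The omission approach described in the preceding paragraphs might be
   sketched as a filter over the Options Templates about to be written.
   The Information Element names are those cited in the text; the
   Template names and contents in the example, and the function name,
   are hypothetical.

```python
ADDRESS_IES = frozenset({
    "exportingProcessIPv4Address", "exportingProcessIPv6Address",
    "collectingProcessIPv4Address", "collectingProcessIPv6Address",
})
TIMESTAMP_IES = frozenset({"collectionTimeMilliseconds"})


def filter_options_templates(options_templates,
                             sensitive=ADDRESS_IES | TIMESTAMP_IES):
    """Keep only Options Templates containing no sensitive element.

    options_templates -- dict mapping a Template name to its list of
                         Information Element names
    """
    return {name: ies for name, ies in options_templates.items()
            if not sensitive.intersection(ies)}


# Hypothetical Templates, for illustration only.
candidates = {
    "template-a": ["exportingProcessIPv4Address",
                   "notSentPacketTotalCount"],
    "template-b": ["meteringProcessId", "ignoredPacketTotalCount"],
    "template-c": ["collectionTimeMilliseconds"],
}
kept = filter_options_templates(candidates)
```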

1214	   Since the Time Window Options Template specified for the IPFIX File
1215	   Format [I-D.ietf-ipfix-file] refers to the timestamps within the flow
1216	   data to provide partial table of contents information for an IPFIX
1217	   File, care must be taken to ensure that Options described by this
1218	   template are written using the anonymised timestamps instead of the
1219	   original ones.
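
   As an illustration, the time window can simply be computed from the
   records after anonymisation rather than before.  The sketch assumes
   second-precision timestamps degraded into one-hour bins; the
   function names and the tuple returned are illustrative and do not
   represent the actual Options record layout.

```python
def degrade(seconds, bin_size=3600):
    """Round a timestamp down to the start of its bin."""
    return seconds - (seconds % bin_size)


def time_window(records):
    """Return (earliest flow start, latest flow end) for the records."""
    starts = [r["flowStartSeconds"] for r in records]
    ends = [r["flowEndSeconds"] for r in records]
    return min(starts), max(ends)


original = [
    {"flowStartSeconds": 1247220001, "flowEndSeconds": 1247223700},
    {"flowStartSeconds": 1247221000, "flowEndSeconds": 1247225000},
]
anonymised = [{"flowStartSeconds": degrade(r["flowStartSeconds"]),
               "flowEndSeconds": degrade(r["flowEndSeconds"])}
              for r in original]

# The window written to the File is derived from the anonymised
# records, never from the original ones.
window = time_window(anonymised)
```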

1221	8.  Examples

1223	   [TODO: write this section.]

1225	9.  Security Considerations

1227	   [TODO: write this section.]

1229	10.  IANA Considerations

1231	   This document specifies the creation of several new IPFIX Information
1232	   Elements in the IPFIX Information Element registry located at
1233	   http://www.iana.org/assignments/ipfix, as defined in Section 6.2
1234	   above.  IANA has assigned the following Information Element numbers
1235	   for their respective Information Elements as specified below:

1237	   o  Information Element number TBD1 for the anonymisationStability
1238	      Information Element.

1240	   o  Information Element number TBD2 for the anonymisationTechnique
1241	      Information Element.

1243	   o  Information Element number TBD3 for the informationElementIndex
1244	      Information Element.

1246	   [NOTE for IANA: The text TBDn should be replaced with the respective
1247	   assigned Information Element numbers where they appear in this
1248	   document.]

1250	11.  Acknowledgments

1252	   We thank Paul Aitken for his comments and insight, and the PRISM
1253	   project for its support of this work.

1255	12.  References

1257	12.1.  Normative References

1259	   [RFC5101]  Claise, B., "Specification of the IP Flow Information
1260	              Export (IPFIX) Protocol for the Exchange of IP Traffic
1261	              Flow Information", RFC 5101, January 2008.

1263	   [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
1264	              Meyer, "Information Model for IP Flow Information Export",
1265	              RFC 5102, January 2008.

1267	12.2.  Informative References

1269	   [RFC5472]  Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP
1270	              Flow Information Export (IPFIX) Applicability", RFC 5472,
1271	              March 2009.

1273	   [RFC5470]  Sadasivan, G., Brownlee, N., Claise, B., and J. Quittek,
1274	              "Architecture for IP Flow Information Export", RFC 5470,
1275	              March 2009.

1277	   [I-D.ietf-ipfix-file]
1278	              Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
1279	              Wagner, "Specification of the IPFIX File Format",
1280	              draft-ietf-ipfix-file-04 (work in progress), July 2009.

1282	   [I-D.ietf-ipfix-mediators-framework]
1283	              Kobayashi, A., Nishida, H., and B. Claise, "IPFIX
1284	              Mediation: Framework",
1285	              draft-ietf-ipfix-mediators-framework-02 (work in
1286	              progress), February 2009.

1288	   [I-D.ietf-ipfix-mediators-problem-statement]
1289	              Kobayashi, A., Claise, B., Nishida, H., Sommer, C.,
1290	              Dressler, F., and E. Stephan, "IPFIX Mediation: Problem
1291	              Statement",
1292	              draft-ietf-ipfix-mediators-problem-statement-03 (work in
1293	              progress), April 2009.

1295	   [RFC5153]  Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P.
1296	              Aitken, "IP Flow Information Export (IPFIX) Implementation
1297	              Guidelines", RFC 5153, April 2008.

1299	   [RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
1300	              "Requirements for IP Flow Information Export (IPFIX)",
1301	              RFC 3917, October 2004.

1303	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1304	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1306	Authors' Addresses

1308	   Elisa Boschi
1309	   Hitachi Europe
1310	   c/o ETH Zurich
1311	   Gloriastrasse 35
1312	   8092 Zurich
1313	   Switzerland

1315	   Phone: +41 44 632 70 57
1316	   Email: elisa.boschi@hitachi-eu.com

1318	   Brian Trammell
1319	   Hitachi Europe
1320	   c/o ETH Zurich
1321	   Gloriastrasse 35
1322	   8092 Zurich
1323	   Switzerland

1325	   Phone: +41 44 632 70 13
1326	   Email: brian.trammell@hitachi-eu.com